production SAL

2021-02-16
23:44	<mutante>	reimaging mwdebug1001 with buster	[production]
23:43	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mwdebug1001.eqiad.wmnet	[production]
23:37	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: OS upgrade	[production]
23:37	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: OS upgrade	[production]
23:09	<twentyafterfour@deploy1001>	Synchronized php-1.36.0-wmf.30/includes/HookContainer/DeprecatedHooks.php: silence deprecation refs T274889 (duration: 01m 14s)	[production]
22:52	<jgleeson>	updated payments-wiki config to 3d1b4564a2	[production]
22:39	<gehel>	restarting wdqs-updater on wdqs2001	[production]
22:35	<bstorm@cumin1001>	END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)	[production]
22:23	<bstorm@cumin1001>	START - Cookbook wmcs.wikireplicas.add_wiki	[production]
22:22	<akosiaris>	re-enable puppet and squid on install1003. wdqs seems to be mildly related to the outage, restart it	[production]
22:09	<elukey@cumin1001>	END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster	[production]
21:45	<akosiaris>	stop squid as a stopgap on install1003 and disable puppet so that it is not restarted while we figure out what wdqs updater is doing to cause issue to mediawiki	[production]
20:47	<marxarelli>	1.36.0-wmf.31 rolled to group0. no new errors for wmf.31 (T271345)	[production]
20:33	<dduvall@deploy1001>	rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.31	[production]
20:20	<mutante>	mwdebug1002 has been recreated on buster and has been repooled after scap pull - you can find a .tar.gz in your home with the contents of your home before reimaging, fingerprint at T274023#6835116	[production]
20:18	<legoktm@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet	[production]
20:18	<legoktm@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet	[production]
20:18	<legoktm@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1289.eqiad.wmnet	[production]
20:18	<legoktm@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1288.eqiad.wmnet	[production]
20:17	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mwdebug1002.eqiad.wmnet	[production]
20:15	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mwdebug1002.eqiad.wmnet	[production]
20:04	<legoktm@cumin1001>	conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet	[production]
20:04	<legoktm@cumin1001>	conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet	[production]
20:04	<legoktm@cumin1001>	conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet	[production]
20:03	<legoktm@cumin1001>	conftool action : set/pooled=no; selector: name=mw1288.eqiad.wmnet	[production]
19:58	<ryankemper>	[WDQS] De-pooled `wdqs100[4,7]` to catch up on lag, and pooled `wdqs100[5,6]`	[production]
19:09	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade	[production]
19:09	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade	[production]
19:06	<legoktm@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1297.eqiad.wmnet with reason: REIMAGE	[production]
19:04	<legoktm@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1290.eqiad.wmnet with reason: REIMAGE	[production]
19:03	<legoktm@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1297.eqiad.wmnet with reason: REIMAGE	[production]
19:02	<legoktm@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1289.eqiad.wmnet with reason: REIMAGE	[production]
19:01	<legoktm@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1290.eqiad.wmnet with reason: REIMAGE	[production]
19:00	<legoktm@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1288.eqiad.wmnet with reason: REIMAGE	[production]
18:59	<mutante>	puppetmaster1002 - puppet cert clean mwdebug1002.eqiad.wmnet, sign new request, initial puppet run (T274023)	[production]
18:59	<legoktm@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1289.eqiad.wmnet with reason: REIMAGE	[production]
18:58	<legoktm@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1288.eqiad.wmnet with reason: REIMAGE	[production]
18:52	<mutante>	re-creating mwdebug1002	[production]
18:49	<dduvall@deploy1001>	Finished scap: testwikis wikis to 1.36.0-wmf.31 (duration: 49m 37s)	[production]
18:41	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1346.eqiad.wmnet	[production]
18:38	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1352.eqiad.wmnet	[production]
18:37	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet	[production]
18:35	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw1346.eqiad.wmnet	[production]
18:33	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw1352.eqiad.wmnet	[production]
18:32	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet	[production]
18:28	<mutante>	mw1352 - powercycle via mgmt	[production]
18:04	<dduvall@deploy1001>	Started scap: testwikis wikis to 1.36.0-wmf.31	[production]
17:41	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1346.eqiad.wmnet with reason: REIMAGE	[production]
17:39	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1347.eqiad.wmnet with reason: REIMAGE	[production]
17:39	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1346.eqiad.wmnet with reason: REIMAGE	[production]