production SAL

2021-08-04
17:15	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE	[production]
17:13	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE	[production]
17:12	<urbanecm@deploy1002>	Synchronized php-1.37.0-wmf.17/extensions/GrowthExperiments/maintenance/updateMenteeData.php: 66c2c7593322dfc575edc818aaff8d9b79466bdd: updateMenteeData: Output how long the script took (T287964) (duration: 01m 07s)	[production]
17:11	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE	[production]
17:11	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE	[production]
17:10	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
17:10	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
17:09	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE	[production]
17:08	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE	[production]
16:57	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
16:55	<mutante>	mw2351, mw2353, mw2355 - scap pull	[production]
16:40	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:37	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
16:25	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2355.codfw.wmnet with reason: reimage	[production]
16:25	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 4:00:00 on mw2355.codfw.wmnet with reason: reimage	[production]
16:23	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE	[production]
16:23	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage	[production]
16:22	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage	[production]
16:22	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage	[production]
16:22	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage	[production]
16:21	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage	[production]
16:21	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage	[production]
16:21	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE	[production]
16:21	<joe>	find . -type f -delete on /var/cache/nginx-docker-registry on registry2, the disk is too small for unbound cache and* accepting large uploads	[production]
16:20	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE	[production]
16:19	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE	[production]
16:18	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE	[production]
16:16	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE	[production]
16:15	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009	[production]
16:15	<hnowlan@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009	[production]
16:14	<hnowlan>	draining maps1008 from cassandra cluster	[production]
16:13	<hnowlan@puppetmaster1001>	conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet	[production]
16:02	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage	[production]
16:02	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage	[production]
16:01	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2380.codfw.wmnet with reason: reimage	[production]
16:01	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 4:00:00 on mw2380.codfw.wmnet with reason: reimage	[production]
16:01	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage	[production]
16:01	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage	[production]
15:58	<mutante>	mw2351, mw2353, mw2355, mw2357 - converting from appserver to jobrunner, mw2377, mw2378, mw2379, mw2380 - converting from jobrunner to appserver - for balancing of server types over rows	[production]
15:51	<dzahn@cumin1001>	conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet	[production]
15:50	<dzahn@cumin1001>	conftool action : set/pooled=inactive; selector: name=mw237[789].codfw.wmnet	[production]
15:48	<dzahn@cumin1001>	conftool action : set/pooled=inactive; selector: name=mw235[1357].codfw.wmnet	[production]
15:47	<dzahn@cumin1001>	conftool action : set/pooled=inactive; selector: name=mw235[1357].wmnet	[production]
14:30	<godog>	upgrade prometheus on cloudmetrics hosts - T222113	[production]
14:28	<godog>	upgrade prometheus on prometheus4001 - T222113	[production]
14:19	<moritzm>	imported gitlab-ce 13.12.9 to thirdparty/gitlab T287671	[production]
14:18	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
14:17	<godog>	depool prometheus2004 and pool prometheus2003 - T222113	[production]
14:13	<kormat@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Firmware upgrade on db1104 (s8 primary) T286226	[production]
14:13	<kormat@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Firmware upgrade on db1104 (s8 primary) T286226	[production]