production SAL

2022-01-15
00:27	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn	[production]
00:26	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn	[production]
00:14	<dduvall@deploy1002>	rebuilt and synchronized wikiversions files: Revert "all/group1 wikis to 1.38.0-wmf.17"	[production]
2022-01-14
23:07	<ryankemper@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS stretch	[production]
22:26	<ryankemper@cumin2002>	START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch	[production]
18:09	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing	[production]
18:09	<hnowlan@cumin1001>	START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing	[production]
17:44	<bblack>	drmrs asw: removed native-vlan-id from config on secondary (x-rack) interfaces of lvses to debug network issue	[production]
17:26	<bblack>	reboot lvs600[23]	[production]
16:55	<bblack>	reboot lvs6001	[production]
16:30	<bblack>	rebooting cp60xx where x is 6, 7, 8, 14, 15, 16 (downtimed)	[production]
16:15	<dancy@deploy1002>	Synchronized README: Testing php-fpm restart (duration: 03m 18s)	[production]
16:04	<hnowlan@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster	[production]
15:40	<hnowlan@cumin1001>	START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster	[production]
15:39	<bblack>	lvs6001 + all services downtimed	[production]
15:29	<bblack@cumin1001>	conftool action : set/pooled=yes; selector: dc=drmrs	[production]
15:00	<bblack>	silenced site=drmrs in alertmanager for one month, I think	[production]
15:00	<bblack>	silenced site=drmrs in alertmanager, I think	[production]
13:31	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bullseye	[production]
13:20	<hnowlan@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster	[production]
12:59	<marostegui@cumin1001>	START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bullseye	[production]
12:53	<hnowlan@cumin1001>	START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster	[production]
12:51	<hnowlan@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster	[production]
12:49	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1024.eqiad.wmnet with OS buster	[production]
12:22	<jmm@cumin2002>	START - Cookbook sre.hosts.reimage for host ganeti1024.eqiad.wmnet with OS buster	[production]
12:20	<hnowlan@cumin1001>	START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster	[production]
12:18	<hnowlan@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster	[production]
11:51	<hnowlan@cumin1001>	START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster	[production]
11:49	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing	[production]
11:48	<hnowlan@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing	[production]
11:45	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1023.eqiad.wmnet with OS buster	[production]
11:18	<jmm@cumin2002>	START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS buster	[production]
11:01	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org	[production]
11:00	<moritzm>	systemctl reset-failed ifup@ens5.service on archiva1002 T273026	[production]
10:56	<moritzm>	rebooting archiva1002 (running archiva.wikimedia.org)	[production]
10:56	<jmm@cumin2002>	START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org	[production]
10:55	<bking@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch	[production]
10:50	<moritzm>	systemctl reset-failed ifup@ens5.service on an-test-ui1001 T273026	[production]
10:50	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-ui1001.eqiad.wmnet	[production]
10:42	<jmm@cumin2002>	START - Cookbook sre.ganeti.reboot-vm for VM an-test-ui1001.eqiad.wmnet	[production]
10:21	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-presto1001.eqiad.wmnet	[production]
10:17	<jmm@cumin2002>	START - Cookbook sre.ganeti.reboot-vm for VM an-test-presto1001.eqiad.wmnet	[production]
10:07	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet	[production]
10:05	<moritzm>	rebooting matomo1002 (running piwik.wikimedia.org)	[production]
10:04	<jmm@cumin2002>	START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet	[production]
09:59	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-druid1001.eqiad.wmnet	[production]
09:55	<jmm@cumin2002>	START - Cookbook sre.ganeti.reboot-vm for VM an-test-druid1001.eqiad.wmnet	[production]
09:38	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt1001.wikimedia.org	[production]
09:35	<jmm@cumin2002>	START - Cookbook sre.ganeti.reboot-vm for VM apt1001.wikimedia.org	[production]
09:32	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install1003.wikimedia.org	[production]