production SAL

2022-06-16
11:53	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1007.eqiad.wmnet	[production]
11:45	<klausman@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ml-serve1007.eqiad.wmnet	[production]
11:44	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet	[production]
11:38	<klausman@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet	[production]
11:35	<godog>	trim swift logs older than 25d from centrallog hosts - T309171	[production]
11:34	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots	[production]
11:34	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots	[production]
11:33	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1005.eqiad.wmnet	[production]
11:27	<klausman@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ml-serve1005.eqiad.wmnet	[production]
11:25	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet	[production]
11:22	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet	[production]
11:20	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet	[production]
11:19	<klausman@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet	[production]
11:17	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet	[production]
11:16	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance	[production]
11:16	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance	[production]
11:16	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302659)', diff saved to https://phabricator.wikimedia.org/P29873 and previous config saved to /var/cache/conftool/dbconfig/20220616-111632-marostegui.json	[production]
11:16	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet	[production]
11:12	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet	[production]
11:09	<klausman@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet	[production]
11:07	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet	[production]
11:02	<klausman@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet	[production]
11:01	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29871 and previous config saved to /var/cache/conftool/dbconfig/20220616-110127-marostegui.json	[production]
11:00	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet	[production]
10:57	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet	[production]
10:54	<klausman@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet	[production]
10:53	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet	[production]
10:49	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet	[production]
10:46	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots	[production]
10:46	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots	[production]
10:46	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29870 and previous config saved to /var/cache/conftool/dbconfig/20220616-104622-marostegui.json	[production]
10:45	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet	[production]
10:45	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet	[production]
10:41	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet	[production]
10:37	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet	[production]
10:36	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboots	[production]
10:36	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboots	[production]
10:35	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1089.eqiad.wmnet	[production]
10:34	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet	[production]
10:31	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host elastic1089.eqiad.wmnet	[production]
10:31	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302659)', diff saved to https://phabricator.wikimedia.org/P29869 and previous config saved to /var/cache/conftool/dbconfig/20220616-103117-marostegui.json	[production]
10:28	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483	[production]
10:28	<klausman@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483	[production]
10:21	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483?	[production]
10:21	<klausman@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483?	[production]
10:11	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster	[production]
10:08	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster	[production]
10:02	<elukey>	ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3]	[production]
09:47	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage	[production]
09:44	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage	[production]