production SAL

2022-06-16
11:01	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29871 and previous config saved to /var/cache/conftool/dbconfig/20220616-110127-marostegui.json	[production]
11:00	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet	[production]
10:57	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet	[production]
10:54	<klausman@cumin1001>	START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet	[production]
10:53	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet	[production]
10:49	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet	[production]
10:46	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots	[production]
10:46	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots	[production]
10:46	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29870 and previous config saved to /var/cache/conftool/dbconfig/20220616-104622-marostegui.json	[production]
10:45	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet	[production]
10:45	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet	[production]
10:41	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet	[production]
10:37	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet	[production]
10:36	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboots	[production]
10:36	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboots	[production]
10:35	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1089.eqiad.wmnet	[production]
10:34	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet	[production]
10:31	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host elastic1089.eqiad.wmnet	[production]
10:31	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302659)', diff saved to https://phabricator.wikimedia.org/P29869 and previous config saved to /var/cache/conftool/dbconfig/20220616-103117-marostegui.json	[production]
10:28	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483	[production]
10:28	<klausman@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483	[production]
10:21	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483?	[production]
10:21	<klausman@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483?	[production]
10:11	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster	[production]
10:08	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster	[production]
10:02	<elukey>	ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3]	[production]
09:47	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage	[production]
09:44	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage	[production]
09:36	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage	[production]
09:33	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage	[production]
09:32	<elukey@cumin1001>	START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS buster	[production]
09:21	<elukey@cumin1001>	START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS buster	[production]
09:11	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depooling db1170:3317 (T302659)', diff saved to https://phabricator.wikimedia.org/P29868 and previous config saved to /var/cache/conftool/dbconfig/20220616-091131-marostegui.json	[production]
09:11	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance	[production]
09:11	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance	[production]
09:02	<jmm@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6002.drmrs.wmnet	[production]
08:52	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet	[production]
08:45	<moritzm>	failover ganeti master in drmrs/2 to ganeti6004	[production]
07:28	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
07:24	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
07:24	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
07:22	<kartik@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805370|testwiki: Enable SectionTranslation for 11 Wikipedias (T309384 T310116)]] (duration: 03m 41s)	[production]
07:18	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
07:13	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
07:12	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
07:12	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
07:11	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
06:49	<joal>	Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure	[production]
2022-06-15
22:48	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184 (T310011)', diff saved to https://phabricator.wikimedia.org/P29867 and previous config saved to /var/cache/conftool/dbconfig/20220615-224845-marostegui.json	[production]
22:33	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29866 and previous config saved to /var/cache/conftool/dbconfig/20220615-223339-marostegui.json	[production]