production SAL

2022-06-16
10:45	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet	[production]
10:41	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet	[production]
10:37	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet	[production]
10:36	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboots	[production]
10:36	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboots	[production]
10:35	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1089.eqiad.wmnet	[production]
10:34	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet	[production]
10:31	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host elastic1089.eqiad.wmnet	[production]
10:31	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302659)', diff saved to https://phabricator.wikimedia.org/P29869 and previous config saved to /var/cache/conftool/dbconfig/20220616-103117-marostegui.json	[production]
10:28	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483	[production]
10:28	<klausman@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483	[production]
10:21	<klausman@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483?	[production]
10:21	<klausman@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for T310483?	[production]
10:11	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster	[production]
10:08	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster	[production]
10:02	<elukey>	ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3]	[production]
09:47	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage	[production]
09:44	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage	[production]
09:36	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage	[production]
09:33	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage	[production]
09:32	<elukey@cumin1001>	START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS buster	[production]
09:21	<elukey@cumin1001>	START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS buster	[production]
09:11	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depooling db1170:3317 (T302659)', diff saved to https://phabricator.wikimedia.org/P29868 and previous config saved to /var/cache/conftool/dbconfig/20220616-091131-marostegui.json	[production]
09:11	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance	[production]
09:11	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance	[production]
09:02	<jmm@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6002.drmrs.wmnet	[production]
08:52	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet	[production]
08:45	<moritzm>	failover ganeti master in drmrs/2 to ganeti6004	[production]
07:28	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
07:24	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
07:24	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
07:22	<kartik@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805370|testwiki: Enable SectionTranslation for 11 Wikipedias (T309384 T310116)]] (duration: 03m 41s)	[production]
07:18	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
07:13	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
07:12	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
07:12	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
07:11	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
06:49	<joal>	Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure	[production]
2022-06-15
22:48	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184 (T310011)', diff saved to https://phabricator.wikimedia.org/P29867 and previous config saved to /var/cache/conftool/dbconfig/20220615-224845-marostegui.json	[production]
22:33	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29866 and previous config saved to /var/cache/conftool/dbconfig/20220615-223339-marostegui.json	[production]
22:31	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS buster	[production]
22:18	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29865 and previous config saved to /var/cache/conftool/dbconfig/20220615-221834-marostegui.json	[production]
22:17	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS buster	[production]
22:17	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage	[production]
22:17	<cmjohnson@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster	[production]
22:16	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster	[production]
22:14	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage	[production]
22:12	<cmjohnson@cumin1001>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster	[production]
22:05	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage	[production]
22:03	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184 (T310011)', diff saved to https://phabricator.wikimedia.org/P29864 and previous config saved to /var/cache/conftool/dbconfig/20220615-220329-marostegui.json	[production]