production SAL

2023-03-01
16:19	<hnowlan@deploy2002>	helmfile [staging] DONE helmfile.d/services/thumbor: apply	[production]
16:19	<hnowlan@deploy2002>	helmfile [staging] START helmfile.d/services/thumbor: apply	[production]
16:17	<taavi@deploy2002>	Started scap: Backport for [[gerrit:891833|Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031)]]	[production]
16:17	<hnowlan@deploy2002>	helmfile [eqiad] START helmfile.d/services/thumbor: apply	[production]
16:17	<hnowlan@deploy2002>	helmfile [codfw] DONE helmfile.d/services/thumbor: apply	[production]
16:17	<hnowlan@deploy2002>	helmfile [codfw] START helmfile.d/services/thumbor: apply	[production]
16:15	<hnowlan@deploy2002>	helmfile [codfw] DONE helmfile.d/services/thumbor: apply	[production]
16:15	<hnowlan@deploy2002>	helmfile [staging] DONE helmfile.d/services/thumbor: sync	[production]
16:12	<hnowlan@deploy2002>	helmfile [codfw] START helmfile.d/services/thumbor: apply	[production]
16:05	<hnowlan@deploy2002>	helmfile [staging] START helmfile.d/services/thumbor: sync	[production]
16:02	<bblack>	cr[23]-esams: manually adding brett's ssh-rsa to match https://gerrit.wikimedia.org/r/c/operations/homer/public/+/892551	[production]
16:01	<jmm@cumin2002>	END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-codfw	[production]
16:00	<dcaro@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudcephosd1005.eqiad.wmnet with OS bullseye	[production]
15:57	<dcaro@cumin1001>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']	[production]
15:57	<dcaro@cumin1001>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']	[production]
15:44	<root@cumin1001>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']	[production]
15:39	<aokoth@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database	[production]
15:39	<aokoth@cumin1001>	START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database	[production]
15:35	<root@cumin1001>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']	[production]
15:32	<jmm@cumin2002>	START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw	[production]
15:28	<root@cumin1001>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']	[production]
15:22	<root@cumin1001>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']	[production]
15:20	<jmm@cumin2002>	END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-canary	[production]
15:18	<jmm@cumin2002>	START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-canary	[production]
15:12	<hnowlan@deploy2002>	helmfile [staging] DONE helmfile.d/services/thumbor: apply	[production]
15:11	<root@cumin1001>	END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1005']	[production]
15:09	<elukey@cumin2002>	START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye	[production]
15:09	<elukey@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1006.eqiad.wmnet with OS bullseye	[production]
15:06	<hashar>	Restarting Apache on Gerrit host	[production]
15:04	<root@cumin1001>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']	[production]
15:02	<hnowlan@deploy2002>	helmfile [staging] START helmfile.d/services/thumbor: apply	[production]
14:57	<jmm@cumin2002>	END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad	[production]
14:52	<dcaro@cumin1001>	END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1005	[production]
14:45	<jmm@cumin2002>	START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad	[production]
14:45	<jmm@cumin2002>	END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw	[production]
14:45	<dcaro@cumin1001>	START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1005	[production]
14:34	<filippo@cumin1001>	conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web	[production]
14:33	<elukey@cumin2002>	START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye	[production]
14:32	<jmm@cumin2002>	START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw	[production]
14:30	<jmm@cumin2002>	END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-canary	[production]
14:30	<elukey@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1008.eqiad.wmnet with OS bullseye	[production]
14:30	<elukey@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1006.eqiad.wmnet with OS bullseye	[production]
14:29	<jmm@cumin2002>	START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-canary	[production]
14:27	<taavi>	re-start persistRevisionThreadItems.php on itwiki from P44912 after DC switchover T315510	[production]
14:27	<claime>	End mediawiki datacenter switchover - T327920	[production]
14:26	<cgoubert@deploy2002>	Finished scap: Backport for [[gerrit:892428|debug.json: List primary DC servers first (T327920)]] (duration: 07m 54s)	[production]
14:20	<cgoubert@deploy2002>	cgoubert: Backport for [[gerrit:892428|debug.json: List primary DC servers first (T327920)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet	[production]
14:18	<cgoubert@deploy2002>	Started scap: Backport for [[gerrit:892428|debug.json: List primary DC servers first (T327920)]]	[production]
14:16	<claime>	Removing scap lock - T327920	[production]
14:15	<cgoubert@cumin1001>	END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)	[production]