production SAL

2023-10-03
10:32	<filippo@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: host reimage	[production]
10:32	<vgutierrez@cumin1001>	START - Cookbook sre.dns.netbox	[production]
10:30	<vgutierrez@cumin1001>	END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)	[production]
10:19	<vgutierrez@cumin1001>	START - Cookbook sre.dns.netbox	[production]
10:15	<filippo@cumin1001>	START - Cookbook sre.hosts.reimage for host thanos-fe1003.eqiad.wmnet with OS bullseye	[production]
09:50	<claime>	Uncordoned kubernetes2010.codfw.wmnet	[production]
09:50	<cgoubert@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2010.codfw.wmnet	[production]
09:49	<cgoubert@cumin1001>	START - Cookbook sre.hosts.remove-downtime for kubernetes2010.codfw.wmnet	[production]
09:45	<filippo@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1002.eqiad.wmnet with OS bullseye	[production]
09:42	<dcausse@deploy2002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply	[production]
09:42	<dcausse@deploy2002>	helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply	[production]
09:38	<ladsgroup@deploy2002>	Finished scap: Creating fonwiki (T347935) (duration: 07m 34s)	[production]
09:30	<ladsgroup@deploy2002>	Started scap: Creating fonwiki (T347935)	[production]
09:28	<cgoubert@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: BIOS setting change	[production]
09:28	<cgoubert@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: BIOS setting change	[production]
09:27	<filippo@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: host reimage	[production]
09:26	<claime>	Draining kubernetes2010.codfw.wmnet for reboot to change BIOS setting	[production]
09:24	<filippo@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: host reimage	[production]
09:07	<filippo@cumin1001>	START - Cookbook sre.hosts.reimage for host thanos-fe1002.eqiad.wmnet with OS bullseye	[production]
09:06	<isaranto@deploy2002>	helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .	[production]
09:06	<isaranto@deploy2002>	helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .	[production]
09:05	<isaranto@deploy2002>	helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .	[production]
08:27	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting	[production]
08:27	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting	[production]
08:26	<filippo@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1001.eqiad.wmnet with OS bullseye	[production]
08:17	<cmooney@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
08:15	<cmooney@cumin1001>	START - Cookbook sre.dns.netbox	[production]
08:14	<cmooney@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
08:13	<cmooney@cumin1001>	START - Cookbook sre.dns.netbox	[production]
08:12	<cmooney@cumin1001>	END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)	[production]
08:09	<cmooney@cumin1001>	START - Cookbook sre.dns.netbox	[production]
08:03	<filippo@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: host reimage	[production]
08:01	<taavi>	taavi@mwmaint2002 ~ $ mwscript resetAuthenticationThrottle.php --wiki=enwiki --signup --ip=155.232.7.202 # T347874	[production]
07:59	<filippo@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: host reimage	[production]
07:56	<taavi@deploy2002>	Finished scap: T347874 and T347069 (duration: 29m 22s)	[production]
07:42	<taavi@deploy2002>	taavi: Continuing with sync	[production]
07:42	<filippo@cumin1001>	START - Cookbook sre.hosts.reimage for host thanos-fe1001.eqiad.wmnet with OS bullseye	[production]
07:40	<taavi@deploy2002>	taavi: T347874 and T347069 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
07:27	<taavi@deploy2002>	Started scap: T347874 and T347069	[production]
07:03	<kart_>	Updated MinT to 2023-09-28-043052-production (T343450, T341478)	[production]
07:03	<kartik@deploy2002>	helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply	[production]
06:59	<kartik@deploy2002>	helmfile [codfw] START helmfile.d/services/machinetranslation: apply	[production]
06:56	<kartik@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply	[production]
06:51	<kartik@deploy2002>	helmfile [eqiad] START helmfile.d/services/machinetranslation: apply	[production]
06:45	<kartik@deploy2002>	helmfile [staging] DONE helmfile.d/services/machinetranslation: apply	[production]
06:42	<kartik@deploy2002>	helmfile [staging] START helmfile.d/services/machinetranslation: apply	[production]
06:42	<kartik@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply	[production]
06:42	<kartik@deploy2002>	helmfile [eqiad] START helmfile.d/services/machinetranslation: apply	[production]
05:52	<stevemunene@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on druid1009.eqiad.wmnet with reason: Downtime as we setup the host to join the druid and zookeper cluster	[production]
05:52	<stevemunene@cumin1001>	START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on druid1009.eqiad.wmnet with reason: Downtime as we setup the host to join the druid and zookeper cluster	[production]