production SAL


2023-01-25
16:08	<sukhe@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage	[production]
16:04	<btullis@cumin1001>	START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster	[production]
16:03	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
15:56	<sukhe@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']	[production]
15:56	<sukhe@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']	[production]
15:56	<sukhe@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2031']	[production]
15:53	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
15:50	<robh>	db1139 ilom wins/netbios disabled and ilom reset T327877	[production]
15:48	<sukhe@cumin2002>	START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye	[production]
15:47	<sukhe@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye	[production]
15:46	<sukhe@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']	[production]
15:45	<sukhe@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']	[production]
15:45	<sukhe@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']	[production]
15:44	<sukhe@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031.codfw.wmnet']	[production]
15:44	<sukhe@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031.codfw.wmnet']	[production]
15:43	<robh>	netbios wins disabled on db1140 ilom and ilom reset T327877	[production]
15:43	<sukhe@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye	[production]
15:38	<papaul>	ongoing maintenance on fasw-c-eqiad	[production]
15:33	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet	[production]
15:33	<sukhe@cumin2002>	START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye	[production]
15:33	<sukhe@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye	[production]
15:29	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet	[production]
15:23	<btullis@cumin1001>	END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster	[production]
15:21	<sukhe@cumin2002>	START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye	[production]
15:19	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet	[production]
15:17	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be	[production]
15:17	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=cdn	[production]
15:14	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye	[production]
15:13	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet	[production]
15:13	<btullis@cumin1001>	END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)	[production]
15:13	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-cluster	[production]
15:12	<urbanecm@deploy1002>	Finished scap: triggering i18n refresh for T327824 (duration: 07m 57s)	[production]
15:07	<sukhe@cumin2002>	START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye	[production]
15:04	<urbanecm@deploy1002>	Started scap: triggering i18n refresh for T327824	[production]
15:04	<urbanecm@deploy1002>	Finished scap: Backport for [[gerrit:882615|Enable the Wikibase REST API on Wikidata (T324999)]] (duration: 08m 43s)	[production]
15:02	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be	[production]
15:02	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=cdn	[production]
15:01	<urbanecm>	Overrunning B&C window	[production]
14:57	<urbanecm@deploy1002>	urbanecm and migr: Backport for [[gerrit:882615|Enable the Wikibase REST API on Wikidata (T324999)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet	[production]
14:57	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye	[production]
14:55	<urbanecm@deploy1002>	Started scap: Backport for [[gerrit:882615|Enable the Wikibase REST API on Wikidata (T324999)]]	[production]
14:53	<btullis@cumin1001>	START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster	[production]
14:53	<urbanecm@deploy1002>	Finished scap: Backport for [[gerrit:883224|REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547|User impact: amend incorrect parameter for the single day streak text (T327824)]] (duration: 32m 21s)	[production]
14:53	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage	[production]
14:50	<sukhe@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage	[production]
14:45	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6002.wikimedia.org	[production]
14:39	<urbanecm@deploy1002>	jakob and sgimeno and urbanecm: Backport for [[gerrit:883224|REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547|User impact: amend incorrect parameter for the single day streak text (T327824)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet	[production]
14:32	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage	[production]
14:30	<jmm@cumin2002>	END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6002.wikimedia.org on all recursors	[production]
14:30	<jmm@cumin2002>	START - Cookbook sre.dns.wipe-cache install6002.wikimedia.org on all recursors	[production]