production SAL

2022-12-20
10:29	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet	[production]
10:29	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet	[production]
10:25	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet	[production]
10:24	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet	[production]
10:19	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet	[production]
10:16	<moritzm>	rebalance ganeti cluster in ulsfo after adding new node and decom of the old hardware T317247	[production]
10:06	<oblivian@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mw-web: apply	[production]
10:06	<oblivian@deploy1002>	helmfile [eqiad] START helmfile.d/services/mw-web: apply	[production]
10:05	<oblivian@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mw-web: apply	[production]
10:05	<oblivian@deploy1002>	helmfile [codfw] START helmfile.d/services/mw-web: apply	[production]
09:48	<jmm@cumin2002>	END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1	[production]
09:47	<jmm@cumin2002>	START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1	[production]
08:45	<jmm@cumin2002>	END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1	[production]
08:45	<jmm@cumin2002>	START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1	[production]
08:40	<jmm@cumin2002>	END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1	[production]
08:40	<jmm@cumin2002>	START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1	[production]
08:38	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet	[production]
08:32	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet	[production]
04:10	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)	[production]
03:56	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
02:02	<bking@cumin1001>	END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)	[production]
01:50	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)	[production]
00:40	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
00:38	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
00:27	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)	[production]
2022-12-19
23:50	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
23:32	<ryankemper@puppetmaster1001>	conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2009.*	[production]
23:32	<ryankemper@puppetmaster1001>	conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2010.*	[production]
23:32	<ryankemper@puppetmaster1001>	conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2011.*	[production]
23:32	<ryankemper>	[WDQS] Temporarily removing wdqs20[09-12] from pybal; these are new hosts that aren't ready for service until the data reload has completed (a long-running process). In the meantime, remove them so they don't factor into pybal's depool threshold	[production]
23:30	<ryankemper@puppetmaster1001>	conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2012.*	[production]
23:30	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)	[production]
23:07	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)	[production]
23:05	<cwhite@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2003.codfw.wmnet with OS bullseye	[production]
23:01	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)	[production]
23:01	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)	[production]
22:59	<ryankemper>	[WDQS] Continuing with reboot of WDQS hosts. Doing 1 host each of `[eqiad, codfw]` X `[internal, public]`, so 4 total hosts at once	[production]
22:58	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
22:58	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
22:58	<bking@cumin1001>	END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)	[production]
22:58	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
22:58	<bking@cumin1001>	END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)	[production]
22:58	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
22:58	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
22:57	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
22:56	<ryankemper>	[WDQS] Pooled `wdqs2005`	[production]
22:43	<ryankemper>	[WDQS] Pooled `wdqs2007` (it was depooled; we may have forgotten to re-pool it in the last week or so)	[production]
22:38	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)	[production]
22:34	<bking@cumin1001>	START - Cookbook sre.wdqs.reboot	[production]
22:24	<bking@cumin1001>	END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)	[production]