production SAL

2023-01-25
18:37	<brett@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage	[production]
18:35	<hnowlan@deploy1002>	helmfile [codfw] DONE helmfile.d/services/thumbor: apply	[production]
18:34	<brett@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage	[production]
18:33	<hnowlan@deploy1002>	helmfile [codfw] START helmfile.d/services/thumbor: apply	[production]
18:33	<hnowlan@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/thumbor: apply	[production]
18:32	<hnowlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/thumbor: apply	[production]
18:14	<brett@cumin1001>	START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS bullseye	[production]
18:11	<hnowlan@deploy1002>	helmfile [staging] DONE helmfile.d/services/thumbor: apply	[production]
18:11	<hnowlan@deploy1002>	helmfile [staging] START helmfile.d/services/thumbor: apply	[production]
18:11	<hnowlan@deploy1002>	helmfile [staging] DONE helmfile.d/services/thumbor: apply	[production]
18:10	<hnowlan@deploy1002>	helmfile [staging] START helmfile.d/services/thumbor: apply	[production]
18:05	<brett@cumin1001>	conftool action : set/pooled=yes; selector: name=cp6010.drmrs.wmnet	[production]
17:58	<brett@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS bullseye	[production]
17:32	<mutante>	removing racktables.wikimedia.org from DNS - that's it for this ancient service T327405	[production]
16:57	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be	[production]
16:57	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn	[production]
16:51	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS bullseye	[production]
16:50	<btullis@cumin1001>	START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes	[production]
16:46	<brett@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage	[production]
16:43	<brett@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage	[production]
16:34	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=ats-be	[production]
16:34	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=cdn	[production]
16:33	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye	[production]
16:32	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage	[production]
16:28	<sukhe@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage	[production]
16:24	<brett@cumin1001>	START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS bullseye	[production]
16:14	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet	[production]
16:11	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage	[production]
16:09	<sukhe@cumin2002>	START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye	[production]
16:08	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet	[production]
16:08	<sukhe@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage	[production]
16:04	<btullis@cumin1001>	START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster	[production]
16:03	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
15:56	<sukhe@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']	[production]
15:56	<sukhe@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']	[production]
15:56	<sukhe@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2031']	[production]
15:53	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
15:50	<robh>	db1139 ilom wins/netbios disabled and ilom reset T327877	[production]
15:48	<sukhe@cumin2002>	START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye	[production]
15:47	<sukhe@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye	[production]
15:46	<sukhe@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']	[production]
15:45	<sukhe@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']	[production]
15:45	<sukhe@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']	[production]
15:44	<sukhe@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031.codfw.wmnet']	[production]
15:44	<sukhe@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031.codfw.wmnet']	[production]
15:43	<robh>	netbios wins disabled on db1140 ilom and ilom reset T327877	[production]
15:43	<sukhe@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye	[production]
15:38	<papaul>	on going maintenance on fasw-c-eqiad	[production]
15:33	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet	[production]
15:33	<sukhe@cumin2002>	START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye	[production]