production SAL

2023-03-28
12:56	<elukey@deploy2002>	helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.	[production]
12:56	<elukey@deploy2002>	helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.	[production]
12:44	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108	[production]
12:44	<ayounsi@cumin1001>	START - Cookbook sre.network.debug for Netbox circuit ID 108	[production]
12:43	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108	[production]
12:43	<ayounsi@cumin1001>	START - Cookbook sre.network.debug for Netbox circuit ID 108	[production]
12:38	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108	[production]
12:38	<ayounsi@cumin1001>	START - Cookbook sre.network.debug for Netbox circuit ID 108	[production]
12:36	<eoghan@cumin1001>	END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict1002.eqiad.wmnet with OS bullseye	[production]
12:34	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112	[production]
12:34	<ayounsi@cumin1001>	START - Cookbook sre.network.debug for Netbox circuit ID 112	[production]
12:24	<eoghan@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage	[production]
12:21	<eoghan@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage	[production]
12:20	<elukey@deploy2002>	helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.	[production]
12:20	<elukey@deploy2002>	helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.	[production]
12:16	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 45295	[production]
12:15	<ayounsi@cumin1001>	START - Cookbook sre.network.peering with action 'configure' for AS: 45295	[production]
12:09	<eoghan@cumin1001>	START - Cookbook sre.ganeti.reimage for host aphlict1002.eqiad.wmnet with OS bullseye	[production]
11:57	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade	[production]
11:57	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade	[production]
11:56	<elukey>	dist-upgrade kafka-main1002 to debian bullseye - T332013	[production]
11:51	<ladsgroup@deploy2002>	Finished scap: Backport for [[gerrit:903549|api: Mark query as read-only to avoid regex on SQL (T332942)]] (duration: 18m 42s)	[production]
11:47	<hnowlan@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/thumbor: apply	[production]
11:37	<hnowlan@deploy2002>	helmfile [eqiad] START helmfile.d/services/thumbor: apply	[production]
11:34	<hnowlan@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/thumbor: apply	[production]
11:34	<ladsgroup@deploy2002>	ladsgroup: Backport for [[gerrit:903549|api: Mark query as read-only to avoid regex on SQL (T332942)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet	[production]
11:32	<ladsgroup@deploy2002>	Started scap: Backport for [[gerrit:903549|api: Mark query as read-only to avoid regex on SQL (T332942)]]	[production]
11:24	<hnowlan@deploy2002>	helmfile [eqiad] START helmfile.d/services/thumbor: apply	[production]
11:23	<hnowlan@deploy2002>	helmfile [eqiad] DONE helmfile.d/admin 'apply'.	[production]
11:22	<hnowlan@deploy2002>	helmfile [eqiad] START helmfile.d/admin 'apply'.	[production]
11:22	<hnowlan@deploy2002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
11:21	<hnowlan@deploy2002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]
11:08	<akosiaris@deploy2002>	helmfile [codfw] DONE helmfile.d/services/thumbor: apply	[production]
11:00	<akosiaris@deploy2002>	helmfile [codfw] START helmfile.d/services/thumbor: apply	[production]
10:24	<elukey@deploy2002>	helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.	[production]
10:24	<elukey@deploy2002>	helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.	[production]
10:16	<stevemunene@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage	[production]
10:12	<stevemunene@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage	[production]
09:56	<stevemunene@cumin1001>	START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye	[production]
09:45	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues	[production]
09:45	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues	[production]
09:41	<vgutierrez>	resetting cp2035 management card - T333312	[production]
09:38	<elukey>	dist-upgrade kafka-main1001 to bullseye - T332013	[production]
09:36	<godog>	silence systemdunitfailed alerts for team=wmcs - T333315	[production]
09:35	<vgutierrez>	depool cp2035 - T333312	[production]
09:28	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade	[production]
09:28	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade	[production]
09:12	<jbond@cumin1001>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nicolas Fraison out of all services on: 2048 hosts	[production]
09:11	<jbond@cumin1001>	START - Cookbook sre.idm.logout Logging Nicolas Fraison out of all services on: 2048 hosts	[production]
09:11	<jbond@cumin1001>	END (ERROR) - Cookbook sre.idm.logout (exit_code=97) Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts	[production]