production SAL

2023-03-28
11:22	<hnowlan@deploy2002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
11:21	<hnowlan@deploy2002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]
11:08	<akosiaris@deploy2002>	helmfile [codfw] DONE helmfile.d/services/thumbor: apply	[production]
11:00	<akosiaris@deploy2002>	helmfile [codfw] START helmfile.d/services/thumbor: apply	[production]
10:24	<elukey@deploy2002>	helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.	[production]
10:24	<elukey@deploy2002>	helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.	[production]
10:16	<stevemunene@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage	[production]
10:12	<stevemunene@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage	[production]
09:56	<stevemunene@cumin1001>	START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye	[production]
09:45	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues	[production]
09:45	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues	[production]
09:41	<vgutierrez>	resetting cp2035 management card - T333312	[production]
09:38	<elukey>	dist-upgrade kafka-main1001 to bullseye - T332013	[production]
09:36	<godog>	silence systemdunitfailed alerts for team=wmcs - T333315	[production]
09:35	<vgutierrez>	depool cp2035 - T333312	[production]
09:28	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade	[production]
09:28	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade	[production]
09:12	<jbond@cumin1001>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nicolas Fraison out of all services on: 2048 hosts	[production]
09:11	<jbond@cumin1001>	START - Cookbook sre.idm.logout Logging Nicolas Fraison out of all services on: 2048 hosts	[production]
09:11	<jbond@cumin1001>	END (ERROR) - Cookbook sre.idm.logout (exit_code=97) Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts	[production]
09:11	<jbond@cumin1001>	START - Cookbook sre.idm.logout Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts	[production]
08:58	<vgutierrez>	restart ipmiseld on cp2035	[production]
08:50	<aborrero@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org	[production]
08:49	<ayounsi@deploy1002>	helmfile [eqiad] DONE helmfile.d/admin 'apply'.	[production]
08:48	<AndyRussG>	update payments.wiki config 65bedd4a -> e31ffd7d, payments (automatic updates only) a6c6c2b1 -> f5ec2677	[production]
08:45	<ayounsi@deploy1002>	helmfile [eqiad] START helmfile.d/admin 'apply'.	[production]
08:43	<ayounsi@deploy1002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
08:42	<aborrero@cumin2002>	START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org	[production]
08:39	<ayounsi@deploy1002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]
08:37	<ayounsi@deploy1002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.	[production]
08:35	<ayounsi@deploy1002>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
08:34	<ayounsi@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.	[production]
08:32	<ayounsi@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.	[production]
08:32	<phedenskog@deploy2002>	Finished deploy [performance/navtiming@e757bdf]: (no justification provided) (duration: 00m 06s)	[production]
08:32	<phedenskog@deploy2002>	Started deploy [performance/navtiming@e757bdf]: (no justification provided)	[production]
08:31	<ayounsi@deploy1002>	helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.	[production]
08:29	<ayounsi@deploy1002>	helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.	[production]
08:25	<ayounsi@deploy1002>	helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.	[production]
08:21	<ayounsi@deploy1002>	helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
08:14	<ayounsi@deploy1002>	helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
08:11	<oblivian@deploy2002>	Finished scap: Backport for [[gerrit:903209|Failover statsd to graphite2004 (T330165)]] (duration: 08m 48s)	[production]
08:08	<ayounsi@deploy1002>	helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.	[production]
08:06	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 16 hosts with reason: Switch maintenance	[production]
08:05	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 8:00:00 on 16 hosts with reason: Switch maintenance	[production]
08:05	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 21 hosts with reason: Switch maintenance	[production]
08:05	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 8:00:00 on 21 hosts with reason: Switch maintenance	[production]
08:04	<oblivian@deploy2002>	oblivian and filippo: Backport for [[gerrit:903209|Failover statsd to graphite2004 (T330165)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet	[production]
08:03	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance	[production]
08:03	<ayounsi@deploy1002>	helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.	[production]
08:03	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance	[production]