production SAL

2023-06-27
08:32	<kartik@deploy1002>	helmfile [eqiad] START helmfile.d/services/cxserver: apply	[production]
08:32	<root@cumin2002>	START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 767 hosts	[production]
08:31	<root@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 1265 hosts	[production]
08:30	<root@cumin2002>	START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 1265 hosts	[production]
08:29	<marostegui>	Failover m2-master to dbproxy1022 T337812	[production]
08:28	<kartik@deploy1002>	helmfile [codfw] DONE helmfile.d/services/cxserver: apply	[production]
08:28	<kartik@deploy1002>	helmfile [codfw] START helmfile.d/services/cxserver: apply	[production]
08:25	<kartik@deploy1002>	helmfile [staging] DONE helmfile.d/services/cxserver: apply	[production]
08:24	<kartik@deploy1002>	helmfile [staging] START helmfile.d/services/cxserver: apply	[production]
08:14	<kartik@deploy1002>	Finished scap: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] (duration: 16m 17s)	[production]
08:03	<moritzm>	installing openjdk-8 security updates for bullseye	[production]
08:02	<kartik@deploy1002>	kartik: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet	[production]
07:58	<kartik@deploy1002>	Started scap: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]]	[production]
07:54	<moritzm>	uploaded openjdk-8 8u372-ga-1~deb11u1 to component/jdk8 for bullseye (forward port of Java 8 for Buster)	[production]
07:48	<hashar>	Restart Zuul due to stuck connection | T340518 | T309376	[production]
07:15	<elukey>	`sudo kill `pgrep -u paramd`` on stat1005 to unblock puppet	[production]
06:22	<marostegui>	Failover m1-master to dbproxy1022 T337812	[production]
2023-06-26
23:21	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery	[production]
23:21	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery	[production]
23:07	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
23:02	<sbassett>	Deployed updated mitigation for T336027	[production]
23:01	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)	[production]
22:55	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:51	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
22:46	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
22:33	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:31	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:24	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:18	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
22:17	<ryankemper@cumin1001>	END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)	[production]
22:17	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
22:17	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
22:16	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)	[production]
22:05	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
21:58	<eevans@cumin2002>	END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance	[production]
21:57	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
21:55	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
21:54	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)	[production]
21:53	<eevans@cumin2002>	START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance	[production]
21:53	<urandom>	pooling sessionstore/codfw for bullseye upgrades — T340043	[production]
21:45	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
21:44	<eevans@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS bullseye	[production]
21:43	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)	[production]
21:39	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
21:36	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)	[production]
21:26	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
21:22	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
21:22	<eevans@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage	[production]
21:21	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
21:18	<eevans@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage	[production]