production SAL

2023-06-27 §
07:48	<hashar>	Restart Zuul due to stuck connection | T340518 | T309376	[production]
07:15	<elukey>	`sudo kill `pgrep -u paramd`` on stat1005 to unblock puppet	[production]
06:22	<marostegui>	Failover m1-master to dbproxy1022 T337812	[production]
2023-06-26 §
23:21	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery	[production]
23:21	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery	[production]
23:07	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
23:02	<sbassett>	Deployed updated mitigation for T336027	[production]
23:01	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)	[production]
22:55	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:51	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
22:46	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
22:33	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:31	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:24	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:18	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
22:17	<ryankemper@cumin1001>	END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)	[production]
22:17	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
22:17	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
22:16	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)	[production]
22:05	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
21:58	<eevans@cumin2002>	END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance	[production]
21:57	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
21:55	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
21:54	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)	[production]
21:53	<eevans@cumin2002>	START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance	[production]
21:53	<urandom>	pooling sessionstore/codfw for bullseye upgrades — T340043	[production]
21:45	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
21:44	<eevans@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS bullseye	[production]
21:43	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)	[production]
21:39	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
21:36	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)	[production]
21:26	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
21:22	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
21:22	<eevans@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage	[production]
21:21	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
21:18	<eevans@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage	[production]
21:15	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
21:13	<ryankemper@puppetmaster1001>	conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.*	[production]
21:13	<ryankemper@puppetmaster1001>	conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*	[production]
21:13	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
21:02	<eevans@cumin2002>	START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS bullseye	[production]
20:55	<eevans@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2003.codfw.wmnet with OS bullseye	[production]
20:45	<eevans@cumin2002>	START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS bullseye	[production]
20:42	<eevans@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2001.codfw.wmnet with OS bullseye	[production]
20:34	<brennen@deploy1002>	Finished deploy [phabricator/deployment@0529926]: deploy latest state to phab1004 (duration: 00m 31s)	[production]
20:33	<brennen@deploy1002>	Started deploy [phabricator/deployment@0529926]: deploy latest state to phab1004	[production]
20:30	<brennen@deploy1002>	Finished deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004 (duration: 00m 34s)	[production]
20:30	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: patch application	[production]
20:30	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 0:15:00 on phab2002.codfw.wmnet with reason: patch application	[production]
20:30	<brennen@deploy1002>	Started deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004	[production]