production SAL

2023-01-18
11:54	<volans>	upgraded cumin on cumin1001 to 4.2.0-1+deb11u1	[production]
11:47	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis	[production]
11:47	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis	[production]
11:42	<jelto@cumin1001>	START - Cookbook sre.gitlab.upgrade	[production]
11:27	<jelto@cumin1001>	END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)	[production]
11:16	<volans@cumin1001>	END (PASS) - Cookbook sre.network.cf (exit_code=0)	[production]
11:16	<volans@cumin1001>	START - Cookbook sre.network.cf	[production]
11:15	<volans@cumin1001>	END (PASS) - Cookbook sre.network.cf (exit_code=0)	[production]
11:15	<volans@cumin1001>	START - Cookbook sre.network.cf	[production]
11:12	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1050.eqiad.wmnet with OS bullseye	[production]
11:11	<volans@cumin2002>	END (FAIL) - Cookbook sre.network.cf (exit_code=1)	[production]
11:11	<volans@cumin2002>	START - Cookbook sre.network.cf	[production]
11:10	<volans@cumin1001>	END (FAIL) - Cookbook sre.network.cf (exit_code=1)	[production]
11:10	<volans@cumin1001>	START - Cookbook sre.network.cf	[production]
11:10	<volans@cumin1001>	END (FAIL) - Cookbook sre.network.cf (exit_code=1)	[production]
11:10	<volans@cumin1001>	START - Cookbook sre.network.cf	[production]
11:07	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43185 and previous config saved to /var/cache/conftool/dbconfig/20230118-110716-marostegui.json	[production]
10:59	<volans@cumin1001>	END (PASS) - Cookbook sre.network.cf (exit_code=0)	[production]
10:59	<volans@cumin1001>	START - Cookbook sre.network.cf	[production]
10:57	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage	[production]
10:54	<jiji@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage	[production]
10:51	<marostegui@cumin1001>	dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43184 and previous config saved to /var/cache/conftool/dbconfig/20230118-105106-marostegui.json	[production]
10:49	<jelto@cumin1001>	START - Cookbook sre.gitlab.upgrade	[production]
10:48	<jelto@cumin1001>	END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)	[production]
10:43	<jiji@cumin1001>	START - Cookbook sre.hosts.reimage for host mc1050.eqiad.wmnet with OS bullseye	[production]
10:21	<zabe@deploy1002>	Finished scap: Backport for [[gerrit:881361|Start reading from cuc_comment_id from a few wikis (T233004)]] (duration: 09m 17s)	[production]
10:14	<zabe@deploy1002>	zabe and zabe: Backport for [[gerrit:881361|Start reading from cuc_comment_id from a few wikis (T233004)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet	[production]
10:12	<jelto@cumin1001>	START - Cookbook sre.gitlab.upgrade	[production]
10:12	<zabe@deploy1002>	Started scap: Backport for [[gerrit:881361|Start reading from cuc_comment_id from a few wikis (T233004)]]	[production]
09:51	<elukey@deploy1002>	helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.	[production]
09:51	<elukey@deploy1002>	helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.	[production]
09:49	<godog>	start migration from webperf1004 to arclamp1001 - T319434	[production]
09:41	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet	[production]
09:39	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet	[production]
09:35	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet	[production]
09:33	<jelto@cumin1001>	END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)	[production]
09:32	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet	[production]
09:24	<jnuche@deploy1002>	Synchronized php: group1 wikis to 1.40.0-wmf.19 refs T325582 (duration: 08m 20s)	[production]
09:15	<jnuche@deploy1002>	rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.19 refs T325582	[production]
08:54	<jelto@cumin1001>	START - Cookbook sre.gitlab.upgrade	[production]
08:34	<mvernon@cumin1001>	conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet	[production]
08:34	<mvernon@cumin1001>	conftool action : set/pooled=yes; selector: name=ms-fe2010.codfw.wmnet	[production]
08:32	<mvernon@cumin1001>	conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=codfw	[production]
08:32	<mvernon@cumin1001>	conftool action : set/pooled=true; selector: dnsdisc=thanos-swift,name=codfw	[production]
08:32	<mvernon@cumin1001>	conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw	[production]
08:30	<jelto@cumin1001>	END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)	[production]
07:56	<jelto@cumin1001>	START - Cookbook sre.gitlab.upgrade	[production]
02:37	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable	[production]
02:37	<sukhe@cumin2002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable	[production]
02:36	<sukhe@puppetmaster1001>	conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be	[production]