production SAL

2021-08-16
05:09	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17025 and previous config saved to /var/cache/conftool/dbconfig/20210816-050934-root.json	[production]
05:09	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17024 and previous config saved to /var/cache/conftool/dbconfig/20210816-050916-root.json	[production]
04:54	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17023 and previous config saved to /var/cache/conftool/dbconfig/20210816-045430-root.json	[production]
04:54	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17022 and previous config saved to /var/cache/conftool/dbconfig/20210816-045413-root.json	[production]
04:49	<marostegui>	Upgrade db2088 (s1 and s2) to 10.4.21	[production]
04:49	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db2088 (s1 and s2) to upgrade', diff saved to https://phabricator.wikimedia.org/P17021 and previous config saved to /var/cache/conftool/dbconfig/20210816-044906-marostegui.json	[production]
2021-08-15
20:02	<addshore>	restarting blazegraph on wdqs2004	[production]
16:13	<andrew@deploy1002>	Finished deploy [horizon/deploy@c23a155]: adding cinder volume resize warning (duration: 03m 52s)	[production]
16:10	<andrew@deploy1002>	Started deploy [horizon/deploy@c23a155]: adding cinder volume resize warning	[production]
2021-08-14
03:54	<legoktm[m]>	restarting mailman3 on lists1001, bounce runner crashed (T288880)	[production]
2021-08-13
18:43	<bblack>	reprepro: uploaded gdnsd-3.8.0-1~wmf1 to buster-wikimedia - T252132	[production]
17:32	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
17:32	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
17:06	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
17:05	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
15:39	<mutante>	mw1451, mw1452, mw1454 - rebooting after reimage, memcached needs one	[production]
15:30	<mutante>	mw1453 - racadm serveraction powercycle (down and was working until right before the switch issue)	[production]
15:18	<godog>	restart pybal on lvs2009, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled	[production]
15:14	<godog>	restart pybal on lvs2010, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled	[production]
15:02	<mutante>	etherpad1002 - started failed ferm	[production]
15:00	<mutante>	an-worker1117, an-worker1118 - started failed ferm (why are these slowly trickling in?)	[production]
14:57	<jelto@cumin1001>	conftool action : set/pooled=no; selector: name=mw1450.eqiad.wmnet	[production]
14:57	<jelto@cumin1001>	conftool action : set/pooled=no; selector: name=mw144[7-9].eqiad.wmnet	[production]
14:54	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup	[production]
14:54	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup	[production]
14:50	<mutante>	an-worker1079 - started failed ferm	[production]
14:47	<jelto@cumin1001>	conftool action : set/weight=25; selector: name=mw1450.eqiad.wmnet	[production]
14:46	<jelto@cumin1001>	conftool action : set/weight=25; selector: name=mw144[7-9].eqiad.wmnet	[production]
14:45	<mutante>	an-worker1095 - started ferm, service failed	[production]
14:44	<mutante>	an-worker1082 - started ferm (was failed due to DNS hiccup)	[production]
14:44	<jelto@cumin1001>	conftool action : set/pooled=inactive; selector: name=mw1450.eqiad.wmnet	[production]
14:43	<jelto@cumin1001>	conftool action : set/pooled=inactive; selector: name=mw144[7-9].eqiad.wmnet	[production]
14:41	<mutante>	mw1419 - started ferm	[production]
13:35	<sukhe>	ran homer for Gerrit 712400: Set up BGP peering to doh4002 in ulsfo	[production]
13:23	<mutante>	mw1453 - manual powercycle after it never rebooted when the reimage cookbook tried to trigger one	[production]
13:22	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
13:21	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
13:21	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
13:21	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
12:54	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE	[production]
12:53	<godog>	set runtime envoy.reloadable_features.strict_1xx_and_204_response_headers=false on thanos-fe* - T288815	[production]
12:53	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup	[production]
12:53	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup	[production]
12:52	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE	[production]
12:33	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE	[production]
12:31	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE	[production]
12:30	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE	[production]
12:29	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup	[production]
12:29	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup	[production]
12:29	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE	[production]