production SAL

2021-08-13
15:30	<mutante>	mw1453 - racadm serveraction powercycle (down and was working until right before the switch issue)	[production]
15:18	<godog>	restart pybal on lvs2009, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled	[production]
15:14	<godog>	restart pybal on lvs2010, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled	[production]
15:02	<mutante>	etherpad1002 - started failed ferm	[production]
15:00	<mutante>	an-worker1117, an-worker1118 - started failed ferm (why are these slowly trickling in?)	[production]
14:57	<jelto@cumin1001>	conftool action : set/pooled=no; selector: name=mw1450.eqiad.wmnet	[production]
14:57	<jelto@cumin1001>	conftool action : set/pooled=no; selector: name=mw144[7-9].eqiad.wmnet	[production]
14:54	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup	[production]
14:54	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup	[production]
14:50	<mutante>	an-worker1079 - started failed ferm	[production]
14:47	<jelto@cumin1001>	conftool action : set/weight=25; selector: name=mw1450.eqiad.wmnet	[production]
14:46	<jelto@cumin1001>	conftool action : set/weight=25; selector: name=mw144[7-9].eqiad.wmnet	[production]
14:45	<mutante>	an-worker1095 - started ferm, service failed	[production]
14:44	<mutante>	an-worker1082 - started ferm (was failed due to DNS hiccup)	[production]
14:44	<jelto@cumin1001>	conftool action : set/pooled=inactive; selector: name=mw1450.eqiad.wmnet	[production]
14:43	<jelto@cumin1001>	conftool action : set/pooled=inactive; selector: name=mw144[7-9].eqiad.wmnet	[production]
14:41	<mutante>	mw1419 - started ferm	[production]
13:35	<sukhe>	ran homer for Gerrit 712400: Set up BGP peering to doh4002 in ulsfo	[production]
13:23	<mutante>	mw1453 - manual powercycle after it never rebooted when the reimage cookbook tried to trigger one	[production]
13:22	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
13:21	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
13:21	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
13:21	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309	[production]
12:54	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE	[production]
12:53	<godog>	set runtime envoy.reloadable_features.strict_1xx_and_204_response_headers=false on thanos-fe* - T288815	[production]
12:53	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup	[production]
12:53	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup	[production]
12:52	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE	[production]
12:33	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE	[production]
12:31	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE	[production]
12:30	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE	[production]
12:29	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup	[production]
12:29	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup	[production]
12:29	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE	[production]
12:28	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE	[production]
12:26	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE	[production]
12:26	<jelto@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE	[production]
12:24	<urbanecm>	mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=commonswiki --jobqueue # T288683	[production]
12:24	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE	[production]
12:24	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE	[production]
12:22	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE	[production]
12:21	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw1444.eqiad.wmnet	[production]
12:21	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE	[production]
12:21	<mutante>	mw1444 - scap pull, pooled as new API server for the first time	[production]
12:20	<dzahn@cumin1001>	conftool action : set/weight=30; selector: name=mw1444.eqiad.wmnet	[production]
12:19	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE	[production]
11:59	<urbanecm>	mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=mediawikiwiki --jobqueue # T288683	[production]
11:36	<topranks>	cloudsw1-d5-eqiad - configuring new 2x40G trunk to cloudsw2-d5-eqiad with homer (T277340)	[production]
11:11	<jelto>	mw1455 - powering on via mgmt - OS install, initial setup (T279309, T273915)	[production]
10:22	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup	[production]