2021-08-13
15:30 <mutante> mw1453 - racadm serveraction powercycle (down and was working until right before the switch issue) [production]
15:18 <godog> restart pybal on lvs2009, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled [production]
15:14 <godog> restart pybal on lvs2010, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled [production]
15:02 <mutante> etherpad1002 - started failed ferm [production]
15:00 <mutante> an-worker1117, an-worker1118 - started failed ferm (why are these slowly trickling in?) [production]
14:57 <jelto@cumin1001> conftool action : set/pooled=no; selector: name=mw1450.eqiad.wmnet [production]
14:57 <jelto@cumin1001> conftool action : set/pooled=no; selector: name=mw144[7-9].eqiad.wmnet [production]
14:54 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup [production]
14:54 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup [production]
14:50 <mutante> an-worker1079 - started failed ferm [production]
14:47 <jelto@cumin1001> conftool action : set/weight=25; selector: name=mw1450.eqiad.wmnet [production]
14:46 <jelto@cumin1001> conftool action : set/weight=25; selector: name=mw144[7-9].eqiad.wmnet [production]
14:45 <mutante> an-worker1095 - started ferm, service failed [production]
14:44 <mutante> an-worker1082 - started ferm (was failed due to DNS hiccup) [production]
14:44 <jelto@cumin1001> conftool action : set/pooled=inactive; selector: name=mw1450.eqiad.wmnet [production]
14:43 <jelto@cumin1001> conftool action : set/pooled=inactive; selector: name=mw144[7-9].eqiad.wmnet [production]
14:41 <mutante> mw1419 - started ferm [production]
13:35 <sukhe> ran homer for Gerrit 712400: Set up BGP peering to doh4002 in ulsfo [production]
13:23 <mutante> mw1453 - manual powercycle after it never rebooted when the reimage cookbook tried to trigger one [production]
13:22 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
13:21 <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
13:21 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
13:21 <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
12:54 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE [production]
12:53 <godog> set runtime envoy.reloadable_features.strict_1xx_and_204_response_headers=false on thanos-fe* - T288815 [production]
12:53 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup [production]
12:53 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup [production]
12:52 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE [production]
12:33 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE [production]
12:31 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE [production]
12:30 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE [production]
12:29 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup [production]
12:29 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup [production]
12:29 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE [production]
12:28 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE [production]
12:26 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE [production]
12:26 <jelto@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE [production]
12:24 <urbanecm> mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=commonswiki --jobqueue # T288683 [production]
12:24 <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE [production]
12:24 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE [production]
12:22 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE [production]
12:21 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1444.eqiad.wmnet [production]
12:21 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE [production]
12:21 <mutante> mw1444 - scap pull, pooled as new API server for the first time [production]
12:20 <dzahn@cumin1001> conftool action : set/weight=30; selector: name=mw1444.eqiad.wmnet [production]
12:19 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE [production]
11:59 <urbanecm> mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=mediawikiwiki --jobqueue # T288683 [production]
11:36 <topranks> cloudsw1-d5-eqiad - configuring new 2x40G trunk to cloudsw2-d5-eqiad with homer (T277340) [production]
11:11 <jelto> mw1455 - powering on via mgmt - OS install, initial setup (T279309, T273915) [production]
10:22 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup [production]