2021-08-16
05:09 <marostegui@cumin1001> dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17025 and previous config saved to /var/cache/conftool/dbconfig/20210816-050934-root.json [production]
05:09 <marostegui@cumin1001> dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17024 and previous config saved to /var/cache/conftool/dbconfig/20210816-050916-root.json [production]
04:54 <marostegui@cumin1001> dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17023 and previous config saved to /var/cache/conftool/dbconfig/20210816-045430-root.json [production]
04:54 <marostegui@cumin1001> dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17022 and previous config saved to /var/cache/conftool/dbconfig/20210816-045413-root.json [production]
04:49 <marostegui> Upgrade db2088 (s1 and s2) to 10.4.21 [production]
04:49 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2088 (s1 and s2) to upgrade', diff saved to https://phabricator.wikimedia.org/P17021 and previous config saved to /var/cache/conftool/dbconfig/20210816-044906-marostegui.json [production]
2021-08-15
20:02 <addshore> restarting blazegraph on wdqs2004 [production]
16:13 <andrew@deploy1002> Finished deploy [horizon/deploy@c23a155]: adding cinder volume resize warning (duration: 03m 52s) [production]
16:10 <andrew@deploy1002> Started deploy [horizon/deploy@c23a155]: adding cinder volume resize warning [production]
2021-08-14
03:54 <legoktm[m]> restarting mailman3 on lists1001, bounce runner crashed (T288880) [production]
2021-08-13
18:43 <bblack> reprepro: uploaded gdnsd-3.8.0-1~wmf1 to buster-wikimedia - T252132 [production]
17:32 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
17:32 <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
17:06 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
17:05 <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
15:39 <mutante> mw1451, mw1452, mw1454 - rebooting after reimage, memcached needs one [production]
15:30 <mutante> mw1453 - racadm serveraction powercycle (down and was working until right before the switch issue) [production]
15:18 <godog> restart pybal on lvs2009, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled [production]
15:14 <godog> restart pybal on lvs2010, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled [production]
15:02 <mutante> etherpad1002 - started failed ferm [production]
15:00 <mutante> an-worker1117, an-worker1118 - started failed ferm (why are these slowly trickling in) [production]
14:57 <jelto@cumin1001> conftool action : set/pooled=no; selector: name=mw1450.eqiad.wmnet [production]
14:57 <jelto@cumin1001> conftool action : set/pooled=no; selector: name=mw144[7-9].eqiad.wmnet [production]
14:54 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup [production]
14:54 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup [production]
14:50 <mutante> an-worker1079 - started failed ferm [production]
14:47 <jelto@cumin1001> conftool action : set/weight=25; selector: name=mw1450.eqiad.wmnet [production]
14:46 <jelto@cumin1001> conftool action : set/weight=25; selector: name=mw144[7-9].eqiad.wmnet [production]
14:45 <mutante> an-worker1095 - started ferm, service failed [production]
14:44 <mutante> an-worker1082 - started ferm (was failed due to DNS hiccup) [production]
14:44 <jelto@cumin1001> conftool action : set/pooled=inactive; selector: name=mw1450.eqiad.wmnet [production]
14:43 <jelto@cumin1001> conftool action : set/pooled=inactive; selector: name=mw144[7-9].eqiad.wmnet [production]
14:41 <mutante> mw1419 - started ferm [production]
13:35 <sukhe> ran homer for Gerrit 712400: Set up BGP peering to doh4002 in ulsfo [production]
13:23 <mutante> mw1453 - manual powercycle after it never rebooted when the reimage cookbook tried to trigger one [production]
13:22 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
13:21 <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
13:21 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
13:21 <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309 [production]
12:54 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE [production]
12:53 <godog> set runtime envoy.reloadable_features.strict_1xx_and_204_response_headers=false on thanos-fe* - T288815 [production]
12:53 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup [production]
12:53 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup [production]
12:52 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE [production]
12:33 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE [production]
12:31 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE [production]
12:30 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE [production]
12:29 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup [production]
12:29 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup [production]
12:29 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE [production]