2021-01-08
13:37 <klausman@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE [production]
12:52 <klausman@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE [production]
12:49 <klausman@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE [production]
12:04 <marostegui@cumin1001> dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13694 and previous config saved to /var/cache/conftool/dbconfig/20210108-120415-root.json [production]
11:49 <marostegui@cumin1001> dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13693 and previous config saved to /var/cache/conftool/dbconfig/20210108-114912-root.json [production]
11:34 <marostegui@cumin1001> dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13692 and previous config saved to /var/cache/conftool/dbconfig/20210108-113408-root.json [production]
11:19 <marostegui@cumin1001> dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13691 and previous config saved to /var/cache/conftool/dbconfig/20210108-111905-root.json [production]
11:17 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P13690 and previous config saved to /var/cache/conftool/dbconfig/20210108-111733-marostegui.json [production]
11:13 <marostegui@cumin1001> dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13689 and previous config saved to /var/cache/conftool/dbconfig/20210108-111345-root.json [production]
10:58 <marostegui@cumin1001> dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13688 and previous config saved to /var/cache/conftool/dbconfig/20210108-105842-root.json [production]
10:43 <marostegui@cumin1001> dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13676 and previous config saved to /var/cache/conftool/dbconfig/20210108-104338-root.json [production]
10:38 <urbanecm@deploy1001> Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 10s) [production]
10:28 <marostegui@cumin1001> dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13675 and previous config saved to /var/cache/conftool/dbconfig/20210108-102835-root.json [production]
10:26 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P13674 and previous config saved to /var/cache/conftool/dbconfig/20210108-102606-marostegui.json [production]
10:01 <elukey> restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka seems not recovering very well [production]
10:00 <marostegui@cumin1001> dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13673 and previous config saved to /var/cache/conftool/dbconfig/20210108-100040-root.json [production]
09:45 <marostegui@cumin1001> dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13672 and previous config saved to /var/cache/conftool/dbconfig/20210108-094535-root.json [production]
09:30 <marostegui@cumin1001> dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13671 and previous config saved to /var/cache/conftool/dbconfig/20210108-093032-root.json [production]
09:30 <marostegui> Restart mysql on db1115 (tendril/dbtree) [production]
09:15 <marostegui@cumin1001> dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13670 and previous config saved to /var/cache/conftool/dbconfig/20210108-091528-root.json [production]
09:08 <moritzm> installing libxstream-java security updates on Buster [production]
09:01 <godog> swift codfw-prod: more weight to ms-be20[58-61] - T269337 [production]
08:12 <marostegui> Deploy schema change on s4 codfw master - T270187 [production]
07:57 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P13669 and previous config saved to /var/cache/conftool/dbconfig/20210108-075714-marostegui.json [production]
07:23 <marostegui> Deploy schema change on s5 codfw master - T270187 [production]
06:33 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1085 to clone db1155:3316 T268742 ', diff saved to https://phabricator.wikimedia.org/P13666 and previous config saved to /var/cache/conftool/dbconfig/20210108-063301-marostegui.json [production]
06:18 <marostegui> Deploy schema change on s2 codfw master - T270187 [production]
04:59 <mutante> mw1266 - restart-php7.2-fpm [production]
03:04 <ryankemper> [wdqs deploy] Deploy complete, service is healthy. This is done. [production]
02:35 <ryankemper> [wdqs deploy] Restarting `wdqs-categories` across load-balanced instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` [production]
02:35 <ryankemper> [wdqs deploy] Restarted `wdqs-categories` across test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]
02:34 <ryankemper> [wdqs deploy] Restarted `wdqs-updater` across all instances: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
02:27 <ryankemper@deploy1001> Finished deploy [wdqs/wdqs@b15fc5c]: 0.3.58 (duration: 18m 04s) [production]
02:15 <ryankemper> [wdqs deploy] Nevermind - the UI failure I mentioned above is transient. Restarting my ssh tunnel seemed to make the problem go away. Proceeding with deploy [production]
02:12 <ryankemper> [wdqs deploy] While queries run fine, it looks like there might be a UI glitch in this version. Digging in to see if it's transient, but I'll likely be aborting this deploy [production]
02:09 <ryankemper@deploy1001> Started deploy [wdqs/wdqs@b15fc5c]: 0.3.58 [production]
02:09 <ryankemper> [wdqs deploy] Tests passing on canary before beginning wdqs deploy, proceeding [production]
01:29 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet [production]
01:28 <mutante> mw1276, mw1277 - first API appservers on buster, now serving traffic, free to depool if any issues [production]
01:28 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1277.eqiad.wmnet [production]
01:28 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet [production]
01:24 <mutante> mw1266 - another buster appserver now serving traffic [production]
01:24 <mutante> mw1265 - raised weight to 25 like regular appservers (buster) [production]
01:23 <dzahn@cumin1001> conftool action : set/weight=25; selector: name=mw1265.eqiad.wmnet [production]
01:18 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1266.eqiad.wmnet [production]
01:17 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1277.eqiad.wmnet [production]
01:17 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1276.eqiad.wmnet [production]
01:16 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet [production]
01:12 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1266.eqiad.wmnet [production]
00:27 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE [production]