2021-01-08
09:01 <godog> swift codfw-prod: more weight to ms-be20[58-61] - T269337 [production]
08:12 <marostegui> Deploy schema change on s4 codfw master - T270187 [production]
07:57 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P13669 and previous config saved to /var/cache/conftool/dbconfig/20210108-075714-marostegui.json [production]
07:23 <marostegui> Deploy schema change on s5 codfw master - T270187 [production]
06:33 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1085 to clone db1155:3316 T268742', diff saved to https://phabricator.wikimedia.org/P13666 and previous config saved to /var/cache/conftool/dbconfig/20210108-063301-marostegui.json [production]
06:18 <marostegui> Deploy schema change on s2 codfw master - T270187 [production]
04:59 <mutante> mw1266 - restart-php7.2-fpm [production]
03:04 <ryankemper> [wdqs deploy] Deploy complete, service is healthy. This is done. [production]
02:35 <ryankemper> [wdqs deploy] Restarting `wdqs-categories` across load-balanced instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` [production]
02:35 <ryankemper> [wdqs deploy] Restarted `wdqs-categories` across test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]
02:34 <ryankemper> [wdqs deploy] Restarted `wdqs-updater` across all instances: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
02:27 <ryankemper@deploy1001> Finished deploy [wdqs/wdqs@b15fc5c]: 0.3.58 (duration: 18m 04s) [production]
02:15 <ryankemper> [wdqs deploy] Never mind - the UI failure I mentioned above is transient. Restarting my ssh tunnel seemed to make the problem go away. Proceeding with deploy [production]
02:12 <ryankemper> [wdqs deploy] While queries run fine, it looks like there might be a UI glitch in this version. Digging in to see if it's transient, but I'll likely be aborting this deploy [production]
02:09 <ryankemper@deploy1001> Started deploy [wdqs/wdqs@b15fc5c]: 0.3.58 [production]
02:09 <ryankemper> [wdqs deploy] Tests passing on canary before beginning wdqs deploy, proceeding [production]
01:29 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet [production]
01:28 <mutante> mw1276, mw1277 - first API appservers on buster, now serving traffic, free to depool if any issues [production]
01:28 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1277.eqiad.wmnet [production]
01:28 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet [production]
01:24 <mutante> mw1266 - another buster appserver now serving traffic [production]
01:24 <mutante> mw1265 - raised weight to 25 like regular appservers (buster) [production]
01:23 <dzahn@cumin1001> conftool action : set/weight=25; selector: name=mw1265.eqiad.wmnet [production]
01:18 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1266.eqiad.wmnet [production]
01:17 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1277.eqiad.wmnet [production]
01:17 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1276.eqiad.wmnet [production]
01:16 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet [production]
01:12 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1266.eqiad.wmnet [production]
00:27 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE [production]
00:25 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE [production]
00:23 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE [production]
00:23 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE [production]
00:22 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE [production]
00:21 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE [production]
00:17 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE [production]
00:15 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE [production]
00:06 <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Undeploy graphoid on enwiki T271495 (duration: 00m 57s) [production]
2021-01-07
23:55 <mutante> reimaging mw1267,mw1276,mw1277 [production]
23:28 <mutante> reimaging mw1266 [production]
23:14 <andrew@deploy1001> Finished deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 00s) [production]
23:12 <andrew@deploy1001> Started deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host [production]
22:54 <andrew@deploy1001> Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 04s) [production]
22:54 <andrew@deploy1001> Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host [production]
22:52 <andrew@deploy1001> Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 07m 44s) [production]
22:44 <andrew@deploy1001> Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host [production]
22:41 <andrew@deploy1001> Finished deploy [striker/deploy@e4db843]: striker -> labweb1002 (duration: 00m 04s) [production]
22:41 <andrew@deploy1001> Started deploy [striker/deploy@e4db843]: striker -> labweb1002 [production]
22:39 <andrew@deploy1001> Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 06s) [production]
22:39 <andrew@deploy1001> Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host [production]
22:31 <robh@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]