2023-08-08
23:52 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50211 and previous config saved to /var/cache/conftool/dbconfig/20230808-235258-ladsgroup.json |
[production] |
22:33 |
<urbanecm> |
mwmaint1002: stop persistRevisionThreadItems.php frwiki instance because of T343859 (cc T315510) |
[production] |
22:04 |
<bking@deploy1002> |
Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 17s) |
[production] |
22:03 |
<bking@deploy1002> |
Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 |
[production] |
21:57 |
<bking@cumin1001> |
END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) |
[production] |
21:46 |
<bking@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
21:46 |
<bking@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs1003.eqiad.wmnet with OS bullseye |
[production] |
21:22 |
<brett> |
Exported varnish-modules 0.15.0-4 for bookworm-wikimedia (T342154) |
[production] |
21:18 |
<bking@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage |
[production] |
21:15 |
<bking@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage |
[production] |
21:06 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108 |
[production] |
21:06 |
<ayounsi@cumin1001> |
START - Cookbook sre.network.debug for Netbox circuit ID 108 |
[production] |
21:04 |
<bking@cumin1001> |
conftool action : set/pooled=no; selector: name=wcqs1003.eqiad.wmnet,service=wcqs |
[production] |
21:02 |
<bking@cumin1001> |
conftool action : set/pooled=true; selector: dnsdisc=wcqs,name=eqiad |
[production] |
21:02 |
<bking@cumin1001> |
START - Cookbook sre.hosts.reimage for host wcqs1003.eqiad.wmnet with OS bullseye |
[production] |
20:58 |
<bking@deploy1002> |
Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 17s) |
[production] |
20:58 |
<bking@deploy1002> |
Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 |
[production] |
20:57 |
<bking@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs1002.eqiad.wmnet with OS bullseye |
[production] |
20:52 |
<bking@deploy1002> |
Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 18s) |
[production] |
20:52 |
<bking@deploy1002> |
Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 |
[production] |
20:43 |
<urbanecm@deploy1002> |
Finished scap: Backport for [[gerrit:946997|Deploy to CN language wikis (T335886)]] (duration: 09m 08s) |
[production] |
20:41 |
<ryankemper@deploy1002> |
Finished deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347 (duration: 10m 44s) |
[production] |
20:37 |
<urbanecm@deploy1002> |
ksarabia and urbanecm: Continuing with sync |
[production] |
20:36 |
<urbanecm@deploy1002> |
ksarabia and urbanecm: Backport for [[gerrit:946997|Deploy to CN language wikis (T335886)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option) |
[production] |
20:34 |
<urbanecm@deploy1002> |
Started scap: Backport for [[gerrit:946997|Deploy to CN language wikis (T335886)]] |
[production] |
20:31 |
<urbanecm> |
mwmaint1002: `foreachwikiindblist 'group2 & s6' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315510) |
[production] |
20:30 |
<urbanecm> |
mwmaint1002: `foreachwikiindblist 'group2 & s5' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353) |
[production] |
20:30 |
<ryankemper@deploy1002> |
Started deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347 |
[production] |
20:30 |
<urbanecm> |
mwmaint1002: `foreachwikiindblist 'group2 & s3' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353) |
[production] |
20:29 |
<urbanecm> |
mwmaint1002: `foreachwikiindblist 'group2 & s2' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353) |
[production] |
20:24 |
<urbanecm@deploy1002> |
Finished scap: Backport for [[gerrit:946998|Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353)]] (duration: 10m 55s) |
[production] |
20:17 |
<urbanecm@deploy1002> |
urbanecm and matmarex: Continuing with sync |
[production] |
20:16 |
<ryankemper@deploy1002> |
Finished deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347 (duration: 02m 54s) |
[production] |
20:14 |
<urbanecm@deploy1002> |
urbanecm and matmarex: Backport for [[gerrit:946998|Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353)]] synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option) |
[production] |
20:14 |
<ryankemper> |
[WDQS] Lag caught up on `wdqs1006`; repooled -> `ryankemper@wdqs1006:~$ sudo pool` |
[production] |
20:13 |
<urbanecm@deploy1002> |
Started scap: Backport for [[gerrit:946998|Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353)]] |
[production] |
20:13 |
<ryankemper@deploy1002> |
Started deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347 |
[production] |
19:28 |
<bking@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs[1001-1003].eqiad.wmnet with reason: T331300 |
[production] |
19:28 |
<bking@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs[1001-1003].eqiad.wmnet with reason: T331300 |
[production] |
19:23 |
<bking@cumin1001> |
END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) |
[production] |
19:06 |
<ryankemper> |
[WDQS] Depooled `wdqs1006` while it catches up on 7 hours of lag |
[production] |
19:05 |
<ryankemper@deploy1002> |
Finished deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 2 (duration: 11m 34s) |
[production] |
18:54 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS bullseye |
[production] |
18:54 |
<ryankemper@deploy1002> |
Started deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 2 |
[production] |
18:49 |
<bking@cumin1001> |
conftool action : set/pooled=false; selector: dnsdisc=wcqs,name=eqiad |
[production] |
18:48 |
<ryankemper@deploy1002> |
Finished deploy [wdqs/wdqs@dff41b7]: whitelist new qlever endpoints (duration: 03m 08s) |
[production] |
18:45 |
<ryankemper@deploy1002> |
Started deploy [wdqs/wdqs@dff41b7]: whitelist new qlever endpoints |
[production] |
18:45 |
<ryankemper@deploy1002> |
deploy aborted: 0.3.124 (duration: 01m 50s) |
[production] |
18:43 |
<ryankemper@deploy1002> |
Started deploy [wdqs/wdqs@dff41b7]: 0.3.124 |
[production] |