2021-04-27
05:22 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15549 and previous config saved to /var/cache/conftool/dbconfig/20210427-052236-root.json | [production]
05:21 | <marostegui> | Stop mysql on db1087 to clone db1167 (lag will appear on wikidata on wikireplicas) T258361 | [production]
05:20 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Pool db1114 temporarily as db1087 will be depooled', diff saved to https://phabricator.wikimedia.org/P15547 and previous config saved to /var/cache/conftool/dbconfig/20210427-052026-marostegui.json | [production]
05:18 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1124 (re)pooling @ 5%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15546 and previous config saved to /var/cache/conftool/dbconfig/20210427-051802-root.json | [production]
05:08 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 T258361', diff saved to https://phabricator.wikimedia.org/P15545 and previous config saved to /var/cache/conftool/dbconfig/20210427-050826-marostegui.json | [production]
05:07 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15544 and previous config saved to /var/cache/conftool/dbconfig/20210427-050732-root.json | [production]
05:03 | <marostegui@cumin1001> | END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1077.eqiad.wmnet | [production]
04:53 | <marostegui@cumin1001> | START - Cookbook sre.hosts.decommission for hosts db1077.eqiad.wmnet | [production]
04:52 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15543 and previous config saved to /var/cache/conftool/dbconfig/20210427-045229-root.json | [production]
04:46 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 T258361', diff saved to https://phabricator.wikimedia.org/P15541 and previous config saved to /var/cache/conftool/dbconfig/20210427-044609-marostegui.json | [production]
04:45 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Add db1124 to dbctl, depooled, T258361', diff saved to https://phabricator.wikimedia.org/P15540 and previous config saved to /var/cache/conftool/dbconfig/20210427-044520-marostegui.json | [production]
04:37 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15539 and previous config saved to /var/cache/conftool/dbconfig/20210427-043725-root.json | [production]
04:25 | <legoktm> | upgrading lists-next.wikimedia.org to mailman3-from-bullseye (T280887) | [production]
04:19 | <marostegui> | Set phabricator on read only T279625 | [production]
03:37 | <ryankemper> | [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` | [production]
03:37 | <ryankemper> | [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` | [production]
03:37 | <ryankemper> | [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` | [production]
03:36 | <ryankemper@deploy1002> | Finished deploy [wdqs/wdqs@08ad17a]: 0.3.70 (duration: 08m 18s) | [production]
03:28 | <ryankemper> | [WDQS Deploy] Tests passing following deploy of `0.3.70` on canary `wdqs1003`; proceeding to rest of fleet | [production]
03:28 | <ryankemper@deploy1002> | Started deploy [wdqs/wdqs@08ad17a]: 0.3.70 | [production]
03:27 | <ryankemper> | [WDQS Deploy] Gearing up for deploy of wdqs `0.3.70`. Pre-deploy tests passing on canary `wdqs1003` | [production]
03:17 | <ryankemper> | T280382 `wdqs1006` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to raid0: `/dev/md2 2.6T 998G 1.5T 40% /srv` | [production]
02:56 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | [production]
01:29 | <ryankemper> | T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph --task-id T280382` on `ryankemper@cumin1001` tmux session `reimage` | [production]
01:29 | <ryankemper@cumin1001> | START - Cookbook sre.wdqs.data-transfer | [production]
01:27 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | [production]
01:21 | <ryankemper> | T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage` | [production]
01:21 | <ryankemper@cumin1001> | START - Cookbook sre.wdqs.data-transfer | [production]
2021-04-26
23:28 | <mutante> | renewing TLS cert for peopleweb.discovery.wmnet, adding *3 hosts | [production]
23:21 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host | [production]
23:21 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host | [production]
22:26 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE | [production]
22:24 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE | [production]
22:11 | <ryankemper> | T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1006.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` | [production]
21:21 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet | [production]
20:48 | <twentyafterfour> | restarting php-fpm on phab1001 to deploy phabricator hotfix d238db85b8d8072d99f31805aa4a8a7cf0c09941 | [production]
20:35 | <dzahn@cumin1001> | START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet | [production]
20:26 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1003.eqiad.wmnet | [production]
20:15 | <dzahn@cumin1001> | START - Cookbook sre.hosts.decommission for hosts planet1003.eqiad.wmnet | [production]
19:45 | <legoktm> | uploaded python3-falcon, python3-mimeparse, python3-mujson, openstack-pkg-tools to mailman3 component on apt.wm.o | [production]
18:51 | <robh@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE | [production]
18:49 | <robh@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE | [production]
18:49 | <robh@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE | [production]
18:47 | <robh@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE | [production]
18:47 | <robh@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE | [production]
18:45 | <robh@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE | [production]
18:18 | <urbanecm@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: 2d16f6251a67cf13cef02bbdcb3c9f5c1c505d16: elwiki: Update Growth experiments configuration (T280172) (duration: 00m 58s) | [production]
18:06 | <urbanecm@deploy1002> | Synchronized multiversion/MWScript.php: 5ace4e1b806bcfc4ea059f9e9cae9aa94c0bdbd1: Fix error message if MWScript.php is run without arguments (duration: 00m 58s) | [production]
17:28 | <dduvall@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . | [production]
17:26 | <dduvall@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . | [production]