2021-04-27
|
03:37 |
<ryankemper> |
[WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` |
[production] |
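The 03:37 entry restarts `wdqs-updater` with `cumin -b 4`, so at most four hosts restart at any one time. The Python sketch below illustrates that batching idea. It is not cumin's implementation. It assumes a sliding-window batch, where a new host starts as soon as one finishes, which is our understanding of how cumin's `-b` behaves. The `fake_restart` function and the host names are hypothetical stand-ins for the real `systemctl restart wdqs-updater` calls.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def rolling_restart(hosts, restart, batch_size=4):
    """Call restart(host) for every host with at most batch_size calls in flight.

    Sliding window: as soon as one call finishes, the next queued host starts.
    """
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        # pool.map yields results in input order, so zip pairs each host with its result
        return dict(zip(hosts, pool.map(restart, hosts)))

# --- Demo: a fake restart that records peak concurrency (hypothetical hosts) ---
_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0

def fake_restart(host):
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.01)  # stand-in for `systemctl restart wdqs-updater`
    with _lock:
        _in_flight -= 1
    return "ok"

hosts = [f"wdqs-host-{i}" for i in range(10)]  # hypothetical host names
results = rolling_restart(hosts, fake_restart, batch_size=4)
print(peak_in_flight, all(v == "ok" for v in results.values()))
```

A thread pool with `max_workers=batch_size` gives the sliding-window behaviour directly. Fixed batches, where the next group waits for the slowest host in the current group, would instead need an explicit loop over slices of `hosts`.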
03:36 |
<ryankemper@deploy1002> |
Finished deploy [wdqs/wdqs@08ad17a]: 0.3.70 (duration: 08m 18s) |
[production] |
03:28 |
<ryankemper> |
[WDQS Deploy] Tests passing following deploy of `0.3.70` on canary `wdqs1003`; proceeding to rest of fleet |
[production] |
03:28 |
<ryankemper@deploy1002> |
Started deploy [wdqs/wdqs@08ad17a]: 0.3.70 |
[production] |
03:27 |
<ryankemper> |
[WDQS Deploy] Gearing up for deploy of wdqs `0.3.70`. Pre-deploy tests passing on canary `wdqs1003` |
[production] |
03:17 |
<ryankemper> |
T280382 `wdqs1006` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to raid0: `/dev/md2 2.6T 998G 1.5T 40% /srv` |
[production] |
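The `df -h` line in the 03:17 entry can be sanity-checked. GNU `df` computes Use% as used / (used + available), rounded up to a whole percent. Blocks reserved by the filesystem are therefore left out, which is presumably why Size (2.6T) exceeds Used + Avail. The minimal check below assumes that `df -h` units are powers of 1024 (1.5T taken as 1536 GiB) and that the displayed values are already rounded, so it only confirms the 40% figure approximately.

```python
import math

def df_use_percent(used, avail):
    """Use% as GNU df reports it: used / (used + avail), rounded up to a whole percent."""
    return math.ceil(used * 100 / (used + avail))

# Values from the wdqs1006 /srv line, in GiB (1.5T taken as 1.5 * 1024 GiB).
print(df_use_percent(998, 1.5 * 1024))  # 40
```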
02:56 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) |
[production] |
01:29 |
<ryankemper> |
T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph --task-id T280382` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
01:29 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
01:27 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) |
[production] |
01:21 |
<ryankemper> |
T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
01:21 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
2021-04-26
|
23:28 |
<mutante> |
renewing TLS cert for peopleweb.discovery.wmnet, adding *3 hosts |
[production] |
23:21 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host |
[production] |
23:21 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host |
[production] |
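The downtime durations in these cookbook lines (`1 day, 0:00:00` here, `2:00:00` for the REIMAGE downtimes) appear to be Python `datetime.timedelta` values printed with `str()`. This is inferred from the format and is not stated in the log. The standard library reproduces both strings:

```python
from datetime import timedelta

# str() of a timedelta renders as "[D day[s], ]H:MM:SS"
print(str(timedelta(days=1)))   # 1 day, 0:00:00
print(str(timedelta(hours=2)))  # 2:00:00
```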
22:26 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE |
[production] |
22:24 |
<ryankemper@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE |
[production] |
22:11 |
<ryankemper> |
T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1006.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
21:21 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet |
[production] |
20:48 |
<twentyafterfour> |
restarting php-fpm on phab1001 to deploy phabricator hotfix d238db85b8d8072d99f31805aa4a8a7cf0c09941 |
[production] |
20:35 |
<dzahn@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet |
[production] |
20:26 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1003.eqiad.wmnet |
[production] |
20:15 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts planet1003.eqiad.wmnet |
[production] |
19:45 |
<legoktm> |
uploaded python3-falcon, python3-mimeparse, python3-mujson, openstack-pkg-tools to mailman3 component on apt.wm.o |
[production] |
18:51 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE |
[production] |
18:49 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE |
[production] |
18:49 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE |
[production] |
18:47 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE |
[production] |
18:47 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE |
[production] |
18:45 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE |
[production] |
18:18 |
<urbanecm@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: 2d16f6251a67cf13cef02bbdcb3c9f5c1c505d16: elwiki: Update Growth experiments configuration (T280172) (duration: 00m 58s) |
[production] |
18:06 |
<urbanecm@deploy1002> |
Synchronized multiversion/MWScript.php: 5ace4e1b806bcfc4ea059f9e9cae9aa94c0bdbd1: Fix error message if MWScript.php is run without arguments (duration: 00m 58s) |
[production] |
17:28 |
<dduvall@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
17:26 |
<dduvall@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
17:18 |
<dduvall@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
17:06 |
<legoktm> |
imported postorius_1.3.4-2~bpo10+2 to apt.wm.o |
[production] |
16:49 |
<mutante> |
gerrit - restarted apache (hard) to remove timeout from gerrit:682502 |
[production] |
16:40 |
<mutante> |
gerrit1001 - reload apache2 |
[production] |
16:36 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1025.eqiad.wmnet |
[production] |
16:30 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host mc1025.eqiad.wmnet |
[production] |
15:26 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE |
[production] |
15:24 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE |
[production] |
15:21 |
<elukey> |
restart zookeeper on conf2004 to pick up the -javaagent setting for the prometheus exporter |
[production] |
15:06 |
<moritzm> |
installing jquery security updates on stretch |
[production] |
15:01 |
<hnowlan@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' . |
[production] |
15:01 |
<hnowlan@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . |
[production] |
14:54 |
<hnowlan@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' . |
[production] |
14:54 |
<hnowlan@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . |
[production] |
14:48 |
<hnowlan@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . |
[production] |
14:47 |
<hnowlan@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' . |
[production] |