2022-02-24
§
|
21:30 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2085.codfw.wmnet with reason: host reimage |
[production] |
21:29 |
<brennen@deploy1002> |
Synchronized wmf-config/CirrusSearch-production.php: Config: [[gerrit:765577|cirrus: Reduce write isolation to only cloudelastic (T295705)]] (duration: 00m 55s) |
[production] |
21:27 |
<mutante> |
phabricator - disabling git repo rGEDS (Elasticdash) - only one commit from 2015 - T296022 |
[production] |
21:19 |
<tzatziki> |
removing 1 file for legal compliance |
[production] |
21:19 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2086.codfw.wmnet with OS bullseye |
[production] |
21:18 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2083.codfw.wmnet with OS bullseye |
[production] |
21:13 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2085.codfw.wmnet with OS bullseye |
[production] |
21:11 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2084.codfw.wmnet with OS bullseye |
[production] |
21:07 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2083.codfw.wmnet with reason: host reimage |
[production] |
21:05 |
<tzatziki> |
removing 4 files for legal compliance |
[production] |
21:04 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2083.codfw.wmnet with reason: host reimage |
[production] |
21:02 |
<taavi@deploy1002> |
Finished deploy [horizon/deploy@9d02cd6]: (no justification provided) (duration: 03m 18s) |
[production] |
21:01 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2084.codfw.wmnet with reason: host reimage |
[production] |
20:59 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2083.codfw.wmnet with OS bullseye |
[production] |
20:58 |
<taavi@deploy1002> |
Started deploy [horizon/deploy@9d02cd6]: (no justification provided) |
[production] |
20:58 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2084.codfw.wmnet with reason: host reimage |
[production] |
20:51 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2084.codfw.wmnet with OS bullseye |
[production] |
20:14 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2084.codfw.wmnet with OS bullseye |
[production] |
20:10 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2083.codfw.wmnet with OS bullseye |
[production] |
20:04 |
<ryankemper> |
T302526 `ryankemper@cumin1001:~$ sudo -E cumin -b 3 'wcqs*' 'enable-puppet "query_service: Simply jvm arg handling - T302526"; sudo run-puppet-agent'` in tmux `wcqs` |
[production] |
20:02 |
<ryankemper> |
T302526 Depooled `wcqs1001`, ran puppet agent, and restarted `wcqs-blazegraph`. Service came up healthy, proceeding to rest of wcqs fleet |
[production] |
19:57 |
<ryankemper> |
T302526 `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'wdqs*' 'enable-puppet "query_service: Simply jvm arg handling - T302526"; sudo run-puppet-agent'` in tmux `deploy_window` |
[production] |
19:55 |
<ryankemper> |
T302526 Depooled canary `wdqs1003`, ran puppet agent, and restarted `wdqs-blazegraph`. Tests look good, proceeding to rest of wdqs fleet |
[production] |
19:48 |
<ryankemper> |
T302526 (Forgot to merge patch first, take two) |
[production] |
19:48 |
<ryankemper> |
T302526 Running puppet on wdqs canary: `ryankemper@wdqs1003:~$ sudo enable-puppet "query_service: Simply jvm arg handling - T302526" && sudo run-puppet-agent` |
[production] |
19:46 |
<ryankemper> |
T302526 Disabling puppet across entire query service (wdqs & wcqs) fleet for merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/761080: `ryankemper@cumin1001:~$ sudo -E cumin 'w*qs*' 'disable-puppet "query_service: Simply jvm arg handling - T302526"'` |
[production] |
19:06 |
<dduvall@deploy1002> |
rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.23 refs T300199 |
[production] |
19:00 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2084.codfw.wmnet with OS bullseye |
[production] |
18:56 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2083.codfw.wmnet with OS bullseye |
[production] |
18:55 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:53 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2082.codfw.wmnet with OS bullseye |
[production] |
18:52 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:51 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:45 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:43 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:43 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage |
[production] |
18:39 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage |
[production] |
18:27 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:22 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2082.codfw.wmnet with OS bullseye |
[production] |
18:21 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21508 and previous config saved to /var/cache/conftool/dbconfig/20220224-182102-kormat.json |
[production] |
18:20 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:13 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2081.codfw.wmnet with OS bullseye |
[production] |
18:05 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21506 and previous config saved to /var/cache/conftool/dbconfig/20220224-180557-kormat.json |
[production] |
18:04 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:03 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage |
[production] |
18:02 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2082.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:02 |
<kevinbazira@deploy1002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . |
[production] |
18:01 |
<kevinbazira@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . |
[production] |
18:01 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
18:00 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage |
[production] |