2022-02-24
19:06 |
<dduvall@deploy1002> |
rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.23 refs T300199 |
[production] |
19:00 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2084.codfw.wmnet with OS bullseye |
[production] |
18:56 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2083.codfw.wmnet with OS bullseye |
[production] |
18:55 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:53 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2082.codfw.wmnet with OS bullseye |
[production] |
18:52 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:51 |
<pt1979@cumin2002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:45 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:43 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:43 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage |
[production] |
18:39 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage |
[production] |
18:27 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:22 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2082.codfw.wmnet with OS bullseye |
[production] |
18:21 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21508 and previous config saved to /var/cache/conftool/dbconfig/20220224-182102-kormat.json |
[production] |
18:20 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:13 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2081.codfw.wmnet with OS bullseye |
[production] |
18:05 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21506 and previous config saved to /var/cache/conftool/dbconfig/20220224-180557-kormat.json |
[production] |
18:04 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:03 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage |
[production] |
18:02 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2082.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
18:02 |
<kevinbazira@deploy1002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . |
[production] |
18:01 |
<kevinbazira@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . |
[production] |
18:01 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
18:00 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage |
[production] |
18:00 |
<kevinbazira@deploy1002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . |
[production] |
17:59 |
<kevinbazira@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . |
[production] |
17:50 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
17:50 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21504 and previous config saved to /var/cache/conftool/dbconfig/20220224-175052-kormat.json |
[production] |
17:46 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host elastic2082.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
17:45 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
17:44 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
17:44 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
17:44 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic[1039,1043].eqiad.wmnet |
[production] |
17:43 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
17:43 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2081.codfw.wmnet with OS bullseye |
[production] |
17:40 |
<elukey> |
`truncate -s 1g /var/log/auth.log.1` on krb1001 to free space on the root partition |
[production] |
17:38 |
<elukey> |
`truncate -s 1g /var/log/auth.log` on krb1001 to free space on the root partition |
[production] |
17:35 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21503 and previous config saved to /var/cache/conftool/dbconfig/20220224-173548-kormat.json |
[production] |
17:33 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Depooling db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21502 and previous config saved to /var/cache/conftool/dbconfig/20220224-173307-kormat.json |
[production] |
17:33 |
<kormat@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance |
[production] |
17:33 |
<kormat@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance |
[production] |
17:33 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300774)', diff saved to https://phabricator.wikimedia.org/P21501 and previous config saved to /var/cache/conftool/dbconfig/20220224-173259-kormat.json |
[production] |
17:32 |
<krinkle@deploy1002> |
Synchronized wmf-config/: Ia61fea4d0dcf86d51547d3132093a336ab3f2e9f (duration: 00m 52s) |
[production] |
17:30 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2080.codfw.wmnet with OS bullseye |
[production] |
17:22 |
<ryankemper@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts elastic[1039,1043].eqiad.wmnet |
[production] |
17:20 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2080.codfw.wmnet with reason: host reimage |
[production] |
17:17 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21500 and previous config saved to /var/cache/conftool/dbconfig/20220224-171755-kormat.json |
[production] |
17:16 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2080.codfw.wmnet with reason: host reimage |
[production] |
17:11 |
<jayme@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/miscweb: apply |
[production] |
17:11 |
<jayme@deploy1002> |
helmfile [eqiad] START helmfile.d/services/miscweb: apply |
[production] |