2024-08-14
14:49 |
<jayme@cumin1002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-main2010.codfw.wmnet with OS bookworm |
[production] |
14:43 |
<jayme@cumin1002> |
START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bookworm |
[production] |
14:43 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67305 and previous config saved to /var/cache/conftool/dbconfig/20240814-144314-arnaudb.json |
[production] |
14:32 |
<elukey@deploy1003> |
helmfile [eqiad] DONE helmfile.d/services/thumbor: sync |
[production] |
14:28 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67304 and previous config saved to /var/cache/conftool/dbconfig/20240814-142808-arnaudb.json |
[production] |
14:27 |
<elukey@deploy1003> |
helmfile [eqiad] START helmfile.d/services/thumbor: sync |
[production] |
14:22 |
<elukey@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/thumbor: sync |
[production] |
14:21 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'es1 es1029 depooling for hdd hotswap', diff saved to https://phabricator.wikimedia.org/P67299 and previous config saved to /var/cache/conftool/dbconfig/20240814-142147-arnaudb.json |
[production] |
14:21 |
<ebernhardson@deploy1003> |
Synchronized private/PrivateSettings.php: Update NetworkSession users list for T341332 (duration: 12m 33s) |
[production] |
14:17 |
<elukey@deploy1003> |
helmfile [codfw] START helmfile.d/services/thumbor: sync |
[production] |
13:55 |
<elukey@deploy1003> |
helmfile [staging] DONE helmfile.d/services/thumbor: sync |
[production] |
13:55 |
<elukey@deploy1003> |
helmfile [staging] START helmfile.d/services/thumbor: sync |
[production] |
13:52 |
<hnowlan@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/thumbor: sync |
[production] |
13:50 |
<hnowlan@deploy1003> |
helmfile [codfw] START helmfile.d/services/thumbor: sync |
[production] |
13:33 |
<kartik@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1062696|Use the updated recommendation API from liftwing (T371465)]] (duration: 07m 51s) |
[production] |
13:32 |
<jayme@cumin1002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-main2010.codfw.wmnet'] |
[production] |
13:29 |
<kartik@deploy1003> |
kartik: Continuing with sync |
[production] |
13:28 |
<kartik@deploy1003> |
kartik: Backport for [[gerrit:1062696|Use the updated recommendation API from liftwing (T371465)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
13:25 |
<kartik@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1062696|Use the updated recommendation API from liftwing (T371465)]] |
[production] |
13:25 |
<kartik@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1062697|Use the updated recommendation API from liftwing (T371465)]] (duration: 08m 37s) |
[production] |
13:22 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db2189 (re)pooling @ 100%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67296 and previous config saved to /var/cache/conftool/dbconfig/20240814-132256-arnaudb.json |
[production] |
13:22 |
<jayme@cumin1002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-main2010.codfw.wmnet'] |
[production] |
13:20 |
<kartik@deploy1003> |
kartik: Continuing with sync |
[production] |
13:18 |
<kartik@deploy1003> |
kartik: Backport for [[gerrit:1062697|Use the updated recommendation API from liftwing (T371465)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
13:18 |
<ebernhardson@deploy1003> |
helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
13:18 |
<ebernhardson@deploy1003> |
helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
13:16 |
<kartik@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1062697|Use the updated recommendation API from liftwing (T371465)]] |
[production] |
13:14 |
<ebernhardson@deploy1003> |
helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
13:14 |
<ebernhardson@deploy1003> |
helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
13:11 |
<ebernhardson@deploy1003> |
helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
13:11 |
<ebernhardson@deploy1003> |
helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
13:07 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db2189 (re)pooling @ 75%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67295 and previous config saved to /var/cache/conftool/dbconfig/20240814-130750-arnaudb.json |
[production] |
12:52 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67293 and previous config saved to /var/cache/conftool/dbconfig/20240814-125245-arnaudb.json |
[production] |
12:49 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on 9 hosts with reason: replication table exclusion deployment |
[production] |
12:49 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:20:00 on 9 hosts with reason: replication table exclusion deployment |
[production] |
12:37 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db2189 (re)pooling @ 25%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67292 and previous config saved to /var/cache/conftool/dbconfig/20240814-123739-arnaudb.json |
[production] |
12:22 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db2189 (re)pooling @ 16%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67291 and previous config saved to /var/cache/conftool/dbconfig/20240814-122234-arnaudb.json |
[production] |
12:07 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db2189 (re)pooling @ 8%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67290 and previous config saved to /var/cache/conftool/dbconfig/20240814-120729-arnaudb.json |
[production] |
11:52 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db2189 (re)pooling @ 4%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67289 and previous config saved to /var/cache/conftool/dbconfig/20240814-115223-arnaudb.json |
[production] |
11:37 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db2189 (re)pooling @ 2%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67288 and previous config saved to /var/cache/conftool/dbconfig/20240814-113718-arnaudb.json |
[production] |
11:23 |
<mvolz@deploy1003> |
helmfile [eqiad] DONE helmfile.d/services/citoid: apply |
[production] |
11:23 |
<mvolz@deploy1003> |
helmfile [eqiad] START helmfile.d/services/citoid: apply |
[production] |
11:22 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db2189 (re)pooling @ 1%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67287 and previous config saved to /var/cache/conftool/dbconfig/20240814-112212-arnaudb.json |
[production] |
11:20 |
<mvolz@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/citoid: apply |
[production] |
11:19 |
<mvolz@deploy1003> |
helmfile [codfw] START helmfile.d/services/citoid: apply |
[production] |
11:19 |
<mvolz@deploy1003> |
helmfile [staging] DONE helmfile.d/services/citoid: apply |
[production] |
11:18 |
<mvolz@deploy1003> |
helmfile [staging] START helmfile.d/services/citoid: apply |
[production] |
09:56 |
<fnegri@cumin1002> |
conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1 |
[production] |
09:26 |
<klausman@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/api-gateway: apply |
[production] |
09:26 |
<klausman@deploy1003> |
helmfile [codfw] START helmfile.d/services/api-gateway: apply |
[production] |