2024-08-14
16:03 <klausman@deploy1003> helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. [production]
16:01 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye [production]
15:58 <arnaudb@cumin1002> dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67316 and previous config saved to /var/cache/conftool/dbconfig/20240814-155844-arnaudb.json [production]
15:48 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
15:47 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
15:43 <arnaudb@cumin1002> dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67315 and previous config saved to /var/cache/conftool/dbconfig/20240814-154338-arnaudb.json [production]
15:40 <dani@deploy1003> helmfile [codfw] DONE helmfile.d/services/miscweb: apply [production]
15:39 <dani@deploy1003> helmfile [codfw] START helmfile.d/services/miscweb: apply [production]
15:39 <dani@deploy1003> helmfile [eqiad] DONE helmfile.d/services/miscweb: apply [production]
15:39 <dani@deploy1003> helmfile [eqiad] START helmfile.d/services/miscweb: apply [production]
15:39 <dani@deploy1003> helmfile [staging] DONE helmfile.d/services/miscweb: apply [production]
15:39 <dani@deploy1003> helmfile [staging] START helmfile.d/services/miscweb: apply [production]
15:34 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye [production]
15:28 <arnaudb@cumin1002> dbctl commit (dc=all): 'es1029 (re)pooling @ 16%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67314 and previous config saved to /var/cache/conftool/dbconfig/20240814-152833-arnaudb.json [production]
15:13 <arnaudb@cumin1002> dbctl commit (dc=all): 'es1029 (re)pooling @ 8%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67312 and previous config saved to /var/cache/conftool/dbconfig/20240814-151328-arnaudb.json [production]
14:59 <klausman@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2010.codfw.wmnet [production]
14:58 <arnaudb@cumin1002> dbctl commit (dc=all): 'es1029 (re)pooling @ 4%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67307 and previous config saved to /var/cache/conftool/dbconfig/20240814-145819-arnaudb.json [production]
14:53 <klausman@cumin2002> START - Cookbook sre.hosts.reboot-single for host ml-serve2010.codfw.wmnet [production]
14:49 <jayme@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-main2010.codfw.wmnet with OS bookworm [production]
14:43 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bookworm [production]
14:43 <arnaudb@cumin1002> dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67305 and previous config saved to /var/cache/conftool/dbconfig/20240814-144314-arnaudb.json [production]
14:32 <elukey@deploy1003> helmfile [eqiad] DONE helmfile.d/services/thumbor: sync [production]
14:28 <arnaudb@cumin1002> dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67304 and previous config saved to /var/cache/conftool/dbconfig/20240814-142808-arnaudb.json [production]
14:27 <elukey@deploy1003> helmfile [eqiad] START helmfile.d/services/thumbor: sync [production]
14:22 <elukey@deploy1003> helmfile [codfw] DONE helmfile.d/services/thumbor: sync [production]
14:21 <arnaudb@cumin1002> dbctl commit (dc=all): 'es1 es1029 depooling for hdd hotswap', diff saved to https://phabricator.wikimedia.org/P67299 and previous config saved to /var/cache/conftool/dbconfig/20240814-142147-arnaudb.json [production]
14:21 <ebernhardson@deploy1003> Synchronized private/PrivateSettings.php: Update NetworkSession users list for T341332 (duration: 12m 33s) [production]
14:17 <elukey@deploy1003> helmfile [codfw] START helmfile.d/services/thumbor: sync [production]
13:55 <elukey@deploy1003> helmfile [staging] DONE helmfile.d/services/thumbor: sync [production]
13:55 <elukey@deploy1003> helmfile [staging] START helmfile.d/services/thumbor: sync [production]
13:52 <hnowlan@deploy1003> helmfile [codfw] DONE helmfile.d/services/thumbor: sync [production]
13:50 <hnowlan@deploy1003> helmfile [codfw] START helmfile.d/services/thumbor: sync [production]
13:33 <kartik@deploy1003> Finished scap sync-world: Backport for [[gerrit:1062696|Use the updated recommendation API from liftwing (T371465)]] (duration: 07m 51s) [production]
13:32 <jayme@cumin1002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-main2010.codfw.wmnet'] [production]
13:29 <kartik@deploy1003> kartik: Continuing with sync [production]
13:28 <kartik@deploy1003> kartik: Backport for [[gerrit:1062696|Use the updated recommendation API from liftwing (T371465)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
13:25 <kartik@deploy1003> Started scap sync-world: Backport for [[gerrit:1062696|Use the updated recommendation API from liftwing (T371465)]] [production]
13:25 <kartik@deploy1003> Finished scap sync-world: Backport for [[gerrit:1062697|Use the updated recommendation API from liftwing (T371465)]] (duration: 08m 37s) [production]
13:22 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 100%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67296 and previous config saved to /var/cache/conftool/dbconfig/20240814-132256-arnaudb.json [production]
13:22 <jayme@cumin1002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-main2010.codfw.wmnet'] [production]
13:20 <kartik@deploy1003> kartik: Continuing with sync [production]
13:18 <kartik@deploy1003> kartik: Backport for [[gerrit:1062697|Use the updated recommendation API from liftwing (T371465)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
13:18 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
13:18 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
13:16 <kartik@deploy1003> Started scap sync-world: Backport for [[gerrit:1062697|Use the updated recommendation API from liftwing (T371465)]] [production]
13:14 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
13:14 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
13:11 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
13:11 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
13:07 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 75%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67295 and previous config saved to /var/cache/conftool/dbconfig/20240814-130750-arnaudb.json [production]