| 2024-08-14 |
  | 16:43 | <ladsgroup@deploy1003> | ladsgroup: Backport for [[gerrit:1062736|Avoid primary DB query for non-talk page edits (T370304)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) | [production] | 
            
  | 16:42 | <otto@deploy1003> | Started deploy [analytics/refinery@f033576] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f0335766] | [production] | 
            
  | 16:42 | <ottomata> | deploying refinery for weekly train | [analytics] | 
            
  | 16:41 | <ladsgroup@deploy1003> | Started scap sync-world: Backport for [[gerrit:1062736|Avoid primary DB query for non-talk page edits (T370304)]] | [production] | 
            
  | 16:28 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67318 and previous config saved to /var/cache/conftool/dbconfig/20240814-162854-arnaudb.json | [production] | 
            
  | 16:24 | <jayme@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2010.codfw.wmnet with OS bullseye | [production] | 
            
  | 16:13 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67317 and previous config saved to /var/cache/conftool/dbconfig/20240814-161350-arnaudb.json | [production] | 
            
  | 16:04 | <klausman@deploy1003> | helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. | [production] | 
            
  | 16:04 | <klausman@deploy1003> | helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. | [production] | 
            
  | 16:03 | <klausman@deploy1003> | helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. | [production] | 
            
  | 16:01 | <jayme@cumin1002> | START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye | [production] | 
            
  | 15:58 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67316 and previous config saved to /var/cache/conftool/dbconfig/20240814-155844-arnaudb.json | [production] | 
            
  | 15:48 | <ebernhardson@deploy1003> | helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply | [production] | 
            
  | 15:47 | <ebernhardson@deploy1003> | helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply | [production] | 
            
  | 15:43 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67315 and previous config saved to /var/cache/conftool/dbconfig/20240814-154338-arnaudb.json | [production] | 
            
  | 15:40 | <dani@deploy1003> | helmfile [codfw] DONE helmfile.d/services/miscweb: apply | [production] | 
            
  | 15:39 | <dani@deploy1003> | helmfile [codfw] START helmfile.d/services/miscweb: apply | [production] | 
            
  | 15:39 | <dani@deploy1003> | helmfile [eqiad] DONE helmfile.d/services/miscweb: apply | [production] | 
            
  | 15:39 | <dani@deploy1003> | helmfile [eqiad] START helmfile.d/services/miscweb: apply | [production] | 
            
  | 15:39 | <dani@deploy1003> | helmfile [staging] DONE helmfile.d/services/miscweb: apply | [production] | 
            
  | 15:39 | <dani@deploy1003> | helmfile [staging] START helmfile.d/services/miscweb: apply | [production] | 
            
  | 15:34 | <jayme@cumin1002> | START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye | [production] | 
            
  | 15:28 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1029 (re)pooling @ 16%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67314 and previous config saved to /var/cache/conftool/dbconfig/20240814-152833-arnaudb.json | [production] | 
            
  | 15:13 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1029 (re)pooling @ 8%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67312 and previous config saved to /var/cache/conftool/dbconfig/20240814-151328-arnaudb.json | [production] | 
            
  | 15:06 | <andrew@cloudcumin1001> | END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97) | [admin] | 
            
  | 15:05 | <andrew@cloudcumin1001> | START - Cookbook wmcs.ceph.osd.undrain_node | [admin] | 
            
  | 15:05 | <wmbot~fnegri@tools-bastion-13> | webservice restart after a user reported the web page was down | [tools.sal] | 
            
  | 15:05 | <andrew@cloudcumin1001> | END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97) | [admin] | 
            
  | 15:05 | <andrew@cloudcumin1001> | START - Cookbook wmcs.ceph.osd.undrain_node | [admin] | 
            
  | 14:59 | <klausman@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2010.codfw.wmnet | [production] | 
            
  | 14:58 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1029 (re)pooling @ 4%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67307 and previous config saved to /var/cache/conftool/dbconfig/20240814-145819-arnaudb.json | [production] | 
            
  | 14:53 | <klausman@cumin2002> | START - Cookbook sre.hosts.reboot-single for host ml-serve2010.codfw.wmnet | [production] | 
            
  | 14:49 | <jayme@cumin1002> | END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-main2010.codfw.wmnet with OS bookworm | [production] | 
            
  | 14:43 | <jayme@cumin1002> | START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bookworm | [production] | 
            
  | 14:43 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67305 and previous config saved to /var/cache/conftool/dbconfig/20240814-144314-arnaudb.json | [production] | 
            
  | 14:32 | <elukey@deploy1003> | helmfile [eqiad] DONE helmfile.d/services/thumbor: sync | [production] | 
            
  | 14:28 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67304 and previous config saved to /var/cache/conftool/dbconfig/20240814-142808-arnaudb.json | [production] | 
            
  | 14:27 | <elukey@deploy1003> | helmfile [eqiad] START helmfile.d/services/thumbor: sync | [production] | 
            
  | 14:22 | <elukey@deploy1003> | helmfile [codfw] DONE helmfile.d/services/thumbor: sync | [production] | 
            
  | 14:21 | <arnaudb@cumin1002> | dbctl commit (dc=all): 'es1 es1029 depooling for hdd hotswap', diff saved to https://phabricator.wikimedia.org/P67299 and previous config saved to /var/cache/conftool/dbconfig/20240814-142147-arnaudb.json | [production] | 
            
  | 14:21 | <ebernhardson@deploy1003> | Synchronized private/PrivateSettings.php: Update NetworkSession users list for T341332 (duration: 12m 33s) | [production] | 
            
  | 14:17 | <elukey@deploy1003> | helmfile [codfw] START helmfile.d/services/thumbor: sync | [production] | 
            
  | 13:55 | <elukey@deploy1003> | helmfile [staging] DONE helmfile.d/services/thumbor: sync | [production] | 
            
  | 13:55 | <elukey@deploy1003> | helmfile [staging] START helmfile.d/services/thumbor: sync | [production] | 
            
  | 13:52 | <hnowlan@deploy1003> | helmfile [codfw] DONE helmfile.d/services/thumbor: sync | [production] | 
            
  | 13:50 | <hnowlan@deploy1003> | helmfile [codfw] START helmfile.d/services/thumbor: sync | [production] | 
            
  | 13:33 | <kartik@deploy1003> | Finished scap sync-world: Backport for [[gerrit:1062696|Use the updated recommendation API from liftwing (T371465)]] (duration: 07m 51s) | [production] | 
            
  | 13:32 | <jayme@cumin1002> | END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-main2010.codfw.wmnet'] | [production] | 
            
  | 13:29 | <kartik@deploy1003> | kartik: Continuing with sync | [production] | 
            
  | 13:28 | <kartik@deploy1003> | kartik: Backport for [[gerrit:1062696|Use the updated recommendation API from liftwing (T371465)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) | [production] |