| 
      
        2024-04-25
      
     | 
  
    
  | 13:12 | 
  <wmbot~anticomposite@tools-bastion-13> | 
  deploy ed1afde | 
  [tools.stewardbots] | 
            
  | 13:09 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bookworm | 
  [production] | 
            
  | 13:08 | 
  <arnaudb@cumin1002> | 
  END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2155.codfw.wmnet with OS bullseye | 
  [production] | 
            
  | 13:07 | 
  <cgoubert@deploy1002> | 
  cgoubert: Continuing with sync | 
  [production] | 
            
  | 13:07 | 
  <cgoubert@deploy1002> | 
  cgoubert: Backport for [[gerrit:1020280|ClusterConfigTest: Add mw-on-k8s specific tests]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) | 
  [production] | 
            
  | 13:04 | 
  <wmbot~anticomposite@tools-bastion-13> | 
  SULWatcher/manage.sh restart # SULWatcher3 disconnected | 
  [tools.stewardbots] | 
            
  | 13:04 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bullseye | 
  [production] | 
            
  | 13:04 | 
  <cgoubert@deploy1002> | 
  Started scap: Backport for [[gerrit:1020280|ClusterConfigTest: Add mw-on-k8s specific tests]] | 
  [production] | 
            
  | 12:57 | 
  <dcaro@cloudcumin1001> | 
  END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api | 
  [tools] | 
            
  | 12:57 | 
  <dcaro@cloudcumin1001> | 
  START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api | 
  [tools] | 
            
  | 12:55 | 
  <dcaro@cloudcumin1001> | 
  END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api | 
  [toolsbeta] | 
            
  | 12:55 | 
  <dcaro@cloudcumin1001> | 
  START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api | 
  [toolsbeta] | 
            
  | 12:54 | 
  <wmbot~anticomposite@tools-bastion-13> | 
  ./stewardbots/StewardBot/manage.sh restart # SULWatcher3 not coming back up | 
  [tools.stewardbots] | 
            
  | 12:44 | 
  <arnaudb@cumin1002> | 
  END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2155.codfw.wmnet with OS bookworm | 
  [production] | 
            
  | 12:05 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bookworm | 
  [production] | 
            
  | 12:04 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'Depool db2155', diff saved to https://phabricator.wikimedia.org/P61211 and previous config saved to /var/cache/conftool/dbconfig/20240425-120409-arnaudb.json | 
  [production] | 
            
  | 12:03 | 
  <arnaudb@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2155,2187].codfw.wmnet with reason: T362746 | 
  [production] | 
            
  | 12:03 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 3:00:00 on db[2155,2187].codfw.wmnet with reason: T362746 | 
  [production] | 
            
  | 12:02 | 
  <root@cumin1002> | 
  END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1005.eqiad.wmnet with OS bullseye | 
  [production] | 
            
  | 11:37 | 
  <cgoubert@deploy1002> | 
  helmfile [eqiad] DONE helmfile.d/admin 'apply'. | 
  [production] | 
            
  | 11:37 | 
  <cgoubert@deploy1002> | 
  helmfile [eqiad] START helmfile.d/admin 'apply'. | 
  [production] | 
            
  | 11:37 | 
  <cgoubert@deploy1002> | 
  helmfile [codfw] DONE helmfile.d/admin 'apply'. | 
  [production] | 
            
  | 11:36 | 
  <cgoubert@deploy1002> | 
  helmfile [codfw] START helmfile.d/admin 'apply'. | 
  [production] | 
            
  | 11:20 | 
  <wmbot~bsadowski1@tools-bastion-13> | 
  Restarted StewardBot/SULWatcher because of a connection loss | 
  [tools.stewardbots] | 
            
  | 11:17 | 
  <root@cumin1002> | 
  START - Cookbook sre.hosts.reimage for host backup1005.eqiad.wmnet with OS bullseye | 
  [production] | 
            
  | 11:15 | 
  <root@cumin1002> | 
  END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1005.eqiad.wmnet with OS bookworm | 
  [production] | 
            
  | 11:10 | 
  <root@cumin1002> | 
  START - Cookbook sre.hosts.reimage for host backup1005.eqiad.wmnet with OS bookworm | 
  [production] | 
            
  | 10:38 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61208 and previous config saved to /var/cache/conftool/dbconfig/20240425-103802-arnaudb.json | 
  [production] | 
            
  | 10:22 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61207 and previous config saved to /var/cache/conftool/dbconfig/20240425-102255-arnaudb.json | 
  [production] | 
            
  | 10:07 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61206 and previous config saved to /var/cache/conftool/dbconfig/20240425-100748-arnaudb.json | 
  [production] | 
            
  | 09:55 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'db1241 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61205 and previous config saved to /var/cache/conftool/dbconfig/20240425-095459-arnaudb.json | 
  [production] | 
            
  | 09:52 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61204 and previous config saved to /var/cache/conftool/dbconfig/20240425-095242-arnaudb.json | 
  [production] | 
            
  | 09:48 | 
  <taavi> | 
  update pywikibot script image to v9.1.0 T363132 | 
  [tools] | 
            
  | 09:44 | 
  <wmbot~lucaswerkmeister@tools-bastion-13> | 
  Double IRC messages to other bridges | 
  [tools.bridgebot] | 
            
  | 09:39 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'db1241 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61203 and previous config saved to /var/cache/conftool/dbconfig/20240425-093954-arnaudb.json | 
  [production] | 
            
  | 09:37 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61202 and previous config saved to /var/cache/conftool/dbconfig/20240425-093735-arnaudb.json | 
  [production] | 
            
  | 09:36 | 
  <jelto@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org | 
  [production] | 
            
  | 09:29 | 
  <jelto@cumin1002> | 
  START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org | 
  [production] | 
            
  | 09:29 | 
  <jelto@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org | 
  [production] | 
            
  | 09:24 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'db1241 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61201 and previous config saved to /var/cache/conftool/dbconfig/20240425-092448-arnaudb.json | 
  [production] | 
            
  | 09:24 | 
  <btullis@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage | 
  [production] | 
            
  | 09:22 | 
  <jelto@cumin1002> | 
  START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org | 
  [production] | 
            
  | 09:22 | 
  <jelto@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org | 
  [production] | 
            
  | 09:22 | 
  <arnaudb@cumin1002> | 
  dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61200 and previous config saved to /var/cache/conftool/dbconfig/20240425-092229-arnaudb.json | 
  [production] | 
            
  | 09:21 | 
  <btullis@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage | 
  [production] | 
            
  | 09:18 | 
  <arnaudb@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1160.eqiad.wmnet with OS bookworm | 
  [production] | 
            
  | 09:17 | 
  <jmm@cumin2002> | 
  END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:cloudelastic | 
  [production] | 
            
  | 09:16 | 
  <jelto@cumin1002> | 
  START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org | 
  [production] | 
            
  | 09:16 | 
  <jayme@deploy1002> | 
  helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply | 
  [production] | 
            
  | 09:15 | 
  <jayme@deploy1002> | 
  helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply | 
  [production] |