| 2023-04-05
      
    
  | 15:14 | <hnowlan@deploy2002> | helmfile [eqiad] START helmfile.d/services/thumbor: apply | [production] | 
            
  | 15:11 | <hnowlan@deploy2002> | helmfile [eqiad] START helmfile.d/services/thumbor: apply | [production] | 
            
  | 15:10 | <elukey@cumin1001> | END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS bullseye | [production] | 
            
  | 15:09 | <lucaswerkmeister-wmde@deploy2002> | lucaswerkmeister-wmde and phuedx: Backport for [[gerrit:905979|Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | [production] | 
            
  | 15:09 | <moritzm> | installing nodejs security updates on buster | [production] | 
            
  | 15:09 | <elukey@deploy2002> | helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync | [production] | 
            
  | 15:08 | <elukey@deploy2002> | helmfile [eqiad] START helmfile.d/services/eventgate-main: sync | [production] | 
            
  | 15:07 | <lucaswerkmeister-wmde@deploy2002> | Started scap: Backport for [[gerrit:905979|Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"]] | [production] | 
            
  | 15:05 | <hnowlan@deploy2002> | helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | [production] | 
            
  | 15:04 | <hnowlan@deploy2002> | helmfile [eqiad] START helmfile.d/services/thumbor: apply | [production] | 
            
  | 15:03 | <hnowlan@deploy2002> | helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | [production] | 
            
  | 15:03 | <hnowlan@deploy2002> | helmfile [eqiad] START helmfile.d/services/thumbor: apply | [production] | 
            
  | 14:54 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 14:51 | <elukey@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 14:48 | <dcausse@deploy2002> | helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | [production] | 
            
  | 14:48 | <hnowlan@deploy2002> | helmfile [eqiad] START helmfile.d/services/thumbor: apply | [production] | 
            
  | 14:48 | <dcausse@deploy2002> | helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply | [production] | 
            
  | 14:36 | <elukey@cumin1001> | START - Cookbook sre.ganeti.reimage for host kafka-test1009.eqiad.wmnet with OS bullseye | [production] | 
            
  | 14:33 | <elukey> | restart kafka on kafka-main1005 to pick up the new TLS certificate (PKI based) - T319372 | [production] | 
            
  | 14:31 | <elukey@cumin1001> | END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS bullseye | [production] | 
            
  | 14:31 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1005.eqiad.wmnet with reason: restart kafka, switch to PKI | [production] | 
            
  | 14:30 | <elukey@cumin1001> | START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1005.eqiad.wmnet with reason: restart kafka, switch to PKI | [production] | 
            
  | 14:14 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 14:14 | <cgoubert@deploy2002> | helmfile [staging] DONE helmfile.d/services/termbox: apply | [production] | 
            
  | 14:14 | <cgoubert@deploy2002> | helmfile [staging] START helmfile.d/services/termbox: apply | [production] | 
            
  | 14:11 | <jclark@cumin1001> | END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED | [production] | 
            
  | 14:11 | <elukey@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 14:00 | <elukey> | powercycle an-worker1132 | [production] | 
            
  | 13:58 | <elukey@cumin1001> | START - Cookbook sre.ganeti.reimage for host kafka-test1008.eqiad.wmnet with OS bullseye | [production] | 
            
  | 13:57 | <elukey@cumin1001> | END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet | [production] | 
            
  | 13:54 | <cgoubert@deploy2002> | helmfile [staging] DONE helmfile.d/services/termbox: apply | [production] | 
            
  | 13:54 | <cgoubert@deploy2002> | helmfile [staging] START helmfile.d/services/termbox: apply | [production] | 
            
  | 13:53 | <cgoubert@deploy2002> | helmfile [staging] DONE helmfile.d/services/termbox: apply | [production] | 
            
  | 13:53 | <cgoubert@deploy2002> | helmfile [staging] START helmfile.d/services/termbox: apply | [production] | 
            
  | 13:52 | <elukey@cumin1001> | START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet | [production] | 
            
  | 13:52 | <elukey@cumin1001> | END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet | [production] | 
            
  | 13:52 | <elukey> | restart kafka on kafka-main1004 to pick up the new TLS certificate (PKI based) - T319372 | [production] | 
            
  | 13:49 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1004.eqiad.wmnet with reason: restart kafka, switch to PKI | [production] | 
            
  | 13:48 | <elukey@cumin1001> | START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1004.eqiad.wmnet with reason: restart kafka, switch to PKI | [production] | 
            
  | 13:48 | <elukey@cumin1001> | START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet | [production] | 
            
  | 13:46 | <lucaswerkmeister-wmde@deploy2002> | Finished scap: Backport for [[gerrit:905601|VisualEditorFeatureUse sampling rate to 1 everywhere (T333168)]] (duration: 14m 47s) | [production] | 
            
  | 13:33 | <lucaswerkmeister-wmde@deploy2002> | lucaswerkmeister-wmde and phuedx: Backport for [[gerrit:905601|VisualEditorFeatureUse sampling rate to 1 everywhere (T333168)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet | [production] | 
            
  | 13:31 | <lucaswerkmeister-wmde@deploy2002> | Started scap: Backport for [[gerrit:905601|VisualEditorFeatureUse sampling rate to 1 everywhere (T333168)]] | [production] | 
            
  | 13:29 | <lucaswerkmeister-wmde@deploy2002> | Finished scap: Backport for [[gerrit:905261|mediawiki.edit_attempt: Ignore events from PHP MPC (T309985)]] (duration: 10m 52s) | [production] | 
            
  | 13:28 | <jclark@cumin1001> | START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED | [production] | 
            
  | 13:28 | <jclark@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 13:27 | <jclark@cumin1001> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 13:26 | <jclark@cumin1001> | END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED | [production] | 
            
  | 13:23 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46079 and previous config saved to /var/cache/conftool/dbconfig/20230405-132318-root.json | [production] | 
            
  | 13:21 | <jclark@cumin1001> | START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED | [production] |