| 2024-04-26 |
    
  | 14:38 | <elukey@cumin1002> | START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2021.codfw.wmnet: Move to PKI TLS certs - elukey@cumin1002 | [production] | 
            
  | 14:15 | <btullis@cumin1002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephadm1001.eqiad.wmnet with OS bookworm | [production] | 
            
  | 14:10 | <eevans@cumin1002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1014.eqiad.wmnet | [production] | 
            
  | 14:03 | <btullis@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephadm1001.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 14:02 | <eevans@cumin1002> | START - Cookbook sre.hosts.reboot-single for host aqs1014.eqiad.wmnet | [production] | 
            
  | 13:57 | <btullis@cumin1002> | START - Cookbook sre.hosts.downtime for 2:00:00 on cephadm1001.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 13:48 | <jayme@cumin1002> | END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2003.codfw.wmnet to plain | [production] | 
            
  | 13:47 | <jayme@cumin1002> | START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2003.codfw.wmnet to plain | [production] | 
            
  | 13:45 | <btullis@cumin1002> | START - Cookbook sre.hosts.reimage for host cephadm1001.eqiad.wmnet with OS bookworm | [production] | 
            
  | 13:28 | <akosiaris@cumin1002> | conftool action : set/pooled=no; selector: name=elastic110[3-7]\.eqiad\.wmnet | [production] | 
            
  | 13:28 | <eoghan@cumin1002> | END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists2001.codfw.wmnet | [production] | 
            
  | 13:28 | <eoghan@cumin1002> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 13:27 | <eoghan@cumin1002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002" | [production] | 
            
  | 13:25 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . | [production] | 
            
  | 13:23 | <eoghan@cumin1002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002" | [production] | 
            
  | 13:21 | <eoghan@cumin1002> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 13:14 | <eoghan@cumin1002> | START - Cookbook sre.hosts.decommission for hosts lists2001.codfw.wmnet | [production] | 
            
  | 12:52 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . | [production] | 
            
  | 12:45 | <wmbot~bsadowski1@tools-bastion-13> | Restarted StewardBot/SULWatcher because of a connection loss | [tools.stewardbots] | 
            
  | 12:44 | <elukey@deploy1002> | helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | [production] | 
            
  | 12:44 | <elukey@deploy1002> | helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | [production] | 
            
  | 12:27 | <btullis@cumin1002> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host cephadm1001.eqiad.wmnet | [production] | 
            
  | 12:26 | <btullis@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephadm1001.eqiad.wmnet with OS bookworm | [production] | 
            
  | 12:21 | <ladsgroup@cumin1002> | dbctl commit (dc=all): 'Depooling db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P61251 and previous config saved to /var/cache/conftool/dbconfig/20240426-121951-ladsgroup.json | [production] | 
            
  | 12:20 | <ladsgroup@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance | [production] | 
            
  | 12:20 | <ladsgroup@cumin1002> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance | [production] | 
            
  | 12:19 | <ladsgroup@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P61250 and previous config saved to /var/cache/conftool/dbconfig/20240426-121939-ladsgroup.json | [production] | 
            
  | 12:04 | <ladsgroup@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P61249 and previous config saved to /var/cache/conftool/dbconfig/20240426-120431-ladsgroup.json | [production] | 
            
  | 11:53 | <claime> | Silencing all alerts matching parse1002.* for 4 days - T363086 | [production] | 
            
  | 11:53 | <moritzm> | uploaded debdeploy 0.0.99.14 to apt.wikimedia.org (for buster/bullseye/bookworm) | [production] | 
            
  | 11:50 | <jayme@cumin1002> | END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubestagemaster2003.codfw.wmnet | [production] | 
            
  | 11:50 | <jayme@cumin1002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2003.codfw.wmnet with OS bullseye | [production] | 
            
  | 11:49 | <ladsgroup@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P61248 and previous config saved to /var/cache/conftool/dbconfig/20240426-114923-ladsgroup.json | [production] | 
            
  | 11:43 | <btullis@cumin1002> | START - Cookbook sre.hosts.reimage for host cephadm1001.eqiad.wmnet with OS bookworm | [production] | 
            
  | 11:43 | <btullis@cumin1002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cephadm1001.eqiad.wmnet - btullis@cumin1002" | [production] | 
            
  | 11:43 | <btullis@cumin1002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cephadm1001.eqiad.wmnet - btullis@cumin1002" | [production] | 
            
  | 11:43 | <btullis@cumin1002> | END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephadm1001.eqiad.wmnet on all recursors | [production] | 
            
  | 11:42 | <btullis@cumin1002> | START - Cookbook sre.dns.wipe-cache cephadm1001.eqiad.wmnet on all recursors | [production] | 
            
  | 11:42 | <btullis@cumin1002> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 11:42 | <btullis@cumin1002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cephadm1001.eqiad.wmnet - btullis@cumin1002" | [production] | 
            
  | 11:39 | <btullis@cumin1002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cephadm1001.eqiad.wmnet - btullis@cumin1002" | [production] | 
            
  | 11:36 | <btullis@cumin1002> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 11:36 | <btullis@cumin1002> | START - Cookbook sre.ganeti.makevm for new host cephadm1001.eqiad.wmnet | [production] | 
            
  | 11:35 | <jayme@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage | [production] | 
            
  | 11:34 | <ladsgroup@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P61247 and previous config saved to /var/cache/conftool/dbconfig/20240426-113416-ladsgroup.json | [production] | 
            
  | 11:33 | <claime> | Forcing puppet run on O:alerting_host - T363086 | [production] | 
            
  | 11:32 | <jayme@cumin1002> | START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage | [production] | 
            
  | 11:29 | <claime> | Forcing puppet run on deploy server - T363086 | [production] | 
            
  | 11:28 | <claime> | Deactivating puppet for parse1002 - T363086 | [production] | 
            
  | 11:19 | <jayme@cumin1002> | START - Cookbook sre.hosts.reimage for host kubestagemaster2003.codfw.wmnet with OS bullseye | [production] |