| 2023-10-17 |
    
  | 13:47 | <marostegui@cumin1001> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2160.codfw.wmnet with OS bookworm | [production] | 
            
  | 13:46 | <hnowlan@deploy2002> | helmfile [staging] DONE helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 13:46 | <hnowlan@deploy2002> | helmfile [staging] START helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 13:40 | <tchin@deploy2002> | Started deploy [analytics/refinery@0d09fbd]: Regular analytics weekly train [analytics/refinery@0d09fbdc] | [production] | 
            
  | 13:40 | <jdrewniak@deploy2002> | Finished scap: Backport for [[gerrit:966528|Enable Vector readability survey on select wikis (T347208)]] (duration: 09m 50s) | [production] | 
            
  | 13:34 | <jdrewniak@deploy2002> | jdrewniak: Continuing with sync | [production] | 
            
  | 13:32 | <jdrewniak@deploy2002> | jdrewniak: Backport for [[gerrit:966528|Enable Vector readability survey on select wikis (T347208)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) | [production] | 
            
  | 13:30 | <jdrewniak@deploy2002> | Started scap: Backport for [[gerrit:966528|Enable Vector readability survey on select wikis (T347208)]] | [production] | 
            
  | 13:26 | <jdrewniak@deploy2002> | Backport cancelled. | [production] | 
            
  | 13:15 | <jdrewniak@deploy2002> | Backport cancelled. | [production] | 
            
  | 12:59 | <marostegui@cumin1001> | START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm | [production] | 
            
  | 12:49 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Repool db1119 T339185', diff saved to https://phabricator.wikimedia.org/P52995 and previous config saved to /var/cache/conftool/dbconfig/20231017-124916-root.json | [production] | 
            
  | 12:28 | <urandom> | Starting Cassandra decommission(s) of restbase1017 — | [production] | 
            
  | 11:52 | <arnaudb@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52994 and previous config saved to /var/cache/conftool/dbconfig/20231017-115217-arnaudb.json | [production] | 
            
  | 11:39 | <hnowlan@deploy2002> | helmfile [staging] DONE helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 11:38 | <arnaudb@cumin1001> | dbctl commit (dc=all): 'Depool db1126 T349077', diff saved to https://phabricator.wikimedia.org/P52993 and previous config saved to /var/cache/conftool/dbconfig/20231017-113809-arnaudb.json | [production] | 
            
  | 11:37 | <arnaudb@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P52992 and previous config saved to /var/cache/conftool/dbconfig/20231017-113711-arnaudb.json | [production] | 
            
  | 11:34 | <arnaudb@cumin1001> | dbctl commit (dc=all): 'Set db1126 with weight 275 T349077', diff saved to https://phabricator.wikimedia.org/P52991 and previous config saved to /var/cache/conftool/dbconfig/20231017-113432-arnaudb.json | [production] | 
            
  | 11:29 | <hnowlan@deploy2002> | helmfile [staging] START helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 11:27 | <hnowlan@deploy2002> | helmfile [staging] START helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 11:22 | <arnaudb@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P52990 and previous config saved to /var/cache/conftool/dbconfig/20231017-112204-arnaudb.json | [production] | 
            
  | 11:17 | <arnaudb@cumin1001> | dbctl commit (dc=all): 'Promote db1209 to s8 primary T349077', diff saved to https://phabricator.wikimedia.org/P52989 and previous config saved to /var/cache/conftool/dbconfig/20231017-111720-arnaudb.json | [production] | 
            
  | 11:12 | <arnaudb> | Starting s8 eqiad failover from db1126 to db1209 - T349077 | [production] | 
            
  | 11:06 | <arnaudb@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52988 and previous config saved to /var/cache/conftool/dbconfig/20231017-110658-arnaudb.json | [production] | 
            
  | 11:00 | <hnowlan@deploy2002> | helmfile [staging] DONE helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 10:59 | <hnowlan@deploy2002> | helmfile [staging] START helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 10:48 | <arnaudb@cumin1001> | dbctl commit (dc=all): 'Set db1209 with weight 0 T349077', diff saved to https://phabricator.wikimedia.org/P52987 and previous config saved to /var/cache/conftool/dbconfig/20231017-104839-arnaudb.json | [production] | 
            
  | 10:46 | <arnaudb@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349077 | [production] | 
            
  | 10:46 | <arnaudb@cumin1001> | START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349077 | [production] | 
            
  | 10:28 | <hnowlan@deploy2002> | helmfile [staging] DONE helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 10:28 | <hnowlan@deploy2002> | helmfile [staging] START helmfile.d/services/rest-gateway: apply | [production] | 
            
  | 09:59 | <hashar> | Deleted operations-puppet-catalog-compiler Jenkins job to replace it with a new job letting one pick the Puppet version(s) to compile against | T236373 | [production] | 
            
  | 09:58 | <arnaudb@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance | [production] | 
            
  | 09:58 | <arnaudb@cumin1001> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance | [production] | 
            
  | 09:58 | <btullis@cumin1001> | END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet | [production] | 
            
  | 09:58 | <btullis@cumin1001> | START - Cookbook sre.hosts.remove-downtime for an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet | [production] | 
            
  | 09:48 | <btullis@cumin1001> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet | [production] | 
            
  | 09:48 | <btullis@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671 | [production] | 
            
  | 09:47 | <btullis@cumin1001> | START - Cookbook sre.hosts.downtime for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671 | [production] | 
            
  | 09:42 | <btullis@cumin1001> | END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host an-airflow1007.eqiad.wmnet | [production] | 
            
  | 09:42 | <btullis@cumin1001> | START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet | [production] | 
            
  | 09:36 | <mfossati@deploy2002> | Finished deploy [airflow-dags/platform_eng@b010dae]: (no justification provided) (duration: 00m 46s) | [production] | 
            
  | 09:35 | <mfossati@deploy2002> | Started deploy [airflow-dags/platform_eng@b010dae]: (no justification provided) | [production] | 
            
  | 09:33 | <btullis@cumin1001> | START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet | [production] | 
            
  | 09:33 | <btullis@cumin1001> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet | [production] | 
            
  | 09:28 | <btullis@cumin1001> | START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet | [production] | 
            
  | 09:28 | <btullis@cumin1001> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1005.eqiad.wmnet | [production] | 
            
  | 09:26 | <arnaudb@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance | [production] | 
            
  | 09:26 | <arnaudb@cumin1001> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance | [production] | 
            
  | 09:24 | <btullis@cumin1001> | START - Cookbook sre.hosts.reboot-single for host an-airflow1005.eqiad.wmnet | [production] |