| 
      
        2024-04-25
      
     | 
  
    
  | 17:13 | 
  <ladsgroup@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 17:12 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P61218 and previous config saved to /var/cache/conftool/dbconfig/20240425-171218-ladsgroup.json | 
  [production] | 
            
  | 16:34 | 
  <mutante> | 
  releases1003 - docker and containerd restarted by manually starting wmf_auto_restart services | 
  [production] | 
            
  | 15:38 | 
  <dancy@deploy1002> | 
  Finished scap: Testing (duration: 08m 44s) | 
  [production] | 
            
  | 15:34 | 
  <mforns@deploy1002> | 
  Finished deploy [airflow-dags/analytics@b17acd0]: (no justification provided) (duration: 00m 27s) | 
  [production] | 
            
  | 15:33 | 
  <mforns@deploy1002> | 
  Started deploy [airflow-dags/analytics@b17acd0]: (no justification provided) | 
  [production] | 
            
  | 15:29 | 
  <dancy@deploy1002> | 
  Started scap: Testing | 
  [production] | 
            
  | 15:29 | 
  <arnaudb@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2155.codfw.wmnet with OS bookworm | 
  [production] | 
            
  | 15:27 | 
  <dancy@deploy1002> | 
  sync-world aborted: Testing (duration: 01m 33s) | 
  [production] | 
            
  | 15:26 | 
  <klausman@cumin1002> | 
  END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin1002 | 
  [production] | 
            
  | 15:25 | 
  <dancy@deploy1002> | 
  Started scap: Testing | 
  [production] | 
            
  | 15:12 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Depooling db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P61216 and previous config saved to /var/cache/conftool/dbconfig/20240425-151120-ladsgroup.json | 
  [production] | 
            
  | 15:12 | 
  <ladsgroup@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 15:12 | 
  <ladsgroup@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 15:11 | 
  <ladsgroup@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 15:11 | 
  <ladsgroup@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance | 
  [production] | 
            
  | 15:10 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P61215 and previous config saved to /var/cache/conftool/dbconfig/20240425-151041-ladsgroup.json | 
  [production] | 
            
  | 15:07 | 
  <klausman@cumin1002> | 
  START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin1002 | 
  [production] | 
            
  | 15:07 | 
  <arnaudb@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2155.codfw.wmnet with reason: host reimage | 
  [production] | 
            
  | 15:03 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on db2155.codfw.wmnet with reason: host reimage | 
  [production] | 
            
  | 14:59 | 
  <klausman@cumin1002> | 
  END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin1002 | 
  [production] | 
            
  | 14:55 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P61214 and previous config saved to /var/cache/conftool/dbconfig/20240425-145534-ladsgroup.json | 
  [production] | 
            
  | 14:53 | 
  <arnaudb@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db2187.codfw.wmnet with reason: Host has hardware issues | 
  [production] | 
            
  | 14:53 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 20:00:00 on db2187.codfw.wmnet with reason: Host has hardware issues | 
  [production] | 
            
  | 14:44 | 
  <arnaudb@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: Host has hardware issues | 
  [production] | 
            
  | 14:44 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: Host has hardware issues | 
  [production] | 
            
  | 14:41 | 
  <klausman@cumin1002> | 
  START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin1002 | 
  [production] | 
            
  | 14:40 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P61213 and previous config saved to /var/cache/conftool/dbconfig/20240425-144027-ladsgroup.json | 
  [production] | 
            
  | 14:29 | 
  <moritzm> | 
  installing Java 11 security updates | 
  [production] | 
            
  | 14:25 | 
  <ladsgroup@cumin1002> | 
  dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P61212 and previous config saved to /var/cache/conftool/dbconfig/20240425-142520-ladsgroup.json | 
  [production] | 
            
  | 14:21 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bookworm | 
  [production] | 
            
  | 14:15 | 
  <arnaudb@cumin1002> | 
  END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host db2155.codfw.wmnet with OS bookworm | 
  [production] | 
            
  | 14:10 | 
  <root@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1005.eqiad.wmnet with OS bullseye | 
  [production] | 
            
  | 14:10 | 
  <root@cumin1002> | 
  END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - root@cumin1002" | 
  [production] | 
            
  | 13:47 | 
  <claime> | 
  UTC afternoon backports window closed | 
  [production] | 
            
  | 13:45 | 
  <root@cumin1002> | 
  START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - root@cumin1002" | 
  [production] | 
            
  | 13:44 | 
  <cgoubert@deploy1002> | 
  Finished scap: Backport for [[gerrit:1024345|Set conflicting gadget settings for the Cite extension (T362771)]] (duration: 21m 33s) | 
  [production] | 
            
  | 13:28 | 
  <arnaudb@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2155.codfw.wmnet with reason: host reimage | 
  [production] | 
            
  | 13:26 | 
  <cgoubert@deploy1002> | 
  cgoubert and wmde-fisch: Continuing with sync | 
  [production] | 
            
  | 13:26 | 
  <cgoubert@deploy1002> | 
  cgoubert and wmde-fisch: Backport for [[gerrit:1024345|Set conflicting gadget settings for the Cite extension (T362771)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) | 
  [production] | 
            
  | 13:26 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on db2155.codfw.wmnet with reason: host reimage | 
  [production] | 
            
  | 13:23 | 
  <root@cumin1002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: host reimage | 
  [production] | 
            
  | 13:22 | 
  <cgoubert@deploy1002> | 
  Started scap: Backport for [[gerrit:1024345|Set conflicting gadget settings for the Cite extension (T362771)]] | 
  [production] | 
            
  | 13:20 | 
  <root@cumin1002> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: host reimage | 
  [production] | 
            
  | 13:19 | 
  <cgoubert@deploy1002> | 
  Finished scap: Backport for [[gerrit:1020280|ClusterConfigTest: Add mw-on-k8s specific tests]] (duration: 14m 54s) | 
  [production] | 
            
  | 13:09 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bookworm | 
  [production] | 
            
  | 13:08 | 
  <arnaudb@cumin1002> | 
  END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2155.codfw.wmnet with OS bullseye | 
  [production] | 
            
  | 13:07 | 
  <cgoubert@deploy1002> | 
  cgoubert: Continuing with sync | 
  [production] | 
            
  | 13:07 | 
  <cgoubert@deploy1002> | 
  cgoubert: Backport for [[gerrit:1020280|ClusterConfigTest: Add mw-on-k8s specific tests]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) | 
  [production] | 
            
  | 13:04 | 
  <arnaudb@cumin1002> | 
  START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bullseye | 
  [production] |