| 2024-08-04
      
    
  | 15:44 | <mnz@deploy1003> | Started deploy [airflow-dags/research@d573c40]: (no justification provided) | [production] | 
            
  | 11:37 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Depooling db1206 (T367856)', diff saved to https://phabricator.wikimedia.org/P67217 and previous config saved to /var/cache/conftool/dbconfig/20240804-113742-marostegui.json | [production] | 
            
  | 11:37 | <marostegui@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 11:37 | <marostegui@cumin1002> | START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 11:37 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367856)', diff saved to https://phabricator.wikimedia.org/P67216 and previous config saved to /var/cache/conftool/dbconfig/20240804-113720-marostegui.json | [production] | 
            
  | 11:22 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P67215 and previous config saved to /var/cache/conftool/dbconfig/20240804-112213-marostegui.json | [production] | 
            
  | 11:07 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P67214 and previous config saved to /var/cache/conftool/dbconfig/20240804-110706-marostegui.json | [production] | 
            
  | 10:51 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367856)', diff saved to https://phabricator.wikimedia.org/P67213 and previous config saved to /var/cache/conftool/dbconfig/20240804-105159-marostegui.json | [production] | 
            
  | 05:54 | <ryankemper> | [WDQS] Restart wdqs2010 to fix free allocators error | [production] | 
            
  
    | 2024-08-03
      
    
  | 16:53 | <ryankemper@cumin2002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1022.eqiad.wmnet with OS bullseye | [production] | 
            
  | 16:15 | <ryankemper@cumin2002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye | [production] | 
            
  | 15:09 | <wmbot~lucaswerkmeister@tools-bastion-13> | FTR, the manual update-index job completed 23m ago, so about 2½ hours of runtime apparently | [tools.cdnjs] | 
            
  | 12:17 | <wmbot~lucaswerkmeister@tools-bastion-13> | kubectl create job --from=cronjob/update-index update-index # manual run to see if it works better now | [tools.cdnjs] | 
            
  | 12:05 | <wmbot~lucaswerkmeister@tools-bastion-13> | rotated + compressed update-index.err again | [tools.cdnjs] | 
            
  | 12:04 | <wmbot~lucaswerkmeister@tools-bastion-13> | kubectl delete pod update-index-28687937-tqlqw # was apparently stuck for 16 days | [tools.cdnjs] | 
            
  | 10:54 | <wmbot~lucaswerkmeister@tools-bastion-13> | regenerated GitHub token (cdnjs-index/tokenfile) | [tools.cdnjs] | 
            
  | 10:03 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Depooling db1196 (T367856)', diff saved to https://phabricator.wikimedia.org/P67212 and previous config saved to /var/cache/conftool/dbconfig/20240803-100308-marostegui.json | [production] | 
            
  | 10:03 | <marostegui@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 10:02 | <marostegui@cumin1002> | START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 10:02 | <marostegui@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 10:02 | <marostegui@cumin1002> | START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 10:02 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367856)', diff saved to https://phabricator.wikimedia.org/P67211 and previous config saved to /var/cache/conftool/dbconfig/20240803-100228-marostegui.json | [production] | 
            
  | 09:47 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P67210 and previous config saved to /var/cache/conftool/dbconfig/20240803-094721-marostegui.json | [production] | 
            
  | 09:32 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P67209 and previous config saved to /var/cache/conftool/dbconfig/20240803-093214-marostegui.json | [production] | 
            
  | 09:17 | <marostegui@cumin1002> | dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367856)', diff saved to https://phabricator.wikimedia.org/P67208 and previous config saved to /var/cache/conftool/dbconfig/20240803-091707-marostegui.json | [production] | 
            
  | 03:09 | <jclark@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1260.eqiad.wmnet with OS bullseye | [production] | 
            
  | 02:50 | <ryankemper@cumin2002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1023.eqiad.wmnet with OS bullseye | [production] | 
            
  | 02:22 | <jclark@cumin1002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1269.eqiad.wmnet with OS bullseye | [production] | 
            
  | 02:22 | <jclark@cumin1002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" | [production] | 
            
  | 02:21 | <jclark@cumin1002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" | [production] | 
            
  | 02:15 | <jclark@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1266.eqiad.wmnet with OS bullseye | [production] | 
            
  | 02:05 | <jclark@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1269.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 02:02 | <jclark@cumin1002> | START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1269.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 01:53 | <ryankemper@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1022.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 01:50 | <ryankemper@cumin2002> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1022.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 01:49 | <jclark@cumin1002> | START - Cookbook sre.hosts.reimage for host wikikube-worker1260.eqiad.wmnet with OS bullseye | [production] | 
            
  | 01:48 | <jclark@cumin1002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1268.eqiad.wmnet with OS bullseye | [production] | 
            
  | 01:48 | <jclark@cumin1002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" | [production] | 
            
  | 01:48 | <jclark@cumin1002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" | [production] | 
            
  | 01:46 | <jclark@cumin1002> | START - Cookbook sre.hosts.reimage for host wikikube-worker1269.eqiad.wmnet with OS bullseye | [production] | 
            
  | 01:45 | <jclark@cumin1002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1267.eqiad.wmnet with OS bullseye | [production] | 
            
  | 01:45 | <jclark@cumin1002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" | [production] | 
            
  | 01:45 | <jclark@cumin1002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" | [production] | 
            
  | 01:37 | <jclark@cumin1002> | END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1260.eqiad.wmnet with OS bullseye | [production] | 
            
  | 01:30 | <jclark@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1268.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 01:29 | <ryankemper@cumin2002> | START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye | [production] | 
            
  | 01:28 | <ryankemper@cumin2002> | START - Cookbook sre.hosts.reimage for host wdqs1022.eqiad.wmnet with OS bullseye | [production] | 
            
  | 01:28 | <jclark@cumin1002> | START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1268.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 01:28 | <jclark@cumin1002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1267.eqiad.wmnet with reason: host reimage | [production] | 
            
  | 01:25 | <jclark@cumin1002> | START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1267.eqiad.wmnet with reason: host reimage | [production] |