| 2022-04-05 |
    
  | 01:59 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: T305423 | [production] | 
            
  | 01:59 | <sukhe@cumin2002> | START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: T305423 | [production] | 
            
  | 01:57 | <eileen> | process control config revision changed from 06379640 to 25728a0e | [production] | 
            
  | 01:51 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24078 and previous config saved to /var/cache/conftool/dbconfig/20220405-015114-ladsgroup.json | [production] | 
            
  | 01:47 | <sukhe@cumin2002> | END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp5002.eqsin.wmnet | [production] | 
            
  | 01:42 | <eileen> | civicrm revision changed from 84c737b6 to 87bc3114 | [production] | 
            
  | 01:37 | <eileen> | config revision changed from bb0e1af3 to 06379640 | [production] | 
            
  | 01:36 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24077 and previous config saved to /var/cache/conftool/dbconfig/20220405-013609-ladsgroup.json | [production] | 
            
  | 01:15 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet | [production] | 
            
  | 01:07 | <sukhe@cumin2002> | START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet | [production] | 
            
  | 01:06 | <sukhe@cumin2002> | START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet | [production] | 
            
  | 01:02 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3063.esams.wmnet | [production] | 
            
  | 00:58 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4034.ulsfo.wmnet | [production] | 
            
  | 00:53 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5016.eqsin.wmnet | [production] | 
            
  | 00:53 | <sukhe@cumin2002> | START - Cookbook sre.hosts.reboot-single for host cp3063.esams.wmnet | [production] | 
            
  | 00:51 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1084.eqiad.wmnet | [production] | 
            
  | 00:51 | <sukhe@cumin2002> | START - Cookbook sre.hosts.reboot-single for host cp4034.ulsfo.wmnet | [production] | 
            
  | 00:50 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2042.codfw.wmnet | [production] | 
            
  | 00:43 | <sukhe@cumin2002> | START - Cookbook sre.hosts.reboot-single for host cp5016.eqsin.wmnet | [production] | 
            
  | 00:42 | <sukhe@cumin2002> | START - Cookbook sre.hosts.reboot-single for host cp1084.eqiad.wmnet | [production] | 
            
  | 00:42 | <sukhe@cumin2002> | START - Cookbook sre.hosts.reboot-single for host cp2042.codfw.wmnet | [production] | 
            
  | 00:40 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4032.ulsfo.wmnet | [production] | 
            
  | 00:39 | <mutante> | gitlab1001 - mv 1648814678_2022_04_01_14.9.1_gitlab_backup.tar and other files from April 2nd/April 3rd over from /srv/gitlab-backup to /mnt/gitlab-backup to prevent another outage due to disk space T274463 | [production] | 
            
  | 00:36 | <mutante> | gitlab2001 - apt-get clean to prevent disk space issues | [production] | 
            
  | 00:34 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24076 and previous config saved to /var/cache/conftool/dbconfig/20220405-003419-ladsgroup.json | [production] | 
            
  | 00:34 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 00:34 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 00:34 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24075 and previous config saved to /var/cache/conftool/dbconfig/20220405-003405-ladsgroup.json | [production] | 
            
  | 00:33 | <sukhe@cumin2002> | START - Cookbook sre.hosts.reboot-single for host cp4032.ulsfo.wmnet | [production] | 
            
  | 00:33 | <dzahn@cumin2002> | conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet | [production] | 
            
  | 00:33 | <dzahn@cumin2002> | conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1047.eqiad.wmnet | [production] | 
            
  | 00:32 | <mutante> | gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover... T274463 - <+icinga-wm> RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK | [production] | 
            
  | 00:30 | <mutante> | gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover... | [production] | 
            
  | 00:27 | <dzahn@cumin2002> | conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1048.eqiad.wmnet | [production] | 
            
  | 00:23 | <mutante> | wtp1046, wtp1047, wtp1048 - rebooting, one at a time | [production] | 
            
  | 00:21 | <dzahn@cumin2002> | conftool action : set/pooled=no; selector: dc=eqiad,name=wtp104[6-8].eqiad.wmnet | [production] | 
            
  | 00:19 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24074 and previous config saved to /var/cache/conftool/dbconfig/20220405-001900-ladsgroup.json | [production] | 
            
  | 00:18 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5012.eqsin.wmnet | [production] | 
            
  | 00:17 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3062.esams.wmnet | [production] | 
            
  | 00:16 | <sukhe@cumin2002> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1083.eqiad.wmnet | [production] | 
            
  | 00:03 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24073 and previous config saved to /var/cache/conftool/dbconfig/20220405-000355-ladsgroup.json | [production] | 
            
  
    | 2022-04-04 |
    
  | 23:51 | <mutante> | apt1001 - importing gitlab-runner package for bullseye via: 'sudo -E reprepro --noskipold  --component thirdparty/gitlab-runner update bullseye-wikimedia' after gerrit:767604 (T297659) | [production] | 
            
  | 23:48 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24072 and previous config saved to /var/cache/conftool/dbconfig/20220404-234850-ladsgroup.json | [production] | 
            
  | 22:48 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24071 and previous config saved to /var/cache/conftool/dbconfig/20220404-224836-ladsgroup.json | [production] | 
            
  | 22:48 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 22:48 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance | [production] | 
            
  | 22:48 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24070 and previous config saved to /var/cache/conftool/dbconfig/20220404-224828-ladsgroup.json | [production] | 
            
  | 22:33 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24069 and previous config saved to /var/cache/conftool/dbconfig/20220404-223323-ladsgroup.json | [production] | 
            
  | 22:18 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24068 and previous config saved to /var/cache/conftool/dbconfig/20220404-221818-ladsgroup.json | [production] | 
            
  | 22:03 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24067 and previous config saved to /var/cache/conftool/dbconfig/20220404-220313-ladsgroup.json | [production] |