| 
      
        2020-09-04
      
     | 
  
    
  | 22:15 | 
  <ryankemper> | 
  wdqs deploy complete, service is healthy | 
  [production] | 
            
  | 21:54 | 
  <ryankemper> | 
  `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` | 
  [production] | 
            
  | 21:52 | 
  <ryankemper> | 
  `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` | 
  [production] | 
            
  | 21:49 | 
  <ryankemper@deploy1001> | 
  Finished deploy [wdqs/wdqs@c7e6b35]: 0.3.47 (duration: 12m 55s) | 
  [production] | 
            
  | 21:37 | 
  <ryankemper> | 
  Tests on canary `wdqs1003` passing, beginning full wdqs deploy | 
  [production] | 
            
  | 21:36 | 
  <ryankemper@deploy1001> | 
  Started deploy [wdqs/wdqs@c7e6b35]: 0.3.47 | 
  [production] | 
            
  | 21:31 | 
  <ryankemper> | 
  `ryankemper@wdqs2002:~$ sudo systemctl restart wdqs-blazegraph` | 
  [production] | 
            
  | 21:06 | 
  <mutante> | 
  apt1001 - removed all libnginx-mod* packages except libnginx-mod-http-echo ; sudo apt-get autoremove ; run puppet ; restarted nginx - apt.wikimedia.org switched to nginx-light (T261962) | 
  [production] | 
            
  | 21:02 | 
  <mutante> | 
  apt1001 - remove all libnginx-mod* packages except libnginx-mod-http-echo | 
  [production] | 
            
  | 20:59 | 
  <mutante> | 
  apt2001 - sudo apt-get autoremove | 
  [production] | 
            
  | 20:51 | 
  <mutante> | 
  apt2001 - apt-get remove --purge libnginx* and run puppet to replace nginx-full with nginx-light (T261962) | 
  [production] | 
            
  | 20:43 | 
  <cmjohnson@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 20:41 | 
  <cmjohnson@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 20:39 | 
  <cmjohnson@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 20:38 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 20:38 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 20:36 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 20:36 | 
  <cmjohnson@cumin1001> | 
  END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | 
  [production] | 
            
  | 20:35 | 
  <cmjohnson@cumin1001> | 
  END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | 
  [production] | 
            
  | 20:34 | 
  <cmjohnson@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 20:32 | 
  <cmjohnson@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 20:31 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 20:31 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 20:30 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 20:30 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 20:05 | 
  <cmjohnson@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 20:04 | 
  <cmjohnson@cumin1001> | 
  END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | 
  [production] | 
            
  | 20:03 | 
  <cmjohnson@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 20:01 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 20:01 | 
  <cmjohnson@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 20:00 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 19:59 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 19:59 | 
  <cmjohnson@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 19:57 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 19:57 | 
  <cmjohnson@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 19:22 | 
  <mutante> | 
  Icinga - ACKing with sticky - alerts on test and dev hosts | 
  [production] | 
            
  | 18:10 | 
  <milimetric@deploy1001> | 
  Finished deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing (duration: 07m 35s) | 
  [production] | 
            
  | 18:02 | 
  <milimetric@deploy1001> | 
  Started deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing | 
  [production] | 
            
  | 10:31 | 
  <elukey@cumin1001> | 
  END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) | 
  [production] | 
            
  | 10:29 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Depool db1087 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12492 and previous config saved to /var/cache/conftool/dbconfig/20200904-102955-marostegui.json | 
  [production] | 
            
  | 10:28 | 
  <marostegui> | 
  Deploy MCR schema change on db1087 (sanitarium master); this will generate lag (probably a few days) on s8 labsdb hosts (T238966) | 
  [production] | 
            
  | 09:48 | 
  <marostegui> | 
  Restart prometheus-mysqld-exporter on db2125 | 
  [production] | 
            
  | 09:11 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hadoop.roll-restart-workers | 
  [production] | 
            
  | 08:58 | 
  <elukey@cumin1001> | 
  END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) | 
  [production] | 
            
  | 08:31 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hadoop.roll-restart-workers | 
  [production] | 
            
  | 08:29 | 
  <elukey> | 
  roll restart of the hadoop workers (test and analytics cluster) for openjdk upgrades | 
  [production] | 
            
  | 08:08 | 
  <moritzm> | 
  installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately) | 
  [production] | 
            
  | 07:30 | 
  <moritzm> | 
  installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately) | 
  [production] | 
            
  | 05:13 | 
  <marostegui> | 
  Deploy MCR schema change on s4 eqiad master T238966 | 
  [production] |