| 2021-10-01
      
  | 23:19 | <bd808@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . | [production] | 
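
  Deploys like the toolhub one above are driven by helmfile from the deployment host. A minimal sketch of the likely invocation, assuming the standard deployment-charts layout (the path and environment flag are assumptions, not taken from this log):

      # From the service's helmfile directory on the deploy host (path assumed)
      cd /srv/deployment-charts/helmfile.d/services/toolhub
      # Apply the 'main' release to the eqiad environment
      helmfile -e eqiad sync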
            
  | 22:27 | <mutante> | puppetmaster2001 - systemctl reset-failed | [production] | 
            
  | 22:16 | <mutante> | puppetmaster2001 - systemctl disable geoip_update_ipinfo.timer | [production] | 
            
  | 22:15 | <mutante> | puppetmaster2001 - sudo /usr/local/bin/geoipupdate_job after adding new shell command and timer - successfully downloaded enterprise database for T288844 | [production] | 
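
  A sketch of how one might run and verify a download job like this by hand; the script and timer names come from the entries above, while the verification commands are assumptions:

      # Run the download script once, manually
      sudo /usr/local/bin/geoipupdate_job
      # Inspect the timer that normally schedules it
      systemctl status geoip_update_ipinfo.timer
      systemctl list-timers geoip_update_ipinfo.timer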
            
  | 21:56 | <bd808@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . | [production] | 
            
  | 21:44 | <mutante> | puppetmasters - temp. disabling puppet one more time, now for a different deploy, to fetch an additional MaxMind database - T288844 | [production] | 
            
  | 21:19 | <mutante> | puppetmaster2001 - puppet removed cron sync_volatile and cron sync_ca - starting and verifying new timers: 'systemctl status sync-puppet-volatile', 'systemctl status sync-puppet-ca' T273673 | [production] | 
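
  A sketch of verifying such a cron-to-timer migration; the unit names are taken from the entry, and the pattern-based timer check is an assumption:

      # Check the service units named in the entry
      systemctl status sync-puppet-volatile sync-puppet-ca
      # Confirm the schedules are active
      systemctl list-timers 'sync-puppet-*'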
            
  | 21:12 | <mutante> | puppetmaster1002, puppetmaster1003, puppetmaster2002, puppetmaster2003: re-enabled puppet; they are backends. Backends don't have the sync cron/job/timer, so this was a noop as well, just like 1004/1005/2004/2005. This just leaves the actual change on 2001 - T273673 | [production] | 
            
  | 21:07 | <mutante> | puppetmaster1004, puppetmaster1005, puppetmaster2004, puppetmaster2005: re-enabled puppet, they are "insetup" role | [production] | 
            
  | 21:06 | <mbsantos@deploy1002> | Finished deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend (duration: 00m 54s) | [production] | 
            
  | 21:05 | <mbsantos@deploy1002> | Started deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend | [production] | 
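
  The Started/Finished pair above is emitted by scap. A minimal sketch of the likely flow on the deployment host (the repository path is an assumption):

      # From the service's deploy repository
      cd /srv/deployment/kartotherian/deploy
      # Deploy the checked-out revision; the message is logged automatically
      scap deploy 'tegola: reduce load to 50% during the weekend'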
            
  | 21:05 | <mutante> | puppetmaster1001 - re-enabled puppet, noop as expected, the passive host pulls from the active one, so only 2001 has the cron/job/timer | [production] | 
            
  | 21:05 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 21:02 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production] | 
            
  | 21:01 | <legoktm@deploy1002> | Synchronized wmf-config/CommonSettings.php: Revert "Have PdfHandler use Shellbox on Commons for 10% of requests" (duration: 00m 59s) | [production] | 
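
  A 'Synchronized wmf-config/...' line is the signature of a scap file sync. A sketch, assuming the conventional invocation from the staging directory:

      # From the MediaWiki staging area on the deploy host (path assumed)
      cd /srv/mediawiki-staging
      # Push one config file to the fleet; the message becomes the log line above
      scap sync-file wmf-config/CommonSettings.php 'Revert "Have PdfHandler use Shellbox on Commons for 10% of requests"'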
            
  | 20:58 | <mutante> | temp disabling puppet on puppetmasters - deploying gerrit:724115 (gerrit:723310) T273673 | [production] | 
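
  Fleet-wide puppet disables like this one are typically done with cumin plus the disable-puppet/enable-puppet wrappers. A sketch; the host selector and exact wrapper usage are assumptions:

      # Disable puppet with a reason so other operators can see why
      sudo cumin 'puppetmaster*' 'disable-puppet "deploying gerrit:724115 - T273673"'
      # ...merge and verify the change, then re-enable with the same reason
      sudo cumin 'puppetmaster*' 'enable-puppet "deploying gerrit:724115 - T273673"'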
            
  | 18:58 | <robh@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE | [production] | 
            
  | 18:56 | <robh@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE | [production] | 
            
  | 18:55 | <robh@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE | [production] | 
            
  | 18:53 | <robh@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE | [production] | 
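
  The downtime lines are produced by the sre.hosts.downtime cookbook on the cumin host. A sketch of a plausible invocation reconstructed from the logged output; the exact flag names are assumptions:

      # Silence alerts for 2 hours while the host is reimaged
      sudo cookbook sre.hosts.downtime --hours 2 --reason REIMAGE an-db1001.eqiad.wmnet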
            
  | 18:07 | <robh@cumin1001> | END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host an-db1001.eqiad.wmnet | [production] | 
            
  | 18:05 | <robh@cumin1001> | START - Cookbook sre.experimental.reimage for host an-db1001.eqiad.wmnet | [production] | 
            
  | 17:58 | <effie> | depool mw1025, mw1319, mw1312 for testing | [production] | 
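
  Depooling app servers is a conftool operation. A sketch using confctl's selector syntax; the hostnames are from the entry, the rest is an assumption about the usual procedure:

      # Mark each host as not pooled; set/pooled=yes reverses it
      sudo confctl select 'name=mw1025.eqiad.wmnet' set/pooled=no
      sudo confctl select 'name=mw1319.eqiad.wmnet' set/pooled=no
      sudo confctl select 'name=mw1312.eqiad.wmnet' set/pooled=no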
            
  | 16:20 | <dancy> | testing upcoming Scap 4.0.2 release on beta | [production] | 
            
  | 14:04 | <bblack> | C:envoyproxy (appservers and others): restarting envoyproxy | [production] | 
            
  | 14:04 | <bblack> | C:envoyproxy (appservers and others): ca-certificates updated via cumin to work around T292291 issues | [production] | 
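
  Both envoyproxy entries describe one cumin run over the C:envoyproxy Puppet class. A sketch of the likely shape; the package command and batch size are assumptions:

      # Refresh ca-certificates wherever the envoyproxy class is applied
      sudo cumin 'C:envoyproxy' 'apt-get update && apt-get install -y --only-upgrade ca-certificates'
      # Restart envoy in small batches to avoid dropping capacity all at once
      sudo cumin -b 10 'C:envoyproxy' 'systemctl restart envoyproxy'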
            
  | 13:45 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | [production] | 
            
  | 13:45 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | [production] | 
            
  | 13:23 | <bblack> | manually trying LE expired root workaround on mwdebug1001 with puppet disabled ... | [production] | 
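
  The 'LE expired root' is the DST Root CA X3 expiry of 2021-09-30 tracked in T292291. The widely used workaround is to blacklist the expired root and rebuild the trust store; a sketch assuming a Debian-style ca-certificates setup:

      # Exclude the expired DST Root CA X3 from the bundle ('!' marks it deselected)
      sudo sed -i 's|^mozilla/DST_Root_CA_X3.crt|!mozilla/DST_Root_CA_X3.crt|' /etc/ca-certificates.conf
      # Regenerate /etc/ssl/certs from the updated configuration
      sudo update-ca-certificates --fresh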
            
  | 13:12 | <gehel@cumin1001> | START - Cookbook sre.wdqs.data-reload | [production] | 
            
  | 13:11 | <gehel@cumin1001> | END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97) | [production] | 
            
  | 13:11 | <gehel@cumin1001> | START - Cookbook sre.wdqs.data-reload | [production] | 
            
  | 13:10 | <gehel@cumin1001> | START - Cookbook sre.wdqs.data-reload | [production] | 
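
  The START/ERROR/START sequence above shows the sre.wdqs.data-reload cookbook failing fast and being retried. As logged, the invocation is simply the cookbook name; any arguments it takes are elided in the log itself:

      # Run from the cumin host; END (ERROR) with exit_code=97 above marks a failed run
      sudo cookbook sre.wdqs.data-reload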
            
  | 11:42 | <jgiannelos@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . | [production] | 
            
  | 11:11 | <jynus> | manually migrating some vms out of ganeti1009 to avoid excessive memory pressure | [production] | 
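
  Moving VMs off a loaded Ganeti node uses the standard gnt-instance tooling on the cluster master. A sketch; the instance name is hypothetical:

      # Find instances whose primary node is the busy one
      sudo gnt-instance list -o name,pnode | grep ganeti1009
      # Live-migrate one of them to its secondary node (-f skips the prompt)
      sudo gnt-instance migrate -f examplevm.eqiad.wmnet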
            
  | 10:58 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17413 and previous config saved to /var/cache/conftool/dbconfig/20211001-105849-root.json | [production] | 
            
  | 10:57 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17412 and previous config saved to /var/cache/conftool/dbconfig/20211001-105735-root.json | [production] | 
            
  | 10:43 | <jgiannelos@deploy1002> | Finished deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 49s) | [production] | 
            
  | 10:43 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17411 and previous config saved to /var/cache/conftool/dbconfig/20211001-104345-root.json | [production] | 
            
  | 10:43 | <jgiannelos@deploy1002> | Started deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad | [production] | 
            
  | 10:42 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17410 and previous config saved to /var/cache/conftool/dbconfig/20211001-104232-root.json | [production] | 
            
  | 10:28 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17409 and previous config saved to /var/cache/conftool/dbconfig/20211001-102841-root.json | [production] | 
            
  | 10:27 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17408 and previous config saved to /var/cache/conftool/dbconfig/20211001-102728-root.json | [production] | 
            
  | 10:13 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17407 and previous config saved to /var/cache/conftool/dbconfig/20211001-101338-root.json | [production] | 
            
  | 10:12 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17406 and previous config saved to /var/cache/conftool/dbconfig/20211001-101224-root.json | [production] |
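
  The four commits per host above trace a standard gradual repool: 25, 50, 75, then 100%, with roughly fifteen minutes between steps. A sketch of one step, assuming dbctl's instance/config subcommands; the diff and previous-config paths in the log lines are written by the tool itself:

      # Raise the pooled percentage for one instance
      sudo dbctl instance db1177 pool -p 25
      # Commit the change with a message; dbctl saves the diff and prior config
      sudo dbctl config commit -m 'db1177 (re)pooling @ 25%: After upgrade'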