| 
      
        2021-07-26
      
     | 
  
    
  | 10:52 | 
  <ladsgroup@deploy1002> | 
  Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org) | 
  [production] | 
            
  | 10:51 | 
  <jynus> | 
  deploying 10 second mw user query limit on s3 codfw replicas | 
  [production] | 
            
  | 10:49 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json | 
  [production] | 
            
  | 10:46 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json | 
  [production] | 
            
  | 10:46 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json | 
  [production] | 
            
  | 10:38 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json | 
  [production] | 
            
  | 10:33 | 
  <oblivian@deploy1002> | 
  helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | 
  [production] | 
            
  | 09:55 | 
  <jgiannelos@deploy1002> | 
  helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . | 
  [production] | 
            
  | 09:15 | 
  <XioNoX> | 
  rollback sampling for T286038 | 
  [production] | 
            
  | 08:31 | 
  <jmm@cumin2002> | 
  END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet | 
  [production] | 
            
  | 08:27 | 
  <jmm@cumin2002> | 
  START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet | 
  [production] | 
            
  | 08:26 | 
  <jmm@cumin2002> | 
  END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet | 
  [production] | 
            
  | 08:11 | 
  <jmm@cumin2002> | 
  START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet | 
  [production] | 
            
  | 07:18 | 
  <_joe_> | 
  docker-image prune on deneb T287222 | 
  [production] | 
            
  | 07:17 | 
  <_joe_> | 
  manage-production-images prune on deneb, T287222 | 
  [production] | 
            
  | 07:08 | 
  <marostegui> | 
  Optimize dewiki.logging in eqiad (there will be lag) | 
  [production] | 
            
  | 06:39 | 
  <moritzm> | 
  installing krb5 security updates | 
  [production] | 
            
  | 05:55 | 
  <Amir1> | 
  start cleaning up auto-review flagged revs logs in plwiki | 
  [production] | 
            
  
    | 
      
        2021-07-23
      
     | 
  
    
  | 19:11 | 
  <topranks> | 
  Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - T287110 | 
  [production] | 
            
  | 19:02 | 
  <topranks> | 
  De-pooling eqiad again after successful replacement of linecard in cr2-codfw T287110 | 
  [production] | 
            
  | 18:26 | 
  <legoktm@deploy1002> | 
  helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' . | 
  [production] | 
            
  | 18:24 | 
  <legoktm@deploy1002> | 
  helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' . | 
  [production] | 
            
  | 18:14 | 
  <topranks> | 
  Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0. | 
  [production] | 
            
  | 18:12 | 
  <legoktm@deploy1002> | 
  helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' . | 
  [production] | 
            
  | 16:15 | 
  <effie> | 
  enable puppet on mc-gp* hosts | 
  [production] | 
            
  | 15:47 | 
  <papaul> | 
  powerdown wdqs2002 for IDRAC reset | 
  [production] | 
            
  | 15:45 | 
  <elukey@deploy1002> | 
  helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 15:44 | 
  <elukey@deploy1002> | 
  helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | 
  [production] | 
            
  | 15:11 | 
  <elukey> | 
  stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - T287238 | 
  [production] | 
            
  | 14:36 | 
  <_joe_> | 
  rebuilding httpd-fcgi, mediawiki-http fixing logging T285384 | 
  [production] | 
            
  | 14:16 | 
  <brennen> | 
  gitlab1001: running ansible to deploy [[gerrit:707236|fix puma exporter listen address]] (T275170) | 
  [production] | 
            
  | 13:35 | 
  <otto@deploy1002> | 
  Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - T271232 (duration: 03m 32s) | 
  [production] | 
            
  | 13:31 | 
  <otto@deploy1002> | 
  Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - T271232 | 
  [production] | 
            
  | 12:16 | 
  <jelto@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309 | 
  [production] | 
            
  | 12:16 | 
  <jelto@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309 | 
  [production] | 
            
  | 12:15 | 
  <jelto@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309 | 
  [production] | 
            
  | 12:15 | 
  <jelto@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309 | 
  [production] | 
            
  | 11:50 | 
  <marostegui> | 
  Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - T287244 | 
  [production] | 
            
  | 11:17 | 
  <dzahn@cumin1001> | 
  conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet | 
  [production] | 
            
  | 11:17 | 
  <dzahn@cumin1001> | 
  conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet | 
  [production] | 
            
  | 11:11 | 
  <dzahn@cumin1001> | 
  conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet | 
  [production] | 
            
  | 11:11 | 
  <dzahn@cumin1001> | 
  conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet | 
  [production] | 
            
  | 11:00 | 
  <dzahn@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host | 
  [production] | 
            
  | 11:00 | 
  <dzahn@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host | 
  [production] | 
            
  | 10:58 | 
  <arturo> | 
  adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001 | 
  [production] | 
            
  | 10:02 | 
  <dzahn@cumin1001> | 
  conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet | 
  [production] | 
            
  | 09:57 | 
  <dzahn@cumin1001> | 
  conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet | 
  [production] | 
            
  | 09:49 | 
  <dzahn@cumin1001> | 
  conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet | 
  [production] | 
            
  | 09:47 | 
  <dzahn@cumin1001> | 
  conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet | 
  [production] |