| 
      
        2019-08-19
      
      §
     | 
  
    
  | 09:26 | 
  <marostegui@cumin1001> | 
  END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | 
  [production] | 
            
  | 09:24 | 
  <marostegui@cumin1001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 08:57 | 
  <godog> | 
  add 100G to graphite1004 / graphite2003 /srv LVs | 
  [production] | 
            
  | 07:59 | 
  <onimisionipe> | 
  shutdown elastic2050 to prepare for mgmt reset - T230597 | 
  [production] | 
            
  | 07:40 | 
  <marostegui> | 
  Redact napwikisource on db1124 and db2094 - T210762 | 
  [production] | 
            
  | 07:19 | 
  <moritzm> | 
  installing golang-1.11 security updates on buster | 
  [production] | 
            
  | 07:08 | 
  <moritzm> | 
  installing ffmpeg security updates on buster | 
  [production] | 
            
  | 06:37 | 
  <vgutierrez> | 
  upgrading acme-chief to version 0.20 on production servers - T229096 | 
  [production] | 
            
  | 06:30 | 
  <vgutierrez@puppetmaster1001> | 
  conftool action : set/pooled=yes; selector: name=ncredir1001.eqiad.wmnet | 
  [production] | 
            
  | 06:29 | 
  <vgutierrez@puppetmaster1001> | 
  conftool action : set/pooled=no; selector: name=ncredir1001.eqiad.wmnet | 
  [production] | 
            
  | 06:28 | 
  <vgutierrez@puppetmaster1001> | 
  conftool action : set/pooled=yes; selector: name=ncredir1002.eqiad.wmnet | 
  [production] | 
            
  | 06:27 | 
  <vgutierrez@puppetmaster1001> | 
  conftool action : set/pooled=no; selector: name=ncredir1002.eqiad.wmnet | 
  [production] | 
            
  | 06:26 | 
  <moritzm> | 
  installing ghostscript security updates on scb/proton/notebook* hosts | 
  [production] | 
            
  | 06:25 | 
  <vgutierrez@puppetmaster1001> | 
  conftool action : set/pooled=yes; selector: name=ncredir2001.codfw.wmnet | 
  [production] | 
            
  | 06:25 | 
  <vgutierrez@puppetmaster1001> | 
  conftool action : set/pooled=no; selector: name=ncredir2001.codfw.wmnet | 
  [production] | 
            
  | 06:24 | 
  <vgutierrez@puppetmaster1001> | 
  conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet | 
  [production] | 
            
  | 06:22 | 
  <vgutierrez@puppetmaster1001> | 
  conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet | 
  [production] | 
            
  | 06:21 | 
  <vgutierrez> | 
  rolling upgrade of nginx in ncredir hosts | 
  [production] | 
            
  | 06:03 | 
  <moritzm> | 
  installing php5 security updates | 
  [production] | 
            
  | 05:51 | 
  <marostegui@deploy1001> | 
  Synchronized wmf-config/db-eqiad.php: Remove db2067 from config T230705 (duration: 00m 47s) | 
  [production] | 
            
  | 05:50 | 
  <marostegui@deploy1001> | 
  Synchronized wmf-config/db-codfw.php: Remove db2067 from config T230705 (duration: 00m 50s) | 
  [production] | 
            
  | 05:46 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Depool db2067, will be moved to m1 T230705', diff saved to https://phabricator.wikimedia.org/P8930 and previous config saved to /var/cache/conftool/dbconfig/20190819-054606-marostegui.json | 
  [production] | 
            
  | 05:29 | 
  <elukey> | 
  reboot cp2004 due to bnx2x crash (kern.log saved into my home on the host if needed) | 
  [production] | 
            
  
    | 
      
        2019-08-16
      
      §
     | 
  
    
  | 19:48 | 
  <sbassett> | 
  Deployed security patch for T230576 (ex:MobileFrontend) | 
  [production] | 
            
  | 18:57 | 
  <@> | 
  helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging'. | 
  [production] | 
            
  | 16:38 | 
  <XioNoX> | 
  add BGP sessions to Scaleway (AS12876) in esams | 
  [production] | 
            
  | 16:12 | 
  <elukey> | 
  upload prometheus-druid-exporter 0.7-1 to stretch/buster-wikimedia | 
  [production] | 
            
  | 15:42 | 
  <elukey> | 
  roll restart of druid broker/historicals to pick up new logging/metrics settings | 
  [production] | 
            
  | 14:39 | 
  <onimisionipe> | 
  run `bmc-device --cold-reset; echo $?` in elastic2050 hoping it resets mgmt interface - T230597 | 
  [production] | 
            
  | 14:24 | 
  <gehel> | 
  rolling reboot of cloudelastic | 
  [production] | 
            
  | 13:52 | 
  <mholloway-shell@deploy1001> | 
  Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision (beta): Request labels targeting Beta Wikidata (duration: 00m 50s) | 
  [production] | 
            
  | 08:18 | 
  <_joe_> | 
  stopping php on phab1003, to restart it with systemd | 
  [production] | 
            
  | 06:50 | 
  <_joe_> | 
  upgrading envoyproxy across production (http2 CVEs) | 
  [production] | 
            
  | 02:51 | 
  <vgutierrez> | 
  repooling cp5002, running compress.so experiment | 
  [production] | 
            
  
    | 
      
        2019-08-15
      
      §
     | 
  
    
  | 23:35 | 
  <smalyshev@deploy1001> | 
  Finished deploy [wdqs/wdqs@b4da6e4]: Rollback blazegraph due to T230588 (duration: 09m 48s) | 
  [production] | 
            
  | 23:25 | 
  <smalyshev@deploy1001> | 
  Started deploy [wdqs/wdqs@b4da6e4]: Rollback blazegraph due to T230588 | 
  [production] | 
            
  | 21:54 | 
  <smalyshev@deploy1001> | 
  Finished deploy [wdqs/wdqs@fce8177]: Weekly deploy (duration: 25m 28s) | 
  [production] | 
            
  | 21:28 | 
  <smalyshev@deploy1001> | 
  Started deploy [wdqs/wdqs@fce8177]: Weekly deploy | 
  [production] | 
            
  | 21:27 | 
  <ebernhardson> | 
  finish restarting cloudelastic-chi-eqiad with -XX:NewRatio=3 | 
  [production] | 
            
  | 21:18 | 
  <ebernhardson> | 
  increase cloudelastic indices.recovery.max_bytes_per_sec from 40mbit to 512mbit as these have 10G networking | 
  [production] | 
            
  | 21:07 | 
  <ebernhardson> | 
  restart cloudelastic1002 with -XX:NewRatio=3 to match cloudelastic1001 | 
  [production] | 
            
  | 20:22 | 
  <gehel@cumin1001> | 
  END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | 
  [production] | 
            
  | 19:37 | 
  <ema> | 
  depool cp5002 during the EU night, running compress.so experiment | 
  [production] | 
            
  | 19:28 | 
  <gehel@cumin1001> | 
  END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0) | 
  [production] | 
            
  | 19:19 | 
  <sbassett> | 
  Deployed security patch for T230402 (1.34.0-wmf.17) | 
  [production] | 
            
  | 19:18 | 
  <gehel@cumin1001> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 19:18 | 
  <sbassett> | 
  Deployed security patch for T229541 (1.34.0-wmf.17) | 
  [production] | 
            
  | 19:17 | 
  <gehel@cumin1001> | 
  END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) | 
  [production] | 
            
  | 19:17 | 
  <gehel@cumin1001> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] |