| 
      
        2018-04-16
      
     | 
  
    
  | 09:07 | 
  <gehel> | 
  starting rolling restart of wdqs100[35] and wdqs200[123] for kernel upgrade | 
  [production] | 
            
  | 09:05 | 
  <moritzm> | 
  pooled mw1276-mw1278 (API app server canaries running stretch) | 
  [production] | 
            
  | 08:49 | 
  <gehel> | 
  first manual run of populate_admin() for maps[12]001 - T190605 | 
  [production] | 
            
  | 08:47 | 
  <marostegui@tin> | 
  Synchronized wmf-config/db-eqiad.php: Restore db1114 original main traffic weight (duration: 00m 58s) | 
  [production] | 
            
  | 08:41 | 
  <moritzm> | 
  pooled mw1261-mw1264 (app server canaries running stretch) | 
  [production] | 
            
  | 08:29 | 
  <joal@tin> | 
  Finished deploy [analytics/refinery@27416a9]: Regular weekly deploy - Mostly bugfixes from previous week huge deploy (duration: 05m 27s) | 
  [production] | 
            
  | 08:25 | 
  <_joe_> | 
  depooling mw1223 for investigation too | 
  [production] | 
            
  | 08:23 | 
  <joal@tin> | 
  Started deploy [analytics/refinery@27416a9]: Regular weekly deploy - Mostly bugfixes from previous week huge deploy | 
  [production] | 
            
  | 08:17 | 
  <marostegui@tin> | 
  Synchronized wmf-config/db-eqiad.php: Slowly repool db1114 in API (duration: 00m 58s) | 
  [production] | 
            
  | 08:04 | 
  <elukey> | 
  restart hhvm on mw[1228,1234,1281-1287,1289,1290,1312-1314,1317,1339,1343,1345,1346,1348] - more than 50% cpu usage, prevention scheme for current high load | 
  [production] | 
            
  | 08:00 | 
  <marostegui@tin> | 
  Synchronized wmf-config/db-eqiad.php: Slowly repool db1114 (duration: 00m 58s) | 
  [production] | 
            
  | 07:49 | 
  <marostegui> | 
  Stop MySQL and reboot db1114 - T191996 | 
  [production] | 
            
  | 07:46 | 
  <marostegui@tin> | 
  Synchronized wmf-config/db-eqiad.php: Depool db1114 (duration: 00m 59s) | 
  [production] | 
            
  | 07:40 | 
  <vgutierrez@neodymium> | 
  conftool action : set/pooled=no; selector: name=achernar.wikimedia.org,service=pdns_recursor | 
  [production] | 
            
  | 07:39 | 
  <vgutierrez> | 
  Depool and reimage achernar.wikimedia.org - T187090 | 
  [production] | 
            
  | 07:27 | 
  <moritzm> | 
  installing perl security updates on Debian systems | 
  [production] | 
            
  | 06:45 | 
  <TimStarling> | 
  depooled mw1230 | 
  [production] | 
            
  | 06:38 | 
  <_joe_> | 
  repooling mw1230 | 
  [production] | 
            
  | 06:20 | 
  <marostegui> | 
  Drop table flow_subscription from x1 - T149936 | 
  [production] | 
            
  | 05:59 | 
  <elukey> | 
  restart hhvm on mw[1221,1233,1280,1347] - high load | 
  [production] | 
            
  | 05:55 | 
  <elukey> | 
  repool mw1341 after investigation | 
  [production] | 
            
  | 05:48 | 
  <elukey> | 
  restart hhvm on mw1225, 1315, 1316, 1340, 1341, 1342, 1347 - high load | 
  [production] | 
            
  | 05:42 | 
  <marostegui> | 
  Reload haproxy on dbproxy1010 | 
  [production] | 
            
  | 05:36 | 
  <elukey> | 
  restart hhvm on mw1226,27,32,88 - high load | 
  [production] | 
            
  | 05:35 | 
  <_joe_> | 
  depooling mw1341 to further debug the API issue | 
  [production] | 
            
  | 05:33 | 
  <marostegui> | 
  Deploy schema change on db1087 with replication (this will generate lag in labs) - T187089 T185128 T153182 | 
  [production] | 
            
  | 05:31 | 
  <marostegui@tin> | 
  Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 59s) | 
  [production] | 
            
  | 03:02 | 
  <l10nupdate@tin> | 
  scap sync-l10n completed (1.31.0-wmf.29) (duration: 11m 09s) | 
  [production] | 
            
  
    | 
      
        2018-04-15
      
     | 
  
    
  | 22:09 | 
  <ema> | 
  cp3037: restart varnish-be | 
  [production] | 
            
  | 21:45 | 
  <ema> | 
  cp3039: restart varnish-be | 
  [production] | 
            
  | 21:42 | 
  <elukey> | 
  restart hhvm on mw1286,1317,1339 - high load | 
  [production] | 
            
  | 21:31 | 
  <ema> | 
  cp3038: restart varnish-be | 
  [production] | 
            
  | 21:30 | 
  <ema> | 
  cp3036: restart varnish-be | 
  [production] | 
            
  | 20:52 | 
  <elukey> | 
  restart hhvm on mw13[43,45,46,48] - high load | 
  [production] | 
            
  | 20:48 | 
  <elukey> | 
  restart hhvm on mw13[12-14] - high load | 
  [production] | 
            
  | 20:45 | 
  <elukey> | 
  restart hhvm on mw[1285,1287,1289-1290] - high load | 
  [production] | 
            
  | 20:40 | 
  <_joe_> | 
  restart mw1344, high load | 
  [production] | 
            
  | 20:38 | 
  <elukey> | 
  restart hhvm on mw12[22,79,82] - high load | 
  [production] | 
            
  | 20:32 | 
  <elukey> | 
  restart hhvm on mw12[32-35] - high load | 
  [production] | 
            
  | 20:24 | 
  <elukey> | 
  restart hhvm on mw1229-31 - high load | 
  [production] | 
            
  | 20:24 | 
  <_joe_> | 
  restarted mw1280-4, high load | 
  [production] | 
            
  | 20:17 | 
  <elukey> | 
  restart hhvm on mw122[6-8] - high load | 
  [production] | 
            
  | 20:05 | 
  <elukey> | 
  restart hhvm on mw122[3,4] - high load | 
  [production] | 
            
  | 13:42 | 
  <elukey> | 
  restart hhvm on mw1227 due to high load (hhvm dump debug in /tmp/hhvm.44071.bt) | 
  [production] | 
            
  | 10:53 | 
  <elukey> | 
  powercycle mw1272 - not responsive to ssh, mgmt com2 console showing "[OK" and no tty | 
  [production] |