| 
      
        2017-10-29
      
     | 
  
    
  | 23:49 | 
  <ema> | 
  powercycle cp4024 | 
  [production] | 
            
  | 22:31 | 
  <ariel@tin> | 
  Finished deploy [dumps/dumps@2aa2275]: fix keep setting to work with overrides (duration: 00m 02s) | 
  [production] | 
            
  | 22:31 | 
  <ariel@tin> | 
  Started deploy [dumps/dumps@2aa2275]: fix keep setting to work with overrides | 
  [production] | 
            
  | 17:55 | 
  <ariel@tin> | 
  Finished deploy [dumps/dumps@d8978ce]: add overrides section processing to config file (duration: 00m 04s) | 
  [production] | 
            
  | 17:55 | 
  <ariel@tin> | 
  Started deploy [dumps/dumps@d8978ce]: add overrides section processing to config file | 
  [production] | 
            
  | 17:23 | 
  <ariel@tin> | 
  Finished deploy [dumps/dumps@d426cf7]: batch 7z jobs, multistream job fixup (duration: 00m 02s) | 
  [production] | 
            
  | 17:23 | 
  <ariel@tin> | 
  Started deploy [dumps/dumps@d426cf7]: batch 7z jobs, multistream job fixup | 
  [production] | 
            
  | 12:54 | 
  <ema> | 
  cp4026: restart varnish-be for mbox lag | 
  [production] | 
            
  
    | 
      
        2017-10-28
      
     | 
  
    
  | 21:03 | 
  <bblack> | 
  cp1067 (current target cache): disabling the relatively-new VCL that sets do_stream=false if !CL on applayer fetches... | 
  [production] | 
            
  | 19:39 | 
  <hoo@tin> | 
  Synchronized wmf-config/CommonSettings.php: Halve the Flow -> Parsoid timeout (100s -> 50s) (T179156) (duration: 00m 51s) | 
  [production] | 
            
  | 19:39 | 
  <bblack> | 
  backend restart on cp1065 | 
  [production] | 
            
  | 18:39 | 
  <bblack> | 
  restarting varnish backend on cp1053 to move the lag/503 issues to another box and buy more time to debug | 
  [production] | 
            
  | 18:28 | 
  <bblack> | 
  cp4025 - restart backend for mailbox lag (upload@ulsfo, unrelated to text-cluster issues) | 
  [production] | 
            
  | 18:21 | 
  <bblack> | 
  cp1053 - manual VCL change, backends appservers+api_appservers, reduce connect/firstbyte/betweenbytes timeouts from 5/180/60 to 3/20/10 | 
  [production] | 
            
  | 16:51 | 
  <elukey> | 
  restart varnish backend on cp1055 - mailbox lag + T179156 | 
  [production] | 
            
  | 12:14 | 
  <elukey@puppetmaster1001> | 
  conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet | 
  [production] | 
            
  | 12:10 | 
  <elukey> | 
  manually killed (SIGTERM) hhvm on mw1313 - high load, hhvm-dump-debug not responsive | 
  [production] | 
            
  | 12:01 | 
  <elukey@puppetmaster1001> | 
  conftool action : set/pooled=no; selector: name=mw1313.eqiad.wmnet | 
  [production] | 
            
  | 11:53 | 
  <elukey> | 
  restart hhvm on mw1285 - hhvm-dump-debug in /tmp/hhvm.17700.bt | 
  [production] | 
            
  | 11:24 | 
  <hoo@tin> | 
  Synchronized wmf-config/Wikibase-labs.php: Consistency sync (duration: 00m 50s) | 
  [production] | 
            
  | 10:52 | 
  <volans> | 
  restarted pdfrender on scb1001, was stuck since 2d with AssertionError: display is not set! | 
  [production] | 
            
  
    | 
      
        2017-10-27
      
     | 
  
    
  | 20:54 | 
  <MaxSem> | 
  running migratePreferences.php on group2 wikis | 
  [production] | 
            
  | 19:09 | 
  <hoo> | 
  Ran scap pull on mwdebug1001 | 
  [production] | 
            
  | 18:20 | 
  <awight@tin> | 
  Finished deploy [ores/deploy@185170f]: Test pip-9 scap trick on ores1002 (non-production) (duration: 02m 17s) | 
  [production] | 
            
  | 18:18 | 
  <awight@tin> | 
  Started deploy [ores/deploy@185170f]: Test pip-9 scap trick on ores1002 (non-production) | 
  [production] | 
            
  | 17:54 | 
  <hoo> | 
  Taking mwdebug1001 to do tests regarding T179156 | 
  [production] | 
            
  | 16:38 | 
  <gehel> | 
  re-enabling wdqs-updater | 
  [production] | 
            
  | 16:16 | 
  <bblack> | 
  cp1054 varnish backend restarted (was 503s / bad-conns target of ongoing issues) | 
  [production] | 
            
  | 16:16 | 
  <gehel> | 
  wdqs updater is now stopped for real | 
  [production] | 
            
  | 16:10 | 
  <XioNoX> | 
  deactivating BGP sessions to Zayo in eqiad (flapping) | 
  [production] | 
            
  | 15:58 | 
  <gehel> | 
  disabling wdqs updater on all nodes | 
  [production] | 
            
  | 15:50 | 
  <hoo@tin> | 
  Synchronized wmf-config/Wikibase-production.php: Disable constraints check with SPARQL for now (T179156) (duration: 00m 50s) | 
  [production] | 
            
  | 15:48 | 
  <marostegui> | 
  Compress InnoDB on db2038 (s6) - T178359 | 
  [production] | 
            
  | 15:46 | 
  <bblack> | 
  restart varnish-backend on cp4022 (upload@ulsfo) - mailbox | 
  [production] | 
            
  | 14:49 | 
  <bblack> | 
  turn on cp4024 port on asw-ulsfo | 
  [production] | 
            
  | 13:52 | 
  <bblack> | 
  reboot cp4021 to clean up oom messes | 
  [production] | 
            
  | 13:49 | 
  <bblack> | 
  restarting nginx on cp4021, without NUMA memory constraints | 
  [production] | 
            
  | 12:10 | 
  <marostegui> | 
  Optimize commonswiki.templatelinks on dbstore1001 - T162789 | 
  [production] | 
            
  | 12:03 | 
  <elukey> | 
  execute systemctl reset-failed kafka-mirror-main-eqiad_to_jumbo-eqiad.service on kafka-jumbo hosts (old unit not deployed anymore) | 
  [production] | 
            
  | 11:41 | 
  <mutante> | 
  gerrit back - maintenance over | 
  [production] | 
            
  | 11:39 | 
  <mutante> | 
  gerrit restart to apply gerrit:386793 is imminent | 
  [production] | 
            
  | 11:36 | 
  <ema> | 
  cp4023: varnish-backend-restart for lag | 
  [production] | 
            
  | 11:06 | 
  <mobrovac@tin> | 
  Finished deploy [citoid/deploy@ff63420]: Update dependencies (duration: 03m 18s) | 
  [production] |