2020-05-08
| 21:45 | <bstorm_> | cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 T251598 | [production] |
| 21:45 | <bstorm_> | cleaned up wb_terms_no_longer_updated view on labsdb1012 T251598 | [production] |
| 21:33 | <bstorm_> | cleaning up wb_terms_no_longer_updated view on labsdb1009 T251598 | [production] |
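A minimal sketch of how such a leftover view can be dropped on a wiki-replica host; the _p database name and connection details are assumptions for illustration, not the exact statement that was run:

    # Hypothetical cleanup of the stale view on a wiki-replica host (database name assumed).
    sudo mysql -e "DROP VIEW IF EXISTS testwikidatawiki_p.wb_terms_no_longer_updated;"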
            
| 21:06 | <ottomata> | running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - T252203 | [production] |
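For reference, a preferred-replica election with the stock Kafka CLI looks roughly like the following; the ZooKeeper connection string is a placeholder and the wrapper actually used on the cluster may differ:

    # Trigger a preferred-replica election so partition leadership returns to the preferred brokers.
    # ZooKeeper address and chroot are placeholders.
    kafka-preferred-replica-election.sh --zookeeper zk1001.example.org:2181/kafka/jumbo-eqiad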
            
| 19:16 | <jhuneidi@deploy1001> | helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production'. | [production] |
| 19:12 | <jhuneidi@deploy1001> | helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production'. | [production] |
| 19:07 | <jhuneidi@deploy1001> | helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging'. | [production] |
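These helmfile syncs correspond roughly to the commands below, run per environment from the service's helmfile directory; the directory path and environment names are assumptions based on the log lines:

    # Hypothetical: apply the blubberoid release one environment at a time (path assumed).
    cd /srv/deployment-charts/helmfile.d/services/blubberoid
    helmfile -e staging sync    # then codfw and eqiad for the production release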
            
| 18:12 | <andrewbogott> | reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for T252121 | [production] |
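The reprepro copy logged above moves an existing package build between suites; copy takes the destination codename first, then the source codename, then the package names:

    # Copy prometheus-openstack-exporter from the stretch-wikimedia suite into buster-wikimedia.
    reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter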
            
| 17:59 | <marostegui> | Extend /srv by 500G on labsdb1011 T249188 | [production] |
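Growing /srv by 500G is typically an LVM resize; a sketch under the assumption of an LVM-backed /srv with an online-resizable filesystem (volume names are placeholders):

    # Hypothetical: extend the logical volume by 500G and grow the filesystem in the same step.
    sudo lvextend --resizefs -L +500G /dev/vg0/srv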
            
| 16:55 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] |
| 16:53 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime | [production] |
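The repeated START/END pairs are runs of the Icinga-downtime cookbook from the cumin hosts, with exit_code=99 marking the failed attempts. A sketch of the kind of invocation involved; the target host, duration and reason flags shown here are assumptions:

    # Hypothetical downtime-cookbook run from a cumin host (flags and target assumed).
    sudo cookbook sre.hosts.downtime --hours 4 -r "hardware maintenance" 'example1001.eqiad.wmnet'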
            
| 16:51 | <cmjohnson@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | [production] |
| 16:48 | <cmjohnson@cumin1001> | START - Cookbook sre.hosts.downtime | [production] |
| 16:39 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] |
| 16:37 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime | [production] |
| 16:14 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] |
| 16:12 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime | [production] |
| 15:43 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] |
| 15:41 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime | [production] |
| 15:36 | <ottomata> | starting kafka broker on kafka-jumbo1006; same issue on other brokers when they are leaders of the offending partitions - T252203 | [production] |
| 15:31 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] |
| 15:28 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime | [production] |
| 15:27 | <ottomata> | stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - T252203 | [production] |
| 14:50 | <otto@deploy1001> | Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s) | [production] |
| 14:50 | <otto@deploy1001> | Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only | [production] |
| 14:05 | <akosiaris> | T243106 undo experiment with DROP iptables rules this time around. Use mw1331, mw1348 | [production] |
| 13:22 | <vgutierrez> | rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - T249335 | [production] |
| 13:20 | <akosiaris> | T243106 redo experiment with DROP iptables rules this time around. Use mw1331, mw1348 | [production] |
| 13:16 | <akosiaris> | T243106 undo experiment with REJECT, DROP iptables rules now that we have envoy in the middle. Use mw1331, mw1348. Experiment completed successfully, no issues for the infrastructure. | [production] |
| 12:49 | <akosiaris> | T243106 redo experiment with REJECT, DROP iptables rules now that we have envoy in the middle. Use mw1331, mw1348 | [production] |
| 12:49 | <akosiaris> | T243106 redo experiment with REJECT, DROP iptables rules now that we have envoy in the middle | [production] |
| 11:49 | <hnowlan> | restarting cassandra on restbase2009 for java updates | [production] |
| 11:28 | <cmjohnson@cumin1001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] |
| 11:25 | <cmjohnson@cumin1001> | START - Cookbook sre.hosts.downtime | [production] |
| 11:08 | <akosiaris> | repool eqiad eventgate-analytics. Test concluded | [production] |
| 11:08 | <akosiaris@cumin1001> | conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics | [production] |
| 09:54 | <mutante> | temporarily disabling puppet on the puppetmasters to carefully switch them to the httpd module instead of the apache module, which we want to get rid of | [production] |
| 09:52 | <akosiaris> | depool eqiad eventgate-analytics for a test involving reinitializing the eqiad kubernetes cluster | [production] |
| 09:52 | <akosiaris@cumin1001> | conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics | [production] |
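The conftool actions for eventgate-analytics map onto confctl commands against the DNS discovery objects; a sketch of the depool/repool pair, with the object type and quoting being assumptions:

    # Hypothetical: depool eventgate-analytics in eqiad for the test, then repool it afterwards.
    sudo confctl --object-type discovery select 'dnsdisc=eventgate-analytics,name=eqiad' set/pooled=false
    sudo confctl --object-type discovery select 'dnsdisc=eventgate-analytics,name=eqiad' set/pooled=true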
            
| 09:51 | <akosiaris@cumin1001> | conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics | [production] |
| 09:45 | <oblivian@puppetmaster1001> | conftool action : set/ttl=10; selector: dnsdisc=eventgate-analytics.* | [production] |
| 08:20 | <vgutierrez> | rolling restart of ats-tls on esams - T249335 | [production] |
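A rolling restart of ats-tls across a site is usually driven one host at a time from cumin; a sketch assuming the TLS instance runs as trafficserver-tls.service and that a matching host alias exists:

    # Hypothetical rolling restart, one cache host at a time with a pause between batches.
    sudo cumin -b 1 -s 30 'A:cp-esams' 'systemctl restart trafficserver-tls.service'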
            
| 07:19 | <vgutierrez> | ats-tls restart on cp3050 and cp3052 (max_connections_active_in experiment) - T249335 | [production] |
| 07:07 | <mutante> | phabricator rmdir /var/run/phd/pid - empty and now unused | [production] |
| 07:01 | <moritzm> | installing php5 security updates | [production] |
| 05:27 | <marostegui@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | [production] |
| 05:24 | <marostegui@cumin1001> | START - Cookbook sre.hosts.downtime | [production] |
| 05:10 | <marostegui> | Upgrade pc1010 | [production] |
| 00:30 | <brennen@deploy1001> | rebuilt and synchronized wikiversions files: Revert all wikis except test to 1.35.0-wmf.30 for T252179 | [production] |
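The wikiversions revert amounts to editing the wikiversions map and syncing it out; a sketch of the sync step, assuming scap's sync-wikiversions subcommand with the log message reused as the reason:

    # Hypothetical: push the reverted wikiversions files to the cluster.
    scap sync-wikiversions "Revert all wikis except test to 1.35.0-wmf.30 for T252179"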
            
| 00:19 | <brennen> | rolling 1.35.0-wmf.31 train back to group0 for T252179 | [production] |