| 2020-11-16 |
    
  | 14:06 | <marostegui> | Restart pc1007's mysql T266483 | [production] | 
            
  | 14:06 | <marostegui@deploy1001> | Synchronized wmf-config/db-eqiad.php: Depool pc1007 and place pc1010 instead of it T266483 (duration: 01m 00s) | [production] | 
            
  | 13:23 | <hnowlan@cumin1001> | END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) | [production] | 
            
  | 13:00 | <kormat> | running schema change against s1 in codfw T259831 | [production] | 
            
  | 12:59 | <kormat@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | [production] | 
            
  | 12:59 | <kormat@cumin1001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 12:43 | <moritzm> | installing tcpdump security updates | [production] | 
            
  | 12:35 | <hnowlan@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | [production] | 
            
  | 12:35 | <hnowlan@cumin1001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 12:25 | <hnowlan@cumin1001> | START - Cookbook sre.cassandra.roll-restart | [production] | 
            
  | 12:25 | <hnowlan> | roll-restarting restbase-codfw | [production] | 
            
  | 12:24 | <hnowlan@cumin1001> | END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) | [production] | 
            
  | 12:10 | <hnowlan@cumin1001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] | 
            
  | 12:10 | <hnowlan@cumin1001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 11:49 | <hnowlan> | roll restarting sessionstore for java updates | [production] | 
            
  | 11:49 | <hnowlan@cumin1001> | START - Cookbook sre.cassandra.roll-restart | [production] | 
            
  | 11:13 | <moritzm> | installing poppler security updates | [production] | 
            
  | 10:46 | <klausman@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | [production] | 
            
  | 10:46 | <klausman@cumin1001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 10:45 | <dcaro@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | [production] | 
            
  | 10:45 | <dcaro@cumin1001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 10:44 | <dcaro@cumin1001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] | 
            
  | 10:44 | <dcaro@cumin1001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 09:31 | <gehel@cumin2001> | END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99) | [production] | 
            
  | 09:31 | <gehel@cumin2001> | START - Cookbook sre.elasticsearch.force-shard-allocation | [production] | 
            
  | 08:39 | <godog> | centrallog1001 move invalid config /etc/logrotate.d/logrotate-debug to /etc | [production] | 
            
  | 08:35 | <moritzm> | installing codemirror-js security updates | [production] | 
            
  | 08:32 | <XioNoX> | asw-c-codfw> request system power-off member 7 - T267865 | [production] | 
            
  | 08:24 | <joal@deploy1001> | Finished deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb] (duration: 00m 07s) | [production] | 
            
  | 08:23 | <joal@deploy1001> | Started deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb] | [production] | 
            
  | 08:23 | <joal@deploy1001> | Finished deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb] (duration: 10m 09s) | [production] | 
            
  | 08:13 | <joal@deploy1001> | Started deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb] | [production] | 
            
  | 08:08 | <XioNoX> | asw-c-codfw> request system power-off member 7 - T267865 | [production] | 
            
  | 06:35 | <marostegui> | Stop replication on s3 codfw master (db2105) for MCR schema change deployment T238966 | [production] | 
            
  | 06:14 | <marostegui> | Stop MySQL on es1018, es1015, es1019 to clone es1032, es1033, es1034 - T261717 | [production] | 
            
  | 06:06 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Depool es1018, es1015, es1019 - T261717', diff saved to https://phabricator.wikimedia.org/P13262 and previous config saved to /var/cache/conftool/dbconfig/20201116-060624-marostegui.json | [production] | 
            
  | 06:02 | <marostegui> | Restart mysql on db1115 (tendril/dbtree) due to memory usage | [production] | 
            
  | 00:55 | <shdubsh> | re-applied mask to kafka and kafka-mirror-main-eqiad_to_main-codfw@0 on kafka-main2003 and disabled puppet to prevent restart - T267865 | [production] | 
            
  | 00:19 | <elukey> | run 'systemctl mask kafka' and 'systemctl mask kafka-mirror-main-eqiad_to_main-codfw@0' on kafka-main2003 (for the brief moment when it was up) to avoid purged issues - T267865 | [production] | 
            
  | 00:09 | <elukey> | sudo cumin 'cp2028* or cp2036* or cp2039* or cp4022* or cp4025* or cp4028* or cp4031*' 'systemctl restart purged' -b 3 - T267865 | [production] | 
            
  
| 2020-11-15 |
    
  | 22:10 | <cdanis> | restart some purgeds in ulsfo as well T267865 T267867 | [production] | 
            
  | 22:03 | <cdanis> | T267867 T267865 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin -b2 -s10 'A:cp and A:codfw' 'systemctl restart purged' | [production] | 
            
  | 14:00 | <cdanis> | powercycling ms-be1022 via mgmt | [production] | 
            
  | 11:21 | <aborrero@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | [production] | 
            
  | 11:21 | <aborrero@cumin1001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 11:12 | <vgutierrez> | depooling lvs2007, lvs2010 taking over text traffic on codfw - T267865 | [production] | 
            
  | 10:00 | <elukey> | cumin 'cp2042* or cp2036* or cp2039*' 'systemctl restart purged' -b 1 | [production] | 
            
  | 09:57 | <elukey> | restart purged on cp4028 (consumer stuck due to kafka-main2003 down) | [production] | 
            
  | 09:55 | <elukey> | restart purged on cp4025 (consumer stuck due to kafka-main2003 down) | [production] | 
            
  | 09:53 | <elukey> | restart purged on cp4031 (consumer stuck due to kafka-main2003 down) | [production] |