| 
      
        2021-03-24
      
     | 
  
    
  | 07:50 | 
  <jayme@cumin1001> | 
  conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-logging-external | 
  [production] | 
            
  | 07:50 | 
  <jayme@cumin1001> | 
  conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=zotero | 
  [production] | 
            
  | 07:41 | 
  <elukey@cumin1001> | 
  END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet | 
  [production] | 
            
  | 07:40 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15061 and previous config saved to /var/cache/conftool/dbconfig/20210324-074050-root.json | 
  [production] | 
            
  | 07:37 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15060 and previous config saved to /var/cache/conftool/dbconfig/20210324-073718-root.json | 
  [production] | 
            
  | 07:27 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet | 
  [production] | 
            
  | 07:23 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P15059 and previous config saved to /var/cache/conftool/dbconfig/20210324-072319-marostegui.json | 
  [production] | 
            
  | 07:22 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15058 and previous config saved to /var/cache/conftool/dbconfig/20210324-072214-root.json | 
  [production] | 
            
  | 07:20 | 
  <elukey@cumin1001> | 
  END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ml-etcd2002.codfw.wmnet | 
  [production] | 
            
  | 07:10 | 
  <elukey@cumin1001> | 
  START - Cookbook sre.hosts.decommission for hosts ml-etcd2002.codfw.wmnet | 
  [production] | 
            
  | 07:09 | 
  <moritzm> | 
  installing squid security updates | 
  [production] | 
            
  | 06:35 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Add db1181 to dbctl, depooled T275633', diff saved to https://phabricator.wikimedia.org/P15057 and previous config saved to /var/cache/conftool/dbconfig/20210324-063459-marostegui.json | 
  [production] | 
            
  | 06:24 | 
  <root@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1084.eqiad.wmnet | 
  [production] | 
            
  | 06:14 | 
  <root@cumin1001> | 
  START - Cookbook sre.hosts.decommission for hosts db1084.eqiad.wmnet | 
  [production] | 
            
  | 05:52 | 
  <marostegui@cumin1001> | 
  dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P15056 and previous config saved to /var/cache/conftool/dbconfig/20210324-055246-marostegui.json | 
  [production] | 
            
  | 04:44 | 
  <ryankemper@cumin1001> | 
  END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99) | 
  [production] | 
            
  | 03:41 | 
  <ryankemper> | 
  T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots` | 
  [production] | 
            
  | 03:41 | 
  <ryankemper> | 
  T274204 Restarting `codfw` restart; the timestamp argument should prevent it from wasting time on nodes that have been rebooted already | 
  [production] | 
            
  | 03:40 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.elasticsearch.rolling-upgrade | 
  [production] | 
            
  | 03:39 | 
  <ryankemper> | 
  T274204 Timed out waiting for write queues to empty: `[59/60, retrying in 60.00s] Attempt to run 'spicerack.elasticsearch_cluster.ElasticsearchClusters.wait_for_all_write_queues_empty' raised: Write queue not empty (had value of 241631) for partition 0 of topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite.` | 
  [production] | 
            
  | 03:38 | 
  <ryankemper@cumin1001> | 
  END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99) | 
  [production] | 
            
  | 02:38 | 
  <ryankemper> | 
  T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots` | 
  [production] | 
            
  | 02:31 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.elasticsearch.rolling-upgrade | 
  [production] | 
            
  | 01:59 | 
  <ryankemper> | 
  T274204 For now I'll proceed to the reboots of `codfw` | 
  [production] | 
            
  | 01:58 | 
  <ryankemper> | 
  T274204 `ctrl+c`'d out of run; relforge is relying on outdated config that is trying to talk to `relforge1002` which no longer exists. Need to refactor so that config no longer lives in spicerack | 
  [production] | 
            
  | 01:58 | 
  <ryankemper@cumin1001> | 
  END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade-reboot (exit_code=97) | 
  [production] | 
            
  | 01:49 | 
  <ryankemper> | 
  T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade-reboot relforge "relforge cluster restarts" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T01:45:59+00:00` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots` | 
  [production] | 
            
  | 01:48 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.elasticsearch.rolling-upgrade-reboot | 
  [production] | 
            
  | 01:36 | 
  <eileen> | 
  civicrm revision changed from f36a0b08f0 to ad430721f6, config revision is 26b02db7ba | 
  [production] | 
            
  | 00:22 | 
  <pt1979@cumin2001> | 
  END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 00:18 | 
  <pt1979@cumin2001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 00:18 | 
  <pt1979@cumin2001> | 
  END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 00:16 | 
  <pt1979@cumin2001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  
    | 
      
        2021-03-23
      
     | 
  
    
  | 22:59 | 
  <robh@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 22:57 | 
  <robh@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 22:33 | 
  <dwisehaupt> | 
  pushing 60f9baaf50b to fundraising hosts which will enable ssl by default for mysql client connections that use the host my.cnf file - T170321 | 
  [production] | 
            
  | 22:19 | 
  <ebernhardson@deploy1002> | 
  Finished deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace (duration: 02m 07s) | 
  [production] | 
            
  | 22:17 | 
  <ebernhardson@deploy1002> | 
  Started deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace | 
  [production] | 
            
  | 22:09 | 
  <dzahn@cumin1001> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 22:05 | 
  <dzahn@cumin1001> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 21:27 | 
  <ppchelko@deploy1002> | 
  Finished deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint (duration: 17m 58s) | 
  [production] | 
            
  | 21:09 | 
  <ppchelko@deploy1002> | 
  Started deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint | 
  [production] | 
            
  | 21:04 | 
  <robh@cumin1001> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 21:00 | 
  <robh@cumin1001> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 20:59 | 
  <robh@cumin1001> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 20:40 | 
  <eileen> | 
  civicrm revision changed from 39d24e8b0a to f36a0b08f0, config revision is 26b02db7ba | 
  [production] | 
            
  | 20:24 | 
  <robh@cumin1001> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 20:24 | 
  <robh@cumin1001> | 
  END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) | 
  [production] | 
            
  | 20:21 | 
  <robh@cumin1001> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 20:13 | 
  <robh@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts auth1002.eqiad.wmnet | 
  [production] |