| 
      
        2021-06-03
      
      §
     | 
  
    
  | 20:34 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 20:34 | 
  <ebernhardson@deploy1002> | 
  Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container | 
  [production] | 
            
  | 20:34 | 
  <ryankemper> | 
  T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 20:34 | 
  <ryankemper@cumin2002> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 19:58 | 
  <mutante> | 
  [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts | 
  [production] | 
            
  | 19:56 | 
  <mutante> | 
  [mwmaint1002:~] $ sudo systemctl start  daily_account_consistency_check.service | 
  [production] | 
            
  | 19:41 | 
  <dzahn@cumin1001> | 
  END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org | 
  [production] | 
            
  | 19:41 | 
  <dzahn@cumin1001> | 
  START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org | 
  [production] | 
            
  | 19:39 | 
  <ebernhardson@deploy1002> | 
  Finished deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs (duration: 04m 27s) | 
  [production] | 
            
  | 19:37 | 
  <dzahn@cumin1001> | 
  END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5001.wikimedia.org | 
  [production] | 
            
  | 19:34 | 
  <ebernhardson@deploy1002> | 
  Started deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs | 
  [production] | 
            
  | 19:33 | 
  <mutante> | 
  [deneb:~] $ sudo systemctl start docker-reporter-releng-images - T251918 -  icinga-wm> RECOVERY - Check systemd state on deneb is OK | 
  [production] | 
            
  | 19:33 | 
  <ryankemper@cumin2002> | 
  END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | 
  [production] | 
            
  | 19:32 | 
  <ryankemper@cumin1001> | 
  END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | 
  [production] | 
            
  | 19:32 | 
  <mutante> | 
  [deneb:~] $ sudo systemctl start docker-reporter-releng-images | 
  [production] | 
            
  | 19:28 | 
  <ryankemper> | 
  T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 19:27 | 
  <ryankemper@cumin2002> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 19:27 | 
  <ryankemper> | 
  T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 19:27 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 19:23 | 
  <dzahn@cumin1001> | 
  START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org | 
  [production] | 
            
  | 19:14 | 
  <mutante> | 
  install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers T164456 | 
  [production] | 
            
  | 19:05 | 
  <ryankemper@cumin2002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 19:03 | 
  <ryankemper@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 19:03 | 
  <ryankemper@cumin2002> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 19:01 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 18:52 | 
  <ebernhardson@deploy1002> | 
  Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s) | 
  [production] | 
            
  | 18:51 | 
  <ebernhardson@deploy1002> | 
  Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter | 
  [production] | 
            
  | 18:46 | 
  <ryankemper> | 
  T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 18:46 | 
  <ryankemper> | 
  T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 18:39 | 
  <ryankemper> | 
  [WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on) | 
  [production] | 
            
  | 18:37 | 
  <ryankemper> | 
  [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph on the host has been locked up for ~16 hours based off of https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1622683465757&to=1622745461547) | 
  [production] | 
            
  | 18:37 | 
  <dzahn@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729 | 
  [production] | 
            
  | 18:37 | 
  <dzahn@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729 | 
  [production] | 
            
  | 18:28 | 
  <mutante> | 
  temp. disabling puppet on install* servers. switching nginx to light variant (T164456) | 
  [production] | 
            
  | 18:16 | 
  <ebernhardson@deploy1002> | 
  Finished deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter (duration: 00m 15s) | 
  [production] | 
            
  | 18:16 | 
  <ebernhardson@deploy1002> | 
  Started deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter | 
  [production] | 
            
  | 17:49 | 
  <robh@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 17:47 | 
  <robh@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 17:47 | 
  <robh@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 17:45 | 
  <robh@cumin1001> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 17:37 | 
  <brennen> | 
  gitlab1001: re-running install-gitlab-server.sh | 
  [production] | 
            
  | 17:16 | 
  <urandom> | 
  remove dropped Cassandra keyspace snapshots -- T258414 | 
  [production] | 
            
  | 16:55 | 
  <ejegg> | 
  updated payments-wiki from 6fac77f60e to 7be0534b91 | 
  [production] | 
            
  | 16:23 | 
  <ayounsi@cumin1001> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 15:49 | 
  <topranks> | 
  Gerrit 697993: Change BGP peer IP for doh3002 on esams CRs. | 
  [production] | 
            
  | 15:27 | 
  <papaul> | 
  PDU replacement complete | 
  [production] | 
            
  | 15:25 | 
  <moritzm> | 
  upgrading gitlab to 13.11.5 | 
  [production] | 
            
  | 15:08 | 
  <papaul> | 
  disconnect ps2-d8-codfw for replacement | 
  [production] | 
            
  | 14:55 | 
  <oblivian@deploy1002> | 
  helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | 
  [production] | 
            
  | 14:54 | 
  <topranks> | 
  Gerrit 697970: Add Wikidough BGP peerings on esams CRs for doh3001 and doh3002. | 
  [production] |