| 
      
        2021-06-04
      
     | 
  
    
  | 04:43 | 
  <ryankemper@cumin2002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 04:41 | 
  <ryankemper@cumin2002> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 04:25 | 
  <ryankemper> | 
  T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2002.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 04:22 | 
  <ryankemper> | 
  T280382 `wdqs2001.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv` | 
  [production] | 
            
  | 03:49 | 
  <ryankemper@cumin1001> | 
  END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | 
  [production] | 
            
  | 02:42 | 
  <ryankemper@cumin2002> | 
  END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | 
  [production] | 
            
  | 02:33 | 
  <ryankemper> | 
  [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "repair overinflated wikidata jnl" --blazegraph_instance blazegraph` | 
  [production] | 
            
  | 02:32 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 02:30 | 
  <ryankemper> | 
  T280382 `wdqs1005.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv` | 
  [production] | 
            
  | 02:25 | 
  <ryankemper> | 
  [WDQS] `ryankemper@wdqs1012:~$ sudo pool` (caught up on lag) | 
  [production] | 
            
  | 02:09 | 
  <ryankemper> | 
  T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 02:06 | 
  <ebernhardson> | 
  post-deploy restart airflow-(webserver|scheduler) on an-airflow1001 | 
  [production] | 
            
  | 02:05 | 
  <ebernhardson@deploy1002> | 
  Finished deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift (duration: 04m 40s) | 
  [production] | 
            
  | 02:00 | 
  <ebernhardson@deploy1002> | 
  Started deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift | 
  [production] | 
            
  | 01:38 | 
  <ryankemper@cumin1001> | 
  END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | 
  [production] | 
            
  | 01:24 | 
  <ryankemper@cumin2002> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 00:12 | 
  <ryankemper@cumin2002> | 
  END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | 
  [production] | 
            
  | 00:08 | 
  <reedy@deploy1002> | 
  Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 57s) | 
  [production] | 
            
  | 00:07 | 
  <ryankemper> | 
  T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 00:06 | 
  <ryankemper@cumin2002> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 00:05 | 
  <ryankemper> | 
  T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 00:05 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 00:05 | 
  <ryankemper@cumin1001> | 
  END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) | 
  [production] | 
            
  
    | 
      
        2021-06-03
      
     | 
  
    
  | 23:41 | 
  <reedy@deploy1002> | 
  Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 56s) | 
  [production] | 
            
  | 23:40 | 
  <reedy@deploy1002> | 
  Synchronized wmf-config/InitialiseSettings.php: T280886 (duration: 00m 57s) | 
  [production] | 
            
  | 23:33 | 
  <mutante> | 
  installing OS on fresh VM doh5001 | 
  [production] | 
            
  | 23:30 | 
  <ryankemper@cumin2002> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 23:28 | 
  <ryankemper@cumin2002> | 
  START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE | 
  [production] | 
            
  | 23:09 | 
  <thcipriani@deploy1002> | 
  Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:694686|Restrict changetags to sysops and bots on meta]] T283625 (duration: 00m 58s) | 
  [production] | 
            
  | 22:41 | 
  <ryankemper> | 
  T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2001.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 22:39 | 
  <ryankemper> | 
  T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 22:39 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 22:36 | 
  <ryankemper> | 
  T280382 Cancelled transfer to `wdqs1005`; the source host `wdqs1013` has a `wikidata.jnl` that is 80% too big; will transfer from different node -> `wdqs1005` and then fix the journal on `wdqs1013` after | 
  [production] | 
            
  | 22:36 | 
  <ryankemper@cumin1001> | 
  END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) | 
  [production] | 
            
  | 22:35 | 
  <ryankemper> | 
  T280382 `wdqs2005.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv` | 
  [production] | 
            
  | 22:28 | 
  <robh@cumin1001> | 
  END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | 
  [production] | 
            
  | 22:15 | 
  <robh@cumin1001> | 
  START - Cookbook sre.dns.netbox | 
  [production] | 
            
  | 21:55 | 
  <ryankemper@cumin2002> | 
  END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | 
  [production] | 
            
  | 20:54 | 
  <shdubsh> | 
  restart kafka on kafka-logging to take new retention config | 
  [production] | 
            
  | 20:47 | 
  <sbassett> | 
  Deployed security patch for T282932 | 
  [production] | 
            
  | 20:37 | 
  <ebernhardson> | 
  restart mjolnir-kafka-bulk-daemon on search-loader[12]001 | 
  [production] | 
            
  | 20:35 | 
  <ebernhardson@deploy1002> | 
  Finished deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container (duration: 01m 00s) | 
  [production] | 
            
  | 20:34 | 
  <ryankemper> | 
  T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 20:34 | 
  <ryankemper@cumin1001> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 20:34 | 
  <ebernhardson@deploy1002> | 
  Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container | 
  [production] | 
            
  | 20:34 | 
  <ryankemper> | 
  T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage` | 
  [production] | 
            
  | 20:34 | 
  <ryankemper@cumin2002> | 
  START - Cookbook sre.wdqs.data-transfer | 
  [production] | 
            
  | 19:58 | 
  <mutante> | 
  [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts | 
  [production] | 
            
  | 19:56 | 
  <mutante> | 
  [mwmaint1002:~] $ sudo systemctl start  daily_account_consistency_check.service | 
  [production] | 
            
  | 19:41 | 
  <dzahn@cumin1001> | 
  END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org | 
  [production] |