  | 2021-05-04 |
  | 04:41 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15674 and previous config saved to /var/cache/conftool/dbconfig/20210504-044101-root.json | [production] | 
            
  | 04:06 | <ryankemper@cumin1001> | END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  | 03:38 | <ryankemper@cumin1001> | START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  | 03:38 | <ryankemper@cumin1001> | END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  | 03:36 | <ryankemper> | T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` | [production] | 
            
  | 03:35 | <ryankemper@cumin1001> | START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  | 02:09 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE | [production] | 
            
  | 02:07 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE | [production] | 
            
  | 01:41 | <ryankemper@cumin1001> | END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  
  | 2021-05-03 |
  | 23:18 | <urbanecm@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: 230ef5716b34ca83348667f289180313b76ce8a3: Prepare for new configuration option (T277951) (duration: 00m 57s) | [production] | 
            
  | 23:15 | <urbanecm@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: 7c47ee17b3936fb1f79590187a9e0028276e4a9d: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 57s) | [production] | 
            
  | 23:14 | <urbanecm@deploy1002> | sync-file aborted: 7c47ee17b3936fb1f79590187a9e0028276e4a9d: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 01s) | [production] | 
            
  | 22:17 | <legoktm> | ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l | [production] | 
            
  | 22:14 | <mutante> | [backup1001:~] $ sudo check_bacula.py --icinga | [production] | 
            
  | 21:56 | <ryankemper> | T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` | [production] | 
            
  | 21:55 | <ryankemper@cumin1001> | START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  | 21:54 | <ryankemper> | T280563 eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))` | [production] | 
            
  | 21:52 | <ryankemper@cumin1001> | END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  | 21:47 | <ryankemper> | T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` | [production] | 
            
  | 21:46 | <ryankemper@cumin1001> | START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  | 21:32 | <krinkle@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: d95b91648 (duration: 00m 58s) | [production] | 
            
  | 21:27 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE | [production] | 
            
  | 21:25 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE | [production] | 
            
  | 21:22 | <ryankemper> | [WDQS] `ryankemper@wdqs1003:~$ sudo pool` | [production] | 
            
  | 21:20 | <ryankemper> | T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no` | [production] | 
            
  | 21:19 | <ryankemper@puppetmaster1001> | conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet | [production] | 
            
  | 21:09 | <ryankemper> | T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` | [production] | 
            
  | 21:06 | <ryankemper> | T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` | [production] | 
            
  | 21:05 | <ryankemper@cumin1001> | START - Cookbook sre.wdqs.data-transfer | [production] | 
            
  | 21:02 | <ryankemper> | T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  975G  1.5T  39% /srv` | [production] | 
            
  | 20:56 | <ryankemper> | T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force` | [production] | 
            
  | 20:44 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) | [production] | 
            
  | 20:42 | <ryankemper@cumin1001> | END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) | [production] | 
            
  | 20:37 | <ryankemper> | T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage` | [production] | 
            
  | 20:37 | <ryankemper@cumin1001> | START - Cookbook sre.wdqs.data-transfer | [production] | 
            
  | 19:40 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE | [production] | 
            
  | 19:39 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE | [production] | 
            
  | 19:24 | <ryankemper> | T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` | [production] | 
            
  | 19:24 | <ryankemper@cumin1001> | START - Cookbook sre.wdqs.data-transfer | [production] | 
            
  | 19:21 | <ryankemper@puppetmaster1001> | conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet | [production] | 
            
  | 19:21 | <ryankemper> | T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead) | [production] | 
            
  | 18:20 | <Urbanecm> | Morning B&C window done | [production] | 
            
  | 18:19 | <urbanecm@deploy1002> | Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da3bf272d33c2d9b29d9172b1c81bfd8beb: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s) | [production] | 
            
  | 18:15 | <urbanecm@deploy1002> | Synchronized wmf-config/filebackend.php: bc1bc903169e4982c0c5a930094bed9f22616293: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s) | [production] | 
            
  | 18:14 | <urbanecm@deploy1002> | Synchronized wmf-config/CommonSettings.php: bc1bc903169e4982c0c5a930094bed9f22616293: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s) | [production] | 
            
  | 17:44 | <ryankemper@cumin1001> | END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  | 17:20 | <hashar> | Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737 | [production] | 
            
  | 16:30 | <ryankemper@cumin1001> | START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 | [production] | 
            
  | 16:29 | <ryankemper> | T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435 | [production] | 
            
  | 16:27 | <ryankemper@puppetmaster1001> | conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet | [production] |