| 2021-11-25 |

  | 07:51 | <jelto@cumin1001> | conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(echostore|sessionstore) | [production] | 
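These conftool lines are the audit messages that confctl writes automatically. A minimal sketch of the command that would produce this repool, assuming the standard discovery object type and selector syntax, is: `sudo confctl --object-type discovery select 'dnsdisc=(echostore|sessionstore),name=eqiad' set/pooled=true`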
            
  | 07:49 | <marostegui> | Stop mysql on db1133 to clone db1128 as a test host T295965 | [production] | 
            
  | 07:49 | <jelto@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' . | [production] | 
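The helmfile entries in this log are emitted by the deployment wrapper on deploy1002. A minimal sketch of the underlying invocation, assuming the service charts live under /srv/deployment-charts/helmfile.d/services/, is: `cd /srv/deployment-charts/helmfile.d/services/sessionstore && helmfile -e eqiad --selector name=production sync`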
            
  | 07:48 | <jelto@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' . | [production] | 
            
  | 07:47 | <jayme> | elevated MediaWiki exceptions and fatals (from ~07:35) due to a mistake during re-deploy of eventgate-main | [production] | 
            
  | 07:45 | <jelto@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' . | [production] | 
            
  | 07:35 | <jelto@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . | [production] | 
            
  | 07:32 | <jelto@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . | [production] | 
            
  | 07:32 | <jelto@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . | [production] | 
            
  | 07:29 | <elukey_> | elukey@mwdebug2002:~$ sudo systemctl reset-failed ifup@ens5.service | [production] | 
            
  | 07:27 | <marostegui@cumin1001> | START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye | [production] | 
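A hedged sketch of how this reimage is started from the cumin host (the exact cookbook flags are an assumption): `sudo cookbook sre.hosts.reimage --os bullseye db1128`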
            
  | 07:23 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance T296143 | [production] | 
            
  | 07:23 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance T296143 | [production] | 
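A sketch of the corresponding downtime invocation; the duration and reason flag names are assumptions: `sudo cookbook sre.hosts.downtime --hours 4 --reason 'Maintenance T296143' 'db1145.eqiad.wmnet'`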
            
  | 07:20 | <jelto@cumin1001> | conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=(apertium|api-gateway|apple-search|blubberoid|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventstreams|eventstreams-internal|linkrecommendation|mathoid|mobileapps|proton|push-notifications|recommendation-api|sessionstore|shellbox|shellbox-constraints|shellbox-media|shellbox-syntax | [production] | 
            
  | 07:17 | <jelto@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 32 hosts with reason: helm3 de-deploy T251305 | [production] | 
            
  | 07:17 | <jelto@cumin1001> | START - Cookbook sre.hosts.downtime for 3:00:00 on 32 hosts with reason: helm3 de-deploy T251305 | [production] | 
            
  | 07:10 | <jelto> | downtime PyBal backends health check on lvs1015 and lvs1016 for helm3 de-deploy T251305. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished | [production] | 
            
  | 07:09 | <jelto> | start re-deploy procedure in eqiad Kubernetes T251305 | [production] | 
            
  | 06:31 | <marostegui> | Restart tendril's DB | [production] | 
            
  | 05:51 | <ryankemper> | [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good | [production] | 
            
  | 04:45 | <ryankemper@deploy1002> | Finished deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS (duration: 05m 27s) | [production] | 
            
  | 04:43 | <ryankemper> | [WCQS Deploy] Tests look good following deploy of `0.3.93` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet | [production] | 
            
  | 04:40 | <ryankemper@deploy1002> | Started deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS | [production] | 
            
  | 04:39 | <ryankemper> | [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` | [production] | 
            
  | 04:38 | <ryankemper> | [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` | [production] | 
            
  | 04:38 | <ryankemper> | [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` | [production] | 
            
  | 04:35 | <ryankemper@deploy1002> | Finished deploy [wdqs/wdqs@29c5cd7]: 0.3.93 (duration: 09m 23s) | [production] | 
            
  | 04:30 | <ryankemper> | [Elastic] Cleaning up dangling apt packages: `ryankemper@cumin1001:~$ sudo cumin -b 4 'elastic*' 'sudo apt autoremove -y'` | [production] | 
            
  | 04:27 | <ryankemper> | [WDQS Deploy] Tests passing following deploy of `0.3.93` on canary `wdqs1003`; proceeding to rest of fleet | [production] | 
            
  | 04:25 | <ryankemper@deploy1002> | Started deploy [wdqs/wdqs@29c5cd7]: 0.3.93 | [production] | 
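The "Started/Finished deploy" lines are written by scap. A minimal sketch of the deployer-side command, assuming the usual wdqs deployment directory on deploy1002, is: `cd /srv/deployment/wdqs/wdqs && scap deploy '0.3.93'`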
            
  | 04:25 | <ryankemper> | [WDQS Deploy] Gearing up for deploy of wdqs `0.3.93`. Pre-deploy tests passing on canary `wdqs1003` | [production] | 
            
  | 03:12 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS buster | [production] | 
            
  | 02:42 | <pt1979@cumin2002> | START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS buster | [production] | 
            
  | 02:34 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS buster | [production] | 
            
  | 02:23 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS buster | [production] | 
            
  | 02:04 | <pt1979@cumin2002> | START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS buster | [production] | 
            
  | 01:54 | <pt1979@cumin2002> | START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS buster | [production] | 
            
  | 01:49 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS buster | [production] | 
            
  | 01:34 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS buster | [production] | 
            
  | 01:19 | <pt1979@cumin2002> | START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS buster | [production] | 
            
  | 01:04 | <pt1979@cumin2002> | START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS buster | [production] | 
            
  | 00:37 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS buster | [production] |