| 2022-02-09 |
      
| 09:46 | <jayme@deploy1002> | helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | [production] |
| 09:45 | <jayme@deploy1002> | helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | [production] |
| 09:45 | <jayme@deploy1002> | helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | [production] |
| 09:45 | <elukey> | update my ssh key on all network devices (will commit only when the diff is my key only) | [production] |
| 09:44 | <jayme@deploy1002> | helmfile [staging-codfw] START helmfile.d/admin 'apply'. | [production] |
| 09:41 | <ema> | cp3050: stop and disable atskafka-webrequest.service T247497 | [production] |
| 09:15 | <ema> | cp3050: ats-backend-restart to set the number of allowed Lua states back from 64 to 256 (default) T265625 | [production] |
| 08:21 | <dcausse> | restarting blazegraph on wdqs1004 (jvm stuck for 5hours) | [production] |
| 07:55 | <filippo@cumin1001> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet | [production] |
| 07:42 | <filippo@cumin1001> | START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet | [production] |
| 07:35 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Remove logpager group from s1 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20410 and previous config saved to /var/cache/conftool/dbconfig/20220209-073528-marostegui.json | [production] |
| 04:10 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance | [production] |
| 04:10 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance | [production] |
| 03:48 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance | [production] |
| 03:48 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance | [production] |
| 03:48 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20407 and previous config saved to /var/cache/conftool/dbconfig/20220209-034800-ladsgroup.json | [production] |
| 03:32 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20406 and previous config saved to /var/cache/conftool/dbconfig/20220209-033255-ladsgroup.json | [production] |
| 03:17 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20405 and previous config saved to /var/cache/conftool/dbconfig/20220209-031750-ladsgroup.json | [production] |
| 03:02 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20404 and previous config saved to /var/cache/conftool/dbconfig/20220209-030245-ladsgroup.json | [production] |
| 02:34 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Depooling db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20403 and previous config saved to /var/cache/conftool/dbconfig/20220209-023446-ladsgroup.json | [production] |
| 02:34 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance | [production] |
| 02:34 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance | [production] |
| 02:11 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 11 hosts with reason: Maintenance | [production] |
| 02:11 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 12:00:00 on 11 hosts with reason: Maintenance | [production] |
| 02:11 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance | [production] |
| 02:11 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance | [production] |
  
| 2022-02-08 |
      
| 23:52 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2055.codfw.wmnet with OS buster | [production] |
| 23:48 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2054.codfw.wmnet with OS buster | [production] |
| 23:22 | <tzatziki> | removing 1 file for legal compliance | [production] |
| 23:21 | <pt1979@cumin2002> | START - Cookbook sre.hosts.reimage for host mc2055.codfw.wmnet with OS buster | [production] |
| 23:20 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2053.codfw.wmnet with OS buster | [production] |
| 23:17 | <pt1979@cumin2002> | START - Cookbook sre.hosts.reimage for host mc2054.codfw.wmnet with OS buster | [production] |
| 23:12 | <pt1979@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2052.codfw.wmnet with OS buster | [production] |
| 22:50 | <pt1979@cumin2002> | START - Cookbook sre.hosts.reimage for host mc2053.codfw.wmnet with OS buster | [production] |
| 22:44 | <dzahn@deploy1002> | helmfile [staging] DONE helmfile.d/services/miscweb: sync on main | [production] |
| 22:42 | <dzahn@deploy1002> | helmfile [staging] START helmfile.d/services/miscweb: apply on main | [production] |
| 22:41 | <pt1979@cumin2002> | START - Cookbook sre.hosts.reimage for host mc2052.codfw.wmnet with OS buster | [production] |
| 22:15 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20402 and previous config saved to /var/cache/conftool/dbconfig/20220208-221545-marostegui.json | [production] |
| 22:12 | <topranks> | doing planned 1-by-1 shutdown of ports xe-0/1/1, xe-0/1/2 and xe-0/1/9 on cr2-esams, to test reliability of each following user reports of issues at AMS-IX. | [production] |
| 22:00 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20401 and previous config saved to /var/cache/conftool/dbconfig/20220208-220041-marostegui.json | [production] |
| 21:59 | <ryankemper> | T294805 elastic10[68-83] erroneously weren't in pybal, added them just now: `sudo confctl select 'cluster=elasticsearch' set/pooled=yes:weight=10` (there's no hosts in the `conftool-data` list that we want depooled so we're okay setting all to pooled w/ equal weight) | [production] |
| 21:59 | <ryankemper@puppetmaster1001> | conftool action : set/pooled=yes:weight=10; selector: cluster=elasticsearch | [production] |
| 21:58 | <ryankemper@puppetmaster1001> | conftool action : set/pooled=yes:weight=10; selector: cluster=elasticsearch,name=elastic1* | [production] |
| 21:53 | <ryankemper@puppetmaster1001> | conftool action : GET; selector: service=search | [production] |
| 21:52 | <ryankemper@puppetmaster1001> | conftool action : GET; selector: service=search | [production] |
| 21:47 | <ryankemper> | [Elastic] `ryankemper@elastic1081:~$ sudo systemctl restart elasticsearch_6*psi*` (9600 but not 9200 seemed to be having connectivity issues) | [production] |
| 21:45 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20400 and previous config saved to /var/cache/conftool/dbconfig/20220208-214536-marostegui.json | [production] |
| 21:30 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20399 and previous config saved to /var/cache/conftool/dbconfig/20220208-213031-marostegui.json | [production] |
| 21:26 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Depooling db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20398 and previous config saved to /var/cache/conftool/dbconfig/20220208-212558-marostegui.json | [production] |
| 21:25 | <marostegui@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance | [production] |
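Note: the 'conftool action' entries above are the audit lines confctl records when pool state is changed by hand; the 21:59 ryankemper entry gives the full command. A hedged sketch of the read-then-set sequence, reusing only the selectors that appear in this log:

    # Sketch only, assuming confctl's select/get/set form matches the audit lines above.
    sudo confctl select 'service=search' get                              # inspect current pooled state
    sudo confctl select 'cluster=elasticsearch' set/pooled=yes:weight=10  # pool all elasticsearch hosts with equal weight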