2022-02-09
09:45 |
<jayme@deploy1002> |
helmfile [staging-eqiad] START helmfile.d/admin 'apply'. |
[production] |
09:45 |
<jayme@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
09:45 |
<elukey> |
update my ssh key on all network devices (will commit only when the diff is my key only) |
[production] |
09:44 |
<jayme@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
09:41 |
<ema> |
cp3050: stop and disable atskafka-webrequest.service T247497 |
[production] |
09:15 |
<ema> |
cp3050: ats-backend-restart to set the number of allowed Lua states back from 64 to 256 (default) T265625 |
[production] |
08:21 |
<dcausse> |
restarting blazegraph on wdqs1004 (jvm stuck for 5 hours) |
[production] |
07:55 |
<filippo@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet |
[production] |
07:42 |
<filippo@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet |
[production] |
07:35 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Remove logpager group from s1 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20410 and previous config saved to /var/cache/conftool/dbconfig/20220209-073528-marostegui.json |
[production] |
04:10 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance |
[production] |
04:10 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance |
[production] |
03:48 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance |
[production] |
03:48 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance |
[production] |
03:48 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20407 and previous config saved to /var/cache/conftool/dbconfig/20220209-034800-ladsgroup.json |
[production] |
03:32 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20406 and previous config saved to /var/cache/conftool/dbconfig/20220209-033255-ladsgroup.json |
[production] |
03:17 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20405 and previous config saved to /var/cache/conftool/dbconfig/20220209-031750-ladsgroup.json |
[production] |
03:02 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20404 and previous config saved to /var/cache/conftool/dbconfig/20220209-030245-ladsgroup.json |
[production] |
02:34 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20403 and previous config saved to /var/cache/conftool/dbconfig/20220209-023446-ladsgroup.json |
[production] |
02:34 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance |
[production] |
02:34 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance |
[production] |
02:11 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 11 hosts with reason: Maintenance |
[production] |
02:11 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on 11 hosts with reason: Maintenance |
[production] |
02:11 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance |
[production] |
02:11 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance |
[production] |
2022-02-08
23:52 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2055.codfw.wmnet with OS buster |
[production] |
23:48 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2054.codfw.wmnet with OS buster |
[production] |
23:22 |
<tzatziki> |
removing 1 file for legal compliance |
[production] |
23:21 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host mc2055.codfw.wmnet with OS buster |
[production] |
23:20 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2053.codfw.wmnet with OS buster |
[production] |
23:17 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host mc2054.codfw.wmnet with OS buster |
[production] |
23:12 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2052.codfw.wmnet with OS buster |
[production] |
22:50 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host mc2053.codfw.wmnet with OS buster |
[production] |
22:44 |
<dzahn@deploy1002> |
helmfile [staging] DONE helmfile.d/services/miscweb: sync on main |
[production] |
22:42 |
<dzahn@deploy1002> |
helmfile [staging] START helmfile.d/services/miscweb: apply on main |
[production] |
22:41 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host mc2052.codfw.wmnet with OS buster |
[production] |
22:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20402 and previous config saved to /var/cache/conftool/dbconfig/20220208-221545-marostegui.json |
[production] |
22:12 |
<topranks> |
doing planned 1-by-1 shutdown of ports xe-0/1/1, xe-0/1/2 and xe-0/1/9 on cr2-esams, to test reliability of each following user reports of issues at AMS-IX. |
[production] |
22:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20401 and previous config saved to /var/cache/conftool/dbconfig/20220208-220041-marostegui.json |
[production] |
21:59 |
<ryankemper> |
T294805 elastic10[68-83] erroneously weren't in pybal, added them just now: `sudo confctl select 'cluster=elasticsearch' set/pooled=yes:weight=10` (there are no hosts in the `conftool-data` list that we want depooled, so we're okay setting all to pooled w/ equal weight) |
[production] |
21:59 |
<ryankemper@puppetmaster1001> |
conftool action : set/pooled=yes:weight=10; selector: cluster=elasticsearch |
[production] |
21:58 |
<ryankemper@puppetmaster1001> |
conftool action : set/pooled=yes:weight=10; selector: cluster=elasticsearch,name=elastic1* |
[production] |
21:53 |
<ryankemper@puppetmaster1001> |
conftool action : GET; selector: service=search |
[production] |
21:52 |
<ryankemper@puppetmaster1001> |
conftool action : GET; selector: service=search |
[production] |
21:47 |
<ryankemper> |
[Elastic] `ryankemper@elastic1081:~$ sudo systemctl restart elasticsearch_6*psi*` (9600 but not 9200 seemed to be having connectivity issues) |
[production] |
21:45 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20400 and previous config saved to /var/cache/conftool/dbconfig/20220208-214536-marostegui.json |
[production] |
21:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20399 and previous config saved to /var/cache/conftool/dbconfig/20220208-213031-marostegui.json |
[production] |
21:26 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20398 and previous config saved to /var/cache/conftool/dbconfig/20220208-212558-marostegui.json |
[production] |
21:25 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance |
[production] |
21:25 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance |
[production] |