2022-02-09
10:20 <jelto> update scap to 4.3.1 on A:restbase-canary - T301307 [production]
10:17 <jelto> update scap to 4.3.1 on A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary - T301307 [production]
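The two entries above roll scap 4.3.1 out to canary hosts selected by cumin alias expressions. A minimal sketch of such a rollout from a cumin host, assuming scap is upgraded via its Debian package; the version-check command and the exact package version string are illustrative assumptions:

    # Confirm which hosts the alias expression matches and what scap version they currently run.
    sudo cumin 'A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary' 'scap version'
    # Upgrade scap to the pinned version on those canaries (package-based upgrade is an assumption).
    sudo cumin 'A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary' 'apt-get -y install scap=4.3.1'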
10:16 <ariel@deploy1002> Finished deploy [dumps/dumps@9993036]: fix up default api jobs entry for siteinfo v2 (duration: 00m 03s) [production]
10:15 <ariel@deploy1002> Started deploy [dumps/dumps@9993036]: fix up default api jobs entry for siteinfo v2 [production]
10:15 <mvernon@cumin2002> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-fe[2005-2008].codfw.wmnet [production]
10:14 <volans> uploaded python3-wmflib_1.0.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia [production]
10:11 <mvernon@cumin2002> START - Cookbook sre.hosts.decommission for hosts ms-fe[2005-2008].codfw.wmnet [production]
10:03 <akosiaris> T300568 upload prometheus-etherpad-exporter_0.4_amd64 to apt.wikimedia.org bullseye-wikimedia/main [production]
10:02 <Emperor> rolling restart of swift frontends T301251 [production]
09:46 <jayme@deploy1002> helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [production]
09:45 <jayme@deploy1002> helmfile [staging-eqiad] START helmfile.d/admin 'apply'. [production]
09:45 <jayme@deploy1002> helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [production]
09:45 <elukey> update my ssh key on all network devices (will commit only when the diff is my key only) [production]
09:44 <jayme@deploy1002> helmfile [staging-codfw] START helmfile.d/admin 'apply'. [production]
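The staging-codfw/staging-eqiad lines above are the usual START/DONE pairs for an admin helmfile apply. A minimal sketch of one such run, assuming the standard deployment-charts checkout on the deploy host; the path is an assumption, while the environment flag and subcommands are stock helmfile usage:

    # Review what would change in the staging-codfw admin release, then apply it.
    cd /srv/deployment-charts/helmfile.d/admin   # checkout path is an assumption
    helmfile -e staging-codfw diff               # dry-run: show pending changes
    helmfile -e staging-codfw apply              # produces the START/DONE pair logged above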
09:41 <ema> cp3050: stop and disable atskafka-webrequest.service T247497 [production]
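Stopping and disabling a unit as in the entry above maps onto two systemctl calls; a short sketch using the unit name from the log:

    # Stop the running atskafka instance and prevent it from starting again at boot.
    sudo systemctl stop atskafka-webrequest.service
    sudo systemctl disable atskafka-webrequest.service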
09:15 <ema> cp3050: ats-backend-restart to set the number of allowed Lua states back from 64 to 256 (default) T265625 [production]
08:21 <dcausse> restarting blazegraph on wdqs1004 (JVM stuck for 5 hours) [production]
07:55 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet [production]
07:42 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet [production]
07:35 <marostegui@cumin1001> dbctl commit (dc=all): 'Remove logpager group from s1 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20410 and previous config saved to /var/cache/conftool/dbconfig/20220209-073528-marostegui.json [production]
04:10 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance [production]
04:10 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance [production]
03:48 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [production]
03:48 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [production]
03:48 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20407 and previous config saved to /var/cache/conftool/dbconfig/20220209-034800-ladsgroup.json [production]
03:32 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20406 and previous config saved to /var/cache/conftool/dbconfig/20220209-033255-ladsgroup.json [production]
03:17 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20405 and previous config saved to /var/cache/conftool/dbconfig/20220209-031750-ladsgroup.json [production]
03:02 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20404 and previous config saved to /var/cache/conftool/dbconfig/20220209-030245-ladsgroup.json [production]
02:34 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20403 and previous config saved to /var/cache/conftool/dbconfig/20220209-023446-ladsgroup.json [production]
02:34 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance [production]
02:34 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance [production]
02:11 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 11 hosts with reason: Maintenance [production]
02:11 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 12:00:00 on 11 hosts with reason: Maintenance [production]
02:11 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance [production]
02:11 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance [production]
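The db1147 entries above (02:34–03:48) trace the standard cycle: set downtime, depool the replica via dbctl, run maintenance, then repool in stages so traffic ramps back gradually. A hedged sketch of roughly what that looks like from a cumin host; the cookbook and dbctl flags are assumptions inferred from the logged messages, not verified invocations:

    # Silence alerting for the host, then drop it from the live MediaWiki DB config.
    sudo cookbook sre.hosts.downtime --hours 6 -r "Maintenance" db1147.eqiad.wmnet   # flag names assumed
    sudo dbctl instance db1147 depool
    sudo dbctl config commit -m "Depooling db1147 (T298554)"

    # ... run the maintenance on db1147 ...

    # Repool in stages, committing after each step (each commit appears as a SAL entry above).
    for pct in 25 50 75 100; do
        sudo dbctl instance db1147 pool -p "$pct"    # percentage flag is an assumption
        sudo dbctl config commit -m "Repooling after maintenance db1147 (T298554)"
        sleep 900
    done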
2022-02-08
23:52 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2055.codfw.wmnet with OS buster [production]
23:48 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2054.codfw.wmnet with OS buster [production]
23:22 <tzatziki> removing 1 file for legal compliance [production]
23:21 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host mc2055.codfw.wmnet with OS buster [production]
23:20 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2053.codfw.wmnet with OS buster [production]
23:17 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host mc2054.codfw.wmnet with OS buster [production]
23:12 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2052.codfw.wmnet with OS buster [production]
22:50 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host mc2053.codfw.wmnet with OS buster [production]
22:44 <dzahn@deploy1002> helmfile [staging] DONE helmfile.d/services/miscweb: sync on main [production]
22:42 <dzahn@deploy1002> helmfile [staging] START helmfile.d/services/miscweb: apply on main [production]
22:41 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host mc2052.codfw.wmnet with OS buster [production]
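The mc2052–mc2055 entries above are runs of the sre.hosts.reimage cookbook. A minimal sketch of a single run, assuming the cookbook's --os flag and that it accepts the short hostname; both are assumptions based on the logged messages:

    # Reimage one memcached host onto Debian Buster; the cookbook drives PXE boot,
    # the first puppet run, and downtime for the duration (exact behaviour assumed).
    sudo cookbook sre.hosts.reimage --os buster mc2055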
22:15 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20402 and previous config saved to /var/cache/conftool/dbconfig/20220208-221545-marostegui.json [production]
22:12 <topranks> doing planned 1-by-1 shutdown of ports xe-0/1/1, xe-0/1/2 and xe-0/1/9 on cr2-esams, to test the reliability of each, following user reports of issues at AMS-IX. [production]
22:00 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20401 and previous config saved to /var/cache/conftool/dbconfig/20220208-220041-marostegui.json [production]
21:59 <ryankemper> T294805 elastic10[68-83] erroneously weren't in pybal; added them just now: `sudo confctl select 'cluster=elasticsearch' set/pooled=yes:weight=10` (there are no hosts in the `conftool-data` list that we want depooled, so we're okay setting all to pooled w/ equal weight) [production]
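The repool in the last entry is a direct confctl call. A short sketch of checking state before and after, reusing the selector and set command quoted in the entry; the `get` action is standard conftool usage, though treat the exact output handling as an assumption:

    # Inspect the current pooled state and weight of the elasticsearch cluster objects.
    sudo confctl select 'cluster=elasticsearch' get
    # Pool everything with equal weight, exactly as logged above.
    sudo confctl select 'cluster=elasticsearch' set/pooled=yes:weight=10
    # Verify the change took effect.
    sudo confctl select 'cluster=elasticsearch' get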