2021-12-02
|
11:47 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance T277354 |
[production] |
11:47 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance T277354 |
[production] |
11:47 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance T277354 |
[production] |
11:47 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
11:47 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17975 and previous config saved to /var/cache/conftool/dbconfig/20211202-114711-marostegui.json |
[production] |
11:32 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17974 and previous config saved to /var/cache/conftool/dbconfig/20211202-113206-marostegui.json |
[production] |
11:28 |
<oblivian@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
11:21 |
<moritzm> |
draining primary/secondary instances off ganeti2022 T296622 |
[production] |
11:17 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17973 and previous config saved to /var/cache/conftool/dbconfig/20211202-111702-marostegui.json |
[production] |
11:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17972 and previous config saved to /var/cache/conftool/dbconfig/20211202-110157-marostegui.json |
[production] |
11:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17971 and previous config saved to /var/cache/conftool/dbconfig/20211202-110120-marostegui.json |
[production] |
11:01 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance T277354 |
[production] |
11:01 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance T277354 |
[production] |
11:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17970 and previous config saved to /var/cache/conftool/dbconfig/20211202-110110-marostegui.json |
[production] |
10:46 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17969 and previous config saved to /var/cache/conftool/dbconfig/20211202-104606-marostegui.json |
[production] |
10:31 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17968 and previous config saved to /var/cache/conftool/dbconfig/20211202-103100-marostegui.json |
[production] |
10:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17967 and previous config saved to /var/cache/conftool/dbconfig/20211202-101555-marostegui.json |
[production] |
10:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17966 and previous config saved to /var/cache/conftool/dbconfig/20211202-101522-marostegui.json |
[production] |
10:15 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance T277354 |
[production] |
10:15 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance T277354 |
[production] |
10:05 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Maintenance T277354 |
[production] |
10:05 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Maintenance T277354 |
[production] |
10:03 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance T277354 |
[production] |
10:03 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance T277354 |
[production] |
10:03 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17964 and previous config saved to /var/cache/conftool/dbconfig/20211202-100307-marostegui.json |
[production] |
09:52 |
<moritzm> |
draining primary/secondary instances off ganeti2009 T296622 |
[production] |
09:48 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17963 and previous config saved to /var/cache/conftool/dbconfig/20211202-094802-marostegui.json |
[production] |
09:32 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17962 and previous config saved to /var/cache/conftool/dbconfig/20211202-093257-marostegui.json |
[production] |
09:27 |
<jmm@cumin2002> |
END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet |
[production] |
09:27 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet |
[production] |
09:17 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'After maintenance db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17961 and previous config saved to /var/cache/conftool/dbconfig/20211202-091753-marostegui.json |
[production] |
09:16 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17960 and previous config saved to /var/cache/conftool/dbconfig/20211202-091629-marostegui.json |
[production] |
09:16 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance T277354 |
[production] |
09:16 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance T277354 |
[production] |
08:51 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet |
[production] |
08:45 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet |
[production] |
08:34 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2010.codfw.wmnet with OS buster |
[production] |
08:29 |
<dcausse> |
restarting blazegraph on wdqs1007 (jvm stuck for 4h) |
[production] |
08:03 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster |
[production] |
02:50 |
<andrew@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster |
[production] |
02:43 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster |
[production] |
02:40 |
<andrew@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster |
[production] |
02:15 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster |
[production] |
02:14 |
<andrew@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster |
[production] |
01:52 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster |
[production] |
01:21 |
<ryankemper> |
T280001 Rolling restart of low-traffic pybal hosts complete. All of `wcqs` is pooled and the pybal / ipvs related alerts have cleared |
[production] |
01:16 |
<ryankemper> |
T280001 Pooled `wcqs200[1-3]` (had been left unpooled from when we last removed wcqs from production) |
[production] |
01:12 |
<ryankemper> |
T280001 Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'` |
[production] |
01:12 |
<ryankemper> |
T280001 Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'` |
[production] |
01:11 |
<ryankemper> |
T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015` |
[production] |