2019-10-24
|
23:46 |
<mutante> |
bast3002 - rsyncing /home, /srv/tftpboot and /srv/prometheus to /srv/bast3002/ on bast3004 (T236394 T236329) |
[production] |
23:24 |
<krinkle@deploy1001> |
Synchronized php-1.35.0-wmf.3/includes/specials/pagers/BlockListPager.php: T236425, fc99c5a7c0de2 (duration: 00m 54s) |
[production] |
22:16 |
<bblack@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
22:14 |
<bblack@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
22:13 |
<mutante> |
gerrit1001 - starting gerrit |
[production] |
22:13 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
22:12 |
<bblack@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
22:12 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
22:12 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
22:11 |
<bblack@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
22:10 |
<thcipriani> |
stopping gerrit briefly for script run for T236344 |
[production] |
22:09 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
22:01 |
<mutante> |
mw1270 - was alerting in Icinga as degraded systemd state - reason was 'hhvm.service not-found'. systemctl reset-failed cleared it. could cause monitoring spam on more servers (T229792) |
[production] |
21:56 |
<eileen> |
civicrm revision changed from 47e0800001 to a55c2d2787, config revision is 63a67f32a1 |
[production] |
21:16 |
<bblack@cumin1001> |
conftool action : set/pooled=no; selector: name=cp3040.esams.wmnet |
[production] |
21:16 |
<bblack@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet |
[production] |
21:13 |
<bblack@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet |
[production] |
21:13 |
<bblack@cumin1001> |
conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet |
[production] |
21:12 |
<bblack@cumin1001> |
conftool action : set/pooled=no; selector: name=cp3039.esams.wmnet |
[production] |
21:06 |
<bblack> |
cr3-esams remove pybal neighbor IPs for lvs3001-4 |
[production] |
21:05 |
<bblack> |
cr2-esams remove pybal neighbor IPs for lvs3001-4 |
[production] |
21:05 |
<urandom> |
restbase cassandra rolling restart, codfw / rack 'd' -- T200803 |
[production] |
21:02 |
<bblack> |
downtimed lvs3001-4, stopping pybal there, etc... |
[production] |
20:58 |
<bblack> |
cr3-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005 |
[production] |
20:58 |
<bblack> |
cr2-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005 |
[production] |
20:40 |
<bblack> |
esams lvs: high-traffic1 - change 3005's med to 0 (becomes new primary, permanently) |
[production] |
20:36 |
<bblack> |
esams lvs: high-traffic1 - change 3003's med to 200, 3001's med to 50, 3005 remains 100 (traffic will blip to 3005 then back to 3001 again) |
[production] |
20:33 |
<urandom> |
restbase cassandra rolling restart, codfw / rack 'c' -- T200803 |
[production] |
20:24 |
<bblack@cumin1001> |
conftool action : set/pooled=no; selector: name=cp3038.esams.wmnet |
[production] |
20:24 |
<bblack@cumin1001> |
conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet |
[production] |
20:23 |
<bblack@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet |
[production] |
20:22 |
<bblack@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet |
[production] |
20:04 |
<bblack> |
reboot cp3054 again for good measure |
[production] |
19:57 |
<bblack> |
cp3054 - trying racadm serveraction hardreset |
[production] |
19:32 |
<bblack> |
reboot dns3001 |
[production] |
19:31 |
<urandom> |
restbase cassandra rolling restart, codfw / rack 'b' -- T200803 |
[production] |
19:10 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
19:07 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
19:05 |
<urandom> |
restbase cassandra rolling restart, rack 'd' -- T200803 |
[production] |
19:05 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
19:05 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
19:05 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |