2019-10-24 §
23:46 <mutante> bast3002 - rsyncing /home, /srv/tfptboot and /srv/prometheus to /srv/bast3002/ on bast3004 (T236394 T236329) [production]
23:24 <krinkle@deploy1001> Synchronized php-1.35.0-wmf.3/includes/specials/pagers/BlockListPager.php: T236425, fc99c5a7c0de2 (duration: 00m 54s) [production]
22:16 <bblack@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
22:14 <bblack@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
22:13 <mutante> gerrit1001 - starting gerrit [production]
22:13 <bblack@cumin1001> START - Cookbook sre.hosts.downtime [production]
22:12 <bblack@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
22:12 <bblack@cumin1001> START - Cookbook sre.hosts.downtime [production]
22:12 <bblack@cumin1001> START - Cookbook sre.hosts.downtime [production]
22:11 <bblack@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
22:10 <thcipriani> stopping gerrit briefly for script run for T236344 [production]
22:09 <bblack@cumin1001> START - Cookbook sre.hosts.downtime [production]
22:01 <mutante> mw1270 - was alerting in Icinga as degraded systemd state - reason was 'hhvm.service not-found'. systemctl reset-failed cleared it. could cause monitoring spam on more servers (T229792) [production]
21:56 <eileen> civicrm revision changed from 47e0800001 to a55c2d2787, config revision is 63a67f32a1 [production]
21:16 <bblack@cumin1001> conftool action : set/pooled=no; selector: name=cp3040.esams.wmnet [production]
21:16 <bblack@cumin1001> conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet [production]
21:13 <bblack@cumin1001> conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet [production]
21:13 <bblack@cumin1001> conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet [production]
21:12 <bblack@cumin1001> conftool action : set/pooled=no; selector: name=cp3039.esams.wmnet [production]
21:06 <bblack> cr3-esams remove pybal neighbor IPs for lvs3001-4 [production]
21:05 <bblack> cr2-esams remove pybal neighbor IPs for lvs3001-4 [production]
21:05 <urandom> restbase cassandra rolling restart, codfw / rack 'd' -- T200803 [production]
21:02 <bblack> downtimed lvs3001-4, stopping pybal there, etc... [production]
20:58 <bblack> cr3-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005 [production]
20:58 <bblack> cr2-esams switch high-traffic1 static fallback routes from lvs3001 to lvs3005 [production]
20:40 <bblack> esams lvs: high-traffic1 - change 3005's med to 0 (becomes new primary, permanently) [production]
20:36 <bblack> esams lvs: high-traffic1 - change 3003's med to 200, 3001's med to 50, 3005 remains 100 (traffic will blip to 3005 then back to 3001 again) [production]
20:33 <urandom> restbase cassandra rolling restart, codfw / rack 'c' -- T200803 [production]
20:24 <bblack@cumin1001> conftool action : set/pooled=no; selector: name=cp3038.esams.wmnet [production]
20:24 <bblack@cumin1001> conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet [production]
20:23 <bblack@cumin1001> conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet [production]
20:22 <bblack@cumin1001> conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet [production]
20:04 <bblack> reboot cp3054 again for good measure [production]
19:57 <bblack> cp3054 - trying racadm serveraction hardreset [production]
19:32 <bblack> reboot dns3001 [production]
19:31 <urandom> restbase cassandra rolling restart, codfw / rack 'b' -- T200803 [production]
19:10 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
19:07 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
19:05 <urandom> restbase cassandra rolling restart, rack 'd' -- T200803 [production]
19:05 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
19:05 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime [production]
19:05 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime [production]
19:03 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime [production]
19:01 <bblack@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
19:01 <bblack@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
19:01 <bblack@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
19:00 <bblack@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
19:00 <bblack@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
18:59 <bblack@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
18:57 <bblack@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]