2019-10-24
|
17:15 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
16:54 |
<ema> |
depool cp3036 (cache_upload) T233242 |
[production] |
16:39 |
<urandom> |
restarting cassandra, restbase2011 (canary for config changes) -- T200803 |
[production] |
16:32 |
<urandom> |
restarting cassandra, restbase1016 (canary for config changes) -- T200803 |
[production] |
16:28 |
<ema> |
depool cp3035 (cache_upload) T233242 |
[production] |
16:07 |
<ema> |
pool cp3057 (cache_upload) T233242 |
[production] |
15:51 |
<ema> |
depool cp3032 (cache_text) T233242 |
[production] |
15:45 |
<ema> |
depool cp3034 (cache_upload) T233242 |
[production] |
15:40 |
<ema> |
depool cp3030 (cache_text) T233242 |
[production] |
15:27 |
<bblack> |
asw2-esams: configure port descriptions and vlan/lvs groupings for all rack16 hosts (lvs3007, ganeti3003, bast3004, cp3061-5) |
[production] |
15:19 |
<ema> |
pool cp3058 (cache_text) T233242 |
[production] |
15:18 |
<effie> |
Slowly reload apache across the fleet (as we are enabling puppet) - T229792 |
[production] |
15:09 |
<effie> |
Remove hhvm packages and enable puppet across the fleet - T229792 |
[production] |
15:09 |
<ema> |
pool cp3055 (cache_upload) T233242 |
[production] |
15:04 |
<addshore@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: testcommonswiki, Enable Wikibase client access T223792 (duration: 00m 53s) |
[production] |
15:00 |
<bblack> |
cr2-esams - add missing lvs3005 IP to bgp pybal neighbor list |
[production] |
14:58 |
<bblack> |
cr3-esams - change fallback static route for high-traffic2 to lvs3006 |
[production] |
14:58 |
<bblack> |
cr2-esams - change fallback static route for high-traffic2 to lvs3006 |
[production] |
14:47 |
<effie> |
run puppet on all canaries and codfw - T229792 |
[production] |
14:42 |
<ema@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
14:40 |
<effie> |
Remove hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from all canaries and codfw - T229792 |
[production] |
14:40 |
<ema@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:26 |
<bblack> |
lvs3006 (upload, becoming active) - manual pybal med s/90/0/ (will take over from lvs3002, intended permanently). |
[production] |
14:23 |
<bblack> |
lvs3006 (upload, inactive) - manual pybal med s/100/90/ (preferred to lvs3004 for fallback from lvs3002) |
[production] |
14:22 |
<effie> |
enable puppet on mw app canaries |
[production] |
14:16 |
<ema> |
power-cycle cp3056, stuck rebooting into d-i T233242 |
[production] |
13:59 |
<ema> |
pool cp3060 T233242 |
[production] |
13:36 |
<bblack> |
re-pooling esams in dns |
[production] |
13:34 |
<effie> |
enable puppet on mwdebug* |
[production] |
13:25 |
<XioNoX> |
enable transit4/6 on cr2-knams |
[production] |
13:24 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: service=varnish-be,name=cp30[56].* |
[production] |
13:24 |
<bblack@cumin1001> |
conftool action : set/weight=100; selector: name=cp30[56].*,service=varnish-be |
[production] |
13:23 |
<bblack@cumin1001> |
conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=varnish-fe |
[production] |
13:22 |
<bblack@cumin1001> |
conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_text,service=nginx |
[production] |
13:22 |
<bblack@cumin1001> |
conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=varnish-fe |
[production] |
13:22 |
<bblack@cumin1001> |
conftool action : set/weight=1; selector: name=cp30[56].*,cluster=cache_upload,service=nginx |
[production] |
13:18 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: service=ats-be,name=cp3063.esams.wmnet |
[production] |
13:18 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: service=ats-be,name=cp3051.esams.wmnet |
[production] |
13:18 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: service=ats-be,name=cp3059.esams.wmnet |
[production] |
13:18 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: service=ats-be,name=cp3061.esams.wmnet |
[production] |
13:18 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: service=ats-be,name=cp3057.esams.wmnet |
[production] |
13:18 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: service=ats-be,name=cp3065.esams.wmnet |
[production] |
13:18 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: service=ats-be,name=cp3055.esams.wmnet |
[production] |
13:18 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: service=ats-be,name=cp3053.esams.wmnet |
[production] |
13:17 |
<ema> |
set ats-be weights on new esams upload nodes T233242 |
[production] |
13:06 |
<liw@deploy1001> |
rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.3 |
[production] |
12:56 |
<effie> |
purge hhvm hhvm-luasandbox hhvm-tidy hhvm-wikidiff2 hhvm-dbg from mw* canaries - T229792 |
[production] |
12:42 |
<ema@puppetmaster1001> |
conftool action : set/weight=100; selector: name=cp3060.esams.wmnet,service=varnish-be |
[production] |
12:33 |
<effie> |
Stopping puppet on all hosts including the hhvm class (C:hhvm) - 544864 - T229792 |
[production] |
12:25 |
<ema> |
cp3060: powercycle -- NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [charon:1226] T233242 |
[production] |