2021-01-22
§
|
20:00 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE |
[production] |
20:00 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE |
[production] |
20:00 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE |
[production] |
20:00 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE |
[production] |
19:59 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE |
[production] |
19:39 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2356.codfw.wmnet |
[production] |
19:38 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2354.codfw.wmnet |
[production] |
19:38 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2352.codfw.wmnet |
[production] |
19:36 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2350.codfw.wmnet |
[production] |
19:35 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2352.codfw.wmnet |
[production] |
19:35 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2350.codfw.wmnet |
[production] |
19:35 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2354.codfw.wmnet |
[production] |
19:34 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2356.codfw.wmnet |
[production] |
19:15 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE |
[production] |
19:13 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE |
[production] |
19:11 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE |
[production] |
19:10 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE |
[production] |
19:09 |
<mutante> |
releases1002 systemctl reset-failed |
[production] |
19:09 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE |
[production] |
19:09 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE |
[production] |
19:08 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE |
[production] |
19:07 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE |
[production] |
18:47 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2364.codfw.wmnet |
[production] |
18:47 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2362.codfw.wmnet |
[production] |
18:47 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2360.codfw.wmnet |
[production] |
18:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2358.codfw.wmnet |
[production] |
18:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2362.codfw.wmnet |
[production] |
18:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2364.codfw.wmnet |
[production] |
18:45 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2360.codfw.wmnet |
[production] |
18:45 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2358.codfw.wmnet |
[production] |
18:17 |
<mutante> |
releases2002 - rebooting to confirm it works now and that the new disk gets auto-mounted |
[production] |
18:03 |
<mutante> |
releases1002 - replaced ens5 with ens6 in /etc/network/interfaces and rebooted again |
[production] |
18:01 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk |
[production] |
18:01 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk |
[production] |
17:59 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster |
[production] |
17:59 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster |
[production] |
17:57 |
<mutante> |
releases1002 (releases.wm.org active backend) - rebooting - hopefully it does not run into T272555, but if it does, it is now known how to fix it |
[production] |
17:55 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE |
[production] |
17:54 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE |
[production] |
17:53 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE |
[production] |
17:52 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE |
[production] |
17:52 |
<mutante> |
releases2001 - create new partition table with fdisk, make ext4 filesystem on /dev/vdb1 |
[production] |
17:50 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE |
[production] |
17:50 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE |
[production] |
17:49 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE |
[production] |
17:49 |
<ppchelko@deploy1001> |
Finished deploy [restbase/deploy@e54225d]: T270411 T270415 T270281 T270277 (duration: 65m 37s) |
[production] |
17:49 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE |
[production] |
17:29 |
<mforns@deploy1001> |
Finished deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 00m 07s) |
[production] |
17:29 |
<mforns@deploy1001> |
Started deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] |
[production] |
17:23 |
<mforns@deploy1001> |
Finished deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 10m 03s) |
[production] |