2021-01-22
|
19:15 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE |
[production] |
19:13 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE |
[production] |
19:11 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE |
[production] |
19:10 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE |
[production] |
19:09 |
<mutante> |
releases1002 systemctl reset-failed |
[production] |
19:09 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE |
[production] |
19:09 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE |
[production] |
19:08 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE |
[production] |
19:07 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE |
[production] |
18:47 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2364.codfw.wmnet |
[production] |
18:47 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2362.codfw.wmnet |
[production] |
18:47 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2360.codfw.wmnet |
[production] |
18:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2358.codfw.wmnet |
[production] |
18:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2362.codfw.wmnet |
[production] |
18:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2364.codfw.wmnet |
[production] |
18:45 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2360.codfw.wmnet |
[production] |
18:45 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2358.codfw.wmnet |
[production] |
18:17 |
<mutante> |
releases2002 - rebooting to confirm it works now and that the new disk gets auto-mounted |
[production] |
18:03 |
<mutante> |
releases1002 - replaced ens5 with ens6 in /etc/network/interfaces and rebooted again |
[production] |
18:01 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk |
[production] |
18:01 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk |
[production] |
17:59 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster |
[production] |
17:59 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster |
[production] |
17:57 |
<mutante> |
releases1002 (releases.wm.org active backend) - rebooting - hopefully it does not run into T272555, but if it does, it is now known how to fix it |
[production] |
17:55 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE |
[production] |
17:54 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE |
[production] |
17:53 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE |
[production] |
17:52 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE |
[production] |
17:52 |
<mutante> |
releases2001 - create new partition table with fdisk, make ext4 filesystem on /dev/vdb1 |
[production] |
17:50 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE |
[production] |
17:50 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE |
[production] |
17:49 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE |
[production] |
17:49 |
<ppchelko@deploy1001> |
Finished deploy [restbase/deploy@e54225d]: T270411 T270415 T270281 T270277 (duration: 65m 37s) |
[production] |
17:49 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE |
[production] |
17:29 |
<mforns@deploy1001> |
Finished deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 00m 07s) |
[production] |
17:29 |
<mforns@deploy1001> |
Started deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] |
[production] |
17:23 |
<mforns@deploy1001> |
Finished deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 10m 03s) |
[production] |
17:13 |
<mforns@deploy1001> |
Started deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] |
[production] |
16:44 |
<ppchelko@deploy1001> |
Started deploy [restbase/deploy@e54225d]: T270411 T270415 T270281 T270277 |
[production] |
16:40 |
<cmjohnson1> |
replacing optics/fiber pfw3a-eqiad:xe-0/0/17 and fasw-c1a-eqiad:xe-0/2/0 T271295 |
[production] |
16:19 |
<jynus> |
restart of backup source hosts on codfw T271913 |
[production] |
15:54 |
<otto@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' . |
[production] |
15:40 |
<moritzm> |
installing puppetboard1002 |
[production] |
15:24 |
<moritzm> |
installing puppetboard2002 |
[production] |
13:44 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13932 and previous config saved to /var/cache/conftool/dbconfig/20210122-134444-kormat.json |
[production] |
13:33 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P13931 and previous config saved to /var/cache/conftool/dbconfig/20210122-133341-marostegui.json |
[production] |
13:31 |
<marostegui> |
Stop replication on db1121 |
[production] |
13:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13930 and previous config saved to /var/cache/conftool/dbconfig/20210122-133044-marostegui.json |
[production] |
13:29 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13929 and previous config saved to /var/cache/conftool/dbconfig/20210122-132939-kormat.json |
[production] |
13:21 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2002.codfw.wmnet |
[production] |