2021-08-04
|
17:57 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE |
[production] |
17:49 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet |
[production] |
17:49 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet |
[production] |
17:46 |
<dzahn@cumin1001> |
conftool action : set/weight=30; selector: name=mw2380.codfw.wmnet |
[production] |
17:46 |
<dzahn@cumin1001> |
conftool action : set/weight=30; selector: name=mw237[7-9].codfw.wmnet |
[production] |
17:45 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2357.codfw.wmnet |
[production] |
17:41 |
<dzahn@cumin1001> |
conftool action : set/weight=25; selector: name=mw2357.codfw.wmnet |
[production] |
17:40 |
<mutante> |
mw2357, mw2377, mw2378 - scap pull |
[production] |
17:40 |
<dzahn@cumin1001> |
conftool action : set/weight=30; selector: name=mw2357.codfw.wmnet |
[production] |
17:39 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
17:29 |
<dzahn@cumin1001> |
conftool action : set/weight=25; selector: name=mw238[1-2].codfw.wmnet |
[production] |
17:29 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
17:27 |
<ejegg> |
updated payments-wiki config to 360c8a1f08 |
[production] |
17:26 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2355.codfw.wmnet |
[production] |
17:25 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2353.codfw.wmnet |
[production] |
17:25 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2351.codfw.wmnet |
[production] |
17:25 |
<dzahn@cumin1001> |
conftool action : set/weight=25; selector: name=mw2355.codfw.wmnet |
[production] |
17:25 |
<dzahn@cumin1001> |
conftool action : set/weight=25; selector: name=mw2353.codfw.wmnet |
[production] |
17:25 |
<dzahn@cumin1001> |
conftool action : set/weight=25; selector: name=mw2351.codfw.wmnet |
[production] |
17:15 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE |
[production] |
17:13 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE |
[production] |
17:12 |
<urbanecm@deploy1002> |
Synchronized php-1.37.0-wmf.17/extensions/GrowthExperiments/maintenance/updateMenteeData.php: 66c2c7593322dfc575edc818aaff8d9b79466bdd: updateMenteeData: Output how long the script took (T287964) (duration: 01m 07s) |
[production] |
17:11 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE |
[production] |
17:11 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE |
[production] |
17:10 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
17:10 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
17:09 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE |
[production] |
17:08 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE |
[production] |
16:57 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
16:55 |
<mutante> |
mw2351, mw2353, mw2355 - scap pull |
[production] |
16:40 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
16:37 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
16:25 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2355.codfw.wmnet with reason: reimage |
[production] |
16:25 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on mw2355.codfw.wmnet with reason: reimage |
[production] |
16:23 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE |
[production] |
16:23 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage |
[production] |
16:22 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage |
[production] |
16:22 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage |
[production] |
16:22 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage |
[production] |
16:21 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage |
[production] |
16:21 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage |
[production] |
16:21 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE |
[production] |
16:21 |
<joe> |
find . -type f -delete on /var/cache/nginx-docker-registry on registry2*, the disk is too small for unbound cache *and* accepting large uploads |
[production] |
16:20 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE |
[production] |
16:19 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE |
[production] |
16:18 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE |
[production] |
16:16 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE |
[production] |
16:15 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009 |
[production] |
16:15 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009 |
[production] |
16:14 |
<hnowlan> |
draining maps1008 from cassandra cluster |
[production] |