2020-04-12
10:18 | <elukey> | restart wdqs-updater on wdqs1004 (logs show no reports from the past hours, last ones were stack traces related to a json decode failure) | [production]
06:59 | <dcausse> | restarting blazegraph on wdqs1004 (T242453) | [production]
06:35 | <elukey@puppetmaster1001> | conftool action : set/pooled=no; selector: name=restbase1025.eqiad.wmnet | [production]
06:32 | <elukey> | powerdown restbase1025 - T250027 | [production]
06:20 | <elukey> | powercycle restbase1025 (not reachable, serial console shows blank, racadm getsel reports errors with DIMM_B2) | [production]
05:53 | <bblack> | pushing https://gerrit.wikimedia.org/r/588134 to cache_text | [production]
05:50 | <vgutierrez> | restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet - T249335 | [production]
05:31 | <bblack> | pushing https://gerrit.wikimedia.org/r/588133 to cache_text | [production]
2020-04-11
19:52 | <cdanis@cumin1001> | dbctl commit (dc=all): 'slight deweight to db1111', diff saved to https://phabricator.wikimedia.org/P10960 and previous config saved to /var/cache/conftool/dbconfig/20200411-195235-cdanis.json | [production]
17:35 | <cdanis@cumin1001> | dbctl commit (dc=all): 's8: +weight db1111, -weight db1126', diff saved to https://phabricator.wikimedia.org/P10959 and previous config saved to /var/cache/conftool/dbconfig/20200411-173517-cdanis.json | [production]
15:39 | <vgutierrez> | restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet - T249335 | [production]
09:30 | <elukey@cumin1001> | END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) | [production]
09:20 | <elukey@cumin1001> | START - Cookbook sre.presto.roll-restart-workers | [production]
07:01 | <vgutierrez> | restart ats-tls on cp[1079,1081,1083,1085].eqiad.wmnet - T249335 | [production]
2020-04-10
21:12 | <cdanis@cumin1001> | dbctl commit (dc=all): 'db1111 seems overloaded', diff saved to https://phabricator.wikimedia.org/P10954 and previous config saved to /var/cache/conftool/dbconfig/20200410-211202-cdanis.json | [production]
19:37 | <cdanis> | cdanis@re0.cr1-codfw> clear bfd session address 208.80.153.220 | [production]
15:03 | <vgutierrez> | restart ats-tls on cp1083 and cp1085 - T249335 | [production]
13:14 | <hashar@deploy1001> | Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 40s) | [production]
13:14 | <hashar@deploy1001> | Started deploy [zuul/deploy@4a69913]: (no justification provided) | [production]
13:12 | <mutante> | restarted and re-armed keyholder on deploy1001 to pick up changes for zuul scap deploy | [production]
12:12 | <dzahn@cumin1001> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) | [production]
12:11 | <dzahn@cumin1001> | START - Cookbook sre.ganeti.makevm | [production]
12:10 | <mutante> | Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. (T249907) | [production]
12:10 | <dzahn@cumin1001> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) | [production]
12:10 | <mutante> | Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. This may take a few minutes. | [production]
12:10 | <dzahn@cumin1001> | START - Cookbook sre.ganeti.makevm | [production]
12:09 | <dzahn@cumin1001> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) | [production]
12:09 | <dzahn@cumin1001> | START - Cookbook sre.ganeti.makevm | [production]
11:47 | <akosiaris@deploy1001> | helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'canary' . | [production]
11:47 | <akosiaris@deploy1001> | helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' . | [production]
11:44 | <akosiaris@deploy1001> | helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' . | [production]
11:39 | <akosiaris@deploy1001> | helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' . | [production]
09:43 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10953 and previous config saved to /var/cache/conftool/dbconfig/20200410-094359-marostegui.json | [production]
09:31 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10952 and previous config saved to /var/cache/conftool/dbconfig/20200410-093129-marostegui.json | [production]
08:52 | <hashar@deploy1001> | Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 16s) | [production]
08:51 | <hashar@deploy1001> | Started deploy [zuul/deploy@4a69913]: (no justification provided) | [production]
08:46 | <hashar@deploy1001> | Finished deploy [zuul/deploy@5a0a03a]: (no justification provided) (duration: 02m 20s) | [production]
08:44 | <hashar@deploy1001> | Started deploy [zuul/deploy@5a0a03a]: (no justification provided) | [production]
08:39 | <mutante> | deploy1001 - keyholder disarm, keyholder arm | [production]
08:32 | <mutante> | fix comment in deployment ssh key for zuul to include the path to the key on deploy1001 | [production]
08:24 | <vgutierrez> | update puppet compiler facts | [production]
08:20 | <hashar@deploy1001> | Finished deploy [integration/zuul/deploy@6c3ddad]: (no justification provided) (duration: 00m 11s) | [production]
08:19 | <hashar@deploy1001> | Started deploy [integration/zuul/deploy@6c3ddad]: (no justification provided) | [production]
08:03 | <hashar@deploy1001> | Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 05s) | [production]
08:03 | <hashar@deploy1001> | Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) | [production]
07:52 | <mutante> | closing port 80 on phab hosts for caching servers | [production]
07:37 | <ema> | cp3050: back to vhtcpd for the holidays T249583 | [production]
07:00 | <mutante> | sodium - sudo -u mirror ftpsync | [production]
06:58 | <mutante> | armed keyholder on deploy1001 | [production]
06:19 | <marostegui@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | [production]