2024-08-14
12:52 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67293 and previous config saved to /var/cache/conftool/dbconfig/20240814-125245-arnaudb.json [production]
12:49 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on 9 hosts with reason: replication table exclusion deployment [production]
12:49 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:20:00 on 9 hosts with reason: replication table exclusion deployment [production]
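The START/END pair above is the sre.hosts.downtime Spicerack cookbook, which schedules monitoring downtime on the matched hosts before maintenance. A minimal sketch of such an invocation from a cumin host, assuming the usual duration and reason flags (flag names may differ between cookbook versions, and the host query shown is hypothetical):

  # Downtime a hypothetical set of nine replicas for 20 minutes during the deployment.
  sudo cookbook sre.hosts.downtime --minutes 20 \
      -r "replication table exclusion deployment" \
      'db21[89-97].codfw.wmnet'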
12:37 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 25%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67292 and previous config saved to /var/cache/conftool/dbconfig/20240814-123739-arnaudb.json [production]
12:22 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 16%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67291 and previous config saved to /var/cache/conftool/dbconfig/20240814-122234-arnaudb.json [production]
12:07 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 8%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67290 and previous config saved to /var/cache/conftool/dbconfig/20240814-120729-arnaudb.json [production]
11:52 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 4%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67289 and previous config saved to /var/cache/conftool/dbconfig/20240814-115223-arnaudb.json [production]
11:37 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 2%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67288 and previous config saved to /var/cache/conftool/dbconfig/20240814-113718-arnaudb.json [production]
11:23 <mvolz@deploy1003> helmfile [eqiad] DONE helmfile.d/services/citoid: apply [production]
11:23 <mvolz@deploy1003> helmfile [eqiad] START helmfile.d/services/citoid: apply [production]
11:22 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 1%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67287 and previous config saved to /var/cache/conftool/dbconfig/20240814-112212-arnaudb.json [production]
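The dbctl commits above ramp db2189's pooled weight back up in stages (1%, 2%, 4%, 8%, 16%, 25%, and finally 50% at 12:52) at roughly 15-minute intervals after the corrupted index was repaired. In practice a repool helper generates these commits; a hand-rolled sketch of the same ramp, assuming dbctl's per-instance percentage option, would look roughly like:

  # Step db2189 back into the pool, committing the config change at each stage.
  for pct in 1 2 4 8 16 25 50; do
      sudo dbctl instance db2189 pool -p "$pct"
      sudo dbctl config commit -m "db2189 (re)pooling @ ${pct}%: corrupted index fixed"
      sleep 900   # ~15 minutes between steps, matching the log cadence
  done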
11:20 <mvolz@deploy1003> helmfile [codfw] DONE helmfile.d/services/citoid: apply [production]
11:19 <mvolz@deploy1003> helmfile [codfw] START helmfile.d/services/citoid: apply [production]
11:19 <mvolz@deploy1003> helmfile [staging] DONE helmfile.d/services/citoid: apply [production]
11:18 <mvolz@deploy1003> helmfile [staging] START helmfile.d/services/citoid: apply [production]
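The citoid START/DONE pairs above follow the standard Kubernetes service deployment order: staging first, then codfw, then eqiad. Each step is a helmfile apply run from the deployment server against the service's directory in the deployment-charts checkout; a sketch, assuming the usual /srv/deployment-charts path on deploy1003:

  cd /srv/deployment-charts/helmfile.d/services/citoid
  helmfile -e staging apply   # 11:18-11:19
  helmfile -e codfw apply     # 11:19-11:20
  helmfile -e eqiad apply     # 11:23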
09:56 <fnegri@cumin1002> conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1 [production]
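The conftool action above pools the s1 service on clouddb1017, a Wiki Replicas host. The selector/action pair in the log maps directly onto a confctl invocation; the equivalent command would be approximately:

  sudo confctl select 'name=clouddb1017.eqiad.wmnet,service=s1' set/pooled=yes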
09:26 <klausman@deploy1003> helmfile [codfw] DONE helmfile.d/services/api-gateway: apply [production]
09:26 <klausman@deploy1003> helmfile [codfw] START helmfile.d/services/api-gateway: apply [production]
09:23 <klausman@deploy1003> helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply [production]
09:23 <klausman@deploy1003> helmfile [eqiad] START helmfile.d/services/api-gateway: apply [production]
09:17 <klausman@deploy1003> helmfile [staging] DONE helmfile.d/services/api-gateway: apply [production]
09:16 <klausman@deploy1003> helmfile [staging] START helmfile.d/services/api-gateway: apply [production]
09:11 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: replication still catching up [production]
09:11 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: replication still catching up [production]
08:53 <jayme@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-main2010.codfw.wmnet with OS bullseye [production]
08:46 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye [production]
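The kafka-main2010 reimage above ended in error (exit_code=93) after about seven minutes. For reference, the cookbook is normally launched from a cumin host along these lines (a sketch from memory; the cookbook may expect the short hostname rather than the FQDN, and a task flag is often added):

  sudo cookbook sre.hosts.reimage --os bullseye kafka-main2010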
07:45 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: index corruption [production]
07:45 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: index corruption [production]
00:54 <eileen> config revision changed from d6f17100 to f569b590 [production]
00:41 <eileen> civicrm upgraded from dd54b9ae to eecbba5d [production]
00:11 <eileen> civicrm upgraded from 686c7c5f to dd54b9ae [production]
00:04 <eileen> config revision changed from e8cc0ed6 to d6f17100 [production]
2024-08-13
23:08 <ejegg> payments-wiki upgraded from 2d48f432 to 3eb3be67 [production]
21:56 <inflatador> bking@cumin2002 reboot wdqs101[3-5],1018,1020 from DRAC due to unresponsiveness T372442 [production]
21:16 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
21:16 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
21:15 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
21:15 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
21:09 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
21:09 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
21:07 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
21:07 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
20:51 <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards [production]
20:22 <brett> Update ncmonitor to 1.2.0 via apt1002 [production]
19:57 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards [production]
19:44 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
19:43 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
19:32 <bking@cumin2002> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (2 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874 [production]
19:29 <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards [production]
19:27 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards [production]