2022-01-26
|
06:30 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance |
[production] |
06:30 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance |
[production] |
06:30 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance |
[production] |
06:24 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repool db2086 (s7,s8) T299882', diff saved to https://phabricator.wikimedia.org/P19229 and previous config saved to /var/cache/conftool/dbconfig/20220126-062406-marostegui.json |
[production] |
05:02 |
<ryankemper@deploy1002> |
Finished deploy [wdqs/wdqs@dc7c5ac] (wcqs): Deploy 0.3.100 to WCQS (duration: 02m 21s) |
[production] |
04:59 |
<ryankemper@deploy1002> |
Started deploy [wdqs/wdqs@dc7c5ac] (wcqs): Deploy 0.3.100 to WCQS |
[production] |
04:56 |
<ryankemper> |
[WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good |
[production] |
03:42 |
<ryankemper> |
[WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` |
[production] |
03:42 |
<ryankemper> |
[WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` |
[production] |
03:42 |
<ryankemper> |
[WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` |
[production] |
03:40 |
<ryankemper@deploy1002> |
Finished deploy [wdqs/wdqs@dc7c5ac]: 0.3.100 (duration: 08m 35s) |
[production] |
03:32 |
<ryankemper> |
[WDQS Deploy] Tests passing following deploy of `0.3.100` on canary `wdqs1003`; proceeding to rest of fleet |
[production] |
03:31 |
<ryankemper@deploy1002> |
Started deploy [wdqs/wdqs@dc7c5ac]: 0.3.100 |
[production] |
03:30 |
<ryankemper> |
[WDQS Deploy] Gearing up for deploy of wdqs `0.3.100`. Pre-deploy tests passing on canary `wdqs1003` |
[production] |
02:49 |
<ryankemper> |
[WDQS] T299098 `ryankemper@wdqs2003:~$ sudo pool` (forgot to pool after dcops fixed hw issue) |
[production] |
01:05 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
01:03 |
<catrope@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:757087|Enable migration mode on Italian and MediaWiki.org (T299927)]] (duration: 00m 54s) |
[production] |
01:01 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
01:01 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
01:00 |
<catrope@deploy1002> |
Synchronized php-1.38.0-wmf.18/skins/Vector/: Backport: [[gerrit:756997|Do not load common.js twice (T300070)]] and [[gerrit:756696|Fix bug in SkinVersionLookup (T299971)]] (duration: 00m 51s) |
[production] |
01:00 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
00:56 |
<catrope@deploy1002> |
Synchronized php-1.38.0-wmf.19/skins/Vector/: Backport: [[gerrit:756998|Do not load common.js twice (T300070)]] (duration: 02m 43s) |
[production] |
00:55 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
00:54 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
00:54 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
00:53 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
00:48 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
00:44 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
00:44 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
00:37 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
00:11 |
<ryankemper> |
T294805 Reverted https://gerrit.wikimedia.org/r/c/operations/puppet/+/757003 (elasticsearch-oss dependency issues, will pick this back up tomorrow); re-enabling puppet across elastic1* |
[production] |
00:03 |
<ryankemper> |
T294805 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/757003; running puppet on `elastic1068` to make it join the fleet |
[production] |
2022-01-25
|
23:42 |
<ryankemper> |
T294805 [Elastic] Step 2: Disabling puppet in advance of merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/736117 |
[production] |
23:20 |
<ryankemper> |
T294805 [Elastic] Merged https://gerrit.wikimedia.org/r/736116, step 1 of bringing new eqiad 10G refresh hosts into service |
[production] |
21:20 |
<bblack@cumin1001> |
conftool action : set/weight=100; selector: dc=drmrs,service=ats-be |
[production] |
21:20 |
<bblack@cumin1001> |
conftool action : set/weight=1; selector: dc=drmrs,service=varnish-fe |
[production] |
21:20 |
<bblack@cumin1001> |
conftool action : set/weight=1; selector: dc=drmrs,service=ats-tls |
[production] |
21:03 |
<cwhite> |
end transition to logstash output opensearch plugin T299168 |
[production] |
20:41 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
20:35 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
20:35 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
20:29 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
20:18 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
20:17 |
<cwhite> |
begin transition to logstash output opensearch plugin T299168 |
[production] |
20:12 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
20:12 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
20:08 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
20:05 |
<brennen@deploy1002> |
rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.19 refs T293960 |
[production] |
20:03 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1008.eqiad.wmnet with OS buster |
[production] |
20:01 |
<brennen> |
train 1.38.0-wmf.19 (T293960): testwiki sync finished, still no open blockers, proceeding to group0 |
[production] |