2020-11-19
|
20:19 |
<pt1979@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
20:17 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
20:12 |
<herron> |
upgraded logstash-next to kibana 7.10 |
[production] |
19:23 |
<otto@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' . |
[production] |
19:23 |
<otto@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . |
[production] |
19:20 |
<otto@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . |
[production] |
19:20 |
<otto@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' . |
[production] |
19:14 |
<otto@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . |
[production] |
18:48 |
<mutante> |
gerrit1001 - re-enabling puppet after merging gerrit:642086 for T268260 (upstream bug 13701) |
[production] |
18:41 |
<mutante> |
gerrit1001 - added RequestHeader set "X-Forwarded-Proto" expr=%{REQUEST_SCHEME} in apache config, reloaded apache to fix redirect issue |
[production] |
18:37 |
<mutante> |
gerrit1001 - disabled puppet |
[production] |
18:19 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) |
[production] |
18:07 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) |
[production] |
18:03 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) |
[production] |
17:59 |
<clarakosi@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' . |
[production] |
17:47 |
<clarakosi@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' . |
[production] |
17:33 |
<hashar@deploy1001> |
Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5 (duration: 00m 09s) |
[production] |
17:33 |
<hashar@deploy1001> |
Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5 |
[production] |
17:32 |
<hashar> |
Upgrading Gerrit to 3.2.5 and restarting it |
[production] |
17:05 |
<dancy@deploy1001> |
Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 06s) |
[production] |
17:04 |
<dancy@deploy1001> |
rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16 |
[production] |
16:59 |
<ryankemper> |
T246345 [wdqs] Data-transfer of new wdqs node `wdqs1012` is complete, beginning transfer of `wdqs1004`->`wdqs1013` (public) and `wdqs1003`->`wdqs1011` (internal). Once these transfers are done `wdqs1012` and `wdqs1013` will need to be pooled and have their weights set to 10 after verifying they're healthy |
[production] |
16:58 |
<kormat> |
started mariadb on pc2010, now with more 🤞 |
[production] |
16:58 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
16:54 |
<kormat> |
stopping mariadb on pc2010 |
[production] |
16:54 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
16:43 |
<hashar> |
Restarting Gerrit replica instance on gerrit2001 |
[production] |
16:42 |
<hashar@deploy1001> |
Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server) (duration: 00m 10s) |
[production] |
16:42 |
<hashar@deploy1001> |
Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server) |
[production] |
16:41 |
<kormat> |
stopped and started replication on pc2010 to see if that would help it recover |
[production] |
16:40 |
<hashar@deploy1001> |
Finished deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5 (duration: 00m 05s) |
[production] |
16:40 |
<hashar@deploy1001> |
Started deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5 |
[production] |
16:35 |
<elukey> |
roll restart hadoop workers for openjdk upgrades |
[production] |
16:35 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.roll-restart-workers |
[production] |
16:06 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) |
[production] |
15:58 |
<moritzm> |
installing jupyter-notebook security updates on an-coord* |
[production] |
15:56 |
<elukey@cumin1001> |
START - Cookbook sre.presto.roll-restart-workers |
[production] |
15:52 |
<bblack> |
dns*: upgrade to gdnsd-3.4.0 on remainder of the dns fleet |
[production] |
15:44 |
<bblack> |
dns3001: upgrade gdnsd to 3.4.0 |
[production] |
15:43 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
15:41 |
<bblack> |
dns1001: upgrade gdnsd to 3.4.0 |
[production] |
15:40 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
15:40 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
15:36 |
<bblack> |
dns3002: upgrade gdnsd to 3.4.0 |
[production] |
15:36 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
15:36 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
15:31 |
<bblack> |
authdns1001: upgrade gdnsd to 3.4.0 |
[production] |
15:30 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
15:29 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
15:26 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |