2020-06-03
§
|
21:16 |
<ryankemper@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-upgrade |
[production] |
21:15 |
<ryankemper> |
The previously run `_cluster/reroute?retry_failed=true` command worked as intended: the two shards in question have recovered and we're back to green cluster status. We're now in a known state and ready to proceed with the eqiad rolling upgrade |
[production] |
21:13 |
<ryankemper> |
Ran `curl -X POST "https://localhost:9243/_cluster/reroute?pretty&retry_failed=true&explain=true" -H 'Content-Type: application/json' -d '{}' --insecure` via the ssh tunnel `ssh bast4002.wikimedia.org -L 9243:search.svc.eqiad.wmnet:9243 -L 9443:search.svc.eqiad.wmnet:9443 -L 9643:search.svc.eqiad.wmnet:9643`, two unassigned shards are now initializing |
[production] |
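Note: the check-then-retry flow in the entries around this one (yellow cluster, `retry_failed` reroute, back to green) can be sketched roughly as below. This is a minimal illustration only, not the cookbook's actual logic. The helper names `reroute_url`, `needs_retry`, and `ready_for_rolling_upgrade` are hypothetical, and the `health` dict is assumed to be the parsed JSON body of `_cluster/health` (fields `status` and `unassigned_shards`).

```python
from urllib.parse import urlencode


def reroute_url(base_url: str) -> str:
    """Build the _cluster/reroute URL that retries failed shard allocations.

    base_url is assumed to be the tunnelled endpoint, e.g. https://localhost:9243.
    """
    params = {"pretty": "true", "retry_failed": "true", "explain": "true"}
    return f"{base_url.rstrip('/')}/_cluster/reroute?{urlencode(params)}"


def needs_retry(health: dict) -> bool:
    """True when the cluster is not green and still has unassigned shards."""
    return health.get("status") != "green" and health.get("unassigned_shards", 0) > 0


def ready_for_rolling_upgrade(health: dict) -> bool:
    """Assumption: only a green cluster is a safe starting point for the upgrade."""
    return health.get("status") == "green"


if __name__ == "__main__":
    # Example health document shaped like the yellow state described in the log.
    yellow = {"status": "yellow", "unassigned_shards": 2}
    if needs_retry(yellow):
        print("POST", reroute_url("https://localhost:9243"))
```

The URL it prints targets the same endpoint as the logged `curl` command, with `pretty=true` written out explicitly. Sending the request and re-polling health until it reports green are left out on purpose.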
21:05 |
<ryankemper> |
Elasticsearch Eqiad was in yellow cluster status before starting the above cookbook run (therefore the run was a no-op until I Ctrl+C'd), going to try unsticking the two unassigned shards via `/_cluster/reroute?retry_failed=true` |
[production] |
21:03 |
<ryankemper@cumin2001> |
END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97) |
[production] |
20:58 |
<ryankemper@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-upgrade |
[production] |
20:52 |
<ryankemper@cumin2001> |
END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0) |
[production] |
20:49 |
<eileen> |
civicrm revision changed from eb156dffa4 to 63508b01b9, config revision is 95dcdb0a8a |
[production] |
20:47 |
<ryankemper@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-upgrade |
[production] |
20:19 |
<gehel> |
elasticsearch cluster restart stopped |
[production] |
20:18 |
<ryankemper@cumin2001> |
END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97) |
[production] |
19:35 |
<ppchelko@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
19:35 |
<ppchelko@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . |
[production] |
19:33 |
<ppchelko@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . |
[production] |
19:32 |
<ppchelko@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
19:30 |
<ryankemper@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-upgrade |
[production] |
19:29 |
<ppchelko@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . |
[production] |
19:29 |
<ppchelko@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
19:20 |
<jforrester@deploy1001> |
rebuilt and synchronized wikiversions files: Revert group1 wikis to wmf.34 T253023 |
[production] |
19:16 |
<hnowlan@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
19:15 |
<hnowlan@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
19:14 |
<jforrester@deploy1001> |
Synchronized php: group1 wikis to 1.35.0-wmf.35 (duration: 01m 05s) |
[production] |
19:13 |
<jforrester@deploy1001> |
rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.35 |
[production] |
19:05 |
<jforrester@deploy1001> |
Synchronized dblists/mobilemainpagelegacy.dblist: T32405 Stop special casing the main page on another 47 projects (duration: 01m 08s) |
[production] |
19:01 |
<ppchelko@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 601843 Enable talk pages on Swedish Minerva (duration: 01m 08s) |
[production] |
18:59 |
<hnowlan@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
18:56 |
<hnowlan@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
18:55 |
<ppchelko@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 601842 - Disable growth survey (duration: 01m 06s) |
[production] |
18:49 |
<ppchelko@deploy1001> |
Synchronized wmf-config/CommonSettings.php: SWAT: gerrit 596277 Use AddFooterLink hook for code of conduct and contact links (duration: 01m 05s) |
[production] |
18:34 |
<ppchelko@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 599150 - enable kafka purges for group0 (duration: 01m 06s) |
[production] |
18:19 |
<ppchelko@deploy1001> |
Synchronized wmf-config/CommonSettings.php: SWAT: gerrit 570396 - enable kask-session everywhere. CS.php (duration: 01m 05s) |
[production] |
18:14 |
<ppchelko@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 570396 - enable kask-session everywhere. IS.php (duration: 01m 06s) |
[production] |
17:15 |
<ejegg> |
updated payments-wiki from e46114d8b1 to c1d14a5db7 |
[production] |
17:08 |
<elukey> |
ganeti: gnd-instance reboot an-launcher1001 to get new memory settings - T254125 |
[production] |
15:21 |
<hnowlan@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
15:19 |
<hnowlan@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
15:12 |
<hnowlan@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
14:50 |
<kormat@deploy1001> |
Synchronized wmf-config/db-eqiad.php: Repool pc1009 in pc3 after reimaging T252182 (duration: 01m 06s) |
[production] |
14:47 |
<moritzm> |
updated grafana on cloudmetrics* to 6.7.4 |
[production] |
14:26 |
<kormat> |
stopping replication on pc1010 |
[production] |
14:20 |
<kormat@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:17 |
<kormat@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:16 |
<gehel> |
cleaning commonsrdf-dumps cron entry manually on snapshot1008 |
[production] |
14:00 |
<hashar> |
Restarted CI Jenkins for plugin update |
[production] |
13:59 |
<kormat@deploy1001> |
Synchronized wmf-config/db-eqiad.php: Replace pc1009 with pc1010 reimaging T252182 (duration: 01m 06s) |
[production] |
13:47 |
<kormat> |
reimaging *pc1009 (promise) to buster T252182 |
[production] |
13:44 |
<kormat> |
reimaging pc1007 to buster, wish me luck T252182 |
[production] |
13:20 |
<hnowlan@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
13:13 |
<kormat@deploy1001> |
Synchronized wmf-config/db-codfw.php: Put pc2009 back into pc3 after reimaging T252182 (duration: 01m 05s) |
[production] |
13:01 |
<hnowlan@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |