2020-06-03
21:16 <ryankemper@cumin2001> START - Cookbook sre.elasticsearch.rolling-upgrade [production]
21:15 <ryankemper> The previously run `_cluster/reroute?retry_failed=true` command worked as intended, the two shards in question have recovered and we're back to green cluster status. We're now in a known state and ready to proceed with the eqiad rolling upgrade [production]
21:13 <ryankemper> Ran `curl -X POST "https://localhost:9243/_cluster/reroute?pretty&retry_failed=true&explain=true" -H 'Content-Type: application/json' -d '{}' --insecure` via the ssh tunnel `ssh bast4002.wikimedia.org -L 9243:search.svc.eqiad.wmnet:9243 -L 9443:search.svc.eqiad.wmnet:9443 -L 9643:search.svc.eqiad.wmnet:9643`, two unassigned shards are now initializing [production]
21:05 <ryankemper> Elasticsearch Eqiad was in yellow cluster status before starting the above cookbook run (therefore the run was a no-op until I Ctrl+C'd), going to try unsticking the two unassigned shards via `/_cluster/reroute?retry_failed=true` [production]
21:03 <ryankemper@cumin2001> END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97) [production]
20:58 <ryankemper@cumin2001> START - Cookbook sre.elasticsearch.rolling-upgrade [production]
20:52 <ryankemper@cumin2001> END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0) [production]
20:49 <eileen> civicrm revision changed from eb156dffa4 to 63508b01b9, config revision is 95dcdb0a8a [production]
20:47 <ryankemper@cumin2001> START - Cookbook sre.elasticsearch.rolling-upgrade [production]
20:19 <gehel> elasticsearch cluster restart stopped [production]
20:18 <ryankemper@cumin2001> END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97) [production]
19:35 <ppchelko@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
19:35 <ppchelko@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
19:33 <ppchelko@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
19:32 <ppchelko@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
19:30 <ryankemper@cumin2001> START - Cookbook sre.elasticsearch.rolling-upgrade [production]
19:29 <ppchelko@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
19:29 <ppchelko@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
19:20 <jforrester@deploy1001> rebuilt and synchronized wikiversions files: Revert group1 wikis to wmf.34 T253023 [production]
19:16 <hnowlan@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
19:15 <hnowlan@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
19:14 <jforrester@deploy1001> Synchronized php: group1 wikis to 1.35.0-wmf.35 (duration: 01m 05s) [production]
19:13 <jforrester@deploy1001> rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.35 [production]
19:05 <jforrester@deploy1001> Synchronized dblists/mobilemainpagelegacy.dblist: T32405 Stop special casing the main page on another 47 projects (duration: 01m 08s) [production]
19:01 <ppchelko@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 601843 Enable talk pages on Swedish Minerva (duration: 01m 08s) [production]
18:59 <hnowlan@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
18:56 <hnowlan@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
18:55 <ppchelko@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 601842 - Disable growth survey (duration: 01m 06s) [production]
18:49 <ppchelko@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: gerrit 596277 Use AddFooterLink hook for code of conduct and contact links (duration: 01m 05s) [production]
18:34 <ppchelko@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 599150 - enable kafka purges for group0 (duration: 01m 06s) [production]
18:19 <ppchelko@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: gerrit 570396 - enable kask-session everywhere. CS.php (duration: 01m 05s) [production]
18:14 <ppchelko@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 570396 - enable kask-session everywhere. IS.php (duration: 01m 06s) [production]
17:15 <ejegg> updated payments-wiki from e46114d8b1 to c1d14a5db7 [production]
17:08 <elukey> ganeti: gnt-instance reboot an-launcher1001 to get new memory settings - T254125 [production]
15:21 <hnowlan@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
15:19 <hnowlan@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
15:12 <hnowlan@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [production]
14:50 <kormat@deploy1001> Synchronized wmf-config/db-eqiad.php: Repool pc1009 in pc3 after reimaging T252182 (duration: 01m 06s) [production]
14:47 <moritzm> updated grafana on cloudmetrics* to 6.7.4 [production]
14:26 <kormat> stopping replication on pc1010 [production]
14:20 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
14:17 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
14:16 <gehel> cleaning commonsrdf-dumps cron entry manually on snapshot1008 [production]
14:00 <hashar> Restarted CI Jenkins for plugin update [production]
13:59 <kormat@deploy1001> Synchronized wmf-config/db-eqiad.php: Replace pc1009 with pc1010 reimaging T252182 (duration: 01m 06s) [production]
13:47 <kormat> reimaging *pc1009 (promise) to buster T252182 [production]
13:44 <kormat> reimaging pc1007 to buster, wish me luck T252182 [production]
13:20 <hnowlan@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [production]
13:13 <kormat@deploy1001> Synchronized wmf-config/db-codfw.php: Put pc2009 back into pc3 after reimaging T252182 (duration: 01m 05s) [production]
13:01 <hnowlan@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [production]