2020-06-04
|
08:59 |
<akosiaris@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
08:59 |
<akosiaris@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
08:58 |
<akosiaris@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
08:50 |
<marostegui> |
Repool labsdb1009 after running maintain-views T252219 |
[production] |
08:42 |
<moritzm> |
restarting archiva to pick up Java security updates |
[production] |
08:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1107 to clone db1091 on s1 T253217', diff saved to https://phabricator.wikimedia.org/P11392 and previous config saved to /var/cache/conftool/dbconfig/20200604-081545-marostegui.json |
[production] |
08:14 |
<marostegui> |
Run sudo /usr/local/sbin/maintain-views --all-databases --replace-all on labsdb1009 - T252219 |
[production] |
07:49 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
07:45 |
<marostegui> |
Depool labsdb1009 - T252219 |
[production] |
07:45 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
07:33 |
<oblivian@puppetmaster1001> |
conftool action : set/weight=10; selector: dc=eqiad,cluster=labweb,service=labweb-ssl |
[production] |
07:32 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=cloudceph,service=cloudceph |
[production] |
06:52 |
<mutante> |
mwmaint1002 started mediawiki_job_cirrus_build_completion_indices_eqiad.service |
[production] |
06:06 |
<oblivian@puppetmaster1001> |
conftool action : set/weight=10; selector: name=logstash200.* |
[production] |
06:05 |
<oblivian@puppetmaster1001> |
conftool action : set/weight=10; selector: name=logstash100.* |
[production] |
06:04 |
<oblivian@puppetmaster1001> |
conftool action : set/weight=10; selector: cluster=eventschemas,service=eventschemas |
[production] |
06:02 |
<oblivian@puppetmaster1001> |
conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch.* |
[production] |
06:01 |
<oblivian@puppetmaster1001> |
conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch |
[production] |
05:59 |
<_joe_> |
fixing weights of cp2040 T245594 |
[production] |
05:31 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
05:28 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
00:36 |
<reedy@deploy1001> |
Synchronized php-1.35.0-wmf.35/includes/specials/SpecialUserrights.php: T254417 T251534 (duration: 01m 06s) |
[production] |
2020-06-03
|
23:08 |
<reedy@deploy1001> |
Synchronized wmf-config/CommonSettings-labs.php: T249834 (duration: 01m 06s) |
[production] |
23:06 |
<reedy@deploy1001> |
Synchronized wmf-config/InitialiseSettings-labs.php: T249834 (duration: 01m 06s) |
[production] |
22:22 |
<ryankemper@cumin2001> |
END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0) |
[production] |
21:54 |
<jforrester@deploy1001> |
rebuilt and synchronized wikiversions files: Re-rolling group1 to 1.35.0-wmf.35 for T253023 |
[production] |
21:49 |
<jforrester@deploy1001> |
Synchronized php-1.35.0-wmf.35/extensions/EventStreamConfig/includes/ApiStreamConfigs.php: T254390 ApiStreamConfigs: If the 'constraints' parameter is unset, don't explode (duration: 01m 06s) |
[production] |
21:43 |
<cstone> |
civicrm revision changed from 63508b01b9 to 11b0e7c7e5 |
[production] |
21:16 |
<ryankemper@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-upgrade |
[production] |
21:15 |
<ryankemper> |
The previously run `_cluster/reroute?retry_failed=true` command worked as intended: the two shards in question have recovered and we're back to green cluster status. We're now in a known state and ready to proceed with the eqiad rolling upgrade |
[production] |
21:13 |
<ryankemper> |
Ran `curl -X POST "https://localhost:9243/_cluster/reroute?pretty&retry_failed=true&explain=true" -H 'Content-Type: application/json' -d '{}' --insecure` via the ssh tunnel `ssh bast4002.wikimedia.org -L 9243:search.svc.eqiad.wmnet:9243 -L 9443:search.svc.eqiad.wmnet:9443 -L 9643:search.svc.eqiad.wmnet:9643`, two unassigned shards are now initializing |
[production] |
21:05 |
<ryankemper> |
Elasticsearch Eqiad was in yellow cluster status before starting the above cookbook run (therefore the run was a no-op until I Ctrl+C'd it), going to try unsticking the two unassigned shards via `/_cluster/reroute?retry_failed=true` |
[production] |
21:03 |
<ryankemper@cumin2001> |
END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97) |
[production] |
20:58 |
<ryankemper@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-upgrade |
[production] |
20:52 |
<ryankemper@cumin2001> |
END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0) |
[production] |
20:49 |
<eileen> |
civicrm revision changed from eb156dffa4 to 63508b01b9, config revision is 95dcdb0a8a |
[production] |
20:47 |
<ryankemper@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-upgrade |
[production] |
20:19 |
<gehel> |
elasticsearch cluster restart stopped |
[production] |
20:18 |
<ryankemper@cumin2001> |
END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97) |
[production] |
19:35 |
<ppchelko@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
19:35 |
<ppchelko@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . |
[production] |
19:33 |
<ppchelko@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . |
[production] |
19:32 |
<ppchelko@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
19:30 |
<ryankemper@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-upgrade |
[production] |
19:29 |
<ppchelko@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . |
[production] |
19:29 |
<ppchelko@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
19:20 |
<jforrester@deploy1001> |
rebuilt and synchronized wikiversions files: Revert group1 wikis to wmf.34 T253023 |
[production] |
19:16 |
<hnowlan@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
19:15 |
<hnowlan@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
19:14 |
<jforrester@deploy1001> |
Synchronized php: group1 wikis to 1.35.0-wmf.35 (duration: 01m 05s) |
[production] |