2020-06-04
08:59 <akosiaris@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:58 <akosiaris@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:50 <marostegui> Repool labsdb1009 after running maintain-views T252219 [production]
08:42 <moritzm> restarting archiva to pick up Java security updates [production]
08:15 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1107 to clone db1091 on s1 T253217', diff saved to https://phabricator.wikimedia.org/P11392 and previous config saved to /var/cache/conftool/dbconfig/20200604-081545-marostegui.json [production]
08:14 <marostegui> Run sudo /usr/local/sbin/maintain-views --all-databases --replace-all on labsdb1009 - T252219 [production]
07:49 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
07:45 <marostegui> Depool labsdb1009 - T252219 [production]
07:45 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime [production]
07:33 <oblivian@puppetmaster1001> conftool action : set/weight=10; selector: dc=eqiad,cluster=labweb,service=labweb-ssl [production]
07:32 <oblivian@puppetmaster1001> conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=cloudceph,service=cloudceph [production]
06:52 <mutante> mwmaint1002 started mediawiki_job_cirrus_build_completion_indices_eqiad.service [production]
06:06 <oblivian@puppetmaster1001> conftool action : set/weight=10; selector: name=logstash200.* [production]
06:05 <oblivian@puppetmaster1001> conftool action : set/weight=10; selector: name=logstash100.* [production]
06:04 <oblivian@puppetmaster1001> conftool action : set/weight=10; selector: cluster=eventschemas,service=eventschemas [production]
06:02 <oblivian@puppetmaster1001> conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch.* [production]
06:01 <oblivian@puppetmaster1001> conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch [production]
05:59 <_joe_> fixing weights of cp2040 T245594 [production]
05:31 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
05:28 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
00:36 <reedy@deploy1001> Synchronized php-1.35.0-wmf.35/includes/specials/SpecialUserrights.php: T254417 T251534 (duration: 01m 06s) [production]
2020-06-03
23:08 <reedy@deploy1001> Synchronized wmf-config/CommonSettings-labs.php: T249834 (duration: 01m 06s) [production]
23:06 <reedy@deploy1001> Synchronized wmf-config/InitialiseSettings-labs.php: T249834 (duration: 01m 06s) [production]
22:22 <ryankemper@cumin2001> END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0) [production]
21:54 <jforrester@deploy1001> rebuilt and synchronized wikiversions files: Re-rolling group1 to 1.35.0-wmf.35 for T253023 [production]
21:49 <jforrester@deploy1001> Synchronized php-1.35.0-wmf.35/extensions/EventStreamConfig/includes/ApiStreamConfigs.php: T254390 ApiStreamConfigs: If the 'constraints' parameter is unset, don't explode (duration: 01m 06s) [production]
21:43 <cstone> civicrm revision changed from 63508b01b9 to 11b0e7c7e5 [production]
21:16 <ryankemper@cumin2001> START - Cookbook sre.elasticsearch.rolling-upgrade [production]
21:15 <ryankemper> The previously run `_cluster/reroute?retry_failed=true` command worked as intended: the two shards in question have recovered and we're back to green cluster status. We're now in a known state and ready to proceed with the eqiad rolling upgrade [production]
21:13 <ryankemper> Ran `curl -X POST "https://localhost:9243/_cluster/reroute?pretty&retry_failed=true&explain=true" -H 'Content-Type: application/json' -d '{}' --insecure` via the ssh tunnel `ssh bast4002.wikimedia.org -L 9243:search.svc.eqiad.wmnet:9243 -L 9443:search.svc.eqiad.wmnet:9443 -L 9643:search.svc.eqiad.wmnet:9643`; two unassigned shards are now initializing [production]
21:05 <ryankemper> Elasticsearch Eqiad was in yellow cluster status before starting the above cookbook run (therefore the run was a no-op until I Ctrl+C'd); going to try unsticking the two unassigned shards via `/_cluster/reroute?retry_failed=true` [production]
21:03 <ryankemper@cumin2001> END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97) [production]
20:58 <ryankemper@cumin2001> START - Cookbook sre.elasticsearch.rolling-upgrade [production]
20:52 <ryankemper@cumin2001> END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0) [production]
20:49 <eileen> civicrm revision changed from eb156dffa4 to 63508b01b9, config revision is 95dcdb0a8a [production]
20:47 <ryankemper@cumin2001> START - Cookbook sre.elasticsearch.rolling-upgrade [production]
20:19 <gehel> elasticsearch cluster restart stopped [production]
20:18 <ryankemper@cumin2001> END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97) [production]
19:35 <ppchelko@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
19:35 <ppchelko@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
19:33 <ppchelko@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
19:32 <ppchelko@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
19:30 <ryankemper@cumin2001> START - Cookbook sre.elasticsearch.rolling-upgrade [production]
19:29 <ppchelko@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
19:29 <ppchelko@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
19:20 <jforrester@deploy1001> rebuilt and synchronized wikiversions files: Revert group1 wikis to wmf.34 T253023 [production]
19:16 <hnowlan@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
19:15 <hnowlan@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
19:14 <jforrester@deploy1001> Synchronized php: group1 wikis to 1.35.0-wmf.35 (duration: 01m 05s) [production]
19:13 <jforrester@deploy1001> rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.35 [production]