2022-06-08
23:15 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - ryankemper@cumin1001 - T309648 [production]
23:11 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - ryankemper@cumin1001 - T309648 [production]
23:08 <ryankemper> T309648 Built `wmf-elasticsearch-search-plugins_6.8.23-3` (https://gerrit.wikimedia.org/r/c/operations/software/elasticsearch/plugins/+/804003) following steps in https://phabricator.wikimedia.org/P19522. Result: https://apt.wikimedia.org/wikimedia/pool/component/elastic68/w/wmf-elasticsearch-search-plugins/ [production]
22:03 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
22:00 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
22:00 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
21:53 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
21:52 <cjming@deploy1002> Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:803988|[beta cluster] Enable VectorTitleAboveTabs (T309398)]] (duration: 03m 32s) [production]
21:41 <mutante> repooled mw1415 after restarting apache and php-fpm, seeing all Icinga alerts recover etc T307755 T310225 [production]
21:40 <dzahn@cumin2002> conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1415.eqiad.wmnet [production]
21:23 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
21:17 <dzahn@cumin2002> conftool action : set/pooled=no; selector: dc=eqiad,name=mw1415.eqiad.wmnet [production]
21:17 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
21:17 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
21:13 <mutante> mw1415 - scap pull, restart apache, /usr/local/sbin/restart-php7.2-fpm (INFO: The server is depooled from all services. Restarting the service directly) [production]
21:10 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
20:58 <aokoth@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet [production]
20:52 <aokoth@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet [production]
20:44 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
20:43 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
20:43 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
20:42 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
20:42 <Krinkle> krinkle@mw1415: Run `scap pull` manually ref T310225 [production]
20:35 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.15" [production]
20:33 <dduvall> rolling back group0 as well due to T310214 [production]
19:58 <urandom> restarting Cassandra, aqs1010-{a,b}, to apply logback work-around -- T309896 [production]
19:51 <aokoth@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet [production]
19:46 <aokoth@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet [production]
19:32 <aokoth@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet [production]
19:28 <aokoth@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet [production]
19:27 <aokoth@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet [production]
19:23 <aokoth@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet [production]
19:23 <aokoth@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet [production]
19:20 <aokoth@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet [production]
19:19 <aokoth@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet [production]
19:14 <aokoth@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet [production]
19:14 <aokoth@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet [production]
19:08 <aokoth@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet [production]
18:41 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
18:40 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
18:40 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
18:39 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
18:38 <urandom> upgrading aqs1010.eqiad.wmnet to Cassandra 3.11.13 (canary) -- T309896 [production]
18:32 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance [production]
18:32 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance [production]
18:28 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.39.0-wmf.15" [production]
18:24 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
18:23 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
18:23 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
18:21 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance [production]