2020-12-17
12:36 <jbond42> disable puppet fleet wide for config master vhost change [production]
12:23 <matthiasmullie> EU backport+config window done [production]
12:23 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE [production]
12:22 <mlitn@deploy1001> Synchronized wmf-config/InitialiseSettings.php: f3a50cb06: Enable ContentTranslation as default tool for ceb, km, mg, tg and yi WPs (duration: 01m 02s) [production]
12:21 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE [production]
12:17 <mlitn@deploy1001> Synchronized wmf-config/InitialiseSettings.php: a29fec312: Add Wikidocumentaries campaign for ContentTranslation (duration: 01m 02s) [production]
12:07 <mlitn@deploy1001> Synchronized wmf-config/SearchSettingsForSDC.php: 68ac6fa61: Media Search: Remove license map from config (duration: 01m 04s) [production]
11:38 <kart_> Updated cxserver to 2020-12-17-111820-production (T262192) [production]
11:36 <kartik@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
11:34 <kartik@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
11:32 <kartik@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . [production]
11:27 <godog> bounce apache2 on grafana1002 [production]
11:26 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE [production]
11:24 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE [production]
11:22 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE [production]
11:21 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE [production]
11:21 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE [production]
11:20 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE [production]
11:20 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE [production]
11:18 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE [production]
11:16 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE [production]
11:16 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE [production]
11:10 <jbond@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
11:08 <jbond@cumin1001> START - Cookbook sre.hosts.reboot-single [production]
10:50 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 [production]
10:45 <elukey@cumin1001> START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 [production]
10:21 <jbond42> updating RemoteIP on phabricator https://gerrit.wikimedia.org/r/c/operations/puppet/+/649872 [production]
09:57 <vgutierrez> repool ats-tls on cp5011 [production]
09:00 <marostegui> Sanitize s1 and s5 on db1154 T268742 [production]
08:30 <godog> swift codfw-prod: more weight to ms-be20[58-61] - T269337 [production]
07:49 <ryankemper> [wdqs deploy] (wdqs deploy complete) [production]
07:19 <marostegui> Stop mysql on db1082 to clone db1154 [production]
07:19 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1082 for cloning db1154:3315 T268742 ', diff saved to https://phabricator.wikimedia.org/P13563 and previous config saved to /var/cache/conftool/dbconfig/20201217-071903-marostegui.json [production]
07:18 <elukey> reboot an-airflow1001 for kernel upgrades [production]
07:08 <elukey> update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/c/operations/homer/public/+/649706 [production]
07:08 <ryankemper> [wdqs] depooled `wdqs1013` while it catches up on lag [production]
07:06 <ryankemper> [wdqs deploy] Restarting `wdqs-categories` across all wdqs instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` [production]
07:05 <ryankemper> [wdqs deploy] Restarting `wdqs-categories` across all test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]
07:05 <ryankemper> [wdqs-deploy] Restarting `wdqs-updater` across all instances, 4 instances at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
07:04 <ryankemper@deploy1001> Finished deploy [wdqs/wdqs@90f9bdd]: 0.3.56 (duration: 10m 39s) [production]
06:54 <ryankemper> [wdqs deploy] Tests passing on canary instance `wdqs1003` following canary deploy, proceeding to rest of fleet [production]
06:53 <ryankemper@deploy1001> Started deploy [wdqs/wdqs@90f9bdd]: 0.3.56 [production]
06:53 <ryankemper> [wdqs deploy] All tests passing on canary instance `wdqs1003` prior to deploy [production]
06:52 <kart_> Updated cxserver to 2020-12-16-164911-production (T234220, T269437) [production]
06:52 <kart_> Updated cxserver to 2020-12-16-164911-production (T234220, T234220) [production]
06:22 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool es1013 for decommissioning T268436', diff saved to https://phabricator.wikimedia.org/P13562 and previous config saved to /var/cache/conftool/dbconfig/20201217-062249-marostegui.json [production]
06:22 <kartik@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
06:19 <kartik@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
06:17 <kartik@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . [production]
06:13 <marostegui@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]