2020-12-17
13:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1087 to clone db1154:3318 add db1092 as vslow,dump service for s8 T268742 ', diff saved to https://phabricator.wikimedia.org/P13571 and previous config saved to /var/cache/conftool/dbconfig/20201217-130101-marostegui.json |
[production] |
12:56 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1089 (re)pooling @ 25%: Repool db1089 after helping out on db1106', diff saved to https://phabricator.wikimedia.org/P13570 and previous config saved to /var/cache/conftool/dbconfig/20201217-125624-root.json |
[production] |
12:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13569 and previous config saved to /var/cache/conftool/dbconfig/20201217-125556-root.json |
[production] |
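The db1082 entries show a gradual repool after the sanitarium clone: 25% at 12:40, then 50% at 12:55. The sketch below models what a staged percentage could mean for a replica's share of reads. It assumes the percentage simply scales the host's configured weight (rounded down) and that the schedule continues through 75% and 100%. The weights and function names are hypothetical and are not dbctl's API.

```python
from typing import List, Sequence, Tuple

# Assumed schedule: the log only shows the 25% and 50% steps for db1082.
REPOOL_STAGES = (25, 50, 75, 100)


def staged_weights(weight: int,
                   stages: Sequence[int] = REPOOL_STAGES) -> List[Tuple[int, int]]:
    """Return (percentage, effective_weight) pairs for a gradual repool.

    Assumption: the effective weight is the configured weight scaled by the
    pooled percentage and rounded down to an integer.
    """
    pairs = []
    for pct in stages:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {pct}")
        pairs.append((pct, weight * pct // 100))
    return pairs


def read_share(host_weight: int, peer_weights: Sequence[int]) -> float:
    """Fraction of a section's reads the host gets under weighted selection."""
    total = host_weight + sum(peer_weights)
    return host_weight / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical numbers: the host is configured at 200, two peers at 300.
    for pct, effective in staged_weights(200):
        share = read_share(effective, [300, 300])
        print(f"{pct:>3}% -> weight {effective:>3}, share {share:.1%}")
```

With these hypothetical weights the host's share rises from roughly 8% to 25% across the four stages. The usual rationale for ramping up is that a MySQL instance restarted after cloning (see the 07:19 "Stop mysql on db1082" entry) starts with a cold buffer pool, so it should take load gradually.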
12:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Change db1089 weights', diff saved to https://phabricator.wikimedia.org/P13568 and previous config saved to /var/cache/conftool/dbconfig/20201217-125535-marostegui.json |
[production] |
12:54 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repool db1106 after cloning db1154:3311 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13567 and previous config saved to /var/cache/conftool/dbconfig/20201217-125446-marostegui.json |
[production] |
12:40 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13566 and previous config saved to /var/cache/conftool/dbconfig/20201217-124052-root.json |
[production] |
12:36 |
<jbond42> |
disable puppet fleet wide for config master vhost change |
[production] |
12:23 |
<matthiasmullie> |
EU backport+config window done |
[production] |
12:23 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE |
[production] |
12:22 |
<mlitn@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: f3a50cb06: Enable ContentTranslation as default tool for ceb, km, mg, tg and yi WPs (duration: 01m 02s) |
[production] |
12:21 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE |
[production] |
12:17 |
<mlitn@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: a29fec312: Add Wikidocumentaries campaign for ContentTranslation (duration: 01m 02s) |
[production] |
12:07 |
<mlitn@deploy1001> |
Synchronized wmf-config/SearchSettingsForSDC.php: 68ac6fa61: Media Search: Remove license map from config (duration: 01m 04s) |
[production] |
11:38 |
<kart_> |
Updated cxserver to 2020-12-17-111820-production (T262192) |
[production] |
11:36 |
<kartik@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production'. |
[production] |
11:34 |
<kartik@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production'. |
[production] |
11:32 |
<kartik@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging'. |
[production] |
11:27 |
<godog> |
bounce apache2 on grafana1002 |
[production] |
11:26 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE |
[production] |
11:24 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE |
[production] |
11:22 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE |
[production] |
11:21 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE |
[production] |
11:21 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE |
[production] |
11:20 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE |
[production] |
11:20 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE |
[production] |
11:18 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE |
[production] |
11:16 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE |
[production] |
11:16 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE |
[production] |
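Cookbook runs are logged as separate START and END lines, and the interleaving makes it easy to miss that the an-test-worker1003 downtime ended with `END (FAIL)` and `exit_code=99` while its siblings passed. A small parser can pair them up. This is a sketch that assumes the exact line shapes seen in this log; the regular expression and the `summarize` helper are illustrative and not part of the cookbook tooling.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Matches cookbook lines as they appear in this log, e.g.
# "END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on <host> ..."
LINE_RE = re.compile(
    r"^(?P<phase>START|END)"
    r"(?: \((?P<status>PASS|FAIL)\))?"
    r" - Cookbook (?P<cookbook>\S+)"
    r"(?: \(exit_code=(?P<code>\d+)\))?"
    r"(?: for .*? on (?P<host>\S+))?"
)


def summarize(lines: Iterable[str]) -> Dict[str, Tuple[str, Optional[int]]]:
    """Map 'cookbook on host' to (state, exit_code); lines must be oldest first."""
    runs: Dict[str, Tuple[str, Optional[int]]] = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a cookbook START/END line
        key = m["cookbook"] + (f" on {m['host']}" if m["host"] else "")
        if m["phase"] == "START":
            runs[key] = ("RUNNING", None)
        else:
            code = int(m["code"]) if m["code"] is not None else None
            runs[key] = (m["status"] or "UNKNOWN", code)
    return runs


if __name__ == "__main__":
    # Entries copied from the log above, newest first as the SAL lists them.
    sal = [
        "END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE",
        "END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE",
        "START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE",
        "START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE",
    ]
    for key, (state, code) in summarize(reversed(sal)).items():
        print(key, state, code)
```

The lines are fed oldest first (hence `reversed`), so each END overwrites its matching START; any key still marked `RUNNING` is a cookbook with no END line in the window.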
11:10 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) |
[production] |
11:08 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.reboot-single |
[production] |
10:50 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 |
[production] |
10:45 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 |
[production] |
10:21 |
<jbond42> |
updating RemoteIP on phabricator https://gerrit.wikimedia.org/r/c/operations/puppet/+/649872 |
[production] |
09:57 |
<vgutierrez> |
repool ats-tls on cp5011 |
[production] |
09:00 |
<marostegui> |
Sanitize s1 and s5 on db1154 T268742 |
[production] |
08:30 |
<godog> |
swift codfw-prod: more weight to ms-be20[58-61] - T269337 |
[production] |
07:49 |
<ryankemper> |
[wdqs deploy] (wdqs deploy complete) |
[production] |
07:19 |
<marostegui> |
Stop mysql on db1082 to clone db1154 |
[production] |
07:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1082 for cloning db1154:3315 T268742 ', diff saved to https://phabricator.wikimedia.org/P13563 and previous config saved to /var/cache/conftool/dbconfig/20201217-071903-marostegui.json |
[production] |
07:18 |
<elukey> |
reboot an-airflow1001 for kernel upgrades |
[production] |
07:08 |
<elukey> |
update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/c/operations/homer/public/+/649706 |
[production] |
07:08 |
<ryankemper> |
[wdqs] depooled `wdqs1013` while it catches up on lag |
[production] |
07:06 |
<ryankemper> |
[wdqs deploy] Restarting `wdqs-categories` across all wdqs instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` |
[production] |
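The 07:06 entry rolls `wdqs-categories` through the fleet with `cumin -b 1`, wrapping each restart in `depool && sleep 45 && ... && pool`. The sketch below simulates that pattern in Python to make the batching and the `&&` short-circuit explicit. `run` is a hypothetical hook standing in for executing `depool`, `systemctl restart wdqs-categories` and `pool` on a host, and the stop-after-a-failed-batch behaviour is an assumption, not a statement about cumin's defaults.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts, in order."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def restart_host(host: str, run: Callable[[str, str], bool],
                 settle: float, sleep: Callable[[float], None]) -> bool:
    """depool && sleep && restart && sleep && pool for one host.

    Like a shell `&&` chain, the first failing step stops the sequence, so a
    failed restart leaves the host depooled rather than serving traffic.
    """
    if not run(host, "depool"):
        return False
    sleep(settle)  # let in-flight queries drain
    if not run(host, "restart"):
        return False
    sleep(settle)  # give the service time to come up
    return run(host, "pool")


def rolling_restart(hosts: Sequence[str], run: Callable[[str, str], bool],
                    batch_size: int = 1, settle: float = 45.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Return hosts restarted successfully; stop after the first failed batch."""
    succeeded: List[str] = []
    for group in batches(hosts, batch_size):
        results = {h: restart_host(h, run, settle, sleep) for h in group}
        succeeded.extend(h for h, ok in results.items() if ok)
        if not all(results.values()):
            break  # assumed stop-on-failure: later batches are left untouched
    return succeeded


if __name__ == "__main__":
    calls: List[str] = []

    def fake_run(host: str, step: str) -> bool:
        calls.append(f"{host}:{step}")
        return not (host == "host3" and step == "restart")  # simulated failure

    hosts = ["host1", "host2", "host3", "host4"]  # hypothetical host names
    print(rolling_restart(hosts, fake_run, batch_size=1, sleep=lambda s: None))
    print(calls)
```

Because of the `&&` chain, a host whose restart fails stays depooled, which is the safe failure mode. Passing `batch_size=4` reproduces the grouping of the `-b 4` `wdqs-updater` restart at 07:05, although cumin runs the hosts inside a batch in parallel while this sketch walks them in order.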
07:05 |
<ryankemper> |
[wdqs deploy] Restarting `wdqs-categories` across all test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` |
[production] |
07:05 |
<ryankemper> |
[wdqs-deploy] Restarting `wdqs-updater` across all instances, 4 instances at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` |
[production] |
07:04 |
<ryankemper@deploy1001> |
Finished deploy [wdqs/wdqs@90f9bdd]: 0.3.56 (duration: 10m 39s) |
[production] |
06:54 |
<ryankemper> |
[wdqs deploy] Tests passing on canary instance `wdqs1003` following canary deploy, proceeding to rest of fleet |
[production] |
06:53 |
<ryankemper@deploy1001> |
Started deploy [wdqs/wdqs@90f9bdd]: 0.3.56 |
[production] |
06:53 |
<ryankemper> |
[wdqs deploy] All tests passing on canary instance `wdqs1003` prior to deploy |
[production] |
06:52 |
<kart_> |
Updated cxserver to 2020-12-16-164911-production (T234220, T269437) |
[production] |