2020-12-17
|
11:10 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) |
[production] |
11:08 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.reboot-single |
[production] |
10:50 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 |
[production] |
10:45 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 |
[production] |
10:21 |
<jbond42> |
updating RemoteIP on phabricator https://gerrit.wikimedia.org/r/c/operations/puppet/+/649872 |
[production] |
09:57 |
<vgutierrez> |
repool ats-tls on cp5011 |
[production] |
09:00 |
<marostegui> |
Sanitize s1 and s5 on db1154 T268742 |
[production] |
08:30 |
<godog> |
swift codfw-prod: more weight to ms-be20[58-61] - T269337 |
[production] |
07:49 |
<ryankemper> |
[wdqs deploy] (wdqs deploy complete) |
[production] |
07:19 |
<marostegui> |
Stop mysql on db1082 to clone db1154 |
[production] |
07:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1082 for cloning db1154:3315 T268742 ', diff saved to https://phabricator.wikimedia.org/P13563 and previous config saved to /var/cache/conftool/dbconfig/20201217-071903-marostegui.json |
[production] |
07:18 |
<elukey> |
reboot an-airflow1001 for kernel upgrades |
[production] |
07:08 |
<elukey> |
update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/c/operations/homer/public/+/649706 |
[production] |
07:08 |
<ryankemper> |
[wdqs] depooled `wdqs1013` while it catches up on lag |
[production] |
07:06 |
<ryankemper> |
[wdqs deploy] Restarting `wdqs-categories` across all wdqs instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` |
[production] |
07:05 |
<ryankemper> |
[wdqs deploy] Restarting `wdqs-categories` across all test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` |
[production] |
07:05 |
<ryankemper> |
[wdqs deploy] Restarting `wdqs-updater` across all instances, 4 instances at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` |
[production] |
07:04 |
<ryankemper@deploy1001> |
Finished deploy [wdqs/wdqs@90f9bdd]: 0.3.56 (duration: 10m 39s) |
[production] |
06:54 |
<ryankemper> |
[wdqs deploy] Tests passing on canary instance `wdqs1003` following canary deploy, proceeding to rest of fleet |
[production] |
06:53 |
<ryankemper@deploy1001> |
Started deploy [wdqs/wdqs@90f9bdd]: 0.3.56 |
[production] |
06:53 |
<ryankemper> |
[wdqs deploy] All tests passing on canary instance `wdqs1003` prior to deploy |
[production] |
06:52 |
<kart_> |
Updated cxserver to 2020-12-16-164911-production (T234220, T269437) |
[production] |
06:52 |
<kart_> |
Updated cxserver to 2020-12-16-164911-production (T234220, T234220) |
[production] |
06:22 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool es1013 for decommissioning T268436', diff saved to https://phabricator.wikimedia.org/P13562 and previous config saved to /var/cache/conftool/dbconfig/20201217-062249-marostegui.json |
[production] |
06:22 |
<kartik@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . |
[production] |
06:19 |
<kartik@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . |
[production] |
06:17 |
<kartik@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . |
[production] |
06:13 |
<marostegui@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
06:05 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
05:56 |
<marostegui> |
Stop mysql on db1106 to clone db1154 |
[production] |
05:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1106 for cloning db1154:3311 T268742 ', diff saved to https://phabricator.wikimedia.org/P13560 and previous config saved to /var/cache/conftool/dbconfig/20201217-055556-marostegui.json |
[production] |
01:35 |
<andrew@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1019.eqiad.wmnet with reason: REIMAGE |
[production] |
01:33 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1019.eqiad.wmnet with reason: REIMAGE |
[production] |
01:01 |
<twentyafterfour> |
preparing to update phabricator translations |
[production] |
00:22 |
<mutante> |
running puppet on mw2266, mw2370, mw2354 |
[production] |
2020-12-16
|
23:56 |
<bstorm> |
bootstrapped meta_p database for the new s7 replicas T269427 |
[production] |
20:12 |
<marxarelli> |
group1 to 1.36.0-wmf.22 complete. no new errors or concerning rates (refs T267415) |
[production] |
20:06 |
<dduvall@deploy1001> |
Synchronized php: group1 wikis to 1.36.0-wmf.22 (duration: 01m 01s) |
[production] |
20:05 |
<legoktm> |
added myself to the ops LDAP group |
[production] |
20:05 |
<dduvall@deploy1001> |
rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.22 |
[production] |
19:23 |
<dcausse> |
Morning backport window deploy done |
[production] |
19:21 |
<dcausse@deploy1001> |
Synchronized php-1.36.0-wmf.22/extensions/WikimediaEvents/: T266027: Revert [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 03s) |
[production] |
19:18 |
<dcausse@deploy1001> |
Synchronized php-1.36.0-wmf.21/extensions/WikimediaEvents/: T266027: Revert [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 03s) |
[production] |
19:09 |
<dcausse@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: T266359: wgMinervaCountErrors config was removed (duration: 01m 03s) |
[production] |
17:52 |
<effie> |
uploading python-thumbor-wikimedia_2.9-1 to stretch-wikimedia/component/thumbor |
[production] |
16:40 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE |
[production] |
16:38 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE |
[production] |
16:38 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1019.eqiad.wmnet with reason: REIMAGE |
[production] |
16:36 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1019.eqiad.wmnet with reason: REIMAGE |
[production] |
16:32 |
<akosiaris@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' . |
[production] |