2021-04-14
ยง
|
11:03 <akosiaris@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external'. [production]
11:03 <akosiaris@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging'. [production]
11:02 <akosiaris@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal'. [production]
11:02 <akosiaris@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production'. [production]
11:02 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1034.eqiad.wmnet with reason: REIMAGE [production]
11:01 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1035.eqiad.wmnet with reason: REIMAGE [production]
10:59 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1034.eqiad.wmnet with reason: REIMAGE [production]
10:52 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 70%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15322 and previous config saved to /var/cache/conftool/dbconfig/20210414-105202-root.json [production]
10:48 <dcaro> Upgrade of codfw ceph to octopus 15.2.20 done, will run some performance tests now (T274566) [admin]
10:41 <dcaro> Upgrade of codfw ceph to octopus 15.2.20, mgrs upgraded, osds next (T274566) [admin]
10:37 <dcaro> Upgrade of codfw ceph to octopus 15.2.20, mons upgraded, mgrs next (T274566) [admin]
10:36 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 60%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15321 and previous config saved to /var/cache/conftool/dbconfig/20210414-103659-root.json [production]
10:30 <marostegui> Failover m1 from db1080 to db1159 - T276448 [production]
10:25 <dcaro@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Upgrading ceph to octopus [production]
10:25 <dcaro@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Upgrading ceph to octopus [production]
10:21 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15320 and previous config saved to /var/cache/conftool/dbconfig/20210414-102153-root.json [production]
10:15 <dcaro> starting the upgrade of codfw ceph to octopus 15.2.20 (T274566) [admin]
10:07 <dcaro> Merged the ceph 15 (Octopus) repo deployment to codfw, only the repo, not the packages (T274566) [admin]
10:06 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 40%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15319 and previous config saved to /var/cache/conftool/dbconfig/20210414-100649-root.json [production]
10:02 <elukey> roll restart yarn nodemanagers on hadoop prod (graceful restart; they seem to have entered a weird state) [analytics]
09:54 <elukey> kill long-running mediawiki-job refine application_1615988861843_166906, which was erroring out [analytics]
09:51 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 30%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15318 and previous config saved to /var/cache/conftool/dbconfig/20210414-095146-root.json [production]
09:46 <elukey> kill application_1615988861843_163186 for the same reason [analytics]
09:43 <elukey> kill application_1615988861843_164387 to see if socket consumption improves [analytics]
09:37 <ryankemper@cumin2001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
09:36 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 20%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15317 and previous config saved to /var/cache/conftool/dbconfig/20210414-093642-root.json [production]
09:33 <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db1177 with minimal weight on s8 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15316 and previous config saved to /var/cache/conftool/dbconfig/20210414-093305-marostegui.json [production]
09:29 <gehel> depooling wdqs1004 - corrupted data after data reload [production]
09:27 <effie> disable puppet on all mediawiki servers to merge 676580 [production]
09:24 <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/includes/Hooks/HookUtils.php: e4b2d93dcf86a336314ed09fd37844edb16f4f30: Dont allow query and cookie hacks to enable topic subscriptions (T280082) (duration: 01m 24s) [production]
09:23 <gehel> repooling wdqs1013, caught up on lag [production]
09:22 <gehel> depooling wdqs1003 - corrupted data after data reload [production]
09:19 <jmm@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kraz.wikimedia.org [production]
09:16 <gehel> restarting blazegraph on wdqs1003 [production]
09:14 <elukey> run "sudo kill `pgrep -f sqoop`" on an-launcher1002 to clean up old test processes still running [analytics]
09:12 <ryankemper> T267927 depooled `wdqs1004` following data transfer (catching up on lag); the current round of data transfers is done, so there shouldn't be any left to depool [production]
09:10 <ryankemper@cumin2001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
09:09 <jmm@cumin1001> START - Cookbook sre.hosts.decommission for hosts kraz.wikimedia.org [production]
09:09 <arturo> enable XFF for wsexport.{wmflabs,wmcloud}.org https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/10fd2f002c3c2b20d5ca9b359b540e33789defc0%5E%21/#F0 (T279111) [project-proxy]
09:06 <arturo> cleanup horizon hiera that is applied to both 'proxy' prefix and project-level (leave the project-level one) [project-proxy]
09:06 <ryankemper> T267927 depool `wdqs2001` following data transfer (catching up on lag) [production]
09:03 <jmm@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast1002.wikimedia.org [production]
09:03 <ryankemper@cumin2001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
08:53 <jmm@cumin1001> START - Cookbook sre.hosts.decommission for hosts bast1002.wikimedia.org [production]
08:44 <Urbanecm> Run scap pull on mwdebug1002 [production]
08:40 <Urbanecm> Staging on mwdebug1002 [production]
08:20 <akosiaris@cumin1001> conftool action : set/weight=10; selector: cluster=videoscaler,service=apache2,name=mw2394.codfw.wmnet [production]
08:20 <akosiaris@cumin1001> conftool action : set/weight=10; selector: cluster=videoscaler,service=apache2,name=mw2395.codfw.wmnet [production]
08:16 <jiji@cumin1001> conftool action : set/pooled=yes; selector: name=(wtp1033.eqiad.wmnet|wtp1032.eqiad.wmnet) [production]
08:07 <jayme> updated chartmuseum to 0.13.1 on chartmuseum1001, chartmuseum2001 [production]