2021-09-13
|
19:49 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
19:47 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
19:04 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
18:59 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
18:59 |
<urbanecm> |
[urbanecm@mwmaint2002 ~]$ mwscript resetAuthenticationThrottle.php --wiki={cswiki,cswikiversity} --signup --ip=185.47.223.49 # T290809 |
[production] |
18:58 |
<urbanecm@deploy1002> |
Synchronized wmf-config/throttle.php: 9db1d1ac938ca053c82fed88c8b6e75f97a52416: Add throttle rule for Czech wiki course (T290809) (duration: 00m 58s) |
[production] |
18:29 |
<ryankemper> |
[Cirrus] `eqiad` fully recovered (100% of shards), `codfw` at 99.816%. `codfw` is getting held up by recovery of `enwiki` shards which tend to be quite large |
[production] |
18:25 |
<razzi> |
reenable replication on dbstore1007 for T290841 |
[production] |
18:16 |
<cwhite> |
apply high log volume from ES mitigations to deprecated inputs |
[production] |
18:13 |
<razzi> |
razzi@dbstore1007:~$ sudo systemctl restart mariadb@s3.service for T290841 |
[production] |
18:05 |
<razzi> |
sudo systemctl restart mariadb@s2.service |
[production] |
17:48 |
<ryankemper> |
[Cirrus] `eqiad` is at 99.13% shards recovered and `codfw` is at 98.83% |
[production] |
17:20 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet |
[production] |
17:17 |
<ryankemper> |
[Cirrus] `enwiki` searches appear to be working now. `production-search-eqiad` is at 93.5% recovered shards, `production-search-codfw` is at 95.3% recovered |
[production] |
16:57 |
<volans@cumin1001> |
START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet |
[production] |
16:18 |
<legoktm@cumin1001> |
conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main |
[production] |
16:16 |
<volans@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1414.* |
[production] |
16:08 |
<volans@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1414.* |
[production] |
16:06 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw1414.eqiad.wmnet |
[production] |
15:54 |
<moritzm> |
filtered mx2001 on the routers for reimage T286911 |
[production] |
15:43 |
<vgutierrez> |
update acme-chief to version 0.31 on acmechief-test hosts - T290249 |
[production] |
15:40 |
<vgutierrez> |
upload acme-chief 0.31 to apt.wm.o (buster) - T290249 |
[production] |
15:32 |
<jelto> |
Traffic: depool codfw from user traffic |
[production] |
15:26 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0) |
[production] |
15:25 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.02-restore-ttl |
[production] |
15:25 |
<volans@cumin1001> |
START - Cookbook sre.experimental.reimage for host mw1414.eqiad.wmnet |
[production] |
15:20 |
<Emperor> |
rebooting ms-be2045 to see if that brings the disk back properly T290881 |
[production] |
15:13 |
<jelto@cumin2002> |
conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async |
[production] |
15:13 |
<legoktm> |
(contd.) box-constraints|similar-users|termbox|thanos-query|thanos-swift|wdqs|wdqs-internal|wikifeeds|zotero) |
[production] |
15:13 |
<rzl> |
(contd.) box-constraints|similar-users|termbox|thanos-query|thanos-swift|wdqs|wdqs-internal|wikifeeds|zotero) |
[production] |
15:12 |
<jelto@cumin2002> |
conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium|api-gateway|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventgate-main|eventstreams|eventstreams-internal|kartotherian|linkrecommendation|mathoid|mobileapps|ores|proton|push-notifications|recommendation-api|restbase|restbase-async|schema|search|sessionstore|shellbox|shell |
[production] |
15:02 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0) |
[production] |
15:02 |
<topranks> |
Restarting unused line-card FPC 1 in cr2-codfw in attempt to clear alarm. |
[production] |
14:56 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep |
[production] |
14:44 |
<herron> |
drained mx2001 mail queue to mx1001 T286911 |
[production] |
14:38 |
<dcausse> |
restarting wdqs-updater.service on all wdqs servers |
[production] |
14:21 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0) |
[production] |
14:20 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.02-restore-ttl |
[production] |
14:13 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0) |
[production] |
14:13 |
<legoktm> |
(contd.) ternal, eventgate-main, wikifeeds, eventstreams-internal, eventgate-analytics-external: codfw => eqiad |
[production] |
14:12 |
<jelto@cumin2002> |
Switching services echostore, termbox, cxserver, eventstreams, search, ores, mathoid, schema, push-notifications, thanos-swift, wdqs, sessionstore, restbase, wdqs-internal, apertium, eventgate-analytics, citoid, api-gateway, restbase-async, proton, linkrecommendation, thanos-query, shellbox, kartotherian, mobileapps, recommendation-api, zotero, similar-users, shellbox-constraints, eventgate-logging-ex |
[production] |
14:12 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.01-switch-dc |
[production] |
14:11 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0) |
[production] |
14:05 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep |
[production] |
14:03 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3002.esams.wmnet |
[production] |
13:51 |
<dzahn@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host durum3002.esams.wmnet |
[production] |
13:50 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3001.esams.wmnet |
[production] |
13:39 |
<dzahn@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host durum3001.esams.wmnet |
[production] |
13:36 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2002.codfw.wmnet |
[production] |
13:21 |
<dzahn@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host durum2002.codfw.wmnet |
[production] |