2021-09-13
§
|
17:48 |
<ryankemper> |
[Cirrus] `eqiad` is at 99.13% shards recovered and `codfw` is at 98.83% |
[production] |
17:20 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet |
[production] |
17:17 |
<ryankemper> |
[Cirrus] `enwiki` searches appear to be working now. `production-search-eqiad` is at 93.5% recovered shards, `production-search-codfw` is at 95.3% recovered |
[production] |
16:57 |
<volans@cumin1001> |
START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet |
[production] |
16:18 |
<legoktm@cumin1001> |
conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main |
[production] |
16:16 |
<volans@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1414.* |
[production] |
16:08 |
<volans@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1414.* |
[production] |
16:06 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw1414.eqiad.wmnet |
[production] |
15:54 |
<moritzm> |
filtered mx2001 on the routers for reimage T286911 |
[production] |
15:43 |
<vgutierrez> |
update acme-chief to version 0.31 on acmechief-test hosts - T290249 |
[production] |
15:40 |
<vgutierrez> |
upload acme-chief 0.31 to apt.wm.o (buster) - T290249 |
[production] |
15:32 |
<jelto> |
Traffic: depool codfw from user traffic |
[production] |
15:26 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0) |
[production] |
15:25 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.02-restore-ttl |
[production] |
15:25 |
<volans@cumin1001> |
START - Cookbook sre.experimental.reimage for host mw1414.eqiad.wmnet |
[production] |
15:20 |
<Emperor> |
rebooting ms-be2045 to see if that brings the disk back properly T290881 |
[production] |
15:13 |
<jelto@cumin2002> |
conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async |
[production] |
15:13 |
<legoktm> |
(contd.) box-constraints|similar-users|termbox|thanos-query|thanos-swift|wdqs|wdqs-internal|wikifeeds|zotero) |
[production] |
15:13 |
<rzl> |
(contd.) box-constraints|similar-users|termbox|thanos-query|thanos-swift|wdqs|wdqs-internal|wikifeeds|zotero) |
[production] |
15:12 |
<jelto@cumin2002> |
conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium|api-gateway|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventgate-main|eventstreams|eventstreams-internal|kartotherian|linkrecommendation|mathoid|mobileapps|ores|proton|push-notifications|recommendation-api|restbase|restbase-async|schema|search|sessionstore|shellbox|shell |
[production] |
15:02 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0) |
[production] |
15:02 |
<topranks> |
Restarting unused line-card FPC 1 in cr2-codfw in attempt to clear alarm. |
[production] |
14:56 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep |
[production] |
14:44 |
<herron> |
drained mx2001 mail queue to mx1001 T286911 |
[production] |
14:38 |
<dcausse> |
restarting wdqs-updater.service on all wdqs servers |
[production] |
14:21 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0) |
[production] |
14:20 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.02-restore-ttl |
[production] |
14:13 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0) |
[production] |
14:13 |
<legoktm> |
(contd.) ternal, eventgate-main, wikifeeds, eventstreams-internal, eventgate-analytics-external: codfw => eqiad |
[production] |
14:12 |
<jelto@cumin2002> |
Switching services echostore, termbox, cxserver, eventstreams, search, ores, mathoid, schema, push-notifications, thanos-swift, wdqs, sessionstore, restbase, wdqs-internal, apertium, eventgate-analytics, citoid, api-gateway, restbase-async, proton, linkrecommendation, thanos-query, shellbox, kartotherian, mobileapps, recommendation-api, zotero, similar-users, shellbox-constraints, eventgate-logging-ex |
[production] |
14:12 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.01-switch-dc |
[production] |
14:11 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0) |
[production] |
14:05 |
<jelto@cumin2002> |
START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep |
[production] |
14:03 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3002.esams.wmnet |
[production] |
13:51 |
<dzahn@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host durum3002.esams.wmnet |
[production] |
13:50 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3001.esams.wmnet |
[production] |
13:39 |
<dzahn@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host durum3001.esams.wmnet |
[production] |
13:36 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2002.codfw.wmnet |
[production] |
13:21 |
<dzahn@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host durum2002.codfw.wmnet |
[production] |
13:20 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2001.codfw.wmnet |
[production] |
13:08 |
<dzahn@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host durum2001.codfw.wmnet |
[production] |
12:09 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
12:03 |
<volans@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
11:32 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
11:27 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
11:26 |
<kostajh> |
European mid-day backport window deploys done |
[production] |
11:24 |
<kharlan@deploy1002> |
Synchronized wmf-config: Config: [[gerrit:713553|WikimediaEvents: Remove UnderstandingFirstDay config]] (duration: 00m 59s) |
[production] |
10:51 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet |
[production] |
10:43 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet |
[production] |
10:15 |
<volans@cumin1001> |
END (FAIL) - Cookbook sre.experimental.reimage (exit_code=93) for host mw1414.eqiad.wmnet |
[production] |