2024-07-03
§
|
14:25 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65724 and previous config saved to /var/cache/conftool/dbconfig/20240703-142553-arnaudb.json |
[production] |
14:25 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db1197 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65723 and previous config saved to /var/cache/conftool/dbconfig/20240703-142541-arnaudb.json |
[production] |
14:25 |
<klausman@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . |
[production] |
14:21 |
<jayme@cumin1002> |
conftool action : set/pooled=inactive; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet) |
[production] |
14:18 |
<bking@deploy1002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply |
[production] |
14:18 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65722 and previous config saved to /var/cache/conftool/dbconfig/20240703-141826-marostegui.json |
[production] |
14:17 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994 |
[production] |
14:17 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994 |
[production] |
14:17 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on db1154.eqiad.wmnet with reason: T365994 |
[production] |
14:16 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:45:00 on db1154.eqiad.wmnet with reason: T365994 |
[production] |
14:11 |
<klausman@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . |
[production] |
14:10 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye |
[production] |
14:09 |
<klausman@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . |
[production] |
14:09 |
<klausman@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . |
[production] |
14:09 |
<bking@deploy1002> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply |
[production] |
14:08 |
<klausman@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . |
[production] |
14:07 |
<jclark@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye |
[production] |
14:04 |
<topranks> |
rebooting lsw1-e2-eqiad to install updated JunOS version T365994 |
[production] |
14:01 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad |
[production] |
14:00 |
<cmooney@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad |
[production] |
13:59 |
<bking@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977 |
[production] |
13:59 |
<bking@cumin2002> |
START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977 |
[production] |
13:58 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad |
[production] |
13:58 |
<cmooney@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad |
[production] |
13:57 |
<jayme@cumin1002> |
conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet) |
[production] |
13:56 |
<bking@cumin2002> |
END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002 |
[production] |
13:56 |
<bking@cumin2002> |
START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002 |
[production] |
13:56 |
<jayme@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 |
[production] |
13:55 |
<jayme@cumin1002> |
START - Cookbook sre.hosts.downtime for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 |
[production] |
13:53 |
<cmooney@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad |
[production] |
13:52 |
<cmooney@cumin1002> |
START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad |
[production] |
13:48 |
<Lucas_WMDE> |
UTC afternoon backport+config window done |
[production] |
13:48 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1037587|noc: fail with a 404 when the selected wiki is nonexistent]], [[gerrit:1037783|CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup]] (duration: 08m 38s) |
[production] |
13:44 |
<jayme> |
draining wikikube-worker1007.eqiad.wmnet wikikube-worker1021.eqiad.wmnet kubernetes1060.eqiad.wmnet for T365994 |
[production] |
13:43 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Continuing with sync |
[production] |
13:42 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Backport for [[gerrit:1037587|noc: fail with a 404 when the selected wiki is nonexistent]], [[gerrit:1037783|CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
13:39 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1037587|noc: fail with a 404 when the selected wiki is nonexistent]], [[gerrit:1037783|CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup]] |
[production] |
13:38 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1051749|GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)]] (duration: 09m 28s) |
[production] |
13:33 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Continuing with sync |
[production] |
13:31 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Backport for [[gerrit:1051749|GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
13:29 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm |
[production] |
13:29 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm |
[production] |
13:29 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm |
[production] |
13:28 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1051749|GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)]] |
[production] |
13:25 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1051748|PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)]] (duration: 08m 20s) |
[production] |
13:22 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye |
[production] |
13:20 |
<jclark@cumin1002> |
END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host parsoidtest1001 |
[production] |
13:20 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync |
[production] |
13:19 |
<logmsgbot> |
lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for [[gerrit:1051748|PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
13:19 |
<jclark@cumin1002> |
START - Cookbook sre.network.configure-switch-interfaces for host parsoidtest1001 |
[production] |