2024-07-03
14:16 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:45:00 on db1154.eqiad.wmnet with reason: T365994 [production]
14:11 <klausman@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . [production]
14:10 <jclark@cumin1002> START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye [production]
14:09 <klausman@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . [production]
14:09 <klausman@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . [production]
14:09 <bking@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply [production]
14:08 <klausman@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' . [production]
14:07 <jclark@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye [production]
14:04 <topranks> rebooting lsw1-e2-eqiad to install updated JunOS version T365994 [production]
14:01 <cmooney@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad [production]
14:00 <cmooney@cumin1002> START - Cookbook sre.hosts.downtime for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad [production]
13:59 <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977 [production]
13:59 <bking@cumin2002> START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977 [production]
13:58 <cmooney@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad [production]
13:58 <cmooney@cumin1002> START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad [production]
13:57 <jayme@cumin1002> conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet) [production]
13:56 <bking@cumin2002> END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002 [production]
13:56 <bking@cumin2002> START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002 [production]
13:56 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 [production]
13:55 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 [production]
13:53 <cmooney@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad [production]
13:52 <cmooney@cumin1002> START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad [production]
13:48 <Lucas_WMDE> UTC afternoon backport+config window done [production]
13:48 <logmsgbot> lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1037587|noc: fail with a 404 when the selected wiki is nonexistent]], [[gerrit:1037783|CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup]] (duration: 08m 38s) [production]
13:44 <jayme> draining wikikube-worker1007.eqiad.wmnet wikikube-worker1021.eqiad.wmnet kubernetes1060.eqiad.wmnet for T365994 [production]
13:43 <logmsgbot> lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Continuing with sync [production]
13:42 <logmsgbot> lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Backport for [[gerrit:1037587|noc: fail with a 404 when the selected wiki is nonexistent]], [[gerrit:1037783|CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
13:39 <logmsgbot> lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1037587|noc: fail with a 404 when the selected wiki is nonexistent]], [[gerrit:1037783|CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup]] [production]
13:38 <logmsgbot> lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1051749|GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)]] (duration: 09m 28s) [production]
13:33 <logmsgbot> lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Continuing with sync [production]
13:31 <logmsgbot> lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Backport for [[gerrit:1051749|GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
13:29 <jclark@cumin1002> START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm [production]
13:29 <jclark@cumin1002> START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm [production]
13:29 <jclark@cumin1002> START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm [production]
13:28 <logmsgbot> lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1051749|GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)]] [production]
13:25 <logmsgbot> lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1051748|PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)]] (duration: 08m 20s) [production]
13:22 <jclark@cumin1002> START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye [production]
13:20 <jclark@cumin1002> END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host parsoidtest1001 [production]
13:20 <logmsgbot> lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync [production]
13:19 <logmsgbot> lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for [[gerrit:1051748|PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
13:19 <jclark@cumin1002> START - Cookbook sre.network.configure-switch-interfaces for host parsoidtest1001 [production]
13:18 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994 [production]
13:18 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994 [production]
13:17 <cmooney@cumin1002> END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 49.3.193.10.in-addr.arpa. on all recursors [production]
13:17 <cmooney@cumin1002> START - Cookbook sre.dns.wipe-cache 49.3.193.10.in-addr.arpa. on all recursors [production]
13:17 <cmooney@cumin1002> END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2002.mgmt.codfw.wmnet on all recursors [production]
13:17 <cmooney@cumin1002> START - Cookbook sre.dns.wipe-cache sretest2002.mgmt.codfw.wmnet on all recursors [production]
13:17 <arnaudb@cumin1002> dbctl commit (dc=all): 'T365994 - depool db1191,db1196,db1197', diff saved to https://phabricator.wikimedia.org/P65721 and previous config saved to /var/cache/conftool/dbconfig/20240703-131715-arnaudb.json [production]
13:17 <logmsgbot> lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1051748|PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)]] [production]
13:16 <cmooney@cumin1002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]