2025-05-16
13:54 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Remove db2166 and db1177 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76270 and previous config saved to /var/cache/conftool/dbconfig/20250516-135438-ladsgroup.json |
[production] |
13:52 |
<fceratto@cumin1002> |
START - Cookbook sre.mysql.clone of db1238.eqiad.wmnet onto db1247.eqiad.wmnet |
[production] |
13:50 |
<fceratto@cumin1002> |
END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1188 gradually with 4 steps - Pooling back in |
[production] |
13:50 |
<fceratto@cumin1002> |
END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1238.eqiad.wmnet onto db1247.eqiad.wmnet |
[production] |
13:47 |
<fceratto@cumin1002> |
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1238 - Depool db1238.eqiad.wmnet to then clone it to db1247.eqiad.wmnet - fceratto@cumin1002 |
[production] |
13:47 |
<fceratto@cumin1002> |
START - Cookbook sre.mysql.depool db1238 - Depool db1238.eqiad.wmnet to then clone it to db1247.eqiad.wmnet - fceratto@cumin1002 |
[production] |
13:47 |
<fceratto@cumin1002> |
START - Cookbook sre.mysql.clone of db1238.eqiad.wmnet onto db1247.eqiad.wmnet |
[production] |
13:40 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye |
[production] |
13:35 |
<jclark@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye |
[production] |
13:21 |
<hashar@deploy1003> |
Finished deploy [gerrit/gerrit@fcb893c]: wm-zuul-status: do not popup when navigating changes - T394485 (duration: 00m 12s) |
[production] |
13:21 |
<mvernon@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-be1004.eqiad.wmnet with reason: host reimage |
[production] |
13:21 |
<hashar@deploy1003> |
Started deploy [gerrit/gerrit@fcb893c]: wm-zuul-status: do not popup when navigating changes - T394485 |
[production] |
13:17 |
<mvernon@cumin1002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on apus-be1004.eqiad.wmnet with reason: host reimage |
[production] |
13:13 |
<jclark@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye |
[production] |
13:05 |
<fceratto@cumin1002> |
START - Cookbook sre.mysql.pool db1188 gradually with 4 steps - Pooling back in |
[production] |
13:03 |
<joal@deploy1003> |
Finished deploy [airflow-dags/analytics@4ebb376]: Fix gobblin artifacts (after pulling code...) (duration: 01m 01s) |
[production] |
13:02 |
<joal@deploy1003> |
Started deploy [airflow-dags/analytics@4ebb376]: Fix gobblin artifacts (after pulling code...) |
[production] |
13:02 |
<joal@deploy1003> |
Finished deploy [airflow-dags/analytics_test@4ebb376]: Fix gobblin artifacts (duration: 00m 16s) |
[production] |
13:01 |
<joal@deploy1003> |
Started deploy [airflow-dags/analytics_test@4ebb376]: Fix gobblin artifacts |
[production] |
13:00 |
<joal@deploy1003> |
Finished deploy [airflow-dags/analytics@4351188]: Fix gobblin artifacts (duration: 00m 07s) |
[production] |
13:00 |
<joal@deploy1003> |
Started deploy [airflow-dags/analytics@4351188]: Fix gobblin artifacts |
[production] |
12:52 |
<mvernon@cumin1002> |
START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm |
[production] |
12:46 |
<mvernon@cumin1002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-be1004.eqiad.wmnet with OS bookworm |
[production] |
12:43 |
<aqu@deploy1003> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply |
[production] |
12:42 |
<aqu@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply |
[production] |
12:35 |
<fceratto@cumin1002> |
END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1188 gradually with 4 steps - Pooling back in |
[production] |
12:33 |
<fceratto@cumin1002> |
START - Cookbook sre.mysql.pool db1188 gradually with 4 steps - Pooling back in |
[production] |
12:32 |
<mvernon@cumin1002> |
START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm |
[production] |
12:28 |
<mvernon@cumin1002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-be1004.eqiad.wmnet with OS bookworm |
[production] |
12:20 |
<mvernon@cumin1002> |
START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm |
[production] |
12:01 |
<kamila@deploy1003> |
helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply |
[production] |
12:01 |
<kamila@deploy1003> |
helmfile [eqiad] START helmfile.d/services/mw-cron: apply |
[production] |
11:59 |
<fceratto@cumin1002> |
END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1188.eqiad.wmnet onto db1246.eqiad.wmnet |
[production] |
11:42 |
<fceratto@cumin1002> |
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1188 - Depool db1188.eqiad.wmnet to then clone it to db1246.eqiad.wmnet - fceratto@cumin1002 |
[production] |
11:42 |
<fceratto@cumin1002> |
START - Cookbook sre.mysql.depool db1188 - Depool db1188.eqiad.wmnet to then clone it to db1246.eqiad.wmnet - fceratto@cumin1002 |
[production] |
11:42 |
<fceratto@cumin1002> |
START - Cookbook sre.mysql.clone of db1188.eqiad.wmnet onto db1246.eqiad.wmnet |
[production] |
11:23 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Remove db2242 from x3, remove db2154 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76262 and previous config saved to /var/cache/conftool/dbconfig/20250516-112345-ladsgroup.json |
[production] |
11:19 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Remove db1214 from x3, remove db1257 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76261 and previous config saved to /var/cache/conftool/dbconfig/20250516-111952-ladsgroup.json |
[production] |
10:44 |
<joal@deploy1003> |
Finished deploy [airflow-dags/analytics@4351188]: Deploying analytics with artifact-cache warming using main folder (duration: 00m 49s) |
[production] |
10:43 |
<joal@deploy1003> |
Started deploy [airflow-dags/analytics@4351188]: Deploying analytics with artifact-cache warming using main folder |
[production] |
10:28 |
<joal@deploy1003> |
Finished deploy [airflow-dags/main@4351188]: Deploying main instead of analytics subfolder (duration: 01m 51s) |
[production] |
10:26 |
<joal@deploy1003> |
Started deploy [airflow-dags/main@4351188]: Deploying main instead of analytics subfolder |
[production] |
10:22 |
<jynus> |
upgrading db1239 MariaDB server T394487 |
[production] |
10:16 |
<jynus@cumin1002> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet,ms-backup1002.eqiad.wmnet with reason: Upgrade and test |
[production] |
09:51 |
<joal@deploy1003> |
Finished deploy [airflow-dags/analytics_test@4351188]: Fix slf4j artifact sync (duration: 00m 12s) |
[production] |
09:51 |
<joal@deploy1003> |
Started deploy [airflow-dags/analytics_test@4351188]: Fix slf4j artifact sync |
[production] |
09:49 |
<btullis@deploy1003> |
Finished deploy [airflow-dags/analytics_test@c2d660e]: Test (duration: 24m 55s) |
[production] |
09:27 |
<cgoubert@deploy1003> |
helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply |
[production] |
09:27 |
<cgoubert@deploy1003> |
helmfile [eqiad] START helmfile.d/services/mw-cron: apply |
[production] |
09:24 |
<btullis@deploy1003> |
Started deploy [airflow-dags/analytics_test@c2d660e]: Test |
[production] |