2022-12-19
§
|
12:13 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.remove-downtime for an-presto[1006-1010].eqiad.wmnet |
[production] |
12:10 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new presto servers instead of the original five |
[production] |
12:09 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new presto servers instead of the original five |
[production] |
11:43 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test |
[production] |
11:43 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test |
[production] |
11:29 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) check 1 in ml-serve-codfw: maintenance |
[production] |
11:29 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) |
[production] |
11:29 |
<elukey@cumin1001> |
START - Cookbook sre.discovery.service-route |
[production] |
11:29 |
<elukey@cumin1001> |
START - Cookbook sre.k8s.pool-depool-cluster check 1 in ml-serve-codfw: maintenance |
[production] |
11:27 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. |
[production] |
10:51 |
<btullis@cumin1001> |
START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. |
[production] |
10:36 |
<dcausse@deploy1002> |
Finished deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d (duration: 02m 28s) |
[production] |
10:33 |
<dcausse@deploy1002> |
Started deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d |
[production] |
10:28 |
<taavi@deploy1002> |
Finished scap: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] (duration: 07m 58s) |
[production] |
10:23 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) |
[production] |
10:23 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) |
[production] |
10:23 |
<elukey@cumin1001> |
START - Cookbook sre.discovery.service-route |
[production] |
10:23 |
<elukey@cumin1001> |
START - Cookbook sre.k8s.pool-depool-cluster |
[production] |
10:21 |
<taavi@deploy1002> |
taavi and taavi: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet |
[production] |
10:20 |
<taavi@deploy1002> |
Started scap: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] |
[production] |
09:59 |
<moritzm> |
update bullseye netboot image for Bullseye 11.6 point release T325186 |
[production] |
09:49 |
<aqu@deploy1002> |
Finished deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] (duration: 00m 13s) |
[production] |
09:48 |
<aqu@deploy1002> |
Started deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] |
[production] |
09:47 |
<aqu@deploy1002> |
Finished deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] (duration: 00m 11s) |
[production] |
09:47 |
<aqu@deploy1002> |
Started deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] |
[production] |
09:30 |
<aqu@deploy1002> |
Finished deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] (duration: 00m 08s) |
[production] |
09:29 |
<aqu@deploy1002> |
Started deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] |
[production] |
09:29 |
<aqu@deploy1002> |
Finished deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] (duration: 08m 02s) |
[production] |
09:21 |
<aqu@deploy1002> |
Started deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] |
[production] |
09:20 |
<aqu@deploy1002> |
Finished deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] (duration: 01m 14s) |
[production] |
09:19 |
<aqu@deploy1002> |
Started deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] |
[production] |
09:17 |
<aqu> |
About to deploy analytics/refinery (bug fix in HDFS usage pipeline) |
[production] |
09:15 |
<ladsgroup@deploy1002> |
Finished scap: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] (duration: 09m 24s) |
[production] |
09:14 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |
09:11 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . |
[production] |
09:07 |
<ladsgroup@deploy1002> |
ladsgroup and ladsgroup: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet |
[production] |
09:06 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . |
[production] |
09:05 |
<ladsgroup@deploy1002> |
Started scap: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] |
[production] |
09:04 |
<dcausse> |
restarting blazegraph on wdqs1015 (BlazegraphFreeAllocatorsDecreasingRapidly) |
[production] |
09:03 |
<elukey@deploy1002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . |
[production] |
09:02 |
<elukey@deploy1002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |
09:01 |
<elukey@deploy1002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . |
[production] |
08:59 |
<elukey@deploy1002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
08:58 |
<elukey@deploy1002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . |
[production] |
08:56 |
<elukey@deploy1002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |
08:37 |
<elukey@deploy1002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . |
[production] |
08:34 |
<elukey@deploy1002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . |
[production] |
08:33 |
<ayounsi@cumin1001> |
END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 29535 |
[production] |
08:32 |
<ayounsi@cumin1001> |
START - Cookbook sre.network.peering with action 'email' for AS: 29535 |
[production] |
08:02 |
<moritzm> |
installing openexr security updates |
[production] |