2022-12-19
12:13 <btullis@cumin1001> START - Cookbook sre.hosts.remove-downtime for an-presto[1006-1010].eqiad.wmnet [production]
12:10 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new preto servers instead of the original five [production]
12:09 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new preto servers instead of the original five [production]
11:43 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test [production]
11:43 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test [production]
11:29 <elukey@cumin1001> END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) check 1 in ml-serve-codfw: maintenance [production]
11:29 <elukey@cumin1001> END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) [production]
11:29 <elukey@cumin1001> START - Cookbook sre.discovery.service-route [production]
11:29 <elukey@cumin1001> START - Cookbook sre.k8s.pool-depool-cluster check 1 in ml-serve-codfw: maintenance [production]
11:27 <btullis@cumin1001> END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. [production]
10:51 <btullis@cumin1001> START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. [production]
10:36 <dcausse@deploy1002> Finished deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d (duration: 02m 28s) [production]
10:33 <dcausse@deploy1002> Started deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d [production]
10:28 <taavi@deploy1002> Finished scap: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] (duration: 07m 58s) [production]
10:23 <elukey@cumin1001> END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) [production]
10:23 <elukey@cumin1001> END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) [production]
10:23 <elukey@cumin1001> START - Cookbook sre.discovery.service-route [production]
10:23 <elukey@cumin1001> START - Cookbook sre.k8s.pool-depool-cluster [production]
10:21 <taavi@deploy1002> taavi and taavi: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet [production]
10:20 <taavi@deploy1002> Started scap: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] [production]
09:59 <moritzm> update bullseye netboot image for Bullseye 11.6 point release T325186 [production]
09:49 <aqu@deploy1002> Finished deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] (duration: 00m 13s) [production]
09:48 <aqu@deploy1002> Started deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] [production]
09:47 <aqu@deploy1002> Finished deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] (duration: 00m 11s) [production]
09:47 <aqu@deploy1002> Started deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] [production]
09:30 <aqu@deploy1002> Finished deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] (duration: 00m 08s) [production]
09:29 <aqu@deploy1002> Started deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] [production]
09:29 <aqu@deploy1002> Finished deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] (duration: 08m 02s) [production]
09:21 <aqu@deploy1002> Started deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] [production]
09:20 <aqu@deploy1002> Finished deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] (duration: 01m 14s) [production]
09:19 <aqu@deploy1002> Started deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] [production]
09:17 <aqu> About to deploy analytics/refinery (bug fix in HDFS usage pipeline) [production]
09:15 <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] (duration: 09m 24s) [production]
09:14 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
09:11 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
09:07 <ladsgroup@deploy1002> ladsgroup and ladsgroup: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet [production]
09:06 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
09:05 <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] [production]
09:04 <dcausse> restarting blazegraph on wdqs1015 (BlazegraphFreeAllocatorsDecreasingRapidly) [production]
09:03 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
09:02 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
09:01 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
08:59 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [production]
08:58 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [production]
08:56 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
08:37 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
08:34 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
08:33 <ayounsi@cumin1001> END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 29535 [production]
08:32 <ayounsi@cumin1001> START - Cookbook sre.network.peering with action 'email' for AS: 29535 [production]
08:02 <moritzm> installing openexr security updates [production]