production SAL

2022-12-19
12:13	<btullis@cumin1001>	START - Cookbook sre.hosts.remove-downtime for an-presto[1006-1010].eqiad.wmnet	[production]
12:10	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new presto servers instead of the original five	[production]
12:09	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new presto servers instead of the original five	[production]
11:43	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test	[production]
11:43	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test	[production]
11:29	<elukey@cumin1001>	END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) check 1 in ml-serve-codfw: maintenance	[production]
11:29	<elukey@cumin1001>	END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)	[production]
11:29	<elukey@cumin1001>	START - Cookbook sre.discovery.service-route	[production]
11:29	<elukey@cumin1001>	START - Cookbook sre.k8s.pool-depool-cluster check 1 in ml-serve-codfw: maintenance	[production]
11:27	<btullis@cumin1001>	END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.	[production]
10:51	<btullis@cumin1001>	START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.	[production]
10:36	<dcausse@deploy1002>	Finished deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d (duration: 02m 28s)	[production]
10:33	<dcausse@deploy1002>	Started deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d	[production]
10:28	<taavi@deploy1002>	Finished scap: Backport for [[gerrit:868869\|Only preload getPageData if there's thread data for the page (T325477)]] (duration: 07m 58s)	[production]
10:23	<elukey@cumin1001>	END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0)	[production]
10:23	<elukey@cumin1001>	END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)	[production]
10:23	<elukey@cumin1001>	START - Cookbook sre.discovery.service-route	[production]
10:23	<elukey@cumin1001>	START - Cookbook sre.k8s.pool-depool-cluster	[production]
10:21	<taavi@deploy1002>	taavi and taavi: Backport for [[gerrit:868869\|Only preload getPageData if there's thread data for the page (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet	[production]
10:20	<taavi@deploy1002>	Started scap: Backport for [[gerrit:868869\|Only preload getPageData if there's thread data for the page (T325477)]]	[production]
09:59	<moritzm>	update bullseye netboot image for Bullseye 11.6 point release T325186	[production]
09:49	<aqu@deploy1002>	Finished deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] (duration: 00m 13s)	[production]
09:48	<aqu@deploy1002>	Started deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269]	[production]
09:47	<aqu@deploy1002>	Finished deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] (duration: 00m 11s)	[production]
09:47	<aqu@deploy1002>	Started deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269]	[production]
09:30	<aqu@deploy1002>	Finished deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] (duration: 00m 08s)	[production]
09:29	<aqu@deploy1002>	Started deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff]	[production]
09:29	<aqu@deploy1002>	Finished deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] (duration: 08m 02s)	[production]
09:21	<aqu@deploy1002>	Started deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff]	[production]
09:20	<aqu@deploy1002>	Finished deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] (duration: 01m 14s)	[production]
09:19	<aqu@deploy1002>	Started deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff]	[production]
09:17	<aqu>	About to deploy analytics/refinery (bug fix in HDFS usage pipeline)	[production]
09:15	<ladsgroup@deploy1002>	Finished scap: Backport for [[gerrit:868867\|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] (duration: 09m 24s)	[production]
09:14	<elukey@deploy1002>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .	[production]
09:11	<elukey@deploy1002>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .	[production]
09:07	<ladsgroup@deploy1002>	ladsgroup and ladsgroup: Backport for [[gerrit:868867\|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet	[production]
09:06	<elukey@deploy1002>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .	[production]
09:05	<ladsgroup@deploy1002>	Started scap: Backport for [[gerrit:868867\|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]]	[production]
09:04	<dcausse>	restarting blazegraph on wdqs1015 (BlazegraphFreeAllocatorsDecreasingRapidly)	[production]
09:03	<elukey@deploy1002>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .	[production]
09:02	<elukey@deploy1002>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .	[production]
09:01	<elukey@deploy1002>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .	[production]
08:59	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .	[production]
08:58	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .	[production]
08:56	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .	[production]
08:37	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .	[production]
08:34	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .	[production]
08:33	<ayounsi@cumin1001>	END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 29535	[production]
08:32	<ayounsi@cumin1001>	START - Cookbook sre.network.peering with action 'email' for AS: 29535	[production]
08:02	<moritzm>	installing openexr security updates	[production]