production SAL

2023-10-26
13:21	<jmm@cumin2002>	START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw	[production]
13:20	<lucaswerkmeister-wmde@deploy2002>	Finished scap: Backport for [[gerrit:968713|Enable block feature for AbuseFilter on srwiki (T349727)]] (duration: 10m 23s)	[production]
13:20	<bking@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
13:20	<bking@cumin1001>	END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)	[production]
13:15	<lucaswerkmeister-wmde@deploy2002>	zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync	[production]
13:15	<moritzm>	installing poppler security updates	[production]
13:11	<lucaswerkmeister-wmde@deploy2002>	zoranzoki21 and lucaswerkmeister-wmde: Backport for [[gerrit:968713|Enable block feature for AbuseFilter on srwiki (T349727)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
13:10	<lucaswerkmeister-wmde@deploy2002>	Started scap: Backport for [[gerrit:968713|Enable block feature for AbuseFilter on srwiki (T349727)]]	[production]
13:04	<bking@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
12:27	<stevemunene@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance	[production]
12:26	<stevemunene@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance	[production]
11:04	<kevinbazira@deploy2002>	helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .	[production]
11:03	<kevinbazira@deploy2002>	helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .	[production]
10:58	<kevinbazira@deploy2002>	helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .	[production]
10:51	<isaranto@deploy2002>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .	[production]
10:51	<isaranto@deploy2002>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .	[production]
10:51	<isaranto@deploy2002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .	[production]
10:40	<elukey@deploy2002>	helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .	[production]
10:30	<mvolz@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/citoid: apply	[production]
10:29	<mvolz@deploy2002>	helmfile [eqiad] START helmfile.d/services/citoid: apply	[production]
10:25	<mvolz@deploy2002>	helmfile [codfw] DONE helmfile.d/services/citoid: apply	[production]
10:25	<mvolz@deploy2002>	helmfile [codfw] START helmfile.d/services/citoid: apply	[production]
10:20	<mvolz@deploy2002>	helmfile [staging] DONE helmfile.d/services/citoid: apply	[production]
10:20	<mvolz@deploy2002>	helmfile [staging] START helmfile.d/services/citoid: apply	[production]
10:10	<mvolz@deploy2002>	helmfile [staging] DONE helmfile.d/services/citoid: apply	[production]
10:10	<mvolz@deploy2002>	helmfile [staging] START helmfile.d/services/citoid: apply	[production]
09:29	<dcausse>	erratum (replace wdqs1009 with wdqs2009 in the above msg): depooling and restarting blazegraph on wdqs2009 (stuck since 2023-10-12)	[production]
09:28	<dcausse>	depooling and restarting blazegraph on wdqs1009 (stuck since 2023-10-12)	[production]
09:23	<brouberol@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1009.eqiad.wmnet with OS bullseye	[production]
09:14	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox	[production]
09:14	<ayounsi@cumin1001>	START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox	[production]
09:06	<brouberol@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage	[production]
09:03	<brouberol@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage	[production]
08:50	<brouberol@cumin1001>	START - Cookbook sre.hosts.reimage for host kafka-jumbo1009.eqiad.wmnet with OS bullseye	[production]
08:49	<urbanecm>	mwmaint2002: `foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue` (testing T344428; after enabling backend on all Wikipedias)	[production]
08:48	<urbanecm@deploy2002>	Finished scap: Backport for [[gerrit:949034|Growth: Enable new Impact backend everywhere (T344143)]] (duration: 09m 29s)	[production]
08:43	<urbanecm@deploy2002>	urbanecm: Continuing with sync	[production]
08:40	<urbanecm@deploy2002>	urbanecm: Backport for [[gerrit:949034|Growth: Enable new Impact backend everywhere (T344143)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
08:40	<kevinbazira@deploy2002>	helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .	[production]
08:40	<brouberol@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1008.eqiad.wmnet with OS bullseye	[production]
08:39	<urbanecm@deploy2002>	Started scap: Backport for [[gerrit:949034|Growth: Enable new Impact backend everywhere (T344143)]]	[production]
08:32	<kevinbazira@deploy2002>	helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .	[production]
08:32	<urbanecm@deploy2002>	helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply	[production]
08:31	<urbanecm@deploy2002>	helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply	[production]
08:29	<urbanecm@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply	[production]
08:28	<urbanecm@deploy2002>	helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply	[production]
08:28	<urbanecm@deploy2002>	helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply	[production]
08:27	<urbanecm@deploy2002>	helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply	[production]
08:24	<brouberol@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage	[production]
08:21	<brouberol@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage	[production]