production SAL

2023-10-18
11:21	<hnowlan@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply	[production]
11:20	<hnowlan@deploy2002>	helmfile [eqiad] START helmfile.d/services/editor-analytics: apply	[production]
11:16	<hnowlan@deploy2002>	helmfile [staging] DONE helmfile.d/services/editor-analytics: apply	[production]
11:16	<hnowlan@deploy2002>	helmfile [staging] START helmfile.d/services/editor-analytics: apply	[production]
11:14	<fnegri@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage	[production]
11:12	<fnegri@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage	[production]
11:11	<ladsgroup@deploy2002>	Finished scap: Backport for [[gerrit:966592|Set s6 and s8 to write both for pagelinks migration (T345732)]] (duration: 10m 10s)	[production]
11:08	<jbond@cumin1001>	START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye	[production]
11:05	<ladsgroup@deploy2002>	ladsgroup: Continuing with sync	[production]
11:02	<ladsgroup@deploy2002>	ladsgroup: Backport for [[gerrit:966592|Set s6 and s8 to write both for pagelinks migration (T345732)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
11:01	<fnegri@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm	[production]
11:00	<ladsgroup@deploy2002>	Started scap: Backport for [[gerrit:966592|Set s6 and s8 to write both for pagelinks migration (T345732)]]	[production]
10:40	<volans>	re-enabled puppet on the cumin hosts. installed spicerack 8.0.1 on the cumin hosts	[production]
10:37	<volans@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye	[production]
10:35	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet	[production]
10:32	<fnegri@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm	[production]
10:28	<kevinbazira@deploy2002>	helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .	[production]
10:19	<fnegri@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage	[production]
10:16	<fnegri@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage	[production]
10:09	<volans@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet	[production]
10:07	<fnegri@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm	[production]
10:03	<volans@cumin2002>	START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet	[production]
09:54	<volans@cumin2002>	START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye	[production]
09:52	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on stat1009.eqiad.wmnet with reason: Extending downtime for stat1009	[production]
09:52	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on stat1009.eqiad.wmnet with reason: Extending downtime for stat1009	[production]
09:48	<volans@cumin2002>	END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet	[production]
09:47	<volans@cumin2002>	START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet	[production]
09:25	<volans>	uploaded spicerack_8.0.1 to apt.wikimedia.org bullseye-wikimedia	[production]
09:23	<jayme@deploy1002>	helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply	[production]
09:23	<jynus>	aborting backup of es1022, es1025 (there was already another backup running)	[production]
09:23	<fnegri@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm	[production]
09:22	<jayme@deploy1002>	helmfile [codfw] START helmfile.d/services/wikifunctions: apply	[production]
09:21	<jynus>	starting new backup of es1022, es1025 (new clusters only)	[production]
09:20	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet	[production]
09:20	<jayme@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply	[production]
09:19	<jayme@deploy1002>	helmfile [eqiad] START helmfile.d/services/wikifunctions: apply	[production]
09:17	<jayme@deploy1002>	helmfile [staging] DONE helmfile.d/services/wikifunctions: apply	[production]
09:17	<jayme@deploy1002>	helmfile [staging] START helmfile.d/services/wikifunctions: apply	[production]
09:17	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on stat1009.eqiad.wmnet with reason: Moving /home to /srv/home on stat1009 and rebooting	[production]
09:16	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on stat1009.eqiad.wmnet with reason: Moving /home to /srv/home on stat1009 and rebooting	[production]
09:14	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet	[production]
09:13	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet	[production]
09:13	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet	[production]
09:10	<fnegri@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage	[production]
09:06	<fnegri@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage	[production]
09:05	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet	[production]
09:02	<aqu@deploy2002>	Finished deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy - Second try [airflow-dags@c17c91ce] (duration: 00m 06s)	[production]
09:02	<aqu@deploy2002>	Started deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy - Second try [airflow-dags@c17c91ce]	[production]
09:01	<aqu@deploy2002>	deploy aborted: Fix following yesterday weekly train deploy [airflow-dags@c17c91ce] (duration: 01m 10s)	[production]
09:00	<aqu@deploy2002>	Started deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy [airflow-dags@c17c91ce]	[production]