production SAL

2024-08-07
17:29	<milimetric@deploy1003>	Finished deploy [analytics/refinery@0d25645]: Syncing browser general script, and refinery-source 0.2.45 apparently (duration: 54m 21s)	[production]
17:27	<brennen@deploy1003>	Started scap sync-world: Backport for [[gerrit:1060468|Revert "Drop writeapi flag from siteinfo API" (T115414 T294397 T371977)]]	[production]
17:17	<brett>	stop pybal on lvs1019 for server reboot	[production]
17:14	<brett>	start pybal on lvs2014	[production]
17:11	<brett@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet	[production]
17:08	<brett@cumin2002>	START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet	[production]
17:07	<jclark@cumin1002>	START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye	[production]
16:42	<brett>	stop pybal on lvs2014 for server reboot	[production]
16:37	<mutante>	puppetserver1002 systemctl start dump_ip_reputation	[production]
16:34	<milimetric@deploy1003>	Started deploy [analytics/refinery@0d25645]: Syncing browser general script, and refinery-source 0.2.45 apparently	[production]
16:27	<brett>	start pybal on lvs2013	[production]
16:15	<andrew@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1038.eqiad.wmnet with OS bullseye	[production]
16:14	<ladsgroup@cumin1002>	dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P67246 and previous config saved to /var/cache/conftool/dbconfig/20240807-161452-ladsgroup.json	[production]
16:11	<brett@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet	[production]
16:08	<brett@cumin2002>	START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet	[production]
16:01	<elukey@cumin1002>	END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Openjdk upgrade - elukey@cumin1002	[production]
15:57	<andrew@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage	[production]
15:54	<andrew@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage	[production]
15:40	<brett>	stop pybal on lvs2013 for server reboot	[production]
15:37	<andrew@cumin1002>	START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bullseye	[production]
15:36	<andrew@cumin1002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1038.eqiad.wmnet with OS bullseye	[production]
15:25	<kevinbazira@deploy1003>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .	[production]
15:21	<andrew@cumin1002>	START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bullseye	[production]
15:15	<kevinbazira@deploy1003>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .	[production]
14:58	<sukhe>	start pybal on lvs3008	[production]
14:53	<sukhe@cumin1002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3008.esams.wmnet	[production]
14:50	<sukhe@cumin1002>	START - Cookbook sre.hosts.reboot-single for host lvs3008.esams.wmnet	[production]
14:33	<elukey@cumin1002>	START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Openjdk upgrade - elukey@cumin1002	[production]
14:26	<jnuche@deploy1003>	Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 01m 12s)	[production]
14:25	<jnuche@deploy1003>	Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)	[production]
14:24	<sukhe>	sudo cumin "lvs3008*" 'disable-puppet "rebooting" && systemctl stop pybal.service'	[production]
14:22	<jnuche@deploy1003>	Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 00m 53s)	[production]
14:21	<jnuche@deploy1003>	Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)	[production]
14:04	<brouberol@deploy1003>	helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.	[production]
14:03	<brouberol@deploy1003>	helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.	[production]
14:01	<elukey>	import Jenkins 2.462.1 on bullseye-wikimedia:thirdparty/ci	[production]
13:55	<sukhe>	start pybal on lvs3009	[production]
13:54	<sukhe@cumin1002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3009.esams.wmnet	[production]
13:51	<sukhe@cumin1002>	START - Cookbook sre.hosts.reboot-single for host lvs3009.esams.wmnet	[production]
13:46	<dcaro@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bullseye	[production]
13:43	<hnowlan@deploy1003>	Finished scap: sync to test mw-jobrunner resource increase (duration: 02m 22s)	[production]
13:42	<hnowlan@deploy1003>	Started scap sync-world: sync to test mw-jobrunner resource increase	[production]
13:39	<filippo@deploy1003>	helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply	[production]
13:39	<filippo@deploy1003>	helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply	[production]
13:39	<filippo@deploy1003>	helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply	[production]
13:38	<filippo@deploy1003>	helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply	[production]
13:31	<hashar>	UTC afternoon backport window is completed	[production]
13:28	<dcaro@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage	[production]
13:28	<hashar@deploy1003>	Finished scap: Backport for [[gerrit:1060415|Turn on Parsoid support for Kartographer on Wikivoyage (T371823)]] (duration: 17m 26s)	[production]
13:27	<sukhe>	sudo cumin "lvs3009*" 'disable-puppet "rebooting" && systemctl stop pybal.service'	[production]