production SAL

1551-1600 of 10000 results (83ms)

2023-07-10 §
07:29	<elukey@deploy1002>	helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync	[production]
07:29	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet	[production]
07:29	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet	[production]
07:22	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet	[production]
07:22	<elukey@deploy1002>	helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync	[production]
07:21	<elukey@deploy1002>	helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync	[production]
07:21	<hashar>	deploy1002: removed empty untracked directory from MediaWiki staging area: `rmdir /srv/mediawiki-staging/wmf-config/scap/log/ && rmdir /srv/mediawiki-staging/wmf-config/scap/` \| T341292	[production]
07:20	<elukey@deploy1002>	helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync	[production]
07:20	<elukey@deploy1002>	helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync	[production]
07:02	<jmm@cumin2002>	START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet	[production]
07:01	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet	[production]
07:01	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet	[production]
06:55	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet	[production]
06:45	<jmm@cumin2002>	START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet	[production]
06:43	<godog>	add 100G to prometheus/k8s in codfw	[production]
01:06	<rzl@deploy1002>	helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply	[production]
01:06	<rzl@deploy1002>	helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply	[production]
2023-07-09 §
14:51	<apergos>	swapped dumpsdata1003 in as the new nfs share for misc dumps; dumpsdata1002 is now a spare, to be decommissioned. 1003 is running bullseye.	[production]
04:04	<apergos>	rsync misc dumps output files from dumpsdata1002 to 1003, in ariel screen session on 1003, bwlimit to 1G	[production]
2023-07-08 §
03:21	<rzl@deploy1002>	helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply	[production]
2023-07-07 §
22:55	<rzl@deploy1002>	helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply	[production]
22:55	<rzl@deploy1002>	helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply	[production]
22:41	<rzl@deploy1002>	helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply	[production]
22:21	<rzl@deploy1002>	helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply	[production]
22:04	<jhancock@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1156.eqiad.wmnet with OS bullseye	[production]
21:59	<rzl@deploy1002>	helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply	[production]
21:24	<bking@deploy1002>	Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 57s)	[production]
21:23	<bking@deploy1002>	Started deploy [wdqs/wdqs@dff41b7]: 0.3.124	[production]
21:23	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
20:53	<dwisehaupt@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
20:53	<dwisehaupt@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"	[production]
20:52	<dwisehaupt@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"	[production]
20:50	<dwisehaupt@cumin1001>	START - Cookbook sre.dns.netbox	[production]
20:44	<jhancock@cumin2002>	START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye	[production]
19:33	<bking@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
19:33	<bking@cumin1001>	END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)	[production]
19:32	<bking@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
19:12	<bking@cumin1001>	END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)	[production]
18:11	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
18:08	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
17:58	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
17:57	<pt1979@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb1001.eqiad.wmnet with OS bullseye	[production]
17:56	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
17:40	<bking@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
16:44	<pt1979@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye	[production]
16:38	<bking@cumin1001>	conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet	[production]
16:23	<bking@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
16:20	<hashar>	Restarting CI Jenkins due to a confusion in the next build number leading to intermittent 404 when browsing console links \| T341348	[production]
16:00	<bking@cumin1001>	conftool action : set/pooled=no; selector: name=wdqs2020.codfw.wmnet	[production]
15:53	<aborrero@cumin1001>	END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb1001.mgmt.eqiad.wmnet with reboot policy FORCED	[production]