production SAL

2023-11-29
13:05	<cmooney@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002	[production]
13:05	<cmooney@cumin1001>	START - Cookbook sre.hosts.downtime for 0:20:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002	[production]
13:01	<cmooney@cumin1001>	START - Cookbook sre.dns.netbox	[production]
13:01	<cmooney@cumin1001>	END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)	[production]
13:00	<jmm@cumin2002>	START - Cookbook sre.ganeti.reboot-vm for VM netflow4002.ulsfo.wmnet	[production]
12:58	<topranks>	restoring DB snapshot from 11:37 UTC to netboxdb1002	[production]
12:52	<cmooney@cumin1001>	START - Cookbook sre.dns.netbox	[production]
12:52	<cmooney@cumin1001>	END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)	[production]
12:46	<cmooney@cumin1001>	START - Cookbook sre.dns.netbox	[production]
12:44	<hashar@deploy2002>	Finished deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 07s)	[production]
12:43	<hashar@deploy2002>	Started deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412	[production]
12:36	<hashar@deploy2002>	Finished deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 06s)	[production]
12:35	<hashar@deploy2002>	Started deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412	[production]
12:35	<hashar@deploy2002>	Finished deploy [gervert/deploy@ca6bba0]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 12s)	[production]
12:35	<hashar@deploy2002>	Started deploy [gervert/deploy@ca6bba0]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412	[production]
12:25	<vgutierrez>	rolling restart of pybal on lvs4008 and lvs4010, effectively enabling IPIP encapsulation for ncredir@ulsfo - T351069	[production]
12:22	<hashar@deploy2002>	Finished deploy [gerrit/gerrit@a087269]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 15s)	[production]
12:22	<hashar@deploy2002>	Started deploy [gerrit/gerrit@a087269]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412	[production]
12:06	<fabfur@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[1075-1090].eqiad.wmnet	[production]
12:06	<fabfur@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
12:05	<fabfur@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[1075-1090].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"	[production]
12:05	<klausman@deploy2002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .	[production]
12:04	<fabfur@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[1075-1090].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"	[production]
12:02	<hashar>	Disabled Puppet agent on gerrit1003 and gerrit2002 to roll https://gerrit.wikimedia.org/r/844998 which requires some manual steps | T317412	[production]
11:26	<jiji@deploy2002>	helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply	[production]
11:26	<jiji@deploy2002>	helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply	[production]
11:23	<fabfur@cumin1001>	START - Cookbook sre.dns.netbox	[production]
11:21	<vgutierrez>	upload tcp-mss-clamper 0.3+deb12u1 to apt.wm.o (bookworm) - T352249	[production]
11:15	<hnowlan@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply	[production]
11:14	<hnowlan@deploy2002>	helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply	[production]
11:13	<hnowlan@deploy2002>	helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply	[production]
11:13	<hnowlan@deploy2002>	helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply	[production]
11:12	<btullis>	re-enabled all DAGs on all airflow instances after airflow upgrade to 2.7.3	[production]
10:57	<vgutierrez>	upload ipip-multiqueue-optimizer 0.3+deb11u1 to apt.wm.o (bullseye) - T352249	[production]
10:56	<jiji@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply	[production]
10:56	<jiji@deploy2002>	helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply	[production]
10:53	<klausman@deploy2002>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .	[production]
10:51	<hnowlan@deploy2002>	helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply	[production]
10:51	<hnowlan@deploy2002>	helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply	[production]
10:50	<klausman@deploy2002>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .	[production]
10:49	<klausman@deploy2002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .	[production]
10:37	<fabfur@cumin1001>	START - Cookbook sre.hosts.decommission for hosts cp[1075-1090].eqiad.wmnet	[production]
10:37	<btullis>	pausing all active dags on all airflow instances	[production]
10:36	<fabfur>	decommissioning cp1075-1090 (T352253)	[production]
10:10	<klausman@deploy2002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.	[production]
10:10	<klausman@deploy2002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.	[production]
09:42	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS bookworm	[production]
09:28	<marostegui@cumin1001>	dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53938 and previous config saved to /var/cache/conftool/dbconfig/20231129-092808-root.json	[production]
09:21	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage	[production]
09:20	<hashar@deploy2002>	Synchronized php: group1 wikis to 1.42.0-wmf.7 refs T350083 (duration: 07m 23s)	[production]