production SAL

2024-06-06
12:33	<topranks>	disabling BGP to ssw1-e1-eqiad from cr1-eqiad in advance of upgrade T366361	[production]
12:33	<vgutierrez>	depool text@codfw before enabling IPIP encapsulation - T366466	[production]
12:29	<fabfur@cumin1002>	START - Cookbook sre.hosts.reboot-single for host cp4051.ulsfo.wmnet	[production]
12:28	<fabfur@cumin1002>	conftool action : set/pooled=no; selector: name=cp4051.ulsfo.wmnet	[production]
12:25	<topranks>	disabling PyBal on lvs1018 to allow for cable move T366361	[production]
12:25	<cmooney@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf	[production]
12:25	<cmooney@cumin1002>	START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf	[production]
12:24	<cmooney@cumin1002>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1017.eqiad.wmnet	[production]
12:24	<cmooney@cumin1002>	START - Cookbook sre.hosts.remove-downtime for lvs1017.eqiad.wmnet	[production]
12:21	<sfaci@deploy1002>	helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply	[production]
12:21	<sfaci@deploy1002>	helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply	[production]
12:14	<cmooney@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f	[production]
12:14	<cmooney@cumin1002>	START - Cookbook sre.hosts.downtime for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f	[production]
11:56	<topranks>	disabling PyBal on lvs1017 to allow for cable move T366361	[production]
11:55	<cmooney@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf	[production]
11:55	<cmooney@cumin1002>	START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf	[production]
11:28	<cgoubert@cumin1002>	END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw	[production]
11:27	<effie>	kicking off k8s eqiad restarts - T366555	[production]
11:25	<jiji@cumin1002>	START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad	[production]
11:15	<hnowlan@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply	[production]
11:09	<klausman@cumin1002>	START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad	[production]
11:05	<hnowlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/data-gateway: apply	[production]
10:58	<hnowlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/data-gateway: apply	[production]
10:47	<sfaci@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply	[production]
10:45	<sfaci@deploy1002>	helmfile [eqiad] START helmfile.d/services/geo-analytics: apply	[production]
10:45	<sfaci@deploy1002>	helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply	[production]
10:43	<sfaci@deploy1002>	helmfile [codfw] START helmfile.d/services/geo-analytics: apply	[production]
10:41	<pfischer@deploy1002>	helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply	[production]
10:41	<sfaci@deploy1002>	helmfile [staging] DONE helmfile.d/services/geo-analytics: apply	[production]
10:40	<sfaci@deploy1002>	helmfile [staging] START helmfile.d/services/geo-analytics: apply	[production]
10:40	<sfaci@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply	[production]
10:38	<sfaci@deploy1002>	helmfile [eqiad] START helmfile.d/services/device-analytics: apply	[production]
10:37	<sfaci@deploy1002>	helmfile [codfw] DONE helmfile.d/services/device-analytics: apply	[production]
10:35	<sfaci@deploy1002>	helmfile [codfw] START helmfile.d/services/device-analytics: apply	[production]
10:27	<sfaci@deploy1002>	helmfile [staging] DONE helmfile.d/services/device-analytics: apply	[production]
10:26	<sfaci@deploy1002>	helmfile [staging] START helmfile.d/services/device-analytics: apply	[production]
10:11	<pfischer@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply	[production]
10:07	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64181 and previous config saved to /var/cache/conftool/dbconfig/20240606-100747-arnaudb.json	[production]
09:52	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64180 and previous config saved to /var/cache/conftool/dbconfig/20240606-095240-arnaudb.json	[production]
09:51	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance	[production]
09:50	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance	[production]
09:50	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64179 and previous config saved to /var/cache/conftool/dbconfig/20240606-095053-marostegui.json	[production]
09:47	<mvernon@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2004.codfw.wmnet	[production]
09:37	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64178 and previous config saved to /var/cache/conftool/dbconfig/20240606-093734-arnaudb.json	[production]
09:35	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64177 and previous config saved to /var/cache/conftool/dbconfig/20240606-093545-marostegui.json	[production]
09:33	<mvernon@cumin2002>	START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet	[production]
09:30	<mvernon@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2003.codfw.wmnet	[production]
09:22	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64176 and previous config saved to /var/cache/conftool/dbconfig/20240606-092228-arnaudb.json	[production]
09:22	<stevemunene@deploy1002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
09:20	<stevemunene@deploy1002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]