production SAL

2024-07-02
11:26	<marostegui@cumin1002>	dbctl commit (dc=all): 'Depool db2129 T369021', diff saved to https://phabricator.wikimedia.org/P65653 and previous config saved to /var/cache/conftool/dbconfig/20240702-112616-root.json	[production]
11:25	<marostegui@cumin1002>	dbctl commit (dc=all): 'Promote db2214 to s6 primary T369021', diff saved to https://phabricator.wikimedia.org/P65652 and previous config saved to /var/cache/conftool/dbconfig/20240702-112518-marostegui.json	[production]
11:24	<marostegui>	Starting s6 codfw failover from db2129 to db2214 - T369021	[production]
11:24	<jayme>	switched wikikube production clusters from PSP to PSS for restricted namespaces - T273507	[production]
11:23	<jayme@deploy1002>	helmfile [eqiad] DONE helmfile.d/admin 'apply'.	[production]
11:22	<btullis@cumin1002>	START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet	[production]
11:22	<jayme@deploy1002>	helmfile [eqiad] START helmfile.d/admin 'apply'.	[production]
11:22	<fabfur@cumin1002>	START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad	[production]
11:22	<fabfur@cumin1002>	START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad	[production]
11:21	<jayme@cumin1002>	START - Cookbook sre.hosts.reboot-single for host kubernetes1051.eqiad.wmnet	[production]
11:21	<jayme@deploy1002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
11:21	<claime>	Uncordoning wikikube-ctrl2001.codfw.wmnet and wikikube-ctrl2002.codfw.wmnet	[production]
11:20	<jayme@deploy1002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]
11:19	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65651 and previous config saved to /var/cache/conftool/dbconfig/20240702-111949-marostegui.json	[production]
11:17	<root@cumin1002>	START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye	[production]
11:16	<marostegui@cumin1002>	dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65650 and previous config saved to /var/cache/conftool/dbconfig/20240702-111616-root.json	[production]
11:14	<fabfur@cumin1002>	END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad	[production]
11:12	<cgoubert@cumin1002>	conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet\|wikikube-worker2026.codfw.wmnet\|wikikube-worker2027.codfw.wmnet\|wikikube-worker2028.codfw.wmnet\|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc	[production]
11:12	<fabfur@cumin1002>	END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad	[production]
11:12	<claime>	pooling and uncordoning wikikube-worker2025.codfw.wmnet\|wikikube-worker2026.codfw.wmnet\|wikikube-worker2027.codfw.wmnet\|wikikube-worker2028.codfw.wmnet\|wikikube-worker2029.codfw.wmnet - T351074	[production]
11:11	<jiji@cumin1002>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[2001-2002].codfw.wmnet	[production]
11:11	<jiji@cumin1002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
11:11	<jiji@cumin1002>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"	[production]
11:07	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021	[production]
11:07	<marostegui@cumin1002>	dbctl commit (dc=all): 'Set db2214 with weight 0 T369021', diff saved to https://phabricator.wikimedia.org/P65649 and previous config saved to /var/cache/conftool/dbconfig/20240702-110750-root.json	[production]
11:07	<jiji@cumin1002>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"	[production]
11:07	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021	[production]
11:04	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65648 and previous config saved to /var/cache/conftool/dbconfig/20240702-110442-marostegui.json	[production]
11:01	<marostegui@cumin1002>	dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65647 and previous config saved to /var/cache/conftool/dbconfig/20240702-110111-root.json	[production]
10:56	<jiji@cumin1002>	START - Cookbook sre.dns.netbox	[production]
10:50	<jiji@cumin1002>	START - Cookbook sre.hosts.decommission for hosts kubemaster[2001-2002].codfw.wmnet	[production]
10:46	<marostegui@cumin1002>	dbctl commit (dc=all): 'db2165 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65646 and previous config saved to /var/cache/conftool/dbconfig/20240702-104605-root.json	[production]
10:42	<pfischer@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply	[production]
10:42	<pfischer@deploy1002>	helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply	[production]
10:42	<pfischer@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply	[production]
10:41	<pfischer@deploy1002>	helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply	[production]
10:35	<brouberol@cumin1002>	START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.	[production]
10:34	<btullis@cumin1002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1003.eqiad.wmnet	[production]
10:32	<brouberol@cumin1002>	END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker	[production]
10:28	<fabfur>	upgrading A:cp-eqiad to haproxy 2.8.10 (T367756)	[production]
10:27	<fabfur@cumin1002>	START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad	[production]
10:27	<fabfur@cumin1002>	START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad	[production]
10:25	<btullis@cumin1002>	START - Cookbook sre.hosts.reboot-single for host an-master1003.eqiad.wmnet	[production]
10:06	<jynus@cumin1002>	dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65645 and previous config saved to /var/cache/conftool/dbconfig/20240702-100636-jynus.json	[production]
10:02	<claime>	homer 'crcodfw' commit 'T351074'	[production]
09:53	<jiji@cumin1002>	conftool action : set/pooled=no; selector: name=kubemaster200[1-2].codfw.wmnet	[production]
09:52	<elukey>	volatile dir on puppetserver1001 with the new point release (12.6) for Bookworm	[production]
09:48	<jiji@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom	[production]
09:47	<jiji@cumin1002>	START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom	[production]
09:20	<brouberol@cumin1002>	START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker	[production]