production SAL

2024-06-06
11:55	<cmooney@cumin1002>	START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf	[production]
11:28	<cgoubert@cumin1002>	END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw	[production]
11:27	<effie>	kicking off k8s eqiad restarts - T366555	[production]
11:25	<jiji@cumin1002>	START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad	[production]
11:15	<hnowlan@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply	[production]
11:09	<klausman@cumin1002>	START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad	[production]
11:05	<hnowlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/data-gateway: apply	[production]
10:58	<hnowlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/data-gateway: apply	[production]
10:47	<sfaci@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply	[production]
10:45	<sfaci@deploy1002>	helmfile [eqiad] START helmfile.d/services/geo-analytics: apply	[production]
10:45	<sfaci@deploy1002>	helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply	[production]
10:43	<sfaci@deploy1002>	helmfile [codfw] START helmfile.d/services/geo-analytics: apply	[production]
10:41	<pfischer@deploy1002>	helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply	[production]
10:41	<sfaci@deploy1002>	helmfile [staging] DONE helmfile.d/services/geo-analytics: apply	[production]
10:40	<sfaci@deploy1002>	helmfile [staging] START helmfile.d/services/geo-analytics: apply	[production]
10:40	<sfaci@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply	[production]
10:38	<sfaci@deploy1002>	helmfile [eqiad] START helmfile.d/services/device-analytics: apply	[production]
10:37	<sfaci@deploy1002>	helmfile [codfw] DONE helmfile.d/services/device-analytics: apply	[production]
10:35	<sfaci@deploy1002>	helmfile [codfw] START helmfile.d/services/device-analytics: apply	[production]
10:27	<sfaci@deploy1002>	helmfile [staging] DONE helmfile.d/services/device-analytics: apply	[production]
10:26	<sfaci@deploy1002>	helmfile [staging] START helmfile.d/services/device-analytics: apply	[production]
10:11	<pfischer@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply	[production]
10:07	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64181 and previous config saved to /var/cache/conftool/dbconfig/20240606-100747-arnaudb.json	[production]
09:52	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64180 and previous config saved to /var/cache/conftool/dbconfig/20240606-095240-arnaudb.json	[production]
09:51	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance	[production]
09:50	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance	[production]
09:50	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64179 and previous config saved to /var/cache/conftool/dbconfig/20240606-095053-marostegui.json	[production]
09:47	<mvernon@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2004.codfw.wmnet	[production]
09:37	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64178 and previous config saved to /var/cache/conftool/dbconfig/20240606-093734-arnaudb.json	[production]
09:35	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64177 and previous config saved to /var/cache/conftool/dbconfig/20240606-093545-marostegui.json	[production]
09:33	<mvernon@cumin2002>	START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet	[production]
09:30	<mvernon@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2003.codfw.wmnet	[production]
09:22	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64176 and previous config saved to /var/cache/conftool/dbconfig/20240606-092228-arnaudb.json	[production]
09:22	<stevemunene@deploy1002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
09:20	<stevemunene@deploy1002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]
09:20	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64175 and previous config saved to /var/cache/conftool/dbconfig/20240606-092037-marostegui.json	[production]
09:20	<stevemunene@deploy1002>	helmfile [eqiad] DONE helmfile.d/admin 'apply'.	[production]
09:18	<mvernon@cumin2002>	START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet	[production]
09:17	<cgoubert@cumin1002>	START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw	[production]
09:17	<stevemunene@deploy1002>	helmfile [eqiad] START helmfile.d/admin 'apply'.	[production]
09:15	<stevemunene@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
09:13	<stevemunene@deploy1002>	helmfile [staging-codfw] START helmfile.d/admin 'apply'.	[production]
09:12	<stevemunene@deploy1002>	helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.	[production]
09:11	<stevemunene@deploy1002>	helmfile [staging-eqiad] START helmfile.d/admin 'apply'.	[production]
09:08	<mvernon@cumin1002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1004.eqiad.wmnet	[production]
09:07	<arnaudb@cumin1002>	dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64174 and previous config saved to /var/cache/conftool/dbconfig/20240606-090722-arnaudb.json	[production]
09:05	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64173 and previous config saved to /var/cache/conftool/dbconfig/20240606-090529-marostegui.json	[production]
09:01	<mvernon@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2002.codfw.wmnet	[production]
09:01	<filippo@cumin1002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet	[production]
09:01	<filippo@cumin1002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet	[production]