metricsinfra SAL

2025-08-14 §
11:24	<taavi@cloudcumin1001>	END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on metricsinfra-thanos-fe-2.metricsinfra.eqiad1.wikimedia.cloud	[metricsinfra]
11:22	<taavi@cloudcumin1001>	START - Cookbook wmcs.vps.refresh_puppet_certs on metricsinfra-thanos-fe-2.metricsinfra.eqiad1.wikimedia.cloud	[metricsinfra]
2025-08-04 §
13:06	<filippo@cloudcumin1001>	END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'filippo' in role 'member' (T401091)	[metricsinfra]
13:06	<filippo@cloudcumin1001>	START - Cookbook wmcs.vps.add_user_to_project for user 'filippo' in role 'member' (T401091)	[metricsinfra]
2025-06-30 §
11:45	<dcaro>	added a new global alert when nfs space is >90%	[metricsinfra]
2025-05-12 §
10:44	<taavi>	add generic TargetDown rule for better detection of issues like T392889	[metricsinfra]
2025-05-09 §
08:47	<taavi>	failing over grafana to grafana-2 T393735	[metricsinfra]
08:44	<taavi@cloudcumin1001>	END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on metricsinfra-grafana-2.metricsinfra.eqiad1.wikimedia.cloud	[metricsinfra]
08:43	<taavi@cloudcumin1001>	START - Cookbook wmcs.vps.refresh_puppet_certs on metricsinfra-grafana-2.metricsinfra.eqiad1.wikimedia.cloud	[metricsinfra]
2025-04-29 §
14:10	<taavi>	upgrading thanos to 0.38 T383966	[metricsinfra]
2025-04-23 §
09:43	<taavi>	updating security group rules to include IPv6 terms	[metricsinfra]
2025-01-31 §
11:38	<dhinus>	rebooting VM metricsinfra-prometheus-3 T385262	[metricsinfra]
11:30	<dhinus>	systemctl restart prometheus@cloud on metricsinfra-prometheus-2 T385262	[metricsinfra]
11:16	<dhinus>	systemctl restart prometheus@cloud on metricsinfra-prometheus-3 T385262	[metricsinfra]
2025-01-20 §
13:01	<dcaro>	stopping and starting metricsinfra-alertmanager-3 to try to get the right network	[metricsinfra]
2024-06-24 §
20:09	<andrew@cloudcumin1001>	END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1)	[metricsinfra]
19:56	<andrew@cloudcumin1001>	START - Cookbook wmcs.openstack.migrate_project_to_ovs	[metricsinfra]
2024-03-13 §
12:14	<taavi>	MariaDB [prometheusconfig]> delete from alerts where name = 'GridQueueProblem'; # T314664	[metricsinfra]
2023-11-30 §
18:53	<taavi>	no longer send quarry alerts to cloud services team	[metricsinfra]
2023-11-18 §
14:09	<taavi>	reboot metricsinfra-alertmanager-1 to see if it stops flapping a puppet alert	[metricsinfra]
2023-09-29 §
08:24	<wm-bot2>	dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)	[metricsinfra]
08:17	<wm-bot2>	dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console	[metricsinfra]
08:17	<wm-bot2>	dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)	[metricsinfra]
08:16	<wm-bot2>	dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console	[metricsinfra]
2023-05-10 §
17:17	<wm-bot2>	Increased quotas by 8 cores, 16384 ram (T336423) - cookbook ran by taavi@runko	[metricsinfra]
2023-05-04 §
15:11	<dcaro>	rebooting metricsinfra-prometheus-2 as it was unresponsive	[metricsinfra]
2023-04-24 §
14:16	<dcaro>	rebooting metricsinfra-prometheus-2, it's in a non-responsive state (no ssh, console hangs)	[metricsinfra]
2023-04-21 §
21:58	<andrewbogott>	added raymond-ndibe as project member	[metricsinfra]
2023-03-07 §
16:31	<wm-bot2>	removed instance metricsinfra-controller-1 - cookbook ran by dcaro@vulcanus	[metricsinfra]
2023-02-13 §
23:37	<bd808>	metricsinfra-db-1.trove.eqiad1.wikimedia.cloud restarted via Horizon	[metricsinfra]
23:35	<bd808>	metricsinfra-db-1.trove.eqiad1.wikimedia.cloud not responsive to ssh	[metricsinfra]
23:32	<bd808>	grafana.wmcloud.org offline with db connection error. Investigating.	[metricsinfra]
2022-12-20 §
15:59	<dcaro>	rebooting prometheus-2 due to being non-responsive	[metricsinfra]
2022-06-16 §
14:18	<taavi>	add 'gitlab-runners' project to list of scraped projects	[metricsinfra]
2022-03-01 §
11:38	<dcaro>	Reloading alertmanager to refresh new config (T302702)	[metricsinfra]
11:37	<dcaro>	Adding runbook url annotation to GridQueueProblem alert on DB at metricsinfra-controller-1 (T302702)	[metricsinfra]
2022-01-22 §
11:32	<taavi>	added project-proxy VMs to prometheus targets	[metricsinfra]
2021-12-14 §
09:27	<majavah>	drop "analytics" project from current beta coverage, the setup is currently not compatible with pontoon	[metricsinfra]
2021-09-11 §
08:41	<majavah>	silence deployment-prep alerts yet again	[metricsinfra]
2021-07-12 §
15:45	<bstorm>	silenced deployment prep alerts for another 60 days	[metricsinfra]
2021-06-15 §
16:12	<balloons>	add 8 CPU/16G RAM to quota T284973	[metricsinfra]
2021-06-14 §
18:40	<balloons>	Add majavah as projectadmin T284938	[metricsinfra]
2021-03-11 §
18:33	<bstorm>	silenced alerts from deployment-prep for another 60 days	[metricsinfra]
2021-01-04 §
15:50	<bstorm>	silencing all alerts from deployment-prep for 60 more days	[metricsinfra]
2020-09-29 §
16:53	<bstorm>	silence all the deployment-prep alerts for another 30 days	[metricsinfra]