|
2025-11-18
§
|
| 14:41 |
<taavi@cloudcumin1001> |
END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm metricsinfra-thanos-fe-2 (cluster eqiad1) |
[metricsinfra] |
| 14:41 |
<taavi@cloudcumin1001> |
START - Cookbook wmcs.vps.instance.stop_start vm metricsinfra-thanos-fe-2 (cluster eqiad1) |
[metricsinfra] |
| 14:40 |
<taavi@cloudcumin1001> |
END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm metricsinfra-grafana-2 (cluster eqiad1) |
[metricsinfra] |
| 14:39 |
<taavi@cloudcumin1001> |
START - Cookbook wmcs.vps.instance.stop_start vm metricsinfra-grafana-2 (cluster eqiad1) |
[metricsinfra] |
|
2025-11-04
§
|
| 17:38 |
<dcaro> |
removed tools and toolsbeta redis from the scrapes targets (moved to tools/toolsbeta prometheus) |
[metricsinfra] |
|
2025-09-26
§
|
| 11:40 |
<godog> |
set default timezone for grafana 'wikimedia cloud services' org to UTC |
[metricsinfra] |
|
2025-08-27
§
|
| 14:06 |
<dcaro> |
remove osbpo repos as they don't work anymore |
[metricsinfra] |
|
2025-08-14
§
|
| 11:24 |
<taavi@cloudcumin1001> |
END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on metricsinfra-thanos-fe-2.metricsinfra.eqiad1.wikimedia.cloud |
[metricsinfra] |
| 11:22 |
<taavi@cloudcumin1001> |
START - Cookbook wmcs.vps.refresh_puppet_certs on metricsinfra-thanos-fe-2.metricsinfra.eqiad1.wikimedia.cloud |
[metricsinfra] |
|
2025-08-04
§
|
| 13:06 |
<filippo@cloudcumin1001> |
END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'filippo' in role 'member' (T401091) |
[metricsinfra] |
| 13:06 |
<filippo@cloudcumin1001> |
START - Cookbook wmcs.vps.add_user_to_project for user 'filippo' in role 'member' (T401091) |
[metricsinfra] |
|
2025-06-30
§
|
| 11:45 |
<dcaro> |
added a new global alert when nfs space is >90% |
[metricsinfra] |
|
2025-05-12
§
|
| 10:44 |
<taavi> |
add generic TargetDown rule for better detection of issues like T392889 |
[metricsinfra] |
|
2025-05-09
§
|
| 08:47 |
<taavi> |
failing over grafana to grafana-2 T393735 |
[metricsinfra] |
| 08:44 |
<taavi@cloudcumin1001> |
END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on metricsinfra-grafana-2.metricsinfra.eqiad1.wikimedia.cloud |
[metricsinfra] |
| 08:43 |
<taavi@cloudcumin1001> |
START - Cookbook wmcs.vps.refresh_puppet_certs on metricsinfra-grafana-2.metricsinfra.eqiad1.wikimedia.cloud |
[metricsinfra] |
|
2025-04-29
§
|
| 14:10 |
<taavi> |
upgrading thanos to 0.38 T383966 |
[metricsinfra] |
|
2025-04-23
§
|
| 09:43 |
<taavi> |
updating security group rules to include IPv6 terms |
[metricsinfra] |
|
2025-01-31
§
|
| 11:38 |
<dhinus> |
rebooting VM metricsinfra-prometheus-3 T385262 |
[metricsinfra] |
| 11:30 |
<dhinus> |
systemctl restart prometheus@cloud on metricsinfra-prometheus-2 T385262 |
[metricsinfra] |
| 11:16 |
<dhinus> |
systemctl restart prometheus@cloud on metricsinfra-prometheus-3 T385262 |
[metricsinfra] |
|
2025-01-20
§
|
| 13:01 |
<dcaro> |
stopping and starting metricsinfra-alertmanager-3 to try to get the right network |
[metricsinfra] |
|
2024-06-24
§
|
| 20:09 |
<andrew@cloudcumin1001> |
END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1) |
[metricsinfra] |
| 19:56 |
<andrew@cloudcumin1001> |
START - Cookbook wmcs.openstack.migrate_project_to_ovs |
[metricsinfra] |
|
2024-03-13
§
|
| 12:14 |
<taavi> |
MariaDB [prometheusconfig]> delete from alerts where name = 'GridQueueProblem'; # T314664 |
[metricsinfra] |
|
2023-11-30
§
|
| 18:53 |
<taavi> |
no longer send quarry alerts to cloud services team |
[metricsinfra] |
|
2023-11-18
§
|
| 14:09 |
<taavi> |
reboot metricsinfra-alertmanager-1 to see if it stops flapping a puppet alert |
[metricsinfra] |
|
2023-09-29
§
|
| 08:24 |
<wm-bot2> |
dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) |
[metricsinfra] |
| 08:17 |
<wm-bot2> |
dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console |
[metricsinfra] |
| 08:17 |
<wm-bot2> |
dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) |
[metricsinfra] |
| 08:16 |
<wm-bot2> |
dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console |
[metricsinfra] |
|
2023-05-10
§
|
| 17:17 |
<wm-bot2> |
Increased quotas by 8 cores, 16384 ram (T336423) - cookbook ran by taavi@runko |
[metricsinfra] |
|
2023-05-04
§
|
| 15:11 |
<dcaro> |
rebooting metricsinfra-prometheus-2 as it was unresponsive |
[metricsinfra] |
|
2023-04-24
§
|
| 14:16 |
<dcaro> |
rebooting metricsinfra-prometheus-2, it's in a non-responsive state (no ssh, console hangs) |
[metricsinfra] |
|
2023-04-21
§
|
| 21:58 |
<andrewbogott> |
added raymond-ndibe as project member |
[metricsinfra] |
|
2023-03-07
§
|
| 16:31 |
<wm-bot2> |
removed instance metricsinfra-controller-1 - cookbook ran by dcaro@vulcanus |
[metricsinfra] |
|
2023-02-13
§
|
| 23:37 |
<bd808> |
metricsinfra-db-1.trove.eqiad1.wikimedia.cloud restarted via Horizon |
[metricsinfra] |
| 23:35 |
<bd808> |
metricsinfra-db-1.trove.eqiad1.wikimedia.cloud not responsive to ssh |
[metricsinfra] |
| 23:32 |
<bd808> |
grafana.wmcloud.org offline with db connection error. Investigating. |
[metricsinfra] |
|
2022-12-20
§
|
| 15:59 |
<dcaro> |
rebooting prometheus-2 due to being non-responsive |
[metricsinfra] |
|
2022-06-16
§
|
| 14:18 |
<taavi> |
add 'gitlab-runners' project to list of scraped projects |
[metricsinfra] |
|
2022-03-01
§
|
| 11:38 |
<dcaro> |
Reloading alertmanager to refresh new config (T302702) |
[metricsinfra] |
| 11:37 |
<dcaro> |
Adding runbook url annotation to GridQueueProblem alert on DB at metricsinfra-controller-1 (T302702) |
[metricsinfra] |
|
2022-01-22
§
|
| 11:32 |
<taavi> |
added project-proxy VMs to prometheus targets |
[metricsinfra] |
|
2021-12-14
§
|
| 09:27 |
<majavah> |
drop "analytics" project from current beta coverage, the setup is currently not compatible with pontoon |
[metricsinfra] |
|
2021-09-11
§
|
| 08:41 |
<majavah> |
silence deployment-prep alerts yet again |
[metricsinfra] |
|
2021-07-12
§
|
| 15:45 |
<bstorm> |
silenced deployment prep alerts for another 60 days |
[metricsinfra] |
|
2021-06-15
§
|
| 16:12 |
<balloons> |
add 8 CPU/16G RAM to quota T284973 |
[metricsinfra] |
|
2021-06-14
§
|
| 18:40 |
<balloons> |
Add majavah as projectadmin T284938 |
[metricsinfra] |
|
2021-03-11
§
|
| 18:33 |
<bstorm> |
silenced alerts from deploymentprep for another 60 days |
[metricsinfra] |