2023-10-10
13:15 <elukey@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
2023-10-09
10:50 <elukey@deploy2002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
2023-10-06
13:53 <elukey@deploy2002> helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
08:26 <elukey@deploy2002> helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
08:24 <elukey@deploy2002> helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
08:18 <elukey@deploy2002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
2023-10-03
08:27 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting [production]
08:27 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting [production]
2023-10-02
17:18 <elukey@deploy2002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
17:18 <elukey@deploy2002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
17:17 <elukey@deploy2002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
17:17 <elukey@deploy2002> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
17:17 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
17:17 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
14:55 <elukey> restart kubelet on ml-serve1001 (high latencies registered) [production]
2023-09-29
10:09 <elukey@deploy2002> helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: sync [production]
10:09 <elukey@deploy2002> helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: sync [production]
09:08 <elukey@deploy2002> helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: sync [production]
09:08 <elukey@deploy2002> helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: sync [production]
2023-09-28
13:04 <elukey@deploy2002> helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
13:03 <elukey@deploy2002> helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
13:03 <elukey@deploy2002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
12:47 <elukey> restart thanos-query on titan1002 [production]
12:44 <elukey> restart thanos-query on titan1001 [production]
2023-09-27
09:05 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo [production]
09:05 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo [production]
08:28 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo [production]
08:28 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo [production]
08:10 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 15 hosts with reason: Kafka mirror issues on jumbo [production]
08:10 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 0:30:00 on 15 hosts with reason: Kafka mirror issues on jumbo [production]
2023-09-25
08:58 <elukey> migrate ores.wikimedia.org's ATS backend to ores-legacy.discovery.wmnet (k8s app) - This will drain traffic to ORES bare metal nodes - T341696 [production]
2023-09-21
15:33 <elukey@deploy2002> helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
10:42 <elukey@deploy2002> helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . [production]
2023-09-20
16:29 <elukey@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
16:28 <elukey@deploy2002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
16:26 <elukey@deploy2002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
16:24 <elukey@deploy2002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
09:48 <brouberol@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on kafka-jumbo1003.eqiad.wmnet with reason: investigation by brouberol and elukey about kafka ACL issues that might be fixed by a broker restart [production]
09:48 <brouberol@cumin1001> START - Cookbook sre.hosts.downtime for 0:10:00 on kafka-jumbo1003.eqiad.wmnet with reason: investigation by brouberol and elukey about kafka ACL issues that might be fixed by a broker restart [production]
2023-09-19
13:52 <elukey> clean old puppet certs kafka_logging-eqiad_broker [production]
09:12 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet [production]
09:08 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet [production]
09:03 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet [production]
08:59 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet [production]
08:47 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet [production]
08:43 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet [production]
2023-09-18
14:05 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet [production]
14:01 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet [production]
14:01 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet [production]
13:57 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet [production]