2022-06-16 §
10:11 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster [production]
10:08 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster [production]
10:02 <elukey> ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3] [production]
09:47 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage [production]
09:44 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage [production]
09:36 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage [production]
09:33 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage [production]
09:32 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS buster [production]
09:21 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS buster [production]
2022-06-15 §
16:05 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS buster [production]
14:30 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage [production]
14:27 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage [production]
14:15 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS buster [production]
2022-06-09 §
07:53 <elukey> drop DRBD disk template from ml-etcd2* nodes - T310073 [production]
2022-06-06 §
15:00 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
15:00 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
14:59 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
14:59 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
14:57 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
14:56 <elukey> add BGP config for the k8s ml-staging cluster on cr{1,2}-codfw via homer - T302198 [production]
14:47 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
13:47 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
13:37 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
13:36 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
13:36 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
2022-06-01 §
08:43 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
08:43 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
08:41 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
08:41 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
08:39 <elukey> powercycle an-worker1094 - OEM event registered in `racadm getsel`, host frozen [production]
2022-05-31 §
13:38 <elukey> move ml-etcd100[1-3] from drbd to plain to investigate high k8s latencies for the control plane [production]
07:27 <elukey> add profile k8s_mlstaging + authkey for ml-staging k8s - T302195 [production]
06:26 <elukey> `elukey@an-master1001:~$ sudo systemctl reset-failed hadoop-clean-fairscheduler-event-logs.service` [production]
2022-05-30 §
06:49 <elukey> restart kube-api on ml-serve-ctrl2002 as an attempt to clear some high api latencies / HTTP 504 due to LIST to a specific knative resource [production]
06:48 <elukey> restart kube-api on ml-serve-ctrl1002 as an attempt to clear some high api latencies / HTTP 504 due to LIST to a specific knative resource [production]
2022-05-27 §
13:57 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [production]
13:20 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
13:17 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
2022-05-26 §
09:19 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
09:16 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
08:56 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
08:55 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
07:23 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
07:18 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
2022-05-25 §
13:44 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
06:20 <elukey> `elukey@an-tool1011:~$ sudo systemctl reset-failed ifup@ens13.service` - T273026 [production]
2022-05-24 §
11:23 <elukey@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
11:19 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
2022-05-17 §
13:52 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
13:50 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]