2023-04-27 §
12:43 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage [production]
12:40 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage [production]
12:29 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye [production]
12:27 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS bullseye [production]
10:24 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage [production]
10:20 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage [production]
10:09 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye [production]
10:04 <elukey@cumin1001> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ml-cache1001.eqiad.wmnet with OS bullseye [production]
09:55 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye [production]
09:54 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1001.eqiad.wmnet with OS bullseye [production]
09:34 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye [production]
2023-04-26 §
07:35 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync [production]
07:34 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync [production]
07:33 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync [production]
07:33 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync [production]
07:32 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: sync [production]
07:32 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/eventgate-logging-external: sync [production]
2023-04-20 §
10:57 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
10:57 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
09:42 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
09:42 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
09:40 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
09:40 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
09:35 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
09:35 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
2023-04-19 §
09:01 <elukey@deploy2002> helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync [production]
09:00 <elukey@deploy2002> helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync [production]
09:00 <elukey@deploy2002> helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync [production]
08:59 <elukey@deploy2002> helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync [production]
2023-04-18 §
13:41 <elukey> restart etcdmirror on conf2005 (down due to conf1009 under maintenance) [production]
11:00 <elukey> puppet cert clean kafka_jumbo-eqiad_broker on puppetmaster1001 - remove old certificate (not used anymore) [production]
2023-04-17 §
14:14 <elukey> upload amd-k8s-device-plugin deb (1.25.2.3-1) to bullseye-wikimedia - T333009 [production]
2023-04-13 §
14:14 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet [production]
14:05 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1002.eqiad.wmnet [production]
09:25 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dse-k8s-worker1001.eqiad.wmnet [production]
09:12 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1001.eqiad.wmnet [production]
2023-04-12 §
13:26 <elukey> upload AMD ROCm 5.4 debian packages to wikimedia-bullseye:thirdparty/amd-rocm54 - T295661 [production]
2023-04-11 §
16:07 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1132.eqiad.wmnet with reason: More tests are needed before the host can be added to prod [production]
16:06 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker1132.eqiad.wmnet with reason: More tests are needed before the host can be added to prod [production]
16:05 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1132.eqiad.wmnet with OS buster [production]
15:37 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage [production]
15:34 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage [production]
14:48 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster [production]
13:54 <elukey> remove old puppet certificates for kafka main brokers from A:kafka-main - T319372 [production]
13:46 <elukey> powercycle analytics1069, down for some days now, host stuck from the mgmt/serial console [production]
2023-04-06 §
14:21 <elukey> upgrade istioctl on deploy[12]002 and istio-cni on ml-serve[12]00[1-8] manually - T334068 [production]
14:14 <elukey> upload new istio-cni and istioctl 1.15.7 debian package versions to bullseye-wikimedia - T334068 [production]
10:28 <elukey@deploy2002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
10:27 <elukey@deploy2002> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
10:27 <elukey@deploy2002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]