2023-02-27
16:31 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage [production]
16:28 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage [production]
16:08 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage [production]
16:06 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage [production]
15:56 <elukey@cumin1001> END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host ml-etcd2001.codfw.wmnet with OS bullseye [production]
15:52 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-etcd2001.codfw.wmnet with reason: etcd cluster upgrade failed, waiting for k8s upgrade [production]
15:52 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-etcd2001.codfw.wmnet with reason: etcd cluster upgrade failed, waiting for k8s upgrade [production]
15:44 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [production]
15:43 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [production]
15:35 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [production]
15:11 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage [production]
15:08 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage [production]
15:01 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage [production]
14:56 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage [production]
14:54 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [production]
14:45 <elukey@cumin1001> START - Cookbook sre.ganeti.reimage for host ml-etcd2001.codfw.wmnet with OS bullseye [production]
11:20 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [production]
11:07 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [production]
11:05 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [production]
10:48 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
10:48 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
10:26 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [production]
09:32 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [production]
09:19 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [production]
2023-02-25
11:03 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on dse-k8s-worker[1001-1004,1007].eqiad.wmnet with reason: Downtime DSE workers for cluster upgrade [production]
11:02 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on dse-k8s-worker[1001-1004,1007].eqiad.wmnet with reason: Downtime DSE workers for cluster upgrade [production]
09:38 <elukey> delete knative pods on ml-serve-codfw to clear latency alerts [production]
2023-02-24
14:50 <elukey@cumin1001> END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade to k8s 1.23 [production]
14:50 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [production]
14:50 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: Cluster half broken, in the middle of upgrading [production]
14:50 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: Cluster half broken, in the middle of upgrading [production]
14:39 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage [production]
14:36 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [production]
14:35 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage [production]
14:31 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [production]
14:23 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [production]
14:23 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [production]
14:22 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [production]
14:17 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [production]
14:10 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [production]
11:02 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye [production]
10:59 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye [production]
10:59 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye [production]
10:52 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye [production]
10:46 <elukey@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. [production]
10:46 <elukey@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. [production]
10:45 <elukey@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. [production]
10:45 <elukey@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. [production]
10:44 <elukey@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. [production]
10:44 <elukey@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. [production]