2024-06-10
13:08 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
13:07 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
13:04 <elukey@deploy1002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. [production]
13:04 <elukey@deploy1002> helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. [production]
13:03 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. [production]
13:03 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. [production]
13:01 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/admin 'sync'. [production]
13:01 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/admin 'sync'. [production]
12:58 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/admin 'sync'. [production]
12:58 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/admin 'sync'. [production]
12:50 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/admin 'sync'. [production]
12:49 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/admin 'sync'. [production]
12:48 <elukey@deploy1002> helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. [production]
12:46 <elukey@deploy1002> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]
12:44 <elukey@deploy1002> helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. [production]
12:43 <elukey@deploy1002> helmfile [staging-eqiad] START helmfile.d/admin 'sync'. [production]
12:41 <elukey@deploy1002> helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'. [production]
12:40 <elukey@deploy1002> helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'. [production]
2024-06-06
14:51 <elukey> kill sessionstore pod running on mw1390.eqiad.wmnet (no dedicated='kask' taint) [production]
2024-06-05
15:02 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED [production]
14:07 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED [production]
13:46 <elukey> factory reset for sretest1001 to test the new provision cookbook - T365372 [production]
13:27 <elukey> systemctl reset-failed prometheus-redis-exporter@6380.service redis-instance-tcp_6380.service on netbox[12]002 + apt-get purge of redis-server and prometheus-redis-exporter packages to clean up stale configs (no local redis is used) [production]
13:10 <elukey@cumin1002> END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker [production]
12:56 <elukey@cumin1002> START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker [production]
2024-06-04
16:41 <elukey> delete other 2 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state [production]
16:31 <elukey> delete 3 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state [production]
15:57 <elukey@cumin1002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1003.eqiad.wmnet [production]
15:53 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1003.eqiad.wmnet [production]
15:51 <elukey@cumin1002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1002.eqiad.wmnet [production]
15:47 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1002.eqiad.wmnet [production]
15:47 <elukey@cumin1002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1001.eqiad.wmnet [production]
15:43 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet [production]
15:43 <elukey@cumin1002> END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM aux-k8s-etcd1001.eqiad.wmnet [production]
15:42 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet [production]
15:37 <elukey@cumin1002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1002.eqiad.wmnet [production]
15:31 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1002.eqiad.wmnet [production]
15:25 <elukey@cumin1002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1001.eqiad.wmnet [production]
15:19 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet [production]
15:18 <elukey@cumin1002> END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM aux-k8s-ctrl1001.eqiad.wmnet [production]
15:18 <elukey@cumin1002> START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet [production]
15:13 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL [production]
15:12 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL [production]
15:11 <elukey@cumin1002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED [production]
15:11 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED [production]
15:08 <elukey@cumin1002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED [production]
15:08 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED [production]
15:05 <elukey@cumin1002> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL [production]
15:04 <elukey@cumin1002> START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL [production]
2024-05-29
12:39 <elukey> move thanos-fe100[3,4] and thanos-fe2* to PKI TLS certs (envoy, backends for thanos-swift.discovery.wmnet) - T344324 [production]