2023-06-05 §
16:05 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
16:05 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
15:33 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance [production]
15:33 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance [production]
13:45 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance [production]
13:44 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance [production]
13:44 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Host under maintenance [production]
13:43 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Host under maintenance [production]
07:24 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [production]
07:23 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
07:23 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
07:23 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
2023-06-03 §
13:41 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade [production]
13:41 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade [production]
2023-06-01 §
09:21 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet [production]
09:17 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet [production]
09:17 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet [production]
09:13 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet [production]
09:11 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet [production]
09:07 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet [production]
09:06 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet [production]
09:02 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet [production]
09:01 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet [production]
08:57 <elukey@cumin1001> START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet [production]
2023-05-31 §
16:22 <elukey> `systemctl reset-failed session-c6111.scope session-c7230.scope` on stat1005 to clear old alerts [production]
2023-05-29 §
07:57 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
07:56 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
2023-05-26 §
15:40 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
15:38 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
15:34 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/changeprop: sync [production]
15:34 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/changeprop: sync [production]
09:13 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/changeprop: sync [production]
09:13 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/changeprop: sync [production]
06:42 <elukey> `apt-get clean` on stat1008 to clean up some space in the root partition [production]
06:36 <elukey> `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to prevent the root partition from filling up [production]
2023-05-25 §
08:32 <elukey> revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - T337248 [production]
2023-05-24 §
16:05 <elukey> move kafka mirror on kafka main brokers to PKI - T337248 [production]
15:56 <elukey> move kafka mirror on kafka jumbo brokers to PKI - T337248 [production]
07:42 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/api-gateway: sync [production]
07:42 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/api-gateway: sync [production]
07:41 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync [production]
07:40 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/api-gateway: sync [production]
2023-05-23 §
13:44 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/api-gateway: sync [production]
13:44 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/api-gateway: sync [production]
2023-05-22 §
08:28 <elukey> drain Arelion link between cr1-codfw and cr3-eqsin to mitigate packet loss eqiad <-> eqsin [production]
07:59 <elukey> restart purged on cp5017 as a test to clear out consumer group timeouts and rejoin events [production]
2023-05-19 §
14:59 <elukey@cumin1001> END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad [production]
12:19 <elukey@cumin1001> START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad [production]
09:49 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
09:49 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]