2023-05-18 §
08:30 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
08:28 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
08:27 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
08:24 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/changeprop: sync [production]
08:24 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/changeprop: sync [production]
08:19 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/changeprop: sync [production]
08:19 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/changeprop: sync [production]
2023-05-17 §
15:46 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
15:38 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
14:17 <elukey> run authdns-update for new ml-serve/ores discovery endpoints - T336726 [production]
09:39 <elukey> roll restart pybal on lvs2010, lvs2009, lvs1020, lvs1019 to pick up a VIP (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/920219) - T336726 [production]
2023-05-16 §
10:34 <elukey@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
10:34 <elukey@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001" [production]
10:33 <elukey@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001" [production]
10:28 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
10:07 <elukey@deploy1002> helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
10:06 <elukey@deploy1002> helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
06:57 <elukey@deploy1002> helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
2023-05-15 §
08:26 <elukey> restart pybal on lvs2010 and lvs2009 to pick up new LVS VIP for ml-staging k8s ingress - T335756 [production]
2023-05-11 §
16:16 <elukey> benthos webrequest live instances migrated to kafka-franz (new consumer client, data may have some holes) - T331801 [production]
13:57 <elukey> upgrade benthos (4.9.1 -> 4.15.0) on centrallog nodes - T331801 [production]
13:21 <elukey> upload benthos 4.15.0-1 to {buster,bullseye}-wikimedia - T331801 [production]
08:40 <elukey> `apt-get clean` on orespoolcounter nodes to free space in the root partition [production]
2023-05-10 §
14:02 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
10:38 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
2023-05-04 §
15:54 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
13:37 <elukey> revert "Grant IdempotentWrite Kafka Cluster ACL to User:ANONYOUS in kafka logging clusters - T334733" [production]
10:30 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
10:30 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
10:16 <elukey@cumin1001> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
10:10 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
08:49 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
08:49 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
08:07 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
08:07 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
2023-05-03 §
15:29 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
07:26 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
2023-05-02 §
12:28 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
11:51 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
08:51 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
08:40 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
2023-04-30 §
08:06 <elukey> powercycle ores1002 (mgmt console tty not usable, host frozen) [production]
2023-04-28 §
15:39 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
10:25 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
09:24 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
2023-04-27 §
14:09 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS bullseye [production]
13:48 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage [production]
13:45 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage [production]
13:33 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS bullseye [production]
13:04 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS bullseye [production]