2023-07-10
13:27 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/eventgate-main: sync [production]
10:50 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync [production]
10:50 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/eventgate-main: sync [production]
10:44 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/eventgate-main: sync [production]
10:44 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/eventgate-main: sync [production]
07:30 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync [production]
07:29 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync [production]
07:22 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync [production]
07:21 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync [production]
07:20 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync [production]
07:20 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync [production]
2023-07-07
08:05 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster [production]
08:05 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster [production]
2023-07-06
15:54 <elukey> changeprop's kafka linger.ms set to 20s - T338357 (was 5ms, now changeprop waits a bit more to batch messages to send to kafka in one go) [production]
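For context, linger.ms controls how long a Kafka producer waits to accumulate messages into one batch per partition before sending them in a single request. A minimal Python sketch of the effect, assuming the "20s" above means 20000 ms and using confluent-kafka purely for illustration rather than changeprop's actual configuration (broker address and topic name are placeholders):

    from confluent_kafka import Producer

    # Hedged sketch: broker and topic are placeholders, not the real kafka-main setup.
    producer = Producer({
        "bootstrap.servers": "kafka-main-placeholder:9092",
        # Was 5 ms; raised so the producer waits up to ~20 s (assumed 20000 ms)
        # to fill a batch before sending one larger request instead of many small ones.
        "linger.ms": 20000,
        "batch.num.messages": 10000,  # upper bound on messages per batch
    })

    for i in range(1000):
        producer.produce("example.job.topic", value=f"job-{i}".encode())
    producer.flush()  # wait until all batched messages have been delivered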
15:53 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/changeprop: sync [production]
15:53 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/changeprop: sync [production]
15:45 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/changeprop: sync [production]
15:45 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/changeprop: sync [production]
15:36 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/changeprop: sync [production]
15:36 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/changeprop: sync [production]
13:33 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zookeeper-test1002.eqiad.wmnet with OS bookworm [production]
12:58 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zookeeper-test1002.eqiad.wmnet with reason: host reimage [production]
12:56 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on zookeeper-test1002.eqiad.wmnet with reason: host reimage [production]
12:42 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host zookeeper-test1002.eqiad.wmnet with OS bookworm [production]
12:15 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host zookeeper-test1002.eqiad.wmnet with OS bookworm [production]
12:15 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host zookeeper-test1002.eqiad.wmnet with OS bookworm [production]
09:11 <elukey> restart kube-apiserver on ml-serve-ctrl2* as attempt to fix LIST-related latency issues [production]
2023-07-05
14:18 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync [production]
14:18 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync [production]
14:17 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync [production]
14:16 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync [production]
13:55 <elukey> expand kafka topic partitions from 1 to 5 for {codfw,eqiad}.mediawiki.job.RecordLintJob and {eqiad,codfw}.mediawiki.job.refreshLinks on kafka-main eqiad/codfw - T338357 [production]
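A partition expansion like the one above can be requested through Kafka's admin API; the sketch below is only illustrative (the broker address is a placeholder, and the actual change on kafka-main may have been made with the standard Kafka CLI tooling instead):

    from confluent_kafka.admin import AdminClient, NewPartitions

    admin = AdminClient({"bootstrap.servers": "kafka-main-placeholder:9092"})  # placeholder broker

    topics = [
        "codfw.mediawiki.job.RecordLintJob",
        "eqiad.mediawiki.job.RecordLintJob",
        "codfw.mediawiki.job.refreshLinks",
        "eqiad.mediawiki.job.refreshLinks",
    ]

    # Grow each topic to 5 partitions (partition counts can only increase, never decrease).
    futures = admin.create_partitions([NewPartitions(t, 5) for t in topics])
    for topic, future in futures.items():
        future.result()  # raises if the expansion failed for this topic
        print(f"{topic}: expanded to 5 partitions")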
2023-07-04
13:32 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync [production]
13:32 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/api-gateway: sync [production]
13:31 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/api-gateway: sync [production]
13:31 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/api-gateway: sync [production]
13:28 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/api-gateway: sync [production]
13:28 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/api-gateway: sync [production]
12:31 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/api-gateway: sync [production]
12:31 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/api-gateway: sync [production]
2023-07-03
07:54 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/toolhub: sync [production]
07:54 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/toolhub: sync [production]
2023-06-30
15:35 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'. [production]
15:35 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main'. [production]
10:24 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main'. [production]
10:22 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'. [production]
10:20 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'. [production]
2023-06-29
15:49 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main'. [production]
15:48 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main'. [production]
15:47 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main'. [production]