2023-07-19
13:31 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
10:02 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
08:45 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
2023-07-18
16:28 <elukey> maintenance finished for kafka main-codfw [production]
13:43 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
13:42 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
13:42 <elukey@deploy1002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
13:41 <elukey@deploy1002> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
13:40 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
13:39 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
07:16 <elukey> restart kafka main-codfw rebalances (long maintenance) - T341558 [production]
2023-07-17
16:12 <elukey> stop kafka-main codfw maintenance - T341558 [production]
16:08 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
16:08 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
16:07 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
16:05 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [production]
16:05 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [production]
16:04 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
16:02 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
15:57 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . [production]
15:57 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . [production]
14:36 <elukey> restart rsyslog on centrallog1002 ("peer did not provide a certificate, not permitted to talk to it") [production]
14:10 <elukey> start kafka partitions rebalance for main-codfw (long running maintenance, see https://phabricator.wikimedia.org/T341558) [production]
13:13 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it [production]
13:12 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it [production]
12:54 <elukey@deploy1002> helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
12:54 <elukey@deploy1002> helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
12:53 <elukey@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
09:18 <elukey@deploy1002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
09:18 <elukey@deploy1002> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
09:17 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
09:17 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
2023-07-14
14:25 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
14:22 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
2023-07-13
14:43 <elukey> depool ores2003 to allow DCops maintenance work [production]
14:43 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it [production]
14:43 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it [production]
09:11 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync [production]
09:11 <elukey> increased kafka partitions for mediawiki.job.cirrusSearchLinksUpdate and mediawiki.job.cirrusSearchLinksUpdate (eqiad/codfw) - T341558 [production]
09:10 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync [production]
09:09 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync [production]
09:09 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync [production]
2023-07-11
09:06 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync [production]
09:06 <elukey@deploy1002> helmfile [eqiad] START helmfile.d/services/eventgate-main: sync [production]
09:01 <elukey@deploy1002> helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync [production]
09:01 <elukey@deploy1002> helmfile [codfw] START helmfile.d/services/eventgate-main: sync [production]
08:59 <elukey@deploy1002> helmfile [staging] DONE helmfile.d/services/eventgate-main: sync [production]
08:59 <elukey@deploy1002> helmfile [staging] START helmfile.d/services/eventgate-main: sync [production]
06:59 <elukey> restart kube-apiserver on ml-serve-ctrl1* as attempt to resolve spikes in latencies [production]
2023-07-10
13:27 <elukey@deploy1002> helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync [production]