2023-07-17
| 16:12 | <elukey> | stop kafka-main codfw maintenance - T341558 | [production] |
| 16:08 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . | [production] |
| 16:08 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . | [production] |
| 16:07 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . | [production] |
| 16:05 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . | [production] |
| 16:05 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . | [production] |
| 16:04 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . | [production] |
| 16:02 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . | [production] |
| 15:57 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | [production] |
| 15:57 | <elukey@deploy1002> | helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' . | [production] |
| 14:36 | <elukey> | restart rsyslog on centrallog1002 ("peer did not provide a certificate, not permitted to talk to it") | [production] |
| 14:10 | <elukey> | start kafka partitions rebalance for main-codfw (long running maintenance, see https://phabricator.wikimedia.org/T341558) | [production] |
| 13:13 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it | [production] |
| 13:12 | <elukey@cumin1001> | START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it | [production] |
| 12:54 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . | [production] |
| 12:54 | <elukey@deploy1002> | helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . | [production] |
| 12:53 | <elukey@deploy1002> | helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . | [production] |
| 09:18 | <elukey@deploy1002> | helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | [production] |
| 09:18 | <elukey@deploy1002> | helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | [production] |
| 09:17 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | [production] |
| 09:17 | <elukey@deploy1002> | helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | [production] |
2023-07-13
| 14:43 | <elukey> | depool ores2003 to allow DCops maintenance work | [production] |
| 14:43 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it | [production] |
| 14:43 | <elukey@cumin1001> | START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it | [production] |
| 09:11 | <elukey@deploy1002> | helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync | [production] |
| 09:11 | <elukey> | increased kafka partitions for mediawiki.job.cirrusSearchLinksUpdate and mediawiki.job.cirrusSearchLinksUpdate (eqiad/codfw) - T341558 | [production] |
| 09:10 | <elukey@deploy1002> | helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync | [production] |
| 09:09 | <elukey@deploy1002> | helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync | [production] |
| 09:09 | <elukey@deploy1002> | helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync | [production] |