2023-03-22
|
09:02 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet |
[production] |
09:01 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
09:01 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
08:25 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
08:25 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
08:20 |
<elukey@deploy2002> |
helmfile [codfw] DONE helmfile.d/services/api-gateway: sync |
[production] |
08:20 |
<elukey@deploy2002> |
helmfile [codfw] START helmfile.d/services/api-gateway: sync |
[production] |
08:18 |
<elukey@deploy2002> |
helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync |
[production] |
08:17 |
<elukey@deploy2002> |
helmfile [eqiad] START helmfile.d/services/api-gateway: sync |
[production] |
2023-03-21
|
15:52 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
15:32 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage |
[production] |
15:26 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage |
[production] |
15:10 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
15:02 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
14:38 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
14:37 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
14:27 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
14:10 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
14:05 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
13:33 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
13:29 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
13:16 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
13:11 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
13:11 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
13:05 |
<elukey> |
move kafka mirror maker instances to PKI migration settings (new truststores) - T319372 |
[production] |
09:43 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
09:39 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage |
[production] |
09:39 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage |
[production] |
09:06 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC |
[production] |
09:05 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC |
[production] |
08:31 |
<elukey> |
move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - T319372 |
[production] |
2023-03-16
|
14:44 |
<elukey@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . |
[production] |
14:40 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
14:40 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
14:31 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. |
[production] |
14:31 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. |
[production] |
10:42 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . |
[production] |
10:42 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . |
[production] |
10:40 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . |
[production] |
10:39 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . |
[production] |
10:33 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . |
[production] |
10:32 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . |
[production] |
10:31 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
10:31 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
10:31 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . |
[production] |
10:31 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . |
[production] |
10:30 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |
10:29 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |