|
2023-03-22
|
| 09:02 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet |
[production] |
| 09:01 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
| 09:01 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
| 08:25 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
| 08:25 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
| 08:20 |
<elukey@deploy2002> |
helmfile [codfw] DONE helmfile.d/services/api-gateway: sync |
[production] |
| 08:20 |
<elukey@deploy2002> |
helmfile [codfw] START helmfile.d/services/api-gateway: sync |
[production] |
| 08:18 |
<elukey@deploy2002> |
helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync |
[production] |
| 08:17 |
<elukey@deploy2002> |
helmfile [eqiad] START helmfile.d/services/api-gateway: sync |
[production] |
|
2023-03-21
|
| 15:52 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
| 15:32 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage |
[production] |
| 15:26 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage |
[production] |
| 15:10 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
| 15:02 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
| 14:38 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
| 14:37 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
| 14:27 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
| 14:10 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
| 14:05 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
| 13:33 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
| 13:29 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
| 13:16 |
<elukey@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet |
[production] |
| 13:11 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
| 13:11 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware |
[production] |
| 13:05 |
<elukey> |
move kafka mirror maker instances to PKI migration settings (new truststores) - T319372 |
[production] |
| 09:43 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye |
[production] |
| 09:39 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage |
[production] |
| 09:39 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage |
[production] |
| 09:06 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC |
[production] |
| 09:05 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC |
[production] |
| 08:31 |
<elukey> |
move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - T319372 |
[production] |
|
2023-03-16
|
| 14:44 |
<elukey@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . |
[production] |
| 14:40 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
| 14:40 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
| 14:31 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. |
[production] |
| 14:31 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. |
[production] |
| 10:42 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . |
[production] |
| 10:42 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . |
[production] |
| 10:40 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . |
[production] |
| 10:39 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . |
[production] |
| 10:33 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . |
[production] |
| 10:32 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . |
[production] |
| 10:31 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
| 10:31 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
| 10:31 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . |
[production] |
| 10:31 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . |
[production] |
| 10:30 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |
| 10:29 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |