2023-03-31
13:51 <jclark@cumin1001> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED [production]
13:49 <jclark@cumin1001> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED [production]
13:41 <jclark@cumin1001> START - Cookbook sre.hosts.provision for host an-worker1153.mgmt.eqiad.wmnet with reboot policy FORCED [production]
13:40 <jclark@cumin1001> START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED [production]
13:34 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage [production]
13:31 <pt1979@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage [production]
13:30 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye [production]
13:12 <elukey> move kafka-jumbo1004's kafka broker cert to PKI - T296064 [production]
13:11 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1004.eqiad.wmnet with reason: restart kafka, switch to PKI [production]
13:11 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1004.eqiad.wmnet with reason: restart kafka, switch to PKI [production]
13:11 <dcausse@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply [production]
13:10 <phedenskog@deploy2002> Finished deploy [performance/navtiming@c30b954]: (no justification provided) (duration: 00m 05s) [production]
13:10 <phedenskog@deploy2002> Started deploy [performance/navtiming@c30b954]: (no justification provided) [production]
13:10 <dcausse@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply [production]
13:09 <elukey> restart kafkatee on centrallog2002 - test to see if there are issues connecting to the jumbo brokers running pki [production]
12:55 <eoghan@cumin2002> END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab [production]
12:46 <btullis@deploy2002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
12:45 <btullis@deploy2002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
12:25 <btullis@deploy2002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
12:04 <eoghan@cumin2002> START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab [production]
12:00 <eoghan@cumin1001> END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab [production]
11:42 <Emperor> shutdown ms-be1042 for battery swap T332883 [production]
11:41 <mvernon@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be1042.eqiad.wmnet with reason: Add-in Card 2 ROMB Battery LOW [production]
11:41 <mvernon@cumin2002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be1042.eqiad.wmnet with reason: Add-in Card 2 ROMB Battery LOW [production]
11:12 <jclark@cumin1001> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1151.eqiad.wmnet'] [production]
11:09 <eoghan@cumin1001> START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab [production]
11:08 <eoghan@cumin1001> END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab [production]
11:02 <mvernon@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye [production]
10:46 <mvernon@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage [production]
10:45 <Amir1> Failover m1 from db1101 to db1164 - T333123 [production]
10:44 <mvernon@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage [production]
10:32 <jclark@cumin1001> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1149.eqiad.wmnet'] [production]
10:28 <mvernon@cumin2002> START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye [production]
10:25 <jynus@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: preparing for m1 primary db switchover [production]
10:25 <jynus@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: preparing for m1 primary db switchover [production]
10:18 <eoghan@cumin1001> START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab [production]
10:07 <gmodena@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply [production]
10:07 <gmodena@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply [production]
10:06 <gmodena@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply [production]
10:06 <gmodena@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply [production]
09:54 <elukey> move kafka-jumbo1003's kafka broker cert to PKI - T296064 [production]
09:54 <jynus@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: reprovisioning after maintenance [production]
09:54 <jynus@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: reprovisioning after maintenance [production]
09:54 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1003.eqiad.wmnet with reason: restart kafka, switch to PKI [production]
09:53 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1003.eqiad.wmnet with reason: restart kafka, switch to PKI [production]
09:03 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1002.eqiad.wmnet with reason: restart kafka, switch to PKI [production]
09:03 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1002.eqiad.wmnet with reason: restart kafka, switch to PKI [production]
09:02 <elukey> move kafka-jumbo1002's kafka broker cert to PKI - T296064 [production]
08:47 <jelto@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye [production]
08:38 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on an-worker1091.eqiad.wmnet with reason: Replacing battery [production]