2023-03-28
12:21 <eoghan@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage [production]
12:20 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
12:20 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
12:16 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 45295 [production]
12:15 <ayounsi@cumin1001> START - Cookbook sre.network.peering with action 'configure' for AS: 45295 [production]
12:09 <eoghan@cumin1001> START - Cookbook sre.ganeti.reimage for host aphlict1002.eqiad.wmnet with OS bullseye [production]
11:57 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
11:57 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
11:56 <elukey> dist-upgrade kafka-main1002 to debian bullseye - T332013 [production]
11:51 <ladsgroup@deploy2002> Finished scap: Backport for [[gerrit:903549|api: Mark query as read-only to avoid regex on SQL (T332942)]] (duration: 18m 42s) [production]
11:47 <hnowlan@deploy2002> helmfile [eqiad] DONE helmfile.d/services/thumbor: apply [production]
11:37 <hnowlan@deploy2002> helmfile [eqiad] START helmfile.d/services/thumbor: apply [production]
11:34 <hnowlan@deploy2002> helmfile [eqiad] DONE helmfile.d/services/thumbor: apply [production]
11:34 <ladsgroup@deploy2002> ladsgroup: Backport for [[gerrit:903549|api: Mark query as read-only to avoid regex on SQL (T332942)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet [production]
11:32 <ladsgroup@deploy2002> Started scap: Backport for [[gerrit:903549|api: Mark query as read-only to avoid regex on SQL (T332942)]] [production]
11:24 <hnowlan@deploy2002> helmfile [eqiad] START helmfile.d/services/thumbor: apply [production]
11:23 <hnowlan@deploy2002> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]
11:22 <hnowlan@deploy2002> helmfile [eqiad] START helmfile.d/admin 'apply'. [production]
11:22 <hnowlan@deploy2002> helmfile [codfw] DONE helmfile.d/admin 'apply'. [production]
11:21 <hnowlan@deploy2002> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
11:08 <akosiaris@deploy2002> helmfile [codfw] DONE helmfile.d/services/thumbor: apply [production]
11:00 <akosiaris@deploy2002> helmfile [codfw] START helmfile.d/services/thumbor: apply [production]
10:24 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
10:24 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
10:16 <stevemunene@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage [production]
10:12 <stevemunene@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage [production]
09:56 <stevemunene@cumin1001> START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye [production]
09:45 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues [production]
09:45 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues [production]
09:41 <vgutierrez> resetting cp2035 management card - T333312 [production]
09:38 <elukey> dist-upgrade kafka-main1001 to bullseye - T332013 [production]
09:36 <godog> silence systemdunitfailed alerts for team=wmcs - T333315 [production]
09:35 <vgutierrez> depool cp2035 - T333312 [production]
09:28 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
09:28 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
09:12 <jbond@cumin1001> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nicolas Fraison out of all services on: 2048 hosts [production]
09:11 <jbond@cumin1001> START - Cookbook sre.idm.logout Logging Nicolas Fraison out of all services on: 2048 hosts [production]
09:11 <jbond@cumin1001> END (ERROR) - Cookbook sre.idm.logout (exit_code=97) Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts [production]
09:11 <jbond@cumin1001> START - Cookbook sre.idm.logout Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts [production]
08:58 <vgutierrez> restart ipmiseld on cp2035 [production]
08:50 <aborrero@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org [production]
08:49 <ayounsi@deploy1002> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]
08:48 <AndyRussG> update payments.wiki config 65bedd4a -> e31ffd7d, payments (automatic updates only) a6c6c2b1 -> f5ec2677 [production]
08:45 <ayounsi@deploy1002> helmfile [eqiad] START helmfile.d/admin 'apply'. [production]
08:43 <ayounsi@deploy1002> helmfile [codfw] DONE helmfile.d/admin 'apply'. [production]
08:42 <aborrero@cumin2002> START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org [production]
08:39 <ayounsi@deploy1002> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
08:37 <ayounsi@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
08:35 <ayounsi@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
08:34 <ayounsi@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]