2023-03-28
12:56 <elukey@deploy2002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
12:56 <elukey@deploy2002> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
12:44 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108 [production]
12:44 <ayounsi@cumin1001> START - Cookbook sre.network.debug for Netbox circuit ID 108 [production]
12:43 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108 [production]
12:43 <ayounsi@cumin1001> START - Cookbook sre.network.debug for Netbox circuit ID 108 [production]
12:38 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108 [production]
12:38 <ayounsi@cumin1001> START - Cookbook sre.network.debug for Netbox circuit ID 108 [production]
12:36 <eoghan@cumin1001> END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict1002.eqiad.wmnet with OS bullseye [production]
12:34 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112 [production]
12:34 <ayounsi@cumin1001> START - Cookbook sre.network.debug for Netbox circuit ID 112 [production]
12:24 <eoghan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage [production]
12:21 <eoghan@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict1002.eqiad.wmnet with reason: host reimage [production]
12:20 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
12:20 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
12:16 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 45295 [production]
12:15 <ayounsi@cumin1001> START - Cookbook sre.network.peering with action 'configure' for AS: 45295 [production]
12:09 <eoghan@cumin1001> START - Cookbook sre.ganeti.reimage for host aphlict1002.eqiad.wmnet with OS bullseye [production]
11:57 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
11:57 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1002.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
11:56 <elukey> dist-upgrade kafka-main1002 to debian bullseye - T332013 [production]
11:51 <ladsgroup@deploy2002> Finished scap: Backport for [[gerrit:903549|api: Mark query as read-only to avoid regex on SQL (T332942)]] (duration: 18m 42s) [production]
11:47 <hnowlan@deploy2002> helmfile [eqiad] DONE helmfile.d/services/thumbor: apply [production]
11:37 <hnowlan@deploy2002> helmfile [eqiad] START helmfile.d/services/thumbor: apply [production]
11:34 <hnowlan@deploy2002> helmfile [eqiad] DONE helmfile.d/services/thumbor: apply [production]
11:34 <ladsgroup@deploy2002> ladsgroup: Backport for [[gerrit:903549|api: Mark query as read-only to avoid regex on SQL (T332942)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet [production]
11:32 <ladsgroup@deploy2002> Started scap: Backport for [[gerrit:903549|api: Mark query as read-only to avoid regex on SQL (T332942)]] [production]
11:24 <hnowlan@deploy2002> helmfile [eqiad] START helmfile.d/services/thumbor: apply [production]
11:23 <hnowlan@deploy2002> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]
11:22 <hnowlan@deploy2002> helmfile [eqiad] START helmfile.d/admin 'apply'. [production]
11:22 <hnowlan@deploy2002> helmfile [codfw] DONE helmfile.d/admin 'apply'. [production]
11:21 <hnowlan@deploy2002> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
11:08 <akosiaris@deploy2002> helmfile [codfw] DONE helmfile.d/services/thumbor: apply [production]
11:00 <akosiaris@deploy2002> helmfile [codfw] START helmfile.d/services/thumbor: apply [production]
10:24 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
10:24 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
10:16 <stevemunene@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage [production]
10:12 <stevemunene@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage [production]
09:56 <stevemunene@cumin1001> START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye [production]
09:45 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues [production]
09:45 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues [production]
09:41 <vgutierrez> resetting cp2035 management card - T333312 [production]
09:38 <elukey> dist-upgrade kafka-main1001 to bullseye - T332013 [production]
09:36 <godog> silence systemdunitfailed alerts for team=wmcs - T333315 [production]
09:35 <vgutierrez> depool cp2035 - T333312 [production]
09:28 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
09:28 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
09:12 <jbond@cumin1001> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nicolas Fraison out of all services on: 2048 hosts [production]
09:11 <jbond@cumin1001> START - Cookbook sre.idm.logout Logging Nicolas Fraison out of all services on: 2048 hosts [production]
09:11 <jbond@cumin1001> END (ERROR) - Cookbook sre.idm.logout (exit_code=97) Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts [production]