2023-03-28
11:22 <hnowlan@deploy2002> helmfile [codfw] DONE helmfile.d/admin 'apply'. [production]
11:21 <hnowlan@deploy2002> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
11:08 <akosiaris@deploy2002> helmfile [codfw] DONE helmfile.d/services/thumbor: apply [production]
11:00 <akosiaris@deploy2002> helmfile [codfw] START helmfile.d/services/thumbor: apply [production]
10:24 <elukey@deploy2002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
10:24 <elukey@deploy2002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
10:16 <stevemunene@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage [production]
10:12 <stevemunene@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage [production]
09:56 <stevemunene@cumin1001> START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye [production]
09:45 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues [production]
09:45 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: HW issues [production]
09:41 <vgutierrez> resetting cp2035 management card - T333312 [production]
09:38 <elukey> dist-upgrade kafka-main1001 to bullseye - T332013 [production]
09:36 <godog> silence systemdunitfailed alerts for team=wmcs - T333315 [production]
09:35 <vgutierrez> depool cp2035 - T333312 [production]
09:28 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
09:28 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main1001.eqiad.wmnet with reason: stop kafka and dist-upgrade [production]
09:12 <jbond@cumin1001> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nicolas Fraison out of all services on: 2048 hosts [production]
09:11 <jbond@cumin1001> START - Cookbook sre.idm.logout Logging Nicolas Fraison out of all services on: 2048 hosts [production]
09:11 <jbond@cumin1001> END (ERROR) - Cookbook sre.idm.logout (exit_code=97) Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts [production]
09:11 <jbond@cumin1001> START - Cookbook sre.idm.logout Logging Nicolas Fraison out of systemdlogoutd on: 2048 hosts [production]
08:58 <vgutierrez> restart ipmiseld on cp2035 [production]
08:50 <aborrero@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org [production]
08:49 <ayounsi@deploy1002> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]
08:48 <AndyRussG> update payments.wiki config 65bedd4a -> e31ffd7d, payments (automatic updates only) a6c6c2b1 -> f5ec2677 [production]
08:45 <ayounsi@deploy1002> helmfile [eqiad] START helmfile.d/admin 'apply'. [production]
08:43 <ayounsi@deploy1002> helmfile [codfw] DONE helmfile.d/admin 'apply'. [production]
08:42 <aborrero@cumin2002> START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org [production]
08:39 <ayounsi@deploy1002> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
08:37 <ayounsi@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
08:35 <ayounsi@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
08:34 <ayounsi@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
08:32 <ayounsi@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
08:32 <phedenskog@deploy2002> Finished deploy [performance/navtiming@e757bdf]: (no justification provided) (duration: 00m 06s) [production]
08:32 <phedenskog@deploy2002> Started deploy [performance/navtiming@e757bdf]: (no justification provided) [production]
08:31 <ayounsi@deploy1002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. [production]
08:29 <ayounsi@deploy1002> helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. [production]
08:25 <ayounsi@deploy1002> helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. [production]
08:21 <ayounsi@deploy1002> helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. [production]
08:14 <ayounsi@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. [production]
08:11 <oblivian@deploy2002> Finished scap: Backport for [[gerrit:903209|Failover statsd to graphite2004 (T330165)]] (duration: 08m 48s) [production]
08:08 <ayounsi@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. [production]
08:06 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 16 hosts with reason: Switch maintenance [production]
08:05 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 8:00:00 on 16 hosts with reason: Switch maintenance [production]
08:05 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 21 hosts with reason: Switch maintenance [production]
08:05 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 8:00:00 on 21 hosts with reason: Switch maintenance [production]
08:04 <oblivian@deploy2002> oblivian and filippo: Backport for [[gerrit:903209|Failover statsd to graphite2004 (T330165)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet [production]
08:03 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance [production]
08:03 <ayounsi@deploy1002> helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [production]
08:03 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 8:00:00 on es[1020-1022].eqiad.wmnet with reason: Switch maintenance [production]