2026-03-13
18:05 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS trixie [production]
18:03 <elukey> powercycle db1253 - host not reachable via ssh, no events logged in racadm getsel, no console com2 available (blank screen) [production]
17:59 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage [production]
17:56 <brett@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage [production]
17:49 <brett@puppetserver1001> conftool action : set/pooled=yes; selector: name=cp4049.* [production]
17:46 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS trixie [production]
17:37 <cgoubert@deploy2002> helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply [production]
17:37 <cgoubert@deploy2002> helmfile [codfw] START helmfile.d/services/rest-gateway: apply [production]
17:36 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie [production]
17:35 <brett@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS trixie [production]
17:35 <cgoubert@deploy2002> helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply [production]
17:34 <cgoubert@deploy2002> helmfile [eqiad] START helmfile.d/services/rest-gateway: apply [production]
17:27 <jforrester@deploy2002> helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply [production]
17:26 <jforrester@deploy2002> helmfile [eqiad] START helmfile.d/services/mw-experimental: apply [production]
17:26 <jforrester@deploy2002> helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply [production]
17:26 <jforrester@deploy2002> helmfile [codfw] START helmfile.d/services/mw-experimental: apply [production]
17:20 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage [production]
17:17 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie [production]
17:17 <cgoubert@deploy2002> helmfile [staging] DONE helmfile.d/services/rest-gateway: apply [production]
17:16 <cgoubert@deploy2002> helmfile [staging] START helmfile.d/services/rest-gateway: apply [production]
17:16 <brett@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage [production]
17:12 <fnegri@cumin1003> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1016.eqiad.wmnet [production]
17:12 <fnegri@cumin1003> START - Cookbook sre.hosts.remove-downtime for clouddb1016.eqiad.wmnet [production]
17:11 <fnegri@cumin1003> conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet [production]
17:11 <brett@puppetserver1001> conftool action : set/pooled=yes; selector: name=cp4048.* [production]
17:10 <dhinus> (relogging failed sal) conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet [production]
17:10 <dhinus> (relogging failed sal) DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1016.eqiad.wmnet with reason: Rebooting clouddb1016 T419960 [production]
17:09 <dhinus> (relogging failed sal) END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1015.eqiad.wmnet [production]
17:08 <dhinus> (relogging failed sal) START - Cookbook sre.hosts.remove-downtime for clouddb1015.eqiad.wmnet [production]
17:08 <jforrester@deploy2002> helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply [production]
17:07 <jforrester@deploy2002> helmfile [eqiad] START helmfile.d/services/mw-experimental: apply [production]
17:07 <dhinus> fnegri@cumin1003 conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet [production]
17:07 <jforrester@deploy2002> helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply [production]
17:07 <brett@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie [production]
17:06 <jforrester@deploy2002> helmfile [codfw] START helmfile.d/services/mw-experimental: apply [production]
16:40 <brett@cumin2002> START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS trixie [production]
16:39 <brett@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage [production]
16:36 <fnegri@cumin1003> conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet [production]
16:35 <fnegri@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Rebooting clouddb1015 T419960 [production]
16:34 <fnegri@cumin1003> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1014.eqiad.wmnet [production]
16:34 <fnegri@cumin1003> START - Cookbook sre.hosts.remove-downtime for clouddb1014.eqiad.wmnet [production]
16:34 <fnegri@cumin1003> conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet [production]
16:29 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet [production]
16:28 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org [production]
16:25 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host apt-staging2001.codfw.wmnet [production]
16:22 <andrew@cumin2002> START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org [production]
16:21 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org [production]
16:20 <fnegri@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1014.eqiad.wmnet with reason: Rebooting clouddb1014 T419960 [production]
16:20 <fnegri@cumin1003> conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet [production]
16:19 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor-dev2001.codfw.wmnet [production]