|
2026-06-02
|
| 14:50 |
<blake@cumin1003> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage |
[production] |
| 14:49 |
<fceratto@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db1182 (T426633)', diff saved to https://phabricator.wikimedia.org/P93575 and previous config saved to /var/cache/conftool/dbconfig/20260602-144935-fceratto.json |
[production] |
| 14:42 |
<fceratto@cumin1003> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for pc2021.codfw.wmnet |
[production] |
| 14:42 |
<fceratto@cumin1003> |
START - Cookbook sre.hosts.remove-downtime for pc2021.codfw.wmnet |
[production] |
| 14:41 |
<fceratto@cumin1003> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2250.codfw.wmnet |
[production] |
| 14:41 |
<fceratto@cumin1003> |
START - Cookbook sre.hosts.remove-downtime for db2250.codfw.wmnet |
[production] |
| 14:41 |
<fceratto@cumin1003> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2158.codfw.wmnet |
[production] |
| 14:41 |
<fceratto@cumin1003> |
START - Cookbook sre.hosts.remove-downtime for db2158.codfw.wmnet |
[production] |
| 14:41 |
<fceratto@cumin1003> |
END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc2021: Repooling |
[production] |
| 14:41 |
<fceratto@cumin1003> |
END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) |
[production] |
| 14:41 |
<fceratto@cumin1003> |
START - Cookbook sre.mysql.parsercache |
[production] |
| 14:41 |
<fceratto@cumin1003> |
START - Cookbook sre.mysql.pool pool pc2021: Repooling |
[production] |
| 14:41 |
<fceratto@cumin1003> |
dbctl commit (dc=all): 'Depooling db1182 (T426633)', diff saved to https://phabricator.wikimedia.org/P93573 and previous config saved to /var/cache/conftool/dbconfig/20260602-144110-fceratto.json |
[production] |
| 14:41 |
<fceratto@cumin1003> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance |
[production] |
| 14:41 |
<fceratto@cumin1003> |
START - Cookbook sre.mysql.pool pool db2158: Repooling |
[production] |
| 14:40 |
<trueg@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply |
[production] |
| 14:40 |
<fceratto@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db1156 (T426633)', diff saved to https://phabricator.wikimedia.org/P93571 and previous config saved to /var/cache/conftool/dbconfig/20260602-144043-fceratto.json |
[production] |
| 14:38 |
<trueg@deploy1003> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply |
[production] |
| 14:38 |
<atsuko@deploy1003> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver: apply |
[production] |
| 14:38 |
<atsuko@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver: apply |
[production] |
| 14:38 |
<jnuche> |
restarting Jenkins |
[releng] |
| 14:37 |
<trueg@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply |
[production] |
| 14:37 |
<jiji@cumin1003> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1048.eqiad.wmnet |
[production] |
| 14:37 |
<jiji@cumin1003> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
| 14:37 |
<jiji@cumin1003> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1048.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003" |
[production] |
| 14:37 |
<blake@cumin1003> |
START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS trixie |
[production] |
| 14:36 |
<blake@cumin1003> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS trixie |
[production] |
| 14:34 |
<blake@cumin1003> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage |
[production] |
| 14:30 |
<fceratto@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P93569 and previous config saved to /var/cache/conftool/dbconfig/20260602-143035-fceratto.json |
[production] |
| 14:30 |
<blake@cumin1003> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage |
[production] |
| 14:28 |
<jnuche> |
bring back castor node, that didn't help |
[releng] |
| 14:25 |
<jiji@cumin1003> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc1048.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003" |
[production] |
| 14:23 |
<jnuche> |
trying to reconnect castor node, see if that helps somehow |
[releng] |
| 14:21 |
<cwilliams@cumin1003> |
START - Cookbook sre.mysql.pool pool db1167: Repooling after Icinga wait-for-green timeout |
[production] |
| 14:20 |
<blake@cumin1003> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage |
[production] |
| 14:20 |
<fceratto@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P93566 and previous config saved to /var/cache/conftool/dbconfig/20260602-142027-fceratto.json |
[production] |
| 14:17 |
<blake@cumin1003> |
START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS trixie |
[production] |
| 14:17 |
<jayme@cumin2002> |
START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS trixie |
[production] |
| 14:17 |
<cwilliams@cumin1003> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1167.eqiad.wmnet |
[production] |
| 14:17 |
<cwilliams@cumin1003> |
START - Cookbook sre.hosts.remove-downtime for db1167.eqiad.wmnet |
[production] |
| 14:16 |
<blake@cumin1003> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS trixie |
[production] |
| 14:15 |
<jayme@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2006.codfw.wmnet with OS trixie |
[production] |
| 14:14 |
<jiji@cumin1003> |
START - Cookbook sre.dns.netbox |
[production] |
| 14:13 |
<blake@cumin1003> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage |
[production] |
| 14:12 |
<jnuche> |
option to "Enable Gearman" times out. Can't re-enable from UI. Gearman plugin logs are empty. Neat |
[releng] |
| 14:10 |
<fceratto@cumin1003> |
dbctl commit (dc=all): 'Repooling after maintenance db1156 (T426633)', diff saved to https://phabricator.wikimedia.org/P93564 and previous config saved to /var/cache/conftool/dbconfig/20260602-141019-fceratto.json |
[production] |
| 14:09 |
<urbanecm@deploy1003> |
mwscript-k8s job started: foreachwikiindblist growthexperiments userOptions.php --delete --nowarn growthexperiments-homepage-variant # T417621 |
[production] |
| 14:09 |
<jiji@cumin1003> |
START - Cookbook sre.hosts.decommission for hosts mc1048.eqiad.wmnet |
[production] |
| 14:08 |
<urbanecm@deploy1003> |
mwscript-k8s job started: foreachwikiindblist growthexperiments userOptions.php --delete growthexperiments-homepage-variant # T417621 |
[production] |
| 14:05 |
<jnuche> |
trying to reconnect Gearman |
[releng] |