|
2026-06-15
|
| 15:13 |
<cmooney@cumin1003> |
START - Cookbook sre.dns.admin DNS admin: pool esams [reason: no reason specified, no task ID specified] |
[production] |
| 15:02 |
<topranks> |
depool esams due to cr2-esams rpd crash |
[production] |
| 15:02 |
<cmooney@cumin1003> |
END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool esams [reason: no reason specified, no task ID specified] |
[production] |
| 15:01 |
<cmooney@cumin1003> |
START - Cookbook sre.dns.admin DNS admin: depool esams [reason: no reason specified, no task ID specified] |
[production] |
| 15:00 |
<trueg@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply |
[production] |
| 14:58 |
<elukey@cumin1003> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART |
[production] |
| 14:57 |
<elukey@cumin1003> |
START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART |
[production] |
| 14:57 |
<trueg@deploy1003> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs: apply |
[production] |
| 14:55 |
<cwilliams@cumin1003> |
START - Cookbook sre.mysql.pool pool db1196: Migration of db1196.eqiad.wmnet completed |
[production] |
| 14:54 |
<topranks> |
enable BGP graceful-shutdown sender on cr2-esams to drain traffic T427056 |
[production] |
| 14:52 |
<cmooney@cumin1003> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-esams,cr2-esams IPv6 with reason: bouncing pic0 to reconfigure port speeds |
[production] |
| 14:52 |
<wm-bot2> |
Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27554700598 (https://github.com/cluebotng/component-configs/commits/12298f8c7711b0dbc3ebe3196da055b62b307301) |
[tools.cluebotng-monitoring] |
| 14:45 |
<wm-bot2> |
Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27554179755 (https://github.com/cluebotng/component-configs/commits/a236330774424b9ce999258a01f924f1994594b1) |
[tools.cluebotng] |
| 14:44 |
<wm-bot2> |
Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/27554179713 (https://github.com/cluebotng/component-configs/commits/a236330774424b9ce999258a01f924f1994594b1) |
[tools.cluebotng-review] |
| 14:44 |
<wm-bot2> |
Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27554179699 (https://github.com/cluebotng/component-configs/commits/a236330774424b9ce999258a01f924f1994594b1) |
[tools.cluebotng-staging] |
| 14:41 |
<cwilliams@cumin1003> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1196.eqiad.wmnet with OS trixie |
[production] |
| 14:31 |
<andrew@cloudcumin1001> |
START - Cookbook wmcs.ceph.upgrade_osds (T428385) |
[admin] |
| 14:31 |
<elukey@cumin1003> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1077.eqiad.wmnet with OS trixie |
[production] |
| 14:31 |
<elukey@cumin1003> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003" |
[production] |
| 14:30 |
<wmftkbot> |
Test Kitchen experiment (poll 902) - adds: none; removes: ncs-cors-test; fields: none - TK tips at https://w.wiki/FwuD |
[analytics] |
| 14:24 |
<elukey@cumin1003> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: tesT |
[production] |
| 14:24 |
<elukey@cumin1003> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003" |
[production] |
| 14:23 |
<cwilliams@cumin1003> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: host reimage |
[production] |
| 14:17 |
<cwilliams@cumin1003> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db1196.eqiad.wmnet with reason: host reimage |
[production] |
| 14:10 |
<andrew@cloudcumin1001> |
END (PASS) - Cookbook wmcs.ceph.upgrade_mons (exit_code=0) |
[admin] |
| 14:08 |
<hnowlan@deploy1003> |
helmfile [eqiad] DONE helmfile.d/services/thumbor: apply |
[production] |
| 14:07 |
<hnowlan@deploy1003> |
helmfile [eqiad] START helmfile.d/services/thumbor: apply |
[production] |
| 14:07 |
<elukey@cumin1003> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1077.eqiad.wmnet with reason: host reimage |
[production] |
| 14:07 |
<elukey@cumin1003> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1077.eqiad.wmnet with reason: host reimage |
[production] |
| 14:06 |
<hnowlan@deploy1003> |
helmfile [codfw] DONE helmfile.d/services/thumbor: apply |
[production] |
| 14:05 |
<hnowlan@deploy1003> |
helmfile [codfw] START helmfile.d/services/thumbor: apply |
[production] |
| 14:05 |
<hnowlan@deploy1003> |
helmfile [staging] DONE helmfile.d/services/thumbor: apply |
[production] |
| 14:04 |
<hnowlan@deploy1003> |
helmfile [staging] START helmfile.d/services/thumbor: apply |
[production] |
| 14:03 |
<cwilliams@cumin1003> |
START - Cookbook sre.hosts.reimage for host db1196.eqiad.wmnet with OS trixie |
[production] |
| 14:02 |
<oblivian@cumin1003> |
END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "revert deployment - oblivian@cumin1003" |
[production] |
| 14:02 |
<oblivian@cumin1003> |
END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: revert deployment - oblivian@cumin1003 |
[production] |
| 14:01 |
<oblivian@cumin1003> |
START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: revert deployment - oblivian@cumin1003 |
[production] |
| 14:01 |
<oblivian@cumin1003> |
START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "revert deployment - oblivian@cumin1003" |
[production] |
| 14:01 |
<cwilliams@cumin1003> |
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1196: Upgrading db1196.eqiad.wmnet |
[production] |
| 14:00 |
<cwilliams@cumin1003> |
START - Cookbook sre.mysql.depool depool db1196: Upgrading db1196.eqiad.wmnet |
[production] |
| 14:00 |
<cwilliams@cumin1003> |
START - Cookbook sre.mysql.major-upgrade |
[production] |
| 13:56 |
<elukey@cumin1003> |
START - Cookbook sre.hosts.reimage for host cloudvirt1077.eqiad.wmnet with OS trixie |
[production] |
| 13:56 |
<elukey@cumin1003> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie |
[production] |
| 13:54 |
<federico3> |
doing a quick restart of sanitarium hosts db1155 and db1154 |
[production] |
| 13:53 |
<atsuko@deploy1003> |
mwscript-k8s job started: foreachwikiindblist mwscript.dblist extensions/Translate/scripts/ttmserver-export.php --ttmserver codfw-k8s # T425377: populating translation memory (ttmserver-export.php) on codfw-k8s (dblist: https://phabricator.wikimedia.org/P94145) |
[production] |
| 13:51 |
<fceratto@cumin1003> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Reboots T426633 |
[production] |
| 13:51 |
<fceratto@cumin1003> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Reboots T426633 |
[production] |
| 13:50 |
<andrew@cloudcumin1001> |
START - Cookbook wmcs.ceph.upgrade_mons (T428385) |
[admin] |
| 13:49 |
<fceratto@cumin1003> |
DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 11 hosts with reason: Reboots T426633 |
[production] |
| 13:49 |
<trueg@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs: apply |
[production] |