2023-10-19
§
|
15:25 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED |
[production] |
15:22 |
<brouberol> |
The kafka service has been stopped on kafka-jumbo100[1-6] - T336044 |
[analytics] |
15:15 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1006.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:15 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1006.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:15 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1005.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:14 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1005.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:14 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1004.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:14 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1004.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:14 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1003.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:13 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1003.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:13 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1002.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:13 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1002.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:13 |
<brouberol@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:13 |
<brouberol@cumin1001> |
START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: host is being decommissioned |
[production] |
15:09 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudnet1008-dev'] |
[production] |
15:09 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudnet1007-dev'] |
[production] |
15:09 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1007-dev'] |
[production] |
15:09 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1008-dev'] |
[production] |
15:08 |
<jclark@cumin1001> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet1008-dev.eqiad.wmnet'] |
[production] |
15:08 |
<jclark@cumin1001> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet1007-dev.eqiad.wmnet'] |
[production] |
15:08 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
15:08 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
15:08 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
15:08 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
15:07 |
<jclark@cumin1001> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1009-dev.eqiad.wmnet'] |
[production] |
15:06 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
15:05 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
15:04 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
15:04 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
15:04 |
<brouberol> |
sudo cumin --batch-size 1 --batch-sleep 60 'kafka-jumbo100[1-6].eqiad.wmnet' 'sudo systemctl stop kafka.service' - T336044 |
[analytics] |
15:02 |
<brouberol> |
disabling puppet on kafka-jumbo100[1-6] to make sure kafka isn't restarted - T336044 |
[analytics] |
14:59 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1007-dev.eqiad.wmnet'] |
[production] |
14:59 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1008-dev.eqiad.wmnet'] |
[production] |
14:59 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
14:58 |
<elukey> |
powercycle titan1001 |
[production] |
14:58 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev.eqiad.wmnet'] |
[production] |
14:57 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet'] |
[production] |
14:56 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev'] |
[production] |
14:56 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev'] |
[production] |
14:55 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1009-dev'] |
[production] |
14:55 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev'] |
[production] |
14:55 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev'] |
[production] |
14:55 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1009-dev'] |
[production] |
14:55 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev'] |
[production] |
14:55 |
<jclark@cumin1001> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev'] |
[production] |
14:54 |
<jclark@cumin1001> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
14:53 |
<Rook> |
Bump urllib3 from 1.26.17 to 1.26.18 |
[paws] |
14:51 |
<jclark@cumin1001> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
14:51 |
<jclark@cumin1001> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
14:51 |
<jclark@cumin1001> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |