|
2022-01-14
|
| 12:20 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster |
[production] |
| 12:18 |
<hnowlan@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster |
[production] |
| 11:51 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster |
[production] |
| 11:49 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing |
[production] |
| 11:48 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing |
[production] |
| 11:45 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1023.eqiad.wmnet with OS buster |
[production] |
| 11:18 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS buster |
[production] |
| 11:01 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org |
[production] |
| 11:00 |
<moritzm> |
systemctl reset-failed ifup@ens5.service on archiva1002 T273026 |
[production] |
| 10:56 |
<moritzm> |
rebooting archiva1002 (running archiva.wikimedia.org) |
[production] |
| 10:56 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org |
[production] |
| 10:55 |
<bking@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch |
[production] |
| 10:50 |
<moritzm> |
systemctl reset-failed ifup@ens5.service on an-test-ui1001 T273026 |
[production] |
| 10:50 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-ui1001.eqiad.wmnet |
[production] |
| 10:42 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-test-ui1001.eqiad.wmnet |
[production] |
| 10:21 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-presto1001.eqiad.wmnet |
[production] |
| 10:17 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-test-presto1001.eqiad.wmnet |
[production] |
| 10:07 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet |
[production] |
| 10:05 |
<moritzm> |
rebooting matomo1002 (running piwik.wikimedia.org) |
[production] |
| 10:04 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet |
[production] |
| 09:59 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-druid1001.eqiad.wmnet |
[production] |
| 09:55 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-test-druid1001.eqiad.wmnet |
[production] |
| 09:38 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt1001.wikimedia.org |
[production] |
| 09:35 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM apt1001.wikimedia.org |
[production] |
| 09:32 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install1003.wikimedia.org |
[production] |
| 09:28 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM install1003.wikimedia.org |
[production] |
| 09:22 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-client1001.eqiad.wmnet |
[production] |
| 09:19 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-test-client1001.eqiad.wmnet |
[production] |
| 09:11 |
<marostegui> |
Move pc1014 from pc1 to pc2 T299046 |
[production] |
| 09:05 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2013.codfw.wmnet with OS bullseye |
[production] |
| 09:03 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1009.eqiad.wmnet |
[production] |
| 09:01 |
<moritzm> |
rebooting an-tool1009 (running hue.wikimedia.org) |
[production] |
| 09:01 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-tool1009.eqiad.wmnet |
[production] |
| 09:00 |
<moritzm> |
systemctl reset-failed ifup@ens5.service on an-tool1005 T273026 |
[production] |
| 09:00 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1008.eqiad.wmnet |
[production] |
| 08:58 |
<moritzm> |
rebooting an-tool1008 (running yarn.wikimedia.org) |
[production] |
| 08:58 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-tool1008.eqiad.wmnet |
[production] |
| 08:57 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1007.eqiad.wmnet |
[production] |
| 08:55 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-tool1007.eqiad.wmnet |
[production] |
| 08:53 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1005.eqiad.wmnet |
[production] |
| 08:51 |
<moritzm> |
rebooting an-tool1007 (running turnilo.wikimedia.org) |
[production] |
| 08:50 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-tool1005.eqiad.wmnet |
[production] |
| 08:36 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cuminunpriv1001.eqiad.wmnet |
[production] |
| 08:34 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM cuminunpriv1001.eqiad.wmnet |
[production] |
| 08:33 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host pc2013.codfw.wmnet with OS bullseye |
[production] |
| 07:39 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2012.codfw.wmnet with OS bullseye |
[production] |
| 07:05 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host pc2012.codfw.wmnet with OS bullseye |
[production] |
| 06:35 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Remove logpager group from s3 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18735 and previous config saved to /var/cache/conftool/dbconfig/20220114-063554-marostegui.json |
[production] |
| 06:15 |
<marostegui> |
Failover m5 proxy from dbproxy1017 to dbproxy1021 T298586 |
[production] |
| 05:16 |
<legoktm> |
manually restarted discard_held_messages service on lists1001, failed with a spurious sqlalchemy issue about packets being out of order |
[production] |