2021-03-09
ยง
|
14:07 <jgleeson> updated smashpig from 5a69abd40f to 58b070db1a [production]
14:00 <marostegui@cumin1001> dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14694 and previous config saved to /var/cache/conftool/dbconfig/20210309-140025-root.json [production]
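The staged (re)pooling entries here are produced by dbctl on a cumin host: the instance's pooled percentage is raised in steps and each change is committed, which is what writes the diff paste and the previous-config backup noted in the log. A minimal sketch of that flow, assuming the usual dbctl subcommands; the exact invocations are not part of this log:

    dbctl instance db1098:3316 depool                      # before the schema change
    dbctl config commit -m 'Depool db1098:3316 for schema change'
    # ...apply the schema change on the depooled replica, then repool gradually...
    dbctl instance db1098:3316 pool -p 30                  # set pooled percentage
    dbctl config commit -m 'db1098:3316 (re)pooling @ 30%: Repooling after schema change'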
13:52 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1004.eqiad.wmnet [production]
13:52 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1102.eqiad.wmnet with reason: REIMAGE [production]
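The Cookbook START/END pairs throughout this log are runs of spicerack cookbooks from the cumin hosts: reboot-single reboots one host, and the downtime cookbook sets an Icinga downtime for the given duration and reason (here 2:00:00 while a host is reimaged). A rough sketch of the kind of invocation behind such entries; the flag names are an assumption, not copied from this log:

    sudo cookbook sre.hosts.reboot-single prometheus1004.eqiad.wmnet
    # downtime a host for 2 hours while it is reimaged (flag names assumed)
    sudo cookbook sre.hosts.downtime --hours 2 --reason REIMAGE an-worker1102.eqiad.wmnet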
13:50 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1080.eqiad.wmnet with reason: REIMAGE [production]
13:49 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1102.eqiad.wmnet with reason: REIMAGE [production]
13:49 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1080.eqiad.wmnet with reason: REIMAGE [production]
13:45 <marostegui@cumin1001> dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14693 and previous config saved to /var/cache/conftool/dbconfig/20210309-134522-root.json [production]
13:38 <arturo> draining cloudvirt1027 for T275753 [admin]
13:37 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host prometheus1004.eqiad.wmnet [production]
13:35 <arturo> icinga-downtime cloudvirt1038 for 30 days for T276922 [admin]
13:34 <aborrero@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: HW issue [production]
13:34 <aborrero@cumin1001> START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: HW issue [production]
13:32 <arturo> hard-reboot deployment-db05 because of issues related to T276922 [deployment-prep]
13:32 <arturo> hard-reboot deployment-db05 because of issues related to T276922 [releng]
13:31 <arturo> hard-reboot tools-docker-registry-04 because of issues related to T276922 [tools]
13:31 <marostegui@cumin1001> dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14692 and previous config saved to /var/cache/conftool/dbconfig/20210309-133124-root.json [production]
13:28 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1003.eqiad.wmnet [production]
13:27 <elukey> reimage an-worker1102 and an-worker1080 (hdfs journal node) to Buster [production]
13:26 <elukey> reimage an-worker1102 and an-worker1080 (hdfs journal node) to Buster [analytics]
13:21 <arturo> add cloudvirt1039 to the ceph host aggregate (no longer a spare, since cloudvirt1038 has HW failures) [admin]
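Moving cloudvirt1039 into the ceph host aggregate is an OpenStack scheduling change: hosts in the aggregate become eligible to receive new VMs. Assuming it was done with the stock OpenStack CLI rather than a WMCS-specific wrapper, the change would look roughly like:

    # illustrative only; aggregate name taken from the log entry above
    openstack aggregate add host ceph cloudvirt1039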
13:21 <jgleeson> updated payments-wiki from 65dbf0ed9d to 0e7800027a [production]
13:16 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1198:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P14691 and previous config saved to /var/cache/conftool/dbconfig/20210309-131652-marostegui.json [production]
13:16 <marostegui@cumin1001> dbctl commit (dc=all): 'db1168 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14690 and previous config saved to /var/cache/conftool/dbconfig/20210309-131620-root.json [production]
13:10 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host prometheus1003.eqiad.wmnet [production]
13:08 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1103.eqiad.wmnet with reason: REIMAGE [production]
13:06 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1103.eqiad.wmnet with reason: REIMAGE [production]
13:03 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE [production]
13:01 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE [production]
13:01 <marostegui@cumin1001> dbctl commit (dc=all): 'db1168 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14689 and previous config saved to /var/cache/conftool/dbconfig/20210309-130116-root.json [production]
12:59 <elukey> drain + reimage an-worker1103 to Buster [production]
12:59 <elukey> drain + reimage an-worker1103 to Buster [analytics]
12:59 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE [production]
12:57 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE [production]
12:56 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1403.eqiad.wmnet [production]
12:56 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1402.eqiad.wmnet [production]
12:52 <arturo> cloudvirt1038 hard powerdown / powerup for T276922 [admin]
12:50 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P14688 and previous config saved to /var/cache/conftool/dbconfig/20210309-125007-marostegui.json [production]
12:49 <marostegui@cumin1001> dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14687 and previous config saved to /var/cache/conftool/dbconfig/20210309-124931-root.json [production]
12:41 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single for host mw1403.eqiad.wmnet [production]
12:41 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single for host mw1402.eqiad.wmnet [production]
12:38 <hnowlan@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
12:34 <arturo> briefly rebooting VM tools-docker-registry-04; we need to reboot its hypervisor cloudvirt1038 and it failed to migrate away [tools]
12:34 <marostegui@cumin1001> dbctl commit (dc=all): 'db1173 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14686 and previous config saved to /var/cache/conftool/dbconfig/20210309-123427-root.json [production]
12:34 <arturo> briefly rebooting VM deployment-db05; we need to reboot its hypervisor cloudvirt1038 and it failed to migrate to another host [releng]
12:34 <arturo> briefly rebooting VM deployment-db05; we need to reboot its hypervisor cloudvirt1038 and it failed to migrate to another host [deployment-prep]
12:33 <arturo> rebooting cloudvirt1038 (T275753) [admin]
12:33 <aborrero@cumin1001> START - Cookbook sre.hosts.reboot-single for host cloudvirt1038.eqiad.wmnet [production]
12:31 <hnowlan@cumin1001> START - Cookbook sre.dns.netbox [production]
12:30 <hnowlan> regenerating interfaces and reimaging aqs101[1-5] [production]