2023-05-05
§
|
13:13 |
<andrewbogott> |
rebooting cloudbackup2001.codfw.wmnet, unresponsive |
[production] |
13:05 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P47765 and previous config saved to /var/cache/conftool/dbconfig/20230505-130544-ladsgroup.json |
[production] |
13:05 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet |
[production] |
13:05 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet |
[production] |
12:57 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet |
[production] |
12:56 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet |
[production] |
12:50 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47764 and previous config saved to /var/cache/conftool/dbconfig/20230505-125038-ladsgroup.json |
[production] |
12:46 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet |
[production] |
12:44 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47763 and previous config saved to /var/cache/conftool/dbconfig/20230505-124412-ladsgroup.json |
[production] |
12:44 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance |
[production] |
12:43 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance |
[production] |
12:43 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47762 and previous config saved to /var/cache/conftool/dbconfig/20230505-124349-ladsgroup.json |
[production] |
12:31 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet |
[production] |
12:28 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P47761 and previous config saved to /var/cache/conftool/dbconfig/20230505-122843-ladsgroup.json |
[production] |
12:24 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet |
[production] |
12:13 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P47760 and previous config saved to /var/cache/conftool/dbconfig/20230505-121336-ladsgroup.json |
[production] |
12:06 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1001.eqiad.wmnet |
[production] |
11:59 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host an-mariadb1001.eqiad.wmnet |
[production] |
11:58 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet |
[production] |
11:58 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47759 and previous config saved to /var/cache/conftool/dbconfig/20230505-115830-ladsgroup.json |
[production] |
11:52 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet |
[production] |
11:51 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47758 and previous config saved to /var/cache/conftool/dbconfig/20230505-115126-ladsgroup.json |
[production] |
11:51 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance |
[production] |
11:51 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance |
[production] |
11:26 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P47757 and previous config saved to /var/cache/conftool/dbconfig/20230505-112649-ladsgroup.json |
[production] |
11:26 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P47756 and previous config saved to /var/cache/conftool/dbconfig/20230505-112605-ladsgroup.json |
[production] |
11:11 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P47755 and previous config saved to /var/cache/conftool/dbconfig/20230505-111145-ladsgroup.json |
[production] |
11:11 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P47754 and previous config saved to /var/cache/conftool/dbconfig/20230505-111100-ladsgroup.json |
[production] |
10:56 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P47753 and previous config saved to /var/cache/conftool/dbconfig/20230505-105640-ladsgroup.json |
[production] |
10:55 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P47752 and previous config saved to /var/cache/conftool/dbconfig/20230505-105555-ladsgroup.json |
[production] |
10:41 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P47751 and previous config saved to /var/cache/conftool/dbconfig/20230505-104135-ladsgroup.json |
[production] |
10:41 |
<moritzm> |
installing wireshark security updates |
[production] |
10:40 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P47750 and previous config saved to /var/cache/conftool/dbconfig/20230505-104050-ladsgroup.json |
[production] |
09:28 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1170.eqiad.wmnet with reason: Host sad (T336033) |
[production] |
09:28 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1170.eqiad.wmnet with reason: Host sad (T336033) |
[production] |
09:14 |
<Amir1> |
power cycled db1170 |
[production] |
09:10 |
<marostegui> |
Failover m2-master from dbproxy1013 to dbproxy1015 |
[production] |
09:08 |
<hnowlan@deploy1002> |
Finished deploy [restbase/deploy@8aba801]: deploying to host missing from configs (duration: 01m 22s) |
[production] |
09:06 |
<hnowlan@deploy1002> |
Started deploy [restbase/deploy@8aba801]: deploying to host missing from configs |
[production] |
08:58 |
<XioNoX> |
deploy CR914772 on all hosts running Bird |
[production] |
08:15 |
<godog> |
delete wal and chunks_head from prometheus5002 and prometheus4002 to let prometheus start back up and not crashloop - T309979 |
[production] |
08:07 |
<jmm@cumin2002> |
END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host netflow2003.codfw.wmnet with OS bookworm |
[production] |
08:05 |
<hashar@deploy1002> |
Finished deploy [integration/docroot@78e6f40]: build: Updating eslint-config-wikimedia to 0.25.0 (duration: 00m 13s) |
[production] |
08:04 |
<hashar@deploy1002> |
Started deploy [integration/docroot@78e6f40]: build: Updating eslint-config-wikimedia to 0.25.0 |
[production] |
07:32 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance |
[production] |
07:31 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance |
[production] |
07:31 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance |
[production] |
07:31 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance |
[production] |
06:53 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet |
[production] |
06:51 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm |
[production] |