2022-04-05
|
01:36 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24077 and previous config saved to /var/cache/conftool/dbconfig/20220405-013609-ladsgroup.json |
[production] |
01:15 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet |
[production] |
01:07 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet |
[production] |
01:06 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet |
[production] |
01:02 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3063.esams.wmnet |
[production] |
00:58 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4034.ulsfo.wmnet |
[production] |
00:53 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5016.eqsin.wmnet |
[production] |
00:53 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp3063.esams.wmnet |
[production] |
00:51 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1084.eqiad.wmnet |
[production] |
00:51 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp4034.ulsfo.wmnet |
[production] |
00:50 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2042.codfw.wmnet |
[production] |
00:43 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp5016.eqsin.wmnet |
[production] |
00:42 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp1084.eqiad.wmnet |
[production] |
00:42 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp2042.codfw.wmnet |
[production] |
00:40 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4032.ulsfo.wmnet |
[production] |
00:39 |
<mutante> |
gitlab1001 - mv 1648814678_2022_04_01_14.9.1_gitlab_backup.tar and other files from April 2nd/April 3rd over from /srv/gitlab-backup to /mnt/gitlab-backup to prevent another outage due to disk space T274463 |
[production] |
00:36 |
<mutante> |
gitlab2001 - apt-get clean to prevent disk space issues |
[production] |
00:34 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24076 and previous config saved to /var/cache/conftool/dbconfig/20220405-003419-ladsgroup.json |
[production] |
00:34 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance |
[production] |
00:34 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance |
[production] |
00:34 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24075 and previous config saved to /var/cache/conftool/dbconfig/20220405-003405-ladsgroup.json |
[production] |
00:33 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp4032.ulsfo.wmnet |
[production] |
00:33 |
<dzahn@cumin2002> |
conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet |
[production] |
00:33 |
<dzahn@cumin2002> |
conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1047.eqiad.wmnet |
[production] |
00:32 |
<mutante> |
gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover... T274463 - <+icinga-wm> RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK |
[production] |
00:30 |
<mutante> |
gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover... |
[production] |
00:27 |
<dzahn@cumin2002> |
conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1048.eqiad.wmnet |
[production] |
00:23 |
<mutante> |
wtp1046, wtp1047, wtp1048 - rebooting, one at a time |
[production] |
00:21 |
<dzahn@cumin2002> |
conftool action : set/pooled=no; selector: dc=eqiad,name=wtp104[6-8].eqiad.wmnet |
[production] |
00:19 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24074 and previous config saved to /var/cache/conftool/dbconfig/20220405-001900-ladsgroup.json |
[production] |
00:18 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5012.eqsin.wmnet |
[production] |
00:17 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3062.esams.wmnet |
[production] |
00:16 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1083.eqiad.wmnet |
[production] |
00:03 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24073 and previous config saved to /var/cache/conftool/dbconfig/20220405-000355-ladsgroup.json |
[production] |
2022-04-04
|
23:51 |
<mutante> |
apt1001 - importing gitlab-runner package for bullseye via: 'sudo -E reprepro --noskipold --component thirdparty/gitlab-runner update bullseye-wikimedia' after gerrit:767604 (T297659) |
[production] |
23:48 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24072 and previous config saved to /var/cache/conftool/dbconfig/20220404-234850-ladsgroup.json |
[production] |
22:48 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24071 and previous config saved to /var/cache/conftool/dbconfig/20220404-224836-ladsgroup.json |
[production] |
22:48 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance |
[production] |
22:48 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance |
[production] |
22:48 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24070 and previous config saved to /var/cache/conftool/dbconfig/20220404-224828-ladsgroup.json |
[production] |
22:33 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24069 and previous config saved to /var/cache/conftool/dbconfig/20220404-223323-ladsgroup.json |
[production] |
22:18 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24068 and previous config saved to /var/cache/conftool/dbconfig/20220404-221818-ladsgroup.json |
[production] |
22:03 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24067 and previous config saved to /var/cache/conftool/dbconfig/20220404-220313-ladsgroup.json |
[production] |
21:14 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet |
[production] |
21:14 |
<mutante> |
puppetmaster1001/puppetmaster2003 - geoip / maxmind database update timers renamed. 'geoip_update_legacy' became 'geoip_update_main', 'geoip_update' became 'geoip_update_ipinfo'. No longer using the confusing 'legacy' term, as suggested as part of T303464 |
[production] |
21:11 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5011.eqsin.wmnet |
[production] |
21:09 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2041.codfw.wmnet |
[production] |
21:05 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet |
[production] |
21:02 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp5011.eqsin.wmnet |
[production] |
21:02 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host cp2041.codfw.wmnet |
[production] |