2022-05-26
05:16 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298555)', diff saved to https://phabricator.wikimedia.org/P28574 and previous config saved to /var/cache/conftool/dbconfig/20220526-051641-ladsgroup.json |
[production] |
05:01 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P28573 and previous config saved to /var/cache/conftool/dbconfig/20220526-050136-ladsgroup.json |
[production] |
04:31 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298555)', diff saved to https://phabricator.wikimedia.org/P28571 and previous config saved to /var/cache/conftool/dbconfig/20220526-043126-ladsgroup.json |
[production] |
02:23 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1119 (T298555)', diff saved to https://phabricator.wikimedia.org/P28570 and previous config saved to /var/cache/conftool/dbconfig/20220526-022307-ladsgroup.json |
[production] |
02:23 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance |
[production] |
02:23 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance |
[production] |
02:23 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298555)', diff saved to https://phabricator.wikimedia.org/P28569 and previous config saved to /var/cache/conftool/dbconfig/20220526-022259-ladsgroup.json |
[production] |
02:07 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P28568 and previous config saved to /var/cache/conftool/dbconfig/20220526-020752-ladsgroup.json |
[production] |
01:52 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P28567 and previous config saved to /var/cache/conftool/dbconfig/20220526-015247-ladsgroup.json |
[production] |
01:51 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance |
[production] |
01:51 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance |
[production] |
01:46 |
<mutante> |
T308089 T274463 - gitlab1001 - still not enough disk space to finish full backup. Moved backup of May 24th to /root/, deleted latest.tar, and started full-backup service once again |
[production] |
01:37 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298555)', diff saved to https://phabricator.wikimedia.org/P28566 and previous config saved to /var/cache/conftool/dbconfig/20220526-013741-ladsgroup.json |
[production] |
01:27 |
<mutante> |
T308089 T274463 - gitlab1001 - systemctl start rsync-config-backup-gitlab1003.wikimedia.org - Succeeded - RECOVERY - Check systemd state on gitlab1001 is OK |
[production] |
01:20 |
<mutante> |
gitlab1003 - T308089 T274463 - gitlab1001 - deleted backups from April 4 and April 5 from /srv/gitlab-backup AND deleted partial failed backups from May 26 from /mnt/gitlab-backup; deployed both gerrit:799016 and gerrit:799280; restarting full-backup service |
[production] |
01:01 |
<mutante> |
gitlab1003 - T308089 T274463 - gitlab1003 - systemctl status shows backup-restore as failed because it is looking for /mnt/gitlab-backup/latest/latest.tar; needs gerrit:799016 |
[production] |
00:58 |
<mutante> |
gitlab1001 - T308089 T274463 - gitlab1001 - systemctl start full-backup |
[production] |
00:56 |
<mutante> |
gitlab1001 - T308089 T274463 - '<+icinga-wm> PROBLEM - Disk space on gitlab1001 is CRITICAL: DISK CRITICAL - free space: /mnt/gitlab-backup 0 MB' - manually deleted 1653294190_2022_05_23_14.10.2_gitlab_backup.tar (we still have May 24 and 25; May 26 could not finish writing its backup) - RECOVERY - Disk space on gitlab1001 is OK |
[production] |
2022-05-25
23:35 |
<bd808@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply |
[production] |
23:35 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1184 (T298555)', diff saved to https://phabricator.wikimedia.org/P28563 and previous config saved to /var/cache/conftool/dbconfig/20220525-233520-ladsgroup.json |
[production] |
23:35 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance |
[production] |
23:35 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance |
[production] |
23:35 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298555)', diff saved to https://phabricator.wikimedia.org/P28562 and previous config saved to /var/cache/conftool/dbconfig/20220525-233512-ladsgroup.json |
[production] |
23:35 |
<bd808@deploy1002> |
helmfile [eqiad] START helmfile.d/services/developer-portal: apply |
[production] |
23:20 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P28561 and previous config saved to /var/cache/conftool/dbconfig/20220525-232007-ladsgroup.json |
[production] |
23:05 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P28560 and previous config saved to /var/cache/conftool/dbconfig/20220525-230502-ladsgroup.json |
[production] |
22:49 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298555)', diff saved to https://phabricator.wikimedia.org/P28559 and previous config saved to /var/cache/conftool/dbconfig/20220525-224957-ladsgroup.json |
[production] |
22:47 |
<bd808@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/developer-portal: apply |
[production] |
22:47 |
<bd808@deploy1002> |
helmfile [codfw] START helmfile.d/services/developer-portal: apply |
[production] |
22:46 |
<bd808@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/developer-portal: apply |
[production] |
22:45 |
<bd808@deploy1002> |
helmfile [codfw] START helmfile.d/services/developer-portal: apply |
[production] |
22:06 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
22:03 |
<cmooney@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
21:47 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:45 |
<cmooney@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
21:45 |
<cmooney@cumin1001> |
END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) |
[production] |
21:21 |
<ejegg> |
updated Fundraising CiviCRM from b8b8c177 to dc72ad44 |
[production] |
21:06 |
<joal@deploy1002> |
Finished deploy [airflow-dags/analytics_test@3ae51e7]: (no justification provided) (duration: 00m 06s) |
[production] |
21:06 |
<joal@deploy1002> |
Started deploy [airflow-dags/analytics_test@3ae51e7]: (no justification provided) |
[production] |
20:37 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1106 (T298555)', diff saved to https://phabricator.wikimedia.org/P28558 and previous config saved to /var/cache/conftool/dbconfig/20220525-203708-ladsgroup.json |
[production] |
20:37 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance |
[production] |
20:37 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance |
[production] |
20:37 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance |
[production] |
20:36 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance |
[production] |
20:35 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
20:34 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
20:34 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
20:33 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
20:32 |
<cjming> |
end of UTC late backport window |
[production] |
20:28 |
<cjming@deploy1002> |
Synchronized wmf-config/CirrusSearch-common.php: Config: [[gerrit:775965|cirrus: Migrate popularity_score configuration]] (duration: 00m 51s) |
[production] |