2022-05-18
23:58 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
23:58 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
23:57 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1174 (T303603)', diff saved to https://phabricator.wikimedia.org/P28009 and previous config saved to /var/cache/conftool/dbconfig/20220518-235759-ladsgroup.json |
[production] |
23:53 |
<mutante> |
webperf1001 - systemctl reset-failed |
[production] |
23:53 |
<mutante> |
webperf1001/webperf2001 - re-enabling notifications in icinga that were disabled without comment (please don't do this, they keep being forgotten on a regular basis) |
[production] |
23:49 |
<mutante> |
seaborgium - broken systemd state in Icinga since 23d - systemctl reset-failed |
[production] |
23:48 |
<mutante> |
ms-be1063 - broken systemd state in Icinga since 19d - systemctl reset-failed |
[production] |
23:47 |
<mutante> |
ms-be1054 - broken systemd state in Icinga since 19d - systemctl reset-failed |
[production] |
23:47 |
<mutante> |
ms-be1036 - broken systemd state in Icinga since 15d - systemctl reset-failed |
[production] |
23:45 |
<mutante> |
dumpsdata1002 - broken systemd state in Icinga since 23d - systemctl reset-failed |
[production] |
23:44 |
<mutante> |
deploy2002 - broken systemd state in Icinga since 42d - systemctl reset-failed |
[production] |
23:43 |
<mutante> |
an-db1002 - broken systemd state in Icinga since 48d - systemctl reset-failed |
[production] |
23:42 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P28008 and previous config saved to /var/cache/conftool/dbconfig/20220518-234254-ladsgroup.json |
[production] |
23:27 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P28007 and previous config saved to /var/cache/conftool/dbconfig/20220518-232749-ladsgroup.json |
[production] |
23:27 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1096:3316 (T298555)', diff saved to https://phabricator.wikimedia.org/P28006 and previous config saved to /var/cache/conftool/dbconfig/20220518-232704-ladsgroup.json |
[production] |
23:27 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance |
[production] |
23:27 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance |
[production] |
23:26 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298555)', diff saved to https://phabricator.wikimedia.org/P28005 and previous config saved to /var/cache/conftool/dbconfig/20220518-232656-ladsgroup.json |
[production] |
23:17 |
<jhathaway@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: exim debug log capture |
[production] |
23:16 |
<jhathaway@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: exim debug log capture |
[production] |
23:12 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1174 (T303603)', diff saved to https://phabricator.wikimedia.org/P28004 and previous config saved to /var/cache/conftool/dbconfig/20220518-231244-ladsgroup.json |
[production] |
23:11 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P28003 and previous config saved to /var/cache/conftool/dbconfig/20220518-231151-ladsgroup.json |
[production] |
23:10 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1174 (T303603)', diff saved to https://phabricator.wikimedia.org/P28002 and previous config saved to /var/cache/conftool/dbconfig/20220518-230956-ladsgroup.json |
[production] |
23:09 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance |
[production] |
23:09 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance |
[production] |
23:09 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1158 (T303603)', diff saved to https://phabricator.wikimedia.org/P28001 and previous config saved to /var/cache/conftool/dbconfig/20220518-230948-ladsgroup.json |
[production] |
22:56 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P28000 and previous config saved to /var/cache/conftool/dbconfig/20220518-225646-ladsgroup.json |
[production] |
22:54 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P27999 and previous config saved to /var/cache/conftool/dbconfig/20220518-225443-ladsgroup.json |
[production] |
22:50 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
22:50 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
22:50 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
22:49 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
22:46 |
<ladsgroup@deploy1002> |
Synchronized php-1.39.0-wmf.12/resources/src/mediawiki.htmlform/cond-state.js: Backport: [[gerrit:793146|mw.htmlform: Fix conditional hide/disable for non-OOUI forms (T308626)]] (duration: 00m 51s) |
[production] |
22:41 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298555)', diff saved to https://phabricator.wikimedia.org/P27998 and previous config saved to /var/cache/conftool/dbconfig/20220518-224141-ladsgroup.json |
[production] |
22:39 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P27997 and previous config saved to /var/cache/conftool/dbconfig/20220518-223938-ladsgroup.json |
[production] |
22:34 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
22:33 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
22:33 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
22:32 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
22:30 |
<ladsgroup@deploy1002> |
Synchronized php-1.39.0-wmf.12/includes/parser/ParserObserver.php: Backport: [[gerrit:792665|parser: Avoid pushing the whole content to ParserObserver debug log (T305218)]] (duration: 00m 52s) |
[production] |
22:24 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1158 (T303603)', diff saved to https://phabricator.wikimedia.org/P27996 and previous config saved to /var/cache/conftool/dbconfig/20220518-222433-ladsgroup.json |
[production] |
22:21 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1158 (T303603)', diff saved to https://phabricator.wikimedia.org/P27995 and previous config saved to /var/cache/conftool/dbconfig/20220518-222145-ladsgroup.json |
[production] |
22:21 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance |
[production] |
22:21 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance |
[production] |
22:21 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance |
[production] |
22:21 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance |
[production] |