2020-01-15
|
08:44 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) |
[production] |
08:40 |
<godog> |
roll restart ores in codfw/eqiad to apply logging pipeline changes |
[production] |
08:40 |
<elukey@cumin1001> |
START - Cookbook sre.aqs.roll-restart |
[production] |
08:40 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) |
[production] |
08:40 |
<elukey@cumin1001> |
START - Cookbook sre.aqs.roll-restart |
[production] |
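The cookbook entries above share a regular shape: `START` or `END (PASS|FAIL)`, the cookbook name, and an exit code on `END` lines. A minimal sketch for extracting those fields, assuming that shape holds for other cookbook entries too; `parse_cookbook_line` is a hypothetical helper name, not part of any Wikimedia tooling:

```python
import re
from typing import Optional

# Matches messages such as:
#   START - Cookbook sre.aqs.roll-restart
#   END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
COOKBOOK_RE = re.compile(
    r"^(?P<phase>START|END)"
    r"(?: \((?P<result>PASS|FAIL)\))?"
    r" - Cookbook (?P<name>\S+)"
    r"(?: \(exit_code=(?P<exit_code>\d+)\))?$"
)


def parse_cookbook_line(message: str) -> Optional[dict]:
    """Return phase, result, name and exit_code, or None if the line is not a cookbook entry."""
    # Drop the trailing " |" cell separator used in this log, if present.
    cleaned = message.strip().rstrip("|").strip()
    match = COOKBOOK_RE.match(cleaned)
    if match is None:
        return None
    fields = match.groupdict()
    if fields["exit_code"] is not None:
        fields["exit_code"] = int(fields["exit_code"])
    return fields


if __name__ == "__main__":
    print(parse_cookbook_line("END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) |"))
```

Pairing each `START` with the following `END` for the same cookbook name would then show, for example, that the 08:40 run failed and was retried immediately.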
08:23 |
<godog> |
roll restart ores in codfw/eqiad to apply logging pipeline changes |
[production] |
08:13 |
<godog> |
testing ores logging to pipeline on ores2001 |
[production] |
07:02 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10159 and previous config saved to /var/cache/conftool/dbconfig/20200115-070201-marostegui.json |
[production] |
06:53 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10158 and previous config saved to /var/cache/conftool/dbconfig/20200115-065353-marostegui.json |
[production] |
06:53 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Fully repool db1080', diff saved to https://phabricator.wikimedia.org/P10157 and previous config saved to /var/cache/conftool/dbconfig/20200115-065305-marostegui.json |
[production] |
06:46 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10156 and previous config saved to /var/cache/conftool/dbconfig/20200115-064606-marostegui.json |
[production] |
06:45 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10155 and previous config saved to /var/cache/conftool/dbconfig/20200115-064535-marostegui.json |
[production] |
06:25 |
<marostegui> |
Upgrade db1098:3316 and db1098:3317 |
[production] |
06:23 |
<mholloway-shell@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: MachineVision: Make testcommonswiki behavior consistent with commonswiki (duration: 01m 16s) |
[production] |
06:20 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1098:3316 db1098:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10152 and previous config saved to /var/cache/conftool/dbconfig/20200115-062028-marostegui.json |
[production] |
06:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10151 and previous config saved to /var/cache/conftool/dbconfig/20200115-061859-marostegui.json |
[production] |
06:16 |
<marostegui> |
Remove revision partitions from db2088:3311 - T239453 |
[production] |
06:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repool db1103:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10150 and previous config saved to /var/cache/conftool/dbconfig/20200115-061052-marostegui.json |
[production] |
06:03 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10148 and previous config saved to /var/cache/conftool/dbconfig/20200115-060347-marostegui.json |
[production] |
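The `dbctl commit` entries above all carry the same pieces of information: the datacenter scope, a commit summary, a diff URL, and the path of the saved previous config, whose file name appears to begin with a `YYYYMMDD-HHMMSS` timestamp. A rough sketch for pulling these apart, assuming that file-name convention; `parse_dbctl_commit` is a hypothetical helper, and the timezone of the timestamp is not stated in the log, so a naive `datetime` is returned:

```python
import re
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional

DBCTL_RE = re.compile(
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<summary>.*)', "
    r"diff saved to (?P<diff_url>\S+) "
    r"and previous config saved to (?P<backup_path>\S+)"
)


def parse_dbctl_commit(message: str) -> Optional[dict]:
    """Split a dbctl commit log message into its fields, or return None if it does not match."""
    match = DBCTL_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumed file-name shape: <YYYYMMDD>-<HHMMSS>-<user>.json
    file_name = PurePosixPath(fields["backup_path"]).name
    try:
        date_part, time_part, _rest = file_name.split("-", 2)
        fields["saved_at"] = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    except ValueError:
        # File name does not follow the assumed convention.
        fields["saved_at"] = None
    return fields


if __name__ == "__main__":
    line = (
        "dbctl commit (dc=all): 'Slowly repool db1080', diff saved to "
        "https://phabricator.wikimedia.org/P10148 and previous config saved to "
        "/var/cache/conftool/dbconfig/20200115-060347-marostegui.json |"
    )
    print(parse_dbctl_commit(line))
```

Sorting the parsed entries by `saved_at` and grouping by the host named in `summary` gives the depool, upgrade, and staged repool sequence for each database.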
06:00 |
<mholloway-shell@deploy1001> |
Finished deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae (duration: 05m 56s) |
[production] |
05:54 |
<mholloway-shell@deploy1001> |
Started deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae |
[production] |
01:32 |
<mutante> |
lvs1015 powercycling, crashed, nothing on console, lots of unknowns in icinga |
[production] |
01:17 |
<mutante> |
dbproxy1017 and dbproxy1021 were showing "haproxy failover" icinga alerts. Did the check described on https://wikitech.wikimedia.org/wiki/HAProxy#Failover and it claimed on both that db1133 was DOWN, but checking db1133 itself showed it was up and working normally. In that case the docs say to run 'systemctl reload haproxy'. Done on both and things recovered |
[production] |
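The reasoning in the 01:17 entry can be written down as a small decision function. This is only a sketch of the logic as described in that entry: the names `Action` and `decide_action` are hypothetical, and the `INVESTIGATE_DB` branch is an assumption added for completeness, since the log only covers the case where the database was in fact up:

```python
from enum import Enum


class Action(Enum):
    NO_ACTION = "no action"
    RELOAD_HAPROXY = "systemctl reload haproxy"
    # Assumption: not described in the log entry; added so every case has an outcome.
    INVESTIGATE_DB = "investigate the database host"


def decide_action(proxy_reports_down: bool, db_is_up: bool) -> Action:
    """Choose a next step from what the proxy claims and what the database itself shows."""
    if not proxy_reports_down:
        return Action.NO_ACTION
    if db_is_up:
        # The proxy's view is stale: the backend is healthy, so reload the proxy.
        return Action.RELOAD_HAPROXY
    return Action.INVESTIGATE_DB


if __name__ == "__main__":
    # The situation logged for dbproxy1017 and dbproxy1021 with db1133.
    print(decide_action(proxy_reports_down=True, db_is_up=True).value)
```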
01:15 |
<marxarelli> |
deploy zuul layout change https://gerrit.wikimedia.org/r/c/integration/config/+/564813 |
[releng] |
01:13 |
<mutante> |
dbproxy1017 - systemctl reload haproxy |
[production] |
00:22 |
<bstorm_> |
restarted maintain-dbusers on labstore1004 after recovering the m5 DB's connection issue |
[production] |
00:18 |
<wm-bot> |
<lucaswerkmeister> deployed bc1d49c202 (better CSRF error handling, T242573) |
[tools.lexeme-forms] |
00:12 |
<bstorm_> |
set max_connections to 600 temporarily while troubleshooting on m5 (db1133) |
[production] |
2020-01-14
|
23:51 |
<bd808> |
Rotated conduit API token |
[tools.os-deprecation] |
23:02 |
<bd808> |
Rotated conduit API token |
[tools.phab-ban] |
22:55 |
<bd808> |
Rotated conduit API token |
[tools.stashbot] |
21:44 |
<James_F> |
Zuul: Stop running JS tests for Parsoid and Parsoid-deploy T242782 |
[releng] |
20:47 |
<bd808> |
Lots of $HOME/error.log messages about "file_get_contents(http://en.wikipedia.org/w/index.php?action=render&title=Template:CasTemplate): failed to open stream" |
[tools.magnustools] |
20:12 |
<milimetric> |
deployed aqs with new service-runner version 2.7.3 |
[analytics] |
20:11 |
<milimetric@deploy1001> |
Finished deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version (duration: 04m 48s) |
[production] |
20:07 |
<milimetric@deploy1001> |
Started deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version |
[production] |
19:38 |
<Krinkle> |
No syslog entries from php-fpm or puppet-agent on deployment-mediawiki-07, T242659, rebooting server |
[releng] |
19:22 |
<urbanecm@deploy1001> |
Synchronized wmf-config/CommonSettings.php: SWAT: e400916: [wikitech] Restore contentadmin ability to manage abuse filters (duration: 01m 05s) |
[production] |
18:11 |
<vgutierrez> |
repooling cp5012 |
[production] |
18:06 |
<vgutierrez> |
depool cp5012 for some ats parent select debugging |
[production] |
18:02 |
<James_F> |
Zuul: Add JeremyNguyen to Jenkins whitelist T235286 |
[releng] |
17:54 |
<James_F> |
Zuul: [mediawiki/core] Stop publishing docker images for now T242775 |
[releng] |
17:43 |
<vgutierrez> |
repooling cp4027 |
[production] |
17:39 |
<vgutierrez> |
depooling cp4027 for some ats-tls parent balancing tests |
[production] |
17:21 |
<_joe_> |
upload docker-report 0.0.2 to {buster,stretch}-wikimedia T242604 |
[production] |
16:53 |
<liw@deploy1001> |
rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.15 |
[production] |
16:46 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
16:44 |
<liw> |
branch is cut for 1.35.0-wmf.15; train window is closed, but I'll continue the train since the next time slot doesn't seem to have anything |
[production] |
16:44 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
16:41 |
<marostegui> |
Enable puppet back on install1002 and install2002 - T242481 |
[production] |