2020-01-15 §
06:10 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1103:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10150 and previous config saved to /var/cache/conftool/dbconfig/20200115-061052-marostegui.json [production]
06:03 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10148 and previous config saved to /var/cache/conftool/dbconfig/20200115-060347-marostegui.json [production]
06:00 <mholloway-shell@deploy1001> Finished deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae (duration: 05m 56s) [production]
05:54 <mholloway-shell@deploy1001> Started deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae [production]
01:32 <mutante> lvs1015 powercycling, crashed, nothing on console, lots of unknowns in icinga [production]
01:17 <mutante> dbproxy1017 and dbproxy1021 were showing "haproxy failover" icinga alerts. Did the check described on https://wikitech.wikimedia.org/wiki/HAProxy#Failover and it claimed on both that db1133 was DOWN, but checking db1133 itself showed it was up and working normally. In that case the docs said to 'systemctl reload haproxy'. Done on both and things recovered [production]
01:15 <marxarelli> deploy zuul layout change https://gerrit.wikimedia.org/r/c/integration/config/+/564813 [releng]
01:13 <mutante> dbproxy1017 - systemctl reload haproxy [production]
00:22 <bstorm_> restarted maintain-dbusers on labstore1004 after recovering the m5 DB's connection issue [production]
00:18 <wm-bot> <lucaswerkmeister> deployed bc1d49c202 (better CSRF error handling, T242573) [tools.lexeme-forms]
00:12 <bstorm_> set max_connections to 600 temporarily while troubleshooting on m5 (db1133) [production]
2020-01-14 §
23:51 <bd808> Rotated conduit API token [tools.os-deprecation]
23:02 <bd808> Rotated conduit API token [tools.phab-ban]
22:55 <bd808> Rotated conduit API token [tools.stashbot]
21:44 <James_F> Zuul: Stop running JS tests for Parsoid and Parsoid-deploy T242782 [releng]
20:47 <bd808> Lots of $HOME/error.log messages about "file_get_contents(http://en.wikipedia.org/w/index.php?action=render&title=Template:CasTemplate): failed to open stream" [tools.magnustools]
20:12 <milimetric> deployed aqs with new service-runner version 2.7.3 [analytics]
20:11 <milimetric@deploy1001> Finished deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version (duration: 04m 48s) [production]
20:07 <milimetric@deploy1001> Started deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version [production]
19:38 <Krinkle> No syslog entries from php-fpm or puppet-agent on deployment-mediawiki-07, T242659, rebooting server [releng]
19:22 <urbanecm@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: e400916: [wikitech] Restore contentadmin ability to manage abuse filters (duration: 01m 05s) [production]
18:11 <vgutierrez> repooling cp5012 [production]
18:06 <vgutierrez> depool cp5012 for some ats parent select debugging [production]
18:02 <James_F> Zuul: Add JeremyNguyen to Jenkins whitelist T235286 [releng]
17:54 <James_F> Zuul: [mediawiki/core] Stop publishing docker images for now T242775 [releng]
17:43 <vgutierrez> repooling cp4027 [production]
17:39 <vgutierrez> depooling cp4027 for some ats-tls parent balancing tests [production]
17:21 <_joe_> upload docker-report 0.0.2 to {buster,stretch}-wikimedia T242604 [production]
16:53 <liw@deploy1001> rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.15 [production]
16:46 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
16:44 <liw> branch is cut for 1.35.0-wmf.15; train window is closed, but I'll continue train since the next time slot seems to not have anything [production]
16:44 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime [production]
16:41 <marostegui> Enable puppet back on install1002 and install2002 - T242481 [production]
16:31 <liw@deploy1001> Finished scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (try 2) (duration: 43m 29s) [production]
16:26 <marostegui> Temporarily disable puppet on install1002 and install2002 - T242481 [production]
16:08 <volans@deploy1001> Finished deploy [debmonitor/deploy@e72911c]: Release v0.2.4 (duration: 01m 09s) [production]
16:07 <volans@deploy1001> Started deploy [debmonitor/deploy@e72911c]: Release v0.2.4 [production]
15:47 <liw@deploy1001> Started scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (try 2) [production]
15:33 <James_F> Depooling integration-agent-docker-1003 due to issues. [releng]
15:29 <bstorm_> failed the gridengine master back to the master server from the shadow [tools]
15:02 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
15:02 <marostegui> Copy data from db1080 to db1107 T242702 [production]
15:02 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1080 for tranfer', diff saved to https://phabricator.wikimedia.org/P10144 and previous config saved to /var/cache/conftool/dbconfig/20200114-150223-marostegui.json [production]
15:00 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime [production]
14:51 <liw@deploy1001> scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_44869219" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 03m 55s) [production]
14:47 <liw@deploy1001> Started scap: testwiki to php-1.35.0-wmf.15 and rebuild l10n cache [production]
14:43 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10143 and previous config saved to /var/cache/conftool/dbconfig/20200114-144341-marostegui.json [production]
14:26 <marostegui> Move db1114 under db1080 [production]
14:24 <marostegui> Stop db1080 and db1107 replication in sync [production]
14:23 <Zppix> delete pod to attempt force restart due to openstack upgrade (T241347) [tools.zppixbot]