2014-02-24
21:25 <RobH> updating blog apache configs to use techblog.w.o https cert [production]
21:24 <gwicke> deployed Parsoid 51c71eb / deploy b684fea [production]
21:17 <mutante> restarting parsoid on wtp1002 [production]
20:33 <bd808> Restarted elasticsearch on logstash1002 in an attempt to clear stuck reallocations, likely caused by an OOM while running recovery [production]
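Stuck reallocations generally show up in the cluster health output as shards that stay in the relocating or initializing state. A minimal sketch for watching that from one of the logstash hosts, assuming the Elasticsearch HTTP API is listening on the default port 9200 (host and port are assumptions, not taken from this log):

    import json
    from urllib.request import urlopen

    # Cluster-wide health; relocating/initializing/unassigned counts should drop
    # to zero once recovery finishes. Host and port are assumed defaults.
    url = "http://logstash1002:9200/_cluster/health"
    health = json.loads(urlopen(url).read().decode())
    print(health["status"],
          "relocating:", health["relocating_shards"],
          "initializing:", health["initializing_shards"],
          "unassigned:", health["unassigned_shards"])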
19:30 <Coren> rebooting virt1001 (sick, stuck on bad mounts) [production]
17:06 <bd808> Logs on logstash1003 show an elasticsearch split brain starting at 2014-02-23T00:00:12: logstash1001 and logstash1003 both thought they were master, and logstash1001 was not responding to logstash1003's requests to become authoritative. [production]
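For context on the split brain: in Zen-discovery-era (pre-7.x) Elasticsearch the usual guard is discovery.zen.minimum_master_nodes, which with the three logstash100x nodes would be set to a quorum of 2 so that two nodes cannot both elect themselves master. A hedged sketch of applying it through the cluster settings API (host, port, and the assumption that all three nodes are master-eligible and the setting is dynamically updatable are mine, not from this log):

    import json
    from urllib.request import Request, urlopen

    # Require a quorum of 2 master-eligible nodes before a master can be elected.
    body = json.dumps({"persistent": {"discovery.zen.minimum_master_nodes": 2}})
    req = Request("http://logstash1001:9200/_cluster/settings",
                  data=body.encode(), method="PUT",
                  headers={"Content-Type": "application/json"})
    print(urlopen(req).read().decode())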
16:38 <bd808> Logstash elasticsearch split-brain resulted in loss of all logs for 2014-02-24 from 00:00Z to ~16:30Z [production]
16:16 <bd808> Restarted elasticsearch on logstash1001 [production]
03:28 <LocalisationUpdate> ResourceLoader cache refresh completed at 2014-02-24 03:27:55+00:00 [production]
02:44 <LocalisationUpdate> completed (1.23wmf15) at 2014-02-24 02:44:44+00:00 [production]
02:31 <LocalisationUpdate> completed (1.23wmf14) at 2014-02-24 02:31:50+00:00 [production]
01:16 <tstarling> updated /a/common/php-1.23wmf15 to {{Gerrit|I268599be9}}: [1.23wmf15] Make SiteStats (re)initializing more sane [production]
01:16 <tstarling> synchronized php-1.23wmf14/includes/SiteStats.php [production]
2014-02-23
20:29 <Tim> updated ss_active_users on plwiki master to not be -1 [production]
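ss_active_users lives in MediaWiki's site_stats table. The exact statement run here isn't recorded; a hedged sketch of recomputing a sane value, roughly following MediaWiki's active-user definition (distinct registered, non-bot editors in recentchanges over the active-user window), could look like the following. Connection details and the precise WHERE clause are assumptions:

    import pymysql

    # Recompute ss_active_users for plwiki from recentchanges and write it back.
    # Column names are from the MediaWiki schema; the filter is an approximation.
    conn = pymysql.connect(host="db-master.example", user="wikiadmin",
                           password="secret", database="plwiki")
    with conn.cursor() as cur:
        cur.execute(
            "SELECT COUNT(DISTINCT rc_user_text) FROM recentchanges "
            "WHERE rc_user != 0 AND rc_bot = 0")
        (active,) = cur.fetchone()
        # site_stats holds a single row, so no WHERE clause is needed.
        cur.execute("UPDATE site_stats SET ss_active_users = %s", (active,))
    conn.commit()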
20:14 <springle> killed SiteStatsInit from both wikiuser and wikiadmin on all s2 slaves [production]
20:01 <Tim> killed SiteStatsInit queries on db1060 [production]
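Killing a class of runaway queries like this normally means scanning the process list and issuing KILL for matching thread ids. A minimal sketch, assuming direct MySQL access to the slave (connection details are placeholders):

    import pymysql

    # Find running SiteStatsInit queries and kill them by thread id.
    conn = pymysql.connect(host="db1060.example", user="wikiadmin", password="secret")
    with conn.cursor() as cur:
        cur.execute("SHOW FULL PROCESSLIST")
        for row in cur.fetchall():
            thread_id, info = row[0], row[7]  # Id and Info columns
            if info and "SiteStatsInit" in info:
                cur.execute("KILL %d" % thread_id)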
19:57 <tstarling> synchronized php-1.23wmf15/includes/SiteStats.php [production]
19:56 <tstarling> synchronized php-1.23wmf14/includes/SiteStats.php [production]
19:48 <RobH> operations folks are looking into site issues at present [production]
19:38 <greg-g> < paravoid> something that has to do with SiteStatsInit, probably [production]
19:33 <greg-g> < paravoid> it's all plwiki [production]
19:33 <greg-g> < paravoid> tons of SELECT /* SiteStatsInit::edits */ COUNT(*) FROM `revision` LIMIT 1 [production]
19:32 <greg-g> < paravoid> it's s2 [production]
02:08 <LocalisationUpdate> ResourceLoader cache refresh completed at 2014-02-23 02:08:36+00:00 [production]
02:02 <LocalisationUpdate> completed (1.23wmf15) at 2014-02-23 02:02:44+00:00 [production]
02:02 <LocalisationUpdate> completed (1.23wmf14) at 2014-02-23 02:01:56+00:00 [production]
2014-02-21
22:12 <mwalker> updated civicrm from eb3536eb32cbc7400e4e5884d56fbf104e38fc2b to 41dce289bc15ea1ca638c37b29ff2e3e709a2251 for thank you templates [production]
21:40 <bd808> mw1047 and mw1079 errors cleared after apache-graceful [production]
21:29 <mutante> graceful'ing apache on mw1047 and mw1079 by request [production]
21:26 <bd808> mw1047 and mw1079 throwing PHP exceptions that look like APC corruption [production]
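The graceful restarts logged above (21:29) were the remedy; per the 21:40 entry, the exceptions cleared afterwards. A minimal sketch of issuing them, assuming SSH access and Debian-style apache2ctl on the app servers:

    import subprocess

    # Gracefully restart Apache on the two affected app servers.
    for host in ("mw1047", "mw1079"):
        subprocess.check_call(["ssh", host, "sudo", "apache2ctl", "graceful"])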
20:35 <bd808> Finished scap: no-diff scap; recording asciicast (duration: 03m 13s) [production]
20:31 <bd808> Started scap: no-diff scap; recording asciicast [production]
18:57 <catrope> synchronized php-1.23wmf15/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.ViewPageTarget.js 'touch' [production]
18:57 <catrope> synchronized php-1.23wmf15/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js 'touch' [production]
18:57 <catrope> synchronized php-1.23wmf14/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.ViewPageTarget.js 'touch' [production]
18:56 <catrope> synchronized php-1.23wmf14/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js 'touch' [production]
18:55 <bd808> The 4 hosts that failed scap-rebuild-cdbs were snapshot[1234]; can we pull them from the mediawiki-installation dsh group? [production]
18:54 <bd808> Finished scap: no-diff scap to test script changes; expect l10n updates (duration: 13m 38s) [production]
18:54 <bd808> scap-rebuild-cdbs failed on 4 hosts [production]
18:50 <bd808> The 4 hosts that failed scap-1 were snapshot[1234]; all have old/bad python installs [production]
18:49 <bd808> scap-1 failed on 4 hosts [production]
18:41 <bd808> Started scap: no-diff scap to test script changes; expect l10n updates [production]
18:36 <bd808> Forced update of /srv/scap to 6203585 across the cluster [production]
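A hedged sketch of what the forced update to 6203585 amounts to on a single host: fetch and hard-reset the scap checkout, discarding local state. How the update was fanned out across the cluster isn't recorded here, and the remote name is an assumption:

    import subprocess

    # Hard-reset the scap checkout to the logged commit.
    repo = "/srv/scap"
    subprocess.check_call(["git", "fetch", "origin"], cwd=repo)
    subprocess.check_call(["git", "reset", "--hard", "6203585"], cwd=repo)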
18:25 <ottomata> initiating kafka preferred replica election to rebalance partition leaders [production]
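A preferred replica election asks the Kafka controller to hand each partition's leadership back to the first replica in its assignment list, which rebalances leaders after brokers have bounced. A sketch using the stock 0.8-era admin tool (the install path and ZooKeeper connect string are placeholders):

    import subprocess

    # Trigger a preferred replica election for all topics/partitions.
    subprocess.check_call([
        "/opt/kafka/bin/kafka-preferred-replica-election.sh",
        "--zookeeper", "zk1001.example:2181/kafka",
    ])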
18:19 <bblack> cp1054 healthy now, rebuilding persistent cache from scratch there... [production]