production SAL

2014-02-24
21:17	<mutante>	restarting parsoid on wtp1002	[production]
20:33	<bd808>	Restarted elasticsearch on logstash1002 in attempt to clear stuck reallocations likely caused by OOM while running recovery	[production]
19:30	<Coren>	rebooting virt1001 (sick, stuck on bad mounts)	[production]
17:06	<bd808>	Logs on logstash1003 show an elasticsearch split-brain starting at 2014-02-23T00:00:12. logstash1001 and logstash1003 both thought they were master. logstash1001 was not responding to logstash1003's requests to become authoritative.	[production]
16:38	<bd808>	Logstash elasticsearch split-brain resulted in loss of all logs for 2014-02-24 from 00:00Z to ~16:30Z	[production]
16:16	<bd808>	Restarted elasticsearch on logstash1001	[production]
03:28	<LocalisationUpdate>	ResourceLoader cache refresh completed at 2014-02-24 03:27:55+00:00	[production]
02:44	<LocalisationUpdate>	completed (1.23wmf15) at 2014-02-24 02:44:44+00:00	[production]
02:31	<LocalisationUpdate>	completed (1.23wmf14) at 2014-02-24 02:31:50+00:00	[production]
01:16	<tstarling>	updated /a/common/php-1.23wmf15 to Gerrit change I268599be9: [1.23wmf15] Make SiteStats (re)initializing more sane	[production]
01:16	<tstarling>	synchronized php-1.23wmf14/includes/SiteStats.php	[production]
2014-02-23
20:29	<Tim>	updated ss_active_users on plwiki master to not be -1	[production]
20:14	<springle>	killed SiteStatsInit queries from both wikiuser and wikiadmin on all s2 slaves	[production]
20:01	<Tim>	killed SiteStatsInit queries on db1060	[production]
19:57	<tstarling>	synchronized php-1.23wmf15/includes/SiteStats.php	[production]
19:56	<tstarling>	synchronized php-1.23wmf14/includes/SiteStats.php	[production]
19:48	<RobH>	operations folks are looking into site issues at present	[production]
19:38	<greg-g>	< paravoid> something that has to do with SiteStatsInit, probably	[production]
19:33	<greg-g>	< paravoid> it's all plwiki	[production]
19:33	<greg-g>	< paravoid> tons of SELECT /* SiteStatsInit::edits */ COUNT(*) FROM `revision` LIMIT 1	[production]
19:32	<greg-g>	< paravoid> it's s2	[production]
02:08	<LocalisationUpdate>	ResourceLoader cache refresh completed at 2014-02-23 02:08:36+00:00	[production]
02:02	<LocalisationUpdate>	completed (1.23wmf15) at 2014-02-23 02:02:44+00:00	[production]
02:02	<LocalisationUpdate>	completed (1.23wmf14) at 2014-02-23 02:01:56+00:00	[production]
2014-02-22
03:16	<LocalisationUpdate>	ResourceLoader cache refresh completed at 2014-02-22 03:16:14+00:00	[production]
02:35	<LocalisationUpdate>	completed (1.23wmf15) at 2014-02-22 02:35:25+00:00	[production]
02:21	<LocalisationUpdate>	completed (1.23wmf14) at 2014-02-22 02:21:50+00:00	[production]
02:07	<Coren>	undid the cert change on the virt0 LDAP; because of the RapidSSL cert, this has subtle impacts in some other places and will need planning.	[production]
01:05	<Coren>	Shutting down LDAP briefly on virt0 for a config switch	[production]
2014-02-21
22:12	<mwalker>	updated civicrm from eb3536eb32cbc7400e4e5884d56fbf104e38fc2b to 41dce289bc15ea1ca638c37b29ff2e3e709a2251 for thank-you templates	[production]
21:40	<bd808>	mw1047 and mw1079 errors cleared after apache-graceful	[production]
21:29	<mutante>	graceful'ing apache on mw1047 and mw1079 by request	[production]
21:26	<bd808>	mw1047 and mw1079 throwing PHP exception that looks like APC corruption	[production]
20:35	<bd808>	Finished scap: no-diff scap; recording asciicast (duration: 03m 13s)	[production]
20:31	<bd808>	Started scap: no-diff scap; recording asciicast	[production]
18:57	<catrope>	synchronized php-1.23wmf15/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.ViewPageTarget.js 'touch'	[production]
18:57	<catrope>	synchronized php-1.23wmf15/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js 'touch'	[production]
18:57	<catrope>	synchronized php-1.23wmf14/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.ViewPageTarget.js 'touch'	[production]
18:56	<catrope>	synchronized php-1.23wmf14/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js 'touch'	[production]
18:55	<bd808>	The 4 hosts that failed scap-rebuild-cdbs were snapshot[1234]; can we pull them from the mediawiki-installation dsh group?	[production]
18:54	<bd808>	Finished scap: no-diff scap to test script changes; expect l10n updates (duration: 13m 38s)	[production]
18:54	<bd808>	scap-rebuild-cdbs failed on 4 hosts	[production]
18:50	<bd808>	The 4 hosts that failed scap-1 were snapshot[1234]; all have old/bad python installs	[production]
18:49	<bd808>	scap-1 failed on 4 hosts	[production]
18:41	<bd808>	Started scap: no-diff scap to test script changes; expect l10n updates	[production]
18:36	<bd808>	Forced update of /srv/scap to 6203585 across cluster	[production]
18:25	<ottomata>	initiating kafka preferred replica election to rebalance partition leaders	[production]
18:19	<bblack>	cp1054 healthy now, rebuilding persistent cache from scratch there...	[production]
15:30	<Jeff_Green>	dist-upgrade and reboot boron	[production]
13:29	<akosiaris>	just resized 208.80.155.64/26 to 208.80.155.64/28. This is the Sandbox1-b-eqiad subnet. dickson.freenode.net needs to have its netmask changed. I will talk with Coren and mutante	[production]