2015-11-30
21:55 <mutante> re-wrote l10nupdate cron; restarted cron service on tin [production]
20:05 <apergos> re-enabled puppet on neodymium, minion testing concluded for now [production]
19:47 <gwicke> running `nodetool decommission` on restbase1009 in preparation for the conversion to the multi-instance setup, per https://phabricator.wikimedia.org/T95253# [production]
19:31 <demon@tin> Synchronized wmf-config/InitialiseSettings.php: rm deprecated/unused rate limit log config (duration: 00m 28s) [production]
17:27 <demon@tin> Synchronized php-1.27.0-wmf.7/extensions/WikimediaMaintenance/: need maint script everywhere (duration: 00m 28s) [production]
16:51 <thcipriani@tin> Synchronized php-1.27.0-wmf.7/extensions/ContentTranslation/modules/draft/ext.cx.draft.js: SWAT: Add some extra information to save failure logging [[gerrit:255956]] (duration: 00m 28s) [production]
16:38 <thcipriani@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable QuickSurveys reader segmentation survey [[gerrit:255448]] (duration: 00m 28s) [production]
16:30 <paravoid> mw1002 service hhvm restart [production]
16:17 <paravoid> rolling back to kernel 3.19 on lvs2001/2/3 [production]
15:29 <paravoid> stopping pybal on lvs2001/2/3 [production]
15:21 <paravoid> switching lvs2004/5/6 traffic back to lvs2001/2/3 [production]
15:13 <paravoid> switching lvs2001/2/3 traffic to lvs2004/5/6 and upgrading kernels [production]
15:12 <_joe_> restarting HHVM on mw1147 too, same reason as mw1114 [production]
15:10 <_joe_> restarting hhvm on mw1114, stuck in __pthread_cond_wait () [folly::EventBase::runInEventBaseThreadAndWait ()], apparently blocked in writing to stdout [production]
15:02 <paravoid> switching traffic from lvs4002 to lvs4004; upgrading lvs4002's kernel [production]
15:02 <paravoid> switching traffic back to lvs4001 [production]
14:57 <paravoid> switching traffic from lvs4001 to lvs4003; upgrading lvs4001's kernel [production]
14:45 <paravoid> switching traffic from lvs3001 to lvs3003; upgrading lvs3001's kernel [production]
14:38 <paravoid> switching traffic back to lvs3002 [production]
14:31 <paravoid> switching traffic from lvs3002 to lvs3004; upgrading lvs3002's kernel [production]
14:07 <bblack> upgrading varnishkafka package on all caches [production]
13:52 <bblack> updating varnishkafka on cp1065 [production]
11:03 <godog> upgrade python-statsd to 3.0.1 in eqiad [production]
10:59 <godog> upgrade python-statsd to 3.0.1 in codfw [production]
10:15 <godog> reenable puppet on graphite1001 [production]
10:10 <paravoid> re-enabling OSPF over cr2-eqiad:xe-5/2/2 <-> cr1-ulsfo:xe-0/0/3.538 [production]
10:09 <paravoid> re-enabling cr2-eqiad:xe-5/2/0 and xe-5/2/1 [production]
10:01 <jynus> performing schema change on db1046 (analytics master) [production]
09:32 <jynus> removing old snapshots from db1046 [production]
06:38 <ori> Restarted statsv on hafnium [production]
02:00 <l10nupdate@tin> LocalisationUpdate failed: git pull of core failed [production]
01:56 <gwicke> started `nodetool cleanup` on restbase1002 to get rid of unnecessary data from earlier 1001 decommission attempt [production]
01:05 <bd808@tin> sync-l10n completed (1.27.0-wmf.7) (duration: 01m 19s) [production]
01:04 <bd808> testing l10n cache rebuild as l10nupdate user (take 2) [production]
00:57 <Krenair> test [production]
00:49 <bd808@tin> sync-l10nupdate completed (1.27.0-wmf.7) (duration: 04m 37s) [production]
00:45 <bd808> testing l10n cache rebuild as l10nupdate user [production]
00:01 <bd808> Tried to update scap to 1879fd4 (Add sync-l10n command for l10nupdate); trebuchet reported 0/483 minions completing fetch and 3/483 minions completing checkout [production]
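A minimal shell sketch of the Cassandra node operations referenced in the 19:47 and 01:56 entries above (decommission prep on restbase1009, cleanup on restbase1002); the host placement comments and the progress checks are assumptions for illustration, not the exact commands that were run:
  # On the node being removed (e.g. restbase1009): stream its data to the
  # rest of the ring and leave the cluster.
  nodetool decommission
  # Monitor streaming progress and ring state from any node.
  nodetool netstats
  nodetool status
  # On remaining nodes (e.g. restbase1002): drop data for token ranges the
  # node no longer owns after ring changes.
  nodetool cleanup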
2015-11-29
21:25 <jynus> importing user.user_touched (s7) from dbstore1002 to sanitarium. s7 lag on labs replicas will be higher for some minutes. [production]
20:51 <jynus> importing user.user_touched (s6) from dbstore1002 to sanitarium. s6 lag on labs replicas will be higher for some minutes. [production]
20:28 <jynus> importing user.user_touched (s5) from dbstore1002 to sanitarium. s5 lag on labs replicas will be higher for some minutes. [production]
19:51 <jynus> importing user.user_touched (s4) from dbstore1002 to sanitarium. s4 lag on labs replicas will be higher for some minutes. [production]
04:50 <gwicke> restarted cassandra on restbase1009 to avoid it running out of disk space; a large compaction (~2TB) was at 80% with only 64G of disk space left [production]
03:01 <YuviPanda> ran chown -R l10nupdate: /var/lib/l10nupdate/mediawiki for Reedy on tin [production]
02:28 <Reedy> l10nupdate failed because some git objects were owned by 997:l10nupdate [production]
02:00 <l10nupdate@tin> LocalisationUpdate failed: git pull of core failed [production]
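A hedged shell sketch of the l10nupdate ownership issue and fix logged at 02:28 and 03:01 above; only the chown line appears in the log, the find check is an illustrative assumption:
  # Locate git objects not owned by the l10nupdate user (illustrative check).
  find /var/lib/l10nupdate/mediawiki/.git -not -user l10nupdate -ls
  # Reset ownership so the l10nupdate user's git pull can succeed (as logged at 03:01).
  chown -R l10nupdate: /var/lib/l10nupdate/mediawiki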
2015-11-28
22:48 <bd808@tin> Synchronized php-1.27.0-wmf.5/cache/l10n: bd808 testing l10nupdate sync-dir using stale branch (duration: 01m 29s) [production]
20:49 <l10nupdate@tin> LocalisationUpdate failed: Failed to sync-dir 'php-1.27.0-wmf.7/cache/l10n' [production]
20:49 <krenair@tin> Synchronized php-1.27.0-wmf.7/cache/l10n: l10nupdate for 1.27.0-wmf.7 (duration: 07m 11s) [production]
20:35 <ori@tin> Synchronized wmf-config/InitialiseSettings.php: Ie33ae3b6a: Increase $wgCopyUploadTimeout to 90 seconds (from default 25) (T118887) (duration: 00m 27s) [production]