production SAL

2009-02-11 §
20:05	<brion>	ixia lagged 8810 secs	[production]
20:00	<brion>	ixia replication is broken -- causing contribs lag on itwiki	[production]
19:19	<RobH>	set up msw-a5-sdtpa like 30 minutes ago, oops ;]	[production]
19:00	<mark>	Added srv190-225 to DNS & DHCP	[production]
18:55	<mark>	set up RANCID for asw-a4-sdtpa and asw-a5-sdtpa	[production]
18:54	<brion>	disabled srv38,39,77,79,80 in lvs3 pybal config to ensure they don't go back into service accidentally until fixed up	[production]
18:37	<brion>	stopping apache on those bad machines for the moment	[production]
18:35	<brion>	srv38, 39, 77, 79, and 80 appear to have been prematurely put into apaches pool, running old version of PHP. need to be halted and upgraded	[production]
17:26	<domas>	restarted apache on srv154 after the deadlock in apc	[production]
16:04	<Tim>	disabled checkers.php hack, using mwsuggest.js hack instead	[production]
15:52	<Tim>	emergency optimisation: disabled search suggest via checkers.php	[production]
15:41	<domas>	srv159 restarted as proper apache, not -DSCALER	[production]
09:02	<domas>	moved morebots to ~morebots@wikitech.wikimedia, startup line in rc.local :)	[production]
09:00	<domas>	tests	[production]
07:06	<Tim>	running maintenance/fixBug17442.php	[production]
06:56	<Tim>	restarted job runners	[production]
04:31	<Tim>	upgraded bugzilla to 3.0.8 with cvs up, and copied in the docs directory from the 3.0.8 tarball	[production]
03:31	<Tim>	gave myself an account on isidore, cleaned up some crap in /srv/org/wikimedia to /srv/org/wikimedia/backup	[production]
02:58	<Tim>	apt-get upgrade on isidore	[production]
2009-02-10 §
23:47	<mark>	Moved upload esams LVS from mint to hawthorn	[production]
23:41	<mark>	Installed a specially compiled LVS Feisty kernel on hawthorn (running Hardy) & rebooted	[production]
22:33	<RobH>	updated mwlib on erzurumi per brion	[production]
22:25	<RobH>	some resets and such on searchidx1 to get ssh working. system is very sluggish.	[production]
19:28	<brion>	wikitech server crashed; CPU pegged and OOM. rob rebooted it, yay	[production]
02:46	<Tim>	running maintenance/fixBug17300.php to create missing redirect table entries	[production]
01:18	<Tim>	reverted PP caching patch	[production]
01:14	<Tim>	re-enabled search suggestions	[production]
2009-02-09 §
23:13	<domas>	grunt session finished	[production]
23:10	<domas>	brought up srv80 from hibernation and made it work.	[production]
22:53	<domas>	added srv61 too	[production]
22:23	<domas>	added srv144 and srv147 to duty, added ganglia stuff too	[production]
22:01	<domas>	started appserver work on srv77,srv79	[production]
21:54	<domas>	started srv35,38,49 as appservers, restarted deadlocked srv49 processes	[production]
16:14	<mark>	Moved upload LVS back from hawthorn to mint - even an optimized 2.6.24 kernel is not fast enough to serve upload LVS	[production]
16:03	<Tim>	disabled search suggest as an emergency optimisation measure	[production]
16:02	<mark>	Rebooted hawthorn with an LVS optimized kernel, moved upload LVS back to it	[production]
15:53	<mark>	Moved upload esams LVS back to mint	[production]
15:37	<mark>	Moved upload.esams LVS from mint to hawthorn	[production]
15:28	<mark>	Reinstalled server hawthorn with Hardy 8.04	[production]
13:55	<domas>	fixed ganglia group for srv159 (it is scaler, not appserv)	[production]
13:51	<domas>	brought srv182 up	[production]
13:32	<domas>	repooled srv104 and srv105, after a few months of vacation	[production]
13:20	<domas>	killed a few orphaned tidy processes that had been very, very busy since Feb 1	[production]
13:13	<domas>	heeheee, extorted this: [15:11] <rainman-sr> so, srv77,79,80, rose, coronelli and maurus could be converted to apaches	[production]
12:36	<Tim>	trying apc.localcache=1 on srv176	[production]
04:27	<Tim>	patching in r46936	[production]
03:48	<Tim>	attempting to reproduce APC lock contention on srv188	[production]
2009-02-08 §
22:43	<brion>	may or may not have fixed that -- log file was unwritable. hard to test the command since 'su' bitches about apache not being loginable on hume :P	[production]
22:39	<brion>	investigating why centralnotice update is still broken. getting fatal php errors wtf?	[production]
20:17	<domas>	we were hitting APC lock contention after some CPU peak. Dear Ops Team, please upgrade to APC with localcache support. :)))))	[production]