2009-02-11
18:35 <brion> srv38, 39, 77, 79, and 80 appear to have been prematurely put into apaches pool, running old version of PHP. need to be halted and upgraded [production]
17:26 <domas> restarted apache on srv154 after the deadlock in apc [production]
16:04 <Tim> disabled checkers.php hack, using mwsuggest.js hack instead [production]
15:52 <Tim> emergency optimisation: disabled search suggest via checkers.php [production]
15:41 <domas> srv159 restarted as proper apache, not -DSCALER [production]
09:02 <domas> moved morebots to ~morebots@wikitech.wikimedia, startup line in rc.local :) [production]
09:00 <domas> tests [production]
07:06 <Tim> running maintenance/fixBug17442.php [production]
06:56 <Tim> restarted job runners [production]
04:31 <Tim> upgraded bugzilla to 3.0.8 with cvs up, and copied in the docs directory from the 3.0.8 tarball [production]
03:31 <Tim> gave myself an account on isidore, cleaned up some crap in /srv/org/wikimedia, moving it to /srv/org/wikimedia/backup [production]
02:58 <Tim> apt-get upgrade on isidore [production]
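The 07:06 entry runs a one-off MediaWiki maintenance script. The exact invocation isn't recorded in the log, but such scripts are typically run from the wiki's install root, once per affected wiki, roughly like this (the install path and wiki name below are placeholders, not taken from the log):

    # Hypothetical invocation; the real paths/wikis used at 07:06 are not in the log.
    cd /usr/local/apache/common                      # assumed MediaWiki install root
    php maintenance/fixBug17442.php --wiki=enwiki    # repeat for each affected wiki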
2009-02-10
23:47 <mark> Moved upload esams LVS from mint to hawthorn [production]
23:41 <mark> Installed a specially compiled LVS Feisty kernel on hawthorn (running Hardy) & rebooted [production]
22:33 <RobH> updated mwlib on erzurumi per brion [production]
22:25 <RobH> some resets and such on searchidx1 to get ssh working. system is very sluggish. [production]
19:28 <brion> wikitech server crashed; CPU pegged and OOM. rob rebooted it, yay [production]
02:46 <Tim> running maintenance/fixBug17300.php to create missing redirect table entries [production]
01:18 <Tim> reverted PP caching patch [production]
01:14 <Tim> re-enabled search suggestions [production]
2009-02-09
23:13 <domas> grunt session finished [production]
23:10 <domas> brought up srv80 from hibernation and made it work. [production]
22:53 <domas> added srv61 too [production]
22:23 <domas> added srv144 and srv147 to duty, added ganglia stuff too [production]
22:01 <domas> started appserver work on srv77,srv79 [production]
21:54 <domas> started srv35,38,49 as appservers, restarted deadlocked srv49 processes [production]
16:14 <mark> Moved upload LVS back from hawthorn to mint - even an optimized 2.6.24 kernel is not fast enough to serve upload LVS [production]
16:03 <Tim> disabled search suggest as an emergency optimisation measure [production]
16:02 <mark> Rebooted hawthorn with an LVS optimized kernel, moved upload LVS back to it [production]
15:53 <mark> Moved upload esams LVS back to mint [production]
15:37 <mark> Moved upload.esams LVS from mint to hawthorn [production]
15:28 <mark> Reinstalled server hawthorn with Hardy 8.04 [production]
13:55 <domas> fixed ganglia group for srv159 (it is scaler, not appserv) [production]
13:51 <domas> brought srv182 up [production]
13:32 <domas> repooled srv104 and srv105, after a few months of vacation [production]
13:20 <domas> killed a few orphaned tidy processes that had been very, very busy since Feb 1 [production]
13:13 <domas> heeheee, extorted this: [15:11] <rainman-sr> so, srv77,79,80, rose, coronelli and maurus could be converted to apaches [production]
12:36 <Tim> trying apc.localcache=1 on srv176 [production]
04:27 <Tim> patching in r46936 [production]
03:48 <Tim> attempting to reproduce APC lock contention on srv188 [production]
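The 12:36 apc.localcache=1 trial and the 03:48 lock-contention debugging come down to flipping one APC ini setting on a single test apache. A minimal sketch of such a one-host trial, assuming the setting goes into a conf.d drop-in and apache is reloaded afterwards (both assumptions, not from the log):

    # Sketch only (paths assumed): enable APC's experimental per-process local
    # cache on the one test host to see whether it sidesteps shared-lock contention.
    echo "apc.localcache = 1" >> /etc/php5/conf.d/apc.ini
    apache2ctl graceful    # reload apache so the new APC setting takes effect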
2009-02-08
22:43 <brion> may or may not have fixed that -- log file was unwritable. hard to test the command since 'su' bitches about apache not being loginable on hume :P [production]
22:39 <brion> investigating why centralnotice update is still broken. getting fatal php errors wtf? [production]
20:17 <domas> we were hitting APC lock contention after some CPU peak. Dear Ops Team, please upgrade to APC with localcache support. :))))) [production]
2009-02-07
22:49 <domas> db17 came up, but it crashed with different symptoms than other boxes, and it was running 2.6.28.1 kernel. might be previous hardware problems resurfacing [production]
21:23 <domas> db17 down [production]
2009-02-06
12:33 <brion> stopped that process since it was taking a while and just saved it as an hourly cronjob. :) logs to /opt/mwlib/var/log/cache-cleaning [production]
12:28 <brion> running mw-serve cache cleanup for files older than 24h [production]
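The hourly cache-cleaning cron itself is not quoted in the log; a plausible minimal version, assuming the mw-serve cache lives under /opt/mwlib/var/cache (that path is a guess) and plain find is sufficient, would look like:

    # Hypothetical /etc/cron.d/mw-serve-cache-clean; the real job is not in the log.
    # Every hour, delete mw-serve cache files untouched for more than 24h and
    # record them in the log file mentioned above.
    0 * * * * root find /opt/mwlib/var/cache -type f -mmin +1440 -print -delete >> /opt/mwlib/var/log/cache-cleaning 2>&1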
2009-02-05
18:19 <brion> put ulimit back with -v 1024000, that's better :D [production]
18:18 <brion> removed the ulimit; was unable to reach server with it in place [production]
18:15 <brion> hacked mw-serve to ulimit -v 102400 on erzurumi, see if this helps with the leaks for now [production]
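The three 2009-02-05 entries describe capping mw-serve's virtual memory with ulimit -v as a stopgap for the leaks. A rough sketch of such a wrapper, assuming mw-serve is launched via a small shell script (the binary path below is a guess based on the /opt/mwlib layout mentioned on 2009-02-06):

    #!/bin/sh
    # Hypothetical wrapper; the real hack on erzurumi is not shown in the log.
    # Cap per-process virtual memory so a leaking renderer dies instead of
    # dragging the box into swap. 102400 KB (18:15) was too tight to even reach
    # the server, so the value settled at 1024000 KB, roughly 1 GB (18:19).
    ulimit -v 1024000
    exec /opt/mwlib/bin/mw-serve "$@"    # placeholder path/invocation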