production SAL

1-50 of 695 results (2ms)

2009-01-20 §
21:45	<RobH>	killed some run away processes on db9 that were killing bugzilla	[production]
21:44	<brion>	stock long queries on bz again. got rob poking em	[production]
20:31	<brion>	putting $wgEnotifUseJobQ back for now. change postdates some of the spikes i'm seeing, but it'll be easier to not have to consider it	[production]
20:19	<mark>	Upgraded kernel to 2.6.24-22 on sq22	[production]
19:57	<brion>	disabling $wgEnotifUseJobQ since the lag is ungodly	[production]
17:58	<JeLuF>	db2 overloaded, error messages about unreachable DB server have been supported. Nearly all connections on DB2 are in status "Sleep"	[production]
17:21	<JeLuF>	srv154 is reachable again, current load average is 25, no obvious CPU consuming processes visible	[production]
17:10	<JeLuF>	srv154 went down. Replaced its memcached by srv144's memcached	[production]
03:02	<brion>	syncing InitialiseSettings -- reenabling CentralNotice which we'd taken temporarily out during the upload breakage	[production]
01:50	<Tim>	exim4 on lily died while I examined reports of breakage, restarted it	[production]
2009-01-19 §
21:28	<mark>	Distribution upgrade on lily complete	[production]
21:28	<mark>	Letting mail through again on lily	[production]
21:01	<JeLuF>	Bugzilla didn't work. Some long-running (>3h) requests were locking some tables. Killed all long running jobs.	[production]
20:05	<mark>	Put mail delivery on hold on lily	[production]
20:03	<mark>	Upgrading lily (Mailing list server) to Ubuntu 8.04 Hardy	[production]
14:04	<mark>	Set a static ARP entry for 85.17.163.246 on csw1-esams to see if it helps with the inbound packet loss effects	[production]
2009-01-18 §
20:25	<mark>	Cut outbound announcements to AS16265 to counter the inbound packet loss on that link	[production]
17:57	<river>	started copying ms1:/export/upload to ms4	[production]
00:21	<Tim>	restarted apache on srv158,srv177,srv106,srv66,srv109,srv140,srv86,srv90,srv133,srv172	[production]
00:19	<Tim>	cleaned up binlogs on db1	[production]
2009-01-17 §
12:43	<mark>	Shut down transit link to 16265 due to intermittent packet loss	[production]
2009-01-16 §
23:25	<brion>	activating Drafts extension on testwiki	[production]
21:18	<brion>	updating english/default wikibooks logo [[bugzilla:17034]]	[production]
19:50	<brion>	uncommented srv101 from apache nodelist	[production]
19:41	<mark>	Fixed authentication on srv101, and mounted /mnt/upload5	[production]
19:25	<brion>	srv101 is commented out of 'apaches' node group so didn't show up on my earlier sweep	[production]
19:23	<brion>	poking around, srv101 at least is missing upload5 mount still	[production]
2009-01-15 §
21:16	<brion>	seems magically better now	[production]
20:48	<brion>	ok webserver7 started	[production]
20:43	<brion>	per mark's recommendation, retrying webserver7 now that we've reduced hit rate and are past peak...	[production]
20:28	<brion>	bumping styles back to apaches	[production]
20:25	<brion>	restarted w/ some old server config bits commented out	[production]
20:24	<brion>	tom recompiled lighty w/ the solaris bug patch. may or may not be workin' better, but still not throwing a lot of reqs through. checking config...	[production]
19:48	<brion>	trying webserver7 again to see if it's still doing the funk and if we can measure something useful	[production]
19:47	<brion>	we're gonna poke around http://redmine.lighttpd.net/issues/show/673 but we're really not sure what the original problem was to begin with yet	[production]
19:39	<brion>	turning lighty back on, gonna poke it some more	[production]
19:31	<brion>	stopping lighty again. not sure what the hell is going on, but it seems not to respond to most requests	[production]
19:27	<brion>	image scalers are still doing wayyy under what they're supposed to, but they are churning some stuff out. not overloaded that i can see...	[production]
19:20	<brion>	seems to spawn its php-cgi's ok	[production]
19:19	<brion>	trying to stop lighty to poke at fastcgi again	[production]
19:15	<brion>	looks like ms1+lighty is successfully serving images, but failing to hit the scaling backends. possible fastcgi buggage	[production]
19:12	<brion>	started lighty on ms1 a bit ago. not realyl sure if it's configured right	[production]
19:00	<brion>	stopping it again. confirmed load spike still going on	[production]
18:58	<brion>	restarting webserver on ms1, see what happens	[production]
18:56	<brion>	apache load seems to have dropped back to normal	[production]
18:48	<brion>	switching stylepath back to upload (should be cached), seeing if that affects apache load	[production]
18:40	<brion>	switching $wgStylePath to apaches for the moment	[production]
18:39	<brion>	load dropping on ms1; ping time stabilizing also	[production]
18:38	<RobH>	sq14, sq15, sq16 back up and serving requests	[production]
18:38	<brion>	trying stopping/starting webserver on ms1	[production]