2009-01-20
17:21 <JeLuF> srv154 is reachable again, current load average is 25, no obvious CPU-consuming processes visible [production]
17:10 <JeLuF> srv154 went down. Replaced its memcached with srv144's memcached [production]
03:02 <brion> syncing InitialiseSettings -- re-enabling CentralNotice, which we'd temporarily taken out during the upload breakage [production]
01:50 <Tim> exim4 on lily died while I examined reports of breakage, restarted it [production]
2009-01-19
21:28 <mark> Distribution upgrade on lily complete [production]
21:28 <mark> Letting mail through again on lily [production]
21:01 <JeLuF> Bugzilla didn't work. Some long-running (>3h) requests were locking some tables. Killed all long-running jobs. [production]
20:05 <mark> Put mail delivery on hold on lily [production]
20:03 <mark> Upgrading lily (Mailing list server) to Ubuntu 8.04 Hardy [production]
14:04 <mark> Set a static ARP entry for 85.17.163.246 on csw1-esams to see if it helps with the inbound packet loss effects [production]
2009-01-18
20:25 <mark> Cut outbound announcements to AS16265 to counter the inbound packet loss on that link [production]
17:57 <river> started copying ms1:/export/upload to ms4 [production]
00:21 <Tim> restarted apache on srv158,srv177,srv106,srv66,srv109,srv140,srv86,srv90,srv133,srv172 [production]
00:19 <Tim> cleaned up binlogs on db1 [production]
2009-01-17
12:43 <mark> Shut down transit link to AS16265 due to intermittent packet loss [production]
2009-01-16
23:25 <brion> activating Drafts extension on testwiki [production]
21:18 <brion> updating english/default wikibooks logo [[bugzilla:17034]] [production]
19:50 <brion> uncommented srv101 from apache nodelist [production]
19:41 <mark> Fixed authentication on srv101, and mounted /mnt/upload5 [production]
19:25 <brion> srv101 is commented out of 'apaches' node group so didn't show up on my earlier sweep [production]
19:23 <brion> poking around, srv101 at least is missing upload5 mount still [production]
2009-01-15
21:16 <brion> seems magically better now [production]
20:48 <brion> ok webserver7 started [production]
20:43 <brion> per mark's recommendation, retrying webserver7 now that we've reduced hit rate and are past peak... [production]
20:28 <brion> bumping styles back to apaches [production]
20:25 <brion> restarted w/ some old server config bits commented out [production]
20:24 <brion> tom recompiled lighty w/ the solaris bug patch. may or may not be workin' better, but still not throwing a lot of reqs through. checking config... [production]
19:48 <brion> trying webserver7 again to see if it's still doing the funk and if we can measure something useful [production]
19:47 <brion> we're gonna poke around http://redmine.lighttpd.net/issues/show/673 but we're really not sure what the original problem was to begin with yet [production]
19:39 <brion> turning lighty back on, gonna poke it some more [production]
19:31 <brion> stopping lighty again. not sure what the hell is going on, but it seems not to respond to most requests [production]
19:27 <brion> image scalers are still doing wayyy under what they're supposed to, but they are churning some stuff out. not overloaded that i can see... [production]
19:20 <brion> seems to spawn its php-cgi's ok [production]
19:19 <brion> trying to stop lighty to poke at fastcgi again [production]
19:15 <brion> looks like ms1+lighty is successfully serving images, but failing to hit the scaling backends. possible fastcgi buggage [production]
19:12 <brion> started lighty on ms1 a bit ago. not really sure if it's configured right [production]
19:00 <brion> stopping it again. confirmed load spike still going on [production]
18:58 <brion> restarting webserver on ms1, see what happens [production]
18:56 <brion> apache load seems to have dropped back to normal [production]
18:48 <brion> switching stylepath back to upload (should be cached), seeing if that affects apache load [production]
18:40 <brion> switching $wgStylePath to apaches for the moment [production]
18:39 <brion> load dropping on ms1; ping time stabilizing also [production]
18:38 <RobH> sq14, sq15, sq16 back up and serving requests [production]
18:38 <brion> trying stopping/starting webserver on ms1 [production]
18:27 <brion> nfs upload5 is not happy :( [production]
18:27 <brion> some sort of issue w/ the media fileserver, we think; perhaps pressure from upload squid cache clearing? [production]
18:23 <RobH> sq14-sq16 offline, rebooting and cleaning cache [production]
18:16 <RobH> sq2, sq4, and sq10 were unresponsive and down. Restarted, cleaned cache, and brought back online. [production]
04:32 <Tim> increased squid max post size from 75MB to 110MB so that people can actually upload 100MB files as advertised in the media [production]
2009-01-14
19:21 <mark> Lowered preference of paths from AS13030 that were learned at NYIIX [production]