| 2009-01-15 |
  
    
  | 21:16 | 
  <brion> | 
  seems magically better now | 
  [production] | 
            
  | 20:48 | 
  <brion> | 
  ok webserver7 started | 
  [production] | 
            
  | 20:43 | 
  <brion> | 
  per mark's recommendation, retrying webserver7 now that we've reduced hit rate and are past peak... | 
  [production] | 
            
  | 20:28 | 
  <brion> | 
  bumping styles back to apaches | 
  [production] | 
            
  | 20:25 | 
  <brion> | 
  restarted w/ some old server config bits commented out | 
  [production] | 
            
  | 20:24 | 
  <brion> | 
  tom recompiled lighty w/ the solaris bug patch. may or may not be workin' better, but still not throwing a lot of reqs through. checking config... | 
  [production] | 
            
  | 19:48 | 
  <brion> | 
  trying webserver7 again to see if it's still doing the funk and if we can measure something useful | 
  [production] | 
            
  | 19:47 | 
  <brion> | 
  we're gonna poke around http://redmine.lighttpd.net/issues/show/673 but we're really not sure what the original problem was to begin with yet | 
  [production] | 
            
  | 19:39 | 
  <brion> | 
  turning lighty back on, gonna poke it some more | 
  [production] | 
            
  | 19:31 | 
  <brion> | 
  stopping lighty again. not sure what the hell is going on, but it seems not to respond to most requests | 
  [production] | 
            
  | 19:27 | 
  <brion> | 
  image scalers are still doing wayyy under what they're supposed to, but they are churning some stuff out. not overloaded that i can see... | 
  [production] | 
            
  | 19:20 | 
  <brion> | 
  seems to spawn its php-cgi's ok | 
  [production] | 
            
  | 19:19 | 
  <brion> | 
  trying to stop lighty to poke at fastcgi again | 
  [production] | 
            
  | 19:15 | 
  <brion> | 
  looks like ms1+lighty is successfully serving images, but failing to hit the scaling backends. possible fastcgi buggage | 
  [production] | 
            
  | 19:12 | 
  <brion> | 
  started lighty on ms1 a bit ago. not really sure if it's configured right | 
  [production] | 
            
  | 19:00 | 
  <brion> | 
  stopping it again. confirmed load spike still going on | 
  [production] | 
            
  | 18:58 | 
  <brion> | 
  restarting webserver on ms1, see what happens | 
  [production] | 
            
  | 18:56 | 
  <brion> | 
  apache load seems to have dropped back to normal | 
  [production] | 
            
  | 18:48 | 
  <brion> | 
  switching stylepath back to upload (should be cached), seeing if that affects apache load | 
  [production] | 
            
  | 18:40 | 
  <brion> | 
  switching $wgStylePath to apaches for the moment | 
  [production] | 
            
  | 18:39 | 
  <brion> | 
  load dropping on ms1; ping time stabilizing also | 
  [production] | 
            
  | 18:38 | 
  <RobH> | 
  sq14, sq15, sq16 back up and serving requests | 
  [production] | 
            
  | 18:38 | 
  <brion> | 
  trying stopping/starting webserver on ms1 | 
  [production] | 
            
  | 18:27 | 
  <brion> | 
  nfs upload5 is not happy :( | 
  [production] | 
            
  | 18:27 | 
  <brion> | 
  some sort of issues w/ media fileserver, we think, perhaps pressure due to some upload squid cache clearing? | 
  [production] | 
            
  | 18:23 | 
  <RobH> | 
  sq14-sq16 offline, rebooting and cleaning cache | 
  [production] | 
            
  | 18:16 | 
  <RobH> | 
  sq2, sq4, and sq10 were unresponsive and down.  Restarted, cleaned cache, and brought back online. | 
  [production] | 
            
  | 04:32 | 
  <Tim> | 
  increased squid max post size from 75MB to 110MB so that people can actually upload 100MB files as advertised in the media | 
  [production] |