2009-01-15

  | 21:16 | <brion> | seems magically better now | [production] |
  | 20:48 | <brion> | ok webserver7 started | [production] |
  | 20:43 | <brion> | per mark's recommendation, retrying webserver7 now that we've reduced hit rate and are past peak... | [production] |
  | 20:28 | <brion> | bumping styles back to apaches | [production] |
  | 20:25 | <brion> | restarted w/ some old server config bits commented out | [production] |
  | 20:24 | <brion> | tom recompiled lighty w/ the solaris bug patch. may or may not be workin' better, but still not throwing a lot of reqs through. checking config... | [production] |
  | 19:48 | <brion> | trying webserver7 again to see if it's still doing the funk and if we can measure something useful | [production] |
  | 19:47 | <brion> | we're gonna poke around http://redmine.lighttpd.net/issues/show/673 but we're really not sure what the original problem was to begin with yet | [production] |
  | 19:39 | <brion> | turning lighty back on, gonna poke it some more | [production] |
  | 19:31 | <brion> | stopping lighty again. not sure what the hell is going on, but it seems not to respond to most requests | [production] |
  | 19:27 | <brion> | image scalers are still doing wayyy under what they're supposed to, but they are churning some stuff out. not overloaded that i can see... | [production] |
  | 19:20 | <brion> | seems to spawn its php-cgi's ok | [production] |
  | 19:19 | <brion> | trying to stop lighty to poke at fastcgi again | [production] |
  | 19:15 | <brion> | looks like ms1+lighty is successfully serving images, but failing to hit the scaling backends. possible fastcgi buggage | [production] |
  | 19:12 | <brion> | started lighty on ms1 a bit ago. not really sure if it's configured right | [production] |
  | 19:00 | <brion> | stopping it again. confirmed load spike still going on | [production] |
  | 18:58 | <brion> | restarting webserver on ms1, see what happens | [production] |
  | 18:56 | <brion> | apache load seems to have dropped back to normal | [production] |
  | 18:48 | <brion> | switching stylepath back to upload (should be cached), seeing if that affects apache load | [production] |
  | 18:40 | <brion> | switching $wgStylePath to apaches for the moment | [production] |
  | 18:39 | <brion> | load dropping on ms1; ping time stabilizing also | [production] |
  | 18:38 | <RobH> | sq14, sq15, sq16 back up and serving requests | [production] |
  | 18:38 | <brion> | trying stopping/starting webserver on ms1 | [production] |
  | 18:27 | <brion> | nfs upload5 is not happy :( | [production] |
  | 18:27 | <brion> | some sort of issues w/ media fileserver, we think, perhaps pressure due to some upload squid cache clearing? | [production] |
  | 18:23 | <RobH> | sq14-sq16 offline, rebooting and cleaning cache | [production] |
  | 18:16 | <RobH> | sq2, sq4, and sq10 were unresponsive and down. Restarted, cleaned cache, and brought back online. | [production] |
  | 04:32 | <Tim> | increased squid max post size from 75MB to 110MB so that people can actually upload 100MB files as advertised in the media | [production] | 
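
Note on the 04:32 entry: raising the squid POST limit is a small config change. A minimal sketch, assuming the limit is enforced via squid.conf's request_body_max_size directive; the directive placement and value format here are an illustration, not a copy of the production config:

    # squid.conf -- illustrative sketch only
    # Allow POST bodies somewhat above the advertised 100 MB upload limit
    # (was 75 MB, which rejected large uploads before they reached MediaWiki).
    request_body_max_size 110 MB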
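
Note on the 18:40/18:48 entries: $wgStylePath is the MediaWiki setting that controls where skin CSS/JS is loaded from, so pointing it at the apaches (uncached) versus the upload cluster (squid-cached) is a one-line settings change. A minimal sketch, assuming the usual LocalSettings/CommonSettings mechanism; the paths and hostname below are hypothetical, not the values actually used:

    // hypothetical settings snippet, for illustration only
    // 18:40 -- serve skin assets straight from the apaches for the moment:
    $wgStylePath = '/skins-1.5';
    // 18:48 -- switch back to the upload host, which should be squid-cached:
    // $wgStylePath = 'http://upload.wikimedia.org/skins-1.5';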
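
Note on the 19:12-20:25 entries: the 19:15/19:20 lines suggest lighttpd ("lighty") on ms1 serves stored images directly and reaches the thumbnailing scalers through mod_fastcgi-managed php-cgi processes. A minimal sketch of what such a fastcgi.server block can look like; whether the backends are local or remote, plus the URL prefix, socket path and process counts, are assumptions rather than the real ms1 config:

    # lighttpd.conf -- illustrative sketch only
    server.modules += ( "mod_fastcgi" )

    # Hand thumbnail requests to spawned php-cgi workers.
    fastcgi.server = ( "/w/thumb.php" =>
        (( "socket"          => "/tmp/php-scaler.sock",
           "bin-path"        => "/usr/bin/php-cgi",
           "max-procs"       => 4,
           "bin-environment" => ( "PHP_FCGI_CHILDREN" => "8" ) ))
    )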
            
  
2009-01-13

  | 23:32 | <Tim> | fixed NRPE on db29 | [production] |
  | 22:56 | <Tim> | cleaned up binlogs on db1 and ixia | [production] |
  | 22:54 | <brion> | poking WP alias on frwiki [[bugzilla:16887]] | [production] |
  | 21:11 | <RobH> | setup ganglia on erzurumi | [production] |
  | 20:42 | <brion> | setting all pdf generators to use the new server | [production] |
  | 20:40 | <brion> | testing pdf gen on erzurumi on testwiki | [production] |
  | 20:35 | <RobH> | setup erzurumi for dev testing | [production] |
  | 20:35 | <RobH> | some random updates on [[server roles]] to clean it up | [production] |
  | 19:37 | <mark> | Restored normal situation, with 14907 -> 43821 traffic downpreffed to HGTN to avoid peering network congestion | [production] |
  | 18:40 | <mark> | Retracted outbound announcement to all AMS-IX peers, 16265 and 13030 to force inbound via 1299 | [production] |
  | 18:25 | <mark> | Undid any routing changes as they were not having the desired effect | [production] |
  | 18:14 | <mark> | Prepended 43821 twice on outgoing announcements to 16265 to make pmtpa-esams path via nycx less attractive | [production] |
  | 11:38 | <Tim> | reducing innodb_buffer_pool_size on db19, db21, db22, db29 | [production] |
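
Note on the 11:38 entry: on the MySQL versions in use at the time, innodb_buffer_pool_size cannot be changed at runtime, so reducing it means editing the server config and restarting mysqld on each of db19, db21, db22 and db29. A minimal sketch; the file path and the new size are assumptions, not the values actually chosen:

    # /etc/my.cnf -- illustrative value only
    [mysqld]
    # Reduced to leave more headroom for the OS, connection buffers, etc.
    innodb_buffer_pool_size = 8G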
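
Note on the 22:56 entry: clearing old binary logs on a master is typically done with PURGE MASTER LOGS, after confirming via SHOW SLAVE STATUS on each replica that no slave still needs the files being removed. A minimal sketch; the log file name is made up:

    -- illustrative only; run against db1 / ixia as appropriate
    SHOW MASTER LOGS;
    PURGE MASTER LOGS TO 'db1-bin.000123';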
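
Note on the 18:14 entry: AS-path prepending pads an announcement with extra copies of your own AS number so the remote network (here AS 16265) sees the path as longer and prefers another route. A Cisco-style sketch of prepending AS 43821 twice on announcements toward 16265; the platform, neighbor address and route-map name are all assumptions:

    ! illustrative configuration, not the actual router config
    route-map TO-AS16265-OUT permit 10
     set as-path prepend 43821 43821
    !
    router bgp 43821
     neighbor 192.0.2.1 remote-as 16265
     neighbor 192.0.2.1 route-map TO-AS16265-OUT out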