2009-01-21
10:29 <domas> db28 powered down because of temperature reading over threshold (45C???) [production]
2009-01-20
21:45 <RobH> killed some runaway processes on db9 that were killing bugzilla [production]
21:44 <brion> stuck long queries on bz again. got rob poking em [production]
20:31 <brion> putting $wgEnotifUseJobQ back for now. change postdates some of the spikes i'm seeing, but it'll be easier to not have to consider it [production]
20:19 <mark> Upgraded kernel to 2.6.24-22 on sq22 [production]
19:57 <brion> disabling $wgEnotifUseJobQ since the lag is ungodly [production]
17:58 <JeLuF> db2 overloaded, error messages about unreachable DB server have been reported. Nearly all connections on DB2 are in status "Sleep" [production]
17:21 <JeLuF> srv154 is reachable again, current load average is 25, no obvious CPU consuming processes visible [production]
17:10 <JeLuF> srv154 went down. Replaced its memcached by srv144's memcached [production]
03:02 <brion> syncing InitialiseSettings -- re-enabling CentralNotice, which we'd temporarily taken out during the upload breakage [production]
01:50 <Tim> exim4 on lily died while I examined reports of breakage, restarted it [production]
2009-01-19
21:28 <mark> Distribution upgrade on lily complete [production]
21:28 <mark> Letting mail through again on lily [production]
21:01 <JeLuF> Bugzilla didn't work. Some long-running (>3h) requests were locking some tables. Killed all long running jobs. [production]
20:05 <mark> Put mail delivery on hold on lily [production]
20:03 <mark> Upgrading lily (Mailing list server) to Ubuntu 8.04 Hardy [production]
14:04 <mark> Set a static ARP entry for 85.17.163.246 on csw1-esams to see if it helps with the inbound packet loss effects [production]
2009-01-18
20:25 <mark> Cut outbound announcements to AS16265 to counter the inbound packet loss on that link [production]
17:57 <river> started copying ms1:/export/upload to ms4 [production]
00:21 <Tim> restarted apache on srv158,srv177,srv106,srv66,srv109,srv140,srv86,srv90,srv133,srv172 [production]
00:19 <Tim> cleaned up binlogs on db1 [production]
2009-01-17
12:43 <mark> Shut down transit link to AS16265 due to intermittent packet loss [production]
2009-01-16
23:25 <brion> activating Drafts extension on testwiki [production]
21:18 <brion> updating English/default wikibooks logo [[bugzilla:17034]] [production]
19:50 <brion> uncommented srv101 from apache nodelist [production]
19:41 <mark> Fixed authentication on srv101, and mounted /mnt/upload5 [production]
19:25 <brion> srv101 is commented out of 'apaches' node group so didn't show up on my earlier sweep [production]
19:23 <brion> poking around, srv101 at least is missing upload5 mount still [production]
2009-01-15
21:16 <brion> seems magically better now [production]
20:48 <brion> ok webserver7 started [production]
20:43 <brion> per mark's recommendation, retrying webserver7 now that we've reduced hit rate and are past peak... [production]
20:28 <brion> bumping styles back to apaches [production]
20:25 <brion> restarted w/ some old server config bits commented out [production]
20:24 <brion> tom recompiled lighty w/ the solaris bug patch. may or may not be workin' better, but still not throwing a lot of reqs through. checking config... [production]
19:48 <brion> trying webserver7 again to see if it's still doing the funk and if we can measure something useful [production]
19:47 <brion> we're gonna poke around http://redmine.lighttpd.net/issues/show/673 but we're really not sure what the original problem was to begin with yet [production]
19:39 <brion> turning lighty back on, gonna poke it some more [production]
19:31 <brion> stopping lighty again. not sure what the hell is going on, but it seems not to respond to most requests [production]
19:27 <brion> image scalers are still doing wayyy under what they're supposed to, but they are churning some stuff out. not overloaded that i can see... [production]
19:20 <brion> seems to spawn its php-cgi's ok [production]
19:19 <brion> trying to stop lighty to poke at fastcgi again [production]
19:15 <brion> looks like ms1+lighty is successfully serving images, but failing to hit the scaling backends. possible fastcgi buggage [production]
19:12 <brion> started lighty on ms1 a bit ago. not really sure if it's configured right [production]
19:00 <brion> stopping it again. confirmed load spike still going on [production]
18:58 <brion> restarting webserver on ms1, see what happens [production]
18:56 <brion> apache load seems to have dropped back to normal [production]
18:48 <brion> switching stylepath back to upload (should be cached), seeing if that affects apache load [production]
18:40 <brion> switching $wgStylePath to apaches for the moment [production]
18:39 <brion> load dropping on ms1; ping time stabilizing also [production]
18:38 <RobH> sq14, sq15, sq16 back up and serving requests [production]