2009-01-15
19:31 <brion> stopping lighty again. not sure what the hell is going on, but it seems not to respond to most requests [production]
19:27 <brion> image scalers are still doing wayyy under what they're supposed to, but they are churning some stuff out. not overloaded that i can see... [production]
19:20 <brion> seems to spawn its php-cgi's ok [production]
19:19 <brion> trying to stop lighty to poke at fastcgi again [production]
19:15 <brion> looks like ms1+lighty is successfully serving images, but failing to hit the scaling backends. possible fastcgi buggage [production]
19:12 <brion> started lighty on ms1 a bit ago. not really sure if it's configured right [production]
19:00 <brion> stopping it again. confirmed load spike still going on [production]
18:58 <brion> restarting webserver on ms1, see what happens [production]
18:56 <brion> apache load seems to have dropped back to normal [production]
18:48 <brion> switching stylepath back to upload (should be cached), seeing if that affects apache load [production]
18:40 <brion> switching $wgStylePath to apaches for the moment [production]
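The $wgStylePath switch in the two entries above is a one-line change in MediaWiki's site configuration: it controls which URL the skins' CSS/JS is loaded from, so pointing it at the upload cluster lets the squids cache that traffic instead of the apaches serving it. A minimal sketch of that toggle, assuming a LocalSettings/CommonSettings-style config file; the hostnames and paths here are illustrative assumptions, not taken from this log:

    # Hypothetical sketch of the stylepath toggle (hostnames/paths assumed).
    # Serving skins from the upload URL makes the assets cacheable by the upload squids;
    # a path on the main site sends those requests back to the apaches.
    $wgStylePath = "http://upload.wikimedia.org/skins";   # cached by the upload squids
    # $wgStylePath = "/skins";                            # served directly by the apaches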
18:39 <brion> load dropping on ms1; ping time stabilizing also [production]
18:38 <RobH> sq14, sq15, sq16 back up and serving requests [production]
18:38 <brion> trying stopping/starting webserver on ms1 [production]
18:27 <brion> nfs upload5 is not happy :( [production]
18:27 <brion> some sort of issues w/ media fileserver, we think, perhaps pressure due to some upload squid cache clearing? [production]
18:23 <RobH> sq14-sq16 offline, rebooting and cleaning cache [production]
18:16 <RobH> sq2, sq4, and sq10 were unresponsive and down. Restarted, cleaned cache, and brought back online. [production]
04:32 <Tim> increased squid max post size from 75MB to 110MB so that people can actually upload 100MB files as advertised in the media [production]
2009-01-13
23:32 <Tim> fixed NRPE on db29 [production]
22:56 <Tim> cleaned up binlogs on db1 and ixia [production]
22:54 <brion> poking WP alias on frwiki [[bugzilla:16887]] [production]
21:11 <RobH> setup ganglia on erzurumi [production]
20:42 <brion> setting all pdf generators to use the new server [production]
20:40 <brion> testing pdf gen on erzurumi on testwiki [production]
20:35 <RobH> setup erzurumi for dev testing [production]
20:35 <RobH> some random updates on [[server roles]] to clean it up [production]
19:37 <mark> Restored normal situation, with 14907 -> 43821 traffic downpreffed to HGTN to avoid peering network congestion [production]
18:40 <mark> Retracted outbound announcement to all AMS-IX peers, 16265 and 13030 to force inbound via 1299 [production]
18:25 <mark> Undid any routing changes as they were not having the desired effect [production]
18:14 <mark> Prepended 43821 twice on outgoing announcements to 16265 to make pmtpa-esams path via nycx less attractive [production]
11:38 <Tim> reducing innodb_buffer_pool_size on db19, db21, db22, db29 [production]
09:15 <Tim> restarting mysqld on db23 again [production]
09:09 <Tim> restarting mysqld on db18 again [production]
07:08 <Tim> removed db23 from rotation, since I'm bringing it up soon and it will be lagged [production]
07:02 <Tim> shutting down mysqld on db18 for further mem usage tweak [production]
06:53 <Tim> fixed broken /etc/fstab on db23 via serial console [production]
06:42 <Tim> restarting db23 [production]
00:08 <Tim> repooling db18, has caught up [production]