2010-11-04
00:36 <RobH> torrus back online [production]
00:29 <RobH> fixing torrus deadlock, no touchy [production]
00:18 <tomaszf> upped open fd's on loudon to 4096 [production]
00:17 <RobH> kicking srv217 for reinstall [production]
2010-11-03
21:22 <RobH> updated puppet to properly remove memcached from memcached::false entries and removed the host memcached check for servers no longer running memcached, hup'd nagios to take the change [production]
21:21 <atglenn> rebooting ms5 after OS update. note that we were unable to get some of the more recent patches, they are probably from after the sun->oracle transition [production]
21:02 <nimishg> synchronized php-1.5/extensions/LandingCheck/LandingCheck.i18n.php 'r75890' [production]
21:02 <nimishg> synchronized php-1.5/extensions/LandingCheck/LandingCheck.alias.php 'r75890' [production]
21:01 <nimishg> synchronized php-1.5/extensions/LandingCheck/SpecialLandingCheck.php 'r75890' [production]
21:01 <nimishg> synchronized php-1.5/extensions/LandingCheck/LandingCheck.php 'r75890' [production]
20:31 <atglenn> removed about 1.5T of stuff off of /export on ms4 (old backups, solaris isos, etc) [production]
19:41 <catrope> synchronized php-1.5/README 'Dummy sync so I can document what the errors look like' [production]
19:32 <tfinc> synchronized php-1.5/wmf-config/CommonSettings.php 'Backing out config change for stats fix' [production]
19:31 <RobH> srv281 still down, setting to false in pybal just so it doesn't keep trying to use it [production]
18:31 <RobH> reinstalling srv281, tired of lookin at it in red [production]
17:18 <mark> Upgraded storage1 to Lucid [production]
16:42 <mark> Removing 2010-03 snapshots on ms4 [production]
16:01 <mark> Fixed sshd on ms4 [production]
15:46 <mark> Removing 2010-02 snapshots on ms4 [production]
15:45 <mark> Disabled gmetric cron jobs on ms4 [production]
15:43 <mark> Disabled daily snapshot generation on ms4 [production]
15:27 <mark> Restarted gmond on ms4 [production]
15:24 <mark> Upgraded puppet on ms4 [production]
15:13 <mark> Powercycled knsq2 [production]
14:52 <mark> Removing daily snapshots for 2010-10 on ms4 [production]
14:24 <mark> Restored /etc/sudoers file on DB machines butchered by old versions of wikimedia-raid-utils [production]
05:34 <tstarling> synchronized php-1.5/includes/Math.php 'r75909' [production]
04:52 <apergos> oh btw, I notice that when / on the squids fills, we don't see it in ganglia, it must report an aggregate or something. it would sure be nice to get notified. [production]
04:18 <apergos> lather rinse repeat for sq47, I hope that's all of 'em [production]
03:46 <apergos> repeated on sq45... [production]
03:13 <apergos> same old story on sq46... restarted syslog, reloaded squid, got back some space on / [production]
02:41 <apergos> er... and deleted the log file :-P [production]
02:38 <apergos> moved ginormous cache.log out of the way on sq48 and reloaded squid over there since it wasn't done earlier [production]
02:32 <apergos> cleaned up / on sq41, restarted syslog, reloaded squid [production]
00:59 <nimishg> synchronized php-1.5/wmf-config/InitialiseSettings.php [production]
00:53 <nimishg> synchronizing Wikimedia installation... Revision: 75891 [production]
00:33 <apergos1> also 44 and 43 [production]
00:30 <apergos1> cleaning up space on other / full squids: sq42 [production]
2010-11-02
23:22 <apergos> same story on sq50, cleared out some space, tried upping that to 300 but started seeing "TCP connection to 208.80.152.156 (208.80.152.156:80) failed" in the logs, so backed off to 200 [production]
23:13 <apergos> trying adjusting max-conn on sq49 for conns to ms4... tried 200, it maxed out. trying 300 now... [production]
23:08 <apergos> hupped squid on sq49, restarted syslog, / was full from "Failed to select source" errors, cleared out some space [production]
23:08 <tfinc> synchronized php-1.5/wmf-config/CommonSettings.php 'Updating sidebar links' [production]
22:40 <apergos> added in the amssq47 through amssq62 to /etc/squid/cachemgr.conf on fenari [production]
19:48 <RobH> torrus back online [production]
19:44 <RobH> following procedure on wikitech to fix torrus [production]
16:46 <RobH> sq42 & sq44 behaving normally now, cleaning cache on sq48 and killing squid for restart as it is flapping and at high load, due to earlier nfs issue [production]
16:38 <RobH> restarting and cleaning backend squid on sq44 and sq42 which were complaining in lvs [production]
16:35 <RobH> sq43 was flapping since the nfs mount on ms4 was borked. restarted it [production]
16:07 <apergos> NFSD_SERVERS=2048 in /etc/default on ms4 [production]
16:06 <apergos> note that the variables rpcmod:cotsmaxdupreqs has been changed to 2048 in /etc/system, and [production]