2010-11-03
00:30 <apergos1> cleaning up space on other squids with a full /: sq42 [production]
2010-11-02
23:22 <apergos> same story on sq50, cleared out some space, tried upping that to 300 but started seeing "TCP connection to 208.80.152.156 (208.80.152.156:80) failed" in the logs so backed off to 200 [production]
23:13 <apergos> trying to adjust max-conn on sq49 for conns to ms4... tried 200, it maxed out. trying 300 now... [production]
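For context: squid caps concurrent connections to a parent with the max-conn= option on its cache_peer line. A rough sketch of the kind of directive being tuned here; the address is taken from the entry above, everything else is an assumption rather than the live config:

    cache_peer 208.80.152.156 parent 80 0 no-query originserver max-conn=200 name=ms4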
23:08 <apergos> hupped squid on sq49, restarted syslog, / was full from "Failed to select source" errors, cleared out some space [production]
23:08 <tfinc> synchronized php-1.5/wmf-config/CommonSettings.php 'Updating sidebar links' [production]
22:40 <apergos> added in the amssq47 through amssq62 to /etc/squid/cachemgr.conf on fenari [production]
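A sketch of how those entries could be appended; cachemgr.conf takes one "host:port" entry per line, but the domain suffix and port here are assumptions:

    for i in $(seq 47 62); do
        echo "amssq${i}.esams.wikimedia.org:80" >> /etc/squid/cachemgr.conf
    done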
19:48 <RobH> torrus back online [production]
19:44 <RobH> following procedure on wikitech to fix torrus [production]
16:46 <RobH> sq42 & sq44 behaving normally now, cleaning cache on sq48 and killing squid for restart as it is flapping and at high load, due to earlier nfs issue [production]
16:38 <RobH> restarting and cleaning backend squid on sq44 and sq42 which were complaining in lvs [production]
16:35 <RobH> sq43 was flapping since the nfs mount on ms4 was borked. restarted it [production]
16:07 <apergos> NFSD_SERVERS=2048 in /etc/default on ms4 [production]
16:06 <apergos> note that the variable rpcmod:cotsmaxdupreqs has been changed to 2048 in /etc/system, and [production]
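Spelled out, assuming ms4 is a Solaris host and that "/etc/default" above means /etc/default/nfs:

    /etc/system:       set rpcmod:cotsmaxdupreqs = 2048    (takes effect at the next boot)
    /etc/default/nfs:  NFSD_SERVERS=2048                   (maximum number of nfsd threads)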
15:54 <apergos> hard reset on ms4, reboot was not getting the job done [production]
15:47 <apergos> rebooting ms4, nfsd hung and couldn't be restarted or killed. [production]
14:04 <RobH> restarted pdns on linne due to crash from authdns update [production]
14:02 <RobH> updated dns with new mgmt entries for payments, owasrvs, and owadbs [production]
03:45 <domas> added srv193 back to apaches pool on lvs [production]
2010-11-01
23:55 <tfinc> synchronized php-1.5/extensions/CentralNotice/SpecialBannerController.php 'Picking up fixes for Bug #25564' [production]
23:54 <tfinc> synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up fixes for r25564' [production]
20:43 <domas> ms4 mildly loaded (disks go to >100 IO/s each) throwing nfs timeouts, I bumped up NFSD_SERVERS to 2048 [production]
19:05 <Ryan_Lane> powercycling srv207 [production]
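Power cycles of an unresponsive host like this typically go through the machine's out-of-band management controller rather than the OS; a hedged example assuming an IPMI-capable controller, with the management address left as a placeholder:

    ipmitool -I lanplus -H <srv207 mgmt address> -U root -a chassis power cycle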
16:18 <RoanKattouw> Something weird's going on with srv207: Nagios says its SSH is up but it times out on SSH from fenari [production]
16:15 <catrope> synchronized php-1.5/includes/api/ApiBase.php 'r75798' [production]
2010-10-31
17:21 <catrope> synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 25719 - Add missing slash in timezone' [production]
2010-10-30
23:05 <apergos> test of logging (sorry) [production]
21:22 <mark> Deploying a sudoers file for NRPE using Puppet [production]
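A minimal sketch of the sort of Puppet resource involved; the path, source location and file contents are assumptions, not the real manifest:

    file { "/etc/sudoers.d/nrpe":
        owner  => "root",
        group  => "root",
        mode   => "0440",
        source => "puppet:///files/sudo/sudoers.nrpe",
    }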
20:48 <mark> Running apt-get upgrade on db17 [production]
20:48 <mark> Pushed updated wikimedia-raid-utils package into the APT repository, with a newer arcconf that should work on Lucid [production]
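A hedged sketch of what pushing the .deb can look like, assuming a reprepro-managed repository; the base directory, distribution name and filename are placeholders:

    reprepro -b /srv/aptrepo includedeb lucid-wikimedia wikimedia-raid-utils_*_all.deb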
15:53 <atglenn> powercycled mobile2, it was unresponsive to ssh and pings, ganglia showed no activity [production]
03:05 <domas> ms1 can't snapshot either, I suspect kernel bugs. we either have to roll back to 2.6.28 or move forward, or actually try rebuilding filesystems from scratch with new kernels... [production]
2010-10-29
23:21 <domas> lol repaired myisam tables on db9, call if data has been lost, hehe [production]
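For reference, a crashed MyISAM table is repaired either per table from the mysql client or in bulk with mysqlcheck; the database and table names below are placeholders:

    mysql -e "REPAIR TABLE <database>.<table>;"
    mysqlcheck --auto-repair --databases <database>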
22:58 <domas> resynced srv154, was running with months old configuration/code. [production]
22:58 <domas> was db22 disabled silently by someone? or not reenabled? :) reenabled now... [production]
22:55 <midom> synchronized php-1.5/wmf-config/db.php [production]
18:33 <apergos> restarted torrus on streber, after reports that it was not responding [production]
17:46 <apergos> domas ran "reset-mysql-slave db18" (from fenari) which clears out *all* old relay logs, and restarts the slaves. [production]
17:34 <apergos> removed some old relay logs from /a/sqldata on db18 to get space back, it was at 95% [production]
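reset-mysql-slave is a local wrapper script, so only its effect is logged above; a hedged manual equivalent that drops all old relay logs without losing the replication position (CHANGE MASTER TO with an explicit position deletes the existing relay logs and starts a new one):

    mysql -h db18 -e "STOP SLAVE"
    mysql -h db18 -e "SHOW SLAVE STATUS\G"    # note Relay_Master_Log_File and Exec_Master_Log_Pos
    mysql -h db18 -e "CHANGE MASTER TO MASTER_LOG_FILE='<file>', MASTER_LOG_POS=<pos>"
    mysql -h db18 -e "START SLAVE"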
15:22 <RoanKattouw> Twitter followers: the entries missing between Sep 2 and today can be viewed at http://identi.ca/wikimediatech [production]
15:22 <RoanKattouw> Re-established identi.ca->Twitter bridge for wikimediatech, broken since September 2 [production]
15:21 <RobH> repaired the sessions table, rt is now happy [production]
15:09 <RobH> rt is being odd, looking into it [production]
14:43 <phuzion> test [production]
2010-10-28
21:34 <RobH> powercycled sq69, ran puppet, it's back online [production]
21:24 <RobH> sq69 is borked, powercycling [production]
17:51 <Ryan_Lane> running checksetup.pl on kaulen for bugzilla [production]
17:50 <Ryan_Lane> running mysqlcheck --auto-repair on the bugzilla database on db9 for the bug_fulltext table [production]
15:23 <atglenn> reenabled logging for fundraising on locke [production]
14:50 <atglenn> I see a lot of ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO) after the reboot of db9... not awake enough to try to look at it; services seem to be running ok [production]
14:46 <atglenn> powercycled db9, it was unreachable by ssh, ganglia showed load and wait_cpu through the roof [production]