2010-11-03 §
15:27 <mark> Restarted gmond on ms4 [production]
15:24 <mark> Upgraded puppet on ms4 [production]
15:13 <mark> Powercycled knsq2 [production]
14:52 <mark> Removing daily snapshots for 2010-10 on ms4 [production]
14:24 <mark> Restored /etc/sudoers file on DB machines butchered by old versions of wikimedia-raid-utils [production]
05:34 <tstarling> synchronized php-1.5/includes/Math.php 'r75909' [production]
04:52 <apergos> oh btw, I notice that when / on the squids fills, we don't see it in ganglia, it must report an aggregate or something. it would sure be nice to get notified. [production]
04:18 <apergos> lather rinse repeat for sq47, I hope that's all of 'em [production]
03:46 <apergos> repeated on sq45... [production]
03:13 <apergos> same old story on sq46... restarted syslog, reloaded squid, got back some space on / [production]
02:41 <apergos> er... and deleted the log file :-P [production]
02:38 <apergos> moved ginormous cache.log out of the way on sq48 and reloaded squid over there since it wasn't done earlier [production]
02:32 <apergos> cleaned up / on sq41, restarted syslog, reloaded squid [production]
00:59 <nimishg> synchronized php-1.5/wmf-config/InitialiseSettings.php [production]
00:53 <nimishg> synchronizing Wikimedia installation... Revision: 75891 [production]
00:33 <apergos1> also 44 and 43 [production]
00:30 <apergos1> cleaning up space on other / full squids: sq42 [production]
2010-11-02 §
23:22 <apergos> same story on sq50, cleared out some space, tried upping that to 300 but started seeing TCP connection to 208.80.152.156 (208.80.152.156:80) failed in the logs so backed off to 200 [production]
23:13 <apergos> trying adjusting max-conn on sq49 for conns to ms4... tried 200, it maxed out. trying 300 now... [production]
23:08 <apergos> hupped squid on sq49, restarted syslog, / was full from "Failed to select source" errors, cleared out some space [production]
23:08 <tfinc> synchronized php-1.5/wmf-config/CommonSettings.php 'Updating sidebar links' [production]
22:40 <apergos> added in the amssq47 through amssq62 to /etc/squid/cachemgr.conf on fenari [production]
19:48 <RobH> torrus back online [production]
19:44 <RobH> following procedure on wikitech to fix torrus [production]
16:46 <RobH> sq42 & sq44 behaving normally now, cleaning cache on sq48 and killing squid for restart as it is flapping and at high load, due to earlier nfs issue [production]
16:38 <RobH> restarting and cleaning backend squid on sq44 and sq42 which were complaining in lvs [production]
16:35 <RobH> sq43 was flapping since the nfs mount on ms4 was borked. restarted it [production]
16:07 <apergos> NFSD_SERVERS=2048 in /etc/default on ms4 [production]
16:06 <apergos> note that the variable rpcmod:cotsmaxdupreqs has been changed to 2048 in /etc/system, and [production]
15:54 <apergos> hard reset on ms4, reboot was not getting the job done [production]
15:47 <apergos> rebooting ms4, nfsd hung and couldn't be restarted or killed. [production]
14:04 <RobH> restarted pdns on linne due to crash from authdns update [production]
14:02 <RobH> updated dns with new mgmt entries for payments, owasrvs, and owadbs [production]
03:45 <domas> added srv193 back to apaches pool on lvs [production]
2010-11-01 §
23:55 <tfinc> synchronized php-1.5/extensions/CentralNotice/SpecialBannerController.php 'Picking up fixes for Bug #25564' [production]
23:54 <tfinc> synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up fixes for r25564' [production]
20:43 <domas> ms4 mildly loaded (disks go to >100i/s each) throwing nfs timeouts, I bumped up NFSD_SERVERS to 2048 [production]
19:05 <Ryan_Lane> powercycling srv207 [production]
16:18 <RoanKattouw> Something weird's going on with srv207: Nagios says its SSH is up but it times out on SSH from fenari [production]
16:15 <catrope> synchronized php-1.5/includes/api/ApiBase.php 'r75798' [production]
2010-10-31 §
17:21 <catrope> synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 25719 - Add missing slash in timezone' [production]
2010-10-30 §
23:05 <apergos> test of logging (sorry) [production]
21:22 <mark> Deploying a sudoers file for NRPE using Puppet [production]
20:48 <mark> Running apt-get upgrade on db17 [production]
20:48 <mark> Pushed updated wikimedia-raid-utils package into the APT repository, with a newer arcconf that should work on Lucid [production]
15:53 <atglenn> powercycled mobile2, it was unresponsive to ssh and pings, ganglia showed no activity [production]
03:05 <domas> ms1 can't snapshot either, I suspect kernel bugs. we either have to roll back to 2.6.28 or move forward, or actually try rebuilding filesystems from scratch with new kernels... [production]
2010-10-29 §
23:21 <domas> lol repaired myisam tables on db9, call if data has been lost, hehe [production]
22:58 <domas> resynced srv154, was running with months old configuration/code. [production]
22:58 <domas> was db22 disabled silently by someone? or not reenabled? :) reenabled now... [production]