production SAL

2010-11-03 §
14:52	<mark>	Removing daily snapshots for 2010-10 on ms4	[production]
14:24	<mark>	Restored /etc/sudoers file on DB machines butchered by old versions of wikimedia-raid-utils	[production]
05:34	<tstarling>	synchronized php-1.5/includes/Math.php 'r75909'	[production]
04:52	<apergos>	oh btw, I notice that when / on the squids fills, we don't see it in ganglia, it must report an aggregate or something. it would sure be nice to get notified.	[production]
04:18	<apergos>	lather rinse repeat for sq47, I hope that's all of 'em	[production]
03:46	<apergos>	repeated on sq45...	[production]
03:13	<apergos>	same old story on sq46... restarted syslog, reloaded squid, got back some space on /	[production]
02:41	<apergos>	er... and deleted the log file :-P	[production]
02:38	<apergos>	moved ginormous cache.log out of the way on sq48 and reloaded squid over there since it wasn't done earlier	[production]
02:32	<apergos>	cleaned up / on sq41, restarted syslog, reloaded squid	[production]
00:59	<nimishg>	synchronized php-1.5/wmf-config/InitialiseSettings.php	[production]
00:53	<nimishg>	synchronizing Wikimedia installation... Revision: 75891	[production]
00:33	<apergos1>	also 44 and 43	[production]
00:30	<apergos1>	cleaning up space on other / full squids: sq42	[production]
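
The 04:52 entry above asks for a notification when / fills on a squid, since the aggregate ganglia figure hides it. A minimal sketch of a per-host check follows, assuming a 90% threshold and a plain printed warning; the function names and the threshold are illustrative and not part of any existing Wikimedia tooling.

```python
import shutil


def root_usage_percent(path="/"):
    """Return the percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def needs_alert(percent_used, threshold=90.0):
    """True when usage is at or above the (assumed) alert threshold."""
    return percent_used >= threshold


if __name__ == "__main__":
    pct = root_usage_percent("/")
    if needs_alert(pct):
        print(f"WARNING: / is {pct:.1f}% full")
    else:
        print(f"OK: / is {pct:.1f}% full")
```

Run per host rather than against an aggregate, a check like this would flag a single full root filesystem even when the cluster-wide number looks healthy.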
2010-11-02 §
23:22	<apergos>	same story on sq50, cleared out some space, tried upping max-conn to 300 but started seeing "TCP connection to 208.80.152.156 (208.80.152.156:80) failed" in the logs, so backed off to 200	[production]
23:13	<apergos>	trying adjusting max-conn on sq49 for conns to ms4... tried 200, it maxed out. trying 300 now...	[production]
23:08	<apergos>	hupped squid on sq49, restarted syslog, / was full from "Failed to select source" errors, cleared out some space	[production]
23:08	<tfinc>	synchronized php-1.5/wmf-config/CommonSettings.php 'Updating sidebar links'	[production]
22:40	<apergos>	added in the amssq47 through amssq62 to /etc/squid/cachemgr.conf on fenari	[production]
19:48	<RobH>	torrus back online	[production]
19:44	<RobH>	following procedure on wikitech to fix torrus	[production]
16:46	<RobH>	sq42 & sq44 behaving normally now, cleaning cache on sq48 and killing squid for restart as it is flapping and at high load, due to earlier nfs issue	[production]
16:38	<RobH>	restarting and cleaning backend squid on sq44 and sq42 which were complaining in lvs	[production]
16:35	<RobH>	sq43 was flapping since the nfs mount on ms4 was borked. restarted it	[production]
16:07	<apergos>	NFSD_SERVERS=2048 in /etc/default on ms4	[production]
16:06	<apergos>	note that the variable rpcmod:cotsmaxdupreqs has been changed to 2048 in /etc/system, and	[production]
15:54	<apergos>	hard reset on ms4, reboot was not getting the job done	[production]
15:47	<apergos>	rebooting ms4, nfsd hung and couldn't be restarted or killed.	[production]
14:04	<RobH>	restarted pdns on linne due to crash from authdns update	[production]
14:02	<RobH>	updated dns with new mgmt entries for payments, owasrvs, and owadbs	[production]
03:45	<domas>	added srv193 back to apaches pool on lvs	[production]
2010-11-01 §
23:55	<tfinc>	synchronized php-1.5/extensions/CentralNotice/SpecialBannerController.php 'Picking up fixes for Bug #25564'	[production]
23:54	<tfinc>	synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up fixes for r25564'	[production]
20:43	<domas>	ms4 mildly loaded (disks go to >100i/s each) throwing nfs timeouts, I bumped up NFSD_SERVERS to 2048	[production]
19:05	<Ryan_Lane>	powercycling srv207	[production]
16:18	<RoanKattouw>	Something weird's going on with srv207: Nagios says its SSH is up but it times out on SSH from fenari	[production]
16:15	<catrope>	synchronized php-1.5/includes/api/ApiBase.php 'r75798'	[production]
2010-10-31 §
17:21	<catrope>	synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 25719 - Add missing slash in timezone'	[production]
2010-10-30 §
23:05	<apergos>	test of logging (sorry)	[production]
21:22	<mark>	Deploying a sudoers file for NRPE using Puppet	[production]
20:48	<mark>	Running apt-get upgrade on db17	[production]
20:48	<mark>	Pushed updated wikimedia-raid-utils package into the APT repository, with a newer arcconf that should work on Lucid	[production]
15:53	<atglenn>	powercycled mobile2, it was unresponsive to ssh and pings, ganglia showed no activity	[production]
03:05	<domas>	ms1 can't snapshot either, I suspect kernel bugs. we either have to roll back to 2.6.28 or move forward, or actually try rebuilding filesystems from scratch with new kernels...	[production]
2010-10-29 §
23:21	<domas>	lol repaired myisam tables on db9, call if data has been lost, hehe	[production]
22:58	<domas>	resynced srv154, was running with months old configuration/code.	[production]
22:58	<domas>	was db22 disabled silently by someone? or not reenabled? :) reenabled now...	[production]
22:55	<midom>	synchronized php-1.5/wmf-config/db.php	[production]
18:33	<apergos>	restarted torrus on streber, after reports that it was not responding	[production]
17:46	<apergos>	domas ran "reset-mysql-slave db18" (from fenari) which clears out all old relay logs, and restarts the slaves.	[production]