production SAL

2010-11-04 §
00:18	<tomaszf>	upped open fd's on loudon to 4096	[production]
00:17	<RobH>	kicking srv217 for reinstall	[production]
2010-11-03 §
21:22	<RobH>	updated puppet to properly remove memcached from memcached::false entries and removed the host memcached check for servers no longer running memcached, hup'd nagios to take the change	[production]
21:21	<atglenn>	rebooting ms5 after OS update. note that we were unable to get some of the more recent patches, they are probably from after the sun->oracle transition	[production]
21:02	<nimishg>	synchronized php-1.5/extensions/LandingCheck/LandingCheck.i18n.php 'r75890'	[production]
21:02	<nimishg>	synchronized php-1.5/extensions/LandingCheck/LandingCheck.alias.php 'r75890'	[production]
21:01	<nimishg>	synchronized php-1.5/extensions/LandingCheck/SpecialLandingCheck.php 'r75890'	[production]
21:01	<nimishg>	synchronized php-1.5/extensions/LandingCheck/LandingCheck.php 'r75890'	[production]
20:31	<atglenn>	removed about 1.5T of stuff off of /export on ms4 (old backups, solaris isos, etc)	[production]
19:41	<catrope>	synchronized php-1.5/README 'Dummy sync so I can document what the errors look like'	[production]
19:32	<tfinc>	synchronized php-1.5/wmf-config/CommonSettings.php 'Backing out config change for stats fix'	[production]
19:31	<RobH>	srv281 still down, setting to false in pybal just so it doesn't keep trying to use it	[production]
18:31	<RobH>	reinstalling srv281, tired of lookin at it in red	[production]
17:18	<mark>	Upgraded storage1 to Lucid	[production]
16:42	<mark>	Removing 2010-03 snapshots on ms4	[production]
16:01	<mark>	Fixed sshd on ms4	[production]
15:46	<mark>	Removing 2010-02 snapshots on ms4	[production]
15:45	<mark>	Disabled gmetric cron jobs on ms4	[production]
15:43	<mark>	Disabled daily snapshot generation on ms4	[production]
15:27	<mark>	Restarted gmond on ms4	[production]
15:24	<mark>	Upgraded puppet on ms4	[production]
15:13	<mark>	Powercycled knsq2	[production]
14:52	<mark>	Removing daily snapshots for 2010-10 on ms4	[production]
14:24	<mark>	Restored /etc/sudoers file on DB machines butchered by old versions of wikimedia-raid-utils	[production]
05:34	<tstarling>	synchronized php-1.5/includes/Math.php 'r75909'	[production]
04:52	<apergos>	oh btw, I notice that when / on the squids fills, we don't see it in ganglia, it must report an aggregate or something. it would sure be nice to get notified.	[production]
04:18	<apergos>	lather rinse repeat for sq47, I hope that's all of 'em	[production]
03:46	<apergos>	repeated on sq45...	[production]
03:13	<apergos>	same old story on sq46... restarted syslog, reloaded squid, got back some space on /	[production]
02:41	<apergos>	er... and deleted the log file :-P	[production]
02:38	<apergos>	moved ginormous cache.log out of the way on sq48 and reloaded squid over there since it wasn't done earlier	[production]
02:32	<apergos>	cleaned up / on sq41, restarted syslog, reloaded squid	[production]
00:59	<nimishg>	synchronized php-1.5/wmf-config/InitialiseSettings.php	[production]
00:53	<nimishg>	synchronizing Wikimedia installation... Revision: 75891	[production]
00:33	<apergos1>	also 44 and 43	[production]
00:30	<apergos1>	cleaning up space on other / full squids: sq42	[production]
2010-11-02 §
23:22	<apergos>	same story on sq50, cleared out some space, tried upping that to 300 but started seeing "TCP connection to 208.80.152.156 (208.80.152.156:80) failed" in the logs, so backed off to 200	[production]
23:13	<apergos>	trying adjusting max-conn on sq49 for conns to ms4... tried 200, it maxed out. trying 300 now...	[production]
23:08	<apergos>	hupped squid on sq49, restarted syslog, / was full from "Failed to select source" errors, cleared out some space	[production]
23:08	<tfinc>	synchronized php-1.5/wmf-config/CommonSettings.php 'Updating sidebar links'	[production]
22:40	<apergos>	added in the amssq47 through amssq62 to /etc/squid/cachemgr.conf on fenari	[production]
19:48	<RobH>	torrus back online	[production]
19:44	<RobH>	following procedure on wikitech to fix torrus	[production]
16:46	<RobH>	sq42 & sq44 behaving normally now, cleaning cache on sq48 and killing squid for restart as it is flapping and at high load, due to earlier nfs issue	[production]
16:38	<RobH>	restarting and cleaning backend squid on sq44 and sq42 which were complaining in lvs	[production]
16:35	<RobH>	sq43 was flapping since the nfs mount on ms4 was borked. restarted it	[production]
16:07	<apergos>	NFSD_SERVERS=2048 in /etc/default on ms4	[production]
16:06	<apergos>	note that the variables rpcmod:cotsmaxdupreqs has been changed to 2048 in /etc/system, and	[production]
15:54	<apergos>	hard reset on ms4, reboot was not getting the job done	[production]
15:47	<apergos>	rebooting ms4, nfsd hung and couldn't be restarted or killed.	[production]