production SAL

2010-11-03 §
18:31	<RobH>	reinstalling srv281, tired of lookin at it in red	[production]
17:18	<mark>	Upgraded storage1 to Lucid	[production]
16:42	<mark>	Removing 2010-03 snapshots on ms4	[production]
16:01	<mark>	Fixed sshd on ms4	[production]
15:46	<mark>	Removing 2010-02 snapshots on ms4	[production]
15:45	<mark>	Disabled gmetric cron jobs on ms4	[production]
15:43	<mark>	Disabled daily snapshot generation on ms4	[production]
15:27	<mark>	Restarted gmond on ms4	[production]
15:24	<mark>	Upgraded puppet on ms4	[production]
15:13	<mark>	Powercycled knsq2	[production]
14:52	<mark>	Removing daily snapshots for 2010-10 on ms4	[production]
14:24	<mark>	Restored /etc/sudoers file on DB machines butchered by old versions of wikimedia-raid-utils	[production]
05:34	<tstarling>	synchronized php-1.5/includes/Math.php 'r75909'	[production]
04:52	<apergos>	oh btw, I notice that when / on the squids fills, we don't see it in ganglia, it must report an aggregate or something. it would sure be nice to get notified.	[production]
04:18	<apergos>	lather rinse repeat for sq47, I hope that's all of 'em	[production]
03:46	<apergos>	repeated on sq45...	[production]
03:13	<apergos>	same old story on sq46... restarted syslog, reloaded squid, got back some space on /	[production]
02:41	<apergos>	er... and deleted the log file :-P	[production]
02:38	<apergos>	moved ginormous cache.log out of the way on sq48 and reloaded squid over there since it wasn't done earlier	[production]
02:32	<apergos>	cleaned up / on sq41, restarted syslog, reloaded squid	[production]
00:59	<nimishg>	synchronized php-1.5/wmf-config/InitialiseSettings.php	[production]
00:53	<nimishg>	synchronizing Wikimedia installation... Revision: 75891	[production]
00:33	<apergos1>	also 44 and 43	[production]
00:30	<apergos1>	cleaning up space on other / full squids: sq42	[production]
2010-11-02 §
23:22	<apergos>	same story on sq50, cleared out some space, tried upping that to 300 but started seeing TCP connection to 208.80.152.156 (208.80.152.156:80) failed in the logs so backed off to 200	[production]
23:13	<apergos>	trying adjusting max-conn on sq49 for conns to ms4... tried 200, it maxed out. trying 300 now...	[production]
23:08	<apergos>	hupped squid on sq49, restarted syslog, / was full from "Failed to select source" errors, cleared out some space	[production]
23:08	<tfinc>	synchronized php-1.5/wmf-config/CommonSettings.php 'Updating sidebar links'	[production]
22:40	<apergos>	added in the amssq47 through amssq62 to /etc/squid/cachemgr.conf on fenari	[production]
19:48	<RobH>	torrus back online	[production]
19:44	<RobH>	following procedure on wikitech to fix torrus	[production]
16:46	<RobH>	sq42 & sq44 behaving normally now, cleaning cache on sq48 and killing squid for restart as it is flapping and at high load, due to earlier nfs issue	[production]
16:38	<RobH>	restarting and cleaning backend squid on sq44 and sq42 which were complaining in lvs	[production]
16:35	<RobH>	sq43 was flapping since the nfs mount on ms4 was borked. restarted it	[production]
16:07	<apergos>	NFSD_SERVERS=2048 in /etc/default on ms4	[production]
16:06	<apergos>	note that the variables rpcmod:cotsmaxdupreqs has been changed to 2048 in /etc/system, and	[production]
15:54	<apergos>	hard reset on ms4, reboot was not getting the job done	[production]
15:47	<apergos>	rebooting ms4, nfsd hung and couldn't be restarted or killed.	[production]
14:04	<RobH>	restarted pdns on linne due to crash from authdns update	[production]
14:02	<RobH>	updated dns with new mgmt entries for payments, owasrvs, and owadbs	[production]
03:45	<domas>	added srv193 back to apaches pool on lvs	[production]
2010-11-01 §
23:55	<tfinc>	synchronized php-1.5/extensions/CentralNotice/SpecialBannerController.php 'Picking up fixes for Bug #25564'	[production]
23:54	<tfinc>	synchronized php-1.5/extensions/CentralNotice/CentralNotice.php 'Picking up fixes for r25564'	[production]
20:43	<domas>	ms4 mildly loaded (disks go to >100i/s each) throwing nfs timeouts, I bumped up NFSD_SERVERS to 2048	[production]
19:05	<Ryan_Lane>	powercycling srv207	[production]
16:18	<RoanKattouw>	Something weird's going on with srv207: Nagios says its SSH is up but it times out on SSH from fenari	[production]
16:15	<catrope>	synchronized php-1.5/includes/api/ApiBase.php 'r75798'	[production]
2010-10-31 §
17:21	<catrope>	synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 25719 - Add missing slash in timezone'	[production]
2010-10-30 §
23:05	<apergos>	test of logging (sorry)	[production]
21:22	<mark>	Deploying a sudoers file for NRPE using Puppet	[production]