production SAL


2011-12-19 §
15:49	<RobH>	dataset1 reinstalled and has had puppet run. Now to see if it can keep time	[production]
15:46	<RoanKattouw>	maerlant is fried, load avg is 500+, linearly increasing since Friday. Rejects SSH login attempts	[production]
15:45	<notpeter>	restarting indexer on searchidx2	[production]
14:16	<apergos>	thumb cleaner to bed for the night... for the last time?	[production]
13:15	<mutante>	truncated spence.cfg in ./puppet_checks.d/ - it had multiple dupe service definitions for all checks on spence	[production]
13:11	<mutante>	commented check_job_queue stuff from non-puppetized files on spence (hosts.cfg, conf.php) to get rid of "duplicate definition" now that it's been puppetized	[production]
12:35	<mutante>	deleted snapshot4 files from /var/lib/puppet/yaml/node and ./yaml/facts on sockpuppet and stafford, they got recreated and fixed puppet run on sn4	[production]
10:08	<apergos>	a few more binlogs on db9 gone. eeking out another 12 hours or so	[production]
06:57	<apergos>	thumb cleaner awake for the day. poor thing, slaving away but soon it will be able to retire	[production]
01:57	<LocalisationUpdate>	failed (1.18) at Mon Dec 19 02:00:11 UTC 2011	[production]
2011-12-18 §
16:41	<notpeter>	removing about 4G of binlogs from db9. everything more than 24 hours old.	[production]
15:12	<apergos>	thumb cleaner sleeping it off for the night	[production]
07:38	<jeremyb>	17 mins ago <apergos> thumb cleaner to work for the day	[production]
01:57	<LocalisationUpdate>	failed (1.18) at Sun Dec 18 02:00:04 UTC 2011	[production]
2011-12-17 §
22:49	<RobH>	Anytime db9 hits 98 or 99% someone needs to remove binlogs to bring it back down to 94 or 95%	[production]
22:48	<RobH>	removed older binlogs on db9 again to kick it back to a bit more free space to last the weekend.	[production]
17:53	<catrope>	synchronized wmf-config/CommonSettings.php 'Remove SVN dir setting, this is now passed in on the command line'	[production]
16:43	<RoanKattouw>	Found out why LocalisationUpdate was failing. Would have been fixed already if puppet had been running on fenari, but it's throwing errors. See [[rev:1617\|r1617]] and my comment on [[rev:1558\|r1558]]	[production]
14:32	<apergos>	thumb cleaner to bed for the night... about 2 days left I think	[production]
07:25	<apergos>	thumb cleaner started up for the day	[production]
01:57	<LocalisationUpdate>	failed (1.18) at Sat Dec 17 02:00:18 UTC 2011	[production]
2011-12-16 §
22:30	<RobH>	reclaimed space on db9, restarted mysql, services seem to be recovering	[production]
22:24	<maplebed>	restarting mysql on db9; brief downtime for a number of apps (bugzilla, blog, etc.) expected.	[production]
22:03	<RobH>	db9 space reclaimed back to 94% full, related services should start recovering	[production]
21:57	<RobH>	db9 disk full, related services are messing up, fixing	[production]
21:56	<RobH>	kicking apache for bz related issues on kaulen	[production]
19:14	<catrope>	synchronized php-1.18/resources/startup.js 'touch'	[production]
19:07	<catrope>	synchronized wmf-config/InitialiseSettings.php 'Set AFTv4 lottery odds to 100% on en_labswikimedia'	[production]
18:48	<LeslieCarr>	removed the ssl* yaml logs on stafford to fix the puppet not running error	[production]
16:13	<apergos>	thumb cleaner to bed for the night. definitely need an alarm clock for this... good thing it's only got about 4 days of backlog left	[production]
15:41	<RobH>	es1002 being actively worked on for hdd controller testing	[production]
15:39	<RobH>	lvs1003 disk dead per RT 1549, will troubleshoot on site later today or Monday	[production]
15:32	<RobH>	lvs1003 unresponsive to serial console, rebooting	[production]
15:18	<RobH>	reinstalling dataset1	[production]
14:45	<mutante>	puppet was broken on all servers including "nrpe" due to package conflict with nagios-plugins-basic i added to base, revert+fix	[production]
13:29	<RoanKattouw>	Dropping and recreating AFTv5 tables on en_labswikimedia and enwiki	[production]
13:26	<catrope>	synchronized php-1.18/extensions/ArticleFeedbackv5/ 'Updating to trunk state'	[production]
13:25	<mutante>	tweaked Nagios earlier today: external command_check_interval & event_broker_options (see comments in gerrit Id3b4a458)	[production]
13:01	<mark>	Found lvs5 and lvs6 with offload-gro enabled, even though it's set disabled in /etc/network/interfaces... corrected	[production]
09:21	<apergos>	restarted lighttpd on ds2, it had stopped (and why didn't nagios tell us?)	[production]
08:38	<mutante>	spence - had killed additional notifications.cgi and history.cgi procs, waited 5 minutes, load went down a lot, restarting nagios	[production]
08:23	<mutante>	spence - almost unusable, Nagios notifications.cgi and history.cgi use a lot of memory, stopping Nagios, watching swap	[production]
08:15	<mutante>	spence slow again, side-note: tried to use "sar" to investigate but "Please check if data collecting is enabled in /etc/default/sysstat" (want to?)	[production]
07:54	<nikerabbit>	synchronized php-1.18/extensions/WebFonts/resources/ext.webfonts.js 'JS fix [[rev:106418\|r106418]]'	[production]
07:09	<apergos>	thumb cleaner awake for the day	[production]
01:57	<LocalisationUpdate>	failed (1.18) at Fri Dec 16 02:00:14 UTC 2011	[production]
2011-12-15 §
23:19	<LeslieCarr>	pushing rule to planet.wikimedia.org which should redirect all https to http	[production]
23:00	<LeslieCarr>	puppetized planet.wikimedia.org on singer	[production]
22:41	<LeslieCarr>	removing https support from planet.wikimedia.org	[production]
21:43	<awjrichards>	synchronized php/extensions/LandingCheck/SpecialLandingCheck.php '[[rev:106377\|r106377]]'	[production]