2011-12-19
16:42 <RobH> dataset1 new data partition ready and setup to automount [production]
15:49 <RobH> dataset1 reinstalled and has had puppet run. Now to see if it can keep time [production]
15:46 <RoanKattouw> maerlant is fried, load avg is 500+, linearly increasing since Friday. Rejects SSH login attempts [production]
15:45 <notpeter> restarting indexer on searchidx2 [production]
14:16 <apergos> thumb cleaner to bed for the night... for the last time? [production]
13:15 <mutante> truncated spence.cfg in ./puppet_checks.d/ - it had multiple dupe service definitions for all checks on spence [production]
13:11 <mutante> commented out check_job_queue stuff from non-puppetized files on spence (hosts.cfg, conf.php) to get rid of "duplicate definition" now that it's been puppetized [production]
12:35 <mutante> deleted snapshot4 files from /var/lib/puppet/yaml/node and ./yaml/facts on sockpuppet and stafford, they got recreated and fixed puppet run on sn4 [production]
10:08 <apergos> a few more binlogs on db9 gone. eeking out another 12 hours or so [production]
06:57 <apergos> thumb cleaner awake for the day. poor thing, slaving away but soon it will be able to retire [production]
01:57 <LocalisationUpdate> failed (1.18) at Mon Dec 19 02:00:11 UTC 2011 [production]
2011-12-18
16:41 <notpeter> removing about 4G of binlogs from db9. everything more than 24 hours old. [production]
15:12 <apergos> thumb cleaner sleeping it off for the night [production]
07:38 <jeremyb> 17 mins ago <apergos> thumb cleaner to work for the day [production]
01:57 <LocalisationUpdate> failed (1.18) at Sun Dec 18 02:00:04 UTC 2011 [production]
2011-12-17
22:49 <RobH> Anytime db9 hits 98 or 99% someone needs to remove binlogs to bring it back down to 94 or 95% [production]
22:48 <RobH> removed older binlogs on db9 again to kick it back to a bit more free space to last the weekend. [production]
17:53 <catrope> synchronized wmf-config/CommonSettings.php 'Remove SVN dir setting, this is now passed in on the command line' [production]
16:43 <RoanKattouw> Found out why LocalisationUpdate was failing. Would have been fixed already if puppet had been running on fenari, but it's throwing errors. See [[rev:1617|r1617]] and my comment on [[rev:1558|r1558]] [production]
14:32 <apergos> thumb cleaner to bed for the night... about 2 days left I think [production]
07:25 <apergos> thumb cleaner started up for the day [production]
01:57 <LocalisationUpdate> failed (1.18) at Sat Dec 17 02:00:18 UTC 2011 [production]
2011-12-16
22:30 <RobH> reclaimed space on db9, restarted mysql, services seem to be recovering [production]
22:24 <maplebed> restarting mysql on db9; brief downtime for a number of apps (bugzilla, blog, etc.) expected. [production]
22:03 <RobH> db9 space reclaimed back to 94% full, related services should start recovering [production]
21:57 <RobH> db9 disk full, related services are messing up, fixing [production]
21:56 <RobH> kicking apache for bz related issues on kaulen [production]
19:14 <catrope> synchronized php-1.18/resources/startup.js 'touch' [production]
19:07 <catrope> synchronized wmf-config/InitialiseSettings.php 'Set AFTv4 lottery odds to 100% on en_labswikimedia' [production]
18:48 <LeslieCarr> removed the ssl* yaml logs on stafford to fix the puppet not running error [production]
16:13 <apergos> thumb cleaner to bed for the night. definitely need an alarm clock for this... good thing it's only got about 4 days of backlog left [production]
15:41 <RobH> es1002 being actively worked on for hdd controller testing [production]
15:39 <RobH> lvs1003 disk dead per RT 1549, will troubleshoot on site later today or Monday [production]
15:32 <RobH> lvs1003 unresponsive to serial console, rebooting [production]
15:18 <RobH> reinstalling dataset1 [production]
14:45 <mutante> puppet was broken on all servers including "nrpe" due to package conflict with nagios-plugins-basic i added to base, revert+fix [production]
13:29 <RoanKattouw> Dropping and recreating AFTv5 tables on en_labswikimedia and enwiki [production]
13:26 <catrope> synchronized php-1.18/extensions/ArticleFeedbackv5/ 'Updating to trunk state' [production]
13:25 <mutante> tweaked Nagios earlier today: external command_check_interval & event_broker_options (see comments in gerrit Id3b4a458) [production]
13:01 <mark> Found lvs5 and lvs6 with offload-gro enabled, even though it's set disabled in /etc/network/interfaces... corrected [production]
09:21 <apergos> restarted lighttpd on ds2, it had stopped (and why didn't nagios tell us?) [production]
08:38 <mutante> spence - had killed additional notifications.cgi and history.cgi procs, waited 5 minutes, load went down a lot, restarting nagios [production]
08:23 <mutante> spence - almost unusable, Nagios notifications.cgi and history.cgi use a lot of memory, stopping Nagios, watching swap [production]
08:15 <mutante> spence slow again, side-note: tried to use "sar" to investigate but "Please check if data collecting is enabled in /etc/default/sysstat" (want to?) [production]
07:54 <nikerabbit> synchronized php-1.18/extensions/WebFonts/resources/ext.webfonts.js 'JS fix [[rev:106418|r106418]]' [production]
07:09 <apergos> thumbs cleaner awake for the day [production]
01:57 <LocalisationUpdate> failed (1.18) at Fri Dec 16 02:00:14 UTC 2011 [production]
2011-12-15
23:19 <LeslieCarr> pushing rule to planet.wikimedia.org which should redirect all https to http [production]
23:00 <LeslieCarr> puppetized planet.wikimedia.org on singer [production]
22:41 <LeslieCarr> removing https support from planet.wikimedia.org [production]