| 2011-12-19 |
    
  | 15:49 | <RobH> | dataset1 reinstalled and has had puppet run.  Now to see if it can keep time | [production] | 
            
  | 15:46 | <RoanKattouw> | maerlant is fried, load avg is 500+, linearly increasing since Friday. Rejects SSH login attempts | [production] | 
            
  | 15:45 | <notpeter> | restarting indexer on searchidx2 | [production] | 
            
  | 14:16 | <apergos> | thumb cleaner to bed for the night... for the last time? | [production] | 
            
  | 13:15 | <mutante> | truncated spence.cfg in ./puppet_checks.d/ - it had multiple dupe service definitions for all checks on spence | [production] | 
            
  | 13:11 | <mutante> | commented check_job_queue stuff from non-puppetized files on spence (hosts.cfg, conf.php) to get rid of "duplicate definition" now that it's been puppetized | [production] | 
            
  | 12:35 | <mutante> | deleted snapshot4 files from /var/lib/puppet/yaml/node and ./yaml/facts on sockpuppet and stafford, they got recreated and fixed puppet run on sn4 | [production] | 
            
  | 10:08 | <apergos> | a few more binlogs on db9 gone.  eeking out another 12 hours or so | [production] | 
            
  | 06:57 | <apergos> | thumb cleaner awake for the day. poor thing, slaving away but soon it will be able to retire | [production] | 
            
  | 01:57 | <LocalisationUpdate> | failed (1.18) at Mon Dec 19 02:00:11 UTC 2011 | [production] | 
            
  
    | 2011-12-17 |
    
  | 22:49 | <RobH> | Anytime db9 hits 98 or 99% someone needs to remove binlogs to bring it back down to 94 or 95% | [production] | 
            
  | 22:48 | <RobH> | removed older binlogs on db9 again to kick it back to a bit more free space to last the weekend. | [production] | 
            
  | 17:53 | <catrope> | synchronized wmf-config/CommonSettings.php  'Remove SVN dir setting, this is now passed in on the command line' | [production] | 
            
  | 16:43 | <RoanKattouw> | Found out why LocalisationUpdate was failing. Would have been fixed already if puppet had been running on fenari, but it's throwing errors. See [[rev:1617|r1617]] and my comment on [[rev:1558|r1558]] | [production] | 
            
  | 14:32 | <apergos> | thumb cleaner to bed for the night... about 2 days left I think | [production] | 
            
  | 07:25 | <apergos> | thumb cleaner started up for the day | [production] | 
            
  | 01:57 | <LocalisationUpdate> | failed (1.18) at Sat Dec 17 02:00:18 UTC 2011 | [production] | 
            
  
    | 2011-12-16 |
    
  | 22:30 | <RobH> | reclaimed space on db9, restarted mysql, services seem to be recovering | [production] | 
            
  | 22:24 | <maplebed> | restarting mysql on db9; brief downtime for a number of apps (bugzilla, blog, etc.) expected. | [production] | 
            
  | 22:03 | <RobH> | db9 space reclaimed back to 94% full, related services should start recovering | [production] | 
            
  | 21:57 | <RobH> | db9 disk full, related services are messing up, fixing | [production] | 
            
  | 21:56 | <RobH> | kicking apache for bz related issues on kaulen | [production] | 
            
  | 19:14 | <catrope> | synchronized php-1.18/resources/startup.js  'touch' | [production] | 
            
  | 19:07 | <catrope> | synchronized wmf-config/InitialiseSettings.php  'Set AFTv4 lottery odds to 100% on en_labswikimedia' | [production] | 
            
  | 18:48 | <LeslieCarr> | removed the ssl* yaml logs on stafford to fix the puppet not running error | [production] | 
            
  | 16:13 | <apergos> | thumb cleaner to bed for the night. definitely need an alarm clock for this... good thing it's only got about 4 days of backlog left | [production] | 
            
  | 15:41 | <RobH> | es1002 being actively worked on for hdd controller testing | [production] | 
            
  | 15:39 | <RobH> | lvs1003 disk dead per RT 1549, will troubleshoot on site later today or Monday | [production] | 
            
  | 15:32 | <RobH> | lvs1003 unresponsive to serial console, rebooting | [production] | 
            
  | 15:18 | <RobH> | reinstalling dataset1 | [production] | 
            
  | 14:45 | <mutante> | puppet was broken on all servers including "nrpe" due to package conflict with nagios-plugins-basic i added to base, revert+fix | [production] | 
            
  | 13:29 | <RoanKattouw> | Dropping and recreating AFTv5 tables on en_labswikimedia and enwiki | [production] | 
            
  | 13:26 | <catrope> | synchronized php-1.18/extensions/ArticleFeedbackv5/  'Updating to trunk state' | [production] | 
            
  | 13:25 | <mutante> | tweaked Nagios earlier today: external command_check_interval & event_broker_options (see comments in gerrit Id3b4a458) | [production] | 
            
  | 13:01 | <mark> | Found lvs5 and lvs6 with offload-gro enabled, even though it's set disabled in /etc/network/interfaces... corrected | [production] | 
            
  | 09:21 | <apergos> | restarted lighttpd on ds2, it had stopped (and why didn't nagios tell us?) | [production] | 
            
  | 08:38 | <mutante> | spence - had killed additional notifications.cgi and history.cgi procs, waited 5 minutes, load went down a lot, restarting nagios | [production] | 
            
  | 08:23 | <mutante> | spence - almost unusable, Nagios notifications.cgi and history.cgi use a lot of memory, stopping Nagios, watching swap | [production] | 
            
  | 08:15 | <mutante> | spence slow again, side-note: tried to use "sar" to investigate but "Please check if data collecting is enabled in /etc/default/sysstat" (want to?) | [production] | 
            
  | 07:54 | <nikerabbit> | synchronized php-1.18/extensions/WebFonts/resources/ext.webfonts.js  'JS fix [[rev:106418|r106418]]' | [production] | 
            
  | 07:09 | <apergos> | thumbs cleaner awake for the day | [production] | 
            
  | 01:57 | <LocalisationUpdate> | failed (1.18) at Fri Dec 16 02:00:14 UTC 2011 | [production] |