| 2010-11-05 |
    
  | 17:43 | <RobH> | srv266 unresponsive to remote console, rebooting and updating | [production] | 
            
  | 17:42 | <RobH> | srv206 fixed, pushed back into lvs | [production] | 
            
  | 17:25 | <RobH> | working on srv206, disregard any errors it throws | [production] | 
            
  | 16:40 | <RobH> | issue with the new api servers is fixed and they are now back in service | [production] | 
            
  | 16:04 | <RobH> | some new api servers are not working right, depooled until they are fixed | [production] | 
            
  | 15:58 | <mark> | Removed ibis IPs from Squid ACLs; invalid requests issue has been resolved | [production] | 
            
  | 15:57 | <mark> | Fixed NFS mounts on apaches that had them missing since the wikimedia-task-appserver upgrade | [production] | 
            
  | 15:26 | <RobH> | working on sq57, disregard flapping | [production] | 
            
  | 15:24 | <RobH> | new api apaches srv290-srv301 are online, except srv298 which needs drac correction before installation | [production] | 
            
  | 15:22 | <RobH> | dropping old entry for tenwiki in apache config and resyncing/restarting apaches to eliminate error message | [production] | 
            
  | 15:18 | <RobH> | pushing srv291-srv301 into lvs | [production] | 
            
  | 15:11 | <RobH> | doing puppet runs on srv292-srv301 before pushing them into service | [production] | 
            
  | 14:57 | <mark> | Hacked out the 'remotemount' lines in /var/lib/dpkg/info/wikimedia-task-appserver.postrm files to prevent apaches from being without NFS mounts during/between puppet runs and package upgrades | [production] | 
            
  | 14:23 | <mark> | Deploying new package wikimedia-task-appserver 1.46 across the cluster, which removes configuration files (now handled by Puppet) | [production] | 
            
  | 11:59 | <catrope> | synchronized php-1.5/includes/api/ApiLogin.php  'Revert r76078' | [production] | 
            
  | 11:49 | <catrope> | synchronized php-1.5/includes/api/ApiLogin.php  'r76078' | [production] | 
            
  | 05:57 | <apergos> | failure booting into be3 on ms4, had to back out.  so, no progress, we are back to where we were before the  reboots. | [production] | 
            
  | 05:40 | <apergos> | cleared up luactivate error, shutdown ms4 again, trying to boot into alt boot environment | [production] | 
            
  | 05:16 | <apergos> | used shutdown on ms4, be3 showed as "active on reboot" but it booted into be0 (old boot environment) nonetheless.  *grumble* | [production] | 
            
  | 05:06 | <apergos> | rebooted ms4 into alt boot environment with current patches applied | [production] | 
            
  | 00:18 | <RobH> | new api servers are not copying down the data correctly and not reflecting config changes in puppet, so they fail, srv290+ not online yet | [production] | 
            
  
    | 2010-11-04 |
    
  | 23:06 | <RobH> | running puppet across the new api servers srv290-srv301 then will push them in service later when i figure out why they are not doing what I want ;P | [production] | 
            
  | 20:13 | <RobH> | sq51 hates me | [production] | 
            
  | 20:11 | <RobH> | new api servers srv290-301 are installed and showing in ganglia, having issues getting the first couple to pool into lvs before i push the rest into service | [production] | 
            
  | 20:09 | <RobH> | fixed sq51 | [production] | 
            
  | 19:29 | <RoanKattouw> | Strike that, have backed out changes | [production] | 
            
  | 19:06 | <RoanKattouw> | Until Mark's made sure they're good, that is | [production] | 
            
  | 19:06 | <RoanKattouw> | Changing some files in wmf-deployment/includes/media . DO NOT RUN SCAP or otherwise deploy these changes! | [production] | 
            
  | 18:36 | <RobH> | added dns entries for payments | [production] | 
            
  | 17:59 | <RobH> | doing puppet runs and final setup for srv290-srv301 | [production] | 
            
  | 16:56 | <rfaulk> | Added numpy Python package to grosley.wikimedia.org with apt-get ... For use in the 2010/11 fundraiser to facilitate stats gathering by providing scientific computing functionality in Python | [production] | 
            
  | 16:43 | <rfaulk> | Added MySQLdb Python package to grosley.wikimedia.org with apt-get ... This package will be used to access fundraising databases to facilitate the gathering and synthesis of relevant statistics for the 2010/11 Wikimedia fundraiser | [production] | 
            
  | 16:23 | <mark> | Set storage1 (varnish) as upload backend on sq41-50, instead of ms4 | [production] | 
            
  | 16:14 | <RobH> | sq59 is being bitchy and won't clean the cache, possible hdd issue?  will investigate later | [production] | 
            
  | 15:42 | <RobH> | sq35 back in rotation | [production] | 
            
  | 15:34 | <mark> | Added storage1 (varnish->ms4) as an HTTP backend to sq45's squid config | [production] | 
            
  | 15:34 | <RobH> | commenting out sq35, trying to make it work again in pybal | [production] | 
            
  | 15:16 | <RobH> | poking at sq59 | [production] | 
            
  | 15:06 | <RobH> | sq35 back online, pushed into lvs, partially up -  may need to wait up to 5 for idleconnect timer | [production] | 
            
  | 14:46 | <RobH> | pushed dns updates for new payments boxes and correcting owadb1/2 to db31/32 | [production] | 
            
  | 14:28 | <RobH> | sq35 set to false in pybal until i determine what's wrong with it | [production] | 
            
  | 14:09 | <mark> | Reduced CARP weight of sq41-50 from 10 to 5 | [production] | 
            
  | 13:37 | <RobH> | sq35 may flag, disregard | [production] | 
            
  | 13:30 | <RoanKattouw> | Removed uploadwizard test wiki on prototype, gonna set it up on the Commons prototype instead | [production] | 
            
  | 04:17 | <atglenn> | ganglia 3.1 now running on ms4 and ms5 | [production] | 
            
  | 01:44 | <RobH> | srv217 back in cluster | [production] | 
            
  | 00:36 | <RobH> | torrus back online | [production] | 
            
  | 00:29 | <RobH> | fixing torrus deadlock, no touchy | [production] | 
            
  | 00:18 | <tomaszf> | upped open fd's on loudon to 4096 | [production] | 
            
  | 00:17 | <RobH> | kicking srv217 for reinstall | [production] |