| 2010-08-13 |
  | 14:55 | <mark> | sq80 had been down for a long time. Brought it back up and synced it | [production] | 
            
  | 14:54 | <rainman-sr> | all of the search cluster restored to pre-relocation configuration | [production] | 
            
  | 14:34 | <robh> | synchronized php-1.5/wmf-config/lucene.php  'reverting search13 to search11' | [production] | 
            
  | 13:55 | <mark> | /dev/sda on sq57 is busted | [production] | 
            
  | 13:54 | <RobH> | removed search17 from search_pool_3 | [production] | 
            
  | 13:50 | <mark> | Set idleconnection.timeout = 300 (NOT idlecommand.timeout) on all LVS services on lvs3, restarting pybal | [production] | 
            
  | 13:44 | <mark> | powercycled sq57, which was stuck in [16538652.048532] BUG: soft lockup - CPU#3 stuck for 61s! [gmond:15746] | [production] | 
            
  | 13:42 | <mark> | sq58 was down for a long long time. Brought it back up and synced it | [production] | 
            
  | 13:37 | <RobH> | added search7 back into search_pool_3, kept search17 in as well | [production] | 
            
  | 13:27 | <RobH> | changed search_pool_3 back from search7 to search17 since it failed | [production] | 
            
  | 13:25 | <robh> | synchronized php-1.5/wmf-config/lucene.php  'Re-enabling LucenePrefixSearch - pushed changes on lvs3 to put search back to normal use' | [production] | 
            
  | 12:45 | <mark> | API squid cluster is too flaky for my taste. Converting sq33 into an API backend squid as well | [production] | 
            
  | 12:40 | <mark> | Shut down puppet and backend squid on sq32 | [production] | 
            
  | 11:41 | <mark> | Corrected changed hostname for api.svc.pmtpa.wmnet in text squid config files | [production] | 
            
  | 11:37 | <mark> | Temporarily rejecting requests to sq31 backend to give it some breathing room while it's reading its COSS dirs | [production] | 
            
  | 11:32 | <mark> | Reinstalled sq31 with Lucid | [production] | 
            
  | 10:25 | <mark> | Shutting down backend squid on sq31 to see the load impact | [production] | 
            
  | 10:18 | <mark> | Set up backend request statistics for the API on torrus | [production] | 
            
  | 09:15 | <rainman-sr> | bringing up search1-12 and doing some initial index warmups | [production] | 
            
  | 01:54 | <RobH> | searchidx1, search1-search12 relocated and online, not in cluster until Robert can fix in the morning.  The other half will have to move on a different day, 12 hours in the datacenter is long enough. | [production] | 
            
  | 01:40 | <RobH> | finished moving searchidx1 and search1-12, bringing them back up now | [production] | 
            
  
    | 2010-08-12 |
  | 23:10 | <RobH> | shutting down searchidx1, search1-12 for move | [production] | 
            
  | 22:40 | <robh> | synchronized php-1.5/wmf-config/lucene.php  'swapped search13 and search18 for migration' | [production] | 
            
  | 22:37 | <robh> | synchronized php-1.5/wmf-config/lucene.php  'reverting so search13 and search18 can change roles' | [production] | 
            
  | 22:22 | <robh> | synchronized php-1.5/wmf-config/lucene.php  'changes back in place to migrate searchidx1 and search1-10' | [production] | 
            
  | 22:19 | <RobH> | puppet updated on all search servers, confirmed all have all three lvs ip addresses | [production] | 
            
  | 21:55 | <mark> | Configured puppet to bind all LVS service IPs to all search servers | [production] | 
            
  | 21:54 | <RobH> | reverted search_pool changes on lvs | [production] | 
            
  | 21:54 | <robh> | synchronized php-1.5/wmf-config/lucene.php  'rolling it back' | [production] | 
            
  | 21:48 | <robh> | synchronized php-1.5/wmf-config/lucene.php  'changing settings for migration of searchidx1 and search1-search12' | [production] | 
            
  | 21:43 | <RobH> | changing lvs3 search pool settings for server relocations | [production] | 
            
  | 20:33 | <robh> | synchronized php-1.5/wmf-config/lucene.php  'commented out wgEnableLucenePrefixSearch for search server relocation' | [production] | 
            
  | 19:30 | <RobH> | srv281 reinstall done but not online as puppet has multiple package issues, leaving out of lvs | [production] | 
            
  | 19:09 | <RobH> | srv230 is on, but set to false in lvs.  do not push back into rotation until after new memory arrives and is installed tomorrow (rt#69) | [production] | 
            
  | 18:59 | <robh> | synchronized php-1.5/wmf-config/mc.php  'updating without srv230' | [production] | 
            
  | 18:53 | <RobH> | srv230 coming down for memory testing | [production] | 
            
  | 18:49 | <RobH> | set srv230 to false in lvs, need to test memory | [production] | 
            
  | 18:04 | <RobH> | reinstalling srv281 | [production] | 
            
  | 17:59 | <RobH> | nix that, srv125 was ex-es, leaving those for now. | [production] | 
            
  | 17:58 | <RobH> | pulling srv103 & srv125 for wipe (pulling stuff with temp warnings first) | [production] | 
            
  | 17:53 | <robh> | synchronized php-1.5/wmf-config/mc.php  'removed srv103, replacing it with srv244' | [production] | 
            
  | 17:47 | <RobH> | pulling srv95 for wipe | [production] | 
            
  | 17:38 | <RobH> | srv110 removed from lvs3 config | [production] | 
            
  | 17:36 | <mark> | Removed all apaches up to srv150 from the appserver LVS pool on lvs3 | [production] | 
            
  | 17:21 | <Fred> | restarting apache on webservers (220,221,222,224) | [production] | 
            
  | 16:45 | <RobH> | wipe running on adler and amane, and they have been removed from puppet and dsh node groups | [production] | 
            
  | 16:12 | <jeluf> | synchronized docroot/bits/index.html | [production] | 
            
  | 15:41 | <mark> | Set up ports ge-2/0/0 to ge-2/0/20 for search servers on asw-b-sdtpa | [production] | 
            
  | 15:03 | <mark> | Shut down BGP session to AS1257 130.244.6.249 on port 2/5 of br1-knams, preparing for cable move | [production] | 
            
  | 13:08 | <mark> | Recovered backend squid on knsq11 | [production] |