| 
      
        2010-08-13
      
     | 
  
    
  | 15:38 | 
  <mark> | 
  Restarted backend squid on sq33 | 
  [production] | 
            
  | 15:36 | 
  <mark> | 
  synchronized php-1.5/wmf-config/mc.php  'Remove some to-be-decommissioned hosts from the down list' | 
  [production] | 
            
  | 15:25 | 
  <mark> | 
  Reinstalled sq32 with Lucid | 
  [production] | 
            
  | 15:01 | 
  <mark> | 
  Removed sq86 and sq87 from API LVS pool | 
  [production] | 
            
  | 14:55 | 
  <mark> | 
  sq80 had been down for a long time. Brought it back up and synced it | 
  [production] | 
            
  | 14:54 | 
  <rainman-sr> | 
  all of the search cluster restored to pre-relocation configuration | 
  [production] | 
            
  | 14:34 | 
  <robh> | 
  synchronized php-1.5/wmf-config/lucene.php  'reverting search13 to search11' | 
  [production] | 
            
  | 13:55 | 
  <mark> | 
  /dev/sda on sq57 is busted | 
  [production] | 
            
  | 13:54 | 
  <RobH> | 
  removed search17 from search_pool_3 | 
  [production] | 
            
  | 13:50 | 
  <mark> | 
  Set idleconnection.timeout = 300 (NOT idlecommand.timeout) on all LVS services on lvs3, restarting pybal | 
  [production] | 
            
  | 13:44 | 
  <mark> | 
  powercycled sq57, which was stuck in [16538652.048532] BUG: soft lockup - CPU#3 stuck for 61s! [gmond:15746] | 
  [production] | 
            
  | 13:42 | 
  <mark> | 
  sq58 was down for a long long time. Brought it back up and synced it | 
  [production] | 
            
  | 13:37 | 
  <RobH> | 
  added search7 back into search_pool_3, kept search17 in as well | 
  [production] | 
            
  | 13:27 | 
  <RobH> | 
  changed search_pool_3 back from search7 to search17 since it failed | 
  [production] | 
            
  | 13:25 | 
  <robh> | 
  synchronized php-1.5/wmf-config/lucene.php  'Re-enabling LucenePrefixSearch - pushed changes on lvs3 to put search back to normal use' | 
  [production] | 
            
  | 12:45 | 
  <mark> | 
  API squid cluster is too flaky for my taste. Converting sq33 into an API backend squid as well | 
  [production] | 
            
  | 12:40 | 
  <mark> | 
  Shut down puppet and backend squid on sq32 | 
  [production] | 
            
  | 11:41 | 
  <mark> | 
  Corrected changed hostname for api.svc.pmtpa.wmnet in text squid config files | 
  [production] | 
            
  | 11:37 | 
  <mark> | 
  Temporarily rejecting requests to sq31 backend to give it some breathing room while it's reading its COSS dirs | 
  [production] | 
            
  | 11:32 | 
  <mark> | 
  Reinstalled sq31 with Lucid | 
  [production] | 
            
  | 10:25 | 
  <mark> | 
  Shutting down backend squid on sq31 to see the load impact | 
  [production] | 
            
  | 10:18 | 
  <mark> | 
  Set up backend request statistics for the API on torrus | 
  [production] | 
            
  | 09:15 | 
  <rainman-sr> | 
  bringing up search1-12 and doing some initial index warmups | 
  [production] | 
            
  | 01:54 | 
  <RobH> | 
  searchidx1, search1-search12 relocated and online, not in cluster until Robert can fix in the morning. The other half will have to move on a different day; 12 hours in the datacenter is long enough. | 
  [production] | 
            
  | 01:40 | 
  <RobH> | 
  finished moving searchidx1 and search1-12, bringing them back up now | 
  [production] | 
            
  
    | 
      
        2010-08-12
      
     | 
  
    
  | 23:10 | 
  <RobH> | 
  shutting down searchidx1, search1-12 for move | 
  [production] | 
            
  | 22:40 | 
  <robh> | 
  synchronized php-1.5/wmf-config/lucene.php  'swapped search13 and search18 for migration' | 
  [production] | 
            
  | 22:37 | 
  <robh> | 
  synchronized php-1.5/wmf-config/lucene.php  'reverting so search13 and search18 can change roles' | 
  [production] | 
            
  | 22:22 | 
  <robh> | 
  synchronized php-1.5/wmf-config/lucene.php  'changes back in place to migrate searchidx1 and search1-10' | 
  [production] | 
            
  | 22:19 | 
  <RobH> | 
  puppet updated on all search servers, confirmed all have all three lvs ip addresses | 
  [production] | 
            
  | 21:55 | 
  <mark> | 
  Configured puppet to bind all LVS service IPs to all search servers | 
  [production] | 
            
  | 21:54 | 
  <RobH> | 
  reverted search_pool changes on lvs | 
  [production] | 
            
  | 21:54 | 
  <robh> | 
  synchronized php-1.5/wmf-config/lucene.php  'rolling it back' | 
  [production] | 
            
  | 21:48 | 
  <robh> | 
  synchronized php-1.5/wmf-config/lucene.php  'changing settings for migration of searchidx1 and search1-search12' | 
  [production] | 
            
  | 21:43 | 
  <RobH> | 
  changing lvs3 search pool settings for server relocations | 
  [production] | 
            
  | 20:33 | 
  <robh> | 
  synchronized php-1.5/wmf-config/lucene.php  'commented out wgEnableLucenePrefixSearch for search server relocation' | 
  [production] | 
            
  | 19:30 | 
  <RobH> | 
  srv281 reinstall done but not online as puppet has multiple package issues, leaving out of lvs | 
  [production] | 
            
  | 19:09 | 
  <RobH> | 
  srv230 is on, but set to false in lvs. Do not push back into rotation until after new memory arrives and is installed tomorrow (rt#69) | 
  [production] | 
            
  | 18:59 | 
  <robh> | 
  synchronized php-1.5/wmf-config/mc.php  'updating without srv230' | 
  [production] | 
            
  | 18:53 | 
  <RobH> | 
  srv230 coming down for memory testing | 
  [production] | 
            
  | 18:49 | 
  <RobH> | 
  set srv230 to false in lvs, need to test memory | 
  [production] | 
            
  | 18:04 | 
  <RobH> | 
  reinstalling srv281 | 
  [production] | 
            
  | 17:59 | 
  <RobH> | 
  nix that, srv125 was ex-es, leaving those for now. | 
  [production] | 
            
  | 17:58 | 
  <RobH> | 
  pulling srv103 & srv125 for wipe (pulling stuff with temp warnings first) | 
  [production] | 
            
  | 17:53 | 
  <robh> | 
  synchronized php-1.5/wmf-config/mc.php  'removed srv103, replacing it with srv244' | 
  [production] | 
            
  | 17:47 | 
  <RobH> | 
  pulling srv95 for wipe | 
  [production] | 
            
  | 17:38 | 
  <RobH> | 
  srv110 removed from lvs3 config | 
  [production] | 
            
  | 17:36 | 
  <mark> | 
  Removed all apaches up to srv150 from the appserver LVS pool on lvs3 | 
  [production] | 
            
  | 17:21 | 
  <Fred> | 
  restarting apache on webservers (220,221,222,224) | 
  [production] | 
            
  | 16:45 | 
  <RobH> | 
  wipe running on adler and amane, and they have been removed from puppet and dsh node groups | 
  [production] |