2010-08-14
14:18 <mark> synchronized php-1.5/wmf-config/db.php 'Add ms2 and ms1 to clusters rc1 and cluster22' [production]
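
The "synchronized ..." entries in this log are emitted by the deployment tooling. A minimal sketch of the invocation behind them, assuming the sync-file wrapper on the deployment host (the wrapper name is an assumption; the path and message are from the entry above):

    # hypothetical deploy-host invocation that produces the log line above
    sync-file php-1.5/wmf-config/db.php 'Add ms2 and ms1 to clusters rc1 and cluster22'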
            
14:06 <mark> FLUSH TABLES WITH READ LOCK on ms1 for testing [production]
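
For reference, a minimal sketch of taking a global read lock from the shell; the host flag is illustrative, and the SLEEP only holds the session (and thus the lock) open for the test:

    # the lock lasts only as long as the session, so keep the session open briefly
    mysql -h ms1 -e "FLUSH TABLES WITH READ LOCK; SELECT SLEEP(60); UNLOCK TABLES;"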
            
13:59 <mark> Stopping mysql on ms1 as a monitoring test [production]
            
13:59 <mark> Granted SELECT on mysql.* to nagios on ms3 [production]
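
A sketch of the grant, assuming the monitoring user is created in the same statement; the host mask and password are placeholders:

    # read-only access to the mysql schema for the Nagios check user
    mysql -h ms3 -e "GRANT SELECT ON mysql.* TO 'nagios'@'%' IDENTIFIED BY 'placeholder';"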
            
10:57 <mark> Removed oldest LVM snapshot on ixia [production]
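
A sketch of the snapshot cleanup, with hypothetical volume group and snapshot names:

    lvs                                  # list volumes; snapshots show their origin and usage
    lvremove /dev/ixia-vg/snap-oldest    # hypothetical VG/LV names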
            
09:43 <mark> Fixed apparmor profile /etc/apparmor.d/usr.sbin.mysqld on ms1, restarted mysql under apparmor [production]
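
The usual sequence for a changed AppArmor profile, sketched; the profile path is from the entry above, and the init script is an assumption for the distribution of the time:

    apparmor_parser -r /etc/apparmor.d/usr.sbin.mysqld   # reload the edited profile
    /etc/init.d/mysql restart                            # restart mysqld confined by it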
            
09:39 <mark> START SLAVE on ms1, catching up with ms3 [production]
            
09:38 <mark> RESET SLAVE on db5 [production]

09:37 <mark> STOP SLAVE on db5 [production]
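
Together these two entries are the standard cleanup on a freshly promoted master: stop the leftover replication threads, then discard the stale relay log and master coordinates. A sketch, run as an account with SUPER:

    mysql -h db5 -e "STOP SLAVE; RESET SLAVE;"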
            
09:35 <mark> Stopped apparmor on ms1 [production]
            
08:41 <Andrew> Leaving as-is for now, hoping somebody with appropriate permissions can fix it later. [production]
            
08:40 <Andrew> STOP SLAVE on db5 gives me ERROR 1045 (00000): Access denied for user: 'wikiadmin@208.80.152.%' (Using password: NO) [production]
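
STOP SLAVE requires the SUPER privilege, and the error also shows the client connecting without a password, so a first diagnostic is to inspect what the account may actually do. A sketch, with the host mask taken from the error:

    mysql -e "SHOW GRANTS FOR 'wikiadmin'@'208.80.152.%';"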
            
08:34 <Andrew> Slave is supposedly still running on db5. Assuming Roan didn't stop it when he switched masters a few days ago. Going to text somebody to confirm that stopping it is the correct course of action. [production]
            
08:24 <Andrew> db5 can't be lagged, it's the master ;-). Obviously something is wrong with wfWaitForSlaves. [production]
            
08:19 <Andrew> db5 lagged 217904 seconds [production]
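
The impossible 217904-second figure is consistent with lag being read from leftover slave state: a true master has no slave status, so a stale Seconds_Behind_Master would explain it. A quick check, sketched:

    # empty output here is what a healthy master should produce
    mysql -h db5 -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master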
            
05:09 <Andrew> Ran thread_pending_relationship and thread_reaction schema changes on all LiquidThreads wikis [production]
            
05:06 <andrew> synchronizing Wikimedia installation... Revision: 70933 [production]
            
05:04 <Andrew> About to update LiquidThreads production version to the alpha. [production]
            
  
2010-08-13
22:03 <mark> API logins on commons (only) are reported broken [production]
            
21:45 <mark> Set correct $cluster variable for reinstalled knsq* squids [production]
            
21:03 <mark> Increased cache_mem from 1000 to 2500 on sq33, like the other API backend squids [production]
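
cache_mem is the squid.conf directive for the in-memory object cache; the units here are presumably MB. A sketch of the change and a reload without a restart:

    # in squid.conf on sq33 (assumed unit, matching the other API backends):
    #   cache_mem 2500 MB
    squid -k reconfigure    # re-read the config on the running instance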
            
20:58 <mark> Stopping backend squid on sq33 [production]
            
20:50 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24769 - Import source addition for tpi.wikipedia.org' [production]
            
17:46 <Fred> and srv100 [production]

17:45 <Fred> restarted apache on srv219 and srv222 [production]
            
15:57 <mark> synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list' [production]
            
15:56 <mark> synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list' [production]
            
15:53 <RobH> srv146 removed from puppet and nodelists, slated for wipe, decommissioned. [production]
            
15:47 <mark> Sent srv146 to death using echo b > /proc/sysrq-trigger. It had a read-only filesystem and is therefore decommissioned. [production]
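
The magic-SysRq write used here reboots the box immediately, without syncing or unmounting filesystems, which is about the only option left once the root filesystem has gone read-only. For reference:

    echo 1 > /proc/sys/kernel/sysrq   # enable SysRq if it isn't already
    echo b > /proc/sysrq-trigger      # 'b': immediate reboot, no sync/unmount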
            
15:38 <mark> Restarted backend squid on sq33 [production]
            
15:36 <mark> synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list' [production]
            
15:25 <mark> Reinstalled sq32 with Lucid [production]
            
15:01 <mark> Removed sq86 and sq87 from API LVS pool [production]
            
14:55 <mark> sq80 had been down for a long time. Brought it back up and synced it [production]
            
14:54 <rainman-sr> all of the search cluster restored to pre-relocation configuration [production]
            
14:34 <robh> synchronized php-1.5/wmf-config/lucene.php 'reverting search13 to search11' [production]
            
13:55 <mark> /dev/sda on sq57 is busted [production]
            
13:54 <RobH> removed search17 from search_pool_3 [production]
            
13:50 <mark> Set idleconnection.timeout = 300 (NOT idlecommand.timeout) on all LVS services on lvs3, restarting pybal [production]
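
PyBal reads INI-style service sections, and per this entry the monitor key is idleconnection.timeout, not the similar-looking idlecommand.timeout. A sketch of the change; the file path, section name, and init script are assumptions:

    # in /etc/pybal/pybal.conf on lvs3, for each service section, e.g.:
    #   [apaches]
    #   idleconnection.timeout = 300
    /etc/init.d/pybal restart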
            
13:44 <mark> powercycled sq57, which was stuck in [16538652.048532] BUG: soft lockup - CPU#3 stuck for 61s! [gmond:15746] [production]
            
13:42 <mark> sq58 was down for a long long time. Brought it back up and synced it [production]
            
13:37 <RobH> added search7 back into search_pool_3, kept search17 in as well [production]
            
13:27 <RobH> changed search_pool_3 back from search7 to search17 since search7 failed [production]
            
13:25 <robh> synchronized php-1.5/wmf-config/lucene.php 'Re-enabling LucenePrefixSearch - pushed changes on lvs3 to put search back to normal use' [production]
            
12:45 <mark> API squid cluster is too flaky for my taste. Converting sq33 into an API backend squid as well [production]
            
12:40 <mark> Shut down puppet and backend squid on sq32 [production]
            
11:41 <mark> Corrected the changed hostname for api.svc.pmtpa.wmnet in the text squid config files [production]
            
11:37 <mark> Temporarily rejecting requests to sq31 backend to give it some breathing room while it's reading its COSS dirs [production]
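
A backend squid rescanning its COSS cache_dirs at startup answers slowly, so briefly refusing traffic lets the frontends fail over cleanly until the rebuild finishes. One way to express the block in squid.conf terms (the placement is an assumption):

    # temporary rule at the top of sq31's backend config; remove after warm-up
    #   http_access deny all
    squid -k reconfigure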
            
11:32 <mark> Reinstalled sq31 with Lucid [production]
            
10:25 <mark> Shutting down backend squid on sq31 to see the load impact [production]