| 
      
        2009-01-08
      
     | 
  
    
  | 22:08 | 
  <brion> | 
  putting db12 back in service, caught up | 
  [production] | 
            
  | 21:42 | 
  <RobH> | 
  changed the ip address for the management interfaces on sq31-sq50 | 
  [production] | 
            
  | 21:30 | 
  <RobH> | 
  updated dns with the squids and srv management info for pmtpa | 
  [production] | 
            
  | 21:16 | 
  <brion> | 
  taking load off db12 while it updates | 
  [production] | 
            
  | 21:15 | 
  <brion> | 
  killing stuck query threads on db12 (lagged 13k seconds) | 
  [production] | 
            
  | 20:23 | 
  <RobH> | 
  updated dns removing a large number of decommissioned servers from records. | 
  [production] | 
            
  | 20:08 | 
  <RobH> | 
  pushed updates to dns for management ip allocations, changed management ips of search8-search12 | 
  [production] | 
            
  | 19:43 | 
  <RobH> | 
  changed the management ip addresses of db5-db10 to fit into current ip scheme | 
  [production] | 
            
  | 18:20 | 
  <RobH> | 
  updated dns for the management name resolution of db11-db30 | 
  [production] | 
            
  | 18:11 | 
  <RobH> | 
  ms5 has lom access enabled and is ready for testing.  (Only one ethernet connection in lieu of the typical 3 on the thumper/thors) | 
  [production] | 
            
  | 15:50 | 
  <RobH> | 
  srv118 reinstalled | 
  [production] | 
            
  | 15:46 | 
  <RobH> | 
  srv136 is borked.  Even after reinstall, it will run for a few minutes, then lock hard.  Going to RMA it. | 
  [production] | 
            
  | 15:38 | 
  <RobH> | 
  reinstalled srv136 and srv118 cuz they were pissing me off (a valid reinstallation reason if there ever was one.) | 
  [production] | 
            
  | 15:09 | 
  <RobH> | 
  and srv118 back down, thing is borked. | 
  [production] | 
            
  | 15:06 | 
  <RobH> | 
  srv118 back online and serving requests. | 
  [production] | 
            
  | 15:01 | 
  <RobH> | 
  pushed db13 back into cluster, same with db14, from yesterday's work | 
  [production] | 
            
  | 14:26 | 
  <RobH> | 
  srv101 back online and in lvs | 
  [production] | 
            
  | 14:15 | 
  <RobH> | 
  reinstalled srv101, installing wikimedia-task-app packages now | 
  [production] | 
            
  | 06:37 | 
  <JeLuF> | 
  rebooted db18. Mysqld was stuck but couldn't be killed. | 
  [production] | 
            
  | 04:08 | 
  <Tim> | 
  migrated all locked wikis from $wgReadOnly(File) to permissions-based locking, so that stewards can edit the alternate project links, and so that various MediaWiki components don't break on page view | 
  [production] | 
            
  | 03:57 | 
  <river> | 
  set up ms3/ms4 with solaris 10 update 6 | 
  [production] | 
            
  
    | 
      
        2009-01-07
      
     | 
  
    
  | 22:50 | 
  <RobH> | 
  db13 and db14 are replicating but not in the cluster (not sure if they are caught up) | 
  [production] | 
            
  | 22:35 | 
  <RobH> | 
  updated power strip information for ps1-a1-sdtpa and balanced load | 
  [production] | 
            
  | 22:35 | 
  <RobH> | 
  reseated mrj cable for csw1-sdtpa_1/13 | 
  [production] | 
            
  | 21:36 | 
  <RobH> | 
  started up db13 and db14 | 
  [production] | 
            
  | 21:19 | 
  <RobH> | 
  updating firmware on db13-db14 | 
  [production] | 
            
  | 21:15 | 
  <RobH> | 
  shutdown db13 and db14 to fix lom lockup issue. | 
  [production] | 
            
  | 20:52 | 
  <RobH> | 
  depooled db13 and db14 in db.php to reboot them and fix the SP lockup issue. | 
  [production] | 
            
  | 20:49 | 
  <RobH> | 
  updating firmware on db16. | 
  [production] | 
            
  | 20:43 | 
  <RobH> | 
  started mysql back up on db15 | 
  [production] | 
            
  | 20:42 | 
  <RobH> | 
  cold reset of db16 to resolve lom issue.  will update firmware upon boot. | 
  [production] | 
            
  | 20:39 | 
  <RobH> | 
  swapped hostnames on ms3 and ms4, updated racktables and dns to reflect change | 
  [production] | 
            
  | 20:24 | 
  <brion> | 
  disabled wikidiff2 since it's not installed, and this apparently is nicely broken | 
  [production] | 
            
  | 20:21 | 
  <RobH> | 
  db15 now responsive to lom and ready to be re-integrated into the cluster | 
  [production] | 
            
  | 20:12 | 
  <RobH> | 
  db15 cold reset fixes the LOM non-responsive issue.  Upgrading its firmware to prevent future issues. | 
  [production] | 
            
  | 20:06 | 
  <brion> | 
  removed stray whitespace from wikitech config file which was breaking rss feeds | 
  [production] | 
            
  | 19:23 | 
  <mark> | 
  Possibility that esams LVS was overloaded, split over 2 boxes (fuchsia & mint) | 
  [production] | 
            
  | 19:19 | 
  <RobH> | 
  ms3 and ms4 are accessible via LOM and ready for setup/deployment | 
  [production] | 
            
  | 19:05 | 
  <RobH> | 
  updated dns for ms3-ms5, updated dns for management for all media servers. | 
  [production] | 
            
  | 19:05 | 
  <RobH> | 
  nope  ;_; | 
  [production] | 
            
  | 19:04 | 
  <RobH> | 
  did domas update the bot for the new wikitech? | 
  [production] | 
            
  | 19:03 | 
  <brion> | 
  touching MessagesZh.php and re-trying scap; may not have properly updated | 
  [production] | 
            
  | 17:40 | 
  <brion-plague> | 
  scapping -- merged r45507 zh specialpage alias fix to live. also r45499 (revert of Cite error thingy) seems to already have been merged | 
  [production] | 
            
  | 13:58 | 
  <Tim> | 
  ran updateAutoPromote.php on all flaggedRevs wikis | 
  [production] | 
            
  | 13:41 | 
  <Tim> | 
  scap | 
  [production] | 
            
  | 13:21 | 
  <Tim> | 
  repooled db3 and db4 | 
  [production] | 
            
  | 04:36 | 
  <brion-codereview> | 
  svn up'ing testwiki to r45489 | 
  [production] |