| 2011-04-25 |
    
  | 21:20 | <RobH> | trying to fix storage2 | [production] | 
            
  | 20:16 | <notpeter> | actually adding everyone on ops to watchmouse service... didn't know this had not already been done. | [production] | 
            
  | 20:02 | <RobH> | updated csw1 to remove labels and move to default vlan ports 11/12, 11/14, 11/19, & 11/21.  old connection ports for dataset2, tridge, ms1, and ms5 | [production] | 
            
  | 19:53 | <RobH> | the datacenter is looking awesome. | [production] | 
            
  | 19:45 | <RobH> | ms1 moved from temp network to permanent home, no downtime, responding fine | [production] | 
            
  | 19:42 | <RobH> | ms5 connection moved, no downtime, responds fine, less than 4 seconds | [production] | 
            
  | 19:40 | <RobH> | updated csw1-sdtpa 15/1,15/2 from vlan 105 to vlan 2, 15/3 and 15/4 from vlan 105 to 101 | [production] | 
            
  | 18:52 | <RobH> | snapshot4 relocated to new home, ready for os install | [production] | 
            
  | 18:42 | <RobH> | db19 and db20 back online (not in services as they have other issues) | [production] | 
            
  | 18:39 | <RobH> | db19 and db20 powering back up | [production] | 
            
  | 18:25 | <RobH> | virt4 experienced an accidental reboot when rebalancing power in the rack, my fault, not the hardware | [production] | 
            
  | 18:12 | <RobH> | rack b2 power rebalanced | [production] | 
            
  | 18:01 | <RobH> | db19 set to slave, depooled in db.php, no other services evident, shutting down (mysql stopped cleanly) | [production] | 
            
  | 18:00 | <RobH> | db20 shutdown | [production] | 
            
  | 18:00 | <RobH> | didn't log that i set up ports 11/38-40 for db19, db20, and snapshot4 on csw1-sdtpa.  tested out fine and all my major configuration changes on network should be complete | [production] | 
            
  | 17:56 | <RobH> | ok, db20 and db19 are coming offline to relocate their rack location due to power distro issues | [production] | 
            
  | 15:47 | <RobH> | delay, not coming down yet, need more cables | [production] | 
            
  | 15:46 | <RobH> | db19 is coming down as well, it is depooled anyhow | [production] | 
            
  | 15:46 | <RobH> | db20 is coming down, ganglia aggregation for those hosts may be delayed until it is back online. | [production] | 
            
  | 15:21 | <RobH> | relocating snapshot4 into rack c2, it will be offline during this process | [production] | 
            
  | 15:20 | <RobH> | db43-db47 network setup, sites not down, yay me | [production] | 
            
  | 15:10 | <RobH> | being on csw1 makes robh nervous. | [production] | 
            
  | 15:09 | <RobH> | labeling and setting up ports on 11/33 through 11/37 on csw1-sdtpa for db43 through db47 | [production] | 
            
  | 14:47 | <RobH> | fixed storage2 serial console (set it to higher rate, magically works, or it just fears me) and also confirmed its remote power control is functioning | [production] | 
            
  | 14:42 | <RobH> | stealing dataset1's known good scs connection to test storage2.  dataset1 service will remain unaffected. | [production] | 
            
  
    | 2011-04-23 |
    
  | 22:31 | <RobH> | required even. | [production] | 
            
  | 22:31 | <RobH> | no drives display error leds, further investigation requried | [production] | 
            
  | 22:27 | <RobH> | ms2 is having a bad drive investigated.  if we do this right, it won't go down.  if we don't, it will.  it is a slave es server. | [production] | 
            
  | 22:00 | <RobH> | singer returned to operation, blog, techblog, survey, and secure returned to normal operation | [production] | 
            
  | 21:52 | <RobH> | singer is once again coming back down for drive replacement.  This will take offline blog.wikimedia.org, techblog.wikimedia.org, survey.wikimedia.org, and secure.wikipedia.org.  Service will be returned as soon as possible. | [production] | 
            
  | 21:19 | <RobH> | singer back online for a while, will come back down for further repair shortly. | [production] | 
            
  | 21:05 | <RobH> | singer going down, blogs will be offline, so will secure, system will return to service as soon as possible | [production] | 
            
  | 21:00 | <RobH> | preparing to fix the dead drive in singer, this will offline secure, blog, techblog, and survey during the drive replacement process | [production] | 
            
  | 19:50 | <mark> | Upgrading mr1-pmtpa to junos 10.4R3.4 | [production] | 
            
  | 17:49 | <RobH> | migrating searchidx1 & search1-search10 to new ports in same rack.  moving one at a time and ensuring link lights between moves.  (already tested with search10) | [production] | 
            
  | 14:11 | <RobH> | db19 is back online, seems to not have any mysql setup done. | [production] | 
            
  | 14:02 | <RobH> | restarting db19 | [production] | 
            
  | 14:02 | <RobH> | arcconf checks out all drives on db19 are indeed working as rich found earlier | [production] | 
            
  | 12:47 | <mark> | Added (x121Address=1) condition to the LDAP query of the ldap_aliases router on mchenry's exim | [production] | 
            
  | 00:32 | <hcatlin> | Mobile: Deploying fix to an issue that kept the standard-style Main_Page from displaying on mobile | [production] | 
            
  | 00:25 | <Ryan_Lane> | restarting memcached on all of the mobile servers | [production] | 
            
  | 00:23 | <Ryan_Lane> | repooling mobile3, since mobile will die without it (fun!!) | [production] | 
            
  | 00:17 | <Ryan_Lane> | depooling mobile3 | [production] | 
            
  | 00:13 | <Ryan_Lane> | restarting apache on mobile3 | [production] | 
            
  | 00:10 | <Ryan_Lane> | puppet was broken on mobile1, reinstalled it | [production] |