| 2011-04-25 |
    
  | 18:00 | <RobH> | db20 shutdown | [production] | 
            
  | 18:00 | <RobH> | didn't log that i set up ports 11/38-40 for db19, db20, and snapshot4 on csw1-sdtpa.  tested out fine and all my major configuration changes on the network should be complete | [production] | 
            
  | 17:56 | <RobH> | ok, db20 and db19 are coming offline to relocate their rack location due to power distro issues | [production] | 
            
  | 15:47 | <RobH> | delay, not coming down yet, need more cables | [production] | 
            
  | 15:46 | <RobH> | db19 is coming down as well, it is depooled anyhow | [production] | 
            
  | 15:46 | <RobH> | db20 is coming down, ganglia aggregation for those hosts may be delayed until it is back online. | [production] | 
            
  | 15:21 | <RobH> | relocating snapshot4 into rack c2, it will be offline during this process | [production] | 
            
  | 15:20 | <RobH> | db43-db47 network setup, sites not down, yay me | [production] | 
            
  | 15:10 | <RobH> | being on csw1 makes robh nervous. | [production] | 
            
  | 15:09 | <RobH> | labeling and setting up ports on 11/33 through 11/37 on csw1-sdtpa for db43 through db47 | [production] | 
            
  | 14:47 | <RobH> | fixed storage2 serial console (set it to higher rate, magically works, or it just fears me) and also confirmed its remote power control is functioning | [production] | 
            
  | 14:42 | <RobH> | stealing dataset1's known good scs connection to test storage2.  dataset1 service will remain unaffected. | [production] | 
            
  
    | 2011-04-23 |
    
  | 22:31 | <RobH> | required even. | [production] | 
            
  | 22:31 | <RobH> | no drives display error leds, further investigation requried | [production] | 
            
  | 22:27 | <RobH> | ms2 is having a bad drive investigated.  if we do this right, it won't go down.  if we don't, it will.  it is a slave es server. | [production] | 
            
  | 22:00 | <RobH> | singer returned to operation, blog, techblog, survey, and secure returned to normal operation | [production] | 
            
  | 21:52 | <RobH> | singer is once again coming back down for drive replacement.  This will take offline blog.wikimedia.org, techblog.wikimedia.org, survey.wikimedia.org, and secure.wikipedia.org.  Service will be returned as soon as possible. | [production] | 
            
  | 21:19 | <RobH> | singer back online for a while, will come back down for further repair shortly. | [production] | 
            
  | 21:05 | <RobH> | singer going down, blogs will be offline, so will secure, system will return to service as soon as possible | [production] | 
            
  | 21:00 | <RobH> | preparing to fix the dead drive in singer, this will offline secure, blog, techblog, and survey during the drive replacement process | [production] | 
            
  | 19:50 | <mark> | Upgrading mr1-pmtpa to junos 10.4R3.4 | [production] | 
            
  | 17:49 | <RobH> | migrating searchidx1 & search1-search10 to new ports in same rack.  moving one at a time and ensuring link lights between moves.  (already tested with search10) | [production] | 
            
  | 14:11 | <RobH> | db19 is back online, seems to not have any mysql setup done. | [production] | 
            
  | 14:02 | <RobH> | restarting db19 | [production] | 
            
  | 14:02 | <RobH> | arcconf checks out all drives on db19 are indeed working as rich found earlier | [production] | 
            
  | 12:47 | <mark> | Added (x121Address=1) condition to the LDAP query of the ldap_aliases router on mchenry's exim | [production] | 
            
  | 00:32 | <hcatlin> | Mobile: Deploying fix to an issue that kept the standard-style Main_Page from displaying on mobile | [production] | 
            
  | 00:25 | <Ryan_Lane> | restarting memcached on all of the mobile servers | [production] | 
            
  | 00:23 | <Ryan_Lane> | repooling mobile3, since mobile will die without it (fun!!) | [production] | 
            
  | 00:17 | <Ryan_Lane> | depooling mobile3 | [production] | 
            
  | 00:13 | <Ryan_Lane> | restarting apache on mobile3 | [production] | 
            
  | 00:10 | <Ryan_Lane> | puppet was broken on mobile1, reinstalled it | [production] | 
            
  
    | 2011-04-22 |
    
  | 23:56 | <domas> | detached gdb from srv193 apache, apparently it was used for something | [production] | 
            
  | 23:14 | <notpeter> | restarting nagios (again) | [production] | 
            
  | 22:43 | <notpeter> | restarting nagios | [production] | 
            
  | 19:23 | <apergos> | shot all stopped rsyncs on ms5 (that were copying from ms4 about two weeks ago), changed all perms on the directories they had reached so thumbs can be served/read from them.. oh. not me, someone else must have done it, I'm not here :-P | [production] | 
            
  | 19:02 | <RobH> | ms4 shutting down for memory troubleshooting | [production] | 
            
  | 18:52 | <RobH> | ms4 troubleshooting, disregard bounces | [production] | 
            
  | 18:51 | <notpeter> | restarting nagios | [production] | 
            
  | 12:41 | <hcatlin> | Restarting mobile cluster with April code update. | [production] | 
            
  | 00:49 | <notpeter> | restarting nagios. hopefully now with more sms! | [production] |