| 
      
        2011-04-25
      
     | 
  
    
  | 15:21 | 
  <RobH> | 
  relocating snapshot4 into rack c2, it will be offline during this process | 
  [production] | 
            
  | 15:20 | 
  <RobH> | 
  db43-db47 network setup, sites not down, yay me | 
  [production] | 
            
  | 15:10 | 
  <RobH> | 
  being on csw1 makes robh nervous. | 
  [production] | 
            
  | 15:09 | 
  <RobH> | 
  labeling and setting up ports on 11/33 through 11/37 on csw1-sdtpa for db43 through db47 | 
  [production] | 
            
  | 14:47 | 
  <RobH> | 
  fixed storage2 serial console (set it to higher rate, magically works, or it just fears me) and also confirmed its remote power control is functioning | 
  [production] | 
            
  | 14:42 | 
  <RobH> | 
  stealing dataset1's known good scs connection to test storage2.  dataset1 service will remain unaffected. | 
  [production] | 
            
  
    | 
      
        2011-04-23
      
     | 
  
    
  | 22:31 | 
  <RobH> | 
  required even. | 
  [production] | 
            
  | 22:31 | 
  <RobH> | 
  no drives display error leds, further investigation requried | 
  [production] | 
            
  | 22:27 | 
  <RobH> | 
  ms2 is having a bad drive investigated.  if we do this right, it won't go down.  if we don't, it will.  it is a slave es server. | 
  [production] | 
            
  | 22:00 | 
  <RobH> | 
  singer returned to operation, blog, techblog, survey, and secure returned to normal operation | 
  [production] | 
            
  | 21:52 | 
  <RobH> | 
  singer is once again coming back down for drive replacement.  This will take offline blog.wikimedia.org, techblog.wikimedia.org, survey.wikimedia.org, and secure.wikipedia.org.  Service will be returned as soon as possible.   | 
  [production] | 
            
  | 21:19 | 
  <RobH> | 
  singer back online for a while, will come back down for further repair shortly. | 
  [production] | 
            
  | 21:05 | 
  <RobH> | 
  singer going down, blogs will be offline, so will secure, system will return to service as soon as possible | 
  [production] | 
            
  | 21:00 | 
  <RobH> | 
  preparing to fix the dead drive in singer, this will offline secure, blog, techblog, and survey during the drive replacement process | 
  [production] | 
            
  | 19:50 | 
  <mark> | 
  Upgrading mr1-pmtpa to junos 10.4R3.4 | 
  [production] | 
            
  | 17:49 | 
  <RobH> | 
  migrating searchidx1 & search1-search10 to new ports in same rack.  moving one at a time and ensuring link lights between moves.  (already tested with search10) | 
  [production] | 
            
  | 14:11 | 
  <RobH> | 
  db19 is back online, seems to not have any mysql setup done. | 
  [production] | 
            
  | 14:02 | 
  <RobH> | 
  restarting db19 | 
  [production] | 
            
  | 14:02 | 
  <RobH> | 
  arcconf checks out: all drives on db19 are indeed working, as rich found earlier | 
  [production] | 
            
  | 12:47 | 
  <mark> | 
  Added (x121Address=1) condition to the LDAP query of the ldap_aliases router on mchenry's exim | 
  [production] | 
            
  | 00:32 | 
  <hcatlin> | 
  Mobile: Deploying fix to an issue that kept the standard-style Main_Page from displaying on mobile | 
  [production] | 
            
  | 00:25 | 
  <Ryan_Lane> | 
  restarting memcached on all of the mobile servers | 
  [production] | 
            
  | 00:23 | 
  <Ryan_Lane> | 
  repooling mobile3, since mobile will die without it (fun!!) | 
  [production] | 
            
  | 00:17 | 
  <Ryan_Lane> | 
  depooling mobile3 | 
  [production] | 
            
  | 00:13 | 
  <Ryan_Lane> | 
  restarting apache on mobile3 | 
  [production] | 
            
  | 00:10 | 
  <Ryan_Lane> | 
  puppet was broken on mobile1, reinstalled it | 
  [production] | 
            
  
    | 
      
        2011-04-22
      
     | 
  
    
  | 23:56 | 
  <domas> | 
  detached gdb from srv193 apache, apparently it was used for something | 
  [production] | 
            
  | 23:14 | 
  <notpeter> | 
  restarting nagios (again) | 
  [production] | 
            
  | 22:43 | 
  <notpeter> | 
  restarting nagios | 
  [production] | 
            
  | 19:23 | 
  <apergos> | 
  shot all stopped rsyncs on ms5 (that were copying from ms4 about two weeks ago), changed all perms on the directories they had reached so thumbs can be served/read from them.. oh. not me, someone else must have done it, I'm not here :-P | 
  [production] | 
            
  | 19:02 | 
  <RobH> | 
  ms4 shutting down for memory troubleshooting | 
  [production] | 
            
  | 18:52 | 
  <RobH> | 
  ms4 troubleshooting, disregard bounces | 
  [production] | 
            
  | 18:51 | 
  <notpeter> | 
  restarting nagios | 
  [production] | 
            
  | 12:41 | 
  <hcatlin> | 
  Restarting mobile cluster with April code update. | 
  [production] | 
            
  | 00:49 | 
  <notpeter> | 
  restarting nagios. hopefully now with more sms! | 
  [production] | 
            
  
    | 
      
        2011-04-21
      
     | 
  
    
  | 23:32 | 
  <midom> | 
  synchronized php-1.17/includes/ImagePage.php  | 
  [production] | 
            
  | 20:54 | 
  <pdhanda> | 
  synchronized live-1.5/404.php  | 
  [production] | 
            
  | 19:02 | 
  <domas> | 
  bumped up maxclients/serverlimit on singer to 350 (up from 150), set maxrequestsperchild to 30 to avoid heap blowup (down from 0), all governed via apache2/conf.d/maxrequests | 
  [production] | 
            
  | 18:25 | 
  <Ryan_Lane> | 
  restarting apache on singer | 
  [production] | 
            
  | 18:19 | 
  <Ryan_Lane> | 
  applying system patches to raskin | 
  [production] | 
            
  | 17:47 | 
  <Ryan_Lane> | 
  restarting apache on singer | 
  [production] | 
            
  | 17:04 | 
  <pdhanda> | 
  synchronized live-1.5/404.php  | 
  [production] | 
            
  | 16:52 | 
  <rainman-sr> | 
  copying stuff from /home/ariel/searchidx/ to searchidx2 as it didn't get copied | 
  [production] | 
            
  | 15:32 | 
  <awjrichards> | 
  synchronized php-1.17/wmf-config/CommonSettings.php  'Fixing typo to properly check for wgUseEmailCapture' | 
  [production] | 
            
  | 15:29 | 
  <awjrichards> | 
  ran sync-common-all  | 
  [production] | 
            
  | 15:06 | 
  <awjrichards> | 
  synchronized php-1.17/wmf-config/InitialiseSettings.php  'Added section for EmailCapture, enabling on testwiki' | 
  [production] | 
            
  | 15:05 | 
  <awjrichards> | 
  synchronized php-1.17/wmf-config/CommonSettings.php  'Added section for EmailCapture' | 
  [production] |