| 
      
        2011-04-03
      
     | 
  
    
  | 13:03 | 
  <apergos> | 
  shot rsyncs on ms5, setting 777 dir perms on all thumbnail dirs (eg e/ef/blablah.jpg) so scalers can write into them | 
  [production] | 
            
  | 12:53 | 
  <apergos> | 
  did same for rest of projects and subdirs (777 on hash dirs) | 
  [production] | 
            
  | 12:47 | 
  <apergos> | 
  chmod 777 on commons/thumb/*/* on ms5 so that scalers can create directories in there (mismatch of uid apache vs www-data) | 
  [production] | 
            
  | 11:12 | 
  <mark> | 
  Raised per-squid connection limit to ms5 from 200 to 400 connections | 
  [production] | 
            
  | 11:05 | 
  <mark> | 
  Raised per-squid connection limit to ms5 from 100 to 200 connections | 
  [production] | 
            
  | 10:55 | 
  <mark> | 
  Fixed squid loop: the pmtpa.upload squids were using the esams squids as "CARP parents for distant content" | 
  [production] | 
            
  | 10:29 | 
  <mark> | 
  Fixed puppet on sq42/43 | 
  [production] | 
            
  | 09:44 | 
  <mark> | 
  Lowered FCGI thumb handlers from 90 to 60 again, to reduce concurrency | 
  [production] | 
            
  | 08:08 | 
  <mark> | 
  Started 4 more rsyncs (8 total now) | 
  [production] | 
            
  | 07:49 | 
  <mark> | 
  Removed mlocate from ms5, puppetising | 
  [production] | 
            
  | 07:42 | 
  <mark> | 
  Started 4 rsyncs from ms4 to ms5 (--ignore-existing) | 
  [production] | 
            
  | 07:32 | 
  <mark> | 
  increased thumb handler count from 60 to 90 | 
  [production] | 
            
  | 07:11 | 
  <mark> | 
  Doubled the number of FCGI thumb handlers | 
  [production] | 
            
  | 07:08 | 
  <mark> | 
  Turned off logging of 404s to nginx error.log | 
  [production] | 
            
  | 06:50 | 
  <mark> | 
  Restarted Apache on the image scalers | 
  [production] | 
            
  | 06:49 | 
  <mark> | 
  Reconfigured ms5 to use the 404 thumb handler | 
  [production] | 
            
  | 06:48 | 
  <Ryan_Lane> | 
  disabling nfs on ms4 | 
  [production] | 
            
  | 06:33 | 
  <mark> | 
  Running puppet on all apaches to fix fstab and mount ms5.pmtpa.wmnet:/export/thumbs | 
  [production] | 
            
  | 06:32 | 
  <mark> | 
  Unmounting /mnt/thumbs on all mediawiki-installation servers | 
  [production] | 
            
  | 06:30 | 
  <mark> | 
  Remounted NFS /mnt/thumbs on the scalers to ms5 | 
  [production] | 
            
  | 06:28 | 
  <Ryan_Lane> | 
  bring nfs back up | 
  [production] | 
            
  | 06:28 | 
  <Ryan_Lane> | 
  brought ms4 back up. stopping the web server service and nfs | 
  [production] | 
            
  | 06:20 | 
  <mark> | 
  Set up NFS kernel server on ms5 | 
  [production] | 
            
  | 06:18 | 
  <Ryan_Lane> | 
  powercycling ms4 | 
  [production] | 
            
  | 05:29 | 
  <Ryan_Lane> | 
  rebooting ms4 with -d to get a coredump | 
  [production] | 
            
  | 05:14 | 
  <apergos> | 
  reenabling webserver on ms4 for testing | 
  [production] | 
            
  | 04:45 | 
  <apergos> | 
  stopping web service on ms4 for the moment | 
  [production] | 
            
  | 04:29 | 
  <apergos> | 
  shot webserver again | 
  [production] | 
            
  | 04:26 | 
  <apergos> | 
  turned off hourly snaps on ms4, turned back on webserver and nfs | 
  [production] | 
            
  | 04:09 | 
  <apergos> | 
  rebooted ms4, shut down webserver and nfsd temporarily for testing | 
  [production] | 
            
  | 02:58 | 
  <apergos> | 
  still looking at kernel memory issues, still rebooting, ryan should be here in a few minutes to help out | 
  [production] | 
            
  | 02:03 | 
  <apergos> | 
  a solaris advisor... also have zfs arc cache max set to 2g which is ridiculously low but wtf right? | 
  [production] | 
            
  | 02:02 | 
  <apergos> | 
  set tcp_time_wait_interval to 10000 at suggestion of | 
  [production] | 
            
  | 01:37 | 
  <apergos> | 
  lowered zfs arc max to 2g (someone should reset this later)... will take effect on next reboot | 
  [production] | 
            
  | 00:29 | 
  <apergos> | 
  rebooting with the new zfs arc cache max value, which will reduce the min value as well... dunno if this will give us enough breathing room or not | 
  [production] | 
            
  | 00:24 | 
  <apergos> | 
  set zfs arc cache to ridiculously low value of 4gb, since when it's healthy it's using much less than that (1gb), this will take effect on reboot | 
  [production] | 
            
  | 00:22 | 
  <Reedy> | 
  Still experiencing MS4 issues, thumb service is likely to be problematic for most users | 
  [production] | 
            
  
    | 
      
        2011-04-02
      
     | 
  
    
  | 23:47 | 
  <apergos> | 
  rebooting ms4 from serial console; it was out to lunch and took the renderers down too | 
  [production] | 
            
  | 18:42 | 
  <catrope> | 
  synchronized php-1.17/wmf-config/CommonSettings.php  'Per NeilK, change Category:Uploaded_by_UploadWizard to Category:Uploaded_with_UploadWizard' | 
  [production] | 
            
  | 17:59 | 
  <mark> | 
  Upgrading varnish to 2.1.5 | 
  [production] | 
            
  | 17:14 | 
  <demon> | 
  synchronized php-1.17/includes/filerepo/LocalFile.php  'r85200' | 
  [production] | 
            
  | 14:19 | 
  <mark> | 
  Implemented CARP weights for distant CARP parents on squid configurator (used to be all equal before) | 
  [production] | 
            
  | 11:36 | 
  <mark> | 
  Created btrfs filesystem on ms6, striped (raid10 style) over 46 devices - very experimental | 
  [production] | 
            
  | 09:50 | 
  <mark> | 
  Reinstalling ms6 with Ubuntu 10.04 | 
  [production] | 
            
  | 09:50 | 
  <mark> | 
  Fixed torrus again | 
  [production] | 
            
  | 06:02 | 
  <mark> | 
  !wikipedia The image thumbnail servers appear stable now | 
  [production] | 
            
  | 04:59 | 
  <mark> | 
  Increased nginx worker processes from 1 to 4, set file limit to 30k | 
  [production] | 
            
  | 04:40 | 
  <mark> | 
  !wikipedia Image Thumbnail server outage, it's being worked on | 
  [production] | 
            
  | 04:34 | 
  <mark> | 
  Power cycling ms4 again | 
  [production] | 
            
  | 04:06 | 
  <mark> | 
  Power cycled ms4 again | 
  [production] |