| 2011-04-03
      
    
  | 07:32 | <mark> | increased thumb handler count from 60 to 90 | [production] | 
            
  | 07:11 | <mark> | Doubled the amount of fcgi thumb handlers | [production] | 
            
  | 07:08 | <mark> | Turned off logging of 404s to nginx error.log | [production] | 
            
  | 06:50 | <mark> | Restarted Apache on the image scalers | [production] | 
            
  | 06:49 | <mark> | Reconfigured ms5 to use the 404 thumb handler | [production] | 
            
  | 06:48 | <Ryan_Lane> | disabling nfs on ms4 | [production] | 
            
  | 06:33 | <mark> | Running puppet on all apaches to fix fstab and mount ms5.pmtpa.wmnet:/export/thumbs | [production] | 
            
  | 06:32 | <mark> | Unmounting /mnt/thumbs on all mediawiki-installation servers | [production] | 
            
  | 06:30 | <mark> | Remounted NFS /mnt/thumbs on the scalers to ms5 | [production] | 
            
  | 06:28 | <Ryan_Lane> | bring nfs back up | [production] | 
            
  | 06:28 | <Ryan_Lane> | brought ms4 back up. stopping the web server service and nfs | [production] | 
            
  | 06:20 | <mark> | Set up NFS kernel server on ms5 | [production] | 
            
  | 06:18 | <Ryan_Lane> | powercycling ms4 | [production] | 
            
  | 05:29 | <Ryan_Lane> | rebooting ms4 with -d to get a coredump | [production] | 
            
  | 05:14 | <apergos> | reenabling webserver on ms4 for testing | [production] | 
            
  | 04:45 | <apergos> | stopping web service on ms4 for the moment | [production] | 
            
  | 04:29 | <apergos> | shot webserver again | [production] | 
            
  | 04:26 | <apergos> | turned off hourly snaps on ms4, turned back on webserver and nfs | [production] | 
            
  | 04:09 | <apergos> | rebooted ms4, shut down webserver and nfsd temporarily for testing | [production] | 
            
  | 02:58 | <apergos> | still looking at kernel memory issues, still rebooting, ryan should be here in a few minutes to help out | [production] | 
            
  | 02:03 | <apergos> | a Solaris advisor... also have zfs arc cache max to 2g which is ridiculously low but wtf right? | [production] | 
            
  | 02:02 | <apergos> | set tcp_time_wait_interval to 10000 at suggestion of | [production] | 
            
  | 01:37 | <apergos> | lowered zfs arc max to 2g (someone should reset this later)... will take effect on next reboot | [production] | 
            
  | 00:29 | <apergos> | rebooting with the new zfs arc cache max value, which will reduce the min value as well... dunno if this will give us enough breathing room or not | [production] | 
            
  | 00:24 | <apergos> | set zfs arc cache to ridiculously low value of 4gb, since when it's healthy it's using much less than that (1gb), this will take effect on reboot | [production] | 
            
  | 00:22 | <Reedy> | Still experiencing MS4 issues, thumb service is likely to be problematic for most users | [production] | 
            
  
| 2011-04-02
      
    
  | 23:47 | <apergos> | rebooting ms4 from serial console, out to lunch and took the renderers down too | [production] | 
            
  | 18:42 | <catrope> | synchronized php-1.17/wmf-config/CommonSettings.php  'Per NeilK, change Category:Uploaded_by_UploadWizard to Category:Uploaded_with_UploadWizard' | [production] | 
            
  | 17:59 | <mark> | Upgrading varnish to 2.1.5 | [production] | 
            
  | 17:14 | <demon> | synchronized php-1.17/includes/filerepo/LocalFile.php  'r85200' | [production] | 
            
  | 14:19 | <mark> | Implemented CARP weights for distant CARP parents on squid configurator (used to be all equal before) | [production] | 
            
  | 11:36 | <mark> | Created btrfs filesystem on ms6, striped (raid10 style) over 46 devices - very experimental | [production] | 
            
  | 09:50 | <mark> | Reinstalling ms6 with Ubuntu 10.04 | [production] | 
            
  | 09:50 | <mark> | Fixed torrus again | [production] | 
            
  | 06:02 | <mark> | !wikipedia The image thumbnail servers appear stable now | [production] | 
            
  | 04:59 | <mark> | Increased nginx worker processes from 1 to 4, set file limit to 30k | [production] | 
            
  | 04:40 | <mark> | !wikipedia Image Thumbnail server outage, it's being worked on | [production] | 
            
  | 04:34 | <mark> | Power cycling ms4 again | [production] | 
            
  | 04:06 | <mark> | Power cycled ms4 again | [production] | 
            
  | 04:02 | <mark> | Removed ms4 from pmtpa.upload config, sending all thumbs to ms5 | [production] | 
            
  | 03:47 | <mark> | Restarted rsyncs ms4->ms5 | [production] | 
            
  | 03:25 | <Ryan_Lane> | powercycling ms4 again | [production] | 
            
  | 02:59 | <Ryan_Lane> | rebooting ms4 | [production] | 
            
  | 02:46 | <Ryan_Lane> | seems ms4 is totally dead, powercycling it | [production] | 
            
  | 01:09 | <Ryan_Lane> | installing python-pyinotify on spence for an updated ircecho | [production] |