| 
      
        2011-04-03
      
     | 
  
    
  | 06:49 | 
  <mark> | 
  Reconfigured ms5 to use the 404 thumb handler | 
  [production] | 
            
  | 06:48 | 
  <Ryan_Lane> | 
  disabling nfs on ms4 | 
  [production] | 
            
  | 06:33 | 
  <mark> | 
  Running puppet on all apaches to fix fstab and mount ms5.pmtpa.wmnet:/export/thumbs | 
  [production] | 
            
  | 06:32 | 
  <mark> | 
  Unmounting /mnt/thumbs on all mediawiki-installation servers | 
  [production] | 
            
  | 06:30 | 
  <mark> | 
  Remounted NFS /mnt/thumbs on the scalers to ms5 | 
  [production] | 
            
  | 06:28 | 
  <Ryan_Lane> | 
  bring nfs back up | 
  [production] | 
            
  | 06:28 | 
  <Ryan_Lane> | 
  brought ms4 back up. stopping the web server service and nfs | 
  [production] | 
            
  | 06:20 | 
  <mark> | 
  Set up NFS kernel server on ms5 | 
  [production] | 
            
  | 06:18 | 
  <Ryan_Lane> | 
  powercycling ms4 | 
  [production] | 
            
  | 05:29 | 
  <Ryan_Lane> | 
  rebooting ms4 with -d to get a coredump | 
  [production] | 
            
  | 05:14 | 
  <apergos> | 
  re-enabling webserver on ms4 for testing | 
  [production] | 
            
  | 04:45 | 
  <apergos> | 
  stopping web service on ms4 for the moment | 
  [production] | 
            
  | 04:29 | 
  <apergos> | 
  shot webserver again | 
  [production] | 
            
  | 04:26 | 
  <apergos> | 
  turned off hourly snaps on ms4, turned back on webserver and nfs | 
  [production] | 
            
  | 04:09 | 
  <apergos> | 
  rebooted ms4, shut down webserver and nfsd temporarily for testing | 
  [production] | 
            
  | 02:58 | 
  <apergos> | 
  still looking at kernel memory issues, still rebooting, ryan should be here in a few minutes to help out | 
  [production] | 
            
  | 02:03 | 
  <apergos> | 
  a solaris advisor... also have zfs arc cache max to 2g which is ridiculously low but wtf right? | 
  [production] | 
            
  | 02:02 | 
  <apergos> | 
  set tcp_time_wait_interval to 10000 at suggestion of | 
  [production] | 
            
  | 01:37 | 
  <apergos> | 
  lowered zfs arc max to 2g (someone should reset this later)... will take effect on next reboot | 
  [production] | 
            
  | 00:29 | 
  <apergos> | 
  rebooting with the new zfs arc cache max value, which will reduce the min value as well... dunno if this will give us enough breathing room or not | 
  [production] | 
            
  | 00:24 | 
  <apergos> | 
  set zfs arc cache to ridiculously low value of 4gb, since when it's healthy it's using much less than that (1gb), this will take effect on reboot | 
  [production] | 
            
  | 00:22 | 
  <Reedy> | 
  Still experiencing MS4 issues, thumb service is likely to be problematic for most users | 
  [production] | 
            
  
    | 
      
        2011-04-02
      
     | 
  
    
  | 23:47 | 
  <apergos> | 
  rebooting ms4 from serial console, out to lunch and took the renderers down too  | 
  [production] | 
            
  | 18:42 | 
  <catrope> | 
  synchronized php-1.17/wmf-config/CommonSettings.php  'Per NeilK, change Category:Uploaded_by_UploadWizard to Category:Uploaded_with_UploadWizard' | 
  [production] | 
            
  | 17:59 | 
  <mark> | 
  Upgrading varnish to 2.1.5 | 
  [production] | 
            
  | 17:14 | 
  <demon> | 
  synchronized php-1.17/includes/filerepo/LocalFile.php  'r85200' | 
  [production] | 
            
  | 14:19 | 
  <mark> | 
  Implemented CARP weights for distant CARP parents on squid configurator (used to be all equal before) | 
  [production] | 
            
  | 11:36 | 
  <mark> | 
  Created btrfs filesystem on ms6, striped (raid10 style) over 46 devices - very experimental | 
  [production] | 
            
  | 09:50 | 
  <mark> | 
  Reinstalling ms6 with Ubuntu 10.04 | 
  [production] | 
            
  | 09:50 | 
  <mark> | 
  Fixed torrus again | 
  [production] | 
            
  | 06:02 | 
  <mark> | 
  !wikipedia The image thumbnail servers appear stable now | 
  [production] | 
            
  | 04:59 | 
  <mark> | 
  Increased nginx worker processes from 1 to 4, set file limit to 30k | 
  [production] | 
            
  | 04:40 | 
  <mark> | 
  !wikipedia Image Thumbnail server outage, it's being worked on | 
  [production] | 
            
  | 04:34 | 
  <mark> | 
  Power cycling ms4 again | 
  [production] | 
            
  | 04:06 | 
  <mark> | 
  Power cycled ms4 again | 
  [production] | 
            
  | 04:02 | 
  <mark> | 
  Removed ms4 from pmtpa.upload config, sending all thumbs to ms5 | 
  [production] | 
            
  | 03:47 | 
  <mark> | 
  Restarted rsyncs ms4->ms5 | 
  [production] | 
            
  | 03:25 | 
  <Ryan_Lane> | 
  powercycling ms4 again | 
  [production] | 
            
  | 02:59 | 
  <Ryan_Lane> | 
  rebooting ms4 | 
  [production] | 
            
  | 02:46 | 
  <Ryan_Lane> | 
  seems ms4 is totally dead, powercycling it | 
  [production] | 
            
  | 01:09 | 
  <Ryan_Lane> | 
  installing python-pyinotify on spence for an updated ircecho | 
  [production] | 
            
  
    | 
      
        2011-04-01
      
     | 
  
    
  | 21:35 | 
  <Ryan_Lane> | 
  purging some binlogs on db9 to free up space | 
  [production] | 
            
  | 21:35 | 
  <RobH> | 
  bugzilla now version 4 | 
  [production] | 
            
  | 21:31 | 
  <RobH> | 
  taking down bugzilla for a quick upgrade | 
  [production] | 
            
  | 18:48 | 
  <Ryan_Lane> | 
  added ctwoo, brion, py, and reedy to the engineering alias | 
  [production] | 
            
  | 18:36 | 
  <mark> | 
  Deployed ms5.pmtpa.wmnet as a special 'apache' for pmtpa squid uploads... now serving a small portion of commons thumbs | 
  [production] | 
            
  | 18:11 | 
  <RobH> | 
  bugzilla back online, CRproxy was affected, and repaired | 
  [production] | 
            
  | 17:30 | 
  <RobH> | 
  bugzilla.wikimedia.org going offline for database backup and upgrade | 
  [production] | 
            
  | 17:13 | 
  <RobH> | 
  beginning upgrade process for bugzilla, its availability will be in question during this time | 
  [production] | 
            
  | 16:59 | 
  <mark> | 
  Turned off Etag in the webserver7 configuration (/opt/webserver7/https-ms4/config/obj.conf) on ms4 | 
  [production] |