| 2010-06-04 |
    
  | 22:52 | <tomaszf> | starting webstats with new binary | [production] | 
            
  | 22:50 | <tomaszf> | stopping webstats in prep for update to track mobile stats | [production] | 
            
  | 19:30 | <atglenn> | moved bad snapshots (apr 11 through may 6 2010) to /mnt/dumps/public/bad so public index shows only good dumps and so there will be no prefetch against them | [production] | 
            
  | 18:47 | <Fred> | moved mobile2 to squid vlan / re-ip'ed / dns changed. mobile1 => 115 mobile2 => 116 | [production] | 
            
  | 18:35 | <catrope> | synchronized php-1.5/wmf-config/CommonSettings.php  'Bump style version appendix. Gotta kill this thing some time' | [production] | 
            
  | 18:35 | <catrope> | synchronized php-1.5/extensions/UsabilityInitiative/Vector/Vector.combined.min.js  'r67355' | [production] | 
            
  | 18:34 | <catrope> | synchronized php-1.5/extensions/UsabilityInitiative/js/plugins.combined.min.js  'r67355' | [production] | 
            
  | 12:11 | <tstarling> | synchronized php-1.5/wmf-config/InitialiseSettings.php  'WikimediaMobile' | [production] | 
            
  | 11:37 | <Tim> | mobile down for 15 minutes, possibly apache threads exhausted, restarting apache | [production] | 
            
  | 09:56 | <catrope> | synchronized php-1.5/extensions/ContactPage/SpecialContact.php  'r67333' | [production] | 
            
  | 09:56 | <domas> | deployments manage to kill apache processes sometimes | [production] | 
            
  | 09:50 | <tstarling> | synchronizing Wikimedia installation... Revision: 66620 | [production] | 
            
  | 09:50 | <Tim> | pushing out WikimediaMobile (r67331) in preparation for deployment on testwiki | [production] | 
            
  | 08:44 | <domas> | decreased keepalivetimeout and timeout on mobile1 | [production] | 
            
  | 08:35 | <Tim> | on mobile1: reduced max passenger pool size to 200, Domas and I think it's about right, shouldn't exceed allowable memory, should give us close to 100% CPU. | [production] | 
            
  | 08:26 | <Tim> | on mobile1: domas fixed file limit, now 50k | [production] | 
            
  | 08:10 | <Tim> | increasing MaxClients on mobile1 to 1500 | [production] | 
            
  | 05:01 | <Fred> | Added apache2.conf, memcached.conf to puppet recipe for mobile. | [production] | 
            
  | 03:43 | <jeluf> | synchronized php-1.5/wmf-config/InitialiseSettings.php  '23784 - Modify add/remove rights for bureaucrats on officewiki' | [production] | 
            
  | 02:46 | <Tim> | mobile1: increased ServerLimit to 1500 and reduced MaxClients to 500 | [production] | 
            
  | 02:35 | <Tim> | on mobile1: increased memcached memory limit from 64M to 5000M | [production] | 
            
  | 02:15 | <Tim> | switched mobile1 over from apache2-mpm-worker to apache2-mpm-prefork (via puppet) | [production] | 
            
  | 01:03 | <Tim> | set ganglia host_dmax to 1 day | [production] | 
            
  
| 2010-06-03 |
    
  | 21:57 | <Fred> | mobile1 re-imaged and puppetized. Changed subnet for mobile1. Changed DNS for mobile1. m pointing to newly imaged mobile1 (until transition is completed) | [production] | 
            
  | 20:55 | <jeluf> | synchronized php-1.5/wmf-config/InitialiseSettings.php  '23689 - Enable Collection extension on Thai Wikipedia' | [production] | 
            
  | 20:22 | <AaronSchulz> | deployed r67296 FlaggedRevs_alpha | [production] | 
            
  | 20:21 | <aaron> | synchronizing Wikimedia installation... Revision: 66620 | [production] | 
            
  | 19:39 | <mark> | Moved mobile1 switchport from vlan 101 to 100 | [production] | 
            
  | 19:36 | <mark> | Reverted DNS change of mobile1, back to .157 | [production] | 
            
  | 17:21 | <Fred> | mobile1 going to be unreachable while re-ip'ing | [production] | 
            
  | 14:05 | <midom> | synchronized php-1.5/wmf-config/InitialiseSettings.php  'timezone change for bat-smg' | [production] | 
            
  | 11:53 | <mark> | Made m.wikipedia.org CNAME m.wikipedia.org, m.wikipedia.org A to mobile1/2 in RR | [production] | 
            
  | 10:57 | <hcatlin> | mobile2 has been rebuilt and is featuring the new apache/mobile stack taking 40% of all mobile traffic. pls help monitor on ganglia. | [production] | 
            
  | 09:04 | <Tim> | cleaning COSS on sq45, resynced its configuration, will start squid when done | [production] | 
            
  | 08:58 | <Tim> | kernel reports degraded RAID on sq33, sq34, sq35, sq37, sq38, sq40 | [production] | 
            
  | 08:39 | <Tim> | checked all serial consoles, all nonresponsive, rebooted all | [production] | 
            
  | 08:23 | <Tim> | sq33, sq34, sq35, sq37, sq38, sq40, sq45 have been down for 16-28 days, apparently for no good reason, can't find any log or DT entries. Will try restarts. | [production] | 
            
  | 07:56 | <Tim> | added new squids to nagios | [production] | 
            
  | 06:36 | <Tim> | cleaning cache directories on sq56 to avoid resurrection of expired content | [production] | 
            
  | 06:35 | <Tim> | adding monitoring for rather important service IPs: upload.esams and text.esams | [production] | 
            
  | 06:22 | <Tim> | sq56 not responding to ping or serial console (for 4 days), nothing in racadm getsel, rebooting | [production] | 
            
  | 06:07 | <tstarling> | synchronized php-1.5/wmf-config/InitialiseSettings.php  'disabling ClickTracking due to CR r58099' | [production] | 
            
  | 05:24 | <Tim> | started apache on srv216, was stopped for some reason | [production] | 
            
  | 03:57 | <Fred> | shutting down mailman on list for a few minutes while exim and spamd catch up | [production] |