| 2010-07-15 |
    
  | 19:23 | <mark> | Replaced all occurrences of 'rr.wikimedia.org' with 'text.wikimedia.org' in DNS | [production] | 
            
  | 19:14 | <mark> | Updated IP of deprecated record rr.esams.wikimedia.org | [production] | 
            
  | 19:10 | <mark> | Started PyBal on amslvs1 with a new config; it automatically picked up the traffic for both text.esams (91.198.174.232) and bits.esams (91.198.174.233) | [production] | 
            
  | 19:07 | <mark> | Stopped PyBal on amslvs1, BGP and OSPF did an automatic failover of bits.esams (91.198.174.233) to amslvs3 | [production] | 
            
  | 18:59 | <mark> | Removed IP 91.198.174.2 (old text squids service ip) from amslvs1. Anyone still using the old IP after weeks will now be unable to reach our sites. | [production] | 
            
  | 18:56 | <mark> | Depooled knsq1-knsq7 in PyBal | [production] | 
            
  | 17:38 | <Fred> | fixed nfs mounts on Bayes. | [production] | 
            
  | 15:35 | <apergos> | chowned /mnt/upload6/private/ExtensionDistributor/mw-snapshot/trunk/extensions tree to extdist. ExtensionDistributor is apparently working now | [production] | 
            
  | 15:01 | <apergos> | running svn cleanup on /mnt/upload6/private/ExtensionDistributor/mw-snapshot/trunk/extensions as extdist user | [production] | 
            
  | 12:34 | <tstarling> | synchronizing Wikimedia installation... Revision: 69381 | [production] | 
            
  | 12:18 | <Tim> | svn up/scap to r69380 | [production] | 
            
  | 05:13 | <jeluf> | synchronized php-1.5/wmf-config/InitialiseSettings.php  '24321 - ml.wikiquote.org lost its project namespace' | [production] | 
            
  
    | 2010-07-14 |
    
  | 23:44 | <Fred> | re-added cron job to periodically save rrds on our ganglia server. (cron job seems to have vanished for some reason) | [production] | 
            
  | 17:59 | <catrope> | synchronized php-1.5/wmf-config/InitialiseSettings.php  'Favicon for wikimaniateamwiki per Guillaume' | [production] | 
            
  | 16:06 | <Fred> | restarted apache on mobile1 (had begun to return 500) | [production] | 
            
  | 14:07 | <mark> | Fixed memcached on srv110 | [production] | 
            
  | 12:19 | <mark> | Fixed ganglia and puppet on stafford | [production] | 
            
  | 11:54 | <mark> | Migrated DNS monitoring to puppet | [production] | 
            
  | 10:31 | <mark> | Migrated ZFS RAID nagios check to puppet | [production] | 
            
  | 10:14 | <mark> | Migrated monitoring of lucene to puppet | [production] | 
            
  | 09:37 | <mark> | Migrated monitoring of image scalers to puppet | [production] | 
            
  | 08:49 | <Tim> | using stafford for some pbuilder experimentation | [production] | 
            
  
    | 2010-07-12 |
    
  | 16:54 | <Fred> | changed LONGQUERIES check threshold | [production] | 
            
  | 16:08 | <Fred> | restarting morebots since it had died. | [production] | 
            
  | 16:08 | <Fred> | restarting Nagios since it was down. | [production] | 
            
  | 14:29 | <mark> | Added "cfg_file=/etc/nagios/puppet_hosts.cfg" to nagios.cfg | [production] | 
            
  | 13:25 | <JeLuF> | added disk space monitoring for apaches | [production] | 
            
  | 12:51 | <jeluf> | synchronized php-1.5/wmf-config/InitialiseSettings.php  '24306 - Create namespaces for Lithuanian Wiktionary' | [production] | 
            
  | 12:48 | <jeluf> | synchronized php-1.5/wmf-config/InitialiseSettings.php  '24321 - ml.wikiquote.org lost its project namespace' | [production] | 
            
  | 12:46 | <jeluf> | synchronized php-1.5/wmf-config/InitialiseSettings.php  '24321 - ml.wikiquote.org lost its project namespace' | [production] | 
            
  | 12:41 | <jeluf> | synchronized php-1.5/wmf-config/InitialiseSettings.php  '24344 - Namespace changes - si.wiktionary' | [production] | 
            
  | 11:45 | <JeLuF> | fixed broken ganglia-metrics installation on srv146 (chown gmetric /var/log/gmetricd/gmetricd.log) | [production] | 
            
  | 11:41 | <JeLuF> | added DPKG status monitoring for all app servers to nagios. Reports all packages that are not in state 'rc' or 'ii'. | [production] | 
            
  | 10:43 | <JeLuF> | lots of false alerts from nagios due to missing SSL setup for NRPE. Working on it. | [production] | 
            
  | 09:53 | <JeLuF> | changed puppet config to install nrpe on all app servers | [production] | 
            
  | 09:28 | <JeLuF> | replacing opsview-nrpe agents with nagios-nrpe agents (image_scalers, some other apaches). Most apaches already use nagios-nrpe | [production] | 
            
  | 07:40 | <Tim> | set up NRPE disk space monitoring on ms4, discovered that /mnt2 is full | [production] | 
            
  | 04:54 | <Tim> | updated NFS host/service groups to monitor the actual NFS servers, not a random collection of miscellaneous ex-NFS servers | [production] | 
            
  | 04:46 | <Tim> | installed NRPE on nfs1 and nfs2 | [production] | 
            
  | 04:08 | <Tim> | adding rendering, m, bits.esams, recursor0, recursor1, recursor0.esams to nagios | [production] | 
            
  | 04:02 | <Tim> | added forward DNS entry for recursor0.esams, modified reverse DNS entry resolver0.esams -> recursor0.esams | [production] | 
            
  | 03:55 | <Tim> | fixed reverse DNS entries for recursor0 and recursor1, which were incorrectly set to non-existent hostnames "resolver0" and "recursor1" | [production] | 
            
  | 03:36 | <Tim> | renamed db6.mgmt to locke.mgmt | [production] |