| 2009-04-20
      
    
  | 19:59 | <Rob> | Powering down srv67, srv85, srv88, srv90 due to temp warnings and bad fans. | [production] | 
            
  | 19:36 | <Rob> | updated mc-pmtpa.php to mark memcached servers as down or spare (lots more spares now) | [production] | 
            
  | 17:35 | <azafred> | restarted apache on srv217 | [production] | 
            
  | 17:34 | <azafred> | srv125 reinstall completed. | [production] | 
            
  | 17:24 | <Rob> | srv146 back online | [production] | 
            
  | 17:10 | <Rob> | srv131 back up, updated and synced. | [production] | 
            
  | 16:52 | <azafred> | srv118 reinstall completed. | [production] | 
            
  | 16:52 | <Rob> | srv127 back online and synced. | [production] | 
            
  | 16:41 | <Rob> | srv125 reinstalled, passing off to fred | [production] | 
            
  | 16:40 | <Rob> | replaced dead disk in sq26 | [production] | 
            
  | 16:31 | <Rob> | shutting down sq26 to replace bad hdd | [production] | 
            
  | 16:27 | <Rob> | reinstalling srv125 | [production] | 
            
  | 16:13 | <azafred> | finished re-install of srv63. | [production] | 
            
  | 16:11 | <Rob> | reinstalled srv118, handed off to fred for completion | [production] | 
            
  | 16:02 | <Rob> | restarted srv118 and reinstalled it | [production] | 
            
  | 15:57 | <Rob> | restarted a locked up srv110 and synced it. | [production] | 
            
  | 15:49 | <Rob> | srv81 locked up; fixed, synced and back online | [production] | 
            
  | 15:29 | <Rob> | replaced fan and drive in srv63, reinstalling | [production] | 
            
  | 14:36 | <Rob> | memory replaced in srv203, back online. | [production] | 
            
  | 14:11 | <Rob> | shutting down srv203 to swap out bad memory | [production] | 
            
  | 05:12 | <Tim> | fixed memcached on srv75, stopped old ES slave on srv102, srv106, srv107, srv159, srv171 | [production] | 
            
  
    | 2009-04-17
      
    
  | 22:49 | <brion> | regenerated centralnotice output again... this time ok | [production] | 
            
  | 22:48 | <brion> | srv93 and srv107 memcached nodes are running but broken. restarting them... | [production] | 
            
  | 22:43 | <brion> | restarted srv82 memcache node. attempting to rebuild centralnotices... | [production] | 
            
  | 22:41 | <brion> | bad memcached node srv82 | [production] | 
            
  | 22:05 | <mark> | Set up 3 new pywikipedia mailing lists, redirected svn commit output to one of them | [production] | 
            
  | 19:38 | <robh> | synchronized php-1.5/InitialiseSettings.php  'Bug 18494 Logo for ln.wiki' | [production] | 
            
  | 17:22 | <Rob> | removed wikimedia.se from our nameservers as they are using their own. | [production] | 
            
  | 16:48 | <azafred> | updated spamassassin rules on lily to include the SARE rules and mirror the settings on McHenry. | [production] | 
            
  | 10:25 | <tstarling> | synchronized robots.txt | [production] | 
            
  | 08:19 | <tstarling> | synchronized php-1.5/InitialiseSettings.php | [production] | 
            
  | 07:13 | <Tim> | temporarily killed apache on overloaded ES masters | [production] | 
            
  | 07:11 | <tstarling> | synchronized php-1.5/db.php  'zeroing read load on ES masters' | [production] | 
            
  | 06:04 | <Tim> | brief site-wide outage while it rebooted, reason unknown. All good now. Resuming logrotate. | [production] | 
            
  | 05:55 | <Tim> | db20 h/w reboot | [production] | 
            
  | 05:48 | <Tim> | shutting down daemons on db20 for pre-emptive reboot. Serial console shows "BUG: soft lockup - CPU#4 stuck for 11s! [rsync:27854]" etc. | [production] | 
            
  | 05:10 | <Tim> | on db20: killed logrotate -f halfway through due to alarming kswapd CPU usage (linked to deadlocked rsync processes). May need a reboot. | [production] | 
            
  | 05:00 | <Tim> | fixed logrotate on db20, broken since March 10 due to broken status file, most likely due to non-ASCII filenames generated by demux.py. Patched demux.py. Removed everything.log. | [production] | 
            
  | 02:14 | <river> | set up ms6.esams, copying /export/upload from ms1 | [production] | 
            
  | 00:24 | <Tim> | blocked lots of uci.edu IPs that were collectively doing 20 req/s of expensive API queries, overloading ES | [production] | 
            
  | 00:15 | <brion> | techblog post on Phorm opt-out is linked from slashdot; load on singer seems fairly stable. | [production] |