2009-04-20

| 16:40 | <Rob> | replaced dead disk in sq26 | [production] |
| 16:31 | <Rob> | shutting down sq26 to replace bad hdd | [production] |
| 16:27 | <Rob> | reinstalling srv125 | [production] |
| 16:13 | <azafred> | finished re-install of srv63. | [production] |
| 16:11 | <Rob> | reinstalled srv118, handed off to fred for completion | [production] |
| 16:02 | <Rob> | restarted srv118 and reinstalled it | [production] |
| 15:57 | <Rob> | restarted a locked-up srv110 and synced it. | [production] |
| 15:49 | <Rob> | srv81 locked up; fixed, synced, and back online | [production] |
| 15:29 | <Rob> | replaced fan and drive in srv63, reinstalling | [production] |
| 14:36 | <Rob> | memory replaced in srv203, back online. | [production] |
| 14:11 | <Rob> | shutting down srv203 to swap out bad memory | [production] |
| 05:12 | <Tim> | fixed memcached on srv75, stopped old ES slaves on srv102, srv106, srv107, srv159, srv171 | [production] |
  
2009-04-17

| 22:49 | <brion> | regenerated centralnotice output again... this time ok | [production] |
| 22:48 | <brion> | srv93 and srv107 memcached nodes are running but broken. restarting them... | [production] |
| 22:43 | <brion> | restarted srv82 memcache node. attempting to rebuild centralnotices... | [production] |
| 22:41 | <brion> | bad memcached node srv82 | [production] |
| 22:05 | <mark> | Set up 3 new pywikipedia mailing lists, redirected svn commit output to one of them | [production] |
| 19:38 | <robh> | synchronized php-1.5/InitialiseSettings.php 'Bug 18494 Logo for ln.wiki' | [production] |
| 17:22 | <Rob> | removed wikimedia.se from our nameservers as they are using their own. | [production] |
| 16:48 | <azafred> | updated spamassassin rules on lily to include the SARE rules and mirror the settings on McHenry. | [production] |
| 10:25 | <tstarling> | synchronized robots.txt | [production] |
| 08:19 | <tstarling> | synchronized php-1.5/InitialiseSettings.php | [production] |
| 07:13 | <Tim> | temporarily killed apache on overloaded ES masters | [production] |
| 07:11 | <tstarling> | synchronized php-1.5/db.php 'zeroing read load on ES masters' | [production] |
| 06:04 | <Tim> | brief site-wide outage while db20 rebooted, reason unknown. All good now. Resuming logrotate. | [production] |
| 05:55 | <Tim> | db20 h/w reboot | [production] |
| 05:48 | <Tim> | shutting down daemons on db20 for pre-emptive reboot. Serial console shows "BUG: soft lockup - CPU#4 stuck for 11s! [rsync:27854]" etc. | [production] |
| 05:10 | <Tim> | on db20: killed a half-finished logrotate -f run due to alarming kswapd CPU usage (linked to deadlocked rsync processes). May need a reboot. | [production] |
| 05:00 | <Tim> | fixed logrotate on db20, broken since March 10 by a bad status file, most likely caused by non-ASCII filenames generated by demux.py. Patched demux.py. Removed everything.log. | [production] |
| 02:14 | <river> | set up ms6.esams, copying /export/upload from ms1 | [production] |
| 00:24 | <Tim> | blocked lots of uci.edu IPs that were collectively doing 20 req/s of expensive API queries, overloading ES | [production] |
| 00:15 | <brion> | techblog post on Phorm opt-out is linked from Slashdot; load on singer seems fairly stable. | [production] |
  
2009-04-16

| 23:06 | <tfinc> | synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php | [production] |
| 22:48 | <azafred> | bounced apache on srv217. All threads were dead. | [production] |
| 22:16 | <tfinc> | synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php | [production] |
| 22:08 | <tfinc> | synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php | [production] |
| 17:41 | <domas> | fantastic. I start _looking_ at stuff and it fixes itself. | [production] |
| 17:35 | <midom> | synchronized php-1.5/includes/Revision.php 'live profiling hook' | [production] |
| 17:28 | <domas> | db20 has kswapd deadlock, needs reboot soonish | [production] |
| 17:18 | <midom> | synchronized php-1.5/InitialiseSettings.php 'disabled stats' | [production] |
| 17:15 | <midom> | synchronized php-1.5/InitialiseSettings.php 'enabling udp stats' | [production] |
| 16:18 | <azafred> | bounced apache on srv217 (no PID file, so the previous restart missed this one) | [production] |
| 15:57 | <brion> | network borkage between Florida and Amsterdam. Visitors through AMS proxies can't reach sites. | [production] |
| 15:55 | <azafred> | bounced apache on srv[73,86,88,93,108,114,139,141,154,181,194,204,213,99] | [production] |
| 15:52 | <Tim-away> | started mysqld on srv98, srv122, srv124, srv142, srv106, srv107: done with them for now. srv102 still going. | [production] |
| 15:30 | <mark> | Set up ms6 with SP management at ms6.ipmi.esams.wikimedia.org | [production] |
| 14:13 | <mark> | Restoring traffic to Amsterdam cluster | [production] |