| 
      
        2009-04-27
      
     | 
  
    
  | 02:20 | 
  <Tim> | 
  srv53 down, took it out of memcached rotation. Updating the memcached spare list. | 
  [production] | 
            
  | 02:20 | 
  <tstarling> | 
  synchronized php-1.5/mc-pmtpa.php  | 
  [production] | 
            
  | 02:12 | 
  <Tim> | 
  fixed rc1 slaves, broken by expire_logs_days on ms3 | 
  [production] | 
            
  | 01:59 | 
  <Tim> | 
  Shut down srv217 for maintenance. Observed the same timer interrupt issue as before: select() syscalls running indefinitely despite a short timeout being specified. | 
  [production] | 
            
  | 01:53 | 
  <tstarling> | 
  synchronized php-1.5/db.php  | 
  [production] | 
            
  | 01:52 | 
  <Tim> | 
  repooled ms3 rc1 instance | 
  [production] | 
            
  | 01:49 | 
  <Tim> | 
  reset slave on db21, which was running out of disk space due to relay logs | 
  [production] | 
            
  | 01:42 | 
  <Tim> | 
  fixed nagios for srv99: its apache check command was still set to my CGI security vulnerability demonstration, which had been permanently saved in retention.dat despite config changes | 
  [production] | 
            
  | 01:17 | 
  <Tim> | 
  enabled apport on srv99, to see if I can track down the nagios flapping | 
  [production] | 
            
  | 00:52 | 
  <Tim> | 
  restarted trackBlobs.php | 
  [production] | 
            
  
    | 
      
        2009-04-25
      
     | 
  
    
  | 23:31 | 
  <Tim-away> | 
  experimentally stopping replication on db3 to check disk load | 
  [production] | 
            
  | 22:51 | 
  <tstarling> | 
  synchronized php-1.5/db.php  'reduced load on db3' | 
  [production] | 
            
  | 18:50 | 
  <mark> | 
  Killed long-running TrackBlobs::trackRevisions SQL query from hume, which was causing db3 to lag heavily | 
  [production] | 
            
  | 17:22 | 
  <mark> | 
  Stopped Apaches on srv32/srv33 again, as syncs will fail in most cases | 
  [production] | 
            
  | 16:36 | 
  <mark> | 
  Started /home-less apache on srv33 | 
  [production] | 
            
  | 13:23 | 
  <mark> | 
  Started /home-less apache on srv32 | 
  [production] | 
            
  | 11:03 | 
  <mark> | 
  Kicked srv99 back into submission | 
  [production] | 
            
  | 10:56 | 
  <mark> | 
  Squid-blocked high-rate scraper which was overloading ES | 
  [production] | 
            
  | 05:30 | 
  <Tim-away> | 
  fixed conflict markers in extensions/CentralNotice/SpecialNoticeText.php and resynced. | 
  [production] | 
            
  | 05:30 | 
  <tstarling> | 
  synchronized php-1.5/extensions/CentralNotice/SpecialNoticeText.php  | 
  [production] | 
            
  
    | 
      
        2009-04-24
      
     | 
  
    
  | 22:23 | 
  <rainman__> | 
  search back up on all wikis | 
  [production] | 
            
  | 22:17 | 
  <root> | 
  synchronized php-1.5/lucene.php  'Replacement for reinstalled srv58' | 
  [production] | 
            
  | 22:15 | 
  <brion> | 
  synchronized php-1.5/secure.php  'fix for thumbs on private ssl access (bug 18475 etc)' | 
  [production] | 
            
  | 21:19 | 
  <rainman_> | 
  srv58 dead, breaking search on all non-major wikis; transferring the service to search11/12... | 
  [production] | 
            
  | 19:50 | 
  <Rob> | 
  srv90-srv99 ganglia installed. | 
  [production] | 
            
  | 19:50 | 
  <Rob> | 
  srv97 online | 
  [production] | 
            
  | 19:47 | 
  <Rob> | 
  srv98 online | 
  [production] | 
            
  | 19:46 | 
  <Rob> | 
  srv96 online | 
  [production] | 
            
  | 19:45 | 
  <Rob> | 
  srv99 online | 
  [production] | 
            
  | 19:42 | 
  <Rob> | 
  srv95 online | 
  [production] | 
            
  | 19:40 | 
  <Rob> | 
  srv92, srv93, and srv94 back online | 
  [production] | 
            
  | 19:39 | 
  <Rob> | 
  srv91 back online | 
  [production] | 
            
  | 19:24 | 
  <Rob> | 
  srv90 online | 
  [production] | 
            
  | 19:16 | 
  <Rob> | 
  srv90-srv99 reinstalled, currently looping through package installation | 
  [production] | 
            
  | 18:34 | 
  <mark> | 
  Fixed ganglia by installing the appropriate config files on the (reinstalled) aggregation hosts | 
  [production] | 
            
  | 18:28 | 
  <Rob> | 
  installed ganglia on all servers reinstalled to Ubuntu Apache so far today. | 
  [production] | 
            
  | 18:27 | 
  <Rob> | 
  srv89 back online | 
  [production] | 
            
  | 18:17 | 
  <Rob> | 
  srv90-srv99 will be down over the next 30 minutes for ubuntufication. | 
  [production] | 
            
  | 18:16 | 
  <robh> | 
  synchronized php-1.5/mc-pmtpa.php  'some spares were actually down' | 
  [production] | 
            
  | 18:14 | 
  <robh> | 
  synchronized php-1.5/mc-pmtpa.php  'removed the 9x servers for reinstallation' | 
  [production] | 
            
  | 18:02 | 
  <Rob> | 
  srv84 ubuntufied and online | 
  [production] | 
            
  | 17:58 | 
  <Rob> | 
  srv83 ubuntufied and online | 
  [production] | 
            
  | 17:54 | 
  <Rob> | 
  srv82 ubuntufied and online | 
  [production] | 
            
  | 17:50 | 
  <Rob> | 
  srv81 reinstalled and online | 
  [production] | 
            
  | 17:47 | 
  <Rob> | 
  srv89 coming down for reinstall | 
  [production] | 
            
  | 17:44 | 
  <Rob> | 
  srv58 online | 
  [production] | 
            
  | 17:38 | 
  <Rob> | 
  srv57 online | 
  [production] | 
            
  | 17:26 | 
  <Rob> | 
  reinstalling srv58 | 
  [production] | 
            
  | 17:16 | 
  <mark> | 
  Set up switchport for srv57 on asw-c4-pmtpa | 
  [production] | 
            
  | 17:10 | 
  <Rob> | 
  reinstalling srv57 | 
  [production] |