2009-04-23

14:31 <Tim> merged r49051 [production]
14:13 <Tim> fixed nagios labels for esams backup ext store, erroneously labelled as "toolserver" [production]
06:27 <Tim> restarted all job runners, ES connection errors weren't killing them [production]
05:43 <Tim> shutting down mysql on all fedora ES servers. Will update documentation and node lists to indicate that this is permanent. [production]
05:37 <Tim> srv217 did not come up from a soft reboot, but power cycle worked. Before reboot, observed apache2 hanging indefinitely on nanosleep(), but couldn't reproduce a timer issue in other processes. An NFS mount was hanging on stat. [production]
05:13 <Tim> rebooting srv217 [production]
04:41 <Tim> srv217 is hanging on various operations, investigating. Trying to shut down its apache. [production]
04:35 <tstarling> synchronized php-1.5/db.php [production]
04:31 <Tim> copy done, started cluster18 mysql instance on ms3 using srv104 snapshot, repooled it [production]
02:07 <tstarling> synchronized php-1.5/InitialiseSettings.php [production]
01:57 <Tim> relaxed wgAccountCreationThrottle on frwiki, presumably the 2006 vandal emergency is over. Disabled it on idwiki for workshop event. [production]
01:45 <Tim> copying srv104's data from ms3 to ms2 [production]
01:11 <Tim> started mysql on srv104 [production]
  
2009-04-22

21:44 <tomaszf> db9 is back up. excessive tmpfs file systems removed [production]
21:39 <tomaszf> taking outage on db9 to remove tmpfs file systems [production]
11:34 <JeLuF> initiated reboot of srv137. dmesg shows no usable information any more. [production]
11:30 <JeLuF> srv137 has read-only filesystem. Stopped Apache. [production]
06:03 <andrew> synchronized php-1.5/includes/specials/SpecialBlockip.php 'Live-merged r49730, typo causing failures in user hiding' [production]
06:02 <Andrew> srv137 still seems read-only, srv137: rsync: mkstemp "/apache/common/php-1.5/includes/specials/.SpecialBlockip.php.1QkrKX" failed: Read-only file system (30) [production]
03:14 <Tim> copying ES data from srv104 to ms3 using nc tarpipe [production]
03:10 <tstarling> synchronized php-1.5/db.php 'depooling srv104 ES' [production]
03:03 <Tim> corruption found on cluster18, the copy source server (srv106) is missing lots of rows. Switched back to srv105/104. [production]
03:02 <tstarling> synchronized php-1.5/db.php [production]
02:50 <tstarling> synchronized php-1.5/includes/Revision.php 'reverted profiling and logging hacks' [production]
02:40 <Tim> depooled ms2 ex-fedora instances and shut them down, it can be a backup for now [production]
02:38 <tstarling> synchronized php-1.5/db.php [production]
02:33 <Tim> deployed the new ms2/ms3 ex-fedora ES configuration [production]
02:32 <tstarling> synchronized php-1.5/db.php [production]
02:01 <Tim> set up ex-fedora mysql instances on both ms2 and ms3, controlled with /etc/init.d/mysql-ex-fedora [production]
01:04 <Tim> changed the main mysql instance on ms3 (rc1) to bind to a single IP address instead of * [production]
  
2009-04-21

19:41 <mark> Added grosley.wikimedia.org to local_domains list on grosley's exim.conf, and added appropriate aliases in /etc/aliases [production]
16:35 <Andrew> Re-ran rebuildTemplates.php, all seems well now [production]
16:30 <robh> synchronized php-1.5/mc-pmtpa.php 'syncing for fred' [production]
16:30 <root> synchronized php-1.5/mc-pmtpa.php 'swapping out srv88 for srv159 and srv90 for srv198' [production]
16:29 <andrew> synchronized php-1.5/mc-pmtpa.php 'Switched srv88 for srv159, srv90 for srv198 to fix down memcache nodes' [production]
16:18 <azafred> restarted memcached on srv96. Now responding. [production]
16:14 <Rob> Fred needs to start logging in as Fred and not as root, bad fred (see, it wasn't me this time, bwahahahahahaa) [production]
16:11 <Andrew> Fred fixed up some memcached nodes, but no joy with rebuildTemplates [production]
16:10 <root> synchronized php-1.5/mc-pmtpa.php 'swapping out down servers for active ones' [production]
16:09 <root> synchronized php-1.5/mc-pmtpa.php 'swapping out down servers for active ones' [production]
16:01 <Rob> srv137 read-only, depooled in pybal for apache and rebooting. [production]
15:57 <root> synchronized php-1.5/mc-pmtpa.php 'swapping out down servers for active ones' [production]
14:34 <Andrew> rebuildTemplates.php appeared not to help, same problem as before (stopped after a few wikis). Possibly a dodgy memcache node. [production]
14:32 <Andrew> ran rebuildTemplates.php metawiki due to reports of <messagename> appearing in place of the central notice. [production]
05:04 <Andrew> Live-merged r49685, fix for unsuppression of usernames on unblock -- some usernames were left stuck suppressed if they were unblocked when the block suppressed their username [production]
05:03 <andrew> synchronized php-1.5/includes/specials/SpecialBlockip.php [production]
05:03 <andrew> synchronized php-1.5/includes/specials/SpecialIpblocklist.php [production]
01:34 <azafred> Made some improvements to spam handling. Bayes is in play and can learn from everybody what is spam and what is ham. Documentation to follow. [production]