| 2009-12-04 |
  | 23:30 | <atglenn> | started netcat of the bulk of the data from ms5 to ms7. running in screen as root on both hosts. | [production] | 
            
  | 23:21 | <atglenn> | started ncat of (small piece of) image data from ms5 to ms7, running in screen as root on both hosts | [production] | 
            
  | 20:47 | <Rob> | which doesn't work, damn. | [production] | 
            
  | 20:47 | <Rob> | got sick of racktables.wikimedia.org not redirecting correctly, put in a rewrite for non-ssl connections to ssl | [production] | 
            
  | 20:24 | <Fred> | fixed nrpe on db20 and db7 | [production] | 
            
  | 20:13 | <root> | ran sync-common-all | [production] | 
            
  | 20:12 | <Rob> | running sync-common-all to update configuration for support of flaggedrevs on plwiktionary | [production] | 
            
  | 19:20 | <Rob> | srv144 removed from node groups & pybal, nagios resynced. | [production] | 
            
  | 19:19 | <Rob> | srv144 is out of warranty and rebooting randomly, decommissioning. | [production] | 
            
  | 19:05 | <Fred> | finished setup of srv245. | [production] | 
            
  | 19:02 | <Rob> | srv126 removed from node groups and lvs.  nagios restarted to exclude it. | [production] | 
            
  | 19:01 | <Rob> | srv126 refuses to even POST when benched, out of warranty, slating for immediate decommissioning | [production] | 
            
  | 19:00 | <Rob> | srv144 reinstalling with a single hard disk, no more raid1 | [production] | 
            
  | 18:50 | <Rob> | swapped primary srv144 drive with old decommissioned spare.  reinstalling OS, will reinstall packages and get online later. | [production] | 
            
  | 18:45 | <Rob> | sq22 back online, all drives nominal, rebuilding cache and ensuring it is in rotation | [production] | 
            
  | 18:41 | <Rob> | rebooted sq22 | [production] | 
            
  | 18:38 | <Rob> | rebooted srv144 and srv126 | [production] | 
            
  | 18:36 | <Rob> | srv245 package install failed.  I do not have time to tinker with it while in the DC, I have other things that require my physical access to the machines.  Leaving it alone for now to work on remotely. | [production] | 
            
  | 18:28 | <Rob> | srv245 OS installed, setting up wikimedia-task-appserver | [production] | 
            
  | 18:06 | <Rob> | srv245 was sitting idle with no OS, depooled from apaches.  reinstalling system. | [production] | 
            
  | 17:57 | <Rob> | rebooted srv83 per fred | [production] | 
            
  | 17:35 | <Fred> | removed srv83 from the nodelist since it was causing ddsh to never finish executing. | [production] | 
            
  | 17:26 | <Fred> | fixed broken apache. Seems like there is a machine down that is preventing normal sync-file from finishing... Looking into it. | [production] | 
            
  | 16:50 | <rainman-sr> | stopped logging of search queries on searchidx1 until someone sets up proper log archiving to a different machine | [production] | 
            
  | 16:48 | <rainman-sr> | searchidx1 had full disk, freed some 100 GB of space by deleting logs and stuff lying around | [production] | 
            
  | 16:14 | <Rob> | srv245 down and unresponsive, rebooting | [production] | 
            
  | 16:12 | <Rob> | sq43's replacement disk is also bad (talk about bad luck), placing rma with dell.  system will remain powered down for now. | [production] | 
            
  | 15:55 | <Rob> | sq43 isn't seeing a replaced disk, rebooting and troubleshooting | [production] | 
            
  | 15:33 | <domas> | 'arcconf setcache 1 logicaldrive 0 roff' - disabling any read caching on db11-db30 RAIDs | [production] | 
            
  | 15:13 | <Rob> | after tinkering with it with domas, it appears rebuild is indeed automatic.  db21 rebuilding raid array | [production] | 
            
  | 15:09 | <Rob> | db21 bad disk swapped out, rebuild should be automatic | [production] | 
            
  | 14:57 | <Rob> | sq14 back up, rebuilding its cache | [production] | 
            
  | 14:54 | <Rob> | sq13 primary disk dead, out of warranty | [production] | 
            
  | 14:53 | <Rob> | swapping sdc in sq13 and sq14 to bring sq14 back online | [production] | 
            
  | 14:53 | <Rob> | sq14 disk sdc dead, out of warranty. | [production] | 
            
  | 05:18 | <Tim> | on fenari: running all pending renameUser jobs from enwiki | [production] | 
            
  | 03:37 | <Tim> | Around 03:12, accidentally renamed enwiki's job table and so renamed it back a second later. This caused all slaves to stop due to a replication bug. Fixed now. | [production] | 
            
  | 03:25 | <Tim> | testing fixJobQueueExplosion.php on commonswiki | [production] | 
            
  | 02:46 | <Tim> | srv156 not responding to ssh, trying reboot | [production] | 
            
  | 01:13 | <Tim> | restarting job runners | [production] | 
            
  | 01:13 | <tstarling> | synchronized php-1.5/includes/HTMLCacheUpdate.php  'patching out all category backlink updates, major bug causing job queue to stall' | [production] | 
            
  | 00:12 | <Tim> | granted access to root@fenari on all servers in the mysql node group | [production] |