| 2009-02-23 |
  | 17:32 | <Rob> | srv31 powered back up | [production] | 
            
  | 17:25 | <Rob> | found a breaker flip in the DC, affects srv31-srv34 | [production] | 
            
  | 13:40 | <domas> | oh, btw folks, kudos on perfect web2.0 engineering, now morebots complains when message is longer than 140 bytes, and we end up without our microblogging syndication | [production] | 
            
  | 13:39 | <domas> | added "su -m 'www-data' -c 'find /opt/mwlib/var/cache/ -mindepth 3 -mtime +1 -delete'" to pdf1 crontab, does anyone actually look after this service? | [production] | 
            
  | 12:57 | <Tim> | deployed r47704, now command line scripts don't access /home anymore | [production] | 
            
  | 11:37 | <Tim> | switched archive directory over to /mnt/upload5, starting another rsync. Some files will be missing until the rsync is done | [production] | 
            
  | 10:07 | <Tim> | moved all job runners from the previous ad hoc script to the new wikimedia-job-runner package | [production] | 
            
  | 06:25 | <Tim> | moved the nagios plugins for fedora from /home/nagios to /h/w/common/nagios-fedora-plugins | [production] | 
            
  | 05:21 | <Tim> | started udp2log on db20, MW UDP logs were dead | [production] | 
            
  | 05:19 | <Tim> | killed errant jobs loop scripts still running on fedora servers | [production] | 
            
  | 04:36 | <Tim> | fixed the log directory for /etc/cron.d/mw-central-notice, killed the process that was in a tight loop trying to write to a stale NFS file handle | [production] | 
            
  | 04:28 | <Tim> | finished moving ExtensionDistributor working copy | [production] | 
            
  | 04:14 | <Tim> | moving ExtensionDistributor working directory from /home to /mnt/upload5 | [production] | 
            
  | 04:00 | <Tim> | private/archive/wikipedia was in fact not migrated, but an initial rsync was done. I will do a second rsync now. | [production] | 
            
  | 03:42 | <Tim> | rsync done, uploads re-enabled, b/c symlinks set up | [production] | 
            
  | 03:37 | <Tim> | doing rsync | [production] | 
            
  | 03:31 | <Tim> | temporarily disabled file uploads on all private wikis, for migration to ms1 | [production] | 
            
  | 02:50 | <Tim> | same for commons ForeignDBViaLBRepo directory, ScanSet directory, CentralNotice directory | [production] | 
            
  | 02:44 | <Tim> | fixed CommonSettings.php location of deleted images, upload3 -> upload5, appears to have been moved already | [production] | 
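
A note on the 13:39 entry above: the `find ... -mindepth 3 -mtime +1 -delete` job added to the pdf1 crontab removes cache entries that sit at least three levels below the cache root and whose modification time is at least two full days old (`-mtime +1` drops the fractional day, so "more than one day" means 48 hours or more). The following is only a rough Python sketch of that selection rule, not what runs on pdf1. The function name `prune_cache` and its parameters are hypothetical, and unlike `find -delete` it removes files only and leaves empty directories in place.

```python
import os
import time


def prune_cache(root, min_depth=3, min_age_days=2, now=None):
    """Delete files at least `min_depth` levels below `root` whose mtime
    is at least `min_age_days` full days old. Returns the deleted paths.

    Hypothetical helper: approximates `find root -mindepth 3 -mtime +1 -delete`
    for files only (directories are never removed here).
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    root = os.path.abspath(root)
    deleted = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Depth of the directory itself; root is depth 0, as in find.
        dir_depth = 0 if rel == "." else rel.count(os.sep) + 1
        if dir_depth + 1 < min_depth:
            continue  # files in this directory are too shallow
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so a symlink is judged by its own mtime, not its target's.
            if os.lstat(path).st_mtime <= cutoff:
                os.remove(path)
                deleted.append(path)
    return sorted(deleted)
```

Calling `prune_cache("/opt/mwlib/var/cache/")` as the `www-data` user would roughly mirror the cron job. Passing an explicit `now` makes the age cutoff reproducible when trying the function against a scratch directory first.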
            
  
  | 2009-02-21 |
  | 19:49 | <mark> | Installed gmond on eiximenis | [production] | 
            
  | 19:02 | <domas> | db26 lacks 8g of ram :) | [production] | 
            
  | 19:00 | <mark> | Restarted stuck apache on srv217 | [production] | 
            
  | 17:26 | <mark> | Started apache on srv218-221 | [production] | 
            
  | 17:24 | <mark> | Restarted stuck apache on srv217 | [production] | 
            
  | 17:07 | <mark> | Squid/kernel upgrade complete | [production] | 
            
  | 16:46 | <mark> | Increased max-connections per upload squid to ms1 to 100 | [production] | 
            
  | 15:58 | <mark> | Running automated upgrade/reboot of squid and kernel on sq43-47 | [production] | 
            
  | 15:58 | <mark> | Upgraded squid and kernel on sq41-42, sq48-50, and rebooted | [production] | 
            
  | 15:44 | <mark> | Upgraded squid and kernel on sq36-40, and rebooted | [production] | 
            
  | 12:55 | <river> | fixed reverse dns entries for ms3/ms4, which had got swapped somehow | [production] | 
            
  | 11:55 | <Tim> | re-enabled ExtensionDistributor | [production] | 
            
  | 11:16 | <Tim> | removed syslog.0 and messages.0 on srv170 and srv176, they had critical disk free on / | [production] | 
            
  | 03:25 | <Tim> | started apache on the image scaling servers | [production] | 
            
  | 02:51 | <brion> | ran sync-common on srv199 while i'm at it | [production] | 
            
  | 02:48 | <brion> | zeroing out stupid giant syslog files on srv199 | [production] | 
            
  | 02:46 | <brion> | srv199 is out of disk space | [production] | 
            
  | 02:46 | <brion> | copying hacked-up copies of InitialiseSettings/CommonSettings back to /home so the changes aren't lost this time | [production] | 
            
  | 02:23 | <mark> | db20 back up, for reals | [production] | 
            
  | 02:19 | <mark> | Rebooting db20 with upgraded RAID controller firmware | [production] | 
            
  | 02:13 | <domas> | flashing BIOS helped | [production] | 
            
  | 02:13 | <mark> | db20 up! | [production] | 
            
  | 02:04 | <brion> | services on bart (secure, planet) are temporarily offline while server is poked at | [production] | 
            
  | 01:50 | <brion> | seeing pages, yay | [production] | 
            
  | 01:49 | <brion> | running apache2ctl start or apachectl start for various apaches | [production] | 
            
  | 01:47 | <domas> | I FOUND HOW TO REVIVE APACHES | [production] | 
            
  | 01:46 | <brion> | think i killed em, now trying to restart apache procs | [production] | 
            
  | 01:43 | <brion> | poking to see if we can restart apaches... | [production] | 
            
  | 01:42 | <brion> | syncing fixed InitialiseSettings/CommonSettings to apaches | [production] | 
            
  | 01:14 | <brion> | and flyingparchment | [production] | 
            
  | 01:14 | <brion> | domas and mark are attempting to restart the NFS server, but aren't mentioning any details in the public channel or log | [production] |