2012-01-04

19:12 <reedy> synchronized php-1.18/extensions/CentralAuth/specials/ '[[rev:107070|r107070]]' [production]
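
The 'synchronized <path>' entries throughout this log are emitted by the deployment tooling on the deploy host. A plausible invocation for the entry above; the script name and usage reflect WMF's sync-file wrapper of the era and are an assumption, not taken from this log:

    # hypothetical invocation: sync-file pushes one path to all apaches
    # and logs the "synchronized ..." line seen above
    sync-file php-1.18/extensions/CentralAuth/specials/ 'r107070'
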
19:11 <RobH> updating dns for mgmt of ms-fe1/2 and other new servers in tampa, as well as search boxen in eqiad [production]
19:04 <mutante> srv199 boots but without eth0; NIC1 is Enabled in BIOS but MAC Address "Not Present" - creating hardware ticket [production]
18:55 <catrope> synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js '[[rev:108064|r108064]]' [production]
18:43 <catrope> synchronized wmf-config/CommonSettings.php 'Disable AFTv5 bucketing tracking again' [production]
18:38 <mutante> powercycling srv199 [production]
18:33 <catrope> synchronized php-1.18/resources/startup.js 'touch' [production]
18:30 <catrope> synchronized wmf-config/CommonSettings.php 'Actually bump version number' [production]
18:28 <catrope> synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Revert live hack' [production]
18:24 <catrope> synchronized wmf-config/CommonSettings.php 'and bump the version number too' [production]
18:22 <catrope> synchronized wmf-config/CommonSettings.php 'Enable tracking for AFTv5 bucketing' [production]
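
The 18:22-18:33 sequence above is a common ResourceLoader deployment dance: flip a flag in wmf-config/CommonSettings.php, bump the style version so cached clients refetch JS, and touch startup.js so the startup module regenerates. A minimal sketch; the flag name and version numbers are hypothetical:

    // wmf-config/CommonSettings.php (sketch; names and numbers are hypothetical)
    $wmgArticleFeedbackv5Tracking = true; // hypothetical feature flag for AFTv5 bucketing
    $wgStyleVersion = 301;                // bumped from 300 so clients refetch cached JS/CSS
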
18:06 <mutante> duplicate nagios-wm instances on spence (/home/wikipedia/bin/ircecho vs. /usr/ircecho/bin/ircecho); killed them both, restarted with init.d/ircecho [production]
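
A sketch of how duplicate ircecho copies like the ones above might be found and cleared; the paths come from the log entry, the pgrep/pkill pattern is an assumption:

    pgrep -lf ircecho          # list both running copies with their full paths
    pkill -f ircecho           # kill both duplicates
    /etc/init.d/ircecho start  # restart a single instance via the init script
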
18:00 <catrope> synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Live hack for tracking a percentage of bucketing events' [production]
17:52 <mutante> knsq11 is broken: it boots into the installer, then prints "Dazed and confused" at hardware detection (NMI received for unknown reason 21 on CPU 0) -> RT 2206 [production]
17:38 <mutante> powercycling knsq11 [production]
15:52 <mutante> added project deployment-prep for hexmode and petan [production]
11:31 <catrope> synchronized php-1.18/extensions/ClickTracking/ClickTracking.hooks.php '[[rev:108017|r108017]]' [production]
08:44 <nikerabbit> synchronized php-1.18/includes/specials/SpecialAllmessages.php '[[rev:107998|r107998]]' [production]
07:40 <Tim> fixed puppet by re-running the post-merge hook with key forwarding enabled, and then started puppet on ms6 [production]
07:32 <Tim> on ms6.esams: fixed proxy IP address and stopped puppet while I figure out how to fix it [production]
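
A sketch of the fix sequence in the two entries above (07:32 stop, 07:40 re-run hook with agent forwarding); the puppetmaster hostname and hook path are assumptions, not from this log:

    # -A forwards the local ssh agent so the hook can authenticate onward
    ssh -A puppetmaster 'cd /etc/puppet && ./.git/hooks/post-merge'
    # then re-enable and kick the agent on ms6 (puppet 2.x-era commands)
    ssh ms6 'puppetd --enable && puppetd --test'
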
03:25 <Tim> experimentally raised max_concurrent_checks to 128 [production]
03:17 <Tim> on spence in nagios.cfg, reduced service_reaper_frequency from 10 to 1, to avoid a massive process count spike every 10 seconds as checks are started; local change only, as a test [production]
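
Both values live in nagios.cfg on the Nagios host; a sketch of the resulting settings (the file path and comments are assumptions, the directive names are standard Nagios 3):

    # /etc/nagios/nagios.cfg (excerpt)
    service_reaper_frequency=1    # reap finished checks every second instead of every 10s
    max_concurrent_checks=128     # experimentally raised cap on parallel checks
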
02:27 <Ryan_Lane> I should clarify that I removed 10.2.1.13 from /etc/network/interfaces; it's still properly bound to lo [production]
02:24 <Tim> on spence: setting up logrotate for nagios.log and removing nagios-bloated-log.log [production]
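
A minimal logrotate stanza for the change above; the log path and rotation schedule are assumptions:

    # /etc/logrotate.d/nagios (sketch)
    /var/log/nagios/nagios.log {
        weekly
        rotate 4
        compress
        missingok
        notifempty
    }
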
02:22 <Ryan_Lane> removing manually added 10.2.1.13 address from lvs4 [production]
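
The two Ryan_Lane entries above distinguish persistent config from live state: deleting the /etc/network/interfaces stanza leaves the address bound to lo until the next reboot or an explicit `ip addr del`. A hypothetical stanza of the kind removed:

    # /etc/network/interfaces (sketch of the removed block)
    auto lo:LVS
    iface lo:LVS inet static
        address 10.2.1.13
        netmask 255.255.255.255
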
02:01 <LocalisationUpdate> completed (1.18) at Wed Jan 4 02:04:57 UTC 2012 [production]
01:43 <Nemo_bis> Last week's slowness: job queue backlog now cleared on !Wikimedia Commons and (almost) English !Wikipedia http://ur1.ca/77q9b [production]
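
The backlog mentioned above can be watched with MediaWiki's showJobs.php maintenance script, which prints the number of queued jobs; the mwscript wrapper and wiki database names here are assumptions:

    mwscript showJobs.php commonswiki   # queued jobs on Wikimedia Commons
    mwscript showJobs.php enwiki        # queued jobs on English Wikipedia
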
01:02 <reedy> synchronized php-1.18/includes/ '[[rev:107978|r107978]]' [production]
00:45 <reedy> synchronized php-1.18/extensions '[[rev:107977|r107977]], [[rev:107976|r107976]]' [production]
00:39 <Tim> running purgeParserCache.php on hume, deleting objects older than 3 months [production]
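
A sketch of the purge run above: purgeParserCache.php takes an age in seconds, so three months is roughly 7776000, though the exact option spelling is an assumption:

    # on hume: drop parser cache objects older than ~90 days
    php maintenance/purgeParserCache.php --age=7776000
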
00:38 <reedy> synchronized php-1.18/includes/specials/ '[[rev:107975|r107975]]' [production]
00:29 <tstarling> synchronizing Wikimedia installation... [production]
00:27 <reedy> synchronized php-1.18/extensions/Nuke/ '[[rev:107974|r107974]]' [production]
00:25 <reedy> synchronized php-1.18/extensions/ '[[rev:107970|r107970]]' [production]

2012-01-03

23:00 <Tim> on spence: restarting gmetad [production]
22:58 <reedy> synchronizing Wikimedia installation... : Pushing [[rev:107953|r107953]], [[rev:107955|r107955]], [[rev:107956|r107956]], [[rev:107957|r107957]] [production]
22:47 <LeslieCarr> stopping and then starting apache2 on spence to try to lower load [production]
22:29 <RobH> added the lo address to lvs4; now it's working and generating thumbnails [production]
22:09 <reedy> synchronizing Wikimedia installation... : Push [[rev:107938|r107938]] [[rev:107948|r107948]] [production]
21:45 <RobH> ganglia graphs will have missing data for the past 30 to 40 minutes [production]
21:45 <RobH> spence back online, ganglia and nagios confirmed operational [production]
21:38 <RobH> resetting spence and dropping to serial console to try to fix it [production]
21:25 <RobH> nagios and ganglia down due to spence reboot, system still coming back online [production]
21:21 <RobH> spence is unresponsive to ssh and serial console, rebooting [production]
21:14 <LeslieCarr> resetting DRAC 5 on spence for management connectivity [production]
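
Resetting a DRAC 5 as in the 21:14 entry is typically done from the controller's own shell; a sketch, where the management hostname is hypothetical:

    # soft-reset the DRAC itself (not the host) to restore management access
    ssh root@spence.mgmt 'racadm racreset'
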
21:05 <binasher> that fixed it. but how did that happen? [production]
21:05 <binasher> ran ip addr add 10.2.1.22/32 label "lo:LVS" dev lo on lvs4 [production]
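
The command in the entry above is the standard fix for an LVS real server: the service IP is bound to loopback as a /32 so the box accepts traffic addressed to the VIP without advertising that address on the network. The command is quoted from the log; the verification step is an addition:

    ip addr add 10.2.1.22/32 label "lo:LVS" dev lo   # bind the service IP on loopback
    ip addr show dev lo                              # confirm the /32 is now bound
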
19:36 <reedy> synchronized php-1.18/skins/common/images/ '[[rev:107930|r107930]]' [production]
17:36 <mutante> killing more runJobs.php / nextJobDB.php processes on a bunch of servers (/home/catrope/badjobrunners) [production]
17:26 <RoanKattouw> Stopping job runners on the following DECOMMISSIONED servers: srv151 srv152 srv153 srv158 srv160 srv164 srv165 srv166 srv167 srv168 srv170 srv176 srv177 srv178 srv181 srv184 srv185 [production]
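
A sketch of how a host list like /home/catrope/badjobrunners can be drained, assuming one hostname per line and working ssh access; the pkill patterns mirror the 17:36 entry:

    # kill stray job-runner processes on every listed host
    while read host; do
        ssh "$host" 'pkill -f runJobs.php; pkill -f nextJobDB.php'
    done < /home/catrope/badjobrunners
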