2012-01-04

| 18:00 | <catrope> | synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Live hack for tracking a percentage of bucketing events' | [production] |
| 17:52 | <mutante> | knsq11 is broken. It boots into the installer, then hits "Dazed and confused" at hardware detection (NMI received for unknown reason 21 on CPU 0). -> RT 2206 | [production] |
| 17:38 | <mutante> | powercycling knsq11 | [production] |
| 15:52 | <mutante> | added project deployment-prep for hexmode and petan | [production] |
| 11:31 | <catrope> | synchronized php-1.18/extensions/ClickTracking/ClickTracking.hooks.php '[[rev:108017|r108017]]' | [production] |
| 08:44 | <nikerabbit> | synchronized php-1.18/includes/specials/SpecialAllmessages.php '[[rev:107998|r107998]]' | [production] |
| 07:40 | <Tim> | fixed puppet by re-running the post-merge hook with key forwarding enabled, then started puppet on ms6 | [production] |
| 07:32 | <Tim> | on ms6.esams: fixed the proxy IP address and stopped puppet while I figure out how to fix it | [production] |
| 03:25 | <Tim> | experimentally raised max_concurrent_checks to 128 | [production] |
| 03:17 | <Tim> | on spence, in nagios.cfg: reduced service_reaper_frequency from 10 to 1 to avoid a massive process-count spike every 10 seconds as checks are started. Changed locally only, as a test. | [production] |
| 02:27 | <Ryan_Lane> | To clarify: I removed 10.2.1.13 from /etc/network/interfaces; it's still properly bound to lo | [production] |
| 02:24 | <Tim> | on spence: setting up logrotate for nagios.log and removing nagios-bloated-log.log | [production] |
| 02:22 | <Ryan_Lane> | removing manually added 10.2.1.13 address from lvs4 | [production] |
| 02:01 | <LocalisationUpdate> | completed (1.18) at Wed Jan  4 02:04:57 UTC 2012 | [production] |
| 01:43 | <Nemo_bis> | Last week's slowness: job queue backlog now cleared on Wikimedia Commons and (almost) on English Wikipedia http://ur1.ca/77q9b | [production] |
| 01:02 | <reedy> | synchronized php-1.18/includes/ '[[rev:107978|r107978]]' | [production] |
| 00:45 | <reedy> | synchronized php-1.18/extensions '[[rev:107977|r107977]], [[rev:107976|r107976]]' | [production] |
| 00:39 | <Tim> | running purgeParserCache.php on hume, deleting objects older than 3 months | [production] |
| 00:38 | <reedy> | synchronized php-1.18/includes/specials/ '[[rev:107975|r107975]]' | [production] |
| 00:29 | <tstarling> | synchronizing Wikimedia installation... : | [production] |
| 00:27 | <reedy> | synchronized php-1.18/extensions/Nuke/ '[[rev:107974|r107974]]' | [production] |
| 00:25 | <reedy> | synchronized php-1.18/extensions/ '[[rev:107970|r107970]]' | [production] |

2012-01-03

| 23:00 | <Tim> | on spence: restarting gmetad | [production] |
| 22:58 | <reedy> | synchronizing Wikimedia installation... : Pushing [[rev:107953|r107953]], [[rev:107955|r107955]], [[rev:107956|r107956]], [[rev:107957|r107957]] | [production] |
| 22:47 | <LeslieCarr> | stopping and then starting apache2 on spence to try to lower the load | [production] |
| 22:29 | <RobH> | added the lo address to lvs4; now it's working and generating thumbnails | [production] |
| 22:09 | <reedy> | synchronizing Wikimedia installation... : Push [[rev:107938|r107938]] [[rev:107948|r107948]] | [production] |
| 21:45 | <RobH> | ganglia graphs will be missing data for the past 30 to 40 minutes | [production] |
| 21:45 | <RobH> | spence back online, ganglia and nagios confirmed operational | [production] |
| 21:38 | <RobH> | resetting spence and dropping to serial to try to fix it | [production] |
| 21:25 | <RobH> | nagios and ganglia down due to spence reboot; system still coming back online | [production] |
| 21:21 | <RobH> | spence is unresponsive to ssh and serial console, rebooting | [production] |
| 21:14 | <LeslieCarr> | resetting DRAC 5 on spence for management connectivity | [production] |
| 21:05 | <binasher> | That fixed it, but how did that happen? | [production] |
| 21:05 | <binasher> | ran ip addr add 10.2.1.22/32 label "lo:LVS" dev lo on lvs4 | [production] |
| 19:36 | <reedy> | synchronized php-1.18/skins/common/images/ '[[rev:107930|r107930]]' | [production] |
| 17:36 | <mutante> | killing more runJobs.php / nextJobDB.php processes on a bunch of servers (/home/catrope/badjobrunners) | [production] |
| 17:26 | <RoanKattouw> | Stopping job runners on the following DECOMMISSIONED servers: srv151 srv152 srv153 srv158 srv160 srv164 srv165 srv166 srv167 srv168 srv170 srv176 srv177 srv178 srv181 srv184 srv185 | [production] |
| 15:55 | <RobH> | torrus back; it took forever to recompile | [production] |
| 15:53 | <reedy> | synchronized wmf-config/InitialiseSettings.php 'Bug 33485 - Enable WikiLove in si.wikipedia' | [production] |
| 15:52 | <Reedy> | Created WikiLove tables on siwiki | [production] |
| 15:46 | <RobH> | torrus deadlocked, kicking | [production] |
| 14:00 | <RoanKattouw> | Restarting job runners on srv242 and mw25; those are the last ones that are stuck | [production] |
| 13:57 | <RoanKattouw> | Restarting all job runners that are stuck | [production] |
| 13:48 | <RoanKattouw> | Restarting job runner on srv236, which seems to be stuck | [production] |
| 02:02 | <LocalisationUpdate> | completed (1.18) at Tue Jan  3 02:05:21 UTC 2012 | [production] |