| 
      
        2012-01-03
      
     | 
  
    
  | 22:47 | 
  <LeslieCarr> | 
  stopping and then starting apache2 on spence to try and lower load | 
  [production] | 
            
  | 22:29 | 
  <RobH> | 
  added the lo address to lvs4, now it's working and generating thumbnails | 
  [production] | 
            
  | 22:09 | 
  <reedy> | 
  synchronizing Wikimedia installation... : Push [[rev:107938|r107938]] [[rev:107948|r107948]] | 
  [production] | 
            
  | 21:45 | 
  <RobH> | 
  ganglia graphs will have missing data for past 30 to 40 minutes | 
  [production] | 
            
  | 21:45 | 
  <RobH> | 
  spence back online, ganglia and nagios confirmed operational | 
  [production] | 
            
  | 21:38 | 
  <RobH> | 
  resetting spence and dropping to serial to try to fix it | 
  [production] | 
            
  | 21:25 | 
  <RobH> | 
  nagios and ganglia down due to spence reboot, system still coming back online | 
  [production] | 
            
  | 21:21 | 
  <RobH> | 
  spence is unresponsive to ssh and serial console, rebooting | 
  [production] | 
            
  | 21:14 | 
  <LeslieCarr> | 
  resetting DRAC 5 on spence for management connectivity | 
  [production] | 
            
  | 21:05 | 
  <binasher> | 
  that fixed it. but how did that happen? | 
  [production] | 
            
  | 21:05 | 
  <binasher> | 
  ran ip addr add 10.2.1.22/32 label "lo:LVS" dev lo on lvs4 | 
  [production] | 
            
  | 19:36 | 
  <reedy> | 
  synchronized php-1.18/skins/common/images/  '[[rev:107930|r107930]]' | 
  [production] | 
            
  | 17:36 | 
  <mutante> | 
  killing more runJobs.php / nextJobDB.php processes on a bunch of servers (/home/catrope/badjobrunners) | 
  [production] | 
            
  | 17:26 | 
  <RoanKattouw> | 
  Stopping job runners on the following DECOMMISSIONED servers: srv151 srv152 srv153 srv158 srv160 srv164 srv165 srv166 srv167 srv168 srv170 srv176 srv177 srv178 srv181 srv184 srv185 | 
  [production] | 
            
  | 15:55 | 
  <RobH> | 
  torrus back, took forever to recompile | 
  [production] | 
            
  | 15:53 | 
  <reedy> | 
  synchronized wmf-config/InitialiseSettings.php  'Bug 33485 - Enable WikiLove in si.wikipedia' | 
  [production] | 
            
  | 15:52 | 
  <Reedy> | 
  Created wikilove tables on siwiki | 
  [production] | 
            
  | 15:46 | 
  <RobH> | 
  torrus deadlocked, kicking | 
  [production] | 
            
  | 14:00 | 
  <RoanKattouw> | 
  Restarting job runners on srv242 and mw25, those are the last ones that are stuck | 
  [production] | 
            
  | 13:57 | 
  <RoanKattouw> | 
  Restarting all job runners that are stuck | 
  [production] | 
            
  | 13:48 | 
  <RoanKattouw> | 
  Restarting job runner on srv236, seems to be stuck | 
  [production] | 
            
  | 02:02 | 
  <LocalisationUpdate> | 
  completed (1.18) at Tue Jan  3 02:05:21 UTC 2012 | 
  [production] | 
            
  
    | 
      
        2012-01-02
      
     | 
  
    
  | 23:36 | 
  <Reedy> | 
  Seems to potentially be an issue with job runners, enwiki backed up to over 90k over the last week or so. Needs investigating | 
  [production] | 
            
  | 23:18 | 
  <tstarling> | 
  synchronized php-1.18/includes/parser/Parser.php  '[[rev:107856|r107856]]' | 
  [production] | 
            
  | 22:58 | 
  <tstarling> | 
  synchronizing Wikimedia installation... :  | 
  [production] | 
            
  | 18:08 | 
  <nikerabbit> | 
  synchronized wmf-config/InitialiseSettings.php  'Bug 33368: WebFonts on bpywiki' | 
  [production] | 
            
  | 18:05 | 
  <nikerabbit> | 
  synchronized php-1.18/languages/messages/  'i18ndeploy [[rev:107843|r107843]]' | 
  [production] | 
            
  | 18:04 | 
  <nikerabbit> | 
  synchronized php-1.18/extensions/WebFonts/WebFonts.i18n.php  'i18ndeploy [[rev:107843|r107843]]' | 
  [production] | 
            
  | 16:58 | 
  <mutante> | 
  installed SiteMap extension on Bugzilla - soon bugs should be googleable | 
  [production] | 
            
  | 16:33 | 
  <mutante> | 
  upgraded Bugzilla from 4.0.2 to 4.0.3 (http://www.bugzilla.org/releases/4.0.3/release-notes.html#v40_point) (RT #2194) | 
  [production] | 
            
  | 14:47 | 
  <mutante> | 
  cleaned out gammu spool to stop sms bomb - sorry. daemon runs again now though. | 
  [production] | 
            
  | 14:36 | 
  <mutante> | 
  fixed gammu-smsd on spence per wikitech "Nagios#Fixing_the_USB_dongle" (sending out queued SMS now) | 
  [production] | 
            
  | 14:30 | 
  <mutante> | 
  puppet ran on spence, ganglia also seems ok despite the errors I logged before. gammu-smsd can't find device again though | 
  [production] | 
            
  | 14:03 | 
  <mutante> | 
  spence / gmetad - RRD_update .. illegal attempt to update using time .. last update time is .. (minimum one second step) | 
  [production] | 
            
  | 13:57 | 
  <mutante> | 
  gmond complains about missing kernel modules on spence when trying to start on boot | 
  [production] | 
            
  | 13:54 | 
  <mutante> | 
  spence down, no ssh, no mgmt output, powercycling it .. | 
  [production] | 
            
  | 02:01 | 
  <LocalisationUpdate> | 
  completed (1.18) at Mon Jan  2 02:04:47 UTC 2012 | 
  [production] | 
            
  | 00:08 | 
  <tstarling> | 
  synchronized php-1.18/includes/media/SVGMetadataExtractor.php  '[[rev:107792|r107792]]' | 
  [production] | 
            
  
    | 
      
        2012-01-01
      
     | 
  
    
  | 21:28 | 
  <Ryan_Lane> | 
  restarted pdns-recursor on dobson | 
  [production] | 
            
  | 21:26 | 
  <Ryan_Lane> | 
  restarted pdns on ns2 about an hour ago | 
  [production] | 
            
  | 09:46 | 
  <apergos> | 
  restarted lucene search on srch 10, 11,  then later on 3,4,9,1 | 
  [production] | 
            
  | 09:35 | 
  <apergos> | 
  removed log.1 from /a/search/logs on search6, it was 35gb | 
  [production] | 
            
  | 03:55 | 
  <Tim> | 
  fixed broken package on search7 and search11 | 
  [production] | 
            
  | 02:01 | 
  <LocalisationUpdate> | 
  completed (1.18) at Sun Jan  1 02:04:30 UTC 2012 | 
  [production] | 
            
  | 01:36 | 
  <Tim> | 
  adjusted FD limit in /etc/init.d/lsearchd on all search servers with sed | 
  [production] | 
            
  | 01:34 | 
  <Tim> | 
  increased FD limit on search6 and restarted lsearchd | 
  [production] | 
            
  | 00:46 | 
  <Tim> | 
  removed some logs on search6 to fix /a disk space exhaustion | 
  [production] |