| 2012-01-05 |
  | 18:45 | <preilly> | synchronized php-1.18/extensions/MobileFrontend/javascripts/application.js | [production] | 
            
  | 18:00 | <mutante> | tarin - added "#includedir /etc/sudoers.d" to sudo config, needs to read /etc/sudoers.d/nrpe for Nagios RAID check | [production] | 
            
  | 17:49 | <logmsgbot_> | hashar: gallium: cleaned /tmp . Our test suites leak a large amount of files :D | [production] | 
            
  | 17:49 | <^demon> | removed chuck norris plugin from jenkins, restarted | [production] | 
            
  | 16:48 | <mutante> | payments4 - 25 running nginx procs cause a warning - but is that normal, and should we just raise the limit? | [production] | 
            
  | 16:15 | <mutante> | people claim it was "completely resolved with 2.6.38-10 backport from PPA" (add-apt-repository ppa:kernel-ppa/ppa ...). Want to try that? (or just reboot ms1002, please) | [production] | 
            
  | 15:45 | <mutante> | ms1002 - kswapd 100% CPU - but no swap used and free memory left - this looks like https://bugs.launchpad.net/ubuntu/+bug/721896 again | [production] | 
            
  | 15:39 | <mutante> | Nagios check_ntp does stuff like: overall average offset: 0  ->  NTP OK: Offset unknown| -> NTP CRITICAL: Offset unknown (even though this bug was supposed to be fixed in a version before the one we use)..sigh | [production] | 
            
  | 15:14 | <mutante> | lvs1004 - puppet hadn't run for 12 hours, looked stuck, "already in progress" on every run. rm /var/lib/puppet/state/puppetdlock, restarted puppet agent, run finished fine in a few seconds. Maybe puppet [[bugzilla:2888|bug 2888]], 5246 or related | [production] | 
            
  | 14:57 | <mutante> | magnesium - memcached runs on default port 11211, but we run all the others on 11000, this causes Nagios CRIT. Is it supposed to run here? (was also on -l 127.0.0.1 only, but init script starts it on all) | [production] | 
            
  | 14:55 | <Jeff_Green> | searchidx1 /a reached 100%, did the "space issues" maintenance procedure from wikitech search documentation | [production] | 
            
  | 14:39 | <mutante> | same on srv193 | [production] | 
            
  | 14:35 | <mutante> | srv290 - before restart memcached was running with -m 64 and -l 127.0.0.1 for some reason, causing Nagios CRIT, now it looks like others and recovered | [production] | 
            
  | 14:32 | <mutante> | restarting memcached on srv290 | [production] | 
            
  | 02:01 | <LocalisationUpdate> | completed (1.18) at Thu Jan  5 02:05:03 UTC 2012 | [production] | 
            
  
| 2012-01-04 |
  | 23:27 | <catrope> | synchronizing Wikimedia installation... : Deploying MoodBar and MarkAsHelpful changes | [production] | 
            
  | 22:39 | <Tim> | taking srv280 for action=purge slowness investigation | [production] | 
            
  | 21:20 | <Ryan_Lane> | deploying LdapAuthentication 2.0a and OpenStackManager 1.3 to virt1 | [production] | 
            
  | 21:13 | <RoanKattouw> | Applying schema changes to moodbar_feedback_response on all wikis (drop index, create index, add column) | [production] | 
            
  | 19:36 | <notpeter> | restarting dhcpd on brewster | [production] | 
            
  | 19:13 | <RobH> | dns update successful and none of them fell over | [production] | 
            
  | 19:12 | <Reedy> | [[rev:108070|r108070]] even | [production] | 
            
  | 19:12 | <reedy> | synchronized php-1.18/extensions/CentralAuth/specials/  '[[rev:107070|r107070]]' | [production] | 
            
  | 19:11 | <RobH> | updating dns for mgmt of ms-fe1/2 and other new servers in tampa, as well as search boxen in eqiad | [production] | 
            
  | 19:04 | <mutante> | srv199 boots but without eth0, NIC1 is Enabled in BIOS but MAC Address "Not Present" - creating hardware ticket | [production] | 
            
  | 18:55 | <catrope> | synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js  '[[rev:108064|r108064]]' | [production] | 
            
  | 18:43 | <catrope> | synchronized wmf-config/CommonSettings.php  'Disable AFTv5 bucketing tracking again' | [production] | 
            
  | 18:38 | <mutante> | powercycling srv199 | [production] | 
            
  | 18:33 | <catrope> | synchronized php-1.18/resources/startup.js  'touch' | [production] | 
            
  | 18:30 | <catrope> | synchronized wmf-config/CommonSettings.php  'Actually bump version number' | [production] | 
            
  | 18:28 | <catrope> | synchronized php-1.18/resources/mediawiki/mediawiki.user.js  'Revert live hack' | [production] | 
            
  | 18:24 | <catrope> | synchronized wmf-config/CommonSettings.php  'and bump the version number too' | [production] | 
            
  | 18:22 | <catrope> | synchronized wmf-config/CommonSettings.php  'Enable tracking for AFTv5 bucketing' | [production] | 
            
  | 18:06 | <mutante> | duplicate nagios-wm instances on spence (/home/wikipedia/bin/ircecho vs. /usr/ircecho/bin/ircecho) killed them both, restarted with init.d/ircecho | [production] | 
            
  | 18:00 | <catrope> | synchronized php-1.18/resources/mediawiki/mediawiki.user.js  'Live hack for tracking a percentage of bucketing events' | [production] | 
            
  | 17:52 | <mutante> | knsq11 is broken. boots into installer, then "Dazed and confused" at hardware detection (NMI received for unknown reason 21 on CPU 0). -> RT 2206 | [production] | 
            
  | 17:38 | <mutante> | powercycling knsq11 | [production] | 
            
  | 15:52 | <mutante> | added project deployment-prep for hexmode and petan | [production] | 
            
  | 11:31 | <catrope> | synchronized php-1.18/extensions/ClickTracking/ClickTracking.hooks.php  '[[rev:108017|r108017]]' | [production] | 
            
  | 08:44 | <nikerabbit> | synchronized php-1.18/includes/specials/SpecialAllmessages.php  '[[rev:107998|r107998]]' | [production] | 
            
  | 07:40 | <Tim> | fixed puppet by re-running the post-merge hook with key forwarding enabled, and then started puppet on ms6 | [production] | 
            
  | 07:32 | <Tim> | on ms6.esams: fixed proxy IP address and stopped puppet while I figure out how to fix it | [production] | 
            
  | 03:25 | <Tim> | experimentally raised max_concurrent_checks to 128 | [production] | 
            
  | 03:17 | <Tim> | on spence in nagios.cfg, reduced service_reaper_frequency from 10 to 1, to avoid having a massive process count spike every 10 seconds as checks are started. Locally only as a test. | [production] | 
            
  | 02:27 | <Ryan_Lane> | I should clarify that I removed 10.2.1.13 from /etc/network/interfaces, it's still properly bound to lo | [production] | 
            
  | 02:24 | <Tim> | on spence: setting up logrotate for nagios.log and removing nagios-bloated-log.log | [production] | 
            
  | 02:22 | <Ryan_Lane> | removing manually added 10.2.1.13 address from lvs4 | [production] | 
            
  | 02:01 | <LocalisationUpdate> | completed (1.18) at Wed Jan  4 02:04:57 UTC 2012 | [production] | 
            
  | 01:43 | <Nemo_bis> | Last week's slowness: job queue backlog now cleared on !Wikimedia Commons and (almost) English !Wikipedia http://ur1.ca/77q9b | [production] | 
            
  | 01:02 | <reedy> | synchronized php-1.18/includes/  '[[rev:107978|r107978]]' | [production] |