2012-01-05
17:49 <^demon> removed Chuck Norris plugin from Jenkins, restarted [production]
16:48 <mutante> payments4: 25 running nginx processes trigger a warning, but that is normal; should we just raise the limit? [production]
16:15 <mutante> people claim it was "completely resolved with 2.6.38-10 backport from PPA" (add-apt-repository ppa:kernel-ppa/ppa ...). Want to try that? (Or just reboot ms1002, please.) [production]
15:45 <mutante> ms1002: kswapd at 100% CPU, but no swap used and free memory still available; this looks like https://bugs.launchpad.net/ubuntu/+bug/721896 again [production]
15:39 <mutante> Nagios check_ntp produces sequences like "overall average offset: 0" -> "NTP OK: Offset unknown|" -> "NTP CRITICAL: Offset unknown" (even though this bug was supposed to be fixed in a version earlier than the one we use)... sigh [production]
15:14 <mutante> lvs1004: puppet had not run for 12 hours and looked stuck, reporting "already in progress" on every run. Removed /var/lib/puppet/state/puppetdlock and restarted the puppet agent; the run then finished fine in a few seconds. Maybe puppet [[bugzilla:2888|bug 2888]], 5246 or related [production]
14:57 <mutante> magnesium: memcached runs on the default port 11211, but all the others run on 11000, which causes a Nagios CRIT. Is it supposed to run here at all? (It was also listening only on 127.0.0.1 via -l, but the init script starts it on all interfaces.) [production]
14:55 <Jeff_Green> searchidx1: /a reached 100% usage; ran the "space issues" maintenance procedure from the wikitech search documentation [production]
14:39 <mutante> same on srv193 [production]
14:35 <mutante> srv290: before the restart, memcached was running with -m 64 and -l 127.0.0.1 for some reason, causing a Nagios CRIT; now it matches the others and has recovered [production]
14:32 <mutante> restarting memcached on srv290 [production]
02:01 <LocalisationUpdate> completed (1.18) at Thu Jan 5 02:05:03 UTC 2012 [production]
2012-01-04
23:27 <catrope> synchronizing Wikimedia installation... : Deploying MoodBar and MarkAsHelpful changes [production]
22:39 <Tim> taking srv280 for action=purge slowness investigation [production]
21:20 <Ryan_Lane> deploying LdapAuthentication 2.0a and OpenStackManager 1.3 to virt1 [production]
21:13 <RoanKattouw> Applying schema changes to moodbar_feedback_response on all wikis (drop index, create index, add column) [production]
19:36 <notpeter> restarting dhcpd on brewster [production]
19:13 <RobH> DNS update successful and none of them fell over [production]
19:12 <Reedy> correction to the 19:12 sync entry: the revision was [[rev:108070|r108070]], not r107070 [production]
19:12 <reedy> synchronized php-1.18/extensions/CentralAuth/specials/ '[[rev:107070|r107070]]' [production]
19:11 <RobH> updating DNS for the mgmt interfaces of ms-fe1/2 and other new servers in Tampa, as well as for the search boxen in eqiad [production]
19:04 <mutante> srv199 boots but without eth0; NIC1 is Enabled in the BIOS, but its MAC Address shows "Not Present". Creating a hardware ticket [production]
18:55 <catrope> synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js '[[rev:108064|r108064]]' [production]
18:43 <catrope> synchronized wmf-config/CommonSettings.php 'Disable AFTv5 bucketing tracking again' [production]
18:38 <mutante> powercycling srv199 [production]
18:33 <catrope> synchronized php-1.18/resources/startup.js 'touch' [production]
18:30 <catrope> synchronized wmf-config/CommonSettings.php 'Actually bump version number' [production]
18:28 <catrope> synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Revert live hack' [production]
18:24 <catrope> synchronized wmf-config/CommonSettings.php 'and bump the version number too' [production]
18:22 <catrope> synchronized wmf-config/CommonSettings.php 'Enable tracking for AFTv5 bucketing' [production]
18:06 <mutante> found duplicate nagios-wm instances on spence (/home/wikipedia/bin/ircecho vs. /usr/ircecho/bin/ircecho); killed them both and restarted with init.d/ircecho [production]
18:00 <catrope> synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Live hack for tracking a percentage of bucketing events' [production]
17:52 <mutante> knsq11 is broken: it boots into the installer, then fails with "Dazed and confused" at hardware detection (NMI received for unknown reason 21 on CPU 0). -> RT 2206 [production]
17:38 <mutante> powercycling knsq11 [production]
15:52 <mutante> added project deployment-prep for hexmode and petan [production]
11:31 <catrope> synchronized php-1.18/extensions/ClickTracking/ClickTracking.hooks.php '[[rev:108017|r108017]]' [production]
08:44 <nikerabbit> synchronized php-1.18/includes/specials/SpecialAllmessages.php '[[rev:107998|r107998]]' [production]
07:40 <Tim> fixed puppet by re-running the post-merge hook with key forwarding enabled, and then started puppet on ms6 [production]
07:32 <Tim> on ms6.esams: fixed proxy IP address and stopped puppet while I figure out how to fix it [production]
03:25 <Tim> experimentally raised max_concurrent_checks to 128 [production]
03:17 <Tim> on spence in nagios.cfg, reduced service_reaper_frequency from 10 to 1, to avoid having a massive process count spike every 10 seconds as checks are started. Locally only as a test. [production]
02:27 <Ryan_Lane> I should clarify that I removed 10.2.1.13 from /etc/network/interfaces; it's still properly bound to lo [production]
02:24 <Tim> on spence: setting up logrotate for nagios.log and removing nagios-bloated-log.log [production]
02:22 <Ryan_Lane> removing manually added 10.2.1.13 address from lvs4 [production]
02:01 <LocalisationUpdate> completed (1.18) at Wed Jan 4 02:04:57 UTC 2012 [production]
01:43 <Nemo_bis> Last week's slowness: job queue backlog is now cleared on Wikimedia Commons and (almost) on English Wikipedia http://ur1.ca/77q9b [production]
01:02 <reedy> synchronized php-1.18/includes/ '[[rev:107978|r107978]]' [production]
00:45 <reedy> synchronized php-1.18/extensions '[[rev:107977|r107977]], [[rev:107976|r107976]]' [production]
00:39 <Tim> running purgeParserCache.php on hume, deleting objects older than 3 months [production]
00:38 <reedy> synchronized php-1.18/includes/specials/ '[[rev:107975|r107975]]' [production]