2012-01-04
|
21:13 |
<RoanKattouw> |
Applying schema changes to moodbar_feedback_response on all wikis (drop index, create index, add column) |
[production] |
19:36 |
<notpeter> |
restarting dhcpd on brewster |
[production] |
19:13 |
<RobH> |
dns update successful and none of them fell over |
[production] |
19:12 |
<Reedy> |
[[rev:108070|r108070]] even |
[production] |
19:12 |
<reedy> |
synchronized php-1.18/extensions/CentralAuth/specials/ '[[rev:107070|r107070]]' |
[production] |
19:11 |
<RobH> |
updating dns for mgmt of ms-fe1/2 and other new servers in tampa, as well as search boxen in eqiad |
[production] |
19:04 |
<mutante> |
srv199 boots but without eth0, NIC1 is Enabled in BIOS but MAC Address "Not Present" - creating hardware ticket |
[production] |
18:55 |
<catrope> |
synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js '[[rev:108064|r108064]]' |
[production] |
18:43 |
<catrope> |
synchronized wmf-config/CommonSettings.php 'Disable AFTv5 bucketing tracking again' |
[production] |
18:38 |
<mutante> |
powercycling srv199 |
[production] |
18:33 |
<catrope> |
synchronized php-1.18/resources/startup.js 'touch' |
[production] |
18:30 |
<catrope> |
synchronized wmf-config/CommonSettings.php 'Actually bump version number' |
[production] |
18:28 |
<catrope> |
synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Revert live hack' |
[production] |
18:24 |
<catrope> |
synchronized wmf-config/CommonSettings.php 'and bump the version number too' |
[production] |
18:22 |
<catrope> |
synchronized wmf-config/CommonSettings.php 'Enable tracking for AFTv5 bucketing' |
[production] |
18:06 |
<mutante> |
duplicate nagios-wm instances on spence (/home/wikipedia/bin/ircecho vs. /usr/ircecho/bin/ircecho); killed them both, restarted with init.d/ircecho |
[production] |
18:00 |
<catrope> |
synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Live hack for tracking a percentage of bucketing events' |
[production] |
17:52 |
<mutante> |
knsq11 is broken. boots into installer, then "Dazed and confused" at hardware detection (NMI received for unknown reason 21 on CPU 0). -> RT 2206 |
[production] |
17:38 |
<mutante> |
powercycling knsq11 |
[production] |
15:52 |
<mutante> |
added project deployment-prep for hexmode and petan |
[production] |
11:31 |
<catrope> |
synchronized php-1.18/extensions/ClickTracking/ClickTracking.hooks.php '[[rev:108017|r108017]]' |
[production] |
08:44 |
<nikerabbit> |
synchronized php-1.18/includes/specials/SpecialAllmessages.php '[[rev:107998|r107998]]' |
[production] |
07:40 |
<Tim> |
fixed puppet by re-running the post-merge hook with key forwarding enabled, and then started puppet on ms6 |
[production] |
07:32 |
<Tim> |
on ms6.esams: fixed proxy IP address and stopped puppet while I figure out how to fix it |
[production] |
03:25 |
<Tim> |
experimentally raised max_concurrent_checks to 128 |
[production] |
03:17 |
<Tim> |
on spence in nagios.cfg, reduced service_reaper_frequency from 10 to 1, to avoid having a massive process count spike every 10 seconds as checks are started. Locally only as a test. |
[production] |
02:27 |
<Ryan_Lane> |
I should clarify that I removed 10.2.1.13 from /etc/network/interfaces, it's still properly bound to lo |
[production] |
02:24 |
<Tim> |
on spence: setting up logrotate for nagios.log and removing nagios-bloated-log.log |
[production] |
02:22 |
<Ryan_Lane> |
removing manually added 10.2.1.13 address from lvs4 |
[production] |
02:01 |
<LocalisationUpdate> |
completed (1.18) at Wed Jan 4 02:04:57 UTC 2012 |
[production] |
01:43 |
<Nemo_bis> |
Last week's slowness: job queue backlog now cleared on !Wikimedia Commons and (almost) English !Wikipedia http://ur1.ca/77q9b |
[production] |
01:02 |
<reedy> |
synchronized php-1.18/includes/ '[[rev:107978|r107978]]' |
[production] |
00:45 |
<reedy> |
synchronized php-1.18/extensions '[[rev:107977|r107977]], [[rev:107976|r107976]]' |
[production] |
00:39 |
<Tim> |
running purgeParserCache.php on hume, deleting objects older than 3 months |
[production] |
00:38 |
<reedy> |
synchronized php-1.18/includes/specials/ '[[rev:107975|r107975]]' |
[production] |
00:29 |
<tstarling> |
synchronizing Wikimedia installation... : |
[production] |
00:27 |
<reedy> |
synchronized php-1.18/extensions/Nuke/ '[[rev:107974|r107974]]' |
[production] |
00:25 |
<reedy> |
synchronized php-1.18/extensions/ '[[rev:107970|r107970]]' |
[production] |
2012-01-03
|
23:00 |
<Tim> |
on spence: restarting gmetad |
[production] |
22:58 |
<reedy> |
synchronizing Wikimedia installation... : Pushing [[rev:107953|r107953]], [[rev:107955|r107955]], [[rev:107956|r107956]], [[rev:107957|r107957]] |
[production] |
22:47 |
<LeslieCarr> |
stopping and then starting apache2 on spence to try and lower load |
[production] |
22:29 |
<RobH> |
added in the lo address to lvs4, now it's working and generating thumbnails |
[production] |
22:09 |
<reedy> |
synchronizing Wikimedia installation... : Push [[rev:107938|r107938]] [[rev:107948|r107948]] |
[production] |
21:45 |
<RobH> |
ganglia graphs will have missing data for past 30 to 40 minutes |
[production] |
21:45 |
<RobH> |
spence back online, ganglia and nagios confirmed operational |
[production] |
21:38 |
<RobH> |
resetting spence and dropping to serial to try to fix it |
[production] |
21:25 |
<RobH> |
nagios and ganglia down due to spence reboot, system still coming back online |
[production] |
21:21 |
<RobH> |
spence is unresponsive to ssh and serial console, rebooting |
[production] |
21:14 |
<LeslieCarr> |
resetting DRAC 5 on spence for management connectivity |
[production] |
21:05 |
<binasher> |
that fixed it. but how did that happen? |
[production] |