2011-12-23
16:34 <mutante> kswapd crashed, there is a Call Trace, and there was a load spike before, guess it is https://bugs.launchpad.net/ubuntu/+bug/721896 or similar [production]
16:27 <mutante> first interesting syslog line when it started: formey kernel: [39665413.570024] INFO: task kswapd0:36 blocked for more than 120 seconds. [production]
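That warning comes from the kernel's hung-task detector (120 seconds is the default `kernel.hung_task_timeout_secs`). A minimal sketch of pulling such lines out of a saved syslog copy; the excerpt file path here is hypothetical:

```shell
# Save the quoted syslog line to a scratch file (path is an example only)
cat > /tmp/formey-syslog-excerpt <<'EOF'
formey kernel: [39665413.570024] INFO: task kswapd0:36 blocked for more than 120 seconds.
EOF
# Count hung-task warnings in the excerpt
grep -c 'blocked for more than' /tmp/formey-syslog-excerpt   # prints 1
```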
16:10 <mutante> gerrit and svn back up [production]
16:07 <mutante> gerrit stopped working and formey would still ping but no ssh connect and no mgmt output, powercycling formey [production]
02:07 <binasher> db9 maintenance completed [production]
02:02 <LocalisationUpdate> completed (1.18) at Fri Dec 23 02:05:19 UTC 2011 [production]
01:57 <binasher> starting db9 maintenance - services will be unavailable for approx 15 minutes [production]
00:08 <hashar> synchronized php-1.18/extensions/WikimediaIncubator/IncubatorTest.php 'deploy [[rev:107120|r107120]] - ([[bugzilla:32772|bug 32772]]) fix up MoodBar and WikiLove on Wikimedia Incubator' [production]
2011-12-22
23:43 <maplebed> put owa1-3 in as container servers, took ms1-3 out for pmtpa test swift cluster [production]
20:56 <neilk> synchronized p/resources/mediawiki/mediawiki.feedback.js [production]
20:45 <Ryan_Lane> bringing mediawiki on virt1 up to date [production]
20:24 <catrope> synchronized wmf-config/InitialiseSettings.php 'Re-enable Contest extension' [production]
20:10 <mark> Turned off puppet dashboard reporting [production]
15:07 <apergos> er, on srv219, that is. [production]
15:07 <apergos> cleaned out tmp but... see, there really was only today's stuff in there so it's making me nervous [production]
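For a cleanup like this, the usual pre-checks are to confirm how much space /tmp actually holds and how old its contents are; a sketch, assuming GNU coreutils/findutils:

```shell
df -h /tmp                                    # free space on the filesystem holding /tmp
du -xsh /tmp 2>/dev/null || true              # total size of /tmp itself
find /tmp -maxdepth 1 -mtime -1 2>/dev/null | head   # entries modified in the last day
```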
14:15 <31NAAAO2O> hashar: testswarm: deleted mobile jobs requests since they are now disabled [production]
12:30 <mark> Made swift thumb seeder fetch from the squids instead of ms5, as a performance test [production]
05:10 <Ryan_Lane> finished creating all puppet configuration groups, classes, and variables. It's safe to configure and create instances again. [production]
04:40 <Ryan_Lane> upping version of OpenStackManager on virt1 to match development. configuration and creation of instances should be avoided. [production]
02:04 <binasher> db9 is writable again [production]
02:01 <LocalisationUpdate> completed (1.18) at Thu Dec 22 02:04:05 UTC 2011 [production]
01:59 <binasher> started db9 maintenance phase 1 (getting it replicated to db10 again) [production]
00:15 <K4-713> synchronized payments cluster to [[rev:107018|r107018]] [production]
00:07 <awjrichards> synchronizing Wikimedia installation... : [[rev:107015|r107015]] [production]
2011-12-21
22:44 <mark> proxy worker processes increased from 8 to 24 on owa1-2, 48 on owa3 [production]
22:28 <mark> Unmounted /srv/swift-storage/sdab1 on ms2 (broken filesystem) [production]
22:16 <mark> Flushed all iptables rules down the drain on ms1-3 (live hack, puppet will restore) [production]
21:54 <mark> Ran swift-init all restart on ms3 [production]
21:41 <mark> Ran swift-init all restart on ms2 [production]
21:33 <mark> Running ben's swift thumb loader script in a screen on hume [production]
21:05 <apergos> three more bin logs tossed from ds9 [production]
20:31 <mark> Restarted swift-container on ms1 with higher worker count (4 instead of 2) [production]
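The worker-count change corresponds to Swift's standard `workers` option; a sketch of the config edit, assuming the stock file location /etc/swift/container-server.conf on ms1:

```ini
# /etc/swift/container-server.conf (location assumed)
[DEFAULT]
workers = 4   ; raised from 2; took effect via the swift-init restarts logged above
```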
20:31 <Jeff_Green> power cycled kaulen because it's deathswapped and unresponsive [production]
19:41 <mark> Ended oprofile run on ms1 [production]
19:33 <catrope> synchronized wmf-config/missing.php 'Update missing.php from trunk, see [[bugzilla:30206|bug 30206]]' [production]
19:24 <mark> Started oprofile run on ms1 [production]
19:20 <mark> Migrated DRBD sync between nfs1 and nfs2 from protocol C (sync) to A (async) [production]
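Protocol C acknowledges a write only after it has reached the peer's disk, while protocol A acknowledges once it is on the local disk and in the local TCP send buffer. A sketch of the corresponding drbd.conf change; the resource name r0 is an assumption:

```
resource r0 {
  protocol A;   # was: protocol C
  # ... disk/net/syncer sections unchanged
}
```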
17:48 <RoanKattouw> srv224 has a full disk [production]
17:48 <catrope> synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js '[[rev:106959|r106959]]' [production]
17:28 <maplebed> ran apt-get clean on hume to clear out ~600M space on the / partition [production]
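The ~600M reclaimed by `apt-get clean` is the cached .deb files under /var/cache/apt/archives. A sketch of checking the cache size before cleaning; the fallback message is for hosts without an apt cache:

```shell
# Size of the apt package cache that `apt-get clean` (run as root) would empty
du -sh /var/cache/apt/archives 2>/dev/null || echo "no apt cache present"
```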
16:18 <apergos> so that was fast. barf from scp, nice call trace etc, shot the process on ds2, will email the vendor [production]
15:27 <apergos> and starting another huge copy from ds2 to ds1, let's see what happens... [production]
15:18 <apergos> reboot dataset1 with new kernel [production]
15:14 <apergos> installing 2.6.38 from natty backports on ds1 for further testing [production]
13:55 <apergos> powering on and off ds1 the hard way via the pdu. [production]
11:15 <apergos> rebooting ds1 as it's got the one cpu tied up with a hung scp process and continual spewing to syslog... [production]
10:23 <apergos> s/lgo/log/ as in syslog. saving a copy of the bad log in fenari:/home/ariel/dataset1-syslog-dec-20-2012 [production]
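The `s/lgo/log/` above is sed substitution syntax correcting a typo (presumably "syslgo") in an earlier message; applied literally it behaves like:

```shell
echo "syslgo" | sed 's/lgo/log/'   # prints: syslog
```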