2011-12-21
|
22:28 |
<mark> |
Unmounted /srv/swift-storage/sdab1 on ms2 (borken filesystem) |
[production] |
22:16 |
<mark> |
Flushed all iptables rules down the drain on ms1-3 (live hack, puppet will restore) |
[production] |
21:54 |
<mark> |
Ran swift-init all restart on ms3 |
[production] |
21:41 |
<mark> |
Ran swift-init all restart on ms2 |
[production] |
21:33 |
<mark> |
Running ben's swift thumb loader script in a screen on hume |
[production] |
21:05 |
<apergos> |
three more bin logs tossed from ds9 |
[production] |
20:31 |
<mark> |
Restarted swift-container on ms1 with higher worker count (4 instead of 2) |
[production] |
20:31 |
<Jeff_Green> |
power cycled kaulen because it's deathswapped and unresponsive |
[production] |
19:41 |
<mark> |
Ended oprofile run on ms1 |
[production] |
19:33 |
<catrope> |
synchronized wmf-config/missing.php 'Update missing.php from trunk, see [[bugzilla:30206|bug 30206]]' |
[production] |
19:24 |
<mark> |
Started oprofile run on ms1 |
[production] |
19:20 |
<mark> |
Migrated DRBD sync between nfs1 and nfs2 from protocol C (sync) to A (async) |
[production] |
17:48 |
<RoanKattouw> |
srv224 has a full disk |
[production] |
17:48 |
<catrope> |
synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js '[[rev:106959|r106959]]' |
[production] |
17:28 |
<maplebed> |
ran apt-get clean on hume to clear out ~600M space on the / partition |
[production] |
16:18 |
<apergos> |
so that was fast. barf from scp, nice call trace etc, shot the process on ds2, will email the vendor |
[production] |
15:27 |
<apergos> |
and starting another huge copy from ds2 to ds1, let's see what happens... |
[production] |
15:18 |
<apergos> |
reboot dataset1 with new kernel |
[production] |
15:14 |
<apergos> |
installing 2.6.38 from natty backports on ds1 for further testing |
[production] |
13:55 |
<apergos> |
powering on and off ds1 the hard way via the pdu. |
[production] |
11:15 |
<apergos> |
rebooting ds1 as it's got the one cpu tied up with a hung scp process and continual spewing to syslog... |
[production] |
10:23 |
<apergos> |
s/lgo/log/ as in syslog. saving a copy of the bad log in fenari:/home/ariel/dataset1-syslog-dec-20-2012 |
[production] |
10:09 |
<apergos> |
dataset1 kernel panics in lgo during copy :-( :-( |
[production] |
09:27 |
<apergos> |
a few more binlogs deleted on db9... |
[production] |
03:53 |
<LocalisationUpdate> |
completed (1.18) at Wed Dec 21 03:56:58 UTC 2011 |
[production] |
03:48 |
<Tim> |
doing a manual run of l10nupdate to check recache timings |
[production] |
03:27 |
<tstarling> |
synchronized php-1.18/includes/LocalisationCache.php '[[rev:106927|r106927]]' |
[production] |
02:40 |
<tstarling> |
synchronized wmf-config/InitialiseSettings.php 'LC recache log' |
[production] |
02:38 |
<tstarling> |
synchronized php-1.18/includes/LocalisationCache.php '[[rev:106922|r106922]]' |
[production] |
02:03 |
<LocalisationUpdate> |
completed (1.18) at Wed Dec 21 02:06:08 UTC 2011 |
[production] |
01:51 |
<reedy> |
synchronized php-1.18/resources/mediawiki 'creating empty mediawiki.debug.css/js' |
[production] |
01:50 |
<K4-713> |
synchronized payments cluster to [[rev:106917|r106917]] |
[production] |
01:16 |
<K4-713> |
synchronized payments cluster to [[rev:106909|r106909]] |
[production] |
2011-12-20
|
23:57 |
<Ryan_Lane> |
readded /dev/sda2 partition on streber, it was somehow deleted, borking the raidset |
[production] |
23:20 |
<Ryan_Lane> |
rebooting streber |
[production] |
23:00 |
<LeslieCarr> |
creating a new logical volume on streber called syslog for syslog-ng purposes |
[production] |
21:08 |
<awjr> |
synchronizing CiviCRM instance on grosley and aluminium to [[rev:1037|r1037]] |
[production] |
19:23 |
<reedy> |
synchronized php-1.18/extensions/CentralAuth/ '[[rev:106840|r106840]]' |
[production] |
19:14 |
<reedy> |
synchronized php-1.18/extensions/Contest/ '[[rev:106838|r106838]]' |
[production] |
17:05 |
<mutante> |
spence: according to [http://nagios.manubulon.com/traduction/docs25en/tuning.html] we should even double that if we have "high latency values (> 10 or 15 seconds)" and we have like > 1000 |
[production] |
17:04 |
<mutante> |
spence: check out "nagios -s /etc/nagios/nagios.cfg" for performance data - it suggests "Value for 'max_concurrent_checks' option should be >= 1231" |
[production] |
16:56 |
<Jeff_Green> |
manually rotated spence:/var/log/nagios/nagios.log because nagios log rotation appears broken and the file is ~2.6G |
[production] |
16:43 |
<catrope> |
synchronized php-1.18/resources/startup.js 'touch' |
[production] |
16:32 |
<catrope> |
synchronized wmf-config/InitialiseSettings.php 'Underscores -> spaces in wmgArticleFeedbackBlacklistCategories' |
[production] |
16:14 |
<apergos> |
restarting scp on ds2, seems that it renegotiates after 64GB and that was failing, fixed |
[production] |
15:25 |
<apergos> |
thumbs cleaner on ms5 complete. (don't worry, a new job will start up tomorrow) |
[production] |
15:16 |
<mutante> |
installing security upgrades on tarin (includes perl and php) |
[production] |
14:10 |
<apergos> |
another couple binlogs gone on ds9 |
[production] |
13:41 |
<mutante> |
added testswarm package to repo and installed it on gallium |
[production] |
13:15 |
<catrope> |
synchronized wmf-config/InitialiseSettings.php 'Use the correct interwiki prefix' |
[production] |