2011-12-22

12:30 <mark> Made swift thumb seeder fetch from the squids instead of ms5, as a performance test [production]
05:10 <Ryan_Lane> finished creating all puppet configuration groups, classes, and variables. It's safe to configure and create instances again. [production]
04:40 <Ryan_Lane> upping version of OpenStackManager on virt1 to match development. configuration and creation of instances should be avoided. [production]
02:04 <binasher> db9 is writable again [production]
02:01 <LocalisationUpdate> completed (1.18) at Thu Dec 22 02:04:05 UTC 2011 [production]
01:59 <binasher> started db9 maintenance phase 1 (getting it replicated to db10 again) [production]
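
A minimal sketch of what "getting db9 replicated to db10 again" typically involves on the replica side, assuming standard MySQL replication; the binlog coordinates below are placeholders and the real positions/credentials are not in the log. The 02:04 "db9 is writable again" line above presumably corresponds to clearing read_only once the replica had caught up.

  # on db10: re-point replication at db9 (placeholder coordinates)
  mysql -e "STOP SLAVE"
  mysql -e "CHANGE MASTER TO MASTER_HOST='db9', MASTER_LOG_FILE='db9-bin.000001', MASTER_LOG_POS=4"
  mysql -e "START SLAVE"
  mysql -e "SHOW SLAVE STATUS\G"        # watch Seconds_Behind_Master drop
  # on db9, once replication is healthy (assumed mechanism for "writable again"):
  mysql -e "SET GLOBAL read_only = 0"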
00:15 <K4-713> synchronized payments cluster to [[rev:107018|r107018]] [production]
00:07 <awjrichards> synchronizing Wikimedia installation... : [[rev:107015|r107015]] [production]

2011-12-21

22:44 <mark> proxy worker processes increased from 8 to 24 on owa1-2, 48 on owa3 [production]
22:28 <mark> Unmounted /srv/swift-storage/sdab1 on ms2 (broken filesystem) [production]
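
A hedged sketch of dealing with a broken swift storage filesystem like the one above; the device name and the use of XFS (typical for swift storage) are assumptions, not stated in the log.

  umount /srv/swift-storage/sdab1
  xfs_repair -n /dev/sdab1      # dry run first, to see how bad the damage is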
22:16 <mark> Flushed all iptables rules down the drain on ms1-3 (live hack, puppet will restore) [production]
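
"Flushing all iptables rules" usually looks like the sequence below; the exact chains and policies used on ms1-3 are assumptions, and as the entry notes, puppet restores the managed ruleset on its next run.

  iptables -P INPUT ACCEPT      # open up the default policies first
  iptables -P FORWARD ACCEPT
  iptables -P OUTPUT ACCEPT
  iptables -F                   # flush all rules in the filter table
  iptables -t nat -F            # and in the nat table
  iptables -X                   # drop any user-defined chains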
21:54 <mark> Ran swift-init all restart on ms3 [production]
21:41 <mark> Ran swift-init all restart on ms2 [production]
21:33 <mark> Running ben's swift thumb loader script in a screen on hume [production]
21:05 <apergos> three more binlogs tossed from db9 [production]
20:31 <mark> Restarted swift-container on ms1 with higher worker count (4 instead of 2) [production]
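
The worker count for a swift daemon is its "workers" option; a hedged sketch of the change behind this entry, with the config path assumed, restarted with the same command used elsewhere in this log. The proxy worker bump on owa1-3 at 22:44 above is the same knob in proxy-server.conf.

  # /etc/swift/container-server.conf, under [DEFAULT]:
  #   workers = 4
  swift-init all restart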
20:31 <Jeff_Green> power cycled kaulen because it's deathswapped and unresponsive [production]
19:41 <mark> Ended oprofile run on ms1 [production]
19:33 <catrope> synchronized wmf-config/missing.php 'Update missing.php from trunk, see [[bugzilla:30206|bug 30206]]' [production]
19:24 <mark> Started oprofile run on ms1 [production]
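
A sketch of a classic oprofile session like the one started here and ended at 19:41 above; the exact options used on ms1 are not in the log.

  opcontrol --init                  # load the oprofile kernel module
  opcontrol --start --no-vmlinux    # start sampling without kernel symbol resolution
  # ... run the workload being profiled ...
  opreport -l                       # per-symbol report of where the time went
  opcontrol --stop && opcontrol --shutdown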
19:20 <mark> Migrated DRBD sync between nfs1 and nfs2 from protocol C (sync) to A (async) [production]
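
DRBD's replication mode is set per resource with the "protocol" keyword (C = synchronous, A = asynchronous, matching the entry above); a hedged sketch of the change, with the resource definition path assumed.

  # in the resource block in /etc/drbd.conf (or /etc/drbd.d/*.res):
  #   protocol A;      # was: protocol C;
  drbdadm adjust all   # re-applies the configuration; the peers reconnect with the new protocol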
17:48 <RoanKattouw> srv224 has a full disk [production]
17:48 <catrope> synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js '[[rev:106959|r106959]]' [production]
17:28 <maplebed> ran apt-get clean on hume to clear out ~600M of space on the / partition [production]
16:18 <apergos> so that was fast: scp barfed with a nice call trace etc.; shot the process on ds2, will email the vendor [production]
15:27 <apergos> and starting another huge copy from ds2 to ds1, let's see what happens... [production]
15:18 <apergos> rebooting dataset1 with the new kernel [production]
15:14 <apergos> installing 2.6.38 from natty backports on ds1 for further testing [production]
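
The package name here is an assumption: on lucid-era Ubuntu the backported natty (2.6.38) kernel was shipped as the linux-image-*-lts-backport-natty packages, so the install probably looked roughly like this.

  apt-get update
  apt-get install linux-image-server-lts-backport-natty   # package name assumed
  # picked up by the reboot logged at 15:18 above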
13:55 <apergos> powering ds1 off and on the hard way, via the PDU [production]
11:15 <apergos> rebooting ds1 as it's got one CPU tied up with a hung scp process and is continually spewing to syslog... [production]
10:23 <apergos> s/lgo/log/ as in syslog. saving a copy of the bad log in fenari:/home/ariel/dataset1-syslog-dec-20-2012 [production]
10:09 <apergos> dataset1 kernel panics in lgo during copy :-( :-( [production]
09:27 <apergos> a few more binlogs deleted on db9... [production]
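
The log doesn't say whether these binlogs were purged through MySQL or simply removed from disk; purging through the server keeps the binlog index consistent, and a sketch of that (with a placeholder file name) is:

  mysql -e "SHOW BINARY LOGS"                        # see what is eligible to go
  mysql -e "PURGE BINARY LOGS TO 'db9-bin.000123'"   # placeholder binlog name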
03:53 <LocalisationUpdate> completed (1.18) at Wed Dec 21 03:56:58 UTC 2011 [production]
03:48 <Tim> doing a manual run of l10nupdate to check recache timings [production]
03:27 <tstarling> synchronized php-1.18/includes/LocalisationCache.php '[[rev:106927|r106927]]' [production]
02:40 <tstarling> synchronized wmf-config/InitialiseSettings.php 'LC recache log' [production]
02:38 <tstarling> synchronized php-1.18/includes/LocalisationCache.php '[[rev:106922|r106922]]' [production]
02:03 <LocalisationUpdate> completed (1.18) at Wed Dec 21 02:06:08 UTC 2011 [production]
01:51 <reedy> synchronized php-1.18/resources/mediawiki 'creating empty mediawiki.debug.css/js' [production]
01:50 <K4-713> synchronized payments cluster to [[rev:106917|r106917]] [production]
01:16 <K4-713> synchronized payments cluster to [[rev:106909|r106909]] [production]

2011-12-20

23:57 <Ryan_Lane> readded /dev/sda2 partition on streber, it was somehow deleted, borking the raidset [production]
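
A hypothetical recovery sequence for a deleted RAID member partition like this one; the md device, the partner disk, and the partition layout are all assumptions.

  sfdisk -d /dev/sdb | sfdisk /dev/sda       # copy the partition table back from the intact disk
  mdadm --manage /dev/md0 --add /dev/sda2    # re-add the member; the array then resyncs
  cat /proc/mdstat                           # watch the rebuild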
23:20 <Ryan_Lane> rebooting streber [production]
23:00 <LeslieCarr> creating a new logical volume on streber called syslog for syslog-ng purposes [production]
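
Creating and mounting a new logical volume is a short sequence; the size, volume group, filesystem, and mount point below are placeholders, not values from the log.

  lvcreate -L 100G -n syslog vg0
  mkfs.ext3 /dev/vg0/syslog
  mount /dev/vg0/syslog /var/log/remote      # mount point assumed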
21:08 <awjr> synchronizing CiviCRM instance on grosley and aluminium to [[rev:1037|r1037]] [production]
19:23 <reedy> synchronized php-1.18/extensions/CentralAuth/ '[[rev:106840|r106840]]' [production]
19:14 <reedy> synchronized php-1.18/extensions/Contest/ '[[rev:106838|r106838]]' [production]
17:05 <mutante> spence: according to [http://nagios.manubulon.com/traduction/docs25en/tuning.html] we should even double that if we have "high latency values (> 10 or 15 seconds)", and ours are well over 1000 [production]
17:04 <mutante> spence: check out "nagios -s /etc/nagios/nagios.cfg" for performance data - it suggests "Value for 'max_concurrent_checks' option should be >= 1231" [production]
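
Putting the two spence notes together: raise max_concurrent_checks in nagios.cfg to at least the suggested 1231 (roughly doubled here per the tuning guide, since latencies are far above 10-15 seconds), then verify and reload; the chosen value and the init script name are assumptions.

  # /etc/nagios/nagios.cfg:
  #   max_concurrent_checks=2500
  nagios -v /etc/nagios/nagios.cfg    # sanity-check the configuration
  /etc/init.d/nagios reload           # init script name assumed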