2009-10-27

19:32 <mark> Ran "deluser catrope" across the cluster to prompt puppet to recreate the account [production]
19:30 <mark> Fixed admins.pp in puppet, the "managehome" attribute had disappeared [production]
17:16 <midom> synchronized php-1.5/languages/LanguageConverter.php [production]
10:14 <midom> synchronized php-1.5/StartProfiler.php [production]
10:08 <midom> synchronized php-1.5/languages/LanguageConverter.php 'oops, this is not entirely right, livehacking for now' [production]
09:58 <midom> synchronized php-1.5/languages/LanguageConverter.php 'push locking change live' [production]
07:45 <domas> rolled live memcached changes: read/write timeouts down from 1s to 50ms, connect timeouts from 3x10ms with backoff to 2x10ms with no backoff, and fixed a host blacklist bug [production]
07:43 <midom> synchronized php-1.5/includes/memcached-client.php 'HERE WE GO MEMCACHED FIXES' [production]
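
The memcached tuning described at 07:45 above amounts to a handful of client-side timeout constants. A minimal sketch of the shape of that change, with illustrative option names (these are not the actual memcached-client.php internals):

<?php
// Hypothetical timeout settings mirroring the logged change.
$memcachedOptions = array(
	'io_timeout_us'      => 50000,  // read/write on an open socket: was 1,000,000 (1s)
	'connect_timeout_us' => 10000,  // per connect attempt: 10ms, as before
	'connect_attempts'   => 2,      // was 3
	'connect_backoff'    => false,  // was true: backoff between attempts
);

Tighter timeouts keep a slow or dead memcached host from stalling page renders; the host blacklist mentioned in the same entry presumably exists so that such a host isn't retried on every request.
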
06:05 <domas> fixed perms in survey.wikimedia.org's /srv/org/wikimedia/survey/tmp/, as well as set display_errors to off, in case there's more incompetence around ;-) [production]
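
A minimal sketch of the display_errors half of that fix, assuming it was applied in the survey application's PHP configuration (the exact mechanism, php.ini versus runtime, isn't stated in the log):

<?php
// Keep PHP errors out of HTTP responses; log them instead.
ini_set( 'display_errors', '0' );  // equivalent to display_errors = Off in php.ini
ini_set( 'log_errors', '1' );      // assumption: errors still wanted somewhere

Leaving display_errors on in production leaks paths and stack details to visitors, which is why it gets switched off even when the underlying bug is elsewhere (here, bad permissions on the tmp directory).
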
01:39 <rainman-sr> turned highlighting back on for en/de/fr, turned off interwiki search on smaller wikis ... we need more servers to cope with the increase in traffic on large wikis [production]
01:11 <atglenn> disabled search2 from lvs3 pybal config at rainman's request (it had load 21) [production]
01:01 <rainman-sr> could someone please remove search2 from the lvs3 search group ASAP [production]
00:15 <andrew> synchronized php-1.5/extensions/LiquidThreads/pages/SpecialNewMessages.php 'Deploy r58176' [production]

2009-10-26

20:45 <Andrew> scapping to update LiquidThreads [production]
20:16 <Andrew> Going to update LiquidThreads to trunk state in a few minutes [production]
16:08 <rainman-sr> overloads all around, turned off en/de/fr wiki highlighting so that searches don't time out [production]
11:10 <hcatlin> reworked mobile1's config so that it's more standardized and more of the config is in the repo [production]
08:53 <domas> updated nagios to reflect changed server roles [production]
08:43 <domas> dewiki is now a separate cluster, s5; replication switchover done at http://p.defau.lt/?kfvvlNOc4TkJ_6SCAVe6mg [production]
08:42 <midom> synchronized php-1.5/wmf-config/CommonSettings.php 'dewiki readwrite' [production]
08:40 <midom> synchronized php-1.5/wmf-config/db.php 'restructuring s2dewiki into s5' [production]
08:38 <midom> synchronized php-1.5/wmf-config/CommonSettings.php 'dewiki read-only' [production]
07:57 <midom> synchronized php-1.5/wmf-config/db.php 'entirely separating dewiki slaves' [production]
06:54 <midom> synchronized php-1.5/wmf-config/db.php 'taking out db4 for copy to db23' [production]
05:45 <midom> synchronized php-1.5/wmf-config/db.php [production]
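
Read bottom-up, the dewiki entries above trace a standard read-only switchover: pull db4 out for a copy to the new master db23, split the dewiki slaves off, take the wiki read-only, repoint it at the new s5 section, and re-enable writes. A minimal sketch of the wmf-config side, with illustrative array names and an invented slave entry (the real file isn't quoted in the log):

<?php
// wmf-config/db.php (sketch): dewiki leaves s2 and gets its own section.
$sectionsByDB['dewiki'] = 's5';  // previously served by s2
$sectionLoads['s5'] = array(
	'db23' => 0,                 // new master, copied from db4
	'db24' => 200,               // illustrative slave weight
);

// wmf-config/CommonSettings.php (sketch): bracket the switchover.
if ( $wgDBname === 'dewiki' ) {
	$wgReadOnly = 'Moving dewiki to its own database cluster; back in a few minutes.';
}
// Remove this block again (the 08:42 'dewiki readwrite' sync) once the
// new section is live and replication has caught up.

The read-only window here was about four minutes (08:38 to 08:42), which is the point of doing the slave separation and the copy beforehand.
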
2009-10-25

15:23 <domas> converting Usability Initiative tables to InnoDB... [production]
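
Assuming these are the UsabilityInitiative extension's tracking tables (the log doesn't name them), the conversion is a per-table ALTER. A sketch, with guessed table names and placeholder credentials:

<?php
// Convert each table's storage engine; ALTER TABLE rebuilds the table and
// blocks writes to it while running, hence one table at a time.
$db = new mysqli( 'db-host.example', 'wikiadmin', 'secret', 'enwiki' );
foreach ( array( 'click_tracking', 'prefswitch_survey' ) as $table ) {  // guessed names
	$db->query( "ALTER TABLE $table ENGINE=InnoDB" );
}

InnoDB trades some raw write speed for row-level locking and crash recovery, which matters for tables taking constant inserts from live traffic.
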
13:23 <domas> set up snapshot rotation on db10 [production]
12:36 <hcatlin> mobile1: created init.d/cluster to correct the USR1 signal problem, fully updated the sysops documentation on wikitech [production]
12:03 <domas> Mark, I'm sure you'll like that! ;-p~ [production]
12:02 <domas> started sq43 without /dev/sdd COSS store (manual conf hack) [production]
11:54 <domas> removed ns3 from nagios, added ns1 [production]
11:45 <domas> bounced ns1 too; it was affected by the selective-answer leak ages ago (same count as ns0, btw: 507!), just not noticed by nagios. This seems to resolve some slowness I'd noticed a few times. [production]
11:41 <domas> bounced pdns on ns0, was affected by the selective-answer leak [production]

2009-10-23

23:37 <tstarling> synchronized wmf-deployment/cache/trusted-xff.cdb [production]
23:31 <tstarling> synchronized wmf-deployment/cache/trusted-xff.cdb [production]
23:24 <Tim> updating TrustedXFF (Bolt browser) [production]
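
TrustedXFF ships its list of trusted proxies (here, the Bolt mobile browser's proxies) as a cdb file so lookups stay cheap. A rough sketch of how a lookup against the synced file could work; the path and key format are assumptions, not the extension's actual schema:

<?php
// Check whether a hop in the X-Forwarded-For chain is a known trusted proxy.
function isTrustedProxy( $ip ) {
	$db = dba_open( '/path/to/trusted-xff.cdb', 'r', 'cdb' );  // placeholder path
	if ( !$db ) {
		return false;  // fail closed: treat the hop as untrusted
	}
	$trusted = dba_exists( $ip, $db );
	dba_close( $db );
	return $trusted;
}

Trusting a proxy's XFF header means the wiki attributes edits (and blocks) to the client IP behind it rather than to the proxy itself, which is why the list is curated by hand.
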
22:36 <domas> db28 has multiple fan failures (the LOM is finally able to do something :) - still needs datacenter ops [production]
22:20 <domas> db28 is toast, needs a cold restart by datacenter ops; the LOM is not able to do anything [production]
22:20 <midom> synchronized php-1.5/wmf-config/db.php 'db28 dead' [production]
11:17 <domas> Fixed the skip-list of cached query pages; it had been broken for the past two months :) [production]
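
If the "skip-list" is MediaWiki's $wgDisableQueryPageUpdate (a guess; the log doesn't say), it names expensive special pages whose cached results the updateSpecialPages.php maintenance run should leave alone:

<?php
// Guessed configuration shape; the real list on the cluster isn't in the log.
$wgDisableQueryPageUpdate = array(
	'Mostlinked',      // illustrative entries: query pages too expensive
	'Mostcategories',  // to regenerate on the regular schedule
);

With the list broken, either expensive pages were being regenerated anyway or skipped pages were going stale, which fits a breakage that could sit unnoticed for two months.
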
10:54 <midom> synchronized php-1.5/thumb.php 'removing livehack' [production]
10:52 <domas> rotating logs becomes difficult once they get too big, so they just keep growing indefinitely! db20's / is nearly full, loooots of /var/log/remote ;-) [production]
10:39 <domas> who watches the watchers? :) the rrdtool process on spence was using 8G of memory :-)))) [production]
10:24 <domas> semaphore leaks made some apaches fail; a failed apache in the rendering farm was not depooled, so the 404 handler served plenty of "can't connect to host" broken thumbs [production]
10:13 <domas> apparently there are intermittent connection failures from ms4 to the scalers [production]
09:56 <midom> synchronized php-1.5/thumb.php 'error header livehack' [production]
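
The 09:56 livehack (reverted at 10:54 once the ms4/scaler trouble was under control) presumably made backend failures in thumb.php answer with a real HTTP error instead of a 200 response carrying a broken image body. A hypothetical sketch of that shape; none of this is the actual livehacked code:

<?php
// Surface a scaler/storage failure as an HTTP error so caches and
// clients don't treat the error text as a valid thumbnail.
function thumbError( $msg ) {
	header( 'HTTP/1.1 500 Internal Server Error' );
	header( 'Content-Type: text/plain' );
	echo $msg;
	exit;
}
// e.g. thumbError( "Couldn't connect to image store" );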