2009-04-21
16:10 |
<root> |
synchronized php-1.5/mc-pmtpa.php 'swapping out down servers for active ones' |
[production] |
16:09 |
<root> |
synchronized php-1.5/mc-pmtpa.php 'swapping out down servers for active ones' |
[production] |
16:01 |
<Rob> |
srv137 went read-only; depooled in pybal for apache and rebooting. |
[production] |
15:57 |
<root> |
synchronized php-1.5/mc-pmtpa.php 'swapping out down servers for active ones' |
[production] |
14:34 |
<Andrew> |
rebuildTemplates.php appeared not to help, same problem as before (stopped after a few wikis). Possibly a dodgy memcache node. |
[production] |
14:32 |
<Andrew> |
ran rebuildTemplates.php on metawiki due to reports of <messagename> appearing in place of the central notice. |
[production] |
05:04 |
<Andrew> |
Live-merged r49685, a fix for unsuppressing usernames on unblock: some usernames were left suppressed if they were unblocked while the block had suppressed their username. |
[production] |
05:03 |
<andrew> |
synchronized php-1.5/includes/specials/SpecialBlockip.php |
[production] |
05:03 |
<andrew> |
synchronized php-1.5/includes/specials/SpecialIpblocklist.php |
[production] |
01:34 |
<azafred> |
Made some improvements to spam handling. Bayesian filtering is now in play and can learn from everybody what is spam and what is ham. Documentation to follow. |
[production] |
2009-04-20
19:59 |
<Rob> |
Powering down srv67, srv85, srv88, srv90 due to temp warnings and bad fans. |
[production] |
19:36 |
<Rob> |
updated mc-pmtpa.php to mark memcached servers as down or spare as appropriate (lots more spares now). |
[production] |
17:35 |
<azafred> |
restarted apache on srv217 |
[production] |
17:34 |
<azafred> |
srv125 reinstall completed. |
[production] |
17:24 |
<Rob> |
srv146 back online |
[production] |
17:10 |
<Rob> |
srv131 back up, updated and synced. |
[production] |
16:52 |
<azafred> |
srv118 reinstall completed. |
[production] |
16:52 |
<Rob> |
srv127 back online and synced. |
[production] |
16:41 |
<Rob> |
srv125 reinstalled, passing off to fred |
[production] |
16:40 |
<Rob> |
replaced dead disk in sq26 |
[production] |
16:31 |
<Rob> |
shutting down sq26 to replace bad hdd |
[production] |
16:27 |
<Rob> |
reinstalling srv125 |
[production] |
16:13 |
<azafred> |
finished re-install of srv63. |
[production] |
16:11 |
<Rob> |
reinstalled srv118, handed off to fred for completion |
[production] |
16:02 |
<Rob> |
restarted srv118 and reinstalled it |
[production] |
15:57 |
<Rob> |
restarted a locked-up srv110 and synced it. |
[production] |
15:49 |
<Rob> |
srv81 locked up; fixed, synced and back online. |
[production] |
15:29 |
<Rob> |
replaced fan and drive in srv63, reinstalling |
[production] |
14:36 |
<Rob> |
memory replaced in srv203, back online. |
[production] |
14:11 |
<Rob> |
shutting down srv203 to swap out bad memory |
[production] |
05:12 |
<Tim> |
fixed memcached on srv75, stopped old ES slave on srv102, srv106, srv107, srv159, srv171 |
[production] |
2009-04-17
22:49 |
<brion> |
regenerated centralnotice output again... this time ok |
[production] |
22:48 |
<brion> |
srv93 and srv107 memcached nodes are running but broken. restarting them... |
[production] |
22:43 |
<brion> |
restarted srv82 memcache node. attempting to rebuild centralnotices... |
[production] |
22:41 |
<brion> |
bad memcached node srv82 |
[production] |
22:05 |
<mark> |
Set up 3 new pywikipedia mailing lists, redirected svn commit output to one of them |
[production] |
19:38 |
<robh> |
synchronized php-1.5/InitialiseSettings.php 'Bug 18494 Logo for ln.wiki' |
[production] |
17:22 |
<Rob> |
removed wikimedia.se from our nameservers as they are using their own. |
[production] |
16:48 |
<azafred> |
updated spamassassin rules on lily to include the SARE rules and mirror the settings on McHenry. |
[production] |
10:25 |
<tstarling> |
synchronized robots.txt |
[production] |
08:19 |
<tstarling> |
synchronized php-1.5/InitialiseSettings.php |
[production] |
07:13 |
<Tim> |
temporarily killed apache on overloaded ES masters |
[production] |
07:11 |
<tstarling> |
synchronized php-1.5/db.php 'zeroing read load on ES masters' |
[production] |
06:04 |
<Tim> |
brief site-wide outage while db20 rebooted, reason unknown. All good now. Resuming logrotate. |
[production] |
05:55 |
<Tim> |
db20 h/w reboot |
[production] |
05:48 |
<Tim> |
shutting down daemons on db20 for pre-emptive reboot. Serial console shows "BUG: soft lockup - CPU#4 stuck for 11s! [rsync:27854]" etc. |
[production] |
05:10 |
<Tim> |
on db20: killed logrotate -f halfway through due to alarming kswapd CPU usage (linked to deadlocked rsync processes). May need a reboot. |
[production] |