2009-04-23
|
14:31 |
<Tim> |
merged r49051 |
[production] |
14:13 |
<Tim> |
fixed nagios labels for esams backup ext store, erroneously labelled as "toolserver" |
[production] |
06:27 |
<Tim> |
restarted all job runners, ES connection errors weren't killing them |
[production] |
05:43 |
<Tim> |
shutting down mysql on all fedora ES servers. Will update documentation and node lists to indicate that this is permanent. |
[production] |
05:37 |
<Tim> |
srv217 did not come up from a soft reboot, but power cycle worked. Before reboot, observed apache2 hanging indefinitely on nanosleep(), but couldn't reproduce a timer issue in other processes. An NFS mount was hanging on stat. |
[production] |
05:13 |
<Tim> |
rebooting srv217 |
[production] |
04:41 |
<Tim> |
srv217 is hanging on various operations, investigating. Trying to shut down its apache. |
[production] |
04:35 |
<tstarling> |
synchronized php-1.5/db.php |
[production] |
04:31 |
<Tim> |
copy done, started cluster18 mysql instance on ms3 using srv104 snapshot, repooled it |
[production] |
02:07 |
<tstarling> |
synchronized php-1.5/InitialiseSettings.php |
[production] |
01:57 |
<Tim> |
relaxed wgAccountCreationThrottle on frwiki, presumably the 2006 vandal emergency is over. Disabled it on idwiki for workshop event. |
[production] |
01:45 |
<Tim> |
copying srv104's data from ms3 to ms2 |
[production] |
01:11 |
<Tim> |
started mysql on srv104 |
[production] |
2009-04-22
|
21:44 |
<tomaszf> |
db9 is back up. excessive tmpfs file systems removed |
[production] |
21:39 |
<tomaszf> |
taking outage on db9 to remove tmpfs file systems |
[production] |
11:34 |
<JeLuF> |
initiated reboot of srv137. dmesg shows no usable information any more. |
[production] |
11:30 |
<JeLuF> |
srv137 has read-only filesystem. Stopped Apache. |
[production] |
06:03 |
<andrew> |
synchronized php-1.5/includes/specials/SpecialBlockip.php 'Live-merged r49730, typo causing failures in user hiding' |
[production] |
06:02 |
<Andrew> |
srv137 still seems read-only, srv137: rsync: mkstemp "/apache/common/php-1.5/includes/specials/.SpecialBlockip.php.1QkrKX" failed: Read-only file system (30) |
[production] |
03:14 |
<Tim> |
copying ES data from srv104 to ms3 using nc tarpipe |
[production] |
03:10 |
<tstarling> |
synchronized php-1.5/db.php 'depooling srv104 ES' |
[production] |
03:03 |
<Tim> |
corruption found on cluster18, the copy source server (srv106) is missing lots of rows. Switched back to srv105/104. |
[production] |
03:02 |
<tstarling> |
synchronized php-1.5/db.php |
[production] |
02:50 |
<tstarling> |
synchronized php-1.5/includes/Revision.php 'reverted profiling and logging hacks' |
[production] |
02:40 |
<Tim> |
depooled ms2 ex-fedora instances and shut them down, it can be a backup for now |
[production] |
02:38 |
<tstarling> |
synchronized php-1.5/db.php |
[production] |
02:33 |
<Tim> |
deployed the new ms2/ms3 ex-fedora ES configuration |
[production] |
02:32 |
<tstarling> |
synchronized php-1.5/db.php |
[production] |
02:01 |
<Tim> |
set up ex-fedora mysql instances on both ms2 and ms3, controlled with /etc/init.d/mysql-ex-fedora |
[production] |
01:04 |
<Tim> |
changed the main mysql instance on ms3 (rc1) to bind to a single IP address instead of * |
[production] |
2009-04-21
|
19:41 |
<mark> |
Added grosley.wikimedia.org to local_domains list on grosley's exim.conf, and added appropriate aliases in /etc/aliases |
[production] |
16:35 |
<Andrew> |
Re-ran rebuildTemplates.php, all seems well now |
[production] |
16:30 |
<robh> |
synchronized php-1.5/mc-pmtpa.php 'syncing for fred' |
[production] |
16:30 |
<root> |
synchronized php-1.5/mc-pmtpa.php 'swapping out srv88 for srv159 and srv90 for srv198' |
[production] |
16:29 |
<andrew> |
synchronized php-1.5/mc-pmtpa.php 'Switched srv88 for srv159, srv90 for srv198 to fix down memcache nodes' |
[production] |
16:18 |
<azafred> |
restarted memcached on srv96. Now responding. |
[production] |
16:14 |
<Rob> |
Fred needs to start logging in as Fred and not as root, bad Fred (see, it wasn't me this time, bwahahahahahaa) |
[production] |
16:11 |
<Andrew> |
Fred fixed up some memcached nodes, but no joy with rebuildTemplates |
[production] |
16:10 |
<root> |
synchronized php-1.5/mc-pmtpa.php 'swapping out down servers for active ones' |
[production] |
16:09 |
<root> |
synchronized php-1.5/mc-pmtpa.php 'swapping out down servers for active ones' |
[production] |
16:01 |
<Rob> |
srv137 read only, depooled in pybal for apache and rebooting. |
[production] |
15:57 |
<root> |
synchronized php-1.5/mc-pmtpa.php 'swapping out down servers for active ones' |
[production] |
14:34 |
<Andrew> |
rebuildTemplates.php appeared not to help, same problem as before (stopped after a few wikis). Possibly a dodgy memcache node. |
[production] |
14:32 |
<Andrew> |
ran rebuildTemplates.php metawiki due to reports of <messagename> appearing in place of the central notice. |
[production] |
05:04 |
<Andrew> |
Live-merged r49685, fix for unsuppression of usernames on unblock -- some usernames were left stuck suppressed if they were unblocked when the block suppressed their username |
[production] |
05:03 |
<andrew> |
synchronized php-1.5/includes/specials/SpecialBlockip.php |
[production] |
05:03 |
<andrew> |
synchronized php-1.5/includes/specials/SpecialIpblocklist.php |
[production] |
01:34 |
<azafred> |
Made some improvements to spam handling. Bayes is in play and can learn from everybody what is spam and what is ham. Documentation to follow. |
[production] |