2009-04-24
03:35 | <andrew> | synchronized php-1.5/InitialiseSettings.php | [production]
03:35 | <Andrew> | Deployed AbuseFilter to fiwiki | [production]
02:51 | <tstarling> | synchronized php-1.5/mc-pmtpa.php | [production]
02:46 | <Tim> | srv127 has corrupted root partition, needs reinstall or repair. Shut down with echo o > /proc/sysrq-trigger. | [production]
02:36 | <tstarling> | synchronized php-1.5/mc-pmtpa.php | [production]
02:31 | <Tim> | killed srv124 with /proc/sysrq-trigger. Was very slow on ssh and was giving odd 403 errors via HTTP. | [production]
02:21 | <tstarling> | synchronized php-1.5/README | [production]
02:12 | <andrew> | synchronized php-1.5/CommonSettings.php | [production]
02:10 | <Andrew> | srv127: rsync: mkstemp "/apache/common/php-1.5/.CommonSettings.php.TRNqkG" failed: Read-only file system (30) | [production]
01:15 | <tstarling> | synchronized php-1.5/db.php | [production]
01:14 | <Tim> | depooled db3 so that it can finish doing the querycache update without making lots of people wait for a MASTER_POS_WAIT | [production]
01:03 | <tstarling> | synchronized php-1.5/InitialiseSettings.php | [production]
01:03 | <Tim> | blacklisted Wantedtemplates on enwiki, has been running for more than a day. | [production]
00:54 | <Tim> | restarting trackBlobs.php on hume for afwiki and enwiki | [production]
2009-04-23
19:05 | <brion> | donate.wikipedia.org redirect borked, going to civicrm instead of public donation pages. server config needs updating | [production]
16:54 | <brion> | db3 was lagging a bit; 403s a few minutes ago. catching up nicely now | [production]
14:46 | <robh> | synchronized php-1.5/InitialiseSettings.php 'Added namespaces to huwikisource per bug 18557' | [production]
14:41 | <tstarling> | synchronized php-1.5/includes/specials/SpecialUpload.php | [production]
14:39 | <Tim> | merged r49775 | [production]
14:32 | <tstarling> | synchronized php-1.5/includes/specials/SpecialUpload.php | [production]
14:31 | <Tim> | merged r49051 | [production]
14:13 | <Tim> | fixed nagios labels for esams backup ext store, erroneously labelled as "toolserver" | [production]
06:27 | <Tim> | restarted all job runners, ES connection errors weren't killing them | [production]
05:43 | <Tim> | shutting down mysql on all fedora ES servers. Will update documentation and node lists to indicate that this is permanent. | [production]
05:37 | <Tim> | srv217 did not come up from a soft reboot, but power cycle worked. Before reboot, observed apache2 hanging indefinitely on nanosleep(), but couldn't reproduce a timer issue in other processes. An NFS mount was hanging on stat. | [production]
05:13 | <Tim> | rebooting srv217 | [production]
04:41 | <Tim> | srv217 is hanging on various operations, investigating. Trying to shut down its apache. | [production]
04:35 | <tstarling> | synchronized php-1.5/db.php | [production]
04:31 | <Tim> | copy done, started cluster18 mysql instance on ms3 using srv104 snapshot, repooled it | [production]
02:07 | <tstarling> | synchronized php-1.5/InitialiseSettings.php | [production]
01:57 | <Tim> | relaxed wgAccountCreationThrottle on frwiki, presumably the 2006 vandal emergency is over. Disabled it on idwiki for workshop event. | [production]
01:45 | <Tim> | copying srv104's data from ms3 to ms2 | [production]
01:11 | <Tim> | started mysql on srv104 | [production]
2009-04-22
21:44 | <tomaszf> | db9 is back up. excessive tmpfs file systems removed | [production]
21:39 | <tomaszf> | taking outage on db9 to remove tmpfs file systems | [production]
11:34 | <JeLuF> | initiated reboot of srv137. dmesg shows no usable information any more. | [production]
11:30 | <JeLuF> | srv137 has read-only filesystem. Stopped Apache. | [production]
06:03 | <andrew> | synchronized php-1.5/includes/specials/SpecialBlockip.php 'Live-merged r49730, typo causing failures in user hiding' | [production]
06:02 | <Andrew> | srv137 still seems read-only, srv137: rsync: mkstemp "/apache/common/php-1.5/includes/specials/.SpecialBlockip.php.1QkrKX" failed: Read-only file system (30) | [production]
03:14 | <Tim> | copying ES data from srv104 to ms3 using nc tarpipe | [production]
03:10 | <tstarling> | synchronized php-1.5/db.php 'depooling srv104 ES' | [production]
03:03 | <Tim> | corruption found on cluster18, the copy source server (srv106) is missing lots of rows. Switched back to srv105/104. | [production]
03:02 | <tstarling> | synchronized php-1.5/db.php | [production]
02:50 | <tstarling> | synchronized php-1.5/includes/Revision.php 'reverted profiling and logging hacks' | [production]
02:40 | <Tim> | depooled ms2 ex-fedora instances and shut them down, it can be a backup for now | [production]
02:38 | <tstarling> | synchronized php-1.5/db.php | [production]
02:33 | <Tim> | deployed the new ms2/ms3 ex-fedora ES configuration | [production]
02:32 | <tstarling> | synchronized php-1.5/db.php | [production]
02:01 | <Tim> | set up ex-fedora mysql instances on both ms2 and ms3, controlled with /etc/init.d/mysql-ex-fedora | [production]
01:04 | <Tim> | changed the main mysql instance on ms3 (rc1) to bind to a single IP address instead of * | [production]