2009-12-04
20:13 |
<root> |
ran sync-common-all |
[production] |
20:12 |
<Rob> |
running sync-common-all to update configuration for support of flaggedrevs on plwiktionary |
[production] |
19:20 |
<Rob> |
srv144 removed from node groups & pybal, nagios resynced. |
[production] |
19:19 |
<Rob> |
srv144 is out of warranty and rebooting randomly, decommissioning. |
[production] |
19:05 |
<Fred> |
finished setup of srv245. |
[production] |
19:02 |
<Rob> |
srv126 removed from node groups and lvs. nagios restarted to exclude it. |
[production] |
19:01 |
<Rob> |
srv126 will not even POST when benched; out of warranty, slating for immediate decommissioning |
[production] |
19:00 |
<Rob> |
srv144 reinstalling with a single hard disk, no more RAID1 |
[production] |
18:50 |
<Rob> |
swapped primary srv144 drive with old decommissioned spare. reinstalling OS, will reinstall packages and get online later. |
[production] |
18:45 |
<Rob> |
sq22 back online, all drives nominal, rebuilding cache and ensuring it is in rotation |
[production] |
18:41 |
<Rob> |
rebooted sq22 |
[production] |
18:38 |
<Rob> |
rebooted srv144 and srv126 |
[production] |
18:36 |
<Rob> |
srv245 package install failed. I do not have time to tinker with it while in the DC, I have other things that require my physical access to the machines. Leaving it alone for now to work on remotely. |
[production] |
18:28 |
<Rob> |
srv245 OS installed, setting up wikimedia-task-appserver |
[production] |
18:06 |
<Rob> |
srv245 was sitting idle with no OS, depooled from apaches. reinstalling system. |
[production] |
17:57 |
<Rob> |
rebooted srv83 per fred |
[production] |
17:35 |
<Fred> |
removed srv83 from the nodelist since it was causing ddsh to never finish executing. |
[production] |
17:26 |
<Fred> |
fixed broken apache. A machine appears to be down and is preventing sync-file from finishing normally; looking into it. |
[production] |
16:50 |
<rainman-sr> |
stopped logging of search queries on searchidx1 until someone sets up proper log archiving to a different machine |
[production] |
16:48 |
<rainman-sr> |
searchidx1 had a full disk; freed about 100 GB of space by deleting logs and other files lying around |
[production] |
16:14 |
<Rob> |
srv245 down and unresponsive, rebooting |
[production] |
16:12 |
<Rob> |
sq43's replacement disk is also bad (talk about bad luck); placing an RMA with Dell. System will remain powered down for now. |
[production] |
15:55 |
<Rob> |
sq43 isn't detecting the replacement disk, rebooting and troubleshooting |
[production] |
15:33 |
<domas> |
'arcconf setcache 1 logicaldrive 0 roff ' - disabling any read caching on db11-db30 RAIDs |
[production] |
15:13 |
<Rob> |
after troubleshooting with domas, it appears the rebuild is indeed automatic. db21 rebuilding its RAID array |
[production] |
15:09 |
<Rob> |
db21 bad disk swapped out, rebuild should be automatic |
[production] |
14:57 |
<Rob> |
sq14 back up, rebuilding its cache |
[production] |
14:54 |
<Rob> |
sq13 primary disk dead, out of warranty |
[production] |
14:53 |
<Rob> |
swapping sdc in sq13 and sq14 to bring sq14 back online |
[production] |
14:53 |
<Rob> |
sq14 disk sdc dead, out of warranty. |
[production] |
05:18 |
<Tim> |
on fenari: running all pending renameUser jobs from enwiki |
[production] |
03:37 |
<Tim> |
Around 03:12, accidentally renamed enwiki's job table and renamed it back a second later. This caused all slaves to stop due to a replication bug. Fixed now. |
[production] |
03:25 |
<Tim> |
testing fixJobQueueExplosion.php on commonswiki |
[production] |
02:46 |
<Tim> |
srv156 not responding to ssh, trying reboot |
[production] |
01:13 |
<Tim> |
restarting job runners |
[production] |
01:13 |
<tstarling> |
synchronized php-1.5/includes/HTMLCacheUpdate.php 'patching out all category backlink updates, major bug causing job queue to stall' |
[production] |
00:12 |
<Tim> |
granted access to root@fenari on all servers in the mysql node group |
[production] |
2009-12-03
23:46 |
<catrope> |
synchronized php-1.5/wmf-config/InitialiseSettings.php 'Allow bcrats to add and remove new arbcom group on nlwiki' |
[production] |
23:40 |
<RoanKattouw> |
Synced InitialiseSettings.php: allow bcrats to add and remove new arbcom group on nlwiki |
[production] |
22:49 |
<RoanKattouw> |
Importing 365 images into Commons as User:GeographBot, requested by Multichill |
[production] |
22:39 |
<RoanKattouw> |
Synced InitialiseSettings.php for bug 21238: self-removal of flood flag on plwiki |
[production] |
22:33 |
<RoanKattouw> |
Synced InitialiseSettings.php for bugs 20775 and 21719. sync-file is stalling on what seems to be an unresponsive server |
[production] |
21:35 |
<RoanKattouw> |
Running namespaceDupes on usabilitywiki for bug 21753 |
[production] |
21:35 |
<RoanKattouw> |
catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21753 Fix Multimedia talk NS on usabilitywiki' |
[production] |
04:20 |
<tfinc> |
synchronized php-1.5/extensions/ContributionReporting/ContributionTrackingStatistics_body.php 'fixing conversion rate bugs' |
[production] |