2009-12-04
§
|
17:35 |
<Fred> |
removed srv83 from the nodelist since it was causing ddsh to never finish executing. |
[production] |
17:26 |
<Fred> |
fixed broken apache. Seems like there is a machine down that is preventing normal sync-file from finishing... Looking into it. |
[production] |
16:50 |
<rainman-sr> |
stopped logging of search queries on searchidx1 until someone sets up proper log archiving to a different machine |
[production] |
16:48 |
<rainman-sr> |
searchidx1 had a full disk; freed about 100 GB of space by deleting logs and other files lying around |
[production] |
16:14 |
<Rob> |
srv245 down and unresponsive, rebooting |
[production] |
16:12 |
<Rob> |
sq43's replacement disk is also bad (talk about bad luck), placing an RMA with Dell. System will remain powered down for now. |
[production] |
15:55 |
<Rob> |
sq43 isn't seeing a replaced disk, rebooting and troubleshooting |
[production] |
15:33 |
<domas> |
'arcconf setcache 1 logicaldrive 0 roff' - disabling any read caching on db11-db30 RAIDs |
[production] |
15:13 |
<Rob> |
after tinkering with it with domas, it appears rebuild is indeed automatic. db21 rebuilding raid array |
[production] |
15:09 |
<Rob> |
db21 bad disk swapped out, rebuild should be automatic |
[production] |
14:57 |
<Rob> |
sq14 back up, rebuilding its cache |
[production] |
14:54 |
<Rob> |
sq13 primary disk dead, out of warranty |
[production] |
14:53 |
<Rob> |
swapping sdc in sq13 and sq14 to bring sq14 back online |
[production] |
14:53 |
<Rob> |
sq14 disk sdc dead, out of warranty. |
[production] |
05:18 |
<Tim> |
on fenari: running all pending renameUser jobs from enwiki |
[production] |
03:37 |
<Tim> |
Around 03:12, accidentally renamed enwiki's job table and so renamed it back a second later. This caused all slaves to stop due to a replication bug. Fixed now. |
[production] |
03:25 |
<Tim> |
testing fixJobQueueExplosion.php on commonswiki |
[production] |
02:46 |
<Tim> |
srv156 not responding to ssh, trying reboot |
[production] |
01:13 |
<Tim> |
restarting job runners |
[production] |
01:13 |
<tstarling> |
synchronized php-1.5/includes/HTMLCacheUpdate.php 'patching out all category backlink updates, major bug causing job queue to stall' |
[production] |
00:12 |
<Tim> |
granted access to root@fenari on all servers in the mysql node group |
[production] |
2009-12-03
§
|
23:46 |
<catrope> |
synchronized php-1.5/wmf-config/InitialiseSettings.php 'Allow bcrats to add and remove new arbcom group on nlwiki' |
[production] |
23:40 |
<RoanKattouw> |
Synced InitialiseSettings.php: allow bcrats to add and remove new arbcom group on nlwiki |
[production] |
22:49 |
<RoanKattouw> |
Importing 365 images into Commons as User:GeographBot, requested by Multichill |
[production] |
22:39 |
<RoanKattouw> |
Synced InitialiseSettings.php for bug 21238: self-removal of flood flag on plwiki |
[production] |
22:33 |
<RoanKattouw> |
Synced InitialiseSettings.php for bugs 20775 and 21719. sync-file is stalling on what seems to be an unresponsive server |
[production] |
21:35 |
<RoanKattouw> |
Running namespaceDupes on usabilitywiki for bug 21753 |
[production] |
21:35 |
<RoanKattouw> |
catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21753 Fix Multimedia talk NS on usabilitywiki' |
[production] |
04:20 |
<tfinc> |
synchronized php-1.5/extensions/ContributionReporting/ContributionTrackingStatistics_body.php 'fixing conversion rate bugs' |
[production] |
2009-12-02
§
|
23:28 |
<midom> |
synchronized php-1.5/wmf-config/db.php 'reenabling db18 and db25, also, attempting to overwrite stale db.php copies' |
[production] |
23:25 |
<Fred> |
massaged mc.php to retrieve working spare, and remove broken memcached nodes. all is now good in the land of memcache |
[production] |
22:13 |
<mark> |
Recovered torrus from deadlock |
[production] |
21:00 |
<Fred> |
rebooted srv194 (hung) |
[production] |
20:48 |
<Rob> |
removed bayle and khaldun from dsh, both are in rack running wipe with network pulled |
[production] |
20:38 |
<Fred> |
bart removed from nagios (well that sounds funny) |
[production] |
20:36 |
<Rob> |
khaldun is down forever! decommissioned and running wipe in rack with the network pulled |
[production] |
20:35 |
<Rob> |
isidore rebooted by accident due to power cable issues |
[production] |
20:21 |
<Rob> |
srv136 crashed with temp warnings, going to decommission it, rebooting to wipe and remove network |
[production] |
20:15 |
<Rob> |
bart decommissioned, unracked, wipe running on testbench with usbcdrom |
[production] |
19:49 |
<Rob> |
decommissioned, unracked srv66, srv51, srv81, srv118 (previously removed from pybal) |
[production] |
19:39 |
<Rob> |
decommissioned srv130, unracked |
[production] |
19:20 |
<Rob> |
srv122 decommissioned, wiped, unracked |
[production] |
18:19 |
<Rob> |
ms7/ms8 racked in sdtpa a2, network wired, dns setup, racktables updated, & LOM online |
[production] |
18:18 |
<Rob> |
serial connection to ps1-a4-sdtpa returned to normal |
[production] |
18:05 |
<Rob> |
ps1-a4-sdtpa temporarily losing its serial connection, stealing adapter to set up ms7/8 |
[production] |
18:04 |
<Rob> |
added ms7/ms8 to dns for wmnet and mgmt nics |
[production] |
16:20 |
<andrew> |
synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Deploy r59665' |
[production] |
16:19 |
<andrew> |
synchronized php-1.5/extensions/LiquidThreads_alpha/jquery/js2.combined.js 'Deploy r59665' |
[production] |
15:40 |
<Rob> |
rebooted the following per domas request: srv101 srv105 srv112 srv117 srv138 srv183 srv89 srv84 srv89 srv91 srv96 srv98 srv99 |
[production] |
15:02 |
<mark> |
Shutting down mint for installation of a wifi card |
[production] |