2010-08-14
20:15 <mark> Decommissioning srv150 [production]
19:56 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24789 - Enable AbuseFilter for ja.wikipedia' [production]
19:52 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24626 - Add an "autopatrolled" status for frwiktionary' [production]
15:37 <mark> dobson has a failed RAID1 array member, /dev/sda. Running long SMART self test on /dev/sda [production]
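For reference, starting and checking a long self test with smartmontools looks like this; the device path is from the log entry, the rest is stock smartctl usage:

    # start the extended (long) self test; it runs in the background on the drive
    smartctl -t long /dev/sda
    # once it finishes: review the self test log and the overall health verdict
    smartctl -l selftest /dev/sda
    smartctl -H /dev/sda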
14:18 <mark> synchronized php-1.5/wmf-config/db.php 'Add ms2 and ms1 to clusters rc1 and cluster22' [production]
14:06 <mark> FLUSH TABLES WITH READ LOCK on ms1 for testing [production]
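A note on the read lock used above: it blocks writes server-wide, but only for as long as the session that took it stays open, so the test has to keep the client connection alive:

    mysql> FLUSH TABLES WITH READ LOCK;  -- close tables, block writes server-wide
    mysql> UNLOCK TABLES;                -- release once the test is done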
13:59 <mark> Stopping mysql on ms1 as monitoring test [production]
13:59 <mark> Granted SELECT on mysql.* to nagios on ms3 [production]
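The grant behind the 13:59 entry is a single statement; the host mask and password below are placeholders, not the production values:

    # run as a privileged user on ms3; host and password are placeholders
    mysql -e "GRANT SELECT ON mysql.* TO 'nagios'@'monitoring-host.example' IDENTIFIED BY 'placeholder'"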
10:57 <mark> Removed oldest LVM snapshot on ixia [production]
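Dropping the oldest snapshot is a two-step lvm2 operation; the volume group and snapshot names below are hypothetical:

    # list logical volumes with snapshot usage to find the oldest one
    lvs -o lv_name,origin,snap_percent vg0
    # remove the oldest snapshot to reclaim space in the volume group
    lvremove /dev/vg0/oldest-snapshot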
09:43 <mark> Fixed apparmor profile /etc/apparmor.d/usr.sbin.mysqld on ms1, restarted mysql under apparmor [production]
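Reloading a corrected profile and restarting the confined daemon is the standard sequence:

    # replace the in-kernel profile with the corrected one
    apparmor_parser -r /etc/apparmor.d/usr.sbin.mysqld
    # restart mysqld so it comes back up confined by the fixed profile
    /etc/init.d/mysql restart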
09:39 <mark> START SLAVE on ms1, catching up with ms3 [production]
09:38 <mark> RESET SLAVE on db5 [production]
09:37 <mark> STOP SLAVE on db5 [production]
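Read bottom-up, 09:37 through 09:39 is the usual cleanup order: stop replication on db5, discard its relay logs and master coordinates, then let ms1 resume and catch up. A minimal sketch:

    # on db5: stop the replication threads, then clear the slave state
    mysql -e "STOP SLAVE; RESET SLAVE;"
    # on ms1: resume replication and watch the lag drain
    mysql -e "START SLAVE;"
    mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master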
09:35 <mark> Stopped apparmor on ms1 [production]
08:41 <Andrew> Leaving as-is for now, hoping somebody with appropriate permissions can fix it later. [production]
08:40 <Andrew> STOP SLAVE on db5 gives me ERROR 1045 (00000): Access denied for user: 'wikiadmin@208.80.152.%' (Using password: NO) [production]
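The "(Using password: NO)" in that error means the client connected without sending a password at all; separately, STOP SLAVE requires the SUPER privilege in MySQL of this era. Two hedged fixes, the grant being an assumption about what the account lacks:

    # retry with the account's password actually supplied
    mysql -u wikiadmin -p -e "STOP SLAVE"
    # or, as a privileged user, add the missing privilege for the account
    mysql -e "GRANT SUPER ON *.* TO 'wikiadmin'@'208.80.152.%'"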
08:34 <Andrew> Slave is supposedly still running on db5. Assuming Roan didn't stop it when he switched masters a few days ago. Going to text somebody to confirm that stopping it is the correct course of action. [production]
08:24 <Andrew> db5 can't be lagged, it's the master ;-). Obviously something wrong with wfWaitForSlaves. [production]
08:19 <Andrew> db5 lagged 217904 seconds [production]
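217904 seconds is roughly 2.5 days, which fits the master switch "a few days ago" mentioned at 08:34: db5 still carries stale slave state, so lag checks read an ancient Seconds_Behind_Master from it. On a clean master, the check below returns nothing:

    # empty output = no slave state; a huge Seconds_Behind_Master here
    # means leftover replication config from before the master switch
    mysql -e "SHOW SLAVE STATUS\G"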
05:09 <Andrew> Ran thread_pending_relationship and thread_reaction schema changes on all LiquidThreads wikis [production]
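A sketch of how such a schema change fans out across wikis; the database list and patch file names below are hypothetical stand-ins for whatever tooling was actually used:

    # hypothetical: apply both patches to every LiquidThreads wiki database
    for db in $(cat liquidthreads.dblist); do
        mysql "$db" < patch-thread_pending_relationship.sql
        mysql "$db" < patch-thread_reaction.sql
    done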
05:06 <andrew> synchronizing Wikimedia installation... Revision: 70933 [production]
05:04 <Andrew> About to update LiquidThreads production version to the alpha. [production]
2010-08-13
22:03 <mark> API logins on commons (only) are reported broken [production]
21:45 <mark> Set correct $cluster variable for reinstalled knsq* squids [production]
21:03 <mark> Increased cache_mem from 1000 to 2500 on sq33, like the other API backend squids [production]
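cache_mem is the squid directive for the in-memory hot-object cache; a sketch of the bump and reload, with the config path assumed:

    # raise the in-memory object cache from 1000 MB to 2500 MB (path assumed)
    sed -i 's/^cache_mem 1000 MB$/cache_mem 2500 MB/' /etc/squid/squid.conf
    # apply the new setting without a full restart
    squid -k reconfigure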
20:58 <mark> Stopping backend squid on sq33 [production]
20:50 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24769 - Import source addition for tpi.wikipedia.org' [production]
17:46 <Fred> and srv100 [production]
17:45 <Fred> restarted apache on srv219 and srv222 [production]
15:57 <mark> synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list' [production]
15:56 <mark> synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list' [production]
15:53 <RobH> srv146 removed from puppet and nodelists, slated for wipe, decommissioned. [production]
15:47 <mark> Sent srv146 to death using echo b > /proc/sysrq-trigger. It had a read-only filesystem and is therefore decommissioned. [production]
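The sysrq 'b' trigger reboots the box immediately, with no sync and no clean unmount, which is acceptable here since the filesystem was already read-only and the host was headed for decommissioning:

    # make sure the magic sysrq interface is enabled
    echo 1 > /proc/sys/kernel/sysrq
    # 'b' = reboot immediately: no sync, no clean unmount
    echo b > /proc/sysrq-trigger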
15:38 <mark> Restarted backend squid on sq33 [production]
15:36 <mark> synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list' [production]
15:25 <mark> Reinstalled sq32 with Lucid [production]
15:01 <mark> Removed sq86 and sq87 from API LVS pool [production]
14:55 <mark> sq80 had been down for a long time. Brought it back up and synced it [production]
14:54 <rainman-sr> all of the search cluster restored to pre-relocation configuration [production]
14:34 <robh> synchronized php-1.5/wmf-config/lucene.php 'reverting search13 to search11' [production]
13:55 <mark> /dev/sda on sq57 is busted [production]
13:54 <RobH> removed search17 from search_pool_3 [production]
13:50 <mark> Set idleconnection.timeout = 300 (NOT idlecommand.timeout) on all LVS services on lvs3, restarting pybal [production]
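For context, PyBal takes per-service monitor settings from its config on the LVS host. The key name comes straight from the log entry; the file path and restart command below are assumptions:

    # set the IdleConnection monitor timeout in every service stanza (path assumed)
    sed -i 's/^idleconnection\.timeout = .*/idleconnection.timeout = 300/' /etc/pybal/pybal.conf
    /etc/init.d/pybal restart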
13:44 <mark> powercycled sq57, which was stuck in [16538652.048532] BUG: soft lockup - CPU#3 stuck for 61s! [gmond:15746] [production]
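A host wedged in a soft lockup is typically unreachable over SSH, so the power cycle goes through the management controller; a sketch using ipmitool with a hypothetical management address and credentials:

    # hypothetical management hostname, user, and password
    ipmitool -I lanplus -H sq57.mgmt.example -U admin -P placeholder chassis power cycle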
13:42 <mark> sq58 was down for a long long time. Brought it back up and synced it [production]
13:37 <RobH> added search7 back into search_pool_3, kept search17 in as well [production]
13:27 <RobH> changed search_pool_3 back from search7 to search17 since it failed [production]
13:25 <robh> synchronized php-1.5/wmf-config/lucene.php 'Re-enabling LucenePrefixSearch - pushed changes on lvs3 to put search back to normal use' [production]
12:45 <mark> API squid cluster is too flaky for my taste. Converting sq33 into an API backend squid as well [production]
12:40 <mark> Shut down puppet and backend squid on sq32 [production]