2010-08-13
14:54 <rainman-sr> all of the search cluster restored to pre-relocation configuration [production]
14:34 <robh> synchronized php-1.5/wmf-config/lucene.php 'reverting search13 to search11' [production]
13:55 <mark> /dev/sda on sq57 is busted [production]
13:54 <RobH> removed search17 from search_pool_3 [production]
13:50 <mark> Set idleconnection.timeout = 300 (NOT idlecommand.timeout) on all LVS services on lvs3, restarting pybal [production]
13:44 <mark> powercycled sq57, which was stuck in [16538652.048532] BUG: soft lockup - CPU#3 stuck for 61s! [gmond:15746] [production]
13:42 <mark> sq58 was down for a long long time. Brought it back up and synced it [production]
13:37 <RobH> added search7 back into search_pool_3, kept search17 in as well [production]
13:27 <RobH> changed search_pool_3 back from search7 to search17 since it failed [production]
13:25 <robh> synchronized php-1.5/wmf-config/lucene.php 'Re-enabling LucenePrefixSearch - pushed changes on lvs3 to put search back to normal use' [production]
12:45 <mark> API squid cluster is too flaky for my taste. Converting sq33 into an API backend squid as well [production]
12:40 <mark> Shutdown puppet and backend squid on sq32 [production]
11:41 <mark> Corrected the changed hostname for api.svc.pmtpa.wmnet in the text squid config files [production]
11:37 <mark> Temporarily rejecting requests to sq31 backend to give it some breathing room while it's reading its COSS dirs [production]
11:32 <mark> Reinstalled sq31 with Lucid [production]
10:25 <mark> Shutting down backend squid on sq31 to see the load impact [production]
10:18 <mark> Setup backend request statistics for the API on torrus [production]
09:15 <rainman-sr> bringing up search1-12 and doing some initial index warmups [production]
01:54 <RobH> searchidx1, search1-search12 relocated and online, not in cluster until Robert can fix in the morning. The other half will have to move on a different day; 12 hours in the datacenter is long enough. [production]
01:40 <RobH> finished moving searchidx1 and search1-12, bringing them back up now [production]
2010-08-12
23:10 <RobH> shutting down searchidx1, search1-12 for move [production]
22:40 <robh> synchronized php-1.5/wmf-config/lucene.php 'swapped search13 and search18 for migration' [production]
22:37 <robh> synchronized php-1.5/wmf-config/lucene.php 'reverting so search13 and search18 can change roles' [production]
22:22 <robh> synchronized php-1.5/wmf-config/lucene.php 'changes back in place to migrate searchidx1 and search1-10' [production]
22:19 <RobH> puppet updated on all search servers, confirmed all have all three lvs ip addresses [production]
21:55 <mark> Configured puppet to bind all LVS service IPs to all search servers [production]
21:54 <RobH> reverted search_pool changes on lvs [production]
21:54 <robh> synchronized php-1.5/wmf-config/lucene.php 'rolling it back' [production]
21:48 <robh> synchronized php-1.5/wmf-config/lucene.php 'changing settings for migration of searchidx1 and search1-search12' [production]
21:43 <RobH> changing lvs3 search pool settings for server relocations [production]
20:33 <robh> synchronized php-1.5/wmf-config/lucene.php 'commented out wgEnableLucenePrefixSearch for search server relocation' [production]
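The 20:33 sync above and the 13:25 re-enable on 2010-08-13 toggle prefix search around the server move. As a rough illustration only (the actual contents of wmf-config/lucene.php are not part of this log), disabling and restoring the feature would amount to commenting the MediaWiki-style global out and back in:

    <?php
    // Hypothetical sketch of the wmf-config/lucene.php change; not the real file.
    // 20:33 on 2010-08-12: prefix search disabled for the relocation by commenting
    // out the setting.
    #$wgEnableLucenePrefixSearch = true;

    // 13:25 on 2010-08-13: setting restored once LVS and the search pools were
    // back to normal use.
    $wgEnableLucenePrefixSearch = true;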
19:30 <RobH> srv281 reinstall done but not online as puppet has multiple package issues, leaving out of lvs [production]
19:09 <RobH> srv230 is on, but set to false in lvs. Do not push back into rotation until after new memory arrives and is installed tomorrow (rt#69) [production]
18:59 <robh> synchronized php-1.5/wmf-config/mc.php 'updating without srv230' [production]
18:53 <RobH> srv230 coming down for memory testing [production]
18:49 <RobH> set srv230 to false in lvs, need to test memory [production]
18:04 <RobH> reinstalling srv281 [production]
17:59 <RobH> nix that, srv125 was ex-ES; leaving those for now. [production]
17:58 <RobH> pulling srv103 & srv125 for wipe (pulling stuff with temp warnings first) [production]
17:53 <robh> synchronized php-1.5/wmf-config/mc.php 'removed srv103, replacing it with srv244' [production]
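The 17:53 sync replaces a memcached host with a spare. As a rough illustration only (the real wmf-config/mc.php is not reproduced here, and the addresses below are placeholders), the edit would swap one entry in the server list MediaWiki reads, with the replacement dropped into the same slot so the mapping of keys to the remaining servers is disturbed as little as possible:

    <?php
    // Hypothetical sketch of the wmf-config/mc.php change; hosts and ports are
    // illustrative, not the actual production addresses.
    $wgMemCachedServers = array(
        '10.0.2.101:11000',  // srv101
        '10.0.2.102:11000',  // srv102
        '10.0.2.244:11000',  // srv244, taking the slot previously held by srv103
        // ...
    );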
17:47 <RobH> pulling srv95 for wipe [production]
17:38 <RobH> srv110 removed from lvs3 config [production]
17:36 <mark> Removed all apaches up to srv150 from the appserver LVS pool on lvs3 [production]
17:21 <Fred> restarting apache on webservers (220,221,222,224) [production]
16:45 <RobH> wipe running on adler and amane, and they have been removed from puppet and dsh node groups [production]
16:12 <jeluf> synchronized docroot/bits/index.html [production]
15:41 <mark> Setup ports ge-2/0/0 to ge-2/0/20 for search servers on asw-b-sdtpa [production]
15:03 <mark> Shutdown BGP session to AS1257 130.244.6.249 on port 2/5 of br1-knams, preparing for cable move [production]
13:08 <mark> Recovered backend squid on knsq11 [production]
12:53 <mark> Reassembling RAID arrays md0 and md1 on knsq11 [production]