2010-08-13
13:55 <mark> /dev/sda on sq57 is busted [production]
13:54 <RobH> removed search17 from search_pool_3 [production]
13:50 <mark> Set idleconnection.timeout = 300 (NOT idlecommand.timeout) on all LVS services on lvs3, restarting pybal [production]
13:44 <mark> powercycled sq57, which was stuck in [16538652.048532] BUG: soft lockup - CPU#3 stuck for 61s! [gmond:15746] [production]
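(Note: with the kernel wedged in a soft lockup, a power cycle like this is normally issued through the host's out-of-band management interface rather than the OS. A minimal sketch, assuming IPMI access via ipmitool; the management hostname and credentials are hypothetical placeholders:
  # Power-cycle a wedged host over its out-of-band (IPMI) interface.
  # "sq57-mgmt" and the credentials are illustrative, not real values.
  ipmitool -I lanplus -H sq57-mgmt -U admin -P secret chassis power cycle )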
13:42 <mark> sq58 was down for a long long time. Brought it back up and synced it [production]
13:37 <RobH> added search7 back into search_pool_3, kept search17 in as well [production]
13:27 <RobH> changed search_pool_3 back from search7 to search17 since it failed [production]
13:25 <robh> synchronized php-1.5/wmf-config/lucene.php 'Re-enabling LucenePrefixSearch - pushed changes on lvs3 to put search back to normal use' [production]
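(Note: the "synchronized <file> '<comment>'" entries in this log are written automatically when a configuration change is pushed from the deployment host to the application servers. A minimal sketch of the kind of invocation that produces them, assuming a sync-file style deployment helper; the script name and calling convention are an assumption:
  # Push one updated config file to all apaches and log the deployment.
  sync-file php-1.5/wmf-config/lucene.php 'Re-enabling LucenePrefixSearch' )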
12:45 <mark> API squid cluster is too flaky for my taste. Converting sq33 into an API backend squid as well [production]
12:40 <mark> Shutdown puppet and backend squid on sq32 [production]
11:41 <mark> Corrected the changed hostname for api.svc.pmtpa.wmnet in the text squid config files [production]
11:37 <mark> Temporarily rejecting requests to sq31 backend to give it some breathing room while it's reading its COSS dirs [production]
11:32 <mark> Reinstalled sq31 with Lucid [production]
10:25 <mark> Shutting down backend squid on sq31 to see the load impact [production]
10:18 <mark> Setup backend request statistics for the API on torrus [production]
09:15 <rainman-sr> bringing up search1-12 and doing some initial index warmups [production]
01:54 <RobH> searchidx1, search1-search12 relocated and online, not in cluster until Robert can fix in the morning. The other half will have to move on a different day; 12 hours in the datacenter is long enough. [production]
01:40 <RobH> finished moving searchidx1 and search1-12, bringing them back up now [production]
2010-08-12
23:10 <RobH> shutting down searchidx1, search1-12 for move [production]
22:40 <robh> synchronized php-1.5/wmf-config/lucene.php 'swapped search13 and search18 for migration' [production]
22:37 <robh> synchronized php-1.5/wmf-config/lucene.php 'reverting so search13 and search18 can change roles' [production]
22:22 <robh> synchronized php-1.5/wmf-config/lucene.php 'changes back in place to migrate searchidx1 and search1-10' [production]
22:19 <RobH> puppet updated on all search servers, confirmed all have all three lvs ip addresses [production]
21:55 <mark> Configured puppet to bind all LVS service IPs to all search servers [production]
21:54 <RobH> reverted search_pool changes on lvs [production]
21:54 <robh> synchronized php-1.5/wmf-config/lucene.php 'rolling it back' [production]
21:48 <robh> synchronized php-1.5/wmf-config/lucene.php 'changing settings for migration of searchidx1 and search1-search12' [production]
21:43 <RobH> changing lvs3 search pool settings for server relocations [production]
20:33 <robh> synchronized php-1.5/wmf-config/lucene.php 'commented out wgEnableLucenePrefixSearch for search server relocation' [production]
19:30 <RobH> srv281 reinstall done but not online as puppet has multiple package issues, leaving out of lvs [production]
19:09 <RobH> srv230 is on, but set to false in lvs. Do not push back into rotation until after new memory arrives and is installed tomorrow (rt#69) [production]
18:59 <robh> synchronized php-1.5/wmf-config/mc.php 'updating without srv230' [production]
18:53 <RobH> srv230 coming down for memory testing [production]
18:49 <RobH> set srv230 to false in lvs, need to test memory [production]
18:04 <RobH> reinstalling srv281 [production]
17:59 <RobH> nix that, srv125 was ex-es, leaving those for now. [production]
17:58 <RobH> pulling srv103 & srv125 for wipe (pulling stuff with temp warnings first) [production]
17:53 <robh> synchronized php-1.5/wmf-config/mc.php 'removed srv103, replacing it with srv244' [production]
17:47 <RobH> pulling srv95 for wipe [production]
17:38 <RobH> srv110 removed from lvs3 config [production]
17:36 <mark> Removed all apaches up to srv150 from the appserver LVS pool on lvs3 [production]
17:21 <Fred> restarting apache on webservers (220,221,222,224) [production]
16:45 <RobH> wipe running on adler and amane, and they have been removed from puppet and dsh node groups [production]
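(Note: pulling a host from puppet and dsh like this usually means revoking its Puppet certificate on the puppetmaster and dropping it from the dsh node group files used for fleet-wide commands. A minimal sketch; the FQDN and the group file name are hypothetical examples:
  # Revoke the decommissioned host's Puppet certificate (run on the puppetmaster).
  puppetca --clean amane.wikimedia.org
  # Remove it from a dsh node group; the "apaches" group file is illustrative.
  sed -i '/^amane\./d' /etc/dsh/group/apaches )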
16:12 <jeluf> synchronized docroot/bits/index.html [production]
15:41 <mark> Setup ports ge-2/0/0 to ge-2/0/20 for search servers on asw-b-sdtpa [production]
15:03 <mark> Shutdown BGP session to AS1257 130.244.6.249 on port 2/5 of br1-knams, preparing for cable move [production]
13:08 <mark> Recovered backend squid on knsq11 [production]
12:53 <mark> Reassembling RAID arrays md0 and md1 on knsq11 [production]
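(Note: after a drive swap, the arrays are typically reassembled with mdadm from the members' existing superblocks. A minimal sketch; the member partition names below are illustrative, not the actual layout of knsq11:
  # Reassemble both RAID arrays from their existing superblocks.
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
  mdadm --assemble /dev/md1 /dev/sda2 /dev/sdb2
  # Or scan for and assemble all arrays listed in mdadm.conf:
  mdadm --assemble --scan )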
12:40 <mark> Running apt-get upgrade && reboot on amssq31 [production]
11:17 <mark> Shutdown knsq1 and knsq11 for swapping drives [production]