2010-08-14
09:37 <mark> STOP SLAVE on db5 [production]
09:35 <mark> Stopped apparmor on ms1 [production]
08:41 <Andrew> Leaving as-is for now, hoping somebody with appropriate permissions can fix it later. [production]
08:40 <Andrew> STOP SLAVE on db5 gives me ERROR 1045 (00000): Access denied for user: 'wikiadmin@208.80.152.%' (Using password: NO) [production]
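The "Using password: NO" part of that error usually just means the client connected without credentials rather than that the grant itself is missing; a minimal sketch of retrying with credentials supplied (the host comes from the log, the defaults-file path is an assumption):

  # Connect with credentials this time (path is hypothetical), check the leftover
  # replication thread, then stop it. STOP SLAVE also requires the SUPER privilege.
  mysql --defaults-file=/root/.my.cnf -h db5 -e "SHOW SLAVE STATUS\G"
  mysql --defaults-file=/root/.my.cnf -h db5 -e "STOP SLAVE;"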
08:34 <Andrew> Slave is supposedly still running on db5. Assuming Roan didn't stop it when he switched masters a few days ago. Going to text somebody to confirm that stopping it is the correct course of action. [production]
08:24 <Andrew> db5 can't be lagged, it's the master ;-). Obviously something wrong with wfWaitForSlaves. [production]
08:19 <Andrew> db5 lagged 217904 seconds [production]
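For scale, 217904 seconds is roughly two and a half days. A hedged way to see where a figure like that can come from on a host that has just been promoted to master (credentials assumed):

  # 217904 / 86400 is roughly 2.5 days of reported lag.
  # A leftover SHOW SLAVE STATUS row on the new master is enough for lag checks
  # to treat db5 as a badly lagged slave, which would explain why wfWaitForSlaves
  # reported the master itself as lagged.
  mysql -h db5 -e "SHOW SLAVE STATUS\G" | grep -E 'Master_Host|_Running|Seconds_Behind_Master'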
05:09 <Andrew> Ran thread_pending_relationship and thread_reaction schema changes on all LiquidThreads wikis [production]
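Roughly how a change like that can be pushed across every LiquidThreads wiki; the dblist path, patch file name, and credentials below are hypothetical, sketched only to show the shape of the loop:

  # Apply one SQL patch to each wiki database named in a dblist file.
  for db in $(cat /home/wikipedia/common/liquidthreads.dblist); do
      echo "applying LQT schema change to $db"
      mysql --defaults-file=/root/.my.cnf "$db" < lqt-schema-change.sql
  done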
05:06 <andrew> synchronizing Wikimedia installation... Revision: 70933 [production]
05:04 <Andrew> About to update LiquidThreads production version to the alpha. [production]
2010-08-13
22:03 <mark> API logins on commons (only) are reported broken [production]
21:45 <mark> Set correct $cluster variable for reinstalled knsq* squids [production]
21:03 <mark> Increased cache_mem from 1000 to 2500 on sq33, like the other API backend squids [production]
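For reference, cache_mem is the Squid directive that caps the memory used for cached and in-transit objects; the change above corresponds to a line roughly like this in the backend's config (the values are from the log, the unit and file layout are assumed):

  # was: cache_mem 1000 MB
  cache_mem 2500 MB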
20:58 <mark> Stopping backend squid on sq33 [production]
20:50 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24769 - Import source addition for tpi.wikipedia.org' [production]
17:46 <Fred> and srv100 [production]
17:45 <Fred> restarted apache on srv219 and srv222 [production]
15:57 <mark> synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned from the down list' [production]
15:56 <mark> synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list' [production]
15:53 <RobH> srv146 removed from puppet and nodelists, slated for wipe, decommissioned. [production]
15:47 <mark> Sent srv146 to death using echo b > /proc/sysrq-trigger. It had a read-only filesystem and is therefore decommissioned. [production]
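With the root filesystem read-only, a clean shutdown is off the table; the magic-SysRq write above asks the kernel to reboot immediately. A minimal sketch of the interface:

  # Make sure the SysRq interface is enabled (1 = all functions allowed).
  echo 1 > /proc/sys/kernel/sysrq
  # 'b' reboots the machine immediately, without syncing or unmounting filesystems.
  echo b > /proc/sysrq-trigger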
15:38 <mark> Restarted backend squid on sq33 [production]
15:36 <mark> synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list' [production]
15:25 <mark> Reinstalled sq32 with Lucid [production]
15:01 <mark> Removed sq86 and sq87 from API LVS pool [production]
14:55 <mark> sq80 had been down for a long time. Brought it back up and synced it [production]
14:54 <rainman-sr> all of the search cluster restored to pre-relocation configuration [production]
14:34 <robh> synchronized php-1.5/wmf-config/lucene.php 'reverting search13 to search11' [production]
13:55 <mark> /dev/sda on sq57 is busted [production]
13:54 <RobH> removed search17 from search_pool_3 [production]
13:50 <mark> Set idleconnection.timeout = 300 (NOT idlecommand.timeout) on all LVS services on lvs3, restarting pybal [production]
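The key and value are exactly as quoted in the entry above; everything else in this sketch of a PyBal service stanza (file path, comments, layout) is an assumption:

  # /etc/pybal/<service> (hypothetical path), one stanza per LVS service on lvs3
  # timeout, in seconds, for the idle-connection monitor; not the same knob as
  # idlecommand.timeout mentioned in the entry above
  idleconnection.timeout = 300

PyBal is then restarted so the new timeout takes effect on all services.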
13:44 <mark> powercycled sq57, which was stuck in [16538652.048532] BUG: soft lockup - CPU#3 stuck for 61s! [gmond:15746] [production]
13:42 <mark> sq58 was down for a long long time. Brought it back up and synced it [production]
13:37 <RobH> added search7 back into search_pool_3, kept search17 in as well [production]
13:27 <RobH> changed search_pool_3 back from search7 to search17 since it failed [production]
13:25 <robh> synchronized php-1.5/wmf-config/lucene.php 'Re-enabling LucenePrefixSearch - pushed changes on lvs3 to put search back to normal use' [production]
12:45 <mark> API squid cluster is too flaky for my taste. Converting sq33 into an API backend squid as well [production]
12:40 <mark> Shut down puppet and backend squid on sq32 [production]
11:41 <mark> Corrected the changed hostname for api.svc.pmtpa.wmnet in the text squid config files [production]
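Illustration only: in a text squid's configuration the API backend service is addressed by hostname in a cache_peer line, something like the following (only the hostname comes from the log; the port and options are assumptions):

  cache_peer api.svc.pmtpa.wmnet parent 80 0 no-query originserver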
11:37 <mark> Temporarily rejecting requests to sq31 backend to give it some breathing room while it's reading its COSS dirs [production]
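One way to give a backend that breathing room, assuming the rejection was done in Squid itself, is a temporary deny-all rule; a sketch, not necessarily what was actually used:

  # Temporary: refuse all requests while the COSS cache dirs are re-read,
  # then drop this line and reconfigure.
  http_access deny all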
11:32 <mark> Reinstalled sq31 with Lucid [production]
10:25 <mark> Shutting down backend squid on sq31 to see the load impact [production]
10:18 <mark> Set up backend request statistics for the API on torrus [production]
09:15 <rainman-sr> bringing up search1-12 and doing some initial index warmups [production]
01:54 <RobH> searchidx1, search1-search12 relocated and online, not in cluster until Robert can fix in the morning. The other half will have to move on a different day; 12 hours in the datacenter is long enough. [production]
01:40 <RobH> finished moving searchidx1 and search1-12, bringing them back up now [production]