production SAL

2010-08-14
15:37	<mark>	dobson has failed RAID1 array member /dev/sda. Running long SMART self test on /dev/sda	[production]
14:18	<mark>	synchronized php-1.5/wmf-config/db.php 'Add ms2 and ms1 to clusters rc1 an cluster22'	[production]
14:06	<mark>	FLUSH TABLES WITH READ LOCK on ms1 for testing	[production]
13:59	<mark>	Stopping mysql on ms1 as monitoring test	[production]
13:59	<mark>	Granted SELECT on mysql.* to nagios on ms3	[production]
10:57	<mark>	Removed oldest LVM snapshot on ixia	[production]
09:43	<mark>	Fixed apparmor profile /etc/apparmor.d/usr.sbin.mysqld on ms1, restarted mysql under apparmor	[production]
09:39	<mark>	START SLAVE on ms1, catching up with ms3	[production]
09:38	<mark>	RESET SLAVE on db5	[production]
09:37	<mark>	STOP SLAVE on db5	[production]
09:35	<mark>	Stopped apparmor on ms1	[production]
08:41	<Andrew>	Leaving as-is for now, hoping somebody with appropriate permissions can fix it later.	[production]
08:40	<Andrew>	STOP SLAVE on db5 gives me ERROR 1045 (00000): Access denied for user: 'wikiadmin@208.80.152.%' (Using password: NO)	[production]
08:34	<Andrew>	Slave is supposedly still running on db5. Assuming Roan didn't stop it when he switched masters a few days ago. Going to text somebody to confirm that stopping is the correct course of action.	[production]
08:24	<Andrew>	db5 can't be lagged, it's the master ;-). Obviously something wrong with wfWaitForSlaves.	[production]
08:19	<Andrew>	db5 lagged 217904 seconds	[production]
05:09	<Andrew>	Ran thread_pending_relationship and thread_reaction schema changes on all LiquidThreads wikis	[production]
05:06	<andrew>	synchronizing Wikimedia installation... Revision: 70933	[production]
05:04	<Andrew>	About to update LiquidThreads production version to the alpha.	[production]
2010-08-13
22:03	<mark>	API logins on commons (only) are reported broken	[production]
21:45	<mark>	Set correct $cluster variable for reinstalled knsq* squids	[production]
21:03	<mark>	Increased cache_mem from 1000 to 2500 on sq33, like the other API backend squids	[production]
20:58	<mark>	Stopping backend squid on sq33	[production]
20:50	<jeluf>	synchronized php-1.5/wmf-config/InitialiseSettings.php '24769 - Import source addition for tpi.wikipedia.org'	[production]
17:46	<Fred>	and srv100	[production]
17:45	<Fred>	restarted apache on srv219 and srv222	[production]
15:57	<mark>	synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned from the down list'	[production]
15:56	<mark>	synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list'	[production]
15:53	<RobH>	srv146 removed from puppet and nodelists, slated for wipe, decommissioned.	[production]
15:47	<mark>	Sent srv146 to death using echo b > /proc/sysrq-trigger. It had a read-only filesystem and is therefore decommissioned.	[production]
15:38	<mark>	Restarted backend squid on sq33	[production]
15:36	<mark>	synchronized php-1.5/wmf-config/mc.php 'Remove some to-be-decommissioned hosts from the down list'	[production]
15:25	<mark>	Reinstalled sq32 with Lucid	[production]
15:01	<mark>	Removed sq86 and sq87 from API LVS pool	[production]
14:55	<mark>	sq80 had been down for a long time. Brought it back up and synced it	[production]
14:54	<rainman-sr>	all of the search cluster restored to pre-relocation configuration	[production]
14:34	<robh>	synchronized php-1.5/wmf-config/lucene.php 'reverting search13 to search11'	[production]
13:55	<mark>	/dev/sda on sq57 is busted	[production]
13:54	<RobH>	removed search17 from search_pool_3	[production]
13:50	<mark>	Set idleconnection.timeout = 300 (NOT idlecommand.timeout) on all LVS services on lvs3, restarting pybal	[production]
13:44	<mark>	powercycled sq57, which was stuck in [16538652.048532] BUG: soft lockup - CPU#3 stuck for 61s! [gmond:15746]	[production]
13:42	<mark>	sq58 was down for a long long time. Brought it back up and synced it	[production]
13:37	<RobH>	added search7 back into search_pool_3, kept search17 in as well	[production]
13:27	<RobH>	changed search_pool_3 back from search7 to search17 since it failed	[production]
13:25	<robh>	synchronized php-1.5/wmf-config/lucene.php 'Re-enabling LucenePrefixSearch - pushed changes on lvs3 to put search back to normal use'	[production]
12:45	<mark>	API squid cluster is too flaky for my taste. Converting sq33 into an API backend squid as well	[production]
12:40	<mark>	Shutdown puppet and backend squid on sq32	[production]
11:41	<mark>	Corrected changed hostname for api.svc.pmtpa.wmnet in text squid config files	[production]
11:37	<mark>	Temporarily rejecting requests to sq31 backend to give it some breathing room while it's reading its COSS dirs	[production]
11:32	<mark>	Reinstalled sq31 with Lucid	[production]