2009-07-10 §
18:08 <RobH_DC> turning srv210 into test.wikipedia.org [production]
17:57 <Andrew> Reactivating UsabilityInitiative globally, too. [production]
17:55 <Andrew> Scapping, back-out diff is in /home/andrew/usability-diff [production]
17:43 <Andrew> Apply r52926, r52930, and update Resources and EditToolbar/images [production]
16:44 <Fred> reinstalled and configured gmond on storage1. [production]
15:08 <Rob> upgraded blog and techblog to wordpress 2.8.1 [production]
13:58 <midom> synchronized php-1.5/includes/api/ApiQueryCategoryMembers.php 'hello, fix!' [production]
12:40 <Tim> prototype.wikimedia.org is in OOM death, nagios reports down 3 hours, still responsive on shell so I will try a light touch [production]
11:08 <tstarling> synchronized php-1.5/mc-pmtpa.php 'more' [production]
10:58 <Tim> installed memcached on srv200-srv209 [production]
10:51 <tstarling> synchronized php-1.5/mc-pmtpa.php 'deployed the 11 available spares, will make some more' [production]
10:48 <Tim> mctest.php reports 17 servers down out of 78, most from the range that Rob decommissioned [production]
10:37 <Tim> installed memcached on srv120, srv121, srv122, srv123 [production]
10:32 <Tim> found rogue server srv101; it was missing its puppet configuration and so was skipping syncs. Uninstalled Apache on it. [production]
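[Note: the mc-pmtpa.php file synchronized above is the pmtpa memcached pool list that the 10:37-11:08 entries refer to. A minimal sketch of what such a pool definition looks like in MediaWiki configuration, assuming the standard $wgMemCachedServers global; the addresses and port below are placeholders, not the real pool.]

    <?php
    # Sketch only: a memcached server pool as MediaWiki consumes it.
    # Each entry is "host:port"; hosts in this list that are unreachable
    # are what mctest.php reports as down.
    $wgMemCachedServers = array(
        '10.0.2.120:11211',  # placeholder for a deployed spare such as srv120
        '10.0.2.121:11211',
        '10.0.2.122:11211',
        '10.0.2.123:11211',
    );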
2009-07-09 §
23:56 <RoanKattouw> Rebooted prototype around 16:30; it had gotten stuck around 15:30 [production]
21:43 <Rob> srv35 (test.wikipedia.org) is not posting; I think it's dead, Jim. [production]
21:35 <Rob> decommissioned srv55 and put srv35 in its place in C4, test.wikipedia.org should be back online shortly [production]
20:04 <Rob> removed decommissioned servers from node groups; getting an error when syncing Nagios. [production]
20:03 <Rob> updated dns for new apache servers [production]
19:54 <Rob> decommissioned all old apaches in rack pmtpa b2 [production]
16:22 <Tim> creating mhrwiki (bug 19515) [production]
13:27 <domas> db13 controller battery failed, s2 needs master switch eventually [production]
2009-07-08 §
13:31 <midom> synchronized php-1.5/InitialiseSettings.php 'disabling usability initiative on all wikis, except test and usability. someone who enabled this and left at this state should be shot' [production]
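[Note: InitialiseSettings.php holds per-wiki overrides keyed by setting name and wiki database name, so a change like the one above turns the extension off by default while leaving it enabled for the test and usability wikis. A minimal sketch under that assumption; the switch name wmgUseUsabilityInitiative and the wiki keys are illustrative, not necessarily the real ones.]

    # Sketch only: per-wiki override block in InitialiseSettings.php.
    'wmgUseUsabilityInitiative' => array(
        'default'       => false,  # disabled on all wikis...
        'testwiki'      => true,   # ...except test.wikipedia.org
        'usabilitywiki' => true,   # ...and the usability wiki
    ),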
2009-07-07 §
19:06 <Fred> adjusted www.wikipedia.org apache conf file to remove a redirect-loop to www.wikibooks.org. (bug #19460) [production]
17:34 <Fred> found the cause of Ganglia issues: Puppet. Seems like the configuration of the master hosts gets reverted to being deaf automagically... [production]
17:05 <Fred> Ganglia fixed. For some reason the master cluster nodes were set to deaf mode (i.e., the aggregator couldn't gather data from them). [production]
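[Note: in Ganglia, a gmond process with deaf = yes does not listen for metric traffic from other nodes, so an aggregator that has been flipped to deaf stops collecting cluster data, which matches the symptom above. A minimal sketch of the relevant gmond.conf stanza on an aggregator; values are illustrative.]

    globals {
      daemonize = yes
      deaf = no   # aggregator/master nodes must stay non-deaf to receive metrics
      mute = no   # and non-mute if they also report their own metrics
    }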
15:02 <robh> synchronized php-1.5/InitialiseSettings.php '19470 Rollback on pt.wikipedia' [production]
03:37 <Fred> fixing ganglia. Expect disruption [production]
00:27 <tomaszf> starting six worker threads for xml snapshots [production]
00:12 <Fred> srv142 and srv55 will need manual power-cycle. [production]
00:10 <Fred> Rolling reboot has finally completed. [production]
2009-07-06 §
23:57 <Fred> restarted ganglia since it is acting up... [production]
23:54 <tomaszf> restarting all xml snapshots due to kernel upgrades [production]
18:49 <Rob> upgraded spam detection plugins on blog and techblog [production]
18:47 <Fred> starting rolling reboot of servers in Apaches cluster. [production]
17:53 <tomaszf> cleaning out space on storage2. lowering retention for xml snapshots to 10 [production]
17:53 <Fred> upgrading kernel on cluster. This will take a while! [production]
17:46 <Fred> rebooting srv220 to test kernel update. [production]
2009-07-03 §
12:51 <andrew> synchronized php-1.5/extensions/AbuseFilter/Views/AbuseFilterViewEdit.php 'Re-activating abuse filter public logging in the logging table now that log_type and log_action have been expanded.' [production]
11:45 <mark> Kicked iris so it would boot [production]
10:11 <andrew> synchronized php-1.5/skins/common/htmlform.js 'IE7 fixes for new preference system' [production]
10:00 <Andrew> testing [production]
05:51 <Tim> restarted squid instances on sq28 [production]
05:47 <Tim> restarted squid instances on sq2 [production]
05:46 <Tim> started squid backend on sq10, sq23, sq24, and sq31; restarted the frontend on most of those to reduce memory usage [production]
05:35 <Tim> restarted squid backend on sq16, which was reporting "gateway timeout" apparently for all requests. That seemed to fix it. Will try that on a few more that Nagios is complaining about. [production]
2009-07-02 §
21:38 <Rob> sq24 won't accept SSH; depooling. [production]
21:34 <Rob> rebooting sq21 [production]
21:26 <Rob> ran changes to push DNS back to the normal scenario [production]
19:52 <mark> Power outage at esams, moving traffic [production]