2009-04-17
22:49 <brion> regenerated centralnotice output again... this time ok [production]
22:48 <brion> srv93 and srv107 memcached nodes are running but broken. Restarting them... [production]
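A memcached node that is "running but broken" typically still accepts TCP connections but no longer answers the protocol. A minimal Python sketch of a probe that separates the two failure modes (the host names come from the log; the port and timeout are the usual defaults, assumed here):

    import socket

    def probe_memcached(host, port=11211, timeout=2):
        """Return True if the node answers a protocol-level 'version' request."""
        try:
            s = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            return False  # not even listening: plainly down
        try:
            s.sendall(b"version\r\n")
            return s.recv(64).startswith(b"VERSION")  # healthy nodes reply at once
        except OSError:
            return False  # accepts connections but hangs: "running but broken"
        finally:
            s.close()

    for node in ("srv93", "srv107"):
        print(node, "ok" if probe_memcached(node) else "broken")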
22:43 <brion> restarted srv82 memcache node. attempting to rebuild centralnotices... [production]
22:41 <brion> bad memcached node srv82 [production]
22:05 <mark> Set up 3 new pywikipedia mailing lists, redirected svn commit output to one of them [production]
19:38 <robh> synchronized php-1.5/InitialiseSettings.php 'Bug 18494 Logo for ln.wiki' [production]
17:22 <Rob> removed wikimedia.se from our nameservers as they are using their own. [production]
16:48 <azafred> updated spamassassin rules on lily to include the SARE rules and mirror the settings on McHenry. [production]
10:25 <tstarling> synchronized robots.txt [production]
08:19 <tstarling> synchronized php-1.5/InitialiseSettings.php [production]
07:13 <Tim> temporarily killed apache on overloaded ES masters [production]
07:11 <tstarling> synchronized php-1.5/db.php 'zeroing read load on ES masters' [production]
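In MediaWiki's load balancer each database server carries an integer read weight, so "zeroing read load" on the ES masters means no read queries are routed to them while writes continue as normal. A sketch of the weighted-selection idea only; the host names and weights are illustrative, not the actual contents of db.php:

    import random

    # Illustrative weights: a master at 0 receives no read queries at all.
    read_loads = {"es-master": 0, "es-replica1": 100, "es-replica2": 100}

    def pick_reader(loads):
        """Weighted random choice; zero-weight servers are never picked."""
        candidates = [(host, w) for host, w in loads.items() if w > 0]
        r = random.uniform(0, sum(w for _, w in candidates))
        for host, w in candidates:
            r -= w
            if r <= 0:
                return host
        return candidates[-1][0]

    print(pick_reader(read_loads))  # always one of the replicas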
06:04 <Tim> brief site-wide outage while db20 rebooted, reason unknown. All good now. Resuming logrotate. [production]
05:55 <Tim> db20 h/w reboot [production]
05:48 <Tim> shutting down daemons on db20 for pre-emptive reboot. Serial console shows "BUG: soft lockup - CPU#4 stuck for 11s! [rsync:27854]" etc. [production]
05:10 <Tim> on db20: killed a half-finished logrotate -f because of alarmingly high kswapd CPU usage (linked to deadlocked rsync processes). May need a reboot. [production]
05:00 <Tim> fixed logrotate on db20, broken since March 10 by a corrupted status file, most likely caused by non-ASCII filenames generated by demux.py. Patched demux.py. Removed everything.log. [production]
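logrotate records every rotated file by name in a plain-text status file, so non-ASCII bytes in generated filenames can render that file unparseable. The actual demux.py patch is not shown in the log; this is only a hedged sketch of the kind of sanitization that prevents the problem (sanitize_filename is a hypothetical helper):

    def sanitize_filename(name, fallback="_"):
        """Replace anything outside printable non-space ASCII with a
        placeholder so downstream tools like logrotate stay parseable."""
        return "".join(c if " " < c <= "~" else fallback for c in name)

    print(sanitize_filename("wiki\xe9.log"))  # -> wiki_.log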
02:14 <river> set up ms6.esams, copying /export/upload from ms1 [production]
00:24 <Tim> blocked lots of uci.edu IPs that were collectively doing 20 req/s of expensive API queries, overloading ES [production]
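Deciding which clients to block usually starts with tallying request rates per source IP in the API access logs. A minimal sketch, assuming a log slice covering roughly a minute with the client IP as the first whitespace-separated field (the file name and format are assumptions):

    import collections
    import sys

    WINDOW_SECONDS = 60  # assumed span of the log slice being analysed
    counts = collections.Counter()

    with open(sys.argv[1]) as log:
        for line in log:
            fields = line.split(None, 1)
            if fields:
                counts[fields[0]] += 1  # first field assumed to be the client IP

    for ip, n in counts.most_common(10):
        print(f"{ip:15s} {n / WINDOW_SECONDS:6.1f} req/s")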
00:15 <brion> techblog post on Phorm opt-out is linked from slashdot; load on singer seems fairly stable. [production]
2009-04-16
23:06 <tfinc> synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php [production]
22:48 <azafred> bounced apache on srv217. All threads were dead. [production]
22:16 <tfinc> synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php [production]
22:08 <tfinc> synchronized php-1.5/extensions/ContributionReporting/ContributionHistory_body.php [production]
17:41 <domas> fantastic. I start _looking_ at stuff and it fixes itself. [production]
17:35 <midom> synchronized php-1.5/includes/Revision.php 'live profiling hook' [production]
17:28 <domas> db20 has a kswapd deadlock, needs a reboot soonish [production]
17:18 <midom> synchronized php-1.5/InitialiseSettings.php 'disabled stats' [production]
17:15 <midom> synchronized php-1.5/InitialiseSettings.php 'enabling udp stats' [production]
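UDP profiling works by emitting each measured section's wall time as a fire-and-forget datagram to a collector, so enabling it adds no blocking I/O to page rendering. A minimal Python sketch of the idea; the collector host, port, and packet format are assumptions, not MediaWiki's actual wire format:

    import socket
    import time

    SOCK = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    COLLECTOR = ("profiler.example.org", 3811)  # hypothetical collector endpoint

    def profiled(name):
        """Decorator reporting a function's wall time over UDP.
        Fire-and-forget: a lost packet never slows the caller down."""
        def wrap(fn):
            def inner(*args, **kwargs):
                start = time.time()
                try:
                    return fn(*args, **kwargs)
                finally:
                    ms = (time.time() - start) * 1000
                    SOCK.sendto(f"{name} {ms:.1f}\n".encode(), COLLECTOR)
            return inner
        return wrap

    @profiled("Revision::getText")
    def get_text():
        time.sleep(0.01)  # stand-in for real work

    get_text()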
16:18 <azafred> bounced apache on srv217 (no pid file, so the previous restart did not include this one) [production]
15:57 <brion> network borkage between Florida and Amsterdam. Visitors through AMS proxies can't reach sites. [production]
15:55 <azafred> bounced apache on srv[73,86,88,93,108,114,139,141,154,181,194,204,213,99] [production]
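Bouncing a service across a bracketed host list like the one above is typically a small loop over ssh. A sketch under the assumption that passwordless ssh works and each host uses a standard apache2ctl; the graceful restart command is an assumption, not necessarily what was actually run:

    import subprocess

    HOSTS = [f"srv{n}" for n in
             (73, 86, 88, 93, 108, 114, 139, 141, 154, 181, 194, 204, 213, 99)]

    for host in HOSTS:
        # 'graceful' restarts workers without dropping in-flight requests.
        rc = subprocess.call(["ssh", host, "apache2ctl", "graceful"])
        print(host, "ok" if rc == 0 else f"failed (rc={rc})")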
15:52 <Tim-away> started mysqld on srv98, srv122, srv124, srv142, srv106, srv107: done with them for now. srv102 still going. [production]
15:30 <mark> Set up ms6 with SP management at ms6.ipmi.esams.wikimedia.org [production]
14:13 <mark> Restoring traffic to Amsterdam cluster [production]
14:06 <mark> Reloading csw1-esams [production]
13:55 <mark> Reloading csw1-esams [production]
13:53 <JeLuF> ms1 NFS issues again. Might be load related [production]
13:49 <Tim> copying fedora ES data from ms3 to ms2 [production]
13:44 <JeLuF> ms1 is reachable, no errors logged, NFS daemons running fine. After some minutes, NFS clients were able to access the server again. Root cause unknown. [production]
13:38 <JeLuF> ms1 issues. On NFS slaves: "ls: cannot access /mnt/upload5/: Input/output error" [production]
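A hard-mounted NFS export that stops responding hangs any process that touches it, so a health check has to enforce its own timeout from outside. A minimal sketch that runs the listing in a child process and treats a timeout as a hang (the mount point is from the log; the probe itself is an assumption):

    import subprocess

    def nfs_alive(path, timeout=10):
        """True if the mount answers a directory listing within `timeout` s.
        A child process is used because a hung hard mount would block us."""
        try:
            subprocess.run(["ls", path], timeout=timeout, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            return True
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            return False

    print("ok" if nfs_alive("/mnt/upload5") else "hung or erroring")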
13:24 <mark> DNS scenario knams-down for upcoming core switch reboot [production]
08:23 <river> pdns on bayle crashed; the bindbackend parser seems rather fragile [production]