2011-04-04
18:42 <notpeter> added cname etherpad for hooper.wikimedia.org [production]
18:00 <Ryan_Lane> added the wikimedia-fonts package to lucid-wikimedia repo [production]
17:29 <notpeter> adding self to nagios group. rebooterizing nagios. [production]
05:58 <apergos> cleaned up perms on commons/thumb/a/af, left over from interrupted rsync test last night [production]
05:50 <tstarling> synchronized php-1.17/wmf-config/InitialiseSettings.php 'enabling pool counter on all wikis' [production]
04:12 <tstarling> synchronized php-1.17/wmf-config/InitialiseSettings.php 'enabling PoolCounter on testwiki and test2wiki' [production]
01:22 <Tim> apache CPU overload lasted ~10 mins, v. high backend request rate, don't know cause, seems to have stopped now [production]
2011-04-03
18:42 <apergos> 8 rsyncs of ms4 thumbs restarted with better perms so scalers can write... in screen as root on ms5. If we start seeing NFS timeouts in the scaler logs please shoot a couple [production]
17:14 <mark> Deployed max-connections on all cache peers for esams.upload squids to their florida parents (current limit 200) [production]
17:00 <mark> Removed the carp weights on the esams backends again, as the weighting was completely screwed up [production]
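(Editor's sketch, not part of the log: the two squid changes above, a per-peer connection cap and dropping the explicit CARP weights, would look roughly like this in squid.conf. The hostname and ports are illustrative, not the actual esams/pmtpa peer list.)

```
# Hypothetical cache_peer line: a CARP parent capped at 200 concurrent
# connections via max-conn. Omitting weight=N falls back to squid's
# default equal CARP weighting, which is what removing the broken
# weights achieved.
cache_peer sq41.pmtpa.wmnet parent 3128 0 carp max-conn=200
```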
16:59 <mark> Started knsq13 backend [production]
14:27 <catrope> ran sync-common-all [production]
14:26 <RoanKattouw> Running sync-common-all to deploy r85256 [production]
13:03 <apergos> shot rsyncs on ms5, setting 777 dir perms on all thumbnail dirs (eg e/ef/blablah.jpg) so scalers can write into them [production]
12:53 <apergos> did same for rest of projects and subdirs (777 on hash dirs) [production]
12:47 <apergos> chmod 777 on commons/thumb/*/* on ms5 so that scalers can create directories in there (mismatch of uid apache vs www-data) [production]
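(Editor's sketch, not part of the log: the perms fix in the entries above, replayed safely against a throwaway tree. The real target was the thumb export on ms5; the path below is illustrative.)

```shell
#!/bin/sh
# Open up the two-level hash directories (e.g. a/af) so a scaler
# running as a different uid (apache vs www-data) can create files in
# them. Run against a temp dir, not ms5.
root=$(mktemp -d)
mkdir -p "$root/commons/thumb/a/af"
chmod 777 "$root"/commons/thumb/*/*
stat -c '%a' "$root/commons/thumb/a/af"   # 777
rm -rf "$root"
```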
11:12 <mark> Raised per-squid connection limit to ms5 from 200 to 400 connections [production]
11:05 <mark> Raised per-squid connection limit to ms5 from 100 to 200 connections [production]
10:55 <mark> Fixed squid loop, the pmtpa.upload squids were using the esams squids as "CARP parents for distant content" [production]
10:29 <mark> Fixed puppet on sq42/43 [production]
09:44 <mark> Lowered FCGI thumb handlers from 90 to 60 again, to reduce concurrency [production]
08:08 <mark> Started 4 more rsyncs (8 total now) [production]
07:49 <mark> Removed mlocate from ms5, puppetising [production]
07:42 <mark> Started 4 rsyncs from ms4 to ms5 (--ignore-existing) [production]
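(Editor's sketch, not part of the log: --ignore-existing is what made these restarts cheap, since rsync skips any file already present on the destination. A minimal local demonstration on throwaway directories, not ms4/ms5.)

```shell
#!/bin/sh
# Show that rsync --ignore-existing never overwrites a file that is
# already on the destination, so an interrupted copy can be restarted
# without re-transferring or clobbering anything.
src=$(mktemp -d); dst=$(mktemp -d)
echo new > "$src/f"
echo old > "$dst/f"
rsync -a --ignore-existing "$src/" "$dst/"
cat "$dst/f"   # old
rm -rf "$src" "$dst"
```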
07:32 <mark> increased thumb handler count from 60 to 90 [production]
07:11 <mark> Doubled the amount of fcgi thumb handlers [production]
07:08 <mark> Turned off logging of 404s to nginx error.log [production]
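(Editor's sketch, not part of the log: nginx has a directive for exactly the change above. The location path is illustrative.)

```
# Stop "file not found" errors for missing thumbs from being written
# to error.log; the 404 responses themselves are unaffected.
location /thumbs/ {
    log_not_found off;
}
```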
06:50 <mark> Restarted Apache on the image scalers [production]
06:49 <mark> Reconfigured ms5 to use the 404 thumb handler [production]
06:48 <Ryan_Lane> disabling nfs on ms4 [production]
06:33 <mark> Running puppet on all apaches to fix fstab and mount ms5.pmtpa.wmnet:/export/thumbs [production]
06:32 <mark> Unmounting /mnt/thumbs on all mediawiki-installation servers [production]
06:30 <mark> Remounted NFS /mnt/thumbs on the scalers to ms5 [production]
06:28 <Ryan_Lane> bring nfs back up [production]
06:28 <Ryan_Lane> brought ms4 back up. stopping the web server service and nfs [production]
06:20 <mark> Setup NFS kernel server on ms5 [production]
06:18 <Ryan_Lane> powercycling ms4 [production]
05:29 <Ryan_Lane> rebooting ms4 with -d to get a coredump [production]
05:14 <apergos> re-enabling webserver on ms4 for testing [production]
04:45 <apergos> stopping web service on ms4 for the moment [production]
04:29 <apergos> shot webserver again [production]
04:26 <apergos> turned off hourly snaps on ms4, turned back on webserver and nfs [production]
04:09 <apergos> rebooted ms4, shut down webserver and nfsd temporarily for testing [production]
02:58 <apergos> still looking at kernel memory issues, still rebooting, ryan should be here in a few minutes to help out [production]
02:03 <apergos> a solaris advisor... also have zfs arc cache max to 2g which is ridiculously low but wtf right? [production]
02:02 <apergos> set tcp_time_wait_interval to 10000 at suggestion of [production]
01:37 <apergos> lowered zfs arc max to 2g (someone should reset this later)... will take effect on next reboot [production]
00:29 <apergos> rebooting with the new zfs arc cache max value, which will reduce the min value as well... dunno if this will give us enough breathing room or not [production]
00:24 <apergos> set zfs arc cache to ridiculously low value of 4gb, since when it's healthy it's using much less than that (1gb), this will take effect on reboot [production]
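(Editor's sketch, not part of the log: on Solaris the ARC cap in the surrounding entries maps to a one-line /etc/system setting, which only takes effect on reboot, hence the reboots logged above. The value shown is the 2 GB cap mentioned at 01:37; this is not the actual file from ms4.)

```
* /etc/system fragment: cap the ZFS ARC at 2 GB (2147483648 bytes).
* Requires a reboot to take effect.
set zfs:zfs_arc_max = 2147483648
```

By contrast, the tcp_time_wait_interval change at 02:02 can be applied live with ndd, roughly `ndd -set /dev/tcp tcp_time_wait_interval 10000`.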
00:22 <Reedy> Still experiencing MS4 issues, thumb service is likely to be problematic for most users [production]