From: "Jose R. Santos" Subject: Re: ZFS, XFS, and EXT4 compared Date: Thu, 30 Aug 2007 08:37:47 -0500 Message-ID: <20070830083747.018cfe8a@gara> References: <1188454611.23311.13.camel@toonses.gghcwest.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: zfs-discuss@opensolaris.org, xfs@oss.sgi.com, linux-ext4@vger.kernel.org To: "Jeffrey W. Baker" Return-path: Received: from e34.co.us.ibm.com ([32.97.110.152]:43196 "EHLO e34.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753772AbXH3Nhp (ORCPT ); Thu, 30 Aug 2007 09:37:45 -0400 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e34.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id l7UDbhBF027014 for ; Thu, 30 Aug 2007 09:37:43 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l7UDbhlO449684 for ; Thu, 30 Aug 2007 07:37:43 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l7UDbg8j004253 for ; Thu, 30 Aug 2007 07:37:42 -0600 In-Reply-To: <1188454611.23311.13.camel@toonses.gghcwest.com> Sender: linux-ext4-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org On Wed, 29 Aug 2007 23:16:51 -0700 "Jeffrey W. Baker" wrote: Nice comparisons. > I have a lot of people whispering "zfs" in my virtual ear these days, > and at the same time I have an irrational attachment to xfs based > entirely on its lack of the 32000 subdirectory limit. I'm not afraid of The 32000 subdir limit should be fixed on the latest rc kernels. > ext4's newness, since really a lot of that stuff has been in Lustre for > years. So a-benchmarking I went. Results at the bottom: > > http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html FFSB: Could you send the patch to fix FFSB Solaris build? I should probably update the Sourceforge version so that it built out of the box. I'm also curious about your choices in the FFSB profiles you created. Specifically, the very short run time and doing fsync after every file close. When using FFSB, I usually run with a large run time (usually 600 seconds) to make sure that we do enough IO to get a stable result. Running longer means that we also use more of the disk storage and our results are not base on doing IO to just the beginning of the disk. When running for that long period of time, the fsync flag is not required since we do enough reads and writes to cause memory pressure and guarantee IO going to disk. Nothing wrong in what you did, but I wonder how it would affect the results of these runs. The agefs options you use are also interesting since you only utilize a very small percentage of your filesystem. Also note that since create and append weight are very heavy compare to deletes, the desired utilization would be reach very quickly and without that much fragmentation. Again, nothing wrong here, just very interested in your perspective in selecting these setting for your profile. Postmark: I've been looking at the postmark results and I'm becoming more convince that the meta-data results in ZFS may be artificially high due to the nature of the workload. For one thing, I find it very interesting (e.i. odd) that 9050KB/s reads and 28360KB/s writes shows up multiple times even across filesystems. 
The data set on Postmark is also very limited in size, and the run
times are small enough that it is difficult to get an idea of sustained
meta-data performance on any of the filesystems.  Based on the ZFS
numbers, it seems that there is hardly any IO being done in the ZFS
case, given the random nature of the workload and the high numbers it's
achieving.

In short, I don't think Postmark is a very good workload to sustain
ZFS's claim as the meta-data king.  It may very well be the case, but I
would like to see it proven with another workload, preferably one that
actually shows sustained meta-data performance across a fairly large
fileset.  FFSB could be used to simulate a meta-data intensive workload
as well, and it has better control over the fileset size and run time,
which would make the results more interesting (see the sketch in the
P.S. below).

I don't mean to invalidate the Postmark results; I'm merely pointing
out a possible error in the assessment of ZFS's meta-data performance.
I say possible since it's still unknown whether another workload would
validate these results.

General:

Did you gather CPU statistics when running these benchmarks?  For some
environments, the ratio of filesystem performance to CPU utilization
would be good information to have, since some workloads are CPU
sensitive, and being 20% faster while consuming 50% more CPU may not
necessarily be a good thing.  While this may be less of an issue in the
future, since CPU performance seems to be increasing at a much faster
pace than IO and disk performance, it would still be another
interesting data point.

>
> Short version: ext4 is awesome. zfs has absurdly fast metadata
> operations but falls apart on sequential transfer. xfs has great
> sequential transfer but really bad metadata ops, like 3 minutes to tar
> up the kernel.
>
> It would be nice if mke2fs would copy xfs's code for optimal layout on a
> software raid. The mkfs defaults and the mdadm defaults interact badly.
>
> Postmark is somewhat bogus benchmark with some obvious quantization
> problems.

Ah...  Guess you agree with me about the validity of the Postmark
results. ;)

> Regards,
> jwb
>

-JRS
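P.S.  Since I mentioned it a couple of times above, here is a rough
sketch of the kind of longer, meta-data heavy FFSB profile I have in
mind.  I'm writing the parameter names from memory and the mount point
and sizes are made up, so treat it as illustrative only -- the profiles
shipped in the examples directory of the FFSB tarball are the
authoritative reference for the syntax:

  # run long enough to build a large fileset and real memory pressure
  time=600

  [filesystem0]
          location=/mnt/test
          num_files=100000
          num_dirs=1000
          min_filesize=4096
          max_filesize=65536
  [end0]

  [threadgroup0]
          num_threads=16
          # weight the mix toward meta-data operations
          create_weight=10
          append_weight=10
          delete_weight=5
          write_size=4096
          write_blocksize=4096
  [end0]

Something along those lines, with the per-file fsync option left out,
should show whether the ZFS meta-data numbers hold up over a bigger
fileset and a sustained run.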