Date: Thu, 30 Apr 2015 10:20:08 +1000
From: Dave Chinner
To: Mike Galbraith
Cc: Daniel Phillips, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, tux3@tux3.org,
	"Theodore Ts'o", OGAWA Hirofumi
Subject: Re: xfs: does mkfs.xfs require fancy switches to get decent
 performance? (was Tux3 Report: How fast can we fsync?)
Message-ID: <20150430002008.GY15810@dastard>
In-Reply-To: <1430334326.7360.25.camel@gmail.com>

On Wed, Apr 29, 2015 at 09:05:26PM +0200, Mike Galbraith wrote:
> Here's something that _might_ interest xfs folks.
>
> cd git (source repository of git itself)
> make clean
> echo 3 > /proc/sys/vm/drop_caches
> time make -j8 test
>
> ext4    2m20.721s
> xfs     6m41.887s   <-- ick
> btrfs   1m32.038s
> tux3    1m30.262s
>
> Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.

TL;DR: Results are *very different* on a 256GB Samsung 840 EVO SSD
with slightly slower CPUs (E5-4620 @ 2.20GHz), all filesystems using
defaults:

	          real         user         sys
	xfs     3m16.138s    7m8.341s     14m32.462s
	ext4    3m18.045s    7m7.840s     14m32.994s
	btrfs   3m45.149s    7m10.184s    16m30.498s

What you are seeing is physical seek distances impacting read
performance. XFS does not optimise for minimal physical seek distance,
and hence is slower than filesystems that do optimise for minimal seek
distance. This shows up especially well on slow single spindles. XFS
is *adequate* for use on slow single drives, but it is really designed
for best performance on storage hardware that is not seek distance
sensitive.

IOWs, XFS just hates your disk. Spend $50 and buy a cheap SSD and the
problem goes away. :)

----

And now in more detail.

It's easy to be fast on empty filesystems. XFS does not aim to be fast
in such situations - it aims to have consistent performance across the
life of the filesystem.

In this case, ext4, btrfs and tux3 have optimal allocation filling from
the outside of the disk, while XFS is spreading the files across (at
least) 4 separate regions of the whole disk. Hence XFS sees much larger
seek times on read than the other filesystems when the filesystem is
empty, as it is doing full disk seeks rather than being confined to the
outer edge of the spindle.
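If you want to see that layout difference for yourself, something along
these lines will show where the blocks actually end up. This is only a
sketch - it assumes a freshly made filesystem mounted at /mnt/test and
that filefrag, xfs_bmap and xfs_info (from e2fsprogs/xfsprogs) are
installed; the directory and file names are purely illustrative:

	cd /mnt/test

	# Create a few directories with a file in each, roughly the way a
	# source tree is laid out, then sync so the allocations are final.
	for i in $(seq 1 8); do
	    mkdir dir$i
	    dd if=/dev/zero of=dir$i/file bs=1M count=16 2>/dev/null
	done
	sync

	# Physical extent locations: on an empty ext4/btrfs these tend to
	# cluster near the front of the device, while XFS spreads them out.
	filefrag -v dir*/file

	# XFS-specific view: how many allocation groups the filesystem
	# has, and which AG each file's extents landed in.
	xfs_info /mnt/test | grep agcount
	for i in $(seq 1 8); do
	    xfs_bmap -v dir$i/file
	done

On XFS you should see the extents rotored across the allocation groups;
on the other filesystems the physical offsets will mostly sit near the
start of the (empty) device.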
Thing is, once you've abused those filesystems for a couple of months,
the files in ext4, btrfs and tux3 are not going to be laid out
perfectly on the outer edge of the disk. They'll be spread all over the
place, and so all the filesystems will be seeing large seeks on read.
The difference is that XFS will have roughly the same performance as
when the filesystem was empty, because spreading the allocation out
allows it to maintain better locality and separation and hence it
doesn't fragment free space nearly as badly as the other filesystems.
Free space fragmentation is what leads to performance degradation in
filesystems, and all the other filesystems will have degraded to be
*much worse* than XFS.

Put simply: empty filesystem benchmarking does not show the real
performance of the filesystem under sustained production workloads.
Hence benchmarks like this - while interesting from a theoretical point
of view and widely used for bragging about who's got the fastest - are
mostly irrelevant to determining how the filesystem will perform in
production environments.

We can also look at this algorithm in a different way: take a large
filesystem (say a few hundred TB) across a few tens of disks in a
linear concat. ext4, btrfs and tux3 will only hit the first disk in the
concat, and so go no faster, because they are still bound by physical
seek times. XFS, however, will spread the load across many (if not all)
of the disks, and so effectively reduce the average seek time by the
number of disks doing concurrent IO. Then you'll see that application
level IO concurrency becomes the performance limitation, not the
physical seek time of the hardware.

IOWs, what you don't see here is that the XFS algorithms that make your
test slow will keep *lots* of disks busy. i.e. testing empty filesystem
performance on a single, slow disk demonstrates that an algorithm
designed for scalability isn't designed to achieve physical seek
distance minimisation. Hence your storage makes XFS look particularly
poor in comparison to filesystems that are being designed and optimised
for the limitations of single slow spindles...

To further demonstrate that it is physical seek distance that is the
issue here, let's take the seek time out of the equation (e.g. use a
SSD). Doing that will result in basically no difference in performance
between all 4 filesystems, as performance will now be determined by
application level concurrency, and that is the same for all tests. e.g.
on a 16p, 16GB RAM VM with storage on SSDs, a "make -j 8" compile test
on a kernel source tree (using my normal test machine .config) gives:

	           real         user          sys
	xfs:     4m6.723s     26m21.087s    2m49.426s
	ext4:    4m11.415s    26m21.122s    2m49.786s
	btrfs:   4m8.118s     26m26.440s    2m50.357s

i.e. take seek times out of the picture, and XFS is just as fast as any
of the other filesystems.

Just about everyone I know uses SSDs in their laptops and machines that
build kernels these days, and spinning disks are rapidly disappearing
from enterprise and HPC environments, which also happen to be the
target markets for XFS. Hence filesystem performance on slow single
spindles is the furthest thing away from what we really need to
optimise XFS for. Indeed, I'll point you to where we are going with
fsync optimisation - it's completely the other end of the scale:

http://oss.sgi.com/archives/xfs/2014-06/msg00214.html

i.e. being able to scale effectively to tens of thousands of fsync
calls every second, because that's what applications like ceph and
gluster really need from XFS....
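To get a feel for what that kind of load looks like, a synthetic
fsync-heavy workload is easy to generate with fio. This is just an
illustrative sketch, not the benchmark behind the numbers in that
thread - the /mnt/test target, the job count and the file sizes are
made up:

	# 32 processes, each writing 4k blocks and fsyncing after every
	# write; the reported write IOPS is effectively the sustained
	# fsync completion rate of the filesystem under test.
	fio --name=fsyncers --directory=/mnt/test --ioengine=psync \
	    --rw=write --bs=4k --size=128m --fsync=1 \
	    --numjobs=32 --group_reporting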
> Are defaults for mkfs.xfs such that nobody sane uses them, or does xfs
> really hate whatever git selftests are doing this much?

It just hates your disk. Spend $50 and buy a cheap SSD and the problem
goes away. :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com