2007-08-30 06:16:51

by Jeffrey W. Baker

Subject: ZFS, XFS, and EXT4 compared

I have a lot of people whispering "zfs" in my virtual ear these days,
and at the same time I have an irrational attachment to xfs based
entirely on its lack of the 32000 subdirectory limit. I'm not afraid of
ext4's newness, since really a lot of that stuff has been in Lustre for
years. So a-benchmarking I went. Results at the bottom:

http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html

Short version: ext4 is awesome. zfs has absurdly fast metadata
operations but falls apart on sequential transfer. xfs has great
sequential transfer but really bad metadata ops, like 3 minutes to tar
up the kernel.

It would be nice if mke2fs would copy xfs's code for optimal layout on a
software raid. The mkfs defaults and the mdadm defaults interact badly.
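
(For the record, the workaround is to hand the raid geometry to mke2fs
yourself. The numbers below are purely illustrative -- say a 64k chunk
on an 8-disk raid5, i.e. 7 data disks and 4k blocks, so stride = 64k/4k
= 16 and stripe-width = 16 * 7 = 112; the stripe-width option needs a
reasonably recent e2fsprogs:

    mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=8 /dev/sd[b-i]
    mke2fs -j -E stride=16,stripe-width=112 /dev/md0

mkfs.xfs reads the geometry from the md device on its own, which is the
code I wish mke2fs would copy.)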

Postmark is a somewhat bogus benchmark with some obvious quantization
problems.

Regards,
jwb


2007-08-30 06:25:04

by Cyril Plisko

Subject: Re: ZFS, XFS, and EXT4 compared

Jeffrey,

it would be interesting to see your zpool layout info as well.
It can significantly influence the results obtained in the benchmarks.
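
For instance (hypothetical pools, made-up device names), a single raidz
vdev and a set of striped mirrors behave very differently under the same
benchmark:

    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0

The output of 'zpool status' would show which layout you ended up with.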



On 8/30/07, Jeffrey W. Baker <[email protected]> wrote:
> I have a lot of people whispering "zfs" in my virtual ear these days,
> and at the same time I have an irrational attachment to xfs based
> entirely on its lack of the 32000 subdirectory limit. I'm not afraid of
> ext4's newness, since really a lot of that stuff has been in Lustre for
> years. So a-benchmarking I went. Results at the bottom:
>
> http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
>
> Short version: ext4 is awesome. zfs has absurdly fast metadata
> operations but falls apart on sequential transfer. xfs has great
> sequential transfer but really bad metadata ops, like 3 minutes to tar
> up the kernel.
>
> It would be nice if mke2fs would copy xfs's code for optimal layout on a
> software raid. The mkfs defaults and the mdadm defaults interact badly.
>
> Postmark is somewhat bogus benchmark with some obvious quantization
> problems.
>
> Regards,
> jwb
>


--
Regards,
Cyril

2007-08-30 06:28:00

by mike

Subject: Re: [zfs-discuss] ZFS, XFS, and EXT4 compared

On 8/29/07, Jeffrey W. Baker <[email protected]> wrote:
> I have a lot of people whispering "zfs" in my virtual ear these days,
> and at the same time I have an irrational attachment to xfs based
> entirely on its lack of the 32000 subdirectory limit. I'm not afraid of
> ext4's newness, since really a lot of that stuff has been in Lustre for
> years. So a-benchmarking I went. Results at the bottom:
>
> http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
>
> Short version: ext4 is awesome. zfs has absurdly fast metadata
> operations but falls apart on sequential transfer. xfs has great
> sequential transfer but really bad metadata ops, like 3 minutes to tar
> up the kernel.
>
> It would be nice if mke2fs would copy xfs's code for optimal layout on a
> software raid. The mkfs defaults and the mdadm defaults interact badly.

this is cool to see. however, performance wouldn't be my reason for
moving to zfs. the inline checksumming and all that is what i want. if
someone could get a nearly incorruptible filesystem (or just a linux
version of zfs... btrfs looks promising) that would be even better.

sadly, ext4+swraid isn't as good. i might have tried that otherwise,
since the wait for the right hardware support for zfs seems open-ended
for me at this point.

2007-08-30 07:07:46

by Nathan Scott

Subject: Re: ZFS, XFS, and EXT4 compared

On Wed, 2007-08-29 at 23:16 -0700, Jeffrey W. Baker wrote:
> ... xfs has great
> sequential transfer but really bad metadata ops, like 3 minutes to tar
> up the kernel.

Perhaps this is due to the write barrier support - it would be
interesting to try a run with the "-o nobarrier" mount option for XFS.
With external logs, write barriers are automatically disabled, which may
explain: "Oddly XFS has better sequential reads when using an external
journal, which makes little sense."

To improve metadata performance, you have many options with XFS (which
ones are useful depends on the type of metadata workload) - you can try
a v2 format log, and mount with "-o logbsize=256k", try increasing the
directory block size (e.g. mkfs.xfs -nsize=16k, etc), and also the log
size (mkfs.xfs -lsize=XXXXXXb).
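
For example (sizes are only illustrative, adjust for your array, and
only use nobarrier if the write cache is non-volatile):

    mkfs.xfs -f -l version=2,size=128m -n size=16k /dev/md0
    mount -o logbsize=256k,nobarrier /dev/md0 /mnt/test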

Have fun!

cheers.

--
Nathan

2007-08-30 13:20:15

by Christoph Hellwig

Subject: Re: ZFS, XFS, and EXT4 compared

On Thu, Aug 30, 2007 at 05:07:46PM +1000, Nathan Scott wrote:
> To improve metadata performance, you have many options with XFS (which
> ones are useful depends on the type of metadata workload) - you can try
> a v2 format log, and mount with "-o logbsize=256k", try increasing the
> directory block size (e.g. mkfs.xfs -nsize=16k, etc), and also the log
> size (mkfs.xfs -lsize=XXXXXXb).

Okay, these suggestions have come up once too often now. A v2 log and
large logs/log buffers are the almost universal suggestions, and we
really need to make them the defaults. XFS is already the laughing stock
of the Linux community due to its absurdly bad default settings.

2007-08-30 13:37:45

by Jose R. Santos

Subject: Re: ZFS, XFS, and EXT4 compared

On Wed, 29 Aug 2007 23:16:51 -0700
"Jeffrey W. Baker" <[email protected]> wrote:

Nice comparisons.

> I have a lot of people whispering "zfs" in my virtual ear these days,
> and at the same time I have an irrational attachment to xfs based
> entirely on its lack of the 32000 subdirectory limit. I'm not afraid of

The 32000 subdir limit should be fixed in the latest rc kernels.

> ext4's newness, since really a lot of that stuff has been in Lustre for
> years. So a-benchmarking I went. Results at the bottom:
>
> http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html

FFSB:
Could you send the patch to fix the FFSB Solaris build? I should
probably update the SourceForge version so that it builds out of the box.

I'm also curious about your choices in the FFSB profiles you created,
specifically the very short run time and doing fsync after every file
close. When using FFSB, I usually run with a large run time (usually
600 seconds) to make sure that we do enough IO to get a stable result.
Running longer also means that we use more of the disk storage, so our
results are not based on doing IO to just the beginning of the disk.
When running for that long a period of time, the fsync flag is not
required, since we do enough reads and writes to cause memory pressure
and guarantee IO going to disk. Nothing wrong with what you did, but I
wonder how it would affect the results of these runs.

The agefs options you use are also interesting, since you only utilize a
very small percentage of your filesystem. Also note that since the
create and append weights are very heavy compared to deletes, the
desired utilization would be reached very quickly and without that much
fragmentation. Again, nothing wrong here, just very interested in your
perspective on selecting these settings for your profile.

Postmark:

I've been looking at the postmark results and I'm becoming more
convinced that the meta-data results for ZFS may be artificially high
due to the nature of the workload. For one thing, I find it very
interesting (i.e. odd) that 9050KB/s reads and 28360KB/s writes show up
multiple times, even across filesystems. The data set in postmark is
also very limited in size, and the run times are small enough that it is
difficult to get an idea of sustained meta-data performance on any of
the filesystems. Based on the ZFS numbers, it seems that there is hardly
any IO being done in the ZFS case, given the random nature of the
workload and the high numbers it's achieving.

In short, I don't think postmark is a very good workload to sustain
ZFS's claim as the meta-data king. It may very well be the case, but I
would like to see that proven with another workload, one that actually
shows sustained meta-data performance across a fairly large fileset.
FFSB could be used to simulate a meta-data intensive workload as well,
and it has better control over the fileset size and run time, which
makes the results more interesting.

I don't mean to invalidate the Postmark results, merely to point out a
possible error in the assessment of the meta-data performance of ZFS.
I say possible since it's still unknown whether another workload will be
able to validate these results.

General:
Did you gather CPU statistics when running these benchmarks? For some
environments, the ratio of filesystem performance to CPU utilization
would be good information to have, since some workloads are CPU
sensitive, and being 20% faster while consuming 50% more CPU may not
necessarily be a good thing. While this may be less of an issue in the
future, since CPU performance seems to be increasing at a much faster
pace than IO and disk performance, it would still be another interesting
data point.

>
> Short version: ext4 is awesome. zfs has absurdly fast metadata
> operations but falls apart on sequential transfer. xfs has great
> sequential transfer but really bad metadata ops, like 3 minutes to tar
> up the kernel.
>
> It would be nice if mke2fs would copy xfs's code for optimal layout on a
> software raid. The mkfs defaults and the mdadm defaults interact badly.
>
> Postmark is somewhat bogus benchmark with some obvious quantization
> problems.

Ah... Guess you agree with me about the validity of the postmark
results. ;)

> Regards,
> jwb
>


-JRS

2007-08-30 18:33:48

by Jim Mauro

Subject: Re: ZFS, XFS, and EXT4 compared


I'll take a look at this. ZFS provides outstanding sequential IO
performance (both read and write). In my testing, I can essentially
sustain "hardware speeds" with ZFS on sequential loads. That is,
assuming 30-60MB/sec per disk of sequential IO capability (depending on
hitting inner or outer cylinders), I get linear scale-up on sequential
loads as I add disks to a zpool, e.g. I can sustain 250-300MB/sec on a
6 disk zpool, and it's pretty consistent for raidz and raidz2.

Your numbers are in the 50-90MB/second range, or roughly 1/2 to 1/4 of
what was measured on the other two filesystems for the same test. Very odd.

Still looking...

Thanks,
/jim

Jeffrey W. Baker wrote:
> I have a lot of people whispering "zfs" in my virtual ear these days,
> and at the same time I have an irrational attachment to xfs based
> entirely on its lack of the 32000 subdirectory limit. I'm not afraid of
> ext4's newness, since really a lot of that stuff has been in Lustre for
> years. So a-benchmarking I went. Results at the bottom:
>
> http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
>
> Short version: ext4 is awesome. zfs has absurdly fast metadata
> operations but falls apart on sequential transfer. xfs has great
> sequential transfer but really bad metadata ops, like 3 minutes to tar
> up the kernel.
>
> It would be nice if mke2fs would copy xfs's code for optimal layout on a
> software raid. The mkfs defaults and the mdadm defaults interact badly.
>
> Postmark is somewhat bogus benchmark with some obvious quantization
> problems.
>
> Regards,
> jwb
>

2007-08-30 18:52:11

by Jeffrey W. Baker

Subject: Re: ZFS, XFS, and EXT4 compared

On Thu, 2007-08-30 at 08:37 -0500, Jose R. Santos wrote:
> On Wed, 29 Aug 2007 23:16:51 -0700
> "Jeffrey W. Baker" <[email protected]> wrote:
> > http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
>
> FFSB:
> Could you send the patch to fix FFSB Solaris build? I should probably
> update the Sourceforge version so that it built out of the box.

Sadly I blew away OpenSolaris without preserving the patch, but the gist
of it is this: ctime_r takes three parameters on Solaris (the third is
the buffer length) and Solaris has directio(3c) instead of O_DIRECT.

> I'm also curious about your choices in the FFSB profiles you created.
> Specifically, the very short run time and doing fsync after every file
> close. When using FFSB, I usually run with a large run time (usually
> 600 seconds) to make sure that we do enough IO to get a stable
> result.

With a 1GB machine and max I/O of 200MB/s, I assumed 30 seconds would be
enough for the machine to quiesce. You disagree? The fsync flag is in
there because my primary workload is PostgreSQL, which is entirely
synchronous.

> Running longer means that we also use more of the disk
> storage and our results are not base on doing IO to just the beginning
> of the disk. When running for that long period of time, the fsync flag
> is not required since we do enough reads and writes to cause memory
> pressure and guarantee IO going to disk. Nothing wrong in what you
> did, but I wonder how it would affect the results of these runs.

So do I :) I did want to finish the test in a practical amount of time,
and it takes 4 hours for the RAID to build. I will do a few hours-long
runs of ffsb with Ext4 and see what it looks like.

> The agefs options you use are also interesting since you only utilize a
> very small percentage of your filesystem. Also note that since create
> and append weight are very heavy compare to deletes, the desired
> utilization would be reach very quickly and without that much
> fragmentation. Again, nothing wrong here, just very interested in your
> perspective in selecting these setting for your profile.

The aging takes forever, as you are no doubt already aware. It requires
at least 1 minute for 1% utilization. On a longer run, I can do more
aging. The create and append weights are taken from the README.

> Don't mean to invalidate the Postmark results, just merely pointing out
> a possible error in the assessment of the meta-data performance of ZFS.
> I say possible since it's still unknown if another workload will be
> able to validate these results.

I don't want to pile scorn on XFS, but the postmark workload was chosen
for a reasonable run time on XFS, and then it turned out that it runs in
1-2 seconds on the other filesystems. The scaling factors could have
been better chosen to exercise the high speeds of Ext4 and ZFS. The
test needs to run for more than a minute to get meaningful results from
postmark, since it uses truncated whole number seconds as the
denominator when reporting.

One thing that stood out from the postmark results is how ext4/sw has a
weird inverse scaling with respect to the number of subdirectories.
It's faster with 10000 files in 1 directory than with 100 files each in
100 subdirectories. Odd, no?

> Did you gathered CPU statistics when running these benchmarks?

I didn't bother. If you buy a server these days and it has fewer than
four CPUs, you got ripped off.

-jwb

2007-08-30 18:57:28

by Eric Sandeen

Subject: Re: ZFS, XFS, and EXT4 compared

Christoph Hellwig wrote:
> On Thu, Aug 30, 2007 at 05:07:46PM +1000, Nathan Scott wrote:
>> To improve metadata performance, you have many options with XFS (which
>> ones are useful depends on the type of metadata workload) - you can try
>> a v2 format log, and mount with "-o logbsize=256k", try increasing the
>> directory block size (e.g. mkfs.xfs -nsize=16k, etc), and also the log
>> size (mkfs.xfs -lsize=XXXXXXb).
>
> Okay, these suggestions are one too often now. v2 log and large logs/log
> buffers are the almost universal suggestions, and we really need to make
> these defaults. XFS is already the laughing stock of the Linux community
> due to it's absurdely bad default settings.

Agreed on reevaluating the defaults, Christoph!

barrier seems to hurt badly on xfs, too. Note: barrier is off by
default on ext[34], so if you want apples to apples there, you need to
change one or the other filesystem's mount options. If your write cache
is safe (battery backed?) you may as well turn barriers off. I'm not
sure offhand who will react more poorly to an evaporating write cache
(with no barriers), ext4 or xfs...
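
Levelling the playing field is just a mount option on either side,
something like (devices purely illustrative):

    mount -o nobarrier /dev/md0 /test    # turn barriers off for xfs
    mount -o barrier=1 /dev/md0 /test    # or turn them on for ext3/4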

-Eric

2007-08-30 19:07:59

by eric kustarz

Subject: Re: ZFS, XFS, and EXT4 compared


On Aug 29, 2007, at 11:16 PM, Jeffrey W. Baker wrote:

> I have a lot of people whispering "zfs" in my virtual ear these days,
> and at the same time I have an irrational attachment to xfs based
> entirely on its lack of the 32000 subdirectory limit. I'm not afraid of
> ext4's newness, since really a lot of that stuff has been in Lustre for
> years. So a-benchmarking I went. Results at the bottom:
>
> http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
>
> Short version: ext4 is awesome. zfs has absurdly fast metadata
> operations but falls apart on sequential transfer. xfs has great
> sequential transfer but really bad metadata ops, like 3 minutes to tar
> up the kernel.
>
> It would be nice if mke2fs would copy xfs's code for optimal layout on a
> software raid. The mkfs defaults and the mdadm defaults interact badly.
>
> Postmark is somewhat bogus benchmark with some obvious quantization
> problems.
>
> Regards,
> jwb
>

Hey jwb,

Thanks for taking up the task. It's benchmarking, so I've got some
questions...

What does it mean to have an external vs. internal journal for ZFS?

Can you show the output of 'zpool status' when using software RAID
vs. hardware RAID for ZFS?

The hardware RAID has a cache on the controller. ZFS will flush the
"cache" when pushing out a txg (essentially before and after writing
out the uberblock). When you have a non-volatile cache with battery
backing (such as your setup), it's safe to disable that by putting
'set zfs:zfs_nocacheflush = 1' in /etc/system and rebooting. It's ugly,
but we're going through the final code review of a fix for this (it's
partly that we aren't sending down the right command, and partly that
even if we did, no storage devices actually support it quite yet).
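
Concretely, that's just (and again, only with a truly non-volatile,
battery-backed cache):

    echo 'set zfs:zfs_nocacheflush = 1' >> /etc/system    # then reboot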

What parameters did you give bonnie++? Compiled 64-bit, right?

For the randomio test, it looks like you used an io_size of 4KB. Are
those aligned? random? How big is the '/dev/sdb' file?

Do you have the parameters given to FFSB?

eric

2007-08-30 19:09:01

by Jeffrey W. Baker

Subject: Re: ZFS, XFS, and EXT4 compared

On Thu, 2007-08-30 at 13:57 -0500, Eric Sandeen wrote:
> Christoph Hellwig wrote:
> > On Thu, Aug 30, 2007 at 05:07:46PM +1000, Nathan Scott wrote:
> >> To improve metadata performance, you have many options with XFS (which
> >> ones are useful depends on the type of metadata workload) - you can try
> >> a v2 format log, and mount with "-o logbsize=256k", try increasing the
> >> directory block size (e.g. mkfs.xfs -nsize=16k, etc), and also the log
> >> size (mkfs.xfs -lsize=XXXXXXb).
> >
> > Okay, these suggestions are one too often now. v2 log and large logs/log
> > buffers are the almost universal suggestions, and we really need to make
> > these defaults. XFS is already the laughing stock of the Linux community
> > due to it's absurdely bad default settings.
>
> Agreed on reevaluating the defaults, Christoph!
>
> barrier seems to hurt badly on xfs, too. Note: barrier is off by
> default on ext[34], so if you want apples to apples there, you need to
> change one or the other filesystem's mount options. If your write cache
> is safe (battery backed?) you may as well turn barriers off. I'm not
> sure offhand who will react more poorly to an evaporating write cache
> (with no barriers), ext4 or xfs...

I didn't compare the safety of the three filesystems, but I did have
disk caches disabled and only battery-backed caches enabled. Do you
need barriers without volatile caches?

Most people benchmark ext3 with data=writeback, which is unsafe. I used
data=ordered (the default).

I think if you look at all the features, zfs is theoretically the safest
filesystem. But in practice, who knows?

-jwb

2007-08-30 19:13:52

by Eric Sandeen

Subject: Re: ZFS, XFS, and EXT4 compared

Jeffrey W. Baker wrote:
> On Thu, 2007-08-30 at 13:57 -0500, Eric Sandeen wrote:

>> barrier seems to hurt badly on xfs, too. Note: barrier is off by
>> default on ext[34], so if you want apples to apples there, you need to
>> change one or the other filesystem's mount options. If your write cache
>> is safe (battery backed?) you may as well turn barriers off. I'm not
>> sure offhand who will react more poorly to an evaporating write cache
>> (with no barriers), ext4 or xfs...
>
> I didn't compare the safety of the three filesystems,

Understood

> but I did have
> disk caches disabled

Oh, so for the SW raid tests the individual disks had no write cache?

> and only battery-backed caches enabled. Do you
> need barriers without volatile caches?

As far as I understand it, no - you don't need barriers then, and you're
only hurting performance with them.

-Eric

2007-08-30 19:53:37

by Jose R. Santos

Subject: Re: ZFS, XFS, and EXT4 compared

On Thu, 30 Aug 2007 11:52:10 -0700
"Jeffrey W. Baker" <[email protected]> wrote:

> On Thu, 2007-08-30 at 08:37 -0500, Jose R. Santos wrote:
> > On Wed, 29 Aug 2007 23:16:51 -0700
> > "Jeffrey W. Baker" <[email protected]> wrote:
> > > http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
> >
> > FFSB:
> > Could you send the patch to fix FFSB Solaris build? I should probably
> > update the Sourceforge version so that it built out of the box.
>
> Sadly I blew away OpenSolaris without preserving the patch, but the gist
> of it is this: ctime_r takes three parameters on Solaris (the third is
> the buffer length) and Solaris has directio(3c) instead of O_DIRECT.

If you ever run these workloads again, a tested patch would be greatly
appreciated, since I do not currently have access to an OpenSolaris box.

> > I'm also curious about your choices in the FFSB profiles you created.
> > Specifically, the very short run time and doing fsync after every file
> > close. When using FFSB, I usually run with a large run time (usually
> > 600 seconds) to make sure that we do enough IO to get a stable
> > result.
>
> With a 1GB machine and max I/O of 200MB/s, I assumed 30 seconds would be
> enough for the machine to quiesce. You disagree? The fsync flag is in
> there because my primary workload is PostgreSQL, which is entirely
> synchronous.

In your results you mentioned that you are able to get about 150MB/s out
of the RAID controller, yet here you say you're getting about 200MB/s in
FFSB? That probably means you did need to run for an extended period of
time, since a lot of the IO could be getting satisfied from the page
cache. You could verify that you get the same results by doing one of
the runs with a larger run time and comparing it to one of the previous
runs.

The fsync flag only does an fsync at file close time, not at each IO
transaction on a selected file. For the purposes of testing PostgreSQL,
wouldn't testing with O_DIRECT be closer to what you are looking for?

> > Running longer means that we also use more of the disk
> > storage and our results are not base on doing IO to just the beginning
> > of the disk. When running for that long period of time, the fsync flag
> > is not required since we do enough reads and writes to cause memory
> > pressure and guarantee IO going to disk. Nothing wrong in what you
> > did, but I wonder how it would affect the results of these runs.
>
> So do I :) I did want to finish the test in a practical amount of time,
> and it takes 4 hours for the RAID to build. I will do a few hours-long
> runs of ffsb with Ext4 and see what it looks like.

Been there. I feel your pain. :)

> > The agefs options you use are also interesting since you only utilize a
> > very small percentage of your filesystem. Also note that since create
> > and append weight are very heavy compare to deletes, the desired
> > utilization would be reach very quickly and without that much
> > fragmentation. Again, nothing wrong here, just very interested in your
> > perspective in selecting these setting for your profile.
>
> The aging takes forever, as you are no doubt already aware. It requires
> at least 1 minute for 1% utilization. On a longer run, I can do more
> aging. The create and append weights are taken from the README.

Yes, it does take forever, but since you're doing so very little aging,
why even run it in the first place? It will make your runs go faster if
you just don't use it. :)

Did such a small amount of aging create a noticeable difference in the
results? It may have, since I've never run aging with such a small run
time myself.

> > Don't mean to invalidate the Postmark results, just merely pointing out
> > a possible error in the assessment of the meta-data performance of ZFS.
> > I say possible since it's still unknown if another workload will be
> > able to validate these results.
>
> I don't want to pile scorn on XFS, but the postmark workload was chosen
> for a reasonable run time on XFS, and then it turned out that it runs in
> 1-2 seconds on the other filesystems. The scaling factors could have
> been better chosen to exercise the high speeds of Ext4 and ZFS. The
> test needs to run for more than a minute to get meaningful results from
> postmark, since it uses truncated whole number seconds as the
> denominator when reporting.
>
> One thing that stood out from the postmark results is how ext4/sw has a
> weird inverse scaling with respect to the number of subdirectories.
> It's faster with 10000 files in 1 directory than with 100 files each in
> 100 subdirectories. Odd, no?

Not so weird, since the inode allocator tries to spread directory inodes
across multiple block groups, which can cause larger seeks on very
meta-data intensive workloads. I'm actually working on a feature to
address this sort of issue in ext4.

Granted, if you really wanted to simulate file server performance, you
would want to start your workload with a huge fileset to begin with, so
that the data is spread across a larger chunk of the disk. The
performance deficiencies this benchmark shows on a clean ext4 filesystem
should then be a lot less noticeable.

Yet another reason why I don't particularly like postmark.

> > Did you gathered CPU statistics when running these benchmarks?
>
> I didn't bother. If you buy a server these days and it has fewer than
> four CPUs, you got ripped off.

At the same time, you can order a server with several dual-port 4Gb
fibre channel cards and really big, expensive disk arrays with lots of
fast write cache. There you could see the negative effects of a CPU-hog
filesystem.

For a desktop or a relatively small server setup, I agree that the CPU
utilization of a filesystem would be mostly insignificant/irrelevant.

> -jwb
>


-JRS

2007-08-30 22:41:20

by Nathan Scott

Subject: Re: ZFS, XFS, and EXT4 compared

[culled zfs-discuss from CC, since it's subscriber-only]

On Thu, 2007-08-30 at 14:20 +0100, Christoph Hellwig wrote:
>
> On Thu, Aug 30, 2007 at 05:07:46PM +1000, Nathan Scott wrote:
> > To improve metadata performance, you have many options with XFS (which
> > ones are useful depends on the type of metadata workload) - you can try
> > a v2 format log, and mount with "-o logbsize=256k", try increasing the
> > directory block size (e.g. mkfs.xfs -nsize=16k, etc), and also the log
> > size (mkfs.xfs -lsize=XXXXXXb).
>
> Okay, these suggestions are one too often now. v2 log and large logs/log
> buffers are the almost universal suggestions, and we really need to make
> these defaults.

Possibly. Far more importantly for XFS, there really needs to be some
way for RAID drivers to say "even though I support write barriers, it's
not a good idea for filesystems to enable write barriers by default on
me". Enabling write barriers everywhere, by default, seems to have a far
worse impact than any mkfs/mount option tweaking.

> XFS is already the laughing stock of the Linux community
> due to it's absurdely bad default settings.

Oh, _that's_ what everyone's laughing at?

cheers.

--
Nathan