2010-04-22 21:38:14

by Steve Brown

[permalink] [raw]
Subject: ext4 benchmark questions

I'm in the process of evaluating various storage options for a large
array (12TB) I'm creating.  First, I will preface all of this by
saying that I understand the note in the kernel docs about comparing
file systems under various workloads, and I acknowledge that my exact
methodology isn't perfect. But it works for what I'm doing. :) This
array will be used for storage of large media files (up to 20-30GB per
file). I'm testing using iozone with various file sizes ranging from
4GB to 32GB. I'm pretty much settled on a RAID50 (128kb stripe size)
running ext4 on top of LVM (for snapshots, future expansion, etc.).
I'm running kernel 2.6.33.2, e2fsprogs 1.41.11 and util-linux-ng 2.16.

The file system in question was created with the following options:

mkfs -t ext4 -T large -i 524288 -b 4096 -I 256 -E
stride=32,stripe-width=192 /dev/vg/lv
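(For anyone checking the math: stride is the RAID chunk size divided by
the filesystem block size, and stripe-width is stride times the number of
data-bearing disks.  The 6-disk figure below is only inferred from
192 / 32, not something reported by the array.)

```shell
# stride = RAID chunk size / ext4 block size
# stripe-width = stride * number of data-bearing disks
chunk_kb=128    # RAID chunk ("128kb stripe size" on the controller), in KB
block_kb=4      # ext4 block size (-b 4096), in KB
data_disks=6    # inferred from stripe-width / stride = 192 / 32
stride=$(( chunk_kb / block_kb ))
stripe_width=$(( stride * data_disks ))
echo "stride=$stride stripe-width=$stripe_width"
```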

Currently, I'm testing the effect of various mount options on an ext4
file system and my results are not what I would have expected based on
the docs I have read. I wanted to bounce some of them off the list to
find out if I'm completely missing something, or if my expectations
were off.
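Roughly, each run looks like the sketch below; the device, mount point,
and iozone flags here are illustrative, not my exact invocations.

```shell
# Simplified sketch of one test pass per mount-option set.  Device path,
# mount point, and iozone parameters are illustrative assumptions.
DEV=/dev/vg/lv
MNT=/mnt/array
for opts in noatime barrier=0 data=writeback commit=90; do
    mount -t ext4 -o "$opts" "$DEV" "$MNT"
    # -i 0 = sequential write/rewrite, -i 1 = sequential read/reread;
    # -e includes flush (fsync) time in the write numbers
    iozone -e -i 0 -i 1 -r 1m -s 16g -f "$MNT/iozone.tmp"
    umount "$MNT"
done
```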

I'll start with the craziest one: noatime. Everything I have read
says that the noatime option should increase both read and write
performance. My results are finding that write speeds are comparable
with or without this option, but read speeds are significantly faster
*without* the noatime option. For example, a 16GB file reads about
210MB/s with noatime but reads closer to 250MB/s without the noatime
option.

Next is the write barrier.  I'm in a fully battery-backed
environment, so I'm not worried about disabling it. From my testing,
setting barrier=0 will improve write performance on large files
(>10GB), but hurts performance on smaller files (<10GB). Read
performance is affected similarly.  Is this to be expected with files
of this size?

Next is the data option. I am seeing a significant increase in read
performance when using data=ordered vs data=writeback. Reading is as
much as 20% faster when using data=ordered. The difference in write
performance is almost none with this option.

Finally is the commit option. I did my testing mounting with commit=5
and commit=90. While my read performance increased with commit=90, my
write performance improved by as much as 30% or more with commit=5.

As I said, I'm looking for some help interpreting these results more
than anything. Any insight into these results that can be provided
would be appreciated.

Steve


2010-04-22 21:52:18

by Eric Sandeen

[permalink] [raw]
Subject: Re: ext4 benchmark questions

Steve Brown wrote:
...

> I'll start with the craziest one: noatime. Everything I have read
> says that the noatime option should increase both read and write
> performance. My results are finding that write speeds are comparable
> with or without this option, but read speeds are significantly faster
> *without* the noatime option. For example, a 16GB file reads about
> 210MB/s with noatime but reads closer to 250MB/s without the noatime
> option.

the kernel uses "relatime" now by default, which gives you most of the
benefit already.

> Next is the write barrier.  I'm in a fully battery-backed
> environment, so I'm not worried about disabling it. From my testing,
> setting barrier=0 will improve write performance on large files
> (>10GB), but hurts performance on smaller files (<10GB). Read
> performance is affected similarly.  Is this to be expected with files
> of this size?

not expected by me; barriers == drive write cache flushes, which I
would never expect to speed things up...

> Next is the data option. I am seeing a significant increase in read
> performance when using data=ordered vs data=writeback. Reading is as
> much as 20% faster when using data=ordered. The difference in write
> performance is almost none with this option.

data=writeback is not safe for data integrity; unless you can handle
scrambled files post-crash/powerloss, don't use it.

> Finally is the commit option. I did my testing mounting with commit=5
> and commit=90. While my read performance increased with commit=90, my
> write performance improved by as much as 30% or more with commit=5.

not sure offhand what to make of decreased write performance with a longer
commit time...

-Eric

2010-04-22 22:11:34

by Steve Brown

[permalink] [raw]
Subject: Re: ext4 benchmark questions

>> I'll start with the craziest one: noatime.  Everything I have read
>> says that the noatime option should increase both read and write
>> performance.  My results are finding that write speeds are comparable
>> with or without this option, but read speeds are significantly faster
>> *without* the noatime option.  For example, a 16GB file reads about
>> 210MB/s with noatime but reads closer to 250MB/s without the noatime
>> option.
>
> the kernel uses "relatime" now by default, which gives you most of the
> benefit already.

So should I see any performance change by using the noatime mount option at all?

>> Next is the write barrier.  I'm in a fully battery-backed
>> environment, so I'm not worried about disabling it.  From my testing,
>> setting barrier=0 will improve write performance on large files
>> (>10GB), but hurts performance on smaller files (<10GB).  Read
>> performance is affected similarly.  Is this to be expected with files
>> of this size?
>
> not expected by me; barriers == drive write cache flushes, which I
> would never expect to speed things up...

hmmm... this would seem to conflict with the docs in the kernel, especially:

"Write barriers enforce proper on-disk ordering
of journal commits, making volatile disk write caches
safe to use, at some performance penalty. If
your disks are battery-backed in one way or another,
disabling barriers may safely improve performance."

>> Next is the data option.  I am seeing a significant increase in read
>> performance when using data=ordered vs data=writeback.  Reading is as
>> much as 20% faster when using data=ordered.  The difference in write
>> performance is almost none with this option.
>
> data=writeback is not safe for data integrity; unless you can handle
> scrambled files post-crash/powerloss, don't use it.

I'm not worried about powerloss.  The kernel docs seem to imply that
data=[journal,ordered] comes with a performance hit.  My results
would indicate otherwise.  Should I be seeing this kind of
performance difference?

>> Finally is the commit option.  I did my testing mounting with commit=5
>> and commit=90.  While my read performance increased with commit=90, my
>> write performance improved by as much as 30% or more with commit=5.
>
> not sure offhand what to make of decreased write performance with a longer
> commit time...

Steve

2010-04-22 22:20:54

by Eric Sandeen

[permalink] [raw]
Subject: Re: ext4 benchmark questions

Steve Brown wrote:
>>> I'll start with the craziest one: noatime. Everything I have read
>>> says that the noatime option should increase both read and write
>>> performance. My results are finding that write speeds are comparable
>>> with or without this option, but read speeds are significantly faster
>>> *without* the noatime option. For example, a 16GB file reads about
>>> 210MB/s with noatime but reads closer to 250MB/s without the noatime
>>> option.
>> the kernel uses "relatime" now by default, which gives you most of the
>> benefit already.
>
> So should I see any performance change by using the noatime mount option at all?

they are not exactly the same thing, so noatime may be -slightly-
faster in some cases than relatime.

>>> Next is the write barrier.  I'm in a fully battery-backed
>>> environment, so I'm not worried about disabling it. From my testing,
>>> setting barrier=0 will improve write performance on large files
>>> (>10GB), but hurts performance on smaller files (<10GB). Read
>>> performance is affected similarly.  Is this to be expected with files
>>> of this size?
>> not expected by me; barriers == drive write cache flushes, which I
>> would never expect to speed things up...
>
> hmmm... this would seem to conflict with the docs in the kernel, especially:
>
> "Write barriers enforce proper on-disk ordering
> of journal commits, making volatile disk write caches
> safe to use, at some performance penalty. If
> your disks are battery-backed in one way or another,
> disabling barriers may safely improve performance."

what you saw is in conflict with what is expected, yes; I don't know
why barriers would ever increase performance.

(my description of barriers as drive write caches isn't in conflict
with the docs, I just said how they're implemented)

>>> Next is the data option. I am seeing a significant increase in read
>>> performance when using data=ordered vs data=writeback. Reading is as
>>> much as 20% faster when using data=ordered. The difference in write
>>> performance is almost none with this option.
>> data=writeback is not safe for data integrity; unless you can handle
>> scrambled files post-crash/powerloss, don't use it.
>
> I'm not worried about powerloss.  The kernel docs seem to imply that
> data=[journal,ordered] comes with a performance hit.  My results
> would indicate otherwise.  Should I be seeing this kind of
> performance difference?

Sorry, I misread... I also don't know why reading would be much affected
at all by the journalling mode, which journals -writes- (reading can
update metadata, but not much, esp. if you have noatime/relatime).

-Eric

>>> Finally is the commit option. I did my testing mounting with commit=5
>>> and commit=90. While my read performance increased with commit=90, my
>>> write performance improved by as much as 30% or more with commit=5.
>> not sure offhand what to make of decreased write performance with a longer
>> commit time...
>
> Steve


2010-04-23 14:42:21

by Ric Wheeler

[permalink] [raw]
Subject: Re: ext4 benchmark questions

On 04/22/2010 06:20 PM, Eric Sandeen wrote:
> Steve Brown wrote:
>
>>>> I'll start with the craziest one: noatime. Everything I have read
>>>> says that the noatime option should increase both read and write
>>>> performance. My results are finding that write speeds are comparable
>>>> with or without this option, but read speeds are significantly faster
>>>> *without* the noatime option. For example, a 16GB file reads about
>>>> 210MB/s with noatime but reads closer to 250MB/s without the noatime
>>>> option.
>>>>
>>> the kernel uses "relatime" now by default, which gives you most of the
>>> benefit already.
>>>
>> So should I see any performance change by using the noatime mount option at all?
>>
> they are not exactly the same thing, so noatime may be -slightly-
> faster in some cases than relatime.
>
>
>>>> Next is the write barrier.  I'm in a fully battery-backed
>>>> environment, so I'm not worried about disabling it. From my testing,
>>>> setting barrier=0 will improve write performance on large files
>>>> (>10GB), but hurts performance on smaller files (<10GB). Read
>>>> performance is affected similarly.  Is this to be expected with files
>>>> of this size?
>>>>
>>> not expected by me; barriers == drive write cache flushes, which I
>>> would never expect to speed things up...
>>>
>> hmmm... this would seem to conflict with the docs in the kernel, especially:
>>
>> "Write barriers enforce proper on-disk ordering
>> of journal commits, making volatile disk write caches
>> safe to use, at some performance penalty. If
>> your disks are battery-backed in one way or another,
>> disabling barriers may safely improve performance."
>>
> what you saw is in conflict with what is expected, yes; I don't know
> why barriers would ever increase performance.
>
> (my description of barriers as drive write caches isn't in conflict
> with the docs, I just said how they're implemented)
>

Barriers, when working, should never make things faster; at best, we
should have parity.

Also important to note that barriers should be disabled if your hardware
RAID card exports itself as a "write through" cache, even if you enable
barriers on the command line.
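(A hedged example: on megaraid_sas controllers the current cache policy
can usually be checked with LSI's MegaCli utility; the exact binary name
and install path vary and are assumptions here.)

```shell
# Show the logical drive's current cache policy (binary path varies by
# install).  "WriteThrough" in the output means barriers add overhead
# without adding safety on this controller.
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | grep -i 'Cache Policy'
```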

What controller are you using and what kind of drives do you have in the
back end?

ric


>
>>>> Next is the data option. I am seeing a significant increase in read
>>>> performance when using data=ordered vs data=writeback. Reading is as
>>>> much as 20% faster when using data=ordered. The difference in write
>>>> performance is almost none with this option.
>>>>
>>> data=writeback is not safe for data integrity; unless you can handle
>>> scrambled files post-crash/powerloss, don't use it.
>>>
>> I'm not worried about powerloss.  The kernel docs seem to imply that
>> data=[journal,ordered] comes with a performance hit.  My results
>> would indicate otherwise.  Should I be seeing this kind of
>> performance difference?
>>
> Sorry, I misread... I also don't know why reading would be much affected
> at all by the journalling mode, which journals -writes- (reading can
> update metadata, but not much, esp. if you have noatime/relatime).
>
> -Eric
>
>
>>>> Finally is the commit option. I did my testing mounting with commit=5
>>>> and commit=90. While my read performance increased with commit=90, my
>>>> write performance improved by as much as 30% or more with commit=5.
>>>>
>>> not sure offhand what to make of decreased write performance with a longer
>>> commit time...
>>>
>> Steve
>>


2010-04-23 15:38:50

by Steve Brown

[permalink] [raw]
Subject: Re: ext4 benchmark questions

>>>> not expected by me; barriers == drive write cache flushes, which I
>>>> would never expect to speed things up...
>>>>
>>>
>>> hmmm... this would seem to conflict with the docs in the kernel,
>>> especially:
>>>
>>> "Write barriers enforce proper on-disk ordering
>>> of journal commits, making volatile disk write caches
>>> safe to use, at some performance penalty.  If
>>> your disks are battery-backed in one way or another,
>>> disabling barriers may safely improve performance."
>>>
>>
>> what you saw is in conflict with what is expected, yes; I don't know
>> why barriers would ever increase performance.
>>
>> (my description of barriers as drive write caches isn't in conflict
>> with the docs, I just said how they're implemented)
>>
>
> Barriers when working should never make things faster, at best, we should
> have parity.
>
> Also important to note that barriers should be disabled if your hardware RAID
> card exports itself as a "write through" cache, even if you enable barriers
> on the command line.
>
> What controller are you using and what kind of drives do you have in the
> back end?

That's good to know about the write barriers with WT cache.  I'm still
setting everything manually in /etc/fstab because, well... I don't
always trust software. ;)

The controller is an LSI 9280-8e (megaraid_sas kernel module). Drives
are 1TB Seagate ES.2s, 16 of them in the chassis.

Steve

2010-04-23 15:45:42

by Ric Wheeler

[permalink] [raw]
Subject: Re: ext4 benchmark questions

On 04/23/2010 11:38 AM, Steve Brown wrote:
>>>>> not expected by me; barriers == drive write cache flushes, which I
>>>>> would never expect to speed things up...
>>>>>
>>>>>
>>>> hmmm... this would seem to conflict with the docs in the kernel,
>>>> especially:
>>>>
>>>> "Write barriers enforce proper on-disk ordering
>>>> of journal commits, making volatile disk write caches
>>>> safe to use, at some performance penalty. If
>>>> your disks are battery-backed in one way or another,
>>>> disabling barriers may safely improve performance."
>>>>
>>>>
>>> what you saw is in conflict with what is expected, yes; I don't know
>>> why barriers would ever increase performance.
>>>
>>> (my description of barriers as drive write caches isn't in conflict
>>> with the docs, I just said how they're implemented)
>>>
>>>
>> Barriers when working should never make things faster, at best, we should
>> have parity.
>>
>> Also important to note that barriers should be disabled if your hardware RAID
>> card exports itself as a "write through" cache, even if you enable barriers
>> on the command line.
>>
>> What controller are you using and what kind of drives do you have in the
>> back end?
>>
> That's good to know about the write barriers with WT cache.  I'm still
> setting everything manually in /etc/fstab because, well... I don't
> always trust software. ;)
>
> The controller is an LSI 9280-8e (megaraid_sas kernel module). Drives
> are 1TB Seagate ES.2s, 16 of them in the chassis.
>
> Steve
>
>

If you have the boot time log messages for the disks you use, you can
see how the cache is advertised to the kernel.

Also note that having battery-backed RAID cards does not mean that your
drive's write cache will survive a power outage.  You usually need to use
vendor-specific tools to poke at the drives and make sure that the
write cache on the S-ATA disks is properly disabled (unless the LSI
firmware does something to manage the write cache on the drives).
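For drives the kernel can see directly, something like hdparm can report
(and toggle) the on-disk write cache; the device names below are
illustrative, and many RAID controllers hide the raw disks entirely, in
which case only the controller's own tools will work.

```shell
# Report the volatile write-cache state on each disk.  Device names are
# illustrative assumptions; behind most RAID controllers the raw drives
# are not exposed and the vendor tool must be used instead.
for dev in /dev/sd[a-d]; do
    hdparm -W "$dev"          # prints "write-caching = 1 (on)" or "0 (off)"
done
# hdparm -W 0 /dev/sdX        # disable a drive's write cache
```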

Thanks!

Ric



2010-04-23 15:49:08

by Steve Brown

[permalink] [raw]
Subject: Re: ext4 benchmark questions

> Also note that having battery backed RAID cards does not mean that your
> drive's write cache will survive a power outage. You need to use vendor
> specific tools usually to poke at the drives and make sure that the write
> cache on the S-ATA disks is properly disabled (unless the LSI firmware does
> something to manage the write cache on the drives).

The server is fully battery-backed for up to 45 minutes.  Also, LSI
does provide tools to disable the cache when the BBU fails.  It's one
of the array config parameters.