2009-06-05 15:23:33

by Alan D. Brunelle

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

Hisashi Hifumi wrote:
> At 09:36 09/06/01, Andrew Morton wrote:
>> On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi
>> <[email protected]> wrote:
>>
>>> I added blk_run_backing_dev on page_cache_async_readahead
>>> so readahead I/O is unpluged to improve throughput on
>>> especially RAID environment.
>> I skipped the last version of this because KOSAKI Motohiro
>> <[email protected]> said "Please attach blktrace analysis ;)".
>>
>> I'm not sure why he asked for that, but he's a smart chap and
>> presumably had his reasons.
>>
>> If you think that such an analysis is unneeded, or isn't worth the time
>> to generate then please tell us that. But please don't just ignore the
>> request!
>
> Hi Andrew.
>
> Sorry for this.
>
> I did not ignore KOSAKI Motohiro's request.
> I've got blktrace output for both with and without the patch,
> but I just did not clarify the reason for throuput improvement
> from this result.
>
> I do not notice any difference except around unplug behavior by dd.
> Comments?

Pardon my ignorance of the global issues concerning the patch, but
looking specifically at the traces generated by blktrace, one also
notes that the patched version may introduce inefficiencies elsewhere
in the kernel by reducing the amount of merging. In the unpatched
version it looks like two incoming bios are generally merged into a
single I/O request. In the patched version - presumably because of the
quicker unplug - no such merging takes place. This leads to more work
lower in the stack (twice as many I/O operations being managed) and
perhaps increased interrupts and handling overhead. [This may be
acceptable if the goal is to decrease latencies on a per-bio basis...]
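
For context, the change under discussion boils down to kicking the
backing device right after the asynchronous readahead has been queued.
A rough sketch of what such a hunk in mm/readahead.c looks like
(reconstructed from the description above, not copied from Hifumi's
posted patch, so the exact code may differ):

/* mm/readahead.c -- sketch only, not the actual submitted hunk */
void page_cache_async_readahead(struct address_space *mapping,
				struct file_ra_state *ra, struct file *filp,
				struct page *page, pgoff_t offset,
				unsigned long req_size)
{
	/* ... existing early returns (no readahead state, congested bdi) ... */

	/* do read-ahead */
	ondemand_readahead(mapping, ra, filp, true, offset, req_size);

	/*
	 * New: unplug the backing device so the readahead I/O queued
	 * above is dispatched immediately rather than waiting for a
	 * later unplug event.
	 */
	blk_run_backing_dev(mapping->backing_dev_info, NULL);
}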

Do you have a place where the raw blktrace data can be retrieved for
more in-depth analysis?

Regards,
Alan D. Brunelle
Hewlett-Packard


2009-06-06 14:36:51

by KOSAKI Motohiro

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Sorry for the late response.
I wonder why Wu and I are not on the Cc list of this thread.


> Hisashi Hifumi wrote:
> > At 09:36 09/06/01, Andrew Morton wrote:
> >> On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi
> >> <[email protected]> wrote:
> >>
> >>> I added blk_run_backing_dev on page_cache_async_readahead
> >>> so readahead I/O is unpluged to improve throughput on
> >>> especially RAID environment.
> >> I skipped the last version of this because KOSAKI Motohiro
> >> <[email protected]> said "Please attach blktrace analysis ;)".
> >>
> >> I'm not sure why he asked for that, but he's a smart chap and
> >> presumably had his reasons.
> >>
> >> If you think that such an analysis is unneeded, or isn't worth the time
> >> to generate then please tell us that. But please don't just ignore the
> >> request!
> >
> > Hi Andrew.
> >
> > Sorry for this.
> >
> > I did not ignore KOSAKI Motohiro's request.
> > I've got blktrace output for both with and without the patch,
> > but I just did not clarify the reason for throuput improvement
> > from this result.
> >
> > I do not notice any difference except around unplug behavior by dd.
> > Comments?
>
> Pardon my ignorance on the global issues concerning the patch, but
> specifically looking at the traces generated by blktrace leads one to
> also note that the patched version may generate inefficiencies in other
> places in the kernel by reducing the merging going on. In the unpatched
> version it looks like (generally) that two incoming bio's are able to be
> merged to generate a single I/O request. In the patched version -
> because of the quicker unplug(?) - no such merging is going on. This
> leads to more work lower in the stack (twice as many I/O operations
> being managed), perhaps increased interrupts & handling &c. [This may be
> acceptable if the goal is to decrease latencies on a per-bio basis...]
>
> Do you have a place where the raw blktrace data can be retrieved for
> more in-depth analysis?

I think your comment is right on the mark. Wu Fengguang pointed out
the same issue in another thread.
Wu and I are also waiting for his analysis.

Thanks.

2009-06-06 22:45:55

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Sat, Jun 06, 2009 at 10:36:41PM +0800, KOSAKI Motohiro wrote:
>
> sorry for late responce.
> I wonder why I and Wu don't contain Cc list in this thread.

[restore more CC]

> > Hisashi Hifumi wrote:
> > > At 09:36 09/06/01, Andrew Morton wrote:
> > >> On Fri, 29 May 2009 14:35:55 +0900 Hisashi Hifumi
> > >> <[email protected]> wrote:
> > >>
> > >>> I added blk_run_backing_dev on page_cache_async_readahead
> > >>> so readahead I/O is unpluged to improve throughput on
> > >>> especially RAID environment.
> > >> I skipped the last version of this because KOSAKI Motohiro
> > >> <[email protected]> said "Please attach blktrace analysis ;)".
> > >>
> > >> I'm not sure why he asked for that, but he's a smart chap and
> > >> presumably had his reasons.
> > >>
> > >> If you think that such an analysis is unneeded, or isn't worth the time
> > >> to generate then please tell us that. But please don't just ignore the
> > >> request!
> > >
> > > Hi Andrew.
> > >
> > > Sorry for this.
> > >
> > > I did not ignore KOSAKI Motohiro's request.
> > > I've got blktrace output for both with and without the patch,
> > > but I just did not clarify the reason for throuput improvement
> > > from this result.
> > >
> > > I do not notice any difference except around unplug behavior by dd.
> > > Comments?
> >
> > Pardon my ignorance on the global issues concerning the patch, but
> > specifically looking at the traces generated by blktrace leads one to
> > also note that the patched version may generate inefficiencies in other
> > places in the kernel by reducing the merging going on. In the unpatched
> > version it looks like (generally) that two incoming bio's are able to be
> > merged to generate a single I/O request. In the patched version -
> > because of the quicker unplug(?) - no such merging is going on. This
> > leads to more work lower in the stack (twice as many I/O operations
> > being managed), perhaps increased interrupts & handling &c. [This may be
> > acceptable if the goal is to decrease latencies on a per-bio basis...]
> >
> > Do you have a place where the raw blktrace data can be retrieved for
> > more in-depth analysis?
>
> I think your comment is really adequate. In another thread, Wu Fengguang pointed
> out the same issue.
> I and Wu also wait his analysis.

And do it with a large readahead size :)

Alan, this was my analysis:

: Hifumi, can you help retest with some large readahead size?
:
: Your readahead size (128K) is smaller than your max_sectors_kb (256K),
: so two readahead IO requests get merged into one real IO, that means
: half of the readahead requests are delayed.

ie. two readahead requests get merged and complete together, thus the effective
IO size is doubled but at the same time it becomes completely synchronous IO.

:
: The IO completion size goes down from 512 to 256 sectors:
:
: before patch:
: 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
: 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
: 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
: 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
: 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
:
: after patch:
: 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
: 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
: 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
: 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
: 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
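
To make the arithmetic above concrete, here is a small stand-alone C
illustration (the 128K/256K figures are the ones quoted in this thread;
everything else is just for demonstration):

/* Worked example of why completions shrink from 512 to 256 sectors. */
#include <stdio.h>

int main(void)
{
	const unsigned sector_bytes   = 512;
	const unsigned readahead_kb   = 128;	/* one readahead window */
	const unsigned max_sectors_kb = 256;	/* per-request size cap */

	unsigned ra_sectors  = readahead_kb   * 1024 / sector_bytes;	/* 256 */
	unsigned req_sectors = max_sectors_kb * 1024 / sector_bytes;	/* 512 */

	printf("readahead window    = %u sectors\n", ra_sectors);
	printf("max request size    = %u sectors\n", req_sectors);
	printf("windows per request = %u\n", req_sectors / ra_sectors);	/* 2 */

	/*
	 * Without the patch the queue stays plugged long enough for two
	 * 256-sector readahead windows to be merged into one 512-sector
	 * request (the "+ 512" completions above); with the early unplug
	 * each window is dispatched on its own ("+ 256" completions).
	 */
	return 0;
}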

Thanks,
Fengguang

2009-06-18 19:05:31

by Andrew Morton

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Sun, 7 Jun 2009 06:45:38 +0800
Wu Fengguang <[email protected]> wrote:

> > > Do you have a place where the raw blktrace data can be retrieved for
> > > more in-depth analysis?
> >
> > I think your comment is really adequate. In another thread, Wu Fengguang pointed
> > out the same issue.
> > I and Wu also wait his analysis.
>
> And do it with a large readahead size :)
>
> Alan, this was my analysis:
>
> : Hifumi, can you help retest with some large readahead size?
> :
> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> : so two readahead IO requests get merged into one real IO, that means
> : half of the readahead requests are delayed.
>
> ie. two readahead requests get merged and complete together, thus the effective
> IO size is doubled but at the same time it becomes completely synchronous IO.
>
> :
> : The IO completion size goes down from 512 to 256 sectors:
> :
> : before patch:
> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> :
> : after patch:
> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>

I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
and it's looking like 2.6.32 material, if ever.

If it turns out to be wonderful, we could always ask the -stable
maintainers to put it in 2.6.x.y I guess.

2009-06-20 04:33:32

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> On Sun, 7 Jun 2009 06:45:38 +0800
> Wu Fengguang <[email protected]> wrote:
>
> > > > Do you have a place where the raw blktrace data can be retrieved for
> > > > more in-depth analysis?
> > >
> > > I think your comment is really adequate. In another thread, Wu Fengguang pointed
> > > out the same issue.
> > > I and Wu also wait his analysis.
> >
> > And do it with a large readahead size :)
> >
> > Alan, this was my analysis:
> >
> > : Hifumi, can you help retest with some large readahead size?
> > :
> > : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> > : so two readahead IO requests get merged into one real IO, that means
> > : half of the readahead requests are delayed.
> >
> > ie. two readahead requests get merged and complete together, thus the effective
> > IO size is doubled but at the same time it becomes completely synchronous IO.
> >
> > :
> > : The IO completion size goes down from 512 to 256 sectors:
> > :
> > : before patch:
> > : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> > : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> > : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> > : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> > : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> > :
> > : after patch:
> > : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> > : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> > : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> > : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> > : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
> >
>
> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> and it's looking like 2.6.32 material, if ever.
>
> If it turns out to be wonderful, we could always ask the -stable
> maintainers to put it in 2.6.x.y I guess.

Agreed. The expected (and interesting) test on a properly configured
HW RAID has not happened yet, hence the theory remains unsupported.

Thanks,
Fengguang

2009-06-20 12:30:48

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>> On Sun, 7 Jun 2009 06:45:38 +0800
>> Wu Fengguang <[email protected]> wrote:
>>
>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>> more in-depth analysis?
>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>> out the same issue.
>>>> I and Wu also wait his analysis.
>>> And do it with a large readahead size :)
>>>
>>> Alan, this was my analysis:
>>>
>>> : Hifumi, can you help retest with some large readahead size?
>>> :
>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>> : so two readahead IO requests get merged into one real IO, that means
>>> : half of the readahead requests are delayed.
>>>
>>> ie. two readahead requests get merged and complete together, thus the effective
>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>
>>> :
>>> : The IO completion size goes down from 512 to 256 sectors:
>>> :
>>> : before patch:
>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>>> :
>>> : after patch:
>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>>>
>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>> and it's looking like 2.6.32 material, if ever.
>>
>> If it turns out to be wonderful, we could always ask the -stable
>> maintainers to put it in 2.6.x.y I guess.
>
> Agreed. The expected (and interesting) test on a properly configured
> HW RAID has not happened yet, hence the theory remains unsupported.

Hmm, do you see anything improper in Ronald's setup (see
http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
It is HW RAID based.

As I already wrote, we can ask Ronald to perform any needed tests.

> Thanks,
> Fengguang

2009-06-29 09:34:50

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>
> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> > On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> >> On Sun, 7 Jun 2009 06:45:38 +0800
> >> Wu Fengguang <[email protected]> wrote:
> >>
> >>>>> Do you have a place where the raw blktrace data can be retrieved for
> >>>>> more in-depth analysis?
> >>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
> >>>> out the same issue.
> >>>> I and Wu also wait his analysis.
> >>> And do it with a large readahead size :)
> >>>
> >>> Alan, this was my analysis:
> >>>
> >>> : Hifumi, can you help retest with some large readahead size?
> >>> :
> >>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> >>> : so two readahead IO requests get merged into one real IO, that means
> >>> : half of the readahead requests are delayed.
> >>>
> >>> ie. two readahead requests get merged and complete together, thus the effective
> >>> IO size is doubled but at the same time it becomes completely synchronous IO.
> >>>
> >>> :
> >>> : The IO completion size goes down from 512 to 256 sectors:
> >>> :
> >>> : before patch:
> >>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> >>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> >>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> >>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> >>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> >>> :
> >>> : after patch:
> >>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> >>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> >>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> >>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> >>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
> >>>
> >> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> >> and it's looking like 2.6.32 material, if ever.
> >>
> >> If it turns out to be wonderful, we could always ask the -stable
> >> maintainers to put it in 2.6.x.y I guess.
> >
> > Agreed. The expected (and interesting) test on a properly configured
> > HW RAID has not happened yet, hence the theory remains unsupported.
>
> Hmm, do you see anything improper in the Ronald's setup (see
> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
> It is HW RAID based.

No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
RAID performance is too bad and may be improved by increasing the
readahead size, hehe.

> As I already wrote, we can ask Ronald to perform any needed tests.

Thanks! Ronald's test results are:

231 MB/s HW RAID
69.6 MB/s HW RAID + SCST
89.7 MB/s HW RAID + SCST + this patch

So this patch seems to help SCST, but again it would be better to
improve the SCST throughput first - it is quite sub-optimal now.
(Sorry for the long delay: I currently have no good idea of how to
measure such timing issues.)

And if Ronald could provide the HW RAID performance with this patch,
then we can confirm if this patch really makes a difference for RAID.

Thanks,
Fengguang

2009-06-29 10:26:25

by Ronald Moesbergen

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/6/29 Wu Fengguang <[email protected]>:
> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>>
>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>> > On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>> >> On Sun, 7 Jun 2009 06:45:38 +0800
>> >> Wu Fengguang <[email protected]> wrote:
>> >>
>> >>>>> Do you have a place where the raw blktrace data can be retrieved for
>> >>>>> more in-depth analysis?
>> >>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>> >>>> out the same issue.
>> >>>> I and Wu also wait his analysis.
>> >>> And do it with a large readahead size :)
>> >>>
>> >>> Alan, this was my analysis:
>> >>>
>> >>> : Hifumi, can you help retest with some large readahead size?
>> >>> :
>> >>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>> >>> : so two readahead IO requests get merged into one real IO, that means
>> >>> : half of the readahead requests are delayed.
>> >>>
>> >>> ie. two readahead requests get merged and complete together, thus the effective
>> >>> IO size is doubled but at the same time it becomes completely synchronous IO.
>> >>>
>> >>> :
>> >>> : The IO completion size goes down from 512 to 256 sectors:
>> >>> :
>> >>> : before patch:
>> >>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>> >>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>> >>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>> >>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>> >>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>> >>> :
>> >>> : after patch:
>> >>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>> >>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>> >>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>> >>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>> >>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>> >>>
>> >> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>> >> and it's looking like 2.6.32 material, if ever.
>> >>
>> >> If it turns out to be wonderful, we could always ask the -stable
>> >> maintainers to put it in 2.6.x.y I guess.
>> >
>> > Agreed. The expected (and interesting) test on a properly configured
>> > HW RAID has not happened yet, hence the theory remains unsupported.
>>
>> Hmm, do you see anything improper in the Ronald's setup (see
>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>> It is HW RAID based.
>
> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
> RAID performance is too bad and may be improved by increasing the
> readahead size, hehe.
>
>> As I already wrote, we can ask Ronald to perform any needed tests.
>
> Thanks! Ronald's test results are:
>
> 231 MB/s HW RAID
> 69.6 MB/s HW RAID + SCST
> 89.7 MB/s HW RAID + SCST + this patch
>
> So this patch seem to help SCST, but again it would be better to
> improve the SCST throughput first - it is now quite sub-optimal.
> (Sorry for the long delay: currently I have not got an idea on
> how to measure such timing issues.)
>
> And if Ronald could provide the HW RAID performance with this patch,
> then we can confirm if this patch really makes a difference for RAID.

I just tested raw HW RAID throughput with the patch applied, same
readahead setting (512KB), and it doesn't look promising:

./blockdev-perftest -d -r /dev/cciss/c0d0
blocksize W W W R R R
67108864 -1 -1 -1 5.59686 5.4098 5.45396
33554432 -1 -1 -1 6.18616 6.13232 5.96124
16777216 -1 -1 -1 7.6757 7.32139 7.4966
8388608 -1 -1 -1 8.82793 9.02057 9.01055
4194304 -1 -1 -1 12.2289 12.6804 12.19
2097152 -1 -1 -1 13.3012 13.706 14.7542
1048576 -1 -1 -1 11.7577 12.3609 11.9507
524288 -1 -1 -1 12.4112 12.2383 11.9105
262144 -1 -1 -1 7.30687 7.4417 7.38246
131072 -1 -1 -1 7.95752 7.95053 8.60796
65536 -1 -1 -1 10.1282 10.1286 10.1956
32768 -1 -1 -1 9.91857 9.98597 10.8421
16384 -1 -1 -1 10.8267 10.8899 10.8718
8192 -1 -1 -1 12.0345 12.5275 12.005
4096 -1 -1 -1 15.1537 15.0771 15.1753
2048 -1 -1 -1 25.432 24.8985 25.4303
1024 -1 -1 -1 45.2674 45.2707 45.3504
512 -1 -1 -1 87.9405 88.5047 87.4726

It dropped down to 189 MB/s. :(

Ronald.

2009-06-29 10:55:41

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev



Wu Fengguang, on 06/29/2009 01:34 PM wrote:
> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>>>> On Sun, 7 Jun 2009 06:45:38 +0800
>>>> Wu Fengguang <[email protected]> wrote:
>>>>
>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>>>> more in-depth analysis?
>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>>>> out the same issue.
>>>>>> I and Wu also wait his analysis.
>>>>> And do it with a large readahead size :)
>>>>>
>>>>> Alan, this was my analysis:
>>>>>
>>>>> : Hifumi, can you help retest with some large readahead size?
>>>>> :
>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>>>> : so two readahead IO requests get merged into one real IO, that means
>>>>> : half of the readahead requests are delayed.
>>>>>
>>>>> ie. two readahead requests get merged and complete together, thus the effective
>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>>>
>>>>> :
>>>>> : The IO completion size goes down from 512 to 256 sectors:
>>>>> :
>>>>> : before patch:
>>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>>>>> :
>>>>> : after patch:
>>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>>>>>
>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>>>> and it's looking like 2.6.32 material, if ever.
>>>>
>>>> If it turns out to be wonderful, we could always ask the -stable
>>>> maintainers to put it in 2.6.x.y I guess.
>>> Agreed. The expected (and interesting) test on a properly configured
>>> HW RAID has not happened yet, hence the theory remains unsupported.
>> Hmm, do you see anything improper in the Ronald's setup (see
>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>> It is HW RAID based.
>
> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
> RAID performance is too bad and may be improved by increasing the
> readahead size, hehe.
>
>> As I already wrote, we can ask Ronald to perform any needed tests.
>
> Thanks! Ronald's test results are:
>
> 231 MB/s HW RAID
> 69.6 MB/s HW RAID + SCST
> 89.7 MB/s HW RAID + SCST + this patch
>
> So this patch seem to help SCST, but again it would be better to
> improve the SCST throughput first - it is now quite sub-optimal.

No, SCST performance isn't an issue here. You simply can't get more than
110 MB/s from iSCSI over 1GbE, hence 231 MB/s fundamentally isn't
possible. There is only room for 20% improvement, which should be
achieved with better client-side-driven pipelining (see our other
discussions, e.g. http://lkml.org/lkml/2009/5/12/370)

> (Sorry for the long delay: currently I have not got an idea on
> how to measure such timing issues.)
>
> And if Ronald could provide the HW RAID performance with this patch,
> then we can confirm if this patch really makes a difference for RAID.
>
> Thanks,
> Fengguang

2009-06-29 10:56:29

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

Ronald Moesbergen, on 06/29/2009 02:26 PM wrote:
> 2009/6/29 Wu Fengguang <[email protected]>:
>> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>>>>> On Sun, 7 Jun 2009 06:45:38 +0800
>>>>> Wu Fengguang <[email protected]> wrote:
>>>>>
>>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>>>>> more in-depth analysis?
>>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>>>>> out the same issue.
>>>>>>> I and Wu also wait his analysis.
>>>>>> And do it with a large readahead size :)
>>>>>>
>>>>>> Alan, this was my analysis:
>>>>>>
>>>>>> : Hifumi, can you help retest with some large readahead size?
>>>>>> :
>>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>>>>> : so two readahead IO requests get merged into one real IO, that means
>>>>>> : half of the readahead requests are delayed.
>>>>>>
>>>>>> ie. two readahead requests get merged and complete together, thus the effective
>>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>>>>
>>>>>> :
>>>>>> : The IO completion size goes down from 512 to 256 sectors:
>>>>>> :
>>>>>> : before patch:
>>>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>>>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>>>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>>>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>>>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>>>>>> :
>>>>>> : after patch:
>>>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>>>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>>>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>>>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>>>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>>>>>>
>>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>>>>> and it's looking like 2.6.32 material, if ever.
>>>>>
>>>>> If it turns out to be wonderful, we could always ask the -stable
>>>>> maintainers to put it in 2.6.x.y I guess.
>>>> Agreed. The expected (and interesting) test on a properly configured
>>>> HW RAID has not happened yet, hence the theory remains unsupported.
>>> Hmm, do you see anything improper in the Ronald's setup (see
>>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>>> It is HW RAID based.
>> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
>> RAID performance is too bad and may be improved by increasing the
>> readahead size, hehe.
>>
>>> As I already wrote, we can ask Ronald to perform any needed tests.
>> Thanks! Ronald's test results are:
>>
>> 231 MB/s HW RAID
>> 69.6 MB/s HW RAID + SCST
>> 89.7 MB/s HW RAID + SCST + this patch
>>
>> So this patch seem to help SCST, but again it would be better to
>> improve the SCST throughput first - it is now quite sub-optimal.
>> (Sorry for the long delay: currently I have not got an idea on
>> how to measure such timing issues.)
>>
>> And if Ronald could provide the HW RAID performance with this patch,
>> then we can confirm if this patch really makes a difference for RAID.
>
> I just tested raw HW RAID throughput with the patch applied, same
> readahead setting (512KB), and it doesn't look promising:
>
> ./blockdev-perftest -d -r /dev/cciss/c0d0
> blocksize W W W R R R
> 67108864 -1 -1 -1 5.59686 5.4098 5.45396
> 33554432 -1 -1 -1 6.18616 6.13232 5.96124
> 16777216 -1 -1 -1 7.6757 7.32139 7.4966
> 8388608 -1 -1 -1 8.82793 9.02057 9.01055
> 4194304 -1 -1 -1 12.2289 12.6804 12.19
> 2097152 -1 -1 -1 13.3012 13.706 14.7542
> 1048576 -1 -1 -1 11.7577 12.3609 11.9507
> 524288 -1 -1 -1 12.4112 12.2383 11.9105
> 262144 -1 -1 -1 7.30687 7.4417 7.38246
> 131072 -1 -1 -1 7.95752 7.95053 8.60796
> 65536 -1 -1 -1 10.1282 10.1286 10.1956
> 32768 -1 -1 -1 9.91857 9.98597 10.8421
> 16384 -1 -1 -1 10.8267 10.8899 10.8718
> 8192 -1 -1 -1 12.0345 12.5275 12.005
> 4096 -1 -1 -1 15.1537 15.0771 15.1753
> 2048 -1 -1 -1 25.432 24.8985 25.4303
> 1024 -1 -1 -1 45.2674 45.2707 45.3504
> 512 -1 -1 -1 87.9405 88.5047 87.4726
>
> It dropped down to 189 MB/s. :(

Ronald,

Can you, please, rerun this test locally on the target with the latest
version of blockdev-perftest, which produces much more readable results,
for the following 6 cases:

1. Default vanilla 2.6.29 kernel, default parameters, including read-ahead

2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default

3. Default vanilla 2.6.29 kernel, 512 KB read-ahead, 64 KB
max_sectors_kb, the rest is default

4. Patched by the Fengguang's patch http://lkml.org/lkml/2009/5/21/319
vanilla 2.6.29 kernel, default parameters, including read-ahead

5. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB
read-ahead, the rest is default

6. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB
read-ahead, 64 KB max_sectors_kb, the rest is default
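
For reference, a minimal sketch of how the two knobs varied above are
usually set from user space (the device name is only an example -
substitute the block device actually under test, e.g. cciss!c0d0 in
sysfs naming):

/* Sketch: set read-ahead and max_sectors_kb via sysfs, e.g. for case 6. */
#include <stdio.h>

static int write_queue_attr(const char *path, unsigned kb)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%u\n", kb);
	return fclose(f);
}

int main(void)
{
	write_queue_attr("/sys/block/sda/queue/read_ahead_kb", 512);
	write_queue_attr("/sys/block/sda/queue/max_sectors_kb", 64);
	return 0;
}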

Thanks,
Vlad

2009-06-29 12:55:16

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 29, 2009 at 06:55:40PM +0800, Vladislav Bolkhovitin wrote:
> Ronald Moesbergen, on 06/29/2009 02:26 PM wrote:
> > 2009/6/29 Wu Fengguang <[email protected]>:
> >> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
> >>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> >>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> >>>>> On Sun, 7 Jun 2009 06:45:38 +0800
> >>>>> Wu Fengguang <[email protected]> wrote:
> >>>>>
> >>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
> >>>>>>>> more in-depth analysis?
> >>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
> >>>>>>> out the same issue.
> >>>>>>> I and Wu also wait his analysis.
> >>>>>> And do it with a large readahead size :)
> >>>>>>
> >>>>>> Alan, this was my analysis:
> >>>>>>
> >>>>>> : Hifumi, can you help retest with some large readahead size?
> >>>>>> :
> >>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> >>>>>> : so two readahead IO requests get merged into one real IO, that means
> >>>>>> : half of the readahead requests are delayed.
> >>>>>>
> >>>>>> ie. two readahead requests get merged and complete together, thus the effective
> >>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
> >>>>>>
> >>>>>> :
> >>>>>> : The IO completion size goes down from 512 to 256 sectors:
> >>>>>> :
> >>>>>> : before patch:
> >>>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> >>>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> >>>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> >>>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> >>>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> >>>>>> :
> >>>>>> : after patch:
> >>>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> >>>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> >>>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> >>>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> >>>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
> >>>>>>
> >>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> >>>>> and it's looking like 2.6.32 material, if ever.
> >>>>>
> >>>>> If it turns out to be wonderful, we could always ask the -stable
> >>>>> maintainers to put it in 2.6.x.y I guess.
> >>>> Agreed. The expected (and interesting) test on a properly configured
> >>>> HW RAID has not happened yet, hence the theory remains unsupported.
> >>> Hmm, do you see anything improper in the Ronald's setup (see
> >>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
> >>> It is HW RAID based.
> >> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
> >> RAID performance is too bad and may be improved by increasing the
> >> readahead size, hehe.
> >>
> >>> As I already wrote, we can ask Ronald to perform any needed tests.
> >> Thanks! Ronald's test results are:
> >>
> >> 231 MB/s HW RAID
> >> 69.6 MB/s HW RAID + SCST
> >> 89.7 MB/s HW RAID + SCST + this patch
> >>
> >> So this patch seem to help SCST, but again it would be better to
> >> improve the SCST throughput first - it is now quite sub-optimal.
> >> (Sorry for the long delay: currently I have not got an idea on
> >> how to measure such timing issues.)
> >>
> >> And if Ronald could provide the HW RAID performance with this patch,
> >> then we can confirm if this patch really makes a difference for RAID.
> >
> > I just tested raw HW RAID throughput with the patch applied, same
> > readahead setting (512KB), and it doesn't look promising:
> >
> > ./blockdev-perftest -d -r /dev/cciss/c0d0
> > blocksize W W W R R R
> > 67108864 -1 -1 -1 5.59686 5.4098 5.45396
> > 33554432 -1 -1 -1 6.18616 6.13232 5.96124
> > 16777216 -1 -1 -1 7.6757 7.32139 7.4966
> > 8388608 -1 -1 -1 8.82793 9.02057 9.01055
> > 4194304 -1 -1 -1 12.2289 12.6804 12.19
> > 2097152 -1 -1 -1 13.3012 13.706 14.7542
> > 1048576 -1 -1 -1 11.7577 12.3609 11.9507
> > 524288 -1 -1 -1 12.4112 12.2383 11.9105
> > 262144 -1 -1 -1 7.30687 7.4417 7.38246
> > 131072 -1 -1 -1 7.95752 7.95053 8.60796
> > 65536 -1 -1 -1 10.1282 10.1286 10.1956
> > 32768 -1 -1 -1 9.91857 9.98597 10.8421
> > 16384 -1 -1 -1 10.8267 10.8899 10.8718
> > 8192 -1 -1 -1 12.0345 12.5275 12.005
> > 4096 -1 -1 -1 15.1537 15.0771 15.1753
> > 2048 -1 -1 -1 25.432 24.8985 25.4303
> > 1024 -1 -1 -1 45.2674 45.2707 45.3504
> > 512 -1 -1 -1 87.9405 88.5047 87.4726
> >
> > It dropped down to 189 MB/s. :(
>
> Ronald,
>
> Can you, please, rerun this test locally on the target with the latest
> version of blockdev-perftest, which produces much more readable results,

Is blockdev-perftest publicly available? It's not obvious from a Google search.

> for the following 6 cases:
>
> 1. Default vanilla 2.6.29 kernel, default parameters, including read-ahead

Why not 2.6.30? :)

> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default

How about 2MB RAID readahead size? That transforms into about 512KB
per-disk readahead size.

> 3. Default vanilla 2.6.29 kernel, 512 KB read-ahead, 64 KB
> max_sectors_kb, the rest is default
>
> 4. Patched by the Fengguang's patch http://lkml.org/lkml/2009/5/21/319
> vanilla 2.6.29 kernel, default parameters, including read-ahead
>
> 5. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB
> read-ahead, the rest is default
>
> 6. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB
> read-ahead, 64 KB max_sectors_kb, the rest is default

Thanks,
Fengguang

2009-06-29 12:58:31

by Bart Van Assche

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 29, 2009 at 2:54 PM, Wu Fengguang<[email protected]> wrote:
> Is blockdev-perftest public available? It's not obvious from google search.

This script is publicly available. You can retrieve it by running the
following command:
svn co https://scst.svn.sourceforge.net/svnroot/scst/trunk/scripts

Bart.

2009-06-29 13:00:30

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 29, 2009 at 06:55:21PM +0800, Vladislav Bolkhovitin wrote:
>
>
> Wu Fengguang, on 06/29/2009 01:34 PM wrote:
> > On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
> >> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
> >>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
> >>>> On Sun, 7 Jun 2009 06:45:38 +0800
> >>>> Wu Fengguang <[email protected]> wrote:
> >>>>
> >>>>>>> Do you have a place where the raw blktrace data can be retrieved for
> >>>>>>> more in-depth analysis?
> >>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
> >>>>>> out the same issue.
> >>>>>> I and Wu also wait his analysis.
> >>>>> And do it with a large readahead size :)
> >>>>>
> >>>>> Alan, this was my analysis:
> >>>>>
> >>>>> : Hifumi, can you help retest with some large readahead size?
> >>>>> :
> >>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
> >>>>> : so two readahead IO requests get merged into one real IO, that means
> >>>>> : half of the readahead requests are delayed.
> >>>>>
> >>>>> ie. two readahead requests get merged and complete together, thus the effective
> >>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
> >>>>>
> >>>>> :
> >>>>> : The IO completion size goes down from 512 to 256 sectors:
> >>>>> :
> >>>>> : before patch:
> >>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
> >>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
> >>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
> >>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
> >>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
> >>>>> :
> >>>>> : after patch:
> >>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
> >>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
> >>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
> >>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
> >>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
> >>>>>
> >>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
> >>>> and it's looking like 2.6.32 material, if ever.
> >>>>
> >>>> If it turns out to be wonderful, we could always ask the -stable
> >>>> maintainers to put it in 2.6.x.y I guess.
> >>> Agreed. The expected (and interesting) test on a properly configured
> >>> HW RAID has not happened yet, hence the theory remains unsupported.
> >> Hmm, do you see anything improper in the Ronald's setup (see
> >> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
> >> It is HW RAID based.
> >
> > No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
> > RAID performance is too bad and may be improved by increasing the
> > readahead size, hehe.
> >
> >> As I already wrote, we can ask Ronald to perform any needed tests.
> >
> > Thanks! Ronald's test results are:
> >
> > 231 MB/s HW RAID
> > 69.6 MB/s HW RAID + SCST
> > 89.7 MB/s HW RAID + SCST + this patch
> >
> > So this patch seem to help SCST, but again it would be better to
> > improve the SCST throughput first - it is now quite sub-optimal.
>
> No, SCST performance isn't an issue here. You simply can't get more than
> 110 MB/s from iSCSI over 1GbE, hence 231 MB/s fundamentally isn't
> possible. There is only room for 20% improvement, which should be

Ah yes.

> achieved with better client-side-driven pipelining (see our other
> discussions, e.g. http://lkml.org/lkml/2009/5/12/370)

Yeah, that's what I want to figure out - why :)

Thanks,
Fengguang

> > (Sorry for the long delay: currently I have not got an idea on
> > how to measure such timing issues.)
> >
> > And if Ronald could provide the HW RAID performance with this patch,
> > then we can confirm if this patch really makes a difference for RAID.
> >
> > Thanks,
> > Fengguang

2009-06-29 13:01:44

by Fengguang Wu

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 29, 2009 at 08:58:24PM +0800, Bart Van Assche wrote:
> On Mon, Jun 29, 2009 at 2:54 PM, Wu Fengguang<[email protected]> wrote:
> > Is blockdev-perftest public available? It's not obvious from google search.
>
> This script is publicly available. You can retrieve it by running the
> following command:
> svn co https://scst.svn.sourceforge.net/svnroot/scst/trunk/scripts

Thank you! This is a handy tool :)

Fengguang

2009-06-29 13:05:16

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> On Mon, Jun 29, 2009 at 06:55:40PM +0800, Vladislav Bolkhovitin wrote:
>> Ronald Moesbergen, on 06/29/2009 02:26 PM wrote:
>>> 2009/6/29 Wu Fengguang <[email protected]>:
>>>> On Sat, Jun 20, 2009 at 08:29:31PM +0800, Vladislav Bolkhovitin wrote:
>>>>> Wu Fengguang, on 06/20/2009 07:55 AM wrote:
>>>>>> On Fri, Jun 19, 2009 at 03:04:36AM +0800, Andrew Morton wrote:
>>>>>>> On Sun, 7 Jun 2009 06:45:38 +0800
>>>>>>> Wu Fengguang <[email protected]> wrote:
>>>>>>>
>>>>>>>>>> Do you have a place where the raw blktrace data can be retrieved for
>>>>>>>>>> more in-depth analysis?
>>>>>>>>> I think your comment is really adequate. In another thread, Wu Fengguang pointed
>>>>>>>>> out the same issue.
>>>>>>>>> I and Wu also wait his analysis.
>>>>>>>> And do it with a large readahead size :)
>>>>>>>>
>>>>>>>> Alan, this was my analysis:
>>>>>>>>
>>>>>>>> : Hifumi, can you help retest with some large readahead size?
>>>>>>>> :
>>>>>>>> : Your readahead size (128K) is smaller than your max_sectors_kb (256K),
>>>>>>>> : so two readahead IO requests get merged into one real IO, that means
>>>>>>>> : half of the readahead requests are delayed.
>>>>>>>>
>>>>>>>> ie. two readahead requests get merged and complete together, thus the effective
>>>>>>>> IO size is doubled but at the same time it becomes completely synchronous IO.
>>>>>>>>
>>>>>>>> :
>>>>>>>> : The IO completion size goes down from 512 to 256 sectors:
>>>>>>>> :
>>>>>>>> : before patch:
>>>>>>>> : 8,0 3 177955 50.050313976 0 C R 8724991 + 512 [0]
>>>>>>>> : 8,0 3 177966 50.053380250 0 C R 8725503 + 512 [0]
>>>>>>>> : 8,0 3 177977 50.056970395 0 C R 8726015 + 512 [0]
>>>>>>>> : 8,0 3 177988 50.060326743 0 C R 8726527 + 512 [0]
>>>>>>>> : 8,0 3 177999 50.063922341 0 C R 8727039 + 512 [0]
>>>>>>>> :
>>>>>>>> : after patch:
>>>>>>>> : 8,0 3 257297 50.000760847 0 C R 9480703 + 256 [0]
>>>>>>>> : 8,0 3 257306 50.003034240 0 C R 9480959 + 256 [0]
>>>>>>>> : 8,0 3 257307 50.003076338 0 C R 9481215 + 256 [0]
>>>>>>>> : 8,0 3 257323 50.004774693 0 C R 9481471 + 256 [0]
>>>>>>>> : 8,0 3 257332 50.006865854 0 C R 9481727 + 256 [0]
>>>>>>>>
>>>>>>> I haven't sent readahead-add-blk_run_backing_dev.patch in to Linus yet
>>>>>>> and it's looking like 2.6.32 material, if ever.
>>>>>>>
>>>>>>> If it turns out to be wonderful, we could always ask the -stable
>>>>>>> maintainers to put it in 2.6.x.y I guess.
>>>>>> Agreed. The expected (and interesting) test on a properly configured
>>>>>> HW RAID has not happened yet, hence the theory remains unsupported.
>>>>> Hmm, do you see anything improper in the Ronald's setup (see
>>>>> http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel)?
>>>>> It is HW RAID based.
>>>> No. Ronald's HW RAID performance is reasonably good. I meant Hifumi's
>>>> RAID performance is too bad and may be improved by increasing the
>>>> readahead size, hehe.
>>>>
>>>>> As I already wrote, we can ask Ronald to perform any needed tests.
>>>> Thanks! Ronald's test results are:
>>>>
>>>> 231 MB/s HW RAID
>>>> 69.6 MB/s HW RAID + SCST
>>>> 89.7 MB/s HW RAID + SCST + this patch
>>>>
>>>> So this patch seem to help SCST, but again it would be better to
>>>> improve the SCST throughput first - it is now quite sub-optimal.
>>>> (Sorry for the long delay: currently I have not got an idea on
>>>> how to measure such timing issues.)
>>>>
>>>> And if Ronald could provide the HW RAID performance with this patch,
>>>> then we can confirm if this patch really makes a difference for RAID.
>>> I just tested raw HW RAID throughput with the patch applied, same
>>> readahead setting (512KB), and it doesn't look promising:
>>>
>>> ./blockdev-perftest -d -r /dev/cciss/c0d0
>>> blocksize W W W R R R
>>> 67108864 -1 -1 -1 5.59686 5.4098 5.45396
>>> 33554432 -1 -1 -1 6.18616 6.13232 5.96124
>>> 16777216 -1 -1 -1 7.6757 7.32139 7.4966
>>> 8388608 -1 -1 -1 8.82793 9.02057 9.01055
>>> 4194304 -1 -1 -1 12.2289 12.6804 12.19
>>> 2097152 -1 -1 -1 13.3012 13.706 14.7542
>>> 1048576 -1 -1 -1 11.7577 12.3609 11.9507
>>> 524288 -1 -1 -1 12.4112 12.2383 11.9105
>>> 262144 -1 -1 -1 7.30687 7.4417 7.38246
>>> 131072 -1 -1 -1 7.95752 7.95053 8.60796
>>> 65536 -1 -1 -1 10.1282 10.1286 10.1956
>>> 32768 -1 -1 -1 9.91857 9.98597 10.8421
>>> 16384 -1 -1 -1 10.8267 10.8899 10.8718
>>> 8192 -1 -1 -1 12.0345 12.5275 12.005
>>> 4096 -1 -1 -1 15.1537 15.0771 15.1753
>>> 2048 -1 -1 -1 25.432 24.8985 25.4303
>>> 1024 -1 -1 -1 45.2674 45.2707 45.3504
>>> 512 -1 -1 -1 87.9405 88.5047 87.4726
>>>
>>> It dropped down to 189 MB/s. :(
>> Ronald,
>>
>> Can you, please, rerun this test locally on the target with the latest
>> version of blockdev-perftest, which produces much more readable results,
>
> Is blockdev-perftest public available? It's not obvious from google search.
>
>> for the following 6 cases:
>>
>> 1. Default vanilla 2.6.29 kernel, default parameters, including read-ahead
>
> Why not 2.6.30? :)

We started with 2.6.29, so why not finish with it (to save Ronald the
additional effort of moving to 2.6.30)?

>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>
> How about 2MB RAID readahead size? That transforms into about 512KB
> per-disk readahead size.

OK. Ronald, can you run 4 more test cases, please:

7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default

8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
max_sectors_kb, the rest is default

9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
read-ahead, the rest is default

10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
read-ahead, 64 KB max_sectors_kb, the rest is default

>> 3. Default vanilla 2.6.29 kernel, 512 KB read-ahead, 64 KB
>> max_sectors_kb, the rest is default
>>
>> 4. Patched by the Fengguang's patch http://lkml.org/lkml/2009/5/21/319
>> vanilla 2.6.29 kernel, default parameters, including read-ahead
>>
>> 5. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB
>> read-ahead, the rest is default
>>
>> 6. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 512 KB
>> read-ahead, 64 KB max_sectors_kb, the rest is default
>
> Thanks,
> Fengguang
>
>

2009-06-29 13:13:49

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
> Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> >
> > Why not 2.6.30? :)
>
> We started with 2.6.29, so why not complete with it (to save additional
> Ronald's effort to move on 2.6.30)?

OK, that's fair enough.

Fengguang

2009-06-29 13:29:10

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> > >
> > > Why not 2.6.30? :)
> >
> > We started with 2.6.29, so why not complete with it (to save additional
> > Ronald's effort to move on 2.6.30)?
>
> OK, that's fair enough.

Btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
in case they help the SCST performance.

Ronald, if you run context readahead, please make sure that the
server-side readahead size is bigger than the client-side readahead size.

Thanks,
Fengguang


Attachments:
readahead-context-2.6.29.patch (6.50 kB)

2009-06-29 14:00:32

by Ronald Moesbergen

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

... tests ...

> We started with 2.6.29, so why not complete with it (to save additional
> Ronald's effort to move on 2.6.30)?
>
>>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>>
>> How about 2MB RAID readahead size? That transforms into about 512KB
>> per-disk readahead size.
>
> OK. Ronald, can you 4 more test cases, please:
>
> 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
>
> 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
> max_sectors_kb, the rest is default
>
> 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> read-ahead, the rest is default
>
> 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> read-ahead, 64 KB max_sectors_kb, the rest is default

The results:

Unpatched, 128KB readahead, 512 max_sectors_kb
blocksize R R R R(avg) R(std) R
(bytes) (s) (s) (s) (MB/s) (MB/s) (IOPS)
67108864 5.621 5.503 5.419 185.744 2.780 2.902
33554432 6.628 5.897 6.242 164.068 7.827 5.127
16777216 7.312 7.165 7.614 139.148 3.501 8.697
8388608 8.719 8.408 8.694 119.003 1.973 14.875
4194304 11.836 12.192 12.137 84.958 1.111 21.239
2097152 13.452 13.992 14.035 74.090 1.442 37.045
1048576 12.759 11.996 12.195 83.194 2.152 83.194
524288 11.895 12.297 12.587 83.570 1.945 167.140
262144 7.325 7.285 7.444 139.304 1.272 557.214
131072 7.992 8.832 7.952 124.279 5.901 994.228
65536 10.940 10.062 10.122 98.847 3.715 1581.545
32768 9.973 10.012 9.945 102.640 0.281 3284.493
16384 11.377 10.538 10.692 94.316 3.100 6036.222

Unpatched, 512KB readahead, 512 max_sectors_kb
blocksize R R R R(avg) R(std) R
(bytes) (s) (s) (s) (MB/s) (MB/s) (IOPS)
67108864 5.032 4.770 5.265 204.228 8.271 3.191
33554432 5.569 5.712 5.863 179.263 3.755 5.602
16777216 6.661 6.857 6.550 153.132 2.888 9.571
8388608 8.022 8.000 7.978 127.998 0.288 16.000
4194304 10.959 11.579 12.208 88.586 3.902 22.146
2097152 13.692 12.670 12.625 78.906 2.914 39.453
1048576 11.120 11.144 10.878 92.703 1.018 92.703
524288 11.234 10.915 11.374 91.667 1.587 183.334
262144 6.848 6.678 6.795 151.191 1.594 604.763
131072 7.393 7.367 7.337 139.025 0.428 1112.202
65536 10.003 10.919 10.015 99.466 4.019 1591.462
32768 10.117 10.124 10.169 101.018 0.229 3232.574
16384 11.614 11.027 11.029 91.293 2.207 5842.771

Unpatched, 2MB readahead, 512 max_sectors_kb
blocksize R R R R(avg) R(std) R
(bytes) (s) (s) (s) (MB/s) (MB/s) (IOPS)
67108864 5.268 5.316 5.418 191.996 2.241 3.000
33554432 5.831 6.459 6.110 167.259 6.977 5.227
16777216 7.313 7.069 7.197 142.385 1.972 8.899
8388608 8.657 8.500 8.498 119.754 1.039 14.969
4194304 11.846 12.116 11.801 85.911 0.994 21.478
2097152 12.917 13.652 13.100 77.484 1.808 38.742
1048576 9.544 10.667 10.807 99.345 5.640 99.345
524288 11.736 7.171 6.599 128.410 29.539 256.821
262144 7.530 7.403 7.416 137.464 1.053 549.857
131072 8.741 8.002 8.022 124.256 5.029 994.051
65536 10.701 10.138 10.090 99.394 2.629 1590.311
32768 9.978 9.950 9.934 102.875 0.188 3291.994
16384 11.435 10.823 10.907 92.684 2.234 5931.749

Unpatched, 512KB readahead, 64 max_sectors_kb
blocksize R R R R(avg) R(std) R
(bytes) (s) (s) (s) (MB/s) (MB/s) (IOPS)
67108864 3.994 3.991 4.123 253.774 3.838 3.965
33554432 4.100 4.329 4.161 244.111 5.569 7.628
16777216 5.476 4.835 5.079 200.148 10.177 12.509
8388608 5.484 5.258 5.227 192.470 4.084 24.059
4194304 6.429 6.458 6.435 158.989 0.315 39.747
2097152 7.219 7.744 7.306 138.081 4.187 69.040
1048576 6.850 6.897 6.776 149.696 1.089 149.696
524288 6.406 6.393 6.469 159.439 0.814 318.877
262144 6.865 7.508 6.861 144.931 6.041 579.726
131072 8.435 8.482 8.307 121.792 1.076 974.334
65536 9.616 9.610 10.262 104.279 3.176 1668.462
32768 9.682 9.932 10.015 103.701 1.497 3318.428
16384 10.962 10.852 11.565 92.106 2.547 5894.813

Unpatched, 2MB readahead, 64 max_sectors_kb
blocksize R R R R(avg) R(std) R
(bytes) (s) (s) (s) (MB/s) (MB/s) (IOPS)
67108864 3.730 3.714 3.914 270.615 6.396 4.228
33554432 4.445 3.999 3.989 247.710 12.276 7.741
16777216 4.763 4.712 4.709 216.590 1.122 13.537
8388608 5.001 5.086 5.229 200.649 3.673 25.081
4194304 6.365 6.362 6.905 156.710 5.948 39.178
2097152 7.390 7.367 7.270 139.470 0.992 69.735
1048576 7.038 7.050 7.090 145.052 0.456 145.052
524288 6.862 7.167 7.278 144.272 3.617 288.544
262144 7.266 7.313 7.265 140.635 0.436 562.540
131072 8.677 8.735 8.821 117.108 0.790 936.865
65536 10.865 10.040 10.038 99.418 3.658 1590.685
32768 10.167 10.130 10.177 100.805 0.201 3225.749
16384 11.643 11.017 11.103 91.041 2.203 5826.629

Patched, 128KB readahead, 512 max_sectors_kb
blocksize R R R R(avg) R(std) R
(bytes) (s) (s) (s) (MB/s) (MB/s) (IOPS)
67108864 5.670 5.188 5.636 186.555 7.671 2.915
33554432 6.069 5.971 6.141 168.992 1.954 5.281
16777216 7.821 7.501 7.372 135.451 3.340 8.466
8388608 9.147 8.618 9.000 114.849 2.908 14.356
4194304 12.199 12.914 12.381 81.981 1.964 20.495
2097152 13.449 13.891 14.288 73.842 1.828 36.921
1048576 11.890 12.182 11.519 86.360 1.984 86.360
524288 11.899 12.706 12.135 83.678 2.287 167.357
262144 7.460 7.559 7.563 136.041 0.864 544.164
131072 7.987 8.003 8.530 125.403 3.792 1003.220
65536 10.179 10.119 10.131 100.957 0.255 1615.312
32768 9.899 9.923 10.589 101.114 3.121 3235.656
16384 10.849 10.835 10.876 94.351 0.150 6038.474

Patched, 512KB readahead, 512 max_sectors_kb
blocksize R R R R(avg) R(std) R
(bytes) (s) (s) (s) (MB/s) (MB/s) (IOPS)
67108864 5.062 5.111 5.083 201.358 0.795 3.146
33554432 5.589 5.713 5.657 181.165 1.625 5.661
16777216 6.337 7.220 6.457 154.002 8.690 9.625
8388608 7.952 7.880 7.527 131.588 3.192 16.448
4194304 10.695 11.224 10.736 94.119 2.047 23.530
2097152 10.898 12.072 12.358 87.215 4.839 43.607
1048576 10.890 11.347 9.290 98.166 8.664 98.166
524288 10.898 11.032 10.887 93.611 0.560 187.223
262144 6.714 7.230 6.804 148.219 4.724 592.875
131072 7.325 7.342 7.363 139.441 0.295 1115.530
65536 9.773 9.988 10.592 101.327 3.417 1621.227
32768 10.031 9.995 10.086 102.019 0.377 3264.620
16384 11.041 10.987 11.564 91.502 2.093 5856.144

Patched, 2MB readahead, 512 max_sectors_kb
blocksize R R R R(avg) R(std) R
(bytes) (s) (s) (s) (MB/s) (MB/s) (IOPS)
67108864 4.970 5.097 5.188 201.435 3.559 3.147
33554432 5.588 5.793 5.169 186.042 8.923 5.814
16777216 6.151 6.414 6.526 161.012 4.027 10.063
8388608 7.836 7.299 7.475 135.980 3.989 16.998
4194304 11.792 10.964 10.158 93.683 5.706 23.421
2097152 11.225 11.492 11.357 90.162 0.866 45.081
1048576 12.017 11.258 11.432 88.580 2.449 88.580
524288 5.974 10.883 11.840 117.323 38.361 234.647
262144 6.774 6.765 6.526 153.155 2.661 612.619
131072 8.036 7.324 7.341 135.579 5.766 1084.633
65536 9.964 10.595 9.999 100.608 2.806 1609.735
32768 10.132 10.036 10.190 101.197 0.637 3238.308
16384 11.133 11.568 11.036 91.093 1.850 5829.981

Patched, 512KB readahead, 64 max_sectors_kb
blocksize R R R R(avg) R(std) R
(bytes) (s) (s) (s) (MB/s) (MB/s) (IOPS)
67108864 3.722 3.698 3.721 275.759 0.809 4.309
33554432 4.058 3.849 3.957 259.063 5.580 8.096
16777216 4.601 4.613 4.738 220.212 2.913 13.763
8388608 5.039 5.534 5.017 197.452 8.791 24.682
4194304 6.302 6.270 6.282 162.942 0.341 40.735
2097152 7.314 7.302 7.069 141.700 2.233 70.850
1048576 6.881 7.655 6.909 143.597 6.951 143.597
524288 7.163 7.025 6.951 145.344 1.803 290.687
262144 7.315 7.233 7.299 140.621 0.689 562.482
131072 9.292 8.756 8.807 114.475 3.036 915.803
65536 9.942 9.985 9.960 102.787 0.181 1644.598
32768 10.721 10.091 10.192 99.154 2.605 3172.935
16384 11.049 11.016 11.065 92.727 0.169 5934.531

Patched, 2MB readahead, 64 max_sectors_kb
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 3.697 3.819 3.741 272.931 3.661 4.265
33554432 3.951 3.905 4.038 258.320 3.586 8.073
16777216 5.595 5.182 4.864 197.044 11.236 12.315
8388608 5.267 5.156 5.116 197.725 2.431 24.716
4194304 6.411 6.335 6.290 161.389 1.267 40.347
2097152 7.329 7.663 7.462 136.860 2.502 68.430
1048576 7.225 7.077 7.215 142.784 1.352 142.784
524288 6.903 7.015 7.095 146.210 1.647 292.419
262144 7.365 7.926 7.278 136.309 5.076 545.237
131072 8.796 8.819 8.814 116.233 0.130 929.862
65536 9.998 10.609 9.995 100.464 2.786 1607.423
32768 10.161 10.124 10.246 100.623 0.505 3219.943

Regards,
Ronald.

2009-06-29 14:21:50

by Fengguang Wu

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 29, 2009 at 10:00:20PM +0800, Ronald Moesbergen wrote:
> ... tests ...
>
> > We started with 2.6.29, so why not complete with it (to save additional
> > Ronald's effort to move on 2.6.30)?
> >
> >>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
> >>
> >> How about 2MB RAID readahead size? That transforms into about 512KB
> >> per-disk readahead size.
> >
> > OK. Ronald, can you 4 more test cases, please:
> >
> > 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
> >
> > 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
> > max_sectors_kb, the rest is default
> >
> > 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > read-ahead, the rest is default
> >
> > 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > read-ahead, 64 KB max_sectors_kb, the rest is default
>
> The results:

I made a blind average over all block sizes:

N MB/s IOPS case

0 114.859 984.148 Unpatched, 128KB readahead, 512 max_sectors_kb
1 122.960 981.213 Unpatched, 512KB readahead, 512 max_sectors_kb
2 120.709 985.111 Unpatched, 2MB readahead, 512 max_sectors_kb
3 158.732 1004.714 Unpatched, 512KB readahead, 64 max_sectors_kb
4 159.237 979.659 Unpatched, 2MB readahead, 64 max_sectors_kb

5 114.583 982.998 Patched, 128KB readahead, 512 max_sectors_kb
6 124.902 987.523 Patched, 512KB readahead, 512 max_sectors_kb
7 127.373 984.848 Patched, 2MB readahead, 512 max_sectors_kb
8 161.218 986.698 Patched, 512KB readahead, 64 max_sectors_kb
9 163.908 574.651 Patched, 2MB readahead, 64 max_sectors_kb

So before/after patch:

avg throughput 135.299 => 138.397 by +2.3%
avg IOPS 986.969 => 903.344 by -8.5%

The IOPS average looks a bit weird: the table for case 9 (patched, 2MB readahead,
64 max_sectors_kb) is missing its 16384-byte row, so that case loses its largest
IOPS contribution and pulls the patched average down.
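
For reference, each per-case number above is just the plain mean of the
R(avg, MB/s) and R(IOPS) columns of the corresponding table. A minimal
sketch of that computation in awk, assuming one table (header lines
included, they are skipped) has been saved to a file whose name here is
only a placeholder:

awk '$1 ~ /^[0-9]+$/ { mbps += $5; iops += $7; n++ }
     END { if (n > 0) printf "%.3f MB/s  %.3f IOPS  (%d block sizes)\n",
                             mbps / n, iops / n, n }' unpatched-128k-512.txt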

Summaries:
- this patch improves RAID throughput by +2.3% on average
- after this patch, 2MB readahead performs slightly better
(by 1-2%) than 512KB readahead

Thanks,
Fengguang

> Unpatched, 128KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.621 5.503 5.419 185.744 2.780 2.902
> 33554432 6.628 5.897 6.242 164.068 7.827 5.127
> 16777216 7.312 7.165 7.614 139.148 3.501 8.697
> 8388608 8.719 8.408 8.694 119.003 1.973 14.875
> 4194304 11.836 12.192 12.137 84.958 1.111 21.239
> 2097152 13.452 13.992 14.035 74.090 1.442 37.045
> 1048576 12.759 11.996 12.195 83.194 2.152 83.194
> 524288 11.895 12.297 12.587 83.570 1.945 167.140
> 262144 7.325 7.285 7.444 139.304 1.272 557.214
> 131072 7.992 8.832 7.952 124.279 5.901 994.228
> 65536 10.940 10.062 10.122 98.847 3.715 1581.545
> 32768 9.973 10.012 9.945 102.640 0.281 3284.493
> 16384 11.377 10.538 10.692 94.316 3.100 6036.222
>
> Unpatched, 512KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.032 4.770 5.265 204.228 8.271 3.191
> 33554432 5.569 5.712 5.863 179.263 3.755 5.602
> 16777216 6.661 6.857 6.550 153.132 2.888 9.571
> 8388608 8.022 8.000 7.978 127.998 0.288 16.000
> 4194304 10.959 11.579 12.208 88.586 3.902 22.146
> 2097152 13.692 12.670 12.625 78.906 2.914 39.453
> 1048576 11.120 11.144 10.878 92.703 1.018 92.703
> 524288 11.234 10.915 11.374 91.667 1.587 183.334
> 262144 6.848 6.678 6.795 151.191 1.594 604.763
> 131072 7.393 7.367 7.337 139.025 0.428 1112.202
> 65536 10.003 10.919 10.015 99.466 4.019 1591.462
> 32768 10.117 10.124 10.169 101.018 0.229 3232.574
> 16384 11.614 11.027 11.029 91.293 2.207 5842.771
>
> Unpatched, 2MB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.268 5.316 5.418 191.996 2.241 3.000
> 33554432 5.831 6.459 6.110 167.259 6.977 5.227
> 16777216 7.313 7.069 7.197 142.385 1.972 8.899
> 8388608 8.657 8.500 8.498 119.754 1.039 14.969
> 4194304 11.846 12.116 11.801 85.911 0.994 21.478
> 2097152 12.917 13.652 13.100 77.484 1.808 38.742
> 1048576 9.544 10.667 10.807 99.345 5.640 99.345
> 524288 11.736 7.171 6.599 128.410 29.539 256.821
> 262144 7.530 7.403 7.416 137.464 1.053 549.857
> 131072 8.741 8.002 8.022 124.256 5.029 994.051
> 65536 10.701 10.138 10.090 99.394 2.629 1590.311
> 32768 9.978 9.950 9.934 102.875 0.188 3291.994
> 16384 11.435 10.823 10.907 92.684 2.234 5931.749
>
> Unpatched, 512KB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.994 3.991 4.123 253.774 3.838 3.965
> 33554432 4.100 4.329 4.161 244.111 5.569 7.628
> 16777216 5.476 4.835 5.079 200.148 10.177 12.509
> 8388608 5.484 5.258 5.227 192.470 4.084 24.059
> 4194304 6.429 6.458 6.435 158.989 0.315 39.747
> 2097152 7.219 7.744 7.306 138.081 4.187 69.040
> 1048576 6.850 6.897 6.776 149.696 1.089 149.696
> 524288 6.406 6.393 6.469 159.439 0.814 318.877
> 262144 6.865 7.508 6.861 144.931 6.041 579.726
> 131072 8.435 8.482 8.307 121.792 1.076 974.334
> 65536 9.616 9.610 10.262 104.279 3.176 1668.462
> 32768 9.682 9.932 10.015 103.701 1.497 3318.428
> 16384 10.962 10.852 11.565 92.106 2.547 5894.813
>
> Unpatched, 2MB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.730 3.714 3.914 270.615 6.396 4.228
> 33554432 4.445 3.999 3.989 247.710 12.276 7.741
> 16777216 4.763 4.712 4.709 216.590 1.122 13.537
> 8388608 5.001 5.086 5.229 200.649 3.673 25.081
> 4194304 6.365 6.362 6.905 156.710 5.948 39.178
> 2097152 7.390 7.367 7.270 139.470 0.992 69.735
> 1048576 7.038 7.050 7.090 145.052 0.456 145.052
> 524288 6.862 7.167 7.278 144.272 3.617 288.544
> 262144 7.266 7.313 7.265 140.635 0.436 562.540
> 131072 8.677 8.735 8.821 117.108 0.790 936.865
> 65536 10.865 10.040 10.038 99.418 3.658 1590.685
> 32768 10.167 10.130 10.177 100.805 0.201 3225.749
> 16384 11.643 11.017 11.103 91.041 2.203 5826.629
>
> Patched, 128KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.670 5.188 5.636 186.555 7.671 2.915
> 33554432 6.069 5.971 6.141 168.992 1.954 5.281
> 16777216 7.821 7.501 7.372 135.451 3.340 8.466
> 8388608 9.147 8.618 9.000 114.849 2.908 14.356
> 4194304 12.199 12.914 12.381 81.981 1.964 20.495
> 2097152 13.449 13.891 14.288 73.842 1.828 36.921
> 1048576 11.890 12.182 11.519 86.360 1.984 86.360
> 524288 11.899 12.706 12.135 83.678 2.287 167.357
> 262144 7.460 7.559 7.563 136.041 0.864 544.164
> 131072 7.987 8.003 8.530 125.403 3.792 1003.220
> 65536 10.179 10.119 10.131 100.957 0.255 1615.312
> 32768 9.899 9.923 10.589 101.114 3.121 3235.656
> 16384 10.849 10.835 10.876 94.351 0.150 6038.474
>
> Patched, 512KB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 5.062 5.111 5.083 201.358 0.795 3.146
> 33554432 5.589 5.713 5.657 181.165 1.625 5.661
> 16777216 6.337 7.220 6.457 154.002 8.690 9.625
> 8388608 7.952 7.880 7.527 131.588 3.192 16.448
> 4194304 10.695 11.224 10.736 94.119 2.047 23.530
> 2097152 10.898 12.072 12.358 87.215 4.839 43.607
> 1048576 10.890 11.347 9.290 98.166 8.664 98.166
> 524288 10.898 11.032 10.887 93.611 0.560 187.223
> 262144 6.714 7.230 6.804 148.219 4.724 592.875
> 131072 7.325 7.342 7.363 139.441 0.295 1115.530
> 65536 9.773 9.988 10.592 101.327 3.417 1621.227
> 32768 10.031 9.995 10.086 102.019 0.377 3264.620
> 16384 11.041 10.987 11.564 91.502 2.093 5856.144
>
> Patched, 2MB readahead, 512 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 4.970 5.097 5.188 201.435 3.559 3.147
> 33554432 5.588 5.793 5.169 186.042 8.923 5.814
> 16777216 6.151 6.414 6.526 161.012 4.027 10.063
> 8388608 7.836 7.299 7.475 135.980 3.989 16.998
> 4194304 11.792 10.964 10.158 93.683 5.706 23.421
> 2097152 11.225 11.492 11.357 90.162 0.866 45.081
> 1048576 12.017 11.258 11.432 88.580 2.449 88.580
> 524288 5.974 10.883 11.840 117.323 38.361 234.647
> 262144 6.774 6.765 6.526 153.155 2.661 612.619
> 131072 8.036 7.324 7.341 135.579 5.766 1084.633
> 65536 9.964 10.595 9.999 100.608 2.806 1609.735
> 32768 10.132 10.036 10.190 101.197 0.637 3238.308
> 16384 11.133 11.568 11.036 91.093 1.850 5829.981
>
> Patched, 512KB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.722 3.698 3.721 275.759 0.809 4.309
> 33554432 4.058 3.849 3.957 259.063 5.580 8.096
> 16777216 4.601 4.613 4.738 220.212 2.913 13.763
> 8388608 5.039 5.534 5.017 197.452 8.791 24.682
> 4194304 6.302 6.270 6.282 162.942 0.341 40.735
> 2097152 7.314 7.302 7.069 141.700 2.233 70.850
> 1048576 6.881 7.655 6.909 143.597 6.951 143.597
> 524288 7.163 7.025 6.951 145.344 1.803 290.687
> 262144 7.315 7.233 7.299 140.621 0.689 562.482
> 131072 9.292 8.756 8.807 114.475 3.036 915.803
> 65536 9.942 9.985 9.960 102.787 0.181 1644.598
> 32768 10.721 10.091 10.192 99.154 2.605 3172.935
> 16384 11.049 11.016 11.065 92.727 0.169 5934.531
>
> Patched, 2MB readahead, 64 max_sectors_kb
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 3.697 3.819 3.741 272.931 3.661 4.265
> 33554432 3.951 3.905 4.038 258.320 3.586 8.073
> 16777216 5.595 5.182 4.864 197.044 11.236 12.315
> 8388608 5.267 5.156 5.116 197.725 2.431 24.716
> 4194304 6.411 6.335 6.290 161.389 1.267 40.347
> 2097152 7.329 7.663 7.462 136.860 2.502 68.430
> 1048576 7.225 7.077 7.215 142.784 1.352 142.784
> 524288 6.903 7.015 7.095 146.210 1.647 292.419
> 262144 7.365 7.926 7.278 136.309 5.076 545.237
> 131072 8.796 8.819 8.814 116.233 0.130 929.862
> 65536 9.998 10.609 9.995 100.464 2.786 1607.423
> 32768 10.161 10.124 10.246 100.623 0.505 3219.943
>
> Regards,
> Ronald.

2009-06-29 14:43:57

by Ronald Moesbergen

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/6/29 Wu Fengguang <[email protected]>:
> On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
>> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
>> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
>> > >
>> > > Why not 2.6.30? :)
>> >
>> > We started with 2.6.29, so why not complete with it (to save additional
>> > Ronald's effort to move on 2.6.30)?
>>
>> OK, that's fair enough.
>
> btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
> in case it will help the SCST performance.
>
> Ronald, if you run context readahead, please make sure that the server
> side readahead size is bigger than the client side readahead size.

I tried this patch on a vanilla kernel with no other patches applied,
but it does not seem to help. The iSCSI throughput does not go above
60 MB/s (1 GB in 17 seconds). I have tried several readahead settings
from 128KB up to 4MB and kept the server readahead at twice the client
readahead, but it never gets above 60 MB/s. This is using SCST on the
server side and open-iscsi on the client. I get much better throughput
(90 MB/s) when using the patches supplied with SCST, together with the
blk_run_backing_dev readahead patch.
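
In case someone wants to reproduce the setup: the readahead size is a
per-block-device setting, and the rule of thumb above (server readahead at
twice the client readahead) can be applied with something like the sketch
below. The device names are hypothetical; 512 KB on the client corresponds
to 1024 sectors of 512 bytes for blockdev --setra.

# Client (open-iscsi initiator): 512 KB readahead on the iSCSI disk.
blockdev --setra 1024 /dev/sdX

# Server (SCST target): twice the client value, i.e. 1 MB, set either via
# blockdev or via sysfs (read_ahead_kb is in kilobytes).
blockdev --setra 2048 /dev/md0
echo 1024 > /sys/block/md0/queue/read_ahead_kb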

Ronald.

2009-06-29 14:52:27

by Fengguang Wu

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 29, 2009 at 10:43:48PM +0800, Ronald Moesbergen wrote:
> 2009/6/29 Wu Fengguang <[email protected]>:
> > On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
> >> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
> >> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
> >> > >
> >> > > Why not 2.6.30? :)
> >> >
> >> > We started with 2.6.29, so why not complete with it (to save additional
> >> > Ronald's effort to move on 2.6.30)?
> >>
> >> OK, that's fair enough.
> >
> > btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
> > in case it will help the SCST performance.
> >
> > Ronald, if you run context readahead, please make sure that the server
> > side readahead size is bigger than the client side readahead size.
>
> I tried this patch on a vanilla kernel and no other patches applied,
> but it does not seem to help. The iSCSI throughput does not go above
> 60MB/s. (1GB in 17 seconds). I have tried several readahead settings
> from 128KB up to 4MB and kept the server readahead at twice the client
> readahead, but it never comes above 60MB/s. This is using SCST on the

OK, thanks for the tests anyway!

> serverside and openiscsi on the client. I get much better throughput
> (90 MB/s) when using the patches supplied with SCST, together with the

What do you mean by "patches supplied with SCST"?

> blk_run_backing_dev readahead patch.

Thanks,
Fengguang

2009-06-29 14:56:25

by Ronald Moesbergen

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/6/29 Wu Fengguang <[email protected]>:
> On Mon, Jun 29, 2009 at 10:43:48PM +0800, Ronald Moesbergen wrote:
>> 2009/6/29 Wu Fengguang <[email protected]>:
>> > On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
>> >> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
>> >> > Wu Fengguang, on 06/29/2009 04:54 PM wrote:
>> >> > >
>> >> > > Why not 2.6.30? :)
>> >> >
>> >> > We started with 2.6.29, so why not complete with it (to save additional
>> >> > Ronald's effort to move on 2.6.30)?
>> >>
>> >> OK, that's fair enough.
>> >
>> > btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
>> > in case it will help the SCST performance.
>> >
>> > Ronald, if you run context readahead, please make sure that the server
>> > side readahead size is bigger than the client side readahead size.
>>
>> I tried this patch on a vanilla kernel and no other patches applied,
>> but it does not seem to help. The iSCSI throughput does not go above
>> 60MB/s. (1GB in 17 seconds). I have tried several readahead settings
>> from 128KB up to 4MB and kept the server readahead at twice the client
>> readahead, but it never comes above 60MB/s. This is using SCST on the
>
> OK, thanks for the tests anyway!

You're welcome.

>> serverside and openiscsi on the client. I get much better throughput
>> (90 MB/s) when using the patches supplied with SCST, together with the
>
> What do you mean by "patches supplied with SCST"?

These:
http://scst.svn.sourceforge.net/viewvc/scst/trunk/scst/kernel/

Regards,
Ronald.

2009-06-29 15:01:53

by Fengguang Wu

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jun 29, 2009 at 10:21:24PM +0800, Wu Fengguang wrote:
> On Mon, Jun 29, 2009 at 10:00:20PM +0800, Ronald Moesbergen wrote:
> > ... tests ...
> >
> > > We started with 2.6.29, so why not complete with it (to save additional
> > > Ronald's effort to move on 2.6.30)?
> > >
> > >>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
> > >>
> > >> How about 2MB RAID readahead size? That transforms into about 512KB
> > >> per-disk readahead size.
> > >
> > > OK. Ronald, can you 4 more test cases, please:
> > >
> > > 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
> > >
> > > 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
> > > max_sectors_kb, the rest is default
> > >
> > > 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > > read-ahead, the rest is default
> > >
> > > 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
> > > read-ahead, 64 KB max_sectors_kb, the rest is default
> >
> > The results:
>
> I made a blindless average:
>
> N MB/s IOPS case
>
> 0 114.859 984.148 Unpatched, 128KB readahead, 512 max_sectors_kb
> 1 122.960 981.213 Unpatched, 512KB readahead, 512 max_sectors_kb
> 2 120.709 985.111 Unpatched, 2MB readahead, 512 max_sectors_kb
> 3 158.732 1004.714 Unpatched, 512KB readahead, 64 max_sectors_kb
> 4 159.237 979.659 Unpatched, 2MB readahead, 64 max_sectors_kb
>
> 5 114.583 982.998 Patched, 128KB readahead, 512 max_sectors_kb
> 6 124.902 987.523 Patched, 512KB readahead, 512 max_sectors_kb
> 7 127.373 984.848 Patched, 2MB readahead, 512 max_sectors_kb
> 8 161.218 986.698 Patched, 512KB readahead, 64 max_sectors_kb
> 9 163.908 574.651 Patched, 2MB readahead, 64 max_sectors_kb
>
> So before/after patch:
>
> avg throughput 135.299 => 138.397 by +2.3%
> avg IOPS 986.969 => 903.344 by -8.5%
>
> The IOPS is a bit weird.
>
> Summaries:
> - this patch improves RAID throughput by +2.3% on average
> - after this patch, 2MB readahead performs slightly better
> (by 1-2%) than 512KB readahead

and the most important one:
- 64 max_sectors_kb performs much better than 512 max_sectors_kb, by ~30%!
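
For completeness, max_sectors_kb is a per-disk queue limit under sysfs that
caps the size of a single request sent to the disk. A minimal sketch of
switching between the two values used in these runs; the disk names are
hypothetical and max_hw_sectors_kb shows the hardware upper bound:

for d in sda sdb sdc; do
        cat /sys/block/$d/queue/max_hw_sectors_kb
        echo 64 > /sys/block/$d/queue/max_sectors_kb     # the 64 KB cases
        # echo 512 > /sys/block/$d/queue/max_sectors_kb  # the 512 KB cases
done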

Thanks,
Fengguang

> > Unpatched, 128KB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.621 5.503 5.419 185.744 2.780 2.902
> > 33554432 6.628 5.897 6.242 164.068 7.827 5.127
> > 16777216 7.312 7.165 7.614 139.148 3.501 8.697
> > 8388608 8.719 8.408 8.694 119.003 1.973 14.875
> > 4194304 11.836 12.192 12.137 84.958 1.111 21.239
> > 2097152 13.452 13.992 14.035 74.090 1.442 37.045
> > 1048576 12.759 11.996 12.195 83.194 2.152 83.194
> > 524288 11.895 12.297 12.587 83.570 1.945 167.140
> > 262144 7.325 7.285 7.444 139.304 1.272 557.214
> > 131072 7.992 8.832 7.952 124.279 5.901 994.228
> > 65536 10.940 10.062 10.122 98.847 3.715 1581.545
> > 32768 9.973 10.012 9.945 102.640 0.281 3284.493
> > 16384 11.377 10.538 10.692 94.316 3.100 6036.222
> >
> > Unpatched, 512KB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.032 4.770 5.265 204.228 8.271 3.191
> > 33554432 5.569 5.712 5.863 179.263 3.755 5.602
> > 16777216 6.661 6.857 6.550 153.132 2.888 9.571
> > 8388608 8.022 8.000 7.978 127.998 0.288 16.000
> > 4194304 10.959 11.579 12.208 88.586 3.902 22.146
> > 2097152 13.692 12.670 12.625 78.906 2.914 39.453
> > 1048576 11.120 11.144 10.878 92.703 1.018 92.703
> > 524288 11.234 10.915 11.374 91.667 1.587 183.334
> > 262144 6.848 6.678 6.795 151.191 1.594 604.763
> > 131072 7.393 7.367 7.337 139.025 0.428 1112.202
> > 65536 10.003 10.919 10.015 99.466 4.019 1591.462
> > 32768 10.117 10.124 10.169 101.018 0.229 3232.574
> > 16384 11.614 11.027 11.029 91.293 2.207 5842.771
> >
> > Unpatched, 2MB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.268 5.316 5.418 191.996 2.241 3.000
> > 33554432 5.831 6.459 6.110 167.259 6.977 5.227
> > 16777216 7.313 7.069 7.197 142.385 1.972 8.899
> > 8388608 8.657 8.500 8.498 119.754 1.039 14.969
> > 4194304 11.846 12.116 11.801 85.911 0.994 21.478
> > 2097152 12.917 13.652 13.100 77.484 1.808 38.742
> > 1048576 9.544 10.667 10.807 99.345 5.640 99.345
> > 524288 11.736 7.171 6.599 128.410 29.539 256.821
> > 262144 7.530 7.403 7.416 137.464 1.053 549.857
> > 131072 8.741 8.002 8.022 124.256 5.029 994.051
> > 65536 10.701 10.138 10.090 99.394 2.629 1590.311
> > 32768 9.978 9.950 9.934 102.875 0.188 3291.994
> > 16384 11.435 10.823 10.907 92.684 2.234 5931.749
> >
> > Unpatched, 512KB readahead, 64 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 3.994 3.991 4.123 253.774 3.838 3.965
> > 33554432 4.100 4.329 4.161 244.111 5.569 7.628
> > 16777216 5.476 4.835 5.079 200.148 10.177 12.509
> > 8388608 5.484 5.258 5.227 192.470 4.084 24.059
> > 4194304 6.429 6.458 6.435 158.989 0.315 39.747
> > 2097152 7.219 7.744 7.306 138.081 4.187 69.040
> > 1048576 6.850 6.897 6.776 149.696 1.089 149.696
> > 524288 6.406 6.393 6.469 159.439 0.814 318.877
> > 262144 6.865 7.508 6.861 144.931 6.041 579.726
> > 131072 8.435 8.482 8.307 121.792 1.076 974.334
> > 65536 9.616 9.610 10.262 104.279 3.176 1668.462
> > 32768 9.682 9.932 10.015 103.701 1.497 3318.428
> > 16384 10.962 10.852 11.565 92.106 2.547 5894.813
> >
> > Unpatched, 2MB readahead, 64 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 3.730 3.714 3.914 270.615 6.396 4.228
> > 33554432 4.445 3.999 3.989 247.710 12.276 7.741
> > 16777216 4.763 4.712 4.709 216.590 1.122 13.537
> > 8388608 5.001 5.086 5.229 200.649 3.673 25.081
> > 4194304 6.365 6.362 6.905 156.710 5.948 39.178
> > 2097152 7.390 7.367 7.270 139.470 0.992 69.735
> > 1048576 7.038 7.050 7.090 145.052 0.456 145.052
> > 524288 6.862 7.167 7.278 144.272 3.617 288.544
> > 262144 7.266 7.313 7.265 140.635 0.436 562.540
> > 131072 8.677 8.735 8.821 117.108 0.790 936.865
> > 65536 10.865 10.040 10.038 99.418 3.658 1590.685
> > 32768 10.167 10.130 10.177 100.805 0.201 3225.749
> > 16384 11.643 11.017 11.103 91.041 2.203 5826.629
> >
> > Patched, 128KB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.670 5.188 5.636 186.555 7.671 2.915
> > 33554432 6.069 5.971 6.141 168.992 1.954 5.281
> > 16777216 7.821 7.501 7.372 135.451 3.340 8.466
> > 8388608 9.147 8.618 9.000 114.849 2.908 14.356
> > 4194304 12.199 12.914 12.381 81.981 1.964 20.495
> > 2097152 13.449 13.891 14.288 73.842 1.828 36.921
> > 1048576 11.890 12.182 11.519 86.360 1.984 86.360
> > 524288 11.899 12.706 12.135 83.678 2.287 167.357
> > 262144 7.460 7.559 7.563 136.041 0.864 544.164
> > 131072 7.987 8.003 8.530 125.403 3.792 1003.220
> > 65536 10.179 10.119 10.131 100.957 0.255 1615.312
> > 32768 9.899 9.923 10.589 101.114 3.121 3235.656
> > 16384 10.849 10.835 10.876 94.351 0.150 6038.474
> >
> > Patched, 512KB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 5.062 5.111 5.083 201.358 0.795 3.146
> > 33554432 5.589 5.713 5.657 181.165 1.625 5.661
> > 16777216 6.337 7.220 6.457 154.002 8.690 9.625
> > 8388608 7.952 7.880 7.527 131.588 3.192 16.448
> > 4194304 10.695 11.224 10.736 94.119 2.047 23.530
> > 2097152 10.898 12.072 12.358 87.215 4.839 43.607
> > 1048576 10.890 11.347 9.290 98.166 8.664 98.166
> > 524288 10.898 11.032 10.887 93.611 0.560 187.223
> > 262144 6.714 7.230 6.804 148.219 4.724 592.875
> > 131072 7.325 7.342 7.363 139.441 0.295 1115.530
> > 65536 9.773 9.988 10.592 101.327 3.417 1621.227
> > 32768 10.031 9.995 10.086 102.019 0.377 3264.620
> > 16384 11.041 10.987 11.564 91.502 2.093 5856.144
> >
> > Patched, 2MB readahead, 512 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 4.970 5.097 5.188 201.435 3.559 3.147
> > 33554432 5.588 5.793 5.169 186.042 8.923 5.814
> > 16777216 6.151 6.414 6.526 161.012 4.027 10.063
> > 8388608 7.836 7.299 7.475 135.980 3.989 16.998
> > 4194304 11.792 10.964 10.158 93.683 5.706 23.421
> > 2097152 11.225 11.492 11.357 90.162 0.866 45.081
> > 1048576 12.017 11.258 11.432 88.580 2.449 88.580
> > 524288 5.974 10.883 11.840 117.323 38.361 234.647
> > 262144 6.774 6.765 6.526 153.155 2.661 612.619
> > 131072 8.036 7.324 7.341 135.579 5.766 1084.633
> > 65536 9.964 10.595 9.999 100.608 2.806 1609.735
> > 32768 10.132 10.036 10.190 101.197 0.637 3238.308
> > 16384 11.133 11.568 11.036 91.093 1.850 5829.981
> >
> > Patched, 512KB readahead, 64 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 3.722 3.698 3.721 275.759 0.809 4.309
> > 33554432 4.058 3.849 3.957 259.063 5.580 8.096
> > 16777216 4.601 4.613 4.738 220.212 2.913 13.763
> > 8388608 5.039 5.534 5.017 197.452 8.791 24.682
> > 4194304 6.302 6.270 6.282 162.942 0.341 40.735
> > 2097152 7.314 7.302 7.069 141.700 2.233 70.850
> > 1048576 6.881 7.655 6.909 143.597 6.951 143.597
> > 524288 7.163 7.025 6.951 145.344 1.803 290.687
> > 262144 7.315 7.233 7.299 140.621 0.689 562.482
> > 131072 9.292 8.756 8.807 114.475 3.036 915.803
> > 65536 9.942 9.985 9.960 102.787 0.181 1644.598
> > 32768 10.721 10.091 10.192 99.154 2.605 3172.935
> > 16384 11.049 11.016 11.065 92.727 0.169 5934.531
> >
> > Patched, 2MB readahead, 64 max_sectors_kb
> > blocksize R R R R(avg, R(std R
> > (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> > 67108864 3.697 3.819 3.741 272.931 3.661 4.265
> > 33554432 3.951 3.905 4.038 258.320 3.586 8.073
> > 16777216 5.595 5.182 4.864 197.044 11.236 12.315
> > 8388608 5.267 5.156 5.116 197.725 2.431 24.716
> > 4194304 6.411 6.335 6.290 161.389 1.267 40.347
> > 2097152 7.329 7.663 7.462 136.860 2.502 68.430
> > 1048576 7.225 7.077 7.215 142.784 1.352 142.784
> > 524288 6.903 7.015 7.095 146.210 1.647 292.419
> > 262144 7.365 7.926 7.278 136.309 5.076 545.237
> > 131072 8.796 8.819 8.814 116.233 0.130 929.862
> > 65536 9.998 10.609 9.995 100.464 2.786 1607.423
> > 32768 10.161 10.124 10.246 100.623 0.505 3219.943
> >
> > Regards,
> > Ronald.

2009-06-29 15:37:58

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Wu Fengguang, on 06/29/2009 06:51 PM wrote:
> On Mon, Jun 29, 2009 at 10:43:48PM +0800, Ronald Moesbergen wrote:
>> 2009/6/29 Wu Fengguang <[email protected]>:
>>> On Mon, Jun 29, 2009 at 09:13:27PM +0800, Wu Fengguang wrote:
>>>> On Mon, Jun 29, 2009 at 09:04:57PM +0800, Vladislav Bolkhovitin wrote:
>>>>> Wu Fengguang, on 06/29/2009 04:54 PM wrote:
>>>>>> Why not 2.6.30? :)
>>>>> We started with 2.6.29, so why not complete with it (to save additional
>>>>> Ronald's effort to move on 2.6.30)?
>>>> OK, that's fair enough.
>>> btw, I backported the 2.6.31 context readahead patches to 2.6.29, just
>>> in case it will help the SCST performance.
>>>
>>> Ronald, if you run context readahead, please make sure that the server
>>> side readahead size is bigger than the client side readahead size.
>> I tried this patch on a vanilla kernel and no other patches applied,
>> but it does not seem to help. The iSCSI throughput does not go above
>> 60MB/s. (1GB in 17 seconds). I have tried several readahead settings
>> from 128KB up to 4MB and kept the server readahead at twice the client
>> readahead, but it never comes above 60MB/s. This is using SCST on the
>
> OK, thanks for the tests anyway!
>
>> serverside and openiscsi on the client. I get much better throughput
>> (90 MB/s) when using the patches supplied with SCST, together with the
>
> What do you mean by "patches supplied with SCST"?

Ronald means the io_context patch
(http://scst.svn.sourceforge.net/viewvc/scst/trunk/scst/kernel/io_context-2.6.29.patch?revision=717),
which allows SCST's I/O threads to share a single IO context.

Vlad

2009-06-29 15:38:18

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Wu Fengguang, on 06/29/2009 07:01 PM wrote:
> On Mon, Jun 29, 2009 at 10:21:24PM +0800, Wu Fengguang wrote:
>> On Mon, Jun 29, 2009 at 10:00:20PM +0800, Ronald Moesbergen wrote:
>>> ... tests ...
>>>
>>>> We started with 2.6.29, so why not complete with it (to save additional
>>>> Ronald's effort to move on 2.6.30)?
>>>>
>>>>>> 2. Default vanilla 2.6.29 kernel, 512 KB read-ahead, the rest is default
>>>>> How about 2MB RAID readahead size? That transforms into about 512KB
>>>>> per-disk readahead size.
>>>> OK. Ronald, can you 4 more test cases, please:
>>>>
>>>> 7. Default vanilla 2.6.29 kernel, 2MB read-ahead, the rest is default
>>>>
>>>> 8. Default vanilla 2.6.29 kernel, 2MB read-ahead, 64 KB
>>>> max_sectors_kb, the rest is default
>>>>
>>>> 9. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
>>>> read-ahead, the rest is default
>>>>
>>>> 10. Patched by the Fengguang's patch vanilla 2.6.29 kernel, 2MB
>>>> read-ahead, 64 KB max_sectors_kb, the rest is default
>>> The results:
>> I made a blindless average:
>>
>> N MB/s IOPS case
>>
>> 0 114.859 984.148 Unpatched, 128KB readahead, 512 max_sectors_kb
>> 1 122.960 981.213 Unpatched, 512KB readahead, 512 max_sectors_kb
>> 2 120.709 985.111 Unpatched, 2MB readahead, 512 max_sectors_kb
>> 3 158.732 1004.714 Unpatched, 512KB readahead, 64 max_sectors_kb
>> 4 159.237 979.659 Unpatched, 2MB readahead, 64 max_sectors_kb
>>
>> 5 114.583 982.998 Patched, 128KB readahead, 512 max_sectors_kb
>> 6 124.902 987.523 Patched, 512KB readahead, 512 max_sectors_kb
>> 7 127.373 984.848 Patched, 2MB readahead, 512 max_sectors_kb
>> 8 161.218 986.698 Patched, 512KB readahead, 64 max_sectors_kb
>> 9 163.908 574.651 Patched, 2MB readahead, 64 max_sectors_kb
>>
>> So before/after patch:
>>
>> avg throughput 135.299 => 138.397 by +2.3%
>> avg IOPS 986.969 => 903.344 by -8.5%
>>
>> The IOPS is a bit weird.
>>
>> Summaries:
>> - this patch improves RAID throughput by +2.3% on average
>> - after this patch, 2MB readahead performs slightly better
>> (by 1-2%) than 512KB readahead
>
> and the most important one:
> - 64 max_sectors_kb performs much better than 256 max_sectors_kb, by ~30% !

Yes, I was just about to point that out ;)

> Thanks,
> Fengguang
>
>>> Unpatched, 128KB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.621 5.503 5.419 185.744 2.780 2.902
>>> 33554432 6.628 5.897 6.242 164.068 7.827 5.127
>>> 16777216 7.312 7.165 7.614 139.148 3.501 8.697
>>> 8388608 8.719 8.408 8.694 119.003 1.973 14.875
>>> 4194304 11.836 12.192 12.137 84.958 1.111 21.239
>>> 2097152 13.452 13.992 14.035 74.090 1.442 37.045
>>> 1048576 12.759 11.996 12.195 83.194 2.152 83.194
>>> 524288 11.895 12.297 12.587 83.570 1.945 167.140
>>> 262144 7.325 7.285 7.444 139.304 1.272 557.214
>>> 131072 7.992 8.832 7.952 124.279 5.901 994.228
>>> 65536 10.940 10.062 10.122 98.847 3.715 1581.545
>>> 32768 9.973 10.012 9.945 102.640 0.281 3284.493
>>> 16384 11.377 10.538 10.692 94.316 3.100 6036.222
>>>
>>> Unpatched, 512KB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.032 4.770 5.265 204.228 8.271 3.191
>>> 33554432 5.569 5.712 5.863 179.263 3.755 5.602
>>> 16777216 6.661 6.857 6.550 153.132 2.888 9.571
>>> 8388608 8.022 8.000 7.978 127.998 0.288 16.000
>>> 4194304 10.959 11.579 12.208 88.586 3.902 22.146
>>> 2097152 13.692 12.670 12.625 78.906 2.914 39.453
>>> 1048576 11.120 11.144 10.878 92.703 1.018 92.703
>>> 524288 11.234 10.915 11.374 91.667 1.587 183.334
>>> 262144 6.848 6.678 6.795 151.191 1.594 604.763
>>> 131072 7.393 7.367 7.337 139.025 0.428 1112.202
>>> 65536 10.003 10.919 10.015 99.466 4.019 1591.462
>>> 32768 10.117 10.124 10.169 101.018 0.229 3232.574
>>> 16384 11.614 11.027 11.029 91.293 2.207 5842.771
>>>
>>> Unpatched, 2MB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.268 5.316 5.418 191.996 2.241 3.000
>>> 33554432 5.831 6.459 6.110 167.259 6.977 5.227
>>> 16777216 7.313 7.069 7.197 142.385 1.972 8.899
>>> 8388608 8.657 8.500 8.498 119.754 1.039 14.969
>>> 4194304 11.846 12.116 11.801 85.911 0.994 21.478
>>> 2097152 12.917 13.652 13.100 77.484 1.808 38.742
>>> 1048576 9.544 10.667 10.807 99.345 5.640 99.345
>>> 524288 11.736 7.171 6.599 128.410 29.539 256.821
>>> 262144 7.530 7.403 7.416 137.464 1.053 549.857
>>> 131072 8.741 8.002 8.022 124.256 5.029 994.051
>>> 65536 10.701 10.138 10.090 99.394 2.629 1590.311
>>> 32768 9.978 9.950 9.934 102.875 0.188 3291.994
>>> 16384 11.435 10.823 10.907 92.684 2.234 5931.749
>>>
>>> Unpatched, 512KB readahead, 64 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 3.994 3.991 4.123 253.774 3.838 3.965
>>> 33554432 4.100 4.329 4.161 244.111 5.569 7.628
>>> 16777216 5.476 4.835 5.079 200.148 10.177 12.509
>>> 8388608 5.484 5.258 5.227 192.470 4.084 24.059
>>> 4194304 6.429 6.458 6.435 158.989 0.315 39.747
>>> 2097152 7.219 7.744 7.306 138.081 4.187 69.040
>>> 1048576 6.850 6.897 6.776 149.696 1.089 149.696
>>> 524288 6.406 6.393 6.469 159.439 0.814 318.877
>>> 262144 6.865 7.508 6.861 144.931 6.041 579.726
>>> 131072 8.435 8.482 8.307 121.792 1.076 974.334
>>> 65536 9.616 9.610 10.262 104.279 3.176 1668.462
>>> 32768 9.682 9.932 10.015 103.701 1.497 3318.428
>>> 16384 10.962 10.852 11.565 92.106 2.547 5894.813
>>>
>>> Unpatched, 2MB readahead, 64 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 3.730 3.714 3.914 270.615 6.396 4.228
>>> 33554432 4.445 3.999 3.989 247.710 12.276 7.741
>>> 16777216 4.763 4.712 4.709 216.590 1.122 13.537
>>> 8388608 5.001 5.086 5.229 200.649 3.673 25.081
>>> 4194304 6.365 6.362 6.905 156.710 5.948 39.178
>>> 2097152 7.390 7.367 7.270 139.470 0.992 69.735
>>> 1048576 7.038 7.050 7.090 145.052 0.456 145.052
>>> 524288 6.862 7.167 7.278 144.272 3.617 288.544
>>> 262144 7.266 7.313 7.265 140.635 0.436 562.540
>>> 131072 8.677 8.735 8.821 117.108 0.790 936.865
>>> 65536 10.865 10.040 10.038 99.418 3.658 1590.685
>>> 32768 10.167 10.130 10.177 100.805 0.201 3225.749
>>> 16384 11.643 11.017 11.103 91.041 2.203 5826.629
>>>
>>> Patched, 128KB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.670 5.188 5.636 186.555 7.671 2.915
>>> 33554432 6.069 5.971 6.141 168.992 1.954 5.281
>>> 16777216 7.821 7.501 7.372 135.451 3.340 8.466
>>> 8388608 9.147 8.618 9.000 114.849 2.908 14.356
>>> 4194304 12.199 12.914 12.381 81.981 1.964 20.495
>>> 2097152 13.449 13.891 14.288 73.842 1.828 36.921
>>> 1048576 11.890 12.182 11.519 86.360 1.984 86.360
>>> 524288 11.899 12.706 12.135 83.678 2.287 167.357
>>> 262144 7.460 7.559 7.563 136.041 0.864 544.164
>>> 131072 7.987 8.003 8.530 125.403 3.792 1003.220
>>> 65536 10.179 10.119 10.131 100.957 0.255 1615.312
>>> 32768 9.899 9.923 10.589 101.114 3.121 3235.656
>>> 16384 10.849 10.835 10.876 94.351 0.150 6038.474
>>>
>>> Patched, 512KB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 5.062 5.111 5.083 201.358 0.795 3.146
>>> 33554432 5.589 5.713 5.657 181.165 1.625 5.661
>>> 16777216 6.337 7.220 6.457 154.002 8.690 9.625
>>> 8388608 7.952 7.880 7.527 131.588 3.192 16.448
>>> 4194304 10.695 11.224 10.736 94.119 2.047 23.530
>>> 2097152 10.898 12.072 12.358 87.215 4.839 43.607
>>> 1048576 10.890 11.347 9.290 98.166 8.664 98.166
>>> 524288 10.898 11.032 10.887 93.611 0.560 187.223
>>> 262144 6.714 7.230 6.804 148.219 4.724 592.875
>>> 131072 7.325 7.342 7.363 139.441 0.295 1115.530
>>> 65536 9.773 9.988 10.592 101.327 3.417 1621.227
>>> 32768 10.031 9.995 10.086 102.019 0.377 3264.620
>>> 16384 11.041 10.987 11.564 91.502 2.093 5856.144
>>>
>>> Patched, 2MB readahead, 512 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 4.970 5.097 5.188 201.435 3.559 3.147
>>> 33554432 5.588 5.793 5.169 186.042 8.923 5.814
>>> 16777216 6.151 6.414 6.526 161.012 4.027 10.063
>>> 8388608 7.836 7.299 7.475 135.980 3.989 16.998
>>> 4194304 11.792 10.964 10.158 93.683 5.706 23.421
>>> 2097152 11.225 11.492 11.357 90.162 0.866 45.081
>>> 1048576 12.017 11.258 11.432 88.580 2.449 88.580
>>> 524288 5.974 10.883 11.840 117.323 38.361 234.647
>>> 262144 6.774 6.765 6.526 153.155 2.661 612.619
>>> 131072 8.036 7.324 7.341 135.579 5.766 1084.633
>>> 65536 9.964 10.595 9.999 100.608 2.806 1609.735
>>> 32768 10.132 10.036 10.190 101.197 0.637 3238.308
>>> 16384 11.133 11.568 11.036 91.093 1.850 5829.981
>>>
>>> Patched, 512KB readahead, 64 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 3.722 3.698 3.721 275.759 0.809 4.309
>>> 33554432 4.058 3.849 3.957 259.063 5.580 8.096
>>> 16777216 4.601 4.613 4.738 220.212 2.913 13.763
>>> 8388608 5.039 5.534 5.017 197.452 8.791 24.682
>>> 4194304 6.302 6.270 6.282 162.942 0.341 40.735
>>> 2097152 7.314 7.302 7.069 141.700 2.233 70.850
>>> 1048576 6.881 7.655 6.909 143.597 6.951 143.597
>>> 524288 7.163 7.025 6.951 145.344 1.803 290.687
>>> 262144 7.315 7.233 7.299 140.621 0.689 562.482
>>> 131072 9.292 8.756 8.807 114.475 3.036 915.803
>>> 65536 9.942 9.985 9.960 102.787 0.181 1644.598
>>> 32768 10.721 10.091 10.192 99.154 2.605 3172.935
>>> 16384 11.049 11.016 11.065 92.727 0.169 5934.531
>>>
>>> Patched, 2MB readahead, 64 max_sectors_kb
>>> blocksize R R R R(avg, R(std R
>>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>>> 67108864 3.697 3.819 3.741 272.931 3.661 4.265
>>> 33554432 3.951 3.905 4.038 258.320 3.586 8.073
>>> 16777216 5.595 5.182 4.864 197.044 11.236 12.315
>>> 8388608 5.267 5.156 5.116 197.725 2.431 24.716
>>> 4194304 6.411 6.335 6.290 161.389 1.267 40.347
>>> 2097152 7.329 7.663 7.462 136.860 2.502 68.430
>>> 1048576 7.225 7.077 7.215 142.784 1.352 142.784
>>> 524288 6.903 7.015 7.095 146.210 1.647 292.419
>>> 262144 7.365 7.926 7.278 136.309 5.076 545.237
>>> 131072 8.796 8.819 8.814 116.233 0.130 929.862
>>> 65536 9.998 10.609 9.995 100.464 2.786 1607.423
>>> 32768 10.161 10.124 10.246 100.623 0.505 3219.943
>>>
>>> Regards,
>>> Ronald.

2009-06-30 10:23:13

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

#!/bin/sh

############################################################################
#
# Script for testing block device I/O performance. Running this script on a
# block device that is connected to a remote SCST target device makes it
# possible to test the performance of the transport protocols implemented
# in SCST. This script operates similarly to iozone, but is easier to use.
#
# Copyright (C) 2009 Bart Van Assche <[email protected]>.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation, version 2
# of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
############################################################################

#########################
# Function definitions #
#########################

usage() {
    echo "Usage: $0 [-a] [-d] [-i <i>] [-n] [-r] [-s <l2s>] <dev>"
    echo "  -a    - use asynchronous (buffered) I/O."
    echo "  -d    - use direct (non-buffered) I/O."
    echo "  -i    - number of times each test is iterated."
    echo "  -n    - do not verify the data on <dev> before overwriting it."
    echo "  -r    - only perform the read test."
    echo "  -s    - logarithm base two of the I/O size."
    echo "  <dev> - block device to run the I/O performance test on."
}

# Echo ((2**$1))
pow2() {
    if [ $1 = 0 ]; then
        echo 1
    else
        echo $((2 * $(pow2 $(($1 - 1)) ) ))
    fi
}

drop_caches() {
    sync
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches
    fi
}

# Read times in seconds from stdin, one number per line, echo each number
# using format $1, and also echo the average transfer size in MB/s, its
# standard deviation and the number of IOPS using the total I/O size $2 and
# the block transfer size $3.
echo_and_calc_avg() {
    awk -v fmt="$1" -v iosize="$2" -v blocksize="$3" '
        BEGIN { pow_2_20 = 1024 * 1024 }
        {
            if ($1 != 0) {
                n++
                sum   += iosize / $1
                sumsq += iosize * iosize / ($1 * $1)
            }
            printf fmt, $1
        }
        END {
            d      = (n > 0 ? sumsq / n - sum * sum / n / n : 0)
            avg    = (n > 0 ? sum / n : 0)
            stddev = (d > 0 ? sqrt(d) : 0)
            iops   = avg / blocksize
            printf fmt fmt fmt, avg / pow_2_20, stddev / pow_2_20, iops
        }'
}
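# As an illustration (not executed), using the 67108864-byte rows from the
# tables earlier in this thread:
#   printf '5.621\n5.503\n5.419\n' \
#       | echo_and_calc_avg "%8.3f " 1073741824 67108864
# prints the three times followed by 185.744 (avg MB/s), the standard
# deviation and 2.902 (IOPS), i.e. the average bytes/s divided by the
# block size.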

#########################
# Default settings #
#########################

iterations=3
log2_io_size=30 # 1 GB
log2_min_blocksize=9 # 512 bytes
log2_max_blocksize=26 # 64 MB
iotype=direct
read_test_only=false
verify_device_data=true


#########################
# Argument processing #
#########################

set -- $(/usr/bin/getopt "adhi:nrs:" "$@")
while [ "$1" != "${1#-}" ]
do
    case "$1" in
        '-a') iotype="buffered"; shift;;
        '-d') iotype="direct"; shift;;
        '-i') iterations="$2"; shift; shift;;
        '-n') verify_device_data="false"; shift;;
        '-r') read_test_only="true"; shift;;
        '-s') log2_io_size="$2"; shift; shift;;
        '--') shift;;
        *)    usage; exit 1;;
    esac
done

if [ "$#" != 1 ]; then
    usage
    exit 1
fi

device="$1"


####################
# Performance test #
####################

if [ ! -e "${device}" ]; then
echo "Error: device ${device} does not exist."
exit 1
fi

if [ "${read_test_only}" = "false" -a ! -w "${device}" ]; then
echo "Error: device ${device} is not writeable."
exit 1
fi

if [ "${read_test_only}" = "false" -a "${verify_device_data}" = "true" ] \
&& ! cmp -s -n $(pow2 $log2_io_size) "${device}" /dev/zero
then
echo "Error: device ${device} still contains data."
exit 1
fi

if [ "${iotype}" = "direct" ]; then
dd_oflags="oflag=direct"
dd_iflags="iflag=direct"
else
dd_oflags="oflag=sync"
dd_iflags=""
fi

# Header, line 1
printf "%9s " blocksize
i=0
while [ $i -lt ${iterations} ]
do
    printf "%8s " "W"
    i=$((i+1))
done
printf "%8s %8s %8s " "W(avg," "W(std," "W"
i=0
while [ $i -lt ${iterations} ]
do
    printf "%8s " "R"
    i=$((i+1))
done
printf "%8s %8s %8s" "R(avg," "R(std" "R"
printf "\n"

# Header, line 2
printf "%9s " "(bytes)"
i=0
while [ $i -lt ${iterations} ]
do
    printf "%8s " "(s)"
    i=$((i+1))
done
printf "%8s %8s %8s " "MB/s)" ",MB/s)" "(IOPS)"
i=0
while [ $i -lt ${iterations} ]
do
    printf "%8s " "(s)"
    i=$((i+1))
done
printf "%8s %8s %8s" "MB/s)" ",MB/s)" "(IOPS)"
printf "\n"

# Measurements
log2_blocksize=${log2_max_blocksize}
while [ ! $log2_blocksize -lt $log2_min_blocksize ]
do
    # Skip block sizes larger than the total I/O size.
    if [ $log2_blocksize -gt $log2_io_size ]; then
        log2_blocksize=$((log2_blocksize - 1))
        continue
    fi
    iosize=$(pow2 $log2_io_size)
    bs=$(pow2 $log2_blocksize)
    count=$(pow2 $(($log2_io_size - $log2_blocksize)))
    printf "%9d " ${bs}
    # Write pass: measure dd write times (or echo 0 when -r was given).
    i=0
    while [ $i -lt ${iterations} ]
    do
        if [ "${read_test_only}" = "false" ]; then
            drop_caches
            dd if=/dev/zero of="${device}" bs=${bs} count=${count} \
                ${dd_oflags} 2>&1 \
                | sed -n 's/.* \([0-9.]*\) s,.*/\1/p'
        else
            echo 0
        fi
        i=$((i+1))
    done | echo_and_calc_avg "%8.3f " ${iosize} ${bs}

    # Read pass: measure dd read times.
    i=0
    while [ $i -lt ${iterations} ]
    do
        drop_caches
        dd if="${device}" of=/dev/null bs=${bs} count=${count} \
            ${dd_iflags} 2>&1 \
            | sed -n 's/.* \([0-9.]*\) s,.*/\1/p'
        i=$((i+1))
    done | echo_and_calc_avg "%8.3f " ${iosize} ${bs}
    printf "\n"
    log2_blocksize=$((log2_blocksize - 1))
done
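
A hypothetical invocation of this script, presumably run on the initiator
side against the iSCSI-attached block device; the exact options used for
the results in this thread are not stated. -a selects buffered I/O (which
is what readahead acts on), -r restricts the run to the read test, and
-i 3 matches the three R columns in the tables above:

./blockdev-perftest -a -r -i 3 /dev/sdX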


Attachments:
blockdev-perftest (5.27 kB)