2006-09-14 23:32:43

by Badari Pulavarty

Subject: ext3 sequential read performance (~20%) degrade

Hi Andrew,

I have been working on tracking down a ~20% degradation in sequential
read performance for ext3.

Finally narrowed it down to the get_blocks() support. If I force
ext3_get_blocks_handle() to always return 1 block, I get a better
IO rate. I did all the usual stuff: tracked down requests, traced
block sizes, looked at the readahead code, looked at mpage_readpages(),
etc. I still can't explain the degradation.

Any suggestions on how to track it down?

Thanks,
Badari

# cat iotest
mount /dev/sdb2 /mnt/tmp
time dd if=/mnt/tmp/testfile of=/dev/null bs=4k count=1048576
umount /mnt/tmp

2.6.18-rc6 (multiblock):

# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 75.2654 seconds, 57.1 MB/s

real 1m15.282s
user 0m0.248s
sys 0m4.292s

2.6.18-rc6 (force single block in ext3_get_blocks_handle()):

# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 62.9472 seconds, 68.2 MB/s

real 1m2.976s
user 0m0.268s
sys 0m4.280s




2006-09-15 00:03:19

by Andrew Morton

Subject: Re: ext3 sequential read performance (~20%) degrade

On Thu, 14 Sep 2006 16:36:12 -0700
Badari Pulavarty <[email protected]> wrote:

> Hi Andrew,
>
> I have been working on tracking down a ~20% degradation in sequential
> read performance for ext3.

oop. I'd kinda prefer that we discover things like this before the patch
gets into mainline.

> Finally narrowed it down to the get_blocks() support. If I force
> ext3_get_blocks_handle() to always return 1 block, I get a better
> IO rate. I did all the usual stuff: tracked down requests, traced
> block sizes, looked at the readahead code, looked at mpage_readpages(),
> etc. I still can't explain the degradation.
>
> Any suggestions on how to track it down?

Learn to drive Jens's blktrace stuff and find out why the IO scheduling went
bad.

Number one suspicion: the buffer_boundary() stuff isn't working.
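
For example, a capture along these lines (just a sketch, assuming the test
device is /dev/sdb2 and debugfs is mounted for blktrace's relay buffers)
would show whether the reads still go out as large back-to-back requests:

# run interactively: trace the device while dd does the sequential read
mount -t debugfs none /sys/kernel/debug 2>/dev/null
blktrace -d /dev/sdb2 -o seqread &
dd if=/mnt/tmp/testfile of=/dev/null bs=4k count=1048576
kill %1                          # stop the tracer
blkparse -i seqread | less       # look at request sizes, merges, seeks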

> Thanks,
> Badari
>
> # cat iotest
> mount /dev/sdb2 /mnt/tmp
> time dd if=/mnt/tmp/testfile of=/dev/null bs=4k count=1048576
> umount /mnt/tmp

Try using /proc/sys/vm/drop_caches, or ext3-tools's fadvise.c...
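
e.g. something like this between runs (a sketch; drop_caches takes 1 for
pagecache, 2 for dentries/inodes, 3 for both) avoids the mount/umount cycle:

sync
echo 3 > /proc/sys/vm/drop_caches     # drop pagecache, dentries and inodes
time dd if=/mnt/tmp/testfile of=/dev/null bs=4k count=1048576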

> 2.6.18-rc6 (multiblock):
>
> # ./iotest
> 1048576+0 records in
> 1048576+0 records out
> 4294967296 bytes (4.3 GB) copied, 75.2654 seconds, 57.1 MB/s
>
> real 1m15.282s
> user 0m0.248s
> sys 0m4.292s
>
> 2.6.18-rc6 (force single block in ext3_get_blocks_handle()):
>
> # ./iotest
> 1048576+0 records in
> 1048576+0 records out
> 4294967296 bytes (4.3 GB) copied, 62.9472 seconds, 68.2 MB/s
>
> real 1m2.976s
> user 0m0.268s
> sys 0m4.280s

ow.

2006-09-15 05:48:40

by Suparna Bhattacharya

Subject: Re: ext3 sequential read performance (~20%) degrade

On Thu, Sep 14, 2006 at 05:03:08PM -0700, Andrew Morton wrote:
> On Thu, 14 Sep 2006 16:36:12 -0700
> Badari Pulavarty <[email protected]> wrote:
>
> > Hi Andrew,
> >
> > I have been working on tracking down a ~20% degradation in sequential
> > read performance for ext3.
>
> oop. I'd kinda prefer that we discover things like this before the patch
> gets into mainline.
>
> > Finally narrowed it down to the get_blocks() support. If I force
> > ext3_get_blocks_handle() to always return 1 block, I get a better
> > IO rate. I did all the usual stuff: tracked down requests, traced
> > block sizes, looked at the readahead code, looked at mpage_readpages(),
> > etc. I still can't explain the degradation.
> >
> > Any suggestions on how to track it down?
>
> Learn to drive Jens's blktrace stuff and find out why the IO scheduling went
> bad.
>
> Number one suspicion: the buffer_boundary() stuff isn't working.

I think you are right about that. Perhaps something along
the lines of the following patch (untested) would help?

If this is the problem, then I guess the degradation should show up for DIO
as well.
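
A quick way to check that (a sketch, assuming a dd that supports
iflag=direct) is to reread the file with O_DIRECT and a large block size,
which drives the multi-block get_blocks() path directly:

time dd if=/mnt/tmp/testfile of=/dev/null bs=1M count=4096 iflag=direct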

-----------------------------

The boundary block check in ext3_get_blocks_handle needs to be adjusted
against the count of blocks mapped in this call, now that it can map
more than one block.



linux-2.6.18-rc5-suparna/fs/ext3/inode.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/ext3/inode.c~ext3-multiblock-boundary-fix fs/ext3/inode.c
--- linux-2.6.18-rc5/fs/ext3/inode.c~ext3-multiblock-boundary-fix 2006-09-15 10:53:12.000000000 +0530
+++ linux-2.6.18-rc5-suparna/fs/ext3/inode.c 2006-09-15 10:54:30.000000000 +0530
@@ -925,7 +925,7 @@ int ext3_get_blocks_handle(handle_t *han
 	set_buffer_new(bh_result);
 got_it:
 	map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key));
-	if (blocks_to_boundary == 0)
+	if (count > blocks_to_boundary)
 		set_buffer_boundary(bh_result);
 	err = count;
 	/* Clean up and exit */

_

Regards
Suparna

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India



2006-09-15 15:58:15

by Badari Pulavarty

Subject: Re: ext3 sequential read performance (~20%) degrade

On Fri, 2006-09-15 at 11:20 +0530, Suparna Bhattacharya wrote:
> On Thu, Sep 14, 2006 at 05:03:08PM -0700, Andrew Morton wrote:
> > On Thu, 14 Sep 2006 16:36:12 -0700
> > Badari Pulavarty <[email protected]> wrote:
> >
> > > Hi Andrew,
> > >
> > > I have been working on tracking down a ~20% degradation in sequential
> > > read performance for ext3.
> >
> > oop. I'd kinda prefer that we discover things like this before the patch
> > gets into mainline.
> >
> > > Finally narrowed it down to the get_blocks() support. If I force
> > > ext3_get_blocks_handle() to always return 1 block, I get a better
> > > IO rate. I did all the usual stuff: tracked down requests, traced
> > > block sizes, looked at the readahead code, looked at mpage_readpages(),
> > > etc. I still can't explain the degradation.
> > >
> > > Any suggestions on how to track it down?
> >
> > Learn to drive Jens's blktrace stuff and find out why the IO scheduling went
> > bad.
> >
> > Number one suspicion: the buffer_boundary() stuff isn't working.
>
> I think you are right about that. Perhaps something along
> the lines of the following patch (untested) would help?

Yep. It works :)

Thanks,
Badari