2002-06-21 22:06:34

by Duc Vianney

Subject: [Lse-tech] Re: ext3 performance bottleneck as the number of spindles gets large

Andrew Morton wrote:
>If you have time, please test ext2 and/or reiserfs and/or ext3
>in writeback mode.
I ran IOzone on ext2fs, ext3fs, JFS, and Reiserfs on a 4-way 500MHz SMP
machine with 2.5GB RAM and two 9.1GB SCSI drives. The test partition is
1GB, the test file size is 128MB, the test block size is 4KB, and the
number of IO threads varies from 1 to 6. Compared with the other file
systems in this test environment, the results on a 2.5.19 SMP kernel
show that ext3fs has a performance problem with writes, and in
particular with random writes. I think the BKL contention patch would
help ext3fs, but I need to verify that first.
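
For reference, these runs can be reproduced with an iozone invocation
along the following lines. This is a sketch only: the mount point and
file names are placeholders, and -t was varied from 1 to 6 per run.

    # Throughput mode, 4 threads, 128MB file size, 4KB records;
    # -i 0/1/2 select write/rewrite, read/re-read and random read/write.
    iozone -t 4 -s 128m -r 4k -i 0 -i 1 -i 2 \
        -F /mnt/test/f1 /mnt/test/f2 /mnt/test/f3 /mnt/test/f4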

The following data are throughput in KB/sec (IOzone's default unit),
obtained from the IOzone benchmark running on each file system created
and mounted with default options.


Kernels             2519smp4    2519smp4    2519smp4    2519smp4
No of threads=1      ext2-1t      jfs-1t     ext3-1t  reiserfs-1t

Initial write         138010      111023       29808       48170
Rewrite               205736      204538      119543      142765
Read                  236500      237235      231860      236959
Re-read               242927      243577      240284      242776
Random read           204292      206010      201664      207219
Random write          180144      180461        1090      121676

No of threads=2      ext2-2t      jfs-2t     ext3-2t  reiserfs-2t

Initial write         196477      143395       62248       55260
Rewrite               261641      261441      126604      205076
Read                  292566      292796      313562      291434
Re-read               302239      306423      341416      303424
Random read           296152      295430      316966      288584
Random write          253026      251013         958      203358

No of threads=4      ext2-4t      jfs-4t     ext3-4t  reiserfs-4t

Initial write          79513      172302       42051       48782
Rewrite               256568      269840      124912      231395
Read                  290599      303669      327066      283793
Re-read               289578      303644      327362      287531
Random read           354011      353455      353806      351671
Random write          279704      279922        2482      250498

No of threads=6      ext2-6t      jfs-6t     ext3-6t  reiserfs-6t

Initial write          98559       69825       59728       15576
Rewrite               274993      286987      126048      232193
Read                  330522      326143      332147      326163
Re-read               339672      328890      333094      326725
Random read           348059      346154      347901      344927
Random write          281613      280213        3659      227579

Cheers,
Duc J Vianney, [email protected]
home page: http://www-124.ibm.com/developerworks/opensource/linuxperf/
project page: http://www-124.ibm.com/developerworks/projects/linuxperf



2002-06-21 23:13:21

by Andrew Morton

Subject: Re: [Lse-tech] Re: ext3 performance bottleneck as the number of spindles gets large

Duc Vianney wrote:
>
> Andrew Morton wrote:
> >If you have time, please test ext2 and/or reiserfs and/or ext3
> >in writeback mode.
> I ran IOzone on ext2fs, ext3fs, JFS, and Reiserfs on a 4-way 500MHz SMP
> machine with 2.5GB RAM and two 9.1GB SCSI drives. The test partition is
> 1GB, the test file size is 128MB, the test block size is 4KB, and the
> number of IO threads varies from 1 to 6. Compared with the other file
> systems in this test environment, the results on a 2.5.19 SMP kernel
> show that ext3fs has a performance problem with writes, and in
> particular with random writes. I think the BKL contention patch would
> help ext3fs, but I need to verify that first.
>
> The following data are throughput in KB/sec (IOzone's default unit),
> obtained from the IOzone benchmark running on each file system created
> and mounted with default options.
>
> Kernels             2519smp4    2519smp4    2519smp4    2519smp4
> No of threads=1      ext2-1t      jfs-1t     ext3-1t  reiserfs-1t
>
> Initial write         138010      111023       29808       48170
> Rewrite               205736      204538      119543      142765
> Read                  236500      237235      231860      236959
> Re-read               242927      243577      240284      242776
> Random read           204292      206010      201664      207219
> Random write          180144      180461        1090      121676

ext3 only allows dirty data to remain in memory for five seconds,
whereas the other filesystems allow it for thirty. This is
a reasonable thing to do, but it hurts badly in benchmarks.
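
For an apples-to-apples run one could stretch ext3's commit interval
to match and, per the earlier suggestion, mount in writeback mode. A
sketch, assuming the `commit=' and `data=' ext3 mount options are
available in this tree (device and mount point are placeholders):

    # 30-second journal commit interval, metadata-only journalling
    mount -t ext3 -o data=writeback,commit=30 /dev/sda5 /mnt/test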

If you run a benchmark which takes ext2 ten seconds to
complete, ext2 will do it all in-RAM. But after five
seconds, ext3 will go to disk and the test takes vastly longer.
I suspect that is what is happening here - we're seeing the
difference between disk bandwidth and memory bandwidth.

If you choose a larger file, a shorter file or a longer-running
test then the difference will not be so gross.

You can confirm this by trying a one-gigabyte file instead.

The "Initial write" is fishy. I wonder if the same thing
is happening here - there may have been lots of dirty memory
left in-core (and unaccounted for) after the test completed.
iozone has a `-e' option which causes it to include the fsync()
time in the timing calculations. Using that would give a
better comparison, unless you are specifically trying to test
in-memory performance. And we're not doing that here.
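
Something along these lines would cover both suggestions, the
one-gigabyte file and the fsync() accounting (a sketch; the file name
and mount point are placeholders):

    # Single stream, 1GB file, 4KB records; -e folds the fsync()
    # cost into the reported throughput.
    iozone -e -t 1 -s 1g -r 4k -i 0 -i 1 -i 2 -F /mnt/test/f1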
