LinuxLists.cc - [BK PATCH] Add ext3 indexed directory (htree) support

2002-09-25 19:59:01

Subject: [BK PATCH] Add ext3 indexed directory (htree) support

Hi Linus,

This changeset contains Daniel Phillips's indexed directory changes,
ported to 2.5 by Christopher Li and Andrew Morton, and then extensively
cleaned up by me. I also implemented the enhanced dx_readdir code, which
returns files in hash order. This was necessary so that concurrent tree
splits would not result in filenames being erroneously returned twice
or not at all when the b-tree splits reorganized the directory out from
underneath readdir().
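
As a rough illustration of why hash order matters here, the sketch below
compares two readdir cursors when a leaf block splits in the middle of a
directory walk. It is a toy model, not the dx_readdir code: the CRC32
stand-in hash, the list-of-blocks layout, the assumption that a split
moves half of a leaf's entries to a newly appended block, and every
function name are assumptions made for the example.

```python
import zlib

def name_hash(name):
    # Stand-in hash for the sketch; ext3 uses its own hash functions.
    return zlib.crc32(name.encode())

def make_dir(names, per_block):
    """Build leaf blocks of (hash, name) entries, each covering a hash range."""
    entries = sorted((name_hash(n), n) for n in names)
    return [entries[i:i + per_block] for i in range(0, len(entries), per_block)]

def split_block(blocks, i):
    """Keep the lower-hash half in block i; move the rest to a new last block."""
    block = blocks[i]
    mid = len(block) // 2
    blocks[i] = block[:mid]
    blocks.append(block[mid:])

def walk_by_block_number(names, per_block):
    """Cursor = next block number. A split behind the cursor causes repeats."""
    blocks = make_dir(names, per_block)
    seen = [n for _, n in blocks[0]]      # first readdir call returns block 0
    split_block(blocks, 0)                # block 0 splits between calls
    for block in blocks[1:]:              # later calls continue from block 1
        seen.extend(n for _, n in block)
    return seen

def walk_by_hash(names, per_block):
    """Cursor = last hash returned. Entries may move; their hashes do not."""
    blocks = make_dir(names, per_block)
    seen, last_hash, did_split = [], -1, False
    while True:
        # Everything not yet returned, in hash order, wherever it now lives.
        pending = sorted(e for b in blocks for e in b if e[0] > last_hash)
        if not pending:
            return seen
        batch = pending[:per_block]
        seen.extend(n for _, n in batch)
        last_hash = batch[-1][0]
        if not did_split:
            split_block(blocks, 0)        # same split, between the same calls
            did_split = True

names = [f"file{i}" for i in range(8)]
print(len(walk_by_block_number(names, 4)), len(walk_by_hash(names, 4)))
```

In this model the block-number walk returns the two moved names a second
time, while the hash-order walk returns each of the eight names exactly
once. The sketch assumes no two names share a hash.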

This patch significantly increases the speed of using large directories.
Creating 100,000 files in a single directory took 38 minutes without
directory indexing... and 11 seconds with the directory indexing turned on.

I've given this code a good bit of testing, under both 2.4 and 2.5
kernels, and believe that it is ready for prime-time. Please pull it
from:

bk://extfs.bkbits.net/for-linus-htree-2.5

In order to use the new directory indexing feature, please update your
e2fsprogs to 1.29. Existing filesystems can be updated to use directory
indexing using the command "tune2fs -O dir_index /dev/hdXXX". This can
be done while the filesystem is mounted, and new directories, as well as
existing directories that fit within a single block, will use the new
(backwards-compatible) directory indexing format when they grow beyond
a single block.

Existing large directories on the filesystem can be converted to use the
new indexed directory format by running the following command on an
unmounted filesystem: "e2fsck -fD /dev/hdXXX".

- Ted

 fs/ext3/Makefile           |    2
 fs/ext3/dir.c              |  298 ++++++++++
 fs/ext3/file.c             |    3
 fs/ext3/hash.c             |  215 +++++++
 fs/ext3/namei.c            | 1305 ++++++++++++++++++++++++++++++++++++++++-----
 fs/ext3/super.c            |    6
 include/linux/ext3_fs.h    |   86 ++
 include/linux/ext3_fs_sb.h |    2
 include/linux/ext3_jbd.h   |    2
 include/linux/rbtree.h     |    1
 lib/rbtree.c               |   16
 11 files changed, 1797 insertions(+), 139 deletions(-)

(The changes to rbtree.c were to add a new function rb_first(), which
returns the first node in the rbtree.)

2002-09-25 20:31:01

by Andreas Dilger

Subject: Re: [BK PATCH] Add ext3 indexed directory (htree) support

On Sep 25, 2002 16:03 -0400, Theodore Ts'o wrote:
> This patch significantly increases the speed of using large directories.
> Creating 100,000 files in a single directory took 38 minutes without
> directory indexing... and 11 seconds with the directory indexing turned on.

Not mentioned, but very important to note is that the dir indexing code
is 100% compatible with older kernels (even back to 1.2 days) for both
read and write.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

2002-09-25 20:32:11

by Dave Jones

Subject: Re: [BK PATCH] Add ext3 indexed directory (htree) support

On Wed, Sep 25, 2002 at 04:03:44PM -0400, Theodore Ts'o wrote:

> This patch significantly increases the speed of using large directories.
> Creating 100,000 files in a single directory took 38 minutes without
> directory indexing... and 11 seconds with the directory indexing turned on.

Just curious.. what measurable overhead (if any) is there of indexing
dirs with smaller numbers of files vs non-indexed ?
If so, where would be the break-even point ?

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-09-25 21:05:24

by Andreas Dilger

Subject: Re: [BK PATCH] Add ext3 indexed directory (htree) support

On Sep 25, 2002 21:41 +0100, Dave Jones wrote:
> On Wed, Sep 25, 2002 at 04:03:44PM -0400, Theodore Ts'o wrote:
>
> > This patch significantly increases the speed of using large directories.
> > Creating 100,000 files in a single directory took 38 minutes without
> > directory indexing... and 11 seconds with the directory indexing turned on.
>
> Just curious.. what measurable overhead (if any) is there of indexing
> dirs with smaller numbers of files vs non-indexed ?
> If so, where would be the break-even point ?

No overhead at all for directories 1 block in size. The htree code uses
the existing "search leaf block" code for such a directory directly.
For directories > 1 block in size, you have the index (1 block
overhead), but also the benefit that you are only searching 1/N of the
blocks for an entry (the leaf block searching code remains the same,
just the "which block to search" code is activated).

So, in summary, htree is never slower than an un-indexed directory, so
there is never really a time when you wouldn't want to use it.
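
The "which block to search" step can be sketched with a toy two-level
directory. This is only an illustration under stated assumptions, not the
ext3 implementation: the CRC32 stand-in hash, the ToyDir class, and the
in-memory list layout are all made up for the example, and hash
collisions between names are assumed not to occur.

```python
import bisect
import zlib

def name_hash(name):
    # Stand-in hash for the sketch; ext3 uses its own hash functions.
    return zlib.crc32(name.encode())

def search_leaf(block, name):
    """Linear scan of one leaf block, shared by both lookup paths."""
    return name in block

class ToyDir:
    def __init__(self, names, per_block):
        ordered = sorted(names, key=name_hash)
        self.blocks = [ordered[i:i + per_block]
                       for i in range(0, len(ordered), per_block)]
        # Index: the lowest hash stored in each leaf block.
        self.index = [name_hash(b[0]) for b in self.blocks]

    def lookup_linear(self, name):
        """Unindexed lookup; returns (found, leaf blocks searched)."""
        for searched, block in enumerate(self.blocks, start=1):
            if search_leaf(block, name):
                return True, searched
        return False, len(self.blocks)

    def lookup_indexed(self, name):
        """Indexed lookup; returns (found, leaf blocks searched)."""
        if not self.blocks:
            return False, 0
        if len(self.blocks) == 1:
            # Single-block directory: the index is never consulted.
            return search_leaf(self.blocks[0], name), 1
        # Pick the one leaf whose hash range covers this name.
        i = max(bisect.bisect_right(self.index, name_hash(name)) - 1, 0)
        return search_leaf(self.blocks[i], name), 1

d = ToyDir([f"file{i:03d}" for i in range(200)], per_block=20)
print(d.lookup_linear("file123"), d.lookup_indexed("file123"))
```

Both paths call the same search_leaf(); the indexed path only changes how
many leaf blocks that scan is applied to.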

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

2002-09-25 21:30:09

by Theodore Ts'o

Subject: Re: [BK PATCH] Add ext3 indexed directory (htree) support

On Wed, Sep 25, 2002 at 09:41:01PM +0100, Dave Jones wrote:
> Just curious.. what measurable overhead (if any) is there of indexing
> dirs with smaller numbers of files vs non-indexed ?
> If so, where would be the break-even point ?

It should be negligible to the point of being non-measurable. If the
number of files fits within a single block, the format doesn't change
at all. Once the directory grows so that there are two blocks' worth of
directory entries, we move to an indexed format, and every file
will be found within two disk block reads, and on average we will need
to do comparisons on half a block's worth of directory names. Without
directory indexing, on average a lookup will succeed in 1.5 disk reads
(50% will require one disk block read, and 50% will require two disk
block reads), and on average we will need to do comparisons on a full
block's worth of directory entries.

So when the directory is two blocks' worth of data, it's a slight loss
if you are looking at things from the point of view of disk reads, but
we're winning already from the point of view of CPU time. Not that
this matters; it won't be measurable because the block read
statistics only apply the first time you search the directory. After
that, the directory blocks will be in cache, and the only thing that
matters is the CPU time. And the amount of CPU time that it takes to
search directory entries, whether we need to search on average 1
block or half a block's worth of directory entries, is small enough to
not be an issue.

Not that it matters, but for the record though, we break-even on disk
reads once the directory is 3 blocks long. At that point, linear
search will require on average reading 2 blocks (1*1/3 + 2*1/3 + 3*1/3
== 2) and the indexed directory will still require 2 disk reads. On
the CPU front, the linear search will require comparisons with 1.5
blocks' worth of directory entries, while the indexed directory will
still require only 0.5 blocks' worth of directory comparisons. (I'm
ignoring here the CPU time it takes to calculate the hash, but it's
the time to search the directory blocks that matters, since that's
where you'll have all of the memory cache misses that will slow down
the search.)
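
The arithmetic above can be checked with a short sketch. It models only
what the text states: a successful lookup whose target is equally likely
to be in any block, and an index that costs one extra block read. The
function names are made up for the example, and the single-level index
is an assumption.

```python
from fractions import Fraction

def linear_reads(n_blocks):
    """Average block reads for a successful linear lookup, with the
    target equally likely to be in any of the n blocks."""
    return Fraction(sum(range(1, n_blocks + 1)), n_blocks)

def indexed_reads(n_blocks):
    """One read for a single-block (unindexed) directory; otherwise one
    index block plus one leaf block (single-level index assumed)."""
    return Fraction(1) if n_blocks == 1 else Fraction(2)

def linear_compare_blocks(n_blocks):
    """Average blocks' worth of entries compared by linear search: every
    earlier block in full, plus half of the block holding the target."""
    return linear_reads(n_blocks) - Fraction(1, 2)

def break_even():
    """Smallest multi-block directory where linear search no longer
    needs fewer disk reads than the indexed lookup."""
    n = 2
    while linear_reads(n) < indexed_reads(n):
        n += 1
    return n

for n in range(1, 6):
    print(n, linear_reads(n), indexed_reads(n), linear_compare_blocks(n))
print("break-even at", break_even(), "blocks")
```

For two blocks this gives 3/2 reads and one block's worth of comparisons
for linear search; for three blocks it gives 2 reads and 3/2 blocks'
worth, matching the figures quoted above.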

Oh, and finally, for a small directory, it won't be measurable because
after the first time around, all of the directory entries will fit in
the dcache. :-)

- Ted

2002-09-25 22:49:17

by Jeff Garzik

Subject: Re: [BK PATCH] Add ext3 indexed directory (htree) support

Theodore Ts'o wrote:
> I've given this code a good bit of testing, under both 2.4 and 2.5
> kernels, and believe that it is ready for prime-time. Please pull it
> from:
>
> bk://extfs.bkbits.net/for-linus-htree-2.5

Can you post a GNU patch too, for public lookover and independent
integration?

Jeff

2002-09-25 23:25:07

by Theodore Ts'o

Subject: Re: [BK PATCH] Add ext3 indexed directory (htree) support

On Wed, Sep 25, 2002 at 06:54:00PM -0400, Jeff Garzik wrote:
>
> Can you post a GNU patch too, for public lookover and independent
> integration?
>

Sure. The patch is a bit big for e-mail, but you can find it at:

http://thunk.org/tytso/linux/ext3-dxdir/patch-ext3-dxdir-2.5.38

There is also a 2.4.19 patch available as well:

http://thunk.org/tytso/linux/ext3-dxdir/patch-ext3-dxdir-2.4.19-2

- Ted
