2012-07-31 13:12:41

by Wang Sheng-Hui

[permalink] [raw]
Subject: Questions about dx --- hash conflicts and limit

Dear all,

I have 2 questions about dx in ext4:

1) hash conflict

I walked through the ext4/namei.c, but didn't find any
code dealing with hash conflicts.

I wonder if some name hash conflicts, how can they be stored
and retrieved with the dir ops?

2) limit on size of dir

For the limit on items of the dir, the dir size is limited
to a height of 2 HTree:
root node
|
intermidate node ...
| |
leaf node...

Is this right?



Any explanations are welcomed.


thanks,


2012-07-31 18:56:50

by Andreas Dilger

[permalink] [raw]
Subject: Re: Questions about dx --- hash conflicts and limit

On 2012-07-31, at 6:12, Wang Sheng-Hui <[email protected]> wrote:
>
> I have 2 questions about dx in ext4:
>
> 1) hash conflict
>
> I walked through the ext4/namei.c, but didn't find any
> code dealing with hash conflicts.
>
> I wonder if some name hash conflicts, how can they be stored
> and retrieved with the dir ops?

Sorry, don't have code available to look at to comment on this issue.

> 2) limit on size of dir
>
> For the limit on items of the dir, the dir size is limited
> to a height of 2 HTree:
> root node
> |
> intermidate node ...
> | |
> leaf node...
>
> Is this right?

Correct, though this is only an implementation limit and not a format limit.

We have a patch to increase the htree depth to 3 levels and beyond 2GB directories, but it needs e2fsck support still, and it needs to be separated from a patch that also allows multi-threaded access to the same directory (though not via the VFS, unfortunately).

You can find this patch in the Lustre Git repo at git.whamcloud.com/fs/lustre-release in the ldiskfs/patches directory, if you are interested in this.

Cheers, Andreas

2012-07-31 20:24:12

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Questions about dx --- hash conflicts and limit

On Tue, Jul 31, 2012 at 09:12:24PM +0800, Wang Sheng-Hui wrote:
>
> I walked through the ext4/namei.c, but didn't find any
> code dealing with hash conflicts.
>
> I wonder if some name hash conflicts, how can they be stored
> and retrieved with the dir ops?

The hash lookup points us at a particular directory leaf block where
we start the search. All directory entries that have the same has are
stored in the same directory leaf block. What if there are more
directory entries that have the same hash than can fit in a single
leaf block? Then we look at the next block (by tree order) in the
directory. The low bit in the hash key is set to 1 in the index entry
to indicate that the block in question is a continuation block.

The reason why we need to know about the continuation block is if you
have so many collisions that you they spill across not only different
index blocks, but the 2nd order index block. Basically, if you want
to ever muck with the namei code and try to improve it, there are all
sorts of scary edges cases you have to handle, and these are not well
documented. Sorry about that....

Unfortunately namei.c isn't well documented; the original author of
this code, Daniel Phillips didn't really believe in comments and
really liked very deeply indented functions. I did a huge amount of
cleanup before these patches were included in ext3, but looking back,
I wish I had done more cleanup and added more comments... :-(

BTW, if you *are* thinking about mucking about in the namei code, my
suggestion would be to create a new checksum function which does a
simple additive checksum, and then masks off all but the two lowest
bits, so you have a degenerate hash function which has only 4 possible
values. Then use a 1k block file system, and fill the directory with
lots and lots of files, and test away. That should stress all of the
various corner cases quite nicely. :-)

- Ted