2012-08-03 18:33:39

by djwong

Subject: Re: How to understand get_dx_countlimit?

On Wed, Aug 01, 2012 at 10:50:25PM +0800, Wang Sheng-Hui wrote:
> Dear all,
>
> Sorry to trouble you!
>
> I'm confused by get_dx_countlimit() in namei.c.
> This function seems to support metadata checksums for dir/dx.
>
> I wonder what kind of dirent would meet:
>
> le16_to_cpu(dirent->rec_len) == EXT4_BLOCK_SIZE(inode->i_sb)
>
> The tail dirent of one dir block?

This case usually indicates that there's one big "empty" dirent that's hiding
some dx hash tree data. These ought to be non-root dx tree nodes.

> And the case "le16_to_cpu(dirent->rec_len) == 12"?

In a hashed directory, the first block (i.e. the root of the dx tree, if there
even is a tree) uses the first 24 bytes to present dirents for "." and "..". The
remaining space is a big "empty" dirent that hides the root of the dx tree.

> I suspect these kinds of dirents are the tail ones, but
> I cannot figure out the physical layout of a dir block
> with metadata checksums, e.g. in which cases would dirents
> meet the conditions in the function get_dx_countlimit?

FYI, there's some (slightly out of date) additional reference data at
https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout.

Before hashed directories, a directory file consisted of variable-length
dirents that weren't in any particular order. The dirents are packed in
such a way that they do not cross $BLOCKSIZE (usually 4KiB) boundaries. To add
metadata checksumming to one of these classic dirent blocks, I create an empty
"dirent" in the last 12 bytes of the block that looks deleted, and stuff the
crc32 into the name field. Obviously, if there's no room in the block then
e2fsck has to rebuild the directory.
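
As a rough illustration of the classic-block case, here is a minimal sketch of
writing and detecting such a 12-byte tail. The helper names (write_tail,
has_tail, read_tail_csum) are made up for this example and are not kernel
functions; the field offsets assume the usual dirent header layout of inode (4
bytes), rec_len (2), name_len (1), file type (1), followed by the name. The
checksum value itself is taken as an input rather than computed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAIL_SIZE 12u /* the fake dirent occupies the last 12 bytes */

/* On-disk fields are little-endian; these helpers avoid host-endian issues. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)(v >> 24);
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: place a "deleted-looking" dirent at the end of the
 * block (inode 0, rec_len 12, name_len 0) and store the checksum where the
 * name would normally go. The file-type byte is left zero in this sketch;
 * a real implementation may put a marker there. */
static void write_tail(uint8_t *block, size_t blocksize, uint32_t csum)
{
    uint8_t *t = block + blocksize - TAIL_SIZE;

    memset(t, 0, TAIL_SIZE);
    put_le32(t, 0);             /* inode = 0: entry looks unused */
    put_le16(t + 4, TAIL_SIZE); /* rec_len = 12 */
    put_le32(t + 8, csum);      /* checksum in the "name" area */
}

/* Hypothetical helper: does the end of the block look like such a tail? */
static int has_tail(const uint8_t *block, size_t blocksize)
{
    const uint8_t *t = block + blocksize - TAIL_SIZE;

    return get_le32(t) == 0 && get_le16(t + 4) == TAIL_SIZE && t[6] == 0;
}

static uint32_t read_tail_csum(const uint8_t *block, size_t blocksize)
{
    return get_le32(block + blocksize - TAIL_SIZE + 8);
}
```

Old code walking the block sees an entry with inode 0 and skips it, which is
why the checksum can hide there.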

When hashed directories were added, it was desired to retrofit them into the
existing directory file structure in such a way that the old code could read
the directory file without getting confused by the tree data. Furthermore, it
was decided that tree data should not be mixed in with regular dirents, i.e.
given a block in a directory, it either contains tree data or dirents pointing
to inodes, but not both.

To accomplish that, a block containing tree data is given a dirent header that
doesn't point to an inode ("null dirent"), since dirents that don't point to
valid inodes are skipped over by the old ext2 code. The dirent header claims
to take up all the space in the block, and the tree data goes in the space that
normally stores the file name. If metadata checksumming is enabled, the last
dx_entry in the tree block is reserved for storing the checksum.
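
To make the non-root case concrete, here is a small sketch of a tree block
hidden behind a null dirent. Everything named here (init_dx_node,
dx_node_limit, the DIRENT_HDR and DX_ENTRY sizes) is an assumption for
illustration: it assumes an 8-byte dirent header, 8-byte dx entries, and a
limit/count pair of 16-bit fields at the start of the tree data. It also
assumes block sizes small enough that rec_len fits in 16 bits without any
special encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRENT_HDR 8u /* inode(4) + rec_len(2) + name_len(1) + file_type(1) */
#define DX_ENTRY   8u /* assumed size of one dx entry: hash(4) + block(4) */

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* How many entry slots fit after the null dirent header. With checksumming,
 * the last slot is reserved for the checksum, so one fewer is usable. */
static unsigned dx_node_limit(size_t blocksize, int has_csum)
{
    size_t space = blocksize - DIRENT_HDR;

    if (has_csum)
        space -= DX_ENTRY;
    return (unsigned)(space / DX_ENTRY);
}

/* Hypothetical helper: turn a block into an empty tree node. The dirent
 * header has inode 0 and claims the whole block, so old code skips it. */
static void init_dx_node(uint8_t *block, size_t blocksize, int has_csum)
{
    memset(block, 0, blocksize);              /* inode 0, name_len 0 */
    put_le16(block + 4, (uint16_t)blocksize); /* rec_len = whole block */
    /* Tree data starts where the file name would normally be stored. */
    put_le16(block + DIRENT_HDR,
             (uint16_t)dx_node_limit(blocksize, has_csum)); /* limit */
    put_le16(block + DIRENT_HDR + 2, 0); /* count: none yet in this sketch */
}
```

For a 4KiB block this gives 511 slots without checksumming and 510 with it,
under the sizes assumed above.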

There is one exception to what I just wrote -- the old ext2 code expects the
first block of a directory file to contain (at offset zero) two dirents
pointing to "." and "..". Therefore, the root of the tree is encapsulated
inside a null dirent (just like the non-root nodes, as I describe above) but
the null dirent begins 24 bytes into that first block, instead of at the very
beginning of the block.
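
Putting the two rec_len cases from the original question side by side, a
simplified sketch of the first check might look like the following. This is
not the kernel's get_dx_countlimit; classify_block and the enum are invented
for the example, and the test is necessary rather than sufficient (an ordinary
block holding a single entry can also have rec_len equal to the block size, so
the caller is assumed to already know it is looking at a tree block).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

enum dx_kind {
    DX_NONE,           /* neither pattern matched */
    DX_NODE,           /* first dirent claims the whole block */
    DX_ROOT_CANDIDATE  /* first dirent is 12 bytes, as "." would be */
};

/* Look only at the rec_len of the dirent at offset 0 (bytes 4..5). */
static enum dx_kind classify_block(const uint8_t *block, size_t blocksize)
{
    uint16_t rec_len = get_le16(block + 4);

    if (rec_len == blocksize)
        return DX_NODE;           /* non-root node behind a null dirent */
    if (rec_len == 12)
        return DX_ROOT_CANDIDATE; /* "." entry; tree root follows ".." */
    return DX_NONE;
}
```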

Hope that doesn't muddy the situation any more....

--D
>
> Any explanations are welcomed!
>
> Thanks,
> Sheng-Hui
>



2012-08-04 04:46:10

by Wang Sheng-Hui

Subject: Re: How to understand get_dx_countlimit?

On 2012-08-04 02:33, Darrick J. Wong wrote:
> On Wed, Aug 01, 2012 at 10:50:25PM +0800, Wang Sheng-Hui wrote:
>> Dear all,
>>
>> Sorry to trouble you!
>>
>> I'm confused by get_dx_countlimit() in namei.c.
>> This function seems to support metadata checksums for dir/dx.
>>
>> I wonder what kind of dirent would meet:
>>
>> le16_to_cpu(dirent->rec_len) == EXT4_BLOCK_SIZE(inode->i_sb)
>>
>> The tail dirent of one dir block?
>
> This case usually indicates that there's one big "empty" dirent that's hiding
> some dx hash tree data. These ought to be non-root dx tree nodes.
>
>> And the case "le16_to_cpu(dirent->rec_len) == 12"?
>
> In a hashed directory, the first block (i.e. the root of the dx tree, if there
> even is a tree) uses the first 24 bytes to present dirents for "." and "..". The
> remaining space is a big "empty" dirent that hides the root of the dx tree.
>
>> I suspect these kinds of dirents are the tail ones, but
>> I cannot figure out the physical layout of a dir block
>> with metadata checksums, e.g. in which cases would dirents
>> meet the conditions in the function get_dx_countlimit?
>
> FYI, there's some (slightly out of date) additional reference data at
> https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout.
>
> Before hashed directories, a directory file consisted of variable-length
> dirents that weren't in any particular order. The dirents are packed in
> such a way that they do not cross $BLOCKSIZE (usually 4KiB) boundaries. To add
> metadata checksumming to one of these classic dirent blocks, I create an empty
> "dirent" in the last 12 bytes of the block that looks deleted, and stuff the
> crc32 into the name field. Obviously, if there's no room in the block then
> e2fsck has to rebuild the directory.
>
> When hashed directories were added, it was desired to retrofit them into the
> existing directory file structure in such a way that the old code could read
> the directory file without getting confused by the tree data. Furthermore, it
> was decided that tree data should not be mixed in with regular dirents, i.e.
> given a block in a directory, it either contains tree data or dirents pointing
> to inodes, but not both.
>
> To accomplish that, a block containing tree data is given a dirent header that
> doesn't point to an inode ("null dirent"), since dirents that don't point to
> valid inodes are skipped over by the old ext2 code. The dirent header claims
> to take up all the space in the block, and the tree data goes in the space that
> normally stores the file name. If metadata checksumming is enabled, the last
> dx_entry in the tree block is reserved for storing the checksum.
>
> There is one exception to what I just wrote -- the old ext2 code expects the
> first block of a directory file to contain (at offset zero) two dirents
> pointing to "." and "..". Therefore, the root of the tree is encapsulated
> inside a null dirent (just like the non-root nodes, as I describe above) but
> the null dirent begins 24 bytes into that first block, instead of at the very
> beginning of the block.
>
> Hope that doesn't muddy the situation any more....
>
> --D

Thanks a lot, Darrick!

Regards,
Sheng-Hui
>>
>> Any explanations are welcomed!
>>
>> Thanks,
>> Sheng-Hui
>>
>