2023-07-29 07:55:57

by Stephen Zhang

Subject: [PATCH] ext4: Fix rec_len verify error

From: Shida Zhang <[email protected]>

With PAGE_SIZE 64k and a filesystem blocksize of 64k, a problem
occurs when more than 13 million files are created directly under
a single directory:

EXT4-fs error (device xx): ext4_dx_csum_set:492: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): ext4_dx_csum_verify:463: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
EXT4-fs error (device xx): dx_probe:856: inode #xxxx: block 8188: comm xxxxx: Directory index failed checksum

Once enough files have been created, fake_dirent->rec_len is 0xffff,
which does not equal the blocksize of 65536, i.e. 0x10000.

This does not happen with a 4k blocksize: there, once enough files
have been created, fake_dirent->rec_len is 0x1000, which equals the
blocksize of 4k, i.e. 0x1000.

The problem seems to come from the 16-bit rec_len field, which cannot
hold 0x10000 when the blocksize is 64k. To address this, add a special
case that accepts a rec_len of 65535 when the blocksize is 65536.

Signed-off-by: Shida Zhang <[email protected]>
---
fs/ext4/namei.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 0caf6c730ce3..a422cff25216 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -458,6 +458,9 @@ static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
 		    root->info_length != sizeof(struct dx_root_info))
 			return NULL;
 		count_offset = 32;
+	} else if ((EXT4_BLOCK_SIZE(inode->i_sb) == 65536)
+		&& (le16_to_cpu(dirent->rec_len) == 65535)) {
+		count_offset = 8;
 	} else
 		return NULL;

--
2.27.0



2023-07-29 19:35:38

by Andreas Dilger

Subject: Re: [PATCH] ext4: Fix rec_len verify error

On Jul 29, 2023, at 00:14, zhangshida <[email protected]> wrote:
>
> From: Shida Zhang <[email protected]>
>
> with the configuration PAGE_SIZE 64k and filesystem blocksize 64k,
> a problem occurred when more than 13 million files were directly created
> under a directory:
>
> EXT4-fs error (device xx): ext4_dx_csum_set:492: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
> EXT4-fs error (device xx): ext4_dx_csum_verify:463: inode #xxxx: comm xxxxx: dir seems corrupt? Run e2fsck -D.
> EXT4-fs error (device xx): dx_probe:856: inode #xxxx: block 8188: comm xxxxx: Directory index failed checksum
>
> when enough files are created, the fake_dirent->reclen will be 0xffff.
> it doesn't equal to blocksize 65536, i.e. 0x10000.
>
> But it is not the same condition when blocksize equals to 4k.
> when enough files are created, the fake_dirent->reclen will be 0x1000.
> it equals to blocksize 4k, i.e. 0x1000.
>
> The problem seems to be related to the limitation of the 16-bit field
> when the blocksize is set to 64k. To address this, a special condition
> was introduced to handle it properly.
>
> Signed-off-by: Shida Zhang <[email protected]>
> ---
> fs/ext4/namei.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 0caf6c730ce3..a422cff25216 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -458,6 +458,9 @@ static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
> root->info_length != sizeof(struct dx_root_info))
> return NULL;
> count_offset = 32;
> + } else if ((EXT4_BLOCK_SIZE(inode->i_sb) == 65536)
> + && (le16_to_cpu(dirent->rec_len) == 65535)) {
> + count_offset = 8;

This should be moved up to the first if-block that is checking the block size:

	if (le16_to_cpu(dirent->rec_len) == EXT4_BLOCK_SIZE(inode->i_sb) ||
	    (le16_to_cpu(dirent->rec_len) == 65535 &&
	     EXT4_BLOCK_SIZE(inode->i_sb) >= 65536))
		count_offset = 8;

since this is really the same case.

Even better would be to use ext4_rec_len_from_disk() to check the
length, so that the large PAGE_SIZE logic stays in one place and
adds no overhead on systems with a smaller PAGE_SIZE:

	int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);

	if (ext4_rec_len_from_disk(dirent->rec_len, blocksize) == blocksize)
		count_offset = 8;
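
For illustration only, here is a rough userspace sketch of the 16-bit
rec_len encoding this check relies on. It is modeled on the
PAGE_SIZE >= 65536 branch of the ext4 helpers but is not the kernel
code: the names rec_len_from_disk_64k(), rec_len_to_disk_64k(),
dx_node_rec_len_ok() and MAX_REC_LEN are hypothetical, and the on-disk
value is assumed to be in CPU byte order already (no le16 conversion).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror EXT4_MAX_REC_LEN, i.e. (1 << 16) - 1. */
#define MAX_REC_LEN 0xffffu

/*
 * Decode a 16-bit on-disk rec_len. A record spanning the whole block
 * cannot be stored literally once blocksize >= 65536, so 0xffff (and 0)
 * stand in for "the entire block".
 */
static unsigned rec_len_from_disk_64k(uint16_t dlen, unsigned blocksize)
{
	unsigned len = dlen;

	if (len == MAX_REC_LEN || len == 0)
		return blocksize;
	/* rec_len is a multiple of 4; the low two bits carry bits 16-17. */
	return (len & 65532) | ((len & 3) << 16);
}

/* Encode a rec_len into 16 bits, the inverse of the function above. */
static uint16_t rec_len_to_disk_64k(unsigned len, unsigned blocksize)
{
	assert(len <= blocksize && blocksize <= (1u << 18) && (len & 3) == 0);

	if (len < 65536)
		return (uint16_t)len;
	if (len == blocksize)
		return blocksize == 65536 ? MAX_REC_LEN : 0;
	return (uint16_t)((len & 65532) | ((len >> 16) & 3));
}

/* The check suggested above: does the fake dirent cover the block? */
static int dx_node_rec_len_ok(uint16_t dlen, unsigned blocksize)
{
	return rec_len_from_disk_64k(dlen, blocksize) == blocksize;
}
```

With this sketch, dx_node_rec_len_ok(0xffff, 65536) and
dx_node_rec_len_ok(0x1000, 4096) are both true, so the 64k and 4k
cases from the commit message take the same path.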

Cheers, Andreas

> } else
> return NULL;
>
> --
> 2.27.0
>