2021-09-01 02:05:23

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 0/3] ext4: fix a inode checksum error

We find a checksum error and a inode corruption problem while doing
stress test, this 3 patches address to fix them. The first two patches
are prepare to do the fix, the last patch fix these two issue.

- Checksum error

EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

- Inode corruption

e2fsck 1.46.0 (29-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'foo' in / (2) has deleted/unused inode 17. Clear<y>? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Inode bitmap differences: -17
Fix<y>? yes
Free inodes count wrong for group #0 (32750, counted=32751).
Fix<y>? yes
Free inodes count wrong (32750, counted=32751).
Fix<y>? yes

Changes since v4:
- Drop first three already applied patches.
- Remove 'in_mem' parameter passing __ext4_get_inode_loc() in the last
patch.

Changes since v3:
- Postpone initialization to ext4_do_update_inode() may cause zeroout
newly set xattr entry. So switch to do initialization in
__ext4_get_inode_loc().

Changes since v2:
- Instead of using WARN_ON_ONCE to prevent ext4_do_update_inode()
return before filling the inode buffer, keep the error and postpone
the report after the updating in the third patch.
- Fix some language mistacks in the last patch.

Changes since v1:
- Add a patch to prevent ext4_do_update_inode() return before filling
the inode buffer.
- Do not use BH_New flag to indicate the empty buffer, postpone the
zero and uptodate logic into ext4_do_update_inode() before filling
the inode buffer.

Thanks,
Yi.

Zhang Yi (3):
ext4: factor out ext4_fill_raw_inode()
ext4: move ext4_fill_raw_inode() related functions
ext4: prevent getting empty inode buffer

fs/ext4/inode.c | 316 +++++++++++++++++++++++++-----------------------
1 file changed, 165 insertions(+), 151 deletions(-)

--
2.31.1


2021-09-01 02:05:23

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 2/3] ext4: move ext4_fill_raw_inode() related functions

In preparation for calling ext4_fill_raw_inode() in
__ext4_get_inode_loc(), move three related functions before
__ext4_get_inode_loc(), no logical change.

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 293 ++++++++++++++++++++++++------------------------
1 file changed, 147 insertions(+), 146 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c7186460c14d..3c36e701e30e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4292,6 +4292,153 @@ int ext4_truncate(struct inode *inode)
return err;
}

+static inline u64 ext4_inode_peek_iversion(const struct inode *inode)
+{
+ if (unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+ return inode_peek_iversion_raw(inode);
+ else
+ return inode_peek_iversion(inode);
+}
+
+static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
+ struct ext4_inode_info *ei)
+{
+ struct inode *inode = &(ei->vfs_inode);
+ u64 i_blocks = READ_ONCE(inode->i_blocks);
+ struct super_block *sb = inode->i_sb;
+
+ if (i_blocks <= ~0U) {
+ /*
+ * i_blocks can be represented in a 32 bit variable
+ * as multiple of 512 bytes
+ */
+ raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
+ raw_inode->i_blocks_high = 0;
+ ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
+ return 0;
+ }
+
+ /*
+ * This should never happen since sb->s_maxbytes should not have
+ * allowed this, sb->s_maxbytes was set according to the huge_file
+ * feature in ext4_fill_super().
+ */
+ if (!ext4_has_feature_huge_file(sb))
+ return -EFSCORRUPTED;
+
+ if (i_blocks <= 0xffffffffffffULL) {
+ /*
+ * i_blocks can be represented in a 48 bit variable
+ * as multiple of 512 bytes
+ */
+ raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
+ raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
+ ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
+ } else {
+ ext4_set_inode_flag(inode, EXT4_INODE_HUGE_FILE);
+ /* i_block is stored in file system block size */
+ i_blocks = i_blocks >> (inode->i_blkbits - 9);
+ raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
+ raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
+ }
+ return 0;
+}
+
+static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
+{
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ uid_t i_uid;
+ gid_t i_gid;
+ projid_t i_projid;
+ int block;
+ int err;
+
+ err = ext4_inode_blocks_set(raw_inode, ei);
+
+ raw_inode->i_mode = cpu_to_le16(inode->i_mode);
+ i_uid = i_uid_read(inode);
+ i_gid = i_gid_read(inode);
+ i_projid = from_kprojid(&init_user_ns, ei->i_projid);
+ if (!(test_opt(inode->i_sb, NO_UID32))) {
+ raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
+ raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
+ /*
+ * Fix up interoperability with old kernels. Otherwise,
+ * old inodes get re-used with the upper 16 bits of the
+ * uid/gid intact.
+ */
+ if (ei->i_dtime && list_empty(&ei->i_orphan)) {
+ raw_inode->i_uid_high = 0;
+ raw_inode->i_gid_high = 0;
+ } else {
+ raw_inode->i_uid_high =
+ cpu_to_le16(high_16_bits(i_uid));
+ raw_inode->i_gid_high =
+ cpu_to_le16(high_16_bits(i_gid));
+ }
+ } else {
+ raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
+ raw_inode->i_gid_low = cpu_to_le16(fs_high2lowgid(i_gid));
+ raw_inode->i_uid_high = 0;
+ raw_inode->i_gid_high = 0;
+ }
+ raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
+
+ EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
+ EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
+ EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
+ EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
+
+ raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
+ raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
+ if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
+ raw_inode->i_file_acl_high =
+ cpu_to_le16(ei->i_file_acl >> 32);
+ raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
+ ext4_isize_set(raw_inode, ei->i_disksize);
+
+ raw_inode->i_generation = cpu_to_le32(inode->i_generation);
+ if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
+ if (old_valid_dev(inode->i_rdev)) {
+ raw_inode->i_block[0] =
+ cpu_to_le32(old_encode_dev(inode->i_rdev));
+ raw_inode->i_block[1] = 0;
+ } else {
+ raw_inode->i_block[0] = 0;
+ raw_inode->i_block[1] =
+ cpu_to_le32(new_encode_dev(inode->i_rdev));
+ raw_inode->i_block[2] = 0;
+ }
+ } else if (!ext4_has_inline_data(inode)) {
+ for (block = 0; block < EXT4_N_BLOCKS; block++)
+ raw_inode->i_block[block] = ei->i_data[block];
+ }
+
+ if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
+ u64 ivers = ext4_inode_peek_iversion(inode);
+
+ raw_inode->i_disk_version = cpu_to_le32(ivers);
+ if (ei->i_extra_isize) {
+ if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
+ raw_inode->i_version_hi =
+ cpu_to_le32(ivers >> 32);
+ raw_inode->i_extra_isize =
+ cpu_to_le16(ei->i_extra_isize);
+ }
+ }
+
+ if (i_projid != EXT4_DEF_PROJID &&
+ !ext4_has_feature_project(inode->i_sb))
+ err = err ?: -EFSCORRUPTED;
+
+ if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
+ EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))
+ raw_inode->i_projid = cpu_to_le32(i_projid);
+
+ ext4_inode_csum_set(inode, raw_inode, ei);
+ return err;
+}
+
/*
* ext4_get_inode_loc returns with an extra refcount against the inode's
* underlying buffer_head on success. If 'in_mem' is true, we have all
@@ -4580,13 +4727,6 @@ static inline void ext4_inode_set_iversion_queried(struct inode *inode, u64 val)
else
inode_set_iversion_queried(inode, val);
}
-static inline u64 ext4_inode_peek_iversion(const struct inode *inode)
-{
- if (unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
- return inode_peek_iversion_raw(inode);
- else
- return inode_peek_iversion(inode);
-}

struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
ext4_iget_flags flags, const char *function,
@@ -4902,50 +5042,6 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
return ERR_PTR(ret);
}

-static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
- struct ext4_inode_info *ei)
-{
- struct inode *inode = &(ei->vfs_inode);
- u64 i_blocks = READ_ONCE(inode->i_blocks);
- struct super_block *sb = inode->i_sb;
-
- if (i_blocks <= ~0U) {
- /*
- * i_blocks can be represented in a 32 bit variable
- * as multiple of 512 bytes
- */
- raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
- raw_inode->i_blocks_high = 0;
- ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
- return 0;
- }
-
- /*
- * This should never happen since sb->s_maxbytes should not have
- * allowed this, sb->s_maxbytes was set according to the huge_file
- * feature in ext4_fill_super().
- */
- if (!ext4_has_feature_huge_file(sb))
- return -EFSCORRUPTED;
-
- if (i_blocks <= 0xffffffffffffULL) {
- /*
- * i_blocks can be represented in a 48 bit variable
- * as multiple of 512 bytes
- */
- raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
- raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
- ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
- } else {
- ext4_set_inode_flag(inode, EXT4_INODE_HUGE_FILE);
- /* i_block is stored in file system block size */
- i_blocks = i_blocks >> (inode->i_blkbits - 9);
- raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
- raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
- }
- return 0;
-}
-
static void __ext4_update_other_inode_time(struct super_block *sb,
unsigned long orig_ino,
unsigned long ino,
@@ -5006,101 +5102,6 @@ static void ext4_update_other_inodes_time(struct super_block *sb,
rcu_read_unlock();
}

-static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
-{
- struct ext4_inode_info *ei = EXT4_I(inode);
- uid_t i_uid;
- gid_t i_gid;
- projid_t i_projid;
- int block;
- int err;
-
- err = ext4_inode_blocks_set(raw_inode, ei);
-
- raw_inode->i_mode = cpu_to_le16(inode->i_mode);
- i_uid = i_uid_read(inode);
- i_gid = i_gid_read(inode);
- i_projid = from_kprojid(&init_user_ns, ei->i_projid);
- if (!(test_opt(inode->i_sb, NO_UID32))) {
- raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
- raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
- /*
- * Fix up interoperability with old kernels. Otherwise,
- * old inodes get re-used with the upper 16 bits of the
- * uid/gid intact.
- */
- if (ei->i_dtime && list_empty(&ei->i_orphan)) {
- raw_inode->i_uid_high = 0;
- raw_inode->i_gid_high = 0;
- } else {
- raw_inode->i_uid_high =
- cpu_to_le16(high_16_bits(i_uid));
- raw_inode->i_gid_high =
- cpu_to_le16(high_16_bits(i_gid));
- }
- } else {
- raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
- raw_inode->i_gid_low = cpu_to_le16(fs_high2lowgid(i_gid));
- raw_inode->i_uid_high = 0;
- raw_inode->i_gid_high = 0;
- }
- raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
-
- EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
- EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
- EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
- EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
-
- raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
- raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
- if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
- raw_inode->i_file_acl_high =
- cpu_to_le16(ei->i_file_acl >> 32);
- raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
- ext4_isize_set(raw_inode, ei->i_disksize);
-
- raw_inode->i_generation = cpu_to_le32(inode->i_generation);
- if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
- if (old_valid_dev(inode->i_rdev)) {
- raw_inode->i_block[0] =
- cpu_to_le32(old_encode_dev(inode->i_rdev));
- raw_inode->i_block[1] = 0;
- } else {
- raw_inode->i_block[0] = 0;
- raw_inode->i_block[1] =
- cpu_to_le32(new_encode_dev(inode->i_rdev));
- raw_inode->i_block[2] = 0;
- }
- } else if (!ext4_has_inline_data(inode)) {
- for (block = 0; block < EXT4_N_BLOCKS; block++)
- raw_inode->i_block[block] = ei->i_data[block];
- }
-
- if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
- u64 ivers = ext4_inode_peek_iversion(inode);
-
- raw_inode->i_disk_version = cpu_to_le32(ivers);
- if (ei->i_extra_isize) {
- if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
- raw_inode->i_version_hi =
- cpu_to_le32(ivers >> 32);
- raw_inode->i_extra_isize =
- cpu_to_le16(ei->i_extra_isize);
- }
- }
-
- if (i_projid != EXT4_DEF_PROJID &&
- !ext4_has_feature_project(inode->i_sb))
- err = err ?: -EFSCORRUPTED;
-
- if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
- EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))
- raw_inode->i_projid = cpu_to_le32(i_projid);
-
- ext4_inode_csum_set(inode, raw_inode, ei);
- return err;
-}
-
/*
* Post the struct inode info into an on-disk inode location in the
* buffer-cache. This gobbles the caller's reference to the
--
2.31.1

2021-09-01 02:08:34

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v5 3/3] ext4: prevent getting empty inode buffer

In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
inode buffer when the inode monopolize an inode block for performance
reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
buffer to make it fine, but we could miss this call if something bad
happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
empty inode buffer and trigger ext4 error.

For example, if we remove a nonexistent xattr on inode A,
ext4_xattr_set_handle() will return ENODATA before invoking
ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
will get checksum error message in ext4_iget() when getting inode again.

EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

Even worse, if we allocate another inode B at the same inode block, it
will corrupt the inode A on disk when write back inode B.

So this patch initialize the inode buffer by filling the in-mem inode
contents if we skip read I/O, ensure that the buffer is really uptodate.

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3c36e701e30e..a8388ec91f9f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4441,12 +4441,12 @@ static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode

/*
* ext4_get_inode_loc returns with an extra refcount against the inode's
- * underlying buffer_head on success. If 'in_mem' is true, we have all
- * data in memory that is needed to recreate the on-disk version of this
- * inode.
+ * underlying buffer_head on success. If we pass 'inode' and it does not
+ * have in-inode xattr, we have all inode data in memory that is needed
+ * to recreate the on-disk version of this inode.
*/
static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
- struct ext4_iloc *iloc, int in_mem,
+ struct inode *inode, struct ext4_iloc *iloc,
ext4_fsblk_t *ret_block)
{
struct ext4_group_desc *gdp;
@@ -4486,7 +4486,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
* is the only valid inode in the block, we need not read the
* block.
*/
- if (in_mem) {
+ if (inode && !ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
struct buffer_head *bitmap_bh;
int i, start;

@@ -4514,8 +4514,13 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
}
brelse(bitmap_bh);
if (i == start + inodes_per_block) {
+ struct ext4_inode *raw_inode =
+ (struct ext4_inode *) (bh->b_data + iloc->offset);
+
/* all other inodes are free, so skip I/O */
memset(bh->b_data, 0, bh->b_size);
+ if (!ext4_test_inode_state(inode, EXT4_STATE_NEW))
+ ext4_fill_raw_inode(inode, raw_inode);
set_buffer_uptodate(bh);
unlock_buffer(bh);
goto has_buffer;
@@ -4576,7 +4581,7 @@ static int __ext4_get_inode_loc_noinmem(struct inode *inode,
ext4_fsblk_t err_blk;
int ret;

- ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc, 0,
+ ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, NULL, iloc,
&err_blk);

if (ret == -EIO)
@@ -4591,9 +4596,8 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
ext4_fsblk_t err_blk;
int ret;

- /* We have all inode data except xattrs in memory here. */
- ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc,
- !ext4_test_inode_state(inode, EXT4_STATE_XATTR), &err_blk);
+ ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, inode, iloc,
+ &err_blk);

if (ret == -EIO)
ext4_error_inode_block(inode, err_blk, EIO,
@@ -4606,7 +4610,7 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,
struct ext4_iloc *iloc)
{
- return __ext4_get_inode_loc(sb, ino, iloc, 0, NULL);
+ return __ext4_get_inode_loc(sb, ino, NULL, iloc, NULL);
}

static bool ext4_should_enable_dax(struct inode *inode)
--
2.31.1

2021-09-20 16:17:17

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 2/3] ext4: move ext4_fill_raw_inode() related functions

On Wed 01-09-21 10:09:54, Zhang Yi wrote:
> In preparation for calling ext4_fill_raw_inode() in
> __ext4_get_inode_loc(), move three related functions before
> __ext4_get_inode_loc(), no logical change.
>
> Signed-off-by: Zhang Yi <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 293 ++++++++++++++++++++++++------------------------
> 1 file changed, 147 insertions(+), 146 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c7186460c14d..3c36e701e30e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4292,6 +4292,153 @@ int ext4_truncate(struct inode *inode)
> return err;
> }
>
> +static inline u64 ext4_inode_peek_iversion(const struct inode *inode)
> +{
> + if (unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> + return inode_peek_iversion_raw(inode);
> + else
> + return inode_peek_iversion(inode);
> +}
> +
> +static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
> + struct ext4_inode_info *ei)
> +{
> + struct inode *inode = &(ei->vfs_inode);
> + u64 i_blocks = READ_ONCE(inode->i_blocks);
> + struct super_block *sb = inode->i_sb;
> +
> + if (i_blocks <= ~0U) {
> + /*
> + * i_blocks can be represented in a 32 bit variable
> + * as multiple of 512 bytes
> + */
> + raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
> + raw_inode->i_blocks_high = 0;
> + ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> + return 0;
> + }
> +
> + /*
> + * This should never happen since sb->s_maxbytes should not have
> + * allowed this, sb->s_maxbytes was set according to the huge_file
> + * feature in ext4_fill_super().
> + */
> + if (!ext4_has_feature_huge_file(sb))
> + return -EFSCORRUPTED;
> +
> + if (i_blocks <= 0xffffffffffffULL) {
> + /*
> + * i_blocks can be represented in a 48 bit variable
> + * as multiple of 512 bytes
> + */
> + raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
> + raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
> + ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> + } else {
> + ext4_set_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> + /* i_block is stored in file system block size */
> + i_blocks = i_blocks >> (inode->i_blkbits - 9);
> + raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
> + raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
> + }
> + return 0;
> +}
> +
> +static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
> +{
> + struct ext4_inode_info *ei = EXT4_I(inode);
> + uid_t i_uid;
> + gid_t i_gid;
> + projid_t i_projid;
> + int block;
> + int err;
> +
> + err = ext4_inode_blocks_set(raw_inode, ei);
> +
> + raw_inode->i_mode = cpu_to_le16(inode->i_mode);
> + i_uid = i_uid_read(inode);
> + i_gid = i_gid_read(inode);
> + i_projid = from_kprojid(&init_user_ns, ei->i_projid);
> + if (!(test_opt(inode->i_sb, NO_UID32))) {
> + raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
> + raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
> + /*
> + * Fix up interoperability with old kernels. Otherwise,
> + * old inodes get re-used with the upper 16 bits of the
> + * uid/gid intact.
> + */
> + if (ei->i_dtime && list_empty(&ei->i_orphan)) {
> + raw_inode->i_uid_high = 0;
> + raw_inode->i_gid_high = 0;
> + } else {
> + raw_inode->i_uid_high =
> + cpu_to_le16(high_16_bits(i_uid));
> + raw_inode->i_gid_high =
> + cpu_to_le16(high_16_bits(i_gid));
> + }
> + } else {
> + raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
> + raw_inode->i_gid_low = cpu_to_le16(fs_high2lowgid(i_gid));
> + raw_inode->i_uid_high = 0;
> + raw_inode->i_gid_high = 0;
> + }
> + raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
> +
> + EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
> + EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
> + EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
> + EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
> +
> + raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
> + raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
> + if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
> + raw_inode->i_file_acl_high =
> + cpu_to_le16(ei->i_file_acl >> 32);
> + raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
> + ext4_isize_set(raw_inode, ei->i_disksize);
> +
> + raw_inode->i_generation = cpu_to_le32(inode->i_generation);
> + if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
> + if (old_valid_dev(inode->i_rdev)) {
> + raw_inode->i_block[0] =
> + cpu_to_le32(old_encode_dev(inode->i_rdev));
> + raw_inode->i_block[1] = 0;
> + } else {
> + raw_inode->i_block[0] = 0;
> + raw_inode->i_block[1] =
> + cpu_to_le32(new_encode_dev(inode->i_rdev));
> + raw_inode->i_block[2] = 0;
> + }
> + } else if (!ext4_has_inline_data(inode)) {
> + for (block = 0; block < EXT4_N_BLOCKS; block++)
> + raw_inode->i_block[block] = ei->i_data[block];
> + }
> +
> + if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
> + u64 ivers = ext4_inode_peek_iversion(inode);
> +
> + raw_inode->i_disk_version = cpu_to_le32(ivers);
> + if (ei->i_extra_isize) {
> + if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
> + raw_inode->i_version_hi =
> + cpu_to_le32(ivers >> 32);
> + raw_inode->i_extra_isize =
> + cpu_to_le16(ei->i_extra_isize);
> + }
> + }
> +
> + if (i_projid != EXT4_DEF_PROJID &&
> + !ext4_has_feature_project(inode->i_sb))
> + err = err ?: -EFSCORRUPTED;
> +
> + if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
> + EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))
> + raw_inode->i_projid = cpu_to_le32(i_projid);
> +
> + ext4_inode_csum_set(inode, raw_inode, ei);
> + return err;
> +}
> +
> /*
> * ext4_get_inode_loc returns with an extra refcount against the inode's
> * underlying buffer_head on success. If 'in_mem' is true, we have all
> @@ -4580,13 +4727,6 @@ static inline void ext4_inode_set_iversion_queried(struct inode *inode, u64 val)
> else
> inode_set_iversion_queried(inode, val);
> }
> -static inline u64 ext4_inode_peek_iversion(const struct inode *inode)
> -{
> - if (unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> - return inode_peek_iversion_raw(inode);
> - else
> - return inode_peek_iversion(inode);
> -}
>
> struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> ext4_iget_flags flags, const char *function,
> @@ -4902,50 +5042,6 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> return ERR_PTR(ret);
> }
>
> -static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
> - struct ext4_inode_info *ei)
> -{
> - struct inode *inode = &(ei->vfs_inode);
> - u64 i_blocks = READ_ONCE(inode->i_blocks);
> - struct super_block *sb = inode->i_sb;
> -
> - if (i_blocks <= ~0U) {
> - /*
> - * i_blocks can be represented in a 32 bit variable
> - * as multiple of 512 bytes
> - */
> - raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
> - raw_inode->i_blocks_high = 0;
> - ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> - return 0;
> - }
> -
> - /*
> - * This should never happen since sb->s_maxbytes should not have
> - * allowed this, sb->s_maxbytes was set according to the huge_file
> - * feature in ext4_fill_super().
> - */
> - if (!ext4_has_feature_huge_file(sb))
> - return -EFSCORRUPTED;
> -
> - if (i_blocks <= 0xffffffffffffULL) {
> - /*
> - * i_blocks can be represented in a 48 bit variable
> - * as multiple of 512 bytes
> - */
> - raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
> - raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
> - ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> - } else {
> - ext4_set_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> - /* i_block is stored in file system block size */
> - i_blocks = i_blocks >> (inode->i_blkbits - 9);
> - raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
> - raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
> - }
> - return 0;
> -}
> -
> static void __ext4_update_other_inode_time(struct super_block *sb,
> unsigned long orig_ino,
> unsigned long ino,
> @@ -5006,101 +5102,6 @@ static void ext4_update_other_inodes_time(struct super_block *sb,
> rcu_read_unlock();
> }
>
> -static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
> -{
> - struct ext4_inode_info *ei = EXT4_I(inode);
> - uid_t i_uid;
> - gid_t i_gid;
> - projid_t i_projid;
> - int block;
> - int err;
> -
> - err = ext4_inode_blocks_set(raw_inode, ei);
> -
> - raw_inode->i_mode = cpu_to_le16(inode->i_mode);
> - i_uid = i_uid_read(inode);
> - i_gid = i_gid_read(inode);
> - i_projid = from_kprojid(&init_user_ns, ei->i_projid);
> - if (!(test_opt(inode->i_sb, NO_UID32))) {
> - raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
> - raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
> - /*
> - * Fix up interoperability with old kernels. Otherwise,
> - * old inodes get re-used with the upper 16 bits of the
> - * uid/gid intact.
> - */
> - if (ei->i_dtime && list_empty(&ei->i_orphan)) {
> - raw_inode->i_uid_high = 0;
> - raw_inode->i_gid_high = 0;
> - } else {
> - raw_inode->i_uid_high =
> - cpu_to_le16(high_16_bits(i_uid));
> - raw_inode->i_gid_high =
> - cpu_to_le16(high_16_bits(i_gid));
> - }
> - } else {
> - raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
> - raw_inode->i_gid_low = cpu_to_le16(fs_high2lowgid(i_gid));
> - raw_inode->i_uid_high = 0;
> - raw_inode->i_gid_high = 0;
> - }
> - raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
> -
> - EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
> - EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
> - EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
> - EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
> -
> - raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
> - raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
> - if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
> - raw_inode->i_file_acl_high =
> - cpu_to_le16(ei->i_file_acl >> 32);
> - raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
> - ext4_isize_set(raw_inode, ei->i_disksize);
> -
> - raw_inode->i_generation = cpu_to_le32(inode->i_generation);
> - if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
> - if (old_valid_dev(inode->i_rdev)) {
> - raw_inode->i_block[0] =
> - cpu_to_le32(old_encode_dev(inode->i_rdev));
> - raw_inode->i_block[1] = 0;
> - } else {
> - raw_inode->i_block[0] = 0;
> - raw_inode->i_block[1] =
> - cpu_to_le32(new_encode_dev(inode->i_rdev));
> - raw_inode->i_block[2] = 0;
> - }
> - } else if (!ext4_has_inline_data(inode)) {
> - for (block = 0; block < EXT4_N_BLOCKS; block++)
> - raw_inode->i_block[block] = ei->i_data[block];
> - }
> -
> - if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
> - u64 ivers = ext4_inode_peek_iversion(inode);
> -
> - raw_inode->i_disk_version = cpu_to_le32(ivers);
> - if (ei->i_extra_isize) {
> - if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
> - raw_inode->i_version_hi =
> - cpu_to_le32(ivers >> 32);
> - raw_inode->i_extra_isize =
> - cpu_to_le16(ei->i_extra_isize);
> - }
> - }
> -
> - if (i_projid != EXT4_DEF_PROJID &&
> - !ext4_has_feature_project(inode->i_sb))
> - err = err ?: -EFSCORRUPTED;
> -
> - if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
> - EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))
> - raw_inode->i_projid = cpu_to_le32(i_projid);
> -
> - ext4_inode_csum_set(inode, raw_inode, ei);
> - return err;
> -}
> -
> /*
> * Post the struct inode info into an on-disk inode location in the
> * buffer-cache. This gobbles the caller's reference to the
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-09-20 16:18:52

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v5 3/3] ext4: prevent getting empty inode buffer

On Wed 01-09-21 10:09:55, Zhang Yi wrote:
> In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
> inode buffer when the inode monopolize an inode block for performance
> reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
> buffer to make it fine, but we could miss this call if something bad
> happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
> empty inode buffer and trigger ext4 error.
>
> For example, if we remove a nonexistent xattr on inode A,
> ext4_xattr_set_handle() will return ENODATA before invoking
> ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
> will get checksum error message in ext4_iget() when getting inode again.
>
> EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
>
> Even worse, if we allocate another inode B at the same inode block, it
> will corrupt the inode A on disk when write back inode B.
>
> So this patch initialize the inode buffer by filling the in-mem inode
> contents if we skip read I/O, ensure that the buffer is really uptodate.
>
> Signed-off-by: Zhang Yi <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 24 ++++++++++++++----------
> 1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3c36e701e30e..a8388ec91f9f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4441,12 +4441,12 @@ static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode
>
> /*
> * ext4_get_inode_loc returns with an extra refcount against the inode's
> - * underlying buffer_head on success. If 'in_mem' is true, we have all
> - * data in memory that is needed to recreate the on-disk version of this
> - * inode.
> + * underlying buffer_head on success. If we pass 'inode' and it does not
> + * have in-inode xattr, we have all inode data in memory that is needed
> + * to recreate the on-disk version of this inode.
> */
> static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
> - struct ext4_iloc *iloc, int in_mem,
> + struct inode *inode, struct ext4_iloc *iloc,
> ext4_fsblk_t *ret_block)
> {
> struct ext4_group_desc *gdp;
> @@ -4486,7 +4486,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
> * is the only valid inode in the block, we need not read the
> * block.
> */
> - if (in_mem) {
> + if (inode && !ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
> struct buffer_head *bitmap_bh;
> int i, start;
>
> @@ -4514,8 +4514,13 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
> }
> brelse(bitmap_bh);
> if (i == start + inodes_per_block) {
> + struct ext4_inode *raw_inode =
> + (struct ext4_inode *) (bh->b_data + iloc->offset);
> +
> /* all other inodes are free, so skip I/O */
> memset(bh->b_data, 0, bh->b_size);
> + if (!ext4_test_inode_state(inode, EXT4_STATE_NEW))
> + ext4_fill_raw_inode(inode, raw_inode);
> set_buffer_uptodate(bh);
> unlock_buffer(bh);
> goto has_buffer;
> @@ -4576,7 +4581,7 @@ static int __ext4_get_inode_loc_noinmem(struct inode *inode,
> ext4_fsblk_t err_blk;
> int ret;
>
> - ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc, 0,
> + ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, NULL, iloc,
> &err_blk);
>
> if (ret == -EIO)
> @@ -4591,9 +4596,8 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
> ext4_fsblk_t err_blk;
> int ret;
>
> - /* We have all inode data except xattrs in memory here. */
> - ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc,
> - !ext4_test_inode_state(inode, EXT4_STATE_XATTR), &err_blk);
> + ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, inode, iloc,
> + &err_blk);
>
> if (ret == -EIO)
> ext4_error_inode_block(inode, err_blk, EIO,
> @@ -4606,7 +4610,7 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
> int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,
> struct ext4_iloc *iloc)
> {
> - return __ext4_get_inode_loc(sb, ino, iloc, 0, NULL);
> + return __ext4_get_inode_loc(sb, ino, NULL, iloc, NULL);
> }
>
> static bool ext4_should_enable_dax(struct inode *inode)
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-10-14 11:03:48

by Zhang Yi

[permalink] [raw]
Subject: Re: [PATCH v5 0/3] ext4: fix a inode checksum error

Hi, Ted. Do you consider to merge this remaining patch set?

Thanks,
Yi.

On 2021/9/1 10:09, Zhang Yi wrote:
> We find a checksum error and a inode corruption problem while doing
> stress test, this 3 patches address to fix them. The first two patches
> are prepare to do the fix, the last patch fix these two issue.
>
> - Checksum error
>
> EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
>
> - Inode corruption
>
> e2fsck 1.46.0 (29-Jan-2020)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Entry 'foo' in / (2) has deleted/unused inode 17. Clear<y>? yes
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Inode bitmap differences: -17
> Fix<y>? yes
> Free inodes count wrong for group #0 (32750, counted=32751).
> Fix<y>? yes
> Free inodes count wrong (32750, counted=32751).
> Fix<y>? yes
>
> Changes since v4:
> - Drop first three already applied patches.
> - Remove 'in_mem' parameter passing __ext4_get_inode_loc() in the last
> patch.
>
> Changes since v3:
> - Postpone initialization to ext4_do_update_inode() may cause zeroout
> newly set xattr entry. So switch to do initialization in
> __ext4_get_inode_loc().
>
> Changes since v2:
> - Instead of using WARN_ON_ONCE to prevent ext4_do_update_inode()
> return before filling the inode buffer, keep the error and postpone
> the report after the updating in the third patch.
> - Fix some language mistacks in the last patch.
>
> Changes since v1:
> - Add a patch to prevent ext4_do_update_inode() return before filling
> the inode buffer.
> - Do not use BH_New flag to indicate the empty buffer, postpone the
> zero and uptodate logic into ext4_do_update_inode() before filling
> the inode buffer.
>
> Thanks,
> Yi.
>
> Zhang Yi (3):
> ext4: factor out ext4_fill_raw_inode()
> ext4: move ext4_fill_raw_inode() related functions
> ext4: prevent getting empty inode buffer
>
> fs/ext4/inode.c | 316 +++++++++++++++++++++++++-----------------------
> 1 file changed, 165 insertions(+), 151 deletions(-)
>

2021-10-28 14:47:52

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v5 0/3] ext4: fix a inode checksum error

On Wed, 1 Sep 2021 10:09:52 +0800, Zhang Yi wrote:
> We find a checksum error and a inode corruption problem while doing
> stress test, this 3 patches address to fix them. The first two patches
> are prepare to do the fix, the last patch fix these two issue.
>
> - Checksum error
>
> EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
>
> [...]

Applied, thanks!

[1/3] ext4: factor out ext4_fill_raw_inode()
commit: 6a6af6b5ee4363e29fc86552ddfd94c5000d845d
[2/3] ext4: move ext4_fill_raw_inode() related functions
commit: 4bc2b511975a6fe5ce9f2b2dcc84b152354d1f90
[3/3] ext4: prevent getting empty inode buffer
commit: fb5becb151d54652ae613e34157f1aadb8447ed8

Best regards,
--
Theodore Ts'o <[email protected]>