LinuxLists.cc - [PATCH 0/2] fix ext4 i_blocks corruption in generic/269

2015-07-03 16:54:25

Subject: [PATCH 0/2] fix ext4 i_blocks corruption in generic/269

I find generic/269 fails on ext4 because of i_blocks corruption like

e2fsck 1.42.12 (29-Aug-2014)
Pass 1: Checking inodes, blocks, and sizes
Inode 4802, i_blocks is 280, should be 248. Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

ext4.img: ********** WARNING: Filesystem still has errors **********

ext4.img: 32752/32768 files (12.0% non-contiguous), 128483/131072 blocks

and find that it has something to do with the setattr and write
operations in fsstress, with delalloc enabled.

This can be reproduced more reliably by

# disable all other operations except create/mkdir/write/setattr
# the "-M 524288" restricts the setflags ioctl to EXT4_EXTENTS_FL
export FSSTRESS_AVOID="-fallocsp=0 -fattr_remove=0 -fattr_set=0 \
-fbulkstat=0 -fbulkstat1=0 -fchown=0 -fdread=0 -fdwrite=0 \
-ffallocate=0 -ffiemap=0 -ffreesp=0 -fgetattr=0 -fgetdents=0 \
-flink=0 -fmknod=0 -fpunch=0 -fzero=0 -fcollapse=0 -finsert=0 \
-fread=0 -freadlink=0 -frename=0 -fresvsp=0 -frmdir=0 -fsetxattr=0 \
-fstat=0 -fsymlink=0 -ftruncate=0 -funlink=0 -funresvsp=0 \
-fsetattr=5 -M 524288"
while ./check generic/269; do : ; done

And a simplified reproducer is

xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile
chattr -e /mnt/ext4/testfile
xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
umount /mnt/ext4
e2fsck -nf /dev/<device>

More details please see the commit log in patches.

Eryu Guan (2):
ext4: be more strict when migrating to non-extent based file
ext4: reserve hole in the migration to non-extent based file

fs/ext4/migrate.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

--
1.8.3.1

2015-07-03 16:55:35

by Eryu Guan

[permalink] [raw]

Subject: [PATCH 1/2] ext4: be more strict when migrating to non-extent based file

Currently the check in ext4_ind_migrate() is not enough before doing the
real conversion:

a) delayed allocated extents could bypass the check on eh->eh_entries
and eh->eh_depth

This can be demonstrated by this script

xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
chattr -e /mnt/ext4/testfile

where testfile has two extents but still be converted to non-extent
based file format.

b) only extent length is checked but not the offset, which would result
in data lose (delalloc) or fs corruption (nodelalloc), because
non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30)
blocks

This can be demostrated by

xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile
chattr -e /mnt/ext4/testfile
sync

If delalloc is enabled, dmesg prints
EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53
EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5
EXT4-fs (dm-4): This should not happen!! Data will be lost

If delalloc is disabled, e2fsck -nf shows corruption
Inode 53, i_size is 5497558142976, should be 4096. Fix? no

Fix the two issues by

a) forcing all delayed allocation blocks to be allocated before checking
eh->eh_depth and eh->eh_entries
b) limiting the last logical block of the extent is within direct map

Signed-off-by: Eryu Guan <[email protected]>
---
fs/ext4/migrate.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index b52374e..3651141 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -620,6 +620,7 @@ int ext4_ind_migrate(struct inode *inode)
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_extent *ex;
unsigned int i, len;
+ ext4_lblk_t end;
ext4_fsblk_t blk;
handle_t *handle;
int ret;
@@ -633,6 +634,14 @@ int ext4_ind_migrate(struct inode *inode)
EXT4_FEATURE_RO_COMPAT_BIGALLOC))
return -EOPNOTSUPP;

+ /*
+ * In order to get correct extent info, force all delayed allocation
+ * blocks to be allocated, otherwise delayed allocation blocks may not
+ * be reflected and bypass the checks on extent header.
+ */
+ if (test_opt(inode->i_sb, DELALLOC))
+ ext4_alloc_da_blocks(inode);
+
handle = ext4_journal_start(inode, EXT4_HT_MIGRATE, 1);
if (IS_ERR(handle))
return PTR_ERR(handle);
@@ -654,7 +663,8 @@ int ext4_ind_migrate(struct inode *inode)
else {
len = le16_to_cpu(ex->ee_len);
blk = ext4_ext_pblock(ex);
- if (len > EXT4_NDIR_BLOCKS) {
+ end = le32_to_cpu(ex->ee_block) + len - 1;
+ if (end >= EXT4_NDIR_BLOCKS) {
ret = -EOPNOTSUPP;
goto errout;
}
--
1.8.3.1

2015-07-03 16:56:44

by Eryu Guan

[permalink] [raw]

Subject: [PATCH 2/2] ext4: reserve hole in the migration to non-extent based file

Currently in ext4_ind_migrate() hole is not reserved, a following
delayed allocation write to the "hole" would reclaim the same data
blocks again and results in fs corruption.

# assmuing 4k block size ext4, with delalloc enabled
# skip the first block and write to the second block
xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile

# converting to indirect-mapped file, which would move the data blocks
# to the beginning of the file, but extent status cache still marks
# that region as a hole
chattr -e /mnt/ext4/testfile

# delayed allocation writes to the "hole", reclaim the same data block
# again, results in i_blocks corruption
xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
umount /mnt/ext4
e2fsck -nf /dev/sda6
...
Inode 53, i_blocks is 16, should be 8. Fix? no
...

Signed-off-by: Eryu Guan <[email protected]>
---
fs/ext4/migrate.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 3651141..10824b0 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -620,7 +620,7 @@ int ext4_ind_migrate(struct inode *inode)
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_extent *ex;
unsigned int i, len;
- ext4_lblk_t end;
+ ext4_lblk_t start, end;
ext4_fsblk_t blk;
handle_t *handle;
int ret;
@@ -659,11 +659,12 @@ int ext4_ind_migrate(struct inode *inode)
goto errout;
}
if (eh->eh_entries == 0)
- blk = len = 0;
+ blk = len = start = end = 0;
else {
len = le16_to_cpu(ex->ee_len);
blk = ext4_ext_pblock(ex);
- end = le32_to_cpu(ex->ee_block) + len - 1;
+ start = le32_to_cpu(ex->ee_block);
+ end = start + len - 1;
if (end >= EXT4_NDIR_BLOCKS) {
ret = -EOPNOTSUPP;
goto errout;
@@ -672,7 +673,7 @@ int ext4_ind_migrate(struct inode *inode)

ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
memset(ei->i_data, 0, sizeof(ei->i_data));
- for (i=0; i < len; i++)
+ for (i = start; i <= end; i++)
ei->i_data[i] = cpu_to_le32(blk++);
ext4_mark_inode_dirty(handle, inode);
errout:
--
1.8.3.1

2015-07-04 04:04:50

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH 1/2] ext4: be more strict when migrating to non-extent based file

On Sat, Jul 04, 2015 at 12:51:50AM +0800, Eryu Guan wrote:
> Currently the check in ext4_ind_migrate() is not enough before doing the
> real conversion:
>
> a) delayed allocated extents could bypass the check on eh->eh_entries
> and eh->eh_depth
>
> This can be demonstrated by this script
>
> xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
> chattr -e /mnt/ext4/testfile
>
> where testfile has two extents but still be converted to non-extent
> based file format.
>
> b) only extent length is checked but not the offset, which would result
> in data lose (delalloc) or fs corruption (nodelalloc), because
> non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30)
> blocks
>
> ...

Thanks for the really great commit description! I've applied this to
the ext4 git tree.

- Ted

2015-07-04 04:33:30

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH 2/2] ext4: reserve hole in the migration to non-extent based file

Applied, although I reworded the commit description a little:

ext4: correctly migrate a file with a hole at the beginning

Currently ext4_ind_migrate() doesn't correctly handle a file which
contains a hole at the beginning of the file. This caused the migration
to be done incorrectly, and then if there is a subsequent following
delayed allocation write to the "hole", this would reclaim the same data
blocks again and results in fs corruption.

...

Thanks!!

- Ted

2015-07-05 10:18:10

by Eryu Guan

[permalink] [raw]

Subject: Re: [PATCH 2/2] ext4: reserve hole in the migration to non-extent based file

On Sat, Jul 04, 2015 at 12:33:29AM -0400, Theodore Ts'o wrote:
> Applied, although I reworded the commit description a little:
>
> ext4: correctly migrate a file with a hole at the beginning
>
> Currently ext4_ind_migrate() doesn't correctly handle a file which
> contains a hole at the beginning of the file. This caused the migration
> to be done incorrectly, and then if there is a subsequent following
> delayed allocation write to the "hole", this would reclaim the same data
> blocks again and results in fs corruption.

Clearly better than my wording, thanks!

Eryu