I find generic/269 fails on ext4 because of i_blocks corruption like
e2fsck 1.42.12 (29-Aug-2014)
Pass 1: Checking inodes, blocks, and sizes
Inode 4802, i_blocks is 280, should be 248. Fix? no
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
ext4.img: ********** WARNING: Filesystem still has errors **********
ext4.img: 32752/32768 files (12.0% non-contiguous), 128483/131072 blocks
and find that it has something to do with the setattr and write
operations in fsstress, with delalloc enabled.
This can be reproduced more reliably by
# disable all other operations except create/mkdir/write/setattr
# the "-M 524288" restricts the setflags ioctl to EXT4_EXTENTS_FL
export FSSTRESS_AVOID="-fallocsp=0 -fattr_remove=0 -fattr_set=0 \
-fbulkstat=0 -fbulkstat1=0 -fchown=0 -fdread=0 -fdwrite=0 \
-ffallocate=0 -ffiemap=0 -ffreesp=0 -fgetattr=0 -fgetdents=0 \
-flink=0 -fmknod=0 -fpunch=0 -fzero=0 -fcollapse=0 -finsert=0 \
-fread=0 -freadlink=0 -frename=0 -fresvsp=0 -frmdir=0 -fsetxattr=0 \
-fstat=0 -fsymlink=0 -ftruncate=0 -funlink=0 -funresvsp=0 \
-fsetattr=5 -M 524288"
while ./check generic/269; do : ; done
And a simplified reproducer is
xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile
chattr -e /mnt/ext4/testfile
xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
umount /mnt/ext4
e2fsck -nf /dev/<device>
More details please see the commit log in patches.
Eryu Guan (2):
ext4: be more strict when migrating to non-extent based file
ext4: reserve hole in the migration to non-extent based file
fs/ext4/migrate.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
--
1.8.3.1
Currently the check in ext4_ind_migrate() is not enough before doing the
real conversion:
a) delayed allocated extents could bypass the check on eh->eh_entries
and eh->eh_depth
This can be demonstrated by this script
xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
chattr -e /mnt/ext4/testfile
where testfile has two extents but still be converted to non-extent
based file format.
b) only extent length is checked but not the offset, which would result
in data lose (delalloc) or fs corruption (nodelalloc), because
non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30)
blocks
This can be demostrated by
xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile
chattr -e /mnt/ext4/testfile
sync
If delalloc is enabled, dmesg prints
EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53
EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5
EXT4-fs (dm-4): This should not happen!! Data will be lost
If delalloc is disabled, e2fsck -nf shows corruption
Inode 53, i_size is 5497558142976, should be 4096. Fix? no
Fix the two issues by
a) forcing all delayed allocation blocks to be allocated before checking
eh->eh_depth and eh->eh_entries
b) limiting the last logical block of the extent is within direct map
Signed-off-by: Eryu Guan <[email protected]>
---
fs/ext4/migrate.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index b52374e..3651141 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -620,6 +620,7 @@ int ext4_ind_migrate(struct inode *inode)
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_extent *ex;
unsigned int i, len;
+ ext4_lblk_t end;
ext4_fsblk_t blk;
handle_t *handle;
int ret;
@@ -633,6 +634,14 @@ int ext4_ind_migrate(struct inode *inode)
EXT4_FEATURE_RO_COMPAT_BIGALLOC))
return -EOPNOTSUPP;
+ /*
+ * In order to get correct extent info, force all delayed allocation
+ * blocks to be allocated, otherwise delayed allocation blocks may not
+ * be reflected and bypass the checks on extent header.
+ */
+ if (test_opt(inode->i_sb, DELALLOC))
+ ext4_alloc_da_blocks(inode);
+
handle = ext4_journal_start(inode, EXT4_HT_MIGRATE, 1);
if (IS_ERR(handle))
return PTR_ERR(handle);
@@ -654,7 +663,8 @@ int ext4_ind_migrate(struct inode *inode)
else {
len = le16_to_cpu(ex->ee_len);
blk = ext4_ext_pblock(ex);
- if (len > EXT4_NDIR_BLOCKS) {
+ end = le32_to_cpu(ex->ee_block) + len - 1;
+ if (end >= EXT4_NDIR_BLOCKS) {
ret = -EOPNOTSUPP;
goto errout;
}
--
1.8.3.1
Currently in ext4_ind_migrate() hole is not reserved, a following
delayed allocation write to the "hole" would reclaim the same data
blocks again and results in fs corruption.
# assmuing 4k block size ext4, with delalloc enabled
# skip the first block and write to the second block
xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile
# converting to indirect-mapped file, which would move the data blocks
# to the beginning of the file, but extent status cache still marks
# that region as a hole
chattr -e /mnt/ext4/testfile
# delayed allocation writes to the "hole", reclaim the same data block
# again, results in i_blocks corruption
xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
umount /mnt/ext4
e2fsck -nf /dev/sda6
...
Inode 53, i_blocks is 16, should be 8. Fix? no
...
Signed-off-by: Eryu Guan <[email protected]>
---
fs/ext4/migrate.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 3651141..10824b0 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -620,7 +620,7 @@ int ext4_ind_migrate(struct inode *inode)
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_extent *ex;
unsigned int i, len;
- ext4_lblk_t end;
+ ext4_lblk_t start, end;
ext4_fsblk_t blk;
handle_t *handle;
int ret;
@@ -659,11 +659,12 @@ int ext4_ind_migrate(struct inode *inode)
goto errout;
}
if (eh->eh_entries == 0)
- blk = len = 0;
+ blk = len = start = end = 0;
else {
len = le16_to_cpu(ex->ee_len);
blk = ext4_ext_pblock(ex);
- end = le32_to_cpu(ex->ee_block) + len - 1;
+ start = le32_to_cpu(ex->ee_block);
+ end = start + len - 1;
if (end >= EXT4_NDIR_BLOCKS) {
ret = -EOPNOTSUPP;
goto errout;
@@ -672,7 +673,7 @@ int ext4_ind_migrate(struct inode *inode)
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
memset(ei->i_data, 0, sizeof(ei->i_data));
- for (i=0; i < len; i++)
+ for (i = start; i <= end; i++)
ei->i_data[i] = cpu_to_le32(blk++);
ext4_mark_inode_dirty(handle, inode);
errout:
--
1.8.3.1
On Sat, Jul 04, 2015 at 12:51:50AM +0800, Eryu Guan wrote:
> Currently the check in ext4_ind_migrate() is not enough before doing the
> real conversion:
>
> a) delayed allocated extents could bypass the check on eh->eh_entries
> and eh->eh_depth
>
> This can be demonstrated by this script
>
> xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
> chattr -e /mnt/ext4/testfile
>
> where testfile has two extents but still be converted to non-extent
> based file format.
>
> b) only extent length is checked but not the offset, which would result
> in data lose (delalloc) or fs corruption (nodelalloc), because
> non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30)
> blocks
>
> ...
Thanks for the really great commit description! I've applied this to
the ext4 git tree.
- Ted
Applied, although I reworded the commit description a little:
ext4: correctly migrate a file with a hole at the beginning
Currently ext4_ind_migrate() doesn't correctly handle a file which
contains a hole at the beginning of the file. This caused the migration
to be done incorrectly, and then if there is a subsequent following
delayed allocation write to the "hole", this would reclaim the same data
blocks again and results in fs corruption.
...
Thanks!!
- Ted
On Sat, Jul 04, 2015 at 12:33:29AM -0400, Theodore Ts'o wrote:
> Applied, although I reworded the commit description a little:
>
> ext4: correctly migrate a file with a hole at the beginning
>
> Currently ext4_ind_migrate() doesn't correctly handle a file which
> contains a hole at the beginning of the file. This caused the migration
> to be done incorrectly, and then if there is a subsequent following
> delayed allocation write to the "hole", this would reclaim the same data
> blocks again and results in fs corruption.
Clearly better than my wording, thanks!
Eryu