2009-04-19 03:03:55

by Theodore Ts'o

Subject: [PATCH 0/3] Fix resize2fs data/filesystem corruption bugs


These patches fix some nasty resize2fs bugs when growing or shrinking
ext4 filesystems off-line. Eric, you'll probably want to get these
patches into Fedora's e2fsprogs before F11 goes live.

The test case I've been using for these patches is as follows.

#!/bin/sh
mke2fs -i 8192 -b 1024 -t ext4 /dev/closure/testext4 1048576
mount /dev/closure/testext4 /mnt
cd /mnt
mkdir 1 2 3 4
cd 1; tar xjf /usr/projects/uml/shars/linux-2.6.11.tar.bz2; cd ..
cd 2; tar xjf /usr/projects/uml/shars/linux-2.6.11.tar.bz2; cd ..
cd 3; tar xjf /usr/projects/uml/shars/linux-2.6.11.tar.bz2; cd ..
cd 4; tar xjf /usr/projects/uml/shars/linux-2.6.11.tar.bz2; cd ..
cd /
umount /mnt
e2fsck -f /dev/closure/testext4

Then try resizing the filesystem smaller or larger. With e2fsprogs
1.41.4, "resize2fs -M -p /dev/closure/testext4" will just plain fail in
the middle of the resize operation:

# /sbin/resize2fs -M -p /dev/closure/testext4
resize2fs 1.41.4 (27-Jan-2009)
Resizing the filesystem on /dev/closure/textext4 to 863610 (1k) blocks.
Begin pass 2 (max = 116149)
Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 3 (max = 128)
Scanning inode table XXXXXXXXXXXXXXXXXXXX/sbin/resize2fs: No space left on device while trying to resize /dev/closure/textext4

The bugs fixed include potentially overwriting part of the existing
inode table when shrinking the filesystem, and potentially overwriting
in-use data blocks when growing the filesystem. These are pretty
serious errors, which means an e2fsprogs 1.41.5 will be coming out real
soon now.

With these patches, I've resized the filesystem and then run a CRC
checksum test across the unpacked 2.6.11 source tree. (Why 2.6.11?
Because it was what I happened to have handy on my laptop at the time. :-)
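
For anyone who wants to repeat this kind of before/after comparison, here
is a minimal sketch of a CRC routine. It is an assumption, not the tool
used for the test above: it implements the common CRC-32 used by zlib and
gzip, and the name crc32_update is made up for illustration. The idea is
to checksum every file before the resize and again afterwards, and
compare the two lists.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant used
 * by zlib and gzip.  Pass 0 as the initial crc; the return value can
 * be fed back in to checksum a file one buffer at a time.
 */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			/* Shift out one bit; xor in the polynomial if it was set */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}
```

A file whose contents were clobbered by the resize would be expected to
produce a different value after the resize than before it.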

- Ted




2009-04-19 03:03:55

by Theodore Ts'o

Subject: [PATCH 1/3] resize2fs: Fix data corruption bug when growing an ext4 filesystem off-line

When allocating a new set of block group metadata as part of growing
the filesystem, the resize2fs code assumes that the bitmap and inode
table blocks are in their own block group, an assumption that no longer
holds with the flex_bg feature. This commit works around the problem
by temporarily turning off flex_bg while allocating the new block
group metadata, to avoid potentially overwriting previously allocated
data blocks.
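
The workaround is a save/clear/restore of one feature bit around a single
call. A minimal standalone sketch of that pattern follows. The struct and
the function names (toy_super, allocate_without_flex_bg, flex_bg_visible)
are hypothetical stand-ins and not libext2fs API; only the flag value
0x0200 corresponds to EXT4_FEATURE_INCOMPAT_FLEX_BG.

```c
#include <stdint.h>

/* Same bit value as EXT4_FEATURE_INCOMPAT_FLEX_BG */
#define TOY_INCOMPAT_FLEX_BG 0x0200u

/* Hypothetical stand-in for the superblock; only the field we need */
struct toy_super {
	uint32_t s_feature_incompat;
};

typedef int (*toy_alloc_fn)(struct toy_super *sb);

/*
 * Run alloc() with flex_bg hidden, then restore the saved flags
 * whether or not the allocation succeeded.
 */
int allocate_without_flex_bg(struct toy_super *sb, toy_alloc_fn alloc)
{
	uint32_t saved = sb->s_feature_incompat;
	int retval;

	sb->s_feature_incompat &= ~TOY_INCOMPAT_FLEX_BG;
	retval = alloc(sb);
	sb->s_feature_incompat = saved;	/* restore before checking retval */
	return retval;
}

/* Example callback: returns 1 if the callee can still see flex_bg */
int flex_bg_visible(struct toy_super *sb)
{
	return (sb->s_feature_incompat & TOY_INCOMPAT_FLEX_BG) != 0;
}
```

Restoring the flags before looking at the return value matters: an early
exit on error must not leave the feature bit cleared in the superblock.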

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
resize/resize2fs.c | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 1173da1..0c1549b 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -746,6 +746,7 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
errcode_t retval;
ext2_filsys fs, old_fs;
ext2fs_block_bitmap meta_bmap;
+ __u32 save_incompat_flag;

fs = rfs->new_fs;
old_fs = rfs->old_fs;
@@ -890,9 +891,29 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)

/*
* Allocate the missing data structures
+ *
+ * XXX We have a problem with FLEX_BG and off-line
+ * resizing where we are growing the size of the
+ * filesystem. ext2fs_allocate_group_table() will try
+ * to reserve the inode table in the desired flex_bg
+ * location. However, passing rfs->reserve_blocks
+ * doesn't work since it only has reserved the blocks
+ * that will be used in the new block group -- and
+ * with flex_bg, we can and will allocate the tables
+ * outside of the block group. And we can't pass in
+ * the fs->block_map because it doesn't handle
+ * overlapping inode table movements right. So for
+ * now, we temporarily disable flex_bg to force
+ * ext2fs_allocate_group_tables() to allocate the bg
+ * metadata in side the block group, and the restore
+ * it afterwards. Ugly, until we can fix this up
+ * right later.
*/
+ save_incompat_flag = fs->super->s_feature_incompat;
+ fs->super->s_feature_incompat &= ~EXT4_FEATURE_INCOMPAT_FLEX_BG;
retval = ext2fs_allocate_group_table(fs, i,
rfs->reserve_blocks);
+ fs->super->s_feature_incompat = save_incompat_flag;
if (retval)
goto errout;

--
1.5.6.3


2009-04-19 03:03:55

by Theodore Ts'o

Subject: [PATCH 2/3] resize2fs: Fix data corruption bug when shrinking the inode table for ext4

If we need to shrink the inode table, we need to make sure the inodes
contained in the part of the inode table we are vacating don't get
reused as part of the filesystem shrink operation. This wasn't a
problem with ext3 filesystems, since the inode table was located in
the block group that was going away, so that location was not eligible
for reallocation.

However, with ext4 filesystems with flex_bg enabled, it's possible for
a portion of the inode table in the last flex_bg group to be
deallocated, but in a part of the filesystem which could be used as
data blocks. So we must mark those blocks as reserved to prevent
their reuse, and adjust the minimum filesystem size calculation to
ensure that we don't shrink a filesystem too small for the resize
operation to succeed.
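
To illustrate why the freed blocks also have to be marked as reserved,
here is a toy model, not the libext2fs bitmap code. All names (toy_bitmap,
toy_free_table_blocks, toy_alloc_block) are hypothetical, and the bitmap
is fixed at 64 blocks to keep the sketch short.

```c
#include <stdint.h>
#include <string.h>

#define TOY_BLOCKS 64	/* callers must keep block numbers below this */

struct toy_bitmap {
	uint8_t bits[TOY_BLOCKS / 8];
};

void toy_bitmap_init(struct toy_bitmap *bm)
{
	memset(bm->bits, 0, sizeof bm->bits);
}

void toy_mark(struct toy_bitmap *bm, unsigned blk)
{
	bm->bits[blk / 8] |= (uint8_t)(1u << (blk % 8));
}

void toy_clear(struct toy_bitmap *bm, unsigned blk)
{
	bm->bits[blk / 8] &= (uint8_t)~(1u << (blk % 8));
}

int toy_test(const struct toy_bitmap *bm, unsigned blk)
{
	return (bm->bits[blk / 8] >> (blk % 8)) & 1;
}

/*
 * Release the table blocks [start, start + len) that survive the
 * shrink: they are no longer in use in the new filesystem, but they
 * still hold live inodes in the old one, so reserve them as well.
 */
void toy_free_table_blocks(struct toy_bitmap *in_use,
			   struct toy_bitmap *reserved,
			   unsigned start, unsigned len, unsigned new_size)
{
	unsigned blk;

	for (blk = start; blk < start + len; blk++) {
		if (blk >= new_size)
			break;
		toy_clear(in_use, blk);
		toy_mark(reserved, blk);
	}
}

/* First block below new_size that is neither in use nor reserved, or -1 */
int toy_alloc_block(const struct toy_bitmap *in_use,
		    const struct toy_bitmap *reserved, unsigned new_size)
{
	unsigned blk;

	for (blk = 0; blk < new_size; blk++)
		if (!toy_test(in_use, blk) && !toy_test(reserved, blk))
			return (int)blk;
	return -1;
}
```

With blocks 0-9 in use, a table at blocks 4-7 being released, and a new
size of 12, an allocator that ignores the reserved bitmap hands out
block 4, which still contains old inode table data; one that honors it
hands out block 10.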

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
resize/online.c | 2 +-
resize/resize2fs.c | 56 +++++++++++++++++++++++++++++++++++++++-------------
resize/resize2fs.h | 1 +
3 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/resize/online.c b/resize/online.c
index d581553..4bc5451 100644
--- a/resize/online.c
+++ b/resize/online.c
@@ -104,7 +104,7 @@ errcode_t online_resize_fs(ext2_filsys fs, const char *mtpt,
* but at least it allows on-line resizing to function.
*/
new_fs->super->s_feature_incompat &= ~EXT4_FEATURE_INCOMPAT_FLEX_BG;
- retval = adjust_fs_info(new_fs, fs, *new_size);
+ retval = adjust_fs_info(new_fs, fs, 0, *new_size);
if (retval)
return retval;

diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 0c1549b..ac926ce 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -233,20 +233,29 @@ static void fix_uninit_block_bitmaps(ext2_filsys fs)

/*
* If the group descriptor's bitmap and inode table blocks are valid,
- * release them in the specified filesystem data structure
+ * release them in the new filesystem data structure, and mark them as
+ * reserved so the old inode table blocks don't get overwritten.
*/
-static void free_gdp_blocks(ext2_filsys fs, struct ext2_group_desc *gdp)
+static void free_gdp_blocks(ext2_filsys fs,
+ ext2fs_block_bitmap reserve_blocks,
+ struct ext2_group_desc *gdp)
{
blk_t blk;
int j;

if (gdp->bg_block_bitmap &&
- (gdp->bg_block_bitmap < fs->super->s_blocks_count))
+ (gdp->bg_block_bitmap < fs->super->s_blocks_count)) {
ext2fs_block_alloc_stats(fs, gdp->bg_block_bitmap, -1);
+ ext2fs_mark_block_bitmap(reserve_blocks,
+ gdp->bg_block_bitmap);
+ }

if (gdp->bg_inode_bitmap &&
- (gdp->bg_inode_bitmap < fs->super->s_blocks_count))
+ (gdp->bg_inode_bitmap < fs->super->s_blocks_count)) {
ext2fs_block_alloc_stats(fs, gdp->bg_inode_bitmap, -1);
+ ext2fs_mark_block_bitmap(reserve_blocks,
+ gdp->bg_inode_bitmap);
+ }

if (gdp->bg_inode_table == 0 ||
(gdp->bg_inode_table >= fs->super->s_blocks_count))
@@ -257,14 +266,19 @@ static void free_gdp_blocks(ext2_filsys fs, struct ext2_group_desc *gdp)
if (blk >= fs->super->s_blocks_count)
break;
ext2fs_block_alloc_stats(fs, blk, -1);
+ ext2fs_mark_block_bitmap(reserve_blocks, blk);
}
}

/*
* This routine is shared by the online and offline resize routines.
* All of the information which is adjusted in memory is done here.
+ *
+ * The reserve_blocks parameter is only needed when shrinking the
+ * filesystem.
*/
-errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs, blk_t new_size)
+errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs,
+ ext2fs_block_bitmap reserve_blocks, blk_t new_size)
{
errcode_t retval;
int overhead = 0;
@@ -399,8 +413,8 @@ retry:
}

/*
- * If we are shrinking the number block groups, we're done and
- * can exit now.
+ * If we are shrinking the number of block groups, we're done
+ * and can exit now.
*/
if (old_fs->group_desc_count > fs->group_desc_count) {
/*
@@ -409,7 +423,8 @@ retry:
*/
for (i = fs->group_desc_count;
i < old_fs->group_desc_count; i++) {
- free_gdp_blocks(fs, &old_fs->group_desc[i]);
+ free_gdp_blocks(fs, reserve_blocks,
+ &old_fs->group_desc[i]);
}
retval = 0;
goto errout;
@@ -550,7 +565,12 @@ static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size)
ext2fs_mark_bb_dirty(fs);
ext2fs_mark_ib_dirty(fs);

- retval = adjust_fs_info(fs, rfs->old_fs, new_size);
+ retval = ext2fs_allocate_block_bitmap(fs, _("reserved blocks"),
+ &rfs->reserve_blocks);
+ if (retval)
+ return retval;
+
+ retval = adjust_fs_info(fs, rfs->old_fs, rfs->reserve_blocks, new_size);
if (retval)
goto errout;

@@ -753,11 +773,6 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
if (old_fs->super->s_blocks_count > fs->super->s_blocks_count)
fs = rfs->old_fs;

- retval = ext2fs_allocate_block_bitmap(fs, _("reserved blocks"),
- &rfs->reserve_blocks);
- if (retval)
- return retval;
-
retval = ext2fs_allocate_block_bitmap(fs, _("blocks to be moved"),
&rfs->move_blocks);
if (retval)
@@ -1877,6 +1892,19 @@ blk_t calculate_minimum_resize_size(ext2_filsys fs)
data_needed -= SUPER_OVERHEAD(fs) * num_of_superblocks;
data_needed -= META_OVERHEAD(fs) * fs->group_desc_count;

+ if (fs->super->s_feature_incompat & EXT4_FEATURE_INCOMPAT_FLEX_BG) {
+ /*
+ * For ext4 we need to allow for up to a flex_bg worth
+ * of inode tables of slack space so the resize
+ * operation can be guaranteed to finish.
+ */
+ int flexbg_size = 1 << fs->super->s_log_groups_per_flex;
+ int extra_groups;
+
+ extra_groups = flexbg_size - (groups & (flexbg_size - 1));
+ data_needed += META_OVERHEAD(fs) * extra_groups;
+ }
+
/*
* figure out how many data blocks we have given the number of groups
* we need for our inodes
diff --git a/resize/resize2fs.h b/resize/resize2fs.h
index ed25b06..fab7290 100644
--- a/resize/resize2fs.h
+++ b/resize/resize2fs.h
@@ -128,6 +128,7 @@ extern errcode_t resize_fs(ext2_filsys fs, blk_t *new_size, int flags,
unsigned long max));

extern errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs,
+ ext2fs_block_bitmap reserve_blocks,
blk_t new_size);
extern blk_t calculate_minimum_resize_size(ext2_filsys fs);

--
1.5.6.3
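
The slack computed in the calculate_minimum_resize_size() hunk above can
be worked through by hand with a small helper. This is only a sketch of
the arithmetic; flex_bg_slack_groups is a hypothetical name, and the
patch goes on to multiply the result by META_OVERHEAD(fs), which is not
modeled here.

```c
/*
 * Number of extra groups' worth of metadata to allow for, given the
 * group count and s_log_groups_per_flex.  Assumes the log value is
 * small enough that the shift does not overflow an unsigned int.
 */
unsigned flex_bg_slack_groups(unsigned groups, unsigned log_groups_per_flex)
{
	unsigned flexbg_size = 1u << log_groups_per_flex;

	/* groups & (flexbg_size - 1) is groups modulo the flex_bg size */
	return flexbg_size - (groups & (flexbg_size - 1));
}
```

For example, with 16 groups per flex_bg, 10 groups gives a slack of 6
groups and 17 groups gives 15. When the group count is already a
multiple of the flex_bg size, the expression yields a full flex_bg
(16 here) rather than zero.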


2009-04-19 03:03:55

by Theodore Ts'o

Subject: [PATCH 3/3] resize2fs: Fix corruption bug impacting ext4 filesystems with uninit_bg

Due to a fencepost bug, when skipping a block group whose block bitmap
was uninitialized (and hence could not contain any blocks eligible for
relocation), the block immediately following the block group was
skipped as well. If it was in use and required relocation, it
wouldn't get properly relocated, with the result that an inode using
such a block would end up, post resize, with a pointer to a block now
outside the bounds of the filesystem.

This commit fixes this fencepost error.
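
The off-by-one comes from the loop's own increment: after "continue",
the for loop's blk++ still runs, so jumping straight to the first block
of the next group steps over that block. A toy reproduction, with
hypothetical names (scan_blocks, skip_group, apply_fix) and none of the
real resize2fs data structures:

```c
#include <stddef.h>

/*
 * Walk blocks [first_data_block, total_blocks), skipping every group
 * whose entry in skip_group[] is nonzero.  Each block that gets
 * examined is flagged in visited[]; the return value is the count.
 * apply_fix selects the corrected skip target (one less).
 */
size_t scan_blocks(unsigned first_data_block, unsigned blocks_per_group,
		   unsigned total_blocks, const int *skip_group,
		   int apply_fix, unsigned char *visited)
{
	size_t count = 0;
	unsigned blk, g;

	for (blk = first_data_block; blk < total_blocks; blk++) {
		g = (blk - first_data_block) / blocks_per_group;
		if (skip_group[g]) {
			/* First block of the next group ... */
			blk = (g + 1) * blocks_per_group + first_data_block;
			/* ... minus one, because blk++ runs after continue */
			if (apply_fix)
				blk--;
			continue;
		}
		visited[blk] = 1;
		count++;
	}
	return count;
}
```

With a first data block of 1, 4 blocks per group, 13 blocks in total and
the middle group skipped, the unfixed loop never examines block 9, the
first block of the last group; the fixed loop does.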

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
resize/resize2fs.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index ac926ce..f3e7fd0 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -804,7 +804,7 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
* to the next block group.
*/
blk = ((g+1) * fs->super->s_blocks_per_group) +
- fs->super->s_first_data_block;
+ fs->super->s_first_data_block - 1;
continue;
}
if (ext2fs_test_block_bitmap(old_fs->block_map, blk) &&
--
1.5.6.3


2009-04-21 23:58:39

by Eric Sandeen

Subject: Re: [PATCH 0/3] Fix resize2fs data/filesystem corruption bugs

Theodore Ts'o wrote:
> These patches fix some nasty resize2fs bugs when growing or shrinking
> ext4 filesystems off-line. Eric, you'll probably want to get these
> patches into Fedora's e2fsprogs before F11 goes live.

Thanks for the heads up Ted -

FWIW, the livecd creation process, which does a fair bit of offline
resizing (shrinking), was indeed seeing corruption, and these patches do
seem to resolve it. (I will soon be recommending that the final livecd
image be fsck'd, as it gets subsequently dd'd to thousands of hard
drives during the livecd install process!)

-Eric