2008-12-23 05:49:16

Subject: (resend) extent header problems following shrink with resize2fs

(resending without gzipped attachment)

Running Linux 2.6.28-rc9 as of ab65387243f47a7bc11725f733c86bf27248b326.
e2fsprogs 1.41.3-1 from Debian.

Yesterday I created a ~464GB ext4 volume and copied about 107GB of music
files onto it. Then I decided that I wanted to use half of the disk for
something else, so last night I resized the ext4 filesystem to ~232GB
and recreated the partitions to suit. This morning I wrote some new
files to the ext4 filesystem, which went fine. Then I installed a new
music player, which wanted to scan all of the files on the disk. It
reported being unable to read some files, and there's rather a lot of
this sort of thing in dmesg (see also http://ondioline.org/~paul/e4dmesg.gz):

EXT4-fs error (device sdb1): ext4_ext_find_extent: bad header in inode #39565: invalid magic - magic 24e, entries 28338, max 21313(0), depth 28712(0)
EXT4-fs error (device sdb1): ext4_ext_find_extent: bad header in inode #39555: invalid magic - magic cd8c, entries 59560, max 57082(0), depth 5425(0)
EXT4-fs error (device sdb1): ext4_ext_find_extent: bad header in inode #39563: invalid magic - magic 976d, entries 52325, max 49256(0), depth 50316(0)
EXT4-fs error (device sdb1): ext4_ext_find_extent: bad header in inode #39556: invalid magic - magic 61c6, entries 47990, max 4668(0), depth 32768(0)
EXT4-fs error (device sdb1): ext4_ext_find_extent: bad header in inode #44888: invalid magic - magic 42a, entries 5388, max 32960(0), depth 1872(0)
EXT4-fs error (device sdb1): ext4_ext_find_extent: bad header in inode #39844: invalid magic - magic 6ae8, entries 44073, max 20807(0), depth 10869(0)
EXT4-fs error (device sdb1): ext4_ext_find_extent: bad header in inode #39843: invalid magic - magic 2200, entries 38282, max 17931(0), depth 0(0)
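For readers decoding the messages above: the kernel is printing the leading fields of an on-disk extent header, whose magic should be 0xf30a. The following is a minimal sketch of that kind of sanity check, assuming the standard 12-byte little-endian ext4 extent header layout. The names ext_header, le16_at and parse_ext_header are hypothetical and are not taken from the kernel or from e2fsprogs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT4_EXT_MAGIC 0xF30Au	/* expected on-disk extent header magic */

/* Host-order copy of the 12-byte on-disk extent header (hypothetical type). */
struct ext_header {
	uint16_t magic;
	uint16_t entries;
	uint16_t max;
	uint16_t depth;
	uint32_t generation;
};

/* Read a little-endian 16-bit value from a byte buffer. */
uint16_t le16_at(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode the header at the start of buf and apply two basic checks:
 * the magic must match and entries must not exceed max.
 * Returns 0 if the header looks plausible, -1 otherwise.
 */
int parse_ext_header(const unsigned char *buf, size_t len,
		     struct ext_header *out)
{
	assert(buf != NULL && out != NULL);
	if (len < 12)
		return -1;
	out->magic = le16_at(buf);
	out->entries = le16_at(buf + 2);
	out->max = le16_at(buf + 4);
	out->depth = le16_at(buf + 6);
	out->generation = (uint32_t)le16_at(buf + 8) |
			  ((uint32_t)le16_at(buf + 10) << 16);
	if (out->magic != EXT4_EXT_MAGIC)
		return -1;
	if (out->entries > out->max)
		return -1;
	return 0;
}
```

Feeding it the values from the first message (magic 24e, entries 28338, max 21313, depth 28712) fails on the magic check alone, which is consistent with the block holding file data or garbage rather than an extent node.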

There are no "access beyond end of partition" messages, so I don't think
I screwed up the resize procedure. The argument I gave resize2fs was
"244192000K"; here's the partition table:

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       30401   244196001   83  Linux
/dev/sdb2           30402       60801   244188000   83  Linux
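The size argument can be checked against the table with simple arithmetic. This is a small sketch, assuming the fdisk "Blocks" column is in 1 KiB units; the function name shrink_slack_kib is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how many KiB of the partition are left over after the
 * filesystem. A negative result means the filesystem would extend
 * past the end of the partition.
 */
int64_t shrink_slack_kib(uint64_t fs_kib, uint64_t part_kib)
{
	/* Guard the signed conversion below. */
	assert(fs_kib <= INT64_MAX && part_kib <= INT64_MAX);
	return (int64_t)part_kib - (int64_t)fs_kib;
}
```

For the numbers in this report, shrink_slack_kib(244192000, 244196001) gives 4001, so the resized filesystem ends about 4 MB before the end of /dev/sdb1.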

e2fsck aborts when I try to use it to fix the filesystem:

/dev/sdb1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Error1: Corrupt extent header on inode 38979
[New Thread 0x7fe15e066740 (LWP 24166)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fe15e066740 (LWP 24166)]
0x00007fe15d0fbed5 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x00007fe15d0fbed5 in raise () from /lib/libc.so.6
#1 0x00007fe15d0fd3f3 in abort () from /lib/libc.so.6
#2 0x000000000040bdae in scan_extent_node (ctx=0x24c6f70,
pctx=0x7fff6607c7a0, pb=0x7fff6607c5f0, start_block=0, ehandle=0x2ed94d0)
at /build/buildd/e2fsprogs-1.41.3/e2fsck/pass1.c:1700
#3 0x000000000040cc1d in check_blocks (ctx=0x24c6f70, pctx=0x7fff6607c7a0,
block_buf=0x2ec11a0 "\002")
at /build/buildd/e2fsprogs-1.41.3/e2fsck/pass1.c:1773
#4 0x000000000040e063 in e2fsck_pass1 (ctx=0x24c6f70)
at /build/buildd/e2fsprogs-1.41.3/e2fsck/pass1.c:1030
#5 0x00000000004089e8 in e2fsck_run (ctx=0x24c6f70)
at /build/buildd/e2fsprogs-1.41.3/e2fsck/e2fsck.c:215
#6 0x00000000004074a3 in main (argc=<value optimized out>,
argv=<value optimized out>)
at /build/buildd/e2fsprogs-1.41.3/e2fsck/unix.c:1278

and e2image exits with:

e2image: Corrupt extent header while iterating over inode 38979

In the matter of inode 38979, debugfs says:

debugfs: bmap <38979> 0
argv[0]: Corrupt extent header while mapping logical block 0

I can keep the FS around so let me know if you need any more
information.

--
Paul Collins
Wellington, New Zealand

Dag vijandelijk luchtschip de huismeester is dood

2008-12-23 06:18:39

by Theodore Ts'o

Subject: Re: (resend) extent header problems following shrink with resize2fs

On Tue, Dec 23, 2008 at 06:49:15PM +1300, Paul Collins wrote:
> (resending without gzipped attachment)
>
> Yesterday I created a ~464GB ext4 volume and copied about 107GB of music
> files onto it. Then I decided that I wanted to use half of the disk for
> something else, so last night I resized the ext4 filesystem to ~232GB
> and recreated the partitions to suit. This morning I wrote some new
> files to the ext4 filesystem, which went fine. Then I installed a new
> music player, which wanted to scan all of the files on the disk. It
> reported being unable to read some files, and there's rather a lot of
> this sort of thing in dmesg (see also http://ondioline.org/~paul/e4dmesg.gz):

Yeah, resize2fs needs to be fixed to handle extents correctly. At the
moment it can screw them up pretty badly. I'll log this as a bug to
resize2fs; thanks for reporting it, and I hope you didn't suffer any
permanent data loss.

Regards,

- Ted

2008-12-25 07:18:50

by Paul Collins

Subject: Re: (resend) extent header problems following shrink with resize2fs

Theodore Tso <[email protected]> writes:
> Yeah, resize2fs needs to be fixed to handle extents correctly. At the
> moment it can screw them up pretty badly.

In the meantime, perhaps something like the patch below is appropriate?

> I'll log this as a bug to resize2fs; thanks for reporting it, and I
> hope you didn't suffer any permanent data loss.

No worries there, that was replica N+1 of those particular files.

My real concern, which I didn't highlight well and buried way down in my
original report to boot, was e2fsck blowing up like it did. Hardware
being what it is, I imagine at some point extent headers will get
corrupted, and losing one file is of course preferable to losing the
entire filesystem.

For reference, here's that backtrace again.

/dev/sdb1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Error1: Corrupt extent header on inode 38979
[New Thread 0x7fe15e066740 (LWP 24166)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fe15e066740 (LWP 24166)]
0x00007fe15d0fbed5 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x00007fe15d0fbed5 in raise () from /lib/libc.so.6
#1 0x00007fe15d0fd3f3 in abort () from /lib/libc.so.6
#2 0x000000000040bdae in scan_extent_node (ctx=0x24c6f70,
pctx=0x7fff6607c7a0, pb=0x7fff6607c5f0, start_block=0, ehandle=0x2ed94d0)
at /build/buildd/e2fsprogs-1.41.3/e2fsck/pass1.c:1700
#3 0x000000000040cc1d in check_blocks (ctx=0x24c6f70, pctx=0x7fff6607c7a0,
block_buf=0x2ec11a0 "\002")
at /build/buildd/e2fsprogs-1.41.3/e2fsck/pass1.c:1773
#4 0x000000000040e063 in e2fsck_pass1 (ctx=0x24c6f70)
at /build/buildd/e2fsprogs-1.41.3/e2fsck/pass1.c:1030
#5 0x00000000004089e8 in e2fsck_run (ctx=0x24c6f70)
at /build/buildd/e2fsprogs-1.41.3/e2fsck/e2fsck.c:215
#6 0x00000000004074a3 in main (argc=<value optimized out>,
argv=<value optimized out>)
at /build/buildd/e2fsprogs-1.41.3/e2fsck/unix.c:1278

diff --git a/resize/main.c b/resize/main.c
index 3de333e..fb4fa99 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -426,6 +426,13 @@ int main (int argc, char ** argv)
 			"long. Nothing to do!\n\n"), new_size);
 		exit(0);
 	}
+	if ((new_size < fs->super->s_blocks_count) &&
+	    (fs->super->s_feature_incompat & EXT3_FEATURE_INCOMPAT_EXTENTS)) {
+		fprintf(stderr, _("Reducing the size of a "
+				  "filesystem with extents enabled\n"
+				  "is currently not supported.\n"));
+		exit(1);
+	}
 	if (mount_flags & EXT2_MF_MOUNTED) {
 		retval = online_resize_fs(fs, mtpt, &new_size, flags);
 	} else {

--
Paul Collins
Wellington, New Zealand

Dag vijandelijk luchtschip de huismeester is dood

2008-12-25 13:12:04

by Theodore Ts'o

Subject: Re: (resend) extent header problems following shrink with resize2fs

On Thu, Dec 25, 2008 at 08:18:48PM +1300, Paul Collins wrote:
> My real concern, which I didn't highlight well and buried way down in my
> original report to boot, was e2fsck blowing up like it did. Hardware
> being what it is, I imagine at some point extent headers will get
> corrupted, and losing one file is of course preferable to losing the
> entire filesystem.

Yeah, I know about that problem. It was highlighted recently, but what
with the end of the year coming up I haven't had a chance to fix it
yet. It's an embarrassing oversight on my part. This error only happens
when a non-leaf interior node gets corrupted, and I didn't notice that
I failed to handle this case because it is relatively rare for an
extent tree to have a depth >= 2 in the first place. I had left it as
a "we'll handle this later" case, and I never got back to it. The
short-term workaround is simply to use debugfs and its clri function:

debugfs -w /dev/sdb1
debugfs: clri <38979>
debugfs: quit

... and then run e2fsck. I'll get this fixed in the next maintenance
release of e2fsprogs, though, which will be out soon. We have a few
ext4 related problems that I really need to get fixed and out the
door.

- Ted

2008-12-26 03:58:04

by Theodore Ts'o

Subject: Re: Massive filesystem corruption

On Sun, Dec 21, 2008 at 03:05:49AM +0100, Matteo Croce wrote:
> > > Pass 1: Checking inodes, blocks, and sizes
> > > Error1: Corrupt extent header on inode 107192
> > > [New Thread 0xb7e46700 (LWP 12878)]

The following patch to e2fsprogs will fix e2fsck's inability to deal
with a corrupted interior node in the extent tree. It will be in the
next maintenance release of e2fsprogs, and it should address the
problem you've pointed out.
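The general shape of the fix can be sketched with a toy in-memory tree: a recursive scan returns an error instead of calling abort(), an interior block is only counted once its subtree has been scanned cleanly, and the caller clears the inode on failure. This is a simplified illustration, not the e2fsck code. All names here (toy_node, toy_scan, toy_check_inode) are hypothetical, and the real patch can also offer to delete just the bad extent entry rather than the whole inode.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAGIC 0xF30Au	/* stand-in for a valid node header */

enum scan_status { SCAN_OK = 0, SCAN_BAD_HEADER = 1 };

/* Toy extent tree node: nchildren == 0 means a leaf. */
struct toy_node {
	unsigned magic;
	size_t nchildren;
	struct toy_node **children;
};

/*
 * Recursively scan the tree. On a bad header, stop and report the
 * error to the caller instead of aborting the whole program. A child
 * block is counted only after its subtree scanned without error.
 */
enum scan_status toy_scan(const struct toy_node *node, size_t *blocks_used)
{
	size_t i;

	assert(node != NULL && blocks_used != NULL);
	if (node->magic != TOY_MAGIC)
		return SCAN_BAD_HEADER;
	for (i = 0; i < node->nchildren; i++) {
		enum scan_status st = toy_scan(node->children[i], blocks_used);

		if (st != SCAN_OK)
			return st;
		(*blocks_used)++;
	}
	return SCAN_OK;
}

/*
 * Caller side: if the scan failed, drop everything accounted to this
 * inode and report that it was cleared (returns 1), otherwise 0.
 */
int toy_check_inode(const struct toy_node *root, size_t *blocks_used)
{
	*blocks_used = 0;
	if (toy_scan(root, blocks_used) != SCAN_OK) {
		*blocks_used = 0;
		return 1;
	}
	return 0;
}
```

The point of the design is that one damaged file costs one inode, while the rest of the check carries on.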

Regards,

- Ted

commit 7518c176867099eb529502103106501861a71280
Author: Theodore Ts'o <[email protected]>
Date: Thu Dec 25 22:42:38 2008 -0500

e2fsck: Fix an unhandled corruption case in scan_extent_node()

A corrupted interior node in an extent tree would cause e2fsck to
crash with the error message:

Error1: Corrupt extent header on inode 107192
Aborted (core dumped)

Handle this and related failures when scanning an inode's extent tree
more robustly.

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 2619272..04aeb26 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -1655,6 +1655,7 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
 			problem = PR_1_EXTENT_ENDS_BEYOND;

 		if (problem) {
+		report_problem:
 			pctx->blk = extent.e_pblk;
 			pctx->blk2 = extent.e_lblk;
 			pctx->num = extent.e_len;
@@ -1662,11 +1663,7 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
 				pctx->errcode =
 					ext2fs_extent_delete(ehandle, 0);
 				if (pctx->errcode) {
-					fix_problem(ctx,
-						    PR_1_EXTENT_DELETE_FAIL,
-						    pctx);
-					/* Should never get here */
-					ctx->flags |= E2F_FLAG_ABORT;
+					pctx->str = "ext2fs_extent_delete";
 					return;
 				}
 				pctx->errcode = ext2fs_extent_get(ehandle,
@@ -1682,23 +1679,27 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
 		}

 		if (!is_leaf) {
-			mark_block_used(ctx, extent.e_pblk);
-			pb->num_blocks++;
+			blk = extent.e_pblk;
 			pctx->errcode = ext2fs_extent_get(ehandle,
 						  EXT2_EXTENT_DOWN, &extent);
 			if (pctx->errcode) {
-				printf("Error1: %s on inode %u\n",
-					error_message(pctx->errcode), pctx->ino);
-				abort();
+				pctx->str = "EXT2_EXTENT_DOWN";
+				problem = PR_1_EXTENT_HEADER_INVALID;
+				if (pctx->errcode == EXT2_ET_EXTENT_HEADER_BAD)
+					goto report_problem;
+				return;
 			}
 			scan_extent_node(ctx, pctx, pb, extent.e_lblk, ehandle);
+			if (pctx->errcode)
+				return;
 			pctx->errcode = ext2fs_extent_get(ehandle,
 						  EXT2_EXTENT_UP, &extent);
 			if (pctx->errcode) {
-				printf("Error1: %s on inode %u\n",
-					error_message(pctx->errcode), pctx->ino);
-				abort();
+				pctx->str = "EXT2_EXTENT_UP";
+				return;
 			}
+			mark_block_used(ctx, blk);
+			pb->num_blocks++;
 			goto next;
 		}

@@ -1780,7 +1781,14 @@ static void check_blocks_extents(e2fsck_t ctx, struct problem_context *pctx,
 	}

 	scan_extent_node(ctx, pctx, pb, 0, ehandle);
-
+	if (pctx->errcode &&
+	    fix_problem(ctx, PR_1_EXTENT_ITERATE_FAILURE, pctx)) {
+		pb->num_blocks = 0;
+		inode->i_blocks = 0;
+		e2fsck_clear_inode(ctx, ino, inode, E2F_FLAG_RESTART,
+				   "check_blocks_extents");
+		pctx->errcode = 0;
+	}
 	ext2fs_extent_free(ehandle);
 }

diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index 19e8719..9cb3094 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -823,10 +823,11 @@ static struct e2fsck_problem problem_table[] = {
 	  N_("Error while reading over @x tree in @i %i: %m\n"),
 	  PROMPT_CLEAR_INODE, 0 },

-	/* Error deleting a bogus extent */
-	{ PR_1_EXTENT_DELETE_FAIL,
-	  N_("Error while deleting extent: %m\n"),
-	  PROMPT_ABORT, 0 },
+	/* Failure to iterate extents */
+	{ PR_1_EXTENT_ITERATE_FAILURE,
+	  N_("Failed to iterate extents in @i %i\n"
+	     "\t(op %s, blk %b, lblk %c): %m\n"),
+	  PROMPT_CLEAR_INODE, 0 },

 	/* Bad starting block in extent */
 	{ PR_1_EXTENT_BAD_START_BLK,
@@ -863,6 +864,10 @@ static struct e2fsck_problem problem_table[] = {
 	  N_("@i %i has out of order extents\n\t(@n logical @b %c, physical @b %b, len %N)\n"),
 	  PROMPT_CLEAR, 0 },

+	{ PR_1_EXTENT_HEADER_INVALID,
+	  N_("@i %i has an invalid extent node (blk %b, lblk %c)\n"),
+	  PROMPT_CLEAR, 0 },
+
 	/* Pass 1b errors */

 	/* Pass 1B: Rescan for duplicate/bad blocks */
/* Pass 1B: Rescan for duplicate/bad blocks */
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index 815b37c..1cb054c 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -479,8 +479,8 @@ struct problem_context {
/* Error while reading extent tree */
#define PR_1_READ_EXTENT 0x010056

-/* Error deleting a bogus extent */
-#define PR_1_EXTENT_DELETE_FAIL 0x010057
+/* Failure to iterate extents */
+#define PR_1_EXTENT_ITERATE_FAILURE 0x010057

/* Bad starting block in extent */
#define PR_1_EXTENT_BAD_START_BLK 0x010058
@@ -503,6 +503,9 @@ struct problem_context {
/* Extents are out of order */
#define PR_1_OUT_OF_ORDER_EXTENTS 0x01005E

+/* Extent node header invalid */
+#define PR_1_EXTENT_HEADER_INVALID 0x01005F
+
/*
* Pass 1b errors
*/
diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index 929e5cd..5545a94 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -441,8 +441,10 @@ retry:
 		eh = (struct ext3_extent_header *) newpath->buf;

 		retval = ext2fs_extent_header_verify(eh, handle->fs->blocksize);
-		if (retval)
+		if (retval) {
+			handle->level--;
 			return retval;
+		}

 		newpath->left = newpath->entries =
 			ext2fs_le16_to_cpu(eh->eh_entries);

2008-12-26 04:14:15

by Theodore Ts'o

Subject: Re: (resend) extent header problems following shrink with resize2fs

On Thu, Dec 25, 2008 at 08:18:48PM +1300, Paul Collins wrote:
> Theodore Tso <[email protected]> writes:
> > Yeah, resize2fs needs to be fixed to handle extents correctly. At the
> > moment it can screw them up pretty badly.
>
> In the meantime, perhaps something like the patch below is appropriate?

Actually, I think the following patch should fix things up nicely. I
need to create a test case so I can be sure this fixes the problem,
but I think this should address the root cause of the problem you
reported.

- Ted

diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index abe05f5..65398a6 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -1188,6 +1188,16 @@ static int process_block(ext2_filsys fs, blk_t *block_nr,
 	return ret;
 }

+static int process_block_ind(ext2_filsys fs, blk_t *block_nr,
+			     e2_blkcnt_t blockcnt, blk_t ref_block,
+			     int ref_offset, void *priv_data)
+{
+	if (blockcnt >= 0)
+		return 0;
+	return process_block(fs, block_nr, blockcnt, ref_block, ref_offset,
+			     priv_data);
+}
+
 /*
  * Progress callback
  */
@@ -1302,6 +1312,18 @@ static errcode_t inode_scan_and_fix(ext2_resize_t rfs)
 		if (ext2fs_inode_has_valid_blocks(inode) &&
 		    (rfs->bmap || pb.is_dir)) {
 			pb.ino = ino;
+			if (inode->i_flags & EXT4_EXTENTS_FL) {
+				/*
+				 * With extent-based files, we have
+				 * to translate all of the interior
+				 * node blocks first.
+				 */
+				retval = ext2fs_block_iterate2(rfs->old_fs,
+							       ino, 0, block_buf,
+							       process_block_ind, &pb);
+				if (retval)
+					goto errout;
+			}
 			retval = ext2fs_block_iterate2(rfs->old_fs,
 						       ino, 0, block_buf,
 						       process_block, &pb);
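
The two-pass idea in the patch above can be illustrated with a toy model: a first pass whose callback ignores data blocks (blockcnt >= 0) and relocates only metadata blocks, followed by the ordinary pass over everything. This is only a sketch under simplifying assumptions. The flat array stands in for the real block iterator, and toy_block, toy_remap, toy_iterate and the toy_process_* callbacks are hypothetical names, not libext2fs APIs.

```c
#include <assert.h>
#include <stddef.h>

/* One block reference: blockcnt < 0 marks metadata (an interior node). */
struct toy_block {
	long blockcnt;
	unsigned long blk;	/* physical block number */
};

/* Relocation state: blocks at or beyond limit must move. */
struct toy_remap {
	unsigned long limit;
	unsigned long next_free;
	unsigned moved;
};

typedef int (*toy_block_fn)(struct toy_block *b, void *priv);

/* Stand-in for a block iterator: call fn on each block in order. */
int toy_iterate(struct toy_block *blocks, size_t n, toy_block_fn fn,
		void *priv)
{
	size_t i;

	assert(blocks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret = fn(&blocks[i], priv);

		if (ret)
			return ret;
	}
	return 0;
}

/* Move any block that lies past the new end of the filesystem. */
int toy_process_block(struct toy_block *b, void *priv)
{
	struct toy_remap *r = priv;

	if (b->blk >= r->limit) {
		b->blk = r->next_free++;
		r->moved++;
	}
	return 0;
}

/* First-pass filter: skip data blocks, handle only metadata blocks. */
int toy_process_block_ind(struct toy_block *b, void *priv)
{
	if (b->blockcnt >= 0)
		return 0;
	return toy_process_block(b, priv);
}
```

Calling toy_iterate() first with toy_process_block_ind and then with toy_process_block mirrors the order of the two ext2fs_block_iterate2() calls in the patch: interior nodes are settled before the data blocks they describe are touched.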