2009-06-16 22:37:59

by Eric Sandeen

[permalink] [raw]
Subject: Something wrong with extent-based journal creation

I've been tearing my hair out all day on this and not getting anywhere
yet, so punting to the list ;)

I have a user running mke2fs -t ext4 in a virt guest, and a mount
immediately after that fails.

It fails because the journal inode has an invalide i_extra_isize.

I've narrowed it down to the commit (961306d3) which creates the journal
with extent format; if that's commented out, it works fine.

I can't for the life of me see where the problem is coming from, though.

I thought we'd need a write_inode_new (emphasis on new here) for the
journal inode, but switching to that doesn't help.

I can't reproduce this myself... any ideas or suggestions of where to
look? :)

-Eric


2009-06-17 00:57:58

by Eric Sandeen

[permalink] [raw]
Subject: [PATCH] libext2fs: write only core inode in update_path()

Eric Sandeen wrote:

> I've been tearing my hair out all day on this and not getting anywhere
> yet, so punting to the list ;)
>
>
>
Well, here's one problem. I'm waiting for the reporter to confirm
whether it fixes the bug. I think it likely does but will report
back tomorrow. Anyway patch follows, thanks valgrind!



libext2fs: write only core inode in update_path()

The ext2_extent_handle only has a struct ext2_inode allocated on
it, and the same amount copied into it in that same function,
but in update_path() we're possibly writing out more than that -
for example 256 bytes, from that address. This causes uninitialized
memory to get written to disk, overwriting the parts of the
inode past the osd2 member (the end of the smaller structure).

Signed-off-by: Eric Sandeen <[email protected]>
---

diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index b7eb617..0dfee62 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -547,7 +547,7 @@ static errcode_t update_path(ext2_extent_handle_t handle)

if (handle->level == 0) {
retval = ext2fs_write_inode_full(handle->fs, handle->ino,
- handle->inode, EXT2_INODE_SIZE(handle->fs->super));
+ handle->inode, sizeof(struct ext2_inode));
} else {
ix = handle->path[handle->level - 1].curr;
blk = ext2fs_le32_to_cpu(ix->ei_leaf) +



2009-06-17 08:04:45

by Andreas Dilger

[permalink] [raw]
Subject: Re: Something wrong with extent-based journal creation

On Jun 16, 2009 17:38 -0500, Eric Sandeen wrote:
> I've narrowed it down to the commit (961306d3) which creates the journal
> with extent format; if that's commented out, it works fine.

Hmm, this seems like it might be a troublesome feature... Are we sure
that there is proper kernel/tool compatibility for an extent-mapped
journal? At least none of the Lustre code has any support for this,
though they can handle the INCOMPAT_EXTENTS feature, and I don't think
this was present in older kernels either.

I haven't looked at the code for this yet, so it may be that it will
"just work", but if this is a new feature I would urge caution w.r.t.
compatibility before this is merged upstream (in either e2fsprogs or
the kernel).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-06-17 14:45:16

by Eric Sandeen

[permalink] [raw]
Subject: Re: Something wrong with extent-based journal creation

Andreas Dilger wrote:
> On Jun 16, 2009 17:38 -0500, Eric Sandeen wrote:
>> I've narrowed it down to the commit (961306d3) which creates the journal
>> with extent format; if that's commented out, it works fine.
>
> Hmm, this seems like it might be a troublesome feature... Are we sure
> that there is proper kernel/tool compatibility for an extent-mapped
> journal? At least none of the Lustre code has any support for this,
> though they can handle the INCOMPAT_EXTENTS feature, and I don't think
> this was present in older kernels either.
>
> I haven't looked at the code for this yet, so it may be that it will
> "just work", but if this is a new feature I would urge caution w.r.t.
> compatibility before this is merged upstream (in either e2fsprogs or
> the kernel).

It's already upstream.... :)

But I found the problem I was looking for, at least, see the [PATCH] in
reply to my original email. (which is actually a pretty bad bug, likely
causing problems during e2fsck etc as well)

-Eric

2009-06-17 15:31:44

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Something wrong with extent-based journal creation

On Wed, Jun 17, 2009 at 02:04:32AM -0600, Andreas Dilger wrote:
> On Jun 16, 2009 17:38 -0500, Eric Sandeen wrote:
> > I've narrowed it down to the commit (961306d3) which creates the journal
> > with extent format; if that's commented out, it works fine.
>
> Hmm, this seems like it might be a troublesome feature... Are we sure
> that there is proper kernel/tool compatibility for an extent-mapped
> journal? At least none of the Lustre code has any support for this,
> though they can handle the INCOMPAT_EXTENTS feature, and I don't think
> this was present in older kernels either.
>
> I haven't looked at the code for this yet, so it may be that it will
> "just work", but if this is a new feature I would urge caution w.r.t.
> compatibility before this is merged upstream (in either e2fsprogs or
> the kernel).

Extent-based journals "just works", at least in the kernel; I didn't
need to make any changes to the kernel before I added support to using
an extent-mapped journal; if the kernel supports extents, it should
expert extent-mapped journals without any difficults. The patch to
add this was merged in before e2fsprogs 1.41.1 was released, so mke2fs
has been creating extent-based journals for a while.

- Ted

2009-06-17 15:35:37

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] libext2fs: write only core inode in update_path()

On Tue, Jun 16, 2009 at 07:57:59PM -0500, Eric Sandeen wrote:
> libext2fs: write only core inode in update_path()
>
> The ext2_extent_handle only has a struct ext2_inode allocated on
> it, and the same amount copied into it in that same function,
> but in update_path() we're possibly writing out more than that -
> for example 256 bytes, from that address. This causes uninitialized
> memory to get written to disk, overwriting the parts of the
> inode past the osd2 member (the end of the smaller structure).

Oh, I see. The bug was introduced by commit 84b239ae


libext2fs: add ext2fs_extent_open2

The patch below adds a function, ext2fs_extent_open2(), that behaves
as ext2fs_extent_open(), but will use the user-supplied inode
structure when opening an extent instead of reading the inode from
disk. It also changes several of the calls to extent_open() to use
this enhancement.

Signed-off-by: Nic Case <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>

Which replaced ext2fs_read_inode_full with ext2fs_read_inode(); which
is fine, extents.c doesn't need to use the full inode; but it didn't
change ext2fs_write_inode_full() with ext2fs_write_inode().

> diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
> index b7eb617..0dfee62 100644
> --- a/lib/ext2fs/extent.c
> +++ b/lib/ext2fs/extent.c
> @@ -547,7 +547,7 @@ static errcode_t update_path(ext2_extent_handle_t handle)
>
> if (handle->level == 0) {
> retval = ext2fs_write_inode_full(handle->fs, handle->ino,
> - handle->inode, EXT2_INODE_SIZE(handle->fs->super));
> + handle->inode, sizeof(struct ext2_inode));

Probably it would be better/simpler to replace this with:

retval = ext2fs_write_inode(handle->fs, handle->ino,
handle->inode);

- Ted

2009-06-17 15:50:21

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] libext2fs: write only core inode in update_path()

Theodore Tso wrote:

>
> Probably it would be better/simpler to replace this with:
>
> retval = ext2fs_write_inode(handle->fs, handle->ino,
> handle->inode);
>
> - Ted
>
Sure, that makes more sense, revised below:

libext2fs: write only core inode in update_path()

The ext2_extent_handle only has a struct ext2_inode allocated on
it, and the same amount copied into it in that same function,
but in update_path() we're possibly writing out more than that -
for example 256 bytes, from that address. This causes uninitialized
memory to get written to disk, overwriting the parts of the
inode past the osd2 member (the end of the smaller structure).

Signed-off-by: Eric Sandeen <[email protected]>
---

diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index b7eb617..15ce302 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -546,8 +546,8 @@ static errcode_t update_path(ext2_extent_handle_t handle)
struct ext3_extent_idx *ix;

if (handle->level == 0) {
- retval = ext2fs_write_inode_full(handle->fs, handle->ino,
- handle->inode, EXT2_INODE_SIZE(handle->fs->super));
+ retval = ext2fs_write_inode(handle->fs, handle->ino,
+ handle->inode);
} else {
ix = handle->path[handle->level - 1].curr;
blk = ext2fs_le32_to_cpu(ix->ei_leaf) +


2009-06-17 20:11:20

by Nic Case

[permalink] [raw]
Subject: Re: [PATCH] libext2fs: write only core inode in update_path()


--- On Wed, 6/17/09, Eric Sandeen wrote:
> Theodore Tso wrote:
> > Probably it would be better/simpler to replace this
> with:
> >
> > retval =
> ext2fs_write_inode(handle->fs, handle->ino,
> handle->inode);
> - Ted
> >
> Sure, that makes more sense, revised below:
>
> libext2fs: write only core inode in update_path()
>
> The ext2_extent_handle only has a struct ext2_inode
> allocated on
> it, and the same amount copied into it in that same
> function,

First, sorry for the original bug. I don't think the above is the right way to fix it, however. I am concerned that I may have basically broken write_inode_full on any inode with extents. For example, there is another call to write_inode_full in extent.c that might exhibit this same problem. I think the right fix would be to return to reading the full inode into memory in the extent_open function. See the patch below for what I am talking about.

libext2fs: read the full inode in extent_open2

The inode pointed to in the extent handle structure is expected to have the same size as the on-disk inode so the entire inode can be updated and written to disk, but in extent_open2, only 128 bytes was read into the inode, regardless of the on-disk inode size. Instead, read the entire inode.

Signed-off-by: Nic Case
---
diff --git a/e2fsprogs-1.41.6-orig/lib/ext2fs/extent.c b/e2fsprogs-1.41.6/lib/ext2fs/extent.c
index b7eb617..aebb9bc 100644
--- a/e2fsprogs-1.41.6-orig/lib/ext2fs/extent.c
+++ b/e2fsprogs-1.41.6/lib/ext2fs/extent.c
@@ -189,6 +189,7 @@ extern errcode_t ext2fs_extent_open2(ext2_filsys fs, ext2_ino_t ino,
{
struct ext2_extent_handle *handle;
errcode_t retval;
+ int isize = EXT2_INODE_SIZE(fs->super);
int i;
struct ext3_extent_header *eh;

@@ -203,7 +204,7 @@ extern errcode_t ext2fs_extent_open2(ext2_filsys fs, ext2_ino_t ino,
return retval;
memset(handle, 0, sizeof(struct ext2_extent_handle));

- retval = ext2fs_get_mem(sizeof(struct ext2_inode), &handle->inode);
+ retval = ext2fs_get_mem(isize, &handle->inode);
if (retval)
goto errout;

@@ -211,10 +212,10 @@ extern errcode_t ext2fs_extent_open2(ext2_filsys fs, ext2_ino_t ino,
handle->fs = fs;

if (inode) {
- memcpy(handle->inode, inode, sizeof(struct ext2_inode));
+ memcpy(handle->inode, inode, isize);
}
else {
- retval = ext2fs_read_inode(fs, ino, handle->inode);
+ retval = ext2fs_read_inode_full(fs, ino, handle->inode, isize);
if (retval)
goto errout;
}
---






2009-06-17 20:43:03

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] libext2fs: write only core inode in update_path()

number9652 wrote:
> --- On Wed, 6/17/09, Eric Sandeen wrote:
>> Theodore Tso wrote:
>>> Probably it would be better/simpler to replace this
>> with:
>>> retval =
>> ext2fs_write_inode(handle->fs, handle->ino, handle->inode); - Ted
>>>
>> Sure, that makes more sense, revised below:
>>
>> libext2fs: write only core inode in update_path()
>>
>> The ext2_extent_handle only has a struct ext2_inode allocated on
>> it, and the same amount copied into it in that same function,
>
> First, sorry for the original bug. I don't think the above is the
> right way to fix it, however. I am concerned that I may have
> basically broken write_inode_full on any inode with extents. For
> example, there is another call to write_inode_full in extent.c that
> might exhibit this same problem. I think the right fix would be to
> return to reading the full inode into memory in the extent_open
> function. See the patch below for what I am talking about.

Hm, well, I wondered about that. IMHO having these 2 inode sizes is a
big pain in the butt, and the shuffling back and forth is just asking
for bugs.

If this is the case (that it's broken now), then we really need
something in the regression suite to catch it - all tests pass in 1.41.6
....

-Eric



2009-06-17 21:32:16

by Nic Case

[permalink] [raw]
Subject: Re: [PATCH] libext2fs: write only core inode in update_path()


> number9652 wrote:
> > --- On Wed, 6/17/09, Eric Sandeen wrote:
> > right way to fix it, however. I am concerned
> that I may have
> > basically broken write_inode_full on any inode with
> extents. For
> > example, there is another call to write_inode_full in
> extent.c that
> > might exhibit this same problem. I think the
> right fix would be to
> > return to reading the full inode into memory in the
> extent_open
>
> If this is the case (that it's broken now), then we really
> need
> something in the regression suite to catch it - all tests
> pass in 1.41.6
> ....
>
> -Eric
>
I was too general in my statement above about what it broke. I think (didn't test) if a program follows this path:
extent_open(...,*handle)
write_inode_full(...,handle->inode,...)
that it will read uninitialized memory in write_inode_full. It seems clear that all previously written code assumes that this path is valid, and fixing all that to not assume that would seemingly be much more than just what your patch fixes and require more time to test. If you only use read/write_inode_full, everything is still okay.

I think that in this case, even when only using the handle to read the inode, we want to have the full inode available (by handle->inode) so it is possible (for example) to check the file creation time with the returned handle structure.

Nic






2009-06-17 21:36:31

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] libext2fs: write only core inode in update_path()

number9652 wrote:
>> number9652 wrote:
>>> --- On Wed, 6/17/09, Eric Sandeen wrote: right way to fix it,
>>> however. I am concerned
>> that I may have
>>> basically broken write_inode_full on any inode with
>> extents. For
>>> example, there is another call to write_inode_full in
>> extent.c that
>>> might exhibit this same problem. I think the
>> right fix would be to
>>> return to reading the full inode into memory in the
>> extent_open
>>
>> If this is the case (that it's broken now), then we really need
>> something in the regression suite to catch it - all tests pass in
>> 1.41.6 ....
>>
>> -Eric
>>
> I was too general in my statement above about what it broke. I think
> (didn't test) if a program follows this path:
> extent_open(...,*handle) write_inode_full(...,handle->inode,...) that
> it will read uninitialized memory in write_inode_full. It seems
> clear that all previously written code assumes that this path is
> valid, and fixing all that to not assume that would seemingly be much
> more than just what your patch fixes and require more time to test.
> If you only use read/write_inode_full, everything is still okay.
>
> I think that in this case, even when only using the handle to read
> the inode, we want to have the full inode available (by
> handle->inode) so it is possible (for example) to check the file
> creation time with the returned handle structure.

Seems reasonable to me ...

FWIW, I have Yet Another Case where resize is corrupting the larger
inode, even with this fix and/or my fix.

Still digging into it, sigh.

-Eric

2009-06-17 22:47:08

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] libext2fs: write only core inode in update_path()

On Wed, Jun 17, 2009 at 02:32:18PM -0700, number9652 wrote:
> I was too general in my statement above about what it broke. I think (didn't test) if a program follows this path:
> extent_open(...,*handle)
> write_inode_full(...,handle->inode,...)
> that it will read uninitialized memory in write_inode_full. It
>seems clear that all previously written code assumes that this path
>is valid, and fixing all that to not assume that would seemingly be
>much more than just what your patch fixes and require more time to
>test. If you only use read/write_inode_full, everything is still
>okay.

Um, but that can't happen; the ext2_extent_handle_t type is an opaque
type. So outside of code in lib/ext2fs/extent.c, no one else can see
the inode in the ext2_handle_t.

The safest thing to do is to simply make the extent code only use
ext2_write_inode and ext2_read_inode. Otherwise, we have to worry
about a user passing in the smaller 128-byte inode form. The extent
code doesn't need any of the extended inode fields, so this is
completely safe.

- Ted

2009-06-17 22:50:39

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] libext2fs: write only core inode in update_path()

This is what I ultimately checked in. It converts all calls of
ext2fs_write_inode_full() to ext2fs_write_inode().

- Ted

commit 125a36780626cdb0fc4d62fd529486baa8bce54c
Author: Eric Sandeen <[email protected]>
Date: Wed Jun 17 18:49:01 2009 -0400

libext2fs: write only core inode in update_path()

The ext2_extent_handle only has a struct ext2_inode allocated on
it, and the same amount copied into it in that same function,
but in update_path() we're possibly writing out more than that -
for example 256 bytes, from that address. This causes uninitialized
memory to get written to disk, overwriting the parts of the
inode past the osd2 member (the end of the smaller structure).

Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index 2b88739..35b080e 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -546,8 +546,8 @@ static errcode_t update_path(ext2_extent_handle_t handle)
struct ext3_extent_idx *ix;

if (handle->level == 0) {
- retval = ext2fs_write_inode_full(handle->fs, handle->ino,
- handle->inode, EXT2_INODE_SIZE(handle->fs->super));
+ retval = ext2fs_write_inode(handle->fs, handle->ino,
+ handle->inode);
} else {
ix = handle->path[handle->level - 1].curr;
blk = ext2fs_le32_to_cpu(ix->ei_leaf) +
@@ -1011,8 +1011,8 @@ static errcode_t extent_node_split(ext2_extent_handle_t handle)

/* new node hooked in, so update inode block count (do this here?) */
handle->inode->i_blocks += handle->fs->blocksize / 512;
- retval = ext2fs_write_inode_full(handle->fs, handle->ino,
- handle->inode, EXT2_INODE_SIZE(handle->fs->super));
+ retval = ext2fs_write_inode(handle->fs, handle->ino,
+ handle->inode);
if (retval)
goto done;

@@ -1370,9 +1370,8 @@ errcode_t ext2fs_extent_delete(ext2_extent_handle_t handle, int flags)

retval = ext2fs_extent_delete(handle, flags);
handle->inode->i_blocks -= handle->fs->blocksize / 512;
- retval = ext2fs_write_inode_full(handle->fs,
- handle->ino, handle->inode,
- EXT2_INODE_SIZE(handle->fs->super));
+ retval = ext2fs_write_inode(handle->fs, handle->ino,
+ handle->inode);
ext2fs_block_alloc_stats(handle->fs, extent.e_pblk, -1);
}
} else {

2009-06-17 23:00:35

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] libext2fs: write only core inode in update_path()

Theodore Tso wrote:
> This is what I ultimately checked in. It converts all calls of
> ext2fs_write_inode_full() to ext2fs_write_inode().

Hey thanks for finding those, it fixed the next resize bug I was looking
at! ;)

Now, put down the laptop and pay attention to that talk, Ted ;)

-Eric

> - Ted
>
> commit 125a36780626cdb0fc4d62fd529486baa8bce54c
> Author: Eric Sandeen <[email protected]>
> Date: Wed Jun 17 18:49:01 2009 -0400
>
> libext2fs: write only core inode in update_path()
>
> The ext2_extent_handle only has a struct ext2_inode allocated on
> it, and the same amount copied into it in that same function,
> but in update_path() we're possibly writing out more than that -
> for example 256 bytes, from that address. This causes uninitialized
> memory to get written to disk, overwriting the parts of the
> inode past the osd2 member (the end of the smaller structure).
>
> Signed-off-by: Eric Sandeen <[email protected]>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
> index 2b88739..35b080e 100644
> --- a/lib/ext2fs/extent.c
> +++ b/lib/ext2fs/extent.c
> @@ -546,8 +546,8 @@ static errcode_t update_path(ext2_extent_handle_t handle)
> struct ext3_extent_idx *ix;
>
> if (handle->level == 0) {
> - retval = ext2fs_write_inode_full(handle->fs, handle->ino,
> - handle->inode, EXT2_INODE_SIZE(handle->fs->super));
> + retval = ext2fs_write_inode(handle->fs, handle->ino,
> + handle->inode);
> } else {
> ix = handle->path[handle->level - 1].curr;
> blk = ext2fs_le32_to_cpu(ix->ei_leaf) +
> @@ -1011,8 +1011,8 @@ static errcode_t extent_node_split(ext2_extent_handle_t handle)
>
> /* new node hooked in, so update inode block count (do this here?) */
> handle->inode->i_blocks += handle->fs->blocksize / 512;
> - retval = ext2fs_write_inode_full(handle->fs, handle->ino,
> - handle->inode, EXT2_INODE_SIZE(handle->fs->super));
> + retval = ext2fs_write_inode(handle->fs, handle->ino,
> + handle->inode);
> if (retval)
> goto done;
>
> @@ -1370,9 +1370,8 @@ errcode_t ext2fs_extent_delete(ext2_extent_handle_t handle, int flags)
>
> retval = ext2fs_extent_delete(handle, flags);
> handle->inode->i_blocks -= handle->fs->blocksize / 512;
> - retval = ext2fs_write_inode_full(handle->fs,
> - handle->ino, handle->inode,
> - EXT2_INODE_SIZE(handle->fs->super));
> + retval = ext2fs_write_inode(handle->fs, handle->ino,
> + handle->inode);
> ext2fs_block_alloc_stats(handle->fs, extent.e_pblk, -1);
> }
> } else {