2012-12-15 20:49:51

by Dâniel Fraga

Subject: Kernel 3.7.0: bad header/extent

After upgrading from kernel 3.6.0 to 3.7.0 (x86-64) I got this:

Dec 15 18:38:28 tux kernel: EXT4-fs error (device sda2):
ext4_ext_check_inode:462: inode #9311628: comm less: bad header/extent:
invalid extent entries - magic f30a, entries 1, max 4(4), depth 0(0)

So I tried to run e2fsck and it "fixed" inode 9311628. Running
e2fsck again doesn't detect any errors, but the problem remains.

The file I can't access is /usr/include/dlfcn.h (I was trying to compile openssh).

Anyway, any hints on how I can fix it?

Thanks.

Ps: if you need more information, just ask.

--
Linux 3.7.0: Terrified Chipmunk
http://www.youtube.com/DanielFragaBR
http://www.libertarios.org.br




2012-12-16 02:27:58

by Theodore Ts'o

Subject: Re: Kernel 3.7.0: bad header/extent

On Sat, Dec 15, 2012 at 06:44:26PM -0200, Dâniel Fraga wrote:
> After upgrading from kernel 3.6.0 to 3.7.0 (x86-64) I got this:
>
> Dec 15 18:38:28 tux kernel: EXT4-fs error (device sda2):
> ext4_ext_check_inode:462: inode #9311628: comm less: bad header/extent:
> invalid extent entries - magic f30a, entries 1, max 4(4), depth 0(0)
>
> So I tried to run e2fsck and it "fixed" inode 9311628. Running
> e2fsck again doesn't detect any errors, but the problem remains.

What do you mean "the problem remains"? Are you getting the same
EXT4-fs error message when you try to access the file?

Can you send the output of the command:

debugfs -R "stat <9311628>" /dev/sda2

Thanks!!

- Ted

2012-12-16 03:00:30

by Dâniel Fraga

Subject: Re: Kernel 3.7.0: bad header/extent

On Sat, 15 Dec 2012 21:27:53 -0500
Theodore Ts'o <[email protected]> wrote:

> > So I tried to run e2fsck and it "fixed" inode 9311628. Running
> > e2fsck again doesn't detect any errors, but the problem remains.
>
> What do you mean "the problem remains"? Are you getting the same
> EXT4-fs error message when you try to access the file?
>
> Can you send the output of the command:
>
> debugfs -R "stat <9311628>" /dev/sda2

Hi Ted! Yes, I get the same EXT4-fs message when I try to
access the file. Here's the requested output:

debugfs 1.41.12 (17-May-2010)
Inode: 9311628 Type: regular Mode: 0644 Flags: 0x80000
Generation: 834245655 Version: 0x00000000:00000001
User: 0 Group: 0 Size: 7117
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4f76ba2c:537fd694 -- Sat Mar 31 05:02:52 2012
atime: 0x50ccaf85:c6d93344 -- Sat Dec 15 15:12:37 2012
mtime: 0x4f76ba2c:537fd694 -- Sat Mar 31 05:02:52 2012
crtime: 0x4f76ba2c:537fd694 -- Sat Mar 31 05:02:52 2012
Size of extra inode fields: 28
EXTENTS:

***

If you need more testing, just ask. Thanks!

--
Linux 3.7.0: Terrified Chipmunk
http://www.youtube.com/DanielFragaBR
http://www.libertarios.org.br

2012-12-16 03:51:54

by Theodore Ts'o

Subject: Re: Kernel 3.7.0: bad header/extent

On Sun, Dec 16, 2012 at 01:00:25AM -0200, Dâniel Fraga wrote:
>
> Hi Ted! Yes, I get the same EXT4-fs message when I try to
> access the file. Here's the requested output:

Um, really? **Exactly** the same error message? That doesn't make
any sense. The error message you quoted happens when the kernel
complains that the block numbers in the inode in question are invalid
(i.e., are too big for the inode in question, or point at file system
metadata).

However, debugfs is not showing any extents --- which would be the
case after e2fsck repaired the file system (it would have zapped the
extent tree for the inode).

So (a) you did run e2fsck on an unmounted file system right?

(b) Can you send me the output of:

debugfs -R "extents <9311628>" /dev/sda2

just to be sure we aren't missing anything.

Also, if you are using a really new kernel such as 3.6.x or 3.7.x, you
***really*** shouldn't be using an ancient version of e2fsprogs such
as 1.41.12. You really should be using e2fsprogs 1.42.x, preferably
the latest e2fsprogs 1.42.6. I wonder if you are seeing a similar
message indicating that the file system had previously found an error,
and which wasn't cleared because you are using an ancient version of
e2fsprogs....

- Ted

2012-12-16 05:39:41

by Dâniel Fraga

Subject: Re: Kernel 3.7.0: bad header/extent

On Sat, 15 Dec 2012 22:51:50 -0500
Theodore Ts'o <[email protected]> wrote:

> Um, really? **Exactly** the same error message? That doesn't make
> any sense. The error message you quoted happens when the kernel
> complains that the block numbers in the inode in question are invalid
> (i.e., are too big for the inode in question, or point at file system
> metadata).

Yes. The exact same message before and after:

EXT4-fs error (device sda2): ext4_ext_check_inode:462: inode #9311628:
comm less: bad header/extent: invalid extent entries - magic f30a,
entries 1, max 4(4), depth 0(0)

> However, debugfs is not showing any extents --- which would be the
> case after e2fsck repaired the file system (it would have zapped the
> extent tree for the inode).
>
> So (a) you did run e2fsck on an unmounted file system right?

Yes, unmounted.

> (b) Can you send me the output of:
>
> debugfs -R "extents <9311628>" /dev/sda2
>
> just to be sure we aren't missing anything.

Here it is:

debugfs 1.41.12 (17-May-2010)
Level Entries Logical Physical Length Flags
0/ 0 1/ 1 0 - 4294967295 37333026 - 4332300321 0

> Also, if you are using a really new kernel such as 3.6.x or 3.7.x, you
> ***really*** shouldn't be using an ancient version of e2fsprogs such
> as 1.41.12. You really should be using e2fsprogs 1.42.x, preferably
> the latest e2fsprogs 1.42.6. I wonder if you are seeing a similar
> message indicating that the file system had previously found an error,
> and which wasn't cleared because you are using an ancient version of
> e2fsprogs....

Ok. The problem is that I'm trapped. I need to compile the most
recent version (1.42.6) but the needed file to
compile (/usr/include/dlfcn.h) isn't available (Input/output error)
because of this problem.

But no problem, because I used e2fsck from the "Recovery is
possible 13.7" CD, which ships e2fsck 1.42 (so you can be sure I
used a 1.42 version of e2fsck).

Any more suggestions? Thanks!

--
Linux 3.7.0: Terrified Chipmunk
http://www.youtube.com/DanielFragaBR
http://www.libertarios.org.br

2012-12-16 06:08:41

by Andreas Dilger

Subject: Re: Kernel 3.7.0: bad header/extent

On 2012-12-15, at 22:39, Dâniel Fraga <[email protected]> wrote:

> On Sat, 15 Dec 2012 22:51:50 -0500
> Theodore Ts'o <[email protected]> wrote:
>
>> Um, really? **Exactly** the same error message? That doesn't make
>> any sense. The error message you quoted happens when the kernel
>> complains that the block numbers in the inode in question are invalid
>> (i.e., are too big for the inode in question, or point at file system
>> metadata).
>
> Yes. The exact same message before and after:
>
> EXT4-fs error (device sda2): ext4_ext_check_inode:462: inode #9311628:
> comm less: bad header/extent: invalid extent entries - magic f30a,
> entries 1, max 4(4), depth 0(0)
>
>> However, debugfs is not showing any extents --- which would be the
>> case after e2fsck repaired the file system (it would have zapped the
>> extent tree for the inode).
>>
>> So (a) you did run e2fsck on an unmounted file system right?
>
> Yes, unmounted.
>
>> (b) Can you send me the output of:
>>
>> debugfs -R "extents <9311628>" /dev/sda2
>>
>> just to be sure we aren't missing anything.
>
> Here it is:
>
> debugfs 1.41.12 (17-May-2010)
> Level Entries Logical Physical Length Flags
> 0/ 0 1/ 1 0 - 4294967295 37333026 - 4332300321 0

This is interesting. The one extent reports it is valid for 2^32-1 blocks, but this isn't possible with the current on-disk extent format. It looks like the extent is actually storing "-1" blocks (which is also invalid) but is incorrectly sign extended to 0xffffffff.

So e2fsck is allowing this, because it is theoretically possible, but the kernel can't actually use it.

Cheers, Andreas

>> Also, if you are using a really new kernel such as 3.6.x or 3.7.x, you
>> ***really*** shouldn't be using an ancient version of e2fsprogs such
>> as 1.41.12. You really should be using e2fsprogs 1.42.x, preferably
>> the latest e2fsprogs 1.42.6. I wonder if you are seeing a similar
>> message indicating that the file system had previously found an error,
>> and which wasn't cleared because you are using an ancient version of
>> e2fsprogs....
>
> Ok. The problem is that I'm trapped. I need to compile the most
> recent version (1.42.6) but the needed file to
> compile (/usr/include/dlfcn.h) isn't available (Input/output error)
> because of this problem.
>
> But no problem, because I used e2fsck from "Recovery is
> possible 13.7" cd which uses e2fsck 1.42 version (so you can be sure I
> used e2fsck 1.42 version).
>
> Any more suggestions? Thanks!
>

2012-12-16 14:50:18

by Theodore Ts'o

Subject: Re: Kernel 3.7.0: bad header/extent

On Sat, Dec 15, 2012 at 11:08:34PM -0700, Andreas Dilger wrote:
> > debugfs 1.41.12 (17-May-2010)
> > Level Entries Logical Physical Length Flags
> > 0/ 0 1/ 1 0 - 4294967295 37333026 - 4332300321 0
>
> This is interesting. The one extent reports it is valid for 2^32-1
> blocks, but this isn't possible with the current on-disk extent
> format. It looks like the extent is actually storing "-1" blocks
> (which is also invalid) but is incorrectly sign extended to
> 0xffffffff.

Actually, the number of blocks in the extent was set to 0. The
number reported by e2fsprogs is the inclusive range (i.e.,
lblk, lblk+len-1).
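
To make the arithmetic concrete, here is a small Python sketch of how a
zero-length extent could produce the ranges in the debugfs output quoted
above. It assumes the end of each range is computed as start + (len - 1),
with len - 1 wrapping around as an unsigned 32-bit value. That assumption
reproduces the numbers shown, but it is not a quote of the debugfs source,
and the helper name inclusive_range is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # width assumed for the "len - 1" computation


def inclusive_range(start, length):
    """Return (first, last) for an extent, the way an inclusive range is printed.

    Assumption: (length - 1) is evaluated as an unsigned 32-bit value, so a
    length of 0 wraps around to 0xFFFFFFFF instead of becoming -1.
    """
    last_offset = (length - 1) & MASK32
    return start, start + last_offset


# Values taken from the debugfs "extents" output in this thread,
# with the extent length assumed to be 0.
logical = inclusive_range(0, 0)
physical = inclusive_range(37333026, 0)

print(logical)   # (0, 4294967295)
print(physical)  # (37333026, 4332300321)
```

With a sane length such as 8, the same helper gives (0, 7), so the huge
end values only appear when the stored length is 0.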

A fix for this was added to e2fsprogs in v1.42.2 in March 2012, by
commit 26c09eb8145a1 ("e2fsck: check for zero length extent"). There
was a regression which this commit would sometimes trigger which was
fixed in v1.42.4 (commit 9c40d14841f0, "e2fsck: only check for
zero-length leaf extents"). So e2fsck 1.42.4 or newer is recommended
to repair this sort of file system corruption.
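
As a rough illustration of what a zero-length leaf extent check looks at,
the following Python sketch parses an extent header and its leaf entries
from the 60-byte i_block area of an inode. The field layout follows the
ext4 on-disk extent format as I understand it (12-byte header, 12-byte
extent entries, little-endian). The function names parse_leaf and
has_zero_length_extent are hypothetical, the sample bytes are constructed
by hand to match the values reported in this thread, and none of this is
the actual e2fsck code.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A

# eh_magic, eh_entries, eh_max, eh_depth (16 bits each), eh_generation (32 bits)
HEADER = struct.Struct("<HHHHI")
# ee_block (32 bits), ee_len (16 bits), ee_start_hi (16 bits), ee_start_lo (32 bits)
EXTENT = struct.Struct("<IHHI")


def parse_leaf(i_block):
    """Parse a depth-0 extent node and return (entries, max, depth, extents)."""
    magic, entries, eh_max, depth, _generation = HEADER.unpack_from(i_block, 0)
    if magic != EXT4_EXT_MAGIC:
        raise ValueError("bad extent header magic: %#x" % magic)
    if depth != 0:
        raise ValueError("this sketch only handles leaf nodes (depth 0)")
    extents = []
    for i in range(entries):
        offset = HEADER.size + i * EXTENT.size
        lblk, length, start_hi, start_lo = EXTENT.unpack_from(i_block, offset)
        extents.append({"lblk": lblk, "len": length,
                        "pblk": (start_hi << 32) | start_lo})
    return entries, eh_max, depth, extents


def has_zero_length_extent(extents):
    """A leaf extent that covers no blocks cannot describe any file data."""
    return any(extent["len"] == 0 for extent in extents)


# Hand-built sample: magic f30a, entries 1, max 4, depth 0, and one extent
# at physical block 37333026 with an assumed length of 0.
raw = HEADER.pack(EXT4_EXT_MAGIC, 1, 4, 0, 0) + EXTENT.pack(0, 0, 0, 37333026)
raw = raw.ljust(60, b"\0")  # pad to the 60-byte i_block size

entries, eh_max, depth, extents = parse_leaf(raw)
print(entries, eh_max, depth, extents)
print("zero-length extent present:", has_zero_length_extent(extents))
```

Note that a 12-byte header plus four 12-byte entries fills the 60 bytes
exactly, which is consistent with the "max 4" in the kernel message.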

> > Ok. The problem is that I'm trapped. I need to compile the most
> > recent version (1.42.6) but the needed file to
> > compile (/usr/include/dlfcn.h) isn't available (Input/output error)
> > because of this problem.

What I would suggest that you do is to zap the bad inode and then run
e2fsck to repair the resulting damage:

debugfs -w -R "clri <9311628>" /dev/sda2
e2fsck -fy /dev/sda2

Then reinstall the glibc package (libc6-dev if you are using a
Debian-derived distribution), which supplies dlfcn.h, and then
recompile e2fsprogs so you can install a 1.42.x version of e2fsprogs.

I'm not entirely sure how the inode had gotten corrupted in the first
place, but I would be surprised if it was due to upgrading the kernel
from 3.6 to 3.7. If you do see this corruption again, please let us
know, and hopefully we can try to figure out what the root cause of
the issue might be.

Regards,

- Ted

2012-12-16 23:52:55

by Dâniel Fraga

Subject: Re: Kernel 3.7.0: bad header/extent

On Sun, 16 Dec 2012 09:50:13 -0500
Theodore Ts'o <[email protected]> wrote:

> A fix for this was added to e2fsprogs in v1.42.2 in March 2012, by
> commit 26c09eb8145a1 ("e2fsck: check for zero length extent"). There
> was a regression which this commit would sometimes trigger which was
> fixed in v1.42.4 (commit 9c40d14841f0, "e2fsck: only check for
> zero-length leaf extents"). So e2fsck 1.42.4 or newer is recommended
> to repair this sort of file system corruption.

Ok Ted. An easier way was to use the SystemRescueCd beta, which
provides e2fsck 1.42.6. Now the problem is gone ;)

> I'm not entirely sure how the inode had gotten corrupted in the first
> place, but I would be surprised if it was due to upgrading the kernel
> from 3.6 to 3.7. If you do see this corruption again, please let us
> know, and hopefully we can try to figure out what the root cause of
> the issue might be.

No problem. I always compile my software, so the first time I
noticed it was after upgrading to the 3.7.0 kernel. Anyway, I'm not sure
whether the corruption actually appeared after that upgrade.

But ok. If I have some more of these issues, I will report here.

Thanks for the great support!

--
Linux 3.7.0: Terrified Chipmunk
http://www.youtube.com/DanielFragaBR
http://www.libertarios.org.br