2007-08-10 12:41:32

by Kosta Kliakhandler

Subject: fsck 1.39 segfaults while fixing a corrupt inode

Hi!

I have a severe problem - I was a fool and made / (root) ext4 (at
least not /home, thank god) and somehow some corruption occurred.
While running fsck, it keeps segfaulting, complaining about fast
memory corruption (segfaults always at the same inode, always when I
tell it to fix it). I *think* the issue was addressed in
e2fsprogs-1.40.2 (from what I understood in the changelog) - but I
can't find a patchset for it...
I tried to apply the 1.39 patch to the 1.40.2 source but there were more
than a few mismatches and it didn't build. With enough work, I might
be able to build it, but I strongly prefer a ready patch/tarball which
I would know works, but I haven't found any on the net.

Can anyone please point me to one or send one to me, or advise on a
different solution?

Attached is the error log.

Thanks in advance,
Kosta.

--
Kosta.tk


Attachments:
(No filename) (850.00 B)
error.out (3.00 kB)

2007-08-10 14:03:47

by Theodore Ts'o

Subject: Re: fsck 1.39 segfaults while fixing a corrupt inode

On Fri, Aug 10, 2007 at 03:41:32PM +0300, Kosta Kliakhandler wrote:
>
> I have a severe problem - I was a fool and made / (root) ext4 (at
> least not /home, thank god) and somehow some corruption occurred.
> While running fsck, it keeps segfaulting, complaining about fast
> memory corruption (segfaults always at the same inode, always when I
> tell it to fix it). I *think* the issue was addressed in
> e2fsprogs-1.40.2 (from what I understood in the changelog)

What, you mean this one?

A recent change to e2fsck_add_dir_info() to use tdb files to check
filesystems with a very large number of directories had a typo which
caused us to resize the wrong data structure. This would cause an
array overrun leading to malloc pointer corruptions and segfaults.
Since we normally can very accurately predict how big the dirinfo
array needs to be, this bug only got triggered on very badly corrupted
filesystems.

If so, it couldn't be, since the tdb support was only added in
e2fsprogs 1.40, and you're using the 1.39 patchset, right? So it has
to be some other problem, and probably a bug which gets triggered when
it runs across a corrupted extent entry.
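
To illustrate the class of bug that changelog entry describes (growing
one buffer while indexing another), here is a minimal C sketch of the
correct growth pattern. The names dir_entry, dir_table, dir_table_add
and dir_table_free are hypothetical stand-ins chosen for this example;
they are not the actual e2fsck structures or code, and the doubling
policy is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real e2fsck structures. */
struct dir_entry {
    unsigned int ino;
    unsigned int parent;
};

struct dir_table {
    struct dir_entry *entries; /* the array that receives new items */
    size_t count;              /* slots in use */
    size_t capacity;           /* slots allocated */
};

/* Append one entry, growing the table when it is full.
 * Returns 0 on success, -1 on allocation failure. */
int dir_table_add(struct dir_table *t, unsigned int ino, unsigned int parent)
{
    if (t->count >= t->capacity) {
        /* Assumed policy: start at 8 slots, then double. */
        size_t new_cap = t->capacity ? t->capacity * 2 : 8;
        /* The buggy pattern would realloc() some other buffer here,
         * record the larger capacity anyway, and then write past the
         * end of t->entries below. Resizing the same array that gets
         * indexed avoids that. */
        struct dir_entry *grown = realloc(t->entries, new_cap * sizeof(*grown));
        if (grown == NULL)
            return -1;
        t->entries = grown;
        t->capacity = new_cap;
    }
    /* Bounds check: the slot we are about to write must exist. */
    assert(t->count < t->capacity);
    t->entries[t->count].ino = ino;
    t->entries[t->count].parent = parent;
    t->count++;
    return 0;
}

/* Release the table and reset it to the empty state. */
void dir_table_free(struct dir_table *t)
{
    free(t->entries);
    t->entries = NULL;
    t->count = 0;
    t->capacity = 0;
}
```

A caller would start from a zero-initialized struct dir_table, call
dir_table_add() once per directory found, and call dir_table_free()
when done. When the initial size estimate is accurate the growth branch
is rarely taken, which fits the observation that such a bug only shows
up on badly corrupted filesystems.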

How big is your root filesystem image? Unfortunately e2image hasn't
been updated to support extents yet (there's a reason I keep telling
people ext4 isn't quite ready for prime time yet...), so we can't use
a compressed e2image file.

- Ted

2007-08-10 23:52:17

by Kosta Kliakhandler

Subject: Re: fsck 1.39 segfaults while fixing a corrupt inode

Thanks man!

I'm now finally writing this from firefox and not from links :)

If you care to know, I found the 1.40.1 patches in the testing dir,
applied only the extents patch to 1.40.2, built it, and then
ran e2fsck, which worked fine and found and fixed lots of
corruption...

As for the block that was giving problems last time, it now said that
its position is -1 instead of 0, so I guess this is what caused the
problems.

Best regards,
Kosta.



> On 8/10/07, Andreas Dilger wrote:
> > On Aug 10, 2007 15:34 +0300, Kosta Kliakhandler wrote:
> > > I have a severe problem - I was a fool and made / (root) ext4 (at
> > > > least not /home, thank god) and somehow some corruption occurred.
> > > While running fsck, it keeps segfaulting, complaining about fast
> > > memory corruption (segfaults always at the same inode, always when I
> > > tell it to fix it). I *think* the issue was addressed in
> > > e2fsprogs-1.40.2 (from what I understood in the changelog) - but I
> > > can't find a patchset for it...
> > > I tried to do the 1.39 patch on the 1.40.2 source but there were more
> > > > than a few mismatches and it didn't build. With enough work, I might
> > > be able to build it, but I strongly prefer a ready patch/tarball which
> > > I know will work and I haven't found any on the net.
> > >
> > > Can anyone please point me to one or send one to me, or advise on a
> > > different solution?
> >
> > Try ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Principal Software Engineer
> > Cluster File Systems, Inc.

2007-08-11 00:02:55

by Kosta Kliakhandler

Subject: Re: fsck 1.39 segfaults while fixing a corrupt inode

On 8/10/07, Theodore Tso wrote:
> On Fri, Aug 10, 2007 at 03:41:32PM +0300, Kosta Kliakhandler wrote:
> >
> > I have a severe problem - I was a fool and made / (root) ext4 (at
> > > least not /home, thank god) and somehow some corruption occurred.
> > While running fsck, it keeps segfaulting, complaining about fast
> > memory corruption (segfaults always at the same inode, always when I
> > tell it to fix it). I *think* the issue was addressed in
> > e2fsprogs-1.40.2 (from what I understood in the changelog)
>
> What, you mean this one?
>
> A recent change to e2fsck_add_dir_info() to use tdb files to check
> filesystems with a very large number of directories had a typo which
> caused us to resize the wrong data structure. This would cause an
> array overrun leading to malloc pointer corruptions and segfaults.
> Since we normally can very accurately predict how big the dirinfo
> array needs to be, this bug only got triggered on very badly corrupted
> filesystems.
>
> If so, it couldn't be, since the tdb support was only added in
> e2fsprogs 1.40, and you're using the 1.39 patchset, right? So it has
> to be some other problem, and probably a bug which gets triggered when
> it runs across a corrupted extent entry.
>
> How big is your root filesystem image? Unfortunately e2image hasn't
> been updated to support extents yet (there's a reason I keep telling
> people ext4 isn't quite ready for prime time yet...), so we can't use
> a compressed e2image file.
>
> - Ted
>

Yes, that was the one I had in mind... Well, maybe it wasn't that bug,
but what happened to me certainly seemed similar - especially since
after applying the extents patch which Andreas supplied, it worked
well.

When I ran the new one, it reported that the problematic inode is in
position -1 and not 0, so I guess this is what caused the problems in
1.39.

Anyway, thanks for the help.

Regards,
Kosta.

--
Kosta.tk