2007-09-12 12:14:14

by Stef Epardaud

Subject: Filesystem silent crash with EXT3

Hello,
I am not subscribed to this list, so please CC me on replies.
I have just suffered from a filesystem crash on my laptop. It's an x86
running Linux 2.6.22 from the latest Ubuntu Gutsy.
This morning it booted and there were all kinds of strange core dumps.
fsck told me the root filesystem was clean.
I then did a long test with smartctl which told me my disk was fine.
Then I forced fsck to check the filesystem (I was getting lots of files
not found), which then proceeded to fix plenty of errors: duplicate,
missing or shared inodes, and many other errors I did not understand.
After pressing 'y' about a hundred times I was pretty convinced I
would not get my data back. Anyways after the 'fix' several parts of my
libc were removed and I cannot boot the system anymore (cannot get a
shell).
I had no idea my EXT3 filesystem could go wrong without notifying me
(fsck thought the filesystem was clean before I forced it). Is this
normal ?
If not, what can I do to figure out what went wrong ?
I am in the process of getting a live linux CD to look at the disk,
maybe there are some logs that can help.
Note that it's the first time I've lost an EXT3 (or EXT2) filesystem
without having experienced any system crash or power failure to explain
the filesystem problem. This machine booted and was turned off properly
for several weeks (at least).
Thanks for any help identifying the problem, I really hope this does not
happen again to me or anyone else.
--
Stéphane Epardaud


2007-09-12 13:03:58

by Jan Kara

Subject: Re: Filesystem silent crash with EXT3

Hello,

> I am not subscribed to this list, so please CC me on replies.
> I have just suffered from a filesystem crash on my laptop. It's an x86
> running Linux 2.6.22 from the latest Ubuntu Gutsy.
> This morning it booted and there were all kinds of strange core dumps.
> fsck told me the root filesystem was clean.
> I then did a long test with smartctl which told me my disk was fine.
> Then I forced fsck to check the filesystem (I was getting lots of files
> not found), which then proceeded to fix plenty of errors: duplicate,
> missing or shared inodes, and many other errors I did not understand.
> After pressing 'y' about a hundred times I was pretty convinced I
> would not get my data back. Anyways after the 'fix' several parts of my
> libc were removed and I cannot boot the system anymore (cannot get a
> shell).
My condolences ;) Actually, smartctl is not very reliable... I've
heard about disks reporting everything is fine while they were broken.

> I had no idea my EXT3 filesystem could go wrong without notifying me
> (fsck thought the filesystem was clean before I forced it). Is this
> normal ?
> If not, what can I do to figure out what went wrong ?
> I am in the process of getting a live linux CD to look at the disk,
> maybe there are some logs that can help.
> Note that it's the first time I've lost an EXT3 (or EXT2) filesystem
> without having experienced any system crash or power failure to explain
> the filesystem problem. This machine booted and was turned off properly
> for several weeks (at least).
> Thanks for any help identifying the problem, I really hope this does not
> happen again to me or anyone else.
It is close to impossible to find out what has happened after the fact
- I guess you don't have the original filesystem image backed up, do
you? If something like this happens to you again, try backing up
the filesystem image (via dd) before you run fsck (or if you don't have
enough spare space, you can use e2image to back up at least the metadata).
Maybe if you could dig some messages out of the system log, it would be
possible to find out what has happened. But with the information we
currently have it's too foggy to be able to debug anything... sorry.

Honza
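
The "image first, fsck later" advice above can be sketched in a few
lines. This is only an illustration of the idea behind a dd-style copy,
not a replacement for dd or e2image: the function names and the paths
in the usage comment are hypothetical, and the source filesystem is
assumed to be unmounted (or mounted read-only) while it is copied.

```python
import hashlib
import os

CHUNK = 4 * 1024 * 1024  # copy in 4 MiB pieces, comparable to dd bs=4M


def image_copy(source, target, chunk=CHUNK):
    """Copy source to target byte for byte.

    Returns (bytes_copied, sha256_hex_of_the_data_read).
    """
    digest = hashlib.sha256()
    copied = 0
    with open(source, "rb") as src, open(target, "wb") as dst:
        while True:
            block = src.read(chunk)
            if not block:          # end of device or file
                break
            dst.write(block)
            digest.update(block)
            copied += len(block)
        dst.flush()
        os.fsync(dst.fileno())     # make sure the image really hit the disk
    return copied, digest.hexdigest()


def sha256_of(path, chunk=CHUNK):
    """Re-read a file and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


# Hypothetical usage from a live CD (both paths are placeholders):
#   size, checksum = image_copy("/dev/sdXN", "/mnt/backup/rootfs.img")
#   assert checksum == sha256_of("/mnt/backup/rootfs.img")
```

Running fsck on the copy, or keeping the copy untouched while fsck
repairs the original, preserves the pre-repair state for later
debugging.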

2007-09-12 13:53:14

by Michael B. Trausch

Subject: [OT] Ubuntu GG 2.6.22 (Was: Re: Filesystem silent crash with EXT3)

Stef Epardaud, on 09/12/2007 08:02 AM said:
>
> Hello,
> I am not subscribed to this list, so please CC me on replies.
> I have just suffered from a filesystem crash on my laptop. It's an x86
> running Linux 2.6.22 from the latest Ubuntu Gutsy.
>

(I am cc:ing the list so that this person does not get multiple
replies. Also, I am making the assumption, Stef, that you are not
running a vanilla kernel that you built yourself from the kernel sources
and that you are running the Ubuntu stock kernel.)

Please go to #ubuntu+1 on irc.freenode.net for issues with your Ubuntu
Gutsy Gibbon prerelease system. For starters, the kernel that comes
with Ubuntu is not the vanilla Linux kernel; it has patches, with things
added and removed, like nearly all distribution kernels.

Secondly, the Gutsy Gibbon is not yet released for public consumption.
It is in a prerelease state, and as such, if you're running it, and you
encounter issues, please file bugs with Ubuntu
(http://www.launchpad.net/), where they will be far more receptive to
such information. However, they will not directly support you---you
chose to run a prerelease system; they'll tell you that if it breaks
your system you get to keep all the pieces. Once the Gibbon is
released, though, you can use it with the backing of the regular Ubuntu
community and it will be supported.

That having been said, you may not want to run prerelease software that
is known to be unstable if you don't have the time (or the ability) to
troubleshoot and file a detailed bug report during the prerelease state.
At any rate, this list is not the place to air your problems with the
Gutsy Gibbon.

HTH,
Mike

--
Michael B. Trausch Internet Mail & Jabber: [email protected]
Phone: (404) 592-5746 x1 http://www.trausch.us/
Mobile: (678) 522-7934 VoIP: [email protected], 861384@fwd
Pidgin 2.1.1 and plugins for Ubuntu Feisty! http://www.trausch.us/pidgin



2007-09-12 13:53:52

by Krzysztof Halasa

Subject: Re: Filesystem silent crash with EXT3

Stef Epardaud writes:

> I had no idea my EXT3 filesystem could go wrong without notifying me
> (fsck thought the filesystem was clean before I forced it). Is this
> normal ?

Filesystem clean = the "dirty" flag is not set. With a journaling
fs it basically means the kernel didn't find errors during
operation.
It doesn't mean there are no errors.

> If not, what can I do to figure out what went wrong ?

I'd check with memtest86. Silent fs corruption is usually caused
by faulty hardware such as RAM.

> I am in the process of getting a live linux CD to look at the disk,
> maybe there are some logs that can help.

I wouldn't count on it.

> Note that it's the first time I've lost a EXT3 (or EXT2) filesystem
> without having experienced any system crash or power failure to explain
> the filesystem problem.

A system crash or especially power failure shouldn't damage ext3fs.
--
Krzysztof Halasa
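
To make the point about the "clean" state concrete, here is a minimal
sketch that reads the state field from the primary superblock of an
ext2/ext3 image. The offsets and flag values follow the on-disk ext2
superblock layout; the function names are made up for this example, and
the image path is whatever copy of the filesystem you have at hand.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1024 bytes in
MAGIC_OFFSET = 56         # s_magic, 16-bit little-endian
EXT_MAGIC = 0xEF53        # shared by ext2, ext3 and ext4
VALID_FS = 0x0001         # s_state bit: filesystem marked clean
ERROR_FS = 0x0002         # s_state bit: errors were detected


def read_state(image_path):
    """Return the raw s_state value from an ext2/ext3 image."""
    with open(image_path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(1024)
    if len(sb) < MAGIC_OFFSET + 4:
        raise ValueError("image too short to hold a superblock")
    # s_magic (offset 56) is immediately followed by s_state (offset 58).
    magic, state = struct.unpack_from("<HH", sb, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError("no ext2/ext3 magic number found")
    return state


def describe_state(state):
    """Translate s_state into words; 'clean' only means nothing was flagged."""
    if state & ERROR_FS:
        return "errors recorded by the kernel"
    if state & VALID_FS:
        return "marked clean (no errors recorded, not proof of consistency)"
    return "not marked clean"
```

A filesystem whose metadata was silently corrupted, for instance by bad
RAM, still reports the "marked clean" case here, which is why only a
forced fsck found the damage.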

2007-09-12 14:12:18

by Stef Epardaud

Subject: Re: [OT] Ubuntu GG 2.6.22 (Was: Re: Filesystem silent crash with EXT3)

On Wed, Sep 12, 2007 at 09:52:53AM -0400, Michael B. Trausch wrote:
> (I am cc:ing the list so that this person does not get multiple
> replies. Also, I am making the assumption, Stef, that you are not
> running a vanilla kernel that you built yourself from the kernel sources
> and that you are running the Ubuntu stock kernel.)

You're right.

> Please go to #ubuntu+1 on irc.freenode.net for issues with your Ubuntu
> Gutsy Gibbon prerelease system. For starters, the kernel that comes
> with Ubuntu is not the vanilla Linux kernel; it has patches, with things
> added and removed, like nearly all distribution kernels.

Sorry, I did not realise there was a more appropriate place. It won't
happen again :)

> Secondly, the Gutsy Gibbon is not yet released for public consumption.
> It is in a prerelease state, and as such, if you're running it, and you
> encounter issues, please file bugs with Ubuntu
> (http://www.launchpad.net/), where they will be far more receptive to
> such information. However, they will not directly support you---you
> chose to run a prerelease system; they'll tell you that if it breaks
> your system you get to keep all the pieces. Once the Gibbon is
> released, though, you can use it with the backing of the regular Ubuntu
> community and it will be supported.
> That having been said, you may not want to run prerelease software that
> is known to be unstable if you don't have the time (or the ability) to
> troubleshoot and file a detailed bug report during the prerelease state.

It's not so much that I am complaining, I didn't ask anyone to help me
recover my data either. I just wanted to see if there was something I
could do to find a possible problem, so that it could be fixed.

Thanks for your pointers though.
--
Stéphane Epardaud

2007-09-12 14:16:20

by Stef Epardaud

Subject: [Resolved: bad RAM] Filesystem silent crash with EXT3

On Wed, Sep 12, 2007 at 03:53:40PM +0200, Krzysztof Halasa wrote:
> Filesystem clean = the "dirty" flag is not set. With a journaling
> fs it basically means the kernel didn't find errors during
> operation.
> It doesn't mean there are no errors.

That's what I assumed, and that's why I forced fsck.

> > If not, what can I do to figure out what went wrong ?
> I'd check with memtest86. Silent fs corruption is usually caused
> by faulty hardware such as RAM.

I feel like a fool now. You're absolutely right, my RAM is corrupted.
Sorry for the trouble folks, I'm very happy to come to the conclusion I
can rely on EXT3. Now I have to start worrying about how to detect RAM
corruption before it screws up my data next time.

> A system crash or especially power failure shouldn't damage ext3fs.

That's what I assumed, and I'm glad to see it's still true.

Once again, sorry for the trouble.
--
Stéphane Epardaud

2007-09-13 09:21:53

by Helge Hafting

Subject: Re: [Resolved: bad RAM] Filesystem silent crash with EXT3

Stef Epardaud wrote:
> I feel like a fool now. You're absolutely right, my RAM is corrupted.
> Sorry for the trouble folks, I'm very happy to come to the conclusion I
> can rely on EXT3. Now I have to start worrying about how to detect RAM
> corruption before it screws up my data next time.
That is what memory with parity is for. Don't know if
that exists for laptops though.

Helge Hafting
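
As a rough illustration of what parity memory does, the sketch below
stores one extra even-parity bit per byte and checks it on read. It is
a toy model in software with made-up function names, not how any
particular memory controller is implemented.

```python
def parity_bit(byte):
    """Even-parity bit: 1 if the byte has an odd number of set bits."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Model a parity-protected memory cell as (value, parity)."""
    return (byte & 0xFF, parity_bit(byte))


def check(cell):
    """Return True if the stored parity still matches the stored value."""
    value, parity = cell
    return parity_bit(value) == parity


cell = store(0b10110010)
assert check(cell)                      # intact data passes

flipped_once = (cell[0] ^ 0b00001000, cell[1])
assert not check(flipped_once)          # a single bit flip is detected

flipped_twice = (cell[0] ^ 0b00001001, cell[1])
assert check(flipped_twice)             # two flips cancel out and go unnoticed
```

Parity can only report that an odd number of bits changed; it cannot
say which one. ECC memory stores more check bits per word so that
single-bit errors can also be corrected.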