2011-02-23 21:09:22

by Hari Subramanian

Subject: e2fsprogs/ext4 version compatibility

Hi,

I'm running a SLES11 SP1 GA based kernel (2.6.32.23) with an ext4 filesystem. The e2fsprogs package installed on the box is version 1.41.9 (22-Aug-2009). I was wondering whether there is any version incompatibility between the filesystem implementation and the tools package, especially since there have been 5 revisions of the package since 1.41.9.

The reason I'm asking the question is my machine recently rebooted after a crash but fsck failed with an error code of 4 and the following message:

"Inodes that were part of a corrupted orphan linked list found"
"UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY"
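
For context, the fsck(8) and e2fsck(8) man pages describe the exit status as a sum of bit flags, so a status of 4 corresponds to "filesystem errors left uncorrected". Below is a minimal sketch of decoding such a status. The helper name describe_fsck_status and the enum names are illustrative assumptions and are not part of e2fsprogs; only the numeric values come from the man pages.

```c
#include <stdio.h>
#include <string.h>

/* Exit status bits as documented in the fsck(8)/e2fsck(8) man pages.
 * The enum names are made up for this sketch. */
enum {
    FSCK_ERRORS_CORRECTED   = 1,
    FSCK_REBOOT_NEEDED      = 2,
    FSCK_ERRORS_UNCORRECTED = 4,
    FSCK_OPERATIONAL_ERROR  = 8,
    FSCK_USAGE_ERROR        = 16,
    FSCK_CANCELED           = 32,
    FSCK_LIBRARY_ERROR      = 128
};

/* Hypothetical helper: writes a comma-separated description of every
 * bit set in `status` into `buf` and returns `buf`. */
static const char *describe_fsck_status(int status, char *buf, size_t len)
{
    static const struct { int bit; const char *text; } flags[] = {
        { FSCK_ERRORS_CORRECTED,   "errors corrected" },
        { FSCK_REBOOT_NEEDED,      "system should be rebooted" },
        { FSCK_ERRORS_UNCORRECTED, "errors left uncorrected" },
        { FSCK_OPERATIONAL_ERROR,  "operational error" },
        { FSCK_USAGE_ERROR,        "usage or syntax error" },
        { FSCK_CANCELED,           "canceled by user request" },
        { FSCK_LIBRARY_ERROR,      "shared library error" },
    };
    size_t i;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    if (status == 0) {
        snprintf(buf, len, "no errors");
        return buf;
    }
    for (i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (status & flags[i].bit) {
            size_t used = strlen(buf);
            /* Separate entries with ", " once something is in the buffer. */
            snprintf(buf + used, len - used, "%s%s",
                     used ? ", " : "", flags[i].text);
        }
    }
    return buf;
}
```

Because the values are bit flags, a status such as 5 would mean that some errors were corrected while others were left uncorrected.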

There were no files being created or deleted at the time of the crash, although I/Os were probably in flight. So, the inodes being part of an orphan linked list doesn't make sense to me. I don't claim deep knowledge of filesystem internals either. Before I go suspecting a corrupted filesystem, I wanted to make sure I was running the right version of fsck.

Thanks in advance for your help!
~ Hari

P.S. I wasn't sure if this is the right forum to post this question, but I couldn't find an e2fsprogs mailing list and linux-ext4 seemed like a good alternative. Sorry if this is the wrong forum.


2011-02-24 09:22:37

by Amir Goldstein

Subject: Re: e2fsprogs/ext4 version compatibility

On Wed, Feb 23, 2011 at 11:09 PM, Hari Subramanian <[email protected]> wrote:
> Hi,
>
> I'm running a SLES11 SP1 GA based kernel (2.6.32.23) with an ext4 filesystem. The e2fsprogs package installed on the box is version 1.41.9 (22-Aug-2009). I was wondering whether there is any version incompatibility between the filesystem implementation and the tools package, especially since there have been 5 revisions of the package since 1.41.9.
>

I am using the same kernel/e2fsprogs versions and they don't seem to
have any compatibility issues
(none of the kernel/e2fsprogs versions should have compatibility issues, AFAIK).

> The reason I'm asking the question is my machine recently rebooted after a crash but fsck failed with an error code of 4 and the following message:
>
> "Inodes that were part of a corrupted orphan linked list found"
> "UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY"

This doesn't look like a kernel/e2fsprogs compatibility issue.
Fsck is telling you it found an inode, which looks like it should have
been on the orphan list,
but wasn't found on the list.

It could also be that you deleted an inode back in the 80's,
but I think the code is trying to rule that option out somehow
with the busted_fs_time:

    if (inode->i_dtime && !busted_fs_time &&
        inode->i_dtime < ctx->fs->super->s_inodes_count) {
        if (fix_problem(ctx, PR_1_LOW_DTIME, &pctx)) {
            inode->i_dtime = inode->i_links_count ?
                0 : ctx->now;
            e2fsck_write_inode(ctx, ino, inode,
                               "pass1");
        }
    }

In any case, saying yes to fix that problem seems to be mostly harmless.
Later passes of fsck would figure out what to do with that inode, which
is why fsck requires you to run it manually, so you can answer important
questions like: delete inode XXX?
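
For readers wondering why a deletion time is compared against an inode count: on ext3/ext4 the on-disk orphan list is chained through the i_dtime field, so an inode that is still on the list holds the number of the next orphan inode there instead of a timestamp. A real deletion time is in seconds since 1970 and is normally far larger than the number of inodes, which is what the quoted check relies on. The following is a minimal sketch of that heuristic only. The struct and function names are simplified stand-ins invented for illustration, not the e2fsck types, and the busted_fs_time condition is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the inode fields the check looks at. */
struct sketch_inode {
    uint32_t i_dtime;       /* deletion time, or next orphan inode number */
    uint16_t i_links_count; /* 0 once the file has been unlinked */
};

/* True when i_dtime is non-zero but small enough to be an inode number
 * rather than a plausible "seconds since 1970" timestamp. */
static bool dtime_looks_like_orphan_link(const struct sketch_inode *inode,
                                         uint32_t inodes_count)
{
    return inode->i_dtime != 0 && inode->i_dtime < inodes_count;
}

/* Mirrors the repair in the quoted snippet: an inode that still has
 * links is treated as live (dtime cleared); otherwise it is stamped as
 * deleted at the current time. */
static void fix_low_dtime(struct sketch_inode *inode, uint32_t now)
{
    inode->i_dtime = inode->i_links_count ? 0 : now;
}
```

As a rough illustration, on a filesystem with one million inodes any i_dtime below 1,000,000 would correspond to a deletion within the first twelve days of 1970, so a clock reset to the epoch could also trip this check.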

>
> There were no files being created or deleted at the time of the crash, although I/Os were probably in flight. So, the inodes being part of an orphan linked list doesn't make sense to me. I don't claim deep knowledge of filesystem internals either. Before I go suspecting a corrupted filesystem, I wanted to make sure I was running the right version of fsck.

An inode is added to the orphan list before every write that is about to
change the file size, so it does make sense to
have inodes on the orphan list after a crash, but it doesn't explain why
the orphan list is corrupted.
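
To illustrate the difference between a healthy and a corrupted orphan list, here is a minimal in-memory sketch. It assumes the layout described in the ext4 documentation: the superblock field s_last_orphan is the head, each inode's i_dtime holds the next inode number, and 0 terminates the list. The array-based model and the function walk_orphan_list are hypothetical and far simpler than the kernel or e2fsck code.

```c
#include <stdint.h>

/*
 * Hypothetical model: next_orphan[ino] plays the role of i_dtime for
 * inode number ino, so the array needs inodes_count + 1 entries
 * (index 0 is unused because inode numbers start at 1).
 *
 * Walks the list starting at head (the role of s_last_orphan).
 * Returns the number of orphans found, or -1 if a link points outside
 * 1..inodes_count or the chain has more entries than there are inodes
 * (which can only happen if it loops).
 */
static int walk_orphan_list(uint32_t head, const uint32_t *next_orphan,
                            uint32_t inodes_count)
{
    uint32_t ino = head;
    uint32_t seen = 0;

    while (ino != 0) {
        if (ino > inodes_count)
            return -1;      /* link is not a valid inode number */
        if (++seen > inodes_count)
            return -1;      /* more links than inodes: a cycle */
        ino = next_orphan[ino];
    }
    return (int)seen;
}
```

In this model, an inode whose i_dtime looks like a link but which is never reached from the head would be the kind of leftover that the fsck message refers to.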

Could it be that your system clock was skewed before the crash, e.g. set to 1/1/1970?

>
> Thanks in advance for your help!
> ~ Hari
>
> P.S. I wasn't sure if this is the right forum to post this question, but I couldn't find an e2fsprogs mailing list and linux-ext4 seemed like a good alternative. Sorry if this is the wrong forum.

2011-02-24 09:39:48

by Rogier Wolff

Subject: Re: e2fsprogs/ext4 version compatibility

On Wed, Feb 23, 2011 at 01:09:20PM -0800, Hari Subramanian wrote:

> The reason I'm asking the question is my machine recently rebooted
> after a crash but fsck failed with an error code of 4 and the
> following message:

> "Inodes that were part of a corrupted orphan linked list found"
> "UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY"

These sorts of filesystem errors occasionally occur. :-(

Do you have ECC RAM? A cosmic particle may have flipped a bit in your
RAM. There is not much you can do about it, except buy ECC RAM next
time. Much more likely, but less likely to be believed by users is:
your system simply flipped a bit. Somewhere in your system there is a
path that once in a million times is not fast enough to catch the
proper data, and will latch the wrong data. Result? A flipped bit.

Anyway, these errors accumulate. That's why it is still a good idea
to run e2fsck every once in a while, even on a logging filesystem
like ext3 or ext4 that should be resistant to suddenly turning off
the power (or crashing).

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ

2011-02-25 00:13:48

by Hari Subramanian

Subject: RE: e2fsprogs/ext4 version compatibility

Hi Amir,

Thanks for the quick response. Glad to note that I'm running the right version of fsck to start with.

So, is the variable i_dtime overloaded to store something other than the 'deleted timestamp' after a file is deleted? I ask because the following part of the check doesn't make sense to me:

inode->i_dtime < ctx->fs->super->s_inodes_count

In any case, it looks like it found an inode that has been deleted (inferred from a non-zero i_dtime?).

It's certainly possible that the time on the node was changed following the crash. I'm running this on a VM, and it's even possible that the hypervisor changed the time at reboot (following the crash). But I doubt it was 1/1/1970. Another thing that is not obvious to me is how fsck determined that the i_dtime is too old. Knowing that would help me get to the bottom of this.

I would like to have the setup configured so that I don't have to manually intervene to fix such problems at all and that's the intention behind these questions. As such you have been of great help so far.

Thanks again
~ Hari


2011-02-25 00:17:23

by Hari Subramanian

Subject: RE: e2fsprogs/ext4 version compatibility

Hi Rogier,

My setup certainly has ECC RAM, although I'm running this on a VM, so that adds another software layer that potentially has bugs. In any case, I'm totally OK with running fsck but would like not to have to manually intervene to fix problems. Any suggestions in this specific case to work around the 'required manual intervention'?

Thanks
~ Hari
