From: Andreas Dilger <adilger@sun.com>
Subject: Re: fsck infinite loop on corrupt ext4 file system
Date: Mon, 17 Aug 2009 20:47:41 -0600
Message-ID: <20090818024741.GF5931@webber.adilger.int>
References: <1250294105.6221.24.camel@bobble.smo.corp.google.com>
 <1250557822.23227.9.camel@bobble.smo.corp.google.com>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu
To: Frank Mayhar <fmayhar@google.com>
In-reply-to: <1250557822.23227.9.camel@bobble.smo.corp.google.com>

On Aug 17, 2009  18:10 -0700, Frank Mayhar wrote:
> I've made a little more progress since Friday.  I had grabbed a dumpe2fs
> dump of the corrupted file system and one of the newly-created file
> system on the same device.  Adjusting for normal variation (numbers of
> free blocks, flags, etc.), there are no differences _except_ in the very
> block groups that fsck complained about having bad checksums.  For those
> (and only those), the locations of the block bitmap and inode table
> differ.  I've attached the diff output.

It looks like the two filesystems were not created with the same
options, or one of them was resized after it was created.

> In particular, block group 276 claims to have its inode table at blocks
> 0-204, which is clearly wrong.  This is the block group for which the
> allocation failed, causing the original loop.
> 
> It's clear that fsck is neither correcting the block groups nor is it
> detecting the bad entries properly (a sanity check might be in order
> here).  It's not even noticing that it's looping; it just keeps failing
> the allocation and retrying.  While it may be that fsck can't recover
> the file system in this case, it should at least notice and abort.
> 
> My thinking is that the location of the inode tables should be invariant
> over the life of the file system.  Certainly there's no place in ext4
> itself that changes those fields (that I can see, anyway).  Why couldn't
> fsck compute the proper values and compare those against what's there?
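
A minimal sketch of the "notice and abort" behaviour asked for above:
cap the number of allocation attempts and return an error instead of
retrying forever. The callback type `try_alloc_fn`, the function
`alloc_bounded` and the two example allocators are hypothetical names
for illustration, not e2fsprogs functions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical allocator callback: returns true on success and stores
 * the allocated block number in *blk. */
typedef bool (*try_alloc_fn)(void *ctx, unsigned long long *blk);

/* Try an allocation at most max_attempts times, then give up rather
 * than spinning. Returns 0 on success, -1 if every attempt failed. */
static int alloc_bounded(try_alloc_fn try_alloc, void *ctx,
                         unsigned max_attempts, unsigned long long *blk)
{
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        if (try_alloc(ctx, blk))
            return 0;
    }
    fprintf(stderr, "allocation failed %u times, aborting\n", max_attempts);
    return -1;
}

/* Example allocator that never succeeds, like the retries on group 276. */
static bool always_fails(void *ctx, unsigned long long *blk)
{
    (void)ctx;
    (void)blk;
    return false;
}

/* Example allocator that fails *ctx times, then hands out block 12345. */
static bool succeeds_eventually(void *ctx, unsigned long long *blk)
{
    unsigned *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    *blk = 12345;
    return true;
}
```

A real fix would also report which group descriptor caused the failure
so the user has something to act on; the sketch only shows the bound.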

With the addition of FLEX_BG there is no longer a hard and fast rule
for the location of each block group's metadata.  In the past it was
always guaranteed to be within the group itself; now it can be anywhere.
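
To make the difference concrete, here is a sketch of the per-group block
range arithmetic and the old "metadata lies inside its own group" test.
It assumes 4 KiB blocks (so first_data_block is 0) and 32768 blocks per
group, which matches the 9043968-9076735 range dumpe2fs reports for
group 276 below. The struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Filesystem geometry; a real tool would read these from the superblock. */
struct fs_geom {
    uint64_t first_data_block; /* 0 with 4 KiB blocks, 1 with 1 KiB blocks */
    uint64_t blocks_per_group;
    uint64_t blocks_count;
};

/* First block belonging to block group `group`. */
static uint64_t group_first_block(const struct fs_geom *fs, uint64_t group)
{
    return fs->first_data_block + group * fs->blocks_per_group;
}

/* Last block of `group`; the final group may be shorter than the rest. */
static uint64_t group_last_block(const struct fs_geom *fs, uint64_t group)
{
    uint64_t last = group_first_block(fs, group) + fs->blocks_per_group - 1;
    return last < fs->blocks_count ? last : fs->blocks_count - 1;
}

/* Group containing block `blk` (caller ensures blk >= first_data_block). */
static uint64_t group_of_block(const struct fs_geom *fs, uint64_t blk)
{
    return (blk - fs->first_data_block) / fs->blocks_per_group;
}

/* Pre-FLEX_BG rule: blocks [start, start + len) must lie inside the
 * group itself. With FLEX_BG this test no longer holds. */
static bool within_own_group(const struct fs_geom *fs, uint64_t group,
                             uint64_t start, uint64_t len)
{
    return len > 0 &&
           start >= group_first_block(fs, group) &&
           start + len - 1 <= group_last_block(fs, group);
}
```

With those numbers the new filesystem's block bitmap at 8912900 falls
in group 272, which is legitimate with FLEX_BG but fails the old test,
just as the bogus 0-204 inode table does. That is why the in-group test
alone can no longer separate good descriptors from bad ones.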

>  Group 276: (Blocks 9043968-9076735)
> -  Block bitmap at 9043968 (+0), Inode bitmap at 9043969 (+1)
> -  Inode table at 0-204
> +  Block bitmap at 8912900, Inode bitmap at 8912916
> +  Inode table at 8913748-8913952

This is definitely bogus and should be detected and fixed by e2fsck.
I suspect it used to be caught (pre-flexbg) by the check that the inode
table lies within its own group, but now there is no sanity check on
the placement at all (including overlap with other groups, superblocks,
etc.).
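
A FLEX_BG-aware check can still reject descriptors like the one above.
The sketch below assumes hypothetical inputs: `primary_meta_end` is the
first block after the primary superblock, group descriptors and reserved
GDT blocks, and `itable_blocks` is the inode table length (205 blocks
in the dump above). It only checks that each metadata extent is inside
the filesystem, stays clear of the primary superblock area, and does not
overlap the group's other metadata. Overlap with other groups' metadata
and backup superblocks, also mentioned above, is not covered; one way to
add it would be to mark every claimed block in a bitmap and flag double
claims.

```c
#include <stdbool.h>
#include <stdint.h>

/* Geometry needed by the check (hypothetical, derived from the sb). */
struct geom {
    uint64_t first_data_block;
    uint64_t blocks_count;
    uint64_t primary_meta_end; /* first block after primary sb + GDT */
    uint64_t itable_blocks;    /* inodes_per_group * inode_size / block_size */
};

struct group_desc {
    uint64_t block_bitmap;
    uint64_t inode_bitmap;
    uint64_t inode_table; /* first block of the inode table */
};

enum desc_problem {
    DESC_OK,
    DESC_OUTSIDE_FS,        /* extent lies (partly) outside the filesystem */
    DESC_HITS_PRIMARY_META, /* overlaps primary superblock/descriptors */
    DESC_SELF_OVERLAP,      /* bitmaps and inode table overlap each other */
};

struct extent { uint64_t start, len; };

/* Extent is non-empty and fits in [first_data_block, blocks_count). */
static bool inside_fs(const struct geom *g, struct extent e)
{
    return e.len > 0 && e.start >= g->first_data_block &&
           e.start < g->blocks_count && e.len <= g->blocks_count - e.start;
}

static bool overlaps(struct extent a, struct extent b)
{
    return a.start < b.start + b.len && b.start < a.start + a.len;
}

static enum desc_problem check_group_desc(const struct geom *g,
                                          const struct group_desc *d)
{
    const struct extent primary = { 0, g->primary_meta_end };
    const struct extent ext[3] = {
        { d->block_bitmap, 1 },
        { d->inode_bitmap, 1 },
        { d->inode_table, g->itable_blocks },
    };

    for (int i = 0; i < 3; i++) {
        if (!inside_fs(g, ext[i]))
            return DESC_OUTSIDE_FS;
        if (overlaps(ext[i], primary))
            return DESC_HITS_PRIMARY_META;
    }
    for (int i = 0; i < 3; i++)
        for (int j = i + 1; j < 3; j++)
            if (overlaps(ext[i], ext[j]))
                return DESC_SELF_OVERLAP;
    return DESC_OK;
}
```

Fed the two group 276 descriptors from the diff, the corrupted one
(inode table at 0-204) is rejected because it overlaps block 0, while
the freshly created one passes even though its metadata lives in
another group.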

It still makes sense to sanity-check the group descriptor data, and
then to check the backup group descriptors if the primaries look
suspicious.
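
The backups are at fixed, computable locations. A sketch, assuming the
sparse_super layout without meta_bg (meta_bg places descriptor blocks
differently): groups 0 and 1 plus every power of 3, 5 and 7 carry a
superblock and descriptor copy, with group 0 holding the primary. The
function names are illustrative. A backup superblock located this way
can be passed to e2fsck with its -b option.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if n == base^k for some k >= 0. */
static bool is_power_of(uint64_t n, uint64_t base)
{
    if (n < base)
        return n == 1;
    while (n % base == 0)
        n /= base;
    return n == 1;
}

/* sparse_super: groups 0 and 1, and powers of 3, 5 and 7, hold a copy. */
static bool group_has_sb_copy(uint64_t group)
{
    return group <= 1 || is_power_of(group, 3) ||
           is_power_of(group, 5) || is_power_of(group, 7);
}

/* Write the block numbers of backup superblocks (groups >= 1) into out,
 * in ascending order, and return how many were written. */
static size_t backup_sb_blocks(uint64_t first_data_block,
                               uint64_t blocks_per_group,
                               uint64_t group_count,
                               uint64_t *out, size_t max_out)
{
    size_t n = 0;
    for (uint64_t g = 1; g < group_count && n < max_out; g++)
        if (group_has_sb_copy(g))
            out[n++] = first_data_block + g * blocks_per_group;
    return n;
}
```

For 4 KiB blocks, 32768 blocks per group and (as an assumption) 277
groups, this gives 32768, 98304, 163840, 229376, 294912 and so on,
ending at 7962624 for group 243. Group 276 itself holds no copy.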

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.