the fsfuzzer has been keeping me busy lately ;-)
http://kernelfun.blogspot.com/2006/11/mokb-09-11-2006-linux-26x-ext2checkpage.html
has an image with a corrupt directory inode - despite having only 4 blocks, it
has an extremely large i_size.
readdir & lookup seem to behave differently when ext2_check_page fails for the
bogus high-index pages.
an "ls" immediately fails with "EIO" because:
ext2_readdir
ext2_get_page
ext2_check_page
and if ext2_check_page fails,
if (IS_ERR(page)) {
ext2_error(sb, __FUNCTION__,
"bad page in #%lu",
inode->i_ino);
filp->f_pos += PAGE_CACHE_SIZE - offset;
return -EIO;
}
however, if you try to "cat *" it spews errors over and over because it gets
into lookup:
ext2_lookup
ext2_inode_by_name
ext2_find_entry
loop over all pages within i_size calling ext2_get_page
and ext2_find_entry does not break out of the loop when a bad page is found, it
keeps trying the -next- page, causing a storm of printks as it churns through
all these bogus pages/offsets.
It seems odd to me that readdir bails out with an error on the first bad page,
while lookup keeps trying. Shouldn't these be consistent? And if so, which is
the desired behavior?
If we truly wish to keep trying after an error, perhaps adding a "bad page
count" to the inode_info struct, so that we can stop after a predetermined
number of errors, might be an option.
Or, perhaps a check high up that says if i_size doesn't correlate to i_blocks,
this inode is corrupt, and bail out early.
Thoughts?
Thanks,
-Eric
On Nov 14, 2006 09:25 -0600, Eric Sandeen wrote:
> has an image with a corrupt directory inode - despite having only 4 blocks,
> it has an extremely large i_size.
>
> It seems odd to me that readdir bails out with an error on the first bad
> page, while lookup keeps trying. Shouldn't these be consistent? And if
> so, which is the desired behavior?
I'd prefer that readdir _should_ return all of the valid directory blocks
it can find. Otherwise, it makes on average 1/2 of the files in that dir
inaccessible.
> Or, perhaps a check high up that says if i_size doesn't correlate to
> i_blocks, this inode is corrupt, and bail out early.
We did that for ext3, no? It would make sense to fix ext2 in the same way.
I'd suggest bailing out "early" == min(i_size >> blocksize, i_blocks).
The i_blocks count is an upper limit, because it includes the overhead of
indirect blocks. Directories cannot be sparse.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Andreas Dilger wrote:
> On Nov 14, 2006 09:25 -0600, Eric Sandeen wrote:
>> has an image with a corrupt directory inode - despite having only 4 blocks,
>> it has an extremely large i_size.
>>
>> It seems odd to me that readdir bails out with an error on the first bad
>> page, while lookup keeps trying. Shouldn't these be consistent? And if
>> so, which is the desired behavior?
>
> I'd prefer that readdir _should_ return all of the valid directory blocks
> it can find. Otherwise, it makes on average 1/2 of the files in that dir
> inaccessible.
in the very rare case of corruption, yes... although if ext2 is mounted
with anything other than errors=continue the fs is going to turn
somewhat useless shortly thereafter anyway.
>> Or, perhaps a check high up that says if i_size doesn't correlate to
>> i_blocks, this inode is corrupt, and bail out early.
>
> We did that for ext3, no?
Yes, this is similar. In that case we kept trying bad pages until we
had exceeded the block count, IIRC. I was considering the possibility
of checking blocks vs. size right at the top (ext3_readdir or lookup)
and if they don't correspond, don't even bother because the information
we're starting with is known to be bad.
Looking at this one I wonder if the ext3 fix was too specific/targeted -
I'll double check it.
> It would make sense to fix ext2 in the same way.
> I'd suggest bailing out "early" == min(i_size >> blocksize, i_blocks).
> The i_blocks count is an upper limit, because it includes the overhead of
> indirect blocks. Directories cannot be sparse.
so we could either a) keep processing pages based on i_size, until we
have passed i_blocks, or b) if i_size & i_blocks don't match,
immediately bail out because we know we have found a corrupted inode
(vs. a "normal" unreadable block...)
-Eric
On Nov 14, 2006 13:38 -0600, Eric Sandeen wrote:
> Andreas Dilger wrote:
> > It would make sense to fix ext2 in the same way.
> > I'd suggest bailing out "early" == min(i_size >> blocksize, i_blocks).
> > The i_blocks count is an upper limit, because it includes the overhead of
> > indirect blocks. Directories cannot be sparse.
>
> so we could either a) keep processing pages based on i_size, until we
> have passed i_blocks, or b) if i_size & i_blocks don't match,
> immediately bail out because we know we have found a corrupted inode
> (vs. a "normal" unreadable block...)
Do we already ext3_error() in this case? That allows the admin to determine
the behaviour already. If it is errors=continue or errors=remount-ro then
we should continue I think. We might consider the inode fatally corrupted
if (i_blocks << 9 < i_size ||
i_blocks > i_size >> (blockbits - 8) + /* blocks */
i_size >> (blockbits * 2 - 8 - 2) + /* indirect */
i_size >> (blockbits * 3 - 8 - 2) + /* dindirect */
i_size >> (blockbits * 4 - 8 - 2)) /* tindirect */
I think... Trying to account for indirect blocks. It is already given a
100% margin (-8 instead of -9) to cover rounding, EA blocks, some small
bugs in block counting, extents format, etc. FYI, the "-2" is 4 bytes/addr.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.