2008-04-12 21:12:31

by Andi Kleen

Subject: ext3_valid_block_bitmap: Invalid block bitmap in 2.6.25rc in memory


FYI, a system here running various 2.6.25rc kernels (latest up to rc7-git6)
with long uptimes suddenly decided to fsck one of its file systems
after a reboot because of an error.

The error causing this was:

kernel: EXT3-fs error (device dm-0): ext3_valid_block_bitmap: Invalid block bitmap - block_group = 285, block = 9338882

detected by the 2.6.25rc7-git6 kernel.
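
The group and block numbers in that message can be cross-checked with a little arithmetic. The sketch below assumes a 4 KiB block size (hence 32768 blocks per group) and a first data block of 0; neither value is stated in the report, and locate_block() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Position of a filesystem block expressed relative to its block group. */
struct block_location {
    uint32_t group;   /* block group index */
    uint32_t offset;  /* block offset inside that group */
};

/*
 * Hypothetical helper: split an absolute block number into
 * (group, offset) for a given filesystem geometry.
 */
static struct block_location locate_block(uint32_t block,
                                          uint32_t first_data_block,
                                          uint32_t blocks_per_group)
{
    struct block_location loc;

    assert(blocks_per_group != 0 && block >= first_data_block);
    loc.group  = (block - first_data_block) / blocks_per_group;
    loc.offset = (block - first_data_block) % blocks_per_group;
    return loc;
}
```

With the assumed geometry, locate_block(9338882, 0, 32768) gives group 285 and offset 2, which agrees with the block_group = 285 in the message. A different block size would change both results.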

I don't see any ill effects from it and fsck didn't find anything wrong,
so it must have been something spurious in memory only (or fsck
fails to check for this condition, but that is hard to imagine).

The system never showed anything like this on earlier kernel versions.

-Andi


2008-04-14 14:50:10

by Mingming Cao

Subject: Re: ext3_valid_block_bitmap: Invalid block bitmap in 2.6.25rc in memory

On Sat, 2008-04-12 at 22:57 +0200, Andi Kleen wrote:
> FYI, a system here running various 2.6.25rc kernels (latest upto rc7-git6)
> with longer uptimes suddenly decided to fsck one of its file systems
> due to an error after reboot.
>
> The error causing this was:
>
> kernel: EXT3-fs error (device dm-0): ext3_valid_block_bitmap: Invalid block bitmap - block_group = 285, block = 9338882
>
> detected by the 2.6.25rc7-git6 kernel.
>
> I don't see any ill effects from it and fsck didn't find anything wrong
> so it must have been something spurious in memory only (or fsck
> fails to check for this condition, but that is hard to imagine)
>

ext3_valid_block_bitmap() checks whether the block bitmap and inode
bitmap blocks are marked as "used" in the block group's bitmap, to prevent
allocating blocks from these system metadata blocks. The error message
seems to indicate that some of the block group metadata is corrupted, but
I don't know why fsck doesn't catch this. Andreas?
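
A minimal user-space sketch of this kind of check follows. It is not the kernel's code: struct group_meta, bit_is_set() and block_bitmap_is_valid() are hypothetical names, and the inode table blocks are checked as well on the assumption that they are protected the same way as the two bitmap blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified group descriptor (absolute block numbers). */
struct group_meta {
    uint32_t block_bitmap;   /* block holding the block bitmap */
    uint32_t inode_bitmap;   /* block holding the inode bitmap */
    uint32_t inode_table;    /* first block of the inode table */
    uint32_t itable_blocks;  /* number of inode table blocks */
};

/* Test bit nr in a little-endian bitmap. */
static bool bit_is_set(const uint8_t *bitmap, uint32_t nr)
{
    return (bitmap[nr / 8] >> (nr % 8)) & 1u;
}

/*
 * Return true if every metadata block of the group is marked in use.
 * On failure, *bad_block is set to the start of the metadata region
 * that failed the check (a choice made for this illustration).
 */
static bool block_bitmap_is_valid(const uint8_t *bitmap,
                                  uint32_t group_first_block,
                                  uint32_t blocks_per_group,
                                  const struct group_meta *gm,
                                  uint32_t *bad_block)
{
    uint32_t off, i;

    assert(bitmap && gm && bad_block);

    /* The block bitmap's own block must be marked in use. */
    off = gm->block_bitmap - group_first_block;
    if (off >= blocks_per_group || !bit_is_set(bitmap, off)) {
        *bad_block = gm->block_bitmap;
        return false;
    }

    /* So must the inode bitmap block. */
    off = gm->inode_bitmap - group_first_block;
    if (off >= blocks_per_group || !bit_is_set(bitmap, off)) {
        *bad_block = gm->inode_bitmap;
        return false;
    }

    /* The inode table must lie inside the group and be fully in use. */
    off = gm->inode_table - group_first_block;
    if (off >= blocks_per_group ||
        gm->itable_blocks > blocks_per_group - off) {
        *bad_block = gm->inode_table;
        return false;
    }
    for (i = 0; i < gm->itable_blocks; i++) {
        if (!bit_is_set(bitmap, off + i)) {
            *bad_block = gm->inode_table;
            return false;
        }
    }
    return true;
}
```

Metadata blocks that fall outside the group are treated as invalid here, which keeps the bitmap lookups in bounds; that is a simplification for the sketch.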

Mingming
> The system never showed anything like this on earlier kernel versions.
>
> -Andi


2008-04-14 23:41:16

by Andreas Dilger

Subject: Re: ext3_valid_block_bitmap: Invalid block bitmap in 2.6.25rc in memory

On Apr 14, 2008 07:50 -0700, Mingming Cao wrote:
> On Sat, 2008-04-12 at 22:57 +0200, Andi Kleen wrote:
> > FYI, a system here running various 2.6.25rc kernels (latest upto rc7-git6)
> > with longer uptimes suddenly decided to fsck one of its file systems
> > due to an error after reboot.
> >
> > The error causing this was:
> >
> > kernel: EXT3-fs error (device dm-0): ext3_valid_block_bitmap: Invalid block bitmap - block_group = 285, block = 9338882
> >
> > detected by the 2.6.25rc7-git6 kernel.
> >
> > I don't see any ill effects from it and fsck didn't find anything wrong
> > so it must have been something spurious in memory only (or fsck
> > fails to check for this condition, but that is hard to imagine)
>
> The ext3_valid_block_bitmap() is to check whether the block or inode
> bitmap block is marked as "used" in the block group bitmap, to prevent
> allocating blocks from these system meta data blocks.

Right.

> The error messages seems indicating that one of the block group meta
> data is corrupted, but I don't why fsck doesn't catch this, Andreas?

It might have been corrupted on read (e.g. bad cable, or bad/wrong
data read from disk the first time).

The message itself isn't very useful though. It should report what it
thinks is wrong with the bitmap (e.g. whether block/inode bitmaps are
unallocated, which/how many itable blocks are unallocated).
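
As a rough illustration of what such a report could collect, here is a user-space sketch. The struct, the function names and the message wording are all hypothetical; offsets are relative to the start of the group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical summary of what is wrong with one group's block bitmap. */
struct bitmap_report {
    bool     block_bitmap_free;  /* block bitmap block not marked in use */
    bool     inode_bitmap_free;  /* inode bitmap block not marked in use */
    uint32_t itable_free;        /* number of unallocated itable blocks */
    uint32_t itable_first_free;  /* first such offset, if itable_free > 0 */
};

/* Test bit nr in a little-endian bitmap. */
static bool bit_is_set(const uint8_t *bitmap, uint32_t nr)
{
    return (bitmap[nr / 8] >> (nr % 8)) & 1u;
}

/* Inspect every metadata bit instead of stopping at the first bad one. */
static struct bitmap_report check_group_bitmap(const uint8_t *bitmap,
                                               uint32_t bb_off,
                                               uint32_t ib_off,
                                               uint32_t it_off,
                                               uint32_t it_len)
{
    struct bitmap_report r = { false, false, 0, 0 };
    uint32_t i;

    assert(bitmap != NULL);
    r.block_bitmap_free = !bit_is_set(bitmap, bb_off);
    r.inode_bitmap_free = !bit_is_set(bitmap, ib_off);
    for (i = 0; i < it_len; i++) {
        if (!bit_is_set(bitmap, it_off + i)) {
            if (r.itable_free == 0)
                r.itable_first_free = it_off + i;
            r.itable_free++;
        }
    }
    return r;
}

/* Format the summary into buf; returns what snprintf() returns. */
static int format_report(char *buf, size_t len, unsigned int group,
                         const struct bitmap_report *r)
{
    return snprintf(buf, len,
                    "Invalid block bitmap - block_group = %u, "
                    "block bitmap %s, inode bitmap %s, "
                    "%u itable block(s) unallocated",
                    group,
                    r->block_bitmap_free ? "unallocated" : "ok",
                    r->inode_bitmap_free ? "unallocated" : "ok",
                    (unsigned int)r->itable_free);
}
```

Collecting all findings before reporting costs a full pass over the inode table bits, but it means the log line carries the details even if the system is rebooted or fscked before anyone can inspect it.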

> Mingming
> > The system never showed anything like this on earlier kernel versions.

This is a new check, to catch allocation bitmap corruption before it
causes the corruption to spread into the rest of the filesystem by
double-allocating blocks, etc. Having a checksum would also be good,
but even then memory corruption can lead to a valid checksum of bad
data in memory so a validity check is still useful for such important
and rarely-read data.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-04-15 08:54:55

by Aneesh Kumar K.V

Subject: Re: ext3_valid_block_bitmap: Invalid block bitmap in 2.6.25rc in memory

On Mon, Apr 14, 2008 at 05:40:59PM -0600, Andreas Dilger wrote:
> On Apr 14, 2008 07:50 -0700, Mingming Cao wrote:
> > On Sat, 2008-04-12 at 22:57 +0200, Andi Kleen wrote:
> > > FYI, a system here running various 2.6.25rc kernels (latest upto rc7-git6)
> > > with longer uptimes suddenly decided to fsck one of its file systems
> > > due to an error after reboot.
> > >
> > > The error causing this was:
> > >
> > > kernel: EXT3-fs error (device dm-0): ext3_valid_block_bitmap: Invalid block bitmap - block_group = 285, block = 9338882
> > >
> > > detected by the 2.6.25rc7-git6 kernel.
> > >
> > > I don't see any ill effects from it and fsck didn't find anything wrong
> > > so it must have been something spurious in memory only (or fsck
> > > fails to check for this condition, but that is hard to imagine)
> >
> > The ext3_valid_block_bitmap() is to check whether the block or inode
> > bitmap block is marked as "used" in the block group bitmap, to prevent
> > allocating blocks from these system meta data blocks.
>
> Right.
>
> > The error messages seems indicating that one of the block group meta
> > data is corrupted, but I don't why fsck doesn't catch this, Andreas?
>
> It might have been corrupted on read (e.g. bad cable, or bad/wrong
> data read from disk the first time).
>
> The message itself isn't very useful though. It should report what it
> thinks is wrong with the bitmap (e.g. whether block/inode bitmaps are
> unallocated, which/how many itable blocks are unallocated).
>

debugfs should help to find these details, right?


-aneesh

2008-04-15 10:04:25

by Andreas Dilger

Subject: Re: ext3_valid_block_bitmap: Invalid block bitmap in 2.6.25rc in memory

On Apr 15, 2008 14:17 +0530, Aneesh Kumar K.V wrote:
> On Mon, Apr 14, 2008 at 05:40:59PM -0600, Andreas Dilger wrote:
> > On Apr 14, 2008 07:50 -0700, Mingming Cao wrote:
> > > On Sat, 2008-04-12 at 22:57 +0200, Andi Kleen wrote:
> > > > FYI, a system here running various 2.6.25rc kernels (latest upto rc7-git6)
> > > > with longer uptimes suddenly decided to fsck one of its file systems
> > > > due to an error after reboot.
> > > >
> > > > The error causing this was:
> > > >
> > > > kernel: EXT3-fs error (device dm-0): ext3_valid_block_bitmap: Invalid block bitmap - block_group = 285, block = 9338882
> > > >
> > > > detected by the 2.6.25rc7-git6 kernel.
> > > >
> > > > I don't see any ill effects from it and fsck didn't find anything wrong
> > > > so it must have been something spurious in memory only (or fsck
> > > > fails to check for this condition, but that is hard to imagine)
> > >
> > > The ext3_valid_block_bitmap() is to check whether the block or inode
> > > bitmap block is marked as "used" in the block group bitmap, to prevent
> > > allocating blocks from these system meta data blocks.
> >
> > Right.
> >
> > > The error messages seems indicating that one of the block group meta
> > > data is corrupted, but I don't why fsck doesn't catch this, Andreas?
> >
> > It might have been corrupted on read (e.g. bad cable, or bad/wrong
> > data read from disk the first time).
> >
> > The message itself isn't very useful though. It should report what it
> > thinks is wrong with the bitmap (e.g. whether block/inode bitmaps are
> > unallocated, which/how many itable blocks are unallocated).
>
> debugfs should help to find these details right ?

It isn't always possible to run debugfs on a customer system, and the
information would be lost after a reboot or an e2fsck. The e2fsck might
even run automatically after a reboot triggered by errors=panic.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.