2008-11-07 10:21:23

by Frédéric Bohé

Subject: [PATCH] ext4: add checksum calculation when clearing UNINIT flag

From: Frederic Bohe <[email protected]>

The block group's checksum needs to be recalculated during the
initialization of an UNINIT'd group. This fixes a race when several
threads try to allocate a new inode in an UNINIT'd group.

Signed-off-by: Frederic Bohe <[email protected]>
---
ialloc.c | 2 ++
1 file changed, 2 insertions(+)

Index: linux/fs/ext4/ialloc.c
===================================================================
--- linux.orig/fs/ext4/ialloc.c 2008-11-06 17:22:14.000000000 +0100
+++ linux/fs/ext4/ialloc.c 2008-11-07 10:43:41.000000000 +0100
@@ -718,6 +718,8 @@ got:
 			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
 			free = ext4_free_blocks_after_init(sb, group, gdp);
 			gdp->bg_free_blocks_count = cpu_to_le16(free);
+			gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
+								gdp);
 		}
 		spin_unlock(sb_bgl_lock(sbi, group));
--
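
To illustrate why the two added lines matter, here is a minimal user-space sketch. It is not the kernel code: struct toy_group_desc, toy_desc_csum() and toy_clear_block_uninit() are hypothetical stand-ins, and the hash is a toy, not the CRC that ext4_group_desc_csum() computes. It only shows that once the flags and free-block count change, the stored checksum is stale until it is recomputed.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BG_INODE_UNINIT 0x0001u	/* stand-in for EXT4_BG_INODE_UNINIT */
#define TOY_BG_BLOCK_UNINIT 0x0002u	/* stand-in for EXT4_BG_BLOCK_UNINIT */

/* Hypothetical, heavily simplified group descriptor. */
struct toy_group_desc {
	uint16_t flags;
	uint16_t free_blocks_count;
	uint16_t checksum;
};

/* Toy checksum over the protected fields (illustration only, not CRC16). */
static uint16_t toy_desc_csum(uint32_t group, const struct toy_group_desc *gd)
{
	uint32_t h = 0x9e37u ^ group;

	h = h * 31u + gd->flags;
	h = h * 31u + gd->free_blocks_count;
	return (uint16_t)(h ^ (h >> 16));
}

/* Returns nonzero when the stored checksum matches the current fields. */
static int toy_desc_csum_ok(uint32_t group, const struct toy_group_desc *gd)
{
	return gd->checksum == toy_desc_csum(group, gd);
}

/*
 * Mirrors the shape of the patched hunk: clear the flag, set the free
 * count, and (only if update_csum is nonzero) refresh the checksum.
 */
static void toy_clear_block_uninit(uint32_t group, struct toy_group_desc *gd,
				   uint16_t free_after_init, int update_csum)
{
	assert(gd->flags & TOY_BG_BLOCK_UNINIT);
	gd->flags &= (uint16_t)~TOY_BG_BLOCK_UNINIT;
	gd->free_blocks_count = free_after_init;
	if (update_csum)
		gd->checksum = toy_desc_csum(group, gd);
}
```

Calling toy_clear_block_uninit() with update_csum set to 0 leaves a descriptor that fails toy_desc_csum_ok(); with update_csum set to 1 the descriptor verifies again, which is the behaviour the patch restores.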





2008-11-07 13:52:24

by Theodore Ts'o

Subject: Re: [PATCH] ext4: add checksum calculation when clearing UNINIT flag

On Fri, Nov 07, 2008 at 11:22:56AM +0100, Frédéric Bohé wrote:
> From: Frederic Bohe <[email protected]>
>
> The block group's checksum needs to be recalculated during the
> initialization of an UNINIT'd group. This fixes a race when several
> threads try to allocate a new inode in an UNINIT'd group.

This patch looks sane, and so I'll accept it, but there's a higher-order
issue hiding here: why are we initializing the block bitmap in
ext4_new_inode()? Sure, *most* of the time when we create a new
inode, we'll need to allocate a new block, but sometimes we
won't (e.g., when creating a symlink, device file, socket, or a
zero-length regular file). More seriously, we don't account for the
potential need for an extra journal credit in all of the callers of
ext4_new_inode(). Obviously this doesn't get us in trouble, because we
generally massively overestimate the number of journal credits we need,
but from the point of view of code simplification, maybe the code
block that initializes the block bitmap in ext4_new_inode() should be
dropped entirely.

We have to do the exact same check in mballoc.c when we actually
allocate blocks, and in that case we know we'll be modifying the
block bitmap. So there's no need to initialize the block bitmap first
in ext4_new_inode(), only to have to redirty that same
block bitmap in mballoc.c when we are really allocating data for the
inode.

Does that make sense for a future cleanup?

- Ted

2008-11-07 14:27:30

by Aneesh Kumar K.V

Subject: Re: [PATCH] ext4: add checksum calculation when clearing UNINIT flag

On Fri, Nov 07, 2008 at 08:52:22AM -0500, Theodore Tso wrote:
> On Fri, Nov 07, 2008 at 11:22:56AM +0100, Frédéric Bohé wrote:
> > From: Frederic Bohe <[email protected]>
> >
> > The block group's checksum needs to be recalculated during the
> > initialization of an UNINIT'd group. This fixes a race when several
> > threads try to allocate a new inode in an UNINIT'd group.
>
> This patch looks sane, and so I'll accept it, but there's a higher-order
> issue hiding here: why are we initializing the block bitmap in
> ext4_new_inode()? Sure, *most* of the time when we create a new
> inode, we'll need to allocate a new block, but sometimes we
> won't (e.g., when creating a symlink, device file, socket, or a
> zero-length regular file).

Because when we clear the uninit_bg flag, the kernel expects the block
bitmap to correctly mark the blocks containing the block
bitmap and inode bitmap as used. If mke2fs didn't do that, we would
need to do the same when we remove the uninit_bg flag.


-aneesh
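
The expectation described above can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel's bitmap initialization: struct toy_group_layout, TOY_BLOCKS_PER_GROUP and toy_build_block_bitmap() are hypothetical, the group is tiny, and the metadata blocks are assumed to be distinct and to lie inside the group. The inode table blocks are included as well, although the message above only names the two bitmaps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCKS_PER_GROUP 64u	/* hypothetical, tiny for illustration */

/* Hypothetical group-relative positions of the metadata blocks. */
struct toy_group_layout {
	uint32_t block_bitmap;		/* block holding the block bitmap */
	uint32_t inode_bitmap;		/* block holding the inode bitmap */
	uint32_t inode_table;		/* first inode table block */
	uint32_t inode_table_len;	/* number of inode table blocks */
};

static void toy_set_bit(uint8_t *bitmap, uint32_t bit)
{
	bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

static int toy_test_bit(const uint8_t *bitmap, uint32_t bit)
{
	return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Build the block bitmap for a group that was never initialized on disk:
 * everything is free except the metadata blocks. Returns the number of
 * free blocks. Assumes the metadata blocks do not overlap each other.
 */
static uint32_t toy_build_block_bitmap(uint8_t *bitmap,
				       const struct toy_group_layout *l)
{
	uint32_t i;

	assert(l->block_bitmap < TOY_BLOCKS_PER_GROUP);
	assert(l->inode_bitmap < TOY_BLOCKS_PER_GROUP);
	assert(l->inode_table + l->inode_table_len <= TOY_BLOCKS_PER_GROUP);

	memset(bitmap, 0, TOY_BLOCKS_PER_GROUP / 8);
	toy_set_bit(bitmap, l->block_bitmap);
	toy_set_bit(bitmap, l->inode_bitmap);
	for (i = 0; i < l->inode_table_len; i++)
		toy_set_bit(bitmap, l->inode_table + i);

	return TOY_BLOCKS_PER_GROUP - 2 - l->inode_table_len;
}
```

With a layout of block bitmap at 0, inode bitmap at 1 and a four-block inode table starting at 2, blocks 0 through 5 come out marked as used and the remaining 58 are free.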

2008-11-07 14:38:12

by Theodore Ts'o

Subject: Re: [PATCH] ext4: add checksum calculation when clearing UNINIT flag

On Fri, Nov 07, 2008 at 07:57:18PM +0530, Aneesh Kumar K.V wrote:
> On Fri, Nov 07, 2008 at 08:52:22AM -0500, Theodore Tso wrote:
> > On Fri, Nov 07, 2008 at 11:22:56AM +0100, Frédéric Bohé wrote:
> > > From: Frederic Bohe <[email protected]>
> > >
> > > The block group's checksum needs to be recalculated during the
> > > initialization of an UNINIT'd group. This fixes a race when several
> > > threads try to allocate a new inode in an UNINIT'd group.
> >
> > This patch looks sane, and so I'll accept it, but there's a higher-order
> > issue hiding here: why are we initializing the block bitmap in
> > ext4_new_inode()? Sure, *most* of the time when we create a new
> > inode, we'll need to allocate a new block, but sometimes we
> > won't (e.g., when creating a symlink, device file, socket, or a
> > zero-length regular file).
>
> Because when we clear the uninit_bg flag, the kernel expects the block
> bitmap to correctly mark the blocks containing the block
> bitmap and inode bitmap as used. If mke2fs didn't do that, we would
> need to do the same when we remove the uninit_bg flag.

We have separate flags indicating whether the block allocation bitmap
and inode allocation bitmaps are initialized or not:
EXT4_BG_BLOCK_UNINIT and EXT4_BG_INODE_UNINIT, respectively. So what
I am proposing is to not initialize the block bitmap in
ext4_new_inode(), and not to clear the EXT4_BG_BLOCK_UNINIT flag, either.

- Ted
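
A rough model of the proposal above, for illustration only. struct toy_group, toy_new_inode() and toy_alloc_block() are hypothetical; they stand in for the inode and block allocation paths and do nothing but flag bookkeeping. The flag values are the ones ext4 uses as far as I recall, so treat them as an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BG_INODE_UNINIT 0x0001u	/* assumed value of EXT4_BG_INODE_UNINIT */
#define TOY_BG_BLOCK_UNINIT 0x0002u	/* assumed value of EXT4_BG_BLOCK_UNINIT */

/* Hypothetical per-group state: flags plus a counter for the sketch. */
struct toy_group {
	uint16_t flags;
	unsigned int block_bitmap_inits;	/* times the bitmap was set up */
};

/* Proposed inode path: only the inode-side flag is touched. */
static void toy_new_inode(struct toy_group *g)
{
	assert(g != NULL);
	g->flags &= (uint16_t)~TOY_BG_INODE_UNINIT;
}

/* Block path: the block bitmap is initialized on first real allocation. */
static void toy_alloc_block(struct toy_group *g)
{
	assert(g != NULL);
	if (g->flags & TOY_BG_BLOCK_UNINIT) {
		g->block_bitmap_inits++;	/* bitmap would be built here */
		g->flags &= (uint16_t)~TOY_BG_BLOCK_UNINIT;
	}
}
```

In this model, creating an inode that never needs a data block leaves TOY_BG_BLOCK_UNINIT set, and the block bitmap is set up exactly once, when the first block is actually allocated.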

2008-11-11 01:24:04

by Andreas Dilger

Subject: Re: [PATCH] ext4: add checksum calculation when clearing UNINIT flag

On Nov 07, 2008 09:38 -0500, Theodore Ts'o wrote:
> On Fri, Nov 07, 2008 at 07:57:18PM +0530, Aneesh Kumar K.V wrote:
> > Because when we clear the uninit_bg flag, the kernel expects the block
> > bitmap to correctly mark the blocks containing the block
> > bitmap and inode bitmap as used. If mke2fs didn't do that, we would
> > need to do the same when we remove the uninit_bg flag.
>
> We have separate flags indicating whether the block allocation bitmap
> and inode allocation bitmaps are initialized or not:
> EXT4_BG_BLOCK_UNINIT and EXT4_BG_INODE_UNINIT, respectively. So what
> I am proposing is to not initialize the block bitmap in
> ext4_new_inode(), and not to clear the EXT4_BG_BLOCK_UNINIT flag, either.

That would be dangerous, because the block group _would_ be in use due
to the fact that one of the inode table blocks is in use. That isn't
to say we couldn't adopt semantics as you suggest (e.g. that INODE_UNINIT
not being set implies that the inode table blocks are in use regardless
of whether or not BLOCK_UNINIT is set), but it needs careful consideration.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
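
The semantics discussed in this last exchange can be written down as a small classification of the two flags. This is only a sketch of the idea being debated, not agreed kernel behaviour: enum toy_group_state, toy_classify() and toy_inode_table_in_use() are hypothetical names, and the flag values are assumed to match ext4's.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BG_INODE_UNINIT 0x0001u	/* assumed value of EXT4_BG_INODE_UNINIT */
#define TOY_BG_BLOCK_UNINIT 0x0002u	/* assumed value of EXT4_BG_BLOCK_UNINIT */

enum toy_group_state {
	TOY_GROUP_UNTOUCHED,		/* both flags set */
	TOY_GROUP_INODES_ONLY,		/* inodes in use, block bitmap still lazy */
	TOY_GROUP_BLOCKS_ONLY,		/* block bitmap set up, no inodes yet */
	TOY_GROUP_INITIALIZED		/* both flags clear */
};

/* Map the two UNINIT flags onto the four possible group states. */
static enum toy_group_state toy_classify(uint16_t flags)
{
	int inode_uninit = (flags & TOY_BG_INODE_UNINIT) != 0;
	int block_uninit = (flags & TOY_BG_BLOCK_UNINIT) != 0;

	if (inode_uninit && block_uninit)
		return TOY_GROUP_UNTOUCHED;
	if (!inode_uninit && block_uninit)
		return TOY_GROUP_INODES_ONLY;
	if (inode_uninit)
		return TOY_GROUP_BLOCKS_ONLY;
	return TOY_GROUP_INITIALIZED;
}

/*
 * Suggested rule: INODE_UNINIT being clear means the inode table blocks
 * hold live data, regardless of whether BLOCK_UNINIT is set.
 */
static int toy_inode_table_in_use(uint16_t flags)
{
	assert((flags & ~(TOY_BG_INODE_UNINIT | TOY_BG_BLOCK_UNINIT)) == 0);
	return (flags & TOY_BG_INODE_UNINIT) == 0;
}
```

TOY_GROUP_INODES_ONLY is the state the proposal would make legal; any code that decides whether a group is unused would have to consult toy_inode_table_in_use() (or its real equivalent) rather than BLOCK_UNINIT alone.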