by Andreas Dilger

Subject: Re: [E2FSPROGS, RFC] mke2fs: New bitmap and inode table allocation for FLEX_BG

On Apr 23, 2008 16:05 -0500, Jose R. Santos wrote:
> On Wed, 23 Apr 2008 14:39:55 -0600
> Andreas Dilger wrote:
> > It makes total sense to me that the BG_BLOCK_UNINIT flag would not be set
> > on a group that does not have the default bitmap layouts, so I agree with
> > this change. I might suggest that we add a new flag BG_BLOCK_EMPTY or
> > similar (which is really part of the FLEXBG feature so it doesn't affect
> > the existing uninit_groups code) that indicates that the block bitmap
> > contains NO allocated blocks, so that the kernel can know immediately
> > when reconstructing the bitmap that there are no bitmaps or itable in
> > that group (i.e. the bitmap is all zero).
>
> I originally had a similar idea, but it was vetoed because there was no
> kernel user of the flag. The flag that I used was set if the block
> group had metadata, as opposed to just being empty, since there are still
> block groups out there that can have no metadata but still have bgd or
> backup superblocks. Would BG_BLOCK_EMPTY mean no bitmaps/inode tables,
> or does it imply a completely empty block group?

It could mean either. What is important is that, if it is useful, it should
be done before FLEXBG goes into the field.
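To make the proposal concrete, here is a minimal C sketch of how a bitmap loader might dispatch on a group's flags if such a flag existed. BG_BLOCK_UNINIT uses the bit of the existing uninit_groups flag (EXT4_BG_BLOCK_UNINIT, 0x0002); BG_BLOCK_EMPTY was only proposed in this thread, so its bit value and the enum names below are hypothetical. The sketch assumes the "completely empty" reading: no bitmaps, inode table, superblock or descriptor backups in the group.

```c
#include <stdint.h>

/* Same bit as the existing uninit_groups flag (EXT4_BG_BLOCK_UNINIT). */
#define BG_BLOCK_UNINIT 0x0002
/* HYPOTHETICAL: proposed in this thread, never defined; value invented. */
#define BG_BLOCK_EMPTY  0x0008

enum bitmap_action {
    READ_FROM_DISK,        /* the on-disk bitmap is authoritative */
    ZERO_FILL,             /* group holds nothing at all: all-zero bitmap */
    RECONSTRUCT_FROM_DESC  /* rebuild from backups and the descriptor */
};

/* Decide how to obtain a group's block bitmap from its bg_flags. */
static enum bitmap_action block_bitmap_action(uint16_t bg_flags)
{
    if (!(bg_flags & BG_BLOCK_UNINIT))
        return READ_FROM_DISK;
    /* Only meaningful together with BLOCK_UNINIT: the bitmap was never
     * written, and the hypothetical flag says it would be all zeros. */
    if (bg_flags & BG_BLOCK_EMPTY)
        return ZERO_FILL;
    return RECONSTRUCT_FROM_DESC;
}
```

Under the narrower reading (no bitmaps or itable, but sb/gdt backups allowed), the ZERO_FILL case would still have to mark any backup blocks in the group.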

The kernel can already determine somewhat efficiently whether a group
has sb or gdt backups, though it can't hurt to flag this also. What
seems to be quite difficult is to know in the presence of FLEXBG whether
a group has an itable or bitmap in it.
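For reference, the sb/gdt check is cheap because it follows the sparse_super rule: superblock and descriptor copies live in groups 0 and 1 and in groups whose number is a power of 3, 5 or 7. A minimal C sketch, assuming plain sparse_super and ignoring META_BG (which places descriptor copies differently); the function names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n == base^k for some k >= 0 (so 1 counts for every base). */
static bool is_power_of(uint32_t n, uint32_t base)
{
    assert(base > 1);
    if (n == 0)
        return false;
    while (n % base == 0)
        n /= base;
    return n == 1;
}

/* Does this group carry a superblock copy (the primary, for group 0)?
 * Without sparse_super every group has one; with it, only groups
 * 0, 1 and powers of 3, 5 and 7 do.  META_BG is ignored here. */
static bool group_has_sb_backup(uint32_t group, bool sparse_super)
{
    if (!sparse_super || group <= 1)
        return true;
    return is_power_of(group, 3) || is_power_of(group, 5) ||
           is_power_of(group, 7);
}
```

Because the rule is pure arithmetic on the group number, no descriptor needs to be read to answer it, unlike the question of where a group's bitmaps and itable were placed under FLEXBG.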

What I'd HOPE (and I believe this is what Ted's recent patch did) is that any
group which is being used to store flexbg data will have an initialized
block bitmap in it, because it is "non-standard".

What is more tricky is if a group has BLOCK_UNINIT and/or INODE_UNINIT
set what should happen when that group's block bitmap is initialized.
Should it assume there is a block + inode bitmap and an itable, or is
it enough to check its own group descriptor to determine if the bitmap
and itable are not in the group itself?

Maybe I'm being paranoid, and we don't need the flag(s), but better to
think the issues through now and decide we don't need them, than to
decide later that we do.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2008-04-28 12:01:50

by Theodore Ts'o

Subject: Re: [E2FSPROGS, RFC] mke2fs: New bitmap and inode table allocation for FLEX_BG

On Fri, Apr 25, 2008 at 02:10:26PM -0600, Andreas Dilger wrote:
> What I'd HOPE (and I believe this is what Ted's recent patch did) is that any
> group which is being used to store flexbg data will have an initialized
> block bitmap in it, because it is "non-standard".

Correct. If there are *any* blocks allocated other than the block group's
own metadata, BLOCK_UNINIT will never be set. And that's precisely to
avoid the tricky case described in your next paragraph:

> What is more tricky is if a group has BLOCK_UNINIT and/or INODE_UNINIT
> set what should happen when that group's block bitmap is initialized.
> Should it assume there is a block + inode bitmap and an itable, or is
> it enough to check its own group descriptor to determine if the bitmap
> and itable are not in the group itself?

In the kernel, it should be enough to check only bg_inode_bitmap,
bg_block_bitmap, and bg_inode_table to construct the block bitmap.
The point was to keep things simple.
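A minimal C sketch of that approach, assuming a simplified, hypothetical descriptor struct (ext4 field names, but the lo/hi halves collapsed into 64-bit block numbers), a blocks_per_group that is a multiple of 8, and a caller that already knows how many superblock/GDT backup blocks sit at the start of the group. It only consults the group's own descriptor, which is safe precisely because a group holding other groups' flex_bg metadata never has BLOCK_UNINIT set. The short final group is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified group descriptor. */
struct group_desc {
    uint64_t bg_block_bitmap;
    uint64_t bg_inode_bitmap;
    uint64_t bg_inode_table;
};

struct fs_geometry {
    uint64_t first_data_block;   /* 1 for 1 KiB blocks, else 0 */
    uint32_t blocks_per_group;   /* assumed to be a multiple of 8 */
    uint32_t inode_table_blocks; /* inode table length per group */
};

/* Set the bits for [blk, blk + len) that fall inside this group. */
static void mark_in_group(uint8_t *bm, uint64_t gstart, uint32_t bpg,
                          uint64_t blk, uint64_t len)
{
    uint64_t lo = blk > gstart ? blk : gstart;
    uint64_t hi = blk + len < gstart + bpg ? blk + len : gstart + bpg;
    for (uint64_t b = lo; b < hi; b++)
        bm[(b - gstart) / 8] |= (uint8_t)(1u << ((b - gstart) % 8));
}

/* Rebuild a BLOCK_UNINIT group's bitmap from its own descriptor only.
 * backup_blocks = superblock + GDT (+ reserved GDT) blocks at the start
 * of the group, or 0 if the group has no backup. */
static void init_block_bitmap(uint8_t *bm, const struct fs_geometry *geo,
                              uint32_t group, const struct group_desc *gd,
                              uint32_t backup_blocks)
{
    uint32_t bpg = geo->blocks_per_group;
    uint64_t gstart = geo->first_data_block + (uint64_t)group * bpg;

    assert(bpg % 8 == 0 && backup_blocks <= bpg);
    memset(bm, 0, bpg / 8);
    mark_in_group(bm, gstart, bpg, gstart, backup_blocks);
    /* With flex_bg these may point into another group; then nothing
     * is marked here, which is exactly the desired result. */
    mark_in_group(bm, gstart, bpg, gd->bg_block_bitmap, 1);
    mark_in_group(bm, gstart, bpg, gd->bg_inode_bitmap, 1);
    mark_in_group(bm, gstart, bpg, gd->bg_inode_table,
                  geo->inode_table_blocks);
}
```

Usage: for a group whose metadata was packed into the first group of its flex group, only the backup blocks (if any) end up set.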

The cost of doing this is that you will end up needing to initialize
the block bitmaps for an extra 1 out of every flex_bg_size block
groups, but that's not a major cost. It also means that BLOCK_UNINIT
and BG_BLOCK_EMPTY as defined by Andreas are the same thing. This was
a Keep It Simple, Stupid design point; I don't think the complexity is
worth it.
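The "1 out of every flex_bg_size" arithmetic can be sketched as follows, assuming each flex group's packed bitmaps and inode tables fit inside its first group (the second helper checks that, ignoring sb/GDT backups). Both function names are hypothetical, and the count ignores any groups that would be initialized anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Groups that hold a flex group's packed metadata and therefore never
 * get BLOCK_UNINIT: one per flex group, rounding the last one up. */
static uint32_t groups_needing_init(uint32_t groups_count,
                                    uint32_t flex_bg_size)
{
    assert(flex_bg_size > 0);
    return (groups_count + flex_bg_size - 1) / flex_bg_size;
}

/* Does one flex group's packed metadata (block bitmap, inode bitmap and
 * inode table for each member group) fit in a single group?  Backups
 * are ignored, so this is optimistic for groups that carry them. */
static bool packed_metadata_fits(uint32_t flex_bg_size,
                                 uint32_t inode_table_blocks,
                                 uint32_t blocks_per_group)
{
    uint64_t need = (uint64_t)flex_bg_size *
                    (2u + (uint64_t)inode_table_blocks);
    return need <= blocks_per_group;
}
```

For example, 8192 groups (1 TiB with 4 KiB blocks and 32768 blocks per group) with flex_bg_size = 16 gives 512 groups, or 6.25%, that must carry an initialized bitmap.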

If someone wants to convince me that the benefits of forcing the
kernel and e2fsck pass5 to paw through all of the block group
descriptors to construct the block bitmap outweigh the costs (and
more importantly, volunteers to write the code :-), I'm willing to be
convinced otherwise....
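For comparison, the alternative being declined looks roughly like this: to build one group's bitmap, walk every group descriptor and mark whichever bitmap and inode table blocks land in the target group. This is a hypothetical C sketch with a simplified descriptor struct; superblock/GDT backups are omitted for brevity. Its cost is O(groups) per bitmap, or O(groups²) if every group's bitmap were rebuilt this way naively.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified group descriptor. */
struct group_desc {
    uint64_t bg_block_bitmap;
    uint64_t bg_inode_bitmap;
    uint64_t bg_inode_table;
};

/* Set the bits for [blk, blk + len) that fall inside this group. */
static void mark_in_group(uint8_t *bm, uint64_t gstart, uint32_t bpg,
                          uint64_t blk, uint64_t len)
{
    uint64_t lo = blk > gstart ? blk : gstart;
    uint64_t hi = blk + len < gstart + bpg ? blk + len : gstart + bpg;
    for (uint64_t b = lo; b < hi; b++)
        bm[(b - gstart) / 8] |= (uint8_t)(1u << ((b - gstart) % 8));
}

/* Rebuild group `target`'s bitmap by scanning every descriptor for
 * metadata that was placed in it. */
static void init_block_bitmap_scan_all(uint8_t *bm,
                                       const struct group_desc *gds,
                                       uint32_t ngroups, uint32_t target,
                                       uint64_t first_data_block,
                                       uint32_t bpg,
                                       uint32_t inode_table_blocks)
{
    assert(bpg % 8 == 0 && target < ngroups);
    uint64_t gstart = first_data_block + (uint64_t)target * bpg;

    memset(bm, 0, bpg / 8);
    for (uint32_t g = 0; g < ngroups; g++) {
        mark_in_group(bm, gstart, bpg, gds[g].bg_block_bitmap, 1);
        mark_in_group(bm, gstart, bpg, gds[g].bg_inode_bitmap, 1);
        mark_in_group(bm, gstart, bpg, gds[g].bg_inode_table,
                      inode_table_blocks);
    }
}
```

The scan is what lets a group hosting other groups' metadata stay uninitialized, and it is also the extra code path that both the kernel and e2fsck would have to carry.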

- Ted