2007-03-24 21:40:59

by Andreas Dilger

Subject: group descriptor contents and LAZY_BG

Ted,
the LAZY_BG feature was originally written to test large filesystems and
avoid the need to format (or even allocate, if sparse) large parts of the
inode table and bitmaps. To make the feature COMPAT, bg_free_blocks_count
and bg_free_inodes_count are initialized to zero, and the kernel skips
these groups entirely without checking the bitmap.

For the GDT_CSUM feature, we are using the LAZY_BG feature to not initialize
the bitmaps and inode table. However, setting the bg_free_*_count to zero
is troublesome because it means we always need to check the GDT_CSUM feature
flag to know whether a group actually has free blocks/inodes, and e2fsck
is also getting confused about the number of free blocks/inodes in the groups
and filesystem (we have to keep double accounting for "real" free blocks
and "uninitialized" free blocks).

What do you think about having the GDT_CSUM feature initialize the group
descriptors as if the group had actually been formatted? I think our
use of the LAZY_BG feature is actually misguided - while they share some
components (e.g. UNINIT flags, mke2fs code, some e2fsck code), since the
GDT_CSUM feature is RO_COMPAT there isn't much reason to even enable
LAZY_BG at format time...

With the current approach we have to replace all uses of bg_free_*_count
with the macros below:

+/* Macro-instructions used to calculate Free inodes and blocks count. */
+#define EXT3_BG_INODES_FREE(sb,gr,gdp) ((EXT3_HAS_RO_COMPAT_FEATURE(sb, \
+                        EXT4_FEATURE_RO_COMPAT_GDT_CSUM) && \
+                        (gdp)->bg_flags & \
+                        cpu_to_le16(EXT3_BG_INODE_UNINIT)) ? \
+                        EXT3_INODES_PER_GROUP(sb) : \
+                        le16_to_cpu((gdp)->bg_itable_unused) + \
+                        le16_to_cpu((gdp)->bg_free_inodes_count))
+#define EXT3_BG_BLOCKS_FREE(sb,gr,gdp) ((EXT3_HAS_RO_COMPAT_FEATURE(sb, \
+                        EXT4_FEATURE_RO_COMPAT_GDT_CSUM) && \
+                        (gdp)->bg_flags & \
+                        cpu_to_le16(EXT3_BG_BLOCK_UNINIT)) ? \
+                        ext3_free_blocks_after_init(sb,gr,gdp) : \
+                        le16_to_cpu((gdp)->bg_free_blocks_count))
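
For readers who want to follow the branch logic outside the kernel, here is
a minimal user-space sketch of what the two macros compute. Everything
prefixed with sketch_ or SKETCH_ is hypothetical and not part of the patch:
the structs hold host-endian values (so no le16_to_cpu is needed), the flag
values are assumptions made for this example, and the result of
ext3_free_blocks_after_init() is passed in as a plain parameter because its
body is not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values, for this sketch only. */
#define SKETCH_BG_INODE_UNINIT 0x0001u
#define SKETCH_BG_BLOCK_UNINIT 0x0002u

/* Hypothetical stand-in for the superblock fields the macros consult. */
struct sketch_sb {
    bool     has_gdt_csum;      /* replaces the RO_COMPAT feature test */
    uint32_t inodes_per_group;
};

/* Hypothetical, host-endian stand-in for a group descriptor. */
struct sketch_gd {
    uint16_t bg_flags;
    uint16_t bg_free_blocks_count;
    uint16_t bg_free_inodes_count;
    uint16_t bg_itable_unused;
};

/* Mirrors EXT3_BG_INODES_FREE: an uninitialized group counts as fully free. */
static uint32_t sketch_bg_inodes_free(const struct sketch_sb *sb,
                                      const struct sketch_gd *gdp)
{
    if (sb->has_gdt_csum && (gdp->bg_flags & SKETCH_BG_INODE_UNINIT))
        return sb->inodes_per_group;
    return (uint32_t)gdp->bg_itable_unused + gdp->bg_free_inodes_count;
}

/* Mirrors EXT3_BG_BLOCKS_FREE; free_after_init stands in for the value
 * that ext3_free_blocks_after_init() would return for this group. */
static uint32_t sketch_bg_blocks_free(const struct sketch_sb *sb,
                                      const struct sketch_gd *gdp,
                                      uint32_t free_after_init)
{
    if (sb->has_gdt_csum && (gdp->bg_flags & SKETCH_BG_BLOCK_UNINIT))
        return free_after_init;
    return gdp->bg_free_blocks_count;
}
```

The sketch makes the cost visible: every reader of a free count has to go
through a helper that tests both the feature flag and the per-group flag.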

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


2007-03-26 07:03:40

by Andreas Dilger

Subject: Re: group descriptor contents and LAZY_BG

On Mar 24, 2007 15:40 -0600, Andreas Dilger wrote:
> For the GDT_CSUM feature, we are using the LAZY_BG feature to not initialize
> the bitmaps and inode table. However, setting the bg_free_*_count to zero
> is troublesome because it means we always need to check the GDT_CSUM feature
> flag to know whether a group actually has free blocks/inodes, and e2fsck
> is also getting confused about the number of free blocks/inodes in the groups
> and filesystem (we have to keep double accounting for "real" free blocks
> and "uninitialized" free blocks).
>
> What do you think about having the GDT_CSUM feature initialize the group
> descriptors as if the group had actually been formatted? I think our
> use of the LAZY_BG feature is actually misguided - while they share some
> components (e.g. UNINIT flags, mke2fs code, some e2fsck code), since the
> GDT_CSUM feature is RO_COMPAT there isn't much reason to even enable
> LAZY_BG at format time...

FYI, I changed the patch to just make bg_itable_unused a hint for e2fsck
(leaving the meaning of the other bg_* fields the same as a normal fs)
and this has simplified the e2fsprogs and kernel code a lot.

I also had mke2fs handle GDT_CSUM differently from LAZY_BG when setting
bg_free_{blocks,inodes}_count. With LAZY_BG, setting a *_UNINIT flag also
sets the corresponding bg_free_*_count value to zero, while with GDT_CSUM
the counts are left as they would be for a normally formatted group.
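
The difference in behaviour can be sketched as below. This is only an
illustration of the rule just described, not the mke2fs code: the enum,
struct and function names are hypothetical, and the "real" counts are
whatever mke2fs would have computed for a fully formatted group.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum sketch_mode { SKETCH_LAZY_BG, SKETCH_GDT_CSUM };

struct sketch_counts {
    uint16_t free_blocks;
    uint16_t free_inodes;
};

/* Returns the counts to store in the descriptor of a group whose
 * *_UNINIT flags are set, given the counts a formatted group would have. */
static struct sketch_counts sketch_uninit_counts(enum sketch_mode mode,
                                                 uint16_t real_free_blocks,
                                                 uint16_t real_free_inodes)
{
    struct sketch_counts c = { real_free_blocks, real_free_inodes };

    if (mode == SKETCH_LAZY_BG) {
        /* LAZY_BG: zero the counts so the group is never allocated from. */
        c.free_blocks = 0;
        c.free_inodes = 0;
    }
    /* GDT_CSUM: keep the counts exactly as for a formatted group. */
    return c;
}
```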

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-03-28 13:14:43

by Theodore Ts'o

Subject: Re: group descriptor contents and LAZY_BG

On Sat, Mar 24, 2007 at 03:40:57PM -0600, Andreas Dilger wrote:
> What do you think about having the GDT_CSUM feature initialize the group
> descriptors as if the group had actually been formatted?

I think that makes a huge amount of sense. Initializing the group
descriptors isn't what takes the huge amount of time; it's
initializing the bitmap blocks and inode table.
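
A rough back-of-the-envelope sketch of that cost difference, per block
group. The names are hypothetical and the geometry in the usage note is
only an assumed example; the 32-byte figure is the size of the classic
ext3 on-disk group descriptor.

```c
#include <stdint.h>

/* Size of one classic ext3 group descriptor on disk. */
#define SKETCH_GD_SIZE 32u

/* Bytes of bitmaps and inode table that can be left unwritten for one
 * uninitialized group: one block bitmap block, one inode bitmap block,
 * and the inode table rounded up to whole blocks. */
static uint64_t sketch_uninit_bytes_per_group(uint32_t block_size,
                                              uint32_t inodes_per_group,
                                              uint32_t inode_size)
{
    uint64_t itable_bytes  = (uint64_t)inodes_per_group * inode_size;
    uint64_t itable_blocks = (itable_bytes + block_size - 1) / block_size;

    return (2 + itable_blocks) * (uint64_t)block_size;
}
```

With an assumed geometry of 4096-byte blocks, 8192 inodes per group and
128-byte inodes, sketch_uninit_bytes_per_group(4096, 8192, 128) comes to
1056768 bytes per group, against SKETCH_GD_SIZE (32) bytes for the
descriptor itself.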

- Ted