2008-06-17 06:06:34

by Theodore Ts'o

[permalink] [raw]
Subject: e2fsprogs: Patches to resize2fs

The following patches fix resize2fs support for uninit_bg filesystems,
as well as a rather embarassing bug which I had introduced while
making mke2fs -j more efficient. The final patch prohibits resizing
filesystems that have flex_bg enabled and do not have the resize_inode
feature enabled, since my testing has found a nasty bug in the flex_bg
&& !resize_inode case, and I fix isn't going to be coming right away.

In any case, this should make e2fsprogs work much better; at this
point resizing should be in good shape except for:

1) flex_bg with on-line resizing

and

2) flex_bg && !resize_inode with off-line resizing.

There may also be problems with flex_bg && uninit_bg; more testing is
needed there to be 100% sure there's nothing hiding in that particular
feature set combination.

- Ted




2008-06-17 06:06:34

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH] resize2fs: Prohibit the combination of flex_bg and !resize_inode

This is a potentially a difficult case for resize2fs to handle, so
prohibit it for now.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
resize/main.c | 21 ++++++++++++++++++++-
1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/resize/main.c b/resize/main.c
index d7097ad..1cb0b20 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -316,9 +316,28 @@ int main (int argc, char ** argv)
"(%s)", device_name);
exit(1);
}
+
+ /*
+ * XXXX The combination of flex_bg and !resize_inode causes
+ * major problems for resize2fs, since when the group descriptors
+ * grow in size this can potentially require multiple inode
+ * tables to be moved aside to make room, and resize2fs chokes
+ * rather badly in this scenario. It's a rare combination,
+ * except when a filesystem is expanded more than a certain
+ * size, so for now, we'll just prohibit that combination.
+ * This is something we should fix eventually, though.
+ */
+ if ((fs->super->s_feature_incompat & EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
+ !(fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_RESIZE_INODE)) {
+ com_err(program_name, 0, _("%s: The combination of flex_bg "
+ "and\n\t!resize_inode features "
+ "is not supported by resize2fs.\n"),
+ device_name);
+ exit(1);
+ }

if (print_min_size) {
- printf("Estimated minimum size of the filesystem: %u\n",
+ printf(_("Estimated minimum size of the filesystem: %u\n"),
calculate_minimum_resize_size(fs));
exit(0);
}
--
1.5.6.rc3.1.g36b7.dirty


2008-06-17 06:06:34

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH] ext2fs_zero_blocks: Avoid clearing more blocks than requested

This could cause certain mke2fs feature combinations to result in the
initial blocks of the inode table getting wiped out when the journal
is created.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
lib/ext2fs/mkjournal.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/ext2fs/mkjournal.c b/lib/ext2fs/mkjournal.c
index ca8e733..e55dcbd 100644
--- a/lib/ext2fs/mkjournal.c
+++ b/lib/ext2fs/mkjournal.c
@@ -170,9 +170,11 @@ errcode_t ext2fs_zero_blocks(ext2_filsys fs, blk_t blk, int num,
/* OK, do the write loop */
j=0;
while (j < num) {
- if (blk % STRIDE_LENGTH)
+ if (blk % STRIDE_LENGTH) {
count = STRIDE_LENGTH - (blk % STRIDE_LENGTH);
- else {
+ if (count > (num - j))
+ count = num - j;
+ } else {
count = num - j;
if (count > STRIDE_LENGTH)
count = STRIDE_LENGTH;
--
1.5.6.rc3.1.g36b7.dirty


2008-06-17 06:06:34

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH] resize2fs: Fix support for the uninit_bg feature

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
resize/main.c | 6 ----
resize/resize2fs.c | 83 ++++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 65 insertions(+), 24 deletions(-)

diff --git a/resize/main.c b/resize/main.c
index e26cd8b..d7097ad 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -307,12 +307,6 @@ int main (int argc, char ** argv)
exit (1);
}

- if (fs->super->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM) {
- com_err(program_name, EXT2_ET_RO_UNSUPP_FEATURE,
- ":- uninit_bg");
- exit(1);
- }
-
/*
* Check for compatibility with the feature sets. We need to
* be more stringent than ext2fs_open().
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 2d53873..027315c 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -188,7 +188,7 @@ errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs, blk_t new_size)
int adj, old_numblocks, numblocks, adjblocks;
unsigned long i, j, old_desc_blocks, max_group;
unsigned int meta_bg, meta_bg_size;
- int has_super;
+ int has_super, csum_flag;
unsigned long long new_inodes; /* u64 to check for overflow */

fs->super->s_blocks_count = new_size;
@@ -359,6 +359,8 @@ retry:
group_block = fs->super->s_first_data_block +
old_fs->group_desc_count * fs->super->s_blocks_per_group;

+ csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+ EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
adj = old_fs->group_desc_count;
max_group = fs->group_desc_count - adj;
if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
@@ -372,18 +374,26 @@ retry:
sizeof(struct ext2_group_desc));
adjblocks = 0;

+ fs->group_desc[i].bg_flags = 0;
+ if (csum_flag)
+ fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT |
+ EXT2_BG_INODE_ZEROED;
if (i == fs->group_desc_count-1) {
numblocks = (fs->super->s_blocks_count -
fs->super->s_first_data_block) %
fs->super->s_blocks_per_group;
if (!numblocks)
numblocks = fs->super->s_blocks_per_group;
- } else
+ } else {
numblocks = fs->super->s_blocks_per_group;
+ if (csum_flag)
+ fs->group_desc[i].bg_flags |=
+ EXT2_BG_BLOCK_UNINIT;
+ }

has_super = ext2fs_bg_has_super(fs, i);
if (has_super) {
- ext2fs_mark_block_bitmap(fs->block_map, group_block);
+ ext2fs_block_alloc_stats(fs, group_block, +1);
adjblocks++;
}
meta_bg_size = EXT2_DESC_PER_BLOCK(fs->super);
@@ -393,8 +403,8 @@ retry:
(meta_bg < fs->super->s_first_meta_bg)) {
if (has_super) {
for (j=0; j < old_desc_blocks; j++)
- ext2fs_mark_block_bitmap(fs->block_map,
- group_block + 1 + j);
+ ext2fs_block_alloc_stats(fs,
+ group_block + 1 + j, +1);
adjblocks += old_desc_blocks;
}
} else {
@@ -403,8 +413,8 @@ retry:
if (((i % meta_bg_size) == 0) ||
((i % meta_bg_size) == 1) ||
((i % meta_bg_size) == (meta_bg_size-1)))
- ext2fs_mark_block_bitmap(fs->block_map,
- group_block + has_super);
+ ext2fs_block_alloc_stats(fs,
+ group_block + has_super, +1);
}

adjblocks += 2 + fs->inode_blocks_per_group;
@@ -597,7 +607,7 @@ static void mark_fs_metablock(ext2_resize_t rfs,
ext2_filsys fs = rfs->new_fs;

ext2fs_mark_block_bitmap(rfs->reserve_blocks, blk);
- ext2fs_mark_block_bitmap(fs->block_map, blk);
+ ext2fs_block_alloc_stats(fs, blk, +1);

/*
* Check to see if we overlap with the inode or block bitmap,
@@ -707,8 +717,7 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
}
for (blk = group_blk+1+new_blocks;
blk < group_blk+1+old_blocks; blk++) {
- ext2fs_unmark_block_bitmap(fs->block_map,
- blk);
+ ext2fs_block_alloc_stats(fs, blk, -1);
rfs->needed_blocks--;
}
group_blk += fs->super->s_blocks_per_group;
@@ -781,7 +790,7 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
*/
if (FS_BLOCK_BM(old_fs, i) !=
(blk = FS_BLOCK_BM(fs, i))) {
- ext2fs_mark_block_bitmap(fs->block_map, blk);
+ ext2fs_block_alloc_stats(fs, blk, +1);
if (ext2fs_test_block_bitmap(old_fs->block_map, blk) &&
!ext2fs_test_block_bitmap(meta_bmap, blk))
ext2fs_mark_block_bitmap(rfs->move_blocks,
@@ -789,7 +798,7 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
}
if (FS_INODE_BM(old_fs, i) !=
(blk = FS_INODE_BM(fs, i))) {
- ext2fs_mark_block_bitmap(fs->block_map, blk);
+ ext2fs_block_alloc_stats(fs, blk, +1);
if (ext2fs_test_block_bitmap(old_fs->block_map, blk) &&
!ext2fs_test_block_bitmap(meta_bmap, blk))
ext2fs_mark_block_bitmap(rfs->move_blocks,
@@ -815,7 +824,7 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
*/
for (blk = fs->group_desc[i].bg_inode_table, j=0;
j < fs->inode_blocks_per_group ; j++, blk++) {
- ext2fs_mark_block_bitmap(fs->block_map, blk);
+ ext2fs_block_alloc_stats(fs, blk, +1);
if (ext2fs_test_block_bitmap(old_fs->block_map, blk) &&
!ext2fs_test_block_bitmap(meta_bmap, blk))
ext2fs_mark_block_bitmap(rfs->move_blocks,
@@ -953,7 +962,7 @@ static errcode_t block_mover(ext2_resize_t rfs)
retval = ENOSPC;
goto errout;
}
- ext2fs_mark_block_bitmap(fs->block_map, new_blk);
+ ext2fs_block_alloc_stats(fs, new_blk, +1);
ext2fs_add_extent_entry(rfs->bmap, blk, new_blk);
to_move++;
}
@@ -1222,7 +1231,8 @@ static errcode_t inode_scan_and_fix(ext2_resize_t rfs)
goto errout;
}
}
- ext2fs_mark_inode_bitmap(rfs->new_fs->inode_map, new_inode);
+ ext2fs_inode_alloc_stats2(rfs->new_fs, new_inode, +1,
+ pb.is_dir);
if (pb.changed) {
/* Get the new version of the inode */
retval = ext2fs_read_inode_full(rfs->old_fs, ino,
@@ -1491,7 +1501,7 @@ static errcode_t move_itables(ext2_resize_t rfs)

for (blk = rfs->old_fs->group_desc[i].bg_inode_table, j=0;
j < fs->inode_blocks_per_group ; j++, blk++)
- ext2fs_unmark_block_bitmap(fs->block_map, blk);
+ ext2fs_block_alloc_stats(fs, blk, -1);

rfs->old_fs->group_desc[i].bg_inode_table = new_blk;
ext2fs_group_desc_csum_set(rfs->old_fs, i);
@@ -1579,13 +1589,35 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs)
unsigned int count = 0;
int total_free = 0;
int group_free = 0;
+ int uninit = 0;
+ blk_t super_blk, old_desc_blk, new_desc_blk;
+ int old_desc_blocks;

/*
* First calculate the block statistics
*/
+ uninit = fs->group_desc[group].bg_flags & EXT2_BG_BLOCK_UNINIT;
+ ext2fs_super_and_bgd_loc(fs, group, &super_blk, &old_desc_blk,
+ &new_desc_blk, 0);
+ if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
+ old_desc_blocks = fs->super->s_first_meta_bg;
+ else
+ old_desc_blocks = fs->desc_blocks +
+ fs->super->s_reserved_gdt_blocks;
for (blk = fs->super->s_first_data_block;
blk < fs->super->s_blocks_count; blk++) {
- if (!ext2fs_fast_test_block_bitmap(fs->block_map, blk)) {
+ if ((uninit &&
+ !((blk == super_blk) ||
+ ((old_desc_blk && old_desc_blocks &&
+ (blk >= old_desc_blk) &&
+ (blk < old_desc_blk + old_desc_blocks))) ||
+ ((new_desc_blk && (blk == new_desc_blk))) ||
+ (blk == fs->group_desc[group].bg_block_bitmap) ||
+ (blk == fs->group_desc[group].bg_inode_bitmap) ||
+ ((blk >= fs->group_desc[group].bg_inode_table &&
+ (blk < fs->group_desc[group].bg_inode_table
+ + fs->inode_blocks_per_group))))) ||
+ (!ext2fs_fast_test_block_bitmap(fs->block_map, blk))) {
group_free++;
total_free++;
}
@@ -1598,6 +1630,17 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs)
group++;
count = 0;
group_free = 0;
+ uninit = (fs->group_desc[group].bg_flags &
+ EXT2_BG_BLOCK_UNINIT);
+ ext2fs_super_and_bgd_loc(fs, group, &super_blk,
+ &old_desc_blk,
+ &new_desc_blk, 0);
+ if (fs->super->s_feature_incompat &
+ EXT2_FEATURE_INCOMPAT_META_BG)
+ old_desc_blocks = fs->super->s_first_meta_bg;
+ else
+ old_desc_blocks = fs->desc_blocks +
+ fs->super->s_reserved_gdt_blocks;
}
}
fs->super->s_free_blocks_count = total_free;
@@ -1611,8 +1654,10 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs)
group = 0;

/* Protect loop from wrap-around if s_inodes_count maxed */
+ uninit = fs->group_desc[group].bg_flags & EXT2_BG_INODE_UNINIT;
for (ino = 1; ino <= fs->super->s_inodes_count && ino > 0; ino++) {
- if (!ext2fs_fast_test_inode_bitmap(fs->inode_map, ino)) {
+ if (uninit ||
+ !ext2fs_fast_test_inode_bitmap(fs->inode_map, ino)) {
group_free++;
total_free++;
}
@@ -1625,6 +1670,8 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs)
group++;
count = 0;
group_free = 0;
+ uninit = (fs->group_desc[group].bg_flags &
+ EXT2_BG_INODE_UNINIT);
}
}
fs->super->s_free_inodes_count = total_free;
--
1.5.6.rc3.1.g36b7.dirty


2008-06-17 15:16:51

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] resize2fs: Fix support for the uninit_bg feature

On Jun 17, 2008 02:06 -0400, Theodore Ts'o wrote:
> @@ -1579,13 +1589,35 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs)
> /*
> * First calculate the block statistics
> */
> + uninit = fs->group_desc[group].bg_flags & EXT2_BG_BLOCK_UNINIT;
> + ext2fs_super_and_bgd_loc(fs, group, &super_blk, &old_desc_blk,
> + &new_desc_blk, 0);
> + if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
> + old_desc_blocks = fs->super->s_first_meta_bg;
> + else
> + old_desc_blocks = fs->desc_blocks +
> + fs->super->s_reserved_gdt_blocks;

I'm a bit confused by this. It doesn't seem that you use old_desc_blocks
in the META_BG case? I'm trying to figure out what "old_desc_blocks"
means in the META_BG case.


Also, the number of places where we have to figure out whick blocks
are in use for a particular group is growing quite large, and with the
interactions between various features (META_BG, SPARSE_SUPER, FLEX_BG,
RESIZE_INODE, etc) getting this right in all of these places is hard.
Places that need this, off the top of my head, include mk2fs, e2fsck,
resize2fs, kernel (online resize, uninit_bg), debugfs.

It would be very handy to have a single function that can figure out
the block layout for any group based on all of the features, and then
return the block numbers of the bitmaps, itable, sb, gdt (if present).
This would be like a more complex version of ext2fs_super_and_bgd_loc(),
but it also returns the number of gdt blocks (normal and reserved),
bitmaps, inode table, and itable blocks.

A group descriptor would almost be the right struct, but it doesn't
(yet?) include the GDT block count, itable block count, or a flag if
there is a superblock. Adding these fields into the on-disk group_desc
struct wouldn't be a bad idea, then all the need to recalculate the
group layout in 10 places would disappear.

A second function would take the above block numbers (possibly in the
form of a modified group descriptor) and return a populated block bitmap.

> for (blk = fs->super->s_first_data_block;
> blk < fs->super->s_blocks_count; blk++) {
> - if (!ext2fs_fast_test_block_bitmap(fs->block_map, blk)) {
> + if ((uninit &&
> + !((blk == super_blk) ||
> + ((old_desc_blk && old_desc_blocks &&
> + (blk >= old_desc_blk) &&
> + (blk < old_desc_blk + old_desc_blocks))) ||
> + ((new_desc_blk && (blk == new_desc_blk))) ||
> + (blk == fs->group_desc[group].bg_block_bitmap) ||
> + (blk == fs->group_desc[group].bg_inode_bitmap) ||
> + ((blk >= fs->group_desc[group].bg_inode_table &&
> + (blk < fs->group_desc[group].bg_inode_table
> + + fs->inode_blocks_per_group))))) ||
> + (!ext2fs_fast_test_block_bitmap(fs->block_map, blk))) {
> group_free++;
> total_free++;
> }

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-06-17 16:30:45

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] resize2fs: Fix support for the uninit_bg feature

On Tue, Jun 17, 2008 at 09:16:47AM -0600, Andreas Dilger wrote:
> On Jun 17, 2008 02:06 -0400, Theodore Ts'o wrote:
> > @@ -1579,13 +1589,35 @@ static errcode_t ext2fs_calculate_summary_stats(ext2_filsys fs)
> > /*
> > * First calculate the block statistics
> > */
> > + uninit = fs->group_desc[group].bg_flags & EXT2_BG_BLOCK_UNINIT;
> > + ext2fs_super_and_bgd_loc(fs, group, &super_blk, &old_desc_blk,
> > + &new_desc_blk, 0);
> > + if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
> > + old_desc_blocks = fs->super->s_first_meta_bg;
> > + else
> > + old_desc_blocks = fs->desc_blocks +
> > + fs->super->s_reserved_gdt_blocks;
>
> I'm a bit confused by this. It doesn't seem that you use old_desc_blocks
> in the META_BG case? I'm trying to figure out what "old_desc_blocks"
> means in the META_BG case.

In the META_BG case, s_first_meta_bg is the number of the "meta
blockgroup" where the META_BG layout starts getting used. The idea
here is that you can have a traditional filesystem layout, and then
instead of using ext2_prepare, you simply grow the block groups until
the traditional block group descriptor space is filled out, and then
after that, you switch over to the meta_bg layout.

To convert conversion between the block group number and a
meta_block_group number, the formula is:

block_grp_num = meta_bg_num * (block_size / descriptor_size)

Because of this, very conveniently fs->super->s_first_meta also is the
size of the block group descriptors for block groups below the meta_bg
boundary.

Hence, old_desc_blocks is the number of blocks used by block group
descriptors for block groups that have the traditional block group
descriptor layout.

> Also, the number of places where we have to figure out whick blocks
> are in use for a particular group is growing quite large, and with the
> interactions between various features (META_BG, SPARSE_SUPER, FLEX_BG,
> RESIZE_INODE, etc) getting this right in all of these places is hard.
> Places that need this, off the top of my head, include mk2fs, e2fsck,
> resize2fs, kernel (online resize, uninit_bg), debugfs.

> It would be very handy to have a single function that can figure out
> the block layout for any group based on all of the features, and then
> return the block numbers of the bitmaps, itable, sb, gdt (if present).
> This would be like a more complex version of ext2fs_super_and_bgd_loc(),
> but it also returns the number of gdt blocks (normal and reserved),
> bitmaps, inode table, and itable blocks.

Yeah, ext2fs_super_and_bgd_loc() is about 95% of this logic. The one
additional bit that isn't included there is the meta_bg magic, i.e.:

if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
old_desc_blocks = fs->super->s_first_meta_bg;
else
old_desc_blocks = fs->desc_blocks +
fs->super->s_reserved_gdt_blocks;

So this would be an example of something I would push into the 64-bit
version of ext2fs_super_and_bgd_loc() in the future.

Everything else depends on exactly how you want to use the
information. The primary pattern is "is this block part of the block
group metadata?", which is this conditional:

((blk == super_blk) ||
((old_desc_blk && old_desc_blocks &&
(blk >= old_desc_blk) &&
(blk < old_desc_blk + old_desc_blocks))) ||
((new_desc_blk && (blk == new_desc_blk))) ||
(blk == fs->group_desc[group].bg_block_bitmap) ||
(blk == fs->group_desc[group].bg_inode_bitmap) ||
((blk >= fs->group_desc[group].bg_inode_table &&
(blk < fs->group_desc[group].bg_inode_table
+ fs->inode_blocks_per_group))))

So this is something I could make available via a function. The other
usage pattern which various functions implement is determining the
number of reserved metadata blocks (from superblock, block group
descriptors, inode tables, and bitmap blocks) in use in that
blockgroup.

> A group descriptor would almost be the right struct, but it doesn't
> (yet?) include the GDT block count, itable block count, or a flag if
> there is a superblock. Adding these fields into the on-disk group_desc
> struct wouldn't be a bad idea, then all the need to recalculate the
> group layout in 10 places would disappear.

It's not worth managing the disk format change, nor the space on disk
for all this information, IMHO.

Probably what we should do is have a new 64-bit version of
ext2fs_super_and_bgd_loc() fill in a data structure which contains all
of the calculated information, instead of in a series of pointers
passed into the function, and then create a new function which given a
block number and the said data structure, returns a boolean true/false
value if the block is part of the metadata.

> A second function would take the above block numbers (possibly in the
> form of a modified group descriptor) and return a populated block bitmap.

Yes, possibly; this will only be used by e2fsck and resize2fs, but
it's probably worth factoring out the code.

- Ted

2008-06-23 20:42:04

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] resize2fs: Fix support for the uninit_bg feature

On Jun 17, 2008 12:30 -0400, Theodore Ts'o wrote:
> On Tue, Jun 17, 2008 at 09:16:47AM -0600, Andreas Dilger wrote:
> > I'm a bit confused by this. It doesn't seem that you use old_desc_blocks
> > in the META_BG case? I'm trying to figure out what "old_desc_blocks"
> > means in the META_BG case.
>
> In the META_BG case, s_first_meta_bg is the number of the "meta
> blockgroup" where the META_BG layout starts getting used. The idea
> here is that you can have a traditional filesystem layout, and then
> instead of using ext2_prepare, you simply grow the block groups until
> the traditional block group descriptor space is filled out, and then
> after that, you switch over to the meta_bg layout.

Sure, I'm aware of how META_BG works.

> To convert conversion between the block group number and a
> meta_block_group number, the formula is:
>
> block_grp_num = meta_bg_num * (block_size / descriptor_size)
>
> Because of this, very conveniently fs->super->s_first_meta also is the
> size of the block group descriptors for block groups below the meta_bg
> boundary.
> Hence, old_desc_blocks is the number of blocks used by block group
> descriptors for block groups that have the traditional block group
> descriptor layout.

Ah, this is what I wasn't seeing. For the above code, can you please
add a comment to this effect.

> > A group descriptor would almost be the right struct, but it doesn't
> > (yet?) include the GDT block count, itable block count, or a flag if
> > there is a superblock. Adding these fields into the on-disk group_desc
> > struct wouldn't be a bad idea, then all the need to recalculate the
> > group layout in 10 places would disappear.
>
> It's not worth managing the disk format change, nor the space on disk
> for all this information, IMHO.

Given that the kernel increasingly needs to be able to generate correct
bitmaps itself (uninit_bg, online resize) it wouldn't be a bad idea IMHO.
The 64-bit descriptor still has some reserved space in it...

On a related note, I'd like to "reserve" the 2 __u32 fields in the 32-bit
group descriptor for inode and block bitmap checksums (at least some
comments to that effect).

> Probably what we should do is have a new 64-bit version of
> ext2fs_super_and_bgd_loc() fill in a data structure which contains all
> of the calculated information, instead of in a series of pointers
> passed into the function, and then create a new function which given a
> block number and the said data structure, returns a boolean true/false
> value if the block is part of the metadata.

Yes, that seems reasonable.

> > A second function would take the above block numbers (possibly in the
> > form of a modified group descriptor) and return a populated block bitmap.
>
> Yes, possibly; this will only be used by e2fsck and resize2fs, but
> it's probably worth factoring out the code.

Ideally mke2fs would be restructured to also use this, for consistency,
instead of the current implementation where it "allocates" the blocks
from the available blocks in the bitmap.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.