2007-04-19 21:11:52

by Eric Sandeen

[permalink] [raw]
Subject: [PATCH] fix up lazy_bg bitmap initialization at mkfs time

While trying out the -O lazy_bg option, I ran into some trouble on my
big filesystem. The journal size was > free blocks in the first block group,
so it spilled into the next bg with available blocks. Since we are using
lazy_bg here, that -should- have been the last block group. But, when
setup_lazy_bg() marks block groups as UNINIT, it doesn't do anything with
the bitmaps (as designed). However, the block allocation routine simply
searches the bitmap for next available blocks, and finds them in the 2nd
bg, despite it being marked UNINIT - the summaries aren't checked during
allocation. This also caused the 1st group free block numbers to get
out of whack, as we start subtracting from zero:

Group 0: block bitmap at 1025, inode bitmap at 1026, inode table at 1027
0 free blocks, 16373 free inodes, 2 used directories
Group 1: block bitmap at 33793, inode bitmap at 33794, inode table at 33795
63957 free blocks, 0 free inodes, 0 used directories
[Inode not init, Block not init]
Group 2: block bitmap at 65536, inode bitmap at 65537, inode table at 65538
0 free blocks, 0 free inodes, 0 used directories
[Inode not init, Block not init]

The following patch seems to fix this up for me; just mark the in-memory
bitmaps as full for any bg's we flag as UNINIT. The bitmaps aren't marked
as dirty, so they won't be written out. When bitmaps are re-read on the next
invocation of debugfs, etc, the UNINIT flag will be found, and again
the in-memory bitmaps will be marked as full.

This has the somewhat interesting, but correct, result of making the
journal blocks land in both the first and last bgs of a 16T filesystem: :)

BLOCKS:
(0-11):1520-1531, (IND):1532, (12-1035):1533-2556, <...>
(IND):4194272286, (31756-32768):4194272287-419427329

Unfortunately it also increases mkfs time a bit, as it must search
a huge string of unavailable blocks if it has to allocate in the
last bg. Ah well...

Thanks,
-Eric

Signed-off-by: Eric Sandeen <[email protected]>

Index: e2fsprogs-1.39_ext4_hg/misc/mke2fs.c
===================================================================
--- e2fsprogs-1.39_ext4_hg.orig/misc/mke2fs.c
+++ e2fsprogs-1.39_ext4_hg/misc/mke2fs.c
@@ -450,16 +450,22 @@ static void setup_lazy_bg(ext2_filsys fs
int blks;
struct ext2_super_block *sb = fs->super;
struct ext2_group_desc *bg = fs->group_desc;
+ char *block_bitmap = fs->block_map->bitmap;
+ char *inode_bitmap = fs->inode_map->bitmap;
+ int block_nbytes = (int) EXT2_BLOCKS_PER_GROUP(fs->super) / 8;
+ int inode_nbytes = (int) EXT2_INODES_PER_GROUP(fs->super) / 8;

if (EXT2_HAS_COMPAT_FEATURE(fs->super,
EXT2_FEATURE_COMPAT_LAZY_BG)) {
for (i = 0; i < fs->group_desc_count; i++, bg++) {
if ((i == 0) ||
(i == fs->group_desc_count-1))
- continue;
+ goto skip;
if (bg->bg_free_inodes_count ==
sb->s_inodes_per_group) {
bg->bg_free_inodes_count = 0;
+ /* NB: set in mem only, see also read_bitmaps */
+ memset(inode_bitmap, 0xff, inode_nbytes);
bg->bg_flags |= EXT2_BG_INODE_UNINIT;
sb->s_free_inodes_count -=
sb->s_inodes_per_group;
@@ -467,9 +473,13 @@ static void setup_lazy_bg(ext2_filsys fs
blks = ext2fs_super_and_bgd_loc(fs, i, 0, 0, 0, 0);
if (bg->bg_free_blocks_count == blks) {
bg->bg_free_blocks_count = 0;
+ memset(block_bitmap, 0xff, block_nbytes);
bg->bg_flags |= EXT2_BG_BLOCK_UNINIT;
sb->s_free_blocks_count -= blks;
}
+skip:
+ block_bitmap += block_nbytes;
+ inode_bitmap += inode_nbytes;
}
}
}


2007-04-20 10:10:22

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] fix up lazy_bg bitmap initialization at mkfs time

On Apr 19, 2007 16:07 -0500, Eric Sandeen wrote:
> While trying out the -O lazy_bg option, I ran into some trouble on my
> big filesystem. The journal size was > free blocks in the first block group,
> so it spilled into the next bg with available blocks. Since we are using
> lazy_bg here, that -should- have been the last block group.

Just as an FYI - it probably makes little sense to have a 128MB journal
for a filesystem you want to use for testing, since it would still write
128MB to your loopback device when zeroing the journal. It still makes
sense for mke2fs and libext2fs to be fixed for correctness, but I think
it also makes sense for lazy_bg to influence the journal size.

> Unfortunately it also increases mkfs time a bit, as it must search
> a huge string of unavailable blocks if it has to allocate in the
> last bg. Ah well...

It also probably makes sense for libext2fs to check group descriptors
while allocating...

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-04-20 15:02:02

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] fix up lazy_bg bitmap initialization at mkfs time

Andreas Dilger wrote:

> Just as an FYI - it probably makes little sense to have a 128MB journal
> for a filesystem you want to use for testing, since it would still write
> 128MB to your loopback device when zeroing the journal. It still makes
> sense for mke2fs and libext2fs to be fixed for correctness, but I think
> it also makes sense for lazy_bg to influence the journal size.
>
yup, it probably should change that heuristic. With lazy_bg, it should probably
look at something like blocks per group * 2 instead of total filesystem blocks...
see below.

>> Unfortunately it also increases mkfs time a bit, as it must search
>> a huge string of unavailable blocks if it has to allocate in the
>> last bg. Ah well...
>>
>
> It also probably makes sense for libext2fs to check group descriptors
> while allocating...
>
yeah... I was going with the "lazy" theme. ;-) (and the
small-functional-change-at-a-time theme...) But for large filesystems, that
would certainly be a bonus.

/*
* Stupid algorithm --- we now just search forward starting from the
* goal. Should put in a smarter one someday....
*/
errcode_t ext2fs_new_block(...

I get the impression that allocation in libext2 is not too glorious in
the first place... :)

-Eric

Anyway, how about something like this for calculating journal size in
the face of lazy_bg. I know the last group may be smaller... but I figure
this is just a heuristic anyway.

Signed-off-by: Eric Sandeen <[email protected]>

Index: e2fsprogs-1.39_ext4_hg/misc/util.c
===================================================================
--- e2fsprogs-1.39_ext4_hg.orig/misc/util.c
+++ e2fsprogs-1.39_ext4_hg/misc/util.c
@@ -252,8 +252,16 @@ void parse_journal_opts(const char *opts
int figure_journal_size(int size, ext2_filsys fs)
{
blk_t j_blocks;
+ blk_t fs_size;

- if (fs->super->s_blocks_count < 2048) {
+ if (EXT2_HAS_COMPAT_FEATURE(fs->super,
+ EXT2_FEATURE_COMPAT_LAZY_BG)) {
+ fs_size = fs->super->s_blocks_per_group * 2;
+ } else {
+ fs_size = fs->super->s_blocks_count;
+ }
+
+ if (fs_size < 2048) {
fputs(_("\nFilesystem too small for a journal\n"), stderr);
return 0;
}
@@ -276,13 +284,13 @@ int figure_journal_size(int size, ext2_f
return j_blocks;
}

- if (fs->super->s_blocks_count < 32768)
+ if (fs_size < 32768)
j_blocks = 1400;
- else if (fs->super->s_blocks_count < 256*1024)
+ else if (fs_size < 256*1024)
j_blocks = 4096;
- else if (fs->super->s_blocks_count < 512*1024)
+ else if (fs_size < 512*1024)
j_blocks = 8192;
- else if (fs->super->s_blocks_count < 1024*1024)
+ else if (fs_size < 1024*1024)
j_blocks = 16384;
else
j_blocks = 32768;

2007-04-20 22:19:20

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] fix up lazy_bg bitmap initialization at mkfs time

On Apr 20, 2007 09:57 -0500, Eric Sandeen wrote:
> Anyway, how about something like this for calculating journal size in
> the face of lazy_bg. I know the last group may be smaller... but I figure
> this is just a heuristic anyway.
>
> Signed-off-by: Eric Sandeen <[email protected]>
>
> Index: e2fsprogs-1.39_ext4_hg/misc/util.c
> ===================================================================
> --- e2fsprogs-1.39_ext4_hg.orig/misc/util.c
> +++ e2fsprogs-1.39_ext4_hg/misc/util.c
> @@ -252,8 +252,16 @@ void parse_journal_opts(const char *opts
> int figure_journal_size(int size, ext2_filsys fs)
> {
> blk_t j_blocks;
> + blk_t fs_size;
>
> - if (fs->super->s_blocks_count < 2048) {
> + if (EXT2_HAS_COMPAT_FEATURE(fs->super,
> + EXT2_FEATURE_COMPAT_LAZY_BG)) {
> + fs_size = fs->super->s_blocks_per_group * 2;
> + } else {
> + fs_size = fs->super->s_blocks_count;
> + }

This should also check for !RO_COMPAT_UNINIT_GROUPS, since LAZY_BG
can be used in conjunction with UNINIT_GROUPS to mean "don't zero
uninitialized groups" but doesn't set bg_free_blocks_count == 0.

I was going to suggest using s_free_blocks_count, but that might
lead to confusion if e.g. e2fsck deletes a journal on a nearly-full
fs and then tune2fs recreates it much smaller than before.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.