2008-08-28 12:36:29

by Frédéric Bohé

Subject: Online resize with flex bg

I am a bit unsure how to fix online resize on filesystems using
flex_bg.

First, this is roughly how resize currently works (correct me if I am
wrong):

Without flex_bg:
resize2fs extends the last group to its full size with an ioctl
called GROUP_EXTEND. Then it "prepares" a new group. That is to say, it
computes which blocks will contain the metadata for the new group,
then it issues a GROUP_ADD ioctl with those block numbers.
This works for both online and offline resize because the new group's
metadata is created outside the working filesystem.
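
A minimal sketch of that "prepare" step, assuming a simplified layout in which the new group's block bitmap, inode bitmap and inode table follow any superblock/descriptor backups at the start of the group. The struct, the function names and the parameters are hypothetical and are not taken from resize2fs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified record of where one new group keeps its metadata. */
struct new_group_layout {
	uint64_t block_bitmap;	/* block number of the block bitmap */
	uint64_t inode_bitmap;	/* block number of the inode bitmap */
	uint64_t inode_table;	/* first block of the inode table */
	uint64_t first_free;	/* first block left over for data */
};

/*
 * Place the metadata of group `group` inside the group itself.
 * `backup_blocks` is the number of blocks used at the start of the group
 * by superblock/descriptor backups (0 if the group has none).
 */
static struct new_group_layout
layout_new_group(uint64_t first_data_block, uint64_t blocks_per_group,
		 uint64_t group, uint64_t backup_blocks,
		 uint64_t inode_table_blocks)
{
	struct new_group_layout l;
	uint64_t start = first_data_block + group * blocks_per_group;

	/* Backups, two bitmaps and the inode table must fit in one group. */
	assert(backup_blocks + 2 + inode_table_blocks <= blocks_per_group);

	l.block_bitmap = start + backup_blocks;
	l.inode_bitmap = l.block_bitmap + 1;
	l.inode_table = l.inode_bitmap + 1;
	l.first_free = l.inode_table + inode_table_blocks;
	return l;
}

/*
 * The block bitmap is the lowest metadata block in this layout, so if it
 * lies at or beyond the old end of the filesystem, all of the metadata does.
 */
static bool layout_is_outside_fs(const struct new_group_layout *l,
				 uint64_t old_blocks_count)
{
	return l->block_bitmap >= old_blocks_count;
}
```

For a filesystem of 10 full groups of 32768 blocks, group 10 starts at block 327680, which is exactly the old block count, so nothing in the mounted filesystem can allocate these blocks before GROUP_ADD runs.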

With flex_bg:
It works the same way, but this time the metadata blocks for new groups
are created inside the working filesystem (in a group containing the
metadata for the whole flex group). resize2fs scans from the end of
the last flex_group's metadata until it finds enough space to put the
new metadata. This is not a problem when resizing offline, but when
resizing online, the blocks found for the metadata may be allocated by
someone else before the GROUP_ADD ioctl occurs.
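
The window between the scan and the ioctl can be illustrated with a toy model. This is only a sketch: the in-memory bitmap, the function names, and the assumption that the "kernel" step rejects a request whose blocks are no longer free are all invented for illustration and do not describe the real ext4 code:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BLOCKS 16

/* Toy in-memory block bitmap: true means "in use". */
struct toy_fs {
	bool used[TOY_BLOCKS];
};

/*
 * "Userland" step: find the first run of `count` free blocks at or after
 * `from`. Returns the first block of the run, or -1 if there is none.
 */
static int toy_find_free_run(const struct toy_fs *fs, int from, int count)
{
	assert(from >= 0 && count > 0);
	for (int start = from; start + count <= TOY_BLOCKS; start++) {
		int n = 0;

		while (n < count && !fs->used[start + n])
			n++;
		if (n == count)
			return start;
	}
	return -1;
}

/*
 * "Kernel" step (assumed behaviour): claim the blocks only if all of them
 * are still free. Returns 0 on success, -1 if any block was taken meanwhile.
 */
static int toy_group_add(struct toy_fs *fs, int start, int count)
{
	assert(start >= 0 && count > 0 && start + count <= TOY_BLOCKS);
	for (int i = 0; i < count; i++)
		if (fs->used[start + i])
			return -1;
	for (int i = 0; i < count; i++)
		fs->used[start + i] = true;
	return 0;
}

/* Any other writer allocating a single block on the mounted filesystem. */
static void toy_allocate(struct toy_fs *fs, int block)
{
	assert(block >= 0 && block < TOY_BLOCKS);
	fs->used[block] = true;
}
```

If another writer calls toy_allocate() on one of the blocks returned by toy_find_free_run() before toy_group_add() runs, the add fails and the scan has to be repeated. Offline, no such writer exists, which is why only the online case is affected.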

I am not sure how to handle this. I guess that resize2fs should be able
to find and allocate the metadata blocks without being disturbed by
other processes. But that could mean blocking all processes accessing
the filesystem for a long time while it searches for free blocks. That
said, resizing is not done very often, so it could be acceptable.
Moreover, I guess that doing things this way means letting the kernel
compute the metadata blocks instead of letting the userland resize2fs
manage it.
Another approach I can think of would be to deliberately write the new
groups' metadata outside the working filesystem (just like non-flex_bg
groups). But this would break the "grouped metadata" logic of flex_bg.
We could limit this breakage to the last flex_group of the resized fs if
we add some sort of FLEXGROUP_ADD ioctl which allows adding whole clean
flex_groups to the filesystem.

Any comments/suggestions are welcome.

Fred





2008-08-28 17:11:58

by Theodore Ts'o

Subject: Re: Online resize with flex bg

On Thu, Aug 28, 2008 at 04:38:00PM +0200, Frédéric Bohé wrote:
> With flex_bg:
> It works the same way, but this time the metadata blocks for new groups
> are created inside the working filesystem (in a group containing the
> metadata for the whole flex group). resize2fs scans from the end of
> the last flex_group's metadata until it finds enough space to put the
> new metadata. This is not a problem when resizing offline, but when
> resizing online, the blocks found for the metadata may be allocated by
> someone else before the GROUP_ADD ioctl occurs.

Yep, that's a problem. Probably the quick fix is to allocate the
metadata outside of the working filesystem at least for now. Because
we're extending the filesystem block group by block group, the
quick-and-dirty fix would be to just use the non-flex_bg allocation
algorithm and allocate the block bitmap, inode bitmap, and inode table
in the block group.

This we can do in the userspace code right now, since the kernel code
is enforcing that the block and inode bitmaps be in the new block
group at the moment anyway. It is highly undesirable, since it would
break the advantage of flex_bg, but at least online resizing would
work.

To fix it right, I suspect we will need a new ioctl and do more of the
work inside the kernel, instead of in userspace. That's the only way
we can do the block allocation inside the kernel efficiently.

- Ted

2008-08-28 17:36:54

by Theodore Ts'o

Subject: Re: Online resize with flex bg

And here's the short-term fix. With this patch, e2fsprogs will work
on ext4 filesystems with flex_bg. Interested in working on the
long-term fix, where we put all or most of the resizing logic in the
kernel? :-)

- Ted

commit df3871a168c3f8308718d72a17062a20aa02cc01
Author: Theodore Ts'o <[email protected]>
Date: Thu Aug 28 13:24:12 2008 -0400

resize2fs: Allow (non-optimal) on-line resizing for ext4 filesystems

The current method of adding one block group at a time to a mounted
filesystem means it is impossible to accommodate the flex_bg allocation
method of placing the metadata together in a single block group. For
now we "fix" this issue by using the traditional layout for new block
groups, where each block group is self-contained and contains its own
bitmap blocks and inode tables. This means we don't get the layout
advantages of flex_bg in the new block groups, but at least it allows
on-line resizing to function.

Long term, we will need to create a new ioctl which does much more of
the resizing work in the kernel.

We also fix a bug in the ext3/ext4 ioctl fallback code so we stop
trying the ext3 ioctl for every single block group when resizing an
ext4 filesystem.

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/resize/online.c b/resize/online.c
index f4d24ce..d581553 100644
--- a/resize/online.c
+++ b/resize/online.c
@@ -93,6 +93,17 @@ errcode_t online_resize_fs(ext2_filsys fs, const char *mtpt,
if (retval)
return retval;

+ /* The current method of adding one block group at a time to a
+ * mounted filesystem means it is impossible to accommodate the
+ * flex_bg allocation method of placing the metadata together
+ * in a single block group. For now we "fix" this issue by
+ * using the traditional layout for new block groups, where
+ * each block group is self-contained and contains its own
+ * bitmap blocks and inode tables. This means we don't get
+ * the layout advantages of flex_bg in the new block groups,
+ * but at least it allows on-line resizing to function.
+ */
+ new_fs->super->s_feature_incompat &= ~EXT4_FEATURE_INCOMPAT_FLEX_BG;
retval = adjust_fs_info(new_fs, fs, *new_size);
if (retval)
return retval;
@@ -154,7 +165,7 @@ errcode_t online_resize_fs(ext2_filsys fs, const char *mtpt,
ioctl(fd, EXT2_IOC_GROUP_ADD, &input) == 0)
continue;
else
- use_old_ioctl = 1;
+ use_old_ioctl = 0;

input64.group = input.group;
input64.block_bitmap = input.block_bitmap;
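
The effect of the second hunk can be sketched in isolation. The code below is a simplified model of the fallback loop, not the resize2fs source: the two ioctls are replaced by hypothetical function pointers, and the call counters exist only so the behaviour can be observed:

```c
#include <assert.h>

/* Hypothetical stand-in for one "add group" ioctl; returns 0 on success. */
typedef int (*group_add_fn)(unsigned int group);

/* Counters used only to observe how often each interface was tried. */
struct add_stats {
	int old_calls;
	int new_calls;
};

/* Fakes for the usage example: an old ioctl that is never supported... */
static int fake_old_unsupported(unsigned int group)
{
	(void)group;
	return -1;
}

/* ...and a new ioctl that always succeeds. */
static int fake_new_ok(unsigned int group)
{
	(void)group;
	return 0;
}

/*
 * Add `count` groups starting at `first`. The old interface is tried
 * first; once it fails, the flag is cleared so it is never tried again.
 */
static int add_groups(unsigned int first, unsigned int count,
		      group_add_fn try_old, group_add_fn try_new,
		      struct add_stats *stats)
{
	int use_old_ioctl = 1;

	assert(try_old && try_new && stats);
	for (unsigned int g = first; g < first + count; g++) {
		if (use_old_ioctl) {
			stats->old_calls++;
			if (try_old(g) == 0)
				continue;
			use_old_ioctl = 0;	/* remember the failure */
		}
		stats->new_calls++;
		if (try_new(g) != 0)
			return -1;
	}
	return 0;
}
```

With the flag cleared on the first failure, adding five groups through fake_old_unsupported and fake_new_ok tries the old interface once and the new one five times. If the flag were left at 1, as in the line the patch removes, the old interface would be retried for every group.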

2008-08-28 19:57:48

by Andreas Dilger

Subject: Re: Online resize with flex bg

On Aug 28, 2008 13:36 -0400, Theodore Ts'o wrote:
> And here's the short-term fix. With this patch, e2fsprogs will work
> on ext4 filesystems with flex_bg. Interested in working on the
> long-term fix, where we put all or most of the resizing logic in the
> kernel? :-)

Yes, that was my thought also. setup_new_group_blocks() can already
do pretty much everything. The only piece lacking in the past was the
ability to specify the RAID stride, but that is now in the superblock
as well.

It would be good to fix up setup_new_group_blocks() so that, if the
GDT_CSUM[*] feature is set, it flags uninit_bg groups and sets the
itable_unused field. We still need the itable zeroing, but setting the
ITABLE_ZERO flag will save possibly duplicated work later.
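
As a rough illustration of that suggestion, here is a sketch using an invented descriptor struct. The struct, the function name and the flag names and values are placeholders for this example and are not the kernel's definitions; a real implementation would presumably also have to update the group descriptor checksum:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values chosen for this sketch only. */
#define SKETCH_BG_INODE_UNINIT	0x0001u	/* no inode in the group is in use */
#define SKETCH_BG_ITABLE_ZEROED	0x0004u	/* inode table already zeroed */

/* Hypothetical, cut-down group descriptor. */
struct sketch_group_desc {
	uint16_t flags;
	uint32_t itable_unused;	/* count of never-used inodes in the table */
};

/*
 * Initialise the descriptor of a freshly added group. Without the
 * checksum feature nothing is flagged; with it, the group is marked as
 * having an untouched inode table, and the "zeroed" flag records that
 * the resize code has already cleared the table.
 */
static void sketch_init_new_group(struct sketch_group_desc *gd,
				  bool has_gdt_csum, bool itable_was_zeroed,
				  uint32_t inodes_per_group)
{
	assert(gd != NULL);
	gd->flags = 0;
	gd->itable_unused = 0;
	if (!has_gdt_csum)
		return;

	gd->flags |= SKETCH_BG_INODE_UNINIT;
	gd->itable_unused = inodes_per_group;
	if (itable_was_zeroed)
		gd->flags |= SKETCH_BG_ITABLE_ZEROED;
}
```

Recording the zeroing at resize time is what would let later code skip zeroing the same inode table a second time.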

Cheers, Andreas

[*] maybe we should add an UNINIT_BG #define and move over to that slowly?
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-09-01 12:00:16

by Frédéric Bohé

Subject: [PATCH] ext4: update flex bg counters when resizing

From: Frederic Bohe <[email protected]>

Update flex_bg free blocks and free inodes counters when resizing.

Signed-off-by: Frederic Bohe <[email protected]>
---
This patch fixes a bug which prevents the newly created inodes from being used after a resize with flex_bg.
I am not sure whether it is really useful to update free blocks too, but it definitely makes the code cleaner.

resize.c | 9 +++++++++
super.c | 7 +++++--
2 files changed, 14 insertions(+), 2 deletions(-)

Index: linux-2.6.27-rc5+patch_queue/fs/ext4/resize.c
===================================================================
--- linux-2.6.27-rc5+patch_queue.orig/fs/ext4/resize.c 2008-09-01 11:23:09.000000000 +0200
+++ linux-2.6.27-rc5+patch_queue/fs/ext4/resize.c 2008-09-01 13:16:03.000000000 +0200
@@ -929,6 +929,15 @@ int ext4_group_add(struct super_block *s
percpu_counter_add(&sbi->s_freeinodes_counter,
EXT4_INODES_PER_GROUP(sb));

+ if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
+ ext4_group_t flex_group;
+ flex_group = ext4_flex_group(sbi, input->group);
+ sbi->s_flex_groups[flex_group].free_blocks +=
+ input->free_blocks_count;
+ sbi->s_flex_groups[flex_group].free_inodes +=
+ EXT4_INODES_PER_GROUP(sb);
+ }
+
ext4_journal_dirty_metadata(handle, sbi->s_sbh);
sb->s_dirt = 1;

Index: linux-2.6.27-rc5+patch_queue/fs/ext4/super.c
===================================================================
--- linux-2.6.27-rc5+patch_queue.orig/fs/ext4/super.c 2008-09-01 12:38:03.000000000 +0200
+++ linux-2.6.27-rc5+patch_queue/fs/ext4/super.c 2008-09-01 13:29:39.000000000 +0200
@@ -1507,8 +1507,11 @@ static int ext4_fill_flex_info(struct su
sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
groups_per_flex = 1 << sbi->s_log_groups_per_flex;

- flex_group_count = (sbi->s_groups_count + groups_per_flex - 1) /
- groups_per_flex;
+ /* We allocate both existing and potentially added groups */
+ flex_group_count = ((sbi->s_groups_count + groups_per_flex - 1) +
+ ((sbi->s_es->s_reserved_gdt_blocks +1 ) <<
+ EXT4_DESC_PER_BLOCK_BITS(sb))) /
+ groups_per_flex;
sbi->s_flex_groups = kzalloc(flex_group_count *
sizeof(struct flex_groups), GFP_KERNEL);
if (sbi->s_flex_groups == NULL) {

--
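
The arithmetic in the two hunks can be checked with a small stand-alone sketch. The function names are hypothetical; the formulas are transcribed from the patch, and the group-to-flex-group mapping is assumed to be a right shift by s_log_groups_per_flex:

```c
#include <assert.h>
#include <stdint.h>

/* Flex group that a block group belongs to (assumed to be a plain shift). */
static uint32_t flex_group_of(uint32_t group, unsigned int log_groups_per_flex)
{
	assert(log_groups_per_flex < 32);
	return group >> log_groups_per_flex;
}

/* Old sizing: just enough flex groups for the groups that exist now. */
static uint32_t flex_count_current(uint32_t groups_count,
				   unsigned int log_groups_per_flex)
{
	uint32_t groups_per_flex;

	assert(log_groups_per_flex < 32);
	groups_per_flex = 1u << log_groups_per_flex;
	return (groups_count + groups_per_flex - 1) / groups_per_flex;
}

/*
 * New sizing from the patch: also leave room for the groups whose
 * descriptors could fit in the reserved GDT blocks.
 */
static uint32_t flex_count_with_growth(uint32_t groups_count,
				       unsigned int log_groups_per_flex,
				       uint32_t reserved_gdt_blocks,
				       unsigned int desc_per_block_bits)
{
	uint32_t groups_per_flex, growth;

	assert(log_groups_per_flex < 32 && desc_per_block_bits < 32);
	groups_per_flex = 1u << log_groups_per_flex;
	growth = (reserved_gdt_blocks + 1) << desc_per_block_bits;
	return (groups_count + groups_per_flex - 1 + growth) / groups_per_flex;
}
```

As an example with assumed values: 100 groups, 16 groups per flex group (log 4), 3 reserved GDT blocks and 128 descriptors per block (7 bits). The old formula gives (100 + 15) / 16 = 7 entries, while the new one gives (115 + 512) / 16 = 39. If the filesystem later grows to 612 groups, the last group (611) maps to flex group 38, which fits in the 39-entry array but would overrun the 7-entry one.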


2008-09-08 14:18:13

by Theodore Ts'o

Subject: Re: [PATCH] ext4: update flex bg counters when resizing

On Mon, Sep 01, 2008 at 04:02:18PM +0200, Frédéric Bohé wrote:
> From: Frederic Bohe <[email protected]>
>
> Update flex_bg free blocks and free inodes counters when resizing.

Thanks, I've added it to the patch queue.

- Ted