2007-07-10 16:27:23

by Jose R. Santos

Subject: Initial results of FLEX_BG feature.

Hi folks,

I've started playing with the FLEX_BG feature (for now packing of
block group metadata closer together) and started doing some
preliminary benchmarking to see if the feature is worth pursuing.
I chose an FFSB profile that does single threaded small creates and
writes and then does an fsync. This is something I ran for a customer
a while ago in which ext3 performed poorly.

Here are some of the results (in transactions/sec@%CPU util) on a single
143GB@10K rpm disk.

ext4 [email protected]%
ext4(flex_bg) [email protected]% 20% improvement
ext4(data=writeback) [email protected]% <- hum...
ext4(flex_bg data=writeback) [email protected]% 28% over best ext4
ext3 [email protected]%
ext3(data=writeback) [email protected]%
ext2 [email protected]%
xfs [email protected]%
jfs [email protected]%

The results are from packing the metadata of 64 block groups closer
together at fsck time. Still need to clean up the e2fsprogs patches,
but I hope to submit them to the list later this week for others to
try. It seems like fsck doesn't quite like the new location of the
metadata and I'm not sure how big of an effort it will be to fix it. I
mention this since one of the assumptions behind implementing FLEX_BG was
that it would reduce fsck time, and it could be a while before I'm able
to test this.

-JRS


2007-07-11 04:12:17

by Andreas Dilger

Subject: Re: Initial results of FLEX_BG feature.

On Jul 10, 2007 11:23 -0500, Jose R. Santos wrote:
> I've started playing with the FLEX_BG feature (for now packing of
> block group metadata closer together) and started doing some
> preliminary benchmarking to see if the feature is worth pursuing.
> I chose an FFSB profile that does single threaded small creates and
> writes and then does an fsync. This is something I ran for a customer
> a while ago in which ext3 performed poorly.

Jose,
thanks for the information and testing. This is definitely very
interesting and shows this is an avenue we should pursue.

> Here are some of the results (in transactions/sec@%CPU util) on a single
> 143GB@10K rpm disk.
>
> ext4 [email protected]%
> ext4(flex_bg) [email protected]% 20% improvement
> ext4(data=writeback) [email protected]% <- hum...
> ext4(flex_bg data=writeback) [email protected]% 28% over best ext4
> ext3 [email protected]%
> ext3(data=writeback) [email protected]%
> ext2 [email protected]%
> xfs [email protected]%
> jfs [email protected]%
>
> The results are from packing the metadata of 64 block groups closer
> together at fsck time. Still need to clean up the e2fsprog patches,

Does this mean that you are just moving the bitmaps and inode table
at mke2fs time, or also such things as directory blocks at fsck time?

> but I hope to submit them to the list later this week for others to
> try. It seems like fsck doesn't quite like the new location of the
> metadata and I'm not sure how big of an effort it will be to fix it. I
> mentioned this since one of the assumptions of implementing FLEX_BG was
> the reduce time in fsck and it could be a while before I'm able to test
> this.

I think in the spirit of the original META_BG option, Ted had wanted to
put all the bitmaps from EXT4_DESC_PER_BLOCK groups somewhere within the
metagroup. It would also be interesting to see if moving ALL of the
group metadata to a single location in the filesystem makes a big difference.
If not, then we may as well keep it spread out for safety.

You might also want to test out placement of the journal in the middle
of the filesystem. The U. Wisconsin folks tested this in one of their
papers and showed some noticeable improvements. That isn't exactly
related, but it is a relatively simple tweak to mke2fs/tune2fs to give
it an allocation goal of group_desc[s_groups_count / 2].bg_inode_table
(to put it past the inode table in the middle group).
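
As a rough illustration of that allocation goal, the sketch below takes the
middle group's bg_inode_table as the goal and searches forward for the first
free block, which lands just past the inode table because the table's own
blocks are already in use. The demo_ names and the byte-per-block "bitmap"
are made up for the example; this is not the mke2fs/tune2fs code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified group descriptor: only the field used here. */
struct demo_group_desc {
	uint32_t bg_inode_table;	/* first block of the group's inode table */
};

/* Goal block for the journal: the inode table of the middle group. */
uint32_t demo_journal_goal(const struct demo_group_desc *gd,
			   uint32_t groups_count)
{
	assert(groups_count > 0);
	return gd[groups_count / 2].bg_inode_table;
}

/*
 * Return the first free block at or after goal, or UINT32_MAX if none.
 * in_use[b] is nonzero when block b is allocated (toy stand-in for a
 * real block bitmap).
 */
uint32_t demo_first_free_from(const uint8_t *in_use, uint32_t nblocks,
			      uint32_t goal)
{
	for (uint32_t b = goal; b < nblocks; b++)
		if (!in_use[b])
			return b;
	return UINT32_MAX;
}
```

With four groups of 100 blocks and a 10-block inode table starting at block
203 in group 2, the goal is 203 and the first free block found from it is 213.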

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-07-11 05:30:13

by Jose R. Santos

Subject: Re: Initial results of FLEX_BG feature.

On Tue, 10 Jul 2007 22:12:14 -0600
Andreas Dilger <[email protected]> wrote:

> On Jul 10, 2007 11:23 -0500, Jose R. Santos wrote:
> > I've started playing with the FLEX_BG feature (for now packing of
> > block group metadata closer together) and started doing some
> > preliminary benchmarking to see if the feature is worth pursuing.
> > I chose an FFSB profile that does single threaded small creates and
> > writes and then does an fsync. This is something I ran for a customer
> > a while ago in which ext3 performed poorly.
>
> Jose,
> thanks for the information and testing. This is definitely very
> interesting and shows this is an avenue we should pursue.
>
> > Here are some of the results (in transactions/sec@%CPU util) on a single
> > 143GB@10K rpm disk.
> >
> > ext4 [email protected]%
> > ext4(flex_bg) [email protected]% 20% improvement
> > ext4(data=writeback) [email protected]% <- hum...
> > ext4(flex_bg data=writeback) [email protected]% 28% over best ext4
> > ext3 [email protected]%
> > ext3(data=writeback) [email protected]%
> > ext2 [email protected]%
> > xfs [email protected]%
> > jfs [email protected]%
> >
> > The results are from packing the metadata of 64 block groups closer
> > together at fsck time. Still need to clean up the e2fsprog patches,
>
> Does this mean that you are just moving the bitmaps and inode table
> at mke2fs time, or also such things as directory blocks at fsck time?

Right now what I've done is allocate the bitmaps and inode tables at the
beginning of each group of 64 BGs. Still need to work on fsck, since just
removing the restriction on where the bitmaps and inode table are
located still gives me errors of uninitialized inodes with dtime set.
Seems like fsck still expects inode information to be located at
specific locations within the disk.
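
To make the packed layout concrete, here is a small sketch of where each
group's metadata would sit if all block bitmaps come first, then all inode
bitmaps, then the inode tables. It assumes the run starts at a free block
"base" and ignores superblock and descriptor backups that a real allocator
would have to step around; the demo_ names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: metadata locations for one block group. */
struct demo_meta_loc {
	uint64_t block_bitmap;
	uint64_t inode_bitmap;
	uint64_t inode_table;
};

/*
 * Locate the metadata of group number "index" (0-based within its flex
 * group) when "ngroups" groups are packed starting at block "base":
 * ngroups block bitmaps, then ngroups inode bitmaps, then ngroups inode
 * tables of inode_blocks_per_group blocks each.
 */
struct demo_meta_loc demo_packed_layout(uint64_t base, uint32_t ngroups,
					uint32_t index,
					uint32_t inode_blocks_per_group)
{
	struct demo_meta_loc loc;

	assert(index < ngroups);
	loc.block_bitmap = base + index;
	loc.inode_bitmap = base + ngroups + index;
	loc.inode_table = base + 2ULL * ngroups +
			  (uint64_t)index * inode_blocks_per_group;
	return loc;
}
```

For example, with base 1000, 64 groups and 512-block inode tables, group 0
gets blocks 1000, 1064 and 1128, and group 63 gets 1063, 1127 and 33384.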

> > but I hope to submit them to the list later this week for others to
> > try. It seems like fsck doesn't quite like the new location of the
> > metadata and I'm not sure how big of an effort it will be to fix it. I
> > mentioned this since one of the assumptions of implementing FLEX_BG was
> > the reduce time in fsck and it could be a while before I'm able to test
> > this.
>
> i think in the spirit of the original META_BG option, Ted had wanted to
> put all the bitmaps from EXT4_DESC_PER_BLOCK groups somewhere within the
> metagroup. It would also be interesting to see if moving ALL of the
> group metadata to a single location in the filesystem makes a bit difference.
> If not, then we may as well keep it spread out for safety.

This is by no means a final implementation; rather, it's a means to
test whether this feature is worth pursuing. I plan on testing various
things before coming up with a final design of what the feature should
look like.

I did try moving all of the group metadata to the beginning of the
disk, but it was slightly slower on an rsync test. Have not tried it
with FFSB yet.

Things on the TODO list of testing needed to be done are:

- More metadata intensive FFSB profile testing. I've been meaning to
add more operations to FFSB in order to make this possible. Now I have
an excuse.

- Testing of different ratios of groups per flex group.

- Testing with storage devices with fast write cache. When I did the
customer testing a couple of months ago with this FFSB profile, JFS was
the fastest of the filesystems when paired with a decent storage
subsystem with fast write cache. It would be interesting to see what
effect fast write caching has on such a feature.

- Testing fsck time once e2fsprogs understands how to read such a
filesystem.

- Testing an aged filesystem to see what effects (if any) this
feature has on a fragmented filesystem.


> You might also want to test out placement of the journal in the middle
> of the filesystem, the U. Wisconsin folks tested this in one of their
> papers and showed some noticable improvements. That isn't exactly
> related, but it is a relatively simple tweak to mke2fs/tune2fs to give
> it an allocation goal of group_desc[s_groups_count / 2].bg_inode_table
> (to put it past inode table in middle group).

Makes sense. Do you have a link to the paper?

> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>

Thanks

-JRS

2007-07-11 05:40:02

by Eric Sandeen

Subject: Re: Initial results of FLEX_BG feature.

Jose R. Santos wrote:
> On Tue, 10 Jul 2007 22:12:14 -0600
> Andreas Dilger <[email protected]> wrote:

...

>
>> You might also want to test out placement of the journal in the middle
>> of the filesystem, the U. Wisconsin folks tested this in one of their
>> papers and showed some noticable improvements. That isn't exactly
>> related, but it is a relatively simple tweak to mke2fs/tune2fs to give
>> it an allocation goal of group_desc[s_groups_count / 2].bg_inode_table
>> (to put it past inode table in middle group).
>
> Make sense. Do you have a link to the paper?

Filesystem shrinking would need to be fixed to handle this too, right...

-Eric

2007-07-11 12:41:12

by Andreas Dilger

Subject: Re: Initial results of FLEX_BG feature.

On Jul 11, 2007 00:30 -0500, Jose R. Santos wrote:
> On Tue, 10 Jul 2007 22:12:14 -0600
> Andreas Dilger <[email protected]> wrote:
> > You might also want to test out placement of the journal in the middle
> > of the filesystem, the U. Wisconsin folks tested this in one of their
> > papers and showed some noticable improvements. That isn't exactly
> > related, but it is a relatively simple tweak to mke2fs/tune2fs to give
> > it an allocation goal of group_desc[s_groups_count / 2].bg_inode_table
> > (to put it past inode table in middle group).
>
> Make sense. Do you have a link to the paper?

I don't have a URL, but if you search for "IRON ext3" and go to Remzi's
site it is called "Analysis and Evolution of Journaling File Systems".

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-07-11 22:09:50

by Theodore Ts'o

Subject: Re: Initial results of FLEX_BG feature.

On Wed, Jul 11, 2007 at 12:30:04AM -0500, Jose R. Santos wrote:
> > i think in the spirit of the original META_BG option, Ted had wanted to
> > put all the bitmaps from EXT4_DESC_PER_BLOCK groups somewhere within the
> > metagroup. It would also be interesting to see if moving ALL of the
> > group metadata to a single location in the filesystem makes a bit difference.
> > If not, then we may as well keep it spread out for safety.

My original intention was that META_BG would place the bitmaps and
inode tables at the beginning of each metagroup by default, but that
the constraints about where to put the bitmaps and inode tables would
be completely relaxed from the point of view of requirements by the
kernel and e2fsck. Unfortunately while I had patches which removed
the constraints checking, they never made it into mainline of either
the kernel or e2fsprogs, sigh.

- Ted

2007-07-11 22:14:28

by Theodore Ts'o

Subject: Re: Initial results of FLEX_BG feature.

On Wed, Jul 11, 2007 at 12:30:04AM -0500, Jose R. Santos wrote:
> Right now what I've done is allocate the bitmaps and inode tables at the
> beginning of each group of 64 BG. Still need to work on fsck since just
> removing the restriction on were the bitmaps and inode table are
> located still gives me errors of uninitialized inodes with dtime set.
> Seems like fsck still expect inode information to be located at
> specific locations within the disk.

Can you send me the patch which you were playing with? I might be
able to help you with this. It should be pretty straightforward to
remove the constraint on the inode table location.

It really should only be a check in e2fsck/super.c:check_super_block(),
as far as I know.

If you're seeing errors of uninitialized inodes with dtime set, that
sounds like maybe something else is going on. All of e2fsprogs should
be referencing the inode table via fs->group_desc[group_num].bg_inode_table.
See lib/ext2fs/inode.c, functions ext2fs_open_inode_scan(),
get_next_blockgroup(), and ext2fs_read_inode_full().
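
For reference, the usual ext2-style lookup can be sketched as below: the
inode number picks a group and an index within it, and the on-disk position
follows from that group's inode table start alone, wherever the table happens
to live. This is a simplified model with invented demo_ names, not the code
in lib/ext2fs/inode.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where an on-disk inode lives. */
struct demo_inode_loc {
	uint64_t block;		/* filesystem block holding the inode */
	uint32_t offset;	/* byte offset of the inode in that block */
};

/*
 * inode_table_start[g] plays the role of group_desc[g].bg_inode_table.
 * Inode numbers start at 1, so inode "ino" is entry (ino - 1) overall.
 */
struct demo_inode_loc demo_locate_inode(const uint64_t *inode_table_start,
					uint32_t ino,
					uint32_t inodes_per_group,
					uint32_t inode_size,
					uint32_t block_size)
{
	struct demo_inode_loc loc;
	uint32_t group, index;
	uint64_t byte;

	assert(ino >= 1 && inodes_per_group > 0 && block_size > 0);
	group = (ino - 1) / inodes_per_group;
	index = (ino - 1) % inodes_per_group;
	byte = (uint64_t)index * inode_size;

	loc.block = inode_table_start[group] + byte / block_size;
	loc.offset = (uint32_t)(byte % block_size);
	return loc;
}
```

Nothing in this calculation depends on the table sitting inside its own block
group, so a wrong answer here points at the descriptor contents rather than at
a placement rule.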

- Ted

2007-07-12 15:03:43

by Jose R. Santos

Subject: Re: Initial results of FLEX_BG feature.

On Wed, 11 Jul 2007 18:14:25 -0400
Theodore Tso <[email protected]> wrote:
> On Wed, Jul 11, 2007 at 12:30:04AM -0500, Jose R. Santos wrote:
> > Right now what I've done is allocate the bitmaps and inode tables at the
> > beginning of each group of 64 BG. Still need to work on fsck since just
> > removing the restriction on were the bitmaps and inode table are
> > located still gives me errors of uninitialized inodes with dtime set.
> > Seems like fsck still expect inode information to be located at
> > specific locations within the disk.
>
> Can you send me the patch which you were playing with? I might be
> able to help you with this. It should be pretty straightforward to
> remove the constraint on the inode table location.
>
> It really should only be a check in e2fsck/super.c:check_super_block(),
> as far as I know.
>
> If you're seeing errors of unitialized inodes with dtime set, that
> sounds like maybe something else is going on. All of e2fsprogs should
> be referencing the inode table via fs->group_desc[group_num].bg_inode_table.
> See lib/ext2fs/inode.c, functions ext2fs_open_inode_scan(),
> get_next_blockgroup(), and ext2fs_read_inode_full().
>
> - Ted

Here is a very rough patch of the FLEX_BG feature implementation.
It still works as a prototype, but there are a couple of things that are
either broken or hard-coded. As it currently stands, it cannot be used
to create a filesystem without the FLEX_BG feature, as I have not made
ext2fs_allocate_tables() backward compatible.

The number of groups per flex group is also hard-coded to 64. Still
thinking about whether I should add this to the superblock itself in
order to help recovery of the filesystem, as well as possibly making
allocation algorithms in the kernel aware of the new group
arrangement.

I create a filesystem using the following command:

mke2fs -j -O meta_bg,flex_bg /dev/sdh

While meta_bg is not required, having block group descriptors spread
across multiple block groups does increase the chances of
fragmenting the metadata.

-JRS

diff -Naurp e2fsprogs-1.40/e2fsck/super.c /home/jsantos/e2fsprogs-1.40-flex/e2fsck/super.c
--- e2fsprogs-1.40/e2fsck/super.c 2007-06-03 23:48:01.000000000 -0500
+++ /home/jsantos/e2fsprogs-1.40-flex/e2fsck/super.c 2007-07-09 11:27:56.000000000 -0500
@@ -580,27 +580,31 @@ void check_super_block(e2fsck_t ctx)

first_block = ext2fs_group_first_block(fs, i);
last_block = ext2fs_group_last_block(fs, i);
-
+/*
if ((gd->bg_block_bitmap < first_block) ||
(gd->bg_block_bitmap > last_block)) {
pctx.blk = gd->bg_block_bitmap;
if (fix_problem(ctx, PR_0_BB_NOT_GROUP, &pctx))
gd->bg_block_bitmap = 0;
}
+*/
if (gd->bg_block_bitmap == 0) {
ctx->invalid_block_bitmap_flag[i]++;
ctx->invalid_bitmaps++;
}
+/*
if ((gd->bg_inode_bitmap < first_block) ||
(gd->bg_inode_bitmap > last_block)) {
pctx.blk = gd->bg_inode_bitmap;
if (fix_problem(ctx, PR_0_IB_NOT_GROUP, &pctx))
gd->bg_inode_bitmap = 0;
}
+*/
if (gd->bg_inode_bitmap == 0) {
ctx->invalid_inode_bitmap_flag[i]++;
ctx->invalid_bitmaps++;
}
+/*
if ((gd->bg_inode_table < first_block) ||
((gd->bg_inode_table +
fs->inode_blocks_per_group - 1) > last_block)) {
@@ -608,6 +612,7 @@ void check_super_block(e2fsck_t ctx)
if (fix_problem(ctx, PR_0_ITABLE_NOT_GROUP, &pctx))
gd->bg_inode_table = 0;
}
+*/
if (gd->bg_inode_table == 0) {
ctx->invalid_inode_table_flag[i]++;
ctx->invalid_bitmaps++;
diff -Naurp e2fsprogs-1.40/lib/e2p/feature.c /home/jsantos/e2fsprogs-1.40-flex/lib/e2p/feature.c
--- e2fsprogs-1.40/lib/e2p/feature.c 2007-03-21 15:46:10.000000000 -0500
+++ /home/jsantos/e2fsprogs-1.40-flex/lib/e2p/feature.c 2007-07-10 15:21:25.000000000 -0500
@@ -67,6 +67,8 @@ static struct feature feature_list[] = {
"extent" },
{ E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_64BIT,
"64bit" },
+ { E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_FLEX_BG,
+ "flex_bg"},
{ 0, 0, 0 },
};

diff -Naurp e2fsprogs-1.40/lib/ext2fs/alloc_tables.c /home/jsantos/e2fsprogs-1.40-flex/lib/ext2fs/alloc_tables.c
--- e2fsprogs-1.40/lib/ext2fs/alloc_tables.c 2006-09-12 13:56:16.000000000 -0500
+++ /home/jsantos/e2fsprogs-1.40-flex/lib/ext2fs/alloc_tables.c 2007-07-12 09:34:38.988302340 -0500
@@ -102,16 +102,91 @@ errcode_t ext2fs_allocate_group_table(ex



+errcode_t ext2fs_allocate_contiguous(ext2_filsys fs, dgrp_t group,
+ int type, blk_t start_blk, blk_t last_blk,
+ int count, ext2fs_block_bitmap bmap)
+{
+ errcode_t retval;
+ blk_t new_blk, blk;
+ int j,i, start_group;
+
+ if (!bmap)
+ bmap = fs->block_map;
+
+ switch (type) {
+ case 1:
+ if (!fs->group_desc[group].bg_block_bitmap){
+ retval = ext2fs_get_free_blocks(fs, start_blk, last_blk,
+ 1 * count, bmap, &new_blk);
+ if (retval)
+ return retval;
+ for (i=0, blk= new_blk; i < count; i++, blk++){
+ ext2fs_mark_block_bitmap(bmap, blk);
+ fs->group_desc[group+i].bg_block_bitmap = blk;
+ }
+ }
+ break;
+ case 2:
+ if (!fs->group_desc[group].bg_inode_bitmap) {
+ retval = ext2fs_get_free_blocks(fs, start_blk, last_blk,
+ 1 * count, bmap, &new_blk);
+ if (retval)
+ return retval;
+ for (i=0, blk= new_blk; i < count; i++, blk++){
+ ext2fs_mark_block_bitmap(bmap, blk);
+ fs->group_desc[group+i].bg_inode_bitmap = blk;
+ }
+ }
+ break;
+ case 3:
+ for (i=0, blk=new_blk; i < count; i++, blk++){
+ if (!fs->group_desc[group+i].bg_inode_table) {
+ retval = ext2fs_get_free_blocks(fs, start_blk, last_blk,
+ fs->inode_blocks_per_group,
+ bmap, &new_blk);
+ if (retval)
+ return retval;
+ for (j=0, blk = new_blk;
+ j < fs->inode_blocks_per_group;
+ j++, blk++)
+ ext2fs_mark_block_bitmap(bmap, blk);
+ fs->group_desc[group+i].bg_inode_table = blk;
+ }
+ }
+ break;
+ }
+ return 0;
+}
+
errcode_t ext2fs_allocate_tables(ext2_filsys fs)
{
+
errcode_t retval;
+ blk_t start, last;
dgrp_t i;
+ int gpm;

- for (i = 0; i < fs->group_desc_count; i++) {
- retval = ext2fs_allocate_group_table(fs, i, fs->block_map);
+ gpm = 64;
+
+ for (i = 0; i < fs->group_desc_count; i=i+gpm) {
+ start = ext2fs_group_first_block(fs, i);
+ last = ext2fs_group_last_block(fs, i+gpm-1);
+
+ retval = ext2fs_allocate_contiguous(fs, i, 1,
+ start, last, gpm,
+ fs->block_map);
if (retval)
return retval;
+ retval = ext2fs_allocate_contiguous(fs, i, 2,
+ start, last, gpm,
+ fs->block_map);
+ if (retval)
+ return retval;
+ retval = ext2fs_allocate_contiguous(fs, i, 3,
+ start, last, gpm,
+ fs->block_map);
+ if (retval)
+ return retval;
+
}
- return 0;
}
-
diff -Naurp e2fsprogs-1.40/lib/ext2fs/ext2_fs.h /home/jsantos/e2fsprogs-1.40-flex/lib/ext2fs/ext2_fs.h
--- e2fsprogs-1.40/lib/ext2fs/ext2_fs.h 2007-05-31 11:27:31.000000000 -0500
+++ /home/jsantos/e2fsprogs-1.40-flex/lib/ext2fs/ext2_fs.h 2007-07-10 15:23:48.000000000 -0500
@@ -640,6 +640,7 @@ struct ext2_super_block {
#define EXT3_FEATURE_INCOMPAT_EXTENTS 0x0040
#define EXT4_FEATURE_INCOMPAT_64BIT 0x0080
#define EXT4_FEATURE_INCOMPAT_MMP 0x0100
+#define EXT4_FEATURE_INCOMPAT_FLEX_BG 0x0200


#define EXT2_FEATURE_COMPAT_SUPP 0
diff -Naurp e2fsprogs-1.40/lib/ext2fs/ext2fs.h /home/jsantos/e2fsprogs-1.40-flex/lib/ext2fs/ext2fs.h
--- e2fsprogs-1.40/lib/ext2fs/ext2fs.h 2007-06-21 10:59:05.000000000 -0500
+++ /home/jsantos/e2fsprogs-1.40-flex/lib/ext2fs/ext2fs.h 2007-07-12 09:31:22.628464371 -0500
@@ -452,12 +452,14 @@ typedef struct ext2_icount *ext2_icount_
EXT2_FEATURE_INCOMPAT_COMPRESSION|\
EXT3_FEATURE_INCOMPAT_JOURNAL_DEV|\
EXT2_FEATURE_INCOMPAT_META_BG|\
- EXT3_FEATURE_INCOMPAT_RECOVER)
+ EXT3_FEATURE_INCOMPAT_RECOVER|\
+ EXT4_FEATURE_INCOMPAT_FLEX_BG)
#else
#define EXT2_LIB_FEATURE_INCOMPAT_SUPP (EXT2_FEATURE_INCOMPAT_FILETYPE|\
EXT3_FEATURE_INCOMPAT_JOURNAL_DEV|\
EXT2_FEATURE_INCOMPAT_META_BG|\
- EXT3_FEATURE_INCOMPAT_RECOVER)
+ EXT3_FEATURE_INCOMPAT_RECOVER|\
+ EXT4_FEATURE_INCOMPAT_FLEX_BG)
#endif
#define EXT2_LIB_FEATURE_RO_COMPAT_SUPP (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
EXT2_FEATURE_RO_COMPAT_LARGE_FILE)
diff -Naurp e2fsprogs-1.40/misc/mke2fs.c /home/jsantos/e2fsprogs-1.40-flex/misc/mke2fs.c
--- e2fsprogs-1.40/misc/mke2fs.c 2007-05-31 10:28:23.000000000 -0500
+++ /home/jsantos/e2fsprogs-1.40-flex/misc/mke2fs.c 2007-07-12 09:32:27.210692448 -0500
@@ -873,7 +873,8 @@ static __u32 ok_features[3] = {
EXT2_FEATURE_COMPAT_LAZY_BG, /* Compat */
EXT2_FEATURE_INCOMPAT_FILETYPE| /* Incompat */
EXT3_FEATURE_INCOMPAT_JOURNAL_DEV|
- EXT2_FEATURE_INCOMPAT_META_BG,
+ EXT2_FEATURE_INCOMPAT_META_BG|
+ EXT4_FEATURE_INCOMPAT_FLEX_BG,
EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER /* R/O compat */
};
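
Reading the alloc_tables.c hunk above as posted, the inode table case
(case 3) assigns blk to bg_inode_table after the marking loop has run, by
which point blk is one past the end of the table just allocated. The outer
loop also reads new_blk before it is set, group+i is not bounded by the group
count, and ext2fs_allocate_tables() no longer returns 0. The first of these
may be worth ruling out as a source of the fsck complaints; that is a guess
from the posted code, not a confirmed diagnosis. The sketch below is a
self-contained model of the loop with those points addressed. All demo_ names
and the byte-per-block bitmap are invented for illustration and are not
e2fsprogs APIs.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBLOCKS 4096
#define DEMO_NGROUPS 64

/* Toy filesystem state; block 0 doubles as "inode table not set". */
struct demo_fs {
	uint8_t in_use[DEMO_NBLOCKS];		/* nonzero = allocated */
	uint32_t inode_table[DEMO_NGROUPS];	/* per-group table start */
	uint32_t group_count;
	uint32_t inode_blocks_per_group;
};

/* Find num contiguous free blocks in [start, last]; 0 on success. */
int demo_get_free_blocks(const struct demo_fs *fs, uint32_t start,
			 uint32_t last, uint32_t num, uint32_t *ret)
{
	uint32_t run = 0;

	for (uint32_t b = start; b <= last && b < DEMO_NBLOCKS; b++) {
		run = fs->in_use[b] ? 0 : run + 1;
		if (run == num) {
			*ret = b - num + 1;
			return 0;
		}
	}
	return -1;
}

/* Allocate inode tables for up to count groups starting at group. */
int demo_alloc_inode_tables(struct demo_fs *fs, uint32_t group,
			    uint32_t count, uint32_t start, uint32_t last)
{
	assert(fs->group_count <= DEMO_NGROUPS);
	for (uint32_t i = 0; i < count && group + i < fs->group_count; i++) {
		uint32_t new_blk;

		if (fs->inode_table[group + i])
			continue;	/* already placed */
		if (demo_get_free_blocks(fs, start, last,
					 fs->inode_blocks_per_group, &new_blk))
			return -1;
		for (uint32_t j = 0; j < fs->inode_blocks_per_group; j++)
			fs->in_use[new_blk + j] = 1;
		/* Record the first block of the table, not the loop cursor. */
		fs->inode_table[group + i] = new_blk;
	}
	return 0;
}
```

With three groups, 4-block inode tables and block 1 already in use, a search
starting at block 1 places the tables at blocks 2, 6 and 10.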


2007-07-12 15:10:44

by Jose R. Santos

Subject: Re: Initial results of FLEX_BG feature.

On Wed, 11 Jul 2007 18:14:25 -0400
Theodore Tso <[email protected]> wrote:

> On Wed, Jul 11, 2007 at 12:30:04AM -0500, Jose R. Santos wrote:
> > Right now what I've done is allocate the bitmaps and inode tables at the
> > beginning of each group of 64 BG. Still need to work on fsck since just
> > removing the restriction on were the bitmaps and inode table are
> > located still gives me errors of uninitialized inodes with dtime set.
> > Seems like fsck still expect inode information to be located at
> > specific locations within the disk.
>
> Can you send me the patch which you were playing with? I might be
> able to help you with this. It should be pretty straightforward to
> remove the constraint on the inode table location.

Here is the kernel piece.

-JRS

---
fs/ext4/super.c | 3 3 + 0 - 0 !
include/linux/ext4_fs.h | 4 3 + 1 - 0 !
2 files changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/ext4/super.c
===================================================================
--- linux-2.6.orig/fs/ext4/super.c 2007-07-11 15:34:58.000000000 -0500
+++ linux-2.6/fs/ext4/super.c 2007-07-11 16:19:08.000000000 -0500
@@ -1271,6 +1271,9 @@ static int ext4_check_descriptors (struc

ext4_debug ("Checking group descriptors");

+ if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
+ return 1;
+
for (i = 0; i < sbi->s_groups_count; i++)
{
if (i == sbi->s_groups_count - 1)
Index: linux-2.6/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.orig/include/linux/ext4_fs.h 2007-07-11 15:34:58.000000000 -0500
+++ linux-2.6/include/linux/ext4_fs.h 2007-07-12 09:58:51.000000000 -0500
@@ -698,13 +698,15 @@ static inline int ext4_valid_inum(struct
#define EXT4_FEATURE_INCOMPAT_META_BG 0x0010
#define EXT4_FEATURE_INCOMPAT_EXTENTS 0x0040 /* extents support */
#define EXT4_FEATURE_INCOMPAT_64BIT 0x0080
+#define EXT4_FEATURE_INCOMPAT_FLEX_BG 0x0200

#define EXT4_FEATURE_COMPAT_SUPP EXT2_FEATURE_COMPAT_EXT_ATTR
#define EXT4_FEATURE_INCOMPAT_SUPP (EXT4_FEATURE_INCOMPAT_FILETYPE| \
EXT4_FEATURE_INCOMPAT_RECOVER| \
EXT4_FEATURE_INCOMPAT_META_BG| \
EXT4_FEATURE_INCOMPAT_EXTENTS| \
- EXT4_FEATURE_INCOMPAT_64BIT)
+ EXT4_FEATURE_INCOMPAT_64BIT| \
+ EXT4_FEATURE_INCOMPAT_FLEX_BG)
#define EXT4_FEATURE_RO_COMPAT_SUPP (EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \
EXT4_FEATURE_RO_COMPAT_DIR_NLINK | \

2007-07-16 06:35:00

by Andreas Dilger

Subject: Re: Initial results of FLEX_BG feature.

On Jul 12, 2007 10:09 -0500, Jose R. Santos wrote:
> @@ -1271,6 +1271,9 @@ static int ext4_check_descriptors (struc
>
> ext4_debug ("Checking group descriptors");
>
> + if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
> + return 1;
> +
> for (i = 0; i < sbi->s_groups_count; i++)
> {
> if (i == sbi->s_groups_count - 1)

It looks pretty straightforward to just change this code to leave
first_block at s_first_data_block, and leave last_block at ext4_blocks_count()
if FLEX_BG is set.

Even with FLEX_BG we want to keep the group metadata within the bounds of
the filesystem.
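
A minimal sketch of that idea, using simplified stand-in structures rather
than the kernel's: without FLEX_BG the allowed range is narrowed to the
group's own blocks, and with FLEX_BG it stays at the whole filesystem. The
demo_ names are invented, and the last valid block is taken to be
blocks_count - 1. The return convention (1 = OK, 0 = bad) follows the
"return 1" in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified superblock fields. */
struct demo_sb {
	uint64_t first_data_block;
	uint64_t blocks_count;
	uint32_t blocks_per_group;
	uint32_t inode_blocks_per_group;
	int flex_bg;			/* nonzero if FLEX_BG is set */
};

/* Hypothetical, simplified group descriptor. */
struct demo_gd {
	uint64_t block_bitmap;
	uint64_t inode_bitmap;
	uint64_t inode_table;
};

/* Return 1 if the group's metadata lies in the allowed range, else 0. */
int demo_check_descriptor(const struct demo_sb *sb, uint32_t group,
			  const struct demo_gd *gd)
{
	uint64_t first_block = sb->first_data_block;
	uint64_t last_block = sb->blocks_count - 1;

	assert(sb->blocks_count > 0);
	if (!sb->flex_bg) {
		/* Classic rule: metadata must live inside its own group. */
		first_block += (uint64_t)group * sb->blocks_per_group;
		uint64_t group_last = first_block + sb->blocks_per_group - 1;
		if (group_last < last_block)
			last_block = group_last;
	}

	if (gd->block_bitmap < first_block || gd->block_bitmap > last_block)
		return 0;
	if (gd->inode_bitmap < first_block || gd->inode_bitmap > last_block)
		return 0;
	if (gd->inode_table < first_block ||
	    gd->inode_table + sb->inode_blocks_per_group - 1 > last_block)
		return 0;
	return 1;
}
```

A descriptor for group 5 that points at blocks 10, 11 and 12 fails the
classic check but passes with FLEX_BG set, while an inode table that would
run past the end of the filesystem fails either way.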

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-07-16 12:29:04

by Jose R. Santos

Subject: Re: Initial results of FLEX_BG feature.

On Mon, 16 Jul 2007 00:34:57 -0600
Andreas Dilger <[email protected]> wrote:

> On Jul 12, 2007 10:09 -0500, Jose R. Santos wrote:
> > @@ -1271,6 +1271,9 @@ static int ext4_check_descriptors (struc
> >
> > ext4_debug ("Checking group descriptors");
> >
> > + if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
> > + return 1;
> > +
> > for (i = 0; i < sbi->s_groups_count; i++)
> > {
> > if (i == sbi->s_groups_count - 1)
>
> It looks pretty straight forward to just change this code to leave
> first_block at s_first_data_block, and leave last_block at ext4_blocks_count()
> if FLEX_BG is set.

Sure. I'll add that.

> Even with FLEX_BG we want to keep the group metadata within the bounds of
> the filesystem.

Eventually, I want to be able to export the number of groups per flex
group so that we can correctly calculate where the bounds of each block
group's metadata should be.
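
Something along these lines, assuming the groups-per-flex-group count is
available as a plain number (where it would be stored is still undecided, so
groups_per_flex here is just a parameter). The bounds run from the first
block of the first group in the flex group to the last block of its last
group, clamped to the end of the filesystem. The function name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the block range [*first, *last] of the flex group that
 * contains block group "group".
 */
void demo_flex_bounds(uint64_t first_data_block, uint64_t blocks_count,
		      uint32_t blocks_per_group, uint32_t groups_per_flex,
		      uint32_t group, uint64_t *first, uint64_t *last)
{
	uint32_t flex_start;
	uint64_t end;

	assert(groups_per_flex > 0 && blocks_count > 0);
	/* First block group belonging to this flex group. */
	flex_start = group - group % groups_per_flex;

	*first = first_data_block + (uint64_t)flex_start * blocks_per_group;
	end = *first + (uint64_t)groups_per_flex * blocks_per_group - 1;
	/* The last flex group may be short; clamp to the filesystem end. */
	*last = end < blocks_count - 1 ? end : blocks_count - 1;
}
```

With 1000-block groups, 64 groups per flex group and a 1,000,000-block
filesystem, group 70 maps to blocks 64000 through 127999, and group 999 maps
to 960000 through 999999.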

> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>

-JRS