2008-01-09 17:08:02

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH] ext4: Use superblock s_raid_stripe_width as stripe size during block allocation.

The stipe size used during block allocation is calculated as below.
a) if we specify a stripe=<value> option using mount time. Use that value.
b) if not use the value specified in super block.
b) If the value specfied at mount time is greater than blocks per group use
the value specified ini the super block.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/super.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index db1edc8..10330eb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2136,6 +2136,16 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
sbi->s_rsv_window_head.rsv_alloc_hit = 0;
sbi->s_rsv_window_head.rsv_goal_size = 0;
ext4_rsv_window_add(sb, &sbi->s_rsv_window_head);
+ /*
+ * set the stripe size. If we have specified it via mount option, then
+ * use the mount option value. If the value specified at mount time is
+ * greater than the blocks per group use the super block value.
+ * Allocator needs it be less than blocks per group.
+ */
+ if (!sbi->s_stripe ||
+ sbi->s_stripe >= sbi->s_blocks_per_group) {
+ sbi->s_stripe = le32_to_cpu(es->s_raid_stripe_width);
+ }

/*
* set up enough so that it can read an inode
--
1.5.4.rc2.60.gb2e62-dirty


2008-01-09 17:08:02

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH] ext4: Check for return value from sb_set_blocksize

sb_set_blocksize validates whether the specfied block size can be used by
the file system. Make sure we fail mounting the file system if the
blocksize specfied cannot be used.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/super.c | 18 +++++++-----------
1 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 10330eb..ab62d7f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1793,7 +1793,6 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
unsigned long def_mount_opts;
struct inode *root;
int blocksize;
- int hblock;
int db_count;
int i;
int needs_recovery;
@@ -1958,20 +1957,17 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
goto failed_mount;
}

- hblock = bdev_hardsect_size(sb->s_bdev);
if (sb->s_blocksize != blocksize) {
- /*
- * Make sure the blocksize for the filesystem is larger
- * than the hardware sectorsize for the machine.
- */
- if (blocksize < hblock) {
- printk(KERN_ERR "EXT4-fs: blocksize %d too small for "
- "device blocksize %d.\n", blocksize, hblock);
+
+ brelse (bh);
+
+ /* Validate the filesystem blocksize */
+ if (!sb_set_blocksize(sb, blocksize)) {
+ printk(KERN_ERR "EXT4-fs: bad block size %d.\n",
+ blocksize);
goto failed_mount;
}

- brelse (bh);
- sb_set_blocksize(sb, blocksize);
logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
offset = do_div(logical_sb_block, blocksize);
bh = sb_bread(sb, logical_sb_block);
--
1.5.4.rc2.60.gb2e62-dirty

2008-01-09 17:22:57

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: Use superblock s_raid_stripe_width as stripe size during block allocation.

Aneesh Kumar K.V wrote:
> The stipe size used during block allocation is calculated as below.
> a) if we specify a stripe=<value> option using mount time. Use that value.
> b) if not use the value specified in super block.
> b) If the value specfied at mount time is greater than blocks per group use
> the value specified ini the super block.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/super.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index db1edc8..10330eb 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2136,6 +2136,16 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
> sbi->s_rsv_window_head.rsv_alloc_hit = 0;
> sbi->s_rsv_window_head.rsv_goal_size = 0;
> ext4_rsv_window_add(sb, &sbi->s_rsv_window_head);
> + /*
> + * set the stripe size. If we have specified it via mount option, then
> + * use the mount option value. If the value specified at mount time is
> + * greater than the blocks per group use the super block value.
> + * Allocator needs it be less than blocks per group.
> + */
> + if (!sbi->s_stripe ||
> + sbi->s_stripe >= sbi->s_blocks_per_group) {

Is stripe == s_blocks_per_group problematic?

Should we warn/printk in the case that the specified size is too big?

Thanks,

-Eric

> + sbi->s_stripe = le32_to_cpu(es->s_raid_stripe_width);
> + }
>
> /*
> * set up enough so that it can read an inode

2008-01-09 18:21:23

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: Check for return value from sb_set_blocksize

Updated patch. The earlier patch did multiple brelse() during failed
mount case.


ext4: Check for return value from sb_set_blocksize

From: Aneesh Kumar K.V <[email protected]>

sb_set_blocksize validates whether the specfied block size can be used by
the file system. Make sure we fail mounting the file system if the
blocksize specfied cannot be used.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---

fs/ext4/super.c | 15 +++++----------
1 files changed, 5 insertions(+), 10 deletions(-)


diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 10330eb..f9a9ef1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1793,7 +1793,6 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
unsigned long def_mount_opts;
struct inode *root;
int blocksize;
- int hblock;
int db_count;
int i;
int needs_recovery;
@@ -1958,20 +1957,16 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
goto failed_mount;
}

- hblock = bdev_hardsect_size(sb->s_bdev);
if (sb->s_blocksize != blocksize) {
- /*
- * Make sure the blocksize for the filesystem is larger
- * than the hardware sectorsize for the machine.
- */
- if (blocksize < hblock) {
- printk(KERN_ERR "EXT4-fs: blocksize %d too small for "
- "device blocksize %d.\n", blocksize, hblock);
+
+ /* Validate the filesystem blocksize */
+ if (!sb_set_blocksize(sb, blocksize)) {
+ printk(KERN_ERR "EXT4-fs: bad block size %d.\n",
+ blocksize);
goto failed_mount;
}

brelse (bh);
- sb_set_blocksize(sb, blocksize);
logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
offset = do_div(logical_sb_block, blocksize);
bh = sb_bread(sb, logical_sb_block);

2008-01-09 22:02:21

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH] ext4: Use superblock s_raid_stripe_width as stripe size during block allocation.

On Wed, 2008-01-09 at 22:37 +0530, Aneesh Kumar K.V wrote:
> The stipe size used during block allocation is calculated as below.
> a) if we specify a stripe=<value> option using mount time. Use that value.
> b) if not use the value specified in super block.
> b) If the value specfied at mount time is greater than blocks per group use
> the value specified ini the super block.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/super.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index db1edc8..10330eb 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2136,6 +2136,16 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
> sbi->s_rsv_window_head.rsv_alloc_hit = 0;
> sbi->s_rsv_window_head.rsv_goal_size = 0;
> ext4_rsv_window_add(sb, &sbi->s_rsv_window_head);
> + /*
> + * set the stripe size. If we have specified it via mount option, then
> + * use the mount option value. If the value specified at mount time is
> + * greater than the blocks per group use the super block value.
> + * Allocator needs it be less than blocks per group.

Is that still true with flex bg?

> + */
> + if (!sbi->s_stripe ||
> + sbi->s_stripe >= sbi->s_blocks_per_group) {
> + sbi->s_stripe = le32_to_cpu(es->s_raid_stripe_width);
> + }
>
> /*
> * set up enough so that it can read an inode

2008-01-09 23:36:34

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: Use superblock s_raid_stripe_width as stripe size during block allocation.

On Jan 09, 2008 22:37 +0530, Aneesh Kumar K.V wrote:
> The stipe size used during block allocation is calculated as below.
> a) if we specify a stripe=<value> option using mount time. Use that value.
> b) if not use the value specified in super block.
> b) If the value specfied at mount time is greater than blocks per group use
> the value specified ini the super block.

What if the value on disk is also bad? I'd add (after the second 'b' :-):

d) if s_stripe is still > s_blocks_per_group try s_raid_stride
e) if s_stripe is still > s_blocks_per_group use 0

> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/super.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index db1edc8..10330eb 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2136,6 +2136,16 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
> sbi->s_rsv_window_head.rsv_alloc_hit = 0;
> sbi->s_rsv_window_head.rsv_goal_size = 0;
> ext4_rsv_window_add(sb, &sbi->s_rsv_window_head);
> + /*
> + * set the stripe size. If we have specified it via mount option, then
> + * use the mount option value. If the value specified at mount time is
> + * greater than the blocks per group use the super block value.
> + * Allocator needs it be less than blocks per group.
> + */
> + if (!sbi->s_stripe ||
> + sbi->s_stripe >= sbi->s_blocks_per_group) {

Please fix alignment to be just after '(' on previous line.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2008-01-10 04:29:11

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: Use superblock s_raid_stripe_width as stripe size during block allocation.

On Wed, Jan 09, 2008 at 04:36:27PM -0700, Andreas Dilger wrote:
> On Jan 09, 2008 22:37 +0530, Aneesh Kumar K.V wrote:
> > The stipe size used during block allocation is calculated as below.
> > a) if we specify a stripe=<value> option using mount time. Use that value.
> > b) if not use the value specified in super block.
> > b) If the value specfied at mount time is greater than blocks per group use
> > the value specified ini the super block.
>
> What if the value on disk is also bad? I'd add (after the second 'b' :-):
>
> d) if s_stripe is still > s_blocks_per_group try s_raid_stride
> e) if s_stripe is still > s_blocks_per_group use 0

But i guess mke2fs and tune2fs should validate the value of
s_raid_stripe_width and s_raid_stride. Both of them should be less that
blocks per group. Should I add extra check in the kernel for them ?

>
> > Signed-off-by: Aneesh Kumar K.V <[email protected]>
> > ---
> > fs/ext4/super.c | 10 ++++++++++
> > 1 files changed, 10 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index db1edc8..10330eb 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -2136,6 +2136,16 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
> > sbi->s_rsv_window_head.rsv_alloc_hit = 0;
> > sbi->s_rsv_window_head.rsv_goal_size = 0;
> > ext4_rsv_window_add(sb, &sbi->s_rsv_window_head);
> > + /*
> > + * set the stripe size. If we have specified it via mount option, then
> > + * use the mount option value. If the value specified at mount time is
> > + * greater than the blocks per group use the super block value.
> > + * Allocator needs it be less than blocks per group.
> > + */
> > + if (!sbi->s_stripe ||
> > + sbi->s_stripe >= sbi->s_blocks_per_group) {
>

So what do you think should it be > or >=. Looking at the mballoc I
guess it should work with stripe size equal to blocks per group. I am
not sure how efficient the allocation would be though.


-aneesh

2008-01-10 08:22:24

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: Use superblock s_raid_stripe_width as stripe size during block allocation.

On Jan 10, 2008 09:58 +0530, Aneesh Kumar K.V wrote:
> > d) if s_stripe is still > s_blocks_per_group try s_raid_stride
> > e) if s_stripe is still > s_blocks_per_group use 0
>
> But i guess mke2fs and tune2fs should validate the value of
> s_raid_stripe_width and s_raid_stride. Both of them should be less that
> blocks per group. Should I add extra check in the kernel for them ?

It's true that mke2fs and tune2fs should validate this, but it is also
possible to become corrupted, and e2fsck doesn't fix it yet nor can it
make a good estimate of the right value.

> > > + if (!sbi->s_stripe ||
> > > + sbi->s_stripe >= sbi->s_blocks_per_group) {
> >
>
> So what do you think should it be > or >=. Looking at the mballoc I
> guess it should work with stripe size equal to blocks per group. I am
> not sure how efficient the allocation would be though.

I think if s_stripe == s_blocks_per_group that is fine... For 1kB block
filesystem that is only 8MB in size.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.