2011-07-15 21:42:16

by Dan Ehrenberg

Subject: [PATCH v2 1/2] ext4: Preallocation is a multiple of stripe size

Previously, if a stripe width was provided, then it would be used
as the preallocation granularity, with no sanity checking and no
way to override this. Now, mb_prealloc_size defaults to the smallest
multiple of stripe size that is greater than or equal to the old
default mb_prealloc_size, and this can be overridden with the sysfs
interface.

Signed-off-by: Dan Ehrenberg <[email protected]>
---
fs/ext4/mballoc.c | 29 ++++++++++++++++++++---------
1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 6ed859d..754eb29 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -128,12 +128,13 @@
* we are doing a group prealloc we try to normalize the request to
* sbi->s_mb_group_prealloc. Default value of s_mb_group_prealloc is
* 512 blocks. This can be tuned via
- * /sys/fs/ext4/<partition/mb_group_prealloc. The value is represented in
+ * /sys/fs/ext4/<partition>/mb_group_prealloc. The value is represented in
* terms of number of blocks. If we have mounted the file system with -O
* stripe=<value> option the group prealloc request is normalized to the
- * stripe value (sbi->s_stripe)
+ * smallest multiple of the stripe value (sbi->s_stripe) which is
+ * greater than or equal to the default mb_group_prealloc.
*
- * The regular allocator(using the buddy cache) supports few tunables.
+ * The regular allocator (using the buddy cache) supports a few tunables.
*
* /sys/fs/ext4/<partition>/mb_min_to_scan
* /sys/fs/ext4/<partition>/mb_max_to_scan
@@ -2472,6 +2473,18 @@ int ext4_mb_init(struct super_block *sb, int needs_recovery)
sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD;
sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS;
sbi->s_mb_group_prealloc = MB_DEFAULT_GROUP_PREALLOC;
+ /*
+ * If there is a s_stripe > 1, then we set the s_mb_group_prealloc
+ * to the lowest multiple of s_stripe which is bigger than
+ * the s_mb_group_prealloc as determined above. We want
+ * the preallocation size to be an exact multiple of the
+ * RAID stripe size so that preallocations don't fragment
+ * the stripes.
+ */
+ if (sbi->s_stripe > 1) {
+ sbi->s_mb_group_prealloc = roundup(
+ sbi->s_mb_group_prealloc, sbi->s_stripe);
+ }

sbi->s_locality_groups = alloc_percpu(struct ext4_locality_group);
if (sbi->s_locality_groups == NULL) {
@@ -2830,8 +2843,9 @@ out_err:

/*
* here we normalize request for locality group
- * Group request are normalized to s_strip size if we set the same via mount
- * option. If not we set it to s_mb_group_prealloc which can be configured via
+ * Group request are normalized to s_mb_group_prealloc, which goes to
+ * s_strip if we set the same via mount option.
+ * s_mb_group_prealloc can be configured via
* /sys/fs/ext4/<partition>/mb_group_prealloc
*
* XXX: should we try to preallocate more than the group has now?
@@ -2842,10 +2856,7 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac)
struct ext4_locality_group *lg = ac->ac_lg;

BUG_ON(lg == NULL);
- if (EXT4_SB(sb)->s_stripe)
- ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe;
- else
- ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
+ ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
mb_debug(1, "#%u: goal %u blocks for locality group\n",
current->pid, ac->ac_g_ex.fe_len);
}
--
1.7.3.1
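
The rounding rule in this patch can be tried outside the kernel. The following is a minimal sketch, not the kernel code: DEFAULT_GROUP_PREALLOC, round_up_to_multiple() and default_group_prealloc() are hypothetical names chosen for illustration, and the value 512 is taken from the comment in mballoc.c quoted above. The arithmetic is meant to mirror what roundup() does in ext4_mb_init().

```c
#include <assert.h>

/* Illustrative stand-in for MB_DEFAULT_GROUP_PREALLOC (512 blocks,
 * per the mballoc.c comment in the patch). */
#define DEFAULT_GROUP_PREALLOC 512UL

/* Round x up to the nearest multiple of y (y must be non-zero).
 * Hypothetical helper; intended to match the kernel's roundup(). */
static unsigned long round_up_to_multiple(unsigned long x, unsigned long y)
{
	assert(y != 0);
	return ((x + y - 1) / y) * y;
}

/* Hypothetical helper: the default group preallocation size, in blocks,
 * for a given stripe size. A stripe of 0 or 1 leaves the default alone. */
static unsigned long default_group_prealloc(unsigned long stripe)
{
	if (stripe > 1)
		return round_up_to_multiple(DEFAULT_GROUP_PREALLOC, stripe);
	return DEFAULT_GROUP_PREALLOC;
}
```

Under these assumptions, a stripe of 24 blocks gives a default of 528 blocks (22 full stripes), a stripe of 128 leaves it at 512, and a stripe of 1000 raises it to 1000. In each case the result is the smallest multiple of the stripe that is greater than or equal to the old default.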



2011-07-15 21:42:26

by Dan Ehrenberg

Subject: [PATCH v2 2/2] ext4: Ignore a stripe width of 1

If the stripe width was set to 1, then this patch will ignore
that stripe width and ext4 will act as if the stripe width
were 0 with respect to optimizing allocations.

Signed-off-by: Dan Ehrenberg <[email protected]>
---
fs/ext4/super.c | 22 ++++++++++++++++------
1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9ea71aa..0a3745b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2383,17 +2383,27 @@ static unsigned long ext4_get_stripe_size(struct ext4_sb_info *sbi)
unsigned long stride = le16_to_cpu(sbi->s_es->s_raid_stride);
unsigned long stripe_width =
le32_to_cpu(sbi->s_es->s_raid_stripe_width);
+ int ret;

if (sbi->s_stripe && sbi->s_stripe <= sbi->s_blocks_per_group)
- return sbi->s_stripe;
+ ret = sbi->s_stripe;

- if (stripe_width <= sbi->s_blocks_per_group)
- return stripe_width;
+ else if (stripe_width <= sbi->s_blocks_per_group)
+ ret = stripe_width;

- if (stride <= sbi->s_blocks_per_group)
- return stride;
+ else if (stride <= sbi->s_blocks_per_group)
+ ret = stride;
+ else
+ ret = 0;

- return 0;
+ /*
+ * If the stripe width is 1, this makes no sense and
+ * we set it to 0 to turn off stripe handling code.
+ */
+ if (ret <= 1)
+ ret = 0;
+
+ return ret;
}

/* sysfs supprt */
--
1.7.3.1
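
The selection order in ext4_get_stripe_size() after this patch can be sketched as a standalone function. This is an illustration under stated assumptions, not the kernel code: pick_stripe() and its parameter names are hypothetical, and the superblock fields are passed in as plain integers that are assumed to be already converted to host byte order.

```c
/* Hypothetical standalone version of the stripe selection logic.
 *   mount_stripe     - value of the stripe= mount option (0 if unset)
 *   sb_stripe_width  - s_raid_stripe_width from the superblock
 *   sb_stride        - s_raid_stride from the superblock
 *   blocks_per_group - blocks per block group
 * Returns the stripe size to use, or 0 to disable stripe handling. */
static unsigned long pick_stripe(unsigned long mount_stripe,
				 unsigned long sb_stripe_width,
				 unsigned long sb_stride,
				 unsigned long blocks_per_group)
{
	unsigned long ret;

	if (mount_stripe && mount_stripe <= blocks_per_group)
		ret = mount_stripe;
	else if (sb_stripe_width <= blocks_per_group)
		ret = sb_stripe_width;
	else if (sb_stride <= blocks_per_group)
		ret = sb_stride;
	else
		ret = 0;

	/* A stripe of 1 block carries no alignment information,
	 * so treat it the same as "no stripe". */
	if (ret <= 1)
		ret = 0;

	return ret;
}
```

With 32768 blocks per group, pick_stripe(16, 64, 8, 32768) returns 16 because the mount option wins, pick_stripe(0, 64, 8, 32768) falls back to the superblock stripe width of 64, and pick_stripe(0, 1, 8, 32768) returns 0 because the selected width of 1 is now ignored.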


2011-07-15 21:47:16

by Eric Sandeen

Subject: Re: [PATCH v2 2/2] ext4: Ignore a stripe width of 1

On 7/15/11 4:41 PM, Dan Ehrenberg wrote:
> If the stripe width was set to 1, then this patch will ignore
> that stripe width and ext4 will act as if the stripe width
> were 0 with respect to optimizing allocations.
>
> Signed-off-by: Dan Ehrenberg <[email protected]>

Thanks, I think this makes sense.

Reviewed-by: Eric Sandeen <[email protected]>

> ---
> fs/ext4/super.c | 22 ++++++++++++++++------
> 1 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9ea71aa..0a3745b 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2383,17 +2383,27 @@ static unsigned long ext4_get_stripe_size(struct ext4_sb_info *sbi)
> unsigned long stride = le16_to_cpu(sbi->s_es->s_raid_stride);
> unsigned long stripe_width =
> le32_to_cpu(sbi->s_es->s_raid_stripe_width);
> + int ret;
>
> if (sbi->s_stripe && sbi->s_stripe <= sbi->s_blocks_per_group)
> - return sbi->s_stripe;
> + ret = sbi->s_stripe;
>
> - if (stripe_width <= sbi->s_blocks_per_group)
> - return stripe_width;
> + else if (stripe_width <= sbi->s_blocks_per_group)
> + ret = stripe_width;
>
> - if (stride <= sbi->s_blocks_per_group)
> - return stride;
> + else if (stride <= sbi->s_blocks_per_group)
> + ret = stride;
> + else
> + ret = 0;
>
> - return 0;
> + /*
> + * If the stripe width is 1, this makes no sense and
> + * we set it to 0 to turn off stripe handling code.
> + */
> + if (ret <= 1)
> + ret = 0;
> +
> + return ret;
> }
>
> /* sysfs supprt */


2011-07-18 01:13:18

by Theodore Ts'o

Subject: Re: [PATCH v2 1/2] ext4: Preallocation is a multiple of stripe size

On Fri, Jul 15, 2011 at 02:41:54PM -0700, Dan Ehrenberg wrote:
> Previously, if a stripe width was provided, then it would be used
> as the preallocation granularity, with no sanity checking and no
> way to override this. Now, mb_prealloc_size defaults to the smallest
> multiple of stripe size that is greater than or equal to the old
> default mb_prealloc_size, and this can be overridden with the sysfs
> interface.
>
> Signed-off-by: Dan Ehrenberg <[email protected]>

Added to the ext4 tree, thanks!

- Ted


2011-07-18 01:21:05

by Theodore Ts'o

Subject: Re: [PATCH v2 2/2] ext4: Ignore a stripe width of 1

On Fri, Jul 15, 2011 at 02:41:55PM -0700, Dan Ehrenberg wrote:
> If the stripe width was set to 1, then this patch will ignore
> that stripe width and ext4 will act as if the stripe width
> were 0 with respect to optimizing allocations.
>
> Signed-off-by: Dan Ehrenberg <[email protected]>

Applied to the ext4 tree. I did make one formatting change. Please
don't have blank lines between the if and else clauses, like this:

	if (sbi->s_stripe && sbi->s_stripe <= sbi->s_blocks_per_group)
		ret = sbi->s_stripe;

	else if (stripe_width <= sbi->s_blocks_per_group)
		ret = stripe_width;

	else if (stride <= sbi->s_blocks_per_group)
		ret = stride;

it wastes vertical whitespace and makes the control flow harder to
follow. Eliminate the blank lines, and it's easier to read, I think.

	if (sbi->s_stripe && sbi->s_stripe <= sbi->s_blocks_per_group)
		ret = sbi->s_stripe;
	else if (stripe_width <= sbi->s_blocks_per_group)
		ret = stripe_width;
	else if (stride <= sbi->s_blocks_per_group)
		ret = stride;

- Ted