2011-02-09 05:52:36

by Tao Ma

[permalink] [raw]
Subject: [PATCH 0/4] EXT4 trim bug fixes and improvement.

Hi Ted and the list,
Here are 4 patches for trim support in ext4.
Patch 1/4 is a fix for the 'len' overflow. As we have decided that
we don't adjust start to s_first_data_block, this patch is still needed.
Patch 2/4 is a speedup for trim in one group.
Patch 3/4 adds trace points for the functions used in FITRIM.
Patch 4/4 is a speedup for trim in the whole volume. For more
details, please check the commit log.

Regards,
Tao


2011-02-09 05:57:54

by Tao Ma

[permalink] [raw]
Subject: [PATCH 1/4] ext4: fix trim length underflow with small trim length.

From: Tao Ma <[email protected]>

In 0f0a25b, we adjust 'len' with s_first_data_block - start, but
it could underflow in case blocksize=1K, fstrim_range.len=512 and
fstrim_range.start = 0. In this case, when we run the code:
len -= first_data_blk - start; len will be underflow to -1ULL.
In the end, although we are safe that last_group check later will limit
the trim to the whole volume, but that isn't what the user really want.

So this patch fix it. It also adds the check for 'start' like ext3 so that
we can break immediately if the start is invalid.

Signed-off-by: Tao Ma <[email protected]>
---
fs/ext4/mballoc.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 02cff4a..94e9f23 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4839,6 +4839,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)

if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
return -EINVAL;
+ if (start >= ext4_blocks_count(EXT4_SB(sb)->s_es) ||
+ start + len <= first_data_blk)
+ goto out;
if (start < first_data_blk) {
len -= first_data_blk - start;
start = first_data_blk;
@@ -4883,5 +4886,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
}
range->len = trimmed * sb->s_blocksize;

+out:
return ret;
}
--
1.6.3.GIT


2011-02-09 05:58:13

by Tao Ma

[permalink] [raw]
Subject: [PATCH 2/4] ext4: speed up group trim with the right free block count.

From: Tao Ma <[email protected]>

When we trim some free blocks in a group of ext4, we should
calculate the free blocks properly and check whether there are
enough freed blocks left for us to trim. Current solution will
only calculate free spaces if they are large for a trim which
isn't appropriate.

Let us see a small example:
a group has 1.5M free which are 300k, 300k, 300k, 300k, 300k.
And minblocks is 1M. With current solution, we have to iterate
the whole group since these 300k will never be subtracted from
1.5M. But actually we should exit after we find the first 2
free spaces since the left 3 chunks only sum up to 900K if we
subtract the first 600K although they can't be trimed.

Signed-off-by: Tao Ma <[email protected]>
---
fs/ext4/mballoc.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 94e9f23..806adb2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4757,7 +4757,7 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
ext4_grpblk_t start, ext4_grpblk_t max, ext4_grpblk_t minblocks)
{
void *bitmap;
- ext4_grpblk_t next, count = 0;
+ ext4_grpblk_t next, count = 0, free_count = 0;
ext4_group_t group;
int ret = 0;

@@ -4782,6 +4782,7 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
break;
count += next - start;
}
+ free_count += next - start;
start = next + 1;

if (fatal_signal_pending(current)) {
@@ -4795,7 +4796,7 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
ext4_lock_group(sb, group);
}

- if ((e4b->bd_info->bb_free - count) < minblocks)
+ if ((e4b->bd_info->bb_free - free_count) < minblocks)
break;
}
ext4_unlock_group(sb, group);
--
1.6.3.GIT


2011-02-09 05:58:16

by Tao Ma

[permalink] [raw]
Subject: [PATCH 3/4] ext4: Add new ext4 trim tracepoints

From: Tao Ma <[email protected]>

Add ext4_trim_extent and ext4_trim_all_free.

Signed-off-by: Tao Ma <[email protected]>
---
fs/ext4/mballoc.c | 5 ++++
include/trace/events/ext4.h | 49 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 806adb2..4eadac8 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4715,6 +4715,8 @@ static int ext4_trim_extent(struct super_block *sb, int start, int count,
struct ext4_free_extent ex;
int ret = 0;

+ trace_ext4_trim_extent(sb, group, start, count);
+
assert_spin_locked(ext4_group_lock_ptr(sb, group));

ex.fe_start = start;
@@ -4767,8 +4769,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
group = e4b->bd_group;
start = (e4b->bd_info->bb_first_free > start) ?
e4b->bd_info->bb_first_free : start;
+
ext4_lock_group(sb, group);

+ trace_ext4_trim_all_free(sb, group, start, max);
+
while (start < max) {
start = mb_find_next_zero_bit(bitmap, max, start);
if (start >= max)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index e5e345f..b8e6aeb 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1166,6 +1166,55 @@ DEFINE_EVENT(ext4__bitmap_load, ext4_mb_buddy_bitmap_load,
TP_ARGS(sb, group)
);

+DECLARE_EVENT_CLASS(ext4__trim,
+ TP_PROTO(struct super_block *sb,
+ ext4_group_t group,
+ ext4_grpblk_t start,
+ ext4_grpblk_t len),
+
+ TP_ARGS(sb, group, start, len),
+
+ TP_STRUCT__entry(
+ __field( int, dev_major )
+ __field( int, dev_minor )
+ __field( __u32, group )
+ __field( int, start )
+ __field( int, len )
+ ),
+
+ TP_fast_assign(
+ __entry->dev_major = MAJOR(sb->s_dev);
+ __entry->dev_minor = MINOR(sb->s_dev);
+ __entry->group = group;
+ __entry->start = start;
+ __entry->len = len;
+ ),
+
+ TP_printk("dev %d,%d group %u, start %d, len %d",
+ __entry->dev_major, __entry->dev_minor,
+ __entry->group, __entry->start, __entry->len)
+);
+
+DEFINE_EVENT(ext4__trim, ext4_trim_extent,
+
+ TP_PROTO(struct super_block *sb,
+ ext4_group_t group,
+ ext4_grpblk_t start,
+ ext4_grpblk_t len),
+
+ TP_ARGS(sb, group, start, len)
+);
+
+DEFINE_EVENT(ext4__trim, ext4_trim_all_free,
+
+ TP_PROTO(struct super_block *sb,
+ ext4_group_t group,
+ ext4_grpblk_t start,
+ ext4_grpblk_t len),
+
+ TP_ARGS(sb, group, start, len)
+);
+
#endif /* _TRACE_EXT4_H */

/* This part must be outside protection */
--
1.6.3.GIT


2011-02-09 05:58:19

by Tao Ma

[permalink] [raw]
Subject: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.

From: Tao Ma <[email protected]>

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state. The idea
is inspired by Andreas and Lukas Czerner.

Cc: Andreas Dilger <[email protected]>
Cc: Lukas Czerner <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
---
fs/ext4/ext4.h | 8 +++++++-
fs/ext4/mballoc.c | 22 ++++++++++++++++++++++
2 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c8d97b..684f48e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1200,6 +1200,9 @@ struct ext4_sb_info {
struct ext4_li_request *s_li_request;
/* Wait multiplier for lazy initialization thread */
unsigned int s_li_wait_mult;
+
+ /* record the last minlen when FITRIM is called. */
+ u64 trim_minlen;
};

static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1970,10 +1973,13 @@ struct ext4_group_info {
* 5 free 8-block regions. */
};

-#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
+#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED 1

#define EXT4_MB_GRP_NEED_INIT(grp) \
(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp) \
+ (test_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED, &((grp)->bb_state)))

#define EXT4_MAX_CONTENTION 8
#define EXT4_CONTENTION_THRESHOLD 2
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4eadac8..3f6b809 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
mb_check_buddy(e4b);

+ if (!ret)
+ EXT4_SB(sb)->trim_minlen = minlen;
+
return ret;
}

@@ -2687,6 +2690,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
rb_erase(&entry->node, &(db->bb_free_root));
mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);

+ /*
+ * Clear the trimmed flag for the group so that the next
+ * ext4_trim_fs can trim it.
+ * If the volume is mounted with -o discard, online discard
+ * is supported and the free blocks will be trimmed online.
+ */
+ if (!test_opt(sb, DISCARD))
+ clear_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
+ &(db->bb_state));
+
if (!db->bb_free_root.rb_node) {
/* No more items in the per group rb tree
* balance refcounts from ext4_mb_free_metadata()
@@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,

ext4_lock_group(sb, group);

+ if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
+ minblocks >= EXT4_SB(sb)->trim_minlen)
+ goto out;
+
trace_ext4_trim_all_free(sb, group, start, max);

while (start < max) {
@@ -4804,6 +4821,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
if ((e4b->bd_info->bb_free - free_count) < minblocks)
break;
}
+
+ if (!ret)
+ set_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
+ &(e4b->bd_info->bb_state));
+out:
ext4_unlock_group(sb, group);

ext4_debug("trimmed %d blocks in the group %d\n",
--
1.6.3.GIT


2011-02-09 14:01:52

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.

On Wed, 9 Feb 2011, Tao Ma wrote:

> From: Tao Ma <[email protected]>
>
> In ext4, when FITRIM is called every time, we iterate all the
> groups and do trim one by one. It is a bit time wasting if the
> group has been trimmed and there is no change since the last
> trim.
>
> So this patch adds a new flag in ext4_group_info->bb_state to
> indicate that the group has been trimmed, and it will be cleared
> if some blocks is freed(in release_blocks_on_commit). Another
> trim_minlen is added in ext4_sb_info to record the last minlen
> we use to trim the volume, so that if the caller provide a small
> one, we will go on the trim regardless of the bb_state. The idea
> is inspired by Andreas and Lukas Czerner.

Hi Tao,

This is great, thanks for the patch! Do you have any benchmark numbers
showing how much does it improve FSTRIM time ? It seems it might help a
lot. Couple of comments bellow.

>
> Cc: Andreas Dilger <[email protected]>
> Cc: Lukas Czerner <[email protected]>
> Signed-off-by: Tao Ma <[email protected]>
> ---
> fs/ext4/ext4.h | 8 +++++++-
> fs/ext4/mballoc.c | 22 ++++++++++++++++++++++
> 2 files changed, 29 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 0c8d97b..684f48e 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
> struct ext4_li_request *s_li_request;
> /* Wait multiplier for lazy initialization thread */
> unsigned int s_li_wait_mult;
> +
> + /* record the last minlen when FITRIM is called. */
> + u64 trim_minlen;

Maybe this is a bit nitpicking, but it is not very clear what is the
"trim_minlen" for. I would rather use something like
"last_trim_minblks", or similar.

> };
>
> static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
> * 5 free 8-block regions. */
> };
>
> -#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED 1
>
> #define EXT4_MB_GRP_NEED_INIT(grp) \
> (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp) \
> + (test_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED, &((grp)->bb_state)))
>
> #define EXT4_MAX_CONTENTION 8
> #define EXT4_CONTENTION_THRESHOLD 2
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 4eadac8..3f6b809 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
> mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
> mb_check_buddy(e4b);
>
> + if (!ret)
> + EXT4_SB(sb)->trim_minlen = minlen;
> +

Did you even tried to compile it ? "minlen" is not defined in
mb_mark_used(), also it does make sense to set trim_minlen there. Did
not you meant it to be in ext4_trim_fs ?

> return ret;
> }
>
> @@ -2687,6 +2690,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
> rb_erase(&entry->node, &(db->bb_free_root));
> mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>
> + /*
> + * Clear the trimmed flag for the group so that the next
> + * ext4_trim_fs can trim it.
> + * If the volume is mounted with -o discard, online discard
> + * is supported and the free blocks will be trimmed online.
> + */
> + if (!test_opt(sb, DISCARD))
> + clear_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
> + &(db->bb_state));
> +
> if (!db->bb_free_root.rb_node) {
> /* No more items in the per group rb tree
> * balance refcounts from ext4_mb_free_metadata()
> @@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>
> ext4_lock_group(sb, group);
>
> + if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
> + minblocks >= EXT4_SB(sb)->trim_minlen)
> + goto out;
> +

I think this can be placed in the ext4_trim_fs() in the for() loop so we can
avoid unnecessary call to ext4_trim_all_free.

> trace_ext4_trim_all_free(sb, group, start, max);
>
> while (start < max) {
> @@ -4804,6 +4821,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
> if ((e4b->bd_info->bb_free - free_count) < minblocks)
> break;
> }
> +
> + if (!ret)
> + set_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
> + &(e4b->bd_info->bb_state));
> +out:
> ext4_unlock_group(sb, group);
>
> ext4_debug("trimmed %d blocks in the group %d\n",
>

Thanks!
-Lukas

2011-02-09 19:25:25

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.

On 2011-02-09, at 06:01, Lukas Czerner wrote:
>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>> struct ext4_li_request *s_li_request;
>> /* Wait multiplier for lazy initialization thread */
>> unsigned int s_li_wait_mult;
>> +
>> + /* record the last minlen when FITRIM is called. */
>> + u64 trim_minlen;
>
> Maybe this is a bit nitpicking, but it is not very clear what is the
> "trim_minlen" for. I would rather use something like
> "last_trim_minblks", or similar.

And for good measure, this should use the "s_" prefix to indicate it is in the superblock.

>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
>> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED 1

It would also be good if the names of these fields were consistent, like

+#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT 1

>> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>> mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>> mb_check_buddy(e4b);
>>
>> + if (!ret)
>> + EXT4_SB(sb)->trim_minlen = minlen;
>> +
>
> Did you even tried to compile it ? "minlen" is not defined in
> mb_mark_used(), also it does make sense to set trim_minlen there. Did
> not you meant it to be in ext4_trim_fs ?

I was wondering exactly the same thing, but I haven't had time to look at the code.

Cheers, Andreas






2011-02-10 01:37:15

by Tao Ma

[permalink] [raw]
Subject: Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.

On 02/09/2011 10:01 PM, Lukas Czerner wrote:
> On Wed, 9 Feb 2011, Tao Ma wrote:
>
>
>> From: Tao Ma<[email protected]>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state. The idea
>> is inspired by Andreas and Lukas Czerner.
>>
> Hi Tao,
>
> This is great, thanks for the patch! Do you have any benchmark numbers
> showing how much does it improve FSTRIM time ? It seems it might help a
> lot. Couple of comments bellow.
>
I have a intel x25m ssd. I will try to collect some data about it and
update them in the commit log.
>
>> Cc: Andreas Dilger<[email protected]>
>> Cc: Lukas Czerner<[email protected]>
>> Signed-off-by: Tao Ma<[email protected]>
>> ---
>> fs/ext4/ext4.h | 8 +++++++-
>> fs/ext4/mballoc.c | 22 ++++++++++++++++++++++
>> 2 files changed, 29 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 0c8d97b..684f48e 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>> struct ext4_li_request *s_li_request;
>> /* Wait multiplier for lazy initialization thread */
>> unsigned int s_li_wait_mult;
>> +
>> + /* record the last minlen when FITRIM is called. */
>> + u64 trim_minlen;
>>
> Maybe this is a bit nitpicking, but it is not very clear what is the
> "trim_minlen" for. I would rather use something like
> "last_trim_minblks", or similar.
>
ok, I will change it in the next round.
>
>> };
>>
>> static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
>> * 5 free 8-block regions. */
>> };
>>
>> -#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
>> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED 1
>>
>> #define EXT4_MB_GRP_NEED_INIT(grp) \
>> (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,&((grp)->bb_state)))
>> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp) \
>> + (test_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,&((grp)->bb_state)))
>>
>> #define EXT4_MAX_CONTENTION 8
>> #define EXT4_CONTENTION_THRESHOLD 2
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 4eadac8..3f6b809 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>> mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>> mb_check_buddy(e4b);
>>
>> + if (!ret)
>> + EXT4_SB(sb)->trim_minlen = minlen;
>> +
>>
> Did you even tried to compile it ? "minlen" is not defined in
> mb_mark_used(), also it does make sense to set trim_minlen there. Did
> not you meant it to be in ext4_trim_fs ?
>
oh, sorry, this line should be in the function ext4_trim_fs. I just
rebased my branch to the lastest kernel to see whether it conflicts.
I don't know why they are moved here. So the right position is:
@@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct
fstrim_range *range)
}
range->len = trimmed * sb->s_blocksize;

+ if (!ret)
+ EXT4_SB(sb)->trim_minlen = minlen;
+
out:
return ret;
}

Sorry for the trouble.
>
>> return ret;
>> }
>>
>> @@ -2687,6 +2690,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>> rb_erase(&entry->node,&(db->bb_free_root));
>> mb_free_blocks(NULL,&e4b, entry->start_blk, entry->count);
>>
>> + /*
>> + * Clear the trimmed flag for the group so that the next
>> + * ext4_trim_fs can trim it.
>> + * If the volume is mounted with -o discard, online discard
>> + * is supported and the free blocks will be trimmed online.
>> + */
>> + if (!test_opt(sb, DISCARD))
>> + clear_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
>> + &(db->bb_state));
>> +
>> if (!db->bb_free_root.rb_node) {
>> /* No more items in the per group rb tree
>> * balance refcounts from ext4_mb_free_metadata()
>> @@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>>
>> ext4_lock_group(sb, group);
>>
>> + if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info)&&
>> + minblocks>= EXT4_SB(sb)->trim_minlen)
>> + goto out;
>> +
>>
> I think this can be placed in the ext4_trim_fs() in the for() loop so we can
> avoid unnecessary call to ext4_trim_all_free.
>
fair enough. I will update it.

Regards,
Tao

2011-02-10 01:39:20

by Tao Ma

[permalink] [raw]
Subject: Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.

On 02/10/2011 03:25 AM, Andreas Dilger wrote:
> On 2011-02-09, at 06:01, Lukas Czerner wrote:
>
>>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>>> struct ext4_li_request *s_li_request;
>>> /* Wait multiplier for lazy initialization thread */
>>> unsigned int s_li_wait_mult;
>>> +
>>> + /* record the last minlen when FITRIM is called. */
>>> + u64 trim_minlen;
>>>
>> Maybe this is a bit nitpicking, but it is not very clear what is the
>> "trim_minlen" for. I would rather use something like
>> "last_trim_minblks", or similar.
>>
> And for good measure, this should use the "s_" prefix to indicate it is in the superblock.
>
ok.
>
>>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
>>> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED 1
>>>
> It would also be good if the names of these fields were consistent, like
>
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT 1
>
thanks for the advice.
>
>>> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>>> mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>>> mb_check_buddy(e4b);
>>>
>>> + if (!ret)
>>> + EXT4_SB(sb)->trim_minlen = minlen;
>>> +
>>>
>> Did you even tried to compile it ? "minlen" is not defined in
>> mb_mark_used(), also it does make sense to set trim_minlen there. Did
>> not you meant it to be in ext4_trim_fs ?
>>
> I was wondering exactly the same thing, but I haven't had time to look at the code.
>
Sorry, it seems to be caused by my rebase. The right place should be in
the end of ext4_trim_fs.

Regards,
Tao
> Cheers, Andreas
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2011-02-10 03:56:44

by Tao Ma

[permalink] [raw]
Subject: Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.

Hi Lukas,
On 02/09/2011 10:01 PM, Lukas Czerner wrote:
> On Wed, 9 Feb 2011, Tao Ma wrote:
>
>
>> From: Tao Ma<[email protected]>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state. The idea
>> is inspired by Andreas and Lukas Czerner.
>>
> Hi Tao,
>
> This is great, thanks for the patch! Do you have any benchmark numbers
> showing how much does it improve FSTRIM time ? It seems it might help a
> lot. Couple of comments bellow.
>
<snip>
>
>> @@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>>
>> ext4_lock_group(sb, group);
>>
>> + if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info)&&
>> + minblocks>= EXT4_SB(sb)->trim_minlen)
>> + goto out;
>> +
>>
> I think this can be placed in the ext4_trim_fs() in the for() loop so we can
> avoid unnecessary call to ext4_trim_all_free.
>
I remembered why I put the check here. We have to check this flag with
the group locked. So in ext4_trim_fs, we don't lock the group and it is
unsafe to check it there since we may free some blocks and clear the bit
in the same time in release_blocks_on_commit.

Regards,
Tao

2011-02-10 07:34:59

by Tao Ma

[permalink] [raw]
Subject: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.

From: Tao Ma <[email protected]>

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb2 108G 35G 68G 34% /mnt/ext4
Block size: 4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m4.039s
user 0m0.000s
sys 0m1.020s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m3.577s
user 0m0.001s
sys 0m1.004s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m3.380s
user 0m0.000s
sys 0m0.991s

with the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m3.466s
user 0m0.000s
sys 0m0.966s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m0.001s
user 0m0.000s
sys 0m0.001s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m0.001s
user 0m0.000s
sys 0m0.000s

A big improvement for the 2nd and 3rd run.

After I delete some big image files and re-run the trim,
it is still much faster than iterating the whole disk.
/dev/sdb2 108G 25G 78G 24% /mnt/ext4

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m0.513s
user 0m0.000s
sys 0m0.069s

Cc: Andreas Dilger <[email protected]>
Cc: Lukas Czerner <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
---
fs/ext4/ext4.h | 8 +++++++-
fs/ext4/mballoc.c | 22 ++++++++++++++++++++++
2 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c8d97b..1d59a63 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1200,6 +1200,9 @@ struct ext4_sb_info {
struct ext4_li_request *s_li_request;
/* Wait multiplier for lazy initialization thread */
unsigned int s_li_wait_mult;
+
+ /* record the last minlen when FITRIM is called. */
+ u64 s_last_trim_minblks;
};

static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1970,10 +1973,13 @@ struct ext4_group_info {
* 5 free 8-block regions. */
};

-#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT 1

#define EXT4_MB_GRP_NEED_INIT(grp) \
(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp) \
+ (test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))

#define EXT4_MAX_CONTENTION 8
#define EXT4_CONTENTION_THRESHOLD 2
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4eadac8..c7aa094 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2687,6 +2687,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
rb_erase(&entry->node, &(db->bb_free_root));
mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);

+ /*
+ * Clear the trimmed flag for the group so that the next
+ * ext4_trim_fs can trim it.
+ * If the volume is mounted with -o discard, online discard
+ * is supported and the free blocks will be trimmed online.
+ */
+ if (!test_opt(sb, DISCARD))
+ clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
+ &(db->bb_state));
+
if (!db->bb_free_root.rb_node) {
/* No more items in the per group rb tree
* balance refcounts from ext4_mb_free_metadata()
@@ -4772,6 +4782,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,

ext4_lock_group(sb, group);

+ if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
+ minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
+ goto out;
+
trace_ext4_trim_all_free(sb, group, start, max);

while (start < max) {
@@ -4804,6 +4818,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
if ((e4b->bd_info->bb_free - free_count) < minblocks)
break;
}
+
+ if (!ret)
+ set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
+ &(e4b->bd_info->bb_state));
+out:
ext4_unlock_group(sb, group);

ext4_debug("trimmed %d blocks in the group %d\n",
@@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
}
range->len = trimmed * sb->s_blocksize;

+ if (!ret)
+ EXT4_SB(sb)->s_last_trim_minblks = minlen;
+
out:
return ret;
}
--
1.6.3.GIT


2011-02-10 11:25:18

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.

On Thu, 10 Feb 2011, Tao Ma wrote:

> From: Tao Ma <[email protected]>
>
> In ext4, when FITRIM is called every time, we iterate all the
> groups and do trim one by one. It is a bit time wasting if the
> group has been trimmed and there is no change since the last
> trim.
>
> So this patch adds a new flag in ext4_group_info->bb_state to
> indicate that the group has been trimmed, and it will be cleared
> if some blocks is freed(in release_blocks_on_commit). Another
> trim_minlen is added in ext4_sb_info to record the last minlen
> we use to trim the volume, so that if the caller provide a small
> one, we will go on the trim regardless of the bb_state.
>
> A simple test with my intel x25m ssd:
> df -h shows:
> /dev/sdb2 108G 35G 68G 34% /mnt/ext4
> Block size: 4096
>
> run the FITRIM with the following parameter:
> range.start = 0;
> range.len = UINT64_MAX;
> range.minlen = 1048576;
>
> without the patch:
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real 0m4.039s
> user 0m0.000s
> sys 0m1.020s
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real 0m3.577s
> user 0m0.001s
> sys 0m1.004s
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real 0m3.380s
> user 0m0.000s
> sys 0m0.991s
>
> with the patch:
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real 0m3.466s
> user 0m0.000s
> sys 0m0.966s
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real 0m0.001s
> user 0m0.000s
> sys 0m0.001s
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real 0m0.001s
> user 0m0.000s
> sys 0m0.000s
>
> A big improvement for the 2nd and 3rd run.
>
> After I delete some big image files and re-run the trim,
> it is still much faster than iterating the whole disk.
> /dev/sdb2 108G 25G 78G 24% /mnt/ext4
>
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real 0m0.513s
> user 0m0.000s
> sys 0m0.069s

Great it looks really good.

>
> Cc: Andreas Dilger <[email protected]>
> Cc: Lukas Czerner <[email protected]>
> Signed-off-by: Tao Ma <[email protected]>
> ---
> fs/ext4/ext4.h | 8 +++++++-
> fs/ext4/mballoc.c | 22 ++++++++++++++++++++++
> 2 files changed, 29 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 0c8d97b..1d59a63 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
> struct ext4_li_request *s_li_request;
> /* Wait multiplier for lazy initialization thread */
> unsigned int s_li_wait_mult;
> +
> + /* record the last minlen when FITRIM is called. */
> + u64 s_last_trim_minblks;
> };
>
> static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
> * 5 free 8-block regions. */
> };
>
> -#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT 1
>
> #define EXT4_MB_GRP_NEED_INIT(grp) \
> (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp) \
> + (test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>
> #define EXT4_MAX_CONTENTION 8
> #define EXT4_CONTENTION_THRESHOLD 2
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 4eadac8..c7aa094 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2687,6 +2687,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
> rb_erase(&entry->node, &(db->bb_free_root));
> mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>
> + /*
> + * Clear the trimmed flag for the group so that the next
> + * ext4_trim_fs can trim it.
> + * If the volume is mounted with -o discard, online discard
> + * is supported and the free blocks will be trimmed online.
> + */
> + if (!test_opt(sb, DISCARD))
> + clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
> + &(db->bb_state));
> +
> if (!db->bb_free_root.rb_node) {
> /* No more items in the per group rb tree
> * balance refcounts from ext4_mb_free_metadata()
> @@ -4772,6 +4782,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>
> ext4_lock_group(sb, group);
>
> + if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
> + minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
> + goto out;
> +
> trace_ext4_trim_all_free(sb, group, start, max);
>
> while (start < max) {
> @@ -4804,6 +4818,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
> if ((e4b->bd_info->bb_free - free_count) < minblocks)
> break;
> }
> +
> + if (!ret)
> + set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
> + &(e4b->bd_info->bb_state));
> +out:
> ext4_unlock_group(sb, group);
>
> ext4_debug("trimmed %d blocks in the group %d\n",
> @@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
> }
> range->len = trimmed * sb->s_blocksize;
>
> + if (!ret)
> + EXT4_SB(sb)->s_last_trim_minblks = minlen;
> +

Since this is not protected by any lock, would not it race in case of
multiple FITRIM calls ?

> out:
> return ret;
> }
>

Thanks!
-Lukas

2011-02-10 14:58:16

by Tao Ma

[permalink] [raw]
Subject: Re: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.

> On Thu, 10 Feb 2011, Tao Ma wrote:
>
>> From: Tao Ma <[email protected]>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state.
>>
>> A simple test with my intel x25m ssd:
>> df -h shows:
>> /dev/sdb2 108G 35G 68G 34% /mnt/ext4
>> Block size: 4096
>>
>> run the FITRIM with the following parameter:
>> range.start = 0;
>> range.len = UINT64_MAX;
>> range.minlen = 1048576;
>>
>> without the patch:
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real 0m4.039s
>> user 0m0.000s
>> sys 0m1.020s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real 0m3.577s
>> user 0m0.001s
>> sys 0m1.004s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real 0m3.380s
>> user 0m0.000s
>> sys 0m0.991s
>>
>> with the patch:
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real 0m3.466s
>> user 0m0.000s
>> sys 0m0.966s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real 0m0.001s
>> user 0m0.000s
>> sys 0m0.001s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real 0m0.001s
>> user 0m0.000s
>> sys 0m0.000s
>>
>> A big improvement for the 2nd and 3rd run.
>>
>> After I delete some big image files and re-run the trim,
>> it is still much faster than iterating the whole disk.
>> /dev/sdb2 108G 25G 78G 24% /mnt/ext4
>>
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real 0m0.513s
>> user 0m0.000s
>> sys 0m0.069s
>
> Great it looks really good.
>
>>
>> Cc: Andreas Dilger <[email protected]>
>> Cc: Lukas Czerner <[email protected]>
>> Signed-off-by: Tao Ma <[email protected]>
>> ---
>> fs/ext4/ext4.h | 8 +++++++-
>> fs/ext4/mballoc.c | 22 ++++++++++++++++++++++
>> 2 files changed, 29 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 0c8d97b..1d59a63 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>> struct ext4_li_request *s_li_request;
>> /* Wait multiplier for lazy initialization thread */
>> unsigned int s_li_wait_mult;
>> +
>> + /* record the last minlen when FITRIM is called. */
>> + u64 s_last_trim_minblks;
>> };
>>
>> static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
>> * 5 free 8-block regions. */
>> };
>>
>> -#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
>> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT 1
>>
>> #define EXT4_MB_GRP_NEED_INIT(grp) \
>> (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
>> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp) \
>> + (test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>>
>> #define EXT4_MAX_CONTENTION 8
>> #define EXT4_CONTENTION_THRESHOLD 2
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 4eadac8..c7aa094 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2687,6 +2687,16 @@ static void release_blocks_on_commit(journal_t
>> *journal, transaction_t *txn)
>> rb_erase(&entry->node, &(db->bb_free_root));
>> mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>>
>> + /*
>> + * Clear the trimmed flag for the group so that the next
>> + * ext4_trim_fs can trim it.
>> + * If the volume is mounted with -o discard, online discard
>> + * is supported and the free blocks will be trimmed online.
>> + */
>> + if (!test_opt(sb, DISCARD))
>> + clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
>> + &(db->bb_state));
>> +
>> if (!db->bb_free_root.rb_node) {
>> /* No more items in the per group rb tree
>> * balance refcounts from ext4_mb_free_metadata()
>> @@ -4772,6 +4782,10 @@ ext4_grpblk_t ext4_trim_all_free(struct
>> super_block *sb, struct ext4_buddy *e4b,
>>
>> ext4_lock_group(sb, group);
>>
>> + if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
>> + minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
>> + goto out;
>> +
>> trace_ext4_trim_all_free(sb, group, start, max);
>>
>> while (start < max) {
>> @@ -4804,6 +4818,11 @@ ext4_grpblk_t ext4_trim_all_free(struct
>> super_block *sb, struct ext4_buddy *e4b,
>> if ((e4b->bd_info->bb_free - free_count) < minblocks)
>> break;
>> }
>> +
>> + if (!ret)
>> + set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
>> + &(e4b->bd_info->bb_state));
>> +out:
>> ext4_unlock_group(sb, group);
>>
>> ext4_debug("trimmed %d blocks in the group %d\n",
>> @@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct
>> fstrim_range *range)
>> }
>> range->len = trimmed * sb->s_blocksize;
>>
>> + if (!ret)
>> + EXT4_SB(sb)->s_last_trim_minblks = minlen;
>> +
>
> Since this is not protected by any lock, would not it race in case of
> multiple FITRIM calls ?
yeah, I am also thinking of this, but I don't think we need a new lock
just for this. And I guess atomic_t isn't good here because minlen is a
u64.

Do you think we can use some other spin_lock in ext4 system? I am not
quite familiar with ext4 by now, so do you have any suggestion?

Regards,
Tao



2011-02-11 06:01:38

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.

On 2011-02-09, at 23:33, Tao Ma <[email protected]> wrote:
> After I delete some big image files and re-run the trim,
> it is still much faster than iterating the whole disk.
> /dev/sdb2 108G 25G 78G 24% /mnt/ext4
>
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real 0m0.513s
> user 0m0.000s
> sys 0m0.069s

Excellent results.

> +#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT 1
>
> #define EXT4_MB_GRP_NEED_INIT(grp) \
> (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp) \

For consistency, it would be better to call this:

EXT4_MB_GRP_WAS_TRIMMED(grp)

And also add and use:

EXT4_MB_GRP_SET_TRIMMED(grp)

And

EXT4_MB_GRP_CLEAR_TRIMMED(grp)

>

In the code. Then you can add Reviewed-by: on this patch.

Cheers, Andreas

2011-02-11 06:30:02

by Tao Ma

[permalink] [raw]
Subject: [PATCH 4/4 v3] ext4: Speed up FITRIM by recording flags in ext4_group_info.

From: Tao Ma <[email protected]>

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb2 108G 35G 68G 34% /mnt/ext4
Block size: 4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m4.039s
user 0m0.000s
sys 0m1.020s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m3.577s
user 0m0.001s
sys 0m1.004s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m3.380s
user 0m0.000s
sys 0m0.991s

with the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m3.466s
user 0m0.000s
sys 0m0.966s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m0.001s
user 0m0.000s
sys 0m0.001s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m0.001s
user 0m0.000s
sys 0m0.000s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.
/dev/sdb2 108G 25G 78G 24% /mnt/ext4

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m0.513s
user 0m0.000s
sys 0m0.069s

Cc: Lukas Czerner <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
---
fs/ext4/ext4.h | 13 ++++++++++++-
fs/ext4/mballoc.c | 20 ++++++++++++++++++++
2 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c8d97b..259887d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1200,6 +1200,9 @@ struct ext4_sb_info {
struct ext4_li_request *s_li_request;
/* Wait multiplier for lazy initialization thread */
unsigned int s_li_wait_mult;
+
+ /* record the last minlen when FITRIM is called. */
+ u64 s_last_trim_minblks;
};

static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1970,11 +1973,19 @@ struct ext4_group_info {
* 5 free 8-block regions. */
};

-#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT 1

#define EXT4_MB_GRP_NEED_INIT(grp) \
(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))

+#define EXT4_MB_GRP_WAS_TRIMMED(grp) \
+ (test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_SET_TRIMMED(grp) \
+ (set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_CLEAR_TRIMMED(grp) \
+ (clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+
#define EXT4_MAX_CONTENTION 8
#define EXT4_CONTENTION_THRESHOLD 2

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4eadac8..f1eef35 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2687,6 +2687,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
rb_erase(&entry->node, &(db->bb_free_root));
mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);

+ /*
+ * Clear the trimmed flag for the group so that the next
+ * ext4_trim_fs can trim it.
+ * If the volume is mounted with -o discard, online discard
+ * is supported and the free blocks will be trimmed online.
+ */
+ if (!test_opt(sb, DISCARD))
+ EXT4_MB_GRP_CLEAR_TRIMMED(db);
+
if (!db->bb_free_root.rb_node) {
/* No more items in the per group rb tree
* balance refcounts from ext4_mb_free_metadata()
@@ -4772,6 +4781,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,

ext4_lock_group(sb, group);

+ if (EXT4_MB_GRP_WAS_TRIMMED(e4b->bd_info) &&
+ minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
+ goto out;
+
trace_ext4_trim_all_free(sb, group, start, max);

while (start < max) {
@@ -4804,6 +4817,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
if ((e4b->bd_info->bb_free - free_count) < minblocks)
break;
}
+
+ if (!ret)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+out:
ext4_unlock_group(sb, group);

ext4_debug("trimmed %d blocks in the group %d\n",
@@ -4892,6 +4909,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
}
range->len = trimmed * sb->s_blocksize;

+ if (!ret)
+ EXT4_SB(sb)->s_last_trim_minblks = minlen;
+
out:
return ret;
}
--
1.6.3.GIT


2011-02-21 16:44:43

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.

On Thu, 10 Feb 2011, [email protected] wrote:

> > On Thu, 10 Feb 2011, Tao Ma wrote:
> >
> >> From: Tao Ma <[email protected]>
> >>
> >> In ext4, when FITRIM is called every time, we iterate all the
> >> groups and do trim one by one. It is a bit time wasting if the
> >> group has been trimmed and there is no change since the last
> >> trim.
> >>
> >> So this patch adds a new flag in ext4_group_info->bb_state to
> >> indicate that the group has been trimmed, and it will be cleared
> >> if some blocks is freed(in release_blocks_on_commit). Another
> >> trim_minlen is added in ext4_sb_info to record the last minlen
> >> we use to trim the volume, so that if the caller provide a small
> >> one, we will go on the trim regardless of the bb_state.
> >>
> >> A simple test with my intel x25m ssd:
> >> df -h shows:
> >> /dev/sdb2 108G 35G 68G 34% /mnt/ext4
> >> Block size: 4096
> >>
> >> run the FITRIM with the following parameter:
> >> range.start = 0;
> >> range.len = UINT64_MAX;
> >> range.minlen = 1048576;
> >>
> >> without the patch:
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real 0m4.039s
> >> user 0m0.000s
> >> sys 0m1.020s
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real 0m3.577s
> >> user 0m0.001s
> >> sys 0m1.004s
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real 0m3.380s
> >> user 0m0.000s
> >> sys 0m0.991s
> >>
> >> with the patch:
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real 0m3.466s
> >> user 0m0.000s
> >> sys 0m0.966s
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real 0m0.001s
> >> user 0m0.000s
> >> sys 0m0.001s
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real 0m0.001s
> >> user 0m0.000s
> >> sys 0m0.000s
> >>
> >> A big improvement for the 2nd and 3rd run.
> >>
> >> After I delete some big image files and re-run the trim,
> >> it is still much faster than iterating the whole disk.
> >> /dev/sdb2 108G 25G 78G 24% /mnt/ext4
> >>
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real 0m0.513s
> >> user 0m0.000s
> >> sys 0m0.069s
> >
> > Great it looks really good.
> >
> >>
> >> Cc: Andreas Dilger <[email protected]>
> >> Cc: Lukas Czerner <[email protected]>
> >> Signed-off-by: Tao Ma <[email protected]>
> >> ---
> >> fs/ext4/ext4.h | 8 +++++++-
> >> fs/ext4/mballoc.c | 22 ++++++++++++++++++++++
> >> 2 files changed, 29 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> >> index 0c8d97b..1d59a63 100644
> >> --- a/fs/ext4/ext4.h
> >> +++ b/fs/ext4/ext4.h
> >> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
> >> struct ext4_li_request *s_li_request;
> >> /* Wait multiplier for lazy initialization thread */
> >> unsigned int s_li_wait_mult;
> >> +
> >> + /* record the last minlen when FITRIM is called. */
> >> + u64 s_last_trim_minblks;
> >> };
> >>
> >> static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> >> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
> >> * 5 free 8-block regions. */
> >> };
> >>
> >> -#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
> >> +#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
> >> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT 1
> >>
> >> #define EXT4_MB_GRP_NEED_INIT(grp) \
> >> (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
> >> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp) \
> >> + (test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
> >>
> >> #define EXT4_MAX_CONTENTION 8
> >> #define EXT4_CONTENTION_THRESHOLD 2
> >> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> >> index 4eadac8..c7aa094 100644
> >> --- a/fs/ext4/mballoc.c
> >> +++ b/fs/ext4/mballoc.c
> >> @@ -2687,6 +2687,16 @@ static void release_blocks_on_commit(journal_t
> >> *journal, transaction_t *txn)
> >> rb_erase(&entry->node, &(db->bb_free_root));
> >> mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
> >>
> >> + /*
> >> + * Clear the trimmed flag for the group so that the next
> >> + * ext4_trim_fs can trim it.
> >> + * If the volume is mounted with -o discard, online discard
> >> + * is supported and the free blocks will be trimmed online.
> >> + */
> >> + if (!test_opt(sb, DISCARD))
> >> + clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
> >> + &(db->bb_state));
> >> +
> >> if (!db->bb_free_root.rb_node) {
> >> /* No more items in the per group rb tree
> >> * balance refcounts from ext4_mb_free_metadata()
> >> @@ -4772,6 +4782,10 @@ ext4_grpblk_t ext4_trim_all_free(struct
> >> super_block *sb, struct ext4_buddy *e4b,
> >>
> >> ext4_lock_group(sb, group);
> >>
> >> + if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
> >> + minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
> >> + goto out;
> >> +
> >> trace_ext4_trim_all_free(sb, group, start, max);
> >>
> >> while (start < max) {
> >> @@ -4804,6 +4818,11 @@ ext4_grpblk_t ext4_trim_all_free(struct
> >> super_block *sb, struct ext4_buddy *e4b,
> >> if ((e4b->bd_info->bb_free - free_count) < minblocks)
> >> break;
> >> }
> >> +
> >> + if (!ret)
> >> + set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
> >> + &(e4b->bd_info->bb_state));
> >> +out:
> >> ext4_unlock_group(sb, group);
> >>
> >> ext4_debug("trimmed %d blocks in the group %d\n",
> >> @@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct
> >> fstrim_range *range)
> >> }
> >> range->len = trimmed * sb->s_blocksize;
> >>
> >> + if (!ret)
> >> + EXT4_SB(sb)->s_last_trim_minblks = minlen;
> >> +
> >
> > Since this is not protected by any lock, would not it race in case of
> > multiple FITRIM calls ?
> yeah, I am also thinking of this, but I don't think we need a new lock
> just for this. And I guess atomic_t isn't good here because minlen is a
> u64.
>
> Do you think we can use some other spin_lock in ext4 system? I am not
> quite familiar with ext4 by now, so do you have any suggestion?
>
> Regards,
> Tao

I am sorry for late answer. Even though minlen is 64-bit long, it can
not be that long in ext4, because it can not be bigger than size of the
allocation group (see mballoc.c:4840). I hope it helps.

I would be very happy to see this upstream, because it can help a lot!

Thanks!
-Lukas

2011-02-22 13:12:12

by Tao Ma

[permalink] [raw]
Subject: Re: [PATCH 0/4] EXT4 trim bug fixes and improvement.

Hi Ted,
any comments for this patch set? Are there ready for the next merge
window?
The most complicate patch [4/4] has already reviewed by Andreas.

Regards,
Tao
On 02/09/2011 01:52 PM, Tao Ma wrote:
> Hi Ted and the list,
> Here are 4 patches for trim support in ext4.
> Patch 1/4 is a fix for the 'len' overflow. As we have decided that
> we don't adjust start to s_first_data_block, this patch is still needed.
> Patch 2/4 is a speedup for trim in one group.
> Patch 3/4 adds trace points for the functions used in FITRIM.
> Patch 4/4 is a speedup for trim in the whole volume. For more
> details, please check the commit log.
>
> Regards,
> Tao
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2011-02-22 13:51:38

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 0/4] EXT4 trim bug fixes and improvement.

On Tue, 22 Feb 2011, Tao Ma wrote:

> Hi Ted,
> any comments for this patch set? Are there ready for the next merge
> window?
> The most complicate patch [4/4] has already reviewed by Andreas.

But still, I think there is a race -- see my previous comments in the
thread where I suggested using atomic_t for s_last_trim_minblks.

Thanks!
-Lukas

>
> Regards,
> Tao
> On 02/09/2011 01:52 PM, Tao Ma wrote:
> > Hi Ted and the list,
> > Here are 4 patches for trim support in ext4.
> > Patch 1/4 is a fix for the 'len' overflow. As we have decided that we
> > don't adjust start to s_first_data_block, this patch is still needed.
> > Patch 2/4 is a speedup for trim in one group.
> > Patch 3/4 adds trace points for the functions used in FITRIM.
> > Patch 4/4 is a speedup for trim in the whole volume. For more details,
> > please check the commit log.
> >
> > Regards,
> > Tao
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-02-24 14:19:06

by Tao Ma

[permalink] [raw]
Subject: [PATCH 4/4 V3] ext4: Speed up FITRIM by recording flags in ext4_group_info.

changlog from v2 -> v3:
change s_last_trim_minblks from u64 to atomic_t to avoid a race.

Regards,
Tao

>From 3875493dea3cc25be5460c175267f08a7ed540bc Mon Sep 17 00:00:00 2001
From: Tao Ma <[email protected]>
Date: Fri, 25 Feb 2011 06:10:26 +0800
Subject: [PATCH 4/4 V3] ext4: Speed up FITRIM by recording flags in ext4_group_info.

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb2 108G 35G 68G 34% /mnt/ext4
Block size: 4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m4.039s
user 0m0.000s
sys 0m1.020s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m3.577s
user 0m0.001s
sys 0m1.004s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m3.380s
user 0m0.000s
sys 0m0.991s

with the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m3.466s
user 0m0.000s
sys 0m0.966s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m0.001s
user 0m0.000s
sys 0m0.001s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m0.001s
user 0m0.000s
sys 0m0.000s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.
/dev/sdb2 108G 25G 78G 24% /mnt/ext4

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m0.513s
user 0m0.000s
sys 0m0.069s

Cc: Lukas Czerner <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
---
fs/ext4/ext4.h | 13 ++++++++++++-
fs/ext4/mballoc.c | 20 ++++++++++++++++++++
fs/ext4/super.c | 2 ++
3 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c8d97b..715c0a4 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1200,6 +1200,9 @@ struct ext4_sb_info {
struct ext4_li_request *s_li_request;
/* Wait multiplier for lazy initialization thread */
unsigned int s_li_wait_mult;
+
+ /* record the last minlen when FITRIM is called. */
+ atomic_t s_last_trim_minblks;
};

static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1970,11 +1973,19 @@ struct ext4_group_info {
* 5 free 8-block regions. */
};

-#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT 0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT 1

#define EXT4_MB_GRP_NEED_INIT(grp) \
(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))

+#define EXT4_MB_GRP_WAS_TRIMMED(grp) \
+ (test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_SET_TRIMMED(grp) \
+ (set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_CLEAR_TRIMMED(grp) \
+ (clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+
#define EXT4_MAX_CONTENTION 8
#define EXT4_CONTENTION_THRESHOLD 2

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4eadac8..29d7d17 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2687,6 +2687,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
rb_erase(&entry->node, &(db->bb_free_root));
mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);

+ /*
+ * Clear the trimmed flag for the group so that the next
+ * ext4_trim_fs can trim it.
+ * If the volume is mounted with -o discard, online discard
+ * is supported and the free blocks will be trimmed online.
+ */
+ if (!test_opt(sb, DISCARD))
+ EXT4_MB_GRP_CLEAR_TRIMMED(db);
+
if (!db->bb_free_root.rb_node) {
/* No more items in the per group rb tree
* balance refcounts from ext4_mb_free_metadata()
@@ -4772,6 +4781,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,

ext4_lock_group(sb, group);

+ if (EXT4_MB_GRP_WAS_TRIMMED(e4b->bd_info) &&
+ minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
+ goto out;
+
trace_ext4_trim_all_free(sb, group, start, max);

while (start < max) {
@@ -4804,6 +4817,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
if ((e4b->bd_info->bb_free - free_count) < minblocks)
break;
}
+
+ if (!ret)
+ EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+out:
ext4_unlock_group(sb, group);

ext4_debug("trimmed %d blocks in the group %d\n",
@@ -4892,6 +4909,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
}
range->len = trimmed * sb->s_blocksize;

+ if (!ret)
+ atomic_set(&EXT4_SB(sb)->s_last_trim_minblks, minlen);
+
out:
return ret;
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4898cb1..0fa3ed6 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3045,6 +3045,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_sectors_written_start =
part_stat_read(sb->s_bdev->bd_part, sectors[1]);

+ atomic_set(&sbi->s_last_trim_minblks, 0);
+
/* Cleanup superblock name */
for (cp = sb->s_id; (cp = strchr(cp, '/'));)
*cp = '!';
--
1.6.3.GIT