2010-03-30 12:36:20

by jing zhang

Subject: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

From: Jing Zhang <[email protected]>

Date: Tue Mar 30 20:35:22 2010

With the added cache, better group locality may be earned when
allocating blocks.

Cc: Theodore Ts'o <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Dave Kleikamp <[email protected]>
Cc: "Aneesh Kumar K. V" <[email protected]>
Signed-off-by: Jing Zhang <[email protected]>

---

--- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
+++ ext4_mm_leak/mballoc-13.c 2010-03-30 20:28:08.000000000 +0800
@@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
 	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
 	int ret;
 	int freed = 0;
+	static ext4_group_t grp_cache = 0;

 	trace_ext4_mb_discard_preallocations(sb, needed);
-	for (i = 0; i < ngroups && needed > 0; i++) {
-		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
+	if (needed <= 0)
+		return freed;
+	for (i = 0; i < ngroups; i++) {
+		if (grp_cache >= ngroups)
+			grp_cache -= ngroups;
+		ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
 		freed += ret;
 		needed -= ret;
+		if (needed <= 0)
+			break;
+		grp_cache++;
 	}

 	return freed;


2010-03-30 18:37:23

by Aneesh Kumar K.V

Subject: Re: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

On Tue, 30 Mar 2010 20:36:17 +0800, jing zhang <[email protected]> wrote:
> From: Jing Zhang <[email protected]>
>
> Date: Tue Mar 30 20:35:22 2010
>
> With the added cache, better group locality may be earned when
> allocating blocks.
>
> Cc: Theodore Ts'o <[email protected]>
> Cc: Andreas Dilger <[email protected]>
> Cc: Dave Kleikamp <[email protected]>
> Cc: "Aneesh Kumar K. V" <[email protected]>
> Signed-off-by: Jing Zhang <[email protected]>
>
> ---
>
> --- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-13.c 2010-03-30 20:28:08.000000000 +0800
> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
> ext4_group_t i, ngroups = ext4_get_groups_count(sb);
> int ret;
> int freed = 0;
> + static ext4_group_t grp_cache = 0;
>
> trace_ext4_mb_discard_preallocations(sb, needed);
> - for (i = 0; i < ngroups && needed > 0; i++) {
> - ret = ext4_mb_discard_group_preallocations(sb, i, needed);
> + if (needed <= 0)
> + return freed;
> + for (i = 0; i < ngroups; i++) {
> + if (grp_cache >= ngroups)
> + grp_cache -= ngroups;
> + ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
> freed += ret;
> needed -= ret;
> + if (needed <= 0)
> + break;
> + grp_cache++;
> }
>
> return freed;

Can you explain this further?

-aneesh

2010-03-31 15:10:04

by jing zhang

Subject: Re: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

2010/3/31, Aneesh Kumar K. V <[email protected]>:
> On Tue, 30 Mar 2010 20:36:17 +0800, jing zhang <[email protected]> wrote:
>> From: Jing Zhang <[email protected]>
>>
>> Date: Tue Mar 30 20:35:22 2010
>>
>> With the added cache, better group locality may be earned when
>> allocating blocks.
>>
>> Cc: Theodore Ts'o <[email protected]>
>> Cc: Andreas Dilger <[email protected]>
>> Cc: Dave Kleikamp <[email protected]>
>> Cc: "Aneesh Kumar K. V" <[email protected]>
>> Signed-off-by: Jing Zhang <[email protected]>
>>
>> ---
>>
>> --- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
>> +++ ext4_mm_leak/mballoc-13.c 2010-03-30 20:28:08.000000000 +0800
>> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>> ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>> int ret;
>> int freed = 0;
>> + static ext4_group_t grp_cache = 0;
>>
>> trace_ext4_mb_discard_preallocations(sb, needed);
>> - for (i = 0; i < ngroups && needed > 0; i++) {
>> - ret = ext4_mb_discard_group_preallocations(sb, i, needed);
>> + if (needed <= 0)
>> + return freed;
>> + for (i = 0; i < ngroups; i++) {
>> + if (grp_cache >= ngroups)
>> + grp_cache -= ngroups;
>> + ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
>> freed += ret;
>> needed -= ret;
>> + if (needed <= 0)
>> + break;
>> + grp_cache++;
>> }
>>
>> return freed;
>
> can you explain this further ?
>
> -aneesh
>

The cache remembers the group where the previous discard stopped. The
next call checks that group first: if blocks preallocated there are
still available, they are discarded and used for the allocation without
changing groups, so group locality improves.

In addition, ext4_mb_discard_group_preallocations() discards as many
preallocations as possible by yielding.

- zj

2010-03-31 21:45:07

by Andreas Dilger

Subject: Re: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

On 2010-03-30, at 06:36, jing zhang wrote:
> --- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-13.c 2010-03-30 20:28:08.000000000 +0800
> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
> trace_ext4_mb_discard_preallocations(sb, needed);
> - for (i = 0; i < ngroups && needed > 0; i++) {
> - ret = ext4_mb_discard_group_preallocations(sb, i, needed);
> + if (needed <= 0)
> + return freed;
> + for (i = 0; i < ngroups; i++) {
> + if (grp_cache >= ngroups)
> + grp_cache -= ngroups;
> + ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);


Anything that is walking every group in the filesystem is going to hit
problems on large filesystems. This seems like something that needs
to be fixed in a different way (e.g. keeping a list of preallocations).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2010-04-01 12:34:43

by jing zhang

Subject: Re: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

2010/3/31, Andreas Dilger <[email protected]>:
> On 2010-03-30, at 06:36, jing zhang wrote:
>> --- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
>> +++ ext4_mm_leak/mballoc-13.c 2010-03-30 20:28:08.000000000 +0800
>> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>> trace_ext4_mb_discard_preallocations(sb, needed);
>> - for (i = 0; i < ngroups && needed > 0; i++) {
>> - ret = ext4_mb_discard_group_preallocations(sb, i, needed);
>> + if (needed <= 0)
>> + return freed;
>> + for (i = 0; i < ngroups; i++) {
>> + if (grp_cache >= ngroups)
>> + grp_cache -= ngroups;
>> + ret = ext4_mb_discard_group_preallocations(sb, grp_cache, needed);
>
>
> Anything that is walking every group in the filesystem is going to hit
> problems on large filesystems. This seems like something that needs
> to be fixed in a different way (e.g. keeping a list of preallocations).
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>

Then please take the following also into consideration.

Thanks
- zj

---

--- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
+++ ext4_mm_leak/mballoc-14.c 2010-04-01 20:35:58.000000000 +0800
@@ -4299,7 +4299,7 @@ repeat:
 		}
 	} else {
 		freed = ext4_mb_discard_preallocations(sb, ac->ac_o_ex.fe_len);
-		if (freed)
+		if (freed && freed >= ac->ac_o_ex.fe_len)
 			goto repeat;
 		*errp = -ENOSPC;
 		ac->ac_b_ex.fe_len = 0;

2010-04-06 18:32:04

by Theodore Ts'o

Subject: Re: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

On Tue, Mar 30, 2010 at 08:36:17PM +0800, jing zhang wrote:
> --- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-13.c 2010-03-30 20:28:08.000000000 +0800
> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
> ext4_group_t i, ngroups = ext4_get_groups_count(sb);
> int ret;
> int freed = 0;
> + static ext4_group_t grp_cache = 0;

This is a problem right there. Remember that there could be multiple
file systems mounted so a static variable is fundamentally flawed.

In fact, we could have one filesystem which has more than 3 times
the number of groups as another file system. I'll leave it as an
exercise to a reader why your patch would be fundamentally flawed in
that case.

The other thing to note is that this case only gets hit if the file
system is so full that we need to empty preallocations. So this means
hitting this case is rare, which raises two questions: (1) is it worth
it to optimize this case in the first place (is it really that
expensive to iterate over all the groups to discard the
preallocations); (2) can we test this case well?

- Ted

2010-04-06 18:50:05

by Theodore Ts'o

Subject: Re: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

On Thu, Apr 01, 2010 at 08:34:41PM +0800, jing zhang wrote:
>
> --- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
> +++ ext4_mm_leak/mballoc-14.c 2010-04-01 20:35:58.000000000 +0800
> @@ -4299,7 +4299,7 @@ repeat:
> }
> } else {
> freed = ext4_mb_discard_preallocations(sb, ac->ac_o_ex.fe_len);
> - if (freed)
> + if (freed && freed >= ac->ac_o_ex.fe_len)
> goto repeat;
> *errp = -ENOSPC;
> ac->ac_b_ex.fe_len = 0;

This is just wrong.

Since you didn't give a justification, I'm not sure why you think it
is correct.

- Ted

2010-04-07 12:51:10

by jing zhang

Subject: Re: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

2010/4/7, [email protected] <[email protected]>:
> On Tue, Mar 30, 2010 at 08:36:17PM +0800, jing zhang wrote:
>> --- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
>> +++ ext4_mm_leak/mballoc-13.c 2010-03-30 20:28:08.000000000 +0800
>> @@ -4183,12 +4183,20 @@ static int ext4_mb_discard_preallocation
>> ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>> int ret;
>> int freed = 0;
>> + static ext4_group_t grp_cache = 0;
>
> This is a problem right there. Remember that there could be multiple
> file systems mounted so a static variable is fundamentally flawed.
>

cool, the static in my patch is a fatal error.

- zj

> In fact, we could have one filesystem which has more than 3 times
> the number of groups as another file system. I'll leave it as an
> exercise to a reader why your patch would be fundamentally flawed in
> that case.
>
> The other thing to note is that this case only gets hit if the file
> system is so full that we need to empty preallocations. So this means
> hitting this case is rare, which raises two questions: (1) is it worth
> it to optimize this case in the first place (is it really that
> expensive to iterate over all the groups to discard the
> preallocations); (2) can we test this case well?
>
> - Ted
>

2010-04-07 12:58:47

by jing zhang

Subject: Re: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

2010/4/7, [email protected] <[email protected]>:
> On Thu, Apr 01, 2010 at 08:34:41PM +0800, jing zhang wrote:
>>
>> --- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
>> +++ ext4_mm_leak/mballoc-14.c 2010-04-01 20:35:58.000000000 +0800
>> @@ -4299,7 +4299,7 @@ repeat:
>> }
>> } else {
>> freed = ext4_mb_discard_preallocations(sb, ac->ac_o_ex.fe_len);
>> - if (freed)
>> + if (freed && freed >= ac->ac_o_ex.fe_len)
>> goto repeat;
>> *errp = -ENOSPC;
>> ac->ac_b_ex.fe_len = 0;
>
> This is just wrong.
>
> Since you didn't give a justification, I'm not sure why you think it
> is correct.
>

Though freed, is the amount freed bigger than needed?
If not, it seems unnecessary to repeat.

- zj

2010-04-07 14:46:46

by Theodore Ts'o

Subject: Re: [PATCH] ext4: group cache is added in ext4_mb_discard_preallocations()

On Wed, Apr 07, 2010 at 08:58:36PM +0800, jing zhang wrote:
> 2010/4/7, [email protected] <[email protected]>:
> > On Thu, Apr 01, 2010 at 08:34:41PM +0800, jing zhang wrote:
> >>
> >> --- linux-2.6.32/fs/ext4/mballoc.c 2009-12-03 11:51:22.000000000 +0800
> >> +++ ext4_mm_leak/mballoc-14.c 2010-04-01 20:35:58.000000000 +0800
> >> @@ -4299,7 +4299,7 @@ repeat:
> >> }
> >> } else {
> >> freed = ext4_mb_discard_preallocations(sb, ac->ac_o_ex.fe_len);
> >> - if (freed)
> >> + if (freed && freed >= ac->ac_o_ex.fe_len)
> >> goto repeat;
> >> *errp = -ENOSPC;
> >> ac->ac_b_ex.fe_len = 0;
> >
> > This is just wrong.
> >
> > Since you didn't give a justification, I'm not sure why you think it
> > is correct.
> >
>
> Though freed, is the amount freed bigger than needed?
> If not, it seems unnecessary to repeat.

You don't understand the code, I think. If we've freed up any number
of blocks, it makes sense to use those blocks right away. Mballoc()
is allowed to return fewer blocks than what was requested.

- Ted