LinuxLists.cc - [PATCH v3] ext4: Fix dead loop in ext4_mb_new

2020-09-10 09:15:17

Subject: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks

As we test disk offline/online with running fsstress, we find fsstress
process is keeping running state.
kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
....
kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114

ext4_mb_new_blocks
repeat:
ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
freed = ext4_mb_discard_preallocations
ext4_mb_discard_group_preallocations
this_cpu_inc(discard_pa_seq);
---> freed == 0
seq_retry = ext4_get_discard_pa_seq_sum
for_each_possible_cpu(__cpu)
__seq += per_cpu(discard_pa_seq, __cpu);
if (seq_retry != *seq) {
*seq = seq_retry;
ret = true;
}

As we see seq_retry is sum of discard_pa_seq every cpu, if
ext4_mb_discard_group_preallocations return zero discard_pa_seq in this
cpu maybe increase one, so condition "seq_retry != *seq" have always
been met.
To Fix this problem, in ext4_mb_discard_group_preallocations function increase
discard_pa_seq only when it found preallocation to discard.

Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
Signed-off-by: Ye Bin <[email protected]>
---
fs/ext4/mballoc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f386fe62727d..fd55264dc3fe 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
INIT_LIST_HEAD(&list);
repeat:
ext4_lock_group(sb, group);
- this_cpu_inc(discard_pa_seq);
list_for_each_entry_safe(pa, tmp,
&grp->bb_prealloc_list, pa_group_list) {
spin_lock(&pa->pa_lock);
@@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
goto out;
}

+ /* only increase when find reallocation to discard */
+ this_cpu_inc(discard_pa_seq);
+
/* now free all selected PAs */
list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {

--
2.25.4

2020-09-10 17:21:12

by Jan Kara

[permalink] [raw]

Subject: Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks

On Thu 10-09-20 17:12:52, Ye Bin wrote:
> As we test disk offline/online with running fsstress, we find fsstress
> process is keeping running state.
> kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
> ....
> kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
>
> ext4_mb_new_blocks
> repeat:
> ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
> freed = ext4_mb_discard_preallocations
> ext4_mb_discard_group_preallocations
> this_cpu_inc(discard_pa_seq);
> ---> freed == 0
> seq_retry = ext4_get_discard_pa_seq_sum
> for_each_possible_cpu(__cpu)
> __seq += per_cpu(discard_pa_seq, __cpu);
> if (seq_retry != *seq) {
> *seq = seq_retry;
> ret = true;
> }
>
> As we see seq_retry is sum of discard_pa_seq every cpu, if
> ext4_mb_discard_group_preallocations return zero discard_pa_seq in this
> cpu maybe increase one, so condition "seq_retry != *seq" have always
> been met.
> To Fix this problem, in ext4_mb_discard_group_preallocations function increase
> discard_pa_seq only when it found preallocation to discard.
>
> Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
> Signed-off-by: Ye Bin <[email protected]>

Thanks for the patch. One comment below.

> ---
> fs/ext4/mballoc.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f386fe62727d..fd55264dc3fe 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
> INIT_LIST_HEAD(&list);
> repeat:
> ext4_lock_group(sb, group);
> - this_cpu_inc(discard_pa_seq);
> list_for_each_entry_safe(pa, tmp,
> &grp->bb_prealloc_list, pa_group_list) {
> spin_lock(&pa->pa_lock);
> @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
> goto out;
> }
>
> + /* only increase when find reallocation to discard */
> + this_cpu_inc(discard_pa_seq);
> +

This is a good place to increment the counter but I think you also need to
handle the case:

if (free < needed && busy) {
busy = 0;
ext4_unlock_group(sb, group);
cond_resched();
goto repeat;
}

We can unlock the group here after removing some preallocations and thus
other processes checking discard_pa_seq could miss we did this. In fact I
think the code is somewhat buggy here and we should also discard extents
accumulated on "list" so far before unlocking the group. Ritesh?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-09-11 13:59:31

by Ritesh Harjani

[permalink] [raw]

Subject: Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks

Hello Ye,

Please excuse if there is something horribly wrong with my email
formatting. Have yesterday received this laptop and still setting up
few things.

On 9/10/20 9:47 PM, Jan Kara wrote:
> On Thu 10-09-20 17:12:52, Ye Bin wrote:
>> As we test disk offline/online with running fsstress, we find fsstress
>> process is keeping running state.
>> kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
>> ....
>> kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
>>
>> ext4_mb_new_blocks
>> repeat:
>> ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
>> freed = ext4_mb_discard_preallocations
>> ext4_mb_discard_group_preallocations
>> this_cpu_inc(discard_pa_seq);
>> ---> freed == 0
>> seq_retry = ext4_get_discard_pa_seq_sum
>> for_each_possible_cpu(__cpu)
>> __seq += per_cpu(discard_pa_seq, __cpu);
>> if (seq_retry != *seq) {
>> *seq = seq_retry;
>> ret = true;
>> }
>>
>> As we see seq_retry is sum of discard_pa_seq every cpu, if
>> ext4_mb_discard_group_preallocations return zero discard_pa_seq in this
>> cpu maybe increase one, so condition "seq_retry != *seq" have always
>> been met.
>> To Fix this problem, in ext4_mb_discard_group_preallocations function increase
>> discard_pa_seq only when it found preallocation to discard.
>>
>> Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
>> Signed-off-by: Ye Bin <[email protected]>
>
> Thanks for the patch. One comment below.
>
>> ---
>> fs/ext4/mballoc.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index f386fe62727d..fd55264dc3fe 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>> INIT_LIST_HEAD(&list);
>> repeat:
>> ext4_lock_group(sb, group);
>> - this_cpu_inc(discard_pa_seq);
>> list_for_each_entry_safe(pa, tmp,
>> &grp->bb_prealloc_list, pa_group_list) {
>> spin_lock(&pa->pa_lock);
>> @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>> goto out;
>> }
>>
>> + /* only increase when find reallocation to discard */
>> + this_cpu_inc(discard_pa_seq);
>> +
>
> This is a good place to increment the counter but I think you also need to
> handle the case:
>
> if (free < needed && busy) {
> busy = 0;
> ext4_unlock_group(sb, group);
> cond_resched();
> goto repeat;
> }
>
> We can unlock the group here after removing some preallocations and thus
> other processes checking discard_pa_seq could miss we did this. In fact I
> think the code is somewhat buggy here and we should also discard extents
> accumulated on "list" so far before unlocking the group. Ritesh?
>

mmm, so even though this code is not discarding those buffers b4
unlocking, but it has still removed those from the grp->bb_prealloc_list
and added it to the local list. And since it will at some point get
scheduled and start operating from repeat: label so functionally wise
this should be ok. Am I missing anything?

Although I agree, that if we remove at least the current pa's before
unlocking the group may be a good idea, but we should also check
why was this done like this at the first place.

I agree with Jan, that we should increment discard_pa_seq once we
actually have something
to discard. I should have written a comment here to explain why we did
this here.
But I think commit msg should have all the history (since I have a habit
of writing long commit msgs ;)

But IIRC, it was done since in case if there is a parallel thread which
is discarding
all the preallocations so the current thread may return 0 since it
checks the
list_empty(&grp->bb_prealloc_list) check in
ext4_mb_discard_group_preallocations() and returns 0 directly.

And why the discard_pa_seq counter at other places may not help since we
remove the pa nodes from
grp->bb_prealloc_list into a local list and then start operating on
that. So meanwhile some thread may comes and just checks that the list
is empty and return 0 while some other thread may start discarding from
it's local list.
So I guess the main problem was that in the current code we remove
the pa from grp->bb_prealloc_list and add it to local list. So if
someone else comes
and checks that grp->bb_prealloc_list is empty then it will directly
return 0.

So, maybe we could do something like this then?

repeat:
ext4_lock_group(sb, group);
- this_cpu_inc(discard_pa_seq);
list_for_each_entry_safe(pa, tmp,
&grp->bb_prealloc_list, pa_group_list) {<...>

+ if (!free)
+ this_cpu_inc(discard_pa_seq); // we should do this here before
calling list_del(&pa->pa_group_list);

/* we can trust pa_free ... */
free += pa->pa_free;
spin_unlock(&pa->pa_lock);

list_del(&pa->pa_group_list);
list_add(&pa->u.pa_tmp_list, &list);
}

I have some test cases around this to test for cases which were
failing. Since in world of parallelism you can't be 100% certain of some
corner case (like this one you just reported).
But, I don't have my other box rite now where I kept all of those -
due to some technical issues. I think I should be able to get those by
next week, if not, I anyways will setup my current machine for testing
this.

-ritesh

2020-09-14 03:22:29

by yebin (H)

[permalink] [raw]

Subject: Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks

In fact, we didn't free the available space, and other processes
couldn't get it
even if they tried again.

According to your opinions, I made two different revisions. Which one do
you think is better?
（1）Free PAs before repeat
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 132c118d12e1..4ab76882350d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4189,7 +4189,6 @@ ext4_mb_discard_group_preallocations(struct
super_block *sb,
INIT_LIST_HEAD(&list);
repeat:
ext4_lock_group(sb, group);
- this_cpu_inc(discard_pa_seq);
list_for_each_entry_safe(pa, tmp,
&grp->bb_prealloc_list, pa_group_list) {
spin_lock(&pa->pa_lock);
@@ -4215,22 +4214,6 @@ ext4_mb_discard_group_preallocations(struct
super_block *sb,
list_add(&pa->u.pa_tmp_list, &list);
}

- /* if we still need more blocks and some PAs were used, try again */
- if (free < needed && busy) {
- busy = 0;
- ext4_unlock_group(sb, group);
- cond_resched();
- goto repeat;
- }
-
- /* found anything to free? */
- if (list_empty(&list)) {
- BUG_ON(free != 0);
- mb_debug(sb, "Someone else may have freed PA for this
group %u\n",
- group);
- goto out;
- }
-
/* now free all selected PAs */
list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {

@@ -4248,6 +4231,14 @@ ext4_mb_discard_group_preallocations(struct
super_block *sb,
call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
}

+ /* if we still need more blocks and some PAs were used, try again */
+ if (free < needed && busy) {
+ busy = 0;
+ ext4_unlock_group(sb, group);
+ cond_resched();
+ goto repeat;
+ }
+
out:
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
--

(2)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 132c118d12e1..188772bbf679 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4189,7 +4189,6 @@ ext4_mb_discard_group_preallocations(struct
super_block *sb,
INIT_LIST_HEAD(&list);
repeat:
ext4_lock_group(sb, group);
- this_cpu_inc(discard_pa_seq);
list_for_each_entry_safe(pa, tmp,
&grp->bb_prealloc_list, pa_group_list) {
spin_lock(&pa->pa_lock);
@@ -4217,6 +4216,8 @@ ext4_mb_discard_group_preallocations(struct
super_block *sb,

/* if we still need more blocks and some PAs were used, try
again */
if (free < needed && busy) {
+ if (free)
+ this_cpu_inc(discard_pa_seq);
busy = 0;
ext4_unlock_group(sb, group);
cond_resched();

On 2020/9/11 21:20, Ritesh Harjani wrote:
> Hello Ye,
>
> Please excuse if there is something horribly wrong with my email
> formatting. Have yesterday received this laptop and still setting up
> few things.
>
> On 9/10/20 9:47 PM, Jan Kara wrote:
>> On Thu 10-09-20 17:12:52, Ye Bin wrote:
>>> As we test disk offline/online with running fsstress, we find fsstress
>>> process is keeping running state.
>>> kworker/u32:3-262 [004] ...1 140.787471:
>>> ext4_mb_discard_preallocations: dev 8,32 needed 114
>>> ....
>>> kworker/u32:3-262 [004] ...1 140.787471:
>>> ext4_mb_discard_preallocations: dev 8,32 needed 114
>>>
>>> ext4_mb_new_blocks
>>> repeat:
>>> ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
>>> freed = ext4_mb_discard_preallocations
>>> ext4_mb_discard_group_preallocations
>>> this_cpu_inc(discard_pa_seq);
>>> ---> freed == 0
>>> seq_retry = ext4_get_discard_pa_seq_sum
>>> for_each_possible_cpu(__cpu)
>>> __seq += per_cpu(discard_pa_seq, __cpu);
>>> if (seq_retry != *seq) {
>>> *seq = seq_retry;
>>> ret = true;
>>> }
>>>
>>> As we see seq_retry is sum of discard_pa_seq every cpu, if
>>> ext4_mb_discard_group_preallocations return zero discard_pa_seq in this
>>> cpu maybe increase one, so condition "seq_retry != *seq" have always
>>> been met.
>>> To Fix this problem, in ext4_mb_discard_group_preallocations
>>> function increase
>>> discard_pa_seq only when it found preallocation to discard.
>>>
>>> Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for
>>> freeing PA to improve ENOSPC handling")
>>> Signed-off-by: Ye Bin <[email protected]>
>>
>> Thanks for the patch. One comment below.
>>
>>> ---
>>> fs/ext4/mballoc.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index f386fe62727d..fd55264dc3fe 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct
>>> super_block *sb,
>>> INIT_LIST_HEAD(&list);
>>> repeat:
>>> ext4_lock_group(sb, group);
>>> - this_cpu_inc(discard_pa_seq);
>>> list_for_each_entry_safe(pa, tmp,
>>> &grp->bb_prealloc_list, pa_group_list) {
>>> spin_lock(&pa->pa_lock);
>>> @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct
>>> super_block *sb,
>>> goto out;
>>> }
>>> + /* only increase when find reallocation to discard */
>>> + this_cpu_inc(discard_pa_seq);
>>> +
>>
>> This is a good place to increment the counter but I think you also
>> need to
>> handle the case:
>>
>> if (free < needed && busy) {
>> busy = 0;
>> ext4_unlock_group(sb, group);
>> cond_resched();
>> goto repeat;
>> }
>>
>> We can unlock the group here after removing some preallocations and thus
>> other processes checking discard_pa_seq could miss we did this. In
>> fact I
>> think the code is somewhat buggy here and we should also discard extents
>> accumulated on "list" so far before unlocking the group. Ritesh?
>>
>
> mmm, so even though this code is not discarding those buffers b4
> unlocking, but it has still removed those from the grp->bb_prealloc_list
> and added it to the local list. And since it will at some point get
> scheduled and start operating from repeat: label so functionally wise
> this should be ok. Am I missing anything?
>
> Although I agree, that if we remove at least the current pa's before
> unlocking the group may be a good idea, but we should also check
> why was this done like this at the first place.
>
>
> I agree with Jan, that we should increment discard_pa_seq once we
> actually have something
> to discard. I should have written a comment here to explain why we did
> this here.
> But I think commit msg should have all the history (since I have a
> habit of writing long commit msgs ;)
>
> But IIRC, it was done since in case if there is a parallel thread
> which is discarding
> all the preallocations so the current thread may return 0 since it
> checks the
> list_empty(&grp->bb_prealloc_list) check in
> ext4_mb_discard_group_preallocations() and returns 0 directly.
>
> And why the discard_pa_seq counter at other places may not help since
> we remove the pa nodes from
> grp->bb_prealloc_list into a local list and then start operating on
> that. So meanwhile some thread may comes and just checks that the list
> is empty and return 0 while some other thread may start discarding from
> it's local list.
> So I guess the main problem was that in the current code we remove
> the pa from grp->bb_prealloc_list and add it to local list. So if
> someone else comes
> and checks that grp->bb_prealloc_list is empty then it will directly
> return 0.
>
> So, maybe we could do something like this then?
>
> repeat:
> ext4_lock_group(sb, group);
> - this_cpu_inc(discard_pa_seq);
> list_for_each_entry_safe(pa, tmp,
> &grp->bb_prealloc_list, pa_group_list) {<...>
>
> + if (!free)
> + this_cpu_inc(discard_pa_seq); // we should do this here
> before calling list_del(&pa->pa_group_list);
>
> /* we can trust pa_free ... */
> free += pa->pa_free;
> spin_unlock(&pa->pa_lock);
>
> list_del(&pa->pa_group_list);
> list_add(&pa->u.pa_tmp_list, &list);
> }
>
> I have some test cases around this to test for cases which were
> failing. Since in world of parallelism you can't be 100% certain of some
> corner case (like this one you just reported).
> But, I don't have my other box rite now where I kept all of those -
> due to some technical issues. I think I should be able to get those by
> next week, if not, I anyways will setup my current machine for testing
> this.
>
> -ritesh
> .
>

2020-09-14 08:20:45

by Jan Kara

[permalink] [raw]

Subject: Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks

Hello Ritesh,

On Fri 11-09-20 18:50:22, Ritesh Harjani wrote:
> On 9/10/20 9:47 PM, Jan Kara wrote:
> > > ---
> > > fs/ext4/mballoc.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > > index f386fe62727d..fd55264dc3fe 100644
> > > --- a/fs/ext4/mballoc.c
> > > +++ b/fs/ext4/mballoc.c
> > > @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
> > > INIT_LIST_HEAD(&list);
> > > repeat:
> > > ext4_lock_group(sb, group);
> > > - this_cpu_inc(discard_pa_seq);
> > > list_for_each_entry_safe(pa, tmp,
> > > &grp->bb_prealloc_list, pa_group_list) {
> > > spin_lock(&pa->pa_lock);
> > > @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
> > > goto out;
> > > }
> > > + /* only increase when find reallocation to discard */
> > > + this_cpu_inc(discard_pa_seq);
> > > +
> >
> > This is a good place to increment the counter but I think you also need to
> > handle the case:
> >
> > if (free < needed && busy) {
> > busy = 0;
> > ext4_unlock_group(sb, group);
> > cond_resched();
> > goto repeat;
> > }
> >
> > We can unlock the group here after removing some preallocations and thus
> > other processes checking discard_pa_seq could miss we did this. In fact I
> > think the code is somewhat buggy here and we should also discard extents
> > accumulated on "list" so far before unlocking the group. Ritesh?
> >
>
> mmm, so even though this code is not discarding those buffers b4
> unlocking, but it has still removed those from the grp->bb_prealloc_list
> and added it to the local list. And since it will at some point get
> scheduled and start operating from repeat: label so functionally wise
> this should be ok. Am I missing anything?

Yes, this function itself will be working correctly. But if we unlock the
group before discarding preallocations we have in our local list the
following can happen:

TASK0 TASK1
ext4_mb_new_blocks()
- no free block found
ext4_mb_discard_preallocations_should_retry()
ext4_mb_discard_preallocations()
ext4_mb_discard_group_preallocations()
ext4_lock_group(sb, group);
this_cpu_inc(discard_pa_seq);
scans group, moves some preallocations to local list
if (free < needed && busy) {
ext4_unlock_group(sb, group);
- finds no pa to discard
seq_retry = ext4_get_discard_pa_seq_sum()
- ok seq_retry != seq -> retry
- new search still didn't return
anything because preallocations
are sitting in TASK0 local list
and are not freed yet

> Although I agree, that if we remove at least the current pa's before
> unlocking the group may be a good idea, but we should also check
> why was this done like this at the first place.

If we don't care about somewhat premature ENOSPC error, then not discarding
preallocations before unlocking the group is not a problem. That's why
previously it was done like this I believe.

> I agree with Jan, that we should increment discard_pa_seq once we
> actually have something to discard. I should have written a comment here
> to explain why we did this here. But I think commit msg should have all
> the history (since I have a habit of writing long commit msgs ;)
>
> But IIRC, it was done since in case if there is a parallel thread which
> is discarding all the preallocations so the current thread may return 0
> since it checks the list_empty(&grp->bb_prealloc_list) check in
> ext4_mb_discard_group_preallocations() and returns 0 directly.

Ah, OK. I forgot about the unlocked check for empty bb_prealloc_list.

> And why the discard_pa_seq counter at other places may not help since we
> remove the pa nodes from grp->bb_prealloc_list into a local list and then
> start operating on that. So meanwhile some thread may comes and just
> checks that the list is empty and return 0 while some other thread may
> start discarding from it's local list. So I guess the main problem was
> that in the current code we remove the pa from grp->bb_prealloc_list and
> add it to local list. So if someone else comes and checks that
> grp->bb_prealloc_list is empty then it will directly return 0.
>
> So, maybe we could do something like this then?
>
> repeat:
> ext4_lock_group(sb, group);
> - this_cpu_inc(discard_pa_seq);
> list_for_each_entry_safe(pa, tmp,
> &grp->bb_prealloc_list, pa_group_list) {<...>
>
> + if (!free)
> + this_cpu_inc(discard_pa_seq); // we should do this here before calling
> list_del(&pa->pa_group_list);

Yup, that looks good.

> /* we can trust pa_free ... */
> free += pa->pa_free;
> spin_unlock(&pa->pa_lock);
>
> list_del(&pa->pa_group_list);
> list_add(&pa->u.pa_tmp_list, &list);
> }
>
> I have some test cases around this to test for cases which were
> failing. Since in world of parallelism you can't be 100% certain of some
> corner case (like this one you just reported).
> But, I don't have my other box rite now where I kept all of those -
> due to some technical issues. I think I should be able to get those by
> next week, if not, I anyways will setup my current machine for testing
> this.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR