2020-01-03 14:35:13

by Wei Yang

[permalink] [raw]
Subject: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

As all the other places, we grab the lock before manipulate the defer list.
Current implementation may face a race condition.

Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")

Signed-off-by: Wei Yang <[email protected]>

---
I notice the difference during code reading and just confused about the
difference. No specific test is done since limited knowledge about cgroup.

Maybe I miss something important?
---
mm/memcontrol.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bc01423277c5..62b7ec34ef1a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
}

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ spin_lock(&from->deferred_split_queue.split_queue_lock);
if (compound && !list_empty(page_deferred_list(page))) {
- spin_lock(&from->deferred_split_queue.split_queue_lock);
list_del_init(page_deferred_list(page));
from->deferred_split_queue.split_queue_len--;
- spin_unlock(&from->deferred_split_queue.split_queue_lock);
}
+ spin_unlock(&from->deferred_split_queue.split_queue_lock);
#endif
/*
* It is safe to change page->mem_cgroup here because the page
@@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page,
page->mem_cgroup = to;

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ spin_lock(&to->deferred_split_queue.split_queue_lock);
if (compound && list_empty(page_deferred_list(page))) {
- spin_lock(&to->deferred_split_queue.split_queue_lock);
list_add_tail(page_deferred_list(page),
&to->deferred_split_queue.split_queue);
to->deferred_split_queue.split_queue_len++;
- spin_unlock(&to->deferred_split_queue.split_queue_lock);
}
+ spin_unlock(&to->deferred_split_queue.split_queue_lock);
#endif

spin_unlock_irqrestore(&from->move_lock, flags);
--
2.17.1


2020-01-03 19:30:58

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Fri, 3 Jan 2020, Wei Yang wrote:

> As all the other places, we grab the lock before manipulate the defer list.
> Current implementation may face a race condition.
>
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>
> Signed-off-by: Wei Yang <[email protected]>
>
> ---
> I notice the difference during code reading and just confused about the
> difference. No specific test is done since limited knowledge about cgroup.
>
> Maybe I miss something important?

The check for !list_empty(page_deferred_list(page)) must certainly be
serialized with doing list_del_init(page_deferred_list(page)).

> ---
> mm/memcontrol.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bc01423277c5..62b7ec34ef1a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
> }
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + spin_lock(&from->deferred_split_queue.split_queue_lock);
> if (compound && !list_empty(page_deferred_list(page))) {
> - spin_lock(&from->deferred_split_queue.split_queue_lock);
> list_del_init(page_deferred_list(page));
> from->deferred_split_queue.split_queue_len--;
> - spin_unlock(&from->deferred_split_queue.split_queue_lock);
> }
> + spin_unlock(&from->deferred_split_queue.split_queue_lock);
> #endif
> /*
> * It is safe to change page->mem_cgroup here because the page
> @@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page,
> page->mem_cgroup = to;
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + spin_lock(&to->deferred_split_queue.split_queue_lock);
> if (compound && list_empty(page_deferred_list(page))) {
> - spin_lock(&to->deferred_split_queue.split_queue_lock);
> list_add_tail(page_deferred_list(page),
> &to->deferred_split_queue.split_queue);
> to->deferred_split_queue.split_queue_len++;
> - spin_unlock(&to->deferred_split_queue.split_queue_lock);
> }
> + spin_unlock(&to->deferred_split_queue.split_queue_lock);
> #endif
>
> spin_unlock_irqrestore(&from->move_lock, flags);
> --
> 2.17.1
>
>
>

2020-01-03 23:41:13

by Wei Yang

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Fri, Jan 03, 2020 at 11:29:06AM -0800, David Rientjes wrote:
>On Fri, 3 Jan 2020, Wei Yang wrote:
>
>> As all the other places, we grab the lock before manipulate the defer list.
>> Current implementation may face a race condition.
>>
>> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>>
>> Signed-off-by: Wei Yang <[email protected]>
>>
>> ---
>> I notice the difference during code reading and just confused about the
>> difference. No specific test is done since limited knowledge about cgroup.
>>
>> Maybe I miss something important?
>
>The check for !list_empty(page_deferred_list(page)) must certainly be
>serialized with doing list_del_init(page_deferred_list(page)).
>

Hi David

Would you mind giving more information? You mean list_empty and list_del_init
is atomic?

--
Wei Yang
Help you, Help me

2020-01-04 00:46:53

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Sat, 4 Jan 2020, Wei Yang wrote:

> On Fri, Jan 03, 2020 at 11:29:06AM -0800, David Rientjes wrote:
> >On Fri, 3 Jan 2020, Wei Yang wrote:
> >
> >> As all the other places, we grab the lock before manipulate the defer list.
> >> Current implementation may face a race condition.
> >>
> >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
> >>
> >> Signed-off-by: Wei Yang <[email protected]>
> >>
> >> ---
> >> I notice the difference during code reading and just confused about the
> >> difference. No specific test is done since limited knowledge about cgroup.
> >>
> >> Maybe I miss something important?
> >
> >The check for !list_empty(page_deferred_list(page)) must certainly be
> >serialized with doing list_del_init(page_deferred_list(page)).
> >
>
> Hi David
>
> Would you mind giving more information? You mean list_empty and list_del_init
> is atomic?
>

I mean your patch is obviously correct :) It should likely also have a
[email protected] # 5.4+

Acked-by: David Rientjes <[email protected]>

2020-01-06 01:21:38

by Wei Yang

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Fri, Jan 03, 2020 at 04:44:59PM -0800, David Rientjes wrote:
>On Sat, 4 Jan 2020, Wei Yang wrote:
>
>> On Fri, Jan 03, 2020 at 11:29:06AM -0800, David Rientjes wrote:
>> >On Fri, 3 Jan 2020, Wei Yang wrote:
>> >
>> >> As all the other places, we grab the lock before manipulate the defer list.
>> >> Current implementation may face a race condition.
>> >>
>> >> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>> >>
>> >> Signed-off-by: Wei Yang <[email protected]>
>> >>
>> >> ---
>> >> I notice the difference during code reading and just confused about the
>> >> difference. No specific test is done since limited knowledge about cgroup.
>> >>
>> >> Maybe I miss something important?
>> >
>> >The check for !list_empty(page_deferred_list(page)) must certainly be
>> >serialized with doing list_del_init(page_deferred_list(page)).
>> >
>>
>> Hi David
>>
>> Would you mind giving more information? You mean list_empty and list_del_init
>> is atomic?
>>
>
>I mean your patch is obviously correct :) It should likely also have a
>[email protected] # 5.4+

Ah, my poor English ;-)

>
>Acked-by: David Rientjes <[email protected]>

--
Wei Yang
Help you, Help me

2020-01-06 10:24:59

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Fri 03-01-20 22:34:07, Wei Yang wrote:
> As all the other places, we grab the lock before manipulate the defer list.
> Current implementation may face a race condition.

Please always make sure to describe the effect of the change. Why a racy
list_empty check matters?

> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>
> Signed-off-by: Wei Yang <[email protected]>
>
> ---
> I notice the difference during code reading and just confused about the
> difference. No specific test is done since limited knowledge about cgroup.
>
> Maybe I miss something important?
> ---
> mm/memcontrol.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bc01423277c5..62b7ec34ef1a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
> }
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + spin_lock(&from->deferred_split_queue.split_queue_lock);
> if (compound && !list_empty(page_deferred_list(page))) {
> - spin_lock(&from->deferred_split_queue.split_queue_lock);
> list_del_init(page_deferred_list(page));
> from->deferred_split_queue.split_queue_len--;
> - spin_unlock(&from->deferred_split_queue.split_queue_lock);
> }
> + spin_unlock(&from->deferred_split_queue.split_queue_lock);
> #endif
> /*
> * It is safe to change page->mem_cgroup here because the page
> @@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page,
> page->mem_cgroup = to;
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + spin_lock(&to->deferred_split_queue.split_queue_lock);
> if (compound && list_empty(page_deferred_list(page))) {
> - spin_lock(&to->deferred_split_queue.split_queue_lock);
> list_add_tail(page_deferred_list(page),
> &to->deferred_split_queue.split_queue);
> to->deferred_split_queue.split_queue_len++;
> - spin_unlock(&to->deferred_split_queue.split_queue_lock);
> }
> + spin_unlock(&to->deferred_split_queue.split_queue_lock);
> #endif
>
> spin_unlock_irqrestore(&from->move_lock, flags);
> --
> 2.17.1

--
Michal Hocko
SUSE Labs

2020-01-06 16:20:10

by Alexander Duyck

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Fri, Jan 3, 2020 at 6:34 AM Wei Yang <[email protected]> wrote:
>
> As all the other places, we grab the lock before manipulate the defer list.
> Current implementation may face a race condition.
>
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>
> Signed-off-by: Wei Yang <[email protected]>
>
> ---
> I notice the difference during code reading and just confused about the
> difference. No specific test is done since limited knowledge about cgroup.
>
> Maybe I miss something important?
> ---
> mm/memcontrol.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bc01423277c5..62b7ec34ef1a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
> }
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + spin_lock(&from->deferred_split_queue.split_queue_lock);
> if (compound && !list_empty(page_deferred_list(page))) {
> - spin_lock(&from->deferred_split_queue.split_queue_lock);
> list_del_init(page_deferred_list(page));
> from->deferred_split_queue.split_queue_len--;
> - spin_unlock(&from->deferred_split_queue.split_queue_lock);
> }
> + spin_unlock(&from->deferred_split_queue.split_queue_lock);
> #endif
> /*
> * It is safe to change page->mem_cgroup here because the page

So I suspect the lock placement has to do with the compound boolean
value passed to the function.

One thing you might want to do is pull the "if (compound)" check out
and place it outside of the spinlock check. It would then simplify
this signficantly so it is something like
if (compound) {
spin_lock();
list = page_deferred_list(page);
if (!list_empty(list)) {
list_del_init(list);
from->..split_queue_len--;
}
spin_unlock();
}

Same for the block below. I would pull the check for compound outside
of the spinlock call since it is a value that shouldn't change and
would eliminate an unnecessary lock in the non-compound case.

> @@ -5385,13 +5385,13 @@ static int mem_cgroup_move_account(struct page *page,
> page->mem_cgroup = to;
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + spin_lock(&to->deferred_split_queue.split_queue_lock);
> if (compound && list_empty(page_deferred_list(page))) {
> - spin_lock(&to->deferred_split_queue.split_queue_lock);
> list_add_tail(page_deferred_list(page),
> &to->deferred_split_queue.split_queue);
> to->deferred_split_queue.split_queue_len++;
> - spin_unlock(&to->deferred_split_queue.split_queue_lock);
> }
> + spin_unlock(&to->deferred_split_queue.split_queue_lock);
> #endif
>
> spin_unlock_irqrestore(&from->move_lock, flags);
> --

2020-01-07 01:24:22

by Wei Yang

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
>On Fri 03-01-20 22:34:07, Wei Yang wrote:
>> As all the other places, we grab the lock before manipulate the defer list.
>> Current implementation may face a race condition.
>
>Please always make sure to describe the effect of the change. Why a racy
>list_empty check matters?
>

Hmm... access the list without proper lock leads to many bad behaviors.

For example, if we grab the lock after checking list_empty, the page may
already be removed from list in split_huge_page_list. And then list_del_init
would trigger bug.

--
Wei Yang
Help you, Help me

2020-01-07 01:27:15

by Wei Yang

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Mon, Jan 06, 2020 at 08:18:34AM -0800, Alexander Duyck wrote:
>On Fri, Jan 3, 2020 at 6:34 AM Wei Yang <[email protected]> wrote:
>>
>> As all the other places, we grab the lock before manipulate the defer list.
>> Current implementation may face a race condition.
>>
>> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>>
>> Signed-off-by: Wei Yang <[email protected]>
>>
>> ---
>> I notice the difference during code reading and just confused about the
>> difference. No specific test is done since limited knowledge about cgroup.
>>
>> Maybe I miss something important?
>> ---
>> mm/memcontrol.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index bc01423277c5..62b7ec34ef1a 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5368,12 +5368,12 @@ static int mem_cgroup_move_account(struct page *page,
>> }
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> + spin_lock(&from->deferred_split_queue.split_queue_lock);
>> if (compound && !list_empty(page_deferred_list(page))) {
>> - spin_lock(&from->deferred_split_queue.split_queue_lock);
>> list_del_init(page_deferred_list(page));
>> from->deferred_split_queue.split_queue_len--;
>> - spin_unlock(&from->deferred_split_queue.split_queue_lock);
>> }
>> + spin_unlock(&from->deferred_split_queue.split_queue_lock);
>> #endif
>> /*
>> * It is safe to change page->mem_cgroup here because the page
>
>So I suspect the lock placement has to do with the compound boolean
>value passed to the function.
>

Hey, Alexander

Thanks for your comment.

>One thing you might want to do is pull the "if (compound)" check out
>and place it outside of the spinlock check. It would then simplify
>this signficantly so it is something like
>if (compound) {
> spin_lock();
> list = page_deferred_list(page);
> if (!list_empty(list)) {
> list_del_init(list);
> from->..split_queue_len--;
> }
> spin_unlock();
>}
>
>Same for the block below. I would pull the check for compound outside
>of the spinlock call since it is a value that shouldn't change and
>would eliminate an unnecessary lock in the non-compound case.

This is reasonable, if no objection from others, I would change this in v2.


--
Wei Yang
Help you, Help me

2020-01-07 02:08:31

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Tue, 7 Jan 2020, Wei Yang wrote:

> >One thing you might want to do is pull the "if (compound)" check out
> >and place it outside of the spinlock check. It would then simplify
> >this signficantly so it is something like
> >if (compound) {
> > spin_lock();
> > list = page_deferred_list(page);
> > if (!list_empty(list)) {
> > list_del_init(list);
> > from->..split_queue_len--;
> > }
> > spin_unlock();
> >}
> >
> >Same for the block below. I would pull the check for compound outside
> >of the spinlock call since it is a value that shouldn't change and
> >would eliminate an unnecessary lock in the non-compound case.
>
> This is reasonable, if no objection from others, I would change this in v2.

Looks fine to me; I don't see it as a necessary improvement but there's
also no reason to object to it. It's definitely a patch that is needed,
however, for the simple reason that with the existing code we can
manipulate the deferred split queue incorrectly so either way works for
me. Feel free to keep my acked-by.

2020-01-07 02:36:30

by Wei Yang

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Mon, Jan 06, 2020 at 06:07:29PM -0800, David Rientjes wrote:
>On Tue, 7 Jan 2020, Wei Yang wrote:
>
>> >One thing you might want to do is pull the "if (compound)" check out
>> >and place it outside of the spinlock check. It would then simplify
>> >this signficantly so it is something like
>> >if (compound) {
>> > spin_lock();
>> > list = page_deferred_list(page);
>> > if (!list_empty(list)) {
>> > list_del_init(list);
>> > from->..split_queue_len--;
>> > }
>> > spin_unlock();
>> >}
>> >
>> >Same for the block below. I would pull the check for compound outside
>> >of the spinlock call since it is a value that shouldn't change and
>> >would eliminate an unnecessary lock in the non-compound case.
>>
>> This is reasonable, if no objection from others, I would change this in v2.
>
>Looks fine to me; I don't see it as a necessary improvement but there's
>also no reason to object to it. It's definitely a patch that is needed,
>however, for the simple reason that with the existing code we can
>manipulate the deferred split queue incorrectly so either way works for
>me. Feel free to keep my acked-by.

Ah, thanks David. You are so supportive.

--
Wei Yang
Help you, Help me

2020-01-07 08:39:35

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Tue 07-01-20 09:22:41, Wei Yang wrote:
> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
> >> As all the other places, we grab the lock before manipulate the defer list.
> >> Current implementation may face a race condition.
> >
> >Please always make sure to describe the effect of the change. Why a racy
> >list_empty check matters?
> >
>
> Hmm... access the list without proper lock leads to many bad behaviors.

My point is that the changelog should describe that bad behavior.

> For example, if we grab the lock after checking list_empty, the page may
> already be removed from list in split_huge_page_list. And then list_del_init
> would trigger bug.

And how does list_empty check under the lock guarantee that the page is
on the deferred list?
--
Michal Hocko
SUSE Labs

2020-01-08 00:38:00

by Wei Yang

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
>On Tue 07-01-20 09:22:41, Wei Yang wrote:
>> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
>> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
>> >> As all the other places, we grab the lock before manipulate the defer list.
>> >> Current implementation may face a race condition.
>> >
>> >Please always make sure to describe the effect of the change. Why a racy
>> >list_empty check matters?
>> >
>>
>> Hmm... access the list without proper lock leads to many bad behaviors.
>
>My point is that the changelog should describe that bad behavior.
>
>> For example, if we grab the lock after checking list_empty, the page may
>> already be removed from list in split_huge_page_list. And then list_del_init
>> would trigger bug.
>
>And how does list_empty check under the lock guarantee that the page is
>on the deferred list?

Just one confusion, is this kind of description basic concept of concurrent
programming? How detail level we need to describe the effect?

To me, grab the lock before accessing the critical section is obvious.
list_empty and list_del should be the critical section. And the
lock should protect the whole critical section instead of part of it.

>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me

2020-01-08 09:57:25

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Wed 08-01-20 08:35:43, Wei Yang wrote:
> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
> >> >> As all the other places, we grab the lock before manipulate the defer list.
> >> >> Current implementation may face a race condition.
> >> >
> >> >Please always make sure to describe the effect of the change. Why a racy
> >> >list_empty check matters?
> >> >
> >>
> >> Hmm... access the list without proper lock leads to many bad behaviors.
> >
> >My point is that the changelog should describe that bad behavior.
> >
> >> For example, if we grab the lock after checking list_empty, the page may
> >> already be removed from list in split_huge_page_list. And then list_del_init
> >> would trigger bug.
> >
> >And how does list_empty check under the lock guarantee that the page is
> >on the deferred list?
>
> Just one confusion, is this kind of description basic concept of concurrent
> programming? How detail level we need to describe the effect?

When I write changelogs for patches like this I usually describe, what
is the potential race - e.g.
CPU1 CPU2
path1 path2
check lock
operation2
unlock
lock
# check might not hold anymore
operation1
unlock

and what is the effect of the race - e.g. a crash, data corruption,
pointless attempt for operation1 which fails with user visible effect
etc.
This helps reviewers and everybody reading the code in the future to
understand the locking scheme.

> To me, grab the lock before accessing the critical section is obvious.

It might be obvious but in many cases it is useful to minimize the
locking and do a potentially race check before the lock is taken if the
resulting operation can handle that.

> list_empty and list_del should be the critical section. And the
> lock should protect the whole critical section instead of part of it.

I am not disputing that. What I am trying to say is that the changelog
should described the problem in the first place.

Moreover, look at the code you are trying to fix. Sure extending the
locking seem straightforward but does it result in a correct code
though? See my question in the previous email. How do we know that the
page is actually enqued in a non-empty list?
--
Michal Hocko
SUSE Labs

2020-01-09 02:04:49

by Wei Yang

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote:
>On Wed 08-01-20 08:35:43, Wei Yang wrote:
>> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
>> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
>> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
>> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
>> >> >> As all the other places, we grab the lock before manipulate the defer list.
>> >> >> Current implementation may face a race condition.
>> >> >
>> >> >Please always make sure to describe the effect of the change. Why a racy
>> >> >list_empty check matters?
>> >> >
>> >>
>> >> Hmm... access the list without proper lock leads to many bad behaviors.
>> >
>> >My point is that the changelog should describe that bad behavior.
>> >
>> >> For example, if we grab the lock after checking list_empty, the page may
>> >> already be removed from list in split_huge_page_list. And then list_del_init
>> >> would trigger bug.
>> >
>> >And how does list_empty check under the lock guarantee that the page is
>> >on the deferred list?
>>
>> Just one confusion, is this kind of description basic concept of concurrent
>> programming? How detail level we need to describe the effect?
>
>When I write changelogs for patches like this I usually describe, what
>is the potential race - e.g.
> CPU1 CPU2
> path1 path2
> check lock
> operation2
> unlock
> lock
> # check might not hold anymore
> operation1
> unlock
>

Nice, I would prepare a changelog like this.

>and what is the effect of the race - e.g. a crash, data corruption,
>pointless attempt for operation1 which fails with user visible effect
>etc.
>This helps reviewers and everybody reading the code in the future to
>understand the locking scheme.
>
>> To me, grab the lock before accessing the critical section is obvious.
>
>It might be obvious but in many cases it is useful to minimize the
>locking and do a potentially race check before the lock is taken if the
>resulting operation can handle that.
>
>> list_empty and list_del should be the critical section. And the
>> lock should protect the whole critical section instead of part of it.
>
>I am not disputing that. What I am trying to say is that the changelog
>should described the problem in the first place.
>
>Moreover, look at the code you are trying to fix. Sure extending the
>locking seem straightforward but does it result in a correct code
>though? See my question in the previous email. How do we know that the
>page is actually enqued in a non-empty list?

I may not get your point for the last sentence.

The list_empty() doesn't check the queue is empty but check the list, here is
the page, is not enqueued into any list. Is this your concern?

>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me

2020-01-09 03:19:48

by Wei Yang

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote:
>On Wed 08-01-20 08:35:43, Wei Yang wrote:
>> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
>> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
>> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
>> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
>> >> >> As all the other places, we grab the lock before manipulate the defer list.
>> >> >> Current implementation may face a race condition.
>> >> >
>> >> >Please always make sure to describe the effect of the change. Why a racy
>> >> >list_empty check matters?
>> >> >
>> >>
>> >> Hmm... access the list without proper lock leads to many bad behaviors.
>> >
>> >My point is that the changelog should describe that bad behavior.
>> >
>> >> For example, if we grab the lock after checking list_empty, the page may
>> >> already be removed from list in split_huge_page_list. And then list_del_init
>> >> would trigger bug.
>> >
>> >And how does list_empty check under the lock guarantee that the page is
>> >on the deferred list?
>>
>> Just one confusion, is this kind of description basic concept of concurrent
>> programming? How detail level we need to describe the effect?
>
>When I write changelogs for patches like this I usually describe, what
>is the potential race - e.g.
> CPU1 CPU2
> path1 path2
> check lock
> operation2
> unlock
> lock
> # check might not hold anymore
> operation1
> unlock
>
>and what is the effect of the race - e.g. a crash, data corruption,
>pointless attempt for operation1 which fails with user visible effect
>etc.

Hi, Michal, here is my attempt for an example. Hope this one looks good to
you.


For example, the potential race would be:

CPU1 CPU2
mem_cgroup_move_account split_huge_page_to_list
!list_empty
lock
!list_empty
list_del
unlock
lock
# !list_empty might not hold anymore
list_del_init
unlock

When this sequence happens, the list_del_init() in
mem_cgroup_move_account() would crash since the page is already been
removed by list_del in split_huge_page_to_list().


--
Wei Yang
Help you, Help me

2020-01-09 08:36:48

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Thu 09-01-20 10:03:19, Wei Yang wrote:
> On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote:
[...]
> >Moreover, look at the code you are trying to fix. Sure extending the
> >locking seem straightforward but does it result in a correct code
> >though? See my question in the previous email. How do we know that the
> >page is actually enqued in a non-empty list?
>
> I may not get your point for the last sentence.
>
> The list_empty() doesn't check the queue is empty but check the list, here is
> the page, is not enqueued into any list. Is this your concern?

My bad. For some reason I have misread the code and thought this was
get_deferred_split_queue rather than page_deferred_list. Sorry about the
confusion.
--
Michal Hocko
SUSE Labs

2020-01-09 08:37:40

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Thu 09-01-20 11:18:21, Wei Yang wrote:
> On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote:
> >On Wed 08-01-20 08:35:43, Wei Yang wrote:
> >> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
> >> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
> >> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
> >> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
> >> >> >> As all the other places, we grab the lock before manipulate the defer list.
> >> >> >> Current implementation may face a race condition.
> >> >> >
> >> >> >Please always make sure to describe the effect of the change. Why a racy
> >> >> >list_empty check matters?
> >> >> >
> >> >>
> >> >> Hmm... access the list without proper lock leads to many bad behaviors.
> >> >
> >> >My point is that the changelog should describe that bad behavior.
> >> >
> >> >> For example, if we grab the lock after checking list_empty, the page may
> >> >> already be removed from list in split_huge_page_list. And then list_del_init
> >> >> would trigger bug.
> >> >
> >> >And how does list_empty check under the lock guarantee that the page is
> >> >on the deferred list?
> >>
> >> Just one confusion, is this kind of description basic concept of concurrent
> >> programming? How detail level we need to describe the effect?
> >
> >When I write changelogs for patches like this I usually describe, what
> >is the potential race - e.g.
> > CPU1 CPU2
> > path1 path2
> > check lock
> > operation2
> > unlock
> > lock
> > # check might not hold anymore
> > operation1
> > unlock
> >
> >and what is the effect of the race - e.g. a crash, data corruption,
> >pointless attempt for operation1 which fails with user visible effect
> >etc.
>
> Hi, Michal, here is my attempt for an example. Hope this one looks good to
> you.
>
>
> For example, the potential race would be:
>
> CPU1 CPU2
> mem_cgroup_move_account split_huge_page_to_list
> !list_empty
> lock
> !list_empty
> list_del
> unlock
> lock
> # !list_empty might not hold anymore
> list_del_init
> unlock
>
> When this sequence happens, the list_del_init() in
> mem_cgroup_move_account() would crash since the page is already been
> removed by list_del in split_huge_page_to_list().

Yes this looks much more informative. I would just add that this will
crash if CONFIG_DEBUG_LIST.

Thanks!
--
Michal Hocko
SUSE Labs

2020-01-09 10:12:00

by Wei Yang

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list

On Thu, Jan 09, 2020 at 09:36:41AM +0100, Michal Hocko wrote:
>On Thu 09-01-20 11:18:21, Wei Yang wrote:
>> On Wed, Jan 08, 2020 at 10:40:41AM +0100, Michal Hocko wrote:
>> >On Wed 08-01-20 08:35:43, Wei Yang wrote:
>> >> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
>> >> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
>> >> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
>> >> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
>> >> >> >> As all the other places, we grab the lock before manipulate the defer list.
>> >> >> >> Current implementation may face a race condition.
>> >> >> >
>> >> >> >Please always make sure to describe the effect of the change. Why a racy
>> >> >> >list_empty check matters?
>> >> >> >
>> >> >>
>> >> >> Hmm... access the list without proper lock leads to many bad behaviors.
>> >> >
>> >> >My point is that the changelog should describe that bad behavior.
>> >> >
>> >> >> For example, if we grab the lock after checking list_empty, the page may
>> >> >> already be removed from list in split_huge_page_list. And then list_del_init
>> >> >> would trigger bug.
>> >> >
>> >> >And how does list_empty check under the lock guarantee that the page is
>> >> >on the deferred list?
>> >>
>> >> Just one confusion, is this kind of description basic concept of concurrent
>> >> programming? How detail level we need to describe the effect?
>> >
>> >When I write changelogs for patches like this I usually describe, what
>> >is the potential race - e.g.
>> > CPU1 CPU2
>> > path1 path2
>> > check lock
>> > operation2
>> > unlock
>> > lock
>> > # check might not hold anymore
>> > operation1
>> > unlock
>> >
>> >and what is the effect of the race - e.g. a crash, data corruption,
>> >pointless attempt for operation1 which fails with user visible effect
>> >etc.
>>
>> Hi, Michal, here is my attempt for an example. Hope this one looks good to
>> you.
>>
>>
>> For example, the potential race would be:
>>
>> CPU1 CPU2
>> mem_cgroup_move_account split_huge_page_to_list
>> !list_empty
>> lock
>> !list_empty
>> list_del
>> unlock
>> lock
>> # !list_empty might not hold anymore
>> list_del_init
>> unlock
>>
>> When this sequence happens, the list_del_init() in
>> mem_cgroup_move_account() would crash since the page is already been
>> removed by list_del in split_huge_page_to_list().
>
>Yes this looks much more informative. I would just add that this will
>crash if CONFIG_DEBUG_LIST.
>
>Thanks!

Glad you like it~

Will prepare v2 with your suggestion :-)

>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me