2020-01-17 23:40:15

by Wei Yang

[permalink] [raw]
Subject: [Patch v4] mm: thp: remove the defer list related code since this will not happen

If compound is true, this means it is a PMD mapped THP. Which implies
the page is not linked to any defer list. So the first code chunk will
not be executed.

Also with this reason, it would not be proper to add this page to a
defer list. So the second code chunk is not correct.

Based on this, we should remove the defer list related code.

Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")

Signed-off-by: Wei Yang <[email protected]>
Suggested-by: Kirill A. Shutemov <[email protected]>
Cc: <[email protected]> [5.4+]

---
v4:
* finally we identified the related code is not necessary and not
correct, just remove it
* thanks to Kirill T first spot some problem
v3:
* remove all review/ack tag since rewrite the changelog
* use deferred_split_huge_page as the example of race
* add cc stable 5.4+ tag as suggested by David Rientjes

v2:
* move check on compound outside suggested by Alexander
* an example of the race condition, suggested by Michal
---
mm/memcontrol.c | 18 ------------------
1 file changed, 18 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6c83cf4ed970..27c231bf4565 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struct page *page,
__mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
}

-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (compound && !list_empty(page_deferred_list(page))) {
- spin_lock(&from->deferred_split_queue.split_queue_lock);
- list_del_init(page_deferred_list(page));
- from->deferred_split_queue.split_queue_len--;
- spin_unlock(&from->deferred_split_queue.split_queue_lock);
- }
-#endif
/*
* It is safe to change page->mem_cgroup here because the page
* is referenced, charged, and isolated - we can't race with
@@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struct page *page,
/* caller should have done css_get */
page->mem_cgroup = to;

-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (compound && list_empty(page_deferred_list(page))) {
- spin_lock(&to->deferred_split_queue.split_queue_lock);
- list_add_tail(page_deferred_list(page),
- &to->deferred_split_queue.split_queue);
- to->deferred_split_queue.split_queue_len++;
- spin_unlock(&to->deferred_split_queue.split_queue_lock);
- }
-#endif
-
spin_unlock_irqrestore(&from->move_lock, flags);

ret = 0;
--
2.17.1


2020-01-18 00:59:36

by Yang Shi

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen



On 1/17/20 3:38 PM, Wei Yang wrote:
> If compound is true, this means it is a PMD mapped THP. Which implies
> the page is not linked to any defer list. So the first code chunk will
> not be executed.
>
> Also with this reason, it would not be proper to add this page to a
> defer list. So the second code chunk is not correct.
>
> Based on this, we should remove the defer list related code.
>
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>
> Signed-off-by: Wei Yang <[email protected]>
> Suggested-by: Kirill A. Shutemov <[email protected]>
> Cc: <[email protected]> [5.4+]
>
> ---
> v4:
> * finally we identified the related code is not necessary and not
> correct, just remove it
> * thanks to Kirill T first spot some problem

Thanks for debugging and figuring this out. Acked-by: Yang Shi
<[email protected]>

> v3:
> * remove all review/ack tag since rewrite the changelog
> * use deferred_split_huge_page as the example of race
> * add cc stable 5.4+ tag as suggested by David Rientjes
>
> v2:
> * move check on compound outside suggested by Alexander
> * an example of the race condition, suggested by Michal
> ---
> mm/memcontrol.c | 18 ------------------
> 1 file changed, 18 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6c83cf4ed970..27c231bf4565 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struct page *page,
> __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
> }
>
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - if (compound && !list_empty(page_deferred_list(page))) {
> - spin_lock(&from->deferred_split_queue.split_queue_lock);
> - list_del_init(page_deferred_list(page));
> - from->deferred_split_queue.split_queue_len--;
> - spin_unlock(&from->deferred_split_queue.split_queue_lock);
> - }
> -#endif
> /*
> * It is safe to change page->mem_cgroup here because the page
> * is referenced, charged, and isolated - we can't race with
> @@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struct page *page,
> /* caller should have done css_get */
> page->mem_cgroup = to;
>
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - if (compound && list_empty(page_deferred_list(page))) {
> - spin_lock(&to->deferred_split_queue.split_queue_lock);
> - list_add_tail(page_deferred_list(page),
> - &to->deferred_split_queue.split_queue);
> - to->deferred_split_queue.split_queue_len++;
> - spin_unlock(&to->deferred_split_queue.split_queue_lock);
> - }
> -#endif
> -
> spin_unlock_irqrestore(&from->move_lock, flags);
>
> ret = 0;

2020-01-18 05:32:12

by Yang Shi

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen



On 1/17/20 4:57 PM, Yang Shi wrote:
>
>
> On 1/17/20 3:38 PM, Wei Yang wrote:
>> If compound is true, this means it is a PMD mapped THP. Which implies
>> the page is not linked to any defer list. So the first code chunk will
>> not be executed.
>>
>> Also with this reason, it would not be proper to add this page to a
>> defer list. So the second code chunk is not correct.
>>
>> Based on this, we should remove the defer list related code.
>>
>> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg
>> aware")
>>
>> Signed-off-by: Wei Yang <[email protected]>
>> Suggested-by: Kirill A. Shutemov <[email protected]>
>> Cc: <[email protected]>    [5.4+]
>>
>> ---
>> v4:
>>    * finally we identified the related code is not necessary and not
>>      correct, just remove it
>>    * thanks to Kirill T first spot some problem
>
> Thanks for debugging and figuring this out. Acked-by: Yang Shi
> <[email protected]>

BTW, the patch itself is fine, but the subject looks really confusing.
It sounds like we would remove all deferred list code. I'd suggest
rephrase it to:

mm: thp: don't need care deferred split queue in memcg charge move path

>
>> v3:
>>    * remove all review/ack tag since rewrite the changelog
>>    * use deferred_split_huge_page as the example of race
>>    * add cc stable 5.4+ tag as suggested by David Rientjes
>>
>> v2:
>>    * move check on compound outside suggested by Alexander
>>    * an example of the race condition, suggested by Michal
>> ---
>>   mm/memcontrol.c | 18 ------------------
>>   1 file changed, 18 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 6c83cf4ed970..27c231bf4565 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struct page
>> *page,
>>           __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
>>       }
>>   -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -    if (compound && !list_empty(page_deferred_list(page))) {
>> - spin_lock(&from->deferred_split_queue.split_queue_lock);
>> -        list_del_init(page_deferred_list(page));
>> -        from->deferred_split_queue.split_queue_len--;
>> - spin_unlock(&from->deferred_split_queue.split_queue_lock);
>> -    }
>> -#endif
>>       /*
>>        * It is safe to change page->mem_cgroup here because the page
>>        * is referenced, charged, and isolated - we can't race with
>> @@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struct page
>> *page,
>>       /* caller should have done css_get */
>>       page->mem_cgroup = to;
>>   -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -    if (compound && list_empty(page_deferred_list(page))) {
>> - spin_lock(&to->deferred_split_queue.split_queue_lock);
>> -        list_add_tail(page_deferred_list(page),
>> - &to->deferred_split_queue.split_queue);
>> -        to->deferred_split_queue.split_queue_len++;
>> - spin_unlock(&to->deferred_split_queue.split_queue_lock);
>> -    }
>> -#endif
>> -
>>       spin_unlock_irqrestore(&from->move_lock, flags);
>>         ret = 0;
>

2020-01-18 22:56:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <[email protected]> wrote:

> If compound is true, this means it is a PMD mapped THP. Which implies
> the page is not linked to any defer list. So the first code chunk will
> not be executed.
>
> Also with this reason, it would not be proper to add this page to a
> defer list. So the second code chunk is not correct.
>
> Based on this, we should remove the defer list related code.
>
> Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>
> Signed-off-by: Wei Yang <[email protected]>
> Suggested-by: Kirill A. Shutemov <[email protected]>
> Cc: <[email protected]> [5.4+]

This patch is identical to "mm: thp: grab the lock before manipulating
defer list", which is rather confusing. Please let people know when
this sort of thing is done.

The earlier changelog mentioned a possible race condition. This
changelog does not. In fact this changelog fails to provide any
description of any userspace-visible runtime effects of the bug.
Please send along such a description for inclusion, as always.

2020-01-18 23:39:46

by David Rientjes

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Sat, 18 Jan 2020, Andrew Morton wrote:

> On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <[email protected]> wrote:
>
> > If compound is true, this means it is a PMD mapped THP. Which implies
> > the page is not linked to any defer list. So the first code chunk will
> > not be executed.
> >
> > Also with this reason, it would not be proper to add this page to a
> > defer list. So the second code chunk is not correct.
> >
> > Based on this, we should remove the defer list related code.
> >
> > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
> >
> > Signed-off-by: Wei Yang <[email protected]>
> > Suggested-by: Kirill A. Shutemov <[email protected]>
> > Cc: <[email protected]> [5.4+]
>
> This patch is identical to "mm: thp: grab the lock before manipulating
> defer list", which is rather confusing. Please let people know when
> this sort of thing is done.
>
> The earlier changelog mentioned a possible race condition. This
> changelog does not. In fact this changelog fails to provide any
> description of any userspace-visible runtime effects of the bug.
> Please send along such a description for inclusion, as always.
>

The locking concern that Wei was originally looking at is no longer an
issue because we determined that the code in question could simply be
removed.

I think the following can be added to the changelog:

----->o-----

When migrating memcg charges of thp memory, there are two possibilities:

(1) The underlying compound page is mapped by a pmd and thus does is not
on a deferred split queue (it's mapped), or

(2) The compound page is not mapped by a pmd and is awaiting split on a
deferred split queue.

The current charge migration implementation does *not* migrate charges for
thp memory on the deferred split queue, it only migrates charges for pages
that are mapped by a pmd.

Thus, to migrate charges, the underlying compound page cannot be on a
deferred split queue; no list manipulation needs to be done in
mem_cgroup_move_account().

With the current code, the underlying compound page is moved to the
deferred split queue of the memcg its memory is not charged to, so
susbequent reclaim will consider these pages for the wrong memcg. Remove
the deferred split queue handling in mem_cgroup_move_account() entirely.

----->o-----

Acked-by: David Rientjes <[email protected]>

2020-01-19 02:26:06

by Wei Yang

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Sat, Jan 18, 2020 at 03:36:06PM -0800, David Rientjes wrote:
>On Sat, 18 Jan 2020, Andrew Morton wrote:
>
>> On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <[email protected]> wrote:
>>
>> > If compound is true, this means it is a PMD mapped THP. Which implies
>> > the page is not linked to any defer list. So the first code chunk will
>> > not be executed.
>> >
>> > Also with this reason, it would not be proper to add this page to a
>> > defer list. So the second code chunk is not correct.
>> >
>> > Based on this, we should remove the defer list related code.
>> >
>> > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>> >
>> > Signed-off-by: Wei Yang <[email protected]>
>> > Suggested-by: Kirill A. Shutemov <[email protected]>
>> > Cc: <[email protected]> [5.4+]
>>
>> This patch is identical to "mm: thp: grab the lock before manipulating
>> defer list", which is rather confusing. Please let people know when
>> this sort of thing is done.
>>
>> The earlier changelog mentioned a possible race condition. This
>> changelog does not. In fact this changelog fails to provide any
>> description of any userspace-visible runtime effects of the bug.
>> Please send along such a description for inclusion, as always.
>>
>
>The locking concern that Wei was originally looking at is no longer an
>issue because we determined that the code in question could simply be
>removed.
>
>I think the following can be added to the changelog:
>
>----->o-----
>
>When migrating memcg charges of thp memory, there are two possibilities:
>
> (1) The underlying compound page is mapped by a pmd and thus does is not
> on a deferred split queue (it's mapped), or
>
> (2) The compound page is not mapped by a pmd and is awaiting split on a
> deferred split queue.
>
>The current charge migration implementation does *not* migrate charges for
>thp memory on the deferred split queue, it only migrates charges for pages
>that are mapped by a pmd.
>
>Thus, to migrate charges, the underlying compound page cannot be on a
>deferred split queue; no list manipulation needs to be done in
>mem_cgroup_move_account().
>
>With the current code, the underlying compound page is moved to the
>deferred split queue of the memcg its memory is not charged to, so
>susbequent reclaim will consider these pages for the wrong memcg. Remove
>the deferred split queue handling in mem_cgroup_move_account() entirely.
>
>----->o-----
>
>Acked-by: David Rientjes <[email protected]>

Hi David,

The changlog looks awesome to me. Thanks ~

Hi Andrew

I see you queue this in you tree, do I need to rewrite a patch with better
changelog?

--
Wei Yang
Help you, Help me

2020-01-20 07:23:48

by Michal Hocko

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Sat 18-01-20 15:36:06, David Rientjes wrote:
> On Sat, 18 Jan 2020, Andrew Morton wrote:
>
> > On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <[email protected]> wrote:
> >
> > > If compound is true, this means it is a PMD mapped THP. Which implies
> > > the page is not linked to any defer list. So the first code chunk will
> > > not be executed.
> > >
> > > Also with this reason, it would not be proper to add this page to a
> > > defer list. So the second code chunk is not correct.
> > >
> > > Based on this, we should remove the defer list related code.
> > >
> > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
> > >
> > > Signed-off-by: Wei Yang <[email protected]>
> > > Suggested-by: Kirill A. Shutemov <[email protected]>
> > > Cc: <[email protected]> [5.4+]
> >
> > This patch is identical to "mm: thp: grab the lock before manipulating
> > defer list", which is rather confusing. Please let people know when
> > this sort of thing is done.
> >
> > The earlier changelog mentioned a possible race condition. This
> > changelog does not. In fact this changelog fails to provide any
> > description of any userspace-visible runtime effects of the bug.
> > Please send along such a description for inclusion, as always.
> >
>
> The locking concern that Wei was originally looking at is no longer an
> issue because we determined that the code in question could simply be
> removed.
>
> I think the following can be added to the changelog:
>
> ----->o-----
>
> When migrating memcg charges of thp memory, there are two possibilities:
>
> (1) The underlying compound page is mapped by a pmd and thus does is not
> on a deferred split queue (it's mapped), or
>
> (2) The compound page is not mapped by a pmd and is awaiting split on a
> deferred split queue.
>
> The current charge migration implementation does *not* migrate charges for
> thp memory on the deferred split queue, it only migrates charges for pages
> that are mapped by a pmd.
>
> Thus, to migrate charges, the underlying compound page cannot be on a
> deferred split queue; no list manipulation needs to be done in
> mem_cgroup_move_account().
>
> With the current code, the underlying compound page is moved to the
> deferred split queue of the memcg its memory is not charged to, so
> susbequent reclaim will consider these pages for the wrong memcg. Remove
> the deferred split queue handling in mem_cgroup_move_account() entirely.

I believe this still doesn't describe the underlying problem to the full
extent. What happens with the page on the deferred list when it
shouldn't be there in fact? Unless I am missing something deferred_split_scan
will simply split that huge page. Which is a bit unfortunate but nothing
really critical. This should be mentioned in the changelog.

With that clarified, feel free to add

Acked-by: Michal Hocko <[email protected]>

--
Michal Hocko
SUSE Labs

2020-01-20 08:18:07

by Wei Yang

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Mon, Jan 20, 2020 at 08:22:37AM +0100, Michal Hocko wrote:
>On Sat 18-01-20 15:36:06, David Rientjes wrote:
>> On Sat, 18 Jan 2020, Andrew Morton wrote:
>>
>> > On Sat, 18 Jan 2020 07:38:36 +0800 Wei Yang <[email protected]> wrote:
>> >
>> > > If compound is true, this means it is a PMD mapped THP. Which implies
>> > > the page is not linked to any defer list. So the first code chunk will
>> > > not be executed.
>> > >
>> > > Also with this reason, it would not be proper to add this page to a
>> > > defer list. So the second code chunk is not correct.
>> > >
>> > > Based on this, we should remove the defer list related code.
>> > >
>> > > Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
>> > >
>> > > Signed-off-by: Wei Yang <[email protected]>
>> > > Suggested-by: Kirill A. Shutemov <[email protected]>
>> > > Cc: <[email protected]> [5.4+]
>> >
>> > This patch is identical to "mm: thp: grab the lock before manipulating
>> > defer list", which is rather confusing. Please let people know when
>> > this sort of thing is done.
>> >
>> > The earlier changelog mentioned a possible race condition. This
>> > changelog does not. In fact this changelog fails to provide any
>> > description of any userspace-visible runtime effects of the bug.
>> > Please send along such a description for inclusion, as always.
>> >
>>
>> The locking concern that Wei was originally looking at is no longer an
>> issue because we determined that the code in question could simply be
>> removed.
>>
>> I think the following can be added to the changelog:
>>
>> ----->o-----
>>
>> When migrating memcg charges of thp memory, there are two possibilities:
>>
>> (1) The underlying compound page is mapped by a pmd and thus does is not
>> on a deferred split queue (it's mapped), or
>>
>> (2) The compound page is not mapped by a pmd and is awaiting split on a
>> deferred split queue.
>>
>> The current charge migration implementation does *not* migrate charges for
>> thp memory on the deferred split queue, it only migrates charges for pages
>> that are mapped by a pmd.
>>
>> Thus, to migrate charges, the underlying compound page cannot be on a
>> deferred split queue; no list manipulation needs to be done in
>> mem_cgroup_move_account().
>>
>> With the current code, the underlying compound page is moved to the
>> deferred split queue of the memcg its memory is not charged to, so
>> susbequent reclaim will consider these pages for the wrong memcg. Remove
>> the deferred split queue handling in mem_cgroup_move_account() entirely.
>
>I believe this still doesn't describe the underlying problem to the full
>extent. What happens with the page on the deferred list when it
>shouldn't be there in fact? Unless I am missing something deferred_split_scan
>will simply split that huge page. Which is a bit unfortunate but nothing
>really critical. This should be mentioned in the changelog.
>

Per my understanding, if we do the split when it is not necessary, we
probably have a lower performance due to tlb miss. For others, I don't see the
impact.

>With that clarified, feel free to add
>
>Acked-by: Michal Hocko <[email protected]>
>
>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me

2020-01-20 21:12:04

by David Rientjes

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Mon, 20 Jan 2020, Michal Hocko wrote:

> > When migrating memcg charges of thp memory, there are two possibilities:
> >
> > (1) The underlying compound page is mapped by a pmd and thus does is not
> > on a deferred split queue (it's mapped), or
> >
> > (2) The compound page is not mapped by a pmd and is awaiting split on a
> > deferred split queue.
> >
> > The current charge migration implementation does *not* migrate charges for
> > thp memory on the deferred split queue, it only migrates charges for pages
> > that are mapped by a pmd.
> >
> > Thus, to migrate charges, the underlying compound page cannot be on a
> > deferred split queue; no list manipulation needs to be done in
> > mem_cgroup_move_account().
> >
> > With the current code, the underlying compound page is moved to the
> > deferred split queue of the memcg its memory is not charged to, so
> > susbequent reclaim will consider these pages for the wrong memcg. Remove
> > the deferred split queue handling in mem_cgroup_move_account() entirely.
>
> I believe this still doesn't describe the underlying problem to the full
> extent. What happens with the page on the deferred list when it
> shouldn't be there in fact? Unless I am missing something deferred_split_scan
> will simply split that huge page. Which is a bit unfortunate but nothing
> really critical. This should be mentioned in the changelog.
>

Are you referring to a compound page on the deferred split queue before a
task is moved? I'm not sure this is within the scope of Wei's patch..
this is simply preventing a page from being moved to the deferred split
queue of a memcg that it is not charged to. Is there a concern about why
this code can be removed or a suggestion on something else it should be
doing instead?

2020-01-20 21:28:37

by Michal Hocko

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Mon 20-01-20 13:10:56, David Rientjes wrote:
> On Mon, 20 Jan 2020, Michal Hocko wrote:
>
> > > When migrating memcg charges of thp memory, there are two possibilities:
> > >
> > > (1) The underlying compound page is mapped by a pmd and thus does is not
> > > on a deferred split queue (it's mapped), or
> > >
> > > (2) The compound page is not mapped by a pmd and is awaiting split on a
> > > deferred split queue.
> > >
> > > The current charge migration implementation does *not* migrate charges for
> > > thp memory on the deferred split queue, it only migrates charges for pages
> > > that are mapped by a pmd.
> > >
> > > Thus, to migrate charges, the underlying compound page cannot be on a
> > > deferred split queue; no list manipulation needs to be done in
> > > mem_cgroup_move_account().
> > >
> > > With the current code, the underlying compound page is moved to the
> > > deferred split queue of the memcg its memory is not charged to, so
> > > susbequent reclaim will consider these pages for the wrong memcg. Remove
> > > the deferred split queue handling in mem_cgroup_move_account() entirely.
> >
> > I believe this still doesn't describe the underlying problem to the full
> > extent. What happens with the page on the deferred list when it
> > shouldn't be there in fact? Unless I am missing something deferred_split_scan
> > will simply split that huge page. Which is a bit unfortunate but nothing
> > really critical. This should be mentioned in the changelog.
> >
>
> Are you referring to a compound page on the deferred split queue before a
> task is moved? I'm not sure this is within the scope of Wei's patch..
> this is simply preventing a page from being moved to the deferred split
> queue of a memcg that it is not charged to. Is there a concern about why
> this code can be removed or a suggestion on something else it should be
> doing instead?

No, I do not have any concern about the patch itslef. It is that the
changelog doesn't decribe the user visible effect. All I am saying is
that the current code splits THPs of moved pages under memory pressure
even if that is not needed. And that is a clear bug.
--
Michal Hocko
SUSE Labs

2020-01-21 23:11:13

by David Rientjes

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Mon, 20 Jan 2020, Michal Hocko wrote:

> > > > When migrating memcg charges of thp memory, there are two possibilities:
> > > >
> > > > (1) The underlying compound page is mapped by a pmd and thus does is not
> > > > on a deferred split queue (it's mapped), or
> > > >
> > > > (2) The compound page is not mapped by a pmd and is awaiting split on a
> > > > deferred split queue.
> > > >
> > > > The current charge migration implementation does *not* migrate charges for
> > > > thp memory on the deferred split queue, it only migrates charges for pages
> > > > that are mapped by a pmd.
> > > >
> > > > Thus, to migrate charges, the underlying compound page cannot be on a
> > > > deferred split queue; no list manipulation needs to be done in
> > > > mem_cgroup_move_account().
> > > >
> > > > With the current code, the underlying compound page is moved to the
> > > > deferred split queue of the memcg its memory is not charged to, so
> > > > susbequent reclaim will consider these pages for the wrong memcg. Remove
> > > > the deferred split queue handling in mem_cgroup_move_account() entirely.
> > >
> > > I believe this still doesn't describe the underlying problem to the full
> > > extent. What happens with the page on the deferred list when it
> > > shouldn't be there in fact? Unless I am missing something deferred_split_scan
> > > will simply split that huge page. Which is a bit unfortunate but nothing
> > > really critical. This should be mentioned in the changelog.
> > >
> >
> > Are you referring to a compound page on the deferred split queue before a
> > task is moved? I'm not sure this is within the scope of Wei's patch..
> > this is simply preventing a page from being moved to the deferred split
> > queue of a memcg that it is not charged to. Is there a concern about why
> > this code can be removed or a suggestion on something else it should be
> > doing instead?
>
> No, I do not have any concern about the patch itslef. It is that the
> changelog doesn't decribe the user visible effect. All I am saying is
> that the current code splits THPs of moved pages under memory pressure
> even if that is not needed. And that is a clear bug.

Ah, gotcha. I tried to do this in the final paragraph of my amedment to
Wei's patch and why it's important that this is marked as stable.

The current code in 5.4 from commit 87eaceb3faa59 places any migrated
compound page onto the deferred split queue of the destination memcg
regardless of whether it has a mapping pmd
(list_empty(page_deferred_list()) was already false) or it does not have a
mapping pmd (but is now on the wrong queue). For the latter,
can_split_huge_page() can help for the actual split but not for the
removal of the page that is now erroneously on the queue. For the former,
memcg reclaim would not see the pages that it should split under memcg
pressure so we'll see the same memcg oom conditions we saw before the
deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms.

2020-01-22 08:16:20

by Michal Hocko

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Tue 21-01-20 15:08:39, David Rientjes wrote:
> On Mon, 20 Jan 2020, Michal Hocko wrote:
>
> > > > > When migrating memcg charges of thp memory, there are two possibilities:
> > > > >
> > > > > (1) The underlying compound page is mapped by a pmd and thus does is not
> > > > > on a deferred split queue (it's mapped), or
> > > > >
> > > > > (2) The compound page is not mapped by a pmd and is awaiting split on a
> > > > > deferred split queue.
> > > > >
> > > > > The current charge migration implementation does *not* migrate charges for
> > > > > thp memory on the deferred split queue, it only migrates charges for pages
> > > > > that are mapped by a pmd.
> > > > >
> > > > > Thus, to migrate charges, the underlying compound page cannot be on a
> > > > > deferred split queue; no list manipulation needs to be done in
> > > > > mem_cgroup_move_account().
> > > > >
> > > > > With the current code, the underlying compound page is moved to the
> > > > > deferred split queue of the memcg its memory is not charged to, so
> > > > > susbequent reclaim will consider these pages for the wrong memcg. Remove
> > > > > the deferred split queue handling in mem_cgroup_move_account() entirely.
> > > >
> > > > I believe this still doesn't describe the underlying problem to the full
> > > > extent. What happens with the page on the deferred list when it
> > > > shouldn't be there in fact? Unless I am missing something deferred_split_scan
> > > > will simply split that huge page. Which is a bit unfortunate but nothing
> > > > really critical. This should be mentioned in the changelog.
> > > >
> > >
> > > Are you referring to a compound page on the deferred split queue before a
> > > task is moved? I'm not sure this is within the scope of Wei's patch..
> > > this is simply preventing a page from being moved to the deferred split
> > > queue of a memcg that it is not charged to. Is there a concern about why
> > > this code can be removed or a suggestion on something else it should be
> > > doing instead?
> >
> > No, I do not have any concern about the patch itslef. It is that the
> > changelog doesn't decribe the user visible effect. All I am saying is
> > that the current code splits THPs of moved pages under memory pressure
> > even if that is not needed. And that is a clear bug.
>
> Ah, gotcha. I tried to do this in the final paragraph of my amedment to
> Wei's patch and why it's important that this is marked as stable.

I considered "susbequent reclaim will consider these pages for the wrong
memcg." quite unclear TBH.

> The current code in 5.4 from commit 87eaceb3faa59 places any migrated
> compound page onto the deferred split queue of the destination memcg
> regardless of whether it has a mapping pmd
> (list_empty(page_deferred_list()) was already false) or it does not have a
> mapping pmd (but is now on the wrong queue). For the latter,
> can_split_huge_page() can help for the actual split but not for the
> removal of the page that is now erroneously on the queue.

Does that mean that those fully mapped THPs are not going to be split?

> For the former,
> memcg reclaim would not see the pages that it should split under memcg
> pressure so we'll see the same memcg oom conditions we saw before the
> deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms.

OK, this is yet another user visibile effect and it would be better to
mention it explicitly in the changelog.

--
Michal Hocko
SUSE Labs

2020-01-22 23:42:02

by David Rientjes

[permalink] [raw]
Subject: Re: [Patch v4] mm: thp: remove the defer list related code since this will not happen

On Wed, 22 Jan 2020, Michal Hocko wrote:

> > The current code in 5.4 from commit 87eaceb3faa59 places any migrated
> > compound page onto the deferred split queue of the destination memcg
> > regardless of whether it has a mapping pmd
> > (list_empty(page_deferred_list()) was already false) or it does not have a
> > mapping pmd (but is now on the wrong queue). For the latter,
> > can_split_huge_page() can help for the actual split but not for the
> > removal of the page that is now erroneously on the queue.
>
> Does that mean that those fully mapped THPs are not going to be split?
>

It believe it should but deferred_split_scan() also won't take it off the
wrong split queue so it will remain there and any other checks for
page_deferred_list(page) will succeed.

> > For the former,
> > memcg reclaim would not see the pages that it should split under memcg
> > pressure so we'll see the same memcg oom conditions we saw before the
> > deferred split shrinker became SHRINKER_MEMCG_AWARE: unnecessary ooms.
>
> OK, this is yet another user visibile effect and it would be better to
> mention it explicitly in the changelog.
>

Ok, feel free to add to the last paragraph:

Memcg reclaim would not see the compound pages that it should split under
memcg pressure so we'll otherwise see the same memcg oom conditions we saw
before the deferred split shrinker became SHRINKER_MEMCG_AWARE:
unnecessary ooms.