2020-10-10 05:25:26

by Ralph Campbell

[permalink] [raw]
Subject: [PATCH] mm/memcg: fix device private memcg accounting

The code in mc_handle_swap_pte() checks for non_swap_entry() and returns
NULL before checking is_device_private_entry() so device private pages
are never handled.
Fix this by checking for non_swap_entry() after handling device private
swap PTEs.

Cc: [email protected]
Fixes: c733a82874a7 ("mm/memcontrol: support MEMORY_DEVICE_PRIVATE")
Signed-off-by: Ralph Campbell <[email protected]>
---

I'm not sure exactly how to test this. I ran the HMM self tests but
that is a minimal sanity check. I think moving the self test from one
memory cgroup to another while it is running would exercise this patch.
I'm looking at how the test could move itself to another group after
migrating some anonymous memory to the test driver.

This applies cleanly to linux-5.9.0-rc8-mm1 and is for Andrew Morton's
tree.

mm/memcontrol.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2636f8bad908..3a24e3b619f5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5549,7 +5549,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
struct page *page = NULL;
swp_entry_t ent = pte_to_swp_entry(ptent);

- if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent))
+ if (!(mc.flags & MOVE_ANON))
return NULL;

/*
@@ -5568,6 +5568,9 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
return page;
}

+ if (non_swap_entry(ent))
+ return NULL;
+
/*
* Because lookup_swap_cache() updates some statistics counter,
* we call find_get_page() with swapper_space directly.
--
2.20.1


2020-10-10 05:53:15

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm/memcg: fix device private memcg accounting

On Fri, 9 Oct 2020 14:59:52 -0700 Ralph Campbell <[email protected]> wrote:

> The code in mc_handle_swap_pte() checks for non_swap_entry() and returns
> NULL before checking is_device_private_entry() so device private pages
> are never handled.
> Fix this by checking for non_swap_entry() after handling device private
> swap PTEs.
>
> Cc: [email protected]

I was going to ask "what are the end-user visible effects of the bug".
This is important information with a cc:stable.

>
> I'm not sure exactly how to test this. I ran the HMM self tests but
> that is a minimal sanity check. I think moving the self test from one
> memory cgroup to another while it is running would exercise this patch.
> I'm looking at how the test could move itself to another group after
> migrating some anonymous memory to the test driver.
>

But this makes me suspect the answer is "there aren't any that we know
of". Are you sure a cc:stable is warranted?

2020-10-11 02:52:23

by Ralph Campbell

[permalink] [raw]
Subject: Re: [PATCH] mm/memcg: fix device private memcg accounting


On 10/9/20 3:50 PM, Andrew Morton wrote:
> On Fri, 9 Oct 2020 14:59:52 -0700 Ralph Campbell <[email protected]> wrote:
>
>> The code in mc_handle_swap_pte() checks for non_swap_entry() and returns
>> NULL before checking is_device_private_entry() so device private pages
>> are never handled.
>> Fix this by checking for non_swap_entry() after handling device private
>> swap PTEs.
>>
>> Cc: [email protected]
>
> I was going to ask "what are the end-user visible effects of the bug".
> This is important information with a cc:stable.
>
>>
>> I'm not sure exactly how to test this. I ran the HMM self tests but
>> that is a minimal sanity check. I think moving the self test from one
>> memory cgroup to another while it is running would exercise this patch.
>> I'm looking at how the test could move itself to another group after
>> migrating some anonymous memory to the test driver.
>>
>
> But this makes me suspect the answer is "there aren't any that we know
> of". Are you sure a cc:stable is warranted?
>

I assume the memory cgroup accounting would be off somehow when moving
a process to another memory cgroup.
Currently, the device private page is charged like a normal anonymous page
when allocated and is uncharged when the page is freed so I think that path is OK.
Maybe someone who knows more about memory cgroup accounting can comment?

2020-10-12 13:37:12

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH] mm/memcg: fix device private memcg accounting

On Fri, Oct 09, 2020 at 05:00:37PM -0700, Ralph Campbell wrote:
>
> On 10/9/20 3:50 PM, Andrew Morton wrote:
> > On Fri, 9 Oct 2020 14:59:52 -0700 Ralph Campbell <[email protected]> wrote:
> >
> > > The code in mc_handle_swap_pte() checks for non_swap_entry() and returns
> > > NULL before checking is_device_private_entry() so device private pages
> > > are never handled.
> > > Fix this by checking for non_swap_entry() after handling device private
> > > swap PTEs.

The fix looks good to me.

Acked-by: Johannes Weiner <[email protected]>

> > But this makes me suspect the answer is "there aren't any that we know
> > of". Are you sure a cc:stable is warranted?
> >
>
> I assume the memory cgroup accounting would be off somehow when moving
> a process to another memory cgroup.
> Currently, the device private page is charged like a normal anonymous page
> when allocated and is uncharged when the page is freed so I think that path is OK.
> Maybe someone who knows more about memory cgroup accounting can comment?

As for whether to CC stable, I'm leaning toward no:

- When moving tasks, we'd leave their device pages behind in the old
cgroup. This isn't great, but it doesn't cause counter imbalances or
corruption or anything - we also skip locked pages, we used to skip
pages mapped by more than one pte, the user can select whether to
move pages along tasks at all, and if so, whether only anon or file.

- Charge moving itself is a bit of a questionable feature, and users
have been moving away from it. Leaving tasks in a cgroup and
changing the configuration is a heck of a lot cheaper than moving
potentially gigabytes of pages to another configuration domain.

- According to the Fixes tag, this isn't a regression, either. Since
their inception, we have never migrated device pages.

2020-10-12 17:14:08

by Ralph Campbell

[permalink] [raw]
Subject: Re: [PATCH] mm/memcg: fix device private memcg accounting


On 10/12/20 6:28 AM, Johannes Weiner wrote:
> On Fri, Oct 09, 2020 at 05:00:37PM -0700, Ralph Campbell wrote:
>>
>> On 10/9/20 3:50 PM, Andrew Morton wrote:
>>> On Fri, 9 Oct 2020 14:59:52 -0700 Ralph Campbell <[email protected]> wrote:
>>>
>>>> The code in mc_handle_swap_pte() checks for non_swap_entry() and returns
>>>> NULL before checking is_device_private_entry() so device private pages
>>>> are never handled.
>>>> Fix this by checking for non_swap_entry() after handling device private
>>>> swap PTEs.
>
> The fix looks good to me.
>
> Acked-by: Johannes Weiner <[email protected]>
>
>>> But this makes me suspect the answer is "there aren't any that we know
>>> of". Are you sure a cc:stable is warranted?
>>>
>>
>> I assume the memory cgroup accounting would be off somehow when moving
>> a process to another memory cgroup.
>> Currently, the device private page is charged like a normal anonymous page
>> when allocated and is uncharged when the page is freed so I think that path is OK.
>> Maybe someone who knows more about memory cgroup accounting can comment?
>
> As for whether to CC stable, I'm leaning toward no:
>
> - When moving tasks, we'd leave their device pages behind in the old
> cgroup. This isn't great, but it doesn't cause counter imbalances or
> corruption or anything - we also skip locked pages, we used to skip
> pages mapped by more than one pte, the user can select whether to
> move pages along tasks at all, and if so, whether only anon or file.
>
> - Charge moving itself is a bit of a questionable feature, and users
> have been moving away from it. Leaving tasks in a cgroup and
> changing the configuration is a heck of a lot cheaper than moving
> potentially gigabytes of pages to another configuration domain.
>
> - According to the Fixes tag, this isn't a regression, either. Since
> their inception, we have never migrated device pages.

Thanks for the Acked-by and the comments.
I assume Andrew will update the tags when queuing this up unless he wants me to resend.