2019-06-04 16:49:28

by Ira Weiny

Subject: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages

From: Ira Weiny <[email protected]>

release_pages() is an optimized version of a loop around put_page().
Unfortunately for devmap pages the logic is not entirely correct in
release_pages(). This is because device pages can be more than type
MEMORY_DEVICE_PUBLIC. There are in fact 4 types, private, public, FS
DAX, and PCI P2PDMA. Some of these have specific needs to "put" the
page while others do not.

This logic to handle any special needs is contained in
put_devmap_managed_page(). Therefore all devmap pages should be
processed by this function where we can contain the correct logic for a
page put.

Handle all device type pages within release_pages() by calling
put_devmap_managed_page() on all devmap pages. If
put_devmap_managed_page() returns true the page has been put and we
continue with the next page. A false return of
put_devmap_managed_page() means the page did not require special
processing and should fall to "normal" processing.

This was found via code inspection while determining if release_pages()
and the new put_user_pages() could be interchangeable.[1]

[1] https://lore.kernel.org/lkml/[email protected]/

Cc: Jérôme Glisse <[email protected]>
Cc: Michal Hocko <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes from V2:
Update changelog for more clarity as requested by Michal
Update comment WRT "failing" of put_devmap_managed_page()

Changes from V1:
Add comment clarifying that put_devmap_managed_page() can still
fail.
Add Reviewed-by tags.

mm/swap.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 7ede3eddc12a..6d153ce4cb8c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
 		if (is_huge_zero_page(page))
 			continue;

-		/* Device public page can not be huge page */
-		if (is_device_public_page(page)) {
+		if (is_zone_device_page(page)) {
 			if (locked_pgdat) {
 				spin_unlock_irqrestore(&locked_pgdat->lru_lock,
 						       flags);
 				locked_pgdat = NULL;
 			}
-			put_devmap_managed_page(page);
-			continue;
+			/*
+			 * Not all zone-device-pages require special
+			 * processing. Those pages return 'false' from
+			 * put_devmap_managed_page() expecting a call to
+			 * put_page_testzero()
+			 */
+			if (put_devmap_managed_page(page))
+				continue;
 		}

 		page = compound_head(page);
--
2.20.1
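
The control flow the patch introduces can be modelled in user space. What follows is a minimal sketch, not the kernel code: every `model_*` name is hypothetical, the refcount is a plain `int`, and the choice of which device types need a special put (private, public and FS DAX, but not PCI P2PDMA) is an assumption of the sketch rather than something the changelog states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the page types named in the changelog. */
enum model_type {
	MODEL_NORMAL,		/* ordinary page, not ZONE_DEVICE */
	MODEL_DEVICE_PRIVATE,
	MODEL_DEVICE_PUBLIC,
	MODEL_DEVICE_FS_DAX,
	MODEL_DEVICE_PCI_P2PDMA,
};

struct model_page {
	enum model_type type;
	int refcount;
	bool special_put;	/* set when the devmap-specific put ran */
};

static bool model_is_zone_device_page(const struct model_page *page)
{
	return page->type != MODEL_NORMAL;
}

/*
 * Assumption of this sketch: P2PDMA pages need no special put, the
 * other device types do. Returns true only when the page was put here.
 */
static bool model_put_devmap_managed_page(struct model_page *page)
{
	switch (page->type) {
	case MODEL_DEVICE_PRIVATE:
	case MODEL_DEVICE_PUBLIC:
	case MODEL_DEVICE_FS_DAX:
		page->refcount--;
		page->special_put = true;
		return true;
	default:
		return false;
	}
}

/* Mirrors the patched loop; returns how many pages took the normal path. */
static size_t model_release_pages(struct model_page *pages, size_t nr)
{
	size_t normal = 0;

	for (size_t i = 0; i < nr; i++) {
		struct model_page *page = &pages[i];

		if (model_is_zone_device_page(page) &&
		    model_put_devmap_managed_page(page))
			continue;	/* already put by the devmap helper */

		page->refcount--;	/* stands in for put_page_testzero() */
		normal++;
	}
	return normal;
}
```

The point of the model is the second branch: a device page for which the helper returns false is not skipped, as it was before the patch, but falls through to the same decrement as an ordinary page.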


2019-06-04 19:49:56

by John Hubbard

Subject: Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages

On 6/4/19 9:48 AM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> release_pages() is an optimized version of a loop around put_page().
> Unfortunately for devmap pages the logic is not entirely correct in
> release_pages(). This is because device pages can be more than type
> MEMORY_DEVICE_PUBLIC. There are in fact 4 types, private, public, FS
> DAX, and PCI P2PDMA. Some of these have specific needs to "put" the
> page while others do not.
>
> This logic to handle any special needs is contained in
> put_devmap_managed_page(). Therefore all devmap pages should be
> processed by this function where we can contain the correct logic for a
> page put.
>
> Handle all device type pages within release_pages() by calling
> put_devmap_managed_page() on all devmap pages. If
> put_devmap_managed_page() returns true the page has been put and we
> continue with the next page. A false return of
> put_devmap_managed_page() means the page did not require special
> processing and should fall to "normal" processing.
>
> This was found via code inspection while determining if release_pages()
> and the new put_user_pages() could be interchangeable.[1]
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Cc: Jérôme Glisse <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Reviewed-by: Dan Williams <[email protected]>
> Reviewed-by: John Hubbard <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from V2:
> Update changelog for more clarity as requested by Michal
> Update comment WRT "failing" of put_devmap_managed_page()
>
> Changes from V1:
> Add comment clarifying that put_devmap_managed_page() can still
> fail.
> Add Reviewed-by tags.
>
> mm/swap.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 7ede3eddc12a..6d153ce4cb8c 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> if (is_huge_zero_page(page))
> continue;
>
> - /* Device public page can not be huge page */
> - if (is_device_public_page(page)) {
> + if (is_zone_device_page(page)) {
> if (locked_pgdat) {
> spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> flags);
> locked_pgdat = NULL;
> }
> - put_devmap_managed_page(page);
> - continue;
> + /*
> + * Not all zone-device-pages require special
> + * processing. Those pages return 'false' from
> + * put_devmap_managed_page() expecting a call to
> + * put_page_testzero()
> + */

Just a documentation tweak: how about:

	/*
	 * ZONE_DEVICE pages that return 'false' from
	 * put_devmap_managed_page() do not require special
	 * processing, and instead, expect a call to
	 * put_page_testzero().
	 */


thanks,
--
John Hubbard
NVIDIA

> + if (put_devmap_managed_page(page))
> + continue;
> }
>
> page = compound_head(page);
>

2019-06-04 20:14:45

by Dan Williams

Subject: Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages

On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <[email protected]> wrote:
>
> On 6/4/19 9:48 AM, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > release_pages() is an optimized version of a loop around put_page().
> > Unfortunately for devmap pages the logic is not entirely correct in
> > release_pages(). This is because device pages can be more than type
> > MEMORY_DEVICE_PUBLIC. There are in fact 4 types, private, public, FS
> > DAX, and PCI P2PDMA. Some of these have specific needs to "put" the
> > page while others do not.
> >
> > This logic to handle any special needs is contained in
> > put_devmap_managed_page(). Therefore all devmap pages should be
> > processed by this function where we can contain the correct logic for a
> > page put.
> >
> > Handle all device type pages within release_pages() by calling
> > put_devmap_managed_page() on all devmap pages. If
> > put_devmap_managed_page() returns true the page has been put and we
> > continue with the next page. A false return of
> > put_devmap_managed_page() means the page did not require special
> > processing and should fall to "normal" processing.
> >
> > This was found via code inspection while determining if release_pages()
> > and the new put_user_pages() could be interchangeable.[1]
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
> >
> > Cc: Jérôme Glisse <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Reviewed-by: Dan Williams <[email protected]>
> > Reviewed-by: John Hubbard <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > Changes from V2:
> > Update changelog for more clarity as requested by Michal
> > Update comment WRT "failing" of put_devmap_managed_page()
> >
> > Changes from V1:
> > Add comment clarifying that put_devmap_managed_page() can still
> > fail.
> > Add Reviewed-by tags.
> >
> > mm/swap.c | 13 +++++++++----
> > 1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 7ede3eddc12a..6d153ce4cb8c 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> > if (is_huge_zero_page(page))
> > continue;
> >
> > - /* Device public page can not be huge page */
> > - if (is_device_public_page(page)) {
> > + if (is_zone_device_page(page)) {
> > if (locked_pgdat) {
> > spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> > flags);
> > locked_pgdat = NULL;
> > }
> > - put_devmap_managed_page(page);
> > - continue;
> > + /*
> > + * Not all zone-device-pages require special
> > + * processing. Those pages return 'false' from
> > + * put_devmap_managed_page() expecting a call to
> > + * put_page_testzero()
> > + */
>
> Just a documentation tweak: how about:
>
> /*
> * ZONE_DEVICE pages that return 'false' from
> * put_devmap_managed_page() do not require special
> * processing, and instead, expect a call to
> * put_page_testzero().
> */

Looks better to me, but maybe just go ahead and list those
expectations explicitly. Something like:

	/*
	 * put_devmap_managed_page() only handles
	 * ZONE_DEVICE (struct dev_pagemap managed)
	 * pages when the hosting dev_pagemap has the
	 * ->free() or ->fault() callback handlers
	 * implemented as indicated by
	 * dev_pagemap.type. Otherwise the expectation
	 * is to fall back to a plain decrement /
	 * put_page_testzero().
	 */

2019-06-04 20:27:58

by John Hubbard

Subject: Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages

On 6/4/19 1:11 PM, Dan Williams wrote:
> On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <[email protected]> wrote:
>>
>> On 6/4/19 9:48 AM, [email protected] wrote:
>>> From: Ira Weiny <[email protected]>
>>>
...
>>> diff --git a/mm/swap.c b/mm/swap.c
>>> index 7ede3eddc12a..6d153ce4cb8c 100644
>>> --- a/mm/swap.c
>>> +++ b/mm/swap.c
>>> @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
>>> if (is_huge_zero_page(page))
>>> continue;
>>>
>>> - /* Device public page can not be huge page */
>>> - if (is_device_public_page(page)) {
>>> + if (is_zone_device_page(page)) {
>>> if (locked_pgdat) {
>>> spin_unlock_irqrestore(&locked_pgdat->lru_lock,
>>> flags);
>>> locked_pgdat = NULL;
>>> }
>>> - put_devmap_managed_page(page);
>>> - continue;
>>> + /*
>>> + * Not all zone-device-pages require special
>>> + * processing. Those pages return 'false' from
>>> + * put_devmap_managed_page() expecting a call to
>>> + * put_page_testzero()
>>> + */
>>
>> Just a documentation tweak: how about:
>>
>> /*
>> * ZONE_DEVICE pages that return 'false' from
>> * put_devmap_managed_page() do not require special
>> * processing, and instead, expect a call to
>> * put_page_testzero().
>> */
>
> Looks better to me, but maybe just go ahead and list those
> expectations explicitly. Something like:
>
> /*
> * put_devmap_managed_page() only handles
> * ZONE_DEVICE (struct dev_pagemap managed)
> * pages when the hosting dev_pagemap has the
> * ->free() or ->fault() callback handlers
> * implemented as indicated by
> * dev_pagemap.type. Otherwise the expectation
> * is to fall back to a plain decrement /
> * put_page_testzero().
> */

I like it--but not here, because it's too much internal detail in a
call site that doesn't use that level of detail. The call site looks
at the return value, only.

Let's instead put that blurb above (or in) the put_devmap_managed_page()
routine itself. And leave the blurb that I wrote where it is. And then I
think everything will have an appropriate level of detail in the right places.


thanks,
--
John Hubbard
NVIDIA

2019-06-04 20:52:16

by Ira Weiny

Subject: Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages

On Tue, Jun 04, 2019 at 01:17:42PM -0700, John Hubbard wrote:
> On 6/4/19 1:11 PM, Dan Williams wrote:
> > On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <[email protected]> wrote:
> >>
> >> On 6/4/19 9:48 AM, [email protected] wrote:
> >>> From: Ira Weiny <[email protected]>
> >>>
> ...
> >>> diff --git a/mm/swap.c b/mm/swap.c
> >>> index 7ede3eddc12a..6d153ce4cb8c 100644
> >>> --- a/mm/swap.c
> >>> +++ b/mm/swap.c
> >>> @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> >>> if (is_huge_zero_page(page))
> >>> continue;
> >>>
> >>> - /* Device public page can not be huge page */
> >>> - if (is_device_public_page(page)) {
> >>> + if (is_zone_device_page(page)) {
> >>> if (locked_pgdat) {
> >>> spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> >>> flags);
> >>> locked_pgdat = NULL;
> >>> }
> >>> - put_devmap_managed_page(page);
> >>> - continue;
> >>> + /*
> >>> + * Not all zone-device-pages require special
> >>> + * processing. Those pages return 'false' from
> >>> + * put_devmap_managed_page() expecting a call to
> >>> + * put_page_testzero()
> >>> + */
> >>
> >> Just a documentation tweak: how about:
> >>
> >> /*
> >> * ZONE_DEVICE pages that return 'false' from
> >> * put_devmap_managed_page() do not require special
> >> * processing, and instead, expect a call to
> >> * put_page_testzero().
> >> */
> >
> > Looks better to me, but maybe just go ahead and list those
> > expectations explicitly. Something like:
> >
> > /*
> > * put_devmap_managed_page() only handles
> > * ZONE_DEVICE (struct dev_pagemap managed)
> > * pages when the hosting dev_pagemap has the
> > * ->free() or ->fault() callback handlers
> > * implemented as indicated by
> > * dev_pagemap.type. Otherwise the expectation
> > * is to fall back to a plain decrement /
> > * put_page_testzero().
> > */
>
> I like it--but not here, because it's too much internal detail in a
> call site that doesn't use that level of detail. The call site looks
> at the return value, only.
>
> Let's instead put that blurb above (or in) the put_devmap_managed_page()
> routine itself. And leave the blurb that I wrote where it is. And then I
> think everything will have an appropriate level of detail in the right places.

I agree. This leaves it open that this handles any special processing which is
required.

FWIW the same call is made in put_page() and has no comment so perhaps we are
getting wrapped around the axle for no reason?

Frankly I questioned myself when I mentioned put_page_testzero() as well. But
I'm ok with John's suggestion. My wording was a bit "rushed". Sorry about
that. I wanted to remove the word 'fail' from the comment because I think it
is what caught Michal's eye.

Ira

>
>
> thanks,
> --
> John Hubbard
> NVIDIA
>

2019-06-04 21:45:44

by Dan Williams

Subject: Re: [PATCH v3] mm/swap: Fix release_pages() when releasing devmap pages

On Tue, Jun 4, 2019 at 1:17 PM John Hubbard <[email protected]> wrote:
>
> On 6/4/19 1:11 PM, Dan Williams wrote:
> > On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <[email protected]> wrote:
> >>
> >> On 6/4/19 9:48 AM, [email protected] wrote:
> >>> From: Ira Weiny <[email protected]>
> >>>
> ...
> >>> diff --git a/mm/swap.c b/mm/swap.c
> >>> index 7ede3eddc12a..6d153ce4cb8c 100644
> >>> --- a/mm/swap.c
> >>> +++ b/mm/swap.c
> >>> @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> >>> if (is_huge_zero_page(page))
> >>> continue;
> >>>
> >>> - /* Device public page can not be huge page */
> >>> - if (is_device_public_page(page)) {
> >>> + if (is_zone_device_page(page)) {
> >>> if (locked_pgdat) {
> >>> spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> >>> flags);
> >>> locked_pgdat = NULL;
> >>> }
> >>> - put_devmap_managed_page(page);
> >>> - continue;
> >>> + /*
> >>> + * Not all zone-device-pages require special
> >>> + * processing. Those pages return 'false' from
> >>> + * put_devmap_managed_page() expecting a call to
> >>> + * put_page_testzero()
> >>> + */
> >>
> >> Just a documentation tweak: how about:
> >>
> >> /*
> >> * ZONE_DEVICE pages that return 'false' from
> >> * put_devmap_managed_page() do not require special
> >> * processing, and instead, expect a call to
> >> * put_page_testzero().
> >> */
> >
> > Looks better to me, but maybe just go ahead and list those
> > expectations explicitly. Something like:
> >
> > /*
> > * put_devmap_managed_page() only handles
> > * ZONE_DEVICE (struct dev_pagemap managed)
> > * pages when the hosting dev_pagemap has the
> > * ->free() or ->fault() callback handlers
> > * implemented as indicated by
> > * dev_pagemap.type. Otherwise the expectation
> > * is to fall back to a plain decrement /
> > * put_page_testzero().
> > */
>
> I like it--but not here, because it's too much internal detail in a
> call site that doesn't use that level of detail. The call site looks
> at the return value, only.
>
> Let's instead put that blurb above (or in) the put_devmap_managed_page()
> routine itself. And leave the blurb that I wrote where it is. And then I
> think everything will have an appropriate level of detail in the right places.

Ok. Ideally there wouldn't be any commentary needed at the call site
and the put_page() could be handled internal to
put_devmap_managed_page(), but I did not see a way to do that without
breaking the compile out / static branch optimization when there are
no active ZONE_DEVICE users.
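
The constraint described here can be illustrated with a small user-space sketch. It assumes the static branch can be modelled by a plain boolean and that a page carries a flag saying whether its pagemap needs a special put; all `gate_*` names are hypothetical and none of this is the kernel's implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the static key; a plain flag in this model. */
static bool gate_devmap_users_active;

struct gate_page {
	bool zone_device;
	bool needs_special_put;	/* hypothetical: decided by the pagemap type */
	int refcount;
	int special_puts;	/* counts puts done by the devmap helper */
};

/* Cheap inline check; returns false at once when no user is active. */
static inline bool gate_put_devmap_managed_page(struct gate_page *page)
{
	if (!gate_devmap_users_active)
		return false;
	if (!page->zone_device || !page->needs_special_put)
		return false;
	page->refcount--;
	page->special_puts++;
	return true;
}

/* The caller keeps the plain decrement for every page the helper declines. */
static inline void gate_put_page(struct gate_page *page)
{
	if (gate_put_devmap_managed_page(page))
		return;
	page->refcount--;	/* stands in for put_page_testzero() */
}
```

Because the helper only reports whether it handled the page, the caller's ordinary decrement stays outside it, and with the flag off the helper reduces to an early `return false`. In the kernel that early return is what the static branch is meant to make nearly free, which is why the fallback remains at the call site.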