2019-11-14 00:24:58

by Dan Williams

Subject: [PATCH] mm: Cleanup __put_devmap_managed_page() vs ->page_free()

After the removal of the device-public infrastructure there are only 2
->page_free() call backs in the kernel. One of those is a device-private
callback in the nouveau driver, the other is a generic wakeup needed in
the DAX case. In the hopes that all ->page_free() callbacks can be
migrated to common core kernel functionality, move the device-private
specific actions in __put_devmap_managed_page() under the
is_device_private_page() conditional, including the ->page_free()
callback. For the other page types just open-code the generic wakeup.

Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
case.

Cc: Jan Kara <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
Hi John,

This applies on top of today's linux-next and passes my nvdimm unit
tests. That testing noticed that devmap_managed_enable_get() needed a
small fixup as well.

drivers/nvdimm/pmem.c | 6 ------
mm/memremap.c | 22 ++++++++++++----------
2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f9f76f6ba07b..21db1ce8c0ae 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
put_disk(pmem->disk);
}

-static void pmem_pagemap_page_free(struct page *page)
-{
- wake_up_var(&page->_refcount);
-}
-
static const struct dev_pagemap_ops fsdax_pagemap_ops = {
- .page_free = pmem_pagemap_page_free,
.kill = pmem_pagemap_kill,
.cleanup = pmem_pagemap_cleanup,
};
diff --git a/mm/memremap.c b/mm/memremap.c
index 022e78e68ea0..6e6f3d6fdb73 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)

static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
{
- if (!pgmap->ops || !pgmap->ops->page_free) {
+ if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
+ && !pgmap->ops->page_free)) {
WARN(1, "Missing page_free method\n");
return -EINVAL;
}
@@ -449,12 +450,6 @@ void __put_devmap_managed_page(struct page *page)
* holds a reference on the page.
*/
if (count == 1) {
- /* Clear Active bit in case of parallel mark_page_accessed */
- __ClearPageActive(page);
- __ClearPageWaiters(page);
-
- mem_cgroup_uncharge(page);
-
/*
* When a device_private page is freed, the page->mapping field
* may still contain a (stale) mapping value. For example, the
@@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
* handled differently or not done at all, so there is no need
* to clear page->mapping.
*/
- if (is_device_private_page(page))
- page->mapping = NULL;
+ if (is_device_private_page(page)) {
+ /* Clear Active bit in case of parallel mark_page_accessed */
+ __ClearPageActive(page);
+ __ClearPageWaiters(page);

- page->pgmap->ops->page_free(page);
+ mem_cgroup_uncharge(page);
+
+ page->mapping = NULL;
+ page->pgmap->ops->page_free(page);
+ } else
+ wake_up_var(&page->_refcount);
} else if (!count)
__put_page(page);
}


2019-11-14 00:46:21

by John Hubbard

Subject: Re: [PATCH] mm: Cleanup __put_devmap_managed_page() vs ->page_free()

On 11/13/19 4:07 PM, Dan Williams wrote:
> After the removal of the device-public infrastructure there are only 2
> ->page_free() call backs in the kernel. One of those is a device-private
> callback in the nouveau driver, the other is a generic wakeup needed in
> the DAX case. In the hopes that all ->page_free() callbacks can be
> migrated to common core kernel functionality, move the device-private
> specific actions in __put_devmap_managed_page() under the
> is_device_private_page() conditional, including the ->page_free()
> callback. For the other page types just open-code the generic wakeup.
>
> Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
> does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
> case.
>
> Cc: Jan Kara <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: John Hubbard <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
> Hi John,
>
> This applies on top of today's linux-next and passes my nvdimm unit
> tests. That testing noticed that devmap_managed_enable_get() needed a
> small fixup as well.

Got it. This will appear in the next posted version of my "mm/gup: track
dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.


>
> drivers/nvdimm/pmem.c | 6 ------
> mm/memremap.c | 22 ++++++++++++----------
> 2 files changed, 12 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index f9f76f6ba07b..21db1ce8c0ae 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
> put_disk(pmem->disk);
> }
>
> -static void pmem_pagemap_page_free(struct page *page)
> -{
> - wake_up_var(&page->_refcount);
> -}
> -
> static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> - .page_free = pmem_pagemap_page_free,
> .kill = pmem_pagemap_kill,
> .cleanup = pmem_pagemap_cleanup,
> };
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 022e78e68ea0..6e6f3d6fdb73 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
>
> static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> {
> - if (!pgmap->ops || !pgmap->ops->page_free) {
> + if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> + && !pgmap->ops->page_free)) {


OK, so only MEMORY_DEVICE_PRIVATE has .page_free ops. That looks
correct to me, based on looking at the .page_free setters--I
only see Nouveau setting it.


thanks,
--
John Hubbard
NVIDIA

2019-11-14 00:51:21

by Dan Williams

Subject: Re: [PATCH] mm: Cleanup __put_devmap_managed_page() vs ->page_free()

On Wed, Nov 13, 2019 at 4:42 PM John Hubbard <[email protected]> wrote:
>
> On 11/13/19 4:07 PM, Dan Williams wrote:
> > After the removal of the device-public infrastructure there are only 2
> > ->page_free() call backs in the kernel. One of those is a device-private
> > callback in the nouveau driver, the other is a generic wakeup needed in
> > the DAX case. In the hopes that all ->page_free() callbacks can be
> > migrated to common core kernel functionality, move the device-private
> > specific actions in __put_devmap_managed_page() under the
> > is_device_private_page() conditional, including the ->page_free()
> > callback. For the other page types just open-code the generic wakeup.
> >
> > Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
> > does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
> > case.
> >
> > Cc: Jan Kara <[email protected]>
> > Cc: Christoph Hellwig <[email protected]>
> > Cc: Ira Weiny <[email protected]>
> > Cc: Jérôme Glisse <[email protected]>
> > Cc: John Hubbard <[email protected]>
> > Signed-off-by: Dan Williams <[email protected]>
> > ---
> > Hi John,
> >
> > This applies on top of today's linux-next and passes my nvdimm unit
> > tests. That testing noticed that devmap_managed_enable_get() needed a
> > small fixup as well.
>
> Got it. This will appear in the next posted version of my "mm/gup: track
> dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.

Thanks!

>
>
> >
> > drivers/nvdimm/pmem.c | 6 ------
> > mm/memremap.c | 22 ++++++++++++----------
> > 2 files changed, 12 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > index f9f76f6ba07b..21db1ce8c0ae 100644
> > --- a/drivers/nvdimm/pmem.c
> > +++ b/drivers/nvdimm/pmem.c
> > @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
> > put_disk(pmem->disk);
> > }
> >
> > -static void pmem_pagemap_page_free(struct page *page)
> > -{
> > - wake_up_var(&page->_refcount);
> > -}
> > -
> > static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> > - .page_free = pmem_pagemap_page_free,
> > .kill = pmem_pagemap_kill,
> > .cleanup = pmem_pagemap_cleanup,
> > };
> > diff --git a/mm/memremap.c b/mm/memremap.c
> > index 022e78e68ea0..6e6f3d6fdb73 100644
> > --- a/mm/memremap.c
> > +++ b/mm/memremap.c
> > @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
> >
> > static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> > {
> > - if (!pgmap->ops || !pgmap->ops->page_free) {
> > + if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> > + && !pgmap->ops->page_free)) {
>
>
> OK, so only MEMORY_DEVICE_PRIVATE has .page_free ops. That looks
> correct to me, based on looking at the .page_free setters--I
> only see Nouveau setting it.

Correct. The FSDAX case still needs to enable the 'devmap_managed_key'
static key, but other than that the core will handle all the follow-on
details.

2019-11-14 01:26:41

by Jerome Glisse

Subject: Re: [PATCH] mm: Cleanup __put_devmap_managed_page() vs ->page_free()

On Wed, Nov 13, 2019 at 04:07:22PM -0800, Dan Williams wrote:
> After the removal of the device-public infrastructure there are only 2
> ->page_free() call backs in the kernel. One of those is a device-private
> callback in the nouveau driver, the other is a generic wakeup needed in
> the DAX case. In the hopes that all ->page_free() callbacks can be
> migrated to common core kernel functionality, move the device-private
> specific actions in __put_devmap_managed_page() under the
> is_device_private_page() conditional, including the ->page_free()
> callback. For the other page types just open-code the generic wakeup.
>
> Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
> does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
> case.
>
> Cc: Jan Kara <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: John Hubbard <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>

All looks good to me.

Reviewed-by: Jérôme Glisse <[email protected]>


> ---
> Hi John,
>
> This applies on top of today's linux-next and passes my nvdimm unit
> tests. That testing noticed that devmap_managed_enable_get() needed a
> small fixup as well.
>
> drivers/nvdimm/pmem.c | 6 ------
> mm/memremap.c | 22 ++++++++++++----------
> 2 files changed, 12 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index f9f76f6ba07b..21db1ce8c0ae 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
> put_disk(pmem->disk);
> }
>
> -static void pmem_pagemap_page_free(struct page *page)
> -{
> - wake_up_var(&page->_refcount);
> -}
> -
> static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> - .page_free = pmem_pagemap_page_free,
> .kill = pmem_pagemap_kill,
> .cleanup = pmem_pagemap_cleanup,
> };
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 022e78e68ea0..6e6f3d6fdb73 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
>
> static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> {
> - if (!pgmap->ops || !pgmap->ops->page_free) {
> + if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> + && !pgmap->ops->page_free)) {
> WARN(1, "Missing page_free method\n");
> return -EINVAL;
> }
> @@ -449,12 +450,6 @@ void __put_devmap_managed_page(struct page *page)
> * holds a reference on the page.
> */
> if (count == 1) {
> - /* Clear Active bit in case of parallel mark_page_accessed */
> - __ClearPageActive(page);
> - __ClearPageWaiters(page);
> -
> - mem_cgroup_uncharge(page);
> -
> /*
> * When a device_private page is freed, the page->mapping field
> * may still contain a (stale) mapping value. For example, the
> @@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
> * handled differently or not done at all, so there is no need
> * to clear page->mapping.
> */
> - if (is_device_private_page(page))
> - page->mapping = NULL;
> + if (is_device_private_page(page)) {
> + /* Clear Active bit in case of parallel mark_page_accessed */
> + __ClearPageActive(page);
> + __ClearPageWaiters(page);
>
> - page->pgmap->ops->page_free(page);
> + mem_cgroup_uncharge(page);
> +
> + page->mapping = NULL;
> + page->pgmap->ops->page_free(page);
> + } else
> + wake_up_var(&page->_refcount);
> } else if (!count)
> __put_page(page);
> }
>

2019-11-14 07:22:25

by Christoph Hellwig

Subject: Re: [PATCH] mm: Cleanup __put_devmap_managed_page() vs ->page_free()

On Wed, Nov 13, 2019 at 04:07:22PM -0800, Dan Williams wrote:
> static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> {
> - if (!pgmap->ops || !pgmap->ops->page_free) {
> + if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> + && !pgmap->ops->page_free)) {

I don't think this check is correct. You only want the ops null check
for MEMORY_DEVICE_PRIVATE as well now, i.e.:

	if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
	    (!pgmap->ops || !pgmap->ops->page_free)) {
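
The difference between the two conditions can be checked in userspace.
What follows is a minimal sketch, not kernel code: the struct and enum
definitions are reduced stand-ins that carry only the fields the check
reads, and the helper names rejects_as_posted(), rejects_as_suggested()
and dummy_page_free() are hypothetical. It shows that the posted check
rejects any pgmap with a NULL ops pointer, while the suggested check
only looks at ops for MEMORY_DEVICE_PRIVATE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel definitions (assumption: values are illustrative). */
enum memory_type {
	MEMORY_DEVICE_PRIVATE = 1,
	MEMORY_DEVICE_FSDAX,
	MEMORY_DEVICE_DEVDAX,
	MEMORY_DEVICE_PCI_P2PDMA,
};

struct page;

struct dev_pagemap_ops {
	void (*page_free)(struct page *page);
};

struct dev_pagemap {
	enum memory_type type;
	const struct dev_pagemap_ops *ops;
};

/* Condition as posted: a NULL ops pointer is rejected for every type. */
static bool rejects_as_posted(const struct dev_pagemap *pgmap)
{
	assert(pgmap != NULL);
	return !pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE &&
			       !pgmap->ops->page_free);
}

/* Condition as suggested in review: ops only matter for MEMORY_DEVICE_PRIVATE. */
static bool rejects_as_suggested(const struct dev_pagemap *pgmap)
{
	assert(pgmap != NULL);
	return pgmap->type == MEMORY_DEVICE_PRIVATE &&
	       (!pgmap->ops || !pgmap->ops->page_free);
}

/* Hypothetical callback, used only to build a pgmap that has ->page_free set. */
static void dummy_page_free(struct page *page)
{
	(void)page;
}
```

The two helpers agree on every MEMORY_DEVICE_PRIVATE input and differ
only for a non-private pgmap whose ops pointer is NULL.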

> @@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
> * handled differently or not done at all, so there is no need
> * to clear page->mapping.
> */
> - if (is_device_private_page(page))
> - page->mapping = NULL;
> + if (is_device_private_page(page)) {
> + /* Clear Active bit in case of parallel mark_page_accessed */

This adds a > 80 char line. But that whole flow of the function seems
rather odd now.

Why can't we do:

	if (count == 0) {
		__put_page(page);
	} else if (is_device_private_page(page)) {
		__ClearPageActive(page);
		__ClearPageWaiters(page);

		mem_cgroup_uncharge(page);
		page->mapping = NULL;
		page->pgmap->ops->page_free(page);
	} else {
		wake_up_var(&page->_refcount);
	}

(except for the fact that I don't get the point of calling __put_page
on a refcount of zero, but that is separate from this patch).
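
The nested flow from the patch and the flattened flow proposed here can
be compared with a small userspace model. This is a sketch under stated
assumptions: struct mock_page, enum put_action, put_nested() and
put_flat() are hypothetical names, the refcount is a plain int rather
than an atomic, and the real side effects are replaced by a returned
action code. The flattened snippet does not show a guard for counts
above 1, so the early return in put_flat() is an assumption added to
make the two forms comparable.

```c
#include <assert.h>
#include <stdbool.h>

/* What the put path did with the page; stands in for the real side effects. */
enum put_action {
	PUT_NOTHING,	/* other references remain */
	PUT_PAGE_FREE,	/* device-private cleanup plus ->page_free() */
	PUT_WAKEUP,	/* wake_up_var(&page->_refcount) */
	PUT_RELEASE,	/* __put_page() */
};

/* Hypothetical reduced page: a plain counter and a type flag. */
struct mock_page {
	int refcount;
	bool device_private;
};

/* Nested form, following the patch as posted. */
static enum put_action put_nested(struct mock_page *page)
{
	int count;

	assert(page->refcount > 0);
	count = --page->refcount;

	if (count == 1) {
		if (page->device_private)
			return PUT_PAGE_FREE;
		else
			return PUT_WAKEUP;
	} else if (!count) {
		return PUT_RELEASE;
	}
	return PUT_NOTHING;
}

/* Flattened form from the review comment. */
static enum put_action put_flat(struct mock_page *page)
{
	int count;

	assert(page->refcount > 0);
	count = --page->refcount;

	/* Assumption: this guard is not part of the quoted snippet. */
	if (count > 1)
		return PUT_NOTHING;

	if (count == 0)
		return PUT_RELEASE;
	else if (page->device_private)
		return PUT_PAGE_FREE;
	else
		return PUT_WAKEUP;
}
```

In this model a drop from 2 to 1 is the "page is free" event and a drop
from 1 to 0 is the final release, which matches the branches quoted
above.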

2019-11-14 07:25:11

by Christoph Hellwig

Subject: Re: [PATCH] mm: Cleanup __put_devmap_managed_page() vs ->page_free()

On Wed, Nov 13, 2019 at 04:47:38PM -0800, Dan Williams wrote:
> > Got it. This will appear in the next posted version of my "mm/gup: track
> > dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.
>
> Thanks!

John - can you please send a small series just doing the zone device
patches rework? That way we can review it separately and maybe even get
it into 5.5.

2019-11-14 07:29:45

by Dan Williams

Subject: Re: [PATCH] mm: Cleanup __put_devmap_managed_page() vs ->page_free()

On Wed, Nov 13, 2019 at 11:19 PM Christoph Hellwig <[email protected]> wrote:
>
> On Wed, Nov 13, 2019 at 04:07:22PM -0800, Dan Williams wrote:
> > static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> > {
> > - if (!pgmap->ops || !pgmap->ops->page_free) {
> > + if (!pgmap->ops || (pgmap->type == MEMORY_DEVICE_PRIVATE
> > + && !pgmap->ops->page_free)) {
>
> I don't think this check is correct. You only want the ops null check
> for MEMORY_DEVICE_PRIVATE as well now, i.e.:
>
> 	if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
> 	    (!pgmap->ops || !pgmap->ops->page_free)) {
>
> > @@ -476,10 +471,17 @@ void __put_devmap_managed_page(struct page *page)
> > * handled differently or not done at all, so there is no need
> > * to clear page->mapping.
> > */
> > - if (is_device_private_page(page))
> > - page->mapping = NULL;
> > + if (is_device_private_page(page)) {
> > + /* Clear Active bit in case of parallel mark_page_accessed */
>
> This adds a > 80 char line. But that whole flow of the function seems
> rather odd now.
>
> Why can't we do:
>
> 	if (count == 0) {
> 		__put_page(page);
> 	} else if (is_device_private_page(page)) {
> 		__ClearPageActive(page);
> 		__ClearPageWaiters(page);
>
> 		mem_cgroup_uncharge(page);
> 		page->mapping = NULL;
> 		page->pgmap->ops->page_free(page);
> 	} else {
> 		wake_up_var(&page->_refcount);
> 	}
>

All of the above looks good to me; I will spin a v2.

> (except for the fact that I don't get the point of calling __put_page
> on a refcount of zero, but that is separate from this patch).

That looked odd to me as well until I recalled that we did that to
simplify the pgmap reference counting.

71389703839e mm, zone_device: Replace {get, put}_zone_device_page()
with a single reference to fix pmem crash

I'll add a comment in v2.

2019-11-14 07:30:22

by John Hubbard

Subject: Re: [PATCH] mm: Cleanup __put_devmap_managed_page() vs ->page_free()

On 11/13/19 11:23 PM, Christoph Hellwig wrote:
> On Wed, Nov 13, 2019 at 04:47:38PM -0800, Dan Williams wrote:
>>> Got it. This will appear in the next posted version of my "mm/gup: track
>>> dma-pinned pages: FOLL_PIN, FOLL_LONGTERM" patchset.
>>
>> Thanks!
>
> John - can you please send a small series just doing the zone device
> patches rework? That way we can review it separately and maybe even get
> it into 5.5.
>

Sure.


thanks,
--
John Hubbard
NVIDIA