2019-07-19 21:56:19

by Ralph Campbell

Subject: [PATCH v2 0/3] mm/hmm: fixes for device private page migration

Testing the latest Linux git tree turned up a few bugs with page
migration to and from ZONE_DEVICE private pages and anonymous pages.
Hopefully this series also clarifies how a ZONE_DEVICE private struct page
reuses the mapping and index fields from the source anonymous page
mapping.

Changes from v1 to v2:

Patch #1 moves the ZONE_DEVICE struct page fields into a union with lru,
grouping them in a struct of their own, so it is basically a new patch.

Patch #2 updates the code comments for clearing page->mapping as
suggested by John Hubbard.

Patch #3 is unchanged from the previous posting but note that
Andrew Morton has v1 queued in v5.2-mmotm-2019-07-18-16-08.

Ralph Campbell (3):
mm: document zone device struct page reserved fields
mm/hmm: fix ZONE_DEVICE anon page mapping reuse
mm/hmm: Fix bad subpage pointer in try_to_unmap_one

include/linux/mm_types.h | 9 ++++++++-
kernel/memremap.c | 4 ++++
mm/rmap.c | 1 +
3 files changed, 13 insertions(+), 1 deletion(-)

--
2.20.1


2019-07-19 21:56:55

by Ralph Campbell

Subject: [PATCH v2 3/3] mm/hmm: Fix bad subpage pointer in try_to_unmap_one

When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is increased.
This is so rmap_walk() can be used to unmap and migrate the page back to
system memory. However, try_to_unmap_one() computes the subpage pointer
from a swap pte, which yields an invalid page pointer and results in a
kernel panic such as:

BUG: unable to handle page fault for address: ffffea1fffffffc8

Currently, only single pages can be migrated to device private memory, so
no subpage computation is needed and subpage can simply be set to "page".
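
To illustrate why the swap pte trips things up, here is a minimal sketch of
the computation done earlier in try_to_unmap_one() (names follow mm/rmap.c,
but this is not a verbatim quote of the upstream code):

    subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
    /*
     * For a device private page the pte holds a non-present swap entry,
     * so pte_pfn() is meaningless and subpage ends up pointing outside
     * the struct page array (hence the ffffea1fffffffc8 fault above).
     * Since only single pages are migrated today, the fix below simply
     * overrides this with subpage = page before the goto discard.
     */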

Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
mm/rmap.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..ec1af8b60423 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
*/
+ subpage = page;
goto discard;
}

--
2.20.1

2019-07-19 21:57:48

by Ralph Campbell

Subject: [PATCH v2 2/3] mm/hmm: fix ZONE_DEVICE anon page mapping reuse

When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table.
For example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
migrate_vma_collect()
src[] = MIGRATE_PFN_MIGRATE
migrate_vma_prepare()
/* no page to lock or isolate so OK */
migrate_vma_unmap()
/* no page to unmap so OK */
ops->alloc_and_copy()
/* driver allocates ZONE_DEVICE page for dst[] */
migrate_vma_pages()
migrate_vma_insert_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
/* This check sees the page's stale mapping field */
if (PageAnon(page))
return
/* page->mapping is not updated */

The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
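
To make the failure concrete, the check that trips over the stale field looks
roughly like the PageAnon() definition in include/linux/page-flags.h
(simplified sketch, compound_head() handling omitted):

    #define PAGE_MAPPING_ANON	0x1

    static inline int PageAnon(struct page *page)
    {
    	/*
    	 * Anonymous pages are identified by the low bit of
    	 * page->mapping. A freed device private page that still
    	 * carries its old anon_vma pointer here looks anonymous,
    	 * so __page_set_anon_rmap() returns early and never
    	 * installs the new mapping.
    	 */
    	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
    }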

Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.

Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Cc: [email protected]
Signed-off-by: Ralph Campbell <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
---
kernel/memremap.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index bea6f887adad..98d04466dcde 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -408,6 +408,30 @@ void __put_devmap_managed_page(struct page *page)

mem_cgroup_uncharge(page);

+ /*
+ * When a device_private page is freed, the page->mapping field
+ * may still contain a (stale) mapping value. For example, the
+ * lower bits of page->mapping may still identify the page as
+ * an anonymous page. Ultimately, this entire field is just
+ * stale and wrong, and it will cause errors if not cleared.
+ * One example is:
+ *
+ * migrate_vma_pages()
+ * migrate_vma_insert_page()
+ * page_add_new_anon_rmap()
+ * __page_set_anon_rmap()
+ * ...checks page->mapping, via PageAnon(page) call,
+ * and incorrectly concludes that the page is an
+ * anonymous page. Therefore, it incorrectly,
+ * silently fails to set up the new anon rmap.
+ *
+ * For other types of ZONE_DEVICE pages, migration is either
+ * handled differently or not done at all, so there is no need
+ * to clear page->mapping.
+ */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
--
2.20.1

2019-07-19 23:59:52

by Ralph Campbell

Subject: [PATCH v2 1/3] mm: document zone device struct page field usage

Struct page for ZONE_DEVICE private pages uses the page->mapping and
page->index fields while the source anonymous pages are migrated to
device private memory. This is so rmap_walk() can find the page when
migrating the ZONE_DEVICE private page back to system memory.
ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and
page->index fields when files are mapped into a process address space.

Restructure struct page and add comments to make this more clear.
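
For readers skimming the diff below, the reason the overlay works can be
sketched like this (illustrative only, not the full struct page layout):
struct list_head is two pointers, which is exactly the space taken by @pgmap
plus @zone_device_data, so the ZONE_DEVICE fields fit in the @lru slot while
@mapping and @index remain usable by both page cache/anonymous pages and
ZONE_DEVICE pages.

    union {
    	struct list_head lru;		/* two pointers */
    	struct {			/* ZONE_DEVICE pages */
    		struct dev_pagemap *pgmap;
    		void *zone_device_data;
    	};
    };
    struct address_space *mapping;		/* shared by both users */
    pgoff_t index;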

Signed-off-by: Ralph Campbell <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
---
include/linux/mm_types.h | 42 +++++++++++++++++++++++++++-------------
1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3a37a89eb7a7..f6c52e44d40c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -76,13 +76,35 @@ struct page {
* avoid collision and false-positive PageTail().
*/
union {
- struct { /* Page cache and anonymous pages */
- /**
- * @lru: Pageout list, eg. active_list protected by
- * pgdat->lru_lock. Sometimes used as a generic list
- * by the page owner.
- */
- struct list_head lru;
+ struct { /* Page cache, anonymous, ZONE_DEVICE pages */
+ union {
+ /**
+ * @lru: Pageout list, e.g., active_list
+ * protected by pgdat->lru_lock. Sometimes
+ * used as a generic list by the page owner.
+ */
+ struct list_head lru;
+ /**
+ * ZONE_DEVICE pages are never on the lru
+ * list so they reuse the list space.
+ * ZONE_DEVICE private pages are counted as
+ * being mapped so the @mapping and @index
+ * fields are used while the page is migrated
+ * to device private memory.
+ * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
+ * use the @mapping and @index fields when pmem
+ * backed DAX files are mapped.
+ */
+ struct {
+ /**
+ * @pgmap: Points to the hosting
+ * device page map.
+ */
+ struct dev_pagemap *pgmap;
+ /** @zone_device_data: opaque data. */
+ void *zone_device_data;
+ };
+ };
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
pgoff_t index; /* Our offset within mapping. */
@@ -155,12 +177,6 @@ struct page {
spinlock_t ptl;
#endif
};
- struct { /* ZONE_DEVICE pages */
- /** @pgmap: Points to the hosting device page map. */
- struct dev_pagemap *pgmap;
- void *zone_device_data;
- unsigned long _zd_pad_1; /* uses mapping */
- };

/** @rcu_head: You can use this to free a page by RCU. */
struct rcu_head rcu_head;
--
2.20.1

2019-07-20 00:34:50

by John Hubbard

Subject: Re: [PATCH v2 1/3] mm: document zone device struct page field usage

On 7/19/19 12:29 PM, Ralph Campbell wrote:
> Struct page for ZONE_DEVICE private pages uses the page->mapping and
> and page->index fields while the source anonymous pages are migrated to
> device private memory. This is so rmap_walk() can find the page when
> migrating the ZONE_DEVICE private page back to system memory.
> ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and
> page->index fields when files are mapped into a process address space.
>
> Restructure struct page and add comments to make this more clear.
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Reviewed-by: John Hubbard <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: "Kirill A . Shutemov" <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> Cc: Martin Schwidefsky <[email protected]>
> Cc: Pekka Enberg <[email protected]>
> Cc: Randy Dunlap <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> ---
> include/linux/mm_types.h | 42 +++++++++++++++++++++++++++-------------
> 1 file changed, 29 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3a37a89eb7a7..f6c52e44d40c 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -76,13 +76,35 @@ struct page {
> * avoid collision and false-positive PageTail().
> */
> union {
> - struct { /* Page cache and anonymous pages */
> - /**
> - * @lru: Pageout list, eg. active_list protected by
> - * pgdat->lru_lock. Sometimes used as a generic list
> - * by the page owner.
> - */
> - struct list_head lru;
> + struct { /* Page cache, anonymous, ZONE_DEVICE pages */
> + union {
> + /**
> + * @lru: Pageout list, e.g., active_list
> + * protected by pgdat->lru_lock. Sometimes
> + * used as a generic list by the page owner.
> + */
> + struct list_head lru;
> + /**
> + * ZONE_DEVICE pages are never on the lru
> + * list so they reuse the list space.
> + * ZONE_DEVICE private pages are counted as
> + * being mapped so the @mapping and @index
> + * fields are used while the page is migrated
> + * to device private memory.
> + * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
> + * use the @mapping and @index fields when pmem
> + * backed DAX files are mapped.
> + */
> + struct {
> + /**
> + * @pgmap: Points to the hosting
> + * device page map.
> + */
> + struct dev_pagemap *pgmap;
> + /** @zone_device_data: opaque data. */

This is nice, and I think it's a solid step forward in documentation via
clearer code. I recall a number of emails, and even face-to-face discussions,
in which the statement kept coming up: "remember, the ZONE_DEVICE pages use
mapping and index, too." Actually, that reminds me: page->private is often
in use in that case, too, so ZONE_DEVICE couldn't use that, either. I don't
think we need to explicitly say that, though, with this new layout.

nit: the above comment can be deleted, because it merely echoes the actual
code that directly follows it.

Either way, you can add:

Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

> + void *zone_device_data;
> + };
> + };
> /* See page-flags.h for PAGE_MAPPING_FLAGS */
> struct address_space *mapping;
> pgoff_t index; /* Our offset within mapping. */
> @@ -155,12 +177,6 @@ struct page {
> spinlock_t ptl;
> #endif
> };
> - struct { /* ZONE_DEVICE pages */
> - /** @pgmap: Points to the hosting device page map. */
> - struct dev_pagemap *pgmap;
> - void *zone_device_data;
> - unsigned long _zd_pad_1; /* uses mapping */
> - };
>
> /** @rcu_head: You can use this to free a page by RCU. */
> struct rcu_head rcu_head;
>

2019-07-20 06:19:29

by John Hubbard

Subject: Re: [PATCH v2 2/3] mm/hmm: fix ZONE_DEVICE anon page mapping reuse

On 7/19/19 12:29 PM, Ralph Campbell wrote:
> When a ZONE_DEVICE private page is freed, the page->mapping field can be
> set. If this page is reused as an anonymous page, the previous value can
> prevent the page from being inserted into the CPU's anon rmap table.
> For example, when migrating a pte_none() page to device memory:
> migrate_vma(ops, vma, start, end, src, dst, private)
> migrate_vma_collect()
> src[] = MIGRATE_PFN_MIGRATE
> migrate_vma_prepare()
> /* no page to lock or isolate so OK */
> migrate_vma_unmap()
> /* no page to unmap so OK */
> ops->alloc_and_copy()
> /* driver allocates ZONE_DEVICE page for dst[] */
> migrate_vma_pages()
> migrate_vma_insert_page()
> page_add_new_anon_rmap()
> __page_set_anon_rmap()
> /* This check sees the page's stale mapping field */
> if (PageAnon(page))
> return
> /* page->mapping is not updated */
>
> The result is that the migration appears to succeed but a subsequent CPU
> fault will be unable to migrate the page back to system memory or worse.
>
> Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
> pointer data doesn't affect future page use.

Reviewed-by: John Hubbard <[email protected]>

thanks,
--
John Hubbard
NVIDIA

>
> Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
> Cc: [email protected]
> Signed-off-by: Ralph Campbell <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Logan Gunthorpe <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Mike Kravetz <[email protected]>
> Cc: "Jérôme Glisse" <[email protected]>
> ---
> kernel/memremap.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index bea6f887adad..98d04466dcde 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -408,6 +408,30 @@ void __put_devmap_managed_page(struct page *page)
>
> mem_cgroup_uncharge(page);
>
> + /*
> + * When a device_private page is freed, the page->mapping field
> + * may still contain a (stale) mapping value. For example, the
> + * lower bits of page->mapping may still identify the page as
> + * an anonymous page. Ultimately, this entire field is just
> + * stale and wrong, and it will cause errors if not cleared.
> + * One example is:
> + *
> + * migrate_vma_pages()
> + * migrate_vma_insert_page()
> + * page_add_new_anon_rmap()
> + * __page_set_anon_rmap()
> + * ...checks page->mapping, via PageAnon(page) call,
> + * and incorrectly concludes that the page is an
> + * anonymous page. Therefore, it incorrectly,
> + * silently fails to set up the new anon rmap.
> + *
> + * For other types of ZONE_DEVICE pages, migration is either
> + * handled differently or not done at all, so there is no need
> + * to clear page->mapping.
> + */
> + if (is_device_private_page(page))
> + page->mapping = NULL;
> +
> page->pgmap->ops->page_free(page);
> } else if (!count)
> __put_page(page);
>

2019-07-21 15:07:46

by Randy Dunlap

Subject: Re: [PATCH v2 1/3] mm: document zone device struct page field usage

Hi,

On 7/19/19 12:29 PM, Ralph Campbell wrote:
> Struct page for ZONE_DEVICE private pages uses the page->mapping and
> and page->index fields while the source anonymous pages are migrated to
> device private memory. This is so rmap_walk() can find the page when
> migrating the ZONE_DEVICE private page back to system memory.
> ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and
> page->index fields when files are mapped into a process address space.
>
> Restructure struct page and add comments to make this more clear.
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Reviewed-by: John Hubbard <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: "Kirill A . Shutemov" <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> Cc: Martin Schwidefsky <[email protected]>
> Cc: Pekka Enberg <[email protected]>
> Cc: Randy Dunlap <[email protected]>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> ---
> include/linux/mm_types.h | 42 +++++++++++++++++++++++++++-------------
> 1 file changed, 29 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3a37a89eb7a7..f6c52e44d40c 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -76,13 +76,35 @@ struct page {
> * avoid collision and false-positive PageTail().
> */
> union {
> - struct { /* Page cache and anonymous pages */
> - /**
> - * @lru: Pageout list, eg. active_list protected by
> - * pgdat->lru_lock. Sometimes used as a generic list
> - * by the page owner.
> - */
> - struct list_head lru;
> + struct { /* Page cache, anonymous, ZONE_DEVICE pages */
> + union {
> + /**
> + * @lru: Pageout list, e.g., active_list
> + * protected by pgdat->lru_lock. Sometimes
> + * used as a generic list by the page owner.
> + */
> + struct list_head lru;

Did you run this through 'make htmldocs' or anything similar?
The reason I ask is that the "/**" comment below is not in kernel-doc format AFAICT.
I would expect an error or warning, but I haven't tested it.
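
To illustrate what I mean (a rough sketch of the kernel-doc convention, not
something I have run against this patch): a "/**" block inside a struct is
expected to document a specific member via an "@name:" tag, while free-form
notes are normally written as plain comments, e.g.:

    /* kernel-doc style: documents the member that follows. */
    /** @pgmap: Points to the hosting device page map. */
    struct dev_pagemap *pgmap;

    /*
     * Free-form note about ZONE_DEVICE page usage: use a plain comment
     * so kernel-doc does not try to parse it as a member description.
     */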

Thanks.

> + /**
> + * ZONE_DEVICE pages are never on the lru
> + * list so they reuse the list space.
> + * ZONE_DEVICE private pages are counted as
> + * being mapped so the @mapping and @index
> + * fields are used while the page is migrated
> + * to device private memory.
> + * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
> + * use the @mapping and @index fields when pmem
> + * backed DAX files are mapped.
> + */
> + struct {
> + /**
> + * @pgmap: Points to the hosting
> + * device page map.
> + */
> + struct dev_pagemap *pgmap;
> + /** @zone_device_data: opaque data. */
> + void *zone_device_data;
> + };
> + };
> /* See page-flags.h for PAGE_MAPPING_FLAGS */
> struct address_space *mapping;
> pgoff_t index; /* Our offset within mapping. */
> @@ -155,12 +177,6 @@ struct page {
> spinlock_t ptl;
> #endif
> };
> - struct { /* ZONE_DEVICE pages */
> - /** @pgmap: Points to the hosting device page map. */
> - struct dev_pagemap *pgmap;
> - void *zone_device_data;
> - unsigned long _zd_pad_1; /* uses mapping */
> - };
>
> /** @rcu_head: You can use this to free a page by RCU. */
> struct rcu_head rcu_head;
>


--
~Randy

2019-07-21 16:04:10

by Matthew Wilcox

Subject: Re: [PATCH v2 1/3] mm: document zone device struct page field usage

On Fri, Jul 19, 2019 at 12:29:53PM -0700, Ralph Campbell wrote:
> Struct page for ZONE_DEVICE private pages uses the page->mapping and
> and page->index fields while the source anonymous pages are migrated to
> device private memory. This is so rmap_walk() can find the page when
> migrating the ZONE_DEVICE private page back to system memory.
> ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and
> page->index fields when files are mapped into a process address space.
>
> Restructure struct page and add comments to make this more clear.

NAK. I just got rid of this kind of foolishness from struct page,
and you're making it harder to understand, not easier. The comments
could be improved, but don't lay it out like this again.

> @@ -76,13 +76,35 @@ struct page {
> * avoid collision and false-positive PageTail().
> */
> union {
> - struct { /* Page cache and anonymous pages */
> - /**
> - * @lru: Pageout list, eg. active_list protected by
> - * pgdat->lru_lock. Sometimes used as a generic list
> - * by the page owner.
> - */
> - struct list_head lru;
> + struct { /* Page cache, anonymous, ZONE_DEVICE pages */
> + union {
> + /**
> + * @lru: Pageout list, e.g., active_list
> + * protected by pgdat->lru_lock. Sometimes
> + * used as a generic list by the page owner.
> + */
> + struct list_head lru;
> + /**
> + * ZONE_DEVICE pages are never on the lru
> + * list so they reuse the list space.
> + * ZONE_DEVICE private pages are counted as
> + * being mapped so the @mapping and @index
> + * fields are used while the page is migrated
> + * to device private memory.
> + * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
> + * use the @mapping and @index fields when pmem
> + * backed DAX files are mapped.
> + */
> + struct {
> + /**
> + * @pgmap: Points to the hosting
> + * device page map.
> + */
> + struct dev_pagemap *pgmap;
> + /** @zone_device_data: opaque data. */
> + void *zone_device_data;
> + };
> + };
> /* See page-flags.h for PAGE_MAPPING_FLAGS */
> struct address_space *mapping;
> pgoff_t index; /* Our offset within mapping. */
> @@ -155,12 +177,6 @@ struct page {
> spinlock_t ptl;
> #endif
> };
> - struct { /* ZONE_DEVICE pages */
> - /** @pgmap: Points to the hosting device page map. */
> - struct dev_pagemap *pgmap;
> - void *zone_device_data;
> - unsigned long _zd_pad_1; /* uses mapping */
> - };
>
> /** @rcu_head: You can use this to free a page by RCU. */
> struct rcu_head rcu_head;
> --
> 2.20.1
>

2019-07-22 05:57:20

by Ira Weiny

Subject: Re: [PATCH v2 1/3] mm: document zone device struct page field usage

On Sun, Jul 21, 2019 at 09:02:04AM -0700, Matthew Wilcox wrote:
> On Fri, Jul 19, 2019 at 12:29:53PM -0700, Ralph Campbell wrote:
> > Struct page for ZONE_DEVICE private pages uses the page->mapping and
> > and page->index fields while the source anonymous pages are migrated to
> > device private memory. This is so rmap_walk() can find the page when
> > migrating the ZONE_DEVICE private page back to system memory.
> > ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and
> > page->index fields when files are mapped into a process address space.
> >
> > Restructure struct page and add comments to make this more clear.
>
> NAK. I just got rid of this kind of foolishness from struct page,
> and you're making it harder to understand, not easier. The comments
> could be improved, but don't lay it out like this again.

Was V1 of Ralphs patch ok? It seemed ok to me.

Ira

>
> > @@ -76,13 +76,35 @@ struct page {
> > * avoid collision and false-positive PageTail().
> > */
> > union {
> > - struct { /* Page cache and anonymous pages */
> > - /**
> > - * @lru: Pageout list, eg. active_list protected by
> > - * pgdat->lru_lock. Sometimes used as a generic list
> > - * by the page owner.
> > - */
> > - struct list_head lru;
> > + struct { /* Page cache, anonymous, ZONE_DEVICE pages */
> > + union {
> > + /**
> > + * @lru: Pageout list, e.g., active_list
> > + * protected by pgdat->lru_lock. Sometimes
> > + * used as a generic list by the page owner.
> > + */
> > + struct list_head lru;
> > + /**
> > + * ZONE_DEVICE pages are never on the lru
> > + * list so they reuse the list space.
> > + * ZONE_DEVICE private pages are counted as
> > + * being mapped so the @mapping and @index
> > + * fields are used while the page is migrated
> > + * to device private memory.
> > + * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
> > + * use the @mapping and @index fields when pmem
> > + * backed DAX files are mapped.
> > + */
> > + struct {
> > + /**
> > + * @pgmap: Points to the hosting
> > + * device page map.
> > + */
> > + struct dev_pagemap *pgmap;
> > + /** @zone_device_data: opaque data. */
> > + void *zone_device_data;
> > + };
> > + };
> > /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > struct address_space *mapping;
> > pgoff_t index; /* Our offset within mapping. */
> > @@ -155,12 +177,6 @@ struct page {
> > spinlock_t ptl;
> > #endif
> > };
> > - struct { /* ZONE_DEVICE pages */
> > - /** @pgmap: Points to the hosting device page map. */
> > - struct dev_pagemap *pgmap;
> > - void *zone_device_data;
> > - unsigned long _zd_pad_1; /* uses mapping */
> > - };
> >
> > /** @rcu_head: You can use this to free a page by RCU. */
> > struct rcu_head rcu_head;
> > --
> > 2.20.1
> >
>

2019-07-22 09:49:14

by Christoph Hellwig

Subject: Re: [PATCH v2 1/3] mm: document zone device struct page field usage

On Sun, Jul 21, 2019 at 09:02:04AM -0700, Matthew Wilcox wrote:
> On Fri, Jul 19, 2019 at 12:29:53PM -0700, Ralph Campbell wrote:
> > Struct page for ZONE_DEVICE private pages uses the page->mapping and
> > and page->index fields while the source anonymous pages are migrated to
> > device private memory. This is so rmap_walk() can find the page when
> > migrating the ZONE_DEVICE private page back to system memory.
> > ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and
> > page->index fields when files are mapped into a process address space.
> >
> > Restructure struct page and add comments to make this more clear.
>
> NAK. I just got rid of this kind of foolishness from struct page,
> and you're making it harder to understand, not easier. The comments
> could be improved, but don't lay it out like this again.

This comes across as pretty aggressive. Please explain how making the
layout match how the code is actually used, versus the previous separation
that is actively misleading and has confused multiple people, is "foolishness".

2019-07-22 09:49:40

by Christoph Hellwig

Subject: Re: [PATCH v2 1/3] mm: document zone device struct page field usage

Looks good modulo any potential kerneldoc issues for which I'm not
the expert:

Reviewed-by: Christoph Hellwig <[email protected]>

2019-07-22 09:49:57

by Christoph Hellwig

Subject: Re: [PATCH v2 2/3] mm/hmm: fix ZONE_DEVICE anon page mapping reuse

> + /*
> + * When a device_private page is freed, the page->mapping field
> + * may still contain a (stale) mapping value. For example, the
> + * lower bits of page->mapping may still identify the page as
> + * an anonymous page. Ultimately, this entire field is just
> + * stale and wrong, and it will cause errors if not cleared.
> + * One example is:
> + *
> + * migrate_vma_pages()
> + * migrate_vma_insert_page()
> + * page_add_new_anon_rmap()
> + * __page_set_anon_rmap()
> + * ...checks page->mapping, via PageAnon(page) call,
> + * and incorrectly concludes that the page is an
> + * anonymous page. Therefore, it incorrectly,
> + * silently fails to set up the new anon rmap.
> + *
> + * For other types of ZONE_DEVICE pages, migration is either
> + * handled differently or not done at all, so there is no need
> + * to clear page->mapping.
> + */
> + if (is_device_private_page(page))
> + page->mapping = NULL;
> +

Thanks, especially for the long comment.

Reviewed-by: Christoph Hellwig <[email protected]>

2019-07-22 12:40:22

by Matthew Wilcox

Subject: Re: [PATCH v2 1/3] mm: document zone device struct page field usage

On Sun, Jul 21, 2019 at 10:13:45PM -0700, Ira Weiny wrote:
> On Sun, Jul 21, 2019 at 09:02:04AM -0700, Matthew Wilcox wrote:
> > On Fri, Jul 19, 2019 at 12:29:53PM -0700, Ralph Campbell wrote:
> > > Struct page for ZONE_DEVICE private pages uses the page->mapping and
> > > and page->index fields while the source anonymous pages are migrated to
> > > device private memory. This is so rmap_walk() can find the page when
> > > migrating the ZONE_DEVICE private page back to system memory.
> > > ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and
> > > page->index fields when files are mapped into a process address space.
> > >
> > > Restructure struct page and add comments to make this more clear.
> >
> > NAK. I just got rid of this kind of foolishness from struct page,
> > and you're making it harder to understand, not easier. The comments
> > could be improved, but don't lay it out like this again.
>
> Was V1 of Ralphs patch ok? It seemed ok to me.

Yes, v1 was fine. This seems like a regression.

2019-07-24 02:31:45

by Ralph Campbell

Subject: Re: [PATCH v2 1/3] mm: document zone device struct page field usage


On 7/22/19 4:08 AM, Matthew Wilcox wrote:
> On Sun, Jul 21, 2019 at 10:13:45PM -0700, Ira Weiny wrote:
>> On Sun, Jul 21, 2019 at 09:02:04AM -0700, Matthew Wilcox wrote:
>>> On Fri, Jul 19, 2019 at 12:29:53PM -0700, Ralph Campbell wrote:
>>>> Struct page for ZONE_DEVICE private pages uses the page->mapping and
>>>> and page->index fields while the source anonymous pages are migrated to
>>>> device private memory. This is so rmap_walk() can find the page when
>>>> migrating the ZONE_DEVICE private page back to system memory.
>>>> ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and
>>>> page->index fields when files are mapped into a process address space.
>>>>
>>>> Restructure struct page and add comments to make this more clear.
>>>
>>> NAK. I just got rid of this kind of foolishness from struct page,
>>> and you're making it harder to understand, not easier. The comments
>>> could be improved, but don't lay it out like this again.
>>
>> Was V1 of Ralphs patch ok? It seemed ok to me.
>
> Yes, v1 was fine. This seems like a regression.
>

This is about what people find "easiest to understand" and so
I'm not surprised that opinions differ.
What if I post a v3 based on v1 but remove the _zd_pad_* variables
that Christoph found misleading and add some more comments
about how the different ZONE_DEVICE types use the 3 remaining
words (basically the comment from v2)?

2019-07-24 02:33:30

by Ira Weiny

Subject: RE: [PATCH v2 1/3] mm: document zone device struct page field usage

>
> On 7/22/19 4:08 AM, Matthew Wilcox wrote:
> > On Sun, Jul 21, 2019 at 10:13:45PM -0700, Ira Weiny wrote:
> >> On Sun, Jul 21, 2019 at 09:02:04AM -0700, Matthew Wilcox wrote:
> >>> On Fri, Jul 19, 2019 at 12:29:53PM -0700, Ralph Campbell wrote:
> >>>> Struct page for ZONE_DEVICE private pages uses the page->mapping
> >>>> and and page->index fields while the source anonymous pages are
> >>>> migrated to device private memory. This is so rmap_walk() can find
> >>>> the page when migrating the ZONE_DEVICE private page back to system
> memory.
> >>>> ZONE_DEVICE pmem backed fsdax pages also use the page->mapping
> and
> >>>> page->index fields when files are mapped into a process address space.
> >>>>
> >>>> Restructure struct page and add comments to make this more clear.
> >>>
> >>> NAK. I just got rid of this kind of foolishness from struct page,
> >>> and you're making it harder to understand, not easier. The comments
> >>> could be improved, but don't lay it out like this again.
> >>
> >> Was V1 of Ralphs patch ok? It seemed ok to me.
> >
> > Yes, v1 was fine. This seems like a regression.
> >
>
> This is about what people find "easiest to understand" and so I'm not
> surprised that opinions differ.
> What if I post a v3 based on v1 but remove the _zd_pad_* variables that
> Christoph found misleading and add some more comments about how the
> different ZONE_DEVICE types use the 3 remaining words (basically the
> comment from v2)?

I'm ok with that...

Ira