2018-08-09 02:15:12

by Zhang Yi

Subject: [PATCH V3 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio

For device-specific memory space, when we move this range of pfns into a
memory zone, we set the page reserved flag at that time. Some of these
pages are reserved for device MMIO, and some are not, such as NVDIMM
pmem.

Now, when we map these dev_dax or fs_dax pages to KVM as a DIMM/NVDIMM
backend, the pages are reserved, so the check in kvm_is_reserved_pfn()
misconceives them as MMIO. Therefore, we introduce a new page map type,
MEMORY_DEVICE_DEV_DAX, and use it together with the existing
MEMORY_DEVICE_FS_DAX to identify pages that come from NVDIMM pmem, and
let KVM treat these as normal pages.

Without this patch, many operations are missed because pmem pages are
mistreated. For example, a page may never get the chance to be unpinned
for a KVM guest (in kvm_release_pfn_clean), and cannot be marked as
dirty/accessed (in kvm_set_pfn_dirty/accessed), etc.

V1:
https://lkml.org/lkml/2018/7/4/91

V2:
https://lkml.org/lkml/2018/7/10/135

V3:
[PATCH V3 1/4] Needs Comments.
[PATCH V3 2/4] Update the description of MEMORY_DEVICE_DEV_DAX: Jan
[PATCH V3 3/4] Acked-by: Jan in V2
[PATCH V3 4/4] Needs Comments.

Zhang Yi (4):
kvm: remove redundant reserved page check
mm: introduce memory type MEMORY_DEVICE_DEV_DAX
mm: add a function to differentiate the pages is from DAX device
memory
kvm: add a check if pfn is from NVDIMM pmem.

drivers/dax/pmem.c | 1 +
include/linux/memremap.h | 8 ++++++++
include/linux/mm.h | 12 ++++++++++++
virt/kvm/kvm_main.c | 16 ++++++++--------
4 files changed, 29 insertions(+), 8 deletions(-)

--
2.7.4



2018-08-09 02:15:32

by Zhang Yi

Subject: [PATCH V3 1/4] kvm: remove redundant reserved page check

PageReserved() is already checked inside kvm_is_reserved_pfn(), so
remove the duplicate check from kvm_set_pfn_dirty().

Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yu <[email protected]>
---
virt/kvm/kvm_main.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b47507f..c44c406 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1690,12 +1690,8 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);

 void kvm_set_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn)) {
-		struct page *page = pfn_to_page(pfn);
-
-		if (!PageReserved(page))
-			SetPageDirty(page);
-	}
+	if (!kvm_is_reserved_pfn(pfn))
+		SetPageDirty(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);

--
2.7.4
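
As a sanity check on the claim in patch 1/4, here is a minimal userspace
sketch, not kernel code. It assumes a simplified model in which a pfn is
described by two booleans, one standing in for pfn_valid() and one for
PageReserved(). All dirty_model_* names are hypothetical and exist only for
this illustration. The sketch compares the nested check that the patch
removes with the flattened one over every input combination.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pfn; this is not a kernel type. */
struct dirty_model_pfn {
	bool valid;    /* models pfn_valid(pfn) */
	bool reserved; /* models PageReserved(pfn_to_page(pfn)) */
};

/* Models kvm_is_reserved_pfn() as it stands before patch 4/4. */
bool dirty_model_is_reserved_pfn(struct dirty_model_pfn pfn)
{
	if (pfn.valid)
		return pfn.reserved;
	return true;
}

/* Old kvm_set_pfn_dirty(): true means SetPageDirty() would be reached. */
bool dirty_model_nested(struct dirty_model_pfn pfn)
{
	if (!dirty_model_is_reserved_pfn(pfn)) {
		if (!pfn.reserved)
			return true;
	}
	return false;
}

/* New kvm_set_pfn_dirty(): the inner PageReserved() test is gone. */
bool dirty_model_flat(struct dirty_model_pfn pfn)
{
	return !dirty_model_is_reserved_pfn(pfn);
}

/* Counts the inputs on which the old and new versions disagree. */
int dirty_model_count_mismatches(void)
{
	int mismatches = 0;

	for (int v = 0; v < 2; v++) {
		for (int r = 0; r < 2; r++) {
			struct dirty_model_pfn pfn = { .valid = v, .reserved = r };

			if (dirty_model_nested(pfn) != dirty_model_flat(pfn))
				mismatches++;
		}
	}
	return mismatches;
}

/* Small self-check: the two versions agree on all four inputs. */
void dirty_model_selftest(void)
{
	assert(dirty_model_count_mismatches() == 0);
}
```

The equivalence holds only while kvm_is_reserved_pfn() returns the
PageReserved() value for every valid pfn. Once patch 4/4 exempts DAX pages,
the old inner check would no longer be redundant: it would keep reserved
DAX pages from being marked dirty.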


2018-08-09 02:15:52

by Zhang Yi

Subject: [PATCH V3 2/4] mm: introduce memory type MEMORY_DEVICE_DEV_DAX

Currently, NVDIMM pages are marked 'PageReserved'. However, unlike
other reserved PFNs, pages on NVDIMM should still behave like normal ones
in many cases, e.g. when used as backend memory of a KVM guest. This patch
introduces a new memory type, MEMORY_DEVICE_DEV_DAX, and sets this type
when the dax driver hotplugs the device memory.

Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yu <[email protected]>
---
drivers/dax/pmem.c | 1 +
include/linux/memremap.h | 8 ++++++++
2 files changed, 9 insertions(+)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index fd49b24..fb3f363 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -111,6 +111,7 @@ static int dax_pmem_probe(struct device *dev)
 		return rc;

 	dax_pmem->pgmap.ref = &dax_pmem->ref;
+	dax_pmem->pgmap.type = MEMORY_DEVICE_DEV_DAX;
 	addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f91f9e7..cd07ca8 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -53,11 +53,19 @@ struct vmem_altmap {
  * wakeup event whenever a page is unpinned and becomes idle. This
  * wakeup is used to coordinate physical address space management (ex:
  * fs truncate/hole punch) vs pinned pages (ex: device dma).
+ *
+ * MEMORY_DEVICE_DEV_DAX:
+ * Device memory that supports raw access to persistent memory. Without need
+ * of an intervening filesystem, it could be directly mapped via an mmap
+ * capable character device. Together with the type MEMORY_DEVICE_FS_DAX,
+ * we could distinguish the persistent memory pages from normal ZONE_DEVICE
+ * pages.
  */
 enum memory_type {
 	MEMORY_DEVICE_PRIVATE = 1,
 	MEMORY_DEVICE_PUBLIC,
 	MEMORY_DEVICE_FS_DAX,
+	MEMORY_DEVICE_DEV_DAX,
 };

/*
--
2.7.4


2018-08-09 02:16:04

by Zhang Yi

Subject: [PATCH V3 3/4] mm: add a function to differentiate the pages is from DAX device memory

The DAX driver hotplugs the device memory and moves it to a memory zone.
These pages are marked with the reserved flag; however, some other kernel
components will misconceive these pages as reserved MMIO (ex: when we map
these dev_dax or fs_dax pages to KVM as a DIMM/NVDIMM backend). Together
with the type MEMORY_DEVICE_FS_DAX, we can use is_dax_page() to tell
whether a page is DAX device memory or not.

Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yu <[email protected]>
---
include/linux/mm.h | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 68a5121..de5cbc3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -889,6 +889,13 @@ static inline bool is_device_public_page(const struct page *page)
 		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
 }

+static inline bool is_dax_page(const struct page *page)
+{
+	return is_zone_device_page(page) &&
+		(page->pgmap->type == MEMORY_DEVICE_FS_DAX ||
+		 page->pgmap->type == MEMORY_DEVICE_DEV_DAX);
+}
+
 #else /* CONFIG_DEV_PAGEMAP_OPS */
 static inline void dev_pagemap_get_ops(void)
 {
@@ -912,6 +919,11 @@ static inline bool is_device_public_page(const struct page *page)
 {
 	return false;
 }
+
+static inline bool is_dax_page(const struct page *page)
+{
+	return false;
+}
 #endif /* CONFIG_DEV_PAGEMAP_OPS */

 static inline void get_page(struct page *page)
--
2.7.4
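
To make the ordering of the tests in is_dax_page() concrete, here is a
minimal userspace sketch, not kernel code. It assumes a boolean field in
place of is_zone_device_page() and a trimmed-down pagemap that carries only
the type. All dax_model_* names are hypothetical; the enum only mirrors the
values shown in patch 2/4. In this model an ordinary page has a NULL pgmap,
so the type comparison is safe only because the zone check runs first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors enum memory_type from patch 2/4; a stand-in, not the real enum. */
enum dax_model_memory_type {
	DAX_MODEL_DEVICE_PRIVATE = 1,
	DAX_MODEL_DEVICE_PUBLIC,
	DAX_MODEL_DEVICE_FS_DAX,
	DAX_MODEL_DEVICE_DEV_DAX,
};

/* Hypothetical, trimmed-down pagemap: only the type matters here. */
struct dax_model_pagemap {
	enum dax_model_memory_type type;
};

/* Hypothetical page: pgmap is only meaningful when zone_device is true. */
struct dax_model_page {
	bool zone_device; /* models is_zone_device_page(page) */
	const struct dax_model_pagemap *pgmap;
};

/* Same shape as is_dax_page(): zone check first, then the two DAX types. */
bool dax_model_is_dax_page(const struct dax_model_page *page)
{
	return page->zone_device &&
		(page->pgmap->type == DAX_MODEL_DEVICE_FS_DAX ||
		 page->pgmap->type == DAX_MODEL_DEVICE_DEV_DAX);
}

/* Small self-check covering ordinary RAM and three device memory types. */
void dax_model_selftest(void)
{
	struct dax_model_pagemap fs_dax = { DAX_MODEL_DEVICE_FS_DAX };
	struct dax_model_pagemap dev_dax = { DAX_MODEL_DEVICE_DEV_DAX };
	struct dax_model_pagemap public_mem = { DAX_MODEL_DEVICE_PUBLIC };

	struct dax_model_page ram = { false, NULL };
	struct dax_model_page fs_page = { true, &fs_dax };
	struct dax_model_page dev_page = { true, &dev_dax };
	struct dax_model_page public_page = { true, &public_mem };

	/* The && short-circuits, so the NULL pgmap is never dereferenced. */
	assert(!dax_model_is_dax_page(&ram));
	assert(dax_model_is_dax_page(&fs_page));
	assert(dax_model_is_dax_page(&dev_page));
	assert(!dax_model_is_dax_page(&public_page));
}
```

Device-public and device-private pages fall through to false in this
sketch, just as they do in the helper above.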


2018-08-09 02:16:35

by Zhang Yi

Subject: [PATCH V3 4/4] kvm: add a check if pfn is from NVDIMM pmem.

For device-specific memory space, when we move this range of pfns into a
memory zone, we set the page reserved flag at that time. Some of these
pages are reserved for device MMIO, and some are not, such as NVDIMM
pmem.

Now, when we map these dev_dax or fs_dax pages to KVM as a DIMM/NVDIMM
backend, the pages are reserved, so the check in kvm_is_reserved_pfn()
misconceives them as MMIO. Therefore, we introduce a new page map type,
MEMORY_DEVICE_DEV_DAX, and use it together with the existing
MEMORY_DEVICE_FS_DAX to identify pages that come from NVDIMM pmem, and
let KVM treat these as normal pages.

Without this patch, many operations are missed because pmem pages are
mistreated. For example, a page may never get the chance to be unpinned
for a KVM guest (in kvm_release_pfn_clean), and cannot be marked as
dirty/accessed (in kvm_set_pfn_dirty/accessed), etc.

Signed-off-by: Zhang Yi <[email protected]>
---
virt/kvm/kvm_main.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c44c406..969b6ca 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -147,8 +147,12 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,

 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 {
-	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn));
+	struct page *page;
+
+	if (pfn_valid(pfn)) {
+		page = pfn_to_page(pfn);
+		return PageReserved(page) && !is_dax_page(page);
+	}

 	return true;
 }
--
2.7.4
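
The behavioural change in patch 4/4 can be summarised with a minimal
userspace sketch, not kernel code. It assumes a page reduced to two
booleans, one for PageReserved() and one for the is_dax_page() result from
patch 3/4, and uses a NULL pointer to model a pfn for which pfn_valid()
fails. All rsv_model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only two properties are kept. */
struct rsv_model_page {
	bool reserved; /* models PageReserved(page) */
	bool dax;      /* models is_dax_page(page) */
};

/* kvm_is_reserved_pfn() before the patch; NULL models an invalid pfn. */
bool rsv_model_before(const struct rsv_model_page *page)
{
	if (page)
		return page->reserved;
	return true;
}

/* kvm_is_reserved_pfn() after the patch: reserved DAX pages are exempt. */
bool rsv_model_after(const struct rsv_model_page *page)
{
	if (page)
		return page->reserved && !page->dax;
	return true;
}

/* Small self-check over the four cases the thread talks about. */
void rsv_model_selftest(void)
{
	struct rsv_model_page pmem = { .reserved = true, .dax = true };
	struct rsv_model_page other = { .reserved = true, .dax = false };
	struct rsv_model_page ram = { .reserved = false, .dax = false };

	assert(rsv_model_before(&pmem));  /* pmem is mistaken for MMIO */
	assert(!rsv_model_after(&pmem));  /* pmem is treated as normal */
	assert(rsv_model_after(&other));  /* other reserved pages unchanged */
	assert(!rsv_model_after(&ram));   /* ordinary RAM unchanged */
	assert(rsv_model_after(NULL));    /* invalid pfn is still reserved */
}
```

Only the reserved DAX case changes its answer; every other input returns
the same value before and after the patch.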


2018-08-09 08:34:24

by Pankaj Gupta

Subject: Re: [PATCH V3 4/4] kvm: add a check if pfn is from NVDIMM pmem.


>
> For device specific memory space, when we move these area of pfn to
> memory zone, we will set the page reserved flag at that time, some of
> these reserved for device mmio, and some of these are not, such as
> NVDIMM pmem.
>
> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> backend, since these pages are reserved. the check of
> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> to indentify these pages are from NVDIMM pmem. and let kvm treat these

s/indentify/identify & remove '.'

> as normal pages.
>
> Without this patch, Many operations will be missed due to this
> mistreatment to pmem pages. For example, a page may not have chance to
> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc
>
> Signed-off-by: Zhang Yi <[email protected]>
> ---
> virt/kvm/kvm_main.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c44c406..969b6ca 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -147,8 +147,12 @@ __weak void
> kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>
> bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
> {
> - if (pfn_valid(pfn))
> - return PageReserved(pfn_to_page(pfn));
> + struct page *page;
> +
> + if (pfn_valid(pfn)) {
> + page = pfn_to_page(pfn);
> + return PageReserved(page) && !is_dax_page(page);
> + }
>
> return true;
> }
> --
> 2.7.4
>
>

2018-08-09 09:00:55

by Jan Kara

Subject: Re: [PATCH V3 2/4] mm: introduce memory type MEMORY_DEVICE_DEV_DAX

On Thu 09-08-18 18:53:08, Zhang Yi wrote:
> Currently, NVDIMM pages will be marked 'PageReserved'. However, unlike
> other reserved PFNs, pages on NVDIMM shall still behave like normal ones
> in many cases, i.e. when used as backend memory of KVM guest. This patch
> introduces a new memory type, MEMORY_DEVICE_DEV_DAX. And set this flag
> while dax driver hotplug the device memory.
>
> Signed-off-by: Zhang Yi <[email protected]>
> Signed-off-by: Zhang Yu <[email protected]>

Looks good to me now. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> drivers/dax/pmem.c | 1 +
> include/linux/memremap.h | 8 ++++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
> index fd49b24..fb3f363 100644
> --- a/drivers/dax/pmem.c
> +++ b/drivers/dax/pmem.c
> @@ -111,6 +111,7 @@ static int dax_pmem_probe(struct device *dev)
> return rc;
>
> dax_pmem->pgmap.ref = &dax_pmem->ref;
> + dax_pmem->pgmap.type = MEMORY_DEVICE_DEV_DAX;
> addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
> if (IS_ERR(addr))
> return PTR_ERR(addr);
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index f91f9e7..cd07ca8 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -53,11 +53,19 @@ struct vmem_altmap {
> * wakeup event whenever a page is unpinned and becomes idle. This
> * wakeup is used to coordinate physical address space management (ex:
> * fs truncate/hole punch) vs pinned pages (ex: device dma).
> + *
> + * MEMORY_DEVICE_DEV_DAX:
> + * Device memory that support raw access to persistent memory. Without need
> + * of an intervening filesystem, it could be directed mapped via an mmap
> + * capable character device. Together with the type MEMORY_DEVICE_FS_DAX,
> + * we could distinguish the persistent memory pages from normal ZONE_DEVICE
> + * pages.
> */
> enum memory_type {
> MEMORY_DEVICE_PRIVATE = 1,
> MEMORY_DEVICE_PUBLIC,
> MEMORY_DEVICE_FS_DAX,
> + MEMORY_DEVICE_DEV_DAX,
> };
>
> /*
> --
> 2.7.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-08-09 09:03:48

by Jan Kara

Subject: Re: [PATCH V3 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio

On Thu 09-08-18 18:52:48, Zhang Yi wrote:
> For device specific memory space, when we move these area of pfn to
> memory zone, we will set the page reserved flag at that time, some of
> these reserved for device mmio, and some of these are not, such as
> NVDIMM pmem.
>
> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> backend, since these pages are reserved. the check of
> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> to indentify these pages are from NVDIMM pmem. and let kvm treat these
> as normal pages.
>
> Without this patch, Many operations will be missed due to this
> mistreatment to pmem pages. For example, a page may not have chance to
> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>
> V1:
> https://lkml.org/lkml/2018/7/4/91
>
> V2:
> https://lkml.org/lkml/2018/7/10/135
>
> V3:
> [PATCH V3 1/4] Needs Comments.
> [PATCH V3 2/4] Update the description of MEMORY_DEVICE_DEV_DAX: Jan
> [PATCH V3 3/4] Acked-by: Jan in V2

Hum, but it is not in the patch...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-08-09 09:15:07

by Pankaj Gupta

Subject: Re: [PATCH V3 1/4] kvm: remove redundant reserved page check


>
> PageReserved() is already checked inside kvm_is_reserved_pfn(),
> remove it from kvm_set_pfn_dirty().
>
> Signed-off-by: Zhang Yi <[email protected]>
> Signed-off-by: Zhang Yu <[email protected]>
> ---
> virt/kvm/kvm_main.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8b47507f..c44c406 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1690,12 +1690,8 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
>
> void kvm_set_pfn_dirty(kvm_pfn_t pfn)
> {
> - if (!kvm_is_reserved_pfn(pfn)) {
> - struct page *page = pfn_to_page(pfn);
> -
> - if (!PageReserved(page))
> - SetPageDirty(page);
> - }
> + if (!kvm_is_reserved_pfn(pfn))
> + SetPageDirty(pfn_to_page(pfn));
> }
> EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);

Acked-by: Pankaj Gupta <[email protected]>

>
> --
> 2.7.4
>
>

2018-08-09 09:24:31

by Pankaj Gupta

Subject: Re: [PATCH V3 3/4] mm: add a function to differentiate the pages is from DAX device memory


>
> DAX driver hotplug the device memory and move it to memory zone, these
> pages will be marked reserved flag, however, some other kernel componet
> will misconceive these pages are reserved mmio (ex: we map these dev_dax
> or fs_dax pages to kvm for DIMM/NVDIMM backend). Together with the type
> MEMORY_DEVICE_FS_DAX, we can use is_dax_page() to differentiate the pages
> is DAX device memory or not.
>
> Signed-off-by: Zhang Yi <[email protected]>
> Signed-off-by: Zhang Yu <[email protected]>
> ---
> include/linux/mm.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 68a5121..de5cbc3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -889,6 +889,13 @@ static inline bool is_device_public_page(const struct
> page *page)
> page->pgmap->type == MEMORY_DEVICE_PUBLIC;
> }
>
> +static inline bool is_dax_page(const struct page *page)
> +{
> + return is_zone_device_page(page) &&
> + (page->pgmap->type == MEMORY_DEVICE_FS_DAX ||
> + page->pgmap->type == MEMORY_DEVICE_DEV_DAX);
> +}

I think the question from Dan about a KVM VM with 'MEMORY_DEVICE_PUBLIC' still holds?
I am also interested to know if there is any use case.

Thanks,
Pankaj

> +
> #else /* CONFIG_DEV_PAGEMAP_OPS */
> static inline void dev_pagemap_get_ops(void)
> {
> @@ -912,6 +919,11 @@ static inline bool is_device_public_page(const struct
> page *page)
> {
> return false;
> }
> +
> +static inline bool is_dax_page(const struct page *page)
> +{
> + return false;
> +}
> #endif /* CONFIG_DEV_PAGEMAP_OPS */
>
> static inline void get_page(struct page *page)
> --
> 2.7.4
>
>

2018-08-10 11:54:06

by David Hildenbrand

Subject: Re: [PATCH V3 1/4] kvm: remove redundant reserved page check

On 09.08.2018 12:52, Zhang Yi wrote:
> PageReserved() is already checked inside kvm_is_reserved_pfn(),
> remove it from kvm_set_pfn_dirty().
>
> Signed-off-by: Zhang Yi <[email protected]>
> Signed-off-by: Zhang Yu <[email protected]>
> ---
> virt/kvm/kvm_main.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8b47507f..c44c406 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1690,12 +1690,8 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
>
> void kvm_set_pfn_dirty(kvm_pfn_t pfn)
> {
> - if (!kvm_is_reserved_pfn(pfn)) {
> - struct page *page = pfn_to_page(pfn);
> -
> - if (!PageReserved(page))
> - SetPageDirty(page);
> - }
> + if (!kvm_is_reserved_pfn(pfn))
> + SetPageDirty(pfn_to_page(pfn));
> }
> EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
>
>

Reviewed-by: David Hildenbrand <[email protected]>

--

Thanks,

David / dhildenb

2018-08-10 13:34:51

by David Hildenbrand

Subject: Re: [PATCH V3 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio

On 09.08.2018 12:52, Zhang Yi wrote:
> For device specific memory space, when we move these area of pfn to
> memory zone, we will set the page reserved flag at that time, some of
> these reserved for device mmio, and some of these are not, such as
> NVDIMM pmem.
>
> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> backend, since these pages are reserved. the check of
> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> to indentify these pages are from NVDIMM pmem. and let kvm treat these
> as normal pages.
>
> Without this patch, Many operations will be missed due to this
> mistreatment to pmem pages. For example, a page may not have chance to
> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>

I am right now looking into (and trying to better document) PG_reserved
- and having a hard time :) .

One of the main points about reserved pages is that the struct pages are
not to be touched. See [1] (I know that statement is fairly old, but it
resembles what PG_reserved is actually used for nowadays - with some
exceptions unfortunately.).

Struct pages that are part of user space tables and are PG_reserved can
indicate (as of now, according to my research)
- MMIO pages
- Selected MMAPed pages - e.g. vDSO
- Zero page
- PMEM pages as you correctly state

So I wonder, if it is really the right approach to silently go ahead and
treat reserved pages just like they would not be reserved. Maybe the
right approach would rather be to do something about pmem pages being
reserved. Yes, they are never to be given to the page allocator, but I
wonder if PG_reserved is strictly needed for that.

[1] https://lists.linuxcoding.com/kernel/2005-q3/msg10350.html

> V1:
> https://lkml.org/lkml/2018/7/4/91
>
> V2:
> https://lkml.org/lkml/2018/7/10/135
>
> V3:
> [PATCH V3 1/4] Needs Comments.
> [PATCH V3 2/4] Update the description of MEMORY_DEVICE_DEV_DAX: Jan
> [PATCH V3 3/4] Acked-by: Jan in V2
> [PATCH V3 4/4] Needs Comments.
>
> Zhang Yi (4):
> kvm: remove redundant reserved page check
> mm: introduce memory type MEMORY_DEVICE_DEV_DAX
> mm: add a function to differentiate the pages is from DAX device
> memory
> kvm: add a check if pfn is from NVDIMM pmem.
>
> drivers/dax/pmem.c | 1 +
> include/linux/memremap.h | 8 ++++++++
> include/linux/mm.h | 12 ++++++++++++
> virt/kvm/kvm_main.c | 16 ++++++++--------
> 4 files changed, 29 insertions(+), 8 deletions(-)
>


--

Thanks,

David / dhildenb

2018-08-13 09:43:29

by Zhang Yi

Subject: Re: [PATCH V3 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio



On 2018-08-10 21:27, David Hildenbrand wrote:
> On 09.08.2018 12:52, Zhang Yi wrote:
>> For device specific memory space, when we move these area of pfn to
>> memory zone, we will set the page reserved flag at that time, some of
>> these reserved for device mmio, and some of these are not, such as
>> NVDIMM pmem.
>>
>> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
>> backend, since these pages are reserved. the check of
>> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
>> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
>> to indentify these pages are from NVDIMM pmem. and let kvm treat these
>> as normal pages.
>>
>> Without this patch, Many operations will be missed due to this
>> mistreatment to pmem pages. For example, a page may not have chance to
>> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
>> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>>
> I am right now looking into (and trying to better document) PG_reserved
> - and having a hard time :) .
>
> One of the main points about reserved pages is that the struct pages are
> not to be touched. See [1] (I know that statement is fairly old, but it
> resembles what PG_reserved is actually used for nowadays - with some
> exceptions unfortunately.).
>
> Struct pages part of user space tables that are PG_reserved can indicate
> (as of now according to my research)
> - MMIO pages
> - Selected MMAPed pages - e.g. vDSO
> - Zero page
> - PMEM pages as you correctly state
>
> So I wonder, if it is really the right approach to silently go ahead and
> treat reserved pages just like they would not be reserved. Maybe the
> right approach would rather be to do something about pmem pages being
> reserved. Yes, they are never to be given to the page allocator, but I
> wonder if PG_reserved is strictly needed for that.
>
> [1] https://lists.linuxcoding.com/kernel/2005-q3/msg10350.html

Thanks, David, for listing the long history of the page reserved flag. For
now, I think we treat NVDIMM as a device rather than DRAM, and it has its
own device driver, which manages its own device memory. From this
perspective, it is reasonable to set these pages as zone device memory and
mark them with the reserved flag.
@Dan @Dave, what do you think about this?

>
>> V1:
>> https://lkml.org/lkml/2018/7/4/91
>>
>> V2:
>> https://lkml.org/lkml/2018/7/10/135
>>
>> V3:
>> [PATCH V3 1/4] Needs Comments.
>> [PATCH V3 2/4] Update the description of MEMORY_DEVICE_DEV_DAX: Jan
>> [PATCH V3 3/4] Acked-by: Jan in V2
>> [PATCH V3 4/4] Needs Comments.
>>
>> Zhang Yi (4):
>> kvm: remove redundant reserved page check
>> mm: introduce memory type MEMORY_DEVICE_DEV_DAX
>> mm: add a function to differentiate the pages is from DAX device
>> memory
>> kvm: add a check if pfn is from NVDIMM pmem.
>>
>> drivers/dax/pmem.c | 1 +
>> include/linux/memremap.h | 8 ++++++++
>> include/linux/mm.h | 12 ++++++++++++
>> virt/kvm/kvm_main.c | 16 ++++++++--------
>> 4 files changed, 29 insertions(+), 8 deletions(-)
>>
>


2018-08-13 09:50:58

by Zhang Yi

Subject: Re: [PATCH V3 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio



On 2018-08-09 17:02, Jan Kara wrote:
> On Thu 09-08-18 18:52:48, Zhang Yi wrote:
>> For device specific memory space, when we move these area of pfn to
>> memory zone, we will set the page reserved flag at that time, some of
>> these reserved for device mmio, and some of these are not, such as
>> NVDIMM pmem.
>>
>> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
>> backend, since these pages are reserved. the check of
>> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
>> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
>> to indentify these pages are from NVDIMM pmem. and let kvm treat these
>> as normal pages.
>>
>> Without this patch, Many operations will be missed due to this
>> mistreatment to pmem pages. For example, a page may not have chance to
>> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
>> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>>
>> V1:
>> https://lkml.org/lkml/2018/7/4/91
>>
>> V2:
>> https://lkml.org/lkml/2018/7/10/135
>>
>> V3:
>> [PATCH V3 1/4] Needs Comments.
>> [PATCH V3 2/4] Update the description of MEMORY_DEVICE_DEV_DAX: Jan
>> [PATCH V3 3/4] Acked-by: Jan in V2
> Hum, but it is not the the patch...
>
> Honza
Sorry, I missed that. I will add it in the next version. Thanks for your review.


2018-08-13 09:54:29

by Zhang Yi

Subject: Re: [PATCH V3 4/4] kvm: add a check if pfn is from NVDIMM pmem.



On 2018-08-09 16:32, Pankaj Gupta wrote:
>> For device specific memory space, when we move these area of pfn to
>> memory zone, we will set the page reserved flag at that time, some of
>> these reserved for device mmio, and some of these are not, such as
>> NVDIMM pmem.
>>
>> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
>> backend, since these pages are reserved. the check of
>> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
>> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
>> to indentify these pages are from NVDIMM pmem. and let kvm treat these
> s/indentify/identify & remove '.'
Thanks Pankaj, :-)
>
>> as normal pages.
>>
>> Without this patch, Many operations will be missed due to this
>> mistreatment to pmem pages. For example, a page may not have chance to
>> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
>> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc
>>
>> Signed-off-by: Zhang Yi <[email protected]>
>> ---
>> virt/kvm/kvm_main.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index c44c406..969b6ca 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -147,8 +147,12 @@ __weak void
>> kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>>
>> bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
>> {
>> - if (pfn_valid(pfn))
>> - return PageReserved(pfn_to_page(pfn));
>> + struct page *page;
>> +
>> + if (pfn_valid(pfn)) {
>> + page = pfn_to_page(pfn);
>> + return PageReserved(page) && !is_dax_page(page);
>> + }
>>
>> return true;
>> }
>> --
>> 2.7.4
>>
>>


2018-08-13 10:32:46

by Zhang Yi

Subject: Re: [PATCH V3 3/4] mm: add a function to differentiate the pages is from DAX device memory



On 2018-08-09 17:23, Pankaj Gupta wrote:
>> DAX driver hotplug the device memory and move it to memory zone, these
>> pages will be marked reserved flag, however, some other kernel componet
>> will misconceive these pages are reserved mmio (ex: we map these dev_dax
>> or fs_dax pages to kvm for DIMM/NVDIMM backend). Together with the type
>> MEMORY_DEVICE_FS_DAX, we can use is_dax_page() to differentiate the pages
>> is DAX device memory or not.
>>
>> Signed-off-by: Zhang Yi <[email protected]>
>> Signed-off-by: Zhang Yu <[email protected]>
>> ---
>> include/linux/mm.h | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 68a5121..de5cbc3 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -889,6 +889,13 @@ static inline bool is_device_public_page(const struct
>> page *page)
>> page->pgmap->type == MEMORY_DEVICE_PUBLIC;
>> }
>>
>> +static inline bool is_dax_page(const struct page *page)
>> +{
>> + return is_zone_device_page(page) &&
>> + (page->pgmap->type == MEMORY_DEVICE_FS_DAX ||
>> + page->pgmap->type == MEMORY_DEVICE_DEV_DAX);
>> +}
> I think question from Dan for KVM VM with 'MEMORY_DEVICE_PUBLIC' still holds?
> I am also interested to know if there is any use-case.
>
> Thanks,
> Pankaj
Yes, it does. Thanks for the reminder, Pankaj.
Adding Jerome for Dan's question on V1:
[Dan]:

Jerome, might there be any use case to pass MEMORY_DEVICE_PUBLIC
memory to a guest vm?

>
>> +
>> #else /* CONFIG_DEV_PAGEMAP_OPS */
>> static inline void dev_pagemap_get_ops(void)
>> {
>> @@ -912,6 +919,11 @@ static inline bool is_device_public_page(const struct
>> page *page)
>> {
>> return false;
>> }
>> +
>> +static inline bool is_dax_page(const struct page *page)
>> +{
>> + return false;
>> +}
>> #endif /* CONFIG_DEV_PAGEMAP_OPS */
>>
>> static inline void get_page(struct page *page)
>> --
>> 2.7.4
>>
>>


2018-08-13 14:31:22

by Jerome Glisse

Subject: Re: [PATCH V3 3/4] mm: add a function to differentiate the pages is from DAX device memory

On Tue, Aug 14, 2018 at 01:41:40AM +0800, Zhang,Yi wrote:
>
>
> On 2018-08-09 17:23, Pankaj Gupta wrote:
> >> DAX driver hotplug the device memory and move it to memory zone, these
> >> pages will be marked reserved flag, however, some other kernel componet
> >> will misconceive these pages are reserved mmio (ex: we map these dev_dax
> >> or fs_dax pages to kvm for DIMM/NVDIMM backend). Together with the type
> >> MEMORY_DEVICE_FS_DAX, we can use is_dax_page() to differentiate the pages
> >> is DAX device memory or not.
> >>
> >> Signed-off-by: Zhang Yi <[email protected]>
> >> Signed-off-by: Zhang Yu <[email protected]>
> >> ---
> >> include/linux/mm.h | 12 ++++++++++++
> >> 1 file changed, 12 insertions(+)
> >>
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index 68a5121..de5cbc3 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -889,6 +889,13 @@ static inline bool is_device_public_page(const struct
> >> page *page)
> >> page->pgmap->type == MEMORY_DEVICE_PUBLIC;
> >> }
> >>
> >> +static inline bool is_dax_page(const struct page *page)
> >> +{
> >> + return is_zone_device_page(page) &&
> >> + (page->pgmap->type == MEMORY_DEVICE_FS_DAX ||
> >> + page->pgmap->type == MEMORY_DEVICE_DEV_DAX);
> >> +}
> > I think question from Dan for KVM VM with 'MEMORY_DEVICE_PUBLIC' still holds?
> > I am also interested to know if there is any use-case.
> >
> > Thanks,
> > Pankaj
> Yes, it is, thanks for your remind, Pankaj.
> Adding Jerome for Dan's questions on V1:
> [Dan]:
>
> Jerome, might there be any use case to pass MEMORY_DEVICE_PUBLIC
> memory to a guest vm?

Yes and no; I am not sure how we are going to do it. But being able to
share a GPU among multiple VMs is on the TODO list, and those GPUs will
have MEMORY_DEVICE_PUBLIC|PRIVATE depending on the platform. So either we
pass down the real underlying resource to the guest, or we will pass
down a fake one and have the guest and host drivers talk to each other so
that the host driver can do overall resource management across multiple
guests.

So I would say that for now you can ignore MEMORY_DEVICE_PUBLIC, and when
we get to the KVM guest sharing of those and decide how we want to do
it, then we can update KVM to properly interpret those.

Cheers,
Jérôme