2018-07-04 06:52:38

by Zhang Yi

[permalink] [raw]
Subject: [PATCH 0/3] Fix kvm misconceives NVDIMM pages as reserved mmio

For device specific memory space, when we move these area of pfn to
memory zone, we will set the page reserved flag at that time, some of
these reserved for device mmio, and some of these are not, such as
NVDIMM pmem.

Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
backend, since these pages are reserved. the check of
kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
to indentify these pages are from NVDIMM pmem. and let kvm treat these
as normal pages.

Without this patch, Many operations will be missed due to this
mistreatment to pmem pages. For example, a page may not have chance to
be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.

Zhang Yi (3):
kvm: remove redundant reserved page check
mm: introduce memory type MEMORY_DEVICE_DEV_DAX
kvm: add a function to check if page is from NVDIMM pmem.

drivers/dax/pmem.c | 1 +
include/linux/memremap.h | 1 +
virt/kvm/kvm_main.c | 25 +++++++++++++++++--------
3 files changed, 19 insertions(+), 8 deletions(-)

--
2.7.4



2018-07-04 06:53:13

by Zhang Yi

[permalink] [raw]
Subject: [PATCH 2/3] mm: introduce memory type MEMORY_DEVICE_DEV_DAX

Currently, NVDIMM pages will be marked 'PageReserved'. However, unlike
other reserved PFNs, pages on NVDIMM shall still behave like normal ones
in many cases, i.e. when used as backend memory of KVM guest. This patch
introduces a new memory type, MEMORY_DEVICE_DEV_DAX. Together with the
existing type MEMORY_DEVICE_FS_DAX, we can differentiate the pages on
NVDIMM with the normal reserved pages.

Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yu <[email protected]>
---
drivers/dax/pmem.c | 1 +
include/linux/memremap.h | 1 +
2 files changed, 2 insertions(+)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index fd49b24..fb3f363 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -111,6 +111,7 @@ static int dax_pmem_probe(struct device *dev)
return rc;

dax_pmem->pgmap.ref = &dax_pmem->ref;
+ dax_pmem->pgmap.type = MEMORY_DEVICE_DEV_DAX;
addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
if (IS_ERR(addr))
return PTR_ERR(addr);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 5ebfff6..4127bf7 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -58,6 +58,7 @@ enum memory_type {
MEMORY_DEVICE_PRIVATE = 1,
MEMORY_DEVICE_PUBLIC,
MEMORY_DEVICE_FS_DAX,
+ MEMORY_DEVICE_DEV_DAX,
};

/*
--
2.7.4


2018-07-04 06:53:19

by Zhang Yi

[permalink] [raw]
Subject: [PATCH 3/3] kvm: add a function to check if page is from NVDIMM pmem.

For device specific memory space, when we move these area of pfn to
memory zone, we will set the page reserved flag at that time, some of
these reserved for device mmio, and some of these are not, such as
NVDIMM pmem.

Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
backend, since these pages are reserved. the check of
kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
to indentify these pages are from NVDIMM pmem. and let kvm treat these
as normal pages.

Without this patch, Many operations will be missed due to this
mistreatment to pmem pages. For example, a page may not have chance to
be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.

Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yu <[email protected]>
---
virt/kvm/kvm_main.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index afb2e6e..1365d18 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -140,10 +140,23 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
{
}

+static bool kvm_is_nd_pfn(kvm_pfn_t pfn)
+{
+ struct page *page = pfn_to_page(pfn);
+
+ return is_zone_device_page(page) &&
+ ((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
+ (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));
+}
+
bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
{
- if (pfn_valid(pfn))
- return PageReserved(pfn_to_page(pfn));
+ struct page *page;
+
+ if (pfn_valid(pfn)) {
+ page = pfn_to_page(pfn);
+ return kvm_is_nd_pfn(pfn) ? false : PageReserved(page);
+ }

return true;
}
--
2.7.4


2018-07-04 06:54:30

by Zhang Yi

[permalink] [raw]
Subject: [PATCH 1/3] kvm: remove redundant reserved page check

PageReserved() is already checked inside kvm_is_reserved_pfn(),
remove it from kvm_set_pfn_dirty().

Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yu <[email protected]>
---
virt/kvm/kvm_main.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c7b2e92..afb2e6e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1673,12 +1673,8 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);

void kvm_set_pfn_dirty(kvm_pfn_t pfn)
{
- if (!kvm_is_reserved_pfn(pfn)) {
- struct page *page = pfn_to_page(pfn);
-
- if (!PageReserved(page))
- SetPageDirty(page);
- }
+ if (!kvm_is_reserved_pfn(pfn))
+ SetPageDirty(pfn_to_page(pfn));
}
EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);

--
2.7.4


2018-07-04 14:52:13

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: introduce memory type MEMORY_DEVICE_DEV_DAX

On Wed, Jul 4, 2018 at 8:30 AM, Zhang Yi <[email protected]> wrote:
> Currently, NVDIMM pages will be marked 'PageReserved'. However, unlike
> other reserved PFNs, pages on NVDIMM shall still behave like normal ones
> in many cases, i.e. when used as backend memory of KVM guest. This patch
> introduces a new memory type, MEMORY_DEVICE_DEV_DAX. Together with the
> existing type MEMORY_DEVICE_FS_DAX, we can differentiate the pages on
> NVDIMM with the normal reserved pages.
>
> Signed-off-by: Zhang Yi <[email protected]>
> Signed-off-by: Zhang Yu <[email protected]>
> ---
> drivers/dax/pmem.c | 1 +
> include/linux/memremap.h | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
> index fd49b24..fb3f363 100644
> --- a/drivers/dax/pmem.c
> +++ b/drivers/dax/pmem.c
> @@ -111,6 +111,7 @@ static int dax_pmem_probe(struct device *dev)
> return rc;
>
> dax_pmem->pgmap.ref = &dax_pmem->ref;
> + dax_pmem->pgmap.type = MEMORY_DEVICE_DEV_DAX;
> addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
> if (IS_ERR(addr))
> return PTR_ERR(addr);
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 5ebfff6..4127bf7 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -58,6 +58,7 @@ enum memory_type {
> MEMORY_DEVICE_PRIVATE = 1,
> MEMORY_DEVICE_PUBLIC,
> MEMORY_DEVICE_FS_DAX,
> + MEMORY_DEVICE_DEV_DAX,

Please add documentation for this new type to the comment block about
this definition.

2018-07-04 14:52:50

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 3/3] kvm: add a function to check if page is from NVDIMM pmem.

[ adding Jerome ]

On Wed, Jul 4, 2018 at 8:30 AM, Zhang Yi <[email protected]> wrote:
> For device specific memory space, when we move these area of pfn to
> memory zone, we will set the page reserved flag at that time, some of
> these reserved for device mmio, and some of these are not, such as
> NVDIMM pmem.
>
> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> backend, since these pages are reserved. the check of
> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> to indentify these pages are from NVDIMM pmem. and let kvm treat these
> as normal pages.
>
> Without this patch, Many operations will be missed due to this
> mistreatment to pmem pages. For example, a page may not have chance to
> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>
> Signed-off-by: Zhang Yi <[email protected]>
> Signed-off-by: Zhang Yu <[email protected]>
> ---
> virt/kvm/kvm_main.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index afb2e6e..1365d18 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -140,10 +140,23 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> {
> }
>
> +static bool kvm_is_nd_pfn(kvm_pfn_t pfn)
> +{
> + struct page *page = pfn_to_page(pfn);
> +
> + return is_zone_device_page(page) &&
> + ((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
> + (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));

Jerome, might there be any use case to pass MEMORY_DEVICE_PUBLIC
memory to a guest vm?

> +}
> +
> bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
> {
> - if (pfn_valid(pfn))
> - return PageReserved(pfn_to_page(pfn));
> + struct page *page;
> +
> + if (pfn_valid(pfn)) {
> + page = pfn_to_page(pfn);
> + return kvm_is_nd_pfn(pfn) ? false : PageReserved(page);
> + }
>
> return true;
> }
> --
> 2.7.4
>

2018-07-04 15:27:05

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/3] kvm: add a function to check if page is from NVDIMM pmem.

On 04/07/2018 17:30, Zhang Yi wrote:
> For device specific memory space, when we move these area of pfn to
> memory zone, we will set the page reserved flag at that time, some of
> these reserved for device mmio, and some of these are not, such as
> NVDIMM pmem.
>
> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> backend, since these pages are reserved. the check of
> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> to indentify these pages are from NVDIMM pmem. and let kvm treat these
> as normal pages.
>
> Without this patch, Many operations will be missed due to this
> mistreatment to pmem pages. For example, a page may not have chance to
> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>
> Signed-off-by: Zhang Yi <[email protected]>
> Signed-off-by: Zhang Yu <[email protected]>
> ---
> virt/kvm/kvm_main.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index afb2e6e..1365d18 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -140,10 +140,23 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> {
> }
>
> +static bool kvm_is_nd_pfn(kvm_pfn_t pfn)
> +{
> + struct page *page = pfn_to_page(pfn);
> +
> + return is_zone_device_page(page) &&
> + ((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
> + (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));
> +}

If the mm people agree, I'd prefer something that takes a struct page *
and is exported by include/linux/mm.h. Then KVM can just do something like

struct page *page;
if (!pfn_valid(pfn))
return true;

page = pfn_to_page(pfn);
return PageReserved(page) && !is_dax_page(page);

Thanks,

Paolo

2018-07-04 15:28:27

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/3] kvm: add a function to check if page is from NVDIMM pmem.

On 04/07/2018 16:50, Dan Williams wrote:
>> + return is_zone_device_page(page) &&
>> + ((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
>> + (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));
> Jerome, might there be any use case to pass MEMORY_DEVICE_PUBLIC
> memory to a guest vm?
>

An even better reason to place this in mm.h. :) There should be an
function to tell you if a reserved page has accessed/dirty bits etc.,
that's all that KVM needs to know.

Paolo

2018-07-05 05:34:24

by Zhang Yi

[permalink] [raw]
Subject: Re: [PATCH 3/3] kvm: add a function to check if page is from NVDIMM pmem.



On 2018年07月04日 23:25, Paolo Bonzini wrote:
> On 04/07/2018 17:30, Zhang Yi wrote:
>> For device specific memory space, when we move these area of pfn to
>> memory zone, we will set the page reserved flag at that time, some of
>> these reserved for device mmio, and some of these are not, such as
>> NVDIMM pmem.
>>
>> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
>> backend, since these pages are reserved. the check of
>> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
>> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
>> to indentify these pages are from NVDIMM pmem. and let kvm treat these
>> as normal pages.
>>
>> Without this patch, Many operations will be missed due to this
>> mistreatment to pmem pages. For example, a page may not have chance to
>> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
>> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>>
>> Signed-off-by: Zhang Yi <[email protected]>
>> Signed-off-by: Zhang Yu <[email protected]>
>> ---
>> virt/kvm/kvm_main.c | 17 +++++++++++++++--
>> 1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index afb2e6e..1365d18 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -140,10 +140,23 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>> {
>> }
>>
>> +static bool kvm_is_nd_pfn(kvm_pfn_t pfn)
>> +{
>> + struct page *page = pfn_to_page(pfn);
>> +
>> + return is_zone_device_page(page) &&
>> + ((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
>> + (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));
>> +}
> If the mm people agree, I'd prefer something that takes a struct page *
> and is exported by include/linux/mm.h. Then KVM can just do something like
>
> struct page *page;
> if (!pfn_valid(pfn))
> return true;
>
> page = pfn_to_page(pfn);
> return PageReserved(page) && !is_dax_page(page);
>
> Thanks,
>
> Paolo
Yeah, that could be much better. Thanks for your comments Paolo.

Hi Kara, Do u have Any opinions/ideas to add such definition in mm?

Regards,
Yi

2018-07-05 05:36:20

by zhangyi6

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: introduce memory type MEMORY_DEVICE_DEV_DAX



On 2018年07月04日 22:50, Dan Williams wrote:
> On Wed, Jul 4, 2018 at 8:30 AM, Zhang Yi <[email protected]> wrote:
>> Currently, NVDIMM pages will be marked 'PageReserved'. However, unlike
>> other reserved PFNs, pages on NVDIMM shall still behave like normal ones
>> in many cases, i.e. when used as backend memory of KVM guest. This patch
>> introduces a new memory type, MEMORY_DEVICE_DEV_DAX. Together with the
>> existing type MEMORY_DEVICE_FS_DAX, we can differentiate the pages on
>> NVDIMM with the normal reserved pages.
>>
>> Signed-off-by: Zhang Yi <[email protected]>
>> Signed-off-by: Zhang Yu <[email protected]>
>> ---
>> drivers/dax/pmem.c | 1 +
>> include/linux/memremap.h | 1 +
>> 2 files changed, 2 insertions(+)
>>
>> diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
>> index fd49b24..fb3f363 100644
>> --- a/drivers/dax/pmem.c
>> +++ b/drivers/dax/pmem.c
>> @@ -111,6 +111,7 @@ static int dax_pmem_probe(struct device *dev)
>> return rc;
>>
>> dax_pmem->pgmap.ref = &dax_pmem->ref;
>> + dax_pmem->pgmap.type = MEMORY_DEVICE_DEV_DAX;
>> addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
>> if (IS_ERR(addr))
>> return PTR_ERR(addr);
>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>> index 5ebfff6..4127bf7 100644
>> --- a/include/linux/memremap.h
>> +++ b/include/linux/memremap.h
>> @@ -58,6 +58,7 @@ enum memory_type {
>> MEMORY_DEVICE_PRIVATE = 1,
>> MEMORY_DEVICE_PUBLIC,
>> MEMORY_DEVICE_FS_DAX,
>> + MEMORY_DEVICE_DEV_DAX,
> Please add documentation for this new type to the comment block about
> this definition.
Thanks for your comments Dan, Will add it in next version,
Regards
Yi.

2018-07-09 12:38:01

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 3/3] kvm: add a function to check if page is from NVDIMM pmem.

On Thu 05-07-18 21:19:30, Zhang,Yi wrote:
>
>
> On 2018年07月04日 23:25, Paolo Bonzini wrote:
> > On 04/07/2018 17:30, Zhang Yi wrote:
> >> For device specific memory space, when we move these area of pfn to
> >> memory zone, we will set the page reserved flag at that time, some of
> >> these reserved for device mmio, and some of these are not, such as
> >> NVDIMM pmem.
> >>
> >> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> >> backend, since these pages are reserved. the check of
> >> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> >> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> >> to indentify these pages are from NVDIMM pmem. and let kvm treat these
> >> as normal pages.
> >>
> >> Without this patch, Many operations will be missed due to this
> >> mistreatment to pmem pages. For example, a page may not have chance to
> >> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
> >> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
> >>
> >> Signed-off-by: Zhang Yi <[email protected]>
> >> Signed-off-by: Zhang Yu <[email protected]>
> >> ---
> >> virt/kvm/kvm_main.c | 17 +++++++++++++++--
> >> 1 file changed, 15 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index afb2e6e..1365d18 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -140,10 +140,23 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> >> {
> >> }
> >>
> >> +static bool kvm_is_nd_pfn(kvm_pfn_t pfn)
> >> +{
> >> + struct page *page = pfn_to_page(pfn);
> >> +
> >> + return is_zone_device_page(page) &&
> >> + ((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
> >> + (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));
> >> +}
> > If the mm people agree, I'd prefer something that takes a struct page *
> > and is exported by include/linux/mm.h. Then KVM can just do something like
> >
> > struct page *page;
> > if (!pfn_valid(pfn))
> > return true;
> >
> > page = pfn_to_page(pfn);
> > return PageReserved(page) && !is_dax_page(page);
> >
> > Thanks,
> >
> > Paolo
> Yeah, that could be much better. Thanks for your comments Paolo.
>
> Hi Kara, Do u have Any opinions/ideas to add such definition in mm?

What Paolo suggests sounds good to me.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR