LinuxLists.cc - [PATCH] vfio/type1: avoid redundant PageReserved checking

2019-08-27 12:51:23

Subject: [PATCH] vfio/type1: avoid redundant PageReserved checking

currently, if the page is not a tail of compound page, it will be
checked twice for the same thing.

Signed-off-by: Ben Luo <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 054391f..cbe0d88 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
static bool is_invalid_reserved_pfn(unsigned long pfn)
{
if (pfn_valid(pfn)) {
- bool reserved;
struct page *tail = pfn_to_page(pfn);
struct page *head = compound_head(tail);
- reserved = !!(PageReserved(head));
if (head != tail) {
+ bool reserved = !!(PageReserved(head));
/*
* "head" is not a dangling pointer
* (compound_head takes care of that)
@@ -310,7 +309,7 @@ static bool is_invalid_reserved_pfn(unsigned long pfn)
if (PageTail(tail))
return reserved;
}
- return PageReserved(tail);
+ return !!(PageReserved(tail));
}

return true;
--
1.8.3.1

2019-08-27 18:43:25

by Alex Williamson

[permalink] [raw]

Subject: Re: [PATCH] vfio/type1: avoid redundant PageReserved checking

On Tue, 27 Aug 2019 20:49:48 +0800
Ben Luo <[email protected]> wrote:

> currently, if the page is not a tail of compound page, it will be
> checked twice for the same thing.
>
> Signed-off-by: Ben Luo <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 054391f..cbe0d88 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> static bool is_invalid_reserved_pfn(unsigned long pfn)
> {
> if (pfn_valid(pfn)) {
> - bool reserved;
> struct page *tail = pfn_to_page(pfn);
> struct page *head = compound_head(tail);
> - reserved = !!(PageReserved(head));
> if (head != tail) {
> + bool reserved = !!(PageReserved(head));
> /*
> * "head" is not a dangling pointer
> * (compound_head takes care of that)
> @@ -310,7 +309,7 @@ static bool is_invalid_reserved_pfn(unsigned long pfn)
> if (PageTail(tail))
> return reserved;
> }
> - return PageReserved(tail);
> + return !!(PageReserved(tail));
> }
>
> return true;

Logic seems fine to me, though I'd actually prefer to get rid of the !!
in the first use than duplicate it at the second use. Thanks,

Alex

2019-08-28 04:29:22

by Ben Luo

[permalink] [raw]

Subject: [PATCH v2] vfio/type1: avoid redundant PageReserved checking

currently, if the page is not a tail of compound page, it will be
checked twice for the same thing.

Signed-off-by: Ben Luo <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 054391f..d0f7346 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
static bool is_invalid_reserved_pfn(unsigned long pfn)
{
if (pfn_valid(pfn)) {
- bool reserved;
struct page *tail = pfn_to_page(pfn);
struct page *head = compound_head(tail);
- reserved = !!(PageReserved(head));
if (head != tail) {
+ bool reserved = PageReserved(head);
/*
* "head" is not a dangling pointer
* (compound_head takes care of that)
--
1.8.3.1

2019-08-28 15:56:13

by Alex Williamson

[permalink] [raw]

Subject: Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking

On Wed, 28 Aug 2019 12:28:04 +0800
Ben Luo <[email protected]> wrote:

> currently, if the page is not a tail of compound page, it will be
> checked twice for the same thing.
>
> Signed-off-by: Ben Luo <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 054391f..d0f7346 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> static bool is_invalid_reserved_pfn(unsigned long pfn)
> {
> if (pfn_valid(pfn)) {
> - bool reserved;
> struct page *tail = pfn_to_page(pfn);
> struct page *head = compound_head(tail);
> - reserved = !!(PageReserved(head));
> if (head != tail) {
> + bool reserved = PageReserved(head);
> /*
> * "head" is not a dangling pointer
> * (compound_head takes care of that)

Thinking more about this, the code here was originally just a copy of
kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
Should we instead do the same thing here? Thanks,

Alex

commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
Author: Andrea Arcangeli <[email protected]>
Date: Thu Jul 25 03:04:38 2013 +0200

kvm: optimize away THP checks in kvm_is_mmio_pfn()

The checks on PG_reserved in the page structure on head and tail pages
aren't necessary because split_huge_page wouldn't transfer the
PG_reserved bit from head to tail anyway.

This was a forward-thinking check done in the case PageReserved was
set by a driver-owned page mapped in userland with something like
remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
possible right now). It was meant to be very safe, but it's overkill
as it's unlikely split_huge_page could ever run without the driver
noticing and tearing down the hugepage itself.

And if a driver in the future will really want to map a reserved
hugepage in userland using an huge pmd it should simply take care of
marking all subpages reserved too to keep KVM safe. This of course
would require such a hypothetical driver to tear down the huge pmd
itself and splitting the hugepage itself, instead of relaying on
split_huge_page, but that sounds very reasonable, especially
considering split_huge_page wouldn't currently transfer the reserved
bit anyway.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d2836788561e..0fc25aed79a8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,28 +102,8 @@ static bool largepages_enabled = true;

bool kvm_is_mmio_pfn(pfn_t pfn)
{
- if (pfn_valid(pfn)) {
- int reserved;
- struct page *tail = pfn_to_page(pfn);
- struct page *head = compound_trans_head(tail);
- reserved = PageReserved(head);
- if (head != tail) {
- /*
- * "head" is not a dangling pointer
- * (compound_trans_head takes care of that)
- * but the hugepage may have been splitted
- * from under us (and we may not hold a
- * reference count on the head page so it can
- * be reused before we run PageReferenced), so
- * we've to check PageTail before returning
- * what we just read.
- */
- smp_rmb();
- if (PageTail(tail))
- return reserved;
- }
- return PageReserved(tail);
- }
+ if (pfn_valid(pfn))
+ return PageReserved(pfn_to_page(pfn));

return true;
}

2019-08-29 17:01:06

by Ben Luo

[permalink] [raw]

Subject: Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking

?? 2019/8/28 ????11:55, Alex Williamson д??:
> On Wed, 28 Aug 2019 12:28:04 +0800
> Ben Luo <[email protected]> wrote:
>
>> currently, if the page is not a tail of compound page, it will be
>> checked twice for the same thing.
>>
>> Signed-off-by: Ben Luo <[email protected]>
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 054391f..d0f7346 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>> static bool is_invalid_reserved_pfn(unsigned long pfn)
>> {
>> if (pfn_valid(pfn)) {
>> - bool reserved;
>> struct page *tail = pfn_to_page(pfn);
>> struct page *head = compound_head(tail);
>> - reserved = !!(PageReserved(head));
>> if (head != tail) {
>> + bool reserved = PageReserved(head);
>> /*
>> * "head" is not a dangling pointer
>> * (compound_head takes care of that)
> Thinking more about this, the code here was originally just a copy of
> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> Should we instead do the same thing here? Thanks,
>
> Alex
ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
a look at that and compose a new patch
>
> commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
> Author: Andrea Arcangeli <[email protected]>
> Date: Thu Jul 25 03:04:38 2013 +0200
>
> kvm: optimize away THP checks in kvm_is_mmio_pfn()
>
> The checks on PG_reserved in the page structure on head and tail pages
> aren't necessary because split_huge_page wouldn't transfer the
> PG_reserved bit from head to tail anyway.
>
> This was a forward-thinking check done in the case PageReserved was
> set by a driver-owned page mapped in userland with something like
> remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
> possible right now). It was meant to be very safe, but it's overkill
> as it's unlikely split_huge_page could ever run without the driver
> noticing and tearing down the hugepage itself.
>
> And if a driver in the future will really want to map a reserved
> hugepage in userland using an huge pmd it should simply take care of
> marking all subpages reserved too to keep KVM safe. This of course
> would require such a hypothetical driver to tear down the huge pmd
> itself and splitting the hugepage itself, instead of relaying on
> split_huge_page, but that sounds very reasonable, especially
> considering split_huge_page wouldn't currently transfer the reserved
> bit anyway.
>
> Signed-off-by: Andrea Arcangeli <[email protected]>
> Signed-off-by: Gleb Natapov <[email protected]>
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d2836788561e..0fc25aed79a8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -102,28 +102,8 @@ static bool largepages_enabled = true;
>
> bool kvm_is_mmio_pfn(pfn_t pfn)
> {
> - if (pfn_valid(pfn)) {
> - int reserved;
> - struct page *tail = pfn_to_page(pfn);
> - struct page *head = compound_trans_head(tail);
> - reserved = PageReserved(head);
> - if (head != tail) {
> - /*
> - * "head" is not a dangling pointer
> - * (compound_trans_head takes care of that)
> - * but the hugepage may have been splitted
> - * from under us (and we may not hold a
> - * reference count on the head page so it can
> - * be reused before we run PageReferenced), so
> - * we've to check PageTail before returning
> - * what we just read.
> - */
> - smp_rmb();
> - if (PageTail(tail))
> - return reserved;
> - }
> - return PageReserved(tail);
> - }
> + if (pfn_valid(pfn))
> + return PageReserved(pfn_to_page(pfn));
>
> return true;
> }

2019-08-29 17:08:51

by Alex Williamson

[permalink] [raw]

Subject: Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking

On Fri, 30 Aug 2019 00:58:22 +0800
Ben Luo <[email protected]> wrote:

> 在 2019/8/28 下午11:55, Alex Williamson 写道:
> > On Wed, 28 Aug 2019 12:28:04 +0800
> > Ben Luo <[email protected]> wrote:
> >
> >> currently, if the page is not a tail of compound page, it will be
> >> checked twice for the same thing.
> >>
> >> Signed-off-by: Ben Luo <[email protected]>
> >> ---
> >> drivers/vfio/vfio_iommu_type1.c | 3 +--
> >> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 054391f..d0f7346 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >> static bool is_invalid_reserved_pfn(unsigned long pfn)
> >> {
> >> if (pfn_valid(pfn)) {
> >> - bool reserved;
> >> struct page *tail = pfn_to_page(pfn);
> >> struct page *head = compound_head(tail);
> >> - reserved = !!(PageReserved(head));
> >> if (head != tail) {
> >> + bool reserved = PageReserved(head);
> >> /*
> >> * "head" is not a dangling pointer
> >> * (compound_head takes care of that)
> > Thinking more about this, the code here was originally just a copy of
> > kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> > Should we instead do the same thing here? Thanks,
> >
> > Alex
> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
> a look at that and compose a new patch

I'm not sure if the further updates are quite as relevant for vfio, but
appreciate your review of them. Thanks,

Alex

> >
> > commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
> > Author: Andrea Arcangeli <[email protected]>
> > Date: Thu Jul 25 03:04:38 2013 +0200
> >
> > kvm: optimize away THP checks in kvm_is_mmio_pfn()
> >
> > The checks on PG_reserved in the page structure on head and tail pages
> > aren't necessary because split_huge_page wouldn't transfer the
> > PG_reserved bit from head to tail anyway.
> >
> > This was a forward-thinking check done in the case PageReserved was
> > set by a driver-owned page mapped in userland with something like
> > remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
> > possible right now). It was meant to be very safe, but it's overkill
> > as it's unlikely split_huge_page could ever run without the driver
> > noticing and tearing down the hugepage itself.
> >
> > And if a driver in the future will really want to map a reserved
> > hugepage in userland using an huge pmd it should simply take care of
> > marking all subpages reserved too to keep KVM safe. This of course
> > would require such a hypothetical driver to tear down the huge pmd
> > itself and splitting the hugepage itself, instead of relaying on
> > split_huge_page, but that sounds very reasonable, especially
> > considering split_huge_page wouldn't currently transfer the reserved
> > bit anyway.
> >
> > Signed-off-by: Andrea Arcangeli <[email protected]>
> > Signed-off-by: Gleb Natapov <[email protected]>
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index d2836788561e..0fc25aed79a8 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -102,28 +102,8 @@ static bool largepages_enabled = true;
> >
> > bool kvm_is_mmio_pfn(pfn_t pfn)
> > {
> > - if (pfn_valid(pfn)) {
> > - int reserved;
> > - struct page *tail = pfn_to_page(pfn);
> > - struct page *head = compound_trans_head(tail);
> > - reserved = PageReserved(head);
> > - if (head != tail) {
> > - /*
> > - * "head" is not a dangling pointer
> > - * (compound_trans_head takes care of that)
> > - * but the hugepage may have been splitted
> > - * from under us (and we may not hold a
> > - * reference count on the head page so it can
> > - * be reused before we run PageReferenced), so
> > - * we've to check PageTail before returning
> > - * what we just read.
> > - */
> > - smp_rmb();
> > - if (PageTail(tail))
> > - return reserved;
> > - }
> > - return PageReserved(tail);
> > - }
> > + if (pfn_valid(pfn))
> > + return PageReserved(pfn_to_page(pfn));
> >
> > return true;
> > }

2019-09-02 08:10:51

by Ben Luo

[permalink] [raw]

Subject: Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking

在 2019/8/30 上午1:06, Alex Williamson 写道:
> On Fri, 30 Aug 2019 00:58:22 +0800
> Ben Luo <[email protected]> wrote:
>
>> 在 2019/8/28 下午11:55, Alex Williamson 写道:
>>> On Wed, 28 Aug 2019 12:28:04 +0800
>>> Ben Luo <[email protected]> wrote:
>>>
>>>> currently, if the page is not a tail of compound page, it will be
>>>> checked twice for the same thing.
>>>>
>>>> Signed-off-by: Ben Luo <[email protected]>
>>>> ---
>>>> drivers/vfio/vfio_iommu_type1.c | 3 +--
>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index 054391f..d0f7346 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>>> static bool is_invalid_reserved_pfn(unsigned long pfn)
>>>> {
>>>> if (pfn_valid(pfn)) {
>>>> - bool reserved;
>>>> struct page *tail = pfn_to_page(pfn);
>>>> struct page *head = compound_head(tail);
>>>> - reserved = !!(PageReserved(head));
>>>> if (head != tail) {
>>>> + bool reserved = PageReserved(head);
>>>> /*
>>>> * "head" is not a dangling pointer
>>>> * (compound_head takes care of that)
>>> Thinking more about this, the code here was originally just a copy of
>>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
>>> Should we instead do the same thing here? Thanks,
>>>
>>> Alex
>> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
>> a look at that and compose a new patch
> I'm not sure if the further updates are quite as relevant for vfio, but
> appreciate your review of them. Thanks,
>
> Alex
After studying the related code, my personal understandings are:

kvm_is_mmio_pfn() is used to find out whether a memory range is MMIO
mapped so
that to set the proper MTRR TYPE to spte.

is_invalid_reserved_pfn() is used in two scenarios:
1, to tell whether a page should be counted against user's mlock limits, as
the function's name implies, all 'invalid' PFNs who are not backed by struct
page and those reserved pages (including zero page and those from NVDIMM
DAX)
should be excluded.
2, to check if we have got a valid and pinned pfn for the vma with VM_PFNMAP
flag. So, for the zero page and 'RAM' backed PFNs without 'struct page',
kvm_is_mmio_pfn() should return false because they are not MMIO and are
cacheable, but is_invalid_reserved_pfn() should return true since they are
truely reserved or invalid and should not be counted against user's mlock
limits.

For fsdax-page, current get_user_pages() returns -EOPNOTSUPP, and VFIO also
returns this error code to user, seems not support fsdax for now, so
there is
no chance to call into is_invalid_reserved_pfn() currently, if fsdax is
to be
supported, not only this function needs to be updated, vaddr_get_pfn() also
needs some changes.

Now, with the assumption that PFNs of compound pages with reserved bit
set in
head will not be passed to is_invalid_reserved_pfn(), we can simplify this
function to:

static bool is_invalid_reserved_pfn(unsigned long pfn)
{
        if (pfn_valid(pfn))
                return PageReserved(pfn_to_page(pfn));

        return true;
}

But, I am not sure if the assumption above is true, if not, we still need to
check the reserved bit of head for a tail page as this PATCH v2 does.

Thanks,

    Ben

2019-09-13 19:30:26

by Alex Williamson

[permalink] [raw]

Subject: Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking

On Mon, 2 Sep 2019 15:32:42 +0800
Ben Luo <[email protected]> wrote:

> 在 2019/8/30 上午1:06, Alex Williamson 写道:
> > On Fri, 30 Aug 2019 00:58:22 +0800
> > Ben Luo <[email protected]> wrote:
> >
> >> 在 2019/8/28 下午11:55, Alex Williamson 写道:
> >>> On Wed, 28 Aug 2019 12:28:04 +0800
> >>> Ben Luo <[email protected]> wrote:
> >>>
> >>>> currently, if the page is not a tail of compound page, it will be
> >>>> checked twice for the same thing.
> >>>>
> >>>> Signed-off-by: Ben Luo <[email protected]>
> >>>> ---
> >>>> drivers/vfio/vfio_iommu_type1.c | 3 +--
> >>>> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>>> index 054391f..d0f7346 100644
> >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >>>> static bool is_invalid_reserved_pfn(unsigned long pfn)
> >>>> {
> >>>> if (pfn_valid(pfn)) {
> >>>> - bool reserved;
> >>>> struct page *tail = pfn_to_page(pfn);
> >>>> struct page *head = compound_head(tail);
> >>>> - reserved = !!(PageReserved(head));
> >>>> if (head != tail) {
> >>>> + bool reserved = PageReserved(head);
> >>>> /*
> >>>> * "head" is not a dangling pointer
> >>>> * (compound_head takes care of that)
> >>> Thinking more about this, the code here was originally just a copy of
> >>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> >>> Should we instead do the same thing here? Thanks,
> >>>
> >>> Alex
> >> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
> >> a look at that and compose a new patch
> > I'm not sure if the further updates are quite as relevant for vfio, but
> > appreciate your review of them. Thanks,
> >
> > Alex
>
> After studying the related code, my personal understandings are:
>
> kvm_is_mmio_pfn() is used to find out whether a memory range is MMIO
> mapped so that to set
> the proper MTRR TYPE to spte.
>
> is_invalid_reserved_pfn() is used in two scenarios:
> 1. to tell whether a page should be counted against user's mlock
> limits, as the function's name
> implies, all 'invalid' PFNs who are not backed by struct page and those
> reserved pages (including
> zero page and those from NVDIMM DAX) should be excluded.
> 2. to check if we have got a valid and pinned pfn for the vma
> with VM_PFNMAP flag.
>
> So, for the zero page and 'RAM' backed PFNs without 'struct page',
> kvm_is_mmio_pfn() should
> return false because they are not MMIO and are cacheable, but
> is_invalid_reserved_pfn() should
> return true since they are truely reserved or invalid and should not be
> counted against user's
> mlock limits.
>
> For fsdax-page, current get_user_pages() returns -EOPNOTSUPP, and VFIO
> also returns this
> error code to user, seems not support fsdax for now, so there is no
> chance to call into
> is_invalid_reserved_pfn() currently, if fsdax is to be supported, not
> only this function needs to be
> updated, vaddr_get_pfn() also needs some changes.
>
> Now, with the assumption that PFNs of compound pages with reserved bit
> set in head will not be
> passed to is_invalid_reserved_pfn(), we can simplify this function to:
>
> static bool is_invalid_reserved_pfn(unsigned long pfn)
> {
>         if (pfn_valid(pfn))
>                 return PageReserved(pfn_to_page(pfn));
>
>         return true;
> }
>
> But, I am not sure if the assumption above is true, if not, we still
> need to check the reserved bit of
> head for a tail page as this PATCH v2 does.

I believe what you've described is correct. Andrea, have we missed
anything? Thanks,

Alex

> >
> >>> commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
> >>> Author: Andrea Arcangeli <[email protected]>
> >>> Date: Thu Jul 25 03:04:38 2013 +0200
> >>>
> >>> kvm: optimize away THP checks in kvm_is_mmio_pfn()
> >>>
> >>> The checks on PG_reserved in the page structure on head and tail pages
> >>> aren't necessary because split_huge_page wouldn't transfer the
> >>> PG_reserved bit from head to tail anyway.
> >>>
> >>> This was a forward-thinking check done in the case PageReserved was
> >>> set by a driver-owned page mapped in userland with something like
> >>> remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
> >>> possible right now). It was meant to be very safe, but it's overkill
> >>> as it's unlikely split_huge_page could ever run without the driver
> >>> noticing and tearing down the hugepage itself.
> >>>
> >>> And if a driver in the future will really want to map a reserved
> >>> hugepage in userland using an huge pmd it should simply take care of
> >>> marking all subpages reserved too to keep KVM safe. This of course
> >>> would require such a hypothetical driver to tear down the huge pmd
> >>> itself and splitting the hugepage itself, instead of relaying on
> >>> split_huge_page, but that sounds very reasonable, especially
> >>> considering split_huge_page wouldn't currently transfer the reserved
> >>> bit anyway.
> >>>
> >>> Signed-off-by: Andrea Arcangeli <[email protected]>
> >>> Signed-off-by: Gleb Natapov <[email protected]>
> >>>
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index d2836788561e..0fc25aed79a8 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -102,28 +102,8 @@ static bool largepages_enabled = true;
> >>>
> >>> bool kvm_is_mmio_pfn(pfn_t pfn)
> >>> {
> >>> - if (pfn_valid(pfn)) {
> >>> - int reserved;
> >>> - struct page *tail = pfn_to_page(pfn);
> >>> - struct page *head = compound_trans_head(tail);
> >>> - reserved = PageReserved(head);
> >>> - if (head != tail) {
> >>> - /*
> >>> - * "head" is not a dangling pointer
> >>> - * (compound_trans_head takes care of that)
> >>> - * but the hugepage may have been splitted
> >>> - * from under us (and we may not hold a
> >>> - * reference count on the head page so it can
> >>> - * be reused before we run PageReferenced), so
> >>> - * we've to check PageTail before returning
> >>> - * what we just read.
> >>> - */
> >>> - smp_rmb();
> >>> - if (PageTail(tail))
> >>> - return reserved;
> >>> - }
> >>> - return PageReserved(tail);
> >>> - }
> >>> + if (pfn_valid(pfn))
> >>> + return PageReserved(pfn_to_page(pfn));
> >>>
> >>> return true;
> >>> }

2019-09-30 23:36:29

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking

Hello,

On Fri, Sep 13, 2019 at 12:05:26PM -0600, Alex Williamson wrote:
> On Mon, 2 Sep 2019 15:32:42 +0800
> Ben Luo <[email protected]> wrote:
>
> > 在 2019/8/30 上午1:06, Alex Williamson 写道:
> > > On Fri, 30 Aug 2019 00:58:22 +0800
> > > Ben Luo <[email protected]> wrote:
> > >
> > >> 在 2019/8/28 下午11:55, Alex Williamson 写道:
> > >>> On Wed, 28 Aug 2019 12:28:04 +0800
> > >>> Ben Luo <[email protected]> wrote:
> > >>>
> > >>>> currently, if the page is not a tail of compound page, it will be
> > >>>> checked twice for the same thing.
> > >>>>
> > >>>> Signed-off-by: Ben Luo <[email protected]>
> > >>>> ---
> > >>>> drivers/vfio/vfio_iommu_type1.c | 3 +--
> > >>>> 1 file changed, 1 insertion(+), 2 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > >>>> index 054391f..d0f7346 100644
> > >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> > >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> > >>>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> > >>>> static bool is_invalid_reserved_pfn(unsigned long pfn)
> > >>>> {
> > >>>> if (pfn_valid(pfn)) {
> > >>>> - bool reserved;
> > >>>> struct page *tail = pfn_to_page(pfn);
> > >>>> struct page *head = compound_head(tail);
> > >>>> - reserved = !!(PageReserved(head));
> > >>>> if (head != tail) {
> > >>>> + bool reserved = PageReserved(head);
> > >>>> /*
> > >>>> * "head" is not a dangling pointer
> > >>>> * (compound_head takes care of that)
> > >>> Thinking more about this, the code here was originally just a copy of
> > >>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> > >>> Should we instead do the same thing here? Thanks,
> > >>>
> > >>> Alex
> > >> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
> > >> a look at that and compose a new patch
> > > I'm not sure if the further updates are quite as relevant for vfio, but
> > > appreciate your review of them. Thanks,
> > >
> > > Alex
> >
> > After studying the related code, my personal understandings are:
> >
> > kvm_is_mmio_pfn() is used to find out whether a memory range is MMIO
> > mapped so that to set
> > the proper MTRR TYPE to spte.
> >
> > is_invalid_reserved_pfn() is used in two scenarios:
> > 1. to tell whether a page should be counted against user's mlock
> > limits, as the function's name
> > implies, all 'invalid' PFNs who are not backed by struct page and those
> > reserved pages (including
> > zero page and those from NVDIMM DAX) should be excluded.
> > 2. to check if we have got a valid and pinned pfn for the vma
> > with VM_PFNMAP flag.
> >
> > So, for the zero page and 'RAM' backed PFNs without 'struct page',
> > kvm_is_mmio_pfn() should
> > return false because they are not MMIO and are cacheable, but
> > is_invalid_reserved_pfn() should
> > return true since they are truely reserved or invalid and should not be
> > counted against user's
> > mlock limits.
> >
> > For fsdax-page, current get_user_pages() returns -EOPNOTSUPP, and VFIO
> > also returns this
> > error code to user, seems not support fsdax for now, so there is no
> > chance to call into
> > is_invalid_reserved_pfn() currently, if fsdax is to be supported, not
> > only this function needs to be
> > updated, vaddr_get_pfn() also needs some changes.
> >
> > Now, with the assumption that PFNs of compound pages with reserved bit
> > set in head will not be
> > passed to is_invalid_reserved_pfn(), we can simplify this function to:
> >
> > static bool is_invalid_reserved_pfn(unsigned long pfn)
> > {
> >         if (pfn_valid(pfn))
> >                 return PageReserved(pfn_to_page(pfn));
> >
> >         return true;
> > }
> >
> > But, I am not sure if the assumption above is true, if not, we still
> > need to check the reserved bit of
> > head for a tail page as this PATCH v2 does.
>
> I believe what you've described is correct. Andrea, have we missed
> anything? Thanks,

Yes it looks good. The only reason for ever wanting to check the head
page reserved bit (instead of only checking the tail page reserved
bit) would be if any code would transfer the reserved bit from head to
tail during a hugepage split, but no hugepage split code can transfer
the reserved bit from head to tail during the split, so checking the
head can't make a difference.

The buddy wouldn't allow the driver to allocate an hugepage if any
subpage is reserved in the e820 map at boot, so non-RAM pages with a
backing struct page aren't an issue here. This was only meaningful for
PFNMAP in case the PG_reserved bit was set by the driver on a hugepage
before mapping it in userland, in which case the driver needs to set
the reserved bit in all subpages to be safe (not only in the head).

Thanks,
Andrea