2019-11-13 04:32:13

by John Hubbard

[permalink] [raw]
Subject: [PATCH v4 08/23] vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM

As it says in the updated comment in gup.c: current FOLL_LONGTERM
behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
FS DAX check requirement on vmas.

However, the corresponding restriction in get_user_pages_remote() was
slightly stricter than is actually required: it forbade all
FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
that do not set the "locked" arg.

Update the code and comments accordingly, and update the VFIO caller
to take advantage of this, fixing a bug as a result: the VFIO caller
is logically a FOLL_LONGTERM user.

Also, remove an unnessary pair of calls that were releasing and
reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
just in order to call page_to_pfn().

Also, move the DAX check ("if a VMA is DAX, don't allow long term
pinning") from the VFIO call site, all the way into the internals
of get_user_pages_remote() and __gup_longterm_locked(). That is:
get_user_pages_remote() calls __gup_longterm_locked(), which in turn
calls check_dax_vmas(). It's lightly explained in the comments as well.

Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
and to Dan Williams for helping clarify the DAX refactoring.

Suggested-by: Jason Gunthorpe <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Ira Weiny <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 25 ++-----------------------
mm/gup.c | 27 ++++++++++++++++++++++-----
2 files changed, 24 insertions(+), 28 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d864277ea16f..7301b710c9a4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -340,7 +340,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
{
struct page *page[1];
struct vm_area_struct *vma;
- struct vm_area_struct *vmas[1];
unsigned int flags = 0;
int ret;

@@ -348,33 +347,13 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
flags |= FOLL_WRITE;

down_read(&mm->mmap_sem);
- if (mm == current->mm) {
- ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
- vmas);
- } else {
- ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
- vmas, NULL);
- /*
- * The lifetime of a vaddr_get_pfn() page pin is
- * userspace-controlled. In the fs-dax case this could
- * lead to indefinite stalls in filesystem operations.
- * Disallow attempts to pin fs-dax pages via this
- * interface.
- */
- if (ret > 0 && vma_is_fsdax(vmas[0])) {
- ret = -EOPNOTSUPP;
- put_page(page[0]);
- }
- }
- up_read(&mm->mmap_sem);
-
+ ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
+ page, NULL, NULL);
if (ret == 1) {
*pfn = page_to_pfn(page[0]);
return 0;
}

- down_read(&mm->mmap_sem);
-
vaddr = untagged_addr(vaddr);

vma = find_vma_intersection(mm, vaddr, vaddr + 1);
diff --git a/mm/gup.c b/mm/gup.c
index 933524de6249..83702b2e86c8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,6 +29,13 @@ struct follow_page_context {
unsigned int page_mask;
};

+static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
+ struct mm_struct *mm,
+ unsigned long start,
+ unsigned long nr_pages,
+ struct page **pages,
+ struct vm_area_struct **vmas,
+ unsigned int flags);
/*
* Return the compound head page with ref appropriately incremented,
* or NULL if that failed.
@@ -1167,13 +1174,23 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
struct vm_area_struct **vmas, int *locked)
{
/*
- * FIXME: Current FOLL_LONGTERM behavior is incompatible with
+ * Parts of FOLL_LONGTERM behavior are incompatible with
* FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
- * vmas. As there are no users of this flag in this call we simply
- * disallow this option for now.
+ * vmas. However, this only comes up if locked is set, and there are
+ * callers that do request FOLL_LONGTERM, but do not set locked. So,
+ * allow what we can.
*/
- if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
- return -EINVAL;
+ if (gup_flags & FOLL_LONGTERM) {
+ if (WARN_ON_ONCE(locked))
+ return -EINVAL;
+ /*
+ * This will check the vmas (even if our vmas arg is NULL)
+ * and return -ENOTSUPP if DAX isn't allowed in this case:
+ */
+ return __gup_longterm_locked(tsk, mm, start, nr_pages, pages,
+ vmas, gup_flags | FOLL_TOUCH |
+ FOLL_REMOTE);
+ }

return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
locked,
--
2.24.0


2019-11-13 13:05:55

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 08/23] vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM

On Tue, Nov 12, 2019 at 08:26:55PM -0800, John Hubbard wrote:
> As it says in the updated comment in gup.c: current FOLL_LONGTERM
> behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
> FS DAX check requirement on vmas.
>
> However, the corresponding restriction in get_user_pages_remote() was
> slightly stricter than is actually required: it forbade all
> FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
> that do not set the "locked" arg.
>
> Update the code and comments accordingly, and update the VFIO caller
> to take advantage of this, fixing a bug as a result: the VFIO caller
> is logically a FOLL_LONGTERM user.
>
> Also, remove an unnessary pair of calls that were releasing and
> reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
> just in order to call page_to_pfn().
>
> Also, move the DAX check ("if a VMA is DAX, don't allow long term
> pinning") from the VFIO call site, all the way into the internals
> of get_user_pages_remote() and __gup_longterm_locked(). That is:
> get_user_pages_remote() calls __gup_longterm_locked(), which in turn
> calls check_dax_vmas(). It's lightly explained in the comments as well.
>
> Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
> and to Dan Williams for helping clarify the DAX refactoring.
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Jerome Glisse <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Signed-off-by: John Hubbard <[email protected]>
> drivers/vfio/vfio_iommu_type1.c | 25 ++-----------------------
> mm/gup.c | 27 ++++++++++++++++++++++-----
> 2 files changed, 24 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d864277ea16f..7301b710c9a4 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -340,7 +340,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> {
> struct page *page[1];
> struct vm_area_struct *vma;
> - struct vm_area_struct *vmas[1];
> unsigned int flags = 0;
> int ret;
>
> @@ -348,33 +347,13 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> flags |= FOLL_WRITE;
>
> down_read(&mm->mmap_sem);
> - if (mm == current->mm) {
> - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> - vmas);
> - } else {
> - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> - vmas, NULL);
> - /*
> - * The lifetime of a vaddr_get_pfn() page pin is
> - * userspace-controlled. In the fs-dax case this could
> - * lead to indefinite stalls in filesystem operations.
> - * Disallow attempts to pin fs-dax pages via this
> - * interface.
> - */
> - if (ret > 0 && vma_is_fsdax(vmas[0])) {
> - ret = -EOPNOTSUPP;
> - put_page(page[0]);
> - }
> - }
> - up_read(&mm->mmap_sem);
> -
> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> + page, NULL, NULL);
> if (ret == 1) {
> *pfn = page_to_pfn(page[0]);
> return 0;

Mind the return with the lock held this needs some goto unwind

Jason

2019-11-13 19:23:53

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v4 08/23] vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM

On Tue, Nov 12, 2019 at 08:26:55PM -0800, John Hubbard wrote:
> As it says in the updated comment in gup.c: current FOLL_LONGTERM
> behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
> FS DAX check requirement on vmas.
>
> However, the corresponding restriction in get_user_pages_remote() was
> slightly stricter than is actually required: it forbade all
> FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
> that do not set the "locked" arg.
>
> Update the code and comments accordingly, and update the VFIO caller
> to take advantage of this, fixing a bug as a result: the VFIO caller
> is logically a FOLL_LONGTERM user.
>
> Also, remove an unnessary pair of calls that were releasing and
> reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
> just in order to call page_to_pfn().
>
> Also, move the DAX check ("if a VMA is DAX, don't allow long term
> pinning") from the VFIO call site, all the way into the internals
> of get_user_pages_remote() and __gup_longterm_locked(). That is:
> get_user_pages_remote() calls __gup_longterm_locked(), which in turn
> calls check_dax_vmas(). It's lightly explained in the comments as well.
>
> Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
> and to Dan Williams for helping clarify the DAX refactoring.
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Jerome Glisse <[email protected]>
> Cc: Ira Weiny <[email protected]>

Reviewed-by: Ira Weiny <[email protected]>

> Signed-off-by: John Hubbard <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 25 ++-----------------------
> mm/gup.c | 27 ++++++++++++++++++++++-----
> 2 files changed, 24 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d864277ea16f..7301b710c9a4 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -340,7 +340,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> {
> struct page *page[1];
> struct vm_area_struct *vma;
> - struct vm_area_struct *vmas[1];
> unsigned int flags = 0;
> int ret;
>
> @@ -348,33 +347,13 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> flags |= FOLL_WRITE;
>
> down_read(&mm->mmap_sem);
> - if (mm == current->mm) {
> - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> - vmas);
> - } else {
> - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> - vmas, NULL);
> - /*
> - * The lifetime of a vaddr_get_pfn() page pin is
> - * userspace-controlled. In the fs-dax case this could
> - * lead to indefinite stalls in filesystem operations.
> - * Disallow attempts to pin fs-dax pages via this
> - * interface.
> - */
> - if (ret > 0 && vma_is_fsdax(vmas[0])) {
> - ret = -EOPNOTSUPP;
> - put_page(page[0]);
> - }
> - }
> - up_read(&mm->mmap_sem);
> -
> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> + page, NULL, NULL);
> if (ret == 1) {
> *pfn = page_to_pfn(page[0]);
> return 0;
> }
>
> - down_read(&mm->mmap_sem);
> -
> vaddr = untagged_addr(vaddr);
>
> vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> diff --git a/mm/gup.c b/mm/gup.c
> index 933524de6249..83702b2e86c8 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,6 +29,13 @@ struct follow_page_context {
> unsigned int page_mask;
> };
>
> +static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
> + struct mm_struct *mm,
> + unsigned long start,
> + unsigned long nr_pages,
> + struct page **pages,
> + struct vm_area_struct **vmas,
> + unsigned int flags);
> /*
> * Return the compound head page with ref appropriately incremented,
> * or NULL if that failed.
> @@ -1167,13 +1174,23 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
> struct vm_area_struct **vmas, int *locked)
> {
> /*
> - * FIXME: Current FOLL_LONGTERM behavior is incompatible with
> + * Parts of FOLL_LONGTERM behavior are incompatible with
> * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
> - * vmas. As there are no users of this flag in this call we simply
> - * disallow this option for now.
> + * vmas. However, this only comes up if locked is set, and there are
> + * callers that do request FOLL_LONGTERM, but do not set locked. So,
> + * allow what we can.
> */
> - if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
> - return -EINVAL;
> + if (gup_flags & FOLL_LONGTERM) {
> + if (WARN_ON_ONCE(locked))
> + return -EINVAL;
> + /*
> + * This will check the vmas (even if our vmas arg is NULL)
> + * and return -ENOTSUPP if DAX isn't allowed in this case:
> + */
> + return __gup_longterm_locked(tsk, mm, start, nr_pages, pages,
> + vmas, gup_flags | FOLL_TOUCH |
> + FOLL_REMOTE);
> + }
>
> return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
> locked,
> --
> 2.24.0
>

2019-11-13 19:29:13

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v4 08/23] vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM

On Wed, Nov 13, 2019 at 09:02:02AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 12, 2019 at 08:26:55PM -0800, John Hubbard wrote:
> > As it says in the updated comment in gup.c: current FOLL_LONGTERM
> > behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
> > FS DAX check requirement on vmas.
> >
> > However, the corresponding restriction in get_user_pages_remote() was
> > slightly stricter than is actually required: it forbade all
> > FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
> > that do not set the "locked" arg.
> >
> > Update the code and comments accordingly, and update the VFIO caller
> > to take advantage of this, fixing a bug as a result: the VFIO caller
> > is logically a FOLL_LONGTERM user.
> >
> > Also, remove an unnessary pair of calls that were releasing and
> > reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
> > just in order to call page_to_pfn().
> >
> > Also, move the DAX check ("if a VMA is DAX, don't allow long term
> > pinning") from the VFIO call site, all the way into the internals
> > of get_user_pages_remote() and __gup_longterm_locked(). That is:
> > get_user_pages_remote() calls __gup_longterm_locked(), which in turn
> > calls check_dax_vmas(). It's lightly explained in the comments as well.
> >
> > Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
> > and to Dan Williams for helping clarify the DAX refactoring.
> >
> > Suggested-by: Jason Gunthorpe <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Jerome Glisse <[email protected]>
> > Cc: Ira Weiny <[email protected]>
> > Signed-off-by: John Hubbard <[email protected]>
> > drivers/vfio/vfio_iommu_type1.c | 25 ++-----------------------
> > mm/gup.c | 27 ++++++++++++++++++++++-----
> > 2 files changed, 24 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index d864277ea16f..7301b710c9a4 100644
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -340,7 +340,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> > {
> > struct page *page[1];
> > struct vm_area_struct *vma;
> > - struct vm_area_struct *vmas[1];
> > unsigned int flags = 0;
> > int ret;
> >
> > @@ -348,33 +347,13 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> > flags |= FOLL_WRITE;
> >
> > down_read(&mm->mmap_sem);
> > - if (mm == current->mm) {
> > - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> > - vmas);
> > - } else {
> > - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> > - vmas, NULL);
> > - /*
> > - * The lifetime of a vaddr_get_pfn() page pin is
> > - * userspace-controlled. In the fs-dax case this could
> > - * lead to indefinite stalls in filesystem operations.
> > - * Disallow attempts to pin fs-dax pages via this
> > - * interface.
> > - */
> > - if (ret > 0 && vma_is_fsdax(vmas[0])) {
> > - ret = -EOPNOTSUPP;
> > - put_page(page[0]);
> > - }
> > - }
> > - up_read(&mm->mmap_sem);
> > -
> > + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> > + page, NULL, NULL);
> > if (ret == 1) {
> > *pfn = page_to_pfn(page[0]);
> > return 0;
>
> Mind the return with the lock held this needs some goto unwind

Ah yea... retract my reviewed by... :-(

Ira

2019-11-13 20:23:58

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v4 08/23] vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM

On 11/13/19 11:17 AM, Ira Weiny wrote:
...
>>> @@ -348,33 +347,13 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>>> flags |= FOLL_WRITE;
>>>
>>> down_read(&mm->mmap_sem);
>>> - if (mm == current->mm) {
>>> - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
>>> - vmas);
>>> - } else {
>>> - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
>>> - vmas, NULL);
>>> - /*
>>> - * The lifetime of a vaddr_get_pfn() page pin is
>>> - * userspace-controlled. In the fs-dax case this could
>>> - * lead to indefinite stalls in filesystem operations.
>>> - * Disallow attempts to pin fs-dax pages via this
>>> - * interface.
>>> - */
>>> - if (ret > 0 && vma_is_fsdax(vmas[0])) {
>>> - ret = -EOPNOTSUPP;
>>> - put_page(page[0]);
>>> - }
>>> - }
>>> - up_read(&mm->mmap_sem);
>>> -
>>> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
>>> + page, NULL, NULL);
>>> if (ret == 1) {
>>> *pfn = page_to_pfn(page[0]);
>>> return 0;
>>
>> Mind the return with the lock held this needs some goto unwind
>
> Ah yea... retract my reviewed by... :-(
>

ooops, embarrassed that I missed that, good catch. Will repost with it fixed.



thanks,
--
John Hubbard
NVIDIA