2019-08-09 23:00:19

by Ira Weiny

[permalink] [raw]
Subject: [RFC PATCH v2 15/19] mm/gup: Introduce vaddr_pin_pages()

From: Ira Weiny <[email protected]>

The addition of FOLL_LONGTERM has taken on additional meaning for CMA
pages.

In addition subsystems such as RDMA require new information to be passed
to the GUP interface to track file owning information. As such a simple
FOLL_LONGTERM flag is no longer sufficient for these users to pin pages.

Introduce a new GUP like call which takes the newly introduced vaddr_pin
information. Failure to pass the vaddr_pin object back to a vaddr_put*
call will result in a failure if pins were created on files during the
pin operation.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from list:
Change to vaddr_put_pages_dirty_lock
Change to vaddr_unpin_pages_dirty_lock

include/linux/mm.h | 5 ++++
mm/gup.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 64 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 657c947bda49..90c5802866df 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1603,6 +1603,11 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
struct task_struct *task, bool bypass_rlim);

+long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vaddr_pin *vaddr_pin);
+void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
+ struct vaddr_pin *vaddr_pin, bool make_dirty);
bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page);

/* Container for pinned pfns / pages */
diff --git a/mm/gup.c b/mm/gup.c
index eeaa0ddd08a6..6d23f70d7847 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2536,3 +2536,62 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
return ret;
}
EXPORT_SYMBOL_GPL(get_user_pages_fast);
+
+/**
+ * vaddr_pin_pages pin pages by virtual address and return the pages to the
+ * user.
+ *
+ * @addr, start address
+ * @nr_pages, number of pages to pin
+ * @gup_flags, flags to use for the pin
+ * @pages, array of pages returned
+ * @vaddr_pin, initalized meta information this pin is to be associated
+ * with.
+ *
+ * NOTE regarding vaddr_pin:
+ *
+ * Some callers can share pins via file descriptors to other processes.
+ * Callers such as this should use the f_owner field of vaddr_pin to indicate
+ * the file the fd points to. All other callers should use the mm this pin is
+ * being made against. Usually "current->mm".
+ *
+ * Expects mmap_sem to be read locked.
+ */
+long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vaddr_pin *vaddr_pin)
+{
+ long ret;
+
+ gup_flags |= FOLL_LONGTERM;
+
+ if (!vaddr_pin || (!vaddr_pin->mm && !vaddr_pin->f_owner))
+ return -EINVAL;
+
+ ret = __gup_longterm_locked(current,
+ vaddr_pin->mm,
+ addr, nr_pages,
+ pages, NULL, gup_flags,
+ vaddr_pin);
+ return ret;
+}
+EXPORT_SYMBOL(vaddr_pin_pages);
+
+/**
+ * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages
+ *
+ * @pages, array of pages returned
+ * @nr_pages, number of pages in pages
+ * @vaddr_pin, same information passed to vaddr_pin_pages
+ * @make_dirty: whether to mark the pages dirty
+ *
+ * The semantics are similar to put_user_pages_dirty_lock but a vaddr_pin used
+ * in vaddr_pin_pages should be passed back into this call for propper
+ * tracking.
+ */
+void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
+ struct vaddr_pin *vaddr_pin, bool make_dirty)
+{
+ __put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty);
+}
+EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock);
--
2.20.1


2019-08-12 21:01:41

by Ira Weiny

[permalink] [raw]
Subject: Re: [RFC PATCH v2 15/19] mm/gup: Introduce vaddr_pin_pages()

On Fri, Aug 09, 2019 at 05:09:54PM -0700, John Hubbard wrote:
> On 8/9/19 3:58 PM, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > The addition of FOLL_LONGTERM has taken on additional meaning for CMA
> > pages.
> >
> > In addition subsystems such as RDMA require new information to be passed
> > to the GUP interface to track file owning information. As such a simple
> > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages.
> >
> > Introduce a new GUP like call which takes the newly introduced vaddr_pin
> > information. Failure to pass the vaddr_pin object back to a vaddr_put*
> > call will result in a failure if pins were created on files during the
> > pin operation.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > Changes from list:
> > Change to vaddr_put_pages_dirty_lock
> > Change to vaddr_unpin_pages_dirty_lock
> >
> > include/linux/mm.h | 5 ++++
> > mm/gup.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 64 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 657c947bda49..90c5802866df 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1603,6 +1603,11 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
> > int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> > struct task_struct *task, bool bypass_rlim);
> >
> > +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
> > + unsigned int gup_flags, struct page **pages,
> > + struct vaddr_pin *vaddr_pin);
> > +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
> > + struct vaddr_pin *vaddr_pin, bool make_dirty);
>
> Hi Ira,
>
> OK, the API seems fine to me, anyway. :)
>
> A bit more below...
>
> > bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page);
> >
> > /* Container for pinned pfns / pages */
> > diff --git a/mm/gup.c b/mm/gup.c
> > index eeaa0ddd08a6..6d23f70d7847 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2536,3 +2536,62 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
> > return ret;
> > }
> > EXPORT_SYMBOL_GPL(get_user_pages_fast);
> > +
> > +/**
> > + * vaddr_pin_pages pin pages by virtual address and return the pages to the
> > + * user.
> > + *
> > + * @addr, start address
>
> What's with the commas? I thought kernel-doc wants colons, like this, right?
>
> @addr: start address

:-/ I don't know.

Fixed.

>
>
> > + * @nr_pages, number of pages to pin
> > + * @gup_flags, flags to use for the pin
> > + * @pages, array of pages returned
> > + * @vaddr_pin, initalized meta information this pin is to be associated
> > + * with.
> > + *
> > + * NOTE regarding vaddr_pin:
> > + *
> > + * Some callers can share pins via file descriptors to other processes.
> > + * Callers such as this should use the f_owner field of vaddr_pin to indicate
> > + * the file the fd points to. All other callers should use the mm this pin is
> > + * being made against. Usually "current->mm".
> > + *
> > + * Expects mmap_sem to be read locked.
> > + */
> > +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
> > + unsigned int gup_flags, struct page **pages,
> > + struct vaddr_pin *vaddr_pin)
> > +{
> > + long ret;
> > +
> > + gup_flags |= FOLL_LONGTERM;
>
>
> Is now the right time to introduce and use FOLL_PIN? If not, then I can always
> add it on top of this later, as part of gup-tracking patches. But you did point
> out that FOLL_LONGTERM is taking on additional meaning, and so maybe it's better
> to split that meaning up right from the start.
>

At one point I wanted to (and had in my tree) a new flag but I went away from
it. Prior to the discussion on mlock last week I did not think we needed it.
But I'm ok to add it back in.

I was not ignoring the idea for this RFC I just wanted to get this out there
for people to see. I see that you threw out a couple of patches which add this
flag in.

FWIW, I think it would be good to differentiate between an indefinite pinned
page vs a referenced "gotten" page.

What you and I have been working on is the former. So it would be easy to
change your refcounting patches to simply key off of FOLL_PIN.

Would you like me to add in your FOLL_PIN patches to this series?

>
> > +
> > + if (!vaddr_pin || (!vaddr_pin->mm && !vaddr_pin->f_owner))
> > + return -EINVAL;
> > +
> > + ret = __gup_longterm_locked(current,
> > + vaddr_pin->mm,
> > + addr, nr_pages,
> > + pages, NULL, gup_flags,
> > + vaddr_pin);
> > + return ret;
> > +}
> > +EXPORT_SYMBOL(vaddr_pin_pages);
> > +
> > +/**
> > + * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages
> > + *
> > + * @pages, array of pages returned
> > + * @nr_pages, number of pages in pages
> > + * @vaddr_pin, same information passed to vaddr_pin_pages
> > + * @make_dirty: whether to mark the pages dirty
> > + *
> > + * The semantics are similar to put_user_pages_dirty_lock but a vaddr_pin used
> > + * in vaddr_pin_pages should be passed back into this call for propper
>
> Typo:
proper
Fixed.

>
> > + * tracking.
> > + */
> > +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
> > + struct vaddr_pin *vaddr_pin, bool make_dirty)
> > +{
> > + __put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty);
> > +}
> > +EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock);
> >
>
> OK, whew, I'm glad to see the updated _dirty_lock() API used here. :)

Yea this was pretty easy to change during the rebase. Again I'm kind of
floating these quickly at this point. So sorry about the nits...

Ira

>
> thanks,
> --
> John Hubbard
> NVIDIA

2019-08-12 21:02:30

by Ira Weiny

[permalink] [raw]
Subject: Re: [RFC PATCH v2 15/19] mm/gup: Introduce vaddr_pin_pages()

On Sun, Aug 11, 2019 at 04:07:23PM -0700, John Hubbard wrote:
> On 8/9/19 3:58 PM, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > The addition of FOLL_LONGTERM has taken on additional meaning for CMA
> > pages.
> >
> > In addition subsystems such as RDMA require new information to be passed
> > to the GUP interface to track file owning information. As such a simple
> > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages.
> >
> > Introduce a new GUP like call which takes the newly introduced vaddr_pin
> > information. Failure to pass the vaddr_pin object back to a vaddr_put*
> > call will result in a failure if pins were created on files during the
> > pin operation.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
>
> I'm creating a new call site conversion series, to replace the
> "put_user_pages(): miscellaneous call sites" series. This uses
> vaddr_pin_pages*() where appropriate. So it's based on your series here.
>
> btw, while doing that, I noticed one more typo while re-reading some of the comments.
> Thought you probably want to collect them all for the next spin. Below...
>
> > ---
> > Changes from list:
> > Change to vaddr_put_pages_dirty_lock
> > Change to vaddr_unpin_pages_dirty_lock
> >
> > include/linux/mm.h | 5 ++++
> > mm/gup.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 64 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 657c947bda49..90c5802866df 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1603,6 +1603,11 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
> > int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> > struct task_struct *task, bool bypass_rlim);
> >
> > +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
> > + unsigned int gup_flags, struct page **pages,
> > + struct vaddr_pin *vaddr_pin);
> > +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
> > + struct vaddr_pin *vaddr_pin, bool make_dirty);
> > bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page);
> >
> > /* Container for pinned pfns / pages */
> > diff --git a/mm/gup.c b/mm/gup.c
> > index eeaa0ddd08a6..6d23f70d7847 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2536,3 +2536,62 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
> > return ret;
> > }
> > EXPORT_SYMBOL_GPL(get_user_pages_fast);
> > +
> > +/**
> > + * vaddr_pin_pages pin pages by virtual address and return the pages to the
> > + * user.
> > + *
> > + * @addr, start address
> > + * @nr_pages, number of pages to pin
> > + * @gup_flags, flags to use for the pin
> > + * @pages, array of pages returned
> > + * @vaddr_pin, initalized meta information this pin is to be associated
>
> Typo:
> initialized

Thanks fixed.
Ira

>
>
> thanks,
> --
> John Hubbard
> NVIDIA

2019-08-13 17:48:10

by Ira Weiny

[permalink] [raw]
Subject: Re: [RFC PATCH v2 15/19] mm/gup: Introduce vaddr_pin_pages()

On Tue, Aug 13, 2019 at 08:47:06AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 12, 2019 at 02:48:55PM -0700, Ira Weiny wrote:
> > On Mon, Aug 12, 2019 at 09:28:14AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Aug 09, 2019 at 03:58:29PM -0700, [email protected] wrote:
> > > > From: Ira Weiny <[email protected]>
> > > >
> > > > The addition of FOLL_LONGTERM has taken on additional meaning for CMA
> > > > pages.
> > > >
> > > > In addition subsystems such as RDMA require new information to be passed
> > > > to the GUP interface to track file owning information. As such a simple
> > > > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages.
> > > >
> > > > Introduce a new GUP like call which takes the newly introduced vaddr_pin
> > > > information. Failure to pass the vaddr_pin object back to a vaddr_put*
> > > > call will result in a failure if pins were created on files during the
> > > > pin operation.
> > >
> > > Is this a 'vaddr' in the traditional sense, ie does it work with
> > > something returned by valloc?
> >
> > ...or malloc in user space, yes. I think the idea is that it is a user virtual
> > address.
>
> valloc is a kernel call

Oh... I thought you meant this: https://linux.die.net/man/3/valloc

>
> > So I'm open to suggestions. Jan gave me this one, so I figured it was safer to
> > suggest it...
>
> Should have the word user in it, imho

Fair enough...

user_addr_pin_pages(void __user * addr, ...) ?

uaddr_pin_pages(void __user * addr, ...) ?

I think I like uaddr...

>
> > > I also wish GUP like functions took in a 'void __user *' instead of
> > > the unsigned long to make this clear :\
> >
> > Not a bad idea. But I only see a couple of call sites who actually use a 'void
> > __user *' to pass into GUP... :-/
> >
> > For RDMA the address is _never_ a 'void __user *' AFAICS.
>
> That is actually a bug, converting from u64 to a 'user VA' needs to go
> through u64_to_user_ptr().

Fair enough.

But there are a lot of call sites throughout the kernel who have the same
bug... I'm ok with forcing u64_to_user_ptr() to use this new call if others
are.

Ira

2019-08-13 17:57:43

by John Hubbard

[permalink] [raw]
Subject: Re: [RFC PATCH v2 15/19] mm/gup: Introduce vaddr_pin_pages()

On 8/13/19 10:46 AM, Ira Weiny wrote:
> On Tue, Aug 13, 2019 at 08:47:06AM -0300, Jason Gunthorpe wrote:
>> On Mon, Aug 12, 2019 at 02:48:55PM -0700, Ira Weiny wrote:
>>> On Mon, Aug 12, 2019 at 09:28:14AM -0300, Jason Gunthorpe wrote:
>>>> On Fri, Aug 09, 2019 at 03:58:29PM -0700, [email protected] wrote:
>>>>> From: Ira Weiny <[email protected]>
...
>>> So I'm open to suggestions. Jan gave me this one, so I figured it was safer to
>>> suggest it...
>>
>> Should have the word user in it, imho
>
> Fair enough...
>
> user_addr_pin_pages(void __user * addr, ...) ?
>
> uaddr_pin_pages(void __user * addr, ...) ?
>
> I think I like uaddr...
>

Better to spell out "user". "u" prefixes are used for "unsigned" and it
is just too ambiguous here. Maybe:

vaddr_pin_user_pages()

...which also sounds close enough to get_user_pages() that a bit of
history and continuity is preserved, too.



thanks,
--
John Hubbard
NVIDIA