2020-10-24 02:02:08

by Jason Gunthorpe

[permalink] [raw]
Subject: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()

The next patch in this series makes the lockless flow a little more
complex, so move the entire block into a new function and remove a level
of indention. Tidy a bit of cruft:

- addr is always the same as start, so use start
- Use the modern check_add_overflow() for computing end = start + len
- nr_pinned << PAGE_SHIFT needs an unsigned long cast, like nr_pages
- The handling of ret and nr_pinned can be streamlined a bit

No functional change.

Signed-off-by: Jason Gunthorpe <[email protected]>
---
mm/gup.c | 88 +++++++++++++++++++++++++++++---------------------------
1 file changed, 46 insertions(+), 42 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 102877ed77a4b4..ecbe1639ea2af7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2671,13 +2671,42 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
return ret;
}

+static unsigned int lockless_pages_from_mm(unsigned long addr,
+ unsigned long end,
+ unsigned int gup_flags,
+ struct page **pages)
+{
+ unsigned long flags;
+ int nr_pinned = 0;
+
+ if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) ||
+ !gup_fast_permitted(addr, end))
+ return 0;
+
+ /*
+ * Disable interrupts. The nested form is used, in order to allow full,
+ * general purpose use of this routine.
+ *
+ * With interrupts disabled, we block page table pages from being freed
+ * from under us. See struct mmu_table_batch comments in
+ * include/asm-generic/tlb.h for more details.
+ *
+ * We do not adopt an rcu_read_lock(.) here as we also want to block
+ * IPIs that come from THPs splitting.
+ */
+ local_irq_save(flags);
+ gup_pgd_range(addr, end, gup_flags, pages, &nr_pinned);
+ local_irq_restore(flags);
+ return nr_pinned;
+}
+
static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags,
struct page **pages)
{
- unsigned long addr, len, end;
- unsigned long flags;
- int nr_pinned = 0, ret = 0;
+ unsigned long len, end;
+ unsigned int nr_pinned;
+ int ret;

if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
FOLL_FORCE | FOLL_PIN | FOLL_GET |
@@ -2691,53 +2720,28 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
might_lock_read(&current->mm->mmap_lock);

start = untagged_addr(start) & PAGE_MASK;
- addr = start;
len = (unsigned long) nr_pages << PAGE_SHIFT;
- end = start + len;
-
- if (end <= start)
+ if (check_add_overflow(start, len, &end))
return 0;
if (unlikely(!access_ok((void __user *)start, len)))
return -EFAULT;

- /*
- * Disable interrupts. The nested form is used, in order to allow
- * full, general purpose use of this routine.
- *
- * With interrupts disabled, we block page table pages from being
- * freed from under us. See struct mmu_table_batch comments in
- * include/asm-generic/tlb.h for more details.
- *
- * We do not adopt an rcu_read_lock(.) here as we also want to
- * block IPIs that come from THPs splitting.
- */
- if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) {
- unsigned long fast_flags = gup_flags;
-
- local_irq_save(flags);
- gup_pgd_range(addr, end, fast_flags, pages, &nr_pinned);
- local_irq_restore(flags);
- ret = nr_pinned;
- }
-
- if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) {
- /* Try to get the remaining pages with get_user_pages */
- start += nr_pinned << PAGE_SHIFT;
- pages += nr_pinned;
-
- ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned,
- gup_flags, pages);
+ nr_pinned = lockless_pages_from_mm(start, end, gup_flags, pages);
+ if (nr_pinned == nr_pages || gup_flags & FOLL_FAST_ONLY)
+ return nr_pinned;

+ /* Try to get the remaining pages with get_user_pages */
+ start += (unsigned long)nr_pinned << PAGE_SHIFT;
+ pages += nr_pinned;
+ ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
+ pages);
+ if (ret < 0) {
/* Have to be a bit careful with return values */
- if (nr_pinned > 0) {
- if (ret < 0)
- ret = nr_pinned;
- else
- ret += nr_pinned;
- }
+ if (nr_pinned)
+ return nr_pinned;
+ return ret;
}
-
- return ret;
+ return ret + nr_pinned;
}
/**
* get_user_pages_fast_only() - pin user pages in memory
--
2.28.0


2020-10-24 11:05:42

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()

On 10/23/20 5:19 PM, Jason Gunthorpe wrote:
> The next patch in this series makes the lockless flow a little more
> complex, so move the entire block into a new function and remove a level
> of indention. Tidy a bit of cruft:
>
> - addr is always the same as start, so use start
> - Use the modern check_add_overflow() for computing end = start + len
> - nr_pinned << PAGE_SHIFT needs an unsigned long cast, like nr_pages
> - The handling of ret and nr_pinned can be streamlined a bit
>
> No functional change.

Looks nice. Because of the above, which is not just code movement but
quite a bit more than that, I had to rake through it with a fine-toothed
comb to be sure it's OK. I think it is. But as a side effect, I noticed
some tiny polishing ideas that popped up, see below.


>
> Signed-off-by: Jason Gunthorpe <[email protected]>
> ---
> mm/gup.c | 88 +++++++++++++++++++++++++++++---------------------------
> 1 file changed, 46 insertions(+), 42 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 102877ed77a4b4..ecbe1639ea2af7 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2671,13 +2671,42 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
> return ret;
> }
>
> +static unsigned int lockless_pages_from_mm(unsigned long addr,

It would be slightly more consistent to use "start" here, too, instead of addr.

Separately, I'm not joyful about the change to unsigned int for the
return type. I understand why you did it and that's perfectly sound
reasoning: there is no -ERRNO possible here, and nr_pinned will always
be >=0. And it's correct, although it does have a type mismatch in the
return value.

However, I'd argue, mildly, that it's better to just leave everything as
"int" for nr_pages and nr_pinned, in gup.c, and just keep all the types
matched perfectly. This helps avoid bugs. And if we want to meaningfully
upgrade that, then either:

a) change all the nr_pages and nr_pinned throughout, to "long", or

b) change all the nr_pages and nr_pinned all function args, and use int
return types throughout, as a "O or -ERRNO, only" return value
convention.



> + unsigned long end,
> + unsigned int gup_flags,
> + struct page **pages)
> +{
> + unsigned long flags;
> + int nr_pinned = 0;
> +
> + if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) ||
> + !gup_fast_permitted(addr, end))

That actually fits on a single 80-col line, so let's leave it that way:

if (!IS_ENABLED(CONFIG_HAVE_FAST_GUP) || !gup_fast_permitted(addr, end))

> + return 0;
> +
> + /*
> + * Disable interrupts. The nested form is used, in order to allow full,
> + * general purpose use of this routine.
> + *
> + * With interrupts disabled, we block page table pages from being freed
> + * from under us. See struct mmu_table_batch comments in
> + * include/asm-generic/tlb.h for more details.
> + *
> + * We do not adopt an rcu_read_lock(.) here as we also want to block
> + * IPIs that come from THPs splitting.
> + */
> + local_irq_save(flags);
> + gup_pgd_range(addr, end, gup_flags, pages, &nr_pinned);
> + local_irq_restore(flags);
> + return nr_pinned;
> +}
> +
> static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
> unsigned int gup_flags,
> struct page **pages)
> {
> - unsigned long addr, len, end;
> - unsigned long flags;
> - int nr_pinned = 0, ret = 0;
> + unsigned long len, end;
> + unsigned int nr_pinned;

If you choose to take the advice above, then this should be left as "int
nr_pinned".

> + int ret;
>
> if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
> FOLL_FORCE | FOLL_PIN | FOLL_GET |
> @@ -2691,53 +2720,28 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
> might_lock_read(&current->mm->mmap_lock);
>
> start = untagged_addr(start) & PAGE_MASK;
> - addr = start;
> len = (unsigned long) nr_pages << PAGE_SHIFT;
> - end = start + len;
> -
> - if (end <= start)
> + if (check_add_overflow(start, len, &end))
> return 0;

Very nice.

> if (unlikely(!access_ok((void __user *)start, len)))
> return -EFAULT;
>
> - /*
> - * Disable interrupts. The nested form is used, in order to allow
> - * full, general purpose use of this routine.
> - *
> - * With interrupts disabled, we block page table pages from being
> - * freed from under us. See struct mmu_table_batch comments in
> - * include/asm-generic/tlb.h for more details.
> - *
> - * We do not adopt an rcu_read_lock(.) here as we also want to

I wish I had not copied that comment verbatim when I moved it here. Can
you please delete the weird ".", so that it just reads:

* We do not adopt an rcu_read_lock() here, because we also want to

> - * block IPIs that come from THPs splitting.
> - */
> - if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) {
> - unsigned long fast_flags = gup_flags;
> -
> - local_irq_save(flags);
> - gup_pgd_range(addr, end, fast_flags, pages, &nr_pinned);
> - local_irq_restore(flags);
> - ret = nr_pinned;
> - }
> -
> - if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) {
> - /* Try to get the remaining pages with get_user_pages */
> - start += nr_pinned << PAGE_SHIFT;
> - pages += nr_pinned;
> -
> - ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned,
> - gup_flags, pages);
> + nr_pinned = lockless_pages_from_mm(start, end, gup_flags, pages);
> + if (nr_pinned == nr_pages || gup_flags & FOLL_FAST_ONLY)
> + return nr_pinned;
>
> + /* Try to get the remaining pages with get_user_pages */

Could we tweak that to this, as long as we're here:

/* Slow path: Try to get the remaining pages with get_user_pages */


> + start += (unsigned long)nr_pinned << PAGE_SHIFT;
> + pages += nr_pinned;
> + ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
> + pages);
> + if (ret < 0) {
> /* Have to be a bit careful with return values */

...and can we move that comment up one level, so that it reads:

/* Have to be a bit careful with return values */
if (ret < 0) {
if (nr_pinned)
return nr_pinned;
return ret;
}
return ret + nr_pinned;

Thinking about this longer term, it would be nice if the whole gup/pup API
set just stopped pretending that anyone cares about partial success, because
they *don't*. If we had return values of "0 or -ERRNO" throughout, and an
additional set of API wrappers that did some sort of limited retry just like
some of the callers do, that would be a happier story.

That has nothing to do with your patch, I'm just baiting Matthew Wilcox and
linux-mm to tell me that they love the idea, or hate it, or don't care. :)


> - if (nr_pinned > 0) {
> - if (ret < 0)
> - ret = nr_pinned;
> - else
> - ret += nr_pinned;
> - }
> + if (nr_pinned)
> + return nr_pinned;
> + return ret;
> }
> -
> - return ret;
> + return ret + nr_pinned;
> }
> /**
> * get_user_pages_fast_only() - pin user pages in memory
>

thanks,
--
John Hubbard
NVIDIA

2020-10-27 00:21:13

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()

On Fri, Oct 23, 2020 at 09:44:17PM -0700, John Hubbard wrote:
> > Signed-off-by: Jason Gunthorpe <[email protected]>
> > mm/gup.c | 88 +++++++++++++++++++++++++++++---------------------------
> > 1 file changed, 46 insertions(+), 42 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 102877ed77a4b4..ecbe1639ea2af7 100644
> > +++ b/mm/gup.c
> > @@ -2671,13 +2671,42 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
> > return ret;
> > }
> > +static unsigned int lockless_pages_from_mm(unsigned long addr,
>
> It would be slightly more consistent to use "start" here, too, instead of addr.
>
> Separately, I'm not joyful about the change to unsigned int for the
> return type. I understand why you did it and that's perfectly sound
> reasoning: there is no -ERRNO possible here, and nr_pinned will always
> be >=0. And it's correct, although it does have a type mismatch in the
> return value.

I did it because I had to check that ignoring a negative return or
doing some wonky negative arithmetic wasn't some sneaky beahvior. It
isn't, the value is really unsigned. So I documented it to save the
next person this work.

I think the proper response is to ultimately change the
gup_pgd_range() call tree to take the unsigned as well.

> a) change all the nr_pages and nr_pinned throughout, to "long", or
>
> b) change all the nr_pages and nr_pinned all function args, and use int
> return types throughout, as a "O or -ERRNO, only" return value
> convention.

The gup_pgd_range() this stuff largely does return

I think gup_pgd_range() works as it does due to

> > + start += (unsigned long)nr_pinned << PAGE_SHIFT;
> > + pages += nr_pinned;
> > + ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
> > + pages);
> > + if (ret < 0) {
> > /* Have to be a bit careful with return values */
>
> ...and can we move that comment up one level, so that it reads:
>
> /* Have to be a bit careful with return values */
> if (ret < 0) {
> if (nr_pinned)
> return nr_pinned;
> return ret;
> }
> return ret + nr_pinned;

I actually deliberately put it inside the if because there is nothing
tricky about ret < 0, that is basically perfectly normal. It is only
the logic to drop the error code sometimes that is tricky..

> Thinking about this longer term, it would be nice if the whole gup/pup API
> set just stopped pretending that anyone cares about partial success, because
> they *don't*. If we had return values of "0 or -ERRNO" throughout, and an
> additional set of API wrappers that did some sort of limited retry just like
> some of the callers do, that would be a happier story.

It seems like a good idea to me

I'll get the other notes in a v2

Thanks,
Jason

2020-10-27 14:31:01

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()

On Tue, Oct 27, 2020 at 10:33:01AM +0100, Jan Kara wrote:
> Actually there are callers that care about partial success. See e.g.
> iov_iter_get_pages() usage in fs/direct_io.c:dio_refill_pages() or
> bio_iov_iter_get_pages(). These places handle partial success just fine and
> not allowing partial success from GUP could regress things...

But most users do indeed not care. Maybe an explicit FOLL_PARTIAL to
opt into partial handling could clean up a lot of the mess. Maybe just
for pin_user_pages for now.

2020-10-27 14:55:34

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()

On Tue, Oct 27, 2020 at 10:33:01AM +0100, Jan Kara wrote:
> On Fri 23-10-20 21:44:17, John Hubbard wrote:
> > On 10/23/20 5:19 PM, Jason Gunthorpe wrote:
> > > + start += (unsigned long)nr_pinned << PAGE_SHIFT;
> > > + pages += nr_pinned;
> > > + ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
> > > + pages);
> > > + if (ret < 0) {
> > > /* Have to be a bit careful with return values */
> >
> > ...and can we move that comment up one level, so that it reads:
> >
> > /* Have to be a bit careful with return values */
> > if (ret < 0) {
> > if (nr_pinned)
> > return nr_pinned;
> > return ret;
> > }
> > return ret + nr_pinned;
> >
> > Thinking about this longer term, it would be nice if the whole gup/pup API
> > set just stopped pretending that anyone cares about partial success, because
> > they *don't*. If we had return values of "0 or -ERRNO" throughout, and an
> > additional set of API wrappers that did some sort of limited retry just like
> > some of the callers do, that would be a happier story.
>
> Actually there are callers that care about partial success. See e.g.
> iov_iter_get_pages() usage in fs/direct_io.c:dio_refill_pages() or
> bio_iov_iter_get_pages(). These places handle partial success just fine and
> not allowing partial success from GUP could regress things...

I looked through a bunch of call sites, and there are a wack that
actually do only want a complete return and are carrying a bunch of
code to fix it:

pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
if (!pvec)
return -ENOMEM;

do {
unsigned num_pages = npages - pinned;
uint64_t ptr = userptr->ptr + pinned * PAGE_SIZE;
struct page **pages = pvec + pinned;

ret = pin_user_pages_fast(ptr, num_pages,
!userptr->ro ? FOLL_WRITE : 0, pages);
if (ret < 0) {
unpin_user_pages(pvec, pinned);
kvfree(pvec);
return ret;
}

pinned += ret;

} while (pinned < npages);

Is really a lot better if written as:

pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
if (!pvec)
return -ENOMEM;
ret = pin_user_pages_fast(userptr->ptr, npages, FOLL_COMPLETE |
(!userptr->ro ? FOLL_WRITE : 0),
pvec);
if (ret) {
kvfree(pvec);
return ret;
}

(eg FOLL_COMPLETE says to return exactly npages or fail)

Some code assumes things work that way already anyhow:

/* Pin user pages for DMA Xfer */
err = pin_user_pages_unlocked(user_dma.uaddr, user_dma.page_count,
dma->map, FOLL_FORCE);

if (user_dma.page_count != err) {
IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
err, user_dma.page_count);
if (err >= 0) {
unpin_user_pages(dma->map, err);
return -EINVAL;
}
return err;
}

Actually I'm quite surprised I didn't find too many missing the tricky
unpin_user_pages() on the error path - eg
videobuf_dma_init_user_locked() is wrong.

Jason

2020-10-28 06:16:46

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()

On Fri 23-10-20 21:44:17, John Hubbard wrote:
> On 10/23/20 5:19 PM, Jason Gunthorpe wrote:
> > + start += (unsigned long)nr_pinned << PAGE_SHIFT;
> > + pages += nr_pinned;
> > + ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
> > + pages);
> > + if (ret < 0) {
> > /* Have to be a bit careful with return values */
>
> ...and can we move that comment up one level, so that it reads:
>
> /* Have to be a bit careful with return values */
> if (ret < 0) {
> if (nr_pinned)
> return nr_pinned;
> return ret;
> }
> return ret + nr_pinned;
>
> Thinking about this longer term, it would be nice if the whole gup/pup API
> set just stopped pretending that anyone cares about partial success, because
> they *don't*. If we had return values of "0 or -ERRNO" throughout, and an
> additional set of API wrappers that did some sort of limited retry just like
> some of the callers do, that would be a happier story.

Actually there are callers that care about partial success. See e.g.
iov_iter_get_pages() usage in fs/direct_io.c:dio_refill_pages() or
bio_iov_iter_get_pages(). These places handle partial success just fine and
not allowing partial success from GUP could regress things...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-10-28 23:22:23

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()

On 10/27/20 11:00 PM, John Hubbard wrote:
> On 10/27/20 6:15 AM, Jason Gunthorpe wrote:
>> On Tue, Oct 27, 2020 at 10:33:01AM +0100, Jan Kara wrote:
>>> On Fri 23-10-20 21:44:17, John Hubbard wrote:
>>>> On 10/23/20 5:19 PM, Jason Gunthorpe wrote:
...
>
> I'll fix up that one above (using your Reported-by, I assume), unless someone else is
> already taking care of it.
>

ummm, strike that--that won't need fixing, it will pick up the fix automatically.
Arghh. Let me blame this on the late hour. haha :)


thanks,
--
John Hubbard
NVIDIA

2020-10-29 09:59:53

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()

On 10/27/20 6:15 AM, Jason Gunthorpe wrote:
> On Tue, Oct 27, 2020 at 10:33:01AM +0100, Jan Kara wrote:
>> On Fri 23-10-20 21:44:17, John Hubbard wrote:
>>> On 10/23/20 5:19 PM, Jason Gunthorpe wrote:
>>>> + start += (unsigned long)nr_pinned << PAGE_SHIFT;
>>>> + pages += nr_pinned;
>>>> + ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned, gup_flags,
>>>> + pages);
>>>> + if (ret < 0) {
>>>> /* Have to be a bit careful with return values */
>>>
>>> ...and can we move that comment up one level, so that it reads:
>>>
>>> /* Have to be a bit careful with return values */
>>> if (ret < 0) {
>>> if (nr_pinned)
>>> return nr_pinned;
>>> return ret;
>>> }
>>> return ret + nr_pinned;
>>>
>>> Thinking about this longer term, it would be nice if the whole gup/pup API
>>> set just stopped pretending that anyone cares about partial success, because
>>> they *don't*. If we had return values of "0 or -ERRNO" throughout, and an
>>> additional set of API wrappers that did some sort of limited retry just like
>>> some of the callers do, that would be a happier story.
>>
>> Actually there are callers that care about partial success. See e.g.
>> iov_iter_get_pages() usage in fs/direct_io.c:dio_refill_pages() or
>> bio_iov_iter_get_pages(). These places handle partial success just fine and
>> not allowing partial success from GUP could regress things...
>
> I looked through a bunch of call sites, and there are a wack that

So did I! :)

> actually do only want a complete return and are carrying a bunch of
> code to fix it:
>
> pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
> if (!pvec)
> return -ENOMEM;
>
> do {
> unsigned num_pages = npages - pinned;
> uint64_t ptr = userptr->ptr + pinned * PAGE_SIZE;
> struct page **pages = pvec + pinned;
>
> ret = pin_user_pages_fast(ptr, num_pages,
> !userptr->ro ? FOLL_WRITE : 0, pages);
> if (ret < 0) {
> unpin_user_pages(pvec, pinned);
> kvfree(pvec);
> return ret;
> }
>
> pinned += ret;
>
> } while (pinned < npages);
>
> Is really a lot better if written as:
>
> pvec = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
> if (!pvec)
> return -ENOMEM;
> ret = pin_user_pages_fast(userptr->ptr, npages, FOLL_COMPLETE |
> (!userptr->ro ? FOLL_WRITE : 0),
> pvec);
> if (ret) {
> kvfree(pvec);
> return ret;
> }
>
> (eg FOLL_COMPLETE says to return exactly npages or fail)


Yes, exactly. And if I reverse the polarity (to Christoph's FOLL_PARTIAL, instead
of FOLL_COMPLETE) it's even smaller, slightly. Which is where I am leaning now.


>
> Some code assumes things work that way already anyhow:
>
> /* Pin user pages for DMA Xfer */
> err = pin_user_pages_unlocked(user_dma.uaddr, user_dma.page_count,
> dma->map, FOLL_FORCE);
>
> if (user_dma.page_count != err) {
> IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
> err, user_dma.page_count);
> if (err >= 0) {
> unpin_user_pages(dma->map, err);
> return -EINVAL;
> }
> return err;
> }
>
> Actually I'm quite surprised I didn't find too many missing the tricky
> unpin_user_pages() on the error path - eg
> videobuf_dma_init_user_locked() is wrong.
>

Well. That's not accidental. "Some People" (much thanks to Souptick Joarder, btw) have
been fixing up many of those sites, during the pin_user_pages() conversions. Otherwise
you would have found about 10 or 15 more.

I'll fix up that one above (using your Reported-by, I assume), unless someone else is
already taking care of it.


thanks,
--
John Hubbard
NVIDIA

2020-10-29 10:00:26

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: reorganize internal_get_user_pages_fast()

On 10/27/20 2:55 AM, Christoph Hellwig wrote:
> On Tue, Oct 27, 2020 at 10:33:01AM +0100, Jan Kara wrote:
>> Actually there are callers that care about partial success. See e.g.
>> iov_iter_get_pages() usage in fs/direct_io.c:dio_refill_pages() or
>> bio_iov_iter_get_pages(). These places handle partial success just fine and
>> not allowing partial success from GUP could regress things...

Good point. And those also happen to be the key call sites that I haven't
yet converted to pin_user_pages*(). Seeing as how I'm three versions into
attempting to convert the various *iov_iter*() routines, I should have
remembered that they are all about partial success. :)

>
> But most users do indeed not care. Maybe an explicit FOLL_PARTIAL to
> opt into partial handling could clean up a lot of the mess. Maybe just
> for pin_user_pages for now.
>

That does seem like the perfect mix. IIRC, all of the pin_user_pages()
call sites today do not accept partial success (and it's easy enough to
audit and confirm). So likely no need to add FOLL_PARTIAL there, and no
huge danger of regressions. It would definitely reduce the line count at
multiple call sites, in return for adding some lines to gup.c.

And maybe it can go further, at some point, but that's a good way to start.

I'm leaning toward just sending out a small series to do that, unless there
are objections and/or better ways to improve this area...


thanks,
--
John Hubbard
NVIDIA