2020-05-27 20:58:57

by Souptick Joarder

Subject: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

This code was using get_user_pages_fast(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages_fast() + put_page() calls to
pin_user_pages_fast() + unpin_user_page() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/

Signed-off-by: Souptick Joarder <[email protected]>
Cc: John Hubbard <[email protected]>

Hi,

I've compile tested this, but I'm unable to run-time test it, so any testing
help is much appreciated.
---
drivers/staging/gasket/gasket_page_table.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/gasket/gasket_page_table.c b/drivers/staging/gasket/gasket_page_table.c
index f6d7157..d712ad4 100644
--- a/drivers/staging/gasket/gasket_page_table.c
+++ b/drivers/staging/gasket/gasket_page_table.c
@@ -449,7 +449,7 @@ static bool gasket_release_page(struct page *page)

 	if (!PageReserved(page))
 		SetPageDirty(page);
-	put_page(page);
+	unpin_user_page(page);

 	return true;
 }
@@ -486,12 +486,12 @@ static int gasket_perform_mapping(struct gasket_page_table *pg_tbl,
 			ptes[i].dma_addr = pg_tbl->coherent_pages[0].paddr +
 					   off + i * PAGE_SIZE;
 		} else {
-			ret = get_user_pages_fast(page_addr - offset, 1,
+			ret = pin_user_pages_fast(page_addr - offset, 1,
 						  FOLL_WRITE, &page);

 			if (ret <= 0) {
 				dev_err(pg_tbl->device,
-					"get user pages failed for addr=0x%lx, offset=0x%lx [ret=%d]\n",
+					"pin user pages failed for addr=0x%lx, offset=0x%lx [ret=%d]\n",
 					page_addr, offset, ret);
 				return ret ? ret : -ENOMEM;
 			}
--
1.9.1


2020-05-28 11:07:18

by Dan Carpenter

Subject: Re: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

On Thu, May 28, 2020 at 02:32:42AM +0530, Souptick Joarder wrote:
> This code was using get_user_pages_fast(), in a "Case 2" scenario
> (DMA/RDMA), using the categorization from [1]. That means that it's
> time to convert the get_user_pages_fast() + put_page() calls to
> pin_user_pages_fast() + unpin_user_page() calls.

You are saying that the page is used for DIO and not DMA, but it sure
looks to me like it is used for DMA.

503 /* Map the page into DMA space. */
504 ptes[i].dma_addr =
505 dma_map_page(pg_tbl->device, page, 0, PAGE_SIZE,
506 DMA_BIDIRECTIONAL);

To be honest, that starting paragraph was confusing. At first I thought
you were saying gasket was an RDMA driver. :P I shouldn't have to read
a different document to understand the commit message. It should be
summarized enough and the other documentation is supplemental.

"In 2019 we introduced pin_user_pages() and now we are converting
get_user_pages() to the new API as appropriate".

>
> There is some helpful background in [2]: basically, this is a small
> part of fixing a long-standing disconnect between pinning pages, and
> file systems' use of those pages.

What is the impact of this patch on runtime?

>
> [1] Documentation/core-api/pin_user_pages.rst
>
> [2] "Explicit pinning of user-space pages":
> https://lwn.net/Articles/807108/
>
> Signed-off-by: Souptick Joarder <[email protected]>
> Cc: John Hubbard <[email protected]>
>
> Hi,
>
> I've compile tested this, but I'm unable to run-time test it, so any testing
> help is much appreciated.
> ---

The "Hi" part of the patch should have been under the "---" cut-off line, so
this will definitely need to be resent.

regards,
dan carpenter

2020-05-29 06:21:17

by Souptick Joarder

Subject: Re: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

On Thu, May 28, 2020 at 4:34 PM Dan Carpenter <[email protected]> wrote:
>
> On Thu, May 28, 2020 at 02:32:42AM +0530, Souptick Joarder wrote:
> > This code was using get_user_pages_fast(), in a "Case 2" scenario
> > (DMA/RDMA), using the categorization from [1]. That means that it's
> > time to convert the get_user_pages_fast() + put_page() calls to
> > pin_user_pages_fast() + unpin_user_page() calls.
>
> You are saying that the page is used for DIO and not DMA, but it sure
> looks to me like it is used for DMA.

No, I was referring to the "Case 2" scenario in the change log, which means
it is used for DMA, not DIO.

>
> 503 /* Map the page into DMA space. */
> 504 ptes[i].dma_addr =
> 505 dma_map_page(pg_tbl->device, page, 0, PAGE_SIZE,
> 506 DMA_BIDIRECTIONAL);
>
> To be honest, that starting paragraph was confusing. At first I thought
> you were saying gasket was an RDMA driver. :P I shouldn't have to read
> a different document to understand the commit message. It should be
> summarized enough and the other documentation is supplemental.
>
> "In 2019 we introduced pin_user_pages() and now we are converting
> get_user_pages() to the new API as appropriate".

All the other similar conversions have similar change logs, so I was trying
to keep this one consistent. John might have a different opinion on this.

John, any further opinion?

>
> >
> > There is some helpful background in [2]: basically, this is a small
> > part of fixing a long-standing disconnect between pinning pages, and
> > file systems' use of those pages.
>
> What is the impact of this patch on runtime?

I don't have the hardware to validate the runtime impact, so I will
wait to see whether someone else is able to validate it.

>
> >
> > [1] Documentation/core-api/pin_user_pages.rst
> >
> > [2] "Explicit pinning of user-space pages":
> > https://lwn.net/Articles/807108/
> >
> > Signed-off-by: Souptick Joarder <[email protected]>
> > Cc: John Hubbard <[email protected]>
> >
> > Hi,
> >
> > I've compile tested this, but I'm unable to run-time test it, so any testing
> > help is much appreciated.
> > ---
>
> The "Hi" part of patch should have been under the "---" cut off line so
> this will definitely need to be resent.

Sorry about that.
I will wait for feedback from John before resending it :)

>
> regards,
> dan carpenter
>

2020-05-29 06:29:43

by Souptick Joarder

Subject: Re: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

On Fri, May 29, 2020 at 11:46 AM Souptick Joarder <[email protected]> wrote:
>
> On Thu, May 28, 2020 at 4:34 PM Dan Carpenter <[email protected]> wrote:
> >
> > On Thu, May 28, 2020 at 02:32:42AM +0530, Souptick Joarder wrote:
> > > This code was using get_user_pages_fast(), in a "Case 2" scenario
> > > (DMA/RDMA), using the categorization from [1]. That means that it's
> > > time to convert the get_user_pages_fast() + put_page() calls to
> > > pin_user_pages_fast() + unpin_user_page() calls.
> >
> > You are saying that the page is used for DIO and not DMA, but it sure
> > looks to me like it is used for DMA.
>
> No, I was referring to "Case 2" scenario in change log which means it is
> used for DMA, not DIO.
>
> >
> > 503 /* Map the page into DMA space. */
> > 504 ptes[i].dma_addr =
> > 505 dma_map_page(pg_tbl->device, page, 0, PAGE_SIZE,
> > 506 DMA_BIDIRECTIONAL);
> >
> > To be honest, that starting paragraph was confusing. At first I thought
> > you were saying gasket was an RDMA driver. :P I shouldn't have to read
> > a different document to understand the commit message. It should be
> > summarized enough and the other documentation is supplemental.
> >
> > "In 2019 we introduced pin_user_pages() and now we are converting
> > get_user_pages() to the new API as appropriate".
>
> As all other similar conversion have similar change logs, so I was trying
> to maintain the same. John might have a different opinion on this.

For example, I was referring to a few recent similar commits for their change logs.

http://lkml.kernel.org/r/[email protected]
https://lore.kernel.org/r/[email protected]


>
> John, Any further opinion ??
>
> >
> > >
> > > There is some helpful background in [2]: basically, this is a small
> > > part of fixing a long-standing disconnect between pinning pages, and
> > > file systems' use of those pages.
> >
> > What is the impact of this patch on runtime?
>
> I don't have the hardware to validate the runtime impact and will
> wait if someone is going to validate it for runtime impact.
>
> >
> > >
> > > [1] Documentation/core-api/pin_user_pages.rst
> > >
> > > [2] "Explicit pinning of user-space pages":
> > > https://lwn.net/Articles/807108/
> > >
> > > Signed-off-by: Souptick Joarder <[email protected]>
> > > Cc: John Hubbard <[email protected]>
> > >
> > > Hi,
> > >
> > > I've compile tested this, but I'm unable to run-time test it, so any testing
> > > help is much appreciated.
> > > ---
> >
> > The "Hi" part of patch should have been under the "---" cut off line so
> > this will definitely need to be resent.
>
> Sorry about it.
> Will wait for feedback from John before resend it :)
>
> >
> > regards,
> > dan carpenter
> >

2020-05-29 07:41:13

by John Hubbard

Subject: Re: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

On 2020-05-28 23:27, Souptick Joarder wrote:
> On Fri, May 29, 2020 at 11:46 AM Souptick Joarder <[email protected]> wrote:
>>
>> On Thu, May 28, 2020 at 4:34 PM Dan Carpenter <[email protected]> wrote:
>>>
>>> On Thu, May 28, 2020 at 02:32:42AM +0530, Souptick Joarder wrote:
>>>> This code was using get_user_pages_fast(), in a "Case 2" scenario
>>>> (DMA/RDMA), using the categorization from [1]. That means that it's
>>>> time to convert the get_user_pages_fast() + put_page() calls to
>>>> pin_user_pages_fast() + unpin_user_page() calls.
>>>
>>> You are saying that the page is used for DIO and not DMA, but it sure
>>> looks to me like it is used for DMA.
>>
>> No, I was referring to "Case 2" scenario in change log which means it is
>> used for DMA, not DIO.

Hi,

Dan, I am also uncertain as to how you read this as referring to DIO. Case 2 is
DMA or RDMA, and in fact the proposed commit log says both of those things:
Case 2 and DMA/RDMA. I don't see "DIO" anywhere here...


>>
>>>
>>> 503 /* Map the page into DMA space. */
>>> 504 ptes[i].dma_addr =
>>> 505 dma_map_page(pg_tbl->device, page, 0, PAGE_SIZE,
>>> 506 DMA_BIDIRECTIONAL);
>>>
>>> To be honest, that starting paragraph was confusing. At first I thought
>>> you were saying gasket was an RDMA driver. :P I shouldn't have to read
>>> a different document to understand the commit message. It should be
>>> summarized enough and the other documentation is supplemental.
>>>
>>> "In 2019 we introduced pin_user_pages() and now we are converting
>>> get_user_pages() to the new API as appropriate".
>>
>> As all other similar conversion have similar change logs, so I was trying
>> to maintain the same. John might have a different opinion on this.
>
> For example, I was referring to few recent similar commits for change logs.
>
> http://lkml.kernel.org/r/[email protected]
> https://lore.kernel.org/r/[email protected]
>
>
>>
>> John, Any further opinion ??


Well, I've gotten away with the current wording for quite a few patches so
far, but that sure doesn't mean it's perfect! :)

Maybe adding the words that Dan suggests, above, will suffice? Here:

>>> "In 2019 we introduced pin_user_pages() and now we are converting
>>> get_user_pages() to the new API as appropriate".


thanks,
--
John Hubbard
NVIDIA

2020-05-29 07:49:25

by Dan Carpenter

Subject: Re: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

On Fri, May 29, 2020 at 11:57:09AM +0530, Souptick Joarder wrote:
> On Fri, May 29, 2020 at 11:46 AM Souptick Joarder <[email protected]> wrote:
> >
> > On Thu, May 28, 2020 at 4:34 PM Dan Carpenter <[email protected]> wrote:
> > >
> > > On Thu, May 28, 2020 at 02:32:42AM +0530, Souptick Joarder wrote:
> > > > This code was using get_user_pages_fast(), in a "Case 2" scenario
> > > > (DMA/RDMA), using the categorization from [1]. That means that it's
> > > > time to convert the get_user_pages_fast() + put_page() calls to
> > > > pin_user_pages_fast() + unpin_user_page() calls.
> > >
> > > You are saying that the page is used for DIO and not DMA, but it sure
> > > looks to me like it is used for DMA.
> >
> > No, I was referring to "Case 2" scenario in change log which means it is
> > used for DMA, not DIO.

You can't use pin_user_pages() for DMA. This was the second reason that I
was confused.

mm/gup.c
2863 /**
2864 * pin_user_pages_fast() - pin user pages in memory without taking locks
2865 *
2866 * @start: starting user address
2867 * @nr_pages: number of pages from start to pin
2868 * @gup_flags: flags modifying pin behaviour
2869 * @pages: array that receives pointers to the pages pinned.
2870 * Should be at least nr_pages long.
2871 *
2872 * Nearly the same as get_user_pages_fast(), except that FOLL_PIN is set. See
2873 * get_user_pages_fast() for documentation on the function arguments, because
2874 * the arguments here are identical.
2875 *
2876 * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
2877 * see Documentation/core-api/pin_user_pages.rst for further details.
2878 *
2879 * This is intended for Case 1 (DIO) in Documentation/core-api/pin_user_pages.rst. It
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2880 * is NOT intended for Case 2 (RDMA: long-term pins).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2881 */
2882 int pin_user_pages_fast(unsigned long start, int nr_pages,
2883 unsigned int gup_flags, struct page **pages)
2884 {
2885 /* FOLL_GET and FOLL_PIN are mutually exclusive. */
2886 if (WARN_ON_ONCE(gup_flags & FOLL_GET))
2887 return -EINVAL;
2888
2889 gup_flags |= FOLL_PIN;
2890 return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
2891 }
2892 EXPORT_SYMBOL_GPL(pin_user_pages_fast);
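
The flag handling in the function quoted above can be modelled in userspace.
This is only a sketch: the MODEL_* names and values and model_pin_flags() are
hypothetical stand-ins chosen for illustration, not kernel definitions, and
nothing here actually pins a page.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values for this model only; not the kernel's definitions. */
#define MODEL_FOLL_WRITE 0x01u
#define MODEL_FOLL_GET   0x04u
#define MODEL_FOLL_PIN   0x40000u

/*
 * Userspace stand-in for the flag check in pin_user_pages_fast():
 * reject a caller-supplied FOLL_GET, otherwise force FOLL_PIN on.
 * On success the effective flags are stored in *out and 0 is returned;
 * on failure -EINVAL is returned and *out is left untouched.
 */
int model_pin_flags(unsigned int gup_flags, unsigned int *out)
{
	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
	if (gup_flags & MODEL_FOLL_GET)
		return -EINVAL;

	*out = gup_flags | MODEL_FOLL_PIN;
	return 0;
}
```

In this model, a caller passing only MODEL_FOLL_WRITE (as the gasket hunk does
with FOLL_WRITE) gets back the write flag with the pin flag added, while a
caller passing MODEL_FOLL_GET is rejected with -EINVAL.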

regards,
dan carpenter

2020-05-29 07:49:51

by Dan Carpenter

Subject: Re: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

On Fri, May 29, 2020 at 12:38:20AM -0700, John Hubbard wrote:
> On 2020-05-28 23:27, Souptick Joarder wrote:
> > On Fri, May 29, 2020 at 11:46 AM Souptick Joarder <[email protected]> wrote:
> > >
> > > On Thu, May 28, 2020 at 4:34 PM Dan Carpenter <[email protected]> wrote:
> > > >
> > > > On Thu, May 28, 2020 at 02:32:42AM +0530, Souptick Joarder wrote:
> > > > > This code was using get_user_pages_fast(), in a "Case 2" scenario
> > > > > (DMA/RDMA), using the categorization from [1]. That means that it's
> > > > > time to convert the get_user_pages_fast() + put_page() calls to
> > > > > pin_user_pages_fast() + unpin_user_page() calls.
> > > >
> > > > You are saying that the page is used for DIO and not DMA, but it sure
> > > > looks to me like it is used for DMA.
> > >
> > > No, I was referring to "Case 2" scenario in change log which means it is
> > > used for DMA, not DIO.
>
> Hi,
>
> Dan, I also uncertain as to how you read this as referring to DIO. Case 2 is
> DMA or RDMA, and in fact the proposed commit log says both of those things:
> Case 2 and DMA/RDMA. I don't see "DIO" anywhere here...

I thought he meant that the original code was appropriate for DMA and he
was fixing it. :P

regards,
dan carpenter

2020-05-29 08:02:49

by John Hubbard

Subject: Re: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

On 2020-05-29 00:46, Dan Carpenter wrote:
> On Fri, May 29, 2020 at 11:57:09AM +0530, Souptick Joarder wrote:
>> On Fri, May 29, 2020 at 11:46 AM Souptick Joarder <[email protected]> wrote:
>>>
>>> On Thu, May 28, 2020 at 4:34 PM Dan Carpenter <[email protected]> wrote:
>>>>
>>>> On Thu, May 28, 2020 at 02:32:42AM +0530, Souptick Joarder wrote:
>>>>> This code was using get_user_pages_fast(), in a "Case 2" scenario
>>>>> (DMA/RDMA), using the categorization from [1]. That means that it's
>>>>> time to convert the get_user_pages_fast() + put_page() calls to
>>>>> pin_user_pages_fast() + unpin_user_page() calls.
>>>>
>>>> You are saying that the page is used for DIO and not DMA, but it sure
>>>> looks to me like it is used for DMA.
>>>
>>> No, I was referring to "Case 2" scenario in change log which means it is
>>> used for DMA, not DIO.
>
> You can't use pin_user_pages() for DMA. This was second reason that I
> was confused.


OK, now it is getting interesting!


>
> mm/gup.c
> 2863 /**
> 2864 * pin_user_pages_fast() - pin user pages in memory without taking locks
> 2865 *
> 2866 * @start: starting user address
> 2867 * @nr_pages: number of pages from start to pin
> 2868 * @gup_flags: flags modifying pin behaviour
> 2869 * @pages: array that receives pointers to the pages pinned.
> 2870 * Should be at least nr_pages long.
> 2871 *
> 2872 * Nearly the same as get_user_pages_fast(), except that FOLL_PIN is set. See
> 2873 * get_user_pages_fast() for documentation on the function arguments, because
> 2874 * the arguments here are identical.
> 2875 *
> 2876 * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
> 2877 * see Documentation/core-api/pin_user_pages.rst for further details.
> 2878 *
> 2879 * This is intended for Case 1 (DIO) in Documentation/core-api/pin_user_pages.rst. It
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 2880 * is NOT intended for Case 2 (RDMA: long-term pins).
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


I'm trying to figure out why I wrote that. It seems just wrong, because once the
page is dma-pinned, it will work just fine for either Case 1 or Case 2. hmmm, I
think this was from a few design ideas ago, when we were still working through the
FOLL_LONGTERM and FOLL_PIN thoughts and how the pin_user_pages*() API set should
look.

At this point, it's looking very much like a (my) documentation bug: all 4 of the
"intended for Case 1 (DIO)" comments in mm/gup.c probably need to be simply deleted.
Good catch.



> 2881 */
> 2882 int pin_user_pages_fast(unsigned long start, int nr_pages,
> 2883 unsigned int gup_flags, struct page **pages)
> 2884 {
> 2885 /* FOLL_GET and FOLL_PIN are mutually exclusive. */
> 2886 if (WARN_ON_ONCE(gup_flags & FOLL_GET))
> 2887 return -EINVAL;
> 2888
> 2889 gup_flags |= FOLL_PIN;
> 2890 return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
> 2891 }
> 2892 EXPORT_SYMBOL_GPL(pin_user_pages_fast);
>
> regards,
> dan carpenter
>

thanks,
--
John Hubbard
NVIDIA

2020-05-29 11:56:08

by Dan Carpenter

Subject: Re: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

Anyway, can you resend with the commit message rewritten? To me the
information that's most useful is from the LWN article:

"In short, if pages are being pinned for access to the data
contained within those pages, pin_user_pages() should be used. For
cases where the intent is to manipulate the page structures
corresponding to the pages rather than the data within them,
get_user_pages() is the correct interface."

What are the runtime implications of this patch? I'm still not clear on
that honestly.

When I'm reviewing patches, I also want to know how a bug was
introduced. In this case the original author did everything correctly
but we've just added some new features (cleanups, whatever).

I did skim the LWN article back in December but I don't remember the
details so I really want all this stuff re-stated in each commit
message.

regards,
dan carpenter

2020-05-29 20:34:09

by John Hubbard

Subject: Re: [PATCH] staging: gasket: Convert get_user_pages*() --> pin_user_pages*()

On 2020-05-29 04:53, Dan Carpenter wrote:
...
> What are the runtime implications of this patch? I'm still not clear on
> that honestly.

Instead of incrementing each page's refcount by 1 (with get_user_pages()),
pin_user_pages*() will increment by GUP_PIN_COUNTING_BIAS, which is 1024.
That by itself should not have any performance impact, of course, but
there are a couple more things:

For compound pages of more than 2 pages in size, it will also increment
a separate field in struct page, via hpage_pincount_add().

And finally, it will update /proc/vmstat counters on pin and unpin, via
the optimized mod_node_page_state() call.

So it's expected to be very light. And, for DMA (as opposed to DIO)
situations, the DMA setup time is inevitably much greater than any of
the above overheads, so I expect that this patch will be completely
invisible from a performance point of view.
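
To make the refcount arithmetic above concrete, here is a small userspace
model. It is a sketch under stated assumptions, not kernel code: struct
model_page and the model_*() helpers are hypothetical, MODEL_PIN_BIAS simply
mirrors the value 1024 mentioned above, and the "maybe pinned" check is an
assumed heuristic for illustration (a count at or above the bias).

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the GUP_PIN_COUNTING_BIAS value of 1024 quoted above. */
#define MODEL_PIN_BIAS 1024

/* Hypothetical stand-in for struct page: only a reference count. */
struct model_page {
	int refcount;
};

/* get/put style: one reference per call. */
void model_get(struct model_page *p)   { p->refcount += 1; }
void model_put(struct model_page *p)   { p->refcount -= 1; }

/* pin/unpin style: MODEL_PIN_BIAS references per call. */
void model_pin(struct model_page *p)   { p->refcount += MODEL_PIN_BIAS; }
void model_unpin(struct model_page *p) { p->refcount -= MODEL_PIN_BIAS; }

/*
 * Assumed heuristic for this model: a count at or above the bias
 * suggests that at least one pin is outstanding.
 */
bool model_maybe_pinned(const struct model_page *p)
{
	return p->refcount >= MODEL_PIN_BIAS;
}
```

With a page that starts at a count of 1, model_get() moves it to 2 and the
page does not look pinned, whereas model_pin() moves it to 1025 and it does.
Releasing a pinned page with model_put() instead of model_unpin() would leave
1024 references behind, which is why the patch changes the release side
together with the acquire side.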

It would be a "nice to have", though, if anyone were able to do a
performance comparison on the gasket driver for this patch, and/or
basic runtime verification, since I'm sure it's a specialized setup.


thanks,
--
John Hubbard
NVIDIA