Hi,
It recently became clear to me that there are some get_user_pages*()
callers that don't fit neatly into any of the four cases that are so
far listed in pin_user_pages.rst. vhost.c is one of those.
Add a Case 5 to the documentation, and refer to that when converting
vhost.c.
Thanks to Jan Kara for helping me (again) in understanding the
interaction between get_user_pages() and page writeback [1].
This is based on today's mmotm, which has a nearby patch to
pin_user_pages.rst that rewords cases 3 and 4.
Note that I have only compile-tested the vhost.c patch, although that
does also include cross-compiling for a few other arches. Any run-time
testing would be greatly appreciated.
[1] https://lore.kernel.org/r/[email protected]
John Hubbard (2):
docs: mm/gup: pin_user_pages.rst: add a "case 5"
vhost: convert get_user_pages() --> pin_user_pages()
Documentation/core-api/pin_user_pages.rst | 20 ++++++++++++++++++++
drivers/vhost/vhost.c | 5 ++---
2 files changed, 22 insertions(+), 3 deletions(-)
--
2.26.2
There are four cases listed in pin_user_pages.rst. These are
intended to help developers figure out whether to use
get_user_pages*(), or pin_user_pages*(). However, the four cases
do not cover all the situations. For example, drivers/vhost/vhost.c
has a "pin, write to page, set page dirty, unpin" case.
Add a fifth case, to help explain that there is a general pattern
that requires pin_user_pages*() API calls.
Cc: Vlastimil Babka <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: John Hubbard <[email protected]>
---
Documentation/core-api/pin_user_pages.rst | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index 4675b04e8829..b9f2688a2c67 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -171,6 +171,26 @@ If only struct page data (as opposed to the actual memory contents that a page
is tracking) is affected, then normal GUP calls are sufficient, and neither flag
needs to be set.
+CASE 5: Pinning in order to write to the data within the page
+-------------------------------------------------------------
+Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
+access page's data, unpin" can cause a problem. Case 5 may be considered a
+superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
+other words, if the code is neither Case 1 nor Case 2, it may still require
+FOLL_PIN, for patterns like this:
+
+Correct (uses FOLL_PIN calls):
+ pin_user_pages()
+ access the data within the pages
+ set_page_dirty_lock()
+ unpin_user_pages()
+
+INCORRECT (uses FOLL_GET calls):
+ get_user_pages()
+ access the data within the pages
+ set_page_dirty_lock()
+ put_page()
+
page_maybe_dma_pinned(): the whole point of pinning
===================================================
--
2.26.2
This code was using get_user_pages*(), in approximately a "Case 5"
scenario (accessing the data within a page), using the categorization
from [1]. That means that it's time to convert the get_user_pages*() +
put_page() calls to pin_user_pages*() + unpin_user_pages() calls.
There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.
[1] Documentation/core-api/pin_user_pages.rst
[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/
Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: John Hubbard <[email protected]>
---
drivers/vhost/vhost.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 21a59b598ed8..596132a96cd5 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1762,15 +1762,14 @@ static int set_bit_to_user(int nr, void __user *addr)
int bit = nr + (log % PAGE_SIZE) * 8;
int r;
- r = get_user_pages_fast(log, 1, FOLL_WRITE, &page);
+ r = pin_user_pages_fast(log, 1, FOLL_WRITE, &page);
if (r < 0)
return r;
BUG_ON(r != 1);
base = kmap_atomic(page);
set_bit(bit, base);
kunmap_atomic(base);
- set_page_dirty_lock(page);
- put_page(page);
+ unpin_user_pages_dirty_lock(&page, 1, true);
return 0;
}
--
2.26.2
On Sat, May 30, 2020 at 5:13 AM John Hubbard <[email protected]> wrote:
>
> There are four cases listed in pin_user_pages.rst. These are
> intended to help developers figure out whether to use
> get_user_pages*(), or pin_user_pages*(). However, the four cases
> do not cover all the situations. For example, drivers/vhost/vhost.c
> has a "pin, write to page, set page dirty, unpin" case.
>
> Add a fifth case, to help explain that there is a general pattern
> that requires pin_user_pages*() API calls.
>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: Dave Chinner <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: John Hubbard <[email protected]>
> ---
> Documentation/core-api/pin_user_pages.rst | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
> index 4675b04e8829..b9f2688a2c67 100644
> --- a/Documentation/core-api/pin_user_pages.rst
> +++ b/Documentation/core-api/pin_user_pages.rst
> @@ -171,6 +171,26 @@ If only struct page data (as opposed to the actual memory contents that a page
> is tracking) is affected, then normal GUP calls are sufficient, and neither flag
> needs to be set.
>
> +CASE 5: Pinning in order to write to the data within the page
> +-------------------------------------------------------------
> +Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
> +access page's data, unpin" can cause a problem.
Will it be, *"pin, access page's data, set page dirty, unpin" * ?
Case 5 may be considered a
> +superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
> +other words, if the code is neither Case 1 nor Case 2, it may still require
> +FOLL_PIN, for patterns like this:
> +
> +Correct (uses FOLL_PIN calls):
> + pin_user_pages()
> + access the data within the pages
> + set_page_dirty_lock()
> + unpin_user_pages()
> +
> +INCORRECT (uses FOLL_GET calls):
> + get_user_pages()
> + access the data within the pages
> + set_page_dirty_lock()
> + put_page()
> +
> page_maybe_dma_pinned(): the whole point of pinning
> ===================================================
>
> --
> 2.26.2
>
On 2020-05-31 00:11, Souptick Joarder wrote:
...
>> diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
>> index 4675b04e8829..b9f2688a2c67 100644
>> --- a/Documentation/core-api/pin_user_pages.rst
>> +++ b/Documentation/core-api/pin_user_pages.rst
>> @@ -171,6 +171,26 @@ If only struct page data (as opposed to the actual memory contents that a page
>> is tracking) is affected, then normal GUP calls are sufficient, and neither flag
>> needs to be set.
>>
>> +CASE 5: Pinning in order to write to the data within the page
>> +-------------------------------------------------------------
>> +Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
>> +access page's data, unpin" can cause a problem.
>
> Will it be, *"pin, access page's data, set page dirty, unpin" * ?
Well...the problem can show up with just accessing (writing) the data.
But it is true that this statement is a little different from the
patterns below, which is confusing. I'll delete set_page_dirty() from each
of them, in order to avoid confusing things. (Although each is correct.)
And I'll also change the above to "pin, write to a page's data, upin".
set_page_dirty() interactions are really just extra credit here. :) And
fully read-only situations won't cause a problem.
>
> Case 5 may be considered a
>> +superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
>> +other words, if the code is neither Case 1 nor Case 2, it may still require
>> +FOLL_PIN, for patterns like this:
>> +
>> +Correct (uses FOLL_PIN calls):
>> + pin_user_pages()
>> + access the data within the pages
>> + set_page_dirty_lock()
>> + unpin_user_pages()
>> +
>> +INCORRECT (uses FOLL_GET calls):
>> + get_user_pages()
>> + access the data within the pages
>> + set_page_dirty_lock()
>> + put_page()
>> +
I'll send a v2 shortly.
thanks,
--
John Hubbard
NVIDIA
On Fri 29-05-20 16:43:09, John Hubbard wrote:
> This code was using get_user_pages*(), in approximately a "Case 5"
> scenario (accessing the data within a page), using the categorization
> from [1]. That means that it's time to convert the get_user_pages*() +
> put_page() calls to pin_user_pages*() + unpin_user_pages() calls.
>
> There is some helpful background in [2]: basically, this is a small
> part of fixing a long-standing disconnect between pinning pages, and
> file systems' use of those pages.
>
> [1] Documentation/core-api/pin_user_pages.rst
>
> [2] "Explicit pinning of user-space pages":
> https://lwn.net/Articles/807108/
>
> Cc: Michael S. Tsirkin <[email protected]>
> Cc: Jason Wang <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: John Hubbard <[email protected]>
Looks good to me. You can add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> drivers/vhost/vhost.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 21a59b598ed8..596132a96cd5 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1762,15 +1762,14 @@ static int set_bit_to_user(int nr, void __user *addr)
> int bit = nr + (log % PAGE_SIZE) * 8;
> int r;
>
> - r = get_user_pages_fast(log, 1, FOLL_WRITE, &page);
> + r = pin_user_pages_fast(log, 1, FOLL_WRITE, &page);
> if (r < 0)
> return r;
> BUG_ON(r != 1);
> base = kmap_atomic(page);
> set_bit(bit, base);
> kunmap_atomic(base);
> - set_page_dirty_lock(page);
> - put_page(page);
> + unpin_user_pages_dirty_lock(&page, 1, true);
> return 0;
> }
>
> --
> 2.26.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Fri, May 29, 2020 at 04:43:09PM -0700, John Hubbard wrote:
> This code was using get_user_pages*(), in approximately a "Case 5"
> scenario (accessing the data within a page), using the categorization
> from [1]. That means that it's time to convert the get_user_pages*() +
> put_page() calls to pin_user_pages*() + unpin_user_pages() calls.
>
> There is some helpful background in [2]: basically, this is a small
> part of fixing a long-standing disconnect between pinning pages, and
> file systems' use of those pages.
>
> [1] Documentation/core-api/pin_user_pages.rst
>
> [2] "Explicit pinning of user-space pages":
> https://lwn.net/Articles/807108/
>
> Cc: Michael S. Tsirkin <[email protected]>
> Cc: Jason Wang <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: John Hubbard <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
> ---
> drivers/vhost/vhost.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 21a59b598ed8..596132a96cd5 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1762,15 +1762,14 @@ static int set_bit_to_user(int nr, void __user *addr)
> int bit = nr + (log % PAGE_SIZE) * 8;
> int r;
>
> - r = get_user_pages_fast(log, 1, FOLL_WRITE, &page);
> + r = pin_user_pages_fast(log, 1, FOLL_WRITE, &page);
> if (r < 0)
> return r;
> BUG_ON(r != 1);
> base = kmap_atomic(page);
> set_bit(bit, base);
> kunmap_atomic(base);
> - set_page_dirty_lock(page);
> - put_page(page);
> + unpin_user_pages_dirty_lock(&page, 1, true);
> return 0;
> }
>
> --
> 2.26.2
On Fri, May 29, 2020 at 04:43:08PM -0700, John Hubbard wrote:
> +CASE 5: Pinning in order to write to the data within the page
> +-------------------------------------------------------------
> +Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
> +access page's data, unpin" can cause a problem. Case 5 may be considered a
> +superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
> +other words, if the code is neither Case 1 nor Case 2, it may still require
> +FOLL_PIN, for patterns like this:
> +
> +Correct (uses FOLL_PIN calls):
> + pin_user_pages()
> + access the data within the pages
> + set_page_dirty_lock()
> + unpin_user_pages()
> +
> +INCORRECT (uses FOLL_GET calls):
> + get_user_pages()
> + access the data within the pages
> + set_page_dirty_lock()
> + put_page()
Why does this case need to pin? Why can't it just do ...
get_user_pages()
lock_page(page);
... modify the data ...
set_page_dirty(page);
unlock_page(page);
On 2020-06-12 12:24, Matthew Wilcox wrote:
> On Fri, May 29, 2020 at 04:43:08PM -0700, John Hubbard wrote:
>> +CASE 5: Pinning in order to write to the data within the page
>> +-------------------------------------------------------------
>> +Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
>> +access page's data, unpin" can cause a problem. Case 5 may be considered a
>> +superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
>> +other words, if the code is neither Case 1 nor Case 2, it may still require
>> +FOLL_PIN, for patterns like this:
>> +
>> +Correct (uses FOLL_PIN calls):
>> + pin_user_pages()
>> + access the data within the pages
>> + set_page_dirty_lock()
>> + unpin_user_pages()
>> +
>> +INCORRECT (uses FOLL_GET calls):
>> + get_user_pages()
>> + access the data within the pages
>> + set_page_dirty_lock()
>> + put_page()
>
> Why does this case need to pin? Why can't it just do ...
>
> get_user_pages()
> lock_page(page);
> ... modify the data ...
> set_page_dirty(page);
> unlock_page(page);
>
Yes, it could do that. And that would also make a good additional "correct"
example. Especially for the case of just dealing with a single page,
lock_page() has the benefit of completely fixing the problem *today*,
without waiting for the pin_user_pages*() handling improvements to get
implemented.
And it's also another (probably better) way to fix the vhost.c problem, than
commit 690623e1b496 ("vhost: convert get_user_pages() --> pin_user_pages()").
I'm inclined to leave vhost.c alone for now, unless someone really prefers
it to be changed, but to update the Case 5 documentation with your point
above. Sound about right?
thanks,
--
John Hubbard
NVIDIA