2022-08-27 08:38:20

by John Hubbard

Subject: [PATCH 3/6] iov_iter: new iov_iter_pin_pages*() routines

Provide two new wrapper routines that are intended for user space pages
only:

iov_iter_pin_pages()
iov_iter_pin_pages_alloc()

Internally, these routines call pin_user_pages_fast(), instead of
get_user_pages_fast().

As always, callers must use unpin_user_pages() or a suitable FOLL_PIN
variant, to release the pages, if they actually were acquired via
pin_user_pages_fast().

This is a prerequisite to converting bio/block layers over to use
pin_user_pages_fast().

Signed-off-by: John Hubbard <[email protected]>
---
include/linux/uio.h | 4 +++
lib/iov_iter.c | 74 ++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 5896af36199c..e26908e443d1 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -251,6 +251,10 @@ ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
size_t maxsize, unsigned maxpages, size_t *start);
ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
size_t maxsize, size_t *start);
+ssize_t iov_iter_pin_pages(struct iov_iter *i, struct page **pages,
+ size_t maxsize, unsigned int maxpages, size_t *start);
+ssize_t iov_iter_pin_pages_alloc(struct iov_iter *i, struct page ***pages,
+ size_t maxsize, size_t *start);
int iov_iter_npages(const struct iov_iter *i, int maxpages);
void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state);

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 4b7fce72e3e5..1c08014c8498 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1425,9 +1425,23 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
return page;
}

+enum pages_alloc_internal_flags {
+ USE_FOLL_GET,
+ USE_FOLL_PIN
+};
+
+/*
+ * TODO: get rid of the how_to_pin arg, and just call pin_user_pages_fast()
+ * unconditionally for the user_backed_iter(i) case in this function. That can be
+ * done once all callers are ready to deal with FOLL_PIN pages for their
+ * user-space pages. (FOLL_PIN pages must be released via unpin_user_page(),
+ * rather than put_page().)
+ */
static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
struct page ***pages, size_t maxsize,
- unsigned int maxpages, size_t *start)
+ unsigned int maxpages, size_t *start,
+ enum pages_alloc_internal_flags how_to_pin)
+
{
unsigned int n;

@@ -1454,7 +1468,12 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
n = want_pages_array(pages, maxsize, *start, maxpages);
if (!n)
return -ENOMEM;
- res = get_user_pages_fast(addr, n, gup_flags, *pages);
+
+ if (how_to_pin == USE_FOLL_PIN)
+ res = pin_user_pages_fast(addr, n, gup_flags, *pages);
+ else
+ res = get_user_pages_fast(addr, n, gup_flags, *pages);
+
if (unlikely(res <= 0))
return res;
maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - *start);
@@ -1497,10 +1516,31 @@ ssize_t iov_iter_get_pages2(struct iov_iter *i,
return 0;
BUG_ON(!pages);

- return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start);
+ return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start,
+ USE_FOLL_GET);
}
EXPORT_SYMBOL(iov_iter_get_pages2);

+/*
+ * A FOLL_PIN variant that calls pin_user_pages_fast() instead of
+ * get_user_pages_fast().
+ */
+ssize_t iov_iter_pin_pages(struct iov_iter *i,
+ struct page **pages, size_t maxsize, unsigned int maxpages,
+ size_t *start)
+{
+ if (!maxpages)
+ return 0;
+ if (WARN_ON_ONCE(!pages))
+ return -EINVAL;
+ if (WARN_ON_ONCE(!user_backed_iter(i)))
+ return -EINVAL;
+
+ return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start,
+ USE_FOLL_PIN);
+}
+EXPORT_SYMBOL(iov_iter_pin_pages);
+
ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
struct page ***pages, size_t maxsize,
size_t *start)
@@ -1509,7 +1549,8 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,

*pages = NULL;

- len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start);
+ len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
+ USE_FOLL_GET);
if (len <= 0) {
kvfree(*pages);
*pages = NULL;
@@ -1518,6 +1559,31 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc2);

+/*
+ * A FOLL_PIN variant that calls pin_user_pages_fast() instead of
+ * get_user_pages_fast().
+ */
+ssize_t iov_iter_pin_pages_alloc(struct iov_iter *i,
+ struct page ***pages, size_t maxsize,
+ size_t *start)
+{
+ ssize_t len;
+
+ *pages = NULL;
+
+ if (WARN_ON_ONCE(!user_backed_iter(i)))
+ return -EINVAL;
+
+ len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start,
+ USE_FOLL_PIN);
+ if (len <= 0) {
+ kvfree(*pages);
+ *pages = NULL;
+ }
+ return len;
+}
+EXPORT_SYMBOL(iov_iter_pin_pages_alloc);
+
size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
--
2.37.2


2022-08-27 22:50:39

by Al Viro

Subject: Re: [PATCH 3/6] iov_iter: new iov_iter_pin_pages*() routines

On Sat, Aug 27, 2022 at 01:36:04AM -0700, John Hubbard wrote:
> Provide two new wrapper routines that are intended for user space pages
> only:
>
> iov_iter_pin_pages()
> iov_iter_pin_pages_alloc()
>
> Internally, these routines call pin_user_pages_fast(), instead of
> get_user_pages_fast().
>
> As always, callers must use unpin_user_pages() or a suitable FOLL_PIN
> variant, to release the pages, if they actually were acquired via
> pin_user_pages_fast().
>
> This is a prerequisite to converting bio/block layers over to use
> pin_user_pages_fast().

You do realize that O_DIRECT on ITER_BVEC is possible, right?

2022-08-27 22:51:23

by John Hubbard

Subject: Re: [PATCH 3/6] iov_iter: new iov_iter_pin_pages*() routines

On 8/27/22 15:46, Al Viro wrote:
> On Sat, Aug 27, 2022 at 01:36:04AM -0700, John Hubbard wrote:
>> Provide two new wrapper routines that are intended for user space pages
>> only:
>>
>> iov_iter_pin_pages()
>> iov_iter_pin_pages_alloc()
>>
>> Internally, these routines call pin_user_pages_fast(), instead of
>> get_user_pages_fast().
>>
>> As always, callers must use unpin_user_pages() or a suitable FOLL_PIN
>> variant, to release the pages, if they actually were acquired via
>> pin_user_pages_fast().
>>
>> This is a prerequisite to converting bio/block layers over to use
>> pin_user_pages_fast().
>
> You do realize that O_DIRECT on ITER_BVEC is possible, right?

I do now. But I didn't until you wrote this!

Any advice or ideas would be welcome here.


thanks,

--
John Hubbard
NVIDIA
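
For readers following the thread, the ITER_BVEC concern can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not kernel code and not a proposed fix: every `toy_*` name is hypothetical, and the two counters merely stand in for plain page references versus FOLL_PIN references. It shows that a pin-only routine rejects a bvec-backed iterator with -EINVAL (as the patch's WARN_ON_ONCE() path does), so a caller that can see both iterator kinds has to pick the acquire routine by type and remember which release call matches.

```c
/* User-space toy model only. All toy_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_iter_type { TOY_ITER_USER, TOY_ITER_BVEC };

struct toy_page {
	int refs;	/* stands in for plain references (put_page() side) */
	int pins;	/* stands in for FOLL_PIN references (unpin side) */
};

static bool toy_user_backed(enum toy_iter_type type)
{
	return type == TOY_ITER_USER;
}

/* Models the pin-only wrapper: refuses iterators that are not user-backed. */
static int toy_pin_pages(enum toy_iter_type type, struct toy_page *pages, int n)
{
	if (!toy_user_backed(type))
		return -EINVAL;
	for (int i = 0; i < n; i++)
		pages[i].pins++;
	return n;
}

/* Models the existing "get" path for the two iterator types in this toy. */
static int toy_get_pages(enum toy_iter_type type, struct toy_page *pages, int n)
{
	(void)type;
	for (int i = 0; i < n; i++)
		pages[i].refs++;
	return n;
}

/*
 * Hypothetical caller: choose the acquire routine by iterator type and
 * record the choice in *pinned so the release side can match it.
 */
static int toy_acquire(enum toy_iter_type type, struct toy_page *pages, int n,
		       bool *pinned)
{
	*pinned = toy_user_backed(type);
	if (*pinned)
		return toy_pin_pages(type, pages, n);
	return toy_get_pages(type, pages, n);
}

/* Release with the call that matches how the pages were acquired. */
static void toy_release(struct toy_page *pages, int n, bool pinned)
{
	for (int i = 0; i < n; i++) {
		if (pinned) {
			assert(pages[i].pins > 0);
			pages[i].pins--;
		} else {
			assert(pages[i].refs > 0);
			pages[i].refs--;
		}
	}
}
```

In this toy, the `pinned` flag is the extra state a caller would need to carry from acquire to release. Whether the real block layer should track that per bio, per page, or some other way is exactly the open question in the thread, and the sketch takes no position on it.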