2021-02-03 22:07:20

by Joao Martins

[permalink] [raw]
Subject: [PATCH 3/4] mm/gup: add a range variant of unpin_user_pages_dirty_lock()

Add a unpin_user_page_range() API which takes a starting page
and how many consecutive pages we want to dirty.

Given that we won't be iterating on a list of changes, change
compound_next() to receive a bool, whether to calculate from the starting
page, or walk the page array. Finally add a separate iterator,
for_each_compound_range() that just operate in page ranges as opposed
to page array.

For users (like RDMA mr_dereg) where each sg represents a
contiguous set of pages, we're able to more efficiently unpin
pages without having to supply an array of pages much of what
happens today with unpin_user_pages().

Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Joao Martins <[email protected]>
---
include/linux/mm.h | 2 ++
mm/gup.c | 48 ++++++++++++++++++++++++++++++++++++++--------
2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a608feb0d42e..b76063f7f18a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1265,6 +1265,8 @@ static inline void put_page(struct page *page)
void unpin_user_page(struct page *page);
void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
bool make_dirty);
+void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
+ bool make_dirty);
void unpin_user_pages(struct page **pages, unsigned long npages);

/**
diff --git a/mm/gup.c b/mm/gup.c
index 971a24b4b73f..1b57355d5033 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -215,11 +215,16 @@ void unpin_user_page(struct page *page)
}
EXPORT_SYMBOL(unpin_user_page);

-static inline unsigned int count_ntails(struct page **pages, unsigned long npages)
+static inline unsigned int count_ntails(struct page **pages,
+ unsigned long npages, bool range)
{
- struct page *head = compound_head(pages[0]);
+ struct page *page = pages[0], *head = compound_head(page);
unsigned int ntails;

+ if (range)
+ return (!PageCompound(head) || compound_order(head) <= 1) ? 1 :
+ min_t(unsigned int, (head + compound_nr(head) - page), npages);
+
for (ntails = 1; ntails < npages; ntails++) {
if (compound_head(pages[ntails]) != head)
break;
@@ -229,20 +234,32 @@ static inline unsigned int count_ntails(struct page **pages, unsigned long npage
}

static inline void compound_next(unsigned long i, unsigned long npages,
- struct page **list, struct page **head,
- unsigned int *ntails)
+ struct page **list, bool range,
+ struct page **head, unsigned int *ntails)
{
+ struct page *p, **next = &p;
+
if (i >= npages)
return;

- *ntails = count_ntails(list + i, npages - i);
- *head = compound_head(list[i]);
+ if (range)
+ *next = *list + i;
+ else
+ next = list + i;
+
+ *ntails = count_ntails(next, npages - i, range);
+ *head = compound_head(*next);
}

+#define for_each_compound_range(i, list, npages, head, ntails) \
+ for (i = 0, compound_next(i, npages, list, true, &head, &ntails); \
+ i < npages; i += ntails, \
+ compound_next(i, npages, list, true, &head, &ntails))
+
#define for_each_compound_head(i, list, npages, head, ntails) \
- for (i = 0, compound_next(i, npages, list, &head, &ntails); \
+ for (i = 0, compound_next(i, npages, list, false, &head, &ntails); \
i < npages; i += ntails, \
- compound_next(i, npages, list, &head, &ntails))
+ compound_next(i, npages, list, false, &head, &ntails))

/**
* unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
@@ -306,6 +323,21 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
}
EXPORT_SYMBOL(unpin_user_pages_dirty_lock);

+void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
+ bool make_dirty)
+{
+ unsigned long index;
+ struct page *head;
+ unsigned int ntails;
+
+ for_each_compound_range(index, &page, npages, head, ntails) {
+ if (make_dirty && !PageDirty(head))
+ set_page_dirty_lock(head);
+ put_compound_head(head, ntails, FOLL_PIN);
+ }
+}
+EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
+
/**
* unpin_user_pages() - release an array of gup-pinned pages.
* @pages: array of pages to be marked dirty and released.
--
2.17.1


2021-02-03 23:42:11

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/gup: add a range variant of unpin_user_pages_dirty_lock()

On 2/3/21 2:00 PM, Joao Martins wrote:
> Add a unpin_user_page_range() API which takes a starting page
> and how many consecutive pages we want to dirty.
>
> Given that we won't be iterating on a list of changes, change
> compound_next() to receive a bool, whether to calculate from the starting
> page, or walk the page array. Finally add a separate iterator,

A bool arg is sometimes, but not always, a hint that you really just want
a separate set of routines. Below...

> for_each_compound_range() that just operate in page ranges as opposed
> to page array.
>
> For users (like RDMA mr_dereg) where each sg represents a
> contiguous set of pages, we're able to more efficiently unpin
> pages without having to supply an array of pages much of what
> happens today with unpin_user_pages().
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Joao Martins <[email protected]>
> ---
> include/linux/mm.h | 2 ++
> mm/gup.c | 48 ++++++++++++++++++++++++++++++++++++++--------
> 2 files changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a608feb0d42e..b76063f7f18a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1265,6 +1265,8 @@ static inline void put_page(struct page *page)
> void unpin_user_page(struct page *page);
> void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
> bool make_dirty);
> +void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
> + bool make_dirty);
> void unpin_user_pages(struct page **pages, unsigned long npages);
>
> /**
> diff --git a/mm/gup.c b/mm/gup.c
> index 971a24b4b73f..1b57355d5033 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -215,11 +215,16 @@ void unpin_user_page(struct page *page)
> }
> EXPORT_SYMBOL(unpin_user_page);
>
> -static inline unsigned int count_ntails(struct page **pages, unsigned long npages)
> +static inline unsigned int count_ntails(struct page **pages,
> + unsigned long npages, bool range)
> {
> - struct page *head = compound_head(pages[0]);
> + struct page *page = pages[0], *head = compound_head(page);
> unsigned int ntails;
>
> + if (range)
> + return (!PageCompound(head) || compound_order(head) <= 1) ? 1 :
> + min_t(unsigned int, (head + compound_nr(head) - page), npages);

Here, you clearly should use a separate set of _range routines. Because you're basically
creating two different routines here! Keep it simple.

Once you're in a separate routine, you might feel more comfortable expanding that to
a more readable form, too:

if (!PageCompound(head) || compound_order(head) <= 1)
return 1;

return min_t(unsigned int, (head + compound_nr(head) - page), npages);


thanks,
--
John Hubbard
NVIDIA

> +
> for (ntails = 1; ntails < npages; ntails++) {
> if (compound_head(pages[ntails]) != head)
> break;
> @@ -229,20 +234,32 @@ static inline unsigned int count_ntails(struct page **pages, unsigned long npage
> }
>
> static inline void compound_next(unsigned long i, unsigned long npages,
> - struct page **list, struct page **head,
> - unsigned int *ntails)
> + struct page **list, bool range,
> + struct page **head, unsigned int *ntails)
> {
> + struct page *p, **next = &p;
> +
> if (i >= npages)
> return;
>
> - *ntails = count_ntails(list + i, npages - i);
> - *head = compound_head(list[i]);
> + if (range)
> + *next = *list + i;
> + else
> + next = list + i;
> +
> + *ntails = count_ntails(next, npages - i, range);
> + *head = compound_head(*next);
> }
>
> +#define for_each_compound_range(i, list, npages, head, ntails) \
> + for (i = 0, compound_next(i, npages, list, true, &head, &ntails); \
> + i < npages; i += ntails, \
> + compound_next(i, npages, list, true, &head, &ntails))
> +
> #define for_each_compound_head(i, list, npages, head, ntails) \
> - for (i = 0, compound_next(i, npages, list, &head, &ntails); \
> + for (i = 0, compound_next(i, npages, list, false, &head, &ntails); \
> i < npages; i += ntails, \
> - compound_next(i, npages, list, &head, &ntails))
> + compound_next(i, npages, list, false, &head, &ntails))
>
> /**
> * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
> @@ -306,6 +323,21 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
> }
> EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
>
> +void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
> + bool make_dirty)
> +{
> + unsigned long index;
> + struct page *head;
> + unsigned int ntails;
> +
> + for_each_compound_range(index, &page, npages, head, ntails) {
> + if (make_dirty && !PageDirty(head))
> + set_page_dirty_lock(head);
> + put_compound_head(head, ntails, FOLL_PIN);
> + }
> +}
> +EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
> +
> /**
> * unpin_user_pages() - release an array of gup-pinned pages.
> * @pages: array of pages to be marked dirty and released.
>

2021-02-04 02:27:52

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/gup: add a range variant of unpin_user_pages_dirty_lock()

On 2/3/21 2:00 PM, Joao Martins wrote:
...
> +void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
> + bool make_dirty)
> +{
> + unsigned long index;
> + struct page *head;
> + unsigned int ntails;
> +
> + for_each_compound_range(index, &page, npages, head, ntails) {
> + if (make_dirty && !PageDirty(head))
> + set_page_dirty_lock(head);
> + put_compound_head(head, ntails, FOLL_PIN);
> + }
> +}
> +EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
> +

Also, looking at this again while trying to verify the sg diffs in patch #4, I noticed
that the name is tricky. Usually a "range" would not have a single struct page* as the
argument. So in this case, perhaps a comment above the function would help, something
approximately like this:

/*
* The "range" in the function name refers to the fact that the page may be a
* compound page. If so, then that compound page will be treated as one or more
* ranges of contiguous tail pages.
*/

...I guess. Or maybe the name is just wrong (a comment block explaining a name is
always a bad sign). Perhaps we should rename it to something like:

unpin_user_compound_page_dirty_lock()

?

thanks,
--
John Hubbard
NVIDIA

2021-02-04 11:42:02

by Joao Martins

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/gup: add a range variant of unpin_user_pages_dirty_lock()



On 2/3/21 11:37 PM, John Hubbard wrote:
> On 2/3/21 2:00 PM, Joao Martins wrote:
>> Add a unpin_user_page_range() API which takes a starting page
>> and how many consecutive pages we want to dirty.
>>
>> Given that we won't be iterating on a list of changes, change
>> compound_next() to receive a bool, whether to calculate from the starting
>> page, or walk the page array. Finally add a separate iterator,
>
> A bool arg is sometimes, but not always, a hint that you really just want
> a separate set of routines. Below...
>
Yes.

I was definitely wrestling back and forth a lot about having separate routines for two
different iterators helpers i.e. compound_next_head()or having it all merged into one
compound_next() / count_ntails().

>> for_each_compound_range() that just operate in page ranges as opposed
>> to page array.
>>
>> For users (like RDMA mr_dereg) where each sg represents a
>> contiguous set of pages, we're able to more efficiently unpin
>> pages without having to supply an array of pages much of what
>> happens today with unpin_user_pages().
>>
>> Suggested-by: Jason Gunthorpe <[email protected]>
>> Signed-off-by: Joao Martins <[email protected]>
>> ---
>> include/linux/mm.h | 2 ++
>> mm/gup.c | 48 ++++++++++++++++++++++++++++++++++++++--------
>> 2 files changed, 42 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index a608feb0d42e..b76063f7f18a 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1265,6 +1265,8 @@ static inline void put_page(struct page *page)
>> void unpin_user_page(struct page *page);
>> void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>> bool make_dirty);
>> +void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
>> + bool make_dirty);
>> void unpin_user_pages(struct page **pages, unsigned long npages);
>>
>> /**
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 971a24b4b73f..1b57355d5033 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -215,11 +215,16 @@ void unpin_user_page(struct page *page)
>> }
>> EXPORT_SYMBOL(unpin_user_page);
>>
>> -static inline unsigned int count_ntails(struct page **pages, unsigned long npages)
>> +static inline unsigned int count_ntails(struct page **pages,
>> + unsigned long npages, bool range)
>> {
>> - struct page *head = compound_head(pages[0]);
>> + struct page *page = pages[0], *head = compound_head(page);
>> unsigned int ntails;
>>
>> + if (range)
>> + return (!PageCompound(head) || compound_order(head) <= 1) ? 1 :
>> + min_t(unsigned int, (head + compound_nr(head) - page), npages);
>
> Here, you clearly should use a separate set of _range routines. Because you're basically
> creating two different routines here! Keep it simple.
>
> Once you're in a separate routine, you might feel more comfortable expanding that to
> a more readable form, too:
>
> if (!PageCompound(head) || compound_order(head) <= 1)
> return 1;
>
> return min_t(unsigned int, (head + compound_nr(head) - page), npages);
>
Yes.

Let me also try instead to put move everything into two sole iterator helper routines,
compound_next() and compound_next_range(), and thus get rid of this count_ntails(). It
should also help in removing a compound_head() call which should save cycles.

Joao

2021-02-04 16:36:58

by Joao Martins

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/gup: add a range variant of unpin_user_pages_dirty_lock()

On 2/4/21 11:35 AM, Joao Martins wrote:
> On 2/3/21 11:37 PM, John Hubbard wrote:
>> On 2/3/21 2:00 PM, Joao Martins wrote:
>>> -static inline unsigned int count_ntails(struct page **pages, unsigned long npages)
>>> +static inline unsigned int count_ntails(struct page **pages,
>>> + unsigned long npages, bool range)
>>> {
>>> - struct page *head = compound_head(pages[0]);
>>> + struct page *page = pages[0], *head = compound_head(page);
>>> unsigned int ntails;
>>>
>>> + if (range)
>>> + return (!PageCompound(head) || compound_order(head) <= 1) ? 1 :
>>> + min_t(unsigned int, (head + compound_nr(head) - page), npages);
>>
>> Here, you clearly should use a separate set of _range routines. Because you're basically
>> creating two different routines here! Keep it simple.
>>
>> Once you're in a separate routine, you might feel more comfortable expanding that to
>> a more readable form, too:
>>
>> if (!PageCompound(head) || compound_order(head) <= 1)
>> return 1;
>>
>> return min_t(unsigned int, (head + compound_nr(head) - page), npages);
>>
> Yes.
>
> Let me also try instead to put move everything into two sole iterator helper routines,
> compound_next() and compound_next_range(), and thus get rid of this count_ntails(). It
> should also help in removing a compound_head() call which should save cycles.
>

As mentioned earlier, I got rid of count_ntails and the ugly boolean. Plus addressed the
missing docs -- fwiw, I borrowed unpin_user_pages_dirty_lock() docs and modified a bit.

Partial diff below, hopefully it is looking better now:

diff --git a/mm/gup.c b/mm/gup.c
index 5a3dd235017a..4ef36c8990e3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -215,6 +215,34 @@ void unpin_user_page(struct page *page)
}
EXPORT_SYMBOL(unpin_user_page);

+static inline void range_next(unsigned long i, unsigned long npages,
+ struct page **list, struct page **head,
+ unsigned int *ntails)
+{
+ struct page *next, *page;
+ unsigned int nr = 1;
+
+ if (i >= npages)
+ return;
+
+ npages -= i;
+ next = *list + i;
+
+ page = compound_head(next);
+ if (PageCompound(page) && compound_order(page) > 1)
+ nr = min_t(unsigned int,
+ page + compound_nr(page) - next, npages);
+
+ *head = page;
+ *ntails = nr;
+}
+
+#define for_each_compound_range(__i, __list, __npages, __head, __ntails) \
+ for (__i = 0, \
+ range_next(__i, __npages, __list, &(__head), &(__ntails)); \
+ __i < __npages; __i += __ntails, \
+ range_next(__i, __npages, __list, &(__head), &(__ntails)))
+
static inline void compound_next(unsigned long i, unsigned long npages,
struct page **list, struct page **head,
unsigned int *ntails)
@@ -306,6 +334,42 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long
npages,
}
EXPORT_SYMBOL(unpin_user_pages_dirty_lock);

+/**
+ * unpin_user_page_range_dirty_lock() - release and optionally dirty
+ * gup-pinned page range
+ *
+ * @page: the starting page of a range maybe marked dirty, and definitely released.
+ * @npages: number of consecutive pages to release.
+ * @make_dirty: whether to mark the pages dirty
+ *
+ * "gup-pinned page range" refers to a range of pages that has had one of the
+ * get_user_pages() variants called on that page.
+ *
+ * For the page ranges defined by [page .. page+npages], make that range (or
+ * its head pages, if a compound page) dirty, if @make_dirty is true, and if the
+ * page range was previously listed as clean.
+ *
+ * set_page_dirty_lock() is used internally. If instead, set_page_dirty() is
+ * required, then the caller should a) verify that this is really correct,
+ * because _lock() is usually required, and b) hand code it:
+ * set_page_dirty_lock(), unpin_user_page().
+ *
+ */
+void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
+ bool make_dirty)
+{
+ unsigned long index;
+ struct page *head;
+ unsigned int ntails;
+
+ for_each_compound_range(index, &page, npages, head, ntails) {
+ if (make_dirty && !PageDirty(head))
+ set_page_dirty_lock(head);
+ put_compound_head(head, ntails, FOLL_PIN);
+ }
+}
+EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
+

2021-02-05 00:02:34

by Joao Martins

[permalink] [raw]
Subject: Re: [PATCH 3/4] mm/gup: add a range variant of unpin_user_pages_dirty_lock()



On 2/4/21 12:11 AM, John Hubbard wrote:
> On 2/3/21 2:00 PM, Joao Martins wrote:
> ...
>> +void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
>> + bool make_dirty)
>> +{
>> + unsigned long index;
>> + struct page *head;
>> + unsigned int ntails;
>> +
>> + for_each_compound_range(index, &page, npages, head, ntails) {
>> + if (make_dirty && !PageDirty(head))
>> + set_page_dirty_lock(head);
>> + put_compound_head(head, ntails, FOLL_PIN);
>> + }
>> +}
>> +EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
>> +
>
> Also, looking at this again while trying to verify the sg diffs in patch #4, I noticed
> that the name is tricky. Usually a "range" would not have a single struct page* as the
> argument. So in this case, perhaps a comment above the function would help, something
> approximately like this:
>
> /*
> * The "range" in the function name refers to the fact that the page may be a
> * compound page. If so, then that compound page will be treated as one or more
> * ranges of contiguous tail pages.
> */
>
> ...I guess. Or maybe the name is just wrong (a comment block explaining a name is
> always a bad sign).

Naming aside, a comment is probably worth nonetheless to explain what the function does.

> Perhaps we should rename it to something like:
>
> unpin_user_compound_page_dirty_lock()
>
> ?

The thing though, is that it doesn't *only* unpin a compound page. It unpins a contiguous
range of pages (hence page_range) *and* if these are compound pages it further accelerates
things.

Albeit, your name suggestion then probably hints the caller that you should be passing a
compound page anyways, so your suggestion does have a ring to it.

Joao