2023-05-26 22:11:37

by David Howells

[permalink] [raw]
Subject: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()

Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
to it from the page tables and make unpin_user_page*() correspondingly
ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a
zero page's refcount as we're only allowed ~2 million pins on it -
something that userspace can conceivably trigger.

Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.

Signed-off-by: David Howells <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: David Hildenbrand <[email protected]>
cc: Lorenzo Stoakes <[email protected]>
cc: Andrew Morton <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Al Viro <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Jan Kara <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Jason Gunthorpe <[email protected]>
cc: Logan Gunthorpe <[email protected]>
cc: Hillf Danton <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
---

Notes:
ver #3)
- Move is_zero_page() and is_zero_folio() to mm.h for dependency reasons.
- Add more comments and adjust the docs.

ver #2)
- Fix use of ZERO_PAGE().
- Add is_zero_page() and is_zero_folio() wrappers.
- Return the zero page obtained, not ZERO_PAGE(0) unconditionally.

Documentation/core-api/pin_user_pages.rst | 6 +++++
include/linux/mm.h | 26 +++++++++++++++++--
mm/gup.c | 31 ++++++++++++++++++++++-
3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index 9fb0b1080d3b..d3c1f6d8c0e0 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -112,6 +112,12 @@ pages:
This also leads to limitations: there are only 31-10==21 bits available for a
counter that increments 10 bits at a time.

+* Because of that limitation, special handling is applied to the zero pages
+ when using FOLL_PIN. We only pretend to pin a zero page - we don't alter its
+ refcount or pincount at all (it is permanent, so there's no need). The
+ unpinning functions also don't do anything to a zero page. This is
+ transparent to the caller.
+
* Callers must specifically request "dma-pinned tracking of pages". In other
words, just calling get_user_pages() will not suffice; a new set of functions,
pin_user_page() and related, must be used.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..3c2f6b452586 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1910,6 +1910,28 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
return page_maybe_dma_pinned(page);
}

+/**
+ * is_zero_page - Query if a page is a zero page
+ * @page: The page to query
+ *
+ * This returns true if @page is one of the permanent zero pages.
+ */
+static inline bool is_zero_page(const struct page *page)
+{
+ return is_zero_pfn(page_to_pfn(page));
+}
+
+/**
+ * is_zero_folio - Query if a folio is a zero page
+ * @folio: The folio to query
+ *
+ * This returns true if @folio is one of the permanent zero pages.
+ */
+static inline bool is_zero_folio(const struct folio *folio)
+{
+ return is_zero_page(&folio->page);
+}
+
/* MIGRATE_CMA and ZONE_MOVABLE do not allow pin pages */
#ifdef CONFIG_MIGRATION
static inline bool is_longterm_pinnable_page(struct page *page)
@@ -1920,8 +1942,8 @@ static inline bool is_longterm_pinnable_page(struct page *page)
if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
return false;
#endif
- /* The zero page may always be pinned */
- if (is_zero_pfn(page_to_pfn(page)))
+ /* The zero page can be "pinned" but gets special handling. */
+ if (is_zero_page(page))
return true;

/* Coherent device memory must always allow eviction. */
diff --git a/mm/gup.c b/mm/gup.c
index bbe416236593..ad28261dcafd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -51,7 +51,8 @@ static inline void sanity_check_pinned_pages(struct page **pages,
struct page *page = *pages;
struct folio *folio = page_folio(page);

- if (!folio_test_anon(folio))
+ if (is_zero_page(page) ||
+ !folio_test_anon(folio))
continue;
if (!folio_test_large(folio) || folio_test_hugetlb(folio))
VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
@@ -131,6 +132,13 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
else if (flags & FOLL_PIN) {
struct folio *folio;

+ /*
+ * Don't take a pin on the zero page - it's not going anywhere
+ * and it is used in a *lot* of places.
+ */
+ if (is_zero_page(page))
+ return page_folio(page);
+
/*
* Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
* right zone, so fail and let the caller fall back to the slow
@@ -180,6 +188,8 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
{
if (flags & FOLL_PIN) {
+ if (is_zero_folio(folio))
+ return;
node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
if (folio_test_large(folio))
atomic_sub(refs, &folio->_pincount);
@@ -224,6 +234,13 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
if (flags & FOLL_GET)
folio_ref_inc(folio);
else if (flags & FOLL_PIN) {
+ /*
+ * Don't take a pin on the zero page - it's not going anywhere
+ * and it is used in a *lot* of places.
+ */
+ if (is_zero_page(page))
+ return 0;
+
/*
* Similar to try_grab_folio(): be sure to *also*
* increment the normal page refcount field at least once,
@@ -3079,6 +3096,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
*
* FOLL_PIN means that the pages must be released via unpin_user_page(). Please
* see Documentation/core-api/pin_user_pages.rst for further details.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page() will not remove pins from it.
*/
int pin_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages)
@@ -3110,6 +3130,9 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
*
* FOLL_PIN means that the pages must be released via unpin_user_page(). Please
* see Documentation/core-api/pin_user_pages.rst for details.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page*() will not remove pins from it.
*/
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
@@ -3143,6 +3166,9 @@ EXPORT_SYMBOL(pin_user_pages_remote);
*
* FOLL_PIN means that the pages must be released via unpin_user_page(). Please
* see Documentation/core-api/pin_user_pages.rst for details.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page*() will not remove pins from it.
*/
long pin_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
@@ -3161,6 +3187,9 @@ EXPORT_SYMBOL(pin_user_pages);
* pin_user_pages_unlocked() is the FOLL_PIN variant of
* get_user_pages_unlocked(). Behavior is the same, except that this one sets
* FOLL_PIN and rejects FOLL_GET.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page*() will not remove pins from it.
*/
long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags)



2023-05-27 20:09:38

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()

On Fri, May 26, 2023 at 10:41:40PM +0100, David Howells wrote:
> Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
> to it from the page tables and make unpin_user_page*() correspondingly
> ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a
> zero page's refcount as we're only allowed ~2 million pins on it -
> something that userspace can conceivably trigger.
>
> Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.
>
> Signed-off-by: David Howells <[email protected]>
> cc: Christoph Hellwig <[email protected]>
> cc: David Hildenbrand <[email protected]>
> cc: Lorenzo Stoakes <[email protected]>
> cc: Andrew Morton <[email protected]>
> cc: Jens Axboe <[email protected]>
> cc: Al Viro <[email protected]>
> cc: Matthew Wilcox <[email protected]>
> cc: Jan Kara <[email protected]>
> cc: Jeff Layton <[email protected]>
> cc: Jason Gunthorpe <[email protected]>
> cc: Logan Gunthorpe <[email protected]>
> cc: Hillf Danton <[email protected]>
> cc: Christian Brauner <[email protected]>
> cc: Linus Torvalds <[email protected]>
> cc: [email protected]
> cc: [email protected]
> cc: [email protected]
> cc: [email protected]
> ---
>
> Notes:
> ver #3)
> - Move is_zero_page() and is_zero_folio() to mm.h for dependency reasons.
> - Add more comments and adjust the docs.
>
> ver #2)
> - Fix use of ZERO_PAGE().
> - Add is_zero_page() and is_zero_folio() wrappers.
> - Return the zero page obtained, not ZERO_PAGE(0) unconditionally.
>
> Documentation/core-api/pin_user_pages.rst | 6 +++++
> include/linux/mm.h | 26 +++++++++++++++++--
> mm/gup.c | 31 ++++++++++++++++++++++-
> 3 files changed, 60 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
> index 9fb0b1080d3b..d3c1f6d8c0e0 100644
> --- a/Documentation/core-api/pin_user_pages.rst
> +++ b/Documentation/core-api/pin_user_pages.rst
> @@ -112,6 +112,12 @@ pages:
> This also leads to limitations: there are only 31-10==21 bits available for a
> counter that increments 10 bits at a time.
>
> +* Because of that limitation, special handling is applied to the zero pages
> + when using FOLL_PIN. We only pretend to pin a zero page - we don't alter its
> + refcount or pincount at all (it is permanent, so there's no need). The
> + unpinning functions also don't do anything to a zero page. This is
> + transparent to the caller.
> +

Great! Cheers. :)

> * Callers must specifically request "dma-pinned tracking of pages". In other
> words, just calling get_user_pages() will not suffice; a new set of functions,
> pin_user_page() and related, must be used.
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 27ce77080c79..3c2f6b452586 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1910,6 +1910,28 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
> return page_maybe_dma_pinned(page);
> }
>
> +/**
> + * is_zero_page - Query if a page is a zero page
> + * @page: The page to query
> + *
> + * This returns true if @page is one of the permanent zero pages.
> + */
> +static inline bool is_zero_page(const struct page *page)
> +{
> + return is_zero_pfn(page_to_pfn(page));
> +}
> +
> +/**
> + * is_zero_folio - Query if a folio is a zero page
> + * @folio: The folio to query
> + *
> + * This returns true if @folio is one of the permanent zero pages.
> + */
> +static inline bool is_zero_folio(const struct folio *folio)
> +{
> + return is_zero_page(&folio->page);
> +}
> +
> /* MIGRATE_CMA and ZONE_MOVABLE do not allow pin pages */
> #ifdef CONFIG_MIGRATION
> static inline bool is_longterm_pinnable_page(struct page *page)
> @@ -1920,8 +1942,8 @@ static inline bool is_longterm_pinnable_page(struct page *page)
> if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
> return false;
> #endif
> - /* The zero page may always be pinned */
> - if (is_zero_pfn(page_to_pfn(page)))
> + /* The zero page can be "pinned" but gets special handling. */
> + if (is_zero_page(page))
> return true;
>
> /* Coherent device memory must always allow eviction. */
> diff --git a/mm/gup.c b/mm/gup.c
> index bbe416236593..ad28261dcafd 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -51,7 +51,8 @@ static inline void sanity_check_pinned_pages(struct page **pages,
> struct page *page = *pages;
> struct folio *folio = page_folio(page);
>
> - if (!folio_test_anon(folio))
> + if (is_zero_page(page) ||
> + !folio_test_anon(folio))
> continue;
> if (!folio_test_large(folio) || folio_test_hugetlb(folio))
> VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
> @@ -131,6 +132,13 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
> else if (flags & FOLL_PIN) {
> struct folio *folio;
>
> + /*
> + * Don't take a pin on the zero page - it's not going anywhere
> + * and it is used in a *lot* of places.
> + */
> + if (is_zero_page(page))
> + return page_folio(page);
> +
> /*
> * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
> * right zone, so fail and let the caller fall back to the slow
> @@ -180,6 +188,8 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
> static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
> {
> if (flags & FOLL_PIN) {
> + if (is_zero_folio(folio))
> + return;
> node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
> if (folio_test_large(folio))
> atomic_sub(refs, &folio->_pincount);
> @@ -224,6 +234,13 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
> if (flags & FOLL_GET)
> folio_ref_inc(folio);
> else if (flags & FOLL_PIN) {
> + /*
> + * Don't take a pin on the zero page - it's not going anywhere
> + * and it is used in a *lot* of places.
> + */
> + if (is_zero_page(page))
> + return 0;
> +
> /*
> * Similar to try_grab_folio(): be sure to *also*
> * increment the normal page refcount field at least once,
> @@ -3079,6 +3096,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
> *
> * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
> * see Documentation/core-api/pin_user_pages.rst for further details.
> + *
> + * Note that if a zero_page is amongst the returned pages, it will not have
> + * pins in it and unpin_user_page() will not remove pins from it.
> */
> int pin_user_pages_fast(unsigned long start, int nr_pages,
> unsigned int gup_flags, struct page **pages)
> @@ -3110,6 +3130,9 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
> *
> * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
> * see Documentation/core-api/pin_user_pages.rst for details.
> + *
> + * Note that if a zero_page is amongst the returned pages, it will not have
> + * pins in it and unpin_user_page*() will not remove pins from it.
> */
> long pin_user_pages_remote(struct mm_struct *mm,
> unsigned long start, unsigned long nr_pages,
> @@ -3143,6 +3166,9 @@ EXPORT_SYMBOL(pin_user_pages_remote);
> *
> * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
> * see Documentation/core-api/pin_user_pages.rst for details.
> + *
> + * Note that if a zero_page is amongst the returned pages, it will not have
> + * pins in it and unpin_user_page*() will not remove pins from it.
> */
> long pin_user_pages(unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> @@ -3161,6 +3187,9 @@ EXPORT_SYMBOL(pin_user_pages);
> * pin_user_pages_unlocked() is the FOLL_PIN variant of
> * get_user_pages_unlocked(). Behavior is the same, except that this one sets
> * FOLL_PIN and rejects FOLL_GET.
> + *
> + * Note that if a zero_page is amongst the returned pages, it will not have
> + * pins in it and unpin_user_page*() will not remove pins from it.
> */
> long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> struct page **pages, unsigned int gup_flags)
>

All looks good, thanks for adding comments/updating doc!

Reviewed-by: Lorenzo Stoakes <[email protected]>

2023-05-31 04:03:20

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2023-05-31 08:06:28

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()

On 26.05.23 23:41, David Howells wrote:
> Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
> to it from the page tables and make unpin_user_page*() correspondingly
> ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a
> zero page's refcount as we're only allowed ~2 million pins on it -
> something that userspace can conceivably trigger.

2 millions pins (FOLL_PIN, which increments the refcount by 1024) or 2
million references ?

>
> Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.
>

[...]

> diff --git a/mm/gup.c b/mm/gup.c
> index bbe416236593..ad28261dcafd 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -51,7 +51,8 @@ static inline void sanity_check_pinned_pages(struct page **pages,
> struct page *page = *pages;
> struct folio *folio = page_folio(page);
>
> - if (!folio_test_anon(folio))
> + if (is_zero_page(page) ||
> + !folio_test_anon(folio))

Nit: Fits into a single line (without harming readability IMHO).

> continue;
> if (!folio_test_large(folio) || folio_test_hugetlb(folio))
> VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
> @@ -131,6 +132,13 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
> else if (flags & FOLL_PIN) {
> struct folio *folio;
>
> + /*
> + * Don't take a pin on the zero page - it's not going anywhere
> + * and it is used in a *lot* of places.
> + */
> + if (is_zero_page(page))
> + return page_folio(page);
> +
> /*
> * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
> * right zone, so fail and let the caller fall back to the slow
> @@ -180,6 +188,8 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
> static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
> {
> if (flags & FOLL_PIN) {
> + if (is_zero_folio(folio))
> + return;
> node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
> if (folio_test_large(folio))
> atomic_sub(refs, &folio->_pincount);
> @@ -224,6 +234,13 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
> if (flags & FOLL_GET)
> folio_ref_inc(folio);
> else if (flags & FOLL_PIN) {
> + /*
> + * Don't take a pin on the zero page - it's not going anywhere
> + * and it is used in a *lot* of places.
> + */
> + if (is_zero_page(page))
> + return 0;
> +
> /*
> * Similar to try_grab_folio(): be sure to *also*
> * increment the normal page refcount field at least once,
> @@ -3079,6 +3096,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
> *
> * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
> * see Documentation/core-api/pin_user_pages.rst for further details.
> + *
> + * Note that if a zero_page is amongst the returned pages, it will not have
> + * pins in it and unpin_user_page() will not remove pins from it.
> */

"it will not have pins in it" sounds fairly weird to a non-native speaker.

"Note that the refcount of any zero_pages returned among the pinned
pages will not be incremented, and unpin_user_page() will similarly not
decrement it."


Acked-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb


2023-05-31 08:40:46

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()

David Hildenbrand <[email protected]> wrote:

> > Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
> > to it from the page tables and make unpin_user_page*() correspondingly
> > ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a
> > zero page's refcount as we're only allowed ~2 million pins on it -
> > something that userspace can conceivably trigger.
>
> 2 millions pins (FOLL_PIN, which increments the refcount by 1024) or 2 million
> references ?

Definitely pins. It's tricky because we've been using "pinned" to mean held
by a refcount or held by a flag too.

2 million pins on the zero page is in the realms of possibility. It only
takes 32768 64-page DIO writes.

> > @@ -3079,6 +3096,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
> > *
> > * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
> > * see Documentation/core-api/pin_user_pages.rst for further details.
> > + *
> > + * Note that if a zero_page is amongst the returned pages, it will not have
> > + * pins in it and unpin_user_page() will not remove pins from it.
> > */
>
> "it will not have pins in it" sounds fairly weird to a non-native speaker.

Oh, I know. The problem is that "pin" is now really ambiguous. Can we change
"FOLL_PIN" to "FOLL_NAIL"? Or maybe "FOLL_SCREW" - your pages are screwed if
you use DIO and fork at the same time.

> "Note that the refcount of any zero_pages returned among the pinned pages will
> not be incremented, and unpin_user_page() will similarly not decrement it."

That's not really right (although it happens to be true), because we're
talking primarily about the pin counter, not the refcount - and they may be
separate.

David


2023-05-31 09:00:22

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()

On 31.05.23 10:35, David Howells wrote:
> David Hildenbrand <[email protected]> wrote:
>
>>> Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
>>> to it from the page tables and make unpin_user_page*() correspondingly
>>> ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a
>>> zero page's refcount as we're only allowed ~2 million pins on it -
>>> something that userspace can conceivably trigger.
>>
>> 2 millions pins (FOLL_PIN, which increments the refcount by 1024) or 2 million
>> references ?
>
> Definitely pins. It's tricky because we've been using "pinned" to mean held
> by a refcount or held by a flag too.
>

Yes, it would be clearer if we would be using "pinned" now only for
FOLL_PIN and everything else is simply "taking a temporary reference on
the page".

> 2 million pins on the zero page is in the realms of possibility. It only
> takes 32768 64-page DIO writes.
>
>>> @@ -3079,6 +3096,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
>>> *
>>> * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
>>> * see Documentation/core-api/pin_user_pages.rst for further details.
>>> + *
>>> + * Note that if a zero_page is amongst the returned pages, it will not have
>>> + * pins in it and unpin_user_page() will not remove pins from it.
>>> */
>>
>> "it will not have pins in it" sounds fairly weird to a non-native speaker.
>
> Oh, I know. The problem is that "pin" is now really ambiguous. Can we change
> "FOLL_PIN" to "FOLL_NAIL"? Or maybe "FOLL_SCREW" - your pages are screwed if
> you use DIO and fork at the same time.
>

I'm hoping that "pinning" will be "FOLL_PIN" (intention to access page
content) and everything else is simply "taking a temporary page reference".

>> "Note that the refcount of any zero_pages returned among the pinned pages will
>> not be incremented, and unpin_user_page() will similarly not decrement it."
>
> That's not really right (although it happens to be true), because we're
> talking primarily about the pin counter, not the refcount - and they may be
> separate.

In any case (FOLL_PIN/FOLL_GET) you increment/decrement the refcount. If
we have a separate pincount, we increment/decrement the refcount by 1
when (un)pinning.

Sure, if we'd have a separate pincount we'd also not be modifying it.

--
Thanks,

David / dhildenb


2023-05-31 14:27:14

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()

David Hildenbrand <[email protected]> wrote:

> Yes, it would be clearer if we would be using "pinned" now only for FOLL_PIN

You're not likely to get that. "To pin" is too useful a verb that gets used
in other contexts too. For that reason, I think FOLL_PIN was a poor choice of
name:-/. I guess the English language has got somewhat overloaded. Maybe
FOLL_PEG? ;-)

> and everything else is simply "taking a temporary reference on the page".

Excluding refs taken with pins, many refs are more permanent than pins as, so
far as I'm aware, pins only last for the duration of an I/O operation.

> >> "Note that the refcount of any zero_pages returned among the pinned pages will
> >> not be incremented, and unpin_user_page() will similarly not decrement it."
> > That's not really right (although it happens to be true), because we're
> > talking primarily about the pin counter, not the refcount - and they may be
> > separate.
>
> In any case (FOLL_PIN/FOLL_GET) you increment/decrement the refcount. If we
> have a separate pincount, we increment/decrement the refcount by 1 when
> (un)pinning.

FOLL_GET isn't relevant here - only FOLL_PIN. Yes, as it happens, we count a
ref if we count a pin, but that's kind of irrelevant; what matters is that the
effect must be undone with un-PUP.

It would be nice not to get a ref on the zero page in FOLL_GET, but I don't
think we can do that yet. Too many places assume that GUP will give them a
ref they can release later via ordinary methods.

David


2023-05-31 14:29:37

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()

On Wed, May 31, 2023 at 02:55:35PM +0100, David Howells wrote:
> David Hildenbrand <[email protected]> wrote:
>
> > Yes, it would be clearer if we would be using "pinned" now only for FOLL_PIN
>
> You're not likely to get that. "To pin" is too useful a verb that gets used
> in other contexts too. For that reason, I think FOLL_PIN was a poor choice of
> name:-/. I guess the English language has got somewhat overloaded. Maybe
> FOLL_PEG? ;-)
>

Might I suggest FOLL_FOLL? As we are essentially 'following' the page once
'pinned'. We could differentiate between what is currently FOLL_GET and
FOLL_PIN by using FOLL_FOLL and FOLL_FOLL_FOLL.

Patch series coming soon...

> > and everything else is simply "taking a temporary reference on the page".
>
> Excluding refs taken with pins, many refs are more permanent than pins as, so
> far as I'm aware, pins only last for the duration of an I/O operation.
>
> > >> "Note that the refcount of any zero_pages returned among the pinned pages will
> > >> not be incremented, and unpin_user_page() will similarly not decrement it."
> > > That's not really right (although it happens to be true), because we're
> > > talking primarily about the pin counter, not the refcount - and they may be
> > > separate.
> >
> > In any case (FOLL_PIN/FOLL_GET) you increment/decrement the refcount. If we
> > have a separate pincount, we increment/decrement the refcount by 1 when
> > (un)pinning.
>
> FOLL_GET isn't relevant here - only FOLL_PIN. Yes, as it happens, we count a
> ref if we count a pin, but that's kind of irrelevant; what matters is that the
> effect must be undone with un-PUP.
>
> It would be nice not to get a ref on the zero page in FOLL_GET, but I don't
> think we can do that yet. Too many places assume that GUP will give them a
> ref they can release later via ordinary methods.
>
> David
>

2023-06-01 10:39:15

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()

On 31.05.23 15:55, David Howells wrote:
> David Hildenbrand <[email protected]> wrote:
>
>> Yes, it would be clearer if we would be using "pinned" now only for FOLL_PIN
>
> You're not likely to get that. "To pin" is too useful a verb that gets used
> in other contexts too. For that reason, I think FOLL_PIN was a poor choice of
> name:-/. I guess the English language has got somewhat overloaded. Maybe
> FOLL_PEG? ;-)

You're probably right. FOLL_PIN and all around that is "get me an
additional reference on the page and make sure I can DMA it without any
undesired side-effects".

FOLL_PIN_DMA would have been clearer (and matches
folio_maybe_dma_pinned() ) ... but then, there are some use cases where
want the same semantics but not actually perform DMA, but simply
read/write via the directmap (e.g., vmsplice, some io_uring cases).
Sure, one could say that they behave like DMA: access page content at
any time.

Saying a page is pinned (additional refcount) and having a pincount of 0
or does indeed cause confusion.

... but once we start renaming FOLL_PIN, pincount, ... we also have to
rename pin_user_pages() and friends, and things get nasty.

>
>> and everything else is simply "taking a temporary reference on the page".
>
> Excluding refs taken with pins, many refs are more permanent than pins as, so
> far as I'm aware, pins only last for the duration of an I/O operation.

I was more thinking along the lines of FOLL_GET vs. FOLL_PIN. Once we
consider any references we might have on a page, things get more tricky
indeed.

>
>>>> "Note that the refcount of any zero_pages returned among the pinned pages will
>>>> not be incremented, and unpin_user_page() will similarly not decrement it."
>>> That's not really right (although it happens to be true), because we're
>>> talking primarily about the pin counter, not the refcount - and they may be
>>> separate.
>>
>> In any case (FOLL_PIN/FOLL_GET) you increment/decrement the refcount. If we
>> have a separate pincount, we increment/decrement the refcount by 1 when
>> (un)pinning.
>
> FOLL_GET isn't relevant here - only FOLL_PIN. Yes, as it happens, we count a
> ref if we count a pin, but that's kind of irrelevant; what matters is that the
> effect must be undone with un-PUP.

The point I was trying to make is that we always modify the refcount,
and in some cases (FOLL_PIN on order > 0) also the pincount.

But if you define "pins" as "additional reference", we're on the same page.

>
> It would be nice not to get a ref on the zero page in FOLL_GET, but I don't
> think we can do that yet. Too many places assume that GUP will give them a
> ref they can release later via ordinary methods.

No we can't I'm afraid.

--
Thanks,

David / dhildenb