Move compound_pincount from the third page to the second page, which
means it's available for all compound pages. That lets us delete
hpage_pincount_available().
On 32-bit systems, there isn't enough space for both compound_pincount
and compound_nr in the second page (it would collide with page->private,
which is in use for pages in the swap cache), so revert the optimisation
of storing both compound_order and compound_nr on 32-bit systems.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
Documentation/core-api/pin_user_pages.rst | 18 +++++++++---------
include/linux/mm.h | 21 ++++++++-------------
include/linux/mm_types.h | 7 +++++--
mm/debug.c | 14 ++++----------
mm/gup.c | 18 ++++++++----------
mm/page_alloc.c | 3 +--
mm/rmap.c | 6 ++----
7 files changed, 37 insertions(+), 50 deletions(-)
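Note on the 32-bit layout: the collision described in the commit message can be reproduced with a small user-space model. This is only a sketch. The struct names are made up, every word is forced to 4 bytes to mimic a 32-bit build, and the field order of the default view (flags, lru, mapping, index, private) is an assumption about struct page at the time of this patch, not something taken from the diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the default (page cache / anon) view of a
 * 32-bit struct page. Field order is an assumption. */
struct model_page32 {
	uint32_t flags;
	uint32_t lru_next;
	uint32_t lru_prev;
	uint32_t mapping;
	uint32_t index;
	uint32_t private_word;		/* models page->private */
};

/* Hypothetical model of the first-tail-page view if compound_pincount
 * were added and compound_nr were kept on 32-bit. */
struct model_tail32 {
	uint32_t flags;
	uint32_t compound_head;
	uint8_t compound_dtor;
	uint8_t compound_order;
	/* two bytes of padding before the next 4-byte field */
	int32_t compound_mapcount;	/* atomic_t is an int */
	int32_t compound_pincount;
	uint32_t compound_nr;
};

/* Nonzero when compound_nr would land on the same word as private. */
int model_nr_overlaps_private(void)
{
	return offsetof(struct model_tail32, compound_nr) ==
	       offsetof(struct model_page32, private_word);
}
```

In this model both fields end up at the same offset, which is why the patch wraps compound_nr in CONFIG_64BIT instead of keeping both.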
diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index fcf605be43d0..b18416f4500f 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -55,18 +55,18 @@ flags the caller provides. The caller is required to pass in a non-null struct
pages* array, and the function then pins pages by incrementing each by a special
value: GUP_PIN_COUNTING_BIAS.
-For huge pages (and in fact, any compound page of more than 2 pages), the
-GUP_PIN_COUNTING_BIAS scheme is not used. Instead, an exact form of pin counting
-is achieved, by using the 3rd struct page in the compound page. A new struct
-page field, hpage_pinned_refcount, has been added in order to support this.
+For compound pages, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead,
+an exact form of pin counting is achieved, by using the 2nd struct page
+in the compound page. A new struct page field, compound_pincount, has
+been added in order to support this.
This approach for compound pages avoids the counting upper limit problems that
are discussed below. Those limitations would have been aggravated severely by
huge pages, because each tail page adds a refcount to the head page. And in
-fact, testing revealed that, without a separate hpage_pinned_refcount field,
+fact, testing revealed that, without a separate compound_pincount field,
page overflows were seen in some huge page stress tests.
-This also means that huge pages and compound pages (of order > 1) do not suffer
+This also means that huge pages and compound pages do not suffer
from the false positives problem that is mentioned below.::
Function
@@ -264,9 +264,9 @@ place.)
Other diagnostics
=================
-dump_page() has been enhanced slightly, to handle these new counting fields, and
-to better report on compound pages in general. Specifically, for compound pages
-with order > 1, the exact (hpage_pinned_refcount) pincount is reported.
+dump_page() has been enhanced slightly, to handle these new counting
+fields, and to better report on compound pages in general. Specifically,
+for compound pages, the exact (compound_pincount) pincount is reported.
References
==========
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f2f3400665a4..598be27d4d2e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -929,17 +929,6 @@ static inline void destroy_compound_page(struct page *page)
compound_page_dtors[page[1].compound_dtor](page);
}
-static inline bool hpage_pincount_available(struct page *page)
-{
- /*
- * Can the page->hpage_pinned_refcount field be used? That field is in
- * the 3rd page of the compound page, so the smallest (2-page) compound
- * pages cannot support it.
- */
- page = compound_head(page);
- return PageCompound(page) && compound_order(page) > 1;
-}
-
static inline int head_compound_pincount(struct page *head)
{
return atomic_read(compound_pincount_ptr(head));
@@ -947,7 +936,7 @@ static inline int head_compound_pincount(struct page *head)
static inline int compound_pincount(struct page *page)
{
- VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
+ VM_BUG_ON_PAGE(!PageCompound(page), page);
page = compound_head(page);
return head_compound_pincount(page);
}
@@ -955,7 +944,9 @@ static inline int compound_pincount(struct page *page)
static inline void set_compound_order(struct page *page, unsigned int order)
{
page[1].compound_order = order;
+#ifdef CONFIG_64BIT
page[1].compound_nr = 1U << order;
+#endif
}
/* Returns the number of pages in this potentially compound page. */
@@ -963,7 +954,11 @@ static inline unsigned long compound_nr(struct page *page)
{
if (!PageHead(page))
return 1;
+#ifdef CONFIG_64BIT
return page[1].compound_nr;
+#else
+ return 1UL << compound_order(page);
+#endif
}
/* Returns the number of bytes in this potentially compound page. */
@@ -1325,7 +1320,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages);
*/
static inline bool page_maybe_dma_pinned(struct page *page)
{
- if (hpage_pincount_available(page))
+ if (PageCompound(page))
return compound_pincount(page) > 0;
/*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c3a6e6209600..60e4595eaf63 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -150,11 +150,14 @@ struct page {
unsigned char compound_dtor;
unsigned char compound_order;
atomic_t compound_mapcount;
+ atomic_t compound_pincount;
+#ifdef CONFIG_64BIT
unsigned int compound_nr; /* 1 << compound_order */
+#endif
};
struct { /* Second tail page of compound page */
unsigned long _compound_pad_1; /* compound_head */
- atomic_t hpage_pinned_refcount;
+ unsigned long _compound_pad_2;
/* For both global and memcg */
struct list_head deferred_list;
};
@@ -311,7 +314,7 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
static inline atomic_t *compound_pincount_ptr(struct page *page)
{
- return &page[2].hpage_pinned_refcount;
+ return &page[1].compound_pincount;
}
/*
diff --git a/mm/debug.c b/mm/debug.c
index a05a39ff8fe4..7925fac2bd8e 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -92,16 +92,10 @@ static void __dump_page(struct page *page)
page, page_ref_count(head), mapcount, mapping,
page_to_pgoff(page), page_to_pfn(page));
if (compound) {
- if (hpage_pincount_available(page)) {
- pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
- head, compound_order(head),
- head_compound_mapcount(head),
- head_compound_pincount(head));
- } else {
- pr_warn("head:%p order:%u compound_mapcount:%d\n",
- head, compound_order(head),
- head_compound_mapcount(head));
- }
+ pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
+ head, compound_order(head),
+ head_compound_mapcount(head),
+ head_compound_pincount(head));
}
#ifdef CONFIG_MEMCG
diff --git a/mm/gup.c b/mm/gup.c
index aed48de3912e..1282d29357b7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -30,9 +30,8 @@ struct follow_page_context {
};
/*
- * When pinning a compound page of order > 1 (which is what
- * hpage_pincount_available() checks for), use an exact count to track
- * it, via page_pincount_add/_sub().
+ * When pinning a compound page, use an exact count to track it, via
+ * page_pincount_add/_sub().
*
* However, be sure to *also* increment the normal page refcount field
* at least once, so that the page really is pinned. That's why the
@@ -42,7 +41,7 @@ static void page_pincount_add(struct page *page, int refs)
{
VM_BUG_ON_PAGE(page != compound_head(page), page);
- if (hpage_pincount_available(page))
+ if (PageHead(page))
atomic_add(refs, compound_pincount_ptr(page));
else
page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
@@ -52,7 +51,7 @@ static int page_pincount_sub(struct page *page, int refs)
{
VM_BUG_ON_PAGE(page != compound_head(page), page);
- if (hpage_pincount_available(page))
+ if (PageHead(page))
atomic_sub(refs, compound_pincount_ptr(page));
else
refs *= GUP_PIN_COUNTING_BIAS;
@@ -129,12 +128,11 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
*
* FOLL_GET: page's refcount will be incremented by @refs.
*
- * FOLL_PIN on compound pages that are > two pages long: page's refcount will
- * be incremented by @refs, and page[2].hpage_pinned_refcount will be
- * incremented by @refs * GUP_PIN_COUNTING_BIAS.
+ * FOLL_PIN on compound pages: page's refcount will be incremented by
+ * @refs, and page[1].compound_pincount will be incremented by @refs.
*
- * FOLL_PIN on normal pages, or compound pages that are two pages long:
- * page's refcount will be incremented by @refs * GUP_PIN_COUNTING_BIAS.
+ * FOLL_PIN on normal pages: page's refcount will be incremented by
+ * @refs * GUP_PIN_COUNTING_BIAS.
*
* Return: head page (with refcount appropriately incremented) for success, or
* NULL upon failure. If neither FOLL_GET nor FOLL_PIN was set, that's
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..6b030c0cb207 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -741,8 +741,7 @@ void prep_compound_page(struct page *page, unsigned int order)
set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
set_compound_order(page, order);
atomic_set(compound_mapcount_ptr(page), -1);
- if (hpage_pincount_available(page))
- atomic_set(compound_pincount_ptr(page), 0);
+ atomic_set(compound_pincount_ptr(page), 0);
}
#ifdef CONFIG_DEBUG_PAGEALLOC
diff --git a/mm/rmap.c b/mm/rmap.c
index 163ac4e6bcee..a44a32db4803 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1187,8 +1187,7 @@ void page_add_new_anon_rmap(struct page *page,
VM_BUG_ON_PAGE(!PageTransHuge(page), page);
/* increment count (starts at -1) */
atomic_set(compound_mapcount_ptr(page), 0);
- if (hpage_pincount_available(page))
- atomic_set(compound_pincount_ptr(page), 0);
+ atomic_set(compound_pincount_ptr(page), 0);
__mod_lruvec_page_state(page, NR_ANON_THPS, nr);
} else {
@@ -2410,8 +2409,7 @@ void hugepage_add_new_anon_rmap(struct page *page,
{
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
atomic_set(compound_mapcount_ptr(page), 0);
- if (hpage_pincount_available(page))
- atomic_set(compound_pincount_ptr(page), 0);
+ atomic_set(compound_pincount_ptr(page), 0);
__page_set_anon_rmap(page, vma, address, 1);
}
--
2.33.0
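For readers following along, the two pin counting schemes touched by this patch can be modelled in user space roughly as below. This is a sketch under stated assumptions, not the kernel code: struct model_page and the model_*() helpers are hypothetical, plain ints stand in for atomic_t, and only the value of the bias (1U << 10) matches GUP_PIN_COUNTING_BIAS.

```c
#include <stdbool.h>

/* Same value as the kernel's GUP_PIN_COUNTING_BIAS (1U << 10). */
#define MODEL_PIN_BIAS 1024

/* Hypothetical stand-in for a head (or normal) page; not struct page. */
struct model_page {
	bool compound;	/* models PageHead() on a non-tail page */
	int refcount;	/* models the normal page refcount */
	int pincount;	/* models page[1].compound_pincount */
};

/* Models a FOLL_PIN grab of @refs references. */
void model_pin(struct model_page *p, int refs)
{
	if (p->compound) {
		p->refcount += refs;	/* the page really is held */
		p->pincount += refs;	/* exact pin count */
	} else {
		p->refcount += refs * MODEL_PIN_BIAS;
	}
}

/* Models dropping @refs FOLL_PIN references. */
void model_unpin(struct model_page *p, int refs)
{
	if (p->compound) {
		p->pincount -= refs;
		p->refcount -= refs;
	} else {
		p->refcount -= refs * MODEL_PIN_BIAS;
	}
}

/* Models page_maybe_dma_pinned(). */
bool model_maybe_pinned(const struct model_page *p)
{
	if (p->compound)
		return p->pincount > 0;
	return p->refcount >= MODEL_PIN_BIAS;
}
```

With this model, a normal page that has collected 1024 ordinary references reports as maybe pinned (the false positive mentioned in the documentation), while a compound page in the same state does not, because its answer comes from the exact count.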
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Move compound_pincount from the third page to the second page, which
> means it's available for all compound pages. That lets us delete
> hpage_pincount_available().
Wow, OK. That's a welcome simplification. Looks good. A couple comments
below, too.
...
> @@ -955,7 +944,9 @@ static inline int compound_pincount(struct page *page)
> static inline void set_compound_order(struct page *page, unsigned int order)
> {
> page[1].compound_order = order;
> +#ifdef CONFIG_64BIT
> page[1].compound_nr = 1U << order;
> +#endif
> }
>
> /* Returns the number of pages in this potentially compound page. */
> @@ -963,7 +954,11 @@ static inline unsigned long compound_nr(struct page *page)
> {
> if (!PageHead(page))
> return 1;
> +#ifdef CONFIG_64BIT
> return page[1].compound_nr;
> +#else
> + return 1UL << compound_order(page);
> +#endif
Now that you are highlighting this, I have this persistent feeling (not
yet confirmed by any testing) that compound_nr is a micro-optimization
that is actually invisible at runtime--but is now slicing up our code
with ifdefs, and using space in a fairly valuable location.
Not for this patch or series, but maybe a separate patch or series
should just remove the compound_nr field entirely, yes? It is
surprising to carry around both compound_order and (1 <<
compound_order), right next to each other. It would be different if this
were an expensive calculation, but it's just a shift.
Maybe testing would prove that that's a bad idea, and maybe someone has
already looked into it, but I wanted to point it out.
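To make the comparison concrete, here is a minimal user-space sketch of the two variants. The struct and the model_*() names are hypothetical, and this says nothing about code size or performance in the kernel; it only shows that the cached field and the shift give the same answer.

```c
/* Hypothetical model of the two page[1] fields discussed above. */
struct model_tail {
	unsigned char compound_order;
	unsigned int compound_nr;	/* cached 1 << compound_order */
};

/* Models set_compound_order() on a 64-bit build. */
void model_set_order(struct model_tail *t, unsigned int order)
{
	t->compound_order = (unsigned char)order;
	t->compound_nr = 1U << order;
}

/* Models the CONFIG_64BIT branch of compound_nr(). */
unsigned long model_nr_cached(const struct model_tail *t)
{
	return t->compound_nr;
}

/* Models the 32-bit branch of compound_nr(). */
unsigned long model_nr_shift(const struct model_tail *t)
{
	return 1UL << t->compound_order;
}
```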
...
> @@ -42,7 +41,7 @@ static void page_pincount_add(struct page *page, int refs)
> {
> VM_BUG_ON_PAGE(page != compound_head(page), page);
>
> - if (hpage_pincount_available(page))
> + if (PageHead(page))
> atomic_add(refs, compound_pincount_ptr(page));
> else
> page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> @@ -52,7 +51,7 @@ static int page_pincount_sub(struct page *page, int refs)
> {
> VM_BUG_ON_PAGE(page != compound_head(page), page);
>
> - if (hpage_pincount_available(page))
> + if (PageHead(page))
OK, so we just verified (via VM_BUG_ON_PAGE(), which is not always active)
that this is not a tail page. And so PageHead() effectively means PageCompound().
I wonder if it would be better to just use PageCompound() here and in similar
cases. Because that's what is logically being checked, after all. It seems
slightly more accurate.
> atomic_sub(refs, compound_pincount_ptr(page));
> else
> refs *= GUP_PIN_COUNTING_BIAS;
> @@ -129,12 +128,11 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
> *
> * FOLL_GET: page's refcount will be incremented by @refs.
> *
> - * FOLL_PIN on compound pages that are > two pages long: page's refcount will
> - * be incremented by @refs, and page[2].hpage_pinned_refcount will be
> - * incremented by @refs * GUP_PIN_COUNTING_BIAS.
> + * FOLL_PIN on compound pages: page's refcount will be incremented by
> + * @refs, and page[1].compound_pincount will be incremented by @refs.
ha, thanks for fixing that documentation bug!
This all looks good, the above are very minor questions,
Reviewed-by: John Hubbard <[email protected]>
thanks,
--
John Hubbard
NVIDIA
On Mon, Jan 10, 2022 at 08:06:54PM -0800, John Hubbard wrote:
> > +#ifdef CONFIG_64BIT
> > return page[1].compound_nr;
> > +#else
> > + return 1UL << compound_order(page);
> > +#endif
>
> Now that you are highlighting this, I have this persistent feeling (not
> yet confirmed by any testing) that compound_nr is a micro-optimization
> that is actually invisible at runtime--but is now slicing up our code
> with ifdefs, and using space in a fairly valuable location.
>
> Not for this patch or series, but maybe a separate patch or series
> should just remove the compound_nr field entirely, yes? It is
> surprising to carry around both compound_order and (1 <<
> compound_order), right next to each other. It would be different if this
> were an expensive calculation, but it's just a shift.
>
> Maybe testing would prove that that's a bad idea, and maybe someone has
> already looked into it, but I wanted to point it out.
It's probably worth looking at the patch which added it ... 1378a5ee451a
in August 2020. I didn't provide any performance numbers, but code size
definitely went down.
> > @@ -52,7 +51,7 @@ static int page_pincount_sub(struct page *page, int refs)
> > {
> > VM_BUG_ON_PAGE(page != compound_head(page), page);
> > - if (hpage_pincount_available(page))
> > + if (PageHead(page))
>
> OK, so we just verified (via VM_BUG_ON_PAGE(), which is not always active)
> that this is not a tail page. And so PageHead() effectively means PageCompound().
>
> I wonder if it would be better to just use PageCompound() here and in similar
> cases. Because that's what is logically being checked, after all. It seems
> slightly more accurate.
Well PageCompound() is defined as PageHead() || PageTail(). I don't
think the intent was for people to always ask "Is this a compound page",
more "This is a good shorthand to replace PageHead() || PageTail()".
It's kind of moot anyway because this gets replaced with
folio_test_large() further down the patch series.
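As a small illustration of why the two tests agree in page_pincount_add/_sub(), here is a user-space sketch. The enum and the model_*() helpers are hypothetical stand-ins for the page flag tests, not the kernel macros.

```c
#include <stdbool.h>

/* Hypothetical model of the three kinds of page being discussed. */
enum model_kind { MODEL_BASE, MODEL_HEAD, MODEL_TAIL };

/* Models PageHead(). */
bool model_page_head(enum model_kind k)
{
	return k == MODEL_HEAD;
}

/* Models PageTail(). */
bool model_page_tail(enum model_kind k)
{
	return k == MODEL_TAIL;
}

/* Models PageCompound(), i.e. PageHead() || PageTail(). */
bool model_page_compound(enum model_kind k)
{
	return model_page_head(k) || model_page_tail(k);
}
```

Once tail pages have been ruled out (as the VM_BUG_ON_PAGE() does when it is enabled), the head test and the compound test return the same value; they differ only for a tail page.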
On 1/10/22 20:38, Matthew Wilcox wrote:
> On Mon, Jan 10, 2022 at 08:06:54PM -0800, John Hubbard wrote:
>>> +#ifdef CONFIG_64BIT
>>> return page[1].compound_nr;
>>> +#else
>>> + return 1UL << compound_order(page);
>>> +#endif
>>
>> Now that you are highlighting this, I have this persistent feeling (not
>> yet confirmed by any testing) that compound_nr is a micro-optimization
>> that is actually invisible at runtime--but is now slicing up our code
>> with ifdefs, and using space in a fairly valuable location.
>>
>> Not for this patch or series, but maybe a separate patch or series
>> should just remove the compound_nr field entirely, yes? It is
>> surprising to carry around both compound_order and (1 <<
>> compound_order), right next to each other. It would be different if this
>> were an expensive calculation, but it's just a shift.
>>
>> Maybe testing would prove that that's a bad idea, and maybe someone has
>> already looked into it, but I wanted to point it out.
>
> It's probably worth looking at the patch which added it ... 1378a5ee451a
> in August 2020. I didn't provide any performance numbers, but code size
> definitely went down.
I looked at that, and the lore link for the conversation, but failed to learn
anything additional. Of course if you recall that there was in fact a measurable
performance improvement, then as of now, it's recorded somewhere. :)
It's far from clear whether we'll need or want this space in page[1] in the
future anyway, just wanted to poke at it though.
>
>>> @@ -52,7 +51,7 @@ static int page_pincount_sub(struct page *page, int refs)
>>> {
>>> VM_BUG_ON_PAGE(page != compound_head(page), page);
>>> - if (hpage_pincount_available(page))
>>> + if (PageHead(page))
>>
>> OK, so we just verified (via VM_BUG_ON_PAGE(), which is not always active)
>> that this is not a tail page. And so PageHead() effectively means PageCompound().
>>
>> I wonder if it would be better to just use PageCompound() here and in similar
>> cases. Because that's what is logically being checked, after all. It seems
>> slightly more accurate.
>
> Well PageCompound() is defined as PageHead() || PageTail(). I don't
> think the intent was for people to always ask "Is this a compound page",
> more "This is a good shorthand to replace PageHead() || PageTail()".
> It's kind of moot anyway because this gets replaced with
> folio_test_large() further down the patch series.
>
OK.
thanks,
--
John Hubbard
NVIDIA
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>