2022-01-10 04:25:51

by Matthew Wilcox

Subject: [PATCH v2 00/28] Convert GUP to folios

This patch series is against my current folio for-next branch. I know
it won't apply to sfr's next tree, and it's not for-next material yet.
I intend to submit it for 5.18 after I've rebased it to one of the
5.17-rc releases.

The overall effect of this (ignoring the primary "preparing for folios
that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
by ~700 bytes in the config I normally test with.

This patchset just converts existing implementations to use folios.
There's no new API for consumers here to provide information in a more
efficient (address, length) format. That will be a separate patchset.

In v2, I've tried to address all the comments from the reviews I got
on v1. Apologies if I missed anything; I got a lot of good feedback.
Primarily I separated out the folio changes (later) from the "While
I'm looking at this ..." changes (earlier). I'm not sure the story
of the patchset is necessarily coherent this way, but it should be
easier to review.

Another big change is that compound_pincount is now available to all
compound pages, not just those that are order-2-or-higher. Patch 11.

I did notice one bug in my original patchset which I'm disappointed GCC
didn't diagnose:

pages[nr++] = nth_page(page, nr);
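
For context, the bug is that nr is modified and read in the same
expression with no intervening sequence point, which is undefined
behaviour in C. The fixed form, as it appears in the "gup: Fix some
contiguous memmap assumptions" patch below, increments nr only in the
loop header:

	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
		pages[nr] = nth_page(page, nr);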

Given the massive reorg of the patchset, I didn't feel right using
anyone's SoB from v1 on any of the patches. But, despite the increased
number of patches, I hope it's easier to review than v1.

And I'd dearly love a better name than 'folio_nth'. page_nth() is
a temporary construct, so doesn't need a better name. If you need
context, it's in the gup_folio_range_next() patch and its job
is to tell you, given a page and a folio, which page it is within
the folio (a number in the range [0, folio_nr_pages())).
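
As an illustration of the intended semantics (assuming a
folio_nth(folio, page) signature; this is not code from the series):

	/* For any n in [0, folio_nr_pages()):
	 *	page = folio_page(folio, n);
	 *	folio_nth(folio, page) == n;
	 * i.e. folio_nth() is the inverse of folio_page().
	 */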

Matthew Wilcox (Oracle) (28):
gup: Remove for_each_compound_range()
gup: Remove for_each_compound_head()
gup: Change the calling convention for compound_range_next()
gup: Optimise compound_range_next()
gup: Change the calling convention for compound_next()
gup: Fix some contiguous memmap assumptions
gup: Remove an assumption of a contiguous memmap
gup: Handle page split race more efficiently
gup: Turn hpage_pincount_add() into page_pincount_add()
gup: Turn hpage_pincount_sub() into page_pincount_sub()
mm: Make compound_pincount always available
mm: Add folio_put_refs()
mm: Add folio_pincount_ptr()
mm: Convert page_maybe_dma_pinned() to use a folio
gup: Add try_get_folio() and try_grab_folio()
mm: Remove page_cache_add_speculative() and
page_cache_get_speculative()
gup: Add gup_put_folio()
hugetlb: Use try_grab_folio() instead of try_grab_compound_head()
gup: Convert try_grab_page() to call try_grab_folio()
gup: Convert gup_pte_range() to use a folio
gup: Convert gup_hugepte() to use a folio
gup: Convert gup_huge_pmd() to use a folio
gup: Convert gup_huge_pud() to use a folio
gup: Convert gup_huge_pgd() to use a folio
gup: Convert compound_next() to gup_folio_next()
gup: Convert compound_range_next() to gup_folio_range_next()
mm: Add isolate_lru_folio()
gup: Convert check_and_migrate_movable_pages() to use a folio

Documentation/core-api/pin_user_pages.rst | 18 +-
arch/powerpc/include/asm/mmu_context.h | 1 -
include/linux/mm.h | 70 +++--
include/linux/mm_types.h | 13 +-
include/linux/pagemap.h | 11 -
mm/debug.c | 14 +-
mm/folio-compat.c | 8 +
mm/gup.c | 359 ++++++++++------------
mm/hugetlb.c | 7 +-
mm/internal.h | 8 +-
mm/page_alloc.c | 3 +-
mm/rmap.c | 6 +-
mm/vmscan.c | 43 ++-
13 files changed, 263 insertions(+), 298 deletions(-)

--
2.33.0



2022-01-10 04:25:53

by Matthew Wilcox

Subject: [PATCH v2 05/28] gup: Change the calling convention for compound_next()

Return the head page instead of storing it to a passed parameter.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 6eedca605b3d..8a0ea220ced1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -245,9 +245,8 @@ static inline struct page *compound_range_next(unsigned long i,
return page;
}

-static inline void compound_next(unsigned long i, unsigned long npages,
- struct page **list, struct page **head,
- unsigned int *ntails)
+static inline struct page *compound_next(unsigned long i,
+ unsigned long npages, struct page **list, unsigned int *ntails)
{
struct page *page;
unsigned int nr;
@@ -258,8 +257,8 @@ static inline void compound_next(unsigned long i, unsigned long npages,
break;
}

- *head = page;
*ntails = nr - i;
+ return page;
}

/**
@@ -297,7 +296,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
}

for (index = 0; index < npages; index += ntails) {
- compound_next(index, npages, pages, &head, &ntails);
+ head = compound_next(index, npages, pages, &ntails);
/*
* Checking PageDirty at this point may race with
* clear_page_dirty_for_io(), but that's OK. Two key
@@ -386,7 +385,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
return;

for (index = 0; index < npages; index += ntails) {
- compound_next(index, npages, pages, &head, &ntails);
+ head = compound_next(index, npages, pages, &ntails);
put_compound_head(head, ntails, FOLL_PIN);
}
}
--
2.33.0


2022-01-10 04:26:00

by Matthew Wilcox

Subject: [PATCH v2 04/28] gup: Optimise compound_range_next()

By definition, a compound page has an order >= 1, so the second half
of the test was redundant. Also, this cannot be a tail page since
it's the result of calling compound_head(), so use PageHead() instead
of PageCompound().
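
Spelled out (my reading of the page-flag helpers, not part of the
patch):

	/* PageCompound(p) is PageHead(p) || PageTail(p). Here p came
	 * from compound_head(), so PageTail(p) is always false, and a
	 * compound page always has order >= 1, so both extra checks
	 * were redundant; PageHead(p) alone suffices.
	 */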

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 3c93d2fdf4da..6eedca605b3d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -237,7 +237,7 @@ static inline struct page *compound_range_next(unsigned long i,

next = start + i;
page = compound_head(next);
- if (PageCompound(page) && compound_order(page) >= 1)
+ if (PageHead(page))
nr = min_t(unsigned int,
page + compound_nr(page) - next, npages - i);

--
2.33.0


2022-01-10 04:26:17

by Matthew Wilcox

Subject: [PATCH v2 17/28] gup: Add gup_put_folio()

Convert put_compound_head() to gup_put_folio() and hpage_pincount_sub()
to folio_pincount_sub(). This removes the last call to put_page_refs(),
so delete it. Add a temporary put_compound_head() wrapper which will
be deleted by the end of this series.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 42 ++++++++++++++----------------------------
1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 9e581201d679..719252fa0402 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -45,34 +45,15 @@ static void folio_pincount_add(struct folio *folio, int refs)
folio_ref_add(folio, refs * (GUP_PIN_COUNTING_BIAS - 1));
}

-static int page_pincount_sub(struct page *page, int refs)
+static int folio_pincount_sub(struct folio *folio, int refs)
{
- VM_BUG_ON_PAGE(page != compound_head(page), page);
-
- if (PageHead(page))
- atomic_sub(refs, compound_pincount_ptr(page));
+ if (folio_test_large(folio))
+ atomic_sub(refs, folio_pincount_ptr(folio));
else
refs *= GUP_PIN_COUNTING_BIAS;
return refs;
}

-/* Equivalent to calling put_page() @refs times. */
-static void put_page_refs(struct page *page, int refs)
-{
-#ifdef CONFIG_DEBUG_VM
- if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
- return;
-#endif
-
- /*
- * Calling put_page() for each ref is unnecessarily slow. Only the last
- * ref needs a put_page().
- */
- if (refs > 1)
- page_ref_sub(page, refs - 1);
- put_page(page);
-}
-
/*
* Return the folio with ref appropriately incremented,
* or NULL if that failed.
@@ -171,15 +152,20 @@ struct page *try_grab_compound_head(struct page *page,
return &try_grab_folio(page, refs, flags)->page;
}

-static void put_compound_head(struct page *page, int refs, unsigned int flags)
+static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
{
if (flags & FOLL_PIN) {
- mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
- refs);
- refs = page_pincount_sub(page, refs);
+ node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
+ refs = folio_pincount_sub(folio, refs);
}

- put_page_refs(page, refs);
+ folio_put_refs(folio, refs);
+}
+
+static void put_compound_head(struct page *page, int refs, unsigned int flags)
+{
+ VM_BUG_ON_PAGE(PageTail(page), page);
+ gup_put_folio((struct folio *)page, refs, flags);
}

/**
@@ -220,7 +206,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
*/
void unpin_user_page(struct page *page)
{
- put_compound_head(compound_head(page), 1, FOLL_PIN);
+ gup_put_folio(page_folio(page), 1, FOLL_PIN);
}
EXPORT_SYMBOL(unpin_user_page);

--
2.33.0


2022-01-10 04:26:24

by Matthew Wilcox

Subject: [PATCH v2 12/28] mm: Add folio_put_refs()

This is like folio_put(), but puts N references at once instead of
just one. It's like put_page_refs(), but does one atomic operation
instead of two, and is available to more than just gup.c.
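
For illustration (not part of the patch), the relationship to the
existing helpers:

	folio_put_refs(folio, 1);	/* behaves like folio_put(folio) */
	folio_put_refs(folio, refs);	/* a single folio_ref_sub_and_test(),
					 * where put_page_refs() needed a
					 * page_ref_sub() plus a put_page() */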

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/mm.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 598be27d4d2e..bf9624ca61c3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1234,6 +1234,26 @@ static inline void folio_put(struct folio *folio)
__put_page(&folio->page);
}

+/**
+ * folio_put_refs - Reduce the reference count on a folio.
+ * @folio: The folio.
+ * @refs: The number of references to reduce.
+ *
+ * If the folio's reference count reaches zero, the memory will be
+ * released back to the page allocator and may be used by another
+ * allocation immediately. Do not access the memory or the struct folio
+ * after calling folio_put_refs() unless you can be sure that these weren't
+ * the last references.
+ *
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context. May be called while holding a spinlock.
+ */
+static inline void folio_put_refs(struct folio *folio, int refs)
+{
+ if (folio_ref_sub_and_test(folio, refs))
+ __put_page(&folio->page);
+}
+
static inline void put_page(struct page *page)
{
struct folio *folio = page_folio(page);
--
2.33.0


2022-01-10 04:26:27

by Matthew Wilcox

Subject: [PATCH v2 23/28] gup: Convert gup_huge_pud() to use a folio

Use the new folio-based APIs.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a006bce2d47b..7b7bf8361558 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2527,7 +2527,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
unsigned long end, unsigned int flags,
struct page **pages, int *nr)
{
- struct page *head, *page;
+ struct page *page;
+ struct folio *folio;
int refs;

if (!pud_access_permitted(orig, flags & FOLL_WRITE))
@@ -2543,17 +2544,17 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

- head = try_grab_compound_head(pud_page(orig), refs, flags);
- if (!head)
+ folio = try_grab_folio(page, refs, flags);
+ if (!folio)
return 0;

if (unlikely(pud_val(orig) != pud_val(*pudp))) {
- put_compound_head(head, refs, flags);
+ gup_put_folio(folio, refs, flags);
return 0;
}

*nr += refs;
- SetPageReferenced(head);
+ folio_set_referenced(folio);
return 1;
}

--
2.33.0


2022-01-10 04:26:35

by Matthew Wilcox

Subject: [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()

Simplify try_grab_compound_head() and remove an unnecessary
VM_BUG_ON by handling pages both with and without a pincount field in
page_pincount_add().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index dbb1b54d0def..3ed9907f3c8d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,12 +29,23 @@ struct follow_page_context {
unsigned int page_mask;
};

-static void hpage_pincount_add(struct page *page, int refs)
+/*
+ * When pinning a compound page of order > 1 (which is what
+ * hpage_pincount_available() checks for), use an exact count to track
+ * it, via page_pincount_add/_sub().
+ *
+ * However, be sure to *also* increment the normal page refcount field
+ * at least once, so that the page really is pinned. That's why the
+ * refcount from the earlier try_get_compound_head() is left intact.
+ */
+static void page_pincount_add(struct page *page, int refs)
{
- VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
VM_BUG_ON_PAGE(page != compound_head(page), page);

- atomic_add(refs, compound_pincount_ptr(page));
+ if (hpage_pincount_available(page))
+ atomic_add(refs, compound_pincount_ptr(page));
+ else
+ page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
}

static void hpage_pincount_sub(struct page *page, int refs)
@@ -150,21 +161,7 @@ struct page *try_grab_compound_head(struct page *page,
if (!page)
return NULL;

- /*
- * When pinning a compound page of order > 1 (which is what
- * hpage_pincount_available() checks for), use an exact count to
- * track it, via hpage_pincount_add/_sub().
- *
- * However, be sure to *also* increment the normal page refcount
- * field at least once, so that the page really is pinned.
- * That's why the refcount from the earlier
- * try_get_compound_head() is left intact.
- */
- if (hpage_pincount_available(page))
- hpage_pincount_add(page, refs);
- else
- page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
-
+ page_pincount_add(page, refs);
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
refs);

--
2.33.0


2022-01-10 04:26:44

by Matthew Wilcox

Subject: [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap

This assumption needs the inverse of nth_page(), which I've temporarily
named page_nth() until someone comes up with a better name.
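
The invariant, for illustration (not part of the patch):

	/* page_nth() inverts nth_page():
	 *	page_nth(head, nth_page(head, n)) == n
	 * The SPARSEMEM variant goes via page_to_pfn(), so it stays
	 * correct even when struct pages are not virtually contiguous.
	 */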

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/mm.h | 2 ++
mm/gup.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d8b7d7ed14dd..f2f3400665a4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -216,8 +216,10 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,

#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
+#define page_nth(head, tail) (page_to_pfn(tail) - page_to_pfn(head))
#else
#define nth_page(page,n) ((page) + (n))
+#define page_nth(head, tail) ((tail) - (head))
#endif

/* to align the pointer to the (next) page boundary */
diff --git a/mm/gup.c b/mm/gup.c
index 9c0a702a4e03..afb638a30e44 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -238,8 +238,8 @@ static inline struct page *compound_range_next(unsigned long i,
next = nth_page(start, i);
page = compound_head(next);
if (PageHead(page))
- nr = min_t(unsigned int,
- page + compound_nr(page) - next, npages - i);
+ nr = min_t(unsigned int, npages - i,
+ compound_nr(page) - page_nth(page, next));

*ntails = nr;
return page;
--
2.33.0


2022-01-10 04:26:50

by Matthew Wilcox

Subject: [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions

Several functions in gup.c assume that a compound page has virtually
contiguous page structs. This isn't true for SPARSEMEM configs unless
SPARSEMEM_VMEMMAP is also set. Fix them by using nth_page() instead of
plain pointer arithmetic.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 8a0ea220ced1..9c0a702a4e03 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -235,7 +235,7 @@ static inline struct page *compound_range_next(unsigned long i,
struct page *next, *page;
unsigned int nr = 1;

- next = start + i;
+ next = nth_page(start, i);
page = compound_head(next);
if (PageHead(page))
nr = min_t(unsigned int,
@@ -2430,8 +2430,8 @@ static int record_subpages(struct page *page, unsigned long addr,
{
int nr;

- for (nr = 0; addr != end; addr += PAGE_SIZE)
- pages[nr++] = page++;
+ for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
+ pages[nr] = nth_page(page, nr);

return nr;
}
@@ -2466,7 +2466,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));

head = pte_page(pte);
- page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
+ page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

head = try_grab_compound_head(head, refs, flags);
@@ -2526,7 +2526,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
pages, nr);
}

- page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+ page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

head = try_grab_compound_head(pmd_page(orig), refs, flags);
@@ -2560,7 +2560,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
pages, nr);
}

- page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+ page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

head = try_grab_compound_head(pud_page(orig), refs, flags);
@@ -2589,7 +2589,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,

BUILD_BUG_ON(pgd_devmap(orig));

- page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
+ page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

head = try_grab_compound_head(pgd_page(orig), refs, flags);
--
2.33.0


2022-01-10 04:26:54

by Matthew Wilcox

Subject: [PATCH v2 02/28] gup: Remove for_each_compound_head()

This macro doesn't simplify the users; it's easier to just call
compound_next() inside a standard loop.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 7a07e0c00bf5..86f8d843de72 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -253,9 +253,6 @@ static inline void compound_next(unsigned long i, unsigned long npages,
struct page *page;
unsigned int nr;

- if (i >= npages)
- return;
-
page = compound_head(list[i]);
for (nr = i + 1; nr < npages; nr++) {
if (compound_head(list[nr]) != page)
@@ -266,12 +263,6 @@ static inline void compound_next(unsigned long i, unsigned long npages,
*ntails = nr - i;
}

-#define for_each_compound_head(__i, __list, __npages, __head, __ntails) \
- for (__i = 0, \
- compound_next(__i, __npages, __list, &(__head), &(__ntails)); \
- __i < __npages; __i += __ntails, \
- compound_next(__i, __npages, __list, &(__head), &(__ntails)))
-
/**
* unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
* @pages: array of pages to be maybe marked dirty, and definitely released.
@@ -306,7 +297,8 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
return;
}

- for_each_compound_head(index, pages, npages, head, ntails) {
+ for (index = 0; index < npages; index += ntails) {
+ compound_next(index, npages, pages, &head, &ntails);
/*
* Checking PageDirty at this point may race with
* clear_page_dirty_for_io(), but that's OK. Two key
@@ -394,8 +386,10 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
if (WARN_ON(IS_ERR_VALUE(npages)))
return;

- for_each_compound_head(index, pages, npages, head, ntails)
+ for (index = 0; index < npages; index += ntails) {
+ compound_next(index, npages, pages, &head, &ntails);
put_compound_head(head, ntails, FOLL_PIN);
+ }
}
EXPORT_SYMBOL(unpin_user_pages);

--
2.33.0


2022-01-10 04:27:07

by Matthew Wilcox

Subject: [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub()

Remove an unnecessary VM_BUG_ON by handling pages both with and without
a pincount field in page_pincount_sub().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 3ed9907f3c8d..aed48de3912e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -48,12 +48,15 @@ static void page_pincount_add(struct page *page, int refs)
page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
}

-static void hpage_pincount_sub(struct page *page, int refs)
+static int page_pincount_sub(struct page *page, int refs)
{
- VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
VM_BUG_ON_PAGE(page != compound_head(page), page);

- atomic_sub(refs, compound_pincount_ptr(page));
+ if (hpage_pincount_available(page))
+ atomic_sub(refs, compound_pincount_ptr(page));
+ else
+ refs *= GUP_PIN_COUNTING_BIAS;
+ return refs;
}

/* Equivalent to calling put_page() @refs times. */
@@ -177,11 +180,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
if (flags & FOLL_PIN) {
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
refs);
-
- if (hpage_pincount_available(page))
- hpage_pincount_sub(page, refs);
- else
- refs *= GUP_PIN_COUNTING_BIAS;
+ refs = page_pincount_sub(page, refs);
}

put_page_refs(page, refs);
--
2.33.0


2022-01-10 04:27:12

by Matthew Wilcox

Subject: [PATCH v2 13/28] mm: Add folio_pincount_ptr()

This is the folio equivalent of compound_pincount_ptr().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/mm_types.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 60e4595eaf63..34c7114ea9e9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -312,6 +312,12 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
return &page[1].compound_mapcount;
}

+static inline atomic_t *folio_pincount_ptr(struct folio *folio)
+{
+ struct page *tail = &folio->page + 1;
+ return &tail->compound_pincount;
+}
+
static inline atomic_t *compound_pincount_ptr(struct page *page)
{
return &page[1].compound_pincount;
--
2.33.0


2022-01-10 04:27:15

by Matthew Wilcox

Subject: [PATCH v2 03/28] gup: Change the calling convention for compound_range_next()

Return the head page instead of storing it to a passed parameter.
Pass the start page directly instead of passing a pointer to it.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 86f8d843de72..3c93d2fdf4da 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -229,21 +229,20 @@ void unpin_user_page(struct page *page)
}
EXPORT_SYMBOL(unpin_user_page);

-static inline void compound_range_next(unsigned long i, unsigned long npages,
- struct page **list, struct page **head,
- unsigned int *ntails)
+static inline struct page *compound_range_next(unsigned long i,
+ unsigned long npages, struct page *start, unsigned int *ntails)
{
struct page *next, *page;
unsigned int nr = 1;

- next = *list + i;
+ next = start + i;
page = compound_head(next);
if (PageCompound(page) && compound_order(page) >= 1)
nr = min_t(unsigned int,
page + compound_nr(page) - next, npages - i);

- *head = page;
*ntails = nr;
+ return page;
}

static inline void compound_next(unsigned long i, unsigned long npages,
@@ -355,7 +354,7 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
unsigned int ntails;

for (index = 0; index < npages; index += ntails) {
- compound_range_next(index, npages, &page, &head, &ntails);
+ head = compound_range_next(index, npages, page, &ntails);
if (make_dirty && !PageDirty(head))
set_page_dirty_lock(head);
put_compound_head(head, ntails, FOLL_PIN);
--
2.33.0


2022-01-10 04:27:27

by Matthew Wilcox

Subject: [PATCH v2 22/28] gup: Convert gup_huge_pmd() to use a folio

Use the new folio-based APIs.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 250326458df6..a006bce2d47b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2492,7 +2492,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
unsigned long end, unsigned int flags,
struct page **pages, int *nr)
{
- struct page *head, *page;
+ struct page *page;
+ struct folio *folio;
int refs;

if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
@@ -2508,17 +2509,17 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

- head = try_grab_compound_head(pmd_page(orig), refs, flags);
- if (!head)
+ folio = try_grab_folio(page, refs, flags);
+ if (!folio)
return 0;

if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
- put_compound_head(head, refs, flags);
+ gup_put_folio(folio, refs, flags);
return 0;
}

*nr += refs;
- SetPageReferenced(head);
+ folio_set_referenced(folio);
return 1;
}

--
2.33.0


2022-01-10 04:27:38

by Matthew Wilcox

Subject: [PATCH v2 08/28] gup: Handle page split race more efficiently

If we hit the page split race, the current code returns NULL which will
presumably trigger a retry under the mmap_lock. This isn't necessary;
we can just retry the compound_head() lookup. This is a very minor
optimisation of an unlikely path, but conceptually it matches (eg)
the page cache RCU-protected lookup.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index afb638a30e44..dbb1b54d0def 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -68,7 +68,10 @@ static void put_page_refs(struct page *page, int refs)
*/
static inline struct page *try_get_compound_head(struct page *page, int refs)
{
- struct page *head = compound_head(page);
+ struct page *head;
+
+retry:
+ head = compound_head(page);

if (WARN_ON_ONCE(page_ref_count(head) < 0))
return NULL;
@@ -86,7 +89,7 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
*/
if (unlikely(compound_head(page) != head)) {
put_page_refs(head, refs);
- return NULL;
+ goto retry;
}

return head;
--
2.33.0


2022-01-10 08:23:21

by Christoph Hellwig

Subject: Re: [PATCH v2 02/28] gup: Remove for_each_compound_head()

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 08:25:39

by Christoph Hellwig

Subject: Re: [PATCH v2 03/28] gup: Change the calling convention for compound_range_next()

On Mon, Jan 10, 2022 at 04:23:41AM +0000, Matthew Wilcox (Oracle) wrote:
> Return the head page instead of storing it to a passed parameter.
> Pass the start page directly instead of passing a pointer to it.

Looks good, but when we're changing the calling conventions anyway:

> -static inline void compound_range_next(unsigned long i, unsigned long npages,
> - struct page **list, struct page **head,
> - unsigned int *ntails)
> +static inline struct page *compound_range_next(unsigned long i,
> + unsigned long npages, struct page *start, unsigned int *ntails)

To me the logical argument order would be something like:

static inline struct page *compound_range_next(struct page *start,
unsigned long npages, unsigned long i, unsigned int *ntails)

where the first two arguments pass in what is worked on and match the
calling conventions of the caller.

2022-01-10 08:27:10

by Christoph Hellwig

Subject: Re: [PATCH v2 04/28] gup: Optimise compound_range_next()

On Mon, Jan 10, 2022 at 04:23:42AM +0000, Matthew Wilcox (Oracle) wrote:
> By definition, a compound page has an order >= 1, so the second half
> of the test was redundant. Also, this cannot be a tail page since
> it's the result of calling compound_head(), so use PageHead() instead
> of PageCompound().
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 08:28:28

by Christoph Hellwig

Subject: Re: [PATCH v2 05/28] gup: Change the calling convention for compound_next()

On Mon, Jan 10, 2022 at 04:23:43AM +0000, Matthew Wilcox (Oracle) wrote:
> Return the head page instead of storing it to a passed parameter.

Looks good, but same comment about maybe passing list and npages first.

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 08:31:07

by Christoph Hellwig

Subject: Re: [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions

On Mon, Jan 10, 2022 at 04:23:44AM +0000, Matthew Wilcox (Oracle) wrote:
> Several functions in gup.c assume that a compound page has virtually
> contiguous page structs. This isn't true for SPARSEMEM configs unless
> SPARSEMEM_VMEMMAP is also set. Fix them by using nth_page() instead of
> plain pointer arithmetic.

So is this an actual bug that needs a Fixes tag, or do all architectures
that support THP and sparsemem use SPARSEMEM_VMEMMAP?

> + page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);

Would be nice to fix the indentation for sz - 1 while you're at it.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 08:33:32

by Christoph Hellwig

Subject: Re: [PATCH v2 08/28] gup: Handle page split race more efficiently

On Mon, Jan 10, 2022 at 04:23:46AM +0000, Matthew Wilcox (Oracle) wrote:
> If we hit the page split race, the current code returns NULL which will
> presumably trigger a retry under the mmap_lock. This isn't necessary;
> we can just retry the compound_head() lookup. This is a very minor
> optimisation of an unlikely path, but conceptually it matches (eg)
> the page cache RCU-protected lookup.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 08:33:49

by Christoph Hellwig

Subject: Re: [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()

On Mon, Jan 10, 2022 at 04:23:47AM +0000, Matthew Wilcox (Oracle) wrote:
> Simplify try_grab_compound_head() and remove an unnecessary
> VM_BUG_ON by handling pages both with and without a pincount field in
> page_pincount_add().

Nice:

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 08:33:54

by Christoph Hellwig

Subject: Re: [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub()

On Mon, Jan 10, 2022 at 04:23:48AM +0000, Matthew Wilcox (Oracle) wrote:
> Remove an unnecessary VM_BUG_ON by handling pages both with and without
> a pincount field in page_pincount_sub().

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 08:34:05

by Christoph Hellwig

Subject: Re: [PATCH v2 12/28] mm: Add folio_put_refs()

On Mon, Jan 10, 2022 at 04:23:50AM +0000, Matthew Wilcox (Oracle) wrote:
> This is like folio_put(), but puts N references at once instead of
> just one. It's like put_page_refs(), but does one atomic operation
> instead of two, and is available to more than just gup.c.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 08:34:13

by Christoph Hellwig

Subject: Re: [PATCH v2 13/28] mm: Add folio_pincount_ptr()

On Mon, Jan 10, 2022 at 04:23:51AM +0000, Matthew Wilcox (Oracle) wrote:
> +static inline atomic_t *folio_pincount_ptr(struct folio *folio)
> +{
> + struct page *tail = &folio->page + 1;
> + return &tail->compound_pincount;
> +}

An empty line after the declaration would be nice here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 08:37:34

by Christoph Hellwig

Subject: Re: [PATCH v2 17/28] gup: Add gup_put_folio()

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2022-01-10 13:29:04

by Matthew Wilcox

Subject: Re: [PATCH v2 05/28] gup: Change the calling convention for compound_next()

On Mon, Jan 10, 2022 at 12:27:29AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 10, 2022 at 04:23:43AM +0000, Matthew Wilcox (Oracle) wrote:
> > Return the head page instead of storing it to a passed parameter.
>
> Looks good, but same comment about maybe passing list and npages first.

Done for both.

2022-01-10 13:37:42

by Matthew Wilcox

Subject: Re: [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions

On Mon, Jan 10, 2022 at 12:29:58AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 10, 2022 at 04:23:44AM +0000, Matthew Wilcox (Oracle) wrote:
> > Several functions in gup.c assume that a compound page has virtually
> > contiguous page structs. This isn't true for SPARSEMEM configs unless
> > SPARSEMEM_VMEMMAP is also set. Fix them by using nth_page() instead of
> > plain pointer arithmetic.
>
> So is this an actual bug that needs a Fixes tag, or do all architectures
> that support THP and sparsemem use SPARSEMEM_VMEMMAP?

As far as I can tell (and I am by no means an expert in this area),
this problem only affects pages of order MAX_ORDER or higher. That is,
somebody using regular 2MB hugepages on x86 won't see a problem, whether
they're using VMEMMAP or not. It only starts to become a problem for
1GB hugepages.

Since THPs are (currently) only allocated from the page allocator, it's
never a problem for THPs, only hugetlbfs. Correcting the places which
can't see a 1GB page is just defense against copy-and-paste programming.
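
A sketch of the failure mode (my summary, assuming SPARSEMEM without
VMEMMAP):

	/* struct pages are only virtually contiguous within a memmap
	 * section, and a 1GB hugetlb page spans several sections:
	 */
	page + n;		/* may walk off the end of a section's memmap */
	nth_page(page, n);	/* pfn_to_page(page_to_pfn(page) + n) stays
				 * correct across section boundaries */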

So I'll defer to Mike -- does this ever affect real systems and thus
warrant a backport? I know this doesn't affect UEK because we enable
SPARSEMEM_VMEMMAP.

> > + page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
>
> Would be nice to fix the indentation for sz - 1 while you're at it.

Done.

2022-01-10 15:31:09

by Jason Gunthorpe

Subject: Re: [PATCH v2 00/28] Convert GUP to folios


On Mon, Jan 10, 2022 at 04:23:38AM +0000, Matthew Wilcox (Oracle) wrote:
> This patch series is against my current folio for-next branch. I know
> it won't apply to sfr's next tree, and it's not for-next material yet.
> I intend to submit it for 5.18 after I've rebased it to one of the
> 5.17-rc releases.
>
> The overall effect of this (ignoring the primary "preparing for folios
> that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
> by ~700 bytes in the config I normally test with.
>
> This patchset just converts existing implementations to use folios.
> There's no new API for consumers here to provide information in a more
> efficient (address, length) format. That will be a separate patchset.
>
> In v2, I've tried to address all the comments from the reviews I got
> on v1. Apologies if I missed anything; I got a lot of good feedback.
> Primarily I separated out the folio changes (later) from the "While
> I'm looking at this ..." changes (earlier). I'm not sure the story
> of the patchset is necessarily coherent this way, but it should be
> easier to review.
>
> Another big change is that compound_pincount is now available to all
> compound pages, not just those that are order-2-or-higher. Patch 11.
>
> I did notice one bug in my original patchset which I'm disappointed GCC
> didn't diagnose:
>
> pages[nr++] = nth_page(page, nr);
>
> Given the massive reorg of the patchset, I didn't feel right using
> anyone's SoB from v1 on any of the patches. But, despite the increased
> number of patches, I hope it's easier to review than v1.
>
> And I'd dearly love a better name than 'folio_nth'. page_nth() is
> a temporary construct, so doesn't need a better name. If you need
> context, it's in the gup_folio_range_next() patch and its job
> is to tell you, given a page and a folio, which page it is within
> the folio (a number in the range [0, folio_nr_pages())).

folio_tail_index() ?

Series still looks good to me, though I checked each patch less
carefully than the prior series:

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-01-10 16:09:43

by Matthew Wilcox

Subject: Re: [PATCH v2 00/28] Convert GUP to folios

On Mon, Jan 10, 2022 at 11:31:03AM -0400, Jason Gunthorpe wrote:
>
> On Mon, Jan 10, 2022 at 04:23:38AM +0000, Matthew Wilcox (Oracle) wrote:
> > This patch series is against my current folio for-next branch. I know
> > it won't apply to sfr's next tree, and it's not for-next material yet.
> > I intend to submit it for 5.18 after I've rebased it to one of the
> > 5.17-rc releases.
> >
> > The overall effect of this (ignoring the primary "preparing for folios
> > that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
> > by ~700 bytes in the config I normally test with.
> >
> > This patchset just converts existing implementations to use folios.
> > There's no new API for consumers here to provide information in a more
> > efficient (address, length) format. That will be a separate patchset.
> >
> > In v2, I've tried to address all the comments from the reviews I got
> > on v1. Apologies if I missed anything; I got a lot of good feedback.
> > Primarily I separated out the folio changes (later) from the "While
> > I'm looking at this ..." changes (earlier). I'm not sure the story
> > of the patchset is necessarily coherent this way, but it should be
> > easier to review.
> >
> > Another big change is that compound_pincount is now available to all
> > compound pages, not just those that are order-2-or-higher. Patch 11.
> >
> > I did notice one bug in my original patchset which I'm disappointed GCC
> > didn't diagnose:
> >
> > pages[nr++] = nth_page(page, nr);
> >
> > Given the massive reorg of the patchset, I didn't feel right using
> > anyone's SoB from v1 on any of the patches. But, despite the increased
> > number of patches, I hope it's easier to review than v1.
> >
> > And I'd dearly love a better name than 'folio_nth'. page_nth() is
> > a temporary construct, so doesn't need a better name. If you need
> > context, it's in the gup_folio_range_next() patch and its job
> > is to tell you, given a page and a folio, which page it is within
> > the folio (a number in the range [0, folio_nr_pages())).
>
> folio_tail_index() ?

I'm a little wary of "index" because folios are used to cache file data,
and folio->index means offset of the folio within the file. We could
make the argument for folio_page_index() (since it might be the head
page, not a tail page), and argue this is the index into the folio,
not the index into the file.

It's better than folio_nth ;-)

> Series still looks good to me, though I checked each patch less
> carefully than the prior series:
>
> Reviewed-by: Jason Gunthorpe <[email protected]>

Thanks!

2022-01-10 17:27:02

by William Kucharski

[permalink] [raw]
Subject: Re: [PATCH v2 00/28] Convert GUP to folios

Series still looks good.

Reviewed-by: William Kucharski <[email protected]>

> On Jan 9, 2022, at 9:23 PM, Matthew Wilcox (Oracle) <[email protected]> wrote:
>
> This patch series is against my current folio for-next branch. I know
> it won't apply to sfr's next tree, and it's not for-next material yet.
> I intend to submit it for 5.18 after I've rebased it to one of the
> 5.17-rc releases.
>
> The overall effect of this (ignoring the primary "preparing for folios
> that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
> by ~700 bytes in the config I normally test with.
>
> This patchset just converts existing implementations to use folios.
> There's no new API for consumers here to provide information in a more
> efficient (address, length) format. That will be a separate patchset.
>
> In v2, I've tried to address all the comments from the reviews I got
> on v1. Apologies if I missed anything; I got a lot of good feedback.
> Primarily I separated out the folio changes (later) from the "While
> I'm looking at this ..." changes (earlier). I'm not sure the story
> of the patchset is necessarily coherent this way, but it should be
> easier to review.
>
> Another big change is that compound_pincount is now available to all
> compound pages, not just those that are order-2-or-higher. Patch 11.
>
> I did notice one bug in my original patchset which I'm disappointed GCC
> didn't diagnose:
>
> pages[nr++] = nth_page(page, nr);
>
> Given the massive reorg of the patchset, I didn't feel right using
> anyone's SoB from v1 on any of the patches. But, despite the increased
> number of patches, I hope it's easier to review than v1.
>
> And I'd dearly love a better name than 'folio_nth'. page_nth() is
> a temporary construct, so doesn't need a better name. If you need
> context, it's in the gup_folio_range_next() patch and its job
> is to tell you, given a page and a folio, which page it is within
> the folio (a number in the range [0, folio_nr_pages())).
>
> Matthew Wilcox (Oracle) (28):
> gup: Remove for_each_compound_range()
> gup: Remove for_each_compound_head()
> gup: Change the calling convention for compound_range_next()
> gup: Optimise compound_range_next()
> gup: Change the calling convention for compound_next()
> gup: Fix some contiguous memmap assumptions
> gup: Remove an assumption of a contiguous memmap
> gup: Handle page split race more efficiently
> gup: Turn hpage_pincount_add() into page_pincount_add()
> gup: Turn hpage_pincount_sub() into page_pincount_sub()
> mm: Make compound_pincount always available
> mm: Add folio_put_refs()
> mm: Add folio_pincount_ptr()
> mm: Convert page_maybe_dma_pinned() to use a folio
> gup: Add try_get_folio() and try_grab_folio()
> mm: Remove page_cache_add_speculative() and
> page_cache_get_speculative()
> gup: Add gup_put_folio()
> hugetlb: Use try_grab_folio() instead of try_grab_compound_head()
> gup: Convert try_grab_page() to call try_grab_folio()
> gup: Convert gup_pte_range() to use a folio
> gup: Convert gup_hugepte() to use a folio
> gup: Convert gup_huge_pmd() to use a folio
> gup: Convert gup_huge_pud() to use a folio
> gup: Convert gup_huge_pgd() to use a folio
> gup: Convert compound_next() to gup_folio_next()
> gup: Convert compound_range_next() to gup_folio_range_next()
> mm: Add isolate_lru_folio()
> gup: Convert check_and_migrate_movable_pages() to use a folio
>
> Documentation/core-api/pin_user_pages.rst | 18 +-
> arch/powerpc/include/asm/mmu_context.h | 1 -
> include/linux/mm.h | 70 +++--
> include/linux/mm_types.h | 13 +-
> include/linux/pagemap.h | 11 -
> mm/debug.c | 14 +-
> mm/folio-compat.c | 8 +
> mm/gup.c | 359 ++++++++++------------
> mm/hugetlb.c | 7 +-
> mm/internal.h | 8 +-
> mm/page_alloc.c | 3 +-
> mm/rmap.c | 6 +-
> mm/vmscan.c | 43 ++-
> 13 files changed, 263 insertions(+), 298 deletions(-)
>
> --
> 2.33.0
>


2022-01-10 19:05:43

by Mike Kravetz

Subject: Re: [External] : Re: [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions

On 1/10/22 05:37, Matthew Wilcox wrote:
> On Mon, Jan 10, 2022 at 12:29:58AM -0800, Christoph Hellwig wrote:
>> On Mon, Jan 10, 2022 at 04:23:44AM +0000, Matthew Wilcox (Oracle) wrote:
>>> Several functions in gup.c assume that a compound page has virtually
>>> contiguous page structs. This isn't true for SPARSEMEM configs unless
>>> SPARSEMEM_VMEMMAP is also set. Fix them by using nth_page() instead of
>>> plain pointer arithmetic.
>>
>> So is this an actual bug that needs a Fixes tag, or do all architectures
>> that support THP and sparsemem use SPARSEMEM_VMEMMAP?
>
> As far as I can tell (and I am by no means an expert in this area),
> this problem only affects pages of order MAX_ORDER or higher. That is,
> somebody using regular 2MB hugepages on x86 won't see a problem, whether
> they're using VMEMMAP or not. It only starts to become a problem for
> 1GB hugepages.
>
> Since THPs are (currently) only allocated from the page allocator, it's
> never a problem for THPs, only hugetlbfs. Correcting the places which
> can't see a 1GB page is just defense against copy-and-paste programming.
>
> So I'll defer to Mike -- does this ever affect real systems and thus
> warrant a backport? I know this doesn't affect UEK because we enable
> SPARSEMEM_VMEMMAP.

I guess it all depends on your definition of 'real' systems. I am unaware
of any distros that disable SPARSEMEM_VMEMMAP, but I do not know or have
access to them all.

In arch specific Kconfig files, SPARSEMEM_VMEMMAP is enabled by default
(if sparsemem is enabled). However, it is 'possible' to configure a kernel
with SPARSEMEM and without SPARSEMEM_VMEMMAP.

This issue came up almost a year ago in this thread:
https://lore.kernel.org/linux-mm/[email protected]/

In practice, I do not recall ever seeing this outside of debug environments
specifically trying to hit the issue.
--
Mike Kravetz

2022-01-11 01:11:20

by John Hubbard

Subject: Re: [PATCH v2 02/28] gup: Remove for_each_compound_head()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> This macro doesn't simplify the users; it's easier to just call
> compound_next() inside a standard loop.

Yes it is. :)

>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---

Looks great.

Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

> mm/gup.c | 16 +++++-----------
> 1 file changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 7a07e0c00bf5..86f8d843de72 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -253,9 +253,6 @@ static inline void compound_next(unsigned long i, unsigned long npages,
> struct page *page;
> unsigned int nr;
>
> - if (i >= npages)
> - return;
> -
> page = compound_head(list[i]);
> for (nr = i + 1; nr < npages; nr++) {
> if (compound_head(list[nr]) != page)
> @@ -266,12 +263,6 @@ static inline void compound_next(unsigned long i, unsigned long npages,
> *ntails = nr - i;
> }
>
> -#define for_each_compound_head(__i, __list, __npages, __head, __ntails) \
> - for (__i = 0, \
> - compound_next(__i, __npages, __list, &(__head), &(__ntails)); \
> - __i < __npages; __i += __ntails, \
> - compound_next(__i, __npages, __list, &(__head), &(__ntails)))
> -
> /**
> * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
> * @pages: array of pages to be maybe marked dirty, and definitely released.
> @@ -306,7 +297,8 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
> return;
> }
>
> - for_each_compound_head(index, pages, npages, head, ntails) {
> + for (index = 0; index < npages; index += ntails) {
> + compound_next(index, npages, pages, &head, &ntails);
> /*
> * Checking PageDirty at this point may race with
> * clear_page_dirty_for_io(), but that's OK. Two key
> @@ -394,8 +386,10 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
> if (WARN_ON(IS_ERR_VALUE(npages)))
> return;
>
> - for_each_compound_head(index, pages, npages, head, ntails)
> + for (index = 0; index < npages; index += ntails) {
> + compound_next(index, npages, pages, &head, &ntails);
> put_compound_head(head, ntails, FOLL_PIN);
> + }
> }
> EXPORT_SYMBOL(unpin_user_pages);
>

2022-01-11 01:14:15

by John Hubbard

Subject: Re: [PATCH v2 03/28] gup: Change the calling convention for compound_range_next()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Return the head page instead of storing it to a passed parameter.
> Pass the start page directly instead of passing a pointer to it.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---

Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

> mm/gup.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 86f8d843de72..3c93d2fdf4da 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -229,21 +229,20 @@ void unpin_user_page(struct page *page)
> }
> EXPORT_SYMBOL(unpin_user_page);
>
> -static inline void compound_range_next(unsigned long i, unsigned long npages,
> - struct page **list, struct page **head,
> - unsigned int *ntails)
> +static inline struct page *compound_range_next(unsigned long i,
> + unsigned long npages, struct page *start, unsigned int *ntails)
> {
> struct page *next, *page;
> unsigned int nr = 1;
>
> - next = *list + i;
> + next = start + i;
> page = compound_head(next);
> if (PageCompound(page) && compound_order(page) >= 1)
> nr = min_t(unsigned int,
> page + compound_nr(page) - next, npages - i);
>
> - *head = page;
> *ntails = nr;
> + return page;
> }
>
> static inline void compound_next(unsigned long i, unsigned long npages,
> @@ -355,7 +354,7 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
> unsigned int ntails;
>
> for (index = 0; index < npages; index += ntails) {
> - compound_range_next(index, npages, &page, &head, &ntails);
> + head = compound_range_next(index, npages, page, &ntails);
> if (make_dirty && !PageDirty(head))
> set_page_dirty_lock(head);
> put_compound_head(head, ntails, FOLL_PIN);


2022-01-11 01:17:01

by John Hubbard

Subject: Re: [PATCH v2 04/28] gup: Optimise compound_range_next()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> By definition, a compound page has an order >= 1, so the second half
> of the test was redundant. Also, this cannot be a tail page since
> it's the result of calling compound_head(), so use PageHead() instead
> of PageCompound().
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---

Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

> mm/gup.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 3c93d2fdf4da..6eedca605b3d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -237,7 +237,7 @@ static inline struct page *compound_range_next(unsigned long i,
>
> next = start + i;
> page = compound_head(next);
> - if (PageCompound(page) && compound_order(page) >= 1)
> + if (PageHead(page))
> nr = min_t(unsigned int,
> page + compound_nr(page) - next, npages - i);
>


2022-01-11 01:18:59

by John Hubbard

Subject: Re: [PATCH v2 05/28] gup: Change the calling convention for compound_next()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Return the head page instead of storing it to a passed parameter.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/gup.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)


Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

>
> diff --git a/mm/gup.c b/mm/gup.c
> index 6eedca605b3d..8a0ea220ced1 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -245,9 +245,8 @@ static inline struct page *compound_range_next(unsigned long i,
> return page;
> }
>
> -static inline void compound_next(unsigned long i, unsigned long npages,
> - struct page **list, struct page **head,
> - unsigned int *ntails)
> +static inline struct page *compound_next(unsigned long i,
> + unsigned long npages, struct page **list, unsigned int *ntails)
> {
> struct page *page;
> unsigned int nr;
> @@ -258,8 +257,8 @@ static inline void compound_next(unsigned long i, unsigned long npages,
> break;
> }
>
> - *head = page;
> *ntails = nr - i;
> + return page;
> }
>
> /**
> @@ -297,7 +296,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
> }
>
> for (index = 0; index < npages; index += ntails) {
> - compound_next(index, npages, pages, &head, &ntails);
> + head = compound_next(index, npages, pages, &ntails);
> /*
> * Checking PageDirty at this point may race with
> * clear_page_dirty_for_io(), but that's OK. Two key
> @@ -386,7 +385,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
> return;
>
> for (index = 0; index < npages; index += ntails) {
> - compound_next(index, npages, pages, &head, &ntails);
> + head = compound_next(index, npages, pages, &ntails);
> put_compound_head(head, ntails, FOLL_PIN);
> }
> }


2022-01-11 01:47:35

by John Hubbard

Subject: Re: [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Several functions in gup.c assume that a compound page has virtually
> contiguous page structs. This isn't true for SPARSEMEM configs unless
> SPARSEMEM_VMEMMAP is also set. Fix them by using nth_page() instead of
> plain pointer arithmetic.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/gup.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)

Reviewed-by: John Hubbard <[email protected]>

thanks,
--
John Hubbard
NVIDIA

>
> diff --git a/mm/gup.c b/mm/gup.c
> index 8a0ea220ced1..9c0a702a4e03 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -235,7 +235,7 @@ static inline struct page *compound_range_next(unsigned long i,
> struct page *next, *page;
> unsigned int nr = 1;
>
> - next = start + i;
> + next = nth_page(start, i);
> page = compound_head(next);
> if (PageHead(page))
> nr = min_t(unsigned int,
> @@ -2430,8 +2430,8 @@ static int record_subpages(struct page *page, unsigned long addr,
> {
> int nr;
>
> - for (nr = 0; addr != end; addr += PAGE_SIZE)
> - pages[nr++] = page++;
> + for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
> + pages[nr] = nth_page(page, nr);
>
> return nr;
> }
> @@ -2466,7 +2466,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
> VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>
> head = pte_page(pte);
> - page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
> + page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
> refs = record_subpages(page, addr, end, pages + *nr);
>
> head = try_grab_compound_head(head, refs, flags);
> @@ -2526,7 +2526,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> pages, nr);
> }
>
> - page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> + page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
> refs = record_subpages(page, addr, end, pages + *nr);
>
> head = try_grab_compound_head(pmd_page(orig), refs, flags);
> @@ -2560,7 +2560,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> pages, nr);
> }
>
> - page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> + page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
> refs = record_subpages(page, addr, end, pages + *nr);
>
> head = try_grab_compound_head(pud_page(orig), refs, flags);
> @@ -2589,7 +2589,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
>
> BUILD_BUG_ON(pgd_devmap(orig));
>
> - page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
> + page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
> refs = record_subpages(page, addr, end, pages + *nr);
>
> head = try_grab_compound_head(pgd_page(orig), refs, flags);


2022-01-11 03:27:54

by John Hubbard

Subject: Re: [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> This assumption needs the inverse of nth_page(), which I've temporarily
> named page_nth() until someone comes up with a better name.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> include/linux/mm.h | 2 ++
> mm/gup.c | 4 ++--
> 2 files changed, 4 insertions(+), 2 deletions(-)

Reviewed-by: John Hubbard <[email protected]>

Maybe a better name for page_nth() will come to mind by the time I
reach the end of this series, we'll see. :)


thanks,
--
John Hubbard
NVIDIA

>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d8b7d7ed14dd..f2f3400665a4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -216,8 +216,10 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
>
> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> +#define page_nth(head, tail) (page_to_pfn(tail) - page_to_pfn(head))
> #else
> #define nth_page(page,n) ((page) + (n))
> +#define page_nth(head, tail) ((tail) - (head))
> #endif
>
> /* to align the pointer to the (next) page boundary */
> diff --git a/mm/gup.c b/mm/gup.c
> index 9c0a702a4e03..afb638a30e44 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -238,8 +238,8 @@ static inline struct page *compound_range_next(unsigned long i,
> next = nth_page(start, i);
> page = compound_head(next);
> if (PageHead(page))
> - nr = min_t(unsigned int,
> - page + compound_nr(page) - next, npages - i);
> + nr = min_t(unsigned int, npages - i,
> + compound_nr(page) - page_nth(page, next));
>
> *ntails = nr;
> return page;
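
(Illustration only, not from the patch: page_nth() inverts nth_page()
under both #ifdef branches above, since the SPARSEMEM variant round-trips
through the pfn and the contiguous variant is plain pointer arithmetic:)

        /* Sketch: recover the index that nth_page() advanced by. */
        struct page *tail = nth_page(head, n);  /* n in [0, compound_nr(head)) */
        VM_BUG_ON(page_nth(head, tail) != n);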


2022-01-11 03:30:56

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 08/28] gup: Handle page split race more efficiently

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> If we hit the page split race, the current code returns NULL, which will
> presumably trigger a retry under the mmap_lock. This isn't necessary;
> we can just retry the compound_head() lookup. This is a very minor
> optimisation of an unlikely path, but conceptually it matches (eg)
> the page cache RCU-protected lookup.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/gup.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)


Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

>
> diff --git a/mm/gup.c b/mm/gup.c
> index afb638a30e44..dbb1b54d0def 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -68,7 +68,10 @@ static void put_page_refs(struct page *page, int refs)
> */
> static inline struct page *try_get_compound_head(struct page *page, int refs)
> {
> - struct page *head = compound_head(page);
> + struct page *head;
> +
> +retry:
> + head = compound_head(page);
>
> if (WARN_ON_ONCE(page_ref_count(head) < 0))
> return NULL;
> @@ -86,7 +89,7 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
> */
> if (unlikely(compound_head(page) != head)) {
> put_page_refs(head, refs);
> - return NULL;
> + goto retry;
> }
>
> return head;
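
(Schematically, the patched function is the usual speculative-get,
revalidate, undo-on-race loop. A sketch, assuming the references are
taken with page_ref_add_unless(); the real code goes through
try_get_compound_head()'s existing helpers:)

        for (;;) {
                struct page *head = compound_head(page);

                if (WARN_ON_ONCE(page_ref_count(head) < 0))
                        return NULL;            /* refcount went negative: bail */
                if (!page_ref_add_unless(head, refs, 0))
                        return NULL;            /* refcount was zero: being freed */
                if (likely(compound_head(page) == head))
                        return head;            /* no split raced with us */
                put_page_refs(head, refs);      /* split race: undo and retry */
        }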

2022-01-11 03:36:55

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Simplify try_grab_compound_head() and remove an unnecessary
> VM_BUG_ON by handling pages both with and without a pincount field in
> page_pincount_add().
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/gup.c | 33 +++++++++++++++------------------
> 1 file changed, 15 insertions(+), 18 deletions(-)

Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

>
> diff --git a/mm/gup.c b/mm/gup.c
> index dbb1b54d0def..3ed9907f3c8d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,12 +29,23 @@ struct follow_page_context {
> unsigned int page_mask;
> };
>
> -static void hpage_pincount_add(struct page *page, int refs)
> +/*
> + * When pinning a compound page of order > 1 (which is what
> + * hpage_pincount_available() checks for), use an exact count to track
> + * it, via page_pincount_add/_sub().
> + *
> + * However, be sure to *also* increment the normal page refcount field
> + * at least once, so that the page really is pinned. That's why the
> + * refcount from the earlier try_get_compound_head() is left intact.
> + */
> +static void page_pincount_add(struct page *page, int refs)
> {
> - VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> VM_BUG_ON_PAGE(page != compound_head(page), page);
>
> - atomic_add(refs, compound_pincount_ptr(page));
> + if (hpage_pincount_available(page))
> + atomic_add(refs, compound_pincount_ptr(page));
> + else
> + page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> }
>
> static void hpage_pincount_sub(struct page *page, int refs)
> @@ -150,21 +161,7 @@ struct page *try_grab_compound_head(struct page *page,
> if (!page)
> return NULL;
>
> - /*
> - * When pinning a compound page of order > 1 (which is what
> - * hpage_pincount_available() checks for), use an exact count to
> - * track it, via hpage_pincount_add/_sub().
> - *
> - * However, be sure to *also* increment the normal page refcount
> - * field at least once, so that the page really is pinned.
> - * That's why the refcount from the earlier
> - * try_get_compound_head() is left intact.
> - */
> - if (hpage_pincount_available(page))
> - hpage_pincount_add(page, refs);
> - else
> - page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> -
> + page_pincount_add(page, refs);
> mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
> refs);
>
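
(Background sketch for the bias arithmetic, not part of the patch:
GUP_PIN_COUNTING_BIAS is 1U << 10 in this kernel, so on pages without a
compound_pincount field each pin inflates the refcount by the bias; the
"- 1" above accounts for the reference try_get_compound_head() already
took. Such a page is later treated as maybe-pinned once its refcount
reaches the bias, roughly:)

        /* Hypothetical helper name; mirrors the small-page half of the
         * existing page_maybe_dma_pinned() heuristic. */
        static inline bool page_maybe_pinned_sketch(struct page *page)
        {
                return page_ref_count(page) >= GUP_PIN_COUNTING_BIAS;
        }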


2022-01-11 04:14:11

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 12/28] mm: Add folio_put_refs()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> This is like folio_put(), but puts N references at once instead of
> just one. It's like put_page_refs(), but does one atomic operation
> instead of two, and is available to more than just gup.c.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> include/linux/mm.h | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 598be27d4d2e..bf9624ca61c3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1234,6 +1234,26 @@ static inline void folio_put(struct folio *folio)
> __put_page(&folio->page);
> }
>
> +/**
> + * folio_put_refs - Reduce the reference count on a folio.
> + * @folio: The folio.
> + * @refs: The number of references to reduce.

Tiny rewording suggestion, if you end up sending another version:

* @refs: The amount to subtract from the folio's reference count.

> + *
> + * If the folio's reference count reaches zero, the memory will be
> + * released back to the page allocator and may be used by another
> + * allocation immediately. Do not access the memory or the struct folio
> + * after calling folio_put_refs() unless you can be sure that these weren't
> + * the last references.
> + *
> + * Context: May be called in process or interrupt context, but not in NMI
> + * context. May be called while holding a spinlock.
> + */
> +static inline void folio_put_refs(struct folio *folio, int refs)
> +{
> + if (folio_ref_sub_and_test(folio, refs))
> + __put_page(&folio->page);
> +}
> +
> static inline void put_page(struct page *page)
> {
> struct folio *folio = page_folio(page);


Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA
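
(A usage illustration, assuming the caller holds @refs references on the
folio:)

        /* Instead of @refs separate atomic decrements:
         *      while (refs--)
         *              folio_put(folio);
         * drop them all with a single atomic subtraction: */
        folio_put_refs(folio, refs);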

2022-01-11 04:22:17

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 13/28] mm: Add folio_pincount_ptr()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of compound_pincount_ptr().
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> include/linux/mm_types.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 60e4595eaf63..34c7114ea9e9 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -312,6 +312,12 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
> return &page[1].compound_mapcount;
> }
>
> +static inline atomic_t *folio_pincount_ptr(struct folio *folio)
> +{
> + struct page *tail = &folio->page + 1;
> + return &tail->compound_pincount;
> +}
> +

Should we make this code (the various folio map/pin count inline
functions) more uniform? I see that you prefer "+1" over page[1], sure,
but having a mix seems like it's trying to call out a difference for
which the reader would search in vain. :)

After the whole series is applied, this area ends up with a 50/50 mix of the
two.

Or am I overlooking something, and they really *should* look different?


Just a very minor point, so either way,

Reviewed-by: John Hubbard <[email protected]>

thanks,
--
John Hubbard
NVIDIA

> static inline atomic_t *compound_pincount_ptr(struct page *page)
> {
> return &page[1].compound_pincount;
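
(Illustration of the point above, not from the series: the two spellings
denote the same tail page, so the mix is purely cosmetic:)

        struct page *page = &folio->page;
        atomic_t *a = &page[1].compound_pincount;               /* array style */
        atomic_t *b = &(&folio->page + 1)->compound_pincount;   /* pointer style */
        /* a == b for any folio of order >= 1. */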

2022-01-11 04:32:27

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
...
> diff --git a/mm/gup.c b/mm/gup.c
> index dbb1b54d0def..3ed9907f3c8d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,12 +29,23 @@ struct follow_page_context {
> unsigned int page_mask;
> };
>
> -static void hpage_pincount_add(struct page *page, int refs)
> +/*
> + * When pinning a compound page of order > 1 (which is what
> + * hpage_pincount_available() checks for), use an exact count to track
> + * it, via page_pincount_add/_sub().
> + *
> + * However, be sure to *also* increment the normal page refcount field
> + * at least once, so that the page really is pinned. That's why the
> + * refcount from the earlier try_get_compound_head() is left intact.
> + */

I just realized, after looking at this again in a later patch, that the
last paragraph, above, is now misplaced. It refers to the behavior of the
caller, not to this routine. So it needs to be airlifted back to the
caller.

> +static void page_pincount_add(struct page *page, int refs)
> {
> - VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> VM_BUG_ON_PAGE(page != compound_head(page), page);
>
> - atomic_add(refs, compound_pincount_ptr(page));
> + if (hpage_pincount_available(page))
> + atomic_add(refs, compound_pincount_ptr(page));
> + else
> + page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> }
>
> static void hpage_pincount_sub(struct page *page, int refs)
> @@ -150,21 +161,7 @@ struct page *try_grab_compound_head(struct page *page,
> if (!page)
> return NULL;
>
> - /*
> - * When pinning a compound page of order > 1 (which is what
> - * hpage_pincount_available() checks for), use an exact count to
> - * track it, via hpage_pincount_add/_sub().
> - *
> - * However, be sure to *also* increment the normal page refcount
> - * field at least once, so that the page really is pinned.
> - * That's why the refcount from the earlier
> - * try_get_compound_head() is left intact.
> - */

...here.

> - if (hpage_pincount_available(page))
> - hpage_pincount_add(page, refs);
> - else
> - page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> -
> + page_pincount_add(page, refs);
> mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
> refs);
>

thanks,
--
John Hubbard
NVIDIA

2022-01-11 06:40:53

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Remove an unnecessary VM_BUG_ON by handling pages both with and without
> a pincount field in page_pincount_sub().
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/gup.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 3ed9907f3c8d..aed48de3912e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -48,12 +48,15 @@ static void page_pincount_add(struct page *page, int refs)
> page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> }
>
> -static void hpage_pincount_sub(struct page *page, int refs)

OK, this is a correct transformation. But it does make the function, as
small as it is, rather hard to understand. It is now factored out at a
strange place, as evidenced by the difficulty of documenting it. Here's
my best attempt so far:

/*
* Subtract @refs from the compound_pincount, if applicable. Otherwise, do
* nothing. In all cases, return the number of page references that should
* be passed to put_page_refs().
*/

...which I think is worth adding, if we can't find a better factoring.

Either way, it's correct, so:


Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

> +static int page_pincount_sub(struct page *page, int refs)
> {
> - VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> VM_BUG_ON_PAGE(page != compound_head(page), page);
>
> - atomic_sub(refs, compound_pincount_ptr(page));
> + if (hpage_pincount_available(page))
> + atomic_sub(refs, compound_pincount_ptr(page));
> + else
> + refs *= GUP_PIN_COUNTING_BIAS;
> + return refs;
> }
>
> /* Equivalent to calling put_page() @refs times. */
> @@ -177,11 +180,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
> if (flags & FOLL_PIN) {
> mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
> refs);
> -
> - if (hpage_pincount_available(page))
> - hpage_pincount_sub(page, refs);
> - else
> - refs *= GUP_PIN_COUNTING_BIAS;
> + refs = page_pincount_sub(page, refs);
> }
>
> put_page_refs(page, refs);
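
(Applied in place, the suggested comment and the helper from the diff
above would read:)

        /*
         * Subtract @refs from the compound_pincount, if applicable. Otherwise, do
         * nothing. In all cases, return the number of page references that should
         * be passed to put_page_refs().
         */
        static int page_pincount_sub(struct page *page, int refs)
        {
                VM_BUG_ON_PAGE(page != compound_head(page), page);

                if (hpage_pincount_available(page))
                        atomic_sub(refs, compound_pincount_ptr(page));
                else
                        refs *= GUP_PIN_COUNTING_BIAS;
                return refs;
        }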


2022-01-11 06:44:44

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 17/28] gup: Add gup_put_folio()

On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Convert put_compound_head() to gup_put_folio() and hpage_pincount_sub()
> to folio_pincount_sub(). This removes the last call to put_page_refs(),
> so delete it. Add a temporary put_compound_head() wrapper which will
> be deleted by the end of this series.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/gup.c | 42 ++++++++++++++----------------------------
> 1 file changed, 14 insertions(+), 28 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 9e581201d679..719252fa0402 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -45,34 +45,15 @@ static void folio_pincount_add(struct folio *folio, int refs)
> folio_ref_add(folio, refs * (GUP_PIN_COUNTING_BIAS - 1));
> }
>
> -static int page_pincount_sub(struct page *page, int refs)
> +static int folio_pincount_sub(struct folio *folio, int refs)

I just noticed that this return value was a little hard to reason about, which
prompted me to realize that I'd skipped reviewing patch 10. So I went back
and added a note there. :)

Anyway, this one looks good,

Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

> {
> - VM_BUG_ON_PAGE(page != compound_head(page), page);
> -
> - if (PageHead(page))
> - atomic_sub(refs, compound_pincount_ptr(page));
> + if (folio_test_large(folio))
> + atomic_sub(refs, folio_pincount_ptr(folio));
> else
> refs *= GUP_PIN_COUNTING_BIAS;
> return refs;
> }
>
> -/* Equivalent to calling put_page() @refs times. */
> -static void put_page_refs(struct page *page, int refs)
> -{
> -#ifdef CONFIG_DEBUG_VM
> - if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
> - return;
> -#endif
> -
> - /*
> - * Calling put_page() for each ref is unnecessarily slow. Only the last
> - * ref needs a put_page().
> - */
> - if (refs > 1)
> - page_ref_sub(page, refs - 1);
> - put_page(page);
> -}
> -
> /*
> * Return the folio with ref appropriately incremented,
> * or NULL if that failed.
> @@ -171,15 +152,20 @@ struct page *try_grab_compound_head(struct page *page,
> return &try_grab_folio(page, refs, flags)->page;
> }
>
> -static void put_compound_head(struct page *page, int refs, unsigned int flags)
> +static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
> {
> if (flags & FOLL_PIN) {
> - mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
> - refs);
> - refs = page_pincount_sub(page, refs);
> + node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
> + refs = folio_pincount_sub(folio, refs);
> }
>
> - put_page_refs(page, refs);
> + folio_put_refs(folio, refs);
> +}
> +
> +static void put_compound_head(struct page *page, int refs, unsigned int flags)
> +{
> + VM_BUG_ON_PAGE(PageTail(page), page);
> + gup_put_folio((struct folio *)page, refs, flags);
> }
>
> /**
> @@ -220,7 +206,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
> */
> void unpin_user_page(struct page *page)
> {
> - put_compound_head(compound_head(page), 1, FOLL_PIN);
> + gup_put_folio(page_folio(page), 1, FOLL_PIN);
> }
> EXPORT_SYMBOL(unpin_user_page);
>
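
(A note on the temporary wrapper above: casting a struct page * to a
struct folio * is only valid for a head page, which the
VM_BUG_ON_PAGE(PageTail(page), page) asserts. A caller that may hold a
tail page converts explicitly instead, as unpin_user_page() now does:)

        /* page_folio() walks from any page, head or tail, to its folio: */
        gup_put_folio(page_folio(page), 1, FOLL_PIN);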


2022-01-11 07:36:13

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 22/28] gup: Convert gup_huge_pmd() to use a folio

On 1/9/22 20:24, Matthew Wilcox (Oracle) wrote:
> Use the new folio-based APIs.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/gup.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>

Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

> diff --git a/mm/gup.c b/mm/gup.c
> index 250326458df6..a006bce2d47b 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2492,7 +2492,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> unsigned long end, unsigned int flags,
> struct page **pages, int *nr)
> {
> - struct page *head, *page;
> + struct page *page;
> + struct folio *folio;
> int refs;
>
> if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
> @@ -2508,17 +2509,17 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
> refs = record_subpages(page, addr, end, pages + *nr);
>
> - head = try_grab_compound_head(pmd_page(orig), refs, flags);
> - if (!head)
> + folio = try_grab_folio(page, refs, flags);
> + if (!folio)
> return 0;
>
> if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> - put_compound_head(head, refs, flags);
> + gup_put_folio(folio, refs, flags);
> return 0;
> }
>
> *nr += refs;
> - SetPageReferenced(head);
> + folio_set_referenced(folio);
> return 1;
> }
>
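
(The same grab, revalidate, undo-on-race shape recurs in each gup_huge_*()
conversion in this series; annotated for the pmd case above:)

        folio = try_grab_folio(page, refs, flags);
        if (!folio)
                return 0;                       /* could not take references */
        if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
                gup_put_folio(folio, refs, flags);      /* entry changed: undo */
                return 0;
        }
        *nr += refs;                            /* success: record the pages */
        folio_set_referenced(folio);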

2022-01-11 07:38:07

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2 23/28] gup: Convert gup_huge_pud() to use a folio

On 1/9/22 20:24, Matthew Wilcox (Oracle) wrote:
> Use the new folio-based APIs.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/gup.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)

Reviewed-by: John Hubbard <[email protected]>


thanks,
--
John Hubbard
NVIDIA

>
> diff --git a/mm/gup.c b/mm/gup.c
> index a006bce2d47b..7b7bf8361558 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2527,7 +2527,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> unsigned long end, unsigned int flags,
> struct page **pages, int *nr)
> {
> - struct page *head, *page;
> + struct page *page;
> + struct folio *folio;
> int refs;
>
> if (!pud_access_permitted(orig, flags & FOLL_WRITE))
> @@ -2543,17 +2544,17 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
> refs = record_subpages(page, addr, end, pages + *nr);
>
> - head = try_grab_compound_head(pud_page(orig), refs, flags);
> - if (!head)
> + folio = try_grab_folio(page, refs, flags);
> + if (!folio)
> return 0;
>
> if (unlikely(pud_val(orig) != pud_val(*pudp))) {
> - put_compound_head(head, refs, flags);
> + gup_put_folio(folio, refs, flags);
> return 0;
> }
>
> *nr += refs;
> - SetPageReferenced(head);
> + folio_set_referenced(folio);
> return 1;
> }
>

2022-01-11 13:46:55

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()

On Mon, Jan 10, 2022 at 08:32:20PM -0800, John Hubbard wrote:
> On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> ...
> > diff --git a/mm/gup.c b/mm/gup.c
> > index dbb1b54d0def..3ed9907f3c8d 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -29,12 +29,23 @@ struct follow_page_context {
> > unsigned int page_mask;
> > };
> > -static void hpage_pincount_add(struct page *page, int refs)
> > +/*
> > + * When pinning a compound page of order > 1 (which is what
> > + * hpage_pincount_available() checks for), use an exact count to track
> > + * it, via page_pincount_add/_sub().
> > + *
> > + * However, be sure to *also* increment the normal page refcount field
> > + * at least once, so that the page really is pinned. That's why the
> > + * refcount from the earlier try_get_compound_head() is left intact.
> > + */
>
> I just realized, after looking at this again in a later patch, that the
> last paragraph, above, is now misplaced. It refers to the behavior of the
> caller, not to this routine. So it needs to be airlifted back to the
> caller.

I really do think it fits better here. The thing is, there's just one
caller, so it's a little hard to decide what "all callers" need when
there's only one. Maybe I can wordsmith this a bit to read better.

> > +static void page_pincount_add(struct page *page, int refs)
> > {
> > - VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> > VM_BUG_ON_PAGE(page != compound_head(page), page);
> > - atomic_add(refs, compound_pincount_ptr(page));
> > + if (hpage_pincount_available(page))
> > + atomic_add(refs, compound_pincount_ptr(page));
> > + else
> > + page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> > }
> > static void hpage_pincount_sub(struct page *page, int refs)
> > @@ -150,21 +161,7 @@ struct page *try_grab_compound_head(struct page *page,
> > if (!page)
> > return NULL;
> > - /*
> > - * When pinning a compound page of order > 1 (which is what
> > - * hpage_pincount_available() checks for), use an exact count to
> > - * track it, via hpage_pincount_add/_sub().
> > - *
> > - * However, be sure to *also* increment the normal page refcount
> > - * field at least once, so that the page really is pinned.
> > - * That's why the refcount from the earlier
> > - * try_get_compound_head() is left intact.
> > - */
>
> ...here.
>
> > - if (hpage_pincount_available(page))
> > - hpage_pincount_add(page, refs);
> > - else
> > - page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> > -
> > + page_pincount_add(page, refs);
> > mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
> > refs);
>
> thanks,
> --
> John Hubbard
> NVIDIA