2022-02-08 16:50:43

by Matthew Wilcox

Subject: [PATCH 00/75] MM folio patches for 5.18

Whole series available through git, and shortly in linux-next:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/for-next
or git://git.infradead.org/users/willy/pagecache.git for-next

The first few patches should look familiar to most; these are converting
the GUP code to folios (and a few other things). Most are well-reviewed,
but I did have to make significant changes to a few patches to accommodate
John's recent bugfix, so I dropped the R-b from them.

After the GUP changes, I started working on vmscan, trying to convert
all of shrink_page_list() to use a folio. The pages it works on are
folios by definition: they're chained through ->lru, and ->lru occupies
the same bytes of memory as ->compound_head, so they can't be tail
pages. This is a ridiculously large function, and I'm only part
of the way through it. I have, however, finished converting rmap_walk()
and friends to take a folio instead of a page.
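
For reference, the lru_to_folio() helper added by this series ("mm: Add
lru_to_folio()") ends up looking roughly like this (a sketch written from
the reasoning above, not quoted verbatim from the patch):

	static inline struct folio *lru_to_folio(struct list_head *head)
	{
		/*
		 * Anything linked through ->lru cannot be a tail page, so
		 * treating the entry as a folio is always safe here.
		 */
		return list_entry((head)->prev, struct folio, lru);
	}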

Midway through, there's a short detour to fix up page_vma_mapped_walk to
work on an explicit PFN range instead of a page. I had been intending to
convert that to use a folio, but with page_mapped_in_vma() really just
wanting to know about one page (even if it's a head page) and Muchun
wanting to walk pageless memory, making all the users use PFNs just
seemed like the right thing to do.
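
Concretely, the walk state ends up carrying a PFN range rather than a
page pointer; the resulting structure looks roughly like this (a sketch,
with field names as they appear in the branch above):

	struct page_vma_mapped_walk {
		unsigned long pfn;		/* first PFN of the range being walked */
		unsigned long nr_pages;		/* how many pages the range covers */
		pgoff_t pgoff;			/* offset of the range in the mapping */
		struct vm_area_struct *vma;
		unsigned long address;
		pmd_t *pmd;
		pte_t *pte;
		spinlock_t *ptl;
		unsigned int flags;
	};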

The last 9 patches actually start adding large folios to the page cache.
This is where I expect the most trouble, but they've been stable in my
testing for a while.
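
Large folios remain opt-in per mapping: a filesystem that can cope with
them sets a flag on the address_space, roughly like this (sketch; the
helper is the one in include/linux/pagemap.h, and the call site shown is
only illustrative):

	static inline void mapping_set_large_folios(struct address_space *mapping)
	{
		/* Tell the page cache this mapping copes with folios > PAGE_SIZE */
		__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
	}

	/* A filesystem would call it while setting up an inode's mapping, e.g.: */
	mapping_set_large_folios(inode->i_mapping);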

Matthew Wilcox (Oracle) (74):
mm/gup: Increment the page refcount before the pincount
mm/gup: Remove for_each_compound_range()
mm/gup: Remove for_each_compound_head()
mm/gup: Change the calling convention for compound_range_next()
mm/gup: Optimise compound_range_next()
mm/gup: Change the calling convention for compound_next()
mm/gup: Fix some contiguous memmap assumptions
mm/gup: Remove an assumption of a contiguous memmap
mm/gup: Handle page split race more efficiently
mm/gup: Remove hpage_pincount_add()
mm/gup: Remove hpage_pincount_sub()
mm: Make compound_pincount always available
mm: Add folio_pincount_ptr()
mm: Turn page_maybe_dma_pinned() into folio_maybe_dma_pinned()
mm/gup: Add try_get_folio() and try_grab_folio()
mm/gup: Convert try_grab_page() to use a folio
mm: Remove page_cache_add_speculative() and
page_cache_get_speculative()
mm/gup: Add gup_put_folio()
mm/hugetlb: Use try_grab_folio() instead of try_grab_compound_head()
mm/gup: Convert gup_pte_range() to use a folio
mm/gup: Convert gup_hugepte() to use a folio
mm/gup: Convert gup_huge_pmd() to use a folio
mm/gup: Convert gup_huge_pud() to use a folio
mm/gup: Convert gup_huge_pgd() to use a folio
mm/gup: Turn compound_next() into gup_folio_next()
mm/gup: Turn compound_range_next() into gup_folio_range_next()
mm: Turn isolate_lru_page() into folio_isolate_lru()
mm/gup: Convert check_and_migrate_movable_pages() to use a folio
mm/workingset: Convert workingset_eviction() to take a folio
mm/memcg: Convert mem_cgroup_swapout() to take a folio
mm: Add lru_to_folio()
mm: Turn putback_lru_page() into folio_putback_lru()
mm/vmscan: Convert __remove_mapping() to take a folio
mm/vmscan: Turn page_check_dirty_writeback() into
folio_check_dirty_writeback()
mm: Turn head_compound_mapcount() into folio_entire_mapcount()
mm: Add folio_mapcount()
mm: Add split_folio_to_list()
mm: Add folio_is_zone_device() and folio_is_device_private()
mm: Add folio_pgoff()
mm: Add pvmw_set_page() and pvmw_set_folio()
hexagon: Add pmd_pfn()
mm: Convert page_vma_mapped_walk to work on PFNs
mm/page_idle: Convert page_idle_clear_pte_refs() to use a folio
mm/rmap: Use a folio in page_mkclean_one()
mm/rmap: Turn page_referenced() into folio_referenced()
mm/mlock: Turn clear_page_mlock() into folio_end_mlock()
mm/mlock: Turn mlock_vma_page() into mlock_vma_folio()
mm/rmap: Turn page_mlock() into folio_mlock()
mm/mlock: Turn munlock_vma_page() into munlock_vma_folio()
mm/huge_memory: Convert __split_huge_pmd() to take a folio
mm/rmap: Convert try_to_unmap() to take a folio
mm/rmap: Convert try_to_migrate() to folios
mm/rmap: Convert make_device_exclusive_range() to use folios
mm/migrate: Convert remove_migration_ptes() to folios
mm/damon: Convert damon_pa_mkold() to use a folio
mm/damon: Convert damon_pa_young() to use a folio
mm/rmap: Turn page_lock_anon_vma_read() into
folio_lock_anon_vma_read()
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Convert rmap_walk() to take a folio
mm/rmap: Constify the rmap_walk_control argument
mm/vmscan: Free non-shmem folios without splitting them
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Account large folios correctly
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Convert pageout() to take a folio
mm: Turn can_split_huge_page() into can_split_folio()
mm/filemap: Allow large folios to be added to the page cache
mm: Fix READ_ONLY_THP warning
mm: Make large folios depend on THP
mm: Support arbitrary THP sizes
mm/readahead: Add large folio readahead
mm/readahead: Switch to page_cache_ra_order
mm/filemap: Support VM_HUGEPAGE for file mappings
selftests/vm/transhuge-stress: Support file-backed PMD folios

William Kucharski (1):
mm/readahead: Align file mappings for non-DAX

Documentation/core-api/pin_user_pages.rst | 18 +-
arch/hexagon/include/asm/pgtable.h | 3 +-
arch/powerpc/include/asm/mmu_context.h | 1 -
include/linux/huge_mm.h | 59 +--
include/linux/hugetlb.h | 5 +
include/linux/ksm.h | 6 +-
include/linux/mm.h | 145 +++---
include/linux/mm_types.h | 7 +-
include/linux/pagemap.h | 32 +-
include/linux/rmap.h | 50 ++-
include/linux/swap.h | 6 +-
include/trace/events/vmscan.h | 10 +-
kernel/events/uprobes.c | 2 +-
mm/damon/paddr.c | 52 ++-
mm/debug.c | 18 +-
mm/filemap.c | 59 ++-
mm/folio-compat.c | 34 ++
mm/gup.c | 383 +++++++---------
mm/huge_memory.c | 127 +++---
mm/hugetlb.c | 7 +-
mm/internal.h | 52 ++-
mm/ksm.c | 17 +-
mm/memcontrol.c | 22 +-
mm/memory-failure.c | 10 +-
mm/memory_hotplug.c | 13 +-
mm/migrate.c | 90 ++--
mm/mlock.c | 136 +++---
mm/page_alloc.c | 3 +-
mm/page_idle.c | 26 +-
mm/page_vma_mapped.c | 58 ++-
mm/readahead.c | 108 ++++-
mm/rmap.c | 416 +++++++++---------
mm/util.c | 36 +-
mm/vmscan.c | 280 ++++++------
mm/workingset.c | 25 +-
tools/testing/selftests/vm/transhuge-stress.c | 35 +-
36 files changed, 1270 insertions(+), 1081 deletions(-)

--
2.34.1



2022-02-08 18:29:34

by Matthew Wilcox

Subject: [PATCH 16/75] mm/gup: Convert try_grab_page() to use a folio

Hoist the folio conversion and the folio_ref_count() check to the
top of the function instead of using the one buried in try_get_page().
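
For readability, the whole function as it reads after this patch
(reconstructed from the diff below):

	bool __must_check try_grab_page(struct page *page, unsigned int flags)
	{
		struct folio *folio = page_folio(page);

		WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == (FOLL_GET | FOLL_PIN));
		if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
			return false;

		if (flags & FOLL_GET)
			folio_ref_inc(folio);
		else if (flags & FOLL_PIN) {
			/*
			 * Similar to try_grab_folio(): be sure to *also*
			 * increment the normal page refcount field at least once,
			 * so that the page really is pinned.
			 */
			if (folio_test_large(folio)) {
				folio_ref_add(folio, 1);
				atomic_add(1, folio_pincount_ptr(folio));
			} else {
				folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
			}

			node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
		}

		return true;
	}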

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/gup.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 4f1669db92f5..d18ce4da573f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -174,15 +174,14 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)

/**
* try_grab_page() - elevate a page's refcount by a flag-dependent amount
+ * @page: pointer to page to be grabbed
+ * @flags: gup flags: these are the FOLL_* flag values.
*
* This might not do anything at all, depending on the flags argument.
*
* "grab" names in this file mean, "look at flags to decide whether to use
* FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount.
*
- * @page: pointer to page to be grabbed
- * @flags: gup flags: these are the FOLL_* flag values.
- *
* Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
* time. Cases: please see the try_grab_folio() documentation, with
* "refs=1".
@@ -193,29 +192,28 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
*/
bool __must_check try_grab_page(struct page *page, unsigned int flags)
{
+ struct folio *folio = page_folio(page);
+
WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == (FOLL_GET | FOLL_PIN));
+ if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
+ return false;

if (flags & FOLL_GET)
- return try_get_page(page);
+ folio_ref_inc(folio);
else if (flags & FOLL_PIN) {
- page = compound_head(page);
-
- if (WARN_ON_ONCE(page_ref_count(page) <= 0))
- return false;
-
/*
- * Similar to try_grab_compound_head(): be sure to *also*
+ * Similar to try_grab_folio(): be sure to *also*
* increment the normal page refcount field at least once,
* so that the page really is pinned.
*/
- if (PageHead(page)) {
- page_ref_add(page, 1);
- atomic_add(1, compound_pincount_ptr(page));
+ if (folio_test_large(folio)) {
+ folio_ref_add(folio, 1);
+ atomic_add(1, folio_pincount_ptr(folio));
} else {
- page_ref_add(page, GUP_PIN_COUNTING_BIAS);
+ folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
}

- mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED, 1);
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
}

return true;
--
2.34.1


2022-02-08 22:45:37

by Matthew Wilcox

Subject: [PATCH 07/75] mm/gup: Fix some contiguous memmap assumptions

Several functions in gup.c assume that a compound page has virtually
contiguous page structs. This isn't true for SPARSEMEM configs unless
SPARSEMEM_VMEMMAP is also set. Fix them by using nth_page() instead of
plain pointer arithmetic.
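
For context, nth_page() is what hides the pfn round-trip needed when page
structs are not virtually contiguous; its definition in include/linux/mm.h
is roughly:

	#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
	/* struct pages may live in different memmap sections: go via the PFN */
	#define nth_page(page, n)	pfn_to_page(page_to_pfn((page)) + (n))
	#else
	/* memmap is virtually contiguous: plain pointer arithmetic is fine */
	#define nth_page(page, n)	((page) + (n))
	#endif

so on configurations with a virtually contiguous memmap it compiles down
to exactly the arithmetic it replaces.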

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
---
mm/gup.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 7e4bdae83e9b..29a8021f10a2 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -260,7 +260,7 @@ static inline struct page *compound_range_next(struct page *start,
struct page *next, *page;
unsigned int nr = 1;

- next = start + i;
+ next = nth_page(start, i);
page = compound_head(next);
if (PageHead(page))
nr = min_t(unsigned int,
@@ -2462,8 +2462,8 @@ static int record_subpages(struct page *page, unsigned long addr,
{
int nr;

- for (nr = 0; addr != end; addr += PAGE_SIZE)
- pages[nr++] = page++;
+ for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
+ pages[nr] = nth_page(page, nr);

return nr;
}
@@ -2498,7 +2498,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));

head = pte_page(pte);
- page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
+ page = nth_page(head, (addr & (sz - 1)) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

head = try_grab_compound_head(head, refs, flags);
@@ -2558,7 +2558,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
pages, nr);
}

- page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+ page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

head = try_grab_compound_head(pmd_page(orig), refs, flags);
@@ -2592,7 +2592,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
pages, nr);
}

- page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+ page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

head = try_grab_compound_head(pud_page(orig), refs, flags);
@@ -2621,7 +2621,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,

BUILD_BUG_ON(pgd_devmap(orig));

- page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
+ page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);

head = try_grab_compound_head(pgd_page(orig), refs, flags);
--
2.34.1


2022-02-14 12:05:12

by John Hubbard

Subject: Re: [PATCH 00/75] MM folio patches for 5.18

On 2/4/22 11:57, Matthew Wilcox (Oracle) wrote:
> Whole series available through git, and shortly in linux-next:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/for-next
> or git://git.infradead.org/users/willy/pagecache.git for-next

Hi Matthew,

I'm having trouble finding this series in linux-next, or in mmotm either. Has
the plan changed, or maybe I'm just Doing It Wrong? :)

Background as to why (you can skip this part unless you're wondering):

Locally, I've based a small but critical patch on top of this series. It
introduces a new routine:

void pin_user_page(struct page *page);

...which is a prerequisite for converting Direct IO over to use
FOLL_PIN.

For that, I am on the fence about whether to request putting the first
part of my conversion patchset into 5.18, or 5.19. Ideally, I'd like to
keep it based on your series, because otherwise there are a couple of
warts in pin_user_page() that have to be fixed up later. But on the
other hand, it would be nice to get the prerequisites in place, because
many filesystems need small changes.

Here's the diffs for "mm/gup: introduce pin_user_page()", for reference:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73b7e4bd250b..c2bb8099a56b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1963,6 +1963,7 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
long pin_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas);
+void pin_user_page(struct page *page);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
diff --git a/mm/gup.c b/mm/gup.c
index 7150ea002002..7d57c3452192 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3014,6 +3014,40 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
}
EXPORT_SYMBOL(pin_user_pages);

+/**
+ * pin_user_page() - apply a FOLL_PIN reference to a page ()
+ *
+ * @page: the page to be pinned.
+ *
+ * Similar to get_user_pages(), in that the page's refcount is elevated using
+ * FOLL_PIN rules.
+ *
+ * IMPORTANT: That means that the caller must release the page via
+ * unpin_user_page().
+ *
+ */
+void pin_user_page(struct page *page)
+{
+ struct folio *folio = page_folio(page);
+
+ WARN_ON_ONCE(folio_ref_count(folio) <= 0);
+
+ /*
+ * Similar to try_grab_page(): be sure to *also*
+ * increment the normal page refcount field at least once,
+ * so that the page really is pinned.
+ */
+ if (folio_test_large(folio)) {
+ folio_ref_add(folio, 1);
+ atomic_add(1, folio_pincount_ptr(folio));
+ } else {
+ folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+ }
+
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+}
+EXPORT_SYMBOL(pin_user_page);
+
/*
* pin_user_pages_unlocked() is the FOLL_PIN variant of
* get_user_pages_unlocked(). Behavior is the same, except that this one sets


thanks,
--
John Hubbard
NVIDIA

2022-02-14 18:47:05

by Matthew Wilcox

Subject: Re: [PATCH 00/75] MM folio patches for 5.18

On Fri, Feb 04, 2022 at 07:57:37PM +0000, Matthew Wilcox (Oracle) wrote:
> Whole series available through git, and shortly in linux-next:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/for-next
> or git://git.infradead.org/users/willy/pagecache.git for-next

I've just pushed out a new version to infradead. I'll probably forget
a few things, but the major differences are:

- Incorporate various fixes from others including:
- Implement pmd_pfn() for various arches from Mike Rapoport
- lru_to_folio() now a function from Christoph Hellwig
- Change an unpin_user_page() call to gup_put_folio() from Mark Hemment
- Use DEFINE_PAGE_VMA_WALK() and DEFINE_FOLIO_VMA_WALK() instead of the
pvmw_set_page()/folio() calls that were in this patch set (the folio
variant is sketched after this list).
- A new set of ten patches around invalidate_inode_page(). I'll send
them out as a fresh patchset tomorrow.
- Add various Reviewed-by trailers.
- Updated to -rc4.
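
For reference, the folio flavour of those initialisers reads roughly as
below; this is a sketch of DEFINE_FOLIO_VMA_WALK, with the authoritative
definition living in include/linux/rmap.h in the branch above:

	#define DEFINE_FOLIO_VMA_WALK(name, _folio, _vma, _address, _flags)	\
		struct page_vma_mapped_walk name = {				\
			.pfn = folio_pfn(_folio),				\
			.nr_pages = folio_nr_pages(_folio),			\
			.pgoff = folio_pgoff(_folio),				\
			.vma = _vma,						\
			.address = _address,					\
			.flags = _flags,					\
		}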