2023-12-11 15:57:23

by David Hildenbrand

Subject: [PATCH v1 00/39] mm/rmap: interface overhaul

This series overhauls the rmap interface to get rid of the "bool compound"
/ RMAP_COMPOUND parameter, with the goal of making the interface less
error-prone, more future-proof, and more natural to extend to "batching".
Also, this converts the interface to always consume folio+subpage, which
speeds up operations on large folios.

Further, this series adds PTE-batching variants for 4 rmap functions; only
folio_add_anon_rmap_ptes() is actually used for batching in this series,
namely when PTE-remapping a PMD-mapped THP. folio_remove_rmap_ptes(),
folio_try_dup_anon_rmap_ptes() and folio_dup_file_rmap_ptes() will soon
come in handy [1,2].
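
For example (a sketch based on the __split_huge_pmd_locked() conversion in
patch #15), PTE-remapping a PMD-mapped THP now adds all PTE rmaps of the
folio in a single call instead of once per subpage:

	folio_ref_add(folio, HPAGE_PMD_NR - 1);
	if (anon_exclusive)
		rmap_flags |= RMAP_EXCLUSIVE;
	folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
				 vma, haddr, rmap_flags);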

This series performs a lot of folio conversion along the way. Most of the
added LOC in the diff are only due to documentation.

As we're moving to a pte/pmd interface where we clearly express the
mapping granularity we are dealing with, we first get the remainder of
hugetlb out of the way, as it is special and expected to remain special: it
treats everything as a "single logical PTE" and currently only allows
entire mappings.

Even if we'd ever support partial mappings, I strongly assume the interface
and implementation will still differ heavily: hopefully we can avoid working
on subpages/subpage mapcounts completely and only add a "count" parameter
for them to enable batching.

New (extended) hugetlb interface that operates on entire folio:
* hugetlb_add_new_anon_rmap() -> Already existed
* hugetlb_add_anon_rmap() -> Already existed
* hugetlb_try_dup_anon_rmap()
* hugetlb_try_share_anon_rmap()
* hugetlb_add_file_rmap()
* hugetlb_remove_rmap()
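
For example (taken from the try_to_unmap_one() conversion in patch #2),
rmap removal ends up with a dedicated hugetlb branch instead of hiding
hugetlb behind a "compound" parameter:

	if (unlikely(folio_test_hugetlb(folio)))
		hugetlb_remove_rmap(folio);
	else
		page_remove_rmap(subpage, vma, false);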

New "ordinary" interface for small folios / THP::
* folio_add_new_anon_rmap() -> Already existed
* folio_add_anon_rmap_[pte|ptes|pmd]()
* folio_try_dup_anon_rmap_[pte|ptes|pmd]()
* folio_try_share_anon_rmap_[pte|pmd]()
* folio_add_file_rmap_[pte|ptes|pmd]()
* folio_dup_file_rmap_[pte|ptes|pmd]()
* folio_remove_rmap_[pte|ptes|pmd]()

folio_add_new_anon_rmap() will always map at the largest granularity
possible (currently, a single PMD to cover a PMD-sized THP). Could be
extended if ever required.
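
To illustrate how call sites change (sketched from the conversions in
patches #11 and #16), the mapping granularity moves from a runtime
"bool compound" / RMAP_COMPOUND argument into the function name:

	/* Old interface */
	page_add_file_rmap(page, vma, false);
	page_add_anon_rmap(new, vma, haddr, RMAP_COMPOUND);

	/* New interface */
	folio_add_file_rmap_pte(folio, page, vma);
	folio_add_anon_rmap_pmd(folio, new, vma, haddr, RMAP_NONE);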

In the future, we might want "_pud" variants and eventually "_pmds"
variants for batching.

I ran some simple microbenchmarks on an Intel(R) Xeon(R) Silver 4210R:
measuring munmap(), fork(), COW, MADV_DONTNEED on each PTE ... and
PTE-remapping PMD-mapped THPs on 1 GiB of memory.

For small folios, there is barely a change (< 1%).

For PTE-mapped THP:
* PTE-remapping a PMD-mapped THP is more than 10% faster.
* fork() is more than 4% faster.
* MADV_DONTNEED is 2% faster.
* COW when writing only a single byte on a COW-shared PTE is 1% faster.
* munmap() barely changes (< 1%).

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]

---

Based on current mm/mm-unstable. Compile-tested with/without THP on x86-64
and with defconfig on a bunch more. Tested on x86-64.

RFC -> v1:
* Rebased on top of mm-unstable (containing mTHP)
* Use switch()-case and __always_inline for helper functions
* Fixed some (intermittent) compile issues and some smaller stuff
* folio_try_dup_anon_rmap_[pte|ptes|pmd]() rewrite
* Pass nr_pages consistently as "int"
* Simplify sanity checks
* Added RBs

Cc: Andrew Morton <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yin Fengwei <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>


David Hildenbrand (39):
mm/rmap: rename hugepage_add* to hugetlb_add*
mm/rmap: introduce and use hugetlb_remove_rmap()
mm/rmap: introduce and use hugetlb_add_file_rmap()
mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
mm/rmap: add hugetlb sanity checks
mm/rmap: convert folio_add_file_rmap_range() into
folio_add_file_rmap_[pte|ptes|pmd]()
mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
mm/rmap: remove page_add_file_rmap()
mm/rmap: factor out adding folio mappings into __folio_add_rmap()
mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/rmap: remove page_add_anon_rmap()
mm/rmap: remove RMAP_COMPOUND
mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
Documentation: stop referring to page_remove_rmap()
mm/rmap: remove page_remove_rmap()
mm/rmap: convert page_dup_file_rmap() to
folio_dup_file_rmap_[pte|ptes|pmd]()
mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
mm/huge_memory: page_try_dup_anon_rmap() ->
folio_try_dup_anon_rmap_pmd()
mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
mm/rmap: remove page_try_dup_anon_rmap()
mm: convert page_try_share_anon_rmap() to
folio_try_share_anon_rmap_[pte|pmd]()
mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED

Documentation/mm/transhuge.rst | 4 +-
Documentation/mm/unevictable-lru.rst | 4 +-
include/linux/mm.h | 6 +-
include/linux/rmap.h | 398 +++++++++++++++++++-----
kernel/events/uprobes.c | 2 +-
mm/filemap.c | 10 +-
mm/gup.c | 2 +-
mm/huge_memory.c | 85 +++---
mm/hugetlb.c | 21 +-
mm/internal.h | 12 +-
mm/khugepaged.c | 17 +-
mm/ksm.c | 15 +-
mm/memory-failure.c | 4 +-
mm/memory.c | 60 ++--
mm/migrate.c | 12 +-
mm/migrate_device.c | 41 +--
mm/mmu_gather.c | 2 +-
mm/rmap.c | 433 ++++++++++++++++-----------
mm/swapfile.c | 2 +-
mm/userfaultfd.c | 2 +-
20 files changed, 740 insertions(+), 392 deletions(-)

--
2.43.0


2023-12-11 15:57:30

by David Hildenbrand

Subject: [PATCH v1 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
code from page_remove_rmap(). This effectively removes one check on the
small-folio path as well.

Note: all possible candidates that need care are page_remove_rmap() calls
that pass compound=true.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 5 +++++
mm/hugetlb.c | 4 ++--
mm/rmap.c | 17 ++++++++---------
3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0bfea866f39b..d85bd1d4de04 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -213,6 +213,11 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_remove_rmap(struct folio *folio)
+{
+ atomic_dec(&folio->_entire_mapcount);
+}
+
static inline void __page_dup_rmap(struct page *page, bool compound)
{
if (compound) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 305f3ca1dee6..ef48ae673890 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5676,7 +5676,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
make_pte_marker(PTE_MARKER_UFFD_WP),
sz);
hugetlb_count_sub(pages_per_huge_page(h), mm);
- page_remove_rmap(page, vma, true);
+ hugetlb_remove_rmap(page_folio(page));

spin_unlock(ptl);
tlb_remove_page_size(tlb, page, huge_page_size(h));
@@ -5987,7 +5987,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,

/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
- page_remove_rmap(&old_folio->page, vma, true);
+ hugetlb_remove_rmap(old_folio);
hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
diff --git a/mm/rmap.c b/mm/rmap.c
index 80d42c31281a..4e60c1f38eaa 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1482,13 +1482,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,

VM_BUG_ON_PAGE(compound && !PageHead(page), page);

- /* Hugetlb pages are not counted in NR_*MAPPED */
- if (unlikely(folio_test_hugetlb(folio))) {
- /* hugetlb pages are always mapped with pmds */
- atomic_dec(&folio->_entire_mapcount);
- return;
- }
-
/* Is page being unmapped by PTE? Is this its last map to be removed? */
if (likely(!compound)) {
last = atomic_add_negative(-1, &page->_mapcount);
@@ -1846,7 +1839,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
dec_mm_counter(mm, mm_counter_file(&folio->page));
}
discard:
- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -2199,7 +2195,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
*/
}

- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
--
2.43.0

2023-12-11 15:57:34

by David Hildenbrand

Subject: [PATCH v1 05/39] mm/rmap: introduce and use hugetlb_try_share_anon_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

So let's introduce and use hugetlb_try_share_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

Note that try_to_unmap_one() does not need care. Easy to spot because
among all that nasty hugetlb special-casing in that function, we're not
using set_huge_pte_at() on the anon path -- well, and that code assumes
that we would want to swap out.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 23 +++++++++++++++++++++++
mm/rmap.c | 15 ++++++++++-----
2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index ca42b3db5688..4c0650e9f6db 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -228,6 +228,29 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
return 0;
}

+/* See page_try_share_anon_rmap() */
+static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
+
+ /* Paired with the memory barrier in try_grab_folio(). */
+ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+ smp_mb();
+
+ if (unlikely(folio_maybe_dma_pinned(folio)))
+ return -EBUSY;
+ ClearPageAnonExclusive(&folio->page);
+
+ /*
+ * This is conceptually a smp_wmb() paired with the smp_rmb() in
+ * gup_must_unshare().
+ */
+ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+ smp_mb__after_atomic();
+ return 0;
+}
+
static inline void hugetlb_add_file_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
diff --git a/mm/rmap.c b/mm/rmap.c
index 4e60c1f38eaa..e210ac1b73de 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2147,13 +2147,18 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
!anon_exclusive, subpage);

/* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
- if (folio_test_hugetlb(folio))
+ if (folio_test_hugetlb(folio)) {
+ if (anon_exclusive &&
+ hugetlb_try_share_anon_rmap(folio)) {
set_huge_pte_at(mm, address, pvmw.pte,
pteval, hsz);
- else
- set_pte_at(mm, address, pvmw.pte, pteval);
+ ret = false;
+ page_vma_mapped_walk_done(&pvmw);
+ break;
+ }
+ } else if (anon_exclusive &&
+ page_try_share_anon_rmap(subpage)) {
+ set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done(&pvmw);
break;
--
2.43.0

2023-12-11 15:57:36

by David Hildenbrand

Subject: [PATCH v1 01/39] mm/rmap: rename hugepage_add* to hugetlb_add*

Let's just call it "hugetlb_".

Yes, it's all already inconsistent and confusing because we have a lot
of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
be confused with transparent huge pages, and it matches "hugetlb.c" and
"folio_test_hugetlb()". So let's minimize confusion in rmap code.

Reviewed-by: Muchun Song <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 4 ++--
mm/hugetlb.c | 8 ++++----
mm/migrate.c | 4 ++--
mm/rmap.c | 8 ++++----
4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index af6a32b6f3e7..0bfea866f39b 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -208,9 +208,9 @@ void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

-void hugepage_add_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
-void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

static inline void __page_dup_rmap(struct page *page, bool compound)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6feb3e0630d1..305f3ca1dee6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5285,7 +5285,7 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);

__folio_mark_uptodate(new_folio);
- hugepage_add_new_anon_rmap(new_folio, vma, addr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, addr);
if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
@@ -5988,7 +5988,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
page_remove_rmap(&old_folio->page, vma, true);
- hugepage_add_new_anon_rmap(new_folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(mm, haddr, ptep, newpte, huge_page_size(h));
@@ -6277,7 +6277,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
goto backout;

if (anon_rmap)
- hugepage_add_new_anon_rmap(folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
page_dup_file_rmap(&folio->page, true);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
@@ -6732,7 +6732,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
if (folio_in_pagecache)
page_dup_file_rmap(&folio->page, true);
else
- hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
+ hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

/*
* For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
diff --git a/mm/migrate.c b/mm/migrate.c
index 35a88334bb3c..4cb849fa0dd2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -249,8 +249,8 @@ static bool remove_migration_pte(struct folio *folio,

pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
if (folio_test_anon(folio))
- hugepage_add_anon_rmap(folio, vma, pvmw.address,
- rmap_flags);
+ hugetlb_add_anon_rmap(folio, vma, pvmw.address,
+ rmap_flags);
else
page_dup_file_rmap(new, true);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
diff --git a/mm/rmap.c b/mm/rmap.c
index 846fc79f3ca9..80d42c31281a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2625,8 +2625,8 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
*
* RMAP_COMPOUND is ignored.
*/
-void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
- unsigned long address, rmap_t flags)
+void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
+ unsigned long address, rmap_t flags)
{
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

@@ -2637,8 +2637,8 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
PageAnonExclusive(&folio->page), folio);
}

-void hugepage_add_new_anon_rmap(struct folio *folio,
- struct vm_area_struct *vma, unsigned long address)
+void hugetlb_add_new_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma, unsigned long address)
{
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
--
2.43.0

2023-12-11 15:57:36

by David Hildenbrand

Subject: [PATCH v1 03/39] mm/rmap: introduce and use hugetlb_add_file_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

Right now we're using page_dup_file_rmap() in some cases where "ordinary"
rmap code would have used page_add_file_rmap(). So let's introduce and
use hugetlb_add_file_rmap() instead. We won't be adding a
"hugetlb_dup_file_rmap()" functon for the fork() case, as it would be
doing the same: "dup" is just an optimization for "add".

What remains is a single page_dup_file_rmap() call in fork() code.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 7 +++++++
mm/hugetlb.c | 6 +++---
mm/migrate.c | 2 +-
3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index d85bd1d4de04..91178d1aa028 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -213,6 +213,13 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_add_file_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
+
+ atomic_inc(&folio->_entire_mapcount);
+}
+
static inline void hugetlb_remove_rmap(struct folio *folio)
{
atomic_dec(&folio->_entire_mapcount);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ef48ae673890..57e898187931 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5408,7 +5408,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
* sleep during the process.
*/
if (!folio_test_anon(pte_folio)) {
- page_dup_file_rmap(&pte_folio->page, true);
+ hugetlb_add_file_rmap(pte_folio);
} else if (page_try_dup_anon_rmap(&pte_folio->page,
true, src_vma)) {
pte_t src_pte_old = entry;
@@ -6279,7 +6279,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
if (anon_rmap)
hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
&& (vma->vm_flags & VM_SHARED)));
/*
@@ -6730,7 +6730,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
goto out_release_unlock;

if (folio_in_pagecache)
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
else
hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

diff --git a/mm/migrate.c b/mm/migrate.c
index 4cb849fa0dd2..de9d94b99ab7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio,
hugetlb_add_anon_rmap(folio, vma, pvmw.address,
rmap_flags);
else
- page_dup_file_rmap(new, true);
+ hugetlb_add_file_rmap(folio);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
psize);
} else
--
2.43.0

2023-12-11 15:57:52

by David Hildenbrand

Subject: [PATCH v1 04/39] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

Note that is_device_private_page() does not apply to hugetlb.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 12 +++++++++---
include/linux/rmap.h | 15 +++++++++++++++
mm/hugetlb.c | 3 +--
3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b72bf25a45cf..ae547b62f325 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1964,15 +1964,21 @@ static inline bool page_maybe_dma_pinned(struct page *page)
*
* The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
*/
-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
+static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct folio *folio)
{
VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));

if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
return false;

- return page_maybe_dma_pinned(page);
+ return folio_maybe_dma_pinned(folio);
+}
+
+static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct page *page)
+{
+ return folio_needs_cow_for_dma(vma, page_folio(page));
}

/**
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 91178d1aa028..ca42b3db5688 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -213,6 +213,21 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+/* See page_try_dup_anon_rmap() */
+static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma)
+{
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+
+ if (PageAnonExclusive(&folio->page)) {
+ if (unlikely(folio_needs_cow_for_dma(vma, folio)))
+ return -EBUSY;
+ ClearPageAnonExclusive(&folio->page);
+ }
+ atomic_inc(&folio->_entire_mapcount);
+ return 0;
+}
+
static inline void hugetlb_add_file_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57e898187931..378e460a6ab4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5409,8 +5409,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
if (!folio_test_anon(pte_folio)) {
hugetlb_add_file_rmap(pte_folio);
- } else if (page_try_dup_anon_rmap(&pte_folio->page,
- true, src_vma)) {
+ } else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
pte_t src_pte_old = entry;
struct folio *new_folio;

--
2.43.0

2023-12-11 15:57:59

by David Hildenbrand

Subject: [PATCH v1 08/39] mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()

Let's convert insert_page_into_pte_locked() and do_set_pmd(). While at it,
perform some folio conversion.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6a5540ba3c65..70754fd65788 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1859,12 +1859,14 @@ static int validate_page_before_insert(struct page *page)
static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
unsigned long addr, struct page *page, pgprot_t prot)
{
+ struct folio *folio = page_folio(page);
+
if (!pte_none(ptep_get(pte)))
return -EBUSY;
/* Ok, finally just insert the thing.. */
- get_page(page);
+ folio_get(folio);
inc_mm_counter(vma->vm_mm, mm_counter_file(page));
- page_add_file_rmap(page, vma, false);
+ folio_add_file_rmap_pte(folio, page, vma);
set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
return 0;
}
@@ -4409,6 +4411,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)

vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
{
+ struct folio *folio = page_folio(page);
struct vm_area_struct *vma = vmf->vma;
bool write = vmf->flags & FAULT_FLAG_WRITE;
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
@@ -4418,8 +4421,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
return ret;

- page = compound_head(page);
- if (compound_order(page) != HPAGE_PMD_ORDER)
+ if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER)
return ret;

/*
@@ -4428,7 +4430,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
* check. This kind of THP just can be PTE mapped. Access to
* the corrupted subpage should trigger SIGBUS as expected.
*/
- if (unlikely(PageHasHWPoisoned(page)))
+ if (unlikely(folio_test_has_hwpoisoned(folio)))
return ret;

/*
@@ -4452,7 +4454,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);

add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
- page_add_file_rmap(page, vma, true);
+ folio_add_file_rmap_pmd(folio, page, vma);

/*
* deposit and withdraw with pmd lock held
--
2.43.0

2023-12-11 15:57:59

by David Hildenbrand

Subject: [PATCH v1 06/39] mm/rmap: add hugetlb sanity checks

Let's make sure we end up with the right folios in the right functions.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 7 +++++++
mm/rmap.c | 6 ++++++
2 files changed, 13 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 4c0650e9f6db..e3857d26b944 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -217,6 +217,7 @@ void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

if (PageAnonExclusive(&folio->page)) {
@@ -231,6 +232,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
/* See page_try_share_anon_rmap() */
static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);

@@ -253,6 +255,7 @@ static inline int hugetlb_try_share_anon_rmap(struct folio *folio)

static inline void hugetlb_add_file_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);

atomic_inc(&folio->_entire_mapcount);
@@ -260,11 +263,15 @@ static inline void hugetlb_add_file_rmap(struct folio *folio)

static inline void hugetlb_remove_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
atomic_dec(&folio->_entire_mapcount);
}

static inline void __page_dup_rmap(struct page *page, bool compound)
{
+ VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+
if (compound) {
struct folio *folio = (struct folio *)page;

diff --git a/mm/rmap.c b/mm/rmap.c
index e210ac1b73de..41597da14f26 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1343,6 +1343,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
{
int nr = folio_nr_pages(folio);

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_VMA(address < vma->vm_start ||
address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
__folio_set_swapbacked(folio);
@@ -1395,6 +1396,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
unsigned int nr_pmdmapped = 0, first;
int nr = 0;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);

/* Is page being mapped by PTE? Is this its first map to be added? */
@@ -1480,6 +1482,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
bool last;
enum node_stat_item idx;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_PAGE(compound && !PageHead(page), page);

/* Is page being unmapped by PTE? Is this its last map to be removed? */
@@ -2632,6 +2635,7 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

atomic_inc(&folio->_entire_mapcount);
@@ -2644,6 +2648,8 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
void hugetlb_add_new_anon_rmap(struct folio *folio,
struct vm_area_struct *vma, unsigned long address)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
--
2.43.0

2023-12-11 15:57:59

by David Hildenbrand

Subject: [PATCH v1 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

Let's get rid of the compound parameter and instead define explicitly
which mappings we're adding. That is more future-proof, easier to read,
and harder to mess up.

Use an enum to express the granularity internally. Make the compiler
always special-case on the granularity by using __always_inline. Replace
the "compound" check by a switch-case that will be removed by the
compiler completely.

Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
folio_test_pmd_mappable() check by a config check in the caller and
sanity checks. Convert the single user of folio_add_file_rmap_range().

This function design can later easily be extended to PUDs and to batch
PMDs. Note that for now we don't support anything bigger than
PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
will catch if that ever changes.

Next up is removing page_remove_rmap() along with its "compound"
parameter and similarly converting all other rmap functions.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 47 +++++++++++++++++++++++++--
mm/memory.c | 2 +-
mm/rmap.c | 75 +++++++++++++++++++++++++++++---------------
3 files changed, 95 insertions(+), 29 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e3857d26b944..1753900f4aed 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -191,6 +191,45 @@ typedef int __bitwise rmap_t;
*/
#define RMAP_COMPOUND ((__force rmap_t)BIT(1))

+/*
+ * Internally, we're using an enum to specify the granularity. Usually,
+ * we make the compiler create specialized variants for the different
+ * granularity.
+ */
+enum rmap_mode {
+ RMAP_MODE_PTE = 0,
+ RMAP_MODE_PMD,
+};
+
+static inline void __folio_rmap_sanity_checks(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_mode mode)
+{
+ /* hugetlb folios are handled separately. */
+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
+ VM_WARN_ON_FOLIO(folio_test_large(folio) &&
+ !folio_test_large_rmappable(folio), folio);
+
+ VM_WARN_ON_ONCE(nr_pages <= 0);
+ VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
+ VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
+
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ break;
+ case RMAP_MODE_PMD:
+ /*
+ * We don't support folios larger than a single PMD yet. So
+ * when RMAP_MODE_PMD is set, we assume that we are creating
+ * a single "entire" mapping of the folio.
+ */
+ VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
+ VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
+ break;
+ default:
+ VM_WARN_ON_ONCE(true);
+ }
+}
+
/*
* rmap interfaces called when adding or removing pte of page
*/
@@ -203,8 +242,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
void page_add_file_rmap(struct page *, struct vm_area_struct *,
bool compound);
-void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
- struct vm_area_struct *, bool compound);
+void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
+ struct vm_area_struct *);
+#define folio_add_file_rmap_pte(folio, page, vma) \
+ folio_add_file_rmap_ptes(folio, page, 1, vma)
+void folio_add_file_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

diff --git a/mm/memory.c b/mm/memory.c
index 8f0b936b90b5..6a5540ba3c65 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4515,7 +4515,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
folio_add_lru_vma(folio, vma);
} else {
add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
- folio_add_file_rmap_range(folio, page, nr, vma, false);
+ folio_add_file_rmap_ptes(folio, page, nr, vma);
}
set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);

diff --git a/mm/rmap.c b/mm/rmap.c
index 41597da14f26..4f30930a1162 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1376,31 +1376,20 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
}

-/**
- * folio_add_file_rmap_range - add pte mapping to page range of a folio
- * @folio: The folio to add the mapping to
- * @page: The first page to add
- * @nr_pages: The number of pages which will be mapped
- * @vma: the vm area in which the mapping is added
- * @compound: charge the page as compound or small page
- *
- * The page range of folio is defined by [first_page, first_page + nr_pages)
- *
- * The caller needs to hold the pte lock.
- */
-void folio_add_file_rmap_range(struct folio *folio, struct page *page,
- unsigned int nr_pages, struct vm_area_struct *vma,
- bool compound)
+static __always_inline void __folio_add_file_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *vma,
+ enum rmap_mode mode)
{
atomic_t *mapped = &folio->_nr_pages_mapped;
unsigned int nr_pmdmapped = 0, first;
int nr = 0;

- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
- VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
+ VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

/* Is page being mapped by PTE? Is this its first map to be added? */
- if (likely(!compound)) {
+ switch (mode) {
+ case RMAP_MODE_PTE:
do {
first = atomic_inc_and_test(&page->_mapcount);
if (first && folio_test_large(folio)) {
@@ -1411,9 +1400,8 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
if (first)
nr++;
} while (page++, --nr_pages > 0);
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
-
+ break;
+ case RMAP_MODE_PMD:
first = atomic_inc_and_test(&folio->_entire_mapcount);
if (first) {
nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
@@ -1428,6 +1416,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
nr = 0;
}
}
+ break;
}

if (nr_pmdmapped)
@@ -1441,6 +1430,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
mlock_vma_folio(folio, vma);
}

+/**
+ * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
+ * @folio: The folio to add the mappings to
+ * @page: The first page to add
+ * @nr_pages: The number of pages that will be mapped using PTEs
+ * @vma: The vm area in which the mappings are added
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
+ int nr_pages, struct vm_area_struct *vma)
+{
+ __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
+}
+
+/**
+ * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @vma: The vm area in which the mapping is added
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/**
* page_add_file_rmap - add pte mapping to a file page
* @page: the page to add the mapping to
@@ -1453,16 +1479,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
bool compound)
{
struct folio *folio = page_folio(page);
- unsigned int nr_pages;

VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);

if (likely(!compound))
- nr_pages = 1;
+ folio_add_file_rmap_pte(folio, page, vma);
else
- nr_pages = folio_nr_pages(folio);
-
- folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
+ folio_add_file_rmap_pmd(folio, page, vma);
}

/**
--
2.43.0

2023-12-11 15:58:00

by David Hildenbrand

Subject: [PATCH v1 09/39] mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()

Let's convert remove_migration_pmd() and while at it, perform some folio
conversion.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3a387c6f18b6..1f5634b2f374 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3577,6 +3577,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,

void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
{
+ struct folio *folio = page_folio(new);
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
@@ -3588,7 +3589,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
return;

entry = pmd_to_swp_entry(*pvmw->pmd);
- get_page(new);
+ folio_get(folio);
pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
if (pmd_swp_soft_dirty(*pvmw->pmd))
pmde = pmd_mksoft_dirty(pmde);
@@ -3599,10 +3600,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
if (!is_migration_entry_young(entry))
pmde = pmd_mkold(pmde);
/* NOTE: this may contain setting soft-dirty on some archs */
- if (PageDirty(new) && is_migration_entry_dirty(entry))
+ if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
pmde = pmd_mkdirty(pmde);

- if (PageAnon(new)) {
+ if (folio_test_anon(folio)) {
rmap_t rmap_flags = RMAP_COMPOUND;

if (!is_readable_migration_entry(entry))
@@ -3610,9 +3611,9 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)

page_add_anon_rmap(new, vma, haddr, rmap_flags);
} else {
- page_add_file_rmap(new, vma, true);
+ folio_add_file_rmap_pmd(folio, new, vma);
}
- VM_BUG_ON(pmd_write(pmde) && PageAnon(new) && !PageAnonExclusive(new));
+ VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new));
set_pmd_at(mm, haddr, pvmw->pmd, pmde);

/* No need to invalidate - it was non-present before */
--
2.43.0

2023-12-11 15:58:04

by David Hildenbrand

Subject: [PATCH v1 12/39] mm/rmap: remove page_add_file_rmap()

All users are gone, let's remove it.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 2 --
mm/rmap.c | 21 ---------------------
2 files changed, 23 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 1753900f4aed..7198905dc8be 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -240,8 +240,6 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address);
void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
-void page_add_file_rmap(struct page *, struct vm_area_struct *,
- bool compound);
void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
struct vm_area_struct *);
#define folio_add_file_rmap_pte(folio, page, vma) \
diff --git a/mm/rmap.c b/mm/rmap.c
index 4f30930a1162..2ff2f11275e5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1467,27 +1467,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
#endif
}

-/**
- * page_add_file_rmap - add pte mapping to a file page
- * @page: the page to add the mapping to
- * @vma: the vm area in which the mapping is added
- * @compound: charge the page as compound or small page
- *
- * The caller needs to hold the pte lock.
- */
-void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
-{
- struct folio *folio = page_folio(page);
-
- VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
-
- if (likely(!compound))
- folio_add_file_rmap_pte(folio, page, vma);
- else
- folio_add_file_rmap_pmd(folio, page, vma);
-}
-
/**
* page_remove_rmap - take down pte mapping from a page
* @page: page to remove mapping from
--
2.43.0

2023-12-11 15:58:07

by David Hildenbrand

Subject: [PATCH v1 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

Let's factor it out to prepare for reuse as we convert
page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().

Make the compiler always special-case on the granularity by using
__always_inline.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/rmap.c | 81 ++++++++++++++++++++++++++++++-------------------------
1 file changed, 45 insertions(+), 36 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 2ff2f11275e5..c5761986a411 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1157,6 +1157,49 @@ int folio_total_mapcount(struct folio *folio)
return mapcount;
}

+static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_mode mode,
+ unsigned int *nr_pmdmapped)
+{
+ atomic_t *mapped = &folio->_nr_pages_mapped;
+ int first, nr = 0;
+
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
+
+ /* Is page being mapped by PTE? Is this its first map to be added? */
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ do {
+ first = atomic_inc_and_test(&page->_mapcount);
+ if (first && folio_test_large(folio)) {
+ first = atomic_inc_return_relaxed(mapped);
+ first = (first < COMPOUND_MAPPED);
+ }
+
+ if (first)
+ nr++;
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_MODE_PMD:
+ first = atomic_inc_and_test(&folio->_entire_mapcount);
+ if (first) {
+ nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
+ if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
+ *nr_pmdmapped = folio_nr_pages(folio);
+ nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
+ /* Raced ahead of a remove and another add? */
+ if (unlikely(nr < 0))
+ nr = 0;
+ } else {
+ /* Raced ahead of a remove of COMPOUND_MAPPED */
+ nr = 0;
+ }
+ }
+ break;
+ }
+ return nr;
+}
+
/**
* folio_move_anon_rmap - move a folio to our anon_vma
* @folio: The folio to move to our anon_vma
@@ -1380,45 +1423,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
struct page *page, int nr_pages, struct vm_area_struct *vma,
enum rmap_mode mode)
{
- atomic_t *mapped = &folio->_nr_pages_mapped;
- unsigned int nr_pmdmapped = 0, first;
- int nr = 0;
+ unsigned int nr, nr_pmdmapped = 0;

VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
- __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
-
- /* Is page being mapped by PTE? Is this its first map to be added? */
- switch (mode) {
- case RMAP_MODE_PTE:
- do {
- first = atomic_inc_and_test(&page->_mapcount);
- if (first && folio_test_large(folio)) {
- first = atomic_inc_return_relaxed(mapped);
- first = (first < COMPOUND_MAPPED);
- }
-
- if (first)
- nr++;
- } while (page++, --nr_pages > 0);
- break;
- case RMAP_MODE_PMD:
- first = atomic_inc_and_test(&folio->_entire_mapcount);
- if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
- nr_pmdmapped = folio_nr_pages(folio);
- nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
- /* Raced ahead of a remove and another add? */
- if (unlikely(nr < 0))
- nr = 0;
- } else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
- nr = 0;
- }
- }
- break;
- }

+ nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
if (nr_pmdmapped)
__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
--
2.43.0

2023-12-11 15:58:08

by David Hildenbrand

Subject: [PATCH v1 11/39] mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()

Let's convert mfill_atomic_install_pte().

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/userfaultfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9ec814e47e99..330a481a1654 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -114,7 +114,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
/* Usually, cache pages are already added to LRU */
if (newly_allocated)
folio_add_lru(folio);
- page_add_file_rmap(page, dst_vma, false);
+ folio_add_file_rmap_pte(folio, page, dst_vma);
} else {
page_add_new_anon_rmap(page, dst_vma, dst_addr);
folio_add_lru_vma(folio, dst_vma);
--
2.43.0

2023-12-11 15:58:27

by David Hildenbrand

Subject: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()

Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
replace page_add_anon_rmap() next.

Make the compiler always special-case on the granularity by using
__always_inline.

Note that the new functions ignore the RMAP_COMPOUND flag, which we will
remove as soon as page_add_anon_rmap() is gone.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 6 +++
mm/rmap.c | 118 ++++++++++++++++++++++++++++++-------------
2 files changed, 88 insertions(+), 36 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 7198905dc8be..3b5357cb1c09 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -234,6 +234,12 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
* rmap interfaces called when adding or removing pte of page
*/
void folio_move_anon_rmap(struct folio *, struct vm_area_struct *);
+void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
+ struct vm_area_struct *, unsigned long address, rmap_t flags);
+#define folio_add_anon_rmap_pte(folio, page, vma, address, flags) \
+ folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
+void folio_add_anon_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *, unsigned long address, rmap_t flags);
void page_add_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
diff --git a/mm/rmap.c b/mm/rmap.c
index c5761986a411..7787499fa2ad 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1300,38 +1300,20 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
{
struct folio *folio = page_folio(page);
- atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool compound = flags & RMAP_COMPOUND;
- bool first;

- /* Is page being mapped by PTE? Is this its first map to be added? */
- if (likely(!compound)) {
- first = atomic_inc_and_test(&page->_mapcount);
- nr = first;
- if (first && folio_test_large(folio)) {
- nr = atomic_inc_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
+ if (likely(!(flags & RMAP_COMPOUND)))
+ folio_add_anon_rmap_pte(folio, page, vma, address, flags);
+ else
+ folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
+}

- first = atomic_inc_and_test(&folio->_entire_mapcount);
- if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
- nr_pmdmapped = folio_nr_pages(folio);
- nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
- /* Raced ahead of a remove and another add? */
- if (unlikely(nr < 0))
- nr = 0;
- } else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
- nr = 0;
- }
- }
- }
+static __always_inline void __folio_add_anon_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *vma,
+ unsigned long address, rmap_t flags, enum rmap_mode mode)
+{
+ unsigned int i, nr, nr_pmdmapped = 0;

+ nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
if (nr_pmdmapped)
__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
if (nr)
@@ -1345,18 +1327,34 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
* folio->index right when not given the address of the head
* page.
*/
- VM_WARN_ON_FOLIO(folio_test_large(folio) && !compound, folio);
+ VM_WARN_ON_FOLIO(folio_test_large(folio) &&
+ mode != RMAP_MODE_PMD, folio);
__folio_set_anon(folio, vma, address,
!!(flags & RMAP_EXCLUSIVE));
} else if (likely(!folio_test_ksm(folio))) {
__page_check_anon_rmap(folio, page, vma, address);
}
- if (flags & RMAP_EXCLUSIVE)
- SetPageAnonExclusive(page);
- /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
- VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
- (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
- PageAnonExclusive(page), folio);
+
+ if (flags & RMAP_EXCLUSIVE) {
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ for (i = 0; i < nr_pages; i++)
+ SetPageAnonExclusive(page + i);
+ break;
+ case RMAP_MODE_PMD:
+ SetPageAnonExclusive(page);
+ break;
+ }
+ }
+ for (i = 0; i < nr_pages; i++) {
+ struct page *cur_page = page + i;
+
+ /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
+ VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
+ (folio_test_large(folio) &&
+ folio_entire_mapcount(folio) > 1)) &&
+ PageAnonExclusive(cur_page), folio);
+ }

/*
* For large folio, only mlock it if it's fully mapped to VMA. It's
@@ -1368,6 +1366,54 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
mlock_vma_folio(folio, vma);
}

+/**
+ * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio
+ * @folio: The folio to add the mappings to
+ * @page: The first page to add
+ * @nr_pages: The number of pages which will be mapped
+ * @vma: The vm area in which the mappings are added
+ * @address: The user virtual address of the first page to map
+ * @flags: The rmap flags
+ *
+ * The page range of folio is defined by [first_page, first_page + nr_pages)
+ *
+ * The caller needs to hold the page table lock, and the page must be locked in
+ * the anon_vma case: to serialize mapping,index checking after setting,
+ * and to ensure that an anon folio is not being upgraded racily to a KSM folio
+ * (but KSM folios are never downgraded).
+ */
+void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page,
+ int nr_pages, struct vm_area_struct *vma, unsigned long address,
+ rmap_t flags)
+{
+ __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags,
+ RMAP_MODE_PTE);
+}
+
+/**
+ * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @vma: The vm area in which the mapping is added
+ * @address: The user virtual address of the first page to map
+ * @flags: The rmap flags
+ *
+ * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock, and the page must be locked in
+ * the anon_vma case: to serialize mapping,index checking after setting.
+ */
+void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma, unsigned long address, rmap_t flags)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags,
+ RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/**
* folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
* @folio: The folio to add the mapping to.
--
2.43.0

2023-12-11 15:58:28

by David Hildenbrand

Subject: [PATCH v1 16/39] mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()

Let's convert remove_migration_pmd(). No need to set RMAP_COMPOUND, which
we will remove soon.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 82ad68fe0d12..b03374d1bb94 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3611,12 +3611,12 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
pmde = pmd_mkdirty(pmde);

if (folio_test_anon(folio)) {
- rmap_t rmap_flags = RMAP_COMPOUND;
+ rmap_t rmap_flags = RMAP_NONE;

if (!is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;

- page_add_anon_rmap(new, vma, haddr, rmap_flags);
+ folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags);
} else {
folio_add_file_rmap_pmd(folio, new, vma);
}
--
2.43.0

2023-12-11 15:58:36

by David Hildenbrand

Subject: [PATCH v1 19/39] mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert unuse_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8be70912e298..25f53bec5097 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1805,7 +1805,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (pte_swp_exclusive(old_pte))
rmap_flags |= RMAP_EXCLUSIVE;

- page_add_anon_rmap(page, vma, addr, rmap_flags);
+ folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, addr);
lru_cache_add_inactive_or_unevictable(page, vma);
--
2.43.0

2023-12-11 15:58:39

by David Hildenbrand

Subject: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.

While at it, use more folio operations (but only in the code branch we're
touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
manually setting PageAnonExclusive.

We should never see non-anon pages on that branch: otherwise, the
existing page_add_anon_rmap() call would have been flawed already.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1f5634b2f374..82ad68fe0d12 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2398,6 +2398,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long haddr, bool freeze)
{
struct mm_struct *mm = vma->vm_mm;
+ struct folio *folio;
struct page *page;
pgtable_t pgtable;
pmd_t old_pmd, _pmd;
@@ -2493,16 +2494,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
uffd_wp = pmd_swp_uffd_wp(old_pmd);
} else {
page = pmd_page(old_pmd);
+ folio = page_folio(page);
if (pmd_dirty(old_pmd)) {
dirty = true;
- SetPageDirty(page);
+ folio_set_dirty(folio);
}
write = pmd_write(old_pmd);
young = pmd_young(old_pmd);
soft_dirty = pmd_soft_dirty(old_pmd);
uffd_wp = pmd_uffd_wp(old_pmd);

- VM_BUG_ON_PAGE(!page_count(page), page);
+ VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

/*
* Without "freeze", we'll simply split the PMD, propagating the
@@ -2519,11 +2522,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
*
* See page_try_share_anon_rmap(): invalidate PMD first.
*/
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = PageAnonExclusive(page);
if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
freeze = false;
- if (!freeze)
- page_ref_add(page, HPAGE_PMD_NR - 1);
+ if (!freeze) {
+ rmap_t rmap_flags = RMAP_NONE;
+
+ folio_ref_add(folio, HPAGE_PMD_NR - 1);
+ if (anon_exclusive)
+ rmap_flags |= RMAP_EXCLUSIVE;
+ folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
+ vma, haddr, rmap_flags);
+ }
}

/*
@@ -2566,8 +2576,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
if (write)
entry = pte_mkwrite(entry, vma);
- if (anon_exclusive)
- SetPageAnonExclusive(page + i);
if (!young)
entry = pte_mkold(entry);
/* NOTE: this may set soft-dirty too on some archs */
@@ -2577,7 +2585,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = pte_mksoft_dirty(entry);
if (uffd_wp)
entry = pte_mkuffd_wp(entry);
- page_add_anon_rmap(page + i, vma, addr, RMAP_NONE);
}
VM_BUG_ON(!pte_none(ptep_get(pte)));
set_pte_at(mm, addr, pte, entry);
--
2.43.0

2023-12-11 15:58:52

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 20/39] mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert restore_exclusive_pte() and do_swap_page(). While at it,
perform some folio conversion in restore_exclusive_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 70754fd65788..97e064883992 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -710,6 +710,7 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
struct page *page, unsigned long address,
pte_t *ptep)
{
+ struct folio *folio = page_folio(page);
pte_t orig_pte;
pte_t pte;
swp_entry_t entry;
@@ -725,14 +726,15 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
else if (is_writable_device_exclusive_entry(entry))
pte = maybe_mkwrite(pte_mkdirty(pte), vma);

- VM_BUG_ON(pte_write(pte) && !(PageAnon(page) && PageAnonExclusive(page)));
+ VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
+ PageAnonExclusive(page)), folio);

/*
* No need to take a page reference as one was already
* created when the swap entry was made.
*/
- if (PageAnon(page))
- page_add_anon_rmap(page, vma, address, RMAP_NONE);
+ if (folio_test_anon(folio))
+ folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE);
else
/*
* Currently device exclusive access only supports anonymous
@@ -4073,7 +4075,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
page_add_new_anon_rmap(page, vma, vmf->address);
folio_add_lru_vma(folio, vma);
} else {
- page_add_anon_rmap(page, vma, vmf->address, rmap_flags);
+ folio_add_anon_rmap_pte(folio, page, vma, vmf->address,
+ rmap_flags);
}

VM_BUG_ON(!folio_test_anon(folio) ||
--
2.43.0

2023-12-11 15:58:55

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 22/39] mm/rmap: remove RMAP_COMPOUND

No longer used, let's remove it and clarify RMAP_NONE/RMAP_EXCLUSIVE a
bit.
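
As an illustration (taken from the unuse_pte() conversion earlier in this
series), the remaining flags are purely about anon exclusivity:

        rmap_t rmap_flags = RMAP_NONE;

        if (pte_swp_exclusive(old_pte))
                rmap_flags |= RMAP_EXCLUSIVE;
        folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);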

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 12 +++---------
mm/rmap.c | 2 --
2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bd4edae4dbe7..0acebe41ab8e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -177,20 +177,14 @@ struct anon_vma *folio_get_anon_vma(struct folio *folio);
typedef int __bitwise rmap_t;

/*
- * No special request: if the page is a subpage of a compound page, it is
- * mapped via a PTE. The mapped (sub)page is possibly shared between processes.
+ * No special request: A mapped anonymous (sub)page is possibly shared between
+ * processes.
*/
#define RMAP_NONE ((__force rmap_t)0)

-/* The (sub)page is exclusive to a single process. */
+/* The anonymous (sub)page is exclusive to a single process. */
#define RMAP_EXCLUSIVE ((__force rmap_t)BIT(0))

-/*
- * The compound page is not mapped via PTEs, but instead via a single PMD and
- * should be accounted accordingly.
- */
-#define RMAP_COMPOUND ((__force rmap_t)BIT(1))
-
/*
* Internally, we're using an enum to specify the granularity. Usually,
* we make the compiler create specialized variants for the different
diff --git a/mm/rmap.c b/mm/rmap.c
index 83cba8909848..9212726268ba 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2663,8 +2663,6 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
* The following two functions are for anonymous (private mapped) hugepages.
* Unlike common anonymous pages, anonymous hugepages have no accounting code
* and no lru code, because we handle hugepages differently from common pages.
- *
- * RMAP_COMPOUND is ignored.
*/
void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
--
2.43.0

2023-12-11 15:59:10

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 26/39] mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert __collapse_huge_page_copy_succeeded() and
collapse_pte_mapped_thp(). While at it, perform some more folio
conversion in __collapse_huge_page_copy_succeeded().

We can get rid of release_pte_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/khugepaged.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index de174d049e71..4d90c9548ec9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -494,11 +494,6 @@ static void release_pte_folio(struct folio *folio)
folio_putback_lru(folio);
}

-static void release_pte_page(struct page *page)
-{
- release_pte_folio(page_folio(page));
-}
-
static void release_pte_pages(pte_t *pte, pte_t *_pte,
struct list_head *compound_pagelist)
{
@@ -687,6 +682,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
spinlock_t *ptl,
struct list_head *compound_pagelist)
{
+ struct folio *src_folio;
struct page *src_page;
struct page *tmp;
pte_t *_pte;
@@ -708,16 +704,17 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
}
} else {
src_page = pte_page(pteval);
- if (!PageCompound(src_page))
- release_pte_page(src_page);
+ src_folio = page_folio(src_page);
+ if (!folio_test_large(src_folio))
+ release_pte_folio(src_folio);
/*
* ptl mostly unnecessary, but preempt has to
* be disabled to update the per-cpu stats
- * inside page_remove_rmap().
+ * inside folio_remove_rmap_pte().
*/
spin_lock(ptl);
ptep_clear(vma->vm_mm, address, _pte);
- page_remove_rmap(src_page, vma, false);
+ folio_remove_rmap_pte(src_folio, src_page, vma);
spin_unlock(ptl);
free_page_and_swap_cache(src_page);
}
@@ -1624,7 +1621,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
* PTE dirty? Shmem page is already dirty; file is read-only.
*/
ptep_clear(mm, addr, pte);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
nr_ptes++;
}

--
2.43.0

2023-12-11 15:59:11

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 24/39] kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert __replace_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/events/uprobes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 435aac1d8c27..16731d240e16 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -198,7 +198,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
set_pte_at_notify(mm, addr, pvmw.pte,
mk_pte(new_page, vma->vm_page_prot));

- page_remove_rmap(old_page, vma, false);
+ folio_remove_rmap_pte(old_folio, old_page, vma);
if (!folio_mapped(old_folio))
folio_free_swap(old_folio);
page_vma_mapped_walk_done(&pvmw);
--
2.43.0

2023-12-11 15:59:11

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 17/39] mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert remove_migration_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index efc19f53b05e..0e78680589bc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -259,8 +259,8 @@ static bool remove_migration_pte(struct folio *folio,
#endif
{
if (folio_test_anon(folio))
- page_add_anon_rmap(new, vma, pvmw.address,
- rmap_flags);
+ folio_add_anon_rmap_pte(folio, new, vma,
+ pvmw.address, rmap_flags);
else
folio_add_file_rmap_pte(folio, new, vma);
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
--
2.43.0

2023-12-11 15:59:16

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 25/39] mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()

Let's convert zap_huge_pmd() and set_pmd_migration_entry(). While at it,
perform some more folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b03374d1bb94..cfaa8b823015 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1898,7 +1898,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,

if (pmd_present(orig_pmd)) {
page = pmd_page(orig_pmd);
- page_remove_rmap(page, vma, true);
+ folio_remove_rmap_pmd(page_folio(page), page, vma);
VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
VM_BUG_ON_PAGE(!PageHead(page), page);
} else if (thp_migration_supported()) {
@@ -2433,12 +2433,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
page = pfn_swap_entry_to_page(entry);
} else {
page = pmd_page(old_pmd);
- if (!PageDirty(page) && pmd_dirty(old_pmd))
- set_page_dirty(page);
- if (!PageReferenced(page) && pmd_young(old_pmd))
- SetPageReferenced(page);
- page_remove_rmap(page, vma, true);
- put_page(page);
+ folio = page_folio(page);
+ if (!folio_test_dirty(folio) && pmd_dirty(old_pmd))
+ folio_set_dirty(folio);
+ if (!folio_test_referenced(folio) && pmd_young(old_pmd))
+ folio_set_referenced(folio);
+ folio_remove_rmap_pmd(folio, page, vma);
+ folio_put(folio);
}
add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
return;
@@ -2593,7 +2594,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
pte_unmap(pte - 1);

if (!pmd_migration)
- page_remove_rmap(page, vma, true);
+ folio_remove_rmap_pmd(folio, page, vma);
if (freeze)
put_page(page);

@@ -3536,6 +3537,7 @@ late_initcall(split_huge_pages_debugfs);
int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
struct page *page)
{
+ struct folio *folio = page_folio(page);
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
@@ -3551,14 +3553,14 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdval = pmdp_invalidate(vma, address, pvmw->pmd);

/* See page_try_share_anon_rmap(): invalidate PMD first. */
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
if (anon_exclusive && page_try_share_anon_rmap(page)) {
set_pmd_at(mm, address, pvmw->pmd, pmdval);
return -EBUSY;
}

if (pmd_dirty(pmdval))
- set_page_dirty(page);
+ folio_set_dirty(folio);
if (pmd_write(pmdval))
entry = make_writable_migration_entry(page_to_pfn(page));
else if (anon_exclusive)
@@ -3575,8 +3577,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
if (pmd_uffd_wp(pmdval))
pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
- page_remove_rmap(page, vma, true);
- put_page(page);
+ folio_remove_rmap_pmd(folio, page, vma);
+ folio_put(folio);
trace_set_migration_pmd(address, pmd_val(pmdswp));

return 0;
--
2.43.0

2023-12-11 15:59:27

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 28/39] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert zap_pte_range() and closely-related
tlb_flush_rmap_batch(). While at it, perform some more folio conversion
in zap_pte_range().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 23 +++++++++++++----------
mm/mmu_gather.c | 2 +-
2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 97e064883992..9a5724cf895f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1434,6 +1434,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
arch_enter_lazy_mmu_mode();
do {
pte_t ptent = ptep_get(pte);
+ struct folio *folio;
struct page *page;

if (pte_none(ptent))
@@ -1459,21 +1460,22 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
continue;
}

+ folio = page_folio(page);
delay_rmap = 0;
- if (!PageAnon(page)) {
+ if (!folio_test_anon(folio)) {
if (pte_dirty(ptent)) {
- set_page_dirty(page);
+ folio_set_dirty(folio);
if (tlb_delay_rmap(tlb)) {
delay_rmap = 1;
force_flush = 1;
}
}
if (pte_young(ptent) && likely(vma_has_recency(vma)))
- mark_page_accessed(page);
+ folio_mark_accessed(folio);
}
rss[mm_counter(page)]--;
if (!delay_rmap) {
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
if (unlikely(page_mapcount(page) < 0))
print_bad_pte(vma, addr, ptent, page);
}
@@ -1489,6 +1491,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
if (is_device_private_entry(entry) ||
is_device_exclusive_entry(entry)) {
page = pfn_swap_entry_to_page(entry);
+ folio = page_folio(page);
if (unlikely(!should_zap_page(details, page)))
continue;
/*
@@ -1500,8 +1503,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
WARN_ON_ONCE(!vma_is_anonymous(vma));
rss[mm_counter(page)]--;
if (is_device_private_entry(entry))
- page_remove_rmap(page, vma, false);
- put_page(page);
+ folio_remove_rmap_pte(folio, page, vma);
+ folio_put(folio);
} else if (!non_swap_entry(entry)) {
/* Genuine swap entry, hence a private anon page */
if (!should_zap_cows(details))
@@ -3220,10 +3223,10 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* threads.
*
* The critical issue is to order this
- * page_remove_rmap with the ptp_clear_flush above.
- * Those stores are ordered by (if nothing else,)
+ * folio_remove_rmap_pte() with the ptp_clear_flush
+ * above. Those stores are ordered by (if nothing else,)
* the barrier present in the atomic_add_negative
- * in page_remove_rmap.
+ * in folio_remove_rmap_pte();
*
* Then the TLB flush in ptep_clear_flush ensures that
* no process can access the old page before the
@@ -3232,7 +3235,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* mapcount is visible. So transitively, TLBs to
* old page will be flushed before it can be reused.
*/
- page_remove_rmap(vmf->page, vma, false);
+ folio_remove_rmap_pte(old_folio, vmf->page, vma);
}

/* Free the old page.. */
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 4f559f4ddd21..604ddf08affe 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -55,7 +55,7 @@ static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_

if (encoded_page_flags(enc)) {
struct page *page = encoded_page_ptr(enc);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(page_folio(page), page, vma);
}
}
}
--
2.43.0

2023-12-11 15:59:32

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 18/39] mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert replace_page(). While at it, perform some folio
conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/ksm.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index b93389a3780e..2b6888ad1470 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1199,6 +1199,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
static int replace_page(struct vm_area_struct *vma, struct page *page,
struct page *kpage, pte_t orig_pte)
{
+ struct folio *kfolio = page_folio(kpage);
struct mm_struct *mm = vma->vm_mm;
struct folio *folio;
pmd_t *pmd;
@@ -1238,15 +1239,16 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
goto out_mn;
}
VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
- VM_BUG_ON_PAGE(PageAnon(kpage) && PageAnonExclusive(kpage), kpage);
+ VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
+ kfolio);

/*
* No need to check ksm_use_zero_pages here: we can only have a
* zero_page here if ksm_use_zero_pages was enabled already.
*/
if (!is_zero_pfn(page_to_pfn(kpage))) {
- get_page(kpage);
- page_add_anon_rmap(kpage, vma, addr, RMAP_NONE);
+ folio_get(kfolio);
+ folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
newpte = mk_pte(kpage, vma->vm_page_prot);
} else {
/*
--
2.43.0

2023-12-11 15:59:33

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 27/39] mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert replace_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/ksm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 2b6888ad1470..b3d0cfaa2533 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1279,7 +1279,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
set_pte_at_notify(mm, addr, ptep, newpte);

folio = page_folio(page);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
if (!folio_mapped(folio))
folio_free_swap(folio);
folio_put(folio);
--
2.43.0

2023-12-11 15:59:40

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 29/39] mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert migrate_vma_collect_pmd(). While at it, perform more
folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate_device.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 8ac1f79f754a..c51c99151ebb 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -107,6 +107,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

for (; addr < end; addr += PAGE_SIZE, ptep++) {
unsigned long mpfn = 0, pfn;
+ struct folio *folio;
struct page *page;
swp_entry_t entry;
pte_t pte;
@@ -168,41 +169,43 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
}

/*
- * By getting a reference on the page we pin it and that blocks
+ * By getting a reference on the folio we pin it and that blocks
* any kind of migration. Side effect is that it "freezes" the
* pte.
*
- * We drop this reference after isolating the page from the lru
- * for non device page (device page are not on the lru and thus
+ * We drop this reference after isolating the folio from the lru
+ * for non device folio (device folio are not on the lru and thus
* can't be dropped from it).
*/
- get_page(page);
+ folio = page_folio(page);
+ folio_get(folio);

/*
- * We rely on trylock_page() to avoid deadlock between
+ * We rely on folio_trylock() to avoid deadlock between
* concurrent migrations where each is waiting on the others
- * page lock. If we can't immediately lock the page we fail this
+ * folio lock. If we can't immediately lock the folio we fail this
* migration as it is only best effort anyway.
*
- * If we can lock the page it's safe to set up a migration entry
- * now. In the common case where the page is mapped once in a
+ * If we can lock the folio it's safe to set up a migration entry
+ * now. In the common case where the folio is mapped once in a
* single process setting up the migration entry now is an
* optimisation to avoid walking the rmap later with
* try_to_migrate().
*/
- if (trylock_page(page)) {
+ if (folio_trylock(folio)) {
bool anon_exclusive;
pte_t swp_pte;

flush_cache_page(vma, addr, pte_pfn(pte));
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = folio_test_anon(folio) &&
+ PageAnonExclusive(page);
if (anon_exclusive) {
pte = ptep_clear_flush(vma, addr, ptep);

if (page_try_share_anon_rmap(page)) {
set_pte_at(mm, addr, ptep, pte);
- unlock_page(page);
- put_page(page);
+ folio_unlock(folio);
+ folio_put(folio);
mpfn = 0;
goto next;
}
@@ -214,7 +217,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pte))
- folio_mark_dirty(page_folio(page));
+ folio_mark_dirty(folio);

/* Setup special migration page table entry */
if (mpfn & MIGRATE_PFN_WRITE)
@@ -248,16 +251,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

/*
* This is like regular unmap: we remove the rmap and
- * drop page refcount. Page won't be freed, as we took
- * a reference just above.
+ * drop the folio refcount. The folio won't be freed, as
+ * we took a reference just above.
*/
- page_remove_rmap(page, vma, false);
- put_page(page);
+ folio_remove_rmap_pte(folio, page, vma);
+ folio_put(folio);

if (pte_present(pte))
unmapped++;
} else {
- put_page(page);
+ folio_put(folio);
mpfn = 0;
}

--
2.43.0

2023-12-11 15:59:45

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 30/39] mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert try_to_unmap_one() and try_to_migrate_one().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/rmap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index dc3be5807cee..233432f08e36 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1649,7 +1649,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,

/*
* When racing against e.g. zap_pte_range() on another cpu,
- * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
* try_to_unmap() may return before page_mapped() has become false,
* if page table locking is skipped: use TTU_SYNC to wait for that.
*/
@@ -1930,7 +1930,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (unlikely(folio_test_hugetlb(folio)))
hugetlb_remove_rmap(folio);
else
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -1998,7 +1998,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,

/*
* When racing against e.g. zap_pte_range() on another cpu,
- * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
* try_to_migrate() may return before page_mapped() has become false,
* if page table locking is skipped: use TTU_SYNC to wait for that.
*/
@@ -2291,7 +2291,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
if (unlikely(folio_test_hugetlb(folio)))
hugetlb_remove_rmap(folio);
else
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -2430,7 +2430,7 @@ static bool page_make_device_exclusive_one(struct folio *folio,
* There is a reference on the page for the swap entry which has
* been removed, so shouldn't take another.
*/
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
}

mmu_notifier_invalidate_range_end(&range);
--
2.43.0

2023-12-11 15:59:49

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 32/39] mm/rmap: remove page_remove_rmap()

All callers are gone, let's remove it and some leftover traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 4 +---
mm/filemap.c | 10 +++++-----
mm/internal.h | 2 +-
mm/memory-failure.c | 4 ++--
mm/rmap.c | 23 ++---------------------
5 files changed, 11 insertions(+), 32 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index a266dc0ef99e..0f4eecd03bdc 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -244,8 +244,6 @@ void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
folio_add_file_rmap_ptes(folio, page, 1, vma)
void folio_add_file_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
-void page_remove_rmap(struct page *, struct vm_area_struct *,
- bool compound);
void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
struct vm_area_struct *);
#define folio_remove_rmap_pte(folio, page, vma) \
@@ -392,7 +390,7 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
*
* This is similar to page_try_dup_anon_rmap(), however, not used during fork()
* to duplicate a mapping, but instead to prepare for KSM or temporarily
- * unmapping a page (swap, migration) via page_remove_rmap().
+ * unmapping a page (swap, migration) via folio_remove_rmap_*().
*
* Marking the page shared can only fail if the page may be pinned; device
* private pages cannot get pinned and consequently this function cannot fail.
diff --git a/mm/filemap.c b/mm/filemap.c
index c0d7e1d7eea2..beff3865465a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -113,11 +113,11 @@
* ->i_pages lock (try_to_unmap_one)
* ->lruvec->lru_lock (follow_page->mark_page_accessed)
* ->lruvec->lru_lock (check_pte_range->isolate_lru_page)
- * ->private_lock (page_remove_rmap->set_page_dirty)
- * ->i_pages lock (page_remove_rmap->set_page_dirty)
- * bdi.wb->list_lock (page_remove_rmap->set_page_dirty)
- * ->inode->i_lock (page_remove_rmap->set_page_dirty)
- * ->memcg->move_lock (page_remove_rmap->folio_memcg_lock)
+ * ->private_lock (folio_remove_rmap_pte->set_page_dirty)
+ * ->i_pages lock (folio_remove_rmap_pte->set_page_dirty)
+ * bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty)
+ * ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty)
+ * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock)
* bdi.wb->list_lock (zap_pte_range->set_page_dirty)
* ->inode->i_lock (zap_pte_range->set_page_dirty)
* ->private_lock (zap_pte_range->block_dirty_folio)
diff --git a/mm/internal.h b/mm/internal.h
index 222e63b2dea4..a94355e70bd7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -651,7 +651,7 @@ folio_within_vma(struct folio *folio, struct vm_area_struct *vma)
* under page table lock for the pte/pmd being added or removed.
*
* mlock is usually called at the end of page_add_*_rmap(), munlock at
- * the end of page_remove_rmap(); but new anon folios are managed by
+ * the end of folio_remove_rmap_*(); but new anon folios are managed by
* folio_add_lru_vma() calling mlock_new_folio().
*/
void mlock_folio(struct folio *folio);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d8c853b35dbb..01af9295c47c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2316,8 +2316,8 @@ int memory_failure(unsigned long pfn, int flags)
* We use page flags to determine what action should be taken, but
* the flags can be modified by the error containment action. One
* example is an mlocked page, where PG_mlocked is cleared by
- * page_remove_rmap() in try_to_unmap_one(). So to determine page status
- * correctly, we save a copy of the page flags at this time.
+ * folio_remove_rmap_*() in try_to_unmap_one(). So to determine page
+ * status correctly, we save a copy of the page flags at this time.
*/
page_flags = p->flags;

diff --git a/mm/rmap.c b/mm/rmap.c
index 233432f08e36..b08dd7d6779d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -470,7 +470,7 @@ void __init anon_vma_init(void)
/*
* Getting a lock on a stable anon_vma from a page off the LRU is tricky!
*
- * Since there is no serialization what so ever against page_remove_rmap()
+ * Since there is no serialization what so ever against folio_remove_rmap_*()
* the best this function can do is return a refcount increased anon_vma
* that might have been relevant to this page.
*
@@ -487,7 +487,7 @@ void __init anon_vma_init(void)
* [ something equivalent to page_mapped_in_vma() ].
*
* Since anon_vma's slab is SLAB_TYPESAFE_BY_RCU and we know from
- * page_remove_rmap() that the anon_vma pointer from page->mapping is valid
+ * folio_remove_rmap_*() that the anon_vma pointer from page->mapping is valid
* if there is a mapcount, we can dereference the anon_vma after observing
* those.
*
@@ -1499,25 +1499,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
#endif
}

-/**
- * page_remove_rmap - take down pte mapping from a page
- * @page: page to remove mapping from
- * @vma: the vm area from which the mapping is removed
- * @compound: uncharge the page as compound or small page
- *
- * The caller needs to hold the pte lock.
- */
-void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!compound))
- folio_remove_rmap_pte(folio, page, vma);
- else
- folio_remove_rmap_pmd(folio, page, vma);
-}
-
static __always_inline void __folio_remove_rmap(struct folio *folio,
struct page *page, int nr_pages, struct vm_area_struct *vma,
enum rmap_mode mode)
--
2.43.0

2023-12-11 15:59:49

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 23/39] mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()

Let's mimic what we did with folio_add_file_rmap_*() and
folio_add_anon_rmap_*() so we can similarly replace page_remove_rmap()
next.

Make the compiler always special-case on the granularity by using
__always_inline.

We're adding folio_remove_rmap_ptes() handling right away, as we want to
use that soon for batching rmap operations when unmapping PTE-mapped
large folios.
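
For reference, a minimal sketch of the new calls (the caller holds the page
table lock; the pte variant is shorthand for the ptes variant with
nr_pages == 1):

        folio_remove_rmap_pte(folio, page, vma);            /* a single PTE  */
        folio_remove_rmap_ptes(folio, page, nr_pages, vma); /* nr_pages PTEs */
        folio_remove_rmap_pmd(folio, page, vma);            /* a single PMD  */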

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 6 ++++
mm/rmap.c | 79 ++++++++++++++++++++++++++++++++++++--------
2 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0acebe41ab8e..a266dc0ef99e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -246,6 +246,12 @@ void folio_add_file_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);
+void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
+ struct vm_area_struct *);
+#define folio_remove_rmap_pte(folio, page, vma) \
+ folio_remove_rmap_ptes(folio, page, 1, vma)
+void folio_remove_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *);

void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
diff --git a/mm/rmap.c b/mm/rmap.c
index 9212726268ba..dc3be5807cee 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1511,25 +1511,38 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
bool compound)
{
struct folio *folio = page_folio(page);
+
+ if (likely(!compound))
+ folio_remove_rmap_pte(folio, page, vma);
+ else
+ folio_remove_rmap_pmd(folio, page, vma);
+}
+
+static __always_inline void __folio_remove_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *vma,
+ enum rmap_mode mode)
+{
atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool last;
+ int last, nr = 0, nr_pmdmapped = 0;
enum node_stat_item idx;

- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

/* Is page being unmapped by PTE? Is this its last map to be removed? */
- if (likely(!compound)) {
- last = atomic_add_negative(-1, &page->_mapcount);
- nr = last;
- if (last && folio_test_large(folio)) {
- nr = atomic_dec_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ do {
+ last = atomic_add_negative(-1, &page->_mapcount);
+ if (last && folio_test_large(folio)) {
+ last = atomic_dec_return_relaxed(mapped);
+ last = (last < COMPOUND_MAPPED);
+ }

+ if (last)
+ nr++;
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_MODE_PMD:
last = atomic_add_negative(-1, &folio->_entire_mapcount);
if (last) {
nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
@@ -1544,6 +1557,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
nr = 0;
}
}
+ break;
}

if (nr_pmdmapped) {
@@ -1565,7 +1579,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
* is still mapped.
*/
if (folio_test_large(folio) && folio_test_anon(folio))
- if (!compound || nr < nr_pmdmapped)
+ if (mode == RMAP_MODE_PTE || nr < nr_pmdmapped)
deferred_split_folio(folio);
}

@@ -1580,6 +1594,43 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
munlock_vma_folio(folio, vma);
}

+/**
+ * folio_remove_rmap_ptes - remove PTE mappings from a page range of a folio
+ * @folio: The folio to remove the mappings from
+ * @page: The first page to remove
+ * @nr_pages: The number of pages that will be removed from the mapping
+ * @vma: The vm area from which the mappings are removed
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_remove_rmap_ptes(struct folio *folio, struct page *page,
+ int nr_pages, struct vm_area_struct *vma)
+{
+ __folio_remove_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
+}
+
+/**
+ * folio_remove_rmap_pmd - remove a PMD mapping from a page range of a folio
+ * @folio: The folio to remove the mapping from
+ * @page: The first page to remove
+ * @vma: The vm area from which the mapping is removed
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_remove_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
--
2.43.0

2023-12-11 15:59:50

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

The last users of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
so remove them.

Add folio_try_dup_anon_rmap_ptes() right away, as we want to perform rmap
batching during fork() soon.
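
A simplified sketch of the intended fork() usage (mirroring the
copy_present_pte() conversion later in this series; error handling trimmed):

        folio_get(folio);
        if (unlikely(folio_try_dup_anon_rmap_pte(folio, page, src_vma))) {
                /* The folio may be pinned: drop the reference and copy instead. */
                folio_put(folio);
                /* ... fall back to copy_present_page() ... */
        }
        /* On success, the duplicated PTE must stay R/O in parent and child. */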

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 6 --
include/linux/rmap.h | 150 ++++++++++++++++++++++++++++++-------------
2 files changed, 106 insertions(+), 50 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ae547b62f325..30edf3f7d1f3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1975,12 +1975,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
return folio_maybe_dma_pinned(folio);
}

-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
-{
- return folio_needs_cow_for_dma(vma, page_folio(page));
-}
-
/**
* is_zero_page - Query if a page is a zero page
* @page: The page to query
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index df60e44fecad..c6d8a02ecd56 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -365,68 +365,130 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio,
#endif
}

-static inline void __page_dup_rmap(struct page *page, bool compound)
+static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *src_vma,
+ enum rmap_mode mode)
{
- VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+ bool maybe_pinned;
+ int i;

- if (compound) {
- struct folio *folio = (struct folio *)page;
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+ /*
+ * If this folio may have been pinned by the parent process,
+ * don't allow to duplicate the mappings but instead require to e.g.,
+ * copy the subpage immediately for the child so that we'll always
+ * guarantee the pinned folio won't be randomly replaced in the
+ * future on write faults.
+ */
+ maybe_pinned = likely(!folio_is_device_private(folio)) &&
+ unlikely(folio_needs_cow_for_dma(src_vma, folio));
+
+ /*
+ * No need to check+clear for already shared PTEs/PMDs of the
+ * folio. But if any page is PageAnonExclusive, we must fallback to
+ * copying if the folio maybe pinned.
+ */
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ if (unlikely(maybe_pinned)) {
+ for (i = 0; i < nr_pages; i++)
+ if (PageAnonExclusive(page + i))
+ return -EBUSY;
+ }
+ do {
+ if (PageAnonExclusive(page))
+ ClearPageAnonExclusive(page);
+ atomic_inc(&page->_mapcount);
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_MODE_PMD:
+ if (PageAnonExclusive(page)) {
+ if (unlikely(maybe_pinned))
+ return -EBUSY;
+ ClearPageAnonExclusive(page);
+ }
atomic_inc(&folio->_entire_mapcount);
- } else {
- atomic_inc(&page->_mapcount);
+ break;
}
+ return 0;
}

/**
- * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
- * anonymous page
- * @page: the page to duplicate the mapping for
- * @compound: the page is mapped as compound or as a small page
- * @vma: the source vma
+ * folio_try_dup_anon_rmap_ptes - try duplicating PTE mappings of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mappings of
+ * @page: The first page to duplicate the mappings of
+ * @nr_pages: The number of pages of which the mapping will be duplicated
+ * @src_vma: The vm area from which the mappings are duplicated
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
*
- * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq.
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
*
- * Duplicating the mapping can only fail if the page may be pinned; device
- * private pages cannot get pinned and consequently this function cannot fail.
+ * Duplicating the mappings can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail
+ * for them.
*
- * If duplicating the mapping succeeds, the page has to be mapped R/O into
- * the parent and the child. It must *not* get mapped writable after this call.
+ * If duplicating the mappings succeeded, the duplicated PTEs have to be R/O in
+ * the parent and the child. They must *not* be writable after this call
+ * succeeded.
+ *
+ * Returns 0 if duplicating the mappings succeeded. Returns -EBUSY otherwise.
+ */
+static inline int folio_try_dup_anon_rmap_ptes(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *src_vma)
+{
+ return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma,
+ RMAP_MODE_PTE);
+}
+#define folio_try_dup_anon_rmap_pte(folio, page, vma) \
+ folio_try_dup_anon_rmap_ptes(folio, page, 1, vma)
+
+/**
+ * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mapping of
+ * @page: The first page to duplicate the mapping of
+ * @src_vma: The vm area from which the mapping is duplicated
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
+ *
+ * Duplicating the mapping can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail
+ * for them.
+ *
+ * If duplicating the mapping succeeds, the duplicated PMD has to be R/O in
+ * the parent and the child. They must *not* be writable after this call
+ * succeeded.
*
* Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise.
*/
+static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
+ struct page *page, struct vm_area_struct *src_vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ return __folio_try_dup_anon_rmap(folio, page, HPAGE_PMD_NR, src_vma,
+ RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+ return -EBUSY;
+#endif
+}
+
static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
struct vm_area_struct *vma)
{
- VM_BUG_ON_PAGE(!PageAnon(page), page);
-
- /*
- * No need to check+clear for already shared pages, including KSM
- * pages.
- */
- if (!PageAnonExclusive(page))
- goto dup;
+ struct folio *folio = page_folio(page);

- /*
- * If this page may have been pinned by the parent process,
- * don't allow to duplicate the mapping but instead require to e.g.,
- * copy the page immediately for the child so that we'll always
- * guarantee the pinned page won't be randomly replaced in the
- * future on write faults.
- */
- if (likely(!is_device_private_page(page)) &&
- unlikely(page_needs_cow_for_dma(vma, page)))
- return -EBUSY;
-
- ClearPageAnonExclusive(page);
- /*
- * It's okay to share the anon page between both processes, mapping
- * the page R/O into both processes.
- */
-dup:
- __page_dup_rmap(page, compound);
- return 0;
+ if (likely(!compound))
+ return folio_try_dup_anon_rmap_pte(folio, page, vma);
+ return folio_try_dup_anon_rmap_pmd(folio, page, vma);
}

/**
--
2.43.0

2023-12-11 15:59:53

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 31/39] Documentation: stop referring to page_remove_rmap()

Refer to folio_remove_rmap_*() instead.

Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/mm/transhuge.rst | 2 +-
Documentation/mm/unevictable-lru.rst | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index 9a607059ea11..cf81272a6b8b 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -156,7 +156,7 @@ Partial unmap and deferred_split_folio()

Unmapping part of THP (with munmap() or other way) is not going to free
memory immediately. Instead, we detect that a subpage of THP is not in use
-in page_remove_rmap() and queue the THP for splitting if memory pressure
+in folio_remove_rmap_*() and queue the THP for splitting if memory pressure
comes. Splitting will free up unused subpages.

Splitting the page right away is not an option due to locking context in
diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
index 67f1338440a5..b6a07a26b10d 100644
--- a/Documentation/mm/unevictable-lru.rst
+++ b/Documentation/mm/unevictable-lru.rst
@@ -486,7 +486,7 @@ munlock the pages if we're removing the last VM_LOCKED VMA that maps the pages.
Before the unevictable/mlock changes, mlocking did not mark the pages in any
way, so unmapping them required no processing.

-For each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
+For each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls
munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).

@@ -511,7 +511,7 @@ userspace; truncation even unmaps and deletes any private anonymous pages
which had been Copied-On-Write from the file pages now being truncated.

Mlocked pages can be munlocked and deleted in this way: like with munmap(),
-for each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
+for each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls
munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).

--
2.43.0

2023-12-11 16:00:12

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 21/39] mm/rmap: remove page_add_anon_rmap()

All users are gone, remove it and all traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 2 --
mm/rmap.c | 31 ++++---------------------------
2 files changed, 4 insertions(+), 29 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 3b5357cb1c09..bd4edae4dbe7 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -240,8 +240,6 @@ void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
void folio_add_anon_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *, unsigned long address, rmap_t flags);
-void page_add_anon_rmap(struct page *, struct vm_area_struct *,
- unsigned long address, rmap_t flags);
void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address);
void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
diff --git a/mm/rmap.c b/mm/rmap.c
index 7787499fa2ad..83cba8909848 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1271,7 +1271,7 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page,
* The page's anon-rmap details (mapping and index) are guaranteed to
* be set up correctly at this point.
*
- * We have exclusion against page_add_anon_rmap because the caller
+ * We have exclusion against folio_add_anon_rmap_*() because the caller
* always holds the page locked.
*
* We have exclusion against page_add_new_anon_rmap because those pages
@@ -1284,29 +1284,6 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page,
page);
}

-/**
- * page_add_anon_rmap - add pte mapping to an anonymous page
- * @page: the page to add the mapping to
- * @vma: the vm area in which the mapping is added
- * @address: the user virtual address mapped
- * @flags: the rmap flags
- *
- * The caller needs to hold the pte lock, and the page must be locked in
- * the anon_vma case: to serialize mapping,index checking after setting,
- * and to ensure that PageAnon is not being upgraded racily to PageKsm
- * (but PageKsm is never downgraded to PageAnon).
- */
-void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
- unsigned long address, rmap_t flags)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!(flags & RMAP_COMPOUND)))
- folio_add_anon_rmap_pte(folio, page, vma, address, flags);
- else
- folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
-}
-
static __always_inline void __folio_add_anon_rmap(struct folio *folio,
struct page *page, int nr_pages, struct vm_area_struct *vma,
unsigned long address, rmap_t flags, enum rmap_mode mode)
@@ -1420,7 +1397,7 @@ void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
* @vma: the vm area in which the mapping is added
* @address: the user virtual address mapped
*
- * Like page_add_anon_rmap() but must only be called on *new* folios.
+ * Like folio_add_anon_rmap_*() but must only be called on *new* folios.
* This means the inc-and-test can be bypassed.
* The folio does not have to be locked.
*
@@ -1480,7 +1457,7 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
if (nr)
__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);

- /* See comments in page_add_anon_rmap() */
+ /* See comments in folio_add_anon_rmap_*() */
if (!folio_test_large(folio))
mlock_vma_folio(folio, vma);
}
@@ -1594,7 +1571,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,

/*
* It would be tidy to reset folio_test_anon mapping when fully
- * unmapped, but that might overwrite a racing page_add_anon_rmap
+ * unmapped, but that might overwrite a racing folio_add_anon_rmap_*()
* which increments mapcount after us but sets mapping before us:
* so leave the reset to free_pages_prepare, and remember that
* it's only reliable while mapped.
--
2.43.0

2023-12-11 16:00:26

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 36/39] mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()

Let's convert copy_nonpresent_pte(). While at it, perform some more
folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 42a0b7b41b86..caaf4add6fa2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -785,6 +785,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
unsigned long vm_flags = dst_vma->vm_flags;
pte_t orig_pte = ptep_get(src_pte);
pte_t pte = orig_pte;
+ struct folio *folio;
struct page *page;
swp_entry_t entry = pte_to_swp_entry(orig_pte);

@@ -829,6 +830,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}
} else if (is_device_private_entry(entry)) {
page = pfn_swap_entry_to_page(entry);
+ folio = page_folio(page);

/*
* Update rss count even for unaddressable pages, as
@@ -839,10 +841,10 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* for unaddressable pages, at some point. But for now
* keep things as they are.
*/
- get_page(page);
+ folio_get(folio);
rss[mm_counter(page)]++;
/* Cannot fail as these pages cannot get pinned. */
- BUG_ON(page_try_dup_anon_rmap(page, false, src_vma));
+ folio_try_dup_anon_rmap_pte(folio, page, src_vma);

/*
* We do not preserve soft-dirty information, because so
@@ -956,7 +958,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
* future.
*/
folio_get(folio);
- if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) {
+ if (unlikely(folio_try_dup_anon_rmap_pte(folio, page, src_vma))) {
/* Page may be pinned, we have to copy. */
folio_put(folio);
return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
--
2.43.0

2023-12-11 16:00:37

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 38/39] mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]()

Let's convert it like we converted all the other rmap functions.
Don't introduce folio_try_share_anon_rmap_ptes() for now, as there is no
user in sight that wants rmap batching. It is pretty easy to add later.

All users are easy to convert -- only ksm.c doesn't use folios yet but
that is left for future work -- so let's just do it in a single shot.

While at it, turn the BUG_ON into a WARN_ON_ONCE.

Note that page_try_share_anon_rmap() so far didn't care about pte/pmd
mappings (no compound parameter). We're changing that so we can perform
better sanity checks and make the code actually more readable/consistent.
For example, __folio_rmap_sanity_checks() will make sure that a PMD
range actually falls completely into the folio.
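
A simplified sketch of the PTE case as used in try_to_migrate_one() (the page
table entry was already cleared/invalidated and the page table lock is held):

        if (anon_exclusive && folio_try_share_anon_rmap_pte(folio, subpage)) {
                /* The folio may be pinned: restore the PTE and fail migration. */
                set_pte_at(mm, address, pvmw.pte, pteval);
                ret = false;
        }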

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 95 +++++++++++++++++++++++++++++++++-----------
mm/gup.c | 2 +-
mm/huge_memory.c | 9 +++--
mm/internal.h | 4 +-
mm/ksm.c | 5 ++-
mm/migrate_device.c | 2 +-
mm/rmap.c | 11 ++---
7 files changed, 89 insertions(+), 39 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 1e37ee6ae0ba..1e54a28cc884 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -272,7 +272,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
return 0;
}

-/* See page_try_share_anon_rmap() */
+/* See folio_try_share_anon_rmap_*() */
static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
@@ -481,30 +481,15 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
#endif
}

-/**
- * page_try_share_anon_rmap - try marking an exclusive anonymous page possibly
- * shared to prepare for KSM or temporary unmapping
- * @page: the exclusive anonymous page to try marking possibly shared
- *
- * The caller needs to hold the PT lock and has to have the page table entry
- * cleared/invalidated.
- *
- * This is similar to folio_try_dup_anon_rmap_*(), however, not used during
- * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily
- * unmapping a page (swap, migration) via folio_remove_rmap_*().
- *
- * Marking the page shared can only fail if the page may be pinned; device
- * private pages cannot get pinned and consequently this function cannot fail.
- *
- * Returns 0 if marking the page possibly shared succeeded. Returns -EBUSY
- * otherwise.
- */
-static inline int page_try_share_anon_rmap(struct page *page)
+static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_mode mode)
{
- VM_BUG_ON_PAGE(!PageAnon(page) || !PageAnonExclusive(page), page);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

- /* device private pages cannot get pinned via GUP. */
- if (unlikely(is_device_private_page(page))) {
+ /* device private folios cannot get pinned via GUP. */
+ if (unlikely(folio_is_device_private(folio))) {
ClearPageAnonExclusive(page);
return 0;
}
@@ -555,7 +540,7 @@ static inline int page_try_share_anon_rmap(struct page *page)
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_mb();

- if (unlikely(page_maybe_dma_pinned(page)))
+ if (unlikely(folio_maybe_dma_pinned(folio)))
return -EBUSY;
ClearPageAnonExclusive(page);

@@ -568,6 +553,68 @@ static inline int page_try_share_anon_rmap(struct page *page)
return 0;
}

+/**
+ * folio_try_share_anon_rmap_pte - try marking an exclusive anonymous page
+ * mapped by a PTE possibly shared to prepare
+ * for KSM or temporary unmapping
+ * @folio: The folio to share a mapping of
+ * @page: The mapped exclusive page
+ *
+ * The caller needs to hold the page table lock and has to have the page table
+ * entries cleared/invalidated.
+ *
+ * This is similar to folio_try_dup_anon_rmap_pte(), however, not used during
+ * fork() to duplicate mappings, but instead to prepare for KSM or temporarily
+ * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pte().
+ *
+ * Marking the mapped page shared can only fail if the folio maybe pinned;
+ * device private folios cannot get pinned and consequently this function cannot
+ * fail.
+ *
+ * Returns 0 if marking the mapped page possibly shared succeeded. Returns
+ * -EBUSY otherwise.
+ */
+static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
+ struct page *page)
+{
+ return __folio_try_share_anon_rmap(folio, page, 1, RMAP_MODE_PTE);
+}
+
+/**
+ * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
+ * range mapped by a PMD possibly shared to
+ * prepare for temporary unmapping
+ * @folio: The folio to share the mapping of
+ * @page: The first page to share the mapping of
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock and has to have the page table
+ * entries cleared/invalidated.
+ *
+ * This is similar to folio_try_dup_anon_rmap_pmd(), however, not used during
+ * fork() to duplicate a mapping, but instead to prepare for temporarily
+ * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pmd().
+ *
+ * Marking the mapped pages shared can only fail if the folio maybe pinned;
+ * device private folios cannot get pinned and consequently this function cannot
+ * fail.
+ *
+ * Returns 0 if marking the mapped pages possibly shared succeeded. Returns
+ * -EBUSY otherwise.
+ */
+static inline int folio_try_share_anon_rmap_pmd(struct folio *folio,
+ struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ return __folio_try_share_anon_rmap(folio, page, HPAGE_PMD_NR,
+ RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+ return -EBUSY;
+#endif
+}
+
/*
* Called from mm/vmscan.c to handle paging out
*/
diff --git a/mm/gup.c b/mm/gup.c
index 0a5f0e91bfec..df83182ec72d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -177,7 +177,7 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
/*
* Adjust the pincount before re-checking the PTE for changes.
* This is essentially a smp_mb() and is paired with a memory
- * barrier in page_try_share_anon_rmap().
+ * barrier in folio_try_share_anon_rmap_*().
*/
smp_mb__after_atomic();

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 34f878916621..c681296fa429 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2523,10 +2523,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
* In case we cannot clear PageAnonExclusive(), split the PMD
* only and let try_to_migrate_one() fail later.
*
- * See page_try_share_anon_rmap(): invalidate PMD first.
+ * See folio_try_share_anon_rmap_pmd(): invalidate PMD first.
*/
anon_exclusive = PageAnonExclusive(page);
- if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
+ if (freeze && anon_exclusive &&
+ folio_try_share_anon_rmap_pmd(folio, page))
freeze = false;
if (!freeze) {
rmap_t rmap_flags = RMAP_NONE;
@@ -3554,9 +3555,9 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
pmdval = pmdp_invalidate(vma, address, pvmw->pmd);

- /* See page_try_share_anon_rmap(): invalidate PMD first. */
+ /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
- if (anon_exclusive && page_try_share_anon_rmap(page)) {
+ if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
set_pmd_at(mm, address, pvmw->pmd, pmdval);
return -EBUSY;
}
diff --git a/mm/internal.h b/mm/internal.h
index a94355e70bd7..29589bc3f046 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1047,7 +1047,7 @@ enum {
* * Ordinary GUP: Using the PT lock
* * GUP-fast and fork(): mm->write_protect_seq
* * GUP-fast and KSM or temporary unmapping (swap, migration): see
- * page_try_share_anon_rmap()
+ * folio_try_share_anon_rmap_*()
*
* Must be called with the (sub)page that's actually referenced via the
* page table entry, which might not necessarily be the head page for a
@@ -1090,7 +1090,7 @@ static inline bool gup_must_unshare(struct vm_area_struct *vma,
return is_cow_mapping(vma->vm_flags);
}

- /* Paired with a memory barrier in page_try_share_anon_rmap(). */
+ /* Paired with a memory barrier in folio_try_share_anon_rmap_*(). */
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_rmb();

diff --git a/mm/ksm.c b/mm/ksm.c
index b3d0cfaa2533..be76e9dabf4f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1161,8 +1161,9 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
goto out_unlock;
}

- /* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive && page_try_share_anon_rmap(page)) {
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
+ if (anon_exclusive &&
+ folio_try_share_anon_rmap_pte(page_folio(page), page)) {
set_pte_at(mm, pvmw.address, pvmw.pte, entry);
goto out_unlock;
}
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index c51c99151ebb..9d0c1ad73722 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -202,7 +202,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (anon_exclusive) {
pte = ptep_clear_flush(vma, addr, ptep);

- if (page_try_share_anon_rmap(page)) {
+ if (folio_try_share_anon_rmap_pte(folio, page)) {
set_pte_at(mm, addr, ptep, pte);
folio_unlock(folio);
folio_put(folio);
diff --git a/mm/rmap.c b/mm/rmap.c
index b08dd7d6779d..45296739236f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,9 +1868,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}

- /* See page_try_share_anon_rmap(): clear PTE first. */
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
+ folio_try_share_anon_rmap_pte(folio, subpage)) {
swap_free(entry);
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
@@ -2144,7 +2144,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pte_t swp_pte;

if (anon_exclusive)
- BUG_ON(page_try_share_anon_rmap(subpage));
+ WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio,
+ subpage));

/*
* Store the pfn of the page in a special migration
@@ -2215,7 +2216,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) &&
!anon_exclusive, subpage);

- /* See page_try_share_anon_rmap(): clear PTE first. */
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
if (folio_test_hugetlb(folio)) {
if (anon_exclusive &&
hugetlb_try_share_anon_rmap(folio)) {
@@ -2226,7 +2227,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
} else if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
+ folio_try_share_anon_rmap_pte(folio, subpage)) {
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done(&pvmw);
--
2.43.0
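
As a self-contained illustration of the caller contract spelled out in the new
kernel-doc above: the page table entry is cleared/invalidated first, the share
attempt comes second, and the entry is put back if the folio may be pinned.
This is only a userspace model with invented "_model" stand-ins, not kernel
code; the ordering is what the memory-barrier comments in the mm/gup.c and
mm/internal.h hunks refer to.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel objects and helpers; not the real definitions. */
struct pte_model   { bool present; };
struct folio_model { int pincount; bool anon_exclusive; };

static struct pte_model ptep_clear_flush_model(struct pte_model *ptep)
{
    struct pte_model old = *ptep;

    ptep->present = false;  /* GUP-fast can no longer map the page */
    return old;
}

static void set_pte_at_model(struct pte_model *ptep, struct pte_model pteval)
{
    *ptep = pteval;
}

/* Models folio_try_share_anon_rmap_pte(): refuse if the folio may be pinned. */
static int try_share_anon_rmap_model(struct folio_model *folio)
{
    if (folio->pincount > 0)
        return -EBUSY;
    folio->anon_exclusive = false;
    return 0;
}

/* The ordering that matters: clear the PTE first, only then try to share. */
static bool unmap_one_model(struct folio_model *folio, struct pte_model *ptep)
{
    struct pte_model pteval = ptep_clear_flush_model(ptep);

    if (folio->anon_exclusive && try_share_anon_rmap_model(folio)) {
        /* Folio may be pinned: restore the mapping and give up. */
        set_pte_at_model(ptep, pteval);
        return false;
    }
    /* A swap/migration entry would be installed here. */
    return true;
}

int main(void)
{
    struct folio_model folio = { .pincount = 1, .anon_exclusive = true };
    struct pte_model pte = { .present = true };

    printf("unmap %s, pte still present: %d\n",
           unmap_one_model(&folio, &pte) ? "succeeded" : "deferred",
           pte.present);
    return 0;
}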

2023-12-11 16:01:12

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 35/39] mm/huge_memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pmd()

Let's convert copy_huge_pmd() and fixup the comment in copy_huge_pud().
While at it, perform more folio conversion in copy_huge_pmd().
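
The conversion below keeps the existing fallback: take a folio reference, try
to duplicate the anon rmap at PMD granularity, and if the folio may be pinned,
drop the reference again so the copy can be retried at PTE granularity after
splitting. A condensed userspace model of that control flow; the "_model"
names and the exact -EAGAIN return value are illustrative only:

#include <errno.h>
#include <stdio.h>

struct folio_model { int refcount; int pincount; };

static void folio_get_model(struct folio_model *f) { f->refcount++; }
static void folio_put_model(struct folio_model *f) { f->refcount--; }

/* Models folio_try_dup_anon_rmap_pmd(): refuses if the folio may be pinned. */
static int try_dup_anon_rmap_pmd_model(struct folio_model *f)
{
    return f->pincount ? -EBUSY : 0;
}

/* Models the copy_huge_pmd() fallback: back out so PTEs can be retried. */
static int copy_one_pmd_model(struct folio_model *src_folio)
{
    folio_get_model(src_folio);
    if (try_dup_anon_rmap_pmd_model(src_folio)) {
        folio_put_model(src_folio);
        return -EAGAIN;     /* "split and retry the fault on PTEs" */
    }
    return 0;
}

int main(void)
{
    struct folio_model pinned = { .refcount = 1, .pincount = 1 };
    struct folio_model plain  = { .refcount = 1, .pincount = 0 };

    printf("pinned folio: %d, ordinary folio: %d\n",
           copy_one_pmd_model(&pinned), copy_one_pmd_model(&plain));
    return 0;
}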

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cfaa8b823015..34f878916621 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1275,6 +1275,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
{
spinlock_t *dst_ptl, *src_ptl;
struct page *src_page;
+ struct folio *src_folio;
pmd_t pmd;
pgtable_t pgtable = NULL;
int ret = -ENOMEM;
@@ -1341,11 +1342,12 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,

src_page = pmd_page(pmd);
VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+ src_folio = page_folio(src_page);

- get_page(src_page);
- if (unlikely(page_try_dup_anon_rmap(src_page, true, src_vma))) {
+ folio_get(src_folio);
+ if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) {
/* Page maybe pinned: split and retry the fault on PTEs. */
- put_page(src_page);
+ folio_put(src_folio);
pte_free(dst_mm, pgtable);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
@@ -1454,8 +1456,8 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}

/*
- * TODO: once we support anonymous pages, use page_try_dup_anon_rmap()
- * and split if duplicating fails.
+ * TODO: once we support anonymous pages, use
+ * folio_try_dup_anon_rmap_*() and split if duplicating fails.
*/
pudp_set_wrprotect(src_mm, addr, src_pud);
pud = pud_mkold(pud_wrprotect(pud));
--
2.43.0

2023-12-11 16:01:21

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 39/39] mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED

We removed all "bool compound" and RMAP_COMPOUND parameters. Let's
remove the remaining "compound" terminology by making COMPOUND_MAPPED
match the "folio->_entire_mapcount" terminology, renaming it to
ENTIRELY_MAPPED.

ENTIRELY_MAPPED is only used when the whole folio is mapped using a single
page table entry (e.g., a single PMD mapping a PMD-sized THP). For now,
we don't support mapping any THP bigger than that, so ENTIRELY_MAPPED
only applies to PMD-mapped PMD-sized THP.
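
For readers following the rename, a tiny userspace model of how
folio->_nr_pages_mapped combines the number of PTE-mapped pages with the
ENTIRELY_MAPPED marker used for a single whole-folio (PMD) mapping. The
"_model" helpers are invented for illustration; the real counters are atomics
with the race handling visible in the mm/rmap.c hunks below:

#include <stdio.h>

/* Same encoding idea as mm/internal.h; values copied from the hunk below. */
#define ENTIRELY_MAPPED     0x800000
#define FOLIO_PAGES_MAPPED  (ENTIRELY_MAPPED - 1)

/* Simplified, non-atomic model of the two folio counters involved. */
struct folio_model {
    int entire_mapcount;    /* whole-folio mappings; kernel's counter starts at -1 */
    int nr_pages_mapped;    /* models folio->_nr_pages_mapped */
};

/* One page of the folio gets mapped by a PTE. */
static void map_pte_model(struct folio_model *f)
{
    f->nr_pages_mapped++;
}

/* The whole folio gets mapped by a single PMD. */
static void map_pmd_model(struct folio_model *f)
{
    if (f->entire_mapcount++ == 0)
        f->nr_pages_mapped += ENTIRELY_MAPPED;
}

static void dump_model(const struct folio_model *f)
{
    printf("PTE-mapped pages: %d, entirely mapped: %s\n",
           f->nr_pages_mapped & FOLIO_PAGES_MAPPED,
           (f->nr_pages_mapped & ENTIRELY_MAPPED) ? "yes" : "no");
}

int main(void)
{
    struct folio_model thp = { 0 };

    map_pmd_model(&thp);    /* e.g. a PMD-sized THP mapped once as a whole */
    map_pte_model(&thp);    /* later, one of its pages is also PTE-mapped */
    dump_model(&thp);
    return 0;
}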

Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/mm/transhuge.rst | 2 +-
mm/internal.h | 6 +++---
mm/rmap.c | 18 +++++++++---------
3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index cf81272a6b8b..93c9239b9ebe 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -117,7 +117,7 @@ pages:

- map/unmap of a PMD entry for the whole THP increment/decrement
folio->_entire_mapcount and also increment/decrement
- folio->_nr_pages_mapped by COMPOUND_MAPPED when _entire_mapcount
+ folio->_nr_pages_mapped by ENTIRELY_MAPPED when _entire_mapcount
goes from -1 to 0 or 0 to -1.

- map/unmap of individual pages with PTE entry increment/decrement
diff --git a/mm/internal.h b/mm/internal.h
index 29589bc3f046..188807d2aebc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -54,12 +54,12 @@ void page_writeback_init(void);

/*
* If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
- * its nr_pages_mapped would be 0x400000: choose the COMPOUND_MAPPED bit
+ * its nr_pages_mapped would be 0x400000: choose the ENTIRELY_MAPPED bit
* above that range, instead of 2*(PMD_SIZE/PAGE_SIZE). Hugetlb currently
* leaves nr_pages_mapped at 0, but avoid surprise if it participates later.
*/
-#define COMPOUND_MAPPED 0x800000
-#define FOLIO_PAGES_MAPPED (COMPOUND_MAPPED - 1)
+#define ENTIRELY_MAPPED 0x800000
+#define FOLIO_PAGES_MAPPED (ENTIRELY_MAPPED - 1)

/*
* Flags passed to __show_mem() and show_free_areas() to suppress output in
diff --git a/mm/rmap.c b/mm/rmap.c
index 45296739236f..53753834a10d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,7 +1173,7 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
first = atomic_inc_and_test(&page->_mapcount);
if (first && folio_test_large(folio)) {
first = atomic_inc_return_relaxed(mapped);
- first = (first < COMPOUND_MAPPED);
+ first = (first < ENTIRELY_MAPPED);
}

if (first)
@@ -1183,15 +1183,15 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
case RMAP_MODE_PMD:
first = atomic_inc_and_test(&folio->_entire_mapcount);
if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
+ nr = atomic_add_return_relaxed(ENTIRELY_MAPPED, mapped);
+ if (likely(nr < ENTIRELY_MAPPED + ENTIRELY_MAPPED)) {
*nr_pmdmapped = folio_nr_pages(folio);
nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
/* Raced ahead of a remove and another add? */
if (unlikely(nr < 0))
nr = 0;
} else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
+ /* Raced ahead of a remove of ENTIRELY_MAPPED */
nr = 0;
}
}
@@ -1434,7 +1434,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
} else {
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
- atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
+ atomic_set(&folio->_nr_pages_mapped, ENTIRELY_MAPPED);
SetPageAnonExclusive(&folio->page);
__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
}
@@ -1516,7 +1516,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
last = atomic_add_negative(-1, &page->_mapcount);
if (last && folio_test_large(folio)) {
last = atomic_dec_return_relaxed(mapped);
- last = (last < COMPOUND_MAPPED);
+ last = (last < ENTIRELY_MAPPED);
}

if (last)
@@ -1526,15 +1526,15 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
case RMAP_MODE_PMD:
last = atomic_add_negative(-1, &folio->_entire_mapcount);
if (last) {
- nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED)) {
+ nr = atomic_sub_return_relaxed(ENTIRELY_MAPPED, mapped);
+ if (likely(nr < ENTIRELY_MAPPED)) {
nr_pmdmapped = folio_nr_pages(folio);
nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
/* Raced ahead of another remove and an add? */
if (unlikely(nr < 0))
nr = 0;
} else {
- /* An add of COMPOUND_MAPPED raced ahead */
+ /* An add of ENTIRELY_MAPPED raced ahead */
nr = 0;
}
}
--
2.43.0

2023-12-11 16:01:26

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 10/39] mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()

Let's convert remove_migration_pte().

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index de9d94b99ab7..efc19f53b05e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -262,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio,
page_add_anon_rmap(new, vma, pvmw.address,
rmap_flags);
else
- page_add_file_rmap(new, vma, false);
+ folio_add_file_rmap_pte(folio, new, vma);
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
}
if (vma->vm_flags & VM_LOCKED)
--
2.43.0

2023-12-11 16:01:40

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 33/39] mm/rmap: convert page_dup_file_rmap() to folio_dup_file_rmap_[pte|ptes|pmd]()

Let's convert page_dup_file_rmap() like the other rmap functions. As there
is only a single caller, convert that single caller right away and remove
page_dup_file_rmap().

Add folio_dup_file_rmap_ptes() right away, as we want to perform rmap
batching during fork() soon.
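
As a rough sketch of what such batching could look like: one call covering a
run of consecutive PTEs that map consecutive pages of the same folio, instead
of one call per PTE. The loop and the "_model" types are hypothetical; the
real batched caller only arrives with the follow-up work referenced in the
cover letter.

#include <stdio.h>

#define FOLIO_PAGES 4   /* arbitrary small folio for the illustration */

struct page_model  { int mapcount; };
struct folio_model { struct page_model pages[FOLIO_PAGES]; };

/* Models folio_dup_file_rmap_ptes(): bump nr consecutive per-page mapcounts. */
static void dup_file_rmap_ptes_model(struct folio_model *folio, int first, int nr)
{
    for (int i = 0; i < nr; i++)
        folio->pages[first + i].mapcount++;
}

int main(void)
{
    struct folio_model folio = { 0 };

    /*
     * A batch-aware copy loop could notice that FOLIO_PAGES consecutive
     * PTEs map consecutive pages of the same folio and duplicate all of
     * their mappings with a single call instead of FOLIO_PAGES calls.
     */
    dup_file_rmap_ptes_model(&folio, 0, FOLIO_PAGES);

    for (int i = 0; i < FOLIO_PAGES; i++)
        printf("page %d mapcount: %d\n", i, folio.pages[i].mapcount);
    return 0;
}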

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 59 ++++++++++++++++++++++++++++++++++++++++----
mm/memory.c | 2 +-
2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0f4eecd03bdc..df60e44fecad 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -311,6 +311,60 @@ static inline void hugetlb_remove_rmap(struct folio *folio)
atomic_dec(&folio->_entire_mapcount);
}

+static __always_inline void __folio_dup_file_rmap(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_mode mode)
+{
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
+
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ do {
+ atomic_inc(&page->_mapcount);
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_MODE_PMD:
+ atomic_inc(&folio->_entire_mapcount);
+ break;
+ }
+}
+
+/**
+ * folio_dup_file_rmap_ptes - duplicate PTE mappings of a page range of a folio
+ * @folio: The folio to duplicate the mappings of
+ * @page: The first page to duplicate the mappings of
+ * @nr_pages: The number of pages of which the mapping will be duplicated
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+static inline void folio_dup_file_rmap_ptes(struct folio *folio,
+ struct page *page, int nr_pages)
+{
+ __folio_dup_file_rmap(folio, page, nr_pages, RMAP_MODE_PTE);
+}
+#define folio_dup_file_rmap_pte(folio, page) \
+ folio_dup_file_rmap_ptes(folio, page, 1)
+
+/**
+ * folio_dup_file_rmap_pmd - duplicate a PMD mapping of a page range of a folio
+ * @folio: The folio to duplicate the mapping of
+ * @page: The first page to duplicate the mapping of
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+static inline void folio_dup_file_rmap_pmd(struct folio *folio,
+ struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_dup_file_rmap(folio, page, HPAGE_PMD_NR, RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
static inline void __page_dup_rmap(struct page *page, bool compound)
{
VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
@@ -325,11 +379,6 @@ static inline void __page_dup_rmap(struct page *page, bool compound)
}
}

-static inline void page_dup_file_rmap(struct page *page, bool compound)
-{
- __page_dup_rmap(page, compound);
-}
-
/**
* page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
* anonymous page
diff --git a/mm/memory.c b/mm/memory.c
index 9a5724cf895f..42a0b7b41b86 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -965,7 +965,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
rss[MM_ANONPAGES]++;
} else if (page) {
folio_get(folio);
- page_dup_file_rmap(page, false);
+ folio_dup_file_rmap_pte(folio, page);
rss[mm_counter_file(page)]++;
}

--
2.43.0

2023-12-11 16:01:42

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 37/39] mm/rmap: remove page_try_dup_anon_rmap()

All users are gone, remove page_try_dup_anon_rmap() and any remaining
traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c6d8a02ecd56..1e37ee6ae0ba 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -256,7 +256,7 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

-/* See page_try_dup_anon_rmap() */
+/* See folio_try_dup_anon_rmap_*() */
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
{
@@ -481,16 +481,6 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
#endif
}

-static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
- struct vm_area_struct *vma)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!compound))
- return folio_try_dup_anon_rmap_pte(folio, page, vma);
- return folio_try_dup_anon_rmap_pmd(folio, page, vma);
-}
-
/**
* page_try_share_anon_rmap - try marking an exclusive anonymous page possibly
* shared to prepare for KSM or temporary unmapping
@@ -499,8 +489,8 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
* The caller needs to hold the PT lock and has to have the page table entry
* cleared/invalidated.
*
- * This is similar to page_try_dup_anon_rmap(), however, not used during fork()
- * to duplicate a mapping, but instead to prepare for KSM or temporarily
+ * This is similar to folio_try_dup_anon_rmap_*(), however, not used during
+ * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily
* unmapping a page (swap, migration) via folio_remove_rmap_*().
*
* Marking the page shared can only fail if the page may be pinned; device
--
2.43.0

2023-12-11 16:14:47

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 01/39] mm/rmap: rename hugepage_add* to hugetlb_add*

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's just call it "hugetlb_".
>
> Yes, it's all already inconsistent and confusing because we have a lot
> of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
> be confused with transparent huge pages, and it matches "hugetlb.c" and
> "folio_test_hugetlb()". So let's minimize confusion in rmap code.
>
> Reviewed-by: Muchun Song <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 4 ++--
> mm/hugetlb.c | 8 ++++----
> mm/migrate.c | 4 ++--
> mm/rmap.c | 8 ++++----
> 4 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index af6a32b6f3e7..0bfea866f39b 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -208,9 +208,9 @@ void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> void page_remove_rmap(struct page *, struct vm_area_struct *,
> bool compound);
>
> -void hugepage_add_anon_rmap(struct folio *, struct vm_area_struct *,
> +void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address, rmap_t flags);
> -void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> +void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> static inline void __page_dup_rmap(struct page *page, bool compound)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6feb3e0630d1..305f3ca1dee6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5285,7 +5285,7 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
> pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
>
> __folio_mark_uptodate(new_folio);
> - hugepage_add_new_anon_rmap(new_folio, vma, addr);
> + hugetlb_add_new_anon_rmap(new_folio, vma, addr);
> if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
> newpte = huge_pte_mkuffd_wp(newpte);
> set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
> @@ -5988,7 +5988,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> /* Break COW or unshare */
> huge_ptep_clear_flush(vma, haddr, ptep);
> page_remove_rmap(&old_folio->page, vma, true);
> - hugepage_add_new_anon_rmap(new_folio, vma, haddr);
> + hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
> if (huge_pte_uffd_wp(pte))
> newpte = huge_pte_mkuffd_wp(newpte);
> set_huge_pte_at(mm, haddr, ptep, newpte, huge_page_size(h));
> @@ -6277,7 +6277,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> goto backout;
>
> if (anon_rmap)
> - hugepage_add_new_anon_rmap(folio, vma, haddr);
> + hugetlb_add_new_anon_rmap(folio, vma, haddr);
> else
> page_dup_file_rmap(&folio->page, true);
> new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
> @@ -6732,7 +6732,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> if (folio_in_pagecache)
> page_dup_file_rmap(&folio->page, true);
> else
> - hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
> + hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);
>
> /*
> * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 35a88334bb3c..4cb849fa0dd2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -249,8 +249,8 @@ static bool remove_migration_pte(struct folio *folio,
>
> pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
> if (folio_test_anon(folio))
> - hugepage_add_anon_rmap(folio, vma, pvmw.address,
> - rmap_flags);
> + hugetlb_add_anon_rmap(folio, vma, pvmw.address,
> + rmap_flags);
> else
> page_dup_file_rmap(new, true);
> set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 846fc79f3ca9..80d42c31281a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2625,8 +2625,8 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
> *
> * RMAP_COMPOUND is ignored.
> */
> -void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> - unsigned long address, rmap_t flags)
> +void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> + unsigned long address, rmap_t flags)
> {
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> @@ -2637,8 +2637,8 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> PageAnonExclusive(&folio->page), folio);
> }
>
> -void hugepage_add_new_anon_rmap(struct folio *folio,
> - struct vm_area_struct *vma, unsigned long address)
> +void hugetlb_add_new_anon_rmap(struct folio *folio,
> + struct vm_area_struct *vma, unsigned long address)
> {
> BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> /* increment count (starts at -1) */

2023-12-11 16:16:41

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
> code from page_remove_rmap(). This effectively removes one check on the
> small-folio path as well.
>
> Note: all possible candidates that need care are page_remove_rmap() calls that
> pass compound=true.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 5 +++++
> mm/hugetlb.c | 4 ++--
> mm/rmap.c | 17 ++++++++---------
> 3 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 0bfea866f39b..d85bd1d4de04 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -213,6 +213,11 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +static inline void hugetlb_remove_rmap(struct folio *folio)
> +{
> + atomic_dec(&folio->_entire_mapcount);
> +}
> +
> static inline void __page_dup_rmap(struct page *page, bool compound)
> {
> if (compound) {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 305f3ca1dee6..ef48ae673890 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5676,7 +5676,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> make_pte_marker(PTE_MARKER_UFFD_WP),
> sz);
> hugetlb_count_sub(pages_per_huge_page(h), mm);
> - page_remove_rmap(page, vma, true);
> + hugetlb_remove_rmap(page_folio(page));
>
> spin_unlock(ptl);
> tlb_remove_page_size(tlb, page, huge_page_size(h));
> @@ -5987,7 +5987,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>
> /* Break COW or unshare */
> huge_ptep_clear_flush(vma, haddr, ptep);
> - page_remove_rmap(&old_folio->page, vma, true);
> + hugetlb_remove_rmap(old_folio);
> hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
> if (huge_pte_uffd_wp(pte))
> newpte = huge_pte_mkuffd_wp(newpte);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 80d42c31281a..4e60c1f38eaa 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1482,13 +1482,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> - /* Hugetlb pages are not counted in NR_*MAPPED */
> - if (unlikely(folio_test_hugetlb(folio))) {
> - /* hugetlb pages are always mapped with pmds */
> - atomic_dec(&folio->_entire_mapcount);
> - return;
> - }
> -
> /* Is page being unmapped by PTE? Is this its last map to be removed? */
> if (likely(!compound)) {
> last = atomic_add_negative(-1, &page->_mapcount);
> @@ -1846,7 +1839,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> dec_mm_counter(mm, mm_counter_file(&folio->page));
> }
> discard:
> - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
> + if (unlikely(folio_test_hugetlb(folio)))
> + hugetlb_remove_rmap(folio);
> + else
> + page_remove_rmap(subpage, vma, false);
> if (vma->vm_flags & VM_LOCKED)
> mlock_drain_local();
> folio_put(folio);
> @@ -2199,7 +2195,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> */
> }
>
> - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
> + if (unlikely(folio_test_hugetlb(folio)))
> + hugetlb_remove_rmap(folio);
> + else
> + page_remove_rmap(subpage, vma, false);
> if (vma->vm_flags & VM_LOCKED)
> mlock_drain_local();
> folio_put(folio);

2023-12-11 16:18:30

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 03/39] mm/rmap: introduce and use hugetlb_add_file_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Right now we're using page_dup_file_rmap() in some cases where "ordinary"
> rmap code would have used page_add_file_rmap(). So let's introduce and
> use hugetlb_add_file_rmap() instead. We won't be adding a
> "hugetlb_dup_file_rmap()" functon for the fork() case, as it would be
> doing the same: "dup" is just an optimization for "add".
>
> What remains is a single page_dup_file_rmap() call in fork() code.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 7 +++++++
> mm/hugetlb.c | 6 +++---
> mm/migrate.c | 2 +-
> 3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index d85bd1d4de04..91178d1aa028 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -213,6 +213,13 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +static inline void hugetlb_add_file_rmap(struct folio *folio)
> +{
> + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> +
> + atomic_inc(&folio->_entire_mapcount);
> +}
> +
> static inline void hugetlb_remove_rmap(struct folio *folio)
> {
> atomic_dec(&folio->_entire_mapcount);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ef48ae673890..57e898187931 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5408,7 +5408,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> * sleep during the process.
> */
> if (!folio_test_anon(pte_folio)) {
> - page_dup_file_rmap(&pte_folio->page, true);
> + hugetlb_add_file_rmap(pte_folio);
> } else if (page_try_dup_anon_rmap(&pte_folio->page,
> true, src_vma)) {
> pte_t src_pte_old = entry;
> @@ -6279,7 +6279,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> if (anon_rmap)
> hugetlb_add_new_anon_rmap(folio, vma, haddr);
> else
> - page_dup_file_rmap(&folio->page, true);
> + hugetlb_add_file_rmap(folio);
> new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
> && (vma->vm_flags & VM_SHARED)));
> /*
> @@ -6730,7 +6730,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> goto out_release_unlock;
>
> if (folio_in_pagecache)
> - page_dup_file_rmap(&folio->page, true);
> + hugetlb_add_file_rmap(folio);
> else
> hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4cb849fa0dd2..de9d94b99ab7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio,
> hugetlb_add_anon_rmap(folio, vma, pvmw.address,
> rmap_flags);
> else
> - page_dup_file_rmap(new, true);
> + hugetlb_add_file_rmap(folio);
> set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
> psize);
> } else

2023-12-11 16:24:30

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v1 01/39] mm/rmap: rename hugepage_add* to hugetlb_add*

On Mon, Dec 11, 2023 at 04:56:14PM +0100, David Hildenbrand wrote:
> Let's just call it "hugetlb_".
>
> Yes, it's all already inconsistent and confusing because we have a lot
> of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
> be confused with transparent huge pages, and it matches "hugetlb.c" and
> "folio_test_hugetlb()". So let's minimize confusion in rmap code.
>
> Reviewed-by: Muchun Song <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>

2023-12-11 16:26:34

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 04/39] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Note that is_device_private_page() does not apply to hugetlb.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/mm.h | 12 +++++++++---
> include/linux/rmap.h | 15 +++++++++++++++
> mm/hugetlb.c | 3 +--
> 3 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b72bf25a45cf..ae547b62f325 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1964,15 +1964,21 @@ static inline bool page_maybe_dma_pinned(struct page *page)
> *
> * The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
> */
> -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
> - struct page *page)
> +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
> + struct folio *folio)
> {
> VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));
>
> if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
> return false;
>
> - return page_maybe_dma_pinned(page);
> + return folio_maybe_dma_pinned(folio);
> +}
> +
> +static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
> + struct page *page)
> +{
> + return folio_needs_cow_for_dma(vma, page_folio(page));
> }
>
> /**
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 91178d1aa028..ca42b3db5688 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -213,6 +213,21 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +/* See page_try_dup_anon_rmap() */
> +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> + struct vm_area_struct *vma)
> +{
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> +
> + if (PageAnonExclusive(&folio->page)) {
> + if (unlikely(folio_needs_cow_for_dma(vma, folio)))
> + return -EBUSY;
> + ClearPageAnonExclusive(&folio->page);
> + }
> + atomic_inc(&folio->_entire_mapcount);
> + return 0;
> +}
> +
> static inline void hugetlb_add_file_rmap(struct folio *folio)
> {
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 57e898187931..378e460a6ab4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5409,8 +5409,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> */
> if (!folio_test_anon(pte_folio)) {
> hugetlb_add_file_rmap(pte_folio);
> - } else if (page_try_dup_anon_rmap(&pte_folio->page,
> - true, src_vma)) {
> + } else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
> pte_t src_pte_old = entry;
> struct folio *new_folio;
>

2023-12-11 16:29:37

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 05/39] mm/rmap: introduce and use hugetlb_try_share_anon_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> So let's introduce and use hugetlb_try_share_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Note that try_to_unmap_one() does not need care. Easy to spot because
> among all that nasty hugetlb special-casing in that function, we're not
> using set_huge_pte_at() on the anon path -- well, and that code assumes
> that we would want to swap out.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 23 +++++++++++++++++++++++
> mm/rmap.c | 15 ++++++++++-----
> 2 files changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index ca42b3db5688..4c0650e9f6db 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -228,6 +228,29 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> return 0;
> }
>
> +/* See page_try_share_anon_rmap() */
> +static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
> +{
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> + VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
> +
> + /* Paired with the memory barrier in try_grab_folio(). */
> + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
> + smp_mb();
> +
> + if (unlikely(folio_maybe_dma_pinned(folio)))
> + return -EBUSY;
> + ClearPageAnonExclusive(&folio->page);
> +
> + /*
> + * This is conceptually a smp_wmb() paired with the smp_rmb() in
> + * gup_must_unshare().
> + */
> + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
> + smp_mb__after_atomic();
> + return 0;
> +}
> +
> static inline void hugetlb_add_file_rmap(struct folio *folio)
> {
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4e60c1f38eaa..e210ac1b73de 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2147,13 +2147,18 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> !anon_exclusive, subpage);
>
> /* See page_try_share_anon_rmap(): clear PTE first. */
> - if (anon_exclusive &&
> - page_try_share_anon_rmap(subpage)) {
> - if (folio_test_hugetlb(folio))
> + if (folio_test_hugetlb(folio)) {
> + if (anon_exclusive &&
> + hugetlb_try_share_anon_rmap(folio)) {
> set_huge_pte_at(mm, address, pvmw.pte,
> pteval, hsz);
> - else
> - set_pte_at(mm, address, pvmw.pte, pteval);
> + ret = false;
> + page_vma_mapped_walk_done(&pvmw);
> + break;
> + }
> + } else if (anon_exclusive &&
> + page_try_share_anon_rmap(subpage)) {
> + set_pte_at(mm, address, pvmw.pte, pteval);
> ret = false;
> page_vma_mapped_walk_done(&pvmw);
> break;

2023-12-11 16:30:50

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 06/39] mm/rmap: add hugetlb sanity checks

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's make sure we end up with the right folios in the right functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 7 +++++++
> mm/rmap.c | 6 ++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 4c0650e9f6db..e3857d26b944 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -217,6 +217,7 @@ void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> struct vm_area_struct *vma)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> if (PageAnonExclusive(&folio->page)) {
> @@ -231,6 +232,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> /* See page_try_share_anon_rmap() */
> static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
>
> @@ -253,6 +255,7 @@ static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
>
> static inline void hugetlb_add_file_rmap(struct folio *folio)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
>
> atomic_inc(&folio->_entire_mapcount);
> @@ -260,11 +263,15 @@ static inline void hugetlb_add_file_rmap(struct folio *folio)
>
> static inline void hugetlb_remove_rmap(struct folio *folio)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> +
> atomic_dec(&folio->_entire_mapcount);
> }
>
> static inline void __page_dup_rmap(struct page *page, bool compound)
> {
> + VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
> +
> if (compound) {
> struct folio *folio = (struct folio *)page;
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e210ac1b73de..41597da14f26 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1343,6 +1343,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> {
> int nr = folio_nr_pages(folio);
>
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_BUG_ON_VMA(address < vma->vm_start ||
> address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> __folio_set_swapbacked(folio);
> @@ -1395,6 +1396,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> unsigned int nr_pmdmapped = 0, first;
> int nr = 0;
>
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
>
> /* Is page being mapped by PTE? Is this its first map to be added? */
> @@ -1480,6 +1482,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> bool last;
> enum node_stat_item idx;
>
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> /* Is page being unmapped by PTE? Is this its last map to be removed? */
> @@ -2632,6 +2635,7 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
> void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> unsigned long address, rmap_t flags)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> atomic_inc(&folio->_entire_mapcount);
> @@ -2644,6 +2648,8 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> void hugetlb_add_new_anon_rmap(struct folio *folio,
> struct vm_area_struct *vma, unsigned long address)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> +
> BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> /* increment count (starts at -1) */
> atomic_set(&folio->_entire_mapcount, 0);

2023-12-11 16:33:56

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v1 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

On Mon, Dec 11, 2023 at 04:56:15PM +0100, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
> code from page_remove_rmap(). This effectively removes one check on the
> small-folio path as well.
>
> Note: all possible candidates that need care are page_remove_rmap() calls that
> pass compound=true.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>

> +++ b/mm/rmap.c
> @@ -1482,13 +1482,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> - /* Hugetlb pages are not counted in NR_*MAPPED */
> - if (unlikely(folio_test_hugetlb(folio))) {
> - /* hugetlb pages are always mapped with pmds */
> - atomic_dec(&folio->_entire_mapcount);
> - return;
> - }

Maybe add
VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio);

2023-12-11 16:36:34

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

On 11.12.23 17:33, Matthew Wilcox wrote:
> On Mon, Dec 11, 2023 at 04:56:15PM +0100, David Hildenbrand wrote:
>> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
>> For example, hugetlb currently only supports entire mappings, and treats
>> any mapping as mapped using a single "logical PTE". Let's move it out
>> of the way so we can overhaul our "ordinary" rmap
>> implementation/interface.
>>
>> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
>> code from page_remove_rmap(). This effectively removes one check on the
>> small-folio path as well.
>>
>> Note: all possible candidates that need care are page_remove_rmap() calls that
>> pass compound=true.
>>
>> Reviewed-by: Yin Fengwei <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>
> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
>
>> +++ b/mm/rmap.c
>> @@ -1482,13 +1482,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>>
>> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>>
>> - /* Hugetlb pages are not counted in NR_*MAPPED */
>> - if (unlikely(folio_test_hugetlb(folio))) {
>> - /* hugetlb pages are always mapped with pmds */
>> - atomic_dec(&folio->_entire_mapcount);
>> - return;
>> - }
>
> Maybe add
> VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio);
>

I bulk-add that in patch #6.

Thanks!

--
Cheers,

David / dhildenb

2023-12-13 05:38:41

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH v1 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()



On 2023/12/11 23:56, David Hildenbrand wrote:
> Let's get rid of the compound parameter and instead define implicitly
> which mappings we're adding. That is more future proof, easier to read
> and harder to mess up.
>
> Use an enum to express the granularity internally. Make the compiler
> always special-case on the granularity by using __always_inline. Replace
> the "compound" check by a switch-case that will be removed by the
> compiler completely.
>
> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
> folio_test_pmd_mappable() check by a config check in the caller and
> sanity checks. Convert the single user of folio_add_file_rmap_range().
>
> This function design can later easily be extended to PUDs and to batch
> PMDs. Note that for now we don't support anything bigger than
> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
> will catch if that ever changes.
>
> Next up is removing page_remove_rmap() along with its "compound"
> parameter and similarly converting all other rmap functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

With one small comment.

> ---
> include/linux/rmap.h | 47 +++++++++++++++++++++++++--
> mm/memory.c | 2 +-
> mm/rmap.c | 75 +++++++++++++++++++++++++++++---------------
> 3 files changed, 95 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index e3857d26b944..1753900f4aed 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -191,6 +191,45 @@ typedef int __bitwise rmap_t;
> */
> #define RMAP_COMPOUND ((__force rmap_t)BIT(1))
>
> +/*
> + * Internally, we're using an enum to specify the granularity. Usually,
> + * we make the compiler create specialized variants for the different
> + * granularity.
> + */
> +enum rmap_mode {
> + RMAP_MODE_PTE = 0,
> + RMAP_MODE_PMD,
> +};
Maybe rmap_level for the enum name? To me, PTE and PMD are levels rather than
modes.
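
For reference, a minimal userspace sketch of the specialization pattern the
commit message describes: because the helper is __always_inline and every
wrapper passes a compile-time-constant enum, the compiler emits one
specialized body per granularity and the switch disappears. The "_model"
names are invented for this illustration and are not the kernel functions:

#include <stdio.h>

#define __always_inline inline __attribute__((__always_inline__))

enum rmap_mode { RMAP_MODE_PTE = 0, RMAP_MODE_PMD };

/*
 * Toy counters instead of the real mapcounts; the point is only the
 * always-inlined switch on a constant 'mode' argument.
 */
static __always_inline int add_rmap_model(int *pte_mapped, int *entire_mapped,
                                          int nr_pages, enum rmap_mode mode)
{
    switch (mode) {
    case RMAP_MODE_PTE:
        return *pte_mapped += nr_pages;
    case RMAP_MODE_PMD:
        return ++*entire_mapped;
    }
    return 0;
}

/* Thin wrappers, mirroring the folio_add_file_rmap_ptes()/_pmd() shape. */
static int add_rmap_ptes_model(int *pte_mapped, int *entire_mapped, int nr)
{
    return add_rmap_model(pte_mapped, entire_mapped, nr, RMAP_MODE_PTE);
}

static int add_rmap_pmd_model(int *pte_mapped, int *entire_mapped)
{
    return add_rmap_model(pte_mapped, entire_mapped, 1, RMAP_MODE_PMD);
}

int main(void)
{
    int pte_mapped = 0, entire_mapped = 0;

    add_rmap_ptes_model(&pte_mapped, &entire_mapped, 4);
    add_rmap_pmd_model(&pte_mapped, &entire_mapped);
    printf("pte-mapped: %d, entirely mapped: %d\n", pte_mapped, entire_mapped);
    return 0;
}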


> +
> +static inline void __folio_rmap_sanity_checks(struct folio *folio,
> + struct page *page, int nr_pages, enum rmap_mode mode)
> +{
> + /* hugetlb folios are handled separately. */
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + !folio_test_large_rmappable(folio), folio);
> +
> + VM_WARN_ON_ONCE(nr_pages <= 0);
> + VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
> + VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
> +
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + break;
> + case RMAP_MODE_PMD:
> + /*
> + * We don't support folios larger than a single PMD yet. So
> + * when RMAP_MODE_PMD is set, we assume that we are creating
> + * a single "entire" mapping of the folio.
> + */
> + VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
> + VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
> + break;
> + default:
> + VM_WARN_ON_ONCE(true);
> + }
> +}
> +
> /*
> * rmap interfaces called when adding or removing pte of page
> */
> @@ -203,8 +242,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
> void page_add_file_rmap(struct page *, struct vm_area_struct *,
> bool compound);
> -void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> - struct vm_area_struct *, bool compound);
> +void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
> + struct vm_area_struct *);
> +#define folio_add_file_rmap_pte(folio, page, vma) \
> + folio_add_file_rmap_ptes(folio, page, 1, vma)
> +void folio_add_file_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *);
> void page_remove_rmap(struct page *, struct vm_area_struct *,
> bool compound);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 8f0b936b90b5..6a5540ba3c65 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4515,7 +4515,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> folio_add_lru_vma(folio, vma);
> } else {
> add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> - folio_add_file_rmap_range(folio, page, nr, vma, false);
> + folio_add_file_rmap_ptes(folio, page, nr, vma);
> }
> set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 41597da14f26..4f30930a1162 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1376,31 +1376,20 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
> }
>
> -/**
> - * folio_add_file_rmap_range - add pte mapping to page range of a folio
> - * @folio: The folio to add the mapping to
> - * @page: The first page to add
> - * @nr_pages: The number of pages which will be mapped
> - * @vma: the vm area in which the mapping is added
> - * @compound: charge the page as compound or small page
> - *
> - * The page range of folio is defined by [first_page, first_page + nr_pages)
> - *
> - * The caller needs to hold the pte lock.
> - */
> -void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> - unsigned int nr_pages, struct vm_area_struct *vma,
> - bool compound)
> +static __always_inline void __folio_add_file_rmap(struct folio *folio,
> + struct page *page, int nr_pages, struct vm_area_struct *vma,
> + enum rmap_mode mode)
> {
> atomic_t *mapped = &folio->_nr_pages_mapped;
> unsigned int nr_pmdmapped = 0, first;
> int nr = 0;
>
> - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> - VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>
> /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> + switch (mode) {
> + case RMAP_MODE_PTE:
> do {
> first = atomic_inc_and_test(&page->_mapcount);
> if (first && folio_test_large(folio)) {
> @@ -1411,9 +1400,8 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> if (first)
> nr++;
> } while (page++, --nr_pages > 0);
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> -
> + break;
> + case RMAP_MODE_PMD:
> first = atomic_inc_and_test(&folio->_entire_mapcount);
> if (first) {
> nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> @@ -1428,6 +1416,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> nr = 0;
> }
> }
> + break;
> }
>
> if (nr_pmdmapped)
> @@ -1441,6 +1430,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages that will be mapped using PTEs
> + * @vma: The vm area in which the mappings are added
> + *
> + * The page range of the folio is defined by [page, page + nr_pages)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
> + int nr_pages, struct vm_area_struct *vma)
> +{
> + __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + *
> + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * page_add_file_rmap - add pte mapping to a file page
> * @page: the page to add the mapping to
> @@ -1453,16 +1479,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> bool compound)
> {
> struct folio *folio = page_folio(page);
> - unsigned int nr_pages;
>
> VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>
> if (likely(!compound))
> - nr_pages = 1;
> + folio_add_file_rmap_pte(folio, page, vma);
> else
> - nr_pages = folio_nr_pages(folio);
> -
> - folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
> + folio_add_file_rmap_pmd(folio, page, vma);
> }
>
> /**

2023-12-13 08:47:50

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

On 13.12.23 06:33, Yin Fengwei wrote:
>
>
> On 2023/12/11 23:56, David Hildenbrand wrote:
>> Let's get rid of the compound parameter and instead define implicitly
>> which mappings we're adding. That is more future proof, easier to read
>> and harder to mess up.
>>
>> Use an enum to express the granularity internally. Make the compiler
>> always special-case on the granularity by using __always_inline. Replace
>> the "compound" check by a switch-case that will be removed by the
>> compiler completely.
>>
>> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
>> folio_test_pmd_mappable() check by a config check in the caller and
>> sanity checks. Convert the single user of folio_add_file_rmap_range().
>>
>> This function design can later easily be extended to PUDs and to batch
>> PMDs. Note that for now we don't support anything bigger than
>> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
>> will catch if that ever changes.
>>
>> Next up is removing page_remove_rmap() along with its "compound"
>> parameter and similarly converting all other rmap functions.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
> Reviewed-by: Yin Fengwei <[email protected]>
>

Thanks!

> With one small comment.
>
>> ---
>> include/linux/rmap.h | 47 +++++++++++++++++++++++++--
>> mm/memory.c | 2 +-
>> mm/rmap.c | 75 +++++++++++++++++++++++++++++---------------
>> 3 files changed, 95 insertions(+), 29 deletions(-)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index e3857d26b944..1753900f4aed 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -191,6 +191,45 @@ typedef int __bitwise rmap_t;
>> */
>> #define RMAP_COMPOUND ((__force rmap_t)BIT(1))
>>
>> +/*
>> + * Internally, we're using an enum to specify the granularity. Usually,
>> + * we make the compiler create specialized variants for the different
>> + * granularity.
>> + */
>> +enum rmap_mode {
>> + RMAP_MODE_PTE = 0,
>> + RMAP_MODE_PMD,
>> +};
> Maybe rmap_level for the enum name? To me, PTE and PMD are levels rather than
> modes.

Originally, I wanted to call this "enum rmap_granularity", but that
turned out rather long. Agreed that "level" is better than "mode";
something resembling "granularity" would be even better.

--
Cheers,

David / dhildenb

2023-12-13 09:03:30

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 06/39] mm/rmap: add hugetlb sanity checks

On 11.12.23 16:56, David Hildenbrand wrote:
> Let's make sure we end up with the right folios in the right functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---

I'll move all !anon handling to the relevant patches, so for this patch
we'll only end up adding sanity checks for the "add" and "add_new" variants.

--
Cheers,

David / dhildenb

2023-12-15 02:26:59

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()



On 12/11/2023 11:56 PM, David Hildenbrand wrote:
> Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
> replace page_add_anon_rmap() next.
>
> Make the compiler always special-case on the granularity by using
> __always_inline.
>
> Note that the new functions ignore the RMAP_COMPOUND flag, which we will
> remove as soon as page_add_anon_rmap() is gone.
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

With a small question below.

> ---
> include/linux/rmap.h | 6 +++
> mm/rmap.c | 118 ++++++++++++++++++++++++++++++-------------
> 2 files changed, 88 insertions(+), 36 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 7198905dc8be..3b5357cb1c09 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -234,6 +234,12 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
> * rmap interfaces called when adding or removing pte of page
> */
> void folio_move_anon_rmap(struct folio *, struct vm_area_struct *);
> +void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
> + struct vm_area_struct *, unsigned long address, rmap_t flags);
> +#define folio_add_anon_rmap_pte(folio, page, vma, address, flags) \
> + folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
> +void folio_add_anon_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *, unsigned long address, rmap_t flags);
> void page_add_anon_rmap(struct page *, struct vm_area_struct *,
> unsigned long address, rmap_t flags);
> void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c5761986a411..7787499fa2ad 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1300,38 +1300,20 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> unsigned long address, rmap_t flags)
> {
> struct folio *folio = page_folio(page);
> - atomic_t *mapped = &folio->_nr_pages_mapped;
> - int nr = 0, nr_pmdmapped = 0;
> - bool compound = flags & RMAP_COMPOUND;
> - bool first;
>
> - /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> - first = atomic_inc_and_test(&page->_mapcount);
> - nr = first;
> - if (first && folio_test_large(folio)) {
> - nr = atomic_inc_return_relaxed(mapped);
> - nr = (nr < COMPOUND_MAPPED);
> - }
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> + if (likely(!(flags & RMAP_COMPOUND)))
> + folio_add_anon_rmap_pte(folio, page, vma, address, flags);
> + else
> + folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
> +}
>
> - first = atomic_inc_and_test(&folio->_entire_mapcount);
> - if (first) {
> - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> - nr_pmdmapped = folio_nr_pages(folio);
> - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> - /* Raced ahead of a remove and another add? */
> - if (unlikely(nr < 0))
> - nr = 0;
> - } else {
> - /* Raced ahead of a remove of COMPOUND_MAPPED */
> - nr = 0;
> - }
> - }
> - }
> +static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> + struct page *page, int nr_pages, struct vm_area_struct *vma,
> + unsigned long address, rmap_t flags, enum rmap_mode mode)
> +{
> + unsigned int i, nr, nr_pmdmapped = 0;
>
> + nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
> if (nr_pmdmapped)
> __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
> if (nr)
> @@ -1345,18 +1327,34 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> * folio->index right when not given the address of the head
> * page.
> */
> - VM_WARN_ON_FOLIO(folio_test_large(folio) && !compound, folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + mode != RMAP_MODE_PMD, folio);
> __folio_set_anon(folio, vma, address,
> !!(flags & RMAP_EXCLUSIVE));
> } else if (likely(!folio_test_ksm(folio))) {
> __page_check_anon_rmap(folio, page, vma, address);
> }
> - if (flags & RMAP_EXCLUSIVE)
> - SetPageAnonExclusive(page);
> - /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
> - VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
> - (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
> - PageAnonExclusive(page), folio);
> +
> + if (flags & RMAP_EXCLUSIVE) {
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + for (i = 0; i < nr_pages; i++)
> + SetPageAnonExclusive(page + i);
> + break;
> + case RMAP_MODE_PMD:
> + SetPageAnonExclusive(page);
> + break;
> + }
> + }
> + for (i = 0; i < nr_pages; i++) {
> + struct page *cur_page = page + i;
> +
> + /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
> + VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
> + (folio_test_large(folio) &&
> + folio_entire_mapcount(folio) > 1)) &&
> + PageAnonExclusive(cur_page), folio);
> + }
This change will iterate over all pages for the PMD case. The original behavior
didn't check all pages. Is this change on purpose? Thanks.

>
> /*
> * For large folio, only mlock it if it's fully mapped to VMA. It's
> @@ -1368,6 +1366,54 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages which will be mapped
> + * @vma: The vm area in which the mappings are added
> + * @address: The user virtual address of the first page to map
> + * @flags: The rmap flags
> + *
> + * The page range of folio is defined by [first_page, first_page + nr_pages)
> + *
> + * The caller needs to hold the page table lock, and the page must be locked in
> + * the anon_vma case: to serialize mapping,index checking after setting,
> + * and to ensure that an anon folio is not being upgraded racily to a KSM folio
> + * (but KSM folios are never downgraded).
> + */
> +void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page,
> + int nr_pages, struct vm_area_struct *vma, unsigned long address,
> + rmap_t flags)
> +{
> + __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags,
> + RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + * @address: The user virtual address of the first page to map
> + * @flags: The rmap flags
> + *
> + * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock, and the page must be locked in
> + * the anon_vma case: to serialize mapping,index checking after setting.
> + */
> +void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma, unsigned long address, rmap_t flags)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags,
> + RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
> * @folio: The folio to add the mapping to.

2023-12-15 02:27:43

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()



On 12/11/2023 11:56 PM, David Hildenbrand wrote:
> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>
> While at it, use more folio operations (but only in the code branch we're
> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
> manually setting PageAnonExclusive.
>
> We should never see non-anon pages on that branch: otherwise, the
> existing page_add_anon_rmap() call would have been flawed already.
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

2023-12-15 15:20:27

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()

On 15.12.23 03:26, Yin, Fengwei wrote:
>
>
> On 12/11/2023 11:56 PM, David Hildenbrand wrote:
>> Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
>> replace page_add_anon_rmap() next.
>>
>> Make the compiler always special-case on the granularity by using
>> __always_inline.
>>
>> Note that the new functions ignore the RMAP_COMPOUND flag, which we will
>> remove as soon as page_add_anon_rmap() is gone.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
> Reviewed-by: Yin Fengwei <[email protected]>
>
> With a small question below.
>

Thanks!

[...]

>> + if (flags & RMAP_EXCLUSIVE) {
>> + switch (mode) {
>> + case RMAP_MODE_PTE:
>> + for (i = 0; i < nr_pages; i++)
>> + SetPageAnonExclusive(page + i);
>> + break;
>> + case RMAP_MODE_PMD:
>> + SetPageAnonExclusive(page);
>> + break;
>> + }
>> + }
>> + for (i = 0; i < nr_pages; i++) {
>> + struct page *cur_page = page + i;
>> +
>> + /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
>> + VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
>> + (folio_test_large(folio) &&
>> + folio_entire_mapcount(folio) > 1)) &&
>> + PageAnonExclusive(cur_page), folio);
>> + }
> This change will iterate over all pages for the PMD case. The original behavior
> didn't check all pages. Is this change on purpose? Thanks.

Yes, on purpose. I first thought about also separating the code paths
here, but realized that it makes much more sense to check each
individual subpage that is effectively getting mapped by that PMD,
instead of only the head page.

I'll add a comment to the patch description.

--
Cheers,

David / dhildenb


2023-12-18 15:56:37

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 08/39] mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's convert insert_page_into_pte_locked() and do_set_pmd(). While at it,
> perform some folio conversion.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> mm/memory.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6a5540ba3c65..70754fd65788 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1859,12 +1859,14 @@ static int validate_page_before_insert(struct page *page)
> static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
> unsigned long addr, struct page *page, pgprot_t prot)
> {
> + struct folio *folio = page_folio(page);
> +
> if (!pte_none(ptep_get(pte)))
> return -EBUSY;
> /* Ok, finally just insert the thing.. */
> - get_page(page);
> + folio_get(folio);
> inc_mm_counter(vma->vm_mm, mm_counter_file(page));
> - page_add_file_rmap(page, vma, false);
> + folio_add_file_rmap_pte(folio, page, vma);
> set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
> return 0;
> }
> @@ -4409,6 +4411,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)
>
> vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> {
> + struct folio *folio = page_folio(page);
> struct vm_area_struct *vma = vmf->vma;
> bool write = vmf->flags & FAULT_FLAG_WRITE;
> unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> @@ -4418,8 +4421,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
> return ret;
>
> - page = compound_head(page);
> - if (compound_order(page) != HPAGE_PMD_ORDER)
> + if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER)
> return ret;
>
> /*
> @@ -4428,7 +4430,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> * check. This kind of THP just can be PTE mapped. Access to
> * the corrupted subpage should trigger SIGBUS as expected.
> */
> - if (unlikely(PageHasHWPoisoned(page)))
> + if (unlikely(folio_test_has_hwpoisoned(folio)))
> return ret;
>
> /*
> @@ -4452,7 +4454,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>
> add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
> - page_add_file_rmap(page, vma, true);
> + folio_add_file_rmap_pmd(folio, page, vma);
>
> /*
> * deposit and withdraw with pmd lock held


2023-12-18 15:59:26

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's get rid of the compound parameter and instead define implicitly

nit: think you mean explicitly

> which mappings we're adding. That is more future proof, easier to read
> and harder to mess up.
>
> Use an enum to express the granularity internally. Make the compiler
> always special-case on the granularity by using __always_inline. Replace
> the "compound" check by a switch-case that will be removed by the
> compiler completely.
>
> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
> folio_test_pmd_mappable() check by a config check in the caller and
> sanity checks. Convert the single user of folio_add_file_rmap_range().
>
> This function design can later easily be extended to PUDs and to batch
> PMDs. Note that for now we don't support anything bigger than
> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
> will catch if that ever changes.
>
> Next up is removing page_remove_rmap() along with its "compound"
> parameter and similarly converting all other rmap functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 47 +++++++++++++++++++++++++--
> mm/memory.c | 2 +-
> mm/rmap.c | 75 +++++++++++++++++++++++++++++---------------
> 3 files changed, 95 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index e3857d26b944..1753900f4aed 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -191,6 +191,45 @@ typedef int __bitwise rmap_t;
> */
> #define RMAP_COMPOUND ((__force rmap_t)BIT(1))
>
> +/*
> + * Internally, we're using an enum to specify the granularity. Usually,
> + * we make the compiler create specialized variants for the different
> + * granularity.
> + */
> +enum rmap_mode {
> + RMAP_MODE_PTE = 0,
> + RMAP_MODE_PMD,
> +};
> +
> +static inline void __folio_rmap_sanity_checks(struct folio *folio,
> + struct page *page, int nr_pages, enum rmap_mode mode)
> +{
> + /* hugetlb folios are handled separately. */
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + !folio_test_large_rmappable(folio), folio);
> +
> + VM_WARN_ON_ONCE(nr_pages <= 0);
> + VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
> + VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
> +
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + break;
> + case RMAP_MODE_PMD:
> + /*
> + * We don't support folios larger than a single PMD yet. So
> + * when RMAP_MODE_PMD is set, we assume that we are creating
> + * a single "entire" mapping of the folio.
> + */
> + VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
> + VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
> + break;
> + default:
> + VM_WARN_ON_ONCE(true);
> + }
> +}
> +
> /*
> * rmap interfaces called when adding or removing pte of page
> */
> @@ -203,8 +242,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
> void page_add_file_rmap(struct page *, struct vm_area_struct *,
> bool compound);
> -void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> - struct vm_area_struct *, bool compound);
> +void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
> + struct vm_area_struct *);
> +#define folio_add_file_rmap_pte(folio, page, vma) \
> + folio_add_file_rmap_ptes(folio, page, 1, vma)
> +void folio_add_file_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *);
> void page_remove_rmap(struct page *, struct vm_area_struct *,
> bool compound);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 8f0b936b90b5..6a5540ba3c65 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4515,7 +4515,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> folio_add_lru_vma(folio, vma);
> } else {
> add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> - folio_add_file_rmap_range(folio, page, nr, vma, false);
> + folio_add_file_rmap_ptes(folio, page, nr, vma);
> }
> set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 41597da14f26..4f30930a1162 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1376,31 +1376,20 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
> }
>
> -/**
> - * folio_add_file_rmap_range - add pte mapping to page range of a folio
> - * @folio: The folio to add the mapping to
> - * @page: The first page to add
> - * @nr_pages: The number of pages which will be mapped
> - * @vma: the vm area in which the mapping is added
> - * @compound: charge the page as compound or small page
> - *
> - * The page range of folio is defined by [first_page, first_page + nr_pages)
> - *
> - * The caller needs to hold the pte lock.
> - */
> -void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> - unsigned int nr_pages, struct vm_area_struct *vma,
> - bool compound)
> +static __always_inline void __folio_add_file_rmap(struct folio *folio,
> + struct page *page, int nr_pages, struct vm_area_struct *vma,
> + enum rmap_mode mode)
> {
> atomic_t *mapped = &folio->_nr_pages_mapped;
> unsigned int nr_pmdmapped = 0, first;
> int nr = 0;
>
> - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> - VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>
> /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> + switch (mode) {
> + case RMAP_MODE_PTE:
> do {
> first = atomic_inc_and_test(&page->_mapcount);
> if (first && folio_test_large(folio)) {
> @@ -1411,9 +1400,8 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> if (first)
> nr++;
> } while (page++, --nr_pages > 0);
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> -
> + break;
> + case RMAP_MODE_PMD:
> first = atomic_inc_and_test(&folio->_entire_mapcount);
> if (first) {
> nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> @@ -1428,6 +1416,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> nr = 0;
> }
> }
> + break;
> }
>
> if (nr_pmdmapped)
> @@ -1441,6 +1430,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages that will be mapped using PTEs
> + * @vma: The vm area in which the mappings are added
> + *
> + * The page range of the folio is defined by [page, page + nr_pages)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
> + int nr_pages, struct vm_area_struct *vma)
> +{
> + __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + *
> + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * page_add_file_rmap - add pte mapping to a file page
> * @page: the page to add the mapping to
> @@ -1453,16 +1479,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> bool compound)
> {
> struct folio *folio = page_folio(page);
> - unsigned int nr_pages;
>
> VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>
> if (likely(!compound))
> - nr_pages = 1;
> + folio_add_file_rmap_pte(folio, page, vma);
> else
> - nr_pages = folio_nr_pages(folio);
> -
> - folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
> + folio_add_file_rmap_pmd(folio, page, vma);
> }
>
> /**


2023-12-18 16:00:48

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 09/39] mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's convert remove_migration_pmd() and while at it, perform some folio
> conversion.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> mm/huge_memory.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 3a387c6f18b6..1f5634b2f374 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3577,6 +3577,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>
> void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> {
> + struct folio *folio = page_folio(new);
> struct vm_area_struct *vma = pvmw->vma;
> struct mm_struct *mm = vma->vm_mm;
> unsigned long address = pvmw->address;
> @@ -3588,7 +3589,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> return;
>
> entry = pmd_to_swp_entry(*pvmw->pmd);
> - get_page(new);
> + folio_get(folio);
> pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
> if (pmd_swp_soft_dirty(*pvmw->pmd))
> pmde = pmd_mksoft_dirty(pmde);
> @@ -3599,10 +3600,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> if (!is_migration_entry_young(entry))
> pmde = pmd_mkold(pmde);
> /* NOTE: this may contain setting soft-dirty on some archs */
> - if (PageDirty(new) && is_migration_entry_dirty(entry))
> + if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
> pmde = pmd_mkdirty(pmde);
>
> - if (PageAnon(new)) {
> + if (folio_test_anon(folio)) {
> rmap_t rmap_flags = RMAP_COMPOUND;
>
> if (!is_readable_migration_entry(entry))
> @@ -3610,9 +3611,9 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
>
> page_add_anon_rmap(new, vma, haddr, rmap_flags);
> } else {
> - page_add_file_rmap(new, vma, true);
> + folio_add_file_rmap_pmd(folio, new, vma);
> }
> - VM_BUG_ON(pmd_write(pmde) && PageAnon(new) && !PageAnonExclusive(new));
> + VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new));
> set_pmd_at(mm, haddr, pvmw->pmd, pmde);
>
> /* No need to invalidate - it was non-present before */


2023-12-18 16:01:22

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 12/39] mm/rmap: remove page_add_file_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> All users are gone, let's remove it.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 2 --
> mm/rmap.c | 21 ---------------------
> 2 files changed, 23 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 1753900f4aed..7198905dc8be 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -240,8 +240,6 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> unsigned long address);
> void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
> -void page_add_file_rmap(struct page *, struct vm_area_struct *,
> - bool compound);
> void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
> struct vm_area_struct *);
> #define folio_add_file_rmap_pte(folio, page, vma) \
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4f30930a1162..2ff2f11275e5 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1467,27 +1467,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
> #endif
> }
>
> -/**
> - * page_add_file_rmap - add pte mapping to a file page
> - * @page: the page to add the mapping to
> - * @vma: the vm area in which the mapping is added
> - * @compound: charge the page as compound or small page
> - *
> - * The caller needs to hold the pte lock.
> - */
> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> - bool compound)
> -{
> - struct folio *folio = page_folio(page);
> -
> - VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
> -
> - if (likely(!compound))
> - folio_add_file_rmap_pte(folio, page, vma);
> - else
> - folio_add_file_rmap_pmd(folio, page, vma);
> -}
> -
> /**
> * page_remove_rmap - take down pte mapping from a page
> * @page: page to remove mapping from


2023-12-18 16:02:14

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 11/39] mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's convert mfill_atomic_install_pte().
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> mm/userfaultfd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 9ec814e47e99..330a481a1654 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -114,7 +114,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
> /* Usually, cache pages are already added to LRU */
> if (newly_allocated)
> folio_add_lru(folio);
> - page_add_file_rmap(page, dst_vma, false);
> + folio_add_file_rmap_pte(folio, page, dst_vma);
> } else {
> page_add_new_anon_rmap(page, dst_vma, dst_addr);
> folio_add_lru_vma(folio, dst_vma);


2023-12-18 16:07:22

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 10/39] mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's convert remove_migration_pte().
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> mm/migrate.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index de9d94b99ab7..efc19f53b05e 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -262,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio,
> page_add_anon_rmap(new, vma, pvmw.address,
> rmap_flags);
> else
> - page_add_file_rmap(new, vma, false);
> + folio_add_file_rmap_pte(folio, new, vma);
> set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
> }
> if (vma->vm_flags & VM_LOCKED)


2023-12-18 16:18:35

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's factor it out to prepare for reuse as we convert
> page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().
>
> Make the compiler always special-case on the granularity by using
> __always_inline.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/rmap.c | 81 ++++++++++++++++++++++++++++++-------------------------
> 1 file changed, 45 insertions(+), 36 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2ff2f11275e5..c5761986a411 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1157,6 +1157,49 @@ int folio_total_mapcount(struct folio *folio)
> return mapcount;
> }
>
> +static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
> + struct page *page, int nr_pages, enum rmap_mode mode,
> + unsigned int *nr_pmdmapped)
> +{
> + atomic_t *mapped = &folio->_nr_pages_mapped;
> + int first, nr = 0;
> +
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
> +
> + /* Is page being mapped by PTE? Is this its first map to be added? */

I suspect this comment is left over from the old version? It sounds a bit odd in
its new context.

> + switch (mode) {
> + case RMAP_MODE_PTE:
> + do {
> + first = atomic_inc_and_test(&page->_mapcount);
> + if (first && folio_test_large(folio)) {
> + first = atomic_inc_return_relaxed(mapped);
> + first = (first < COMPOUND_MAPPED);
> + }
> +
> + if (first)
> + nr++;
> + } while (page++, --nr_pages > 0);
> + break;
> + case RMAP_MODE_PMD:
> + first = atomic_inc_and_test(&folio->_entire_mapcount);
> + if (first) {
> + nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> + if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> + *nr_pmdmapped = folio_nr_pages(folio);
> + nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> + /* Raced ahead of a remove and another add? */
> + if (unlikely(nr < 0))
> + nr = 0;
> + } else {
> + /* Raced ahead of a remove of COMPOUND_MAPPED */
> + nr = 0;
> + }
> + }
> + break;
> + }
> + return nr;
> +}
> +
> /**
> * folio_move_anon_rmap - move a folio to our anon_vma
> * @folio: The folio to move to our anon_vma
> @@ -1380,45 +1423,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
> struct page *page, int nr_pages, struct vm_area_struct *vma,
> enum rmap_mode mode)
> {
> - atomic_t *mapped = &folio->_nr_pages_mapped;
> - unsigned int nr_pmdmapped = 0, first;
> - int nr = 0;
> + unsigned int nr, nr_pmdmapped = 0;

You're still being inconsistent with signed/unsigned here. Is there a reason
these can't be signed like nr_pages in the interface?

>
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> - __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
> -
> - /* Is page being mapped by PTE? Is this its first map to be added? */
> - switch (mode) {
> - case RMAP_MODE_PTE:
> - do {
> - first = atomic_inc_and_test(&page->_mapcount);
> - if (first && folio_test_large(folio)) {
> - first = atomic_inc_return_relaxed(mapped);
> - first = (first < COMPOUND_MAPPED);
> - }
> -
> - if (first)
> - nr++;
> - } while (page++, --nr_pages > 0);
> - break;
> - case RMAP_MODE_PMD:
> - first = atomic_inc_and_test(&folio->_entire_mapcount);
> - if (first) {
> - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> - nr_pmdmapped = folio_nr_pages(folio);
> - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> - /* Raced ahead of a remove and another add? */
> - if (unlikely(nr < 0))
> - nr = 0;
> - } else {
> - /* Raced ahead of a remove of COMPOUND_MAPPED */
> - nr = 0;
> - }
> - }
> - break;
> - }
>
> + nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
> if (nr_pmdmapped)
> __lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
> NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);


2023-12-18 16:26:57

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
> replace page_add_anon_rmap() next.
>
> Make the compiler always special-case on the granularity by using
> __always_inline.
>
> Note that the new functions ignore the RMAP_COMPOUND flag, which we will
> remove as soon as page_add_anon_rmap() is gone.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> include/linux/rmap.h | 6 +++
> mm/rmap.c | 118 ++++++++++++++++++++++++++++++-------------
> 2 files changed, 88 insertions(+), 36 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 7198905dc8be..3b5357cb1c09 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -234,6 +234,12 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
> * rmap interfaces called when adding or removing pte of page
> */
> void folio_move_anon_rmap(struct folio *, struct vm_area_struct *);
> +void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
> + struct vm_area_struct *, unsigned long address, rmap_t flags);
> +#define folio_add_anon_rmap_pte(folio, page, vma, address, flags) \
> + folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
> +void folio_add_anon_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *, unsigned long address, rmap_t flags);
> void page_add_anon_rmap(struct page *, struct vm_area_struct *,
> unsigned long address, rmap_t flags);
> void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c5761986a411..7787499fa2ad 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1300,38 +1300,20 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> unsigned long address, rmap_t flags)
> {
> struct folio *folio = page_folio(page);
> - atomic_t *mapped = &folio->_nr_pages_mapped;
> - int nr = 0, nr_pmdmapped = 0;
> - bool compound = flags & RMAP_COMPOUND;
> - bool first;
>
> - /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> - first = atomic_inc_and_test(&page->_mapcount);
> - nr = first;
> - if (first && folio_test_large(folio)) {
> - nr = atomic_inc_return_relaxed(mapped);
> - nr = (nr < COMPOUND_MAPPED);
> - }
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> + if (likely(!(flags & RMAP_COMPOUND)))
> + folio_add_anon_rmap_pte(folio, page, vma, address, flags);
> + else
> + folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
> +}
>
> - first = atomic_inc_and_test(&folio->_entire_mapcount);
> - if (first) {
> - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> - nr_pmdmapped = folio_nr_pages(folio);
> - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> - /* Raced ahead of a remove and another add? */
> - if (unlikely(nr < 0))
> - nr = 0;
> - } else {
> - /* Raced ahead of a remove of COMPOUND_MAPPED */
> - nr = 0;
> - }
> - }
> - }
> +static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> + struct page *page, int nr_pages, struct vm_area_struct *vma,
> + unsigned long address, rmap_t flags, enum rmap_mode mode)
> +{
> + unsigned int i, nr, nr_pmdmapped = 0;
>
> + nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
> if (nr_pmdmapped)
> __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
> if (nr)
> @@ -1345,18 +1327,34 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> * folio->index right when not given the address of the head
> * page.
> */
> - VM_WARN_ON_FOLIO(folio_test_large(folio) && !compound, folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + mode != RMAP_MODE_PMD, folio);
> __folio_set_anon(folio, vma, address,
> !!(flags & RMAP_EXCLUSIVE));
> } else if (likely(!folio_test_ksm(folio))) {
> __page_check_anon_rmap(folio, page, vma, address);
> }
> - if (flags & RMAP_EXCLUSIVE)
> - SetPageAnonExclusive(page);
> - /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
> - VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
> - (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
> - PageAnonExclusive(page), folio);
> +
> + if (flags & RMAP_EXCLUSIVE) {
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + for (i = 0; i < nr_pages; i++)
> + SetPageAnonExclusive(page + i);
> + break;
> + case RMAP_MODE_PMD:
> + SetPageAnonExclusive(page);

Just to check; I suppose only setting this on the head is ok, because it's an
exclusive mapping and therefore by definition it can only be mapped by pmd?

> + break;
> + }
> + }
> + for (i = 0; i < nr_pages; i++) {
> + struct page *cur_page = page + i;
> +
> + /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
> + VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
> + (folio_test_large(folio) &&
> + folio_entire_mapcount(folio) > 1)) &&
> + PageAnonExclusive(cur_page), folio);
> + }
>
> /*
> * For large folio, only mlock it if it's fully mapped to VMA. It's
> @@ -1368,6 +1366,54 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages which will be mapped
> + * @vma: The vm area in which the mappings are added
> + * @address: The user virtual address of the first page to map
> + * @flags: The rmap flags
> + *
> + * The page range of folio is defined by [first_page, first_page + nr_pages)
> + *
> + * The caller needs to hold the page table lock, and the page must be locked in
> + * the anon_vma case: to serialize mapping,index checking after setting,
> + * and to ensure that an anon folio is not being upgraded racily to a KSM folio
> + * (but KSM folios are never downgraded).
> + */
> +void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page,
> + int nr_pages, struct vm_area_struct *vma, unsigned long address,
> + rmap_t flags)
> +{
> + __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags,
> + RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + * @address: The user virtual address of the first page to map
> + * @flags: The rmap flags
> + *
> + * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock, and the page must be locked in
> + * the anon_vma case: to serialize mapping,index checking after setting.
> + */
> +void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma, unsigned long address, rmap_t flags)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags,
> + RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
> * @folio: The folio to add the mapping to.


2023-12-18 16:28:47

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>
> While at it, use more folio operations (but only in the code branch we're
> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
> manually setting PageAnonExclusive.
>
> We should never see non-anon pages on that branch: otherwise, the
> existing page_add_anon_rmap() call would have been flawed already.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/huge_memory.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1f5634b2f374..82ad68fe0d12 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2398,6 +2398,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> unsigned long haddr, bool freeze)
> {
> struct mm_struct *mm = vma->vm_mm;
> + struct folio *folio;
> struct page *page;
> pgtable_t pgtable;
> pmd_t old_pmd, _pmd;
> @@ -2493,16 +2494,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> uffd_wp = pmd_swp_uffd_wp(old_pmd);
> } else {
> page = pmd_page(old_pmd);
> + folio = page_folio(page);
> if (pmd_dirty(old_pmd)) {
> dirty = true;
> - SetPageDirty(page);
> + folio_set_dirty(folio);
> }
> write = pmd_write(old_pmd);
> young = pmd_young(old_pmd);
> soft_dirty = pmd_soft_dirty(old_pmd);
> uffd_wp = pmd_uffd_wp(old_pmd);
>
> - VM_BUG_ON_PAGE(!page_count(page), page);
> + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

Is this warning really correct? File-backed memory can be PMD-mapped with
CONFIG_READ_ONLY_THP_FOR_FS, so presumably it may also need to be
remapped as PTEs? Although I guess if we did have a file-backed folio, it
definitely wouldn't be correct to call page_add_anon_rmap() /
folio_add_anon_rmap_ptes()...

>
> /*
> * Without "freeze", we'll simply split the PMD, propagating the
> @@ -2519,11 +2522,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> *
> * See page_try_share_anon_rmap(): invalidate PMD first.
> */
> - anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
> + anon_exclusive = PageAnonExclusive(page);
> if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
> freeze = false;
> - if (!freeze)
> - page_ref_add(page, HPAGE_PMD_NR - 1);
> + if (!freeze) {
> + rmap_t rmap_flags = RMAP_NONE;
> +
> + folio_ref_add(folio, HPAGE_PMD_NR - 1);
> + if (anon_exclusive)
> + rmap_flags |= RMAP_EXCLUSIVE;
> + folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
> + vma, haddr, rmap_flags);
> + }
> }
>
> /*
> @@ -2566,8 +2576,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
> if (write)
> entry = pte_mkwrite(entry, vma);
> - if (anon_exclusive)
> - SetPageAnonExclusive(page + i);
> if (!young)
> entry = pte_mkold(entry);
> /* NOTE: this may set soft-dirty too on some archs */
> @@ -2577,7 +2585,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> entry = pte_mksoft_dirty(entry);
> if (uffd_wp)
> entry = pte_mkuffd_wp(entry);
> - page_add_anon_rmap(page + i, vma, addr, RMAP_NONE);
> }
> VM_BUG_ON(!pte_none(ptep_get(pte)));
> set_pte_at(mm, addr, pte, entry);


2023-12-18 17:02:55

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()


>> - if (flags & RMAP_EXCLUSIVE)
>> - SetPageAnonExclusive(page);
>> - /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
>> - VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
>> - (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
>> - PageAnonExclusive(page), folio);
>> +
>> + if (flags & RMAP_EXCLUSIVE) {
>> + switch (mode) {
>> + case RMAP_MODE_PTE:
>> + for (i = 0; i < nr_pages; i++)
>> + SetPageAnonExclusive(page + i);
>> + break;
>> + case RMAP_MODE_PMD:
>> + SetPageAnonExclusive(page);
>
> Just to check; I suppose only setting this on the head is ok, because it's an
> exclusive mapping and therefore by definition it can only be mapped by pmd?

Yes. And when PTE-remapping, we will push the flag to all tail pages. No
change in behavior :)
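
For reference, a condensed sketch of that flow, pieced together from the quoted
patches (simplified, not a literal excerpt):

    /* __split_huge_pmd_locked(): carry the head page's flag as RMAP_EXCLUSIVE. */
    rmap_t rmap_flags = anon_exclusive ? RMAP_EXCLUSIVE : RMAP_NONE;
    folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, haddr, rmap_flags);

    /* ...which, on the RMAP_MODE_PTE path with RMAP_EXCLUSIVE set, does: */
    for (i = 0; i < nr_pages; i++)
            SetPageAnonExclusive(page + i);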

--
Cheers,

David / dhildenb


2023-12-18 17:04:01

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

On 18.12.23 17:22, Ryan Roberts wrote:
> On 11/12/2023 15:56, David Hildenbrand wrote:
>> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>>
>> While at it, use more folio operations (but only in the code branch we're
>> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
>> manually setting PageAnonExclusive.
>>
>> We should never see non-anon pages on that branch: otherwise, the
>> existing page_add_anon_rmap() call would have been flawed already.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> mm/huge_memory.c | 23 +++++++++++++++--------
>> 1 file changed, 15 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 1f5634b2f374..82ad68fe0d12 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2398,6 +2398,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>> unsigned long haddr, bool freeze)
>> {
>> struct mm_struct *mm = vma->vm_mm;
>> + struct folio *folio;
>> struct page *page;
>> pgtable_t pgtable;
>> pmd_t old_pmd, _pmd;
>> @@ -2493,16 +2494,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>> uffd_wp = pmd_swp_uffd_wp(old_pmd);
>> } else {
>> page = pmd_page(old_pmd);
>> + folio = page_folio(page);
>> if (pmd_dirty(old_pmd)) {
>> dirty = true;
>> - SetPageDirty(page);
>> + folio_set_dirty(folio);
>> }
>> write = pmd_write(old_pmd);
>> young = pmd_young(old_pmd);
>> soft_dirty = pmd_soft_dirty(old_pmd);
>> uffd_wp = pmd_uffd_wp(old_pmd);
>>
>> - VM_BUG_ON_PAGE(!page_count(page), page);
>> + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
>> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> Is this warning really correct? File-backed memory can be PMD-mapped with
> CONFIG_READ_ONLY_THP_FOR_FS, so presumably it may also need to be
> remapped as PTEs? Although I guess if we did have a file-backed folio, it
> definitely wouldn't be correct to call page_add_anon_rmap() /
> folio_add_anon_rmap_ptes()...

Yes, see the patch description where I spell that out.

PTE-remapping a file-backed folio will simply zap the PMD and refault from
the page cache after creating a page table.

So this is anon-only code.
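
Put differently, only anon folios take the batched remap path here; a rough
sketch of the behaviour described above (paraphrased, not a literal excerpt):

    if (folio_test_anon(folio)) {
            /* anon THP: remap the subpages via PTEs, batching the rmap update */
            folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, haddr,
                                     rmap_flags);
    } else {
            /*
             * file-backed THP (e.g. READ_ONLY_THP_FOR_FS): no PTE remap here;
             * the PMD is zapped and the pages are refaulted from the page cache.
             */
    }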

--
Cheers,

David / dhildenb


2023-12-18 17:06:29

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

On 18.12.23 17:07, Ryan Roberts wrote:
> On 11/12/2023 15:56, David Hildenbrand wrote:
>> Let's factor it out to prepare for reuse as we convert
>> page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().
>>
>> Make the compiler always special-case on the granularity by using
>> __always_inline.
>>
>> Reviewed-by: Yin Fengwei <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> mm/rmap.c | 81 ++++++++++++++++++++++++++++++-------------------------
>> 1 file changed, 45 insertions(+), 36 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 2ff2f11275e5..c5761986a411 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1157,6 +1157,49 @@ int folio_total_mapcount(struct folio *folio)
>> return mapcount;
>> }
>>
>> +static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
>> + struct page *page, int nr_pages, enum rmap_mode mode,
>> + unsigned int *nr_pmdmapped)
>> +{
>> + atomic_t *mapped = &folio->_nr_pages_mapped;
>> + int first, nr = 0;
>> +
>> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>> +
>> + /* Is page being mapped by PTE? Is this its first map to be added? */
>
> I suspect this comment is left over from the old version? It sounds a bit odd in
> its new context.

In this patch, I'm just moving the code, so it would have to be dropped
in a previous patch.

I'm happy to drop all these comments in previous patches.

>
>> + switch (mode) {
>> + case RMAP_MODE_PTE:
>> + do {
>> + first = atomic_inc_and_test(&page->_mapcount);
>> + if (first && folio_test_large(folio)) {
>> + first = atomic_inc_return_relaxed(mapped);
>> + first = (first < COMPOUND_MAPPED);
>> + }
>> +
>> + if (first)
>> + nr++;
>> + } while (page++, --nr_pages > 0);
>> + break;
>> + case RMAP_MODE_PMD:
>> + first = atomic_inc_and_test(&folio->_entire_mapcount);
>> + if (first) {
>> + nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
>> + if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
>> + *nr_pmdmapped = folio_nr_pages(folio);
>> + nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
>> + /* Raced ahead of a remove and another add? */
>> + if (unlikely(nr < 0))
>> + nr = 0;
>> + } else {
>> + /* Raced ahead of a remove of COMPOUND_MAPPED */
>> + nr = 0;
>> + }
>> + }
>> + break;
>> + }
>> + return nr;
>> +}
>> +
>> /**
>> * folio_move_anon_rmap - move a folio to our anon_vma
>> * @folio: The folio to move to our anon_vma
>> @@ -1380,45 +1423,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
>> struct page *page, int nr_pages, struct vm_area_struct *vma,
>> enum rmap_mode mode)
>> {
>> - atomic_t *mapped = &folio->_nr_pages_mapped;
>> - unsigned int nr_pmdmapped = 0, first;
>> - int nr = 0;
>> + unsigned int nr, nr_pmdmapped = 0;
>
> You're still being inconsistent with signed/unsigned here. Is there a reason
> these can't be signed like nr_pages in the interface?

I can turn them into signed values.

Personally, I think it's misleading to use "signed" for values that can
never meaningfully be negative. But sure, we can be consistent, at least
in rmap code.

--
Cheers,

David / dhildenb


2023-12-19 08:41:07

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

On 18/12/2023 17:06, David Hildenbrand wrote:
> On 18.12.23 17:07, Ryan Roberts wrote:
>> On 11/12/2023 15:56, David Hildenbrand wrote:
>>> Let's factor it out to prepare for reuse as we convert
>>> page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().
>>>
>>> Make the compiler always special-case on the granularity by using
>>> __always_inline.
>>>
>>> Reviewed-by: Yin Fengwei <[email protected]>
>>> Signed-off-by: David Hildenbrand <[email protected]>
>>> ---
>>>   mm/rmap.c | 81 ++++++++++++++++++++++++++++++-------------------------
>>>   1 file changed, 45 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 2ff2f11275e5..c5761986a411 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1157,6 +1157,49 @@ int folio_total_mapcount(struct folio *folio)
>>>       return mapcount;
>>>   }
>>>   +static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
>>> +        struct page *page, int nr_pages, enum rmap_mode mode,
>>> +        unsigned int *nr_pmdmapped)
>>> +{
>>> +    atomic_t *mapped = &folio->_nr_pages_mapped;
>>> +    int first, nr = 0;
>>> +
>>> +    __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>>> +
>>> +    /* Is page being mapped by PTE? Is this its first map to be added? */
>>
>> I suspect this comment is left over from the old version? It sounds a bit odd in
>> its new context.
>
> In this patch, I'm just moving the code, so it would have to be dropped in a
> previous patch.
>
> I'm happy to drop all these comments in previous patches.

Well it doesn't really mean much to me in this new context, so I would reword if
there is still something you need to convey to the reader, else just remove.

>
>>
>>> +    switch (mode) {
>>> +    case RMAP_MODE_PTE:
>>> +        do {
>>> +            first = atomic_inc_and_test(&page->_mapcount);
>>> +            if (first && folio_test_large(folio)) {
>>> +                first = atomic_inc_return_relaxed(mapped);
>>> +                first = (first < COMPOUND_MAPPED);
>>> +            }
>>> +
>>> +            if (first)
>>> +                nr++;
>>> +        } while (page++, --nr_pages > 0);
>>> +        break;
>>> +    case RMAP_MODE_PMD:
>>> +        first = atomic_inc_and_test(&folio->_entire_mapcount);
>>> +        if (first) {
>>> +            nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
>>> +            if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
>>> +                *nr_pmdmapped = folio_nr_pages(folio);
>>> +                nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
>>> +                /* Raced ahead of a remove and another add? */
>>> +                if (unlikely(nr < 0))
>>> +                    nr = 0;
>>> +            } else {
>>> +                /* Raced ahead of a remove of COMPOUND_MAPPED */
>>> +                nr = 0;
>>> +            }
>>> +        }
>>> +        break;
>>> +    }
>>> +    return nr;
>>> +}
>>> +
>>>   /**
>>>    * folio_move_anon_rmap - move a folio to our anon_vma
>>>    * @folio:    The folio to move to our anon_vma
>>> @@ -1380,45 +1423,11 @@ static __always_inline void
>>> __folio_add_file_rmap(struct folio *folio,
>>>           struct page *page, int nr_pages, struct vm_area_struct *vma,
>>>           enum rmap_mode mode)
>>>   {
>>> -    atomic_t *mapped = &folio->_nr_pages_mapped;
>>> -    unsigned int nr_pmdmapped = 0, first;
>>> -    int nr = 0;
>>> +    unsigned int nr, nr_pmdmapped = 0;
>>
>> You're still being inconsistent with signed/unsigned here. Is there a reason
>> these can't be signed like nr_pages in the interface?
>
> I can turn them into signed values.
>
> Personally, I think it's misleading to use "signed" for values that can
> never meaningfully be negative. But sure, we can be consistent, at least
> in rmap code.
>

Well, it's an easy way to detect overflow? But I know what you mean. There are
lots of other APIs that accept signed/unsigned 32/64 bits; it's a mess. It would
be a tiny step in the right direction if a series could at least be consistent
with itself though, IMHO. :)
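
As a tiny standalone illustration of the overflow point (plain userspace C,
nothing kernel-specific):

    #include <stdio.h>

    int main(void)
    {
            unsigned int u_nr = 2;
            int s_nr = 2;

            /* Simulate racing ahead of a remove: subtract more than we added. */
            u_nr -= 5;      /* silently wraps to a huge positive value */
            s_nr -= 5;      /* goes negative, trivial to detect and clamp */

            if (s_nr < 0)
                    s_nr = 0;

            printf("unsigned: %u, signed (clamped): %d\n", u_nr, s_nr);
            return 0;
    }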

2023-12-19 08:42:55

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

On 18/12/2023 17:03, David Hildenbrand wrote:
> On 18.12.23 17:22, Ryan Roberts wrote:
>> On 11/12/2023 15:56, David Hildenbrand wrote:
>>> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>>>
>>> While at it, use more folio operations (but only in the code branch we're
>>> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
>>> manually setting PageAnonExclusive.
>>>
>>> We should never see non-anon pages on that branch: otherwise, the
>>> existing page_add_anon_rmap() call would have been flawed already.
>>>
>>> Signed-off-by: David Hildenbrand <[email protected]>
>>> ---
>>>   mm/huge_memory.c | 23 +++++++++++++++--------
>>>   1 file changed, 15 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 1f5634b2f374..82ad68fe0d12 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2398,6 +2398,7 @@ static void __split_huge_pmd_locked(struct
>>> vm_area_struct *vma, pmd_t *pmd,
>>>           unsigned long haddr, bool freeze)
>>>   {
>>>       struct mm_struct *mm = vma->vm_mm;
>>> +    struct folio *folio;
>>>       struct page *page;
>>>       pgtable_t pgtable;
>>>       pmd_t old_pmd, _pmd;
>>> @@ -2493,16 +2494,18 @@ static void __split_huge_pmd_locked(struct
>>> vm_area_struct *vma, pmd_t *pmd,
>>>           uffd_wp = pmd_swp_uffd_wp(old_pmd);
>>>       } else {
>>>           page = pmd_page(old_pmd);
>>> +        folio = page_folio(page);
>>>           if (pmd_dirty(old_pmd)) {
>>>               dirty = true;
>>> -            SetPageDirty(page);
>>> +            folio_set_dirty(folio);
>>>           }
>>>           write = pmd_write(old_pmd);
>>>           young = pmd_young(old_pmd);
>>>           soft_dirty = pmd_soft_dirty(old_pmd);
>>>           uffd_wp = pmd_uffd_wp(old_pmd);
>>>   -        VM_BUG_ON_PAGE(!page_count(page), page);
>>> +        VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
>>> +        VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>>
>> Is this warning really correct? File-backed memory can be PMD-mapped with
>> CONFIG_READ_ONLY_THP_FOR_FS, so presumably it may also need to be
>> remapped as PTEs? Although I guess if we did have a file-backed folio, it
>> definitely wouldn't be correct to call page_add_anon_rmap() /
>> folio_add_anon_rmap_ptes()...
>
> Yes, see the patch description where I spell that out.

Oh god, how did I miss that... sorry!

>
> PTE-remapping a file-backed folio will simply zap the PMD and refault from the
> page cache after creating a page table.


Yep, that makes sense.

>
> So this is anon-only code.
>