2023-12-20 22:45:20

by David Hildenbrand

Subject: [PATCH v2 00/40] mm/rmap: interface overhaul

This series overhauls the rmap interface, to get rid of the "bool compound"
/ RMAP_COMPOUND parameter with the goal of making the interface less error
prone, more future proof, and more natural to extend to "batching". Also,
this converts the interface to always consume folio+subpage, which speeds
up operations on large folios.

Further, this series adds PTE-batching variants for 4 rmap functions,
whereby only folio_add_anon_rmap_ptes() is used for batching in this series
when PTE-remapping a PMD-mapped THP. folio_remove_rmap_ptes(),
folio_try_dup_anon_rmap_ptes() and folio_dup_file_rmap_ptes() will soon
come in handy[1,2].
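
To illustrate what the batching variants are about, here is a minimal
userspace sketch, not kernel code. The toy_* names, the fixed folio size and
the plain int counters are assumptions made for illustration; the real
functions also deal with statistics, mlock and sanity checks. The point is
only that one "_ptes" call over nr_pages should leave the per-page mapcounts
in the same state as nr_pages separate "_pte" calls:

```c
#include <assert.h>

#define TOY_FOLIO_PAGES 16	/* hypothetical folio size, in pages */

/* Toy stand-in for a large folio: one mapcount per subpage. */
struct toy_folio {
	int mapcount[TOY_FOLIO_PAGES];	/* -1 means "not mapped" */
};

static void toy_folio_init(struct toy_folio *folio)
{
	for (int i = 0; i < TOY_FOLIO_PAGES; i++)
		folio->mapcount[i] = -1;
}

/* Batched variant: account nr_pages PTE mappings starting at page "first". */
static void toy_add_rmap_ptes(struct toy_folio *folio, int first, int nr_pages)
{
	assert(nr_pages > 0);
	assert(first >= 0 && first + nr_pages <= TOY_FOLIO_PAGES);

	for (int i = first; i < first + nr_pages; i++)
		folio->mapcount[i]++;
}

/* Single-PTE variant, expressed as a batch of one. */
static void toy_add_rmap_pte(struct toy_folio *folio, int page)
{
	toy_add_rmap_ptes(folio, page, 1);
}
```

In this model the batched call saves nothing but function calls. In the
kernel, the gain presumably comes from doing the per-folio work once per
batch instead of once per PTE.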

This series performs a lot of folio conversion along the way. Most of the
added LOC in the diff are only due to documentation.

As we're moving to a pte/pmd interface where we clearly express the
mapping granularity we are dealing with, we first get the remainder of
hugetlb out of the way, as it is special and expected to remain special: it
treats everything as a "single logical PTE" and only currently allows
entire mappings.

Even if we'd ever support partial mappings, I strongly assume the interface
and implementation will still differ heavily: hopefully we can avoid working
on subpages/subpage mapcounts completely and only add a "count" parameter
for them to enable batching.

New (extended) hugetlb interface that operates on entire folio:
* hugetlb_add_new_anon_rmap() -> Already existed
* hugetlb_add_anon_rmap() -> Already existed
* hugetlb_try_dup_anon_rmap()
* hugetlb_try_share_anon_rmap()
* hugetlb_add_file_rmap()
* hugetlb_remove_rmap()

New "ordinary" interface for small folios / THP:
* folio_add_new_anon_rmap() -> Already existed
* folio_add_anon_rmap_[pte|ptes|pmd]()
* folio_try_dup_anon_rmap_[pte|ptes|pmd]()
* folio_try_share_anon_rmap_[pte|pmd]()
* folio_add_file_rmap_[pte|ptes|pmd]()
* folio_dup_file_rmap_[pte|ptes|pmd]()
* folio_remove_rmap_[pte|ptes|pmd]()

folio_add_new_anon_rmap() will always map at the largest granularity
possible (currently, a single PMD to cover a PMD-sized THP). Could be
extended if ever required.

In the future, we might want "_pud" variants and eventually "_pmds"
variants for batching.

I ran some simple microbenchmarks on an Intel(R) Xeon(R) Silver 4210R:
measuring munmap(), fork(), cow, MADV_DONTNEED on each PTE ... and PTE
remapping PMD-mapped THPs on 1 GiB of memory.

For small folios, there is barely a change (< 1% improvement for me).

For PTE-mapped THP:
* PTE-remapping a PMD-mapped THP is more than 10% faster.
* fork() is more than 4% faster.
* MADV_DONTNEED is 2% faster
* COW when writing only a single byte on a COW-shared PTE is 1% faster
* munmap() barely changes (< 1%).

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]

---

If we pull this into mm-unstable in 2023, I'll have my notebook ready to
debug next to the Christmas tree. ;)

Based on current mm/mm-unstable. Compile-tested with/without THP on x86-64
and with defconfig on a bunch more. Tested on x86-64.

v1 -> v2:
* Rebased on top of mm-unstable (minor conflicts)
* Move some sanity checks from #6 into #2 -> #5 and leave the remainder in
#6
* Call it "rmap_level" instead of "rmap_mode".
* Consistently use "int" instead of "unsigned int" in rmap code
* Drop some stale comments
* Minor comment/description fixups + additions
* Spotted one last comment leftover, addressed in the (new) last patch
* Added RBs

RFC -> v1:
* Rebased on top of mm-unstable (containing mTHP)
* Use switch()-case and __always_inline for helper functions
* Fixed some (intermittent) compile issues and some smaller stuff
* folio_try_dup_anon_rmap_[pte|ptes|pmd]() rewrite
* Pass nr_pages consistently as "int"
* Simplify sanity checks
* Added RBs

Cc: Andrew Morton <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yin Fengwei <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>

David Hildenbrand (40):
mm/rmap: rename hugepage_add* to hugetlb_add*
mm/rmap: introduce and use hugetlb_remove_rmap()
mm/rmap: introduce and use hugetlb_add_file_rmap()
mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
mm/rmap: add hugetlb sanity checks for anon rmap handling
mm/rmap: convert folio_add_file_rmap_range() into
folio_add_file_rmap_[pte|ptes|pmd]()
mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
mm/rmap: remove page_add_file_rmap()
mm/rmap: factor out adding folio mappings into __folio_add_rmap()
mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/rmap: remove page_add_anon_rmap()
mm/rmap: remove RMAP_COMPOUND
mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
Documentation: stop referring to page_remove_rmap()
mm/rmap: remove page_remove_rmap()
mm/rmap: convert page_dup_file_rmap() to
folio_dup_file_rmap_[pte|ptes|pmd]()
mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
mm/huge_memory: page_try_dup_anon_rmap() ->
folio_try_dup_anon_rmap_pmd()
mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
mm/rmap: remove page_try_dup_anon_rmap()
mm: convert page_try_share_anon_rmap() to
folio_try_share_anon_rmap_[pte|pmd]()
mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED
mm: remove one last reference to page_add_*_rmap()

Documentation/mm/transhuge.rst | 4 +-
Documentation/mm/unevictable-lru.rst | 4 +-
include/linux/mm.h | 6 +-
include/linux/rmap.h | 397 +++++++++++++++++++-----
kernel/events/uprobes.c | 2 +-
mm/filemap.c | 10 +-
mm/gup.c | 2 +-
mm/huge_memory.c | 85 +++---
mm/hugetlb.c | 21 +-
mm/internal.h | 14 +-
mm/khugepaged.c | 17 +-
mm/ksm.c | 15 +-
mm/memory-failure.c | 4 +-
mm/memory.c | 60 ++--
mm/migrate.c | 12 +-
mm/migrate_device.c | 41 +--
mm/mmu_gather.c | 2 +-
mm/rmap.c | 433 ++++++++++++++++-----------
mm/swapfile.c | 2 +-
mm/userfaultfd.c | 2 +-
20 files changed, 739 insertions(+), 394 deletions(-)


base-commit: 2072407a394d0b3a3056f78a5630903da9471db0
--
2.43.0



2023-12-20 22:45:35

by David Hildenbrand

Subject: [PATCH v2 01/40] mm/rmap: rename hugepage_add* to hugetlb_add*

Let's just call it "hugetlb_".

Yes, it's all already inconsistent and confusing because we have a lot
of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
be confused with transparent huge pages, and it matches "hugetlb.c" and
"folio_test_hugetlb()". So let's minimize confusion in rmap code.

Reviewed-by: Muchun Song <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 4 ++--
mm/hugetlb.c | 8 ++++----
mm/migrate.c | 4 ++--
mm/rmap.c | 8 ++++----
4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0ae2bb0e77f5d..36096ba69bdcd 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -206,9 +206,9 @@ void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

-void hugepage_add_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
-void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

static inline void __page_dup_rmap(struct page *page, bool compound)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6feb3e0630d18..305f3ca1dee62 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5285,7 +5285,7 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);

__folio_mark_uptodate(new_folio);
- hugepage_add_new_anon_rmap(new_folio, vma, addr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, addr);
if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
@@ -5988,7 +5988,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
page_remove_rmap(&old_folio->page, vma, true);
- hugepage_add_new_anon_rmap(new_folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(mm, haddr, ptep, newpte, huge_page_size(h));
@@ -6277,7 +6277,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
goto backout;

if (anon_rmap)
- hugepage_add_new_anon_rmap(folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
page_dup_file_rmap(&folio->page, true);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
@@ -6732,7 +6732,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
if (folio_in_pagecache)
page_dup_file_rmap(&folio->page, true);
else
- hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
+ hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

/*
* For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
diff --git a/mm/migrate.c b/mm/migrate.c
index bad3039d165e6..7d1c3f292d24d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -249,8 +249,8 @@ static bool remove_migration_pte(struct folio *folio,

pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
if (folio_test_anon(folio))
- hugepage_add_anon_rmap(folio, vma, pvmw.address,
- rmap_flags);
+ hugetlb_add_anon_rmap(folio, vma, pvmw.address,
+ rmap_flags);
else
page_dup_file_rmap(new, true);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
diff --git a/mm/rmap.c b/mm/rmap.c
index 23da5b1ac33b4..9845499b22f8f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2625,8 +2625,8 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
*
* RMAP_COMPOUND is ignored.
*/
-void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
- unsigned long address, rmap_t flags)
+void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
+ unsigned long address, rmap_t flags)
{
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

@@ -2637,8 +2637,8 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
PageAnonExclusive(&folio->page), folio);
}

-void hugepage_add_new_anon_rmap(struct folio *folio,
- struct vm_area_struct *vma, unsigned long address)
+void hugetlb_add_new_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma, unsigned long address)
{
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
--
2.43.0


2023-12-20 22:45:45

by David Hildenbrand

Subject: [PATCH v2 02/40] mm/rmap: introduce and use hugetlb_remove_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
code from page_remove_rmap(). This effectively removes one check on the
small-folio path as well.

Add sanity checks that we end up with the right folios in the right
functions.

Note: the only candidates that need care are page_remove_rmap() calls that
pass compound=true.
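
As a rough userspace model of the "single logical PTE" accounting (a sketch
under assumptions, not the kernel implementation): the toy_* names and the
is_hugetlb flag are hypothetical, C11 atomics stand in for the kernel's
atomic_t, and assert() stands in for VM_WARN_ON_FOLIO(). The entire mapcount
starts at -1, so removing a mapping is one decrement with no per-subpage
work:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a folio; only the fields this sketch needs. */
struct toy_hugetlb_folio {
	bool is_hugetlb;		/* models folio_test_hugetlb() */
	atomic_int entire_mapcount;	/* -1 means "not mapped" */
};

static void toy_hugetlb_folio_init(struct toy_hugetlb_folio *folio,
				   bool is_hugetlb)
{
	folio->is_hugetlb = is_hugetlb;
	atomic_init(&folio->entire_mapcount, -1);
}

/* Models hugetlb_add_new_anon_rmap(): first mapping of a fresh folio. */
static void toy_hugetlb_add_new_anon_rmap(struct toy_hugetlb_folio *folio)
{
	assert(folio->is_hugetlb);
	atomic_store(&folio->entire_mapcount, 0);
}

/* Models hugetlb_remove_rmap(): one decrement, no subpage handling. */
static void toy_hugetlb_remove_rmap(struct toy_hugetlb_folio *folio)
{
	assert(folio->is_hugetlb);	/* stands in for VM_WARN_ON_FOLIO() */
	atomic_fetch_sub(&folio->entire_mapcount, 1);
}

/* Number of mappings as a caller would think of it (0 = unmapped). */
static int toy_hugetlb_mapcount(struct toy_hugetlb_folio *folio)
{
	return atomic_load(&folio->entire_mapcount) + 1;
}
```

Passing a non-hugetlb folio trips the assertion here. With CONFIG_DEBUG_VM,
the kernel would warn instead of aborting.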

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 7 +++++++
mm/hugetlb.c | 4 ++--
mm/rmap.c | 18 +++++++++---------
3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 36096ba69bdcd..64ae6c4d72720 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -211,6 +211,13 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_remove_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
+ atomic_dec(&folio->_entire_mapcount);
+}
+
static inline void __page_dup_rmap(struct page *page, bool compound)
{
if (compound) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 305f3ca1dee62..ef48ae6738909 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5676,7 +5676,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
make_pte_marker(PTE_MARKER_UFFD_WP),
sz);
hugetlb_count_sub(pages_per_huge_page(h), mm);
- page_remove_rmap(page, vma, true);
+ hugetlb_remove_rmap(page_folio(page));

spin_unlock(ptl);
tlb_remove_page_size(tlb, page, huge_page_size(h));
@@ -5987,7 +5987,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,

/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
- page_remove_rmap(&old_folio->page, vma, true);
+ hugetlb_remove_rmap(old_folio);
hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
diff --git a/mm/rmap.c b/mm/rmap.c
index 9845499b22f8f..261e1af0d254f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1480,15 +1480,9 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
bool last;
enum node_stat_item idx;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_PAGE(compound && !PageHead(page), page);

- /* Hugetlb pages are not counted in NR_*MAPPED */
- if (unlikely(folio_test_hugetlb(folio))) {
- /* hugetlb pages are always mapped with pmds */
- atomic_dec(&folio->_entire_mapcount);
- return;
- }
-
/* Is page being unmapped by PTE? Is this its last map to be removed? */
if (likely(!compound)) {
last = atomic_add_negative(-1, &page->_mapcount);
@@ -1846,7 +1840,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
dec_mm_counter(mm, mm_counter_file(&folio->page));
}
discard:
- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -2199,7 +2196,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
*/
}

- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
--
2.43.0


2023-12-20 22:45:57

by David Hildenbrand

Subject: [PATCH v2 03/40] mm/rmap: introduce and use hugetlb_add_file_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

Right now we're using page_dup_file_rmap() in some cases where "ordinary"
rmap code would have used page_add_file_rmap(). So let's introduce and
use hugetlb_add_file_rmap() instead. We won't be adding a
"hugetlb_dup_file_rmap()" function for the fork() case, as it would be
doing the same: "dup" is just an optimization for "add".

What remains is a single page_dup_file_rmap() call in fork() code.

Add sanity checks that we end up with the right folios in the right
functions.

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 8 ++++++++
mm/hugetlb.c | 6 +++---
mm/migrate.c | 2 +-
mm/rmap.c | 1 +
4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 64ae6c4d72720..56900a16f41a6 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -211,6 +211,14 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_add_file_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+ VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
+
+ atomic_inc(&folio->_entire_mapcount);
+}
+
static inline void hugetlb_remove_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ef48ae6738909..57e8981879314 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5408,7 +5408,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
* sleep during the process.
*/
if (!folio_test_anon(pte_folio)) {
- page_dup_file_rmap(&pte_folio->page, true);
+ hugetlb_add_file_rmap(pte_folio);
} else if (page_try_dup_anon_rmap(&pte_folio->page,
true, src_vma)) {
pte_t src_pte_old = entry;
@@ -6279,7 +6279,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
if (anon_rmap)
hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
&& (vma->vm_flags & VM_SHARED)));
/*
@@ -6730,7 +6730,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
goto out_release_unlock;

if (folio_in_pagecache)
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
else
hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

diff --git a/mm/migrate.c b/mm/migrate.c
index 7d1c3f292d24d..0e912443a18c3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio,
hugetlb_add_anon_rmap(folio, vma, pvmw.address,
rmap_flags);
else
- page_dup_file_rmap(new, true);
+ hugetlb_add_file_rmap(folio);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
psize);
} else
diff --git a/mm/rmap.c b/mm/rmap.c
index 261e1af0d254f..a57ec926daf0c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1395,6 +1395,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
unsigned int nr_pmdmapped = 0, first;
int nr = 0;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);

/* Is page being mapped by PTE? Is this its first map to be added? */
--
2.43.0


2023-12-20 22:46:50

by David Hildenbrand

Subject: [PATCH v2 07/40] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

Let's get rid of the compound parameter and instead define explicitly
which mappings we're adding. That is more future proof, easier to read
and harder to mess up.

Use an enum to express the granularity internally. Make the compiler
always special-case on the granularity by using __always_inline. Replace
the "compound" check by a switch-case that will be removed by the
compiler completely.
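
The pattern can be sketched in plain userspace C as follows. This is an
illustration under assumptions, not the code from this patch: the toy_*
names and TOY_PMD_NR are hypothetical, the GCC/Clang always_inline attribute
stands in for the kernel's __always_inline, and assert() stands in for the
CONFIG_DEBUG_VM sanity checks. Each wrapper passes a constant level, so
after inlining the compiler can resolve the switch at compile time:

```c
#include <assert.h>

#define TOY_PMD_NR 8	/* hypothetical number of pages covered by one PMD */

enum toy_rmap_level {
	TOY_RMAP_LEVEL_PTE = 0,
	TOY_RMAP_LEVEL_PMD,
};

struct toy_thp {
	int mapcount[TOY_PMD_NR];	/* per-page PTE mappings, -1 = none */
	int entire_mapcount;		/* PMD mappings of the folio, -1 = none */
};

static void toy_thp_init(struct toy_thp *thp)
{
	for (int i = 0; i < TOY_PMD_NR; i++)
		thp->mapcount[i] = -1;
	thp->entire_mapcount = -1;
}

/* Shared helper; "level" is a compile-time constant in every caller. */
static inline __attribute__((always_inline))
void toy_add_rmap_common(struct toy_thp *thp, int first, int nr_pages,
			 enum toy_rmap_level level)
{
	assert(nr_pages > 0);
	assert(first >= 0 && first + nr_pages <= TOY_PMD_NR);

	switch (level) {
	case TOY_RMAP_LEVEL_PTE:
		for (int i = first; i < first + nr_pages; i++)
			thp->mapcount[i]++;
		break;
	case TOY_RMAP_LEVEL_PMD:
		/* A PMD mapping always covers the entire folio. */
		assert(first == 0 && nr_pages == TOY_PMD_NR);
		thp->entire_mapcount++;
		break;
	}
}

static void toy_add_rmap_ptes(struct toy_thp *thp, int first, int nr_pages)
{
	toy_add_rmap_common(thp, first, nr_pages, TOY_RMAP_LEVEL_PTE);
}

static void toy_add_rmap_pmd(struct toy_thp *thp)
{
	toy_add_rmap_common(thp, 0, TOY_PMD_NR, TOY_RMAP_LEVEL_PMD);
}
```

Compared to a "bool compound" argument, the granularity is part of the
function name at every call site, so a caller can't silently pass the wrong
flag.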

Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
folio_test_pmd_mappable() check by a config check in the caller and
sanity checks. Convert the single user of folio_add_file_rmap_range().

While at it, consistently use "int" instead of "unsigned int" in rmap
code when dealing with mapcounts and the number of pages.

This function design can later easily be extended to PUDs and to batch
PMDs. Note that for now we don't support anything bigger than
PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
will catch if that ever changes.

Next up is removing page_remove_rmap() along with its "compound"
parameter and similarly converting all other rmap functions.

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 46 ++++++++++++++++++++++++--
mm/memory.c | 2 +-
mm/rmap.c | 79 ++++++++++++++++++++++++++++----------------
3 files changed, 95 insertions(+), 32 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index d6fefa0f04105..3d86a76b28368 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -191,6 +191,44 @@ typedef int __bitwise rmap_t;
*/
#define RMAP_COMPOUND ((__force rmap_t)BIT(1))

+/*
+ * Internally, we're using an enum to specify the granularity. We make the
+ * compiler emit specialized code for each granularity.
+ */
+enum rmap_level {
+ RMAP_LEVEL_PTE = 0,
+ RMAP_LEVEL_PMD,
+};
+
+static inline void __folio_rmap_sanity_checks(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_level level)
+{
+ /* hugetlb folios are handled separately. */
+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
+ VM_WARN_ON_FOLIO(folio_test_large(folio) &&
+ !folio_test_large_rmappable(folio), folio);
+
+ VM_WARN_ON_ONCE(nr_pages <= 0);
+ VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
+ VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
+
+ switch (level) {
+ case RMAP_LEVEL_PTE:
+ break;
+ case RMAP_LEVEL_PMD:
+ /*
+ * We don't support folios larger than a single PMD yet. So
+ * when RMAP_LEVEL_PMD is set, we assume that we are creating
+ * a single "entire" mapping of the folio.
+ */
+ VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
+ VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
+ break;
+ default:
+ VM_WARN_ON_ONCE(true);
+ }
+}
+
/*
* rmap interfaces called when adding or removing pte of page
*/
@@ -201,8 +239,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
void page_add_file_rmap(struct page *, struct vm_area_struct *,
bool compound);
-void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
- struct vm_area_struct *, bool compound);
+void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
+ struct vm_area_struct *);
+#define folio_add_file_rmap_pte(folio, page, vma) \
+ folio_add_file_rmap_ptes(folio, page, 1, vma)
+void folio_add_file_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

diff --git a/mm/memory.c b/mm/memory.c
index 149f779910fd5..7f957e5a84311 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4515,7 +4515,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
folio_add_lru_vma(folio, vma);
} else {
add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
- folio_add_file_rmap_range(folio, page, nr, vma, false);
+ folio_add_file_rmap_ptes(folio, page, nr, vma);
}
set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);

diff --git a/mm/rmap.c b/mm/rmap.c
index 6a1829324053e..cc1fc2d570f0d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1378,31 +1378,18 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
}

-/**
- * folio_add_file_rmap_range - add pte mapping to page range of a folio
- * @folio: The folio to add the mapping to
- * @page: The first page to add
- * @nr_pages: The number of pages which will be mapped
- * @vma: the vm area in which the mapping is added
- * @compound: charge the page as compound or small page
- *
- * The page range of folio is defined by [first_page, first_page + nr_pages)
- *
- * The caller needs to hold the pte lock.
- */
-void folio_add_file_rmap_range(struct folio *folio, struct page *page,
- unsigned int nr_pages, struct vm_area_struct *vma,
- bool compound)
+static __always_inline void __folio_add_file_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *vma,
+ enum rmap_level level)
{
atomic_t *mapped = &folio->_nr_pages_mapped;
- unsigned int nr_pmdmapped = 0, first;
- int nr = 0;
+ int nr = 0, nr_pmdmapped = 0, first;

- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
- VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
+ VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, level);

- /* Is page being mapped by PTE? Is this its first map to be added? */
- if (likely(!compound)) {
+ switch (level) {
+ case RMAP_LEVEL_PTE:
do {
first = atomic_inc_and_test(&page->_mapcount);
if (first && folio_test_large(folio)) {
@@ -1413,9 +1400,8 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
if (first)
nr++;
} while (page++, --nr_pages > 0);
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
-
+ break;
+ case RMAP_LEVEL_PMD:
first = atomic_inc_and_test(&folio->_entire_mapcount);
if (first) {
nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
@@ -1430,6 +1416,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
nr = 0;
}
}
+ break;
}

if (nr_pmdmapped)
@@ -1443,6 +1430,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
mlock_vma_folio(folio, vma);
}

+/**
+ * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
+ * @folio: The folio to add the mappings to
+ * @page: The first page to add
+ * @nr_pages: The number of pages that will be mapped using PTEs
+ * @vma: The vm area in which the mappings are added
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
+ int nr_pages, struct vm_area_struct *vma)
+{
+ __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_LEVEL_PTE);
+}
+
+/**
+ * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @vma: The vm area in which the mapping is added
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_LEVEL_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/**
* page_add_file_rmap - add pte mapping to a file page
* @page: the page to add the mapping to
@@ -1455,16 +1479,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
bool compound)
{
struct folio *folio = page_folio(page);
- unsigned int nr_pages;

VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);

if (likely(!compound))
- nr_pages = 1;
+ folio_add_file_rmap_pte(folio, page, vma);
else
- nr_pages = folio_nr_pages(folio);
-
- folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
+ folio_add_file_rmap_pmd(folio, page, vma);
}

/**
--
2.43.0


2023-12-20 22:46:52

by David Hildenbrand

Subject: [PATCH v2 06/40] mm/rmap: add hugetlb sanity checks for anon rmap handling

Let's make sure we end up with the right folios in the right functions
when adding an anon rmap, just like we already do in the other rmap
functions.

Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/rmap.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/mm/rmap.c b/mm/rmap.c
index c229e48cf5a9e..6a1829324053e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1262,6 +1262,8 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
bool compound = flags & RMAP_COMPOUND;
bool first;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
+
/* Is page being mapped by PTE? Is this its first map to be added? */
if (likely(!compound)) {
first = atomic_inc_and_test(&page->_mapcount);
@@ -1343,6 +1345,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
{
int nr = folio_nr_pages(folio);

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_VMA(address < vma->vm_start ||
address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
__folio_set_swapbacked(folio);
@@ -2634,6 +2637,7 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

atomic_inc(&folio->_entire_mapcount);
@@ -2646,6 +2650,8 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
void hugetlb_add_new_anon_rmap(struct folio *folio,
struct vm_area_struct *vma, unsigned long address)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
--
2.43.0


2023-12-20 22:46:56

by David Hildenbrand

Subject: [PATCH v2 04/40] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

Add sanity checks that we end up with the right folios in the right
functions.

Note that is_device_private_page() does not apply to hugetlb.
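
For readers who want the decision logic in isolation, here is a simplified
userspace model. It is a sketch, not the kernel code: the toy_* names and
the boolean fields are assumptions standing in for PageAnonExclusive(),
folio_maybe_dma_pinned() and the MMF_HAS_PINNED check, and locking is
ignored entirely. An exclusive folio that may be pinned for DMA must not be
shared, so the dup fails with -EBUSY; otherwise the exclusive marker is
dropped and one more mapping is accounted:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_anon_hugetlb {
	bool anon_exclusive;	/* models PageAnonExclusive() */
	bool maybe_dma_pinned;	/* models folio_maybe_dma_pinned() */
	int entire_mapcount;	/* -1 means "not mapped" */
};

/* Models folio_needs_cow_for_dma(): only matters if the mm ever pinned. */
static bool toy_needs_cow_for_dma(bool mm_has_pinned,
				  const struct toy_anon_hugetlb *folio)
{
	if (!mm_has_pinned)
		return false;
	return folio->maybe_dma_pinned;
}

/*
 * Models hugetlb_try_dup_anon_rmap(): returns 0 and accounts one more
 * mapping, or -EBUSY with the folio left untouched.
 */
static int toy_try_dup_anon_rmap(bool mm_has_pinned,
				 struct toy_anon_hugetlb *folio)
{
	if (folio->anon_exclusive) {
		if (toy_needs_cow_for_dma(mm_has_pinned, folio))
			return -EBUSY;
		folio->anon_exclusive = false;
	}
	folio->entire_mapcount++;
	return 0;
}
```

A nonzero return is what sends the copy_hugetlb_page_range() hunk below
into its fallback branch.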

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 12 +++++++++---
include/linux/rmap.h | 18 ++++++++++++++++++
mm/hugetlb.c | 3 +--
3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b72bf25a45cfd..ae547b62f3252 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1964,15 +1964,21 @@ static inline bool page_maybe_dma_pinned(struct page *page)
*
* The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
*/
-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
+static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct folio *folio)
{
VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));

if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
return false;

- return page_maybe_dma_pinned(page);
+ return folio_maybe_dma_pinned(folio);
+}
+
+static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct page *page)
+{
+ return folio_needs_cow_for_dma(vma, page_folio(page));
}

/**
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 56900a16f41a6..5f26752de945c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -211,6 +211,22 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+/* See page_try_dup_anon_rmap() */
+static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma)
+{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+
+ if (PageAnonExclusive(&folio->page)) {
+ if (unlikely(folio_needs_cow_for_dma(vma, folio)))
+ return -EBUSY;
+ ClearPageAnonExclusive(&folio->page);
+ }
+ atomic_inc(&folio->_entire_mapcount);
+ return 0;
+}
+
static inline void hugetlb_add_file_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
@@ -228,6 +244,8 @@ static inline void hugetlb_remove_rmap(struct folio *folio)

static inline void __page_dup_rmap(struct page *page, bool compound)
{
+ VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+
if (compound) {
struct folio *folio = (struct folio *)page;

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57e8981879314..378e460a6ab41 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5409,8 +5409,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
if (!folio_test_anon(pte_folio)) {
hugetlb_add_file_rmap(pte_folio);
- } else if (page_try_dup_anon_rmap(&pte_folio->page,
- true, src_vma)) {
+ } else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
pte_t src_pte_old = entry;
struct folio *new_folio;

--
2.43.0
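
The control flow of hugetlb_try_dup_anon_rmap() above can be sketched in
userspace as follows. This is only a model under stated assumptions: the
struct and its fields are hypothetical stand-ins for the folio state, the
"maybe_dma_pinned" field stands in for folio_needs_cow_for_dma() returning
true, and a plain int replaces the atomic _entire_mapcount.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the folio state the helper looks at. */
struct model_hugetlb_folio {
	bool anon_exclusive;	/* models PageAnonExclusive(&folio->page) */
	bool maybe_dma_pinned;	/* models folio_needs_cow_for_dma() == true */
	int entire_mapcount;	/* models folio->_entire_mapcount */
};

/* Mirrors the control flow of hugetlb_try_dup_anon_rmap(); not kernel code. */
static int model_hugetlb_try_dup_anon_rmap(struct model_hugetlb_folio *folio)
{
	if (folio->anon_exclusive) {
		/* An exclusive, possibly pinned folio must not be shared. */
		if (folio->maybe_dma_pinned)
			return -EBUSY;
		folio->anon_exclusive = false;
	}
	/* One more mapping of the entire folio (the "single logical PTE"). */
	folio->entire_mapcount++;
	return 0;
}
```

On -EBUSY nothing is modified, which matches the caller in
copy_hugetlb_page_range() falling back to copying the folio. A folio that is
already shared is duplicated without consulting the pin state at all.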


2023-12-20 22:47:18

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 08/40] mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()

Let's convert insert_page_into_pte_locked() and do_set_pmd(). While at it,
perform some folio conversion.

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7f957e5a84311..c77d3952d261f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1859,12 +1859,14 @@ static int validate_page_before_insert(struct page *page)
static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
unsigned long addr, struct page *page, pgprot_t prot)
{
+ struct folio *folio = page_folio(page);
+
if (!pte_none(ptep_get(pte)))
return -EBUSY;
/* Ok, finally just insert the thing.. */
- get_page(page);
+ folio_get(folio);
inc_mm_counter(vma->vm_mm, mm_counter_file(page));
- page_add_file_rmap(page, vma, false);
+ folio_add_file_rmap_pte(folio, page, vma);
set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
return 0;
}
@@ -4409,6 +4411,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)

vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
{
+ struct folio *folio = page_folio(page);
struct vm_area_struct *vma = vmf->vma;
bool write = vmf->flags & FAULT_FLAG_WRITE;
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
@@ -4418,8 +4421,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
return ret;

- page = compound_head(page);
- if (compound_order(page) != HPAGE_PMD_ORDER)
+ if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER)
return ret;

/*
@@ -4428,7 +4430,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
* check. This kind of THP just can be PTE mapped. Access to
* the corrupted subpage should trigger SIGBUS as expected.
*/
- if (unlikely(PageHasHWPoisoned(page)))
+ if (unlikely(folio_test_has_hwpoisoned(folio)))
return ret;

/*
@@ -4452,7 +4454,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);

add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
- page_add_file_rmap(page, vma, true);
+ folio_add_file_rmap_pmd(folio, page, vma);

/*
* deposit and withdraw with pmd lock held
--
2.43.0


2023-12-20 22:47:21

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 05/40] mm/rmap: introduce and use hugetlb_try_share_anon_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

So let's introduce and use hugetlb_try_share_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

Add sanity checks that we end up with the right folios in the right
functions.

Note that try_to_unmap_one() does not need care. This is easy to spot:
among all the hugetlb special-casing in that function, we're not using
set_huge_pte_at() on the anon path, and that code assumes that we would
want to swap out.

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 25 +++++++++++++++++++++++++
mm/rmap.c | 15 ++++++++++-----
2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 5f26752de945c..d6fefa0f04105 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -227,6 +227,30 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
return 0;
}

+/* See page_try_share_anon_rmap() */
+static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
+
+ /* Paired with the memory barrier in try_grab_folio(). */
+ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+ smp_mb();
+
+ if (unlikely(folio_maybe_dma_pinned(folio)))
+ return -EBUSY;
+ ClearPageAnonExclusive(&folio->page);
+
+ /*
+ * This is conceptually a smp_wmb() paired with the smp_rmb() in
+ * gup_must_unshare().
+ */
+ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+ smp_mb__after_atomic();
+ return 0;
+}
+
static inline void hugetlb_add_file_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
@@ -331,6 +355,7 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
*/
static inline int page_try_share_anon_rmap(struct page *page)
{
+ VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
VM_BUG_ON_PAGE(!PageAnon(page) || !PageAnonExclusive(page), page);

/* device private pages cannot get pinned via GUP. */
diff --git a/mm/rmap.c b/mm/rmap.c
index a57ec926daf0c..c229e48cf5a9e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2149,13 +2149,18 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
!anon_exclusive, subpage);

/* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
- if (folio_test_hugetlb(folio))
+ if (folio_test_hugetlb(folio)) {
+ if (anon_exclusive &&
+ hugetlb_try_share_anon_rmap(folio)) {
set_huge_pte_at(mm, address, pvmw.pte,
pteval, hsz);
- else
- set_pte_at(mm, address, pvmw.pte, pteval);
+ ret = false;
+ page_vma_mapped_walk_done(&pvmw);
+ break;
+ }
+ } else if (anon_exclusive &&
+ page_try_share_anon_rmap(subpage)) {
+ set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done(&pvmw);
break;
--
2.43.0
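
The ordering in hugetlb_try_share_anon_rmap() above (full barrier, pin check,
clear the exclusive marker, second barrier) can be sketched with C11 atomics.
This is a rough userspace model, not kernel code: the struct, its fields and
the init helper are hypothetical, and the C11 fences are only approximate
stand-ins for smp_mb() and smp_mb__after_atomic().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the folio state the helper touches. */
struct model_anon_folio {
	atomic_bool anon_exclusive;	/* models PageAnonExclusive() */
	atomic_int pincount;		/* > 0 models folio_maybe_dma_pinned() */
};

static void model_anon_folio_init(struct model_anon_folio *folio,
				  bool exclusive, int pins)
{
	atomic_init(&folio->anon_exclusive, exclusive);
	atomic_init(&folio->pincount, pins);
}

/* Mirrors the shape of hugetlb_try_share_anon_rmap(); not kernel code. */
static int model_try_share_anon_rmap(struct model_anon_folio *folio)
{
	/* Stand-in for smp_mb(): earlier stores are ordered before the check. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&folio->pincount, memory_order_relaxed) > 0)
		return -EBUSY;
	atomic_store_explicit(&folio->anon_exclusive, false,
			      memory_order_relaxed);

	/* Stand-in for smp_mb__after_atomic(), conceptually a write barrier. */
	atomic_thread_fence(memory_order_release);
	return 0;
}
```

The caller in try_to_migrate_one() clears the PTE before calling the helper
and restores it with set_huge_pte_at() when the helper fails, so a failed
attempt leaves the exclusive marker untouched in the model as well.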


2023-12-20 22:47:32

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 09/40] mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()

Let's convert remove_migration_pmd() and while at it, perform some folio
conversion.

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c50dc2e1483fb..bce6f987f36a3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3577,6 +3577,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,

void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
{
+ struct folio *folio = page_folio(new);
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
@@ -3588,7 +3589,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
return;

entry = pmd_to_swp_entry(*pvmw->pmd);
- get_page(new);
+ folio_get(folio);
pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
if (pmd_swp_soft_dirty(*pvmw->pmd))
pmde = pmd_mksoft_dirty(pmde);
@@ -3599,10 +3600,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
if (!is_migration_entry_young(entry))
pmde = pmd_mkold(pmde);
/* NOTE: this may contain setting soft-dirty on some archs */
- if (PageDirty(new) && is_migration_entry_dirty(entry))
+ if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
pmde = pmd_mkdirty(pmde);

- if (PageAnon(new)) {
+ if (folio_test_anon(folio)) {
rmap_t rmap_flags = RMAP_COMPOUND;

if (!is_readable_migration_entry(entry))
@@ -3610,9 +3611,9 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)

page_add_anon_rmap(new, vma, haddr, rmap_flags);
} else {
- page_add_file_rmap(new, vma, true);
+ folio_add_file_rmap_pmd(folio, new, vma);
}
- VM_BUG_ON(pmd_write(pmde) && PageAnon(new) && !PageAnonExclusive(new));
+ VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new));
set_pmd_at(mm, haddr, pvmw->pmd, pmde);

/* No need to invalidate - it was non-present before */
--
2.43.0


2023-12-20 22:47:42

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 10/40] mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()

Let's convert remove_migration_pte().

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 0e912443a18c3..65d64a119cabb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -262,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio,
page_add_anon_rmap(new, vma, pvmw.address,
rmap_flags);
else
- page_add_file_rmap(new, vma, false);
+ folio_add_file_rmap_pte(folio, new, vma);
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
}
if (vma->vm_flags & VM_LOCKED)
--
2.43.0


2023-12-20 22:47:57

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 11/40] mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()

Let's convert mfill_atomic_install_pte().

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/userfaultfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 203cda9192c29..5e718014e6713 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -114,7 +114,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
/* Usually, cache pages are already added to LRU */
if (newly_allocated)
folio_add_lru(folio);
- page_add_file_rmap(page, dst_vma, false);
+ folio_add_file_rmap_pte(folio, page, dst_vma);
} else {
folio_add_new_anon_rmap(folio, dst_vma, dst_addr);
folio_add_lru_vma(folio, dst_vma);
--
2.43.0


2023-12-20 22:48:10

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 12/40] mm/rmap: remove page_add_file_rmap()

All users are gone, let's remove it.

Reviewed-by: Yin Fengwei <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 2 --
mm/rmap.c | 21 ---------------------
2 files changed, 23 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 3d86a76b28368..6a4db6933e7df 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -237,8 +237,6 @@ void page_add_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
-void page_add_file_rmap(struct page *, struct vm_area_struct *,
- bool compound);
void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
struct vm_area_struct *);
#define folio_add_file_rmap_pte(folio, page, vma) \
diff --git a/mm/rmap.c b/mm/rmap.c
index cc1fc2d570f0d..5ab5ef10fbf5e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1467,27 +1467,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
#endif
}

-/**
- * page_add_file_rmap - add pte mapping to a file page
- * @page: the page to add the mapping to
- * @vma: the vm area in which the mapping is added
- * @compound: charge the page as compound or small page
- *
- * The caller needs to hold the pte lock.
- */
-void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
-{
- struct folio *folio = page_folio(page);
-
- VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
-
- if (likely(!compound))
- folio_add_file_rmap_pte(folio, page, vma);
- else
- folio_add_file_rmap_pmd(folio, page, vma);
-}
-
/**
* page_remove_rmap - take down pte mapping from a page
* @page: page to remove mapping from
--
2.43.0


2023-12-20 22:48:40

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 14/40] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()

Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
replace page_add_anon_rmap() next.

Make the compiler always special-case on the granularity by using
__always_inline.

For the PageAnonExclusive sanity checks, when adding a PMD mapping,
we're now also checking each individual subpage covered by that PMD,
instead of only the head page.

Note that the new functions ignore the RMAP_COMPOUND flag, which we will
remove as soon as page_add_anon_rmap() is gone.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 6 +++
mm/rmap.c | 120 +++++++++++++++++++++++++++++--------------
2 files changed, 88 insertions(+), 38 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 6a4db6933e7df..b5da3d86200e4 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -233,6 +233,12 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
* rmap interfaces called when adding or removing pte of page
*/
void folio_move_anon_rmap(struct folio *, struct vm_area_struct *);
+void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
+ struct vm_area_struct *, unsigned long address, rmap_t flags);
+#define folio_add_anon_rmap_pte(folio, page, vma, address, flags) \
+ folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
+void folio_add_anon_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *, unsigned long address, rmap_t flags);
void page_add_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
diff --git a/mm/rmap.c b/mm/rmap.c
index 895a8534a935d..7f380f5a34c90 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1299,40 +1299,20 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
{
struct folio *folio = page_folio(page);
- atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool compound = flags & RMAP_COMPOUND;
- bool first;
-
- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);

- /* Is page being mapped by PTE? Is this its first map to be added? */
- if (likely(!compound)) {
- first = atomic_inc_and_test(&page->_mapcount);
- nr = first;
- if (first && folio_test_large(folio)) {
- nr = atomic_inc_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
+ if (likely(!(flags & RMAP_COMPOUND)))
+ folio_add_anon_rmap_pte(folio, page, vma, address, flags);
+ else
+ folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
+}

- first = atomic_inc_and_test(&folio->_entire_mapcount);
- if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
- nr_pmdmapped = folio_nr_pages(folio);
- nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
- /* Raced ahead of a remove and another add? */
- if (unlikely(nr < 0))
- nr = 0;
- } else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
- nr = 0;
- }
- }
- }
+static __always_inline void __folio_add_anon_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *vma,
+ unsigned long address, rmap_t flags, enum rmap_level level)
+{
+ int i, nr, nr_pmdmapped = 0;

+ nr = __folio_add_rmap(folio, page, nr_pages, level, &nr_pmdmapped);
if (nr_pmdmapped)
__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
if (nr)
@@ -1346,18 +1326,34 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
* folio->index right when not given the address of the head
* page.
*/
- VM_WARN_ON_FOLIO(folio_test_large(folio) && !compound, folio);
+ VM_WARN_ON_FOLIO(folio_test_large(folio) &&
+ level != RMAP_LEVEL_PMD, folio);
__folio_set_anon(folio, vma, address,
!!(flags & RMAP_EXCLUSIVE));
} else if (likely(!folio_test_ksm(folio))) {
__page_check_anon_rmap(folio, page, vma, address);
}
- if (flags & RMAP_EXCLUSIVE)
- SetPageAnonExclusive(page);
- /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
- VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
- (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
- PageAnonExclusive(page), folio);
+
+ if (flags & RMAP_EXCLUSIVE) {
+ switch (level) {
+ case RMAP_LEVEL_PTE:
+ for (i = 0; i < nr_pages; i++)
+ SetPageAnonExclusive(page + i);
+ break;
+ case RMAP_LEVEL_PMD:
+ SetPageAnonExclusive(page);
+ break;
+ }
+ }
+ for (i = 0; i < nr_pages; i++) {
+ struct page *cur_page = page + i;
+
+ /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
+ VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
+ (folio_test_large(folio) &&
+ folio_entire_mapcount(folio) > 1)) &&
+ PageAnonExclusive(cur_page), folio);
+ }

/*
* For large folio, only mlock it if it's fully mapped to VMA. It's
@@ -1369,6 +1365,54 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
mlock_vma_folio(folio, vma);
}

+/**
+ * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio
+ * @folio: The folio to add the mappings to
+ * @page: The first page to add
+ * @nr_pages: The number of pages which will be mapped
+ * @vma: The vm area in which the mappings are added
+ * @address: The user virtual address of the first page to map
+ * @flags: The rmap flags
+ *
+ * The page range of folio is defined by [first_page, first_page + nr_pages)
+ *
+ * The caller needs to hold the page table lock, and the page must be locked in
+ * the anon_vma case: to serialize mapping,index checking after setting,
+ * and to ensure that an anon folio is not being upgraded racily to a KSM folio
+ * (but KSM folios are never downgraded).
+ */
+void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page,
+ int nr_pages, struct vm_area_struct *vma, unsigned long address,
+ rmap_t flags)
+{
+ __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags,
+ RMAP_LEVEL_PTE);
+}
+
+/**
+ * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @vma: The vm area in which the mapping is added
+ * @address: The user virtual address of the first page to map
+ * @flags: The rmap flags
+ *
+ * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock, and the page must be locked in
+ * the anon_vma case: to serialize mapping,index checking after setting.
+ */
+void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma, unsigned long address, rmap_t flags)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags,
+ RMAP_LEVEL_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/**
* folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
* @folio: The folio to add the mapping to.
--
2.43.0
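
The RMAP_EXCLUSIVE handling in __folio_add_anon_rmap() above differs per
level: the PTE variant marks every page of the batch, the PMD variant marks
only the page it was handed. A small sketch of that switch, using a plain
bool array as a hypothetical stand-in for the per-page PageAnonExclusive
flag (all names below are made up for the illustration):

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_PAGES 8	/* hypothetical folio size for the model */

enum model_rmap_level { MODEL_LEVEL_PTE, MODEL_LEVEL_PMD };

/* Mirrors the "if (flags & RMAP_EXCLUSIVE)" switch from the patch. */
static void model_set_exclusive(bool *excl, int first, int nr_pages,
				enum model_rmap_level level)
{
	switch (level) {
	case MODEL_LEVEL_PTE:
		/* Each PTE-mapped page carries its own exclusive marker. */
		for (int i = 0; i < nr_pages; i++)
			excl[first + i] = true;
		break;
	case MODEL_LEVEL_PMD:
		/* Only the page passed in is marked. */
		excl[first] = true;
		break;
	}
}

/* Counts how many pages of the model folio are marked exclusive. */
static int model_count_exclusive(const bool *excl)
{
	int n = 0;

	for (int i = 0; i < MODEL_NR_PAGES; i++)
		n += excl[i];
	return n;
}
```

With a batch of three pages, the PTE level marks three entries; the PMD level
marks one regardless of the range size.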


2023-12-20 22:48:53

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 15/40] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.

While at it, use more folio operations (but only in the code branch we're
touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
manually setting PageAnonExclusive.

We should never see non-anon pages on that branch: otherwise, the
existing page_add_anon_rmap() call would have been flawed already.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bce6f987f36a3..d4c5d22d16117 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2398,6 +2398,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long haddr, bool freeze)
{
struct mm_struct *mm = vma->vm_mm;
+ struct folio *folio;
struct page *page;
pgtable_t pgtable;
pmd_t old_pmd, _pmd;
@@ -2493,16 +2494,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
uffd_wp = pmd_swp_uffd_wp(old_pmd);
} else {
page = pmd_page(old_pmd);
+ folio = page_folio(page);
if (pmd_dirty(old_pmd)) {
dirty = true;
- SetPageDirty(page);
+ folio_set_dirty(folio);
}
write = pmd_write(old_pmd);
young = pmd_young(old_pmd);
soft_dirty = pmd_soft_dirty(old_pmd);
uffd_wp = pmd_uffd_wp(old_pmd);

- VM_BUG_ON_PAGE(!page_count(page), page);
+ VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

/*
* Without "freeze", we'll simply split the PMD, propagating the
@@ -2519,11 +2522,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
*
* See page_try_share_anon_rmap(): invalidate PMD first.
*/
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = PageAnonExclusive(page);
if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
freeze = false;
- if (!freeze)
- page_ref_add(page, HPAGE_PMD_NR - 1);
+ if (!freeze) {
+ rmap_t rmap_flags = RMAP_NONE;
+
+ folio_ref_add(folio, HPAGE_PMD_NR - 1);
+ if (anon_exclusive)
+ rmap_flags |= RMAP_EXCLUSIVE;
+ folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
+ vma, haddr, rmap_flags);
+ }
}

/*
@@ -2566,8 +2576,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
if (write)
entry = pte_mkwrite(entry, vma);
- if (anon_exclusive)
- SetPageAnonExclusive(page + i);
if (!young)
entry = pte_mkold(entry);
/* NOTE: this may set soft-dirty too on some archs */
@@ -2577,7 +2585,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = pte_mksoft_dirty(entry);
if (uffd_wp)
entry = pte_mkuffd_wp(entry);
- page_add_anon_rmap(page + i, vma, addr, RMAP_NONE);
}
VM_BUG_ON(!pte_none(ptep_get(pte)));
set_pte_at(mm, addr, pte, entry);
--
2.43.0


2023-12-20 22:49:19

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 17/40] mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert remove_migration_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 65d64a119cabb..b37dd087da265 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -259,8 +259,8 @@ static bool remove_migration_pte(struct folio *folio,
#endif
{
if (folio_test_anon(folio))
- page_add_anon_rmap(new, vma, pvmw.address,
- rmap_flags);
+ folio_add_anon_rmap_pte(folio, new, vma,
+ pvmw.address, rmap_flags);
else
folio_add_file_rmap_pte(folio, new, vma);
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
--
2.43.0


2023-12-20 22:49:26

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 16/40] mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()

Let's convert remove_migration_pmd(). No need to set RMAP_COMPOUND, which
we will remove soon.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d4c5d22d16117..1f438326b69bc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3611,12 +3611,12 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
pmde = pmd_mkdirty(pmde);

if (folio_test_anon(folio)) {
- rmap_t rmap_flags = RMAP_COMPOUND;
+ rmap_t rmap_flags = RMAP_NONE;

if (!is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;

- page_add_anon_rmap(new, vma, haddr, rmap_flags);
+ folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags);
} else {
folio_add_file_rmap_pmd(folio, new, vma);
}
--
2.43.0


2023-12-20 22:49:30

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 19/40] mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert unuse_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2f877ca445137..3eec686484ef5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1806,7 +1806,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (pte_swp_exclusive(old_pte))
rmap_flags |= RMAP_EXCLUSIVE;

- page_add_anon_rmap(page, vma, addr, rmap_flags);
+ folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);
} else { /* ksm created a completely new copy */
folio_add_new_anon_rmap(folio, vma, addr);
folio_add_lru_vma(folio, vma);
--
2.43.0


2023-12-20 22:49:36

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 13/40] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

Let's factor it out to prepare for reuse as we convert
page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().

Make the compiler always special-case on the granularity by using
__always_inline.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/rmap.c | 78 +++++++++++++++++++++++++++++++------------------------
1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 5ab5ef10fbf5e..895a8534a935d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1157,6 +1157,48 @@ int folio_total_mapcount(struct folio *folio)
return mapcount;
}

+static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_level level,
+ int *nr_pmdmapped)
+{
+ atomic_t *mapped = &folio->_nr_pages_mapped;
+ int first, nr = 0;
+
+ __folio_rmap_sanity_checks(folio, page, nr_pages, level);
+
+ switch (level) {
+ case RMAP_LEVEL_PTE:
+ do {
+ first = atomic_inc_and_test(&page->_mapcount);
+ if (first && folio_test_large(folio)) {
+ first = atomic_inc_return_relaxed(mapped);
+ first = (first < COMPOUND_MAPPED);
+ }
+
+ if (first)
+ nr++;
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_LEVEL_PMD:
+ first = atomic_inc_and_test(&folio->_entire_mapcount);
+ if (first) {
+ nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
+ if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
+ *nr_pmdmapped = folio_nr_pages(folio);
+ nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
+ /* Raced ahead of a remove and another add? */
+ if (unlikely(nr < 0))
+ nr = 0;
+ } else {
+ /* Raced ahead of a remove of COMPOUND_MAPPED */
+ nr = 0;
+ }
+ }
+ break;
+ }
+ return nr;
+}
+
/**
* folio_move_anon_rmap - move a folio to our anon_vma
* @folio: The folio to move to our anon_vma
@@ -1382,43 +1424,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
struct page *page, int nr_pages, struct vm_area_struct *vma,
enum rmap_level level)
{
- atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0, first;
+ int nr, nr_pmdmapped = 0;

VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
- __folio_rmap_sanity_checks(folio, page, nr_pages, level);
-
- switch (level) {
- case RMAP_LEVEL_PTE:
- do {
- first = atomic_inc_and_test(&page->_mapcount);
- if (first && folio_test_large(folio)) {
- first = atomic_inc_return_relaxed(mapped);
- first = (first < COMPOUND_MAPPED);
- }
-
- if (first)
- nr++;
- } while (page++, --nr_pages > 0);
- break;
- case RMAP_LEVEL_PMD:
- first = atomic_inc_and_test(&folio->_entire_mapcount);
- if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
- nr_pmdmapped = folio_nr_pages(folio);
- nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
- /* Raced ahead of a remove and another add? */
- if (unlikely(nr < 0))
- nr = 0;
- } else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
- nr = 0;
- }
- }
- break;
- }

+ nr = __folio_add_rmap(folio, page, nr_pages, level, &nr_pmdmapped);
if (nr_pmdmapped)
__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
--
2.43.0
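
The accounting in __folio_add_rmap() above can be modelled in userspace to see
what "nr" and "nr_pmdmapped" come out as for a few mapping sequences. This is
a sketch under assumptions, not kernel code: the struct is a hypothetical
stand-in for struct folio/page, mapcounts start at -1 as the
atomic_inc_and_test() usage implies, the folio is assumed to span 512 pages,
and the COMPOUND_MAPPED value is assumed to match mm/internal.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_PAGES		512		/* assumed PMD-sized folio */
#define COMPOUND_MAPPED		0x800000	/* assumed kernel value */
#define FOLIO_PAGES_MAPPED	(COMPOUND_MAPPED - 1)

enum rmap_level { RMAP_LEVEL_PTE, RMAP_LEVEL_PMD };

/* Hypothetical stand-in for the counters the kernel function touches. */
struct model_folio {
	atomic_int page_mapcount[MODEL_NR_PAGES];	/* page->_mapcount */
	atomic_int entire_mapcount;			/* _entire_mapcount */
	atomic_int nr_pages_mapped;			/* _nr_pages_mapped */
	bool large;
};

static void model_folio_init(struct model_folio *folio, bool large)
{
	for (int i = 0; i < MODEL_NR_PAGES; i++)
		atomic_init(&folio->page_mapcount[i], -1);
	atomic_init(&folio->entire_mapcount, -1);
	atomic_init(&folio->nr_pages_mapped, 0);
	folio->large = large;
}

/* Like atomic_inc_and_test(): true if the counter became 0. */
static bool model_inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Returns the number of pages that became mapped by this call. */
static int model_folio_add_rmap(struct model_folio *folio, int first_page,
		int nr_pages, enum rmap_level level, int *nr_pmdmapped)
{
	int first, nr = 0;

	switch (level) {
	case RMAP_LEVEL_PTE:
		for (int i = first_page; i < first_page + nr_pages; i++) {
			first = model_inc_and_test(&folio->page_mapcount[i]);
			if (first && folio->large) {
				first = atomic_fetch_add(&folio->nr_pages_mapped, 1) + 1;
				/* Not newly mapped if the PMD already covers it. */
				first = (first < COMPOUND_MAPPED);
			}
			if (first)
				nr++;
		}
		break;
	case RMAP_LEVEL_PMD:
		if (model_inc_and_test(&folio->entire_mapcount)) {
			nr = atomic_fetch_add(&folio->nr_pages_mapped,
					      COMPOUND_MAPPED) + COMPOUND_MAPPED;
			if (nr < COMPOUND_MAPPED + COMPOUND_MAPPED) {
				*nr_pmdmapped = MODEL_NR_PAGES;
				/* Subtract pages already mapped by PTE. */
				nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
				if (nr < 0)
					nr = 0;
			} else {
				nr = 0;
			}
		}
		break;
	}
	return nr;
}
```

For example, PTE-mapping four pages and then PMD-mapping the folio reports 4
and then 508 newly mapped pages; PTE-mapping further pages afterwards reports
0, because the entire mapping already accounts for them.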


2023-12-20 22:49:51

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 20/40] mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert restore_exclusive_pte() and do_swap_page(). While at it,
perform some folio conversion in restore_exclusive_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c77d3952d261f..6552ea27b0bfa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -710,6 +710,7 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
struct page *page, unsigned long address,
pte_t *ptep)
{
+ struct folio *folio = page_folio(page);
pte_t orig_pte;
pte_t pte;
swp_entry_t entry;
@@ -725,14 +726,15 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
else if (is_writable_device_exclusive_entry(entry))
pte = maybe_mkwrite(pte_mkdirty(pte), vma);

- VM_BUG_ON(pte_write(pte) && !(PageAnon(page) && PageAnonExclusive(page)));
+ VM_BUG_ON_FOLIO(pte_write(pte) && !(folio_test_anon(folio) &&
+ PageAnonExclusive(page)), folio);

/*
* No need to take a page reference as one was already
* created when the swap entry was made.
*/
- if (PageAnon(page))
- page_add_anon_rmap(page, vma, address, RMAP_NONE);
+ if (folio_test_anon(folio))
+ folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE);
else
/*
* Currently device exclusive access only supports anonymous
@@ -4075,7 +4077,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
folio_add_new_anon_rmap(folio, vma, vmf->address);
folio_add_lru_vma(folio, vma);
} else {
- page_add_anon_rmap(page, vma, vmf->address, rmap_flags);
+ folio_add_anon_rmap_pte(folio, page, vma, vmf->address,
+ rmap_flags);
}

VM_BUG_ON(!folio_test_anon(folio) ||
--
2.43.0
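
The assertion removed from restore_exclusive_pte() above negates a whole
conjunction, !(PageAnon(page) && PageAnonExclusive(page)). When converting
such a check, the negation has to stay outside the parentheses. The sketch
below enumerates both placements over all inputs; the function names are
hypothetical and the booleans stand in for pte_write(), the anon test and
PageAnonExclusive().

```c
#include <assert.h>
#include <stdbool.h>

/* Fires for a writable PTE unless the page is anon AND exclusive. */
static bool check_negated_conjunction(bool write, bool anon, bool exclusive)
{
	return write && !(anon && exclusive);
}

/* Negation applied to the first operand only. */
static bool check_negated_first_operand(bool write, bool anon, bool exclusive)
{
	return write && (!anon && exclusive);
}

/* Counts the input combinations (out of 8) on which the two forms differ. */
static int count_disagreements(void)
{
	int n = 0;

	for (int w = 0; w <= 1; w++)
		for (int a = 0; a <= 1; a++)
			for (int e = 0; e <= 1; e++)
				if (check_negated_conjunction(w, a, e) !=
				    check_negated_first_operand(w, a, e))
					n++;
	return n;
}
```

The two forms differ whenever the PTE is writable and the page is not marked
exclusive: the first form fires there, the second stays silent.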


2023-12-20 22:49:57

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 21/40] mm/rmap: remove page_add_anon_rmap()

All users are gone, remove it and all traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 2 --
mm/rmap.c | 31 ++++---------------------------
2 files changed, 4 insertions(+), 29 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b5da3d86200e4..fe7b5a8b0e75b 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -239,8 +239,6 @@ void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
void folio_add_anon_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *, unsigned long address, rmap_t flags);
-void page_add_anon_rmap(struct page *, struct vm_area_struct *,
- unsigned long address, rmap_t flags);
void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
diff --git a/mm/rmap.c b/mm/rmap.c
index 7f380f5a34c90..87415bbf24022 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1270,7 +1270,7 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page,
* The page's anon-rmap details (mapping and index) are guaranteed to
* be set up correctly at this point.
*
- * We have exclusion against page_add_anon_rmap because the caller
+ * We have exclusion against folio_add_anon_rmap_*() because the caller
* always holds the page locked.
*
* We have exclusion against folio_add_new_anon_rmap because those pages
@@ -1283,29 +1283,6 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page,
page);
}

-/**
- * page_add_anon_rmap - add pte mapping to an anonymous page
- * @page: the page to add the mapping to
- * @vma: the vm area in which the mapping is added
- * @address: the user virtual address mapped
- * @flags: the rmap flags
- *
- * The caller needs to hold the pte lock, and the page must be locked in
- * the anon_vma case: to serialize mapping,index checking after setting,
- * and to ensure that PageAnon is not being upgraded racily to PageKsm
- * (but PageKsm is never downgraded to PageAnon).
- */
-void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
- unsigned long address, rmap_t flags)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!(flags & RMAP_COMPOUND)))
- folio_add_anon_rmap_pte(folio, page, vma, address, flags);
- else
- folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
-}
-
static __always_inline void __folio_add_anon_rmap(struct folio *folio,
struct page *page, int nr_pages, struct vm_area_struct *vma,
unsigned long address, rmap_t flags, enum rmap_level level)
@@ -1419,7 +1396,7 @@ void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
* @vma: the vm area in which the mapping is added
* @address: the user virtual address mapped
*
- * Like page_add_anon_rmap() but must only be called on *new* folios.
+ * Like folio_add_anon_rmap_*() but must only be called on *new* folios.
* This means the inc-and-test can be bypassed.
* The folio does not have to be locked.
*
@@ -1479,7 +1456,7 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
if (nr)
__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);

- /* See comments in page_add_anon_rmap() */
+ /* See comments in folio_add_anon_rmap_*() */
if (!folio_test_large(folio))
mlock_vma_folio(folio, vma);
}
@@ -1593,7 +1570,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,

/*
* It would be tidy to reset folio_test_anon mapping when fully
- * unmapped, but that might overwrite a racing page_add_anon_rmap
+ * unmapped, but that might overwrite a racing folio_add_anon_rmap_*()
* which increments mapcount after us but sets mapping before us:
* so leave the reset to free_pages_prepare, and remember that
* it's only reliable while mapped.
--
2.43.0


2023-12-20 22:50:32

by David Hildenbrand

Subject: [PATCH v2 22/40] mm/rmap: remove RMAP_COMPOUND

No longer used, let's remove it and clarify RMAP_NONE/RMAP_EXCLUSIVE a
bit.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 12 +++---------
mm/rmap.c | 2 --
2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index fe7b5a8b0e75b..bf6cb79aa7a0a 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -177,20 +177,14 @@ struct anon_vma *folio_get_anon_vma(struct folio *folio);
typedef int __bitwise rmap_t;

/*
- * No special request: if the page is a subpage of a compound page, it is
- * mapped via a PTE. The mapped (sub)page is possibly shared between processes.
+ * No special request: A mapped anonymous (sub)page is possibly shared between
+ * processes.
*/
#define RMAP_NONE ((__force rmap_t)0)

-/* The (sub)page is exclusive to a single process. */
+/* The anonymous (sub)page is exclusive to a single process. */
#define RMAP_EXCLUSIVE ((__force rmap_t)BIT(0))

-/*
- * The compound page is not mapped via PTEs, but instead via a single PMD and
- * should be accounted accordingly.
- */
-#define RMAP_COMPOUND ((__force rmap_t)BIT(1))
-
/*
* Internally, we're using an enum to specify the granularity. We make the
* compiler emit specialized code for each granularity.
diff --git a/mm/rmap.c b/mm/rmap.c
index 87415bbf24022..2b386b9f6791c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2662,8 +2662,6 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
* The following two functions are for anonymous (private mapped) hugepages.
* Unlike common anonymous pages, anonymous hugepages have no accounting code
* and no lru code, because we handle hugepages differently from common pages.
- *
- * RMAP_COMPOUND is ignored.
*/
void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
--
2.43.0
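
The idea behind dropping the RMAP_COMPOUND flag can be sketched in plain C: each granularity gets its own entry point, and each entry point passes a constant enum value to one always-inline helper, so the compiler can discard the branches that do not apply. This is only an illustration under stated assumptions. All sketch_* names and the value of SKETCH_PMD_NR are hypothetical, and __attribute__((always_inline)) is the GCC/Clang spelling of what the kernel wraps as __always_inline.

```c
#include <assert.h>	/* used by the checks described in the note below */

/* Hypothetical stand-in for the kernel's internal granularity enum. */
enum sketch_rmap_level { SKETCH_RMAP_PTE, SKETCH_RMAP_PMD };

/* Hypothetical pages-per-PMD value; the real one depends on the arch. */
#define SKETCH_PMD_NR 512

/*
 * Shared worker. Every caller passes a compile-time constant level, so
 * after inlining only one case of the switch remains per entry point.
 */
static inline __attribute__((always_inline))
int sketch_pages_covered(int nr_pages, enum sketch_rmap_level level)
{
	switch (level) {
	case SKETCH_RMAP_PTE:
		return nr_pages;
	case SKETCH_RMAP_PMD:
		return SKETCH_PMD_NR;
	}
	return 0;
}

/* Batched PTE entry point. */
int sketch_remove_ptes(int nr_pages)
{
	return sketch_pages_covered(nr_pages, SKETCH_RMAP_PTE);
}

/* Single-PTE variant: a macro over the batched one, with nr_pages == 1. */
#define sketch_remove_pte() sketch_remove_ptes(1)

/* PMD entry point: no flag needed, the function name carries the level. */
int sketch_remove_pmd(void)
{
	return sketch_pages_covered(SKETCH_PMD_NR, SKETCH_RMAP_PMD);
}
```

A call site then reads sketch_remove_pte() or sketch_remove_pmd() instead of passing a flag or a bool, which is the property the new interface is after: the mapping granularity is visible in the function name.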


2023-12-20 22:50:48

by David Hildenbrand

Subject: [PATCH v2 18/40] mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert replace_page(). While at it, perform some folio
conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/ksm.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 8fa6053a225d9..146aa75fa6ff7 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1369,6 +1369,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
static int replace_page(struct vm_area_struct *vma, struct page *page,
struct page *kpage, pte_t orig_pte)
{
+ struct folio *kfolio = page_folio(kpage);
struct mm_struct *mm = vma->vm_mm;
struct folio *folio;
pmd_t *pmd;
@@ -1408,15 +1409,16 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
goto out_mn;
}
VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
- VM_BUG_ON_PAGE(PageAnon(kpage) && PageAnonExclusive(kpage), kpage);
+ VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
+ kfolio);

/*
* No need to check ksm_use_zero_pages here: we can only have a
* zero_page here if ksm_use_zero_pages was enabled already.
*/
if (!is_zero_pfn(page_to_pfn(kpage))) {
- get_page(kpage);
- page_add_anon_rmap(kpage, vma, addr, RMAP_NONE);
+ folio_get(kfolio);
+ folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
newpte = mk_pte(kpage, vma->vm_page_prot);
} else {
/*
--
2.43.0


2023-12-20 22:54:24

by David Hildenbrand

Subject: [PATCH v2 23/40] mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()

Let's mimic what we did with folio_add_file_rmap_*() and
folio_add_anon_rmap_*() so we can similarly replace page_remove_rmap()
next.

Make the compiler always special-case on the granularity by using
__always_inline.

We're adding folio_remove_rmap_ptes() handling right away, as we want to
use that soon for batching rmap operations when unmapping PTE-mapped
large folios.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 6 ++++
mm/rmap.c | 82 +++++++++++++++++++++++++++++++++++---------
2 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bf6cb79aa7a0a..57e045093f047 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -243,6 +243,12 @@ void folio_add_file_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);
+void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
+ struct vm_area_struct *);
+#define folio_remove_rmap_pte(folio, page, vma) \
+ folio_remove_rmap_ptes(folio, page, 1, vma)
+void folio_remove_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *);

void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
diff --git a/mm/rmap.c b/mm/rmap.c
index 2b386b9f6791c..1273180753953 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1510,25 +1510,37 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
bool compound)
{
struct folio *folio = page_folio(page);
+
+ if (likely(!compound))
+ folio_remove_rmap_pte(folio, page, vma);
+ else
+ folio_remove_rmap_pmd(folio, page, vma);
+}
+
+static __always_inline void __folio_remove_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *vma,
+ enum rmap_level level)
+{
atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool last;
+ int last, nr = 0, nr_pmdmapped = 0;
enum node_stat_item idx;

- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
-
- /* Is page being unmapped by PTE? Is this its last map to be removed? */
- if (likely(!compound)) {
- last = atomic_add_negative(-1, &page->_mapcount);
- nr = last;
- if (last && folio_test_large(folio)) {
- nr = atomic_dec_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
+ __folio_rmap_sanity_checks(folio, page, nr_pages, level);
+
+ switch (level) {
+ case RMAP_LEVEL_PTE:
+ do {
+ last = atomic_add_negative(-1, &page->_mapcount);
+ if (last && folio_test_large(folio)) {
+ last = atomic_dec_return_relaxed(mapped);
+ last = (last < COMPOUND_MAPPED);
+ }

+ if (last)
+ nr++;
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_LEVEL_PMD:
last = atomic_add_negative(-1, &folio->_entire_mapcount);
if (last) {
nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
@@ -1543,6 +1555,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
nr = 0;
}
}
+ break;
}

if (nr_pmdmapped) {
@@ -1564,7 +1577,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
* is still mapped.
*/
if (folio_test_large(folio) && folio_test_anon(folio))
- if (!compound || nr < nr_pmdmapped)
+ if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
deferred_split_folio(folio);
}

@@ -1579,6 +1592,43 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
munlock_vma_folio(folio, vma);
}

+/**
+ * folio_remove_rmap_ptes - remove PTE mappings from a page range of a folio
+ * @folio: The folio to remove the mappings from
+ * @page: The first page to remove
+ * @nr_pages: The number of pages that will be removed from the mapping
+ * @vma: The vm area from which the mappings are removed
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_remove_rmap_ptes(struct folio *folio, struct page *page,
+ int nr_pages, struct vm_area_struct *vma)
+{
+ __folio_remove_rmap(folio, page, nr_pages, vma, RMAP_LEVEL_PTE);
+}
+
+/**
+ * folio_remove_rmap_pmd - remove a PMD mapping from a page range of a folio
+ * @folio: The folio to remove the mapping from
+ * @page: The first page to remove
+ * @vma: The vm area from which the mapping is removed
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_remove_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_LEVEL_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
--
2.43.0
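
The PTE-level loop in __folio_remove_rmap() can be modeled in userspace to see what "last" means per page. This is a simplified sketch using C11 atomics, not the kernel implementation: all model_* names and MODEL_NR_PAGES are hypothetical, and the COMPOUND_MAPPED (PMD) accounting, statistics and mlock handling are left out. As with page->_mapcount, a per-page counter of -1 means "not mapped", so a decrement that yields a negative value corresponds to atomic_add_negative(-1, ...) returning true.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_PAGES 4	/* hypothetical, tiny "large folio" */

struct model_page {
	atomic_int mapcount;		/* -1 == unmapped */
};

struct model_folio {
	atomic_int nr_pages_mapped;	/* pages with at least one mapping */
	struct model_page pages[MODEL_NR_PAGES];
};

void model_folio_init(struct model_folio *f)
{
	atomic_init(&f->nr_pages_mapped, 0);
	for (int i = 0; i < MODEL_NR_PAGES; i++)
		atomic_init(&f->pages[i].mapcount, -1);
}

/* Add one PTE mapping of page idx; the first mapping goes from -1 to 0. */
void model_add_pte(struct model_folio *f, int idx)
{
	if (atomic_fetch_add(&f->pages[idx].mapcount, 1) == -1)
		atomic_fetch_add(&f->nr_pages_mapped, 1);
}

/*
 * Remove one PTE mapping from each page in [first, first + nr_pages).
 * Returns how many of those pages lost their last mapping.
 */
int model_remove_ptes(struct model_folio *f, int first, int nr_pages)
{
	struct model_page *page = &f->pages[first];
	int nr = 0;

	assert(nr_pages >= 1 && first + nr_pages <= MODEL_NR_PAGES);

	do {
		/* New value below zero: this was the page's last mapping. */
		bool last = atomic_fetch_sub(&page->mapcount, 1) - 1 < 0;

		if (last) {
			atomic_fetch_sub(&f->nr_pages_mapped, 1);
			nr++;
		}
	} while (page++, --nr_pages > 0);

	return nr;
}
```

If all four pages are mapped once and page 0 is mapped a second time, removing one mapping from every page in a single batched call reports three pages as fully unmapped, because page 0 still has one mapping left.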


2023-12-20 22:55:01

by David Hildenbrand

Subject: [PATCH v2 24/40] kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert __replace_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/events/uprobes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8b115fc43f041..485bb0389b488 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -198,7 +198,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
set_pte_at_notify(mm, addr, pvmw.pte,
mk_pte(new_page, vma->vm_page_prot));

- page_remove_rmap(old_page, vma, false);
+ folio_remove_rmap_pte(old_folio, old_page, vma);
if (!folio_mapped(old_folio))
folio_free_swap(old_folio);
page_vma_mapped_walk_done(&pvmw);
--
2.43.0


2023-12-20 22:57:32

by David Hildenbrand

Subject: [PATCH v2 26/40] mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert __collapse_huge_page_copy_succeeded() and
collapse_pte_mapped_thp(). While at it, perform some more folio
conversion in __collapse_huge_page_copy_succeeded().

We can get rid of release_pte_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/khugepaged.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 9cdea59fb4c03..15ec9c729ae58 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -494,11 +494,6 @@ static void release_pte_folio(struct folio *folio)
folio_putback_lru(folio);
}

-static void release_pte_page(struct page *page)
-{
- release_pte_folio(page_folio(page));
-}
-
static void release_pte_pages(pte_t *pte, pte_t *_pte,
struct list_head *compound_pagelist)
{
@@ -687,6 +682,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
spinlock_t *ptl,
struct list_head *compound_pagelist)
{
+ struct folio *src_folio;
struct page *src_page;
struct page *tmp;
pte_t *_pte;
@@ -708,16 +704,17 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
}
} else {
src_page = pte_page(pteval);
- if (!PageCompound(src_page))
- release_pte_page(src_page);
+ src_folio = page_folio(src_page);
+ if (!folio_test_large(src_folio))
+ release_pte_folio(src_folio);
/*
* ptl mostly unnecessary, but preempt has to
* be disabled to update the per-cpu stats
- * inside page_remove_rmap().
+ * inside folio_remove_rmap_pte().
*/
spin_lock(ptl);
ptep_clear(vma->vm_mm, address, _pte);
- page_remove_rmap(src_page, vma, false);
+ folio_remove_rmap_pte(src_folio, src_page, vma);
spin_unlock(ptl);
free_page_and_swap_cache(src_page);
}
@@ -1625,7 +1622,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
* PTE dirty? Shmem page is already dirty; file is read-only.
*/
ptep_clear(mm, addr, pte);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
nr_ptes++;
}

--
2.43.0


2023-12-20 22:58:16

by David Hildenbrand

Subject: [PATCH v2 25/40] mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()

Let's convert zap_huge_pmd(), __split_huge_pmd_locked() and
set_pmd_migration_entry(). While at it, perform some more folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1f438326b69bc..e7bc0f38ddc53 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1898,7 +1898,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,

if (pmd_present(orig_pmd)) {
page = pmd_page(orig_pmd);
- page_remove_rmap(page, vma, true);
+ folio_remove_rmap_pmd(page_folio(page), page, vma);
VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
VM_BUG_ON_PAGE(!PageHead(page), page);
} else if (thp_migration_supported()) {
@@ -2433,12 +2433,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
page = pfn_swap_entry_to_page(entry);
} else {
page = pmd_page(old_pmd);
- if (!PageDirty(page) && pmd_dirty(old_pmd))
- set_page_dirty(page);
- if (!PageReferenced(page) && pmd_young(old_pmd))
- SetPageReferenced(page);
- page_remove_rmap(page, vma, true);
- put_page(page);
+ folio = page_folio(page);
+ if (!folio_test_dirty(folio) && pmd_dirty(old_pmd))
+ folio_set_dirty(folio);
+ if (!folio_test_referenced(folio) && pmd_young(old_pmd))
+ folio_set_referenced(folio);
+ folio_remove_rmap_pmd(folio, page, vma);
+ folio_put(folio);
}
add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
return;
@@ -2593,7 +2594,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
pte_unmap(pte - 1);

if (!pmd_migration)
- page_remove_rmap(page, vma, true);
+ folio_remove_rmap_pmd(folio, page, vma);
if (freeze)
put_page(page);

@@ -3536,6 +3537,7 @@ late_initcall(split_huge_pages_debugfs);
int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
struct page *page)
{
+ struct folio *folio = page_folio(page);
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
@@ -3551,14 +3553,14 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdval = pmdp_invalidate(vma, address, pvmw->pmd);

/* See page_try_share_anon_rmap(): invalidate PMD first. */
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
if (anon_exclusive && page_try_share_anon_rmap(page)) {
set_pmd_at(mm, address, pvmw->pmd, pmdval);
return -EBUSY;
}

if (pmd_dirty(pmdval))
- set_page_dirty(page);
+ folio_set_dirty(folio);
if (pmd_write(pmdval))
entry = make_writable_migration_entry(page_to_pfn(page));
else if (anon_exclusive)
@@ -3575,8 +3577,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
if (pmd_uffd_wp(pmdval))
pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
- page_remove_rmap(page, vma, true);
- put_page(page);
+ folio_remove_rmap_pmd(folio, page, vma);
+ folio_put(folio);
trace_set_migration_pmd(address, pmd_val(pmdswp));

return 0;
--
2.43.0


2023-12-20 23:03:00

by David Hildenbrand

Subject: [PATCH v2 28/40] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert zap_pte_range(), the closely-related tlb_flush_rmap_batch(),
and wp_page_copy(). While at it, perform some more folio conversion
in zap_pte_range().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 23 +++++++++++++----------
mm/mmu_gather.c | 2 +-
2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6552ea27b0bfa..eda2181275d9b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1434,6 +1434,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
arch_enter_lazy_mmu_mode();
do {
pte_t ptent = ptep_get(pte);
+ struct folio *folio;
struct page *page;

if (pte_none(ptent))
@@ -1459,21 +1460,22 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
continue;
}

+ folio = page_folio(page);
delay_rmap = 0;
- if (!PageAnon(page)) {
+ if (!folio_test_anon(folio)) {
if (pte_dirty(ptent)) {
- set_page_dirty(page);
+ folio_set_dirty(folio);
if (tlb_delay_rmap(tlb)) {
delay_rmap = 1;
force_flush = 1;
}
}
if (pte_young(ptent) && likely(vma_has_recency(vma)))
- mark_page_accessed(page);
+ folio_mark_accessed(folio);
}
rss[mm_counter(page)]--;
if (!delay_rmap) {
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
if (unlikely(page_mapcount(page) < 0))
print_bad_pte(vma, addr, ptent, page);
}
@@ -1489,6 +1491,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
if (is_device_private_entry(entry) ||
is_device_exclusive_entry(entry)) {
page = pfn_swap_entry_to_page(entry);
+ folio = page_folio(page);
if (unlikely(!should_zap_page(details, page)))
continue;
/*
@@ -1500,8 +1503,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
WARN_ON_ONCE(!vma_is_anonymous(vma));
rss[mm_counter(page)]--;
if (is_device_private_entry(entry))
- page_remove_rmap(page, vma, false);
- put_page(page);
+ folio_remove_rmap_pte(folio, page, vma);
+ folio_put(folio);
} else if (!non_swap_entry(entry)) {
/* Genuine swap entry, hence a private anon page */
if (!should_zap_cows(details))
@@ -3220,10 +3223,10 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* threads.
*
* The critical issue is to order this
- * page_remove_rmap with the ptp_clear_flush above.
- * Those stores are ordered by (if nothing else,)
+ * folio_remove_rmap_pte() with the ptp_clear_flush
+ * above. Those stores are ordered by (if nothing else,)
* the barrier present in the atomic_add_negative
- * in page_remove_rmap.
+ * in folio_remove_rmap_pte();
*
* Then the TLB flush in ptep_clear_flush ensures that
* no process can access the old page before the
@@ -3232,7 +3235,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* mapcount is visible. So transitively, TLBs to
* old page will be flushed before it can be reused.
*/
- page_remove_rmap(vmf->page, vma, false);
+ folio_remove_rmap_pte(old_folio, vmf->page, vma);
}

/* Free the old page.. */
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 4f559f4ddd217..604ddf08affed 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -55,7 +55,7 @@ static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_

if (encoded_page_flags(enc)) {
struct page *page = encoded_page_ptr(enc);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(page_folio(page), page, vma);
}
}
}
--
2.43.0


2023-12-20 23:03:26

by David Hildenbrand

Subject: [PATCH v2 29/40] mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert migrate_vma_collect_pmd(). While at it, perform more
folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate_device.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 81193363f8cd5..39b7754480c67 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -107,6 +107,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

for (; addr < end; addr += PAGE_SIZE, ptep++) {
unsigned long mpfn = 0, pfn;
+ struct folio *folio;
struct page *page;
swp_entry_t entry;
pte_t pte;
@@ -168,41 +169,43 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
}

/*
- * By getting a reference on the page we pin it and that blocks
+ * By getting a reference on the folio we pin it and that blocks
* any kind of migration. Side effect is that it "freezes" the
* pte.
*
- * We drop this reference after isolating the page from the lru
- * for non device page (device page are not on the lru and thus
+ * We drop this reference after isolating the folio from the lru
+ * for non device folio (device folio are not on the lru and thus
* can't be dropped from it).
*/
- get_page(page);
+ folio = page_folio(page);
+ folio_get(folio);

/*
- * We rely on trylock_page() to avoid deadlock between
+ * We rely on folio_trylock() to avoid deadlock between
* concurrent migrations where each is waiting on the others
- * page lock. If we can't immediately lock the page we fail this
+ * folio lock. If we can't immediately lock the folio we fail this
* migration as it is only best effort anyway.
*
- * If we can lock the page it's safe to set up a migration entry
- * now. In the common case where the page is mapped once in a
+ * If we can lock the folio it's safe to set up a migration entry
+ * now. In the common case where the folio is mapped once in a
* single process setting up the migration entry now is an
* optimisation to avoid walking the rmap later with
* try_to_migrate().
*/
- if (trylock_page(page)) {
+ if (folio_trylock(folio)) {
bool anon_exclusive;
pte_t swp_pte;

flush_cache_page(vma, addr, pte_pfn(pte));
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = folio_test_anon(folio) &&
+ PageAnonExclusive(page);
if (anon_exclusive) {
pte = ptep_clear_flush(vma, addr, ptep);

if (page_try_share_anon_rmap(page)) {
set_pte_at(mm, addr, ptep, pte);
- unlock_page(page);
- put_page(page);
+ folio_unlock(folio);
+ folio_put(folio);
mpfn = 0;
goto next;
}
@@ -214,7 +217,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pte))
- folio_mark_dirty(page_folio(page));
+ folio_mark_dirty(folio);

/* Setup special migration page table entry */
if (mpfn & MIGRATE_PFN_WRITE)
@@ -248,16 +251,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

/*
* This is like regular unmap: we remove the rmap and
- * drop page refcount. Page won't be freed, as we took
- * a reference just above.
+ * drop the folio refcount. The folio won't be freed, as
+ * we took a reference just above.
*/
- page_remove_rmap(page, vma, false);
- put_page(page);
+ folio_remove_rmap_pte(folio, page, vma);
+ folio_put(folio);

if (pte_present(pte))
unmapped++;
} else {
- put_page(page);
+ folio_put(folio);
mpfn = 0;
}

--
2.43.0


2023-12-20 23:05:24

by David Hildenbrand

Subject: [PATCH v2 30/40] mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert try_to_unmap_one(), try_to_migrate_one() and
page_make_device_exclusive_one().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/rmap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 1273180753953..a3ec2be484cfc 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1647,7 +1647,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,

/*
* When racing against e.g. zap_pte_range() on another cpu,
- * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
* try_to_unmap() may return before page_mapped() has become false,
* if page table locking is skipped: use TTU_SYNC to wait for that.
*/
@@ -1928,7 +1928,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (unlikely(folio_test_hugetlb(folio)))
hugetlb_remove_rmap(folio);
else
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -1996,7 +1996,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,

/*
* When racing against e.g. zap_pte_range() on another cpu,
- * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
* try_to_migrate() may return before page_mapped() has become false,
* if page table locking is skipped: use TTU_SYNC to wait for that.
*/
@@ -2289,7 +2289,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
if (unlikely(folio_test_hugetlb(folio)))
hugetlb_remove_rmap(folio);
else
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -2428,7 +2428,7 @@ static bool page_make_device_exclusive_one(struct folio *folio,
* There is a reference on the page for the swap entry which has
* been removed, so shouldn't take another.
*/
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
}

mmu_notifier_invalidate_range_end(&range);
--
2.43.0


2023-12-20 23:06:32

by David Hildenbrand

Subject: [PATCH v2 31/40] Documentation: stop referring to page_remove_rmap()

Refer to folio_remove_rmap_*() instead.

Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/mm/transhuge.rst | 2 +-
Documentation/mm/unevictable-lru.rst | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index 9a607059ea11c..cf81272a6b8b6 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -156,7 +156,7 @@ Partial unmap and deferred_split_folio()

Unmapping part of THP (with munmap() or other way) is not going to free
memory immediately. Instead, we detect that a subpage of THP is not in use
-in page_remove_rmap() and queue the THP for splitting if memory pressure
+in folio_remove_rmap_*() and queue the THP for splitting if memory pressure
comes. Splitting will free up unused subpages.

Splitting the page right away is not an option due to locking context in
diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
index 67f1338440a50..b6a07a26b10d5 100644
--- a/Documentation/mm/unevictable-lru.rst
+++ b/Documentation/mm/unevictable-lru.rst
@@ -486,7 +486,7 @@ munlock the pages if we're removing the last VM_LOCKED VMA that maps the pages.
Before the unevictable/mlock changes, mlocking did not mark the pages in any
way, so unmapping them required no processing.

-For each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
+For each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls
munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).

@@ -511,7 +511,7 @@ userspace; truncation even unmaps and deletes any private anonymous pages
which had been Copied-On-Write from the file pages now being truncated.

Mlocked pages can be munlocked and deleted in this way: like with munmap(),
-for each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
+for each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls
munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).

--
2.43.0
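
The "partial unmap" rule that transhuge.rst refers to corresponds to a small condition in the rmap removal path: a large anonymous folio is queued for deferred splitting when it is unmapped at PTE granularity, or when a PMD unmap leaves fewer pages newly unmapped than the PMD covered. Below is a hedged sketch of that condition only. The names are hypothetical, the folio tests are passed in as booleans, and whether the kernel reaches this check at all in a given call is not modeled.

```c
#include <assert.h>	/* used by the checks described in the note below */
#include <stdbool.h>

/* Hypothetical stand-in for the granularity of the unmap operation. */
enum split_level { SPLIT_LEVEL_PTE, SPLIT_LEVEL_PMD };

/*
 * Mirrors the shape of:
 *	if (folio_test_large(folio) && folio_test_anon(folio))
 *		if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
 *			deferred_split_folio(folio);
 * but returns the decision instead of queueing anything.
 */
bool sketch_wants_deferred_split(bool large, bool anon,
				 enum split_level level,
				 int nr, int nr_pmdmapped)
{
	if (!large || !anon)
		return false;
	return level == SPLIT_LEVEL_PTE || nr < nr_pmdmapped;
}
```

In this model, small folios and file-backed folios are never queued, and a PMD unmap in which every covered page became unmapped (nr == nr_pmdmapped) is not queued either.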


2023-12-20 23:07:01

by David Hildenbrand

Subject: [PATCH v2 32/40] mm/rmap: remove page_remove_rmap()

All callers are gone, let's remove it and some leftover traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 4 +---
mm/filemap.c | 10 +++++-----
mm/internal.h | 2 +-
mm/memory-failure.c | 4 ++--
mm/rmap.c | 23 ++---------------------
5 files changed, 11 insertions(+), 32 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 57e045093f047..fef369e37039a 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -241,8 +241,6 @@ void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
folio_add_file_rmap_ptes(folio, page, 1, vma)
void folio_add_file_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
-void page_remove_rmap(struct page *, struct vm_area_struct *,
- bool compound);
void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
struct vm_area_struct *);
#define folio_remove_rmap_pte(folio, page, vma) \
@@ -389,7 +387,7 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
*
* This is similar to page_try_dup_anon_rmap(), however, not used during fork()
* to duplicate a mapping, but instead to prepare for KSM or temporarily
- * unmapping a page (swap, migration) via page_remove_rmap().
+ * unmapping a page (swap, migration) via folio_remove_rmap_*().
*
* Marking the page shared can only fail if the page may be pinned; device
* private pages cannot get pinned and consequently this function cannot fail.
diff --git a/mm/filemap.c b/mm/filemap.c
index 67ba56ecdd32a..c8dafe70d4cce 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -113,11 +113,11 @@
* ->i_pages lock (try_to_unmap_one)
* ->lruvec->lru_lock (follow_page->mark_page_accessed)
* ->lruvec->lru_lock (check_pte_range->isolate_lru_page)
- * ->private_lock (page_remove_rmap->set_page_dirty)
- * ->i_pages lock (page_remove_rmap->set_page_dirty)
- * bdi.wb->list_lock (page_remove_rmap->set_page_dirty)
- * ->inode->i_lock (page_remove_rmap->set_page_dirty)
- * ->memcg->move_lock (page_remove_rmap->folio_memcg_lock)
+ * ->private_lock (folio_remove_rmap_pte->set_page_dirty)
+ * ->i_pages lock (folio_remove_rmap_pte->set_page_dirty)
+ * bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty)
+ * ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty)
+ * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock)
* bdi.wb->list_lock (zap_pte_range->set_page_dirty)
* ->inode->i_lock (zap_pte_range->set_page_dirty)
* ->private_lock (zap_pte_range->block_dirty_folio)
diff --git a/mm/internal.h b/mm/internal.h
index 222e63b2dea48..a94355e70bd78 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -651,7 +651,7 @@ folio_within_vma(struct folio *folio, struct vm_area_struct *vma)
* under page table lock for the pte/pmd being added or removed.
*
* mlock is usually called at the end of page_add_*_rmap(), munlock at
- * the end of page_remove_rmap(); but new anon folios are managed by
+ * the end of folio_remove_rmap_*(); but new anon folios are managed by
* folio_add_lru_vma() calling mlock_new_folio().
*/
void mlock_folio(struct folio *folio);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5a23da5eb8c1e..a0d9b4ac7d545 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2315,8 +2315,8 @@ int memory_failure(unsigned long pfn, int flags)
* We use page flags to determine what action should be taken, but
* the flags can be modified by the error containment action. One
* example is an mlocked page, where PG_mlocked is cleared by
- * page_remove_rmap() in try_to_unmap_one(). So to determine page status
- * correctly, we save a copy of the page flags at this time.
+ * folio_remove_rmap_*() in try_to_unmap_one(). So to determine page
+ * status correctly, we save a copy of the page flags at this time.
*/
page_flags = p->flags;

diff --git a/mm/rmap.c b/mm/rmap.c
index a3ec2be484cfc..3ee254a996221 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -470,7 +470,7 @@ void __init anon_vma_init(void)
/*
* Getting a lock on a stable anon_vma from a page off the LRU is tricky!
*
- * Since there is no serialization what so ever against page_remove_rmap()
+ * Since there is no serialization what so ever against folio_remove_rmap_*()
* the best this function can do is return a refcount increased anon_vma
* that might have been relevant to this page.
*
@@ -487,7 +487,7 @@ void __init anon_vma_init(void)
* [ something equivalent to page_mapped_in_vma() ].
*
* Since anon_vma's slab is SLAB_TYPESAFE_BY_RCU and we know from
- * page_remove_rmap() that the anon_vma pointer from page->mapping is valid
+ * folio_remove_rmap_*() that the anon_vma pointer from page->mapping is valid
* if there is a mapcount, we can dereference the anon_vma after observing
* those.
*
@@ -1498,25 +1498,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
#endif
}

-/**
- * page_remove_rmap - take down pte mapping from a page
- * @page: page to remove mapping from
- * @vma: the vm area from which the mapping is removed
- * @compound: uncharge the page as compound or small page
- *
- * The caller needs to hold the pte lock.
- */
-void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!compound))
- folio_remove_rmap_pte(folio, page, vma);
- else
- folio_remove_rmap_pmd(folio, page, vma);
-}
-
static __always_inline void __folio_remove_rmap(struct folio *folio,
struct page *page, int nr_pages, struct vm_area_struct *vma,
enum rmap_level level)
--
2.43.0


2023-12-20 23:07:22

by David Hildenbrand

Subject: [PATCH v2 33/40] mm/rmap: convert page_dup_file_rmap() to folio_dup_file_rmap_[pte|ptes|pmd]()

Let's convert page_dup_file_rmap() like the other rmap functions. As there
is only a single caller, convert that single caller right away and remove
page_dup_file_rmap().

Add folio_dup_file_rmap_ptes() right away: we want to perform rmap
batching during fork() soon.
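
To illustrate the difference between the PTE-batched and the PMD variant, here is a rough userspace model (not kernel code). struct model_page, struct model_folio and model_dup_file_rmap() are hypothetical stand-ins, plain ints replace atomic_t, and the counters start at 0 instead of the kernel's -1:

```c
#include <assert.h>

enum model_level { MODEL_LEVEL_PTE, MODEL_LEVEL_PMD };

/* Stand-in for page->_mapcount (assumption: plain int, starts at 0). */
struct model_page {
	int mapcount;
};

/* Stand-in for a small 8-page folio with folio->_entire_mapcount. */
struct model_folio {
	int entire_mapcount;
	struct model_page pages[8];
};

/*
 * Duplicate the mappings of [page, page + nr_pages): a PTE-level
 * duplication bumps every page in the range once, a PMD-level
 * duplication only bumps the folio-wide counter.
 */
static void model_dup_file_rmap(struct model_folio *folio,
		struct model_page *page, int nr_pages, enum model_level level)
{
	assert(folio && page && nr_pages > 0);

	switch (level) {
	case MODEL_LEVEL_PTE:
		do {
			page->mapcount++;
		} while (page++, --nr_pages > 0);
		break;
	case MODEL_LEVEL_PMD:
		folio->entire_mapcount++;
		break;
	}
}
```

With this model, one batched call covering three pages leaves the same per-page counters as three single-page calls would, without touching the folio-wide counter.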

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 59 ++++++++++++++++++++++++++++++++++++++++----
mm/memory.c | 2 +-
2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index fef369e37039a..7607f862e795d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -308,6 +308,60 @@ static inline void hugetlb_remove_rmap(struct folio *folio)
atomic_dec(&folio->_entire_mapcount);
}

+static __always_inline void __folio_dup_file_rmap(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_level level)
+{
+ __folio_rmap_sanity_checks(folio, page, nr_pages, level);
+
+ switch (level) {
+ case RMAP_LEVEL_PTE:
+ do {
+ atomic_inc(&page->_mapcount);
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_LEVEL_PMD:
+ atomic_inc(&folio->_entire_mapcount);
+ break;
+ }
+}
+
+/**
+ * folio_dup_file_rmap_ptes - duplicate PTE mappings of a page range of a folio
+ * @folio: The folio to duplicate the mappings of
+ * @page: The first page to duplicate the mappings of
+ * @nr_pages: The number of pages of which the mapping will be duplicated
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+static inline void folio_dup_file_rmap_ptes(struct folio *folio,
+ struct page *page, int nr_pages)
+{
+ __folio_dup_file_rmap(folio, page, nr_pages, RMAP_LEVEL_PTE);
+}
+#define folio_dup_file_rmap_pte(folio, page) \
+ folio_dup_file_rmap_ptes(folio, page, 1)
+
+/**
+ * folio_dup_file_rmap_pmd - duplicate a PMD mapping of a page range of a folio
+ * @folio: The folio to duplicate the mapping of
+ * @page: The first page to duplicate the mapping of
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+static inline void folio_dup_file_rmap_pmd(struct folio *folio,
+ struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_dup_file_rmap(folio, page, HPAGE_PMD_NR, RMAP_LEVEL_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
static inline void __page_dup_rmap(struct page *page, bool compound)
{
VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
@@ -322,11 +376,6 @@ static inline void __page_dup_rmap(struct page *page, bool compound)
}
}

-static inline void page_dup_file_rmap(struct page *page, bool compound)
-{
- __page_dup_rmap(page, compound);
-}
-
/**
* page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
* anonymous page
diff --git a/mm/memory.c b/mm/memory.c
index eda2181275d9b..dc2a8e6858179 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -965,7 +965,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
rss[MM_ANONPAGES]++;
} else if (page) {
folio_get(folio);
- page_dup_file_rmap(page, false);
+ folio_dup_file_rmap_pte(folio, page);
rss[mm_counter_file(page)]++;
}

--
2.43.0


2023-12-20 23:07:24

by David Hildenbrand

Subject: [PATCH v2 27/40] mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert replace_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/ksm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 146aa75fa6ff7..716e2f87dd795 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1449,7 +1449,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
set_pte_at_notify(mm, addr, ptep, newpte);

folio = page_folio(page);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
if (!folio_mapped(folio))
folio_free_swap(folio);
folio_put(folio);
--
2.43.0


2023-12-20 23:07:50

by David Hildenbrand

Subject: [PATCH v2 35/40] mm/huge_memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pmd()

Let's convert copy_huge_pmd() and fix up the comment in copy_huge_pud().
While at it, perform more folio conversion in copy_huge_pmd().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e7bc0f38ddc53..edbca08449357 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1275,6 +1275,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
{
spinlock_t *dst_ptl, *src_ptl;
struct page *src_page;
+ struct folio *src_folio;
pmd_t pmd;
pgtable_t pgtable = NULL;
int ret = -ENOMEM;
@@ -1341,11 +1342,12 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,

src_page = pmd_page(pmd);
VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+ src_folio = page_folio(src_page);

- get_page(src_page);
- if (unlikely(page_try_dup_anon_rmap(src_page, true, src_vma))) {
+ folio_get(src_folio);
+ if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) {
/* Page maybe pinned: split and retry the fault on PTEs. */
- put_page(src_page);
+ folio_put(src_folio);
pte_free(dst_mm, pgtable);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
@@ -1454,8 +1456,8 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}

/*
- * TODO: once we support anonymous pages, use page_try_dup_anon_rmap()
- * and split if duplicating fails.
+ * TODO: once we support anonymous pages, use
+ * folio_try_dup_anon_rmap_*() and split if duplicating fails.
*/
pudp_set_wrprotect(src_mm, addr, src_pud);
pud = pud_mkold(pud_wrprotect(pud));
--
2.43.0


2023-12-20 23:08:07

by David Hildenbrand

Subject: [PATCH v2 36/40] mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()

Let's convert copy_nonpresent_pte(). While at it, perform some more
folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index dc2a8e6858179..d995ead7a3933 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -785,6 +785,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
unsigned long vm_flags = dst_vma->vm_flags;
pte_t orig_pte = ptep_get(src_pte);
pte_t pte = orig_pte;
+ struct folio *folio;
struct page *page;
swp_entry_t entry = pte_to_swp_entry(orig_pte);

@@ -829,6 +830,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}
} else if (is_device_private_entry(entry)) {
page = pfn_swap_entry_to_page(entry);
+ folio = page_folio(page);

/*
* Update rss count even for unaddressable pages, as
@@ -839,10 +841,10 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* for unaddressable pages, at some point. But for now
* keep things as they are.
*/
- get_page(page);
+ folio_get(folio);
rss[mm_counter(page)]++;
/* Cannot fail as these pages cannot get pinned. */
- BUG_ON(page_try_dup_anon_rmap(page, false, src_vma));
+ folio_try_dup_anon_rmap_pte(folio, page, src_vma);

/*
* We do not preserve soft-dirty information, because so
@@ -956,7 +958,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
* future.
*/
folio_get(folio);
- if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) {
+ if (unlikely(folio_try_dup_anon_rmap_pte(folio, page, src_vma))) {
/* Page may be pinned, we have to copy. */
folio_put(folio);
return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
--
2.43.0


2023-12-20 23:08:09

by David Hildenbrand

Subject: [PATCH v2 37/40] mm/rmap: remove page_try_dup_anon_rmap()

All users are gone, remove page_try_dup_anon_rmap() and any remaining
traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 850aa74b6724c..0ad2ea2734e4a 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -253,7 +253,7 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

-/* See page_try_dup_anon_rmap() */
+/* See folio_try_dup_anon_rmap_*() */
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
{
@@ -478,16 +478,6 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
#endif
}

-static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
- struct vm_area_struct *vma)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!compound))
- return folio_try_dup_anon_rmap_pte(folio, page, vma);
- return folio_try_dup_anon_rmap_pmd(folio, page, vma);
-}
-
/**
* page_try_share_anon_rmap - try marking an exclusive anonymous page possibly
* shared to prepare for KSM or temporary unmapping
@@ -496,8 +486,8 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
* The caller needs to hold the PT lock and has to have the page table entry
* cleared/invalidated.
*
- * This is similar to page_try_dup_anon_rmap(), however, not used during fork()
- * to duplicate a mapping, but instead to prepare for KSM or temporarily
+ * This is similar to folio_try_dup_anon_rmap_*(), however, not used during
+ * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily
* unmapping a page (swap, migration) via folio_remove_rmap_*().
*
* Marking the page shared can only fail if the page may be pinned; device
--
2.43.0


2023-12-20 23:08:33

by David Hildenbrand

Subject: [PATCH v2 38/40] mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]()

Let's convert it like we converted all the other rmap functions.
Don't introduce folio_try_share_anon_rmap_ptes() for now, as we don't
have a user that wants rmap batching in sight. Pretty easy to add later.

All users are easy to convert -- only ksm.c doesn't use folios yet but
that is left for future work -- so let's just do it in a single shot.

While at it, turn the BUG_ON into a WARN_ON_ONCE.

Note that page_try_share_anon_rmap() so far didn't care about pte/pmd
mappings (no compound parameter). We're changing that so we can perform
better sanity checks and make the code actually more readable/consistent.
For example, __folio_rmap_sanity_checks() will make sure that a PMD
range actually falls completely into the folio.
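
As a sketch of the kind of range check meant here (a userspace model under stated assumptions, not the actual __folio_rmap_sanity_checks() implementation): struct model_folio, MODEL_PMD_NR and both helper functions are hypothetical, and the requirement that a PMD-mapped folio is exactly PMD-sized is an assumption of this model:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a PMD covers 512 base pages (2 MiB with 4 KiB pages). */
#define MODEL_PMD_NR 512UL

enum model_level { MODEL_LEVEL_PTE, MODEL_LEVEL_PMD };

/* Hypothetical folio description: a contiguous run of page frames. */
struct model_folio {
	unsigned long first_pfn;
	unsigned long nr_pages;
};

/* Does [pfn, pfn + nr_pages) fall completely into the folio? */
static bool model_range_in_folio(const struct model_folio *folio,
		unsigned long pfn, unsigned long nr_pages)
{
	unsigned long off;

	assert(folio);
	if (nr_pages == 0 || pfn < folio->first_pfn)
		return false;
	off = pfn - folio->first_pfn;
	/* Written this way to avoid overflow in off + nr_pages. */
	return off < folio->nr_pages && nr_pages <= folio->nr_pages - off;
}

/* Level-aware check: a PMD range must additionally be PMD-sized. */
static bool model_rmap_sanity_ok(const struct model_folio *folio,
		unsigned long pfn, unsigned long nr_pages,
		enum model_level level)
{
	if (!model_range_in_folio(folio, pfn, nr_pages))
		return false;
	if (level == MODEL_LEVEL_PMD)
		return nr_pages == MODEL_PMD_NR &&
		       folio->nr_pages == MODEL_PMD_NR;
	return true;
}
```

Because the level is now passed explicitly, a PMD-level call that starts at a non-head page is rejected, which the old page-only interface could not express.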

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 96 ++++++++++++++++++++++++++++++++------------
mm/gup.c | 2 +-
mm/huge_memory.c | 9 +++--
mm/internal.h | 4 +-
mm/ksm.c | 5 ++-
mm/migrate_device.c | 2 +-
mm/rmap.c | 11 ++---
7 files changed, 89 insertions(+), 40 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0ad2ea2734e4a..fd6fe16fa3583 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -269,7 +269,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
return 0;
}

-/* See page_try_share_anon_rmap() */
+/* See folio_try_share_anon_rmap_*() */
static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
@@ -478,31 +478,15 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
#endif
}

-/**
- * page_try_share_anon_rmap - try marking an exclusive anonymous page possibly
- * shared to prepare for KSM or temporary unmapping
- * @page: the exclusive anonymous page to try marking possibly shared
- *
- * The caller needs to hold the PT lock and has to have the page table entry
- * cleared/invalidated.
- *
- * This is similar to folio_try_dup_anon_rmap_*(), however, not used during
- * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily
- * unmapping a page (swap, migration) via folio_remove_rmap_*().
- *
- * Marking the page shared can only fail if the page may be pinned; device
- * private pages cannot get pinned and consequently this function cannot fail.
- *
- * Returns 0 if marking the page possibly shared succeeded. Returns -EBUSY
- * otherwise.
- */
-static inline int page_try_share_anon_rmap(struct page *page)
+static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_level level)
{
- VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
- VM_BUG_ON_PAGE(!PageAnon(page) || !PageAnonExclusive(page), page);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, level);

- /* device private pages cannot get pinned via GUP. */
- if (unlikely(is_device_private_page(page))) {
+ /* device private folios cannot get pinned via GUP. */
+ if (unlikely(folio_is_device_private(folio))) {
ClearPageAnonExclusive(page);
return 0;
}
@@ -553,7 +537,7 @@ static inline int page_try_share_anon_rmap(struct page *page)
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_mb();

- if (unlikely(page_maybe_dma_pinned(page)))
+ if (unlikely(folio_maybe_dma_pinned(folio)))
return -EBUSY;
ClearPageAnonExclusive(page);

@@ -566,6 +550,68 @@ static inline int page_try_share_anon_rmap(struct page *page)
return 0;
}

+/**
+ * folio_try_share_anon_rmap_pte - try marking an exclusive anonymous page
+ * mapped by a PTE possibly shared to prepare
+ * for KSM or temporary unmapping
+ * @folio: The folio to share a mapping of
+ * @page: The mapped exclusive page
+ *
+ * The caller needs to hold the page table lock and has to have the page table
+ * entries cleared/invalidated.
+ *
+ * This is similar to folio_try_dup_anon_rmap_pte(), however, not used during
+ * fork() to duplicate mappings, but instead to prepare for KSM or temporarily
+ * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pte().
+ *
+ * Marking the mapped page shared can only fail if the folio may be pinned;
+ * device private folios cannot get pinned and consequently this function cannot
+ * fail.
+ *
+ * Returns 0 if marking the mapped page possibly shared succeeded. Returns
+ * -EBUSY otherwise.
+ */
+static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
+ struct page *page)
+{
+ return __folio_try_share_anon_rmap(folio, page, 1, RMAP_LEVEL_PTE);
+}
+
+/**
+ * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
+ * range mapped by a PMD possibly shared to
+ * prepare for temporary unmapping
+ * @folio: The folio to share the mapping of
+ * @page: The first page to share the mapping of
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock and has to have the page table
+ * entries cleared/invalidated.
+ *
+ * This is similar to folio_try_dup_anon_rmap_pmd(), however, not used during
+ * fork() to duplicate a mapping, but instead to prepare for temporarily
+ * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pmd().
+ *
+ * Marking the mapped pages shared can only fail if the folio may be pinned;
+ * device private folios cannot get pinned and consequently this function cannot
+ * fail.
+ *
+ * Returns 0 if marking the mapped pages possibly shared succeeded. Returns
+ * -EBUSY otherwise.
+ */
+static inline int folio_try_share_anon_rmap_pmd(struct folio *folio,
+ struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ return __folio_try_share_anon_rmap(folio, page, HPAGE_PMD_NR,
+ RMAP_LEVEL_PMD);
+#else
+ WARN_ON_ONCE(true);
+ return -EBUSY;
+#endif
+}
+
/*
* Called from mm/vmscan.c to handle paging out
*/
diff --git a/mm/gup.c b/mm/gup.c
index 0a5f0e91bfec5..df83182ec72d5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -177,7 +177,7 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
/*
* Adjust the pincount before re-checking the PTE for changes.
* This is essentially a smp_mb() and is paired with a memory
- * barrier in page_try_share_anon_rmap().
+ * barrier in folio_try_share_anon_rmap_*().
*/
smp_mb__after_atomic();

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index edbca08449357..ed0f66545e9fb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2523,10 +2523,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
* In case we cannot clear PageAnonExclusive(), split the PMD
* only and let try_to_migrate_one() fail later.
*
- * See page_try_share_anon_rmap(): invalidate PMD first.
+ * See folio_try_share_anon_rmap_pmd(): invalidate PMD first.
*/
anon_exclusive = PageAnonExclusive(page);
- if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
+ if (freeze && anon_exclusive &&
+ folio_try_share_anon_rmap_pmd(folio, page))
freeze = false;
if (!freeze) {
rmap_t rmap_flags = RMAP_NONE;
@@ -3554,9 +3555,9 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
pmdval = pmdp_invalidate(vma, address, pvmw->pmd);

- /* See page_try_share_anon_rmap(): invalidate PMD first. */
+ /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
- if (anon_exclusive && page_try_share_anon_rmap(page)) {
+ if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
set_pmd_at(mm, address, pvmw->pmd, pmdval);
return -EBUSY;
}
diff --git a/mm/internal.h b/mm/internal.h
index a94355e70bd78..29589bc3f046d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1047,7 +1047,7 @@ enum {
* * Ordinary GUP: Using the PT lock
* * GUP-fast and fork(): mm->write_protect_seq
* * GUP-fast and KSM or temporary unmapping (swap, migration): see
- * page_try_share_anon_rmap()
+ * folio_try_share_anon_rmap_*()
*
* Must be called with the (sub)page that's actually referenced via the
* page table entry, which might not necessarily be the head page for a
@@ -1090,7 +1090,7 @@ static inline bool gup_must_unshare(struct vm_area_struct *vma,
return is_cow_mapping(vma->vm_flags);
}

- /* Paired with a memory barrier in page_try_share_anon_rmap(). */
+ /* Paired with a memory barrier in folio_try_share_anon_rmap_*(). */
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_rmb();

diff --git a/mm/ksm.c b/mm/ksm.c
index 716e2f87dd795..8c001819cf10f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1331,8 +1331,9 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
goto out_unlock;
}

- /* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive && page_try_share_anon_rmap(page)) {
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
+ if (anon_exclusive &&
+ folio_try_share_anon_rmap_pte(page_folio(page), page)) {
set_pte_at(mm, pvmw.address, pvmw.pte, entry);
goto out_unlock;
}
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 39b7754480c67..b6c27c76e1a0b 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -202,7 +202,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (anon_exclusive) {
pte = ptep_clear_flush(vma, addr, ptep);

- if (page_try_share_anon_rmap(page)) {
+ if (folio_try_share_anon_rmap_pte(folio, page)) {
set_pte_at(mm, addr, ptep, pte);
folio_unlock(folio);
folio_put(folio);
diff --git a/mm/rmap.c b/mm/rmap.c
index 3ee254a996221..6209e65985a26 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1866,9 +1866,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}

- /* See page_try_share_anon_rmap(): clear PTE first. */
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
+ folio_try_share_anon_rmap_pte(folio, subpage)) {
swap_free(entry);
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
@@ -2142,7 +2142,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pte_t swp_pte;

if (anon_exclusive)
- BUG_ON(page_try_share_anon_rmap(subpage));
+ WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio,
+ subpage));

/*
* Store the pfn of the page in a special migration
@@ -2213,7 +2214,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) &&
!anon_exclusive, subpage);

- /* See page_try_share_anon_rmap(): clear PTE first. */
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
if (folio_test_hugetlb(folio)) {
if (anon_exclusive &&
hugetlb_try_share_anon_rmap(folio)) {
@@ -2224,7 +2225,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
} else if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
+ folio_try_share_anon_rmap_pte(folio, subpage)) {
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done(&pvmw);
--
2.43.0


2023-12-20 23:08:50

by David Hildenbrand

Subject: [PATCH v2 39/40] mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED

We removed all "bool compound" and RMAP_COMPOUND parameters. Let's
remove the remaining "compound" terminology by making COMPOUND_MAPPED
match the "folio->_entire_mapcount" terminology, renaming it to
ENTIRELY_MAPPED.

ENTIRELY_MAPPED is only used when the whole folio is mapped using a single
page table entry (e.g., a single PMD mapping a PMD-sized THP). For now,
we don't support mapping any THP bigger than that, so ENTIRELY_MAPPED
applies only to PMD-mapped PMD-sized THP.
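
For illustration, the encoding of folio->_nr_pages_mapped can be modeled in userspace as follows. This is a single-threaded sketch: the MODEL_* names and helpers are hypothetical, plain ints replace atomics, and the races handled by the real code are ignored. The low bits count PTE-mapped pages, and the ENTIRELY_MAPPED bit sits above the largest such count (16 GiB / 4 KiB = 0x400000):

```c
#include <assert.h>

#define MODEL_ENTIRELY_MAPPED	0x800000
#define MODEL_PAGES_MAPPED_MASK	(MODEL_ENTIRELY_MAPPED - 1)

/* Number of individually PTE-mapped pages encoded in the counter. */
static int model_pte_mapped_pages(int nr_pages_mapped)
{
	return nr_pages_mapped & MODEL_PAGES_MAPPED_MASK;
}

/* Is the folio mapped by at least one entire (PMD) mapping? */
static int model_is_entirely_mapped(int nr_pages_mapped)
{
	return nr_pages_mapped >= MODEL_ENTIRELY_MAPPED;
}

/* Largest per-page count considered: a 16 GiB folio in 4 KiB pages. */
static unsigned long long model_max_small_pages(void)
{
	return (16ULL << 30) / (4ULL << 10);
}

/*
 * Model of the first PMD mapping being added: returns how many pages
 * became newly mapped, i.e. were not already mapped via PTEs.
 */
static int model_add_pmd_mapping(int *nr_pages_mapped, int folio_nr_pages)
{
	int nr;

	assert(nr_pages_mapped && folio_nr_pages > 0);
	*nr_pages_mapped += MODEL_ENTIRELY_MAPPED;
	nr = folio_nr_pages - model_pte_mapped_pages(*nr_pages_mapped);
	return nr < 0 ? 0 : nr;
}
```

For a 512-page folio that already has 3 pages mapped by PTEs, adding the first PMD mapping reports 509 newly mapped pages in this model, and the counter ends up at ENTIRELY_MAPPED + 3.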

Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/mm/transhuge.rst | 2 +-
mm/internal.h | 6 +++---
mm/rmap.c | 18 +++++++++---------
3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index cf81272a6b8b6..93c9239b9ebe2 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -117,7 +117,7 @@ pages:

- map/unmap of a PMD entry for the whole THP increment/decrement
folio->_entire_mapcount and also increment/decrement
- folio->_nr_pages_mapped by COMPOUND_MAPPED when _entire_mapcount
+ folio->_nr_pages_mapped by ENTIRELY_MAPPED when _entire_mapcount
goes from -1 to 0 or 0 to -1.

- map/unmap of individual pages with PTE entry increment/decrement
diff --git a/mm/internal.h b/mm/internal.h
index 29589bc3f046d..188807d2aebc5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -54,12 +54,12 @@ void page_writeback_init(void);

/*
* If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
- * its nr_pages_mapped would be 0x400000: choose the COMPOUND_MAPPED bit
+ * its nr_pages_mapped would be 0x400000: choose the ENTIRELY_MAPPED bit
* above that range, instead of 2*(PMD_SIZE/PAGE_SIZE). Hugetlb currently
* leaves nr_pages_mapped at 0, but avoid surprise if it participates later.
*/
-#define COMPOUND_MAPPED 0x800000
-#define FOLIO_PAGES_MAPPED (COMPOUND_MAPPED - 1)
+#define ENTIRELY_MAPPED 0x800000
+#define FOLIO_PAGES_MAPPED (ENTIRELY_MAPPED - 1)

/*
* Flags passed to __show_mem() and show_free_areas() to suppress output in
diff --git a/mm/rmap.c b/mm/rmap.c
index 6209e65985a26..f5d43edad529a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1172,7 +1172,7 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
first = atomic_inc_and_test(&page->_mapcount);
if (first && folio_test_large(folio)) {
first = atomic_inc_return_relaxed(mapped);
- first = (first < COMPOUND_MAPPED);
+ first = (first < ENTIRELY_MAPPED);
}

if (first)
@@ -1182,15 +1182,15 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
case RMAP_LEVEL_PMD:
first = atomic_inc_and_test(&folio->_entire_mapcount);
if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
+ nr = atomic_add_return_relaxed(ENTIRELY_MAPPED, mapped);
+ if (likely(nr < ENTIRELY_MAPPED + ENTIRELY_MAPPED)) {
*nr_pmdmapped = folio_nr_pages(folio);
nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
/* Raced ahead of a remove and another add? */
if (unlikely(nr < 0))
nr = 0;
} else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
+ /* Raced ahead of a remove of ENTIRELY_MAPPED */
nr = 0;
}
}
@@ -1433,7 +1433,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
} else {
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
- atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
+ atomic_set(&folio->_nr_pages_mapped, ENTIRELY_MAPPED);
SetPageAnonExclusive(&folio->page);
__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
}
@@ -1514,7 +1514,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
last = atomic_add_negative(-1, &page->_mapcount);
if (last && folio_test_large(folio)) {
last = atomic_dec_return_relaxed(mapped);
- last = (last < COMPOUND_MAPPED);
+ last = (last < ENTIRELY_MAPPED);
}

if (last)
@@ -1524,15 +1524,15 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
case RMAP_LEVEL_PMD:
last = atomic_add_negative(-1, &folio->_entire_mapcount);
if (last) {
- nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED)) {
+ nr = atomic_sub_return_relaxed(ENTIRELY_MAPPED, mapped);
+ if (likely(nr < ENTIRELY_MAPPED)) {
nr_pmdmapped = folio_nr_pages(folio);
nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
/* Raced ahead of another remove and an add? */
if (unlikely(nr < 0))
nr = 0;
} else {
- /* An add of COMPOUND_MAPPED raced ahead */
+ /* An add of ENTIRELY_MAPPED raced ahead */
nr = 0;
}
}
--
2.43.0


2023-12-20 23:08:54

by David Hildenbrand

Subject: [PATCH v2 40/40] mm: remove one last reference to page_add_*_rmap()

Let's fix up one remaining comment. Note that the only trace remaining of
the old rmap interface is in an example in Documentation/trace/ftrace.rst,
that we'll just leave alone.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/internal.h b/mm/internal.h
index 188807d2aebc5..ac40c3d003368 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -650,7 +650,7 @@ folio_within_vma(struct folio *folio, struct vm_area_struct *vma)
* should be called with vma's mmap_lock held for read or write,
* under page table lock for the pte/pmd being added or removed.
*
- * mlock is usually called at the end of page_add_*_rmap(), munlock at
+ * mlock is usually called at the end of folio_add_*_rmap_*(), munlock at
* the end of folio_remove_rmap_*(); but new anon folios are managed by
* folio_add_lru_vma() calling mlock_new_folio().
*/
--
2.43.0


2023-12-20 23:20:17

by David Hildenbrand

Subject: [PATCH v2 34/40] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

The last users of page_needs_cow_for_dma() and __page_dup_rmap() are gone;
remove them.

Add folio_try_dup_anon_rmap_ptes() right away: we want to perform rmap
batching during fork() soon.
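
As a rough userspace illustration of the PTE-batched duplication (not kernel code): struct model_page and model_try_dup_anon_ptes() are hypothetical, plain fields replace page flags and atomics, and the maybe_pinned decision is simply passed in. The range is first checked as a whole, so a failure leaves all pages untouched:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the per-page state relevant here (assumption). */
struct model_page {
	bool anon_exclusive;	/* models PageAnonExclusive() */
	int mapcount;		/* models page->_mapcount, starting at 0 */
};

/*
 * Try duplicating the PTE mappings of [page, page + nr_pages).
 * Returns -EBUSY without modifying anything if the folio may be
 * pinned and any page in the range is still exclusive; otherwise
 * marks all pages possibly shared and bumps their mapcounts.
 */
static int model_try_dup_anon_ptes(struct model_page *page, int nr_pages,
		bool maybe_pinned)
{
	int i;

	assert(page && nr_pages > 0);

	if (maybe_pinned) {
		for (i = 0; i < nr_pages; i++)
			if (page[i].anon_exclusive)
				return -EBUSY;
	}
	for (i = 0; i < nr_pages; i++) {
		page[i].anon_exclusive = false;
		page[i].mapcount++;
	}
	return 0;
}
```

In this model, a range that is already fully shared can still be duplicated while the folio may be pinned; only exclusive pages force the caller to fall back to copying.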

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 6 --
include/linux/rmap.h | 150 ++++++++++++++++++++++++++++++-------------
2 files changed, 106 insertions(+), 50 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ae547b62f3252..30edf3f7d1f38 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1975,12 +1975,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
return folio_maybe_dma_pinned(folio);
}

-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
-{
- return folio_needs_cow_for_dma(vma, page_folio(page));
-}
-
/**
* is_zero_page - Query if a page is a zero page
* @page: The page to query
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 7607f862e795d..850aa74b6724c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -362,68 +362,130 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio,
#endif
}

-static inline void __page_dup_rmap(struct page *page, bool compound)
+static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *src_vma,
+ enum rmap_level level)
{
- VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+ bool maybe_pinned;
+ int i;
+
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, level);

- if (compound) {
- struct folio *folio = (struct folio *)page;
+ /*
+ * If this folio may have been pinned by the parent process, do not
+ * allow duplicating the mappings. Instead, require the caller to,
+ * e.g., copy the subpage immediately for the child, so that the
+ * pinned folio is guaranteed not to be randomly replaced on future
+ * write faults.
+ */
+ maybe_pinned = likely(!folio_is_device_private(folio)) &&
+ unlikely(folio_needs_cow_for_dma(src_vma, folio));

- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+ /*
+ * No need to check+clear for already shared PTEs/PMDs of the
+ * folio. But if any page is PageAnonExclusive, we must fall back to
+ * copying if the folio may be pinned.
+ */
+ switch (level) {
+ case RMAP_LEVEL_PTE:
+ if (unlikely(maybe_pinned)) {
+ for (i = 0; i < nr_pages; i++)
+ if (PageAnonExclusive(page + i))
+ return -EBUSY;
+ }
+ do {
+ if (PageAnonExclusive(page))
+ ClearPageAnonExclusive(page);
+ atomic_inc(&page->_mapcount);
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_LEVEL_PMD:
+ if (PageAnonExclusive(page)) {
+ if (unlikely(maybe_pinned))
+ return -EBUSY;
+ ClearPageAnonExclusive(page);
+ }
atomic_inc(&folio->_entire_mapcount);
- } else {
- atomic_inc(&page->_mapcount);
+ break;
}
+ return 0;
}

/**
- * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
- * anonymous page
- * @page: the page to duplicate the mapping for
- * @compound: the page is mapped as compound or as a small page
- * @vma: the source vma
+ * folio_try_dup_anon_rmap_ptes - try duplicating PTE mappings of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mappings of
+ * @page: The first page to duplicate the mappings of
+ * @nr_pages: The number of pages of which the mapping will be duplicated
+ * @src_vma: The vm area from which the mappings are duplicated
*
- * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq.
+ * The page range of the folio is defined by [page, page + nr_pages)
*
- * Duplicating the mapping can only fail if the page may be pinned; device
- * private pages cannot get pinned and consequently this function cannot fail.
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
+ *
+ * Duplicating the mappings can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail
+ * for them.
+ *
+ * If duplicating the mappings succeeded, the duplicated PTEs have to be R/O in
+ * the parent and the child. They must *not* be writable after this call
+ * succeeded.
+ *
+ * Returns 0 if duplicating the mappings succeeded. Returns -EBUSY otherwise.
+ */
+static inline int folio_try_dup_anon_rmap_ptes(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *src_vma)
+{
+ return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma,
+ RMAP_LEVEL_PTE);
+}
+#define folio_try_dup_anon_rmap_pte(folio, page, vma) \
+ folio_try_dup_anon_rmap_ptes(folio, page, 1, vma)
+
+/**
+ * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mapping of
+ * @page: The first page to duplicate the mapping of
+ * @src_vma: The vm area from which the mapping is duplicated
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
*
- * If duplicating the mapping succeeds, the page has to be mapped R/O into
- * the parent and the child. It must *not* get mapped writable after this call.
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
+ *
+ * Duplicating the mapping can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail
+ * for them.
+ *
+ * If duplicating the mapping succeeds, the duplicated PMD has to be R/O in
+ * the parent and the child. They must *not* be writable after this call
+ * succeeded.
*
* Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise.
*/
+static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
+ struct page *page, struct vm_area_struct *src_vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ return __folio_try_dup_anon_rmap(folio, page, HPAGE_PMD_NR, src_vma,
+ RMAP_LEVEL_PMD);
+#else
+ WARN_ON_ONCE(true);
+ return -EBUSY;
+#endif
+}
+
static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
struct vm_area_struct *vma)
{
- VM_BUG_ON_PAGE(!PageAnon(page), page);
-
- /*
- * No need to check+clear for already shared pages, including KSM
- * pages.
- */
- if (!PageAnonExclusive(page))
- goto dup;
-
- /*
- * If this page may have been pinned by the parent process,
- * don't allow to duplicate the mapping but instead require to e.g.,
- * copy the page immediately for the child so that we'll always
- * guarantee the pinned page won't be randomly replaced in the
- * future on write faults.
- */
- if (likely(!is_device_private_page(page)) &&
- unlikely(page_needs_cow_for_dma(vma, page)))
- return -EBUSY;
+ struct folio *folio = page_folio(page);

- ClearPageAnonExclusive(page);
- /*
- * It's okay to share the anon page between both processes, mapping
- * the page R/O into both processes.
- */
-dup:
- __page_dup_rmap(page, compound);
- return 0;
+ if (likely(!compound))
+ return folio_try_dup_anon_rmap_pte(folio, page, vma);
+ return folio_try_dup_anon_rmap_pmd(folio, page, vma);
}

/**
--
2.43.0
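
The PTE-level branch of __folio_try_dup_anon_rmap() above works in two passes: it first scans the whole range and returns -EBUSY before modifying anything, and only then clears the exclusive marker and bumps the per-page mapcount. A minimal userspace sketch of that all-or-nothing behaviour follows. Everything prefixed toy_ is hypothetical and not kernel API; the mapcount is a plain int here, whereas the kernel uses an atomic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the state the dup path touches. */
struct toy_page {
	bool anon_exclusive;	/* models PageAnonExclusive() */
	int mapcount;		/* models page->_mapcount (atomic in the kernel) */
};

/*
 * Model of the RMAP_LEVEL_PTE branch: either every mapping in
 * [page, page + nr_pages) is duplicated, or none is and -EBUSY is returned.
 */
static inline int toy_try_dup_anon_rmap_ptes(struct toy_page *page,
					     int nr_pages, bool maybe_pinned)
{
	int i;

	/* Pass 1: bail out before touching anything if a copy is required. */
	if (maybe_pinned) {
		for (i = 0; i < nr_pages; i++)
			if (page[i].anon_exclusive)
				return -EBUSY;
	}

	/* Pass 2: safe to share; drop exclusivity and count the new mapping. */
	for (i = 0; i < nr_pages; i++) {
		page[i].anon_exclusive = false;
		page[i].mapcount++;
	}
	return 0;
}
```

Checking the whole range first means a caller that gets -EBUSY sees the pages exactly as they were and can fall back to copying without having to undo partial updates.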


2023-12-21 02:55:50

by Muchun Song

Subject: Re: [PATCH v2 02/40] mm/rmap: introduce and use hugetlb_remove_rmap()



> On Dec 21, 2023, at 06:44, David Hildenbrand <[email protected]> wrote:
>
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
> code from page_remove_rmap(). This effectively removes one check on the
> small-folio path as well.
>
> Add sanity checks that we end up with the right folios in the right
> functions.
>
> Note: all possible candidates that need care are page_remove_rmap() calls
> that pass compound=true.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Reviewed-by: Ryan Roberts <[email protected]>
> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.


2023-12-21 02:59:00

by Muchun Song

Subject: Re: [PATCH v2 03/40] mm/rmap: introduce and use hugetlb_add_file_rmap()



> On Dec 21, 2023, at 06:44, David Hildenbrand <[email protected]> wrote:
>
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Right now we're using page_dup_file_rmap() in some cases where "ordinary"
> rmap code would have used page_add_file_rmap(). So let's introduce and
> use hugetlb_add_file_rmap() instead. We won't be adding a
> "hugetlb_dup_file_rmap()" function for the fork() case, as it would be
> doing the same: "dup" is just an optimization for "add".
>
> What remains is a single page_dup_file_rmap() call in fork() code.
>
> Add sanity checks that we end up with the right folios in the right
> functions.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Reviewed-by: Ryan Roberts <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.


2023-12-21 04:33:55

by Matthew Wilcox

Subject: Re: [PATCH v2 01/40] mm/rmap: rename hugepage_add* to hugetlb_add*

On Wed, Dec 20, 2023 at 11:44:25PM +0100, David Hildenbrand wrote:
> Let's just call it "hugetlb_".
>
> Yes, it's all already inconsistent and confusing because we have a lot
> of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
> be confused with transparent huge pages, and it matches "hugetlb.c" and
> "folio_test_hugetlb()". So let's minimize confusion in rmap code.
>
> Reviewed-by: Muchun Song <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>

2023-12-21 04:35:57

by Matthew Wilcox

Subject: Re: [PATCH v2 03/40] mm/rmap: introduce and use hugetlb_add_file_rmap()

On Wed, Dec 20, 2023 at 11:44:27PM +0100, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Right now we're using page_dup_file_rmap() in some cases where "ordinary"
> rmap code would have used page_add_file_rmap(). So let's introduce and
> use hugetlb_add_file_rmap() instead. We won't be adding a
> "hugetlb_dup_file_rmap()" function for the fork() case, as it would be
> doing the same: "dup" is just an optimization for "add".
>
> What remains is a single page_dup_file_rmap() call in fork() code.
>
> Add sanity checks that we end up with the right folios in the right
> functions.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Reviewed-by: Ryan Roberts <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>

2023-12-21 04:40:36

by Matthew Wilcox

Subject: Re: [PATCH v2 04/40] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

On Wed, Dec 20, 2023 at 11:44:28PM +0100, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Add sanity checks that we end up with the right folios in the right
> functions.
>
> Note that is_device_private_page() does not apply to hugetlb.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Reviewed-by: Ryan Roberts <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>

> +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
> + struct folio *folio)

I particularly like it that you introduced this.

> +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> + struct vm_area_struct *vma)
> +{
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> +
> + if (PageAnonExclusive(&folio->page)) {

I wonder if we need a folio_test_hugetlb_anon_exclusive() to make this
a little more ergonomic?

> + if (unlikely(folio_needs_cow_for_dma(vma, folio)))
> + return -EBUSY;
> + ClearPageAnonExclusive(&folio->page);

... and set/clear variants.
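
The helpers suggested above could look roughly like the following userspace sketch. Only the name folio_test_hugetlb_anon_exclusive() comes from the message; the set/clear names, the toy_ types, and the final mapcount increment (the quoted hunk is cut off before that point) are assumptions made for illustration, not the actual kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins; a hugetlb folio keeps the exclusive marker on its head page. */
struct toy_page {
	bool anon_exclusive;
};

struct toy_folio {
	struct toy_page page;	/* models folio->page (the head page) */
	bool maybe_dma_pinned;	/* models folio_needs_cow_for_dma() */
	int entire_mapcount;	/* models folio->_entire_mapcount */
};

/* Hypothetical wrappers hiding the &folio->page detail from callers. */
static inline bool toy_folio_test_hugetlb_anon_exclusive(const struct toy_folio *folio)
{
	return folio->page.anon_exclusive;
}

static inline void toy_folio_set_hugetlb_anon_exclusive(struct toy_folio *folio)
{
	folio->page.anon_exclusive = true;
}

static inline void toy_folio_clear_hugetlb_anon_exclusive(struct toy_folio *folio)
{
	folio->page.anon_exclusive = false;
}

/* Same control flow as the quoted hugetlb_try_dup_anon_rmap() hunk. */
static inline int toy_hugetlb_try_dup_anon_rmap(struct toy_folio *folio)
{
	if (toy_folio_test_hugetlb_anon_exclusive(folio)) {
		if (folio->maybe_dma_pinned)
			return -EBUSY;
		toy_folio_clear_hugetlb_anon_exclusive(folio);
	}
	/* Assumed: account the duplicated mapping of the entire folio. */
	folio->entire_mapcount++;
	return 0;
}
```

With such wrappers, callers would no longer need to know that the marker lives on the head page.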


2023-12-21 05:47:58

by Muchun Song

Subject: Re: [PATCH v2 04/40] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()



> On Dec 21, 2023, at 06:44, David Hildenbrand <[email protected]> wrote:
>
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Add sanity checks that we end up with the right folios in the right
> functions.
>
> Note that is_device_private_page() does not apply to hugetlb.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Reviewed-by: Ryan Roberts <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.

2023-12-21 09:47:27

by David Hildenbrand

Subject: Re: [PATCH v2 04/40] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

On 21.12.23 05:40, Matthew Wilcox wrote:
> On Wed, Dec 20, 2023 at 11:44:28PM +0100, David Hildenbrand wrote:
>> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
>> For example, hugetlb currently only supports entire mappings, and treats
>> any mapping as mapped using a single "logical PTE". Let's move it out
>> of the way so we can overhaul our "ordinary" rmap
>> implementation/interface.
>>
>> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
>> hugetlb handling use dedicated hugetlb_* rmap functions.
>>
>> Add sanity checks that we end up with the right folios in the right
>> functions.
>>
>> Note that is_device_private_page() does not apply to hugetlb.
>>
>> Reviewed-by: Yin Fengwei <[email protected]>
>> Reviewed-by: Ryan Roberts <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>
> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
>

Thanks!

>> +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
>> + struct folio *folio)
>
> I particularly like it that you introduced this.

And a later patch even removes page_needs_cow_for_dma() :)


A note that we have one remaining user of page_maybe_dma_pinned().
Instead of converting that code to folios, we should probably just
remove that pte_is_pinned() handling completely: it's inconsistent (only
checks PTEs) and cannot handle concurrent GUP-fast. It's a leftover from
the COW issues we had before PageAnonExclusive. [I've had a patch lying
around to do that for a long time, but never sent it.]

>
>> +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
>> + struct vm_area_struct *vma)
>> +{
>> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
>> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>> +
>> + if (PageAnonExclusive(&folio->page)) {
>
> I wonder if we need a folio_test_hugetlb_anon_exclusive() to make this
> a little more ergonomic?
>
>> + if (unlikely(folio_needs_cow_for_dma(vma, folio)))
>> + return -EBUSY;
>> + ClearPageAnonExclusive(&folio->page);
>
> ... and set/clear variants.
>

I thought about that as well, and even going a step further and instead
of having PageAnonExclusive checks outside rmap code, have something
like the following instead:

hugetlb_test_anon_rmap_exclusive()
folio_test_anon_rmap_exclusive_[pte|pmd]()

I added that to my TODO list, because it results again in a bigger
patchset (especially also in GUP).

--
Cheers,

David / dhildenb


2024-01-22 17:41:32

by Ryan Roberts

Subject: Re: [PATCH v2 28/40] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

On 20/12/2023 22:44, David Hildenbrand wrote:
> Let's convert zap_pte_range() and closely-related
> tlb_flush_rmap_batch(). While at it, perform some more folio conversion
> in zap_pte_range().
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/memory.c | 23 +++++++++++++----------
> mm/mmu_gather.c | 2 +-
> 2 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6552ea27b0bfa..eda2181275d9b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1434,6 +1434,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> arch_enter_lazy_mmu_mode();
> do {
> pte_t ptent = ptep_get(pte);
> + struct folio *folio;
> struct page *page;
>
> if (pte_none(ptent))
> @@ -1459,21 +1460,22 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> continue;
> }
>
> + folio = page_folio(page);
> delay_rmap = 0;
> - if (!PageAnon(page)) {
> + if (!folio_test_anon(folio)) {
> if (pte_dirty(ptent)) {
> - set_page_dirty(page);
> + folio_set_dirty(folio);

Is this foliation change definitely correct? I note that set_page_dirty() is
defined as this:

bool set_page_dirty(struct page *page)
{
return folio_mark_dirty(page_folio(page));
}

And folio_mark_dirty() is doing more than just setting the PG_dirty bit. In my
equivalent change, as part of the contpte series, I've swapped set_page_dirty()
for folio_mark_dirty().


> if (tlb_delay_rmap(tlb)) {
> delay_rmap = 1;
> force_flush = 1;
> }
> }
> if (pte_young(ptent) && likely(vma_has_recency(vma)))
> - mark_page_accessed(page);
> + folio_mark_accessed(folio);
> }
> rss[mm_counter(page)]--;
> if (!delay_rmap) {
> - page_remove_rmap(page, vma, false);
> + folio_remove_rmap_pte(folio, page, vma);
> if (unlikely(page_mapcount(page) < 0))
> print_bad_pte(vma, addr, ptent, page);
> }
> @@ -1489,6 +1491,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> if (is_device_private_entry(entry) ||
> is_device_exclusive_entry(entry)) {
> page = pfn_swap_entry_to_page(entry);
> + folio = page_folio(page);
> if (unlikely(!should_zap_page(details, page)))
> continue;
> /*
> @@ -1500,8 +1503,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> WARN_ON_ONCE(!vma_is_anonymous(vma));
> rss[mm_counter(page)]--;
> if (is_device_private_entry(entry))
> - page_remove_rmap(page, vma, false);
> - put_page(page);
> + folio_remove_rmap_pte(folio, page, vma);
> + folio_put(folio);
> } else if (!non_swap_entry(entry)) {
> /* Genuine swap entry, hence a private anon page */
> if (!should_zap_cows(details))
> @@ -3220,10 +3223,10 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> * threads.
> *
> * The critical issue is to order this
> - * page_remove_rmap with the ptp_clear_flush above.
> - * Those stores are ordered by (if nothing else,)
> + * folio_remove_rmap_pte() with the ptp_clear_flush
> + * above. Those stores are ordered by (if nothing else,)
> * the barrier present in the atomic_add_negative
> - * in page_remove_rmap.
> + * in folio_remove_rmap_pte();
> *
> * Then the TLB flush in ptep_clear_flush ensures that
> * no process can access the old page before the
> @@ -3232,7 +3235,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> * mapcount is visible. So transitively, TLBs to
> * old page will be flushed before it can be reused.
> */
> - page_remove_rmap(vmf->page, vma, false);
> + folio_remove_rmap_pte(old_folio, vmf->page, vma);
> }
>
> /* Free the old page.. */
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 4f559f4ddd217..604ddf08affed 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -55,7 +55,7 @@ static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_
>
> if (encoded_page_flags(enc)) {
> struct page *page = encoded_page_ptr(enc);
> - page_remove_rmap(page, vma, false);
> + folio_remove_rmap_pte(page_folio(page), page, vma);
> }
> }
> }


2024-01-22 17:48:30

by David Hildenbrand

Subject: Re: [PATCH v2 28/40] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

On 22.01.24 17:58, Ryan Roberts wrote:
> On 20/12/2023 22:44, David Hildenbrand wrote:
>> Let's convert zap_pte_range() and closely-related
>> tlb_flush_rmap_batch(). While at it, perform some more folio conversion
>> in zap_pte_range().
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> mm/memory.c | 23 +++++++++++++----------
>> mm/mmu_gather.c | 2 +-
>> 2 files changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 6552ea27b0bfa..eda2181275d9b 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1434,6 +1434,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>> arch_enter_lazy_mmu_mode();
>> do {
>> pte_t ptent = ptep_get(pte);
>> + struct folio *folio;
>> struct page *page;
>>
>> if (pte_none(ptent))
>> @@ -1459,21 +1460,22 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>> continue;
>> }
>>
>> + folio = page_folio(page);
>> delay_rmap = 0;
>> - if (!PageAnon(page)) {
>> + if (!folio_test_anon(folio)) {
>> if (pte_dirty(ptent)) {
>> - set_page_dirty(page);
>> + folio_set_dirty(folio);
>
> Is this foliation change definitely correct? I note that set_page_dirty() is
> defined as this:
>
> bool set_page_dirty(struct page *page)
> {
> return folio_mark_dirty(page_folio(page));
> }
>
> And folio_mark_dirty() is doing more than just setting the PG_dirty bit. In my
> equivalent change, as part of the contpte series, I've swapped set_page_dirty()
> for folio_mark_dirty().

Good catch, that should be folio_mark_dirty(). Let me send a fixup.

(the difference in naming for both functions really is bad)


--
Cheers,

David / dhildenb


2024-01-22 17:51:44

by Matthew Wilcox

Subject: Re: [PATCH v2 28/40] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

On Mon, Jan 22, 2024 at 06:01:58PM +0100, David Hildenbrand wrote:
> > And folio_mark_dirty() is doing more than just setting the PG_dirty bit. In my
> > equivalent change, as part of the contpte series, I've swapped set_page_dirty()
> > for folio_mark_dirty().
>
> Good catch, that should be folio_mark_dirty(). Let me send a fixup.
>
> (the difference in naming for both functions really is bad)

It really is, and I don't know what to do about it.

We need a function that literally just sets the flag. For every other
flag, that's folio_set_FLAG. We can't use __folio_set_flag because that
means "set the flag non-atomically".

We need a function that does all of the work involved with tracking
dirty folios. I chose folio_mark_dirty() to align with
folio_mark_uptodate() (ie mark is not just "set" but also "do some extra
work").

But because we're converting from set_page_dirty(), the OBVIOUS rename
is to folio_set_dirty(), which is WRONG.

So we're in the part of the design space where the consistent naming and
the-obvious-thing-to-do-is-wrong are in collision, and I do not have a
good answer.

Maybe we can call the first function _folio_set_dirty(), and we don't
have a folio_set_dirty() at all? We don't have a folio_set_uptodate(),
so there's some precedent there.
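
The "set" versus "mark" distinction described above can be modelled in a few lines of userspace C. This is only a sketch: the toy_ types are hypothetical, and the per-mapping dirty counter stands in for whatever tracking work the real folio_mark_dirty() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; nr_dirty models the "extra work" of dirty tracking. */
struct toy_mapping {
	int nr_dirty;
};

struct toy_folio {
	bool dirty;			/* models the PG_dirty flag */
	struct toy_mapping *mapping;	/* NULL if there is nothing to account to */
};

/* Models folio_set_dirty(): flips the flag and nothing else. */
static inline void toy_folio_set_dirty(struct toy_folio *folio)
{
	folio->dirty = true;
}

/*
 * Models folio_mark_dirty(): sets the flag and does the tracking work.
 * Returns true only if the folio was newly dirtied.
 */
static inline bool toy_folio_mark_dirty(struct toy_folio *folio)
{
	if (folio->dirty)
		return false;
	folio->dirty = true;
	if (folio->mapping != NULL)
		folio->mapping->nr_dirty++;
	return true;
}
```

In this model, a folio that was only "set" dirty never reaches the accounting, because a later "mark" sees the flag already set and returns early. That is the kind of silent mismatch the naming discussion is worried about.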

2024-01-22 18:06:13

by Ryan Roberts

Subject: Re: [PATCH v2 28/40] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

On 22/01/2024 17:20, Matthew Wilcox wrote:
> On Mon, Jan 22, 2024 at 06:01:58PM +0100, David Hildenbrand wrote:
>>> And folio_mark_dirty() is doing more than just setting the PG_dirty bit. In my
>>> equivalent change, as part of the contpte series, I've swapped set_page_dirty()
>>> for folio_mark_dirty().
>>
>> Good catch, that should be folio_mark_dirty(). Let me send a fixup.
>>
>> (the difference in naming for both functions really is bad)
>
> It really is, and I don't know what to do about it.
>
> We need a function that literally just sets the flag. For every other
> flag, that's folio_set_FLAG. We can't use __folio_set_flag because that
> means "set the flag non-atomically".
>
> We need a function that does all of the work involved with tracking
> dirty folios. I chose folio_mark_dirty() to align with
> folio_mark_uptodate() (ie mark is not just "set" but also "do some extra
> work").
>
> But because we're converting from set_page_dirty(), the OBVIOUS rename
> is to folio_set_dirty(), which is WRONG.
>
> So we're in the part of the design space where the consistent naming and
> the-obvious-thing-to-do-is-wrong are in collision, and I do not have a
> good answer.
>
> Maybe we can call the first function _folio_set_dirty(), and we don't
> have a folio_set_dirty() at all? We don't have a folio_set_uptodate(),
> so there's some precedent there.

Is there anything stopping us from renaming set_page_dirty() to
mark_page_dirty() (or page_mark_dirty())? For me the folio naming is consistent,
but the page names suck; presumably PageSetDirty() and set_page_dirty()... yuk.

2024-01-22 18:09:29

by David Hildenbrand

Subject: Re: [PATCH v2 28/40] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

On 22.01.24 18:20, Matthew Wilcox wrote:
> On Mon, Jan 22, 2024 at 06:01:58PM +0100, David Hildenbrand wrote:
>>> And folio_mark_dirty() is doing more than just setting the PG_dirty bit. In my
>>> equivalent change, as part of the contpte series, I've swapped set_page_dirty()
>>> for folio_mark_dirty().
>>
>> Good catch, that should be folio_mark_dirty(). Let me send a fixup.
>>
>> (the difference in naming for both functions really is bad)
>
> It really is, and I don't know what to do about it.
>
> We need a function that literally just sets the flag. For every other
> flag, that's folio_set_FLAG. We can't use __folio_set_flag because that
> means "set the flag non-atomically".
>
> We need a function that does all of the work involved with tracking
> dirty folios. I chose folio_mark_dirty() to align with
> folio_mark_uptodate() (ie mark is not just "set" but also "do some extra
> work").
>
> But because we're converting from set_page_dirty(), the OBVIOUS rename
> is to folio_set_dirty(), which is WRONG.

And I made the same mistake in at least "mm/huge_memory:
page_remove_rmap() -> folio_remove_rmap_pmd()".

I better double check all these so-simple-looking conversions that just
went upstream.

Interestingly, __split_huge_pmd_locked() used SetPageReferenced()
instead of

>
> So we're in the part of the design space where the consistent naming and
> the-obvious-thing-to-do-is-wrong are in collision, and I do not have a
> good answer.
>
> Maybe we can call the first function _folio_set_dirty(), and we don't
> have a folio_set_dirty() at all? We don't have a folio_set_uptodate(),
> so there's some precedent there.

Good question. This mark vs. set is confusing. We want some way to
highlight that folio_set_dirty() is the one that we usually do not want
to use.

--
Cheers,

David / dhildenb


2024-01-22 18:33:26

by Matthew Wilcox

Subject: Re: [PATCH v2 28/40] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

On Mon, Jan 22, 2024 at 05:26:00PM +0000, Ryan Roberts wrote:
> On 22/01/2024 17:20, Matthew Wilcox wrote:
> > On Mon, Jan 22, 2024 at 06:01:58PM +0100, David Hildenbrand wrote:
> > >>> And folio_mark_dirty() is doing more than just setting the PG_dirty bit. In my
> >>> equivalent change, as part of the contpte series, I've swapped set_page_dirty()
> >>> for folio_mark_dirty().
> >>
> >> Good catch, that should be folio_mark_dirty(). Let me send a fixup.
> >>
> >> (the difference in naming for both functions really is bad)
> >
> > It really is, and I don't know what to do about it.
> >
> > We need a function that literally just sets the flag. For every other
> > flag, that's folio_set_FLAG. We can't use __folio_set_flag because that
> > means "set the flag non-atomically".
> >
> > We need a function that does all of the work involved with tracking
> > dirty folios. I chose folio_mark_dirty() to align with
> > folio_mark_uptodate() (ie mark is not just "set" but also "do some extra
> > work").
> >
> > But because we're converting from set_page_dirty(), the OBVIOUS rename
> > is to folio_set_dirty(), which is WRONG.
> >
> > So we're in the part of the design space where the consistent naming and
> > the-obvious-thing-to-do-is-wrong are in collision, and I do not have a
> > good answer.
> >
> > Maybe we can call the first function _folio_set_dirty(), and we don't
> > have a folio_set_dirty() at all? We don't have a folio_set_uptodate(),
> > so there's some precedent there.
>
> Is there anything stopping us from renaming set_page_dirty() to
> mark_page_dirty() (or page_mark_dirty())? For me the folio naming is consistent,
> but the page names suck; presumably PageSetDirty() and set_page_dirty()... yuk.

Well, laziness. There's about 150 places where we mention
set_page_dirty() and all of them need to be converted to
folio_mark_dirty(). I don't particularly like converting code twice;
I get the impression it annoys people.

The important thing is what does it look like when someone writes
a new filesystem in 2030. I fear that they may get confused and
call folio_set_dirty(), not realising that they should be calling
folio_mark_dirty(). It doesn't help that btrfs have decided to introduce
btrfs_folio_set_dirty().

I think MM people can afford to add a leading '_' to folio_set_dirty()
so that's my current favourite option for fixing this mess.

2024-01-22 18:45:19

by David Hildenbrand

Subject: Re: [PATCH v2 28/40] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

On 22.01.24 18:34, David Hildenbrand wrote:
> On 22.01.24 18:20, Matthew Wilcox wrote:
>> On Mon, Jan 22, 2024 at 06:01:58PM +0100, David Hildenbrand wrote:
>>>> And folio_mark_dirty() is doing more than just setting the PG_dirty bit. In my
>>>> equivalent change, as part of the contpte series, I've swapped set_page_dirty()
>>>> for folio_mark_dirty().
>>>
>>> Good catch, that should be folio_mark_dirty(). Let me send a fixup.
>>>
>>> (the difference in naming for both functions really is bad)
>>
>> It really is, and I don't know what to do about it.
>>
>> We need a function that literally just sets the flag. For every other
>> flag, that's folio_set_FLAG. We can't use __folio_set_flag because that
>> means "set the flag non-atomically".
>>
>> We need a function that does all of the work involved with tracking
>> dirty folios. I chose folio_mark_dirty() to align with
>> folio_mark_uptodate() (ie mark is not just "set" but also "do some extra
>> work").
>>
>> But because we're converting from set_page_dirty(), the OBVIOUS rename
>> is to folio_set_dirty(), which is WRONG.
>
> And I made the same mistake in at least "mm/huge_memory:
> page_remove_rmap() -> folio_remove_rmap_pmd()".
>
> I better double check all these so-simple-looking conversions that just
> went upstream.
>
> Interestingly, __split_huge_pmd_locked() used SetPageReferenced()
> instead of

Forgot to delete that sentence.

Anyhow, it's all confusing. My replacement in 91b2978a34807 from
SetPageDirty -> folio_set_dirty() was correct. It only operates on anon
folios, which is likely why folio_set_dirty() is okay there.

Oh my.

--
Cheers,

David / dhildenb