2023-12-04 14:22:10

by David Hildenbrand

Subject: [PATCH RFC 00/39] mm/rmap: interface overhaul

Based on mm-stable from a couple of days ago.

This series proposes an overhaul to our rmap interface, to get rid of the
"bool compound" / RMAP_COMPOUND parameter with the goal of making the
interface less error prone, more future proof, and more natural to extend
to "batching". Also, this converts the interface to always consume
folio+subpage, which speeds up operations on large folios.

Further, this series adds PTE-batching variants for 4 rmap functions;
of these, only folio_add_anon_rmap_ptes() is actually used for batching
in this series, when PTE-remapping a PMD-mapped THP.

Ryan has a series where we would make use of folio_remove_rmap_ptes() [1]
-- he carries his own batching variant right now -- and
folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].

There is some overlap with both series (and some other work, like
multi-size THP [3]), so that will need some coordination, and likely a
stepwise inclusion.

I got that started [4], but it made sense to show the whole picture. The
patches of [4] are contained in here, with one additional patch added
("mm/rmap: introduce and use hugetlb_try_share_anon_rmap()") and some
slight patch description changes.

In general, RMAP batching is an important optimization for PTE-mapped
THP, especially once we want to move towards a total mapcount or further,
as shown with my WIP patches on "mapped shared vs. mapped exclusively" [5].

The rmap batching part of [5] is also contained here in a slightly reworked
form [and I found a bug due to the "compound" parameter handling in these
patches that should be fixed here :) ].

This series performs a lot of folio conversion that could be separated
out if there is a good reason. Most of the added LOC in the diff are only
due to documentation.

As we're moving to a pte/pmd interface where we clearly express the
mapping granularity we are dealing with, we first get the remainder of
hugetlb out of the way, as it is special and expected to remain special: it
treats everything as a "single logical PTE" and currently only allows
entire mappings.

Even if we'd ever support partial mappings, I strongly
assume the interface and implementation will still differ heavily:
hopefully we can avoid working on subpages/subpage mapcounts completely and
only add a "count" parameter for them to enable batching.
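
To illustrate that "single logical PTE" model: the dedicated hugetlb
helpers introduced in patches 2 and 3 below only touch the folio's entire
mapcount, without any per-subpage bookkeeping (sketch taken from those
patches):

  static inline void hugetlb_add_file_rmap(struct folio *folio)
  {
          VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);

          /* hugetlb maps/unmaps the complete folio as one "logical PTE" */
          atomic_inc(&folio->_entire_mapcount);
  }

  static inline void hugetlb_remove_rmap(struct folio *folio)
  {
          atomic_dec(&folio->_entire_mapcount);
  }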


New (extended) hugetlb interface that operates on an entire folio:
* hugetlb_add_new_anon_rmap() -> Already existed
* hugetlb_add_anon_rmap() -> Already existed
* hugetlb_try_dup_anon_rmap()
* hugetlb_try_share_anon_rmap()
* hugetlb_add_file_rmap()
* hugetlb_remove_rmap()

New "ordinary" interface for small folios / THP::
* folio_add_new_anon_rmap() -> Already existed
* folio_add_anon_rmap_[pte|ptes|pmd]()
* folio_try_dup_anon_rmap_[pte|ptes|pmd]()
* folio_try_share_anon_rmap_[pte|pmd]()
* folio_add_file_rmap_[pte|ptes|pmd]()
* folio_dup_file_rmap_[pte|ptes|pmd]()
* folio_remove_rmap_[pte|ptes|pmd]()

folio_add_new_anon_rmap() will always map at the biggest granularity
possible (currently, a single PMD to cover a PMD-sized THP). Could be
extended if ever required.
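
As a rough caller-side sketch (mirroring the conversions done in the
individual patches), the granularity moves from a "bool compound" /
RMAP_COMPOUND argument into the function name, and callers pass the folio
along with the first subpage:

  /* Old interface: granularity hidden behind a bool / rmap flag. */
  page_add_file_rmap(page, vma, false);
  page_add_anon_rmap(page, vma, addr, RMAP_COMPOUND);
  page_remove_rmap(page, vma, true);

  /* New interface: granularity is explicit in the function name ... */
  folio_add_file_rmap_pte(folio, page, vma);
  folio_add_anon_rmap_pmd(folio, page, vma, addr, rmap_flags);
  folio_remove_rmap_pmd(folio, page, vma);

  /* ... and the *_ptes() variants can batch nr_pages PTEs in one call. */
  folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, addr, rmap_flags);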

In the future, we might want "_pud" variants and eventually "_pmds" variants
for batching. Further, if hugepd is ever a thing outside hugetlb code,
we might want some variants for that. All stuff for the distant future.


I ran some simple microbenchmarks from [5] on an Intel(R) Xeon(R) Silver
4210R: munmap(), fork(), COW, MADV_DONTNEED on each PTE ... and
PTE-remapping PMD-mapped THPs on 1 GiB of memory.

For small folios, there is barely a change (< 1% performance improvement);
fork() still stands out with a 0.74% performance improvement, but that
might just be noise. Folio optimizations don't help that much with small
folios.

For PTE-mapped THP:
* PTE-remapping a PMD-mapped THP is more than 10% faster.
-> RMAP batching
* fork() is more than 4% faster.
-> folio conversion
* MADV_DONTNEED is 2% faster
-> folio conversion
* COW by writing only a single byte on a COW-shared PTE
-> folio conversion
* munmap() is only slightly faster (< 1%).

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]
[3] https://lkml.kernel.org/r/[email protected]
[4] https://lkml.kernel.org/r/[email protected]
[5] https://lkml.kernel.org/r/[email protected]

Cc: Andrew Morton <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yin Fengwei <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>

David Hildenbrand (39):
mm/rmap: rename hugepage_add* to hugetlb_add*
mm/rmap: introduce and use hugetlb_remove_rmap()
mm/rmap: introduce and use hugetlb_add_file_rmap()
mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
mm/rmap: add hugetlb sanity checks
mm/rmap: convert folio_add_file_rmap_range() into
folio_add_file_rmap_[pte|ptes|pmd]()
mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
mm/rmap: remove page_add_file_rmap()
mm/rmap: factor out adding folio mappings into __folio_add_rmap()
mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/rmap: remove page_add_anon_rmap()
mm/rmap: remove RMAP_COMPOUND
mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
Documentation: stop referring to page_remove_rmap()
mm/rmap: remove page_remove_rmap()
mm/rmap: convert page_dup_file_rmap() to
folio_dup_file_rmap_[pte|ptes|pmd]()
mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
mm/huge_memory: page_try_dup_anon_rmap() ->
folio_try_dup_anon_rmap_pmd()
mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
mm/rmap: remove page_try_dup_anon_rmap()
mm: convert page_try_share_anon_rmap() to
folio_try_share_anon_rmap_[pte|pmd]()
mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED

Documentation/mm/transhuge.rst | 4 +-
Documentation/mm/unevictable-lru.rst | 4 +-
include/linux/mm.h | 6 +-
include/linux/rmap.h | 380 +++++++++++++++++++-----
kernel/events/uprobes.c | 2 +-
mm/gup.c | 2 +-
mm/huge_memory.c | 85 +++---
mm/hugetlb.c | 21 +-
mm/internal.h | 12 +-
mm/khugepaged.c | 17 +-
mm/ksm.c | 15 +-
mm/memory-failure.c | 4 +-
mm/memory.c | 60 ++--
mm/migrate.c | 12 +-
mm/migrate_device.c | 41 +--
mm/mmu_gather.c | 2 +-
mm/rmap.c | 422 ++++++++++++++++-----------
mm/swapfile.c | 2 +-
mm/userfaultfd.c | 2 +-
19 files changed, 709 insertions(+), 384 deletions(-)

--
2.41.0


2023-12-04 14:22:12

by David Hildenbrand

Subject: [PATCH RFC 03/39] mm/rmap: introduce and use hugetlb_add_file_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

Right now we're using page_dup_file_rmap() in some cases where "ordinary"
rmap code would have used page_add_file_rmap(). So let's introduce and
use hugetlb_add_file_rmap() instead. We won't be adding a
"hugetlb_dup_file_rmap()" functon for the fork() case, as it would be
doing the same: "dup" is just an optimization for "add".

What remains is a single page_dup_file_rmap() call in fork() code.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 7 +++++++
mm/hugetlb.c | 6 +++---
mm/migrate.c | 2 +-
3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e8d1dc1d5361f..0a81e8420a961 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -208,6 +208,13 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_add_file_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
+
+ atomic_inc(&folio->_entire_mapcount);
+}
+
static inline void hugetlb_remove_rmap(struct folio *folio)
{
atomic_dec(&folio->_entire_mapcount);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d17bb53b19ff2..541a8f38cfdc7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5401,7 +5401,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
* sleep during the process.
*/
if (!folio_test_anon(pte_folio)) {
- page_dup_file_rmap(&pte_folio->page, true);
+ hugetlb_add_file_rmap(pte_folio);
} else if (page_try_dup_anon_rmap(&pte_folio->page,
true, src_vma)) {
pte_t src_pte_old = entry;
@@ -6272,7 +6272,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
if (anon_rmap)
hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
&& (vma->vm_flags & VM_SHARED)));
/*
@@ -6723,7 +6723,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
goto out_release_unlock;

if (folio_in_pagecache)
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
else
hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

diff --git a/mm/migrate.c b/mm/migrate.c
index 4cb849fa0dd2c..de9d94b99ab78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio,
hugetlb_add_anon_rmap(folio, vma, pvmw.address,
rmap_flags);
else
- page_dup_file_rmap(new, true);
+ hugetlb_add_file_rmap(folio);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
psize);
} else
--
2.41.0

2023-12-04 14:22:19

by David Hildenbrand

Subject: [PATCH RFC 05/39] mm/rmap: introduce and use hugetlb_try_share_anon_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

So let's introduce and use hugetlb_try_share_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

Note that try_to_unmap_one() does not need care. Easy to spot because
among all that nasty hugetlb special-casing in that function, we're not
using set_huge_pte_at() on the anon path -- well, and that code assumes
that we would want to swap out.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 20 ++++++++++++++++++++
mm/rmap.c | 15 ++++++++++-----
2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8068c332e2ce5..3f38141b53b9d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -223,6 +223,26 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
return 0;
}

+/* See page_try_share_anon_rmap() */
+static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
+
+ /* See page_try_share_anon_rmap() */
+ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+ smp_mb();
+
+ if (unlikely(folio_maybe_dma_pinned(folio)))
+ return -EBUSY;
+ ClearPageAnonExclusive(&folio->page);
+
+ /* See page_try_share_anon_rmap() */
+ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+ smp_mb__after_atomic();
+ return 0;
+}
+
static inline void hugetlb_add_file_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
diff --git a/mm/rmap.c b/mm/rmap.c
index 5037581b79ec6..2f1af3958e687 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2105,13 +2105,18 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
!anon_exclusive, subpage);

/* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
- if (folio_test_hugetlb(folio))
+ if (folio_test_hugetlb(folio)) {
+ if (anon_exclusive &&
+ hugetlb_try_share_anon_rmap(folio)) {
set_huge_pte_at(mm, address, pvmw.pte,
pteval, hsz);
- else
- set_pte_at(mm, address, pvmw.pte, pteval);
+ ret = false;
+ page_vma_mapped_walk_done(&pvmw);
+ break;
+ }
+ } else if (anon_exclusive &&
+ page_try_share_anon_rmap(subpage)) {
+ set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done(&pvmw);
break;
--
2.41.0

2023-12-04 14:22:23

by David Hildenbrand

Subject: [PATCH RFC 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
code from page_remove_rmap(). This effectively removes one check on the
small-folio path as well.

Note: all possible candidates that need care are the page_remove_rmap()
calls that pass compound=true.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 5 +++++
mm/hugetlb.c | 4 ++--
mm/rmap.c | 17 ++++++++---------
3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 4c5bfeb054636..e8d1dc1d5361f 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -208,6 +208,11 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_remove_rmap(struct folio *folio)
+{
+ atomic_dec(&folio->_entire_mapcount);
+}
+
static inline void __page_dup_rmap(struct page *page, bool compound)
{
if (compound) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4cfa0679661e2..d17bb53b19ff2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5669,7 +5669,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
make_pte_marker(PTE_MARKER_UFFD_WP),
sz);
hugetlb_count_sub(pages_per_huge_page(h), mm);
- page_remove_rmap(page, vma, true);
+ hugetlb_remove_rmap(page_folio(page));

spin_unlock(ptl);
tlb_remove_page_size(tlb, page, huge_page_size(h));
@@ -5980,7 +5980,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,

/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
- page_remove_rmap(&old_folio->page, vma, true);
+ hugetlb_remove_rmap(old_folio);
hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
diff --git a/mm/rmap.c b/mm/rmap.c
index 112467c30b2c9..5037581b79ec6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1440,13 +1440,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,

VM_BUG_ON_PAGE(compound && !PageHead(page), page);

- /* Hugetlb pages are not counted in NR_*MAPPED */
- if (unlikely(folio_test_hugetlb(folio))) {
- /* hugetlb pages are always mapped with pmds */
- atomic_dec(&folio->_entire_mapcount);
- return;
- }
-
/* Is page being unmapped by PTE? Is this its last map to be removed? */
if (likely(!compound)) {
last = atomic_add_negative(-1, &page->_mapcount);
@@ -1804,7 +1797,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
dec_mm_counter(mm, mm_counter_file(&folio->page));
}
discard:
- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -2157,7 +2153,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
*/
}

- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
--
2.41.0

2023-12-04 14:22:33

by David Hildenbrand

Subject: [PATCH RFC 01/39] mm/rmap: rename hugepage_add* to hugetlb_add*

Let's just call it "hugetlb_".

Yes, it's all already inconsistent and confusing because we have a lot
of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
be confused with transparent huge pages, and it matches "hugetlb.c" and
"folio_test_hugetlb()". So let's minimize confusion in rmap code.

Reviewed-by: Muchun Song <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 4 ++--
mm/hugetlb.c | 8 ++++----
mm/migrate.c | 4 ++--
mm/rmap.c | 8 ++++----
4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b26fe858fd444..4c5bfeb054636 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -203,9 +203,9 @@ void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

-void hugepage_add_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
-void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

static inline void __page_dup_rmap(struct page *page, bool compound)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1169ef2f2176f..4cfa0679661e2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5278,7 +5278,7 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);

__folio_mark_uptodate(new_folio);
- hugepage_add_new_anon_rmap(new_folio, vma, addr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, addr);
if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
@@ -5981,7 +5981,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
page_remove_rmap(&old_folio->page, vma, true);
- hugepage_add_new_anon_rmap(new_folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(mm, haddr, ptep, newpte, huge_page_size(h));
@@ -6270,7 +6270,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
goto backout;

if (anon_rmap)
- hugepage_add_new_anon_rmap(folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
page_dup_file_rmap(&folio->page, true);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
@@ -6725,7 +6725,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
if (folio_in_pagecache)
page_dup_file_rmap(&folio->page, true);
else
- hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
+ hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

/*
* For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
diff --git a/mm/migrate.c b/mm/migrate.c
index 35a88334bb3c2..4cb849fa0dd2c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -249,8 +249,8 @@ static bool remove_migration_pte(struct folio *folio,

pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
if (folio_test_anon(folio))
- hugepage_add_anon_rmap(folio, vma, pvmw.address,
- rmap_flags);
+ hugetlb_add_anon_rmap(folio, vma, pvmw.address,
+ rmap_flags);
else
page_dup_file_rmap(new, true);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
diff --git a/mm/rmap.c b/mm/rmap.c
index 7a27a2b418021..112467c30b2c9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2583,8 +2583,8 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
*
* RMAP_COMPOUND is ignored.
*/
-void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
- unsigned long address, rmap_t flags)
+void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
+ unsigned long address, rmap_t flags)
{
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

@@ -2595,8 +2595,8 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
PageAnonExclusive(&folio->page), folio);
}

-void hugepage_add_new_anon_rmap(struct folio *folio,
- struct vm_area_struct *vma, unsigned long address)
+void hugetlb_add_new_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma, unsigned long address)
{
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
--
2.41.0

2023-12-04 14:22:36

by David Hildenbrand

Subject: [PATCH RFC 04/39] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

Note that is_device_private_page() does not apply to hugetlb.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 12 +++++++++---
include/linux/rmap.h | 15 +++++++++++++++
mm/hugetlb.c | 3 +--
3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece7..24c1c7c5a99c0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1953,15 +1953,21 @@ static inline bool page_maybe_dma_pinned(struct page *page)
*
* The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
*/
-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
+static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct folio *folio)
{
VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));

if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
return false;

- return page_maybe_dma_pinned(page);
+ return folio_maybe_dma_pinned(folio);
+}
+
+static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct page *page)
+{
+ return folio_needs_cow_for_dma(vma, page_folio(page));
}

/**
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0a81e8420a961..8068c332e2ce5 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -208,6 +208,21 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+/* See page_try_dup_anon_rmap() */
+static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma)
+{
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+
+ if (PageAnonExclusive(&folio->page)) {
+ if (unlikely(folio_needs_cow_for_dma(vma, folio)))
+ return -EBUSY;
+ ClearPageAnonExclusive(&folio->page);
+ }
+ atomic_inc(&folio->_entire_mapcount);
+ return 0;
+}
+
static inline void hugetlb_add_file_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 541a8f38cfdc7..d927f8b2893c0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5402,8 +5402,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
if (!folio_test_anon(pte_folio)) {
hugetlb_add_file_rmap(pte_folio);
- } else if (page_try_dup_anon_rmap(&pte_folio->page,
- true, src_vma)) {
+ } else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
pte_t src_pte_old = entry;
struct folio *new_folio;

--
2.41.0

2023-12-04 14:22:49

by David Hildenbrand

Subject: [PATCH RFC 09/39] mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()

Let's convert remove_migration_pmd() and while at it, perform some folio
conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4f542444a91f2..cb33c6e0404cf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3276,6 +3276,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,

void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
{
+ struct folio *folio = page_folio(new);
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
@@ -3287,7 +3288,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
return;

entry = pmd_to_swp_entry(*pvmw->pmd);
- get_page(new);
+ folio_get(folio);
pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
if (pmd_swp_soft_dirty(*pvmw->pmd))
pmde = pmd_mksoft_dirty(pmde);
@@ -3298,10 +3299,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
if (!is_migration_entry_young(entry))
pmde = pmd_mkold(pmde);
/* NOTE: this may contain setting soft-dirty on some archs */
- if (PageDirty(new) && is_migration_entry_dirty(entry))
+ if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
pmde = pmd_mkdirty(pmde);

- if (PageAnon(new)) {
+ if (folio_test_anon(folio)) {
rmap_t rmap_flags = RMAP_COMPOUND;

if (!is_readable_migration_entry(entry))
@@ -3309,9 +3310,9 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)

page_add_anon_rmap(new, vma, haddr, rmap_flags);
} else {
- page_add_file_rmap(new, vma, true);
+ folio_add_file_rmap_pmd(folio, new, vma);
}
- VM_BUG_ON(pmd_write(pmde) && PageAnon(new) && !PageAnonExclusive(new));
+ VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new));
set_pmd_at(mm, haddr, pvmw->pmd, pmde);

/* No need to invalidate - it was non-present before */
--
2.41.0

2023-12-04 14:22:51

by David Hildenbrand

Subject: [PATCH RFC 12/39] mm/rmap: remove page_add_file_rmap()

All users are gone, let's remove it.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 2 --
mm/rmap.c | 21 ---------------------
2 files changed, 23 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index a4a30c361ac50..95f7b94a70295 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -235,8 +235,6 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address);
void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
-void page_add_file_rmap(struct page *, struct vm_area_struct *,
- bool compound);
void folio_add_file_rmap_ptes(struct folio *, struct page *, unsigned int nr,
struct vm_area_struct *);
#define folio_add_file_rmap_pte(folio, page, vma) \
diff --git a/mm/rmap.c b/mm/rmap.c
index 1614d98062948..53e2c653be99a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1422,27 +1422,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
#endif
}

-/**
- * page_add_file_rmap - add pte mapping to a file page
- * @page: the page to add the mapping to
- * @vma: the vm area in which the mapping is added
- * @compound: charge the page as compound or small page
- *
- * The caller needs to hold the pte lock.
- */
-void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
-{
- struct folio *folio = page_folio(page);
-
- VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
-
- if (likely(!compound))
- folio_add_file_rmap_pte(folio, page, vma);
- else
- folio_add_file_rmap_pmd(folio, page, vma);
-}
-
/**
* page_remove_rmap - take down pte mapping from a page
* @page: page to remove mapping from
--
2.41.0

2023-12-04 14:22:54

by David Hildenbrand

Subject: [PATCH RFC 08/39] mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()

Let's convert insert_page_into_pte_locked() and do_set_pmd(). While at it,
perform some folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 15325587cff01..be7fe58f7c297 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1845,12 +1845,14 @@ static int validate_page_before_insert(struct page *page)
static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
unsigned long addr, struct page *page, pgprot_t prot)
{
+ struct folio *folio = page_folio(page);
+
if (!pte_none(ptep_get(pte)))
return -EBUSY;
/* Ok, finally just insert the thing.. */
- get_page(page);
+ folio_get(folio);
inc_mm_counter(vma->vm_mm, mm_counter_file(page));
- page_add_file_rmap(page, vma, false);
+ folio_add_file_rmap_pte(folio, page, vma);
set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
return 0;
}
@@ -4308,6 +4310,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)

vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
{
+ struct folio *folio = page_folio(page);
struct vm_area_struct *vma = vmf->vma;
bool write = vmf->flags & FAULT_FLAG_WRITE;
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
@@ -4317,8 +4320,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
if (!transhuge_vma_suitable(vma, haddr))
return ret;

- page = compound_head(page);
- if (compound_order(page) != HPAGE_PMD_ORDER)
+ if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER)
return ret;

/*
@@ -4327,7 +4329,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
* check. This kind of THP just can be PTE mapped. Access to
* the corrupted subpage should trigger SIGBUS as expected.
*/
- if (unlikely(PageHasHWPoisoned(page)))
+ if (unlikely(folio_test_has_hwpoisoned(folio)))
return ret;

/*
@@ -4351,7 +4353,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);

add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
- page_add_file_rmap(page, vma, true);
+ folio_add_file_rmap_pmd(folio, page, vma);

/*
* deposit and withdraw with pmd lock held
--
2.41.0

2023-12-04 14:22:55

by David Hildenbrand

Subject: [PATCH RFC 11/39] mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()

Let's convert mfill_atomic_install_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/userfaultfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0b6ca553bebec..abf4c579d328a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -114,7 +114,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
/* Usually, cache pages are already added to LRU */
if (newly_allocated)
folio_add_lru(folio);
- page_add_file_rmap(page, dst_vma, false);
+ folio_add_file_rmap_pte(folio, page, dst_vma);
} else {
page_add_new_anon_rmap(page, dst_vma, dst_addr);
folio_add_lru_vma(folio, dst_vma);
--
2.41.0

2023-12-04 14:22:58

by David Hildenbrand

Subject: [PATCH RFC 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

Let's get rid of the compound parameter and instead define implicitly
which mappings we're adding. That is more future proof, easier to read
and harder to mess up.

Use an enum to express the granularity internally. Make the compiler
always special-case on the granularity by using __always_inline.

Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
folio_test_pmd_mappable() check by a config check in the caller and
sanity checks. Convert the single user of folio_add_file_rmap_range().

This function design can later easily be extended to PUDs and to batch
PMDs. Note that for now we don't support anything bigger than
PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
will catch if that ever changes.

Next up is removing page_remove_rmap() along with its "compound"
parameter and similarly converting all other rmap functions.
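
For callers, the conversion (see the mm/memory.c hunk below and the
following patches) boils down to this sketch:

  /* Old: granularity passed via nr/compound arguments. */
  folio_add_file_rmap_range(folio, page, nr, vma, false);
  page_add_file_rmap(page, vma, true);

  /* New: granularity encoded in the function name. */
  folio_add_file_rmap_ptes(folio, page, nr, vma);
  folio_add_file_rmap_pmd(folio, page, vma);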

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 47 +++++++++++++++++++++++++++--
mm/memory.c | 2 +-
mm/rmap.c | 72 ++++++++++++++++++++++++++++----------------
3 files changed, 92 insertions(+), 29 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 77e336f86c72d..a4a30c361ac50 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -186,6 +186,45 @@ typedef int __bitwise rmap_t;
*/
#define RMAP_COMPOUND ((__force rmap_t)BIT(1))

+/*
+ * Internally, we're using an enum to specify the granularity. Usually,
+ * we make the compiler create specialized variants for the different
+ * granularity.
+ */
+enum rmap_mode {
+ RMAP_MODE_PTE = 0,
+ RMAP_MODE_PMD,
+};
+
+static inline void __folio_rmap_sanity_checks(struct folio *folio,
+ struct page *page, unsigned int nr_pages, enum rmap_mode mode)
+{
+ /* hugetlb folios are handled separately. */
+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
+ VM_WARN_ON_FOLIO(folio_test_large(folio) &&
+ !folio_test_large_rmappable(folio), folio);
+
+ VM_WARN_ON_ONCE(!nr_pages || nr_pages > folio_nr_pages(folio));
+ VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
+ VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
+
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ break;
+ case RMAP_MODE_PMD:
+ /*
+ * We don't support folios larger than a single PMD yet. So
+ * when RMAP_MODE_PMD is set, we assume that we are creating
+ * a single "entire" mapping of the folio.
+ */
+ VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
+ VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
+ break;
+ default:
+ VM_WARN_ON_ONCE(true);
+ }
+}
+
/*
* rmap interfaces called when adding or removing pte of page
*/
@@ -198,8 +237,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
void page_add_file_rmap(struct page *, struct vm_area_struct *,
bool compound);
-void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
- struct vm_area_struct *, bool compound);
+void folio_add_file_rmap_ptes(struct folio *, struct page *, unsigned int nr,
+ struct vm_area_struct *);
+#define folio_add_file_rmap_pte(folio, page, vma) \
+ folio_add_file_rmap_ptes(folio, page, 1, vma)
+void folio_add_file_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

diff --git a/mm/memory.c b/mm/memory.c
index 1f18ed4a54971..15325587cff01 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4414,7 +4414,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
folio_add_lru_vma(folio, vma);
} else {
add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
- folio_add_file_rmap_range(folio, page, nr, vma, false);
+ folio_add_file_rmap_ptes(folio, page, nr, vma);
}
set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);

diff --git a/mm/rmap.c b/mm/rmap.c
index a735ecca47a81..1614d98062948 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1334,31 +1334,19 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
SetPageAnonExclusive(&folio->page);
}

-/**
- * folio_add_file_rmap_range - add pte mapping to page range of a folio
- * @folio: The folio to add the mapping to
- * @page: The first page to add
- * @nr_pages: The number of pages which will be mapped
- * @vma: the vm area in which the mapping is added
- * @compound: charge the page as compound or small page
- *
- * The page range of folio is defined by [first_page, first_page + nr_pages)
- *
- * The caller needs to hold the pte lock.
- */
-void folio_add_file_rmap_range(struct folio *folio, struct page *page,
- unsigned int nr_pages, struct vm_area_struct *vma,
- bool compound)
+static __always_inline void __folio_add_file_rmap(struct folio *folio,
+ struct page *page, unsigned int nr_pages,
+ struct vm_area_struct *vma, enum rmap_mode mode)
{
atomic_t *mapped = &folio->_nr_pages_mapped;
unsigned int nr_pmdmapped = 0, first;
int nr = 0;

- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
- VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
+ VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

/* Is page being mapped by PTE? Is this its first map to be added? */
- if (likely(!compound)) {
+ if (likely(mode == RMAP_MODE_PTE)) {
do {
first = atomic_inc_and_test(&page->_mapcount);
if (first && folio_test_large(folio)) {
@@ -1369,9 +1357,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
if (first)
nr++;
} while (page++, --nr_pages > 0);
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
-
+ } else if (mode == RMAP_MODE_PMD) {
first = atomic_inc_and_test(&folio->_entire_mapcount);
if (first) {
nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
@@ -1399,6 +1385,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
mlock_vma_folio(folio, vma);
}

+/**
+ * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
+ * @folio: The folio to add the mappings to
+ * @page: The first page to add
+ * @nr_pages: The number of pages that will be mapped using PTEs
+ * @vma: The vm area in which the mappings are added
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
+ unsigned int nr_pages, struct vm_area_struct *vma)
+{
+ __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
+}
+
+/**
+ * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @vma: The vm area in which the mapping is added
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/**
* page_add_file_rmap - add pte mapping to a file page
* @page: the page to add the mapping to
@@ -1411,16 +1434,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
bool compound)
{
struct folio *folio = page_folio(page);
- unsigned int nr_pages;

VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);

if (likely(!compound))
- nr_pages = 1;
+ folio_add_file_rmap_pte(folio, page, vma);
else
- nr_pages = folio_nr_pages(folio);
-
- folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
+ folio_add_file_rmap_pmd(folio, page, vma);
}

/**
--
2.41.0

2023-12-04 14:23:04

by David Hildenbrand

Subject: [PATCH RFC 10/39] mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()

Let's convert remove_migration_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index de9d94b99ab78..efc19f53b05e6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -262,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio,
page_add_anon_rmap(new, vma, pvmw.address,
rmap_flags);
else
- page_add_file_rmap(new, vma, false);
+ folio_add_file_rmap_pte(folio, new, vma);
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
}
if (vma->vm_flags & VM_LOCKED)
--
2.41.0

2023-12-04 14:23:27

by David Hildenbrand

Subject: [PATCH RFC 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()

Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
replace page_add_anon_rmap() next.

Make the compiler always special-case on the granularity by using
__always_inline.

Note that the new functions ignore the RMAP_COMPOUND flag, which we will
remove as soon as page_add_anon_rmap() is gone.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 6 +++
mm/rmap.c | 115 +++++++++++++++++++++++++++++--------------
2 files changed, 85 insertions(+), 36 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 95f7b94a70295..9e1c197f50199 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -229,6 +229,12 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
* rmap interfaces called when adding or removing pte of page
*/
void folio_move_anon_rmap(struct folio *, struct vm_area_struct *);
+void folio_add_anon_rmap_ptes(struct folio *, struct page *, unsigned int nr,
+ struct vm_area_struct *, unsigned long address, rmap_t flags);
+#define folio_add_anon_rmap_pte(folio, page, vma, address, flags) \
+ folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
+void folio_add_anon_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *, unsigned long address, rmap_t flags);
void page_add_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
diff --git a/mm/rmap.c b/mm/rmap.c
index c09b360402599..85bea11e9266b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1267,38 +1267,21 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
{
struct folio *folio = page_folio(page);
- atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool compound = flags & RMAP_COMPOUND;
- bool first;

- /* Is page being mapped by PTE? Is this its first map to be added? */
- if (likely(!compound)) {
- first = atomic_inc_and_test(&page->_mapcount);
- nr = first;
- if (first && folio_test_large(folio)) {
- nr = atomic_inc_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
+ if (likely(!(flags & RMAP_COMPOUND)))
+ folio_add_anon_rmap_pte(folio, page, vma, address, flags);
+ else
+ folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
+}

- first = atomic_inc_and_test(&folio->_entire_mapcount);
- if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
- nr_pmdmapped = folio_nr_pages(folio);
- nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
- /* Raced ahead of a remove and another add? */
- if (unlikely(nr < 0))
- nr = 0;
- } else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
- nr = 0;
- }
- }
- }
+static __always_inline void __folio_add_anon_rmap(struct folio *folio,
+ struct page *page, unsigned int nr_pages,
+ struct vm_area_struct *vma, unsigned long address, rmap_t flags,
+ enum rmap_mode mode)
+{
+ unsigned int i, nr, nr_pmdmapped = 0;

+ nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
if (nr_pmdmapped)
__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
if (nr)
@@ -1312,18 +1295,30 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
* folio->index right when not given the address of the head
* page.
*/
- VM_WARN_ON_FOLIO(folio_test_large(folio) && !compound, folio);
+ VM_WARN_ON_FOLIO(folio_test_large(folio) &&
+ mode != RMAP_MODE_PMD, folio);
__folio_set_anon(folio, vma, address,
!!(flags & RMAP_EXCLUSIVE));
} else if (likely(!folio_test_ksm(folio))) {
__page_check_anon_rmap(folio, page, vma, address);
}
- if (flags & RMAP_EXCLUSIVE)
- SetPageAnonExclusive(page);
- /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
- VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
- (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
- PageAnonExclusive(page), folio);
+
+ if (flags & RMAP_EXCLUSIVE) {
+ if (likely(nr_pages == 1 || mode != RMAP_MODE_PTE))
+ SetPageAnonExclusive(page);
+ else
+ for (i = 0; i < nr_pages; i++)
+ SetPageAnonExclusive(page + i);
+ }
+ for (i = 0; i < nr_pages; i++) {
+ struct page *cur_page = page + i;
+
+ /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
+ VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
+ (folio_test_large(folio) &&
+ folio_entire_mapcount(folio) > 1)) &&
+ PageAnonExclusive(cur_page), folio);
+ }

/*
* For large folio, only mlock it if it's fully mapped to VMA. It's
@@ -1335,6 +1330,54 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
mlock_vma_folio(folio, vma);
}

+/**
+ * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio
+ * @folio: The folio to add the mappings to
+ * @page: The first page to add
+ * @nr_pages: The number of pages which will be mapped
+ * @vma: The vm area in which the mappings are added
+ * @address: The user virtual address of the first page to map
+ * @flags: The rmap flags
+ *
+ * The page range of folio is defined by [first_page, first_page + nr_pages)
+ *
+ * The caller needs to hold the page table lock, and the page must be locked in
+ * the anon_vma case: to serialize mapping,index checking after setting,
+ * and to ensure that an anon folio is not being upgraded racily to a KSM folio
+ * (but KSM folios are never downgraded).
+ */
+void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page,
+ unsigned int nr_pages, struct vm_area_struct *vma,
+ unsigned long address, rmap_t flags)
+{
+ __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags,
+ RMAP_MODE_PTE);
+}
+
+/**
+ * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @vma: The vm area in which the mapping is added
+ * @address: The user virtual address of the first page to map
+ * @flags: The rmap flags
+ *
+ * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock, and the page must be locked in
+ * the anon_vma case: to serialize mapping,index checking after setting.
+ */
+void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma, unsigned long address, rmap_t flags)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags,
+ RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/**
* folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
* @folio: The folio to add the mapping to.
--
2.41.0

2023-12-04 14:23:29

by David Hildenbrand

Subject: [PATCH RFC 17/39] mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert remove_migration_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index efc19f53b05e6..0e78680589bcc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -259,8 +259,8 @@ static bool remove_migration_pte(struct folio *folio,
#endif
{
if (folio_test_anon(folio))
- page_add_anon_rmap(new, vma, pvmw.address,
- rmap_flags);
+ folio_add_anon_rmap_pte(folio, new, vma,
+ pvmw.address, rmap_flags);
else
folio_add_file_rmap_pte(folio, new, vma);
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
--
2.41.0

2023-12-04 14:23:47

by David Hildenbrand

Subject: [PATCH RFC 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

Let's factor it out to prepare for reuse as we convert
page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().

Make the compiler always special-case on the granularity by using
__always_inline.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/rmap.c | 75 +++++++++++++++++++++++++++++++------------------------
1 file changed, 42 insertions(+), 33 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 53e2c653be99a..c09b360402599 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1127,6 +1127,46 @@ int folio_total_mapcount(struct folio *folio)
return mapcount;
}

+static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
+ struct page *page, unsigned int nr_pages, enum rmap_mode mode,
+ int *nr_pmdmapped)
+{
+ atomic_t *mapped = &folio->_nr_pages_mapped;
+ int first, nr = 0;
+
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
+
+ /* Is page being mapped by PTE? Is this its first map to be added? */
+ if (likely(mode == RMAP_MODE_PTE)) {
+ do {
+ first = atomic_inc_and_test(&page->_mapcount);
+ if (first && folio_test_large(folio)) {
+ first = atomic_inc_return_relaxed(mapped);
+ first = (first < COMPOUND_MAPPED);
+ }
+
+ if (first)
+ nr++;
+ } while (page++, --nr_pages > 0);
+ } else if (mode == RMAP_MODE_PMD) {
+ first = atomic_inc_and_test(&folio->_entire_mapcount);
+ if (first) {
+ nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
+ if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
+ *nr_pmdmapped = folio_nr_pages(folio);
+ nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
+ /* Raced ahead of a remove and another add? */
+ if (unlikely(nr < 0))
+ nr = 0;
+ } else {
+ /* Raced ahead of a remove of COMPOUND_MAPPED */
+ nr = 0;
+ }
+ }
+ }
+ return nr;
+}
+
/**
* folio_move_anon_rmap - move a folio to our anon_vma
* @folio: The folio to move to our anon_vma
@@ -1338,42 +1378,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
struct page *page, unsigned int nr_pages,
struct vm_area_struct *vma, enum rmap_mode mode)
{
- atomic_t *mapped = &folio->_nr_pages_mapped;
- unsigned int nr_pmdmapped = 0, first;
- int nr = 0;
+ unsigned int nr, nr_pmdmapped = 0;

VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
- __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
-
- /* Is page being mapped by PTE? Is this its first map to be added? */
- if (likely(mode == RMAP_MODE_PTE)) {
- do {
- first = atomic_inc_and_test(&page->_mapcount);
- if (first && folio_test_large(folio)) {
- first = atomic_inc_return_relaxed(mapped);
- first = (first < COMPOUND_MAPPED);
- }
-
- if (first)
- nr++;
- } while (page++, --nr_pages > 0);
- } else if (mode == RMAP_MODE_PMD) {
- first = atomic_inc_and_test(&folio->_entire_mapcount);
- if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
- nr_pmdmapped = folio_nr_pages(folio);
- nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
- /* Raced ahead of a remove and another add? */
- if (unlikely(nr < 0))
- nr = 0;
- } else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
- nr = 0;
- }
- }
- }

+ nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
if (nr_pmdmapped)
__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
--
2.41.0

2023-12-04 14:23:51

by David Hildenbrand

Subject: [PATCH RFC 19/39] mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert unuse_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4bc70f4591641..1ded3c150df95 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1805,7 +1805,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (pte_swp_exclusive(old_pte))
rmap_flags |= RMAP_EXCLUSIVE;

- page_add_anon_rmap(page, vma, addr, rmap_flags);
+ folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, addr);
lru_cache_add_inactive_or_unevictable(page, vma);
--
2.41.0

2023-12-04 14:23:59

by David Hildenbrand

Subject: [PATCH RFC 21/39] mm/rmap: remove page_add_anon_rmap()

All users are gone, remove it and all traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 2 --
mm/rmap.c | 31 ++++---------------------------
2 files changed, 4 insertions(+), 29 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 9e1c197f50199..865d83148852d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -235,8 +235,6 @@ void folio_add_anon_rmap_ptes(struct folio *, struct page *, unsigned int nr,
folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
void folio_add_anon_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *, unsigned long address, rmap_t flags);
-void page_add_anon_rmap(struct page *, struct vm_area_struct *,
- unsigned long address, rmap_t flags);
void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address);
void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
diff --git a/mm/rmap.c b/mm/rmap.c
index 85bea11e9266b..4cb9d8b7d1d65 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1238,7 +1238,7 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page,
* The page's anon-rmap details (mapping and index) are guaranteed to
* be set up correctly at this point.
*
- * We have exclusion against page_add_anon_rmap because the caller
+ * We have exclusion against folio_add_anon_rmap_*() because the caller
* always holds the page locked.
*
* We have exclusion against page_add_new_anon_rmap because those pages
@@ -1251,29 +1251,6 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page,
page);
}

-/**
- * page_add_anon_rmap - add pte mapping to an anonymous page
- * @page: the page to add the mapping to
- * @vma: the vm area in which the mapping is added
- * @address: the user virtual address mapped
- * @flags: the rmap flags
- *
- * The caller needs to hold the pte lock, and the page must be locked in
- * the anon_vma case: to serialize mapping,index checking after setting,
- * and to ensure that PageAnon is not being upgraded racily to PageKsm
- * (but PageKsm is never downgraded to PageAnon).
- */
-void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
- unsigned long address, rmap_t flags)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!(flags & RMAP_COMPOUND)))
- folio_add_anon_rmap_pte(folio, page, vma, address, flags);
- else
- folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
-}
-
static __always_inline void __folio_add_anon_rmap(struct folio *folio,
struct page *page, unsigned int nr_pages,
struct vm_area_struct *vma, unsigned long address, rmap_t flags,
@@ -1384,7 +1361,7 @@ void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
* @vma: the vm area in which the mapping is added
* @address: the user virtual address mapped
*
- * Like page_add_anon_rmap() but must only be called on *new* folios.
+ * Like folio_add_anon_rmap_*() but must only be called on *new* folios.
* This means the inc-and-test can be bypassed.
* The folio does not have to be locked.
*
@@ -1432,7 +1409,7 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
if (nr)
__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);

- /* See comments in page_add_anon_rmap() */
+ /* See comments in folio_add_anon_rmap_*() */
if (!folio_test_large(folio))
mlock_vma_folio(folio, vma);
}
@@ -1546,7 +1523,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,

/*
* It would be tidy to reset folio_test_anon mapping when fully
- * unmapped, but that might overwrite a racing page_add_anon_rmap
+ * unmapped, but that might overwrite a racing folio_add_anon_rmap_*()
* which increments mapcount after us but sets mapping before us:
* so leave the reset to free_pages_prepare, and remember that
* it's only reliable while mapped.
--
2.41.0

2023-12-04 14:24:02

by David Hildenbrand

Subject: [PATCH RFC 22/39] mm/rmap: remove RMAP_COMPOUND

No longer used, let's remove it and clarify RMAP_NONE/RMAP_EXCLUSIVE a
bit.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 12 +++---------
mm/rmap.c | 2 --
2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 865d83148852d..017b216915f19 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -172,20 +172,14 @@ struct anon_vma *folio_get_anon_vma(struct folio *folio);
typedef int __bitwise rmap_t;

/*
- * No special request: if the page is a subpage of a compound page, it is
- * mapped via a PTE. The mapped (sub)page is possibly shared between processes.
+ * No special request: A mapped anonymous (sub)page is possibly shared between
+ * processes.
*/
#define RMAP_NONE ((__force rmap_t)0)

-/* The (sub)page is exclusive to a single process. */
+/* The anonymous (sub)page is exclusive to a single process. */
#define RMAP_EXCLUSIVE ((__force rmap_t)BIT(0))

-/*
- * The compound page is not mapped via PTEs, but instead via a single PMD and
- * should be accounted accordingly.
- */
-#define RMAP_COMPOUND ((__force rmap_t)BIT(1))
-
/*
* Internally, we're using an enum to specify the granularity. Usually,
* we make the compiler create specialized variants for the different
diff --git a/mm/rmap.c b/mm/rmap.c
index 4cb9d8b7d1d65..3587225055c5e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2615,8 +2615,6 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
* The following two functions are for anonymous (private mapped) hugepages.
* Unlike common anonymous pages, anonymous hugepages have no accounting code
* and no lru code, because we handle hugepages differently from common pages.
- *
- * RMAP_COMPOUND is ignored.
*/
void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
--
2.41.0

2023-12-04 14:24:07

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 23/39] mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()

Let's mimic what we did with folio_add_file_rmap_*() and
folio_add_anon_rmap_*() so we can similarly replace page_remove_rmap()
next.

Make the compiler always special-case on the granularity by using
__always_inline.

We're adding folio_remove_rmap_ptes() handling right away, as we want to
use that soon for batching rmap operations when unmapping PTE-mapped
large folios.
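
As a rough usage sketch (illustrative only; "nr_pages", "i" and the
surrounding loop are placeholders, not code from this patch), a batched
unmap path would clear the PTEs first and then issue a single rmap update
for the whole range:

	/* Sketch: unmap nr_pages PTEs of one large folio. */
	for (i = 0; i < nr_pages; i++)
		ptep_get_and_clear(mm, addr + i * PAGE_SIZE, pte + i);
	/* One batched rmap update instead of nr_pages individual calls. */
	folio_remove_rmap_ptes(folio, page, nr_pages, vma);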

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 6 ++++
mm/rmap.c | 76 ++++++++++++++++++++++++++++++++++++--------
2 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 017b216915f19..dd4ffb1d8ae04 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -241,6 +241,12 @@ void folio_add_file_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);
+void folio_remove_rmap_ptes(struct folio *, struct page *, unsigned int nr,
+ struct vm_area_struct *);
+#define folio_remove_rmap_pte(folio, page, vma) \
+ folio_remove_rmap_ptes(folio, page, 1, vma)
+void folio_remove_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *);

void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
diff --git a/mm/rmap.c b/mm/rmap.c
index 3587225055c5e..50b6909157ac1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1463,25 +1463,36 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
bool compound)
{
struct folio *folio = page_folio(page);
+
+ if (likely(!compound))
+ folio_remove_rmap_pte(folio, page, vma);
+ else
+ folio_remove_rmap_pmd(folio, page, vma);
+}
+
+static __always_inline void __folio_remove_rmap(struct folio *folio,
+ struct page *page, unsigned int nr_pages,
+ struct vm_area_struct *vma, enum rmap_mode mode)
+{
atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool last;
+ int last, nr = 0, nr_pmdmapped = 0;
enum node_stat_item idx;

- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

/* Is page being unmapped by PTE? Is this its last map to be removed? */
- if (likely(!compound)) {
- last = atomic_add_negative(-1, &page->_mapcount);
- nr = last;
- if (last && folio_test_large(folio)) {
- nr = atomic_dec_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
+ if (likely(mode == RMAP_MODE_PTE)) {
+ do {
+ last = atomic_add_negative(-1, &page->_mapcount);
+ if (last && folio_test_large(folio)) {
+ last = atomic_dec_return_relaxed(mapped);
+ last = (last < COMPOUND_MAPPED);
+ }

+ if (last)
+ nr++;
+ } while (page++, --nr_pages > 0);
+ } else if (mode == RMAP_MODE_PMD) {
last = atomic_add_negative(-1, &folio->_entire_mapcount);
if (last) {
nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
@@ -1517,7 +1528,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
* is still mapped.
*/
if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
- if (!compound || nr < nr_pmdmapped)
+ if (mode == RMAP_MODE_PTE || nr < nr_pmdmapped)
deferred_split_folio(folio);
}

@@ -1532,6 +1543,43 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
munlock_vma_folio(folio, vma);
}

+/**
+ * folio_remove_rmap_ptes - remove PTE mappings from a page range of a folio
+ * @folio: The folio to remove the mappings from
+ * @page: The first page to remove
+ * @nr_pages: The number of pages that will be removed from the mapping
+ * @vma: The vm area from which the mappings are removed
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_remove_rmap_ptes(struct folio *folio, struct page *page,
+ unsigned int nr_pages, struct vm_area_struct *vma)
+{
+ __folio_remove_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
+}
+
+/**
+ * folio_remove_rmap_pmd - remove a PMD mapping from a page range of a folio
+ * @folio: The folio to remove the mapping from
+ * @page: The first page to remove
+ * @vma: The vm area from which the mapping is removed
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_remove_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
--
2.41.0

2023-12-04 14:24:08

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 16/39] mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()

Let's convert remove_migration_pmd(). There is no need to set
RMAP_COMPOUND, which we will remove soon.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2c037ab3f4916..332cb6cf99f38 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3310,12 +3310,12 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
pmde = pmd_mkdirty(pmde);

if (folio_test_anon(folio)) {
- rmap_t rmap_flags = RMAP_COMPOUND;
+ rmap_t rmap_flags = RMAP_NONE;

if (!is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;

- page_add_anon_rmap(new, vma, haddr, rmap_flags);
+ folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags);
} else {
folio_add_file_rmap_pmd(folio, new, vma);
}
--
2.41.0

2023-12-04 14:24:21

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 20/39] mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert restore_exclusive_pte() and do_swap_page(). While at it,
perform some folio conversion in restore_exclusive_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index be7fe58f7c297..9543b6e2b749b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -706,6 +706,7 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
struct page *page, unsigned long address,
pte_t *ptep)
{
+ struct folio *folio = page_folio(page);
pte_t orig_pte;
pte_t pte;
swp_entry_t entry;
@@ -721,14 +722,15 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
else if (is_writable_device_exclusive_entry(entry))
pte = maybe_mkwrite(pte_mkdirty(pte), vma);

- VM_BUG_ON(pte_write(pte) && !(PageAnon(page) && PageAnonExclusive(page)));
+ VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
+ PageAnonExclusive(page)), folio);

/*
* No need to take a page reference as one was already
* created when the swap entry was made.
*/
- if (PageAnon(page))
- page_add_anon_rmap(page, vma, address, RMAP_NONE);
+ if (folio_test_anon(folio))
+ folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE);
else
/*
* Currently device exclusive access only supports anonymous
@@ -4065,7 +4067,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
page_add_new_anon_rmap(page, vma, vmf->address);
folio_add_lru_vma(folio, vma);
} else {
- page_add_anon_rmap(page, vma, vmf->address, rmap_flags);
+ folio_add_anon_rmap_pte(folio, page, vma, vmf->address,
+ rmap_flags);
}

VM_BUG_ON(!folio_test_anon(folio) ||
--
2.41.0

2023-12-04 14:24:22

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 18/39] mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert replace_page(). While at it, perform some folio
conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/ksm.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 6a831009b4cbf..357944588a9a9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1199,6 +1199,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
static int replace_page(struct vm_area_struct *vma, struct page *page,
struct page *kpage, pte_t orig_pte)
{
+ struct folio *kfolio = page_folio(kpage);
struct mm_struct *mm = vma->vm_mm;
struct folio *folio;
pmd_t *pmd;
@@ -1238,15 +1239,16 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
goto out_mn;
}
VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
- VM_BUG_ON_PAGE(PageAnon(kpage) && PageAnonExclusive(kpage), kpage);
+ VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
+ kfolio);

/*
* No need to check ksm_use_zero_pages here: we can only have a
* zero_page here if ksm_use_zero_pages was enabled already.
*/
if (!is_zero_pfn(page_to_pfn(kpage))) {
- get_page(kpage);
- page_add_anon_rmap(kpage, vma, addr, RMAP_NONE);
+ folio_get(kfolio);
+ folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
newpte = mk_pte(kpage, vma->vm_page_prot);
} else {
/*
--
2.41.0

2023-12-04 14:24:32

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.

While at it, use more folio operations (but only in the code branch we're
touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
manually setting PageAnonExclusive.

We should never see non-anon pages on that branch: otherwise, the
existing page_add_anon_rmap() call would have been flawed already.
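
The structural change, as a sketch (the hunk below differs in detail): the
per-subpage rmap call inside the PTE loop is replaced by one batched call
before the loop:

	/* Sketch -- before: one call per subpage inside the PTE loop */
	page_add_anon_rmap(page + i, vma, addr, RMAP_NONE);
	/* after: a single batched call covering all HPAGE_PMD_NR subpages */
	folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, haddr,
				 anon_exclusive ? RMAP_EXCLUSIVE : RMAP_NONE);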

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cb33c6e0404cf..2c037ab3f4916 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2099,6 +2099,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long haddr, bool freeze)
{
struct mm_struct *mm = vma->vm_mm;
+ struct folio *folio;
struct page *page;
pgtable_t pgtable;
pmd_t old_pmd, _pmd;
@@ -2194,16 +2195,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
uffd_wp = pmd_swp_uffd_wp(old_pmd);
} else {
page = pmd_page(old_pmd);
+ folio = page_folio(page);
if (pmd_dirty(old_pmd)) {
dirty = true;
- SetPageDirty(page);
+ folio_set_dirty(folio);
}
write = pmd_write(old_pmd);
young = pmd_young(old_pmd);
soft_dirty = pmd_soft_dirty(old_pmd);
uffd_wp = pmd_uffd_wp(old_pmd);

- VM_BUG_ON_PAGE(!page_count(page), page);
+ VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

/*
* Without "freeze", we'll simply split the PMD, propagating the
@@ -2220,11 +2223,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
*
* See page_try_share_anon_rmap(): invalidate PMD first.
*/
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = PageAnonExclusive(page);
if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
freeze = false;
- if (!freeze)
- page_ref_add(page, HPAGE_PMD_NR - 1);
+ if (!freeze) {
+ rmap_t rmap_flags = RMAP_NONE;
+
+ folio_ref_add(folio, HPAGE_PMD_NR - 1);
+ if (anon_exclusive)
+ rmap_flags = RMAP_EXCLUSIVE;
+ folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
+ vma, haddr, rmap_flags);
+ }
}

/*
@@ -2267,8 +2277,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
if (write)
entry = pte_mkwrite(entry, vma);
- if (anon_exclusive)
- SetPageAnonExclusive(page + i);
if (!young)
entry = pte_mkold(entry);
/* NOTE: this may set soft-dirty too on some archs */
@@ -2278,7 +2286,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = pte_mksoft_dirty(entry);
if (uffd_wp)
entry = pte_mkuffd_wp(entry);
- page_add_anon_rmap(page + i, vma, addr, RMAP_NONE);
}
VM_BUG_ON(!pte_none(ptep_get(pte)));
set_pte_at(mm, addr, pte, entry);
--
2.41.0

2023-12-04 14:24:37

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 24/39] kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert __replace_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/events/uprobes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 435aac1d8c272..16731d240e169 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -198,7 +198,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
set_pte_at_notify(mm, addr, pvmw.pte,
mk_pte(new_page, vma->vm_page_prot));

- page_remove_rmap(old_page, vma, false);
+ folio_remove_rmap_pte(old_folio, old_page, vma);
if (!folio_mapped(old_folio))
folio_free_swap(old_folio);
page_vma_mapped_walk_done(&pvmw);
--
2.41.0

2023-12-04 14:24:39

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 26/39] mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert __collapse_huge_page_copy_succeeded() and
collapse_pte_mapped_thp(). While at it, perform some more folio
conversion in __collapse_huge_page_copy_succeeded().

We can get rid of release_pte_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/khugepaged.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 064654717843e..c2d7438fac22d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -493,11 +493,6 @@ static void release_pte_folio(struct folio *folio)
folio_putback_lru(folio);
}

-static void release_pte_page(struct page *page)
-{
- release_pte_folio(page_folio(page));
-}
-
static void release_pte_pages(pte_t *pte, pte_t *_pte,
struct list_head *compound_pagelist)
{
@@ -686,6 +681,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
spinlock_t *ptl,
struct list_head *compound_pagelist)
{
+ struct folio *src_folio;
struct page *src_page;
struct page *tmp;
pte_t *_pte;
@@ -707,16 +703,17 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
}
} else {
src_page = pte_page(pteval);
- if (!PageCompound(src_page))
- release_pte_page(src_page);
+ src_folio = page_folio(src_page);
+ if (!folio_test_large(src_folio))
+ release_pte_folio(src_folio);
/*
* ptl mostly unnecessary, but preempt has to
* be disabled to update the per-cpu stats
- * inside page_remove_rmap().
+ * inside folio_remove_rmap_pte().
*/
spin_lock(ptl);
ptep_clear(vma->vm_mm, address, _pte);
- page_remove_rmap(src_page, vma, false);
+ folio_remove_rmap_pte(src_folio, src_page, vma);
spin_unlock(ptl);
free_page_and_swap_cache(src_page);
}
@@ -1619,7 +1616,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
* PTE dirty? Shmem page is already dirty; file is read-only.
*/
ptep_clear(mm, addr, pte);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
nr_ptes++;
}

--
2.41.0

2023-12-04 14:24:45

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 28/39] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert zap_pte_range() and closely-related
tlb_flush_rmap_batch(). While at it, perform some more folio conversion
in zap_pte_range().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 23 +++++++++++++----------
mm/mmu_gather.c | 2 +-
2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 9543b6e2b749b..8c4f98bb617fa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1425,6 +1425,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
arch_enter_lazy_mmu_mode();
do {
pte_t ptent = ptep_get(pte);
+ struct folio *folio;
struct page *page;

if (pte_none(ptent))
@@ -1450,21 +1451,22 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
continue;
}

+ folio = page_folio(page);
delay_rmap = 0;
- if (!PageAnon(page)) {
+ if (!folio_test_anon(folio)) {
if (pte_dirty(ptent)) {
- set_page_dirty(page);
+ folio_set_dirty(folio);
if (tlb_delay_rmap(tlb)) {
delay_rmap = 1;
force_flush = 1;
}
}
if (pte_young(ptent) && likely(vma_has_recency(vma)))
- mark_page_accessed(page);
+ folio_mark_accessed(folio);
}
rss[mm_counter(page)]--;
if (!delay_rmap) {
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
if (unlikely(page_mapcount(page) < 0))
print_bad_pte(vma, addr, ptent, page);
}
@@ -1480,6 +1482,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
if (is_device_private_entry(entry) ||
is_device_exclusive_entry(entry)) {
page = pfn_swap_entry_to_page(entry);
+ folio = page_folio(page);
if (unlikely(!should_zap_page(details, page)))
continue;
/*
@@ -1491,8 +1494,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
WARN_ON_ONCE(!vma_is_anonymous(vma));
rss[mm_counter(page)]--;
if (is_device_private_entry(entry))
- page_remove_rmap(page, vma, false);
- put_page(page);
+ folio_remove_rmap_pte(folio, page, vma);
+ folio_put(folio);
} else if (!non_swap_entry(entry)) {
/* Genuine swap entry, hence a private anon page */
if (!should_zap_cows(details))
@@ -3210,10 +3213,10 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* threads.
*
* The critical issue is to order this
- * page_remove_rmap with the ptp_clear_flush above.
- * Those stores are ordered by (if nothing else,)
+ * folio_remove_rmap_pte() with the ptp_clear_flush
+ * above. Those stores are ordered by (if nothing else,)
* the barrier present in the atomic_add_negative
- * in page_remove_rmap.
+ * in folio_remove_rmap_pte();
*
* Then the TLB flush in ptep_clear_flush ensures that
* no process can access the old page before the
@@ -3222,7 +3225,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* mapcount is visible. So transitively, TLBs to
* old page will be flushed before it can be reused.
*/
- page_remove_rmap(vmf->page, vma, false);
+ folio_remove_rmap_pte(old_folio, vmf->page, vma);
}

/* Free the old page.. */
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 4f559f4ddd217..604ddf08affed 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -55,7 +55,7 @@ static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_

if (encoded_page_flags(enc)) {
struct page *page = encoded_page_ptr(enc);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(page_folio(page), page, vma);
}
}
}
--
2.41.0

2023-12-04 14:24:47

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 27/39] mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert replace_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/ksm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 357944588a9a9..c23aed4f1a344 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1279,7 +1279,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
set_pte_at_notify(mm, addr, ptep, newpte);

folio = page_folio(page);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
if (!folio_mapped(folio))
folio_free_swap(folio);
folio_put(folio);
--
2.41.0

2023-12-04 14:25:08

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 30/39] mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert try_to_unmap_one() and try_to_migrate_one().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/rmap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 50b6909157ac1..4a0114d04ab48 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1598,7 +1598,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,

/*
* When racing against e.g. zap_pte_range() on another cpu,
- * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
* try_to_unmap() may return before page_mapped() has become false,
* if page table locking is skipped: use TTU_SYNC to wait for that.
*/
@@ -1879,7 +1879,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (unlikely(folio_test_hugetlb(folio)))
hugetlb_remove_rmap(folio);
else
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -1947,7 +1947,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,

/*
* When racing against e.g. zap_pte_range() on another cpu,
- * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
* try_to_migrate() may return before page_mapped() has become false,
* if page table locking is skipped: use TTU_SYNC to wait for that.
*/
@@ -2240,7 +2240,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
if (unlikely(folio_test_hugetlb(folio)))
hugetlb_remove_rmap(folio);
else
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -2379,7 +2379,7 @@ static bool page_make_device_exclusive_one(struct folio *folio,
* There is a reference on the page for the swap entry which has
* been removed, so shouldn't take another.
*/
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
}

mmu_notifier_invalidate_range_end(&range);
--
2.41.0

2023-12-04 14:25:08

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 29/39] mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert migrate_vma_collect_pmd(). While at it, perform more
folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate_device.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 8ac1f79f754a2..c51c99151ebb5 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -107,6 +107,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

for (; addr < end; addr += PAGE_SIZE, ptep++) {
unsigned long mpfn = 0, pfn;
+ struct folio *folio;
struct page *page;
swp_entry_t entry;
pte_t pte;
@@ -168,41 +169,43 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
}

/*
- * By getting a reference on the page we pin it and that blocks
+ * By getting a reference on the folio we pin it and that blocks
* any kind of migration. Side effect is that it "freezes" the
* pte.
*
- * We drop this reference after isolating the page from the lru
- * for non device page (device page are not on the lru and thus
+ * We drop this reference after isolating the folio from the lru
+ * for non device folio (device folio are not on the lru and thus
* can't be dropped from it).
*/
- get_page(page);
+ folio = page_folio(page);
+ folio_get(folio);

/*
- * We rely on trylock_page() to avoid deadlock between
+ * We rely on folio_trylock() to avoid deadlock between
* concurrent migrations where each is waiting on the others
- * page lock. If we can't immediately lock the page we fail this
+ * folio lock. If we can't immediately lock the folio we fail this
* migration as it is only best effort anyway.
*
- * If we can lock the page it's safe to set up a migration entry
- * now. In the common case where the page is mapped once in a
+ * If we can lock the folio it's safe to set up a migration entry
+ * now. In the common case where the folio is mapped once in a
* single process setting up the migration entry now is an
* optimisation to avoid walking the rmap later with
* try_to_migrate().
*/
- if (trylock_page(page)) {
+ if (folio_trylock(folio)) {
bool anon_exclusive;
pte_t swp_pte;

flush_cache_page(vma, addr, pte_pfn(pte));
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = folio_test_anon(folio) &&
+ PageAnonExclusive(page);
if (anon_exclusive) {
pte = ptep_clear_flush(vma, addr, ptep);

if (page_try_share_anon_rmap(page)) {
set_pte_at(mm, addr, ptep, pte);
- unlock_page(page);
- put_page(page);
+ folio_unlock(folio);
+ folio_put(folio);
mpfn = 0;
goto next;
}
@@ -214,7 +217,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pte))
- folio_mark_dirty(page_folio(page));
+ folio_mark_dirty(folio);

/* Setup special migration page table entry */
if (mpfn & MIGRATE_PFN_WRITE)
@@ -248,16 +251,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

/*
* This is like regular unmap: we remove the rmap and
- * drop page refcount. Page won't be freed, as we took
- * a reference just above.
+ * drop the folio refcount. The folio won't be freed, as
+ * we took a reference just above.
*/
- page_remove_rmap(page, vma, false);
- put_page(page);
+ folio_remove_rmap_pte(folio, page, vma);
+ folio_put(folio);

if (pte_present(pte))
unmapped++;
} else {
- put_page(page);
+ folio_put(folio);
mpfn = 0;
}

--
2.41.0

2023-12-04 14:25:10

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 31/39] Documentation: stop referring to page_remove_rmap()

Refer to folio_remove_rmap_*() instead.

Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/mm/transhuge.rst | 2 +-
Documentation/mm/unevictable-lru.rst | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index 9a607059ea11c..cf81272a6b8b6 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -156,7 +156,7 @@ Partial unmap and deferred_split_folio()

Unmapping part of THP (with munmap() or other way) is not going to free
memory immediately. Instead, we detect that a subpage of THP is not in use
-in page_remove_rmap() and queue the THP for splitting if memory pressure
+in folio_remove_rmap_*() and queue the THP for splitting if memory pressure
comes. Splitting will free up unused subpages.

Splitting the page right away is not an option due to locking context in
diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
index 67f1338440a50..b6a07a26b10d5 100644
--- a/Documentation/mm/unevictable-lru.rst
+++ b/Documentation/mm/unevictable-lru.rst
@@ -486,7 +486,7 @@ munlock the pages if we're removing the last VM_LOCKED VMA that maps the pages.
Before the unevictable/mlock changes, mlocking did not mark the pages in any
way, so unmapping them required no processing.

-For each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
+For each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls
munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).

@@ -511,7 +511,7 @@ userspace; truncation even unmaps and deletes any private anonymous pages
which had been Copied-On-Write from the file pages now being truncated.

Mlocked pages can be munlocked and deleted in this way: like with munmap(),
-for each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
+for each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls
munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).

--
2.41.0

2023-12-04 14:25:34

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

The last users of page_needs_cow_for_dma() and __page_dup_rmap() are
gone; let's remove them.

Add folio_try_dup_anon_rmap_ptes() right away, as we want to perform
rmap batching during fork() soon.
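
A rough sketch of such a batched fork() caller (hypothetical; "nr" and the
error handling are placeholders, not code from this series):

	/* Sketch: duplicate nr PTE mappings of one anon folio in one go. */
	folio_ref_add(folio, nr);
	if (unlikely(folio_try_dup_anon_rmap_ptes(folio, page, nr, src_vma))) {
		/* The folio may be pinned: undo and copy pages for the child. */
		folio_ref_sub(folio, nr);
		return -EAGAIN;
	}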

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 6 --
include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
2 files changed, 100 insertions(+), 51 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 24c1c7c5a99c0..f7565b35ae931 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
return folio_maybe_dma_pinned(folio);
}

-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
-{
- return folio_needs_cow_for_dma(vma, page_folio(page));
-}
-
/**
* is_zero_page - Query if a page is a zero page
* @page: The page to query
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 21d72cc602adc..84439f7720c62 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio,
#endif
}

-static inline void __page_dup_rmap(struct page *page, bool compound)
+static inline int __folio_try_dup_anon_rmap(struct folio *folio,
+ struct page *page, unsigned int nr_pages,
+ struct vm_area_struct *src_vma, enum rmap_mode mode)
{
- VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+ int i;

- if (compound) {
- struct folio *folio = (struct folio *)page;
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
- atomic_inc(&folio->_entire_mapcount);
- } else {
- atomic_inc(&page->_mapcount);
+ /*
+ * No need to check+clear for already shared PTEs/PMDs of the folio.
+ * This includes PTE mappings of (order-0) KSM folios.
+ */
+ if (likely(mode == RMAP_MODE_PTE)) {
+ for (i = 0; i < nr_pages; i++) {
+ if (PageAnonExclusive(page + i))
+ goto clear;
+ }
+ } else if (mode == RMAP_MODE_PMD) {
+ if (PageAnonExclusive(page))
+ goto clear;
}
+ goto dup;
+
+clear:
+ /*
+ * If this folio may have been pinned by the parent process,
+ * don't allow to duplicate the mappings but instead require to e.g.,
+ * copy the subpage immediately for the child so that we'll always
+ * guarantee the pinned folio won't be randomly replaced in the
+ * future on write faults.
+ */
+ if (likely(!folio_is_device_private(folio) &&
+ unlikely(folio_needs_cow_for_dma(src_vma, folio))))
+ return -EBUSY;
+
+ if (likely(mode == RMAP_MODE_PTE)) {
+ for (i = 0; i < nr_pages; i++)
+ ClearPageAnonExclusive(page + i);
+ } else if (mode == RMAP_MODE_PMD) {
+ ClearPageAnonExclusive(page);
+ }
+
+dup:
+ __folio_dup_rmap(folio, page, nr_pages, mode);
+ return 0;
}

/**
- * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
- * anonymous page
- * @page: the page to duplicate the mapping for
- * @compound: the page is mapped as compound or as a small page
- * @vma: the source vma
+ * folio_try_dup_anon_rmap_ptes - try duplicating PTE mappings of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mappings of
+ * @page: The first page to duplicate the mappings of
+ * @nr_pages: The number of pages of which the mapping will be duplicated
+ * @src_vma: The vm area from which the mappings are duplicated
*
- * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq.
+ * The page range of the folio is defined by [page, page + nr_pages)
*
- * Duplicating the mapping can only fail if the page may be pinned; device
- * private pages cannot get pinned and consequently this function cannot fail.
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
+ *
+ * Duplicating the mappings can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail.
+ *
+ * If duplicating the mappings succeeded, the duplicated PTEs have to be R/O in
+ * the parent and the child. They must *not* be writable after this call.
+ *
+ * Returns 0 if duplicating the mappings succeeded. Returns -EBUSY otherwise.
+ */
+static inline int folio_try_dup_anon_rmap_ptes(struct folio *folio,
+ struct page *page, unsigned int nr_pages,
+ struct vm_area_struct *src_vma)
+{
+ return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma,
+ RMAP_MODE_PTE);
+}
+#define folio_try_dup_anon_rmap_pte(folio, page, vma) \
+ folio_try_dup_anon_rmap_ptes(folio, page, 1, vma)
+
+/**
+ * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mapping of
+ * @page: The first page to duplicate the mapping of
+ * @src_vma: The vm area from which the mapping is duplicated
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
*
- * If duplicating the mapping succeeds, the page has to be mapped R/O into
- * the parent and the child. It must *not* get mapped writable after this call.
+ * Duplicating the mapping can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail.
+ *
+ * If duplicating the mapping succeeds, the duplicated PMD has to be R/O in
+ * the parent and the child. They must *not* be writable after this call.
*
* Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise.
*/
+static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
+ struct page *page, struct vm_area_struct *src_vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ return __folio_try_dup_anon_rmap(folio, page, HPAGE_PMD_NR, src_vma,
+ RMAP_MODE_PMD);
+#else
+	WARN_ON_ONCE(true);
+	return -EBUSY;
+#endif
+}
+
static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
struct vm_area_struct *vma)
{
- VM_BUG_ON_PAGE(!PageAnon(page), page);
-
- /*
- * No need to check+clear for already shared pages, including KSM
- * pages.
- */
- if (!PageAnonExclusive(page))
- goto dup;
-
- /*
- * If this page may have been pinned by the parent process,
- * don't allow to duplicate the mapping but instead require to e.g.,
- * copy the page immediately for the child so that we'll always
- * guarantee the pinned page won't be randomly replaced in the
- * future on write faults.
- */
- if (likely(!is_device_private_page(page) &&
- unlikely(page_needs_cow_for_dma(vma, page))))
- return -EBUSY;
+ struct folio *folio = page_folio(page);

- ClearPageAnonExclusive(page);
- /*
- * It's okay to share the anon page between both processes, mapping
- * the page R/O into both processes.
- */
-dup:
- __page_dup_rmap(page, compound);
- return 0;
+ if (likely(!compound))
+ return folio_try_dup_anon_rmap_pte(folio, page, vma);
+ return folio_try_dup_anon_rmap_pmd(folio, page, vma);
}

/**
--
2.41.0

2023-12-04 14:25:35

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 32/39] mm/rmap: remove page_remove_rmap()

All callers are gone; let's remove it along with some leftover traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 4 +---
mm/internal.h | 2 +-
mm/memory-failure.c | 4 ++--
mm/rmap.c | 23 ++---------------------
4 files changed, 6 insertions(+), 27 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index dd4ffb1d8ae04..bdbfb11638e4f 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -239,8 +239,6 @@ void folio_add_file_rmap_ptes(struct folio *, struct page *, unsigned int nr,
folio_add_file_rmap_ptes(folio, page, 1, vma)
void folio_add_file_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
-void page_remove_rmap(struct page *, struct vm_area_struct *,
- bool compound);
void folio_remove_rmap_ptes(struct folio *, struct page *, unsigned int nr,
struct vm_area_struct *);
#define folio_remove_rmap_pte(folio, page, vma) \
@@ -384,7 +382,7 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
*
* This is similar to page_try_dup_anon_rmap(), however, not used during fork()
* to duplicate a mapping, but instead to prepare for KSM or temporarily
- * unmapping a page (swap, migration) via page_remove_rmap().
+ * unmapping a page (swap, migration) via folio_remove_rmap_*().
*
* Marking the page shared can only fail if the page may be pinned; device
* private pages cannot get pinned and consequently this function cannot fail.
diff --git a/mm/internal.h b/mm/internal.h
index b61034bd50f5f..43dca750c5afc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -651,7 +651,7 @@ folio_within_vma(struct folio *folio, struct vm_area_struct *vma)
* under page table lock for the pte/pmd being added or removed.
*
* mlock is usually called at the end of page_add_*_rmap(), munlock at
- * the end of page_remove_rmap(); but new anon folios are managed by
+ * the end of folio_remove_rmap_*(); but new anon folios are managed by
* folio_add_lru_vma() calling mlock_new_folio().
*/
void mlock_folio(struct folio *folio);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 660c21859118e..d0251cba84795 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2317,8 +2317,8 @@ int memory_failure(unsigned long pfn, int flags)
* We use page flags to determine what action should be taken, but
* the flags can be modified by the error containment action. One
* example is an mlocked page, where PG_mlocked is cleared by
- * page_remove_rmap() in try_to_unmap_one(). So to determine page status
- * correctly, we save a copy of the page flags at this time.
+ * folio_remove_rmap_*() in try_to_unmap_one(). So to determine page
+ * status correctly, we save a copy of the page flags at this time.
*/
page_flags = p->flags;

diff --git a/mm/rmap.c b/mm/rmap.c
index 4a0114d04ab48..8e86024953c03 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -470,7 +470,7 @@ void __init anon_vma_init(void)
/*
* Getting a lock on a stable anon_vma from a page off the LRU is tricky!
*
- * Since there is no serialization what so ever against page_remove_rmap()
+ * Since there is no serialization what so ever against folio_remove_rmap_*()
* the best this function can do is return a refcount increased anon_vma
* that might have been relevant to this page.
*
@@ -487,7 +487,7 @@ void __init anon_vma_init(void)
* [ something equivalent to page_mapped_in_vma() ].
*
* Since anon_vma's slab is SLAB_TYPESAFE_BY_RCU and we know from
- * page_remove_rmap() that the anon_vma pointer from page->mapping is valid
+ * folio_remove_rmap_*() that the anon_vma pointer from page->mapping is valid
* if there is a mapcount, we can dereference the anon_vma after observing
* those.
*/
@@ -1451,25 +1451,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
#endif
}

-/**
- * page_remove_rmap - take down pte mapping from a page
- * @page: page to remove mapping from
- * @vma: the vm area from which the mapping is removed
- * @compound: uncharge the page as compound or small page
- *
- * The caller needs to hold the pte lock.
- */
-void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!compound))
- folio_remove_rmap_pte(folio, page, vma);
- else
- folio_remove_rmap_pmd(folio, page, vma);
-}
-
static __always_inline void __folio_remove_rmap(struct folio *folio,
struct page *page, unsigned int nr_pages,
struct vm_area_struct *vma, enum rmap_mode mode)
--
2.41.0

2023-12-04 14:25:58

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 35/39] mm/huge_memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pmd()

Let's convert copy_huge_pmd() and fix up the comment in copy_huge_pud().
While at it, perform more folio conversion in copy_huge_pmd().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9376c28b0ad29..138e1e62790be 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1098,6 +1098,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
{
spinlock_t *dst_ptl, *src_ptl;
struct page *src_page;
+ struct folio *src_folio;
pmd_t pmd;
pgtable_t pgtable = NULL;
int ret = -ENOMEM;
@@ -1164,11 +1165,12 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,

src_page = pmd_page(pmd);
VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+ src_folio = page_folio(src_page);

- get_page(src_page);
- if (unlikely(page_try_dup_anon_rmap(src_page, true, src_vma))) {
+ folio_get(src_folio);
+ if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) {
/* Page maybe pinned: split and retry the fault on PTEs. */
- put_page(src_page);
+ folio_put(src_folio);
pte_free(dst_mm, pgtable);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
@@ -1277,8 +1279,8 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}

/*
- * TODO: once we support anonymous pages, use page_try_dup_anon_rmap()
- * and split if duplicating fails.
+ * TODO: once we support anonymous pages, use
+ * folio_try_dup_anon_rmap_*() and split if duplicating fails.
*/
pudp_set_wrprotect(src_mm, addr, src_pud);
pud = pud_mkold(pud_wrprotect(pud));
--
2.41.0

2023-12-04 14:26:05

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 36/39] mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()

Let's convert copy_nonpresent_pte(). While at it, perform some more
folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index eaab6a2e14eba..ad6da8168e461 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -781,6 +781,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
unsigned long vm_flags = dst_vma->vm_flags;
pte_t orig_pte = ptep_get(src_pte);
pte_t pte = orig_pte;
+ struct folio *folio;
struct page *page;
swp_entry_t entry = pte_to_swp_entry(orig_pte);

@@ -825,6 +826,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}
} else if (is_device_private_entry(entry)) {
page = pfn_swap_entry_to_page(entry);
+ folio = page_folio(page);

/*
* Update rss count even for unaddressable pages, as
@@ -835,10 +837,10 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* for unaddressable pages, at some point. But for now
* keep things as they are.
*/
- get_page(page);
+ folio_get(folio);
rss[mm_counter(page)]++;
/* Cannot fail as these pages cannot get pinned. */
- BUG_ON(page_try_dup_anon_rmap(page, false, src_vma));
+ folio_try_dup_anon_rmap_pte(folio, page, src_vma);

/*
* We do not preserve soft-dirty information, because so
@@ -952,7 +954,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
* future.
*/
folio_get(folio);
- if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) {
+ if (unlikely(folio_try_dup_anon_rmap_pte(folio, page, src_vma))) {
/* Page may be pinned, we have to copy. */
folio_put(folio);
return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
--
2.41.0

2023-12-04 14:26:07

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 38/39] mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]()

Let's convert it like we converted all the other rmap functions.
Don't introduce folio_try_share_anon_rmap_ptes() for now, as we don't
have a user that wants rmap batching in sight. Pretty easy to add later.

All users are easy to convert -- only ksm.c doesn't use folios yet but
that is left for future work -- so let's just do it in a single shot.

While at it, turn the BUG_ON into a WARN_ON_ONCE.

Note that page_try_share_anon_rmap() so far didn't care about pte/pmd
mappings (no compound parameter). We're changing that so we can perform
better sanity checks and make the code actually more readable/consistent.
For example, __folio_rmap_sanity_checks() will make sure that a PMD
range actually falls completely into the folio.
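
As a rough sketch of the kind of check meant here (the actual
__folio_rmap_sanity_checks() helper, introduced earlier in this series,
differs in detail):

	/* Sketch: a PMD-mapped range must lie entirely within the folio. */
	VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
	VM_WARN_ON_FOLIO(page_folio(page + HPAGE_PMD_NR - 1) != folio, folio);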

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 98 ++++++++++++++++++++++++++++++++------------
mm/gup.c | 2 +-
mm/huge_memory.c | 9 ++--
mm/internal.h | 4 +-
mm/ksm.c | 5 ++-
mm/migrate_device.c | 2 +-
mm/rmap.c | 9 ++--
7 files changed, 89 insertions(+), 40 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 3c1df8e020188..ab3ea4583d502 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -267,14 +267,14 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
return 0;
}

-/* See page_try_share_anon_rmap() */
+/* See folio_try_share_anon_rmap_*() */
static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);

- /* See page_try_share_anon_rmap() */
+ /* See folio_try_share_anon_rmap_*() */
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_mb();

@@ -282,7 +282,7 @@ static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
return -EBUSY;
ClearPageAnonExclusive(&folio->page);

- /* See page_try_share_anon_rmap() */
+ /* See folio_try_share_anon_rmap_*() */
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_mb__after_atomic();
return 0;
@@ -463,30 +463,15 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
#endif
}

-/**
- * page_try_share_anon_rmap - try marking an exclusive anonymous page possibly
- * shared to prepare for KSM or temporary unmapping
- * @page: the exclusive anonymous page to try marking possibly shared
- *
- * The caller needs to hold the PT lock and has to have the page table entry
- * cleared/invalidated.
- *
- * This is similar to folio_try_dup_anon_rmap_*(), however, not used during
- * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily
- * unmapping a page (swap, migration) via folio_remove_rmap_*().
- *
- * Marking the page shared can only fail if the page may be pinned; device
- * private pages cannot get pinned and consequently this function cannot fail.
- *
- * Returns 0 if marking the page possibly shared succeeded. Returns -EBUSY
- * otherwise.
- */
-static inline int page_try_share_anon_rmap(struct page *page)
+static inline int __folio_try_share_anon_rmap(struct folio *folio,
+ struct page *page, unsigned int nr_pages, enum rmap_mode mode)
{
- VM_BUG_ON_PAGE(!PageAnon(page) || !PageAnonExclusive(page), page);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

- /* device private pages cannot get pinned via GUP. */
- if (unlikely(is_device_private_page(page))) {
+ /* device private folios cannot get pinned via GUP. */
+ if (unlikely(folio_is_device_private(folio))) {
ClearPageAnonExclusive(page);
return 0;
}
@@ -537,7 +522,7 @@ static inline int page_try_share_anon_rmap(struct page *page)
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_mb();

- if (unlikely(page_maybe_dma_pinned(page)))
+ if (unlikely(folio_maybe_dma_pinned(folio)))
return -EBUSY;
ClearPageAnonExclusive(page);

@@ -550,6 +535,67 @@ static inline int page_try_share_anon_rmap(struct page *page)
return 0;
}

+/**
+ * folio_try_share_anon_rmap_pte - try marking an exclusive anonymous page
+ * mapped by a PTE possibly shared to prepare
+ * for KSM or temporary unmapping
+ * @folio: The folio to share a mapping of
+ * @page: The mapped exclusive page
+ *
+ * The caller needs to hold the page table lock and has to have the page table
+ * entries cleared/invalidated.
+ *
+ * This is similar to folio_try_dup_anon_rmap_pte(), however, not used during
+ * fork() to duplicate mappings, but instead to prepare for KSM or temporarily
+ * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pte().
+ *
+ * Marking the mapped page shared can only fail if the folio may be pinned;
+ * device private folios cannot get pinned and consequently this function cannot
+ * fail.
+ *
+ * Returns 0 if marking the mapped page possibly shared succeeded. Returns
+ * -EBUSY otherwise.
+ */
+static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
+ struct page *page)
+{
+ return __folio_try_share_anon_rmap(folio, page, 1, RMAP_MODE_PTE);
+}
+
+/**
+ * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
+ * range mapped by a PMD possibly shared to
+ * prepare for temporary unmapping
+ * @folio: The folio to share the mapping of
+ * @page: The first page to share the mapping of
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock and has to have the page table
+ * entries cleared/invalidated.
+ *
+ * This is similar to folio_try_dup_anon_rmap_pmd(), however, not used during
+ * fork() to duplicate a mapping, but instead to prepare for temporarily
+ * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pmd().
+ *
+ * Marking the mapped pages shared can only fail if the folio may be pinned;
+ * device private folios cannot get pinned and consequently this function cannot
+ * fail.
+ *
+ * Returns 0 if marking the mapped pages possibly shared succeeded. Returns
+ * -EBUSY otherwise.
+ */
+static inline int folio_try_share_anon_rmap_pmd(struct folio *folio,
+ struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ return __folio_try_share_anon_rmap(folio, page, HPAGE_PMD_NR,
+ RMAP_MODE_PMD);
+#else
+	WARN_ON_ONCE(true);
+	return -EBUSY;
+#endif
+}
+
/*
* Called from mm/vmscan.c to handle paging out
*/
diff --git a/mm/gup.c b/mm/gup.c
index 231711efa390d..49f32411c68da 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -177,7 +177,7 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
/*
* Adjust the pincount before re-checking the PTE for changes.
* This is essentially a smp_mb() and is paired with a memory
- * barrier in page_try_share_anon_rmap().
+ * barrier in folio_try_share_anon_rmap_*().
*/
smp_mb__after_atomic();

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 138e1e62790be..ebbf5ee6192e7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2224,10 +2224,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
* In case we cannot clear PageAnonExclusive(), split the PMD
* only and let try_to_migrate_one() fail later.
*
- * See page_try_share_anon_rmap(): invalidate PMD first.
+ * See folio_try_share_anon_rmap_pmd(): invalidate PMD first.
*/
anon_exclusive = PageAnonExclusive(page);
- if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
+ if (freeze && anon_exclusive &&
+ folio_try_share_anon_rmap_pmd(folio, page))
freeze = false;
if (!freeze) {
rmap_t rmap_flags = RMAP_NONE;
@@ -3253,9 +3254,9 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
pmdval = pmdp_invalidate(vma, address, pvmw->pmd);

- /* See page_try_share_anon_rmap(): invalidate PMD first. */
+ /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
- if (anon_exclusive && page_try_share_anon_rmap(page)) {
+ if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
set_pmd_at(mm, address, pvmw->pmd, pmdval);
return -EBUSY;
}
diff --git a/mm/internal.h b/mm/internal.h
index 43dca750c5afc..b9b630717b9b2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1047,7 +1047,7 @@ enum {
* * Ordinary GUP: Using the PT lock
* * GUP-fast and fork(): mm->write_protect_seq
* * GUP-fast and KSM or temporary unmapping (swap, migration): see
- * page_try_share_anon_rmap()
+ * folio_try_share_anon_rmap_*()
*
* Must be called with the (sub)page that's actually referenced via the
* page table entry, which might not necessarily be the head page for a
@@ -1090,7 +1090,7 @@ static inline bool gup_must_unshare(struct vm_area_struct *vma,
return is_cow_mapping(vma->vm_flags);
}

- /* Paired with a memory barrier in page_try_share_anon_rmap(). */
+ /* Paired with a memory barrier in folio_try_share_anon_rmap_*(). */
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_rmb();

diff --git a/mm/ksm.c b/mm/ksm.c
index c23aed4f1a344..51f2d989be2be 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1161,8 +1161,9 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
goto out_unlock;
}

- /* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive && page_try_share_anon_rmap(page)) {
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
+ if (anon_exclusive &&
+ folio_try_share_anon_rmap_pte(page_folio(page), page)) {
set_pte_at(mm, pvmw.address, pvmw.pte, entry);
goto out_unlock;
}
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index c51c99151ebb5..9d0c1ad737225 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -202,7 +202,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (anon_exclusive) {
pte = ptep_clear_flush(vma, addr, ptep);

- if (page_try_share_anon_rmap(page)) {
+ if (folio_try_share_anon_rmap_pte(folio, page)) {
set_pte_at(mm, addr, ptep, pte);
folio_unlock(folio);
folio_put(folio);
diff --git a/mm/rmap.c b/mm/rmap.c
index 8e86024953c03..7bb3a174efc8d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1817,9 +1817,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}

- /* See page_try_share_anon_rmap(): clear PTE first. */
+ /* See folio_try_share_anon_rmap(): clear PTE first. */
if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
+ folio_try_share_anon_rmap_pte(folio, subpage)) {
swap_free(entry);
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
@@ -2093,7 +2093,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pte_t swp_pte;

if (anon_exclusive)
- BUG_ON(page_try_share_anon_rmap(subpage));
+ WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio,
+ subpage));

/*
* Store the pfn of the page in a special migration
@@ -2175,7 +2176,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
} else if (anon_exclusive &&
- page_try_share_anon_rmap(page)) {
+ folio_try_share_anon_rmap_pte(folio, subpage)) {
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done(&pvmw);
--
2.41.0

2023-12-04 14:26:48

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 33/39] mm/rmap: convert page_dup_file_rmap() to folio_dup_file_rmap_[pte|ptes|pmd]()

Let's convert page_dup_file_rmap() like the other rmap functions. As there
is only a single caller, convert that single caller right away and remove
page_dup_file_rmap().

Add folio_dup_file_rmap_ptes() right away, as we want to perform rmap
batching during fork() soon.
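
A rough usage sketch (hypothetical caller; "nr" is a placeholder): when
fork() copies nr file-backed PTEs of one folio, the duplication collapses
into a single call:

	/* Sketch: duplicate nr file-backed PTE mappings of one folio. */
	folio_ref_add(folio, nr);
	folio_dup_file_rmap_ptes(folio, page, nr);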

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 56 ++++++++++++++++++++++++++++++++++++++++----
mm/memory.c | 2 +-
2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bdbfb11638e4f..21d72cc602adc 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -303,6 +303,57 @@ static inline void hugetlb_remove_rmap(struct folio *folio)
atomic_dec(&folio->_entire_mapcount);
}

+static inline void __folio_dup_rmap(struct folio *folio, struct page *page,
+ unsigned int nr_pages, enum rmap_mode mode)
+{
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
+
+ if (likely(mode == RMAP_MODE_PTE)) {
+ do {
+ atomic_inc(&page->_mapcount);
+ } while (page++, --nr_pages > 0);
+ } else if (mode == RMAP_MODE_PMD) {
+ atomic_inc(&folio->_entire_mapcount);
+ }
+}
+
+/**
+ * folio_dup_file_rmap_ptes - duplicate PTE mappings of a page range of a folio
+ * @folio: The folio to duplicate the mappings of
+ * @page: The first page to duplicate the mappings of
+ * @nr_pages: The number of pages of which the mapping will be duplicated
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+static inline void folio_dup_file_rmap_ptes(struct folio *folio,
+ struct page *page, unsigned int nr_pages)
+{
+ __folio_dup_rmap(folio, page, nr_pages, RMAP_MODE_PTE);
+}
+#define folio_dup_file_rmap_pte(folio, page) \
+ folio_dup_file_rmap_ptes(folio, page, 1)
+
+/**
+ * folio_dup_file_rmap_pmd - duplicate a PMD mapping of a page range of a folio
+ * @folio: The folio to duplicate the mapping of
+ * @page: The first page to duplicate the mapping of
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+static inline void folio_dup_file_rmap_pmd(struct folio *folio,
+ struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_dup_rmap(folio, page, HPAGE_PMD_NR, RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
static inline void __page_dup_rmap(struct page *page, bool compound)
{
VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
@@ -317,11 +368,6 @@ static inline void __page_dup_rmap(struct page *page, bool compound)
}
}

-static inline void page_dup_file_rmap(struct page *page, bool compound)
-{
- __page_dup_rmap(page, compound);
-}
-
/**
* page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
* anonymous page
diff --git a/mm/memory.c b/mm/memory.c
index 8c4f98bb617fa..eaab6a2e14eba 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -961,7 +961,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
rss[MM_ANONPAGES]++;
} else if (page) {
folio_get(folio);
- page_dup_file_rmap(page, false);
+ folio_dup_file_rmap_pte(folio, page);
rss[mm_counter_file(page)]++;
}

--
2.41.0

2023-12-04 14:26:50

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 39/39] mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED

We removed all "bool compound" and RMAP_COMPOUND parameters. Let's
remove the remaining "compound" terminology by making COMPOUND_MAPPED
match the "folio->_entire_mapcount" terminology, renaming it to
ENTIRELY_MAPPED.

ENTIRELY_MAPPED is only used when the whole folio is mapped using a single
page table entry (e.g., a single PMD mapping a PMD-sized THP). For now,
we don't support mapping any THP bigger than that, so ENTIRELY_MAPPED
currently applies only to PMD-mapped, PMD-sized THP.
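
To illustrate the encoding this rename is about, two hypothetical helpers
(not part of the patch) that decode folio->_nr_pages_mapped:

/* ENTIRELY_MAPPED is added to _nr_pages_mapped once per entire (PMD) mapping. */
static inline bool folio_has_entire_mapping(int nr_pages_mapped)
{
	return nr_pages_mapped >= ENTIRELY_MAPPED;
}

/* The low bits (FOLIO_PAGES_MAPPED) count pages currently mapped by PTE. */
static inline int folio_nr_pages_pte_mapped(int nr_pages_mapped)
{
	return nr_pages_mapped & FOLIO_PAGES_MAPPED;
}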

Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/mm/transhuge.rst | 2 +-
mm/internal.h | 6 +++---
mm/rmap.c | 18 +++++++++---------
3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index cf81272a6b8b6..93c9239b9ebe2 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -117,7 +117,7 @@ pages:

- map/unmap of a PMD entry for the whole THP increment/decrement
folio->_entire_mapcount and also increment/decrement
- folio->_nr_pages_mapped by COMPOUND_MAPPED when _entire_mapcount
+ folio->_nr_pages_mapped by ENTIRELY_MAPPED when _entire_mapcount
goes from -1 to 0 or 0 to -1.

- map/unmap of individual pages with PTE entry increment/decrement
diff --git a/mm/internal.h b/mm/internal.h
index b9b630717b9b2..700b230666f87 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -54,12 +54,12 @@ void page_writeback_init(void);

/*
* If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
- * its nr_pages_mapped would be 0x400000: choose the COMPOUND_MAPPED bit
+ * its nr_pages_mapped would be 0x400000: choose the ENTIRELY_MAPPED bit
* above that range, instead of 2*(PMD_SIZE/PAGE_SIZE). Hugetlb currently
* leaves nr_pages_mapped at 0, but avoid surprise if it participates later.
*/
-#define COMPOUND_MAPPED 0x800000
-#define FOLIO_PAGES_MAPPED (COMPOUND_MAPPED - 1)
+#define ENTIRELY_MAPPED 0x800000
+#define FOLIO_PAGES_MAPPED (ENTIRELY_MAPPED - 1)

/*
* Flags passed to __show_mem() and show_free_areas() to suppress output in
diff --git a/mm/rmap.c b/mm/rmap.c
index 7bb3a174efc8d..a8e3563182103 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1142,7 +1142,7 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
first = atomic_inc_and_test(&page->_mapcount);
if (first && folio_test_large(folio)) {
first = atomic_inc_return_relaxed(mapped);
- first = (first < COMPOUND_MAPPED);
+ first = (first < ENTIRELY_MAPPED);
}

if (first)
@@ -1151,15 +1151,15 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
} else if (mode == RMAP_MODE_PMD) {
first = atomic_inc_and_test(&folio->_entire_mapcount);
if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
+ nr = atomic_add_return_relaxed(ENTIRELY_MAPPED, mapped);
+ if (likely(nr < ENTIRELY_MAPPED + ENTIRELY_MAPPED)) {
*nr_pmdmapped = folio_nr_pages(folio);
nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
/* Raced ahead of a remove and another add? */
if (unlikely(nr < 0))
nr = 0;
} else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
+ /* Raced ahead of a remove of ENTIRELY_MAPPED */
nr = 0;
}
}
@@ -1384,7 +1384,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
} else {
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
- atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
+ atomic_set(&folio->_nr_pages_mapped, ENTIRELY_MAPPED);
nr = folio_nr_pages(folio);
__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
}
@@ -1467,7 +1467,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
last = atomic_add_negative(-1, &page->_mapcount);
if (last && folio_test_large(folio)) {
last = atomic_dec_return_relaxed(mapped);
- last = (last < COMPOUND_MAPPED);
+ last = (last < ENTIRELY_MAPPED);
}

if (last)
@@ -1476,15 +1476,15 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
} else if (mode == RMAP_MODE_PMD) {
last = atomic_add_negative(-1, &folio->_entire_mapcount);
if (last) {
- nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED)) {
+ nr = atomic_sub_return_relaxed(ENTIRELY_MAPPED, mapped);
+ if (likely(nr < ENTIRELY_MAPPED)) {
nr_pmdmapped = folio_nr_pages(folio);
nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
/* Raced ahead of another remove and an add? */
if (unlikely(nr < 0))
nr = 0;
} else {
- /* An add of COMPOUND_MAPPED raced ahead */
+ /* An add of ENTIRELY_MAPPED raced ahead */
nr = 0;
}
}
--
2.41.0

2023-12-04 14:27:42

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 06/39] mm/rmap: add hugetlb sanity checks

Let's make sure we end up with the right folios in the right functions.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 7 +++++++
mm/rmap.c | 6 ++++++
2 files changed, 13 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 3f38141b53b9d..77e336f86c72d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -212,6 +212,7 @@ void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

if (PageAnonExclusive(&folio->page)) {
@@ -226,6 +227,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
/* See page_try_share_anon_rmap() */
static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);

@@ -245,6 +247,7 @@ static inline int hugetlb_try_share_anon_rmap(struct folio *folio)

static inline void hugetlb_add_file_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);

atomic_inc(&folio->_entire_mapcount);
@@ -252,11 +255,15 @@ static inline void hugetlb_add_file_rmap(struct folio *folio)

static inline void hugetlb_remove_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
atomic_dec(&folio->_entire_mapcount);
}

static inline void __page_dup_rmap(struct page *page, bool compound)
{
+ VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+
if (compound) {
struct folio *folio = (struct folio *)page;

diff --git a/mm/rmap.c b/mm/rmap.c
index 2f1af3958e687..a735ecca47a81 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1313,6 +1313,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
{
int nr;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
__folio_set_swapbacked(folio);

@@ -1353,6 +1354,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
unsigned int nr_pmdmapped = 0, first;
int nr = 0;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);

/* Is page being mapped by PTE? Is this its first map to be added? */
@@ -1438,6 +1440,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
bool last;
enum node_stat_item idx;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_PAGE(compound && !PageHead(page), page);

/* Is page being unmapped by PTE? Is this its last map to be removed? */
@@ -2590,6 +2593,7 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

atomic_inc(&folio->_entire_mapcount);
@@ -2602,6 +2606,8 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
void hugetlb_add_new_anon_rmap(struct folio *folio,
struct vm_area_struct *vma, unsigned long address)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
--
2.41.0

2023-12-04 14:28:18

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 25/39] mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()

Let's convert zap_huge_pmd() and set_pmd_migration_entry(). While at it,
perform some more folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 332cb6cf99f38..9376c28b0ad29 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1721,7 +1721,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,

if (pmd_present(orig_pmd)) {
page = pmd_page(orig_pmd);
- page_remove_rmap(page, vma, true);
+ folio_remove_rmap_pmd(page_folio(page), page, vma);
VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
VM_BUG_ON_PAGE(!PageHead(page), page);
} else if (thp_migration_supported()) {
@@ -2134,12 +2134,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
page = pfn_swap_entry_to_page(entry);
} else {
page = pmd_page(old_pmd);
- if (!PageDirty(page) && pmd_dirty(old_pmd))
- set_page_dirty(page);
- if (!PageReferenced(page) && pmd_young(old_pmd))
- SetPageReferenced(page);
- page_remove_rmap(page, vma, true);
- put_page(page);
+ folio = page_folio(page);
+ if (!folio_test_dirty(folio) && pmd_dirty(old_pmd))
+ folio_set_dirty(folio);
+ if (!folio_test_referenced(folio) && pmd_young(old_pmd))
+ folio_set_referenced(folio);
+ folio_remove_rmap_pmd(folio, page, vma);
+ folio_put(folio);
}
add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
return;
@@ -2294,7 +2295,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
pte_unmap(pte - 1);

if (!pmd_migration)
- page_remove_rmap(page, vma, true);
+ folio_remove_rmap_pmd(folio, page, vma);
if (freeze)
put_page(page);

@@ -3235,6 +3236,7 @@ late_initcall(split_huge_pages_debugfs);
int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
struct page *page)
{
+ struct folio *folio = page_folio(page);
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
@@ -3250,14 +3252,14 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdval = pmdp_invalidate(vma, address, pvmw->pmd);

/* See page_try_share_anon_rmap(): invalidate PMD first. */
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
if (anon_exclusive && page_try_share_anon_rmap(page)) {
set_pmd_at(mm, address, pvmw->pmd, pmdval);
return -EBUSY;
}

if (pmd_dirty(pmdval))
- set_page_dirty(page);
+ folio_set_dirty(folio);
if (pmd_write(pmdval))
entry = make_writable_migration_entry(page_to_pfn(page));
else if (anon_exclusive)
@@ -3274,8 +3276,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
if (pmd_uffd_wp(pmdval))
pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
- page_remove_rmap(page, vma, true);
- put_page(page);
+ folio_remove_rmap_pmd(folio, page, vma);
+ folio_put(folio);
trace_set_migration_pmd(address, pmd_val(pmdswp));

return 0;
--
2.41.0

2023-12-04 14:57:36

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 37/39] mm/rmap: remove page_try_dup_anon_rmap()

All users are gone, remove page_try_dup_anon_rmap() and any remaining
traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 84439f7720c62..3c1df8e020188 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -251,7 +251,7 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

-/* See page_try_dup_anon_rmap() */
+/* See folio_try_dup_anon_rmap_*() */
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
{
@@ -463,16 +463,6 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
#endif
}

-static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
- struct vm_area_struct *vma)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!compound))
- return folio_try_dup_anon_rmap_pte(folio, page, vma);
- return folio_try_dup_anon_rmap_pmd(folio, page, vma);
-}
-
/**
* page_try_share_anon_rmap - try marking an exclusive anonymous page possibly
* shared to prepare for KSM or temporary unmapping
@@ -481,8 +471,8 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
* The caller needs to hold the PT lock and has to have the page table entry
* cleared/invalidated.
*
- * This is similar to page_try_dup_anon_rmap(), however, not used during fork()
- * to duplicate a mapping, but instead to prepare for KSM or temporarily
+ * This is similar to folio_try_dup_anon_rmap_*(), however, not used during
+ * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily
* unmapping a page (swap, migration) via folio_remove_rmap_*().
*
* Marking the page shared can only fail if the page may be pinned; device
--
2.41.0

2023-12-04 17:59:55

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

> -static inline void __page_dup_rmap(struct page *page, bool compound)
> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,
> + struct page *page, unsigned int nr_pages,
> + struct vm_area_struct *src_vma, enum rmap_mode mode)
> {
> - VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
> + int i;
>
> - if (compound) {
> - struct folio *folio = (struct folio *)page;
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> - VM_BUG_ON_PAGE(compound && !PageHead(page), page);
> - atomic_inc(&folio->_entire_mapcount);
> - } else {
> - atomic_inc(&page->_mapcount);
> + /*
> + * No need to check+clear for already shared PTEs/PMDs of the folio.
> + * This includes PTE mappings of (order-0) KSM folios.
> + */
> + if (likely(mode == RMAP_MODE_PTE)) {
> + for (i = 0; i < nr_pages; i++) {
> + if (PageAnonExclusive(page + i))
> + goto clear;
> + }
> + } else if (mode == RMAP_MODE_PMD) {
> + if (PageAnonExclusive(page))
> + goto clear;
> }
> + goto dup;
> +
> +clear:
> + /*
> + * If this folio may have been pinned by the parent process,
> + * don't allow to duplicate the mappings but instead require to e.g.,
> + * copy the subpage immediately for the child so that we'll always
> + * guarantee the pinned folio won't be randomly replaced in the
> + * future on write faults.
> + */
> + if (likely(!folio_is_device_private(folio) &&
> + unlikely(folio_needs_cow_for_dma(src_vma, folio))))
> + return -EBUSY;
> +
> + if (likely(mode == RMAP_MODE_PTE)) {
> + for (i = 0; i < nr_pages; i++)
> + ClearPageAnonExclusive(page + i);
> + } else if (mode == RMAP_MODE_PMD) {
> + ClearPageAnonExclusive(page);
> + }
> +
> +dup:
> + __folio_dup_rmap(folio, page, nr_pages, mode);
> + return 0;

Playing with this, I think it can be implemented more efficiently by
looping only once and optimizing for the common case that PAE
(PageAnonExclusive) is set.
Will have to do some more measurements.
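
For the record, a rough sketch of that single-loop variant, assuming the
helpers from this RFC (the pin check is simply hoisted, so it runs once even
if no page is PageAnonExclusive):

static inline int __folio_try_dup_anon_rmap(struct folio *folio,
		struct page *page, unsigned int nr_pages,
		struct vm_area_struct *src_vma, enum rmap_mode mode)
{
	bool maybe_pinned;
	int i;

	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

	/*
	 * If the folio may be pinned, we must not clear PageAnonExclusive:
	 * fail before modifying anything, just like the two-pass version.
	 */
	maybe_pinned = likely(!folio_is_device_private(folio)) &&
		       unlikely(folio_needs_cow_for_dma(src_vma, folio));

	if (likely(mode == RMAP_MODE_PTE)) {
		for (i = 0; i < nr_pages; i++) {
			if (PageAnonExclusive(page + i)) {
				if (unlikely(maybe_pinned))
					return -EBUSY;
				ClearPageAnonExclusive(page + i);
			}
		}
	} else if (mode == RMAP_MODE_PMD) {
		if (PageAnonExclusive(page)) {
			if (unlikely(maybe_pinned))
				return -EBUSY;
			ClearPageAnonExclusive(page);
		}
	}

	__folio_dup_rmap(folio, page, nr_pages, mode);
	return 0;
}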

--
Cheers,

David / dhildenb

2023-12-04 19:53:20

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul

On 04/12/2023 14:21, David Hildenbrand wrote:
> Baed on mm-stable from a couple of days.
>
> This series proposes an overhaul to our rmap interface, to get rid of the
> "bool compound" / RMAP_COMPOUND parameter with the goal of making the
> interface less error prone, more future proof, and more natural to extend
> to "batching". Also, this converts the interface to always consume
> folio+subpage, which speeds up operations on large folios.
>
> Further, this series adds PTE-batching variants for 4 rmap functions,
> whereby only folio_add_anon_rmap_ptes() is used for batching in this series
> when PTE-remapping a PMD-mapped THP.

I certainly support the objective you have here; making the interfaces clearer,
more consistent and more amenable to batching. I'll try to find some time this
week to review.

>
> Ryan has series where we would make use of folio_remove_rmap_ptes() [1]
> -- he carries his own batching variant right now -- and
> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].

Note that the contpte series at [2] has a new patch in v3 (patch 2), which could
benefit from folio_remove_rmap_ptes() or equivalent. My plan was to revive [1]
on top of [2] once it is merged.

>
> There is some overlap with both series (and some other work, like
> multi-size THP [3]), so that will need some coordination, and likely a
> stepwise inclusion.

Selfishly, I'd really like to get my stuff merged as soon as there is no
technical reason not to. I'd prefer not to add this as a dependency if we can
help it.

>
> I got that started [4], but it made sense to show the whole picture. The
> patches of [4] are contained in here, with one additional patch added
> ("mm/rmap: introduce and use hugetlb_try_share_anon_rmap()") and some
> slight patch description changes.
>
> In general, RMAP batching is an important optimization for PTE-mapped
> THP, especially once we want to move towards a total mapcount or further,
> as shown with my WIP patches on "mapped shared vs. mapped exclusively" [5].
>
> The rmap batching part of [5] is also contained here in a slightly reworked
> fork [and I found a bug du to the "compound" parameter handling in these
> patches that should be fixed here :) ].
>
> This series performs a lot of folio conversion, that could be separated
> if there is a good reason. Most of the added LOC in the diff are only due
> to documentation.
>
> As we're moving to a pte/pmd interface where we clearly express the
> mapping granularity we are dealing with, we first get the remainder of
> hugetlb out of the way, as it is special and expected to remain special: it
> treats everything as a "single logical PTE" and only currently allows
> entire mappings.
>
> Even if we'd ever support partial mappings, I strongly
> assume the interface and implementation will still differ heavily:
> hopefull we can avoid working on subpages/subpage mapcounts completely and
> only add a "count" parameter for them to enable batching.
>
>
> New (extended) hugetlb interface that operate on entire folio:
> * hugetlb_add_new_anon_rmap() -> Already existed
> * hugetlb_add_anon_rmap() -> Already existed
> * hugetlb_try_dup_anon_rmap()
> * hugetlb_try_share_anon_rmap()
> * hugetlb_add_file_rmap()
> * hugetlb_remove_rmap()
>
> New "ordinary" interface for small folios / THP::
> * folio_add_new_anon_rmap() -> Already existed
> * folio_add_anon_rmap_[pte|ptes|pmd]()
> * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
> * folio_try_share_anon_rmap_[pte|pmd]()
> * folio_add_file_rmap_[pte|ptes|pmd]()
> * folio_dup_file_rmap_[pte|ptes|pmd]()
> * folio_remove_rmap_[pte|ptes|pmd]()

I'm not sure if there are official guidelines, but personally if we are
reworking the API, I'd take the opportunity to move "rmap" to the front of the
name, rather than having it buried in the middle as it is for some of these:

rmap_hugetlb_*()

rmap_folio_*()

I guess reading the patches will tell me, but what's the point of "ptes"? Surely
you're either mapping at pte or pmd level, and the number of pages is determined
by the folio size? (or presumably nr param passed in)

Thanks,
Ryan

>
> folio_add_new_anon_rmap() will always map at the biggest granularity
> possible (currently, a single PMD to cover a PMD-sized THP). Could be
> extended if ever required.
>
> In the future, we might want "_pud" variants and eventually "_pmds" variants
> for batching. Further, if hugepd is ever a thing outside hugetlb code,
> we might want some variants for that. All stuff for the distant future.
>
>
> I ran some simple microbenchmarks from [5] on an Intel(R) Xeon(R) Silver
> 4210R: munmap(), fork(), cow, MADV_DONTNEED on each PTE ... and PTE
> remapping PMD-mapped THPs on 1 GiB of memory.
>
> For small folios, there is barely a change (< 1 % performance improvement),
> whereby fork() still stands out with 0.74% performance improvement, but
> it might be just some noise. Folio optimizations don't help that much
> with small folios.
>
> For PTE-mapped THP:
> * PTE-remapping a PMD-mapped THP is more than 10% faster.
> -> RMAP batching
> * fork() is more than 4% faster.
> -> folio conversion
> * MADV_DONTNEED is 2% faster
> -> folio conversion
> * COW by writing only a single byte on a COW-shared PTE
> -> folio conversion
> * munmap() is only slightly faster (< 1%).
>
> [1] https://lkml.kernel.org/r/[email protected]
> [2] https://lkml.kernel.org/r/[email protected]
> [3] https://lkml.kernel.org/r/[email protected]
> [4] https://lkml.kernel.org/r/[email protected]
> [5] https://lkml.kernel.org/r/[email protected]
>
> Cc: Andrew Morton <[email protected]>
> Cc: "Matthew Wilcox (Oracle)" <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Ryan Roberts <[email protected]>
> Cc: Yin Fengwei <[email protected]>
> Cc: Mike Kravetz <[email protected]>
> Cc: Muchun Song <[email protected]>
> Cc: Peter Xu <[email protected]>
>
> David Hildenbrand (39):
> mm/rmap: rename hugepage_add* to hugetlb_add*
> mm/rmap: introduce and use hugetlb_remove_rmap()
> mm/rmap: introduce and use hugetlb_add_file_rmap()
> mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
> mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
> mm/rmap: add hugetlb sanity checks
> mm/rmap: convert folio_add_file_rmap_range() into
> folio_add_file_rmap_[pte|ptes|pmd]()
> mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
> mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
> mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
> mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
> mm/rmap: remove page_add_file_rmap()
> mm/rmap: factor out adding folio mappings into __folio_add_rmap()
> mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
> mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
> mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
> mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
> mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
> mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
> mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
> mm/rmap: remove page_add_anon_rmap()
> mm/rmap: remove RMAP_COMPOUND
> mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
> kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
> mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
> mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
> mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
> mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
> mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
> mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
> Documentation: stop referring to page_remove_rmap()
> mm/rmap: remove page_remove_rmap()
> mm/rmap: convert page_dup_file_rmap() to
> folio_dup_file_rmap_[pte|ptes|pmd]()
> mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
> mm/huge_memory: page_try_dup_anon_rmap() ->
> folio_try_dup_anon_rmap_pmd()
> mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
> mm/rmap: remove page_try_dup_anon_rmap()
> mm: convert page_try_share_anon_rmap() to
> folio_try_share_anon_rmap_[pte|pmd]()
> mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED
>
> Documentation/mm/transhuge.rst | 4 +-
> Documentation/mm/unevictable-lru.rst | 4 +-
> include/linux/mm.h | 6 +-
> include/linux/rmap.h | 380 +++++++++++++++++++-----
> kernel/events/uprobes.c | 2 +-
> mm/gup.c | 2 +-
> mm/huge_memory.c | 85 +++---
> mm/hugetlb.c | 21 +-
> mm/internal.h | 12 +-
> mm/khugepaged.c | 17 +-
> mm/ksm.c | 15 +-
> mm/memory-failure.c | 4 +-
> mm/memory.c | 60 ++--
> mm/migrate.c | 12 +-
> mm/migrate_device.c | 41 +--
> mm/mmu_gather.c | 2 +-
> mm/rmap.c | 422 ++++++++++++++++-----------
> mm/swapfile.c | 2 +-
> mm/userfaultfd.c | 2 +-
> 19 files changed, 709 insertions(+), 384 deletions(-)
>

2023-12-05 09:57:25

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul

>>
>> Ryan has series where we would make use of folio_remove_rmap_ptes() [1]
>> -- he carries his own batching variant right now -- and
>> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].
>
> Note that the contpte series at [2] has a new patch in v3 (patch 2), which could
> benefit from folio_remove_rmap_ptes() or equivalent. My plan was to revive [1]
> on top of [2] once it is merged.
>
>>
>> There is some overlap with both series (and some other work, like
>> multi-size THP [3]), so that will need some coordination, and likely a
>> stepwise inclusion.
>
> Selfishly, I'd really like to get my stuff merged as soon as there is no
> technical reason not to. I'd prefer not to add this as a dependency if we can
> help it.

It's easy to rework either series on top of the other. The mTHP series has the highest priority,
no question; that will go in first.

Regarding the contpte series, I think it needs more work. Especially, as raised, to not degrade
order-0 performance. Maybe we won't make the next merge window (and you already predicted
that in some cover letter :P ). Let's see.

But again, the conflicts are all trivial, so I'll happily rebase on top of whatever is
in mm-unstable. Or move the relevant rework to the front so you can just carry
them/base on them. (the batched variants for dup do make the contpte code much easier)

[...]

>>
>>
>> New (extended) hugetlb interface that operate on entire folio:
>> * hugetlb_add_new_anon_rmap() -> Already existed
>> * hugetlb_add_anon_rmap() -> Already existed
>> * hugetlb_try_dup_anon_rmap()
>> * hugetlb_try_share_anon_rmap()
>> * hugetlb_add_file_rmap()
>> * hugetlb_remove_rmap()
>>
>> New "ordinary" interface for small folios / THP::
>> * folio_add_new_anon_rmap() -> Already existed
>> * folio_add_anon_rmap_[pte|ptes|pmd]()
>> * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>> * folio_try_share_anon_rmap_[pte|pmd]()
>> * folio_add_file_rmap_[pte|ptes|pmd]()
>> * folio_dup_file_rmap_[pte|ptes|pmd]()
>> * folio_remove_rmap_[pte|ptes|pmd]()
>
> I'm not sure if there are official guidelines, but personally if we are
> reworking the API, I'd take the opportunity to move "rmap" to the front of the
> name, rather than having it burried in the middle as it is for some of these:
>
> rmap_hugetlb_*()
>
> rmap_folio_*()

No strong opinion. But we might want slightly different names then. For example,
it's "bio_add_folio" and not "bio_folio_add":


rmap_add_new_anon_hugetlb()
rmap_add_anon_hugetlb()
...
rmap_remove_hugetlb()


rmap_add_new_anon_folio()
rmap_add_anon_folio_[pte|ptes|pmd]()
...
rmap_dup_file_folio_[pte|ptes|pmd]()
rmap_remove_folio_[pte|ptes|pmd]()

Thoughts?

>
> I guess reading the patches will tell me, but what's the point of "ptes"? Surely
> you're either mapping at pte or pmd level, and the number of pages is determined
> by the folio size? (or presumably nr param passed in)

It's really (currently) one function to handle 1 vs. multiple PTEs. For example:

void folio_remove_rmap_ptes(struct folio *, struct page *, unsigned int nr,
struct vm_area_struct *);
#define folio_remove_rmap_pte(folio, page, vma) \
folio_remove_rmap_ptes(folio, page, 1, vma)
void folio_remove_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);


One could let the compiler generate specialized variants for the single-pte
versions to make the order-0 case faster. For now it's just a helper macro.
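
For example, if the order-0 case ever shows up in profiles, the macro could be
replaced by a dedicated function -- a rough sketch only (since
__folio_remove_rmap() is __always_inline, the constant nr_pages lets the
compiler drop the loop):

void folio_remove_rmap_pte(struct folio *folio, struct page *page,
		struct vm_area_struct *vma)
{
	/* nr_pages == 1 is folded at compile time. */
	__folio_remove_rmap(folio, page, 1, vma, RMAP_MODE_PTE);
}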

--
Cheers,

David / dhildenb

2023-12-05 12:06:03

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

On 04/12/2023 14:21, David Hildenbrand wrote:
> Let's get rid of the compound parameter and instead define implicitly
> which mappings we're adding. That is more future proof, easier to read
> and harder to mess up.
>
> Use an enum to express the granularity internally. Make the compiler
> always special-case on the granularity by using __always_inline.
>
> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
> folio_test_pmd_mappable() check by a config check in the caller and
> sanity checks. Convert the single user of folio_add_file_rmap_range().
>
> This function design can later easily be extended to PUDs and to batch
> PMDs. Note that for now we don't support anything bigger than
> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks

Is that definitely true? Don't we support PUD-mapping file-backed DAX memory?


> will catch if that ever changes.
>
> Next up is removing page_remove_rmap() along with its "compound"
> parameter and smilarly converting all other rmap functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> include/linux/rmap.h | 47 +++++++++++++++++++++++++++--
> mm/memory.c | 2 +-
> mm/rmap.c | 72 ++++++++++++++++++++++++++++----------------
> 3 files changed, 92 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 77e336f86c72d..a4a30c361ac50 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -186,6 +186,45 @@ typedef int __bitwise rmap_t;
> */
> #define RMAP_COMPOUND ((__force rmap_t)BIT(1))
>
> +/*
> + * Internally, we're using an enum to specify the granularity. Usually,
> + * we make the compiler create specialized variants for the different
> + * granularity.
> + */
> +enum rmap_mode {
> + RMAP_MODE_PTE = 0,
> + RMAP_MODE_PMD,
> +};
> +
> +static inline void __folio_rmap_sanity_checks(struct folio *folio,
> + struct page *page, unsigned int nr_pages, enum rmap_mode mode)
> +{
> + /* hugetlb folios are handled separately. */
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + !folio_test_large_rmappable(folio), folio);
> +
> + VM_WARN_ON_ONCE(!nr_pages || nr_pages > folio_nr_pages(folio));

nit: I don't think you technically need the second half of this - it's covered by
the test below...

> + VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
> + VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);

...this one.

> +
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + break;
> + case RMAP_MODE_PMD:
> + /*
> + * We don't support folios larger than a single PMD yet. So
> + * when RMAP_MODE_PMD is set, we assume that we are creating
> + * a single "entire" mapping of the folio.
> + */
> + VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
> + VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
> + break;
> + default:
> + VM_WARN_ON_ONCE(true);
> + }
> +}
> +
> /*
> * rmap interfaces called when adding or removing pte of page
> */
> @@ -198,8 +237,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
> void page_add_file_rmap(struct page *, struct vm_area_struct *,
> bool compound);
> -void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> - struct vm_area_struct *, bool compound);
> +void folio_add_file_rmap_ptes(struct folio *, struct page *, unsigned int nr,
> + struct vm_area_struct *);
> +#define folio_add_file_rmap_pte(folio, page, vma) \
> + folio_add_file_rmap_ptes(folio, page, 1, vma)
> +void folio_add_file_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *);
> void page_remove_rmap(struct page *, struct vm_area_struct *,
> bool compound);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 1f18ed4a54971..15325587cff01 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4414,7 +4414,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> folio_add_lru_vma(folio, vma);
> } else {
> add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> - folio_add_file_rmap_range(folio, page, nr, vma, false);
> + folio_add_file_rmap_ptes(folio, page, nr, vma);
> }
> set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a735ecca47a81..1614d98062948 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1334,31 +1334,19 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> SetPageAnonExclusive(&folio->page);
> }
>
> -/**
> - * folio_add_file_rmap_range - add pte mapping to page range of a folio
> - * @folio: The folio to add the mapping to
> - * @page: The first page to add
> - * @nr_pages: The number of pages which will be mapped
> - * @vma: the vm area in which the mapping is added
> - * @compound: charge the page as compound or small page
> - *
> - * The page range of folio is defined by [first_page, first_page + nr_pages)
> - *
> - * The caller needs to hold the pte lock.
> - */
> -void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> - unsigned int nr_pages, struct vm_area_struct *vma,
> - bool compound)
> +static __always_inline void __folio_add_file_rmap(struct folio *folio,
> + struct page *page, unsigned int nr_pages,
> + struct vm_area_struct *vma, enum rmap_mode mode)
> {
> atomic_t *mapped = &folio->_nr_pages_mapped;
> unsigned int nr_pmdmapped = 0, first;
> int nr = 0;
>
> - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> - VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>
> /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> + if (likely(mode == RMAP_MODE_PTE)) {
> do {
> first = atomic_inc_and_test(&page->_mapcount);
> if (first && folio_test_large(folio)) {
> @@ -1369,9 +1357,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> if (first)
> nr++;
> } while (page++, --nr_pages > 0);
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> -
> + } else if (mode == RMAP_MODE_PMD) {
> first = atomic_inc_and_test(&folio->_entire_mapcount);
> if (first) {
> nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> @@ -1399,6 +1385,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages that will be mapped using PTEs
> + * @vma: The vm area in which the mappings are added
> + *
> + * The page range of the folio is defined by [page, page + nr_pages)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
> + unsigned int nr_pages, struct vm_area_struct *vma)
> +{
> + __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + *
> + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * page_add_file_rmap - add pte mapping to a file page
> * @page: the page to add the mapping to
> @@ -1411,16 +1434,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> bool compound)
> {
> struct folio *folio = page_folio(page);
> - unsigned int nr_pages;
>
> VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>
> if (likely(!compound))
> - nr_pages = 1;
> + folio_add_file_rmap_pte(folio, page, vma);
> else
> - nr_pages = folio_nr_pages(folio);
> -
> - folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
> + folio_add_file_rmap_pmd(folio, page, vma);
> }
>
> /**

2023-12-05 12:22:53

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

On 04/12/2023 14:21, David Hildenbrand wrote:
> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>
> While at it, use more folio operations (but only in the code branch we're
> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_COMPOUND instead of

You mean RMAP_EXCLUSIVE?

> manually setting PageAnonExclusive.
>
> We should never see non-anon pages on that branch: otherwise, the
> existing page_add_anon_rmap() call would have been flawed already.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/huge_memory.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index cb33c6e0404cf..2c037ab3f4916 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2099,6 +2099,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> unsigned long haddr, bool freeze)
> {
> struct mm_struct *mm = vma->vm_mm;
> + struct folio *folio;
> struct page *page;
> pgtable_t pgtable;
> pmd_t old_pmd, _pmd;
> @@ -2194,16 +2195,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> uffd_wp = pmd_swp_uffd_wp(old_pmd);
> } else {
> page = pmd_page(old_pmd);
> + folio = page_folio(page);
> if (pmd_dirty(old_pmd)) {
> dirty = true;
> - SetPageDirty(page);
> + folio_set_dirty(folio);
> }
> write = pmd_write(old_pmd);
> young = pmd_young(old_pmd);
> soft_dirty = pmd_soft_dirty(old_pmd);
> uffd_wp = pmd_uffd_wp(old_pmd);
>
> - VM_BUG_ON_PAGE(!page_count(page), page);
> + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> /*
> * Without "freeze", we'll simply split the PMD, propagating the
> @@ -2220,11 +2223,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> *
> * See page_try_share_anon_rmap(): invalidate PMD first.
> */
> - anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
> + anon_exclusive = PageAnonExclusive(page);
> if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
> freeze = false;
> - if (!freeze)
> - page_ref_add(page, HPAGE_PMD_NR - 1);
> + if (!freeze) {
> + rmap_t rmap_flags = RMAP_NONE;
> +
> + folio_ref_add(folio, HPAGE_PMD_NR - 1);
> + if (anon_exclusive)
> + rmap_flags = RMAP_EXCLUSIVE;

nit: I'd be inclined to make this |= since you're accumulating optional flags.
Yes, it's the only one so it still works as is...

> + folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
> + vma, haddr, rmap_flags);
> + }
> }
>
> /*
> @@ -2267,8 +2277,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
> if (write)
> entry = pte_mkwrite(entry, vma);
> - if (anon_exclusive)
> - SetPageAnonExclusive(page + i);
> if (!young)
> entry = pte_mkold(entry);
> /* NOTE: this may set soft-dirty too on some archs */
> @@ -2278,7 +2286,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> entry = pte_mksoft_dirty(entry);
> if (uffd_wp)
> entry = pte_mkuffd_wp(entry);
> - page_add_anon_rmap(page + i, vma, addr, RMAP_NONE);
> }
> VM_BUG_ON(!pte_none(ptep_get(pte)));
> set_pte_at(mm, addr, pte, entry);

2023-12-05 12:26:26

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

On 05.12.23 13:04, Ryan Roberts wrote:
> On 04/12/2023 14:21, David Hildenbrand wrote:
>> Let's get rid of the compound parameter and instead define implicitly
>> which mappings we're adding. That is more future proof, easier to read
>> and harder to mess up.
>>
>> Use an enum to express the granularity internally. Make the compiler
>> always special-case on the granularity by using __always_inline.
>>
>> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
>> folio_test_pmd_mappable() check by a config check in the caller and
>> sanity checks. Convert the single user of folio_add_file_rmap_range().
>>
>> This function design can later easily be extended to PUDs and to batch
>> PMDs. Note that for now we don't support anything bigger than
>> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
>
> Is that definitely true? Don't we support PUD-mapping file-backed DAX memory?

They are not handled via the rmap. Otherwise, all the PMD accounting
(e.g., FilePmdMapped) in RMAP code would already be wrong for them.

And it's easy to verify by looking at zap_huge_pud(), which doesn't call
any rmap code.

[...]

>> +
>> +static inline void __folio_rmap_sanity_checks(struct folio *folio,
>> + struct page *page, unsigned int nr_pages, enum rmap_mode mode)
>> +{
>> + /* hugetlb folios are handled separately. */
>> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>> + !folio_test_large_rmappable(folio), folio);
>> +
>> + VM_WARN_ON_ONCE(!nr_pages || nr_pages > folio_nr_pages(folio));
>
> nit: I don't think you technically need the second half of this - its covered by
> the test below...

My thinking was that if nr_pages were "-1", one could end up with
weird wraparounds.

But yeah, I thought about this as well and might just remove it.

>
>> + VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
>> + VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
>
> ...this one.
>

Thanks!

--
Cheers,

David / dhildenb

2023-12-05 12:27:39

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

On 05.12.23 13:22, Ryan Roberts wrote:
> On 04/12/2023 14:21, David Hildenbrand wrote:
>> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>>
>> While at it, use more folio operations (but only in the code branch we're
>> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_COMPOUND instead of
>
> You mean RMAP_EXCLUSIVE?

Indeed.

[...]

>> - if (!freeze)
>> - page_ref_add(page, HPAGE_PMD_NR - 1);
>> + if (!freeze) {
>> + rmap_t rmap_flags = RMAP_NONE;
>> +
>> + folio_ref_add(folio, HPAGE_PMD_NR - 1);
>> + if (anon_exclusive)
>> + rmap_flags = RMAP_EXCLUSIVE;
>
> nit: I'd be inclined to make this |= since you're accumulating optional falgs.
> Yes, its the only one so it still works as is...


Make sense!

--
Cheers,

David / dhildenb

2023-12-05 12:30:44

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul

On 04/12/2023 19:53, Ryan Roberts wrote:
> On 04/12/2023 14:21, David Hildenbrand wrote:
>> Baed on mm-stable from a couple of days.
>>
>> This series proposes an overhaul to our rmap interface, to get rid of the
>> "bool compound" / RMAP_COMPOUND parameter with the goal of making the
>> interface less error prone, more future proof, and more natural to extend
>> to "batching". Also, this converts the interface to always consume
>> folio+subpage, which speeds up operations on large folios.
>>
>> Further, this series adds PTE-batching variants for 4 rmap functions,
>> whereby only folio_add_anon_rmap_ptes() is used for batching in this series
>> when PTE-remapping a PMD-mapped THP.
>
> I certainly support the objective you have here; making the interfaces clearer,
> more consistent and more amenable to batching. I'll try to find some time this
> week to review.
>
>>
>> Ryan has series where we would make use of folio_remove_rmap_ptes() [1]
>> -- he carries his own batching variant right now -- and
>> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].
>
> Note that the contpte series at [2] has a new patch in v3 (patch 2), which could
> benefit from folio_remove_rmap_ptes() or equivalent. My plan was to revive [1]
> on top of [2] once it is merged.
>
>>
>> There is some overlap with both series (and some other work, like
>> multi-size THP [3]), so that will need some coordination, and likely a
>> stepwise inclusion.
>
> Selfishly, I'd really like to get my stuff merged as soon as there is no
> technical reason not to. I'd prefer not to add this as a dependency if we can
> help it.
>
>>
>> I got that started [4], but it made sense to show the whole picture. The
>> patches of [4] are contained in here, with one additional patch added
>> ("mm/rmap: introduce and use hugetlb_try_share_anon_rmap()") and some
>> slight patch description changes.
>>
>> In general, RMAP batching is an important optimization for PTE-mapped
>> THP, especially once we want to move towards a total mapcount or further,
>> as shown with my WIP patches on "mapped shared vs. mapped exclusively" [5].
>>
>> The rmap batching part of [5] is also contained here in a slightly reworked
>> fork [and I found a bug du to the "compound" parameter handling in these
>> patches that should be fixed here :) ].
>>
>> This series performs a lot of folio conversion, that could be separated
>> if there is a good reason. Most of the added LOC in the diff are only due
>> to documentation.
>>
>> As we're moving to a pte/pmd interface where we clearly express the
>> mapping granularity we are dealing with, we first get the remainder of
>> hugetlb out of the way, as it is special and expected to remain special: it
>> treats everything as a "single logical PTE" and only currently allows
>> entire mappings.
>>
>> Even if we'd ever support partial mappings, I strongly
>> assume the interface and implementation will still differ heavily:
>> hopefull we can avoid working on subpages/subpage mapcounts completely and
>> only add a "count" parameter for them to enable batching.
>>
>>
>> New (extended) hugetlb interface that operate on entire folio:
>> * hugetlb_add_new_anon_rmap() -> Already existed
>> * hugetlb_add_anon_rmap() -> Already existed
>> * hugetlb_try_dup_anon_rmap()
>> * hugetlb_try_share_anon_rmap()
>> * hugetlb_add_file_rmap()
>> * hugetlb_remove_rmap()
>>
>> New "ordinary" interface for small folios / THP::
>> * folio_add_new_anon_rmap() -> Already existed
>> * folio_add_anon_rmap_[pte|ptes|pmd]()
>> * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>> * folio_try_share_anon_rmap_[pte|pmd]()
>> * folio_add_file_rmap_[pte|ptes|pmd]()
>> * folio_dup_file_rmap_[pte|ptes|pmd]()
>> * folio_remove_rmap_[pte|ptes|pmd]()
>
> I'm not sure if there are official guidelines, but personally if we are
> reworking the API, I'd take the opportunity to move "rmap" to the front of the
> name, rather than having it burried in the middle as it is for some of these:
>
> rmap_hugetlb_*()
>
> rmap_folio_*()

In fact, I'd be inclined to drop the "folio" to shorten the name; everything is
a folio, so it's not telling us much. e.g.:

New (extended) hugetlb interface that operate on entire folio:
* rmap_hugetlb_add_new_anon() -> Already existed
* rmap_hugetlb_add_anon() -> Already existed
* rmap_hugetlb_try_dup_anon()
* rmap_hugetlb_try_share_anon()
* rmap_hugetlb_add_file()
* rmap_hugetlb_remove()

New "ordinary" interface for small folios / THP::
* rmap_add_new_anon() -> Already existed
* rmap_add_anon_[pte|ptes|pmd]()
* rmap_try_dup_anon_[pte|ptes|pmd]()
* rmap_try_share_anon_[pte|pmd]()
* rmap_add_file_[pte|ptes|pmd]()
* rmap_dup_file_[pte|ptes|pmd]()
* rmap_remove_[pte|ptes|pmd]()


>
> I guess reading the patches will tell me, but what's the point of "ptes"? Surely
> you're either mapping at pte or pmd level, and the number of pages is determined
> by the folio size? (or presumably nr param passed in)
>
> Thanks,
> Ryan
>
>>
>> folio_add_new_anon_rmap() will always map at the biggest granularity
>> possible (currently, a single PMD to cover a PMD-sized THP). Could be
>> extended if ever required.
>>
>> In the future, we might want "_pud" variants and eventually "_pmds" variants
>> for batching. Further, if hugepd is ever a thing outside hugetlb code,
>> we might want some variants for that. All stuff for the distant future.
>>
>>
>> I ran some simple microbenchmarks from [5] on an Intel(R) Xeon(R) Silver
>> 4210R: munmap(), fork(), cow, MADV_DONTNEED on each PTE ... and PTE
>> remapping PMD-mapped THPs on 1 GiB of memory.
>>
>> For small folios, there is barely a change (< 1 % performance improvement),
>> whereby fork() still stands out with 0.74% performance improvement, but
>> it might be just some noise. Folio optimizations don't help that much
>> with small folios.
>>
>> For PTE-mapped THP:
>> * PTE-remapping a PMD-mapped THP is more than 10% faster.
>> -> RMAP batching
>> * fork() is more than 4% faster.
>> -> folio conversion
>> * MADV_DONTNEED is 2% faster
>> -> folio conversion
>> * COW by writing only a single byte on a COW-shared PTE
>> -> folio conversion
>> * munmap() is only slightly faster (< 1%).
>>
>> [1] https://lkml.kernel.org/r/[email protected]
>> [2] https://lkml.kernel.org/r/[email protected]
>> [3] https://lkml.kernel.org/r/[email protected]
>> [4] https://lkml.kernel.org/r/[email protected]
>> [5] https://lkml.kernel.org/r/[email protected]
>>
>> Cc: Andrew Morton <[email protected]>
>> Cc: "Matthew Wilcox (Oracle)" <[email protected]>
>> Cc: Hugh Dickins <[email protected]>
>> Cc: Ryan Roberts <[email protected]>
>> Cc: Yin Fengwei <[email protected]>
>> Cc: Mike Kravetz <[email protected]>
>> Cc: Muchun Song <[email protected]>
>> Cc: Peter Xu <[email protected]>
>>
>> David Hildenbrand (39):
>> mm/rmap: rename hugepage_add* to hugetlb_add*
>> mm/rmap: introduce and use hugetlb_remove_rmap()
>> mm/rmap: introduce and use hugetlb_add_file_rmap()
>> mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
>> mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
>> mm/rmap: add hugetlb sanity checks
>> mm/rmap: convert folio_add_file_rmap_range() into
>> folio_add_file_rmap_[pte|ptes|pmd]()
>> mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
>> mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
>> mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
>> mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
>> mm/rmap: remove page_add_file_rmap()
>> mm/rmap: factor out adding folio mappings into __folio_add_rmap()
>> mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
>> mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
>> mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
>> mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>> mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>> mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>> mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>> mm/rmap: remove page_add_anon_rmap()
>> mm/rmap: remove RMAP_COMPOUND
>> mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
>> kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
>> mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
>> Documentation: stop referring to page_remove_rmap()
>> mm/rmap: remove page_remove_rmap()
>> mm/rmap: convert page_dup_file_rmap() to
>> folio_dup_file_rmap_[pte|ptes|pmd]()
>> mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>> mm/huge_memory: page_try_dup_anon_rmap() ->
>> folio_try_dup_anon_rmap_pmd()
>> mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
>> mm/rmap: remove page_try_dup_anon_rmap()
>> mm: convert page_try_share_anon_rmap() to
>> folio_try_share_anon_rmap_[pte|pmd]()
>> mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED
>>
>> Documentation/mm/transhuge.rst | 4 +-
>> Documentation/mm/unevictable-lru.rst | 4 +-
>> include/linux/mm.h | 6 +-
>> include/linux/rmap.h | 380 +++++++++++++++++++-----
>> kernel/events/uprobes.c | 2 +-
>> mm/gup.c | 2 +-
>> mm/huge_memory.c | 85 +++---
>> mm/hugetlb.c | 21 +-
>> mm/internal.h | 12 +-
>> mm/khugepaged.c | 17 +-
>> mm/ksm.c | 15 +-
>> mm/memory-failure.c | 4 +-
>> mm/memory.c | 60 ++--
>> mm/migrate.c | 12 +-
>> mm/migrate_device.c | 41 +--
>> mm/mmu_gather.c | 2 +-
>> mm/rmap.c | 422 ++++++++++++++++-----------
>> mm/swapfile.c | 2 +-
>> mm/userfaultfd.c | 2 +-
>> 19 files changed, 709 insertions(+), 384 deletions(-)
>>
>

2023-12-05 12:53:00

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 23/39] mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()

On 04/12/2023 14:21, David Hildenbrand wrote:
> Let's mimic what we did with folio_add_file_rmap_*() and
> folio_add_anon_rmap_*() so we can similarly replace page_remove_rmap()
> next.
>
> Make the compiler always special-case on the granularity by using
> __always_inline.
>
> We're adding folio_remove_rmap_ptes() handling right away, as we want to
> use that soon for batching rmap operations when unmapping PTE-mapped
> large folios.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> include/linux/rmap.h | 6 ++++
> mm/rmap.c | 76 ++++++++++++++++++++++++++++++++++++--------
> 2 files changed, 68 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 017b216915f19..dd4ffb1d8ae04 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -241,6 +241,12 @@ void folio_add_file_rmap_pmd(struct folio *, struct page *,
> struct vm_area_struct *);
> void page_remove_rmap(struct page *, struct vm_area_struct *,
> bool compound);
> +void folio_remove_rmap_ptes(struct folio *, struct page *, unsigned int nr,
> + struct vm_area_struct *);
> +#define folio_remove_rmap_pte(folio, page, vma) \
> + folio_remove_rmap_ptes(folio, page, 1, vma)
> +void folio_remove_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *);
>
> void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address, rmap_t flags);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 3587225055c5e..50b6909157ac1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1463,25 +1463,36 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> bool compound)
> {
> struct folio *folio = page_folio(page);
> +
> + if (likely(!compound))
> + folio_remove_rmap_pte(folio, page, vma);
> + else
> + folio_remove_rmap_pmd(folio, page, vma);
> +}
> +
> +static __always_inline void __folio_remove_rmap(struct folio *folio,
> + struct page *page, unsigned int nr_pages,
> + struct vm_area_struct *vma, enum rmap_mode mode)
> +{
> atomic_t *mapped = &folio->_nr_pages_mapped;
> - int nr = 0, nr_pmdmapped = 0;
> - bool last;
> + int last, nr = 0, nr_pmdmapped = 0;

nit: you're being inconsistent across the functions with signed vs unsigned for
page counts (e.g. nr, nr_pmdmapped) - see __folio_add_rmap(),
__folio_add_file_rmap(), __folio_add_anon_rmap().

I suggest pick one and stick to it. Personally I'd go with signed int (since
that's what all the counters in struct folio that we are manipulating are,
underneath the atomic_t) then check that nr_pages > 0 in
__folio_rmap_sanity_checks().
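
For illustration, such a check could slot into the existing sanity helper
roughly like this (a sketch only; __folio_rmap_sanity_checks() exists in the
series, but this exact body and the signed parameter are assumptions):

static __always_inline void __folio_rmap_sanity_checks(struct folio *folio,
		struct page *page, int nr_pages, enum rmap_mode mode)
{
	/* Reject bogus (non-positive) counts once nr_pages becomes signed. */
	VM_WARN_ON_ONCE(nr_pages <= 0);
	/* The page range must not leave the folio. */
	VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
	VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
}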

> enum node_stat_item idx;
>
> - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> - VM_BUG_ON_PAGE(compound && !PageHead(page), page);
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>
> /* Is page being unmapped by PTE? Is this its last map to be removed? */
> - if (likely(!compound)) {
> - last = atomic_add_negative(-1, &page->_mapcount);
> - nr = last;
> - if (last && folio_test_large(folio)) {
> - nr = atomic_dec_return_relaxed(mapped);
> - nr = (nr < COMPOUND_MAPPED);
> - }
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> + if (likely(mode == RMAP_MODE_PTE)) {
> + do {
> + last = atomic_add_negative(-1, &page->_mapcount);
> + if (last && folio_test_large(folio)) {
> + last = atomic_dec_return_relaxed(mapped);
> + last = (last < COMPOUND_MAPPED);
> + }
>
> + if (last)
> + nr++;
> + } while (page++, --nr_pages > 0);
> + } else if (mode == RMAP_MODE_PMD) {
> last = atomic_add_negative(-1, &folio->_entire_mapcount);
> if (last) {
> nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
> @@ -1517,7 +1528,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> * is still mapped.
> */
> if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))

folio_test_pmd_mappable() -> folio_test_large()

Since you're converting this to support batch PTE removal, it might as well also
support smaller-than-pmd too?

I currently have a patch to do this same change in the multi-size THP series.
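
As a rough illustration of that suggestion (a sketch, not the actual patch
from either series), the deferred-split check would then cover any partially
unmapped large anon folio, not only PMD-sized ones:

	if (folio_test_large(folio) && folio_test_anon(folio))
		if (mode == RMAP_MODE_PTE || nr < nr_pmdmapped)
			deferred_split_folio(folio);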


> - if (!compound || nr < nr_pmdmapped)
> + if (mode == RMAP_MODE_PTE || nr < nr_pmdmapped)
> deferred_split_folio(folio);
> }
>
> @@ -1532,6 +1543,43 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> munlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_remove_rmap_ptes - remove PTE mappings from a page range of a folio
> + * @folio: The folio to remove the mappings from
> + * @page: The first page to remove
> + * @nr_pages: The number of pages that will be removed from the mapping
> + * @vma: The vm area from which the mappings are removed
> + *
> + * The page range of the folio is defined by [page, page + nr_pages)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_remove_rmap_ptes(struct folio *folio, struct page *page,
> + unsigned int nr_pages, struct vm_area_struct *vma)
> +{
> + __folio_remove_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_remove_rmap_pmd - remove a PMD mapping from a page range of a folio
> + * @folio: The folio to remove the mapping from
> + * @page: The first page to remove
> + * @vma: The vm area from which the mapping is removed
> + *
> + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_remove_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /*
> * @arg: enum ttu_flags will be passed to this argument
> */

2023-12-05 13:09:43

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 23/39] mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()

>> +static __always_inline void __folio_remove_rmap(struct folio *folio,
>> + struct page *page, unsigned int nr_pages,
>> + struct vm_area_struct *vma, enum rmap_mode mode)
>> +{
>> atomic_t *mapped = &folio->_nr_pages_mapped;
>> - int nr = 0, nr_pmdmapped = 0;
>> - bool last;
>> + int last, nr = 0, nr_pmdmapped = 0;
>
> nit: you're being inconsistent across the functions with signed vs unsigned for
> page counts (e.g. nr, nr_pmdmapped) - see __folio_add_rmap(),
> __folio_add_file_rmap(), __folio_add_anon_rmap().
>

Definitely.

> I suggest pick one and stick to it. Personally I'd go with signed int (since
> that's what all the counters in struct folio that we are manipulating are,
> underneath the atomic_t) then check that nr_pages > 0 in
> __folio_rmap_sanity_checks().

Can do, but note that the counters are signed to detect underflows. It
doesn't make sense here to pass a negative number.

>
>> enum node_stat_item idx;
>>
>> - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>> - VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>>
>> /* Is page being unmapped by PTE? Is this its last map to be removed? */
>> - if (likely(!compound)) {
>> - last = atomic_add_negative(-1, &page->_mapcount);
>> - nr = last;
>> - if (last && folio_test_large(folio)) {
>> - nr = atomic_dec_return_relaxed(mapped);
>> - nr = (nr < COMPOUND_MAPPED);
>> - }
>> - } else if (folio_test_pmd_mappable(folio)) {
>> - /* That test is redundant: it's for safety or to optimize out */
>> + if (likely(mode == RMAP_MODE_PTE)) {
>> + do {
>> + last = atomic_add_negative(-1, &page->_mapcount);
>> + if (last && folio_test_large(folio)) {
>> + last = atomic_dec_return_relaxed(mapped);
>> + last = (last < COMPOUND_MAPPED);
>> + }
>>
>> + if (last)
>> + nr++;
>> + } while (page++, --nr_pages > 0);
>> + } else if (mode == RMAP_MODE_PMD) {
>> last = atomic_add_negative(-1, &folio->_entire_mapcount);
>> if (last) {
>> nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
>> @@ -1517,7 +1528,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>> * is still mapped.
>> */
>> if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
>
> folio_test_pmd_mappable() -> folio_test_large()
>
> Since you're converting this to support batch PTE removal, it might as well also
> support smaller-than-pmd too?

I remember that you have a patch for that, right? :)

>
> I currently have a patch to do this same change in the multi-size THP series.
>

Ah, yes, and that should go in first.


--
Cheers,

David / dhildenb

2023-12-05 13:12:11

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 04.12.23 15:21, David Hildenbrand wrote:
> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
> remove them.
>
> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
> batching during fork() soon.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---

This is what I currently have (minus any renaming):


From 89c3180d6bbbf2236329b405b11e6a8a3cc2c088 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <[email protected]>
Date: Thu, 30 Nov 2023 10:15:17 +0100
Subject: [PATCH] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
remove them.

Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
batching during fork() soon.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 6 --
include/linux/rmap.h | 143 ++++++++++++++++++++++++++++++-------------
2 files changed, 99 insertions(+), 50 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 24c1c7c5a99c0..f7565b35ae931 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
return folio_maybe_dma_pinned(folio);
}

-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
-{
- return folio_needs_cow_for_dma(vma, page_folio(page));
-}
-
/**
* is_zero_page - Query if a page is a zero page
* @page: The page to query
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 06951909bb39b..98862ab7347f2 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio,
#endif
}

-static inline void __page_dup_rmap(struct page *page, bool compound)
+static inline int __folio_try_dup_anon_rmap(struct folio *folio,
+ struct page *page, unsigned int nr_pages,
+ struct vm_area_struct *src_vma, enum rmap_mode mode)
{
- VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+ bool maybe_pinned;
+ int i;

- if (compound) {
- struct folio *folio = (struct folio *)page;
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+ /*
+ * If this folio may have been pinned by the parent process,
+ * don't allow to duplicate the mappings but instead require to e.g.,
+ * copy the subpage immediately for the child so that we'll always
+ * guarantee the pinned folio won't be randomly replaced in the
+ * future on write faults.
+ */
+ maybe_pinned = likely(!folio_is_device_private(folio)) &&
+ unlikely(folio_needs_cow_for_dma(src_vma, folio));
+
+ /*
+ * No need to check+clear for already shared PTEs/PMDs of the
+ * folio. But if any page is PageAnonExclusive, we must fallback to
+ * copying if the folio maybe pinned.
+ */
+ if (likely(mode == RMAP_MODE_PTE)) {
+ if (unlikely(maybe_pinned)) {
+ for (i = 0; i < nr_pages; i++)
+ if (PageAnonExclusive(page + i))
+ return -EBUSY;
+ }
+ do {
+ if (PageAnonExclusive(page))
+ ClearPageAnonExclusive(page);
+ atomic_inc(&page->_mapcount);
+ } while (page++, --nr_pages > 0);
+ } else if (mode == RMAP_MODE_PMD) {
+ if (PageAnonExclusive(page)) {
+ if (unlikely(maybe_pinned))
+ return -EBUSY;
+ ClearPageAnonExclusive(page);
+ }
atomic_inc(&folio->_entire_mapcount);
- } else {
- atomic_inc(&page->_mapcount);
}
+ return 0;
}

/**
- * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
- * anonymous page
- * @page: the page to duplicate the mapping for
- * @compound: the page is mapped as compound or as a small page
- * @vma: the source vma
+ * folio_try_dup_anon_rmap_ptes - try duplicating PTE mappings of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mappings of
+ * @page: The first page to duplicate the mappings of
+ * @nr_pages: The number of pages of which the mapping will be duplicated
+ * @src_vma: The vm area from which the mappings are duplicated
*
- * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq.
+ * The page range of the folio is defined by [page, page + nr_pages)
*
- * Duplicating the mapping can only fail if the page may be pinned; device
- * private pages cannot get pinned and consequently this function cannot fail.
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
+ *
+ * Duplicating the mappings can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail.
+ *
+ * If duplicating the mappings succeeded, the duplicated PTEs have to be R/O in
+ * the parent and the child. They must *not* be writable after this call.
+ *
+ * Returns 0 if duplicating the mappings succeeded. Returns -EBUSY otherwise.
+ */
+static inline int folio_try_dup_anon_rmap_ptes(struct folio *folio,
+ struct page *page, unsigned int nr_pages,
+ struct vm_area_struct *src_vma)
+{
+ return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma,
+ RMAP_MODE_PTE);
+}
+#define folio_try_dup_anon_rmap_pte(folio, page, vma) \
+ folio_try_dup_anon_rmap_ptes(folio, page, 1, vma)
+
+/**
+ * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mapping of
+ * @page: The first page to duplicate the mapping of
+ * @src_vma: The vm area from which the mapping is duplicated
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
+ *
+ * Duplicating the mapping can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail.
*
- * If duplicating the mapping succeeds, the page has to be mapped R/O into
- * the parent and the child. It must *not* get mapped writable after this call.
+ * If duplicating the mapping succeeds, the duplicated PMD has to be R/O in
+ * the parent and the child. They must *not* be writable after this call.
*
* Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise.
*/
+static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
+ struct page *page, struct vm_area_struct *src_vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ return __folio_try_dup_anon_rmap(folio, page, HPAGE_PMD_NR, src_vma,
+ RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
struct vm_area_struct *vma)
{
- VM_BUG_ON_PAGE(!PageAnon(page), page);
-
- /*
- * No need to check+clear for already shared pages, including KSM
- * pages.
- */
- if (!PageAnonExclusive(page))
- goto dup;
+ struct folio *folio = page_folio(page);

- /*
- * If this page may have been pinned by the parent process,
- * don't allow to duplicate the mapping but instead require to e.g.,
- * copy the page immediately for the child so that we'll always
- * guarantee the pinned page won't be randomly replaced in the
- * future on write faults.
- */
- if (likely(!is_device_private_page(page) &&
- unlikely(page_needs_cow_for_dma(vma, page))))
- return -EBUSY;
-
- ClearPageAnonExclusive(page);
- /*
- * It's okay to share the anon page between both processes, mapping
- * the page R/O into both processes.
- */
-dup:
- __page_dup_rmap(page, compound);
- return 0;
+ if (likely(!compound))
+ return folio_try_dup_anon_rmap_pte(folio, page, vma);
+ return folio_try_dup_anon_rmap_pmd(folio, page, vma);
}

/**
--
2.41.0


--
Cheers,

David / dhildenb

2023-12-05 13:13:17

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 04/12/2023 14:21, David Hildenbrand wrote:
> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
> remove them.
>
> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
> batching during fork() soon.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> include/linux/mm.h | 6 --
> include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
> 2 files changed, 100 insertions(+), 51 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 24c1c7c5a99c0..f7565b35ae931 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
> return folio_maybe_dma_pinned(folio);
> }
>
> -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
> - struct page *page)
> -{
> - return folio_needs_cow_for_dma(vma, page_folio(page));
> -}
> -
> /**
> * is_zero_page - Query if a page is a zero page
> * @page: The page to query
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 21d72cc602adc..84439f7720c62 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio,
> #endif
> }
>
> -static inline void __page_dup_rmap(struct page *page, bool compound)
> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,

__always_inline?

> + struct page *page, unsigned int nr_pages,
> + struct vm_area_struct *src_vma, enum rmap_mode mode)
> {
> - VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
> + int i;
>
> - if (compound) {
> - struct folio *folio = (struct folio *)page;
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> - VM_BUG_ON_PAGE(compound && !PageHead(page), page);
> - atomic_inc(&folio->_entire_mapcount);
> - } else {
> - atomic_inc(&page->_mapcount);
> + /*
> + * No need to check+clear for already shared PTEs/PMDs of the folio.
> + * This includes PTE mappings of (order-0) KSM folios.
> + */
> + if (likely(mode == RMAP_MODE_PTE)) {

Presumably if __always_inline then the compiler will remove this if/else and just
keep the part indicated by mode? In which case "likely" is pretty useless? Same
for all similar sites in the other patches.

> + for (i = 0; i < nr_pages; i++) {
> + if (PageAnonExclusive(page + i))
> + goto clear;
> + }
> + } else if (mode == RMAP_MODE_PMD) {
> + if (PageAnonExclusive(page))
> + goto clear;
> }
> + goto dup;
> +
> +clear:
> + /*
> + * If this folio may have been pinned by the parent process,
> + * don't allow to duplicate the mappings but instead require to e.g.,
> + * copy the subpage immediately for the child so that we'll always
> + * guarantee the pinned folio won't be randomly replaced in the
> + * future on write faults.
> + */
> + if (likely(!folio_is_device_private(folio) &&
> + unlikely(folio_needs_cow_for_dma(src_vma, folio))))
> + return -EBUSY;
> +
> + if (likely(mode == RMAP_MODE_PTE)) {
> + for (i = 0; i < nr_pages; i++)

Do you really need to reset i=0 here? You have already checked that lower pages
are shared in the above loop, so can't you just start from the first exclusive
page here?
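
For illustration, in the RMAP_MODE_PTE branch of the hunk above the rescan
could in principle be avoided along these lines (a sketch only; the updated
patch David posted restructures this part differently anyway):

	for (i = 0; i < nr_pages; i++)
		if (PageAnonExclusive(page + i))
			break;
	if (i == nr_pages)
		goto dup;
	if (likely(!folio_is_device_private(folio)) &&
	    unlikely(folio_needs_cow_for_dma(src_vma, folio)))
		return -EBUSY;
	/* Pages below i are already shared, so clearing can start at i. */
	for (; i < nr_pages; i++)
		ClearPageAnonExclusive(page + i);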

> + ClearPageAnonExclusive(page + i);
> + } else if (mode == RMAP_MODE_PMD) {
> + ClearPageAnonExclusive(page);
> + }
> +
> +dup:
> + __folio_dup_rmap(folio, page, nr_pages, mode);
> + return 0;
> }
>
> /**
> - * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
> - * anonymous page
> - * @page: the page to duplicate the mapping for
> - * @compound: the page is mapped as compound or as a small page
> - * @vma: the source vma
> + * folio_try_dup_anon_rmap_ptes - try duplicating PTE mappings of a page range
> + * of a folio
> + * @folio: The folio to duplicate the mappings of
> + * @page: The first page to duplicate the mappings of
> + * @nr_pages: The number of pages of which the mapping will be duplicated
> + * @src_vma: The vm area from which the mappings are duplicated
> *
> - * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq.
> + * The page range of the folio is defined by [page, page + nr_pages)
> *
> - * Duplicating the mapping can only fail if the page may be pinned; device
> - * private pages cannot get pinned and consequently this function cannot fail.
> + * The caller needs to hold the page table lock and the
> + * vma->vma_mm->write_protect_seq.
> + *
> + * Duplicating the mappings can only fail if the folio may be pinned; device
> + * private folios cannot get pinned and consequently this function cannot fail.
> + *
> + * If duplicating the mappings succeeded, the duplicated PTEs have to be R/O in
> + * the parent and the child. They must *not* be writable after this call.
> + *
> + * Returns 0 if duplicating the mappings succeeded. Returns -EBUSY otherwise.
> + */
> +static inline int folio_try_dup_anon_rmap_ptes(struct folio *folio,
> + struct page *page, unsigned int nr_pages,
> + struct vm_area_struct *src_vma)
> +{
> + return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma,
> + RMAP_MODE_PTE);
> +}
> +#define folio_try_dup_anon_rmap_pte(folio, page, vma) \
> + folio_try_dup_anon_rmap_ptes(folio, page, 1, vma)
> +
> +/**
> + * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range
> + * of a folio
> + * @folio: The folio to duplicate the mapping of
> + * @page: The first page to duplicate the mapping of
> + * @src_vma: The vm area from which the mapping is duplicated
> + *
> + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock and the
> + * vma->vma_mm->write_protect_seq.
> *
> - * If duplicating the mapping succeeds, the page has to be mapped R/O into
> - * the parent and the child. It must *not* get mapped writable after this call.
> + * Duplicating the mapping can only fail if the folio may be pinned; device
> + * private folios cannot get pinned and consequently this function cannot fail.
> + *
> + * If duplicating the mapping succeeds, the duplicated PMD has to be R/O in
> + * the parent and the child. They must *not* be writable after this call.
> *
> * Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise.
> */
> +static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
> + struct page *page, struct vm_area_struct *src_vma)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + return __folio_try_dup_anon_rmap(folio, page, HPAGE_PMD_NR, src_vma,
> + RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
> struct vm_area_struct *vma)
> {
> - VM_BUG_ON_PAGE(!PageAnon(page), page);
> -
> - /*
> - * No need to check+clear for already shared pages, including KSM
> - * pages.
> - */
> - if (!PageAnonExclusive(page))
> - goto dup;
> -
> - /*
> - * If this page may have been pinned by the parent process,
> - * don't allow to duplicate the mapping but instead require to e.g.,
> - * copy the page immediately for the child so that we'll always
> - * guarantee the pinned page won't be randomly replaced in the
> - * future on write faults.
> - */
> - if (likely(!is_device_private_page(page) &&
> - unlikely(page_needs_cow_for_dma(vma, page))))
> - return -EBUSY;
> + struct folio *folio = page_folio(page);
>
> - ClearPageAnonExclusive(page);
> - /*
> - * It's okay to share the anon page between both processes, mapping
> - * the page R/O into both processes.
> - */
> -dup:
> - __page_dup_rmap(page, compound);
> - return 0;
> + if (likely(!compound))
> + return folio_try_dup_anon_rmap_pte(folio, page, vma);
> + return folio_try_dup_anon_rmap_pmd(folio, page, vma);
> }
>
> /**

2023-12-05 13:17:57

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 05.12.23 14:12, Ryan Roberts wrote:
> On 04/12/2023 14:21, David Hildenbrand wrote:
>> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
>> remove them.
>>
>> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
>> batching during fork() soon.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> include/linux/mm.h | 6 --
>> include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
>> 2 files changed, 100 insertions(+), 51 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 24c1c7c5a99c0..f7565b35ae931 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
>> return folio_maybe_dma_pinned(folio);
>> }
>>
>> -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
>> - struct page *page)
>> -{
>> - return folio_needs_cow_for_dma(vma, page_folio(page));
>> -}
>> -
>> /**
>> * is_zero_page - Query if a page is a zero page
>> * @page: The page to query
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index 21d72cc602adc..84439f7720c62 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio,
>> #endif
>> }
>>
>> -static inline void __page_dup_rmap(struct page *page, bool compound)
>> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,
>
> __always_inline?

Yes.

>
>> + struct page *page, unsigned int nr_pages,
>> + struct vm_area_struct *src_vma, enum rmap_mode mode)
>> {
>> - VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
>> + int i;
>>
>> - if (compound) {
>> - struct folio *folio = (struct folio *)page;
>> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>>
>> - VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>> - atomic_inc(&folio->_entire_mapcount);
>> - } else {
>> - atomic_inc(&page->_mapcount);
>> + /*
>> + * No need to check+clear for already shared PTEs/PMDs of the folio.
>> + * This includes PTE mappings of (order-0) KSM folios.
>> + */
>> + if (likely(mode == RMAP_MODE_PTE)) {
>
> Presumably if __always_inline then the compiler will remove this if/else and just
> keep the part indicated by mode? In which case "likely" is pretty useless? Same
> for all similar sites in the other patches.

Yes, also had this in mind. As long as we use __always_inline it
shouldn't ever matter.

>
>> + for (i = 0; i < nr_pages; i++) {
>> + if (PageAnonExclusive(page + i))
>> + goto clear;
>> + }
>> + } else if (mode == RMAP_MODE_PMD) {
>> + if (PageAnonExclusive(page))
>> + goto clear;
>> }
>> + goto dup;
>> +
>> +clear:
>> + /*
>> + * If this folio may have been pinned by the parent process,
>> + * don't allow to duplicate the mappings but instead require to e.g.,
>> + * copy the subpage immediately for the child so that we'll always
>> + * guarantee the pinned folio won't be randomly replaced in the
>> + * future on write faults.
>> + */
>> + if (likely(!folio_is_device_private(folio) &&
>> + unlikely(folio_needs_cow_for_dma(src_vma, folio))))
>> + return -EBUSY;
>> +
>> + if (likely(mode == RMAP_MODE_PTE)) {
>> + for (i = 0; i < nr_pages; i++)
>
> Do you really need to reset i=0 here? You have already checked that lower pages
> are shared in the above loop, so can't you just start from the first exclusive
> page here?

It's best to check the updated patch I sent.

--
Cheers,

David / dhildenb

2023-12-05 13:18:58

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 05.12.23 14:17, David Hildenbrand wrote:
> On 05.12.23 14:12, Ryan Roberts wrote:
>> On 04/12/2023 14:21, David Hildenbrand wrote:
>>> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
>>> remove them.
>>>
>>> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
>>> batching during fork() soon.
>>>
>>> Signed-off-by: David Hildenbrand <[email protected]>
>>> ---
>>> include/linux/mm.h | 6 --
>>> include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
>>> 2 files changed, 100 insertions(+), 51 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 24c1c7c5a99c0..f7565b35ae931 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
>>> return folio_maybe_dma_pinned(folio);
>>> }
>>>
>>> -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
>>> - struct page *page)
>>> -{
>>> - return folio_needs_cow_for_dma(vma, page_folio(page));
>>> -}
>>> -
>>> /**
>>> * is_zero_page - Query if a page is a zero page
>>> * @page: The page to query
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index 21d72cc602adc..84439f7720c62 100644
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio,
>>> #endif
>>> }
>>>
>>> -static inline void __page_dup_rmap(struct page *page, bool compound)
>>> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,
>>
>> __always_inline?
>
> Yes.

Ah, no, I did this for a reason. This function lives in a header, so it
will always be inlined.

--
Cheers,

David / dhildenb

2023-12-05 13:31:55

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul

On 05/12/2023 09:56, David Hildenbrand wrote:
>>>
>>> Ryan has series where we would make use of folio_remove_rmap_ptes() [1]
>>> -- he carries his own batching variant right now -- and
>>> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].
>>
>> Note that the contpte series at [2] has a new patch in v3 (patch 2), which could
>> benefit from folio_remove_rmap_ptes() or equivalent. My plan was to revive [1]
>> on top of [2] once it is merged.
>>
>>>
>>> There is some overlap with both series (and some other work, like
>>> multi-size THP [3]), so that will need some coordination, and likely a
>>> stepwise inclusion.
>>
>> Selfishly, I'd really like to get my stuff merged as soon as there is no
>> technical reason not to. I'd prefer not to add this as a dependency if we can
>> help it.
>
> It's easy to rework either series on top of each other. The mTHP series has
> highest priority,
> no question, that will go in first.

Music to my ears! It would be great to either get a reviewed-by or feedback on
why not, for the key 2 patches in that series (3 & 4) and also your opinion on
whether we need to wait for compaction to land (see cover letter). It would be
great to get this into linux-next ASAP IMHO.

>
> Regarding the contpte, I think it needs more work. Especially, as raised, to not
> degrade
> order-0 performance. Maybe we won't make the next merge window (and you already
> predicted
> that in some cover letter :P ). Let's see.

Yeah that's ok. I'll do the work to fix the order-0 perf. And also do the same
for patch 2 in that series - would also be really helpful if you had a chance to
look at patch 2 - its new for v3.

>
> But again, the conflicts are all trivial, so I'll happily rebase on top of
> whatever is
> in mm-unstable. Or move the relevant rework to the front so you can just carry
> them/base on them. (the batched variants for dup do make the contpte code much
> easier)

So perhaps we should aim for mTHP, then this, then contpte last, benefiting from
the batching.

>
> [...]
>
>>>
>>>
>>> New (extended) hugetlb interface that operate on entire folio:
>>>   * hugetlb_add_new_anon_rmap() -> Already existed
>>>   * hugetlb_add_anon_rmap() -> Already existed
>>>   * hugetlb_try_dup_anon_rmap()
>>>   * hugetlb_try_share_anon_rmap()
>>>   * hugetlb_add_file_rmap()
>>>   * hugetlb_remove_rmap()
>>>
>>> New "ordinary" interface for small folios / THP::
>>>   * folio_add_new_anon_rmap() -> Already existed
>>>   * folio_add_anon_rmap_[pte|ptes|pmd]()
>>>   * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>>>   * folio_try_share_anon_rmap_[pte|pmd]()
>>>   * folio_add_file_rmap_[pte|ptes|pmd]()
>>>   * folio_dup_file_rmap_[pte|ptes|pmd]()
>>>   * folio_remove_rmap_[pte|ptes|pmd]()
>>
>> I'm not sure if there are official guidelines, but personally if we are
>> reworking the API, I'd take the opportunity to move "rmap" to the front of the
>> name, rather than having it buried in the middle as it is for some of these:
>>
>> rmap_hugetlb_*()
>>
>> rmap_folio_*()
>
> No strong opinion. But we might want slightly different names then. For example,
> it's "bio_add_folio" and not "bio_folio_add":
>
>
> rmap_add_new_anon_hugetlb()
> rmap_add_anon_hugetlb()
> ...
> rmap_remove_hugetlb()
>
>
> rmap_add_new_anon_folio()
> rmap_add_anon_folio_[pte|ptes|pmd]()
> ...
> rmap_dup_file_folio_[pte|ptes|pmd]()
> rmap_remove_folio_[pte|ptes|pmd]()
>
> Thoughts?

Having now reviewed your series, I have a less strong opinion, perhaps it's
actually best with your original names; "folio" is actually the subject after
all; it's the thing being operated on.


>
>>
>> I guess reading the patches will tell me, but what's the point of "ptes"? Surely
>> you're either mapping at pte or pmd level, and the number of pages is determined
>> by the folio size? (or presumably nr param passed in)
>
> It's really (currently) one function to handle 1 vs. multiple PTEs. For example:
>
> void folio_remove_rmap_ptes(struct folio *, struct page *, unsigned int nr,
>         struct vm_area_struct *);
> #define folio_remove_rmap_pte(folio, page, vma) \
>     folio_remove_rmap_ptes(folio, page, 1, vma)
> void folio_remove_rmap_pmd(struct folio *, struct page *,
>         struct vm_area_struct *);

Yeah now that I've looked at the series, this makes sense. "ptes" was originally
making me think of contpte, but I suspect I'll be the only one with that
association :)
>
>
> One could let the compiler generate specialized variants for the single-pte
> versions to make the order-0 case faster. For now it's just a helper macro.
>
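
As a hedged illustration of what such a specialized single-PTE variant might
look like (a sketch only, not part of the posted series), the helper macro
could become an out-of-line wrapper in mm/rmap.c, with a matching declaration
replacing the macro in rmap.h, so that nr_pages == 1 is a compile-time
constant and the PTE loop can fold away:

	void folio_remove_rmap_pte(struct folio *folio, struct page *page,
			struct vm_area_struct *vma)
	{
		/* Constant nr_pages lets the __always_inline core drop the loop. */
		__folio_remove_rmap(folio, page, 1, vma, RMAP_MODE_PTE);
	}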

2023-12-05 13:33:26

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 05.12.23 14:17, David Hildenbrand wrote:
> On 05.12.23 14:12, Ryan Roberts wrote:
>> On 04/12/2023 14:21, David Hildenbrand wrote:
>>> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
>>> remove them.
>>>
>>> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
>>> batching during fork() soon.
>>>
>>> Signed-off-by: David Hildenbrand <[email protected]>
>>> ---
>>> include/linux/mm.h | 6 --
>>> include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
>>> 2 files changed, 100 insertions(+), 51 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 24c1c7c5a99c0..f7565b35ae931 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
>>> return folio_maybe_dma_pinned(folio);
>>> }
>>>
>>> -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
>>> - struct page *page)
>>> -{
>>> - return folio_needs_cow_for_dma(vma, page_folio(page));
>>> -}
>>> -
>>> /**
>>> * is_zero_page - Query if a page is a zero page
>>> * @page: The page to query
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index 21d72cc602adc..84439f7720c62 100644
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio,
>>> #endif
>>> }
>>>
>>> -static inline void __page_dup_rmap(struct page *page, bool compound)
>>> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,
>>
>> __always_inline?
>
> Yes.
>
>>
>>> + struct page *page, unsigned int nr_pages,
>>> + struct vm_area_struct *src_vma, enum rmap_mode mode)
>>> {
>>> - VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
>>> + int i;
>>>
>>> - if (compound) {
>>> - struct folio *folio = (struct folio *)page;
>>> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>>>
>>> - VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>>> - atomic_inc(&folio->_entire_mapcount);
>>> - } else {
>>> - atomic_inc(&page->_mapcount);
>>> + /*
>>> + * No need to check+clear for already shared PTEs/PMDs of the folio.
>>> + * This includes PTE mappings of (order-0) KSM folios.
>>> + */
>>> + if (likely(mode == RMAP_MODE_PTE)) {
>>
>> Presumably if __always_inline then the compiler will remove this if/else and just
>> keep the part indicated by mode? In which case "likely" is pretty useless? Same
>> for all similar sites in the other patches.
>
> Yes, also had this in mind. As long as we use __always_inline it
> shouldn't ever matter.

It seems to be cleanest to just do:

switch (mode) {
case RMAP_MODE_PTE:
...
break;
case RMAP_MODE_PMD:
...
break;
}
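
To see why the hint is moot in stand-alone terms (a user-space sketch, not
kernel code; the helper names below are made up for illustration): with
__always_inline and a compile-time-constant mode argument, the compiler keeps
only the matching case and discards the rest, so there is nothing left for
likely() to predict.

	#include <stdio.h>

	enum rmap_mode { RMAP_MODE_PTE, RMAP_MODE_PMD };

	static inline __attribute__((always_inline)) int
	mappings_taken(unsigned int nr_pages, enum rmap_mode mode)
	{
		switch (mode) {
		case RMAP_MODE_PTE:
			return nr_pages;	/* one mapping per PTE */
		case RMAP_MODE_PMD:
			return 1;		/* a single entire mapping */
		}
		return 0;
	}

	/* Each caller passes a constant mode; only one case survives inlining. */
	static int mappings_taken_ptes(unsigned int nr) { return mappings_taken(nr, RMAP_MODE_PTE); }
	static int mappings_taken_pmd(void) { return mappings_taken(1, RMAP_MODE_PMD); }

	int main(void)
	{
		printf("%d %d\n", mappings_taken_ptes(16), mappings_taken_pmd());
		return 0;
	}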

--
Cheers,

David / dhildenb

2023-12-05 13:37:46

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 23/39] mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()

On 05/12/2023 13:09, David Hildenbrand wrote:
>>> +static __always_inline void __folio_remove_rmap(struct folio *folio,
>>> +        struct page *page, unsigned int nr_pages,
>>> +        struct vm_area_struct *vma, enum rmap_mode mode)
>>> +{
>>>       atomic_t *mapped = &folio->_nr_pages_mapped;
>>> -    int nr = 0, nr_pmdmapped = 0;
>>> -    bool last;
>>> +    int last, nr = 0, nr_pmdmapped = 0;
>>
>> nit: you're being inconsistent across the functions with signed vs unsigned for
>> page counts (e.g. nr, nr_pmdmapped) - see __folio_add_rmap(),
>> __folio_add_file_rmap(), __folio_add_anon_rmap().
>>
>
> Definitely.
>
>> I suggest pick one and stick to it. Personally I'd go with signed int (since
>> that's what all the counters in struct folio that we are manipulating are,
>> underneath the atomic_t) then check that nr_pages > 0 in
>> __folio_rmap_sanity_checks().
>
> Can do, but note that the counters are signed to detect underflows. It doesn't
> make sense here to pass a negative number.

I agree it doesn't make sense to pass negative - hence the check.

These 2 functions are inconsistent on size, but agree on signed:

long folio_nr_pages(struct folio *folio)
int folio_nr_pages_mapped(struct folio *folio)

I don't have a strong opinion.

>
>>
>>>       enum node_stat_item idx;
>>>   -    VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
>>> -    VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>>> +    __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>>>         /* Is page being unmapped by PTE? Is this its last map to be removed? */
>>> -    if (likely(!compound)) {
>>> -        last = atomic_add_negative(-1, &page->_mapcount);
>>> -        nr = last;
>>> -        if (last && folio_test_large(folio)) {
>>> -            nr = atomic_dec_return_relaxed(mapped);
>>> -            nr = (nr < COMPOUND_MAPPED);
>>> -        }
>>> -    } else if (folio_test_pmd_mappable(folio)) {
>>> -        /* That test is redundant: it's for safety or to optimize out */
>>> +    if (likely(mode == RMAP_MODE_PTE)) {
>>> +        do {
>>> +            last = atomic_add_negative(-1, &page->_mapcount);
>>> +            if (last && folio_test_large(folio)) {
>>> +                last = atomic_dec_return_relaxed(mapped);
>>> +                last = (last < COMPOUND_MAPPED);
>>> +            }
>>>   +            if (last)
>>> +                nr++;
>>> +        } while (page++, --nr_pages > 0);
>>> +    } else if (mode == RMAP_MODE_PMD) {
>>>           last = atomic_add_negative(-1, &folio->_entire_mapcount);
>>>           if (last) {
>>>               nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
>>> @@ -1517,7 +1528,7 @@ void page_remove_rmap(struct page *page, struct
>>> vm_area_struct *vma,
>>>            * is still mapped.
>>>            */
>>>           if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
>>
>> folio_test_pmd_mappable() -> folio_test_large()
>>
>> Since you're converting this to support batch PTE removal, it might as well also
>> support smaller-than-pmd too?
>
> I remember that you have a patch for that, right? :)
>
>>
>> I currently have a patch to do this same change in the multi-size THP series.
>>
>
> Ah, yes, and that should go in first.
>
>

2023-12-05 13:39:58

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul

On 05.12.23 14:31, Ryan Roberts wrote:
> On 05/12/2023 09:56, David Hildenbrand wrote:
>>>>
>>>> Ryan has series where we would make use of folio_remove_rmap_ptes() [1]
>>>> -- he carries his own batching variant right now -- and
>>>> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].
>>>
>>> Note that the contpte series at [2] has a new patch in v3 (patch 2), which could
>>> benefit from folio_remove_rmap_ptes() or equivalent. My plan was to revive [1]
>>> on top of [2] once it is merged.
>>>
>>>>
>>>> There is some overlap with both series (and some other work, like
>>>> multi-size THP [3]), so that will need some coordination, and likely a
>>>> stepwise inclusion.
>>>
>>> Selfishly, I'd really like to get my stuff merged as soon as there is no
>>> technical reason not to. I'd prefer not to add this as a dependency if we can
>>> help it.
>>
>> It's easy to rework either series on top of each other. The mTHP series has
>> highest priority,
>> no question, that will go in first.
>
> Music to my ears! It would be great to either get a reviewed-by or feedback on
> why not, for the key 2 patches in that series (3 & 4) and also your opinion on
> whether we need to wait for compaction to land (see cover letter). It would be
> great to get this into linux-next ASAP IMHO.

On it :)

>
>>
>> Regarding the contpte, I think it needs more work. Especially, as raised, to not
>> degrade
>> order-0 performance. Maybe we won't make the next merge window (and you already
>> predicted
>> that in some cover letter :P ). Let's see.
>
> Yeah that's ok. I'll do the work to fix the order-0 perf. And also do the same
> for patch 2 in that series - would also be really helpful if you had a chance to
> look at patch 2 - its new for v3.

I only skimmed over it, but it seems to go in the direction we'll
need. Keeping order-0 performance unharmed should have highest priority.
Hopefully my microbenchmarks are helpful.

>
>>
>> But again, the conflicts are all trivial, so I'll happily rebase on top of
>> whatever is
>> in mm-unstable. Or move the relevant rework to the front so you can just carry
>> them/base on them. (the batched variants for dup do make the contpte code much
>> easier)
>
> So perhaps we should aim for mTHP, then this, then contpte last, benefiting from
> the batching.

Yeah. And again, I don't care too much if I have to rebase on top of
your work if this here takes longer. It's all a fairly trivial conversion.

>>
>> [...]
>>
>>>>
>>>>
>>>> New (extended) hugetlb interface that operate on entire folio:
>>>>   * hugetlb_add_new_anon_rmap() -> Already existed
>>>>   * hugetlb_add_anon_rmap() -> Already existed
>>>>   * hugetlb_try_dup_anon_rmap()
>>>>   * hugetlb_try_share_anon_rmap()
>>>>   * hugetlb_add_file_rmap()
>>>>   * hugetlb_remove_rmap()
>>>>
>>>> New "ordinary" interface for small folios / THP::
>>>>   * folio_add_new_anon_rmap() -> Already existed
>>>>   * folio_add_anon_rmap_[pte|ptes|pmd]()
>>>>   * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>>>>   * folio_try_share_anon_rmap_[pte|pmd]()
>>>>   * folio_add_file_rmap_[pte|ptes|pmd]()
>>>>   * folio_dup_file_rmap_[pte|ptes|pmd]()
>>>>   * folio_remove_rmap_[pte|ptes|pmd]()
>>>
>>> I'm not sure if there are official guidelines, but personally if we are
>>> reworking the API, I'd take the opportunity to move "rmap" to the front of the
>>> name, rather than having it buried in the middle as it is for some of these:
>>>
>>> rmap_hugetlb_*()
>>>
>>> rmap_folio_*()
>>
>> No strong opinion. But we might want slightly different names then. For example,
>> it's "bio_add_folio" and not "bio_folio_add":
>>
>>
>> rmap_add_new_anon_hugetlb()
>> rmap_add_anon_hugetlb()
>> ...
>> rmap_remove_hugetlb()
>>
>>
>> rmap_add_new_anon_folio()
>> rmap_add_anon_folio_[pte|ptes|pmd]()
>> ...
>> rmap_dup_file_folio_[pte|ptes|pmd]()
>> rmap_remove_folio_[pte|ptes|pmd]()
>>
>> Thoughts?
>
> Having now reviewed your series, I have a less strong opinion, perhaps it's
> actually best with your original names; "folio" is actually the subject after
> all; it's the thing being operated on.
>

I think having "folio" in there looks cleaner and more consistent to
other functions.

I tend to like "rmap_dup_file_folio_[pte|ptes|pmd]()", because then we
have "file folio" and "anon folio" as one word.

But then I wonder about the hugetlb part. Maybe simply
"hugetlb_rmap_remove_folio()" etc.

Having the "hugetlb_" prefix at the beginning feels like the right thing
to do, looking at other hugetlb special-handlings.

But I'll wait a bit until I go crazy on renaming :)

>
>>
>>>
>>> I guess reading the patches will tell me, but what's the point of "ptes"? Surely
>>> you're either mapping at pte or pmd level, and the number of pages is determined
>>> by the folio size? (or presumably nr param passed in)
>>
>> It's really (currently) one function to handle 1 vs. multiple PTEs. For example:
>>
>> void folio_remove_rmap_ptes(struct folio *, struct page *, unsigned int nr,
>>         struct vm_area_struct *);
>> #define folio_remove_rmap_pte(folio, page, vma) \
>>     folio_remove_rmap_ptes(folio, page, 1, vma)
>> void folio_remove_rmap_pmd(struct folio *, struct page *,
>>         struct vm_area_struct *);
>
> Yeah now that I've looked at the series, this makes sense. "ptes" was originally
> making me think of contpte, but I suspect I'll be the only one with that
> association :)

Ah, yes :)

--
Cheers,

David / dhildenb

2023-12-05 13:41:15

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 05/12/2023 13:18, David Hildenbrand wrote:
> On 05.12.23 14:17, David Hildenbrand wrote:
>> On 05.12.23 14:12, Ryan Roberts wrote:
>>> On 04/12/2023 14:21, David Hildenbrand wrote:
>>>> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
>>>> remove them.
>>>>
>>>> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
>>>> batching during fork() soon.
>>>>
>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>> ---
>>>>    include/linux/mm.h   |   6 --
>>>>    include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
>>>>    2 files changed, 100 insertions(+), 51 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 24c1c7c5a99c0..f7565b35ae931 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct
>>>> vm_area_struct *vma,
>>>>        return folio_maybe_dma_pinned(folio);
>>>>    }
>>>>    -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
>>>> -                      struct page *page)
>>>> -{
>>>> -    return folio_needs_cow_for_dma(vma, page_folio(page));
>>>> -}
>>>> -
>>>>    /**
>>>>     * is_zero_page - Query if a page is a zero page
>>>>     * @page: The page to query
>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> index 21d72cc602adc..84439f7720c62 100644
>>>> --- a/include/linux/rmap.h
>>>> +++ b/include/linux/rmap.h
>>>> @@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct
>>>> folio *folio,
>>>>    #endif
>>>>    }
>>>>    -static inline void __page_dup_rmap(struct page *page, bool compound)
>>>> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,
>>>
>>> __always_inline?
>>
>> Yes.
>
> Ah, no, I did this for a reason. This function lives in a header, so it will
> always be inlined.
>

Really? It will certainly be duplicated across every compilation unit, but
that's separate from being inlined - if the optimizer is off, won't it just end
up as an out-of-line function in every compilation unit?
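
A quick way to see this outside the kernel (a sketch; gcc/clang behaviour
assumed): compile the snippet below with -O0 and disassemble it. Calls to the
plain static inline remain calls to a local out-of-line copy, while the
always_inline one is expanded at the call site even without optimization.

	/* demo.c: cc -O0 -c demo.c && objdump -d demo.o */
	static inline int plain_inline(int x)
	{
		return x + 1;	/* typically stays out of line at -O0 */
	}

	static inline __attribute__((always_inline)) int forced_inline(int x)
	{
		return x + 2;	/* inlined even at -O0 */
	}

	int use_both(int x)
	{
		return plain_inline(x) + forced_inline(x);
	}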

2023-12-05 13:45:35

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 05/12/2023 13:32, David Hildenbrand wrote:
> On 05.12.23 14:17, David Hildenbrand wrote:
>> On 05.12.23 14:12, Ryan Roberts wrote:
>>> On 04/12/2023 14:21, David Hildenbrand wrote:
>>>> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
>>>> remove them.
>>>>
>>>> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
>>>> batching during fork() soon.
>>>>
>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>> ---
>>>>    include/linux/mm.h   |   6 --
>>>>    include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
>>>>    2 files changed, 100 insertions(+), 51 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 24c1c7c5a99c0..f7565b35ae931 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct
>>>> vm_area_struct *vma,
>>>>        return folio_maybe_dma_pinned(folio);
>>>>    }
>>>>    -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
>>>> -                      struct page *page)
>>>> -{
>>>> -    return folio_needs_cow_for_dma(vma, page_folio(page));
>>>> -}
>>>> -
>>>>    /**
>>>>     * is_zero_page - Query if a page is a zero page
>>>>     * @page: The page to query
>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> index 21d72cc602adc..84439f7720c62 100644
>>>> --- a/include/linux/rmap.h
>>>> +++ b/include/linux/rmap.h
>>>> @@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct
>>>> folio *folio,
>>>>    #endif
>>>>    }
>>>>    -static inline void __page_dup_rmap(struct page *page, bool compound)
>>>> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,
>>>
>>> __always_inline?
>>
>> Yes.
>>
>>>
>>>> +        struct page *page, unsigned int nr_pages,
>>>> +        struct vm_area_struct *src_vma, enum rmap_mode mode)
>>>>    {
>>>> -    VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
>>>> +    int i;
>>>>    -    if (compound) {
>>>> -        struct folio *folio = (struct folio *)page;
>>>> +    VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>>>>    -        VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>>>> -        atomic_inc(&folio->_entire_mapcount);
>>>> -    } else {
>>>> -        atomic_inc(&page->_mapcount);
>>>> +    /*
>>>> +     * No need to check+clear for already shared PTEs/PMDs of the folio.
>>>> +     * This includes PTE mappings of (order-0) KSM folios.
>>>> +     */
>>>> +    if (likely(mode == RMAP_MODE_PTE)) {
>>>
>>> Presumably if __always_inline then the compiler will remove this if/else and just
>>> keep the part indicated by mode? In which case "likely" is pretty useless? Same
>>> for all similar sites in the other patches.
>>
>> Yes, also had this in mind. As long as we use __always_inline it
>> shouldn't ever matter.
>
> It seems to be cleanest to just do:
>
> switch (mode) {
> case RMAP_MODE_PTE:
>     ...
>     break;
> case RMAP_MODE_PMD:
>     ...
>     break;
> }
>

Agreed.

2023-12-05 13:49:45

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul

On 05/12/2023 13:39, David Hildenbrand wrote:
> On 05.12.23 14:31, Ryan Roberts wrote:
>> On 05/12/2023 09:56, David Hildenbrand wrote:
>>>>>
>>>>> Ryan has series where we would make use of folio_remove_rmap_ptes() [1]
>>>>> -- he carries his own batching variant right now -- and
>>>>> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].
>>>>
>>>> Note that the contpte series at [2] has a new patch in v3 (patch 2), which
>>>> could
>>>> benefit from folio_remove_rmap_ptes() or equivalent. My plan was to revive [1]
>>>> on top of [2] once it is merged.
>>>>
>>>>>
>>>>> There is some overlap with both series (and some other work, like
>>>>> multi-size THP [3]), so that will need some coordination, and likely a
>>>>> stepwise inclusion.
>>>>
>>>> Selfishly, I'd really like to get my stuff merged as soon as there is no
>>>> technical reason not to. I'd prefer not to add this as a dependency if we can
>>>> help it.
>>>
>>> It's easy to rework either series on top of each other. The mTHP series has
>>> highest priority,
>>> no question, that will go in first.
>>
>> Music to my ears! It would be great to either get a reviewed-by or feedback on
>> why not, for the key 2 patches in that series (3 & 4) and also your opinion on
>> whether we need to wait for compaction to land (see cover letter). It would be
>> great to get this into linux-next ASAP IMHO.
>
> On it :)
>
>>
>>>
>>> Regarding the contpte, I think it needs more work. Especially, as raised, to not
>>> degrade
>>> order-0 performance. Maybe we won't make the next merge window (and you already
>>> predicted
>>> that in some cover letter :P ). Let's see.
>>
>> Yeah that's ok. I'll do the work to fix the order-0 perf. And also do the same
>> for patch 2 in that series - would also be really helpful if you had a chance to
>> look at patch 2 - its new for v3.
>
> I only skimmed over it, but it seems to go in the direction we'll need.
> Keeping order-0 performance unharmed should have highest priority. Hopefully my
> microbenchmarks are helpful.

Yes absolutely - are you able to share them??

>
>>
>>>
>>> But again, the conflicts are all trivial, so I'll happily rebase on top of
>>> whatever is
>>> in mm-unstable. Or move the relevant rework to the front so you can just carry
>>> them/base on them. (the batched variants for dup do make the contpte code much
>>> easier)
>>
>> So perhaps we should aim for mTHP, then this, then contpte last, benefiting from
>> the batching.
>
> Yeah. And again, I don't care too much if I have to rebase on top of your work
> if this here takes longer. It's all a fairly trivial conversion.
>
>>>
>>> [...]
>>>
>>>>>
>>>>>
>>>>> New (extended) hugetlb interface that operate on entire folio:
>>>>>    * hugetlb_add_new_anon_rmap() -> Already existed
>>>>>    * hugetlb_add_anon_rmap() -> Already existed
>>>>>    * hugetlb_try_dup_anon_rmap()
>>>>>    * hugetlb_try_share_anon_rmap()
>>>>>    * hugetlb_add_file_rmap()
>>>>>    * hugetlb_remove_rmap()
>>>>>
>>>>> New "ordinary" interface for small folios / THP::
>>>>>    * folio_add_new_anon_rmap() -> Already existed
>>>>>    * folio_add_anon_rmap_[pte|ptes|pmd]()
>>>>>    * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>>>>>    * folio_try_share_anon_rmap_[pte|pmd]()
>>>>>    * folio_add_file_rmap_[pte|ptes|pmd]()
>>>>>    * folio_dup_file_rmap_[pte|ptes|pmd]()
>>>>>    * folio_remove_rmap_[pte|ptes|pmd]()
>>>>
>>>> I'm not sure if there are official guidelines, but personally if we are
>>>> reworking the API, I'd take the opportunity to move "rmap" to the front of the
>>>> name, rather than having it buried in the middle as it is for some of these:
>>>>
>>>> rmap_hugetlb_*()
>>>>
>>>> rmap_folio_*()
>>>
>>> No strong opinion. But we might want slightly different names then. For example,
>>> it's "bio_add_folio" and not "bio_folio_add":
>>>
>>>
>>> rmap_add_new_anon_hugetlb()
>>> rmap_add_anon_hugetlb()
>>> ...
>>> rmap_remove_hugetlb()
>>>
>>>
>>> rmap_add_new_anon_folio()
>>> rmap_add_anon_folio_[pte|ptes|pmd]()
>>> ...
>>> rmap_dup_file_folio_[pte|ptes|pmd]()
>>> rmap_remove_folio_[pte|ptes|pmd]()
>>>
>>> Thoughts?
>>
>> Having now reviewed your series, I have a less strong opinion, perhaps it's
>> actually best with your original names; "folio" is actually the subject after
>> all; it's the thing being operated on.
>>
>
> I think having "folio" in there looks cleaner and more consistent to other
> functions.
>
> I tend to like "rmap_dup_file_folio_[pte|ptes|pmd]()", because then we have
> "file folio" and "anon folio" as one word.
>
> But then I wonder about the hugetlb part. Maybe simply
> "hugetlb_rmap_remove_folio()" etc.
>
> Having the "hugetlb_" prefix at the beginning feels like the right thing to do,
> looking at other hugetlb special-handlings.
>
> But I'll wait a bit until I go crazy on renaming :)

I suspect we could argue in multiple directions for hours :)

Let's see if others have opinions.

FWIW, I've looked through all the patches; I like what I see! This is a really
nice cleanup and will definitely help with the various patch sets I've been
working on. Apart from the comments I've already raised, looks in pretty good
shape to me.

>
>>
>>>
>>>>
>>>> I guess reading the patches will tell me, but what's the point of "ptes"?
>>>> Surely
>>>> you're either mapping at pte or pmd level, and the number of pages is
>>>> determined
>>>> by the folio size? (or presumably nr param passed in)
>>>
>>> It's really (currently) one function to handle 1 vs. multiple PTEs. For example:
>>>
>>> void folio_remove_rmap_ptes(struct folio *, struct page *, unsigned int nr,
>>>          struct vm_area_struct *);
>>> #define folio_remove_rmap_pte(folio, page, vma) \
>>>      folio_remove_rmap_ptes(folio, page, 1, vma)
>>> void folio_remove_rmap_pmd(struct folio *, struct page *,
>>>          struct vm_area_struct *);
>>
>> Yeah now that I've looked at the series, this makes sense. "ptes" was originally
>> making me think of contpte, but I suspect I'll be the only one with that
>> association :)
>
> Ah, yes :)
>

2023-12-05 13:51:40

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 05.12.23 14:40, Ryan Roberts wrote:
> On 05/12/2023 13:18, David Hildenbrand wrote:
>> On 05.12.23 14:17, David Hildenbrand wrote:
>>> On 05.12.23 14:12, Ryan Roberts wrote:
>>>> On 04/12/2023 14:21, David Hildenbrand wrote:
>>>>> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
>>>>> remove them.
>>>>>
>>>>> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
>>>>> batching during fork() soon.
>>>>>
>>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>>> ---
>>>>>    include/linux/mm.h   |   6 --
>>>>>    include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
>>>>>    2 files changed, 100 insertions(+), 51 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>> index 24c1c7c5a99c0..f7565b35ae931 100644
>>>>> --- a/include/linux/mm.h
>>>>> +++ b/include/linux/mm.h
>>>>> @@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct
>>>>> vm_area_struct *vma,
>>>>>        return folio_maybe_dma_pinned(folio);
>>>>>    }
>>>>>    -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
>>>>> -                      struct page *page)
>>>>> -{
>>>>> -    return folio_needs_cow_for_dma(vma, page_folio(page));
>>>>> -}
>>>>> -
>>>>>    /**
>>>>>     * is_zero_page - Query if a page is a zero page
>>>>>     * @page: The page to query
>>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>>> index 21d72cc602adc..84439f7720c62 100644
>>>>> --- a/include/linux/rmap.h
>>>>> +++ b/include/linux/rmap.h
>>>>> @@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct
>>>>> folio *folio,
>>>>>    #endif
>>>>>    }
>>>>>    -static inline void __page_dup_rmap(struct page *page, bool compound)
>>>>> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,
>>>>
>>>> __always_inline?
>>>
>>> Yes.
>>
>> Ah, no, I did this for a reason. This function lives in a header, so it will
>> always be inlined.
>>
>
> Really? It will certainly be duplicated across every compilation unit, but
> that's separate from being inlined - if the optimizer is off, won't it just end
> up as an out-of-line function in every compilation unit?

Good point, I didn't really consider that here, and thinking about it, it
makes perfect sense.

I think the compiler might even ignore "always_inline"; I read that it can
do so especially with recursion. But people can then complain to the
compiler writers about performance issues here -- we told the compiler what
we think is best.

--
Cheers,

David / dhildenb

2023-12-05 13:56:03

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul


>>>>>>>
>>>> Regarding the contpte, I think it needs more work. Especially, as raised, to
>>>> not degrade order-0 performance. Maybe we won't make the next merge window
>>>> (and you already predicted that in some cover letter :P ). Let's see.
>>>
>>> Yeah that's ok. I'll do the work to fix the order-0 perf. And also do the same
>>> for patch 2 in that series - would also be really helpful if you had a chance to
>>> look at patch 2 - it's new for v3.
>>
>> I only skimmed over it, but it seems to go in the direction we'll need.
>> Keeping order-0 performance unharmed should have highest priority. Hopefully my
>> microbenchmarks are helpful.
>
> Yes absolutely - are you able to share them??

I shared them in the reply to your patchset. Let me know if you can't
find them.

[...]

>>> Having now reviewed your series, I have a less strong opinion, perhaps it's
>>> actually best with your original names; "folio" is actually the subject after
>>> all; it's the thing being operated on.
>>>
>>
>> I think having "folio" in there looks cleaner and more consistent with other
>> functions.
>>
>> I tend to like "rmap_dup_file_folio_[pte|ptes|pmd]()", because then we have
>> "file folio" and "anon folio" as one word.
>>
>> But then I wonder about the hugetlb part. Maybe simply
>> "hugetlb_rmap_remove_folio()" etc.
>>
>> Having the "hugetlb_" prefix at the beginning feels like the right thing to do,
>> looking at other hugetlb special-handlings.
>>
>> But I'll wait a bit until I go crazy on renaming :)
>
> I suspect we could argue in multiple directions for hours :)

:)

>
> Let's see if others have opinions.
>
> FWIW, I've looked through all the patches; I like what I see! This is a really
> nice clean up and will definitely help with the various patch sets I've been
> working on. Apart from the comments I've already raised, looks in pretty good
> shape to me.

Thanks!

--
Cheers,

David / dhildenb

2023-12-05 14:02:55

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 05/12/2023 13:50, David Hildenbrand wrote:
> On 05.12.23 14:40, Ryan Roberts wrote:
>> On 05/12/2023 13:18, David Hildenbrand wrote:
>>> On 05.12.23 14:17, David Hildenbrand wrote:
>>>> On 05.12.23 14:12, Ryan Roberts wrote:
>>>>> On 04/12/2023 14:21, David Hildenbrand wrote:
>>>>>> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
>>>>>> remove them.
>>>>>>
>>>>>> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
>>>>>> batching during fork() soon.
>>>>>>
>>>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>>>> ---
>>>>>>     include/linux/mm.h   |   6 --
>>>>>>     include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
>>>>>>     2 files changed, 100 insertions(+), 51 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>> index 24c1c7c5a99c0..f7565b35ae931 100644
>>>>>> --- a/include/linux/mm.h
>>>>>> +++ b/include/linux/mm.h
>>>>>> @@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct
>>>>>> vm_area_struct *vma,
>>>>>>         return folio_maybe_dma_pinned(folio);
>>>>>>     }
>>>>>>     -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
>>>>>> -                      struct page *page)
>>>>>> -{
>>>>>> -    return folio_needs_cow_for_dma(vma, page_folio(page));
>>>>>> -}
>>>>>> -
>>>>>>     /**
>>>>>>      * is_zero_page - Query if a page is a zero page
>>>>>>      * @page: The page to query
>>>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>>>> index 21d72cc602adc..84439f7720c62 100644
>>>>>> --- a/include/linux/rmap.h
>>>>>> +++ b/include/linux/rmap.h
>>>>>> @@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct
>>>>>> folio *folio,
>>>>>>     #endif
>>>>>>     }
>>>>>>     -static inline void __page_dup_rmap(struct page *page, bool compound)
>>>>>> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,
>>>>>
>>>>> __always_inline?
>>>>
>>>> Yes.
>>>
>>> Ah, no, I did this for a reason. This function lives in a header, so it will
>>> always be inlined.
>>>
>>
>> Really? It will certainly be duplicated across every compilation unit, but
>> that's separate from being inlined - if the optimizer is off, won't it just end
>> up as an out-of-line function in every compilation unit?
>
> Good point, I didn't really consider that here, and thinking about it, it makes
> perfect sense.
>
> I think the compiler might even ignore "always_inline"; I read that it can do
> so especially with recursion. But people can then complain to the compiler
> writers about performance issues here -- we told the compiler what we think
> is best.
>

To be honest, my comment assumed that you had a good reason for using
__always_inline, and in that case you should be consistent. But if you
don't have a good reason, you should probably just use inline and let the
compiler do what it thinks best?

2023-12-05 14:12:27

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

On 05.12.23 15:02, Ryan Roberts wrote:
> On 05/12/2023 13:50, David Hildenbrand wrote:
>> On 05.12.23 14:40, Ryan Roberts wrote:
>>> On 05/12/2023 13:18, David Hildenbrand wrote:
>>>> On 05.12.23 14:17, David Hildenbrand wrote:
>>>>> On 05.12.23 14:12, Ryan Roberts wrote:
>>>>>> On 04/12/2023 14:21, David Hildenbrand wrote:
>>>>>>> The last user of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
>>>>>>> remove them.
>>>>>>>
>>>>>>> Add folio_try_dup_anon_rmap_ptes() right away, we want to perform rmap
>>>>>>> batching during fork() soon.
>>>>>>>
>>>>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>>>>> ---
>>>>>>>     include/linux/mm.h   |   6 --
>>>>>>>     include/linux/rmap.h | 145 +++++++++++++++++++++++++++++--------------
>>>>>>>     2 files changed, 100 insertions(+), 51 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>>> index 24c1c7c5a99c0..f7565b35ae931 100644
>>>>>>> --- a/include/linux/mm.h
>>>>>>> +++ b/include/linux/mm.h
>>>>>>> @@ -1964,12 +1964,6 @@ static inline bool folio_needs_cow_for_dma(struct
>>>>>>> vm_area_struct *vma,
>>>>>>>         return folio_maybe_dma_pinned(folio);
>>>>>>>     }
>>>>>>>     -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
>>>>>>> -                      struct page *page)
>>>>>>> -{
>>>>>>> -    return folio_needs_cow_for_dma(vma, page_folio(page));
>>>>>>> -}
>>>>>>> -
>>>>>>>     /**
>>>>>>>      * is_zero_page - Query if a page is a zero page
>>>>>>>      * @page: The page to query
>>>>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>>>>> index 21d72cc602adc..84439f7720c62 100644
>>>>>>> --- a/include/linux/rmap.h
>>>>>>> +++ b/include/linux/rmap.h
>>>>>>> @@ -354,68 +354,123 @@ static inline void folio_dup_file_rmap_pmd(struct
>>>>>>> folio *folio,
>>>>>>>     #endif
>>>>>>>     }
>>>>>>>     -static inline void __page_dup_rmap(struct page *page, bool compound)
>>>>>>> +static inline int __folio_try_dup_anon_rmap(struct folio *folio,
>>>>>>
>>>>>> __always_inline?
>>>>>
>>>>> Yes.
>>>>
>>>> Ah, no, I did this for a reason. This function lives in a header, so it will
>>>> always be inlined.
>>>>
>>>
>>> Really? It will certainly be duplicated across every compilation unit, but
>>> that's separate from being inlined - if the optimizer is off, won't it just end
>>> up as an out-of-line function in every compilation unit?
>>
>> Good point, I didn't really consider that here, and thinking about it, it makes
>> perfect sense.
>>
>> I think the compiler might even ignore "always_inline"; I read that it can do
>> so especially with recursion. But people can then complain to the compiler
>> writers about performance issues here -- we told the compiler what we think
>> is best.
>>
>
> To be honest, my comment assumed that you had a good reason for using
> __always_inline, and in that case you should be consistent. But if you
> don't have a good reason, you should probably just use inline and let the
> compiler do what it thinks best?

I think __always_inline is the right thing to do here; we really want
the compiler to generate specialized code. I was just somehow ignoring
the scenario you described :)

__always_inline it is.
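
For illustration, a standalone sketch (plain C, not taken from the series;
the names here are made up) of the behaviour we discussed: a plain
"static inline" in a header may be emitted out of line in every compilation
unit when the optimizer is off, whereas __always_inline together with a
compile-time-constant mode argument lets the compiler specialize each call
site and fold the branch away:

        #include <stdio.h>

        enum mode { MODE_PTE = 0, MODE_PMD };

        /* Hypothetical helper mirroring the rmap pattern, illustration only. */
        static inline __attribute__((always_inline))
        void add_mapping(int nr, enum mode mode)
        {
                if (mode == MODE_PTE)           /* constant-folded per call site */
                        printf("add %d PTE mapping(s)\n", nr);
                else
                        printf("add one PMD mapping\n");
        }

        int main(void)
        {
                add_mapping(1, MODE_PTE);       /* "pte" call site */
                add_mapping(16, MODE_PTE);      /* batched "ptes" call site */
                add_mapping(512, MODE_PMD);     /* "pmd" call site */
                return 0;
        }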

--
Cheers,

David / dhildenb

2023-12-06 01:25:47

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 03/39] mm/rmap: introduce and use hugetlb_add_file_rmap()



On 12/4/23 22:21, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Right now we're using page_dup_file_rmap() in some cases where "ordinary"
> rmap code would have used page_add_file_rmap(). So let's introduce and
> use hugetlb_add_file_rmap() instead. We won't be adding a
> "hugetlb_dup_file_rmap()" functon for the fork() case, as it would be
> doing the same: "dup" is just an optimization for "add".
>
> What remains is a single page_dup_file_rmap() call in fork() code.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Yin Fengwei <[email protected]>

> ---
> include/linux/rmap.h | 7 +++++++
> mm/hugetlb.c | 6 +++---
> mm/migrate.c | 2 +-
> 3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index e8d1dc1d5361f..0a81e8420a961 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -208,6 +208,13 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +static inline void hugetlb_add_file_rmap(struct folio *folio)
> +{
> + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> +
> + atomic_inc(&folio->_entire_mapcount);
> +}
> +
> static inline void hugetlb_remove_rmap(struct folio *folio)
> {
> atomic_dec(&folio->_entire_mapcount);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d17bb53b19ff2..541a8f38cfdc7 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5401,7 +5401,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> * sleep during the process.
> */
> if (!folio_test_anon(pte_folio)) {
> - page_dup_file_rmap(&pte_folio->page, true);
> + hugetlb_add_file_rmap(pte_folio);
> } else if (page_try_dup_anon_rmap(&pte_folio->page,
> true, src_vma)) {
> pte_t src_pte_old = entry;
> @@ -6272,7 +6272,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> if (anon_rmap)
> hugetlb_add_new_anon_rmap(folio, vma, haddr);
> else
> - page_dup_file_rmap(&folio->page, true);
> + hugetlb_add_file_rmap(folio);
> new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
> && (vma->vm_flags & VM_SHARED)));
> /*
> @@ -6723,7 +6723,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> goto out_release_unlock;
>
> if (folio_in_pagecache)
> - page_dup_file_rmap(&folio->page, true);
> + hugetlb_add_file_rmap(folio);
> else
> hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4cb849fa0dd2c..de9d94b99ab78 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio,
> hugetlb_add_anon_rmap(folio, vma, pvmw.address,
> rmap_flags);
> else
> - page_dup_file_rmap(new, true);
> + hugetlb_add_file_rmap(folio);
> set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
> psize);
> } else

2023-12-06 01:25:55

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()



On 12/4/23 22:21, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
> code from page_remove_rmap(). This effectively removes one check on the
> small-folio path as well.
>
> Note: all possible candidates that need care are page_remove_rmap() that
> pass compound=true.
>
> Signed-off-by: David Hildenbrand <[email protected]>


> ---
> include/linux/rmap.h | 5 +++++
> mm/hugetlb.c | 4 ++--
> mm/rmap.c | 17 ++++++++---------
> 3 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 4c5bfeb054636..e8d1dc1d5361f 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -208,6 +208,11 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +static inline void hugetlb_remove_rmap(struct folio *folio)
> +{
> + atomic_dec(&folio->_entire_mapcount);
> +}
> +
> static inline void __page_dup_rmap(struct page *page, bool compound)
> {
> if (compound) {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4cfa0679661e2..d17bb53b19ff2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5669,7 +5669,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> make_pte_marker(PTE_MARKER_UFFD_WP),
> sz);
> hugetlb_count_sub(pages_per_huge_page(h), mm);
> - page_remove_rmap(page, vma, true);
> + hugetlb_remove_rmap(page_folio(page));
>
> spin_unlock(ptl);
> tlb_remove_page_size(tlb, page, huge_page_size(h));
> @@ -5980,7 +5980,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>
> /* Break COW or unshare */
> huge_ptep_clear_flush(vma, haddr, ptep);
> - page_remove_rmap(&old_folio->page, vma, true);
> + hugetlb_remove_rmap(old_folio);
> hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
> if (huge_pte_uffd_wp(pte))
> newpte = huge_pte_mkuffd_wp(newpte);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 112467c30b2c9..5037581b79ec6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1440,13 +1440,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> - /* Hugetlb pages are not counted in NR_*MAPPED */
> - if (unlikely(folio_test_hugetlb(folio))) {
> - /* hugetlb pages are always mapped with pmds */
> - atomic_dec(&folio->_entire_mapcount);
> - return;
> - }
> -
> /* Is page being unmapped by PTE? Is this its last map to be removed? */
> if (likely(!compound)) {
> last = atomic_add_negative(-1, &page->_mapcount);
> @@ -1804,7 +1797,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> dec_mm_counter(mm, mm_counter_file(&folio->page));
> }
> discard:
> - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
> + if (unlikely(folio_test_hugetlb(folio)))
> + hugetlb_remove_rmap(folio);
> + else
> + page_remove_rmap(subpage, vma, false);
> if (vma->vm_flags & VM_LOCKED)
> mlock_drain_local();
> folio_put(folio);
> @@ -2157,7 +2153,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> */
> }
>
> - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
> + if (unlikely(folio_test_hugetlb(folio)))
> + hugetlb_remove_rmap(folio);
> + else
> + page_remove_rmap(subpage, vma, false);
> if (vma->vm_flags & VM_LOCKED)
> mlock_drain_local();
> folio_put(folio);

2023-12-06 01:26:08

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 04/39] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()



On 12/4/23 22:21, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Note that is_device_private_page() does not apply to hugetlb.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Yin Fengwei <[email protected]>

> ---
> include/linux/mm.h | 12 +++++++++---
> include/linux/rmap.h | 15 +++++++++++++++
> mm/hugetlb.c | 3 +--
> 3 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 418d26608ece7..24c1c7c5a99c0 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1953,15 +1953,21 @@ static inline bool page_maybe_dma_pinned(struct page *page)
> *
> * The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
> */
> -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
> - struct page *page)
> +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
> + struct folio *folio)
> {
> VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));
>
> if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
> return false;
>
> - return page_maybe_dma_pinned(page);
> + return folio_maybe_dma_pinned(folio);
> +}
> +
> +static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
> + struct page *page)
> +{
> + return folio_needs_cow_for_dma(vma, page_folio(page));
> }
>
> /**
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 0a81e8420a961..8068c332e2ce5 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -208,6 +208,21 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +/* See page_try_dup_anon_rmap() */
> +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> + struct vm_area_struct *vma)
> +{
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> +
> + if (PageAnonExclusive(&folio->page)) {
> + if (unlikely(folio_needs_cow_for_dma(vma, folio)))
> + return -EBUSY;
> + ClearPageAnonExclusive(&folio->page);
> + }
> + atomic_inc(&folio->_entire_mapcount);
> + return 0;
> +}
> +
> static inline void hugetlb_add_file_rmap(struct folio *folio)
> {
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 541a8f38cfdc7..d927f8b2893c0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5402,8 +5402,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> */
> if (!folio_test_anon(pte_folio)) {
> hugetlb_add_file_rmap(pte_folio);
> - } else if (page_try_dup_anon_rmap(&pte_folio->page,
> - true, src_vma)) {
> + } else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
> pte_t src_pte_old = entry;
> struct folio *new_folio;
>

2023-12-06 01:26:18

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 06/39] mm/rmap: add hugetlb sanity checks



On 12/4/23 22:21, David Hildenbrand wrote:
> Let's make sure we end up with the right folios in the right functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Yin Fengwei <[email protected]>

> ---
> include/linux/rmap.h | 7 +++++++
> mm/rmap.c | 6 ++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 3f38141b53b9d..77e336f86c72d 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -212,6 +212,7 @@ void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> struct vm_area_struct *vma)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> if (PageAnonExclusive(&folio->page)) {
> @@ -226,6 +227,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> /* See page_try_share_anon_rmap() */
> static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
>
> @@ -245,6 +247,7 @@ static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
>
> static inline void hugetlb_add_file_rmap(struct folio *folio)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
>
> atomic_inc(&folio->_entire_mapcount);
> @@ -252,11 +255,15 @@ static inline void hugetlb_add_file_rmap(struct folio *folio)
>
> static inline void hugetlb_remove_rmap(struct folio *folio)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> +
> atomic_dec(&folio->_entire_mapcount);
> }
>
> static inline void __page_dup_rmap(struct page *page, bool compound)
> {
> + VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
> +
> if (compound) {
> struct folio *folio = (struct folio *)page;
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2f1af3958e687..a735ecca47a81 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1313,6 +1313,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> {
> int nr;
>
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> __folio_set_swapbacked(folio);
>
> @@ -1353,6 +1354,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> unsigned int nr_pmdmapped = 0, first;
> int nr = 0;
>
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
>
> /* Is page being mapped by PTE? Is this its first map to be added? */
> @@ -1438,6 +1440,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> bool last;
> enum node_stat_item idx;
>
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> /* Is page being unmapped by PTE? Is this its last map to be removed? */
> @@ -2590,6 +2593,7 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
> void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> unsigned long address, rmap_t flags)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> atomic_inc(&folio->_entire_mapcount);
> @@ -2602,6 +2606,8 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> void hugetlb_add_new_anon_rmap(struct folio *folio,
> struct vm_area_struct *vma, unsigned long address)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> +
> BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> /* increment count (starts at -1) */
> atomic_set(&folio->_entire_mapcount, 0);

2023-12-06 01:33:13

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()



On 12/4/23 22:21, David Hildenbrand wrote:
> Let's get rid of the compound parameter and instead define implicitly
> which mappings we're adding. That is more future proof, easier to read
> and harder to mess up.
>
> Use an enum to express the granularity internally. Make the compiler
> always special-case on the granularity by using __always_inline.
>
> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
> folio_test_pmd_mappable() check by a config check in the caller and
> sanity checks. Convert the single user of folio_add_file_rmap_range().
>
> This function design can later easily be extended to PUDs and to batch
> PMDs. Note that for now we don't support anything bigger than
> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
> will catch if that ever changes.
I do have a question about folios larger than PMD size in the future:
will such folio sizes only be exactly PMD size/PUD size, or could they fall
between PMD size and PUD size?

If a size between PMD and PUD size is possible, will the mapping be a mix of
PMD mappings and PTE mappings, or PTE mappings only? I suppose it could be
mixed, for the efficiency of the page walker.

It may just be too early to consider this now.

Regards
Yin, Fengwei

>
> Next up is removing page_remove_rmap() along with its "compound"
> parameter and similarly converting all other rmap functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> include/linux/rmap.h | 47 +++++++++++++++++++++++++++--
> mm/memory.c | 2 +-
> mm/rmap.c | 72 ++++++++++++++++++++++++++++----------------
> 3 files changed, 92 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 77e336f86c72d..a4a30c361ac50 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -186,6 +186,45 @@ typedef int __bitwise rmap_t;
> */
> #define RMAP_COMPOUND ((__force rmap_t)BIT(1))
>
> +/*
> + * Internally, we're using an enum to specify the granularity. Usually,
> + * we make the compiler create specialized variants for the different
> + * granularity.
> + */
> +enum rmap_mode {
> + RMAP_MODE_PTE = 0,
> + RMAP_MODE_PMD,
> +};
> +
> +static inline void __folio_rmap_sanity_checks(struct folio *folio,
> + struct page *page, unsigned int nr_pages, enum rmap_mode mode)
> +{
> + /* hugetlb folios are handled separately. */
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + !folio_test_large_rmappable(folio), folio);
> +
> + VM_WARN_ON_ONCE(!nr_pages || nr_pages > folio_nr_pages(folio));
> + VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
> + VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
> +
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + break;
> + case RMAP_MODE_PMD:
> + /*
> + * We don't support folios larger than a single PMD yet. So
> + * when RMAP_MODE_PMD is set, we assume that we are creating
> + * a single "entire" mapping of the folio.
> + */
> + VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
> + VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
> + break;
> + default:
> + VM_WARN_ON_ONCE(true);
> + }
> +}
> +
> /*
> * rmap interfaces called when adding or removing pte of page
> */
> @@ -198,8 +237,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
> void page_add_file_rmap(struct page *, struct vm_area_struct *,
> bool compound);
> -void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> - struct vm_area_struct *, bool compound);
> +void folio_add_file_rmap_ptes(struct folio *, struct page *, unsigned int nr,
> + struct vm_area_struct *);
> +#define folio_add_file_rmap_pte(folio, page, vma) \
> + folio_add_file_rmap_ptes(folio, page, 1, vma)
> +void folio_add_file_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *);
> void page_remove_rmap(struct page *, struct vm_area_struct *,
> bool compound);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 1f18ed4a54971..15325587cff01 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4414,7 +4414,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> folio_add_lru_vma(folio, vma);
> } else {
> add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> - folio_add_file_rmap_range(folio, page, nr, vma, false);
> + folio_add_file_rmap_ptes(folio, page, nr, vma);
> }
> set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a735ecca47a81..1614d98062948 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1334,31 +1334,19 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> SetPageAnonExclusive(&folio->page);
> }
>
> -/**
> - * folio_add_file_rmap_range - add pte mapping to page range of a folio
> - * @folio: The folio to add the mapping to
> - * @page: The first page to add
> - * @nr_pages: The number of pages which will be mapped
> - * @vma: the vm area in which the mapping is added
> - * @compound: charge the page as compound or small page
> - *
> - * The page range of folio is defined by [first_page, first_page + nr_pages)
> - *
> - * The caller needs to hold the pte lock.
> - */
> -void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> - unsigned int nr_pages, struct vm_area_struct *vma,
> - bool compound)
> +static __always_inline void __folio_add_file_rmap(struct folio *folio,
> + struct page *page, unsigned int nr_pages,
> + struct vm_area_struct *vma, enum rmap_mode mode)
> {
> atomic_t *mapped = &folio->_nr_pages_mapped;
> unsigned int nr_pmdmapped = 0, first;
> int nr = 0;
>
> - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> - VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>
> /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> + if (likely(mode == RMAP_MODE_PTE)) {
> do {
> first = atomic_inc_and_test(&page->_mapcount);
> if (first && folio_test_large(folio)) {
> @@ -1369,9 +1357,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> if (first)
> nr++;
> } while (page++, --nr_pages > 0);
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> -
> + } else if (mode == RMAP_MODE_PMD) {
> first = atomic_inc_and_test(&folio->_entire_mapcount);
> if (first) {
> nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> @@ -1399,6 +1385,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages that will be mapped using PTEs
> + * @vma: The vm area in which the mappings are added
> + *
> + * The page range of the folio is defined by [page, page + nr_pages)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
> + unsigned int nr_pages, struct vm_area_struct *vma)
> +{
> + __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + *
> + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * page_add_file_rmap - add pte mapping to a file page
> * @page: the page to add the mapping to
> @@ -1411,16 +1434,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> bool compound)
> {
> struct folio *folio = page_folio(page);
> - unsigned int nr_pages;
>
> VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>
> if (likely(!compound))
> - nr_pages = 1;
> + folio_add_file_rmap_pte(folio, page, vma);
> else
> - nr_pages = folio_nr_pages(folio);
> -
> - folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
> + folio_add_file_rmap_pmd(folio, page, vma);
> }
>
> /**

2023-12-06 02:11:25

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 05/39] mm/rmap: introduce and use hugetlb_try_share_anon_rmap()



On 12/4/23 22:21, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> So let's introduce and use hugetlb_try_share_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Note that try_to_unmap_one() does not need care. Easy to spot because
> among all that nasty hugetlb special-casing in that function, we're not
> using set_huge_pte_at() on the anon path -- well, and that code assumes
> that we would want to swap out.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Yin Fengwei <[email protected]>

> ---
> include/linux/rmap.h | 20 ++++++++++++++++++++
> mm/rmap.c | 15 ++++++++++-----
> 2 files changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 8068c332e2ce5..3f38141b53b9d 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -223,6 +223,26 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> return 0;
> }
>
> +/* See page_try_share_anon_rmap() */
> +static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
> +{
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> + VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
> +
> + /* See page_try_share_anon_rmap() */
> + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
> + smp_mb();
> +
> + if (unlikely(folio_maybe_dma_pinned(folio)))
> + return -EBUSY;
> + ClearPageAnonExclusive(&folio->page);
> +
> + /* See page_try_share_anon_rmap() */
> + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
> + smp_mb__after_atomic();
> + return 0;
> +}
> +
> static inline void hugetlb_add_file_rmap(struct folio *folio)
> {
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 5037581b79ec6..2f1af3958e687 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2105,13 +2105,18 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> !anon_exclusive, subpage);
>
> /* See page_try_share_anon_rmap(): clear PTE first. */
> - if (anon_exclusive &&
> - page_try_share_anon_rmap(subpage)) {
> - if (folio_test_hugetlb(folio))
> + if (folio_test_hugetlb(folio)) {
> + if (anon_exclusive &&
> + hugetlb_try_share_anon_rmap(folio)) {
> set_huge_pte_at(mm, address, pvmw.pte,
> pteval, hsz);
> - else
> - set_pte_at(mm, address, pvmw.pte, pteval);
> + ret = false;
> + page_vma_mapped_walk_done(&pvmw);
> + break;
> + }
> + } else if (anon_exclusive &&
> + page_try_share_anon_rmap(page)) {
> + set_pte_at(mm, address, pvmw.pte, pteval);
> ret = false;
> page_vma_mapped_walk_done(&pvmw);
> break;

2023-12-06 09:18:00

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

On 06.12.23 02:30, Yin Fengwei wrote:
>
>
> On 12/4/23 22:21, David Hildenbrand wrote:
>> Let's get rid of the compound parameter and instead define implicitly
>> which mappings we're adding. That is more future proof, easier to read
>> and harder to mess up.
>>
>> Use an enum to express the granularity internally. Make the compiler
>> always special-case on the granularity by using __always_inline.
>>
>> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
>> folio_test_pmd_mappable() check by a config check in the caller and
>> sanity checks. Convert the single user of folio_add_file_rmap_range().
>>
>> This function design can later easily be extended to PUDs and to batch
>> PMDs. Note that for now we don't support anything bigger than
>> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
>> will catch if that ever changes.
> I do have a question about folios larger than PMD size in the future:
> will such folio sizes only be exactly PMD size/PUD size, or could they fall
> between PMD size and PUD size?

I strongly assume that we'll see in the future folios larger than a
single PMD (for example, 4 MiB on x86-64).

This will require quite some care in other areas (and this series, as it
converts some PMD handling functions to folios, further prepares for that).

>
> If a size between PMD and PUD size is possible, will the mapping be a mix of
> PMD mappings and PTE mappings, or PTE mappings only? I suppose it could be
> mixed, for the efficiency of the page walker.

Depending on the alignment with which such larger folios are mapped into
the page tables, and some other factors, we might indeed end up with parts
of the folio mapped by PMDs and parts by PTEs. Well, and once we involve
PUDs we might have a mixture of all of these :)

The current API here will be able to deal with that (excluding the _pud
variant). To improve performance, we might want PMD batching and have
_pmds functions.
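
As a rough sketch of what that could look like with the interface proposed
here -- purely hypothetical, since larger-than-PMD folios are not supported
yet and the current sanity checks would reject this; folio_page() and
HPAGE_PMD_NR are used as in the existing code:

        /*
         * Hypothetical 4 MiB file folio on x86-64: first half mapped by a
         * PMD, second half mapped by PTEs. Illustration only, not valid
         * with today's checks.
         */
        folio_add_file_rmap_pmd(folio, folio_page(folio, 0), vma);
        folio_add_file_rmap_ptes(folio, folio_page(folio, HPAGE_PMD_NR),
                                 HPAGE_PMD_NR, vma);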

We'll have to tweak the rmap internals to do the rmap accounting
properly then (and the sanity checks will catch any of that and
highlight the need for rmap-internal extensions); maybe by the time we get
there, we will no longer have these subpage mapcounts, but we'll have to
see if/when/how that happens.

>
> It may just be too early to consider this now.
>

I had that in mind while working on this. I assume it will take some
more time to handle everything else that needs to be prepared for that,
but the rmap interface should be able to cope with it; only the internals
will have to be extended.


Thanks!

--
Cheers,

David / dhildenb

2023-12-06 12:21:08

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

On 06.12.23 02:22, Yin Fengwei wrote:
>
>
> On 12/4/23 22:21, David Hildenbrand wrote:
>> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
>> For example, hugetlb currently only supports entire mappings, and treats
>> any mapping as mapped using a single "logical PTE". Let's move it out
>> of the way so we can overhaul our "ordinary" rmap
>> implementation/interface.
>>
>> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
>> code from page_remove_rmap(). This effectively removes one check on the
>> small-folio path as well.
>>
>> Note: all possible candidates that need care are page_remove_rmap() that
>> pass compound=true.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>
>

I suspect you wanted to place your RB tag here? :)

--
Cheers,

David / dhildenb

2023-12-07 00:59:38

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()



On 12/6/23 20:11, David Hildenbrand wrote:
> On 06.12.23 02:22, Yin Fengwei wrote:
>>
>>
>> On 12/4/23 22:21, David Hildenbrand wrote:
>>> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
>>> For example, hugetlb currently only supports entire mappings, and treats
>>> any mapping as mapped using a single "logical PTE". Let's move it out
>>> of the way so we can overhaul our "ordinary" rmap
>>> implementation/interface.
>>>
>>> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
>>> code from page_remove_rmap(). This effectively removes one check on the
>>> small-folio path as well.
>>>
>>> Note: all possible candidates that need care are page_remove_rmap() that
>>>        pass compound=true.
>>>
>>> Signed-off-by: David Hildenbrand <[email protected]>
>>
>>
>
> I suspect you wanted to place your RB tag here? :)
Oops. Yes. I meant my RB tag here.


Regards
Yin, Fengwei

2023-12-08 01:41:42

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 08/39] mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()



On 12/4/2023 10:21 PM, David Hildenbrand wrote:
> Let's convert insert_page_into_pte_locked() and do_set_pmd(). While at it,
> perform some folio conversion.
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

Yes, I made sure to add my RB tag this time. :)

Regards
Yin, Fengwei

> ---
> mm/memory.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 15325587cff01..be7fe58f7c297 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1845,12 +1845,14 @@ static int validate_page_before_insert(struct page *page)
> static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
> unsigned long addr, struct page *page, pgprot_t prot)
> {
> + struct folio *folio = page_folio(page);
> +
> if (!pte_none(ptep_get(pte)))
> return -EBUSY;
> /* Ok, finally just insert the thing.. */
> - get_page(page);
> + folio_get(folio);
> inc_mm_counter(vma->vm_mm, mm_counter_file(page));
> - page_add_file_rmap(page, vma, false);
> + folio_add_file_rmap_pte(folio, page, vma);
> set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
> return 0;
> }
> @@ -4308,6 +4310,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)
>
> vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> {
> + struct folio *folio = page_folio(page);
> struct vm_area_struct *vma = vmf->vma;
> bool write = vmf->flags & FAULT_FLAG_WRITE;
> unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> @@ -4317,8 +4320,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> if (!transhuge_vma_suitable(vma, haddr))
> return ret;
>
> - page = compound_head(page);
> - if (compound_order(page) != HPAGE_PMD_ORDER)
> + if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER)
> return ret;
>
> /*
> @@ -4327,7 +4329,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> * check. This kind of THP just can be PTE mapped. Access to
> * the corrupted subpage should trigger SIGBUS as expected.
> */
> - if (unlikely(PageHasHWPoisoned(page)))
> + if (unlikely(folio_test_has_hwpoisoned(folio)))
> return ret;
>
> /*
> @@ -4351,7 +4353,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>
> add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
> - page_add_file_rmap(page, vma, true);
> + folio_add_file_rmap_pmd(folio, page, vma);
>
> /*
> * deposit and withdraw with pmd lock held

2023-12-08 01:42:09

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 09/39] mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()



On 12/4/2023 10:21 PM, David Hildenbrand wrote:
> Let's convert remove_migration_pmd() and while at it, perform some folio
> conversion.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Yin Fengwei <[email protected]>

> ---
> mm/huge_memory.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4f542444a91f2..cb33c6e0404cf 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3276,6 +3276,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>
> void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> {
> + struct folio *folio = page_folio(new);
> struct vm_area_struct *vma = pvmw->vma;
> struct mm_struct *mm = vma->vm_mm;
> unsigned long address = pvmw->address;
> @@ -3287,7 +3288,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> return;
>
> entry = pmd_to_swp_entry(*pvmw->pmd);
> - get_page(new);
> + folio_get(folio);
> pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
> if (pmd_swp_soft_dirty(*pvmw->pmd))
> pmde = pmd_mksoft_dirty(pmde);
> @@ -3298,10 +3299,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> if (!is_migration_entry_young(entry))
> pmde = pmd_mkold(pmde);
> /* NOTE: this may contain setting soft-dirty on some archs */
> - if (PageDirty(new) && is_migration_entry_dirty(entry))
> + if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
> pmde = pmd_mkdirty(pmde);
>
> - if (PageAnon(new)) {
> + if (folio_test_anon(folio)) {
> rmap_t rmap_flags = RMAP_COMPOUND;
>
> if (!is_readable_migration_entry(entry))
> @@ -3309,9 +3310,9 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
>
> page_add_anon_rmap(new, vma, haddr, rmap_flags);
> } else {
> - page_add_file_rmap(new, vma, true);
> + folio_add_file_rmap_pmd(folio, new, vma);
> }
> - VM_BUG_ON(pmd_write(pmde) && PageAnon(new) && !PageAnonExclusive(new));
> + VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new));
> set_pmd_at(mm, haddr, pvmw->pmd, pmde);
>
> /* No need to invalidate - it was non-present before */

2023-12-08 01:42:30

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 10/39] mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()



On 12/4/2023 10:21 PM, David Hildenbrand wrote:
> Let's convert remove_migration_pte().
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

> ---
> mm/migrate.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index de9d94b99ab78..efc19f53b05e6 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -262,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio,
> page_add_anon_rmap(new, vma, pvmw.address,
> rmap_flags);
> else
> - page_add_file_rmap(new, vma, false);
> + folio_add_file_rmap_pte(folio, new, vma);
> set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
> }
> if (vma->vm_flags & VM_LOCKED)

2023-12-08 01:42:51

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 11/39] mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()



On 12/4/2023 10:21 PM, David Hildenbrand wrote:
> Let's convert mfill_atomic_install_pte().
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

> ---
> mm/userfaultfd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 0b6ca553bebec..abf4c579d328a 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -114,7 +114,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
> /* Usually, cache pages are already added to LRU */
> if (newly_allocated)
> folio_add_lru(folio);
> - page_add_file_rmap(page, dst_vma, false);
> + folio_add_file_rmap_pte(folio, page, dst_vma);
> } else {
> page_add_new_anon_rmap(page, dst_vma, dst_addr);
> folio_add_lru_vma(folio, dst_vma);

2023-12-08 01:43:21

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 12/39] mm/rmap: remove page_add_file_rmap()



On 12/4/2023 10:21 PM, David Hildenbrand wrote:
> All users are gone, let's remove it.
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

> ---
> include/linux/rmap.h | 2 --
> mm/rmap.c | 21 ---------------------
> 2 files changed, 23 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index a4a30c361ac50..95f7b94a70295 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -235,8 +235,6 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> unsigned long address);
> void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
> -void page_add_file_rmap(struct page *, struct vm_area_struct *,
> - bool compound);
> void folio_add_file_rmap_ptes(struct folio *, struct page *, unsigned int nr,
> struct vm_area_struct *);
> #define folio_add_file_rmap_pte(folio, page, vma) \
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 1614d98062948..53e2c653be99a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1422,27 +1422,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
> #endif
> }
>
> -/**
> - * page_add_file_rmap - add pte mapping to a file page
> - * @page: the page to add the mapping to
> - * @vma: the vm area in which the mapping is added
> - * @compound: charge the page as compound or small page
> - *
> - * The caller needs to hold the pte lock.
> - */
> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> - bool compound)
> -{
> - struct folio *folio = page_folio(page);
> -
> - VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
> -
> - if (likely(!compound))
> - folio_add_file_rmap_pte(folio, page, vma);
> - else
> - folio_add_file_rmap_pmd(folio, page, vma);
> -}
> -
> /**
> * page_remove_rmap - take down pte mapping from a page
> * @page: page to remove mapping from

2023-12-08 01:44:42

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH RFC 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()



On 12/4/2023 10:21 PM, David Hildenbrand wrote:
> Let's factor it out to prepare for reuse as we convert
> page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().
>
> Make the compiler always special-case on the granularity by using
> __always_inline.
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

> ---
> mm/rmap.c | 75 +++++++++++++++++++++++++++++++------------------------
> 1 file changed, 42 insertions(+), 33 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 53e2c653be99a..c09b360402599 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1127,6 +1127,46 @@ int folio_total_mapcount(struct folio *folio)
> return mapcount;
> }
>
> +static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
> + struct page *page, unsigned int nr_pages, enum rmap_mode mode,
> + int *nr_pmdmapped)
> +{
> + atomic_t *mapped = &folio->_nr_pages_mapped;
> + int first, nr = 0;
> +
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
> +
> + /* Is page being mapped by PTE? Is this its first map to be added? */
> + if (likely(mode == RMAP_MODE_PTE)) {
> + do {
> + first = atomic_inc_and_test(&page->_mapcount);
> + if (first && folio_test_large(folio)) {
> + first = atomic_inc_return_relaxed(mapped);
> + first = (first < COMPOUND_MAPPED);
> + }
> +
> + if (first)
> + nr++;
> + } while (page++, --nr_pages > 0);
> + } else if (mode == RMAP_MODE_PMD) {
> + first = atomic_inc_and_test(&folio->_entire_mapcount);
> + if (first) {
> + nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> + if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> + *nr_pmdmapped = folio_nr_pages(folio);
> + nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> + /* Raced ahead of a remove and another add? */
> + if (unlikely(nr < 0))
> + nr = 0;
> + } else {
> + /* Raced ahead of a remove of COMPOUND_MAPPED */
> + nr = 0;
> + }
> + }
> + }
> + return nr;
> +}
> +
> /**
> * folio_move_anon_rmap - move a folio to our anon_vma
> * @folio: The folio to move to our anon_vma
> @@ -1338,42 +1378,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
> struct page *page, unsigned int nr_pages,
> struct vm_area_struct *vma, enum rmap_mode mode)
> {
> - atomic_t *mapped = &folio->_nr_pages_mapped;
> - unsigned int nr_pmdmapped = 0, first;
> - int nr = 0;
> + unsigned int nr, nr_pmdmapped = 0;
>
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> - __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
> -
> - /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(mode == RMAP_MODE_PTE)) {
> - do {
> - first = atomic_inc_and_test(&page->_mapcount);
> - if (first && folio_test_large(folio)) {
> - first = atomic_inc_return_relaxed(mapped);
> - first = (first < COMPOUND_MAPPED);
> - }
> -
> - if (first)
> - nr++;
> - } while (page++, --nr_pages > 0);
> - } else if (mode == RMAP_MODE_PMD) {
> - first = atomic_inc_and_test(&folio->_entire_mapcount);
> - if (first) {
> - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> - nr_pmdmapped = folio_nr_pages(folio);
> - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> - /* Raced ahead of a remove and another add? */
> - if (unlikely(nr < 0))
> - nr = 0;
> - } else {
> - /* Raced ahead of a remove of COMPOUND_MAPPED */
> - nr = 0;
> - }
> - }
> - }
>
> + nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
> if (nr_pmdmapped)
> __lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
> NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);

2023-12-08 11:24:34

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul

On 05.12.23 14:49, Ryan Roberts wrote:
> On 05/12/2023 13:39, David Hildenbrand wrote:
>> On 05.12.23 14:31, Ryan Roberts wrote:
>>> On 05/12/2023 09:56, David Hildenbrand wrote:
>>>>>>
>>>>>> Ryan has series where we would make use of folio_remove_rmap_ptes() [1]
>>>>>> -- he carries his own batching variant right now -- and
>>>>>> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].
>>>>>
>>>>> Note that the contpte series at [2] has a new patch in v3 (patch 2), which could
>>>>> benefit from folio_remove_rmap_ptes() or equivalent. My plan was to revive [1]
>>>>> on top of [2] once it is merged.
>>>>>
>>>>>>
>>>>>> There is some overlap with both series (and some other work, like
>>>>>> multi-size THP [3]), so that will need some coordination, and likely a
>>>>>> stepwise inclusion.
>>>>>
>>>>> Selfishly, I'd really like to get my stuff merged as soon as there is no
>>>>> technical reason not to. I'd prefer not to add this as a dependency if we can
>>>>> help it.
>>>>
>>>> It's easy to rework either series on top of each other. The mTHP series has
>>>> highest priority,
>>>> no question, that will go in first.
>>>
>>> Music to my ears! It would be great to either get a reviewed-by or feedback on
>>> why not, for the key 2 patches in that series (3 & 4) and also your opinion on
>>> whether we need to wait for compaction to land (see cover letter). It would be
>>> great to get this into linux-next ASAP IMHO.
>>
>> On it :)
>>
>>>
>>>>
>>>> Regarding the contpte, I think it needs more work. Especially, as raised, to
>>>> not degrade order-0 performance. Maybe we won't make the next merge window
>>>> (and you already predicted that in some cover letter :P ). Let's see.
>>>
>>> Yeah that's ok. I'll do the work to fix the order-0 perf. And also do the same
>>> for patch 2 in that series - would also be really helpful if you had a chance to
>>> look at patch 2 - it's new for v3.
>>
>> I only skimmed over it, but it seems to go in the direction we'll need.
>> Keeping order-0 performance unharmed should have highest priority. Hopefully my
>> microbenchmarks are helpful.
>
> Yes absolutely - are you able to share them??
>
>>
>>>
>>>>
>>>> But again, the conflicts are all trivial, so I'll happily rebase on top of
>>>> whatever is
>>>> in mm-unstable. Or move the relevant rework to the front so you can just carry
>>>> them/base on them. (the batched variants for dup do make the contpte code much
>>>> easier)
>>>
>>> So perhaps we should aim for mTHP, then this, then contpte last, benefiting from
>>> the batching.
>>
>> Yeah. And again, I don't care too much if I have to rebase on top of your work
>> if this here takes longer. It's all a fairly trivial conversion.
>>
>>>>
>>>> [...]
>>>>
>>>>>>
>>>>>>
>>>>>> New (extended) hugetlb interface that operate on entire folio:
>>>>>>    * hugetlb_add_new_anon_rmap() -> Already existed
>>>>>>    * hugetlb_add_anon_rmap() -> Already existed
>>>>>>    * hugetlb_try_dup_anon_rmap()
>>>>>>    * hugetlb_try_share_anon_rmap()
>>>>>>    * hugetlb_add_file_rmap()
>>>>>>    * hugetlb_remove_rmap()
>>>>>>
>>>>>> New "ordinary" interface for small folios / THP::
>>>>>>    * folio_add_new_anon_rmap() -> Already existed
>>>>>>    * folio_add_anon_rmap_[pte|ptes|pmd]()
>>>>>>    * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>>>>>>    * folio_try_share_anon_rmap_[pte|pmd]()
>>>>>>    * folio_add_file_rmap_[pte|ptes|pmd]()
>>>>>>    * folio_dup_file_rmap_[pte|ptes|pmd]()
>>>>>>    * folio_remove_rmap_[pte|ptes|pmd]()
>>>>>
>>>>> I'm not sure if there are official guidelines, but personally if we are
>>>>> reworking the API, I'd take the opportunity to move "rmap" to the front of the
>>>>> name, rather than having it buried in the middle as it is for some of these:
>>>>>
>>>>> rmap_hugetlb_*()
>>>>>
>>>>> rmap_folio_*()
>>>>
>>>> No strong opinion. But we might want slightly different names then. For example,
>>>> it's "bio_add_folio" and not "bio_folio_add":
>>>>
>>>>
>>>> rmap_add_new_anon_hugetlb()
>>>> rmap_add_anon_hugetlb()
>>>> ...
>>>> rmap_remove_hugetlb()
>>>>
>>>>
>>>> rmap_add_new_anon_folio()
>>>> rmap_add_anon_folio_[pte|ptes|pmd]()
>>>> ...
>>>> rmap_dup_file_folio_[pte|ptes|pmd]()
>>>> rmap_remove_folio_[pte|ptes|pmd]()
>>>>
>>>> Thoughts?
>>>
>>> Having now reviewed your series, I have a less strong opinion, perhaps it's
>>> actually best with your original names; "folio" is actually the subject after
>>> all; it's the thing being operated on.

So far I have stuck to the original names used in this RFC. I'm testing a
new series that is based on current mm-unstable (notably including mTHP) and
contains all the changes discussed here.

If I don't hear anything else, I'll send that out as v1 on Monday.

Thanks!

--
Cheers,

David / dhildenb

2023-12-08 11:38:54

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul

On 08/12/2023 11:24, David Hildenbrand wrote:
> On 05.12.23 14:49, Ryan Roberts wrote:
>> On 05/12/2023 13:39, David Hildenbrand wrote:
>>> On 05.12.23 14:31, Ryan Roberts wrote:
>>>> On 05/12/2023 09:56, David Hildenbrand wrote:
>>>>>>>
>>>>>>> Ryan has series where we would make use of folio_remove_rmap_ptes() [1]
>>>>>>> -- he carries his own batching variant right now -- and
>>>>>>> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].
>>>>>>
>>>>>> Note that the contpte series at [2] has a new patch in v3 (patch 2),
>>>>>> which could benefit from folio_remove_rmap_ptes() or equivalent. My plan
>>>>>> was to revive [1] on top of [2] once it is merged.
>>>>>>
>>>>>>>
>>>>>>> There is some overlap with both series (and some other work, like
>>>>>>> multi-size THP [3]), so that will need some coordination, and likely a
>>>>>>> stepwise inclusion.
>>>>>>
>>>>>> Selfishly, I'd really like to get my stuff merged as soon as there is no
>>>>>> technical reason not to. I'd prefer not to add this as a dependency if we can
>>>>>> help it.
>>>>>
>>>>> It's easy to rework either series on top of each other. The mTHP series
>>>>> has highest priority, no question, that will go in first.
>>>>
>>>> Music to my ears! It would be great to either get a reviewed-by or feedback on
>>>> why not, for the key 2 patches in that series (3 & 4) and also your opinion on
>>>> whether we need to wait for compaction to land (see cover letter). It would be
>>>> great to get this into linux-next ASAP IMHO.
>>>
>>> On it :)
>>>
>>>>
>>>>>
>>>>> Regarding the contpte, I think it needs more work. Especially, as raised,
>>>>> to not degrade order-0 performance. Maybe we won't make the next merge
>>>>> window (and you already predicted that in some cover letter :P ). Let's see.
>>>>
>>>> Yeah that's ok. I'll do the work to fix the order-0 perf. And also do the same
>>>> for patch 2 in that series - would also be really helpful if you had a
>>>> chance to look at patch 2 - it's new for v3.
>>>
>>> I only skimmed over it, but it seems to go into the direction we'll need.
>>> Keeping order-0 performance unharmed should have highest priority. Hopefully my
>>> microbenchmarks are helpful.
>>
>> Yes absolutely - are you able to share them??
>>
>>>
>>>>
>>>>>
>>>>> But again, the conflicts are all trivial, so I'll happily rebase on top
>>>>> of whatever is in mm-unstable. Or move the relevant rework to the front so
>>>>> you can just carry them/base on them. (the batched variants for dup do
>>>>> make the contpte code much easier)
>>>>
>>>> So perhaps we should aim for mTHP, then this, then contpte last,
>>>> benefiting from the batching.
>>>
>>> Yeah. And again, I don't care too much if I have to rebase on top of your work
>>> if this here takes longer. It's all a fairly trivial conversion.
>>>
>>>>>
>>>>> [...]
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> New (extended) hugetlb interface that operate on entire folio:
>>>>>>>     * hugetlb_add_new_anon_rmap() -> Already existed
>>>>>>>     * hugetlb_add_anon_rmap() -> Already existed
>>>>>>>     * hugetlb_try_dup_anon_rmap()
>>>>>>>     * hugetlb_try_share_anon_rmap()
>>>>>>>     * hugetlb_add_file_rmap()
>>>>>>>     * hugetlb_remove_rmap()
>>>>>>>
>>>>>>> New "ordinary" interface for small folios / THP::
>>>>>>>     * folio_add_new_anon_rmap() -> Already existed
>>>>>>>     * folio_add_anon_rmap_[pte|ptes|pmd]()
>>>>>>>     * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>>>>>>>     * folio_try_share_anon_rmap_[pte|pmd]()
>>>>>>>     * folio_add_file_rmap_[pte|ptes|pmd]()
>>>>>>>     * folio_dup_file_rmap_[pte|ptes|pmd]()
>>>>>>>     * folio_remove_rmap_[pte|ptes|pmd]()
>>>>>>
>>>>>> I'm not sure if there are official guidelines, but personally if we are
>>>>>> reworking the API, I'd take the opportunity to move "rmap" to the front
>>>>>> of the name, rather than having it buried in the middle as it is for some
>>>>>> of these:
>>>>>>
>>>>>> rmap_hugetlb_*()
>>>>>>
>>>>>> rmap_folio_*()
>>>>>
>>>>> No strong opinion. But we might want slightly different names then. For
>>>>> example, it's "bio_add_folio" and not "bio_folio_add":
>>>>>
>>>>>
>>>>> rmap_add_new_anon_hugetlb()
>>>>> rmap_add_anon_hugetlb()
>>>>> ...
>>>>> rmap_remove_hugetlb()
>>>>>
>>>>>
>>>>> rmap_add_new_anon_folio()
>>>>> rmap_add_anon_folio_[pte|ptes|pmd]()
>>>>> ...
>>>>> rmap_dup_file_folio_[pte|ptes|pmd]()
>>>>> rmap_remove_folio_[pte|ptes|pmd]()
>>>>>
>>>>> Thoughts?
>>>>
>>>> Having now reviewed your series, I have a less strong opinion, perhaps it's
>>>> actually best with your original names; "folio" is actually the subject after
>>>> all; it's the thing being operated on.
>
> So far I have stuck to the original names used in this RFC. I'm testing a new
> series that is based on current mm-unstable (notably including mTHP) and
> contains all the changes discussed here.
>
> If I don't hear anything else, I'll send that out as v1 on Monday.

Gets my vote!

>
> Thanks!
>