2023-11-28 14:52:22

by David Hildenbrand

Subject: [PATCH v1 0/5] mm/rmap: separate hugetlb rmap handling

Let's just cleanly separate hugetlb rmap handling from "ordinary" rmap
handling, like we already do when adding anon hugetlb folios. We have
hugetlb special-casing/checks in the callers in all cases either way
already in place: it doesn't make too much sense to call generic-looking
functions that end up doing hugetlb specific things from hugetlb
special-cases.

This now also means that we won't run into "folio_test_pmd_mappable()"
cases for hugetlb, which doesn't make too much sense with gigantic hugetlb
folios that are in fact currently only PUD-mappable. Or having a
folio_add_file_rmap() function that looks like it could be used by hugetlb
code, but really can't.
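
To illustrate the direction (a sketch only, mirroring the diffs in this
series rather than adding anything new), hugetlb call sites move from the
generic helpers to dedicated hugetlb_* ones:

        /* Before: generic rmap helpers, with hugetlb special-casing inside. */
        page_dup_file_rmap(&folio->page, true);
        page_remove_rmap(&folio->page, vma, true);

        /* After: dedicated hugetlb helpers that operate on the folio's
         * _entire_mapcount. */
        hugetlb_add_file_rmap(folio);
        hugetlb_remove_rmap(folio);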

This is a standalone cleanup, but it gets more relevant when adding more
rmap batching (we cannot map parts of a hugetlb folio) or changing the way
we handle partially-mappable folios as in [1], whereby we'd have to add
more hugetlb special casing to keep hugetlb working as is.

If ever something about hugetlb changes that makes them actually
partially-mappable folios, we can look into cleanly merging all code
paths, not just some.

[1] https://lkml.kernel.org/r/[email protected]

Cc: Andrew Morton <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>

David Hildenbrand (5):
mm/rmap: rename hugepage_add* to hugetlb_add*
mm/rmap: introduce and use hugetlb_remove_rmap()
mm/rmap: introduce and use hugetlb_add_file_rmap()
mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
mm/rmap: add hugetlb sanity checks

include/linux/mm.h | 12 +++++++++---
include/linux/rmap.h | 37 +++++++++++++++++++++++++++++++++++--
mm/hugetlb.c | 21 ++++++++++-----------
mm/migrate.c | 6 +++---
mm/rmap.c | 31 ++++++++++++++++++-------------
5 files changed, 75 insertions(+), 32 deletions(-)


base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
--
2.41.0


2023-11-28 14:52:41

by David Hildenbrand

Subject: [PATCH v1 3/5] mm/rmap: introduce and use hugetlb_add_file_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code, and
we already have dedicated functions for adding anon hugetlb folios and
removing hugetlb folios.

Right now we're using page_dup_file_rmap() in some cases where "ordinary"
rmap code would have used page_add_file_rmap(). So let's introduce and
use hugetlb_add_file_rmap() instead. We won't be adding a
"hugetlb_dup_file_rmap()" functon for the fork() case, as it would be
doing the same: "dup" is just an optimization for "add".
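
For a hugetlb folio, both calls boil down to the same entire-mapcount
increment; a rough sketch (the page_dup_file_rmap() side is paraphrased,
not quoted verbatim):

        /* Both end up incrementing folio->_entire_mapcount for hugetlb: */
        page_dup_file_rmap(&folio->page, true);   /* via __page_dup_rmap(), compound */
        hugetlb_add_file_rmap(folio);             /* new helper, plus a sanity check */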

While this is a cleanup, this will also make it easier to change rmap
handling for partially-mappable folios.

What remains is a single page_dup_file_rmap() call in fork() code.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 7 +++++++
mm/hugetlb.c | 6 +++---
mm/migrate.c | 2 +-
3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e8d1dc1d5361..0a81e8420a96 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -208,6 +208,13 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_add_file_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
+
+ atomic_inc(&folio->_entire_mapcount);
+}
+
static inline void hugetlb_remove_rmap(struct folio *folio)
{
atomic_dec(&folio->_entire_mapcount);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d17bb53b19ff..541a8f38cfdc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5401,7 +5401,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
* sleep during the process.
*/
if (!folio_test_anon(pte_folio)) {
- page_dup_file_rmap(&pte_folio->page, true);
+ hugetlb_add_file_rmap(pte_folio);
} else if (page_try_dup_anon_rmap(&pte_folio->page,
true, src_vma)) {
pte_t src_pte_old = entry;
@@ -6272,7 +6272,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
if (anon_rmap)
hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
&& (vma->vm_flags & VM_SHARED)));
/*
@@ -6723,7 +6723,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
goto out_release_unlock;

if (folio_in_pagecache)
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
else
hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

diff --git a/mm/migrate.c b/mm/migrate.c
index 4cb849fa0dd2..de9d94b99ab7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio,
hugetlb_add_anon_rmap(folio, vma, pvmw.address,
rmap_flags);
else
- page_dup_file_rmap(new, true);
+ hugetlb_add_file_rmap(folio);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
psize);
} else
--
2.41.0

2023-11-28 14:52:42

by David Hildenbrand

Subject: [PATCH v1 1/5] mm/rmap: rename hugepage_add* to hugetlb_add*

Let's just call it "hugetlb_".

Yes, it's all already inconsistent and confusing because we have a lot
of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
be confused with transparent huge pages, and it matches "hugetlb.c" and
"folio_test_hugetlb()". So let's minimize confusion in rmap code.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 4 ++--
mm/hugetlb.c | 8 ++++----
mm/migrate.c | 4 ++--
mm/rmap.c | 8 ++++----
4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b26fe858fd44..4c5bfeb05463 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -203,9 +203,9 @@ void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

-void hugepage_add_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
-void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

static inline void __page_dup_rmap(struct page *page, bool compound)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1169ef2f2176..4cfa0679661e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5278,7 +5278,7 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);

__folio_mark_uptodate(new_folio);
- hugepage_add_new_anon_rmap(new_folio, vma, addr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, addr);
if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
@@ -5981,7 +5981,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
page_remove_rmap(&old_folio->page, vma, true);
- hugepage_add_new_anon_rmap(new_folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(mm, haddr, ptep, newpte, huge_page_size(h));
@@ -6270,7 +6270,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
goto backout;

if (anon_rmap)
- hugepage_add_new_anon_rmap(folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
page_dup_file_rmap(&folio->page, true);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
@@ -6725,7 +6725,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
if (folio_in_pagecache)
page_dup_file_rmap(&folio->page, true);
else
- hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
+ hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

/*
* For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
diff --git a/mm/migrate.c b/mm/migrate.c
index 35a88334bb3c..4cb849fa0dd2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -249,8 +249,8 @@ static bool remove_migration_pte(struct folio *folio,

pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
if (folio_test_anon(folio))
- hugepage_add_anon_rmap(folio, vma, pvmw.address,
- rmap_flags);
+ hugetlb_add_anon_rmap(folio, vma, pvmw.address,
+ rmap_flags);
else
page_dup_file_rmap(new, true);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
diff --git a/mm/rmap.c b/mm/rmap.c
index 7a27a2b41802..112467c30b2c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2583,8 +2583,8 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
*
* RMAP_COMPOUND is ignored.
*/
-void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
- unsigned long address, rmap_t flags)
+void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
+ unsigned long address, rmap_t flags)
{
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

@@ -2595,8 +2595,8 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
PageAnonExclusive(&folio->page), folio);
}

-void hugepage_add_new_anon_rmap(struct folio *folio,
- struct vm_area_struct *vma, unsigned long address)
+void hugetlb_add_new_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma, unsigned long address)
{
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
--
2.41.0

2023-11-28 14:53:12

by David Hildenbrand

Subject: [PATCH v1 5/5] mm/rmap: add hugetlb sanity checks

Let's make sure we end up with the right folios in the right functions.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 6 ++++++
mm/rmap.c | 6 ++++++
2 files changed, 12 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8068c332e2ce..9625b6551d01 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -212,6 +212,7 @@ void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

if (PageAnonExclusive(&folio->page)) {
@@ -225,6 +226,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,

static inline void hugetlb_add_file_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);

atomic_inc(&folio->_entire_mapcount);
@@ -232,11 +234,15 @@ static inline void hugetlb_add_file_rmap(struct folio *folio)

static inline void hugetlb_remove_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
atomic_dec(&folio->_entire_mapcount);
}

static inline void __page_dup_rmap(struct page *page, bool compound)
{
+ VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+
if (compound) {
struct folio *folio = (struct folio *)page;

diff --git a/mm/rmap.c b/mm/rmap.c
index 5037581b79ec..466f1ea5d0a6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1313,6 +1313,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
{
int nr;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
__folio_set_swapbacked(folio);

@@ -1353,6 +1354,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
unsigned int nr_pmdmapped = 0, first;
int nr = 0;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);

/* Is page being mapped by PTE? Is this its first map to be added? */
@@ -1438,6 +1440,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
bool last;
enum node_stat_item idx;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_PAGE(compound && !PageHead(page), page);

/* Is page being unmapped by PTE? Is this its last map to be removed? */
@@ -2585,6 +2588,7 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

atomic_inc(&folio->_entire_mapcount);
@@ -2597,6 +2601,8 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
void hugetlb_add_new_anon_rmap(struct folio *folio,
struct vm_area_struct *vma, unsigned long address)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
--
2.41.0

2023-11-28 14:53:15

by David Hildenbrand

Subject: [PATCH v1 4/5] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code, and
we already have dedicated functions for adding anon hugetlb folios and
removing hugetlb folios.

So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

While this is a cleanup, this will also make it easier to change rmap
handling for partially-mappable folios.

Note that is_device_private_page() does not apply to hugetlb, so the
corresponding check in page_try_dup_anon_rmap() is not needed here.
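
For context, this means the device-private guard in the generic helper has
no hugetlb counterpart; roughly (the page_try_dup_anon_rmap() side is
paraphrased, not a verbatim copy):

        /* Generic helper: device-private pages skip the COW-for-DMA check. */
        if (likely(!is_device_private_page(page)) &&
            unlikely(page_needs_cow_for_dma(vma, page)))
                return -EBUSY;

        /* hugetlb variant: there are no device-private hugetlb folios, so simply: */
        if (unlikely(folio_needs_cow_for_dma(vma, folio)))
                return -EBUSY;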

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 12 +++++++++---
include/linux/rmap.h | 15 +++++++++++++++
mm/hugetlb.c | 3 +--
3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece..24c1c7c5a99c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1953,15 +1953,21 @@ static inline bool page_maybe_dma_pinned(struct page *page)
*
* The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
*/
-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
+static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct folio *folio)
{
VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));

if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
return false;

- return page_maybe_dma_pinned(page);
+ return folio_maybe_dma_pinned(folio);
+}
+
+static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct page *page)
+{
+ return folio_needs_cow_for_dma(vma, page_folio(page));
}

/**
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0a81e8420a96..8068c332e2ce 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -208,6 +208,21 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+/* See page_try_dup_anon_rmap() */
+static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma)
+{
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+
+ if (PageAnonExclusive(&folio->page)) {
+ if (unlikely(folio_needs_cow_for_dma(vma, folio)))
+ return -EBUSY;
+ ClearPageAnonExclusive(&folio->page);
+ }
+ atomic_inc(&folio->_entire_mapcount);
+ return 0;
+}
+
static inline void hugetlb_add_file_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 541a8f38cfdc..d927f8b2893c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5402,8 +5402,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
if (!folio_test_anon(pte_folio)) {
hugetlb_add_file_rmap(pte_folio);
- } else if (page_try_dup_anon_rmap(&pte_folio->page,
- true, src_vma)) {
+ } else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
pte_t src_pte_old = entry;
struct folio *new_folio;

--
2.41.0

2023-11-28 14:53:18

by David Hildenbrand

Subject: [PATCH v1 2/5] mm/rmap: introduce and use hugetlb_remove_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code. We
don't want this hugetlb special-casing in the rmap functions, as
we're special casing the callers already. Let's simply use a separate
function for hugetlb.

Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
code from page_remove_rmap(). This effectively removes one check on the
small-folio path as well.

While this is a cleanup, this will also make it easier to change rmap
handling for partially-mappable folios.

Note: all possible candidates that need care are page_remove_rmap() calls
that pass compound=true.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 5 +++++
mm/hugetlb.c | 4 ++--
mm/rmap.c | 17 ++++++++---------
3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 4c5bfeb05463..e8d1dc1d5361 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -208,6 +208,11 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_remove_rmap(struct folio *folio)
+{
+ atomic_dec(&folio->_entire_mapcount);
+}
+
static inline void __page_dup_rmap(struct page *page, bool compound)
{
if (compound) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4cfa0679661e..d17bb53b19ff 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5669,7 +5669,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
make_pte_marker(PTE_MARKER_UFFD_WP),
sz);
hugetlb_count_sub(pages_per_huge_page(h), mm);
- page_remove_rmap(page, vma, true);
+ hugetlb_remove_rmap(page_folio(page));

spin_unlock(ptl);
tlb_remove_page_size(tlb, page, huge_page_size(h));
@@ -5980,7 +5980,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,

/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
- page_remove_rmap(&old_folio->page, vma, true);
+ hugetlb_remove_rmap(old_folio);
hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
diff --git a/mm/rmap.c b/mm/rmap.c
index 112467c30b2c..5037581b79ec 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1440,13 +1440,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,

VM_BUG_ON_PAGE(compound && !PageHead(page), page);

- /* Hugetlb pages are not counted in NR_*MAPPED */
- if (unlikely(folio_test_hugetlb(folio))) {
- /* hugetlb pages are always mapped with pmds */
- atomic_dec(&folio->_entire_mapcount);
- return;
- }
-
/* Is page being unmapped by PTE? Is this its last map to be removed? */
if (likely(!compound)) {
last = atomic_add_negative(-1, &page->_mapcount);
@@ -1804,7 +1797,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
dec_mm_counter(mm, mm_counter_file(&folio->page));
}
discard:
- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -2157,7 +2153,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
*/
}

- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
--
2.41.0

2023-11-28 16:08:54

by Peter Xu

Subject: Re: [PATCH v1 2/5] mm/rmap: introduce and use hugetlb_remove_rmap()

On Tue, Nov 28, 2023 at 03:52:02PM +0100, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code. We
> don't want this hugetlb special-casing in the rmap functions, as
> we're special casing the callers already. Let's simply use a separate
> function for hugetlb.

We were special casing some, not all..

And IIUC the suggestion from the community is we reduce that "special
casing", instead of adding more? To be explicit below..

>
> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
> code from page_remove_rmap(). This effectively removes one check on the
> small-folio path as well.
>
> While this is a cleanup, this will also make it easier to change rmap
> handling for partially-mappable folios.
>
> Note: all possible candidates that need care are page_remove_rmap() calls
> that pass compound=true.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> include/linux/rmap.h | 5 +++++
> mm/hugetlb.c | 4 ++--
> mm/rmap.c | 17 ++++++++---------
> 3 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 4c5bfeb05463..e8d1dc1d5361 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -208,6 +208,11 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +static inline void hugetlb_remove_rmap(struct folio *folio)
> +{
> + atomic_dec(&folio->_entire_mapcount);
> +}
> +
> static inline void __page_dup_rmap(struct page *page, bool compound)
> {
> if (compound) {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4cfa0679661e..d17bb53b19ff 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5669,7 +5669,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> make_pte_marker(PTE_MARKER_UFFD_WP),
> sz);
> hugetlb_count_sub(pages_per_huge_page(h), mm);
> - page_remove_rmap(page, vma, true);
> + hugetlb_remove_rmap(page_folio(page));
>
> spin_unlock(ptl);
> tlb_remove_page_size(tlb, page, huge_page_size(h));
> @@ -5980,7 +5980,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>
> /* Break COW or unshare */
> huge_ptep_clear_flush(vma, haddr, ptep);
> - page_remove_rmap(&old_folio->page, vma, true);
> + hugetlb_remove_rmap(old_folio);
> hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
> if (huge_pte_uffd_wp(pte))
> newpte = huge_pte_mkuffd_wp(newpte);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 112467c30b2c..5037581b79ec 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1440,13 +1440,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> - /* Hugetlb pages are not counted in NR_*MAPPED */
> - if (unlikely(folio_test_hugetlb(folio))) {
> - /* hugetlb pages are always mapped with pmds */
> - atomic_dec(&folio->_entire_mapcount);
> - return;
> - }

Fundamentally, in the ideal world where hugetlb can be merged into generic
mm, I'd imagine this simply being dropped here; meanwhile...

> -
> /* Is page being unmapped by PTE? Is this its last map to be removed? */
> if (likely(!compound)) {
> last = atomic_add_negative(-1, &page->_mapcount);
> @@ -1804,7 +1797,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> dec_mm_counter(mm, mm_counter_file(&folio->page));
> }
> discard:
> - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
> + if (unlikely(folio_test_hugetlb(folio)))
> + hugetlb_remove_rmap(folio);
> + else
> + page_remove_rmap(subpage, vma, false);

... rather than moving hugetlb_* handling even further up the stack, we should
hopefully be able to keep this one as a generic api.

I worry this patch is adding more hugetlb-specific paths even higher up the
call stack, so it becomes harder to generalize, going the other way round
from what we wanted per previous discussions.

That said, I have indeed read mostly nothing of the recent rmap patches yet,
so I may be missing some important goal here.

> if (vma->vm_flags & VM_LOCKED)
> mlock_drain_local();
> folio_put(folio);
> @@ -2157,7 +2153,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> */
> }
>
> - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
> + if (unlikely(folio_test_hugetlb(folio)))
> + hugetlb_remove_rmap(folio);
> + else
> + page_remove_rmap(subpage, vma, false);
> if (vma->vm_flags & VM_LOCKED)
> mlock_drain_local();
> folio_put(folio);
> --
> 2.41.0
>
>

Thanks,

--
Peter Xu

2023-11-28 16:39:57

by David Hildenbrand

Subject: Re: [PATCH v1 2/5] mm/rmap: introduce and use hugetlb_remove_rmap()

On 28.11.23 17:08, Peter Xu wrote:
> On Tue, Nov 28, 2023 at 03:52:02PM +0100, David Hildenbrand wrote:
>> hugetlb rmap handling differs quite a lot from "ordinary" rmap code. We
>> don't want this hugetlb special-casing in the rmap functions, as
>> we're special casing the callers already. Let's simply use a separate
>> function for hugetlb.
>
> We were special casing some, not all..
>
> And IIUC the suggestion from the community is we reduce that "special
> casing", instead of adding more? To be explicit below..

Quoting from the cover letter:

"We have hugetlb special-casing/checks in the callers in all cases
either way already in place: it doesn't make too much sense to call
generic-looking functions that end up doing hugetlb specific things from
hugetlb special-cases."

[...]

>> +++ b/mm/rmap.c
>> @@ -1440,13 +1440,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>>
>> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>>
>> - /* Hugetlb pages are not counted in NR_*MAPPED */
>> - if (unlikely(folio_test_hugetlb(folio))) {
>> - /* hugetlb pages are always mapped with pmds */
>> - atomic_dec(&folio->_entire_mapcount);
>> - return;
>> - }
>
> Fundamentally, in the ideal world where hugetlb can be merged into generic
> mm, I'd imagine this simply being dropped here; meanwhile...
>

Quoting from the cover letter:

"If ever something about hugetlb changes that makes them actually
partially-mappable folios, we can look into cleanly merging all code
paths, not just some."

>> -
>> /* Is page being unmapped by PTE? Is this its last map to be removed? */
>> if (likely(!compound)) {
>> last = atomic_add_negative(-1, &page->_mapcount);
>> @@ -1804,7 +1797,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>> dec_mm_counter(mm, mm_counter_file(&folio->page));
>> }
>> discard:
>> - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
>> + if (unlikely(folio_test_hugetlb(folio)))
>> + hugetlb_remove_rmap(folio);
>> + else
>> + page_remove_rmap(subpage, vma, false);
>
> ... rather than moving hugetlb_* handling even further up the stack, we should
> hopefully be able to keep this one as a generic api.
>
> I worry this patch is adding more hugetlb-specific paths even higher up the
> call stack, so it becomes harder to generalize, going the other way round
> from what we wanted per previous discussions.
>
> That said, I have indeed read mostly nothing of the recent rmap patches yet,
> so I may be missing some important goal here.

Quoting from the cover letter:

"This is a stanalone cleanup, but it gets more relevant when adding more
rmap batching (we cannot map parts of a hugetlb folio) or changing the
way we handle partially-mappable folios as in [1], whereby we'd have to
add more hugetlb special casing to keep hugetlb working as is."


Thanks for the review. If you have strong feelings, please add an
explicit nack. Otherwise, I'll move forward with this.

--
Cheers,

David / dhildenb

2023-11-28 17:14:11

by Peter Xu

Subject: Re: [PATCH v1 2/5] mm/rmap: introduce and use hugetlb_remove_rmap()

On Tue, Nov 28, 2023 at 05:39:35PM +0100, David Hildenbrand wrote:
> Quoting from the cover letter:
>
> "We have hugetlb special-casing/checks in the callers in all cases either
> way already in place: it doesn't make too much sense to call generic-looking
> functions that end up doing hugetlb specific things from hugetlb
> special-cases."

I'll take this one as an example: I think one goal (of my understanding of
the mm community) is to make the generic looking functions keep being
generic, dropping any function named as "*hugetlb*" if possible one day
within that generic implementation. I said that in my previous reply.

Having that "*hugetlb*" code already in the code base may or may not be a
good reason to move it further up the stack.

Strong feelings? No, I don't have any. I'm not knowledgeable enough for that.

Thanks,

--
Peter Xu

2023-11-28 17:43:11

by David Hildenbrand

Subject: Re: [PATCH v1 2/5] mm/rmap: introduce and use hugetlb_remove_rmap()

On 28.11.23 18:13, Peter Xu wrote:
> On Tue, Nov 28, 2023 at 05:39:35PM +0100, David Hildenbrand wrote:
>> Quoting from the cover letter:
>>
>> "We have hugetlb special-casing/checks in the callers in all cases either
>> way already in place: it doesn't make too much sense to call generic-looking
>> functions that end up doing hugetlb specific things from hugetlb
>> special-cases."
>
> I'll take this one as an example: I think one goal (of my understanding of
> the mm community) is to make the generic looking functions keep being
> generic, dropping any function named as "*hugetlb*" if possible one day
> within that generic implementation. I said that in my previous reply.

Yes, and I am one of the people asking for that. However, only where it
makes sense (e.g., like page table walking, GUP as you said), and only
when it is actually unified.

I don't think that rmap handling or fault handling will ever be
completely unified to that extreme, and it might also not be desirable.
Just like we have separate paths for anon and file in areas where they
are reasonably different.

What doesn't make sense is using patterns like:

page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));

and then, inside page_remove_rmap() have an initial folio_test_hugetlb()
check that does something completely different.

So each and everyone calling page_remove_rmap (and knowing that it's
certainly not a hugetlb folio) has to run through that check.

Then, we have functions like page_add_file_rmap() that look like they
can be used for hugetlb, but hugetlb is smart enough and only calls
page_dup_file_rmap(), because it doesn't want to touch any file/anon
counters. And to handle that we would have to add folio_test_hugetlb()
inside there, which adds the same as above, trying to unify without
unifying.
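
I.e., a "unified" page_add_file_rmap() would have to grow an early return
along the lines of the check that patch #2 removes from page_remove_rmap();
a hypothetical sketch, not something this series adds:

        if (unlikely(folio_test_hugetlb(folio))) {
                /* no file/anon counters, only the entire mapcount */
                atomic_inc(&folio->_entire_mapcount);
                return;
        }
        /* ... ordinary file rmap accounting ... */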

Once we're in the area of folio_add_file_rmap_range(), it all stops
making sense, because there is no way we could possibly partially map a
folio today. (and if we can in the future, we might still want separate
handling, because most callers know which pages they are dealing with; see below)

Last but not least, it's all inconsistent right now with
hugetlb_add_anon_rmap()/hugetlb_add_new_anon_rmap() being there because
they differ reasonably well from the "ordinary" counterparts.

>
> Having that "*hugetlb*" code already in the code base may or may not be a
> good reason to move it further up the stack.

If you see a path forward in the foreseeable future where we would have
code that doesn't special-case hugetlb in rmap calling code already, I'd
be interested in that.

hugetlb.c knows that it's dealing with hugetlb pages.

huge_memory.c knows that it's dealing with PMD-mapped thp.

memory.c knows that it's dealing with PTE-mapped thp or small folios.

Only migrate.c (and e.g., try_to_unmap()) in rmap.c handle different
types. But there is plenty of hugetlb special-casing in there that I
don't really see going away.

>
> Strong feelings? No, I don't have any. I'm not knowledgeable enough for that.

I'm sure you are, so I'm trusting your judgment :)

I don't think going in the other direction and e.g., removing
hugetlb_add_anon_rmap / hugetlb_add_new_anon_rmap would be a
reasonable unification. It will only make things slower and the
individual functions more complicated.

--
Cheers,

David / dhildenb

2023-11-28 19:48:29

by Peter Xu

Subject: Re: [PATCH v1 2/5] mm/rmap: introduce and use hugetlb_remove_rmap()

On Tue, Nov 28, 2023 at 06:42:44PM +0100, David Hildenbrand wrote:
> On 28.11.23 18:13, Peter Xu wrote:
> > On Tue, Nov 28, 2023 at 05:39:35PM +0100, David Hildenbrand wrote:
> > > Quoting from the cover letter:
> > >
> > > "We have hugetlb special-casing/checks in the callers in all cases either
> > > way already in place: it doesn't make too much sense to call generic-looking
> > > functions that end up doing hugetlb specific things from hugetlb
> > > special-cases."
> >
> > I'll take this one as an example: I think one goal (of my understanding of
> > the mm community) is to make the generic looking functions keep being
> > generic, dropping any function named as "*hugetlb*" if possible one day
> > within that generic implementation. I said that in my previous reply.
>
> Yes, and I am one of the people asking for that. However, only where it
> makes sense (e.g., like page table walking, GUP as you said), and only when
> it is actually unified.
>
> I don't think that rmap handling or fault handling will ever be completely
> unified to that extreme, and it might also not be desirable. Just like we
> have separate paths for anon and file in areas where they are reasonably
> different.

Yes, I haven't checked further in that direction; that'll come after the
pgtable work for me, if that can first move on smoothly. It'll also depend on
whether merging the pgtable changes would be good enough so that we can
move on to consider small mappings for hugetlb; until then we need to
settle a final mapcount solution for hugetlb.

>
> What doesn't make sense is using patterns like:
>
> page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
>
> and then, inside page_remove_rmap() have an initial folio_test_hugetlb()
> check that does something completely different.

IIUC above "folio_test_hugetlb(folio)" pattern can become "false" for
hugetlb, if we decided to do mapcount for small hugetlb mappings. If that
happens, I think something like this patch _may_ need to be reverted again
more or less. Or we start to copy some of page_remove_rmap() into the new
hugetlb rmap api.

>
> So each and everyone calling page_remove_rmap (and knowing that it's
> certainly not a hugetlb folio) has to run through that check.

Note that right after this change is applied, hugetlb already starts to lose
something in common with generic compound folios, where page_remove_rmap()
had:

VM_BUG_ON_PAGE(compound && !PageHead(page), page);

That sanity check goes away immediately for hugetlb, which is still
logically applicable.

>
> Then, we have functions like page_add_file_rmap() that look like they can be
> used for hugetlb, but hugetlb is smart enough and only calls
> page_dup_file_rmap(), because it doesn't want to touch any file/anon
> counters. And to handle that we would have to add folio_test_hugetlb()
> inside there, which adds the same as above, trying to unify without
> unifying.
>
> Once we're in the area of folio_add_file_rmap_range(), it all stops making
> sense, because there is no way we could possibly partially map a folio
> today. (and if we can in the future, we might still want separate handling,
> because most callers know which pages they are dealing with; see below)
>
> Last but not least, it's all inconsistent right now with
> hugetlb_add_anon_rmap()/hugetlb_add_new_anon_rmap() being there because they
> differ reasonably well from the "ordinary" counterparts.

Taking hugepage_add_new_anon_rmap() as an example, IMHO they still share a lot
of things with the !hugetlb code, and maybe they can already be cleaned up
into something common for a large mapping:

void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
                             unsigned long address)
{
        int nr;

        VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);

        if (folio_is_hugetlb(folio)) {
                folio_clear_hugetlb_restore_reserve(folio);
        } else {
                __folio_set_swapbacked(folio);
                atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
                nr = folio_nr_pages(folio);
                __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
                __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
        }

        /* increment count (starts at -1) */
        atomic_set(&folio->_entire_mapcount, 0);
        __folio_set_anon(folio, vma, address, true);
        SetPageAnonExclusive(&folio->page);
}

For folio_add_file_rmap_range(): would it work if we just pass the hugetlb
folio range into it? Would that make it much more difficult with the
recent work on large folios from you or anyone?

> I don't think going in the other direction and e.g., removing
> hugetlb_add_anon_rmap / hugetlb_add_new_anon_rmap would be a reasonable
> unification. It will only make things slower and the
> individual functions more complicated.

IIUC so far the performance difference should be minimal, whichever helper we
use.

As I mentioned, I sincerely don't know whether this patch is good or not, as
I don't know enough about everything that is happening around it. It's just
that I'll still think twice before creating a separate rmap API for hugetlb,
because from a high level it's going the other way round to me. So I still
want to raise this as a pure question.

Thanks,

--
Peter Xu

2023-11-28 20:33:00

by David Hildenbrand

Subject: Re: [PATCH v1 2/5] mm/rmap: introduce and use hugetlb_remove_rmap()

>> What doesn't make sense is using patterns like:
>>
>> page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
>>
>> and then, inside page_remove_rmap() have an initial folio_test_hugetlb()
>> check that does something completely different.
>
> IIUC above "folio_test_hugetlb(folio)" pattern can become "false" for
> hugetlb, if we decided to do mapcount for small hugetlb mappings. If that
> happens, I think something like this patch _may_ need to be reverted again
> more or less. Or we start to copy some of page_remove_rmap() into the new
> hugetlb rmap api.

Probably just extend it to a hugetlb rmap api if you want to do some
batching. But who knows what that will actually look like. The
migrate.c/rmap.c special casing for hugetlb when migrating/unmapping is
quite severe already and this is just the tip of the iceberg.

Further, the question is whether you really want to deal with subpage
mapcounts like the !hugetlb variant does. Quite likely you don't want to
do that if you can avoid it and just have a total mapcount.
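
To make that contrast concrete (simplified from the helpers shown earlier
in this thread, nothing new):

        /* hugetlb: a single total count for the whole folio. */
        atomic_dec(&folio->_entire_mapcount);

        /* Generic PTE path (from page_remove_rmap()): per-subpage counts,
         * with _nr_pages_mapped bookkeeping on top. */
        last = atomic_add_negative(-1, &page->_mapcount);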

>
>>
>> So each and everyone calling page_remove_rmap (and knowing that it's
>> certainly not a hugetlb folio) has to run through that check.
>
> Note that right after this change is applied, hugetlb already starts to lose
> something in common with generic compound folios, where page_remove_rmap()
> had:
>
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> That sanity check goes away immediately for hugetlb, which is still
> logically applicable.

Why? We're passing folios that the caller already properly prepared.

A folio by definition points to the head page.

>
>>
>> Then, we have functions like page_add_file_rmap() that look like they can be
>> used for hugetlb, but hugetlb is smart enough and only calls
>> page_dup_file_rmap(), because it doesn't want to touch any file/anon
>> counters. And to handle that we would have to add folio_test_hugetlb()
>> inside there, which adds the same as above, trying to unify without
>> unifying.
>>
>> Once we're in the area of folio_add_file_rmap_range(), it all stops making
>> sense, because there is no way we could possibly partially map a folio
>> today. (and if we can in the future, we might still want separate handling,
>> because most callers know which pages they are dealing with; see below)
>>
>> Last but not least, it's all inconsistent right now with
>> hugetlb_add_anon_rmap()/hugetlb_add_new_anon_rmap() being there because they
>> differ reasonably well from the "ordinary" counterparts.
>
> Taking hugepage_add_new_anon_rmap() as an example, IMHO they still share a lot
> of things with the !hugetlb code, and maybe they can already be cleaned up
> into something common for a large mapping:
>
> void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> unsigned long address)
> {
> int nr;
>
> VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>
> if (folio_is_hugetlb(folio)) {
> folio_clear_hugetlb_restore_reserve(folio);
> } else {
> __folio_set_swapbacked(folio);
> atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
> nr = folio_nr_pages(folio);
> __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
> __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
> }
>
> /* increment count (starts at -1) */
> atomic_set(&folio->_entire_mapcount, 0);
> __folio_set_anon(folio, vma, address, true);
> SetPageAnonExclusive(&folio->page);
> }

Note that this function is about to be extended to be more !hugetlb-like in
Ryan's work (smaller THP).

>
> For folio_add_file_rmap_range(): would it work if we just pass the hugetlb
> folio range into it? Would that make it much more difficult with the
> recent work on large folios from you or anyone?

Sure we can, but that doesn't mean it's a good interface for hugetlb, which
doesn't support ranges right now and maybe never will. :)

>
>> I don't think going in the other direction and e.g., removing
>> hugetlb_add_anon_rmap / hugetlb_add_new_anon_rmap would be a reasonable
>> unification. It will only make things slower and the
>> individual functions more complicated.
>
> IIUC so far the performance difference should be minimal, whichever helper we
> use.

I've learned that even the likely in "if (likely(compound))" matters in
some rmap functions.

So I'm still against effectively adding more folio_is_hugetlb() checks
on hot code paths for the sake of "unification" that I don't really see
as "unification".

>
> As I mentioned, I sincerely don't know whether this patch is good or not, as
> I don't know enough about everything that is happening around it. It's just
> that I'll still think twice before creating a separate rmap API for hugetlb,
> because from a high level it's going the other way round to me. So I still
> want to raise this as a pure question.

I consider the end result here easier to read and easier to extend. If
there are good reasons to go a different way, I'm all ears.

--
Cheers,

David / dhildenb

2023-11-29 08:43:37

by Muchun Song

Subject: Re: [PATCH v1 1/5] mm/rmap: rename hugepage_add* to hugetlb_add*



> On Nov 28, 2023, at 22:52, David Hildenbrand <[email protected]> wrote:
>
> Let's just call it "hugetlb_".
>
> Yes, it's all already inconsistent and confusing because we have a lot
> of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
> be confused with transparent huge pages, and it matches "hugetlb.c" and
> "folio_test_hugetlb()". So let's minimize confusion in rmap code.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.