2023-11-04 03:56:13

by Kefeng Wang

Subject: [PATCH rfc v2 00/10] mm: convert mm counter to take a folio

Make all mm_counter() and mm_counter_file() callers use a folio,
then convert the mm counter functions to take a folio, which saves
a lot of compound_head() calls.

Kefeng Wang (10):
mm: swap: introduce pfn_swap_entry_to_folio()
s390: pgtable: use a folio in ptep_zap_swap_entry()
mm: huge_memory: use a folio in __split_huge_pmd_locked()
mm: huge_memory: use a folio in zap_huge_pmd()
mm: memory: use a folio in copy_nonpresent_pte()
mm: memory: use a folio in zap_pte_range()
mm: memory: use a folio in do_set_pmd()
mm: memory: use a folio in insert_page_into_pte_locked()
mm: convert mm_counter() to take a folio
mm: convert mm_counter_file() to take a folio

arch/s390/mm/pgtable.c | 4 +-
include/linux/mm.h | 12 +++---
include/linux/swapops.h | 13 +++++++
kernel/events/uprobes.c | 2 +-
mm/huge_memory.c | 34 +++++++++--------
mm/khugepaged.c | 4 +-
mm/memory.c | 81 +++++++++++++++++++++++------------------
mm/rmap.c | 10 ++---
mm/userfaultfd.c | 2 +-
9 files changed, 94 insertions(+), 68 deletions(-)

--
2.27.0
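
The saving described in the cover letter can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct page_model, struct folio_model, the head_lookups counter and every *_model() helper are hypothetical stand-ins and not the kernel's definitions. It shows that page-based flag tests resolve the head page on each call, while a folio is resolved once and then reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's struct page/struct folio. */
struct page_model {
	struct page_model *head;	/* head page; a head page points to itself */
	bool anon;			/* flag kept on the head page only */
	bool swapbacked;		/* flag kept on the head page only */
};

/* Hypothetical counter: how many times the head page was looked up. */
static unsigned long head_lookups;

static struct page_model *compound_head_model(struct page_model *page)
{
	assert(page && page->head);
	head_lookups++;
	return page->head;
}

/* Page-based helpers resolve the head page on every call. */
static bool page_anon_model(struct page_model *page)
{
	return compound_head_model(page)->anon;
}

static bool page_swapbacked_model(struct page_model *page)
{
	return compound_head_model(page)->swapbacked;
}

/* Folio-based helpers trust that the caller already holds the head. */
struct folio_model {
	struct page_model *head;
};

static struct folio_model page_folio_model(struct page_model *page)
{
	struct folio_model folio = { .head = compound_head_model(page) };

	return folio;
}

static bool folio_anon_model(struct folio_model folio)
{
	return folio.head->anon;
}

static bool folio_swapbacked_model(struct folio_model folio)
{
	return folio.head->swapbacked;
}

/* Two flag tests through a tail page: returns the number of head lookups. */
static unsigned long lookups_with_page(struct page_model *tail)
{
	head_lookups = 0;
	(void)page_anon_model(tail);
	(void)page_swapbacked_model(tail);
	return head_lookups;
}

/* The same two tests after converting to a folio once. */
static unsigned long lookups_with_folio(struct page_model *tail)
{
	struct folio_model folio;

	head_lookups = 0;
	folio = page_folio_model(tail);
	(void)folio_anon_model(folio);
	(void)folio_swapbacked_model(folio);
	return head_lookups;
}
```

In this model every additional flag test on a page costs one more head lookup, whereas the folio variant pays for the lookup once, in page_folio_model().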


2023-11-04 03:56:14

by Kefeng Wang

Subject: [PATCH v2 01/10] mm: swap: introduce pfn_swap_entry_to_folio()

Introduce a new pfn_swap_entry_to_folio(). It is similar to
pfn_swap_entry_to_page(), but returns a folio, which allows us
to completely replace the struct page variables with struct
folio variables.

Signed-off-by: Kefeng Wang <[email protected]>
---
include/linux/swapops.h | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index bff1e8d97de0..85cb84e4be95 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -468,6 +468,19 @@ static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
return p;
}

+static inline struct folio *pfn_swap_entry_to_folio(swp_entry_t entry)
+{
+ struct folio *folio = pfn_folio(swp_offset_pfn(entry));
+
+ /*
+ * Any use of migration entries may only occur while the
+ * corresponding folio is locked
+ */
+ BUG_ON(is_migration_entry(entry) && !folio_test_locked(folio));
+
+ return folio;
+}
+
/*
* A pfn swap entry is a special type of swap entry that always has a pfn stored
* in the swap offset. They are used to represent unaddressable device memory
--
2.27.0

2023-11-04 03:56:18

by Kefeng Wang

Subject: [PATCH v2 03/10] mm: huge_memory: use a folio in __split_huge_pmd_locked()

Use a folio in __split_huge_pmd_locked(), which replaces five
compound_head() calls with two page_folio() calls.

Signed-off-by: Kefeng Wang <[email protected]>
---
mm/huge_memory.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f31f02472396..34dd01970927 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2117,6 +2117,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
count_vm_event(THP_SPLIT_PMD);

if (!vma_is_anonymous(vma)) {
+ struct folio *folio;
old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
/*
* We are going to unmap this huge page. So
@@ -2130,17 +2131,17 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
swp_entry_t entry;

entry = pmd_to_swp_entry(old_pmd);
- page = pfn_swap_entry_to_page(entry);
+ folio = pfn_swap_entry_to_folio(entry);
} else {
- page = pmd_page(old_pmd);
- if (!PageDirty(page) && pmd_dirty(old_pmd))
- set_page_dirty(page);
- if (!PageReferenced(page) && pmd_young(old_pmd))
- SetPageReferenced(page);
- page_remove_rmap(page, vma, true);
- put_page(page);
+ folio = page_folio(pmd_page(old_pmd));
+ if (!folio_test_dirty(folio) && pmd_dirty(old_pmd))
+ folio_set_dirty(folio);
+ if (!folio_test_referenced(folio) && pmd_young(old_pmd))
+ folio_set_referenced(folio);
+ page_remove_rmap(&folio->page, vma, true);
+ folio_put(folio);
}
- add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
+ add_mm_counter(mm, mm_counter_file(&folio->page), -HPAGE_PMD_NR);
return;
}

--
2.27.0

2023-11-04 03:56:30

by Kefeng Wang

Subject: [PATCH v2 02/10] s390: pgtable: use a folio in ptep_zap_swap_entry()

Use a folio in ptep_zap_swap_entry(), which is preparation for
converting the mm counter functions to take a folio.

Signed-off-by: Kefeng Wang <[email protected]>
---
arch/s390/mm/pgtable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 3bd2ab2a9a34..2f946b493fff 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -730,9 +730,9 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
if (!non_swap_entry(entry))
dec_mm_counter(mm, MM_SWAPENTS);
else if (is_migration_entry(entry)) {
- struct page *page = pfn_swap_entry_to_page(entry);
+ struct folio *folio = pfn_swap_entry_to_folio(entry);

- dec_mm_counter(mm, mm_counter(page));
+ dec_mm_counter(mm, mm_counter(&folio->page));
}
free_swap_and_cache(entry);
}
--
2.27.0

2023-11-04 03:56:34

by Kefeng Wang

Subject: [PATCH v2 06/10] mm: memory: use a folio in zap_pte_range()

Make should_zap_page() take a folio and use a folio in
zap_pte_range(), which saves several compound_head() calls.

Signed-off-by: Kefeng Wang <[email protected]>
---
mm/memory.c | 45 +++++++++++++++++++++++++--------------------
1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index d9314dee355e..806568f9605b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1358,19 +1358,19 @@ static inline bool should_zap_cows(struct zap_details *details)
return details->even_cows;
}

-/* Decides whether we should zap this page with the page pointer specified */
-static inline bool should_zap_page(struct zap_details *details, struct page *page)
+/* Decides whether we should zap this folio with the folio pointer specified */
+static inline bool should_zap_page(struct zap_details *details, struct folio *folio)
{
- /* If we can make a decision without *page.. */
+ /* If we can make a decision without *folio.. */
if (should_zap_cows(details))
return true;

- /* E.g. the caller passes NULL for the case of a zero page */
- if (!page)
+ /* E.g. the caller passes NULL for the case of a zero folio */
+ if (!folio)
return true;

- /* Otherwise we should only zap non-anon pages */
- return !PageAnon(page);
+ /* Otherwise we should only zap non-anon folios */
+ return !folio_test_anon(folio);
}

static inline bool zap_drop_file_uffd_wp(struct zap_details *details)
@@ -1423,7 +1423,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
arch_enter_lazy_mmu_mode();
do {
pte_t ptent = ptep_get(pte);
- struct page *page;
+ struct folio *folio = NULL;

if (pte_none(ptent))
continue;
@@ -1433,9 +1433,13 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,

if (pte_present(ptent)) {
unsigned int delay_rmap;
+ struct page *page;

page = vm_normal_page(vma, addr, ptent);
- if (unlikely(!should_zap_page(details, page)))
+ if (page)
+ folio = page_folio(page);
+
+ if (unlikely(!should_zap_page(details, folio)))
continue;
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb->fullmm);
@@ -1449,16 +1453,16 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
}

delay_rmap = 0;
- if (!PageAnon(page)) {
+ if (!folio_test_anon(folio)) {
if (pte_dirty(ptent)) {
- set_page_dirty(page);
+ folio_set_dirty(folio);
if (tlb_delay_rmap(tlb)) {
delay_rmap = 1;
force_flush = 1;
}
}
if (pte_young(ptent) && likely(vma_has_recency(vma)))
- mark_page_accessed(page);
+ folio_mark_accessed(folio);
}
rss[mm_counter(page)]--;
if (!delay_rmap) {
@@ -1477,9 +1481,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
entry = pte_to_swp_entry(ptent);
if (is_device_private_entry(entry) ||
is_device_exclusive_entry(entry)) {
- page = pfn_swap_entry_to_page(entry);
- if (unlikely(!should_zap_page(details, page)))
+ folio = pfn_swap_entry_to_folio(entry);
+ if (unlikely(!should_zap_page(details, folio)))
continue;
+
/*
* Both device private/exclusive mappings should only
* work with anonymous page so far, so we don't need to
@@ -1487,10 +1492,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
* see zap_install_uffd_wp_if_needed().
*/
WARN_ON_ONCE(!vma_is_anonymous(vma));
- rss[mm_counter(page)]--;
+ rss[mm_counter(&folio->page)]--;
if (is_device_private_entry(entry))
- page_remove_rmap(page, vma, false);
- put_page(page);
+ page_remove_rmap(&folio->page, vma, false);
+ folio_put(folio);
} else if (!non_swap_entry(entry)) {
/* Genuine swap entry, hence a private anon page */
if (!should_zap_cows(details))
@@ -1499,10 +1504,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
if (unlikely(!free_swap_and_cache(entry)))
print_bad_pte(vma, addr, ptent, NULL);
} else if (is_migration_entry(entry)) {
- page = pfn_swap_entry_to_page(entry);
- if (!should_zap_page(details, page))
+ folio = pfn_swap_entry_to_folio(entry);
+ if (!should_zap_page(details, folio))
continue;
- rss[mm_counter(page)]--;
+ rss[mm_counter(&folio->page)]--;
} else if (pte_marker_entry_uffd_wp(entry)) {
/*
* For anon: always drop the marker; for file: only
--
2.27.0
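
The decision order in should_zap_page() after this patch can be sketched as a standalone userspace model. All types and *_model names below are hypothetical, the helper is named after the should_zap_folio() rename suggested in review, and the NULL check on details is an assumption about should_zap_cows(), whose body is only partly visible in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; not the kernel's zap_details or folio. */
struct zap_details_model {
	bool even_cows;
};

struct folio_model {
	bool anon;
};

static bool should_zap_cows_model(const struct zap_details_model *details)
{
	/* Assumption: without details, everything is zapped. */
	if (!details)
		return true;
	return details->even_cows;
}

static bool should_zap_folio_model(const struct zap_details_model *details,
				   const struct folio_model *folio)
{
	/* A decision that does not need the folio at all. */
	if (should_zap_cows_model(details))
		return true;

	/* E.g. the caller passes NULL when there is no normal page. */
	if (!folio)
		return true;

	/* Otherwise only non-anonymous folios are zapped. */
	return !folio->anon;
}

/* Usage example: the three outcomes when COWed pages are to be kept. */
void should_zap_examples(void)
{
	struct zap_details_model keep_cows = { .even_cows = false };
	struct folio_model anon = { .anon = true };
	struct folio_model file = { .anon = false };

	assert(!should_zap_folio_model(&keep_cows, &anon));
	assert(should_zap_folio_model(&keep_cows, &file));
	assert(should_zap_folio_model(&keep_cows, NULL));
}
```

Because a NULL folio is a valid input here, the caller in zap_pte_range() has to initialise its folio pointer to NULL and only convert with page_folio() when vm_normal_page() returned a page, as the hunk above does.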

2023-11-04 03:56:36

by Kefeng Wang

Subject: [PATCH v2 05/10] mm: memory: use a folio in copy_nonpresent_pte()

Use a folio in copy_nonpresent_pte() to save one compound_head() call.

Signed-off-by: Kefeng Wang <[email protected]>
---
mm/memory.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 1f18ed4a5497..d9314dee355e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -779,7 +779,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
unsigned long vm_flags = dst_vma->vm_flags;
pte_t orig_pte = ptep_get(src_pte);
pte_t pte = orig_pte;
- struct page *page;
+ struct folio *folio;
swp_entry_t entry = pte_to_swp_entry(orig_pte);

if (likely(!non_swap_entry(entry))) {
@@ -801,9 +801,9 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}
rss[MM_SWAPENTS]++;
} else if (is_migration_entry(entry)) {
- page = pfn_swap_entry_to_page(entry);
+ folio = pfn_swap_entry_to_folio(entry);

- rss[mm_counter(page)]++;
+ rss[mm_counter(&folio->page)]++;

if (!is_readable_migration_entry(entry) &&
is_cow_mapping(vm_flags)) {
@@ -822,7 +822,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
set_pte_at(src_mm, addr, src_pte, pte);
}
} else if (is_device_private_entry(entry)) {
- page = pfn_swap_entry_to_page(entry);
+ folio = pfn_swap_entry_to_folio(entry);

/*
* Update rss count even for unaddressable pages, as
@@ -833,10 +833,10 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* for unaddressable pages, at some point. But for now
* keep things as they are.
*/
- get_page(page);
- rss[mm_counter(page)]++;
+ folio_get(folio);
+ rss[mm_counter(&folio->page)]++;
/* Cannot fail as these pages cannot get pinned. */
- BUG_ON(page_try_dup_anon_rmap(page, false, src_vma));
+ BUG_ON(page_try_dup_anon_rmap(&folio->page, false, src_vma));

/*
* We do not preserve soft-dirty information, because so
--
2.27.0

2023-11-04 03:56:52

by Kefeng Wang

Subject: [PATCH v2 10/10] mm: convert mm_counter_file() to take a folio

Since all mm_counter_file() callers now have a folio, let's convert
mm_counter_file() to take a folio.

Signed-off-by: Kefeng Wang <[email protected]>
---
include/linux/mm.h | 8 ++++----
kernel/events/uprobes.c | 2 +-
mm/huge_memory.c | 5 +++--
mm/khugepaged.c | 4 ++--
mm/memory.c | 10 +++++-----
mm/rmap.c | 2 +-
6 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fea78900bf84..95573065a46b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2583,10 +2583,10 @@ static inline void dec_mm_counter(struct mm_struct *mm, int member)
mm_trace_rss_stat(mm, member);
}

-/* Optimized variant when page is already known not to be PageAnon */
-static inline int mm_counter_file(struct page *page)
+/* Optimized variant when folio is already known not to be anon */
+static inline int mm_counter_file(struct folio *folio)
{
- if (PageSwapBacked(page))
+ if (folio_test_swapbacked(folio))
return MM_SHMEMPAGES;
return MM_FILEPAGES;
}
@@ -2595,7 +2595,7 @@ static inline int mm_counter(struct folio *folio)
{
if (folio_test_anon(folio))
return MM_ANONPAGES;
- return mm_counter_file(&folio->page);
+ return mm_counter_file(folio);
}

static inline unsigned long get_mm_rss(struct mm_struct *mm)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 435aac1d8c27..ce251e3a4ae6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -188,7 +188,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
dec_mm_counter(mm, MM_ANONPAGES);

if (!folio_test_anon(old_folio)) {
- dec_mm_counter(mm, mm_counter_file(old_page));
+ dec_mm_counter(mm, mm_counter_file(old_folio));
inc_mm_counter(mm, MM_ANONPAGES);
}

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 78a00fe22c2d..88420d067477 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1742,7 +1742,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
} else {
if (arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
- add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR);
+ add_mm_counter(tlb->mm, mm_counter_file(folio),
+ -HPAGE_PMD_NR);
}

spin_unlock(ptl);
@@ -2143,7 +2144,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
page_remove_rmap(&folio->page, vma, true);
folio_put(folio);
}
- add_mm_counter(mm, mm_counter_file(&folio->page), -HPAGE_PMD_NR);
+ add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR);
return;
}

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 064654717843..39393f4262b2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1630,7 +1630,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
/* step 3: set proper refcount and mm_counters. */
if (nr_ptes) {
folio_ref_sub(folio, nr_ptes);
- add_mm_counter(mm, mm_counter_file(&folio->page), -nr_ptes);
+ add_mm_counter(mm, mm_counter_file(folio), -nr_ptes);
}

/* step 4: remove empty page table */
@@ -1661,7 +1661,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
if (nr_ptes) {
flush_tlb_mm(mm);
folio_ref_sub(folio, nr_ptes);
- add_mm_counter(mm, mm_counter_file(&folio->page), -nr_ptes);
+ add_mm_counter(mm, mm_counter_file(folio), -nr_ptes);
}
if (start_pte)
pte_unmap_unlock(start_pte, ptl);
diff --git a/mm/memory.c b/mm/memory.c
index ad30d4ad2223..3418ace5e0ad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -960,7 +960,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
} else if (page) {
folio_get(folio);
page_dup_file_rmap(page, false);
- rss[mm_counter_file(page)]++;
+ rss[mm_counter_file(folio)]++;
}

/*
@@ -1857,7 +1857,7 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
folio = page_folio(page);
/* Ok, finally just insert the thing.. */
folio_get(folio);
- inc_mm_counter(vma->vm_mm, mm_counter_file(page));
+ inc_mm_counter(vma->vm_mm, mm_counter_file(folio));
page_add_file_rmap(page, vma, false);
set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
return 0;
@@ -3166,7 +3166,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
if (likely(vmf->pte && pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
if (old_folio) {
if (!folio_test_anon(old_folio)) {
- dec_mm_counter(mm, mm_counter_file(&old_folio->page));
+ dec_mm_counter(mm, mm_counter_file(old_folio));
inc_mm_counter(mm, MM_ANONPAGES);
}
} else {
@@ -4359,7 +4359,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
if (write)
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);

- add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
+ add_mm_counter(vma->vm_mm, mm_counter_file(folio), HPAGE_PMD_NR);
page_add_file_rmap(page, vma, true);

/*
@@ -4422,7 +4422,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
folio_add_new_anon_rmap(folio, vma, addr);
folio_add_lru_vma(folio, vma);
} else {
- add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
+ add_mm_counter(vma->vm_mm, mm_counter_file(folio), nr);
folio_add_file_rmap_range(folio, page, nr, vma, false);
}
set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
diff --git a/mm/rmap.c b/mm/rmap.c
index 7a563490ce08..9e3d0eff8b05 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1801,7 +1801,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
*
* See Documentation/mm/mmu_notifier.rst
*/
- dec_mm_counter(mm, mm_counter_file(&folio->page));
+ dec_mm_counter(mm, mm_counter_file(folio));
}
discard:
page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
--
2.27.0
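
For readers following the series, the classification done by mm_counter() and mm_counter_file() once both take a folio can be modelled in userspace as follows. This is a sketch only: struct folio_model, the *_MODEL enumerators and account_mapping_model() are hypothetical names, and the enumerator order is assumed to mirror the kernel's counters rather than taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counter indices, assumed to mirror the kernel's order. */
enum {
	MM_FILEPAGES_MODEL,
	MM_ANONPAGES_MODEL,
	MM_SWAPENTS_MODEL,
	MM_SHMEMPAGES_MODEL,
	NR_MM_COUNTERS_MODEL
};

/* Userspace stand-in for the two folio flags the helpers look at. */
struct folio_model {
	bool anon;
	bool swapbacked;
};

/* Variant for a folio already known not to be anonymous. */
static int mm_counter_file_model(const struct folio_model *folio)
{
	assert(folio != NULL);
	if (folio->swapbacked)
		return MM_SHMEMPAGES_MODEL;
	return MM_FILEPAGES_MODEL;
}

/* General variant: the anon test comes first. */
static int mm_counter_model(const struct folio_model *folio)
{
	assert(folio != NULL);
	if (folio->anon)
		return MM_ANONPAGES_MODEL;
	return mm_counter_file_model(folio);
}

/* Adjust a per-mm style rss array by nr pages of the given folio. */
static void account_mapping_model(long *rss, const struct folio_model *folio,
				  long nr)
{
	rss[mm_counter_model(folio)] += nr;
}
```

The anon test has to run before the swap-backed test in this model, because an anonymous folio is normally swap-backed as well and would otherwise be counted as shmem.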

2023-11-04 03:56:56

by Kefeng Wang

Subject: [PATCH v2 07/10] mm: memory: use a folio in do_set_pmd()

Use a folio in do_set_pmd(), which is preparation for
converting the mm counter functions to take a folio.

Signed-off-by: Kefeng Wang <[email protected]>
---
mm/memory.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 806568f9605b..ac247850919a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4318,12 +4318,13 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
pmd_t entry;
vm_fault_t ret = VM_FAULT_FALLBACK;
+ struct folio *folio;

if (!transhuge_vma_suitable(vma, haddr))
return ret;

- page = compound_head(page);
- if (compound_order(page) != HPAGE_PMD_ORDER)
+ folio = page_folio(page);
+ if (folio_order(folio) != HPAGE_PMD_ORDER)
return ret;

/*
--
2.27.0

2023-11-04 03:57:00

by Kefeng Wang

Subject: [PATCH v2 08/10] mm: memory: use a folio in insert_page_into_pte_locked()

Use a folio in insert_page_into_pte_locked(), which is preparation
for converting the mm counter functions to take a folio.

Signed-off-by: Kefeng Wang <[email protected]>
---
mm/memory.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index ac247850919a..a2cf240b1975 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1850,10 +1850,13 @@ static int validate_page_before_insert(struct page *page)
static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
unsigned long addr, struct page *page, pgprot_t prot)
{
+ struct folio *folio;
+
if (!pte_none(ptep_get(pte)))
return -EBUSY;
+ folio = page_folio(page);
/* Ok, finally just insert the thing.. */
- get_page(page);
+ folio_get(folio);
inc_mm_counter(vma->vm_mm, mm_counter_file(page));
page_add_file_rmap(page, vma, false);
set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
--
2.27.0

2023-11-04 03:57:10

by Kefeng Wang

Subject: [PATCH v2 09/10] mm: convert mm_counter() to take a folio

Since all mm_counter() callers now have a folio, let's convert
mm_counter() to take a folio.

Signed-off-by: Kefeng Wang <[email protected]>
---
arch/s390/mm/pgtable.c | 2 +-
include/linux/mm.h | 6 +++---
mm/memory.c | 10 +++++-----
mm/rmap.c | 8 ++++----
mm/userfaultfd.c | 2 +-
5 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 2f946b493fff..54b184648db6 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -732,7 +732,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
else if (is_migration_entry(entry)) {
struct folio *folio = pfn_swap_entry_to_folio(entry);

- dec_mm_counter(mm, mm_counter(&folio->page));
+ dec_mm_counter(mm, mm_counter(folio));
}
free_swap_and_cache(entry);
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece..fea78900bf84 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2591,11 +2591,11 @@ static inline int mm_counter_file(struct page *page)
return MM_FILEPAGES;
}

-static inline int mm_counter(struct page *page)
+static inline int mm_counter(struct folio *folio)
{
- if (PageAnon(page))
+ if (folio_test_anon(folio))
return MM_ANONPAGES;
- return mm_counter_file(page);
+ return mm_counter_file(&folio->page);
}

static inline unsigned long get_mm_rss(struct mm_struct *mm)
diff --git a/mm/memory.c b/mm/memory.c
index a2cf240b1975..ad30d4ad2223 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -803,7 +803,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
} else if (is_migration_entry(entry)) {
folio = pfn_swap_entry_to_folio(entry);

- rss[mm_counter(&folio->page)]++;
+ rss[mm_counter(folio)]++;

if (!is_readable_migration_entry(entry) &&
is_cow_mapping(vm_flags)) {
@@ -834,7 +834,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* keep things as they are.
*/
folio_get(folio);
- rss[mm_counter(&folio->page)]++;
+ rss[mm_counter(folio)]++;
/* Cannot fail as these pages cannot get pinned. */
BUG_ON(page_try_dup_anon_rmap(&folio->page, false, src_vma));

@@ -1464,7 +1464,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
if (pte_young(ptent) && likely(vma_has_recency(vma)))
folio_mark_accessed(folio);
}
- rss[mm_counter(page)]--;
+ rss[mm_counter(folio)]--;
if (!delay_rmap) {
page_remove_rmap(page, vma, false);
if (unlikely(page_mapcount(page) < 0))
@@ -1492,7 +1492,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
* see zap_install_uffd_wp_if_needed().
*/
WARN_ON_ONCE(!vma_is_anonymous(vma));
- rss[mm_counter(&folio->page)]--;
+ rss[mm_counter(folio)]--;
if (is_device_private_entry(entry))
page_remove_rmap(&folio->page, vma, false);
folio_put(folio);
@@ -1507,7 +1507,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
folio = pfn_swap_entry_to_folio(entry);
if (!should_zap_page(details, folio))
continue;
- rss[mm_counter(&folio->page)]--;
+ rss[mm_counter(folio)]--;
} else if (pte_marker_entry_uffd_wp(entry)) {
/*
* For anon: always drop the marker; for file: only
diff --git a/mm/rmap.c b/mm/rmap.c
index 7a27a2b41802..7a563490ce08 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1678,7 +1678,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
set_huge_pte_at(mm, address, pvmw.pte, pteval,
hsz);
} else {
- dec_mm_counter(mm, mm_counter(&folio->page));
+ dec_mm_counter(mm, mm_counter(folio));
set_pte_at(mm, address, pvmw.pte, pteval);
}

@@ -1693,7 +1693,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
* migration) will not expect userfaults on already
* copied pages.
*/
- dec_mm_counter(mm, mm_counter(&folio->page));
+ dec_mm_counter(mm, mm_counter(folio));
} else if (folio_test_anon(folio)) {
swp_entry_t entry = page_swap_entry(subpage);
pte_t swp_pte;
@@ -2075,7 +2075,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
set_huge_pte_at(mm, address, pvmw.pte, pteval,
hsz);
} else {
- dec_mm_counter(mm, mm_counter(&folio->page));
+ dec_mm_counter(mm, mm_counter(folio));
set_pte_at(mm, address, pvmw.pte, pteval);
}

@@ -2090,7 +2090,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
* migration) will not expect userfaults on already
* copied pages.
*/
- dec_mm_counter(mm, mm_counter(&folio->page));
+ dec_mm_counter(mm, mm_counter(folio));
} else {
swp_entry_t entry;
pte_t swp_pte;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 96d9eae5c7cc..9a6759fa9b06 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -124,7 +124,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
* Must happen after rmap, as mm_counter() checks mapping (via
* PageAnon()), which is set by __page_set_anon_rmap().
*/
- inc_mm_counter(dst_mm, mm_counter(page));
+ inc_mm_counter(dst_mm, mm_counter(folio));

set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);

--
2.27.0

2023-11-04 17:21:43

by Matthew Wilcox

Subject: Re: [PATCH v2 06/10] mm: memory: use a folio in zap_pte_range()

On Sat, Nov 04, 2023 at 11:55:18AM +0800, Kefeng Wang wrote:
> -/* Decides whether we should zap this page with the page pointer specified */
> -static inline bool should_zap_page(struct zap_details *details, struct page *page)
> +/* Decides whether we should zap this folio with the folio pointer specified */
> +static inline bool should_zap_page(struct zap_details *details, struct folio *folio)

Surely we should rename this to should_zap_folio()?

> @@ -1487,10 +1492,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> * see zap_install_uffd_wp_if_needed().
> */
> WARN_ON_ONCE(!vma_is_anonymous(vma));
> - rss[mm_counter(page)]--;
> + rss[mm_counter(&folio->page)]--;
> if (is_device_private_entry(entry))
> - page_remove_rmap(page, vma, false);
> - put_page(page);
> + page_remove_rmap(&folio->page, vma, false);
> + folio_put(folio);

This is wrong. If we have a PTE-mapped THP, you'll remove the head page
N times instead of removing each of N pages.

I suspect you're going to collide with Ryan's work by doing this ...
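
The objection to passing &folio->page to page_remove_rmap() can be made concrete with a small userspace model of a PTE-mapped large folio. It is a sketch under assumptions, not kernel code: NR_SUBPAGES, the structs and the *_model helpers are hypothetical, and mapcount is a plain count of mappings here rather than the kernel's encoding.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SUBPAGES 4	/* hypothetical folio size for the model */

/* Userspace stand-ins; pages[0] plays the role of the head page. */
struct page_model {
	int mapcount;	/* plain count of PTE mappings in this model */
};

struct folio_model {
	struct page_model pages[NR_SUBPAGES];
};

/* Drop one PTE mapping of exactly the page that was passed in. */
static void remove_rmap_model(struct page_model *page)
{
	assert(page != NULL);
	page->mapcount--;
}

/* A folio whose subpages are each mapped once by their own PTE. */
static struct folio_model pte_mapped_folio_model(void)
{
	struct folio_model folio;

	for (size_t i = 0; i < NR_SUBPAGES; i++)
		folio.pages[i].mapcount = 1;
	return folio;
}

/* The pattern objected to: every PTE removes the head page's mapping. */
static void zap_via_head_model(struct folio_model *folio)
{
	for (size_t i = 0; i < NR_SUBPAGES; i++)
		remove_rmap_model(&folio->pages[0]);
}

/* The requested pattern: every PTE removes the page it actually maps. */
static void zap_via_subpage_model(struct folio_model *folio)
{
	for (size_t i = 0; i < NR_SUBPAGES; i++)
		remove_rmap_model(&folio->pages[i]);
}
```

After zap_via_head_model() the head page has been unmapped NR_SUBPAGES times while the tail pages still look mapped; zap_via_subpage_model() leaves every count at zero. Keeping the page pointer for the rmap call and using the folio only for the counter and the reference drop avoids the first outcome.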

2023-11-06 02:31:08

by Kefeng Wang

Subject: Re: [PATCH v2 06/10] mm: memory: use a folio in zap_pte_range()



On 2023/11/5 1:20, Matthew Wilcox wrote:
> On Sat, Nov 04, 2023 at 11:55:18AM +0800, Kefeng Wang wrote:
>> -/* Decides whether we should zap this page with the page pointer specified */
>> -static inline bool should_zap_page(struct zap_details *details, struct page *page)
>> +/* Decides whether we should zap this folio with the folio pointer specified */
>> +static inline bool should_zap_page(struct zap_details *details, struct folio *folio)
>
> Surely we should rename this to should_zap_folio()?
Will update.
>
>> @@ -1487,10 +1492,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>> * see zap_install_uffd_wp_if_needed().
>> */
>> WARN_ON_ONCE(!vma_is_anonymous(vma));
>> - rss[mm_counter(page)]--;
>> + rss[mm_counter(&folio->page)]--;
>> if (is_device_private_entry(entry))
>> - page_remove_rmap(page, vma, false);
>> - put_page(page);
>> + page_remove_rmap(&folio->page, vma, false);
>> + folio_put(folio);
>
> This is wrong. If we have a PTE-mapped THP, you'll remove the head page
> N times instead of removing each of N pages.

This is a device private entry. Judging by the checks in
migrate_vma_check_page() and migrate_vma_insert_page(), I suppose it
cannot be a THP or a large folio, right?

>
> I suspect you're going to collide with Ryan's work by doing this ...
>
Maybe not if the above is true, at least for now.

Thanks.


2023-11-06 14:21:22

by Matthew Wilcox

Subject: Re: [PATCH v2 06/10] mm: memory: use a folio in zap_pte_range()

On Mon, Nov 06, 2023 at 10:30:59AM +0800, Kefeng Wang wrote:
> On 2023/11/5 1:20, Matthew Wilcox wrote:
> > > - page_remove_rmap(page, vma, false);
> > > - put_page(page);
> > > + page_remove_rmap(&folio->page, vma, false);
> > > + folio_put(folio);
> >
> > This is wrong. If we have a PTE-mapped THP, you'll remove the head page
> > N times instead of removing each of N pages.
>
> This is device private entry, I suppose that it won't be a THP and large
> folio when check migrate_vma_check_page() and migrate_vma_insert_page(),
> right?

I don't want to leave that kind of booby-trap in the code. Both places
which currently call page_remove_rmap() should be left as referring to
the page, not the folio.

2023-11-06 15:09:08

by Kefeng Wang

Subject: Re: [PATCH v2 06/10] mm: memory: use a folio in zap_pte_range()



On 2023/11/6 22:20, Matthew Wilcox wrote:
> On Mon, Nov 06, 2023 at 10:30:59AM +0800, Kefeng Wang wrote:
>> On 2023/11/5 1:20, Matthew Wilcox wrote:
>>>> - page_remove_rmap(page, vma, false);
>>>> - put_page(page);
>>>> + page_remove_rmap(&folio->page, vma, false);
>>>> + folio_put(folio);
>>>
>>> This is wrong. If we have a PTE-mapped THP, you'll remove the head page
>>> N times instead of removing each of N pages.
>>
>> This is device private entry, I suppose that it won't be a THP and large
>> folio when check migrate_vma_check_page() and migrate_vma_insert_page(),
>> right?
>
> I don't want to leave that kind of booby-trap in the code. Both places
> which currently call page_remove_rmap() should be left as referring to
> the page, not the folio.

Sure, I will fix this, and also the page_try_dup_anon_rmap() call for
the device private entry in copy_nonpresent_pte() in patch 5.