2023-09-21 22:53:39

by Ryan Roberts

Subject: [PATCH v1 0/8] Fix set_huge_pte_at() panic on arm64

Hi All,

This series fixes a bug in arm64's implementation of set_huge_pte_at(), which
can result in an unprivileged user causing a kernel panic. The problem was
triggered when running the new uffd poison mm selftest for HUGETLB memory. This
test (and the uffd poison feature) was merged for v6.6-rc1. However, upon
inspection there are multiple other pre-existing paths that can trigger this
bug.

Ideally, I'd like to get this fix in for v6.6 if possible? And I guess it should
be backported too, given there are call sites where this can theoretically
happen that pre-date v6.6-rc1 (I've cc'ed [email protected]).


Description of Bug
------------------

arm64's huge pte implementation supports multiple huge page sizes, some of which
are implemented in the page table with contiguous mappings. So set_huge_pte_at()
needs to work out how big the logical pte is, so that it can also work out how
many physical ptes (or pmds) need to be written. It does this by grabbing the
folio out of the pte and querying its size.

However, there are cases when the pte being set is actually a swap entry. But
this also used to work fine, because for huge ptes, we only ever saw migration
entries and hwpoison entries. And both of these types of swap entries have a PFN
embedded, so the code would grab that and everything still worked out.
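
Roughly speaking, the pre-existing arm64 swap-entry path looked like this (a
simplified sketch of a fragment of the function body, not the exact upstream
code; helper names are from memory and may differ slightly):

if (!pte_present(pte)) {
        struct folio *folio;

        /* recover the folio via the PFN embedded in the swap entry */
        folio = hugetlb_swap_entry_to_folio(pte_to_swp_entry(pte));
        ncontig = num_contig_ptes(folio_size(folio), &pgsize);

        for (i = 0; i < ncontig; i++, ptep++)
                set_pte_at(mm, addr, ptep, pte);
        return;
}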

But over time, more calls to set_huge_pte_at() have been added that set swap
entry types that do not embed a PFN. And this causes the code to go bang. The
triggering case is for the uffd poison test, commit 99aa77215ad0 ("selftests/mm:
add uffd unit test for UFFDIO_POISON"), which sets a PTE_MARKER_POISONED swap
entry. But review shows there are other places too (PTE_MARKER_UFFD_WP).
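
The triggering call in hugetlb_mfill_atomic_pte() boils down to this
(simplified; the full context is in the mm patch later in the series):

_dst_pte = make_pte_marker(PTE_MARKER_POISONED);    /* swap entry with no PFN */
set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);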

If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise, it
will dereference a bad pointer in page_folio():

static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
{
        VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));

        return page_folio(pfn_to_page(swp_offset_pfn(entry)));
}

So the root cause is commit 18f3962953e4 ("mm: hugetlb: kill
set_huge_swap_pte_at()"), which aimed to simplify the interface to the core code
by removing set_huge_swap_pte_at() (which took a page size parameter) and
replacing it with calls to set_huge_pte_at(), where the size is instead inferred
from the folio, as described above. While that commit didn't break anything at
the time, it did break the interface because it couldn't handle swap entries
without PFNs. And since then new callers have come along which rely on this
working.
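
In other words, the interface went from this to this (illustrative only;
argument names approximate):

/* before 18f3962953e4: the caller supplied the huge page size explicitly */
set_huge_swap_pte_at(mm, addr, ptep, entry, sz);

/* after: the arch must infer the size from the pte/folio itself */
set_huge_pte_at(mm, addr, ptep, entry);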


Fix
---

The simplest fix would have been to revert the dodgy cleanup commit, but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at(). As per the original intent of the change, it would also
leave us open to future bugs when people invariably get it wrong and call the
wrong helper.

So instead, I've converted the first parameter of set_huge_pte_at() to be a vma
rather than an mm. This means that the arm64 code can easily recover the huge
page size in all cases. It's a bigger change, due to needing to touch the arches
that implement the function, but it is entirely mechanical, so in my view, low
risk.
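
In prototype terms, the change is just:

/* before */
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
                     pte_t *ptep, pte_t pte);

/* after */
void set_huge_pte_at(struct vm_area_struct *vma, unsigned long addr,
                     pte_t *ptep, pte_t pte);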

I've compile-tested all touched arches: arm64, parisc, powerpc, riscv, s390 (and
additionally x86_64). I've additionally booted and run mm selftests against
arm64, where I observe the uffd poison test is fixed, and there are no other
regressions.


Patches
-------

patches 1-7: Convert core mm and arches to pass vma instead of mm
patch 8:     Fixes the arm64 bug

Patches based on v6.6-rc2.


Thanks,
Ryan


Ryan Roberts (8):
parisc: hugetlb: Convert set_huge_pte_at() to take vma
powerpc: hugetlb: Convert set_huge_pte_at() to take vma
riscv: hugetlb: Convert set_huge_pte_at() to take vma
s390: hugetlb: Convert set_huge_pte_at() to take vma
sparc: hugetlb: Convert set_huge_pte_at() to take vma
mm: hugetlb: Convert set_huge_pte_at() to take vma
arm64: hugetlb: Convert set_huge_pte_at() to take vma
arm64: hugetlb: Fix set_huge_pte_at() to work with all swap entries

arch/arm64/include/asm/hugetlb.h | 2 +-
arch/arm64/mm/hugetlbpage.c | 22 ++++----------
arch/parisc/include/asm/hugetlb.h | 2 +-
arch/parisc/mm/hugetlbpage.c | 4 +--
.../include/asm/nohash/32/hugetlb-8xx.h | 3 +-
arch/powerpc/mm/book3s64/hugetlbpage.c | 2 +-
arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 2 +-
arch/powerpc/mm/nohash/8xx.c | 2 +-
arch/powerpc/mm/pgtable.c | 7 ++++-
arch/riscv/include/asm/hugetlb.h | 2 +-
arch/riscv/mm/hugetlbpage.c | 3 +-
arch/s390/include/asm/hugetlb.h | 8 +++--
arch/s390/mm/hugetlbpage.c | 8 ++++-
arch/sparc/include/asm/hugetlb.h | 8 +++--
arch/sparc/mm/hugetlbpage.c | 8 ++++-
include/asm-generic/hugetlb.h | 6 ++--
include/linux/hugetlb.h | 6 ++--
mm/damon/vaddr.c | 2 +-
mm/hugetlb.c | 30 +++++++++----------
mm/migrate.c | 2 +-
mm/rmap.c | 10 +++----
mm/vmalloc.c | 5 +++-
22 files changed, 80 insertions(+), 64 deletions(-)

--
2.25.1


2023-09-21 23:57:01

by Ryan Roberts

Subject: [PATCH v1 4/8] s390: hugetlb: Convert set_huge_pte_at() to take vma

In order to fix a bug, arm64 needs access to the vma inside its
implementation of set_huge_pte_at(). Provide for this by converting the
mm parameter to be a vma. Any implementations that require the mm can
access it via vma->vm_mm.

This commit makes the required s390 modifications. Separate commits
update the other arches and core code, before the actual bug is fixed in
arm64.

No behavioral changes intended.

Signed-off-by: Ryan Roberts <[email protected]>
---
arch/s390/include/asm/hugetlb.h | 8 +++++---
arch/s390/mm/hugetlbpage.c | 8 +++++++-
2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index f07267875a19..da7f43b1871f 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -15,7 +15,9 @@
#define hugetlb_free_pgd_range free_pgd_range
#define hugepages_supported() (MACHINE_HAS_EDAT1)

-void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+void set_huge_pte_at(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, pte_t pte);
+void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
pte_t huge_ptep_get(pte_t *ptep);
pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
@@ -65,7 +67,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
int changed = !pte_same(huge_ptep_get(ptep), pte);
if (changed) {
huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
- set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+ __set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
}
return changed;
}
@@ -74,7 +76,7 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
pte_t pte = huge_ptep_get_and_clear(mm, addr, ptep);
- set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
+ __set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
}

static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index c718f2a0de94..d69e2dc6e508 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -142,7 +142,7 @@ static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste)
__storage_key_init_range(paddr, paddr + size - 1);
}

-void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
unsigned long rste;
@@ -163,6 +163,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
set_pte(ptep, __pte(rste));
}

+void set_huge_pte_at(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ __set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+}
+
pte_t huge_ptep_get(pte_t *ptep)
{
return __rste_to_pte(pte_val(*ptep));
--
2.25.1

2023-09-22 00:14:35

by Ryan Roberts

Subject: [PATCH v1 6/8] mm: hugetlb: Convert set_huge_pte_at() to take vma

In order to fix a bug, arm64 needs access to the vma inside its
implementation of set_huge_pte_at(). Provide for this by converting the
mm parameter to be a vma. Any implementations that require the mm can
access it via vma->vm_mm.

This commit makes the required modifications to the core mm. Separate
commits update the arches, before the actual bug is fixed in arm64.

No behavioral changes intended.

Signed-off-by: Ryan Roberts <[email protected]>
---
include/asm-generic/hugetlb.h | 6 +++---
include/linux/hugetlb.h | 6 +++---
mm/damon/vaddr.c | 2 +-
mm/hugetlb.c | 30 +++++++++++++++---------------
mm/migrate.c | 2 +-
mm/rmap.c | 10 +++++-----
mm/vmalloc.c | 5 ++++-
7 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 4da02798a00b..515e4777fb65 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -75,10 +75,10 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
#endif

#ifndef __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
+static inline void set_huge_pte_at(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, pte_t pte)
{
- set_pte_at(mm, addr, ptep, pte);
+ set_pte_at(vma->vm_mm, addr, ptep, pte);
}
#endif

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5b2626063f4f..08184f32430c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -984,7 +984,7 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t old_pte, pte_t pte)
{
- set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+ set_huge_pte_at(vma, addr, ptep, pte);
}
#endif

@@ -1172,8 +1172,8 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
#endif
}

-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
+static inline void set_huge_pte_at(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, pte_t pte)
{
}

diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 4c81a9dbd044..55da8cee8fbc 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -347,7 +347,7 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
if (pte_young(entry)) {
referenced = true;
entry = pte_mkold(entry);
- set_huge_pte_at(mm, addr, pte, entry);
+ set_huge_pte_at(vma, addr, pte, entry);
}

#ifdef CONFIG_MMU_NOTIFIER
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ba6d39b71cb1..bcc30cd62586 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4988,7 +4988,7 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
hugepage_add_new_anon_rmap(new_folio, vma, addr);
if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
newpte = huge_pte_mkuffd_wp(newpte);
- set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
+ set_huge_pte_at(vma, addr, ptep, newpte);
hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
folio_set_hugetlb_migratable(new_folio);
}
@@ -5065,7 +5065,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
- set_huge_pte_at(dst, addr, dst_pte, entry);
+ set_huge_pte_at(dst_vma, addr, dst_pte, entry);
} else if (unlikely(is_hugetlb_entry_migration(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
bool uffd_wp = pte_swp_uffd_wp(entry);
@@ -5080,17 +5080,17 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
entry = swp_entry_to_pte(swp_entry);
if (userfaultfd_wp(src_vma) && uffd_wp)
entry = pte_swp_mkuffd_wp(entry);
- set_huge_pte_at(src, addr, src_pte, entry);
+ set_huge_pte_at(src_vma, addr, src_pte, entry);
}
if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
- set_huge_pte_at(dst, addr, dst_pte, entry);
+ set_huge_pte_at(dst_vma, addr, dst_pte, entry);
} else if (unlikely(is_pte_marker(entry))) {
pte_marker marker = copy_pte_marker(
pte_to_swp_entry(entry), dst_vma);

if (marker)
- set_huge_pte_at(dst, addr, dst_pte,
+ set_huge_pte_at(dst_vma, addr, dst_pte,
make_pte_marker(marker));
} else {
entry = huge_ptep_get(src_pte);
@@ -5166,7 +5166,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);

- set_huge_pte_at(dst, addr, dst_pte, entry);
+ set_huge_pte_at(dst_vma, addr, dst_pte, entry);
hugetlb_count_add(npages, dst);
}
spin_unlock(src_ptl);
@@ -5202,7 +5202,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);

pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
- set_huge_pte_at(mm, new_addr, dst_pte, pte);
+ set_huge_pte_at(vma, new_addr, dst_pte, pte);

if (src_ptl != dst_ptl)
spin_unlock(src_ptl);
@@ -5336,7 +5336,7 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
*/
if (pte_swp_uffd_wp_any(pte) &&
!(zap_flags & ZAP_FLAG_DROP_MARKER))
- set_huge_pte_at(mm, address, ptep,
+ set_huge_pte_at(vma, address, ptep,
make_pte_marker(PTE_MARKER_UFFD_WP));
else
huge_pte_clear(mm, address, ptep, sz);
@@ -5370,7 +5370,7 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
/* Leave a uffd-wp pte marker if needed */
if (huge_pte_uffd_wp(pte) &&
!(zap_flags & ZAP_FLAG_DROP_MARKER))
- set_huge_pte_at(mm, address, ptep,
+ set_huge_pte_at(vma, address, ptep,
make_pte_marker(PTE_MARKER_UFFD_WP));
hugetlb_count_sub(pages_per_huge_page(h), mm);
page_remove_rmap(page, vma, true);
@@ -5676,7 +5676,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
hugepage_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
- set_huge_pte_at(mm, haddr, ptep, newpte);
+ set_huge_pte_at(vma, haddr, ptep, newpte);
folio_set_hugetlb_migratable(new_folio);
/* Make the old page be freed below */
new_folio = old_folio;
@@ -5972,7 +5972,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
*/
if (unlikely(pte_marker_uffd_wp(old_pte)))
new_pte = huge_pte_mkuffd_wp(new_pte);
- set_huge_pte_at(mm, haddr, ptep, new_pte);
+ set_huge_pte_at(vma, haddr, ptep, new_pte);

hugetlb_count_add(pages_per_huge_page(h), mm);
if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
@@ -6261,7 +6261,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
}

_dst_pte = make_pte_marker(PTE_MARKER_POISONED);
- set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+ set_huge_pte_at(dst_vma, dst_addr, dst_pte, _dst_pte);

/* No need to invalidate - it was non-present before */
update_mmu_cache(dst_vma, dst_addr, dst_pte);
@@ -6412,7 +6412,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
if (wp_enabled)
_dst_pte = huge_pte_mkuffd_wp(_dst_pte);

- set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+ set_huge_pte_at(dst_vma, dst_addr, dst_pte, _dst_pte);

hugetlb_count_add(pages_per_huge_page(h), dst_mm);

@@ -6598,7 +6598,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
else if (uffd_wp_resolve)
newpte = pte_swp_clear_uffd_wp(newpte);
if (!pte_same(pte, newpte))
- set_huge_pte_at(mm, address, ptep, newpte);
+ set_huge_pte_at(vma, address, ptep, newpte);
} else if (unlikely(is_pte_marker(pte))) {
/* No other markers apply for now. */
WARN_ON_ONCE(!pte_marker_uffd_wp(pte));
@@ -6622,7 +6622,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
/* None pte */
if (unlikely(uffd_wp))
/* Safe to modify directly (none->non-present). */
- set_huge_pte_at(mm, address, ptep,
+ set_huge_pte_at(vma, address, ptep,
make_pte_marker(PTE_MARKER_UFFD_WP));
}
spin_unlock(ptl);
diff --git a/mm/migrate.c b/mm/migrate.c
index b7fa020003f3..6aa752984f32 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -251,7 +251,7 @@ static bool remove_migration_pte(struct folio *folio,
rmap_flags);
else
page_dup_file_rmap(new, true);
- set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
+ set_huge_pte_at(vma, pvmw.address, pvmw.pte, pte);
} else
#endif
{
diff --git a/mm/rmap.c b/mm/rmap.c
index ec7f8e6c9e48..a6353a0c67e8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1628,7 +1628,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
if (folio_test_hugetlb(folio)) {
hugetlb_count_sub(folio_nr_pages(folio), mm);
- set_huge_pte_at(mm, address, pvmw.pte, pteval);
+ set_huge_pte_at(vma, address, pvmw.pte, pteval);
} else {
dec_mm_counter(mm, mm_counter(&folio->page));
set_pte_at(mm, address, pvmw.pte, pteval);
@@ -2020,7 +2020,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
if (folio_test_hugetlb(folio)) {
hugetlb_count_sub(folio_nr_pages(folio), mm);
- set_huge_pte_at(mm, address, pvmw.pte, pteval);
+ set_huge_pte_at(vma, address, pvmw.pte, pteval);
} else {
dec_mm_counter(mm, mm_counter(&folio->page));
set_pte_at(mm, address, pvmw.pte, pteval);
@@ -2044,7 +2044,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,

if (arch_unmap_one(mm, vma, address, pteval) < 0) {
if (folio_test_hugetlb(folio))
- set_huge_pte_at(mm, address, pvmw.pte, pteval);
+ set_huge_pte_at(vma, address, pvmw.pte, pteval);
else
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
@@ -2058,7 +2058,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
if (anon_exclusive &&
page_try_share_anon_rmap(subpage)) {
if (folio_test_hugetlb(folio))
- set_huge_pte_at(mm, address, pvmw.pte, pteval);
+ set_huge_pte_at(vma, address, pvmw.pte, pteval);
else
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
@@ -2090,7 +2090,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
if (pte_uffd_wp(pteval))
swp_pte = pte_swp_mkuffd_wp(swp_pte);
if (folio_test_hugetlb(folio))
- set_huge_pte_at(mm, address, pvmw.pte, swp_pte);
+ set_huge_pte_at(vma, address, pvmw.pte, swp_pte);
else
set_pte_at(mm, address, pvmw.pte, swp_pte);
trace_set_migration_pte(address, pte_val(swp_pte),
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef8599d394fd..10fa40222f30 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -94,6 +94,9 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot,
unsigned int max_page_shift, pgtbl_mod_mask *mask)
{
+#ifdef CONFIG_HUGETLB_PAGE
+ struct vm_area_struct vma = TLB_FLUSH_VMA(&init_mm, 0);
+#endif
pte_t *pte;
u64 pfn;
unsigned long size = PAGE_SIZE;
@@ -111,7 +114,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
pte_t entry = pfn_pte(pfn, prot);

entry = arch_make_huge_pte(entry, ilog2(size), 0);
- set_huge_pte_at(&init_mm, addr, pte, entry);
+ set_huge_pte_at(&vma, addr, pte, entry);
pfn += PFN_DOWN(size);
continue;
}
--
2.25.1

2023-09-22 01:47:06

by SeongJae Park

Subject: Re: [PATCH v1 6/8] mm: hugetlb: Convert set_huge_pte_at() to take vma

Hi Ryan,

On Thu, 21 Sep 2023 17:20:05 +0100 Ryan Roberts <[email protected]> wrote:

> In order to fix a bug, arm64 needs access to the vma inside its
> implementation of set_huge_pte_at(). Provide for this by converting the
> mm parameter to be a vma. Any implementations that require the mm can
> access it via vma->vm_mm.
>
> This commit makes the required modifications to the core mm. Separate
> commits update the arches, before the actual bug is fixed in arm64.
>
> No behavioral changes intended.
>
> Signed-off-by: Ryan Roberts <[email protected]>

For mm/damon/ part change,

Reviewed-by: SeongJae Park <[email protected]>


Thanks,
SJ


> ---
> include/asm-generic/hugetlb.h | 6 +++---
> include/linux/hugetlb.h | 6 +++---
> mm/damon/vaddr.c | 2 +-
> mm/hugetlb.c | 30 +++++++++++++++---------------
> mm/migrate.c | 2 +-
> mm/rmap.c | 10 +++++-----
> mm/vmalloc.c | 5 ++++-
> 7 files changed, 32 insertions(+), 29 deletions(-)
>
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index 4da02798a00b..515e4777fb65 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -75,10 +75,10 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
> #endif
>
> #ifndef __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
> -static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep, pte_t pte)
> +static inline void set_huge_pte_at(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep, pte_t pte)
> {
> - set_pte_at(mm, addr, ptep, pte);
> + set_pte_at(vma->vm_mm, addr, ptep, pte);
> }
> #endif
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 5b2626063f4f..08184f32430c 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -984,7 +984,7 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t old_pte, pte_t pte)
> {
> - set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
> + set_huge_pte_at(vma, addr, ptep, pte);
> }
> #endif
>
> @@ -1172,8 +1172,8 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> #endif
> }
>
> -static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep, pte_t pte)
> +static inline void set_huge_pte_at(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep, pte_t pte)
> {
> }
>
> diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
> index 4c81a9dbd044..55da8cee8fbc 100644
> --- a/mm/damon/vaddr.c
> +++ b/mm/damon/vaddr.c
> @@ -347,7 +347,7 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
> if (pte_young(entry)) {
> referenced = true;
> entry = pte_mkold(entry);
> - set_huge_pte_at(mm, addr, pte, entry);
> + set_huge_pte_at(vma, addr, pte, entry);
> }
>
> #ifdef CONFIG_MMU_NOTIFIER
[...]


Thanks,
SJ

2023-09-22 02:29:01

by Ryan Roberts

Subject: [PATCH v1 3/8] riscv: hugetlb: Convert set_huge_pte_at() to take vma

In order to fix a bug, arm64 needs access to the vma inside its
implementation of set_huge_pte_at(). Provide for this by converting the
mm parameter to be a vma. Any implementations that require the mm can
access it via vma->vm_mm.

This commit makes the required riscv modifications. Separate commits
update the other arches and core code, before the actual bug is fixed in
arm64.

No behavioral changes intended.

Signed-off-by: Ryan Roberts <[email protected]>
---
arch/riscv/include/asm/hugetlb.h | 2 +-
arch/riscv/mm/hugetlbpage.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 34e24f078cc1..be1ac8582bc2 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -17,7 +17,7 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long sz);

#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
-void set_huge_pte_at(struct mm_struct *mm,
+void set_huge_pte_at(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep, pte_t pte);

#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 96225a8533ad..7cdbf0960772 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -177,11 +177,12 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
return entry;
}

-void set_huge_pte_at(struct mm_struct *mm,
+void set_huge_pte_at(struct vm_area_struct *vma,
unsigned long addr,
pte_t *ptep,
pte_t pte)
{
+ struct mm_struct *mm = vma->vm_mm;
int i, pte_num;

if (!pte_napot(pte)) {
--
2.25.1

2023-09-22 06:00:41

by Andrew Morton

Subject: Re: [PATCH v1 0/8] Fix set_huge_pte_at() panic on arm64

On Thu, 21 Sep 2023 17:19:59 +0100 Ryan Roberts <[email protected]> wrote:

> Hi All,
>
> This series fixes a bug in arm64's implementation of set_huge_pte_at(), which
> can result in an unprivileged user causing a kernel panic. The problem was
> triggered when running the new uffd poison mm selftest for HUGETLB memory. This
> test (and the uffd poison feature) was merged for v6.6-rc1. However, upon
> inspection there are multiple other pre-existing paths that can trigger this
> bug.
>
> Ideally, I'd like to get this fix in for v6.6 if possible? And I guess it should
> be backported too, given there are call sites where this can theoretically
> happen that pre-date v6.6-rc1 (I've cc'ed [email protected]).

This gets you a naggygram from Greg. The way to request a backport is
to add cc:stable to all the changelogs. I'll make that change to my copy.


> Ryan Roberts (8):
> parisc: hugetlb: Convert set_huge_pte_at() to take vma
> powerpc: hugetlb: Convert set_huge_pte_at() to take vma
> riscv: hugetlb: Convert set_huge_pte_at() to take vma
> s390: hugetlb: Convert set_huge_pte_at() to take vma
> sparc: hugetlb: Convert set_huge_pte_at() to take vma
> mm: hugetlb: Convert set_huge_pte_at() to take vma
> arm64: hugetlb: Convert set_huge_pte_at() to take vma
> arm64: hugetlb: Fix set_huge_pte_at() to work with all swap entries
>
> arch/arm64/include/asm/hugetlb.h | 2 +-
> arch/arm64/mm/hugetlbpage.c | 22 ++++----------
> arch/parisc/include/asm/hugetlb.h | 2 +-
> arch/parisc/mm/hugetlbpage.c | 4 +--
> .../include/asm/nohash/32/hugetlb-8xx.h | 3 +-
> arch/powerpc/mm/book3s64/hugetlbpage.c | 2 +-
> arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 2 +-
> arch/powerpc/mm/nohash/8xx.c | 2 +-
> arch/powerpc/mm/pgtable.c | 7 ++++-
> arch/riscv/include/asm/hugetlb.h | 2 +-
> arch/riscv/mm/hugetlbpage.c | 3 +-
> arch/s390/include/asm/hugetlb.h | 8 +++--
> arch/s390/mm/hugetlbpage.c | 8 ++++-
> arch/sparc/include/asm/hugetlb.h | 8 +++--
> arch/sparc/mm/hugetlbpage.c | 8 ++++-
> include/asm-generic/hugetlb.h | 6 ++--
> include/linux/hugetlb.h | 6 ++--
> mm/damon/vaddr.c | 2 +-
> mm/hugetlb.c | 30 +++++++++----------
> mm/migrate.c | 2 +-
> mm/rmap.c | 10 +++----
> mm/vmalloc.c | 5 +++-
> 22 files changed, 80 insertions(+), 64 deletions(-)

Looks scary but it's actually a fairly modest patchset. It could
easily be all rolled into a single patch for ease of backporting.
Maybe Greg has an opinion?

2023-09-22 07:56:15

by Alexandre Ghiti

Subject: Re: [PATCH v1 3/8] riscv: hugetlb: Convert set_huge_pte_at() to take vma

Hi Ryan,

On 21/09/2023 18:20, Ryan Roberts wrote:
> In order to fix a bug, arm64 needs access to the vma inside its
> implementation of set_huge_pte_at(). Provide for this by converting the
> mm parameter to be a vma. Any implementations that require the mm can
> access it via vma->vm_mm.
>
> This commit makes the required riscv modifications. Separate commits
> update the other arches and core code, before the actual bug is fixed in
> arm64.
>
> No behavioral changes intended.
>
> Signed-off-by: Ryan Roberts <[email protected]>
> ---
> arch/riscv/include/asm/hugetlb.h | 2 +-
> arch/riscv/mm/hugetlbpage.c | 3 ++-
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
> index 34e24f078cc1..be1ac8582bc2 100644
> --- a/arch/riscv/include/asm/hugetlb.h
> +++ b/arch/riscv/include/asm/hugetlb.h
> @@ -17,7 +17,7 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
> pte_t *ptep, unsigned long sz);
>
> #define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
> -void set_huge_pte_at(struct mm_struct *mm,
> +void set_huge_pte_at(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep, pte_t pte);
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
> index 96225a8533ad..7cdbf0960772 100644
> --- a/arch/riscv/mm/hugetlbpage.c
> +++ b/arch/riscv/mm/hugetlbpage.c
> @@ -177,11 +177,12 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
> return entry;
> }
>
> -void set_huge_pte_at(struct mm_struct *mm,
> +void set_huge_pte_at(struct vm_area_struct *vma,
> unsigned long addr,
> pte_t *ptep,
> pte_t pte)
> {
> + struct mm_struct *mm = vma->vm_mm;
> int i, pte_num;
>
> if (!pte_napot(pte)) {


You can add:

Reviewed-by: Alexandre Ghiti <[email protected]>

I realize that we may have the same issue with our contig pte
implementation (called napot in riscv) as we don't handle swap/migration
entries at all. So I guess we need something similar, and I'll implement
it (unless you want to do it of course, but I guess it's easier for me
to test). One (maybe stupid) question though: wouldn't it be possible to
extract the contig pte size from the value of ptep instead of using a vma?

Thanks,

Alex

2023-09-22 08:53:08

by Ryan Roberts

Subject: Re: [PATCH v1 3/8] riscv: hugetlb: Convert set_huge_pte_at() to take vma

On 22/09/2023 08:54, Alexandre Ghiti wrote:
> Hi Ryan,
>
> On 21/09/2023 18:20, Ryan Roberts wrote:
>> In order to fix a bug, arm64 needs access to the vma inside its
>> implementation of set_huge_pte_at(). Provide for this by converting the
>> mm parameter to be a vma. Any implementations that require the mm can
>> access it via vma->vm_mm.
>>
>> This commit makes the required riscv modifications. Separate commits
>> update the other arches and core code, before the actual bug is fixed in
>> arm64.
>>
>> No behavioral changes intended.
>>
>> Signed-off-by: Ryan Roberts <[email protected]>
>> ---
>>   arch/riscv/include/asm/hugetlb.h | 2 +-
>>   arch/riscv/mm/hugetlbpage.c      | 3 ++-
>>   2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
>> index 34e24f078cc1..be1ac8582bc2 100644
>> --- a/arch/riscv/include/asm/hugetlb.h
>> +++ b/arch/riscv/include/asm/hugetlb.h
>> @@ -17,7 +17,7 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>>               pte_t *ptep, unsigned long sz);
>>     #define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
>> -void set_huge_pte_at(struct mm_struct *mm,
>> +void set_huge_pte_at(struct vm_area_struct *vma,
>>                unsigned long addr, pte_t *ptep, pte_t pte);
>>     #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
>> index 96225a8533ad..7cdbf0960772 100644
>> --- a/arch/riscv/mm/hugetlbpage.c
>> +++ b/arch/riscv/mm/hugetlbpage.c
>> @@ -177,11 +177,12 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int
>> shift, vm_flags_t flags)
>>       return entry;
>>   }
>>   -void set_huge_pte_at(struct mm_struct *mm,
>> +void set_huge_pte_at(struct vm_area_struct *vma,
>>                unsigned long addr,
>>                pte_t *ptep,
>>                pte_t pte)
>>   {
>> +    struct mm_struct *mm = vma->vm_mm;
>>       int i, pte_num;
>>         if (!pte_napot(pte)) {
>
>
> You can add:
>
> Reviewed-by: Alexandre Ghiti <[email protected]>

Thanks!

>
> I realize that we may have the same issue with our contig pte implementation
> (called napot in riscv) as we don't handle swap/migration entries at all. So I
> guess we need something similar, and I'll implement it (unless you want to do it
> of course, but I guess it's easier for me to test).

Yes -I'll leave you to do the riscv part.

> One (maybe stupid) question
> though: wouldn't it be possible to extract the contig pte size from the value of
> ptep instead of using a vma?

Not for arm64: We support contpmd, pmd and contpte entries as backing for the
logical huge pte, depending on size. So without the size, we can't distinguish
a coincidentally-aligned pmd entry from a contpmd entry (which is just a
fixed-size block of pmd entries).
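
To illustrate (a hand-written sketch with a made-up helper name, not the actual
arm64 code): given the huge page size you can derive the level and the number of
hardware entries, and none of this falls out of the ptep value alone:

static int entries_for_size(unsigned long sz, size_t *pgsize)
{
        *pgsize = sz;

        switch (sz) {
        case PUD_SIZE:
        case PMD_SIZE:
                return 1;               /* single block entry */
        case CONT_PMD_SIZE:
                *pgsize = PMD_SIZE;
                return CONT_PMDS;       /* contiguous run of pmd entries */
        case CONT_PTE_SIZE:
                *pgsize = PAGE_SIZE;
                return CONT_PTES;       /* contiguous run of pte entries */
        }
        return 0;
}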

Discussion with Christophe on the powerpc patch triggered some thinking: there
is a theoretical problem with my current approach, because there is one call
site in the core code that calls set_huge_pte_at(&init_mm). I've changed that to:

struct vm_area_struct vma = TLB_FLUSH_VMA(&init_mm, 0);
set_huge_pte_at(&vma);

knowing that this will never actually get called for arm64 because we return
PAGE_SIZE for arch_vmap_pte_range_map_size() and all other arches just take the
mm and ignore the rest of the vma. So it's safe, but fragile.

But it looks like riscv overrides arch_vmap_pte_range_map_size() and therefore
the call will be made there. And if riscv also needs to determine the size from
the vma, then bang.

So I'm going to rework it to continue to pass the mm in, but also add a size
parameter. Then it's totally safe. Will post a v2 later today.
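
That would make the prototype something like this (a sketch only; the exact form
is to be confirmed in v2):

void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
                     pte_t *ptep, pte_t pte, unsigned long sz);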

Thanks,
Ryan

>
> Thanks,
>
> Alex
>