2024-03-06 10:42:19

by Peter Xu

Subject: [PATCH RFC 00/13] mm/treewide: Remove pXd_huge() API

From: Peter Xu <[email protected]>

[based on akpm/mm-unstable latest commit a7f399ae964e]

In previous work [1], we removed the pXd_large() API, which is
arch-specific. This patchset further removes the hugetlb pXd_huge() API.

Hugetlb has never been special in how it creates huge mappings compared
with other huge mappings. Having a standalone API just to detect such
pgtable entries is more or less redundant, especially now that the
pXd_leaf() API set is available regardless of CONFIG_HUGETLB_PAGE.

While looking at this, a few issues were also exposed showing that we
don't have a clear definition of the *_huge() API variants. This patchset
starts by cleaning up those issues, then replaces all *_huge() users with
*_leaf(), and finally drops all the *_huge() code.

On x86 and sparc, pXd_huge() reports "true" for swap entries, while all
the other archs report "false". This part is handled in patches 1-5; I
suspect patch 1 can be seen as a bug fix, but I'll leave that to the hmm
experts to decide.

Besides that, three archs (arm, arm64, powerpc) have slightly different
definitions for the *_huge() vs. *_leaf() variants. I tackled them
separately so that it'll be easier for arch experts to chime in when
necessary. This part is done in patches 6-9.

The final patches 10-13 do the rest of the removal. Since *_leaf() will be
the ultimate API going forward, and we seem to have quite some confusion
about how the *_huge() APIs should be defined, the last patch provides a
rich comment for the *_leaf() API set to define it properly and avoid
future misuse. Hopefully that will also help new archs start supporting
huge mappings while avoiding the traps (such as swap entries, or PROT_NONE
entry checks).
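
To make the swap-entry trap concrete, here is the contrast, using the x86
definitions touched later in this series (a sketch for illustration only;
the pmd_huge_old()/pmd_huge_new() names are made up and not part of any
patch):

/*
 * Old x86 pmd_huge(): also returns true for non-present (e.g. migration)
 * entries, since only none and "present without PSE" are excluded.
 */
int pmd_huge_old(pmd_t pmd)
{
	return !pmd_none(pmd) &&
	       (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
}

/* What the series converges on: a present leaf (huge) mapping only. */
int pmd_huge_new(pmd_t pmd)
{
	return pmd_leaf(pmd);
}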

The whole series is only lightly tested on x86; as usual, I don't have the
capability to test all the archs that it touches.

Marking this series RFC as of now.

[1] https://lore.kernel.org/r/[email protected]

Peter Xu (13):
mm/hmm: Process pud swap entry without pud_huge()
mm/gup: Cache p4d in follow_p4d_mask()
mm/gup: Check p4d presence before going on
mm/x86: Change pXd_huge() behavior to exclude swap entries
mm/sparc: Change pXd_huge() behavior to exclude swap entries
mm/arm: Use macros to define pmd/pud helpers
mm/arm: Redefine pmd_huge() with pmd_leaf()
mm/arm64: Merge pXd_huge() and pXd_leaf() definitions
mm/powerpc: Redefine pXd_huge() with pXd_leaf()
mm/gup: Merge pXd huge mapping checks
mm/treewide: Replace pXd_huge() with pXd_leaf()
mm/treewide: Remove pXd_huge()
mm: Document pXd_leaf() API

arch/arm/include/asm/pgtable-2level.h | 4 +--
arch/arm/include/asm/pgtable-3level-hwdef.h | 1 +
arch/arm/include/asm/pgtable-3level.h | 6 ++--
arch/arm/mm/Makefile | 1 -
arch/arm/mm/hugetlbpage.c | 34 -------------------
arch/arm64/include/asm/pgtable.h | 6 +++-
arch/arm64/mm/hugetlbpage.c | 18 ++--------
arch/loongarch/mm/hugetlbpage.c | 12 +------
arch/mips/include/asm/pgtable-32.h | 2 +-
arch/mips/include/asm/pgtable-64.h | 2 +-
arch/mips/mm/hugetlbpage.c | 10 ------
arch/mips/mm/tlb-r4k.c | 2 +-
arch/parisc/mm/hugetlbpage.c | 11 ------
.../include/asm/book3s/64/pgtable-4k.h | 20 -----------
.../include/asm/book3s/64/pgtable-64k.h | 25 --------------
arch/powerpc/include/asm/book3s/64/pgtable.h | 3 --
arch/powerpc/include/asm/nohash/pgtable.h | 10 ------
arch/powerpc/mm/pgtable_64.c | 6 ++--
arch/riscv/mm/hugetlbpage.c | 10 ------
arch/s390/mm/hugetlbpage.c | 10 ------
arch/sh/mm/hugetlbpage.c | 10 ------
arch/sparc/mm/hugetlbpage.c | 12 -------
arch/x86/mm/hugetlbpage.c | 26 --------------
arch/x86/mm/pgtable.c | 4 +--
include/linux/hugetlb.h | 24 -------------
include/linux/pgtable.h | 24 ++++++++++---
mm/gup.c | 24 ++++++-------
mm/hmm.c | 9 ++---
mm/memory.c | 2 +-
29 files changed, 56 insertions(+), 272 deletions(-)
delete mode 100644 arch/arm/mm/hugetlbpage.c

--
2.44.0



2024-03-06 10:42:45

by Peter Xu

Subject: [PATCH RFC 02/13] mm/gup: Cache p4d in follow_p4d_mask()

From: Peter Xu <[email protected]>

Add a local variable to cache the p4d entry in follow_p4d_mask(). It's
good practice to make sure all the following checks operate on a
consistent view of the entry.

Signed-off-by: Peter Xu <[email protected]>
---
mm/gup.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index df83182ec72d..69a777f4fc5c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -772,16 +772,17 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
unsigned int flags,
struct follow_page_context *ctx)
{
- p4d_t *p4d;
+ p4d_t *p4dp, p4d;

- p4d = p4d_offset(pgdp, address);
- if (p4d_none(*p4d))
+ p4dp = p4d_offset(pgdp, address);
+ p4d = READ_ONCE(*p4dp);
+ if (p4d_none(p4d))
return no_page_table(vma, flags);
- BUILD_BUG_ON(p4d_huge(*p4d));
- if (unlikely(p4d_bad(*p4d)))
+ BUILD_BUG_ON(p4d_huge(p4d));
+ if (unlikely(p4d_bad(p4d)))
return no_page_table(vma, flags);

- return follow_pud_mask(vma, address, p4d, flags, ctx);
+ return follow_pud_mask(vma, address, p4dp, flags, ctx);
}

/**
--
2.44.0


2024-03-06 10:42:57

by Peter Xu

Subject: [PATCH RFC 03/13] mm/gup: Check p4d presence before going on

From: Peter Xu <[email protected]>

Currently there should be no p4d swap entries, so it may not matter much
in practice; however, this helps us rule out swap entries in the
pXd_huge() API, which will include p4d_huge(). The p4d_present() checks
make it 100% clear that we won't rely on p4d_huge() for swap entries.

Signed-off-by: Peter Xu <[email protected]>
---
mm/gup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 69a777f4fc5c..f87a7d90f2ae 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -776,7 +776,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,

p4dp = p4d_offset(pgdp, address);
p4d = READ_ONCE(*p4dp);
- if (p4d_none(p4d))
+ if (p4d_none(p4d) || !p4d_present(p4d))
return no_page_table(vma, flags);
BUILD_BUG_ON(p4d_huge(p4d));
if (unlikely(p4d_bad(p4d)))
@@ -3069,7 +3069,7 @@ static int gup_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr, unsigned lo
p4d_t p4d = READ_ONCE(*p4dp);

next = p4d_addr_end(addr, end);
- if (p4d_none(p4d))
+ if (p4d_none(p4d) || !p4d_present(p4d))
return 0;
BUILD_BUG_ON(p4d_huge(p4d));
if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) {
--
2.44.0


2024-03-06 10:43:10

by Peter Xu

Subject: [PATCH RFC 04/13] mm/x86: Change pXd_huge() behavior to exclude swap entries

From: Peter Xu <[email protected]>

This patch partially reverts the commits below:

3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry")
cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage")

Right now, the pXd_huge() definition is unclear across the kernel. We have
two groups that treat swap entries differently:

- x86/sparc: Allow pXd_huge() to accept swap entries
- all the rest: Don't allow pXd_huge() to accept swap entries

This is so confusing. The sparc helpers seem to have been added in 2016,
after x86's (2015), so sparc may simply have followed the trend. x86
proposed the swap handling in 2015 to resolve hugetlb swap entries hit in
GUP, but GUP now guards swap entries with !pXd_present() at all levels, so
we should be safe.
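
For context, the guard ordering GUP relies on now looks roughly like this
(a simplified sketch based on the gup_pmd_range() change later in this
series; the pmd_guard_example() name is made up):

/* Sketch only: the present check filters swap entries before the leaf test */
static int pmd_guard_example(pmd_t *pmdp)
{
	pmd_t pmd = READ_ONCE(*pmdp);

	if (!pmd_present(pmd))	/* none and swap entries bail out here */
		return 0;
	if (pmd_leaf(pmd))	/* only present huge mappings reach this */
		return 1;
	return 0;		/* present table entry: walk the next level */
}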

We should define this API properly, one way or another, rather than keep
it defined differently across archs.

Gut feeling tells me that pXd_huge() shouldn't include swap entries, and
it turns out I'm not the only one thinking so; the question was already
raised by Ville Syrjälä when the current x86 pmd_huge() was proposed:

https://lore.kernel.org/all/[email protected]/

I might also be missing something obvious, but why is it even necessary
to treat PRESENT==0+PSE==0 as a huge entry?

It was also questioned when Jason Gunthorpe reviewed another patchset on
swap entry handling:

https://lore.kernel.org/all/[email protected]/

Revert the meaning back to the original. There shouldn't be any functional
change, as we should already have explicit !pXd_present() guards
everywhere.

Note that I also dropped the "#if CONFIG_PGTABLE_LEVELS > 2" check; it was
probably there because things broke when 3a194f3f8ad0 was proposed,
according to the report here:

https://lore.kernel.org/all/[email protected]/

Now we shouldn't need that.

Instead of reverting to the raw _PAGE_PSE check, leverage pXd_leaf().

Cc: Naoya Horiguchi <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/x86/mm/hugetlbpage.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 5804bbae4f01..8362953a24ce 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -20,29 +20,19 @@
#include <asm/elf.h>

/*
- * pmd_huge() returns 1 if @pmd is hugetlb related entry, that is normal
- * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
- * Otherwise, returns 0.
+ * pmd_huge() returns 1 if @pmd is hugetlb related entry.
*/
int pmd_huge(pmd_t pmd)
{
- return !pmd_none(pmd) &&
- (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
+ return pmd_leaf(pmd);
}

/*
- * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
- * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
- * Otherwise, returns 0.
+ * pud_huge() returns 1 if @pud is hugetlb related entry.
*/
int pud_huge(pud_t pud)
{
-#if CONFIG_PGTABLE_LEVELS > 2
- return !pud_none(pud) &&
- (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
-#else
- return 0;
-#endif
+ return pud_leaf(pud);
}

#ifdef CONFIG_HUGETLB_PAGE
--
2.44.0


2024-03-06 10:43:49

by Peter Xu

Subject: [PATCH RFC 07/13] mm/arm: Redefine pmd_huge() with pmd_leaf()

From: Peter Xu <[email protected]>

Most of the archs already define these two APIs the same way. ARM is more
complicated in two aspects:

- pXd_huge() always checks against !PXD_TABLE_BIT, while pXd_leaf()
  always checks against PXD_TYPE_SECT.

- The SECT/TABLE bits are defined differently for 2-level vs. 3-level ARM
  pgtables, which makes the whole thing even harder to follow.

Luckily, the second complexity should be hidden by the pmd_leaf()
implementations in the 2-level vs. 3-level headers. Invoke pmd_leaf()
directly from pmd_huge() to remove the first part of the complexity (see
the sketch below). This prepares for dropping the pXd_huge() API globally.

While at it, drop the obsolete comment, which is outdated.
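
For reference, these are the two checks being unified (the pmd_huge() body
is the one removed below; the pmd_leaf() shown is the 2-level macro from
the previous patch; illustration only):

/* Removed below: "valid and not a table entry" */
int pmd_huge(pmd_t pmd)
{
	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
}

/* 2-level pmd_leaf() after the previous patch: an explicit section check */
#define pmd_leaf(pmd)	(pmd_val(pmd) & PMD_TYPE_SECT)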

Cc: Russell King <[email protected]>
Cc: Shawn Guo <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Konrad Dybcio <[email protected]>
Cc: Fabio Estevam <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/arm/mm/hugetlbpage.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
index dd7a0277c5c0..c2fa643f6bb5 100644
--- a/arch/arm/mm/hugetlbpage.c
+++ b/arch/arm/mm/hugetlbpage.c
@@ -18,11 +18,6 @@
#include <asm/tlb.h>
#include <asm/tlbflush.h>

-/*
- * On ARM, huge pages are backed by pmd's rather than pte's, so we do a lot
- * of type casting from pmd_t * to pte_t *.
- */
-
int pud_huge(pud_t pud)
{
return 0;
@@ -30,5 +25,5 @@ int pud_huge(pud_t pud)

int pmd_huge(pmd_t pmd)
{
- return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
+ return pmd_leaf(pmd);
}
--
2.44.0


2024-03-06 10:44:04

by Peter Xu

Subject: [PATCH RFC 08/13] mm/arm64: Merge pXd_huge() and pXd_leaf() definitions

From: Peter Xu <[email protected]>

Unlike most archs, aarch64 defines pXd_huge() and pXd_leaf() slightly
differently. Redefine pXd_huge() with pXd_leaf().

There used to be two traps in the old aarch64 definitions of these APIs,
which I found while reading the surrounding code:

(1) 4797ec2dc83a ("arm64: fix pud_huge() for 2-level pagetables")
(2) 23bc8f69f0ec ("arm64: mm: fix p?d_leaf()")

Defining pXd_huge() with the current pXd_leaf() makes sure (2) isn't a
problem (on PROT_NONE checks). To make sure it also covers (1), move the
__PAGETABLE_PMD_FOLDED check over to pud_leaf(), so that it constantly
returns "false" on 2-level pgtables; that looks even safer as it now
covers both cases.

Cc: Muchun Song <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/arm64/include/asm/pgtable.h | 4 ++++
arch/arm64/mm/hugetlbpage.c | 8 ++------
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 401087e8a43d..14d24c357c7a 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -704,7 +704,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
#define pud_none(pud) (!pud_val(pud))
#define pud_bad(pud) (!pud_table(pud))
#define pud_present(pud) pte_present(pud_pte(pud))
+#ifndef __PAGETABLE_PMD_FOLDED
#define pud_leaf(pud) (pud_present(pud) && !pud_table(pud))
+#else
+#define pud_leaf(pud) false
+#endif
#define pud_valid(pud) pte_valid(pud_pte(pud))
#define pud_user(pud) pte_user(pud_pte(pud))
#define pud_user_exec(pud) pte_user_exec(pud_pte(pud))
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 0f0e10bb0a95..1234bbaef5bf 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -81,16 +81,12 @@ bool arch_hugetlb_migration_supported(struct hstate *h)

int pmd_huge(pmd_t pmd)
{
- return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
+ return pmd_leaf(pmd);
}

int pud_huge(pud_t pud)
{
-#ifndef __PAGETABLE_PMD_FOLDED
- return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
-#else
- return 0;
-#endif
+ return pud_leaf(pud);
}

static int find_num_contig(struct mm_struct *mm, unsigned long addr,
--
2.44.0


2024-03-06 10:44:24

by Peter Xu

Subject: [PATCH RFC 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

From: Peter Xu <[email protected]>

PowerPC book3s 4K mostly defines the two the same way, except that
pXd_huge() constantly returns 0 for hash MMUs. AFAICT that is fine to
remove, because pXd_huge() reflects a hugetlb entry, while powerpc's own
hugetlb pgtable lookup function (__find_linux_pte(), shared by all powerpc
code) already uses pXd_leaf() regardless of the MMU type. That means
pXd_leaf() should work fine with hash MMU pgtables, or something would
already have gone wrong.

The goal is to have one API, pXd_leaf(), to detect all kinds of huge
mappings. AFAICT we need to use the pXd_leaf() implementation (rather than
the pXd_huge() one) to make sure that e.g. THPs on hash MMUs will also
return true.

This helps to simplify a follow up patch to drop pXd_huge() treewide.
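
For reference, the book3s/64 pmd_leaf() that pmd_huge() now resolves to is
a plain _PAGE_PTE check, independent of the MMU type (copied from
asm/book3s/64/pgtable.h purely for illustration):

#define pmd_leaf pmd_leaf
static inline bool pmd_leaf(pmd_t pmd)
{
	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
}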

Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/powerpc/include/asm/book3s/64/pgtable-4k.h | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
index 48f21820afe2..92545981bb49 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
@@ -8,22 +8,12 @@
#ifdef CONFIG_HUGETLB_PAGE
static inline int pmd_huge(pmd_t pmd)
{
- /*
- * leaf pte for huge page
- */
- if (radix_enabled())
- return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
- return 0;
+ return pmd_leaf(pmd);
}

static inline int pud_huge(pud_t pud)
{
- /*
- * leaf pte for huge page
- */
- if (radix_enabled())
- return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
- return 0;
+ return pud_leaf(pud);
}

/*
--
2.44.0


2024-03-06 10:44:54

by Peter Xu

Subject: [PATCH RFC 06/13] mm/arm: Use macros to define pmd/pud helpers

From: Peter Xu <[email protected]>

It's already confusing that ARM 2-level vs. 3-level pgtables define the
SECT/TABLE bits differently for pmds/puds. Always use a macro, which is
much clearer.

Cc: Russell King <[email protected]>
Cc: Shawn Guo <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Konrad Dybcio <[email protected]>
Cc: Fabio Estevam <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/arm/include/asm/pgtable-2level.h | 4 ++--
arch/arm/include/asm/pgtable-3level-hwdef.h | 1 +
arch/arm/include/asm/pgtable-3level.h | 4 ++--
3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index b0a262566eb9..4245c2e74720 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -213,8 +213,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)

#define pmd_pfn(pmd) (__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))

-#define pmd_leaf(pmd) (pmd_val(pmd) & 2)
-#define pmd_bad(pmd) (pmd_val(pmd) & 2)
+#define pmd_leaf(pmd) (pmd_val(pmd) & PMD_TYPE_SECT)
+#define pmd_bad(pmd) pmd_leaf(pmd)
#define pmd_present(pmd) (pmd_val(pmd))

#define copy_pmd(pmdpd,pmdps) \
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index 2f35b4eddaa8..e7b666cf0060 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -14,6 +14,7 @@
* + Level 1/2 descriptor
* - common
*/
+#define PUD_TABLE_BIT (_AT(pmdval_t, 1) << 1)
#define PMD_TYPE_MASK (_AT(pmdval_t, 3) << 0)
#define PMD_TYPE_FAULT (_AT(pmdval_t, 0) << 0)
#define PMD_TYPE_TABLE (_AT(pmdval_t, 3) << 0)
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 4b1d9eb3908a..e7aecbef75c9 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -112,7 +112,7 @@
#ifndef __ASSEMBLY__

#define pud_none(pud) (!pud_val(pud))
-#define pud_bad(pud) (!(pud_val(pud) & 2))
+#define pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
#define pud_present(pud) (pud_val(pud))
#define pmd_table(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
PMD_TYPE_TABLE)
@@ -137,7 +137,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
}

-#define pmd_bad(pmd) (!(pmd_val(pmd) & 2))
+#define pmd_bad(pmd) (!(pmd_val(pmd) & PMD_TABLE_BIT))

#define copy_pmd(pmdpd,pmdps) \
do { \
--
2.44.0


2024-03-06 10:44:58

by Peter Xu

Subject: [PATCH RFC 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf()

From: Peter Xu <[email protected]>

Now that we're sure all pXd_huge() definitions are the same as pXd_leaf(),
reuse the latter. Luckily, pXd_huge() isn't widely used.

Signed-off-by: Peter Xu <[email protected]>
---
arch/arm/include/asm/pgtable-3level.h | 2 +-
arch/arm64/include/asm/pgtable.h | 2 +-
arch/arm64/mm/hugetlbpage.c | 4 ++--
arch/loongarch/mm/hugetlbpage.c | 2 +-
arch/mips/mm/tlb-r4k.c | 2 +-
arch/powerpc/mm/pgtable_64.c | 6 +++---
arch/x86/mm/pgtable.c | 4 ++--
mm/gup.c | 4 ++--
mm/hmm.c | 2 +-
mm/memory.c | 2 +-
10 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index e7aecbef75c9..9e3c44f0aea2 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -190,7 +190,7 @@ static inline pte_t pte_mkspecial(pte_t pte)
#define pmd_dirty(pmd) (pmd_isset((pmd), L_PMD_SECT_DIRTY))

#define pmd_hugewillfault(pmd) (!pmd_young(pmd) || !pmd_write(pmd))
-#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
+#define pmd_thp_or_huge(pmd) (pmd_leaf(pmd) || pmd_trans_huge(pmd))

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define pmd_trans_huge(pmd) (pmd_val(pmd) && !pmd_table(pmd))
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 14d24c357c7a..c4efa47fed5f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -512,7 +512,7 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
return pmd;
}

-#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
+#define pmd_thp_or_huge(pmd) (pmd_leaf(pmd) || pmd_trans_huge(pmd))

#define pmd_write(pmd) pte_write(pmd_pte(pmd))

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 1234bbaef5bf..f494fc31201f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -321,7 +321,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
if (sz != PUD_SIZE && pud_none(pud))
return NULL;
/* hugepage or swap? */
- if (pud_huge(pud) || !pud_present(pud))
+ if (pud_leaf(pud) || !pud_present(pud))
return (pte_t *)pudp;
/* table; check the next level */

@@ -333,7 +333,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
if (!(sz == PMD_SIZE || sz == CONT_PMD_SIZE) &&
pmd_none(pmd))
return NULL;
- if (pmd_huge(pmd) || !pmd_present(pmd))
+ if (pmd_leaf(pmd) || !pmd_present(pmd))
return (pte_t *)pmdp;

if (sz == CONT_PTE_SIZE)
diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c
index 1e76fcb83093..a4e78e74aa21 100644
--- a/arch/loongarch/mm/hugetlbpage.c
+++ b/arch/loongarch/mm/hugetlbpage.c
@@ -64,7 +64,7 @@ uint64_t pmd_to_entrylo(unsigned long pmd_val)
{
uint64_t val;
/* PMD as PTE. Must be huge page */
- if (!pmd_huge(__pmd(pmd_val)))
+ if (!pmd_leaf(__pmd(pmd_val)))
panic("%s", __func__);

val = pmd_val ^ _PAGE_HUGE;
diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index 4106084e57d7..76f3b9c0a9f0 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -326,7 +326,7 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
idx = read_c0_index();
#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
/* this could be a huge page */
- if (pmd_huge(*pmdp)) {
+ if (pmd_leaf(*pmdp)) {
unsigned long lo;
write_c0_pagemask(PM_HUGE_MASK);
ptep = (pte_t *)pmdp;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 9b99113cb51a..6621cfc3baf8 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -102,7 +102,7 @@ struct page *p4d_page(p4d_t p4d)
{
if (p4d_leaf(p4d)) {
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
- VM_WARN_ON(!p4d_huge(p4d));
+ VM_WARN_ON(!p4d_leaf(p4d));
return pte_page(p4d_pte(p4d));
}
return virt_to_page(p4d_pgtable(p4d));
@@ -113,7 +113,7 @@ struct page *pud_page(pud_t pud)
{
if (pud_leaf(pud)) {
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
- VM_WARN_ON(!pud_huge(pud));
+ VM_WARN_ON(!pud_leaf(pud));
return pte_page(pud_pte(pud));
}
return virt_to_page(pud_pgtable(pud));
@@ -132,7 +132,7 @@ struct page *pmd_page(pmd_t pmd)
* enabled so these checks can't be used.
*/
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
- VM_WARN_ON(!(pmd_leaf(pmd) || pmd_huge(pmd)));
+ VM_WARN_ON(!pmd_leaf(pmd));
return pte_page(pmd_pte(pmd));
}
return virt_to_page(pmd_page_vaddr(pmd));
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ff690ddc2334..d74f0814e086 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -731,7 +731,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
return 0;

/* Bail out if we are we on a populated non-leaf entry: */
- if (pud_present(*pud) && !pud_huge(*pud))
+ if (pud_present(*pud) && !pud_leaf(*pud))
return 0;

set_pte((pte_t *)pud, pfn_pte(
@@ -760,7 +760,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
}

/* Bail out if we are we on a populated non-leaf entry: */
- if (pmd_present(*pmd) && !pmd_huge(*pmd))
+ if (pmd_present(*pmd) && !pmd_leaf(*pmd))
return 0;

set_pte((pte_t *)pmd, pfn_pte(
diff --git a/mm/gup.c b/mm/gup.c
index de7188d2af86..9c0bc9d49c32 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -778,7 +778,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
p4d = READ_ONCE(*p4dp);
if (p4d_none(p4d) || !p4d_present(p4d))
return no_page_table(vma, flags);
- BUILD_BUG_ON(p4d_huge(p4d));
+ BUILD_BUG_ON(p4d_leaf(p4d));
if (unlikely(p4d_bad(p4d)))
return no_page_table(vma, flags);

@@ -3070,7 +3070,7 @@ static int gup_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr, unsigned lo
next = p4d_addr_end(addr, end);
if (p4d_none(p4d) || !p4d_present(p4d))
return 0;
- BUILD_BUG_ON(p4d_huge(p4d));
+ BUILD_BUG_ON(p4d_leaf(p4d));
if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) {
if (!gup_huge_pd(__hugepd(p4d_val(p4d)), addr,
P4D_SHIFT, next, flags, pages, nr))
diff --git a/mm/hmm.c b/mm/hmm.c
index c44391a0246e..3b526c529b55 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -429,7 +429,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
return hmm_vma_walk_hole(start, end, -1, walk);
}

- if (pud_huge(pud) && pud_devmap(pud)) {
+ if (pud_leaf(pud) && pud_devmap(pud)) {
unsigned long i, npages, pfn;
unsigned int required_fault;
unsigned long *hmm_pfns;
diff --git a/mm/memory.c b/mm/memory.c
index e17669d4f72f..96562053be9c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2763,7 +2763,7 @@ static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
unsigned long next;
int err = 0;

- BUG_ON(pud_huge(*pud));
+ BUG_ON(pud_leaf(*pud));

if (create) {
pmd = pmd_alloc_track(mm, pud, addr, mask);
--
2.44.0


2024-03-06 10:45:12

by Peter Xu

Subject: [PATCH RFC 12/13] mm/treewide: Remove pXd_huge()

From: Peter Xu <[email protected]>

This API is not used anymore; drop it across the whole tree.

Signed-off-by: Peter Xu <[email protected]>
---
arch/arm/mm/Makefile | 1 -
arch/arm/mm/hugetlbpage.c | 29 -------------------
arch/arm64/mm/hugetlbpage.c | 10 -------
arch/loongarch/mm/hugetlbpage.c | 10 -------
arch/mips/include/asm/pgtable-32.h | 2 +-
arch/mips/include/asm/pgtable-64.h | 2 +-
arch/mips/mm/hugetlbpage.c | 10 -------
arch/parisc/mm/hugetlbpage.c | 11 -------
.../include/asm/book3s/64/pgtable-4k.h | 10 -------
.../include/asm/book3s/64/pgtable-64k.h | 25 ----------------
arch/powerpc/include/asm/book3s/64/pgtable.h | 3 --
arch/powerpc/include/asm/nohash/pgtable.h | 10 -------
arch/riscv/mm/hugetlbpage.c | 10 -------
arch/s390/mm/hugetlbpage.c | 10 -------
arch/sh/mm/hugetlbpage.c | 10 -------
arch/sparc/mm/hugetlbpage.c | 10 -------
arch/x86/mm/hugetlbpage.c | 16 ----------
include/linux/hugetlb.h | 24 ---------------
18 files changed, 2 insertions(+), 201 deletions(-)
delete mode 100644 arch/arm/mm/hugetlbpage.c

diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 71b858c9b10c..1779e12db085 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -21,7 +21,6 @@ KASAN_SANITIZE_physaddr.o := n
obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o

obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o
-obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_ARM_PV_FIXUP) += pv-fixup-asm.o

obj-$(CONFIG_CPU_ABRT_NOMMU) += abort-nommu.o
diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
deleted file mode 100644
index c2fa643f6bb5..000000000000
--- a/arch/arm/mm/hugetlbpage.c
+++ /dev/null
@@ -1,29 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * arch/arm/mm/hugetlbpage.c
- *
- * Copyright (C) 2012 ARM Ltd.
- *
- * Based on arch/x86/include/asm/hugetlb.h and Bill Carson's patches
- */
-
-#include <linux/init.h>
-#include <linux/fs.h>
-#include <linux/mm.h>
-#include <linux/hugetlb.h>
-#include <linux/pagemap.h>
-#include <linux/err.h>
-#include <linux/sysctl.h>
-#include <asm/mman.h>
-#include <asm/tlb.h>
-#include <asm/tlbflush.h>
-
-int pud_huge(pud_t pud)
-{
- return 0;
-}
-
-int pmd_huge(pmd_t pmd)
-{
- return pmd_leaf(pmd);
-}
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index f494fc31201f..ca58210d6c07 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -79,16 +79,6 @@ bool arch_hugetlb_migration_supported(struct hstate *h)
}
#endif

-int pmd_huge(pmd_t pmd)
-{
- return pmd_leaf(pmd);
-}
-
-int pud_huge(pud_t pud)
-{
- return pud_leaf(pud);
-}
-
static int find_num_contig(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, size_t *pgsize)
{
diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c
index a4e78e74aa21..12222c56cb59 100644
--- a/arch/loongarch/mm/hugetlbpage.c
+++ b/arch/loongarch/mm/hugetlbpage.c
@@ -50,16 +50,6 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
return (pte_t *) pmd;
}

-int pmd_huge(pmd_t pmd)
-{
- return (pmd_val(pmd) & _PAGE_HUGE) != 0;
-}
-
-int pud_huge(pud_t pud)
-{
- return (pud_val(pud) & _PAGE_HUGE) != 0;
-}
-
uint64_t pmd_to_entrylo(unsigned long pmd_val)
{
uint64_t val;
diff --git a/arch/mips/include/asm/pgtable-32.h b/arch/mips/include/asm/pgtable-32.h
index 0e196650f4f4..92b7591aac2a 100644
--- a/arch/mips/include/asm/pgtable-32.h
+++ b/arch/mips/include/asm/pgtable-32.h
@@ -129,7 +129,7 @@ static inline int pmd_none(pmd_t pmd)
static inline int pmd_bad(pmd_t pmd)
{
#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
- /* pmd_huge(pmd) but inline */
+ /* pmd_leaf(pmd) but inline */
if (unlikely(pmd_val(pmd) & _PAGE_HUGE))
return 0;
#endif
diff --git a/arch/mips/include/asm/pgtable-64.h b/arch/mips/include/asm/pgtable-64.h
index 20ca48c1b606..7c28510b3768 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -245,7 +245,7 @@ static inline int pmd_none(pmd_t pmd)
static inline int pmd_bad(pmd_t pmd)
{
#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
- /* pmd_huge(pmd) but inline */
+ /* pmd_leaf(pmd) but inline */
if (unlikely(pmd_val(pmd) & _PAGE_HUGE))
return 0;
#endif
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index 7eaff5b07873..0b9e15555b59 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -57,13 +57,3 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
}
return (pte_t *) pmd;
}
-
-int pmd_huge(pmd_t pmd)
-{
- return (pmd_val(pmd) & _PAGE_HUGE) != 0;
-}
-
-int pud_huge(pud_t pud)
-{
- return (pud_val(pud) & _PAGE_HUGE) != 0;
-}
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index a9f7e21f6656..0356199bd9e7 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -180,14 +180,3 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
}
return changed;
}
-
-
-int pmd_huge(pmd_t pmd)
-{
- return 0;
-}
-
-int pud_huge(pud_t pud)
-{
- return 0;
-}
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
index 92545981bb49..baf934578c3a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
@@ -6,16 +6,6 @@
*/
#ifndef __ASSEMBLY__
#ifdef CONFIG_HUGETLB_PAGE
-static inline int pmd_huge(pmd_t pmd)
-{
- return pmd_leaf(pmd);
-}
-
-static inline int pud_huge(pud_t pud)
-{
- return pud_leaf(pud);
-}
-
/*
* With radix , we have hugepage ptes in the pud and pmd entries. We don't
* need to setup hugepage directory for them. Our pte and page directory format
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
index 2fce3498b000..579a7153857f 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
@@ -4,31 +4,6 @@

#ifndef __ASSEMBLY__
#ifdef CONFIG_HUGETLB_PAGE
-/*
- * We have PGD_INDEX_SIZ = 12 and PTE_INDEX_SIZE = 8, so that we can have
- * 16GB hugepage pte in PGD and 16MB hugepage pte at PMD;
- *
- * Defined in such a way that we can optimize away code block at build time
- * if CONFIG_HUGETLB_PAGE=n.
- *
- * returns true for pmd migration entries, THP, devmap, hugetlb
- * But compile time dependent on CONFIG_HUGETLB_PAGE
- */
-static inline int pmd_huge(pmd_t pmd)
-{
- /*
- * leaf pte for huge page
- */
- return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
-static inline int pud_huge(pud_t pud)
-{
- /*
- * leaf pte for huge page
- */
- return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
-}

/*
* With 64k page size, we have hugepage ptes in the pgd and pmd entries. We don't
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index df66dce8306f..05a91c99d431 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1436,9 +1436,6 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_va
return false;
}

-/*
- * Like pmd_huge(), but works regardless of config options
- */
#define pmd_leaf pmd_leaf
static inline bool pmd_leaf(pmd_t pmd)
{
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 427db14292c9..f5f39d4f03c8 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -351,16 +351,6 @@ static inline int hugepd_ok(hugepd_t hpd)
#endif
}

-static inline int pmd_huge(pmd_t pmd)
-{
- return 0;
-}
-
-static inline int pud_huge(pud_t pud)
-{
- return 0;
-}
-
#define is_hugepd(hpd) (hugepd_ok(hpd))
#endif

diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 29c7606414d2..9a4bc4bd2a01 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -399,16 +399,6 @@ static bool is_napot_size(unsigned long size)

#endif /*CONFIG_RISCV_ISA_SVNAPOT*/

-int pud_huge(pud_t pud)
-{
- return pud_leaf(pud);
-}
-
-int pmd_huge(pmd_t pmd)
-{
- return pmd_leaf(pmd);
-}
-
static bool __hugetlb_valid_size(unsigned long size)
{
if (size == HPAGE_SIZE)
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index c2e8242bd15d..ca43b6fce71c 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -233,16 +233,6 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
return (pte_t *) pmdp;
}

-int pmd_huge(pmd_t pmd)
-{
- return pmd_leaf(pmd);
-}
-
-int pud_huge(pud_t pud)
-{
- return pud_leaf(pud);
-}
-
bool __init arch_hugetlb_valid_size(unsigned long size)
{
if (MACHINE_HAS_EDAT1 && size == PMD_SIZE)
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index 6cb0ad73dbb9..ff209b55285a 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -70,13 +70,3 @@ pte_t *huge_pte_offset(struct mm_struct *mm,

return pte;
}
-
-int pmd_huge(pmd_t pmd)
-{
- return 0;
-}
-
-int pud_huge(pud_t pud)
-{
- return 0;
-}
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index d31c2cec35c9..8ed5bdf95d25 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -407,16 +407,6 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
return entry;
}

-int pmd_huge(pmd_t pmd)
-{
- return pmd_leaf(pmd);;
-}
-
-int pud_huge(pud_t pud)
-{
- return pud_leaf(pud);
-}
-
static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
unsigned long addr)
{
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 8362953a24ce..dab6db288e5b 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -19,22 +19,6 @@
#include <asm/tlbflush.h>
#include <asm/elf.h>

-/*
- * pmd_huge() returns 1 if @pmd is hugetlb related entry.
- */
-int pmd_huge(pmd_t pmd)
-{
- return pmd_leaf(pmd);
-}
-
-/*
- * pud_huge() returns 1 if @pud is hugetlb related entry.
- */
-int pud_huge(pud_t pud)
-{
- return pud_leaf(pud);
-}
-
#ifdef CONFIG_HUGETLB_PAGE
static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long addr, unsigned long len,
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 77b30a8c6076..300de33c6fde 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -272,13 +272,9 @@ void hugetlb_vma_unlock_write(struct vm_area_struct *vma);
int hugetlb_vma_trylock_write(struct vm_area_struct *vma);
void hugetlb_vma_assert_locked(struct vm_area_struct *vma);
void hugetlb_vma_lock_release(struct kref *kref);
-
-int pmd_huge(pmd_t pmd);
-int pud_huge(pud_t pud);
long hugetlb_change_protection(struct vm_area_struct *vma,
unsigned long address, unsigned long end, pgprot_t newprot,
unsigned long cp_flags);
-
bool is_hugetlb_entry_migration(pte_t pte);
bool is_hugetlb_entry_hwpoisoned(pte_t pte);
void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
@@ -399,16 +395,6 @@ static inline void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
{
}

-static inline int pmd_huge(pmd_t pmd)
-{
- return 0;
-}
-
-static inline int pud_huge(pud_t pud)
-{
- return 0;
-}
-
static inline int is_hugepage_only_range(struct mm_struct *mm,
unsigned long addr, unsigned long len)
{
@@ -493,16 +479,6 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { }

#endif /* !CONFIG_HUGETLB_PAGE */
-/*
- * hugepages at page global directory. If arch support
- * hugepages at pgd level, they need to define this.
- */
-#ifndef pgd_huge
-#define pgd_huge(x) 0
-#endif
-#ifndef p4d_huge
-#define p4d_huge(x) 0
-#endif

#ifndef pgd_write
static inline int pgd_write(pgd_t pgd)
--
2.44.0


2024-03-06 10:45:25

by Peter Xu

Subject: [PATCH RFC 13/13] mm: Document pXd_leaf() API

From: Peter Xu <[email protected]>

There's already a small section on this, but since we're going to remove
pXd_huge(), that comment would become obsolete.

Rewrite that section with more information, hopefully making it crystal
clear what the API implies.
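
As a usage illustration of the documented contract (the helper name below
is made up and not part of the patch): check presence first, then the leaf
state.

/* Hypothetical helper showing the intended call pattern */
static bool entry_maps_huge_page(pmd_t *pmdp)
{
	pmd_t pmd = READ_ONCE(*pmdp);

	if (!pmd_present(pmd))	/* required first: rules out swap entries */
		return false;
	return pmd_leaf(pmd);	/* true for THP, devmap and hugetlb leaves */
}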

Signed-off-by: Peter Xu <[email protected]>
---
include/linux/pgtable.h | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 85fc7554cd52..6b0d222a7fad 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1770,11 +1770,25 @@ typedef unsigned int pgtbl_mod_mask;
#endif

/*
- * p?d_leaf() - true if this entry is a final mapping to a physical address.
- * This differs from p?d_huge() by the fact that they are always available (if
- * the architecture supports large pages at the appropriate level) even
- * if CONFIG_HUGETLB_PAGE is not defined.
- * Only meaningful when called on a valid entry.
+ * pXd_leaf() is the API to check whether a pgtable entry is a huge page
+ * mapping. It should work globally across all archs, without any
+ * dependency on CONFIG_* options. For architectures that do not support
+ * huge mappings on specific levels, below fallbacks will be used.
+ *
+ * A leaf pgtable entry should always imply the following:
+ *
+ * - It is a "present" entry. IOW, before using this API, please check it
+ * with pXd_present() first. NOTE: it may not always mean the "present
+ * bit" is set. For example, PROT_NONE entries are always "present".
+ *
+ * - It should _never_ be a swap entry of any type. Above "present" check
+ * should have guarded this, but let's be crystal clear on this.
+ *
+ * - It should contain a huge PFN, which points to a huge page larger than
+ * PAGE_SIZE of the platform. The PFN format isn't important here.
+ *
+ * - It should cover all kinds of huge mappings (e.g., pXd_trans_huge(),
+ * pXd_devmap(), or hugetlb mappings).
*/
#ifndef pgd_leaf
#define pgd_leaf(x) false
--
2.44.0


2024-03-06 10:45:46

by Peter Xu

Subject: [PATCH RFC 10/13] mm/gup: Merge pXd huge mapping checks

From: Peter Xu <[email protected]>

Huge mapping checks in GUP are slightly redundant and can be simplified.

pXd_huge() now is the same as pXd_leaf(). pmd_trans_huge() and
pXd_devmap() should both imply pXd_leaf(). Time to merge them into one.

Signed-off-by: Peter Xu <[email protected]>
---
mm/gup.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index f87a7d90f2ae..de7188d2af86 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3005,8 +3005,7 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
if (!pmd_present(pmd))
return 0;

- if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
- pmd_devmap(pmd))) {
+ if (unlikely(pmd_leaf(pmd))) {
/* See gup_pte_range() */
if (pmd_protnone(pmd))
return 0;
@@ -3043,7 +3042,7 @@ static int gup_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr, unsigned lo
next = pud_addr_end(addr, end);
if (unlikely(!pud_present(pud)))
return 0;
- if (unlikely(pud_huge(pud) || pud_devmap(pud))) {
+ if (unlikely(pud_leaf(pud))) {
if (!gup_huge_pud(pud, pudp, addr, next, flags,
pages, nr))
return 0;
@@ -3096,7 +3095,7 @@ static void gup_pgd_range(unsigned long addr, unsigned long end,
next = pgd_addr_end(addr, end);
if (pgd_none(pgd))
return;
- if (unlikely(pgd_huge(pgd))) {
+ if (unlikely(pgd_leaf(pgd))) {
if (!gup_huge_pgd(pgd, pgdp, addr, next, flags,
pages, nr))
return;
--
2.44.0


2024-03-06 11:16:19

by Peter Xu

Subject: [PATCH RFC 01/13] mm/hmm: Process pud swap entry without pud_huge()

From: Peter Xu <[email protected]>

Swap pud entries do not return true for pud_huge() on all archs. x86 and
sparc (so far) allow it, but all the rest do not report a swap entry as
pud_huge(). So it's not safe to rely on pud_huge() to catch swap entries.
Check for swap entries before pud_huge(), so it is always safe.

This is the only place in the kernel that (IMHO, wrongly) relies on
pud_huge() returning true for pud swap entries. The plan is to clean up
pXd_huge() so that it only reports non-swap mappings on all archs.

Cc: Alistair Popple <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
mm/hmm.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 277ddcab4947..c44391a0246e 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -424,7 +424,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
walk->action = ACTION_CONTINUE;

pud = READ_ONCE(*pudp);
- if (pud_none(pud)) {
+ if (pud_none(pud) || !pud_present(pud)) {
spin_unlock(ptl);
return hmm_vma_walk_hole(start, end, -1, walk);
}
@@ -435,11 +435,6 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
unsigned long *hmm_pfns;
unsigned long cpu_flags;

- if (!pud_present(pud)) {
- spin_unlock(ptl);
- return hmm_vma_walk_hole(start, end, -1, walk);
- }
-
i = (addr - range->start) >> PAGE_SHIFT;
npages = (end - addr) >> PAGE_SHIFT;
hmm_pfns = &range->hmm_pfns[i];
--
2.44.0


2024-03-06 11:18:59

by Peter Xu

Subject: [PATCH RFC 05/13] mm/sparc: Change pXd_huge() behavior to exclude swap entries

From: Peter Xu <[email protected]>

Please refer to the previous (x86) patch for the reasoning. Sparc is now
the only architecture that still allows swap entries to be reported as
pXd_huge(). After this patch, all architectures forbid swap entries in
pXd_huge().

Cc: David S. Miller <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
arch/sparc/mm/hugetlbpage.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index b432500c13a5..d31c2cec35c9 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -409,14 +409,12 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,

int pmd_huge(pmd_t pmd)
{
- return !pmd_none(pmd) &&
- (pmd_val(pmd) & (_PAGE_VALID|_PAGE_PMD_HUGE)) != _PAGE_VALID;
+ return pmd_leaf(pmd);;
}

int pud_huge(pud_t pud)
{
- return !pud_none(pud) &&
- (pud_val(pud) & (_PAGE_VALID|_PAGE_PUD_HUGE)) != _PAGE_VALID;
+ return pud_leaf(pud);
}

static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
--
2.44.0


2024-03-06 12:57:50

by Michael Ellerman

Subject: Re: [PATCH RFC 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

[email protected] writes:
> From: Peter Xu <[email protected]>
>
> PowerPC book3s 4K mostly defines the two the same way, except that
> pXd_huge() constantly returns 0 for hash MMUs. AFAICT that is fine to
> remove, because pXd_huge() reflects a hugetlb entry, while powerpc's own
> hugetlb pgtable lookup function (__find_linux_pte(), shared by all powerpc
> code) already uses pXd_leaf() regardless of the MMU type. That means
> pXd_leaf() should work fine with hash MMU pgtables, or something would
> already have gone wrong.

Yes I think that's correct.

4K Hash MMU doesn't support any hugepage size at PMD or PUD level (the
geometry is wrong), so pmd/pud_huge() were written with that in mind,
ie. they are hard coded to return false.

But it should be OK to use pmd/pud_leaf(), they will actually look for
_PAGE_PTE, but it should never be set for 4K Hash.

See eg. arch/powerpc/include/asm/book3s/64/hash-4k.h:

static inline pmd_t hash__pmd_mkhuge(pmd_t pmd)
{
BUG();
return pmd;
}

> The goal is to have one API, pXd_leaf(), to detect all kinds of huge
> mappings. AFAICT we need to use the pXd_leaf() implementation (rather than
> the pXd_huge() one) to make sure that e.g. THPs on hash MMUs will also
> return true.
>
> This helps to simplify a follow up patch to drop pXd_huge() treewide.
>
> Cc: Michael Ellerman <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: "Aneesh Kumar K.V" <[email protected]>
> Cc: "Naveen N. Rao" <[email protected]>
> Cc: [email protected]
> Signed-off-by: Peter Xu <[email protected]>
> ---
> arch/powerpc/include/asm/book3s/64/pgtable-4k.h | 14 ++------------
> 1 file changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> index 48f21820afe2..92545981bb49 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> @@ -8,22 +8,12 @@
> #ifdef CONFIG_HUGETLB_PAGE
> static inline int pmd_huge(pmd_t pmd)
> {
> - /*
> - * leaf pte for huge page
> - */
> - if (radix_enabled())
> - return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> - return 0;
> + return pmd_leaf(pmd);
> }
>
> static inline int pud_huge(pud_t pud)
> {
> - /*
> - * leaf pte for huge page
> - */
> - if (radix_enabled())
> - return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> - return 0;
> + return pud_leaf(pud);
> }

This doesn't actually compile though.

arch/powerpc/include/asm/book3s/64/pgtable-4k.h:11:16: error: implicit declaration of function ‘pmd_leaf’; did you mean ‘pgd_clear’? [-Werror=implicit-function-declaration]

etc.

To make it compile we'd need to relocate the pmd/pud_leaf() definitions:

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index df66dce8306f..fd7180fded75 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -262,6 +262,18 @@ extern unsigned long __kernel_io_end;

extern struct page *vmemmap;
extern unsigned long pci_io_base;
+
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
+{
+ return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
+{
+ return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
#endif /* __ASSEMBLY__ */

#include <asm/book3s/64/hash.h>
@@ -1436,20 +1448,5 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_va
return false;
}

-/*
- * Like pmd_huge(), but works regardless of config options
- */
-#define pmd_leaf pmd_leaf
-static inline bool pmd_leaf(pmd_t pmd)
-{
- return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
-#define pud_leaf pud_leaf
-static inline bool pud_leaf(pud_t pud)
-{
- return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
-}
-
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */

cheers

2024-03-07 03:11:12

by Peter Xu

Subject: Re: [PATCH RFC 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

On Wed, Mar 06, 2024 at 11:56:56PM +1100, Michael Ellerman wrote:
> [email protected] writes:
> > From: Peter Xu <[email protected]>
> >
> > PowerPC book3s 4K mostly defines the two the same way, except that
> > pXd_huge() constantly returns 0 for hash MMUs. AFAICT that is fine to
> > remove, because pXd_huge() reflects a hugetlb entry, while powerpc's own
> > hugetlb pgtable lookup function (__find_linux_pte(), shared by all
> > powerpc code) already uses pXd_leaf() regardless of the MMU type. That
> > means pXd_leaf() should work fine with hash MMU pgtables, or something
> > would already have gone wrong.
>
> Yes I think that's correct.
>
> 4K Hash MMU doesn't support any hugepage size at PMD or PUD level (the
> geometry is wrong), so pmd/pud_huge() were written with that in mind,
> ie. they are hard coded to return false.
>
> But it should be OK to use pmd/pud_leaf(), they will actually look for
> _PAGE_PTE, but it should never be set for 4K Hash.
>
> See eg. arch/powerpc/include/asm/book3s/64/hash-4k.h:
>
> static inline pmd_t hash__pmd_mkhuge(pmd_t pmd)
> {
> BUG();
> return pmd;
> }

Good to get confirmation on this, thanks, Michael. These explanations also
read better than what I wrote; I'll amend the commit message.

>
> > The goal is to have one API, pXd_leaf(), to detect all kinds of huge
> > mappings. AFAICT we need to use the pXd_leaf() implementation (rather
> > than the pXd_huge() one) to make sure that e.g. THPs on hash MMUs will
> > also return true.
> >
> > This helps to simplify a follow up patch to drop pXd_huge() treewide.
> >
> > Cc: Michael Ellerman <[email protected]>
> > Cc: Nicholas Piggin <[email protected]>
> > Cc: Christophe Leroy <[email protected]>
> > Cc: "Aneesh Kumar K.V" <[email protected]>
> > Cc: "Naveen N. Rao" <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > arch/powerpc/include/asm/book3s/64/pgtable-4k.h | 14 ++------------
> > 1 file changed, 2 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> > index 48f21820afe2..92545981bb49 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> > @@ -8,22 +8,12 @@
> > #ifdef CONFIG_HUGETLB_PAGE
> > static inline int pmd_huge(pmd_t pmd)
> > {
> > - /*
> > - * leaf pte for huge page
> > - */
> > - if (radix_enabled())
> > - return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> > - return 0;
> > + return pmd_leaf(pmd);
> > }
> >
> > static inline int pud_huge(pud_t pud)
> > {
> > - /*
> > - * leaf pte for huge page
> > - */
> > - if (radix_enabled())
> > - return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> > - return 0;
> > + return pud_leaf(pud);
> > }
>
> This doesn't actually compile though.
>
> arch/powerpc/include/asm/book3s/64/pgtable-4k.h:11:16: error: implicit declaration of function ‘pmd_leaf’; did you mean ‘pgd_clear’? [-Werror=implicit-function-declaration]
>
> etc.
>
> To make it compile we'd need to relocate the pmd/pud_leaf() definitions:
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index df66dce8306f..fd7180fded75 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -262,6 +262,18 @@ extern unsigned long __kernel_io_end;
>
> extern struct page *vmemmap;
> extern unsigned long pci_io_base;
> +
> +#define pmd_leaf pmd_leaf
> +static inline bool pmd_leaf(pmd_t pmd)
> +{
> + return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> +}
> +
> +#define pud_leaf pud_leaf
> +static inline bool pud_leaf(pud_t pud)
> +{
> + return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> +}
> #endif /* __ASSEMBLY__ */
>
> #include <asm/book3s/64/hash.h>
> @@ -1436,20 +1448,5 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_va
> return false;
> }
>
> -/*
> - * Like pmd_huge(), but works regardless of config options
> - */
> -#define pmd_leaf pmd_leaf
> -static inline bool pmd_leaf(pmd_t pmd)
> -{
> - return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> -}
> -
> -#define pud_leaf pud_leaf
> -static inline bool pud_leaf(pud_t pud)
> -{
> - return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> -}
> -
> #endif /* __ASSEMBLY__ */
> #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */

Thanks for the help, I'll fix that. I'm wondering when syzbot will start
feeding my series into the testers; I still rely on that feedback for
compilation issues with such treewide changes, but so far I haven't
received any reports.

I've also attached the new patch directly here in case there are any
further comments.

Thanks,

==========8<===========
From 9e75aef2141170f241577e7786aaa4bbbfd93360 Mon Sep 17 00:00:00 2001
From: Peter Xu <[email protected]>
Date: Wed, 6 Mar 2024 14:49:48 +0800
Subject: [PATCH] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

PowerPC book3s 4K mostly defines the two the same way, except that
pXd_huge() constantly returns 0 for hash MMUs. As Michael Ellerman pointed
out [1], it is safe to check _PAGE_PTE on hash MMUs, as the bit will never
be set, so the check will keep returning false.

As a reference, __p[mu]d_mkhuge() will trigger a BUG() when trying to
create such huge mappings for 4K hash MMUs. Meanwhile, the major powerpc
hugetlb pgtable walker, __find_linux_pte() (shared by all powerpc code),
already uses pXd_leaf() to check for hugetlb mappings.

The goal is to have one API, pXd_leaf(), to detect all kinds of huge
mappings. AFAICT we need to use the pXd_leaf() implementation (rather than
the pXd_huge() one) to make sure that e.g. THPs on hash MMUs will also
return true.

This helps to simplify a follow up patch to drop pXd_huge() treewide.

NOTE: the *_leaf() definitions need to be moved before the inclusion of
asm/book3s/64/pgtable-4k.h, which now defines pXd_huge() using them.

[1] https://lore.kernel.org/r/[email protected]

Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Xu <[email protected]>
---
.../include/asm/book3s/64/pgtable-4k.h | 14 ++--------
arch/powerpc/include/asm/book3s/64/pgtable.h | 27 +++++++++----------
2 files changed, 14 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
index 48f21820afe2..92545981bb49 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
@@ -8,22 +8,12 @@
#ifdef CONFIG_HUGETLB_PAGE
static inline int pmd_huge(pmd_t pmd)
{
- /*
- * leaf pte for huge page
- */
- if (radix_enabled())
- return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
- return 0;
+ return pmd_leaf(pmd);
}

static inline int pud_huge(pud_t pud)
{
- /*
- * leaf pte for huge page
- */
- if (radix_enabled())
- return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
- return 0;
+ return pud_leaf(pud);
}

/*
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index df66dce8306f..fd7180fded75 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -262,6 +262,18 @@ extern unsigned long __kernel_io_end;

extern struct page *vmemmap;
extern unsigned long pci_io_base;
+
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
+{
+ return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
+{
+ return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
#endif /* __ASSEMBLY__ */

#include <asm/book3s/64/hash.h>
@@ -1436,20 +1448,5 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_va
return false;
}

-/*
- * Like pmd_huge(), but works regardless of config options
- */
-#define pmd_leaf pmd_leaf
-static inline bool pmd_leaf(pmd_t pmd)
-{
- return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
-#define pud_leaf pud_leaf
-static inline bool pud_leaf(pud_t pud)
-{
- return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
-}
-
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
--
2.44.0

--
Peter Xu


2024-03-07 18:12:48

by Jason Gunthorpe

Subject: Re: [PATCH RFC 01/13] mm/hmm: Process pud swap entry without pud_huge()

On Wed, Mar 06, 2024 at 06:41:35PM +0800, [email protected] wrote:
> From: Peter Xu <[email protected]>
>
> Swap pud entries do not return true for pud_huge() on all archs. x86 and
> sparc (so far) allow it, but all the rest do not report a swap entry as
> pud_huge(). So it's not safe to rely on pud_huge() to catch swap entries.
> Check for swap entries before pud_huge(), so it is always safe.
>
> This is the only place in the kernel that (IMHO, wrongly) relies on
> pud_huge() to return true on pud swap entries. The plan is to cleanup
> pXd_huge() to only report non-swap mappings for all archs.
>
> Cc: Alistair Popple <[email protected]>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> mm/hmm.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

> @@ -424,7 +424,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
> walk->action = ACTION_CONTINUE;
>
> pud = READ_ONCE(*pudp);
> - if (pud_none(pud)) {
> + if (pud_none(pud) || !pud_present(pud)) {

Isn't this a tautology? pud_none always implies !present() ?

Jason

2024-03-07 20:08:43

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH RFC 03/13] mm/gup: Check p4d presence before going on

On Wed, Mar 06, 2024 at 06:41:37PM +0800, [email protected] wrote:
> From: Peter Xu <[email protected]>
>
> Currently there should have no p4d swap entries so it may not matter much,
> however this may help us to rule out swap entries in pXd_huge() API, which
> will include p4d_huge(). The p4d_present() checks make it 100% clear that
> we won't rely on p4d_huge() for swap entries.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> mm/gup.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

> p4dp = p4d_offset(pgdp, address);
> p4d = READ_ONCE(*p4dp);
> - if (p4d_none(p4d))
> + if (p4d_none(p4d) || !p4d_present(p4d))
> return no_page_table(vma, flags);

Like the other place, I think this makes more sense as

if (!p4d_present(p4d))
	return no_page_table(vma, flags);

Since none can never be present.

IOW if the following code doesn't decode swap entries then it should
be guarded with a check on present, not none.

Jason

2024-03-07 20:20:00

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH RFC 10/13] mm/gup: Merge pXd huge mapping checks

On Wed, Mar 06, 2024 at 06:41:44PM +0800, [email protected] wrote:
> From: Peter Xu <[email protected]>
>
> Huge mapping checks in GUP are slightly redundant and can be simplified.
>
> pXd_huge() now is the same as pXd_leaf(). pmd_trans_huge() and
> pXd_devmap() should both imply pXd_leaf(). Time to merge them into one.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> mm/gup.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
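
If I read it right, the shape after the merge is roughly the below (a
rough sketch, not the exact hunk; handle_huge_pmd() is just a placeholder
name for whatever huge-PMD path gup takes):

	if (!pmd_present(pmdval))
		return no_page_table(vma, flags);
	if (pmd_leaf(pmdval))	/* was: trans_huge || devmap || huge */
		return handle_huge_pmd(vma, address, pmd, flags, ctx);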

Much better!

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2024-03-07 20:32:54

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH RFC 04/13] mm/x86: Change pXd_huge() behavior to exclude swap entries

On Wed, Mar 06, 2024 at 06:41:38PM +0800, [email protected] wrote:
> From: Peter Xu <[email protected]>
>
> This patch partly reverts below commits:
>
> 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry")
> cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage")
>
> Right now, pXd_huge() definition across kernel is unclear. We have two
> groups that think differently on swap entries:
>
> - x86/sparc: Allow pXd_huge() to accept swap entries
> - all the rest: Doesn't allow pXd_huge() to accept swap entries
>
> This is so confusing. Since the sparc helpers seem to be added in 2016,
> which is after x86's (2015), so sparc could have followed a trend. x86
> proposed such swap handling in 2015 to resolve hugetlb swap entries hit in
> GUP, but now GUP guards swap entries with !pXd_present() in all layers so
> we should be safe.
>
> We should define this API properly, one way or another, rather than keep
> them defined differently across archs.
>
> Gut feeling tells me that pXd_huge() shouldn't include swap entries, and it
> turns out that I am not the only one thinking so, the question was raised
> when the current pmd_huge() for x86 was proposed by Ville Syrjälä:
>
> https://lore.kernel.org/all/[email protected]/
>
> I might also be missing something obvious, but why is it even necessary
> to treat PRESENT==0+PSE==0 as a huge entry?
>
> It is also questioned when Jason Gunthorpe reviewed the other patchset on
> swap entry handlings:
>
> https://lore.kernel.org/all/[email protected]/
>
> Revert its meaning back to original. It shouldn't have any functional
> change as we should be ready with guards on !pXd_present() explicitly
> everywhere.
>
> Note that I also dropped the "#if CONFIG_PGTABLE_LEVELS > 2", it was there
> probably because it was breaking things when 3a194f3f8ad0 was proposed,
> according to the report here:
>
> https://lore.kernel.org/all/[email protected]/
>
> Now we shouldn't need that.
>
> Instead of reverting to _PAGE_PSE raw check, leverage pXd_leaf().
>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: [email protected]
> Signed-off-by: Peter Xu <[email protected]>
> ---
> arch/x86/mm/hugetlbpage.c | 18 ++++--------------
> 1 file changed, 4 insertions(+), 14 deletions(-)

I think this is the right thing to do; callers should be more directly
sensitive to swap entries rather than backing into it indirectly via a
helper like this.
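
IOW something like this pattern at the call sites, where the swap/none
handling is explicit instead of hidden inside the helper (sketch only;
the two handlers are placeholder names):

	pud = READ_ONCE(*pudp);
	if (!pud_present(pud)) {
		/* none or a swap entry: decide explicitly what to do */
		return handle_non_present_pud(pud);
	}
	if (pud_leaf(pud))
		return handle_huge_pud(pud);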

Jason

2024-03-08 06:50:44

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH RFC 01/13] mm/hmm: Process pud swap entry without pud_huge()

On Thu, Mar 07, 2024 at 02:12:33PM -0400, Jason Gunthorpe wrote:
> On Wed, Mar 06, 2024 at 06:41:35PM +0800, [email protected] wrote:
> > From: Peter Xu <[email protected]>
> >
> > Swap pud entries do not always return true for pud_huge() for all archs.
> > x86 and sparc (so far) allow it, but all the rest do not accept a swap
> > entry to be reported as pud_huge(). So it's not safe to check swap entries
> > within pud_huge(). Check swap entries before pud_huge(), so it should be
> > always safe.
> >
> > This is the only place in the kernel that (IMHO, wrongly) relies on
> > pud_huge() to return true on pud swap entries. The plan is to cleanup
> > pXd_huge() to only report non-swap mappings for all archs.
> >
> > Cc: Alistair Popple <[email protected]>
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > mm/hmm.c | 7 +------
> > 1 file changed, 1 insertion(+), 6 deletions(-)
>
> Reviewed-by: Jason Gunthorpe <[email protected]>
>
> > @@ -424,7 +424,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
> > walk->action = ACTION_CONTINUE;
> >
> > pud = READ_ONCE(*pudp);
> > - if (pud_none(pud)) {
> > + if (pud_none(pud) || !pud_present(pud)) {
>
> Isn't this a tautology? pud_none always implies !present() ?

Hmm yes, I think so; afaict it should be "all = none + swap + present".
I still remember I missed that once previously; it's not always obvious
when preparing such patches. :( I'll simplify this, and also patch 3.
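
IOW in the next version the check should simply read:

	/* covers pud_none() too, as a none entry is never present */
	if (!pud_present(pud)) {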

Thanks,

--
Peter Xu


2024-03-08 15:21:54

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH RFC 13/13] mm: Document pXd_leaf() API

On Wed, Mar 06, 2024 at 06:41:47PM +0800, [email protected] wrote:
> From: Peter Xu <[email protected]>
>
> There's one small section already, but since we're going to remove
> pXd_huge(), that comment may start to obsolete.
>
> Rewrite that section with more information, hopefully with that the API is
> crystal clear on what it implies.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> include/linux/pgtable.h | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2024-03-11 09:59:08

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH RFC 00/13] mm/treewide: Remove pXd_huge() API



Le 06/03/2024 à 11:41, [email protected] a écrit :
> From: Peter Xu <[email protected]>
>
> [based on akpm/mm-unstable latest commit a7f399ae964e]
>
> In previous work [1], we removed the pXd_large() API, which is arch
> specific. This patchset further removes the hugetlb pXd_huge() API.
>
> Hugetlb was never special on creating huge mappings when compared with
> other huge mappings. Having a standalone API just to detect such pgtable
> entries is more or less redundant, especially after the pXd_leaf() API set
> is introduced with/without CONFIG_HUGETLB_PAGE.
>
> When looking at this problem, a few issues are also exposed that we don't
> have a clear definition of the *_huge() variance API. This patchset
> started by cleaning these issues first, then replace all *_huge() users to
> use *_leaf(), then drop all *_huge() code.
>
> On x86/sparc, swap entries will be reported "true" in pXd_huge(), while for
> all the rest archs they're reported "false" instead. This part is done in
> patch 1-5, in which I suspect patch 1 can be seen as a bug fix, but I'll
> leave that to hmm experts to decide.
>
> Besides, there are three archs (arm, arm64, powerpc) that have slightly
> different definitions between the *_huge() v.s. *_leaf() variances. I
> tackled them separately so that it'll be easier for arch experts to chim in
> when necessary. This part is done in patch 6-9.
>
> The final patches 10-13 do the rest on the final removal, since *_leaf()
> will be the ultimate API in the future, and we seem to have quite some
> confusions on how *_huge() APIs can be defined, provide a rich comment for
> *_leaf() API set to define them properly to avoid future misuse, and
> hopefully that'll also help new archs to start support huge mappings and
> avoid traps (like either swap entries, or PROT_NONE entry checks).
>
> The whole series is only lightly tested on x86, while as usual I don't have
> the capability to test all archs that it touches.
>
> Marking this series RFC as of now.
>
> [1] https://lore.kernel.org/r/[email protected]
>

Hi Peter, and nice job you are doing in cleaning up things around _huge
stuff.

One other thing that might be worth a look at some point is the mess
around pmd_clear_huge() and pud_clear_huge().

I tried to clean things up with commit c742199a014d ("mm/pgtable: add
stubs for {pmd/pub}_{set/clear}_huge") but it was reverted because of
arm64 by commit d8a719059b9d ("Revert "mm/pgtable: add stubs for
{pmd/pub}_{set/clear}_huge"")

So now powerpc/8xx has to implement pmd_clear_huge() and
pud_clear_huge() although the 8xx page hierarchy only has 2 levels.
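
For reference, what I was after was being able to fall back to generic
no-op stubs when a level is folded, something along these lines (just a
sketch of the idea, not the actual code from c742199a014d):

	#ifdef __PAGETABLE_PMD_FOLDED
	static inline int pmd_clear_huge(pmd_t *pmd)
	{
		return 0;	/* nothing huge at a folded level */
	}
	#endif

(and the same for pud_clear_huge() under __PAGETABLE_PUD_FOLDED), so that
2-level platforms like the 8xx wouldn't need their own copies.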

Christophe

2024-03-12 20:01:30

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH RFC 00/13] mm/treewide: Remove pXd_huge() API

Hi, Christophe,

On Mon, Mar 11, 2024 at 09:58:47AM +0000, Christophe Leroy wrote:
> Hi Peter, and nice job you are doing in cleaning up things around _huge
> stuff.

Thanks. I appreciate your help along the way on Power.

>
> One thing that might be worth looking at also at some point is the mess
> around pmd_clear_huge() and pud_clear_huge().
>
> I tried to clean things up with commit c742199a014d ("mm/pgtable: add
> stubs for {pmd/pub}_{set/clear}_huge") but it was reverted because of
> arm64 by commit d8a719059b9d ("Revert "mm/pgtable: add stubs for
> {pmd/pub}_{set/clear}_huge"")
>
> So now powerpc/8xx has to implement pmd_clear_huge() and
> pud_clear_huge() allthough 8xx page hierarchy only has 2 levels.

Those are so far off my radar, as my focus right now is still more on the
hugetlbfs side of things, while kernel mappings are not yet directly
involved in hugetlbfs, even though they're still huge mappings.

It's a pity that it broke arm64 and got reverted, as it looks like a good
cleanup if it's ever possible. I tend to agree with you that for 3-level
configs we should define pgd_huge*() instead of pud_huge*(), so it looks
like the only way to provide such a clean treewide API is to properly
define those APIs for aarch64, with different pud helpers for the 3-level
and 4-level cases. But I confess I haven't fully digested all the bits.

Thanks,

--
Peter Xu