2019-04-26 06:22:25

by Christophe Leroy

Subject: [PATCH v2 00/17] Reduce ifdef mess in hugetlbpage.c

This is split out of the v1 series "Reduce ifdef mess in hugetlbpage.c
and slice.c".

The main purpose of this series is to reduce the amount of #ifdefs in
hugetlbpage.c by moving subarch-specific code into dedicated header files.

This series also drops all CONFIG_PPC_64K_PAGES related code in book3e,
as that option cannot be selected there.

Compile test at http://kisskb.ellerman.id.au/kisskb/head/74280b305020ab2c50efe383d37fccff009e6b84/

Christophe Leroy (17):
powerpc/mm: Don't BUG() in hugepd_page()
powerpc/mm: don't BUG in add_huge_page_size()
powerpc/book3e: drop mmu_get_tsize()
powerpc/64: only book3s/64 supports CONFIG_PPC_64K_PAGES
powerpc/book3e: hugetlbpage is only for CONFIG_PPC_FSL_BOOK3E
powerpc/mm: move __find_linux_pte() out of hugetlbpage.c
powerpc/mm: make hugetlbpage.c depend on CONFIG_HUGETLB_PAGE
powerpc/mm: make gup_hugepte() static
powerpc/mm: split asm/hugetlb.h into dedicated subarch files
powerpc/mm: add a helper to populate hugepd
powerpc/mm: cleanup ifdef mess in add_huge_page_size()
powerpc/mm: move hugetlb_disabled into asm/hugetlb.h
powerpc/mm: cleanup HPAGE_SHIFT setup
powerpc/mm: cleanup remaining ifdef mess in hugetlbpage.c
powerpc/mm: flatten function __find_linux_pte() step 1
powerpc/mm: flatten function __find_linux_pte() step 2
powerpc/mm: flatten function __find_linux_pte() step 3

arch/powerpc/Kconfig | 3 +-
arch/powerpc/include/asm/book3s/64/hugetlb.h | 72 +++++++
arch/powerpc/include/asm/hugetlb.h | 87 +-------
arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 44 +++++
arch/powerpc/include/asm/nohash/64/pgalloc.h | 3 -
arch/powerpc/include/asm/nohash/64/pgtable.h | 4 -
arch/powerpc/include/asm/nohash/hugetlb-book3e.h | 45 +++++
arch/powerpc/include/asm/nohash/pte-book3e.h | 5 -
arch/powerpc/include/asm/page.h | 12 +-
arch/powerpc/include/asm/pgtable-be-types.h | 9 +-
arch/powerpc/include/asm/pgtable-types.h | 9 +-
arch/powerpc/include/asm/pgtable.h | 3 -
arch/powerpc/include/asm/task_size_64.h | 2 +-
arch/powerpc/kernel/fadump.c | 1 +
arch/powerpc/mm/Makefile | 4 +-
arch/powerpc/mm/hash_utils_64.c | 1 +
arch/powerpc/mm/hugetlbpage-book3e.c | 52 ++---
arch/powerpc/mm/hugetlbpage-hash64.c | 16 ++
arch/powerpc/mm/hugetlbpage.c | 240 +++--------------------
arch/powerpc/mm/pgtable.c | 114 +++++++++++
arch/powerpc/mm/tlb_low_64e.S | 31 ---
arch/powerpc/mm/tlb_nohash.c | 13 --
22 files changed, 366 insertions(+), 404 deletions(-)
create mode 100644 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
create mode 100644 arch/powerpc/include/asm/nohash/hugetlb-book3e.h

--
2.13.3


2019-04-26 06:00:54

by Christophe Leroy

Subject: [PATCH v2 01/17] powerpc/mm: Don't BUG() in hugepd_page()

Use VM_BUG_ON() instead of BUG_ON(): these assertions are not there to
catch runtime errors, only to catch bugs during the development cycle.

Signed-off-by: Christophe Leroy <[email protected]>
---
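Note: VM_BUG_ON() only compiles to a real check when CONFIG_DEBUG_VM is
set; otherwise the condition is type-checked but never evaluated.
Condensed from include/linux/mmdebug.h:

#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond)		BUG_ON(cond)
#else
#define VM_BUG_ON(cond)		BUILD_BUG_ON_INVALID(cond)
#endif
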
arch/powerpc/include/asm/hugetlb.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 8d40565ad0c3..7f1867e428c0 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -14,7 +14,7 @@
*/
static inline pte_t *hugepd_page(hugepd_t hpd)
{
- BUG_ON(!hugepd_ok(hpd));
+ VM_BUG_ON(!hugepd_ok(hpd));
/*
* We have only four bits to encode, MMU page size
*/
@@ -42,7 +42,7 @@ static inline void flush_hugetlb_page(struct vm_area_struct *vma,

static inline pte_t *hugepd_page(hugepd_t hpd)
{
- BUG_ON(!hugepd_ok(hpd));
+ VM_BUG_ON(!hugepd_ok(hpd));
#ifdef CONFIG_PPC_8xx
return (pte_t *)__va(hpd_val(hpd) & ~HUGEPD_SHIFT_MASK);
#else
--
2.13.3

2019-04-26 06:01:00

by Christophe Leroy

Subject: [PATCH v2 09/17] powerpc/mm: split asm/hugetlb.h into dedicated subarch files

Three subarches support hugepages:
- fsl book3e
- book3s/64
- 8xx

This patch splits asm/hugetlb.h to reduce the #ifdef mess.

Signed-off-by: Christophe Leroy <[email protected]>
---
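The result is that asm/hugetlb.h becomes a thin dispatch header, with
each MMU family supplying its own hugepd accessors (condensed from the
first asm/hugetlb.h hunk below):

#ifdef CONFIG_PPC_BOOK3S_64
#include <asm/book3s/64/hugetlb.h>
#elif defined(CONFIG_PPC_FSL_BOOK3E)
#include <asm/nohash/hugetlb-book3e.h>
#elif defined(CONFIG_PPC_8xx)
#include <asm/nohash/32/hugetlb-8xx.h>
#endif
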
arch/powerpc/include/asm/book3s/64/hugetlb.h | 40 +++++++++++
arch/powerpc/include/asm/hugetlb.h | 87 ++----------------------
arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 31 +++++++++
arch/powerpc/include/asm/nohash/hugetlb-book3e.h | 31 +++++++++
4 files changed, 106 insertions(+), 83 deletions(-)
create mode 100644 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
create mode 100644 arch/powerpc/include/asm/nohash/hugetlb-book3e.h

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index ec2a55a553c7..7c99f018f7b5 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -62,4 +62,44 @@ extern pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
extern void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t old_pte, pte_t new_pte);
+/*
+ * This should work for other subarchs too. But right now we use the
+ * new format only for 64bit book3s
+ */
+static inline pte_t *hugepd_page(hugepd_t hpd)
+{
+ VM_BUG_ON(!hugepd_ok(hpd));
+ /*
+ * We have only four bits to encode, MMU page size
+ */
+ BUILD_BUG_ON((MMU_PAGE_COUNT - 1) > 0xf);
+ return __va(hpd_val(hpd) & HUGEPD_ADDR_MASK);
+}
+
+static inline unsigned int hugepd_mmu_psize(hugepd_t hpd)
+{
+ return (hpd_val(hpd) & HUGEPD_SHIFT_MASK) >> 2;
+}
+
+static inline unsigned int hugepd_shift(hugepd_t hpd)
+{
+ return mmu_psize_to_shift(hugepd_mmu_psize(hpd));
+}
+static inline void flush_hugetlb_page(struct vm_area_struct *vma,
+ unsigned long vmaddr)
+{
+ if (radix_enabled())
+ return radix__flush_hugetlb_page(vma, vmaddr);
+}
+
+static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
+ unsigned int pdshift)
+{
+ unsigned long idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd);
+
+ return hugepd_page(hpd) + idx;
+}
+
+void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
+
#endif
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 7f1867e428c0..fd5c0873a57d 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -6,83 +6,13 @@
#include <asm/page.h>

#ifdef CONFIG_PPC_BOOK3S_64
-
#include <asm/book3s/64/hugetlb.h>
-/*
- * This should work for other subarchs too. But right now we use the
- * new format only for 64bit book3s
- */
-static inline pte_t *hugepd_page(hugepd_t hpd)
-{
- VM_BUG_ON(!hugepd_ok(hpd));
- /*
- * We have only four bits to encode, MMU page size
- */
- BUILD_BUG_ON((MMU_PAGE_COUNT - 1) > 0xf);
- return __va(hpd_val(hpd) & HUGEPD_ADDR_MASK);
-}
-
-static inline unsigned int hugepd_mmu_psize(hugepd_t hpd)
-{
- return (hpd_val(hpd) & HUGEPD_SHIFT_MASK) >> 2;
-}
-
-static inline unsigned int hugepd_shift(hugepd_t hpd)
-{
- return mmu_psize_to_shift(hugepd_mmu_psize(hpd));
-}
-static inline void flush_hugetlb_page(struct vm_area_struct *vma,
- unsigned long vmaddr)
-{
- if (radix_enabled())
- return radix__flush_hugetlb_page(vma, vmaddr);
-}
-
-#else
-
-static inline pte_t *hugepd_page(hugepd_t hpd)
-{
- VM_BUG_ON(!hugepd_ok(hpd));
-#ifdef CONFIG_PPC_8xx
- return (pte_t *)__va(hpd_val(hpd) & ~HUGEPD_SHIFT_MASK);
-#else
- return (pte_t *)((hpd_val(hpd) &
- ~HUGEPD_SHIFT_MASK) | PD_HUGE);
-#endif
-}
-
-static inline unsigned int hugepd_shift(hugepd_t hpd)
-{
-#ifdef CONFIG_PPC_8xx
- return ((hpd_val(hpd) & _PMD_PAGE_MASK) >> 1) + 17;
-#else
- return hpd_val(hpd) & HUGEPD_SHIFT_MASK;
-#endif
-}
-
+#elif defined(CONFIG_PPC_FSL_BOOK3E)
+#include <asm/nohash/hugetlb-book3e.h>
+#elif defined(CONFIG_PPC_8xx)
+#include <asm/nohash/32/hugetlb-8xx.h>
#endif /* CONFIG_PPC_BOOK3S_64 */

-
-static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
- unsigned pdshift)
-{
- /*
- * On FSL BookE, we have multiple higher-level table entries that
- * point to the same hugepte. Just use the first one since they're all
- * identical. So for that case, idx=0.
- */
- unsigned long idx = 0;
-
- pte_t *dir = hugepd_page(hpd);
-#ifdef CONFIG_PPC_8xx
- idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
-#elif !defined(CONFIG_PPC_FSL_BOOK3E)
- idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd);
-#endif
-
- return dir + idx;
-}
-
void flush_dcache_icache_hugepage(struct page *page);

int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
@@ -99,15 +29,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,

void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
pte_t pte);
-#ifdef CONFIG_PPC_8xx
-static inline void flush_hugetlb_page(struct vm_area_struct *vma,
- unsigned long vmaddr)
-{
- flush_tlb_page(vma, vmaddr);
-}
-#else
-void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
-#endif

#define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
diff --git a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
new file mode 100644
index 000000000000..27acd2ad2176
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_32_HUGETLB_8XX_H
+#define _ASM_POWERPC_NOHASH_32_HUGETLB_8XX_H
+
+static inline pte_t *hugepd_page(hugepd_t hpd)
+{
+ VM_BUG_ON(!hugepd_ok(hpd));
+
+ return (pte_t *)__va(hpd_val(hpd) & ~HUGEPD_SHIFT_MASK);
+}
+
+static inline unsigned int hugepd_shift(hugepd_t hpd)
+{
+ return ((hpd_val(hpd) & _PMD_PAGE_MASK) >> 1) + 17;
+}
+
+static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
+ unsigned int pdshift)
+{
+ unsigned long idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
+
+ return hugepd_page(hpd) + idx;
+}
+
+static inline void flush_hugetlb_page(struct vm_area_struct *vma,
+ unsigned long vmaddr)
+{
+ flush_tlb_page(vma, vmaddr);
+}
+
+#endif /* _ASM_POWERPC_NOHASH_32_HUGETLB_8XX_H */
diff --git a/arch/powerpc/include/asm/nohash/hugetlb-book3e.h b/arch/powerpc/include/asm/nohash/hugetlb-book3e.h
new file mode 100644
index 000000000000..e94f1cd048ee
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/hugetlb-book3e.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_HUGETLB_BOOK3E_H
+#define _ASM_POWERPC_NOHASH_HUGETLB_BOOK3E_H
+
+static inline pte_t *hugepd_page(hugepd_t hpd)
+{
+ if (WARN_ON(!hugepd_ok(hpd)))
+ return NULL;
+
+ return (pte_t *)((hpd_val(hpd) & ~HUGEPD_SHIFT_MASK) | PD_HUGE);
+}
+
+static inline unsigned int hugepd_shift(hugepd_t hpd)
+{
+ return hpd_val(hpd) & HUGEPD_SHIFT_MASK;
+}
+
+static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
+ unsigned int pdshift)
+{
+ /*
+ * On FSL BookE, we have multiple higher-level table entries that
+ * point to the same hugepte. Just use the first one since they're all
+ * identical. So for that case, idx=0.
+ */
+ return hugepd_page(hpd);
+}
+
+void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
+
+#endif /* _ASM_POWERPC_NOHASH_HUGETLB_BOOK3E_H */
--
2.13.3

2019-04-26 06:01:07

by Christophe Leroy

Subject: [PATCH v2 16/17] powerpc/mm: flatten function __find_linux_pte() step 2

__find_linux_pte() is full of if/else blocks that are hard to follow,
although the handling is pretty simple.

The previous patch left bare { } blocks behind. This patch removes the
first one by shifting its contents one level to the left.

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/pgtable.c | 62 +++++++++++++++++++++++------------------------
1 file changed, 30 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index d332abeedf0a..c1c6d0b79baa 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -369,39 +369,37 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
hpdp = (hugepd_t *)&pud;
goto out_huge;
}
- {
- pdshift = PMD_SHIFT;
- pmdp = pmd_offset(&pud, ea);
- pmd = READ_ONCE(*pmdp);
- /*
- * A hugepage collapse is captured by pmd_none, because
- * it mark the pmd none and do a hpte invalidate.
- */
- if (pmd_none(pmd))
- return NULL;
-
- if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
- if (is_thp)
- *is_thp = true;
- ret_pte = (pte_t *) pmdp;
- goto out;
- }
- /*
- * pmd_large check below will handle the swap pmd pte
- * we need to do both the check because they are config
- * dependent.
- */
- if (pmd_huge(pmd) || pmd_large(pmd)) {
- ret_pte = (pte_t *) pmdp;
- goto out;
- }
- if (is_hugepd(__hugepd(pmd_val(pmd)))) {
- hpdp = (hugepd_t *)&pmd;
- goto out_huge;
- }
-
- return pte_offset_kernel(&pmd, ea);
+ pdshift = PMD_SHIFT;
+ pmdp = pmd_offset(&pud, ea);
+ pmd = READ_ONCE(*pmdp);
+ /*
+ * A hugepage collapse is captured by pmd_none, because
+ * it mark the pmd none and do a hpte invalidate.
+ */
+ if (pmd_none(pmd))
+ return NULL;
+
+ if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
+ if (is_thp)
+ *is_thp = true;
+ ret_pte = (pte_t *)pmdp;
+ goto out;
+ }
+ /*
+ * pmd_large check below will handle the swap pmd pte
+ * we need to do both the check because they are config
+ * dependent.
+ */
+ if (pmd_huge(pmd) || pmd_large(pmd)) {
+ ret_pte = (pte_t *)pmdp;
+ goto out;
}
+ if (is_hugepd(__hugepd(pmd_val(pmd)))) {
+ hpdp = (hugepd_t *)&pmd;
+ goto out_huge;
+ }
+
+ return pte_offset_kernel(&pmd, ea);
}
out_huge:
if (!hpdp)
--
2.13.3

2019-04-26 06:01:34

by Christophe Leroy

Subject: [PATCH v2 13/17] powerpc/mm: cleanup HPAGE_SHIFT setup

Only book3s/64 may select the default among several HPAGE_SHIFT values
at runtime:
- 8xx always defines 512k pages as the default
- FSL_BOOK3E always defines 4M pages as the default

This patch limits HUGETLB_PAGE_SIZE_VARIABLE to book3s/64 and moves the
definitions into the subarch files.

Signed-off-by: Christophe Leroy <[email protected]>
---
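On the fixed-size subarches HPAGE_SHIFT is now a compile-time constant,
so HPAGE_SIZE and HPAGE_MASK become constants as well; e.g. for
FSL_BOOK3E on a 32-bit build:

#define HPAGE_SHIFT 22                      /* 4M pages */
#define HPAGE_SIZE  ((1UL) << HPAGE_SHIFT)  /* 0x00400000 */
#define HPAGE_MASK  (~(HPAGE_SIZE - 1))     /* 0xffc00000 */
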
arch/powerpc/Kconfig | 2 +-
arch/powerpc/include/asm/hugetlb.h | 2 ++
arch/powerpc/include/asm/page.h | 11 ++++++++---
arch/powerpc/mm/hugetlbpage-hash64.c | 16 ++++++++++++++++
arch/powerpc/mm/hugetlbpage.c | 23 +++--------------------
5 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5d8e692d6470..7815eb0cc2a5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -390,7 +390,7 @@ source "kernel/Kconfig.hz"

config HUGETLB_PAGE_SIZE_VARIABLE
bool
- depends on HUGETLB_PAGE
+ depends on HUGETLB_PAGE && PPC_BOOK3S_64
default y

config MATH_EMULATION
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 84598c6b0959..20a101046cff 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -15,6 +15,8 @@

extern bool hugetlb_disabled;

+void hugetlbpage_init_default(void);
+
void flush_dcache_icache_hugepage(struct page *page);

int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 6b508420d92b..dbc8c0679480 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -28,10 +28,15 @@
#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT)

#ifndef __ASSEMBLY__
-#ifdef CONFIG_HUGETLB_PAGE
-extern unsigned int HPAGE_SHIFT;
-#else
+#ifndef CONFIG_HUGETLB_PAGE
#define HPAGE_SHIFT PAGE_SHIFT
+#elif defined(CONFIG_PPC_BOOK3S_64)
+extern unsigned int hpage_shift;
+#define HPAGE_SHIFT hpage_shift
+#elif defined(CONFIG_PPC_8xx)
+#define HPAGE_SHIFT 19 /* 512k pages */
+#elif defined(CONFIG_PPC_FSL_BOOK3E)
+#define HPAGE_SHIFT 22 /* 4M pages */
#endif
#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT)
#define HPAGE_MASK (~(HPAGE_SIZE - 1))
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index b0d9209d9a86..7a58204c3688 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -15,6 +15,9 @@
#include <asm/cacheflush.h>
#include <asm/machdep.h>

+unsigned int hpage_shift;
+EXPORT_SYMBOL(hpage_shift);
+
extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
unsigned long pa, unsigned long rlags,
unsigned long vflags, int psize, int ssize);
@@ -145,3 +148,16 @@ void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr
old_pte, pte);
set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
}
+
+void hugetlbpage_init_default(void)
+{
+ /* Set default large page size. Currently, we pick 16M or 1M
+ * depending on what is available
+ */
+ if (mmu_psize_defs[MMU_PAGE_16M].shift)
+ hpage_shift = mmu_psize_defs[MMU_PAGE_16M].shift;
+ else if (mmu_psize_defs[MMU_PAGE_1M].shift)
+ hpage_shift = mmu_psize_defs[MMU_PAGE_1M].shift;
+ else if (mmu_psize_defs[MMU_PAGE_2M].shift)
+ hpage_shift = mmu_psize_defs[MMU_PAGE_2M].shift;
+}
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 828860a7492e..265bd6d04233 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -28,9 +28,6 @@

bool hugetlb_disabled = false;

-unsigned int HPAGE_SHIFT;
-EXPORT_SYMBOL(HPAGE_SHIFT);
-
#define hugepd_none(hpd) (hpd_val(hpd) == 0)

#define PTE_T_ORDER (__builtin_ffs(sizeof(pte_t)) - __builtin_ffs(sizeof(void *)))
@@ -647,23 +644,9 @@ static int __init hugetlbpage_init(void)
#endif
}

-#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
- /* Default hpage size = 4M on FSL_BOOK3E and 512k on 8xx */
- if (mmu_psize_defs[MMU_PAGE_4M].shift)
- HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_4M].shift;
- else if (mmu_psize_defs[MMU_PAGE_512K].shift)
- HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_512K].shift;
-#else
- /* Set default large page size. Currently, we pick 16M or 1M
- * depending on what is available
- */
- if (mmu_psize_defs[MMU_PAGE_16M].shift)
- HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_16M].shift;
- else if (mmu_psize_defs[MMU_PAGE_1M].shift)
- HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_1M].shift;
- else if (mmu_psize_defs[MMU_PAGE_2M].shift)
- HPAGE_SHIFT = mmu_psize_defs[MMU_PAGE_2M].shift;
-#endif
+ if (IS_ENABLED(CONFIG_HUGETLB_PAGE_SIZE_VARIABLE))
+ hugetlbpage_init_default();
+
return 0;
}

--
2.13.3

2019-04-26 06:02:02

by Christophe Leroy

Subject: [PATCH v2 15/17] powerpc/mm: flatten function __find_linux_pte() step 1

__find_linux_pte() is full of if/else blocks that are hard to follow,
although the handling is pretty simple.

This patch flattens the function by getting rid of as much if/else
as possible. In order to ease the review, this is done in three steps.

Signed-off-by: Christophe Leroy <[email protected]>
---
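The pattern, in miniature, at the PUD level (taken from the hunks
below): each else-continuation becomes a guard that returns or jumps,
so the walk reads top to bottom. Before:

	if (pud_huge(pud)) {
		ret_pte = (pte_t *)pudp;
		goto out;
	} else if (is_hugepd(__hugepd(pud_val(pud))))
		hpdp = (hugepd_t *)&pud;
	else {
		/* descend to the PMD level... */
	}

After:

	if (pud_huge(pud)) {
		ret_pte = (pte_t *)pudp;
		goto out;
	}
	if (is_hugepd(__hugepd(pud_val(pud)))) {
		hpdp = (hugepd_t *)&pud;
		goto out_huge;
	}
	/* descend to the PMD level at the same nesting depth */
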
arch/powerpc/mm/pgtable.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9f4ccd15849f..d332abeedf0a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -339,12 +339,16 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
*/
if (pgd_none(pgd))
return NULL;
- else if (pgd_huge(pgd)) {
- ret_pte = (pte_t *) pgdp;
+
+ if (pgd_huge(pgd)) {
+ ret_pte = (pte_t *)pgdp;
goto out;
- } else if (is_hugepd(__hugepd(pgd_val(pgd))))
+ }
+ if (is_hugepd(__hugepd(pgd_val(pgd)))) {
hpdp = (hugepd_t *)&pgd;
- else {
+ goto out_huge;
+ }
+ {
/*
* Even if we end up with an unmap, the pgtable will not
* be freed, because we do an rcu free and here we are
@@ -356,12 +360,16 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,

if (pud_none(pud))
return NULL;
- else if (pud_huge(pud)) {
+
+ if (pud_huge(pud)) {
ret_pte = (pte_t *) pudp;
goto out;
- } else if (is_hugepd(__hugepd(pud_val(pud))))
+ }
+ if (is_hugepd(__hugepd(pud_val(pud)))) {
hpdp = (hugepd_t *)&pud;
- else {
+ goto out_huge;
+ }
+ {
pdshift = PMD_SHIFT;
pmdp = pmd_offset(&pud, ea);
pmd = READ_ONCE(*pmdp);
@@ -386,12 +394,16 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
if (pmd_huge(pmd) || pmd_large(pmd)) {
ret_pte = (pte_t *) pmdp;
goto out;
- } else if (is_hugepd(__hugepd(pmd_val(pmd))))
+ }
+ if (is_hugepd(__hugepd(pmd_val(pmd)))) {
hpdp = (hugepd_t *)&pmd;
- else
- return pte_offset_kernel(&pmd, ea);
+ goto out_huge;
+ }
+
+ return pte_offset_kernel(&pmd, ea);
}
}
+out_huge:
if (!hpdp)
return NULL;

--
2.13.3

2019-04-26 06:02:26

by Christophe Leroy

Subject: [PATCH v2 10/17] powerpc/mm: add a helper to populate hugepd

This patch adds a subarch-specific helper to populate hugepd entries.

Signed-off-by: Christophe Leroy <[email protected]>
---
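With one hugepd_populate() per subarch, the allocation loop in
__hugepte_alloc() loses its last #ifdef and reduces to this (see the
final hunk below):

	for (i = 0; i < num_hugepd; i++, hpdp++) {
		if (unlikely(!hugepd_none(*hpdp)))
			break;
		hugepd_populate(hpdp, new, pshift);
	}
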
arch/powerpc/include/asm/book3s/64/hugetlb.h | 5 +++++
arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 8 ++++++++
arch/powerpc/include/asm/nohash/hugetlb-book3e.h | 6 ++++++
arch/powerpc/mm/hugetlbpage.c | 20 +-------------------
4 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index 7c99f018f7b5..5537046b4c78 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -100,6 +100,11 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
return hugepd_page(hpd) + idx;
}

+static inline void hugepd_populate(hugepd_t *hpdp, pte_t *new, unsigned int pshift)
+{
+ *hpdp = __hugepd(__pa(new) | HUGEPD_VAL_BITS | (shift_to_mmu_psize(pshift) << 2));
+}
+
void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);

#endif
diff --git a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
index 27acd2ad2176..d9aa72fd76b3 100644
--- a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
@@ -2,6 +2,8 @@
#ifndef _ASM_POWERPC_NOHASH_32_HUGETLB_8XX_H
#define _ASM_POWERPC_NOHASH_32_HUGETLB_8XX_H

+#define PAGE_SHIFT_8M 23
+
static inline pte_t *hugepd_page(hugepd_t hpd)
{
VM_BUG_ON(!hugepd_ok(hpd));
@@ -28,4 +30,10 @@ static inline void flush_hugetlb_page(struct vm_area_struct *vma,
flush_tlb_page(vma, vmaddr);
}

+static inline void hugepd_populate(hugepd_t *hpdp, pte_t *new, unsigned int pshift)
+{
+ *hpdp = __hugepd(__pa(new) | _PMD_USER | _PMD_PRESENT |
+ (pshift == PAGE_SHIFT_8M ? _PMD_PAGE_8M : _PMD_PAGE_512K));
+}
+
#endif /* _ASM_POWERPC_NOHASH_32_HUGETLB_8XX_H */
diff --git a/arch/powerpc/include/asm/nohash/hugetlb-book3e.h b/arch/powerpc/include/asm/nohash/hugetlb-book3e.h
index e94f1cd048ee..51439bcfe313 100644
--- a/arch/powerpc/include/asm/nohash/hugetlb-book3e.h
+++ b/arch/powerpc/include/asm/nohash/hugetlb-book3e.h
@@ -28,4 +28,10 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,

void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);

+static inline void hugepd_populate(hugepd_t *hpdp, pte_t *new, unsigned int pshift)
+{
+ /* We use the old format for PPC_FSL_BOOK3E */
+ *hpdp = __hugepd(((unsigned long)new & ~PD_HUGE) | pshift);
+}
+
#endif /* _ASM_POWERPC_NOHASH_HUGETLB_BOOK3E_H */
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index ef0f5b8e87fd..0322a0617371 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -26,12 +26,6 @@
#include <asm/hugetlb.h>
#include <asm/pte-walk.h>

-#define PAGE_SHIFT_64K 16
-#define PAGE_SHIFT_512K 19
-#define PAGE_SHIFT_8M 23
-#define PAGE_SHIFT_16M 24
-#define PAGE_SHIFT_16G 34
-
bool hugetlb_disabled = false;

unsigned int HPAGE_SHIFT;
@@ -95,19 +89,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
for (i = 0; i < num_hugepd; i++, hpdp++) {
if (unlikely(!hugepd_none(*hpdp)))
break;
- else {
-#ifdef CONFIG_PPC_BOOK3S_64
- *hpdp = __hugepd(__pa(new) | HUGEPD_VAL_BITS |
- (shift_to_mmu_psize(pshift) << 2));
-#elif defined(CONFIG_PPC_8xx)
- *hpdp = __hugepd(__pa(new) | _PMD_USER |
- (pshift == PAGE_SHIFT_8M ? _PMD_PAGE_8M :
- _PMD_PAGE_512K) | _PMD_PRESENT);
-#else
- /* We use the old format for PPC_FSL_BOOK3E */
- *hpdp = __hugepd(((unsigned long)new & ~PD_HUGE) | pshift);
-#endif
- }
+ hugepd_populate(hpdp, new, pshift);
}
/* If we bailed from the for loop early, an error occurred, clean up */
if (i < num_hugepd) {
--
2.13.3

2019-04-26 06:02:28

by Christophe Leroy

Subject: [PATCH v2 17/17] powerpc/mm: flatten function __find_linux_pte() step 3

__find_linux_pte() is full of if/else blocks that are hard to follow,
although the handling is pretty simple.

The previous patches left one { } block. This patch removes it.

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/pgtable.c | 98 +++++++++++++++++++++++------------------------
1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index c1c6d0b79baa..db4a6253df92 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -348,59 +348,59 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
hpdp = (hugepd_t *)&pgd;
goto out_huge;
}
- {
- /*
- * Even if we end up with an unmap, the pgtable will not
- * be freed, because we do an rcu free and here we are
- * irq disabled
- */
- pdshift = PUD_SHIFT;
- pudp = pud_offset(&pgd, ea);
- pud = READ_ONCE(*pudp);

- if (pud_none(pud))
- return NULL;
+ /*
+ * Even if we end up with an unmap, the pgtable will not
+ * be freed, because we do an rcu free and here we are
+ * irq disabled
+ */
+ pdshift = PUD_SHIFT;
+ pudp = pud_offset(&pgd, ea);
+ pud = READ_ONCE(*pudp);

- if (pud_huge(pud)) {
- ret_pte = (pte_t *) pudp;
- goto out;
- }
- if (is_hugepd(__hugepd(pud_val(pud)))) {
- hpdp = (hugepd_t *)&pud;
- goto out_huge;
- }
- pdshift = PMD_SHIFT;
- pmdp = pmd_offset(&pud, ea);
- pmd = READ_ONCE(*pmdp);
- /*
- * A hugepage collapse is captured by pmd_none, because
- * it mark the pmd none and do a hpte invalidate.
- */
- if (pmd_none(pmd))
- return NULL;
-
- if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
- if (is_thp)
- *is_thp = true;
- ret_pte = (pte_t *)pmdp;
- goto out;
- }
- /*
- * pmd_large check below will handle the swap pmd pte
- * we need to do both the check because they are config
- * dependent.
- */
- if (pmd_huge(pmd) || pmd_large(pmd)) {
- ret_pte = (pte_t *)pmdp;
- goto out;
- }
- if (is_hugepd(__hugepd(pmd_val(pmd)))) {
- hpdp = (hugepd_t *)&pmd;
- goto out_huge;
- }
+ if (pud_none(pud))
+ return NULL;

- return pte_offset_kernel(&pmd, ea);
+ if (pud_huge(pud)) {
+ ret_pte = (pte_t *)pudp;
+ goto out;
}
+ if (is_hugepd(__hugepd(pud_val(pud)))) {
+ hpdp = (hugepd_t *)&pud;
+ goto out_huge;
+ }
+ pdshift = PMD_SHIFT;
+ pmdp = pmd_offset(&pud, ea);
+ pmd = READ_ONCE(*pmdp);
+ /*
+ * A hugepage collapse is captured by pmd_none, because
+ * it mark the pmd none and do a hpte invalidate.
+ */
+ if (pmd_none(pmd))
+ return NULL;
+
+ if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
+ if (is_thp)
+ *is_thp = true;
+ ret_pte = (pte_t *)pmdp;
+ goto out;
+ }
+ /*
+ * pmd_large check below will handle the swap pmd pte
+ * we need to do both the check because they are config
+ * dependent.
+ */
+ if (pmd_huge(pmd) || pmd_large(pmd)) {
+ ret_pte = (pte_t *)pmdp;
+ goto out;
+ }
+ if (is_hugepd(__hugepd(pmd_val(pmd)))) {
+ hpdp = (hugepd_t *)&pmd;
+ goto out_huge;
+ }
+
+ return pte_offset_kernel(&pmd, ea);
+
out_huge:
if (!hpdp)
return NULL;
--
2.13.3

2019-04-26 06:02:47

by Christophe Leroy

Subject: [PATCH v2 08/17] powerpc/mm: make gup_hugepte() static

gup_huge_pd() is the only user of gup_hugepte(), and both live in the
same file. This patch moves gup_huge_pd() below gup_hugepte() and
makes gup_hugepte() static.

Signed-off-by: Christophe Leroy <[email protected]>
---
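Placing the static definition above its single caller lets it serve as
its own declaration, so the extern prototype in asm/pgtable.h can go;
schematically (bodies elided):

static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
		       unsigned long end, int write, struct page **pages,
		       int *nr)
{
	/* ... takes references on the pages under one hugepte ... */
}

int gup_huge_pd(hugepd_t hugepd, unsigned long addr, unsigned int pdshift,
		unsigned long end, int write, struct page **pages, int *nr)
{
	/* ... iterates over the hugepd, calling gup_hugepte() ... */
}
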
arch/powerpc/include/asm/pgtable.h | 3 ---
arch/powerpc/mm/hugetlbpage.c | 38 +++++++++++++++++++-------------------
2 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 505550fb2935..c51846da41a7 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -89,9 +89,6 @@ extern void paging_init(void);
*/
extern void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t *);

-extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
- unsigned long end, int write,
- struct page **pages, int *nr);
#ifndef CONFIG_TRANSPARENT_HUGEPAGE
#define pmd_large(pmd) 0
#endif
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 809f8f167962..ef0f5b8e87fd 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -539,23 +539,6 @@ static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
return (__boundary - 1 < end - 1) ? __boundary : end;
}

-int gup_huge_pd(hugepd_t hugepd, unsigned long addr, unsigned pdshift,
- unsigned long end, int write, struct page **pages, int *nr)
-{
- pte_t *ptep;
- unsigned long sz = 1UL << hugepd_shift(hugepd);
- unsigned long next;
-
- ptep = hugepte_offset(hugepd, addr, pdshift);
- do {
- next = hugepte_addr_end(addr, end, sz);
- if (!gup_hugepte(ptep, sz, addr, end, write, pages, nr))
- return 0;
- } while (ptep++, addr = next, addr != end);
-
- return 1;
-}
-
#ifdef CONFIG_PPC_MM_SLICES
unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
unsigned long len, unsigned long pgoff,
@@ -753,8 +736,8 @@ void flush_dcache_icache_hugepage(struct page *page)
}
}

-int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
- unsigned long end, int write, struct page **pages, int *nr)
+static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
+ unsigned long end, int write, struct page **pages, int *nr)
{
unsigned long pte_end;
struct page *head, *page;
@@ -800,3 +783,20 @@ int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,

return 1;
}
+
+int gup_huge_pd(hugepd_t hugepd, unsigned long addr, unsigned int pdshift,
+ unsigned long end, int write, struct page **pages, int *nr)
+{
+ pte_t *ptep;
+ unsigned long sz = 1UL << hugepd_shift(hugepd);
+ unsigned long next;
+
+ ptep = hugepte_offset(hugepd, addr, pdshift);
+ do {
+ next = hugepte_addr_end(addr, end, sz);
+ if (!gup_hugepte(ptep, sz, addr, end, write, pages, nr))
+ return 0;
+ } while (ptep++, addr = next, addr != end);
+
+ return 1;
+}
--
2.13.3

2019-04-26 06:02:49

by Christophe Leroy

Subject: [PATCH v2 05/17] powerpc/book3e: hugetlbpage is only for CONFIG_PPC_FSL_BOOK3E

As per Kconfig.cputype, only CONFIG_PPC_FSL_BOOK3E gets to select
SYS_SUPPORTS_HUGETLBFS, so simplify accordingly.

Signed-off-by: Christophe Leroy <[email protected]>
---
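With the file now FSL-only, the remaining split is on CONFIG_PPC64
alone; a schematic outline (not compilable as-is) of the reshuffled
helpers:

#ifdef CONFIG_PPC64
#include <asm/paca.h>
/* tlb1_next():             allocate an esel from the per-core TCD */
/* book3e_tlb_lock/unlock(): take/release paca->tcd_ptr->lock */
#else
/* tlb1_next():             round-robin next_tlbcam_idx */
/* book3e_tlb_lock/unlock(): empty stubs */
#endif
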
arch/powerpc/mm/Makefile | 2 +-
arch/powerpc/mm/hugetlbpage-book3e.c | 47 +++++++++++++++---------------------
2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 3c1bd9fa23cd..2c23d1ece034 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -37,7 +37,7 @@ obj-y += hugetlbpage.o
ifdef CONFIG_HUGETLB_PAGE
obj-$(CONFIG_PPC_BOOK3S_64) += hugetlbpage-hash64.o
obj-$(CONFIG_PPC_RADIX_MMU) += hugetlbpage-radix.o
-obj-$(CONFIG_PPC_BOOK3E_MMU) += hugetlbpage-book3e.o
+obj-$(CONFIG_PPC_FSL_BOOK3E) += hugetlbpage-book3e.o
endif
obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hugepage-hash64.o
obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o
diff --git a/arch/powerpc/mm/hugetlbpage-book3e.c b/arch/powerpc/mm/hugetlbpage-book3e.c
index c911fe9bfa0e..61915f4d3c7f 100644
--- a/arch/powerpc/mm/hugetlbpage-book3e.c
+++ b/arch/powerpc/mm/hugetlbpage-book3e.c
@@ -11,8 +11,9 @@

#include <asm/mmu.h>

-#ifdef CONFIG_PPC_FSL_BOOK3E
#ifdef CONFIG_PPC64
+#include <asm/paca.h>
+
static inline int tlb1_next(void)
{
struct paca_struct *paca = get_paca();
@@ -29,28 +30,6 @@ static inline int tlb1_next(void)
tcd->esel_next = next;
return this;
}
-#else
-static inline int tlb1_next(void)
-{
- int index, ncams;
-
- ncams = mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY;
-
- index = this_cpu_read(next_tlbcam_idx);
-
- /* Just round-robin the entries and wrap when we hit the end */
- if (unlikely(index == ncams - 1))
- __this_cpu_write(next_tlbcam_idx, tlbcam_index);
- else
- __this_cpu_inc(next_tlbcam_idx);
-
- return index;
-}
-#endif /* !PPC64 */
-#endif /* FSL */
-
-#if defined(CONFIG_PPC_FSL_BOOK3E) && defined(CONFIG_PPC64)
-#include <asm/paca.h>

static inline void book3e_tlb_lock(void)
{
@@ -93,6 +72,23 @@ static inline void book3e_tlb_unlock(void)
paca->tcd_ptr->lock = 0;
}
#else
+static inline int tlb1_next(void)
+{
+ int index, ncams;
+
+ ncams = mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY;
+
+ index = this_cpu_read(next_tlbcam_idx);
+
+ /* Just round-robin the entries and wrap when we hit the end */
+ if (unlikely(index == ncams - 1))
+ __this_cpu_write(next_tlbcam_idx, tlbcam_index);
+ else
+ __this_cpu_inc(next_tlbcam_idx);
+
+ return index;
+}
+
static inline void book3e_tlb_lock(void)
{
}
@@ -134,10 +130,7 @@ void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
unsigned long psize, tsize, shift;
unsigned long flags;
struct mm_struct *mm;
-
-#ifdef CONFIG_PPC_FSL_BOOK3E
int index;
-#endif

if (unlikely(is_kernel_addr(ea)))
return;
@@ -161,11 +154,9 @@ void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
return;
}

-#ifdef CONFIG_PPC_FSL_BOOK3E
/* We have to use the CAM(TLB1) on FSL parts for hugepages */
index = tlb1_next();
mtspr(SPRN_MAS0, MAS0_ESEL(index) | MAS0_TLBSEL(1));
-#endif

mas1 = MAS1_VALID | MAS1_TID(mm->context.id) | MAS1_TSIZE(tsize);
mas2 = ea & ~((1UL << shift) - 1);
--
2.13.3

2019-04-26 06:03:09

by Christophe Leroy

Subject: [PATCH v2 12/17] powerpc/mm: move hugetlb_disabled into asm/hugetlb.h

No need to have this in asm/page.h; move it into asm/hugetlb.h.

Signed-off-by: Christophe Leroy <[email protected]>
---
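asm/page.h is pulled in almost everywhere, so users of hugetlb_disabled
previously got the declaration for free; after the move they must
include linux/hugetlb.h (which includes asm/hugetlb.h), hence the two
added includes below. A hypothetical user, for illustration:

#include <linux/hugetlb.h>	/* now required for hugetlb_disabled */

static int __init example_init(void)
{
	if (hugetlb_disabled)	/* e.g. fadump disabled huge pages */
		return 0;
	/* ... huge page setup ... */
	return 0;
}
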
arch/powerpc/include/asm/hugetlb.h | 2 ++
arch/powerpc/include/asm/page.h | 1 -
arch/powerpc/kernel/fadump.c | 1 +
arch/powerpc/mm/hash_utils_64.c | 1 +
4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index fd5c0873a57d..84598c6b0959 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -13,6 +13,8 @@
#include <asm/nohash/32/hugetlb-8xx.h>
#endif /* CONFIG_PPC_BOOK3S_64 */

+extern bool hugetlb_disabled;
+
void flush_dcache_icache_hugepage(struct page *page);

int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 748f5db2e2b7..6b508420d92b 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -29,7 +29,6 @@

#ifndef __ASSEMBLY__
#ifdef CONFIG_HUGETLB_PAGE
-extern bool hugetlb_disabled;
extern unsigned int HPAGE_SHIFT;
#else
#define HPAGE_SHIFT PAGE_SHIFT
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 45a8d0be1c96..25f063f56ec5 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -36,6 +36,7 @@
#include <linux/sysfs.h>
#include <linux/slab.h>
#include <linux/cma.h>
+#include <linux/hugetlb.h>

#include <asm/debugfs.h>
#include <asm/page.h>
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index f727197de713..db1aef2adc88 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -37,6 +37,7 @@
#include <linux/context_tracking.h>
#include <linux/libfdt.h>
#include <linux/pkeys.h>
+#include <linux/hugetlb.h>

#include <asm/debugfs.h>
#include <asm/processor.h>
--
2.13.3

2019-04-26 06:03:15

by Christophe Leroy

Subject: [PATCH v2 04/17] powerpc/64: only book3s/64 supports CONFIG_PPC_64K_PAGES

CONFIG_PPC_64K_PAGES cannot be selected by nohash/64.

Signed-off-by: Christophe Leroy <[email protected]>
---
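The constraint comes from Kconfig (approximately, as of this series):

config PPC_64K_PAGES
	bool "64k page size"
	depends on 44x || PPC_BOOK3S_64

44x is 32-bit, so no 64-bit nohash platform can ever see
CONFIG_PPC_64K_PAGES.
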
arch/powerpc/Kconfig | 1 -
arch/powerpc/include/asm/nohash/64/pgalloc.h | 3 ---
arch/powerpc/include/asm/nohash/64/pgtable.h | 4 ----
arch/powerpc/include/asm/nohash/pte-book3e.h | 5 -----
arch/powerpc/include/asm/pgtable-be-types.h | 9 ++------
arch/powerpc/include/asm/pgtable-types.h | 9 ++------
arch/powerpc/include/asm/task_size_64.h | 2 +-
arch/powerpc/mm/tlb_low_64e.S | 31 ----------------------------
arch/powerpc/mm/tlb_nohash.c | 13 ------------
9 files changed, 5 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2d0be82c3061..5d8e692d6470 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -375,7 +375,6 @@ config ZONE_DMA
config PGTABLE_LEVELS
int
default 2 if !PPC64
- default 3 if PPC_64K_PAGES && !PPC_BOOK3S_64
default 4

source "arch/powerpc/sysdev/Kconfig"
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index 66d086f85bd5..ded453f9b5a8 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -171,12 +171,9 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,

#define __pmd_free_tlb(tlb, pmd, addr) \
pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX)
-#ifndef CONFIG_PPC_64K_PAGES
#define __pud_free_tlb(tlb, pud, addr) \
pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE)

-#endif /* CONFIG_PPC_64K_PAGES */
-
#define check_pgt_cache() do { } while (0)

#endif /* _ASM_POWERPC_PGALLOC_64_H */
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 0384a3302fb6..937a8476ec41 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -10,10 +10,6 @@
#include <asm/barrier.h>
#include <asm/asm-const.h>

-#ifdef CONFIG_PPC_64K_PAGES
-#error "Page size not supported"
-#endif
-
#define FIRST_USER_ADDRESS 0UL

/*
diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index dd40d200f274..813918f40765 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -60,13 +60,8 @@
#define _PAGE_SPECIAL _PAGE_SW0

/* Base page size */
-#ifdef CONFIG_PPC_64K_PAGES
-#define _PAGE_PSIZE _PAGE_PSIZE_64K
-#define PTE_RPN_SHIFT (28)
-#else
#define _PAGE_PSIZE _PAGE_PSIZE_4K
#define PTE_RPN_SHIFT (24)
-#endif

#define PTE_WIMGE_SHIFT (19)
#define PTE_BAP_SHIFT (2)
diff --git a/arch/powerpc/include/asm/pgtable-be-types.h b/arch/powerpc/include/asm/pgtable-be-types.h
index a89c67b62680..b169bbf95fcb 100644
--- a/arch/powerpc/include/asm/pgtable-be-types.h
+++ b/arch/powerpc/include/asm/pgtable-be-types.h
@@ -33,11 +33,7 @@ static inline __be64 pmd_raw(pmd_t x)
return x.pmd;
}

-/*
- * 64 bit hash always use 4 level table. Everybody else use 4 level
- * only for 4K page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) || !defined(CONFIG_PPC_64K_PAGES)
+/* 64 bit always use 4 level table. */
typedef struct { __be64 pud; } pud_t;
#define __pud(x) ((pud_t) { cpu_to_be64(x) })
#define __pud_raw(x) ((pud_t) { (x) })
@@ -51,7 +47,6 @@ static inline __be64 pud_raw(pud_t x)
return x.pud;
}

-#endif /* CONFIG_PPC_BOOK3S_64 || !CONFIG_PPC_64K_PAGES */
#endif /* CONFIG_PPC64 */

/* PGD level */
@@ -77,7 +72,7 @@ typedef struct { unsigned long pgprot; } pgprot_t;
* With hash config 64k pages additionally define a bigger "real PTE" type that
* gathers the "second half" part of the PTE for pseudo 64k pages
*/
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_BOOK3S_64)
+#ifdef CONFIG_PPC_64K_PAGES
typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
#else
typedef struct { pte_t pte; } real_pte_t;
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index 3b0edf041b2e..d11b4c61d686 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -23,18 +23,13 @@ static inline unsigned long pmd_val(pmd_t x)
return x.pmd;
}

-/*
- * 64 bit hash always use 4 level table. Everybody else use 4 level
- * only for 4K page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) || !defined(CONFIG_PPC_64K_PAGES)
+/* 64 bit always use 4 level table. */
typedef struct { unsigned long pud; } pud_t;
#define __pud(x) ((pud_t) { (x) })
static inline unsigned long pud_val(pud_t x)
{
return x.pud;
}
-#endif /* CONFIG_PPC_BOOK3S_64 || !CONFIG_PPC_64K_PAGES */
#endif /* CONFIG_PPC64 */

/* PGD level */
@@ -54,7 +49,7 @@ typedef struct { unsigned long pgprot; } pgprot_t;
* With hash config 64k pages additionally define a bigger "real PTE" type that
* gathers the "second half" part of the PTE for pseudo 64k pages
*/
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_BOOK3S_64)
+#ifdef CONFIG_PPC_64K_PAGES
typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
#else
typedef struct { pte_t pte; } real_pte_t;
diff --git a/arch/powerpc/include/asm/task_size_64.h b/arch/powerpc/include/asm/task_size_64.h
index eab4779f6b84..c993482237ed 100644
--- a/arch/powerpc/include/asm/task_size_64.h
+++ b/arch/powerpc/include/asm/task_size_64.h
@@ -20,7 +20,7 @@
/*
* For now 512TB is only supported with book3s and 64K linux page size.
*/
-#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
+#ifdef CONFIG_PPC_64K_PAGES
/*
* Max value currently used:
*/
diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index 9ed90064f542..58959ce15415 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -24,11 +24,7 @@
#include <asm/kvm_booke_hv_asm.h>
#include <asm/feature-fixups.h>

-#ifdef CONFIG_PPC_64K_PAGES
-#define VPTE_PMD_SHIFT (PTE_INDEX_SIZE+1)
-#else
#define VPTE_PMD_SHIFT (PTE_INDEX_SIZE)
-#endif
#define VPTE_PUD_SHIFT (VPTE_PMD_SHIFT + PMD_INDEX_SIZE)
#define VPTE_PGD_SHIFT (VPTE_PUD_SHIFT + PUD_INDEX_SIZE)
#define VPTE_INDEX_SIZE (VPTE_PGD_SHIFT + PGD_INDEX_SIZE)
@@ -167,13 +163,11 @@ MMU_FTR_SECTION_ELSE
ldx r14,r14,r15 /* grab pgd entry */
ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_USE_TLBRSRV)

-#ifndef CONFIG_PPC_64K_PAGES
rldicl r15,r16,64-PUD_SHIFT+3,64-PUD_INDEX_SIZE-3
clrrdi r15,r15,3
cmpdi cr0,r14,0
bge tlb_miss_fault_bolted /* Bad pgd entry or hugepage; bail */
ldx r14,r14,r15 /* grab pud entry */
-#endif /* CONFIG_PPC_64K_PAGES */

rldicl r15,r16,64-PMD_SHIFT+3,64-PMD_INDEX_SIZE-3
clrrdi r15,r15,3
@@ -682,18 +676,7 @@ normal_tlb_miss:
* order to handle the weird page table format used by linux
*/
ori r10,r15,0x1
-#ifdef CONFIG_PPC_64K_PAGES
- /* For the top bits, 16 bytes per PTE */
- rldicl r14,r16,64-(PAGE_SHIFT-4),PAGE_SHIFT-4+4
- /* Now create the bottom bits as 0 in position 0x8000 and
- * the rest calculated for 8 bytes per PTE
- */
- rldicl r15,r16,64-(PAGE_SHIFT-3),64-15
- /* Insert the bottom bits in */
- rlwimi r14,r15,0,16,31
-#else
rldicl r14,r16,64-(PAGE_SHIFT-3),PAGE_SHIFT-3+4
-#endif
sldi r15,r10,60
clrrdi r14,r14,3
or r10,r15,r14
@@ -732,11 +715,7 @@ finish_normal_tlb_miss:

/* Check page size, if not standard, update MAS1 */
rldicl r11,r14,64-8,64-8
-#ifdef CONFIG_PPC_64K_PAGES
- cmpldi cr0,r11,BOOK3E_PAGESZ_64K
-#else
cmpldi cr0,r11,BOOK3E_PAGESZ_4K
-#endif
beq- 1f
mfspr r11,SPRN_MAS1
rlwimi r11,r14,31,21,24
@@ -857,14 +836,12 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_TLBRSRV)
cmpdi cr0,r15,0
bge virt_page_table_tlb_miss_fault

-#ifndef CONFIG_PPC_64K_PAGES
/* Get to PUD entry */
rldicl r11,r16,64-VPTE_PUD_SHIFT,64-PUD_INDEX_SIZE-3
clrrdi r10,r11,3
ldx r15,r10,r15
cmpdi cr0,r15,0
bge virt_page_table_tlb_miss_fault
-#endif /* CONFIG_PPC_64K_PAGES */

/* Get to PMD entry */
rldicl r11,r16,64-VPTE_PMD_SHIFT,64-PMD_INDEX_SIZE-3
@@ -1106,14 +1083,12 @@ htw_tlb_miss:
cmpdi cr0,r15,0
bge htw_tlb_miss_fault

-#ifndef CONFIG_PPC_64K_PAGES
/* Get to PUD entry */
rldicl r11,r16,64-(PUD_SHIFT-3),64-PUD_INDEX_SIZE-3
clrrdi r10,r11,3
ldx r15,r10,r15
cmpdi cr0,r15,0
bge htw_tlb_miss_fault
-#endif /* CONFIG_PPC_64K_PAGES */

/* Get to PMD entry */
rldicl r11,r16,64-(PMD_SHIFT-3),64-PMD_INDEX_SIZE-3
@@ -1132,9 +1107,7 @@ htw_tlb_miss:
* 4K page we need to extract a bit from the virtual address and
* insert it into the "PA52" bit of the RPN.
*/
-#ifndef CONFIG_PPC_64K_PAGES
rlwimi r15,r16,32-9,20,20
-#endif
/* Now we build the MAS:
*
* MAS 0 : Fully setup with defaults in MAS4 and TLBnCFG
@@ -1144,11 +1117,7 @@ htw_tlb_miss:
* MAS 2 : Use defaults
* MAS 3+7 : Needs to be done
*/
-#ifdef CONFIG_PPC_64K_PAGES
- ori r10,r15,(BOOK3E_PAGESZ_64K << MAS3_SPSIZE_SHIFT)
-#else
ori r10,r15,(BOOK3E_PAGESZ_4K << MAS3_SPSIZE_SHIFT)
-#endif

BEGIN_MMU_FTR_SECTION
srdi r16,r10,32
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 088e0a6b5ade..1514d5e2a73b 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -433,11 +433,7 @@ void tlb_flush_pgtable(struct mmu_gather *tlb, unsigned long address)
unsigned long rid = (address & rmask) | 0x1000000000000000ul;
unsigned long vpte = address & ~rmask;

-#ifdef CONFIG_PPC_64K_PAGES
- vpte = (vpte >> (PAGE_SHIFT - 4)) & ~0xfffful;
-#else
vpte = (vpte >> (PAGE_SHIFT - 3)) & ~0xffful;
-#endif
vpte |= rid;
__flush_tlb_page(tlb->mm, vpte, tsize, 0);
}
@@ -625,21 +621,12 @@ static void early_init_this_mmu(void)

case PPC_HTW_IBM:
mas4 |= MAS4_INDD;
-#ifdef CONFIG_PPC_64K_PAGES
- mas4 |= BOOK3E_PAGESZ_256M << MAS4_TSIZED_SHIFT;
- mmu_pte_psize = MMU_PAGE_256M;
-#else
mas4 |= BOOK3E_PAGESZ_1M << MAS4_TSIZED_SHIFT;
mmu_pte_psize = MMU_PAGE_1M;
-#endif
break;

case PPC_HTW_NONE:
-#ifdef CONFIG_PPC_64K_PAGES
- mas4 |= BOOK3E_PAGESZ_64K << MAS4_TSIZED_SHIFT;
-#else
mas4 |= BOOK3E_PAGESZ_4K << MAS4_TSIZED_SHIFT;
-#endif
mmu_pte_psize = mmu_virtual_psize;
break;
}
--
2.13.3

2019-04-26 06:03:22

by Christophe Leroy

Subject: [PATCH v2 03/17] powerpc/book3e: drop mmu_get_tsize()

This function is not used anymore; drop it.

Fixes: b42279f0165c ("powerpc/mm/nohash: MM_SLICE is only used by book3s 64")
Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/hugetlbpage-book3e.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage-book3e.c b/arch/powerpc/mm/hugetlbpage-book3e.c
index f84ec46cdb26..c911fe9bfa0e 100644
--- a/arch/powerpc/mm/hugetlbpage-book3e.c
+++ b/arch/powerpc/mm/hugetlbpage-book3e.c
@@ -49,11 +49,6 @@ static inline int tlb1_next(void)
#endif /* !PPC64 */
#endif /* FSL */

-static inline int mmu_get_tsize(int psize)
-{
- return mmu_psize_defs[psize].enc;
-}
-
#if defined(CONFIG_PPC_FSL_BOOK3E) && defined(CONFIG_PPC64)
#include <asm/paca.h>

--
2.13.3

2019-04-26 06:03:35

by Christophe Leroy

Subject: [PATCH v2 14/17] powerpc/mm: cleanup remaining ifdef mess in hugetlbpage.c

Only three subarches support huge pages, so when a test excludes two
of them, the remaining one is implied.

And mmu_has_feature() is known by all subarches, so IS_ENABLED() can
be used instead of #ifdef.

Signed-off-by: Christophe Leroy <[email protected]>
---
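IS_ENABLED(CONFIG_FOO) expands to constant 1 when the option is set and
0 otherwise, so both branches are parsed and type-checked while the
dead one is optimized away, e.g.:

	else if (IS_ENABLED(CONFIG_PPC_FSL_BOOK3E) ||
		 IS_ENABLED(CONFIG_PPC_8xx))
		pgtable_cache_add(PTE_T_ORDER);
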
arch/powerpc/mm/hugetlbpage.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 265bd6d04233..1d5c6ec04351 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -226,7 +226,7 @@ int __init alloc_bootmem_huge_page(struct hstate *h)
return __alloc_bootmem_huge_page(h);
}

-#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
+#ifndef CONFIG_PPC_BOOK3S_64
#define HUGEPD_FREELIST_SIZE \
((PAGE_SIZE - sizeof(struct hugepd_freelist)) / sizeof(pte_t))

@@ -597,10 +597,10 @@ static int __init hugetlbpage_init(void)
return 0;
}

-#if !defined(CONFIG_PPC_FSL_BOOK3E) && !defined(CONFIG_PPC_8xx)
- if (!radix_enabled() && !mmu_has_feature(MMU_FTR_16M_PAGE))
+ if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled() &&
+ !mmu_has_feature(MMU_FTR_16M_PAGE))
return -ENODEV;
-#endif
+
for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
unsigned shift;
unsigned pdshift;
@@ -638,10 +638,8 @@ static int __init hugetlbpage_init(void)
pgtable_cache_add(PTE_INDEX_SIZE);
else if (pdshift > shift)
pgtable_cache_add(pdshift - shift);
-#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
- else
+ else if (IS_ENABLED(CONFIG_PPC_FSL_BOOK3E) || IS_ENABLED(CONFIG_PPC_8xx))
pgtable_cache_add(PTE_T_ORDER);
-#endif
}

if (IS_ENABLED(CONFIG_HUGETLB_PAGE_SIZE_VARIABLE))
--
2.13.3

2019-04-26 06:03:50

by Christophe Leroy

Subject: [PATCH v2 06/17] powerpc/mm: move __find_linux_pte() out of hugetlbpage.c

__find_linux_pte() is the only function in hugetlbpage.c which is
compiled in regardless of CONFIG_HUGETLB_PAGE.

This patch moves it into pgtable.c.

Signed-off-by: Christophe Leroy <[email protected]>
---
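A sketch of the calling convention (hypothetical caller; the real ones
include the hash page-fault path): interrupts must stay disabled across
the walk, and the returned shift tells huge mappings from base pages:

	unsigned long flags;
	unsigned int shift;
	bool is_thp;
	pte_t *ptep;

	local_irq_save(flags);
	ptep = __find_linux_pte(mm->pgd, ea, &is_thp, &shift);
	if (ptep && !shift && !is_thp) {
		/* regular base-page PTE */
	}
	local_irq_restore(flags);
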
arch/powerpc/mm/hugetlbpage.c | 103 -----------------------------------------
arch/powerpc/mm/pgtable.c | 104 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 104 insertions(+), 103 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 6b9b6d747df3..81daa0e38665 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -758,109 +758,6 @@ void flush_dcache_icache_hugepage(struct page *page)

#endif /* CONFIG_HUGETLB_PAGE */

-/*
- * We have 4 cases for pgds and pmds:
- * (1) invalid (all zeroes)
- * (2) pointer to next table, as normal; bottom 6 bits == 0
- * (3) leaf pte for huge page _PAGE_PTE set
- * (4) hugepd pointer, _PAGE_PTE = 0 and bits [2..6] indicate size of table
- *
- * So long as we atomically load page table pointers we are safe against teardown,
- * we can follow the address down to the the page and take a ref on it.
- * This function need to be called with interrupts disabled. We use this variant
- * when we have MSR[EE] = 0 but the paca->irq_soft_mask = IRQS_ENABLED
- */
-pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
- bool *is_thp, unsigned *hpage_shift)
-{
- pgd_t pgd, *pgdp;
- pud_t pud, *pudp;
- pmd_t pmd, *pmdp;
- pte_t *ret_pte;
- hugepd_t *hpdp = NULL;
- unsigned pdshift = PGDIR_SHIFT;
-
- if (hpage_shift)
- *hpage_shift = 0;
-
- if (is_thp)
- *is_thp = false;
-
- pgdp = pgdir + pgd_index(ea);
- pgd = READ_ONCE(*pgdp);
- /*
- * Always operate on the local stack value. This make sure the
- * value don't get updated by a parallel THP split/collapse,
- * page fault or a page unmap. The return pte_t * is still not
- * stable. So should be checked there for above conditions.
- */
- if (pgd_none(pgd))
- return NULL;
- else if (pgd_huge(pgd)) {
- ret_pte = (pte_t *) pgdp;
- goto out;
- } else if (is_hugepd(__hugepd(pgd_val(pgd))))
- hpdp = (hugepd_t *)&pgd;
- else {
- /*
- * Even if we end up with an unmap, the pgtable will not
- * be freed, because we do an rcu free and here we are
- * irq disabled
- */
- pdshift = PUD_SHIFT;
- pudp = pud_offset(&pgd, ea);
- pud = READ_ONCE(*pudp);
-
- if (pud_none(pud))
- return NULL;
- else if (pud_huge(pud)) {
- ret_pte = (pte_t *) pudp;
- goto out;
- } else if (is_hugepd(__hugepd(pud_val(pud))))
- hpdp = (hugepd_t *)&pud;
- else {
- pdshift = PMD_SHIFT;
- pmdp = pmd_offset(&pud, ea);
- pmd = READ_ONCE(*pmdp);
- /*
- * A hugepage collapse is captured by pmd_none, because
- * it mark the pmd none and do a hpte invalidate.
- */
- if (pmd_none(pmd))
- return NULL;
-
- if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
- if (is_thp)
- *is_thp = true;
- ret_pte = (pte_t *) pmdp;
- goto out;
- }
- /*
- * pmd_large check below will handle the swap pmd pte
- * we need to do both the check because they are config
- * dependent.
- */
- if (pmd_huge(pmd) || pmd_large(pmd)) {
- ret_pte = (pte_t *) pmdp;
- goto out;
- } else if (is_hugepd(__hugepd(pmd_val(pmd))))
- hpdp = (hugepd_t *)&pmd;
- else
- return pte_offset_kernel(&pmd, ea);
- }
- }
- if (!hpdp)
- return NULL;
-
- ret_pte = hugepte_offset(*hpdp, ea, pdshift);
- pdshift = hugepd_shift(*hpdp);
-out:
- if (hpage_shift)
- *hpage_shift = pdshift;
- return ret_pte;
-}
-EXPORT_SYMBOL_GPL(__find_linux_pte);
-
int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
unsigned long end, int write, struct page **pages, int *nr)
{
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index d3d61d29b4f1..9f4ccd15849f 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -30,6 +30,7 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/tlb.h>
+#include <asm/hugetlb.h>

static inline int is_exec_fault(void)
{
@@ -299,3 +300,106 @@ unsigned long vmalloc_to_phys(void *va)
return __pa(pfn_to_kaddr(pfn)) + offset_in_page(va);
}
EXPORT_SYMBOL_GPL(vmalloc_to_phys);
+
+/*
+ * We have 4 cases for pgds and pmds:
+ * (1) invalid (all zeroes)
+ * (2) pointer to next table, as normal; bottom 6 bits == 0
+ * (3) leaf pte for huge page _PAGE_PTE set
+ * (4) hugepd pointer, _PAGE_PTE = 0 and bits [2..6] indicate size of table
+ *
+ * So long as we atomically load page table pointers we are safe against teardown,
+ * we can follow the address down to the the page and take a ref on it.
+ * This function need to be called with interrupts disabled. We use this variant
+ * when we have MSR[EE] = 0 but the paca->irq_soft_mask = IRQS_ENABLED
+ */
+pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
+ bool *is_thp, unsigned *hpage_shift)
+{
+ pgd_t pgd, *pgdp;
+ pud_t pud, *pudp;
+ pmd_t pmd, *pmdp;
+ pte_t *ret_pte;
+ hugepd_t *hpdp = NULL;
+ unsigned pdshift = PGDIR_SHIFT;
+
+ if (hpage_shift)
+ *hpage_shift = 0;
+
+ if (is_thp)
+ *is_thp = false;
+
+ pgdp = pgdir + pgd_index(ea);
+ pgd = READ_ONCE(*pgdp);
+ /*
+ * Always operate on the local stack value. This make sure the
+ * value don't get updated by a parallel THP split/collapse,
+ * page fault or a page unmap. The return pte_t * is still not
+ * stable. So should be checked there for above conditions.
+ */
+ if (pgd_none(pgd))
+ return NULL;
+ else if (pgd_huge(pgd)) {
+ ret_pte = (pte_t *) pgdp;
+ goto out;
+ } else if (is_hugepd(__hugepd(pgd_val(pgd))))
+ hpdp = (hugepd_t *)&pgd;
+ else {
+ /*
+ * Even if we end up with an unmap, the pgtable will not
+ * be freed, because we do an rcu free and here we are
+ * irq disabled
+ */
+ pdshift = PUD_SHIFT;
+ pudp = pud_offset(&pgd, ea);
+ pud = READ_ONCE(*pudp);
+
+ if (pud_none(pud))
+ return NULL;
+ else if (pud_huge(pud)) {
+ ret_pte = (pte_t *) pudp;
+ goto out;
+ } else if (is_hugepd(__hugepd(pud_val(pud))))
+ hpdp = (hugepd_t *)&pud;
+ else {
+ pdshift = PMD_SHIFT;
+ pmdp = pmd_offset(&pud, ea);
+ pmd = READ_ONCE(*pmdp);
+ /*
+ * A hugepage collapse is captured by pmd_none, because
+ * it mark the pmd none and do a hpte invalidate.
+ */
+ if (pmd_none(pmd))
+ return NULL;
+
+ if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
+ if (is_thp)
+ *is_thp = true;
+ ret_pte = (pte_t *) pmdp;
+ goto out;
+ }
+ /*
+ * pmd_large check below will handle the swap pmd pte
+ * we need to do both the check because they are config
+ * dependent.
+ */
+ if (pmd_huge(pmd) || pmd_large(pmd)) {
+ ret_pte = (pte_t *) pmdp;
+ goto out;
+ } else if (is_hugepd(__hugepd(pmd_val(pmd))))
+ hpdp = (hugepd_t *)&pmd;
+ else
+ return pte_offset_kernel(&pmd, ea);
+ }
+ }
+ if (!hpdp)
+ return NULL;
+
+ ret_pte = hugepte_offset(*hpdp, ea, pdshift);
+ pdshift = hugepd_shift(*hpdp);
+out:
+ if (hpage_shift)
+ *hpage_shift = pdshift;
+ return ret_pte;
+}
+EXPORT_SYMBOL_GPL(__find_linux_pte);
--
2.13.3

2019-04-26 06:23:31

by Christophe Leroy

Subject: [PATCH v2 02/17] powerpc/mm: don't BUG in add_huge_page_size()

Use VM_BUG_ON() instead of BUG_ON(): these assertions are not there to
catch runtime errors, only to catch bugs during the development cycle.

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/hugetlbpage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 9e732bb2c84a..6b9b6d747df3 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -634,7 +634,7 @@ static int __init add_huge_page_size(unsigned long long size)
}
#endif

- BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
+ VM_BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);

/* Return if huge page size has already been setup */
if (size_to_hstate(size))
--
2.13.3

2019-04-26 06:23:45

by Christophe Leroy

Subject: [PATCH v2 07/17] powerpc/mm: make hugetlbpage.c depend on CONFIG_HUGETLB_PAGE

The only function in hugetlbpage.c which doesn't depend on
CONFIG_HUGETLB_PAGE is gup_hugepte(), and it is only called from
gup_huge_pd(), which itself depends on CONFIG_HUGETLB_PAGE. So the
entire content of hugetlbpage.c depends on CONFIG_HUGETLB_PAGE.

This patch modifies the Makefile to only compile hugetlbpage.c
when CONFIG_HUGETLB_PAGE is set.

Signed-off-by: Christophe Leroy <[email protected]>
---
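The kbuild idiom: obj-$(CONFIG_HUGETLB_PAGE) expands to obj-y when the
option is set and to the ignored obj- list otherwise, so the file drops
out of the build entirely instead of compiling to an empty object:

obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
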
arch/powerpc/mm/Makefile | 2 +-
arch/powerpc/mm/hugetlbpage.c | 5 -----
2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 2c23d1ece034..20b900537fc9 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -33,7 +33,7 @@ obj-$(CONFIG_PPC_FSL_BOOK3E) += fsl_booke_mmu.o
obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o
obj-$(CONFIG_PPC_SPLPAR) += vphn.o
obj-$(CONFIG_PPC_MM_SLICES) += slice.o
-obj-y += hugetlbpage.o
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
ifdef CONFIG_HUGETLB_PAGE
obj-$(CONFIG_PPC_BOOK3S_64) += hugetlbpage-hash64.o
obj-$(CONFIG_PPC_RADIX_MMU) += hugetlbpage-radix.o
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 81daa0e38665..809f8f167962 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -26,9 +26,6 @@
#include <asm/hugetlb.h>
#include <asm/pte-walk.h>

-
-#ifdef CONFIG_HUGETLB_PAGE
-
#define PAGE_SHIFT_64K 16
#define PAGE_SHIFT_512K 19
#define PAGE_SHIFT_8M 23
@@ -756,8 +753,6 @@ void flush_dcache_icache_hugepage(struct page *page)
}
}

-#endif /* CONFIG_HUGETLB_PAGE */
-
int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
unsigned long end, int write, struct page **pages, int *nr)
{
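
[Editorial note: to make the dependency chain concrete, an abridged
sketch of the caller as it stood in the powerpc hugetlb GUP code of this
era — details trimmed and possibly inexact; see the actual source:]

int gup_huge_pd(hugepd_t hugepd, unsigned long addr, unsigned int pdshift,
		unsigned long end, int write, struct page **pages, int *nr)
{
	unsigned long sz = 1UL << hugepd_shift(hugepd);
	pte_t *ptep = hugepte_offset(hugepd, addr, pdshift);
	unsigned long next;

	do {	/* walk each hugepte under the hugepd, one chunk at a time */
		next = hugepte_addr_end(addr, end, sz);
		if (!gup_hugepte(ptep, sz, addr, next, write, pages, nr))
			return 0;
	} while (ptep++, addr = next, addr != end);

	return 1;
}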
--
2.13.3

2019-04-26 06:24:24

by Christophe Leroy

Subject: [PATCH v2 11/17] powerpc/mm: cleanup ifdef mess in add_huge_page_size()

Introduce a subarch-specific helper, check_and_get_huge_psize(),
to check the huge page sizes and clean up the ifdef mess in
add_huge_page_size().

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/include/asm/book3s/64/hugetlb.h | 27 +++++++++++++++++
arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 5 ++++
arch/powerpc/include/asm/nohash/hugetlb-book3e.h | 8 +++++
arch/powerpc/mm/hugetlbpage.c | 37 ++----------------------
4 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index 5537046b4c78..0756036e2b2d 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -107,4 +107,31 @@ static inline void hugepd_populate(hugepd_t *hpdp, pte_t *new, unsigned int pshi

void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);

+static inline int check_and_get_huge_psize(int shift)
+{
+ int mmu_psize;
+
+ if (shift > SLICE_HIGH_SHIFT)
+ return -EINVAL;
+
+ mmu_psize = shift_to_mmu_psize(shift);
+
+ /*
+ * We need to make sure that, for the different page sizes reported by
+ * firmware, we only add hugetlb support for page sizes that can be
+ * supported by the Linux page table layout.
+ * For now we have
+ * Radix: 2M and 1G
+ * Hash: 16M and 16G
+ */
+ if (radix_enabled()) {
+ if (mmu_psize != MMU_PAGE_2M && mmu_psize != MMU_PAGE_1G)
+ return -EINVAL;
+ } else {
+ if (mmu_psize != MMU_PAGE_16M && mmu_psize != MMU_PAGE_16G)
+ return -EINVAL;
+ }
+ return mmu_psize;
+}
+
#endif
diff --git a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
index d9aa72fd76b3..baa386c51d87 100644
--- a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
@@ -36,4 +36,9 @@ static inline void hugepd_populate(hugepd_t *hpdp, pte_t *new, unsigned int pshi
(pshift == PAGE_SHIFT_8M ? _PMD_PAGE_8M : _PMD_PAGE_512K));
}

+static inline int check_and_get_huge_psize(int shift)
+{
+ return shift_to_mmu_psize(shift);
+}
+
#endif /* _ASM_POWERPC_NOHASH_32_HUGETLB_8XX_H */
diff --git a/arch/powerpc/include/asm/nohash/hugetlb-book3e.h b/arch/powerpc/include/asm/nohash/hugetlb-book3e.h
index 51439bcfe313..ecd8694cb229 100644
--- a/arch/powerpc/include/asm/nohash/hugetlb-book3e.h
+++ b/arch/powerpc/include/asm/nohash/hugetlb-book3e.h
@@ -34,4 +34,12 @@ static inline void hugepd_populate(hugepd_t *hpdp, pte_t *new, unsigned int pshi
*hpdp = __hugepd(((unsigned long)new & ~PD_HUGE) | pshift);
}

+static inline int check_and_get_huge_psize(int shift)
+{
+ if (shift & 1) /* Not a power of 4 */
+ return -EINVAL;
+
+ return shift_to_mmu_psize(shift);
+}
+
#endif /* _ASM_POWERPC_NOHASH_HUGETLB_BOOK3E_H */
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 0322a0617371..828860a7492e 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -551,13 +551,6 @@ unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
return vma_kernel_pagesize(vma);
}

-static inline bool is_power_of_4(unsigned long x)
-{
- if (is_power_of_2(x))
- return (__ilog2(x) % 2) ? false : true;
- return false;
-}
-
static int __init add_huge_page_size(unsigned long long size)
{
int shift = __ffs(size);
@@ -565,37 +558,13 @@ static int __init add_huge_page_size(unsigned long long size)

/* Check that it is a page size supported by the hardware and
* that it fits within pagetable and slice limits. */
- if (size <= PAGE_SIZE)
- return -EINVAL;
-#if defined(CONFIG_PPC_FSL_BOOK3E)
- if (!is_power_of_4(size))
+ if (size <= PAGE_SIZE || !is_power_of_2(size))
return -EINVAL;
-#elif !defined(CONFIG_PPC_8xx)
- if (!is_power_of_2(size) || (shift > SLICE_HIGH_SHIFT))
- return -EINVAL;
-#endif

- if ((mmu_psize = shift_to_mmu_psize(shift)) < 0)
+ mmu_psize = check_and_get_huge_psize(shift);
+ if (mmu_psize < 0)
return -EINVAL;

-#ifdef CONFIG_PPC_BOOK3S_64
- /*
- * We need to make sure that for different page sizes reported by
- * firmware we only add hugetlb support for page sizes that can be
- * supported by linux page table layout.
- * For now we have
- * Radix: 2M and 1G
- * Hash: 16M and 16G
- */
- if (radix_enabled()) {
- if (mmu_psize != MMU_PAGE_2M && mmu_psize != MMU_PAGE_1G)
- return -EINVAL;
- } else {
- if (mmu_psize != MMU_PAGE_16M && mmu_psize != MMU_PAGE_16G)
- return -EINVAL;
- }
-#endif
-
VM_BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);

/* Return if huge page size has already been setup */
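
[Editorial note: a worked example of the consolidated check, with
illustrative values: the generic code now only verifies that the size is
a power of 2 above PAGE_SIZE, and each subarch helper applies its own
constraint.]

/*
 * FSL book3e: sizes must be powers of 4, i.e. 2^(2k), so the shift
 * parity test above does the whole job:
 *   add_huge_page_size(4M) -> shift = 22 (even) -> accepted
 *   add_huge_page_size(8M) -> shift = 23 (odd)  -> -EINVAL
 *
 * book3s/64: radix accepts only 2M and 1G; hash only 16M and 16G.
 * 8xx: any power-of-2 size with a defined mmu_psize is accepted.
 */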
--
2.13.3

2019-05-02 12:03:30

by Michael Ellerman

Subject: Re: [PATCH v2 01/17] powerpc/mm: Don't BUG() in hugepd_page()

Christophe Leroy <[email protected]> writes:
> Use VM_BUG_ON() instead of BUG_ON(), as those BUG_ON()
> are not there to catch runtime errors but to catch errors
> during development cycle only.

I've dropped this one and the next, because I don't like VM_BUG_ON().

Why not? Because it's contradictory. It's a condition that's so
important that we should BUG, but only if the kernel has been built
specially for debugging.

I don't really buy the development cycle distinction, it's not like we
have a rigorous test suite that we run and then we declare everything's
gold and ship a product. We often don't find bugs until they're hit in
the wild.

For example the recent corruption Joel discovered with STRICT_KERNEL_RWX
could have been caught by a BUG_ON() to check we weren't patching kernel
text in radix__change_memory_range(), but he wouldn't have been using
CONFIG_DEBUG_VM. (See 8adddf349fda)

I know Aneesh disagrees with me on this, so maybe you two can convince
me otherwise.

cheers

> diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
> index 8d40565ad0c3..7f1867e428c0 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -14,7 +14,7 @@
> */
> static inline pte_t *hugepd_page(hugepd_t hpd)
> {
> - BUG_ON(!hugepd_ok(hpd));
> + VM_BUG_ON(!hugepd_ok(hpd));
> /*
> * We have only four bits to encode, MMU page size
> */
> @@ -42,7 +42,7 @@ static inline void flush_hugetlb_page(struct vm_area_struct *vma,
>
> static inline pte_t *hugepd_page(hugepd_t hpd)
> {
> - BUG_ON(!hugepd_ok(hpd));
> + VM_BUG_ON(!hugepd_ok(hpd));
> #ifdef CONFIG_PPC_8xx
> return (pte_t *)__va(hpd_val(hpd) & ~HUGEPD_SHIFT_MASK);
> #else
> --
> 2.13.3

2019-05-02 12:14:48

by Christophe Leroy

Subject: Re: [PATCH v2 01/17] powerpc/mm: Don't BUG() in hugepd_page()



On 02/05/2019 at 14:02, Michael Ellerman wrote:
> Christophe Leroy <[email protected]> writes:
>> Use VM_BUG_ON() instead of BUG_ON(), as those BUG_ON()
>> are not there to catch runtime errors but to catch errors
>> during development cycle only.
>
> I've dropped this one and the next, because I don't like VM_BUG_ON().
>
> Why not? Because it's contradictory. It's a condition that's so
> important that we should BUG, but only if the kernel has been built
> specially for debugging.
>
> I don't really buy the development cycle distinction, it's not like we
> have a rigorous test suite that we run and then we declare everything's
> gold and ship a product. We often don't find bugs until they're hit in
> the wild.
>
> For example the recent corruption Joel discovered with STRICT_KERNEL_RWX
> could have been caught by a BUG_ON() to check we weren't patching kernel
> text in radix__change_memory_range(), but he wouldn't have been using
> CONFIG_DEBUG_VM. (See 8adddf349fda)
>
> I know Aneesh disagrees with me on this, so maybe you two can convince
> me otherwise.
>

I have no strong opinion about this. In v1, I replaced them with a
WARN_ON(), and Aneesh suggested going with VM_BUG_ON() instead.

My main purpose was to reduce the amount of BUG/BUG_ON and I thought
those were good candidates, but if you prefer keeping the BUG(), that's
ok for me. Or maybe you preferred the v1 alternatives (series at
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=98170)?

Christophe

2019-05-03 07:30:03

by Michael Ellerman

Subject: Re: [PATCH v2 03/17] powerpc/book3e: drop mmu_get_tsize()

On Fri, 2019-04-26 at 05:59:38 UTC, Christophe Leroy wrote:
> This function is not used anymore, drop it.
>
> Fixes: b42279f0165c ("powerpc/mm/nohash: MM_SLICE is only used by book3s 64")
> Signed-off-by: Christophe Leroy <[email protected]>

Patches 3-17 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a521c44c3ded9fe184c5de3eed3a442a

cheers