2014-04-16 11:46:52

by Steve Capper

Subject: [PATCH V2 0/5] Huge pages for short descriptors on ARM

Hello,
This series brings HugeTLB pages and Transparent Huge Pages (THP) to
ARM on short descriptors.

Russell, Andrew,
I would like to get this into linux-next (and hopefully 3.16, if no
problems arise) if that sounds reasonable.

There's one patch at the beginning of the series for mm:
mm: hugetlb: Introduce huge_pte_{page,present,young}
This has been tested on ARM and s390 and should compile out for other
architectures.

The rest of the series targets arch/arm.

I've bumped the series to V2 as it has been rebased on (and tested
against) v3.15-rc1. On ARM, the libhugetlbfs test suite, some THP
PROT_NONE tests and the recursive execve test all passed.

Thanks,
--
Steve


Steve Capper (5):
mm: hugetlb: Introduce huge_pte_{page,present,young}
arm: mm: Adjust the parameters for __sync_icache_dcache
arm: mm: Make mmu_gather aware of huge pages
arm: mm: HugeTLB support for non-LPAE systems
arm: mm: Add Transparent HugePage support for non-LPAE

arch/arm/Kconfig | 4 +-
arch/arm/include/asm/hugetlb-2level.h | 136 ++++++++++++++++++++++++++++++++++
arch/arm/include/asm/hugetlb-3level.h | 6 ++
arch/arm/include/asm/hugetlb.h | 10 +--
arch/arm/include/asm/pgtable-2level.h | 129 +++++++++++++++++++++++++++++++-
arch/arm/include/asm/pgtable-3level.h | 3 +-
arch/arm/include/asm/pgtable.h | 9 +--
arch/arm/include/asm/tlb.h | 14 +++-
arch/arm/kernel/head.S | 10 ++-
arch/arm/mm/fault.c | 13 ----
arch/arm/mm/flush.c | 9 +--
arch/arm/mm/fsr-2level.c | 4 +-
arch/arm/mm/hugetlbpage.c | 2 +-
arch/arm/mm/mmu.c | 51 +++++++++++++
arch/s390/include/asm/hugetlb.h | 15 ++++
include/asm-generic/hugetlb.h | 15 ++++
mm/hugetlb.c | 22 +++---
17 files changed, 399 insertions(+), 53 deletions(-)
create mode 100644 arch/arm/include/asm/hugetlb-2level.h

--
1.8.1.4


2014-04-16 11:47:04

by Steve Capper

Subject: [PATCH V2 2/5] arm: mm: Adjust the parameters for __sync_icache_dcache

Rather than take a pte_t as an input, break this down to the pfn
and whether or not the memory is executable.

This allows us to use this function for ptes and pmds.

Signed-off-by: Steve Capper <[email protected]>
---
arch/arm/include/asm/pgtable.h | 6 +++---
arch/arm/mm/flush.c | 9 ++++-----
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 5478e5d..3a9c238 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -228,11 +228,11 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
(pte_valid(pte) && (pte_val(pte) & L_PTE_USER) && pte_young(pte))

#if __LINUX_ARM_ARCH__ < 6
-static inline void __sync_icache_dcache(pte_t pteval)
+static inline void __sync_icache_dcache(unsigned long pfn, int exec)
{
}
#else
-extern void __sync_icache_dcache(pte_t pteval);
+extern void __sync_icache_dcache(unsigned long pfn, int exec);
#endif

static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
@@ -241,7 +241,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
unsigned long ext = 0;

if (addr < TASK_SIZE && pte_valid_user(pteval)) {
- __sync_icache_dcache(pteval);
+ __sync_icache_dcache(pte_pfn(pteval), pte_exec(pteval));
ext |= PTE_EXT_NG;
}

diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 3387e60..df0d5ca 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -232,16 +232,15 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
}

#if __LINUX_ARM_ARCH__ >= 6
-void __sync_icache_dcache(pte_t pteval)
+void __sync_icache_dcache(unsigned long pfn, int exec)
{
- unsigned long pfn;
struct page *page;
struct address_space *mapping;

- if (cache_is_vipt_nonaliasing() && !pte_exec(pteval))
+ if (cache_is_vipt_nonaliasing() && !exec)
/* only flush non-aliasing VIPT caches for exec mappings */
return;
- pfn = pte_pfn(pteval);
+
if (!pfn_valid(pfn))
return;

@@ -254,7 +253,7 @@ void __sync_icache_dcache(pte_t pteval)
if (!test_and_set_bit(PG_dcache_clean, &page->flags))
__flush_dcache_page(mapping, page);

- if (pte_exec(pteval))
+ if (exec)
__flush_icache_all();
}
#endif
--
1.8.1.4

2014-04-16 11:47:35

by Steve Capper

Subject: [PATCH V2 5/5] arm: mm: Add Transparent HugePage support for non-LPAE

Much of the required code for THP has been implemented in the
earlier non-LPAE HugeTLB patch.

One more domain bit is used (to store whether or not the THP is
splitting).

Some THP helper functions are defined, and pmd_page is re-defined
such that it distinguishes between page tables and sections.

Signed-off-by: Steve Capper <[email protected]>
---
arch/arm/Kconfig | 2 +-
arch/arm/include/asm/pgtable-2level.h | 32 ++++++++++++++++++++++++++++++++
arch/arm/include/asm/pgtable-3level.h | 1 +
arch/arm/include/asm/pgtable.h | 2 --
arch/arm/include/asm/tlb.h | 3 +++
5 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5e80fad..f5d4354 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1836,7 +1836,7 @@ config SYS_SUPPORTS_HUGETLBFS

config HAVE_ARCH_TRANSPARENT_HUGEPAGE
def_bool y
- depends on ARM_LPAE
+ depends on SYS_SUPPORTS_HUGETLBFS

config ARCH_WANT_GENERAL_HUGETLB
def_bool y
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 323e19f..bc1a7b8 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -212,6 +212,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
*/
#define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
#define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
+#define PMD_DSECT_SPLITTING (_AT(pmdval_t, 1) << 7)

#define PMD_BIT_FUNC(fn,op) \
static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
@@ -232,6 +233,16 @@ extern pgprot_t get_huge_pgprot(pgprot_t newprot);

#define pfn_pmd(pfn,prot) __pmd(__pfn_to_phys(pfn) | pgprot_val(prot))
#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),get_huge_pgprot(prot))
+#define pmd_mkhuge(pmd) (pmd)
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_DSECT_SPLITTING)
+#define pmd_trans_huge(pmd) (pmd_thp_or_huge(pmd))
+#else
+static inline int pmd_trans_huge(pmd_t pmd) { return 0; }
+#endif
+
+#define pmd_mknotpresent(pmd) (__pmd(0))

PMD_BIT_FUNC(mkdirty, |= PMD_DSECT_DIRTY);
PMD_BIT_FUNC(mkwrite, |= PMD_SECT_AP_WRITE);
@@ -239,6 +250,8 @@ PMD_BIT_FUNC(wrprotect, &= ~PMD_SECT_AP_WRITE);
PMD_BIT_FUNC(mknexec, |= PMD_SECT_XN);
PMD_BIT_FUNC(rmprotnone, |= PMD_TYPE_SECT);
PMD_BIT_FUNC(mkyoung, |= PMD_DSECT_AF);
+PMD_BIT_FUNC(mkold, &= ~PMD_DSECT_AF);
+PMD_BIT_FUNC(mksplitting, |= PMD_DSECT_SPLITTING);

#define pmd_young(pmd) (pmd_val(pmd) & PMD_DSECT_AF)
#define pmd_write(pmd) (pmd_val(pmd) & PMD_SECT_AP_WRITE)
@@ -279,6 +292,25 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
return pmd;
}

+static inline int has_transparent_hugepage(void)
+{
+ return 1;
+}
+
+static inline struct page *pmd_page(pmd_t pmd)
+{
+ /*
+ * for a section, we need to mask off more of the pmd
+ * before looking up the page as it is a section descriptor.
+ *
+ * pmd_page only gets sections from the thp code.
+ */
+ if (pmd_trans_huge(pmd))
+ return (phys_to_page(pmd_val(pmd) & HPAGE_MASK));
+
+ return phys_to_page(pmd_val(pmd) & PHYS_MASK);
+}
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_PGTABLE_2LEVEL_H */
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index a4b71c1..82c61d6 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -214,6 +214,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)

#define pmd_hugewillfault(pmd) (!pmd_young(pmd) || !pmd_write(pmd))
#define pmd_thp_or_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
+#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 576511f2..95f1909 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -189,8 +189,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
}

-#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
-
#ifndef CONFIG_HIGHPTE
#define __pte_map(pmd) pmd_page_vaddr(*(pmd))
#define __pte_unmap(pte) do { } while (0)
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index b2498e6..77037d9 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -218,6 +218,9 @@ static inline void
tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
{
tlb_add_flush(tlb, addr);
+#ifndef CONFIG_ARM_LPAE
+ tlb_add_flush(tlb, addr + SZ_1M);
+#endif
}

#define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
--
1.8.1.4

2014-04-16 11:47:33

by Steve Capper

Subject: [PATCH V2 4/5] arm: mm: HugeTLB support for non-LPAE systems

Add huge page support for systems with short descriptors. Rather than
store separate linux/hardware huge ptes, we work directly with the
hardware descriptors at the pmd level.

As we work directly with the pmd and need to store information that
doesn't directly correspond to hardware bits (such as the accessed
flag and dirty bit), we re-purpose the domain bits of the short
section descriptor. In order to use these domain bits for storage,
we make ourselves a client of all 16 domains; this is done in
head.S.

Storing extra information in the domain bits also makes it a lot
easier to implement Transparent Huge Pages, and some of the code in
pgtable-2level.h is arranged to facilitate THP support in a later
patch.

Non-LPAE HugeTLB pages are incompatible with the huge page migration
code (enabled when CONFIG_MEMORY_FAILURE is selected) as that code
dereferences PTEs directly, rather than calling huge_ptep_get and
set_huge_pte_at.

Signed-off-by: Steve Capper <[email protected]>
---
arch/arm/Kconfig | 2 +-
arch/arm/include/asm/hugetlb-2level.h | 136 ++++++++++++++++++++++++++++++++++
arch/arm/include/asm/hugetlb-3level.h | 6 ++
arch/arm/include/asm/hugetlb.h | 10 +--
arch/arm/include/asm/pgtable-2level.h | 97 +++++++++++++++++++++++-
arch/arm/include/asm/pgtable-3level.h | 2 +-
arch/arm/include/asm/pgtable.h | 1 +
arch/arm/kernel/head.S | 10 ++-
arch/arm/mm/fault.c | 13 ----
arch/arm/mm/fsr-2level.c | 4 +-
arch/arm/mm/hugetlbpage.c | 2 +-
arch/arm/mm/mmu.c | 51 +++++++++++++
12 files changed, 305 insertions(+), 29 deletions(-)
create mode 100644 arch/arm/include/asm/hugetlb-2level.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ab438cb..5e80fad 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1832,7 +1832,7 @@ config HW_PERF_EVENTS

config SYS_SUPPORTS_HUGETLBFS
def_bool y
- depends on ARM_LPAE
+ depends on ARM_LPAE || (!CPU_USE_DOMAINS && !MEMORY_FAILURE)

config HAVE_ARCH_TRANSPARENT_HUGEPAGE
def_bool y
diff --git a/arch/arm/include/asm/hugetlb-2level.h b/arch/arm/include/asm/hugetlb-2level.h
new file mode 100644
index 0000000..f8d701f
--- /dev/null
+++ b/arch/arm/include/asm/hugetlb-2level.h
@@ -0,0 +1,136 @@
+/*
+ * arch/arm/include/asm/hugetlb-2level.h
+ *
+ * Copyright (C) 2014 Linaro Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ASM_ARM_HUGETLB_2LEVEL_H
+#define _ASM_ARM_HUGETLB_2LEVEL_H
+
+
+static inline pte_t huge_ptep_get(pte_t *ptep)
+{
+ return *ptep;
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ set_pmd_at(mm, addr, (pmd_t *) ptep, __pmd(pte_val(pte)));
+}
+
+static inline pte_t pte_mkhuge(pte_t pte) { return pte; }
+
+static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+ pmd_t *pmdp = (pmd_t *)ptep;
+ pmd_clear(pmdp);
+ flush_tlb_range(vma, addr, addr + HPAGE_SIZE);
+}
+
+static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
+{
+ pmd_t *pmdp = (pmd_t *) ptep;
+ set_pmd_at(mm, addr, pmdp, pmd_wrprotect(*pmdp));
+}
+
+
+static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
+{
+ pmd_t *pmdp = (pmd_t *)ptep;
+ pte_t pte = huge_ptep_get(ptep);
+ pmd_clear(pmdp);
+
+ return pte;
+}
+
+static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep,
+ pte_t pte, int dirty)
+{
+ int changed = !pte_same(huge_ptep_get(ptep), pte);
+ if (changed) {
+ set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+ flush_tlb_range(vma, addr, addr + HPAGE_SIZE);
+ }
+
+ return changed;
+}
+
+static inline pte_t huge_pte_mkwrite(pte_t pte)
+{
+ pmd_t pmd = __pmd(pte_val(pte));
+ pmd = pmd_mkwrite(pmd);
+ return __pte(pmd_val(pmd));
+}
+
+static inline pte_t huge_pte_mkdirty(pte_t pte)
+{
+ pmd_t pmd = __pmd(pte_val(pte));
+ pmd = pmd_mkdirty(pmd);
+ return __pte(pmd_val(pmd));
+}
+
+static inline unsigned long huge_pte_dirty(pte_t pte)
+{
+ return pmd_dirty(__pmd(pte_val(pte)));
+}
+
+static inline unsigned long huge_pte_write(pte_t pte)
+{
+ return pmd_write(__pmd(pte_val(pte)));
+}
+
+static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep)
+{
+ pmd_t *pmdp = (pmd_t *)ptep;
+ pmd_clear(pmdp);
+}
+
+static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
+{
+ pmd_t pmd = mk_pmd(page,pgprot);
+ return __pte(pmd_val(pmd));
+}
+
+static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
+{
+ pmd_t pmd = pmd_modify(__pmd(pte_val(pte)), newprot);
+ return __pte(pmd_val(pmd));
+}
+
+static inline pte_t huge_pte_wrprotect(pte_t pte)
+{
+ pmd_t pmd = pmd_wrprotect(__pmd(pte_val(pte)));
+ return __pte(pmd_val(pmd));
+}
+
+static inline struct page *huge_pte_page(pte_t pte)
+{
+ return pfn_to_page((pte_val(pte) & HPAGE_MASK) >> PAGE_SHIFT);
+}
+
+static inline unsigned long huge_pte_present(pte_t pte)
+{
+ return 1;
+}
+
+static inline pte_t huge_pte_mkyoung(pte_t pte)
+{
+ return __pte(pmd_val(pmd_mkyoung(__pmd(pte_val(pte)))));
+}
+
+#endif /* _ASM_ARM_HUGETLB_2LEVEL_H */
diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index d4014fb..c633119 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -22,6 +22,7 @@
#ifndef _ASM_ARM_HUGETLB_3LEVEL_H
#define _ASM_ARM_HUGETLB_3LEVEL_H

+#include <asm-generic/hugetlb.h>

/*
* If our huge pte is non-zero then mark the valid bit.
@@ -68,4 +69,9 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
}

+static inline pte_t huge_pte_wrprotect(pte_t pte)
+{
+ return pte_wrprotect(pte);
+}
+
#endif /* _ASM_ARM_HUGETLB_3LEVEL_H */
diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index 1f1b1cd..1d7f7b7 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -23,9 +23,12 @@
#define _ASM_ARM_HUGETLB_H

#include <asm/page.h>
-#include <asm-generic/hugetlb.h>

+#ifdef CONFIG_ARM_LPAE
#include <asm/hugetlb-3level.h>
+#else
+#include <asm/hugetlb-2level.h>
+#endif

static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
unsigned long addr, unsigned long end,
@@ -62,11 +65,6 @@ static inline int huge_pte_none(pte_t pte)
return pte_none(pte);
}

-static inline pte_t huge_pte_wrprotect(pte_t pte)
-{
- return pte_wrprotect(pte);
-}
-
static inline int arch_prepare_hugepage(struct page *page)
{
return 0;
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 219ac88..323e19f 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -156,6 +156,19 @@
#define pud_clear(pudp) do { } while (0)
#define set_pud(pud,pudp) do { } while (0)

+static inline int pmd_thp_or_huge(pmd_t pmd)
+{
+ if ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_FAULT)
+ return pmd_val(pmd);
+
+ return ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT);
+}
+
+static inline int pte_huge(pte_t pte)
+{
+ return pmd_thp_or_huge(__pmd(pte_val(pte)));
+}
+
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
{
return (pmd_t *)pud;
@@ -184,11 +197,87 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)

/*
- * We don't have huge page support for short descriptors, for the moment
- * define empty stubs for use by pin_page_for_write.
+ * now follows some of the definitions to allow huge page support, we can't put
+ * these in the hugetlb source files as they are also required for transparent
+ * hugepage support.
*/
-#define pmd_hugewillfault(pmd) (0)
-#define pmd_thp_or_huge(pmd) (0)
+
+#define HPAGE_SHIFT PMD_SHIFT
+#define HPAGE_SIZE (_AC(1, UL) << HPAGE_SHIFT)
+#define HPAGE_MASK (~(HPAGE_SIZE - 1))
+#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
+
+/*
+ * We re-purpose the following domain bits in the section descriptor
+ */
+#define PMD_DSECT_DIRTY (_AT(pmdval_t, 1) << 5)
+#define PMD_DSECT_AF (_AT(pmdval_t, 1) << 6)
+
+#define PMD_BIT_FUNC(fn,op) \
+static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
+
+static inline unsigned long pmd_pfn(pmd_t pmd)
+{
+ /*
+ * for a section, we need to mask off more of the pmd
+ * before looking up the pfn.
+ */
+ if (pmd_thp_or_huge(pmd))
+ return __phys_to_pfn(pmd_val(pmd) & HPAGE_MASK);
+ else
+ return __phys_to_pfn(pmd_val(pmd) & PHYS_MASK);
+}
+
+extern pgprot_t get_huge_pgprot(pgprot_t newprot);
+
+#define pfn_pmd(pfn,prot) __pmd(__pfn_to_phys(pfn) | pgprot_val(prot))
+#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),get_huge_pgprot(prot))
+
+PMD_BIT_FUNC(mkdirty, |= PMD_DSECT_DIRTY);
+PMD_BIT_FUNC(mkwrite, |= PMD_SECT_AP_WRITE);
+PMD_BIT_FUNC(wrprotect, &= ~PMD_SECT_AP_WRITE);
+PMD_BIT_FUNC(mknexec, |= PMD_SECT_XN);
+PMD_BIT_FUNC(rmprotnone, |= PMD_TYPE_SECT);
+PMD_BIT_FUNC(mkyoung, |= PMD_DSECT_AF);
+
+#define pmd_young(pmd) (pmd_val(pmd) & PMD_DSECT_AF)
+#define pmd_write(pmd) (pmd_val(pmd) & PMD_SECT_AP_WRITE)
+#define pmd_exec(pmd) (!(pmd_val(pmd) & PMD_SECT_XN))
+#define pmd_dirty(pmd) (pmd_val(pmd) & PMD_DSECT_DIRTY)
+
+#define pmd_hugewillfault(pmd) (!pmd_young(pmd) || !pmd_write(pmd))
+
+#define __HAVE_ARCH_PMD_WRITE
+
+extern void __sync_icache_dcache(unsigned long pfn, int exec);
+
+static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, pmd_t pmd)
+{
+ VM_BUG_ON((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_TABLE);
+
+ if (!pmd_val(pmd)) {
+ pmdp[0] = pmdp[1] = pmd;
+ } else {
+ pmdp[0] = __pmd(pmd_val(pmd));
+ pmdp[1] = __pmd(pmd_val(pmd) + SECTION_SIZE);
+
+ __sync_icache_dcache(pmd_pfn(pmd), pmd_exec(pmd));
+ }
+
+ flush_pmd_entry(pmdp);
+}
+
+static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
+{
+ pgprot_t hugeprot = get_huge_pgprot(newprot);
+ const pmdval_t mask = PMD_SECT_XN | PMD_SECT_AP_WRITE |
+ PMD_TYPE_SECT;
+
+ pmd_val(pmd) = (pmd_val(pmd) & ~mask) | (pgprot_val(hugeprot) & mask);
+
+ return pmd;
+}

#endif /* __ASSEMBLY__ */

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 85c60ad..a4b71c1 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -213,7 +213,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define pmd_write(pmd) (!(pmd_val(pmd) & PMD_SECT_RDONLY))

#define pmd_hugewillfault(pmd) (!pmd_young(pmd) || !pmd_write(pmd))
-#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
+#define pmd_thp_or_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 3a9c238..576511f2 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -222,6 +222,7 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
#define pte_dirty(pte) (pte_val(pte) & L_PTE_DIRTY)
#define pte_young(pte) (pte_val(pte) & L_PTE_YOUNG)
#define pte_exec(pte) (!(pte_val(pte) & L_PTE_XN))
+#define pte_protnone(pte) (pte_val(pte) & L_PTE_NONE)
#define pte_special(pte) (0)

#define pte_valid_user(pte) \
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index f8c0883..dd9ab4c 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -434,13 +434,21 @@ __enable_mmu:
bic r0, r0, #CR_I
#endif
#ifndef CONFIG_ARM_LPAE
+#ifndef CONFIG_SYS_SUPPORTS_HUGETLBFS
mov r5, #(domain_val(DOMAIN_USER, DOMAIN_MANAGER) | \
domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
domain_val(DOMAIN_TABLE, DOMAIN_MANAGER) | \
domain_val(DOMAIN_IO, DOMAIN_CLIENT))
+#else
+ @ set ourselves as the client in all domains
+ @ this allows us to then use the 4 domain bits in the
+ @ section descriptors in our transparent huge pages
+ ldr r5, =0x55555555
+#endif /* CONFIG_SYS_SUPPORTS_HUGETLBFS */
+
mcr p15, 0, r5, c3, c0, 0 @ load domain access register
mcr p15, 0, r4, c2, c0, 0 @ load page table pointer
-#endif
+#endif /* CONFIG_ARM_LPAE */
b __turn_mmu_on
ENDPROC(__enable_mmu)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index eb8830a..faae9bd 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -491,19 +491,6 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
#endif /* CONFIG_MMU */

/*
- * Some section permission faults need to be handled gracefully.
- * They can happen due to a __{get,put}_user during an oops.
- */
-#ifndef CONFIG_ARM_LPAE
-static int
-do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
-{
- do_bad_area(addr, fsr, regs);
- return 0;
-}
-#endif /* CONFIG_ARM_LPAE */
-
-/*
* This abort handler always returns "fault".
*/
static int
diff --git a/arch/arm/mm/fsr-2level.c b/arch/arm/mm/fsr-2level.c
index 18ca74c..c1a2afc 100644
--- a/arch/arm/mm/fsr-2level.c
+++ b/arch/arm/mm/fsr-2level.c
@@ -16,7 +16,7 @@ static struct fsr_info fsr_info[] = {
{ do_bad, SIGBUS, 0, "external abort on non-linefetch" },
{ do_bad, SIGSEGV, SEGV_ACCERR, "page domain fault" },
{ do_bad, SIGBUS, 0, "external abort on translation" },
- { do_sect_fault, SIGSEGV, SEGV_ACCERR, "section permission fault" },
+ { do_page_fault, SIGSEGV, SEGV_ACCERR, "section permission fault" },
{ do_bad, SIGBUS, 0, "external abort on translation" },
{ do_page_fault, SIGSEGV, SEGV_ACCERR, "page permission fault" },
/*
@@ -56,7 +56,7 @@ static struct fsr_info ifsr_info[] = {
{ do_bad, SIGBUS, 0, "unknown 10" },
{ do_bad, SIGSEGV, SEGV_ACCERR, "page domain fault" },
{ do_bad, SIGBUS, 0, "external abort on translation" },
- { do_sect_fault, SIGSEGV, SEGV_ACCERR, "section permission fault" },
+ { do_page_fault, SIGSEGV, SEGV_ACCERR, "section permission fault" },
{ do_bad, SIGBUS, 0, "external abort on translation" },
{ do_page_fault, SIGSEGV, SEGV_ACCERR, "page permission fault" },
{ do_bad, SIGBUS, 0, "unknown 16" },
diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
index 54ee616..619b082 100644
--- a/arch/arm/mm/hugetlbpage.c
+++ b/arch/arm/mm/hugetlbpage.c
@@ -54,7 +54,7 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)

int pmd_huge(pmd_t pmd)
{
- return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
+ return pmd_thp_or_huge(pmd);
}

int pmd_huge_support(void)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index b68c6b2..4799864 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -389,6 +389,44 @@ SET_MEMORY_FN(x, pte_set_x)
SET_MEMORY_FN(nx, pte_set_nx)

/*
+ * If the system supports huge pages and we are running with short descriptors,
+ * then compute the pgprot values for a huge page. We do not need to do this
+ * with LPAE as there is no software/hardware bit distinction for ptes.
+ *
+ * We are only interested in:
+ * 1) The memory type: huge pages are user pages so a section of type
+ * MT_MEMORY_RW. This is used to create new huge ptes/thps.
+ *
+ * 2) XN, PROT_NONE, WRITE. These are set/unset through protection changes
+ * by pte_modify or pmd_modify and are used to make new ptes/thps.
+ *
+ * The other bits: dirty, young, splitting are not modified by pte_modify
+ * or pmd_modify nor are they used to create new ptes or pmds thus they are not
+ * considered here.
+ */
+#if defined(CONFIG_SYS_SUPPORTS_HUGETLBFS) && !defined(CONFIG_ARM_LPAE)
+static pgprot_t _hugepgprotval;
+
+pgprot_t get_huge_pgprot(pgprot_t newprot)
+{
+ pte_t inprot = __pte(pgprot_val(newprot));
+ pmd_t pmdret = __pmd(pgprot_val(_hugepgprotval));
+
+ if (!pte_exec(inprot))
+ pmdret = pmd_mknexec(pmdret);
+
+ if (pte_write(inprot))
+ pmdret = pmd_mkwrite(pmdret);
+
+ if (!pte_protnone(inprot))
+ pmdret = pmd_rmprotnone(pmdret);
+
+ return __pgprot(pmd_val(pmdret));
+}
+#endif
+
+
+/*
* Adjust the PMD section entries according to the CPU in use.
*/
static void __init build_mem_type_table(void)
@@ -637,6 +675,19 @@ static void __init build_mem_type_table(void)
if (t->prot_sect)
t->prot_sect |= PMD_DOMAIN(t->domain);
}
+
+#if defined(CONFIG_SYS_SUPPORTS_HUGETLBFS) && !defined(CONFIG_ARM_LPAE)
+ /*
+ * we assume all huge pages are user pages and that hardware access
+ * flag updates are disabled (which is the case for short descriptors).
+ */
+ pgprot_val(_hugepgprotval) = mem_types[MT_MEMORY_RW].prot_sect
+ | PMD_SECT_AP_READ | PMD_SECT_nG;
+
+ pgprot_val(_hugepgprotval) &= ~(PMD_SECT_AP_WRITE | PMD_SECT_XN
+ | PMD_TYPE_SECT);
+#endif
+
}

#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
--
1.8.1.4

2014-04-16 11:47:31

by Steve Capper

Subject: [PATCH V2 1/5] mm: hugetlb: Introduce huge_pte_{page,present,young}

Introduce huge pte versions of pte_page, pte_present and pte_young.

This allows ARM (without LPAE) to use alternative pte processing logic
for huge ptes.

Generic implementations that call the standard pte versions are also
added to asm-generic/hugetlb.h.

Signed-off-by: Steve Capper <[email protected]>
Acked-by: Gerald Schaefer <[email protected]>
---
arch/s390/include/asm/hugetlb.h | 15 +++++++++++++++
include/asm-generic/hugetlb.h | 15 +++++++++++++++
mm/hugetlb.c | 22 +++++++++++-----------
3 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index 11eae5f..7b13ec0 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -112,4 +112,19 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
return pte_modify(pte, newprot);
}

+static inline struct page *huge_pte_page(pte_t pte)
+{
+ return pte_page(pte);
+}
+
+static inline unsigned long huge_pte_present(pte_t pte)
+{
+ return pte_present(pte);
+}
+
+static inline pte_t huge_pte_mkyoung(pte_t pte)
+{
+ return pte_mkyoung(pte);
+}
+
#endif /* _ASM_S390_HUGETLB_H */
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 99b490b..2dc68fe 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -37,4 +37,19 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
pte_clear(mm, addr, ptep);
}

+static inline struct page *huge_pte_page(pte_t pte)
+{
+ return pte_page(pte);
+}
+
+static inline unsigned long huge_pte_present(pte_t pte)
+{
+ return pte_present(pte);
+}
+
+static inline pte_t huge_pte_mkyoung(pte_t pte)
+{
+ return pte_mkyoung(pte);
+}
+
#endif /* _ASM_GENERIC_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dd30f22..1e77c07 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2350,7 +2350,7 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
entry = huge_pte_wrprotect(mk_huge_pte(page,
vma->vm_page_prot));
}
- entry = pte_mkyoung(entry);
+ entry = huge_pte_mkyoung(entry);
entry = pte_mkhuge(entry);
entry = arch_make_huge_pte(entry, vma, page, writable);

@@ -2410,7 +2410,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
if (cow)
huge_ptep_set_wrprotect(src, addr, src_pte);
entry = huge_ptep_get(src_pte);
- ptepage = pte_page(entry);
+ ptepage = huge_pte_page(entry);
get_page(ptepage);
page_dup_rmap(ptepage);
set_huge_pte_at(dst, addr, dst_pte, entry);
@@ -2429,7 +2429,7 @@ static int is_hugetlb_entry_migration(pte_t pte)
{
swp_entry_t swp;

- if (huge_pte_none(pte) || pte_present(pte))
+ if (huge_pte_none(pte) || huge_pte_present(pte))
return 0;
swp = pte_to_swp_entry(pte);
if (non_swap_entry(swp) && is_migration_entry(swp))
@@ -2442,7 +2442,7 @@ static int is_hugetlb_entry_hwpoisoned(pte_t pte)
{
swp_entry_t swp;

- if (huge_pte_none(pte) || pte_present(pte))
+ if (huge_pte_none(pte) || huge_pte_present(pte))
return 0;
swp = pte_to_swp_entry(pte);
if (non_swap_entry(swp) && is_hwpoison_entry(swp))
@@ -2495,7 +2495,7 @@ again:
goto unlock;
}

- page = pte_page(pte);
+ page = huge_pte_page(pte);
/*
* If a reference page is supplied, it is because a specific
* page is being unmapped, not a range. Ensure the page we
@@ -2645,7 +2645,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */

- old_page = pte_page(pte);
+ old_page = huge_pte_page(pte);

retry_avoidcopy:
/* If no-one else is actually using this page, avoid the copy
@@ -3033,7 +3033,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
* Note that locking order is always pagecache_page -> page,
* so no worry about deadlock.
*/
- page = pte_page(entry);
+ page = huge_pte_page(entry);
get_page(page);
if (page != pagecache_page)
lock_page(page);
@@ -3053,7 +3053,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
}
entry = huge_pte_mkdirty(entry);
}
- entry = pte_mkyoung(entry);
+ entry = huge_pte_mkyoung(entry);
if (huge_ptep_set_access_flags(vma, address, ptep, entry,
flags & FAULT_FLAG_WRITE))
update_mmu_cache(vma, address, ptep);
@@ -3144,7 +3144,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
}

pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
- page = pte_page(huge_ptep_get(pte));
+ page = huge_pte_page(huge_ptep_get(pte));
same_page:
if (pages) {
pages[i] = mem_map_offset(page, pfn_offset);
@@ -3501,7 +3501,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
{
struct page *page;

- page = pte_page(*(pte_t *)pmd);
+ page = huge_pte_page(*(pte_t *)pmd);
if (page)
page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
return page;
@@ -3513,7 +3513,7 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
{
struct page *page;

- page = pte_page(*(pte_t *)pud);
+ page = huge_pte_page(*(pte_t *)pud);
if (page)
page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
return page;
--
1.8.1.4

2014-04-16 11:47:28

by Steve Capper

Subject: [PATCH V2 3/5] arm: mm: Make mmu_gather aware of huge pages

Huge pages on short descriptors are arranged as pairs of 1MB sections.
We need to be careful and ensure that the TLBs for both sections are
flushed when we tlb_add_flush on a HugeTLB page.

This patch extends the tlb flush range to HPAGE_SIZE rather than
PAGE_SIZE when addresses belonging to huge page VMAs are added to
the flush range.

Signed-off-by: Steve Capper <[email protected]>
---
arch/arm/include/asm/tlb.h | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index 0baf7f0..b2498e6 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -81,10 +81,17 @@ static inline void tlb_flush(struct mmu_gather *tlb)
static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
{
if (!tlb->fullmm) {
+ unsigned long size = PAGE_SIZE;
+
if (addr < tlb->range_start)
tlb->range_start = addr;
- if (addr + PAGE_SIZE > tlb->range_end)
- tlb->range_end = addr + PAGE_SIZE;
+
+ if (!config_enabled(CONFIG_ARM_LPAE) && tlb->vma
+ && is_vm_hugetlb_page(tlb->vma))
+ size = HPAGE_SIZE;
+
+ if (addr + size > tlb->range_end)
+ tlb->range_end = addr + size;
}
}

--
1.8.1.4

2014-04-24 10:22:45

by Steve Capper

Subject: Re: [PATCH V2 0/5] Huge pages for short descriptors on ARM

On Wed, Apr 16, 2014 at 12:46:38PM +0100, Steve Capper wrote:
> Hello,
> This series brings HugeTLB pages and Transparent Huge Pages (THP) to
> ARM on short descriptors.
>
> Russell, Andrew,
> I would like to get this in next (and hopefully 3.16 if no problems
> arise) if that sounds reasonable?
>
> There's one patch at the beginning of the series for mm:
> mm: hugetlb: Introduce huge_pte_{page,present,young}
> This has been tested on ARM and s390 and should compile out for other
> architectures.
>
> The rest of the series targets arch/arm.
>
> I've bumped the series to V2 as it was rebased (and tested against)
> v3.15-rc1. On ARM the libhugetlbfs test suite, some THP PROT_NONE
> tests and the recursive execve test all passed successfully.
>
> Thanks,
> --
> Steve

Hello,
Just a ping on this...

I would really like to get huge page support for short descriptors on
ARM merged as I've been carrying around these patches for a long time.

Recently I've had no issues raised about the code. The patches have
been tested and found to be both beneficial to system performance and
stable.

There are two parts to the series: the first patch is a core mm/ patch
that introduces some huge_pte_ helper functions that allow for a much
simpler ARM (without LPAE) implementation. The second part is the
actual arch/arm code.

I'm not sure how to proceed with these patches. I was thinking that
they could be picked up into linux-next? If that sounds reasonable,
Andrew, would you like to take the mm/ patch, and Russell, could you
please take the arch/arm patches?

Also, I was hoping to get these into 3.16. Are there any objections to
that?

Thank you,
--
Steve

>
>
> Steve Capper (5):
> mm: hugetlb: Introduce huge_pte_{page,present,young}
> arm: mm: Adjust the parameters for __sync_icache_dcache
> arm: mm: Make mmu_gather aware of huge pages
> arm: mm: HugeTLB support for non-LPAE systems
> arm: mm: Add Transparent HugePage support for non-LPAE
>
> arch/arm/Kconfig | 4 +-
> arch/arm/include/asm/hugetlb-2level.h | 136 ++++++++++++++++++++++++++++++++++
> arch/arm/include/asm/hugetlb-3level.h | 6 ++
> arch/arm/include/asm/hugetlb.h | 10 +--
> arch/arm/include/asm/pgtable-2level.h | 129 +++++++++++++++++++++++++++++++-
> arch/arm/include/asm/pgtable-3level.h | 3 +-
> arch/arm/include/asm/pgtable.h | 9 +--
> arch/arm/include/asm/tlb.h | 14 +++-
> arch/arm/kernel/head.S | 10 ++-
> arch/arm/mm/fault.c | 13 ----
> arch/arm/mm/flush.c | 9 +--
> arch/arm/mm/fsr-2level.c | 4 +-
> arch/arm/mm/hugetlbpage.c | 2 +-
> arch/arm/mm/mmu.c | 51 +++++++++++++
> arch/s390/include/asm/hugetlb.h | 15 ++++
> include/asm-generic/hugetlb.h | 15 ++++
> mm/hugetlb.c | 22 +++---
> 17 files changed, 399 insertions(+), 53 deletions(-)
> create mode 100644 arch/arm/include/asm/hugetlb-2level.h
>
> --
> 1.8.1.4
>

2014-04-24 10:37:28

by Will Deacon

Subject: Re: [PATCH V2 0/5] Huge pages for short descriptors on ARM

Hi Steve,

On Thu, Apr 24, 2014 at 11:22:29AM +0100, Steve Capper wrote:
> On Wed, Apr 16, 2014 at 12:46:38PM +0100, Steve Capper wrote:
> Just a ping on this...
>
> I would really like to get huge page support for short descriptors on
> ARM merged as I've been carrying around these patches for a long time.
>
> Recently I've had no issues raised about the code. The patches have
> been tested and found to be both beneficial to system performance and
> stable.
>
> There are two parts to the series, the first patch is a core mm/ patch
> that introduces some huge_pte_ helper functions that allows for a much
> simpler ARM (without LPAE) implementation. The second part is the
> actual arch/arm code.
>
> I'm not sure how to proceed with these patches. I was thinking that
> they could be picked up into linux-next? If that sounds reasonable;
> Andrew, would you like to take the mm/ patch and Russell could you
> please take the arch/arm patches?
>
> Also, I was hoping to get these into 3.16. Are there any objections to
> that?

Who is asking for this code? We already support hugepages for LPAE systems,
so this would be targeting what? A9? I'm reluctant to add ~400 lines of
subtle, low-level mm code to arch/arm/ if it doesn't have any active users.

I guess I'm after some commitment that this is (a) useful to somebody and
(b) going to be tested regularly, otherwise it will go the way of things
like big-endian, where we end up carrying around code which is broken more
often than not (although big-endian is more self-contained).

Will

2014-04-24 10:43:30

by Russell King - ARM Linux

Subject: Re: [PATCH V2 0/5] Huge pages for short descriptors on ARM

On Thu, Apr 24, 2014 at 11:36:39AM +0100, Will Deacon wrote:
> I guess I'm after some commitment that this is (a) useful to somebody and
> (b) going to be tested regularly, otherwise it will go the way of things
> like big-endian, where we end up carrying around code which is broken more
> often than not (although big-endian is more self-contained).

It may be something worth considering adding to my nightly builder/boot
testing, but I suspect that's impractical as it probably requires a BE
userspace, which would then mean that the platform can't boot LE.

I suspect that we will just have to rely on BE users staying around and
reporting problems when they occur.

--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

2014-04-24 10:47:26

by Will Deacon

Subject: Re: [PATCH V2 0/5] Huge pages for short descriptors on ARM

On Thu, Apr 24, 2014 at 11:42:32AM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 24, 2014 at 11:36:39AM +0100, Will Deacon wrote:
> > I guess I'm after some commitment that this is (a) useful to somebody and
> > (b) going to be tested regularly, otherwise it will go the way of things
> > like big-endian, where we end up carrying around code which is broken more
> > often than not (although big-endian is more self-contained).
>
> It may be something worth considering adding to my nightly builder/boot
> testing, but I suspect that's impractical as it probably requires a BE
> userspace, which would then mean that the platform can't boot LE.
>
> I suspect that we will just have to rely on BE users staying around and
> reporting problems when they occur.

Indeed. Marc and I have BE guests running under kvmtool on an LE host, so
that's what I've been using (then a BE busybox can sit in the host
filesystem and be passed via something like 9pfs).

Will

2014-04-24 10:56:04

by Steve Capper

Subject: Re: [PATCH V2 0/5] Huge pages for short descriptors on ARM

On 24 April 2014 11:42, Russell King - ARM Linux <[email protected]> wrote:
> On Thu, Apr 24, 2014 at 11:36:39AM +0100, Will Deacon wrote:
>> I guess I'm after some commitment that this is (a) useful to somebody and
>> (b) going to be tested regularly, otherwise it will go the way of things
>> like big-endian, where we end up carrying around code which is broken more
>> often than not (although big-endian is more self-contained).
>
> It may be something worth considering adding to my nightly builder/boot
> testing, but I suspect that's impractical as it probably requires a BE
> userspace, which would then mean that the platform can't boot LE.
>
> I suspect that we will just have to rely on BE users staying around and
> reporting problems when they occur.

The huge page support is for standard LE; I think Will was saying that
this will be like BE if no one uses it.
I would appreciate any extra testing a *lot*. :-)

It's somewhat unfair to compare huge pages on short descriptors with
BE. For a start, the userspace that works with LPAE will work on the
short-descriptor kernel too. Great care has been taken to ensure that
programmers can just port their huge page code over to ARM from other
architectures without any issues. As things like libhugetlbfs (which
fully supports ARM) get incorporated into distros on ARM, huge pages
will become the norm rather than the exception.

Some devices have very few TLB entries, and I believe this series will
be very beneficial for people using those devices.

Cheers,
--
Steve

2014-04-24 11:04:19

by Russell King - ARM Linux

Subject: Re: [PATCH V2 0/5] Huge pages for short descriptors on ARM

On Thu, Apr 24, 2014 at 11:55:56AM +0100, Steve Capper wrote:
> On 24 April 2014 11:42, Russell King - ARM Linux <[email protected]> wrote:
> > On Thu, Apr 24, 2014 at 11:36:39AM +0100, Will Deacon wrote:
> >> I guess I'm after some commitment that this is (a) useful to somebody and
> >> (b) going to be tested regularly, otherwise it will go the way of things
> >> like big-endian, where we end up carrying around code which is broken more
> >> often than not (although big-endian is more self-contained).
> >
> > It may be something worth considering adding to my nightly builder/boot
> > testing, but I suspect that's impractical as it probably requires a BE
> > userspace, which would then mean that the platform can't boot LE.
> >
> > I suspect that we will just have to rely on BE users staying around and
> > reporting problems when they occur.
>
> The huge page support is for standard LE, I think Will was saying that
> this will be like BE if no-one uses it.

We're not saying that.

What we're asking is this: *Who* is using hugepages today?

What we're then doing is comparing it to the situation we have today with
BE, where BE support is *always* getting broken because no one in the main
community tests it - not even a build test, nor a boot test which would
be required to find the problems that (for example) cropped up in the
last merge window.

> It's somewhat unfair to compare huge pages on short descriptors with
> BE. For a start, the userspace that works with LPAE will work on the
> short-descriptor kernel too.

That sounds good, but the question is how does this get tested by
facilities such as my build/boot system, or Olof/Kevin's system?
Without that, it will find itself in exactly the same situation that
BE is in, where problems aren't found until after updates are merged
into Linus' tree.


2014-04-24 12:03:46

by Steve Capper

Subject: Re: [PATCH V2 0/5] Huge pages for short descriptors on ARM

On 24 April 2014 12:03, Russell King - ARM Linux <[email protected]> wrote:
> On Thu, Apr 24, 2014 at 11:55:56AM +0100, Steve Capper wrote:
>> On 24 April 2014 11:42, Russell King - ARM Linux <[email protected]> wrote:
>> > On Thu, Apr 24, 2014 at 11:36:39AM +0100, Will Deacon wrote:
>> >> I guess I'm after some commitment that this is (a) useful to somebody and
>> >> (b) going to be tested regularly, otherwise it will go the way of things
>> >> like big-endian, where we end up carrying around code which is broken more
>> >> often than not (although big-endian is more self-contained).
>> >
>> > It may be something worth considering adding to my nightly builder/boot
>> > testing, but I suspect that's impractical as it probably requires a BE
>> > userspace, which would then mean that the platform can't boot LE.
>> >
>> > I suspect that we will just have to rely on BE users staying around and
>> > reporting problems when they occur.
>>
>> The huge page support is for standard LE, I think Will was saying that
>> this will be like BE if no-one uses it.
>
> We're not saying that.
>

Apologies, I was talking at cross-purposes.

> What we're asking is this: *Who* is using hugepages today?

I've asked the people who have been in touch with me to jump in to
this discussion.
People working on phones and servers have expressed an interest.

>
> What we're then doing is comparing it to the situation we have today with
> BE, where BE support is *always* getting broken because no one in the main
> community tests it - not even a build test, nor a boot test which would
> be required to find the problems that (for example) cropped up in the
> last merge window.

I can appreciate that concern.

>
>> It's somewhat unfair to compare huge pages on short descriptors with
>> BE. For a start, the userspace that works with LPAE will work on the
>> short-descriptor kernel too.
>
> That sounds good, but the question is how does this get tested by
> facilities such as my build/boot system, or Olof/Kevin's system?
> Without that, it will find itself in exactly the same situation that
> BE is in, where problems aren't found until after updates are merged
> into Linus' tree.
>

For minimal build/boot testing, I would recommend enabling:
CONFIG_HUGETLBFS=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y

That should not have any significant effect on the running system (one
has to opt in to use HugeTLB or THP in this case), so they could be put
in a defconfig.

For actual usage testing, typically one would use the upstream
libhugetlbfs test suite, as it is very good at finding problems, is
kept up to date, and its results are easy to automate and interpret.
For THP, I usually run a kernel build repeatedly with
/sys/kernel/mm/transparent_hugepage/enabled set to always, along with
LTP's mm tests.

We also run continuous integration tests within Linaro against Linaro
kernels, and the libhugetlbfs test suite is one of our tests. If it
helps things, I can set up automated huge page tests within Linaro and
pull in another branch?

Cheers,
--
Steve

2014-04-24 13:33:29

by Rob Herring

Subject: Re: [PATCH V2 0/5] Huge pages for short descriptors on ARM

On Thu, Apr 24, 2014 at 5:36 AM, Will Deacon <[email protected]> wrote:
> Hi Steve,
>
> On Thu, Apr 24, 2014 at 11:22:29AM +0100, Steve Capper wrote:
>> On Wed, Apr 16, 2014 at 12:46:38PM +0100, Steve Capper wrote:

[...]

>> I'm not sure how to proceed with these patches. I was thinking that
>> they could be picked up into linux-next? If that sounds reasonable;
>> Andrew, would you like to take the mm/ patch and Russell could you
>> please take the arch/arm patches?
>>
>> Also, I was hoping to get these into 3.16. Are there any objections to
>> that?
>
> Who is asking for this code? We already support hugepages for LPAE systems,
> so this would be targetting what? A9? I'm reluctant to add ~400 lines of
> subtle, low-level mm code to arch/arm/ if it doesn't have any active users.

I can't really speak to the who so much anymore. I can say on the
server front, it was not only Calxeda asking for this.

Presumably there are also performance benefits on older systems even
with 128MB-1GB of RAM. Given that KVM guests can only use 3GB of RAM,
enabling LPAE in guest kernels has little benefit. So this may still
be useful on LPAE capable systems. Also, Oracle Java will use
hugetlbfs if available and Java performance needs all the help it can
get.

> I guess I'm after some commitment that this is (a) useful to somebody and
> (b) going to be tested regularly, otherwise it will go the way of things
> like big-endian, where we end up carrying around code which is broken more
> often than not (although big-endian is more self-contained).

One key difference here is that enabling THP is, or should be,
transparent (to state the obvious) to users. While the BE code itself
may be self-contained, using BE is very much not in that category.
Potentially every driver on a platform could be broken for BE. Case in
point, the Calxeda xgmac driver is broken on BE due to using __raw i/o
accessors instead of relaxed variants.

Rob

2014-06-03 00:27:26

by Grazvydas Ignotas

Subject: Re: [PATCH V2 0/5] Huge pages for short descriptors on ARM

On Thu, Apr 24, 2014 at 2:03 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Thu, Apr 24, 2014 at 11:55:56AM +0100, Steve Capper wrote:
>> On 24 April 2014 11:42, Russell King - ARM Linux <[email protected]> wrote:
>> > On Thu, Apr 24, 2014 at 11:36:39AM +0100, Will Deacon wrote:
>> >> I guess I'm after some commitment that this is (a) useful to somebody and
>> >> (b) going to be tested regularly, otherwise it will go the way of things
>> >> like big-endian, where we end up carrying around code which is broken more
>> >> often than not (although big-endian is more self-contained).
>> >
>> > It may be something worth considering adding to my nightly builder/boot
>> > testing, but I suspect that's impractical as it probably requires a BE
>> > userspace, which would then mean that the platform can't boot LE.
>> >
>> > I suspect that we will just have to rely on BE users staying around and
>> > reporting problems when they occur.
>>
>> The huge page support is for standard LE, I think Will was saying that
>> this will be like BE if no-one uses it.
>
> We're not saying that.
>
> What we're asking is this: *Who* is using hugepages today?

We are using it on the OpenPandora handheld; it's really useful for
doing graphics in software. Here are some benchmarks I did some time ago:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-February/148835.html
For example, the Cortex-A8 has only 32 dTLB entries, so they run out
pretty fast when drawing vertical lines on linear images. And that's
not such a rare thing to do, e.g. when drawing vertical scrollbars.

Other people find uses for it too, such as getting more consistent
results between benchmark runs:
http://ssvb.github.io/2013/06/27/fullhd-x11-desktop-performance-of-the-allwinner-a10.html

Yes, in my case this is a niche device and I can keep patching in the
hugepage support, but mainline support would make life easier and
would be very much appreciated.


--
Gražvydas