The PAT bit gets relocated to bit 12 when PUD/PMD mappings are used. This
bit 12, however, is not covered by PTE_FLAGS_MASK, which is corrently used
for masking pfn and flags for all levels. This patch-set updates pud/pmd
interfaces and multiple functions to handle the large PAT bit properly.
Patch 1/10-2/10 make changes necessary for patch 3/10 to use P?D_PAGE_MASK.
Patch 3/10 fixes pud/pmd interfaces to handle the PAT bit properly, and
patch 4/10 adds p?d_pgprot() interfaces for PUD/PMD.
Patch 5/10 fixes /sys/kernel/debug/kernel_page_tables to show the PAT bit
properly.
Patch 6/10-9/10 fix multiple functions to handle the large PAT bit properly.
Patch 10/10 fix the same pgprot handling in try_preserve_large_page() by
leveraging the changes made in patch 8/10.
Note, the PAT bit is first enabled in 4.2-rc1 with WT mappings. The functions
fixed by patch 6/10-9/10 are not used with WT mappings yet. These fixes will
protect them from future use with the PAT bit set.
---
v3:
- Add patch 4/10 and 6/10-9/10 for multiple interfaces to handle the large
PAT bit.
- Add patch 10/10 to fix the same pgprot handling in
try_preserve_large_page().
v2:
- Change p?n_pfn() to handle the PAT bit. (Juergen Gross)
- Mask pfn and flags with P?D_PAGE_MASK. (Juergen Gross)
- Change p?d_page_vaddr() and p?d_page() to handle the PAT bit.
---
Toshi Kani (10):
1/10 x86/vdso32: Define PGTABLE_LEVELS to 32bit VDSO
2/10 x86/asm: Move PUD_PAGE macros to page_types.h
3/10 x86/asm: Fix pud/pmd interfaces to handle large PAT bit
4/10 x86/asm: Add pud_pgprot() and pmd_pgprot()
5/10 x86/mm: Fix page table dump to show PAT bit
6/10 x86/mm: Fix slow_virt_to_phys() to handle large PAT bit
7/10 x86/mm: Fix gup_huge_p?d() to handle large PAT bit
8/10 x86/mm: Fix try_preserve_large_page() to handle large PAT bit
9/10 x86/mm: Fix __split_large_page() to handle large PAT bit
10/10 x86/mm: Fix the same pgprot handling in try_preserve_large_page()
---
arch/x86/entry/vdso/vdso32/vclock_gettime.c | 2 +
arch/x86/include/asm/page_64_types.h | 3 --
arch/x86/include/asm/page_types.h | 3 ++
arch/x86/include/asm/pgtable.h | 18 ++++---
arch/x86/include/asm/pgtable_types.h | 40 +++++++++++++--
arch/x86/mm/dump_pagetables.c | 39 +++++++-------
arch/x86/mm/gup.c | 18 +++----
arch/x86/mm/pageattr.c | 79 ++++++++++++++++++-----------
8 files changed, 131 insertions(+), 71 deletions(-)
In case of CONFIG_X86_64, vdso32/vclock_gettime.c fakes a 32bit
kernel configuration by re-defining it to CONFIG_X86_32. However,
it does not re-define CONFIG_PGTABLE_LEVELS leaving it as 4 levels.
Fix it by re-defining CONFIG_PGTABLE_LEVELS to 2 as X86_PAE is not
set.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/entry/vdso/vdso32/vclock_gettime.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
index 175cc72..87a86e0 100644
--- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
@@ -14,11 +14,13 @@
*/
#undef CONFIG_64BIT
#undef CONFIG_X86_64
+#undef CONFIG_PGTABLE_LEVELS
#undef CONFIG_ILLEGAL_POINTER_VALUE
#undef CONFIG_SPARSEMEM_VMEMMAP
#undef CONFIG_NR_CPUS
#define CONFIG_X86_32 1
+#define CONFIG_PGTABLE_LEVELS 2
#define CONFIG_PAGE_OFFSET 0
#define CONFIG_ILLEGAL_POINTER_VALUE 0
#define CONFIG_NR_CPUS 1
PUD_SHIFT is defined according to a kernel configuration, which
allows it be commonly used by any kernels. However, PUD_PAGE_SIZE
and PUD_PAGE_MASK, which are calculated from PUD_SHIFT, are defined
in page_64_types.h, which allows them be used by a 64-bit kernel
only.
Move PUD_PAGE_SIZE and PUD_PAGE_MASK to page_types.h so that they
can be used by any kernels as well.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/page_64_types.h | 3 ---
arch/x86/include/asm/page_types.h | 3 +++
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 4edd53b..4928cf0 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -26,9 +26,6 @@
#define MCE_STACK 4
#define N_EXCEPTION_STACKS 4 /* hw limit: 7 */
-#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
-#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))
-
/*
* Set __PAGE_OFFSET to the most negative possible address +
* PGDIR_SIZE*16 (pgd slot 272). The gap is to allow a space for a
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index c7c712f..c5b7fb2 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -20,6 +20,9 @@
#define PMD_PAGE_SIZE (_AC(1, UL) << PMD_SHIFT)
#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1))
+#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
+#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))
+
#define HPAGE_SHIFT PMD_SHIFT
#define HPAGE_SIZE (_AC(1,UL) << HPAGE_SHIFT)
#define HPAGE_MASK (~(HPAGE_SIZE - 1))
The PAT bit gets relocated to bit 12 when PUD and PMD mappings are
used. This bit 12, however, is not covered by PTE_FLAGS_MASK, which
is corrently used for masking pfn and flags for all cases.
Fix pud/pmd interfaces to handle pfn and flags properly by using
P?D_PAGE_MASK when PUD/PMD mappings are used, i.e. PSE bit is set.
Suggested-by: Juergen Gross <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Konrad Wilk <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/pgtable.h | 14 +++++++-----
arch/x86/include/asm/pgtable_types.h | 40 +++++++++++++++++++++++++++++++---
2 files changed, 44 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..0733ec7 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -142,12 +142,12 @@ static inline unsigned long pte_pfn(pte_t pte)
static inline unsigned long pmd_pfn(pmd_t pmd)
{
- return (pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT;
+ return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
}
static inline unsigned long pud_pfn(pud_t pud)
{
- return (pud_val(pud) & PTE_PFN_MASK) >> PAGE_SHIFT;
+ return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT;
}
#define pte_page(pte) pfn_to_page(pte_pfn(pte))
@@ -502,14 +502,15 @@ static inline int pmd_none(pmd_t pmd)
static inline unsigned long pmd_page_vaddr(pmd_t pmd)
{
- return (unsigned long)__va(pmd_val(pmd) & PTE_PFN_MASK);
+ return (unsigned long)__va(pmd_val(pmd) & pmd_pfn_mask(pmd));
}
/*
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pmd_page(pmd) pfn_to_page((pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT)
+#define pmd_page(pmd) \
+ pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
/*
* the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -570,14 +571,15 @@ static inline int pud_present(pud_t pud)
static inline unsigned long pud_page_vaddr(pud_t pud)
{
- return (unsigned long)__va((unsigned long)pud_val(pud) & PTE_PFN_MASK);
+ return (unsigned long)__va(pud_val(pud) & pud_pfn_mask(pud));
}
/*
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
+#define pud_page(pud) \
+ pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
/* Find an entry in the second-level page table.. */
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 13f310b..dd5b0aa 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -209,10 +209,10 @@ enum page_cache_mode {
#include <linux/types.h>
-/* PTE_PFN_MASK extracts the PFN from a (pte|pmd|pud|pgd)val_t */
+/* Extracts the PFN from a (pte|pmd|pud|pgd)val_t of a 4KB page */
#define PTE_PFN_MASK ((pteval_t)PHYSICAL_PAGE_MASK)
-/* PTE_FLAGS_MASK extracts the flags from a (pte|pmd|pud|pgd)val_t */
+/* Extracts the flags from a (pte|pmd|pud|pgd)val_t of a 4KB page */
#define PTE_FLAGS_MASK (~PTE_PFN_MASK)
typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
@@ -276,14 +276,46 @@ static inline pmdval_t native_pmd_val(pmd_t pmd)
}
#endif
+static inline pudval_t pud_pfn_mask(pud_t pud)
+{
+ if (native_pud_val(pud) & _PAGE_PSE)
+ return PUD_PAGE_MASK & PHYSICAL_PAGE_MASK;
+ else
+ return PTE_PFN_MASK;
+}
+
+static inline pudval_t pud_flags_mask(pud_t pud)
+{
+ if (native_pud_val(pud) & _PAGE_PSE)
+ return ~(PUD_PAGE_MASK & (pudval_t)PHYSICAL_PAGE_MASK);
+ else
+ return ~PTE_PFN_MASK;
+}
+
static inline pudval_t pud_flags(pud_t pud)
{
- return native_pud_val(pud) & PTE_FLAGS_MASK;
+ return native_pud_val(pud) & pud_flags_mask(pud);
+}
+
+static inline pmdval_t pmd_pfn_mask(pmd_t pmd)
+{
+ if (native_pmd_val(pmd) & _PAGE_PSE)
+ return PMD_PAGE_MASK & PHYSICAL_PAGE_MASK;
+ else
+ return PTE_PFN_MASK;
+}
+
+static inline pmdval_t pmd_flags_mask(pmd_t pmd)
+{
+ if (native_pmd_val(pmd) & _PAGE_PSE)
+ return ~(PMD_PAGE_MASK & (pmdval_t)PHYSICAL_PAGE_MASK);
+ else
+ return ~PTE_PFN_MASK;
}
static inline pmdval_t pmd_flags(pmd_t pmd)
{
- return native_pmd_val(pmd) & PTE_FLAGS_MASK;
+ return native_pmd_val(pmd) & pmd_flags_mask(pmd);
}
static inline pte_t native_make_pte(pteval_t val)
pte_pgprot() returns a pgprot_t value by calling pte_flags(). Now
that pud_flags() and pmd_flags() work differently from pte_flags(),
define pud_pgprot() and pmd_pgprot() for PUD/PMD. Also update
pte_pgprot() to remove the masking with PTE_FLAGS_MASK, which is
unnecessary since pte_flags() takes care of it.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/pgtable.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0733ec7..59fc341 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -379,7 +379,9 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
return __pgprot(preservebits | addbits);
}
-#define pte_pgprot(x) __pgprot(pte_flags(x) & PTE_FLAGS_MASK)
+#define pte_pgprot(x) __pgprot(pte_flags(x))
+#define pmd_pgprot(x) __pgprot(pmd_flags(x))
+#define pud_pgprot(x) __pgprot(pud_flags(x))
#define canon_pgprot(p) __pgprot(massage_pgprot(p))
/sys/kernel/debug/kernel_page_tables does not show the PAT bit for
PUD/PMD mappings. This is because walk_pud_level(), walk_pmd_level()
and note_page() mask the flags with PTE_FLAGS_MASK, which does not
cover their PAT bit, _PAGE_PAT_LARGE.
Fix it by replacing the use of PTE_FLAGS_MASK with p?d_flags(),
which masks the flags properly.
Change also to show the PAT bit as "PAT" to be consistent with
other bits.
Reported-by: Robert Elliott <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Robert Elliott <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
---
arch/x86/mm/dump_pagetables.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index f0cedf3..71ab2d7 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -155,7 +155,7 @@ static void printk_prot(struct seq_file *m, pgprot_t prot, int level, bool dmsg)
pt_dump_cont_printf(m, dmsg, " ");
if ((level == 4 && pr & _PAGE_PAT) ||
((level == 3 || level == 2) && pr & _PAGE_PAT_LARGE))
- pt_dump_cont_printf(m, dmsg, "pat ");
+ pt_dump_cont_printf(m, dmsg, "PAT ");
else
pt_dump_cont_printf(m, dmsg, " ");
if (pr & _PAGE_GLOBAL)
@@ -198,8 +198,8 @@ static void note_page(struct seq_file *m, struct pg_state *st,
* we have now. "break" is either changing perms, levels or
* address space marker.
*/
- prot = pgprot_val(new_prot) & PTE_FLAGS_MASK;
- cur = pgprot_val(st->current_prot) & PTE_FLAGS_MASK;
+ prot = pgprot_val(new_prot);
+ cur = pgprot_val(st->current_prot);
if (!st->level) {
/* First entry */
@@ -269,13 +269,13 @@ static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
{
int i;
pte_t *start;
+ pgprotval_t prot;
start = (pte_t *) pmd_page_vaddr(addr);
for (i = 0; i < PTRS_PER_PTE; i++) {
- pgprot_t prot = pte_pgprot(*start);
-
+ prot = pte_flags(*start);
st->current_address = normalize_addr(P + i * PTE_LEVEL_MULT);
- note_page(m, st, prot, 4);
+ note_page(m, st, __pgprot(prot), 4);
start++;
}
}
@@ -287,18 +287,19 @@ static void walk_pmd_level(struct seq_file *m, struct pg_state *st, pud_t addr,
{
int i;
pmd_t *start;
+ pgprotval_t prot;
start = (pmd_t *) pud_page_vaddr(addr);
for (i = 0; i < PTRS_PER_PMD; i++) {
st->current_address = normalize_addr(P + i * PMD_LEVEL_MULT);
if (!pmd_none(*start)) {
- pgprotval_t prot = pmd_val(*start) & PTE_FLAGS_MASK;
-
- if (pmd_large(*start) || !pmd_present(*start))
+ if (pmd_large(*start) || !pmd_present(*start)) {
+ prot = pmd_flags(*start);
note_page(m, st, __pgprot(prot), 3);
- else
+ } else {
walk_pte_level(m, st, *start,
P + i * PMD_LEVEL_MULT);
+ }
} else
note_page(m, st, __pgprot(0), 3);
start++;
@@ -318,19 +319,20 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
{
int i;
pud_t *start;
+ pgprotval_t prot;
start = (pud_t *) pgd_page_vaddr(addr);
for (i = 0; i < PTRS_PER_PUD; i++) {
st->current_address = normalize_addr(P + i * PUD_LEVEL_MULT);
if (!pud_none(*start)) {
- pgprotval_t prot = pud_val(*start) & PTE_FLAGS_MASK;
-
- if (pud_large(*start) || !pud_present(*start))
+ if (pud_large(*start) || !pud_present(*start)) {
+ prot = pud_flags(*start);
note_page(m, st, __pgprot(prot), 2);
- else
+ } else {
walk_pmd_level(m, st, *start,
P + i * PUD_LEVEL_MULT);
+ }
} else
note_page(m, st, __pgprot(0), 2);
@@ -351,6 +353,7 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
#else
pgd_t *start = swapper_pg_dir;
#endif
+ pgprotval_t prot;
int i;
struct pg_state st = {};
@@ -362,13 +365,13 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
for (i = 0; i < PTRS_PER_PGD; i++) {
st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
if (!pgd_none(*start)) {
- pgprotval_t prot = pgd_val(*start) & PTE_FLAGS_MASK;
-
- if (pgd_large(*start) || !pgd_present(*start))
+ if (pgd_large(*start) || !pgd_present(*start)) {
+ prot = pgd_flags(*start);
note_page(m, &st, __pgprot(prot), 1);
- else
+ } else {
walk_pud_level(m, &st, *start,
i * PGD_LEVEL_MULT);
+ }
} else
note_page(m, &st, __pgprot(0), 1);
slow_virt_to_phys() calls lookup_address() to obtain *pte and
its level. It then calls pte_pfn() to obtain the PFN for any
level. This does not result the correct PFN when the large
PAT bit is set because pte_pfn() does not mask the large PAT bit
properly for PUD/PMD.
Fix slow_virt_to_phys() to use pud_pfn() and pmd_pfn() according
to the level.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
---
arch/x86/mm/pageattr.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 727158c..ecc24e5 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -415,18 +415,28 @@ pmd_t *lookup_pmd_address(unsigned long address)
phys_addr_t slow_virt_to_phys(void *__virt_addr)
{
unsigned long virt_addr = (unsigned long)__virt_addr;
- phys_addr_t phys_addr;
- unsigned long offset;
+ unsigned long phys_addr, offset;
enum pg_level level;
- unsigned long pmask;
pte_t *pte;
pte = lookup_address(virt_addr, &level);
BUG_ON(!pte);
- pmask = page_level_mask(level);
- offset = virt_addr & ~pmask;
- phys_addr = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT;
- return (phys_addr | offset);
+
+ switch (level) {
+ case PG_LEVEL_1G:
+ phys_addr = pud_pfn(*(pud_t *)pte) << PAGE_SHIFT;
+ offset = virt_addr & ~PUD_PAGE_MASK;
+ break;
+ case PG_LEVEL_2M:
+ phys_addr = pmd_pfn(*(pmd_t *)pte) << PAGE_SHIFT;
+ offset = virt_addr & ~PMD_PAGE_MASK;
+ break;
+ default:
+ phys_addr = pte_pfn(*pte) << PAGE_SHIFT;
+ offset = virt_addr & ~PAGE_MASK;
+ }
+
+ return (phys_addr_t)(phys_addr | offset);
}
EXPORT_SYMBOL_GPL(slow_virt_to_phys);
gup_huge_pud() and gup_huge_pmd() cast *pud and *pmd to *pte,
and use pte_xxx() interfaces to obtain the flags and PFN.
However, the pte_xxx() interface does not handle the large
PAT bit properly for PUD/PMD.
Fix gup_huge_pud() and gup_huge_pmd() to use pud_xxx() and
pmd_xxx() interfaces according to their type.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
---
arch/x86/mm/gup.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 81bf3d2..ae9a37b 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -118,21 +118,20 @@ static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
unsigned long end, int write, struct page **pages, int *nr)
{
unsigned long mask;
- pte_t pte = *(pte_t *)&pmd;
struct page *head, *page;
int refs;
mask = _PAGE_PRESENT|_PAGE_USER;
if (write)
mask |= _PAGE_RW;
- if ((pte_flags(pte) & mask) != mask)
+ if ((pmd_flags(pmd) & mask) != mask)
return 0;
/* hugepages are never "special" */
- VM_BUG_ON(pte_flags(pte) & _PAGE_SPECIAL);
- VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+ VM_BUG_ON(pmd_flags(pmd) & _PAGE_SPECIAL);
+ VM_BUG_ON(!pfn_valid(pmd_pfn(pmd)));
refs = 0;
- head = pte_page(pte);
+ head = pmd_page(pmd);
page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
do {
VM_BUG_ON_PAGE(compound_head(page) != head, page);
@@ -195,21 +194,20 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr,
unsigned long end, int write, struct page **pages, int *nr)
{
unsigned long mask;
- pte_t pte = *(pte_t *)&pud;
struct page *head, *page;
int refs;
mask = _PAGE_PRESENT|_PAGE_USER;
if (write)
mask |= _PAGE_RW;
- if ((pte_flags(pte) & mask) != mask)
+ if ((pud_flags(pud) & mask) != mask)
return 0;
/* hugepages are never "special" */
- VM_BUG_ON(pte_flags(pte) & _PAGE_SPECIAL);
- VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+ VM_BUG_ON(pud_flags(pud) & _PAGE_SPECIAL);
+ VM_BUG_ON(!pfn_valid(pud_pfn(pud)));
refs = 0;
- head = pte_page(pte);
+ head = pud_page(pud);
page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
do {
VM_BUG_ON_PAGE(compound_head(page) != head, page);
try_preserve_large_page() is called from __change_page_attr() to
change the map attribute by preserving the large page. This
function uses pte_pfn() and pte_pgprot() for PUD/PMD, which do not
handle the large PAT bit properly.
Fix try_preserve_large_page() to use corresponding p?d_pfn() and
p?d_pgprot().
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
---
arch/x86/mm/pageattr.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index ecc24e5..2724755 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -469,7 +469,7 @@ static int
try_preserve_large_page(pte_t *kpte, unsigned long address,
struct cpa_data *cpa)
{
- unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn;
+ unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn, old_pfn;
pte_t new_pte, old_pte, *tmp;
pgprot_t old_prot, new_prot, req_prot;
int i, do_split = 1;
@@ -489,17 +489,21 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
switch (level) {
case PG_LEVEL_2M:
-#ifdef CONFIG_X86_64
+ old_prot = pmd_pgprot(*(pmd_t *)kpte);
+ old_pfn = pmd_pfn(*(pmd_t *)kpte);
+ break;
case PG_LEVEL_1G:
-#endif
- psize = page_level_size(level);
- pmask = page_level_mask(level);
+ old_prot = pud_pgprot(*(pud_t *)kpte);
+ old_pfn = pud_pfn(*(pud_t *)kpte);
break;
default:
do_split = -EINVAL;
goto out_unlock;
}
+ psize = page_level_size(level);
+ pmask = page_level_mask(level);
+
/*
* Calculate the number of pages, which fit into this large
* page starting at address:
@@ -515,7 +519,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
* up accordingly.
*/
old_pte = *kpte;
- old_prot = req_prot = pgprot_large_2_4k(pte_pgprot(old_pte));
+ old_prot = req_prot = pgprot_large_2_4k(old_prot);
pgprot_val(req_prot) &= ~pgprot_val(cpa->mask_clr);
pgprot_val(req_prot) |= pgprot_val(cpa->mask_set);
@@ -541,10 +545,10 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
req_prot = canon_pgprot(req_prot);
/*
- * old_pte points to the large page base address. So we need
+ * old_pfn points to the large page base pfn. So we need
* to add the offset of the virtual address:
*/
- pfn = pte_pfn(old_pte) + ((address & (psize - 1)) >> PAGE_SHIFT);
+ pfn = old_pfn + ((address & (psize - 1)) >> PAGE_SHIFT);
cpa->pfn = pfn;
new_prot = static_protections(req_prot, address, pfn);
@@ -555,7 +559,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
* the pages in the range we try to preserve:
*/
addr = address & pmask;
- pfn = pte_pfn(old_pte);
+ pfn = old_pfn;
for (i = 0; i < (psize >> PAGE_SHIFT); i++, addr += PAGE_SIZE, pfn++) {
pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
@@ -585,7 +589,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
* The address is aligned and the number of pages
* covers the full page.
*/
- new_pte = pfn_pte(pte_pfn(old_pte), new_prot);
+ new_pte = pfn_pte(old_pfn, new_prot);
__set_pmd_pte(kpte, address, new_pte);
cpa->flags |= CPA_FLUSHTLB;
do_split = 0;
__split_large_page() is called from __change_page_attr() to change
the map attribute by splitting the large page into smaller pages.
This function uses pte_pfn() and pte_pgprot() for PUD/PMD, which
do not handle the large PAT bit properly.
Fix __split_large_page() to use corresponding p?d_pfn() and p?d_pgprot().
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
---
arch/x86/mm/pageattr.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 2724755..d0e40ed 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -606,7 +606,7 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
struct page *base)
{
pte_t *pbase = (pte_t *)page_address(base);
- unsigned long pfn, pfninc = 1;
+ unsigned long ref_pfn, pfn, pfninc = 1;
unsigned int i, level;
pte_t *tmp;
pgprot_t ref_prot;
@@ -623,26 +623,33 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
}
paravirt_alloc_pte(&init_mm, page_to_pfn(base));
- ref_prot = pte_pgprot(pte_clrhuge(*kpte));
- /* promote PAT bit to correct position */
- if (level == PG_LEVEL_2M)
+ switch (level) {
+ case PG_LEVEL_2M:
+ ref_prot = pmd_pgprot(*(pmd_t *)kpte);
+ /* clear PSE and promote PAT bit to correct position */
ref_prot = pgprot_large_2_4k(ref_prot);
+ ref_pfn = pmd_pfn(*(pmd_t *)kpte);
+ break;
-#ifdef CONFIG_X86_64
- if (level == PG_LEVEL_1G) {
+ case PG_LEVEL_1G:
+ ref_prot = pud_pgprot(*(pud_t *)kpte);
+ ref_pfn = pud_pfn(*(pud_t *)kpte);
pfninc = PMD_PAGE_SIZE >> PAGE_SHIFT;
+
/*
- * Set the PSE flags only if the PRESENT flag is set
+ * Clear the PSE flags if the PRESENT flag is not set
* otherwise pmd_present/pmd_huge will return true
* even on a non present pmd.
*/
- if (pgprot_val(ref_prot) & _PAGE_PRESENT)
- pgprot_val(ref_prot) |= _PAGE_PSE;
- else
+ if (!(pgprot_val(ref_prot) & _PAGE_PRESENT))
pgprot_val(ref_prot) &= ~_PAGE_PSE;
+ break;
+
+ default:
+ spin_unlock(&pgd_lock);
+ return 1;
}
-#endif
/*
* Set the GLOBAL flags only if the PRESENT flag is set
@@ -658,7 +665,7 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
/*
* Get the target pfn from the original entry:
*/
- pfn = pte_pfn(*kpte);
+ pfn = ref_pfn;
for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
set_pte(&pbase[i], pfn_pte(pfn, canon_pgprot(ref_prot)));
try_preserve_large_page() checks if new_prot is the same as
old_prot. If so, it simply sets do_split to 0, and returns
with no-operation. However, old_prot is set as a 4KB pgprot
value while new_prot is a large page pgprot value.
Now that old_prot is initially set from p?d_pgprot() as a
large page pgprot value, fix it by not overwriting old_prot
with a 4KB pgprot value.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
---
arch/x86/mm/pageattr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index d0e40ed..6f9b885 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -519,7 +519,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
* up accordingly.
*/
old_pte = *kpte;
- old_prot = req_prot = pgprot_large_2_4k(old_prot);
+ req_prot = pgprot_large_2_4k(old_prot);
pgprot_val(req_prot) &= ~pgprot_val(cpa->mask_clr);
pgprot_val(req_prot) |= pgprot_val(cpa->mask_set);
On Wed, 5 Aug 2015, Toshi Kani wrote:
> In case of CONFIG_X86_64, vdso32/vclock_gettime.c fakes a 32bit
> kernel configuration by re-defining it to CONFIG_X86_32. However,
> it does not re-define CONFIG_PGTABLE_LEVELS leaving it as 4 levels.
> Fix it by re-defining CONFIG_PGTABLE_LEVELS to 2 as X86_PAE is not
> set.
You fail to explain WHY this is required. I have not yet spotted any
code in vclock_gettime.c which is affected by this.
Thanks,
tglx
On 8/20/2015 1:46 PM, Thomas Gleixner wrote:
> On Wed, 5 Aug 2015, Toshi Kani wrote:
>
>> In case of CONFIG_X86_64, vdso32/vclock_gettime.c fakes a 32bit
>> kernel configuration by re-defining it to CONFIG_X86_32. However,
>> it does not re-define CONFIG_PGTABLE_LEVELS leaving it as 4 levels.
>> Fix it by re-defining CONFIG_PGTABLE_LEVELS to 2 as X86_PAE is not
>> set.
> You fail to explain WHY this is required. I have not yet spotted any
> code in vclock_gettime.c which is affected by this.
Sorry about that. Without this patch 01, applying patch 02 & 03 causes
the following compile errors in vclock_gettime.c. This is because it
includes pgtable_type.h (see blow), which now requires PUD_SHIFT and
PMD_SHIFT defined properly. In case of X86_32, pgtable_type.h includes
pgtable_nopud.h and pgtable-nopmd.h, which define these SHIFTs when
CONFIG_PGTABLE_LEVEL is set to 2 (or 3 if PAE is also defined).
In file included from ./arch/x86/include/asm/paravirt_types.h:44:0,
from ./arch/x86/include/asm/ptrace.h:71,
from ./arch/x86/include/asm/alternative.h:8,
from ./arch/x86/include/asm/bitops.h:16,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from include/linux/list.h:8,
from include/linux/preempt.h:10,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/clocksource.h:12,
from ./arch/x86/include/asm/vgtod.h:5,
from arch/x86/entry/vdso/vdso32/../vclock_gettime.c:15,
from arch/x86/entry/vdso/vdso32/vclock_gettime.c:30:
./arch/x86/include/asm/pgtable_types.h: In function pud_pfn_mask・
./arch/x86/include/asm/pgtable_types.h:282:23: error: PUD_SHIFT・
undeclared (first use in this function)
return PUD_PAGE_MASK & PHYSICAL_PAGE_MASK;
^
./arch/x86/include/asm/pgtable_types.h:282:23: note: each undeclared
identifier is reported only once for each function it appears in
./arch/x86/include/asm/pgtable_types.h: In function pud_flags_mask・
./arch/x86/include/asm/pgtable_types.h:290:25: error: PUD_SHIFT・
undeclared (first use in this function)
return ~(PUD_PAGE_MASK & (pudval_t)PHYSICAL_PAGE_MASK);
^
./arch/x86/include/asm/pgtable_types.h: In function pmd_pfn_mask・
./arch/x86/include/asm/pgtable_types.h:303:23: error: PMD_SHIFT・
undeclared (first use in this function)
return PMD_PAGE_MASK & PHYSICAL_PAGE_MASK;
^
./arch/x86/include/asm/pgtable_types.h: In function pmd_flags_mask・
./arch/x86/include/asm/pgtable_types.h:311:25: error: PMD_SHIFT・
undeclared (first use in this function)
return ~(PMD_PAGE_MASK & (pmdval_t)PHYSICAL_PAGE_MASK);
^
scripts/Makefile.build:258: recipe for target
'arch/x86/entry/vdso/vdso32/vclock_gettime.o' failed
make[3]: *** [arch/x86/entry/vdso/vdso32/vclock_gettime.o] Error 1
make[3]: *** Waiting for unfinished jobs....
Thanks,
-Toshi
On Thu, 2015-08-20 at 17:04 -0600, Toshi Kani wrote:
> On 8/20/2015 1:46 PM, Thomas Gleixner wrote:
> > On Wed, 5 Aug 2015, Toshi Kani wrote:
> >
> > > In case of CONFIG_X86_64, vdso32/vclock_gettime.c fakes a 32bit
> > > kernel configuration by re-defining it to CONFIG_X86_32. However,
> > > it does not re-define CONFIG_PGTABLE_LEVELS leaving it as 4 levels.
> > > Fix it by re-defining CONFIG_PGTABLE_LEVELS to 2 as X86_PAE is not
> > > set.
> > You fail to explain WHY this is required. I have not yet spotted any
> > code in vclock_gettime.c which is affected by this.
>
> Sorry about that. Without this patch 01, applying patch 02 & 03 causes
> the following compile errors in vclock_gettime.c. This is because it
> includes pgtable_type.h (see blow), which now requires PUD_SHIFT and
> PMD_SHIFT defined properly. In case of X86_32, pgtable_type.h includes
> pgtable_nopud.h and pgtable-nopmd.h, which define these SHIFTs when
> CONFIG_PGTABLE_LEVEL is set to 2 (or 3 if PAE is also defined).
> :
Attached is patch 01/10 with updated descriptions. The rest of the patchset
still applies cleanly.
Please let me know if you have any further comments.
Thanks,
-Toshi
----
Subject: [PATCH v3 UPDATE 1/10] x86/vdso32: Define PGTABLE_LEVELS to 32bit
VDSO
In case of CONFIG_X86_64, vdso32/vclock_gettime.c fakes a 32-bit
non-PAE kernel configuration by re-defining it to CONFIG_X86_32.
However, it does not re-define CONFIG_PGTABLE_LEVELS leaving it
as 4 levels.
This mismatch leads <asm/pgtable_type.h> to NOT include <asm-generic/
pgtable-nopud.h> and <asm-generic/pgtable-nopmd.h>, which will cause
compile errors when a later patch enhances <asm/pgtable_type.h> to
use PUD_SHIFT and PMD_SHIFT. These -nopud & -nopmd headers define
these SHIFTs for the 32-bit non-PAE kernel.
Fix it by re-defining CONFIG_PGTABLE_LEVELS to 2 levels.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/entry/vdso/vdso32/vclock_gettime.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c
b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
index 175cc72..87a86e0 100644
--- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
@@ -14,11 +14,13 @@
*/
#undef CONFIG_64BIT
#undef CONFIG_X86_64
+#undef CONFIG_PGTABLE_LEVELS
#undef CONFIG_ILLEGAL_POINTER_VALUE
#undef CONFIG_SPARSEMEM_VMEMMAP
#undef CONFIG_NR_CPUS
#define CONFIG_X86_32 1
+#define CONFIG_PGTABLE_LEVELS 2
#define CONFIG_PAGE_OFFSET 0
#define CONFIG_ILLEGAL_POINTER_VALUE 0
#define CONFIG_NR_CPUS 1
On Wed, 5 Aug 2015, Toshi Kani wrote:
> The PAT bit gets relocated to bit 12 when PUD and PMD mappings are
> used. This bit 12, however, is not covered by PTE_FLAGS_MASK, which
> is corrently used for masking pfn and flags for all cases.
>
> Fix pud/pmd interfaces to handle pfn and flags properly by using
> P?D_PAGE_MASK when PUD/PMD mappings are used, i.e. PSE bit is set.
Can you please split that into a patch introducing and describing the
new mask helper functions and a second one making use of it?
Thanks,
tglx
On Tue, 2015-08-25 at 10:16 +0200, Thomas Gleixner wrote:
> On Wed, 5 Aug 2015, Toshi Kani wrote:
>
> > The PAT bit gets relocated to bit 12 when PUD and PMD mappings are
> > used. This bit 12, however, is not covered by PTE_FLAGS_MASK, which
> > is corrently used for masking pfn and flags for all cases.
> >
> > Fix pud/pmd interfaces to handle pfn and flags properly by using
> > P?D_PAGE_MASK when PUD/PMD mappings are used, i.e. PSE bit is set.
>
> Can you please split that into a patch introducing and describing the
> new mask helper functions and a second one making use of it?
Will do. I will send out v4 patchset today with this update (and the patch
01 update).
Thanks,
-Toshi