2015-07-15 16:25:00

by Toshi Kani

Subject: [PATCH v2 0/4] x86, mm: Handle large PAT bit in pud/pmd interfaces

The PAT bit gets relocated to bit 12 when PUD and PMD mappings are used.
This bit 12, however, is not covered by PTE_FLAGS_MASK, which is currently
used for masking pfn and flags for all cases.
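
As a rough userspace illustration of the problem (not kernel code), the
sketch below masks a large-page entry the way the current code does. The
46-bit physical address width, the example entry value and the helper
names old_flags(), old_pfn() and example_large_entry() are assumptions
made for this example only.

```c
#include <stdint.h>

/* Assumed values for illustration (x86-64, 4KB base pages). */
#define PAGE_SHIFT         12
#define PHYS_MASK_SHIFT    46   /* assumption: physical address width */
#define PHYSICAL_PAGE_MASK \
	(((UINT64_C(1) << PHYS_MASK_SHIFT) - 1) & ~((UINT64_C(1) << PAGE_SHIFT) - 1))
#define PTE_PFN_MASK       PHYSICAL_PAGE_MASK
#define PTE_FLAGS_MASK     (~PTE_PFN_MASK)

#define _PAGE_PRESENT      (UINT64_C(1) << 0)
#define _PAGE_PSE          (UINT64_C(1) << 7)
#define _PAGE_PAT_LARGE    (UINT64_C(1) << 12)

/* Hypothetical helpers that always mask with the 4KB-page masks. */
static uint64_t old_flags(uint64_t val)
{
	return val & PTE_FLAGS_MASK;
}

static uint64_t old_pfn(uint64_t val)
{
	return (val & PTE_PFN_MASK) >> PAGE_SHIFT;
}

/* A 2MB mapping at physical 0x40000000 with the large PAT bit set. */
static uint64_t example_large_entry(void)
{
	return UINT64_C(0x40000000) | _PAGE_PRESENT | _PAGE_PSE |
	       _PAGE_PAT_LARGE;
}
```

With these assumptions, old_flags() drops bit 12 from the flags, and
old_pfn() returns a pfn that is off by one because bit 12 is treated as
part of the address.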

Patch 1/4-2/4 make changes necessary for patch 3/4 to use P?D_PAGE_MASK.

Patch 3/4 fixes pud/pmd interfaces to handle the PAT bit when PUD and PMD
mappings are used.

Patch 4/4 fixes /sys/kernel/debug/kernel_page_tables to show the PAT bit
properly.

Note, the PAT bit is first enabled in 4.2-rc1 with WT mappings.

---
v2:
- Change p?d_pfn() to handle the PAT bit. (Juergen Gross)
- Mask pfn and flags with P?D_PAGE_MASK. (Juergen Gross)
- Change p?d_page_vaddr() and p?d_page() to handle the PAT bit.

---
Toshi Kani (4):
1/4 x86/vdso32: Define PGTABLE_LEVELS to 32bit VDSO
2/4 x86, asm: Move PUD_PAGE macros to page_types.h
3/4 x86: Fix pud/pmd interfaces to handle large PAT bit
4/4 x86, mm: Fix page table dump to show PAT bit

---
4fa1ff9a08c...dd58e3d52618b00dd768de1753c35611906fcbee --stat
arch/x86/entry/vdso/vdso32/vclock_gettime.c | 2 ++
arch/x86/include/asm/page_64_types.h | 3 ---
arch/x86/include/asm/page_types.h | 3 +++
arch/x86/include/asm/pgtable.h | 14 +++++-----
arch/x86/include/asm/pgtable_types.h | 40 ++++++++++++++++++++++++++---
arch/x86/mm/dump_pagetables.c | 39 +++++++++++++++-------------
6 files changed, 70 insertions(+), 31 deletions(-)


2015-07-15 16:25:07

by Toshi Kani

Subject: [PATCH v2 1/4] x86/vdso32: Define PGTABLE_LEVELS to 32bit VDSO

When CONFIG_X86_64 is set, vdso32/vclock_gettime.c fakes a 32bit
kernel configuration by re-defining it to CONFIG_X86_32. However,
it does not re-define CONFIG_PGTABLE_LEVELS, which leaves it at 4
levels. Fix it by re-defining CONFIG_PGTABLE_LEVELS to 2, since
X86_PAE is not set.

Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/entry/vdso/vdso32/vclock_gettime.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
index 175cc72..87a86e0 100644
--- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
@@ -14,11 +14,13 @@
*/
#undef CONFIG_64BIT
#undef CONFIG_X86_64
+#undef CONFIG_PGTABLE_LEVELS
#undef CONFIG_ILLEGAL_POINTER_VALUE
#undef CONFIG_SPARSEMEM_VMEMMAP
#undef CONFIG_NR_CPUS

#define CONFIG_X86_32 1
+#define CONFIG_PGTABLE_LEVELS 2
#define CONFIG_PAGE_OFFSET 0
#define CONFIG_ILLEGAL_POINTER_VALUE 0
#define CONFIG_NR_CPUS 1

2015-07-15 16:25:10

by Toshi Kani

Subject: [PATCH v2 2/4] x86, asm: Move PUD_PAGE macros to page_types.h

PUD_SHIFT is defined according to the kernel configuration, which
allows it to be used by any kernel. However, PUD_PAGE_SIZE and
PUD_PAGE_MASK, which are calculated from PUD_SHIFT, are defined
in page_64_types.h, which allows them to be used by a 64-bit kernel
only.

Move PUD_PAGE_SIZE and PUD_PAGE_MASK to page_types.h so that they
can be used by any kernel as well.
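
For reference, the size/mask arithmetic can be sketched in userspace as
below. The helper names are hypothetical, and the shift values mentioned
afterwards (30 for a 64-bit PUD, 21 for a PMD) are assumptions about a
typical configuration rather than something this patch defines.

```c
#include <stdint.h>

/* Hypothetical helper: size of a mapping covering 2^shift bytes. */
static uint64_t page_size_for_shift(unsigned int shift)
{
	return UINT64_C(1) << shift;
}

/* Hypothetical helper: mask that clears the offset within that mapping. */
static uint64_t page_mask_for_shift(unsigned int shift)
{
	return ~(page_size_for_shift(shift) - 1);
}
```

With a shift of 30, page_size_for_shift() gives 1GB and
page_mask_for_shift() clears the low 30 bits, which mirrors how
PUD_PAGE_SIZE and PUD_PAGE_MASK follow from PUD_SHIFT.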

Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/page_64_types.h | 3 ---
arch/x86/include/asm/page_types.h | 3 +++
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 4edd53b..4928cf0 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -26,9 +26,6 @@
#define MCE_STACK 4
#define N_EXCEPTION_STACKS 4 /* hw limit: 7 */

-#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
-#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))
-
/*
* Set __PAGE_OFFSET to the most negative possible address +
* PGDIR_SIZE*16 (pgd slot 272). The gap is to allow a space for a
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index c7c712f..c5b7fb2 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -20,6 +20,9 @@
#define PMD_PAGE_SIZE (_AC(1, UL) << PMD_SHIFT)
#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1))

+#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
+#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))
+
#define HPAGE_SHIFT PMD_SHIFT
#define HPAGE_SIZE (_AC(1,UL) << HPAGE_SHIFT)
#define HPAGE_MASK (~(HPAGE_SIZE - 1))

2015-07-15 16:25:12

by Toshi Kani

Subject: [PATCH v2 3/4] x86: Fix pud/pmd interfaces to handle large PAT bit

The PAT bit gets relocated to bit 12 when PUD and PMD mappings are
used. This bit 12, however, is not covered by PTE_FLAGS_MASK, which
is currently used for masking pfn and flags for all cases.

Fix pud/pmd interfaces to handle pfn and flags properly by using
P?D_PAGE_MASK when PUD/PMD mappings are used, i.e. PSE bit is set.
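
The idea can be sketched in userspace as follows. This is only an
illustration of choosing the mask from the PSE bit, not the kernel
code itself: the 46-bit physical address width, the PMD shift of 21 and
the sketch_* helper names are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed values for illustration (x86-64, 4KB base pages, 2MB PMD). */
#define PAGE_SHIFT         12
#define PMD_SHIFT          21
#define PHYS_MASK_SHIFT    46   /* assumption: physical address width */
#define PHYSICAL_PAGE_MASK \
	(((UINT64_C(1) << PHYS_MASK_SHIFT) - 1) & ~((UINT64_C(1) << PAGE_SHIFT) - 1))
#define PTE_PFN_MASK       PHYSICAL_PAGE_MASK
#define PMD_PAGE_MASK      (~((UINT64_C(1) << PMD_SHIFT) - 1))

#define _PAGE_PRESENT      (UINT64_C(1) << 0)
#define _PAGE_PSE          (UINT64_C(1) << 7)
#define _PAGE_PAT_LARGE    (UINT64_C(1) << 12)

/* Pick the pfn mask by mapping size: large if PSE is set, else 4KB. */
static uint64_t sketch_pmd_pfn_mask(uint64_t val)
{
	if (val & _PAGE_PSE)
		return PMD_PAGE_MASK & PHYSICAL_PAGE_MASK;
	return PTE_PFN_MASK;
}

/* Flags are everything outside the pfn mask chosen above. */
static uint64_t sketch_pmd_flags(uint64_t val)
{
	return val & ~sketch_pmd_pfn_mask(val);
}

static uint64_t sketch_pmd_pfn(uint64_t val)
{
	return (val & sketch_pmd_pfn_mask(val)) >> PAGE_SHIFT;
}
```

For a 2MB entry at physical 0x40000000 with _PAGE_PAT_LARGE set, the
sketch keeps bit 12 in the flags and out of the pfn. For an entry that
points to a page table (PSE clear), it falls back to the 4KB masks, so
bit 12 remains part of the address.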

Suggested-by: Juergen Gross <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Konrad Wilk <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
---
arch/x86/include/asm/pgtable.h | 14 +++++++-----
arch/x86/include/asm/pgtable_types.h | 40 +++++++++++++++++++++++++++++++---
2 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5b..0733ec7 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -142,12 +142,12 @@ static inline unsigned long pte_pfn(pte_t pte)

static inline unsigned long pmd_pfn(pmd_t pmd)
{
- return (pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT;
+ return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
}

static inline unsigned long pud_pfn(pud_t pud)
{
- return (pud_val(pud) & PTE_PFN_MASK) >> PAGE_SHIFT;
+ return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT;
}

#define pte_page(pte) pfn_to_page(pte_pfn(pte))
@@ -502,14 +502,15 @@ static inline int pmd_none(pmd_t pmd)

static inline unsigned long pmd_page_vaddr(pmd_t pmd)
{
- return (unsigned long)__va(pmd_val(pmd) & PTE_PFN_MASK);
+ return (unsigned long)__va(pmd_val(pmd) & pmd_pfn_mask(pmd));
}

/*
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pmd_page(pmd) pfn_to_page((pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT)
+#define pmd_page(pmd) \
+ pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)

/*
* the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -570,14 +571,15 @@ static inline int pud_present(pud_t pud)

static inline unsigned long pud_page_vaddr(pud_t pud)
{
- return (unsigned long)__va((unsigned long)pud_val(pud) & PTE_PFN_MASK);
+ return (unsigned long)__va(pud_val(pud) & pud_pfn_mask(pud));
}

/*
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
+#define pud_page(pud) \
+ pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)

/* Find an entry in the second-level page table.. */
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 13f310b..dd5b0aa 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -209,10 +209,10 @@ enum page_cache_mode {

#include <linux/types.h>

-/* PTE_PFN_MASK extracts the PFN from a (pte|pmd|pud|pgd)val_t */
+/* Extracts the PFN from a (pte|pmd|pud|pgd)val_t of a 4KB page */
#define PTE_PFN_MASK ((pteval_t)PHYSICAL_PAGE_MASK)

-/* PTE_FLAGS_MASK extracts the flags from a (pte|pmd|pud|pgd)val_t */
+/* Extracts the flags from a (pte|pmd|pud|pgd)val_t of a 4KB page */
#define PTE_FLAGS_MASK (~PTE_PFN_MASK)

typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
@@ -276,14 +276,46 @@ static inline pmdval_t native_pmd_val(pmd_t pmd)
}
#endif

+static inline pudval_t pud_pfn_mask(pud_t pud)
+{
+ if (native_pud_val(pud) & _PAGE_PSE)
+ return PUD_PAGE_MASK & PHYSICAL_PAGE_MASK;
+ else
+ return PTE_PFN_MASK;
+}
+
+static inline pudval_t pud_flags_mask(pud_t pud)
+{
+ if (native_pud_val(pud) & _PAGE_PSE)
+ return ~(PUD_PAGE_MASK & (pudval_t)PHYSICAL_PAGE_MASK);
+ else
+ return ~PTE_PFN_MASK;
+}
+
static inline pudval_t pud_flags(pud_t pud)
{
- return native_pud_val(pud) & PTE_FLAGS_MASK;
+ return native_pud_val(pud) & pud_flags_mask(pud);
+}
+
+static inline pmdval_t pmd_pfn_mask(pmd_t pmd)
+{
+ if (native_pmd_val(pmd) & _PAGE_PSE)
+ return PMD_PAGE_MASK & PHYSICAL_PAGE_MASK;
+ else
+ return PTE_PFN_MASK;
+}
+
+static inline pmdval_t pmd_flags_mask(pmd_t pmd)
+{
+ if (native_pmd_val(pmd) & _PAGE_PSE)
+ return ~(PMD_PAGE_MASK & (pmdval_t)PHYSICAL_PAGE_MASK);
+ else
+ return ~PTE_PFN_MASK;
}

static inline pmdval_t pmd_flags(pmd_t pmd)
{
- return native_pmd_val(pmd) & PTE_FLAGS_MASK;
+ return native_pmd_val(pmd) & pmd_flags_mask(pmd);
}

static inline pte_t native_make_pte(pteval_t val)

2015-07-15 16:25:09

by Toshi Kani

Subject: [PATCH v2 4/4] x86, mm: Fix page table dump to show PAT bit

/sys/kernel/debug/kernel_page_tables does not show the PAT bit
for PUD and PMD mappings. This is because walk_pud_level(),
walk_pmd_level() and note_page() mask the flags with PTE_FLAGS_MASK,
which does not cover their PAT bit, _PAGE_PAT_LARGE.

Fix it by replacing the use of PTE_FLAGS_MASK with p?d_flags(),
which masks the flags properly.

Also change the output to show the PAT bit as "PAT" to be consistent
with other bits.
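
The level-dependent PAT check in printk_prot() can be sketched in
userspace as below. The level numbering (2 = PUD, 3 = PMD, 4 = PTE)
follows the note_page() calls in this file; the helper name
entry_has_pat() is hypothetical and the bit positions are restated here
only for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define _PAGE_PAT          (UINT64_C(1) << 7)   /* PAT bit of a 4KB PTE */
#define _PAGE_PAT_LARGE    (UINT64_C(1) << 12)  /* PAT bit of a PUD/PMD map */

/* Report whether the PAT bit is set for an entry at the given level. */
static bool entry_has_pat(uint64_t flags, int level)
{
	if (level == 4)
		return (flags & _PAGE_PAT) != 0;
	if (level == 3 || level == 2)
		return (flags & _PAGE_PAT_LARGE) != 0;
	return false;
}
```

This only reports the bit correctly if the flags passed in still contain
bit 12 for levels 2 and 3, which is why the flags need to come from
p?d_flags() rather than from a PTE_FLAGS_MASK-based mask.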

Reported-by: Robert Elliott <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Robert Elliott <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
---
arch/x86/mm/dump_pagetables.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index f0cedf3..71ab2d7 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -155,7 +155,7 @@ static void printk_prot(struct seq_file *m, pgprot_t prot, int level, bool dmsg)
pt_dump_cont_printf(m, dmsg, " ");
if ((level == 4 && pr & _PAGE_PAT) ||
((level == 3 || level == 2) && pr & _PAGE_PAT_LARGE))
- pt_dump_cont_printf(m, dmsg, "pat ");
+ pt_dump_cont_printf(m, dmsg, "PAT ");
else
pt_dump_cont_printf(m, dmsg, " ");
if (pr & _PAGE_GLOBAL)
@@ -198,8 +198,8 @@ static void note_page(struct seq_file *m, struct pg_state *st,
* we have now. "break" is either changing perms, levels or
* address space marker.
*/
- prot = pgprot_val(new_prot) & PTE_FLAGS_MASK;
- cur = pgprot_val(st->current_prot) & PTE_FLAGS_MASK;
+ prot = pgprot_val(new_prot);
+ cur = pgprot_val(st->current_prot);

if (!st->level) {
/* First entry */
@@ -269,13 +269,13 @@ static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
{
int i;
pte_t *start;
+ pgprotval_t prot;

start = (pte_t *) pmd_page_vaddr(addr);
for (i = 0; i < PTRS_PER_PTE; i++) {
- pgprot_t prot = pte_pgprot(*start);
-
+ prot = pte_flags(*start);
st->current_address = normalize_addr(P + i * PTE_LEVEL_MULT);
- note_page(m, st, prot, 4);
+ note_page(m, st, __pgprot(prot), 4);
start++;
}
}
@@ -287,18 +287,19 @@ static void walk_pmd_level(struct seq_file *m, struct pg_state *st, pud_t addr,
{
int i;
pmd_t *start;
+ pgprotval_t prot;

start = (pmd_t *) pud_page_vaddr(addr);
for (i = 0; i < PTRS_PER_PMD; i++) {
st->current_address = normalize_addr(P + i * PMD_LEVEL_MULT);
if (!pmd_none(*start)) {
- pgprotval_t prot = pmd_val(*start) & PTE_FLAGS_MASK;
-
- if (pmd_large(*start) || !pmd_present(*start))
+ if (pmd_large(*start) || !pmd_present(*start)) {
+ prot = pmd_flags(*start);
note_page(m, st, __pgprot(prot), 3);
- else
+ } else {
walk_pte_level(m, st, *start,
P + i * PMD_LEVEL_MULT);
+ }
} else
note_page(m, st, __pgprot(0), 3);
start++;
@@ -318,19 +319,20 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
{
int i;
pud_t *start;
+ pgprotval_t prot;

start = (pud_t *) pgd_page_vaddr(addr);

for (i = 0; i < PTRS_PER_PUD; i++) {
st->current_address = normalize_addr(P + i * PUD_LEVEL_MULT);
if (!pud_none(*start)) {
- pgprotval_t prot = pud_val(*start) & PTE_FLAGS_MASK;
-
- if (pud_large(*start) || !pud_present(*start))
+ if (pud_large(*start) || !pud_present(*start)) {
+ prot = pud_flags(*start);
note_page(m, st, __pgprot(prot), 2);
- else
+ } else {
walk_pmd_level(m, st, *start,
P + i * PUD_LEVEL_MULT);
+ }
} else
note_page(m, st, __pgprot(0), 2);

@@ -351,6 +353,7 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
#else
pgd_t *start = swapper_pg_dir;
#endif
+ pgprotval_t prot;
int i;
struct pg_state st = {};

@@ -362,13 +365,13 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
for (i = 0; i < PTRS_PER_PGD; i++) {
st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
if (!pgd_none(*start)) {
- pgprotval_t prot = pgd_val(*start) & PTE_FLAGS_MASK;
-
- if (pgd_large(*start) || !pgd_present(*start))
+ if (pgd_large(*start) || !pgd_present(*start)) {
+ prot = pgd_flags(*start);
note_page(m, &st, __pgprot(prot), 1);
- else
+ } else {
walk_pud_level(m, &st, *start,
i * PGD_LEVEL_MULT);
+ }
} else
note_page(m, &st, __pgprot(0), 1);

2015-07-21 08:05:54

by Ingo Molnar

Subject: Re: [PATCH v2 0/4] x86, mm: Handle large PAT bit in pud/pmd interfaces


* Toshi Kani <[email protected]> wrote:

> The PAT bit gets relocated to bit 12 when PUD and PMD mappings are used.
> This bit 12, however, is not covered by PTE_FLAGS_MASK, which is currently
> used for masking pfn and flags for all cases.
>
> Patch 1/4-2/4 make changes necessary for patch 3/4 to use P?D_PAGE_MASK.
>
> Patch 3/4 fixes pud/pmd interfaces to handle the PAT bit when PUD and PMD
> mappings are used.
>
> Patch 4/4 fixes /sys/kernel/debug/kernel_page_tables to show the PAT bit
> properly.
>
> Note, the PAT bit is first enabled in 4.2-rc1 with WT mappings.

Are patches 1-3 only needed to fix /sys/kernel/debug/kernel_page_tables output, or
are there other things fixed as well? The patches do not tell us any of that
information ...

Thanks,

Ingo

2015-07-21 14:53:12

by Toshi Kani

Subject: Re: [PATCH v2 0/4] x86, mm: Handle large PAT bit in pud/pmd interfaces

On Tue, 2015-07-21 at 10:05 +0200, Ingo Molnar wrote:
> * Toshi Kani <[email protected]> wrote:
>
> > The PAT bit gets relocated to bit 12 when PUD and PMD mappings are
> > used.
> > This bit 12, however, is not covered by PTE_FLAGS_MASK, which is
> > currently
> > used for masking pfn and flags for all cases.
> >
> > Patch 1/4-2/4 make changes necessary for patch 3/4 to use
> > P?D_PAGE_MASK.
> >
> > Patch 3/4 fixes pud/pmd interfaces to handle the PAT bit when PUD and
> > PMD
> > mappings are used.
> >
> > Patch 4/4 fixes /sys/kernel/debug/kernel_page_tables to show the PAT
> > bit
> > properly.
> >
> > Note, the PAT bit is first enabled in 4.2-rc1 with WT mappings.
>
> Are patches 1-3 only needed to fix /sys/kernel/debug/kernel_page_tables
> output, or
> are there other things fixed as well? The patches do not tell us any of
> that information ...

Patch 3 (along with patches 1-2, which it depends on) fixes multiple
pud/pmd interfaces to work properly when the _PAGE_PAT_LARGE bit is set.
Because pmem is the only module that can create a range with this bit set,
via large page WT maps in 4.2, this issue has not been exposed other than
in the kernel_page_tables case fixed by patch 4. Since other cases may
arise in the future, all patches should go to 4.2 to prevent them from
happening. There is no issue in 4.1 and older since they cannot set the bit.

Thanks,
-Toshi