2020-04-28 19:46:54

by Matthew Wilcox

Subject: [PATCH 0/7] Record the mm_struct in the page table pages

From: "Matthew Wilcox (Oracle)" <[email protected]>

Pages which are in use as page tables have some space unused in struct
page. It would be handy to have a pointer to the struct mm_struct that
they belong to so that we can handle uncorrectable errors in page tables
more gracefully. There are a few other things we could use it for too,
such as checking that a page table entry actually belongs to the task
we think it should. This patch series does none of that, but it does
lay the groundwork for it.

Matthew Wilcox (Oracle) (7):
mm: Document x86 uses a linked list of pgds
mm: Move pt_mm within struct page
arm: Thread mm_struct throughout page table allocation
arm64: Thread mm_struct throughout page table allocation
m68k: Thread mm_struct throughout page table allocation
mm: Set pt_mm in PTE constructor
mm: Set pt_mm in PMD constructor

arch/arc/include/asm/pgalloc.h | 2 +-
arch/arm/mm/mmu.c | 66 ++++++++---------
arch/arm64/include/asm/pgalloc.h | 2 +-
arch/arm64/mm/mmu.c | 93 ++++++++++++------------
arch/m68k/include/asm/mcf_pgalloc.h | 2 +-
arch/m68k/include/asm/motorola_pgalloc.h | 10 +--
arch/m68k/mm/motorola.c | 4 +-
arch/openrisc/include/asm/pgalloc.h | 2 +-
arch/powerpc/mm/book3s64/pgtable.c | 2 +-
arch/powerpc/mm/pgtable-frag.c | 2 +-
arch/s390/include/asm/pgalloc.h | 2 +-
arch/s390/mm/pgalloc.c | 2 +-
arch/sparc/mm/init_64.c | 2 +-
arch/sparc/mm/srmmu.c | 2 +-
arch/x86/include/asm/pgalloc.h | 2 +-
arch/x86/mm/pgtable.c | 3 +-
arch/xtensa/include/asm/pgalloc.h | 2 +-
include/asm-generic/pgalloc.h | 2 +-
include/linux/mm.h | 18 ++++-
include/linux/mm_types.h | 12 +--
20 files changed, 122 insertions(+), 110 deletions(-)


base-commit: 6a8b55ed4056ea5559ebe4f6a4b247f627870d4c
--
2.26.2


2020-04-28 19:47:13

by Matthew Wilcox

Subject: [PATCH 5/7] m68k: Thread mm_struct throughout page table allocation

From: "Matthew Wilcox (Oracle)" <[email protected]>

An upcoming patch will pass mm_struct to the page table constructor.
Make sure m68k has the appropriate mm_struct at the point it needs to
call the constructor.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
arch/m68k/include/asm/motorola_pgalloc.h | 10 +++++-----
arch/m68k/mm/motorola.c | 2 +-
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/m68k/include/asm/motorola_pgalloc.h b/arch/m68k/include/asm/motorola_pgalloc.h
index c66e42917912..dbac0c597397 100644
--- a/arch/m68k/include/asm/motorola_pgalloc.h
+++ b/arch/m68k/include/asm/motorola_pgalloc.h
@@ -15,12 +15,12 @@ enum m68k_table_types {
};

extern void init_pointer_table(void *table, int type);
-extern void *get_pointer_table(int type);
+extern void *get_pointer_table(int type, struct mm_struct *mm);
extern int free_pointer_table(void *table, int type);

static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
{
- return get_pointer_table(TABLE_PTE);
+ return get_pointer_table(TABLE_PTE, mm);
}

static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
@@ -30,7 +30,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)

static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
{
- return get_pointer_table(TABLE_PTE);
+ return get_pointer_table(TABLE_PTE, mm);
}

static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
@@ -47,7 +47,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,

static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
{
- return get_pointer_table(TABLE_PMD);
+ return get_pointer_table(TABLE_PMD, mm);
}

static inline int pmd_free(struct mm_struct *mm, pmd_t *pmd)
@@ -69,7 +69,7 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)

static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
- return get_pointer_table(TABLE_PGD);
+ return get_pointer_table(TABLE_PGD, mm);
}


diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index fc16190ec2d6..7743480be0cf 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -113,7 +113,7 @@ void __init init_pointer_table(void *table, int type)
return;
}

-void *get_pointer_table(int type)
+void *get_pointer_table(int type, struct mm_struct *mm)
{
ptable_desc *dp = ptable_list[type].next;
unsigned int mask = list_empty(&ptable_list[type]) ? 0 : PD_MARKBITS(dp);
--
2.26.2

2020-04-28 19:49:05

by Matthew Wilcox

Subject: [PATCH 7/7] mm: Set pt_mm in PMD constructor

From: "Matthew Wilcox (Oracle)" <[email protected]>

By setting pt_mm for pages in use as page tables, we can help with
debugging and lay the foundation for handling hardware errors in page
tables more gracefully. It also opens up the possibility for adding
more sanity checks in the future.

Also set and clear the PageTable bit so that we know these are page tables.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
arch/arm64/include/asm/pgalloc.h | 2 +-
arch/arm64/mm/mmu.c | 2 +-
arch/powerpc/mm/book3s64/pgtable.c | 2 +-
arch/s390/include/asm/pgalloc.h | 2 +-
arch/x86/include/asm/pgalloc.h | 2 +-
arch/x86/mm/pgtable.c | 2 +-
include/linux/mm.h | 13 +++++++++++--
7 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 172d76fa0245..920da9c5786c 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -30,7 +30,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
page = alloc_page(gfp);
if (!page)
return NULL;
- if (!pgtable_pmd_page_ctor(page)) {
+ if (!pgtable_pmd_page_ctor(page, mm)) {
__free_page(page);
return NULL;
}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c706bed1e496..b7bdde1990be 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -383,7 +383,7 @@ static phys_addr_t pgd_pgtable_alloc(int shift, struct mm_struct *mm)
if (shift == PAGE_SHIFT)
BUG_ON(!pgtable_pte_page_ctor(phys_to_page(pa), mm));
else if (shift == PMD_SHIFT)
- BUG_ON(!pgtable_pmd_page_ctor(phys_to_page(pa)));
+ BUG_ON(!pgtable_pmd_page_ctor(phys_to_page(pa), mm));

return pa;
}
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index e0bb69c616e4..9fda5287c197 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -297,7 +297,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
page = alloc_page(gfp);
if (!page)
return NULL;
- if (!pgtable_pmd_page_ctor(page)) {
+ if (!pgtable_pmd_page_ctor(page, mm)) {
__free_pages(page, 0);
return NULL;
}
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 74a352f8c0d1..bebad4e5d42a 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -86,7 +86,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long vmaddr)
if (!table)
return NULL;
crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
- if (!pgtable_pmd_page_ctor(virt_to_page(table))) {
+ if (!pgtable_pmd_page_ctor(virt_to_page(table), mm)) {
crst_table_free(mm, table);
return NULL;
}
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index 29aa7859bdee..33514f0a9e79 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -96,7 +96,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
page = alloc_pages(gfp, 0);
if (!page)
return NULL;
- if (!pgtable_pmd_page_ctor(page)) {
+ if (!pgtable_pmd_page_ctor(page, mm)) {
__free_pages(page, 0);
return NULL;
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f5f46737aea0..8f4255662c5a 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -229,7 +229,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
if (!pmd)
failed = true;
- if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
+ if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd), mm)) {
free_page((unsigned long)pmd);
pmd = NULL;
failed = true;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a98eebeba91..e2924d900fc5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2216,11 +2216,14 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(pmd_to_page(pmd));
}

-static inline bool pgtable_pmd_page_ctor(struct page *page)
+static inline
+bool pgtable_pmd_page_ctor(struct page *page, struct mm_struct *mm)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
page->pmd_huge_pte = NULL;
#endif
+ __SetPageTable(page);
+ page->pt_mm = mm;
return ptlock_init(page);
}

@@ -2229,6 +2232,8 @@ static inline void pgtable_pmd_page_dtor(struct page *page)
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
#endif
+ __ClearPageTable(page);
+ page->pt_mm = NULL;
ptlock_free(page);
}

@@ -2241,7 +2246,11 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
return &mm->page_table_lock;
}

-static inline bool pgtable_pmd_page_ctor(struct page *page) { return true; }
+static inline
+bool pgtable_pmd_page_ctor(struct page *page, struct mm_struct *mm)
+{
+ return true;
+}
static inline void pgtable_pmd_page_dtor(struct page *page) {}

#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
--
2.26.2

2020-04-28 19:49:05

by Matthew Wilcox

Subject: [PATCH 6/7] mm: Set pt_mm in PTE constructor

From: "Matthew Wilcox (Oracle)" <[email protected]>

By setting pt_mm for pages in use as page tables, we can help with
debugging and lay the foundation for handling hardware errors in page
tables more gracefully. It also opens up the possibility for adding
more sanity checks in the future.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
arch/arc/include/asm/pgalloc.h | 2 +-
arch/arm/mm/mmu.c | 2 +-
arch/arm64/mm/mmu.c | 2 +-
arch/m68k/include/asm/mcf_pgalloc.h | 2 +-
arch/m68k/mm/motorola.c | 2 +-
arch/openrisc/include/asm/pgalloc.h | 2 +-
arch/powerpc/mm/pgtable-frag.c | 2 +-
arch/s390/mm/pgalloc.c | 2 +-
arch/sparc/mm/init_64.c | 2 +-
arch/sparc/mm/srmmu.c | 2 +-
arch/xtensa/include/asm/pgalloc.h | 2 +-
include/asm-generic/pgalloc.h | 2 +-
include/linux/mm.h | 5 ++++-
13 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/arch/arc/include/asm/pgalloc.h b/arch/arc/include/asm/pgalloc.h
index b747f2ec2928..5f6b1f3bc2a2 100644
--- a/arch/arc/include/asm/pgalloc.h
+++ b/arch/arc/include/asm/pgalloc.h
@@ -108,7 +108,7 @@ pte_alloc_one(struct mm_struct *mm)
return 0;
memzero((void *)pte_pg, PTRS_PER_PTE * sizeof(pte_t));
page = virt_to_page(pte_pg);
- if (!pgtable_pte_page_ctor(page)) {
+ if (!pgtable_pte_page_ctor(page, mm)) {
__free_page(page);
return 0;
}
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index e5275bfbe695..9c16c45570ba 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -707,7 +707,7 @@ static void *__init late_alloc(unsigned long sz, struct mm_struct *mm)
{
void *ptr = (void *)__get_free_pages(GFP_PGTABLE_KERNEL, get_order(sz));

- if (!ptr || !pgtable_pte_page_ctor(virt_to_page(ptr)))
+ if (!ptr || !pgtable_pte_page_ctor(virt_to_page(ptr), mm))
BUG();
return ptr;
}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 69ecc83c3be0..c706bed1e496 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -381,7 +381,7 @@ static phys_addr_t pgd_pgtable_alloc(int shift, struct mm_struct *mm)
* folded, and if so pgtable_pmd_page_ctor() becomes nop.
*/
if (shift == PAGE_SHIFT)
- BUG_ON(!pgtable_pte_page_ctor(phys_to_page(pa)));
+ BUG_ON(!pgtable_pte_page_ctor(phys_to_page(pa), mm));
else if (shift == PMD_SHIFT)
BUG_ON(!pgtable_pmd_page_ctor(phys_to_page(pa)));

diff --git a/arch/m68k/include/asm/mcf_pgalloc.h b/arch/m68k/include/asm/mcf_pgalloc.h
index bc1228e00518..369a3523e834 100644
--- a/arch/m68k/include/asm/mcf_pgalloc.h
+++ b/arch/m68k/include/asm/mcf_pgalloc.h
@@ -50,7 +50,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)

if (!page)
return NULL;
- if (!pgtable_pte_page_ctor(page)) {
+ if (!pgtable_pte_page_ctor(page, mm)) {
__free_page(page);
return NULL;
}
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 7743480be0cf..6bb7c9f348ad 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -137,7 +137,7 @@ void *get_pointer_table(int type, struct mm_struct *mm)
* m68k doesn't have SPLIT_PTE_PTLOCKS for not having
* SMP.
*/
- pgtable_pte_page_ctor(virt_to_page(page));
+ pgtable_pte_page_ctor(virt_to_page(page), mm);
}

mmu_page_ctor(page);
diff --git a/arch/openrisc/include/asm/pgalloc.h b/arch/openrisc/include/asm/pgalloc.h
index da12a4c38c4b..1a80dfc928b5 100644
--- a/arch/openrisc/include/asm/pgalloc.h
+++ b/arch/openrisc/include/asm/pgalloc.h
@@ -75,7 +75,7 @@ static inline struct page *pte_alloc_one(struct mm_struct *mm)
if (!pte)
return NULL;
clear_page(page_address(pte));
- if (!pgtable_pte_page_ctor(pte)) {
+ if (!pgtable_pte_page_ctor(pte, mm)) {
__free_page(pte);
return NULL;
}
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index ee4bd6d38602..59a8c85e01ac 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -61,7 +61,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
if (!page)
return NULL;
- if (!pgtable_pte_page_ctor(page)) {
+ if (!pgtable_pte_page_ctor(page, mm)) {
__free_page(page);
return NULL;
}
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 498c98a312f4..0363828749e2 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -208,7 +208,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
page = alloc_page(GFP_KERNEL);
if (!page)
return NULL;
- if (!pgtable_pte_page_ctor(page)) {
+ if (!pgtable_pte_page_ctor(page, mm)) {
__free_page(page);
return NULL;
}
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 1cf0d666dea3..d2cc80828415 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2928,7 +2928,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page)
return NULL;
- if (!pgtable_pte_page_ctor(page)) {
+ if (!pgtable_pte_page_ctor(page, mm)) {
free_unref_page(page);
return NULL;
}
diff --git a/arch/sparc/mm/srmmu.c b/arch/sparc/mm/srmmu.c
index b7c94de70cca..019ff2019b55 100644
--- a/arch/sparc/mm/srmmu.c
+++ b/arch/sparc/mm/srmmu.c
@@ -382,7 +382,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
if ((pte = (unsigned long)pte_alloc_one_kernel(mm)) == 0)
return NULL;
page = pfn_to_page(__nocache_pa(pte) >> PAGE_SHIFT);
- if (!pgtable_pte_page_ctor(page)) {
+ if (!pgtable_pte_page_ctor(page, mm)) {
__free_page(page);
return NULL;
}
diff --git a/arch/xtensa/include/asm/pgalloc.h b/arch/xtensa/include/asm/pgalloc.h
index 1d38f0e755ba..43cc05255832 100644
--- a/arch/xtensa/include/asm/pgalloc.h
+++ b/arch/xtensa/include/asm/pgalloc.h
@@ -55,7 +55,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
if (!pte)
return NULL;
page = virt_to_page(pte);
- if (!pgtable_pte_page_ctor(page)) {
+ if (!pgtable_pte_page_ctor(page, mm)) {
__free_page(page);
return NULL;
}
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 73f7421413cb..24c2d6e194fb 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -63,7 +63,7 @@ static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
pte = alloc_page(gfp);
if (!pte)
return NULL;
- if (!pgtable_pte_page_ctor(pte)) {
+ if (!pgtable_pte_page_ctor(pte, mm)) {
__free_page(pte);
return NULL;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5a323422d783..2a98eebeba91 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2157,11 +2157,13 @@ static inline void pgtable_init(void)
pgtable_cache_init();
}

-static inline bool pgtable_pte_page_ctor(struct page *page)
+static inline
+bool pgtable_pte_page_ctor(struct page *page, struct mm_struct *mm)
{
if (!ptlock_init(page))
return false;
__SetPageTable(page);
+ page->pt_mm = mm;
inc_zone_page_state(page, NR_PAGETABLE);
return true;
}
@@ -2170,6 +2172,7 @@ static inline void pgtable_pte_page_dtor(struct page *page)
{
ptlock_free(page);
__ClearPageTable(page);
+ page->pt_mm = NULL;
dec_zone_page_state(page, NR_PAGETABLE);
}

--
2.26.2

2020-04-28 19:49:25

by Matthew Wilcox

Subject: [PATCH 1/7] mm: Document x86 uses a linked list of pgds

From: "Matthew Wilcox (Oracle)" <[email protected]>

x86 uses page->lru of the pages used for pgds, but that's not immediately
obvious to anyone looking to make changes. Add a struct list_head to
the union so it's clearly in use for pgds.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/mm_types.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4aba6c0c2ba8..9bb34e2cd5a5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -142,8 +142,13 @@ struct page {
struct list_head deferred_list;
};
struct { /* Page table pages */
- unsigned long _pt_pad_1; /* compound_head */
- pgtable_t pmd_huge_pte; /* protected by page->ptl */
+ union {
+ struct list_head pgd_list; /* x86 */
+ struct {
+ unsigned long _pt_pad_1;
+ pgtable_t pmd_huge_pte;
+ };
+ };
unsigned long _pt_pad_2; /* mapping */
union {
struct mm_struct *pt_mm; /* x86 pgds only */
--
2.26.2

2020-04-28 19:49:44

by Matthew Wilcox

Subject: [PATCH 2/7] mm: Move pt_mm within struct page

From: "Matthew Wilcox (Oracle)" <[email protected]>

Instead of a per-arch word within struct page, use a formerly reserved
word. This word is shared with page->mapping, so it must be cleared
before the page is freed, as page->mapping is checked in free_pages().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
arch/x86/mm/pgtable.c | 1 +
include/linux/mm_types.h | 7 ++-----
2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 7bd2c3a52297..f5f46737aea0 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -95,6 +95,7 @@ static inline void pgd_list_del(pgd_t *pgd)
struct page *page = virt_to_page(pgd);

list_del(&page->lru);
+ page->pt_mm = NULL;
}

#define UNSHARED_PTRS_PER_PGD \
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9bb34e2cd5a5..7efa12f4626f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -149,11 +149,8 @@ struct page {
pgtable_t pmd_huge_pte;
};
};
- unsigned long _pt_pad_2; /* mapping */
- union {
- struct mm_struct *pt_mm; /* x86 pgds only */
- atomic_t pt_frag_refcount; /* powerpc */
- };
+ struct mm_struct *pt_mm;
+ atomic_t pt_frag_refcount; /* powerpc */
#if ALLOC_SPLIT_PTLOCKS
spinlock_t *ptl;
#else
--
2.26.2

2020-04-28 21:43:14

by Ira Weiny

Subject: Re: [PATCH 1/7] mm: Document x86 uses a linked list of pgds

On Tue, Apr 28, 2020 at 12:44:43PM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> x86 uses page->lru of the pages used for pgds, but that's not immediately
> obvious to anyone looking to make changes. Add a struct list_head to
> the union so it's clearly in use for pgds.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> include/linux/mm_types.h | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 4aba6c0c2ba8..9bb34e2cd5a5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -142,8 +142,13 @@ struct page {
> struct list_head deferred_list;
> };
> struct { /* Page table pages */
> - unsigned long _pt_pad_1; /* compound_head */
> - pgtable_t pmd_huge_pte; /* protected by page->ptl */
> + union {
> + struct list_head pgd_list; /* x86 */

Shouldn't pgd_list_{add,del}() use this list head variable instead of lru to
complete the documentation?

Probably the list-iteration loops in arch/x86/* as well?

Ira

> + struct {
> + unsigned long _pt_pad_1;
> + pgtable_t pmd_huge_pte;
> + };
> + };
> unsigned long _pt_pad_2; /* mapping */
> union {
> struct mm_struct *pt_mm; /* x86 pgds only */
> --
> 2.26.2
>
>

2020-04-28 22:56:04

by Matthew Wilcox

Subject: Re: [PATCH 1/7] mm: Document x86 uses a linked list of pgds

On Tue, Apr 28, 2020 at 02:41:09PM -0700, Ira Weiny wrote:
> On Tue, Apr 28, 2020 at 12:44:43PM -0700, Matthew Wilcox wrote:
> > x86 uses page->lru of the pages used for pgds, but that's not immediately
> > obvious to anyone looking to make changes. Add a struct list_head to
> > the union so it's clearly in use for pgds.
>
> Shouldn't pgd_list_{add,del}() use this list head variable instead of lru to
> complete the documentation?
>
> Probably the list iteration loops arch/x86/* as well?

Yes, but I felt that was out of scope for this patchset. Untangling the
uses of struct page is a long and messy business; if we have to fix
everything at once, we'll never get anywhere. There's also the slab
users of page->lru instead of page->slab_list.

What I actually want to get to is:

struct page {
unsigned long flags;
union {
struct file_page file;
struct anon_page anon;
struct pt_page pt;
struct slab_page slab;
struct tail_page tail;
struct rcu_head rcu;
};
union {
atomic_t _mapcount;
...
};
atomic_t refcount;
...
};

and then we can refer to page->pt.list and so on.

2020-04-29 00:28:46

by Kirill A. Shutemov

Subject: Re: [PATCH 0/7] Record the mm_struct in the page table pages

On Tue, Apr 28, 2020 at 12:44:42PM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Pages which are in use as page tables have some space unused in struct
> page. It would be handy to have a pointer to the struct mm_struct that
> they belong to so that we can handle uncorrectable errors in page tables
> more gracefully. There are a few other things we could use it for too,
> such as checking that the page table entry actually belongs to the task
> we think it ought to. This patch series does none of that, but does
> lay the groundwork for it.
>
> Matthew Wilcox (Oracle) (7):

How does it work for the kernel side of the virtual address space?

And your employer may be interested in the semantics around
CONFIG_ARCH_WANT_HUGE_PMD_SHARE :P

--
Kirill A. Shutemov

2020-04-29 00:54:12

by Kirill A. Shutemov

Subject: Re: [PATCH 7/7] mm: Set pt_mm in PMD constructor

On Tue, Apr 28, 2020 at 12:44:49PM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> By setting pt_mm for pages in use as page tables, we can help with
> debugging and lay the foundation for handling hardware errors in page
> tables more gracefully. It also opens up the possibility for adding
> more sanity checks in the future.
>
> Also set and clear the PageTable bit so that we know these are page tables.

As far as I can see, you don't introduce any checks yet. That makes the
patchset somewhat pointless.

I'm not entirely sure what such checks would look like. A single page
table tree would have at least two pt_mm values: the owner and init_mm.
Hugetlb shared page tables would make a mess here. Hm?

--
Kirill A. Shutemov

2020-04-29 01:53:22

by Matthew Wilcox

Subject: Re: [PATCH 0/7] Record the mm_struct in the page table pages

On Wed, Apr 29, 2020 at 03:26:24AM +0300, Kirill A. Shutemov wrote:
> On Tue, Apr 28, 2020 at 12:44:42PM -0700, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <[email protected]>
> >
> > Pages which are in use as page tables have some space unused in struct
> > page. It would be handy to have a pointer to the struct mm_struct that
> > they belong to so that we can handle uncorrectable errors in page tables
> > more gracefully. There are a few other things we could use it for too,
> > such as checking that the page table entry actually belongs to the task
> > we think it ought to. This patch series does none of that, but does
> > lay the groundwork for it.
> >
> > Matthew Wilcox (Oracle) (7):
>
> How does it work for kernel side of virtual address space?

init_mm

> And your employer may be interested in semantics around
> CONFIG_ARCH_WANT_HUGE_PMD_SHARE :P

I was thinking about that. Right now, it's only useful for debugging
purposes (as you point out in a later email). I think it's OK if shared
PMDs aren't supported as well as regular PTEs, just because there are
so few of them that uncorrectable errors are less likely to strike in
those pages.

2020-04-29 07:36:09

by Geert Uytterhoeven

Subject: Re: [PATCH 2/7] mm: Move pt_mm within struct page

Hi Matthew,

On Tue, Apr 28, 2020 at 9:44 PM Matthew Wilcox <[email protected]> wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
> Instead of a per-arch word within struct page, use a formerly reserved
> word. This word is shared with page->mapping, so it must be cleared
> before being freed as it is checked in free_pages().
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

Thanks for your patch!

> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -149,11 +149,8 @@ struct page {
> pgtable_t pmd_huge_pte;
> };
> };
> - unsigned long _pt_pad_2; /* mapping */
> - union {
> - struct mm_struct *pt_mm; /* x86 pgds only */
> - atomic_t pt_frag_refcount; /* powerpc */
> - };
> + struct mm_struct *pt_mm;
> + atomic_t pt_frag_refcount; /* powerpc */

So there is now an implicit hole here on 64-bit platforms, right?
Do we have any platforms where alignof(long) != 8?

> #if ALLOC_SPLIT_PTLOCKS
> spinlock_t *ptl;
> #else

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2020-04-29 07:46:52

by Geert Uytterhoeven

Subject: Re: [PATCH 5/7] m68k: Thread mm_struct throughout page table allocation

On Tue, Apr 28, 2020 at 9:45 PM Matthew Wilcox <[email protected]> wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> An upcoming patch will pass mm_struct to the page table constructor.
> Make sure m68k has the appropriate mm_struct at the point it needs to
> call the constructor.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

Reviewed-by: Geert Uytterhoeven <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert


2020-04-29 07:48:17

by Geert Uytterhoeven

Subject: Re: [PATCH 6/7] mm: Set pt_mm in PTE constructor

On Tue, Apr 28, 2020 at 9:45 PM Matthew Wilcox <[email protected]> wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> By setting pt_mm for pages in use as page tables, we can help with
> debugging and lay the foundation for handling hardware errors in page
> tables more gracefully. It also opens up the possibility for adding
> more sanity checks in the future.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

> arch/m68k/include/asm/mcf_pgalloc.h | 2 +-
> arch/m68k/mm/motorola.c | 2 +-

Reviewed-by: Geert Uytterhoeven <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert


2020-04-29 12:55:45

by Matthew Wilcox

Subject: Re: [PATCH 2/7] mm: Move pt_mm within struct page

On Wed, Apr 29, 2020 at 09:34:02AM +0200, Geert Uytterhoeven wrote:
> > +++ b/include/linux/mm_types.h
> > @@ -149,11 +149,8 @@ struct page {
> > pgtable_t pmd_huge_pte;
> > };
> > };
> > - unsigned long _pt_pad_2; /* mapping */
> > - union {
> > - struct mm_struct *pt_mm; /* x86 pgds only */
> > - atomic_t pt_frag_refcount; /* powerpc */
> > - };
> > + struct mm_struct *pt_mm;
> > + atomic_t pt_frag_refcount; /* powerpc */
>
> So here is now an implicit hole on 64-bit platforms, right?
> Do we have any where alignof(long) != 8?

There's an implicit hole if someone's turned on spinlock debugging and
has split pagetable locks. Without the need to allocate the spinlock
separately, the ptl will actually move from the same word as 'private'
to the same word as 'index', freeing up 'private' entirely. I don't
intend to depend on that, but it's not quite as critical to line up the
various members of struct page as it used to be.

2020-04-29 18:32:05

by Ira Weiny

Subject: Re: [PATCH 1/7] mm: Document x86 uses a linked list of pgds

On Tue, Apr 28, 2020 at 03:52:51PM -0700, Matthew Wilcox wrote:
> On Tue, Apr 28, 2020 at 02:41:09PM -0700, Ira Weiny wrote:
> > On Tue, Apr 28, 2020 at 12:44:43PM -0700, Matthew Wilcox wrote:
> > > x86 uses page->lru of the pages used for pgds, but that's not immediately
> > > obvious to anyone looking to make changes. Add a struct list_head to
> > > the union so it's clearly in use for pgds.
> >
> > Shouldn't pgd_list_{add,del}() use this list head variable instead of lru to
> > complete the documentation?
> >
> > Probably the list iteration loops arch/x86/* as well?
>
> Yes, but I felt that was out of scope for this patchset. Untangling the
> uses of struct page is a long and messy business; if we have to fix
> everything at once, we'll never get anywhere. There's also the slab
> users of page->lru instead of page->slab_list.

But doesn't replacing lru with the new name in the code also help to
identify the users?

>
> What I actually want to get to is:
>
> struct page {
> unsigned long flags;
> union {
> struct file_page file;
> struct anon_page anon;
> struct pt_page pt;
> struct slab_page slab;
> struct tail_page tail;
> struct rcu_head rcu;
> };
> union {
> atomic_t _mapcount;
> ...
> };
> atomic_t refcount;
> ...
> };
>
> and then we can refer to page->pt.list and so on.

Then later on we know exactly where page->pt.list needs to be inserted.

I'm not opposed to the patch as it is. But to someone newer, the
following seems to document the use of lru as much, if not more.

Compile-tested only, but feel free to merge if you like.
Ira

From 63fa92a940fa17567ab45a64b7ac058d4d41a54d Mon Sep 17 00:00:00 2001
From: Ira Weiny <[email protected]>
Date: Wed, 29 Apr 2020 11:10:59 -0700
Subject: [PATCH] mm: Complete documenting the use of lru for pgd_list

Signed-off-by: Ira Weiny <[email protected]>
---
arch/x86/mm/fault.c | 2 +-
arch/x86/mm/init_64.c | 4 ++--
arch/x86/mm/pat/set_memory.c | 2 +-
arch/x86/mm/pgtable.c | 4 ++--
arch/x86/xen/mmu_pv.c | 4 ++--
5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a51df516b87b..f07d477f8787 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -203,7 +203,7 @@ static void vmalloc_sync(void)
struct page *page;

spin_lock(&pgd_lock);
- list_for_each_entry(page, &pgd_list, lru) {
+ list_for_each_entry(page, &pgd_list, pgd_list) {
spinlock_t *pgt_lock;

/* the pgt_lock only for Xen */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3b289c2f75cd..e2ae3618a65d 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -140,7 +140,7 @@ static void sync_global_pgds_l5(unsigned long start, unsigned long end)
continue;

spin_lock(&pgd_lock);
- list_for_each_entry(page, &pgd_list, lru) {
+ list_for_each_entry(page, &pgd_list, pgd_list) {
pgd_t *pgd;
spinlock_t *pgt_lock;

@@ -181,7 +181,7 @@ static void sync_global_pgds_l4(unsigned long start, unsigned long end)
continue;

spin_lock(&pgd_lock);
- list_for_each_entry(page, &pgd_list, lru) {
+ list_for_each_entry(page, &pgd_list, pgd_list) {
pgd_t *pgd;
p4d_t *p4d;
spinlock_t *pgt_lock;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 59eca6a94ce7..a1edfc593141 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -723,7 +723,7 @@ static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
if (!SHARED_KERNEL_PMD) {
struct page *page;

- list_for_each_entry(page, &pgd_list, lru) {
+ list_for_each_entry(page, &pgd_list, pgd_list) {
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 8f4255662c5a..28ea8cc3f3a2 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -87,14 +87,14 @@ static inline void pgd_list_add(pgd_t *pgd)
{
struct page *page = virt_to_page(pgd);

- list_add(&page->lru, &pgd_list);
+ list_add(&page->pgd_list, &pgd_list);
}

static inline void pgd_list_del(pgd_t *pgd)
{
struct page *page = virt_to_page(pgd);

- list_del(&page->lru);
+ list_del(&page->pgd_list);
page->pt_mm = NULL;
}

diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index bbba8b17829a..df6592be3208 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -844,7 +844,7 @@ void xen_mm_pin_all(void)

spin_lock(&pgd_lock);

- list_for_each_entry(page, &pgd_list, lru) {
+ list_for_each_entry(page, &pgd_list, pgd_list) {
if (!PagePinned(page)) {
__xen_pgd_pin(&init_mm, (pgd_t *)page_address(page));
SetPageSavePinned(page);
@@ -963,7 +963,7 @@ void xen_mm_unpin_all(void)

spin_lock(&pgd_lock);

- list_for_each_entry(page, &pgd_list, lru) {
+ list_for_each_entry(page, &pgd_list, pgd_list) {
if (PageSavePinned(page)) {
BUG_ON(!PagePinned(page));
__xen_pgd_unpin(&init_mm, (pgd_t *)page_address(page));
--
2.25.1

2020-05-24 07:46:14

by Mike Rapoport

Subject: Re: [PATCH 0/7] Record the mm_struct in the page table pages

On Tue, Apr 28, 2020 at 06:51:26PM -0700, Matthew Wilcox wrote:
> On Wed, Apr 29, 2020 at 03:26:24AM +0300, Kirill A. Shutemov wrote:
> > On Tue, Apr 28, 2020 at 12:44:42PM -0700, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)" <[email protected]>
> > >
> > > Pages which are in use as page tables have some space unused in struct
> > > page. It would be handy to have a pointer to the struct mm_struct that
> > > they belong to so that we can handle uncorrectable errors in page tables
> > > more gracefully. There are a few other things we could use it for too,
> > > such as checking that the page table entry actually belongs to the task
> > > we think it ought to. This patch series does none of that, but does
> > > lay the groundwork for it.
> > >
> > > Matthew Wilcox (Oracle) (7):
> >
> > How does it work for kernel side of virtual address space?
>
> init_mm

A note to keep in mind is that most of the kernel page tables are seen
as PG_reserved rather than PageTable().

> > And your employer may be interested in semantics around
> > CONFIG_ARCH_WANT_HUGE_PMD_SHARE :P
>
> I was thinking about that. Right now, it's only useful for debugging
> purposes (as you point out in a later email). I think it's OK if shared
> PMDs aren't supported as well as regular PTEs, just because there are
> so few of them that uncorrectable errors are less likely to strike in
> those pages.
>

--
Sincerely yours,
Mike.