2020-05-04 18:42:45

by Alexandre Chartre

[permalink] [raw]
Subject: [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table)

This is part II of ASI RFC v4. Please refer to the cover letter of
part I for an overview of the ASI RFC.

https://lore.kernel.org/lkml/[email protected]/

This part introduces decorated page-tables, which encapsulate a native
page-table (e.g. a PGD) in order to provide convenient page-table
management functions, such as tracking the address ranges mapped in a
page-table or safely handling references to another page-table.

Decorated page-tables can then be used to easily create and manage
page-tables for use with ASI. They will be used by the ASI test driver
(see part III) and later by KVM ASI.

Decorated page-tables are independent of ASI, and can potentially be
used anywhere a page-table is needed.

Thanks,

alex.

-----

Alexandre Chartre (13):
mm/x86: Introduce decorated page-table (dpt)
mm/dpt: Track buffers allocated for a decorated page-table
mm/dpt: Add decorated page-table entry offset functions
mm/dpt: Add decorated page-table entry allocation functions
mm/dpt: Add decorated page-table entry set functions
mm/dpt: Functions to populate a decorated page-table from a VA range
mm/dpt: Helper functions to map module into a decorated page-table
mm/dpt: Keep track of VA ranges mapped in a decorated page-table
mm/dpt: Functions to clear decorated page-table entries for a VA range
mm/dpt: Function to copy page-table entries for percpu buffer
mm/dpt: Add decorated page-table remap function
mm/dpt: Handle decorated page-table mapped range leaks and overlaps
mm/asi: Function to init decorated page-table with ASI core mappings

arch/x86/include/asm/asi.h | 2 +
arch/x86/include/asm/dpt.h | 89 +++
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/asi.c | 57 ++
arch/x86/mm/dpt.c | 1051 ++++++++++++++++++++++++++++++++++++
5 files changed, 1200 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/include/asm/dpt.h
create mode 100644 arch/x86/mm/dpt.c

--
2.18.2


2020-05-04 18:42:57

by Alexandre Chartre

[permalink] [raw]
Subject: [RFC v4][PATCH part-2 01/13] mm/x86: Introduce decorated page-table (dpt)

A decorated page-table (dpt) encapsulates a native page-table (e.g.
a PGD) and maintains additional attributes related to this page-table.
It aims to be the base structure for providing useful functions to
manage a page-table, such as tracking the VA ranges mapped in a
page-table or safely handling references to another page-table.

Signed-off-by: Alexandre Chartre <[email protected]>
---
arch/x86/include/asm/dpt.h | 23 +++++++++++++
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/dpt.c | 67 ++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/include/asm/dpt.h
create mode 100644 arch/x86/mm/dpt.c

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
new file mode 100644
index 000000000000..1da4d43d5e94
--- /dev/null
+++ b/arch/x86/include/asm/dpt.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ARCH_X86_MM_DPT_H
+#define ARCH_X86_MM_DPT_H
+
+#include <linux/spinlock.h>
+
+#include <asm/pgtable.h>
+
+/*
+ * A decorated page-table (dpt) encapsulates a native page-table (e.g.
+ * a PGD) and maintains additional attributes related to this page-table.
+ */
+struct dpt {
+ spinlock_t lock; /* protect all attributes */
+ pgd_t *pagetable; /* the actual page-table */
+ unsigned int alignment; /* page-table alignment */
+
+};
+
+extern struct dpt *dpt_create(unsigned int pgt_alignment);
+extern void dpt_destroy(struct dpt *dpt);
+
+#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index e57af263e870..5b52d854a030 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -48,7 +48,7 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o
-obj-$(CONFIG_ADDRESS_SPACE_ISOLATION) += asi.o
+obj-$(CONFIG_ADDRESS_SPACE_ISOLATION) += asi.o dpt.o

obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
new file mode 100644
index 000000000000..333e259c5b7f
--- /dev/null
+++ b/arch/x86/mm/dpt.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019, 2020, Oracle and/or its affiliates.
+ *
+ */
+
+#include <linux/slab.h>
+
+#include <asm/dpt.h>
+
+/*
+ * dpt_create - allocate a page-table and create a corresponding
+ * decorated page-table. The page-table is allocated and aligned
+ * at the specified alignment (pgt_alignment) which should be a
+ * multiple of PAGE_SIZE.
+ */
+struct dpt *dpt_create(unsigned int pgt_alignment)
+{
+ unsigned int alloc_order;
+ unsigned long pagetable;
+ struct dpt *dpt;
+
+ if (!IS_ALIGNED(pgt_alignment, PAGE_SIZE))
+ return NULL;
+
+ alloc_order = round_up(PAGE_SIZE + pgt_alignment,
+ PAGE_SIZE) >> PAGE_SHIFT;
+
+ dpt = kzalloc(sizeof(*dpt), GFP_KERNEL);
+ if (!dpt)
+ return NULL;
+
+ pagetable = (unsigned long)__get_free_pages(GFP_KERNEL_ACCOUNT |
+ __GFP_ZERO,
+ alloc_order);
+ if (!pagetable) {
+ kfree(dpt);
+ return NULL;
+ }
+ dpt->pagetable = (pgd_t *)(pagetable + pgt_alignment);
+ dpt->alignment = pgt_alignment;
+
+ spin_lock_init(&dpt->lock);
+
+ return dpt;
+}
+EXPORT_SYMBOL(dpt_create);
+
+void dpt_destroy(struct dpt *dpt)
+{
+ unsigned int pgt_alignment;
+ unsigned int alloc_order;
+
+ if (!dpt)
+ return;
+
+ if (dpt->pagetable) {
+ pgt_alignment = dpt->alignment;
+ alloc_order = round_up(PAGE_SIZE + pgt_alignment,
+ PAGE_SIZE) >> PAGE_SHIFT;
+ free_pages((unsigned long)(dpt->pagetable) - pgt_alignment,
+ alloc_order);
+ }
+
+ kfree(dpt);
+}
+EXPORT_SYMBOL(dpt_destroy);
--
2.18.2

2020-05-04 18:43:11

by Alexandre Chartre

[permalink] [raw]
Subject: [RFC v4][PATCH part-2 05/13] mm/dpt: Add decorated page-table entry set functions

Add wrappers around the page-table entry (pgd/p4d/pud/pmd) set
functions, which check that an existing entry is not being
overwritten.

Signed-off-by: Alexandre Chartre <[email protected]>
---
arch/x86/mm/dpt.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 126 insertions(+)

diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index a2f54ba00255..7a1b4cd53b03 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -258,6 +258,132 @@ static p4d_t *dpt_p4d_alloc(struct dpt *dpt, pgd_t *pgd, unsigned long addr)
return p4d;
}

+/*
+ * dpt_set_pXX() functions are equivalent to kernel set_pXX() functions
+ * but, in addition, they ensure that they are not overwriting an already
+ * existing reference in the decorated page table. Otherwise an error is
+ * returned.
+ */
+
+static int dpt_set_pte(struct dpt *dpt, pte_t *pte, pte_t pte_value)
+{
+#ifdef DEBUG
+ /*
+ * The pte pointer should come from dpt_pte_alloc() or dpt_pte_offset()
+ * both of which check if the pointer is in the decorated page table.
+ * So this is a paranoid check to ensure the pointer is really in the
+ * decorated page table.
+ */
+ if (!dpt_valid_offset(dpt, pte)) {
+ pr_err("DPT %p: PTE %px not found\n", dpt, pte);
+ return -EINVAL;
+ }
+#endif
+ set_pte(pte, pte_value);
+
+ return 0;
+}
+
+static int dpt_set_pmd(struct dpt *dpt, pmd_t *pmd, pmd_t pmd_value)
+{
+#ifdef DEBUG
+ /*
+ * The pmd pointer should come from dpt_pmd_alloc() or dpt_pmd_offset()
+ * both of which check if the pointer is in the decorated page table.
+ * So this is a paranoid check to ensure the pointer is really in the
+ * decorated page table.
+ */
+ if (!dpt_valid_offset(dpt, pmd)) {
+ pr_err("DPT %p: PMD %px not found\n", dpt, pmd);
+ return -EINVAL;
+ }
+#endif
+ if (pmd_val(*pmd) == pmd_val(pmd_value))
+ return 0;
+
+ if (!pmd_none(*pmd)) {
+ pr_err("DPT %p: PMD %px overwriting %lx with %lx\n",
+ dpt, pmd, pmd_val(*pmd), pmd_val(pmd_value));
+ return -EBUSY;
+ }
+
+ set_pmd(pmd, pmd_value);
+
+ return 0;
+}
+
+static int dpt_set_pud(struct dpt *dpt, pud_t *pud, pud_t pud_value)
+{
+#ifdef DEBUG
+ /*
+ * The pud pointer should come from dpt_pud_alloc() or dpt_pud_offset()
+ * both of which check if the pointer is in the decorated page table.
+ * So this is a paranoid check to ensure the pointer is really in the
+ * decorated page table.
+ */
+ if (!dpt_valid_offset(dpt, pud)) {
+ pr_err("DPT %p: PUD %px not found\n", dpt, pud);
+ return -EINVAL;
+ }
+#endif
+ if (pud_val(*pud) == pud_val(pud_value))
+ return 0;
+
+ if (!pud_none(*pud)) {
+ pr_err("DPT %p: PUD %px overwriting %lx with %lx\n",
+ dpt, pud, pud_val(*pud), pud_val(pud_value));
+ return -EBUSY;
+ }
+
+ set_pud(pud, pud_value);
+
+ return 0;
+}
+
+static int dpt_set_p4d(struct dpt *dpt, p4d_t *p4d, p4d_t p4d_value)
+{
+#ifdef DEBUG
+ /*
+ * The p4d pointer should come from dpt_p4d_alloc() or dpt_p4d_offset()
+ * both of which check if the pointer is in the decorated page table.
+ * So this is a paranoid check to ensure the pointer is really in the
+ * decorated page table.
+ */
+ if (!dpt_valid_offset(dpt, p4d)) {
+ pr_err("DPT %p: P4D %px not found\n", dpt, p4d);
+ return -EINVAL;
+ }
+#endif
+ if (p4d_val(*p4d) == p4d_val(p4d_value))
+ return 0;
+
+ if (!p4d_none(*p4d)) {
+ pr_err("DPT %p: P4D %px overwriting %lx with %lx\n",
+ dpt, p4d, p4d_val(*p4d), p4d_val(p4d_value));
+ return -EBUSY;
+ }
+
+ set_p4d(p4d, p4d_value);
+
+ return 0;
+}
+
+static int dpt_set_pgd(struct dpt *dpt, pgd_t *pgd, pgd_t pgd_value)
+{
+ if (pgd_val(*pgd) == pgd_val(pgd_value))
+ return 0;
+
+ if (!pgd_none(*pgd)) {
+ pr_err("DPT %p: PGD %px overwriting %lx with %lx\n",
+ dpt, pgd, pgd_val(*pgd), pgd_val(pgd_value));
+ return -EBUSY;
+ }
+
+ set_pgd(pgd, pgd_value);
+
+ return 0;
+}
+
/*
* dpt_create - allocate a page-table and create a corresponding
* decorated page-table. The page-table is allocated and aligned
--
2.18.2

2020-05-04 18:43:16

by Alexandre Chartre

[permalink] [raw]
Subject: [RFC v4][PATCH part-2 06/13] mm/dpt: Functions to populate a decorated page-table from a VA range

Provide functions to copy page-table entries from the kernel page-table
to a decorated page-table for a specified VA range. These functions are
based on the copy_pxx_range() functions defined in mm/memory.c. A first
difference is that a level parameter can be specified to indicate the
page-table level (PGD, P4D, PUD, PMD, PTE) at which the copy should be
done. Also, the functions don't rely on an mm or vma, and they don't
alter the source page-table even if an entry is bad. Finally, the VA
range start and size don't need to be page-aligned.

Signed-off-by: Alexandre Chartre <[email protected]>
---
arch/x86/include/asm/dpt.h | 3 +
arch/x86/mm/dpt.c | 205 +++++++++++++++++++++++++++++++++++++
2 files changed, 208 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index b9cba051ebf2..85d2c5051acb 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -40,5 +40,8 @@ struct dpt {

extern struct dpt *dpt_create(unsigned int pgt_alignment);
extern void dpt_destroy(struct dpt *dpt);
+extern int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
+ enum page_table_level level);
+extern int dpt_map(struct dpt *dpt, void *ptr, unsigned long size);

#endif
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index 7a1b4cd53b03..0e725344b921 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -384,6 +384,211 @@ static int dpt_set_pgd(struct dpt *dpt, pgd_t *pgd, pgd_t pgd_value)
return 0;
}

+static int dpt_copy_pte_range(struct dpt *dpt, pmd_t *dst_pmd, pmd_t *src_pmd,
+ unsigned long addr, unsigned long end)
+{
+ pte_t *src_pte, *dst_pte;
+
+ dst_pte = dpt_pte_alloc(dpt, dst_pmd, addr);
+ if (IS_ERR(dst_pte))
+ return PTR_ERR(dst_pte);
+
+ addr &= PAGE_MASK;
+ src_pte = pte_offset_map(src_pmd, addr);
+
+ do {
+ dpt_set_pte(dpt, dst_pte, *src_pte);
+
+ } while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr < end);
+
+ return 0;
+}
+
+static int dpt_copy_pmd_range(struct dpt *dpt, pud_t *dst_pud, pud_t *src_pud,
+ unsigned long addr, unsigned long end,
+ enum page_table_level level)
+{
+ pmd_t *src_pmd, *dst_pmd;
+ unsigned long next;
+ int err;
+
+ dst_pmd = dpt_pmd_alloc(dpt, dst_pud, addr);
+ if (IS_ERR(dst_pmd))
+ return PTR_ERR(dst_pmd);
+
+ src_pmd = pmd_offset(src_pud, addr);
+
+ do {
+ next = pmd_addr_end(addr, end);
+ if (level == PGT_LEVEL_PMD || pmd_none(*src_pmd) ||
+ pmd_trans_huge(*src_pmd) || pmd_devmap(*src_pmd)) {
+ err = dpt_set_pmd(dpt, dst_pmd, *src_pmd);
+ if (err)
+ return err;
+ continue;
+ }
+
+ if (!pmd_present(*src_pmd)) {
+ pr_warn("DPT %p: PMD not present for [%lx,%lx]\n",
+ dpt, addr, next - 1);
+ pmd_clear(dst_pmd);
+ continue;
+ }
+
+ err = dpt_copy_pte_range(dpt, dst_pmd, src_pmd, addr, next);
+ if (err) {
+ pr_err("DPT %p: PMD error copying PTE addr=%lx next=%lx\n",
+ dpt, addr, next);
+ return err;
+ }
+
+ } while (dst_pmd++, src_pmd++, addr = next, addr < end);
+
+ return 0;
+}
+
+static int dpt_copy_pud_range(struct dpt *dpt, p4d_t *dst_p4d, p4d_t *src_p4d,
+ unsigned long addr, unsigned long end,
+ enum page_table_level level)
+{
+ pud_t *src_pud, *dst_pud;
+ unsigned long next;
+ int err;
+
+ dst_pud = dpt_pud_alloc(dpt, dst_p4d, addr);
+ if (IS_ERR(dst_pud))
+ return PTR_ERR(dst_pud);
+
+ src_pud = pud_offset(src_p4d, addr);
+
+ do {
+ next = pud_addr_end(addr, end);
+ if (level == PGT_LEVEL_PUD || pud_none(*src_pud) ||
+ pud_trans_huge(*src_pud) || pud_devmap(*src_pud)) {
+ err = dpt_set_pud(dpt, dst_pud, *src_pud);
+ if (err)
+ return err;
+ continue;
+ }
+
+ err = dpt_copy_pmd_range(dpt, dst_pud, src_pud, addr, next,
+ level);
+ if (err) {
+ pr_err("DPT %p: PUD error copying PMD addr=%lx next=%lx\n",
+ dpt, addr, next);
+ return err;
+ }
+
+ } while (dst_pud++, src_pud++, addr = next, addr < end);
+
+ return 0;
+}
+
+static int dpt_copy_p4d_range(struct dpt *dpt, pgd_t *dst_pgd, pgd_t *src_pgd,
+ unsigned long addr, unsigned long end,
+ enum page_table_level level)
+{
+ p4d_t *src_p4d, *dst_p4d;
+ unsigned long next;
+ int err;
+
+ dst_p4d = dpt_p4d_alloc(dpt, dst_pgd, addr);
+ if (IS_ERR(dst_p4d))
+ return PTR_ERR(dst_p4d);
+
+ src_p4d = p4d_offset(src_pgd, addr);
+
+ do {
+ next = p4d_addr_end(addr, end);
+ if (level == PGT_LEVEL_P4D || p4d_none(*src_p4d)) {
+ err = dpt_set_p4d(dpt, dst_p4d, *src_p4d);
+ if (err)
+ return err;
+ continue;
+ }
+
+ err = dpt_copy_pud_range(dpt, dst_p4d, src_p4d, addr, next,
+ level);
+ if (err) {
+ pr_err("DPT %p: P4D error copying PUD addr=%lx next=%lx\n",
+ dpt, addr, next);
+ return err;
+ }
+
+ } while (dst_p4d++, src_p4d++, addr = next, addr < end);
+
+ return 0;
+}
+
+static int dpt_copy_pgd_range(struct dpt *dpt,
+ pgd_t *dst_pagetable, pgd_t *src_pagetable,
+ unsigned long addr, unsigned long end,
+ enum page_table_level level)
+{
+ pgd_t *src_pgd, *dst_pgd;
+ unsigned long next;
+ int err;
+
+ dst_pgd = pgd_offset_pgd(dst_pagetable, addr);
+ src_pgd = pgd_offset_pgd(src_pagetable, addr);
+
+ do {
+ next = pgd_addr_end(addr, end);
+ if (level == PGT_LEVEL_PGD || pgd_none(*src_pgd)) {
+ err = dpt_set_pgd(dpt, dst_pgd, *src_pgd);
+ if (err)
+ return err;
+ continue;
+ }
+
+ err = dpt_copy_p4d_range(dpt, dst_pgd, src_pgd, addr, next,
+ level);
+ if (err) {
+ pr_err("DPT %p: PGD error copying P4D addr=%lx next=%lx\n",
+ dpt, addr, next);
+ return err;
+ }
+
+ } while (dst_pgd++, src_pgd++, addr = next, addr < end);
+
+ return 0;
+}
+
+/*
+ * Copy page table entries from the current page table (i.e. from the
+ * kernel page table) to the specified decorated page-table. The level
+ * parameter specifies the page-table level (PGD, P4D, PUD, PMD, PTE)
+ * at which the copy should be done.
+ */
+int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
+ enum page_table_level level)
+{
+ unsigned long addr = (unsigned long)ptr;
+ unsigned long end = addr + ((unsigned long)size);
+ unsigned long flags;
+ int err;
+
+ pr_debug("DPT %p: MAP %px/%lx/%d\n", dpt, ptr, size, level);
+
+ spin_lock_irqsave(&dpt->lock, flags);
+ err = dpt_copy_pgd_range(dpt, dpt->pagetable, current->mm->pgd,
+ addr, end, level);
+ spin_unlock_irqrestore(&dpt->lock, flags);
+
+ return err;
+}
+EXPORT_SYMBOL(dpt_map_range);
+
+/*
+ * Copy page-table PTE entries from the current page-table to the
+ * specified decorated page-table.
+ */
+int dpt_map(struct dpt *dpt, void *ptr, unsigned long size)
+{
+ return dpt_map_range(dpt, ptr, size, PGT_LEVEL_PTE);
+}
+EXPORT_SYMBOL(dpt_map);
+
/*
* dpt_create - allocate a page-table and create a corresponding
* decorated page-table. The page-table is allocated and aligned
--
2.18.2

2020-05-04 18:43:27

by Alexandre Chartre

[permalink] [raw]
Subject: [RFC v4][PATCH part-2 09/13] mm/dpt: Functions to clear decorated page-table entries for a VA range

Provide functions to clear page-table entries in a decorated page-table
for a specified VA range. The functions also check that the clearing
effectively happens in the decorated page-table and that there is no
crossing of the decorated page-table boundary (through references to
another page-table), so that another page-table is not modified by
mistake.

As information (address, size, page-table level) about the VA ranges
mapped in the decorated page-table is tracked, clearing is done by just
specifying the start address of the range.

Signed-off-by: Alexandre Chartre <[email protected]>
---
arch/x86/include/asm/dpt.h | 1 +
arch/x86/mm/dpt.c | 135 +++++++++++++++++++++++++++++++++++++
2 files changed, 136 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index 0d74afb10141..01727ef0577e 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -56,6 +56,7 @@ extern void dpt_destroy(struct dpt *dpt);
extern int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
enum page_table_level level);
extern int dpt_map(struct dpt *dpt, void *ptr, unsigned long size);
+extern void dpt_unmap(struct dpt *dpt, void *ptr);

static inline int dpt_map_module(struct dpt *dpt, char *module_name)
{
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index 12eb0d794d84..c495c9b59b3e 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -636,6 +636,141 @@ int dpt_map(struct dpt *dpt, void *ptr, unsigned long size)
}
EXPORT_SYMBOL(dpt_map);

+static void dpt_clear_pte_range(struct dpt *dpt, pmd_t *pmd,
+ unsigned long addr, unsigned long end)
+{
+ pte_t *pte;
+
+ pte = dpt_pte_offset(dpt, pmd, addr);
+ if (IS_ERR(pte))
+ return;
+
+ do {
+ pte_clear(NULL, addr, pte);
+ } while (pte++, addr += PAGE_SIZE, addr < end);
+}
+
+static void dpt_clear_pmd_range(struct dpt *dpt, pud_t *pud,
+ unsigned long addr, unsigned long end,
+ enum page_table_level level)
+{
+ unsigned long next;
+ pmd_t *pmd;
+
+ pmd = dpt_pmd_offset(dpt, pud, addr);
+ if (IS_ERR(pmd))
+ return;
+
+ do {
+ next = pmd_addr_end(addr, end);
+ if (pmd_none(*pmd))
+ continue;
+ if (level == PGT_LEVEL_PMD || pmd_trans_huge(*pmd) ||
+ pmd_devmap(*pmd) || !pmd_present(*pmd)) {
+ pmd_clear(pmd);
+ continue;
+ }
+ dpt_clear_pte_range(dpt, pmd, addr, next);
+ } while (pmd++, addr = next, addr < end);
+}
+
+static void dpt_clear_pud_range(struct dpt *dpt, p4d_t *p4d,
+ unsigned long addr, unsigned long end,
+ enum page_table_level level)
+{
+ unsigned long next;
+ pud_t *pud;
+
+ pud = dpt_pud_offset(dpt, p4d, addr);
+ if (IS_ERR(pud))
+ return;
+
+ do {
+ next = pud_addr_end(addr, end);
+ if (pud_none(*pud))
+ continue;
+ if (level == PGT_LEVEL_PUD || pud_trans_huge(*pud) ||
+ pud_devmap(*pud)) {
+ pud_clear(pud);
+ continue;
+ }
+ dpt_clear_pmd_range(dpt, pud, addr, next, level);
+ } while (pud++, addr = next, addr < end);
+}
+
+static void dpt_clear_p4d_range(struct dpt *dpt, pgd_t *pgd,
+ unsigned long addr, unsigned long end,
+ enum page_table_level level)
+{
+ unsigned long next;
+ p4d_t *p4d;
+
+ p4d = dpt_p4d_offset(dpt, pgd, addr);
+ if (IS_ERR(p4d))
+ return;
+
+ do {
+ next = p4d_addr_end(addr, end);
+ if (p4d_none(*p4d))
+ continue;
+ if (level == PGT_LEVEL_P4D) {
+ p4d_clear(p4d);
+ continue;
+ }
+ dpt_clear_pud_range(dpt, p4d, addr, next, level);
+ } while (p4d++, addr = next, addr < end);
+}
+
+static void dpt_clear_pgd_range(struct dpt *dpt, pgd_t *pagetable,
+ unsigned long addr, unsigned long end,
+ enum page_table_level level)
+{
+ unsigned long next;
+ pgd_t *pgd;
+
+ pgd = pgd_offset_pgd(pagetable, addr);
+ do {
+ next = pgd_addr_end(addr, end);
+ if (pgd_none(*pgd))
+ continue;
+ if (level == PGT_LEVEL_PGD) {
+ pgd_clear(pgd);
+ continue;
+ }
+ dpt_clear_p4d_range(dpt, pgd, addr, next, level);
+ } while (pgd++, addr = next, addr < end);
+}
+
+/*
+ * Clear page table entries in the specified decorated page-table.
+ */
+void dpt_unmap(struct dpt *dpt, void *ptr)
+{
+ struct dpt_range_mapping *range_mapping;
+ unsigned long addr, end;
+ unsigned long flags;
+
+ spin_lock_irqsave(&dpt->lock, flags);
+
+ range_mapping = dpt_get_range_mapping(dpt, ptr);
+ if (!range_mapping) {
+ pr_debug("DPT %p: UNMAP %px - not mapped\n", dpt, ptr);
+ goto done;
+ }
+
+ addr = (unsigned long)range_mapping->ptr;
+ end = addr + range_mapping->size;
+ pr_debug("DPT %p: UNMAP %px/%lx/%d\n", dpt, ptr,
+ range_mapping->size, range_mapping->level);
+ dpt_clear_pgd_range(dpt, dpt->pagetable, addr, end,
+ range_mapping->level);
+ list_del(&range_mapping->list);
+ kfree(range_mapping);
+done:
+ spin_unlock_irqrestore(&dpt->lock, flags);
+}
+EXPORT_SYMBOL(dpt_unmap);
+
/*
* dpt_create - allocate a page-table and create a corresponding
* decorated page-table. The page-table is allocated and aligned
--
2.18.2

2020-05-04 18:44:03

by Alexandre Chartre

[permalink] [raw]
Subject: [RFC v4][PATCH part-2 07/13] mm/dpt: Helper functions to map module into a decorated page-table

Add helper functions to easily map a module into a decorated page-table.

Signed-off-by: Alexandre Chartre <[email protected]>
---
arch/x86/include/asm/dpt.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index 85d2c5051acb..5a38d97a70a8 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -2,6 +2,7 @@
#ifndef ARCH_X86_MM_DPT_H
#define ARCH_X86_MM_DPT_H

+#include <linux/module.h>
#include <linux/spinlock.h>
#include <linux/xarray.h>

@@ -44,4 +45,24 @@ extern int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
enum page_table_level level);
extern int dpt_map(struct dpt *dpt, void *ptr, unsigned long size);

+static inline int dpt_map_module(struct dpt *dpt, char *module_name)
+{
+ struct module *module;
+
+ module = find_module(module_name);
+ if (!module)
+ return -ESRCH;
+
+ return dpt_map(dpt, module->core_layout.base, module->core_layout.size);
+}
+
+/*
+ * Copy the memory mapping for the current module. This is defined as a
+ * macro to ensure it is expanded in the module making the call so that
+ * THIS_MODULE has the correct value.
+ */
+#define DPT_MAP_THIS_MODULE(dpt) \
+ (dpt_map(dpt, THIS_MODULE->core_layout.base, \
+ THIS_MODULE->core_layout.size))
+
#endif
--
2.18.2

2020-05-04 18:44:48

by Alexandre Chartre

[permalink] [raw]
Subject: [RFC v4][PATCH part-2 11/13] mm/dpt: Add decorated page-table remap function

Add a function to remap an already mapped buffer at a new address in
a decorated page-table: the old buffer is unmapped, and a new mapping
is added for the specified new address.

This is useful to track and remap a buffer which can be freed and
then reallocated.

Signed-off-by: Alexandre Chartre <[email protected]>
---
arch/x86/include/asm/dpt.h | 1 +
arch/x86/mm/dpt.c | 25 +++++++++++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/dpt.h b/arch/x86/include/asm/dpt.h
index fd8c1b84ffe2..3234ba968d80 100644
--- a/arch/x86/include/asm/dpt.h
+++ b/arch/x86/include/asm/dpt.h
@@ -57,6 +57,7 @@ extern int dpt_map_range(struct dpt *dpt, void *ptr, size_t size,
enum page_table_level level);
extern int dpt_map(struct dpt *dpt, void *ptr, unsigned long size);
extern void dpt_unmap(struct dpt *dpt, void *ptr);
+extern int dpt_remap(struct dpt *dpt, void **mapping, void *ptr, size_t size);

static inline int dpt_map_module(struct dpt *dpt, char *module_name)
{
diff --git a/arch/x86/mm/dpt.c b/arch/x86/mm/dpt.c
index adc59f9ed876..9517e3081716 100644
--- a/arch/x86/mm/dpt.c
+++ b/arch/x86/mm/dpt.c
@@ -809,6 +809,31 @@ int dpt_map_percpu(struct dpt *dpt, void *percpu_ptr, size_t size)
}
EXPORT_SYMBOL(dpt_map_percpu);

+int dpt_remap(struct dpt *dpt, void **current_ptrp, void *new_ptr, size_t size)
+{
+ void *current_ptr = *current_ptrp;
+ int err;
+
+ if (current_ptr == new_ptr) {
+ /* no change, already mapped */
+ return 0;
+ }
+
+ if (current_ptr) {
+ dpt_unmap(dpt, current_ptr);
+ *current_ptrp = NULL;
+ }
+
+ err = dpt_map(dpt, new_ptr, size);
+ if (err)
+ return err;
+
+ *current_ptrp = new_ptr;
+
+ return 0;
+}
+EXPORT_SYMBOL(dpt_remap);
+
/*
* dpt_create - allocate a page-table and create a corresponding
* decorated page-table. The page-table is allocated and aligned
--
2.18.2

2020-05-14 09:32:23

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table)

Hello Alexandre,

On Mon, May 04, 2020 at 04:57:57PM +0200, Alexandre Chartre wrote:
> This is part II of ASI RFC v4. Please refer to the cover letter of
> part I for an overview of the ASI RFC.
>
> https://lore.kernel.org/lkml/[email protected]/
>
> This part introduces decorated page-tables, which encapsulate a native
> page-table (e.g. a PGD) in order to provide convenient page-table
> management functions, such as tracking the address ranges mapped in a
> page-table or safely handling references to another page-table.
>
> Decorated page-tables can then be used to easily create and manage
> page-tables for use with ASI. They will be used by the ASI test driver
> (see part III) and later by KVM ASI.
>
> Decorated page-tables are independent of ASI, and can potentially be
> used anywhere a page-table is needed.

This is very impressive work!

I wonder why you decided to make dpt x86-specific? Unless I've missed
something, the dpt implementation does not rely on anything
architecture-specific and can go straight to linux/mm.

Another thing that comes to mind is that we already have a very
decorated page table, which is mm_struct. I admit that my attempt to
split out the core page-table bits from mm_struct [1] didn't go far,
but I still think we need a first-class abstraction for the page table
that will be used by both user memory management and the management of
the reduced kernel address spaces.


[1] https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=pg_table/v0.0


--
Sincerely yours,
Mike.

2020-05-14 11:46:13

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [RFC v4][PATCH part-2 00/13] ASI - Part II (Decorated Page-Table)


On 5/14/20 11:29 AM, Mike Rapoport wrote:
> Hello Alexandre,
>
> On Mon, May 04, 2020 at 04:57:57PM +0200, Alexandre Chartre wrote:
>> This is part II of ASI RFC v4. Please refer to the cover letter of
>> part I for an overview of the ASI RFC.
>>
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> This part introduces decorated page-tables, which encapsulate a native
>> page-table (e.g. a PGD) in order to provide convenient page-table
>> management functions, such as tracking the address ranges mapped in a
>> page-table or safely handling references to another page-table.
>>
>> Decorated page-tables can then be used to easily create and manage
>> page-tables for use with ASI. They will be used by the ASI test driver
>> (see part III) and later by KVM ASI.
>>
>> Decorated page-tables are independent of ASI, and can potentially be
>> used anywhere a page-table is needed.

Hi Mike,

> This is very impressive work!
>
> I wonder why you decided to make dpt x86-specific? Unless I've missed
> something, the dpt implementation does not rely on anything
> architecture-specific and can go straight to linux/mm.

Correct, this is not x86-specific. I put it in arch/x86 because that's
currently the only place where I use it, but it can be moved to linux/mm.

> Another thing that comes to mind is that we already have a very
> decorated page table, which is mm_struct.

mm_struct doesn't define a generic page-table encapsulation. mm_struct
references a page-table (i.e. a PGD) and adds all kinds of attributes
needed for mm management but not necessarily related to the page-table.

> I admit that my attempt to
> split out the core page-table bits from mm_struct [1] didn't go
> far, but I still think we need a first-class abstraction for the page
> table that will be used by both user memory management and the
> management of the reduced kernel address spaces.

Agree. I remember your attempt to extract the page-table from mm_struct;
this is not simple work! For ASI, I didn't need mm, so it was simpler to
build a simple decorated page-table without attempting to use it with mm
(at least for now).

Thanks,

alex.

PS: if you want to play with dpt, there's a bug in dpt_destroy(): patch
08 adds a double free of the dpt->backend_pages pages.

>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=pg_table/v0.0