Hi,
This series adds a debugging tool that dumps the stage-2 page-tables
through debugfs.
Following the feedback on v1, I re-worked the series and added support
for dumping guest page-tables under VHE and nVHE configurations. I also
extended the list of reviewers, as I missed some interested parties in
the first round.
When CONFIG_NVHE_EL2_PTDUMP_DEBUGFS is enabled in a pKVM environment,
ptdump registers the 'host_stage2_kernel_page_tables' entry with debugfs.
Each guest registers a file named '%u_guest_stage2_page_tables' when it
is created.
The host stage-2 page-tables can then be dumped with the following command:
cat /sys/kernel/debug/host_stage2_kernel_page_tables
The output shows the entries in the following format:
<IPA range> <size> <descriptor type> <access permissions> <mem_attributes>
The tool interprets the pKVM ownership annotation stored in invalid
entries and includes the ownership information in the dump. To access
the host stage-2 page-tables from the kernel, a new hypervisor call is
introduced which snapshots the page-tables into a host-provided buffer.
The hypervisor call is hidden behind CONFIG_NVHE_EL2_DEBUG, as it is
only meant to be used in a debugging environment.
Link to the first version:
https://lore.kernel.org/all/[email protected]/
Changelog:
v1 -> v2:
* use the stage-2 pagetable walker for dumping descriptors instead of
the one provided by ptdump.
* add support for dumping guest pagetables under VHE/nVHE non-protected
configurations.
Thanks,
Sebastian Ene (11):
KVM: arm64: Add snap shooting the host stage-2 pagetables
arm64: ptdump: Use the mask from the state structure
arm64: ptdump: Add the walker function to the ptdump info structure
KVM: arm64: Move pagetable definitions to common header
arm64: ptdump: Introduce stage-2 pagetables format description
arm64: ptdump: Add hooks on debugfs file operations
arm64: ptdump: Register a debugfs entry for the host stage-2
page-tables
arm64: ptdump: Parse the host stage-2 page-tables from the snapshot
arm64: ptdump: Interpret memory attributes based on runtime
configuration
arm64: ptdump: Interpret pKVM ownership annotations
arm64: ptdump: Add support for guest stage-2 pagetables dumping
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_pgtable.h | 85 +++
arch/arm64/include/asm/ptdump.h | 27 +-
arch/arm64/kvm/Kconfig | 12 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 8 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 18 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 103 ++++
arch/arm64/kvm/hyp/pgtable.c | 98 ++--
arch/arm64/kvm/mmu.c | 3 +
arch/arm64/mm/ptdump.c | 487 +++++++++++++++++-
arch/arm64/mm/ptdump_debugfs.c | 42 +-
11 files changed, 822 insertions(+), 62 deletions(-)
--
2.42.0.655.g421f12c284-goog
Stage-2 needs a dedicated walk function to be able to parse concatenated
pagetables. The ptdump info structure holds the configuration options
for the walker. This structure is registered with the debugfs entry and
is passed as the private argument of the debugfs file.
Hence, in preparation for parsing the stage-2 pagetables, add the walk
function as a member of the structure passed to the debugfs file.
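For illustration only (not part of the patch), the dispatch described
above can be sketched in plain userspace C. All demo_* names are
hypothetical stand-ins for the kernel types; only the idea of a
per-entry walker pointer with a NULL check is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct seq_file: just counts emitted lines. */
struct demo_seq {
	int lines;
};

/* Hypothetical stand-in for struct ptdump_info. */
struct demo_ptdump_info {
	unsigned long base_addr;
	/* Per-entry walker, mirroring the ptdump_walk member added here. */
	void (*walk)(struct demo_seq *s, struct demo_ptdump_info *info);
};

/* Pretend stage-1 walker: emits one line. */
static void demo_stage1_walk(struct demo_seq *s, struct demo_ptdump_info *info)
{
	assert(s && info);
	s->lines += 1;
}

/* Pretend stage-2 walker: emits two lines. */
static void demo_stage2_walk(struct demo_seq *s, struct demo_ptdump_info *info)
{
	assert(s && info);
	s->lines += 2;
}

/* Mirrors the show handler: dispatch through the pointer, skip if unset. */
static int demo_show(struct demo_seq *s, struct demo_ptdump_info *info)
{
	if (info->walk)
		info->walk(s, info);
	return 0;
}
```

Each registered entry carries its own walker, so the show handler does
not need to know which kind of page-table it is dumping.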
Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/include/asm/ptdump.h | 1 +
arch/arm64/mm/ptdump.c | 1 +
arch/arm64/mm/ptdump_debugfs.c | 3 ++-
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
index 581caac525b0..1f6e0aabf16a 100644
--- a/arch/arm64/include/asm/ptdump.h
+++ b/arch/arm64/include/asm/ptdump.h
@@ -19,6 +19,7 @@ struct ptdump_info {
struct mm_struct *mm;
const struct addr_marker *markers;
unsigned long base_addr;
+ void (*ptdump_walk)(struct seq_file *s, struct ptdump_info *info);
};
void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 8761a70f916f..d531e24ea0b2 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -346,6 +346,7 @@ static struct ptdump_info kernel_ptdump_info = {
.mm = &init_mm,
.markers = address_markers,
.base_addr = PAGE_OFFSET,
+ .ptdump_walk = &ptdump_walk,
};
void ptdump_check_wx(void)
diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 68bf1a125502..7564519db1e6 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -10,7 +10,8 @@ static int ptdump_show(struct seq_file *m, void *v)
struct ptdump_info *info = m->private;
get_online_mems();
- ptdump_walk(m, info);
+ if (info->ptdump_walk)
+ info->ptdump_walk(m, info);
put_online_mems();
return 0;
}
--
2.42.0.655.g421f12c284-goog
Add an array which holds human-readable information about the format of
a stage-2 descriptor. The array is then used by the descriptor parser
to extract information about the memory attributes.
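As a rough userspace sketch of how such a table drives the decoding
(not the kernel code itself): each entry names a mask, the value to
compare against, and the strings printed when the comparison matches or
not. The demo_* names are hypothetical; the bit positions follow the
stage-2 definitions used in this series (valid in bit 0, S2AP_R in
bit 6, S2AP_W in bit 7, AF in bit 10, XN in bit 54).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_BIT(n)	(UINT64_C(1) << (n))

/* Hypothetical stand-in for struct prot_bits. */
struct demo_prot_bits {
	uint64_t mask;
	uint64_t val;
	const char *set;	/* printed when (desc & mask) == val */
	const char *clear;	/* printed otherwise */
};

static const struct demo_prot_bits demo_s2_bits[] = {
	{ DEMO_BIT(0),  DEMO_BIT(0),  " ",  "F"  },	/* valid */
	{ DEMO_BIT(54), DEMO_BIT(54), "XN", "  " },
	{ DEMO_BIT(6),  DEMO_BIT(6),  "R",  " "  },
	{ DEMO_BIT(7),  DEMO_BIT(7),  "W",  " "  },
	{ DEMO_BIT(10), DEMO_BIT(10), "AF", "  " },
};

/*
 * Format the attribute column of one descriptor into buf.
 * Returns the number of bytes written for complete entries.
 */
static size_t demo_dump_prot(uint64_t desc, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	assert(buf && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(demo_s2_bits) / sizeof(demo_s2_bits[0]); i++) {
		const struct demo_prot_bits *b = &demo_s2_bits[i];
		const char *s = ((desc & b->mask) == b->val) ? b->set : b->clear;
		int n = snprintf(buf + used, len - used, " %s", s);

		/* Stop on error or when the entry did not fit. */
		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
	return used;
}
```

Adding a new attribute to the dump is then a matter of appending one
table entry rather than touching the formatting loop.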
Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/mm/ptdump.c | 87 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index d531e24ea0b2..58a4ea975497 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -24,6 +24,7 @@
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
#include <asm/ptdump.h>
+#include <asm/kvm_pgtable.h>
enum address_markers_idx {
@@ -171,6 +172,66 @@ static const struct prot_bits pte_bits[] = {
}
};
+static const struct prot_bits stage2_pte_bits[] = {
+ {
+ .mask = PTE_VALID,
+ .val = PTE_VALID,
+ .set = " ",
+ .clear = "F",
+ }, {
+ .mask = KVM_PTE_LEAF_ATTR_HI_S2_XN,
+ .val = KVM_PTE_LEAF_ATTR_HI_S2_XN,
+ .set = "XN",
+ .clear = " ",
+ }, {
+ .mask = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R,
+ .val = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R,
+ .set = "R",
+ .clear = " ",
+ }, {
+ .mask = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
+ .val = KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
+ .set = "W",
+ .clear = " ",
+ }, {
+ .mask = KVM_PTE_LEAF_ATTR_LO_S2_AF,
+ .val = KVM_PTE_LEAF_ATTR_LO_S2_AF,
+ .set = "AF",
+ .clear = " ",
+ }, {
+ .mask = PTE_NG,
+ .val = PTE_NG,
+ .set = "FnXS",
+ .clear = " ",
+ }, {
+ .mask = PTE_CONT,
+ .val = PTE_CONT,
+ .set = "CON",
+ .clear = " ",
+ }, {
+ .mask = PTE_TABLE_BIT,
+ .val = PTE_TABLE_BIT,
+ .set = " ",
+ .clear = "BLK",
+ }, {
+ .mask = KVM_PGTABLE_PROT_SW0,
+ .val = KVM_PGTABLE_PROT_SW0,
+ .set = "SW0", /* PKVM_PAGE_SHARED_OWNED */
+ }, {
+ .mask = KVM_PGTABLE_PROT_SW1,
+ .val = KVM_PGTABLE_PROT_SW1,
+ .set = "SW1", /* PKVM_PAGE_SHARED_BORROWED */
+ }, {
+ .mask = KVM_PGTABLE_PROT_SW2,
+ .val = KVM_PGTABLE_PROT_SW2,
+ .set = "SW2",
+ }, {
+ .mask = KVM_PGTABLE_PROT_SW3,
+ .val = KVM_PGTABLE_PROT_SW3,
+ .set = "SW3",
+ },
+};
+
struct pg_level {
const struct prot_bits *bits;
const char *name;
@@ -202,6 +263,26 @@ static struct pg_level pg_level[] = {
},
};
+static struct pg_level stage2_pg_level[] = {
+ { /* pgd */
+ .name = "PGD",
+ .bits = stage2_pte_bits,
+ .num = ARRAY_SIZE(stage2_pte_bits),
+ }, { /* pud */
+ .name = (CONFIG_PGTABLE_LEVELS > 3) ? "PUD" : "PGD",
+ .bits = stage2_pte_bits,
+ .num = ARRAY_SIZE(stage2_pte_bits),
+ }, { /* pmd */
+ .name = (CONFIG_PGTABLE_LEVELS > 2) ? "PMD" : "PGD",
+ .bits = stage2_pte_bits,
+ .num = ARRAY_SIZE(stage2_pte_bits),
+ }, { /* pte */
+ .name = "PTE",
+ .bits = stage2_pte_bits,
+ .num = ARRAY_SIZE(stage2_pte_bits),
+ },
+};
+
static void dump_prot(struct pg_state *st, const struct prot_bits *bits,
size_t num)
{
@@ -340,6 +421,12 @@ static void __init ptdump_initialize(void)
if (pg_level[i].bits)
for (j = 0; j < pg_level[i].num; j++)
pg_level[i].mask |= pg_level[i].bits[j].mask;
+
+ for (i = 0; i < ARRAY_SIZE(stage2_pg_level); i++)
+ if (stage2_pg_level[i].bits)
+ for (j = 0; j < stage2_pg_level[i].num; j++)
+ stage2_pg_level[i].mask |=
+ stage2_pg_level[i].bits[j].mask;
}
static struct ptdump_info kernel_ptdump_info = {
--
2.42.0.655.g421f12c284-goog
Introduce callbacks invoked when the debugfs entry is accessed from
userspace. These hooks allow us to allocate and prepare the memory
resources used by ptdump when the debugfs file is opened, and to release
them when it is closed.
Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/include/asm/ptdump.h | 3 +++
arch/arm64/mm/ptdump.c | 1 +
arch/arm64/mm/ptdump_debugfs.c | 34 ++++++++++++++++++++++++++++++++-
3 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
index 1f6e0aabf16a..88dcab1dab97 100644
--- a/arch/arm64/include/asm/ptdump.h
+++ b/arch/arm64/include/asm/ptdump.h
@@ -19,7 +19,10 @@ struct ptdump_info {
struct mm_struct *mm;
const struct addr_marker *markers;
unsigned long base_addr;
+ void (*ptdump_prepare_walk)(struct ptdump_info *info);
void (*ptdump_walk)(struct seq_file *s, struct ptdump_info *info);
+ void (*ptdump_end_walk)(struct ptdump_info *info);
+ struct mutex file_lock;
};
void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 58a4ea975497..fe239b9af50c 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -24,6 +24,7 @@
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
#include <asm/ptdump.h>
+#include <asm/kvm_pkvm.h>
#include <asm/kvm_pgtable.h>
diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 7564519db1e6..14619452dd8d 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -15,7 +15,39 @@ static int ptdump_show(struct seq_file *m, void *v)
put_online_mems();
return 0;
}
-DEFINE_SHOW_ATTRIBUTE(ptdump);
+
+static int ptdump_open(struct inode *inode, struct file *file)
+{
+ int ret;
+ struct ptdump_info *info = inode->i_private;
+
+ ret = single_open(file, ptdump_show, inode->i_private);
+ if (!ret && info->ptdump_prepare_walk) {
+ mutex_lock(&info->file_lock);
+ info->ptdump_prepare_walk(info);
+ }
+ return ret;
+}
+
+static int ptdump_release(struct inode *inode, struct file *file)
+{
+ struct ptdump_info *info = inode->i_private;
+
+ if (info->ptdump_end_walk) {
+ info->ptdump_end_walk(info);
+ mutex_unlock(&info->file_lock);
+ }
+
+ return single_release(inode, file);
+}
+
+static const struct file_operations ptdump_fops = {
+ .owner = THIS_MODULE,
+ .open = ptdump_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = ptdump_release,
+};
void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name)
{
--
2.42.0.655.g421f12c284-goog
Add a walker function which configures ptdump to parse the page-tables
from the snapshot. Convert the physical address of the pagetable's start
address to a host virtual address and use the ptdump walker to parse the
page-table descriptors.
Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/mm/ptdump.c | 63 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 7c78b8994ca1..3ba4848272df 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -479,6 +479,11 @@ static void *ptdump_host_va(phys_addr_t phys)
return __va(phys);
}
+static struct kvm_pgtable_mm_ops host_mmops = {
+ .phys_to_virt = ptdump_host_va,
+ .virt_to_phys = ptdump_host_pa,
+};
+
static size_t stage2_get_pgd_len(void)
{
u64 mmfr0, mmfr1, vtcr;
@@ -604,6 +609,63 @@ static void stage2_ptdump_end_walk(struct ptdump_info *info)
free_pages_exact(snapshot, PAGE_SIZE);
info->priv = NULL;
}
+
+static int stage2_ptdump_visitor(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
+{
+ struct pg_state *st = ctx->arg;
+ struct ptdump_state *pt_st = &st->ptdump;
+
+ if (st->pg_level[ctx->level].mask & ctx->old)
+ pt_st->note_page(pt_st, ctx->addr, ctx->level, ctx->old);
+
+ return 0;
+}
+
+static void stage2_ptdump_walk(struct seq_file *s, struct ptdump_info *info)
+{
+ struct kvm_pgtable_snapshot *snapshot = info->priv;
+ struct pg_state st;
+ struct kvm_pgtable *pgtable;
+ u64 start_ipa = 0, end_ipa;
+ struct addr_marker ipa_address_markers[3];
+ struct kvm_pgtable_walker walker = (struct kvm_pgtable_walker) {
+ .cb = stage2_ptdump_visitor,
+ .arg = &st,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ if (snapshot == NULL || !snapshot->pgtable.pgd)
+ return;
+
+ pgtable = &snapshot->pgtable;
+ pgtable->mm_ops = &host_mmops;
+ end_ipa = BIT(pgtable->ia_bits) - 1;
+
+ memset(&ipa_address_markers[0], 0, sizeof(ipa_address_markers));
+
+ ipa_address_markers[0].start_address = start_ipa;
+ ipa_address_markers[0].name = "IPA start";
+
+ ipa_address_markers[1].start_address = end_ipa;
+ ipa_address_markers[1].name = "IPA end";
+
+ st = (struct pg_state) {
+ .seq = s,
+ .marker = &ipa_address_markers[0],
+ .level = pgtable->start_level - 1,
+ .pg_level = &stage2_pg_level[0],
+ .ptdump = {
+ .note_page = note_page,
+ .range = (struct ptdump_range[]) {
+ {start_ipa, end_ipa},
+ {0, 0},
+ },
+ },
+ };
+
+ kvm_pgtable_walk(pgtable, start_ipa, end_ipa, &walker);
+}
#endif /* CONFIG_NVHE_EL2_PTDUMP_DEBUGFS */
static void __init ptdump_register_host_stage2(void)
@@ -616,6 +678,7 @@ static void __init ptdump_register_host_stage2(void)
.mc_len = host_s2_pgtable_pages(),
.ptdump_prepare_walk = stage2_ptdump_prepare_walk,
.ptdump_end_walk = stage2_ptdump_end_walk,
+ .ptdump_walk = stage2_ptdump_walk,
};
mutex_init(&stage2_kernel_ptdump_info.file_lock);
--
2.42.0.655.g421f12c284-goog
Initialize the structures used to keep the state of the host stage-2
ptdump walker when pKVM is enabled. Create a new debugfs entry for the
host stage-2 pagetables and hook the callbacks invoked when the entry is
accessed. When the debugfs file is opened, allocate the memory resources
which will be shared with the hypervisor for saving the pagetable
snapshot. On close, unshare this memory from the hypervisor and release
it.
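The allocate, share and unwind-on-failure flow can be sketched in
userspace C as below. This is only an illustration under stated
assumptions: demo_share() and demo_unshare() are hypothetical stand-ins
for the share/unshare hypercalls, malloc() replaces the page allocator,
and DEMO_NPAGES is an arbitrary count.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_NPAGES	4
#define DEMO_PAGE_SIZE	4096

/* Bookkeeping for the fake hypercalls. */
static int demo_shared;		/* pages currently shared */
static int demo_share_calls;	/* share calls issued so far */
static int demo_fail_at = -1;	/* share call index that fails, -1 for none */

/* Hypothetical stand-in for the share hypercall. */
static int demo_share(void *page)
{
	(void)page;
	if (demo_share_calls++ == demo_fail_at)
		return -1;
	demo_shared++;
	return 0;
}

/* Hypothetical stand-in for the unshare hypercall. */
static void demo_unshare(void *page)
{
	(void)page;
	assert(demo_shared > 0);
	demo_shared--;
}

struct demo_snapshot {
	void *pages[DEMO_NPAGES];
	int nr;			/* pages allocated and shared so far */
};

/* Unshare and free every page recorded in the snapshot, then the snapshot. */
static void demo_end(struct demo_snapshot *snap)
{
	if (!snap)
		return;
	while (snap->nr-- > 0) {
		demo_unshare(snap->pages[snap->nr]);
		free(snap->pages[snap->nr]);
	}
	free(snap);
}

/* Allocate and share all pages; undo everything on the first failure. */
static struct demo_snapshot *demo_prepare(void)
{
	struct demo_snapshot *snap = calloc(1, sizeof(*snap));

	if (!snap)
		return NULL;

	for (snap->nr = 0; snap->nr < DEMO_NPAGES; snap->nr++) {
		void *page = malloc(DEMO_PAGE_SIZE);

		if (!page)
			goto unwind;
		if (demo_share(page)) {
			/* Never shared, so only free it. */
			free(page);
			goto unwind;
		}
		snap->pages[snap->nr] = page;
	}
	return snap;

unwind:
	demo_end(snap);
	return NULL;
}
```

The point of the sketch is the invariant: a page is unshared only if
its share call succeeded, and every page is unshared before it is freed.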
Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/include/asm/ptdump.h | 2 +
arch/arm64/kvm/Kconfig | 12 +++
arch/arm64/mm/ptdump.c | 161 ++++++++++++++++++++++++++++++++
3 files changed, 175 insertions(+)
diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
index 88dcab1dab97..35b883524462 100644
--- a/arch/arm64/include/asm/ptdump.h
+++ b/arch/arm64/include/asm/ptdump.h
@@ -23,6 +23,8 @@ struct ptdump_info {
void (*ptdump_walk)(struct seq_file *s, struct ptdump_info *info);
void (*ptdump_end_walk)(struct ptdump_info *info);
struct mutex file_lock;
+ size_t mc_len;
+ void *priv;
};
void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 83c1e09be42e..4b1847704bb3 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -71,4 +71,16 @@ config PROTECTED_NVHE_STACKTRACE
If unsure, or not using protected nVHE (pKVM), say N.
+config NVHE_EL2_PTDUMP_DEBUGFS
+ bool "Present the stage-2 pagetables to debugfs"
+ depends on NVHE_EL2_DEBUG && PTDUMP_DEBUGFS && KVM
+ help
+ Say Y here if you want to show the stage-2 kernel pagetables
+ layout in a debugfs file. This information is only useful for kernel developers
+ who are working in architecture specific areas of the kernel.
+ It is probably not a good idea to enable this feature in a production
+ kernel.
+
+ If in doubt, say N.
+
endif # VIRTUALIZATION
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index fe239b9af50c..7c78b8994ca1 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -466,6 +466,165 @@ void ptdump_check_wx(void)
pr_info("Checked W+X mappings: passed, no W+X pages found\n");
}
+#ifdef CONFIG_NVHE_EL2_PTDUMP_DEBUGFS
+static struct ptdump_info stage2_kernel_ptdump_info;
+
+static phys_addr_t ptdump_host_pa(void *addr)
+{
+ return __pa(addr);
+}
+
+static void *ptdump_host_va(phys_addr_t phys)
+{
+ return __va(phys);
+}
+
+static size_t stage2_get_pgd_len(void)
+{
+ u64 mmfr0, mmfr1, vtcr;
+ u32 phys_shift = get_kvm_ipa_limit();
+
+ mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+ mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
+ vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
+
+ return kvm_pgtable_stage2_pgd_size(vtcr);
+}
+
+static void stage2_ptdump_prepare_walk(struct ptdump_info *info)
+{
+ struct kvm_pgtable_snapshot *snapshot;
+ int ret, pgd_index, mc_index, pgd_pages_sz;
+ void *page_hva;
+ phys_addr_t pgd;
+
+ snapshot = alloc_pages_exact(PAGE_SIZE, GFP_KERNEL_ACCOUNT);
+ if (!snapshot)
+ return;
+
+ memset(snapshot, 0, PAGE_SIZE);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp, virt_to_pfn(snapshot));
+ if (ret)
+ goto free_snapshot;
+
+ snapshot->pgd_len = stage2_get_pgd_len();
+ pgd_pages_sz = snapshot->pgd_len / PAGE_SIZE;
+ snapshot->pgd_hva = alloc_pages_exact(snapshot->pgd_len,
+ GFP_KERNEL_ACCOUNT);
+ if (!snapshot->pgd_hva)
+ goto unshare_snapshot;
+
+ for (pgd_index = 0; pgd_index < pgd_pages_sz; pgd_index++) {
+ page_hva = snapshot->pgd_hva + pgd_index * PAGE_SIZE;
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
+ virt_to_pfn(page_hva));
+ if (ret)
+ goto unshare_pgd_pages;
+ }
+
+ for (mc_index = 0; mc_index < info->mc_len; mc_index++) {
+ page_hva = alloc_pages_exact(PAGE_SIZE, GFP_KERNEL_ACCOUNT);
+ if (!page_hva)
+ goto free_memcache_pages;
+
+ push_hyp_memcache(&snapshot->mc, page_hva, ptdump_host_pa);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
+ virt_to_pfn(page_hva));
+ if (ret) {
+ pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
+ free_pages_exact(page_hva, PAGE_SIZE);
+ goto free_memcache_pages;
+ }
+ }
+
+ ret = kvm_call_hyp_nvhe(__pkvm_copy_host_stage2, snapshot);
+ if (ret)
+ goto free_memcache_pages;
+
+ pgd = (phys_addr_t)snapshot->pgtable.pgd;
+ snapshot->pgtable.pgd = phys_to_virt(pgd);
+ info->priv = snapshot;
+ return;
+
+free_memcache_pages:
+ page_hva = pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
+ while (page_hva) {
+ ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
+ virt_to_pfn(page_hva));
+ WARN_ON(ret);
+ free_pages_exact(page_hva, PAGE_SIZE);
+ page_hva = pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
+ }
+unshare_pgd_pages:
+ pgd_index = pgd_index - 1;
+ for (; pgd_index >= 0; pgd_index--) {
+ page_hva = snapshot->pgd_hva + pgd_index * PAGE_SIZE;
+ ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
+ virt_to_pfn(page_hva));
+ WARN_ON(ret);
+ }
+ free_pages_exact(snapshot->pgd_hva, snapshot->pgd_len);
+unshare_snapshot:
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
+ virt_to_pfn(snapshot)));
+free_snapshot:
+ free_pages_exact(snapshot, PAGE_SIZE);
+ info->priv = NULL;
+}
+
+static void stage2_ptdump_end_walk(struct ptdump_info *info)
+{
+ struct kvm_pgtable_snapshot *snapshot = info->priv;
+ void *page_hva;
+ int pgd_index, ret, pgd_pages_sz;
+
+ if (!snapshot)
+ return;
+
+ page_hva = pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
+ while (page_hva) {
+ ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
+ virt_to_pfn(page_hva));
+ WARN_ON(ret);
+ free_pages_exact(page_hva, PAGE_SIZE);
+ page_hva = pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
+ }
+
+ pgd_pages_sz = snapshot->pgd_len / PAGE_SIZE;
+ for (pgd_index = 0; pgd_index < pgd_pages_sz; pgd_index++) {
+ page_hva = snapshot->pgd_hva + pgd_index * PAGE_SIZE;
+ ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
+ virt_to_pfn(page_hva));
+ WARN_ON(ret);
+ }
+
+ free_pages_exact(snapshot->pgd_hva, snapshot->pgd_len);
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
+ virt_to_pfn(snapshot)));
+ free_pages_exact(snapshot, PAGE_SIZE);
+ info->priv = NULL;
+}
+#endif /* CONFIG_NVHE_EL2_PTDUMP_DEBUGFS */
+
+static void __init ptdump_register_host_stage2(void)
+{
+#ifdef CONFIG_NVHE_EL2_PTDUMP_DEBUGFS
+ if (!is_protected_kvm_enabled())
+ return;
+
+ stage2_kernel_ptdump_info = (struct ptdump_info) {
+ .mc_len = host_s2_pgtable_pages(),
+ .ptdump_prepare_walk = stage2_ptdump_prepare_walk,
+ .ptdump_end_walk = stage2_ptdump_end_walk,
+ };
+
+ mutex_init(&stage2_kernel_ptdump_info.file_lock);
+
+ ptdump_debugfs_register(&stage2_kernel_ptdump_info,
+ "host_stage2_kernel_page_tables");
+#endif
+}
+
static int __init ptdump_init(void)
{
address_markers[PAGE_END_NR].start_address = PAGE_END;
@@ -474,6 +633,8 @@ static int __init ptdump_init(void)
#endif
ptdump_initialize();
ptdump_debugfs_register(&kernel_ptdump_info, "kernel_page_tables");
+ ptdump_register_host_stage2();
+
return 0;
}
device_initcall(ptdump_init);
--
2.42.0.655.g421f12c284-goog
When FWB is used the memory attributes stored in the descriptors have a
different bitfield layout. Introduce two callbacks that verify the current
runtime configuration before parsing the attribute fields.
Add support for parsing the memory attribute fields from the page table
descriptors.
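A minimal userspace sketch of the two predicates follows. The demo_*
names are hypothetical. The MemAttr field is taken as bits [5:2], as
defined in this series; the encodings (0x1 for device nGnRE in both
layouts, 0xf for normal memory without FWB, 0x6 for normal memory with
FWB) are quoted from the arm64 headers as I recall them and should be
treated as assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*demo_feature_cb)(const void *ctx);

/* Hypothetical runtime context; the patch derives this from the pgtable. */
struct demo_ctx {
	bool fwb;
};

struct demo_attr {
	uint64_t mask;
	uint64_t val;
	const char *set;
	demo_feature_cb feature_on;	/* entry skipped if this returns false */
	demo_feature_cb feature_off;	/* entry skipped if this returns true */
};

#define DEMO_MEMATTR_MASK	(UINT64_C(0xf) << 2)
#define DEMO_MEMATTR(t)		((uint64_t)(t) << 2)

static bool demo_is_fwb(const void *ctx)
{
	return ((const struct demo_ctx *)ctx)->fwb;
}

static const struct demo_attr demo_attrs[] = {
	{ DEMO_MEMATTR_MASK, DEMO_MEMATTR(0x1), "DEVICE/nGnRE",     NULL,        demo_is_fwb },
	{ DEMO_MEMATTR_MASK, DEMO_MEMATTR(0x1), "DEVICE/nGnRE FWB", demo_is_fwb, NULL        },
	{ DEMO_MEMATTR_MASK, DEMO_MEMATTR(0xf), "MEM/NORMAL",       NULL,        demo_is_fwb },
	{ DEMO_MEMATTR_MASK, DEMO_MEMATTR(0x6), "MEM/NORMAL FWB",   demo_is_fwb, NULL        },
};

/* Return the label of the first applicable entry that matches, or NULL. */
static const char *demo_memattr_name(uint64_t desc, const struct demo_ctx *ctx)
{
	size_t i;

	for (i = 0; i < sizeof(demo_attrs) / sizeof(demo_attrs[0]); i++) {
		const struct demo_attr *a = &demo_attrs[i];

		if (a->feature_on && !a->feature_on(ctx))
			continue;
		if (a->feature_off && a->feature_off(ctx))
			continue;
		if ((desc & a->mask) == a->val)
			return a->set;
	}
	return NULL;
}
```

The same field value can therefore be labelled differently, or not at
all, depending on whether FWB is in effect for the table being dumped.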
Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/mm/ptdump.c | 66 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 65 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 3ba4848272df..5f9a334b0f0c 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -85,13 +85,22 @@ struct pg_state {
bool check_wx;
unsigned long wx_pages;
unsigned long uxn_pages;
+ struct ptdump_info *info;
};
+/*
+ * This callback checks the runtime configuration before interpreting the
+ * attributes defined in the prot_bits.
+ */
+typedef bool (*is_feature_cb)(const void *ctx);
+
struct prot_bits {
u64 mask;
u64 val;
const char *set;
const char *clear;
+ is_feature_cb feature_on; /* bit ignored if the callback returns false */
+ is_feature_cb feature_off; /* bit ignored if the callback returns true */
};
static const struct prot_bits pte_bits[] = {
@@ -173,6 +182,34 @@ static const struct prot_bits pte_bits[] = {
}
};
+static bool is_fwb_enabled(const void *ctx)
+{
+ const struct pg_state *st = ctx;
+ const struct ptdump_info *info = st->info;
+ struct kvm_pgtable_snapshot *snapshot = info->priv;
+ struct kvm_pgtable *pgtable = &snapshot->pgtable;
+
+ bool fwb_enabled = false;
+
+ if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+ fwb_enabled = !(pgtable->flags & KVM_PGTABLE_S2_NOFWB);
+
+ return fwb_enabled;
+}
+
+static bool is_table_bit_ignored(const void *ctx)
+{
+ const struct pg_state *st = ctx;
+
+ if (!(st->current_prot & PTE_VALID))
+ return true;
+
+ if (st->level == CONFIG_PGTABLE_LEVELS)
+ return true;
+
+ return false;
+}
+
static const struct prot_bits stage2_pte_bits[] = {
{
.mask = PTE_VALID,
@@ -214,6 +251,27 @@ static const struct prot_bits stage2_pte_bits[] = {
.val = PTE_TABLE_BIT,
.set = " ",
.clear = "BLK",
+ .feature_off = is_table_bit_ignored,
+ }, {
+ .mask = KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | PTE_VALID,
+ .val = PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_VALID,
+ .set = "DEVICE/nGnRE",
+ .feature_off = is_fwb_enabled,
+ }, {
+ .mask = KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | PTE_VALID,
+ .val = PTE_S2_MEMATTR(MT_S2_FWB_DEVICE_nGnRE) | PTE_VALID,
+ .set = "DEVICE/nGnRE FWB",
+ .feature_on = is_fwb_enabled,
+ }, {
+ .mask = KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | PTE_VALID,
+ .val = PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_VALID,
+ .set = "MEM/NORMAL",
+ .feature_off = is_fwb_enabled,
+ }, {
+ .mask = KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | PTE_VALID,
+ .val = PTE_S2_MEMATTR(MT_S2_FWB_NORMAL) | PTE_VALID,
+ .set = "MEM/NORMAL FWB",
+ .feature_on = is_fwb_enabled,
}, {
.mask = KVM_PGTABLE_PROT_SW0,
.val = KVM_PGTABLE_PROT_SW0,
@@ -285,13 +343,19 @@ static struct pg_level stage2_pg_level[] = {
};
static void dump_prot(struct pg_state *st, const struct prot_bits *bits,
- size_t num)
+ size_t num)
{
unsigned i;
for (i = 0; i < num; i++, bits++) {
const char *s;
+ if (bits->feature_on && !bits->feature_on(st))
+ continue;
+
+ if (bits->feature_off && bits->feature_off(st))
+ continue;
+
if ((st->current_prot & bits->mask) == bits->val)
s = bits->set;
else
--
2.42.0.655.g421f12c284-goog
Add support for interpreting pKVM invalid stage-2 descriptors that hold
ownership information. We use these descriptors to keep track of the
memory donations from the host side.
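As an illustration (userspace C, hypothetical demo_* names), the owner
can be recovered from an invalid descriptor by checking that the valid
bit is clear and extracting the owner field, which this series defines
as bits [9:2]. Like the table entries added below, the sketch only
labels the HYP and FF-A owners.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same ordering as enum pkvm_component_id in this series. */
enum demo_component_id {
	DEMO_ID_HOST,
	DEMO_ID_HYP,
	DEMO_ID_FFA,
};

#define DEMO_PTE_VALID		UINT64_C(1)
#define DEMO_OWNER_SHIFT	2
#define DEMO_OWNER_MASK		(UINT64_C(0xff) << DEMO_OWNER_SHIFT)

/* Build an invalid descriptor carrying an owner annotation. */
static uint64_t demo_make_annotation(enum demo_component_id id)
{
	assert(id <= DEMO_ID_FFA);
	return ((uint64_t)id << DEMO_OWNER_SHIFT) & DEMO_OWNER_MASK;
}

/* Return the owner label of an invalid descriptor, or NULL if none applies. */
static const char *demo_owner_label(uint64_t desc)
{
	/* Valid descriptors describe a mapping, not an owner. */
	if (desc & DEMO_PTE_VALID)
		return NULL;

	switch ((desc & DEMO_OWNER_MASK) >> DEMO_OWNER_SHIFT) {
	case DEMO_ID_HYP:
		return "HYP";
	case DEMO_ID_FFA:
		return "FF-A";
	default:
		return NULL;
	}
}
```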
Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 7 +++++++
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 7 -------
arch/arm64/mm/ptdump.c | 10 ++++++++++
3 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 913f34d75b29..938baffa7d4d 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -87,6 +87,13 @@ typedef u64 kvm_pte_t;
*/
#define KVM_INVALID_PTE_LOCKED BIT(10)
+/* This corresponds to page-table locking order */
+enum pkvm_component_id {
+ PKVM_ID_HOST,
+ PKVM_ID_HYP,
+ PKVM_ID_FFA,
+};
+
static inline bool kvm_pte_valid(kvm_pte_t pte)
{
return pte & KVM_PTE_VALID;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 9cfb35d68850..cc2c439ffe75 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -53,13 +53,6 @@ struct host_mmu {
};
extern struct host_mmu host_mmu;
-/* This corresponds to page-table locking order */
-enum pkvm_component_id {
- PKVM_ID_HOST,
- PKVM_ID_HYP,
- PKVM_ID_FFA,
-};
-
extern unsigned long hyp_nr_cpus;
int __pkvm_prot_finalize(void);
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 5f9a334b0f0c..4687840dcb69 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -272,6 +272,16 @@ static const struct prot_bits stage2_pte_bits[] = {
.val = PTE_S2_MEMATTR(MT_S2_FWB_NORMAL) | PTE_VALID,
.set = "MEM/NORMAL FWB",
.feature_on = is_fwb_enabled,
+ }, {
+ .mask = KVM_INVALID_PTE_OWNER_MASK | PTE_VALID,
+ .val = FIELD_PREP_CONST(KVM_INVALID_PTE_OWNER_MASK,
+ PKVM_ID_HYP),
+ .set = "HYP",
+ }, {
+ .mask = KVM_INVALID_PTE_OWNER_MASK | PTE_VALID,
+ .val = FIELD_PREP_CONST(KVM_INVALID_PTE_OWNER_MASK,
+ PKVM_ID_FFA),
+ .set = "FF-A",
}, {
.mask = KVM_PGTABLE_PROT_SW0,
.val = KVM_PGTABLE_PROT_SW0,
--
2.42.0.655.g421f12c284-goog
Register a debugfs file on guest creation to be able to view the guest
stage-2 translation tables with ptdump. This assumes that the host is in
control of the guest stage-2 and has direct access to the pagetables.
Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/include/asm/ptdump.h | 21 +++++++--
arch/arm64/kvm/mmu.c | 3 ++
arch/arm64/mm/ptdump.c | 84 +++++++++++++++++++++++++++++++++
arch/arm64/mm/ptdump_debugfs.c | 5 +-
4 files changed, 108 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
index 35b883524462..be86244d532b 100644
--- a/arch/arm64/include/asm/ptdump.h
+++ b/arch/arm64/include/asm/ptdump.h
@@ -5,6 +5,8 @@
#ifndef __ASM_PTDUMP_H
#define __ASM_PTDUMP_H
+#include <asm/kvm_pgtable.h>
+
#ifdef CONFIG_PTDUMP_CORE
#include <linux/mm_types.h>
@@ -30,14 +32,27 @@ struct ptdump_info {
void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
#ifdef CONFIG_PTDUMP_DEBUGFS
#define EFI_RUNTIME_MAP_END DEFAULT_MAP_WINDOW_64
-void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name);
+struct dentry *ptdump_debugfs_register(struct ptdump_info *info,
+ const char *name);
#else
-static inline void ptdump_debugfs_register(struct ptdump_info *info,
- const char *name) { }
+static inline struct dentry *ptdump_debugfs_register(struct ptdump_info *info,
+ const char *name)
+{
+ return NULL;
+}
#endif
void ptdump_check_wx(void);
#endif /* CONFIG_PTDUMP_CORE */
+#ifdef CONFIG_NVHE_EL2_PTDUMP_DEBUGFS
+void ptdump_register_guest_stage2(struct kvm_pgtable *pgt, void *lock);
+void ptdump_unregister_guest_stage2(struct kvm_pgtable *pgt);
+#else
+static inline void ptdump_register_guest_stage2(struct kvm_pgtable *pgt,
+ void *lock) { }
+static inline void ptdump_unregister_guest_stage2(struct kvm_pgtable *pgt) { }
+#endif /* CONFIG_NVHE_EL2_PTDUMP_DEBUGFS */
+
#ifdef CONFIG_DEBUG_WX
#define debug_checkwx() ptdump_check_wx()
#else
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 482280fe22d7..e47988dba34d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -11,6 +11,7 @@
#include <linux/sched/signal.h>
#include <trace/events/kvm.h>
#include <asm/pgalloc.h>
+#include <asm/ptdump.h>
#include <asm/cacheflush.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_mmu.h>
@@ -908,6 +909,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
if (err)
goto out_free_pgtable;
+ ptdump_register_guest_stage2(pgt, &kvm->mmu_lock);
mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
if (!mmu->last_vcpu_ran) {
err = -ENOMEM;
@@ -1021,6 +1023,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
write_unlock(&kvm->mmu_lock);
if (pgt) {
+ ptdump_unregister_guest_stage2(pgt);
kvm_pgtable_stage2_destroy(pgt);
kfree(pgt);
}
diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 4687840dcb69..facfb15468f5 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -26,6 +26,7 @@
#include <asm/ptdump.h>
#include <asm/kvm_pkvm.h>
#include <asm/kvm_pgtable.h>
+#include <asm/kvm_host.h>
enum address_markers_idx {
@@ -543,6 +544,22 @@ void ptdump_check_wx(void)
#ifdef CONFIG_NVHE_EL2_PTDUMP_DEBUGFS
static struct ptdump_info stage2_kernel_ptdump_info;
+#define GUEST_NAME_LEN (32U)
+
+struct ptdump_registered_guest {
+ struct list_head reg_list;
+ struct ptdump_info info;
+ struct mm_struct mem;
+ struct kvm_pgtable_snapshot snapshot;
+ struct dentry *dentry;
+ rwlock_t *lock;
+ char reg_name[GUEST_NAME_LEN];
+};
+
+static LIST_HEAD(ptdump_guest_list);
+static DEFINE_MUTEX(ptdump_list_lock);
+static u16 guest_no;
+
static phys_addr_t ptdump_host_pa(void *addr)
{
return __pa(addr);
@@ -740,6 +757,73 @@ static void stage2_ptdump_walk(struct seq_file *s, struct ptdump_info *info)
kvm_pgtable_walk(pgtable, start_ipa, end_ipa, &walker);
}
+
+static void guest_stage2_ptdump_walk(struct seq_file *s,
+ struct ptdump_info *info)
+{
+ struct kvm_pgtable_snapshot *snapshot = info->priv;
+ struct ptdump_registered_guest *guest;
+
+ guest = container_of(snapshot, struct ptdump_registered_guest,
+ snapshot);
+ read_lock(guest->lock);
+ stage2_ptdump_walk(s, info);
+ read_unlock(guest->lock);
+}
+
+void ptdump_register_guest_stage2(struct kvm_pgtable *pgt, void *lock)
+{
+ struct ptdump_registered_guest *guest;
+ struct dentry *d;
+
+ if (pgt == NULL || lock == NULL)
+ return;
+
+ guest = kzalloc(sizeof(struct ptdump_registered_guest), GFP_KERNEL);
+ if (!guest)
+ return;
+
+ memcpy(&guest->snapshot.pgtable, pgt, sizeof(struct kvm_pgtable));
+ guest->info = (struct ptdump_info) {
+ .ptdump_walk = guest_stage2_ptdump_walk,
+ .priv = &guest->snapshot
+ };
+
+ mutex_init(&guest->info.file_lock);
+ guest->lock = lock;
+ mutex_lock(&ptdump_list_lock);
+ snprintf(guest->reg_name, GUEST_NAME_LEN,
+ "%u_guest_stage2_page_tables", guest_no++);
+ d = ptdump_debugfs_register(&guest->info, guest->reg_name);
+ if (!d) {
+ mutex_unlock(&ptdump_list_lock);
+ goto free_entry;
+ }
+
+ guest->dentry = d;
+ list_add(&guest->reg_list, &ptdump_guest_list);
+ mutex_unlock(&ptdump_list_lock);
+ return;
+
+free_entry:
+ kfree(guest);
+}
+
+void ptdump_unregister_guest_stage2(struct kvm_pgtable *pgt)
+{
+ struct ptdump_registered_guest *guest;
+
+ mutex_lock(&ptdump_list_lock);
+ list_for_each_entry(guest, &ptdump_guest_list, reg_list) {
+ if (guest->snapshot.pgtable.pgd == pgt->pgd) {
+ list_del(&guest->reg_list);
+ debugfs_remove(guest->dentry);
+ kfree(guest);
+ break;
+ }
+ }
+ mutex_unlock(&ptdump_list_lock);
+}
#endif /* CONFIG_NVHE_EL2_PTDUMP_DEBUGFS */
static void __init ptdump_register_host_stage2(void)
diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 14619452dd8d..356753e27dee 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -49,7 +49,8 @@ static const struct file_operations ptdump_fops = {
.release = ptdump_release,
};
-void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name)
+struct dentry *ptdump_debugfs_register(struct ptdump_info *info,
+ const char *name)
{
- debugfs_create_file(name, 0400, NULL, info, &ptdump_fops);
+ return debugfs_create_file(name, 0400, NULL, info, &ptdump_fops);
}
--
2.42.0.655.g421f12c284-goog
In preparation for using the stage-2 definitions in ptdump, move some of
these macros to the common header.
Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 42 ++++++++++++++++++++++++++++
arch/arm64/kvm/hyp/pgtable.c | 42 ----------------------------
2 files changed, 42 insertions(+), 42 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index be615700f8ac..913f34d75b29 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -45,6 +45,48 @@ typedef u64 kvm_pte_t;
#define KVM_PHYS_INVALID (-1ULL)
+#define KVM_PTE_LEAF_ATTR_LO GENMASK(11, 2)
+
+#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX GENMASK(4, 2)
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP GENMASK(7, 6)
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RO \
+ ({ cpus_have_final_cap(ARM64_KVM_HVHE) ? 2 : 3; })
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RW \
+ ({ cpus_have_final_cap(ARM64_KVM_HVHE) ? 0 : 1; })
+#define KVM_PTE_LEAF_ATTR_LO_S1_SH GENMASK(9, 8)
+#define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS 3
+#define KVM_PTE_LEAF_ATTR_LO_S1_AF BIT(10)
+
+#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR GENMASK(5, 2)
+#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R BIT(6)
+#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W BIT(7)
+#define KVM_PTE_LEAF_ATTR_LO_S2_SH GENMASK(9, 8)
+#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS 3
+#define KVM_PTE_LEAF_ATTR_LO_S2_AF BIT(10)
+
+#define KVM_PTE_LEAF_ATTR_HI GENMASK(63, 50)
+
+#define KVM_PTE_LEAF_ATTR_HI_SW GENMASK(58, 55)
+
+#define KVM_PTE_LEAF_ATTR_HI_S1_XN BIT(54)
+
+#define KVM_PTE_LEAF_ATTR_HI_S2_XN BIT(54)
+
+#define KVM_PTE_LEAF_ATTR_HI_S1_GP BIT(50)
+
+#define KVM_PTE_LEAF_ATTR_S2_PERMS (KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
+ KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
+ KVM_PTE_LEAF_ATTR_HI_S2_XN)
+
+#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
+#define KVM_MAX_OWNER_ID 1
+
+/*
+ * Used to indicate a pte for which a 'break-before-make' sequence is in
+ * progress.
+ */
+#define KVM_INVALID_PTE_LOCKED BIT(10)
+
static inline bool kvm_pte_valid(kvm_pte_t pte)
{
return pte & KVM_PTE_VALID;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 256654b89c1e..67fa122c6028 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -17,48 +17,6 @@
#define KVM_PTE_TYPE_PAGE 1
#define KVM_PTE_TYPE_TABLE 1
-#define KVM_PTE_LEAF_ATTR_LO GENMASK(11, 2)
-
-#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX GENMASK(4, 2)
-#define KVM_PTE_LEAF_ATTR_LO_S1_AP GENMASK(7, 6)
-#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RO \
- ({ cpus_have_final_cap(ARM64_KVM_HVHE) ? 2 : 3; })
-#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RW \
- ({ cpus_have_final_cap(ARM64_KVM_HVHE) ? 0 : 1; })
-#define KVM_PTE_LEAF_ATTR_LO_S1_SH GENMASK(9, 8)
-#define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS 3
-#define KVM_PTE_LEAF_ATTR_LO_S1_AF BIT(10)
-
-#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR GENMASK(5, 2)
-#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R BIT(6)
-#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W BIT(7)
-#define KVM_PTE_LEAF_ATTR_LO_S2_SH GENMASK(9, 8)
-#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS 3
-#define KVM_PTE_LEAF_ATTR_LO_S2_AF BIT(10)
-
-#define KVM_PTE_LEAF_ATTR_HI GENMASK(63, 50)
-
-#define KVM_PTE_LEAF_ATTR_HI_SW GENMASK(58, 55)
-
-#define KVM_PTE_LEAF_ATTR_HI_S1_XN BIT(54)
-
-#define KVM_PTE_LEAF_ATTR_HI_S2_XN BIT(54)
-
-#define KVM_PTE_LEAF_ATTR_HI_S1_GP BIT(50)
-
-#define KVM_PTE_LEAF_ATTR_S2_PERMS (KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
- KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
- KVM_PTE_LEAF_ATTR_HI_S2_XN)
-
-#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID 1
-
-/*
- * Used to indicate a pte for which a 'break-before-make' sequence is in
- * progress.
- */
-#define KVM_INVALID_PTE_LOCKED BIT(10)
-
struct kvm_pgtable_walk_data {
struct kvm_pgtable_walker *walker;
--
2.42.0.655.g421f12c284-goog
On Thu, Oct 19, 2023 at 02:40:21PM +0000, Sebastian Ene wrote:
> Hi,
>
> This can be used as a debugging tool for dumping the second stage
> page-tables under debugfs.
>
> From the previous feedback I re-worked the series and added support for
> guest page-tables dumping under VHE & nVHE configuration. I extended the
> list of reviewers as I missed the interested parties in the first round.
>
> When CONFIG_NVHE_EL2_PTDUMP_DEBUGFS is enabled under pKVM environment,
> ptdump registers the 'host_stage2_kernel_page_tables' entry with debugfs.
> Guests are registering a file named '%u_guest_stage2_page_tables' when
> they are created.
I believe guest entries should also be available for nVHE and VHE.
>
> This allows us to dump the host stage-2 page-tables with the following command:
> cat /sys/kernel/debug/host_stage2_kernel_page_tables.
Since this needs debugfs anyway, it should probably live in the kvm/ debugfs
directory, while each VM's ptdump file should be placed in that VM's own directory.
This is quite easy: you should have access to the global kvm_debugfs_dir and to
struct kvm->debugfs_dentry.
>
> The output is showing the entries in the following format:
> <IPA range> <size> <descriptor type> <access permissions> <mem_attributes>
>
> The tool interprets the pKVM ownership annotation stored in the invalid
> entries and dumps to the console the ownership information. To be able
> to access the host stage-2 page-tables from the kernel, a new hypervisor
> call was introduced which allows us to snapshot the page-tables in a host
> provided buffer. The hypervisor call is hidden behind CONFIG_NVHE_EL2_DEBUG
> as this should be used under debugging environment.
>
> Link to the first version:
> https://lore.kernel.org/all/[email protected]/
>
> Changelog:
> v1 -> v2:
> * use the stage-2 pagetable walker for dumping descriptors instead of
> the one provided by ptdump.
>
> * support for guests pagetables dumping under VHE/nVHE non-protected
>
> Thanks,
>
>
> Sebastian Ene (11):
> KVM: arm64: Add snap shooting the host stage-2 pagetables
> arm64: ptdump: Use the mask from the state structure
> arm64: ptdump: Add the walker function to the ptdump info structure
> KVM: arm64: Move pagetable definitions to common header
> arm64: ptdump: Introduce stage-2 pagetables format description
> arm64: ptdump: Add hooks on debugfs file operations
> arm64: ptdump: Register a debugfs entry for the host stage-2
> page-tables
> arm64: ptdump: Parse the host stage-2 page-tables from the snapshot
> arm64: ptdump: Interpret memory attributes based on runtime
> configuration
> arm64: ptdump: Interpret pKVM ownership annotations
> arm64: ptdump: Add support for guest stage-2 pagetables dumping
>
> arch/arm64/include/asm/kvm_asm.h | 1 +
> arch/arm64/include/asm/kvm_pgtable.h | 85 +++
> arch/arm64/include/asm/ptdump.h | 27 +-
> arch/arm64/kvm/Kconfig | 12 +
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 8 +-
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 18 +
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 103 ++++
> arch/arm64/kvm/hyp/pgtable.c | 98 ++--
> arch/arm64/kvm/mmu.c | 3 +
> arch/arm64/mm/ptdump.c | 487 +++++++++++++++++-
> arch/arm64/mm/ptdump_debugfs.c | 42 +-
> 11 files changed, 822 insertions(+), 62 deletions(-)
>
> --
> 2.42.0.655.g421f12c284-goog
>
On Thu, Oct 19, 2023 at 02:40:33PM +0000, Sebastian Ene wrote:
> Register a debugfs file on guest creation to be able to view their
> second translation tables with ptdump. This assumes that the host is in
> control of the guest stage-2 and has direct access to the pagetables.
What about pKVM? The walker you wrote for the host stage-2 should be
reusable in that case?
>
> Signed-off-by: Sebastian Ene <[email protected]>
> ---
> arch/arm64/include/asm/ptdump.h | 21 +++++++--
> arch/arm64/kvm/mmu.c | 3 ++
> arch/arm64/mm/ptdump.c | 84 +++++++++++++++++++++++++++++++++
> arch/arm64/mm/ptdump_debugfs.c | 5 +-
> 4 files changed, 108 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
> index 35b883524462..be86244d532b 100644
> --- a/arch/arm64/include/asm/ptdump.h
> +++ b/arch/arm64/include/asm/ptdump.h
> @@ -5,6 +5,8 @@
> #ifndef __ASM_PTDUMP_H
> #define __ASM_PTDUMP_H
>
> +#include <asm/kvm_pgtable.h>
> +
> #ifdef CONFIG_PTDUMP_CORE
>
> #include <linux/mm_types.h>
> @@ -30,14 +32,27 @@ struct ptdump_info {
> void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
> #ifdef CONFIG_PTDUMP_DEBUGFS
> #define EFI_RUNTIME_MAP_END DEFAULT_MAP_WINDOW_64
> -void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name);
> +struct dentry *ptdump_debugfs_register(struct ptdump_info *info,
> + const char *name);
> #else
> -static inline void ptdump_debugfs_register(struct ptdump_info *info,
> - const char *name) { }
> +static inline struct dentry *ptdump_debugfs_register(struct ptdump_info *info,
> + const char *name)
> +{
> + return NULL;
> +}
> #endif
> void ptdump_check_wx(void);
> #endif /* CONFIG_PTDUMP_CORE */
>
> +#ifdef CONFIG_NVHE_EL2_PTDUMP_DEBUGFS
> +void ptdump_register_guest_stage2(struct kvm_pgtable *pgt, void *lock);
> +void ptdump_unregister_guest_stage2(struct kvm_pgtable *pgt);
> +#else
> +static inline void ptdump_register_guest_stage2(struct kvm_pgtable *pgt,
> + void *lock) { }
> +static inline void ptdump_unregister_guest_stage2(struct kvm_pgtable *pgt) { }
> +#endif /* CONFIG_NVHE_EL2_PTDUMP_DEBUGFS */
I believe this should be compatible with VHE as well, so that option should be
renamed.
> +
> #ifdef CONFIG_DEBUG_WX
> #define debug_checkwx() ptdump_check_wx()
> #else
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 482280fe22d7..e47988dba34d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -11,6 +11,7 @@
> #include <linux/sched/signal.h>
> #include <trace/events/kvm.h>
> #include <asm/pgalloc.h>
> +#include <asm/ptdump.h>
> #include <asm/cacheflush.h>
> #include <asm/kvm_arm.h>
> #include <asm/kvm_mmu.h>
> @@ -908,6 +909,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
> if (err)
> goto out_free_pgtable;
>
> + ptdump_register_guest_stage2(pgt, &kvm->mmu_lock);
> mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
> if (!mmu->last_vcpu_ran) {
> err = -ENOMEM;
> @@ -1021,6 +1023,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> write_unlock(&kvm->mmu_lock);
>
> if (pgt) {
> + ptdump_unregister_guest_stage2(pgt);
> kvm_pgtable_stage2_destroy(pgt);
> kfree(pgt);
> }
> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> index 4687840dcb69..facfb15468f5 100644
> --- a/arch/arm64/mm/ptdump.c
> +++ b/arch/arm64/mm/ptdump.c
> @@ -26,6 +26,7 @@
> #include <asm/ptdump.h>
> #include <asm/kvm_pkvm.h>
> #include <asm/kvm_pgtable.h>
> +#include <asm/kvm_host.h>
>
>
> enum address_markers_idx {
> @@ -543,6 +544,22 @@ void ptdump_check_wx(void)
> #ifdef CONFIG_NVHE_EL2_PTDUMP_DEBUGFS
> static struct ptdump_info stage2_kernel_ptdump_info;
>
> +#define GUEST_NAME_LEN (32U)
> +
> +struct ptdump_registered_guest {
> + struct list_head reg_list;
> + struct ptdump_info info;
> + struct mm_struct mem;
> + struct kvm_pgtable_snapshot snapshot;
> + struct dentry *dentry;
> + rwlock_t *lock;
> + char reg_name[GUEST_NAME_LEN];
> +};
> +
> +static LIST_HEAD(ptdump_guest_list);
> +static DEFINE_MUTEX(ptdump_list_lock);
> +static u16 guest_no;
This is not robust enough: if one VM starts, followed by 65535 others that are
then killed, guest_no overflows. The next number is 0, which is already taken.
Linux has an ID allocator to solve this problem, but I don't think it is
necessary anyway. This should simply reuse struct kvm->debugfs_dentry.
Also, most of the information contained in ptdump_registered_guest can probably
be found in struct kvm. The debugfs file should then simply take struct kvm
as its private argument.
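To illustrate the overflow, here is a minimal userspace sketch (not kernel
code). It assumes the counter is a plain u16 that is incremented on every
registration and never recycled, as in the hunk above. The helpers
next_guest_id() and clashes_with_first() are hypothetical names used only for
this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GUEST_NAME_LEN 32U

/* Build the debugfs file name with the same format string as the patch. */
static void guest_name(char *buf, uint16_t id)
{
	assert(buf != NULL);
	snprintf(buf, GUEST_NAME_LEN, "%u_guest_stage2_page_tables",
		 (unsigned int)id);
}

/*
 * Hypothetical helper: return the id handed to the VM that is created
 * after 'created' earlier VMs, using a u16 counter that is never recycled.
 */
static uint16_t next_guest_id(uint32_t created)
{
	uint16_t guest_no = 0;

	while (created--)
		guest_no++;	/* wraps back to 0 after reaching 65535 */

	return guest_no;
}

/* Return 1 if the VM created after 'created' others reuses VM 0's name. */
static int clashes_with_first(uint32_t created)
{
	char first[GUEST_NAME_LEN], later[GUEST_NAME_LEN];

	guest_name(first, 0);
	guest_name(later, next_guest_id(created));
	return strcmp(first, later) == 0;
}
```

With one long-lived VM plus 65535 short-lived ones, 65536 ids have been
consumed, so clashes_with_first(65536) reports a clash with the file that the
first VM still owns.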
> +
> static phys_addr_t ptdump_host_pa(void *addr)
> {
> return __pa(addr);
> @@ -740,6 +757,73 @@ static void stage2_ptdump_walk(struct seq_file *s, struct ptdump_info *info)
>
> kvm_pgtable_walk(pgtable, start_ipa, end_ipa, &walker);
> }
[...]
On Fri, Oct 20, 2023 at 09:19:33AM +0100, Vincent Donnefort wrote:
> On Thu, Oct 19, 2023 at 02:40:21PM +0000, Sebastian Ene wrote:
> > Hi,
> >
> > This can be used as a debugging tool for dumping the second stage
> > page-tables under debugfs.
> >
> > From the previous feedback I re-worked the series and added support for
> > guest page-tables dumping under VHE & nVHE configuration. I extended the
> > list of reviewers as I missed the interested parties in the first round.
> >
> > When CONFIG_NVHE_EL2_PTDUMP_DEBUGFS is enabled under pKVM environment,
> > ptdump registers the 'host_stage2_kernel_page_tables' entry with debugfs.
> > Guests are registering a file named '%u_guest_stage2_page_tables' when
> > they are created.
Hi,
>
> I believe guest entries should also be available for nVHE and VHE.
>
Yes, this supports dumping the guest stage-2 pagetables in both modes. The
host stage-2 dump is available only with kvm-arm.mode=protected.
> >
> > This allows us to dump the host stage-2 page-tables with the following command:
> > cat /sys/kernel/debug/host_stage2_kernel_page_tables.
>
> Since this needs debugfs anyway, it should probably live in the kvm/ debugfs
> directory, while each VM's ptdump file should be placed in that VM's own directory.
>
> This is quite easy: you should have access to the global kvm_debugfs_dir and to
> struct kvm->debugfs_dentry.
>
Right, I was thinking of placing them under the kvm/ debugfs directory, but
then I noticed that the ptdump files are not registered under this path.
> >
> > The output is showing the entries in the following format:
> > <IPA range> <size> <descriptor type> <access permissions> <mem_attributes>
> >
> > The tool interprets the pKVM ownership annotation stored in the invalid
> > entries and dumps to the console the ownership information. To be able
> > to access the host stage-2 page-tables from the kernel, a new hypervisor
> > call was introduced which allows us to snapshot the page-tables in a host
> > provided buffer. The hypervisor call is hidden behind CONFIG_NVHE_EL2_DEBUG
> > as this should be used under debugging environment.
> >
> > Link to the first version:
> > https://lore.kernel.org/all/[email protected]/
> >
> > Changelog:
> > v1 -> v2:
> > * use the stage-2 pagetable walker for dumping descriptors instead of
> > the one provided by ptdump.
> >
> > * support for guests pagetables dumping under VHE/nVHE non-protected
> >
> > Thanks,
> >
> >
> > Sebastian Ene (11):
> > KVM: arm64: Add snap shooting the host stage-2 pagetables
> > arm64: ptdump: Use the mask from the state structure
> > arm64: ptdump: Add the walker function to the ptdump info structure
> > KVM: arm64: Move pagetable definitions to common header
> > arm64: ptdump: Introduce stage-2 pagetables format description
> > arm64: ptdump: Add hooks on debugfs file operations
> > arm64: ptdump: Register a debugfs entry for the host stage-2
> > page-tables
> > arm64: ptdump: Parse the host stage-2 page-tables from the snapshot
> > arm64: ptdump: Interpret memory attributes based on runtime
> > configuration
> > arm64: ptdump: Interpret pKVM ownership annotations
> > arm64: ptdump: Add support for guest stage-2 pagetables dumping
> >
> > arch/arm64/include/asm/kvm_asm.h | 1 +
> > arch/arm64/include/asm/kvm_pgtable.h | 85 +++
> > arch/arm64/include/asm/ptdump.h | 27 +-
> > arch/arm64/kvm/Kconfig | 12 +
> > arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 8 +-
> > arch/arm64/kvm/hyp/nvhe/hyp-main.c | 18 +
> > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 103 ++++
> > arch/arm64/kvm/hyp/pgtable.c | 98 ++--
> > arch/arm64/kvm/mmu.c | 3 +
> > arch/arm64/mm/ptdump.c | 487 +++++++++++++++++-
> > arch/arm64/mm/ptdump_debugfs.c | 42 +-
> > 11 files changed, 822 insertions(+), 62 deletions(-)
> >
> > --
> > 2.42.0.655.g421f12c284-goog
> >
On Fri, Oct 20, 2023 at 09:40:06AM +0100, Vincent Donnefort wrote:
> On Thu, Oct 19, 2023 at 02:40:33PM +0000, Sebastian Ene wrote:
> > Register a debugfs file on guest creation to be able to view their
> > second translation tables with ptdump. This assumes that the host is in
> > control of the guest stage-2 and has direct access to the pagetables.
>
> What about pKVM? The walker you wrote for the host stage-2 should be
> reusable in that case?
>
Yes, once pKVM is ready upstream, the walker that duplicates the
pagetables for the host will be reused for the guests. We will have to
add a separate HVC for this, which takes the guest vmid as an
argument.
> >
> > Signed-off-by: Sebastian Ene <[email protected]>
> > ---
> > arch/arm64/include/asm/ptdump.h | 21 +++++++--
> > arch/arm64/kvm/mmu.c | 3 ++
> > arch/arm64/mm/ptdump.c | 84 +++++++++++++++++++++++++++++++++
> > arch/arm64/mm/ptdump_debugfs.c | 5 +-
> > 4 files changed, 108 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
> > index 35b883524462..be86244d532b 100644
> > --- a/arch/arm64/include/asm/ptdump.h
> > +++ b/arch/arm64/include/asm/ptdump.h
> > @@ -5,6 +5,8 @@
> > #ifndef __ASM_PTDUMP_H
> > #define __ASM_PTDUMP_H
> >
> > +#include <asm/kvm_pgtable.h>
> > +
> > #ifdef CONFIG_PTDUMP_CORE
> >
> > #include <linux/mm_types.h>
> > @@ -30,14 +32,27 @@ struct ptdump_info {
> > void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
> > #ifdef CONFIG_PTDUMP_DEBUGFS
> > #define EFI_RUNTIME_MAP_END DEFAULT_MAP_WINDOW_64
> > -void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name);
> > +struct dentry *ptdump_debugfs_register(struct ptdump_info *info,
> > + const char *name);
> > #else
> > -static inline void ptdump_debugfs_register(struct ptdump_info *info,
> > - const char *name) { }
> > +static inline struct dentry *ptdump_debugfs_register(struct ptdump_info *info,
> > + const char *name)
> > +{
> > + return NULL;
> > +}
> > #endif
> > void ptdump_check_wx(void);
> > #endif /* CONFIG_PTDUMP_CORE */
> >
> > +#ifdef CONFIG_NVHE_EL2_PTDUMP_DEBUGFS
> > +void ptdump_register_guest_stage2(struct kvm_pgtable *pgt, void *lock);
> > +void ptdump_unregister_guest_stage2(struct kvm_pgtable *pgt);
> > +#else
> > +static inline void ptdump_register_guest_stage2(struct kvm_pgtable *pgt,
> > + void *lock) { }
> > +static inline void ptdump_unregister_guest_stage2(struct kvm_pgtable *pgt) { }
> > +#endif /* CONFIG_NVHE_EL2_PTDUMP_DEBUGFS */
>
> I believe this should be compatible with VHE as well, so that option should be
> renamed.
>
Good point, I will rename this.
> > +
> > #ifdef CONFIG_DEBUG_WX
> > #define debug_checkwx() ptdump_check_wx()
> > #else
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 482280fe22d7..e47988dba34d 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -11,6 +11,7 @@
> > #include <linux/sched/signal.h>
> > #include <trace/events/kvm.h>
> > #include <asm/pgalloc.h>
> > +#include <asm/ptdump.h>
> > #include <asm/cacheflush.h>
> > #include <asm/kvm_arm.h>
> > #include <asm/kvm_mmu.h>
> > @@ -908,6 +909,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
> > if (err)
> > goto out_free_pgtable;
> >
> > + ptdump_register_guest_stage2(pgt, &kvm->mmu_lock);
> > mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
> > if (!mmu->last_vcpu_ran) {
> > err = -ENOMEM;
> > @@ -1021,6 +1023,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> > write_unlock(&kvm->mmu_lock);
> >
> > if (pgt) {
> > + ptdump_unregister_guest_stage2(pgt);
> > kvm_pgtable_stage2_destroy(pgt);
> > kfree(pgt);
> > }
> > diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> > index 4687840dcb69..facfb15468f5 100644
> > --- a/arch/arm64/mm/ptdump.c
> > +++ b/arch/arm64/mm/ptdump.c
> > @@ -26,6 +26,7 @@
> > #include <asm/ptdump.h>
> > #include <asm/kvm_pkvm.h>
> > #include <asm/kvm_pgtable.h>
> > +#include <asm/kvm_host.h>
> >
> >
> > enum address_markers_idx {
> > @@ -543,6 +544,22 @@ void ptdump_check_wx(void)
> > #ifdef CONFIG_NVHE_EL2_PTDUMP_DEBUGFS
> > static struct ptdump_info stage2_kernel_ptdump_info;
> >
> > +#define GUEST_NAME_LEN (32U)
> > +
> > +struct ptdump_registered_guest {
> > + struct list_head reg_list;
> > + struct ptdump_info info;
> > + struct mm_struct mem;
> > + struct kvm_pgtable_snapshot snapshot;
> > + struct dentry *dentry;
> > + rwlock_t *lock;
> > + char reg_name[GUEST_NAME_LEN];
> > +};
> > +
> > +static LIST_HEAD(ptdump_guest_list);
> > +static DEFINE_MUTEX(ptdump_list_lock);
> > +static u16 guest_no;
>
> This is not robust enough: if one VM starts, followed by 65535 others that are
> then killed, guest_no overflows. The next number is 0, which is already taken.
>
Yes, I guess this should be improved. In the case you described we won't
register any debugfs file because of the name clash.
> Linux has an ID allocator to solve this problem, but I don't think it is
> necessary anyway. This should simply reuse struct kvm->debugfs_dentry.
>
> Also, most of the information contained in ptdump_registered_guest can probably
> be found in struct kvm. The debugfs file should then simply take struct kvm
> as its private argument.
>
I would prefer to keep it as a separate struct here, as it gives some
flexibility if we need to extend it for pKVM guest support. I think we
can drop the struct mm_struct from here.
Thanks,
Sebastian
> > +
> > static phys_addr_t ptdump_host_pa(void *addr)
> > {
> > return __pa(addr);
> > @@ -740,6 +757,73 @@ static void stage2_ptdump_walk(struct seq_file *s, struct ptdump_info *info)
> >
> > kvm_pgtable_walk(pgtable, start_ipa, end_ipa, &walker);
> > }
>
> [...]