2023-12-18 14:06:13

by Sebastian Ene

Subject: [PATCH v4 00/10] arm64: ptdump: View the second stage page-tables

Hi,

This series can be used as a debugging tool for dumping the second stage
page-tables.

When CONFIG_PTDUMP_STAGE2_DEBUGFS is enabled, ptdump registers a
'/sys/kernel/debug/kvm/<guest_id>/stage2_page_tables' entry with debugfs
upon guest creation. This allows userspace tools (e.g. cat) to dump the
stage-2 pagetables by reading the registered file.
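
The wiring is a plain debugfs file registration; roughly something like
the sketch below (illustrative only, the series does this through its own
kvm_ptdump_debugfs_register() helper and the helper name here is made up):

#include <linux/debugfs.h>

/* Illustrative sketch: create a read-only per-guest dump file and hand
 * the per-guest context to the file_operations via inode->i_private. */
static void stage2_ptdump_debugfs_add(struct kvm_ptdump_register *reg,
				      struct dentry *guest_dir)
{
	debugfs_create_file("stage2_page_tables", 0400, guest_dir,
			    reg, &kvm_ptdump_fops);
}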

Reading the debugfs file shows the stage-2 memory ranges in the following format:
<IPA range> <size> <descriptor type> <access permissions> <mem_attributes>

Under the pKVM configuration (kvm-arm.mode=protected), ptdump registers an
entry for the host stage-2 pagetables at the following path:
/sys/kernel/debug/kvm/host_stage2_page_tables/

The tool interprets the pKVM ownership annotations stored in the invalid
entries and dumps the ownership information to the console. To be able
to access the host stage-2 page-tables from the kernel, a new hypervisor
call was introduced which allows us to snapshot the page-tables into a
host-provided buffer. The hypervisor call is hidden behind
CONFIG_NVHE_EL2_DEBUG as it should only be used in a debugging environment.
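
For reference, the snapshot descriptor handed to the hypervisor looks
roughly like the sketch below. This is only reconstructed from the way
patch 03/10 uses the fields; the exact types and ordering are assumptions
and the real definition is part of the kvm_pgtable.h changes in this
series:

/* Rough sketch, inferred from the fields used in patch 03/10 --
 * not the authoritative definition. */
struct kvm_pgtable_snapshot {
	struct kvm_pgtable	pgtable;	 /* dumped stage-2 state (PGD, mm_ops) */
	struct kvm_hyp_memcache	mc;		 /* pages donated for the table copy */
	void			*pgd_hva;	 /* host VA of the PGD copy */
	size_t			pgd_pages;	 /* pages backing the PGD copy */
	phys_addr_t		*used_pages_hva; /* memcache pages consumed by the copy */
	size_t			num_used_pages;	 /* size of the used_pages array */
	size_t			used_pages_indx; /* entries filled in by the hypervisor */
};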

Link to the third version:
https://lore.kernel.org/all/[email protected]/

Link to the second version:
https://lore.kernel.org/all/[email protected]/#r

Link to the first version:
https://lore.kernel.org/all/[email protected]/

Changelog:
v3 -> current_version:
* refactoring: moved all the KVM-specific components under kvm/ as
  suggested by Oliver. Introduced a new file, arch/arm64/kvm/ptdump.c,
  which handles the second stage translation and re-uses only the
  display portion from mm/ptdump.c.

* pagetable snapshot creation now uses memory donated from the host.
The memory is no longer shared with the host as this can pose a security
risk if the host has access to manipulate the pagetable copy while
the hypervisor iterates it.

* fixed a memory leak: pages taken from the memcache for building the
  snapshot pagetable were not given back to the host for freeing. A
  separate array was introduced to keep track of the pages allocated
  from the memcache.

v2 -> v3:
* register the stage-2 debugfs entry for the host under
  /sys/kernel/debug/kvm/host_stage2_page_tables and in
  /sys/kernel/debug/kvm/<guest_id>/stage2_page_tables for guests.

* don't use a static array for parsing the attributes description;
  generate it dynamically based on the number of pagetable levels.

* remove the lock that was guarding the seq_file private inode data,
and keep the data private to the open file session.

* minor fixes & renaming of CONFIG_NVHE_EL2_PTDUMP_DEBUGFS to
CONFIG_PTDUMP_STAGE2_DEBUGFS

v1 -> v2:
* use the stage-2 pagetable walker for dumping descriptors instead of
the one provided by ptdump.

* support for dumping guest pagetables under VHE/nVHE non-protected mode.

Thanks,

Sebastian Ene (10):
KVM: arm64: Add snapshot interface for the host stage-2 pagetable
KVM: arm64: Add ptdump registration with debugfs for the stage-2
pagetables
KVM: arm64: Invoke the snapshot interface for the host stage-2
pagetable
arm64: ptdump: Expose the attribute parsing functionality
arm64: ptdump: Use the mask from the state structure
KVM: arm64: Move pagetable definitions to common header
KVM: arm64: Walk the pagetable snapshot and parse the ptdump
descriptors
arm64: ptdump: Interpret memory attributes based on the runtime config
arm64: ptdump: Interpret pKVM ownership annotations
arm64: ptdump: Add guest stage-2 pagetables dumping

arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_pgtable.h | 111 +++++
arch/arm64/include/asm/ptdump.h | 49 +-
arch/arm64/kvm/Kconfig | 13 +
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/arm.c | 2 +
arch/arm64/kvm/debug.c | 6 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 27 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 12 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 186 ++++++++
arch/arm64/kvm/hyp/pgtable.c | 85 ++--
arch/arm64/kvm/kvm_ptdump.h | 21 +
arch/arm64/kvm/ptdump.c | 436 ++++++++++++++++++
arch/arm64/mm/ptdump.c | 55 +--
14 files changed, 897 insertions(+), 108 deletions(-)
create mode 100644 arch/arm64/kvm/kvm_ptdump.h
create mode 100644 arch/arm64/kvm/ptdump.c

--
2.43.0.472.g3155946c3a-goog



2023-12-18 14:06:14

by Sebastian Ene

Subject: [PATCH v4 03/10] KVM: arm64: Invoke the snapshot interface for the host stage-2 pagetable

Allocate memory for the snapshot by creating a memory cache with empty
pages that will be used by the hypervisor during the page table copy.
Get the required size of the PGD and allocate physically contiguous
memory for it. Allocate contiguous memory for an array that is used to
keep track of the pages used from the memcache. Call the snapshot
interface and release the memory for the snapshot.

Signed-off-by: Sebastian Ene <[email protected]>
---
arch/arm64/kvm/ptdump.c | 107 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 107 insertions(+)

diff --git a/arch/arm64/kvm/ptdump.c b/arch/arm64/kvm/ptdump.c
index 5816fc632..e99bab427 100644
--- a/arch/arm64/kvm/ptdump.c
+++ b/arch/arm64/kvm/ptdump.c
@@ -25,6 +25,9 @@ static int kvm_ptdump_open(struct inode *inode, struct file *file);
static int kvm_ptdump_release(struct inode *inode, struct file *file);
static int kvm_ptdump_show(struct seq_file *m, void *);

+static phys_addr_t get_host_pa(void *addr);
+static void *get_host_va(phys_addr_t pa);
+
static const struct file_operations kvm_ptdump_fops = {
.open = kvm_ptdump_open,
.read = seq_read,
@@ -32,6 +35,11 @@ static const struct file_operations kvm_ptdump_fops = {
.release = kvm_ptdump_release,
};

+static struct kvm_pgtable_mm_ops ptdump_host_mmops = {
+ .phys_to_virt = get_host_va,
+ .virt_to_phys = get_host_pa,
+};
+
static int kvm_ptdump_open(struct inode *inode, struct file *file)
{
struct kvm_ptdump_register *reg = inode->i_private;
@@ -78,11 +86,110 @@ static void kvm_ptdump_debugfs_register(struct kvm_ptdump_register *reg,

static struct kvm_ptdump_register host_reg;

+static size_t host_stage2_get_pgd_len(void)
+{
+ u32 phys_shift = get_kvm_ipa_limit();
+ u64 vtcr = kvm_get_vtcr(read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1),
+ read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1),
+ phys_shift);
+ return (kvm_pgtable_stage2_pgd_size(vtcr) >> PAGE_SHIFT);
+}
+
+static phys_addr_t get_host_pa(void *addr)
+{
+ return __pa(addr);
+}
+
+static void *get_host_va(phys_addr_t pa)
+{
+ return __va(pa);
+}
+
+static void kvm_host_put_ptdump_info(void *snap)
+{
+ void *mc_page;
+ size_t i;
+ struct kvm_pgtable_snapshot *snapshot;
+
+ if (!snap)
+ return;
+
+ snapshot = snap;
+ while ((mc_page = pop_hyp_memcache(&snapshot->mc, get_host_va)) != NULL)
+ free_page((unsigned long)mc_page);
+
+ if (snapshot->pgd_hva)
+ free_pages_exact(snapshot->pgd_hva, snapshot->pgd_pages);
+
+ if (snapshot->used_pages_hva) {
+ for (i = 0; i < snapshot->used_pages_indx; i++) {
+ mc_page = get_host_va(snapshot->used_pages_hva[i]);
+ free_page((unsigned long)mc_page);
+ }
+
+ free_pages_exact(snapshot->used_pages_hva, snapshot->num_used_pages);
+ }
+
+ free_page((unsigned long)snapshot);
+}
+
+static void *kvm_host_get_ptdump_info(struct kvm_ptdump_register *reg)
+{
+ int i, ret;
+ void *mc_page;
+ struct kvm_pgtable_snapshot *snapshot;
+ size_t memcache_len;
+
+ snapshot = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+ if (!snapshot)
+ return NULL;
+
+ memset(snapshot, 0, sizeof(struct kvm_pgtable_snapshot));
+
+ snapshot->pgd_pages = host_stage2_get_pgd_len();
+ snapshot->pgd_hva = alloc_pages_exact(snapshot->pgd_pages, GFP_KERNEL_ACCOUNT);
+ if (!snapshot->pgd_hva)
+ goto err;
+
+ memcache_len = (size_t)reg->priv;
+ for (i = 0; i < memcache_len; i++) {
+ mc_page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+ if (!mc_page)
+ goto err;
+
+ push_hyp_memcache(&snapshot->mc, mc_page, get_host_pa);
+ }
+
+ snapshot->num_used_pages = DIV_ROUND_UP(sizeof(phys_addr_t) * memcache_len,
+ PAGE_SIZE);
+ snapshot->used_pages_hva = alloc_pages_exact(snapshot->num_used_pages,
+ GFP_KERNEL_ACCOUNT);
+ if (!snapshot->used_pages_hva)
+ goto err;
+
+ ret = kvm_call_hyp_nvhe(__pkvm_host_stage2_snapshot, snapshot);
+ if (ret) {
+ pr_err("ERROR %d snapshot host pagetables\n", ret);
+ goto err;
+ }
+
+ snapshot->pgtable.pgd = get_host_va((phys_addr_t)snapshot->pgtable.pgd);
+ snapshot->pgtable.mm_ops = &ptdump_host_mmops;
+
+ return snapshot;
+err:
+ kvm_host_put_ptdump_info(snapshot);
+ return NULL;
+}
+
void kvm_ptdump_register_host(void)
{
if (!is_protected_kvm_enabled())
return;

+ host_reg.get_ptdump_info = kvm_host_get_ptdump_info;
+ host_reg.put_ptdump_info = kvm_host_put_ptdump_info;
+
kvm_ptdump_debugfs_register(&host_reg, "host_page_tables",
kvm_debugfs_dir);
}
--
2.43.0.472.g3155946c3a-goog


2023-12-19 11:46:08

by Sebastian Ene

Subject: Re: [PATCH v4 03/10] KVM: arm64: Invoke the snapshot interface for the host stage-2 pagetable

On Mon, Dec 18, 2023 at 01:58:53PM +0000, Sebastian Ene wrote:
> Allocate memory for the snapshot by creating a memory cache with empty
> pages that will be used by the hypervisor during the page table copy.
> Get the required size of the PGD and allocate physically contiguous
> memory for it. Allocate contiguous memory for an array that is used to
> keep track of the pages used from the memcache. Call the snapshot
> interface and release the memory for the snapshot.
>
> Signed-off-by: Sebastian Ene <[email protected]>
> ---
> arch/arm64/kvm/ptdump.c | 107 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 107 insertions(+)
>
> diff --git a/arch/arm64/kvm/ptdump.c b/arch/arm64/kvm/ptdump.c
> index 5816fc632..e99bab427 100644
> --- a/arch/arm64/kvm/ptdump.c
> +++ b/arch/arm64/kvm/ptdump.c
> @@ -25,6 +25,9 @@ static int kvm_ptdump_open(struct inode *inode, struct file *file);
> static int kvm_ptdump_release(struct inode *inode, struct file *file);
> static int kvm_ptdump_show(struct seq_file *m, void *);
>
> +static phys_addr_t get_host_pa(void *addr);
> +static void *get_host_va(phys_addr_t pa);
> +
> static const struct file_operations kvm_ptdump_fops = {
> .open = kvm_ptdump_open,
> .read = seq_read,
> @@ -32,6 +35,11 @@ static const struct file_operations kvm_ptdump_fops = {
> .release = kvm_ptdump_release,
> };
>
> +static struct kvm_pgtable_mm_ops ptdump_host_mmops = {
> + .phys_to_virt = get_host_va,
> + .virt_to_phys = get_host_pa,
> +};
> +
> static int kvm_ptdump_open(struct inode *inode, struct file *file)
> {
> struct kvm_ptdump_register *reg = inode->i_private;
> @@ -78,11 +86,110 @@ static void kvm_ptdump_debugfs_register(struct kvm_ptdump_register *reg,
>
> static struct kvm_ptdump_register host_reg;
>
> +static size_t host_stage2_get_pgd_len(void)
> +{
> + u32 phys_shift = get_kvm_ipa_limit();
> + u64 vtcr = kvm_get_vtcr(read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1),
> + read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1),
> + phys_shift);
> + return (kvm_pgtable_stage2_pgd_size(vtcr) >> PAGE_SHIFT);
> +}
> +
> +static phys_addr_t get_host_pa(void *addr)
> +{
> + return __pa(addr);
> +}
> +
> +static void *get_host_va(phys_addr_t pa)
> +{
> + return __va(pa);
> +}
> +
> +static void kvm_host_put_ptdump_info(void *snap)
> +{
> + void *mc_page;
> + size_t i;
> + struct kvm_pgtable_snapshot *snapshot;
> +
> + if (!snap)
> + return;
> +
> + snapshot = snap;
> + while ((mc_page = pop_hyp_memcache(&snapshot->mc, get_host_va)) != NULL)
> + free_page((unsigned long)mc_page);
> +
> + if (snapshot->pgd_hva)
> + free_pages_exact(snapshot->pgd_hva, snapshot->pgd_pages);
> +
> + if (snapshot->used_pages_hva) {
> + for (i = 0; i < snapshot->used_pages_indx; i++) {
> + mc_page = get_host_va(snapshot->used_pages_hva[i]);
> + free_page((unsigned long)mc_page);
> + }
> +
> + free_pages_exact(snapshot->used_pages_hva, snapshot->num_used_pages);
> + }
> +
> + free_page((unsigned long)snapshot);
> +}
> +
> +static void *kvm_host_get_ptdump_info(struct kvm_ptdump_register *reg)
> +{
> + int i, ret;
> + void *mc_page;
> + struct kvm_pgtable_snapshot *snapshot;
> + size_t memcache_len;
> +
> + snapshot = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> + if (!snapshot)
> + return NULL;
> +
> + memset(snapshot, 0, sizeof(struct kvm_pgtable_snapshot));
> +
> + snapshot->pgd_pages = host_stage2_get_pgd_len();
> + snapshot->pgd_hva = alloc_pages_exact(snapshot->pgd_pages, GFP_KERNEL_ACCOUNT);

The allocation size should be specified in bytes here; alloc_pages_exact()
expects a size in bytes, not a number of pages.
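
Something along these lines perhaps (sketch only; free_pages_exact() in
kvm_host_put_ptdump_info() also takes a size in bytes, so the matching
free would need the same treatment):

	snapshot->pgd_pages = host_stage2_get_pgd_len();
	snapshot->pgd_hva = alloc_pages_exact(snapshot->pgd_pages * PAGE_SIZE,
					      GFP_KERNEL_ACCOUNT);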

> + if (!snapshot->pgd_hva)
> + goto err;
> +
> + memcache_len = (size_t)reg->priv;
> + for (i = 0; i < memcache_len; i++) {
> + mc_page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> + if (!mc_page)
> + goto err;
> +
> + push_hyp_memcache(&snapshot->mc, mc_page, get_host_pa);
> + }
> +
> + snapshot->num_used_pages = DIV_ROUND_UP(sizeof(phys_addr_t) * memcache_len,
> + PAGE_SIZE);
> + snapshot->used_pages_hva = alloc_pages_exact(snapshot->num_used_pages,
> + GFP_KERNEL_ACCOUNT);

Same as the previous comment: the size passed to alloc_pages_exact()
should be in bytes here as well.
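
i.e. something like (again just a sketch, keeping num_used_pages as a
page count):

	snapshot->used_pages_hva = alloc_pages_exact(snapshot->num_used_pages * PAGE_SIZE,
						     GFP_KERNEL_ACCOUNT);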

> + if (!snapshot->used_pages_hva)
> + goto err;
> +
> + ret = kvm_call_hyp_nvhe(__pkvm_host_stage2_snapshot, snapshot);
> + if (ret) {
> + pr_err("ERROR %d snapshot host pagetables\n", ret);
> + goto err;
> + }
> +
> + snapshot->pgtable.pgd = get_host_va((phys_addr_t)snapshot->pgtable.pgd);
> + snapshot->pgtable.mm_ops = &ptdump_host_mmops;
> +
> + return snapshot;
> +err:
> + kvm_host_put_ptdump_info(snapshot);
> + return NULL;
> +}
> +
> void kvm_ptdump_register_host(void)
> {
> if (!is_protected_kvm_enabled())
> return;
>
> + host_reg.get_ptdump_info = kvm_host_get_ptdump_info;
> + host_reg.put_ptdump_info = kvm_host_put_ptdump_info;
> +
> kvm_ptdump_debugfs_register(&host_reg, "host_page_tables",
> kvm_debugfs_dir);
> }
> --
> 2.43.0.472.g3155946c3a-goog
>

2023-12-21 18:37:02

by Oliver Upton

Subject: Re: [PATCH v4 00/10] arm64: ptdump: View the second stage page-tables

Hi Sebastian,

I'm going to try and review the series a bit more after the holidays,
but in general this is unnecessarily complex. Furthermore, building out
the feature for pKVM *first* rather than 'normal' VMs makes it all very
challenging to follow.

I would strongly prefer that this series be split in two, adding support
for 'normal' VMs with a follow-up series for getting the pKVM plumbing
in place for host stage-2.

--
Thanks,
Oliver

2023-12-21 18:42:07

by Sebastian Ene

Subject: Re: [PATCH v4 00/10] arm64: ptdump: View the second stage page-tables

On Thu, Dec 21, 2023 at 06:36:28PM +0000, Oliver Upton wrote:

Hello Oliver,

> Hi Sebastian,
>
> I'm going to try and review the series a bit more after the holidays,
> but in general this is unnecessarily complex. Furthermore, building out
> the feature for pKVM *first* rather than 'normal' VMs makes it all very
> challenging to follow.
>

Thanks for having a look. This sounds good, I can split the series into
two after the holidays and push a new version, because I am really not
happy with the current state of the patches (especially because I
discovered some issues after I submitted them on the list).


> I would strongly prefer that this series be split in two, adding support
> for 'normal' VMs with a follow-up series for getting the pKVM plumbing
> in place for host stage-2.
>
> --
> Thanks,
> Oliver

Thank you,
Seb