2021-10-19 12:15:31

by Quentin Perret

Subject: [PATCH v2 00/15] KVM: arm64: pkvm: Implement unshare hypercall

Hi all,

This is v2 of the series previously posted here:

https://lore.kernel.org/kvmarm/[email protected]/

This series implements an unshare hypercall at EL2 in nVHE protected
mode, and makes use of it to unmap guest-specific data-structures from
EL2 stage-1 during guest tear-down. Crucially, the implementation of the
share and unshare routines uses page refcounts in the host kernel to
avoid accidentally unmapping data-structures that overlap a common page.

This series has two main benefits. Firstly, it allows EL2 to track the
state of shared pages cleanly, as they can now transition from SHARED
back to OWNED. This will simplify permission checks once e.g. pkvm
implements a donation hcall to provide memory to protected guests, as
there should then be no reason for the host to donate a page that is
currently marked shared. Secondly, it avoids leaving dangling mappings
in the hypervisor's stage-1, which is desirable from a security
perspective given that the hypervisor runs with elevated privileges.
Also worth noting is that this refactors the EL2 page-tracking checks
in a more scalable way, which should make it much easier to implement
other memory transitions (host donating memory to a guest, a guest
sharing back with the host, ...) in the future.
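
For reference, with the whole series applied the host-side usage boils
down to the sketch below (simplified; see the individual patches for the
real call sites), with the corresponding page-state transitions tracked
at EL2 shown as comments:

    kvm_share_hyp(vcpu, vcpu + 1);   /* host: OWNED -> SHARED_OWNED     */
                                     /* hyp:  NOPAGE -> SHARED_BORROWED */
    /* ... guest lifetime ... */
    kvm_unshare_hyp(vcpu, vcpu + 1); /* host: SHARED_OWNED -> OWNED     */
                                     /* hyp:  SHARED_BORROWED -> NOPAGE */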

Changes since v1:

- moved the refcounting of pages shared more than once to the host in
order to simplify and optimize the hyp code;

- synchronized lifetime of the vcpu and its parent task struct using
get_task_struct() / put_task_struct();

- rebased on kvmarm/next

- rebased on Marc's v2 refactoring of the first vcpu run:
https://lore.kernel.org/kvmarm/[email protected]

- small improvements/refactoring throughout;

This has been lightly tested on QEMU by spawning and powering off a
guest 50 times. You can find a branch with everything applied here:

https://android-kvm.googlesource.com/linux qperret/hyp-unshare-v2

Thanks!
Quentin


Quentin Perret (7):
KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings()
KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator
KVM: arm64: Refcount hyp stage-1 pgtable pages
KVM: arm64: Fixup hyp stage-1 refcount
KVM: arm64: Introduce kvm_share_hyp()
KVM: arm64: pkvm: Refcount the pages shared with EL2
KVM: arm64: pkvm: Unshare guest structs during teardown

Will Deacon (8):
KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table
KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages
KVM: arm64: Introduce wrappers for host and hyp spin lock accessors
KVM: arm64: Implement do_share() helper for sharing memory
KVM: arm64: Implement __pkvm_host_share_hyp() using do_share()
KVM: arm64: Implement do_unshare() helper for unsharing memory
KVM: arm64: Expose unshare hypercall to the host

arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/include/asm/kvm_mmu.h | 2 +
arch/arm64/include/asm/kvm_pgtable.h | 21 +
arch/arm64/kvm/arm.c | 6 +-
arch/arm64/kvm/fpsimd.c | 33 +-
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 6 +
arch/arm64/kvm/hyp/nvhe/early_alloc.c | 5 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 500 +++++++++++++++---
arch/arm64/kvm/hyp/nvhe/setup.c | 32 +-
arch/arm64/kvm/hyp/pgtable.c | 80 ++-
arch/arm64/kvm/mmu.c | 132 ++++-
arch/arm64/kvm/reset.c | 10 +-
14 files changed, 733 insertions(+), 105 deletions(-)

--
2.33.0.1079.g6e70778dc9-goog


2021-10-19 12:15:40

by Quentin Perret

Subject: [PATCH v2 02/15] KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator

In nVHE protected mode, the EL2 code uses a temporary allocator during
boot while re-creating its stage-1 page-table. Unfortunately, the
hyp_vmemmap is not ready to use at this stage, so refcounting pages
is not possible. That is not currently a problem because hyp stage-1
mappings are never removed, which implies that refcounting of page-table
pages is unnecessary.

In preparation for allowing hypervisor stage-1 mappings to be removed,
provide stub implementations for {get,put}_page() in the early allocator.
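
For context, the page-table code modified later in the series calls these
hooks unconditionally through the mm_ops, along the lines of:

    data->mm_ops->get_page(ptep);

so no-op stubs are needed here rather than leaving the function pointers
NULL while the early allocator is in use.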

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/early_alloc.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/early_alloc.c b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
index 1306c430ab87..00de04153cc6 100644
--- a/arch/arm64/kvm/hyp/nvhe/early_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
@@ -43,6 +43,9 @@ void *hyp_early_alloc_page(void *arg)
return hyp_early_alloc_contig(1);
}

+static void hyp_early_alloc_get_page(void *addr) { }
+static void hyp_early_alloc_put_page(void *addr) { }
+
void hyp_early_alloc_init(void *virt, unsigned long size)
{
base = cur = (unsigned long)virt;
@@ -51,4 +54,6 @@ void hyp_early_alloc_init(void *virt, unsigned long size)
hyp_early_alloc_mm_ops.zalloc_page = hyp_early_alloc_page;
hyp_early_alloc_mm_ops.phys_to_virt = hyp_phys_to_virt;
hyp_early_alloc_mm_ops.virt_to_phys = hyp_virt_to_phys;
+ hyp_early_alloc_mm_ops.get_page = hyp_early_alloc_get_page;
+ hyp_early_alloc_mm_ops.put_page = hyp_early_alloc_put_page;
}
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:15:53

by Quentin Perret

Subject: [PATCH v2 03/15] KVM: arm64: Refcount hyp stage-1 pgtable pages

To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.
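
Assuming page-table pages come out of the allocator with an initial
refcount of 1 (as is the case for the hyp buddy allocator), the resulting
invariant is roughly:

    page_count(page-table page) == 1 + number of valid entries it contains

which is what will later allow an unmap operation to detect and free a
page-table page once its last entry has been cleared.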

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f8ceebe4982e..768a58835153 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -408,8 +408,10 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
return false;

new = kvm_init_valid_leaf_pte(phys, data->attr, level);
- if (hyp_pte_needs_update(old, new))
+ if (hyp_pte_needs_update(old, new)) {
smp_store_release(ptep, new);
+ data->mm_ops->get_page(ptep);
+ }

data->phys += granule;
return true;
@@ -433,6 +435,7 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
return -ENOMEM;

kvm_set_table_pte(ptep, childp, mm_ops);
+ mm_ops->get_page(ptep);
return 0;
}

@@ -482,8 +485,16 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
struct kvm_pgtable_mm_ops *mm_ops = arg;
+ kvm_pte_t pte = *ptep;
+
+ if (!kvm_pte_valid(pte))
+ return 0;
+
+ mm_ops->put_page(ptep);
+
+ if (kvm_pte_table(pte, level))
+ mm_ops->put_page(kvm_pte_follow(pte, mm_ops));

- mm_ops->put_page((void *)kvm_pte_follow(*ptep, mm_ops));
return 0;
}

@@ -491,7 +502,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
{
struct kvm_pgtable_walker walker = {
.cb = hyp_free_walker,
- .flags = KVM_PGTABLE_WALK_TABLE_POST,
+ .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
.arg = pgt->mm_ops,
};

--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:16:03

by Quentin Perret

Subject: [PATCH v2 04/15] KVM: arm64: Fixup hyp stage-1 refcount

In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and
running. This also requires moving the finalize_host_mappings() call
after pkvm_pgtable.mm_ops has been switched to the buddy-allocator-backed
ops, so that the walker has a functional get_page() callback to work with.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/setup.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 862c7b514e20..9e5c92237a51 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -167,12 +167,22 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
{
enum kvm_pgtable_prot prot;
enum pkvm_page_state state;
+ struct kvm_pgtable_mm_ops *mm_ops = arg;
kvm_pte_t pte = *ptep;
phys_addr_t phys;

if (!kvm_pte_valid(pte))
return 0;

+ /*
+ * Fix-up the refcount for the page-table pages as the early allocator
+ * was unable to access the hyp_vmemmap and so the buddy allocator has
+ * initialised the refcount to '1'.
+ */
+ mm_ops->get_page(ptep);
+ if (flag != KVM_PGTABLE_WALK_LEAF)
+ return 0;
+
if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
return -EINVAL;

@@ -205,7 +215,8 @@ static int finalize_host_mappings(void)
{
struct kvm_pgtable_walker walker = {
.cb = finalize_host_mappings_walker,
- .flags = KVM_PGTABLE_WALK_LEAF,
+ .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+ .arg = pkvm_pgtable.mm_ops,
};

return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits), &walker);
@@ -230,19 +241,19 @@ void __noreturn __pkvm_init_finalise(void)
if (ret)
goto out;

- ret = finalize_host_mappings();
- if (ret)
- goto out;
-
pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) {
- .zalloc_page = hyp_zalloc_hyp_page,
- .phys_to_virt = hyp_phys_to_virt,
- .virt_to_phys = hyp_virt_to_phys,
- .get_page = hpool_get_page,
- .put_page = hpool_put_page,
+ .zalloc_page = hyp_zalloc_hyp_page,
+ .phys_to_virt = hyp_phys_to_virt,
+ .virt_to_phys = hyp_virt_to_phys,
+ .get_page = hpool_get_page,
+ .put_page = hpool_put_page,
};
pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;

+ ret = finalize_host_mappings();
+ if (ret)
+ goto out;
+
out:
/*
* We tail-called to here from handle___pkvm_init() and will not return,
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:16:03

by Quentin Perret

Subject: [PATCH v2 07/15] KVM: arm64: Introduce kvm_share_hyp()

The create_hyp_mappings() function can currently be called at any point
in time. However, its behaviour in protected mode differs widely
depending on when it is called. Prior to KVM init, it is used to
create the temporary page-table used to bring up the hypervisor, and
later on it is transparently turned into a 'share' hypercall when the
kernel has lost control over the hypervisor stage-1. In order to prepare
the ground for also unsharing pages with the hypervisor during guest
teardown, introduce a kvm_share_hyp() function to make it clear in which
places a share hypercall should be expected, as we will soon need a
matching unshare hypercall in all those places.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_mmu.h | 1 +
arch/arm64/kvm/arm.c | 4 ++--
arch/arm64/kvm/fpsimd.c | 4 ++--
arch/arm64/kvm/mmu.c | 19 +++++++++++++------
arch/arm64/kvm/reset.c | 2 +-
5 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 02d378887743..185d0f62b724 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -150,6 +150,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
#include <asm/kvm_pgtable.h>
#include <asm/stage2_pgtable.h>

+int kvm_share_hyp(void *from, void *to);
int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
void __iomem **kaddr,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e1cc3f8560e7..5312d5a2ef69 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -146,7 +146,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret)
return ret;

- ret = create_hyp_mappings(kvm, kvm + 1, PAGE_HYP);
+ ret = kvm_share_hyp(kvm, kvm + 1);
if (ret)
goto out_free_stage2_pgd;

@@ -343,7 +343,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
if (err)
return err;

- return create_hyp_mappings(vcpu, vcpu + 1, PAGE_HYP);
+ return kvm_share_hyp(vcpu, vcpu + 1);
}

void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 62c0d78da7be..2fe1128d9f3d 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -35,11 +35,11 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
* Make sure the host task thread flags and fpsimd state are
* visible to hyp:
*/
- ret = create_hyp_mappings(ti, ti + 1, PAGE_HYP);
+ ret = kvm_share_hyp(ti, ti + 1);
if (ret)
goto error;

- ret = create_hyp_mappings(fpsimd, fpsimd + 1, PAGE_HYP);
+ ret = kvm_share_hyp(fpsimd, fpsimd + 1);
if (ret)
goto error;

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0019b2309f70..0cc4b295e525 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -299,6 +299,17 @@ static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
return 0;
}

+int kvm_share_hyp(void *from, void *to)
+{
+ if (is_kernel_in_hyp_mode())
+ return 0;
+
+ if (kvm_host_owns_hyp_mappings())
+ return create_hyp_mappings(from, to, PAGE_HYP);
+
+ return pkvm_share_hyp(kvm_kaddr_to_phys(from), kvm_kaddr_to_phys(to));
+}
+
/**
* create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode
* @from: The virtual kernel start address of the range
@@ -319,12 +330,8 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
if (is_kernel_in_hyp_mode())
return 0;

- if (!kvm_host_owns_hyp_mappings()) {
- if (WARN_ON(prot != PAGE_HYP))
- return -EPERM;
- return pkvm_share_hyp(kvm_kaddr_to_phys(from),
- kvm_kaddr_to_phys(to));
- }
+ if (!kvm_host_owns_hyp_mappings())
+ return -EPERM;

start = start & PAGE_MASK;
end = PAGE_ALIGN(end);
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 59e4c6e819e9..cf781ec9212a 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -113,7 +113,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
if (!buf)
return -ENOMEM;

- ret = create_hyp_mappings(buf, buf + reg_sz, PAGE_HYP);
+ ret = kvm_share_hyp(buf, buf + reg_sz);
if (ret) {
kfree(buf);
return ret;
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:16:12

by Quentin Perret

Subject: [PATCH v2 08/15] KVM: arm64: pkvm: Refcount the pages shared with EL2

In order to simplify the page tracking infrastructure at EL2 in nVHE
protected mode, move the responsibility of refcounting pages that are
shared multiple times to the host. To do so, let's create a red-black
tree tracking all the PFNs that have been shared, along with a refcount.
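
The resulting behaviour for a page shared twice is sketched below (the
matching unshare_pfn_hyp() helper only appears later in the series); the
hypercall is only issued for the first share and the last unshare:

    share_pfn_hyp(pfn);    /* count 0 -> 1, __pkvm_host_share_hyp issued   */
    share_pfn_hyp(pfn);    /* count 1 -> 2, no hypercall                   */
    unshare_pfn_hyp(pfn);  /* count 2 -> 1, no hypercall                   */
    unshare_pfn_hyp(pfn);  /* count 1 -> 0, __pkvm_host_unshare_hyp issued */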

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/mmu.c | 78 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 68 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0cc4b295e525..f0c16ed95974 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -284,30 +284,88 @@ static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
}
}

-static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
+struct hyp_shared_pfn {
+ u64 pfn;
+ int count;
+ struct rb_node node;
+};
+
+static DEFINE_MUTEX(hyp_shared_pfns_lock);
+static struct rb_root hyp_shared_pfns = RB_ROOT;
+
+static struct hyp_shared_pfn *find_shared_pfn(u64 pfn, struct rb_node ***node,
+ struct rb_node **parent)
{
- phys_addr_t addr;
- int ret;
+ struct hyp_shared_pfn *this;
+
+ *node = &hyp_shared_pfns.rb_node;
+ *parent = NULL;
+ while (**node) {
+ this = container_of(**node, struct hyp_shared_pfn, node);
+ *parent = **node;
+ if (this->pfn < pfn)
+ *node = &((**node)->rb_left);
+ else if (this->pfn > pfn)
+ *node = &((**node)->rb_right);
+ else
+ return this;
+ }

- for (addr = ALIGN_DOWN(start, PAGE_SIZE); addr < end; addr += PAGE_SIZE) {
- ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
- __phys_to_pfn(addr));
- if (ret)
- return ret;
+ return NULL;
+}
+
+static int share_pfn_hyp(u64 pfn)
+{
+ struct rb_node **node, *parent;
+ struct hyp_shared_pfn *this;
+ int ret = 0;
+
+ mutex_lock(&hyp_shared_pfns_lock);
+ this = find_shared_pfn(pfn, &node, &parent);
+ if (this) {
+ this->count++;
+ goto unlock;
}

- return 0;
+ this = kzalloc(sizeof(*this), GFP_KERNEL);
+ if (!this) {
+ ret = -ENOMEM;
+ goto unlock;
+ }
+
+ this->pfn = pfn;
+ this->count = 1;
+ rb_link_node(&this->node, parent, node);
+ rb_insert_color(&this->node, &hyp_shared_pfns);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp, pfn, 1);
+unlock:
+ mutex_unlock(&hyp_shared_pfns_lock);
+
+ return ret;
}

int kvm_share_hyp(void *from, void *to)
{
+ phys_addr_t start, end, cur;
+ u64 pfn;
+ int ret;
+
if (is_kernel_in_hyp_mode())
return 0;

if (kvm_host_owns_hyp_mappings())
return create_hyp_mappings(from, to, PAGE_HYP);

- return pkvm_share_hyp(kvm_kaddr_to_phys(from), kvm_kaddr_to_phys(to));
+ start = ALIGN_DOWN(kvm_kaddr_to_phys(from), PAGE_SIZE);
+ end = PAGE_ALIGN(kvm_kaddr_to_phys(to));
+ for (cur = start; cur < end; cur += PAGE_SIZE) {
+ pfn = __phys_to_pfn(cur);
+ ret = share_pfn_hyp(pfn);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
}

/**
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:16:14

by Quentin Perret

Subject: [PATCH v2 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2

From: Will Deacon <[email protected]>

Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor
stage-1 mappings at EL2.
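
A minimal usage sketch, mirroring how the unshare path later in the
series consumes it; callers are expected to check that the whole range
was actually unmapped:

    u64 unmapped = kvm_pgtable_hyp_unmap(&pkvm_pgtable, addr, size);

    if (unmapped != size)
            return -EFAULT;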

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 21 ++++++++++
arch/arm64/kvm/hyp/pgtable.c | 63 ++++++++++++++++++++++++++++
2 files changed, 84 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 027783829584..9d076f36401d 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -251,6 +251,27 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
enum kvm_pgtable_prot prot);

+/**
+ * kvm_pgtable_hyp_unmap() - Remove a mapping from a hypervisor stage-1 page-table.
+ * @pgt: Page-table structure initialised by kvm_pgtable_hyp_init().
+ * @addr: Virtual address from which to remove the mapping.
+ * @size: Size of the mapping.
+ *
+ * The offset of @addr within a page is ignored, @size is rounded-up to
+ * the next page boundary and @phys is rounded-down to the previous page
+ * boundary.
+ *
+ * TLB invalidation is performed for each page-table entry cleared during the
+ * unmapping operation and the reference count for the page-table page
+ * containing the cleared entry is decremented, with unreferenced pages being
+ * freed. The unmapping operation will stop early if it encounters either an
+ * invalid page-table entry or a valid block mapping which maps beyond the range
+ * being unmapped.
+ *
+ * Return: Number of bytes unmapped, which may be 0.
+ */
+u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
+
/**
* kvm_get_vtcr() - Helper to construct VTCR_EL2
* @mmfr0: Sanitized value of SYS_ID_AA64MMFR0_EL1 register.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 768a58835153..6ad4cb2d6947 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -463,6 +463,69 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
return ret;
}

+struct hyp_unmap_data {
+ u64 unmapped;
+ struct kvm_pgtable_mm_ops *mm_ops;
+};
+
+static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+ enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+ kvm_pte_t pte = *ptep, *childp = NULL;
+ u64 granule = kvm_granule_size(level);
+ struct hyp_unmap_data *data = arg;
+ struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+
+ if (!kvm_pte_valid(pte))
+ return -EINVAL;
+
+ if (kvm_pte_table(pte, level)) {
+ childp = kvm_pte_follow(pte, mm_ops);
+
+ if (mm_ops->page_count(childp) != 1)
+ return 0;
+
+ kvm_clear_pte(ptep);
+ dsb(ishst);
+ __tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
+ } else {
+ if (end - addr < granule)
+ return -EINVAL;
+
+ kvm_clear_pte(ptep);
+ dsb(ishst);
+ __tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
+ data->unmapped += granule;
+ }
+
+ dsb(ish);
+ isb();
+ mm_ops->put_page(ptep);
+
+ if (childp)
+ mm_ops->put_page(childp);
+
+ return 0;
+}
+
+u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+ struct hyp_unmap_data unmap_data = {
+ .mm_ops = pgt->mm_ops,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = hyp_unmap_walker,
+ .arg = &unmap_data,
+ .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+ };
+
+ if (!pgt->mm_ops->page_count)
+ return 0;
+
+ kvm_pgtable_walk(pgt, addr, size, &walker);
+ return unmap_data.unmapped;
+}
+
int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
struct kvm_pgtable_mm_ops *mm_ops)
{
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:16:24

by Quentin Perret

Subject: [PATCH v2 09/15] KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages

From: Will Deacon <[email protected]>

Explicitly name the combination of SW0 | SW1 as reserved in the pte and
introduce a new PKVM_NOPAGE meta-state which, although not directly
stored in the software bits of the pte, can be used to represent an
entry for which there is no underlying page. This is distinct from an
invalid pte, as stage-2 identity mappings for the host are created
lazily and so an invalid pte there is the same as a valid mapping for
the purposes of ownership information.

This state will be used for permission checking during page transitions
in later patches.
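
For illustration, using the existing pkvm_mkstate()/pkvm_getstate()
helpers, the encodable states live in the two software bits of the
mapping prot, while PKVM_NOPAGE is a meta-state that is never written to
the pte's SW bits:

    prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED); /* sets SW1 */
    state = pkvm_getstate(prot);          /* == PKVM_PAGE_SHARED_BORROWED */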

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index b58c910babaf..56445586c755 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -24,6 +24,11 @@ enum pkvm_page_state {
PKVM_PAGE_OWNED = 0ULL,
PKVM_PAGE_SHARED_OWNED = KVM_PGTABLE_PROT_SW0,
PKVM_PAGE_SHARED_BORROWED = KVM_PGTABLE_PROT_SW1,
+ __PKVM_PAGE_RESERVED = KVM_PGTABLE_PROT_SW0 |
+ KVM_PGTABLE_PROT_SW1,
+
+ /* Meta-states which aren't encoded directly in the PTE's SW bits */
+ PKVM_NOPAGE,
};

#define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:16:53

by Quentin Perret

Subject: [PATCH v2 10/15] KVM: arm64: Introduce wrappers for host and hyp spin lock accessors

From: Will Deacon <[email protected]>

In preparation for adding additional locked sections for manipulating
page-tables at EL2, introduce some simple wrappers around the host and
hypervisor locks so that it's a bit easier to read and a bit more
difficult to take the wrong lock (or even take them in the wrong order).
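
For the record, the intended nesting is host lock first, hyp lock second,
released in the reverse order, i.e.:

    host_lock_component();
    hyp_lock_component();
    /* ... inspect/update both page-tables ... */
    hyp_unlock_component();
    host_unlock_component();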

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 32 ++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1f0fb0780622..c111e0fcfa0a 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -27,6 +27,26 @@ static struct hyp_pool host_s2_pool;

const u8 pkvm_hyp_id = 1;

+static void host_lock_component(void)
+{
+ hyp_spin_lock(&host_kvm.lock);
+}
+
+static void host_unlock_component(void)
+{
+ hyp_spin_unlock(&host_kvm.lock);
+}
+
+static void hyp_lock_component(void)
+{
+ hyp_spin_lock(&pkvm_pgd_lock);
+}
+
+static void hyp_unlock_component(void)
+{
+ hyp_spin_unlock(&pkvm_pgd_lock);
+}
+
static void *host_s2_zalloc_pages_exact(size_t size)
{
return hyp_alloc_pages(&host_s2_pool, get_order(size));
@@ -327,14 +347,14 @@ static int host_stage2_idmap(u64 addr)

prot = is_memory ? PKVM_HOST_MEM_PROT : PKVM_HOST_MMIO_PROT;

- hyp_spin_lock(&host_kvm.lock);
+ host_lock_component();
ret = host_stage2_adjust_range(addr, &range);
if (ret)
goto unlock;

ret = host_stage2_idmap_locked(range.start, range.end - range.start, prot);
unlock:
- hyp_spin_unlock(&host_kvm.lock);
+ host_unlock_component();

return ret;
}
@@ -358,8 +378,8 @@ int __pkvm_host_share_hyp(u64 pfn)
if (!addr_is_memory(addr))
return -EINVAL;

- hyp_spin_lock(&host_kvm.lock);
- hyp_spin_lock(&pkvm_pgd_lock);
+ host_lock_component();
+ hyp_lock_component();

ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, NULL);
if (ret)
@@ -421,8 +441,8 @@ int __pkvm_host_share_hyp(u64 pfn)
BUG_ON(ret);

unlock:
- hyp_spin_unlock(&pkvm_pgd_lock);
- hyp_spin_unlock(&host_kvm.lock);
+ hyp_unlock_component();
+ host_unlock_component();

return ret;
}
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:16:54

by Quentin Perret

Subject: [PATCH v2 14/15] KVM: arm64: Expose unshare hypercall to the host

From: Will Deacon <[email protected]>

Introduce an unshare hypercall which can be used to unmap memory from
the hypervisor stage-1 in nVHE protected mode. This will be useful to
update the EL2 ownership state of pages during guest teardown, and
avoids keeping dangling mappings to unreferenced portions of memory.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 +++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 33 +++++++++++++++++++
4 files changed, 43 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 0167f3702c61..59c55aedb0ae 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -63,6 +63,7 @@ enum __kvm_host_smccc_func {

/* Hypercalls available after pKVM finalisation */
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 56445586c755..80e99836eac7 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -55,6 +55,7 @@ extern const u8 pkvm_hyp_id;

int __pkvm_prot_finalize(void);
int __pkvm_host_share_hyp(u64 pfn);
+int __pkvm_host_unshare_hyp(u64 pfn);

bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index b096bf009144..5e2197db0d32 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -147,6 +147,13 @@ static void handle___pkvm_host_share_hyp(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_host_share_hyp(pfn);
}

+static void handle___pkvm_host_unshare_hyp(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(u64, pfn, host_ctxt, 1);
+
+ cpu_reg(host_ctxt, 1) = __pkvm_host_unshare_hyp(pfn);
+}
+
static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
@@ -184,6 +191,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_prot_finalize),

HANDLE_FUNC(__pkvm_host_share_hyp),
+ HANDLE_FUNC(__pkvm_host_unshare_hyp),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 205616c9f107..fc542585a964 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -757,3 +757,36 @@ int __pkvm_host_share_hyp(u64 pfn)

return ret;
}
+
+int __pkvm_host_unshare_hyp(u64 pfn)
+{
+ int ret;
+ u64 host_addr = hyp_pfn_to_phys(pfn);
+ u64 hyp_addr = (u64)__hyp_va(host_addr);
+ struct pkvm_mem_share share = {
+ .tx = {
+ .nr_pages = 1,
+ .initiator = {
+ .id = PKVM_ID_HOST,
+ .addr = host_addr,
+ .host = {
+ .completer_addr = hyp_addr,
+ },
+ },
+ .completer = {
+ .id = PKVM_ID_HYP,
+ },
+ },
+ .prot = PAGE_HYP,
+ };
+
+ host_lock_component();
+ hyp_lock_component();
+
+ ret = do_unshare(&share);
+
+ hyp_unlock_component();
+ host_unlock_component();
+
+ return ret;
+}
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:17:15

by Quentin Perret

Subject: [PATCH v2 13/15] KVM: arm64: Implement do_unshare() helper for unsharing memory

From: Will Deacon <[email protected]>

Tearing down a previously shared memory region results in the borrower
losing access to the underlying pages and returning them to the "owned"
state in the owner.

Implement a do_unshare() helper, along the same lines as do_share(), to
provide this functionality for the host-to-hyp case.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 115 ++++++++++++++++++++++++++
1 file changed, 115 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index ab1732c38b3b..205616c9f107 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -474,6 +474,16 @@ static int host_request_owned_transition(u64 *completer_addr,
return __host_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
}

+static int host_request_unshare(u64 *completer_addr,
+ const struct pkvm_mem_transition *tx)
+{
+ u64 size = tx->nr_pages * PAGE_SIZE;
+ u64 addr = tx->initiator.addr;
+
+ *completer_addr = tx->initiator.host.completer_addr;
+ return __host_check_page_state_range(addr, size, PKVM_PAGE_SHARED_OWNED);
+}
+
static int host_initiate_share(u64 *completer_addr,
const struct pkvm_mem_transition *tx)
{
@@ -484,6 +494,16 @@ static int host_initiate_share(u64 *completer_addr,
return __host_set_page_state_range(addr, size, PKVM_PAGE_SHARED_OWNED);
}

+static int host_initiate_unshare(u64 *completer_addr,
+ const struct pkvm_mem_transition *tx)
+{
+ u64 size = tx->nr_pages * PAGE_SIZE;
+ u64 addr = tx->initiator.addr;
+
+ *completer_addr = tx->initiator.host.completer_addr;
+ return __host_set_page_state_range(addr, size, PKVM_PAGE_OWNED);
+}
+
static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
{
if (!kvm_pte_valid(pte))
@@ -524,6 +544,17 @@ static int hyp_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
}

+static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
+{
+ u64 size = tx->nr_pages * PAGE_SIZE;
+
+ if (__hyp_ack_skip_pgtable_check(tx))
+ return 0;
+
+ return __hyp_check_page_state_range(addr, size,
+ PKVM_PAGE_SHARED_BORROWED);
+}
+
static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
enum kvm_pgtable_prot perms)
{
@@ -534,6 +565,14 @@ static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
return pkvm_create_mappings_locked(start, end, prot);
}

+static int hyp_complete_unshare(u64 addr, const struct pkvm_mem_transition *tx)
+{
+ u64 size = tx->nr_pages * PAGE_SIZE;
+ int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, addr, size);
+
+ return (ret != size) ? -EFAULT : 0;
+}
+
static int check_share(struct pkvm_mem_share *share)
{
const struct pkvm_mem_transition *tx = &share->tx;
@@ -610,6 +649,82 @@ static int do_share(struct pkvm_mem_share *share)
return WARN_ON(__do_share(share));
}

+static int check_unshare(struct pkvm_mem_share *share)
+{
+ const struct pkvm_mem_transition *tx = &share->tx;
+ u64 completer_addr;
+ int ret;
+
+ switch (tx->initiator.id) {
+ case PKVM_ID_HOST:
+ ret = host_request_unshare(&completer_addr, tx);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ if (ret)
+ return ret;
+
+ switch (tx->completer.id) {
+ case PKVM_ID_HYP:
+ ret = hyp_ack_unshare(completer_addr, tx);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+static int __do_unshare(struct pkvm_mem_share *share)
+{
+ const struct pkvm_mem_transition *tx = &share->tx;
+ u64 completer_addr;
+ int ret;
+
+ switch (tx->initiator.id) {
+ case PKVM_ID_HOST:
+ ret = host_initiate_unshare(&completer_addr, tx);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ if (ret)
+ return ret;
+
+ switch (tx->completer.id) {
+ case PKVM_ID_HYP:
+ ret = hyp_complete_unshare(completer_addr, tx);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+/*
+ * do_unshare():
+ *
+ * The page owner revokes access from another component for a range of
+ * pages which were previously shared using do_share().
+ *
+ * Initiator: SHARED_OWNED => OWNED
+ * Completer: SHARED_BORROWED => NOPAGE
+ */
+static int do_unshare(struct pkvm_mem_share *share)
+{
+ int ret;
+
+ ret = check_unshare(share);
+ if (ret)
+ return ret;
+
+ return WARN_ON(__do_unshare(share));
+}
+
int __pkvm_host_share_hyp(u64 pfn)
{
int ret;
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:17:54

by Quentin Perret

Subject: [PATCH v2 12/15] KVM: arm64: Implement __pkvm_host_share_hyp() using do_share()

From: Will Deacon <[email protected]>

__pkvm_host_share_hyp() shares memory between the host and the
hypervisor so implement it as an invocation of the new do_share()
mechanism.

Note that double-sharing is no longer permitted (as this allows us to
reduce the number of page-table walks significantly), but is thankfully
no longer relied upon by the host.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 121 +++++++-------------------
1 file changed, 33 insertions(+), 88 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index d87cdd53dd21..ab1732c38b3b 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -359,94 +359,6 @@ static int host_stage2_idmap(u64 addr)
return ret;
}

-static inline bool check_prot(enum kvm_pgtable_prot prot,
- enum kvm_pgtable_prot required,
- enum kvm_pgtable_prot denied)
-{
- return (prot & (required | denied)) == required;
-}
-
-int __pkvm_host_share_hyp(u64 pfn)
-{
- phys_addr_t addr = hyp_pfn_to_phys(pfn);
- enum kvm_pgtable_prot prot, cur;
- void *virt = __hyp_va(addr);
- enum pkvm_page_state state;
- kvm_pte_t pte;
- int ret;
-
- if (!addr_is_memory(addr))
- return -EINVAL;
-
- host_lock_component();
- hyp_lock_component();
-
- ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, NULL);
- if (ret)
- goto unlock;
- if (!pte)
- goto map_shared;
-
- /*
- * Check attributes in the host stage-2 PTE. We need the page to be:
- * - mapped RWX as we're sharing memory;
- * - not borrowed, as that implies absence of ownership.
- * Otherwise, we can't let it got through
- */
- cur = kvm_pgtable_stage2_pte_prot(pte);
- prot = pkvm_mkstate(0, PKVM_PAGE_SHARED_BORROWED);
- if (!check_prot(cur, PKVM_HOST_MEM_PROT, prot)) {
- ret = -EPERM;
- goto unlock;
- }
-
- state = pkvm_getstate(cur);
- if (state == PKVM_PAGE_OWNED)
- goto map_shared;
-
- /*
- * Tolerate double-sharing the same page, but this requires
- * cross-checking the hypervisor stage-1.
- */
- if (state != PKVM_PAGE_SHARED_OWNED) {
- ret = -EPERM;
- goto unlock;
- }
-
- ret = kvm_pgtable_get_leaf(&pkvm_pgtable, (u64)virt, &pte, NULL);
- if (ret)
- goto unlock;
-
- /*
- * If the page has been shared with the hypervisor, it must be
- * already mapped as SHARED_BORROWED in its stage-1.
- */
- cur = kvm_pgtable_hyp_pte_prot(pte);
- prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
- if (!check_prot(cur, prot, ~prot))
- ret = -EPERM;
- goto unlock;
-
-map_shared:
- /*
- * If the page is not yet shared, adjust mappings in both page-tables
- * while both locks are held.
- */
- prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
- ret = pkvm_create_mappings_locked(virt, virt + PAGE_SIZE, prot);
- BUG_ON(ret);
-
- prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
- ret = host_stage2_idmap_locked(addr, PAGE_SIZE, prot);
- BUG_ON(ret);
-
-unlock:
- hyp_unlock_component();
- host_unlock_component();
-
- return ret;
-}
-
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu_fault_info fault;
@@ -697,3 +609,36 @@ static int do_share(struct pkvm_mem_share *share)

return WARN_ON(__do_share(share));
}
+
+int __pkvm_host_share_hyp(u64 pfn)
+{
+ int ret;
+ u64 host_addr = hyp_pfn_to_phys(pfn);
+ u64 hyp_addr = (u64)__hyp_va(host_addr);
+ struct pkvm_mem_share share = {
+ .tx = {
+ .nr_pages = 1,
+ .initiator = {
+ .id = PKVM_ID_HOST,
+ .addr = host_addr,
+ .host = {
+ .completer_addr = hyp_addr,
+ },
+ },
+ .completer = {
+ .id = PKVM_ID_HYP,
+ },
+ },
+ .prot = PAGE_HYP,
+ };
+
+ host_lock_component();
+ hyp_lock_component();
+
+ ret = do_share(&share);
+
+ hyp_unlock_component();
+ host_unlock_component();
+
+ return ret;
+}
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:18:34

by Quentin Perret

Subject: [PATCH v2 11/15] KVM: arm64: Implement do_share() helper for sharing memory

From: Will Deacon <[email protected]>

By default, protected KVM isolates memory pages so that they are
accessible only to their owner: be it the host kernel, the hypervisor
at EL2 or (in future) the guest. Establishing shared-memory regions
between these components therefore involves a transition for each page
so that the owner can share memory with a borrower under a certain set
of permissions.

Introduce a do_share() helper for safely sharing a memory region between
two components. Currently, only host-to-hyp sharing is implemented, but
the code is easily extended to handle other combinations and the
permission checks for each component are reusable.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 237 ++++++++++++++++++++++++++
1 file changed, 237 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index c111e0fcfa0a..d87cdd53dd21 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -460,3 +460,240 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
ret = host_stage2_idmap(addr);
BUG_ON(ret && ret != -EAGAIN);
}
+
+/* This corresponds to locking order */
+enum pkvm_component_id {
+ PKVM_ID_HOST,
+ PKVM_ID_HYP,
+};
+
+struct pkvm_mem_transition {
+ u64 nr_pages;
+
+ struct {
+ enum pkvm_component_id id;
+ /* Address in the initiator's address space */
+ u64 addr;
+
+ union {
+ struct {
+ /* Address in the completer's address space */
+ u64 completer_addr;
+ } host;
+ };
+ } initiator;
+
+ struct {
+ enum pkvm_component_id id;
+ } completer;
+};
+
+struct pkvm_mem_share {
+ const struct pkvm_mem_transition tx;
+ const enum kvm_pgtable_prot prot;
+};
+
+struct check_walk_data {
+ enum pkvm_page_state desired;
+ enum pkvm_page_state (*get_page_state)(kvm_pte_t pte);
+};
+
+static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
+ kvm_pte_t *ptep,
+ enum kvm_pgtable_walk_flags flag,
+ void * const arg)
+{
+ struct check_walk_data *d = arg;
+ kvm_pte_t pte = *ptep;
+
+ if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+ return -EINVAL;
+
+ return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
+}
+
+static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ struct check_walk_data *data)
+{
+ struct kvm_pgtable_walker walker = {
+ .cb = __check_page_state_visitor,
+ .arg = data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ return kvm_pgtable_walk(pgt, addr, size, &walker);
+}
+
+static enum pkvm_page_state host_get_page_state(kvm_pte_t pte)
+{
+ if (!kvm_pte_valid(pte) && pte)
+ return PKVM_NOPAGE;
+
+ return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
+}
+
+static int __host_check_page_state_range(u64 addr, u64 size,
+ enum pkvm_page_state state)
+{
+ struct check_walk_data d = {
+ .desired = state,
+ .get_page_state = host_get_page_state,
+ };
+
+ hyp_assert_lock_held(&host_kvm.lock);
+ return check_page_state_range(&host_kvm.pgt, addr, size, &d);
+}
+
+static int __host_set_page_state_range(u64 addr, u64 size,
+ enum pkvm_page_state state)
+{
+ enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
+
+ return host_stage2_idmap_locked(addr, size, prot);
+}
+
+static int host_request_owned_transition(u64 *completer_addr,
+ const struct pkvm_mem_transition *tx)
+{
+ u64 size = tx->nr_pages * PAGE_SIZE;
+ u64 addr = tx->initiator.addr;
+
+ *completer_addr = tx->initiator.host.completer_addr;
+ return __host_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
+}
+
+static int host_initiate_share(u64 *completer_addr,
+ const struct pkvm_mem_transition *tx)
+{
+ u64 size = tx->nr_pages * PAGE_SIZE;
+ u64 addr = tx->initiator.addr;
+
+ *completer_addr = tx->initiator.host.completer_addr;
+ return __host_set_page_state_range(addr, size, PKVM_PAGE_SHARED_OWNED);
+}
+
+static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
+{
+ if (!kvm_pte_valid(pte))
+ return PKVM_NOPAGE;
+
+ return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
+}
+
+static int __hyp_check_page_state_range(u64 addr, u64 size,
+ enum pkvm_page_state state)
+{
+ struct check_walk_data d = {
+ .desired = state,
+ .get_page_state = hyp_get_page_state,
+ };
+
+ hyp_assert_lock_held(&pkvm_pgd_lock);
+ return check_page_state_range(&pkvm_pgtable, addr, size, &d);
+}
+
+static bool __hyp_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
+{
+ return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
+ tx->initiator.id != PKVM_ID_HOST);
+}
+
+static int hyp_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
+ enum kvm_pgtable_prot perms)
+{
+ u64 size = tx->nr_pages * PAGE_SIZE;
+
+ if (perms != PAGE_HYP)
+ return -EPERM;
+
+ if (__hyp_ack_skip_pgtable_check(tx))
+ return 0;
+
+ return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
+}
+
+static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
+ enum kvm_pgtable_prot perms)
+{
+ void *start = (void *)addr, *end = start + (tx->nr_pages * PAGE_SIZE);
+ enum kvm_pgtable_prot prot;
+
+ prot = pkvm_mkstate(perms, PKVM_PAGE_SHARED_BORROWED);
+ return pkvm_create_mappings_locked(start, end, prot);
+}
+
+static int check_share(struct pkvm_mem_share *share)
+{
+ const struct pkvm_mem_transition *tx = &share->tx;
+ u64 completer_addr;
+ int ret;
+
+ switch (tx->initiator.id) {
+ case PKVM_ID_HOST:
+ ret = host_request_owned_transition(&completer_addr, tx);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ if (ret)
+ return ret;
+
+ switch (tx->completer.id) {
+ case PKVM_ID_HYP:
+ ret = hyp_ack_share(completer_addr, tx, share->prot);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+static int __do_share(struct pkvm_mem_share *share)
+{
+ const struct pkvm_mem_transition *tx = &share->tx;
+ u64 completer_addr;
+ int ret;
+
+ switch (tx->initiator.id) {
+ case PKVM_ID_HOST:
+ ret = host_initiate_share(&completer_addr, tx);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ if (ret)
+ return ret;
+
+ switch (tx->completer.id) {
+ case PKVM_ID_HYP:
+ ret = hyp_complete_share(completer_addr, tx, share->prot);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+/*
+ * do_share():
+ *
+ * The page owner grants access to another component with a given set
+ * of permissions.
+ *
+ * Initiator: OWNED => SHARED_OWNED
+ * Completer: NOPAGE => SHARED_BORROWED
+ */
+static int do_share(struct pkvm_mem_share *share)
+{
+ int ret;
+
+ ret = check_share(share);
+ if (ret)
+ return ret;
+
+ return WARN_ON(__do_share(share));
+}
--
2.33.0.1079.g6e70778dc9-goog

2021-10-19 12:19:33

by Quentin Perret

Subject: [PATCH v2 15/15] KVM: arm64: pkvm: Unshare guest structs during teardown

Make use of the newly introduced unshare hypercall during guest teardown
to unmap guest-related data structures from the hyp stage-1.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/asm/kvm_mmu.h | 1 +
arch/arm64/kvm/arm.c | 2 ++
arch/arm64/kvm/fpsimd.c | 29 +++++++++++++++++++++
arch/arm64/kvm/mmu.c | 42 +++++++++++++++++++++++++++++++
arch/arm64/kvm/reset.c | 8 +++++-
6 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d1451c8a768b..9f70e82c7436 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -323,6 +323,7 @@ struct kvm_vcpu_arch {

struct thread_info *host_thread_info; /* hyp VA */
struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
+ struct task_struct *parent_task;

struct {
/* {Break,watch}point registers */
@@ -738,6 +739,7 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
+void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu);

static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
{
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 185d0f62b724..81839e9a8a24 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -151,6 +151,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
#include <asm/stage2_pgtable.h>

int kvm_share_hyp(void *from, void *to);
+void kvm_unshare_hyp(void *from, void *to);
int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
void __iomem **kaddr,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 5312d5a2ef69..ebab9529b736 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -188,6 +188,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
}
}
atomic_set(&kvm->online_vcpus, 0);
+
+ kvm_unshare_hyp(kvm, kvm + 1);
}

int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 2fe1128d9f3d..ace5ebef8aab 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -15,6 +15,22 @@
#include <asm/kvm_mmu.h>
#include <asm/sysreg.h>

+void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
+{
+ struct task_struct *p = vcpu->arch.parent_task;
+ struct user_fpsimd_state *fpsimd;
+ struct thread_info *ti;
+
+ if (!is_protected_kvm_enabled() || !p)
+ return;
+
+ ti = &p->thread_info;
+ kvm_unshare_hyp(ti, ti + 1);
+ fpsimd = &p->thread.uw.fpsimd_state;
+ kvm_unshare_hyp(fpsimd, fpsimd + 1);
+ put_task_struct(p);
+}
+
/*
* Called on entry to KVM_RUN unless this vcpu previously ran at least
* once and the most recent prior KVM_RUN for this vcpu was called from
@@ -31,6 +47,8 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
struct thread_info *ti = &current->thread_info;
struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;

+ kvm_vcpu_unshare_task_fp(vcpu);
+
/*
* Make sure the host task thread flags and fpsimd state are
* visible to hyp:
@@ -45,6 +63,17 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)

vcpu->arch.host_thread_info = kern_hyp_va(ti);
vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
+
+ /*
+ * We need to keep current's task_struct pinned until its data has been
+ * unshared with the hypervisor to make sure it is not re-used by the
+ * kernel and donated to someone else while already shared -- see
+ * kvm_vcpu_unshare_task_fp() for the matching put_task_struct().
+ */
+ if (is_protected_kvm_enabled()) {
+ get_task_struct(current);
+ vcpu->arch.parent_task = current;
+ }
error:
return ret;
}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f0c16ed95974..d548141da2ea 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -344,6 +344,32 @@ static int share_pfn_hyp(u64 pfn)
return ret;
}

+static int unshare_pfn_hyp(u64 pfn)
+{
+ struct rb_node **node, *parent;
+ struct hyp_shared_pfn *this;
+ int ret = 0;
+
+ mutex_lock(&hyp_shared_pfns_lock);
+ this = find_shared_pfn(pfn, &node, &parent);
+ if (WARN_ON(!this)) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ this->count--;
+ if (this->count)
+ goto unlock;
+
+ rb_erase(&this->node, &hyp_shared_pfns);
+ kfree(this);
+ ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn, 1);
+unlock:
+ mutex_unlock(&hyp_shared_pfns_lock);
+
+ return ret;
+}
+
int kvm_share_hyp(void *from, void *to)
{
phys_addr_t start, end, cur;
@@ -368,6 +394,22 @@ int kvm_share_hyp(void *from, void *to)
return 0;
}

+void kvm_unshare_hyp(void *from, void *to)
+{
+ phys_addr_t start, end, cur;
+ u64 pfn;
+
+ if (is_kernel_in_hyp_mode() || kvm_host_owns_hyp_mappings() || !from)
+ return;
+
+ start = ALIGN_DOWN(kvm_kaddr_to_phys(from), PAGE_SIZE);
+ end = PAGE_ALIGN(kvm_kaddr_to_phys(to));
+ for (cur = start; cur < end; cur += PAGE_SIZE) {
+ pfn = __phys_to_pfn(cur);
+ WARN_ON(unshare_pfn_hyp(pfn));
+ }
+}
+
/**
* create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode
* @from: The virtual kernel start address of the range
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index cf781ec9212a..bf15e9de4123 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -150,7 +150,13 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)

void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
{
- kfree(vcpu->arch.sve_state);
+ void *sve_state = vcpu->arch.sve_state;
+
+ kvm_vcpu_unshare_task_fp(vcpu);
+ kvm_unshare_hyp(vcpu, vcpu + 1);
+ if (sve_state)
+ kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
+ kfree(sve_state);
}

static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
--
2.33.0.1079.g6e70778dc9-goog

2021-10-21 10:10:23

by David Brazdil

Subject: Re: [PATCH v2 07/15] KVM: arm64: Introduce kvm_share_hyp()

Hi Quentin,

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 0019b2309f70..0cc4b295e525 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -299,6 +299,17 @@ static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
> return 0;
> }
>
> +int kvm_share_hyp(void *from, void *to)
> +{
> + if (is_kernel_in_hyp_mode())
> + return 0;
> +
> + if (kvm_host_owns_hyp_mappings())
> + return create_hyp_mappings(from, to, PAGE_HYP);
> +
> + return pkvm_share_hyp(kvm_kaddr_to_phys(from), kvm_kaddr_to_phys(to));

We should be careful about vmalloc memory here. kvm_kaddr_to_phys will
happily return the physical address but the range is not guaranteed to
be physically contiguous. It doesn't look like this series ever shares
vmalloc memory, but we should make it harder for users of this function
to shoot themselves in the foot.

One option would be to turn this into a loop and call pkvm_share_hyp on
each physical page. But since the hypervisor has no means of making
those pages virtually contiguous anyway, probably not the right approach.

We could make it possible to share a vmalloc buffer that fits into a
page and reject others. There we still need to be careful about the
upper bound because, the way it's written now, 'to' can be treated as
exclusive and 'kvm_kaddr_to_phys(to)' would not always return the page
after 'kvm_kaddr_to_phys(from)'.
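
Something like the below (untested, just to illustrate the idea) would
reject vmalloc ranges spanning more than one page while keeping the
exclusive 'to' in mind:

    if (is_vmalloc_addr(from) &&
        ((unsigned long)from & PAGE_MASK) != (((unsigned long)to - 1) & PAGE_MASK))
            return -EINVAL;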

-David