2021-10-13 16:02:38

by Quentin Perret

Subject: [PATCH 00/16] KVM: arm64: Implement unshare hypercall for pkvm

Hi all,

This series implements an unshare hypercall at EL2 in nVHE protected
mode, and makes use of it to unmap guest-specific data-structures from
EL2 stage-1 during guest tear-down. Crucially, the share and unshare
hypercalls maintain page refcounts at EL2 to avoid accidentally
unmapping data-structures that overlap a common page.

This series has two main benefits. Firstly, it allows EL2 to track the
state of shared pages cleanly, as they can now transition from SHARED
back to OWNED. This will simplify permission checks once e.g. pkvm
implements a donation hcall to provide memory to protected guests, as
there should then be no reason for the host to donate a page that is
currently marked shared. Secondly, it avoids leaving dangling mappings
in the hypervisor's stage-1, which is desirable from a security
perspective as the hypervisor obviously runs with elevated privileges.
Also worth noting, this refactors the EL2 page-tracking checks in a
more scalable way, which should make it much easier to implement other
memory transitions (the host donating memory to a guest, a guest
sharing back with the host, ...) in the future.
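
As a rough illustration of why the refcounting matters, consider two
KVM structures that happen to live in the same physical page. The
following is a simplified sketch of the idea only, not the actual EL2
code paths:

/* Simplified, illustrative sketch -- not the real EL2 helpers. */
struct hyp_page { unsigned short refcount; };

static void share_page_with_hyp(struct hyp_page *p)
{
	if (p->refcount++)
		return;	/* already mapped at EL2, just count the new share */
	/* first share: map the page in hyp stage-1, mark it SHARED_BORROWED */
}

static void unshare_page_from_hyp(struct hyp_page *p)
{
	if (--p->refcount)
		return;	/* another structure in this page is still shared */
	/* last share gone: unmap from hyp stage-1, page goes back to OWNED */
}

Without the refcount, unsharing the first structure would tear down the
EL2 mapping while the second one is still in use.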

The series is organized as follows:

- patches 01-05 refactor the implementation of the existing share
hypercall;

- patches 06-10 introduce infrastructure to allow unmapping pages from
EL2 stage-1;

- patches 11-14 add refcounting for pages that are shared more than
once with EL2;

- patches 15-16 add the unshare hypercall, and make use of it when
tearing down guests.

This has been lightly tested on QEMU, by spawning and powering off a
guest 50 times.

Feedback welcome :) !

Thanks,
Quentin

Quentin Perret (11):
KVM: arm64: Avoid remapping the SVE state in the hyp stage-1
KVM: arm64: Introduce kvm_share_hyp()
KVM: arm64: Accept page ranges in pkvm share hypercall
KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator
KVM: arm64: Refcount hyp stage-1 pgtable pages
KVM: arm64: Fixup hyp stage-1 refcount
KVM: arm64: Back hyp_vmemmap for all of memory
KVM: arm64: Move hyp refcount helpers to header files
KVM: arm64: Refcount shared pages at EL2
KVM: arm64: pkvm: Introduce an unshare hypercall
KVM: arm64: pkvm: Unshare guest structs during teardown

Will Deacon (5):
KVM: arm64: Introduce do_share() helper for memory sharing between
components
KVM: arm64: Implement __pkvm_host_share_hyp() using do_share()
KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table
KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
KVM: arm64: Move double-sharing logic into hyp-specific function

arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/include/asm/kvm_mmu.h | 2 +
arch/arm64/include/asm/kvm_pgtable.h | 21 +
arch/arm64/kvm/arm.c | 17 +-
arch/arm64/kvm/fpsimd.c | 25 +-
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 8 +-
arch/arm64/kvm/hyp/include/nvhe/memory.h | 18 +
arch/arm64/kvm/hyp/include/nvhe/mm.h | 29 +-
arch/arm64/kvm/hyp/nvhe/early_alloc.c | 5 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 12 +-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 596 ++++++++++++++++--
arch/arm64/kvm/hyp/nvhe/mm.c | 31 +-
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 22 +-
arch/arm64/kvm/hyp/nvhe/setup.c | 39 +-
arch/arm64/kvm/hyp/pgtable.c | 80 ++-
arch/arm64/kvm/hyp/reserved_mem.c | 17 +-
arch/arm64/kvm/mmu.c | 48 +-
arch/arm64/kvm/reset.c | 13 +-
19 files changed, 814 insertions(+), 172 deletions(-)

--
2.33.0.882.g93a45727a2-goog


2021-10-13 16:03:04

by Quentin Perret

Subject: [PATCH 16/16] KVM: arm64: pkvm: Unshare guest structs during teardown

Make use of the newly introduced unshare hypercall during guest teardown
to unmap guest-related data structures from the hyp stage-1.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/asm/kvm_mmu.h | 1 +
arch/arm64/kvm/arm.c | 2 ++
arch/arm64/kvm/fpsimd.c | 10 ++++++++--
arch/arm64/kvm/mmu.c | 16 ++++++++++++++++
arch/arm64/kvm/reset.c | 13 ++++++++++++-
6 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f8be56d5342b..8b61cdcd1b29 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -322,6 +322,8 @@ struct kvm_vcpu_arch {

struct thread_info *host_thread_info; /* hyp VA */
struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
+ struct thread_info *kern_thread_info;
+ struct user_fpsimd_state *kern_fpsimd_state;

struct {
/* {Break,watch}point registers */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 185d0f62b724..81839e9a8a24 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -151,6 +151,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
#include <asm/stage2_pgtable.h>

int kvm_share_hyp(void *from, void *to);
+void kvm_unshare_hyp(void *from, void *to);
int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
void __iomem **kaddr,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f2e74635332b..f11c51db6fe6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -188,6 +188,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
}
}
atomic_set(&kvm->online_vcpus, 0);
+
+ kvm_unshare_hyp(kvm, kvm + 1);
}

int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 2fe1128d9f3d..67059daf4d26 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -28,23 +28,29 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
{
int ret;

- struct thread_info *ti = &current->thread_info;
- struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
+ struct thread_info *ti = vcpu->arch.kern_thread_info;
+ struct user_fpsimd_state *fpsimd = vcpu->arch.kern_fpsimd_state;

/*
* Make sure the host task thread flags and fpsimd state are
* visible to hyp:
*/
+ kvm_unshare_hyp(ti, ti + 1);
+ ti = &current->thread_info;
ret = kvm_share_hyp(ti, ti + 1);
if (ret)
goto error;

+ kvm_unshare_hyp(fpsimd, fpsimd + 1);
+ fpsimd = &current->thread.uw.fpsimd_state;
ret = kvm_share_hyp(fpsimd, fpsimd + 1);
if (ret)
goto error;

vcpu->arch.host_thread_info = kern_hyp_va(ti);
vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
+ vcpu->arch.kern_thread_info = ti;
+ vcpu->arch.kern_fpsimd_state = fpsimd;
error:
return ret;
}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index bc9865a8c988..f01b0e49e262 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -300,6 +300,22 @@ int kvm_share_hyp(void *from, void *to)
nr_pages);
}

+void kvm_unshare_hyp(void *from, void *to)
+{
+ phys_addr_t start, end;
+ u64 nr_pages;
+
+ if (is_kernel_in_hyp_mode() || kvm_host_owns_hyp_mappings() || !from)
+ return;
+
+ start = ALIGN_DOWN(kvm_kaddr_to_phys(from), PAGE_SIZE);
+ end = PAGE_ALIGN(kvm_kaddr_to_phys(to));
+ nr_pages = (end - start) >> PAGE_SHIFT;
+
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, __phys_to_pfn(start),
+ nr_pages));
+}
+
/**
* create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode
* @from: The virtual kernel start address of the range
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 5ce36b0a3343..e3e9c9e1f1c8 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -141,7 +141,18 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)

void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
{
- kfree(vcpu->arch.sve_state);
+ struct user_fpsimd_state *fpsimd = vcpu->arch.kern_fpsimd_state;
+ struct thread_info *ti = vcpu->arch.kern_thread_info;
+ void *sve_state = vcpu->arch.sve_state;
+
+ kvm_unshare_hyp(vcpu, vcpu + 1);
+ if (ti)
+ kvm_unshare_hyp(ti, ti + 1);
+ if (fpsimd)
+ kvm_unshare_hyp(fpsimd, fpsimd + 1);
+ if (sve_state && vcpu->arch.has_run_once)
+ kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
+ kfree(sve_state);
}

static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
--
2.33.0.882.g93a45727a2-goog

2021-10-13 16:04:20

by Quentin Perret

Subject: [PATCH 11/16] KVM: arm64: Back hyp_vmemmap for all of memory

The EL2 vmemmap in nVHE Protected mode is currently very sparse: only
memory pages owned by the hypervisor itself have a matching struct
hyp_page. But since the size of these structs has been reduced
significantly, it appears that we can now afford backing the vmemmap for
all of memory.

This will greatly simplify memory tracking, as the hypervisor will have
a place to store metadata (e.g. refcounts) that wouldn't otherwise fit
in the 4 SW bits we have in the host stage-2 page-table.
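
For reference, the PA-to-metadata lookup is then simply an index into
the hyp_vmemmap array, along the lines of the existing helper in
arch/arm64/kvm/hyp/include/nvhe/memory.h (paraphrased here):

#define hyp_phys_to_page(phys)	(&hyp_vmemmap[hyp_phys_to_pfn(phys)])

so with the vmemmap backed for all of memory, a struct hyp_page exists
for any PA, not just for hypervisor-owned pages.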

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mm.h | 29 ++++++++++++++++++--------
arch/arm64/kvm/hyp/nvhe/mm.c | 31 ++++++++++++++++++++++++----
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 4 +---
arch/arm64/kvm/hyp/nvhe/setup.c | 7 +++----
arch/arm64/kvm/hyp/reserved_mem.c | 17 ++-------------
5 files changed, 53 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index c9a8f535212e..f5e8582252c3 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -20,23 +20,34 @@ extern u64 __io_map_base;

int hyp_create_idmap(u32 hyp_va_bits);
int hyp_map_vectors(void);
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
+int hyp_back_vmemmap(phys_addr_t back);
int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
enum kvm_pgtable_prot prot);

-static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
- unsigned long *start, unsigned long *end)
+static inline unsigned long hyp_vmemmap_memblock_size(struct memblock_region *reg)
{
- unsigned long nr_pages = size >> PAGE_SHIFT;
- struct hyp_page *p = hyp_phys_to_page(phys);
+ unsigned long nr_pages = reg->size >> PAGE_SHIFT;
+ unsigned long start, end;

- *start = (unsigned long)p;
- *end = *start + nr_pages * sizeof(struct hyp_page);
- *start = ALIGN_DOWN(*start, PAGE_SIZE);
- *end = ALIGN(*end, PAGE_SIZE);
+ start = hyp_phys_to_pfn(reg->base) * sizeof(struct hyp_page);
+ end = start + nr_pages * sizeof(struct hyp_page);
+ start = ALIGN_DOWN(start, PAGE_SIZE);
+ end = ALIGN(end, PAGE_SIZE);
+
+ return end - start;
+}
+
+static inline unsigned long hyp_vmemmap_pages(void)
+{
+ unsigned long res = 0, i;
+
+ for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++)
+ res += hyp_vmemmap_memblock_size(&kvm_nvhe_sym(hyp_memory)[i]);
+
+ return res >> PAGE_SHIFT;
}

static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 2fabeceb889a..65b948cbc0f5 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -103,13 +103,36 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
return ret;
}

-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
+int hyp_back_vmemmap(phys_addr_t back)
{
- unsigned long start, end;
+ unsigned long i, start, size, end = 0;
+ int ret;

- hyp_vmemmap_range(phys, size, &start, &end);
+ for (i = 0; i < hyp_memblock_nr; i++) {
+ start = hyp_memory[i].base;
+ start = ALIGN_DOWN((u64)hyp_phys_to_page(start), PAGE_SIZE);
+ /*
+ * The beginning of the hyp_vmemmap region for the current
+ * memblock may already be backed by the page backing the end
+ * of the previous region, so avoid mapping it twice.
+ */
+ start = max(start, end);
+
+ end = hyp_memory[i].base + hyp_memory[i].size;
+ end = PAGE_ALIGN((u64)hyp_phys_to_page(end));
+ if (start >= end)
+ continue;
+
+ size = end - start;
+ ret = __pkvm_create_mappings(start, size, back, PAGE_HYP);
+ if (ret)
+ return ret;
+
+ memset(hyp_phys_to_virt(back), 0, size);
+ back += size;
+ }

- return __pkvm_create_mappings(start, end - start, back, PAGE_HYP);
+ return 0;
}

static void *__hyp_bp_vect_base;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 41fc25bdfb34..38accc2e23e3 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -234,10 +234,8 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,

/* Init the vmemmap portion */
p = hyp_phys_to_page(phys);
- for (i = 0; i < nr_pages; i++) {
- p[i].order = 0;
+ for (i = 0; i < nr_pages; i++)
hyp_set_page_refcounted(&p[i]);
- }

/* Attach the unused pages to the buddy tree */
for (i = reserved_pages; i < nr_pages; i++)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 98b39facae04..98441e4039b9 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -29,12 +29,11 @@ static struct kvm_pgtable_mm_ops pkvm_pgtable_mm_ops;

static int divide_memory_pool(void *virt, unsigned long size)
{
- unsigned long vstart, vend, nr_pages;
+ unsigned long nr_pages;

hyp_early_alloc_init(virt, size);

- hyp_vmemmap_range(__hyp_pa(virt), size, &vstart, &vend);
- nr_pages = (vend - vstart) >> PAGE_SHIFT;
+ nr_pages = hyp_vmemmap_pages();
vmemmap_base = hyp_early_alloc_contig(nr_pages);
if (!vmemmap_base)
return -ENOMEM;
@@ -76,7 +75,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
if (ret)
return ret;

- ret = hyp_back_vmemmap(phys, size, hyp_virt_to_phys(vmemmap_base));
+ ret = hyp_back_vmemmap(hyp_virt_to_phys(vmemmap_base));
if (ret)
return ret;

diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
index 578670e3f608..81db85bfdbad 100644
--- a/arch/arm64/kvm/hyp/reserved_mem.c
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -54,7 +54,7 @@ static int __init register_memblock_regions(void)

void __init kvm_hyp_reserve(void)
{
- u64 nr_pages, prev, hyp_mem_pages = 0;
+ u64 hyp_mem_pages = 0;
int ret;

if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
@@ -72,20 +72,7 @@ void __init kvm_hyp_reserve(void)

hyp_mem_pages += hyp_s1_pgtable_pages();
hyp_mem_pages += host_s2_pgtable_pages();
-
- /*
- * The hyp_vmemmap needs to be backed by pages, but these pages
- * themselves need to be present in the vmemmap, so compute the number
- * of pages needed by looking for a fixed point.
- */
- nr_pages = 0;
- do {
- prev = nr_pages;
- nr_pages = hyp_mem_pages + prev;
- nr_pages = DIV_ROUND_UP(nr_pages * sizeof(struct hyp_page), PAGE_SIZE);
- nr_pages += __hyp_pgtable_max_pages(nr_pages);
- } while (nr_pages != prev);
- hyp_mem_pages += nr_pages;
+ hyp_mem_pages += hyp_vmemmap_pages();

/*
* Try to allocate a PMD-aligned region to reduce TLB pressure once
--
2.33.0.882.g93a45727a2-goog

2021-10-13 16:04:45

by Quentin Perret

Subject: [PATCH 09/16] KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table

From: Will Deacon <[email protected]>

kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback
being provided by the memory-management operations for the page-table.

Wire up this callback for the hypervisor stage-1 page-table.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/setup.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index ad89801dfed7..98b39facae04 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -246,6 +246,7 @@ void __noreturn __pkvm_init_finalise(void)
.virt_to_phys = hyp_virt_to_phys,
.get_page = hpool_get_page,
.put_page = hpool_put_page,
+ .page_count = hyp_page_count,
};
pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;

--
2.33.0.882.g93a45727a2-goog

2021-10-13 16:04:46

by Quentin Perret

Subject: [PATCH 14/16] KVM: arm64: Refcount shared pages at EL2

We currently allow double sharing of pages from the host to the
hypervisor, but don't track how many times each page is shared. In
order to prepare the introduction of an unshare operation in the
hypervisor, refcount the physical pages which the host shares more
than once.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 3378117d010c..cad76bc68e53 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -560,6 +560,9 @@ static int hyp_check_incoming_share(struct pkvm_page_req *req,
if (ack->completer.prot != prot)
return -EPERM;

+ if (WARN_ON(!hyp_phys_to_page(req->phys)->refcount))
+ return -EINVAL;
+
return 0;
}

@@ -619,13 +622,22 @@ static int hyp_complete_share(struct pkvm_page_req *req,
enum kvm_pgtable_prot perms)
{
void *start = (void *)req->completer.addr, *end = start + PAGE_SIZE;
+ struct hyp_page *page = hyp_phys_to_page(req->phys);
enum kvm_pgtable_prot prot;
+ int ret = 0;

- if (req->initiator.state == PKVM_PAGE_SHARED_OWNED)
+ if (req->initiator.state == PKVM_PAGE_SHARED_OWNED) {
+ hyp_page_ref_inc(page);
return 0;
+ }

prot = pkvm_mkstate(perms, PKVM_PAGE_SHARED_BORROWED);
- return pkvm_create_mappings_locked(start, end, prot);
+ ret = pkvm_create_mappings_locked(start, end, prot);
+
+ if (!ret)
+ hyp_set_page_refcounted(page);
+
+ return ret;
}

/* Update the completer's page-table for the page-sharing request */
--
2.33.0.882.g93a45727a2-goog

2021-10-13 16:04:51

by Quentin Perret

Subject: [PATCH 13/16] KVM: arm64: Move double-sharing logic into hyp-specific function

From: Will Deacon <[email protected]>

Strictly speaking, double-sharing a page is an invalid transition and
should be rejected. However, we allow it in order to simplify the
book-keeping when KVM metadata (such as vcpu structures) co-exists in
the same page.

Given that double-sharing is only required for pages shared with the
hypervisor by the host, move the handling into a hyp-specific function
that checks incoming shares, thereby preventing double-sharing outside
of this particular transition.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 57 +++++++++++++++++++--------
1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 909e60f71b06..3378117d010c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -536,6 +536,33 @@ static int ack_share(struct pkvm_page_share_ack *ack,
}
}

+static int hyp_check_incoming_share(struct pkvm_page_req *req,
+ struct pkvm_page_share_ack *ack,
+ enum pkvm_component_id initiator,
+ enum kvm_pgtable_prot prot)
+{
+ /*
+ * We allow the host to share the same page twice, but that means we
+ * have to check that the states really do match exactly.
+ */
+ if (initiator != PKVM_ID_HOST)
+ return -EPERM;
+
+ if (req->initiator.state != PKVM_PAGE_SHARED_OWNED)
+ return -EPERM;
+
+ if (ack->completer.state != PKVM_PAGE_SHARED_BORROWED)
+ return -EPERM;
+
+ if (ack->completer.phys != req->phys)
+ return -EPERM;
+
+ if (ack->completer.prot != prot)
+ return -EPERM;
+
+ return 0;
+}
+
/*
* Check that the page states in the initiator and the completer are compatible
* for the requested page-sharing operation to go ahead.
@@ -544,6 +571,8 @@ static int check_share(struct pkvm_page_req *req,
struct pkvm_page_share_ack *ack,
struct pkvm_mem_share *share)
{
+ struct pkvm_mem_transition *tx = &share->tx;
+
if (!addr_is_memory(req->phys))
return -EINVAL;

@@ -552,25 +581,22 @@ static int check_share(struct pkvm_page_req *req,
return 0;
}

- if (req->initiator.state != PKVM_PAGE_SHARED_OWNED)
- return -EPERM;
-
- if (ack->completer.state != PKVM_PAGE_SHARED_BORROWED)
- return -EPERM;
-
- if (ack->completer.phys != req->phys)
- return -EPERM;
-
- if (ack->completer.prot != share->prot)
+ switch (tx->completer.id) {
+ case PKVM_ID_HYP:
+ return hyp_check_incoming_share(req, ack, tx->initiator.id,
+ share->prot);
+ default:
return -EPERM;
-
- return 0;
+ }
}

static int host_initiate_share(struct pkvm_page_req *req)
{
enum kvm_pgtable_prot prot;

+ if (req->initiator.state == PKVM_PAGE_SHARED_OWNED)
+ return 0;
+
prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
return host_stage2_idmap_locked(req->initiator.addr, PAGE_SIZE, prot);
}
@@ -595,6 +621,9 @@ static int hyp_complete_share(struct pkvm_page_req *req,
void *start = (void *)req->completer.addr, *end = start + PAGE_SIZE;
enum kvm_pgtable_prot prot;

+ if (req->initiator.state == PKVM_PAGE_SHARED_OWNED)
+ return 0;
+
prot = pkvm_mkstate(perms, PKVM_PAGE_SHARED_BORROWED);
return pkvm_create_mappings_locked(start, end, prot);
}
@@ -653,10 +682,6 @@ static int do_share(struct pkvm_mem_share *share)
if (ret)
break;

- /* Allow double-sharing by skipping over the page */
- if (req.initiator.state == PKVM_PAGE_SHARED_OWNED)
- continue;
-
ret = initiate_share(&req, share);
if (ret)
break;
--
2.33.0.882.g93a45727a2-goog

2021-10-13 19:13:53

by Quentin Perret

Subject: [PATCH 12/16] KVM: arm64: Move hyp refcount helpers to header files

We will soon need to touch the hyp_page refcount from outside
page_alloc.c in nVHE protected mode, so move the relevant helpers into a
header file.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/memory.h | 18 ++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 18 ------------------
2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 592b7edb3edb..e77783be0f3f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -12,6 +12,24 @@ struct hyp_page {
unsigned short order;
};

+static inline void hyp_page_ref_inc(struct hyp_page *p)
+{
+ BUG_ON(p->refcount == USHRT_MAX);
+ p->refcount++;
+}
+
+static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+{
+ p->refcount--;
+ return (p->refcount == 0);
+}
+
+static inline void hyp_set_page_refcounted(struct hyp_page *p)
+{
+ BUG_ON(p->refcount);
+ p->refcount = 1;
+}
+
extern u64 __hyp_vmemmap;
#define hyp_vmemmap ((struct hyp_page *)__hyp_vmemmap)

diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 38accc2e23e3..0d977169ed08 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -144,24 +144,6 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
return p;
}

-static inline void hyp_page_ref_inc(struct hyp_page *p)
-{
- BUG_ON(p->refcount == USHRT_MAX);
- p->refcount++;
-}
-
-static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
-{
- p->refcount--;
- return (p->refcount == 0);
-}
-
-static inline void hyp_set_page_refcounted(struct hyp_page *p)
-{
- BUG_ON(p->refcount);
- p->refcount = 1;
-}
-
static void __hyp_put_page(struct hyp_pool *pool, struct hyp_page *p)
{
if (hyp_page_ref_dec_and_test(p))
--
2.33.0.882.g93a45727a2-goog

2021-10-13 19:14:42

by Quentin Perret

Subject: [PATCH 07/16] KVM: arm64: Refcount hyp stage-1 pgtable pages

To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f8ceebe4982e..768a58835153 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -408,8 +408,10 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
return false;

new = kvm_init_valid_leaf_pte(phys, data->attr, level);
- if (hyp_pte_needs_update(old, new))
+ if (hyp_pte_needs_update(old, new)) {
smp_store_release(ptep, new);
+ data->mm_ops->get_page(ptep);
+ }

data->phys += granule;
return true;
@@ -433,6 +435,7 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
return -ENOMEM;

kvm_set_table_pte(ptep, childp, mm_ops);
+ mm_ops->get_page(ptep);
return 0;
}

@@ -482,8 +485,16 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
struct kvm_pgtable_mm_ops *mm_ops = arg;
+ kvm_pte_t pte = *ptep;
+
+ if (!kvm_pte_valid(pte))
+ return 0;
+
+ mm_ops->put_page(ptep);
+
+ if (kvm_pte_table(pte, level))
+ mm_ops->put_page(kvm_pte_follow(pte, mm_ops));

- mm_ops->put_page((void *)kvm_pte_follow(*ptep, mm_ops));
return 0;
}

@@ -491,7 +502,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
{
struct kvm_pgtable_walker walker = {
.cb = hyp_free_walker,
- .flags = KVM_PGTABLE_WALK_TABLE_POST,
+ .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
.arg = pgt->mm_ops,
};

--
2.33.0.882.g93a45727a2-goog

2021-10-13 19:15:50

by Quentin Perret

Subject: [PATCH 08/16] KVM: arm64: Fixup hyp stage-1 refcount

In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/setup.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 57c27846320f..ad89801dfed7 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -166,12 +166,22 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
{
enum kvm_pgtable_prot prot;
enum pkvm_page_state state;
+ struct kvm_pgtable_mm_ops *mm_ops = arg;
kvm_pte_t pte = *ptep;
phys_addr_t phys;

if (!kvm_pte_valid(pte))
return 0;

+ /*
+ * Fix-up the refcount for the page-table pages as the early allocator
+ * was unable to access the hyp_vmemmap and so the buddy allocator has
+ * initialised the refcount to '1'.
+ */
+ mm_ops->get_page(ptep);
+ if (flag != KVM_PGTABLE_WALK_LEAF)
+ return 0;
+
if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
return -EINVAL;

@@ -204,7 +214,8 @@ static int finalize_host_mappings(void)
{
struct kvm_pgtable_walker walker = {
.cb = finalize_host_mappings_walker,
- .flags = KVM_PGTABLE_WALK_LEAF,
+ .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+ .arg = pkvm_pgtable.mm_ops,
};

return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits), &walker);
@@ -229,19 +240,19 @@ void __noreturn __pkvm_init_finalise(void)
if (ret)
goto out;

- ret = finalize_host_mappings();
- if (ret)
- goto out;
-
pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) {
- .zalloc_page = hyp_zalloc_hyp_page,
- .phys_to_virt = hyp_phys_to_virt,
- .virt_to_phys = hyp_virt_to_phys,
- .get_page = hpool_get_page,
- .put_page = hpool_put_page,
+ .zalloc_page = hyp_zalloc_hyp_page,
+ .phys_to_virt = hyp_phys_to_virt,
+ .virt_to_phys = hyp_virt_to_phys,
+ .get_page = hpool_get_page,
+ .put_page = hpool_put_page,
};
pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;

+ ret = finalize_host_mappings();
+ if (ret)
+ goto out;
+
out:
/*
* We tail-called to here from handle___pkvm_init() and will not return,
--
2.33.0.882.g93a45727a2-goog

2021-10-13 19:17:27

by Quentin Perret

Subject: [PATCH 01/16] KVM: arm64: Introduce do_share() helper for memory sharing between components

From: Will Deacon <[email protected]>

In preparation for extending memory sharing to include the guest as well
as the hypervisor and the host, introduce a high-level do_share() helper
which allows memory to be shared between these components without
duplication of validity checks.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 5 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 315 ++++++++++++++++++
2 files changed, 320 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index b58c910babaf..56445586c755 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -24,6 +24,11 @@ enum pkvm_page_state {
PKVM_PAGE_OWNED = 0ULL,
PKVM_PAGE_SHARED_OWNED = KVM_PGTABLE_PROT_SW0,
PKVM_PAGE_SHARED_BORROWED = KVM_PGTABLE_PROT_SW1,
+ __PKVM_PAGE_RESERVED = KVM_PGTABLE_PROT_SW0 |
+ KVM_PGTABLE_PROT_SW1,
+
+ /* Meta-states which aren't encoded directly in the PTE's SW bits */
+ PKVM_NOPAGE,
};

#define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index bacd493a4eac..53e503501044 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -443,3 +443,318 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
ret = host_stage2_idmap(addr);
BUG_ON(ret && ret != -EAGAIN);
}
+
+/* This corresponds to locking order */
+enum pkvm_component_id {
+ PKVM_ID_HOST,
+ PKVM_ID_HYP,
+};
+
+struct pkvm_mem_transition {
+ u64 nr_pages;
+
+ struct {
+ enum pkvm_component_id id;
+ u64 addr;
+
+ union {
+ struct {
+ u64 completer_addr;
+ } host;
+ };
+ } initiator;
+
+ struct {
+ enum pkvm_component_id id;
+ } completer;
+};
+
+struct pkvm_mem_share {
+ struct pkvm_mem_transition tx;
+ enum kvm_pgtable_prot prot;
+};
+
+struct pkvm_page_req {
+ struct {
+ enum pkvm_page_state state;
+ u64 addr;
+ } initiator;
+
+ struct {
+ u64 addr;
+ } completer;
+
+ phys_addr_t phys;
+};
+
+struct pkvm_page_share_ack {
+ struct {
+ enum pkvm_page_state state;
+ phys_addr_t phys;
+ enum kvm_pgtable_prot prot;
+ } completer;
+};
+
+static void host_lock_component(void)
+{
+ hyp_spin_lock(&host_kvm.lock);
+}
+
+static void host_unlock_component(void)
+{
+ hyp_spin_unlock(&host_kvm.lock);
+}
+
+static void hyp_lock_component(void)
+{
+ hyp_spin_lock(&pkvm_pgd_lock);
+}
+
+static void hyp_unlock_component(void)
+{
+ hyp_spin_unlock(&pkvm_pgd_lock);
+}
+
+static int host_request_share(struct pkvm_page_req *req,
+ struct pkvm_mem_transition *tx,
+ u64 idx)
+{
+ u64 offset = idx * PAGE_SIZE;
+ enum kvm_pgtable_prot prot;
+ u64 host_addr;
+ kvm_pte_t pte;
+ int err;
+
+ hyp_assert_lock_held(&host_kvm.lock);
+
+ host_addr = tx->initiator.addr + offset;
+ err = kvm_pgtable_get_leaf(&host_kvm.pgt, host_addr, &pte, NULL);
+ if (err)
+ return err;
+
+ if (!kvm_pte_valid(pte) && pte)
+ return -EPERM;
+
+ prot = kvm_pgtable_stage2_pte_prot(pte);
+ *req = (struct pkvm_page_req) {
+ .initiator = {
+ .state = pkvm_getstate(prot),
+ .addr = host_addr,
+ },
+ .completer = {
+ .addr = tx->initiator.host.completer_addr + offset,
+ },
+ .phys = host_addr,
+ };
+
+ return 0;
+}
+
+/*
+ * Populate the page-sharing request (@req) based on the share transition
+ * information from the initiator and its current page state.
+ */
+static int request_share(struct pkvm_page_req *req,
+ struct pkvm_mem_share *share,
+ u64 idx)
+{
+ struct pkvm_mem_transition *tx = &share->tx;
+
+ switch (tx->initiator.id) {
+ case PKVM_ID_HOST:
+ return host_request_share(req, tx, idx);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int hyp_ack_share(struct pkvm_page_share_ack *ack,
+ struct pkvm_page_req *req,
+ enum kvm_pgtable_prot perms)
+{
+ enum pkvm_page_state state = PKVM_NOPAGE;
+ enum kvm_pgtable_prot prot = 0;
+ phys_addr_t phys = 0;
+ kvm_pte_t pte;
+ u64 hyp_addr;
+ int err;
+
+ hyp_assert_lock_held(&pkvm_pgd_lock);
+
+ if (perms != PAGE_HYP)
+ return -EPERM;
+
+ hyp_addr = req->completer.addr;
+ err = kvm_pgtable_get_leaf(&pkvm_pgtable, hyp_addr, &pte, NULL);
+ if (err)
+ return err;
+
+ if (kvm_pte_valid(pte)) {
+ state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
+ phys = kvm_pte_to_phys(pte);
+ prot = kvm_pgtable_hyp_pte_prot(pte) & KVM_PGTABLE_PROT_RWX;
+ }
+
+ *ack = (struct pkvm_page_share_ack) {
+ .completer = {
+ .state = state,
+ .phys = phys,
+ .prot = prot,
+ },
+ };
+
+ return 0;
+}
+
+/*
+ * Populate the page-sharing acknowledgment (@ack) based on the sharing request
+ * from the initiator and the current page state in the completer.
+ */
+static int ack_share(struct pkvm_page_share_ack *ack,
+ struct pkvm_page_req *req,
+ struct pkvm_mem_share *share)
+{
+ struct pkvm_mem_transition *tx = &share->tx;
+
+ switch (tx->completer.id) {
+ case PKVM_ID_HYP:
+ return hyp_ack_share(ack, req, share->prot);
+ default:
+ return -EINVAL;
+ }
+}
+
+/*
+ * Check that the page states in the initiator and the completer are compatible
+ * for the requested page-sharing operation to go ahead.
+ */
+static int check_share(struct pkvm_page_req *req,
+ struct pkvm_page_share_ack *ack,
+ struct pkvm_mem_share *share)
+{
+ if (!addr_is_memory(req->phys))
+ return -EINVAL;
+
+ if (req->initiator.state == PKVM_PAGE_OWNED &&
+ ack->completer.state == PKVM_NOPAGE) {
+ return 0;
+ }
+
+ if (req->initiator.state != PKVM_PAGE_SHARED_OWNED)
+ return -EPERM;
+
+ if (ack->completer.state != PKVM_PAGE_SHARED_BORROWED)
+ return -EPERM;
+
+ if (ack->completer.phys != req->phys)
+ return -EPERM;
+
+ if (ack->completer.prot != share->prot)
+ return -EPERM;
+
+ return 0;
+}
+
+static int host_initiate_share(struct pkvm_page_req *req)
+{
+ enum kvm_pgtable_prot prot;
+
+ prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
+ return host_stage2_idmap_locked(req->initiator.addr, PAGE_SIZE, prot);
+}
+
+/* Update the initiator's page-table for the page-sharing request */
+static int initiate_share(struct pkvm_page_req *req,
+ struct pkvm_mem_share *share)
+{
+ struct pkvm_mem_transition *tx = &share->tx;
+
+ switch (tx->initiator.id) {
+ case PKVM_ID_HOST:
+ return host_initiate_share(req);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int hyp_complete_share(struct pkvm_page_req *req,
+ enum kvm_pgtable_prot perms)
+{
+ void *start = (void *)req->completer.addr, *end = start + PAGE_SIZE;
+ enum kvm_pgtable_prot prot;
+
+ prot = pkvm_mkstate(perms, PKVM_PAGE_SHARED_BORROWED);
+ return pkvm_create_mappings_locked(start, end, prot);
+}
+
+/* Update the completer's page-table for the page-sharing request */
+static int complete_share(struct pkvm_page_req *req,
+ struct pkvm_mem_share *share)
+{
+ struct pkvm_mem_transition *tx = &share->tx;
+
+ switch (tx->completer.id) {
+ case PKVM_ID_HYP:
+ return hyp_complete_share(req, share->prot);
+ default:
+ return -EINVAL;
+ }
+}
+
+/*
+ * do_share():
+ *
+ * The page owner grants access to another component with a given set
+ * of permissions.
+ *
+ * Initiator: OWNED => SHARED_OWNED
+ * Completer: NOPAGE => SHARED_BORROWED
+ *
+ * Note that we permit the same share operation to be repeated from the
+ * host to the hypervisor, as this removes the need for excessive
+ * book-keeping of shared KVM data structures at EL1.
+ */
+static int do_share(struct pkvm_mem_share *share)
+{
+ struct pkvm_page_req req;
+ int ret = 0;
+ u64 idx;
+
+ for (idx = 0; idx < share->tx.nr_pages; ++idx) {
+ struct pkvm_page_share_ack ack;
+
+ ret = request_share(&req, share, idx);
+ if (ret)
+ goto out;
+
+ ret = ack_share(&ack, &req, share);
+ if (ret)
+ goto out;
+
+ ret = check_share(&req, &ack, share);
+ if (ret)
+ goto out;
+ }
+
+ for (idx = 0; idx < share->tx.nr_pages; ++idx) {
+ ret = request_share(&req, share, idx);
+ if (ret)
+ break;
+
+ /* Allow double-sharing by skipping over the page */
+ if (req.initiator.state == PKVM_PAGE_SHARED_OWNED)
+ continue;
+
+ ret = initiate_share(&req, share);
+ if (ret)
+ break;
+
+ ret = complete_share(&req, share);
+ if (ret)
+ break;
+ }
+
+ WARN_ON(ret);
+out:
+ return ret;
+}
--
2.33.0.882.g93a45727a2-goog

2021-10-16 18:46:13

by Andrew Walbran

Subject: Re: [PATCH 01/16] KVM: arm64: Introduce do_share() helper for memory sharing between components

On Wed, 13 Oct 2021 at 16:58, 'Quentin Perret' via kernel-team
<[email protected]> wrote:
>
> From: Will Deacon <[email protected]>
>
> In preparation for extending memory sharing to include the guest as well
> as the hypervisor and the host, introduce a high-level do_share() helper
> which allows memory to be shared between these components without
> duplication of validity checks.
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 5 +
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 315 ++++++++++++++++++
> 2 files changed, 320 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index b58c910babaf..56445586c755 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -24,6 +24,11 @@ enum pkvm_page_state {
> PKVM_PAGE_OWNED = 0ULL,
> PKVM_PAGE_SHARED_OWNED = KVM_PGTABLE_PROT_SW0,
> PKVM_PAGE_SHARED_BORROWED = KVM_PGTABLE_PROT_SW1,
> + __PKVM_PAGE_RESERVED = KVM_PGTABLE_PROT_SW0 |
> + KVM_PGTABLE_PROT_SW1,
> +
> + /* Meta-states which aren't encoded directly in the PTE's SW bits */
> + PKVM_NOPAGE,
> };
>
> #define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index bacd493a4eac..53e503501044 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -443,3 +443,318 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
> ret = host_stage2_idmap(addr);
> BUG_ON(ret && ret != -EAGAIN);
> }
> +
> +/* This corresponds to locking order */
> +enum pkvm_component_id {
> + PKVM_ID_HOST,
> + PKVM_ID_HYP,
> +};
> +
> +struct pkvm_mem_transition {
> + u64 nr_pages;
> +
> + struct {
> + enum pkvm_component_id id;
> + u64 addr;
Is this the physical address or the IPA of the initiator? It would be
good to have a comment explaining.

> +
> + union {
> + struct {
> + u64 completer_addr;
> + } host;
> + };
> + } initiator;
> +
> + struct {
> + enum pkvm_component_id id;
> + } completer;
> +};
> +
> +struct pkvm_mem_share {
> + struct pkvm_mem_transition tx;
> + enum kvm_pgtable_prot prot;
> +};
> +
> +struct pkvm_page_req {
> + struct {
> + enum pkvm_page_state state;
> + u64 addr;
> + } initiator;
> +
> + struct {
> + u64 addr;
> + } completer;
> +
> + phys_addr_t phys;
> +};
> +
> +struct pkvm_page_share_ack {
> + struct {
> + enum pkvm_page_state state;
> + phys_addr_t phys;
> + enum kvm_pgtable_prot prot;
> + } completer;
> +};
> +
> +static void host_lock_component(void)
> +{
> + hyp_spin_lock(&host_kvm.lock);
> +}
> +
> +static void host_unlock_component(void)
> +{
> + hyp_spin_unlock(&host_kvm.lock);
> +}
> +
> +static void hyp_lock_component(void)
> +{
> + hyp_spin_lock(&pkvm_pgd_lock);
> +}
> +
> +static void hyp_unlock_component(void)
> +{
> + hyp_spin_unlock(&pkvm_pgd_lock);
> +}
> +
> +static int host_request_share(struct pkvm_page_req *req,
> + struct pkvm_mem_transition *tx,
> + u64 idx)
> +{
> + u64 offset = idx * PAGE_SIZE;
> + enum kvm_pgtable_prot prot;
> + u64 host_addr;
> + kvm_pte_t pte;
> + int err;
> +
> + hyp_assert_lock_held(&host_kvm.lock);
> +
> + host_addr = tx->initiator.addr + offset;
> + err = kvm_pgtable_get_leaf(&host_kvm.pgt, host_addr, &pte, NULL);
> + if (err)
> + return err;
> +
> + if (!kvm_pte_valid(pte) && pte)
> + return -EPERM;
> +
> + prot = kvm_pgtable_stage2_pte_prot(pte);
> + *req = (struct pkvm_page_req) {
> + .initiator = {
> + .state = pkvm_getstate(prot),
> + .addr = host_addr,
> + },
> + .completer = {
> + .addr = tx->initiator.host.completer_addr + offset,
> + },
> + .phys = host_addr,
> + };
> +
> + return 0;
> +}
> +
> +/*
> + * Populate the page-sharing request (@req) based on the share transition
> + * information from the initiator and its current page state.
> + */
> +static int request_share(struct pkvm_page_req *req,
> + struct pkvm_mem_share *share,
> + u64 idx)
> +{
> + struct pkvm_mem_transition *tx = &share->tx;
> +
> + switch (tx->initiator.id) {
> + case PKVM_ID_HOST:
> + return host_request_share(req, tx, idx);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +static int hyp_ack_share(struct pkvm_page_share_ack *ack,
> + struct pkvm_page_req *req,
> + enum kvm_pgtable_prot perms)
> +{
> + enum pkvm_page_state state = PKVM_NOPAGE;
> + enum kvm_pgtable_prot prot = 0;
> + phys_addr_t phys = 0;
> + kvm_pte_t pte;
> + u64 hyp_addr;
> + int err;
> +
> + hyp_assert_lock_held(&pkvm_pgd_lock);
> +
> + if (perms != PAGE_HYP)
> + return -EPERM;
> +
> + hyp_addr = req->completer.addr;
> + err = kvm_pgtable_get_leaf(&pkvm_pgtable, hyp_addr, &pte, NULL);
> + if (err)
> + return err;
> +
> + if (kvm_pte_valid(pte)) {
> + state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> + phys = kvm_pte_to_phys(pte);
> + prot = kvm_pgtable_hyp_pte_prot(pte) & KVM_PGTABLE_PROT_RWX;
> + }
> +
> + *ack = (struct pkvm_page_share_ack) {
> + .completer = {
> + .state = state,
> + .phys = phys,
> + .prot = prot,
> + },
> + };
> +
> + return 0;
> +}
> +
> +/*
> + * Populate the page-sharing acknowledgment (@ack) based on the sharing request
> + * from the initiator and the current page state in the completer.
> + */
> +static int ack_share(struct pkvm_page_share_ack *ack,
> + struct pkvm_page_req *req,
> + struct pkvm_mem_share *share)
> +{
> + struct pkvm_mem_transition *tx = &share->tx;
> +
> + switch (tx->completer.id) {
> + case PKVM_ID_HYP:
> + return hyp_ack_share(ack, req, share->prot);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +/*
> + * Check that the page states in the initiator and the completer are compatible
> + * for the requested page-sharing operation to go ahead.
> + */
> +static int check_share(struct pkvm_page_req *req,
> + struct pkvm_page_share_ack *ack,
> + struct pkvm_mem_share *share)
> +{
> + if (!addr_is_memory(req->phys))
> + return -EINVAL;
> +
> + if (req->initiator.state == PKVM_PAGE_OWNED &&
> + ack->completer.state == PKVM_NOPAGE) {
> + return 0;
> + }
> +
> + if (req->initiator.state != PKVM_PAGE_SHARED_OWNED)
> + return -EPERM;
> +
> + if (ack->completer.state != PKVM_PAGE_SHARED_BORROWED)
> + return -EPERM;
> +
> + if (ack->completer.phys != req->phys)
> + return -EPERM;
> +
> + if (ack->completer.prot != share->prot)
> + return -EPERM;
I guess this is the workaround you mentioned for the fact that the
host can share the same page twice? It might be worth adding a comment
to explain that that's what's going on.

> +
> + return 0;
> +}
> +
> +static int host_initiate_share(struct pkvm_page_req *req)
> +{
> + enum kvm_pgtable_prot prot;
> +
> + prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
> + return host_stage2_idmap_locked(req->initiator.addr, PAGE_SIZE, prot);
> +}
> +
> +/* Update the initiator's page-table for the page-sharing request */
> +static int initiate_share(struct pkvm_page_req *req,
> + struct pkvm_mem_share *share)
> +{
> + struct pkvm_mem_transition *tx = &share->tx;
> +
> + switch (tx->initiator.id) {
> + case PKVM_ID_HOST:
> + return host_initiate_share(req);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +static int hyp_complete_share(struct pkvm_page_req *req,
> + enum kvm_pgtable_prot perms)
> +{
> + void *start = (void *)req->completer.addr, *end = start + PAGE_SIZE;
> + enum kvm_pgtable_prot prot;
> +
> + prot = pkvm_mkstate(perms, PKVM_PAGE_SHARED_BORROWED);
> + return pkvm_create_mappings_locked(start, end, prot);
> +}
> +
> +/* Update the completer's page-table for the page-sharing request */
> +static int complete_share(struct pkvm_page_req *req,
> + struct pkvm_mem_share *share)
> +{
> + struct pkvm_mem_transition *tx = &share->tx;
> +
> + switch (tx->completer.id) {
> + case PKVM_ID_HYP:
> + return hyp_complete_share(req, share->prot);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +/*
> + * do_share():
> + *
> + * The page owner grants access to another component with a given set
> + * of permissions.
> + *
> + * Initiator: OWNED => SHARED_OWNED
> + * Completer: NOPAGE => SHARED_BORROWED
> + *
> + * Note that we permit the same share operation to be repeated from the
> + * host to the hypervisor, as this removes the need for excessive
> + * book-keeping of shared KVM data structures at EL1.
> + */
> +static int do_share(struct pkvm_mem_share *share)
> +{
> + struct pkvm_page_req req;
> + int ret = 0;
> + u64 idx;
> +
> + for (idx = 0; idx < share->tx.nr_pages; ++idx) {
> + struct pkvm_page_share_ack ack;
> +
> + ret = request_share(&req, share, idx);
> + if (ret)
> + goto out;
> +
> + ret = ack_share(&ack, &req, share);
> + if (ret)
> + goto out;
> +
> + ret = check_share(&req, &ack, share);
> + if (ret)
> + goto out;
> + }
> +
> + for (idx = 0; idx < share->tx.nr_pages; ++idx) {
> + ret = request_share(&req, share, idx);
> + if (ret)
> + break;
> +
> + /* Allow double-sharing by skipping over the page */
> + if (req.initiator.state == PKVM_PAGE_SHARED_OWNED)
> + continue;
> +
> + ret = initiate_share(&req, share);
> + if (ret)
> + break;
> +
> + ret = complete_share(&req, share);
> + if (ret)
> + break;
> + }
> +
> + WARN_ON(ret);
> +out:
> + return ret;
> +}
> --
> 2.33.0.882.g93a45727a2-goog
>



2021-10-18 03:35:38

by Marc Zyngier

Subject: Re: [PATCH 16/16] KVM: arm64: pkvm: Unshare guest structs during teardown

On Wed, 13 Oct 2021 16:58:31 +0100,
Quentin Perret <[email protected]> wrote:
>
> Make use of the newly introduced unshare hypercall during guest teardown
> to unmap guest-related data structures from the hyp stage-1.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 2 ++
> arch/arm64/include/asm/kvm_mmu.h | 1 +
> arch/arm64/kvm/arm.c | 2 ++
> arch/arm64/kvm/fpsimd.c | 10 ++++++++--
> arch/arm64/kvm/mmu.c | 16 ++++++++++++++++
> arch/arm64/kvm/reset.c | 13 ++++++++++++-
> 6 files changed, 41 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index f8be56d5342b..8b61cdcd1b29 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -322,6 +322,8 @@ struct kvm_vcpu_arch {
>
> struct thread_info *host_thread_info; /* hyp VA */
> struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
> + struct thread_info *kern_thread_info;
> + struct user_fpsimd_state *kern_fpsimd_state;
>
> struct {
> /* {Break,watch}point registers */
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 185d0f62b724..81839e9a8a24 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -151,6 +151,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
> #include <asm/stage2_pgtable.h>
>
> int kvm_share_hyp(void *from, void *to);
> +void kvm_unshare_hyp(void *from, void *to);
> int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
> void __iomem **kaddr,
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index f2e74635332b..f11c51db6fe6 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -188,6 +188,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> }
> }
> atomic_set(&kvm->online_vcpus, 0);
> +
> + kvm_unshare_hyp(kvm, kvm + 1);
> }
>
> int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 2fe1128d9f3d..67059daf4d26 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -28,23 +28,29 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
> {
> int ret;
>
> - struct thread_info *ti = &current->thread_info;
> - struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
> + struct thread_info *ti = vcpu->arch.kern_thread_info;
> + struct user_fpsimd_state *fpsimd = vcpu->arch.kern_fpsimd_state;
>
> /*
> * Make sure the host task thread flags and fpsimd state are
> * visible to hyp:
> */
> + kvm_unshare_hyp(ti, ti + 1);

At this stage, the old thread may have been destroyed and the memory
recycled. What happens if, in the interval, that memory gets shared
again in another context? My gut feeling is that either the sharing
fails, or the unshare above breaks something else (the refcounting
doesn't save us if the sharing is not with HYP).

In any case, I wonder whether we need a notification from the core
code that a thread for which we have a HYP mapping is gone so that we
can mop up the mapping at that point. That's similar to what we have
for MMU notifiers and S2 PTs.

This is doable by hooking into fpsimd_release_task() and extending
thread_struct to track the sharing with HYP.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-10-18 10:34:42

by Quentin Perret

Subject: Re: [PATCH 16/16] KVM: arm64: pkvm: Unshare guest structs during teardown

On Saturday 16 Oct 2021 at 13:25:45 (+0100), Marc Zyngier wrote:
> At this stage, the old thread may have been destroyed and the memory
> recycled. What happens if, in the interval, that memory gets shared
> again in another context? My guts feeling is that either the sharing
> fails, or the unshare above breaks something else (the refcounting
> doesn't save us if the sharing is not with HYP).

Aha, yes, that's a good point. The problematic scenario would be: a vcpu
runs in the context of task A, then blocks. Then task A is destroyed,
but the vcpu isn't (possibly because e.g. userspace intends to run it in
the context of another task or something along those lines). But the
thread_info and fpsimd_state of task A remain shared with the hypervisor
until the next vcpu run, even though the memory has been freed by the
host, and is possibly reallocated to another guest in the meantime.

So yes, at this point sharing/donating this memory range with a new
guest will fail, and confuse the host massively :/

> In any case, I wonder whether we need a notification from the core
> code that a thread for which we have a HYP mapping is gone so that we
> can mop up the mapping at that point. That's similar to what we have
> for MMU notifiers and S2 PTs.
>
> This is doable by hooking into fpsimd_release_task() and extending
> thread_struct to track the sharing with HYP.

I've been looking into this, but struggled to find a good way to avoid
all the races. Specifically, the case where a vcpu and the task which
last ran it get destroyed at the same time isn't easy to handle: you end
up having to maintain pointers from the task to the vcpu and vice versa,
but I have no obvious way to update them both in a non-racy way (other
than having one big lock to serialize all those changes, but that would
mean serializing all task destructions, so that really doesn't scale).

Another option is to take a refcount on 'current' from
kvm_arch_vcpu_run_map_fp() before sharing thread-specific structs with
the hyp and release the refcount of the previous task after unsharing.
But that means we'll have to also drop the refcount when the vcpu
gets destroyed, as well as explicitly unshare at that point. Shouldn't
be too bad I think. Thoughts?

Thanks,
Quentin

2021-10-18 14:18:38

by Quentin Perret

Subject: Re: [PATCH 16/16] KVM: arm64: pkvm: Unshare guest structs during teardown

On Monday 18 Oct 2021 at 11:32:13 (+0100), Quentin Perret wrote:
> Another option is to take a refcount on 'current' from
> kvm_arch_vcpu_run_map_fp() before sharing thread-specific structs with
> the hyp and release the refcount of the previous task after unsharing.
> But that means we'll have to also drop the refcount when the vcpu
> gets destroyed, as well as explicitly unshare at that point. Shouldn't
> be too bad I think. Thoughts?

Something like the below seems to work OK on my setup, including
SIGKILL'ing the guest and such. How much do you hate it?

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f8be56d5342b..50598d704c71 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -322,6 +322,7 @@ struct kvm_vcpu_arch {

struct thread_info *host_thread_info; /* hyp VA */
struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
+ struct task_struct *parent_task;

struct {
/* {Break,watch}point registers */
@@ -738,6 +739,7 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
+void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu);

static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
{
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 2fe1128d9f3d..27afeebbe1cb 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -15,6 +15,22 @@
#include <asm/kvm_mmu.h>
#include <asm/sysreg.h>

+void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
+{
+ struct task_struct *p = vcpu->arch.parent_task;
+ struct user_fpsimd_state *fpsimd;
+ struct thread_info *ti;
+
+ if (!static_branch_likely(&kvm_protected_mode_initialized) || !p)
+ return;
+
+ ti = &p->thread_info;
+ kvm_unshare_hyp(ti, ti + 1);
+ fpsimd = &p->thread.uw.fpsimd_state;
+ kvm_unshare_hyp(fpsimd, fpsimd + 1);
+ put_task_struct(p);
+}
+
/*
* Called on entry to KVM_RUN unless this vcpu previously ran at least
* once and the most recent prior KVM_RUN for this vcpu was called from
@@ -31,6 +47,8 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
struct thread_info *ti = &current->thread_info;
struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;

+ kvm_vcpu_unshare_task_fp(vcpu);
+
/*
* Make sure the host task thread flags and fpsimd state are
* visible to hyp:
@@ -45,6 +63,17 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)

vcpu->arch.host_thread_info = kern_hyp_va(ti);
vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
+
+ /*
+ * We need to keep current's task_struct pinned until its data has been
+ * unshared with the hypervisor to make sure it is not re-used by the
+ * kernel and donated to someone else while already shared -- see
+ * kvm_vcpu_unshare_task_fp() for the matching put_task_struct().
+ */
+ if (static_branch_likely(&kvm_protected_mode_initialized)) {
+ get_task_struct(current);
+ vcpu->arch.parent_task = current;
+ }
error:
return ret;
}
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 5ce36b0a3343..c2a2cd7d5748 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -141,7 +141,13 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)

void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
{
- kfree(vcpu->arch.sve_state);
+ void *sve_state = vcpu->arch.sve_state;
+
+ kvm_vcpu_unshare_task_fp(vcpu);
+ kvm_unshare_hyp(vcpu, vcpu + 1);
+ if (sve_state && vcpu->arch.has_run_once)
+ kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
+ kfree(sve_state);
}

static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)

2021-10-18 17:24:29

by Marc Zyngier

Subject: Re: [PATCH 16/16] KVM: arm64: pkvm: Unshare guest structs during teardown

On 2021-10-18 15:03, Quentin Perret wrote:
> On Monday 18 Oct 2021 at 11:32:13 (+0100), Quentin Perret wrote:
>> Another option is to take a refcount on 'current' from
>> kvm_arch_vcpu_run_map_fp() before sharing thread-specific structs with
>> the hyp and release the refcount of the previous task after unsharing.
>> But that means we'll have to also drop the refcount when the vcpu
>> gets destroyed, as well as explicitly unshare at that point. Shouldn't
>> be too bad I think. Thoughts?
>
> Something like the below seems to work OK on my setup, including
> SIGKILL'ing the guest and such. How much do you hate it?

It is annoyingly elegant! Small nitpick below.

>
> diff --git a/arch/arm64/include/asm/kvm_host.h
> b/arch/arm64/include/asm/kvm_host.h
> index f8be56d5342b..50598d704c71 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -322,6 +322,7 @@ struct kvm_vcpu_arch {
>
> struct thread_info *host_thread_info; /* hyp VA */
> struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
> + struct task_struct *parent_task;
>
> struct {
> /* {Break,watch}point registers */
> @@ -738,6 +739,7 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu
> *vcpu);
> void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
> void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
> void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
> +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu);
>
> static inline bool kvm_pmu_counter_deferred(struct perf_event_attr
> *attr)
> {
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 2fe1128d9f3d..27afeebbe1cb 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -15,6 +15,22 @@
> #include <asm/kvm_mmu.h>
> #include <asm/sysreg.h>
>
> +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
> +{
> + struct task_struct *p = vcpu->arch.parent_task;
> + struct user_fpsimd_state *fpsimd;
> + struct thread_info *ti;
> +
> + if (!static_branch_likely(&kvm_protected_mode_initialized) || !p)

Shouldn't this be a check on is_protected_kvm_enabled() instead?
The two should be equivalent outside of the initialisation code...

Otherwise, ship it.

M.
--
Jazz is not dead. It just smells funny...

2021-10-19 09:41:37

by Quentin Perret

Subject: Re: [PATCH 16/16] KVM: arm64: pkvm: Unshare guest structs during teardown

On Monday 18 Oct 2021 at 18:12:22 (+0100), Marc Zyngier wrote:
> On 2021-10-18 15:03, Quentin Perret wrote:
> > On Monday 18 Oct 2021 at 11:32:13 (+0100), Quentin Perret wrote:
> > > Another option is to take a refcount on 'current' from
> > > kvm_arch_vcpu_run_map_fp() before sharing thread-specific structs with
> > > the hyp and release the refcount of the previous task after unsharing.
> > > But that means we'll have to also drop the refcount when the vcpu
> > > gets destroyed, as well as explicitly unshare at that point. Shouldn't
> > > be too bad I think. Thoughts?
> >
> > Something like the below seems to work OK on my setup, including
> > SIGKILL'ing the guest and such. How much do you hate it?
>
> It is annoyingly elegant! Small nitpick below.
>
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> > b/arch/arm64/include/asm/kvm_host.h
> > index f8be56d5342b..50598d704c71 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -322,6 +322,7 @@ struct kvm_vcpu_arch {
> >
> > struct thread_info *host_thread_info; /* hyp VA */
> > struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
> > + struct task_struct *parent_task;
> >
> > struct {
> > /* {Break,watch}point registers */
> > @@ -738,6 +739,7 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
> > void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
> > void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
> > void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
> > +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu);
> >
> > static inline bool kvm_pmu_counter_deferred(struct perf_event_attr
> > *attr)
> > {
> > diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> > index 2fe1128d9f3d..27afeebbe1cb 100644
> > --- a/arch/arm64/kvm/fpsimd.c
> > +++ b/arch/arm64/kvm/fpsimd.c
> > @@ -15,6 +15,22 @@
> > #include <asm/kvm_mmu.h>
> > #include <asm/sysreg.h>
> >
> > +void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
> > +{
> > + struct task_struct *p = vcpu->arch.parent_task;
> > + struct user_fpsimd_state *fpsimd;
> > + struct thread_info *ti;
> > +
> > + if (!static_branch_likely(&kvm_protected_mode_initialized) || !p)
>
> Shouldn't this be a check on is_protected_kvm_enabled() instead?
> The two should be equivalent outside of the initialisation code...

Yup, it'd be nice to do checks on kvm_protected_mode_initialized only
when they're strictly necessary, and that's not the case here. I'll fold
that change in v2.

Cheers
Quentin

2021-10-19 10:40:33

by Quentin Perret

Subject: Re: [PATCH 01/16] KVM: arm64: Introduce do_share() helper for memory sharing between components

Hi Andrew,

On Friday 15 Oct 2021 at 16:11:49 (+0100), Andrew Walbran wrote:
> On Wed, 13 Oct 2021 at 16:58, 'Quentin Perret' via kernel-team
> > +struct pkvm_mem_transition {
> > + u64 nr_pages;
> > +
> > + struct {
> > + enum pkvm_component_id id;
> > + u64 addr;
> Is this the physical address or the IPA of the initiator? It would be
> good to have a comment explaining.

That's the address in the initiator's address space. For the host and
guests that'll be an IPA (which also happens to be the same as the PA
for the host) and for the hypervisor that'll be an EL2 VA.

But yes, a comment won't hurt, so I'll add something.
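
For illustration only, it could look something like this (exact wording
to be decided):

	struct {
		enum pkvm_component_id	id;
		/* Address in the initiator's address space: IPA, or hyp VA for EL2 */
		u64			addr;
		...
	} initiator;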

<snip>
> > +static int check_share(struct pkvm_page_req *req,
> > + struct pkvm_page_share_ack *ack,
> > + struct pkvm_mem_share *share)
> > +{
> > + if (!addr_is_memory(req->phys))
> > + return -EINVAL;
> > +
> > + if (req->initiator.state == PKVM_PAGE_OWNED &&
> > + ack->completer.state == PKVM_NOPAGE) {
> > + return 0;
> > + }
> > +
> > + if (req->initiator.state != PKVM_PAGE_SHARED_OWNED)
> > + return -EPERM;
> > +
> > + if (ack->completer.state != PKVM_PAGE_SHARED_BORROWED)
> > + return -EPERM;
> > +
> > + if (ack->completer.phys != req->phys)
> > + return -EPERM;
> > +
> > + if (ack->completer.prot != share->prot)
> > + return -EPERM;
> I guess this is the workaround you mentioned for the fact that the
> host can share the same page twice? It might be worth adding a comment
> to explain that that's what's going on.

Yep, that's what is going on here. But FWIW I'm currently reworking the
way we refcount pages in v2, which will now become the host's
responsibility. So, that should simplify things quite a bit at EL2, and
make all of the above go away :-)

Cheers,
Quentin