Hi all,
This RFC series provides the infrastructure needed to wrap the host
kernel with a stage 2 when running KVM in nVHE. This can be useful for
several use-cases, but the primary motivation is to (eventually) be able
to protect guest memory from the host kernel. More details about the
overall idea, design, and motivations can be found in Will's talk at KVM
Forum 2020 [1], or the pKVM talk at the Android uconf during LPC 2020
[2].
This series essentially gets us to a point where the 'VM' bit is set
in the host's HCR_EL2 when running in nVHE and if 'kvm-arm.protected'
is set on the kernel command line. The EL2 object directly handles
memory aborts from the host and entirely manages its own stage 2 page table.
However, this series does _not_ provide any real user for this (yet)
and simply idmaps everything into the host stage 2 as RWX cacheable.
This is all about the infrastructure for now, so it is clearly not
ready for inclusion upstream yet (hence the RFC tag), but the
foundations are there and I thought it would be useful to start a
discussion with the community early as this is a rather intrusive
change. So, here goes.
One of the interesting requirements that comes with the series is that
managing page-tables requires some sort of memory allocator at EL2 to
allocate, refcount and free memory pages. Clearly, none of that is
currently possible in nVHE, so a significant chunk of the series is
dedicated to solving that problem. The proposed EL2 memory allocator
mimics the Linux buddy system in principle, and re-uses some of the
arm64 mm design choices. Specifically, it uses a vmemmap at EL2 which
contains a set of struct hyp_page entries to hold page metadata. To
support this, I extended the EL2 object to make it manage its own stage
1 page-table in addition to the host stage 2. This simplifies the hyp_vmemmap
creation and was going to be required anyway for the protected VM
use-case -- the threat model implies the host cannot be trusted after
boot, and it will thus be crucial to ensure it cannot map arbitrary code
at EL2.
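The PA-to-metadata translation at the heart of the hyp_vmemmap is a
one-liner mirroring the kernel's vmemmap; quoting the memory.h changes
later in the series:

  #define hyp_phys_to_pfn(phys)   ((phys) >> PAGE_SHIFT)
  #define hyp_phys_to_page(phys)  (&hyp_vmemmap[hyp_phys_to_pfn(phys)])
  #define hyp_virt_to_page(virt)  hyp_phys_to_page(__hyp_pa(virt))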
The pool of memory pages used by the EL2 allocator is reserved by the
host early during boot (while it is still trusted) using the memblock
API, and is donated to EL2 during KVM init. The current assumption is
that the host reserves enough pages to allow the EL2 object to map all
of memory at page granularity for both hyp stage 1 and host stage 2,
plus some extra pages for device mappings.
On top of that, the series introduces a few smaller features that are
needed along the way, but hopefully all of those are detailed properly
in the relevant commit messages.
And as a last note, I'd like to point out that there are at this point
trivial ways for the host to circumvent its stage 2 protection. It still
owns the guests' stage 2, for example, meaning that nothing would
prevent a malicious host from using a guest as a proxy to access
protected memory, _yet_. This series lays the groundwork for future
work to address
these things, which will clearly require a stage 2 over the host at some
point, so I just wanted to set the expectations right.
With all that in mind, the series is organized as follows:
- patches 01-03 provide EL2 with some utility libraries needed for
memory management and synchronization;
- patches 04-09 mostly refactor small portions of the code to ease the
EL2 memory management;
- patches 10-17 add the actual EL2 memory management code, as well as
the setup/bootstrap code on the KVM init path;
- patches 18-24 refactor the existing stage 2 management code to make
it re-usable from the EL2 object;
- and finally patches 25-27 introduce the host stage 2 and the trap
handling logic at EL2.
This work is based on the latest kvmarm/queue (which includes Marc's
host EL2 entry rework [3], as well as Will's guest vector refactoring
[4]) + David's PSCI proxying series [5].
And if you'd like a branch that has all the bits and pieces:
https://android-kvm.googlesource.com/linux qperret/host-stage2
Boot-tested (host and guest) using qemu in VHE and nVHE, and on real
hardware on an AML-S905X-CC (Le Potato).
Thanks,
Quentin
[1] https://kvmforum2020.sched.com/event/eE24/virtualization-for-the-masses-exposing-kvm-on-android-will-deacon-google
[2] https://youtu.be/54q6RzS9BpQ?t=10859
[3] https://lore.kernel.org/kvmarm/[email protected]/
[4] https://lore.kernel.org/kvmarm/[email protected]/
[5] https://lore.kernel.org/kvmarm/[email protected]/
Quentin Perret (24):
KVM: arm64: Initialize kvm_nvhe_init_params early
KVM: arm64: Avoid free_page() in page-table allocator
KVM: arm64: Factor memory allocation out of pgtable.c
KVM: arm64: Introduce a BSS section for use at Hyp
KVM: arm64: Make kvm_call_hyp() a function call at Hyp
KVM: arm64: Allow using kvm_nvhe_sym() in hyp code
KVM: arm64: Introduce an early Hyp page allocator
KVM: arm64: Stub CONFIG_DEBUG_LIST at Hyp
KVM: arm64: Introduce a Hyp buddy page allocator
KVM: arm64: Enable access to sanitized CPU features at EL2
KVM: arm64: Factor out vector address calculation
of/fdt: Introduce early_init_dt_add_memory_hyp()
KVM: arm64: Prepare Hyp memory protection
KVM: arm64: Elevate Hyp mappings creation at EL2
KVM: arm64: Use kvm_arch for stage 2 pgtable
KVM: arm64: Use kvm_arch in kvm_s2_mmu
KVM: arm64: Set host stage 2 using kvm_nvhe_init_params
KVM: arm64: Refactor kvm_arm_setup_stage2()
KVM: arm64: Refactor __load_guest_stage2()
KVM: arm64: Refactor __populate_fault_info()
KVM: arm64: Make memcache anonymous in pgtable allocator
KVM: arm64: Reserve memory for host stage 2
KVM: arm64: Sort the memblock regions list
KVM: arm64: Wrap the host with a stage 2
Will Deacon (3):
arm64: lib: Annotate {clear,copy}_page() as position-independent
KVM: arm64: Link position-independent string routines into .hyp.text
KVM: arm64: Add standalone ticket spinlock implementation for use at
hyp
arch/arm64/include/asm/cpufeature.h | 1 +
arch/arm64/include/asm/hyp_image.h | 4 +
arch/arm64/include/asm/kvm_asm.h | 13 +-
arch/arm64/include/asm/kvm_cpufeature.h | 19 ++
arch/arm64/include/asm/kvm_host.h | 17 +-
arch/arm64/include/asm/kvm_hyp.h | 8 +
arch/arm64/include/asm/kvm_mmu.h | 69 +++++-
arch/arm64/include/asm/kvm_pgtable.h | 41 +++-
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/kernel/asm-offsets.c | 3 +
arch/arm64/kernel/cpufeature.c | 14 +-
arch/arm64/kernel/image-vars.h | 35 +++
arch/arm64/kernel/vmlinux.lds.S | 7 +
arch/arm64/kvm/arm.c | 136 +++++++++--
arch/arm64/kvm/hyp/Makefile | 2 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 36 +--
arch/arm64/kvm/hyp/include/nvhe/early_alloc.h | 14 ++
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 32 +++
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 33 +++
arch/arm64/kvm/hyp/include/nvhe/memory.h | 55 +++++
arch/arm64/kvm/hyp/include/nvhe/mm.h | 107 +++++++++
arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 95 ++++++++
arch/arm64/kvm/hyp/include/nvhe/util.h | 25 ++
arch/arm64/kvm/hyp/nvhe/Makefile | 9 +-
arch/arm64/kvm/hyp/nvhe/cache.S | 13 ++
arch/arm64/kvm/hyp/nvhe/cpufeature.c | 8 +
arch/arm64/kvm/hyp/nvhe/early_alloc.c | 60 +++++
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 39 ++++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 50 ++++
arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 191 ++++++++++++++++
arch/arm64/kvm/hyp/nvhe/mm.c | 175 ++++++++++++++
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 185 +++++++++++++++
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 7 +-
arch/arm64/kvm/hyp/nvhe/setup.c | 214 ++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/stub.c | 22 ++
arch/arm64/kvm/hyp/nvhe/switch.c | 12 +-
arch/arm64/kvm/hyp/nvhe/tlb.c | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 98 ++++----
arch/arm64/kvm/hyp/reserved_mem.c | 95 ++++++++
arch/arm64/kvm/mmu.c | 114 +++++++++-
arch/arm64/kvm/reset.c | 42 +---
arch/arm64/lib/clear_page.S | 4 +-
arch/arm64/lib/copy_page.S | 4 +-
arch/arm64/mm/init.c | 3 +
drivers/of/fdt.c | 5 +
46 files changed, 1971 insertions(+), 151 deletions(-)
create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/gfp.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/memory.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mm.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/spinlock.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/util.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/early_alloc.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/mm.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/page_alloc.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/setup.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/stub.c
create mode 100644 arch/arm64/kvm/hyp/reserved_mem.c
--
2.29.2.299.gdc1121823c-goog
From: Will Deacon <[email protected]>
Pull clear_page(), copy_page(), memcpy() and memset() into the nVHE hyp
code and ensure that we always execute the '__pi_' entry point, on the
off-chance that it changes in the future.
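As an illustration (not part of the patch), nVHE code can then keep
calling these routines as usual; the aliases below make the linker bind
such calls to the '__pi_' entry points:

  #include <linux/string.h>

  /* Somewhere in hyp code, built with __KVM_NVHE_HYPERVISOR__: */
  static void copy_init_params(void *dst, const void *src, size_t n)
  {
          memcpy(dst, src, n);    /* resolves to __kvm_nvhe___pi_memcpy */
  }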
[ qperret: Commit title nits ]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kernel/image-vars.h | 11 +++++++++++
arch/arm64/kvm/hyp/nvhe/Makefile | 4 ++++
2 files changed, 15 insertions(+)
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 8539f34d7538..dd8ccc9efb6a 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -105,6 +105,17 @@ KVM_NVHE_ALIAS(__stop___kvm_ex_table);
/* Array containing bases of nVHE per-CPU memory regions. */
KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
+/* Position-independent library routines */
+__kvm_nvhe_clear_page = __kvm_nvhe___pi_clear_page;
+__kvm_nvhe_copy_page = __kvm_nvhe___pi_copy_page;
+__kvm_nvhe_memcpy = __kvm_nvhe___pi_memcpy;
+__kvm_nvhe_memset = __kvm_nvhe___pi_memset;
+
+#ifdef CONFIG_KASAN
+__kvm_nvhe___memcpy = __kvm_nvhe___pi_memcpy;
+__kvm_nvhe___memset = __kvm_nvhe___pi_memset;
+#endif
+
#endif /* CONFIG_KVM */
#endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 1f1e351c5fe2..590fdefb42dd 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -6,10 +6,14 @@
asflags-y := -D__KVM_NVHE_HYPERVISOR__
ccflags-y := -D__KVM_NVHE_HYPERVISOR__
+lib-objs := clear_page.o copy_page.o memcpy.o memset.o
+lib-objs := $(addprefix ../../../lib/, $(lib-objs))
+
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
hyp-main.o hyp-smp.o psci-relay.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o
+obj-y += $(lib-objs)
##
## Build rules for compiling nVHE hyp code
--
2.29.2.299.gdc1121823c-goog
From: Will Deacon <[email protected]>
clear_page() and copy_page() are suitable for use outside of the kernel
address space, so annotate them as position-independent code.
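For reference, at the time of writing these annotations expand roughly
as follows (per arch/arm64/include/asm/linkage.h), emitting a '__pi_'
alias alongside the regular symbol:

  #define SYM_FUNC_START_PI(x)                    \
                  SYM_FUNC_START_ALIAS(__pi_##x); \
                  SYM_FUNC_START(x)

  #define SYM_FUNC_END_PI(x)                      \
                  SYM_FUNC_END(x);                \
                  SYM_FUNC_END_ALIAS(__pi_##x)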
Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/lib/clear_page.S | 4 ++--
arch/arm64/lib/copy_page.S | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
index 073acbf02a7c..b84b179edba3 100644
--- a/arch/arm64/lib/clear_page.S
+++ b/arch/arm64/lib/clear_page.S
@@ -14,7 +14,7 @@
* Parameters:
* x0 - dest
*/
-SYM_FUNC_START(clear_page)
+SYM_FUNC_START_PI(clear_page)
mrs x1, dczid_el0
and w1, w1, #0xf
mov x2, #4
@@ -25,5 +25,5 @@ SYM_FUNC_START(clear_page)
tst x0, #(PAGE_SIZE - 1)
b.ne 1b
ret
-SYM_FUNC_END(clear_page)
+SYM_FUNC_END_PI(clear_page)
EXPORT_SYMBOL(clear_page)
diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index e7a793961408..29144f4cd449 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -17,7 +17,7 @@
* x0 - dest
* x1 - src
*/
-SYM_FUNC_START(copy_page)
+SYM_FUNC_START_PI(copy_page)
alternative_if ARM64_HAS_NO_HW_PREFETCH
// Prefetch three cache lines ahead.
prfm pldl1strm, [x1, #128]
@@ -75,5 +75,5 @@ alternative_else_nop_endif
stnp x16, x17, [x0, #112 - 256]
ret
-SYM_FUNC_END(copy_page)
+SYM_FUNC_END_PI(copy_page)
EXPORT_SYMBOL(copy_page)
--
2.29.2.299.gdc1121823c-goog
Previous commits have introduced infrastructure at EL2 to enable the Hyp
code to manage its own memory, and more specifically its stage 1 page
tables. However, this was preliminary work, and none of it is currently
in use.
Put all of this together by elevating the hyp mappings creation at EL2
when memory protection is enabled. In this case, the host kernel running
at EL1 still creates _temporary_ Hyp mappings, only used while
initializing the hypervisor, but frees them right after, and flips a
static key marking the new 'protected' mode of operation.
As such, all calls to create_hyp_mappings() after KVM init has finished
turn into hypercalls, as the host now has no 'legal' way to modify the
hypervisor page tables directly.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_mmu.h | 1 -
arch/arm64/kvm/arm.c | 51 ++++++++++++++++++++++++++++++--
arch/arm64/kvm/mmu.c | 34 +++++++++++++++++++++
3 files changed, 82 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index cb104443d8e4..bb756757b51c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -285,6 +285,5 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
*/
asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
}
-
#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b1e1747e4bbf..cfe5cc55b425 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1373,7 +1373,7 @@ static void cpu_prepare_hyp_mode(int cpu)
__flush_dcache_area(params, sizeof(*params));
}
-static void cpu_init_hyp_mode(void)
+static void kvm_set_hyp_vector(void)
{
struct kvm_nvhe_init_params *params;
struct arm_smccc_res res;
@@ -1391,6 +1391,11 @@ static void cpu_init_hyp_mode(void)
params = this_cpu_ptr_nvhe_sym(kvm_init_params);
arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init), virt_to_phys(params), &res);
WARN_ON(res.a0 != SMCCC_RET_SUCCESS);
+}
+
+static void cpu_init_hyp_mode(void)
+{
+ kvm_set_hyp_vector();
/*
* Disabling SSBD on a non-VHE system requires us to enable SSBS
@@ -1433,7 +1438,10 @@ static void cpu_set_hyp_vector(void)
struct bp_hardening_data *data = this_cpu_ptr(&bp_hardening_data);
void *vector = hyp_spectre_vector_selector[data->slot];
- *this_cpu_ptr_hyp_sym(kvm_hyp_vector) = (unsigned long)vector;
+ if (!is_protected_kvm_enabled())
+ *this_cpu_ptr_hyp_sym(kvm_hyp_vector) = (unsigned long)vector;
+ else
+ kvm_call_hyp_nvhe(__hyp_cpu_set_vector, data->slot);
}
static void cpu_hyp_reinit(void)
@@ -1441,13 +1449,14 @@ static void cpu_hyp_reinit(void)
kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
cpu_hyp_reset();
- cpu_set_hyp_vector();
if (is_kernel_in_hyp_mode())
kvm_timer_init_vhe();
else
cpu_init_hyp_mode();
+ cpu_set_hyp_vector();
+
kvm_arm_init_debug();
if (vgic_present)
@@ -1653,6 +1662,36 @@ static int copy_cpu_ftr_regs(void)
return 0;
}
+static int kvm_hyp_enable_protection(void)
+{
+ void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
+ int ret, cpu;
+ void *addr;
+
+ if (!is_protected_kvm_enabled())
+ return 0;
+
+ if (!hyp_mem_base)
+ return -ENOMEM;
+
+ addr = phys_to_virt(hyp_mem_base);
+ ret = create_hyp_mappings(addr, addr + hyp_mem_size - 1, PAGE_HYP);
+ if (ret)
+ return ret;
+
+ kvm_set_hyp_vector();
+ ret = kvm_call_hyp_nvhe(__kvm_hyp_protect, hyp_mem_base, hyp_mem_size,
+ num_possible_cpus(), kern_hyp_va(per_cpu_base));
+ if (ret)
+ return ret;
+
+ free_hyp_pgds();
+ for_each_possible_cpu(cpu)
+ free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+
+ return 0;
+}
+
/**
* Inits Hyp-mode on all online CPUs
*/
@@ -1789,6 +1828,12 @@ static int init_hyp_mode(void)
for_each_possible_cpu(cpu)
cpu_prepare_hyp_mode(cpu);
+ err = kvm_hyp_enable_protection();
+ if (err) {
+ kvm_err("Failed to enable hyp memory protection: %d\n", err);
+ goto out_err;
+ }
+
return 0;
out_err:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3cf9397dabdb..5c2e0feb9689 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -225,15 +225,39 @@ void free_hyp_pgds(void)
if (hyp_pgtable) {
kvm_pgtable_hyp_destroy(hyp_pgtable);
kfree(hyp_pgtable);
+ hyp_pgtable = NULL;
}
mutex_unlock(&kvm_hyp_pgd_mutex);
}
+static bool kvm_host_owns_hyp_mappings(void)
+{
+ if (static_branch_likely(&kvm_protected_mode_initialized))
+ return false;
+
+ /*
+ * This can happen at boot time when __create_hyp_mappings() is called
+ * after the hyp protection has been enabled, but the static key has
+ * not been flipped yet.
+ */
+ if (!hyp_pgtable && is_protected_kvm_enabled())
+ return false;
+
+ BUG_ON(!hyp_pgtable);
+
+ return true;
+}
+
static int __create_hyp_mappings(unsigned long start, unsigned long size,
unsigned long phys, enum kvm_pgtable_prot prot)
{
int err;
+ if (!kvm_host_owns_hyp_mappings()) {
+ return kvm_call_hyp_nvhe(__hyp_create_mappings,
+ start, size, phys, prot);
+ }
+
mutex_lock(&kvm_hyp_pgd_mutex);
err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
mutex_unlock(&kvm_hyp_pgd_mutex);
@@ -295,6 +319,16 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
unsigned long base;
int ret = 0;
+ if (!kvm_host_owns_hyp_mappings()) {
+ base = kvm_call_hyp_nvhe(__hyp_create_private_mapping,
+ phys_addr, size, prot);
+ if (!base)
+ return -ENOMEM;
+ *haddr = base;
+
+ return 0;
+ }
+
mutex_lock(&kvm_hyp_pgd_mutex);
/*
--
2.29.2.299.gdc1121823c-goog
Currently, the hyp code cannot make full use of a BSS, as the kernel's
BSS section is mapped read-only at EL2.
While this mapping could simply be changed to read-write, it would
intermingle the hyp and kernel state even more than they currently are.
Instead, introduce a __hyp_bss section that uses reserved pages, and
create the appropriate RW hyp mappings during KVM init.
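To illustrate (a sketch, not part of the patch): after this change, a
zero-initialized object in any nVHE translation unit is placed in
.hyp.bss by the hyp linker script, and is therefore writable at EL2:

  /* In some arch/arm64/kvm/hyp/nvhe/ file (illustrative names): */
  static unsigned long hyp_example_counter;       /* lands in .hyp.bss */

  void hyp_example_tick(void)
  {
          /* Legal at EL2: .hyp.bss is mapped PAGE_HYP (RW). */
          hyp_example_counter++;
  }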
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/kernel/vmlinux.lds.S | 7 +++++++
arch/arm64/kvm/arm.c | 11 +++++++++++
arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 1 +
4 files changed, 20 insertions(+)
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index 8ff579361731..f58cf493de16 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -12,6 +12,7 @@ extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
extern char __hyp_text_start[], __hyp_text_end[];
extern char __hyp_data_ro_after_init_start[], __hyp_data_ro_after_init_end[];
+extern char __hyp_bss_start[], __hyp_bss_end[];
extern char __idmap_text_start[], __idmap_text_end[];
extern char __initdata_begin[], __initdata_end[];
extern char __inittext_begin[], __inittext_end[];
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 4382b5d0645d..ded78a25365d 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -8,6 +8,13 @@
#define RO_EXCEPTION_TABLE_ALIGN 8
#define RUNTIME_DISCARD_EXIT
+#define BSS_FIRST_SECTIONS \
+ . = ALIGN(PAGE_SIZE); \
+ __hyp_bss_start = .; \
+ *(.hyp.bss) \
+ . = ALIGN(PAGE_SIZE); \
+ __hyp_bss_end = .;
+
#include <asm-generic/vmlinux.lds.h>
#include <asm/cache.h>
#include <asm/hyp_image.h>
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7335eb4fb0bd..882eb383bd75 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1709,7 +1709,18 @@ static int init_hyp_mode(void)
goto out_err;
}
+ /*
+ * .hyp.bss is placed at the beginning of the .bss section, so map that
+ * part RW, and the rest RO as the hyp shouldn't be touching it.
+ */
err = create_hyp_mappings(kvm_ksym_ref(__bss_start),
+ kvm_ksym_ref(__hyp_bss_end), PAGE_HYP);
+ if (err) {
+ kvm_err("Cannot map hyp bss section: %d\n", err);
+ goto out_err;
+ }
+
+ err = create_hyp_mappings(kvm_ksym_ref(__hyp_bss_end),
kvm_ksym_ref(__bss_stop), PAGE_HYP_RO);
if (err) {
kvm_err("Cannot map bss section\n");
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
index 5d76ff2ba63e..dc281d90063e 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
@@ -17,4 +17,5 @@ SECTIONS {
PERCPU_INPUT(L1_CACHE_BYTES)
}
HYP_SECTION(.data..ro_after_init)
+ HYP_SECTION(.bss)
}
--
2.29.2.299.gdc1121823c-goog
In order to allow the usage of code shared by the host and the hyp in
static inline library functions, allow the usage of kvm_nvhe_sym() at
EL2 by defaulting to the raw symbol name.
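A hedged sketch of the kind of shared helper this enables (using the
hyp_memblock_nr variable introduced later in the series):

  #include <asm/hyp_image.h>

  /*
   * Resolves to __kvm_nvhe_hyp_memblock_nr when built for the kernel
   * proper, and to plain hyp_memblock_nr when built for EL2.
   */
  static inline int hyp_memblock_count(void)
  {
          return kvm_nvhe_sym(hyp_memblock_nr);
  }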
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/hyp_image.h | 4 ++++
arch/arm64/include/asm/kvm_asm.h | 4 ++--
arch/arm64/kvm/arm.c | 2 +-
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/hyp_image.h b/arch/arm64/include/asm/hyp_image.h
index daa1a1da539e..8b807b646b8f 100644
--- a/arch/arm64/include/asm/hyp_image.h
+++ b/arch/arm64/include/asm/hyp_image.h
@@ -7,11 +7,15 @@
#ifndef __ARM64_HYP_IMAGE_H__
#define __ARM64_HYP_IMAGE_H__
+#ifndef __KVM_NVHE_HYPERVISOR__
/*
* KVM nVHE code has its own symbol namespace prefixed with __kvm_nvhe_,
* to separate it from the kernel proper.
*/
#define kvm_nvhe_sym(sym) __kvm_nvhe_##sym
+#else
+#define kvm_nvhe_sym(sym) sym
+#endif
#ifdef LINKER_SCRIPT
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 1a86581e581e..e4934f5e4234 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -173,11 +173,11 @@ struct kvm_s2_mmu;
DECLARE_KVM_NVHE_SYM(__kvm_hyp_init);
DECLARE_KVM_NVHE_SYM(__kvm_hyp_host_vector);
DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
-DECLARE_KVM_NVHE_SYM(__kvm_hyp_psci_cpu_entry);
#define __kvm_hyp_init CHOOSE_NVHE_SYM(__kvm_hyp_init)
#define __kvm_hyp_host_vector CHOOSE_NVHE_SYM(__kvm_hyp_host_vector)
#define __kvm_hyp_vector CHOOSE_HYP_SYM(__kvm_hyp_vector)
-#define __kvm_hyp_psci_cpu_entry CHOOSE_NVHE_SYM(__kvm_hyp_psci_cpu_entry)
+
+void kvm_nvhe_sym(__kvm_hyp_psci_cpu_entry)(void);
extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
DECLARE_KVM_NVHE_SYM(__per_cpu_start);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 882eb383bd75..391cf6753a13 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1369,7 +1369,7 @@ static void cpu_prepare_hyp_mode(int cpu)
params->vector_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_host_vector));
params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
- params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_psci_cpu_entry));
+ params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref_nvhe(__kvm_hyp_psci_cpu_entry));
params->pgd_pa = kvm_mmu_get_httbr();
/*
--
2.29.2.299.gdc1121823c-goog
kvm_call_hyp() has some logic to issue a function call or a hypercall
depending on the EL at which the kernel is running. However, all the
code compiled under __KVM_NVHE_HYPERVISOR__ is guaranteed to run only at
EL2, and in this case a simple function call is needed.
Add ifdefery to kvm_host.h to simplify kvm_call_hyp() in .hyp.text.
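For illustration, shared code such as the following now compiles to a
plain function call when built for EL2, and to an HVC otherwise:

  static void example_flush_vmid(struct kvm_s2_mmu *mmu)
  {
          /* HVC from the host; direct call in .hyp.text */
          kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
  }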
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ac11adab6602..7a5d5f4b3351 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -557,6 +557,7 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
void kvm_arm_halt_guest(struct kvm *kvm);
void kvm_arm_resume_guest(struct kvm *kvm);
+#ifndef __KVM_NVHE_HYPERVISOR__
#define kvm_call_hyp_nvhe(f, ...) \
({ \
struct arm_smccc_res res; \
@@ -596,6 +597,11 @@ void kvm_arm_resume_guest(struct kvm *kvm);
\
ret; \
})
+#else /* __KVM_NVHE_HYPERVISOR__ */
+#define kvm_call_hyp(f, ...) f(__VA_ARGS__)
+#define kvm_call_hyp_ret(f, ...) f(__VA_ARGS__)
+#define kvm_call_hyp_nvhe(f, ...) f(__VA_ARGS__)
+#endif /* __KVM_NVHE_HYPERVISOR__ */
void force_vm_exit(const cpumask_t *mask);
void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
--
2.29.2.299.gdc1121823c-goog
The current stage 2 page-table allocator uses a memcache to get
pre-allocated pages when it needs any. To allow re-using this code at
EL2, which uses a concept of memory pools, make the memcache argument to
kvm_pgtable_stage2_map() anonymous, and let the mm_ops zalloc_page()
callbacks use it the way they need to.
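As a sketch of where this is heading (the hyp-side callback name is
illustrative, not part of this patch), each side can then provide a
zalloc_page() callback that interprets the opaque cookie its own way:

  /* Host side: mc is a struct kvm_mmu_memory_cache, as before. */
  static void *stage2_memcache_zalloc_page(void *mc)
  {
          return kvm_mmu_memory_cache_alloc(mc);
  }

  /* Hyp side (sketch): mc is a struct hyp_pool. */
  static void *stage2_pool_zalloc_page(void *mc)
  {
          return hyp_alloc_pages(mc, HYP_GFP_ZERO, 0);
  }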
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 6 +++---
arch/arm64/kvm/hyp/pgtable.c | 4 ++--
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8e8f1d2c5e0e..d846bc3d3b77 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -176,8 +176,8 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
* @size: Size of the mapping.
* @phys: Physical address of the memory to map.
* @prot: Permissions and attributes for the mapping.
- * @mc: Cache of pre-allocated GFP_PGTABLE_USER memory from which to
- * allocate page-table pages.
+ * @mc: Cache of pre-allocated memory from which to allocate page-table
+ * pages.
*
* The offset of @addr within a page is ignored, @size is rounded-up to
* the next page boundary and @phys is rounded-down to the previous page
@@ -194,7 +194,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
*/
int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
u64 phys, enum kvm_pgtable_prot prot,
- struct kvm_mmu_memory_cache *mc);
+ void *mc);
/**
* kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 96a25d0b7b6e..5dd1b4978fe8 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -443,7 +443,7 @@ struct stage2_map_data {
kvm_pte_t *anchor;
struct kvm_s2_mmu *mmu;
- struct kvm_mmu_memory_cache *memcache;
+ void *memcache;
struct kvm_pgtable_mm_ops *mm_ops;
};
@@ -613,7 +613,7 @@ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
u64 phys, enum kvm_pgtable_prot prot,
- struct kvm_mmu_memory_cache *mc)
+ void *mc)
{
int ret;
struct stage2_map_data map_data = {
--
2.29.2.299.gdc1121823c-goog
In order to use the kernel list library at EL2, introduce stubs for the
CONFIG_DEBUG_LIST out-of-line calls.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/stub.c | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kvm/hyp/nvhe/stub.c
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 1fc0684a7678..33bd381d8f73 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -10,7 +10,7 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
lib-objs := $(addprefix ../../../lib/, $(lib-objs))
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
- hyp-main.o hyp-smp.o psci-relay.o early_alloc.o
+ hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o
obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/stub.c b/arch/arm64/kvm/hyp/nvhe/stub.c
new file mode 100644
index 000000000000..c0aa6bbfd79d
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/stub.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Stubs for out-of-line function calls caused by re-using kernel
+ * infrastructure at EL2.
+ *
+ * Copyright (C) 2020 - Google LLC
+ */
+
+#include <linux/list.h>
+
+#ifdef CONFIG_DEBUG_LIST
+bool __list_add_valid(struct list_head *new, struct list_head *prev,
+ struct list_head *next)
+{
+ return true;
+}
+
+bool __list_del_entry_valid(struct list_head *entry)
+{
+ return true;
+}
+#endif
--
2.29.2.299.gdc1121823c-goog
The hypervisor will need the list of memblock regions sorted by
increasing start address to make look-ups more efficient. Make the
host do the hard work early while it is still trusted to avoid the need
for a sorting library at EL2.
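To illustrate the kind of look-up this enables at EL2 (a sketch, not
part of this patch):

  /* EL2 side: hyp_memory[] is sorted by increasing start address. */
  static struct hyp_memblock_region *find_mem_region(phys_addr_t addr)
  {
          int lo = 0, hi = hyp_memblock_nr - 1;

          while (lo <= hi) {
                  int mid = lo + (hi - lo) / 2;
                  struct hyp_memblock_region *reg = &hyp_memory[mid];

                  if (addr < reg->start)
                          hi = mid - 1;
                  else if (addr >= reg->end)
                          lo = mid + 1;
                  else
                          return reg;     /* start <= addr < end */
          }

          return NULL;    /* addr does not point to memory */
  }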
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/hyp/reserved_mem.c | 18 ++++++++++++++++++
3 files changed, 20 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 53b01d25e7d9..ec304a5c728b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -746,6 +746,7 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
extern phys_addr_t hyp_mem_base;
extern phys_addr_t hyp_mem_size;
void __init reserve_kvm_hyp(void);
+void kvm_sort_memblock_regions(void);
#else
static inline void reserve_kvm_hyp(void) { }
#endif
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e06c95a10dba..8160a0d12a58 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1685,6 +1685,7 @@ static int kvm_hyp_enable_protection(void)
return ret;
kvm_set_hyp_vector();
+ kvm_sort_memblock_regions();
ret = kvm_call_hyp_nvhe(__kvm_hyp_protect, hyp_mem_base, hyp_mem_size,
num_possible_cpus(), kern_hyp_va(per_cpu_base));
if (ret)
diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
index c2c0484b6211..7da8e2915c1c 100644
--- a/arch/arm64/kvm/hyp/reserved_mem.c
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -6,6 +6,7 @@
#include <linux/kvm_host.h>
#include <linux/memblock.h>
+#include <linux/sort.h>
#include <asm/kvm_host.h>
@@ -31,6 +32,23 @@ void __init early_init_dt_add_memory_hyp(u64 base, u64 size)
kvm_nvhe_sym(hyp_memblock_nr)++;
}
+static int cmp_hyp_memblock(const void *p1, const void *p2)
+{
+ const struct hyp_memblock_region *r1 = p1;
+ const struct hyp_memblock_region *r2 = p2;
+
+ return r1->start < r2->start ? -1 : (r1->start > r2->start);
+}
+
+void kvm_sort_memblock_regions(void)
+{
+ sort(kvm_nvhe_sym(hyp_memory),
+ kvm_nvhe_sym(hyp_memblock_nr),
+ sizeof(struct hyp_memblock_region),
+ cmp_hyp_memblock,
+ NULL);
+}
+
extern bool enable_protected_kvm;
void __init reserve_kvm_hyp(void)
{
--
2.29.2.299.gdc1121823c-goog
Currently, the KVM page-table allocator uses a mix of put_page() and
free_page() calls depending on the context, even though page allocation
is always achieved using variants of __get_free_page().
Make the code consistent by using put_page() throughout, and reduce the
memory management API surface used by the page-table code. This will
ease factoring out page allocation from pgtable.c, which is a
pre-requisite to creating page-tables at EL2.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 0271b4a3b9fe..d7122c5eac24 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -410,7 +410,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
- free_page((unsigned long)kvm_pte_follow(*ptep));
+ put_page(virt_to_page(kvm_pte_follow(*ptep)));
return 0;
}
@@ -422,7 +422,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
};
WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
- free_page((unsigned long)pgt->pgd);
+ put_page(virt_to_page(pgt->pgd));
pgt->pgd = NULL;
}
@@ -551,7 +551,7 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
if (!data->anchor)
return 0;
- free_page((unsigned long)kvm_pte_follow(*ptep));
+ put_page(virt_to_page(kvm_pte_follow(*ptep)));
put_page(virt_to_page(ptep));
if (data->anchor == ptep) {
@@ -674,7 +674,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
}
if (childp)
- free_page((unsigned long)childp);
+ put_page(virt_to_page(childp));
return 0;
}
@@ -871,7 +871,7 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
put_page(virt_to_page(ptep));
if (kvm_pte_table(pte, level))
- free_page((unsigned long)kvm_pte_follow(pte));
+ put_page(virt_to_page(kvm_pte_follow(pte)));
return 0;
}
--
2.29.2.299.gdc1121823c-goog
When memory protection is enabled, the hyp code will require a basic
form of memory management in order to allocate and free memory pages at
EL2. This is needed for various use-cases, including the creation of hyp
mappings or the allocation of stage 2 page tables.
To address these use-cases, introduce a simple memory allocator in the
hyp code. The allocator is designed as a conventional 'buddy allocator',
working at page granularity. It allows allocating and freeing
physically contiguous pages from memory 'pools', with a guaranteed order
alignment in the PA space. Each page in a memory pool is associated
with a struct hyp_page which holds the page's metadata, including its
refcount, as well as its current order, hence mimicking the kernel's
buddy system in the GFP infrastructure. The hyp_page metadata is made
accessible through a hyp_vmemmap, following the concept of
SPARSE_VMEMMAP in the kernel.
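A usage sketch of the resulting API (illustration only):

  struct hyp_pool pool;
  void *p;

  hyp_pool_init(&pool, phys, nr_pages, used_pages);
  p = hyp_alloc_pages(&pool, HYP_GFP_ZERO, 2); /* 4 zeroed, contiguous pages */
  hyp_get_page(p);                             /* refcount 1 -> 2 */
  hyp_put_page(p);                             /* refcount 2 -> 1 */
  hyp_put_page(p);                             /* 0: coalesce back into the pool */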
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 32 ++++
arch/arm64/kvm/hyp/include/nvhe/memory.h | 25 +++
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 185 +++++++++++++++++++++++
4 files changed, 243 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/gfp.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/page_alloc.c
diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
new file mode 100644
index 000000000000..95587faee171
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_HYP_GFP_H
+#define __KVM_HYP_GFP_H
+
+#include <linux/list.h>
+
+#include <nvhe/memory.h>
+#include <nvhe/spinlock.h>
+
+#define HYP_MAX_ORDER 11U
+#define HYP_NO_ORDER UINT_MAX
+
+struct hyp_pool {
+ hyp_spinlock_t lock;
+ struct list_head free_area[HYP_MAX_ORDER + 1];
+ phys_addr_t range_start;
+ phys_addr_t range_end;
+};
+
+/* GFP flags */
+#define HYP_GFP_NONE 0
+#define HYP_GFP_ZERO 1
+
+/* Allocation */
+void *hyp_alloc_pages(struct hyp_pool *pool, gfp_t mask, unsigned int order);
+void hyp_get_page(void *addr);
+void hyp_put_page(void *addr);
+
+/* Used pages cannot be freed */
+int hyp_pool_init(struct hyp_pool *pool, phys_addr_t phys,
+ unsigned int nr_pages, unsigned int used_pages);
+#endif /* __KVM_HYP_GFP_H */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 64c44c142c95..ed47674bc988 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -6,7 +6,17 @@
#include <linux/types.h>
+struct hyp_pool;
+struct hyp_page {
+ unsigned int refcount;
+ unsigned int order;
+ struct hyp_pool *pool;
+ struct list_head node;
+};
+
extern s64 hyp_physvirt_offset;
+extern u64 __hyp_vmemmap;
+#define hyp_vmemmap ((struct hyp_page *)__hyp_vmemmap)
#define __hyp_pa(virt) ((phys_addr_t)(virt) + hyp_physvirt_offset)
#define __hyp_va(virt) ((void *)((phys_addr_t)(virt) - hyp_physvirt_offset))
@@ -21,4 +31,19 @@ static inline phys_addr_t hyp_virt_to_phys(void *addr)
return __hyp_pa(addr);
}
+#define hyp_phys_to_pfn(phys) ((phys) >> PAGE_SHIFT)
+#define hyp_phys_to_page(phys) (&hyp_vmemmap[hyp_phys_to_pfn(phys)])
+#define hyp_virt_to_page(virt) hyp_phys_to_page(__hyp_pa(virt))
+
+#define hyp_page_to_phys(page) ((phys_addr_t)((page) - hyp_vmemmap) << PAGE_SHIFT)
+#define hyp_page_to_virt(page) __hyp_va(hyp_page_to_phys(page))
+#define hyp_page_to_pool(page) (((struct hyp_page *)page)->pool)
+
+static inline int hyp_page_count(void *addr)
+{
+ struct hyp_page *p = hyp_virt_to_page(addr);
+
+ return p->refcount;
+}
+
#endif /* __KVM_HYP_MEMORY_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 33bd381d8f73..9e5eacfec6ec 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -10,7 +10,7 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
lib-objs := $(addprefix ../../../lib/, $(lib-objs))
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
- hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o
+ hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o
obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
new file mode 100644
index 000000000000..6de6515f0432
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <[email protected]>
+ */
+
+#include <asm/kvm_hyp.h>
+#include <nvhe/gfp.h>
+
+u64 __hyp_vmemmap;
+
+/*
+ * Example buddy-tree for a 4-pages physically contiguous pool:
+ *
+ * o : Page 3
+ * /
+ * o-o : Page 2
+ * /
+ * / o : Page 1
+ * / /
+ * o---o-o : Page 0
+ * Order 2 1 0
+ *
+ * Example of requests on this zone:
+ * __find_buddy(pool, page 0, order 0) => page 1
+ * __find_buddy(pool, page 0, order 1) => page 2
+ * __find_buddy(pool, page 1, order 0) => page 0
+ * __find_buddy(pool, page 2, order 0) => page 3
+ */
+static struct hyp_page *__find_buddy(struct hyp_pool *pool, struct hyp_page *p,
+ unsigned int order)
+{
+ phys_addr_t addr = hyp_page_to_phys(p);
+
+ addr ^= (PAGE_SIZE << order);
+ if (addr < pool->range_start || addr >= pool->range_end)
+ return NULL;
+
+ return hyp_phys_to_page(addr);
+}
+
+static void __hyp_attach_page(struct hyp_pool *pool,
+ struct hyp_page *p)
+{
+ unsigned int order = p->order;
+ struct hyp_page *buddy;
+
+ p->order = HYP_NO_ORDER;
+ for (; order < HYP_MAX_ORDER; order++) {
+ /* Nothing to do if the buddy isn't in a free-list */
+ buddy = __find_buddy(pool, p, order);
+ if (!buddy || list_empty(&buddy->node) || buddy->order != order)
+ break;
+
+ /* Otherwise, coalesce the buddies and go one level up */
+ list_del_init(&buddy->node);
+ buddy->order = HYP_NO_ORDER;
+ p = (p < buddy) ? p : buddy;
+ }
+
+ p->order = order;
+ list_add_tail(&p->node, &pool->free_area[order]);
+}
+
+void hyp_put_page(void *addr)
+{
+ struct hyp_page *p = hyp_virt_to_page(addr);
+ struct hyp_pool *pool = hyp_page_to_pool(p);
+
+ hyp_spin_lock(&pool->lock);
+ if (!p->refcount)
+ hyp_panic();
+ p->refcount--;
+ if (!p->refcount)
+ __hyp_attach_page(pool, p);
+ hyp_spin_unlock(&pool->lock);
+}
+
+void hyp_get_page(void *addr)
+{
+ struct hyp_page *p = hyp_virt_to_page(addr);
+ struct hyp_pool *pool = hyp_page_to_pool(p);
+
+ hyp_spin_lock(&pool->lock);
+ p->refcount++;
+ hyp_spin_unlock(&pool->lock);
+}
+
+/* Extract a page from the buddy tree, at a specific order */
+static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
+ struct hyp_page *p,
+ unsigned int order)
+{
+ struct hyp_page *buddy;
+
+ if (p->order == HYP_NO_ORDER || p->order < order)
+ return NULL;
+
+ list_del_init(&p->node);
+
+ /* Split the page in two until reaching the requested order */
+ while (p->order > order) {
+ p->order--;
+ buddy = __find_buddy(pool, p, p->order);
+ buddy->order = p->order;
+ list_add_tail(&buddy->node, &pool->free_area[buddy->order]);
+ }
+
+ p->refcount = 1;
+
+ return p;
+}
+
+static void clear_hyp_page(struct hyp_page *p)
+{
+ unsigned long i;
+
+ for (i = 0; i < (1 << p->order); i++)
+ clear_page(hyp_page_to_virt(p) + (i << PAGE_SHIFT));
+}
+
+static void *__hyp_alloc_pages(struct hyp_pool *pool, gfp_t mask,
+ unsigned int order)
+{
+ unsigned int i = order;
+ struct hyp_page *p;
+
+ /* Look for a high-enough-order page */
+ while (i <= HYP_MAX_ORDER && list_empty(&pool->free_area[i]))
+ i++;
+ if (i > HYP_MAX_ORDER)
+ return NULL;
+
+ /* Extract it from the tree at the right order */
+ p = list_first_entry(&pool->free_area[i], struct hyp_page, node);
+ p = __hyp_extract_page(pool, p, order);
+
+ if (mask & HYP_GFP_ZERO)
+ clear_hyp_page(p);
+
+ return p;
+}
+
+void *hyp_alloc_pages(struct hyp_pool *pool, gfp_t mask, unsigned int order)
+{
+ struct hyp_page *p;
+
+ hyp_spin_lock(&pool->lock);
+ p = __hyp_alloc_pages(pool, mask, order);
+ hyp_spin_unlock(&pool->lock);
+
+ return p ? hyp_page_to_virt(p) : NULL;
+}
+
+/* hyp_vmemmap must be backed beforehand */
+int hyp_pool_init(struct hyp_pool *pool, phys_addr_t phys,
+ unsigned int nr_pages, unsigned int used_pages)
+{
+ struct hyp_page *p;
+ int i;
+
+ if (phys % PAGE_SIZE)
+ return -EINVAL;
+
+ hyp_spin_lock_init(&pool->lock);
+ for (i = 0; i <= HYP_MAX_ORDER; i++)
+ INIT_LIST_HEAD(&pool->free_area[i]);
+ pool->range_start = phys;
+ pool->range_end = phys + (nr_pages << PAGE_SHIFT);
+
+ /* Init the vmemmap portion */
+ p = hyp_phys_to_page(phys);
+ memset(p, 0, sizeof(*p) * nr_pages);
+ for (i = 0; i < nr_pages; i++, p++) {
+ p->pool = pool;
+ INIT_LIST_HEAD(&p->node);
+ }
+
+ /* Attach the unused pages to the buddy tree */
+ p = hyp_phys_to_page(phys + (used_pages << PAGE_SHIFT));
+ for (i = used_pages; i < nr_pages; i++, p++)
+ __hyp_attach_page(pool, p);
+
+ return 0;
+}
--
2.29.2.299.gdc1121823c-goog
In order to re-map the guest vectors at EL2 when pKVM is enabled,
refactor __kvm_vector_slot2idx() and kvm_init_vector_slot() to move all
the address calculation logic into a static inline function.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_mmu.h | 8 ++++++++
arch/arm64/kvm/arm.c | 9 +--------
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 5168a0c516ae..cb104443d8e4 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -171,6 +171,14 @@ phys_addr_t kvm_mmu_get_httbr(void);
phys_addr_t kvm_get_idmap_vector(void);
int kvm_mmu_init(void);
+static inline void *__kvm_vector_slot2addr(void *base,
+ enum arm64_hyp_spectre_vector slot)
+{
+ int idx = slot - (slot != HYP_VECTOR_DIRECT);
+
+ return base + (idx * SZ_2K);
+}
+
struct kvm;
#define kvm_flush_dcache_to_poc(a,l) __flush_dcache_area((a), (l))
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c7f8fca97202..b1e1747e4bbf 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1318,16 +1318,9 @@ static unsigned long nvhe_percpu_order(void)
/* A lookup table holding the hypervisor VA for each vector slot */
static void *hyp_spectre_vector_selector[BP_HARDEN_EL2_SLOTS];
-static int __kvm_vector_slot2idx(enum arm64_hyp_spectre_vector slot)
-{
- return slot - (slot != HYP_VECTOR_DIRECT);
-}
-
static void kvm_init_vector_slot(void *base, enum arm64_hyp_spectre_vector slot)
{
- int idx = __kvm_vector_slot2idx(slot);
-
- hyp_spectre_vector_selector[slot] = base + (idx * SZ_2K);
+ hyp_spectre_vector_selector[slot] = __kvm_vector_slot2addr(base, slot);
}
static int kvm_init_vector_slots(void)
--
2.29.2.299.gdc1121823c-goog
Introduce infrastructure in KVM to copy CPU feature registers into
EL2-owned data structures, allowing sanitised values to be read
directly at EL2 in nVHE.
Given that only a subset of these features are being read by the
hypervisor, the ones that need to be copied are to be listed under
<asm/kvm_cpufeature.h> together with the name of the nVHE variable that
will hold the copy.
While at it, introduce the first user of this infrastructure by
implementing __flush_dcache_area at EL2, which needs
arm64_ftr_reg_ctrel0.
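For illustration, nVHE code can then read the sanitised copy directly,
e.g. to derive the minimum D-cache line size from CTR_EL0 (a sketch,
not part of the patch):

  #include <asm/kvm_cpufeature.h>

  static u32 hyp_dcache_line_size(void)
  {
          u64 ctr = arm64_ftr_reg_ctrel0.sys_val;

          /* DminLine is log2(words); 4 bytes per word. */
          return 4 << cpuid_feature_extract_unsigned_field(ctr, CTR_DMINLINE_SHIFT);
  }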
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/cpufeature.h | 1 +
arch/arm64/include/asm/kvm_cpufeature.h | 17 ++++++++++++++
arch/arm64/kernel/cpufeature.c | 12 ++++++++++
arch/arm64/kernel/image-vars.h | 2 ++
arch/arm64/kvm/arm.c | 31 +++++++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/Makefile | 3 ++-
arch/arm64/kvm/hyp/nvhe/cache.S | 13 +++++++++++
arch/arm64/kvm/hyp/nvhe/cpufeature.c | 8 +++++++
8 files changed, 86 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index da250e4741bd..3dfbd76fb647 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -600,6 +600,7 @@ void __init setup_cpu_features(void);
void check_local_cpu_capabilities(void);
u64 read_sanitised_ftr_reg(u32 id);
+int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst);
static inline bool cpu_supports_mixed_endian_el0(void)
{
diff --git a/arch/arm64/include/asm/kvm_cpufeature.h b/arch/arm64/include/asm/kvm_cpufeature.h
new file mode 100644
index 000000000000..d34f85cba358
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_cpufeature.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <[email protected]>
+ */
+
+#include <asm/cpufeature.h>
+
+#ifndef KVM_HYP_CPU_FTR_REG
+#if defined(__KVM_NVHE_HYPERVISOR__)
+#define KVM_HYP_CPU_FTR_REG(id, name) extern struct arm64_ftr_reg name;
+#else
+#define KVM_HYP_CPU_FTR_REG(id, name) DECLARE_KVM_NVHE_SYM(name);
+#endif
+#endif
+
+KVM_HYP_CPU_FTR_REG(SYS_CTR_EL0, arm64_ftr_reg_ctrel0)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index dd5bc0f0cf0d..3bc86d1423f8 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1116,6 +1116,18 @@ u64 read_sanitised_ftr_reg(u32 id)
}
EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg);
+int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst)
+{
+ struct arm64_ftr_reg *regp = get_arm64_ftr_reg(id);
+
+ if (!regp)
+ return -EINVAL;
+
+ memcpy(dst, regp, sizeof(*regp));
+
+ return 0;
+}
+
#define read_sysreg_case(r) \
case r: return read_sysreg_s(r)
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index dd8ccc9efb6a..c35d768672eb 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -116,6 +116,8 @@ __kvm_nvhe___memcpy = __kvm_nvhe___pi_memcpy;
__kvm_nvhe___memset = __kvm_nvhe___pi_memset;
#endif
+__kvm_nvhe___flush_dcache_area = __kvm_nvhe___pi___flush_dcache_area;
+
#endif /* CONFIG_KVM */
#endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 391cf6753a13..c7f8fca97202 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -34,6 +34,7 @@
#include <asm/virt.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
+#include <asm/kvm_cpufeature.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_emulate.h>
#include <asm/sections.h>
@@ -1636,6 +1637,29 @@ static void teardown_hyp_mode(void)
}
}
+#undef KVM_HYP_CPU_FTR_REG
+#define KVM_HYP_CPU_FTR_REG(id, name) \
+ { .sys_id = id, .dst = (struct arm64_ftr_reg *)&kvm_nvhe_sym(name) },
+static const struct __ftr_reg_copy_entry {
+ u32 sys_id;
+ struct arm64_ftr_reg *dst;
+} hyp_ftr_regs[] = {
+ #include <asm/kvm_cpufeature.h>
+};
+
+static int copy_cpu_ftr_regs(void)
+{
+ int i, ret;
+
+ for (i = 0; i < ARRAY_SIZE(hyp_ftr_regs); i++) {
+ ret = copy_ftr_reg(hyp_ftr_regs[i].sys_id, hyp_ftr_regs[i].dst);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
/**
* Inits Hyp-mode on all online CPUs
*/
@@ -1644,6 +1668,13 @@ static int init_hyp_mode(void)
int cpu;
int err = 0;
+ /*
+ * Copy the required CPU feature register in their EL2 counterpart
+ */
+ err = copy_cpu_ftr_regs();
+ if (err)
+ return err;
+
/*
* Allocate Hyp PGD and setup Hyp identity mapping
*/
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 9e5eacfec6ec..72cfe53f106f 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -10,7 +10,8 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
lib-objs := $(addprefix ../../../lib/, $(lib-objs))
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
- hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o
+ hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
+ cache.o cpufeature.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o
obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
new file mode 100644
index 000000000000..36cef6915428
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/cache.S
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Code copied from arch/arm64/mm/cache.S.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+#include <asm/alternative.h>
+
+SYM_FUNC_START_PI(__flush_dcache_area)
+ dcache_by_line_op civac, sy, x0, x1, x2, x3
+ ret
+SYM_FUNC_END_PI(__flush_dcache_area)
diff --git a/arch/arm64/kvm/hyp/nvhe/cpufeature.c b/arch/arm64/kvm/hyp/nvhe/cpufeature.c
new file mode 100644
index 000000000000..a887508f996f
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/cpufeature.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <[email protected]>
+ */
+
+#define KVM_HYP_CPU_FTR_REG(id, name) struct arm64_ftr_reg name;
+#include <asm/kvm_cpufeature.h>
--
2.29.2.299.gdc1121823c-goog
Extend the memory pool allocated for the hypervisor to include enough
pages to map all of memory at page granularity for the host stage 2.
While at it, also reserve some memory for device mappings.
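As a rough back-of-the-envelope check (mine, not from the patch): with
4KiB pages and 8-byte descriptors, mapping N bytes of memory at page
granularity costs about N/512 bytes of level-3 tables, N/512^2 of
level-2 tables, and so on -- a little over N/512 in total, i.e. roughly
2MiB of page-tables per GiB of memory per stage, plus the fixed 16-page
PGD allowance used below.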
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mm.h | 36 ++++++++++++++++++++++++----
arch/arm64/kvm/hyp/nvhe/setup.c | 12 ++++++++++
arch/arm64/kvm/hyp/reserved_mem.c | 2 ++
3 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 5a3ad6f4e5bc..b79be2580164 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -52,15 +52,12 @@ static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
return total;
}
-static inline unsigned long hyp_s1_pgtable_size(void)
+static inline unsigned long __hyp_pgtable_total_size(void)
{
struct hyp_memblock_region *reg;
unsigned long nr_pages, res = 0;
int i;
- if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
- return 0;
-
for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
reg = &kvm_nvhe_sym(hyp_memory)[i];
nr_pages = (reg->end - reg->start) >> PAGE_SHIFT;
@@ -68,6 +65,18 @@ static inline unsigned long hyp_s1_pgtable_size(void)
res += nr_pages << PAGE_SHIFT;
}
+ return res;
+}
+
+static inline unsigned long hyp_s1_pgtable_size(void)
+{
+ unsigned long res, nr_pages;
+
+ if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
+ return 0;
+
+ res = __hyp_pgtable_total_size();
+
/* Allow 1 GiB for private mappings */
nr_pages = (1 << 30) >> PAGE_SHIFT;
nr_pages = __hyp_pgtable_max_pages(nr_pages);
@@ -76,4 +85,23 @@ static inline unsigned long hyp_s1_pgtable_size(void)
return res;
}
+static inline unsigned long host_s2_mem_pgtable_size(void)
+{
+ unsigned long max_pgd_sz = 16 << PAGE_SHIFT;
+
+ if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
+ return 0;
+
+ return __hyp_pgtable_total_size() + max_pgd_sz;
+}
+
+static inline unsigned long host_s2_dev_pgtable_size(void)
+{
+ if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
+ return 0;
+
+ /* Allow 1 GiB for private mappings */
+ return __hyp_pgtable_max_pages((1 << 30) >> PAGE_SHIFT) << PAGE_SHIFT;
+}
+
#endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 9679c97b875b..b73e6b08cfba 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -24,6 +24,8 @@ unsigned long hyp_nr_cpus;
static void *stacks_base;
static void *vmemmap_base;
static void *hyp_pgt_base;
+static void *host_s2_mem_pgt_base;
+static void *host_s2_dev_pgt_base;
static int divide_memory_pool(void *virt, unsigned long size)
{
@@ -46,6 +48,16 @@ static int divide_memory_pool(void *virt, unsigned long size)
if (!hyp_pgt_base)
return -ENOMEM;
+ nr_pages = host_s2_mem_pgtable_size() >> PAGE_SHIFT;
+ host_s2_mem_pgt_base = hyp_early_alloc_contig(nr_pages);
+ if (!host_s2_mem_pgt_base)
+ return -ENOMEM;
+
+ nr_pages = host_s2_dev_pgtable_size() >> PAGE_SHIFT;
+ host_s2_dev_pgt_base = hyp_early_alloc_contig(nr_pages);
+ if (!host_s2_dev_pgt_base)
+ return -ENOMEM;
+
return 0;
}
diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
index 02b0b18006f5..c2c0484b6211 100644
--- a/arch/arm64/kvm/hyp/reserved_mem.c
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -47,6 +47,8 @@ void __init reserve_kvm_hyp(void)
hyp_mem_size += num_possible_cpus() << PAGE_SHIFT;
hyp_mem_size += hyp_s1_pgtable_size();
+ hyp_mem_size += host_s2_mem_pgtable_size();
+ hyp_mem_size += host_s2_dev_pgtable_size();
/*
* The hyp_vmemmap needs to be backed by pages, but these pages
--
2.29.2.299.gdc1121823c-goog
When memory protection is enabled, the Hyp code needs the ability to
create and manage its own page-table. To do so, introduce a new set of
hypercalls to initialize Hyp memory protection.
During the init hcall, the hypervisor runs with the host-provided
page-table and uses the trivial early page allocator to create its own
set of page-tables, using a memory pool that was donated by the host.
Specifically, the hypervisor creates its own mappings for __hyp_text,
the Hyp memory pool, the __hyp_bss, the portion of hyp_vmemmap
corresponding to the Hyp pool, among other things. It then jumps back
into the idmap page, switches to the newly-created pgd (instead of the
temporary one provided by the host) and installs the full-fledged buddy
allocator, which is the only one in use from then on.
Note that for the sake of simplifying the review, this only introduces
the code doing this operation, without it actually being called by
anything yet. This will be done in a subsequent patch, which will
introduce the necessary host kernel changes.
Credit to Will for __kvm_init_switch_pgd.
Co-authored-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 6 +-
arch/arm64/include/asm/kvm_host.h | 8 +
arch/arm64/include/asm/kvm_hyp.h | 8 +
arch/arm64/kernel/cpufeature.c | 2 +-
arch/arm64/kernel/image-vars.h | 19 +++
arch/arm64/kvm/hyp/Makefile | 2 +-
arch/arm64/kvm/hyp/include/nvhe/memory.h | 6 +
arch/arm64/kvm/hyp/include/nvhe/mm.h | 79 +++++++++
arch/arm64/kvm/hyp/nvhe/Makefile | 4 +-
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 30 ++++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 44 +++++
arch/arm64/kvm/hyp/nvhe/mm.c | 175 ++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 2 -
arch/arm64/kvm/hyp/nvhe/setup.c | 196 +++++++++++++++++++++++
arch/arm64/kvm/hyp/reserved_mem.c | 75 +++++++++
arch/arm64/kvm/mmu.c | 2 +-
arch/arm64/mm/init.c | 3 +
17 files changed, 653 insertions(+), 8 deletions(-)
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mm.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/mm.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/setup.c
create mode 100644 arch/arm64/kvm/hyp/reserved_mem.c
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index e4934f5e4234..9266b17f8ba9 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -57,6 +57,10 @@
#define __KVM_HOST_SMCCC_FUNC___kvm_get_mdcr_el2 12
#define __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs 13
#define __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs 14
+#define __KVM_HOST_SMCCC_FUNC___kvm_hyp_protect 15
+#define __KVM_HOST_SMCCC_FUNC___hyp_create_mappings 16
+#define __KVM_HOST_SMCCC_FUNC___hyp_create_private_mapping 17
+#define __KVM_HOST_SMCCC_FUNC___hyp_cpu_set_vector 18
#ifndef __ASSEMBLY__
@@ -171,7 +175,7 @@ struct kvm_vcpu;
struct kvm_s2_mmu;
DECLARE_KVM_NVHE_SYM(__kvm_hyp_init);
-DECLARE_KVM_NVHE_SYM(__kvm_hyp_host_vector);
+DECLARE_KVM_HYP_SYM(__kvm_hyp_host_vector);
DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
#define __kvm_hyp_init CHOOSE_NVHE_SYM(__kvm_hyp_init)
#define __kvm_hyp_host_vector CHOOSE_NVHE_SYM(__kvm_hyp_host_vector)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7a5d5f4b3351..ee8bb8021637 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -742,4 +742,12 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
#define kvm_vcpu_has_pmu(vcpu) \
(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
+#ifdef CONFIG_KVM
+extern phys_addr_t hyp_mem_base;
+extern phys_addr_t hyp_mem_size;
+void __init reserve_kvm_hyp(void);
+#else
+static inline void reserve_kvm_hyp(void) { }
+#endif
+
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 95a2bbbcc7e1..dbd2ef86afa9 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -105,5 +105,13 @@ void __noreturn hyp_panic(void);
void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
#endif
+#ifdef __KVM_NVHE_HYPERVISOR__
+void __kvm_init_switch_pgd(phys_addr_t phys, unsigned long size,
+ phys_addr_t pgd, void *sp, void *cont_fn);
+int __kvm_hyp_protect(phys_addr_t phys, unsigned long size,
+ unsigned long nr_cpus, unsigned long *per_cpu_base);
+void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
+#endif
+
#endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 3bc86d1423f8..010458f6d799 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1722,7 +1722,7 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
#endif /* CONFIG_ARM64_MTE */
#ifdef CONFIG_KVM
-static bool enable_protected_kvm;
+bool enable_protected_kvm;
static bool has_protected_kvm(const struct arm64_cpu_capabilities *entry, int __unused)
{
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index c35d768672eb..f2d43e6cd86d 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -118,6 +118,25 @@ __kvm_nvhe___memset = __kvm_nvhe___pi_memset;
_kvm_nvhe___flush_dcache_area = __kvm_nvhe___pi___flush_dcache_area;
+/* Hypervisor VA size */
+KVM_NVHE_ALIAS(hyp_va_bits);
+
+/* Kernel memory sections */
+KVM_NVHE_ALIAS(__start_rodata);
+KVM_NVHE_ALIAS(__end_rodata);
+KVM_NVHE_ALIAS(__bss_start);
+KVM_NVHE_ALIAS(__bss_stop);
+
+/* Hyp memory sections */
+KVM_NVHE_ALIAS(__hyp_idmap_text_start);
+KVM_NVHE_ALIAS(__hyp_idmap_text_end);
+KVM_NVHE_ALIAS(__hyp_text_start);
+KVM_NVHE_ALIAS(__hyp_text_end);
+KVM_NVHE_ALIAS(__hyp_data_ro_after_init_start);
+KVM_NVHE_ALIAS(__hyp_data_ro_after_init_end);
+KVM_NVHE_ALIAS(__hyp_bss_start);
+KVM_NVHE_ALIAS(__hyp_bss_end);
+
#endif /* CONFIG_KVM */
#endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 687598e41b21..b726332eec49 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -10,4 +10,4 @@ subdir-ccflags-y := -I$(incdir) \
-DDISABLE_BRANCH_PROFILING \
$(DISABLE_STACKLEAK_PLUGIN)
-obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o
+obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o reserved_mem.o
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index ed47674bc988..c8af6fe87bfb 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -6,6 +6,12 @@
#include <linux/types.h>
+#define HYP_MEMBLOCK_REGIONS 128
+struct hyp_memblock_region {
+ phys_addr_t start;
+ phys_addr_t end;
+};
+
struct hyp_pool;
struct hyp_page {
unsigned int refcount;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
new file mode 100644
index 000000000000..5a3ad6f4e5bc
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_HYP_MM_H
+#define __KVM_HYP_MM_H
+
+#include <asm/kvm_pgtable.h>
+#include <asm/spectre.h>
+#include <linux/types.h>
+
+#include <nvhe/memory.h>
+#include <nvhe/spinlock.h>
+
+extern struct hyp_memblock_region kvm_nvhe_sym(hyp_memory)[];
+extern int kvm_nvhe_sym(hyp_memblock_nr);
+extern struct kvm_pgtable hyp_pgtable;
+extern hyp_spinlock_t __hyp_pgd_lock;
+extern struct hyp_pool hpool;
+extern u64 __io_map_base;
+extern u32 hyp_va_bits;
+
+int hyp_create_idmap(void);
+int hyp_map_vectors(void);
+int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
+int hyp_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
+int hyp_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
+int __hyp_create_mappings(unsigned long start, unsigned long size,
+ unsigned long phys, unsigned long prot);
+unsigned long __hyp_create_private_mapping(phys_addr_t phys, size_t size,
+ unsigned long prot);
+
+static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
+ unsigned long *start, unsigned long *end)
+{
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ struct hyp_page *p = hyp_phys_to_page(phys);
+
+ *start = (unsigned long)p;
+ *end = *start + nr_pages * sizeof(struct hyp_page);
+ *start = ALIGN_DOWN(*start, PAGE_SIZE);
+ *end = ALIGN(*end, PAGE_SIZE);
+}
+
+static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
+{
+ unsigned long total = 0, i;
+
+ /* Provision the worst case scenario with 4 levels of page-table */
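+	/*
+	 * Illustration (4K granule, PTRS_PER_PTE == 512): mapping 1GiB,
+	 * i.e. 262144 pages, costs at most 512 + 1 + 1 + 1 = 515 pages
+	 * of page-table.
+	 */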
+ for (i = 0; i < 4; i++) {
+ nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
+ total += nr_pages;
+ }
+
+ return total;
+}
+
+static inline unsigned long hyp_s1_pgtable_size(void)
+{
+ struct hyp_memblock_region *reg;
+ unsigned long nr_pages, res = 0;
+ int i;
+
+ if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
+ return 0;
+
+ for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
+ reg = &kvm_nvhe_sym(hyp_memory)[i];
+ nr_pages = (reg->end - reg->start) >> PAGE_SHIFT;
+ nr_pages = __hyp_pgtable_max_pages(nr_pages);
+ res += nr_pages << PAGE_SHIFT;
+ }
+
+ /* Allow 1 GiB for private mappings */
+ nr_pages = (1 << 30) >> PAGE_SHIFT;
+ nr_pages = __hyp_pgtable_max_pages(nr_pages);
+ res += nr_pages << PAGE_SHIFT;
+
+ return res;
+}
+
+#endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 72cfe53f106f..d7381a503182 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -11,9 +11,9 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
- cache.o cpufeature.o
+ cache.o cpufeature.o setup.o mm.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
- ../fpsimd.o ../hyp-entry.o ../exception.o
+ ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
obj-y += $(lib-objs)
##
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index 8f3602f320ac..e2d62297edfe 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -247,4 +247,34 @@ alternative_else_nop_endif
SYM_CODE_END(__kvm_handle_stub_hvc)
+SYM_FUNC_START(__kvm_init_switch_pgd)
+ /* Turn the MMU off */
+ pre_disable_mmu_workaround
+ mrs x2, sctlr_el2
+ bic x3, x2, #SCTLR_ELx_M
+ msr sctlr_el2, x3
+ isb
+
+ tlbi alle2
+
+ /* Install the new pgtables */
+ ldr x3, [x0, #NVHE_INIT_PGD_PA]
+ phys_to_ttbr x4, x3
+alternative_if ARM64_HAS_CNP
+ orr x4, x4, #TTBR_CNP_BIT
+alternative_else_nop_endif
+ msr ttbr0_el2, x4
+
+ /* Set the new stack pointer */
+ ldr x0, [x0, #NVHE_INIT_STACK_HYP_VA]
+ mov sp, x0
+
+ /* And turn the MMU back on! */
+ dsb nsh
+ isb
+ msr sctlr_el2, x2
+ isb
+ ret x1
+SYM_FUNC_END(__kvm_init_switch_pgd)
+
.popsection
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 933329699425..a0bfe0d26da6 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -6,12 +6,15 @@
#include <hyp/switch.h>
+#include <asm/pgtable-types.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_host.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
+#include <nvhe/mm.h>
+
DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
#define cpu_reg(ctxt, r) (ctxt)->regs.regs[r]
@@ -106,6 +109,43 @@ static void handle___vgic_v3_restore_aprs(struct kvm_cpu_context *host_ctxt)
__vgic_v3_restore_aprs(kern_hyp_va(cpu_if));
}
+static void handle___kvm_hyp_protect(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+ DECLARE_REG(unsigned long, size, host_ctxt, 2);
+ DECLARE_REG(unsigned long, nr_cpus, host_ctxt, 3);
+ DECLARE_REG(unsigned long *, per_cpu_base, host_ctxt, 4);
+
+ cpu_reg(host_ctxt, 1) = __kvm_hyp_protect(phys, size, nr_cpus,
+ per_cpu_base);
+}
+
+static void handle___hyp_cpu_set_vector(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(enum arm64_hyp_spectre_vector, slot, host_ctxt, 1);
+
+ cpu_reg(host_ctxt, 1) = hyp_cpu_set_vector(slot);
+}
+
+static void handle___hyp_create_mappings(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(unsigned long, start, host_ctxt, 1);
+ DECLARE_REG(unsigned long, size, host_ctxt, 2);
+ DECLARE_REG(unsigned long, phys, host_ctxt, 3);
+ DECLARE_REG(unsigned long, prot, host_ctxt, 4);
+
+ cpu_reg(host_ctxt, 1) = __hyp_create_mappings(start, size, phys, prot);
+}
+
+static void handle___hyp_create_private_mapping(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+ DECLARE_REG(size_t, size, host_ctxt, 2);
+ DECLARE_REG(unsigned long, prot, host_ctxt, 3);
+
+ cpu_reg(host_ctxt, 1) = __hyp_create_private_mapping(phys, size, prot);
+}
+
typedef void (*hcall_t)(struct kvm_cpu_context *);
#define HANDLE_FUNC(x) [__KVM_HOST_SMCCC_FUNC_##x] = kimg_fn_ptr(handle_##x)
@@ -125,6 +165,10 @@ static const hcall_t *host_hcall[] = {
HANDLE_FUNC(__kvm_get_mdcr_el2),
HANDLE_FUNC(__vgic_v3_save_aprs),
HANDLE_FUNC(__vgic_v3_restore_aprs),
+ HANDLE_FUNC(__kvm_hyp_protect),
+ HANDLE_FUNC(__hyp_cpu_set_vector),
+ HANDLE_FUNC(__hyp_create_mappings),
+ HANDLE_FUNC(__hyp_create_private_mapping),
};
static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
new file mode 100644
index 000000000000..cad5dae197c6
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <[email protected]>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pgtable.h>
+#include <asm/spectre.h>
+
+#include <nvhe/early_alloc.h>
+#include <nvhe/gfp.h>
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+#include <nvhe/spinlock.h>
+
+struct kvm_pgtable hyp_pgtable;
+
+hyp_spinlock_t __hyp_pgd_lock;
+u64 __io_map_base;
+
+struct hyp_memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
+int hyp_memblock_nr;
+
+int __hyp_create_mappings(unsigned long start, unsigned long size,
+ unsigned long phys, unsigned long prot)
+{
+ int err;
+
+ hyp_spin_lock(&__hyp_pgd_lock);
+ err = kvm_pgtable_hyp_map(&hyp_pgtable, start, size, phys, prot);
+ hyp_spin_unlock(&__hyp_pgd_lock);
+
+ return err;
+}
+
+unsigned long __hyp_create_private_mapping(phys_addr_t phys, size_t size,
+ unsigned long prot)
+{
+ unsigned long addr;
+ int ret;
+
+ hyp_spin_lock(&__hyp_pgd_lock);
+
+ size = PAGE_ALIGN(size + offset_in_page(phys));
+ addr = __io_map_base;
+ __io_map_base += size;
+
+	/* Are we overflowing on the vmemmap? */
+ if (__io_map_base > __hyp_vmemmap) {
+ __io_map_base -= size;
+ addr = 0;
+ goto out;
+ }
+
+ ret = kvm_pgtable_hyp_map(&hyp_pgtable, addr, size, phys, prot);
+ if (ret) {
+ addr = 0;
+ goto out;
+ }
+
+ addr = addr + offset_in_page(phys);
+out:
+ hyp_spin_unlock(&__hyp_pgd_lock);
+
+ return addr;
+}
+
+int hyp_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
+{
+ unsigned long start = (unsigned long)from;
+ unsigned long end = (unsigned long)to;
+ unsigned long virt_addr;
+ phys_addr_t phys;
+
+ start = start & PAGE_MASK;
+ end = PAGE_ALIGN(end);
+
+ for (virt_addr = start; virt_addr < end; virt_addr += PAGE_SIZE) {
+ int err;
+
+ phys = hyp_virt_to_phys((void *)virt_addr);
+ err = __hyp_create_mappings(virt_addr, PAGE_SIZE, phys, prot);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
+{
+ unsigned long start, end;
+
+ hyp_vmemmap_range(phys, size, &start, &end);
+
+ return __hyp_create_mappings(start, end - start, back, PAGE_HYP);
+}
+
+static void *__hyp_bp_vect_base;
+int hyp_cpu_set_vector(enum arm64_hyp_spectre_vector slot)
+{
+ void *vector;
+
+ switch (slot) {
+ case HYP_VECTOR_DIRECT: {
+ vector = hyp_symbol_addr(__kvm_hyp_vector);
+ break;
+ }
+ case HYP_VECTOR_SPECTRE_DIRECT: {
+ vector = hyp_symbol_addr(__bp_harden_hyp_vecs);
+ break;
+ }
+ case HYP_VECTOR_INDIRECT:
+ case HYP_VECTOR_SPECTRE_INDIRECT: {
+ vector = (void *)__hyp_bp_vect_base;
+ break;
+ }
+ default:
+ return -EINVAL;
+ }
+
+ vector = __kvm_vector_slot2addr(vector, slot);
+ *this_cpu_ptr(&kvm_hyp_vector) = (unsigned long)vector;
+
+ return 0;
+}
+
+int hyp_map_vectors(void)
+{
+ unsigned long bp_base;
+
+ if (!cpus_have_const_cap(ARM64_SPECTRE_V3A))
+ return 0;
+
+ bp_base = (unsigned long)hyp_symbol_addr(__bp_harden_hyp_vecs);
+ bp_base = __hyp_pa(bp_base);
+ bp_base = __hyp_create_private_mapping(bp_base, __BP_HARDEN_HYP_VECS_SZ,
+ PAGE_HYP_EXEC);
+ if (!bp_base)
+ return -1;
+
+ __hyp_bp_vect_base = (void *)bp_base;
+
+ return 0;
+}
+
+int hyp_create_idmap(void)
+{
+ unsigned long start, end;
+
+ start = (unsigned long)hyp_symbol_addr(__hyp_idmap_text_start);
+ start = hyp_virt_to_phys((void *)start);
+ start = ALIGN_DOWN(start, PAGE_SIZE);
+
+ end = (unsigned long)hyp_symbol_addr(__hyp_idmap_text_end);
+ end = hyp_virt_to_phys((void *)end);
+ end = ALIGN(end, PAGE_SIZE);
+
+ /*
+ * One half of the VA space is reserved to linearly map portions of
+ * memory -- see va_layout.c for more details. The other half of the VA
+ * space contains the trampoline page, and needs some care. Split that
+ * second half in two and find the quarter of VA space not conflicting
+ * with the idmap to place the IOs and the vmemmap. IOs use the lower
+ * half of the quarter and the vmemmap the upper half.
+ */
+ __io_map_base = start & BIT(hyp_va_bits - 2);
+ __io_map_base ^= BIT(hyp_va_bits - 2);
+ __hyp_vmemmap = __io_map_base | BIT(hyp_va_bits - 3);
+
+ return __hyp_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index dbe57ae84a0c..cfc6dac0f0ac 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -193,8 +193,6 @@ static int psci_cpu_on(u64 func_id, struct kvm_cpu_context *host_ctxt)
return ret;
}
-void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
-
asmlinkage void __noreturn __kvm_hyp_psci_cpu_entry(void)
{
struct kvm_host_psci_state *cpu_state = this_cpu_ptr(&kvm_host_psci_state);
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
new file mode 100644
index 000000000000..9679c97b875b
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <[email protected]>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pgtable.h>
+
+#include <nvhe/early_alloc.h>
+#include <nvhe/gfp.h>
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+
+struct hyp_pool hpool;
+struct kvm_pgtable_mm_ops hyp_pgtable_mm_ops;
+unsigned long hyp_nr_cpus;
+
+#define hyp_percpu_size ((unsigned long)__per_cpu_end - \
+ (unsigned long)__per_cpu_start)
+
+static void *stacks_base;
+static void *vmemmap_base;
+static void *hyp_pgt_base;
+
+static int divide_memory_pool(void *virt, unsigned long size)
+{
+ unsigned long vstart, vend, nr_pages;
+
+ hyp_early_alloc_init(virt, size);
+
+ stacks_base = hyp_early_alloc_contig(hyp_nr_cpus);
+ if (!stacks_base)
+ return -ENOMEM;
+
+ hyp_vmemmap_range(__hyp_pa(virt), size, &vstart, &vend);
+ nr_pages = (vend - vstart) >> PAGE_SHIFT;
+ vmemmap_base = hyp_early_alloc_contig(nr_pages);
+ if (!vmemmap_base)
+ return -ENOMEM;
+
+ nr_pages = hyp_s1_pgtable_size() >> PAGE_SHIFT;
+ hyp_pgt_base = hyp_early_alloc_contig(nr_pages);
+ if (!hyp_pgt_base)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
+ unsigned long *per_cpu_base)
+{
+ void *start, *end, *virt = hyp_phys_to_virt(phys);
+ int ret, i;
+
+ /* Recreate the hyp page-table using the early page allocator */
+ hyp_early_alloc_init(hyp_pgt_base, hyp_s1_pgtable_size());
+ ret = kvm_pgtable_hyp_init(&hyp_pgtable, hyp_va_bits,
+ &hyp_early_alloc_mm_ops);
+ if (ret)
+ return ret;
+
+ ret = hyp_create_idmap();
+ if (ret)
+ return ret;
+
+ ret = hyp_map_vectors();
+ if (ret)
+ return ret;
+
+ ret = hyp_back_vmemmap(phys, size, hyp_virt_to_phys(vmemmap_base));
+ if (ret)
+ return ret;
+
+ ret = hyp_create_mappings(hyp_symbol_addr(__hyp_text_start),
+ hyp_symbol_addr(__hyp_text_end),
+ PAGE_HYP_EXEC);
+ if (ret)
+ return ret;
+
+ ret = hyp_create_mappings(hyp_symbol_addr(__start_rodata),
+ hyp_symbol_addr(__end_rodata), PAGE_HYP_RO);
+ if (ret)
+ return ret;
+
+ ret = hyp_create_mappings(hyp_symbol_addr(__hyp_data_ro_after_init_start),
+ hyp_symbol_addr(__hyp_data_ro_after_init_end),
+ PAGE_HYP_RO);
+ if (ret)
+ return ret;
+
+ ret = hyp_create_mappings(hyp_symbol_addr(__bss_start),
+ hyp_symbol_addr(__hyp_bss_end), PAGE_HYP);
+ if (ret)
+ return ret;
+
+ ret = hyp_create_mappings(hyp_symbol_addr(__hyp_bss_end),
+ hyp_symbol_addr(__bss_stop), PAGE_HYP_RO);
+ if (ret)
+ return ret;
+
+ ret = hyp_create_mappings(virt, virt + size - 1, PAGE_HYP);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < hyp_nr_cpus; i++) {
+ start = (void *)kern_hyp_va(per_cpu_base[i]);
+ end = start + PAGE_ALIGN(hyp_percpu_size);
+ ret = hyp_create_mappings(start, end, PAGE_HYP);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static void update_nvhe_init_params(void)
+{
+ struct kvm_nvhe_init_params *params;
+ unsigned long i, stack;
+
+ for (i = 0; i < hyp_nr_cpus; i++) {
+ stack = (unsigned long)stacks_base + (i << PAGE_SHIFT);
+ params = per_cpu_ptr(&kvm_init_params, i);
+ params->stack_hyp_va = stack + PAGE_SIZE;
+ params->pgd_pa = __hyp_pa(hyp_pgtable.pgd);
+ __flush_dcache_area(params, sizeof(*params));
+ }
+}
+
+static void *hyp_zalloc_hyp_page(void *arg)
+{
+ return hyp_alloc_pages(&hpool, HYP_GFP_ZERO, 0);
+}
+
+void __noreturn __kvm_hyp_protect_finalise(void)
+{
+ struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
+ struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
+ unsigned long nr_pages, used_pages;
+ int ret;
+
+ /* Now that the vmemmap is backed, install the full-fledged allocator */
+ nr_pages = hyp_s1_pgtable_size() >> PAGE_SHIFT;
+ used_pages = hyp_early_alloc_nr_pages();
+ ret = hyp_pool_init(&hpool, __hyp_pa(hyp_pgt_base), nr_pages, used_pages);
+ if (ret)
+ goto out;
+
+ hyp_pgtable_mm_ops.zalloc_page = hyp_zalloc_hyp_page;
+ hyp_pgtable_mm_ops.phys_to_virt = hyp_phys_to_virt;
+ hyp_pgtable_mm_ops.virt_to_phys = hyp_virt_to_phys;
+ hyp_pgtable_mm_ops.get_page = hyp_get_page;
+ hyp_pgtable_mm_ops.put_page = hyp_put_page;
+ hyp_pgtable.mm_ops = &hyp_pgtable_mm_ops;
+
+out:
+ host_ctxt->regs.regs[0] = SMCCC_RET_SUCCESS;
+ host_ctxt->regs.regs[1] = ret;
+
+ __host_enter(host_ctxt);
+}
+
+int __kvm_hyp_protect(phys_addr_t phys, unsigned long size,
+ unsigned long nr_cpus, unsigned long *per_cpu_base)
+{
+ struct kvm_nvhe_init_params *params;
+ void *virt = hyp_phys_to_virt(phys);
+ void (*fn)(phys_addr_t params_pa, void *finalize_fn_va);
+ int ret;
+
+ if (phys % PAGE_SIZE || size % PAGE_SIZE || (u64)virt % PAGE_SIZE)
+ return -EINVAL;
+
+ hyp_spin_lock_init(&__hyp_pgd_lock);
+ hyp_nr_cpus = nr_cpus;
+
+ ret = divide_memory_pool(virt, size);
+ if (ret)
+ return ret;
+
+ ret = recreate_hyp_mappings(phys, size, per_cpu_base);
+ if (ret)
+ return ret;
+
+ update_nvhe_init_params();
+
+ /* Jump in the idmap page to switch to the new page-tables */
+ params = this_cpu_ptr(&kvm_init_params);
+ fn = (typeof(fn))__hyp_pa(hyp_symbol_addr(__kvm_init_switch_pgd));
+ fn(__hyp_pa(params), hyp_symbol_addr(__kvm_hyp_protect_finalise));
+
+ unreachable();
+}
diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
new file mode 100644
index 000000000000..02b0b18006f5
--- /dev/null
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <[email protected]>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/memblock.h>
+
+#include <asm/kvm_host.h>
+
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+
+phys_addr_t hyp_mem_base;
+phys_addr_t hyp_mem_size;
+
+void __init early_init_dt_add_memory_hyp(u64 base, u64 size)
+{
+ struct hyp_memblock_region *reg;
+
+ if (kvm_nvhe_sym(hyp_memblock_nr) >= HYP_MEMBLOCK_REGIONS)
+ kvm_nvhe_sym(hyp_memblock_nr) = -1;
+
+ if (kvm_nvhe_sym(hyp_memblock_nr) < 0)
+ return;
+
+ reg = kvm_nvhe_sym(hyp_memory);
+ reg[kvm_nvhe_sym(hyp_memblock_nr)].start = base;
+ reg[kvm_nvhe_sym(hyp_memblock_nr)].end = base + size;
+ kvm_nvhe_sym(hyp_memblock_nr)++;
+}
+
+extern bool enable_protected_kvm;
+void __init reserve_kvm_hyp(void)
+{
+ u64 nr_pages, prev;
+
+ if (!enable_protected_kvm)
+ return;
+
+ if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
+ return;
+
+ if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
+ return;
+
+ hyp_mem_size += num_possible_cpus() << PAGE_SHIFT;
+ hyp_mem_size += hyp_s1_pgtable_size();
+
+ /*
+ * The hyp_vmemmap needs to be backed by pages, but these pages
+ * themselves need to be present in the vmemmap, so compute the number
+ * of pages needed by looking for a fixed point.
+ */
+ nr_pages = 0;
+ do {
+ prev = nr_pages;
+ nr_pages = (hyp_mem_size >> PAGE_SHIFT) + prev;
+ nr_pages = DIV_ROUND_UP(nr_pages * sizeof(struct hyp_page), PAGE_SIZE);
+ nr_pages += __hyp_pgtable_max_pages(nr_pages);
+ } while (nr_pages != prev);
+ hyp_mem_size += nr_pages << PAGE_SHIFT;
+
+ hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
+ hyp_mem_size, SZ_2M);
+ if (!hyp_mem_base) {
+ kvm_err("Failed to reserve hyp memory\n");
+ return;
+ }
+ memblock_reserve(hyp_mem_base, hyp_mem_size);
+
+ kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
+ hyp_mem_base);
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 278e163beda4..3cf9397dabdb 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1264,10 +1264,10 @@ static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
.virt_to_phys = kvm_host_pa,
};
+u32 hyp_va_bits;
int kvm_mmu_init(void)
{
int err;
- u32 hyp_va_bits;
hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 095540667f0f..f81da019b677 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -34,6 +34,7 @@
#include <asm/fixmap.h>
#include <asm/kasan.h>
#include <asm/kernel-pgtable.h>
+#include <asm/kvm_host.h>
#include <asm/memory.h>
#include <asm/numa.h>
#include <asm/sections.h>
@@ -390,6 +391,8 @@ void __init arm64_memblock_init(void)
reserve_elfcorehdr();
+ reserve_kvm_hyp();
+
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
dma_contiguous_reserve(arm64_dma32_phys_limit);
--
2.29.2.299.gdc1121823c-goog
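As a quick illustration of the fixed point computed in reserve_kvm_hyp()
above: the iteration converges after a handful of steps. A stand-alone
user-space sketch (with a stand-in struct hyp_page -- only its size
matters here -- and the __hyp_pgtable_max_pages() term dropped for
brevity):

	#include <stdio.h>

	#define PAGE_SHIFT	12
	#define PAGE_SIZE	(1UL << PAGE_SHIFT)
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

	struct hyp_page { unsigned int refcount; unsigned int order; };

	int main(void)
	{
		unsigned long hyp_mem_size = 64UL << 20;	/* say, 64 MiB */
		unsigned long nr_pages = 0, prev;

		do {
			prev = nr_pages;
			/* Pages to cover, including the vmemmap itself... */
			nr_pages = (hyp_mem_size >> PAGE_SHIFT) + prev;
			/* ...converted into pages of struct hyp_page. */
			nr_pages = DIV_ROUND_UP(nr_pages * sizeof(struct hyp_page),
						PAGE_SIZE);
		} while (nr_pages != prev);

		printf("%lu extra pages for the hyp_vmemmap\n", nr_pages);
		return 0;
	}

With a 64 MiB pool and 4K pages this settles on 33 pages after three
iterations.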
Refactor __load_guest_stage2() to introduce __load_stage2() which will
be re-used when loading the host stage 2.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_mmu.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 5a76358e8c7a..96843b7b6eaa 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -321,9 +321,9 @@ static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
* Must be called from hyp code running at EL2 with an updated VTTBR
* and interrupts disabled.
*/
-static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
+static __always_inline void __load_stage2(struct kvm_s2_mmu *mmu, unsigned long vtcr)
{
- write_sysreg(kern_hyp_va(mmu->arch)->vtcr, vtcr_el2);
+ write_sysreg(vtcr, vtcr_el2);
write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
/*
@@ -334,6 +334,11 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
}
+static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
+{
+ __load_stage2(mmu, kern_hyp_va(mmu->arch)->vtcr);
+}
+
static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
{
return container_of(mmu->arch, struct kvm, arch);
--
2.29.2.299.gdc1121823c-goog
In order to make use of the stage 2 pgtable code for the host stage 2,
use struct kvm_arch in lieu of struct kvm as the host will have the
former but not the latter.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 5 +++--
arch/arm64/kvm/hyp/pgtable.c | 6 +++---
arch/arm64/kvm/mmu.c | 2 +-
3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 45acc9dc6c45..8e8f1d2c5e0e 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -151,12 +151,13 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
/**
* kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
* @pgt: Uninitialised page-table structure to initialise.
- * @kvm: KVM structure representing the guest virtual machine.
+ * @arch: Arch-specific KVM structure representing the guest virtual
+ * machine.
* @mm_ops: Memory management callbacks.
*
* Return: 0 on success, negative error code on failure.
*/
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_arch *arch,
struct kvm_pgtable_mm_ops *mm_ops);
/**
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 61a8a34ddfdb..96a25d0b7b6e 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -855,11 +855,11 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
return kvm_pgtable_walk(pgt, addr, size, &walker);
}
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_arch *arch,
struct kvm_pgtable_mm_ops *mm_ops)
{
size_t pgd_sz;
- u64 vtcr = kvm->arch.vtcr;
+ u64 vtcr = arch->vtcr;
u32 ia_bits = VTCR_EL2_IPA(vtcr);
u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
@@ -872,7 +872,7 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
pgt->ia_bits = ia_bits;
pgt->start_level = start_level;
pgt->mm_ops = mm_ops;
- pgt->mmu = &kvm->arch.mmu;
+ pgt->mmu = &arch->mmu;
/* Ensure zeroed PGD pages are visible to the hardware walker */
dsb(ishst);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5c2e0feb9689..384f2acc0115 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -461,7 +461,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
if (!pgt)
return -ENOMEM;
- err = kvm_pgtable_stage2_init(pgt, kvm, &kvm_s2_mm_ops);
+ err = kvm_pgtable_stage2_init(pgt, &kvm->arch, &kvm_s2_mm_ops);
if (err)
goto out_free_pgtable;
--
2.29.2.299.gdc1121823c-goog
Introduce early_init_dt_add_memory_hyp() to allow KVM to keep a copy
of the memory regions parsed from DT. This will be needed in the context
of the protected nVHE feature of KVM/arm64 where the code running at EL2
will be cleanly separated from the host kernel during boot, and will
need its own representation of memory.
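This relies on the usual weak-symbol pattern -- the generic FDT code
provides a no-op default, and an architecture can override it at link
time with a strong definition (for arm64, the one added to
reserved_mem.c earlier in this series):

	/* drivers/of/fdt.c: weak default, a no-op unless overridden */
	void __init __weak early_init_dt_add_memory_hyp(u64 base, u64 size)
	{
	}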
Signed-off-by: Quentin Perret <[email protected]>
---
drivers/of/fdt.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4602e467ca8b..af2b5a09c5b4 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1099,6 +1099,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
#define MAX_MEMBLOCK_ADDR ((phys_addr_t)~0)
#endif
+void __init __weak early_init_dt_add_memory_hyp(u64 base, u64 size)
+{
+}
+
void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
{
const u64 phys_offset = MIN_MEMBLOCK_ADDR;
@@ -1139,6 +1143,7 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
base = phys_offset;
}
memblock_add(base, size);
+ early_init_dt_add_memory_hyp(base, size);
}
int __init __weak early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size)
--
2.29.2.299.gdc1121823c-goog
In order to make use of the stage 2 pgtable code for the host stage 2,
change kvm_s2_mmu to use a kvm_arch pointer in lieu of the kvm pointer,
as the host will have the former but not the latter.
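Since struct kvm_arch is embedded in struct kvm, a kvm pointer can still
be recovered wherever one is really needed; this is exactly what the
helper introduced below does:

	static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
	{
		return container_of(mmu->arch, struct kvm, arch);
	}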
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/arm64/include/asm/kvm_mmu.h | 7 ++++++-
arch/arm64/kvm/mmu.c | 8 ++++----
3 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ee8bb8021637..53b01d25e7d9 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -86,7 +86,7 @@ struct kvm_s2_mmu {
/* The last vcpu id that ran on each physical CPU */
int __percpu *last_vcpu_ran;
- struct kvm *kvm;
+ struct kvm_arch *arch;
};
struct kvm_arch {
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index bb756757b51c..714357ebd278 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -275,7 +275,7 @@ static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
*/
static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
{
- write_sysreg(kern_hyp_va(mmu->kvm)->arch.vtcr, vtcr_el2);
+ write_sysreg(kern_hyp_va(mmu->arch)->vtcr, vtcr_el2);
write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
/*
@@ -285,5 +285,10 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
*/
asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
}
+
+static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
+{
+ return container_of(mmu->arch, struct kvm, arch);
+}
#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 384f2acc0115..3b1c53e754ee 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -169,7 +169,7 @@ static void *kvm_host_va(phys_addr_t phys)
static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size,
bool may_block)
{
- struct kvm *kvm = mmu->kvm;
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
phys_addr_t end = start + size;
assert_spin_locked(&kvm->mmu_lock);
@@ -474,7 +474,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
for_each_possible_cpu(cpu)
*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
- mmu->kvm = kvm;
+ mmu->arch = &kvm->arch;
mmu->pgt = pgt;
mmu->pgd_phys = __pa(pgt->pgd);
mmu->vmid.vmid_gen = 0;
@@ -556,7 +556,7 @@ void stage2_unmap_vm(struct kvm *kvm)
void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
{
- struct kvm *kvm = mmu->kvm;
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
struct kvm_pgtable *pgt = NULL;
spin_lock(&kvm->mmu_lock);
@@ -625,7 +625,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
*/
static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
{
- struct kvm *kvm = mmu->kvm;
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_wrprotect);
}
--
2.29.2.299.gdc1121823c-goog
Refactor __populate_fault_info() to introduce __get_fault_info() which
will be used once the host is wrapped in a stage 2.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/hyp/switch.h | 36 +++++++++++++++----------
1 file changed, 22 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 84473574c2e7..e9005255d639 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -157,19 +157,9 @@ static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
return true;
}
-static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
+static inline bool __get_fault_info(u64 esr, u64 *far, u64 *hpfar)
{
- u8 ec;
- u64 esr;
- u64 hpfar, far;
-
- esr = vcpu->arch.fault.esr_el2;
- ec = ESR_ELx_EC(esr);
-
- if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
- return true;
-
- far = read_sysreg_el2(SYS_FAR);
+ *far = read_sysreg_el2(SYS_FAR);
/*
* The HPFAR can be invalid if the stage 2 fault did not
@@ -185,12 +175,30 @@ static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
if (!(esr & ESR_ELx_S1PTW) &&
(cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
(esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
- if (!__translate_far_to_hpfar(far, &hpfar))
+ if (!__translate_far_to_hpfar(*far, hpfar))
return false;
} else {
- hpfar = read_sysreg(hpfar_el2);
+ *hpfar = read_sysreg(hpfar_el2);
}
+ return true;
+}
+
+static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
+{
+ u8 ec;
+ u64 esr;
+ u64 hpfar, far;
+
+ esr = vcpu->arch.fault.esr_el2;
+ ec = ESR_ELx_EC(esr);
+
+ if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
+ return true;
+
+ if (!__get_fault_info(esr, &far, &hpfar))
+ return false;
+
vcpu->arch.fault.far_el2 = far;
vcpu->arch.fault.hpfar_el2 = hpfar;
return true;
--
2.29.2.299.gdc1121823c-goog
From: Will Deacon <[email protected]>
We will soon need to synchronise multiple CPUs in the hyp text at EL2.
The qspinlock-based locking used by the host is overkill for this purpose
and requires a working "percpu" implementation for the MCS nodes.
Implement a simple ticket locking scheme based heavily on the code removed
by c11090474d70 ("arm64: locking: Replace ticket lock implementation with
qspinlock").
[ qperret: removed the __KVM_NVHE_HYPERVISOR__ build-time check from
spinlock.h ]
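For readers less used to the encoding, the asm below implements the
classic ticket lock. A rough C11 equivalent, as an illustrative sketch
only (the real thing has to be asm, notably for the wfe/sevl-based
waiting and the LL/SC vs LSE alternatives):

	#include <stdatomic.h>

	struct ticket_lock {
		_Atomic unsigned short owner;
		_Atomic unsigned short next;
	};

	static void ticket_lock(struct ticket_lock *l)
	{
		/* Grab a ticket; the atomic RMW totally orders lockers. */
		unsigned short t = atomic_fetch_add_explicit(&l->next, 1,
						memory_order_relaxed);

		/* Spin until our number comes up (the asm idles in wfe). */
		while (atomic_load_explicit(&l->owner,
					    memory_order_acquire) != t)
			;
	}

	static void ticket_unlock(struct ticket_lock *l)
	{
		/* Hand the lock to the next ticket holder. */
		atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
	}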
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 95 ++++++++++++++++++++++
arch/arm64/kvm/hyp/include/nvhe/util.h | 25 ++++++
2 files changed, 120 insertions(+)
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/spinlock.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/util.h
diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
new file mode 100644
index 000000000000..bbfe2cbd9f62
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * A stand-alone ticket spinlock implementation, primarily for use by the
+ * non-VHE hypervisor code running at EL2.
+ *
+ * Copyright (C) 2020 Google LLC
+ * Author: Will Deacon <[email protected]>
+ *
+ * Heavily based on the implementation removed by c11090474d70 which was:
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#ifndef __ARM64_KVM_HYP_SPINLOCK_H__
+#define __ARM64_KVM_HYP_SPINLOCK_H__
+
+#include <asm/alternative.h>
+
+typedef union hyp_spinlock {
+ u32 __val;
+ struct {
+#ifdef __AARCH64EB__
+ u16 next, owner;
+#else
+ u16 owner, next;
+#endif
+ };
+} hyp_spinlock_t;
+
+#define hyp_spin_lock_init(l) \
+do { \
+ *(l) = (hyp_spinlock_t){ .__val = 0 }; \
+} while (0)
+
+static inline void hyp_spin_lock(hyp_spinlock_t *lock)
+{
+ u32 tmp;
+ hyp_spinlock_t lockval, newval;
+
+ asm volatile(
+ /* Atomically increment the next ticket. */
+ ALTERNATIVE(
+ /* LL/SC */
+" prfm pstl1strm, %3\n"
+"1: ldaxr %w0, %3\n"
+" add %w1, %w0, #(1 << 16)\n"
+" stxr %w2, %w1, %3\n"
+" cbnz %w2, 1b\n",
+ /* LSE atomics */
+" .arch_extension lse\n"
+" mov %w2, #(1 << 16)\n"
+" ldadda %w2, %w0, %3\n"
+ __nops(3),
+ ARM64_HAS_LSE_ATOMICS)
+
+ /* Did we get the lock? */
+" eor %w1, %w0, %w0, ror #16\n"
+" cbz %w1, 3f\n"
+ /*
+ * No: spin on the owner. Send a local event to avoid missing an
+ * unlock before the exclusive load.
+ */
+" sevl\n"
+"2: wfe\n"
+" ldaxrh %w2, %4\n"
+" eor %w1, %w2, %w0, lsr #16\n"
+" cbnz %w1, 2b\n"
+ /* We got the lock. Critical section starts here. */
+"3:"
+ : "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock)
+ : "Q" (lock->owner)
+ : "memory");
+}
+
+static inline void hyp_spin_unlock(hyp_spinlock_t *lock)
+{
+ u64 tmp;
+
+ asm volatile(
+ ALTERNATIVE(
+ /* LL/SC */
+ " ldrh %w1, %0\n"
+ " add %w1, %w1, #1\n"
+ " stlrh %w1, %0",
+ /* LSE atomics */
+ " .arch_extension lse\n"
+ " mov %w1, #1\n"
+ " staddlh %w1, %0\n"
+ __nops(1),
+ ARM64_HAS_LSE_ATOMICS)
+ : "=Q" (lock->owner), "=&r" (tmp)
+ :
+ : "memory");
+}
+
+#endif /* __ARM64_KVM_HYP_SPINLOCK_H__ */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/util.h b/arch/arm64/kvm/hyp/include/nvhe/util.h
new file mode 100644
index 000000000000..9c58cc436a83
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/util.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Standalone re-implementations of kernel interfaces for use at EL2.
+ * Copyright (C) 2020 Google LLC
+ * Author: Will Deacon <[email protected]>
+ */
+
+#ifndef __KVM_NVHE_HYPERVISOR__
+#error "Attempt to include nVHE code outside of EL2 object"
+#endif
+
+#ifndef __ARM64_KVM_NVHE_UTIL_H__
+#define __ARM64_KVM_NVHE_UTIL_H__
+
+/* Locking (hyp_spinlock_t) */
+#include <nvhe/spinlock.h>
+
+#undef spin_lock_init
+#define spin_lock_init hyp_spin_lock_init
+#undef spin_lock
+#define spin_lock hyp_spin_lock
+#undef spin_unlock
+#define spin_unlock hyp_spin_unlock
+
+#endif /* __ARM64_KVM_NVHE_UTIL_H__ */
--
2.29.2.299.gdc1121823c-goog
Move the initialization of kvm_nvhe_init_params into a dedicated function
that is run early, and only once during KVM init, rather than every time
the KVM vectors are set and reset.
This also makes it possible for the hypervisor to change the init
structs during boot, which simplifies replacing the host-provided
page-tables and stacks with the ones the hypervisor will create for
itself.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/arm.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index d6d5211653b7..7335eb4fb0bd 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1355,24 +1355,20 @@ static int kvm_init_vector_slots(void)
return 0;
}
-static void cpu_init_hyp_mode(void)
+static void cpu_prepare_hyp_mode(int cpu)
{
- struct kvm_nvhe_init_params *params = this_cpu_ptr_nvhe_sym(kvm_init_params);
- struct arm_smccc_res res;
-
- /* Switch from the HYP stub to our own HYP init vector */
- __hyp_set_vectors(kvm_get_idmap_vector());
+ struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
/*
* Calculate the raw per-cpu offset without a translation from the
* kernel's mapping to the linear mapping, and store it in tpidr_el2
* so that we can use adr_l to access per-cpu variables in EL2.
*/
- params->tpidr_el2 = (unsigned long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
+ params->tpidr_el2 = (unsigned long)per_cpu_ptr_nvhe_sym(__per_cpu_start, cpu) -
(unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));
params->vector_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_host_vector));
- params->stack_hyp_va = kern_hyp_va(__this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE);
+ params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_psci_cpu_entry));
params->pgd_pa = kvm_mmu_get_httbr();
@@ -1381,6 +1377,15 @@ static void cpu_init_hyp_mode(void)
* be read while the MMU is off.
*/
__flush_dcache_area(params, sizeof(*params));
+}
+
+static void cpu_init_hyp_mode(void)
+{
+ struct kvm_nvhe_init_params *params;
+ struct arm_smccc_res res;
+
+ /* Switch from the HYP stub to our own HYP init vector */
+ __hyp_set_vectors(kvm_get_idmap_vector());
/*
* Call initialization code, and switch to the full blown HYP code.
@@ -1389,6 +1394,7 @@ static void cpu_init_hyp_mode(void)
* cpus_have_const_cap() wrapper.
*/
BUG_ON(!system_capabilities_finalized());
+ params = this_cpu_ptr_nvhe_sym(kvm_init_params);
arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init), virt_to_phys(params), &res);
WARN_ON(res.a0 != SMCCC_RET_SUCCESS);
@@ -1742,6 +1748,12 @@ static int init_hyp_mode(void)
init_cpu_logical_map();
init_psci_relay();
+ /*
+ * Prepare the CPU initialization parameters
+ */
+ for_each_possible_cpu(cpu)
+ cpu_prepare_hyp_mode(cpu);
+
return 0;
out_err:
--
2.29.2.299.gdc1121823c-goog
With nVHE, the host currently creates all s1 hypervisor mappings at EL1
during boot, installs them at EL2, and extends them as required (e.g.
when creating a new VM). But in a world where the host is no longer
trusted, it cannot have full control over the code mapped in the
hypervisor.
In preparation for enabling the hypervisor to create its own s1 mappings
during boot, introduce an early page allocator, with minimal
functionality. This allocator is designed to be used only during the
early bootstrap of the hyp code when memory protection is enabled; the
hyp code then switches to a full-fledged page allocator after init.
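Once initialised, the allocator is a simple bump pointer over the
donated pool. A sketch of the intended use ('pool_va' and 'pool_size'
are made-up names):

	hyp_early_alloc_init(pool_va, pool_size);  /* take over the pool */
	pgd  = hyp_early_alloc_contig(4);  /* 4 contiguous zeroed pages */
	page = hyp_early_alloc_page(NULL); /* a single zeroed page */

Freeing is deliberately not supported; anything handed out here stays
allocated until the buddy allocator takes over.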
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/early_alloc.h | 14 +++++
arch/arm64/kvm/hyp/include/nvhe/memory.h | 24 ++++++++
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/early_alloc.c | 60 +++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 5 +-
5 files changed, 101 insertions(+), 4 deletions(-)
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/memory.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/early_alloc.c
diff --git a/arch/arm64/kvm/hyp/include/nvhe/early_alloc.h b/arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
new file mode 100644
index 000000000000..68ce2bf9a718
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_HYP_EARLY_ALLOC_H
+#define __KVM_HYP_EARLY_ALLOC_H
+
+#include <asm/kvm_pgtable.h>
+
+void hyp_early_alloc_init(void *virt, unsigned long size);
+unsigned long hyp_early_alloc_nr_pages(void);
+void *hyp_early_alloc_page(void *arg);
+void *hyp_early_alloc_contig(unsigned int nr_pages);
+
+extern struct kvm_pgtable_mm_ops hyp_early_alloc_mm_ops;
+
+#endif /* __KVM_HYP_EARLY_ALLOC_H */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
new file mode 100644
index 000000000000..64c44c142c95
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_HYP_MEMORY_H
+#define __KVM_HYP_MEMORY_H
+
+#include <asm/page.h>
+
+#include <linux/types.h>
+
+extern s64 hyp_physvirt_offset;
+
+#define __hyp_pa(virt) ((phys_addr_t)(virt) + hyp_physvirt_offset)
+#define __hyp_va(virt) ((void *)((phys_addr_t)(virt) - hyp_physvirt_offset))
+
+static inline void *hyp_phys_to_virt(phys_addr_t phys)
+{
+ return __hyp_va(phys);
+}
+
+static inline phys_addr_t hyp_virt_to_phys(void *addr)
+{
+ return __hyp_pa(addr);
+}
+
+#endif /* __KVM_HYP_MEMORY_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 590fdefb42dd..1fc0684a7678 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -10,7 +10,7 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
lib-objs := $(addprefix ../../../lib/, $(lib-objs))
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
- hyp-main.o hyp-smp.o psci-relay.o
+ hyp-main.o hyp-smp.o psci-relay.o early_alloc.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o
obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/early_alloc.c b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
new file mode 100644
index 000000000000..de4c45662970
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <[email protected]>
+ */
+
+#include <asm/kvm_pgtable.h>
+
+#include <nvhe/memory.h>
+
+struct kvm_pgtable_mm_ops hyp_early_alloc_mm_ops;
+s64 __ro_after_init hyp_physvirt_offset;
+
+static unsigned long base;
+static unsigned long end;
+static unsigned long cur;
+
+unsigned long hyp_early_alloc_nr_pages(void)
+{
+ return (cur - base) >> PAGE_SHIFT;
+}
+
+extern void clear_page(void *to);
+
+void *hyp_early_alloc_contig(unsigned int nr_pages)
+{
+ unsigned long ret = cur, i, p;
+
+ if (!nr_pages)
+ return NULL;
+
+ cur += nr_pages << PAGE_SHIFT;
+ if (cur > end) {
+ cur = ret;
+ return NULL;
+ }
+
+ for (i = 0; i < nr_pages; i++) {
+ p = ret + (i << PAGE_SHIFT);
+ clear_page((void *)(p));
+ }
+
+ return (void *)ret;
+}
+
+void *hyp_early_alloc_page(void *arg)
+{
+ return hyp_early_alloc_contig(1);
+}
+
+void hyp_early_alloc_init(void *virt, unsigned long size)
+{
+	base = (unsigned long)virt;
+	end = base + size;
+	cur = base;
+
+ hyp_early_alloc_mm_ops.zalloc_page = hyp_early_alloc_page;
+ hyp_early_alloc_mm_ops.phys_to_virt = hyp_phys_to_virt;
+ hyp_early_alloc_mm_ops.virt_to_phys = hyp_virt_to_phys;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index 313ef42f0eab..dbe57ae84a0c 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -14,6 +14,8 @@
#include <kvm/arm_psci.h>
#include <uapi/linux/psci.h>
+#include <nvhe/memory.h>
+
#define INVALID_CPU_ID UINT_MAX
extern char __kvm_hyp_cpu_entry[];
@@ -21,9 +23,6 @@ extern char __kvm_hyp_cpu_entry[];
/* Config options set by the host. */
u32 __ro_after_init kvm_host_psci_version = PSCI_VERSION(0, 0);
u32 __ro_after_init kvm_host_psci_function_id[PSCI_FN_MAX];
-s64 __ro_after_init hyp_physvirt_offset;
-
-#define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)
struct kvm_host_psci_state {
atomic_t pending_on;
--
2.29.2.299.gdc1121823c-goog
Move the registers relevant to host stage 2 enablement to
kvm_nvhe_init_params to prepare the ground for enabling it in later
patches.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 3 +++
arch/arm64/kernel/asm-offsets.c | 3 +++
arch/arm64/kvm/arm.c | 5 +++++
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 9 +++++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 5 +----
5 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 9266b17f8ba9..089eea6e54fc 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -158,6 +158,9 @@ struct kvm_nvhe_init_params {
unsigned long stack_hyp_va;
unsigned long entry_hyp_va;
phys_addr_t pgd_pa;
+ unsigned long hcr_el2;
+ unsigned long vttbr;
+ unsigned long vtcr;
};
/* Translate a kernel address @ptr into its equivalent linear mapping */
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 9752100bf01f..2c3813bff6ea 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -115,6 +115,9 @@ int main(void)
DEFINE(NVHE_INIT_STACK_HYP_VA, offsetof(struct kvm_nvhe_init_params, stack_hyp_va));
DEFINE(NVHE_INIT_ENTRY_HYP_VA, offsetof(struct kvm_nvhe_init_params, entry_hyp_va));
DEFINE(NVHE_INIT_PGD_PA, offsetof(struct kvm_nvhe_init_params, pgd_pa));
+ DEFINE(NVHE_INIT_HCR_EL2, offsetof(struct kvm_nvhe_init_params, hcr_el2));
+ DEFINE(NVHE_INIT_VTTBR, offsetof(struct kvm_nvhe_init_params, vttbr));
+ DEFINE(NVHE_INIT_VTCR, offsetof(struct kvm_nvhe_init_params, vtcr));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index cfe5cc55b425..e06c95a10dba 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1365,6 +1365,11 @@ static void cpu_prepare_hyp_mode(int cpu)
params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref_nvhe(__kvm_hyp_psci_cpu_entry));
params->pgd_pa = kvm_mmu_get_httbr();
+ if (is_protected_kvm_enabled())
+ params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
+ else
+ params->hcr_el2 = HCR_HOST_NVHE_FLAGS;
+ params->vttbr = params->vtcr = 0;
/*
* Flush the init params from the data cache because the struct will
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index e2d62297edfe..9f3f3098670a 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -103,6 +103,15 @@ alternative_else_nop_endif
ldr x1, [x0, #NVHE_INIT_STACK_HYP_VA]
mov sp, x1
+ ldr x1, [x0, #NVHE_INIT_HCR_EL2]
+ msr hcr_el2, x1
+
+ ldr x1, [x0, #NVHE_INIT_VTTBR]
+ msr vttbr_el2, x1
+
+ ldr x1, [x0, #NVHE_INIT_VTCR]
+ msr vtcr_el2, x1
+
ldr x1, [x0, #NVHE_INIT_PGD_PA]
phys_to_ttbr x0, x1
alternative_if ARM64_HAS_CNP
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index f3d0e9eca56c..979a76cdf9fb 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -97,10 +97,7 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
write_sysreg(mdcr_el2, mdcr_el2);
- if (is_protected_kvm_enabled())
- write_sysreg(HCR_HOST_NVHE_PROTECTED_FLAGS, hcr_el2);
- else
- write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2);
+ write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2);
write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
write_sysreg(__kvm_hyp_host_vector, vbar_el2);
}
--
2.29.2.299.gdc1121823c-goog
When KVM runs in protected nVHE mode, make use of a stage 2 page-table
to give the hypervisor some control over the host memory accesses. At
the moment, the host stage 2 is populated lazily: pages are idmapped
RWX at stage 2 the first time the host faults on them. Later patches
will make use of that
infrastructure to implement access control restrictions to e.g. protect
guest memory from the host.
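Concretely, the new EL2 fault path boils down to the following (abridged
from handle_host_mem_abort() in the diff below):

	u64 esr, far, hpfar, ipa;

	esr = read_sysreg_el2(SYS_ESR);
	if (!__get_fault_info(esr, &far, &hpfar))
		hyp_panic();

	/* Recover the page-aligned IPA from the HPFAR_EL2 FIPA field */
	ipa = (hpfar & HPFAR_MASK) << 8;
	host_stage2_map(ipa, PAGE_SIZE, KVM_PGTABLE_PROT_R |
			KVM_PGTABLE_PROT_W | KVM_PGTABLE_PROT_X);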
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_cpufeature.h | 2 +
arch/arm64/kernel/image-vars.h | 3 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 33 +++
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 191 ++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/setup.c | 6 +
arch/arm64/kvm/hyp/nvhe/switch.c | 7 +-
arch/arm64/kvm/hyp/nvhe/tlb.c | 4 +-
9 files changed, 247 insertions(+), 7 deletions(-)
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c
diff --git a/arch/arm64/include/asm/kvm_cpufeature.h b/arch/arm64/include/asm/kvm_cpufeature.h
index d34f85cba358..74043a149322 100644
--- a/arch/arm64/include/asm/kvm_cpufeature.h
+++ b/arch/arm64/include/asm/kvm_cpufeature.h
@@ -15,3 +15,5 @@
#endif
KVM_HYP_CPU_FTR_REG(SYS_CTR_EL0, arm64_ftr_reg_ctrel0)
+KVM_HYP_CPU_FTR_REG(SYS_ID_AA64MMFR0_EL1, arm64_ftr_reg_id_aa64mmfr0_el1)
+KVM_HYP_CPU_FTR_REG(SYS_ID_AA64MMFR1_EL1, arm64_ftr_reg_id_aa64mmfr1_el1)
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index f2d43e6cd86d..e2652278dd63 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -137,6 +137,9 @@ KVM_NVHE_ALIAS(__hyp_data_ro_after_init_end);
KVM_NVHE_ALIAS(__hyp_bss_start);
KVM_NVHE_ALIAS(__hyp_bss_end);
+/* pKVM static key */
+KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
+
#endif /* CONFIG_KVM */
#endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
new file mode 100644
index 000000000000..a22ef118a610
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <[email protected]>
+ */
+
+#ifndef __KVM_NVHE_MEM_PROTECT__
+#define __KVM_NVHE_MEM_PROTECT__
+#include <linux/kvm_host.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_pgtable.h>
+#include <asm/virt.h>
+#include <nvhe/spinlock.h>
+
+struct host_kvm {
+ struct kvm_arch arch;
+ struct kvm_pgtable pgt;
+ struct kvm_pgtable_mm_ops mm_ops;
+ hyp_spinlock_t lock;
+};
+extern struct host_kvm host_kvm;
+
+int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool);
+void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
+
+static __always_inline void __load_host_stage2(void)
+{
+ if (static_branch_likely(&kvm_protected_mode_initialized))
+ __load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
+ else
+ write_sysreg(0, vttbr_el2);
+}
+#endif /* __KVM_NVHE_MEM_PROTECT__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index d7381a503182..c3e2f98555c4 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -11,7 +11,7 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
- cache.o cpufeature.o setup.o mm.o
+ cache.o cpufeature.o setup.o mm.o mem_protect.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index a0bfe0d26da6..5d0cb17e03a1 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -13,6 +13,7 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
+#include <nvhe/mem_protect.h>
#include <nvhe/mm.h>
DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
@@ -234,6 +235,11 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
case ESR_ELx_EC_SMC64:
handle_host_smc(host_ctxt);
break;
+ case ESR_ELx_EC_IABT_LOW:
+ fallthrough;
+ case ESR_ELx_EC_DABT_LOW:
+ handle_host_mem_abort(host_ctxt);
+ break;
default:
hyp_panic();
}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
new file mode 100644
index 000000000000..0cd3eb178f3b
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <[email protected]>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_cpufeature.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pgtable.h>
+#include <asm/stage2_pgtable.h>
+
+#include <hyp/switch.h>
+
+#include <nvhe/gfp.h>
+#include <nvhe/memory.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/mm.h>
+
+extern unsigned long hyp_nr_cpus;
+struct host_kvm host_kvm;
+
+struct hyp_pool host_s2_mem;
+struct hyp_pool host_s2_dev;
+
+static void *host_s2_zalloc_pages_exact(size_t size)
+{
+ return hyp_alloc_pages(&host_s2_mem, HYP_GFP_ZERO, get_order(size));
+}
+
+static void *host_s2_zalloc_page(void *pool)
+{
+ return hyp_alloc_pages(pool, HYP_GFP_ZERO, 0);
+}
+
+static int prepare_s2_pools(void *mem_pgt_pool, void *dev_pgt_pool)
+{
+ unsigned long nr_pages;
+ int ret;
+
+ nr_pages = host_s2_mem_pgtable_size() >> PAGE_SHIFT;
+ ret = hyp_pool_init(&host_s2_mem, __hyp_pa(mem_pgt_pool), nr_pages, 0);
+ if (ret)
+ return ret;
+
+ nr_pages = host_s2_dev_pgtable_size() >> PAGE_SHIFT;
+ ret = hyp_pool_init(&host_s2_dev, __hyp_pa(dev_pgt_pool), nr_pages, 0);
+ if (ret)
+ return ret;
+
+ host_kvm.mm_ops.zalloc_pages_exact = host_s2_zalloc_pages_exact;
+ host_kvm.mm_ops.zalloc_page = host_s2_zalloc_page;
+ host_kvm.mm_ops.phys_to_virt = hyp_phys_to_virt;
+ host_kvm.mm_ops.virt_to_phys = hyp_virt_to_phys;
+ host_kvm.mm_ops.page_count = hyp_page_count;
+ host_kvm.mm_ops.get_page = hyp_get_page;
+ host_kvm.mm_ops.put_page = hyp_put_page;
+
+ return 0;
+}
+
+static void prepare_host_vtcr(void)
+{
+ u32 parange, phys_shift;
+ u64 mmfr0, mmfr1;
+
+ mmfr0 = arm64_ftr_reg_id_aa64mmfr0_el1.sys_val;
+ mmfr1 = arm64_ftr_reg_id_aa64mmfr1_el1.sys_val;
+
+ /* The host stage 2 is id-mapped, so use parange for T0SZ */
+ parange = kvm_get_parange(mmfr0);
+ phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
+
+ host_kvm.arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
+}
+
+int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool)
+{
+ struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
+ struct kvm_nvhe_init_params *params;
+ int ret, i;
+
+ prepare_host_vtcr();
+ hyp_spin_lock_init(&host_kvm.lock);
+
+ ret = prepare_s2_pools(mem_pgt_pool, dev_pgt_pool);
+ if (ret)
+ return ret;
+
+ ret = kvm_pgtable_stage2_init(&host_kvm.pgt, &host_kvm.arch,
+ &host_kvm.mm_ops);
+ if (ret)
+ return ret;
+
+ mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
+ mmu->arch = &host_kvm.arch;
+ mmu->pgt = &host_kvm.pgt;
+ mmu->vmid.vmid_gen = 0;
+ mmu->vmid.vmid = 0;
+
+ for (i = 0; i < hyp_nr_cpus; i++) {
+ params = per_cpu_ptr(&kvm_init_params, i);
+ params->vttbr = kvm_get_vttbr(mmu);
+ params->vtcr = host_kvm.arch.vtcr;
+ params->hcr_el2 |= HCR_VM;
+ __flush_dcache_area(params, sizeof(*params));
+ }
+
+ write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2);
+ __load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
+
+ return 0;
+}
+
+static void host_stage2_unmap_dev_all(void)
+{
+ struct kvm_pgtable *pgt = &host_kvm.pgt;
+ struct hyp_memblock_region *reg;
+ u64 addr = 0;
+ int i;
+
+ /* Unmap all non-memory regions to recycle the pages */
+ for (i = 0; i < hyp_memblock_nr; i++, addr = reg->end) {
+ reg = &hyp_memory[i];
+ kvm_pgtable_stage2_unmap(pgt, addr, reg->start - addr);
+ }
+ kvm_pgtable_stage2_unmap(pgt, addr, ULONG_MAX);
+}
+
+static bool ipa_is_memory(u64 ipa)
+{
+ int cur, left = 0, right = hyp_memblock_nr;
+ struct hyp_memblock_region *reg;
+
+ /* The list of memblock regions is sorted, binary search it */
+ while (left < right) {
+ cur = (left + right) >> 1;
+ reg = &hyp_memory[cur];
+ if (ipa < reg->start)
+ right = cur;
+ else if (ipa >= reg->end)
+ left = cur + 1;
+ else
+ return true;
+ }
+
+ return false;
+}
+
+static int __host_stage2_map(u64 ipa, u64 size, enum kvm_pgtable_prot prot,
+ struct hyp_pool *p)
+{
+ return kvm_pgtable_stage2_map(&host_kvm.pgt, ipa, size, ipa, prot, p);
+}
+
+static int host_stage2_map(u64 ipa, u64 size, enum kvm_pgtable_prot prot)
+{
+ int ret, is_memory = ipa_is_memory(ipa);
+ struct hyp_pool *pool;
+
+ pool = is_memory ? &host_s2_mem : &host_s2_dev;
+
+ hyp_spin_lock(&host_kvm.lock);
+ ret = __host_stage2_map(ipa, size, prot, pool);
+ if (ret == -ENOMEM && !is_memory) {
+ host_stage2_unmap_dev_all();
+ ret = __host_stage2_map(ipa, size, prot, pool);
+ }
+ hyp_spin_unlock(&host_kvm.lock);
+
+ return ret;
+}
+
+void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
+{
+ enum kvm_pgtable_prot prot;
+ u64 far, hpfar, esr, ipa;
+ int ret;
+
+ esr = read_sysreg_el2(SYS_ESR);
+ if (!__get_fault_info(esr, &far, &hpfar))
+ hyp_panic();
+
+ prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W | KVM_PGTABLE_PROT_X;
+ ipa = (hpfar & HPFAR_MASK) << 8;
+ ret = host_stage2_map(ipa, PAGE_SIZE, prot);
+ if (ret)
+ hyp_panic();
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index b73e6b08cfba..3e079027e608 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -12,6 +12,7 @@
#include <nvhe/early_alloc.h>
#include <nvhe/gfp.h>
#include <nvhe/memory.h>
+#include <nvhe/mem_protect.h>
#include <nvhe/mm.h>
struct hyp_pool hpool;
@@ -161,6 +162,11 @@ void __noreturn __kvm_hyp_protect_finalise(void)
if (ret)
goto out;
+ /* Wrap the host with a stage 2 */
+ ret = kvm_host_prepare_stage2(host_s2_mem_pgt_base, host_s2_dev_pgt_base);
+ if (ret)
+ goto out;
+
hyp_pgtable_mm_ops.zalloc_page = hyp_zalloc_hyp_page;
hyp_pgtable_mm_ops.phys_to_virt = hyp_phys_to_virt;
hyp_pgtable_mm_ops.virt_to_phys = hyp_virt_to_phys;
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 979a76cdf9fb..31bc1a843bf8 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -28,6 +28,8 @@
#include <asm/processor.h>
#include <asm/thread_info.h>
+#include <nvhe/mem_protect.h>
+
/* Non-VHE specific context */
DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
@@ -102,11 +104,6 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
write_sysreg(__kvm_hyp_host_vector, vbar_el2);
}
-static void __load_host_stage2(void)
-{
- write_sysreg(0, vttbr_el2);
-}
-
/* Save VGICv3 state on non-VHE systems */
static void __hyp_vgic_save_state(struct kvm_vcpu *vcpu)
{
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index fbde89a2c6e8..255a23a1b2db 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -8,6 +8,8 @@
#include <asm/kvm_mmu.h>
#include <asm/tlbflush.h>
+#include <nvhe/mem_protect.h>
+
struct tlb_inv_context {
u64 tcr;
};
@@ -43,7 +45,7 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
{
- write_sysreg(0, vttbr_el2);
+ __load_host_stage2();
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
/* Ensure write of the host VMID */
--
2.29.2.299.gdc1121823c-goog
In order to re-use some of the stage 2 setup at EL2, factor parts of
kvm_arm_setup_stage2() out into static inline functions.
No functional change intended.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_mmu.h | 48 ++++++++++++++++++++++++++++++++
arch/arm64/kvm/reset.c | 42 +++-------------------------
2 files changed, 52 insertions(+), 38 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 714357ebd278..5a76358e8c7a 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -256,6 +256,54 @@ static inline int kvm_write_guest_lock(struct kvm *kvm, gpa_t gpa,
return ret;
}
+static inline u64 kvm_get_parange(u64 mmfr0)
+{
+ u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,
+ ID_AA64MMFR0_PARANGE_SHIFT);
+ if (parange > ID_AA64MMFR0_PARANGE_MAX)
+ parange = ID_AA64MMFR0_PARANGE_MAX;
+
+ return parange;
+}
+
+/*
+ * The VTCR value is common across all the physical CPUs on the system.
+ * We use system wide sanitised values to fill in different fields,
+ * except for Hardware Management of Access Flags. HA Flag is set
+ * unconditionally on all CPUs, as it is safe to run with or without
+ * the feature and the bit is RES0 on CPUs that don't support it.
+ */
+static inline u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
+{
+ u64 vtcr = VTCR_EL2_FLAGS;
+ u8 lvls;
+
+ vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
+ vtcr |= VTCR_EL2_T0SZ(phys_shift);
+ /*
+ * Use a minimum 2 level page table to prevent splitting
+ * host PMD huge pages at stage2.
+ */
+ lvls = stage2_pgtable_levels(phys_shift);
+ if (lvls < 2)
+ lvls = 2;
+ vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
+
+ /*
+ * Enable the Hardware Access Flag management, unconditionally
+ * on all CPUs. The feature is RES0 on CPUs that don't support it,
+ * and must be ignored by those CPUs.
+ */
+ vtcr |= VTCR_EL2_HA;
+
+ /* Set the vmid bits */
+ vtcr |= (get_vmid_bits(mmfr1) == 16) ?
+ VTCR_EL2_VS_16BIT :
+ VTCR_EL2_VS_8BIT;
+
+ return vtcr;
+}
+
#define kvm_phys_to_vttbr(addr) phys_to_ttbr(addr)
static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3e772ea4e066..074b39dbe539 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -384,19 +384,10 @@ int kvm_set_ipa_limit(void)
return 0;
}
-/*
- * Configure the VTCR_EL2 for this VM. The VTCR value is common
- * across all the physical CPUs on the system. We use system wide
- * sanitised values to fill in different fields, except for Hardware
- * Management of Access Flags. HA Flag is set unconditionally on
- * all CPUs, as it is safe to run with or without the feature and
- * the bit is RES0 on CPUs that don't support it.
- */
int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type)
{
- u64 vtcr = VTCR_EL2_FLAGS, mmfr0;
- u32 parange, phys_shift;
- u8 lvls;
+ u64 mmfr0, mmfr1;
+ u32 phys_shift;
if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
return -EINVAL;
@@ -411,33 +402,8 @@ int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type)
}
mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
- parange = cpuid_feature_extract_unsigned_field(mmfr0,
- ID_AA64MMFR0_PARANGE_SHIFT);
- if (parange > ID_AA64MMFR0_PARANGE_MAX)
- parange = ID_AA64MMFR0_PARANGE_MAX;
- vtcr |= parange << VTCR_EL2_PS_SHIFT;
-
- vtcr |= VTCR_EL2_T0SZ(phys_shift);
- /*
- * Use a minimum 2 level page table to prevent splitting
- * host PMD huge pages at stage2.
- */
- lvls = stage2_pgtable_levels(phys_shift);
- if (lvls < 2)
- lvls = 2;
- vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
-
- /*
- * Enable the Hardware Access Flag management, unconditionally
- * on all CPUs. The features is RES0 on CPUs without the support
- * and must be ignored by the CPUs.
- */
- vtcr |= VTCR_EL2_HA;
+ mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
+ kvm->arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
- /* Set the vmid bits */
- vtcr |= (kvm_get_vmid_bits() == 16) ?
- VTCR_EL2_VS_16BIT :
- VTCR_EL2_VS_8BIT;
- kvm->arch.vtcr = vtcr;
return 0;
}
--
2.29.2.299.gdc1121823c-goog
In preparation for enabling the creation of page-tables at EL2, factor
all memory allocation out of the page-table code, hence making it
re-usable with any compatible memory allocator.
No functional changes intended.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 32 +++++++++-
arch/arm64/kvm/hyp/pgtable.c | 90 +++++++++++++++++-----------
arch/arm64/kvm/mmu.c | 70 +++++++++++++++++++++-
3 files changed, 154 insertions(+), 38 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 52ab38db04c7..45acc9dc6c45 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -13,17 +13,41 @@
typedef u64 kvm_pte_t;
+/**
+ * struct kvm_pgtable_mm_ops - Memory management callbacks.
+ * @zalloc_page: Allocate a zeroed memory page.
+ * @zalloc_pages_exact: Allocate an exact number of zeroed memory pages.
+ * @free_pages_exact: Free an exact number of memory pages.
+ * @get_page: Increment the refcount on a page.
+ * @put_page: Decrement the refcount on a page.
+ * @page_count: Returns the refcount of a page.
+ * @phys_to_virt: Convert a physical address into a virtual address.
+ * @virt_to_phys: Convert a virtual address into a physical address.
+ */
+struct kvm_pgtable_mm_ops {
+ void* (*zalloc_page)(void *arg);
+ void* (*zalloc_pages_exact)(size_t size);
+ void (*free_pages_exact)(void *addr, size_t size);
+ void (*get_page)(void *addr);
+ void (*put_page)(void *addr);
+ int (*page_count)(void *addr);
+ void* (*phys_to_virt)(phys_addr_t phys);
+ phys_addr_t (*virt_to_phys)(void *addr);
+};
+
/**
* struct kvm_pgtable - KVM page-table.
* @ia_bits: Maximum input address size, in bits.
* @start_level: Level at which the page-table walk starts.
* @pgd: Pointer to the first top-level entry of the page-table.
+ * @mm_ops: Memory management callbacks.
* @mmu: Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
*/
struct kvm_pgtable {
u32 ia_bits;
u32 start_level;
kvm_pte_t *pgd;
+ struct kvm_pgtable_mm_ops *mm_ops;
/* Stage-2 only */
struct kvm_s2_mmu *mmu;
@@ -86,10 +110,12 @@ struct kvm_pgtable_walker {
* kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table.
* @pgt: Uninitialised page-table structure to initialise.
* @va_bits: Maximum virtual address bits.
+ * @mm_ops: Memory management callbacks.
*
* Return: 0 on success, negative error code on failure.
*/
-int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits);
+int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
+ struct kvm_pgtable_mm_ops *mm_ops);
/**
* kvm_pgtable_hyp_destroy() - Destroy an unused hypervisor stage-1 page-table.
@@ -126,10 +152,12 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
* kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
* @pgt: Uninitialised page-table structure to initialise.
* @kvm: KVM structure representing the guest virtual machine.
+ * @mm_ops: Memory management callbacks.
*
* Return: 0 on success, negative error code on failure.
*/
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+ struct kvm_pgtable_mm_ops *mm_ops);
/**
* kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index d7122c5eac24..61a8a34ddfdb 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -148,9 +148,9 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa)
return pte;
}
-static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
+static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
{
- return __va(kvm_pte_to_phys(pte));
+ return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
}
static void kvm_set_invalid_pte(kvm_pte_t *ptep)
@@ -159,9 +159,10 @@ static void kvm_set_invalid_pte(kvm_pte_t *ptep)
WRITE_ONCE(*ptep, pte & ~KVM_PTE_VALID);
}
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
+static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
+ struct kvm_pgtable_mm_ops *mm_ops)
{
- kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(__pa(childp));
+ kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
pte |= KVM_PTE_VALID;
@@ -229,7 +230,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
goto out;
}
- childp = kvm_pte_follow(pte);
+ childp = kvm_pte_follow(pte, data->pgt->mm_ops);
ret = __kvm_pgtable_walk(data, childp, level + 1);
if (ret)
goto out;
@@ -304,8 +305,9 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
}
struct hyp_map_data {
- u64 phys;
- kvm_pte_t attr;
+ u64 phys;
+ kvm_pte_t attr;
+ struct kvm_pgtable_mm_ops *mm_ops;
};
static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
@@ -355,6 +357,8 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
kvm_pte_t *childp;
+ struct hyp_map_data *data = arg;
+ struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
return 0;
@@ -362,11 +366,11 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
return -EINVAL;
- childp = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
+ childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
if (!childp)
return -ENOMEM;
- kvm_set_table_pte(ptep, childp);
+ kvm_set_table_pte(ptep, childp, mm_ops);
return 0;
}
@@ -376,6 +380,7 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
int ret;
struct hyp_map_data map_data = {
.phys = ALIGN_DOWN(phys, PAGE_SIZE),
+ .mm_ops = pgt->mm_ops,
};
struct kvm_pgtable_walker walker = {
.cb = hyp_map_walker,
@@ -393,16 +398,18 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
return ret;
}
-int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
+int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
+ struct kvm_pgtable_mm_ops *mm_ops)
{
u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
- pgt->pgd = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
+ pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
if (!pgt->pgd)
return -ENOMEM;
pgt->ia_bits = va_bits;
pgt->start_level = KVM_PGTABLE_MAX_LEVELS - levels;
+ pgt->mm_ops = mm_ops;
pgt->mmu = NULL;
return 0;
}
@@ -410,7 +417,9 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
- put_page(virt_to_page(kvm_pte_follow(*ptep)));
+ struct kvm_pgtable_mm_ops *mm_ops = arg;
+
+ mm_ops->put_page((void *)kvm_pte_follow(*ptep, mm_ops));
return 0;
}
@@ -419,10 +428,11 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
struct kvm_pgtable_walker walker = {
.cb = hyp_free_walker,
.flags = KVM_PGTABLE_WALK_TABLE_POST,
+ .arg = pgt->mm_ops,
};
WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
- put_page(virt_to_page(pgt->pgd));
+ pgt->mm_ops->put_page(pgt->pgd);
pgt->pgd = NULL;
}
@@ -434,6 +444,8 @@ struct stage2_map_data {
struct kvm_s2_mmu *mmu;
struct kvm_mmu_memory_cache *memcache;
+
+ struct kvm_pgtable_mm_ops *mm_ops;
};
static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
@@ -501,12 +513,12 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
struct stage2_map_data *data)
{
+ struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
kvm_pte_t *childp, pte = *ptep;
- struct page *page = virt_to_page(ptep);
if (data->anchor) {
if (kvm_pte_valid(pte))
- put_page(page);
+ mm_ops->put_page(ptep);
return 0;
}
@@ -520,7 +532,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
if (!data->memcache)
return -ENOMEM;
- childp = kvm_mmu_memory_cache_alloc(data->memcache);
+ childp = mm_ops->zalloc_page(data->memcache);
if (!childp)
return -ENOMEM;
@@ -532,13 +544,13 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
if (kvm_pte_valid(pte)) {
kvm_set_invalid_pte(ptep);
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
- put_page(page);
+ mm_ops->put_page(ptep);
}
- kvm_set_table_pte(ptep, childp);
+ kvm_set_table_pte(ptep, childp, mm_ops);
out_get_page:
- get_page(page);
+ mm_ops->get_page(ptep);
return 0;
}
@@ -546,13 +558,14 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
{
+ struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
int ret = 0;
if (!data->anchor)
return 0;
- put_page(virt_to_page(kvm_pte_follow(*ptep)));
- put_page(virt_to_page(ptep));
+ mm_ops->put_page(kvm_pte_follow(*ptep, mm_ops));
+ mm_ops->put_page(ptep);
if (data->anchor == ptep) {
data->anchor = NULL;
@@ -607,6 +620,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
.phys = ALIGN_DOWN(phys, PAGE_SIZE),
.mmu = pgt->mmu,
.memcache = mc,
+ .mm_ops = pgt->mm_ops,
};
struct kvm_pgtable_walker walker = {
.cb = stage2_map_walker,
@@ -643,7 +657,9 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg)
{
- struct kvm_s2_mmu *mmu = arg;
+ struct kvm_pgtable *pgt = arg;
+ struct kvm_s2_mmu *mmu = pgt->mmu;
+ struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
kvm_pte_t pte = *ptep, *childp = NULL;
bool need_flush = false;
@@ -651,9 +667,9 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
return 0;
if (kvm_pte_table(pte, level)) {
- childp = kvm_pte_follow(pte);
+ childp = kvm_pte_follow(pte, mm_ops);
- if (page_count(virt_to_page(childp)) != 1)
+ if (mm_ops->page_count(childp) != 1)
return 0;
} else if (stage2_pte_cacheable(pte)) {
need_flush = true;
@@ -666,15 +682,15 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
*/
kvm_set_invalid_pte(ptep);
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
- put_page(virt_to_page(ptep));
+ mm_ops->put_page(ptep);
if (need_flush) {
- stage2_flush_dcache(kvm_pte_follow(pte),
+ stage2_flush_dcache(kvm_pte_follow(pte, mm_ops),
kvm_granule_size(level));
}
if (childp)
- put_page(virt_to_page(childp));
+ mm_ops->put_page(childp);
return 0;
}
@@ -683,7 +699,7 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
struct kvm_pgtable_walker walker = {
.cb = stage2_unmap_walker,
- .arg = pgt->mmu,
+ .arg = pgt,
.flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
};
@@ -815,12 +831,13 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg)
{
+ struct kvm_pgtable_mm_ops *mm_ops = arg;
kvm_pte_t pte = *ptep;
if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pte))
return 0;
- stage2_flush_dcache(kvm_pte_follow(pte), kvm_granule_size(level));
+ stage2_flush_dcache(kvm_pte_follow(pte, mm_ops), kvm_granule_size(level));
return 0;
}
@@ -829,6 +846,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
struct kvm_pgtable_walker walker = {
.cb = stage2_flush_walker,
.flags = KVM_PGTABLE_WALK_LEAF,
+ .arg = pgt->mm_ops,
};
if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
@@ -837,7 +855,8 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
return kvm_pgtable_walk(pgt, addr, size, &walker);
}
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+ struct kvm_pgtable_mm_ops *mm_ops)
{
size_t pgd_sz;
u64 vtcr = kvm->arch.vtcr;
@@ -846,12 +865,13 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
- pgt->pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
if (!pgt->pgd)
return -ENOMEM;
pgt->ia_bits = ia_bits;
pgt->start_level = start_level;
+ pgt->mm_ops = mm_ops;
pgt->mmu = &kvm->arch.mmu;
/* Ensure zeroed PGD pages are visible to the hardware walker */
@@ -863,15 +883,16 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg)
{
+ struct kvm_pgtable_mm_ops *mm_ops = arg;
kvm_pte_t pte = *ptep;
if (!kvm_pte_valid(pte))
return 0;
- put_page(virt_to_page(ptep));
+ mm_ops->put_page(ptep);
if (kvm_pte_table(pte, level))
- put_page(virt_to_page(kvm_pte_follow(pte)));
+ mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
return 0;
}
@@ -883,10 +904,11 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
.cb = stage2_free_walker,
.flags = KVM_PGTABLE_WALK_LEAF |
KVM_PGTABLE_WALK_TABLE_POST,
+ .arg = pgt->mm_ops,
};
WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
- free_pages_exact(pgt->pgd, pgd_sz);
+ pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
pgt->pgd = NULL;
}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1f41173e6149..278e163beda4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -88,6 +88,48 @@ static bool kvm_is_device_pfn(unsigned long pfn)
return !pfn_valid(pfn);
}
+static void *stage2_memcache_alloc_page(void *arg)
+{
+ struct kvm_mmu_memory_cache *mc = arg;
+ kvm_pte_t *ptep = NULL;
+
+ /* Allocated with __GFP_ZERO, so no need to zero it here */
+ if (mc && mc->nobjs)
+ ptep = mc->objects[--mc->nobjs];
+
+ return ptep;
+}
+
+static void *kvm_host_zalloc_pages_exact(size_t size)
+{
+ return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+}
+
+static void kvm_host_get_page(void *addr)
+{
+ get_page(virt_to_page(addr));
+}
+
+static void kvm_host_put_page(void *addr)
+{
+ put_page(virt_to_page(addr));
+}
+
+static int kvm_host_page_count(void *addr)
+{
+ return page_count(virt_to_page(addr));
+}
+
+static phys_addr_t kvm_host_pa(void *addr)
+{
+ return __pa(addr);
+}
+
+static void *kvm_host_va(phys_addr_t phys)
+{
+ return __va(phys);
+}
+
/*
* Unmapping vs dcache management:
*
@@ -351,6 +393,17 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
return 0;
}
+static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
+ .zalloc_page = stage2_memcache_alloc_page,
+ .zalloc_pages_exact = kvm_host_zalloc_pages_exact,
+ .free_pages_exact = free_pages_exact,
+ .get_page = kvm_host_get_page,
+ .put_page = kvm_host_put_page,
+ .page_count = kvm_host_page_count,
+ .phys_to_virt = kvm_host_va,
+ .virt_to_phys = kvm_host_pa,
+};
+
/**
 * kvm_init_stage2_mmu - Initialise a S2 MMU structure
* @kvm: The pointer to the KVM structure
@@ -374,7 +427,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
if (!pgt)
return -ENOMEM;
- err = kvm_pgtable_stage2_init(pgt, kvm);
+ err = kvm_pgtable_stage2_init(pgt, kvm, &kvm_s2_mm_ops);
if (err)
goto out_free_pgtable;
@@ -1198,6 +1251,19 @@ static int kvm_map_idmap_text(void)
return err;
}
+static void *kvm_hyp_zalloc_page(void *arg)
+{
+ return (void *)get_zeroed_page(GFP_KERNEL);
+}
+
+static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
+ .zalloc_page = kvm_hyp_zalloc_page,
+ .get_page = kvm_host_get_page,
+ .put_page = kvm_host_put_page,
+ .phys_to_virt = kvm_host_va,
+ .virt_to_phys = kvm_host_pa,
+};
+
int kvm_mmu_init(void)
{
int err;
@@ -1241,7 +1307,7 @@ int kvm_mmu_init(void)
goto out;
}
- err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits);
+ err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits, &kvm_hyp_mm_ops);
if (err)
goto out_free_pgtable;
--
2.29.2.299.gdc1121823c-goog
On Tue, Nov 17, 2020 at 12:16 PM Quentin Perret <[email protected]> wrote:
>
> Introduce early_init_dt_add_memory_hyp() to allow KVM to keep a copy
> of the memory regions parsed from DT. This will be needed in the context
> of the protected nVHE feature of KVM/arm64 where the code running at EL2
> will be cleanly separated from the host kernel during boot, and will
> need its own representation of memory.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> drivers/of/fdt.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 4602e467ca8b..af2b5a09c5b4 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1099,6 +1099,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
> #define MAX_MEMBLOCK_ADDR ((phys_addr_t)~0)
> #endif
>
> +void __init __weak early_init_dt_add_memory_hyp(u64 base, u64 size)
> +{
> +}
> +
> void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
> {
> const u64 phys_offset = MIN_MEMBLOCK_ADDR;
> @@ -1139,6 +1143,7 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
> base = phys_offset;
> }
> memblock_add(base, size);
> + early_init_dt_add_memory_hyp(base, size);
Can this be done right after we add all the memblocks using the
memblock API? I thought EFI would also need to be handled, but looks
like it just calls early_init_dt_add_memory_arch(). That's odd
especially for ACPI systems...
I don't really like putting what looks like an arm64 only hook here,
but then I don't want an arm64 version of
early_init_dt_add_memory_arch() either. We're almost to the point of
getting rid of the arch specific ones. But I don't have a better
suggestion currently.
Rob
Hi Rob,
On Tuesday 17 Nov 2020 at 13:44:53 (-0600), Rob Herring wrote:
> Can this be done right after we add all the memblocks using the
> memblock API?
Possibly, but the thing I'm a bit worried about is the way 'no-map'
regions are removed from memblocks early on.
The EL2 object needs to know about these parts of memory too (and in
fact we may be able to enforce the 'no-map' attribute at the host stage
2 level as well). It's also possible we'll need to have portions of the
guests payload preloaded (and verified) by the bootloader into reserved
memory regions, possibly no-map, to make sure the host does not mess
with them in a normal use-case. So, I couldn't find a much better place
than this one but suggestions are very much welcome.
I'll have a go at the memblock stuff to see if I find a way to make it
work from that angle.
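For reference, the rough shape of what I had in mind (hypothetical
helper name, untested sketch):

	/* Called from arm64_memblock_init(), once all memblock_add()s are done */
	static void __init kvm_hyp_record_memblocks(void)
	{
		phys_addr_t start, end;
		u64 i;

		/* Walks memblock.memory, so 'no-map' carve-outs are already gone */
		for_each_mem_range(i, &start, &end)
			hyp_record_memblock(start, end); /* hypothetical helper */
	}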
> I thought EFI would also need to be handled, but looks
> like it just calls early_init_dt_add_memory_arch(). That's odd
> especially for ACPI systems...
>
> I don't really like putting what looks like an arm64 only hook here,
> but then I don't want an arm64 version of
> early_init_dt_add_memory_arch() either. We're almost to the point of
> getting rid of the arch specific ones. But I don't have a better
> suggestion currently.
Ack, the ugly truth is that this is likely to remain arm64-specific. I
figured this was simple enough that we might want to consider it,
though.
Thanks,
Quentin
On Wednesday 18 Nov 2020 at 09:25:47 (+0000), Quentin Perret wrote:
> I'll have a go at the memblock stuff to see if I find a way to make it
> work from that angle.
OK, no luck with the memblock API, but I figured that I can actually
postpone the KVM memory reservation to a later point, after
unflatten_device_tree(), which lets me iterate over the memory nodes
directly rather than having the fdt driver do it for me.
The below seems to boot alright (though I'm not too familiar with
of_address_to_resource() so I may not be using it right) and keeps the
whole thing in arch/arm64. Thoughts?
Thanks,
Quentin
---8<---
diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
index 7da8e2915c1c..cab5ad587a3a 100644
--- a/arch/arm64/kvm/hyp/reserved_mem.c
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -6,6 +6,7 @@
#include <linux/kvm_host.h>
#include <linux/memblock.h>
+#include <linux/of_address.h>
#include <linux/sort.h>
#include <asm/kvm_host.h>
@@ -16,7 +17,7 @@
phys_addr_t hyp_mem_base;
phys_addr_t hyp_mem_size;
-void __init early_init_dt_add_memory_hyp(u64 base, u64 size)
+static int __init add_hyp_memblock_region(struct resource *rsrc)
{
struct hyp_memblock_region *reg;
@@ -24,12 +25,14 @@ void __init early_init_dt_add_memory_hyp(u64 base, u64 size)
kvm_nvhe_sym(hyp_memblock_nr) = -1;
if (kvm_nvhe_sym(hyp_memblock_nr) < 0)
- return;
+ return -ENOMEM;
reg = kvm_nvhe_sym(hyp_memory);
- reg[kvm_nvhe_sym(hyp_memblock_nr)].start = base;
- reg[kvm_nvhe_sym(hyp_memblock_nr)].end = base + size;
+ reg[kvm_nvhe_sym(hyp_memblock_nr)].start = rsrc->start;
+ reg[kvm_nvhe_sym(hyp_memblock_nr)].end = rsrc->end;
kvm_nvhe_sym(hyp_memblock_nr)++;
+
+ return 0;
}
static int cmp_hyp_memblock(const void *p1, const void *p2)
@@ -52,7 +55,10 @@ void kvm_sort_memblock_regions(void)
extern bool enable_protected_kvm;
void __init reserve_kvm_hyp(void)
{
+ struct device_node *np;
+ struct resource rsrc;
u64 nr_pages, prev;
+ int i;
if (!enable_protected_kvm)
return;
@@ -60,8 +66,14 @@ void __init reserve_kvm_hyp(void)
if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
return;
- if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
- return;
+ for_each_node_by_type(np, "memory") {
+ for (i = 0; !of_address_to_resource(np, i, &rsrc); i++) {
+ if (!add_hyp_memblock_region(&rsrc))
+ continue;
+ kvm_err("Failed to add hyp memblock\n");
+ return;
+ }
+ }
hyp_mem_size += num_possible_cpus() << PAGE_SHIFT;
hyp_mem_size += hyp_s1_pgtable_size();
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f81da019b677..114f788a4da4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -391,7 +391,6 @@ void __init arm64_memblock_init(void)
reserve_elfcorehdr();
- reserve_kvm_hyp();
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
@@ -423,6 +422,8 @@ void __init bootmem_init(void)
dma_pernuma_cma_reserve();
+ reserve_kvm_hyp();
+
/*
* sparse_init() tries to allocate memory from memblock, so must be
* done after the fixed reservations
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index af2b5a09c5b4..4602e467ca8b 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1099,10 +1099,6 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
#define MAX_MEMBLOCK_ADDR ((phys_addr_t)~0)
#endif
-void __init __weak early_init_dt_add_memory_hyp(u64 base, u64 size)
-{
-}
-
void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
{
const u64 phys_offset = MIN_MEMBLOCK_ADDR;
@@ -1143,7 +1139,6 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
base = phys_offset;
}
memblock_add(base, size);
- early_init_dt_add_memory_hyp(base, size);
}
int __init __weak early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size)
Hi Quentin,
On Tue, Nov 17, 2020 at 6:16 PM 'Quentin Perret' via kernel-team
<[email protected]> wrote:
>
> Introduce infrastructure in KVM to copy CPU feature registers into
> EL2-owned data structures, so that sanitised values can be read
> directly at EL2 in nVHE.
>
> Given that only a subset of these features are being read by the
> hypervisor, the ones that need to be copied are to be listed under
> <asm/kvm_cpufeature.h> together with the name of the nVHE variable that
> will hold the copy.
>
> While at it, introduce the first user of this infrastructure by
> implementing __flush_dcache_area at EL2, which needs
> arm64_ftr_reg_ctrel0.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/include/asm/cpufeature.h | 1 +
> arch/arm64/include/asm/kvm_cpufeature.h | 17 ++++++++++++++
> arch/arm64/kernel/cpufeature.c | 12 ++++++++++
> arch/arm64/kernel/image-vars.h | 2 ++
> arch/arm64/kvm/arm.c | 31 +++++++++++++++++++++++++
> arch/arm64/kvm/hyp/nvhe/Makefile | 3 ++-
> arch/arm64/kvm/hyp/nvhe/cache.S | 13 +++++++++++
> arch/arm64/kvm/hyp/nvhe/cpufeature.c | 8 +++++++
> 8 files changed, 86 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
> create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
> create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index da250e4741bd..3dfbd76fb647 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -600,6 +600,7 @@ void __init setup_cpu_features(void);
> void check_local_cpu_capabilities(void);
>
> u64 read_sanitised_ftr_reg(u32 id);
> +int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst);
>
> static inline bool cpu_supports_mixed_endian_el0(void)
> {
> diff --git a/arch/arm64/include/asm/kvm_cpufeature.h b/arch/arm64/include/asm/kvm_cpufeature.h
> new file mode 100644
> index 000000000000..d34f85cba358
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_cpufeature.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2020 - Google LLC
> + * Author: Quentin Perret <[email protected]>
> + */
Missing include guard.
> +
> +#include <asm/cpufeature.h>
> +
> +#ifndef KVM_HYP_CPU_FTR_REG
> +#if defined(__KVM_NVHE_HYPERVISOR__)
> +#define KVM_HYP_CPU_FTR_REG(id, name) extern struct arm64_ftr_reg name;
> +#else
> +#define KVM_HYP_CPU_FTR_REG(id, name) DECLARE_KVM_NVHE_SYM(name);
> +#endif
> +#endif
> +
> +KVM_HYP_CPU_FTR_REG(SYS_CTR_EL0, arm64_ftr_reg_ctrel0)
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index dd5bc0f0cf0d..3bc86d1423f8 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1116,6 +1116,18 @@ u64 read_sanitised_ftr_reg(u32 id)
> }
> EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg);
>
> +int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst)
> +{
> + struct arm64_ftr_reg *regp = get_arm64_ftr_reg(id);
> +
> + if (!regp)
> + return -EINVAL;
> +
> + memcpy(dst, regp, sizeof(*regp));
> +
> + return 0;
> +}
> +
> #define read_sysreg_case(r) \
> case r: return read_sysreg_s(r)
>
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index dd8ccc9efb6a..c35d768672eb 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -116,6 +116,8 @@ __kvm_nvhe___memcpy = __kvm_nvhe___pi_memcpy;
> __kvm_nvhe___memset = __kvm_nvhe___pi_memset;
> #endif
>
> +__kvm_nvhe___flush_dcache_area = __kvm_nvhe___pi___flush_dcache_area;
> +
> #endif /* CONFIG_KVM */
>
> #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 391cf6753a13..c7f8fca97202 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -34,6 +34,7 @@
> #include <asm/virt.h>
> #include <asm/kvm_arm.h>
> #include <asm/kvm_asm.h>
> +#include <asm/kvm_cpufeature.h>
> #include <asm/kvm_mmu.h>
> #include <asm/kvm_emulate.h>
> #include <asm/sections.h>
> @@ -1636,6 +1637,29 @@ static void teardown_hyp_mode(void)
> }
> }
>
> +#undef KVM_HYP_CPU_FTR_REG
> +#define KVM_HYP_CPU_FTR_REG(id, name) \
> + { .sys_id = id, .dst = (struct arm64_ftr_reg *)&kvm_nvhe_sym(name) },
> +static const struct __ftr_reg_copy_entry {
> + u32 sys_id;
> + struct arm64_ftr_reg *dst;
> +} hyp_ftr_regs[] = {
> + #include <asm/kvm_cpufeature.h>
> +};
> +
> +static int copy_cpu_ftr_regs(void)
> +{
> + int i, ret;
> +
> + for (i = 0; i < ARRAY_SIZE(hyp_ftr_regs); i++) {
> + ret = copy_ftr_reg(hyp_ftr_regs[i].sys_id, hyp_ftr_regs[i].dst);
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> /**
> * Inits Hyp-mode on all online CPUs
> */
> @@ -1644,6 +1668,13 @@ static int init_hyp_mode(void)
> int cpu;
> int err = 0;
>
> + /*
> + * Copy the required CPU feature registers into their EL2 counterparts
> + */
> + err = copy_cpu_ftr_regs();
> + if (err)
> + return err;
> +
> /*
> * Allocate Hyp PGD and setup Hyp identity mapping
> */
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 9e5eacfec6ec..72cfe53f106f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -10,7 +10,8 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
> lib-objs := $(addprefix ../../../lib/, $(lib-objs))
>
> obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
> - hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o
> + hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
> + cache.o cpufeature.o
> obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
> ../fpsimd.o ../hyp-entry.o ../exception.o
> obj-y += $(lib-objs)
> diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
> new file mode 100644
> index 000000000000..36cef6915428
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/nvhe/cache.S
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Code copied from arch/arm64/mm/cache.S.
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/assembler.h>
> +#include <asm/alternative.h>
> +
> +SYM_FUNC_START_PI(__flush_dcache_area)
> + dcache_by_line_op civac, sy, x0, x1, x2, x3
> + ret
> +SYM_FUNC_END_PI(__flush_dcache_area)
> diff --git a/arch/arm64/kvm/hyp/nvhe/cpufeature.c b/arch/arm64/kvm/hyp/nvhe/cpufeature.c
> new file mode 100644
> index 000000000000..a887508f996f
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/nvhe/cpufeature.c
> @@ -0,0 +1,8 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 - Google LLC
> + * Author: Quentin Perret <[email protected]>
> + */
> +
> +#define KVM_HYP_CPU_FTR_REG(id, name) struct arm64_ftr_reg name;
> +#include <asm/kvm_cpufeature.h>
> --
> 2.29.2.299.gdc1121823c-goog
/fuad
On Tue, Nov 17, 2020 at 06:15:48PM +0000, 'Quentin Perret' via kernel-team wrote:
> kvm_call_hyp() has some logic to issue a function call or a hypercall
> depending on the EL at which the kernel is running. However, all the code
> compiled under __KVM_NVHE_HYPERVISOR__ is guaranteed to run only at EL2,
> and in this case a simple function call is needed.
>
> Add ifdefery to kvm_host.h to simplify kvm_call_hyp() in .hyp.text.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index ac11adab6602..7a5d5f4b3351 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -557,6 +557,7 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
> void kvm_arm_halt_guest(struct kvm *kvm);
> void kvm_arm_resume_guest(struct kvm *kvm);
>
> +#ifndef __KVM_NVHE_HYPERVISOR__
> #define kvm_call_hyp_nvhe(f, ...) \
> ({ \
> struct arm_smccc_res res; \
> @@ -596,6 +597,11 @@ void kvm_arm_resume_guest(struct kvm *kvm);
> \
> ret; \
> })
> +#else /* __KVM_NVHE_HYPERVISOR__ */
> +#define kvm_call_hyp(f, ...) f(__VA_ARGS__)
> +#define kvm_call_hyp_ret(f, ...) f(__VA_ARGS__)
> +#define kvm_call_hyp_nvhe(f, ...) f(__VA_ARGS__)
> +#endif /* __KVM_NVHE_HYPERVISOR__ */
I was hoping we could define this as the following instead. That would require
adding host-side declarations of all functions currently called via kvm_call_hyp_nvhe().
#define kvm_call_hyp_nvhe(f, ...) \
+ is_nvhe_hyp_code() ? f(__VA_ARGS__) : \
({ \
struct arm_smccc_res res; \
\
arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(f), \
##__VA_ARGS__, &res); \
WARN_ON(res.a0 != SMCCC_RET_SUCCESS); \
\
res.a1; \
})
Up to you what you think is cleaner, just my 2 cents...
On Tue, Nov 17, 2020 at 06:15:49PM +0000, 'Quentin Perret' via kernel-team wrote:
> In order to allow code shared by the host and the hyp to be used in
> static inline library functions, allow the use of kvm_nvhe_sym() at EL2
> by defaulting to the raw symbol name.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/include/asm/hyp_image.h | 4 ++++
> arch/arm64/include/asm/kvm_asm.h | 4 ++--
> arch/arm64/kvm/arm.c | 2 +-
> 3 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/hyp_image.h b/arch/arm64/include/asm/hyp_image.h
> index daa1a1da539e..8b807b646b8f 100644
> --- a/arch/arm64/include/asm/hyp_image.h
> +++ b/arch/arm64/include/asm/hyp_image.h
> @@ -7,11 +7,15 @@
> #ifndef __ARM64_HYP_IMAGE_H__
> #define __ARM64_HYP_IMAGE_H__
>
> +#ifndef __KVM_NVHE_HYPERVISOR__
> /*
> * KVM nVHE code has its own symbol namespace prefixed with __kvm_nvhe_,
> * to separate it from the kernel proper.
> */
> #define kvm_nvhe_sym(sym) __kvm_nvhe_##sym
> +#else
> +#define kvm_nvhe_sym(sym) sym
> +#endif
>
> #ifdef LINKER_SCRIPT
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 1a86581e581e..e4934f5e4234 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -173,11 +173,11 @@ struct kvm_s2_mmu;
> DECLARE_KVM_NVHE_SYM(__kvm_hyp_init);
> DECLARE_KVM_NVHE_SYM(__kvm_hyp_host_vector);
> DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
> -DECLARE_KVM_NVHE_SYM(__kvm_hyp_psci_cpu_entry);
> #define __kvm_hyp_init CHOOSE_NVHE_SYM(__kvm_hyp_init)
> #define __kvm_hyp_host_vector CHOOSE_NVHE_SYM(__kvm_hyp_host_vector)
> #define __kvm_hyp_vector CHOOSE_HYP_SYM(__kvm_hyp_vector)
> -#define __kvm_hyp_psci_cpu_entry CHOOSE_NVHE_SYM(__kvm_hyp_psci_cpu_entry)
> +
> +void kvm_nvhe_sym(__kvm_hyp_psci_cpu_entry)(void);
>
> extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
> DECLARE_KVM_NVHE_SYM(__per_cpu_start);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 882eb383bd75..391cf6753a13 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1369,7 +1369,7 @@ static void cpu_prepare_hyp_mode(int cpu)
>
> params->vector_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_host_vector));
> params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
> - params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_psci_cpu_entry));
> + params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref_nvhe(__kvm_hyp_psci_cpu_entry));
Why is this change needed?
> params->pgd_pa = kvm_mmu_get_httbr();
>
> /*
> --
> 2.29.2.299.gdc1121823c-goog
On Tue, Nov 17, 2020 at 06:15:42PM +0000, 'Quentin Perret' via kernel-team wrote:
> From: Will Deacon <[email protected]>
>
> Pull clear_page(), copy_page(), memcpy() and memset() into the nVHE hyp
> code and ensure that we always execute the '__pi_' entry point on the
> offchance that it changes in future.
>
> [ qperret: Commit title nits ]
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kernel/image-vars.h | 11 +++++++++++
> arch/arm64/kvm/hyp/nvhe/Makefile | 4 ++++
> 2 files changed, 15 insertions(+)
>
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index 8539f34d7538..dd8ccc9efb6a 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -105,6 +105,17 @@ KVM_NVHE_ALIAS(__stop___kvm_ex_table);
> /* Array containing bases of nVHE per-CPU memory regions. */
> KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
>
> +/* Position-independent library routines */
> +__kvm_nvhe_clear_page = __kvm_nvhe___pi_clear_page;
> +__kvm_nvhe_copy_page = __kvm_nvhe___pi_copy_page;
> +__kvm_nvhe_memcpy = __kvm_nvhe___pi_memcpy;
> +__kvm_nvhe_memset = __kvm_nvhe___pi_memset;
> +
> +#ifdef CONFIG_KASAN
> +__kvm_nvhe___memcpy = __kvm_nvhe___pi_memcpy;
> +__kvm_nvhe___memset = __kvm_nvhe___pi_memset;
> +#endif
> +
> #endif /* CONFIG_KVM */
Nit: Would be good to use the kvm_nvhe_sym() helper for the namespacing.
And feel free to define something like KVM_NVHE_ALIAS for PI in hyp-image.h.
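Something like this, perhaps (untested sketch):

	/* hyp_image.h: alias one hyp symbol to another in the linker script */
	#define KVM_NVHE_ALIAS_HYP(first, sec) kvm_nvhe_sym(first) = kvm_nvhe_sym(sec);

	/* image-vars.h would then read: */
	KVM_NVHE_ALIAS_HYP(clear_page, __pi_clear_page);
	KVM_NVHE_ALIAS_HYP(copy_page, __pi_copy_page);
	KVM_NVHE_ALIAS_HYP(memcpy, __pi_memcpy);
	KVM_NVHE_ALIAS_HYP(memset, __pi_memset);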
>
> #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 1f1e351c5fe2..590fdefb42dd 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -6,10 +6,14 @@
> asflags-y := -D__KVM_NVHE_HYPERVISOR__
> ccflags-y := -D__KVM_NVHE_HYPERVISOR__
>
> +lib-objs := clear_page.o copy_page.o memcpy.o memset.o
> +lib-objs := $(addprefix ../../../lib/, $(lib-objs))
> +
> obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
> hyp-main.o hyp-smp.o psci-relay.o
> obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
> ../fpsimd.o ../hyp-entry.o ../exception.o
> +obj-y += $(lib-objs)
>
> ##
> ## Build rules for compiling nVHE hyp code
> --
> 2.29.2.299.gdc1121823c-goog
On Monday 23 Nov 2020 at 10:55:20 (+0000), Fuad Tabba wrote:
> > diff --git a/arch/arm64/include/asm/kvm_cpufeature.h b/arch/arm64/include/asm/kvm_cpufeature.h
> > new file mode 100644
> > index 000000000000..d34f85cba358
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/kvm_cpufeature.h
> > @@ -0,0 +1,17 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2020 - Google LLC
> > + * Author: Quentin Perret <[email protected]>
> > + */
>
> Missing include guard.
Right, but on purpose :)
See how arm.c includes this header twice with different definitions of
KVM_HYP_CPU_FTR_REG for instance.
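Boiled down, the pattern is an X-macro list: the header has no guard so
that each consumer can expand it differently (consumers simplified):

	/* kvm_cpufeature.h -- no include guard, on purpose */
	#ifndef KVM_HYP_CPU_FTR_REG
	#define KVM_HYP_CPU_FTR_REG(id, name) extern struct arm64_ftr_reg name;
	#endif
	KVM_HYP_CPU_FTR_REG(SYS_CTR_EL0, arm64_ftr_reg_ctrel0)

	/* arm.c -- re-included with a different expansion to build a table */
	#undef KVM_HYP_CPU_FTR_REG
	#define KVM_HYP_CPU_FTR_REG(id, name) \
		{ .sys_id = id, .dst = (struct arm64_ftr_reg *)&kvm_nvhe_sym(name) },
	static const struct __ftr_reg_copy_entry {
		u32 sys_id;
		struct arm64_ftr_reg *dst;
	} hyp_ftr_regs[] = {
		#include <asm/kvm_cpufeature.h>
	};

An include guard would turn the second expansion into a no-op.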
Thanks,
Quentin
On Monday 23 Nov 2020 at 12:34:25 (+0000), David Brazdil wrote:
> On Tue, Nov 17, 2020 at 06:15:42PM +0000, 'Quentin Perret' via kernel-team wrote:
> > diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> > index 8539f34d7538..dd8ccc9efb6a 100644
> > --- a/arch/arm64/kernel/image-vars.h
> > +++ b/arch/arm64/kernel/image-vars.h
> > @@ -105,6 +105,17 @@ KVM_NVHE_ALIAS(__stop___kvm_ex_table);
> > /* Array containing bases of nVHE per-CPU memory regions. */
> > KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
> >
> > +/* Position-independent library routines */
> > +__kvm_nvhe_clear_page = __kvm_nvhe___pi_clear_page;
> > +__kvm_nvhe_copy_page = __kvm_nvhe___pi_copy_page;
> > +__kvm_nvhe_memcpy = __kvm_nvhe___pi_memcpy;
> > +__kvm_nvhe_memset = __kvm_nvhe___pi_memset;
> > +
> > +#ifdef CONFIG_KASAN
> > +__kvm_nvhe___memcpy = __kvm_nvhe___pi_memcpy;
> > +__kvm_nvhe___memset = __kvm_nvhe___pi_memset;
> > +#endif
> > +
> > #endif /* CONFIG_KVM */
>
> Nit: Would be good to use the kvm_nvhe_sym() helper for the namespacing.
> And feel free to define something like KVM_NVHE_ALIAS for PI in hyp-image.h.
Ack, that'd be much nicer, I'll fix it up for v2.
Thanks,
Quentin
On Monday 23 Nov 2020 at 13:22:23 (+0000), David Brazdil wrote:
> Could you help me understand why we need this?
> * Why do we need PI routines in the first place? Would my series that fixes
> relocations in hyp code remove the need?
> * You added these aliases for the string routines because you were worried
> somebody would change the implementation in arch/arm64/lib, right? But this
> cache flush function is defined in hyp/nvhe. So why do we need to point to
> the PI alias if we control the implementation?
Right, in the specific case of the __flush_dcache_area() function none
of the PI stuff is really needed I think. I did it this way to keep
things as consistent as possible with the host-side implementation, but
that is not required.
I understand this can cause confusion, so yes, I'll simplify this for
v2.
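Concretely, the hyp copy can just define the symbol directly, at which
point the image-vars.h alias disappears (untested sketch):

	/* arch/arm64/kvm/hyp/nvhe/cache.S */
	SYM_FUNC_START(__flush_dcache_area)
		dcache_by_line_op civac, sy, x0, x1, x2, x3
		ret
	SYM_FUNC_END(__flush_dcache_area)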
Cheers,
Quentin
> +int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst)
> +{
> + struct arm64_ftr_reg *regp = get_arm64_ftr_reg(id);
> +
> + if (!regp)
> + return -EINVAL;
> +
> + memcpy(dst, regp, sizeof(*regp));
> +
> + return 0;
> +}
> +
> #define read_sysreg_case(r) \
> case r: return read_sysreg_s(r)
>
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index dd8ccc9efb6a..c35d768672eb 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -116,6 +116,8 @@ __kvm_nvhe___memcpy = __kvm_nvhe___pi_memcpy;
> __kvm_nvhe___memset = __kvm_nvhe___pi_memset;
> #endif
>
> +__kvm_nvhe___flush_dcache_area = __kvm_nvhe___pi___flush_dcache_area;
> +
Could you help me understand why we need this?
* Why do we need PI routines in the first place? Would my series that fixes
relocations in hyp code remove the need?
* You added these aliases for the string routines because you were worried
somebody would change the implementation in arch/arm64/lib, right? But this
cache flush function is defined in hyp/nvhe. So why do we need to point to
the PI alias if we control the implementation?
On Mon, Nov 23, 2020 at 02:02:50PM +0000, 'Quentin Perret' via kernel-team wrote:
> On Monday 23 Nov 2020 at 12:57:23 (+0000), David Brazdil wrote:
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 882eb383bd75..391cf6753a13 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -1369,7 +1369,7 @@ static void cpu_prepare_hyp_mode(int cpu)
> > >
> > > params->vector_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_host_vector));
> > > params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
> > > - params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_psci_cpu_entry));
> > > + params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref_nvhe(__kvm_hyp_psci_cpu_entry));
> >
> > Why is this change needed?
>
> You mean this line specifically or the whole __kvm_hyp_psci_cpu_entry
> thing?
>
> For the latter, it is to avoid having the compiler complain about
> __kvm_hyp_psci_cpu_entry being re-defined as a different symbol. If
> there is a better way to solve this problem I'm happy to change it -- I
> must admit I got a little confused with the namespacing along the way.
Yeah, we do need a more robust approach. It's getting out of control.
>
> Thanks,
> Quentin
On Monday 23 Nov 2020 at 12:57:23 (+0000), David Brazdil wrote:
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 882eb383bd75..391cf6753a13 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -1369,7 +1369,7 @@ static void cpu_prepare_hyp_mode(int cpu)
> >
> > params->vector_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_host_vector));
> > params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
> > - params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref(__kvm_hyp_psci_cpu_entry));
> > + params->entry_hyp_va = kern_hyp_va((unsigned long)kvm_ksym_ref_nvhe(__kvm_hyp_psci_cpu_entry));
>
> Why is this change needed?
You mean this line specifically or the whole __kvm_hyp_psci_cpu_entry
thing?
For the latter, it is to avoid having the compiler complain about
__kvm_hyp_psci_cpu_entry being re-defined as a different symbol. If
there is a better way to solve this problem I'm happy to change it -- I
must admit I got a little confused with the namespacing along the way.
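To make the clash concrete (simplified; DECLARE_KVM_NVHE_SYM() declares
the symbol as 'extern char []'):

	/* Host objects: kvm_nvhe_sym(sym) expands to __kvm_nvhe_##sym */
	DECLARE_KVM_NVHE_SYM(__kvm_hyp_psci_cpu_entry);
	/* -> extern char __kvm_nvhe___kvm_hyp_psci_cpu_entry[]; -- fine */

	/* Hyp objects: kvm_nvhe_sym(sym) now expands to plain sym */
	DECLARE_KVM_NVHE_SYM(__kvm_hyp_psci_cpu_entry);
	/*
	 * -> extern char __kvm_hyp_psci_cpu_entry[];
	 * which conflicts with the actual function definition at EL2:
	 * void __kvm_hyp_psci_cpu_entry(void);
	 */

Declaring it with a plain function prototype keeps a single, consistent
type for the symbol on both sides.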
Thanks,
Quentin
Hi Quentin,
On Tue, Nov 17, 2020 at 6:17 PM 'Quentin Perret' via kernel-team
<[email protected]> wrote:
>
> When memory protection is enabled, the Hyp code needs the ability to
> create and manage its own page-table. To do so, introduce a new set of
> hypercalls to initialize Hyp memory protection.
>
> During the init hcall, the hypervisor runs with the host-provided
> page-table and uses the trivial early page allocator to create its own
> set of page-tables, using a memory pool that was donated by the host.
> Specifically, the hypervisor creates its own mappings for __hyp_text,
> the Hyp memory pool, the __hyp_bss, the portion of hyp_vmemmap
> corresponding to the Hyp pool, among other things. It then jumps back
> into the idmap page, switches to use the newly-created pgd (instead of
> the temporary one provided by the host) and then installs the
> full-fledged buddy allocator, which will be the only one in use from
> then on.
>
> Note that for the sake of simplifying the review, this only introduces
> the code doing this operation, without actually being called by anything
> yet. This will be done in a subsequent patch, which will introduce the
> necessary host kernel changes.
>
> Credits to Will for __kvm_init_switch_pgd.
>
> Co-authored-by: Will Deacon <[email protected]>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/include/asm/kvm_asm.h | 6 +-
> arch/arm64/include/asm/kvm_host.h | 8 +
> arch/arm64/include/asm/kvm_hyp.h | 8 +
> arch/arm64/kernel/cpufeature.c | 2 +-
> arch/arm64/kernel/image-vars.h | 19 +++
> arch/arm64/kvm/hyp/Makefile | 2 +-
> arch/arm64/kvm/hyp/include/nvhe/memory.h | 6 +
> arch/arm64/kvm/hyp/include/nvhe/mm.h | 79 +++++++++
> arch/arm64/kvm/hyp/nvhe/Makefile | 4 +-
> arch/arm64/kvm/hyp/nvhe/hyp-init.S | 30 ++++
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 44 +++++
> arch/arm64/kvm/hyp/nvhe/mm.c | 175 ++++++++++++++++++++
> arch/arm64/kvm/hyp/nvhe/psci-relay.c | 2 -
> arch/arm64/kvm/hyp/nvhe/setup.c | 196 +++++++++++++++++++++++
> arch/arm64/kvm/hyp/reserved_mem.c | 75 +++++++++
> arch/arm64/kvm/mmu.c | 2 +-
> arch/arm64/mm/init.c | 3 +
> 17 files changed, 653 insertions(+), 8 deletions(-)
> create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mm.h
> create mode 100644 arch/arm64/kvm/hyp/nvhe/mm.c
> create mode 100644 arch/arm64/kvm/hyp/nvhe/setup.c
> create mode 100644 arch/arm64/kvm/hyp/reserved_mem.c
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index e4934f5e4234..9266b17f8ba9 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -57,6 +57,10 @@
> #define __KVM_HOST_SMCCC_FUNC___kvm_get_mdcr_el2 12
> #define __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs 13
> #define __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs 14
> +#define __KVM_HOST_SMCCC_FUNC___kvm_hyp_protect 15
> +#define __KVM_HOST_SMCCC_FUNC___hyp_create_mappings 16
> +#define __KVM_HOST_SMCCC_FUNC___hyp_create_private_mapping 17
> +#define __KVM_HOST_SMCCC_FUNC___hyp_cpu_set_vector 18
>
> #ifndef __ASSEMBLY__
>
> @@ -171,7 +175,7 @@ struct kvm_vcpu;
> struct kvm_s2_mmu;
>
> DECLARE_KVM_NVHE_SYM(__kvm_hyp_init);
> -DECLARE_KVM_NVHE_SYM(__kvm_hyp_host_vector);
> +DECLARE_KVM_HYP_SYM(__kvm_hyp_host_vector);
> DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
> #define __kvm_hyp_init CHOOSE_NVHE_SYM(__kvm_hyp_init)
> #define __kvm_hyp_host_vector CHOOSE_NVHE_SYM(__kvm_hyp_host_vector)
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7a5d5f4b3351..ee8bb8021637 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -742,4 +742,12 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
> #define kvm_vcpu_has_pmu(vcpu) \
> (test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
>
> +#ifdef CONFIG_KVM
> +extern phys_addr_t hyp_mem_base;
> +extern phys_addr_t hyp_mem_size;
> +void __init reserve_kvm_hyp(void);
> +#else
> +static inline void reserve_kvm_hyp(void) { }
> +#endif
> +
> #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 95a2bbbcc7e1..dbd2ef86afa9 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -105,5 +105,13 @@ void __noreturn hyp_panic(void);
> void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
> #endif
>
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +void __kvm_init_switch_pgd(phys_addr_t phys, unsigned long size,
> + phys_addr_t pgd, void *sp, void *cont_fn);
> +int __kvm_hyp_protect(phys_addr_t phys, unsigned long size,
> + unsigned long nr_cpus, unsigned long *per_cpu_base);
> +void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
> +#endif
> +
> #endif /* __ARM64_KVM_HYP_H__ */
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 3bc86d1423f8..010458f6d799 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1722,7 +1722,7 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
> #endif /* CONFIG_ARM64_MTE */
>
> #ifdef CONFIG_KVM
> -static bool enable_protected_kvm;
> +bool enable_protected_kvm;
>
> static bool has_protected_kvm(const struct arm64_cpu_capabilities *entry, int __unused)
> {
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index c35d768672eb..f2d43e6cd86d 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -118,6 +118,25 @@ __kvm_nvhe___memset = __kvm_nvhe___pi_memset;
>
> __kvm_nvhe___flush_dcache_area = __kvm_nvhe___pi___flush_dcache_area;
>
> +/* Hypervisor VA size */
> +KVM_NVHE_ALIAS(hyp_va_bits);
> +
> +/* Kernel memory sections */
> +KVM_NVHE_ALIAS(__start_rodata);
> +KVM_NVHE_ALIAS(__end_rodata);
> +KVM_NVHE_ALIAS(__bss_start);
> +KVM_NVHE_ALIAS(__bss_stop);
> +
> +/* Hyp memory sections */
> +KVM_NVHE_ALIAS(__hyp_idmap_text_start);
> +KVM_NVHE_ALIAS(__hyp_idmap_text_end);
> +KVM_NVHE_ALIAS(__hyp_text_start);
> +KVM_NVHE_ALIAS(__hyp_text_end);
> +KVM_NVHE_ALIAS(__hyp_data_ro_after_init_start);
> +KVM_NVHE_ALIAS(__hyp_data_ro_after_init_end);
> +KVM_NVHE_ALIAS(__hyp_bss_start);
> +KVM_NVHE_ALIAS(__hyp_bss_end);
> +
> #endif /* CONFIG_KVM */
>
> #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 687598e41b21..b726332eec49 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -10,4 +10,4 @@ subdir-ccflags-y := -I$(incdir) \
> -DDISABLE_BRANCH_PROFILING \
> $(DISABLE_STACKLEAK_PLUGIN)
>
> -obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o
> +obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o reserved_mem.o
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> index ed47674bc988..c8af6fe87bfb 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> @@ -6,6 +6,12 @@
>
> #include <linux/types.h>
>
> +#define HYP_MEMBLOCK_REGIONS 128
> +struct hyp_memblock_region {
> + phys_addr_t start;
> + phys_addr_t end;
> +};
> +
> struct hyp_pool;
> struct hyp_page {
> unsigned int refcount;
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> new file mode 100644
> index 000000000000..5a3ad6f4e5bc
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -0,0 +1,79 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef __KVM_HYP_MM_H
> +#define __KVM_HYP_MM_H
> +
> +#include <asm/kvm_pgtable.h>
> +#include <asm/spectre.h>
> +#include <linux/types.h>
> +
> +#include <nvhe/memory.h>
> +#include <nvhe/spinlock.h>
> +
> +extern struct hyp_memblock_region kvm_nvhe_sym(hyp_memory)[];
> +extern int kvm_nvhe_sym(hyp_memblock_nr);
> +extern struct kvm_pgtable hyp_pgtable;
nit: I found the name of this struct (hyp_pgtable) confusing, since
there's also
arch/arm64/kvm/mmu.c:25:static struct kvm_pgtable *hyp_pgtable;
which has the same name but is a pointer to the hyp page table used
before it is swapped out in favor of this one.
> +extern hyp_spinlock_t __hyp_pgd_lock;
> +extern struct hyp_pool hpool;
> +extern u64 __io_map_base;
> +extern u32 hyp_va_bits;
> +
> +int hyp_create_idmap(void);
> +int hyp_map_vectors(void);
> +int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
> +int hyp_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> +int hyp_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> +int __hyp_create_mappings(unsigned long start, unsigned long size,
> + unsigned long phys, unsigned long prot);
> +unsigned long __hyp_create_private_mapping(phys_addr_t phys, size_t size,
> + unsigned long prot);
> +
nit: I also thought that the hyp_create_mappings function names are a
bit confusing, since there's the create_hyp_mappings functions which
use the aforementioned *hyp_pgtable.
> +static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
> + unsigned long *start, unsigned long *end)
> +{
> + unsigned long nr_pages = size >> PAGE_SHIFT;
> + struct hyp_page *p = hyp_phys_to_page(phys);
> +
> + *start = (unsigned long)p;
> + *end = *start + nr_pages * sizeof(struct hyp_page);
> + *start = ALIGN_DOWN(*start, PAGE_SIZE);
> + *end = ALIGN(*end, PAGE_SIZE);
> +}
> +
> +static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
> +{
> + unsigned long total = 0, i;
> +
> + /* Provision the worst case scenario with 4 levels of page-table */
> + for (i = 0; i < 4; i++) {
> + nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
> + total += nr_pages;
> + }
> +
> + return total;
> +}
> +
> +static inline unsigned long hyp_s1_pgtable_size(void)
> +{
> + struct hyp_memblock_region *reg;
> + unsigned long nr_pages, res = 0;
> + int i;
> +
> + if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
> + return 0;
> +
> + for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
> + reg = &kvm_nvhe_sym(hyp_memory)[i];
> + nr_pages = (reg->end - reg->start) >> PAGE_SHIFT;
> + nr_pages = __hyp_pgtable_max_pages(nr_pages);
> + res += nr_pages << PAGE_SHIFT;
> + }
> +
> + /* Allow 1 GiB for private mappings */
> + nr_pages = (1 << 30) >> PAGE_SHIFT;
> + nr_pages = __hyp_pgtable_max_pages(nr_pages);
> + res += nr_pages << PAGE_SHIFT;
> +
> + return res;
> +}
> +
> +#endif /* __KVM_HYP_MM_H */
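A worked example of the sizing above, with illustrative numbers only:
assuming a 4K granule (PTRS_PER_PTE = 512), mapping 1 GiB at page
granularity costs at worst

	262144 leaf pages -> 512 pages of PTEs
	   512 PTE pages  ->   1 page of PMDs
	     1 PMD page   ->   1 page of PUDs
	     1 PUD page   ->   1 PGD page

so __hyp_pgtable_max_pages(262144) = 515 pages (~2 MiB), which is what
hyp_s1_pgtable_size() provisions for the 1 GiB private mapping range.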
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 72cfe53f106f..d7381a503182 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -11,9 +11,9 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
>
> obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
> hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
> - cache.o cpufeature.o
> + cache.o cpufeature.o setup.o mm.o
> obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
> - ../fpsimd.o ../hyp-entry.o ../exception.o
> + ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
> obj-y += $(lib-objs)
>
> ##
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index 8f3602f320ac..e2d62297edfe 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -247,4 +247,34 @@ alternative_else_nop_endif
>
> SYM_CODE_END(__kvm_handle_stub_hvc)
>
> +SYM_FUNC_START(__kvm_init_switch_pgd)
> + /* Turn the MMU off */
> + pre_disable_mmu_workaround
> + mrs x2, sctlr_el2
> + bic x3, x2, #SCTLR_ELx_M
> + msr sctlr_el2, x3
> + isb
> +
> + tlbi alle2
> +
> + /* Install the new pgtables */
> + ldr x3, [x0, #NVHE_INIT_PGD_PA]
> + phys_to_ttbr x4, x3
> +alternative_if ARM64_HAS_CNP
> + orr x4, x4, #TTBR_CNP_BIT
> +alternative_else_nop_endif
> + msr ttbr0_el2, x4
> +
> + /* Set the new stack pointer */
> + ldr x0, [x0, #NVHE_INIT_STACK_HYP_VA]
> + mov sp, x0
> +
> + /* And turn the MMU back on! */
> + dsb nsh
> + isb
> + msr sctlr_el2, x2
> + isb
> + ret x1
> +SYM_FUNC_END(__kvm_init_switch_pgd)
> +
Should the instruction cache be flushed here (ic iallu), to discard
speculatively fetched instructions?
> .popsection
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 933329699425..a0bfe0d26da6 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -6,12 +6,15 @@
>
> #include <hyp/switch.h>
>
> +#include <asm/pgtable-types.h>
> #include <asm/kvm_asm.h>
> #include <asm/kvm_emulate.h>
> #include <asm/kvm_host.h>
> #include <asm/kvm_hyp.h>
> #include <asm/kvm_mmu.h>
>
> +#include <nvhe/mm.h>
> +
> DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
>
> #define cpu_reg(ctxt, r) (ctxt)->regs.regs[r]
> @@ -106,6 +109,43 @@ static void handle___vgic_v3_restore_aprs(struct kvm_cpu_context *host_ctxt)
> __vgic_v3_restore_aprs(kern_hyp_va(cpu_if));
> }
>
> +static void handle___kvm_hyp_protect(struct kvm_cpu_context *host_ctxt)
> +{
> + DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
> + DECLARE_REG(unsigned long, size, host_ctxt, 2);
> + DECLARE_REG(unsigned long, nr_cpus, host_ctxt, 3);
> + DECLARE_REG(unsigned long *, per_cpu_base, host_ctxt, 4);
> +
> + cpu_reg(host_ctxt, 1) = __kvm_hyp_protect(phys, size, nr_cpus,
> + per_cpu_base);
> +}
> +
> +static void handle___hyp_cpu_set_vector(struct kvm_cpu_context *host_ctxt)
> +{
> + DECLARE_REG(enum arm64_hyp_spectre_vector, slot, host_ctxt, 1);
> +
> + cpu_reg(host_ctxt, 1) = hyp_cpu_set_vector(slot);
> +}
> +
> +static void handle___hyp_create_mappings(struct kvm_cpu_context *host_ctxt)
> +{
> + DECLARE_REG(unsigned long, start, host_ctxt, 1);
> + DECLARE_REG(unsigned long, size, host_ctxt, 2);
> + DECLARE_REG(unsigned long, phys, host_ctxt, 3);
> + DECLARE_REG(unsigned long, prot, host_ctxt, 4);
> +
> + cpu_reg(host_ctxt, 1) = __hyp_create_mappings(start, size, phys, prot);
> +}
> +
> +static void handle___hyp_create_private_mapping(struct kvm_cpu_context *host_ctxt)
> +{
> + DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
> + DECLARE_REG(size_t, size, host_ctxt, 2);
> + DECLARE_REG(unsigned long, prot, host_ctxt, 3);
> +
> + cpu_reg(host_ctxt, 1) = __hyp_create_private_mapping(phys, size, prot);
> +}
> +
> typedef void (*hcall_t)(struct kvm_cpu_context *);
>
> #define HANDLE_FUNC(x) [__KVM_HOST_SMCCC_FUNC_##x] = kimg_fn_ptr(handle_##x)
> @@ -125,6 +165,10 @@ static const hcall_t *host_hcall[] = {
> HANDLE_FUNC(__kvm_get_mdcr_el2),
> HANDLE_FUNC(__vgic_v3_save_aprs),
> HANDLE_FUNC(__vgic_v3_restore_aprs),
> + HANDLE_FUNC(__kvm_hyp_protect),
> + HANDLE_FUNC(__hyp_cpu_set_vector),
> + HANDLE_FUNC(__hyp_create_mappings),
> + HANDLE_FUNC(__hyp_create_private_mapping),
> };
>
> static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
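For context, the host is expected to reach these handlers through the
existing SMCCC hypercall path. A sketch of what a caller could look
like (not part of this patch; the use of kvm_call_hyp_nvhe() here is an
assumption based on the existing hypercall wrappers):

	/* Host side, sketch only: ask EL2 to map size bytes at start */
	ret = kvm_call_hyp_nvhe(__hyp_create_mappings, start, size, phys, prot);
	if (ret)
		kvm_err("Failed to create EL2 mapping\n");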
> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> new file mode 100644
> index 000000000000..cad5dae197c6
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> @@ -0,0 +1,175 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 Google LLC
> + * Author: Quentin Perret <[email protected]>
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +#include <asm/kvm_pgtable.h>
> +#include <asm/spectre.h>
> +
> +#include <nvhe/early_alloc.h>
> +#include <nvhe/gfp.h>
> +#include <nvhe/memory.h>
> +#include <nvhe/mm.h>
> +#include <nvhe/spinlock.h>
> +
> +struct kvm_pgtable hyp_pgtable;
> +
> +hyp_spinlock_t __hyp_pgd_lock;
> +u64 __io_map_base;
> +
> +struct hyp_memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
> +int hyp_memblock_nr;
> +
> +int __hyp_create_mappings(unsigned long start, unsigned long size,
> + unsigned long phys, unsigned long prot)
> +{
> + int err;
> +
> + hyp_spin_lock(&__hyp_pgd_lock);
> + err = kvm_pgtable_hyp_map(&hyp_pgtable, start, size, phys, prot);
> + hyp_spin_unlock(&__hyp_pgd_lock);
> +
> + return err;
> +}
> +
> +unsigned long __hyp_create_private_mapping(phys_addr_t phys, size_t size,
> + unsigned long prot)
> +{
> + unsigned long addr;
> + int ret;
> +
> + hyp_spin_lock(&__hyp_pgd_lock);
> +
> + size = PAGE_ALIGN(size + offset_in_page(phys));
> + addr = __io_map_base;
> + __io_map_base += size;
> +
> + /* Are we overflowing on the vmemmap? */
> + if (__io_map_base > __hyp_vmemmap) {
> + __io_map_base -= size;
> + addr = 0;
> + goto out;
> + }
> +
> + ret = kvm_pgtable_hyp_map(&hyp_pgtable, addr, size, phys, prot);
> + if (ret) {
> + addr = 0;
> + goto out;
> + }
> +
> + addr = addr + offset_in_page(phys);
> +out:
> + hyp_spin_unlock(&__hyp_pgd_lock);
> +
> + return addr;
> +}
> +
> +int hyp_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> +{
> + unsigned long start = (unsigned long)from;
> + unsigned long end = (unsigned long)to;
> + unsigned long virt_addr;
> + phys_addr_t phys;
> +
> + start = start & PAGE_MASK;
> + end = PAGE_ALIGN(end);
> +
> + for (virt_addr = start; virt_addr < end; virt_addr += PAGE_SIZE) {
> + int err;
> +
> + phys = hyp_virt_to_phys((void *)virt_addr);
> + err = __hyp_create_mappings(virt_addr, PAGE_SIZE, phys, prot);
> + if (err)
> + return err;
> + }
> +
> + return 0;
> +}
> +
> +int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
> +{
> + unsigned long start, end;
> +
> + hyp_vmemmap_range(phys, size, &start, &end);
> +
> + return __hyp_create_mappings(start, end - start, back, PAGE_HYP);
> +}
> +
> +static void *__hyp_bp_vect_base;
> +int hyp_cpu_set_vector(enum arm64_hyp_spectre_vector slot)
> +{
> + void *vector;
> +
> + switch (slot) {
> + case HYP_VECTOR_DIRECT: {
> + vector = hyp_symbol_addr(__kvm_hyp_vector);
> + break;
> + }
> + case HYP_VECTOR_SPECTRE_DIRECT: {
> + vector = hyp_symbol_addr(__bp_harden_hyp_vecs);
> + break;
> + }
> + case HYP_VECTOR_INDIRECT:
> + case HYP_VECTOR_SPECTRE_INDIRECT: {
> + vector = (void *)__hyp_bp_vect_base;
> + break;
> + }
> + default:
> + return -EINVAL;
> + }
> +
> + vector = __kvm_vector_slot2addr(vector, slot);
> + *this_cpu_ptr(&kvm_hyp_vector) = (unsigned long)vector;
> +
> + return 0;
> +}
> +
> +int hyp_map_vectors(void)
> +{
> + unsigned long bp_base;
> +
> + if (!cpus_have_const_cap(ARM64_SPECTRE_V3A))
> + return 0;
> +
> + bp_base = (unsigned long)hyp_symbol_addr(__bp_harden_hyp_vecs);
> + bp_base = __hyp_pa(bp_base);
> + bp_base = __hyp_create_private_mapping(bp_base, __BP_HARDEN_HYP_VECS_SZ,
> + PAGE_HYP_EXEC);
> + if (!bp_base)
> + return -1;
> +
> + __hyp_bp_vect_base = (void *)bp_base;
> +
> + return 0;
> +}
> +
> +int hyp_create_idmap(void)
> +{
> + unsigned long start, end;
> +
> + start = (unsigned long)hyp_symbol_addr(__hyp_idmap_text_start);
> + start = hyp_virt_to_phys((void *)start);
> + start = ALIGN_DOWN(start, PAGE_SIZE);
> +
> + end = (unsigned long)hyp_symbol_addr(__hyp_idmap_text_end);
> + end = hyp_virt_to_phys((void *)end);
> + end = ALIGN(end, PAGE_SIZE);
> +
> + /*
> + * One half of the VA space is reserved to linearly map portions of
> + * memory -- see va_layout.c for more details. The other half of the VA
> + * space contains the trampoline page, and needs some care. Split that
> + * second half in two and find the quarter of VA space not conflicting
> + * with the idmap to place the IOs and the vmemmap. IOs use the lower
> + * half of the quarter and the vmemmap the upper half.
> + */
> + __io_map_base = start & BIT(hyp_va_bits - 2);
> + __io_map_base ^= BIT(hyp_va_bits - 2);
> + __hyp_vmemmap = __io_map_base | BIT(hyp_va_bits - 3);
> +
> + return __hyp_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
> +}
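To make the placement above concrete, a worked example with
illustrative values (assuming hyp_va_bits = 48 and an idmap address of
0x80000000, i.e. bit 46 clear):

	__io_map_base  = 0x80000000 & BIT(46)       /* = 0x0            */
	__io_map_base ^= BIT(46)                    /* = 0x400000000000 */
	__hyp_vmemmap  = __io_map_base | BIT(45)    /* = 0x600000000000 */

Private/IO mappings then grow upwards from 0x400000000000, the vmemmap
starts at 0x600000000000 (hence the overflow check against
__hyp_vmemmap in __hyp_create_private_mapping() above), and both live
in the quarter of the VA space that cannot conflict with the idmap. Had
bit 46 of the idmap address been set, the two quarters would simply
swap roles.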
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> index dbe57ae84a0c..cfc6dac0f0ac 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> @@ -193,8 +193,6 @@ static int psci_cpu_on(u64 func_id, struct kvm_cpu_context *host_ctxt)
> return ret;
> }
>
> -void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
> -
> asmlinkage void __noreturn __kvm_hyp_psci_cpu_entry(void)
> {
> struct kvm_host_psci_state *cpu_state = this_cpu_ptr(&kvm_host_psci_state);
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> new file mode 100644
> index 000000000000..9679c97b875b
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -0,0 +1,196 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 Google LLC
> + * Author: Quentin Perret <[email protected]>
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +#include <asm/kvm_pgtable.h>
> +
> +#include <nvhe/early_alloc.h>
> +#include <nvhe/gfp.h>
> +#include <nvhe/memory.h>
> +#include <nvhe/mm.h>
> +
> +struct hyp_pool hpool;
> +struct kvm_pgtable_mm_ops hyp_pgtable_mm_ops;
> +unsigned long hyp_nr_cpus;
> +
> +#define hyp_percpu_size ((unsigned long)__per_cpu_end - \
> + (unsigned long)__per_cpu_start)
> +
> +static void *stacks_base;
> +static void *vmemmap_base;
> +static void *hyp_pgt_base;
> +
> +static int divide_memory_pool(void *virt, unsigned long size)
> +{
> + unsigned long vstart, vend, nr_pages;
> +
> + hyp_early_alloc_init(virt, size);
> +
> + stacks_base = hyp_early_alloc_contig(hyp_nr_cpus);
> + if (!stacks_base)
> + return -ENOMEM;
> +
> + hyp_vmemmap_range(__hyp_pa(virt), size, &vstart, &vend);
> + nr_pages = (vend - vstart) >> PAGE_SHIFT;
> + vmemmap_base = hyp_early_alloc_contig(nr_pages);
> + if (!vmemmap_base)
> + return -ENOMEM;
> +
> + nr_pages = hyp_s1_pgtable_size() >> PAGE_SHIFT;
> + hyp_pgt_base = hyp_early_alloc_contig(nr_pages);
> + if (!hyp_pgt_base)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> + unsigned long *per_cpu_base)
> +{
> + void *start, *end, *virt = hyp_phys_to_virt(phys);
> + int ret, i;
> +
> + /* Recreate the hyp page-table using the early page allocator */
> + hyp_early_alloc_init(hyp_pgt_base, hyp_s1_pgtable_size());
> + ret = kvm_pgtable_hyp_init(&hyp_pgtable, hyp_va_bits,
> + &hyp_early_alloc_mm_ops);
> + if (ret)
> + return ret;
> +
> + ret = hyp_create_idmap();
> + if (ret)
> + return ret;
> +
> + ret = hyp_map_vectors();
> + if (ret)
> + return ret;
> +
> + ret = hyp_back_vmemmap(phys, size, hyp_virt_to_phys(vmemmap_base));
> + if (ret)
> + return ret;
> +
> + ret = hyp_create_mappings(hyp_symbol_addr(__hyp_text_start),
> + hyp_symbol_addr(__hyp_text_end),
> + PAGE_HYP_EXEC);
> + if (ret)
> + return ret;
> +
> + ret = hyp_create_mappings(hyp_symbol_addr(__start_rodata),
> + hyp_symbol_addr(__end_rodata), PAGE_HYP_RO);
> + if (ret)
> + return ret;
> +
> + ret = hyp_create_mappings(hyp_symbol_addr(__hyp_data_ro_after_init_start),
> + hyp_symbol_addr(__hyp_data_ro_after_init_end),
> + PAGE_HYP_RO);
> + if (ret)
> + return ret;
> +
> + ret = hyp_create_mappings(hyp_symbol_addr(__bss_start),
> + hyp_symbol_addr(__hyp_bss_end), PAGE_HYP);
> + if (ret)
> + return ret;
> +
> + ret = hyp_create_mappings(hyp_symbol_addr(__hyp_bss_end),
> + hyp_symbol_addr(__bss_stop), PAGE_HYP_RO);
> + if (ret)
> + return ret;
> +
> + ret = hyp_create_mappings(virt, virt + size - 1, PAGE_HYP);
> + if (ret)
> + return ret;
> +
> + for (i = 0; i < hyp_nr_cpus; i++) {
> + start = (void *)kern_hyp_va(per_cpu_base[i]);
> + end = start + PAGE_ALIGN(hyp_percpu_size);
> + ret = hyp_create_mappings(start, end, PAGE_HYP);
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static void update_nvhe_init_params(void)
> +{
> + struct kvm_nvhe_init_params *params;
> + unsigned long i, stack;
> +
> + for (i = 0; i < hyp_nr_cpus; i++) {
> + stack = (unsigned long)stacks_base + (i << PAGE_SHIFT);
> + params = per_cpu_ptr(&kvm_init_params, i);
> + params->stack_hyp_va = stack + PAGE_SIZE;
> + params->pgd_pa = __hyp_pa(hyp_pgtable.pgd);
> + __flush_dcache_area(params, sizeof(*params));
> + }
> +}
> +
> +static void *hyp_zalloc_hyp_page(void *arg)
> +{
> + return hyp_alloc_pages(&hpool, HYP_GFP_ZERO, 0);
> +}
> +
> +void __noreturn __kvm_hyp_protect_finalise(void)
> +{
> + struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
> + struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
> + unsigned long nr_pages, used_pages;
> + int ret;
> +
> + /* Now that the vmemmap is backed, install the full-fledged allocator */
> + nr_pages = hyp_s1_pgtable_size() >> PAGE_SHIFT;
> + used_pages = hyp_early_alloc_nr_pages();
> + ret = hyp_pool_init(&hpool, __hyp_pa(hyp_pgt_base), nr_pages, used_pages);
> + if (ret)
> + goto out;
> +
> + hyp_pgtable_mm_ops.zalloc_page = hyp_zalloc_hyp_page;
> + hyp_pgtable_mm_ops.phys_to_virt = hyp_phys_to_virt;
> + hyp_pgtable_mm_ops.virt_to_phys = hyp_virt_to_phys;
> + hyp_pgtable_mm_ops.get_page = hyp_get_page;
> + hyp_pgtable_mm_ops.put_page = hyp_put_page;
> + hyp_pgtable.mm_ops = &hyp_pgtable_mm_ops;
> +
> +out:
> + host_ctxt->regs.regs[0] = SMCCC_RET_SUCCESS;
> + host_ctxt->regs.regs[1] = ret;
> +
> + __host_enter(host_ctxt);
> +}
> +
> +int __kvm_hyp_protect(phys_addr_t phys, unsigned long size,
> + unsigned long nr_cpus, unsigned long *per_cpu_base)
> +{
> + struct kvm_nvhe_init_params *params;
> + void *virt = hyp_phys_to_virt(phys);
> + void (*fn)(phys_addr_t params_pa, void *finalize_fn_va);
> + int ret;
> +
> + if (phys % PAGE_SIZE || size % PAGE_SIZE || (u64)virt % PAGE_SIZE)
> + return -EINVAL;
> +
> + hyp_spin_lock_init(&__hyp_pgd_lock);
> + hyp_nr_cpus = nr_cpus;
> +
> + ret = divide_memory_pool(virt, size);
> + if (ret)
> + return ret;
> +
> + ret = recreate_hyp_mappings(phys, size, per_cpu_base);
> + if (ret)
> + return ret;
> +
> + update_nvhe_init_params();
> +
> + /* Jump in the idmap page to switch to the new page-tables */
> + params = this_cpu_ptr(&kvm_init_params);
> + fn = (typeof(fn))__hyp_pa(hyp_symbol_addr(__kvm_init_switch_pgd));
> + fn(__hyp_pa(params), hyp_symbol_addr(__kvm_hyp_protect_finalise));
> +
> + unreachable();
> +}
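To summarize the flow through this file: __kvm_hyp_protect()
sanity-checks the alignment of the donated pool, divide_memory_pool()
carves it into per-CPU stacks, vmemmap backing and stage 1 page-table
pages using the early allocator, recreate_hyp_mappings() rebuilds the
hyp stage 1 from those pages, update_nvhe_init_params() points each CPU
at its new stack and pgd, and the call into __kvm_init_switch_pgd() in
the idmap page switches TTBR0_EL2 before __kvm_hyp_protect_finalise()
installs the buddy allocator as the page-table memory provider and
returns to the host.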
> diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
> new file mode 100644
> index 000000000000..02b0b18006f5
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/reserved_mem.c
> @@ -0,0 +1,75 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2020 - Google LLC
> + * Author: Quentin Perret <[email protected]>
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <linux/memblock.h>
> +
> +#include <asm/kvm_host.h>
> +
> +#include <nvhe/memory.h>
> +#include <nvhe/mm.h>
> +
> +phys_addr_t hyp_mem_base;
> +phys_addr_t hyp_mem_size;
> +
> +void __init early_init_dt_add_memory_hyp(u64 base, u64 size)
> +{
> + struct hyp_memblock_region *reg;
> +
> + if (kvm_nvhe_sym(hyp_memblock_nr) >= HYP_MEMBLOCK_REGIONS)
> + kvm_nvhe_sym(hyp_memblock_nr) = -1;
> +
> + if (kvm_nvhe_sym(hyp_memblock_nr) < 0)
> + return;
> +
> + reg = kvm_nvhe_sym(hyp_memory);
> + reg[kvm_nvhe_sym(hyp_memblock_nr)].start = base;
> + reg[kvm_nvhe_sym(hyp_memblock_nr)].end = base + size;
> + kvm_nvhe_sym(hyp_memblock_nr)++;
> +}
> +
> +extern bool enable_protected_kvm;
> +void __init reserve_kvm_hyp(void)
> +{
> + u64 nr_pages, prev;
> +
> + if (!enable_protected_kvm)
> + return;
> +
> + if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
> + return;
> +
> + if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
> + return;
> +
> + hyp_mem_size += num_possible_cpus() << PAGE_SHIFT;
> + hyp_mem_size += hyp_s1_pgtable_size();
> +
> + /*
> + * The hyp_vmemmap needs to be backed by pages, but these pages
> + * themselves need to be present in the vmemmap, so compute the number
> + * of pages needed by looking for a fixed point.
> + */
> + nr_pages = 0;
> + do {
> + prev = nr_pages;
> + nr_pages = (hyp_mem_size >> PAGE_SHIFT) + prev;
> + nr_pages = DIV_ROUND_UP(nr_pages * sizeof(struct hyp_page), PAGE_SIZE);
> + nr_pages += __hyp_pgtable_max_pages(nr_pages);
> + } while (nr_pages != prev);
> + hyp_mem_size += nr_pages << PAGE_SHIFT;
> +
> + hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
> + hyp_mem_size, SZ_2M);
> + if (!hyp_mem_base) {
> + kvm_err("Failed to reserve hyp memory\n");
> + return;
> + }
> + memblock_reserve(hyp_mem_base, hyp_mem_size);
> +
> + kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
> + hyp_mem_base);
> +}
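The fixed point above converges in a handful of iterations. A worked
example with illustrative numbers (assuming 4K pages and
sizeof(struct hyp_page) == 32 -- an assumed value, not one taken from
this series): for hyp_mem_size = 64 MiB, i.e. 16384 pages,

	prev = 0:   ceil(16384 * 32 / 4096)         = 128 vmemmap pages, + 4 table pages = 132
	prev = 132: ceil((16384 + 132) * 32 / 4096) = 130 vmemmap pages, + 4 table pages = 134
	prev = 134: ceil((16384 + 134) * 32 / 4096) = 130 vmemmap pages, + 4 table pages = 134

so the loop terminates with 134 extra pages (~536 KiB) reserved for the
hyp_vmemmap and its page-tables.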
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 278e163beda4..3cf9397dabdb 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1264,10 +1264,10 @@ static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
> .virt_to_phys = kvm_host_pa,
> };
>
> +u32 hyp_va_bits;
> int kvm_mmu_init(void)
> {
> int err;
> - u32 hyp_va_bits;
>
> hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
> hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 095540667f0f..f81da019b677 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -34,6 +34,7 @@
> #include <asm/fixmap.h>
> #include <asm/kasan.h>
> #include <asm/kernel-pgtable.h>
> +#include <asm/kvm_host.h>
> #include <asm/memory.h>
> #include <asm/numa.h>
> #include <asm/sections.h>
> @@ -390,6 +391,8 @@ void __init arm64_memblock_init(void)
>
> reserve_elfcorehdr();
>
> + reserve_kvm_hyp();
> +
> high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
>
> dma_contiguous_reserve(arm64_dma32_phys_limit);
> --
> 2.29.2.299.gdc1121823c-goog
Cheers,
/fuad
On Thursday 03 Dec 2020 at 12:57:33 (+0000), Fuad Tabba wrote:
<snip>
> > +int hyp_create_idmap(void);
> > +int hyp_map_vectors(void);
> > +int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
> > +int hyp_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> > +int hyp_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> > +int __hyp_create_mappings(unsigned long start, unsigned long size,
> > + unsigned long phys, unsigned long prot);
> > +unsigned long __hyp_create_private_mapping(phys_addr_t phys, size_t size,
> > + unsigned long prot);
> > +
>
> nit: I also thought that the hyp_create_mappings function names are a
> bit confusing, since there's the create_hyp_mappings functions which
> use the aforementioned *hyp_pgtable.
Sure, happy to re-name those (and hyp_pgtable above). Any suggestions?
<snip>
> > +SYM_FUNC_START(__kvm_init_switch_pgd)
> > + /* Turn the MMU off */
> > + pre_disable_mmu_workaround
> > + mrs x2, sctlr_el2
> > + bic x3, x2, #SCTLR_ELx_M
> > + msr sctlr_el2, x3
> > + isb
> > +
> > + tlbi alle2
> > +
> > + /* Install the new pgtables */
> > + ldr x3, [x0, #NVHE_INIT_PGD_PA]
> > + phys_to_ttbr x4, x3
> > +alternative_if ARM64_HAS_CNP
> > + orr x4, x4, #TTBR_CNP_BIT
> > +alternative_else_nop_endif
> > + msr ttbr0_el2, x4
> > +
> > + /* Set the new stack pointer */
> > + ldr x0, [x0, #NVHE_INIT_STACK_HYP_VA]
> > + mov sp, x0
> > +
> > + /* And turn the MMU back on! */
> > + dsb nsh
> > + isb
> > + msr sctlr_el2, x2
> > + isb
> > + ret x1
> > +SYM_FUNC_END(__kvm_init_switch_pgd)
> > +
>
> Should the instruction cache be flushed here (ic iallu), to discard
> speculatively fetched instructions?
Hmm, Will? Thoughts?
Thanks,
Quentin
On Fri, Dec 04, 2020 at 06:01:52PM +0000, Quentin Perret wrote:
> On Thursday 03 Dec 2020 at 12:57:33 (+0000), Fuad Tabba wrote:
> <snip>
> > > +int hyp_create_idmap(void);
> > > +int hyp_map_vectors(void);
> > > +int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
> > > +int hyp_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> > > +int hyp_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> > > +int __hyp_create_mappings(unsigned long start, unsigned long size,
> > > + unsigned long phys, unsigned long prot);
> > > +unsigned long __hyp_create_private_mapping(phys_addr_t phys, size_t size,
> > > + unsigned long prot);
> > > +
> >
> > nit: I also thought that the hyp_create_mappings function names are a
> > bit confusing, since there's the create_hyp_mappings functions which
> > use the aforementioned *hyp_pgtable.
>
> Sure, happy to re-name those (and hyp_pgtable above). Any suggestions?
>
>
> <snip>
> > > +SYM_FUNC_START(__kvm_init_switch_pgd)
> > > + /* Turn the MMU off */
> > > + pre_disable_mmu_workaround
> > > + mrs x2, sctlr_el2
> > > + bic x3, x2, #SCTLR_ELx_M
> > > + msr sctlr_el2, x3
> > > + isb
> > > +
> > > + tlbi alle2
> > > +
> > > + /* Install the new pgtables */
> > > + ldr x3, [x0, #NVHE_INIT_PGD_PA]
> > > + phys_to_ttbr x4, x3
> > > +alternative_if ARM64_HAS_CNP
> > > + orr x4, x4, #TTBR_CNP_BIT
> > > +alternative_else_nop_endif
> > > + msr ttbr0_el2, x4
> > > +
> > > + /* Set the new stack pointer */
> > > + ldr x0, [x0, #NVHE_INIT_STACK_HYP_VA]
> > > + mov sp, x0
> > > +
> > > + /* And turn the MMU back on! */
> > > + dsb nsh
> > > + isb
> > > + msr sctlr_el2, x2
> > > + isb
> > > + ret x1
> > > +SYM_FUNC_END(__kvm_init_switch_pgd)
> > > +
> >
> > Should the instruction cache be flushed here (ic iallu), to discard
> > speculatively fetched instructions?
>
> Hmm, Will? Thoughts?
The I-cache is physically tagged, so not sure what invalidation would
achieve here. Fuad -- what do you think could go wrong specifically?
Will
On Mon, Dec 07, 2020 at 10:20:03AM +0000, Will Deacon wrote:
> On Fri, Dec 04, 2020 at 06:01:52PM +0000, Quentin Perret wrote:
> > On Thursday 03 Dec 2020 at 12:57:33 (+0000), Fuad Tabba wrote:
> > <snip>
> > > > +SYM_FUNC_START(__kvm_init_switch_pgd)
> > > > + /* Turn the MMU off */
> > > > + pre_disable_mmu_workaround
> > > > + mrs x2, sctlr_el2
> > > > + bic x3, x2, #SCTLR_ELx_M
> > > > + msr sctlr_el2, x3
> > > > + isb
> > > > +
> > > > + tlbi alle2
> > > > +
> > > > + /* Install the new pgtables */
> > > > + ldr x3, [x0, #NVHE_INIT_PGD_PA]
> > > > + phys_to_ttbr x4, x3
> > > > +alternative_if ARM64_HAS_CNP
> > > > + orr x4, x4, #TTBR_CNP_BIT
> > > > +alternative_else_nop_endif
> > > > + msr ttbr0_el2, x4
> > > > +
> > > > + /* Set the new stack pointer */
> > > > + ldr x0, [x0, #NVHE_INIT_STACK_HYP_VA]
> > > > + mov sp, x0
> > > > +
> > > > + /* And turn the MMU back on! */
> > > > + dsb nsh
> > > > + isb
> > > > + msr sctlr_el2, x2
> > > > + isb
> > > > + ret x1
> > > > +SYM_FUNC_END(__kvm_init_switch_pgd)
> > > > +
> > >
> > > Should the instruction cache be flushed here (ic iallu), to discard
> > > speculatively fetched instructions?
> >
> > Hmm, Will? Thoughts?
>
> The I-cache is physically tagged, so not sure what invalidation would
> achieve here. Fuad -- what do you think could go wrong specifically?
While the MMU is off, instruction fetches can be made from the PoC
rather than the PoU, so where instructions have been modified/copied and
not cleaned to the PoC, it's possible to fetch stale copies into the
I-caches. The physical tag doesn't prevent that.
In the regular CPU boot paths, __enable_mmu() has an IC IALLU after
enabling the MMU to ensure that we get rid of anything stale (e.g. so
secondaries don't miss ftrace patching, which is only cleaned to the
PoU).
That might not be a problem here, if things are suitably padded and
never dynamically patched, but if so it's probably worth a comment.
Fuad, is that the sort of thing you were considering, or did you have
additional concerns?
Thanks,
Mark.
On Mon, Dec 07, 2020 at 11:05:45AM +0000, Mark Rutland wrote:
> On Mon, Dec 07, 2020 at 10:20:03AM +0000, Will Deacon wrote:
> > On Fri, Dec 04, 2020 at 06:01:52PM +0000, Quentin Perret wrote:
> > > On Thursday 03 Dec 2020 at 12:57:33 (+0000), Fuad Tabba wrote:
> > > <snip>
> > > > > +SYM_FUNC_START(__kvm_init_switch_pgd)
> > > > > + /* Turn the MMU off */
> > > > > + pre_disable_mmu_workaround
> > > > > + mrs x2, sctlr_el2
> > > > > + bic x3, x2, #SCTLR_ELx_M
> > > > > + msr sctlr_el2, x3
> > > > > + isb
> > > > > +
> > > > > + tlbi alle2
> > > > > +
> > > > > + /* Install the new pgtables */
> > > > > + ldr x3, [x0, #NVHE_INIT_PGD_PA]
> > > > > + phys_to_ttbr x4, x3
> > > > > +alternative_if ARM64_HAS_CNP
> > > > > + orr x4, x4, #TTBR_CNP_BIT
> > > > > +alternative_else_nop_endif
> > > > > + msr ttbr0_el2, x4
> > > > > +
> > > > > + /* Set the new stack pointer */
> > > > > + ldr x0, [x0, #NVHE_INIT_STACK_HYP_VA]
> > > > > + mov sp, x0
> > > > > +
> > > > > + /* And turn the MMU back on! */
> > > > > + dsb nsh
> > > > > + isb
> > > > > + msr sctlr_el2, x2
> > > > > + isb
> > > > > + ret x1
> > > > > +SYM_FUNC_END(__kvm_init_switch_pgd)
> > > > > +
> > > >
> > > > Should the instruction cache be flushed here (ic iallu), to discard
> > > > speculatively fetched instructions?
> > >
> > > Hmm, Will? Thoughts?
> >
> > The I-cache is physically tagged, so not sure what invalidation would
> > achieve here. Fuad -- what do you think could go wrong specifically?
>
> While the MMU is off, instruction fetches can be made from the PoC
> rather than the PoU, so where instructions have been modified/copied and
> not cleaned to the PoC, it's possible to fetch stale copies into the
> I-caches. The physical tag doesn't prevent that.
Oh yeah, we even have a comment about that in
idmap_kpti_install_ng_mappings(). Maybe we should wrap disable_mmu and
enable_mmu in some macros so we don't have to trip over this every time (and
this would mean we could get rid of pre_disable_mmu_workaround too).
> In the regular CPU boot paths, __enable_mmu() has an IC IALLU after
> enabling the MMU to ensure that we get rid of anything stale (e.g. so
> secondaries don't miss ftrace patching, which is only cleaned to the
> PoU).
>
> That might not be a problem here, if things are suitably padded and
> never dynamically patched, but if so it's probably worth a comment.
It's fragile enough that we should just do the invalidation.
Will
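Concretely, the fix would amount to something like this at the end of
__kvm_init_switch_pgd (sketch only, untested):

	/* And turn the MMU back on! */
	dsb	nsh
	isb
	msr	sctlr_el2, x2
	isb
	ic	iallu	// discard instructions fetched while the MMU was off
	dsb	nsh
	isb
	ret	x1

mirroring the I-cache invalidation __enable_mmu() relies on in the
regular boot path.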
On Mon, Dec 7, 2020 at 11:05 AM Mark Rutland <[email protected]> wrote:
>
> On Mon, Dec 07, 2020 at 10:20:03AM +0000, Will Deacon wrote:
> > On Fri, Dec 04, 2020 at 06:01:52PM +0000, Quentin Perret wrote:
> > > On Thursday 03 Dec 2020 at 12:57:33 (+0000), Fuad Tabba wrote:
> > > <snip>
> > > > > +SYM_FUNC_START(__kvm_init_switch_pgd)
> > > > > + /* Turn the MMU off */
> > > > > + pre_disable_mmu_workaround
> > > > > + mrs x2, sctlr_el2
> > > > > + bic x3, x2, #SCTLR_ELx_M
> > > > > + msr sctlr_el2, x3
> > > > > + isb
> > > > > +
> > > > > + tlbi alle2
> > > > > +
> > > > > + /* Install the new pgtables */
> > > > > + ldr x3, [x0, #NVHE_INIT_PGD_PA]
> > > > > + phys_to_ttbr x4, x3
> > > > > +alternative_if ARM64_HAS_CNP
> > > > > + orr x4, x4, #TTBR_CNP_BIT
> > > > > +alternative_else_nop_endif
> > > > > + msr ttbr0_el2, x4
> > > > > +
> > > > > + /* Set the new stack pointer */
> > > > > + ldr x0, [x0, #NVHE_INIT_STACK_HYP_VA]
> > > > > + mov sp, x0
> > > > > +
> > > > > + /* And turn the MMU back on! */
> > > > > + dsb nsh
> > > > > + isb
> > > > > + msr sctlr_el2, x2
> > > > > + isb
> > > > > + ret x1
> > > > > +SYM_FUNC_END(__kvm_init_switch_pgd)
> > > > > +
> > > >
> > > > Should the instruction cache be flushed here (ic iallu), to discard
> > > > speculatively fetched instructions?
> > >
> > > Hmm, Will? Thoughts?
> >
> > The I-cache is physically tagged, so not sure what invalidation would
> > achieve here. Fuad -- what do you think could go wrong specifically?
>
> While the MMU is off, instruction fetches can be made from the PoC
> rather than the PoU, so where instructions have been modified/copied and
> not cleaned to the PoC, it's possible to fetch stale copies into the
> I-caches. The physical tag doesn't prevent that.
>
> In the regular CPU boot paths, __enable_mmu() has an IC IALLU after
> enabling the MMU to ensure that we get rid of anything stale (e.g. so
> secondaries don't miss ftrace patching, which is only cleaned to the
> PoU).
>
> That might not be a problem here, if things are suitably padded and
> never dynamically patched, but if so it's probably worth a comment.
>
> Fuad, is that the sort of thing you were considering, or did you have
> additional concerns?
No other concerns. Thanks Mark.
/fuad
On Fri, Dec 4, 2020 at 6:01 PM Quentin Perret <[email protected]> wrote:
>
> On Thursday 03 Dec 2020 at 12:57:33 (+0000), Fuad Tabba wrote:
> <snip>
> > > +int hyp_create_idmap(void);
> > > +int hyp_map_vectors(void);
> > > +int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
> > > +int hyp_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> > > +int hyp_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> > > +int __hyp_create_mappings(unsigned long start, unsigned long size,
> > > + unsigned long phys, unsigned long prot);
> > > +unsigned long __hyp_create_private_mapping(phys_addr_t phys, size_t size,
> > > + unsigned long prot);
> > > +
> >
> > nit: I also thought that the hyp_create_mappings function names are a
> > bit confusing, since there's the create_hyp_mappings functions which
> > use the aforementioned *hyp_pgtable.
>
> Sure, happy to re-name those (and hyp_pgtable above). Any suggestions?
Perhaps something to indicate that these are temporary, tmp_ or
bootstrap_ maybe?
Cheers,
/fuad
On Monday 07 Dec 2020 at 11:16:05 (+0000), Fuad Tabba wrote:
> On Fri, Dec 4, 2020 at 6:01 PM Quentin Perret <[email protected]> wrote:
> >
> > On Thursday 03 Dec 2020 at 12:57:33 (+0000), Fuad Tabba wrote:
> > <snip>
> > > > +int hyp_create_idmap(void);
> > > > +int hyp_map_vectors(void);
> > > > +int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
> > > > +int hyp_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> > > > +int hyp_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> > > > +int __hyp_create_mappings(unsigned long start, unsigned long size,
> > > > + unsigned long phys, unsigned long prot);
> > > > +unsigned long __hyp_create_private_mapping(phys_addr_t phys, size_t size,
> > > > + unsigned long prot);
> > > > +
> > >
> > > nit: I also thought that the hyp_create_mappings function names are a
> > > bit confusing, since there's the create_hyp_mappings functions which
> > > use the aforementioned *hyp_pgtable.
> >
> > Sure, happy to re-name those (and hyp_pgtable above). Any suggestions?
>
> Perhaps something to indicate that these are temporary, tmp_ or
> bootstrap_ maybe?
Hmm, the thing is these are temporary only in protected mode, they're
permanent otherwise :/
Perhaps I could prefix the protected pgtable (and associated functions)
with 'pkvm_' or so? Marc, any preferences?
Thanks,
Quentin
Hi Quentin,
On Tue, Nov 17, 2020 at 06:15:56PM +0000, Quentin Perret wrote:
> When memory protection is enabled, the Hyp code needs the ability to
> create and manage its own page-table. To do so, introduce a new set of
> hypercalls to initialize Hyp memory protection.
>
> During the init hcall, the hypervisor runs with the host-provided
> page-table and uses the trivial early page allocator to create its own
> set of page-tables, using a memory pool that was donated by the host.
> Specifically, the hypervisor creates its own mappings for __hyp_text,
> the Hyp memory pool, the __hyp_bss, the portion of hyp_vmemmap
> corresponding to the Hyp pool, among other things. It then jumps back in
> the idmap page, switches to use the newly-created pgd (instead of the
> temporary one provided by the host) and then installs the full-fledged
> buddy allocator which will then be the only one in used from then on.
>
> Note that for the sake of simplifying the review, this only introduces
> the code doing this operation, without actually being called by anything
> yet. This will be done in a subsequent patch, which will introduce the
> necessary host kernel changes.
[...]
> diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
> new file mode 100644
> index 000000000000..02b0b18006f5
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/reserved_mem.c
[...]
> +extern bool enable_protected_kvm;
> +void __init reserve_kvm_hyp(void)
> +{
> + u64 nr_pages, prev;
> +
> + if (!enable_protected_kvm)
> + return;
> +
> + if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
> + return;
> +
> + if (kvm_nvhe_sym(hyp_memblock_nr) <= 0)
> + return;
> +
> + hyp_mem_size += num_possible_cpus() << PAGE_SHIFT;
> + hyp_mem_size += hyp_s1_pgtable_size();
> +
> + /*
> + * The hyp_vmemmap needs to be backed by pages, but these pages
> + * themselves need to be present in the vmemmap, so compute the number
> + * of pages needed by looking for a fixed point.
> + */
> + nr_pages = 0;
> + do {
> + prev = nr_pages;
> + nr_pages = (hyp_mem_size >> PAGE_SHIFT) + prev;
> + nr_pages = DIV_ROUND_UP(nr_pages * sizeof(struct hyp_page), PAGE_SIZE);
> + nr_pages += __hyp_pgtable_max_pages(nr_pages);
> + } while (nr_pages != prev);
> + hyp_mem_size += nr_pages << PAGE_SHIFT;
> +
> + hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
> + hyp_mem_size, SZ_2M);
> + if (!hyp_mem_base) {
> + kvm_err("Failed to reserve hyp memory\n");
> + return;
> + }
> + memblock_reserve(hyp_mem_base, hyp_mem_size);
Why not use the RESERVEDMEM_OF_DECLARE() interface for the hypervisor
memory? That way, the hypervisor memory can either be statically partitioned
as a carveout or allocated dynamically for us -- we wouldn't need to care.
Will
On 2020-12-07 11:58, Quentin Perret wrote:
> On Monday 07 Dec 2020 at 11:16:05 (+0000), Fuad Tabba wrote:
>> On Fri, Dec 4, 2020 at 6:01 PM Quentin Perret <[email protected]>
>> wrote:
>> >
>> > On Thursday 03 Dec 2020 at 12:57:33 (+0000), Fuad Tabba wrote:
>> > <snip>
>> > > > +int hyp_create_idmap(void);
>> > > > +int hyp_map_vectors(void);
>> > > > +int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
>> > > > +int hyp_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
>> > > > +int hyp_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
>> > > > +int __hyp_create_mappings(unsigned long start, unsigned long size,
>> > > > + unsigned long phys, unsigned long prot);
>> > > > +unsigned long __hyp_create_private_mapping(phys_addr_t phys, size_t size,
>> > > > + unsigned long prot);
>> > > > +
>> > >
>> > > nit: I also thought that the hyp_create_mappings function names are a
>> > > bit confusing, since there's the create_hyp_mappings functions which
>> > > use the aforementioned *hyp_pgtable.
>> >
>> > Sure, happy to re-name those (and hyp_pgtable above). Any suggestions?
>>
>> Perhaps something to indicate that these are temporary, tmp_ or
>> bootstrap_ maybe?
>
> Hmm, the thing is these are temporary only in protected mode, they're
> permanent otherwise :/
>
> Perhaps I could prefix the protected pgtable (and associated functions)
> with 'pkvm_' or so? Marc, any preferences?
None. Whichever name you pick, someone will ask you to change it.
Just call it Bob.
What I really *don't* want is see a blanket rename of existing symbols
or concepts.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On Monday 07 Dec 2020 at 13:40:52 (+0000), Will Deacon wrote:
> Why not use the RESERVEDMEM_OF_DECLARE() interface for the hypervisor
> memory? That way, the hypervisor memory can either be statically partitioned
> as a carveout or allocated dynamically for us -- we wouldn't need to care.
Yup, I did consider that, but the actual amount of memory we need to
reserve for the hypervisor depends on things such as the size of struct
hyp_page, which depends on the kernel you're running (that is, it might
change over time). So, that really felt like something the kernel should
be doing, to keep the DT backward compatible, ... Or did you have
something more elaborate in mind?
Thanks,
Quentin
On Monday 07 Dec 2020 at 13:54:48 (+0000), Marc Zyngier wrote:
> None. Whichever name you pick, someone will ask you to change it.
> Just call it Bob.
:-)
> What I really *don't* want is see a blanket rename of existing symbols
> or concepts.
Understood. I'll go with pkvm_create_mappings() and friends for all the
new functions unless someone comes up with a better name in the meantime.
Thanks,
Quentin
On Mon, Dec 07, 2020 at 02:11:20PM +0000, Quentin Perret wrote:
> On Monday 07 Dec 2020 at 13:40:52 (+0000), Will Deacon wrote:
> > Why not use the RESERVEDMEM_OF_DECLARE() interface for the hypervisor
> > memory? That way, the hypervisor memory can either be statically partitioned
> > as a carveout or allocated dynamically for us -- we wouldn't need to care.
>
> Yup, I did consider that, but the actual amount of memory we need to
> reserve for the hypervisor depends on things such as the size of struct
> hyp_page, which depends on the kernel you're running (that is, it might
> change over time). So, that really felt like something the kernel should
> be doing, to keep the DT backward compatible, ... Or did you have
> something more elaborate in mind?
No, that's fair. Just wanted to make sure we had a good reason not to use
the existing memory reservation code.
Will