2020-11-09 11:34:29

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 00/24] Opt-in always-on nVHE hypervisor

As we progress towards being able to keep guest state private to the
host running nVHE hypervisor, this series allows the hypervisor to
install itself on newly booted CPUs before the host is allowed to run
on them.

All functionality described below is opt-in, guarded by an early param
'kvm-arm.protected'. Future patches specific to the new "protected" mode
should be hidden behind the same param.

The hypervisor starts trapping host SMCs and intercepting host's PSCI
CPU_ON/OFF/SUSPEND calls. It replaces the host's entry point with its
own, initializes the EL2 state of the new CPU and installs the nVHE hyp
vector before ERETing to the host's entry point.

The kernel checks new cores' features against the finalized system
capabilities. To avoid the need to move this code/data to EL2, the
implementation only allows to boot cores that were online at the time of
KVM initialization and therefore had been checked already.

Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs
implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded
to EL3. Future changes will need to ensure the safety of all SMCs wrt.
private guests.

The host is still allowed to reset EL2 back to the stub vector, eg. for
hibernation or kexec, but will not disable nVHE when there are no VMs.

Tested on Rock Pi 4b, based on 5.10-rc3.

changes since rfc:
* add early param to make features opt-in
* simplify CPU_ON/SUSPEND implementation
* replace spinlocks with CAS atomic
* make cpu_logical_map ro_after_init

-David

David Brazdil (24):
psci: Accessor for configured PSCI version
psci: Accessor for configured PSCI function IDs
arm64: Move MAIR_EL1_SET to asm/memory.h
kvm: arm64: Initialize MAIR_EL2 using a constant
kvm: arm64: Add .hyp.data..ro_after_init ELF section
kvm: arm64: Support per_cpu_ptr in nVHE hyp code
kvm: arm64: Create nVHE copy of cpu_logical_map
kvm: arm64: Move hyp-init params to a per-CPU struct
kvm: arm64: Refactor handle_trap to use a switch
kvm: arm64: Extract parts of el2_setup into a macro
kvm: arm64: Add SMC handler in nVHE EL2
kvm: arm64: Extract __do_hyp_init into a helper function
kvm: arm64: Add CPU entry point in nVHE hyp
kvm: arm64: Add function to enter host from KVM nVHE hyp code
kvm: arm64: Bootstrap PSCI SMC handler in nVHE EL2
kvm: arm64: Add offset for hyp VA <-> PA conversion
kvm: arm64: Add __hyp_pa_symbol helper macro
kvm: arm64: Forward safe PSCI SMCs coming from host
kvm: arm64: Intercept host's PSCI_CPU_ON SMCs
kvm: arm64: Intercept host's CPU_SUSPEND PSCI SMCs
kvm: arm64: Add kvm-arm.protected early kernel parameter
kvm: arm64: Keep nVHE EL2 vector installed
kvm: arm64: Trap host SMCs in protected mode.
kvm: arm64: Fix EL2 mode availability checks

arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/kvm_asm.h | 136 ++++++++++++++
arch/arm64/include/asm/kvm_hyp.h | 9 +
arch/arm64/include/asm/memory.h | 13 ++
arch/arm64/include/asm/percpu.h | 6 +
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/include/asm/virt.h | 26 +++
arch/arm64/kernel/asm-offsets.c | 5 +
arch/arm64/kernel/head.S | 140 ++------------
arch/arm64/kernel/image-vars.h | 7 +
arch/arm64/kernel/vmlinux.lds.S | 10 +
arch/arm64/kvm/arm.c | 157 ++++++++++++++--
arch/arm64/kvm/hyp/nvhe/Makefile | 3 +-
arch/arm64/kvm/hyp/nvhe/host.S | 9 +
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 84 +++++++--
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 56 +++++-
arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 1 +
arch/arm64/kvm/hyp/nvhe/percpu.c | 38 ++++
arch/arm64/kvm/hyp/nvhe/psci.c | 291 +++++++++++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 5 +-
arch/arm64/mm/proc.S | 13 --
drivers/firmware/psci/psci.c | 25 ++-
include/linux/psci.h | 18 ++
include/uapi/linux/psci.h | 1 +
24 files changed, 865 insertions(+), 190 deletions(-)
create mode 100644 arch/arm64/kvm/hyp/nvhe/percpu.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/psci.c

--
2.29.2.222.g5d2a92d10f8-goog


2020-11-09 11:34:49

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 02/24] psci: Accessor for configured PSCI function IDs

Function IDs used by PSCI are configurable for v0.1 via DT/APCI. If the
host is using PSCI v0.1, KVM's host PSCI proxy needs to use the same IDs.
Expose the array holding the information with a read-only accessor.

Signed-off-by: David Brazdil <[email protected]>
---
drivers/firmware/psci/psci.c | 14 ++++++--------
include/linux/psci.h | 10 ++++++++++
2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index bc1b2d60fdbf..b67b2ba8a084 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -65,16 +65,14 @@ typedef unsigned long (psci_fn)(unsigned long, unsigned long,
unsigned long, unsigned long);
static psci_fn *invoke_psci_fn;

-enum psci_function {
- PSCI_FN_CPU_SUSPEND,
- PSCI_FN_CPU_ON,
- PSCI_FN_CPU_OFF,
- PSCI_FN_MIGRATE,
- PSCI_FN_MAX,
-};
-
static u32 psci_function_id[PSCI_FN_MAX];

+u32 psci_get_function_id(enum psci_function fn)
+{
+ WARN_ON(fn >= PSCI_FN_MAX);
+ return psci_function_id[fn];
+}
+
#define PSCI_0_2_POWER_STATE_MASK \
(PSCI_0_2_POWER_STATE_ID_MASK | \
PSCI_0_2_POWER_STATE_TYPE_MASK | \
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 5b5dcf176aa6..8fe681a7b43d 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -29,6 +29,16 @@ bool psci_has_osi_support(void);
*/
int psci_driver_version(void);

+enum psci_function {
+ PSCI_FN_CPU_SUSPEND,
+ PSCI_FN_CPU_ON,
+ PSCI_FN_CPU_OFF,
+ PSCI_FN_MIGRATE,
+ PSCI_FN_MAX,
+};
+
+u32 psci_get_function_id(enum psci_function fn);
+
struct psci_operations {
u32 (*get_version)(void);
int (*cpu_suspend)(u32 state, unsigned long entry_point);
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:34:55

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 04/24] kvm: arm64: Initialize MAIR_EL2 using a constant

MAIR_EL2 is currently initialized to the value of MAIR_EL1, which itself
is set to a constant MAIR_EL1_SET.

Initialize MAIR_EL2 to the MAIR_EL1_SET constant directly in preparation
of allowing KVM to start CPU cores itself and not initializing itself
before ERETing to EL1. In that case, MAIR_EL2 will be initialized before
MAIR_EL1.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index b11a9d7db677..96e70f976ff5 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -111,7 +111,7 @@ alternative_else_nop_endif

msr tcr_el2, x0

- mrs x0, mair_el1
+ mov_q x0, MAIR_EL1_SET
msr mair_el2, x0
isb

--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:35:06

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 05/24] kvm: arm64: Add .hyp.data..ro_after_init ELF section

Add rules for renaming the .data..ro_after_init ELF section in KVM nVHE
object files to .hyp.data..ro_after_init, linking it into the kernel
and mapping it in hyp at runtime.

The section is RW to the host, then mapped RO in hyp. The expectation is
that the host populates the variables in the section and they are never
changed by hyp afterwards.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/kernel/vmlinux.lds.S | 10 ++++++++++
arch/arm64/kvm/arm.c | 8 ++++++++
arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 1 +
4 files changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index 3994169985ef..8ff579361731 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -11,6 +11,7 @@ extern char __alt_instructions[], __alt_instructions_end[];
extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
extern char __hyp_text_start[], __hyp_text_end[];
+extern char __hyp_data_ro_after_init_start[], __hyp_data_ro_after_init_end[];
extern char __idmap_text_start[], __idmap_text_end[];
extern char __initdata_begin[], __initdata_end[];
extern char __inittext_begin[], __inittext_end[];
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 1bda604f4c70..4382b5d0645d 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -30,6 +30,13 @@ jiffies = jiffies_64;
*(__kvm_ex_table) \
__stop___kvm_ex_table = .;

+#define HYPERVISOR_DATA_SECTIONS \
+ HYP_SECTION_NAME(.data..ro_after_init) : { \
+ __hyp_data_ro_after_init_start = .; \
+ *(HYP_SECTION_NAME(.data..ro_after_init)) \
+ __hyp_data_ro_after_init_end = .; \
+ }
+
#define HYPERVISOR_PERCPU_SECTION \
. = ALIGN(PAGE_SIZE); \
HYP_SECTION_NAME(.data..percpu) : { \
@@ -37,6 +44,7 @@ jiffies = jiffies_64;
}
#else /* CONFIG_KVM */
#define HYPERVISOR_EXTABLE
+#define HYPERVISOR_DATA_SECTIONS
#define HYPERVISOR_PERCPU_SECTION
#endif

@@ -234,6 +242,8 @@ SECTIONS
_sdata = .;
RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)

+ HYPERVISOR_DATA_SECTIONS
+
/*
* Data written with the MMU off but read with the MMU on requires
* cache lines to be invalidated, discarding up to a Cache Writeback
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 5750ec34960e..9ba9db2aa7f8 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1602,6 +1602,14 @@ static int init_hyp_mode(void)
goto out_err;
}

+ err = create_hyp_mappings(kvm_ksym_ref(__hyp_data_ro_after_init_start),
+ kvm_ksym_ref(__hyp_data_ro_after_init_end),
+ PAGE_HYP_RO);
+ if (err) {
+ kvm_err("Cannot map .hyp.data..ro_after_init section\n");
+ goto out_err;
+ }
+
err = create_hyp_mappings(kvm_ksym_ref(__start_rodata),
kvm_ksym_ref(__end_rodata), PAGE_HYP_RO);
if (err) {
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
index bb2d986ff696..5d76ff2ba63e 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
@@ -16,4 +16,5 @@ SECTIONS {
HYP_SECTION_NAME(.data..percpu) : {
PERCPU_INPUT(L1_CACHE_BYTES)
}
+ HYP_SECTION(.data..ro_after_init)
}
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:35:11

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 06/24] kvm: arm64: Support per_cpu_ptr in nVHE hyp code

When compiling with __KVM_NVHE_HYPERVISOR__ redefine per_cpu_offset() to
__hyp_per_cpu_offset() which looks up the base of the nVHE per-CPU
region of the given cpu and computes its offset from the
.hyp.data..percpu section.

This enables use of per_cpu_ptr() helpers in nVHE hyp code. Until now
only this_cpu_ptr() was supported by setting TPIDR_EL2.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/percpu.h | 6 ++++++
arch/arm64/kernel/image-vars.h | 3 +++
arch/arm64/kvm/hyp/nvhe/Makefile | 3 ++-
arch/arm64/kvm/hyp/nvhe/percpu.c | 22 ++++++++++++++++++++++
4 files changed, 33 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kvm/hyp/nvhe/percpu.c

diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
index 1599e17379d8..8f1661603b78 100644
--- a/arch/arm64/include/asm/percpu.h
+++ b/arch/arm64/include/asm/percpu.h
@@ -239,6 +239,12 @@ PERCPU_RET_OP(add, add, ldadd)
#define this_cpu_cmpxchg_8(pcp, o, n) \
_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)

+#ifdef __KVM_NVHE_HYPERVISOR__
+extern unsigned long __hyp_per_cpu_offset(unsigned int cpu);
+#define __per_cpu_offset
+#define per_cpu_offset(cpu) __hyp_per_cpu_offset((cpu))
+#endif
+
#include <asm-generic/percpu.h>

/* Redefine macros for nVHE hyp under DEBUG_PREEMPT to avoid its dependencies. */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index c615b285ff5b..78a42a7cdb72 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -103,6 +103,9 @@ KVM_NVHE_ALIAS(gic_nonsecure_priorities);
KVM_NVHE_ALIAS(__start___kvm_ex_table);
KVM_NVHE_ALIAS(__stop___kvm_ex_table);

+/* Array containing bases of nVHE per-CPU memory regions. */
+KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
+
#endif /* CONFIG_KVM */

#endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index ddde15fe85f2..c45f440cce51 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -6,7 +6,8 @@
asflags-y := -D__KVM_NVHE_HYPERVISOR__
ccflags-y := -D__KVM_NVHE_HYPERVISOR__

-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o hyp-main.o
+obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
+ hyp-main.o percpu.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o

diff --git a/arch/arm64/kvm/hyp/nvhe/percpu.c b/arch/arm64/kvm/hyp/nvhe/percpu.c
new file mode 100644
index 000000000000..5fd0c5696907
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/percpu.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: David Brazdil <[email protected]>
+ */
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+
+unsigned long __hyp_per_cpu_offset(unsigned int cpu)
+{
+ unsigned long *cpu_base_array;
+ unsigned long this_cpu_base;
+
+ if (cpu >= ARRAY_SIZE(kvm_arm_hyp_percpu_base))
+ hyp_panic();
+
+ cpu_base_array = kern_hyp_va(&kvm_arm_hyp_percpu_base[0]);
+ this_cpu_base = kern_hyp_va(cpu_base_array[cpu]);
+ return this_cpu_base - (unsigned long)&__per_cpu_start;
+}
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:35:29

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 14/24] kvm: arm64: Add function to enter host from KVM nVHE hyp code

All nVHE hyp code is currently executed as handlers of host's HVCs. This
will change as nVHE starts intercepting host's PSCI CPU_ON SMCs. The
newly booted CPU will need to initialize EL2 state and then enter the
host. Add __host_enter function that branches into the existing
host state-restoring code after the trap handler would have returned.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/host.S | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index ed27f06a31ba..ff04d7115eab 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -41,6 +41,7 @@ SYM_FUNC_START(__host_exit)
bl handle_trap

/* Restore host regs x0-x17 */
+__host_enter_restore_full:
ldp x0, x1, [x29, #CPU_XREG_OFFSET(0)]
ldp x2, x3, [x29, #CPU_XREG_OFFSET(2)]
ldp x4, x5, [x29, #CPU_XREG_OFFSET(4)]
@@ -63,6 +64,14 @@ __host_enter_without_restoring:
sb
SYM_FUNC_END(__host_exit)

+/*
+ * void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
+ */
+SYM_FUNC_START(__host_enter)
+ mov x29, x0
+ b __host_enter_restore_full
+SYM_FUNC_END(__host_enter)
+
/*
* void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
*/
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:35:32

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 15/24] kvm: arm64: Bootstrap PSCI SMC handler in nVHE EL2

Add a handler of PSCI SMCs in nVHE hyp code. The handler is initialized
with the version used by the host's PSCI driver and the function IDs it
was configured with. If the SMC function ID matches one of the
configured PSCI calls (for v0.1) or falls into the PSCI function ID
range (for v0.2+), the SMC is handled by the PSCI handler. For now, all
SMCs return PSCI_RET_NOT_SUPPORTED.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_hyp.h | 4 ++
arch/arm64/kvm/arm.c | 13 ++++
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 ++
arch/arm64/kvm/hyp/nvhe/psci.c | 102 +++++++++++++++++++++++++++++
include/uapi/linux/psci.h | 1 +
6 files changed, 125 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kvm/hyp/nvhe/psci.c

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index a3289071f3d8..95a2bbbcc7e1 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -96,6 +96,10 @@ void deactivate_traps_vhe_put(void);

u64 __guest_enter(struct kvm_vcpu *vcpu);

+#ifdef __KVM_NVHE_HYPERVISOR__
+bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
+#endif
+
void __noreturn hyp_panic(void);
#ifdef __KVM_NVHE_HYPERVISOR__
void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 1a57b6025937..28e3bc056225 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -19,6 +19,7 @@
#include <linux/kvm_irqfd.h>
#include <linux/irqbypass.h>
#include <linux/sched/stat.h>
+#include <linux/psci.h>
#include <trace/events/kvm.h>

#define CREATE_TRACE_POINTS
@@ -1498,6 +1499,17 @@ static void init_cpu_logical_map(void)
CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
}

+static void init_psci(void)
+{
+ extern u32 kvm_nvhe_sym(kvm_host_psci_version);
+ extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];
+ int i;
+
+ CHOOSE_NVHE_SYM(kvm_host_psci_version) = psci_driver_version();
+ for (i = 0; i < PSCI_FN_MAX; ++i)
+ CHOOSE_NVHE_SYM(kvm_host_psci_function_id)[i] = psci_get_function_id(i);
+}
+
static int init_common_resources(void)
{
return kvm_set_ipa_limit();
@@ -1677,6 +1689,7 @@ static int init_hyp_mode(void)
}

init_cpu_logical_map();
+ init_psci();

return 0;

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index c45f440cce51..647b63337a51 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -7,7 +7,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__
ccflags-y := -D__KVM_NVHE_HYPERVISOR__

obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
- hyp-main.o percpu.o
+ hyp-main.o percpu.o psci.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 8661bc7deaa9..69f34d4f2773 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -134,6 +134,10 @@ static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
*/
skip_host_instruction();

+ /* Try to handle host's PSCI SMCs. */
+ if (kvm_host_psci_handler(host_ctxt))
+ return;
+
/* Forward SMC not handled in EL2 to EL3. */
forward_host_smc(host_ctxt);
}
diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
new file mode 100644
index 000000000000..82d3b2c89658
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: David Brazdil <[email protected]>
+ */
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <kvm/arm_hypercalls.h>
+#include <linux/arm-smccc.h>
+#include <linux/psci.h>
+#include <kvm/arm_psci.h>
+#include <uapi/linux/psci.h>
+
+/* Config options set by the host. */
+u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
+u32 kvm_host_psci_function_id[PSCI_FN_MAX];
+
+static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
+{
+ return host_ctxt->regs.regs[0];
+}
+
+static bool is_psci_0_1_call(u64 func_id)
+{
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(kvm_host_psci_function_id); ++i) {
+ if (func_id == kvm_host_psci_function_id[i])
+ return true;
+ }
+ return false;
+}
+
+static bool is_psci_0_2_fn_call(u64 func_id)
+{
+ u64 base = func_id & ~PSCI_0_2_FN_ID_MASK;
+
+ return base == PSCI_0_2_FN_BASE || base == PSCI_0_2_FN64_BASE;
+}
+
+static bool is_psci_call(u64 func_id)
+{
+ if (kvm_host_psci_version == PSCI_VERSION(0, 0))
+ return false;
+ else if (kvm_host_psci_version == PSCI_VERSION(0, 1))
+ return is_psci_0_1_call(func_id);
+ else
+ return is_psci_0_2_fn_call(func_id);
+}
+
+static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ return PSCI_RET_NOT_SUPPORTED;
+}
+
+static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ switch (func_id) {
+ default:
+ return PSCI_RET_NOT_SUPPORTED;
+ }
+}
+
+static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ int ret;
+
+ ret = psci_0_2_handler(func_id, host_ctxt);
+ if (ret != PSCI_RET_NOT_SUPPORTED)
+ return ret;
+
+ switch (func_id) {
+ default:
+ return PSCI_RET_NOT_SUPPORTED;
+ }
+}
+
+bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt)
+{
+ u64 func_id = get_psci_func_id(host_ctxt);
+ unsigned long ret;
+
+ if (!is_psci_call(func_id))
+ return false;
+
+ if (kvm_host_psci_version == PSCI_VERSION(0, 1))
+ ret = psci_0_1_handler(func_id, host_ctxt);
+ else if (kvm_host_psci_version == PSCI_VERSION(0, 2))
+ ret = psci_0_2_handler(func_id, host_ctxt);
+ else if (PSCI_VERSION_MAJOR(kvm_host_psci_version) >= 1)
+ ret = psci_1_0_handler(func_id, host_ctxt);
+ else
+ ret = PSCI_RET_NOT_SUPPORTED;
+
+ host_ctxt->regs.regs[0] = ret;
+ host_ctxt->regs.regs[1] = 0;
+ host_ctxt->regs.regs[2] = 0;
+ host_ctxt->regs.regs[3] = 0;
+ return true;
+}
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 2fcad1dd0b0e..0d52b8dbe8c2 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -29,6 +29,7 @@
#define PSCI_0_2_FN64_BASE \
(PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
#define PSCI_0_2_FN64(n) (PSCI_0_2_FN64_BASE + (n))
+#define PSCI_0_2_FN_ID_MASK 0xffff

#define PSCI_0_2_FN_PSCI_VERSION PSCI_0_2_FN(0)
#define PSCI_0_2_FN_CPU_SUSPEND PSCI_0_2_FN(1)
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:35:39

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 16/24] kvm: arm64: Add offset for hyp VA <-> PA conversion

Add a host-initialized constant to KVM nVHE hyp code for converting
between EL2 linear map virtual addresses and physical addresses.
Also add `__hyp_pa` macro that performs the conversion.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/arm.c | 15 +++++++++++++++
arch/arm64/kvm/hyp/nvhe/psci.c | 3 +++
2 files changed, 18 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 28e3bc056225..dc7d43d7785a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1484,6 +1484,20 @@ static inline void hyp_cpu_pm_exit(void)
}
#endif

+static void init_hyp_physvirt_offset(void)
+{
+ extern s64 kvm_nvhe_sym(hyp_physvirt_offset);
+ unsigned long kern_vaddr, hyp_vaddr, paddr;
+
+ /* Check that kvm_arm_hyp_percpu_base has been set. */
+ BUG_ON(kvm_arm_hyp_percpu_base[0] == 0);
+
+ kern_vaddr = kvm_arm_hyp_percpu_base[0];
+ hyp_vaddr = kern_hyp_va(kern_vaddr);
+ paddr = __pa(kern_vaddr);
+ CHOOSE_NVHE_SYM(hyp_physvirt_offset) = (s64)paddr - (s64)hyp_vaddr;
+}
+
static void init_cpu_logical_map(void)
{
extern u64 kvm_nvhe_sym(__cpu_logical_map)[NR_CPUS];
@@ -1688,6 +1702,7 @@ static int init_hyp_mode(void)
}
}

+ init_hyp_physvirt_offset();
init_cpu_logical_map();
init_psci();

diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index 82d3b2c89658..b0b5df590ba5 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -16,6 +16,9 @@
/* Config options set by the host. */
u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
u32 kvm_host_psci_function_id[PSCI_FN_MAX];
+s64 hyp_physvirt_offset;
+
+#define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)

static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
{
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:35:53

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 17/24] kvm: arm64: Add __hyp_pa_symbol helper macro

Add helper macro for computing the PA of a kernel symbol in nVHE hyp
code. This will be useful for computing the PA of a PSCI CPU_ON entry
point.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/psci.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index b0b5df590ba5..7510b9e174e9 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -20,6 +20,16 @@ s64 hyp_physvirt_offset;

#define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)

+#define __hyp_pa_symbol(sym) \
+ ({ \
+ extern char sym[]; \
+ unsigned long kern_va; \
+ \
+ asm volatile("ldr %0, =%1" : "=r" (kern_va) \
+ : "S" (sym)); \
+ kern_va - kimage_voffset; \
+ })
+
static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
{
return host_ctxt->regs.regs[0];
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:36:05

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 21/24] kvm: arm64: Add kvm-arm.protected early kernel parameter

Add an early parameter that allows users to opt into protected KVM mode
when using the nVHE hypervisor. In this mode, guest state will be kept
private from the host. This will primarily involve enabling stage-2
address translation for the host, restricting DMA to host memory, and
filtering host SMCs.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/virt.h | 9 +++++++++
arch/arm64/kvm/arm.c | 23 ++++++++++++++++++++++-
2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 6069be50baf9..2c3124512c00 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -65,6 +65,8 @@ extern u32 __boot_cpu_mode[2];
void __hyp_set_vectors(phys_addr_t phys_vector_base);
void __hyp_reset_vectors(void);

+DECLARE_STATIC_KEY_FALSE(kvm_protected_mode);
+
/* Reports the availability of HYP mode */
static inline bool is_hyp_mode_available(void)
{
@@ -97,6 +99,13 @@ static __always_inline bool has_vhe(void)
return cpus_have_final_cap(ARM64_HAS_VIRT_HOST_EXTN);
}

+static __always_inline bool is_kvm_protected_mode(void)
+{
+ return IS_ENABLED(CONFIG_KVM) &&
+ (is_nvhe_hyp_code() || !is_kernel_in_hyp_mode()) &&
+ static_branch_likely(&kvm_protected_mode);
+}
+
#endif /* __ASSEMBLY__ */

#endif /* ! __ASM__VIRT_H */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a931253ebb61..452a01afaf33 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -47,6 +47,8 @@
__asm__(".arch_extension virt");
#endif

+DEFINE_STATIC_KEY_FALSE(kvm_protected_mode);
+
DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);

static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
@@ -1796,6 +1798,11 @@ int kvm_arch_init(void *opaque)
return -ENODEV;
}

+ if (in_hyp_mode && static_branch_unlikely(&kvm_protected_mode)) {
+ kvm_pr_unimpl("VHE protected mode unsupported, not initializing\n");
+ return -ENODEV;
+ }
+
if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
cpus_have_final_cap(ARM64_WORKAROUND_1508412))
kvm_info("Guests without required CPU erratum workarounds can deadlock system!\n" \
@@ -1827,7 +1834,9 @@ int kvm_arch_init(void *opaque)
if (err)
goto out_hyp;

- if (in_hyp_mode)
+ if (is_kvm_protected_mode())
+ kvm_info("Protected nVHE mode initialized successfully\n");
+ else if (in_hyp_mode)
kvm_info("VHE mode initialized successfully\n");
else
kvm_info("Hyp mode initialized successfully\n");
@@ -1848,6 +1857,18 @@ void kvm_arch_exit(void)
kvm_perf_teardown();
}

+static int __init early_kvm_protected_cfg(char *buf)
+{
+ bool val;
+ int err;
+
+ err = strtobool(buf, &val);
+ if (!err && val)
+ static_branch_enable(&kvm_protected_mode);
+ return err;
+}
+early_param("kvm-arm.protected", early_kvm_protected_cfg);
+
static int arm_init(void)
{
int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:37:21

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 11/24] kvm: arm64: Add SMC handler in nVHE EL2

Add handler of host SMCs in KVM nVHE trap handler. Forward all SMCs to
EL3 and propagate the result back to EL1. This is done in preparation
for validating host SMCs in KVM nVHE protected mode.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 35 ++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 19332c20fcde..8661bc7deaa9 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -106,6 +106,38 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
host_ctxt->regs.regs[1] = ret;
}

+static void skip_host_instruction(void)
+{
+ write_sysreg_el2(read_sysreg_el2(SYS_ELR) + 4, SYS_ELR);
+}
+
+static void forward_host_smc(struct kvm_cpu_context *host_ctxt)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_smc(host_ctxt->regs.regs[0], host_ctxt->regs.regs[1],
+ host_ctxt->regs.regs[2], host_ctxt->regs.regs[3],
+ host_ctxt->regs.regs[4], host_ctxt->regs.regs[5],
+ host_ctxt->regs.regs[6], host_ctxt->regs.regs[7],
+ &res);
+ host_ctxt->regs.regs[0] = res.a0;
+ host_ctxt->regs.regs[1] = res.a1;
+ host_ctxt->regs.regs[2] = res.a2;
+ host_ctxt->regs.regs[3] = res.a3;
+}
+
+static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
+{
+ /*
+ * Unlike HVC, the return address of an SMC is the instruction's PC.
+ * Move the return address past the instruction.
+ */
+ skip_host_instruction();
+
+ /* Forward SMC not handled in EL2 to EL3. */
+ forward_host_smc(host_ctxt);
+}
+
void handle_trap(struct kvm_cpu_context *host_ctxt)
{
u64 esr = read_sysreg_el2(SYS_ESR);
@@ -114,6 +146,9 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
case ESR_ELx_EC_HVC64:
handle_host_hcall(host_ctxt);
break;
+ case ESR_ELx_EC_SMC64:
+ handle_host_smc(host_ctxt);
+ break;
default:
hyp_panic();
}
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:37:57

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 20/24] kvm: arm64: Intercept host's CPU_SUSPEND PSCI SMCs

Add a handler of CPU_SUSPEND host PSCI SMCs. The SMC can either enter
a sleep state indistinguishable from a WFI or a deeper sleep state that
behaves like a CPU_OFF+CPU_ON.

The handler saves r0,pc of the host and makes the same call to EL3 with
the hyp CPU entry point. It either returns back to the handler and then
back to the host, or wakes up into the entry point and initializes EL2
state before dropping back to EL1.

There is a simple atomic lock around the reset state struct to protect
from races with CPU_ON. A well-behaved host should never run CPU_ON
against an already online core, and the kernel indeed does not allow
that, so if the core sees its reset state struct locked, it will return
a non-spec error code PENDING_ON. This protects the hypervisor state and
avoids the need for more complicated locking and/or tracking power state
of individual cores.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/psci.c | 39 +++++++++++++++++++++++++++++++++-
1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index f9b82a87bf44..ec00036a1613 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -127,6 +127,39 @@ static void release_reset_state(struct kvm_host_psci_state *cpu_state)
atomic_set_release(&cpu_state->pending_on, 0);
}

+static int psci_cpu_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ u64 power_state = host_ctxt->regs.regs[1];
+ unsigned long pc = host_ctxt->regs.regs[2];
+ unsigned long r0 = host_ctxt->regs.regs[3];
+ struct kvm_host_psci_state *cpu_state;
+ struct kvm_nvhe_init_params *cpu_params;
+ int ret;
+
+ cpu_state = this_cpu_ptr(&kvm_host_psci_state);
+ cpu_params = this_cpu_ptr(&kvm_init_params);
+
+ /*
+ * Lock the reset state struct. This fails if the host has concurrently
+ * called CPU_ON with this CPU as target. The kernel keeps track of
+ * online CPUs, so that should never happen. If it does anyway, return
+ * a non-spec error. This avoids the need for spinlocks.
+ */
+ if (!try_acquire_reset_state(cpu_state, pc, r0))
+ return PSCI_RET_ALREADY_ON;
+
+ /*
+ * Will either return if shallow sleep state, or wake up into the entry
+ * point if it is a deep sleep state.
+ */
+ ret = psci_call(func_id, power_state,
+ __hyp_pa_symbol(__kvm_hyp_cpu_entry),
+ __hyp_pa(cpu_params));
+
+ release_reset_state(cpu_state);
+ return ret;
+}
+
static int psci_cpu_on(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
u64 mpidr = host_ctxt->regs.regs[1];
@@ -180,7 +213,9 @@ asmlinkage void __noreturn kvm_host_psci_cpu_entry(void)

static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
- if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
+ if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_SUSPEND])
+ return psci_cpu_suspend(func_id, host_ctxt);
+ else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
return psci_forward(host_ctxt);
else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_ON])
return psci_cpu_on(func_id, host_ctxt);
@@ -204,6 +239,8 @@ static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
case PSCI_0_2_FN_SYSTEM_RESET:
psci_forward_noreturn(host_ctxt);
unreachable();
+ case PSCI_0_2_FN64_CPU_SUSPEND:
+ return psci_cpu_suspend(func_id, host_ctxt);
case PSCI_0_2_FN64_CPU_ON:
return psci_cpu_on(func_id, host_ctxt);
default:
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:39:25

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 18/24] kvm: arm64: Forward safe PSCI SMCs coming from host

Forward the following PSCI SMCs issued by host to EL3 as they do not
require the hypervisor's intervention. This assumes that EL3 correctly
implements the PSCI specification.

Only function IDs implemented in Linux are included.

Where both 32-bit and 64-bit variants exist, it is assumed that the host
will always use the 64-bit variant.

* SMCs that only return information about the system
* PSCI_VERSION - PSCI version implemented by EL3
* PSCI_FEATURES - optional features supported by EL3
* AFFINITY_INFO - power state of core/cluster
* MIGRATE_INFO_TYPE - whether Trusted OS can be migrated
* MIGRATE_INFO_UP_CPU - resident core of Trusted OS
* operations which do not affect the hypervisor
* MIGRATE - migrate Trusted OS to a different core
* SET_SUSPEND_MODE - toggle OS-initiated mode
* system shutdown/reset
* SYSTEM_OFF
* SYSTEM_RESET
* SYSTEM_RESET2

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/psci.c | 43 +++++++++++++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index 7510b9e174e9..05a34a152069 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -63,14 +63,51 @@ static bool is_psci_call(u64 func_id)
return is_psci_0_2_fn_call(func_id);
}

+static unsigned long psci_call(unsigned long fn, unsigned long arg0,
+ unsigned long arg1, unsigned long arg2)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_smc(fn, arg0, arg1, arg2, &res);
+ return res.a0;
+}
+
+static unsigned long psci_forward(struct kvm_cpu_context *host_ctxt)
+{
+ return psci_call(host_ctxt->regs.regs[0], host_ctxt->regs.regs[1],
+ host_ctxt->regs.regs[2], host_ctxt->regs.regs[3]);
+}
+
+static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *host_ctxt)
+{
+ psci_forward(host_ctxt);
+ hyp_panic(); /* unreachable */
+}
+
static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
- return PSCI_RET_NOT_SUPPORTED;
+ if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
+ return psci_forward(host_ctxt);
+ else if (func_id == kvm_host_psci_function_id[PSCI_FN_MIGRATE])
+ return psci_forward(host_ctxt);
+ else
+ return PSCI_RET_NOT_SUPPORTED;
}

static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
switch (func_id) {
+ case PSCI_0_2_FN_PSCI_VERSION:
+ case PSCI_0_2_FN_CPU_OFF:
+ case PSCI_0_2_FN64_AFFINITY_INFO:
+ case PSCI_0_2_FN64_MIGRATE:
+ case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
+ case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
+ return psci_forward(host_ctxt);
+ case PSCI_0_2_FN_SYSTEM_OFF:
+ case PSCI_0_2_FN_SYSTEM_RESET:
+ psci_forward_noreturn(host_ctxt);
+ unreachable();
default:
return PSCI_RET_NOT_SUPPORTED;
}
@@ -85,6 +122,10 @@ static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_
return ret;

switch (func_id) {
+ case PSCI_1_0_FN_PSCI_FEATURES:
+ case PSCI_1_0_FN_SET_SUSPEND_MODE:
+ case PSCI_1_1_FN64_SYSTEM_RESET2:
+ return psci_forward(host_ctxt);
default:
return PSCI_RET_NOT_SUPPORTED;
}
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:39:32

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 08/24] kvm: arm64: Move hyp-init params to a per-CPU struct

Once we start initializing KVM on newly booted cores before the rest of
the kernel, parameters to __do_hyp_init will need to be provided by EL2
rather than EL1. At that point it will not be possible to pass its four
arguments directly because PSCI_CPU_ON only supports one context
argument.

Refactor __do_hyp_init to accept its parameters in a struct. This
prepares the code for KVM booting cores as well as removes any limits on
the number of __do_hyp_init arguments.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 7 +++++++
arch/arm64/include/asm/kvm_hyp.h | 4 ++++
arch/arm64/kernel/asm-offsets.c | 4 ++++
arch/arm64/kvm/arm.c | 26 ++++++++++++++------------
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 21 ++++++++++-----------
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 2 ++
6 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 54387ccd1ab2..a49a87a186c3 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -150,6 +150,13 @@ extern void *__vhe_undefined_symbol;

#endif

+struct kvm_nvhe_init_params {
+ phys_addr_t pgd_ptr;
+ unsigned long tpidr_el2;
+ unsigned long hyp_stack_ptr;
+ unsigned long vector_ptr;
+};
+
/* Translate a kernel address @ptr into its equivalent linear mapping */
#define kvm_ksym_ref(ptr) \
({ \
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 6b664de5ec1f..a3289071f3d8 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -15,6 +15,10 @@
DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
DECLARE_PER_CPU(unsigned long, kvm_hyp_vector);

+#ifdef __KVM_NVHE_HYPERVISOR__
+DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+#endif
+
#define read_sysreg_elx(r,nvh,vh) \
({ \
u64 reg; \
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 7d32fc959b1a..0cbb86135c7c 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -110,6 +110,10 @@ int main(void)
DEFINE(CPU_APGAKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APGAKEYLO_EL1]));
DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
DEFINE(HOST_DATA_CONTEXT, offsetof(struct kvm_host_data, host_ctxt));
+ DEFINE(NVHE_INIT_PGD_PTR, offsetof(struct kvm_nvhe_init_params, pgd_ptr));
+ DEFINE(NVHE_INIT_TPIDR_EL2, offsetof(struct kvm_nvhe_init_params, tpidr_el2));
+ DEFINE(NVHE_INIT_STACK_PTR, offsetof(struct kvm_nvhe_init_params, hyp_stack_ptr));
+ DEFINE(NVHE_INIT_VECTOR_PTR, offsetof(struct kvm_nvhe_init_params, vector_ptr));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b85b4294b72d..1a57b6025937 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -50,6 +50,7 @@ DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);

static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
+DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);

/* The VMID used in the VTTBR */
static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
@@ -1331,10 +1332,7 @@ static int kvm_map_vectors(void)

static void cpu_init_hyp_mode(void)
{
- phys_addr_t pgd_ptr;
- unsigned long hyp_stack_ptr;
- unsigned long vector_ptr;
- unsigned long tpidr_el2;
+ struct kvm_nvhe_init_params *params = this_cpu_ptr_nvhe_sym(kvm_init_params);
struct arm_smccc_res res;

/* Switch from the HYP stub to our own HYP init vector */
@@ -1345,13 +1343,18 @@ static void cpu_init_hyp_mode(void)
* kernel's mapping to the linear mapping, and store it in tpidr_el2
* so that we can use adr_l to access per-cpu variables in EL2.
*/
- tpidr_el2 = (unsigned long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
- (unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));
+ params->tpidr_el2 = (unsigned long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
+ (unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));

- pgd_ptr = kvm_mmu_get_httbr();
- hyp_stack_ptr = __this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE;
- hyp_stack_ptr = kern_hyp_va(hyp_stack_ptr);
- vector_ptr = (unsigned long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
+ params->pgd_ptr = kvm_mmu_get_httbr();
+ params->vector_ptr = (unsigned long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
+ params->hyp_stack_ptr = kern_hyp_va(__this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE);
+
+ /*
+ * Flush the init params from the data cache because the struct will
+ * be read from while the MMU is off.
+ */
+ __flush_dcache_area(params, sizeof(*params));

/*
* Call initialization code, and switch to the full blown HYP code.
@@ -1360,8 +1363,7 @@ static void cpu_init_hyp_mode(void)
* cpus_have_const_cap() wrapper.
*/
BUG_ON(!system_capabilities_finalized());
- arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init),
- pgd_ptr, tpidr_el2, hyp_stack_ptr, vector_ptr, &res);
+ arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init), virt_to_phys(params), &res);
WARN_ON(res.a0 != SMCCC_RET_SUCCESS);

/*
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index 96e70f976ff5..6f3ac5d428ec 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -47,10 +47,7 @@ __invalid:

/*
* x0: SMCCC function ID
- * x1: HYP pgd
- * x2: per-CPU offset
- * x3: HYP stack
- * x4: HYP vectors
+ * x1: struct kvm_nvhe_init_params PA
*/
__do_hyp_init:
/* Check for a stub HVC call */
@@ -71,10 +68,16 @@ __do_hyp_init:
mov x0, #SMCCC_RET_NOT_SUPPORTED
eret

-1:
- /* Set tpidr_el2 for use by HYP to free a register */
- msr tpidr_el2, x2
+1: ldr x0, [x1, #NVHE_INIT_TPIDR_EL2]
+ msr tpidr_el2, x0

+ ldr x0, [x1, #NVHE_INIT_STACK_PTR]
+ mov sp, x0
+
+ ldr x0, [x1, #NVHE_INIT_VECTOR_PTR]
+ msr vbar_el2, x0
+
+ ldr x1, [x1, #NVHE_INIT_PGD_PTR]
phys_to_ttbr x0, x1
alternative_if ARM64_HAS_CNP
orr x0, x0, #TTBR_CNP_BIT
@@ -134,10 +137,6 @@ alternative_else_nop_endif
msr sctlr_el2, x0
isb

- /* Set the stack and new vectors */
- mov sp, x3
- msr vbar_el2, x4
-
/* Hello, World! */
mov x0, #SMCCC_RET_SUCCESS
eret
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index e2eafe2c93af..411b0f652417 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -14,6 +14,8 @@

#include <kvm/arm_hypercalls.h>

+DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+
static void handle_host_hcall(unsigned long func_id,
struct kvm_cpu_context *host_ctxt)
{
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:39:37

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 09/24] kvm: arm64: Refactor handle_trap to use a switch

Small refactor so that nVHE's handle_trap uses a switch on the Exception
Class value of ESR_EL2 in preparation for adding a handler of SMC32/64.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 411b0f652417..19332c20fcde 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -16,9 +16,9 @@

DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);

-static void handle_host_hcall(unsigned long func_id,
- struct kvm_cpu_context *host_ctxt)
+static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
{
+ unsigned long func_id = host_ctxt->regs.regs[0];
unsigned long ret = 0;

switch (func_id) {
@@ -109,11 +109,12 @@ static void handle_host_hcall(unsigned long func_id,
void handle_trap(struct kvm_cpu_context *host_ctxt)
{
u64 esr = read_sysreg_el2(SYS_ESR);
- unsigned long func_id;

- if (ESR_ELx_EC(esr) != ESR_ELx_EC_HVC64)
+ switch (ESR_ELx_EC(esr)) {
+ case ESR_ELx_EC_HVC64:
+ handle_host_hcall(host_ctxt);
+ break;
+ default:
hyp_panic();
-
- func_id = host_ctxt->regs.regs[0];
- handle_host_hcall(func_id, host_ctxt);
+ }
}
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:39:38

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 10/24] kvm: arm64: Extract parts of el2_setup into a macro

When the a CPU is booted in EL2, the kernel checks for VHE support and
initializes the CPU core accordingly. For nVHE it also installs the stub
vectors and drops down to EL1.

Once KVM gains the ability to boot cores without going through the
kernel entry point, it will need to initialize the CPU the same way.
Extract the relevant bits of el2_setup into init_el2_state macro
with an argument specifying whether to initialize for VHE or nVHE.

No functional change. Size of el2_setup increased by 148 bytes due
to duplication.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 128 ++++++++++++++++++++++++++++
arch/arm64/kernel/head.S | 140 +++----------------------------
2 files changed, 141 insertions(+), 127 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index a49a87a186c3..893327d1e449 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -331,6 +331,134 @@ extern char __smccc_workaround_1_smc[__SMCCC_WORKAROUND_1_SMC_SZ];
msr sp_el0, \tmp
.endm

+.macro init_el2_state mode
+
+.ifnes "\mode", "vhe"
+.ifnes "\mode", "nvhe"
+.error "Invalid 'mode' argument"
+.endif
+.endif
+
+ mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
+ msr sctlr_el2, x0
+ isb
+
+ /*
+ * Allow Non-secure EL1 and EL0 to access physical timer and counter.
+ * This is not necessary for VHE, since the host kernel runs in EL2,
+ * and EL0 accesses are configured in the later stage of boot process.
+ * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
+ * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
+ * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
+ * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
+ * EL2.
+ */
+.ifeqs "\mode", "nvhe"
+ mrs x0, cnthctl_el2
+ orr x0, x0, #3 // Enable EL1 physical timers
+ msr cnthctl_el2, x0
+.endif
+ msr cntvoff_el2, xzr // Clear virtual offset
+
+#ifdef CONFIG_ARM_GIC_V3
+ /* GICv3 system register access */
+ mrs x0, id_aa64pfr0_el1
+ ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
+ cbz x0, 3f
+
+ mrs_s x0, SYS_ICC_SRE_EL2
+ orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
+ orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
+ msr_s SYS_ICC_SRE_EL2, x0
+ isb // Make sure SRE is now set
+ mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
+ tbz x0, #0, 3f // and check that it sticks
+ msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
+3:
+#endif
+
+ /* Populate ID registers. */
+ mrs x0, midr_el1
+ mrs x1, mpidr_el1
+ msr vpidr_el2, x0
+ msr vmpidr_el2, x1
+
+#ifdef CONFIG_COMPAT
+ msr hstr_el2, xzr // Disable CP15 traps to EL2
+#endif
+
+ /* EL2 debug */
+ mrs x1, id_aa64dfr0_el1
+ sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
+ cmp x0, #1
+ b.lt 4f // Skip if no PMU present
+ mrs x0, pmcr_el0 // Disable debug access traps
+ ubfx x0, x0, #11, #5 // to EL2 and allow access to
+4:
+ csel x3, xzr, x0, lt // all PMU counters from EL1
+
+ /* Statistical profiling */
+ ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
+ cbz x0, 7f // Skip if SPE not present
+.ifeqs "\mode", "nvhe"
+ mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
+ and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
+ cbnz x4, 5f // then permit sampling of physical
+ mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
+ 1 << SYS_PMSCR_EL2_PA_SHIFT)
+ msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
+5:
+ mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
+ orr x3, x3, x1 // If we don't have VHE, then
+ b 7f // use EL1&0 translation.
+.endif
+ orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
+7:
+ msr mdcr_el2, x3 // Configure debug traps
+
+ /* LORegions */
+ mrs x1, id_aa64mmfr1_el1
+ ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
+ cbz x0, 1f
+ msr_s SYS_LORC_EL1, xzr
+1:
+
+ /* Stage-2 translation */
+ msr vttbr_el2, xzr
+
+.ifeqs "\mode", "nvhe"
+ /*
+ * When VHE is not in use, early init of EL2 and EL1 needs to be
+ * done here.
+ * When VHE _is_ in use, EL1 will not be used in the host and
+ * requires no configuration, and all non-hyp-specific EL2 setup
+ * will be done via the _EL1 system register aliases in __cpu_setup.
+ */
+ mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
+ msr sctlr_el1, x0
+
+ /* Coprocessor traps. */
+ mov x0, #0x33ff
+ msr cptr_el2, x0 // Disable copro. traps to EL2
+
+ /* SVE register access */
+ mrs x1, id_aa64pfr0_el1
+ ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
+ cbz x1, 7f
+
+ bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
+ msr cptr_el2, x0 // Disable copro. traps to EL2
+ isb
+ mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
+ msr_s SYS_ZCR_EL2, x1 // length for EL1.
+
+ /* spsr */
+7: mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
+ PSR_MODE_EL1h)
+ msr spsr_el2, x0
+.endif
+.endm
+
#endif

#endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d8d9caf02834..e7270b63abed 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -25,6 +25,7 @@
#include <asm/image.h>
#include <asm/kernel-pgtable.h>
#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
#include <asm/page.h>
@@ -499,153 +500,38 @@ SYM_FUNC_START(el2_setup)
isb
ret

-1: mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
- msr sctlr_el2, x0
-
+1:
#ifdef CONFIG_ARM64_VHE
/*
- * Check for VHE being present. For the rest of the EL2 setup,
- * x2 being non-zero indicates that we do have VHE, and that the
- * kernel is intended to run at EL2.
+ * Check for VHE being present. x2 being non-zero indicates that we
+ * do have VHE, and that the kernel is intended to run at EL2.
*/
mrs x2, id_aa64mmfr1_el1
ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
-#else
- mov x2, xzr
-#endif
+ cbz x2, el2_setup_nvhe

- /* Hyp configuration. */
- mov_q x0, HCR_HOST_NVHE_FLAGS
- cbz x2, set_hcr
mov_q x0, HCR_HOST_VHE_FLAGS
-set_hcr:
msr hcr_el2, x0
isb

- /*
- * Allow Non-secure EL1 and EL0 to access physical timer and counter.
- * This is not necessary for VHE, since the host kernel runs in EL2,
- * and EL0 accesses are configured in the later stage of boot process.
- * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
- * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
- * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
- * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
- * EL2.
- */
- cbnz x2, 1f
- mrs x0, cnthctl_el2
- orr x0, x0, #3 // Enable EL1 physical timers
- msr cnthctl_el2, x0
-1:
- msr cntvoff_el2, xzr // Clear virtual offset
-
-#ifdef CONFIG_ARM_GIC_V3
- /* GICv3 system register access */
- mrs x0, id_aa64pfr0_el1
- ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
- cbz x0, 3f
-
- mrs_s x0, SYS_ICC_SRE_EL2
- orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
- orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
- msr_s SYS_ICC_SRE_EL2, x0
- isb // Make sure SRE is now set
- mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
- tbz x0, #0, 3f // and check that it sticks
- msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
-
-3:
-#endif
-
- /* Populate ID registers. */
- mrs x0, midr_el1
- mrs x1, mpidr_el1
- msr vpidr_el2, x0
- msr vmpidr_el2, x1
-
-#ifdef CONFIG_COMPAT
- msr hstr_el2, xzr // Disable CP15 traps to EL2
-#endif
-
- /* EL2 debug */
- mrs x1, id_aa64dfr0_el1
- sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
- cmp x0, #1
- b.lt 4f // Skip if no PMU present
- mrs x0, pmcr_el0 // Disable debug access traps
- ubfx x0, x0, #11, #5 // to EL2 and allow access to
-4:
- csel x3, xzr, x0, lt // all PMU counters from EL1
-
- /* Statistical profiling */
- ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
- cbz x0, 7f // Skip if SPE not present
- cbnz x2, 6f // VHE?
- mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
- and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
- cbnz x4, 5f // then permit sampling of physical
- mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
- 1 << SYS_PMSCR_EL2_PA_SHIFT)
- msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
-5:
- mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
- orr x3, x3, x1 // If we don't have VHE, then
- b 7f // use EL1&0 translation.
-6: // For VHE, use EL2 translation
- orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
-7:
- msr mdcr_el2, x3 // Configure debug traps
-
- /* LORegions */
- mrs x1, id_aa64mmfr1_el1
- ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
- cbz x0, 1f
- msr_s SYS_LORC_EL1, xzr
-1:
-
- /* Stage-2 translation */
- msr vttbr_el2, xzr
-
- cbz x2, install_el2_stub
+ init_el2_state vhe

mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
isb
ret
+#endif

-SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL)
- /*
- * When VHE is not in use, early init of EL2 and EL1 needs to be
- * done here.
- * When VHE _is_ in use, EL1 will not be used in the host and
- * requires no configuration, and all non-hyp-specific EL2 setup
- * will be done via the _EL1 system register aliases in __cpu_setup.
- */
- mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
- msr sctlr_el1, x0
-
- /* Coprocessor traps. */
- mov x0, #0x33ff
- msr cptr_el2, x0 // Disable copro. traps to EL2
-
- /* SVE register access */
- mrs x1, id_aa64pfr0_el1
- ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
- cbz x1, 7f
-
- bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
- msr cptr_el2, x0 // Disable copro. traps to EL2
+SYM_INNER_LABEL(el2_setup_nvhe, SYM_L_LOCAL)
+ mov_q x0, HCR_HOST_NVHE_FLAGS
+ msr hcr_el2, x0
isb
- mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
- msr_s SYS_ZCR_EL2, x1 // length for EL1.
+
+ init_el2_state nvhe

/* Hypervisor stub */
-7: adr_l x0, __hyp_stub_vectors
+ adr_l x0, __hyp_stub_vectors
msr vbar_el2, x0

- /* spsr */
- mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
- PSR_MODE_EL1h)
- msr spsr_el2, x0
msr elr_el2, lr
mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
eret
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:40:25

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 13/24] kvm: arm64: Add CPU entry point in nVHE hyp

When nVHE hyp starts interception host's PSCI CPU_ON SMCs, it will need
to install KVM on the newly booted CPU before returning to the host. Add
an entry point which expects the same kvm_nvhe_init_params struct as the
__kvm_hyp_init HVC in the CPU_ON context argument (x0).

The entry point initializes EL2 state with the same init_el2_state macro
used by the kernel's entry point. It then initializes KVM using the same
helper function used in the __kvm_hyp_init HVC.

When done, the entry point branches to a function provided in the init
params.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 30 ++++++++++++++++++++++++++++++
3 files changed, 32 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 893327d1e449..efb4872bb29f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -155,6 +155,7 @@ struct kvm_nvhe_init_params {
unsigned long tpidr_el2;
unsigned long hyp_stack_ptr;
unsigned long vector_ptr;
+ unsigned long psci_cpu_entry_fn;
};

/* Translate a kernel address @ptr into its equivalent linear mapping */
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 0cbb86135c7c..ffc84e68ad97 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -114,6 +114,7 @@ int main(void)
DEFINE(NVHE_INIT_TPIDR_EL2, offsetof(struct kvm_nvhe_init_params, tpidr_el2));
DEFINE(NVHE_INIT_STACK_PTR, offsetof(struct kvm_nvhe_init_params, hyp_stack_ptr));
DEFINE(NVHE_INIT_VECTOR_PTR, offsetof(struct kvm_nvhe_init_params, vector_ptr));
+ DEFINE(NVHE_INIT_PSCI_CPU_ENTRY_FN, offsetof(struct kvm_nvhe_init_params, psci_cpu_entry_fn));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index 1697d25756e9..f999a35b2c8c 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -6,6 +6,7 @@

#include <linux/arm-smccc.h>
#include <linux/linkage.h>
+#include <linux/irqchip/arm-gic-v3.h>

#include <asm/alternative.h>
#include <asm/assembler.h>
@@ -159,6 +160,35 @@ alternative_else_nop_endif
ret
SYM_CODE_END(___kvm_hyp_init)

+SYM_CODE_START(__kvm_hyp_cpu_entry)
+ msr SPsel, #1 // We want to use SP_EL{1,2}
+
+ /*
+ * Check that the core was booted in EL2. Loop indefinitely if not
+ * because it cannot be safely given to the host without installing KVM.
+ */
+ mrs x1, CurrentEL
+ cmp x1, #CurrentEL_EL2
+ b.ne .
+
+ /* Initialize EL2 CPU state to sane values. */
+ mov x29, x0
+ init_el2_state nvhe
+ mov x0, x29
+
+ /*
+ * Load hyp VA of C entry function. Must do so before switching on the
+ * MMU because the struct pointer is PA and not identity-mapped in hyp.
+ */
+ ldr x29, [x0, #NVHE_INIT_PSCI_CPU_ENTRY_FN]
+
+ /* Enable MMU, set vectors and stack. */
+ bl ___kvm_hyp_init
+
+ /* Leave idmap. */
+ br x29
+SYM_CODE_END(__kvm_hyp_cpu_entry)
+
SYM_CODE_START(__kvm_handle_stub_hvc)
cmp x0, #HVC_SOFT_RESTART
b.ne 1f
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:40:26

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 12/24] kvm: arm64: Extract __do_hyp_init into a helper function

In preparation for adding a CPU entry point in nVHE hyp code, extract
most of __do_hyp_init hypervisor initialization code into a common
helper function. This will be invoked by the entry point to install KVM
on the newly booted CPU.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 39 +++++++++++++++++++++---------
1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index 6f3ac5d428ec..1697d25756e9 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -68,16 +68,35 @@ __do_hyp_init:
mov x0, #SMCCC_RET_NOT_SUPPORTED
eret

-1: ldr x0, [x1, #NVHE_INIT_TPIDR_EL2]
- msr tpidr_el2, x0
+1: mov x0, x1
+ mov x4, lr
+ bl ___kvm_hyp_init
+ mov lr, x4

- ldr x0, [x1, #NVHE_INIT_STACK_PTR]
- mov sp, x0
+ /* Hello, World! */
+ mov x0, #SMCCC_RET_SUCCESS
+ eret
+SYM_CODE_END(__kvm_hyp_init)
+
+/*
+ * Initialize the hypervisor in EL2.
+ *
+ * Only uses x0..x3 so as to not clobber callee-saved SMCCC registers
+ * and leave x4 for the caller.
+ *
+ * x0: struct kvm_nvhe_init_params PA
+ */
+SYM_CODE_START(___kvm_hyp_init)
+ ldr x1, [x0, #NVHE_INIT_TPIDR_EL2]
+ msr tpidr_el2, x1
+
+ ldr x1, [x0, #NVHE_INIT_STACK_PTR]
+ mov sp, x1

- ldr x0, [x1, #NVHE_INIT_VECTOR_PTR]
- msr vbar_el2, x0
+ ldr x1, [x0, #NVHE_INIT_VECTOR_PTR]
+ msr vbar_el2, x1

- ldr x1, [x1, #NVHE_INIT_PGD_PTR]
+ ldr x1, [x0, #NVHE_INIT_PGD_PTR]
phys_to_ttbr x0, x1
alternative_if ARM64_HAS_CNP
orr x0, x0, #TTBR_CNP_BIT
@@ -137,10 +156,8 @@ alternative_else_nop_endif
msr sctlr_el2, x0
isb

- /* Hello, World! */
- mov x0, #SMCCC_RET_SUCCESS
- eret
-SYM_CODE_END(__kvm_hyp_init)
+ ret
+SYM_CODE_END(___kvm_hyp_init)

SYM_CODE_START(__kvm_handle_stub_hvc)
cmp x0, #HVC_SOFT_RESTART
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:40:55

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 24/24] kvm: arm64: Fix EL2 mode availability checks

With protected nVHE hyp code interception host's PSCI CPU_ON/OFF/SUSPEND
SMCs, from the host's perspective new CPUs start booting in EL1 while
previously they would have booted in EL2. The kernel logic which keeps
track of the mode CPUs were booted in needs to be adjusted to account
for this fact.

Add a static key enabled if KVM protected nVHE initialization is
successful.

When the key is enabled, is_hyp_mode_available continues to report
`true` because its users either treat it a check whether KVM will be /
has been initialized, or whether stub HVCs can be made (eg. hibernate).

is_hyp_mode_mismatched is changed to report `false` when the key is
enabled. That's because all cores' modes matched at the point of KVM
init and KVM will not allow cores not present at init to boot. That
said, the function is never used after KVM is initialized.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/virt.h | 17 +++++++++++++++++
arch/arm64/kvm/arm.c | 9 ++++++---
2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 2c3124512c00..8159d6010f4b 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -66,10 +66,19 @@ void __hyp_set_vectors(phys_addr_t phys_vector_base);
void __hyp_reset_vectors(void);

DECLARE_STATIC_KEY_FALSE(kvm_protected_mode);
+DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);

/* Reports the availability of HYP mode */
static inline bool is_hyp_mode_available(void)
{
+ /*
+ * If KVM protected mode is initialized, all CPUs must have been booted
+ * in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
+ */
+ if (IS_ENABLED(CONFIG_KVM) &&
+ static_branch_likely(&kvm_protected_mode_initialized))
+ return true;
+
return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
__boot_cpu_mode[1] == BOOT_CPU_MODE_EL2);
}
@@ -77,6 +86,14 @@ static inline bool is_hyp_mode_available(void)
/* Check if the bootloader has booted CPUs in different modes */
static inline bool is_hyp_mode_mismatched(void)
{
+ /*
+ * If KVM protected mode is initialized, all CPUs must have been booted
+ * in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
+ */
+ if (IS_ENABLED(CONFIG_KVM) &&
+ static_branch_likely(&kvm_protected_mode_initialized))
+ return false;
+
return __boot_cpu_mode[0] != __boot_cpu_mode[1];
}

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c09b95cfa00a..9a2329c92a01 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -48,6 +48,7 @@ __asm__(".arch_extension virt");
#endif

DEFINE_STATIC_KEY_FALSE(kvm_protected_mode);
+DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);

DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);

@@ -1838,12 +1839,14 @@ int kvm_arch_init(void *opaque)
if (err)
goto out_hyp;

- if (is_kvm_protected_mode())
+ if (is_kvm_protected_mode()) {
+ static_branch_enable(&kvm_protected_mode_initialized);
kvm_info("Protected nVHE mode initialized successfully\n");
- else if (in_hyp_mode)
+ } else if (in_hyp_mode) {
kvm_info("VHE mode initialized successfully\n");
- else
+ } else {
kvm_info("Hyp mode initialized successfully\n");
+ }

return 0;

--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:40:57

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 23/24] kvm: arm64: Trap host SMCs in protected mode.

While protected nVHE KVM is installed, start trapping all host SMCs.
By default, these are simply forwarded to EL3, but PSCI SMCs are
validated first.

Create new constant HCR_HOST_NVHE_PROTECTED_FLAGS with the new set of HCR
flags to use while the nVHE vector is installed when the kernel was
booted with the protected flag enabled. Switch back to the default HCR
flags when switching back to the stub vector.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/kernel/image-vars.h | 4 ++++
arch/arm64/kvm/arm.c | 35 ++++++++++++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 8 +++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 5 ++++-
5 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 64ce29378467..4e90c2debf70 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -80,6 +80,7 @@
HCR_FMO | HCR_IMO | HCR_PTW )
#define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
+#define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)

/* TCR_EL2 Registers bits */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 78a42a7cdb72..75cda51674f4 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -62,9 +62,13 @@ __efistub__ctype = _ctype;
*/

/* Alternative callbacks for init-time patching of nVHE hyp code. */
+KVM_NVHE_ALIAS(kvm_patch_hcr_flags);
KVM_NVHE_ALIAS(kvm_patch_vector_branch);
KVM_NVHE_ALIAS(kvm_update_va_mask);

+/* Static key enabled when the user opted into nVHE protected mode. */
+KVM_NVHE_ALIAS(kvm_protected_mode);
+
/* Global kernel state accessed by nVHE hyp code. */
KVM_NVHE_ALIAS(kvm_vgic_global_state);

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 574aa2d026e6..c09b95cfa00a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1861,6 +1861,41 @@ void kvm_arch_exit(void)
kvm_perf_teardown();
}

+static inline u32 __init __gen_mov_hcr_insn(u64 hcr, u32 rd, int i)
+{
+ int shift = 48 - (i * 16);
+ u16 imm = (hcr >> shift) & GENMASK(16, 0);
+
+ return aarch64_insn_gen_movewide(rd, imm, shift,
+ AARCH64_INSN_VARIANT_64BIT,
+ (i == 0) ? AARCH64_INSN_MOVEWIDE_ZERO
+ : AARCH64_INSN_MOVEWIDE_KEEP);
+}
+
+void __init kvm_patch_hcr_flags(struct alt_instr *alt,
+ __le32 *origptr, __le32 *updptr, int nr_inst)
+{
+ int i;
+ u32 rd;
+
+ BUG_ON(nr_inst != 4);
+
+ /* Skip for VHE and unprotected nVHE modes. */
+ if (!is_kvm_protected_mode())
+ return;
+
+ rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD,
+ le32_to_cpu(origptr[0]));
+
+ for (i = 0; i < nr_inst; i++) {
+ u32 oinsn = __gen_mov_hcr_insn(HCR_HOST_NVHE_FLAGS, rd, i);
+ u32 insn = __gen_mov_hcr_insn(HCR_HOST_NVHE_PROTECTED_FLAGS, rd, i);
+
+ BUG_ON(oinsn != le32_to_cpu(origptr[i]));
+ updptr[i] = cpu_to_le32(insn);
+ }
+}
+
static int __init early_kvm_protected_cfg(char *buf)
{
bool val;
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index f999a35b2c8c..bbe6c5f558e0 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -88,6 +88,12 @@ SYM_CODE_END(__kvm_hyp_init)
* x0: struct kvm_nvhe_init_params PA
*/
SYM_CODE_START(___kvm_hyp_init)
+alternative_cb kvm_patch_hcr_flags
+ mov_q x1, HCR_HOST_NVHE_FLAGS
+alternative_cb_end
+ msr hcr_el2, x1
+ isb
+
ldr x1, [x0, #NVHE_INIT_TPIDR_EL2]
msr tpidr_el2, x1

@@ -220,6 +226,8 @@ reset:
bic x5, x5, x6 // Clear SCTL_M and etc
pre_disable_mmu_workaround
msr sctlr_el2, x5
+ mov_q x5, HCR_HOST_NVHE_FLAGS
+ msr hcr_el2, x5
isb

/* Install stub vectors */
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 8ae8160bc93a..f605b25a9afc 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -96,7 +96,10 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;

write_sysreg(mdcr_el2, mdcr_el2);
- write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2);
+ if (is_kvm_protected_mode())
+ write_sysreg(HCR_HOST_NVHE_PROTECTED_FLAGS, hcr_el2);
+ else
+ write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2);
write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
write_sysreg(__kvm_hyp_host_vector, vbar_el2);
}
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:41:03

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 01/24] psci: Accessor for configured PSCI version

The version of PSCI that the kernel should use to communicate with
firmware is typically obtained from probing PSCI_VERSION. However, that
doesn't work for PSCI v0.1 where the host gets the information from
DT/ACPI, or if PSCI is not supported / was disabled.

KVM's host PSCI proxy needs to be configured with the same version
used by the host driver. Expose the PSCI version used by the host
with a read-only accessor.

Signed-off-by: David Brazdil <[email protected]>
---
drivers/firmware/psci/psci.c | 11 +++++++++++
include/linux/psci.h | 8 ++++++++
2 files changed, 19 insertions(+)

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 00af99b6f97c..bc1b2d60fdbf 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -49,6 +49,13 @@ static int resident_cpu = -1;
struct psci_operations psci_ops;
static enum arm_smccc_conduit psci_conduit = SMCCC_CONDUIT_NONE;

+static int driver_version = PSCI_VERSION(0, 0);
+
+int psci_driver_version(void)
+{
+ return driver_version;
+}
+
bool psci_tos_resident_on(int cpu)
{
return cpu == resident_cpu;
@@ -461,6 +468,8 @@ static int __init psci_probe(void)
return -EINVAL;
}

+ driver_version = ver;
+
psci_0_2_set_functions();

psci_init_migrate();
@@ -514,6 +523,8 @@ static int __init psci_0_1_init(struct device_node *np)

pr_info("Using PSCI v0.1 Function IDs from DT\n");

+ driver_version = PSCI_VERSION(0, 1);
+
if (!of_property_read_u32(np, "cpu_suspend", &id)) {
psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
psci_ops.cpu_suspend = psci_cpu_suspend;
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 2a1bfb890e58..5b5dcf176aa6 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -21,6 +21,14 @@ bool psci_power_state_is_valid(u32 state);
int psci_set_osi_mode(bool enable);
bool psci_has_osi_support(void);

+/**
+ * The version of the PSCI specification followed by the driver.
+ * This is equivalent to calling PSCI_VERSION except:
+ * (a) it also works for PSCI v0.1, which does not support PSCI_VERSION, and
+ * (b) it is set to v0.0 if the PSCI driver was not initialized.
+ */
+int psci_driver_version(void);
+
struct psci_operations {
u32 (*get_version)(void);
int (*cpu_suspend)(u32 state, unsigned long entry_point);
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:41:16

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 19/24] kvm: arm64: Intercept host's PSCI_CPU_ON SMCs

Add a handler of the CPU_ON PSCI call from host. When invoked, it looks
up the logical CPU ID corresponding to the provided MPIDR and populates
the state struct of the target CPU with the provided x0, pc. It then
calls CPU_ON itself, with an entry point in hyp that initializes EL2
state before returning ERET to the provided PC in EL1.

There is a simple atomic lock around the reset state struct. If it is
already locked, CPU_ON will return PENDING_ON error code.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_hyp.h | 1 +
arch/arm64/kvm/arm.c | 3 +
arch/arm64/kvm/hyp/nvhe/psci.c | 98 ++++++++++++++++++++++++++++++++
3 files changed, 102 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 95a2bbbcc7e1..4586fae36184 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -97,6 +97,7 @@ void deactivate_traps_vhe_put(void);
u64 __guest_enter(struct kvm_vcpu *vcpu);

#ifdef __KVM_NVHE_HYPERVISOR__
+asmlinkage void __noreturn kvm_host_psci_cpu_entry(void);
bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
#endif

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index dc7d43d7785a..a931253ebb61 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1333,6 +1333,7 @@ static int kvm_map_vectors(void)

static void cpu_init_hyp_mode(void)
{
+ DECLARE_KVM_NVHE_SYM(kvm_host_psci_cpu_entry);
struct kvm_nvhe_init_params *params = this_cpu_ptr_nvhe_sym(kvm_init_params);
struct arm_smccc_res res;

@@ -1350,6 +1351,8 @@ static void cpu_init_hyp_mode(void)
params->pgd_ptr = kvm_mmu_get_httbr();
params->vector_ptr = (unsigned long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
params->hyp_stack_ptr = kern_hyp_va(__this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE);
+ params->psci_cpu_entry_fn = (unsigned long)kern_hyp_va(
+ kvm_ksym_ref(CHOOSE_NVHE_SYM(kvm_host_psci_cpu_entry)));

/*
* Flush the init params from the data cache because the struct will
diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index 05a34a152069..f9b82a87bf44 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -9,10 +9,13 @@
#include <asm/kvm_mmu.h>
#include <kvm/arm_hypercalls.h>
#include <linux/arm-smccc.h>
+#include <linux/kvm_host.h>
#include <linux/psci.h>
#include <kvm/arm_psci.h>
#include <uapi/linux/psci.h>

+#define INVALID_CPU_ID UINT_MAX
+
/* Config options set by the host. */
u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
u32 kvm_host_psci_function_id[PSCI_FN_MAX];
@@ -30,6 +33,14 @@ s64 hyp_physvirt_offset;
kern_va - kimage_voffset; \
})

+struct kvm_host_psci_state {
+ atomic_t pending_on;
+ unsigned long pc;
+ unsigned long r0;
+};
+
+static DEFINE_PER_CPU(struct kvm_host_psci_state, kvm_host_psci_state);
+
static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
{
return host_ctxt->regs.regs[0];
@@ -84,10 +95,95 @@ static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *ho
hyp_panic(); /* unreachable */
}

+static unsigned int find_cpu_id(u64 mpidr)
+{
+ int i;
+
+ if (mpidr != INVALID_HWID) {
+ for (i = 0; i < NR_CPUS; i++) {
+ if (cpu_logical_map(i) == mpidr)
+ return i;
+ }
+ }
+
+ return INVALID_CPU_ID;
+}
+
+static bool try_acquire_reset_state(struct kvm_host_psci_state *cpu_state,
+ unsigned long pc, unsigned long r0)
+{
+ if (atomic_cmpxchg_acquire(&cpu_state->pending_on, 0, 1) != 0)
+ return false;
+
+ cpu_state->pc = pc;
+ cpu_state->r0 = r0;
+ wmb();
+
+ return true;
+}
+
+static void release_reset_state(struct kvm_host_psci_state *cpu_state)
+{
+ atomic_set_release(&cpu_state->pending_on, 0);
+}
+
+static int psci_cpu_on(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ u64 mpidr = host_ctxt->regs.regs[1];
+ unsigned long pc = host_ctxt->regs.regs[2];
+ unsigned long r0 = host_ctxt->regs.regs[3];
+ unsigned int cpu_id;
+ struct kvm_host_psci_state *cpu_state;
+ struct kvm_nvhe_init_params *cpu_params;
+ int ret;
+
+ /*
+ * Find the logical CPU ID for the given MPIDR. The search set is
+ * the set of CPUs that were online at the point of KVM initialization.
+ * Booting other CPUs is rejected because their cpufeatures were not
+ * checked against the finalized capabilities. This could be relaxed
+ * by doing the feature checks in hyp.
+ */
+ cpu_id = find_cpu_id(mpidr);
+ if (cpu_id == INVALID_CPU_ID)
+ return PSCI_RET_INVALID_PARAMS;
+
+ cpu_state = per_cpu_ptr(&kvm_host_psci_state, cpu_id);
+ cpu_params = per_cpu_ptr(&kvm_init_params, cpu_id);
+
+ if (!try_acquire_reset_state(cpu_state, pc, r0))
+ return PSCI_RET_ALREADY_ON;
+
+ ret = psci_call(func_id, mpidr,
+ __hyp_pa_symbol(__kvm_hyp_cpu_entry),
+ __hyp_pa(cpu_params));
+
+ if (ret != PSCI_RET_SUCCESS)
+ release_reset_state(cpu_state);
+ return ret;
+}
+
+void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
+
+asmlinkage void __noreturn kvm_host_psci_cpu_entry(void)
+{
+ struct kvm_host_psci_state *cpu_state = this_cpu_ptr(&kvm_host_psci_state);
+ struct kvm_cpu_context *host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+
+ host_ctxt->regs.regs[0] = cpu_state->r0;
+ write_sysreg_el2(cpu_state->pc, SYS_ELR);
+
+ release_reset_state(cpu_state);
+
+ __host_enter(host_ctxt);
+}
+
static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
return psci_forward(host_ctxt);
+ else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_ON])
+ return psci_cpu_on(func_id, host_ctxt);
else if (func_id == kvm_host_psci_function_id[PSCI_FN_MIGRATE])
return psci_forward(host_ctxt);
else
@@ -108,6 +204,8 @@ static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
case PSCI_0_2_FN_SYSTEM_RESET:
psci_forward_noreturn(host_ctxt);
unreachable();
+ case PSCI_0_2_FN64_CPU_ON:
+ return psci_cpu_on(func_id, host_ctxt);
default:
return PSCI_RET_NOT_SUPPORTED;
}
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:41:17

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 22/24] kvm: arm64: Keep nVHE EL2 vector installed

KVM by default keeps the stub vector installed and installs the nVHE
vector only briefly for init and later on demand. Change this policy
to install the vector at init and then never uninstall it if the kernel
was given the protected KVM command line parameter.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/arm.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 452a01afaf33..574aa2d026e6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1431,7 +1431,8 @@ static void _kvm_arch_hardware_disable(void *discard)

void kvm_arch_hardware_disable(void)
{
- _kvm_arch_hardware_disable(NULL);
+ if (!is_kvm_protected_mode())
+ _kvm_arch_hardware_disable(NULL);
}

#ifdef CONFIG_CPU_PM
@@ -1474,11 +1475,13 @@ static struct notifier_block hyp_init_cpu_pm_nb = {

static void __init hyp_cpu_pm_init(void)
{
- cpu_pm_register_notifier(&hyp_init_cpu_pm_nb);
+ if (!is_kvm_protected_mode())
+ cpu_pm_register_notifier(&hyp_init_cpu_pm_nb);
}
static void __init hyp_cpu_pm_exit(void)
{
- cpu_pm_unregister_notifier(&hyp_init_cpu_pm_nb);
+ if (!is_kvm_protected_mode())
+ cpu_pm_unregister_notifier(&hyp_init_cpu_pm_nb);
}
#else
static inline void hyp_cpu_pm_init(void)
@@ -1576,7 +1579,8 @@ static int init_subsystems(void)
kvm_coproc_table_init();

out:
- on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
+ if (err || !is_kvm_protected_mode())
+ on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);

return err;
}
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:41:48

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 07/24] kvm: arm64: Create nVHE copy of cpu_logical_map

When KVM starts validating host's PSCI requests, it will need to map
MPIDR back to the CPU ID. To this end, copy cpu_logical_map into nVHE
hyp memory when KVM is initialized.

Only copy the information for CPUs that are online at the point of KVM
initialization so that KVM rejects CPUs whose features were not checked
against the finalized capabilities.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/arm.c | 17 +++++++++++++++++
arch/arm64/kvm/hyp/nvhe/percpu.c | 16 ++++++++++++++++
2 files changed, 33 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9ba9db2aa7f8..b85b4294b72d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1481,6 +1481,21 @@ static inline void hyp_cpu_pm_exit(void)
}
#endif

+static void init_cpu_logical_map(void)
+{
+ extern u64 kvm_nvhe_sym(__cpu_logical_map)[NR_CPUS];
+ int cpu;
+
+ /*
+ * Copy the MPIDR <-> logical CPU ID mapping to hyp.
+ * Only copy the set of online CPUs whose features have been chacked
+ * against the finalized system capabilities. The hypervisor will not
+ * allow any other CPUs from the `possible` set to boot.
+ */
+ for_each_online_cpu(cpu)
+ CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
+}
+
static int init_common_resources(void)
{
return kvm_set_ipa_limit();
@@ -1659,6 +1674,8 @@ static int init_hyp_mode(void)
}
}

+ init_cpu_logical_map();
+
return 0;

out_err:
diff --git a/arch/arm64/kvm/hyp/nvhe/percpu.c b/arch/arm64/kvm/hyp/nvhe/percpu.c
index 5fd0c5696907..d0b9dbc2df45 100644
--- a/arch/arm64/kvm/hyp/nvhe/percpu.c
+++ b/arch/arm64/kvm/hyp/nvhe/percpu.c
@@ -8,6 +8,22 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>

+/*
+ * nVHE copy of data structures tracking available CPU cores.
+ * Only entries for CPUs that were online at KVM init are populated.
+ * Other CPUs should not be allowed to boot because their features were
+ * not checked against the finalized system capabilities.
+ */
+u64 __ro_after_init __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
+
+u64 cpu_logical_map(int cpu)
+{
+ if (cpu < 0 || cpu >= ARRAY_SIZE(__cpu_logical_map))
+ hyp_panic();
+
+ return __cpu_logical_map[cpu];
+}
+
unsigned long __hyp_per_cpu_offset(unsigned int cpu)
{
unsigned long *cpu_base_array;
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 11:41:53

by David Brazdil

[permalink] [raw]
Subject: [PATCH v1 03/24] arm64: Move MAIR_EL1_SET to asm/memory.h

KVM currently initializes MAIR_EL2 to the value of MAIR_EL1. In
preparation for initializing MAIR_EL2 before MAIR_EL1, move the constant
into a shared header file.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/memory.h | 13 +++++++++++++
arch/arm64/mm/proc.S | 13 -------------
2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index cd61239bae8c..aca00737e771 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -152,6 +152,19 @@
#define MT_S2_FWB_NORMAL 6
#define MT_S2_FWB_DEVICE_nGnRE 1

+/*
+ * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal memory and
+ * changed during __cpu_setup to Normal Tagged if the system supports MTE.
+ */
+#define MAIR_EL1_SET \
+ (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) | \
+ MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) | \
+ MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) | \
+ MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) | \
+ MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
+ MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) | \
+ MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
+
#ifdef CONFIG_ARM64_4K_PAGES
#define IOREMAP_MAX_ORDER (PUD_SHIFT)
#else
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 23c326a06b2d..25ff21b3a1c6 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -45,19 +45,6 @@
#define TCR_KASAN_FLAGS 0
#endif

-/*
- * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal memory and
- * changed during __cpu_setup to Normal Tagged if the system supports MTE.
- */
-#define MAIR_EL1_SET \
- (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) | \
- MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) | \
- MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) | \
- MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) | \
- MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
- MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) | \
- MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
-
#ifdef CONFIG_CPU_PM
/**
* cpu_do_suspend - save CPU registers context
--
2.29.2.222.g5d2a92d10f8-goog

2020-11-09 17:01:41

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH v1 17/24] kvm: arm64: Add __hyp_pa_symbol helper macro

Hey David,

On Monday 09 Nov 2020 at 11:32:26 (+0000), David Brazdil wrote:
> Add helper macro for computing the PA of a kernel symbol in nVHE hyp
> code. This will be useful for computing the PA of a PSCI CPU_ON entry
> point.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/kvm/hyp/nvhe/psci.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
> index b0b5df590ba5..7510b9e174e9 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci.c
> @@ -20,6 +20,16 @@ s64 hyp_physvirt_offset;
>
> #define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)
>
> +#define __hyp_pa_symbol(sym) \
> + ({ \
> + extern char sym[]; \
> + unsigned long kern_va; \
> + \
> + asm volatile("ldr %0, =%1" : "=r" (kern_va) \
> + : "S" (sym)); \
> + kern_va - kimage_voffset; \
> + })
> +

Could this be simplified to __hyp_pa(hyp_symbol_addr(sym))? That would
avoid the dependency on kimage_voffset.

Thanks,
Quentin

2020-11-09 18:12:22

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 17/24] kvm: arm64: Add __hyp_pa_symbol helper macro

On 2020-11-09 16:59, Quentin Perret wrote:
> Hey David,
>
> On Monday 09 Nov 2020 at 11:32:26 (+0000), David Brazdil wrote:
>> Add helper macro for computing the PA of a kernel symbol in nVHE hyp
>> code. This will be useful for computing the PA of a PSCI CPU_ON entry
>> point.
>>
>> Signed-off-by: David Brazdil <[email protected]>
>> ---
>> arch/arm64/kvm/hyp/nvhe/psci.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c
>> b/arch/arm64/kvm/hyp/nvhe/psci.c
>> index b0b5df590ba5..7510b9e174e9 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/psci.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/psci.c
>> @@ -20,6 +20,16 @@ s64 hyp_physvirt_offset;
>>
>> #define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)
>>
>> +#define __hyp_pa_symbol(sym) \
>> + ({ \
>> + extern char sym[]; \
>> + unsigned long kern_va; \
>> + \
>> + asm volatile("ldr %0, =%1" : "=r" (kern_va) \
>> + : "S" (sym)); \
>> + kern_va - kimage_voffset; \
>> + })
>> +
>
> Could this be simplified to __hyp_pa(hyp_symbol_addr(sym))? That would
> avoid the dependency on kimage_voffset.

I'm going to move away from evaluating kimage_voffset at runtime anyway,
see [1].

Thanks,

M.

[1] https://lore.kernel.org/r/[email protected]
--
Jazz is not dead. It just smells funny...

2020-11-10 05:06:07

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v1 23/24] kvm: arm64: Trap host SMCs in protected mode.

Hi David,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on dennis-percpu/for-next]
[also build test WARNING on linus/master v5.10-rc3 next-20201109]
[cannot apply to kvmarm/next arm64/for-next/core soc/for-next arm/for-next xlnx/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/David-Brazdil/Opt-in-always-on-nVHE-hypervisor/20201109-193833
base: https://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git for-next
config: arm64-randconfig-r022-20201109 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 09ec07827b1128504457a93dee80b2ceee1af600)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# https://github.com/0day-ci/linux/commit/a59ab708ed6039e83756720b1d5974e84db5a8f4
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review David-Brazdil/Opt-in-always-on-nVHE-hypervisor/20201109-193833
git checkout a59ab708ed6039e83756720b1d5974e84db5a8f4
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> arch/arm64/kvm/arm.c:1875:13: warning: no previous prototype for function 'kvm_patch_hcr_flags' [-Wmissing-prototypes]
void __init kvm_patch_hcr_flags(struct alt_instr *alt,
^
arch/arm64/kvm/arm.c:1875:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void __init kvm_patch_hcr_flags(struct alt_instr *alt,
^
static
1 warning generated.

vim +/kvm_patch_hcr_flags +1875 arch/arm64/kvm/arm.c

1874
> 1875 void __init kvm_patch_hcr_flags(struct alt_instr *alt,
1876 __le32 *origptr, __le32 *updptr, int nr_inst)
1877 {
1878 int i;
1879 u32 rd;
1880
1881 BUG_ON(nr_inst != 4);
1882
1883 /* Skip for VHE and unprotected nVHE modes. */
1884 if (!is_kvm_protected_mode())
1885 return;
1886
1887 rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD,
1888 le32_to_cpu(origptr[0]));
1889
1890 for (i = 0; i < nr_inst; i++) {
1891 u32 oinsn = __gen_mov_hcr_insn(HCR_HOST_NVHE_FLAGS, rd, i);
1892 u32 insn = __gen_mov_hcr_insn(HCR_HOST_NVHE_PROTECTED_FLAGS, rd, i);
1893
1894 BUG_ON(oinsn != le32_to_cpu(origptr[i]));
1895 updptr[i] = cpu_to_le32(insn);
1896 }
1897 }
1898

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (3.08 kB)
.config.gz (36.90 kB)
Download all attachments

2020-11-10 09:11:02

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 23/24] kvm: arm64: Trap host SMCs in protected mode.

On 2020-11-09 11:32, David Brazdil wrote:
> While protected nVHE KVM is installed, start trapping all host SMCs.
> By default, these are simply forwarded to EL3, but PSCI SMCs are
> validated first.
>
> Create new constant HCR_HOST_NVHE_PROTECTED_FLAGS with the new set of
> HCR
> flags to use while the nVHE vector is installed when the kernel was
> booted with the protected flag enabled. Switch back to the default HCR
> flags when switching back to the stub vector.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/include/asm/kvm_arm.h | 1 +
> arch/arm64/kernel/image-vars.h | 4 ++++
> arch/arm64/kvm/arm.c | 35 ++++++++++++++++++++++++++++++
> arch/arm64/kvm/hyp/nvhe/hyp-init.S | 8 +++++++
> arch/arm64/kvm/hyp/nvhe/switch.c | 5 ++++-
> 5 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h
> b/arch/arm64/include/asm/kvm_arm.h
> index 64ce29378467..4e90c2debf70 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -80,6 +80,7 @@
> HCR_FMO | HCR_IMO | HCR_PTW )
> #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
> #define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
> +#define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
> #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
>
> /* TCR_EL2 Registers bits */
> diff --git a/arch/arm64/kernel/image-vars.h
> b/arch/arm64/kernel/image-vars.h
> index 78a42a7cdb72..75cda51674f4 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -62,9 +62,13 @@ __efistub__ctype = _ctype;
> */
>
> /* Alternative callbacks for init-time patching of nVHE hyp code. */
> +KVM_NVHE_ALIAS(kvm_patch_hcr_flags);
> KVM_NVHE_ALIAS(kvm_patch_vector_branch);
> KVM_NVHE_ALIAS(kvm_update_va_mask);
>
> +/* Static key enabled when the user opted into nVHE protected mode. */
> +KVM_NVHE_ALIAS(kvm_protected_mode);
> +
> /* Global kernel state accessed by nVHE hyp code. */
> KVM_NVHE_ALIAS(kvm_vgic_global_state);
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 574aa2d026e6..c09b95cfa00a 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1861,6 +1861,41 @@ void kvm_arch_exit(void)
> kvm_perf_teardown();
> }
>
> +static inline u32 __init __gen_mov_hcr_insn(u64 hcr, u32 rd, int i)
> +{
> + int shift = 48 - (i * 16);
> + u16 imm = (hcr >> shift) & GENMASK(16, 0);

I really doubt you want to encode 17 bits.

> +
> + return aarch64_insn_gen_movewide(rd, imm, shift,
> + AARCH64_INSN_VARIANT_64BIT,
> + (i == 0) ? AARCH64_INSN_MOVEWIDE_ZERO
> + : AARCH64_INSN_MOVEWIDE_KEEP);
> +}

I've added a generate_mov_q() helper as part of my host EL2 entry
rework.
We can probably share some stuff here.

> +
> +void __init kvm_patch_hcr_flags(struct alt_instr *alt,
> + __le32 *origptr, __le32 *updptr, int nr_inst)
> +{
> + int i;
> + u32 rd;
> +
> + BUG_ON(nr_inst != 4);
> +
> + /* Skip for VHE and unprotected nVHE modes. */
> + if (!is_kvm_protected_mode())
> + return;
> +
> + rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD,
> + le32_to_cpu(origptr[0]));
> +
> + for (i = 0; i < nr_inst; i++) {
> + u32 oinsn = __gen_mov_hcr_insn(HCR_HOST_NVHE_FLAGS, rd, i);
> + u32 insn = __gen_mov_hcr_insn(HCR_HOST_NVHE_PROTECTED_FLAGS, rd, i);
> +
> + BUG_ON(oinsn != le32_to_cpu(origptr[i]));
> + updptr[i] = cpu_to_le32(insn);
> + }
> +}
> +
> static int __init early_kvm_protected_cfg(char *buf)
> {
> bool val;
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index f999a35b2c8c..bbe6c5f558e0 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -88,6 +88,12 @@ SYM_CODE_END(__kvm_hyp_init)
> * x0: struct kvm_nvhe_init_params PA
> */
> SYM_CODE_START(___kvm_hyp_init)
> +alternative_cb kvm_patch_hcr_flags
> + mov_q x1, HCR_HOST_NVHE_FLAGS

You really want to be careful here: the mov_q macro expands to 2, 3 or 4
instructions, depending on the input data...

It is also odd that you have both a static key and a patching
alternative.
Why isn't "protected KVM" a capability that can be evaluated as a a non
patching alternative? In general, I'd like to reserve patching
alternatives
to values that cannot be evaluated at compile time (VM offsets, for
example).

> +alternative_cb_end
> + msr hcr_el2, x1
> + isb
> +
> ldr x1, [x0, #NVHE_INIT_TPIDR_EL2]
> msr tpidr_el2, x1
>
> @@ -220,6 +226,8 @@ reset:
> bic x5, x5, x6 // Clear SCTL_M and etc
> pre_disable_mmu_workaround
> msr sctlr_el2, x5
> + mov_q x5, HCR_HOST_NVHE_FLAGS
> + msr hcr_el2, x5
> isb
>
> /* Install stub vectors */
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c
> b/arch/arm64/kvm/hyp/nvhe/switch.c
> index 8ae8160bc93a..f605b25a9afc 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -96,7 +96,10 @@ static void __deactivate_traps(struct kvm_vcpu
> *vcpu)
> mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
>
> write_sysreg(mdcr_el2, mdcr_el2);
> - write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2);
> + if (is_kvm_protected_mode())
> + write_sysreg(HCR_HOST_NVHE_PROTECTED_FLAGS, hcr_el2);
> + else
> + write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2);
> write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
> write_sysreg(__kvm_hyp_host_vector, vbar_el2);
> }

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-10 09:25:54

by David Brazdil

[permalink] [raw]
Subject: Re: [PATCH v1 17/24] kvm: arm64: Add __hyp_pa_symbol helper macro

On Mon, Nov 09, 2020 at 06:10:05PM +0000, Marc Zyngier wrote:
> On 2020-11-09 16:59, Quentin Perret wrote:
> > Hey David,
> >
> > On Monday 09 Nov 2020 at 11:32:26 (+0000), David Brazdil wrote:
> > > Add helper macro for computing the PA of a kernel symbol in nVHE hyp
> > > code. This will be useful for computing the PA of a PSCI CPU_ON entry
> > > point.
> > >
> > > Signed-off-by: David Brazdil <[email protected]>
> > > ---
> > > arch/arm64/kvm/hyp/nvhe/psci.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c
> > > b/arch/arm64/kvm/hyp/nvhe/psci.c
> > > index b0b5df590ba5..7510b9e174e9 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/psci.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/psci.c
> > > @@ -20,6 +20,16 @@ s64 hyp_physvirt_offset;
> > >
> > > #define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)
> > >
> > > +#define __hyp_pa_symbol(sym) \
> > > + ({ \
> > > + extern char sym[]; \
> > > + unsigned long kern_va; \
> > > + \
> > > + asm volatile("ldr %0, =%1" : "=r" (kern_va) \
> > > + : "S" (sym)); \
> > > + kern_va - kimage_voffset; \
> > > + })
> > > +
> >
> > Could this be simplified to __hyp_pa(hyp_symbol_addr(sym))? That would
> > avoid the dependency on kimage_voffset.

Ah, didn't see that one. Ok, removing this patch.

>
> I'm going to move away from evaluating kimage_voffset at runtime anyway,
> see [1].

Awesome! One more dependency gone.

>
> Thanks,
>
> M.
>
> [1] https://lore.kernel.org/r/[email protected]
> --
> Jazz is not dead. It just smells funny...

2020-11-10 10:19:38

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] Opt-in always-on nVHE hypervisor

On Mon, Nov 09, 2020 at 11:32:09AM +0000, David Brazdil wrote:
> As we progress towards being able to keep guest state private to the
> host running nVHE hypervisor, this series allows the hypervisor to
> install itself on newly booted CPUs before the host is allowed to run
> on them.

Why? I thought we were trying to kill nVHE off now that newer CPUs
provide the saner virtualization extensions?

2020-11-10 11:21:27

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] Opt-in always-on nVHE hypervisor

On 2020-11-10 10:15, Christoph Hellwig wrote:
> On Mon, Nov 09, 2020 at 11:32:09AM +0000, David Brazdil wrote:
>> As we progress towards being able to keep guest state private to the
>> host running nVHE hypervisor, this series allows the hypervisor to
>> install itself on newly booted CPUs before the host is allowed to run
>> on them.
>
> Why? I thought we were trying to kill nVHE off now that newer CPUs
> provide the saner virtualization extensions?

We can't kill nVHE at all, because that is the only game in town.
You can't even buy a decent machine with VHE, no matter how much money
you put on the table.

nVHE is here for the foreseeable future, and we even use its misfeatures
to our advantage in order to offer confidential VMs. See Will's
presentation
at KVM forum a couple of weeks ago for the gory details.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-10 13:03:47

by David Brazdil

[permalink] [raw]
Subject: Re: [PATCH v1 23/24] kvm: arm64: Trap host SMCs in protected mode.

> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > @@ -88,6 +88,12 @@ SYM_CODE_END(__kvm_hyp_init)
> > * x0: struct kvm_nvhe_init_params PA
> > */
> > SYM_CODE_START(___kvm_hyp_init)
> > +alternative_cb kvm_patch_hcr_flags
> > + mov_q x1, HCR_HOST_NVHE_FLAGS
>
> You really want to be careful here: the mov_q macro expands to 2, 3 or 4
> instructions, depending on the input data...
>
> It is also odd that you have both a static key and a patching alternative.
> Why isn't "protected KVM" a capability that can be evaluated as a a non
> patching alternative? In general, I'd like to reserve patching alternatives
> to values that cannot be evaluated at compile time (VM offsets, for
> example).

Capability was my initial idea as well but it looked tied to CPU features.
Looking at it again, you're right that there is precedent for setting them
from kernel params. Alright, I'll change it and that will get rid of the
custom patching.

2020-11-10 14:49:32

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 01/24] psci: Accessor for configured PSCI version

On 2020-11-09 11:32, David Brazdil wrote:
> The version of PSCI that the kernel should use to communicate with
> firmware is typically obtained from probing PSCI_VERSION. However, that
> doesn't work for PSCI v0.1 where the host gets the information from
> DT/ACPI, or if PSCI is not supported / was disabled.
>
> KVM's host PSCI proxy needs to be configured with the same version
> used by the host driver. Expose the PSCI version used by the host
> with a read-only accessor.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> drivers/firmware/psci/psci.c | 11 +++++++++++
> include/linux/psci.h | 8 ++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/drivers/firmware/psci/psci.c
> b/drivers/firmware/psci/psci.c
> index 00af99b6f97c..bc1b2d60fdbf 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -49,6 +49,13 @@ static int resident_cpu = -1;
> struct psci_operations psci_ops;
> static enum arm_smccc_conduit psci_conduit = SMCCC_CONDUIT_NONE;
>
> +static int driver_version = PSCI_VERSION(0, 0);
> +
> +int psci_driver_version(void)
> +{
> + return driver_version;
> +}
> +
> bool psci_tos_resident_on(int cpu)
> {
> return cpu == resident_cpu;
> @@ -461,6 +468,8 @@ static int __init psci_probe(void)
> return -EINVAL;
> }
>
> + driver_version = ver;
> +
> psci_0_2_set_functions();
>
> psci_init_migrate();
> @@ -514,6 +523,8 @@ static int __init psci_0_1_init(struct device_node
> *np)
>
> pr_info("Using PSCI v0.1 Function IDs from DT\n");
>
> + driver_version = PSCI_VERSION(0, 1);
> +
> if (!of_property_read_u32(np, "cpu_suspend", &id)) {
> psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
> psci_ops.cpu_suspend = psci_cpu_suspend;
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index 2a1bfb890e58..5b5dcf176aa6 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -21,6 +21,14 @@ bool psci_power_state_is_valid(u32 state);
> int psci_set_osi_mode(bool enable);
> bool psci_has_osi_support(void);
>
> +/**
> + * The version of the PSCI specification followed by the driver.
> + * This is equivalent to calling PSCI_VERSION except:
> + * (a) it also works for PSCI v0.1, which does not support
> PSCI_VERSION, and
> + * (b) it is set to v0.0 if the PSCI driver was not initialized.
> + */
> +int psci_driver_version(void);
> +
> struct psci_operations {
> u32 (*get_version)(void);
> int (*cpu_suspend)(u32 state, unsigned long entry_point);

I still maintain that populating .get_version in all cases instead of
duplicating an existing functionality is a better outcome. PSCI not
supported would be implied by .get_version being NULL.

What breaks?

M.
--
Jazz is not dead. It just smells funny...

2020-11-10 14:52:35

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 03/24] arm64: Move MAIR_EL1_SET to asm/memory.h

On 2020-11-09 11:32, David Brazdil wrote:
> KVM currently initializes MAIR_EL2 to the value of MAIR_EL1. In
> preparation for initializing MAIR_EL2 before MAIR_EL1, move the
> constant
> into a shared header file.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/include/asm/memory.h | 13 +++++++++++++
> arch/arm64/mm/proc.S | 13 -------------
> 2 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/include/asm/memory.h
> b/arch/arm64/include/asm/memory.h
> index cd61239bae8c..aca00737e771 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -152,6 +152,19 @@
> #define MT_S2_FWB_NORMAL 6
> #define MT_S2_FWB_DEVICE_nGnRE 1
>
> +/*
> + * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal
> memory and
> + * changed during __cpu_setup to Normal Tagged if the system supports
> MTE.
> + */
> +#define MAIR_EL1_SET \

If we are going to use this at EL2 directly, consider renaming it to
MAIR_ELx_SET, as we do for other constants that are shared across
exception
levels.

> + (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) | \

This creates an implicit dependency between sysreg.h and memory.h.
Consider including asm/sysreg.h, assuming this doesn't create any
circular
dependency, or even move it to sysreg.h altogether.

> + MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) | \
> + MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) | \
> + MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) | \
> + MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
> + MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) | \
> + MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
> +
> #ifdef CONFIG_ARM64_4K_PAGES
> #define IOREMAP_MAX_ORDER (PUD_SHIFT)
> #else
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 23c326a06b2d..25ff21b3a1c6 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -45,19 +45,6 @@
> #define TCR_KASAN_FLAGS 0
> #endif
>
> -/*
> - * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal
> memory and
> - * changed during __cpu_setup to Normal Tagged if the system supports
> MTE.
> - */
> -#define MAIR_EL1_SET \
> - (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) | \
> - MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) | \
> - MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) | \
> - MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) | \
> - MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
> - MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) | \
> - MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
> -
> #ifdef CONFIG_CPU_PM
> /**
> * cpu_do_suspend - save CPU registers context

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-10 15:11:00

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 06/24] kvm: arm64: Support per_cpu_ptr in nVHE hyp code

On 2020-11-09 11:32, David Brazdil wrote:
> When compiling with __KVM_NVHE_HYPERVISOR__ redefine per_cpu_offset()
> to
> __hyp_per_cpu_offset() which looks up the base of the nVHE per-CPU
> region of the given cpu and computes its offset from the
> .hyp.data..percpu section.
>
> This enables use of per_cpu_ptr() helpers in nVHE hyp code. Until now
> only this_cpu_ptr() was supported by setting TPIDR_EL2.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/include/asm/percpu.h | 6 ++++++
> arch/arm64/kernel/image-vars.h | 3 +++
> arch/arm64/kvm/hyp/nvhe/Makefile | 3 ++-
> arch/arm64/kvm/hyp/nvhe/percpu.c | 22 ++++++++++++++++++++++
> 4 files changed, 33 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/kvm/hyp/nvhe/percpu.c
>
> diff --git a/arch/arm64/include/asm/percpu.h
> b/arch/arm64/include/asm/percpu.h
> index 1599e17379d8..8f1661603b78 100644
> --- a/arch/arm64/include/asm/percpu.h
> +++ b/arch/arm64/include/asm/percpu.h
> @@ -239,6 +239,12 @@ PERCPU_RET_OP(add, add, ldadd)
> #define this_cpu_cmpxchg_8(pcp, o, n) \
> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +extern unsigned long __hyp_per_cpu_offset(unsigned int cpu);
> +#define __per_cpu_offset
> +#define per_cpu_offset(cpu) __hyp_per_cpu_offset((cpu))
> +#endif
> +
> #include <asm-generic/percpu.h>
>
> /* Redefine macros for nVHE hyp under DEBUG_PREEMPT to avoid its
> dependencies. */
> diff --git a/arch/arm64/kernel/image-vars.h
> b/arch/arm64/kernel/image-vars.h
> index c615b285ff5b..78a42a7cdb72 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -103,6 +103,9 @@ KVM_NVHE_ALIAS(gic_nonsecure_priorities);
> KVM_NVHE_ALIAS(__start___kvm_ex_table);
> KVM_NVHE_ALIAS(__stop___kvm_ex_table);
>
> +/* Array containing bases of nVHE per-CPU memory regions. */
> +KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
> +
> #endif /* CONFIG_KVM */
>
> #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile
> b/arch/arm64/kvm/hyp/nvhe/Makefile
> index ddde15fe85f2..c45f440cce51 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -6,7 +6,8 @@
> asflags-y := -D__KVM_NVHE_HYPERVISOR__
> ccflags-y := -D__KVM_NVHE_HYPERVISOR__
>
> -obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o
> host.o hyp-main.o
> +obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o
> host.o \
> + hyp-main.o percpu.o
> obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o
> ../entry.o \
> ../fpsimd.o ../hyp-entry.o
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/percpu.c
> b/arch/arm64/kvm/hyp/nvhe/percpu.c
> new file mode 100644
> index 000000000000..5fd0c5696907
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/nvhe/percpu.c
> @@ -0,0 +1,22 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 - Google LLC
> + * Author: David Brazdil <[email protected]>
> + */
> +
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +
> +unsigned long __hyp_per_cpu_offset(unsigned int cpu)
> +{
> + unsigned long *cpu_base_array;
> + unsigned long this_cpu_base;
> +
> + if (cpu >= ARRAY_SIZE(kvm_arm_hyp_percpu_base))
> + hyp_panic();
> +
> + cpu_base_array = kern_hyp_va(&kvm_arm_hyp_percpu_base[0]);

There is no guarantee that this will not generate a PC relative
addressing, resulting in kern_hyp_va() being applied twice.

Consider using hyp_symbol_addr() instead, which always does the right
by forcing a PC relative addressing and not subsequently mangling
the address.

> + this_cpu_base = kern_hyp_va(cpu_base_array[cpu]);
> + return this_cpu_base - (unsigned long)&__per_cpu_start;

And this is the opposite case: if the compiler generates an absolute
address, you're toast. Yes, this is just as unlikely, but hey...
Same remedy should apply.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-10 15:30:12

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 07/24] kvm: arm64: Create nVHE copy of cpu_logical_map

On 2020-11-09 11:32, David Brazdil wrote:
> When KVM starts validating host's PSCI requests, it will need to map
> MPIDR back to the CPU ID. To this end, copy cpu_logical_map into nVHE
> hyp memory when KVM is initialized.
>
> Only copy the information for CPUs that are online at the point of KVM
> initialization so that KVM rejects CPUs whose features were not checked
> against the finalized capabilities.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/kvm/arm.c | 17 +++++++++++++++++
> arch/arm64/kvm/hyp/nvhe/percpu.c | 16 ++++++++++++++++
> 2 files changed, 33 insertions(+)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 9ba9db2aa7f8..b85b4294b72d 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1481,6 +1481,21 @@ static inline void hyp_cpu_pm_exit(void)
> }
> #endif
>
> +static void init_cpu_logical_map(void)
> +{
> + extern u64 kvm_nvhe_sym(__cpu_logical_map)[NR_CPUS];
> + int cpu;
> +
> + /*
> + * Copy the MPIDR <-> logical CPU ID mapping to hyp.
> + * Only copy the set of online CPUs whose features have been chacked
> + * against the finalized system capabilities. The hypervisor will not
> + * allow any other CPUs from the `possible` set to boot.
> + */
> + for_each_online_cpu(cpu)
> + CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
> +}
> +
> static int init_common_resources(void)
> {
> return kvm_set_ipa_limit();
> @@ -1659,6 +1674,8 @@ static int init_hyp_mode(void)
> }
> }
>
> + init_cpu_logical_map();
> +
> return 0;
>
> out_err:
> diff --git a/arch/arm64/kvm/hyp/nvhe/percpu.c
> b/arch/arm64/kvm/hyp/nvhe/percpu.c
> index 5fd0c5696907..d0b9dbc2df45 100644
> --- a/arch/arm64/kvm/hyp/nvhe/percpu.c
> +++ b/arch/arm64/kvm/hyp/nvhe/percpu.c
> @@ -8,6 +8,22 @@
> #include <asm/kvm_hyp.h>
> #include <asm/kvm_mmu.h>
>
> +/*
> + * nVHE copy of data structures tracking available CPU cores.
> + * Only entries for CPUs that were online at KVM init are populated.
> + * Other CPUs should not be allowed to boot because their features
> were
> + * not checked against the finalized system capabilities.
> + */
> +u64 __ro_after_init __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1]
> = INVALID_HWID };

I'm not sure what __ro_after_init means once we get S2 isolation.

> +
> +u64 cpu_logical_map(int cpu)

nit: is there any reason why "cpu" cannot be unsigned? The thought
of a negative CPU number makes me shiver...

> +{
> + if (cpu < 0 || cpu >= ARRAY_SIZE(__cpu_logical_map))
> + hyp_panic();
> +
> + return __cpu_logical_map[cpu];
> +}
> +
> unsigned long __hyp_per_cpu_offset(unsigned int cpu)
> {
> unsigned long *cpu_base_array;

Overall, this patch would make more sense closer it its use case
(in patch 19). I also don't understand why this lives in percpu.c...

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-10 15:33:48

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 08/24] kvm: arm64: Move hyp-init params to a per-CPU struct

On 2020-11-09 11:32, David Brazdil wrote:
> Once we start initializing KVM on newly booted cores before the rest of
> the kernel, parameters to __do_hyp_init will need to be provided by EL2
> rather than EL1. At that point it will not be possible to pass its four
> arguments directly because PSCI_CPU_ON only supports one context
> argument.
>
> Refactor __do_hyp_init to accept its parameters in a struct. This
> prepares the code for KVM booting cores as well as removes any limits
> on
> the number of __do_hyp_init arguments.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/include/asm/kvm_asm.h | 7 +++++++
> arch/arm64/include/asm/kvm_hyp.h | 4 ++++
> arch/arm64/kernel/asm-offsets.c | 4 ++++
> arch/arm64/kvm/arm.c | 26 ++++++++++++++------------
> arch/arm64/kvm/hyp/nvhe/hyp-init.S | 21 ++++++++++-----------
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 2 ++
> 6 files changed, 41 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h
> b/arch/arm64/include/asm/kvm_asm.h
> index 54387ccd1ab2..a49a87a186c3 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -150,6 +150,13 @@ extern void *__vhe_undefined_symbol;
>
> #endif
>
> +struct kvm_nvhe_init_params {
> + phys_addr_t pgd_ptr;
> + unsigned long tpidr_el2;
> + unsigned long hyp_stack_ptr;
> + unsigned long vector_ptr;
> +};

Please add some documentation here, specially indicating what address
space
the values are relative to.

> +
> /* Translate a kernel address @ptr into its equivalent linear mapping
> */
> #define kvm_ksym_ref(ptr) \
> ({ \
> diff --git a/arch/arm64/include/asm/kvm_hyp.h
> b/arch/arm64/include/asm/kvm_hyp.h
> index 6b664de5ec1f..a3289071f3d8 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -15,6 +15,10 @@
> DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
> DECLARE_PER_CPU(unsigned long, kvm_hyp_vector);
>
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> +#endif
> +
> #define read_sysreg_elx(r,nvh,vh) \
> ({ \
> u64 reg; \
> diff --git a/arch/arm64/kernel/asm-offsets.c
> b/arch/arm64/kernel/asm-offsets.c
> index 7d32fc959b1a..0cbb86135c7c 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -110,6 +110,10 @@ int main(void)
> DEFINE(CPU_APGAKEYLO_EL1, offsetof(struct kvm_cpu_context,
> sys_regs[APGAKEYLO_EL1]));
> DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context,
> __hyp_running_vcpu));
> DEFINE(HOST_DATA_CONTEXT, offsetof(struct kvm_host_data,
> host_ctxt));
> + DEFINE(NVHE_INIT_PGD_PTR, offsetof(struct kvm_nvhe_init_params,
> pgd_ptr));
> + DEFINE(NVHE_INIT_TPIDR_EL2, offsetof(struct kvm_nvhe_init_params,
> tpidr_el2));
> + DEFINE(NVHE_INIT_STACK_PTR, offsetof(struct kvm_nvhe_init_params,
> hyp_stack_ptr));
> + DEFINE(NVHE_INIT_VECTOR_PTR, offsetof(struct kvm_nvhe_init_params,
> vector_ptr));
> #endif
> #ifdef CONFIG_CPU_PM
> DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index b85b4294b72d..1a57b6025937 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -50,6 +50,7 @@ DECLARE_KVM_HYP_PER_CPU(unsigned long,
> kvm_hyp_vector);
>
> static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
> +DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params,
> kvm_init_params);
>
> /* The VMID used in the VTTBR */
> static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> @@ -1331,10 +1332,7 @@ static int kvm_map_vectors(void)
>
> static void cpu_init_hyp_mode(void)
> {
> - phys_addr_t pgd_ptr;
> - unsigned long hyp_stack_ptr;
> - unsigned long vector_ptr;
> - unsigned long tpidr_el2;
> + struct kvm_nvhe_init_params *params =
> this_cpu_ptr_nvhe_sym(kvm_init_params);
> struct arm_smccc_res res;
>
> /* Switch from the HYP stub to our own HYP init vector */
> @@ -1345,13 +1343,18 @@ static void cpu_init_hyp_mode(void)
> * kernel's mapping to the linear mapping, and store it in tpidr_el2
> * so that we can use adr_l to access per-cpu variables in EL2.
> */
> - tpidr_el2 = (unsigned long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
> - (unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));
> + params->tpidr_el2 = (unsigned
> long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
> + (unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));
>
> - pgd_ptr = kvm_mmu_get_httbr();
> - hyp_stack_ptr = __this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE;
> - hyp_stack_ptr = kern_hyp_va(hyp_stack_ptr);
> - vector_ptr = (unsigned
> long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
> + params->pgd_ptr = kvm_mmu_get_httbr();
> + params->vector_ptr = (unsigned
> long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
> + params->hyp_stack_ptr =
> kern_hyp_va(__this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE);
> +
> + /*
> + * Flush the init params from the data cache because the struct will
> + * be read from while the MMU is off.

s/from while/while/ ?

> + */
> + __flush_dcache_area(params, sizeof(*params));
>
> /*
> * Call initialization code, and switch to the full blown HYP code.
> @@ -1360,8 +1363,7 @@ static void cpu_init_hyp_mode(void)
> * cpus_have_const_cap() wrapper.
> */
> BUG_ON(!system_capabilities_finalized());
> - arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init),
> - pgd_ptr, tpidr_el2, hyp_stack_ptr, vector_ptr, &res);
> + arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init),
> virt_to_phys(params), &res);
> WARN_ON(res.a0 != SMCCC_RET_SUCCESS);
>
> /*
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index 96e70f976ff5..6f3ac5d428ec 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -47,10 +47,7 @@ __invalid:
>
> /*
> * x0: SMCCC function ID
> - * x1: HYP pgd
> - * x2: per-CPU offset
> - * x3: HYP stack
> - * x4: HYP vectors
> + * x1: struct kvm_nvhe_init_params PA
> */
> __do_hyp_init:
> /* Check for a stub HVC call */
> @@ -71,10 +68,16 @@ __do_hyp_init:
> mov x0, #SMCCC_RET_NOT_SUPPORTED
> eret
>
> -1:
> - /* Set tpidr_el2 for use by HYP to free a register */
> - msr tpidr_el2, x2
> +1: ldr x0, [x1, #NVHE_INIT_TPIDR_EL2]
> + msr tpidr_el2, x0
>
> + ldr x0, [x1, #NVHE_INIT_STACK_PTR]
> + mov sp, x0
> +
> + ldr x0, [x1, #NVHE_INIT_VECTOR_PTR]
> + msr vbar_el2, x0
> +
> + ldr x1, [x1, #NVHE_INIT_PGD_PTR]
> phys_to_ttbr x0, x1
> alternative_if ARM64_HAS_CNP
> orr x0, x0, #TTBR_CNP_BIT
> @@ -134,10 +137,6 @@ alternative_else_nop_endif
> msr sctlr_el2, x0
> isb
>
> - /* Set the stack and new vectors */
> - mov sp, x3
> - msr vbar_el2, x4
> -
> /* Hello, World! */
> mov x0, #SMCCC_RET_SUCCESS
> eret
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index e2eafe2c93af..411b0f652417 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -14,6 +14,8 @@
>
> #include <kvm/arm_hypercalls.h>
>
> +DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> +
> static void handle_host_hcall(unsigned long func_id,
> struct kvm_cpu_context *host_ctxt)
> {

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-10 15:59:14

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 10/24] kvm: arm64: Extract parts of el2_setup into a macro

On 2020-11-09 11:32, David Brazdil wrote:
> When the a CPU is booted in EL2, the kernel checks for VHE support and
> initializes the CPU core accordingly. For nVHE it also installs the
> stub
> vectors and drops down to EL1.
>
> Once KVM gains the ability to boot cores without going through the
> kernel entry point, it will need to initialize the CPU the same way.
> Extract the relevant bits of el2_setup into init_el2_state macro
> with an argument specifying whether to initialize for VHE or nVHE.
>
> No functional change. Size of el2_setup increased by 148 bytes due
> to duplication.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/include/asm/kvm_asm.h | 128 ++++++++++++++++++++++++++++
> arch/arm64/kernel/head.S | 140 +++----------------------------
> 2 files changed, 141 insertions(+), 127 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h
> b/arch/arm64/include/asm/kvm_asm.h
> index a49a87a186c3..893327d1e449 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -331,6 +331,134 @@ extern char
> __smccc_workaround_1_smc[__SMCCC_WORKAROUND_1_SMC_SZ];
> msr sp_el0, \tmp
> .endm
>
> +.macro init_el2_state mode
> +
> +.ifnes "\mode", "vhe"
> +.ifnes "\mode", "nvhe"
> +.error "Invalid 'mode' argument"
> +.endif
> +.endif
> +
> + mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
> + msr sctlr_el2, x0
> + isb
> +
> + /*
> + * Allow Non-secure EL1 and EL0 to access physical timer and counter.
> + * This is not necessary for VHE, since the host kernel runs in EL2,
> + * and EL0 accesses are configured in the later stage of boot
> process.
> + * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit
> layout
> + * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are
> redefined
> + * to access CNTHCTL_EL2. This allows the kernel designed to run at
> EL1
> + * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
> + * EL2.
> + */
> +.ifeqs "\mode", "nvhe"
> + mrs x0, cnthctl_el2
> + orr x0, x0, #3 // Enable EL1 physical timers
> + msr cnthctl_el2, x0
> +.endif
> + msr cntvoff_el2, xzr // Clear virtual offset
> +
> +#ifdef CONFIG_ARM_GIC_V3
> + /* GICv3 system register access */
> + mrs x0, id_aa64pfr0_el1
> + ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
> + cbz x0, 3f
> +
> + mrs_s x0, SYS_ICC_SRE_EL2
> + orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
> + orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
> + msr_s SYS_ICC_SRE_EL2, x0
> + isb // Make sure SRE is now set
> + mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
> + tbz x0, #0, 3f // and check that it sticks
> + msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
> +3:
> +#endif
> +
> + /* Populate ID registers. */
> + mrs x0, midr_el1
> + mrs x1, mpidr_el1
> + msr vpidr_el2, x0
> + msr vmpidr_el2, x1

I don't think this has any effect on VHE, and could be lumped
together with the nVHE code.

> +
> +#ifdef CONFIG_COMPAT
> + msr hstr_el2, xzr // Disable CP15 traps to EL2
> +#endif
> +
> + /* EL2 debug */
> + mrs x1, id_aa64dfr0_el1
> + sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
> + cmp x0, #1
> + b.lt 4f // Skip if no PMU present
> + mrs x0, pmcr_el0 // Disable debug access traps
> + ubfx x0, x0, #11, #5 // to EL2 and allow access to
> +4:
> + csel x3, xzr, x0, lt // all PMU counters from EL1
> +
> + /* Statistical profiling */
> + ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
> + cbz x0, 7f // Skip if SPE not present
> +.ifeqs "\mode", "nvhe"
> + mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
> + and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
> + cbnz x4, 5f // then permit sampling of physical
> + mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
> + 1 << SYS_PMSCR_EL2_PA_SHIFT)
> + msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
> +5:
> + mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
> + orr x3, x3, x1 // If we don't have VHE, then
> + b 7f // use EL1&0 translation.
> +.endif
> + orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1

This orr would probably be better placed in an "else" close to the
previous macro. And since you are making things "modular", why not
go the extra mile, and define macros for each functionality that
defer between modes? It would certainly make this change more digest,
and the result more readable.

> +7:
> + msr mdcr_el2, x3 // Configure debug traps
> +
> + /* LORegions */
> + mrs x1, id_aa64mmfr1_el1
> + ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
> + cbz x0, 1f
> + msr_s SYS_LORC_EL1, xzr
> +1:
> +
> + /* Stage-2 translation */
> + msr vttbr_el2, xzr
> +
> +.ifeqs "\mode", "nvhe"
> + /*
> + * When VHE is not in use, early init of EL2 and EL1 needs to be
> + * done here.
> + * When VHE _is_ in use, EL1 will not be used in the host and
> + * requires no configuration, and all non-hyp-specific EL2 setup
> + * will be done via the _EL1 system register aliases in __cpu_setup.
> + */
> + mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
> + msr sctlr_el1, x0
> +
> + /* Coprocessor traps. */
> + mov x0, #0x33ff
> + msr cptr_el2, x0 // Disable copro. traps to EL2
> +
> + /* SVE register access */
> + mrs x1, id_aa64pfr0_el1
> + ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
> + cbz x1, 7f
> +
> + bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
> + msr cptr_el2, x0 // Disable copro. traps to EL2
> + isb
> + mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
> + msr_s SYS_ZCR_EL2, x1 // length for EL1.
> +
> + /* spsr */
> +7: mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
> + PSR_MODE_EL1h)
> + msr spsr_el2, x0
> +.endif
> +.endm
> +
> #endif
>
> #endif /* __ARM_KVM_ASM_H__ */
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index d8d9caf02834..e7270b63abed 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -25,6 +25,7 @@
> #include <asm/image.h>
> #include <asm/kernel-pgtable.h>
> #include <asm/kvm_arm.h>
> +#include <asm/kvm_asm.h>
> #include <asm/memory.h>
> #include <asm/pgtable-hwdef.h>
> #include <asm/page.h>
> @@ -499,153 +500,38 @@ SYM_FUNC_START(el2_setup)
> isb
> ret
>
> -1: mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
> - msr sctlr_el2, x0
> -
> +1:
> #ifdef CONFIG_ARM64_VHE
> /*
> - * Check for VHE being present. For the rest of the EL2 setup,
> - * x2 being non-zero indicates that we do have VHE, and that the
> - * kernel is intended to run at EL2.
> + * Check for VHE being present. x2 being non-zero indicates that we
> + * do have VHE, and that the kernel is intended to run at EL2.
> */
> mrs x2, id_aa64mmfr1_el1
> ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
> -#else
> - mov x2, xzr
> -#endif
> + cbz x2, el2_setup_nvhe

What initialises x2 to zero when CONFIG_VHE is disabled?

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-11 11:33:58

by David Brazdil

[permalink] [raw]
Subject: Re: [PATCH v1 01/24] psci: Accessor for configured PSCI version

> > +
> > struct psci_operations {
> > u32 (*get_version)(void);
> > int (*cpu_suspend)(u32 state, unsigned long entry_point);
>
> I still maintain that populating .get_version in all cases instead of
> duplicating an existing functionality is a better outcome. PSCI not
> supported would be implied by .get_version being NULL.
>
> What breaks?

Must have missed that suggestion before. Good idea, I'll do that.

2020-11-11 12:01:27

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 13/24] kvm: arm64: Add CPU entry point in nVHE hyp

On 2020-11-09 11:32, David Brazdil wrote:
> When nVHE hyp starts interception host's PSCI CPU_ON SMCs, it will need
> to install KVM on the newly booted CPU before returning to the host.
> Add
> an entry point which expects the same kvm_nvhe_init_params struct as
> the
> __kvm_hyp_init HVC in the CPU_ON context argument (x0).
>
> The entry point initializes EL2 state with the same init_el2_state
> macro
> used by the kernel's entry point. It then initializes KVM using the
> same
> helper function used in the __kvm_hyp_init HVC.
>
> When done, the entry point branches to a function provided in the init
> params.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/include/asm/kvm_asm.h | 1 +
> arch/arm64/kernel/asm-offsets.c | 1 +
> arch/arm64/kvm/hyp/nvhe/hyp-init.S | 30 ++++++++++++++++++++++++++++++
> 3 files changed, 32 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h
> b/arch/arm64/include/asm/kvm_asm.h
> index 893327d1e449..efb4872bb29f 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -155,6 +155,7 @@ struct kvm_nvhe_init_params {
> unsigned long tpidr_el2;
> unsigned long hyp_stack_ptr;
> unsigned long vector_ptr;
> + unsigned long psci_cpu_entry_fn;
> };
>
> /* Translate a kernel address @ptr into its equivalent linear mapping
> */
> diff --git a/arch/arm64/kernel/asm-offsets.c
> b/arch/arm64/kernel/asm-offsets.c
> index 0cbb86135c7c..ffc84e68ad97 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -114,6 +114,7 @@ int main(void)
> DEFINE(NVHE_INIT_TPIDR_EL2, offsetof(struct kvm_nvhe_init_params,
> tpidr_el2));
> DEFINE(NVHE_INIT_STACK_PTR, offsetof(struct kvm_nvhe_init_params,
> hyp_stack_ptr));
> DEFINE(NVHE_INIT_VECTOR_PTR, offsetof(struct kvm_nvhe_init_params,
> vector_ptr));
> + DEFINE(NVHE_INIT_PSCI_CPU_ENTRY_FN, offsetof(struct
> kvm_nvhe_init_params, psci_cpu_entry_fn));
> #endif
> #ifdef CONFIG_CPU_PM
> DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index 1697d25756e9..f999a35b2c8c 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -6,6 +6,7 @@
>
> #include <linux/arm-smccc.h>
> #include <linux/linkage.h>
> +#include <linux/irqchip/arm-gic-v3.h>

This should probably be included from the file that provides
init_el2_state.

>
> #include <asm/alternative.h>
> #include <asm/assembler.h>
> @@ -159,6 +160,35 @@ alternative_else_nop_endif
> ret
> SYM_CODE_END(___kvm_hyp_init)
>
> +SYM_CODE_START(__kvm_hyp_cpu_entry)
> + msr SPsel, #1 // We want to use SP_EL{1,2}
> +
> + /*
> + * Check that the core was booted in EL2. Loop indefinitely if not
> + * because it cannot be safely given to the host without installing
> KVM.
> + */
> + mrs x1, CurrentEL
> + cmp x1, #CurrentEL_EL2
> + b.ne .

This is a bit brutal. Consider using a WFE/WFI loop as we have in other
places already (see __secondary_too_slow for example).

> +
> + /* Initialize EL2 CPU state to sane values. */
> + mov x29, x0
> + init_el2_state nvhe
> + mov x0, x29
> +
> + /*
> + * Load hyp VA of C entry function. Must do so before switching on
> the
> + * MMU because the struct pointer is PA and not identity-mapped in
> hyp.
> + */
> + ldr x29, [x0, #NVHE_INIT_PSCI_CPU_ENTRY_FN]
> +
> + /* Enable MMU, set vectors and stack. */
> + bl ___kvm_hyp_init
> +
> + /* Leave idmap. */
> + br x29

To a point I made against an earlier patch: psci_cpu_entry_fn seems to
be a HYP
VA, and really needs to be documented as such, because this is pretty
hard to
follow otherwise.

> +SYM_CODE_END(__kvm_hyp_cpu_entry)
> +
> SYM_CODE_START(__kvm_handle_stub_hvc)
> cmp x0, #HVC_SOFT_RESTART
> b.ne 1f

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-11 12:19:34

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 11/24] kvm: arm64: Add SMC handler in nVHE EL2

On 2020-11-09 11:32, David Brazdil wrote:
> Add handler of host SMCs in KVM nVHE trap handler. Forward all SMCs to
> EL3 and propagate the result back to EL1. This is done in preparation
> for validating host SMCs in KVM nVHE protected mode.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 35 ++++++++++++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 19332c20fcde..8661bc7deaa9 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -106,6 +106,38 @@ static void handle_host_hcall(struct
> kvm_cpu_context *host_ctxt)
> host_ctxt->regs.regs[1] = ret;
> }
>
> +static void skip_host_instruction(void)
> +{
> + write_sysreg_el2(read_sysreg_el2(SYS_ELR) + 4, SYS_ELR);
> +}
> +
> +static void forward_host_smc(struct kvm_cpu_context *host_ctxt)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_1_1_smc(host_ctxt->regs.regs[0], host_ctxt->regs.regs[1],
> + host_ctxt->regs.regs[2], host_ctxt->regs.regs[3],
> + host_ctxt->regs.regs[4], host_ctxt->regs.regs[5],
> + host_ctxt->regs.regs[6], host_ctxt->regs.regs[7],
> + &res);

How do you make sure that EL3 actually supports SMCCC 1.1? If that's not
the case, don't we risk additional registers being corrupted?

What of SMCCC 1.2 calls that extend arguments to up to x17?

> + host_ctxt->regs.regs[0] = res.a0;
> + host_ctxt->regs.regs[1] = res.a1;
> + host_ctxt->regs.regs[2] = res.a2;
> + host_ctxt->regs.regs[3] = res.a3;
> +}
> +
> +static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
> +{
> + /*
> + * Unlike HVC, the return address of an SMC is the instruction's PC.
> + * Move the return address past the instruction.
> + */
> + skip_host_instruction();
> +
> + /* Forward SMC not handled in EL2 to EL3. */
> + forward_host_smc(host_ctxt);

nit: In general, we update the PC *after* emulating the instruction that
trapped. Not a big deal, but it makes it easier to reason across the
code base.

> +}
> +
> void handle_trap(struct kvm_cpu_context *host_ctxt)
> {
> u64 esr = read_sysreg_el2(SYS_ESR);
> @@ -114,6 +146,9 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
> case ESR_ELx_EC_HVC64:
> handle_host_hcall(host_ctxt);
> break;
> + case ESR_ELx_EC_SMC64:
> + handle_host_smc(host_ctxt);
> + break;
> default:
> hyp_panic();
> }

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-11 12:36:30

by David Brazdil

[permalink] [raw]
Subject: Re: [PATCH v1 06/24] kvm: arm64: Support per_cpu_ptr in nVHE hyp code

> > +
> > + cpu_base_array = kern_hyp_va(&kvm_arm_hyp_percpu_base[0]);
>
> There is no guarantee that this will not generate a PC relative
> addressing, resulting in kern_hyp_va() being applied twice.
>
> Consider using hyp_symbol_addr() instead, which always does the right
> by forcing a PC relative addressing and not subsequently mangling
> the address.
>
> > + this_cpu_base = kern_hyp_va(cpu_base_array[cpu]);
> > + return this_cpu_base - (unsigned long)&__per_cpu_start;
>
> And this is the opposite case: if the compiler generates an absolute
> address, you're toast. Yes, this is just as unlikely, but hey...
> Same remedy should apply.

Good point, and I'll probably keep forgetting about this in the future. Now
that all .hyp.text is only executed under hyp page tables, should we start
thinking about fixing up the relocations?

2020-11-11 12:48:25

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 06/24] kvm: arm64: Support per_cpu_ptr in nVHE hyp code

On 2020-11-11 12:32, David Brazdil wrote:
>> > +
>> > + cpu_base_array = kern_hyp_va(&kvm_arm_hyp_percpu_base[0]);
>>
>> There is no guarantee that this will not generate a PC relative
>> addressing, resulting in kern_hyp_va() being applied twice.
>>
>> Consider using hyp_symbol_addr() instead, which always does the right
>> by forcing a PC relative addressing and not subsequently mangling
>> the address.
>>
>> > + this_cpu_base = kern_hyp_va(cpu_base_array[cpu]);
>> > + return this_cpu_base - (unsigned long)&__per_cpu_start;
>>
>> And this is the opposite case: if the compiler generates an absolute
>> address, you're toast. Yes, this is just as unlikely, but hey...
>> Same remedy should apply.
>
> Good point, and I'll probably keep forgetting about this in the future.
> Now
> that all .hyp.text is only executed under hyp page tables, should we
> start
> thinking about fixing up the relocations?

Why not, if you can deal with the hypervisor text being mapped at a
random
location, and make sure that the kernel doesn't process the relocations
for you. This would certainly save us a lot of runtime offsetting (which
I'm adding to in a separate series).

M.
--
Jazz is not dead. It just smells funny...

2020-11-11 13:07:52

by David Brazdil

[permalink] [raw]
Subject: Re: [PATCH v1 07/24] kvm: arm64: Create nVHE copy of cpu_logical_map

> > +/*
> > + * nVHE copy of data structures tracking available CPU cores.
> > + * Only entries for CPUs that were online at KVM init are populated.
> > + * Other CPUs should not be allowed to boot because their features were
> > + * not checked against the finalized system capabilities.
> > + */
> > +u64 __ro_after_init __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1]
> > = INVALID_HWID };
>
> I'm not sure what __ro_after_init means once we get S2 isolation.

It is stretching the definition of 'init' a bit, I know, but I don't see what
your worry is about S2? The intention is to mark this read-only for .hyp.text
at runtime. With S2, the host won't be able to write to it after KVM init.
Obviously that's currently not the case.

One thing we might change in the future is marking it RW for some initial
setup in a HVC handler, then marking it RO for the rest of uptime.

>
> > +
> > +u64 cpu_logical_map(int cpu)
>
> nit: is there any reason why "cpu" cannot be unsigned? The thought
> of a negative CPU number makes me shiver...

Same here. That's how it's defined in kernel proper, so I went with that.

>
> > +{
> > + if (cpu < 0 || cpu >= ARRAY_SIZE(__cpu_logical_map))
> > + hyp_panic();
> > +
> > + return __cpu_logical_map[cpu];
> > +}
> > +
> > unsigned long __hyp_per_cpu_offset(unsigned int cpu)
> > {
> > unsigned long *cpu_base_array;
>
> Overall, this patch would make more sense closer it its use case
> (in patch 19). I also don't understand why this lives in percpu.c...

I didn't think it called for adding another C file for this. How about we
rename this file to smp.c? Would that make sense for both?

2020-11-11 13:20:39

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 15/24] kvm: arm64: Bootstrap PSCI SMC handler in nVHE EL2

On 2020-11-09 11:32, David Brazdil wrote:
> Add a handler of PSCI SMCs in nVHE hyp code. The handler is initialized
> with the version used by the host's PSCI driver and the function IDs it
> was configured with. If the SMC function ID matches one of the
> configured PSCI calls (for v0.1) or falls into the PSCI function ID
> range (for v0.2+), the SMC is handled by the PSCI handler. For now, all
> SMCs return PSCI_RET_NOT_SUPPORTED.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/include/asm/kvm_hyp.h | 4 ++
> arch/arm64/kvm/arm.c | 13 ++++
> arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 ++
> arch/arm64/kvm/hyp/nvhe/psci.c | 102 +++++++++++++++++++++++++++++
> include/uapi/linux/psci.h | 1 +
> 6 files changed, 125 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/kvm/hyp/nvhe/psci.c
>
> diff --git a/arch/arm64/include/asm/kvm_hyp.h
> b/arch/arm64/include/asm/kvm_hyp.h
> index a3289071f3d8..95a2bbbcc7e1 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -96,6 +96,10 @@ void deactivate_traps_vhe_put(void);
>
> u64 __guest_enter(struct kvm_vcpu *vcpu);
>
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
> +#endif
> +
> void __noreturn hyp_panic(void);
> #ifdef __KVM_NVHE_HYPERVISOR__
> void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr,
> u64 par);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 1a57b6025937..28e3bc056225 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -19,6 +19,7 @@
> #include <linux/kvm_irqfd.h>
> #include <linux/irqbypass.h>
> #include <linux/sched/stat.h>
> +#include <linux/psci.h>
> #include <trace/events/kvm.h>
>
> #define CREATE_TRACE_POINTS
> @@ -1498,6 +1499,17 @@ static void init_cpu_logical_map(void)
> CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
> }
>
> +static void init_psci(void)

nit: init_psci_relay?

> +{
> + extern u32 kvm_nvhe_sym(kvm_host_psci_version);
> + extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];
> + int i;
> +
> + CHOOSE_NVHE_SYM(kvm_host_psci_version) = psci_driver_version();
> + for (i = 0; i < PSCI_FN_MAX; ++i)
> + CHOOSE_NVHE_SYM(kvm_host_psci_function_id)[i] =
> psci_get_function_id(i);
> +}
> +
> static int init_common_resources(void)
> {
> return kvm_set_ipa_limit();
> @@ -1677,6 +1689,7 @@ static int init_hyp_mode(void)
> }
>
> init_cpu_logical_map();
> + init_psci();
>
> return 0;
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile
> b/arch/arm64/kvm/hyp/nvhe/Makefile
> index c45f440cce51..647b63337a51 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -7,7 +7,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__
> ccflags-y := -D__KVM_NVHE_HYPERVISOR__
>
> obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o
> host.o \
> - hyp-main.o percpu.o
> + hyp-main.o percpu.o psci.o
> obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o
> ../entry.o \
> ../fpsimd.o ../hyp-entry.o
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 8661bc7deaa9..69f34d4f2773 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -134,6 +134,10 @@ static void handle_host_smc(struct
> kvm_cpu_context *host_ctxt)
> */
> skip_host_instruction();
>
> + /* Try to handle host's PSCI SMCs. */
> + if (kvm_host_psci_handler(host_ctxt))
> + return;
> +
> /* Forward SMC not handled in EL2 to EL3. */
> forward_host_smc(host_ctxt);
> }
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c
> b/arch/arm64/kvm/hyp/nvhe/psci.c
> new file mode 100644
> index 000000000000..82d3b2c89658
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/nvhe/psci.c

nit: can we please name this psci-relay.c, or psci-proxy.c?
We already have a psci.c in the tree, and having the same file name
messes
with my editor... ;-)

> @@ -0,0 +1,102 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 - Google LLC
> + * Author: David Brazdil <[email protected]>
> + */
> +
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +#include <kvm/arm_hypercalls.h>
> +#include <linux/arm-smccc.h>
> +#include <linux/psci.h>
> +#include <kvm/arm_psci.h>
> +#include <uapi/linux/psci.h>
> +
> +/* Config options set by the host. */
> +u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
> +u32 kvm_host_psci_function_id[PSCI_FN_MAX];
> +
> +static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
> +{
> + return host_ctxt->regs.regs[0];
> +}
> +
> +static bool is_psci_0_1_call(u64 func_id)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < ARRAY_SIZE(kvm_host_psci_function_id); ++i) {
> + if (func_id == kvm_host_psci_function_id[i])
> + return true;
> + }
> + return false;
> +}
> +
> +static bool is_psci_0_2_fn_call(u64 func_id)
> +{
> + u64 base = func_id & ~PSCI_0_2_FN_ID_MASK;
> +
> + return base == PSCI_0_2_FN_BASE || base == PSCI_0_2_FN64_BASE;

I couldn't spot in the spec where PSCI reserves 16bit worth of IDs in
each range.

> +}
> +
> +static bool is_psci_call(u64 func_id)
> +{
> + if (kvm_host_psci_version == PSCI_VERSION(0, 0))
> + return false;
> + else if (kvm_host_psci_version == PSCI_VERSION(0, 1))
> + return is_psci_0_1_call(func_id);
> + else
> + return is_psci_0_2_fn_call(func_id);

Consider using switch/case constructs for readability.

> +}
> +
> +static unsigned long psci_0_1_handler(u64 func_id, struct
> kvm_cpu_context *host_ctxt)
> +{
> + return PSCI_RET_NOT_SUPPORTED;
> +}
> +
> +static unsigned long psci_0_2_handler(u64 func_id, struct
> kvm_cpu_context *host_ctxt)
> +{
> + switch (func_id) {
> + default:
> + return PSCI_RET_NOT_SUPPORTED;
> + }
> +}
> +
> +static unsigned long psci_1_0_handler(u64 func_id, struct
> kvm_cpu_context *host_ctxt)
> +{
> + int ret;
> +
> + ret = psci_0_2_handler(func_id, host_ctxt);
> + if (ret != PSCI_RET_NOT_SUPPORTED)
> + return ret;
> +
> + switch (func_id) {
> + default:
> + return PSCI_RET_NOT_SUPPORTED;
> + }

It would probably help to adopt the same structure as we have in the
KVM PSCI implementation:

switch(psci_fn) {
case PSCI_0_2_FN_PSCI_VERSION:
val = KVM_ARM_PSCI_1_0;
break;

[...]
default:
return kvm_psci_0_2_call(vcpu);

which allows 1.0 to override some 0.2 functions, and otherwise leave
it to the 0.2 backend.

> +}
> +
> +bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt)
> +{
> + u64 func_id = get_psci_func_id(host_ctxt);
> + unsigned long ret;
> +
> + if (!is_psci_call(func_id))
> + return false;
> +
> + if (kvm_host_psci_version == PSCI_VERSION(0, 1))
> + ret = psci_0_1_handler(func_id, host_ctxt);
> + else if (kvm_host_psci_version == PSCI_VERSION(0, 2))
> + ret = psci_0_2_handler(func_id, host_ctxt);
> + else if (PSCI_VERSION_MAJOR(kvm_host_psci_version) >= 1)
> + ret = psci_1_0_handler(func_id, host_ctxt);
> + else
> + ret = PSCI_RET_NOT_SUPPORTED;

Same remark about the use of switch/case.

> +
> + host_ctxt->regs.regs[0] = ret;
> + host_ctxt->regs.regs[1] = 0;
> + host_ctxt->regs.regs[2] = 0;
> + host_ctxt->regs.regs[3] = 0;
> + return true;
> +}
> diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
> index 2fcad1dd0b0e..0d52b8dbe8c2 100644
> --- a/include/uapi/linux/psci.h
> +++ b/include/uapi/linux/psci.h
> @@ -29,6 +29,7 @@
> #define PSCI_0_2_FN64_BASE \
> (PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
> #define PSCI_0_2_FN64(n) (PSCI_0_2_FN64_BASE + (n))
> +#define PSCI_0_2_FN_ID_MASK 0xffff
>
> #define PSCI_0_2_FN_PSCI_VERSION PSCI_0_2_FN(0)
> #define PSCI_0_2_FN_CPU_SUSPEND PSCI_0_2_FN(1)

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-11 13:32:25

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 07/24] kvm: arm64: Create nVHE copy of cpu_logical_map

On 2020-11-11 13:03, David Brazdil wrote:
>> > +/*
>> > + * nVHE copy of data structures tracking available CPU cores.
>> > + * Only entries for CPUs that were online at KVM init are populated.
>> > + * Other CPUs should not be allowed to boot because their features were
>> > + * not checked against the finalized system capabilities.
>> > + */
>> > +u64 __ro_after_init __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1]
>> > = INVALID_HWID };
>>
>> I'm not sure what __ro_after_init means once we get S2 isolation.
>
> It is stretching the definition of 'init' a bit, I know, but I don't
> see what
> your worry is about S2? The intention is to mark this read-only for
> .hyp.text
> at runtime. With S2, the host won't be able to write to it after KVM
> init.
> Obviously that's currently not the case.

More importantly, EL2 can write to it at any time, which is the bit I'm
worried
about, as it makes the annotation misleading.

> One thing we might change in the future is marking it RW for some
> initial
> setup in a HVC handler, then marking it RO for the rest of uptime.

That'd be a desirable outcome, and it would be consistent with the rest
of the kernel.

>
>>
>> > +
>> > +u64 cpu_logical_map(int cpu)
>>
>> nit: is there any reason why "cpu" cannot be unsigned? The thought
>> of a negative CPU number makes me shiver...
>
> Same here. That's how it's defined in kernel proper, so I went with
> that.

I'm happy to deviate from the kernel (give the function a different name
if this clashes with existing include files).

We can also fix the rest of the kernel (I've just written the trivial
patch).

>>
>> > +{
>> > + if (cpu < 0 || cpu >= ARRAY_SIZE(__cpu_logical_map))
>> > + hyp_panic();
>> > +
>> > + return __cpu_logical_map[cpu];
>> > +}
>> > +
>> > unsigned long __hyp_per_cpu_offset(unsigned int cpu)
>> > {
>> > unsigned long *cpu_base_array;
>>
>> Overall, this patch would make more sense closer it its use case
>> (in patch 19). I also don't understand why this lives in percpu.c...
>
> I didn't think it called for adding another C file for this. How about
> we
> rename this file to smp.c? Would that make sense for both?

Make that hyp-smp.c, please!

M.
--
Jazz is not dead. It just smells funny...

2020-11-11 13:48:17

by David Brazdil

[permalink] [raw]
Subject: Re: [PATCH v1 07/24] kvm: arm64: Create nVHE copy of cpu_logical_map

On Wed, Nov 11, 2020 at 01:29:29PM +0000, Marc Zyngier wrote:
> On 2020-11-11 13:03, David Brazdil wrote:
> > > > +/*
> > > > + * nVHE copy of data structures tracking available CPU cores.
> > > > + * Only entries for CPUs that were online at KVM init are populated.
> > > > + * Other CPUs should not be allowed to boot because their features were
> > > > + * not checked against the finalized system capabilities.
> > > > + */
> > > > +u64 __ro_after_init __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1]
> > > > = INVALID_HWID };
> > >
> > > I'm not sure what __ro_after_init means once we get S2 isolation.
> >
> > It is stretching the definition of 'init' a bit, I know, but I don't see
> > what
> > your worry is about S2? The intention is to mark this read-only for
> > .hyp.text
> > at runtime. With S2, the host won't be able to write to it after KVM
> > init.
> > Obviously that's currently not the case.
>
> More importantly, EL2 can write to it at any time, which is the bit I'm
> worried
> about, as it makes the annotation misleading.

EL2 can't, at least not accidentally. The hyp memory mapping is PAGE_HYP_RO
(see patch 05). Does this annotation have stronger guarantees in EL1?
AFAICT, those variables are made PAGE_KERNEL_RO in mark_rodata_ro().

>
> > One thing we might change in the future is marking it RW for some
> > initial
> > setup in a HVC handler, then marking it RO for the rest of uptime.
>
> That'd be a desirable outcome, and it would be consistent with the rest
> of the kernel.
>
> >
> > >
> > > > +
> > > > +u64 cpu_logical_map(int cpu)
> > >
> > > nit: is there any reason why "cpu" cannot be unsigned? The thought
> > > of a negative CPU number makes me shiver...
> >
> > Same here. That's how it's defined in kernel proper, so I went with
> > that.
>
> I'm happy to deviate from the kernel (give the function a different name
> if this clashes with existing include files).
>
> We can also fix the rest of the kernel (I've just written the trivial
> patch).

Shouldn't clash with include files. Where fixing the kernel might clash is
all the users of for_each_*_cpu that use an int for the iterator var.

2020-11-11 13:50:02

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 16/24] kvm: arm64: Add offset for hyp VA <-> PA conversion

On 2020-11-09 11:32, David Brazdil wrote:
> Add a host-initialized constant to KVM nVHE hyp code for converting
> between EL2 linear map virtual addresses and physical addresses.
> Also add `__hyp_pa` macro that performs the conversion.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/kvm/arm.c | 15 +++++++++++++++
> arch/arm64/kvm/hyp/nvhe/psci.c | 3 +++
> 2 files changed, 18 insertions(+)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 28e3bc056225..dc7d43d7785a 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1484,6 +1484,20 @@ static inline void hyp_cpu_pm_exit(void)
> }
> #endif
>
> +static void init_hyp_physvirt_offset(void)
> +{
> + extern s64 kvm_nvhe_sym(hyp_physvirt_offset);
> + unsigned long kern_vaddr, hyp_vaddr, paddr;
> +
> + /* Check that kvm_arm_hyp_percpu_base has been set. */
> + BUG_ON(kvm_arm_hyp_percpu_base[0] == 0);

Why is this dependent on the percpu base? Or is that just a convenient
symbol?

> +
> + kern_vaddr = kvm_arm_hyp_percpu_base[0];
> + hyp_vaddr = kern_hyp_va(kern_vaddr);
> + paddr = __pa(kern_vaddr);
> + CHOOSE_NVHE_SYM(hyp_physvirt_offset) = (s64)paddr - (s64)hyp_vaddr;
> +}

It feels like this offset could be set at the point where we compute
the hyp_va offset in va_layout.c, couldn't it? It would have the
advantage of keeping all the ugly VA hacks together.

> +
> static void init_cpu_logical_map(void)
> {
> extern u64 kvm_nvhe_sym(__cpu_logical_map)[NR_CPUS];
> @@ -1688,6 +1702,7 @@ static int init_hyp_mode(void)
> }
> }
>
> + init_hyp_physvirt_offset();
> init_cpu_logical_map();
> init_psci();
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c
> b/arch/arm64/kvm/hyp/nvhe/psci.c
> index 82d3b2c89658..b0b5df590ba5 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci.c
> @@ -16,6 +16,9 @@
> /* Config options set by the host. */
> u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
> u32 kvm_host_psci_function_id[PSCI_FN_MAX];
> +s64 hyp_physvirt_offset;
> +
> +#define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)
>
> static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
> {

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-11-11 13:54:08

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 07/24] kvm: arm64: Create nVHE copy of cpu_logical_map

On 2020-11-11 13:45, David Brazdil wrote:
> On Wed, Nov 11, 2020 at 01:29:29PM +0000, Marc Zyngier wrote:
>> On 2020-11-11 13:03, David Brazdil wrote:
>> > > > +/*
>> > > > + * nVHE copy of data structures tracking available CPU cores.
>> > > > + * Only entries for CPUs that were online at KVM init are populated.
>> > > > + * Other CPUs should not be allowed to boot because their features were
>> > > > + * not checked against the finalized system capabilities.
>> > > > + */
>> > > > +u64 __ro_after_init __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1]
>> > > > = INVALID_HWID };
>> > >
>> > > I'm not sure what __ro_after_init means once we get S2 isolation.
>> >
>> > It is stretching the definition of 'init' a bit, I know, but I don't see
>> > what
>> > your worry is about S2? The intention is to mark this read-only for
>> > .hyp.text
>> > at runtime. With S2, the host won't be able to write to it after KVM
>> > init.
>> > Obviously that's currently not the case.
>>
>> More importantly, EL2 can write to it at any time, which is the bit
>> I'm
>> worried
>> about, as it makes the annotation misleading.
>
> EL2 can't, at least not accidentally. The hyp memory mapping is
> PAGE_HYP_RO
> (see patch 05).

Ah, I obviously overlooked that. Thanks for setting me straight.

> Shouldn't clash with include files. Where fixing the kernel might clash
> is
> all the users of for_each_*_cpu that use an int for the iterator var.

I don't think that's a problem (nobody expects that many CPUs). But if
you
are confident that we don't have a problem, no need to change the kernel
itself.

M.
--
Jazz is not dead. It just smells funny...

2020-11-11 14:35:09

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] Opt-in always-on nVHE hypervisor

Hi David,

On 2020-11-09 11:32, David Brazdil wrote:
> As we progress towards being able to keep guest state private to the
> host running nVHE hypervisor, this series allows the hypervisor to
> install itself on newly booted CPUs before the host is allowed to run
> on them.
>
> All functionality described below is opt-in, guarded by an early param
> 'kvm-arm.protected'. Future patches specific to the new "protected"
> mode
> should be hidden behind the same param.
>
> The hypervisor starts trapping host SMCs and intercepting host's PSCI
> CPU_ON/OFF/SUSPEND calls. It replaces the host's entry point with its
> own, initializes the EL2 state of the new CPU and installs the nVHE hyp
> vector before ERETing to the host's entry point.
>
> The kernel checks new cores' features against the finalized system
> capabilities. To avoid the need to move this code/data to EL2, the
> implementation only allows to boot cores that were online at the time
> of
> KVM initialization and therefore had been checked already.
>
> Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs
> implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded
> to EL3. Future changes will need to ensure the safety of all SMCs wrt.
> private guests.
>
> The host is still allowed to reset EL2 back to the stub vector, eg. for
> hibernation or kexec, but will not disable nVHE when there are no VMs.
>
> Tested on Rock Pi 4b, based on 5.10-rc3.

I think I've gone through most of the patches. When you respin this
series, you may want to do so on top of my host EL2 entry rework [1],
which change a few things you currently rely on.

If anything in there doesn't work for you, please let me know.

Thanks,

M.

[1] https://lore.kernel.org/kvm/[email protected]/
--
Jazz is not dead. It just smells funny...

2020-11-16 11:53:20

by David Brazdil

[permalink] [raw]
Subject: Re: [PATCH v1 13/24] kvm: arm64: Add CPU entry point in nVHE hyp

> > #ifdef CONFIG_CPU_PM
> > DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > index 1697d25756e9..f999a35b2c8c 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > @@ -6,6 +6,7 @@
> >
> > #include <linux/arm-smccc.h>
> > #include <linux/linkage.h>
> > +#include <linux/irqchip/arm-gic-v3.h>
>
> This should probably be included from the file that provides init_el2_state.

Agreed. This is a workaround for the fact that the arm-gic* headers don't play
nice with each other (define the same constants). Including arm-gic-v3.h in
kvm_asm.h will trigger macro redefine warnings in vgic*-v2.c because they
include arm-gic.h.

Another option is to create a header just for el2 init. Would that be
preferable? Other ideas?

>
> >
> > #include <asm/alternative.h>
> > #include <asm/assembler.h>
> > @@ -159,6 +160,35 @@ alternative_else_nop_endif
> > ret
> > SYM_CODE_END(___kvm_hyp_init)
> >
> > +SYM_CODE_START(__kvm_hyp_cpu_entry)
> > + msr SPsel, #1 // We want to use SP_EL{1,2}
> > +
> > + /*
> > + * Check that the core was booted in EL2. Loop indefinitely if not
> > + * because it cannot be safely given to the host without installing
> > KVM.
> > + */
> > + mrs x1, CurrentEL
> > + cmp x1, #CurrentEL_EL2
> > + b.ne .
>
> This is a bit brutal. Consider using a WFE/WFI loop as we have in other
> places already (see __secondary_too_slow for example).

Ack

> > +
> > + /* Initialize EL2 CPU state to sane values. */
> > + mov x29, x0
> > + init_el2_state nvhe
> > + mov x0, x29
> > +
> > + /*
> > + * Load hyp VA of C entry function. Must do so before switching on the
> > + * MMU because the struct pointer is PA and not identity-mapped in
> > hyp.
> > + */
> > + ldr x29, [x0, #NVHE_INIT_PSCI_CPU_ENTRY_FN]
> > +
> > + /* Enable MMU, set vectors and stack. */
> > + bl ___kvm_hyp_init
> > +
> > + /* Leave idmap. */
> > + br x29
>
> To a point I made against an earlier patch: psci_cpu_entry_fn seems to be a
> HYP
> VA, and really needs to be documented as such, because this is pretty hard
> to
> follow otherwise.
>
> > +SYM_CODE_END(__kvm_hyp_cpu_entry)
> > +
> > SYM_CODE_START(__kvm_handle_stub_hvc)
> > cmp x0, #HVC_SOFT_RESTART
> > b.ne 1f
>
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...

2020-11-16 12:45:04

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v1 13/24] kvm: arm64: Add CPU entry point in nVHE hyp

On 2020-11-16 11:49, David Brazdil wrote:
>> > #ifdef CONFIG_CPU_PM
>> > DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
>> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
>> > b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
>> > index 1697d25756e9..f999a35b2c8c 100644
>> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
>> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
>> > @@ -6,6 +6,7 @@
>> >
>> > #include <linux/arm-smccc.h>
>> > #include <linux/linkage.h>
>> > +#include <linux/irqchip/arm-gic-v3.h>
>>
>> This should probably be included from the file that provides
>> init_el2_state.
>
> Agreed. This is a workaround for the fact that the arm-gic* headers
> don't play
> nice with each other (define the same constants).

Ah, that...

> Including arm-gic-v3.h in
> kvm_asm.h will trigger macro redefine warnings in vgic*-v2.c because
> they
> include arm-gic.h.

Boo.

> Another option is to create a header just for el2 init. Would that be
> preferable? Other ideas?

Having an asm/el2_setup.h file feels like a better option. After all, it
is in no way KVM specific, and having the macros hanging around in
asm/kvm_asm.h feels odd.

M.
--
Jazz is not dead. It just smells funny...

2020-11-16 18:00:41

by David Brazdil

[permalink] [raw]
Subject: Re: [PATCH v1 10/24] kvm: arm64: Extract parts of el2_setup into a macro

Hi Marc,

> > + * Check for VHE being present. x2 being non-zero indicates that we
> > + * do have VHE, and that the kernel is intended to run at EL2.
> > */
> > mrs x2, id_aa64mmfr1_el1
> > ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
> > -#else
> > - mov x2, xzr
> > -#endif
> > + cbz x2, el2_setup_nvhe
>
> What initialises x2 to zero when CONFIG_VHE is disabled?

There is no need for x2 anymore, the nVHE code just falls through. Basically:

el2_setup:
< check CurrentEL >
b.eq 1f
ret
1:
#ifdef CONFIG_VHE
< check has VHE >
cbz el2_setup_nvhe
< VHE setup >
ret
#endif
el2_setup_nvhe:
< nVHE setup >
eret

-David

2021-01-19 13:32:02

by Janne Karhunen

[permalink] [raw]
Subject: Re: [PATCH v1 00/24] Opt-in always-on nVHE hypervisor

On Tue, Nov 10, 2020 at 1:19 PM Marc Zyngier <[email protected]> wrote:

> > Why? I thought we were trying to kill nVHE off now that newer CPUs
> > provide the saner virtualization extensions?
>
> We can't kill nVHE at all, because that is the only game in town.
> You can't even buy a decent machine with VHE, no matter how much money
> you put on the table.

As I mentioned it earlier, we did this type of nVHE hypervisor and the
proof of concept is here:
https://github.com/jkrh/kvms

See the README. It runs successfully on multiple pieces of arm64
hardware and provides a tiny QEMU based development environment via
the makefiles for the QEMU 'max' CPU. The code is rough, the amount of
man hours put to it is not sky high, but it does run. I'll update a
new kernel patch to patches/ dir for one of the later kernels
hopefully next week, up to now we have only supported kernels between
4.9 .... 5.6 as this is what our development hardware(s) run with. It
requires a handful of hooks in the kvm code, but the actual kvm calls
are just rerouted back to the kernel symbols. This way the hypervisor
itself can be kept very tiny.

The s2 page tables are fully owned by the hyp and the guests are
unmapped from the host memory when configured with the option (we call
it host blinding). Multiple VMs can be run without pinning them into
the memory. It also provides a tiny out of tree driver prototype stub
to protect the critical sections of the kernel memory beyond the
kernel's own reach. There are still holes in the implementation such
as the virtio-mapback handling via whitelisting and paging integrity
checks, and many things are not quite all the way there yet. One step
at a time.


--
Janne