2020-11-17 02:06:25

by David Brazdil

Subject: [PATCH v2 00/24] Opt-in always-on nVHE hypervisor

As we progress towards being able to keep guest state private from the
host running the nVHE hypervisor, this series allows the hypervisor to
install itself on newly booted CPUs before the host is allowed to run
on them.

All functionality described below is opt-in, guarded by an early param
'kvm-arm.protected'. Future patches specific to the new "protected" mode
should be hidden behind the same param.
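
For illustration, the opt-in boils down to a standard early param hook,
roughly like the sketch below (patch 21 adds the parameter and wires it
to a cpucap; the names here are illustrative, not the patch verbatim):

	static bool enable_protected_kvm;

	static int __init early_kvm_protected_cfg(char *buf)
	{
		return strtobool(buf, &enable_protected_kvm);
	}
	early_param("kvm-arm.protected", early_kvm_protected_cfg);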

The hypervisor starts trapping host SMCs and intercepting host's PSCI
CPU_ON/SUSPEND calls. It replaces the host's entry point with its own,
initializes the EL2 state of the new CPU and installs the nVHE hyp vector
before ERETing to the host's entry point.

The kernel checks new cores' features against the finalized system
capabilities. To avoid the need to move this code/data to EL2, the
implementation only allows booting cores that were online at the time of
KVM initialization and therefore had already been checked.

Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs
implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded
to EL3. Future changes will need to ensure the safety of all SMCs wrt.
private guests.

The host is still allowed to reset EL2 back to the stub vector, eg. for
hibernation or kexec, but will not disable nVHE when there are no VMs.

Tested on Rock Pi 4b, based on 5.10-rc4.

changes since v1:
* early param sets a capability instead of a static key
* assume SMCCC v1.2 for host SMC forwarding
* fix reserved SMC ID range for PSCI
* split init_el2_state into smaller macros, move to el2_setup.h
* many small cleanups

changes since RFC:
* add early param to make features opt-in
* simplify CPU_ON/SUSPEND implementation
* replace spinlocks with CAS atomic
* make cpu_logical_map ro_after_init

David Brazdil (24):
psci: Support psci_ops.get_version for v0.1
psci: Accessor for configured PSCI function IDs
arm64: Make cpu_logical_map() take unsigned int
arm64: Move MAIR_EL1_SET to asm/memory.h
kvm: arm64: Initialize MAIR_EL2 using a constant
kvm: arm64: Move hyp-init params to a per-CPU struct
kvm: arm64: Refactor handle_trap to use a switch
kvm: arm64: Add SMC handler in nVHE EL2
kvm: arm64: Add .hyp.data..ro_after_init ELF section
kvm: arm64: Support per_cpu_ptr in nVHE hyp code
kvm: arm64: Create nVHE copy of cpu_logical_map
kvm: arm64: Bootstrap PSCI SMC handler in nVHE EL2
kvm: arm64: Add offset for hyp VA <-> PA conversion
kvm: arm64: Forward safe PSCI SMCs coming from host
kvm: arm64: Extract parts of el2_setup into a macro
kvm: arm64: Extract __do_hyp_init into a helper function
kvm: arm64: Add CPU entry point in nVHE hyp
kvm: arm64: Add function to enter host from KVM nVHE hyp code
kvm: arm64: Intercept host's PSCI_CPU_ON SMCs
kvm: arm64: Intercept host's CPU_SUSPEND PSCI SMCs
kvm: arm64: Add kvm-arm.protected early kernel parameter
kvm: arm64: Keep nVHE EL2 vector installed
kvm: arm64: Trap host SMCs in protected mode.
kvm: arm64: Fix EL2 mode availability checks

arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/el2_setup.h | 185 +++++++++++++++++
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/kvm_asm.h | 16 +-
arch/arm64/include/asm/kvm_hyp.h | 8 +
arch/arm64/include/asm/memory.h | 29 ++-
arch/arm64/include/asm/percpu.h | 6 +
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/include/asm/smp.h | 4 +-
arch/arm64/include/asm/sysreg.h | 30 +++
arch/arm64/include/asm/virt.h | 26 +++
arch/arm64/kernel/asm-offsets.c | 5 +
arch/arm64/kernel/cpufeature.c | 29 +++
arch/arm64/kernel/head.S | 144 ++-----------
arch/arm64/kernel/image-vars.h | 3 +
arch/arm64/kernel/setup.c | 2 +-
arch/arm64/kernel/vmlinux.lds.S | 10 +
arch/arm64/kvm/arm.c | 94 +++++++--
arch/arm64/kvm/hyp/nvhe/Makefile | 3 +-
arch/arm64/kvm/hyp/nvhe/host.S | 47 +++++
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 90 +++++++--
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 47 ++++-
arch/arm64/kvm/hyp/nvhe/hyp-smp.c | 40 ++++
arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 1 +
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 289 +++++++++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 5 +-
arch/arm64/kvm/va_layout.c | 30 ++-
arch/arm64/mm/proc.S | 15 +-
drivers/firmware/psci/psci.c | 21 +-
include/linux/psci.h | 10 +
30 files changed, 977 insertions(+), 217 deletions(-)
create mode 100644 arch/arm64/include/asm/el2_setup.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/hyp-smp.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/psci-relay.c

--
2.29.2.299.gdc1121823c-goog


2020-11-17 02:07:28

by David Brazdil

Subject: [PATCH v2 15/24] kvm: arm64: Extract parts of el2_setup into a macro

When a CPU is booted in EL2, the kernel checks for VHE support and
initializes the CPU core accordingly. For nVHE it also installs the stub
vectors and drops down to EL1.

Once KVM gains the ability to boot cores without going through the
kernel entry point, it will need to initialize the CPU the same way.
Extract the relevant bits of el2_setup into an init_el2_state macro
with an argument specifying whether to initialize for VHE or nVHE.
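
Callers then write, for instance, 'init_el2_state nvhe' or
'init_el2_state vhe' at the point where the open-coded sequence used to
live, as the head.S hunk below shows.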

No functional change. Size of el2_setup increased by 148 bytes due
to duplication.

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/el2_setup.h | 185 +++++++++++++++++++++++++++++
arch/arm64/kernel/head.S | 144 +++-------------------
2 files changed, 201 insertions(+), 128 deletions(-)
create mode 100644 arch/arm64/include/asm/el2_setup.h

diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
new file mode 100644
index 000000000000..e5026e0aa878
--- /dev/null
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -0,0 +1,185 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2012,2013 - ARM Ltd
+ * Author: Marc Zyngier <[email protected]>
+ */
+
+#ifndef __ARM_KVM_INIT_H__
+#define __ARM_KVM_INIT_H__
+
+#ifndef __ASSEMBLY__
+#error Assembly-only header
+#endif
+
+#ifdef CONFIG_ARM_GIC_V3
+#include <linux/irqchip/arm-gic-v3.h>
+#endif
+
+#include <asm/kvm_arm.h>
+#include <asm/ptrace.h>
+#include <asm/sysreg.h>
+
+.macro __init_el2_sctlr
+ mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
+ msr sctlr_el2, x0
+ isb
+.endm
+
+/*
+ * Allow Non-secure EL1 and EL0 to access physical timer and counter.
+ * This is not necessary for VHE, since the host kernel runs in EL2,
+ * and EL0 accesses are configured in the later stage of boot process.
+ * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
+ * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
+ * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
+ * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
+ * EL2.
+ */
+.macro __init_el2_timers mode
+.ifeqs "\mode", "nvhe"
+ mrs x0, cnthctl_el2
+ orr x0, x0, #3 // Enable EL1 physical timers
+ msr cnthctl_el2, x0
+.endif
+ msr cntvoff_el2, xzr // Clear virtual offset
+.endm
+
+.macro __init_el2_debug mode
+ mrs x1, id_aa64dfr0_el1
+ sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
+ cmp x0, #1
+ b.lt 1f // Skip if no PMU present
+ mrs x0, pmcr_el0 // Disable debug access traps
+ ubfx x0, x0, #11, #5 // to EL2 and allow access to
+1:
+ csel x2, xzr, x0, lt // all PMU counters from EL1
+
+ /* Statistical profiling */
+ ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
+ cbz x0, 3f // Skip if SPE not present
+
+.ifeqs "\mode", "nvhe"
+ mrs_s x0, SYS_PMBIDR_EL1 // If SPE available at EL2,
+ and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
+ cbnz x0, 2f // then permit sampling of physical
+ mov x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
+ 1 << SYS_PMSCR_EL2_PA_SHIFT)
+ msr_s SYS_PMSCR_EL2, x0 // addresses and physical counter
+2:
+ mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
+ orr x2, x2, x0 // If we don't have VHE, then
+ // use EL1&0 translation.
+.else
+ orr x2, x2, #MDCR_EL2_TPMS // For VHE, use EL2 translation
+ // and disable access from EL1
+.endif
+
+3:
+ msr mdcr_el2, x2 // Configure debug traps
+.endm
+
+/* LORegions */
+.macro __init_el2_lor
+ mrs x1, id_aa64mmfr1_el1
+ ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
+ cbz x0, 1f
+ msr_s SYS_LORC_EL1, xzr
+1:
+.endm
+
+/* Stage-2 translation */
+.macro __init_el2_stage2
+ msr vttbr_el2, xzr
+.endm
+
+/* GICv3 system register access */
+#ifdef CONFIG_ARM_GIC_V3
+.macro __init_el2_gicv3
+ mrs x0, id_aa64pfr0_el1
+ ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
+ cbz x0, 1f
+
+ mrs_s x0, SYS_ICC_SRE_EL2
+ orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
+ orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
+ msr_s SYS_ICC_SRE_EL2, x0
+ isb // Make sure SRE is now set
+ mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
+ tbz x0, #0, 1f // and check that it sticks
+ msr_s SYS_ICH_HCR_EL2, xzr // Reset ICH_HCR_EL2 to defaults
+1:
+.endm
+#endif
+
+/* Virtual CPU ID registers */
+.macro __init_el2_nvhe_idregs
+ mrs x0, midr_el1
+ mrs x1, mpidr_el1
+ msr vpidr_el2, x0
+ msr vmpidr_el2, x1
+.endm
+
+/* Coprocessor traps */
+.macro __init_el2_nvhe_cptr
+ mov x0, #0x33ff
+ msr cptr_el2, x0 // Disable copro. traps to EL2
+.endm
+
+/* SVE register access */
+.macro __init_el2_nvhe_sve
+ mrs x1, id_aa64pfr0_el1
+ ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
+ cbz x1, 1f
+
+ bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
+ msr cptr_el2, x0 // Disable copro. traps to EL2
+ isb
+ mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
+ msr_s SYS_ZCR_EL2, x1 // length for EL1.
+1:
+.endm
+
+.macro __init_el2_nvhe_spsr
+ mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
+ PSR_MODE_EL1h)
+ msr spsr_el2, x0
+.endm
+
+.macro init_el2_state mode
+
+.ifnes "\mode", "vhe"
+.ifnes "\mode", "nvhe"
+.error "Invalid 'mode' argument"
+.endif
+.endif
+
+ __init_el2_sctlr
+ __init_el2_timers \mode
+ __init_el2_debug \mode
+ __init_el2_lor
+ __init_el2_stage2
+
+#ifdef CONFIG_ARM_GIC_V3
+ __init_el2_gicv3
+#endif
+
+#ifdef CONFIG_COMPAT
+ msr hstr_el2, xzr // Disable CP15 traps to EL2
+#endif
+
+ /*
+ * When VHE is not in use, early init of EL2 needs to be done here.
+ * When VHE _is_ in use, EL1 will not be used in the host and
+ * requires no configuration, and all non-hyp-specific EL2 setup
+ * will be done via the _EL1 system register aliases in __cpu_setup.
+ */
+.ifeqs "\mode", "nvhe"
+ __init_el2_nvhe_idregs
+ __init_el2_nvhe_cptr
+ __init_el2_nvhe_sve
+ __init_el2_nvhe_spsr
+.endif
+
+.endm
+
+#endif /* __ARM_KVM_INIT_H__ */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d8d9caf02834..da913ce9e89f 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -11,7 +11,6 @@

#include <linux/linkage.h>
#include <linux/init.h>
-#include <linux/irqchip/arm-gic-v3.h>
#include <linux/pgtable.h>

#include <asm/asm_pointer_auth.h>
@@ -21,6 +20,7 @@
#include <asm/asm-offsets.h>
#include <asm/cache.h>
#include <asm/cputype.h>
+#include <asm/el2_setup.h>
#include <asm/elf.h>
#include <asm/image.h>
#include <asm/kernel-pgtable.h>
@@ -493,159 +493,47 @@ SYM_FUNC_START(el2_setup)
mrs x0, CurrentEL
cmp x0, #CurrentEL_EL2
b.eq 1f
+
mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
msr sctlr_el1, x0
mov w0, #BOOT_CPU_MODE_EL1 // This cpu booted in EL1
isb
ret

-1: mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
- msr sctlr_el2, x0
-
+1:
#ifdef CONFIG_ARM64_VHE
/*
- * Check for VHE being present. For the rest of the EL2 setup,
- * x2 being non-zero indicates that we do have VHE, and that the
- * kernel is intended to run at EL2.
+ * Check for VHE being present. x2 being non-zero indicates that we
+ * do have VHE, and that the kernel is intended to run at EL2.
*/
mrs x2, id_aa64mmfr1_el1
ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
-#else
- mov x2, xzr
-#endif
+ cbz x2, el2_setup_nvhe

- /* Hyp configuration. */
- mov_q x0, HCR_HOST_NVHE_FLAGS
- cbz x2, set_hcr
mov_q x0, HCR_HOST_VHE_FLAGS
-set_hcr:
msr hcr_el2, x0
isb

- /*
- * Allow Non-secure EL1 and EL0 to access physical timer and counter.
- * This is not necessary for VHE, since the host kernel runs in EL2,
- * and EL0 accesses are configured in the later stage of boot process.
- * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
- * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
- * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
- * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
- * EL2.
- */
- cbnz x2, 1f
- mrs x0, cnthctl_el2
- orr x0, x0, #3 // Enable EL1 physical timers
- msr cnthctl_el2, x0
-1:
- msr cntvoff_el2, xzr // Clear virtual offset
-
-#ifdef CONFIG_ARM_GIC_V3
- /* GICv3 system register access */
- mrs x0, id_aa64pfr0_el1
- ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
- cbz x0, 3f
-
- mrs_s x0, SYS_ICC_SRE_EL2
- orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
- orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
- msr_s SYS_ICC_SRE_EL2, x0
- isb // Make sure SRE is now set
- mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
- tbz x0, #0, 3f // and check that it sticks
- msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
-
-3:
-#endif
-
- /* Populate ID registers. */
- mrs x0, midr_el1
- mrs x1, mpidr_el1
- msr vpidr_el2, x0
- msr vmpidr_el2, x1
-
-#ifdef CONFIG_COMPAT
- msr hstr_el2, xzr // Disable CP15 traps to EL2
-#endif
-
- /* EL2 debug */
- mrs x1, id_aa64dfr0_el1
- sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
- cmp x0, #1
- b.lt 4f // Skip if no PMU present
- mrs x0, pmcr_el0 // Disable debug access traps
- ubfx x0, x0, #11, #5 // to EL2 and allow access to
-4:
- csel x3, xzr, x0, lt // all PMU counters from EL1
-
- /* Statistical profiling */
- ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
- cbz x0, 7f // Skip if SPE not present
- cbnz x2, 6f // VHE?
- mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
- and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
- cbnz x4, 5f // then permit sampling of physical
- mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
- 1 << SYS_PMSCR_EL2_PA_SHIFT)
- msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
-5:
- mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
- orr x3, x3, x1 // If we don't have VHE, then
- b 7f // use EL1&0 translation.
-6: // For VHE, use EL2 translation
- orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
-7:
- msr mdcr_el2, x3 // Configure debug traps
-
- /* LORegions */
- mrs x1, id_aa64mmfr1_el1
- ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
- cbz x0, 1f
- msr_s SYS_LORC_EL1, xzr
-1:
-
- /* Stage-2 translation */
- msr vttbr_el2, xzr
-
- cbz x2, install_el2_stub
+ init_el2_state vhe

mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
isb
ret
+#endif

-SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL)
- /*
- * When VHE is not in use, early init of EL2 and EL1 needs to be
- * done here.
- * When VHE _is_ in use, EL1 will not be used in the host and
- * requires no configuration, and all non-hyp-specific EL2 setup
- * will be done via the _EL1 system register aliases in __cpu_setup.
- */
- mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
- msr sctlr_el1, x0
-
- /* Coprocessor traps. */
- mov x0, #0x33ff
- msr cptr_el2, x0 // Disable copro. traps to EL2
-
- /* SVE register access */
- mrs x1, id_aa64pfr0_el1
- ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
- cbz x1, 7f
-
- bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
- msr cptr_el2, x0 // Disable copro. traps to EL2
+SYM_INNER_LABEL(el2_setup_nvhe, SYM_L_LOCAL)
+ mov_q x0, HCR_HOST_NVHE_FLAGS
+ msr hcr_el2, x0
isb
- mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
- msr_s SYS_ZCR_EL2, x1 // length for EL1.
+
+ init_el2_state nvhe

/* Hypervisor stub */
-7: adr_l x0, __hyp_stub_vectors
+ adr_l x0, __hyp_stub_vectors
msr vbar_el2, x0

- /* spsr */
- mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
- PSR_MODE_EL1h)
- msr spsr_el2, x0
+ mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
+ msr sctlr_el1, x0
msr elr_el2, lr
mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
eret
--
2.29.2.299.gdc1121823c-goog

2020-11-17 02:08:29

by David Brazdil

Subject: [PATCH v2 12/24] kvm: arm64: Bootstrap PSCI SMC handler in nVHE EL2

Add a handler of PSCI SMCs in nVHE hyp code. The handler is initialized
with the version used by the host's PSCI driver and the function IDs it
was configured with. If the SMC function ID matches one of the
configured PSCI calls (for v0.1) or falls into the PSCI function ID
range (for v0.2+), the SMC is handled by the PSCI handler. For now, all
SMCs return PSCI_RET_NOT_SUPPORTED.
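
For example, under v0.2+ an SMC with function ID 0x84000003 (CPU_ON,
32-bit) or 0xC4000003 (CPU_ON, 64-bit) lands in the SMCCC ranges
reserved for PSCI (0x84000000-0x8400001F and 0xC4000000-0xC400001F) and
is therefore routed to the PSCI handler.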

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_hyp.h | 4 ++
arch/arm64/kvm/arm.c | 14 ++++
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +-
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 104 +++++++++++++++++++++++++++
5 files changed, 128 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/kvm/hyp/nvhe/psci-relay.c

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index a3289071f3d8..95a2bbbcc7e1 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -96,6 +96,10 @@ void deactivate_traps_vhe_put(void);

u64 __guest_enter(struct kvm_vcpu *vcpu);

+#ifdef __KVM_NVHE_HYPERVISOR__
+bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
+#endif
+
void __noreturn hyp_panic(void);
#ifdef __KVM_NVHE_HYPERVISOR__
void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index cdd7981ea560..7d2270eeecfb 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -19,6 +19,7 @@
#include <linux/kvm_irqfd.h>
#include <linux/irqbypass.h>
#include <linux/sched/stat.h>
+#include <linux/psci.h>
#include <trace/events/kvm.h>

#define CREATE_TRACE_POINTS
@@ -1514,6 +1515,18 @@ static void init_cpu_logical_map(void)
CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
}

+static void init_psci_relay(void)
+{
+ extern u32 kvm_nvhe_sym(kvm_host_psci_version);
+ extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];
+ int i;
+
+ CHOOSE_NVHE_SYM(kvm_host_psci_version) = psci_ops.get_version
+ ? psci_ops.get_version() : PSCI_VERSION(0, 0);
+ for (i = 0; i < PSCI_FN_MAX; ++i)
+ CHOOSE_NVHE_SYM(kvm_host_psci_function_id)[i] = psci_get_function_id(i);
+}
+
static int init_common_resources(void)
{
return kvm_set_ipa_limit();
@@ -1693,6 +1706,7 @@ static int init_hyp_mode(void)
}

init_cpu_logical_map();
+ init_psci_relay();

return 0;

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 2d842e009a40..bf62c8e42ab2 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -7,7 +7,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__
ccflags-y := -D__KVM_NVHE_HYPERVISOR__

obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
- hyp-main.o hyp-smp.o
+ hyp-main.o hyp-smp.o psci-relay.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 71a17af05953..df4acb40dd39 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -120,7 +120,11 @@ static void skip_host_instruction(void)

static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
{
- default_host_smc_handler(host_ctxt);
+ bool handled;
+
+ handled = kvm_host_psci_handler(host_ctxt);
+ if (!handled)
+ default_host_smc_handler(host_ctxt);

/*
* Unlike HVC, the return address of an SMC is the instruction's PC.
diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
new file mode 100644
index 000000000000..d75d3f896bfd
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: David Brazdil <[email protected]>
+ */
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <kvm/arm_hypercalls.h>
+#include <linux/arm-smccc.h>
+#include <linux/psci.h>
+#include <kvm/arm_psci.h>
+#include <uapi/linux/psci.h>
+
+/* Config options set by the host. */
+u32 __ro_after_init kvm_host_psci_version = PSCI_VERSION(0, 0);
+u32 __ro_after_init kvm_host_psci_function_id[PSCI_FN_MAX];
+
+static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
+{
+ return host_ctxt->regs.regs[0];
+}
+
+static bool is_psci_0_1_call(u64 func_id)
+{
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(kvm_host_psci_function_id); ++i) {
+ if (func_id == kvm_host_psci_function_id[i])
+ return true;
+ }
+ return false;
+}
+
+static bool is_psci_0_2_call(u64 func_id)
+{
+ /* SMCCC reserves IDs 0x00-1F with the given 32/64-bit base for PSCI. */
+ return (PSCI_0_2_FN(0) <= func_id && func_id <= PSCI_0_2_FN(31)) ||
+ (PSCI_0_2_FN64(0) <= func_id && func_id <= PSCI_0_2_FN64(31));
+}
+
+static bool is_psci_call(u64 func_id)
+{
+ switch (kvm_host_psci_version) {
+ case PSCI_VERSION(0, 0):
+ return false;
+ case PSCI_VERSION(0, 1):
+ return is_psci_0_1_call(func_id);
+ default:
+ return is_psci_0_2_call(func_id);
+ }
+}
+
+static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ return PSCI_RET_NOT_SUPPORTED;
+}
+
+static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ switch (func_id) {
+ default:
+ return PSCI_RET_NOT_SUPPORTED;
+ }
+}
+
+static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ switch (func_id) {
+ default:
+ return psci_0_2_handler(func_id, host_ctxt);
+ }
+}
+
+bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt)
+{
+ u64 func_id = get_psci_func_id(host_ctxt);
+ unsigned long ret;
+
+ if (!is_psci_call(func_id))
+ return false;
+
+ switch (kvm_host_psci_version) {
+ case PSCI_VERSION(0, 0):
+ ret = PSCI_RET_NOT_SUPPORTED;
+ break;
+ case PSCI_VERSION(0, 1):
+ ret = psci_0_1_handler(func_id, host_ctxt);
+ break;
+ case PSCI_VERSION(0, 2):
+ ret = psci_0_2_handler(func_id, host_ctxt);
+ break;
+ default:
+ ret = psci_1_0_handler(func_id, host_ctxt);
+ break;
+ }
+
+ host_ctxt->regs.regs[0] = ret;
+ host_ctxt->regs.regs[1] = 0;
+ host_ctxt->regs.regs[2] = 0;
+ host_ctxt->regs.regs[3] = 0;
+ return true;
+}
--
2.29.2.299.gdc1121823c-goog

2020-11-17 03:23:49

by David Brazdil

Subject: [PATCH v2 01/24] psci: Support psci_ops.get_version for v0.1

KVM's host PSCI SMC filter needs to be aware of the PSCI version of the
system, but currently it is impossible to distinguish between v0.1 and
PSCI being disabled, because both have get_version == NULL.

Populate get_version for v0.1 with a function that returns a constant.

psci_ops.get_version is currently unused, so this has no effect on
existing functionality.

Signed-off-by: David Brazdil <[email protected]>
---
drivers/firmware/psci/psci.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 00af99b6f97c..213c68418a65 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -146,6 +146,11 @@ static int psci_to_linux_errno(int errno)
return -EINVAL;
}

+static u32 psci_get_version_0_1(void)
+{
+ return PSCI_VERSION(0, 1);
+}
+
static u32 psci_get_version(void)
{
return invoke_psci_fn(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
@@ -514,6 +519,8 @@ static int __init psci_0_1_init(struct device_node *np)

pr_info("Using PSCI v0.1 Function IDs from DT\n");

+ psci_ops.get_version = psci_get_version_0_1;
+
if (!of_property_read_u32(np, "cpu_suspend", &id)) {
psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
psci_ops.cpu_suspend = psci_cpu_suspend;
--
2.29.2.299.gdc1121823c-goog

2020-11-17 03:24:06

by David Brazdil

Subject: [PATCH v2 11/24] kvm: arm64: Create nVHE copy of cpu_logical_map

When KVM starts validating host's PSCI requests, it will need to map
MPIDR back to the CPU ID. To this end, copy cpu_logical_map into nVHE
hyp memory when KVM is initialized.

Only copy the information for CPUs that are online at the point of KVM
initialization so that KVM rejects CPUs whose features were not checked
against the finalized capabilities.
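
For illustration, a later patch in the series can then translate an
MPIDR back to a CPU ID with a simple linear scan over the copied map; a
sketch only, with the helper name and error handling illustrative:

	/* Hypothetical hyp-side lookup over the copied map. */
	static unsigned int find_cpu_id(u64 mpidr)
	{
		unsigned int i;

		for (i = 0; i < NR_CPUS; i++) {
			if (cpu_logical_map(i) == mpidr)
				return i;
		}
		hyp_panic();	/* not a CPU we allow to boot */
	}

Entries left at INVALID_HWID never match a valid MPIDR, which is how
CPUs that were offline at init end up rejected.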

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/arm.c | 17 +++++++++++++++++
arch/arm64/kvm/hyp/nvhe/hyp-smp.c | 16 ++++++++++++++++
2 files changed, 33 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 97af6c285f84..cdd7981ea560 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1499,6 +1499,21 @@ static inline void hyp_cpu_pm_exit(void)
}
#endif

+static void init_cpu_logical_map(void)
+{
+ extern u64 kvm_nvhe_sym(__cpu_logical_map)[NR_CPUS];
+ int cpu;
+
+ /*
+ * Copy the MPIDR <-> logical CPU ID mapping to hyp.
+ * Only copy the set of online CPUs whose features have been checked
+ * against the finalized system capabilities. The hypervisor will not
+ * allow any other CPUs from the `possible` set to boot.
+ */
+ for_each_online_cpu(cpu)
+ CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
+}
+
static int init_common_resources(void)
{
return kvm_set_ipa_limit();
@@ -1677,6 +1692,8 @@ static int init_hyp_mode(void)
}
}

+ init_cpu_logical_map();
+
return 0;

out_err:
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
index 7b0363b4857f..cbab0c6246e2 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
@@ -8,6 +8,22 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>

+/*
+ * nVHE copy of data structures tracking available CPU cores.
+ * Only entries for CPUs that were online at KVM init are populated.
+ * Other CPUs should not be allowed to boot because their features were
+ * not checked against the finalized system capabilities.
+ */
+u64 __ro_after_init __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
+
+u64 cpu_logical_map(unsigned int cpu)
+{
+ if (cpu >= ARRAY_SIZE(__cpu_logical_map))
+ hyp_panic();
+
+ return __cpu_logical_map[cpu];
+}
+
unsigned long __hyp_per_cpu_offset(unsigned int cpu)
{
unsigned long *cpu_base_array;
--
2.29.2.299.gdc1121823c-goog

2020-11-17 03:24:38

by David Brazdil

Subject: [PATCH v2 14/24] kvm: arm64: Forward safe PSCI SMCs coming from host

Forward the following PSCI SMCs issued by the host to EL3, as they do not
require the hypervisor's intervention. This assumes that EL3 correctly
implements the PSCI specification.

Only function IDs implemented in Linux are included.

Where both 32-bit and 64-bit variants exist, it is assumed that the host
will always use the 64-bit variant.

* SMCs that only return information about the system
  * PSCI_VERSION - PSCI version implemented by EL3
  * PSCI_FEATURES - optional features supported by EL3
  * AFFINITY_INFO - power state of core/cluster
  * MIGRATE_INFO_TYPE - whether Trusted OS can be migrated
  * MIGRATE_INFO_UP_CPU - resident core of Trusted OS
* operations which do not affect the hypervisor
  * MIGRATE - migrate Trusted OS to a different core
  * SET_SUSPEND_MODE - toggle OS-initiated mode
* system shutdown/reset
  * SYSTEM_OFF
  * SYSTEM_RESET
  * SYSTEM_RESET2
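
Note that CPU_ON and CPU_SUSPEND are deliberately absent from the list
above: patches 19 and 20 of this series intercept them in the
hypervisor instead of forwarding them.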

Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 43 +++++++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index 1583b63322c4..7542de8bd679 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -55,14 +55,51 @@ static bool is_psci_call(u64 func_id)
}
}

+static unsigned long psci_call(unsigned long fn, unsigned long arg0,
+ unsigned long arg1, unsigned long arg2)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_smc(fn, arg0, arg1, arg2, &res);
+ return res.a0;
+}
+
+static unsigned long psci_forward(struct kvm_cpu_context *host_ctxt)
+{
+ return psci_call(host_ctxt->regs.regs[0], host_ctxt->regs.regs[1],
+ host_ctxt->regs.regs[2], host_ctxt->regs.regs[3]);
+}
+
+static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *host_ctxt)
+{
+ psci_forward(host_ctxt);
+ hyp_panic(); /* unreachable */
+}
+
static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
- return PSCI_RET_NOT_SUPPORTED;
+ if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
+ return psci_forward(host_ctxt);
+ else if (func_id == kvm_host_psci_function_id[PSCI_FN_MIGRATE])
+ return psci_forward(host_ctxt);
+ else
+ return PSCI_RET_NOT_SUPPORTED;
}

static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
switch (func_id) {
+ case PSCI_0_2_FN_PSCI_VERSION:
+ case PSCI_0_2_FN_CPU_OFF:
+ case PSCI_0_2_FN64_AFFINITY_INFO:
+ case PSCI_0_2_FN64_MIGRATE:
+ case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
+ case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
+ return psci_forward(host_ctxt);
+ case PSCI_0_2_FN_SYSTEM_OFF:
+ case PSCI_0_2_FN_SYSTEM_RESET:
+ psci_forward_noreturn(host_ctxt);
+ unreachable();
default:
return PSCI_RET_NOT_SUPPORTED;
}
@@ -71,6 +108,10 @@ static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
switch (func_id) {
+ case PSCI_1_0_FN_PSCI_FEATURES:
+ case PSCI_1_0_FN_SET_SUSPEND_MODE:
+ case PSCI_1_1_FN64_SYSTEM_RESET2:
+ return psci_forward(host_ctxt);
default:
return psci_0_2_handler(func_id, host_ctxt);
}
--
2.29.2.299.gdc1121823c-goog

2020-11-17 03:25:05

by David Brazdil

Subject: [PATCH v2 02/24] psci: Accessor for configured PSCI function IDs

Function IDs used by PSCI are configurable for v0.1 via DT/ACPI. If the
host is using PSCI v0.1, KVM's host PSCI proxy needs to use the same IDs.
Expose the array holding the information with a read-only accessor.
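
A consumer such as KVM's hyp PSCI proxy can then snapshot the IDs at
init time, e.g.:

	u32 cpu_on_id = psci_get_function_id(PSCI_FN_CPU_ON);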

Signed-off-by: David Brazdil <[email protected]>
---
drivers/firmware/psci/psci.c | 14 ++++++--------
include/linux/psci.h | 10 ++++++++++
2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 213c68418a65..d835f3d8b121 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -58,16 +58,14 @@ typedef unsigned long (psci_fn)(unsigned long, unsigned long,
unsigned long, unsigned long);
static psci_fn *invoke_psci_fn;

-enum psci_function {
- PSCI_FN_CPU_SUSPEND,
- PSCI_FN_CPU_ON,
- PSCI_FN_CPU_OFF,
- PSCI_FN_MIGRATE,
- PSCI_FN_MAX,
-};
-
static u32 psci_function_id[PSCI_FN_MAX];

+u32 psci_get_function_id(enum psci_function fn)
+{
+ WARN_ON(fn >= PSCI_FN_MAX);
+ return psci_function_id[fn];
+}
+
#define PSCI_0_2_POWER_STATE_MASK \
(PSCI_0_2_POWER_STATE_ID_MASK | \
PSCI_0_2_POWER_STATE_TYPE_MASK | \
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 2a1bfb890e58..5b49a5c82d6f 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -21,6 +21,16 @@ bool psci_power_state_is_valid(u32 state);
int psci_set_osi_mode(bool enable);
bool psci_has_osi_support(void);

+enum psci_function {
+ PSCI_FN_CPU_SUSPEND,
+ PSCI_FN_CPU_ON,
+ PSCI_FN_CPU_OFF,
+ PSCI_FN_MIGRATE,
+ PSCI_FN_MAX,
+};
+
+u32 psci_get_function_id(enum psci_function fn);
+
struct psci_operations {
u32 (*get_version)(void);
int (*cpu_suspend)(u32 state, unsigned long entry_point);
--
2.29.2.299.gdc1121823c-goog

2020-11-23 13:48:17

by Marc Zyngier

Subject: Re: [PATCH v2 00/24] Opt-in always-on nVHE hypervisor

On Mon, 16 Nov 2020 20:42:54 +0000,
David Brazdil <[email protected]> wrote:
>
> As we progress towards being able to keep guest state private from the
> host running the nVHE hypervisor, this series allows the hypervisor to
> install itself on newly booted CPUs before the host is allowed to run
> on them.
>
> All functionality described below is opt-in, guarded by an early param
> 'kvm-arm.protected'. Future patches specific to the new "protected" mode
> should be hidden behind the same param.
>
> The hypervisor starts trapping host SMCs and intercepting host's PSCI
> CPU_ON/SUSPEND calls. It replaces the host's entry point with its own,
> initializes the EL2 state of the new CPU and installs the nVHE hyp vector
> before ERETing to the host's entry point.
>
> The kernel checks new cores' features against the finalized system
> capabilities. To avoid the need to move this code/data to EL2, the
> implementation only allows booting cores that were online at the time of
> KVM initialization and therefore had already been checked.
>
> Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs
> implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded
> to EL3. Future changes will need to ensure the safety of all SMCs wrt.
> private guests.
>
> The host is still allowed to reset EL2 back to the stub vector, eg. for
> hibernation or kexec, but will not disable nVHE when there are no VMs.
>
> Tested on Rock Pi 4b, based on 5.10-rc4.

Adding Lorenzo and Sudeep for the PSCI side of things.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2020-11-23 13:51:33

by Marc Zyngier

Subject: Re: [PATCH v2 02/24] psci: Accessor for configured PSCI function IDs

On Mon, 16 Nov 2020 20:42:56 +0000,
David Brazdil <[email protected]> wrote:
>
> Function IDs used by PSCI are configurable for v0.1 via DT/ACPI. If the
> host is using PSCI v0.1, KVM's host PSCI proxy needs to use the same IDs.
> Expose the array holding the information with a read-only accessor.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> drivers/firmware/psci/psci.c | 14 ++++++--------
> include/linux/psci.h | 10 ++++++++++
> 2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 213c68418a65..d835f3d8b121 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -58,16 +58,14 @@ typedef unsigned long (psci_fn)(unsigned long, unsigned long,
> unsigned long, unsigned long);
> static psci_fn *invoke_psci_fn;
>
> -enum psci_function {
> - PSCI_FN_CPU_SUSPEND,
> - PSCI_FN_CPU_ON,
> - PSCI_FN_CPU_OFF,
> - PSCI_FN_MIGRATE,
> - PSCI_FN_MAX,
> -};
> -
> static u32 psci_function_id[PSCI_FN_MAX];
>
> +u32 psci_get_function_id(enum psci_function fn)
> +{
> + WARN_ON(fn >= PSCI_FN_MAX);

If we are going to warn on something that is out of bounds, maybe we
shouldn't perform the access at all? And maybe check for lower bound
as well?
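
Something along these lines, perhaps (a sketch of the suggestion, not
tested):

	u32 psci_get_function_id(enum psci_function fn)
	{
		/* One unsigned compare covers both bounds. */
		if (WARN_ON_ONCE((unsigned int)fn >= PSCI_FN_MAX))
			return 0;

		return psci_function_id[fn];
	}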

> + return psci_function_id[fn];
> +}
> +
> #define PSCI_0_2_POWER_STATE_MASK \
> (PSCI_0_2_POWER_STATE_ID_MASK | \
> PSCI_0_2_POWER_STATE_TYPE_MASK | \
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index 2a1bfb890e58..5b49a5c82d6f 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -21,6 +21,16 @@ bool psci_power_state_is_valid(u32 state);
> int psci_set_osi_mode(bool enable);
> bool psci_has_osi_support(void);
>
> +enum psci_function {
> + PSCI_FN_CPU_SUSPEND,
> + PSCI_FN_CPU_ON,
> + PSCI_FN_CPU_OFF,
> + PSCI_FN_MIGRATE,
> + PSCI_FN_MAX,
> +};
> +
> +u32 psci_get_function_id(enum psci_function fn);
> +
> struct psci_operations {
> u32 (*get_version)(void);
> int (*cpu_suspend)(u32 state, unsigned long entry_point);
> --
> 2.29.2.299.gdc1121823c-goog
>
>

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2020-11-23 15:29:40

by Marc Zyngier

Subject: Re: [PATCH v2 15/24] kvm: arm64: Extract parts of el2_setup into a macro

On Mon, 16 Nov 2020 20:43:09 +0000,
David Brazdil <[email protected]> wrote:
>
> > When a CPU is booted in EL2, the kernel checks for VHE support and
> initializes the CPU core accordingly. For nVHE it also installs the stub
> vectors and drops down to EL1.
>
> Once KVM gains the ability to boot cores without going through the
> kernel entry point, it will need to initialize the CPU the same way.
> Extract the relevant bits of el2_setup into an init_el2_state macro
> with an argument specifying whether to initialize for VHE or nVHE.
>
> No functional change. Size of el2_setup increased by 148 bytes due
> to duplication.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/include/asm/el2_setup.h | 185 +++++++++++++++++++++++++++++
> arch/arm64/kernel/head.S | 144 +++-------------------
> 2 files changed, 201 insertions(+), 128 deletions(-)
> create mode 100644 arch/arm64/include/asm/el2_setup.h
>
> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
> new file mode 100644
> index 000000000000..e5026e0aa878
> --- /dev/null
> +++ b/arch/arm64/include/asm/el2_setup.h
> @@ -0,0 +1,185 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2012,2013 - ARM Ltd
> + * Author: Marc Zyngier <[email protected]>
> + */
> +
> +#ifndef __ARM_KVM_INIT_H__
> +#define __ARM_KVM_INIT_H__
> +
> +#ifndef __ASSEMBLY__
> +#error Assembly-only header
> +#endif
> +
> +#ifdef CONFIG_ARM_GIC_V3
> +#include <linux/irqchip/arm-gic-v3.h>
> +#endif
> +
> +#include <asm/kvm_arm.h>
> +#include <asm/ptrace.h>
> +#include <asm/sysreg.h>
> +
> +.macro __init_el2_sctlr
> + mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
> + msr sctlr_el2, x0
> + isb
> +.endm
> +
> +/*
> + * Allow Non-secure EL1 and EL0 to access physical timer and counter.
> + * This is not necessary for VHE, since the host kernel runs in EL2,
> + * and EL0 accesses are configured in the later stage of boot process.
> + * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
> + * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
> + * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
> + * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
> + * EL2.
> + */
> +.macro __init_el2_timers mode
> +.ifeqs "\mode", "nvhe"
> + mrs x0, cnthctl_el2
> + orr x0, x0, #3 // Enable EL1 physical timers
> + msr cnthctl_el2, x0
> +.endif
> + msr cntvoff_el2, xzr // Clear virtual offset
> +.endm
> +
> +.macro __init_el2_debug mode
> + mrs x1, id_aa64dfr0_el1
> + sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
> + cmp x0, #1
> + b.lt 1f // Skip if no PMU present
> + mrs x0, pmcr_el0 // Disable debug access traps
> + ubfx x0, x0, #11, #5 // to EL2 and allow access to
> +1:
> + csel x2, xzr, x0, lt // all PMU counters from EL1
> +
> + /* Statistical profiling */
> + ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
> + cbz x0, 3f // Skip if SPE not present
> +
> +.ifeqs "\mode", "nvhe"
> + mrs_s x0, SYS_PMBIDR_EL1 // If SPE available at EL2,
> + and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
> + cbnz x0, 2f // then permit sampling of physical
> + mov x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
> + 1 << SYS_PMSCR_EL2_PA_SHIFT)
> + msr_s SYS_PMSCR_EL2, x0 // addresses and physical counter
> +2:
> + mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
> + orr x2, x2, x0 // If we don't have VHE, then
> + // use EL1&0 translation.
> +.else
> + orr x2, x2, #MDCR_EL2_TPMS // For VHE, use EL2 translation
> + // and disable access from EL1
> +.endif
> +
> +3:
> + msr mdcr_el2, x2 // Configure debug traps
> +.endm
> +
> +/* LORegions */
> +.macro __init_el2_lor
> + mrs x1, id_aa64mmfr1_el1
> + ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
> + cbz x0, 1f
> + msr_s SYS_LORC_EL1, xzr
> +1:
> +.endm
> +
> +/* Stage-2 translation */
> +.macro __init_el2_stage2
> + msr vttbr_el2, xzr
> +.endm
> +
> +/* GICv3 system register access */
> +#ifdef CONFIG_ARM_GIC_V3

nit: this #ifdef isn't relevant anymore and can be dropped throughout
the file.

> +.macro __init_el2_gicv3
> + mrs x0, id_aa64pfr0_el1
> + ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
> + cbz x0, 1f
> +
> + mrs_s x0, SYS_ICC_SRE_EL2
> + orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
> + orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
> + msr_s SYS_ICC_SRE_EL2, x0
> + isb // Make sure SRE is now set
> + mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
> + tbz x0, #0, 1f // and check that it sticks
> + msr_s SYS_ICH_HCR_EL2, xzr // Reset ICH_HCR_EL2 to defaults
> +1:
> +.endm
> +#endif
> +
> +/* Virtual CPU ID registers */
> +.macro __init_el2_nvhe_idregs
> + mrs x0, midr_el1
> + mrs x1, mpidr_el1
> + msr vpidr_el2, x0
> + msr vmpidr_el2, x1
> +.endm
> +
> +/* Coprocessor traps */
> +.macro __init_el2_nvhe_cptr
> + mov x0, #0x33ff
> + msr cptr_el2, x0 // Disable copro. traps to EL2
> +.endm
> +
> +/* SVE register access */
> +.macro __init_el2_nvhe_sve
> + mrs x1, id_aa64pfr0_el1
> + ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
> + cbz x1, 1f
> +
> + bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
> + msr cptr_el2, x0 // Disable copro. traps to EL2
> + isb
> + mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
> + msr_s SYS_ZCR_EL2, x1 // length for EL1.
> +1:
> +.endm
> +
> +.macro __init_el2_nvhe_spsr

nit: this would be better named as "prepare_eret".

> + mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
> + PSR_MODE_EL1h)
> + msr spsr_el2, x0
> +.endm
> +
> +.macro init_el2_state mode
> +
> +.ifnes "\mode", "vhe"
> +.ifnes "\mode", "nvhe"
> +.error "Invalid 'mode' argument"
> +.endif
> +.endif
> +
> + __init_el2_sctlr
> + __init_el2_timers \mode
> + __init_el2_debug \mode
> + __init_el2_lor
> + __init_el2_stage2
> +
> +#ifdef CONFIG_ARM_GIC_V3
> + __init_el2_gicv3
> +#endif
> +
> +#ifdef CONFIG_COMPAT

I also think we can drop this one, as HSTR_EL2 is always defined, even
when AArch32 isn't present in the system.

> + msr hstr_el2, xzr // Disable CP15 traps to EL2
> +#endif
> +
> + /*
> + * When VHE is not in use, early init of EL2 needs to be done here.
> + * When VHE _is_ in use, EL1 will not be used in the host and
> + * requires no configuration, and all non-hyp-specific EL2 setup
> + * will be done via the _EL1 system register aliases in __cpu_setup.
> + */
> +.ifeqs "\mode", "nvhe"
> + __init_el2_nvhe_idregs
> + __init_el2_nvhe_cptr
> + __init_el2_nvhe_sve
> + __init_el2_nvhe_spsr
> +.endif
> +
> +.endm

One thing that is missing here is a description of the registers that
are clobbered. It was easy to spot before (everything was in the same
file), and a bit harder now.
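
A one-line note per macro would probably do, e.g. something like:

	.macro __init_el2_debug mode	// clobbers x0-x2

(__init_el2_debug does use x0-x2; most of the other macros stick to
x0-x1.)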

> +
> +#endif /* __ARM_KVM_INIT_H__ */
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index d8d9caf02834..da913ce9e89f 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -11,7 +11,6 @@
>
> #include <linux/linkage.h>
> #include <linux/init.h>
> -#include <linux/irqchip/arm-gic-v3.h>
> #include <linux/pgtable.h>
>
> #include <asm/asm_pointer_auth.h>
> @@ -21,6 +20,7 @@
> #include <asm/asm-offsets.h>
> #include <asm/cache.h>
> #include <asm/cputype.h>
> +#include <asm/el2_setup.h>
> #include <asm/elf.h>
> #include <asm/image.h>
> #include <asm/kernel-pgtable.h>
> @@ -493,159 +493,47 @@ SYM_FUNC_START(el2_setup)
> mrs x0, CurrentEL
> cmp x0, #CurrentEL_EL2
> b.eq 1f
> +
> mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
> msr sctlr_el1, x0
> mov w0, #BOOT_CPU_MODE_EL1 // This cpu booted in EL1
> isb
> ret
>
> -1: mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
> - msr sctlr_el2, x0
> -
> +1:
> #ifdef CONFIG_ARM64_VHE
> /*
> - * Check for VHE being present. For the rest of the EL2 setup,
> - * x2 being non-zero indicates that we do have VHE, and that the
> - * kernel is intended to run at EL2.
> + * Check for VHE being present. x2 being non-zero indicates that we
> + * do have VHE, and that the kernel is intended to run at EL2.
> */
> mrs x2, id_aa64mmfr1_el1
> ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
> -#else
> - mov x2, xzr
> -#endif
> + cbz x2, el2_setup_nvhe
>
> - /* Hyp configuration. */
> - mov_q x0, HCR_HOST_NVHE_FLAGS
> - cbz x2, set_hcr
> mov_q x0, HCR_HOST_VHE_FLAGS
> -set_hcr:
> msr hcr_el2, x0
> isb
>
> - /*
> - * Allow Non-secure EL1 and EL0 to access physical timer and counter.
> - * This is not necessary for VHE, since the host kernel runs in EL2,
> - * and EL0 accesses are configured in the later stage of boot process.
> - * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
> - * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
> - * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
> - * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
> - * EL2.
> - */
> - cbnz x2, 1f
> - mrs x0, cnthctl_el2
> - orr x0, x0, #3 // Enable EL1 physical timers
> - msr cnthctl_el2, x0
> -1:
> - msr cntvoff_el2, xzr // Clear virtual offset
> -
> -#ifdef CONFIG_ARM_GIC_V3
> - /* GICv3 system register access */
> - mrs x0, id_aa64pfr0_el1
> - ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
> - cbz x0, 3f
> -
> - mrs_s x0, SYS_ICC_SRE_EL2
> - orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
> - orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
> - msr_s SYS_ICC_SRE_EL2, x0
> - isb // Make sure SRE is now set
> - mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
> - tbz x0, #0, 3f // and check that it sticks
> - msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
> -
> -3:
> -#endif
> -
> - /* Populate ID registers. */
> - mrs x0, midr_el1
> - mrs x1, mpidr_el1
> - msr vpidr_el2, x0
> - msr vmpidr_el2, x1
> -
> -#ifdef CONFIG_COMPAT
> - msr hstr_el2, xzr // Disable CP15 traps to EL2
> -#endif
> -
> - /* EL2 debug */
> - mrs x1, id_aa64dfr0_el1
> - sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
> - cmp x0, #1
> - b.lt 4f // Skip if no PMU present
> - mrs x0, pmcr_el0 // Disable debug access traps
> - ubfx x0, x0, #11, #5 // to EL2 and allow access to
> -4:
> - csel x3, xzr, x0, lt // all PMU counters from EL1
> -
> - /* Statistical profiling */
> - ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
> - cbz x0, 7f // Skip if SPE not present
> - cbnz x2, 6f // VHE?
> - mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
> - and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
> - cbnz x4, 5f // then permit sampling of physical
> - mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
> - 1 << SYS_PMSCR_EL2_PA_SHIFT)
> - msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
> -5:
> - mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
> - orr x3, x3, x1 // If we don't have VHE, then
> - b 7f // use EL1&0 translation.
> -6: // For VHE, use EL2 translation
> - orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
> -7:
> - msr mdcr_el2, x3 // Configure debug traps
> -
> - /* LORegions */
> - mrs x1, id_aa64mmfr1_el1
> - ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
> - cbz x0, 1f
> - msr_s SYS_LORC_EL1, xzr
> -1:
> -
> - /* Stage-2 translation */
> - msr vttbr_el2, xzr
> -
> - cbz x2, install_el2_stub
> + init_el2_state vhe
>
> mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
> isb
> ret
> +#endif
>
> -SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL)
> - /*
> - * When VHE is not in use, early init of EL2 and EL1 needs to be
> - * done here.
> - * When VHE _is_ in use, EL1 will not be used in the host and
> - * requires no configuration, and all non-hyp-specific EL2 setup
> - * will be done via the _EL1 system register aliases in __cpu_setup.
> - */
> - mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
> - msr sctlr_el1, x0
> -
> - /* Coprocessor traps. */
> - mov x0, #0x33ff
> - msr cptr_el2, x0 // Disable copro. traps to EL2
> -
> - /* SVE register access */
> - mrs x1, id_aa64pfr0_el1
> - ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
> - cbz x1, 7f
> -
> - bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
> - msr cptr_el2, x0 // Disable copro. traps to EL2
> +SYM_INNER_LABEL(el2_setup_nvhe, SYM_L_LOCAL)
> + mov_q x0, HCR_HOST_NVHE_FLAGS
> + msr hcr_el2, x0
> isb
> - mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
> - msr_s SYS_ZCR_EL2, x1 // length for EL1.
> +
> + init_el2_state nvhe
>
> /* Hypervisor stub */
> -7: adr_l x0, __hyp_stub_vectors
> + adr_l x0, __hyp_stub_vectors
> msr vbar_el2, x0
>
> - /* spsr */
> - mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
> - PSR_MODE_EL1h)
> - msr spsr_el2, x0
> + mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
> + msr sctlr_el1, x0
> msr elr_el2, lr
> mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
> eret
> --
> 2.29.2.299.gdc1121823c-goog
>
>

It looks much better now, thanks a lot for going through the pain of
splitting everything.

M.

--
Without deviation from the norm, progress is not possible.

2020-11-23 18:00:13

by Marc Zyngier

Subject: Re: [PATCH v2 12/24] kvm: arm64: Bootstrap PSCI SMC handler in nVHE EL2

On Mon, 16 Nov 2020 20:43:06 +0000,
David Brazdil <[email protected]> wrote:
>
> Add a handler of PSCI SMCs in nVHE hyp code. The handler is initialized
> with the version used by the host's PSCI driver and the function IDs it
> was configured with. If the SMC function ID matches one of the
> configured PSCI calls (for v0.1) or falls into the PSCI function ID
> range (for v0.2+), the SMC is handled by the PSCI handler. For now, all
> SMCs return PSCI_RET_NOT_SUPPORTED.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/include/asm/kvm_hyp.h | 4 ++
> arch/arm64/kvm/arm.c | 14 ++++
> arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +-
> arch/arm64/kvm/hyp/nvhe/psci-relay.c | 104 +++++++++++++++++++++++++++
> 5 files changed, 128 insertions(+), 2 deletions(-)
> create mode 100644 arch/arm64/kvm/hyp/nvhe/psci-relay.c
>
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index a3289071f3d8..95a2bbbcc7e1 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -96,6 +96,10 @@ void deactivate_traps_vhe_put(void);
>
> u64 __guest_enter(struct kvm_vcpu *vcpu);
>
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
> +#endif
> +
> void __noreturn hyp_panic(void);
> #ifdef __KVM_NVHE_HYPERVISOR__
> void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index cdd7981ea560..7d2270eeecfb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -19,6 +19,7 @@
> #include <linux/kvm_irqfd.h>
> #include <linux/irqbypass.h>
> #include <linux/sched/stat.h>
> +#include <linux/psci.h>
> #include <trace/events/kvm.h>
>
> #define CREATE_TRACE_POINTS
> @@ -1514,6 +1515,18 @@ static void init_cpu_logical_map(void)
> CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
> }
>
> +static void init_psci_relay(void)
> +{
> + extern u32 kvm_nvhe_sym(kvm_host_psci_version);
> + extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];

nit: I'd rather have these outside of the function body.

> + int i;
> +
> + CHOOSE_NVHE_SYM(kvm_host_psci_version) = psci_ops.get_version
> + ? psci_ops.get_version() : PSCI_VERSION(0, 0);

nit: please write this with an if/else construct, it will read a lot
better.

> + for (i = 0; i < PSCI_FN_MAX; ++i)
> + CHOOSE_NVHE_SYM(kvm_host_psci_function_id)[i] = psci_get_function_id(i);

Either pick kvm_nvhe_sym(), or CHOOSE_NVHE_SYM(). Having both used
together is just an annoyance (and in this case there is nothing to
choose, really).
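
Taken together, the two nits amount to something like (sketch):

	if (psci_ops.get_version)
		kvm_nvhe_sym(kvm_host_psci_version) = psci_ops.get_version();
	else
		kvm_nvhe_sym(kvm_host_psci_version) = PSCI_VERSION(0, 0);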

> +}
> +
> static int init_common_resources(void)
> {
> return kvm_set_ipa_limit();
> @@ -1693,6 +1706,7 @@ static int init_hyp_mode(void)
> }
>
> init_cpu_logical_map();
> + init_psci_relay();
>
> return 0;
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 2d842e009a40..bf62c8e42ab2 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -7,7 +7,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__
> ccflags-y := -D__KVM_NVHE_HYPERVISOR__
>
> obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
> - hyp-main.o hyp-smp.o
> + hyp-main.o hyp-smp.o psci-relay.o
> obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
> ../fpsimd.o ../hyp-entry.o
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 71a17af05953..df4acb40dd39 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -120,7 +120,11 @@ static void skip_host_instruction(void)
>
> static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
> {
> - default_host_smc_handler(host_ctxt);
> + bool handled;
> +
> + handled = kvm_host_psci_handler(host_ctxt);
> + if (!handled)
> + default_host_smc_handler(host_ctxt);
>
> /*
> * Unlike HVC, the return address of an SMC is the instruction's PC.
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> new file mode 100644
> index 000000000000..d75d3f896bfd
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> @@ -0,0 +1,104 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 - Google LLC
> + * Author: David Brazdil <[email protected]>
> + */
> +
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +#include <kvm/arm_hypercalls.h>
> +#include <linux/arm-smccc.h>
> +#include <linux/psci.h>
> +#include <kvm/arm_psci.h>
> +#include <uapi/linux/psci.h>
> +
> +/* Config options set by the host. */
> +u32 __ro_after_init kvm_host_psci_version = PSCI_VERSION(0, 0);
> +u32 __ro_after_init kvm_host_psci_function_id[PSCI_FN_MAX];
> +
> +static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
> +{
> + return host_ctxt->regs.regs[0];
> +}
> +
> +static bool is_psci_0_1_call(u64 func_id)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < ARRAY_SIZE(kvm_host_psci_function_id); ++i) {
> + if (func_id == kvm_host_psci_function_id[i])
> + return true;
> + }
> + return false;
> +}
> +
> +static bool is_psci_0_2_call(u64 func_id)
> +{
> + /* SMCCC reserves IDs 0x00-1F with the given 32/64-bit base for PSCI. */
> + return (PSCI_0_2_FN(0) <= func_id && func_id <= PSCI_0_2_FN(31)) ||
> + (PSCI_0_2_FN64(0) <= func_id && func_id <= PSCI_0_2_FN64(31));
> +}
> +
> +static bool is_psci_call(u64 func_id)
> +{
> + switch (kvm_host_psci_version) {
> + case PSCI_VERSION(0, 0):
> + return false;
> + case PSCI_VERSION(0, 1):
> + return is_psci_0_1_call(func_id);
> + default:
> + return is_psci_0_2_call(func_id);
> + }
> +}
> +
> +static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
> +{
> + return PSCI_RET_NOT_SUPPORTED;
> +}
> +
> +static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
> +{
> + switch (func_id) {
> + default:
> + return PSCI_RET_NOT_SUPPORTED;
> + }
> +}
> +
> +static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
> +{
> + switch (func_id) {
> + default:
> + return psci_0_2_handler(func_id, host_ctxt);
> + }
> +}
> +
> +bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt)
> +{
> + u64 func_id = get_psci_func_id(host_ctxt);
> + unsigned long ret;
> +
> + if (!is_psci_call(func_id))
> + return false;
> +
> + switch (kvm_host_psci_version) {
> + case PSCI_VERSION(0, 0):
> + ret = PSCI_RET_NOT_SUPPORTED;

But isn't that way too late? No PSCI means that we cannot control the
way CPUs boot at all. I think we should completely fail the whole
Protected KVM business if we don't have PSCI.
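
i.e. check once during initialization instead of papering over it here,
something like (sketch, helper name illustrative):

	/* Refuse to enable protected mode without PSCI. */
	if (is_protected_kvm_enabled() && !psci_ops.get_version)
		return -ENODEV;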

> + break;
> + case PSCI_VERSION(0, 1):
> + ret = psci_0_1_handler(func_id, host_ctxt);
> + break;
> + case PSCI_VERSION(0, 2):
> + ret = psci_0_2_handler(func_id, host_ctxt);
> + break;
> + default:
> + ret = psci_1_0_handler(func_id, host_ctxt);
> + break;
> + }
> +
> + host_ctxt->regs.regs[0] = ret;
> + host_ctxt->regs.regs[1] = 0;
> + host_ctxt->regs.regs[2] = 0;
> + host_ctxt->regs.regs[3] = 0;
> + return true;
> +}
> --
> 2.29.2.299.gdc1121823c-goog
>
>

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2020-11-23 18:06:43

by Marc Zyngier

Subject: Re: [PATCH v2 00/24] Opt-in always-on nVHE hypervisor

Hi David,

On Mon, 16 Nov 2020 20:42:54 +0000,
David Brazdil <[email protected]> wrote:
>
> As we progress towards being able to keep guest state private from the
> host running the nVHE hypervisor, this series allows the hypervisor to
> install itself on newly booted CPUs before the host is allowed to run
> on them.
>
> All functionality described below is opt-in, guarded by an early param
> 'kvm-arm.protected'. Future patches specific to the new "protected" mode
> should be hidden behind the same param.
>
> The hypervisor starts trapping host SMCs and intercepting host's PSCI
> CPU_ON/SUSPEND calls. It replaces the host's entry point with its own,
> initializes the EL2 state of the new CPU and installs the nVHE hyp vector
> before ERETing to the host's entry point.
>
> The kernel checks new cores' features against the finalized system
> capabilities. To avoid the need to move this code/data to EL2, the
> implementation only allows booting cores that were online at the time of
> KVM initialization and therefore had already been checked.
>
> Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs
> implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded
> to EL3. Future changes will need to ensure the safety of all SMCs wrt.
> private guests.
>
> The host is still allowed to reset EL2 back to the stub vector, eg. for
> hibernation or kexec, but will not disable nVHE when there are no VMs.

I have now been through the whole series, and I don't think there is
anything really major (although I haven't tried it yet).

I think it would benefit from being rebased on top of kvmarm/queue, as
it'd give you the opportunity to replace a number of per-CPU state
fields with global function pointers. Another thing is that we seem to
have diverging interpretations of the PSCI spec when it comes to
CPU_SUSPEND.
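On the function pointer side of things, I have something like the
below in mind, resolved once at init time (a sketch only; the names
are invented for illustration):

	/*
	 * Pick the PSCI handler for the detected version once, instead
	 * of switching on kvm_host_psci_version for every host SMC.
	 * PSCI_VERSION(0, 0) is assumed to have been rejected at init.
	 */
	typedef unsigned long (*psci_handler_fn)(u64 func_id,
						 struct kvm_cpu_context *host_ctxt);

	static psci_handler_fn psci_handler;

	void kvm_psci_relay_init(void)
	{
		switch (kvm_host_psci_version) {
		case PSCI_VERSION(0, 1):
			psci_handler = psci_0_1_handler;
			break;
		case PSCI_VERSION(0, 2):
			psci_handler = psci_0_2_handler;
			break;
		default:
			psci_handler = psci_1_0_handler;
			break;
		}
	}

kvm_host_psci_handler() then boils down to a single indirect call.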

Finally, please include the PSCI maintainers in your next posting, as
I'll need their Ack to pick the first few patches.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2020-11-25 13:01:27

by David Brazdil

[permalink] [raw]
Subject: Re: [PATCH v2 15/24] kvm: arm64: Extract parts of el2_setup into a macro

On Mon, Nov 23, 2020 at 03:27:01PM +0000, Marc Zyngier wrote:
> On Mon, 16 Nov 2020 20:43:09 +0000,
> David Brazdil <[email protected]> wrote:
> >
> > When a CPU is booted in EL2, the kernel checks for VHE support and
> > initializes the CPU core accordingly. For nVHE it also installs the stub
> > vectors and drops down to EL1.
> >
> > Once KVM gains the ability to boot cores without going through the
> > kernel entry point, it will need to initialize the CPU the same way.
> > Extract the relevant bits of el2_setup into an init_el2_state macro
> > with an argument specifying whether to initialize for VHE or nVHE.
> >
> > No functional change. Size of el2_setup increased by 148 bytes due
> > to duplication.
> >
> > Signed-off-by: David Brazdil <[email protected]>
> > ---
> > arch/arm64/include/asm/el2_setup.h | 185 +++++++++++++++++++++++++++++
> > arch/arm64/kernel/head.S | 144 +++-------------------
> > 2 files changed, 201 insertions(+), 128 deletions(-)
> > create mode 100644 arch/arm64/include/asm/el2_setup.h
> >
> > diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
> > new file mode 100644
> > index 000000000000..e5026e0aa878
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/el2_setup.h
> > @@ -0,0 +1,185 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2012,2013 - ARM Ltd
> > + * Author: Marc Zyngier <[email protected]>
> > + */
> > +
> > +#ifndef __ARM_KVM_INIT_H__
> > +#define __ARM_KVM_INIT_H__
> > +
> > +#ifndef __ASSEMBLY__
> > +#error Assembly-only header
> > +#endif
> > +
> > +#ifdef CONFIG_ARM_GIC_V3
> > +#include <linux/irqchip/arm-gic-v3.h>
> > +#endif
> > +
> > +#include <asm/kvm_arm.h>
> > +#include <asm/ptrace.h>
> > +#include <asm/sysreg.h>
> > +
> > +.macro __init_el2_sctlr
> > + mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
> > + msr sctlr_el2, x0
> > + isb
> > +.endm
> > +
> > +/*
> > + * Allow Non-secure EL1 and EL0 to access physical timer and counter.
> > + * This is not necessary for VHE, since the host kernel runs in EL2,
> > + * and EL0 accesses are configured in the later stage of boot process.
> > + * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
> > + * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
> > + * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
> > + * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
> > + * EL2.
> > + */
> > +.macro __init_el2_timers mode
> > +.ifeqs "\mode", "nvhe"
> > + mrs x0, cnthctl_el2
> > + orr x0, x0, #3 // Enable EL1 physical timers
> > + msr cnthctl_el2, x0
> > +.endif
> > + msr cntvoff_el2, xzr // Clear virtual offset
> > +.endm
> > +
> > +.macro __init_el2_debug mode
> > + mrs x1, id_aa64dfr0_el1
> > + sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
> > + cmp x0, #1
> > + b.lt 1f // Skip if no PMU present
> > + mrs x0, pmcr_el0 // Disable debug access traps
> > + ubfx x0, x0, #11, #5 // to EL2 and allow access to
> > +1:
> > + csel x2, xzr, x0, lt // all PMU counters from EL1
> > +
> > + /* Statistical profiling */
> > + ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
> > + cbz x0, 3f // Skip if SPE not present
> > +
> > +.ifeqs "\mode", "nvhe"
> > + mrs_s x0, SYS_PMBIDR_EL1 // If SPE available at EL2,
> > + and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
> > + cbnz x0, 2f // then permit sampling of physical
> > + mov x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
> > + 1 << SYS_PMSCR_EL2_PA_SHIFT)
> > + msr_s SYS_PMSCR_EL2, x0 // addresses and physical counter
> > +2:
> > + mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
> > + orr x2, x2, x0 // If we don't have VHE, then
> > + // use EL1&0 translation.
> > +.else
> > + orr x2, x2, #MDCR_EL2_TPMS // For VHE, use EL2 translation
> > + // and disable access from EL1
> > +.endif
> > +
> > +3:
> > + msr mdcr_el2, x2 // Configure debug traps
> > +.endm
> > +
> > +/* LORegions */
> > +.macro __init_el2_lor
> > + mrs x1, id_aa64mmfr1_el1
> > + ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
> > + cbz x0, 1f
> > + msr_s SYS_LORC_EL1, xzr
> > +1:
> > +.endm
> > +
> > +/* Stage-2 translation */
> > +.macro __init_el2_stage2
> > + msr vttbr_el2, xzr
> > +.endm
> > +
> > +/* GICv3 system register access */
> > +#ifdef CONFIG_ARM_GIC_V3
>
> nit: this #ifdef isn't relevant anymore and can be dropped throughout
> the file.
>
> > +.macro __init_el2_gicv3
> > + mrs x0, id_aa64pfr0_el1
> > + ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
> > + cbz x0, 1f
> > +
> > + mrs_s x0, SYS_ICC_SRE_EL2
> > + orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
> > + orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
> > + msr_s SYS_ICC_SRE_EL2, x0
> > + isb // Make sure SRE is now set
> > + mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
> > + tbz x0, #0, 1f // and check that it sticks
> > + msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
> > +1:
> > +.endm
> > +#endif
> > +
> > +/* Virtual CPU ID registers */
> > +.macro __init_el2_nvhe_idregs
> > + mrs x0, midr_el1
> > + mrs x1, mpidr_el1
> > + msr vpidr_el2, x0
> > + msr vmpidr_el2, x1
> > +.endm
> > +
> > +/* Coprocessor traps */
> > +.macro __init_el2_nvhe_cptr
> > + mov x0, #0x33ff
> > + msr cptr_el2, x0 // Disable copro. traps to EL2
> > +.endm
> > +
> > +/* SVE register access */
> > +.macro __init_el2_nvhe_sve
> > + mrs x1, id_aa64pfr0_el1
> > + ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
> > + cbz x1, 1f
> > +
> > + bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
> > + msr cptr_el2, x0 // Disable copro. traps to EL2
> > + isb
> > + mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
> > + msr_s SYS_ZCR_EL2, x1 // length for EL1.
> > +1:
> > +.endm
> > +
> > +.macro __init_el2_nvhe_spsr
>
> nit: this would be better named as "prepare_eret".
>
> > + mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
> > + PSR_MODE_EL1h)
> > + msr spsr_el2, x0
> > +.endm
> > +
> > +.macro init_el2_state mode
> > +
> > +.ifnes "\mode", "vhe"
> > +.ifnes "\mode", "nvhe"
> > +.error "Invalid 'mode' argument"
> > +.endif
> > +.endif
> > +
> > + __init_el2_sctlr
> > + __init_el2_timers \mode
> > + __init_el2_debug \mode
> > + __init_el2_lor
> > + __init_el2_stage2
> > +
> > +#ifdef CONFIG_ARM_GIC_V3
> > + __init_el2_gicv3
> > +#endif
> > +
> > +#ifdef CONFIG_COMPAT
>
> I also think we can drop this one, as HSTR_EL2 is always defined, even
> when AArch32 isn't present in the system.
>
> > + msr hstr_el2, xzr // Disable CP15 traps to EL2
> > +#endif
> > +
> > + /*
> > + * When VHE is not in use, early init of EL2 needs to be done here.
> > + * When VHE _is_ in use, EL1 will not be used in the host and
> > + * requires no configuration, and all non-hyp-specific EL2 setup
> > + * will be done via the _EL1 system register aliases in __cpu_setup.
> > + */
> > +.ifeqs "\mode", "nvhe"
> > + __init_el2_nvhe_idregs
> > + __init_el2_nvhe_cptr
> > + __init_el2_nvhe_sve
> > + __init_el2_nvhe_spsr
> > +.endif
> > +
> > +.endm
>
> One thing that is missing here is a description of the registers that
> are clobbered. It was easy to spot before (everything was in the same
> file), and a bit harder now.

Will add a comment, but hopefully it should be relatively easy to confirm.
The flow was broken up, but everything is now in this one header file.
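Something like this above init_el2_state, perhaps (wording tentative):

	/*
	 * Initialize EL2 registers to sane values. The \mode argument
	 * selects between the VHE and nVHE variants of the sequence.
	 * Clobbers x0, x1, x2.
	 */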

>
> > +
> > +#endif /* __ARM_KVM_INIT_H__ */
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index d8d9caf02834..da913ce9e89f 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -11,7 +11,6 @@
> >
> > #include <linux/linkage.h>
> > #include <linux/init.h>
> > -#include <linux/irqchip/arm-gic-v3.h>
> > #include <linux/pgtable.h>
> >
> > #include <asm/asm_pointer_auth.h>
> > @@ -21,6 +20,7 @@
> > #include <asm/asm-offsets.h>
> > #include <asm/cache.h>
> > #include <asm/cputype.h>
> > +#include <asm/el2_setup.h>
> > #include <asm/elf.h>
> > #include <asm/image.h>
> > #include <asm/kernel-pgtable.h>
> > @@ -493,159 +493,47 @@ SYM_FUNC_START(el2_setup)
> > mrs x0, CurrentEL
> > cmp x0, #CurrentEL_EL2
> > b.eq 1f
> > +
> > mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
> > msr sctlr_el1, x0
> > mov w0, #BOOT_CPU_MODE_EL1 // This cpu booted in EL1
> > isb
> > ret
> >
> > -1: mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
> > - msr sctlr_el2, x0
> > -
> > +1:
> > #ifdef CONFIG_ARM64_VHE
> > /*
> > - * Check for VHE being present. For the rest of the EL2 setup,
> > - * x2 being non-zero indicates that we do have VHE, and that the
> > - * kernel is intended to run at EL2.
> > + * Check for VHE being present. x2 being non-zero indicates that we
> > + * do have VHE, and that the kernel is intended to run at EL2.
> > */
> > mrs x2, id_aa64mmfr1_el1
> > ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
> > -#else
> > - mov x2, xzr
> > -#endif
> > + cbz x2, el2_setup_nvhe
> >
> > - /* Hyp configuration. */
> > - mov_q x0, HCR_HOST_NVHE_FLAGS
> > - cbz x2, set_hcr
> > mov_q x0, HCR_HOST_VHE_FLAGS
> > -set_hcr:
> > msr hcr_el2, x0
> > isb
> >
> > - /*
> > - * Allow Non-secure EL1 and EL0 to access physical timer and counter.
> > - * This is not necessary for VHE, since the host kernel runs in EL2,
> > - * and EL0 accesses are configured in the later stage of boot process.
> > - * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
> > - * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
> > - * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
> > - * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
> > - * EL2.
> > - */
> > - cbnz x2, 1f
> > - mrs x0, cnthctl_el2
> > - orr x0, x0, #3 // Enable EL1 physical timers
> > - msr cnthctl_el2, x0
> > -1:
> > - msr cntvoff_el2, xzr // Clear virtual offset
> > -
> > -#ifdef CONFIG_ARM_GIC_V3
> > - /* GICv3 system register access */
> > - mrs x0, id_aa64pfr0_el1
> > - ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
> > - cbz x0, 3f
> > -
> > - mrs_s x0, SYS_ICC_SRE_EL2
> > - orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
> > - orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
> > - msr_s SYS_ICC_SRE_EL2, x0
> > - isb // Make sure SRE is now set
> > - mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
> > - tbz x0, #0, 3f // and check that it sticks
> > - msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
> > -
> > -3:
> > -#endif
> > -
> > - /* Populate ID registers. */
> > - mrs x0, midr_el1
> > - mrs x1, mpidr_el1
> > - msr vpidr_el2, x0
> > - msr vmpidr_el2, x1
> > -
> > -#ifdef CONFIG_COMPAT
> > - msr hstr_el2, xzr // Disable CP15 traps to EL2
> > -#endif
> > -
> > - /* EL2 debug */
> > - mrs x1, id_aa64dfr0_el1
> > - sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
> > - cmp x0, #1
> > - b.lt 4f // Skip if no PMU present
> > - mrs x0, pmcr_el0 // Disable debug access traps
> > - ubfx x0, x0, #11, #5 // to EL2 and allow access to
> > -4:
> > - csel x3, xzr, x0, lt // all PMU counters from EL1
> > -
> > - /* Statistical profiling */
> > - ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
> > - cbz x0, 7f // Skip if SPE not present
> > - cbnz x2, 6f // VHE?
> > - mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
> > - and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
> > - cbnz x4, 5f // then permit sampling of physical
> > - mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
> > - 1 << SYS_PMSCR_EL2_PA_SHIFT)
> > - msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
> > -5:
> > - mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
> > - orr x3, x3, x1 // If we don't have VHE, then
> > - b 7f // use EL1&0 translation.
> > -6: // For VHE, use EL2 translation
> > - orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
> > -7:
> > - msr mdcr_el2, x3 // Configure debug traps
> > -
> > - /* LORegions */
> > - mrs x1, id_aa64mmfr1_el1
> > - ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
> > - cbz x0, 1f
> > - msr_s SYS_LORC_EL1, xzr
> > -1:
> > -
> > - /* Stage-2 translation */
> > - msr vttbr_el2, xzr
> > -
> > - cbz x2, install_el2_stub
> > + init_el2_state vhe
> >
> > mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
> > isb
> > ret
> > +#endif
> >
> > -SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL)
> > - /*
> > - * When VHE is not in use, early init of EL2 and EL1 needs to be
> > - * done here.
> > - * When VHE _is_ in use, EL1 will not be used in the host and
> > - * requires no configuration, and all non-hyp-specific EL2 setup
> > - * will be done via the _EL1 system register aliases in __cpu_setup.
> > - */
> > - mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
> > - msr sctlr_el1, x0
> > -
> > - /* Coprocessor traps. */
> > - mov x0, #0x33ff
> > - msr cptr_el2, x0 // Disable copro. traps to EL2
> > -
> > - /* SVE register access */
> > - mrs x1, id_aa64pfr0_el1
> > - ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
> > - cbz x1, 7f
> > -
> > - bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
> > - msr cptr_el2, x0 // Disable copro. traps to EL2
> > +SYM_INNER_LABEL(el2_setup_nvhe, SYM_L_LOCAL)
> > + mov_q x0, HCR_HOST_NVHE_FLAGS
> > + msr hcr_el2, x0
> > isb
> > - mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
> > - msr_s SYS_ZCR_EL2, x1 // length for EL1.
> > +
> > + init_el2_state nvhe
> >
> > /* Hypervisor stub */
> > -7: adr_l x0, __hyp_stub_vectors
> > + adr_l x0, __hyp_stub_vectors
> > msr vbar_el2, x0
> >
> > - /* spsr */
> > - mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
> > - PSR_MODE_EL1h)
> > - msr spsr_el2, x0
> > + mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
> > + msr sctlr_el1, x0
> > msr elr_el2, lr
> > mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
> > eret
> > --
> > 2.29.2.299.gdc1121823c-goog
> >
> >
>
> It looks much better now, thanks a lot for going through the pain of
> splitting everything.
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.