As we progress towards being able to keep guest state private from the
host running the nVHE hypervisor, this series allows the hypervisor to
install itself on newly booted CPUs before the host is allowed to run
on them.
To this end, the hypervisor starts trapping host SMCs and intercepting
the host's PSCI CPU_ON/OFF/SUSPEND calls. It replaces the host's entry
point with its own, initializes the EL2 state of the new CPU and installs
the nVHE hyp vector before ERETing to the host's entry point.
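The entry point substitution described above can be sketched in plain C.
This is only an illustrative model, not code from the series: struct
smc_args, struct host_boot_state and intercept_cpu_on() are hypothetical
names, and the register layout assumes the usual PSCI CPU_ON convention
(x1 = target CPU, x2 = entry point, x3 = context ID).

```c
#include <stdint.h>

/* Hypothetical model of the SMC argument registers x0-x3. */
struct smc_args {
	uint64_t fn;	/* x0: PSCI function ID */
	uint64_t a1;	/* x1: target MPIDR for CPU_ON */
	uint64_t a2;	/* x2: entry point address */
	uint64_t a3;	/* x3: context ID passed to the entry point */
};

/* What the hypervisor has to remember in order to enter the host later. */
struct host_boot_state {
	uint64_t pc;
	uint64_t r0;
};

/*
 * Rewrite a host CPU_ON request so that the new CPU starts in the
 * hypervisor. The host's own entry point and context ID are stashed
 * in @saved; the hypervisor would ERET to them once EL2 is set up.
 * Both physical addresses are assumed to be supplied by the caller.
 */
static void intercept_cpu_on(struct smc_args *args,
			     struct host_boot_state *saved,
			     uint64_t hyp_entry_pa, uint64_t hyp_params_pa)
{
	saved->pc = args->a2;
	saved->r0 = args->a3;
	args->a2 = hyp_entry_pa;
	args->a3 = hyp_params_pa;
}
```

The function ID and target CPU are left untouched, so EL3 sees an
ordinary CPU_ON call that merely points at a different entry address.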
Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs
implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded
to EL3. Future changes will need to ensure the safety of all SMCs with
respect to private guests.
The host is still allowed to reset EL2 back to the stub vector, e.g. for
hibernation or kexec, but will not disable nVHE when there are no VMs.
Tested on Rock Pi 4b.
Sending this as an RFC to get feedback on the following decisions:
1) The kernel checks new cores' features against the finalized system
capabilities. To avoid the need to move this code/data to EL2, the
implementation only allows booting cores that were online at the time of
KVM initialization.
2) Trapping and forwarding SMCs cannot be switched off. This could cause
issues, e.g. if EL3 always returned to EL1. A kernel command line flag may
be needed to turn the feature off on such platforms.
-David
David Brazdil (25):
psci: Export configured PSCI version
psci: Export configured PSCI function IDs
psci: Export psci_cpu_suspend_feature
arm64: Move MAIR_EL1_SET to asm/memory.h
kvm: arm64: Initialize MAIR_EL2 using a constant
kvm: arm64: Add .hyp.data ELF section
kvm: arm64: Support per_cpu_ptr in nVHE hyp code
kvm: arm64: Create nVHE copy of cpu_logical_map
kvm: arm64: Move hyp-init params to a per-CPU struct
kvm: arm64: Refactor handle_trap to use a switch
kvm: arm64: Extract parts of el2_setup into a macro
kvm: arm64: Add SMC handler in nVHE EL2
kvm: arm64: Bootstrap PSCI SMC handler in nVHE EL2
kvm: arm64: Forward safe PSCI SMCs coming from host
kvm: arm64: Add offset for hyp VA <-> PA conversion
kvm: arm64: Bootstrap PSCI power state of host CPUs
kvm: arm64: Intercept PSCI_CPU_OFF host SMC calls
kvm: arm64: Extract __do_hyp_init into a helper function
kvm: arm64: Add CPU entry point in nVHE hyp
kvm: arm64: Add function to enter host from KVM nVHE hyp code
kvm: arm64: Intercept PSCI_CPU_ON host SMC calls
kvm: arm64: Intercept host's CPU_SUSPEND PSCI SMCs
kvm: arm64: Keep nVHE EL2 vector installed
kvm: arm64: Trap host SMCs
kvm: arm64: Fix EL2 mode availability checks
Will Deacon (1):
arm64: kvm: Add standalone ticket spinlock implementation for use at
hyp
arch/arm64/include/asm/kvm_arm.h | 3 +-
arch/arm64/include/asm/kvm_asm.h | 142 +++++++++
arch/arm64/include/asm/kvm_hyp.h | 10 +
arch/arm64/include/asm/memory.h | 13 +
arch/arm64/include/asm/percpu.h | 6 +
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/include/asm/virt.h | 16 +
arch/arm64/kernel/asm-offsets.c | 5 +
arch/arm64/kernel/head.S | 140 +--------
arch/arm64/kernel/image-vars.h | 3 +
arch/arm64/kernel/vmlinux.lds.S | 10 +
arch/arm64/kvm/arm.c | 109 ++++++-
arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 96 ++++++
arch/arm64/kvm/hyp/nvhe/Makefile | 3 +-
arch/arm64/kvm/hyp/nvhe/host.S | 9 +
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 82 ++++-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 57 +++-
arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 3 +
arch/arm64/kvm/hyp/nvhe/percpu.c | 38 +++
arch/arm64/kvm/hyp/nvhe/psci.c | 333 +++++++++++++++++++++
arch/arm64/mm/proc.S | 13 -
drivers/firmware/psci/psci.c | 27 +-
include/linux/psci.h | 20 ++
include/uapi/linux/psci.h | 8 +
24 files changed, 948 insertions(+), 199 deletions(-)
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/spinlock.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/percpu.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/psci.c
--
2.29.1.341.ge80a0c044ae-goog
Add a handler of PSCI SMCs in nVHE hyp code. The handler is initialized
with the version used by the host's PSCI driver and the function IDs it
was configured with. If the SMC function ID matches one of the
configured PSCI calls (for v0.1) or falls into the PSCI function ID
range (for v0.2+), the SMC is handled by the PSCI handler. For now, all
SMCs return PSCI_RET_NOT_SUPPORTED.
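As a rough userspace illustration of the v0.2+ range check described
above (a sketch, not the hyp code itself), the function ID can be masked
down to its base and compared against the 32-bit and 64-bit PSCI bases.
The constant values mirror include/uapi/linux/psci.h; the 0xffff mask is
the one this patch introduces.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSCI_0_2_FN_BASE	0x84000000U
#define PSCI_0_2_64BIT		0x40000000U
#define PSCI_0_2_FN64_BASE	(PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
#define PSCI_0_2_FN_ID_MASK	0xffffU

/* True if @func_id lies in the 32-bit or 64-bit PSCI v0.2+ ID range. */
static bool is_psci_0_2_fn_call(uint64_t func_id)
{
	/* Clear the low 16 bits that select the individual function. */
	uint64_t base = func_id & ~(uint64_t)PSCI_0_2_FN_ID_MASK;

	return base == PSCI_0_2_FN_BASE || base == PSCI_0_2_FN64_BASE;
}
```

For example, 0x84000000 (PSCI_VERSION) and 0xC4000003 (64-bit CPU_ON)
match, while 0x80000000 does not and would be forwarded as a non-PSCI
SMC.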
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_hyp.h | 4 ++
arch/arm64/kvm/arm.c | 12 ++++
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 ++
arch/arm64/kvm/hyp/nvhe/psci.c | 102 +++++++++++++++++++++++++++++
include/uapi/linux/psci.h | 1 +
6 files changed, 124 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kvm/hyp/nvhe/psci.c
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index a3289071f3d8..95a2bbbcc7e1 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -96,6 +96,10 @@ void deactivate_traps_vhe_put(void);
u64 __guest_enter(struct kvm_vcpu *vcpu);
+#ifdef __KVM_NVHE_HYPERVISOR__
+bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
+#endif
+
void __noreturn hyp_panic(void);
#ifdef __KVM_NVHE_HYPERVISOR__
void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ff200fc8d653..cedec793da64 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -19,6 +19,7 @@
#include <linux/kvm_irqfd.h>
#include <linux/irqbypass.h>
#include <linux/sched/stat.h>
+#include <linux/psci.h>
#include <trace/events/kvm.h>
#define CREATE_TRACE_POINTS
@@ -1498,6 +1499,16 @@ static void init_cpu_logical_map(void)
CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
}
+static void init_psci(void)
+{
+ extern u32 kvm_nvhe_sym(kvm_host_psci_version);
+ extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];
+
+ kvm_nvhe_sym(kvm_host_psci_version) = psci_driver_version;
+ memcpy(kvm_nvhe_sym(kvm_host_psci_function_id),
+ psci_function_id, sizeof(psci_function_id));
+}
+
static int init_common_resources(void)
{
return kvm_set_ipa_limit();
@@ -1676,6 +1687,7 @@ static int init_hyp_mode(void)
}
init_cpu_logical_map();
+ init_psci();
return 0;
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index c45f440cce51..647b63337a51 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -7,7 +7,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__
ccflags-y := -D__KVM_NVHE_HYPERVISOR__
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
- hyp-main.o percpu.o
+ hyp-main.o percpu.o psci.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index fffc2dc09a1f..aa54db514550 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -134,6 +134,10 @@ static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
*/
skip_host_instruction();
+ /* Try to handle host's PSCI SMCs. */
+ if (kvm_host_psci_handler(host_ctxt))
+ return;
+
/* Forward SMC not handled in EL2 to EL3. */
forward_host_smc(host_ctxt);
}
diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
new file mode 100644
index 000000000000..82d3b2c89658
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: David Brazdil <[email protected]>
+ */
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <kvm/arm_hypercalls.h>
+#include <linux/arm-smccc.h>
+#include <linux/psci.h>
+#include <kvm/arm_psci.h>
+#include <uapi/linux/psci.h>
+
+/* Config options set by the host. */
+u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
+u32 kvm_host_psci_function_id[PSCI_FN_MAX];
+
+static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
+{
+ return host_ctxt->regs.regs[0];
+}
+
+static bool is_psci_0_1_call(u64 func_id)
+{
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(kvm_host_psci_function_id); ++i) {
+ if (func_id == kvm_host_psci_function_id[i])
+ return true;
+ }
+ return false;
+}
+
+static bool is_psci_0_2_fn_call(u64 func_id)
+{
+ u64 base = func_id & ~PSCI_0_2_FN_ID_MASK;
+
+ return base == PSCI_0_2_FN_BASE || base == PSCI_0_2_FN64_BASE;
+}
+
+static bool is_psci_call(u64 func_id)
+{
+ if (kvm_host_psci_version == PSCI_VERSION(0, 0))
+ return false;
+ else if (kvm_host_psci_version == PSCI_VERSION(0, 1))
+ return is_psci_0_1_call(func_id);
+ else
+ return is_psci_0_2_fn_call(func_id);
+}
+
+static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ return PSCI_RET_NOT_SUPPORTED;
+}
+
+static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ switch (func_id) {
+ default:
+ return PSCI_RET_NOT_SUPPORTED;
+ }
+}
+
+static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ int ret;
+
+ ret = psci_0_2_handler(func_id, host_ctxt);
+ if (ret != PSCI_RET_NOT_SUPPORTED)
+ return ret;
+
+ switch (func_id) {
+ default:
+ return PSCI_RET_NOT_SUPPORTED;
+ }
+}
+
+bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt)
+{
+ u64 func_id = get_psci_func_id(host_ctxt);
+ unsigned long ret;
+
+ if (!is_psci_call(func_id))
+ return false;
+
+ if (kvm_host_psci_version == PSCI_VERSION(0, 1))
+ ret = psci_0_1_handler(func_id, host_ctxt);
+ else if (kvm_host_psci_version == PSCI_VERSION(0, 2))
+ ret = psci_0_2_handler(func_id, host_ctxt);
+ else if (PSCI_VERSION_MAJOR(kvm_host_psci_version) >= 1)
+ ret = psci_1_0_handler(func_id, host_ctxt);
+ else
+ ret = PSCI_RET_NOT_SUPPORTED;
+
+ host_ctxt->regs.regs[0] = ret;
+ host_ctxt->regs.regs[1] = 0;
+ host_ctxt->regs.regs[2] = 0;
+ host_ctxt->regs.regs[3] = 0;
+ return true;
+}
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 2fcad1dd0b0e..0d52b8dbe8c2 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -29,6 +29,7 @@
#define PSCI_0_2_FN64_BASE \
(PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
#define PSCI_0_2_FN64(n) (PSCI_0_2_FN64_BASE + (n))
+#define PSCI_0_2_FN_ID_MASK 0xffff
#define PSCI_0_2_FN_PSCI_VERSION PSCI_0_2_FN(0)
#define PSCI_0_2_FN_CPU_SUSPEND PSCI_0_2_FN(1)
--
2.29.1.341.ge80a0c044ae-goog
When a CPU is booted in EL2, the kernel checks for VHE support and
initializes the CPU core accordingly. For nVHE it also installs the stub
vectors and drops down to EL1.
Once KVM gains the ability to boot cores without going through the
kernel entry point, it will need to initialize the CPU the same way.
Extract the relevant bits of el2_setup into an init_el2_state macro
with an argument specifying whether to initialize for VHE or nVHE.
No functional change. Size of el2_setup increased by 148 bytes due
to duplication.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 128 ++++++++++++++++++++++++++++
arch/arm64/kernel/head.S | 140 +++----------------------------
2 files changed, 141 insertions(+), 127 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index a49a87a186c3..893327d1e449 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -331,6 +331,134 @@ extern char __smccc_workaround_1_smc[__SMCCC_WORKAROUND_1_SMC_SZ];
msr sp_el0, \tmp
.endm
+.macro init_el2_state mode
+
+.ifnes "\mode", "vhe"
+.ifnes "\mode", "nvhe"
+.error "Invalid 'mode' argument"
+.endif
+.endif
+
+ mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
+ msr sctlr_el2, x0
+ isb
+
+ /*
+ * Allow Non-secure EL1 and EL0 to access physical timer and counter.
+ * This is not necessary for VHE, since the host kernel runs in EL2,
+ * and EL0 accesses are configured in the later stage of boot process.
+ * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
+ * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
+ * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
+ * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
+ * EL2.
+ */
+.ifeqs "\mode", "nvhe"
+ mrs x0, cnthctl_el2
+ orr x0, x0, #3 // Enable EL1 physical timers
+ msr cnthctl_el2, x0
+.endif
+ msr cntvoff_el2, xzr // Clear virtual offset
+
+#ifdef CONFIG_ARM_GIC_V3
+ /* GICv3 system register access */
+ mrs x0, id_aa64pfr0_el1
+ ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
+ cbz x0, 3f
+
+ mrs_s x0, SYS_ICC_SRE_EL2
+ orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
+ orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
+ msr_s SYS_ICC_SRE_EL2, x0
+ isb // Make sure SRE is now set
+ mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
+ tbz x0, #0, 3f // and check that it sticks
+ msr_s SYS_ICH_HCR_EL2, xzr // Reset ICH_HCR_EL2 to defaults
+3:
+#endif
+
+ /* Populate ID registers. */
+ mrs x0, midr_el1
+ mrs x1, mpidr_el1
+ msr vpidr_el2, x0
+ msr vmpidr_el2, x1
+
+#ifdef CONFIG_COMPAT
+ msr hstr_el2, xzr // Disable CP15 traps to EL2
+#endif
+
+ /* EL2 debug */
+ mrs x1, id_aa64dfr0_el1
+ sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
+ cmp x0, #1
+ b.lt 4f // Skip if no PMU present
+ mrs x0, pmcr_el0 // Disable debug access traps
+ ubfx x0, x0, #11, #5 // to EL2 and allow access to
+4:
+ csel x3, xzr, x0, lt // all PMU counters from EL1
+
+ /* Statistical profiling */
+ ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
+ cbz x0, 7f // Skip if SPE not present
+.ifeqs "\mode", "nvhe"
+ mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
+ and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
+ cbnz x4, 5f // then permit sampling of physical
+ mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
+ 1 << SYS_PMSCR_EL2_PA_SHIFT)
+ msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
+5:
+ mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
+ orr x3, x3, x1 // If we don't have VHE, then
+ b 7f // use EL1&0 translation.
+.endif
+ orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
+7:
+ msr mdcr_el2, x3 // Configure debug traps
+
+ /* LORegions */
+ mrs x1, id_aa64mmfr1_el1
+ ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
+ cbz x0, 1f
+ msr_s SYS_LORC_EL1, xzr
+1:
+
+ /* Stage-2 translation */
+ msr vttbr_el2, xzr
+
+.ifeqs "\mode", "nvhe"
+ /*
+ * When VHE is not in use, early init of EL2 and EL1 needs to be
+ * done here.
+ * When VHE _is_ in use, EL1 will not be used in the host and
+ * requires no configuration, and all non-hyp-specific EL2 setup
+ * will be done via the _EL1 system register aliases in __cpu_setup.
+ */
+ mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
+ msr sctlr_el1, x0
+
+ /* Coprocessor traps. */
+ mov x0, #0x33ff
+ msr cptr_el2, x0 // Disable copro. traps to EL2
+
+ /* SVE register access */
+ mrs x1, id_aa64pfr0_el1
+ ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
+ cbz x1, 7f
+
+ bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
+ msr cptr_el2, x0 // Disable copro. traps to EL2
+ isb
+ mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
+ msr_s SYS_ZCR_EL2, x1 // length for EL1.
+
+ /* spsr */
+7: mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
+ PSR_MODE_EL1h)
+ msr spsr_el2, x0
+.endif
+.endm
+
#endif
#endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d8d9caf02834..e7270b63abed 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -25,6 +25,7 @@
#include <asm/image.h>
#include <asm/kernel-pgtable.h>
#include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
#include <asm/page.h>
@@ -499,153 +500,38 @@ SYM_FUNC_START(el2_setup)
isb
ret
-1: mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
- msr sctlr_el2, x0
-
+1:
#ifdef CONFIG_ARM64_VHE
/*
- * Check for VHE being present. For the rest of the EL2 setup,
- * x2 being non-zero indicates that we do have VHE, and that the
- * kernel is intended to run at EL2.
+ * Check for VHE being present. x2 being non-zero indicates that we
+ * do have VHE, and that the kernel is intended to run at EL2.
*/
mrs x2, id_aa64mmfr1_el1
ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
-#else
- mov x2, xzr
-#endif
+ cbz x2, el2_setup_nvhe
- /* Hyp configuration. */
- mov_q x0, HCR_HOST_NVHE_FLAGS
- cbz x2, set_hcr
mov_q x0, HCR_HOST_VHE_FLAGS
-set_hcr:
msr hcr_el2, x0
isb
- /*
- * Allow Non-secure EL1 and EL0 to access physical timer and counter.
- * This is not necessary for VHE, since the host kernel runs in EL2,
- * and EL0 accesses are configured in the later stage of boot process.
- * Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
- * as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
- * to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
- * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
- * EL2.
- */
- cbnz x2, 1f
- mrs x0, cnthctl_el2
- orr x0, x0, #3 // Enable EL1 physical timers
- msr cnthctl_el2, x0
-1:
- msr cntvoff_el2, xzr // Clear virtual offset
-
-#ifdef CONFIG_ARM_GIC_V3
- /* GICv3 system register access */
- mrs x0, id_aa64pfr0_el1
- ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
- cbz x0, 3f
-
- mrs_s x0, SYS_ICC_SRE_EL2
- orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
- orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
- msr_s SYS_ICC_SRE_EL2, x0
- isb // Make sure SRE is now set
- mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
- tbz x0, #0, 3f // and check that it sticks
- msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
-
-3:
-#endif
-
- /* Populate ID registers. */
- mrs x0, midr_el1
- mrs x1, mpidr_el1
- msr vpidr_el2, x0
- msr vmpidr_el2, x1
-
-#ifdef CONFIG_COMPAT
- msr hstr_el2, xzr // Disable CP15 traps to EL2
-#endif
-
- /* EL2 debug */
- mrs x1, id_aa64dfr0_el1
- sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
- cmp x0, #1
- b.lt 4f // Skip if no PMU present
- mrs x0, pmcr_el0 // Disable debug access traps
- ubfx x0, x0, #11, #5 // to EL2 and allow access to
-4:
- csel x3, xzr, x0, lt // all PMU counters from EL1
-
- /* Statistical profiling */
- ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
- cbz x0, 7f // Skip if SPE not present
- cbnz x2, 6f // VHE?
- mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
- and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
- cbnz x4, 5f // then permit sampling of physical
- mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
- 1 << SYS_PMSCR_EL2_PA_SHIFT)
- msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
-5:
- mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
- orr x3, x3, x1 // If we don't have VHE, then
- b 7f // use EL1&0 translation.
-6: // For VHE, use EL2 translation
- orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
-7:
- msr mdcr_el2, x3 // Configure debug traps
-
- /* LORegions */
- mrs x1, id_aa64mmfr1_el1
- ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
- cbz x0, 1f
- msr_s SYS_LORC_EL1, xzr
-1:
-
- /* Stage-2 translation */
- msr vttbr_el2, xzr
-
- cbz x2, install_el2_stub
+ init_el2_state vhe
mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
isb
ret
+#endif
-SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL)
- /*
- * When VHE is not in use, early init of EL2 and EL1 needs to be
- * done here.
- * When VHE _is_ in use, EL1 will not be used in the host and
- * requires no configuration, and all non-hyp-specific EL2 setup
- * will be done via the _EL1 system register aliases in __cpu_setup.
- */
- mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
- msr sctlr_el1, x0
-
- /* Coprocessor traps. */
- mov x0, #0x33ff
- msr cptr_el2, x0 // Disable copro. traps to EL2
-
- /* SVE register access */
- mrs x1, id_aa64pfr0_el1
- ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
- cbz x1, 7f
-
- bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
- msr cptr_el2, x0 // Disable copro. traps to EL2
+SYM_INNER_LABEL(el2_setup_nvhe, SYM_L_LOCAL)
+ mov_q x0, HCR_HOST_NVHE_FLAGS
+ msr hcr_el2, x0
isb
- mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
- msr_s SYS_ZCR_EL2, x1 // length for EL1.
+
+ init_el2_state nvhe
/* Hypervisor stub */
-7: adr_l x0, __hyp_stub_vectors
+ adr_l x0, __hyp_stub_vectors
msr vbar_el2, x0
- /* spsr */
- mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
- PSR_MODE_EL1h)
- msr spsr_el2, x0
msr elr_el2, lr
mov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
eret
--
2.29.1.341.ge80a0c044ae-goog
Forward the following PSCI SMCs issued by the host to EL3 as they do not
require the hypervisor's intervention. This assumes that EL3 correctly
implements the PSCI specification.
Only function IDs implemented in Linux are included.
Where both 32-bit and 64-bit variants exist, it is assumed that the host
will always use the 64-bit variant.
* SMCs that only return information about the system
* PSCI_VERSION - PSCI version implemented by EL3
* PSCI_FEATURES - optional features supported by EL3
* AFFINITY_INFO - power state of core/cluster
* MIGRATE_INFO_TYPE - whether Trusted OS can be migrated
* MIGRATE_INFO_UP_CPU - resident core of Trusted OS
* operations which do not affect the hypervisor
* MIGRATE - migrate Trusted OS to a different core
* SET_SUSPEND_MODE - toggle OS-initiated mode
* system shutdown/reset
* SYSTEM_OFF
* SYSTEM_RESET
* SYSTEM_RESET2
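The allow-list above could be modelled along these lines. This is a
standalone sketch rather than the hyp implementation: enum smc_action
and classify_psci_smc() are hypothetical names, and the function numbers
are assumed to match include/uapi/linux/psci.h. Only the variants listed
in this commit message are accepted, so e.g. the 32-bit AFFINITY_INFO is
rejected.

```c
#include <stdint.h>

#define PSCI_FN32(n)	(0x84000000U + (n))
#define PSCI_FN64(n)	(0xC4000000U + (n))

enum smc_action {
	SMC_REJECT,		/* answer with PSCI_RET_NOT_SUPPORTED */
	SMC_FORWARD,		/* pass to EL3 and return its result */
	SMC_FORWARD_NORETURN,	/* pass to EL3, not expected to return */
};

/* Decide how EL2 treats a host PSCI SMC, given the PSCI major version. */
static enum smc_action classify_psci_smc(uint32_t func_id,
					 unsigned int psci_major)
{
	switch (func_id) {
	case PSCI_FN32(0):	/* PSCI_VERSION */
	case PSCI_FN64(4):	/* AFFINITY_INFO */
	case PSCI_FN64(5):	/* MIGRATE */
	case PSCI_FN32(6):	/* MIGRATE_INFO_TYPE */
	case PSCI_FN64(7):	/* MIGRATE_INFO_UP_CPU */
		return SMC_FORWARD;
	case PSCI_FN32(8):	/* SYSTEM_OFF */
	case PSCI_FN32(9):	/* SYSTEM_RESET */
		return SMC_FORWARD_NORETURN;
	}

	if (psci_major >= 1) {
		switch (func_id) {
		case PSCI_FN32(10):	/* PSCI_FEATURES */
		case PSCI_FN32(15):	/* SET_SUSPEND_MODE */
		case PSCI_FN64(18):	/* SYSTEM_RESET2 */
			return SMC_FORWARD;
		}
	}

	return SMC_REJECT;
}
```

Keeping the default at "reject" means an ID that is added to the PSCI
specification later stays blocked until someone reviews it.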
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/psci.c | 40 +++++++++++++++++++++++++++++++++-
1 file changed, 39 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index 82d3b2c89658..8f779560ab6f 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -50,14 +50,48 @@ static bool is_psci_call(u64 func_id)
return is_psci_0_2_fn_call(func_id);
}
+static unsigned long psci_call(unsigned long fn, unsigned long arg0,
+ unsigned long arg1, unsigned long arg2)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_smc(fn, arg0, arg1, arg2, &res);
+ return res.a0;
+}
+
+static unsigned long psci_forward(struct kvm_cpu_context *host_ctxt)
+{
+ return psci_call(host_ctxt->regs.regs[0], host_ctxt->regs.regs[1],
+ host_ctxt->regs.regs[2], host_ctxt->regs.regs[3]);
+}
+
+static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *host_ctxt)
+{
+ psci_forward(host_ctxt);
+ hyp_panic(); /* unreachable */
+}
+
static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
- return PSCI_RET_NOT_SUPPORTED;
+ if (func_id == kvm_host_psci_function_id[PSCI_FN_MIGRATE])
+ return psci_forward(host_ctxt);
+ else
+ return PSCI_RET_NOT_SUPPORTED;
}
static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
switch (func_id) {
+ case PSCI_0_2_FN_PSCI_VERSION:
+ case PSCI_0_2_FN64_AFFINITY_INFO:
+ case PSCI_0_2_FN64_MIGRATE:
+ case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
+ case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
+ return psci_forward(host_ctxt);
+ case PSCI_0_2_FN_SYSTEM_OFF:
+ case PSCI_0_2_FN_SYSTEM_RESET:
+ psci_forward_noreturn(host_ctxt);
+ unreachable();
default:
return PSCI_RET_NOT_SUPPORTED;
}
@@ -72,6 +106,10 @@ static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_
return ret;
switch (func_id) {
+ case PSCI_1_0_FN_PSCI_FEATURES:
+ case PSCI_1_0_FN_SET_SUSPEND_MODE:
+ case PSCI_1_1_FN64_SYSTEM_RESET2:
+ return psci_forward(host_ctxt);
default:
return PSCI_RET_NOT_SUPPORTED;
}
--
2.29.1.341.ge80a0c044ae-goog
Proxying the host's PSCI SMCs will require synchronizing CPU_ON/OFF/SUSPEND
calls based on the observed state of individual cores. Add a per-CPU enum
that tracks the power state of each core and initialize all CPUs online
at the point of KVM init to ON.
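A simplified userspace model of this bookkeeping might look as follows.
It is a sketch under stated assumptions: a plain array stands in for the
per-CPU variable, NR_CPUS is an arbitrary value chosen for the example,
and init_cpu_states() is a hypothetical helper taking the online mask as
a bool array.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS	4	/* arbitrary CPU count for this sketch */

enum psci_cpu_state {
	CPU_OFF = 0,		/* zero, so untouched entries read as OFF */
	CPU_PENDING_ON,
	CPU_ON,
};

/* Stand-in for the per-CPU psci_cpu_state variable. */
static enum psci_cpu_state cpu_state[NR_CPUS];

/* Mark every CPU that is online at init time as ON, the rest as OFF. */
static void init_cpu_states(const bool online[NR_CPUS])
{
	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_state[cpu] = online[cpu] ? CPU_ON : CPU_OFF;
}
```

Because OFF is the zero value, CPUs that the init loop never visits
already hold the correct state in zero-initialized storage.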
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 6 ++++++
arch/arm64/include/asm/kvm_hyp.h | 1 +
arch/arm64/kvm/arm.c | 5 +++++
arch/arm64/kvm/hyp/nvhe/psci.c | 2 ++
4 files changed, 14 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 893327d1e449..9eecb37db6df 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -157,6 +157,12 @@ struct kvm_nvhe_init_params {
unsigned long vector_ptr;
};
+enum kvm_nvhe_psci_state {
+ KVM_NVHE_PSCI_CPU_OFF = 0,
+ KVM_NVHE_PSCI_CPU_PENDING_ON,
+ KVM_NVHE_PSCI_CPU_ON,
+};
+
/* Translate a kernel address @ptr into its equivalent linear mapping */
#define kvm_ksym_ref(ptr) \
({ \
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 95a2bbbcc7e1..cf4c1d16c3e0 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -97,6 +97,7 @@ void deactivate_traps_vhe_put(void);
u64 __guest_enter(struct kvm_vcpu *vcpu);
#ifdef __KVM_NVHE_HYPERVISOR__
+void kvm_host_psci_cpu_init(void);
bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
#endif
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 580d4a656a7b..5b073806463e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -52,6 +52,7 @@ DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+DECLARE_KVM_NVHE_PER_CPU(enum kvm_nvhe_psci_state, psci_cpu_state);
/* The VMID used in the VTTBR */
static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
@@ -1517,10 +1518,14 @@ static void init_psci(void)
{
extern u32 kvm_nvhe_sym(kvm_host_psci_version);
extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];
+ int cpu;
kvm_nvhe_sym(kvm_host_psci_version) = psci_driver_version;
memcpy(kvm_nvhe_sym(kvm_host_psci_function_id),
psci_function_id, sizeof(psci_function_id));
+
+ for_each_online_cpu(cpu)
+ *per_cpu_ptr_nvhe_sym(psci_cpu_state, cpu) = KVM_NVHE_PSCI_CPU_ON;
}
static int init_common_resources(void)
diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index 3eafcf48a29b..c3d0a6246c66 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -20,6 +20,8 @@ s64 hyp_physvirt_offset;
#define __hyp_pa(x) ((phys_addr_t)(x) + hyp_physvirt_offset)
+DEFINE_PER_CPU(enum kvm_nvhe_psci_state, psci_cpu_state);
+
static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
{
return host_ctxt->regs.regs[0];
--
2.29.1.341.ge80a0c044ae-goog
KVM currently initializes MAIR_EL2 to the value of MAIR_EL1. In
preparation for initializing MAIR_EL2 before MAIR_EL1, move the constant
into a shared header file.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/memory.h | 13 +++++++++++++
arch/arm64/mm/proc.S | 13 -------------
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index cd61239bae8c..aca00737e771 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -152,6 +152,19 @@
#define MT_S2_FWB_NORMAL 6
#define MT_S2_FWB_DEVICE_nGnRE 1
+/*
+ * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal memory and
+ * changed during __cpu_setup to Normal Tagged if the system supports MTE.
+ */
+#define MAIR_EL1_SET \
+ (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) | \
+ MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) | \
+ MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) | \
+ MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) | \
+ MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
+ MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) | \
+ MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
+
#ifdef CONFIG_ARM64_4K_PAGES
#define IOREMAP_MAX_ORDER (PUD_SHIFT)
#else
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 23c326a06b2d..25ff21b3a1c6 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -45,19 +45,6 @@
#define TCR_KASAN_FLAGS 0
#endif
-/*
- * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal memory and
- * changed during __cpu_setup to Normal Tagged if the system supports MTE.
- */
-#define MAIR_EL1_SET \
- (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) | \
- MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) | \
- MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) | \
- MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) | \
- MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
- MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) | \
- MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
-
#ifdef CONFIG_CPU_PM
/**
* cpu_do_suspend - save CPU registers context
--
2.29.1.341.ge80a0c044ae-goog
Add a handler of CPU_SUSPEND host PSCI SMCs. When invoked, it determines
whether the requested power state loses context, i.e. whether it is
indistinguishable from a WFI or whether it is a deeper sleep state that
behaves like a CPU_OFF+CPU_ON.
If it's the former, it forwards the call to EL3 and returns to the host
after waking up.
If it's the latter, it saves the host's r0 and pc into the per-CPU reset
state and makes the same call to EL3 with the hyp CPU entry point. When
the core wakes up, EL2 state is initialized before dropping back to EL1.
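The "loses context" decision can be illustrated with a small standalone
sketch. The mask values are assumed to match include/uapi/linux/psci.h
(StateType is bit 16 in the original power_state format and bit 30 in
the extended format; bit 1 of the CPU_SUSPEND feature word selects the
extended format). The two-argument power_state_loses_context() is a
hypothetical variant that takes the feature word as a parameter instead
of reading a global.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSCI_0_2_POWER_STATE_TYPE_MASK		(UINT32_C(1) << 16)
#define PSCI_1_0_EXT_POWER_STATE_TYPE_MASK	(UINT32_C(1) << 30)
#define PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK	(UINT32_C(1) << 1)

/*
 * Return true if @state requests a powerdown state, i.e. one where the
 * core resumes at an entry point rather than after the SMC.
 */
static bool power_state_loses_context(uint32_t state,
				      uint32_t suspend_feature)
{
	bool ext = suspend_feature & PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
	uint32_t mask = ext ? PSCI_1_0_EXT_POWER_STATE_TYPE_MASK
			    : PSCI_0_2_POWER_STATE_TYPE_MASK;

	return (state & mask) != 0;
}
```

A standby request (StateType clear) can simply be forwarded, whereas a
powerdown request needs the entry point substitution because the core
comes back through its reset path.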
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/arm.c | 2 ++
arch/arm64/kvm/hyp/nvhe/psci.c | 49 +++++++++++++++++++++++++++++++++-
drivers/firmware/psci/psci.c | 9 -------
include/uapi/linux/psci.h | 7 +++++
4 files changed, 57 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 166975999ead..6fbda652200b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1521,9 +1521,11 @@ static void init_psci(void)
{
extern u32 kvm_nvhe_sym(kvm_host_psci_version);
extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];
+ extern u32 kvm_nvhe_sym(kvm_host_psci_cpu_suspend_feature);
int cpu;
kvm_nvhe_sym(kvm_host_psci_version) = psci_driver_version;
+ kvm_nvhe_sym(kvm_host_psci_cpu_suspend_feature) = psci_cpu_suspend_feature;
memcpy(kvm_nvhe_sym(kvm_host_psci_function_id),
psci_function_id, sizeof(psci_function_id));
diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index 42ee5effa827..4899c8319bb4 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -21,6 +21,7 @@
/* Config options set by the host. */
u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
u32 kvm_host_psci_function_id[PSCI_FN_MAX];
+u32 kvm_host_psci_cpu_suspend_feature;
s64 hyp_physvirt_offset;
#define __hyp_pa(x) ((phys_addr_t)(x) + hyp_physvirt_offset)
@@ -83,6 +84,20 @@ static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *ho
hyp_panic(); /* unreachable */
}
+static bool psci_has_ext_power_state(void)
+{
+ return kvm_host_psci_cpu_suspend_feature & PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
+}
+
+static bool psci_power_state_loses_context(u32 state)
+{
+ const u32 mask = psci_has_ext_power_state() ?
+ PSCI_1_0_EXT_POWER_STATE_TYPE_MASK :
+ PSCI_0_2_POWER_STATE_TYPE_MASK;
+
+ return state & mask;
+}
+
static unsigned int find_cpu_id(u64 mpidr)
{
int i;
@@ -106,6 +121,34 @@ static phys_addr_t cpu_entry_pa(void)
return kern_va - kimage_voffset;
}
+static int psci_cpu_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ u64 power_state = host_ctxt->regs.regs[1];
+ unsigned long pc = host_ctxt->regs.regs[2];
+ unsigned long r0 = host_ctxt->regs.regs[3];
+ hyp_spinlock_t *cpu_lock;
+ struct vcpu_reset_state *cpu_reset;
+ struct kvm_nvhe_init_params *cpu_params;
+
+ if (!psci_power_state_loses_context(power_state)) {
+ /* This power state has the same semantics as WFI. */
+ return psci_call(func_id, power_state, 0, 0);
+ }
+
+ cpu_lock = this_cpu_ptr(&psci_cpu_lock);
+ cpu_reset = this_cpu_ptr(&psci_cpu_reset);
+ cpu_params = this_cpu_ptr(&kvm_init_params);
+
+ /* Resuming from this state has the same semantics as CPU_ON. */
+ hyp_spin_lock(cpu_lock);
+ *cpu_reset = (struct vcpu_reset_state){
+ .pc = pc,
+ .r0 = r0,
+ };
+ hyp_spin_unlock(cpu_lock);
+ return psci_call(func_id, power_state, cpu_entry_pa(), __hyp_pa(cpu_params));
+}
+
static int psci_cpu_off(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
hyp_spinlock_t *cpu_lock = this_cpu_ptr(&psci_cpu_lock);
@@ -193,7 +236,9 @@ static int psci_cpu_on(u64 func_id, struct kvm_cpu_context *host_ctxt)
static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
- if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
+ if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_SUSPEND])
+ return psci_cpu_suspend(func_id, host_ctxt);
+ else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
return psci_cpu_off(func_id, host_ctxt);
else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_ON])
return psci_cpu_on(func_id, host_ctxt);
@@ -216,6 +261,8 @@ static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
case PSCI_0_2_FN_SYSTEM_RESET:
psci_forward_noreturn(host_ctxt);
unreachable();
+ case PSCI_0_2_FN64_CPU_SUSPEND:
+ return psci_cpu_suspend(func_id, host_ctxt);
case PSCI_0_2_FN_CPU_OFF:
return psci_cpu_off(func_id, host_ctxt);
case PSCI_0_2_FN64_CPU_ON:
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index b6ad237b1518..387e24409da7 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -62,15 +62,6 @@ static psci_fn *invoke_psci_fn;
u32 psci_function_id[PSCI_FN_MAX];
-#define PSCI_0_2_POWER_STATE_MASK \
- (PSCI_0_2_POWER_STATE_ID_MASK | \
- PSCI_0_2_POWER_STATE_TYPE_MASK | \
- PSCI_0_2_POWER_STATE_AFFL_MASK)
-
-#define PSCI_1_0_EXT_POWER_STATE_MASK \
- (PSCI_1_0_EXT_POWER_STATE_ID_MASK | \
- PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
-
u32 psci_cpu_suspend_feature;
static bool psci_system_reset2_supported;
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 0d52b8dbe8c2..df3d85ce86f7 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -65,6 +65,10 @@
#define PSCI_0_2_POWER_STATE_AFFL_SHIFT 24
#define PSCI_0_2_POWER_STATE_AFFL_MASK \
(0x3 << PSCI_0_2_POWER_STATE_AFFL_SHIFT)
+#define PSCI_0_2_POWER_STATE_MASK \
+ (PSCI_0_2_POWER_STATE_ID_MASK | \
+ PSCI_0_2_POWER_STATE_TYPE_MASK | \
+ PSCI_0_2_POWER_STATE_AFFL_MASK)
/* PSCI extended power state encoding for CPU_SUSPEND function */
#define PSCI_1_0_EXT_POWER_STATE_ID_MASK 0xfffffff
@@ -72,6 +76,9 @@
#define PSCI_1_0_EXT_POWER_STATE_TYPE_SHIFT 30
#define PSCI_1_0_EXT_POWER_STATE_TYPE_MASK \
(0x1 << PSCI_1_0_EXT_POWER_STATE_TYPE_SHIFT)
+#define PSCI_1_0_EXT_POWER_STATE_MASK \
+ (PSCI_1_0_EXT_POWER_STATE_ID_MASK | \
+ PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
/* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
#define PSCI_0_2_AFFINITY_LEVEL_ON 0
--
2.29.1.341.ge80a0c044ae-goog
Add a handler of the CPU_OFF PSCI host SMC trapped in KVM nVHE hyp code.
When invoked, it changes the recorded state of the core to OFF before
forwarding the call to EL3. If the call fails, it changes the state back
to ON and returns the error to the host.
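The ordering described above can be sketched in plain C as follows. This is a minimal illustration, not the hyp code: the per-CPU spinlock is omitted, and sketch_fw_call_t, sketch_fw_denied and sketch_cpu_off are hypothetical names standing in for the SMC forwarded to EL3 and for the handler.

```c
#include <stdint.h>

/* Hypothetical mirror of the recorded per-CPU power states. */
enum sketch_cpu_state {
	SKETCH_CPU_OFF,
	SKETCH_CPU_PENDING_ON,
	SKETCH_CPU_ON,
};

/* Stand-in for the SMC forwarded to EL3 (an assumption, not psci_call). */
typedef int (*sketch_fw_call_t)(uint64_t func_id);

/* Example firmware stub that refuses the request with PSCI DENIED (-3). */
int sketch_fw_denied(uint64_t func_id)
{
	(void)func_id;
	return -3;
}

int sketch_cpu_off(enum sketch_cpu_state *recorded, sketch_fw_call_t fw,
		   uint64_t func_id)
{
	int ret;

	/* Record OFF before forwarding, as the commit message describes. */
	*recorded = SKETCH_CPU_OFF;

	/* A successful CPU_OFF never returns to the caller. */
	ret = fw(func_id);

	/* Reaching this point means the call failed: the core is still on. */
	*recorded = SKETCH_CPU_ON;
	return ret;
}
```

With the refusing stub, the recorded state ends up back at ON and the firmware error is passed through unchanged.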
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/psci.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index c3d0a6246c66..00dc0cab860c 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -13,6 +13,8 @@
#include <kvm/arm_psci.h>
#include <uapi/linux/psci.h>
+#include <nvhe/spinlock.h>
+
/* Config options set by the host. */
u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
u32 kvm_host_psci_function_id[PSCI_FN_MAX];
@@ -20,6 +22,7 @@ s64 hyp_physvirt_offset;
#define __hyp_pa(x) ((phys_addr_t)(x) + hyp_physvirt_offset)
+static DEFINE_PER_CPU(hyp_spinlock_t, psci_cpu_lock);
DEFINE_PER_CPU(enum kvm_nvhe_psci_state, psci_cpu_state);
static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
@@ -76,9 +79,32 @@ static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *ho
hyp_panic(); /* unreachable */
}
+static int psci_cpu_off(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ hyp_spinlock_t *cpu_lock = this_cpu_ptr(&psci_cpu_lock);
+ enum kvm_nvhe_psci_state *cpu_power = this_cpu_ptr(&psci_cpu_state);
+ u32 power_state = (u32)host_ctxt->regs.regs[1];
+ int ret;
+
+ /* Change the recorded state to OFF before forwarding the call. */
+ hyp_spin_lock(cpu_lock);
+ *cpu_power = KVM_NVHE_PSCI_CPU_OFF;
+ hyp_spin_unlock(cpu_lock);
+
+ ret = psci_call(func_id, power_state, 0, 0);
+
+ /*
+ * CPU_OFF returns only on failure. Restore the recorded state and
+ * return the error to the host.
+ */
+ hyp_spin_lock(cpu_lock);
+ *cpu_power = KVM_NVHE_PSCI_CPU_ON;
+ hyp_spin_unlock(cpu_lock);
+ return ret;
+}
+
static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
- if (func_id == kvm_host_psci_function_id[PSCI_FN_MIGRATE])
+ if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
+ return psci_cpu_off(func_id, host_ctxt);
+ else if (func_id == kvm_host_psci_function_id[PSCI_FN_MIGRATE])
return psci_forward(host_ctxt);
else
return PSCI_RET_NOT_SUPPORTED;
@@ -97,6 +123,8 @@ static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
case PSCI_0_2_FN_SYSTEM_RESET:
psci_forward_noreturn(host_ctxt);
unreachable();
+ case PSCI_0_2_FN_CPU_OFF:
+ return psci_cpu_off(func_id, host_ctxt);
default:
return PSCI_RET_NOT_SUPPORTED;
}
--
2.29.1.341.ge80a0c044ae-goog
In preparation for adding a CPU entry point in nVHE hyp code, extract
most of __do_hyp_init hypervisor initialization code into a common
helper function. This will be invoked by the entry point to install KVM
on the newly booted CPU.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 39 +++++++++++++++++++++---------
1 file changed, 28 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index 6f3ac5d428ec..1726cc44b3ee 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -68,16 +68,35 @@ __do_hyp_init:
mov x0, #SMCCC_RET_NOT_SUPPORTED
eret
-1: ldr x0, [x1, #NVHE_INIT_TPIDR_EL2]
- msr tpidr_el2, x0
+1: mov x0, x1
+ mov x4, lr
+ bl ___kvm_hyp_init
+ mov lr, x4
- ldr x0, [x1, #NVHE_INIT_STACK_PTR]
- mov sp, x0
+ /* Hello, World! */
+ mov x0, #SMCCC_RET_SUCCESS
+ eret
+SYM_CODE_END(__kvm_hyp_init)
+
+/*
+ * Initialize the hypervisor in EL2.
+ *
+ * Only modifies x0..x3 so as to not clobber callee-saved SMCCC registers
+ * and leave x4 for the caller.
+ *
+ * x0: struct kvm_nvhe_init_params PA
+ */
+SYM_CODE_START(___kvm_hyp_init)
+ ldr x1, [x0, #NVHE_INIT_TPIDR_EL2]
+ msr tpidr_el2, x1
+
+ ldr x1, [x0, #NVHE_INIT_STACK_PTR]
+ mov sp, x1
- ldr x0, [x1, #NVHE_INIT_VECTOR_PTR]
- msr vbar_el2, x0
+ ldr x1, [x0, #NVHE_INIT_VECTOR_PTR]
+ msr vbar_el2, x1
- ldr x1, [x1, #NVHE_INIT_PGD_PTR]
+ ldr x1, [x0, #NVHE_INIT_PGD_PTR]
phys_to_ttbr x0, x1
alternative_if ARM64_HAS_CNP
orr x0, x0, #TTBR_CNP_BIT
@@ -137,10 +156,8 @@ alternative_else_nop_endif
msr sctlr_el2, x0
isb
- /* Hello, World! */
- mov x0, #SMCCC_RET_SUCCESS
- eret
-SYM_CODE_END(__kvm_hyp_init)
+ ret
+SYM_CODE_END(___kvm_hyp_init)
SYM_CODE_START(__kvm_handle_stub_hvc)
cmp x0, #HVC_SOFT_RESTART
--
2.29.1.341.ge80a0c044ae-goog
KVM by default keeps the stub vector installed and installs the nVHE
vector only briefly for init and later on demand. Change this policy
to install the vector at init and then never uninstall it.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/arm.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6fbda652200b..3dff6af69eca 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -64,6 +64,11 @@ static bool vgic_present;
static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
+static bool keep_hyp_installed(void)
+{
+ return !is_kernel_in_hyp_mode();
+}
+
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
@@ -1430,7 +1435,8 @@ static void _kvm_arch_hardware_disable(void *discard)
void kvm_arch_hardware_disable(void)
{
- _kvm_arch_hardware_disable(NULL);
+ if (!keep_hyp_installed())
+ _kvm_arch_hardware_disable(NULL);
}
#ifdef CONFIG_CPU_PM
@@ -1473,11 +1479,13 @@ static struct notifier_block hyp_init_cpu_pm_nb = {
static void __init hyp_cpu_pm_init(void)
{
- cpu_pm_register_notifier(&hyp_init_cpu_pm_nb);
+ if (!keep_hyp_installed())
+ cpu_pm_register_notifier(&hyp_init_cpu_pm_nb);
}
static void __init hyp_cpu_pm_exit(void)
{
- cpu_pm_unregister_notifier(&hyp_init_cpu_pm_nb);
+ if (!keep_hyp_installed())
+ cpu_pm_unregister_notifier(&hyp_init_cpu_pm_nb);
}
#else
static inline void hyp_cpu_pm_init(void)
@@ -1580,7 +1588,8 @@ static int init_subsystems(void)
kvm_coproc_table_init();
out:
- on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
+ if (err || !keep_hyp_installed())
+ on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
return err;
}
--
2.29.1.341.ge80a0c044ae-goog
From: Will Deacon <[email protected]>
We will soon need to synchronise multiple CPUs in the hyp text at EL2.
The qspinlock-based locking used by the host is overkill for this purpose
and relies on the kernel's "percpu" implementation for the MCS nodes.
Implement a simple ticket locking scheme based heavily on the code removed
by commit c11090474d70 ("arm64: locking: Replace ticket lock implementation
with qspinlock").
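For readers unfamiliar with ticket locks, the scheme can be sketched with C11 atomics. This is an illustration only and differs from the patch: the patch packs owner and next into one 32-bit word and uses LL/SC or LSE assembly with WFE, whereas this sketch uses two separate 16-bit atomics and a busy loop. All sketch_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
	_Atomic uint16_t owner;	/* ticket currently being served */
	_Atomic uint16_t next;	/* next ticket to hand out */
} sketch_ticket_lock_t;

static inline void sketch_ticket_lock(sketch_ticket_lock_t *l)
{
	/* Take a ticket; the increment itself needs no ordering. */
	uint16_t ticket = atomic_fetch_add_explicit(&l->next, 1,
						    memory_order_relaxed);

	/* Spin until our ticket is served; acquire orders the critical section. */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != ticket)
		;
}

static inline void sketch_ticket_unlock(sketch_ticket_lock_t *l)
{
	/* Only the lock holder writes owner, so a plain read-modify-write is safe. */
	uint16_t owner = atomic_load_explicit(&l->owner, memory_order_relaxed);

	/* Release publishes the critical section to the next ticket holder. */
	atomic_store_explicit(&l->owner, (uint16_t)(owner + 1),
			      memory_order_release);
}
```

Waiters are served strictly in ticket order, which is the fairness property that makes this simple scheme sufficient for synchronising a handful of CPUs at EL2.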
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 96 ++++++++++++++++++++++
1 file changed, 96 insertions(+)
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/spinlock.h
diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
new file mode 100644
index 000000000000..dc0397e5b5f2
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * A stand-alone ticket spinlock implementation for use by the non-VHE
+ * KVM hypervisor code running at EL2.
+ *
+ * Copyright (C) 2020 Google LLC
+ * Author: Will Deacon <[email protected]>
+ *
+ * Heavily based on the implementation removed by c11090474d70 which was:
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#ifndef __KVM_NVHE_HYPERVISOR__
+#error "Attempt to include nVHE code outside of EL2 object"
+#endif
+
+#ifndef __ARM64_KVM_NVHE_SPINLOCK_H__
+#define __ARM64_KVM_NVHE_SPINLOCK_H__
+
+#include <asm/alternative.h>
+#include <asm/lse.h>
+
+typedef union hyp_spinlock {
+ u32 __val;
+ struct {
+#ifdef __AARCH64EB__
+ u16 next, owner;
+#else
+ u16 owner, next;
+#endif
+ };
+} hyp_spinlock_t;
+
+#define hyp_spin_lock_init(l) \
+do { \
+ *(l) = (hyp_spinlock_t){ .__val = 0 }; \
+} while (0)
+
+static inline void hyp_spin_lock(hyp_spinlock_t *lock)
+{
+ u32 tmp;
+ hyp_spinlock_t lockval, newval;
+
+ asm volatile(
+ /* Atomically increment the next ticket. */
+ ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+" prfm pstl1strm, %3\n"
+"1: ldaxr %w0, %3\n"
+" add %w1, %w0, #(1 << 16)\n"
+" stxr %w2, %w1, %3\n"
+" cbnz %w2, 1b\n",
+ /* LSE atomics */
+" mov %w2, #(1 << 16)\n"
+" ldadda %w2, %w0, %3\n"
+ __nops(3))
+
+ /* Did we get the lock? */
+" eor %w1, %w0, %w0, ror #16\n"
+" cbz %w1, 3f\n"
+ /*
+ * No: spin on the owner. Send a local event to avoid missing an
+ * unlock before the exclusive load.
+ */
+" sevl\n"
+"2: wfe\n"
+" ldaxrh %w2, %4\n"
+" eor %w1, %w2, %w0, lsr #16\n"
+" cbnz %w1, 2b\n"
+ /* We got the lock. Critical section starts here. */
+"3:"
+ : "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock)
+ : "Q" (lock->owner)
+ : "memory");
+}
+
+static inline void hyp_spin_unlock(hyp_spinlock_t *lock)
+{
+ u64 tmp;
+
+ asm volatile(
+ ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " ldrh %w1, %0\n"
+ " add %w1, %w1, #1\n"
+ " stlrh %w1, %0",
+ /* LSE atomics */
+ " mov %w1, #1\n"
+ " staddlh %w1, %0\n"
+ __nops(1))
+ : "=Q" (lock->owner), "=&r" (tmp)
+ :
+ : "memory");
+}
+
+#endif /* __ARM64_KVM_NVHE_SPINLOCK_H__ */
--
2.29.1.341.ge80a0c044ae-goog
The version of PSCI that the kernel should use to communicate with
firmware is typically obtained from probing PSCI_VERSION. However, that
doesn't work for PSCI v0.1 where the host gets the information from
DT/ACPI, or if PSCI is not supported / was disabled.
KVM's PSCI proxy for the host needs to be configured with the same
version used by the host driver. Expose the PSCI version used by the
host.
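As an illustration of how a consumer might use the exported value, the sketch below decodes a version word (major in the upper 16 bits, minor in the lower 16) and picks a handler family. The SKETCH_* macros and sketch_pick_handler are hypothetical names for this example, not part of the patch.

```c
#include <stdint.h>

#define SKETCH_PSCI_VERSION(maj, min) \
	((((uint32_t)(maj)) << 16) | ((uint32_t)(min) & 0xffffu))
#define SKETCH_PSCI_MAJOR(ver)	((uint32_t)(ver) >> 16)
#define SKETCH_PSCI_MINOR(ver)	((uint32_t)(ver) & 0xffffu)

enum sketch_handler {
	SKETCH_HANDLER_NONE,	/* v0.0: PSCI driver not initialized */
	SKETCH_HANDLER_0_1,	/* v0.1: function IDs come from DT */
	SKETCH_HANDLER_0_2,
	SKETCH_HANDLER_1_0,
};

enum sketch_handler sketch_pick_handler(uint32_t ver)
{
	if (SKETCH_PSCI_MAJOR(ver) == 0) {
		switch (SKETCH_PSCI_MINOR(ver)) {
		case 0:
			return SKETCH_HANDLER_NONE;
		case 1:
			return SKETCH_HANDLER_0_1;
		default:
			return SKETCH_HANDLER_0_2;
		}
	}

	/* Any v1.x firmware is handled by the v1.0 path in this sketch. */
	return SKETCH_HANDLER_1_0;
}
```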
Signed-off-by: David Brazdil <[email protected]>
---
drivers/firmware/psci/psci.c | 6 ++++++
include/linux/psci.h | 8 ++++++++
2 files changed, 14 insertions(+)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 00af99b6f97c..ff523bdbfe3f 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -49,6 +49,8 @@ static int resident_cpu = -1;
struct psci_operations psci_ops;
static enum arm_smccc_conduit psci_conduit = SMCCC_CONDUIT_NONE;
+int psci_driver_version = PSCI_VERSION(0, 0);
+
bool psci_tos_resident_on(int cpu)
{
return cpu == resident_cpu;
@@ -461,6 +463,8 @@ static int __init psci_probe(void)
return -EINVAL;
}
+ psci_driver_version = ver;
+
psci_0_2_set_functions();
psci_init_migrate();
@@ -514,6 +518,8 @@ static int __init psci_0_1_init(struct device_node *np)
pr_info("Using PSCI v0.1 Function IDs from DT\n");
+ psci_driver_version = PSCI_VERSION(0, 1);
+
if (!of_property_read_u32(np, "cpu_suspend", &id)) {
psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
psci_ops.cpu_suspend = psci_cpu_suspend;
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 2a1bfb890e58..cb35b90d1746 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -21,6 +21,14 @@ bool psci_power_state_is_valid(u32 state);
int psci_set_osi_mode(bool enable);
bool psci_has_osi_support(void);
+/**
+ * The version of the PSCI specification followed by the driver.
+ * This is equivalent to calling PSCI_VERSION except:
+ * (a) it also works for PSCI v0.1, which does not support PSCI_VERSION, and
+ * (b) it is set to v0.0 if the PSCI driver was not initialized.
+ */
+extern int psci_driver_version;
+
struct psci_operations {
u32 (*get_version)(void);
int (*cpu_suspend)(u32 state, unsigned long entry_point);
--
2.29.1.341.ge80a0c044ae-goog
Add a handler of the CPU_ON PSCI call from host. When invoked, it
looks up the logical CPU ID corresponding to the provided MPIDR and
populates the state struct of the target CPU with the provided x0, pc.
It then calls CPU_ON itself, with an entry point in hyp that initializes
EL2 state before ERETing to the provided PC in EL1.
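The lookup and the state check that precede the forwarded call can be sketched as below. This is a simplified illustration: locking and the retry loop are left out, the map is passed in explicitly, and the sketch_* names and the UINT64_MAX sentinel are assumptions made for the example. The negative values are the PSCI return codes INVALID_PARAMS (-2), ALREADY_ON (-4) and ON_PENDING (-5).

```c
#include <limits.h>
#include <stdint.h>

#define SKETCH_INVALID_HWID	UINT64_MAX	/* unpopulated map entry */
#define SKETCH_INVALID_CPU_ID	UINT_MAX

enum sketch_power {
	SKETCH_POWER_OFF,
	SKETCH_POWER_PENDING_ON,
	SKETCH_POWER_ON,
};

/* Map an MPIDR back to a logical CPU ID, or fail if it is unknown. */
unsigned int sketch_find_cpu_id(const uint64_t *map, unsigned int nr_cpus,
				uint64_t mpidr)
{
	unsigned int i;

	if (mpidr == SKETCH_INVALID_HWID)
		return SKETCH_INVALID_CPU_ID;

	for (i = 0; i < nr_cpus; i++) {
		if (map[i] == mpidr)
			return i;
	}

	return SKETCH_INVALID_CPU_ID;
}

/* Return 0 if CPU_ON may proceed, otherwise the PSCI error for the host. */
int sketch_cpu_on_check(enum sketch_power state, int host_is_psci_0_1)
{
	if (state == SKETCH_POWER_OFF)
		return 0;
	if (host_is_psci_0_1)
		return -2;	/* INVALID_PARAMS */
	return state == SKETCH_POWER_ON ? -4 : -5; /* ALREADY_ON : ON_PENDING */
}
```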
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_hyp.h | 1 +
arch/arm64/kvm/arm.c | 3 +
arch/arm64/kvm/hyp/nvhe/psci.c | 113 +++++++++++++++++++++++++++++++
3 files changed, 117 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index cf4c1d16c3e0..2d88a2dad4de 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -97,6 +97,7 @@ void deactivate_traps_vhe_put(void);
u64 __guest_enter(struct kvm_vcpu *vcpu);
#ifdef __KVM_NVHE_HYPERVISOR__
+asmlinkage void __noreturn kvm_host_psci_cpu_entry(void);
void kvm_host_psci_cpu_init(void);
bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
#endif
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 5b073806463e..166975999ead 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1334,6 +1334,7 @@ static int kvm_map_vectors(void)
static void cpu_init_hyp_mode(void)
{
+ DECLARE_KVM_NVHE_SYM(kvm_host_psci_cpu_entry);
struct kvm_nvhe_init_params *params = this_cpu_ptr_nvhe_sym(kvm_init_params);
struct arm_smccc_res res;
@@ -1351,6 +1352,8 @@ static void cpu_init_hyp_mode(void)
params->pgd_ptr = kvm_mmu_get_httbr();
params->vector_ptr = (unsigned long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
params->hyp_stack_ptr = kern_hyp_va(__this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE);
+ params->psci_cpu_entry_fn = (unsigned long)kern_hyp_va(
+ kvm_ksym_ref(CHOOSE_NVHE_SYM(kvm_host_psci_cpu_entry)));
/*
* Flush the init params from the data cache because the struct will
diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index 00dc0cab860c..42ee5effa827 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -9,12 +9,15 @@
#include <asm/kvm_mmu.h>
#include <kvm/arm_hypercalls.h>
#include <linux/arm-smccc.h>
+#include <linux/kvm_host.h>
#include <linux/psci.h>
#include <kvm/arm_psci.h>
#include <uapi/linux/psci.h>
#include <nvhe/spinlock.h>
+#define INVALID_CPU_ID UINT_MAX
+
/* Config options set by the host. */
u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
u32 kvm_host_psci_function_id[PSCI_FN_MAX];
@@ -24,6 +27,7 @@ s64 hyp_physvirt_offset;
static DEFINE_PER_CPU(hyp_spinlock_t, psci_cpu_lock);
DEFINE_PER_CPU(enum kvm_nvhe_psci_state, psci_cpu_state);
+static DEFINE_PER_CPU(struct vcpu_reset_state, psci_cpu_reset);
static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
{
@@ -79,6 +83,29 @@ static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *ho
hyp_panic(); /* unreachable */
}
+static unsigned int find_cpu_id(u64 mpidr)
+{
+ int i;
+
+ if (mpidr != INVALID_HWID) {
+ for (i = 0; i < NR_CPUS; i++) {
+ if (cpu_logical_map(i) == mpidr)
+ return i;
+ }
+ }
+
+ return INVALID_CPU_ID;
+}
+
+static phys_addr_t cpu_entry_pa(void)
+{
+ extern char __kvm_hyp_cpu_entry[];
+ unsigned long kern_va;
+
+ asm volatile("ldr %0, =%1" : "=r" (kern_va) : "S" (__kvm_hyp_cpu_entry));
+ return kern_va - kimage_voffset;
+}
+
static int psci_cpu_off(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
hyp_spinlock_t *cpu_lock = this_cpu_ptr(&psci_cpu_lock);
@@ -100,10 +127,76 @@ static int psci_cpu_off(u64 func_id, struct kvm_cpu_context *host_ctxt)
return ret;
}
+static int psci_cpu_on(u64 func_id, struct kvm_cpu_context *host_ctxt)
+{
+ u64 mpidr = host_ctxt->regs.regs[1] & MPIDR_HWID_BITMASK;
+ unsigned long pc = host_ctxt->regs.regs[2];
+ unsigned long r0 = host_ctxt->regs.regs[3];
+ unsigned int cpu_id;
+ hyp_spinlock_t *cpu_lock;
+ enum kvm_nvhe_psci_state *cpu_power;
+ struct vcpu_reset_state *cpu_reset;
+ struct kvm_nvhe_init_params *cpu_params;
+ int ret;
+
+ /*
+ * Find the logical CPU ID for the given MPIDR. The search set is
+ * the set of CPUs that were online at the point of KVM initialization.
+ * Booting other CPUs is rejected because their cpufeatures were not
+ * checked against the finalized capabilities. This could be relaxed
+ * by doing the feature checks in hyp.
+ */
+ cpu_id = find_cpu_id(mpidr);
+ if (cpu_id == INVALID_CPU_ID)
+ return PSCI_RET_INVALID_PARAMS;
+
+ cpu_lock = per_cpu_ptr(&psci_cpu_lock, cpu_id);
+ cpu_power = per_cpu_ptr(&psci_cpu_state, cpu_id);
+ cpu_reset = per_cpu_ptr(&psci_cpu_reset, cpu_id);
+ cpu_params = per_cpu_ptr(&kvm_init_params, cpu_id);
+
+ do {
+ hyp_spin_lock(cpu_lock);
+
+ if (*cpu_power != KVM_NVHE_PSCI_CPU_OFF) {
+ if (kvm_host_psci_version == PSCI_VERSION(0, 1))
+ ret = PSCI_RET_INVALID_PARAMS;
+ else if (*cpu_power == KVM_NVHE_PSCI_CPU_ON)
+ ret = PSCI_RET_ALREADY_ON;
+ else
+ ret = PSCI_RET_ON_PENDING;
+ hyp_spin_unlock(cpu_lock);
+ return ret;
+ }
+
+ *cpu_reset = (struct vcpu_reset_state){
+ .pc = pc,
+ .r0 = r0,
+ };
+
+ ret = psci_call(func_id, mpidr, cpu_entry_pa(), __hyp_pa(cpu_params));
+
+ if (ret == PSCI_RET_SUCCESS)
+ *cpu_power = KVM_NVHE_PSCI_CPU_PENDING_ON;
+
+ hyp_spin_unlock(cpu_lock);
+
+ /*
+ * If recorded CPU state is OFF but EL3 reports that it's ON,
+ * we must have hit a race with CPU_OFF on the target core.
+ * Loop to try again.
+ */
+ } while (ret == PSCI_RET_ALREADY_ON);
+
+ return ret;
+}
+
static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
return psci_cpu_off(func_id, host_ctxt);
+ else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_ON])
+ return psci_cpu_on(func_id, host_ctxt);
else if (func_id == kvm_host_psci_function_id[PSCI_FN_MIGRATE])
return psci_forward(host_ctxt);
else
@@ -125,6 +218,8 @@ static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
unreachable();
case PSCI_0_2_FN_CPU_OFF:
return psci_cpu_off(func_id, host_ctxt);
+ case PSCI_0_2_FN64_CPU_ON:
+ return psci_cpu_on(func_id, host_ctxt);
default:
return PSCI_RET_NOT_SUPPORTED;
}
@@ -148,6 +243,24 @@ static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_
}
}
+void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
+
+asmlinkage void __noreturn kvm_host_psci_cpu_entry(void)
+{
+ hyp_spinlock_t *cpu_lock = this_cpu_ptr(&psci_cpu_lock);
+ enum kvm_nvhe_psci_state *cpu_power = this_cpu_ptr(&psci_cpu_state);
+ struct vcpu_reset_state *cpu_reset = this_cpu_ptr(&psci_cpu_reset);
+ struct kvm_cpu_context *host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+
+ hyp_spin_lock(cpu_lock);
+ *cpu_power = KVM_NVHE_PSCI_CPU_ON;
+ host_ctxt->regs.regs[0] = cpu_reset->r0;
+ write_sysreg_el2(cpu_reset->pc, SYS_ELR);
+ hyp_spin_unlock(cpu_lock);
+
+ __host_enter(host_ctxt);
+}
+
bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt)
{
u64 func_id = get_psci_func_id(host_ctxt);
--
2.29.1.341.ge80a0c044ae-goog
KVM's PSCI proxy could probe the firmware to establish features
supported for CPU_SUSPEND, but since the kernel's PSCI driver already
does that, and other information about the driver is already exported,
export the value of psci_cpu_suspend_feature as well for convenience.
Signed-off-by: David Brazdil <[email protected]>
---
drivers/firmware/psci/psci.c | 2 +-
include/linux/psci.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index ffcb88f60e21..b6ad237b1518 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -71,7 +71,7 @@ u32 psci_function_id[PSCI_FN_MAX];
(PSCI_1_0_EXT_POWER_STATE_ID_MASK | \
PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
-static u32 psci_cpu_suspend_feature;
+u32 psci_cpu_suspend_feature;
static bool psci_system_reset2_supported;
static inline bool psci_has_ext_power_state(void)
diff --git a/include/linux/psci.h b/include/linux/psci.h
index 877d844ee6d9..a5832d91d493 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -29,6 +29,8 @@ bool psci_has_osi_support(void);
*/
extern int psci_driver_version;
+extern u32 psci_cpu_suspend_feature;
+
enum psci_function {
PSCI_FN_CPU_SUSPEND,
PSCI_FN_CPU_ON,
--
2.29.1.341.ge80a0c044ae-goog
When compiling with __KVM_NVHE_HYPERVISOR__, redefine per_cpu_offset() as
__hyp_per_cpu_offset(), which looks up the base of the nVHE per-CPU
region of the given CPU and computes its offset from the
.hyp.data..percpu section.
This enables use of per_cpu_ptr() helpers in nVHE hyp code. Until now
only this_cpu_ptr() was supported by setting TPIDR_EL2.
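The offset arithmetic can be illustrated in user space. In this sketch a template struct plays the role of the per-CPU section and an array holds one region per CPU; sketch_template, sketch_region and the helper names are hypothetical, and the integer round trip through uintptr_t assumes a flat address space.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 2

struct sketch_percpu_section {
	int counter;
	int flag;
};

/* Plays the role of the linked per-CPU section (the "start" address). */
struct sketch_percpu_section sketch_template;
/* One private copy of the section per CPU. */
struct sketch_percpu_section sketch_region[SKETCH_NR_CPUS];

/* Offset of a CPU's region from the section start, modulo 2^N. */
uintptr_t sketch_per_cpu_offset(unsigned int cpu)
{
	/* The patch panics on an out-of-range CPU; the sketch asserts. */
	assert(cpu < SKETCH_NR_CPUS);
	return (uintptr_t)&sketch_region[cpu] - (uintptr_t)&sketch_template;
}

/* Translate the address of a template variable into a CPU's copy. */
int *sketch_per_cpu_int(int *var, unsigned int cpu)
{
	return (int *)((uintptr_t)var + sketch_per_cpu_offset(cpu));
}
```

Writing through sketch_per_cpu_int(&sketch_template.counter, 1) updates only CPU 1's copy, which is the behaviour per_cpu_ptr() provides once the offset lookup is available.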
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/percpu.h | 6 ++++++
arch/arm64/kernel/image-vars.h | 3 +++
arch/arm64/kvm/hyp/nvhe/Makefile | 3 ++-
arch/arm64/kvm/hyp/nvhe/percpu.c | 22 ++++++++++++++++++++++
4 files changed, 33 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kvm/hyp/nvhe/percpu.c
diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
index 1599e17379d8..8f1661603b78 100644
--- a/arch/arm64/include/asm/percpu.h
+++ b/arch/arm64/include/asm/percpu.h
@@ -239,6 +239,12 @@ PERCPU_RET_OP(add, add, ldadd)
#define this_cpu_cmpxchg_8(pcp, o, n) \
_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#ifdef __KVM_NVHE_HYPERVISOR__
+extern unsigned long __hyp_per_cpu_offset(unsigned int cpu);
+#define __per_cpu_offset
+#define per_cpu_offset(cpu) __hyp_per_cpu_offset((cpu))
+#endif
+
#include <asm-generic/percpu.h>
/* Redefine macros for nVHE hyp under DEBUG_PREEMPT to avoid its dependencies. */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index c615b285ff5b..78a42a7cdb72 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -103,6 +103,9 @@ KVM_NVHE_ALIAS(gic_nonsecure_priorities);
KVM_NVHE_ALIAS(__start___kvm_ex_table);
KVM_NVHE_ALIAS(__stop___kvm_ex_table);
+/* Array containing bases of nVHE per-CPU memory regions. */
+KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
+
#endif /* CONFIG_KVM */
#endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index ddde15fe85f2..c45f440cce51 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -6,7 +6,8 @@
asflags-y := -D__KVM_NVHE_HYPERVISOR__
ccflags-y := -D__KVM_NVHE_HYPERVISOR__
-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o hyp-main.o
+obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
+ hyp-main.o percpu.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o
diff --git a/arch/arm64/kvm/hyp/nvhe/percpu.c b/arch/arm64/kvm/hyp/nvhe/percpu.c
new file mode 100644
index 000000000000..5fd0c5696907
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/percpu.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: David Brazdil <[email protected]>
+ */
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+
+unsigned long __hyp_per_cpu_offset(unsigned int cpu)
+{
+ unsigned long *cpu_base_array;
+ unsigned long this_cpu_base;
+
+ if (cpu >= ARRAY_SIZE(kvm_arm_hyp_percpu_base))
+ hyp_panic();
+
+ cpu_base_array = kern_hyp_va(&kvm_arm_hyp_percpu_base[0]);
+ this_cpu_base = kern_hyp_va(cpu_base_array[cpu]);
+ return this_cpu_base - (unsigned long)&__per_cpu_start;
+}
--
2.29.1.341.ge80a0c044ae-goog
When KVM starts validating host's PSCI requests, it will need to map
MPIDR back to the CPU ID. To this end, copy cpu_logical_map into nVHE
hyp memory when KVM is initialized.
Only copy the information for CPUs that are online at the point of KVM
initialization so that KVM rejects CPUs whose features were not checked
against the finalized capabilities.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/arm.c | 17 +++++++++++++++++
arch/arm64/kvm/hyp/nvhe/percpu.c | 16 ++++++++++++++++
2 files changed, 33 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 8bb9fffe2a8f..58e9cc183bd5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1481,6 +1481,21 @@ static inline void hyp_cpu_pm_exit(void)
}
#endif
+static void init_cpu_logical_map(void)
+{
+ extern u64 kvm_nvhe_sym(__cpu_logical_map)[NR_CPUS];
+ int cpu;
+
+ /*
+ * Copy the MPIDR <-> logical CPU ID mapping to hyp.
+ * Only copy the set of online CPUs whose features have been checked
+ * against the finalized system capabilities. The hypervisor will not
+ * allow any other CPUs from the `possible` set to boot.
+ */
+ for_each_online_cpu(cpu)
+ CHOOSE_NVHE_SYM(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
+}
+
static int init_common_resources(void)
{
return kvm_set_ipa_limit();
@@ -1658,6 +1673,8 @@ static int init_hyp_mode(void)
}
}
+ init_cpu_logical_map();
+
return 0;
out_err:
diff --git a/arch/arm64/kvm/hyp/nvhe/percpu.c b/arch/arm64/kvm/hyp/nvhe/percpu.c
index 5fd0c5696907..8b7f6b7dbd48 100644
--- a/arch/arm64/kvm/hyp/nvhe/percpu.c
+++ b/arch/arm64/kvm/hyp/nvhe/percpu.c
@@ -8,6 +8,22 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
+/*
+ * nVHE copy of data structures tracking available CPU cores.
+ * Only entries for CPUs that were online at KVM init are populated.
+ * Other CPUs should not be allowed to boot because their features were
+ * not checked against the finalized system capabilities.
+ */
+u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
+
+u64 cpu_logical_map(int cpu)
+{
+ if (cpu < 0 || cpu >= ARRAY_SIZE(__cpu_logical_map))
+ hyp_panic();
+
+ return __cpu_logical_map[cpu];
+}
+
unsigned long __hyp_per_cpu_offset(unsigned int cpu)
{
unsigned long *cpu_base_array;
--
2.29.1.341.ge80a0c044ae-goog
While nVHE KVM is installed, start trapping all host SMCs. By default,
these are simply forwarded to EL3, but PSCI SMCs are validated first.
Create new constant HCR_HOST_NVHE_STUB_FLAGS with the old set of HCR
flags to use before the nVHE vector is installed or when switching back
to stub vector.
Extend HCR_HOST_NVHE_FLAGS to contain HCR_TSC. Set HCR_EL2 to it before
installing nVHE vector.
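The relationship between the two flag sets can be sketched as follows. The bit positions are the HCR_EL2 field positions as I understand them from the architecture (TSC bit 19, RW bit 31, APK bit 40, API bit 41, ATA bit 56) and should be treated as assumptions of this example; the SKETCH_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HCR_TSC	(UINT64_C(1) << 19)	/* trap SMC to EL2 */
#define SKETCH_HCR_RW	(UINT64_C(1) << 31)
#define SKETCH_HCR_APK	(UINT64_C(1) << 40)
#define SKETCH_HCR_API	(UINT64_C(1) << 41)
#define SKETCH_HCR_ATA	(UINT64_C(1) << 56)

/* Old flag set, used while the stub vector is installed. */
#define SKETCH_HCR_HOST_NVHE_STUB_FLAGS \
	(SKETCH_HCR_RW | SKETCH_HCR_API | SKETCH_HCR_APK | SKETCH_HCR_ATA)

/* Flag set used once the nVHE vector is installed: stub flags plus TSC. */
#define SKETCH_HCR_HOST_NVHE_FLAGS \
	(SKETCH_HCR_HOST_NVHE_STUB_FLAGS | SKETCH_HCR_TSC)

/* True if host SMCs would be trapped with this HCR_EL2 value. */
bool sketch_host_smc_traps(uint64_t hcr)
{
	return (hcr & SKETCH_HCR_TSC) != 0;
}
```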
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_arm.h | 3 ++-
arch/arm64/kernel/head.S | 2 +-
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 6 ++++++
3 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 64ce29378467..04b862955f32 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -79,7 +79,8 @@
HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
HCR_FMO | HCR_IMO | HCR_PTW )
#define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
-#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
+#define HCR_HOST_NVHE_STUB_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
+#define HCR_HOST_NVHE_FLAGS (HCR_HOST_NVHE_STUB_FLAGS | HCR_TSC)
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
/* TCR_EL2 Registers bits */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index e7270b63abed..ea17413a04e0 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -522,7 +522,7 @@ SYM_FUNC_START(el2_setup)
#endif
SYM_INNER_LABEL(el2_setup_nvhe, SYM_L_LOCAL)
- mov_q x0, HCR_HOST_NVHE_FLAGS
+ mov_q x0, HCR_HOST_NVHE_STUB_FLAGS
msr hcr_el2, x0
isb
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index dd297a1a8f82..97684deba6c1 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -88,6 +88,10 @@ SYM_CODE_END(__kvm_hyp_init)
* x0: struct kvm_nvhe_init_params PA
*/
SYM_CODE_START(___kvm_hyp_init)
+ mov_q x1, HCR_HOST_NVHE_FLAGS
+ msr hcr_el2, x1
+ isb
+
ldr x1, [x0, #NVHE_INIT_TPIDR_EL2]
msr tpidr_el2, x1
@@ -220,6 +224,8 @@ reset:
bic x5, x5, x6 // Clear SCTL_M and etc
pre_disable_mmu_workaround
msr sctlr_el2, x5
+ mov_q x5, HCR_HOST_NVHE_STUB_FLAGS
+ msr hcr_el2, x5
isb
/* Install stub vectors */
--
2.29.1.341.ge80a0c044ae-goog
With nVHE hyp code intercepting host's PSCI CPU_ON/OFF/SUSPEND SMCs,
from the host's perspective new CPUs start booting in EL1 while
previously they would have booted in EL2. The kernel logic which keeps
track of the mode CPUs were booted in needs to be adjusted to account
for this fact.
Add a static key enabled if KVM nVHE initialization is successful.
When the key is enabled, is_hyp_mode_available continues to report
`true` because its users treat it either as a check of whether KVM will
be / has been initialized, or of whether stub HVCs can be made (eg. hibernate).
is_hyp_mode_mismatched is changed to report `false` when the key is
enabled. That's because all cores' modes matched at the point of KVM
init and KVM will not allow cores not present at init to boot. That
said, the function is never used after KVM is initialized.
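The intended behaviour of the two helpers can be sketched with a plain boolean in place of the static key. The mode constants below are illustrative values chosen for the example, and the sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BOOT_CPU_MODE_EL1 0xe11u
#define SKETCH_BOOT_CPU_MODE_EL2 0xe12u

bool sketch_hyp_mode_available(const uint32_t boot_mode[2],
			       bool nvhe_initialized)
{
	/* Once nVHE is up, new CPUs enter the host in EL1 by design. */
	if (nvhe_initialized)
		return true;

	return boot_mode[0] == SKETCH_BOOT_CPU_MODE_EL2 &&
	       boot_mode[1] == SKETCH_BOOT_CPU_MODE_EL2;
}

bool sketch_hyp_mode_mismatched(const uint32_t boot_mode[2],
				bool nvhe_initialized)
{
	/* All modes matched at KVM init, and no other cores may boot later. */
	if (nvhe_initialized)
		return false;

	return boot_mode[0] != boot_mode[1];
}
```

Without the flag, a CPU recorded as booting in EL1 would make the first helper report false and the second report true, which is the situation the static key avoids.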
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/virt.h | 16 ++++++++++++++++
arch/arm64/kvm/arm.c | 5 +++++
2 files changed, 21 insertions(+)
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 6069be50baf9..597430b5f5aa 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -65,9 +65,18 @@ extern u32 __boot_cpu_mode[2];
void __hyp_set_vectors(phys_addr_t phys_vector_base);
void __hyp_reset_vectors(void);
+DECLARE_STATIC_KEY_FALSE(kvm_nvhe_initialized);
+
/* Reports the availability of HYP mode */
static inline bool is_hyp_mode_available(void)
{
+ /*
+ * If KVM nVHE is initialized, all CPUs must have been booted in EL2.
+ * Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
+ */
+ if (IS_ENABLED(CONFIG_KVM) && static_branch_unlikely(&kvm_nvhe_initialized))
+ return true;
+
return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
__boot_cpu_mode[1] == BOOT_CPU_MODE_EL2);
}
@@ -75,6 +84,13 @@ static inline bool is_hyp_mode_available(void)
/* Check if the bootloader has booted CPUs in different modes */
static inline bool is_hyp_mode_mismatched(void)
{
+ /*
+ * If KVM nVHE is initialized, all CPUs must have been booted in EL2.
+ * Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
+ */
+ if (IS_ENABLED(CONFIG_KVM) && static_branch_unlikely(&kvm_nvhe_initialized))
+ return false;
+
return __boot_cpu_mode[0] != __boot_cpu_mode[1];
}
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3dff6af69eca..e93956d6235d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -47,6 +47,8 @@
__asm__(".arch_extension virt");
#endif
+DEFINE_STATIC_KEY_FALSE(kvm_nvhe_initialized);
+
DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
@@ -1841,6 +1843,9 @@ int kvm_arch_init(void *opaque)
if (err)
goto out_hyp;
+ if (!in_hyp_mode)
+ static_branch_enable(&kvm_nvhe_initialized);
+
if (in_hyp_mode)
kvm_info("VHE mode initialized successfully\n");
else
--
2.29.1.341.ge80a0c044ae-goog
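For illustration, the effect of the key on the two helpers in the patch above can be modelled in plain userspace C. This is only a sketch: the struct, the `model_` names and the boolean standing in for the static key are hypothetical, and the boot-mode constants are illustrative stand-ins rather than values taken from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's boot-mode markers (assumed values). */
#define MODEL_BOOT_CPU_MODE_EL1 0xe11u
#define MODEL_BOOT_CPU_MODE_EL2 0xe12u

/* Hypothetical model of the state the two helpers consult. */
struct hyp_mode_model {
	uint32_t boot_cpu_mode[2];	/* models __boot_cpu_mode[] */
	bool nvhe_initialized;		/* models the kvm_nvhe_initialized key */
};

/* Mirrors is_hyp_mode_available(): the key short-circuits to true. */
static bool model_hyp_mode_available(const struct hyp_mode_model *m)
{
	if (m->nvhe_initialized)
		return true;

	return m->boot_cpu_mode[0] == MODEL_BOOT_CPU_MODE_EL2 &&
	       m->boot_cpu_mode[1] == MODEL_BOOT_CPU_MODE_EL2;
}

/* Mirrors is_hyp_mode_mismatched(): the key short-circuits to false. */
static bool model_hyp_mode_mismatched(const struct hyp_mode_model *m)
{
	if (m->nvhe_initialized)
		return false;

	return m->boot_cpu_mode[0] != m->boot_cpu_mode[1];
}
```

With the flag set, a CPU that the host sees coming up in EL1 no longer flips either answer; with the flag clear, the same recorded modes report "unavailable" and "mismatched".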
Add a host-initialized constant to KVM nVHE hyp code for converting
between EL2 linear map virtual addresses and physical addresses.
Also add a `__hyp_pa` macro that performs the conversion.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/arm.c | 15 +++++++++++++++
arch/arm64/kvm/hyp/nvhe/psci.c | 3 +++
2 files changed, 18 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index cedec793da64..580d4a656a7b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1484,6 +1484,20 @@ static inline void hyp_cpu_pm_exit(void)
}
#endif
+static void init_hyp_physvirt_offset(void)
+{
+ extern s64 kvm_nvhe_sym(hyp_physvirt_offset);
+ unsigned long kern_vaddr, hyp_vaddr, paddr;
+
+ /* Check that kvm_arm_hyp_percpu_base has been set. */
+ BUG_ON(kvm_arm_hyp_percpu_base[0] == 0);
+
+ kern_vaddr = kvm_arm_hyp_percpu_base[0];
+ hyp_vaddr = kern_hyp_va(kern_vaddr);
+ paddr = __pa(kern_vaddr);
+ CHOOSE_NVHE_SYM(hyp_physvirt_offset) = (s64)paddr - (s64)hyp_vaddr;
+}
+
static void init_cpu_logical_map(void)
{
extern u64 kvm_nvhe_sym(__cpu_logical_map)[NR_CPUS];
@@ -1686,6 +1700,7 @@ static int init_hyp_mode(void)
}
}
+ init_hyp_physvirt_offset();
init_cpu_logical_map();
init_psci();
diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
index 8f779560ab6f..3eafcf48a29b 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci.c
@@ -16,6 +16,9 @@
/* Config options set by the host. */
u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
u32 kvm_host_psci_function_id[PSCI_FN_MAX];
+s64 hyp_physvirt_offset;
+
+#define __hyp_pa(x) ((phys_addr_t)(x) + hyp_physvirt_offset)
static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
{
--
2.29.1.341.ge80a0c044ae-goog
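The offset arithmetic in the patch above can be checked with a small userspace sketch. The function names and the sample addresses are hypothetical; the subtraction is done on unsigned values here so that wrap-around is well defined, which is a choice of this sketch and not something the patch does.

```c
#include <stdint.h>

typedef uint64_t model_phys_addr_t;

/*
 * Models init_hyp_physvirt_offset(): one known (PA, hyp VA) pair of the
 * EL2 linear map gives the signed offset between the two address spaces.
 */
static int64_t model_physvirt_offset(uint64_t paddr, uint64_t hyp_vaddr)
{
	return (int64_t)(paddr - hyp_vaddr);
}

/* Models __hyp_pa(): any other hyp linear-map VA converts with the same offset. */
static model_phys_addr_t model_hyp_pa(uint64_t hyp_vaddr, int64_t offset)
{
	return (model_phys_addr_t)(hyp_vaddr + (uint64_t)offset);
}
```

The offset is valid only for addresses inside the linear map it was derived from, which is why the patch derives it from a per-CPU base that is known to be mapped there.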
Small refactor so that nVHE's handle_trap uses a switch on the Exception
Class value of ESR_EL2 in preparation for adding a handler of SMC32/64.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 411b0f652417..19332c20fcde 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -16,9 +16,9 @@
DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
-static void handle_host_hcall(unsigned long func_id,
- struct kvm_cpu_context *host_ctxt)
+static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
{
+ unsigned long func_id = host_ctxt->regs.regs[0];
unsigned long ret = 0;
switch (func_id) {
@@ -109,11 +109,12 @@ static void handle_host_hcall(unsigned long func_id,
void handle_trap(struct kvm_cpu_context *host_ctxt)
{
u64 esr = read_sysreg_el2(SYS_ESR);
- unsigned long func_id;
- if (ESR_ELx_EC(esr) != ESR_ELx_EC_HVC64)
+ switch (ESR_ELx_EC(esr)) {
+ case ESR_ELx_EC_HVC64:
+ handle_host_hcall(host_ctxt);
+ break;
+ default:
hyp_panic();
-
- func_id = host_ctxt->regs.regs[0];
- handle_host_hcall(func_id, host_ctxt);
+ }
}
--
2.29.1.341.ge80a0c044ae-goog
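As a rough sketch of the dispatch shape introduced above, the exception class lives in bits [31:26] of the ESR value and selects the handler. The `model_` names and the enum of outcomes are hypothetical; the field position and the HVC64 class value are written out as local constants so the example stands alone.

```c
#include <stdint.h>

#define MODEL_ESR_EC_SHIFT	26
#define MODEL_ESR_EC_MASK	0x3Fu
#define MODEL_ESR_EC_HVC64	0x16u

/* Hypothetical outcomes, standing in for the real handler calls. */
enum model_trap_action {
	MODEL_TRAP_HANDLE_HCALL,
	MODEL_TRAP_PANIC,
};

/* Extract the Exception Class field from an ESR value. */
static unsigned int model_esr_ec(uint64_t esr)
{
	return (unsigned int)((esr >> MODEL_ESR_EC_SHIFT) & MODEL_ESR_EC_MASK);
}

/* Same shape as handle_trap() after the refactor: one case per class. */
static enum model_trap_action model_handle_trap(uint64_t esr)
{
	switch (model_esr_ec(esr)) {
	case MODEL_ESR_EC_HVC64:
		return MODEL_TRAP_HANDLE_HCALL;
	default:
		return MODEL_TRAP_PANIC;
	}
}
```

Adding a new trap source then amounts to adding a `case` label, which is what the later SMC patch relies on.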
All nVHE hyp code is currently executed as handlers of host's HVCs. This
will change as nVHE starts intercepting host's PSCI CPU_ON SMCs. The
newly booted CPU will need to initialize EL2 state and then enter the
host. Add __host_enter function that branches into the existing
host state-restoring code after the trap handler would have returned.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/host.S | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index ed27f06a31ba..ff04d7115eab 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -41,6 +41,7 @@ SYM_FUNC_START(__host_exit)
bl handle_trap
/* Restore host regs x0-x17 */
+__host_enter_restore_full:
ldp x0, x1, [x29, #CPU_XREG_OFFSET(0)]
ldp x2, x3, [x29, #CPU_XREG_OFFSET(2)]
ldp x4, x5, [x29, #CPU_XREG_OFFSET(4)]
@@ -63,6 +64,14 @@ __host_enter_without_restoring:
sb
SYM_FUNC_END(__host_exit)
+/*
+ * void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
+ */
+SYM_FUNC_START(__host_enter)
+ mov x29, x0
+ b __host_enter_restore_full
+SYM_FUNC_END(__host_enter)
+
/*
* void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
*/
--
2.29.1.341.ge80a0c044ae-goog
Once we start initializing KVM on newly booted cores before the rest of
the kernel, parameters to __do_hyp_init will need to be provided by EL2
rather than EL1. At that point it will not be possible to pass its four
arguments directly because PSCI_CPU_ON only supports one context
argument.
Refactor __do_hyp_init to accept its parameters in a struct. This
prepares the code for KVM booting cores as well as removes any limits on
the number of __do_hyp_init arguments.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 7 +++++++
arch/arm64/include/asm/kvm_hyp.h | 4 ++++
arch/arm64/kernel/asm-offsets.c | 4 ++++
arch/arm64/kvm/arm.c | 26 ++++++++++++++------------
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 21 ++++++++++-----------
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 2 ++
6 files changed, 41 insertions(+), 23 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 54387ccd1ab2..a49a87a186c3 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -150,6 +150,13 @@ extern void *__vhe_undefined_symbol;
#endif
+struct kvm_nvhe_init_params {
+ phys_addr_t pgd_ptr;
+ unsigned long tpidr_el2;
+ unsigned long hyp_stack_ptr;
+ unsigned long vector_ptr;
+};
+
/* Translate a kernel address @ptr into its equivalent linear mapping */
#define kvm_ksym_ref(ptr) \
({ \
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 6b664de5ec1f..a3289071f3d8 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -15,6 +15,10 @@
DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
DECLARE_PER_CPU(unsigned long, kvm_hyp_vector);
+#ifdef __KVM_NVHE_HYPERVISOR__
+DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+#endif
+
#define read_sysreg_elx(r,nvh,vh) \
({ \
u64 reg; \
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 7d32fc959b1a..0cbb86135c7c 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -110,6 +110,10 @@ int main(void)
DEFINE(CPU_APGAKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APGAKEYLO_EL1]));
DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
DEFINE(HOST_DATA_CONTEXT, offsetof(struct kvm_host_data, host_ctxt));
+ DEFINE(NVHE_INIT_PGD_PTR, offsetof(struct kvm_nvhe_init_params, pgd_ptr));
+ DEFINE(NVHE_INIT_TPIDR_EL2, offsetof(struct kvm_nvhe_init_params, tpidr_el2));
+ DEFINE(NVHE_INIT_STACK_PTR, offsetof(struct kvm_nvhe_init_params, hyp_stack_ptr));
+ DEFINE(NVHE_INIT_VECTOR_PTR, offsetof(struct kvm_nvhe_init_params, vector_ptr));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 58e9cc183bd5..ff200fc8d653 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -50,6 +50,7 @@ DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
+DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
/* The VMID used in the VTTBR */
static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
@@ -1331,10 +1332,7 @@ static int kvm_map_vectors(void)
static void cpu_init_hyp_mode(void)
{
- phys_addr_t pgd_ptr;
- unsigned long hyp_stack_ptr;
- unsigned long vector_ptr;
- unsigned long tpidr_el2;
+ struct kvm_nvhe_init_params *params = this_cpu_ptr_nvhe_sym(kvm_init_params);
struct arm_smccc_res res;
/* Switch from the HYP stub to our own HYP init vector */
@@ -1345,13 +1343,18 @@ static void cpu_init_hyp_mode(void)
* kernel's mapping to the linear mapping, and store it in tpidr_el2
* so that we can use adr_l to access per-cpu variables in EL2.
*/
- tpidr_el2 = (unsigned long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
- (unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));
+ params->tpidr_el2 = (unsigned long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
+ (unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));
- pgd_ptr = kvm_mmu_get_httbr();
- hyp_stack_ptr = __this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE;
- hyp_stack_ptr = kern_hyp_va(hyp_stack_ptr);
- vector_ptr = (unsigned long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
+ params->pgd_ptr = kvm_mmu_get_httbr();
+ params->vector_ptr = (unsigned long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
+ params->hyp_stack_ptr = kern_hyp_va(__this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE);
+
+ /*
+ * Flush the init params from the data cache because the struct will
+ * be read from while the MMU is off.
+ */
+ __flush_dcache_area(params, sizeof(*params));
/*
* Call initialization code, and switch to the full blown HYP code.
@@ -1360,8 +1363,7 @@ static void cpu_init_hyp_mode(void)
* cpus_have_const_cap() wrapper.
*/
BUG_ON(!system_capabilities_finalized());
- arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init),
- pgd_ptr, tpidr_el2, hyp_stack_ptr, vector_ptr, &res);
+ arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init), virt_to_phys(params), &res);
WARN_ON(res.a0 != SMCCC_RET_SUCCESS);
/*
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index 96e70f976ff5..6f3ac5d428ec 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -47,10 +47,7 @@ __invalid:
/*
* x0: SMCCC function ID
- * x1: HYP pgd
- * x2: per-CPU offset
- * x3: HYP stack
- * x4: HYP vectors
+ * x1: struct kvm_nvhe_init_params PA
*/
__do_hyp_init:
/* Check for a stub HVC call */
@@ -71,10 +68,16 @@ __do_hyp_init:
mov x0, #SMCCC_RET_NOT_SUPPORTED
eret
-1:
- /* Set tpidr_el2 for use by HYP to free a register */
- msr tpidr_el2, x2
+1: ldr x0, [x1, #NVHE_INIT_TPIDR_EL2]
+ msr tpidr_el2, x0
+ ldr x0, [x1, #NVHE_INIT_STACK_PTR]
+ mov sp, x0
+
+ ldr x0, [x1, #NVHE_INIT_VECTOR_PTR]
+ msr vbar_el2, x0
+
+ ldr x1, [x1, #NVHE_INIT_PGD_PTR]
phys_to_ttbr x0, x1
alternative_if ARM64_HAS_CNP
orr x0, x0, #TTBR_CNP_BIT
@@ -134,10 +137,6 @@ alternative_else_nop_endif
msr sctlr_el2, x0
isb
- /* Set the stack and new vectors */
- mov sp, x3
- msr vbar_el2, x4
-
/* Hello, World! */
mov x0, #SMCCC_RET_SUCCESS
eret
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index e2eafe2c93af..411b0f652417 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -14,6 +14,8 @@
#include <kvm/arm_hypercalls.h>
+DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+
static void handle_host_hcall(unsigned long func_id,
struct kvm_cpu_context *host_ctxt)
{
--
2.29.1.341.ge80a0c044ae-goog
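The assembly in the patch above reads each field of the params struct with `ldr x0, [x1, #OFFSET]`, using offsets generated by asm-offsets.c. A userspace sketch of the same idea, with hypothetical `MODEL_` names and fixed-width fields standing in for the kernel types:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct kvm_nvhe_init_params, using fixed-width fields. */
struct model_init_params {
	uint64_t pgd_ptr;
	uint64_t tpidr_el2;
	uint64_t hyp_stack_ptr;
	uint64_t vector_ptr;
};

/* What asm-offsets.c would generate: byte offsets of each field. */
#define MODEL_INIT_PGD_PTR	offsetof(struct model_init_params, pgd_ptr)
#define MODEL_INIT_TPIDR_EL2	offsetof(struct model_init_params, tpidr_el2)
#define MODEL_INIT_STACK_PTR	offsetof(struct model_init_params, hyp_stack_ptr)
#define MODEL_INIT_VECTOR_PTR	offsetof(struct model_init_params, vector_ptr)

/* Models "ldr xN, [base, #offset]": a 64-bit load at a byte offset. */
static uint64_t model_ldr(const void *base, size_t offset)
{
	uint64_t val;

	memcpy(&val, (const unsigned char *)base + offset, sizeof(val));
	return val;
}
```

Because the callee only needs the base address, a single register is enough to pass any number of parameters, which is the property the commit message is after.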
When nVHE hyp starts intercepting host's PSCI CPU_ON SMCs, it will need
to install KVM on the newly booted CPU before returning to the host. Add
an entry point which expects the same kvm_nvhe_init_params struct as the
__kvm_hyp_init HVC in the CPU_ON context argument (x0).
The entry point initializes EL2 state with the same init_el2_state macro
used by the kernel's entry point. It then initializes KVM using the same
helper function used in the __kvm_hyp_init HVC.
When done, the entry point branches to a function provided in the init
params.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 30 ++++++++++++++++++++++++++++++
3 files changed, 32 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 9eecb37db6df..8350b95ce94e 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -155,6 +155,7 @@ struct kvm_nvhe_init_params {
unsigned long tpidr_el2;
unsigned long hyp_stack_ptr;
unsigned long vector_ptr;
+ unsigned long psci_cpu_entry_fn;
};
enum kvm_nvhe_psci_state {
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 0cbb86135c7c..ffc84e68ad97 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -114,6 +114,7 @@ int main(void)
DEFINE(NVHE_INIT_TPIDR_EL2, offsetof(struct kvm_nvhe_init_params, tpidr_el2));
DEFINE(NVHE_INIT_STACK_PTR, offsetof(struct kvm_nvhe_init_params, hyp_stack_ptr));
DEFINE(NVHE_INIT_VECTOR_PTR, offsetof(struct kvm_nvhe_init_params, vector_ptr));
+ DEFINE(NVHE_INIT_PSCI_CPU_ENTRY_FN, offsetof(struct kvm_nvhe_init_params, psci_cpu_entry_fn));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index 1726cc44b3ee..dd297a1a8f82 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -6,6 +6,7 @@
#include <linux/arm-smccc.h>
#include <linux/linkage.h>
+#include <linux/irqchip/arm-gic-v3.h>
#include <asm/alternative.h>
#include <asm/assembler.h>
@@ -159,6 +160,35 @@ alternative_else_nop_endif
ret
SYM_CODE_END(___kvm_hyp_init)
+SYM_CODE_START(__kvm_hyp_cpu_entry)
+ msr SPsel, #1 // We want to use SP_EL{1,2}
+
+ /*
+ * Check that the core was booted in EL2. Loop indefinitely if not
+ * because it cannot be safely given to the host without installing KVM.
+ */
+ mrs x1, CurrentEL
+ cmp x1, #CurrentEL_EL2
+ b.ne .
+
+ /* Initialize EL2 CPU state to sane values. */
+ mov x29, x0
+ init_el2_state nvhe
+ mov x0, x29
+
+ /*
+ * Load hyp VA of C entry function. Must do so before switching on the
+ * MMU because the struct pointer is PA and not identity-mapped in hyp.
+ */
+ ldr x29, [x0, #NVHE_INIT_PSCI_CPU_ENTRY_FN]
+
+ /* Enable MMU, set vectors and stack. */
+ bl ___kvm_hyp_init
+
+ /* Leave idmap. */
+ br x29
+SYM_CODE_END(__kvm_hyp_cpu_entry)
+
SYM_CODE_START(__kvm_handle_stub_hvc)
cmp x0, #HVC_SOFT_RESTART
b.ne 1f
--
2.29.1.341.ge80a0c044ae-goog
Add handler of host SMCs in KVM nVHE trap handler. Forward all SMCs to
EL3 and propagate the result back to EL1. This is done in preparation
for validating host SMCs.
Signed-off-by: David Brazdil <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 36 ++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 19332c20fcde..fffc2dc09a1f 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -106,6 +106,38 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
host_ctxt->regs.regs[1] = ret;
}
+static void skip_host_instruction(void)
+{
+ write_sysreg_el2(read_sysreg_el2(SYS_ELR) + 4, SYS_ELR);
+}
+
+static void forward_host_smc(struct kvm_cpu_context *host_ctxt)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_smc(host_ctxt->regs.regs[0], host_ctxt->regs.regs[1],
+ host_ctxt->regs.regs[2], host_ctxt->regs.regs[3],
+ host_ctxt->regs.regs[4], host_ctxt->regs.regs[5],
+ host_ctxt->regs.regs[6], host_ctxt->regs.regs[7],
+ &res);
+ host_ctxt->regs.regs[0] = res.a0;
+ host_ctxt->regs.regs[1] = res.a1;
+ host_ctxt->regs.regs[2] = res.a2;
+ host_ctxt->regs.regs[3] = res.a3;
+}
+
+static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
+{
+ /*
+ * Unlike HVC, the return address of an SMC is the instruction's PC.
+ * Move the return address past the instruction.
+ */
+ skip_host_instruction();
+
+ /* Forward SMC not handled in EL2 to EL3. */
+ forward_host_smc(host_ctxt);
+}
+
void handle_trap(struct kvm_cpu_context *host_ctxt)
{
u64 esr = read_sysreg_el2(SYS_ESR);
@@ -114,6 +146,10 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
case ESR_ELx_EC_HVC64:
handle_host_hcall(host_ctxt);
break;
+ case ESR_ELx_EC_SMC32:
+ case ESR_ELx_EC_SMC64:
+ handle_host_smc(host_ctxt);
+ break;
default:
hyp_panic();
}
--
2.29.1.341.ge80a0c044ae-goog
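The register flow of the SMC handler above can be sketched in userspace with a fake EL3. Everything prefixed `model_` is hypothetical, including the callback type and the echoing firmware stub; only the two steps (advance the return address by one instruction, copy a0-a3 back into x0-x3) follow the patch.

```c
#include <stdint.h>

struct model_smccc_res {
	uint64_t a0, a1, a2, a3;
};

/* Minimal stand-in for the saved host context plus ELR_EL2. */
struct model_host_ctxt {
	uint64_t regs[31];
	uint64_t elr;
};

/* Hypothetical stand-in for the SMC into EL3: takes x0-x7, fills a0-a3. */
typedef void (*model_el3_fn)(const uint64_t args[8],
			     struct model_smccc_res *res);

static void model_handle_host_smc(struct model_host_ctxt *ctxt,
				  model_el3_fn el3)
{
	struct model_smccc_res res = { 0, 0, 0, 0 };

	/* A trapped SMC reports its own PC; step past the 4-byte instruction. */
	ctxt->elr += 4;

	/* Forward x0-x7 and propagate the four result registers to the host. */
	el3(ctxt->regs, &res);
	ctxt->regs[0] = res.a0;
	ctxt->regs[1] = res.a1;
	ctxt->regs[2] = res.a2;
	ctxt->regs[3] = res.a3;
}

/* Fake firmware for experiments: returns success and echoes some inputs. */
static void model_el3_echo(const uint64_t args[8], struct model_smccc_res *res)
{
	res->a0 = 0;
	res->a1 = args[0];
	res->a2 = args[1];
	res->a3 = args[7];
}
```

In this model x4-x7 of the host context are left untouched by the forward, as in the patch.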
On 2020-11-04 18:36, David Brazdil wrote:
> The version of PSCI that the kernel should use to communicate with
> firmware is typically obtained from probing PSCI_VERSION. However, that
> doesn't work for PSCI v0.1 where the host gets the information from
> DT/ACPI, or if PSCI is not supported / was disabled.
>
> KVM's PSCI proxy for the host needs to be configured with the same
> version used by the host driver. Expose the PSCI version used by the
> host.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> drivers/firmware/psci/psci.c | 6 ++++++
> include/linux/psci.h | 8 ++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/drivers/firmware/psci/psci.c
> b/drivers/firmware/psci/psci.c
> index 00af99b6f97c..ff523bdbfe3f 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -49,6 +49,8 @@ static int resident_cpu = -1;
> struct psci_operations psci_ops;
> static enum arm_smccc_conduit psci_conduit = SMCCC_CONDUIT_NONE;
>
> +int psci_driver_version = PSCI_VERSION(0, 0);
> +
> bool psci_tos_resident_on(int cpu)
> {
> return cpu == resident_cpu;
> @@ -461,6 +463,8 @@ static int __init psci_probe(void)
> return -EINVAL;
> }
>
> + psci_driver_version = ver;
> +
> psci_0_2_set_functions();
>
> psci_init_migrate();
> @@ -514,6 +518,8 @@ static int __init psci_0_1_init(struct device_node
> *np)
>
> pr_info("Using PSCI v0.1 Function IDs from DT\n");
>
> + psci_driver_version = PSCI_VERSION(0, 1);
> +
> if (!of_property_read_u32(np, "cpu_suspend", &id)) {
> psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
> psci_ops.cpu_suspend = psci_cpu_suspend;
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index 2a1bfb890e58..cb35b90d1746 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -21,6 +21,14 @@ bool psci_power_state_is_valid(u32 state);
> int psci_set_osi_mode(bool enable);
> bool psci_has_osi_support(void);
>
> +/**
> + * The version of the PSCI specification followed by the driver.
> + * This is equivalent to calling PSCI_VERSION except:
> + * (a) it also works for PSCI v0.1, which does not support
> PSCI_VERSION, and
> + * (b) it is set to v0.0 if the PSCI driver was not initialized.
> + */
> +extern int psci_driver_version;
> +
> struct psci_operations {
> u32 (*get_version)(void);
> int (*cpu_suspend)(u32 state, unsigned long entry_point);
How about providing a get_version callback for pre-0.2 implementations
instead? This would avoid exposing more symbols (psci_ops is already
global).
Thanks,
M.
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 00af99b6f97c..b84454e12d92 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -500,6 +500,11 @@ static int __init psci_0_2_init(struct device_node
*np)
return psci_probe();
}
+static u32 psci_0_1_get_version(void)
+{
+ return PSCI_VERSION(0, 1);
+}
+
/*
* PSCI < v0.2 get PSCI Function IDs via DT.
*/
@@ -514,6 +519,8 @@ static int __init psci_0_1_init(struct device_node
*np)
pr_info("Using PSCI v0.1 Function IDs from DT\n");
+ psci_ops.get_version = psci_0_1_get_version;
+
if (!of_property_read_u32(np, "cpu_suspend", &id)) {
psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
psci_ops.cpu_suspend = psci_cpu_suspend;
--
Jazz is not dead. It just smells funny...
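A sketch of how a consumer might use the suggested callback, in userspace C. The `model_` names are hypothetical, and the fallback to v0.0 when no callback is installed is an assumption of this sketch, chosen to match the "v0.0 if the PSCI driver was not initialized" wording in the original patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Major version in the upper 16 bits, minor in the lower 16 bits. */
#define MODEL_PSCI_VERSION(maj, min) \
	((((uint32_t)(maj)) << 16) | (((uint32_t)(min)) & 0xffffu))

/* Cut-down stand-in for struct psci_operations. */
struct model_psci_ops {
	uint32_t (*get_version)(void);
};

/* Callback a pre-0.2 driver could install, as in the suggestion above. */
static uint32_t model_psci_0_1_get_version(void)
{
	return MODEL_PSCI_VERSION(0, 1);
}

/* Assumed consumer: report v0.0 when no callback has been installed. */
static uint32_t model_host_psci_version(const struct model_psci_ops *ops)
{
	if (!ops->get_version)
		return MODEL_PSCI_VERSION(0, 0);

	return ops->get_version();
}
```

This keeps the version behind the already-global ops table, so no extra symbol has to be exported.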
On Wed, 4 Nov 2020 at 19:37, 'David Brazdil' via kernel-team
<[email protected]> wrote:
>
> Add a handler of CPU_SUSPEND host PSCI SMCs. When invoked, it determines
> whether the requested power state loses context, ie. whether it is
> indistinguishable from a WHI or whether it is a deeper sleep state that
Do you mean WFI?
> behaves like a CPU_OFF+CPU_ON.
>
> If it's the former, it forwards the call to EL3 and returns to the host
> after waking up.
>
> If it's the latter, it saves r0,pc of the host into and makes the same
> call to EL3 with the hyp CPU entry point. When the core wakes up, EL2
> state is initialized before dropping back to EL1.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/kvm/arm.c | 2 ++
> arch/arm64/kvm/hyp/nvhe/psci.c | 49 +++++++++++++++++++++++++++++++++-
> drivers/firmware/psci/psci.c | 9 -------
> include/uapi/linux/psci.h | 7 +++++
> 4 files changed, 57 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 166975999ead..6fbda652200b 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1521,9 +1521,11 @@ static void init_psci(void)
> {
> extern u32 kvm_nvhe_sym(kvm_host_psci_version);
> extern u32 kvm_nvhe_sym(kvm_host_psci_function_id)[PSCI_FN_MAX];
> + extern u32 kvm_nvhe_sym(kvm_host_psci_cpu_suspend_feature);
> int cpu;
>
> kvm_nvhe_sym(kvm_host_psci_version) = psci_driver_version;
> + kvm_nvhe_sym(kvm_host_psci_cpu_suspend_feature) = psci_cpu_suspend_feature;
> memcpy(kvm_nvhe_sym(kvm_host_psci_function_id),
> psci_function_id, sizeof(psci_function_id));
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c b/arch/arm64/kvm/hyp/nvhe/psci.c
> index 42ee5effa827..4899c8319bb4 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci.c
> @@ -21,6 +21,7 @@
> /* Config options set by the host. */
> u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
> u32 kvm_host_psci_function_id[PSCI_FN_MAX];
> +u32 kvm_host_psci_cpu_suspend_feature;
> s64 hyp_physvirt_offset;
>
> #define __hyp_pa(x) ((phys_addr_t)(x) + hyp_physvirt_offset)
> @@ -83,6 +84,20 @@ static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *ho
> hyp_panic(); /* unreachable */
> }
>
> +static bool psci_has_ext_power_state(void)
> +{
> + return kvm_host_psci_cpu_suspend_feature & PSCI_1_0_FEATURES_CPU_SUSPEND_PF_MASK;
> +}
> +
> +static bool psci_power_state_loses_context(u32 state)
> +{
> + const u32 mask = psci_has_ext_power_state() ?
> + PSCI_1_0_EXT_POWER_STATE_TYPE_MASK :
> + PSCI_0_2_POWER_STATE_TYPE_MASK;
> +
> + return state & mask;
> +}
> +
> static unsigned int find_cpu_id(u64 mpidr)
> {
> int i;
> @@ -106,6 +121,34 @@ static phys_addr_t cpu_entry_pa(void)
> return kern_va - kimage_voffset;
> }
>
> +static int psci_cpu_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
> +{
> + u64 power_state = host_ctxt->regs.regs[1];
> + unsigned long pc = host_ctxt->regs.regs[2];
> + unsigned long r0 = host_ctxt->regs.regs[3];
> + hyp_spinlock_t *cpu_lock;
> + struct vcpu_reset_state *cpu_reset;
> + struct kvm_nvhe_init_params *cpu_params;
> +
> + if (!psci_power_state_loses_context(power_state)) {
> + /* This power state has the same semantics as WFI. */
> + return psci_call(PSCI_0_2_FN64_CPU_SUSPEND, 0, 0, 0);
> + }
> +
> + cpu_lock = this_cpu_ptr(&psci_cpu_lock);
> + cpu_reset = this_cpu_ptr(&psci_cpu_reset);
> + cpu_params = this_cpu_ptr(&kvm_init_params);
> +
> + /* Resuming from this state has the same semantics as CPU_ON. */
> + hyp_spin_lock(cpu_lock);
> + *cpu_reset = (struct vcpu_reset_state){
> + .pc = pc,
> + .r0 = r0,
> + };
> + hyp_spin_unlock(cpu_lock);
> + return psci_call(func_id, power_state, cpu_entry_pa(), __hyp_pa(cpu_params));
> +}
> +
> static int psci_cpu_off(u64 func_id, struct kvm_cpu_context *host_ctxt)
> {
> hyp_spinlock_t *cpu_lock = this_cpu_ptr(&psci_cpu_lock);
> @@ -193,7 +236,9 @@ static int psci_cpu_on(u64 func_id, struct kvm_cpu_context *host_ctxt)
>
> static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
> {
> - if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
> + if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_SUSPEND])
> + return psci_cpu_suspend(func_id, host_ctxt);
> + else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_OFF])
> return psci_cpu_off(func_id, host_ctxt);
> else if (func_id == kvm_host_psci_function_id[PSCI_FN_CPU_ON])
> return psci_cpu_on(func_id, host_ctxt);
> @@ -216,6 +261,8 @@ static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
> case PSCI_0_2_FN_SYSTEM_RESET:
> psci_forward_noreturn(host_ctxt);
> unreachable();
> + case PSCI_0_2_FN64_CPU_SUSPEND:
> + return psci_cpu_suspend(func_id, host_ctxt);
> case PSCI_0_2_FN_CPU_OFF:
> return psci_cpu_off(func_id, host_ctxt);
> case PSCI_0_2_FN64_CPU_ON:
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index b6ad237b1518..387e24409da7 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -62,15 +62,6 @@ static psci_fn *invoke_psci_fn;
>
> u32 psci_function_id[PSCI_FN_MAX];
>
> -#define PSCI_0_2_POWER_STATE_MASK \
> - (PSCI_0_2_POWER_STATE_ID_MASK | \
> - PSCI_0_2_POWER_STATE_TYPE_MASK | \
> - PSCI_0_2_POWER_STATE_AFFL_MASK)
> -
> -#define PSCI_1_0_EXT_POWER_STATE_MASK \
> - (PSCI_1_0_EXT_POWER_STATE_ID_MASK | \
> - PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
> -
> u32 psci_cpu_suspend_feature;
> static bool psci_system_reset2_supported;
>
> diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
> index 0d52b8dbe8c2..df3d85ce86f7 100644
> --- a/include/uapi/linux/psci.h
> +++ b/include/uapi/linux/psci.h
> @@ -65,6 +65,10 @@
> #define PSCI_0_2_POWER_STATE_AFFL_SHIFT 24
> #define PSCI_0_2_POWER_STATE_AFFL_MASK \
> (0x3 << PSCI_0_2_POWER_STATE_AFFL_SHIFT)
> +#define PSCI_0_2_POWER_STATE_MASK \
> + (PSCI_0_2_POWER_STATE_ID_MASK | \
> + PSCI_0_2_POWER_STATE_TYPE_MASK | \
> + PSCI_0_2_POWER_STATE_AFFL_MASK)
>
> /* PSCI extended power state encoding for CPU_SUSPEND function */
> #define PSCI_1_0_EXT_POWER_STATE_ID_MASK 0xfffffff
> @@ -72,6 +76,9 @@
> #define PSCI_1_0_EXT_POWER_STATE_TYPE_SHIFT 30
> #define PSCI_1_0_EXT_POWER_STATE_TYPE_MASK \
> (0x1 << PSCI_1_0_EXT_POWER_STATE_TYPE_SHIFT)
> +#define PSCI_1_0_EXT_POWER_STATE_MASK \
> + (PSCI_1_0_EXT_POWER_STATE_ID_MASK | \
> + PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
>
> /* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
> #define PSCI_0_2_AFFINITY_LEVEL_ON 0
> --
> 2.29.1.341.ge80a0c044ae-goog
>
> > Add a handler of CPU_SUSPEND host PSCI SMCs. When invoked, it determines
> > whether the requested power state loses context, ie. whether it is
> > indistinguishable from a WHI or whether it is a deeper sleep state that
> Do you mean WFI?
Of course, sorry, just a typo.
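For reference, the WFI-like versus power-down decision in the quoted patch can be sketched as below. The `MODEL_` names are hypothetical. The bit 30 position of the extended StateType field is taken from the quoted header; the bit 16 position for the original format and the bit 1 position of the parameter-format feature flag are assumptions of this sketch and should be checked against the PSCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPU_SUSPEND feature flag: set when the extended StateID format is used. */
#define MODEL_FEATURES_CPU_SUSPEND_PF_MASK	(UINT32_C(1) << 1)	/* assumed */
/* StateType bit in the original power_state format. */
#define MODEL_0_2_POWER_STATE_TYPE_MASK		(UINT32_C(1) << 16)	/* assumed */
/* StateType bit in the extended power_state format. */
#define MODEL_1_0_EXT_POWER_STATE_TYPE_MASK	(UINT32_C(1) << 30)

static bool model_has_ext_power_state(uint32_t suspend_feature)
{
	return (suspend_feature & MODEL_FEATURES_CPU_SUSPEND_PF_MASK) != 0;
}

/*
 * StateType == 1 means a power-down state: context is lost and the core
 * resumes at the supplied entry point, like CPU_OFF followed by CPU_ON.
 * StateType == 0 means a standby state that behaves like WFI.
 */
static bool model_power_state_loses_context(uint32_t suspend_feature,
					     uint32_t power_state)
{
	const uint32_t mask = model_has_ext_power_state(suspend_feature) ?
			      MODEL_1_0_EXT_POWER_STATE_TYPE_MASK :
			      MODEL_0_2_POWER_STATE_TYPE_MASK;

	return (power_state & mask) != 0;
}
```

Only the power-down case needs the hyp entry point substituted; the standby case returns to the caller in place.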
On 2020-11-04 18:36, David Brazdil wrote:
> Add handler of host SMCs in KVM nVHE trap handler. Forward all SMCs to
> EL3 and propagate the result back to EL1. This is done in preparation
> for validating host SMCs.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 36 ++++++++++++++++++++++++++++++
> 1 file changed, 36 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 19332c20fcde..fffc2dc09a1f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -106,6 +106,38 @@ static void handle_host_hcall(struct
> kvm_cpu_context *host_ctxt)
> host_ctxt->regs.regs[1] = ret;
> }
>
> +static void skip_host_instruction(void)
> +{
> + write_sysreg_el2(read_sysreg_el2(SYS_ELR) + 4, SYS_ELR);
> +}
> +
> +static void forward_host_smc(struct kvm_cpu_context *host_ctxt)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_1_1_smc(host_ctxt->regs.regs[0], host_ctxt->regs.regs[1],
> + host_ctxt->regs.regs[2], host_ctxt->regs.regs[3],
> + host_ctxt->regs.regs[4], host_ctxt->regs.regs[5],
> + host_ctxt->regs.regs[6], host_ctxt->regs.regs[7],
> + &res);
> + host_ctxt->regs.regs[0] = res.a0;
> + host_ctxt->regs.regs[1] = res.a1;
> + host_ctxt->regs.regs[2] = res.a2;
> + host_ctxt->regs.regs[3] = res.a3;
> +}
> +
> +static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
> +{
> + /*
> + * Unlike HVC, the return address of an SMC is the instruction's PC.
> + * Move the return address past the instruction.
> + */
> + skip_host_instruction();
> +
> + /* Forward SMC not handled in EL2 to EL3. */
> + forward_host_smc(host_ctxt);
> +}
> +
> void handle_trap(struct kvm_cpu_context *host_ctxt)
> {
> u64 esr = read_sysreg_el2(SYS_ESR);
> @@ -114,6 +146,10 @@ void handle_trap(struct kvm_cpu_context
> *host_ctxt)
> case ESR_ELx_EC_HVC64:
> handle_host_hcall(host_ctxt);
> break;
> + case ESR_ELx_EC_SMC32:
How is that even possible? Host EL1 is strictly 64bit, so SMC32 cannot
occur.
> + case ESR_ELx_EC_SMC64:
> + handle_host_smc(host_ctxt);
> + break;
> default:
> hyp_panic();
> }
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On 2020-11-04 18:36, David Brazdil wrote:
> Add a handler of the CPU_OFF PSCI host SMC trapped in KVM nVHE hyp
> code.
> When invoked, it changes the recorded state of the core to OFF before
> forwarding the call to EL3. If the call fails, it changes the state
> back
> to ON and returns the error to the host.
>
> Signed-off-by: David Brazdil <[email protected]>
> ---
> arch/arm64/kvm/hyp/nvhe/psci.c | 30 +++++++++++++++++++++++++++++-
> 1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci.c
> b/arch/arm64/kvm/hyp/nvhe/psci.c
> index c3d0a6246c66..00dc0cab860c 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci.c
> @@ -13,6 +13,8 @@
> #include <kvm/arm_psci.h>
> #include <uapi/linux/psci.h>
>
> +#include <nvhe/spinlock.h>
> +
> /* Config options set by the host. */
> u32 kvm_host_psci_version = PSCI_VERSION(0, 0);
> u32 kvm_host_psci_function_id[PSCI_FN_MAX];
> @@ -20,6 +22,7 @@ s64 hyp_physvirt_offset;
>
> #define __hyp_pa(x) ((phys_addr_t)(x) + hyp_physvirt_offset)
>
> +static DEFINE_PER_CPU(hyp_spinlock_t, psci_cpu_lock);
> DEFINE_PER_CPU(enum kvm_nvhe_psci_state, psci_cpu_state);
>
> static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
> @@ -76,9 +79,32 @@ static __noreturn unsigned long
> psci_forward_noreturn(struct kvm_cpu_context *ho
> hyp_panic(); /* unreachable */
> }
>
> +static int psci_cpu_off(u64 func_id, struct kvm_cpu_context
> *host_ctxt)
> +{
> + hyp_spinlock_t *cpu_lock = this_cpu_ptr(&psci_cpu_lock);
> + enum kvm_nvhe_psci_state *cpu_power = this_cpu_ptr(&psci_cpu_state);
> + u32 power_state = (u32)host_ctxt->regs.regs[1];
> + int ret;
> +
> + /* Change the recorded state to OFF before forwarding the call. */
> + hyp_spin_lock(cpu_lock);
> + *cpu_power = KVM_NVHE_PSCI_CPU_OFF;
> + hyp_spin_unlock(cpu_lock);
So at this point, another CPU can observe the vcpu being "off", and issue
a PSCI_ON, which may result in an "already on". I'm not sure this is an
actual issue, but it is worth documenting.
What is definitely missing is a rationale about *why* we need to track the
state of the vcpus. I naively imagined that we could directly proxy the
PSCI calls to EL3, only repainting PC for PSCI_ON.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
Hi Marc,
> > +static DEFINE_PER_CPU(hyp_spinlock_t, psci_cpu_lock);
> > DEFINE_PER_CPU(enum kvm_nvhe_psci_state, psci_cpu_state);
> >
> > static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
> > @@ -76,9 +79,32 @@ static __noreturn unsigned long
> > psci_forward_noreturn(struct kvm_cpu_context *ho
> > hyp_panic(); /* unreachable */
> > }
> >
> > +static int psci_cpu_off(u64 func_id, struct kvm_cpu_context *host_ctxt)
> > +{
> > + hyp_spinlock_t *cpu_lock = this_cpu_ptr(&psci_cpu_lock);
> > + enum kvm_nvhe_psci_state *cpu_power = this_cpu_ptr(&psci_cpu_state);
> > + u32 power_state = (u32)host_ctxt->regs.regs[1];
> > + int ret;
> > +
> > + /* Change the recorded state to OFF before forwarding the call. */
> > + hyp_spin_lock(cpu_lock);
> > + *cpu_power = KVM_NVHE_PSCI_CPU_OFF;
> > + hyp_spin_unlock(cpu_lock);
>
> So at this point, another CPU can observe the vcpu being "off", and issue
> a PSCI_ON, which may result in an "already on". I'm not sure this is an
> actual issue, but it is worth documenting.
>
> What is definitely missing is a rationale for *why* we need to track the
> state of the vcpus. I naively imagined that we could directly proxy the
> PSCI calls to EL3, only repainting PC for PSCI_ON.
I think I've solved that particular problem by *not* using cpu_power for
AFFINITY_INFO. It's used only for resolving the race between CPU_ON/OFF.
You are, however, right that perhaps that is not needed either and resolving
the race should be left to the host. In that case the hypervisor would be just
repainting the CPU_ON/SUSPEND args, as you said.
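To make the "just repainting" option concrete, here is a minimal userspace sketch of rewriting the CPU_ON arguments before forwarding the call. It is not the code from the series: struct smc_args, struct host_boot_args and repaint_cpu_on() are hypothetical names for illustration, and only the function ID and the x1/x2/x3 argument layout come from the PSCI specification.

```c
#include <stdint.h>

/* SMC64 function ID of CPU_ON, as defined by the PSCI specification. */
#define PSCI_0_2_FN64_CPU_ON 0xC4000003ULL

/* Hypothetical stand-in for the host's SMC argument registers x0..x3. */
struct smc_args {
	uint64_t x[4];
};

/* Hypothetical per-CPU record of what the host originally requested. */
struct host_boot_args {
	uint64_t pc;	/* host entry point (CPU_ON argument x2) */
	uint64_t r0;	/* host context_id (CPU_ON argument x3) */
};

/*
 * Repaint a CPU_ON call so that the new CPU enters at hyp_entry instead of
 * the host's entry point. The host's original values are stashed in *saved
 * so the hypervisor can ERET to them after initializing EL2.
 *
 * Returns 0 if the call was repainted, -1 if it is not CPU_ON, in which
 * case the arguments are left untouched and can be forwarded as they are.
 */
static int repaint_cpu_on(struct smc_args *args, struct host_boot_args *saved,
			  uint64_t hyp_entry)
{
	if (args->x[0] != PSCI_0_2_FN64_CPU_ON)
		return -1;

	saved->pc = args->x[2];
	saved->r0 = args->x[3];

	args->x[2] = hyp_entry;
	/* Assumption: a real hypervisor would pass a physical address here. */
	args->x[3] = (uint64_t)(uintptr_t)saved;
	return 0;
}
```

With this shape no power state is tracked at EL2. Resolving a CPU_ON/OFF race would be left to the host and EL3, which is the trade-off discussed above.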
On 2020-11-04 18:36, David Brazdil wrote:
> As we progress towards being able to keep guest state private to the
> host running nVHE hypervisor, this series allows the hypervisor to
> install itself on newly booted CPUs before the host is allowed to run
> on them.
>
> To this end, the hypervisor starts trapping host SMCs and intercepting
> host's PSCI CPU_ON/OFF/SUSPEND calls. It replaces the host's entry point
> with its own, initializes the EL2 state of the new CPU and installs
> the nVHE hyp vector before ERETing to the host's entry point.
>
> Other PSCI SMCs are forwarded to EL3, though only the known set of SMCs
> implemented in the kernel is allowed. Non-PSCI SMCs are also forwarded
> to EL3. Future changes will need to ensure the safety of all SMCs wrt.
> private guests.
>
> The host is still allowed to reset EL2 back to the stub vector, eg. for
> hibernation or kexec, but will not disable nVHE when there are no VMs.
>
> Tested on Rock Pi 4b.
>
>
> Sending this as an RFC to get feedback on the following decisions:
>
> 1) The kernel checks new cores' features against the finalized system
> capabilities. To avoid the need to move this code/data to EL2, the
> implementation only allows to boot cores that were online at the time of
> KVM initialization.
>
> 2) Trapping and forwarding SMCs cannot be switched off. This could cause
> issues eg. if EL3 always returned to EL1. A kernel command line flag may
> be needed to turn the feature off on such platforms.
I'd rather have it the other way around (opt-in rather than opt-out).
On top of the potential issue with stupid EL3s, there is the issue that
PSCI is optional, and that protected VMs won't be able to work without
it. Another related thing is that EL3 itself is optional.
Note that this flag shouldn't be specific to PSCI proxying. It should also
control Stage-2 wrapping, and the whole pKVM.
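As a rough illustration of the opt-in shape, the sketch below gates every protected feature behind a single switch that defaults to off and also requires PSCI to be present. The option name "hyp.protected=1", struct hyp_features and both helpers are hypothetical placeholders for illustration, not an existing kernel interface.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical set of features controlled by the single opt-in switch. */
struct hyp_features {
	bool psci_proxy;	/* trap and proxy host PSCI SMCs */
	bool stage2_wrap;	/* wrap the host in a Stage-2 translation */
};

/* Return true if token appears in cmdline as a whole space-separated word. */
static bool cmdline_has_token(const char *cmdline, const char *token)
{
	size_t len = strlen(token);
	const char *p = cmdline;

	while ((p = strstr(p, token)) != NULL) {
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[len] == '\0') || (p[len] == ' ');

		if (starts && ends)
			return true;
		p++;
	}
	return false;
}

/*
 * Opt-in policy: everything stays off unless the (hypothetical) option is
 * given and the platform actually implements PSCI.
 */
static struct hyp_features hyp_features_from_cmdline(const char *cmdline,
						     bool has_psci)
{
	bool on = has_psci && cmdline_has_token(cmdline, "hyp.protected=1");
	struct hyp_features f = { .psci_proxy = on, .stage2_wrap = on };

	return f;
}
```

Because both fields derive from the same boolean, PSCI proxying cannot be enabled without Stage-2 wrapping or the other way around.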
Thanks,
M.