Allow userspace to request handling PSCI calls from guests. Our goal is
to enable a vCPU hot-add solution for Arm where the VMM presents
possible resources to the guest at boot, and controls which vCPUs can be
brought up by allowing or denying PSCI CPU_ON calls. Passing HVC and
PSCI to userspace has been discussed on the list in the context of vCPU
hot-add [1,2] but it can also be useful for implementing other SMCCC and
vendor hypercalls [3,4,5].
Patches 1-3 allow userspace to request WFI to be executed in KVM. That
way the VMM can easily implement the PSCI CPU_SUSPEND function, which is
mandatory from PSCI v0.2 onwards (even if it doesn't have a more useful
implementation than WFI, natively available to the guest).
Patch 4 lets userspace request any HVC that isn't handled by KVM, and
patch 5 lets userspace request PSCI calls, disabling in-kernel PSCI
handling.
I'm focusing on the PSCI bits, but a complete prototype of vCPU hot-add
for arm64 on Linux and QEMU, most of it from Salil and James, is
available at [6].
[1] https://lore.kernel.org/kvmarm/[email protected]/
[2] https://lore.kernel.org/linux-arm-kernel/[email protected]/
(Followed by KVM forum and Linaro Open discussions)
[3] https://lore.kernel.org/linux-arm-kernel/[email protected]/
[4] https://lore.kernel.org/kvm/[email protected]/
[5] https://lore.kernel.org/kvm/[email protected]/
[6] https://jpbrucker.net/git/linux/log/?h=cpuhp/devel
https://jpbrucker.net/git/qemu/log/?h=cpuhp/devel
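To illustrate the intended use, here is a rough sketch of the VMM side for
denying CPU_ON. The exact kvm_run::hypercall layout and the way the return
value travels back to the guest are defined by patch 4 and only assumed here;
the function ID and error code are from the PSCI spec (ARM DEN 0022):

#include <stdbool.h>
#include <stdint.h>
#include <linux/kvm.h>

#define PSCI_FN64_CPU_ON	0xc4000003u	/* PSCI_0_2_FN64_CPU_ON */
#define PSCI_RET_DENIED		(-3)

/* Called when KVM_RUN returns with exit_reason == KVM_EXIT_HYPERCALL */
static void handle_hypercall(struct kvm_run *run,
			     bool (*cpu_can_boot)(uint64_t target_mpidr))
{
	if (run->hypercall.nr == PSCI_FN64_CPU_ON &&
	    !cpu_can_boot(run->hypercall.args[0])) {
		/*
		 * Refuse to bring up a vCPU that hasn't been hot-added yet.
		 * Report PSCI_RET_DENIED through whichever field patch 4 uses
		 * for the return value (args[0] assumed here).
		 */
		run->hypercall.args[0] = (uint64_t)(int64_t)PSCI_RET_DENIED;
		return;
	}
	/* ... emulate the other PSCI functions ... */
}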
Jean-Philippe Brucker (5):
KVM: arm64: Replace power_off with mp_state in struct kvm_vcpu_arch
KVM: arm64: Move WFI execution to check_vcpu_requests()
KVM: arm64: Allow userspace to request WFI
KVM: arm64: Pass hypercalls to userspace
KVM: arm64: Pass PSCI calls to userspace
Documentation/virt/kvm/api.rst | 46 +++++++++++++++----
Documentation/virt/kvm/arm/psci.rst | 1 +
arch/arm64/include/asm/kvm_host.h | 10 +++-
include/kvm/arm_hypercalls.h | 1 +
include/kvm/arm_psci.h | 4 ++
include/uapi/linux/kvm.h | 3 ++
arch/arm64/kvm/arm.c | 71 +++++++++++++++++++++--------
arch/arm64/kvm/handle_exit.c | 3 +-
arch/arm64/kvm/hypercalls.c | 28 +++++++++++-
arch/arm64/kvm/psci.c | 69 ++++++++++++++--------------
10 files changed, 170 insertions(+), 66 deletions(-)
--
2.31.1
Prepare for WFI requests from userspace, by adding a suspend request and
moving the WFI execution into check_vcpu_requests(), next to the
power-off logic.
vcpu->arch.mp_state, previously only RUNNABLE or STOPPED, supports an
additional state HALTED and two new state transitions:
RUNNABLE -> HALTED from WFI or PSCI CPU_SUSPEND (same vCPU)
HALTED -> RUNNABLE vGIC IRQ, pending timer, signal
There shouldn't be any functional change with this patch, even though
the KVM_GET_MP_STATE ioctl could now in theory return
KVM_MP_STATE_HALTED, which would break some users' mp_state support. In
practice it should not happen because we do not return to userspace with
HALTED state. Both WFI and PSCI CPU_SUSPEND stay in the vCPU run loop
until the suspend request is consumed. It does feel fragile though,
maybe we should explicitly return RUNNABLE in KVM_GET_MP_STATE in place
of HALTED, to prevent future breakage.
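For illustration, that guard could look like this (just a sketch, not part of
this patch, assuming kvm_arch_vcpu_ioctl_get_mpstate() simply copies
vcpu->arch.mp_state after the previous patch):

int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
				    struct kvm_mp_state *mp_state)
{
	/* Hide the transient HALTED state from userspace */
	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
	else
		mp_state->mp_state = vcpu->arch.mp_state;

	return 0;
}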
Signed-off-by: Jean-Philippe Brucker <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/arm.c | 18 ++++++++++++++-
arch/arm64/kvm/handle_exit.c | 3 +--
arch/arm64/kvm/psci.c | 37 +++++++++++++------------------
4 files changed, 35 insertions(+), 25 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 55a04f4d5919..3ca732feb9a5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -46,6 +46,7 @@
#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
#define KVM_REQ_RECORD_STEAL KVM_ARCH_REQ(3)
#define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4)
+#define KVM_REQ_SUSPEND KVM_ARCH_REQ(5)
#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
KVM_DIRTY_LOG_INITIALLY_SET)
@@ -722,6 +723,7 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
bool kvm_arm_vcpu_is_off(struct kvm_vcpu *vcpu);
+void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu);
/* Guest/host FPSIMD coordination helpers */
int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index bcc24adb9c0a..d8cbaa0373c7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -447,6 +447,12 @@ bool kvm_arm_vcpu_is_off(struct kvm_vcpu *vcpu)
return vcpu->arch.mp_state == KVM_MP_STATE_STOPPED;
}
+void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+ kvm_make_request(KVM_REQ_SUSPEND, vcpu);
+}
+
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
@@ -667,6 +673,8 @@ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
static void check_vcpu_requests(struct kvm_vcpu *vcpu)
{
+ bool irq_pending;
+
if (kvm_request_pending(vcpu)) {
if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
vcpu_req_sleep(vcpu);
@@ -678,7 +686,7 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
* Clear IRQ_PENDING requests that were made to guarantee
* that a VCPU sees new virtual interrupts.
*/
- kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
+ irq_pending = kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
if (kvm_check_request(KVM_REQ_RECORD_STEAL, vcpu))
kvm_update_stolen_time(vcpu);
@@ -690,6 +698,14 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
vgic_v4_load(vcpu);
preempt_enable();
}
+
+ if (kvm_check_request(KVM_REQ_SUSPEND, vcpu)) {
+ if (!irq_pending) {
+ kvm_vcpu_block(vcpu);
+ kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+ }
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+ }
}
}
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 6f48336b1d86..9717df3104cf 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -95,8 +95,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
} else {
trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
vcpu->stat.wfi_exit_stat++;
- kvm_vcpu_block(vcpu);
- kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+ kvm_arm_vcpu_suspend(vcpu);
}
kvm_incr_pc(vcpu);
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 24b4a2265dbd..42a307ceb95f 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -31,27 +31,6 @@ static unsigned long psci_affinity_mask(unsigned long affinity_level)
return 0;
}
-static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
-{
- /*
- * NOTE: For simplicity, we make VCPU suspend emulation to be
- * same-as WFI (Wait-for-interrupt) emulation.
- *
- * This means for KVM the wakeup events are interrupts and
- * this is consistent with intended use of StateID as described
- * in section 5.4.1 of PSCI v0.2 specification (ARM DEN 0022A).
- *
- * Further, we also treat power-down request to be same as
- * stand-by request as-per section 5.4.2 clause 3 of PSCI v0.2
- * specification (ARM DEN 0022A). This means all suspend states
- * for KVM will preserve the register state.
- */
- kvm_vcpu_block(vcpu);
- kvm_clear_request(KVM_REQ_UNHALT, vcpu);
-
- return PSCI_RET_SUCCESS;
-}
-
static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
{
struct vcpu_reset_state *reset_state;
@@ -227,7 +206,21 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
break;
case PSCI_0_2_FN_CPU_SUSPEND:
case PSCI_0_2_FN64_CPU_SUSPEND:
- val = kvm_psci_vcpu_suspend(vcpu);
+ /*
+ * NOTE: For simplicity, we make VCPU suspend emulation to be
+ * same-as WFI (Wait-for-interrupt) emulation.
+ *
+ * This means for KVM the wakeup events are interrupts and this
+ * is consistent with intended use of StateID as described in
+ * section 5.4.1 of PSCI v0.2 specification (ARM DEN 0022A).
+ *
+ * Further, we also treat power-down request to be same as
+ * stand-by request as-per section 5.4.2 clause 3 of PSCI v0.2
+ * specification (ARM DEN 0022A). This means all suspend states
+ * for KVM will preserve the register state.
+ */
+ kvm_arm_vcpu_suspend(vcpu);
+ val = PSCI_RET_SUCCESS;
break;
case PSCI_0_2_FN_CPU_OFF:
kvm_arm_vcpu_power_off(vcpu);
--
2.31.1
Let userspace request to handle PSCI calls, by setting the new
KVM_CAP_ARM_PSCI_TO_USER capability.
SMCCC probe requires PSCI v1.x. If userspace only implements PSCI v0.2,
the guest won't query SMCCC support through PSCI and won't use the
spectre workarounds. We could hijack PSCI_VERSION and pretend to support
v1.0 if userspace does not, then handle all v1.0 calls ourselves
(including guessing the PSCI feature set implemented by the guest), but
that seems unnecessary. After all the API already allows userspace to
force a version lower than v1.0 using the firmware pseudo-registers.
The KVM_REG_ARM_PSCI_VERSION pseudo-register currently resets to either
v0.1 if userspace doesn't set KVM_ARM_VCPU_PSCI_0_2, or
KVM_ARM_PSCI_LATEST (1.0).
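For reference, the VMM-side setup would look roughly like this (a sketch; the
capability numbers are the ones allocated by this series and may change):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_psci_to_user(int vm_fd)
{
	struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_HVC_TO_USER };
	int ret;

	/* KVM_CAP_ARM_HVC_TO_USER must be enabled first */
	ret = ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	if (ret)
		return ret;

	cap.cap = KVM_CAP_ARM_PSCI_TO_USER;
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}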
Suggested-by: James Morse <[email protected]>
Signed-off-by: Jean-Philippe Brucker <[email protected]>
---
Documentation/virt/kvm/api.rst | 14 ++++++++++++++
Documentation/virt/kvm/arm/psci.rst | 1 +
arch/arm64/include/asm/kvm_host.h | 1 +
include/kvm/arm_hypercalls.h | 1 +
include/uapi/linux/kvm.h | 1 +
arch/arm64/kvm/arm.c | 10 +++++++---
arch/arm64/kvm/hypercalls.c | 2 +-
arch/arm64/kvm/psci.c | 13 +++++++++++++
8 files changed, 39 insertions(+), 4 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3d8c1661e7b2..f24eb70e575d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6907,3 +6907,17 @@ available to the guest on migration.
This capability indicates that KVM can pass unhandled hypercalls to userspace,
if the VMM enables it. Hypercalls are passed with KVM_EXIT_HYPERCALL in
kvm_run::hypercall.
+
+8.34 KVM_CAP_ARM_PSCI_TO_USER
+-----------------------------
+
+:Architectures: arm64
+
+When the VMM enables this capability, all PSCI calls are passed to userspace
+instead of being handled by KVM. Capability KVM_CAP_ARM_HVC_TO_USER must be
+enabled first.
+
+Userspace should support at least PSCI v1.0. Otherwise SMCCC features won't be
+available to the guest. Userspace does not need to handle the SMCCC_VERSION
+parameter for the PSCI_FEATURES function. The KVM_ARM_VCPU_PSCI_0_2 vCPU
+feature should be set even if this capability is enabled.
diff --git a/Documentation/virt/kvm/arm/psci.rst b/Documentation/virt/kvm/arm/psci.rst
index d52c2e83b5b8..110011d1fa3f 100644
--- a/Documentation/virt/kvm/arm/psci.rst
+++ b/Documentation/virt/kvm/arm/psci.rst
@@ -34,6 +34,7 @@ The following register is defined:
- Allows any PSCI version implemented by KVM and compatible with
v0.2 to be set with SET_ONE_REG
- Affects the whole VM (even if the register view is per-vcpu)
+ - Defaults to PSCI 1.0 if userspace enables KVM_CAP_ARM_PSCI_TO_USER.
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
Holds the state of the firmware support to mitigate CVE-2017-5715, as
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 25554ce97045..5d74b769c16d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -124,6 +124,7 @@ struct kvm_arch {
*/
bool return_nisv_io_abort_to_user;
bool hvc_to_user;
+ bool psci_to_user;
/*
* VM-wide PMU filter, implemented as a bitmap and big enough for
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 0e2509d27910..b66c6a000ef3 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -6,6 +6,7 @@
#include <asm/kvm_emulate.h>
+int kvm_hvc_user(struct kvm_vcpu *vcpu);
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index aa831986a399..2b8e55aa7e1e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1085,6 +1085,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_PTP_KVM 198
#define KVM_CAP_ARM_MP_HALTED 199
#define KVM_CAP_ARM_HVC_TO_USER 200
+#define KVM_CAP_ARM_PSCI_TO_USER 201
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 074197721e97..bc3e63b0b3ad 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -83,7 +83,7 @@ int kvm_arch_check_processor_compat(void *opaque)
int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
- int r;
+ int r = -EINVAL;
if (cap->flags)
return -EINVAL;
@@ -97,8 +97,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
r = 0;
kvm->arch.hvc_to_user = true;
break;
- default:
- r = -EINVAL;
+ case KVM_CAP_ARM_PSCI_TO_USER:
+ if (kvm->arch.hvc_to_user) {
+ r = 0;
+ kvm->arch.psci_to_user = true;
+ }
break;
}
@@ -213,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_PTP_KVM:
case KVM_CAP_ARM_MP_HALTED:
case KVM_CAP_ARM_HVC_TO_USER:
+ case KVM_CAP_ARM_PSCI_TO_USER:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index ccc2015eddf9..621d5a5b7e48 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -58,7 +58,7 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
val[3] = lower_32_bits(cycles);
}
-static int kvm_hvc_user(struct kvm_vcpu *vcpu)
+int kvm_hvc_user(struct kvm_vcpu *vcpu)
{
int i;
struct kvm_run *run = vcpu->run;
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 42a307ceb95f..7f44ee527966 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -353,6 +353,16 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
return 1;
}
+static bool kvm_psci_call_is_user(struct kvm_vcpu *vcpu)
+{
+ /* Handle the special case of SMCCC probe through PSCI */
+ if (smccc_get_function(vcpu) == PSCI_1_0_FN_PSCI_FEATURES &&
+ smccc_get_arg1(vcpu) == ARM_SMCCC_VERSION_FUNC_ID)
+ return false;
+
+ return vcpu->kvm->arch.psci_to_user;
+}
+
/**
* kvm_psci_call - handle PSCI call if r0 value is in range
* @vcpu: Pointer to the VCPU struct
@@ -369,6 +379,9 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
*/
int kvm_psci_call(struct kvm_vcpu *vcpu)
{
+ if (kvm_psci_call_is_user(vcpu))
+ return kvm_hvc_user(vcpu);
+
switch (kvm_psci_version(vcpu, vcpu->kvm)) {
case KVM_ARM_PSCI_1_0:
return kvm_psci_1_0_call(vcpu);
--
2.31.1
To help userspace implement PSCI CPU_SUSPEND, allow setting the "HALTED"
MP state to request a WFI before returning to the guest.
Userspace won't obtain a HALTED mp_state from a KVM_GET_MP_STATE call
unless it sets it itself. When set by KVM to handle WFI or
CPU_SUSPEND, it is consumed before returning to userspace.
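For example, a VMM could request the WFI from its CPU_SUSPEND handler like
this (a sketch; KVM_SET_MP_STATE and KVM_MP_STATE_HALTED are existing UAPI,
the blocking behaviour relies on KVM_CAP_ARM_MP_HALTED added here):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int request_wfi(int vcpu_fd)
{
	struct kvm_mp_state mp = { .mp_state = KVM_MP_STATE_HALTED };

	/* KVM blocks until a wakeup event on the next KVM_RUN */
	return ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
}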
Signed-off-by: Jean-Philippe Brucker <[email protected]>
---
Documentation/virt/kvm/api.rst | 15 +++++++++------
include/uapi/linux/kvm.h | 1 +
arch/arm64/kvm/arm.c | 11 ++++++++++-
3 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fcb2fd38f42..e4fe7fb60d5d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1416,8 +1416,8 @@ Possible values are:
which has not yet received an INIT signal [x86]
KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is
now ready for a SIPI [x86]
- KVM_MP_STATE_HALTED the vcpu has executed a HLT instruction and
- is waiting for an interrupt [x86]
+ KVM_MP_STATE_HALTED the vcpu has executed a HLT/WFI instruction
+ and is waiting for an interrupt [x86,arm64]
KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector
accessible via KVM_GET_VCPU_EVENTS) [x86]
KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64]
@@ -1435,8 +1435,9 @@ these architectures.
For arm/arm64:
^^^^^^^^^^^^^^
-The only states that are valid are KVM_MP_STATE_STOPPED and
-KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
+Valid states are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect
+if the vcpu is paused or not. If KVM_CAP_ARM_MP_HALTED is present, state
+KVM_MP_STATE_HALTED is also valid.
4.39 KVM_SET_MP_STATE
---------------------
@@ -1457,8 +1458,10 @@ these architectures.
For arm/arm64:
^^^^^^^^^^^^^^
-The only states that are valid are KVM_MP_STATE_STOPPED and
-KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
+Valid states are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect
+if the vcpu should be paused or not. If KVM_CAP_ARM_MP_HALTED is present,
+KVM_MP_STATE_HALTED can be set, to wait for interrupts targeted at the vcpu
+before running it.
4.40 KVM_SET_IDENTITY_MAP_ADDR
------------------------------
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 79d9c44d1ad7..06ba64c49737 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1083,6 +1083,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_SGX_ATTRIBUTE 196
#define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
#define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_ARM_MP_HALTED 199
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index d8cbaa0373c7..d6ad977fea5f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -207,6 +207,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
case KVM_CAP_PTP_KVM:
+ case KVM_CAP_ARM_MP_HALTED:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
@@ -469,6 +470,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
case KVM_MP_STATE_RUNNABLE:
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
break;
+ case KVM_MP_STATE_HALTED:
+ kvm_arm_vcpu_suspend(vcpu);
+ break;
case KVM_MP_STATE_STOPPED:
kvm_arm_vcpu_power_off(vcpu);
break;
@@ -699,7 +703,12 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
preempt_enable();
}
- if (kvm_check_request(KVM_REQ_SUSPEND, vcpu)) {
+ /*
+ * Check mp_state again in case userspace changed their mind
+ * after requesting suspend.
+ */
+ if (kvm_check_request(KVM_REQ_SUSPEND, vcpu) &&
+ vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
if (!irq_pending) {
kvm_vcpu_block(vcpu);
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
--
2.31.1
Hi Jean-Philippe,
On Tue, Jun 8, 2021 at 4:54 PM Jean-Philippe Brucker
<[email protected]> wrote:
>
> To help userspace implement PSCI CPU_SUSPEND, allow setting the "HALTED"
> MP state to request a WFI before returning to the guest.
>
> Userspace won't obtain a HALTED mp_state from a KVM_GET_MP_STATE call
> unless it sets it itself. When set by KVM to handle WFI or
> CPU_SUSPEND, it is consumed before returning to userspace.
>
> Signed-off-by: Jean-Philippe Brucker <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 15 +++++++++------
> include/uapi/linux/kvm.h | 1 +
> arch/arm64/kvm/arm.c | 11 ++++++++++-
> 3 files changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7fcb2fd38f42..e4fe7fb60d5d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1416,8 +1416,8 @@ Possible values are:
> which has not yet received an INIT signal [x86]
> KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is
> now ready for a SIPI [x86]
> - KVM_MP_STATE_HALTED the vcpu has executed a HLT instruction and
> - is waiting for an interrupt [x86]
> + KVM_MP_STATE_HALTED the vcpu has executed a HLT/WFI instruction
> + and is waiting for an interrupt [x86,arm64]
Considering that arm64 has a HLT instruction (for debugging), which is
very different from the x86 one, would it be good to clarify that in
the comment, e.g.,
"the vcpu has executed a HLT(x86)/WFI(arm64) instruction"?
Thanks,
/fuad
> KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector
> accessible via KVM_GET_VCPU_EVENTS) [x86]
> KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64]
> @@ -1435,8 +1435,9 @@ these architectures.
> For arm/arm64:
> ^^^^^^^^^^^^^^
>
> -The only states that are valid are KVM_MP_STATE_STOPPED and
> -KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
> +Valid states are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect
> +if the vcpu is paused or not. If KVM_CAP_ARM_MP_HALTED is present, state
> +KVM_MP_STATE_HALTED is also valid.
>
> 4.39 KVM_SET_MP_STATE
> ---------------------
> @@ -1457,8 +1458,10 @@ these architectures.
> For arm/arm64:
> ^^^^^^^^^^^^^^
>
> -The only states that are valid are KVM_MP_STATE_STOPPED and
> -KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
> +Valid states are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect
> +if the vcpu should be paused or not. If KVM_CAP_ARM_MP_HALTED is present,
> +KVM_MP_STATE_HALTED can be set, to wait for interrupts targeted at the vcpu
> +before running it.
>
> 4.40 KVM_SET_IDENTITY_MAP_ADDR
> ------------------------------
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 79d9c44d1ad7..06ba64c49737 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1083,6 +1083,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_SGX_ATTRIBUTE 196
> #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
> #define KVM_CAP_PTP_KVM 198
> +#define KVM_CAP_ARM_MP_HALTED 199
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index d8cbaa0373c7..d6ad977fea5f 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -207,6 +207,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_SET_GUEST_DEBUG:
> case KVM_CAP_VCPU_ATTRIBUTES:
> case KVM_CAP_PTP_KVM:
> + case KVM_CAP_ARM_MP_HALTED:
> r = 1;
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> @@ -469,6 +470,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> case KVM_MP_STATE_RUNNABLE:
> vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> break;
> + case KVM_MP_STATE_HALTED:
> + kvm_arm_vcpu_suspend(vcpu);
> + break;
> case KVM_MP_STATE_STOPPED:
> kvm_arm_vcpu_power_off(vcpu);
> break;
> @@ -699,7 +703,12 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
> preempt_enable();
> }
>
> - if (kvm_check_request(KVM_REQ_SUSPEND, vcpu)) {
> + /*
> + * Check mp_state again in case userspace changed their mind
> + * after requesting suspend.
> + */
> + if (kvm_check_request(KVM_REQ_SUSPEND, vcpu) &&
> + vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
> if (!irq_pending) {
> kvm_vcpu_block(vcpu);
> kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> --
> 2.31.1
>
Hi Jean-Philippe,
On Tue, Jun 8, 2021 at 4:54 PM Jean-Philippe Brucker
<[email protected]> wrote:
>
> Prepare for WFI requests from userspace, by adding a suspend request and
> moving the WFI execution into check_vcpu_requests(), next to the
> power-off logic.
>
> vcpu->arch.mp_state, previously only RUNNABLE or STOPPED, supports an
> additional state HALTED and two new state transitions:
>
> RUNNABLE -> HALTED from WFI or PSCI CPU_SUSPEND (same vCPU)
> HALTED -> RUNNABLE vGIC IRQ, pending timer, signal
>
> There shouldn't be any functional change with this patch, even though
> the KVM_GET_MP_STATE ioctl could now in theory return
> KVM_MP_STATE_HALTED, which would break some users' mp_state support. In
> practice it should not happen because we do not return to userspace with
> HALTED state. Both WFI and PSCI CPU_SUSPEND stay in the vCPU run loop
> until the suspend request is consumed. It does feel fragile though,
> maybe we should explicitly return RUNNABLE in KVM_GET_MP_STATE in place
> of HALTED, to prevent future breakage.
It's not really a functional change, but I think it might introduce
some timing/scheduling changes.
Before your changes, the kvm_vcpu_block() would take place at the end
of the vCPU run loop, via handle_exit(). Now it takes place closer to
the beginning, after cond_resched() is called, and not if there is a
KVM_REQ_IRQ_PENDING.
If my observation is correct, would it be good to mention that?
Thanks,
/fuad
>
> Signed-off-by: Jean-Philippe Brucker <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 2 ++
> arch/arm64/kvm/arm.c | 18 ++++++++++++++-
> arch/arm64/kvm/handle_exit.c | 3 +--
> arch/arm64/kvm/psci.c | 37 +++++++++++++------------------
> 4 files changed, 35 insertions(+), 25 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 55a04f4d5919..3ca732feb9a5 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -46,6 +46,7 @@
> #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
> #define KVM_REQ_RECORD_STEAL KVM_ARCH_REQ(3)
> #define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4)
> +#define KVM_REQ_SUSPEND KVM_ARCH_REQ(5)
>
> #define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
> KVM_DIRTY_LOG_INITIALLY_SET)
> @@ -722,6 +723,7 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
> struct kvm_device_attr *attr);
> void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
> bool kvm_arm_vcpu_is_off(struct kvm_vcpu *vcpu);
> +void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu);
>
> /* Guest/host FPSIMD coordination helpers */
> int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index bcc24adb9c0a..d8cbaa0373c7 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -447,6 +447,12 @@ bool kvm_arm_vcpu_is_off(struct kvm_vcpu *vcpu)
> return vcpu->arch.mp_state == KVM_MP_STATE_STOPPED;
> }
>
> +void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
> +{
> + vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> + kvm_make_request(KVM_REQ_SUSPEND, vcpu);
> +}
> +
> int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> struct kvm_mp_state *mp_state)
> {
> @@ -667,6 +673,8 @@ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
>
> static void check_vcpu_requests(struct kvm_vcpu *vcpu)
> {
> + bool irq_pending;
> +
> if (kvm_request_pending(vcpu)) {
> if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
> vcpu_req_sleep(vcpu);
> @@ -678,7 +686,7 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
> * Clear IRQ_PENDING requests that were made to guarantee
> * that a VCPU sees new virtual interrupts.
> */
> - kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
> + irq_pending = kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
>
> if (kvm_check_request(KVM_REQ_RECORD_STEAL, vcpu))
> kvm_update_stolen_time(vcpu);
> @@ -690,6 +698,14 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
> vgic_v4_load(vcpu);
> preempt_enable();
> }
> +
> + if (kvm_check_request(KVM_REQ_SUSPEND, vcpu)) {
> + if (!irq_pending) {
> + kvm_vcpu_block(vcpu);
> + kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> + }
> + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> + }
> }
> }
>
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 6f48336b1d86..9717df3104cf 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -95,8 +95,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
> } else {
> trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
> vcpu->stat.wfi_exit_stat++;
> - kvm_vcpu_block(vcpu);
> - kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> + kvm_arm_vcpu_suspend(vcpu);
> }
>
> kvm_incr_pc(vcpu);
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 24b4a2265dbd..42a307ceb95f 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -31,27 +31,6 @@ static unsigned long psci_affinity_mask(unsigned long affinity_level)
> return 0;
> }
>
> -static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
> -{
> - /*
> - * NOTE: For simplicity, we make VCPU suspend emulation to be
> - * same-as WFI (Wait-for-interrupt) emulation.
> - *
> - * This means for KVM the wakeup events are interrupts and
> - * this is consistent with intended use of StateID as described
> - * in section 5.4.1 of PSCI v0.2 specification (ARM DEN 0022A).
> - *
> - * Further, we also treat power-down request to be same as
> - * stand-by request as-per section 5.4.2 clause 3 of PSCI v0.2
> - * specification (ARM DEN 0022A). This means all suspend states
> - * for KVM will preserve the register state.
> - */
> - kvm_vcpu_block(vcpu);
> - kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> -
> - return PSCI_RET_SUCCESS;
> -}
> -
> static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> {
> struct vcpu_reset_state *reset_state;
> @@ -227,7 +206,21 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
> break;
> case PSCI_0_2_FN_CPU_SUSPEND:
> case PSCI_0_2_FN64_CPU_SUSPEND:
> - val = kvm_psci_vcpu_suspend(vcpu);
> + /*
> + * NOTE: For simplicity, we make VCPU suspend emulation to be
> + * same-as WFI (Wait-for-interrupt) emulation.
> + *
> + * This means for KVM the wakeup events are interrupts and this
> + * is consistent with intended use of StateID as described in
> + * section 5.4.1 of PSCI v0.2 specification (ARM DEN 0022A).
> + *
> + * Further, we also treat power-down request to be same as
> + * stand-by request as-per section 5.4.2 clause 3 of PSCI v0.2
> + * specification (ARM DEN 0022A). This means all suspend states
> + * for KVM will preserve the register state.
> + */
> + kvm_arm_vcpu_suspend(vcpu);
> + val = PSCI_RET_SUCCESS;
> break;
> case PSCI_0_2_FN_CPU_OFF:
> kvm_arm_vcpu_power_off(vcpu);
> --
> 2.31.1
>
Hi Jean-Philippe,
On Tue, Jun 8, 2021 at 4:54 PM Jean-Philippe Brucker
<[email protected]> wrote:
>
> Let userspace request to handle PSCI calls, by setting the new
> KVM_CAP_ARM_PSCI_TO_USER capability.
>
> SMCCC probe requires PSCI v1.x. If userspace only implements PSCI v0.2,
> the guest won't query SMCCC support through PSCI and won't use the
> spectre workarounds. We could hijack PSCI_VERSION and pretend to support
> v1.0 if userspace does not, then handle all v1.0 calls ourselves
> (including guessing the PSCI feature set implemented by the guest), but
> that seems unnecessary. After all the API already allows userspace to
> force a version lower than v1.0 using the firmware pseudo-registers.
>
> The KVM_REG_ARM_PSCI_VERSION pseudo-register currently resets to either
> v0.1 if userspace doesn't set KVM_ARM_VCPU_PSCI_0_2, or
> KVM_ARM_PSCI_LATEST (1.0).
>
> Suggested-by: James Morse <[email protected]>
> Signed-off-by: Jean-Philippe Brucker <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 14 ++++++++++++++
> Documentation/virt/kvm/arm/psci.rst | 1 +
> arch/arm64/include/asm/kvm_host.h | 1 +
> include/kvm/arm_hypercalls.h | 1 +
> include/uapi/linux/kvm.h | 1 +
> arch/arm64/kvm/arm.c | 10 +++++++---
> arch/arm64/kvm/hypercalls.c | 2 +-
> arch/arm64/kvm/psci.c | 13 +++++++++++++
> 8 files changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 3d8c1661e7b2..f24eb70e575d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6907,3 +6907,17 @@ available to the guest on migration.
> This capability indicates that KVM can pass unhandled hypercalls to userspace,
> if the VMM enables it. Hypercalls are passed with KVM_EXIT_HYPERCALL in
> kvm_run::hypercall.
> +
> +8.34 KVM_CAP_ARM_PSCI_TO_USER
> +-----------------------------
> +
> +:Architectures: arm64
> +
> +When the VMM enables this capability, all PSCI calls are passed to userspace
> +instead of being handled by KVM. Capability KVM_CAP_ARM_HVC_TO_USER must be
> +enabled first.
> +
> +Userspace should support at least PSCI v1.0. Otherwise SMCCC features won't be
> +available to the guest. Userspace does not need to handle the SMCCC_VERSION
> +parameter for the PSCI_FEATURES function. The KVM_ARM_VCPU_PSCI_0_2 vCPU
> +feature should be set even if this capability is enabled.
> diff --git a/Documentation/virt/kvm/arm/psci.rst b/Documentation/virt/kvm/arm/psci.rst
> index d52c2e83b5b8..110011d1fa3f 100644
> --- a/Documentation/virt/kvm/arm/psci.rst
> +++ b/Documentation/virt/kvm/arm/psci.rst
> @@ -34,6 +34,7 @@ The following register is defined:
> - Allows any PSCI version implemented by KVM and compatible with
> v0.2 to be set with SET_ONE_REG
> - Affects the whole VM (even if the register view is per-vcpu)
> + - Defaults to PSCI 1.0 if userspace enables KVM_CAP_ARM_PSCI_TO_USER.
Should this be "to PSCI 1.x", to match the comment and for future expansion?
>
> * KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
> Holds the state of the firmware support to mitigate CVE-2017-5715, as
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 25554ce97045..5d74b769c16d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -124,6 +124,7 @@ struct kvm_arch {
> */
> bool return_nisv_io_abort_to_user;
> bool hvc_to_user;
> + bool psci_to_user;
>
> /*
> * VM-wide PMU filter, implemented as a bitmap and big enough for
> diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> index 0e2509d27910..b66c6a000ef3 100644
> --- a/include/kvm/arm_hypercalls.h
> +++ b/include/kvm/arm_hypercalls.h
> @@ -6,6 +6,7 @@
>
> #include <asm/kvm_emulate.h>
>
> +int kvm_hvc_user(struct kvm_vcpu *vcpu);
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
>
> static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index aa831986a399..2b8e55aa7e1e 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1085,6 +1085,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_PTP_KVM 198
> #define KVM_CAP_ARM_MP_HALTED 199
> #define KVM_CAP_ARM_HVC_TO_USER 200
> +#define KVM_CAP_ARM_PSCI_TO_USER 201
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 074197721e97..bc3e63b0b3ad 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -83,7 +83,7 @@ int kvm_arch_check_processor_compat(void *opaque)
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> struct kvm_enable_cap *cap)
> {
> - int r;
> + int r = -EINVAL;
>
> if (cap->flags)
> return -EINVAL;
> @@ -97,8 +97,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> r = 0;
> kvm->arch.hvc_to_user = true;
> break;
> - default:
> - r = -EINVAL;
> + case KVM_CAP_ARM_PSCI_TO_USER:
> + if (kvm->arch.hvc_to_user) {
> + r = 0;
> + kvm->arch.psci_to_user = true;
> + }
Should this return -EINVAL if hvc_to_user isn't set, rather than
silently not setting psci_to_user, or should it be a parameter of
KVM_CAP_ARM_HVC_TO_USER rather than its own thing?
Thanks,
/fuad
> break;
> }
>
> @@ -213,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_PTP_KVM:
> case KVM_CAP_ARM_MP_HALTED:
> case KVM_CAP_ARM_HVC_TO_USER:
> + case KVM_CAP_ARM_PSCI_TO_USER:
> r = 1;
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index ccc2015eddf9..621d5a5b7e48 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -58,7 +58,7 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> val[3] = lower_32_bits(cycles);
> }
>
> -static int kvm_hvc_user(struct kvm_vcpu *vcpu)
> +int kvm_hvc_user(struct kvm_vcpu *vcpu)
> {
> int i;
> struct kvm_run *run = vcpu->run;
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 42a307ceb95f..7f44ee527966 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -353,6 +353,16 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> +static bool kvm_psci_call_is_user(struct kvm_vcpu *vcpu)
> +{
> + /* Handle the special case of SMCCC probe through PSCI */
> + if (smccc_get_function(vcpu) == PSCI_1_0_FN_PSCI_FEATURES &&
> + smccc_get_arg1(vcpu) == ARM_SMCCC_VERSION_FUNC_ID)
> + return false;
> +
> + return vcpu->kvm->arch.psci_to_user;
> +}
> +
> /**
> * kvm_psci_call - handle PSCI call if r0 value is in range
> * @vcpu: Pointer to the VCPU struct
> @@ -369,6 +379,9 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
> */
> int kvm_psci_call(struct kvm_vcpu *vcpu)
> {
> + if (kvm_psci_call_is_user(vcpu))
> + return kvm_hvc_user(vcpu);
> +
> switch (kvm_psci_version(vcpu, vcpu->kvm)) {
> case KVM_ARM_PSCI_1_0:
> return kvm_psci_1_0_call(vcpu);
> --
> 2.31.1
>
On Mon, Jul 19, 2021 at 11:02 AM Jean-Philippe Brucker
<[email protected]> wrote:
> We forward the whole PSCI function range, so it's either KVM or userspace.
> If KVM manages PSCI and the guest calls an unimplemented function, that
> returns directly to the guest without going to userspace.
>
> The concern is valid for any other range, though. If userspace enables the
> HVC cap it receives function calls that at some point KVM might need to
> handle itself. So we need some negotiation between user and KVM about the
> specific HVC ranges that userspace can and will handle.
Are we going to use KVM_CAPs for every interesting HVC range that
userspace may want to trap? I wonder if a more generic interface for
hypercall filtering would have merit to handle the aforementioned
cases, and whatever else a VMM will want to intercept down the line.
For example, x86 has the concept of 'MSR filtering', wherein userspace
can specify a set of registers that it wants to intercept. Doing
something similar for HVCs would avoid the need for a kernel change
each time a VMM wishes to intercept a new hypercall.
--
Thanks,
Oliver
Hi Jean-Philippe,
I'm not really familiar with this part of KVM, and I'm still trying to get my head
around how this works, so please bear with me if I ask silly questions.
This is how I understand this will work:
1. VMM opts in to forward HVC calls not handled by KVM.
2. VMM opts in to forward PSCI calls, other than
PSCI_1_0_FN_PSCI_FEATURES(ARM_SMCCC_VERSION_FUNC_ID).
3. The VMM emulates PSCI calls (and all the other HVC calls).
3.a For CPU_SUSPEND coming from VCPU A, userspace does a
KVM_SET_MP_STATE(KVM_MP_STATE_HALTED) ioctl on the VCPU fd which sets the request
KVM_REQ_SUSPEND.
3.b The next time the VCPU is run, KVM blocks the VCPU as a result of the
request. kvm_vcpu_block() does a schedule() in a loop until it decides that the
CPU must unblock.
3.c The VCPU will run as normal after kvm_vcpu_block() returns.
Please correct me if I got something wrong.
I have a few general questions. It doesn't mean there's something wrong with your
approach, I'm just trying to understand it better.
1. Why does forwarding PSCI calls to userspace depend on enabling forwarding for
other HVC calls? As I understand from the patches, those handle distinct function IDs.
2. HVC call forwarding to userspace also forwards PSCI functions which are defined
in ARM DEN 0022D, but not (yet) implemented by KVM. What happens if KVM's PSCI
implementation gets support for one of those functions? How does userspace know
that now it also needs to enable PSCI call forwarding to be able to handle that
function?
It looks to me like the boundary between the functions that are forwarded when HVC
call forwarding is enabled and the functions that are forwarded when PSCI call
forwarding is enabled is based on what Linux v5.13 handles. Have you considered
choosing this boundary based on something less arbitrary, like the function types
specified in ARM DEN 0028C, table 2-1?
In my opinion, setting the MP state to HALTED looks like a sensible approach to
implementing PSCI_SUSPEND. I'll take a closer look at the patches after I get a
better understanding about what is going on.
On 6/8/21 4:48 PM, Jean-Philippe Brucker wrote:
> Allow userspace to request handling PSCI calls from guests. Our goal is
> to enable a vCPU hot-add solution for Arm where the VMM presents
> possible resources to the guest at boot, and controls which vCPUs can be
> brought up by allowing or denying PSCI CPU_ON calls. Passing HVC and
> PSCI to userspace has been discussed on the list in the context of vCPU
> hot-add [1,2] but it can also be useful for implementing other SMCCC and
> vendor hypercalls [3,4,5].
>
> Patches 1-3 allow userspace to request WFI to be executed in KVM. That
I don't understand this. KVM, in kvm_vcpu_block(), does not execute a WFI.
PSCI_SUSPEND is documented as being indistinguishable from a WFI from the guest's
point of view, but its implementation is not architecturally defined.
Thanks,
Alex
> way the VMM can easily implement the PSCI CPU_SUSPEND function, which is
> mandatory from PSCI v0.2 onwards (even if it doesn't have a more useful
> implementation than WFI, natively available to the guest).
>
> Patch 4 lets userspace request any HVC that isn't handled by KVM, and
> patch 5 lets userspace request PSCI calls, disabling in-kernel PSCI
> handling.
>
> I'm focusing on the PSCI bits, but a complete prototype of vCPU hot-add
> for arm64 on Linux and QEMU, most of it from Salil and James, is
> available at [6].
>
> [1] https://lore.kernel.org/kvmarm/[email protected]/
> [2] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> (Followed by KVM forum and Linaro Open discussions)
> [3] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> [4] https://lore.kernel.org/kvm/[email protected]/
> [5] https://lore.kernel.org/kvm/[email protected]/
> [6] https://jpbrucker.net/git/linux/log/?h=cpuhp/devel
> https://jpbrucker.net/git/qemu/log/?h=cpuhp/devel
>
> Jean-Philippe Brucker (5):
> KVM: arm64: Replace power_off with mp_state in struct kvm_vcpu_arch
> KVM: arm64: Move WFI execution to check_vcpu_requests()
> KVM: arm64: Allow userspace to request WFI
> KVM: arm64: Pass hypercalls to userspace
> KVM: arm64: Pass PSCI calls to userspace
>
> Documentation/virt/kvm/api.rst | 46 +++++++++++++++----
> Documentation/virt/kvm/arm/psci.rst | 1 +
> arch/arm64/include/asm/kvm_host.h | 10 +++-
> include/kvm/arm_hypercalls.h | 1 +
> include/kvm/arm_psci.h | 4 ++
> include/uapi/linux/kvm.h | 3 ++
> arch/arm64/kvm/arm.c | 71 +++++++++++++++++++++--------
> arch/arm64/kvm/handle_exit.c | 3 +-
> arch/arm64/kvm/hypercalls.c | 28 +++++++++++-
> arch/arm64/kvm/psci.c | 69 ++++++++++++++--------------
> 10 files changed, 170 insertions(+), 66 deletions(-)
>
Hi Alex,
I'm not planning to resend this work at the moment, because it looks like
vcpu hot-add will go a different way so I don't have a user. But I'll
probably address the feedback so far and park it on some branch, in case
anyone else needs it.
On Mon, Jul 19, 2021 at 04:29:18PM +0100, Alexandru Elisei wrote:
> 1. Why does forwarding PSCI calls to userspace depend on enabling forwarding for
> other HVC calls? As I understand from the patches, those handle distinct function IDs.
The HVC cap from patch 4 enables returning from the KVM_RUN ioctl with
KVM_EXIT_HYPERCALL, for any HVC not handled by KVM. This one should
definitely be improved, either by letting userspace choose the ranges of
HVC it wants, or at least by reporting ranges reserved by KVM to
userspace.
The PSCI cap from patch 5 disables the in-kernel PSCI implementation. As a
result those HVCs are forwarded to userspace.
It was suggested that other users will want to handle HVC calls (SDEI for
example [1]), hence splitting into two capabilities rather than just the
PSCI cap. In v5.14 x86 added KVM_CAP_EXIT_HYPERCALL [2], which lets
userspace receive specific hypercalls. We could reuse that and have PSCI
be one bit of that capability's parameter.
[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
[2] https://lore.kernel.org/kvm/90778988e1ee01926ff9cac447aacb745f954c8c.1623174621.git.ashish.kalra@amd.com/
> 2. HVC call forwarding to userspace also forwards PSCI functions which are defined
> in ARM DEN 0022D, but not (yet) implemented by KVM. What happens if KVM's PSCI
> implementation gets support for one of those functions? How does userspace know
> that now it also needs to enable PSCI call forwarding to be able to handle that
> function?
We forward the whole PSCI function range, so it's either KVM or userspace.
If KVM manages PSCI and the guest calls an unimplemented function, that
returns directly to the guest without going to userspace.
The concern is valid for any other range, though. If userspace enables the
HVC cap it receives function calls that at some point KVM might need to
handle itself. So we need some negotiation between user and KVM about the
specific HVC ranges that userspace can and will handle.
> It looks to me like the boundary between the functions that are forwarded when HVC
> call forwarding is enabled and the functions that are forwarded when PSCI call
> forwarding is enabled is based on what Linux v5.13 handles. Have you considered
> choosing this boundary based on something less arbitrary, like the function types
> specified in ARM DEN 0028C, table 2-1?
For PSCI I've used the range 0-0x1f as the boundary, which is reserved for
PSCI by SMCCC (table 6-4 in that document).
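Concretely the check is something like this (a sketch; the ranges are the
standard PSCI allocations from SMCCC):

#include <stdbool.h>
#include <stdint.h>

/* Standard PSCI functions: calls 0-0x1f of the standard secure service range */
static bool is_std_psci_fn(uint32_t fn)
{
	uint32_t base = fn & ~0x1fu;

	return base == 0x84000000u ||	/* SMC32 PSCI functions */
	       base == 0xc4000000u;	/* SMC64 PSCI functions */
}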
>
> In my opinion, setting the MP state to HALTED looks like a sensible approach to
> implementing PSCI_SUSPEND. I'll take a closer look at the patches after I get a
> better understanding about what is going on.
>
> On 6/8/21 4:48 PM, Jean-Philippe Brucker wrote:
> > Allow userspace to request handling PSCI calls from guests. Our goal is
> > to enable a vCPU hot-add solution for Arm where the VMM presents
> > possible resources to the guest at boot, and controls which vCPUs can be
> > brought up by allowing or denying PSCI CPU_ON calls. Passing HVC and
> > PSCI to userspace has been discussed on the list in the context of vCPU
> > hot-add [1,2] but it can also be useful for implementing other SMCCC and
> > vendor hypercalls [3,4,5].
> >
> > Patches 1-3 allow userspace to request WFI to be executed in KVM. That
>
> I don't understand this. KVM, in kvm_vcpu_block(), does not execute a WFI.
> PSCI_SUSPEND is documented as being indistinguishable from a WFI from the guest's
> point of view, but its implementation is not architecturally defined.
Yes, that was an oversimplification on my part.
Thanks,
Jean
On Mon, Jul 19, 2021 at 12:37:52PM -0700, Oliver Upton wrote:
> On Mon, Jul 19, 2021 at 11:02 AM Jean-Philippe Brucker
> <[email protected]> wrote:
> > We forward the whole PSCI function range, so it's either KVM or userspace.
> > If KVM manages PSCI and the guest calls an unimplemented function, that
> > returns directly to the guest without going to userspace.
> >
> > The concern is valid for any other range, though. If userspace enables the
> > HVC cap it receives function calls that at some point KVM might need to
> > handle itself. So we need some negotiation between user and KVM about the
> > specific HVC ranges that userspace can and will handle.
>
> Are we going to use KVM_CAPs for every interesting HVC range that
> userspace may want to trap? I wonder if a more generic interface for
> hypercall filtering would have merit to handle the aforementioned
> cases, and whatever else a VMM will want to intercept down the line.
>
> For example, x86 has the concept of 'MSR filtering', wherein userspace
> can specify a set of registers that it wants to intercept. Doing
> something similar for HVCs would avoid the need for a kernel change
> each time a VMM wishes to intercept a new hypercall.
Yes we could introduce a VM device group for this:
* User reads attribute KVM_ARM_VM_HVC_NR_SLOTS, which defines the number
of available HVC ranges.
* User writes attribute KVM_ARM_VM_HVC_SET_RANGE with one range
struct kvm_arm_hvc_range {
	__u32	slot;
#define KVM_ARM_HVC_USER	(1 << 0) /* Enable range. 0 disables it */
	__u16	flags;
	__u16	imm;
	__u32	fn_start;
	__u32	fn_end;
};
* KVM forwards any HVC within this range to userspace.
* If one of the ranges covers the PSCI functions, disable in-kernel PSCI handling.
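For completeness, programming one such range from the VMM might look like the
following. Everything except struct kvm_device_attr and KVM_SET_DEVICE_ATTR is
part of the proposal above, not existing UAPI, and the group name is made up:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int forward_psci_range(int vm_fd)
{
	/* struct kvm_arm_hvc_range as proposed above */
	struct kvm_arm_hvc_range range = {
		.slot     = 0,
		.flags    = KVM_ARM_HVC_USER,
		.fn_start = 0x84000000,	/* SMC32 PSCI range */
		.fn_end   = 0x8400001f,	/* a second slot would cover SMC64 */
	};
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VM_HVC_CTRL,		/* hypothetical */
		.attr  = KVM_ARM_VM_HVC_SET_RANGE,
		.addr  = (uint64_t)(uintptr_t)&range,
	};

	return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
}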
Since it's more work for KVM to keep track of ranges, I didn't include it
in the RFC, and I'm going to leave it to the next person dealing with this
stuff :)
Thanks,
Jean
On Tue, Jun 08, 2021 at 05:48:01PM +0200, Jean-Philippe Brucker wrote:
> Allow userspace to request handling PSCI calls from guests. Our goal is
> to enable a vCPU hot-add solution for Arm where the VMM presents
> possible resources to the guest at boot, and controls which vCPUs can be
> brought up by allowing or denying PSCI CPU_ON calls.
Since it looks like vCPU hot-add will be implemented differently, I don't
intend to resend this series at the moment. But some of it could be
useful for other projects and to avoid the helpful review effort going to
waste, I fixed it up and will leave it on branch
https://jpbrucker.net/git/linux/log/?h=kvm/psci-to-userspace
It now only uses KVM_CAP_EXIT_HYPERCALL introduced in v5.14.
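For anyone picking it up, enabling that looks roughly like this (the PSCI bit
below is a placeholder for whatever the branch defines, not existing UAPI;
KVM_CAP_EXIT_HYPERCALL itself takes a bitmask of hypercalls to forward):

#include <sys/ioctl.h>
#include <linux/kvm.h>

#define HYPOTHETICAL_HC_PSCI_BIT	0	/* placeholder bit number */

static int enable_psci_exits(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_EXIT_HYPERCALL,
		.args = { 1ULL << HYPOTHETICAL_HC_PSCI_BIT },
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}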
Thanks,
Jean