2024-01-29 21:39:43

by Colton Lewis

[permalink] [raw]
Subject: [PATCH] KVM: arm64: Add capability for unconditional WFx passthrough

Add KVM_CAP_ARM_WFX_PASSTHROUGH capability to always allow WFE/WFI
instructions to run without trapping. Current behavior is to only
allow this if the vcpu is the only task running. This commit keeps the
old behavior when the capability is not set.

This allows userspace to set deterministic behavior and increase
efficiency for platforms with direct interrupt injection support.

The implementation adds a new flag
KVM_ARCH_FLAG_WFX_PASSTHROUGH_ENABLED to kvm.arch.flags.

Signed-off-by: Colton Lewis <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/arm.c | 7 ++++++-
include/uapi/linux/kvm.h | 1 +
3 files changed, 9 insertions(+), 1 deletion(-)

This patch is based on v6.8-rc1
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 21c57b812569..e0d5ec2983fa 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -274,6 +274,8 @@ struct kvm_arch {
#define KVM_ARCH_FLAG_TIMER_PPIS_IMMUTABLE 6
/* Initial ID reg values loaded */
#define KVM_ARCH_FLAG_ID_REGS_INITIALIZED 7
+ /* Never trap WFE/WFI instructions */
+#define KVM_ARCH_FLAG_WFX_PASSTHROUGH_ENABLED 8
unsigned long flags;

/* VM-wide vCPU feature set */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a25265aca432..6d993991bd7a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -116,6 +116,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->slots_lock);
break;
+ case KVM_CAP_ARM_WFX_PASSTHROUGH:
+ r = 0;
+ set_bit(KVM_ARCH_FLAG_WFX_PASSTHROUGH_ENABLED, &kvm->arch.flags);
+ break;
default:
r = -EINVAL;
break;
@@ -456,7 +460,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);

- if (single_task_running())
+ if (single_task_running() ||
+ test_bit(KVM_ARCH_FLAG_WFX_PASSTHROUGH_ENABLED, &vcpu->kvm->arch.flags))
vcpu_clear_wfx_traps(vcpu);
else
vcpu_set_wfx_traps(vcpu);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c3308536482b..7635b5cd2b3b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1155,6 +1155,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_ARM_WFX_PASSTHROUGH 236

#ifdef KVM_CAP_IRQ_ROUTING

--
2.43.0.429.g432eaa2c6b-goog


2024-01-29 23:17:27

by Oliver Upton

[permalink] [raw]
Subject: Re: [PATCH] KVM: arm64: Add capability for unconditional WFx passthrough

Hi Colton,

On Mon, Jan 29, 2024 at 09:39:17PM +0000, Colton Lewis wrote:
> Add KVM_CAP_ARM_WFX_PASSTHROUGH capability to always allow WFE/WFI
> instructions to run without trapping. Current behavior is to only
> allow this if the vcpu is the only task running. This commit keeps the
> old behavior when the capability is not set.
>
> This allows userspace to set deterministic behavior and increase
> efficiency for platforms with direct interrupt injection support.

Marc and I actually had an offlist conversation (shame on us!) about
this very topic since there are users asking for the _opposite_ of this
patch (unconditionally trap) [*].

I had originally wanted something like this, but Marc made the very good
point that (1) the behavior of WFx traps is in no way user-visible and
(2) it is entirely an IMP DEF behavior. The architecture only requires
the traps be effective if the instruction does not complete in finite
time.

We need to think of an interface that doesn't depend on
implementation-specific behavior, such as a control based on runqueue
depth.

[*] https://lore.kernel.org/kvmarm/[email protected]/

--
Thanks,
Oliver

2024-01-31 18:38:24

by Colton Lewis

[permalink] [raw]
Subject: Re: [PATCH] KVM: arm64: Add capability for unconditional WFx passthrough

Oliver Upton <[email protected]> writes:

> Hi Colton,

> On Mon, Jan 29, 2024 at 09:39:17PM +0000, Colton Lewis wrote:
>> Add KVM_CAP_ARM_WFX_PASSTHROUGH capability to always allow WFE/WFI
>> instructions to run without trapping. Current behavior is to only
>> allow this if the vcpu is the only task running. This commit keeps the
>> old behavior when the capability is not set.

>> This allows userspace to set deterministic behavior and increase
>> efficiency for platforms with direct interrupt injection support.

> Marc and I actually had an offlist conversation (shame on us!) about
> this very topic since there are users asking for the _opposite_ of this
> patch (unconditionally trap) [*].

> I had originally wanted something like this, but Marc made the very good
> point that (1) the behavior of WFx traps is in no way user-visible and
> (2) it is entirely an IMP DEF behavior. The architecture only requires
> the traps be effective if the instruction does not complete in finite
> time.

> We need to think of an interface that doesn't depend on
> implementation-specific behavior, such as a control based on runqueue
> depth.

Good to know. I'll be thinking about that.


> [*]
> https://lore.kernel.org/kvmarm/[email protected]/

> --
> Thanks,
> Oliver

2024-02-19 09:39:53

by sundongxu (A)

[permalink] [raw]
Subject: Re: [PATCH] KVM: arm64: Add capability for unconditional WFx passthrough

Hi Marc, Oliver,

On 2024/1/30 7:17, Oliver Upton wrote:
> Hi Colton,
>
> On Mon, Jan 29, 2024 at 09:39:17PM +0000, Colton Lewis wrote:
>> Add KVM_CAP_ARM_WFX_PASSTHROUGH capability to always allow WFE/WFI
>> instructions to run without trapping. Current behavior is to only
>> allow this if the vcpu is the only task running. This commit keeps the
>> old behavior when the capability is not set.
>>
>> This allows userspace to set deterministic behavior and increase
>> efficiency for platforms with direct interrupt injection support.
>
> Marc and I actually had an offlist conversation (shame on us!) about
> this very topic since there are users asking for the _opposite_ of this
> patch (unconditionally trap) [*].
>
> I had originally wanted something like this, but Marc made the very good
> point that (1) the behavior of WFx traps is in no way user-visible and
> (2) it is entirely an IMP DEF behavior. The architecture only requires
> the traps be effective if the instruction does not complete in finite
> time.
>
> We need to think of an interface that doesn't depend on
> implementation-specific behavior, such as a control based on runqueue
> depth.

If I understand correctly, this run queue belongs to the scheduler, right?
And I will be appreciated if you can share any more detail information
about this.

>
> [*] https://lore.kernel.org/kvmarm/[email protected]/
>

Thanks,
Dongxu

2024-03-12 21:28:56

by Colton Lewis

[permalink] [raw]
Subject: Re: [PATCH] KVM: arm64: Add capability for unconditional WFx passthrough

Oliver Upton <[email protected]> writes:

> Hi Colton,

> On Mon, Jan 29, 2024 at 09:39:17PM +0000, Colton Lewis wrote:
>> Add KVM_CAP_ARM_WFX_PASSTHROUGH capability to always allow WFE/WFI
>> instructions to run without trapping. Current behavior is to only
>> allow this if the vcpu is the only task running. This commit keeps the
>> old behavior when the capability is not set.

>> This allows userspace to set deterministic behavior and increase
>> efficiency for platforms with direct interrupt injection support.

> Marc and I actually had an offlist conversation (shame on us!) about
> this very topic since there are users asking for the _opposite_ of this
> patch (unconditionally trap) [*].

> I had originally wanted something like this, but Marc made the very good
> point that (1) the behavior of WFx traps is in no way user-visible and
> (2) it is entirely an IMP DEF behavior. The architecture only requires
> the traps be effective if the instruction does not complete in finite
> time.

> We need to think of an interface that doesn't depend on
> implementation-specific behavior, such as a control based on runqueue
> depth.

Here's the first thing I came up with after returning to this problem:

We have an ioctl to get/set a threshold value,
wfx_traps_runqueue_depth. If the depth is less than or equal to the
threshold, disable WFx traps. If greater than, enable WFx traps.

Current behavior occurs with a setting of 1. Always trap occurs with a
setting of 0. Never trap occurs with any large enough number.

Of course, having an integer may be more flexible than needed. I can't
imagine a practical use for a number between 1 and UINT_MAX, in which
case it would be better as an enum for different behaviors than a
integer threshold.

What do you think?

Also, do you mean runqueue depth for the current CPU or globally?