2023-04-26 08:51:47

by Anselm Busse

Subject: [PATCH] KVM: x86: Add a vCPU stat for #AC exceptions

This patch adds a KVM vCPU stat that reflects the number of #AC
exceptions caused by a guest. This improves the identification and
debugging of issues that are possibly caused by guests triggering
split locks and provides more insight than the current situation of having
only a warning printed when an #AC exception is raised.

Signed-off-by: Anselm Busse <[email protected]>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx/vmx.c          | 2 ++
 arch/x86/kvm/x86.c              | 1 +
 3 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 808c292ad3f4..b4ab719fbc69 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1523,6 +1523,7 @@ struct kvm_vcpu_stat {
 	u64 preemption_other;
 	u64 guest_mode;
 	u64 notify_window_exits;
+	u64 split_lock_exceptions;
 };

 struct x86_instruction_info;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d2d6e1b6c788..8f48fd8ddead 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5309,6 +5309,8 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		kvm_run->debug.arch.exception = ex_no;
 		break;
 	case AC_VECTOR:
+		vmx->vcpu.stat.split_lock_exceptions++;
+
 		if (vmx_guest_inject_ac(vcpu)) {
 			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
 			return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d852ce84920..416a1ed6c423 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -297,6 +297,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	STATS_DESC_COUNTER(VCPU, preemption_other),
 	STATS_DESC_IBOOLEAN(VCPU, guest_mode),
 	STATS_DESC_COUNTER(VCPU, notify_window_exits),
+	STATS_DESC_COUNTER(VCPU, split_lock_exceptions),
 };

 const struct kvm_stats_header kvm_vcpu_stats_header = {
--
2.39.2




2023-04-26 15:25:57

by Xiaoyao Li

Subject: Re: [PATCH] KVM: x86: Add a vCPU stat for #AC exceptions

On 4/26/2023 4:26 PM, Anselm Busse wrote:
> This patch adds a KVM vCPU stat that reflects the number of #AC
> exceptions caused by a guest. This improves the identification and
> debugging of issues that are possibly caused by guests triggering
> split locks and provides more insight than the current situation of having
> only a warning printed when an #AC exception is raised.

Note that on Intel platforms, an #AC exception has three sources according to
the latest spec:

1. an alignment-check violation at CPL = 3 while CR0.AM and EFLAGS.AC
are set;

2. a split lock, when MSR_MEMORY_CTRL[29] is set;

3. a UC lock, when CPUID.0x7_0x2:EDX[16] is 1 and
MSR_MEMORY_CTRL(0x33)[28] is 1 (see ISE version 048).

You cannot treat every #AC as a split-lock #AC.
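
To illustrate why the counter name overstates what is measured, here is a
minimal C sketch that separates source 1 from sources 2 and 3 using only the
guest state listed above. The enum, the function name classify_ac(), and the
macro names are hypothetical and exist only for this example. The sketch
assumes the host has split-lock or UC-lock detection enabled, since otherwise
only source 1 can occur. It cannot tell a split lock from a UC lock, and any
#AC raised while alignment checking is active is attributed to source 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions: CR0.AM and EFLAGS.AC are both bit 18. */
#define CR0_AM    (UINT64_C(1) << 18)
#define EFLAGS_AC (UINT64_C(1) << 18)

/* Hypothetical labels for this sketch; KVM defines no such enum. */
enum ac_source {
	AC_SRC_ALIGNMENT_CHECK,	/* source 1: legacy alignment check */
	AC_SRC_LOCK_DETECTION,	/* source 2 or 3: split lock or UC lock */
};

/*
 * Classify an #AC from the guest state at the time of the fault.
 * Alignment checking is only active at CPL 3 with both CR0.AM and
 * EFLAGS.AC set; any other #AC must come from lock detection.
 */
static enum ac_source classify_ac(unsigned int cpl, uint64_t cr0,
				  uint64_t rflags)
{
	assert(cpl <= 3);

	if (cpl == 3 && (cr0 & CR0_AM) && (rflags & EFLAGS_AC))
		return AC_SRC_ALIGNMENT_CHECK;

	return AC_SRC_LOCK_DETECTION;
}
```

A stat that is incremented unconditionally under "case AC_VECTOR:" counts both
enum values, which is why a name such as split_lock_exceptions is inaccurate.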


2023-04-26 16:58:38

by Sean Christopherson

Subject: Re: [PATCH] KVM: x86: Add a vCPU stat for #AC exceptions

On Wed, Apr 26, 2023, Anselm Busse wrote:
> This patch adds a KVM vCPU stat that reflects the number of #AC
> exceptions caused by a guest. This improves the identification and
> debugging of issues that are possibly caused by guests triggering
> split locks and provides more insight than the current situation of having
> only a warning printed when an #AC exception is raised.

Irrespective of the inaccuracy Xiaoyao pointed out, I don't want to add a one-off
stat for _any_ exception. I agree with what Marc said[*] when we (Google / GCP)
tried to push our pile o' stats upstream:

: Because I'm pretty sure that whatever stat we expose, every cloud
: vendor will want their own variant, so we may just as well put the
: matter in their own hands.

That doesn't mean I don't want a massive pile of stats about all things KVM, quite
the opposite, but I don't think they belong in upstream where KVM has to maintain
them in perpetuity. E.g. at some point in the (distant) future, split-lock #AC will
be completely uninteresting because all software will have been updated/fixed.

FWIW, we looked at using eBPF for our out-of-tree stats and ultimately decided that
carrying patches to add our stats would be significantly easier to maintain than an
eBPF-based approach, e.g. rebasing this patch is trivial. But the challenges we
anticipated with switching to eBPF were largely specific to running at scale. eBPF
is a very viable approach for gathering information for debug, development,
individual users, etc.

One idea I had for easing the pain of out-of-tree stats was to clean up KVM x86's
tracepoints, e.g. to give eBPF programs more stable and useful hooks, but also to
allow CSPs like us to play macro games to "inject" stats at key points, e.g. add
infrastructure to #define overload tracepoints to make KVM trampoline through
out-of-tree stats code. But we haven't pursued that idea because (a) as above,
carrying patches for out-of-tree stats requires minimal effort and (b) it wouldn't
eliminate "invasive" code because we'd (GCP) inevitably want stats in places where
a KVM tracepoint makes no sense.
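
The "#define overload" idea can be pictured with a small, self-contained C
sketch. Everything in it is hypothetical: TRACE_HOOK_AC, struct oot_vcpu_stats,
oot_count_ac() and handle_ac() are illustrative names, not KVM interfaces, and
real tracepoints are considerably more involved. The point is only that a call
site placed once in shared code can default to a no-op and be redirected to
out-of-tree counting code by a macro definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical out-of-tree counters; not part of upstream KVM. */
struct oot_vcpu_stats {
	uint64_t ac_exceptions;
};

/* Out-of-tree stats code that the hook trampolines into. */
static inline void oot_count_ac(struct oot_vcpu_stats *stats)
{
	assert(stats != NULL);
	stats->ac_exceptions++;
}

/*
 * An out-of-tree build defines the hook before the shared code is
 * compiled. Dropping this definition leaves the no-op default below.
 */
#define TRACE_HOOK_AC(stats) oot_count_ac(stats)

/* Default for an unmodified tree: the hook compiles away. */
#ifndef TRACE_HOOK_AC
#define TRACE_HOOK_AC(stats) do { (void)(stats); } while (0)
#endif

/* Stand-in for a handler with a tracepoint-like call site. */
static int handle_ac(struct oot_vcpu_stats *stats)
{
	TRACE_HOOK_AC(stats);
	return 1;
}
```

With the override in place, each call to handle_ac() bumps ac_exceptions;
without it, the shared code carries no stats logic at all. As noted above,
this does not help where no tracepoint-like call site exists.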

So as much as I advocate for pushing code upstream, this is one of the few areas
where I think it's better to carry code out-of-tree.

[*] https://lore.kernel.org/all/[email protected]