2024-02-15 16:19:16

by Dongli Zhang

Subject: Re: [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv

Hi Alejandro,

Is there a concrete use case for this counter, e.g. a bug it would help debug?

E.g., there is already a trace_kvm_apicv_accept_irq() tracepoint. Wouldn't ftrace or
eBPF already be able to tell whether hardware-accelerated interrupt delivery is active?

Are there any extra benefits? E.g., would this counter need to match some other
counter in KVM or the guest so that a bug can be detected? That would be very helpful.

Thank you very much!

Dongli Zhang

On 2/15/24 08:01, Alejandro Jimenez wrote:
> Export binary stat counting how many interrupts have been delivered via
> APICv/AVIC acceleration from the host. This is one of the most reliable
> methods to detect when hardware accelerated interrupt delivery is active,
> since APIC timer interrupts are regularly injected and exercise these
> code paths.
>
> Signed-off-by: Alejandro Jimenez <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm/svm.c | 3 +++
> arch/x86/kvm/vmx/vmx.c | 2 ++
> arch/x86/kvm/x86.c | 1 +
> 4 files changed, 7 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9b960a523715..b6f18084d504 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1564,6 +1564,7 @@ struct kvm_vcpu_stat {
> u64 preemption_other;
> u64 guest_mode;
> u64 notify_window_exits;
> + u64 apicv_accept_irq;
> };
>
> struct x86_instruction_info;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e90b429c84f1..2243af08ed39 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3648,6 +3648,9 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
> }
>
> trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector);
> +
> + ++vcpu->stat.apicv_accept_irq;
> +
> if (in_guest_mode) {
> /*
> * Signal the doorbell to tell hardware to inject the IRQ. If
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d4e6625e0a9a..f7db75ae2c55 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4275,6 +4275,8 @@ static void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
> } else {
> trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
> trig_mode, vector);
> +
> + ++vcpu->stat.apicv_accept_irq;
> }
> }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f7f598f066e7..2ad70cf6e52c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -304,6 +304,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
> STATS_DESC_COUNTER(VCPU, preemption_other),
> STATS_DESC_IBOOLEAN(VCPU, guest_mode),
> STATS_DESC_COUNTER(VCPU, notify_window_exits),
> + STATS_DESC_COUNTER(VCPU, apicv_accept_irq),
> };
>
> const struct kvm_stats_header kvm_vcpu_stats_header = {


2024-02-15 18:12:52

by Alejandro Jimenez

Subject: Re: [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv

Hi Dongli

On 2/15/24 11:16, Dongli Zhang wrote:
> Hi Alejandro,
>
> Is there a concrete use case for this counter, e.g. a bug it would help debug?

I don't have a specific bug in mind that this is trying to address. This patch is just an example to show how existing data points (i.e. the trace_kvm_apicv_accept_irq tracepoint) can also be exposed via the stats framework with minimal overhead, and to support the point in the cover letter that querying the binary stats could be the best choice for a "single source" that tells us the full status of APICv/AVIC (i.e. are SVM and IOMMU AVIC both working, are there any inhibits set, etc.).

>
> E.g., there is already a trace_kvm_apicv_accept_irq() tracepoint. Wouldn't ftrace or
> eBPF already be able to tell whether hardware-accelerated interrupt delivery is active?

Yes, the tracepoint already provides that information, if you know it exists AND have sufficient privileges to use tracefs or eBPF. The purpose of the RFC is to agree on a mechanism for exposing all the APICv-relevant data (and any new additions) via a single interface, so that the sources of information are not scattered across tracepoints, debugfs entries, or data structures that need to be read via BPF.

My understanding is that the stats subsystem can work when using ftrace or bpftrace is not possible, which is why I am suggesting that it be used as the "standard" method to expose this info.
There will of course be some duplication with existing tracepoints, but there is already precedent in KVM for updating both a stat and a tracepoint at the same site (e.g. mmu_{un}sync_page(), {svm|vmx}_inject_irq()).
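
To illustrate what consuming such a stat from userspace might look like, here is a minimal sketch that looks up a counter by name in a buffer holding the contents of a binary stats file. It assumes the header/descriptor/data layout described for KVM_GET_STATS_FD in the KVM API documentation; the two structs are local stand-ins that are only assumed to mirror the fixed parts of struct kvm_stats_header and struct kvm_stats_desc, and find_stat() is a hypothetical helper, not an existing API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in, assumed to mirror struct kvm_stats_header. */
struct stats_header {
	uint32_t flags;
	uint32_t name_size;   /* bytes reserved for each descriptor name */
	uint32_t num_desc;    /* number of descriptors */
	uint32_t id_offset;
	uint32_t desc_offset; /* start of the descriptor array */
	uint32_t data_offset; /* start of the stat values */
};

/*
 * Local stand-in for the fixed part of struct kvm_stats_desc. In the
 * blob, each descriptor is followed by name_size bytes of name.
 */
struct stats_desc {
	uint32_t flags;
	int16_t  exponent;
	uint16_t size;        /* number of u64 values for this stat */
	uint32_t offset;      /* byte offset relative to data_offset */
	uint32_t bucket_size;
};

/*
 * Hypothetical helper: find the stat called 'want' and store its first
 * u64 value in *val. Returns 0 on success, -1 if the stat is missing
 * or the blob is too short.
 */
int find_stat(const void *blob, size_t len, const char *want, uint64_t *val)
{
	const char *base = blob;
	struct stats_header hdr;
	size_t stride, i;

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, base, sizeof(hdr));
	stride = sizeof(struct stats_desc) + hdr.name_size;

	for (i = 0; i < hdr.num_desc; i++) {
		size_t pos = (size_t)hdr.desc_offset + i * stride;
		struct stats_desc d;
		size_t data_pos;

		if (pos + stride > len)
			return -1;
		memcpy(&d, base + pos, sizeof(d));
		if (strncmp(base + pos + sizeof(d), want, hdr.name_size) != 0)
			continue;

		data_pos = (size_t)hdr.data_offset + d.offset;
		if (d.size == 0 || data_pos + sizeof(*val) > len)
			return -1;
		memcpy(val, base + data_pos, sizeof(*val));
		return 0;
	}
	return -1;
}
```

A real consumer would obtain the buffer by reading the fd returned by KVM_GET_STATS_FD on the vCPU, then call something like find_stat(buf, len, "apicv_accept_irq", &val); the stat name here is the one proposed in this patch.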

>
> Are there any extra benefits? E.g., would this counter need to match some other
> counter in KVM or the guest so that a bug can be detected? That would be very helpful.

Again, I didn't have a specific scenario for this counter, other than that the associated tracepoint is the one I typically use to determine if APICv is active. But let's think of an example on the spot: in a hypothetical scenario where I want to determine the ratio of time that a vCPU spends blocking versus in guest mode, I could add another stat, e.g.:

+
+	++vcpu->stat.apicv_accept_irq;
+
 	if (in_guest_mode) {
 		/*
 		 * Signal the doorbell to tell hardware to inject the IRQ. If
 		 * the vCPU exits the guest before the doorbell chimes, hardware
 		 * will automatically process AVIC interrupts at the next VMRUN.
 		 */
 		avic_ring_doorbell(vcpu);
+		++vcpu->stat.avic_doorbell_rung;
 	} else {
 		/*
 		 * Wake the vCPU if it was blocking. KVM will then detect the
 		 * pending IRQ when checking if the vCPU has a wake event.
 		 */
 		kvm_vcpu_wake_up(vcpu);
 	}

and then the ratio (avic_doorbell_rung / apicv_accept_irq) lets me estimate the percentage of time the target vCPU is running rather than idle. There are likely better ways of determining this, but you get the idea. The goal is to reach a general consensus on whether I should add a new tracepoint (trace_kvm_avic_ring_doorbell) or a new stat as the "preferred" solution. Obviously there are still cases where a tracepoint is the best approach (e.g. when it carries more information).
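
As a rough userspace-side illustration of that ratio, here is a minimal sketch. It assumes two snapshots of the per-vCPU counters have already been read; struct irq_stat_sample and doorbell_ratio_pct() are hypothetical names, and avic_doorbell_rung is the hypothetical stat from the snippet above, not an existing KVM stat.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two per-vCPU counters discussed above. */
struct irq_stat_sample {
	uint64_t apicv_accept_irq;   /* counter proposed in this patch */
	uint64_t avic_doorbell_rung; /* hypothetical counter, not in KVM */
};

/*
 * Percentage of accelerated IRQs delivered between two snapshots that
 * found the target vCPU in guest mode (i.e. the doorbell was rung).
 * Returns -1.0 if no IRQs were counted in the interval or if the
 * deltas are inconsistent.
 */
double doorbell_ratio_pct(const struct irq_stat_sample *prev,
			  const struct irq_stat_sample *cur)
{
	uint64_t irqs = cur->apicv_accept_irq - prev->apicv_accept_irq;
	uint64_t rung = cur->avic_doorbell_rung - prev->avic_doorbell_rung;

	if (irqs == 0 || rung > irqs)
		return -1.0;

	return 100.0 * (double)rung / (double)irqs;
}
```

Working on deltas between snapshots rather than on raw totals keeps the estimate tied to a specific interval. The result only approximates time spent running if IRQ arrivals are spread evenly over that interval, which is an assumption of this sketch.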

Hopefully I didn't stray too far from your question/point.

Alejandro


2024-04-16 18:25:35

by Sean Christopherson

Subject: Re: [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv

On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> Hi Dongli
>
> On 2/15/24 11:16, Dongli Zhang wrote:
> > Hi Alejandro,
> >
> > Is there a concrete use case for this counter, e.g. a bug it would help debug?
>
> I don't have a specific bug in mind that this is trying to address. This
> patch is just an example to show how existing data points (i.e. the
> trace_kvm_apicv_accept_irq tracepoint) can also be exposed via the stats
> framework with minimal overhead, and to support the point in the cover letter
> that querying the binary stats could be the best choice for a "single source"
> that tells us the full status of APICv/AVIC (i.e. are SVM and IOMMU AVIC both
> working, are there any inhibits set, etc.).

Yeah, but as noted in my response to the cover letter, stats are ABI, whereas
tracepoints are not, i.e. the bar for adding stats is much higher than the bar
for adding tracepoints.

In other words, stats need to come with a concrete use case (preferably more than
one), an explanation of why userspace needs a KVM-provided stat, and a decent
level of confidence that KVM can provide deterministic, sane, and broadly useful
data.

E.g. this proposed stat is of limited usefulness because it applies to a very
narrow combination of IRQs and hardware.