2023-07-26 15:23:04

by Maxim Levitsky

Subject: [PATCH v2 0/3] Fix 'Spurious APIC interrupt (vector 0xFF) on CPU#n' issue

Recently we found an issue which causes these error messages to sometimes be
logged if the guest has a VFIO device attached:

'Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen'

It was traced to an incorrect APICv inhibition bug that started with commit
'KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base'.
(All of these issues are now fixed.)

However, there are valid cases for APICv to be inhibited, and inhibition should
not cause spurious interrupts to be injected into the guest.

After some debugging, the root cause was found: __kvm_apic_update_irr doesn't
set irr_pending, which later triggers an int -> unsigned char conversion bug
that leads to the wrong injection of vector 0xFF.
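
To illustrate the conversion bug, here is a minimal userspace sketch. It assumes, as the subject of patch 3 suggests, that the interrupt query returns -1 when no interrupt is actually found; get_pending_vector(), vector_unchecked() and vector_checked() are hypothetical names, not KVM functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical interrupt query (not a KVM function). It skips the IRR
 * scan when the irr_pending hint is false and reports "nothing pending"
 * as -1, even if highest_irr holds a real vector.
 */
static int get_pending_vector(bool irr_pending, int highest_irr)
{
	return irr_pending ? highest_irr : -1;
}

/* Buggy pattern: the int result is narrowed to 8 bits without a check. */
static uint8_t vector_unchecked(int irq)
{
	return (uint8_t)irq;	/* -1 converts to 0xFF (value modulo 256) */
}

/* Fixed pattern: reject the "no interrupt" result before narrowing. */
static bool vector_checked(int irq, uint8_t *vector)
{
	if (irq < 0)
		return false;	/* nothing to inject */
	assert(irq <= 0xFF);	/* x86 vectors fit in 8 bits */
	*vector = (uint8_t)irq;
	return true;
}
```

When the stale hint makes the query return -1, the unchecked path yields vector 0xFF, the same vector as in the log message, while the checked path refuses to produce a vector at all.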



This also leads to an unbounded delay in injecting the interrupt, which hurts
performance.

In addition, I noticed that __kvm_apic_update_irr does not update the IRR
atomically, which can lead to an even harder-to-debug bug.
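
A minimal sketch of an atomic PIR-to-IRR merge using C11 atomics follows. It is not the code from patch 1: it assumes a simplified layout in which both PIR and IRR are eight 32-bit atomic words (256 vectors), sketch_update_irr() and highest_bit() are hypothetical helpers, and its return value (whether any new bit was set) is a simplification.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_VECTOR_WORDS 8	/* 256 vectors / 32 bits per word */

/* Index of the highest set bit in a non-zero word. */
static int highest_bit(uint32_t v)
{
	int bit = -1;

	assert(v != 0);
	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Hypothetical helper: move pending bits from pir[] into irr[]. Each PIR
 * word is read and cleared with one atomic exchange, and its bits are
 * ORed into the IRR with atomic_fetch_or(), so a concurrent writer
 * setting another bit in the same IRR word cannot be lost to a
 * read-modify-write race. Returns true if any bit that was not already
 * in the IRR got set. *max_irr receives the highest IRR vector observed
 * during the merge, or -1 if the IRR was empty.
 */
static bool sketch_update_irr(_Atomic uint32_t *pir, _Atomic uint32_t *irr,
			      int *max_irr)
{
	bool updated = false;

	*max_irr = -1;
	for (int i = APIC_VECTOR_WORDS - 1; i >= 0; i--) {
		uint32_t pir_val = atomic_exchange(&pir[i], 0);
		uint32_t irr_old = atomic_fetch_or(&irr[i], pir_val);
		uint32_t irr_new = irr_old | pir_val;

		if (pir_val & ~irr_old)
			updated = true;
		if (irr_new && *max_irr == -1)
			*max_irr = i * 32 + highest_bit(irr_new);
	}
	return updated;
}
```

The design point is the single atomic_fetch_or() per word: a plain load, OR and store of the IRR word could overwrite a bit that another writer set between the load and the store.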



V2: applied Paolo's feedback to patch 1.

Best regards,
Maxim Levitsky

Maxim Levitsky (3):
KVM: x86: VMX: __kvm_apic_update_irr must update the IRR atomically
KVM: x86: VMX: set irr_pending in kvm_apic_update_irr
KVM: x86: check the kvm_cpu_get_interrupt result before using it

arch/x86/kvm/lapic.c | 25 +++++++++++++++++--------
arch/x86/kvm/x86.c | 10 +++++++---
2 files changed, 24 insertions(+), 11 deletions(-)

--
2.26.3

2023-07-26 15:27:09

by Maxim Levitsky

Subject: [PATCH v2 2/3] KVM: x86: VMX: set irr_pending in kvm_apic_update_irr

When APICv is inhibited, the irr_pending optimization is used.

Therefore, when kvm_apic_update_irr sets bits in the IRR,
it must set irr_pending to true as well.
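
A simplified, single-threaded model of why the flag matters follows. The search is modeled on KVM's apic_find_highest_irr(), which returns -1 early when irr_pending is false; the struct, the helper names and the asserted invariant (the hint stays true while APICv is active) are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_VECTOR_WORDS 8	/* 256 vectors / 32 bits per word */

/* Hypothetical, simplified model of the relevant local APIC state. */
struct model_apic {
	bool apicv_active;
	bool irr_pending;	/* hint: the IRR may be non-empty */
	uint32_t irr[APIC_VECTOR_WORDS];
};

/* Highest vector set in the IRR, or -1; trusts the hint and skips the scan. */
static int model_find_highest_irr(const struct model_apic *apic)
{
	/* Assumed model invariant: with APICv active the hint stays true. */
	assert(apic->irr_pending || !apic->apicv_active);

	if (!apic->irr_pending)
		return -1;
	for (int i = APIC_VECTOR_WORDS - 1; i >= 0; i--) {
		for (int bit = 31; bit >= 0; bit--) {
			if (apic->irr[i] & (1u << bit))
				return i * 32 + bit;
		}
	}
	return -1;
}

/* Pre-patch behaviour: merge PIR bits into the IRR, never touch the hint. */
static bool model_merge_pir(struct model_apic *apic,
			    const uint32_t pir[APIC_VECTOR_WORDS])
{
	bool updated = false;

	for (int i = 0; i < APIC_VECTOR_WORDS; i++) {
		if (pir[i] & ~apic->irr[i])
			updated = true;
		apic->irr[i] |= pir[i];
	}
	return updated;
}

/* Patched behaviour: with APICv inactive, new IRR bits also set the hint. */
static bool model_update_irr(struct model_apic *apic,
			     const uint32_t pir[APIC_VECTOR_WORDS])
{
	bool irr_updated = model_merge_pir(apic, pir);

	if (!apic->apicv_active && irr_updated)
		apic->irr_pending = true;
	return irr_updated;
}
```

With only model_merge_pir(), the vector sits in the IRR but the search reports -1 because the hint was never set; that -1 is the kind of result which, narrowed to an 8-bit vector without a check, becomes 0xFF.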

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/lapic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b17b37e4d4fcd1..a983a16163b137 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -666,8 +666,11 @@ EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir, int *max_irr)
{
struct kvm_lapic *apic = vcpu->arch.apic;
+ bool irr_updated = __kvm_apic_update_irr(pir, apic->regs, max_irr);

- return __kvm_apic_update_irr(pir, apic->regs, max_irr);
+ if (unlikely(!apic->apicv_active && irr_updated))
+ apic->irr_pending = true;
+ return irr_updated;
}
EXPORT_SYMBOL_GPL(kvm_apic_update_irr);

--
2.26.3


2023-07-29 03:42:18

by Sean Christopherson

Subject: Re: [PATCH v2 0/3] Fix 'Spurious APIC interrupt (vector 0xFF) on CPU#n' issue

On Wed, Jul 26, 2023, Maxim Levitsky wrote:
> Maxim Levitsky (3):
> KVM: x86: VMX: __kvm_apic_update_irr must update the IRR atomically
> KVM: x86: VMX: set irr_pending in kvm_apic_update_irr
> KVM: x86: check the kvm_cpu_get_interrupt result before using it
>
> arch/x86/kvm/lapic.c | 25 +++++++++++++++++--------
> arch/x86/kvm/x86.c | 10 +++++++---
> 2 files changed, 24 insertions(+), 11 deletions(-)

Paolo, are you still planning on taking these directly? I can also grab them
and send them your way next week. I have a couple of (not super urgent, but
kinda urgent) fixes for 6.5 that I'm planning on sending a PULL request for, just
didn't get around to that today.