[1/2] changes sync_pir_to_irr interface to avoid a definitely
superfluous KVM_REQ_EVENT in [2/2]. It's possible that we don't need
KVM_REQ_EVENT, but a slight slowdown in execution is worth the time
needed for a proof. (A quick test showed that KVM_REQ_EVENT is always
already set if sync_pir_to_irr does something.)
[2/2] is applicable to userspace split IOAPIC as well, but I hope that
we want it to work like normal IOAPIC (no ACK for edge irq), so I
haven't fixed it.
Radim Krčmář (2):
KVM: x86: return bool from x86_ops.sync_pir_to_irr
KVM: x86: fix edge EOI and IOAPIC reconfig race
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/ioapic.c | 7 ++++++-
arch/x86/kvm/svm.c | 4 ++--
arch/x86/kvm/vmx.c | 12 +++++++-----
4 files changed, 16 insertions(+), 9 deletions(-)
--
2.5.0
True means that we have added PIR to IRR.
Signed-off-by: Radim Krčmář <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm.c | 4 ++--
arch/x86/kvm/vmx.c | 12 +++++++-----
3 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09acaa64ef8e..b73696b59d77 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -831,7 +831,7 @@ struct kvm_x86_ops {
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
- void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
+ bool (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 189e46479dd5..cd4ad20951c4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3758,9 +3758,9 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu)
return;
}
-static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+static bool svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
{
- return;
+ return false;
}
static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4cf25b90dbe0..e3ae8c236cca 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -817,7 +817,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
struct kvm_segment *var, int seg);
static bool guest_state_valid(struct kvm_vcpu *vcpu);
static u32 vmx_segment_access_rights(struct kvm_segment *var);
-static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
+static bool vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
static int alloc_identity_pagetable(struct kvm *kvm);
@@ -4430,19 +4430,21 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
kvm_vcpu_kick(vcpu);
}
-static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+static bool vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
if (!pi_test_and_clear_on(&vmx->pi_desc))
- return;
+ return false;
kvm_apic_update_irr(vcpu, vmx->pi_desc.pir);
+
+ return true;
}
-static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
+static bool vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
{
- return;
+ return false;
}
/*
--
2.5.0
(The main problem is that we care about EOI of edge interrupts, but our
house of cards started a long time ago, so overturning that decision is
not ideal for a stable fix.)
KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
The problem is that IOAPIC can be reconfigured while an interrupt with
old configuration is pending and eoi_exit_bitmap only remembers the
newest configuration so EOI from the pending interrupt is not
recognized.
This is not a problem for level interrupts, because IOAPIC sends
interrupt with the new configuration.
And then there are edge interrupts with ACK notifiers, like i8254 timer;
things can happen in this order
1) IOAPIC inject a vector from i8254
2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
on original VCPU gets cleared
3) guest's handler for the vector does EOI
4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
not in that VCPU's eoi_exit_bitmap
5) i8254 stops working
A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
vector is in PIR/IRR/ISR.
This creates an unwanted situation if the vector is reused by a
non-IOAPIC source, but I think it is so rare that we don't want to make
the solution more sophisticated. The simple solution also doesn't work
if we are reconfiguring the vector. (Shouldn't happen in the wild and
I'd rather fix users of ACK notifiers instead of working around that.)
The are no races because ioapic injection and reconfig are locked.
Fixes: 638e7c03efea ("KVM: x86: Add EOI exit bitmap inference")
[Before 638e7c03efea, this bug happened only with APICv.]
Fixes: c7c9c56ca26f ("x86, apicv: add virtual interrupt delivery support")
Cc: <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
---
arch/x86/kvm/ioapic.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 2dcda0f188ba..85d25fe25e39 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -239,6 +239,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
union kvm_ioapic_redirect_entry *e;
int index;
+ if (kvm_x86_ops->sync_pir_to_irr(vcpu))
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+
spin_lock(&ioapic->lock);
for (index = 0; index < IOAPIC_NUM_PINS; index++) {
e = &ioapic->redirtbl[index];
@@ -246,7 +249,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
index == RTC_GSI) {
if (kvm_apic_match_dest(vcpu, NULL, 0,
- e->fields.dest_id, e->fields.dest_mode))
+ e->fields.dest_id, e->fields.dest_mode) ||
+ (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
+ kvm_apic_pending_eoi(vcpu, e->fields.vector)))
__set_bit(e->fields.vector,
(unsigned long *)eoi_exit_bitmap);
}
--
2.5.0
On 13/08/2015 15:46, Radim Krčmář wrote:
> 1) IOAPIC inject a vector from i8254
> 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
> on original VCPU gets cleared
> 3) guest's handler for the vector does EOI
> 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
> not in that VCPU's eoi_exit_bitmap
> 5) i8254 stops working
>
> This creates an unwanted situation if the vector is reused by a
> non-IOAPIC source, but I think it is so rare that we don't want to make
> the solution more sophisticated.
What happens if the vector is changed in step 2?
__kvm_ioapic_update_eoi won't match the redirection table entry.
How do you reproduce the bug?
Paolo
2015-08-13 16:53+0200, Paolo Bonzini:
> On 13/08/2015 15:46, Radim Krčmář wrote:
>> 1) IOAPIC inject a vector from i8254
>> 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
>> on original VCPU gets cleared
>> 3) guest's handler for the vector does EOI
>> 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
>> not in that VCPU's eoi_exit_bitmap
>> 5) i8254 stops working
>>
>> This creates an unwanted situation if the vector is reused by a
>> non-IOAPIC source, but I think it is so rare that we don't want to make
>> the solution more sophisticated.
>
> What happens if the vector is changed in step 2?
> __kvm_ioapic_update_eoi won't match the redirection table entry.
Yes, the EOI is going to be ignored. (With APICv, VMX won't even exit.)
In the patch, I dissmissed it as "shouldn't happen in the wild" because
we've always had the vector-change bug :) (Unlike the destination-change
one, which was APICv-only before recent changes.)
A simple solution to the vector-change would have a list of one-time
fixups (vector, *ioapic) and hooks in ioapic reconfig, scan and EOI.
A complex solution would replace ioapic scanning with an array of list
of ioapics (it needs to be a list or small array because vectors can be
shared).
An ioapic would be added to list[vector] on reconfig and removed on
reconfig unless an edge fixup was needed, then it would last til next
EOI (I guess we won't need to consider vector in IRR and ISR).
Callbacks would update the eoi_exit_bitmap on relevant changes.
I considered doing the complex one, but then it occured to me that we
want the destination-change fixed in stable as APICv machines are
starting to get used and people might migrate old guests on them.
> How do you reproduce the bug?
I run rhel4 (2.6.9) kernel on 2 VCPUs and frequently alternate
smp_affinity of "timer". The bug is hit within seconds.
On 14/08/2015 10:38, Radim Krčmář wrote:
>> How do you reproduce the bug?
> I run rhel4 (2.6.9) kernel on 2 VCPUs and frequently alternate
> smp_affinity of "timer". The bug is hit within seconds.
Nice, I'll try to make a unit test for it on the plane. :)
Paolo