I'm looking at reducing the interrupt overhead for virtualized guests:
some workloads spend a large part of their time processing interrupts.
This patchset supplies infrastructure to reduce the IRQ ack overhead on
x86: the idea is to add an eoi_write callback that we can then optimize
without touching other apic functionality.
The main user is kvm: on kvm, an EOI write from the guest causes an
expensive exit to host; we can avoid this using shared memory as the
last patches in the series demonstrate.
But I also wrote a micro-optimized version for the regular x2apic: this
shaves off a branch and about 9 instructions from EOI when x2apic is
used, and a comment in ack_APIC_irq implies that someone counted
instructions there, at some point.
That's patch 4 in the series - if someone's unhappy with
this patch specifically this patch can be dropped
as nothing else in the series depends on it.
Also included in the patchset are a couple of trivial macro fixes.
The patches work fine on my boxes. See individual patches
for perf tests. You need to patch qemu to whitelist the kvm feature.
qemu patch will be sent as a reply to this mail.
The patches are against 3.4-rc7 - let me know if
I need to rebase.
Please review, and consider for linux 3.5.
Thanks,
MST
Changes from v3:
Address review comments by Marcelo:
Multiple cosmetic changes eoi -> pv_eoi
Added multiple comments
Changes from v2:
Kill guest with GP on an illegal MSR value
Add documentation
Changes from v1:
Add host side patch to series
Remove kvm-specific __test_and_clear_bit, document
that x86 one does what we want already
Clear msr on cpu unplug
Michael S. Tsirkin (5):
kvm_para: guest side for eoi avoidance
x86/bitops: note on __test_and_clear_bit atomicity
kvm: host side for eoi optimization
kvm: eoi msi documentation
kvm: only sync when attention bits set
Documentation/virtual/kvm/msr.txt | 32 +++++++++
arch/x86/include/asm/bitops.h | 13 +++-
arch/x86/include/asm/kvm_host.h | 12 +++
arch/x86/include/asm/kvm_para.h | 7 ++
arch/x86/kernel/kvm.c | 51 +++++++++++++-
arch/x86/kvm/cpuid.c | 1 +
arch/x86/kvm/irq.c | 2 +-
arch/x86/kvm/lapic.c | 137 +++++++++++++++++++++++++++++++++++--
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/trace.h | 34 +++++++++
arch/x86/kvm/x86.c | 8 ++-
11 files changed, 287 insertions(+), 12 deletions(-)
--
MST
The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
Guest tests it using a single est and clear operation - this is
necessary so that host can detect interrupt nesting - and if set, it can
skip the EOI MSR.
I run a simple microbenchmark to show exit reduction
(note: for testing, need to apply follow-up patch
'kvm: host side for eoi optimization' + a qemu patch
I posted separately, on host):
Before:
Performance counter stats for 'sleep 1s':
47,357 kvm:kvm_entry [99.98%]
0 kvm:kvm_hypercall [99.98%]
0 kvm:kvm_hv_hypercall [99.98%]
5,001 kvm:kvm_pio [99.98%]
0 kvm:kvm_cpuid [99.98%]
22,124 kvm:kvm_apic [99.98%]
49,849 kvm:kvm_exit [99.98%]
21,115 kvm:kvm_inj_virq [99.98%]
0 kvm:kvm_inj_exception [99.98%]
0 kvm:kvm_page_fault [99.98%]
22,937 kvm:kvm_msr [99.98%]
0 kvm:kvm_cr [99.98%]
0 kvm:kvm_pic_set_irq [99.98%]
0 kvm:kvm_apic_ipi [99.98%]
22,207 kvm:kvm_apic_accept_irq [99.98%]
22,421 kvm:kvm_eoi [99.98%]
0 kvm:kvm_pv_eoi [99.99%]
0 kvm:kvm_nested_vmrun [99.99%]
0 kvm:kvm_nested_intercepts [99.99%]
0 kvm:kvm_nested_vmexit [99.99%]
0 kvm:kvm_nested_vmexit_inject [99.99%]
0 kvm:kvm_nested_intr_vmexit [99.99%]
0 kvm:kvm_invlpga [99.99%]
0 kvm:kvm_skinit [99.99%]
57 kvm:kvm_emulate_insn [99.99%]
0 kvm:vcpu_match_mmio [99.99%]
0 kvm:kvm_userspace_exit [99.99%]
2 kvm:kvm_set_irq [99.99%]
2 kvm:kvm_ioapic_set_irq [99.99%]
23,609 kvm:kvm_msi_set_irq [99.99%]
1 kvm:kvm_ack_irq [99.99%]
131 kvm:kvm_mmio [99.99%]
226 kvm:kvm_fpu [100.00%]
0 kvm:kvm_age_page [100.00%]
0 kvm:kvm_try_async_get_page [100.00%]
0 kvm:kvm_async_pf_doublefault [100.00%]
0 kvm:kvm_async_pf_not_present [100.00%]
0 kvm:kvm_async_pf_ready [100.00%]
0 kvm:kvm_async_pf_completed
1.002100578 seconds time elapsed
After:
Performance counter stats for 'sleep 1s':
28,354 kvm:kvm_entry [99.98%]
0 kvm:kvm_hypercall [99.98%]
0 kvm:kvm_hv_hypercall [99.98%]
1,347 kvm:kvm_pio [99.98%]
0 kvm:kvm_cpuid [99.98%]
1,931 kvm:kvm_apic [99.98%]
29,595 kvm:kvm_exit [99.98%]
24,884 kvm:kvm_inj_virq [99.98%]
0 kvm:kvm_inj_exception [99.98%]
0 kvm:kvm_page_fault [99.98%]
1,986 kvm:kvm_msr [99.98%]
0 kvm:kvm_cr [99.98%]
0 kvm:kvm_pic_set_irq [99.98%]
0 kvm:kvm_apic_ipi [99.99%]
25,953 kvm:kvm_apic_accept_irq [99.99%]
26,132 kvm:kvm_eoi [99.99%]
26,593 kvm:kvm_pv_eoi [99.99%]
0 kvm:kvm_nested_vmrun [99.99%]
0 kvm:kvm_nested_intercepts [99.99%]
0 kvm:kvm_nested_vmexit [99.99%]
0 kvm:kvm_nested_vmexit_inject [99.99%]
0 kvm:kvm_nested_intr_vmexit [99.99%]
0 kvm:kvm_invlpga [99.99%]
0 kvm:kvm_skinit [99.99%]
284 kvm:kvm_emulate_insn [99.99%]
68 kvm:vcpu_match_mmio [99.99%]
68 kvm:kvm_userspace_exit [99.99%]
2 kvm:kvm_set_irq [99.99%]
2 kvm:kvm_ioapic_set_irq [99.99%]
28,288 kvm:kvm_msi_set_irq [99.99%]
1 kvm:kvm_ack_irq [99.99%]
131 kvm:kvm_mmio [100.00%]
588 kvm:kvm_fpu [100.00%]
0 kvm:kvm_age_page [100.00%]
0 kvm:kvm_try_async_get_page [100.00%]
0 kvm:kvm_async_pf_doublefault [100.00%]
0 kvm:kvm_async_pf_not_present [100.00%]
0 kvm:kvm_async_pf_ready [100.00%]
0 kvm:kvm_async_pf_completed
1.002039622 seconds time elapsed
We see that # of exits is almost halved.
Signed-off-by: Michael S. Tsirkin <[email protected]>
---
arch/x86/include/asm/bitops.h | 6 +++-
arch/x86/include/asm/kvm_para.h | 7 +++++
arch/x86/kernel/kvm.c | 51 ++++++++++++++++++++++++++++++++++++--
3 files changed, 59 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index b97596e..c9c70ea 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -26,11 +26,13 @@
#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1)
/* Technically wrong, but this avoids compilation errors on some gcc
versions. */
-#define BITOP_ADDR(x) "=m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "=m"
#else
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "+m"
#endif
+#define BITOP_ADDR(x) BITOP_ADDR_CONSTRAINT (*(volatile long *) (x))
+
#define ADDR BITOP_ADDR(addr)
/*
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..6203ffb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -22,6 +22,7 @@
#define KVM_FEATURE_CLOCKSOURCE2 3
#define KVM_FEATURE_ASYNC_PF 4
#define KVM_FEATURE_STEAL_TIME 5
+#define KVM_FEATURE_PV_EOI 6
/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
@@ -37,6 +38,7 @@
#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
#define MSR_KVM_ASYNC_PF_EN 0x4b564d02
#define MSR_KVM_STEAL_TIME 0x4b564d03
+#define MSR_KVM_PV_EOI_EN 0x4b564d04
struct kvm_steal_time {
__u64 steal;
@@ -89,6 +91,11 @@ struct kvm_vcpu_pv_apf_data {
__u32 enabled;
};
+#define KVM_PV_EOI_BIT 0
+#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
+#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
+#define KVM_PV_EOI_DISABLED 0x0
+
#ifdef __KERNEL__
#include <asm/processor.h>
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e554e5a..85cd6ac 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -39,6 +39,8 @@
#include <asm/desc.h>
#include <asm/tlbflush.h>
#include <asm/idle.h>
+#include <asm/apic.h>
+#include <asm/apicdef.h>
static int kvmapf = 1;
@@ -283,6 +285,24 @@ static void kvm_register_steal_time(void)
cpu, __pa(st));
}
+/* size alignment is implied but just to make it explicit. */
+static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) __aligned(2) =
+ KVM_PV_EOI_DISABLED;
+
+static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+{
+ /**
+ * This relies on __test_and_clear_bit to modify the memory
+ * in a way that is atomic with respect to the local CPU.
+ * The hypervisor only accesses this memory from the local CPU so
+ * there's no need for lock or memory barriers.
+ * An optimization barrier is implied in apic write.
+ */
+ if (__test_and_clear_bit(KVM_PV_EOI_BIT, &__get_cpu_var(kvm_apic_eoi)))
+ return;
+ apic->write(APIC_EOI, APIC_EOI_ACK);
+}
+
void __cpuinit kvm_guest_cpu_init(void)
{
if (!kvm_para_available())
@@ -300,11 +320,17 @@ void __cpuinit kvm_guest_cpu_init(void)
smp_processor_id());
}
+ if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
+ __get_cpu_var(kvm_apic_eoi) = 0;
+ wrmsrl(MSR_KVM_PV_EOI_EN, __pa(&__get_cpu_var(kvm_apic_eoi)) |
+ KVM_MSR_ENABLED);
+ }
+
if (has_steal_clock)
kvm_register_steal_time();
}
-static void kvm_pv_disable_apf(void *unused)
+static void kvm_pv_disable_apf(void)
{
if (!__get_cpu_var(apf_reason).enabled)
return;
@@ -316,11 +342,18 @@ static void kvm_pv_disable_apf(void *unused)
smp_processor_id());
}
+static void kvm_pv_guest_cpu_reboot(void *unused)
+{
+ if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+ wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+ kvm_pv_disable_apf();
+}
+
static int kvm_pv_reboot_notify(struct notifier_block *nb,
unsigned long code, void *unused)
{
if (code == SYS_RESTART)
- on_each_cpu(kvm_pv_disable_apf, NULL, 1);
+ on_each_cpu(kvm_pv_guest_cpu_reboot, NULL, 1);
return NOTIFY_DONE;
}
@@ -371,7 +404,9 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)
static void kvm_guest_cpu_offline(void *dummy)
{
kvm_disable_steal_time();
- kvm_pv_disable_apf(NULL);
+ if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
+ wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+ kvm_pv_disable_apf();
apf_task_wake_all();
}
@@ -424,6 +459,16 @@ void __init kvm_guest_init(void)
pv_time_ops.steal_clock = kvm_steal_clock;
}
+ if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
+ struct apic **drv;
+
+ for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
+ /* Should happen once for each apic */
+ WARN_ON((*drv)->eoi_write == kvm_guest_apic_eoi_write);
+ (*drv)->eoi_write = kvm_guest_apic_eoi_write;
+ }
+ }
+
#ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
register_cpu_notifier(&kvm_cpu_notifier);
--
MST
__test_and_clear_bit is actually atomic with respect
to the local CPU. Add a note saying that KVM on x86
relies on this behaviour so people don't accidentaly break it.
Also warn not to rely on this in portable code.
Signed-off-by: Michael S. Tsirkin <[email protected]>
---
arch/x86/include/asm/bitops.h | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index c9c70ea..86f3a1e 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -264,6 +264,13 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
* This operation is non-atomic and can be reordered.
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
+ *
+ * Note: the operation is performed atomically with respect to
+ * the local CPU, but not other CPUs. Portable code should not
+ * rely on this behaviour.
+ * KVM relies on this behaviour on x86 for modifying memory that is also
+ * accessed from a hypervisor on the same CPU if running in a VM: don't change
+ * this without also updating arch/x86/kernel/kvm.c
*/
static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
{
--
MST
Implementation of PV EOI using shared memory.
This reduces the number of exits an interrupt
causes as much as by half.
The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
We set it before injecting an interrupt and clear
before injecting a nested one. Guest tests it using
a test and clear operation - this is necessary
so that host can detect interrupt nesting -
and if set, it can skip the EOI MSR.
There's a new MSR to set the address of said register
in guest memory. Otherwise not much changed:
- Guest EOI is not required
- Register is tested & ISR is automatically cleared on exit
For testing results see description of previous patch
'kvm_para: guest side for eoi avoidance'.
Signed-off-by: Michael S. Tsirkin <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 12 ++++
arch/x86/kvm/cpuid.c | 1 +
arch/x86/kvm/irq.c | 2 +-
arch/x86/kvm/lapic.c | 137 +++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/trace.h | 34 ++++++++++
arch/x86/kvm/x86.c | 5 ++
7 files changed, 187 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 32297f2..3ded418 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -174,6 +174,13 @@ enum {
/* apic attention bits */
#define KVM_APIC_CHECK_VAPIC 0
+/*
+ * The following bit is set with PV-EOI, unset on EOI.
+ * We detect PV-EOI changes by guest by comparing
+ * this bit with PV-EOI in guest memory.
+ * See the implementation in apic_update_pv_eoi.
+ */
+#define KVM_APIC_PV_EOI_PENDING 1
/*
* We don't want allocation failures within the mmu code, so we preallocate
@@ -485,6 +492,11 @@ struct kvm_vcpu_arch {
u64 length;
u64 status;
} osvw;
+
+ struct {
+ u64 msr_val;
+ struct gfn_to_hva_cache data;
+ } pv_eoi;
};
struct kvm_lpage_info {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index eba8345..a9f155a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_NOP_IO_DELAY) |
(1 << KVM_FEATURE_CLOCKSOURCE2) |
(1 << KVM_FEATURE_ASYNC_PF) |
+ (1 << KVM_FEATURE_PV_EOI) |
(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
if (sched_info_on())
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 7e06ba1..07bdfab 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -48,7 +48,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
if (!irqchip_in_kernel(v->kvm))
return v->arch.interrupt.pending;
- if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */
+ if (kvm_apic_has_interrupt(v) < 0) { /* LAPIC */
if (kvm_apic_accept_pic_intr(v)) {
s = pic_irqchip(v->kvm); /* PIC */
return s->output;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93c1574..b4f7013 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -270,6 +270,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
irq->level, irq->trig_mode);
}
+static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
+{
+
+ return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
+ sizeof(val));
+}
+
+static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
+{
+
+ return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
+ sizeof(*val));
+}
+
+static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
+}
+
+static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
+{
+ u8 val;
+ if (pv_eoi_get_user(vcpu, &val) < 0)
+ apic_debug("Can't read EOI MSR value: 0x%llx\n",
+ (unsigned long long)vcpi->arch.pv_eoi.msr_val);
+ return val & 0x1;
+}
+
+static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
+{
+ if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
+ apic_debug("Can't set EOI MSR value: 0x%llx\n",
+ (unsigned long long)vcpi->arch.pv_eoi.msr_val);
+ return;
+ }
+ __set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
+static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
+{
+ if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
+ apic_debug("Can't clear EOI MSR value: 0x%llx\n",
+ (unsigned long long)vcpi->arch.pv_eoi.msr_val);
+ return;
+ }
+ __clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
static inline int apic_find_highest_isr(struct kvm_lapic *apic)
{
int result;
@@ -482,16 +530,26 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
}
-static void apic_set_eoi(struct kvm_lapic *apic)
+static int apic_set_eoi(struct kvm_lapic *apic)
{
int vector = apic_find_highest_isr(apic);
+
+ trace_kvm_eoi(apic, vector);
+
/*
* Not every write EOI will has corresponding ISR,
* one example is when Kernel check timer on setup_IO_APIC
*/
if (vector == -1)
- return;
+ return vector;
+ /*
+ * It's legal for guest to ignore the PV EOI optimization
+ * and signal EOI by APIC write. If this happens, clear
+ * PV EOI on guest's behalf.
+ */
+ if (pv_eoi_enabled(apic->vcpu))
+ pv_eoi_clr_pending(apic->vcpu);
apic_clear_vector(vector, apic->regs + APIC_ISR);
apic_update_ppr(apic);
@@ -505,6 +563,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
}
kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+ return vector;
}
static void apic_send_ipi(struct kvm_lapic *apic)
@@ -1085,6 +1144,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
atomic_set(&apic->lapic_timer.pending, 0);
if (kvm_vcpu_is_bsp(vcpu))
vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
+ vcpu->arch.pv_eoi.msr_val = 0;
apic_update_ppr(apic);
vcpu->arch.apic_arb_prio = 0;
@@ -1211,9 +1271,10 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
apic_update_ppr(apic);
highest_irr = apic_find_highest_irr(apic);
- if ((highest_irr == -1) ||
- ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
+ if (highest_irr == -1)
return -1;
+ if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
+ return -2;
return highest_irr;
}
@@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
int vector = kvm_apic_has_interrupt(vcpu);
struct kvm_lapic *apic = vcpu->arch.apic;
- if (vector == -1)
+ /* Detect interrupt nesting and disable EOI optimization */
+ if (pv_eoi_enabled(vcpu) && vector == -2)
+ pv_eoi_clr_pending(vcpu);
+
+ if (vector < 0)
return -1;
+ if (pv_eoi_enabled(vcpu)) {
+ if (kvm_ioapic_handles_vector(vcpu->kvm, vector))
+ pv_eoi_clr_pending(vcpu);
+ else
+ pv_eoi_set_pending(vcpu);
+ }
+
apic_set_vector(vector, apic->regs + APIC_ISR);
apic_update_ppr(apic);
apic_clear_irr(vector, apic);
@@ -1262,6 +1334,18 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu)
MSR_IA32_APICBASE_BASE;
kvm_apic_set_version(vcpu);
+ if (pv_eoi_enabled(apic->vcpu)) {
+ /*
+ * Update shadow bit in apic_attention bitmask
+ * from guest memory.
+ */
+ if (pv_eoi_get_pending(apic->vcpu))
+ __set_bit(KVM_APIC_PV_EOI_PENDING,
+ &vcpu->arch.apic_attention);
+ else
+ __clear_bit(KVM_APIC_PV_EOI_PENDING,
+ &vcpu->arch.apic_attention);
+ }
apic_update_ppr(apic);
hrtimer_cancel(&apic->lapic_timer.timer);
update_divide_count(apic);
@@ -1283,11 +1367,42 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
}
+/*
+ * apic_update_pv_eoi - called on vmexit
+ *
+ * Detect whether guest triggered PV EOI since the
+ * last entry. If yes, set EOI on guests's behalf.
+ */
+static void apic_update_pv_eoi(struct kvm_lapic *apic)
+{
+ /*
+ * PV EOI state is derived from KVM_APIC_PV_EOI_PENDING in host
+ * and KVM_PV_EOI_ENABLED in guest memory as follows:
+ *
+ * KVM_APIC_PV_EOI_PENDING is unset:
+ * -> host disabled PV EOI.
+ * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is set:
+ * -> host enabled PV EOI, guest did not execute EOI yet.
+ * KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is unset:
+ * -> host enabled PV EOI, guest executed EOI.
+ */
+ int vector;
+ BUG_ON(!pv_eoi_enabled(apic->vcpu));
+ if (pv_eoi_get_pending(apic->vcpu))
+ return;
+ __clear_bit(KVM_APIC_PV_EOI_PENDING, &apic->vcpu->arch.apic_attention);
+ vector = apic_set_eoi(apic);
+ trace_kvm_pv_eoi(apic, vector);
+}
+
void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
{
u32 data;
void *vapic;
+ if (test_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention))
+ apic_update_pv_eoi(vcpu->arch.apic);
+
if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
return;
@@ -1394,3 +1509,15 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
return 0;
}
+
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
+{
+ u64 addr = data & ~KVM_MSR_ENABLED;
+ if (pv_eoi_enabled(vcpu))
+ pv_eoi_clr_pending(vcpu);
+ vcpu->arch.pv_eoi.msr_val = data;
+ if (!pv_eoi_enabled(vcpu))
+ return 0;
+ return kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
+ addr);
+}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6f4ce25..149ef01 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -60,4 +60,6 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
{
return vcpu->arch.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
}
+
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
#endif
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 911d264..851914e 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -517,6 +517,40 @@ TRACE_EVENT(kvm_apic_accept_irq,
__entry->coalesced ? " (coalesced)" : "")
);
+TRACE_EVENT(kvm_eoi,
+ TP_PROTO(struct kvm_lapic *apic, int vector),
+ TP_ARGS(apic, vector),
+
+ TP_STRUCT__entry(
+ __field( __u32, apicid )
+ __field( int, vector )
+ ),
+
+ TP_fast_assign(
+ __entry->apicid = apic->vcpu->vcpu_id;
+ __entry->vector = vector;
+ ),
+
+ TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
+TRACE_EVENT(kvm_pv_eoi,
+ TP_PROTO(struct kvm_lapic *apic, int vector),
+ TP_ARGS(apic, vector),
+
+ TP_STRUCT__entry(
+ __field( __u32, apicid )
+ __field( int, vector )
+ ),
+
+ TP_fast_assign(
+ __entry->apicid = apic->vcpu->vcpu_id;
+ __entry->vector = vector;
+ ),
+
+ TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
/*
* Tracepoint for nested VMRUN
*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 185a2b8..4d77af0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -795,6 +795,7 @@ static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+ MSR_KVM_PV_EOI_EN,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
MSR_STAR,
#ifdef CONFIG_X86_64
@@ -1653,6 +1654,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
break;
+ case MSR_KVM_PV_EOI_EN:
+ if (kvm_lapic_enable_pv_eoi(vcpu, data))
+ return 1;
+ break;
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
--
MST
Document the new EOI MSR. Couldn't decide whether this change belongs
conceptually on guest or host side, so a separate patch.
Signed-off-by: Michael S. Tsirkin <[email protected]>
---
Documentation/virtual/kvm/msr.txt | 32 ++++++++++++++++++++++++++++++++
1 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 5031780..3481f1b 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -219,3 +219,35 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
steal: the amount of time in which this vCPU did not run, in
nanoseconds. Time during which the vcpu is idle, will not be
reported as steal time.
+
+MSR_KVM_EOI_EN: 0x4b564d04
+ data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
+ when disabled. When enabled, bits 63-1 hold 2-byte aligned physical address
+ of a 2 byte memory area which must be in guest RAM and must be zeroed.
+
+ The first, least significant bit of 2 byte memory location will be
+ written to by the hypervisor, typically at the time of interrupt
+ injection. Value of 1 means that guest can skip writing EOI to the apic
+ (using MSR or MMIO write); instead, it is sufficient to signal
+ EOI by clearing the bit in guest memory - this location will
+ later be polled by the hypervisor.
+ Value of 0 means that the EOI write is required.
+
+ It is always safe for the guest to ignore the optimization and perform
+ the APIC EOI write anyway.
+
+ Hypervisor is guaranteed to only modify this least
+ significant bit while in the current VCPU context, this means that
+ guest does not need to use either lock prefix or memory ordering
+ primitives to synchronise with the hypervisor.
+
+ However, hypervisor can set and clear this memory bit at any time:
+ therefore to make sure hypervisor does not interrupt the
+ guest and clear the least significant bit in the memory area
+ in the window between guest testing it to detect
+ whether it can skip EOI apic write and between guest
+ clearing it to signal EOI to the hypervisor,
+ guest must both read the least sgnificant bit in the memory area and
+ clear it using a single CPU instruction, such as test and clear, or
+ compare and exchange.
+
--
MST
Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic
attention bitmask but kvm still syncs lapic unconditionally.
As that commit suggested and in anticipation of adding more attention
bits, only sync lapic if(apic_attention).
Signed-off-by: Michael S. Tsirkin <[email protected]>
---
arch/x86/kvm/x86.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4d77af0..4fa26f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5375,7 +5375,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(vcpu->arch.tsc_always_catchup))
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
- kvm_lapic_sync_from_vapic(vcpu);
+ if (unlikely(vcpu->arch.apic_attention))
+ kvm_lapic_sync_from_vapic(vcpu);
r = kvm_x86_ops->handle_exit(vcpu);
out:
--
MST
On Wed, May 16, 2012 at 02:45:50PM +0300, Michael S. Tsirkin wrote:
> I'm looking at reducing the interrupt overhead for virtualized guests:
> some workloads spend a large part of their time processing interrupts.
> This patchset supplies infrastructure to reduce the IRQ ack overhead on
> x86: the idea is to add an eoi_write callback that we can then optimize
> without touching other apic functionality.
Ugh. I lost first part of the patchset.
Sorry.
Will send a fixed one now.
On Wed, May 16, 2012 at 02:46:12PM +0300, Michael S. Tsirkin wrote:
> Implementation of PV EOI using shared memory.
> This reduces the number of exits an interrupt
> causes as much as by half.
>
> The idea is simple: there's a bit, per APIC, in guest memory,
> that tells the guest that it does not need EOI.
> We set it before injecting an interrupt and clear
> before injecting a nested one. Guest tests it using
> a test and clear operation - this is necessary
> so that host can detect interrupt nesting -
> and if set, it can skip the EOI MSR.
>
> There's a new MSR to set the address of said register
> in guest memory. Otherwise not much changed:
> - Guest EOI is not required
> - Register is tested & ISR is automatically cleared on exit
>
> For testing results see description of previous patch
> 'kvm_para: guest side for eoi avoidance'.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 12 ++++
> arch/x86/kvm/cpuid.c | 1 +
> arch/x86/kvm/irq.c | 2 +-
> arch/x86/kvm/lapic.c | 137 +++++++++++++++++++++++++++++++++++++--
> arch/x86/kvm/lapic.h | 2 +
> arch/x86/kvm/trace.h | 34 ++++++++++
> arch/x86/kvm/x86.c | 5 ++
> 7 files changed, 187 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 32297f2..3ded418 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -174,6 +174,13 @@ enum {
>
> /* apic attention bits */
> #define KVM_APIC_CHECK_VAPIC 0
> +/*
> + * The following bit is set with PV-EOI, unset on EOI.
> + * We detect PV-EOI changes by guest by comparing
> + * this bit with PV-EOI in guest memory.
> + * See the implementation in apic_update_pv_eoi.
> + */
> +#define KVM_APIC_PV_EOI_PENDING 1
>
> /*
> * We don't want allocation failures within the mmu code, so we preallocate
> @@ -485,6 +492,11 @@ struct kvm_vcpu_arch {
> u64 length;
> u64 status;
> } osvw;
> +
> + struct {
> + u64 msr_val;
> + struct gfn_to_hva_cache data;
> + } pv_eoi;
> };
>
> struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index eba8345..a9f155a 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> (1 << KVM_FEATURE_NOP_IO_DELAY) |
> (1 << KVM_FEATURE_CLOCKSOURCE2) |
> (1 << KVM_FEATURE_ASYNC_PF) |
> + (1 << KVM_FEATURE_PV_EOI) |
> (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>
> if (sched_info_on())
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index 7e06ba1..07bdfab 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -48,7 +48,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
> if (!irqchip_in_kernel(v->kvm))
> return v->arch.interrupt.pending;
>
> - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */
> + if (kvm_apic_has_interrupt(v) < 0) { /* LAPIC */
> if (kvm_apic_accept_pic_intr(v)) {
> s = pic_irqchip(v->kvm); /* PIC */
> return s->output;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 93c1574..b4f7013 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -270,6 +270,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
> irq->level, irq->trig_mode);
> }
>
> +static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
> +{
> +
> + return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
> + sizeof(val));
> +}
> +
> +static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
> +{
> +
> + return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
> + sizeof(*val));
> +}
> +
> +static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
> +}
> +
> +static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
> +{
> + u8 val;
> + if (pv_eoi_get_user(vcpu, &val) < 0)
> + apic_debug("Can't read EOI MSR value: 0x%llx\n",
> + (unsigned long long)vcpi->arch.pv_eoi.msr_val);
> + return val & 0x1;
> +}
> +
> +static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
> +{
> + if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
> + apic_debug("Can't set EOI MSR value: 0x%llx\n",
> + (unsigned long long)vcpi->arch.pv_eoi.msr_val);
> + return;
> + }
> + __set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
> +static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
> +{
> + if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
> + apic_debug("Can't clear EOI MSR value: 0x%llx\n",
> + (unsigned long long)vcpi->arch.pv_eoi.msr_val);
> + return;
> + }
> + __clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
> static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> {
> int result;
> @@ -482,16 +530,26 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
> }
>
> -static void apic_set_eoi(struct kvm_lapic *apic)
> +static int apic_set_eoi(struct kvm_lapic *apic)
> {
> int vector = apic_find_highest_isr(apic);
> +
> + trace_kvm_eoi(apic, vector);
> +
> /*
> * Not every write EOI will has corresponding ISR,
> * one example is when Kernel check timer on setup_IO_APIC
> */
> if (vector == -1)
> - return;
> + return vector;
>
> + /*
> + * It's legal for guest to ignore the PV EOI optimization
> + * and signal EOI by APIC write. If this happens, clear
> + * PV EOI on guest's behalf.
> + */
> + if (pv_eoi_enabled(apic->vcpu))
> + pv_eoi_clr_pending(apic->vcpu);
> apic_clear_vector(vector, apic->regs + APIC_ISR);
> apic_update_ppr(apic);
>
> @@ -505,6 +563,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
> kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> }
> kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> + return vector;
> }
>
> static void apic_send_ipi(struct kvm_lapic *apic)
> @@ -1085,6 +1144,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
> atomic_set(&apic->lapic_timer.pending, 0);
> if (kvm_vcpu_is_bsp(vcpu))
> vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
> + vcpu->arch.pv_eoi.msr_val = 0;
> apic_update_ppr(apic);
>
> vcpu->arch.apic_arb_prio = 0;
> @@ -1211,9 +1271,10 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
>
> apic_update_ppr(apic);
> highest_irr = apic_find_highest_irr(apic);
> - if ((highest_irr == -1) ||
> - ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> + if (highest_irr == -1)
> return -1;
> + if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> + return -2;
> return highest_irr;
> }
>
> @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> int vector = kvm_apic_has_interrupt(vcpu);
> struct kvm_lapic *apic = vcpu->arch.apic;
>
> - if (vector == -1)
> + /* Detect interrupt nesting and disable EOI optimization */
> + if (pv_eoi_enabled(vcpu) && vector == -2)
> + pv_eoi_clr_pending(vcpu);
> +
> + if (vector < 0)
With interrupt window exiting, the guest will exit:
- as soon as it sets RFLAGS.IF=1 and there is any
interrupt pending in IRR.
- any new interrupt is set in IRR will kick vcpu
out of guest mode and recalculate interrupt-window-exiting.
Doesnt this make this bit unnecessary ?
On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > int vector = kvm_apic_has_interrupt(vcpu);
> > struct kvm_lapic *apic = vcpu->arch.apic;
> >
> > - if (vector == -1)
> > + /* Detect interrupt nesting and disable EOI optimization */
> > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > + pv_eoi_clr_pending(vcpu);
> > +
> > + if (vector < 0)
>
> With interrupt window exiting, the guest will exit:
>
> - as soon as it sets RFLAGS.IF=1 and there is any
> interrupt pending in IRR.
> - any new interrupt is set in IRR will kick vcpu
> out of guest mode and recalculate interrupt-window-exiting.
>
> Doesnt this make this bit unnecessary ?
Looks like we could cut it out. But I'm not sure how architectural it is
that we exit on interrupt window.
I guess there are reasons to exit on interrupt window but
isn't it better to make the feature independent of it?
This almost never happens in my testing anyway, so
however we handle it is unlikely to affect performance.
--
MST
On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > int vector = kvm_apic_has_interrupt(vcpu);
> > > struct kvm_lapic *apic = vcpu->arch.apic;
> > >
> > > - if (vector == -1)
> > > + /* Detect interrupt nesting and disable EOI optimization */
> > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > + pv_eoi_clr_pending(vcpu);
> > > +
> > > + if (vector < 0)
> >
> > With interrupt window exiting, the guest will exit:
> >
> > - as soon as it sets RFLAGS.IF=1 and there is any
> > interrupt pending in IRR.
> > - any new interrupt is set in IRR will kick vcpu
> > out of guest mode and recalculate interrupt-window-exiting.
> >
> > Doesnt this make this bit unnecessary ?
>
> Looks like we could cut it out. But I'm not sure how architectural it is
> that we exit on interrupt window.
We request exit on interrupt window only if there is pending irq that
can be delivered on a guest entry.
> I guess there are reasons to exit on interrupt window but
> isn't it better to make the feature independent of it?
>
> This almost never happens in my testing anyway, so
> however we handle it is unlikely to affect performance.
>
> --
> MST
--
Gleb.
On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > >
> > > > - if (vector == -1)
> > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > + pv_eoi_clr_pending(vcpu);
> > > > +
> > > > + if (vector < 0)
> > >
> > > With interrupt window exiting, the guest will exit:
> > >
> > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > interrupt pending in IRR.
> > > - any new interrupt is set in IRR will kick vcpu
> > > out of guest mode and recalculate interrupt-window-exiting.
> > >
> > > Doesnt this make this bit unnecessary ?
> >
> > Looks like we could cut it out. But I'm not sure how architectural it is
> > that we exit on interrupt window.
> We request exit on interrupt window only if there is pending irq that
> can be delivered on a guest entry.
Aha. If so what Marcelo proposed won't work I think: if we inject A then B
which is lower priority, we need an exit on EOI, we can't inject
immediately.
> > I guess there are reasons to exit on interrupt window but
> > isn't it better to make the feature independent of it?
> >
> > This almost never happens in my testing anyway, so
> > however we handle it is unlikely to affect performance.
> >
> > --
> > MST
>
> --
> Gleb.
On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > >
> > > > - if (vector == -1)
> > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > + pv_eoi_clr_pending(vcpu);
> > > > +
> > > > + if (vector < 0)
> > >
> > > With interrupt window exiting, the guest will exit:
> > >
> > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > interrupt pending in IRR.
> > > - any new interrupt is set in IRR will kick vcpu
> > > out of guest mode and recalculate interrupt-window-exiting.
> > >
> > > Doesnt this make this bit unnecessary ?
> >
> > Looks like we could cut it out. But I'm not sure how architectural it is
> > that we exit on interrupt window.
> > I guess there are reasons to exit on interrupt window but
> > isn't it better to make the feature independent of it?
>
> Hum... not sure. Is it helpful for the Hyper-V interface?
>
> > This almost never happens in my testing anyway, so
> > however we handle it is unlikely to affect performance.
>
> It decreases the amount of state that must be maintained.
>
> BTW there is a bug covered by interrupt window exiting:
>
> vcpu0 host
> - irr 5 set
> - isr 5 set, irr 5 cleared
> - eoi_skip bit not set,
> no other bit set in irr.
> - enter guest
>
> irr 4 set
> kick vcpu0 out of guest mode
>
> - eoi pending bit not set
> (previous interrupt injection
> still pending)
> - skip eoi
>
> If it were not for interrupt window exiting, this would
> inject vector 4 on an unrelated exit who knows how long
> in the future.
>
> Also note optimization depends on the fact that the host
> kicks vcpu out unconditionally (so it is dependent on
> certain kvm implementation details).
>
>
>
Look we can summarize as follows: irq windows exit is
required both before and after this patch.
But it does not make the check above redundant.
On Wed, May 16, 2012 at 02:40:22PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:34:23PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 02:23:45PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> > > > On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > > > > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > > > >
> > > > > > > > > > - if (vector == -1)
> > > > > > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > > > > > +
> > > > > > > > > > + if (vector < 0)
> > > > > > > > >
> > > > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > > > >
> > > > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > > > > > interrupt pending in IRR.
> > > > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > > > >
> > > > > > > > > Doesnt this make this bit unnecessary ?
> > > > > > > >
> > > > > > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > > > > > that we exit on interrupt window.
> > > > > > > We request exit on interrupt window only if there is pending irq that
> > > > > > > can be delivered on a guest entry.
> > > > > >
> > > > > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > > > > which is lower priority, we need an exit on EOI, we can't inject
> > > > > > immediately.
> > > > >
> > > > > Please describe the scenario clearly, i can't see the problem.
> > > >
> > > > During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> > > > but irq window is not requested because 100 can't be injected, During
> > > > EOI exit 100 is injected.
> > >
> > > interrupt window exiting is always requested if IRR is pending. Except
> > > if NMI window is requested (which has higher priority).
> > >
> > This is where we enable irq window. We do it only if there is interrupt
> > pending:
> > if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> > kvm_x86_ops->enable_irq_window(vcpu);
> >
> > kvm_cpu_has_interrupt() checks apic by calling kvm_apic_has_interrupt()
> >
> >
> > kvm_apic_has_interrupt():
> > apic_update_ppr(apic);
> > highest_irr = apic_find_highest_irr(apic);
> > if ((highest_irr == -1) ||
> > ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > return -1;
> > And above checks IRR priority, so we can have IRR set and do not enable
> > irq window exit.
>
> Right, but then you cannot inject interrupt anyway so EOI is not
> necessary. Instead TPR-below-threshold trap is set which handles
> that case. No?
>
No. EOI clears ISR -> PROCPRI is recalculated -> pending interrupt is
injected.
--
Gleb.
On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > >
> > > > > - if (vector == -1)
> > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > + pv_eoi_clr_pending(vcpu);
> > > > > +
> > > > > + if (vector < 0)
> > > >
> > > > With interrupt window exiting, the guest will exit:
> > > >
> > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > interrupt pending in IRR.
> > > > - any new interrupt is set in IRR will kick vcpu
> > > > out of guest mode and recalculate interrupt-window-exiting.
> > > >
> > > > Doesnt this make this bit unnecessary ?
> > >
> > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > that we exit on interrupt window.
> > We request exit on interrupt window only if there is pending irq that
> > can be delivered on a guest entry.
>
> Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> which is lower priority, we need an exit on EOI, we can't inject
> immediately.
Please describe the scenario clearly, i can't see the problem.
On Wed, May 16, 2012 at 02:23:45PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > >
> > > > > > > > - if (vector == -1)
> > > > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > > > +
> > > > > > > > + if (vector < 0)
> > > > > > >
> > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > >
> > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > > > interrupt pending in IRR.
> > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > >
> > > > > > > Doesnt this make this bit unnecessary ?
> > > > > >
> > > > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > > > that we exit on interrupt window.
> > > > > We request exit on interrupt window only if there is pending irq that
> > > > > can be delivered on a guest entry.
> > > >
> > > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > > which is lower priority, we need an exit on EOI, we can't inject
> > > > immediately.
> > >
> > > Please describe the scenario clearly, i can't see the problem.
> >
> > During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> > but irq window is not requested because 100 can't be injected, During
> > EOI exit 100 is injected.
>
> interrupt window exiting is always requested if IRR is pending. Except
> if NMI window is requested (which has higher priority).
>
This is where we enable irq window. We do it only if there is interrupt
pending:
if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
kvm_x86_ops->enable_irq_window(vcpu);
kvm_cpu_has_interrupt() checks apic by calling kvm_apic_has_interrupt()
kvm_apic_has_interrupt():
apic_update_ppr(apic);
highest_irr = apic_find_highest_irr(apic);
if ((highest_irr == -1) ||
((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
return -1;
And above checks IRR priority, so we can have IRR set and do not enable
irq window exit.
> What am i missing here?
--
Gleb.
On Wed, May 16, 2012 at 08:34:23PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 02:23:45PM -0300, Marcelo Tosatti wrote:
> > On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > > > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > > >
> > > > > > > > > - if (vector == -1)
> > > > > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > > > > +
> > > > > > > > > + if (vector < 0)
> > > > > > > >
> > > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > > >
> > > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > > > > interrupt pending in IRR.
> > > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > > >
> > > > > > > > Doesnt this make this bit unnecessary ?
> > > > > > >
> > > > > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > > > > that we exit on interrupt window.
> > > > > > We request exit on interrupt window only if there is pending irq that
> > > > > > can be delivered on a guest entry.
> > > > >
> > > > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > > > which is lower priority, we need an exit on EOI, we can't inject
> > > > > immediately.
> > > >
> > > > Please describe the scenario clearly, i can't see the problem.
> > >
> > > During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> > > but irq window is not requested because 100 can't be injected, During
> > > EOI exit 100 is injected.
> >
> > interrupt window exiting is always requested if IRR is pending. Except
> > if NMI window is requested (which has higher priority).
> >
> This is where we enable irq window. We do it only if there is interrupt
> pending:
> if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> kvm_x86_ops->enable_irq_window(vcpu);
>
> kvm_cpu_has_interrupt() checks apic by calling kvm_apic_has_interrupt()
>
>
> kvm_apic_has_interrupt():
> apic_update_ppr(apic);
> highest_irr = apic_find_highest_irr(apic);
> if ((highest_irr == -1) ||
> ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> return -1;
> And above checks IRR priority, so we can have IRR set and do not enable
> irq window exit.
Right, but then you cannot inject interrupt anyway so EOI is not
necessary. Instead TPR-below-threshold trap is set which handles
that case. No?
On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > >
> > > > > > > - if (vector == -1)
> > > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > > +
> > > > > > > + if (vector < 0)
> > > > > >
> > > > > > With interrupt window exiting, the guest will exit:
> > > > > >
> > > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > > interrupt pending in IRR.
> > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > >
> > > > > > Doesnt this make this bit unnecessary ?
> > > > >
> > > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > > that we exit on interrupt window.
> > > > We request exit on interrupt window only if there is pending irq that
> > > > can be delivered on a guest entry.
> > >
> > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > which is lower priority, we need an exit on EOI, we can't inject
> > > immediately.
> >
> > Please describe the scenario clearly, i can't see the problem.
>
> During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> but irq window is not requested because 100 can't be injected, During
> EOI exit 100 is injected.
interrupt window exiting is always requested if IRR is pending. Except
if NMI window is requested (which has higher priority).
What am i missing here?
On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > >
> > > > > > - if (vector == -1)
> > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > +
> > > > > > + if (vector < 0)
> > > > >
> > > > > With interrupt window exiting, the guest will exit:
> > > > >
> > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > interrupt pending in IRR.
> > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > >
> > > > > Doesnt this make this bit unnecessary ?
> > > >
> > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > that we exit on interrupt window.
> > > > I guess there are reasons to exit on interrupt window but
> > > > isn't it better to make the feature independent of it?
> > >
> > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > >
> > > > This almost never happens in my testing anyway, so
> > > > however we handle it is unlikely to affect performance.
> > >
> > > It decreases the amount of state that must be maintained.
> > >
> > > BTW there is a bug covered by interrupt window exiting:
> > >
> > > vcpu0 host
> > > - irr 5 set
> > > - isr 5 set, irr 5 cleared
> > > - eoi_skip bit not set,
> > > no other bit set in irr.
> > > - enter guest
> > >
> > > irr 4 set
> > > kick vcpu0 out of guest mode
> > >
> > > - eoi pending bit not set
> > > (previous interrupt injection
> > > still pending)
> > > - skip eoi
> > >
> > > If it were not for interrupt window exiting, this would
> > > inject vector 4 on an unrelated exit who knows how long
> > > in the future.
> > >
> > > Also note optimization depends on the fact that the host
> > > kicks vcpu out unconditionally (so it is dependent on
> > > certain kvm implementation details).
> > >
> >
> > Look we can summarize as follows: irq windows exit is
> > required both before and after this patch.
> > But it does not make the check above redundant.
>
> Right, it is not redundant.
>
> The above is still a bug: a case where eoi pending bit is not updated
> properly.
When IRR is set while eoi_skip is enabled, eoi_skip should be cleared.
Michael does your patch do that?
>
> How come is this compatible with hyper-v again? Enlight me.
--
Gleb.
On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > >
> > > > > > - if (vector == -1)
> > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > +
> > > > > > + if (vector < 0)
> > > > >
> > > > > With interrupt window exiting, the guest will exit:
> > > > >
> > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > interrupt pending in IRR.
> > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > >
> > > > > Doesnt this make this bit unnecessary ?
> > > >
> > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > that we exit on interrupt window.
> > > > I guess there are reasons to exit on interrupt window but
> > > > isn't it better to make the feature independent of it?
> > >
> > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > >
> > > > This almost never happens in my testing anyway, so
> > > > however we handle it is unlikely to affect performance.
> > >
> > > It decreases the amount of state that must be maintained.
> > >
> > > BTW there is a bug covered by interrupt window exiting:
> > >
> > > vcpu0 host
> > > - irr 5 set
> > > - isr 5 set, irr 5 cleared
> > > - eoi_skip bit not set,
> > > no other bit set in irr.
> > > - enter guest
> > >
> > > irr 4 set
> > > kick vcpu0 out of guest mode
> > >
> > > - eoi pending bit not set
> > > (previous interrupt injection
> > > still pending)
> > > - skip eoi
> > >
> > > If it were not for interrupt window exiting, this would
> > > inject vector 4 on an unrelated exit who knows how long
> > > in the future.
> > >
> > > Also note optimization depends on the fact that the host
> > > kicks vcpu out unconditionally (so it is dependent on
> > > certain kvm implementation details).
> > >
> >
> > Look we can summarize as follows: irq windows exit is
> > required both before and after this patch.
> > But it does not make the check above redundant.
>
> Right, it is not redundant.
>
> The above is still a bug: a case where eoi pending bit is not updated
> properly.
>
> How come is this compatible with hyper-v again? Enlight me.
Set kvm
to point pv_eoi msr data to va and it will just work.
--
MST
On Wed, May 16, 2012 at 09:25:20PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> > On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > >
> > > > > > > - if (vector == -1)
> > > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > > +
> > > > > > > + if (vector < 0)
> > > > > >
> > > > > > With interrupt window exiting, the guest will exit:
> > > > > >
> > > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > > interrupt pending in IRR.
> > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > >
> > > > > > Doesnt this make this bit unnecessary ?
> > > > >
> > > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > > that we exit on interrupt window.
> > > > > I guess there are reasons to exit on interrupt window but
> > > > > isn't it better to make the feature independent of it?
> > > >
> > > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > > >
> > > > > This almost never happens in my testing anyway, so
> > > > > however we handle it is unlikely to affect performance.
> > > >
> > > > It decreases the amount of state that must be maintained.
> > > >
> > > > BTW there is a bug covered by interrupt window exiting:
> > > >
> > > > vcpu0 host
> > > > - irr 5 set
> > > > - isr 5 set, irr 5 cleared
> > > > - eoi_skip bit not set,
> > > > no other bit set in irr.
> > > > - enter guest
> > > >
> > > > irr 4 set
> > > > kick vcpu0 out of guest mode
> > > >
> > > > - eoi pending bit not set
> > > > (previous interrupt injection
> > > > still pending)
> > > > - skip eoi
> > > >
> > > > If it were not for interrupt window exiting, this would
> > > > inject vector 4 on an unrelated exit who knows how long
> > > > in the future.
> > > >
> > > > Also note optimization depends on the fact that the host
> > > > kicks vcpu out unconditionally (so it is dependent on
> > > > certain kvm implementation details).
> > > >
> > >
> > > Look we can summarize as follows: irq windows exit is
> > > required both before and after this patch.
> > > But it does not make the check above redundant.
> >
> > Right, it is not redundant.
> >
> > The above is still a bug: a case where eoi pending bit is not updated
> > properly.
> When IRR is set while eoi_skip is enabled, eoi_skip should be cleared.
> Michael does your patch do that?
I think this does it:
/* Detect interrupt nesting and disable EOI optimization */
if (pv_eoi_enabled(vcpu) && vector == -2)
pv_eoi_clr_pending(vcpu);
-2 is returned when irr is set.
> >
> > How come is this compatible with hyper-v again? Enlight me.
>
> --
> Gleb.
On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > >
> > > > > > - if (vector == -1)
> > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > +
> > > > > > + if (vector < 0)
> > > > >
> > > > > With interrupt window exiting, the guest will exit:
> > > > >
> > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > interrupt pending in IRR.
> > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > >
> > > > > Doesnt this make this bit unnecessary ?
> > > >
> > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > that we exit on interrupt window.
> > > We request exit on interrupt window only if there is pending irq that
> > > can be delivered on a guest entry.
> >
> > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > which is lower priority, we need an exit on EOI, we can't inject
> > immediately.
>
> Please describe the scenario clearly, i can't see the problem.
During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
but irq window is not requested because 100 can't be injected, During
EOI exit 100 is injected.
--
Gleb.
On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > int vector = kvm_apic_has_interrupt(vcpu);
> > > struct kvm_lapic *apic = vcpu->arch.apic;
> > >
> > > - if (vector == -1)
> > > + /* Detect interrupt nesting and disable EOI optimization */
> > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > + pv_eoi_clr_pending(vcpu);
> > > +
> > > + if (vector < 0)
> >
> > With interrupt window exiting, the guest will exit:
> >
> > - as soon as it sets RFLAGS.IF=1 and there is any
> > interrupt pending in IRR.
> > - any new interrupt is set in IRR will kick vcpu
> > out of guest mode and recalculate interrupt-window-exiting.
> >
> > Doesnt this make this bit unnecessary ?
>
> Looks like we could cut it out. But I'm not sure how architectural it is
> that we exit on interrupt window.
> I guess there are reasons to exit on interrupt window but
> isn't it better to make the feature independent of it?
Hum... not sure. Is it helpful for the Hyper-V interface?
> This almost never happens in my testing anyway, so
> however we handle it is unlikely to affect performance.
It decreases the amount of state that must be maintained.
BTW there is a bug covered by interrupt window exiting:
vcpu0 host
- irr 5 set
- isr 5 set, irr 5 cleared
- eoi_skip bit not set,
no other bit set in irr.
- enter guest
irr 4 set
kick vcpu0 out of guest mode
- eoi pending bit not set
(previous interrupt injection
still pending)
- skip eoi
If it were not for interrupt window exiting, this would
inject vector 4 on an unrelated exit who knows how long
in the future.
Also note optimization depends on the fact that the host
kicks vcpu out unconditionally (so it is dependent on
certain kvm implementation details).
On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > >
> > > > > > - if (vector == -1)
> > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > +
> > > > > > + if (vector < 0)
> > > > >
> > > > > With interrupt window exiting, the guest will exit:
> > > > >
> > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > interrupt pending in IRR.
> > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > >
> > > > > Doesnt this make this bit unnecessary ?
> > > >
> > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > that we exit on interrupt window.
> > > > I guess there are reasons to exit on interrupt window but
> > > > isn't it better to make the feature independent of it?
> > >
> > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > >
> > > > This almost never happens in my testing anyway, so
> > > > however we handle it is unlikely to affect performance.
> > >
> > > It decreases the amount of state that must be maintained.
> > >
> > > BTW there is a bug covered by interrupt window exiting:
> > >
> > > vcpu0 host
> > > - irr 5 set
> > > - isr 5 set, irr 5 cleared
> > > - eoi_skip bit not set,
> > > no other bit set in irr.
> > > - enter guest
> > >
> > > irr 4 set
> > > kick vcpu0 out of guest mode
> > >
> > > - eoi pending bit not set
> > > (previous interrupt injection
> > > still pending)
> > > - skip eoi
> > >
> > > If it were not for interrupt window exiting, this would
> > > inject vector 4 on an unrelated exit who knows how long
> > > in the future.
> > >
> > > Also note optimization depends on the fact that the host
> > > kicks vcpu out unconditionally (so it is dependent on
> > > certain kvm implementation details).
> > >
> >
> > Look we can summarize as follows: irq windows exit is
> > required both before and after this patch.
> > But it does not make the check above redundant.
>
> Right, it is not redundant.
>
> The above is still a bug: a case where eoi pending bit is not updated
> properly.
The diagram shows pv eoi in guest is disabled so no need
for pending bit?
> How come is this compatible with hyper-v again? Enlight me.
On Wed, May 16, 2012 at 02:23:45PM -0300, Marcelo Tosatti wrote:
> On Wed, May 16, 2012 at 08:21:55PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 02:04:27PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 07:50:48PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 07:32:06PM +0300, Gleb Natapov wrote:
> > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > >
> > > > > > > > - if (vector == -1)
> > > > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > > > +
> > > > > > > > + if (vector < 0)
> > > > > > >
> > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > >
> > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > > > interrupt pending in IRR.
> > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > >
> > > > > > > Doesnt this make this bit unnecessary ?
> > > > > >
> > > > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > > > that we exit on interrupt window.
> > > > > We request exit on interrupt window only if there is pending irq that
> > > > > can be delivered on a guest entry.
> > > >
> > > > Aha. If so what Marcelo proposed won't work I think: if we inject A then B
> > > > which is lower priority, we need an exit on EOI, we can't inject
> > > > immediately.
> > >
> > > Please describe the scenario clearly, i can't see the problem.
> >
> > During vcpu entry there are two IRRs set 100 and 200. 200 is injected,
> > but irq window is not requested because 100 can't be injected, During
> > EOI exit 100 is injected.
>
> interrupt window exiting is always requested if IRR is pending. Except
> if NMI window is requested (which has higher priority).
>
> What am i missing here?
That if EOI was not yet signalled by guest when we inject
the low priority irq, then windows exiting is not
enough we need an exit on EOI.
On Wed, May 16, 2012 at 09:29:49PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 09:25:20PM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> > > On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > >
> > > > > > > > - if (vector == -1)
> > > > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > > > +
> > > > > > > > + if (vector < 0)
> > > > > > >
> > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > >
> > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > > > interrupt pending in IRR.
> > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > >
> > > > > > > Doesnt this make this bit unnecessary ?
> > > > > >
> > > > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > > > that we exit on interrupt window.
> > > > > > I guess there are reasons to exit on interrupt window but
> > > > > > isn't it better to make the feature independent of it?
> > > > >
> > > > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > > > >
> > > > > > This almost never happens in my testing anyway, so
> > > > > > however we handle it is unlikely to affect performance.
> > > > >
> > > > > It decreases the amount of state that must be maintained.
> > > > >
> > > > > BTW there is a bug covered by interrupt window exiting:
> > > > >
> > > > > vcpu0 host
> > > > > - irr 5 set
> > > > > - isr 5 set, irr 5 cleared
> > > > > - eoi_skip bit not set,
> > > > > no other bit set in irr.
> > > > > - enter guest
> > > > >
> > > > > irr 4 set
> > > > > kick vcpu0 out of guest mode
> > > > >
> > > > > - eoi pending bit not set
> > > > > (previous interrupt injection
> > > > > still pending)
> > > > > - skip eoi
> > > > >
> > > > > If it were not for interrupt window exiting, this would
> > > > > inject vector 4 on an unrelated exit who knows how long
> > > > > in the future.
> > > > >
> > > > > Also note optimization depends on the fact that the host
> > > > > kicks vcpu out unconditionally (so it is dependent on
> > > > > certain kvm implementation details).
> > > > >
> > > >
> > > > Look we can summarize as follows: irq windows exit is
> > > > required both before and after this patch.
> > > > But it does not make the check above redundant.
> > >
> > > Right, it is not redundant.
> > >
> > > The above is still a bug: a case where eoi pending bit is not updated
> > > properly.
> > When IRR is set while eoi_skip is enabled, eoi_skip should be cleared.
> > Michael does your patch do that?
>
> I think this does it:
> /* Detect interrupt nesting and disable EOI optimization */
> if (pv_eoi_enabled(vcpu) && vector == -2)
> pv_eoi_clr_pending(vcpu);
>
> -2 is returned when irr is set.
>
This code is reached from kvm_cpu_get_interrupt(), but this function will
not be called in above scenario.
>
> > >
> > > How come is this compatible with hyper-v again? Enlight me.
> >
> > --
> > Gleb.
--
Gleb.
On Wed, May 16, 2012 at 09:38:22PM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 09:29:49PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 09:25:20PM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 03:15:00PM -0300, Marcelo Tosatti wrote:
> > > > On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > > > > > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > > > > > >
> > > > > > > > > - if (vector == -1)
> > > > > > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > > > > > + pv_eoi_clr_pending(vcpu);
> > > > > > > > > +
> > > > > > > > > + if (vector < 0)
> > > > > > > >
> > > > > > > > With interrupt window exiting, the guest will exit:
> > > > > > > >
> > > > > > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > > > > > interrupt pending in IRR.
> > > > > > > > - any new interrupt is set in IRR will kick vcpu
> > > > > > > > out of guest mode and recalculate interrupt-window-exiting.
> > > > > > > >
> > > > > > > > Doesnt this make this bit unnecessary ?
> > > > > > >
> > > > > > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > > > > > that we exit on interrupt window.
> > > > > > > I guess there are reasons to exit on interrupt window but
> > > > > > > isn't it better to make the feature independent of it?
> > > > > >
> > > > > > Hum... not sure. Is it helpful for the Hyper-V interface?
> > > > > >
> > > > > > > This almost never happens in my testing anyway, so
> > > > > > > however we handle it is unlikely to affect performance.
> > > > > >
> > > > > > It decreases the amount of state that must be maintained.
> > > > > >
> > > > > > BTW there is a bug covered by interrupt window exiting:
> > > > > >
> > > > > > vcpu0 host
> > > > > > - irr 5 set
> > > > > > - isr 5 set, irr 5 cleared
> > > > > > - eoi_skip bit not set,
> > > > > > no other bit set in irr.
> > > > > > - enter guest
> > > > > >
> > > > > > irr 4 set
> > > > > > kick vcpu0 out of guest mode
> > > > > >
> > > > > > - eoi pending bit not set
> > > > > > (previous interrupt injection
> > > > > > still pending)
> > > > > > - skip eoi
> > > > > >
> > > > > > If it were not for interrupt window exiting, this would
> > > > > > inject vector 4 on an unrelated exit who knows how long
> > > > > > in the future.
> > > > > >
> > > > > > Also note optimization depends on the fact that the host
> > > > > > kicks vcpu out unconditionally (so it is dependent on
> > > > > > certain kvm implementation details).
> > > > > >
> > > > >
> > > > > Look we can summarize as follows: irq windows exit is
> > > > > required both before and after this patch.
> > > > > But it does not make the check above redundant.
> > > >
> > > > Right, it is not redundant.
> > > >
> > > > The above is still a bug: a case where eoi pending bit is not updated
> > > > properly.
> > > When IRR is set while eoi_skip is enabled, eoi_skip should be cleared.
> > > Michael does your patch do that?
> >
> > I think this does it:
> > /* Detect interrupt nesting and disable EOI optimization */
> > if (pv_eoi_enabled(vcpu) && vector == -2)
> > pv_eoi_clr_pending(vcpu);
> >
> > -2 is returned when irr is set.
> >
> This code is reached from kvm_cpu_get_interrupt(), but this function will
> not be called in above scenario.
I think I see. So this shall fix it also makes code cleaner
(no -2 hack). Right? kvm_apic_has_interrupt is called correct?
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b4f7013..5a38e34 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
highest_irr = apic_find_highest_irr(apic);
if (highest_irr == -1)
return -1;
- if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
- return -2;
+ /* Detect interrupt nesting and disable EOI optimization */
+ if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
+ if (pv_eoi_enabled(vcpu))
+ pv_eoi_clr_pending(vcpu);
+ return -1;
+ }
return highest_irr;
}
@@ -1306,10 +1310,6 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
int vector = kvm_apic_has_interrupt(vcpu);
struct kvm_lapic *apic = vcpu->arch.apic;
- /* Detect interrupt nesting and disable EOI optimization */
- if (pv_eoi_enabled(vcpu) && vector == -2)
- pv_eoi_clr_pending(vcpu);
-
if (vector < 0)
return -1;
On Wed, May 16, 2012 at 08:58:57PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 02:20:58PM -0300, Marcelo Tosatti wrote:
> > On Wed, May 16, 2012 at 07:22:47PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 12:49:40PM -0300, Marcelo Tosatti wrote:
> > > > > @@ -1245,9 +1306,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > > > int vector = kvm_apic_has_interrupt(vcpu);
> > > > > struct kvm_lapic *apic = vcpu->arch.apic;
> > > > >
> > > > > - if (vector == -1)
> > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > + if (pv_eoi_enabled(vcpu) && vector == -2)
> > > > > + pv_eoi_clr_pending(vcpu);
> > > > > +
> > > > > + if (vector < 0)
> > > >
> > > > With interrupt window exiting, the guest will exit:
> > > >
> > > > - as soon as it sets RFLAGS.IF=1 and there is any
> > > > interrupt pending in IRR.
> > > > - any new interrupt is set in IRR will kick vcpu
> > > > out of guest mode and recalculate interrupt-window-exiting.
> > > >
> > > > Doesnt this make this bit unnecessary ?
> > >
> > > Looks like we could cut it out. But I'm not sure how architectural it is
> > > that we exit on interrupt window.
> > > I guess there are reasons to exit on interrupt window but
> > > isn't it better to make the feature independent of it?
> >
> > Hum... not sure. Is it helpful for the Hyper-V interface?
> >
> > > This almost never happens in my testing anyway, so
> > > however we handle it is unlikely to affect performance.
> >
> > It decreases the amount of state that must be maintained.
> >
> > BTW there is a bug covered by interrupt window exiting:
> >
> > vcpu0 host
> > - irr 5 set
> > - isr 5 set, irr 5 cleared
> > - eoi_skip bit not set,
> > no other bit set in irr.
> > - enter guest
> >
> > irr 4 set
> > kick vcpu0 out of guest mode
> >
> > - eoi pending bit not set
> > (previous interrupt injection
> > still pending)
> > - skip eoi
> >
> > If it were not for interrupt window exiting, this would
> > inject vector 4 on an unrelated exit who knows how long
> > in the future.
> >
> > Also note optimization depends on the fact that the host
> > kicks vcpu out unconditionally (so it is dependent on
> > certain kvm implementation details).
> >
>
> Look we can summarize as follows: irq windows exit is
> required both before and after this patch.
> But it does not make the check above redundant.
Right, it is not redundant.
The above is still a bug: a case where eoi pending bit is not updated
properly.
How come is this compatible with hyper-v again? Enlight me.
On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > not be called in above scenario.
>
> I think I see. So this shall fix it also makes code cleaner
> (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
Yes, but its used by multiple callsites. Best to unify it (both setting
and clearing) in KVM_REQ_EVENT processing before guest entry.
On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > not be called in above scenario.
>
> I think I see. So this shall fix it also makes code cleaner
> (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b4f7013..5a38e34 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> highest_irr = apic_find_highest_irr(apic);
> if (highest_irr == -1)
> return -1;
> - if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> - return -2;
> + /* Detect interrupt nesting and disable EOI optimization */
> + if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> + if (pv_eoi_enabled(vcpu))
> + pv_eoi_clr_pending(vcpu);
> + return -1;
> + }
> return highest_irr;
> }
>
I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
state.
> @@ -1306,10 +1310,6 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> int vector = kvm_apic_has_interrupt(vcpu);
> struct kvm_lapic *apic = vcpu->arch.apic;
>
> - /* Detect interrupt nesting and disable EOI optimization */
> - if (pv_eoi_enabled(vcpu) && vector == -2)
> - pv_eoi_clr_pending(vcpu);
> -
> if (vector < 0)
> return -1;
>
--
Gleb.
On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > not be called in above scenario.
> >
> > I think I see. So this shall fix it also makes code cleaner
> > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index b4f7013..5a38e34 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > highest_irr = apic_find_highest_irr(apic);
> > if (highest_irr == -1)
> > return -1;
> > - if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > - return -2;
> > + /* Detect interrupt nesting and disable EOI optimization */
> > + if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > + if (pv_eoi_enabled(vcpu))
> > + pv_eoi_clr_pending(vcpu);
> > + return -1;
> > + }
> > return highest_irr;
> > }
> >
> I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> state.
OK, so let's rename it so it's clear it can mutate state?
> > @@ -1306,10 +1310,6 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > int vector = kvm_apic_has_interrupt(vcpu);
> > struct kvm_lapic *apic = vcpu->arch.apic;
> >
> > - /* Detect interrupt nesting and disable EOI optimization */
> > - if (pv_eoi_enabled(vcpu) && vector == -2)
> > - pv_eoi_clr_pending(vcpu);
> > -
> > if (vector < 0)
> > return -1;
> >
>
> --
> Gleb.
On Thu, May 17, 2012 at 10:49:47AM +0300, Michael S. Tsirkin wrote:
> On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > not be called in above scenario.
> > >
> > > I think I see. So this shall fix it also makes code cleaner
> > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index b4f7013..5a38e34 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > > highest_irr = apic_find_highest_irr(apic);
> > > if (highest_irr == -1)
> > > return -1;
> > > - if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > - return -2;
> > > + /* Detect interrupt nesting and disable EOI optimization */
> > > + if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > + if (pv_eoi_enabled(vcpu))
> > > + pv_eoi_clr_pending(vcpu);
> > > + return -1;
> > > + }
> > > return highest_irr;
> > > }
> > >
> > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > state.
>
> OK, so let's rename it so it's clear it can mutate state?
>
It is not about the name, it is about how function is used, and it is
used to check if there is something to deliver. We should clear pv_eoi
during interrupt delivery to apic. May be setting vcpu request bit and
calling pv_eoi_clr_pending() on a guest entry.
> > > @@ -1306,10 +1310,6 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > > int vector = kvm_apic_has_interrupt(vcpu);
> > > struct kvm_lapic *apic = vcpu->arch.apic;
> > >
> > > - /* Detect interrupt nesting and disable EOI optimization */
> > > - if (pv_eoi_enabled(vcpu) && vector == -2)
> > > - pv_eoi_clr_pending(vcpu);
> > > -
> > > if (vector < 0)
> > > return -1;
> > >
> >
> > --
> > Gleb.
--
Gleb.
On 05/17/2012 10:49 AM, Michael S. Tsirkin wrote:
> On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > not be called in above scenario.
> > >
> > > I think I see. So this shall fix it also makes code cleaner
> > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index b4f7013..5a38e34 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > > highest_irr = apic_find_highest_irr(apic);
> > > if (highest_irr == -1)
> > > return -1;
> > > - if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > - return -2;
> > > + /* Detect interrupt nesting and disable EOI optimization */
> > > + if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > + if (pv_eoi_enabled(vcpu))
> > > + pv_eoi_clr_pending(vcpu);
> > > + return -1;
> > > + }
> > > return highest_irr;
> > > }
> > >
> > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > state.
>
> OK, so let's rename it so it's clear it can mutate state?
>
No, let's refactor this so it makes sense. The {has|get}_interrupt
split is the cause of the problem, I think. We need a single function,
with callbacks that are called when an event happens. The callbacks can
request an irq window exit, inject an interrupt, play with pveoi, or
cause a #vmexit.
--
error compiling committee.c: too many arguments to function
On Thu, May 17, 2012 at 10:57:33AM +0300, Avi Kivity wrote:
> On 05/17/2012 10:49 AM, Michael S. Tsirkin wrote:
> > On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > > not be called in above scenario.
> > > >
> > > > I think I see. So this shall fix it also makes code cleaner
> > > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > > >
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index b4f7013..5a38e34 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > > > highest_irr = apic_find_highest_irr(apic);
> > > > if (highest_irr == -1)
> > > > return -1;
> > > > - if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > > - return -2;
> > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > + if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > > + if (pv_eoi_enabled(vcpu))
> > > > + pv_eoi_clr_pending(vcpu);
> > > > + return -1;
> > > > + }
> > > > return highest_irr;
> > > > }
> > > >
> > > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > > state.
> >
> > OK, so let's rename it so it's clear it can mutate state?
> >
>
> No, let's refactor this so it makes sense. The {has|get}_interrupt
> split is the cause of the problem, I think. We need a single function,
> with callbacks that are called when an event happens. The callbacks can
> request an irq window exit, inject an interrupt, play with pveoi, or
> cause a #vmexit.
>
Not sure what do you mean here. I kind of like the code we have now, but
this may be because I understand it :)
--
Gleb.
On Thu, May 17, 2012 at 10:57:33AM +0300, Avi Kivity wrote:
> On 05/17/2012 10:49 AM, Michael S. Tsirkin wrote:
> > On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > > not be called in above scenario.
> > > >
> > > > I think I see. So this shall fix it also makes code cleaner
> > > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > > >
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index b4f7013..5a38e34 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > > > highest_irr = apic_find_highest_irr(apic);
> > > > if (highest_irr == -1)
> > > > return -1;
> > > > - if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > > - return -2;
> > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > + if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > > + if (pv_eoi_enabled(vcpu))
> > > > + pv_eoi_clr_pending(vcpu);
> > > > + return -1;
> > > > + }
> > > > return highest_irr;
> > > > }
> > > >
> > > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > > state.
> >
> > OK, so let's rename it so it's clear it can mutate state?
> >
>
> No, let's refactor this so it makes sense. The {has|get}_interrupt
> split is the cause of the problem, I think. We need a single function,
> with callbacks that are called when an event happens. The callbacks can
> request an irq window exit, inject an interrupt, play with pveoi, or
> cause a #vmexit.
OK will look into this next week.
> --
> error compiling committee.c: too many arguments to function
On Thu, May 17, 2012 at 12:10:55PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 17, 2012 at 10:57:33AM +0300, Avi Kivity wrote:
> > On 05/17/2012 10:49 AM, Michael S. Tsirkin wrote:
> > > On Thu, May 17, 2012 at 10:28:41AM +0300, Gleb Natapov wrote:
> > > > On Wed, May 16, 2012 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > > > This code is reached from kvm_cpu_get_interrupt(), but this function will
> > > > > > not be called in above scenario.
> > > > >
> > > > > I think I see. So this shall fix it also makes code cleaner
> > > > > (no -2 hack). Right? kvm_apic_has_interrupt is called correct?
> > > > >
> > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > index b4f7013..5a38e34 100644
> > > > > --- a/arch/x86/kvm/lapic.c
> > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > @@ -1273,8 +1273,12 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> > > > > highest_irr = apic_find_highest_irr(apic);
> > > > > if (highest_irr == -1)
> > > > > return -1;
> > > > > - if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > > > > - return -2;
> > > > > + /* Detect interrupt nesting and disable EOI optimization */
> > > > > + if ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)) {
> > > > > + if (pv_eoi_enabled(vcpu))
> > > > > + pv_eoi_clr_pending(vcpu);
> > > > > + return -1;
> > > > > + }
> > > > > return highest_irr;
> > > > > }
> > > > >
> > > > I do not like it. kvm_apic_has_interrupt() does not suppose to mutate
> > > > state.
> > >
> > > OK, so let's rename it so it's clear it can mutate state?
> > >
> >
> > No, let's refactor this so it makes sense. The {has|get}_interrupt
> > split is the cause of the problem, I think. We need a single function,
> > with callbacks that are called when an event happens. The callbacks can
> > request an irq window exit, inject an interrupt, play with pveoi, or
> > cause a #vmexit.
>
> OK will look into this next week.
>
What about looking into my suggestion with setting vcpu->request bit? I
think totally refactoring irq handling path should not be done as part
of this series.
--
Gleb.
On 05/17/2012 11:07 AM, Gleb Natapov wrote:
> >
> > No, let's refactor this so it makes sense. The {has|get}_interrupt
> > split is the cause of the problem, I think. We need a single function,
> > with callbacks that are called when an event happens. The callbacks can
> > request an irq window exit, inject an interrupt, play with pveoi, or
> > cause a #vmexit.
> >
> Not sure what do you mean here. I kind of like the code we have now, but
> this may be because I understand it :)
Right now we have
if (has_interrupt)
do something
if (get_interrupt)
do_something_else
this duplicates some of the logic and causes non-atomicty (which isn't a
problem per se, but requires us to think of what happens if conditions
change between the two steps).
What I'm thinking of is
void process_interrupt(bool (*handle)());
Where the return value tells us whether the interrupt was accepted by
the handler. The callback could decide to enable the irq window, to
queue the interrupt, or to #vmexit (note that the latter can return
either true or false, depending on whether vmx is configured to ack the
interrupt or not; svm would return true here if interrupts are intercepted).
--
error compiling committee.c: too many arguments to function
On 05/17/2012 12:12 PM, Gleb Natapov wrote:
> >
> > OK will look into this next week.
> >
> What about looking into my suggestion with setting vcpu->request bit? I
> think totally refactoring irq handling path should not be done as part
> of this series.
>
Certainly not as part of it, but maybe as a prerequisite.
--
error compiling committee.c: too many arguments to function
On 05/16/2012 02:46 PM, Michael S. Tsirkin wrote:
> Implementation of PV EOI using shared memory.
> This reduces the number of exits an interrupt
> causes as much as by half.
>
> The idea is simple: there's a bit, per APIC, in guest memory,
> that tells the guest that it does not need EOI.
> We set it before injecting an interrupt and clear
> before injecting a nested one. Guest tests it using
> a test and clear operation - this is necessary
> so that host can detect interrupt nesting -
> and if set, it can skip the EOI MSR.
>
> There's a new MSR to set the address of said register
> in guest memory. Otherwise not much changed:
> - Guest EOI is not required
> - Register is tested & ISR is automatically cleared on exit
>
> For testing results see description of previous patch
> 'kvm_para: guest side for eoi avoidance'.
>
>
> + /*
> + * It's legal for guest to ignore the PV EOI optimization
> + * and signal EOI by APIC write. If this happens, clear
> + * PV EOI on guest's behalf.
> + */
> + if (pv_eoi_enabled(apic->vcpu))
> + pv_eoi_clr_pending(apic->vcpu);
I'm a little worried about all the clr_pending() calls scattered
around. What happens if we forget one? In particular, we might miss
one on nested vmentry.
A safer path is to always clear it, but to enable it again during
reentry if all conditions are satisified. Might be a little slower though.
--
error compiling committee.c: too many arguments to function
On Thu, May 17, 2012 at 12:24:30PM +0300, Avi Kivity wrote:
> On 05/17/2012 11:07 AM, Gleb Natapov wrote:
> > >
> > > No, let's refactor this so it makes sense. The {has|get}_interrupt
> > > split is the cause of the problem, I think. We need a single function,
> > > with callbacks that are called when an event happens. The callbacks can
> > > request an irq window exit, inject an interrupt, play with pveoi, or
> > > cause a #vmexit.
> > >
> > Not sure what do you mean here. I kind of like the code we have now, but
> > this may be because I understand it :)
>
> Right now we have
>
> if (has_interrupt)
> do something
> if (get_interrupt)
> do_something_else
>
Not exactly. Now we have:
if (has_interrupt && can inject)
inject(get_interrupt())
if (still has_interrupt)
notify me when I can inject it.
There is not if(get_interrupt).
> this duplicates some of the logic and causes non-atomicty (which isn't a
> problem per se, but requires us to think of what happens if conditions
> change between the two steps).
>
> What I'm thinking of is
>
> void process_interrupt(bool (*handle)());
Why we even want to pass different handle() to the function?
>
> Where the return value tells us whether the interrupt was accepted by
> the handler. The callback could decide to enable the irq window, to
> queue the interrupt, or to #vmexit (note that the latter can return
Queuing interrupt and requesting irq window ate not mutually exclusive.
> either true or false, depending on whether vmx is configured to ack the
> interrupt or not; svm would return true here if interrupts are intercepted).
>
> --
> error compiling committee.c: too many arguments to function
--
Gleb.