2016-12-14 10:59:19

by Denis Plotnikov

Subject: [PATCH v2] KVM: x86: avoid redundant REQ_EVENT

When processing KVM_REQ_EVENT, apic_update_ppr is called which may set
KVM_REQ_EVENT again if the recalculated value of PPR becomes smaller
than the previous one. This results in cancelling the guest entry and
reiterating in vcpu_enter_guest.

However this is unnecessary because at this point KVM_REQ_EVENT is
already being processed and there are no other changes in the lapic
that may require full-fledged state recalculation.

This situation is often hit on systems with TPR shadow, where the
TPR can be updated by the guest without a vmexit, so that the first
apic_update_ppr to notice it is exactly the one called while
processing KVM_REQ_EVENT.

To avoid this, introduce a parameter in apic_update_ppr that allows
suppressing the setting of KVM_REQ_EVENT, and use it on the paths
called from KVM_REQ_EVENT processing.
Also add the parameter to kvm_cpu_get_interrupt and pass it through
to apic_update_ppr, to make sure the suppression of KVM_REQ_EVENT is
done only in the KVM_REQ_EVENT processing path.

This micro-optimization gives a 10% performance increase on a
synthetic test doing a lot of IPC in Windows using window messages.

Reviewed-by: Roman Kagan <[email protected]>
Signed-off-by: Denis Plotnikov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/irq.c | 8 ++++----
arch/x86/kvm/lapic.c | 26 +++++++++++++-------------
arch/x86/kvm/lapic.h | 4 ++--
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
6 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bdde807..628c1d1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1331,7 +1331,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
-int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
+int kvm_cpu_get_interrupt(struct kvm_vcpu *v, bool make_req);
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 60d91c9..b6125be 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -82,7 +82,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
if (kvm_vcpu_apicv_active(v))
return 0;

- return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
+ return kvm_apic_has_interrupt(v, false) != -1; /* LAPIC */
}

/*
@@ -97,7 +97,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
if (kvm_cpu_has_extint(v))
return 1;

- return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
+ return kvm_apic_has_interrupt(v, true) != -1; /* LAPIC */
}
EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);

@@ -122,7 +122,7 @@ static int kvm_cpu_get_extint(struct kvm_vcpu *v)
/*
* Read pending interrupt vector and intack.
*/
-int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
+int kvm_cpu_get_interrupt(struct kvm_vcpu *v, bool make_req)
{
int vector;

@@ -134,7 +134,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
if (vector != -1)
return vector; /* PIC */

- return kvm_get_apic_interrupt(v); /* APIC */
+ return kvm_get_apic_interrupt(v, make_req); /* APIC */
}
EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6f69340..7ec777d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -544,7 +544,7 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
}

-static void apic_update_ppr(struct kvm_lapic *apic)
+static void apic_update_ppr(struct kvm_lapic *apic, bool make_req)
{
u32 tpr, isrv, ppr, old_ppr;
int isr;
@@ -564,7 +564,7 @@ static void apic_update_ppr(struct kvm_lapic *apic)

if (old_ppr != ppr) {
kvm_lapic_set_reg(apic, APIC_PROCPRI, ppr);
- if (ppr < old_ppr)
+ if (make_req && ppr < old_ppr)
kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
}
}
@@ -572,7 +572,7 @@ static void apic_update_ppr(struct kvm_lapic *apic)
static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
{
kvm_lapic_set_reg(apic, APIC_TASKPRI, tpr);
- apic_update_ppr(apic);
+ apic_update_ppr(apic, true);
}

static bool kvm_apic_broadcast(struct kvm_lapic *apic, u32 mda)
@@ -1032,7 +1032,7 @@ static int apic_set_eoi(struct kvm_lapic *apic)
return vector;

apic_clear_isr(vector, apic);
- apic_update_ppr(apic);
+ apic_update_ppr(apic, true);

if (test_bit(vector, vcpu_to_synic(apic->vcpu)->vec_bitmap))
kvm_hv_synic_send_eoi(apic->vcpu, vector);
@@ -1147,7 +1147,7 @@ static u32 __apic_read(struct kvm_lapic *apic, unsigned int offset)
val = apic_get_tmcct(apic);
break;
case APIC_PROCPRI:
- apic_update_ppr(apic);
+ apic_update_ppr(apic, true);
val = kvm_lapic_get_reg(apic, offset);
break;
case APIC_TASKPRI:
@@ -1841,7 +1841,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
kvm_lapic_set_base(vcpu,
vcpu->arch.apic_base | MSR_IA32_APICBASE_BSP);
vcpu->arch.pv_eoi.msr_val = 0;
- apic_update_ppr(apic);
+ apic_update_ppr(apic, true);

vcpu->arch.apic_arb_prio = 0;
vcpu->arch.apic_attention = 0;
@@ -1956,7 +1956,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
return -ENOMEM;
}

-int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
+int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu, bool make_req)
{
struct kvm_lapic *apic = vcpu->arch.apic;
int highest_irr;
@@ -1964,7 +1964,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
if (!apic_enabled(apic))
return -1;

- apic_update_ppr(apic);
+ apic_update_ppr(apic, make_req);
highest_irr = apic_find_highest_irr(apic);
if ((highest_irr == -1) ||
((highest_irr & 0xF0) <= kvm_lapic_get_reg(apic, APIC_PROCPRI)))
@@ -1997,9 +1997,9 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
}
}

-int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
+int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu, bool make_req)
{
- int vector = kvm_apic_has_interrupt(vcpu);
+ int vector = kvm_apic_has_interrupt(vcpu, make_req);
struct kvm_lapic *apic = vcpu->arch.apic;

if (vector == -1)
@@ -2013,12 +2013,12 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
*/

apic_set_isr(vector, apic);
- apic_update_ppr(apic);
+ apic_update_ppr(apic, true);
apic_clear_irr(vector, apic);

if (test_bit(vector, vcpu_to_synic(vcpu)->auto_eoi_bitmap)) {
apic_clear_isr(vector, apic);
- apic_update_ppr(apic);
+ apic_update_ppr(apic, true);
}

return vector;
@@ -2068,7 +2068,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
recalculate_apic_map(vcpu->kvm);
kvm_apic_set_version(vcpu);

- apic_update_ppr(apic);
+ apic_update_ppr(apic, true);
hrtimer_cancel(&apic->lapic_timer.timer);
apic_update_lvtt(apic);
apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index f60d01c..62c92c5 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -53,9 +53,9 @@ struct dest_map;
int kvm_create_lapic(struct kvm_vcpu *vcpu);
void kvm_free_lapic(struct kvm_vcpu *vcpu);

-int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
+int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu, bool make_req);
int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
-int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
+int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu, bool make_req);
void kvm_apic_accept_events(struct kvm_vcpu *vcpu);
void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5382b82..d411079 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10771,7 +10771,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,

if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
&& nested_exit_intr_ack_set(vcpu)) {
- int irq = kvm_cpu_get_interrupt(vcpu);
+ int irq = kvm_cpu_get_interrupt(vcpu, true);
WARN_ON(irq < 0);
vmcs12->vm_exit_intr_info = irq |
INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04c5d96..a7e51d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6190,7 +6190,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
return r;
}
if (kvm_x86_ops->interrupt_allowed(vcpu)) {
- kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
+ kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu, false),
false);
kvm_x86_ops->set_irq(vcpu);
}
--
2.10.1.352.g0cf3611


2016-12-14 21:37:21

by Radim Krčmář

Subject: Re: [PATCH v2] KVM: x86: avoid redundant REQ_EVENT

2016-12-14 13:59+0300, Denis Plotnikov:
> When processing KVM_REQ_EVENT, apic_update_ppr is called which may set
> KVM_REQ_EVENT again if the recalculated value of PPR becomes smaller
> than the previous one. This results in cancelling the guest entry and
> reiterating in vcpu_enter_guest.
>
> However this is unnecessary because at this point KVM_REQ_EVENT is
> already being processed and there are no other changes in the lapic
> that may require full-fledged state recalculation.
>
> This situation is often hit on systems with TPR shadow, where the
> TPR can be updated by the guest without a vmexit, so that the first
> apic_update_ppr to notice it is exactly the one called while
> processing KVM_REQ_EVENT.
>
> To avoid this, introduce a parameter in apic_update_ppr that allows
> suppressing the setting of KVM_REQ_EVENT, and use it on the paths
> called from KVM_REQ_EVENT processing.
> Also add the parameter to kvm_cpu_get_interrupt and pass it through
> to apic_update_ppr, to make sure the suppression of KVM_REQ_EVENT is
> done only in the KVM_REQ_EVENT processing path.
>
> This micro-optimization gives a 10% performance increase on a
> synthetic test doing a lot of IPC in Windows using window messages.
>
> Reviewed-by: Roman Kagan <[email protected]>
> Signed-off-by: Denis Plotnikov <[email protected]>
> ---

It is not pretty, but desirable,

Reviewed-by: Radim Krčmář <[email protected]>

Thanks.

2016-12-14 22:29:39

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: avoid redundant REQ_EVENT



On 14/12/2016 11:59, Denis Plotnikov wrote:
>
> if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
> && nested_exit_intr_ack_set(vcpu)) {
> - int irq = kvm_cpu_get_interrupt(vcpu);
> + int irq = kvm_cpu_get_interrupt(vcpu, true);
> WARN_ON(irq < 0);

I think this is not needed, because all nested vmexits end with a KVM_REQ_EVENT:

/*
* the KVM_REQ_EVENT optimization bit is only on for one entry, and if
* we did not inject a still-pending event to L1 now because of
* nested_run_pending, we need to re-enable this bit.
*/
if (vmx->nested.nested_run_pending)
kvm_make_request(KVM_REQ_EVENT, vcpu);

This would allow you to always pass false from kvm_cpu_get_interrupt to
kvm_get_apic_interrupt. Not sure if the additional complication in vmx.c
is worth the simplification in lapic.c. Radim, second opinion? :)

Paolo

2016-12-15 14:31:00

by Radim Krčmář

Subject: Re: [PATCH v2] KVM: x86: avoid redundant REQ_EVENT

2016-12-15 10:18+0300, Roman Kagan:
> On Wed, Dec 14, 2016 at 11:29:33PM +0100, Paolo Bonzini wrote:
>> On 14/12/2016 11:59, Denis Plotnikov wrote:
>> >
>> > if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
>> > && nested_exit_intr_ack_set(vcpu)) {
>> > - int irq = kvm_cpu_get_interrupt(vcpu);
>> > + int irq = kvm_cpu_get_interrupt(vcpu, true);
>> > WARN_ON(irq < 0);
>>
>> I think this is not needed, because all nested vmexits end with a KVM_REQ_EVENT:

I also think that it can safely be false and we could drop the parameter
from kvm_cpu_get_interrupt().

(We have injected the highest priority interrupt and put it into ISR,
raising PPR again to its level, so there should be nothing to do in
KVM_REQ_EVENT due to any TPR changes.)
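For reference, the priority arithmetic behind that statement can be
sketched outside the kernel. This is a simplified model of the PPR
computation as specified in the SDM -- model_ppr is a made-up name for
illustration, not the actual KVM code:

```c
/* Simplified model of the PPR computation: PPR is TPR if TPR's
 * priority class (top nibble) is at least the in-service vector's
 * class, otherwise it is the in-service class itself.  isrv < 0
 * means no interrupt is in service. */
unsigned int model_ppr(unsigned int tpr, int isrv)
{
	unsigned int isr_class = (isrv < 0) ? 0 : ((unsigned int)isrv & 0xf0);

	if ((tpr & 0xf0) >= isr_class)
		return tpr & 0xff;
	return isr_class;
}
```

So once vector 0x51, say, has been put into ISR, PPR is at least 0x50
no matter how low TPR goes, which is why a TPR change at that point
cannot unmask another interrupt.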

>> /*
>> * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
>> * we did not inject a still-pending event to L1 now because of
>> * nested_run_pending, we need to re-enable this bit.
>> */
>> if (vmx->nested.nested_run_pending)
>> kvm_make_request(KVM_REQ_EVENT, vcpu);
>
> IIRC .nested_run_pending indicates we're emulating vmlaunch/vmresume and
> should not vmexit to L1, so this is not exactly "all nested vmexits"...
>
>> This would allow you to always pass false from kvm_cpu_get_interrupt to
>> kvm_get_apic_interrupt. Not sure if the additional complication in vmx.c
>> is worth the simplification in lapic.c. Radim, second opinion? :)

This patch goes for a minimal change in the non-nested case, so I would
leave nVMX optimizations for another patch.

One useless round of KVM_REQ_EVENT is not going to change nested
performance by much and it is not the only thing we could improve wrt.
TPR ... I would just leave it for now and take care of it when we
* don't need to update PPR at all with APICv -- it is already correct
* drop the KVM_REQ_EVENT with flex priority, because a lower TPR cannot
unmask an interrupt

2016-12-15 14:32:50

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: avoid redundant REQ_EVENT



On 15/12/2016 15:30, Radim Krčmář wrote:
>
> One useless round of KVM_REQ_EVENT is not going to change nested
> performance by much and it is not the only thing we could improve wrt.
> TPR ... I would just leave it for now and take care of it when we
> * don't need to update PPR at all with APICv -- it is already correct
> * drop the KVM_REQ_EVENT with flex priority, because a lower TPR cannot
> unmask an interrupt

I agree. I still don't like the patch very much, because I feel like an
explicit state machine ("can KVM_REQ_EVENT do anything?") would be more
maintainable. But if I don't come up with anything we'll go with this
patch.

Paolo

2016-12-15 14:57:33

by Roman Kagan

Subject: Re: [PATCH v2] KVM: x86: avoid redundant REQ_EVENT

On Thu, Dec 15, 2016 at 03:32:45PM +0100, Paolo Bonzini wrote:
>
>
> On 15/12/2016 15:30, Radim Krčmář wrote:
> >
> > One useless round of KVM_REQ_EVENT is not going to change nested
> > performance by much and it is not the only thing we could improve wrt.
> > TPR ... I would just leave it for now and take care of it when we
> > * don't need to update PPR at all with APICv -- it is already correct
> > * drop the KVM_REQ_EVENT with flex priority, because a lower TPR cannot
> > unmask an interrupt
>
> I agree. I still don't like the patch very much, because I feel like an
> explicit state machine ("can KVM_REQ_EVENT do anything?") would be more
> maintainable.

We all seem to share that feeling towards this patch :) That's the
reason why it was baking here internally for a long time: Denis
discovered this scenario over a month ago while analyzing the
performance regressions in KVM against our proprietary hypervisor, but
pinning down a palatable and safe fix turned out to be a challenge.

I think we did our best to stay safe; I agree that it ended up not so
beautiful indeed.

Roman.

2016-12-18 21:04:03

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: avoid redundant REQ_EVENT



On 15/12/2016 15:56, Roman Kagan wrote:
> On Thu, Dec 15, 2016 at 03:32:45PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 15/12/2016 15:30, Radim Krčmář wrote:
>>>
>>> One useless round of KVM_REQ_EVENT is not going to change nested
>>> performance by much and it is not the only thing we could improve wrt.
>>> TPR ... I would just leave it for now and take care of it when we
>>> * don't need to update PPR at all with APICv -- it is already correct
>>> * drop the KVM_REQ_EVENT with flex priority, because a lower TPR cannot
>>> unmask an interrupt
>>
>> I agree. I still don't like the patch very much, because I feel like an
>> explicit state machine ("can KVM_REQ_EVENT do anything?") would be more
>> maintainable.
>
> We all seem to share that feeling towards this patch :) That's the
> reason why it was baking here internally for a long time: Denis
> discovered this scenario over a month ago while analyzing the
> performance regressions in KVM against our proprietary hypervisor, but
> pinning down a palatable and safe fix turned out to be a challenge.
>
> I think we did our best to stay safe; I agree that it ended up not so
> beautiful indeed.

Yes, and it's fine. Now the thing to do is to find something that
perhaps looks less safe, but actually shows the underlying invariants
better. It doesn't even need any kind of state machine, because the
state machine is already implicit in the vmexits.

In particular, KVM_REQ_EVENT is only needed in the following cases:

1) an interrupt is injected (we could optimize self-IPIs but we don't)

2) PPR is lowered so that old-PPR >= IRR and new-PPR < IRR; this can
only happen:

2.1) from a TPR write (CR8-write vmexit, TPR-below-threshold vmexit, TPR
write through MMIO or MSR)

2.2) from an EOI's clearing of a bit in ISR

3) the interrupt window opens


In particular, this list does not include:

- setting a bit in ISR, as in kvm_get_apic_interrupt

- clearing a bit in IRR, again as in kvm_get_apic_interrupt

- anything on EOI besides what apic_update_ppr already does

- changes to PPR that didn't cause a vmexit: if the TPR changes but it
doesn't trigger a vmexit---because of TPR_THRESHOLD on vmx or
update_cr8_intercept on svm---then the TPR change must not have unmasked
an interrupt, and KVM_REQ_EVENT is unnecessary.
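Condition 2 above can be written down as a predicate. This is an
illustrative userspace model, not KVM code -- ppr_drop_needs_event is
a made-up name -- using the same deliverability test as
kvm_apic_has_interrupt, i.e. the highest IRR vector's priority class
must exceed PPR to be deliverable:

```c
#include <stdbool.h>

/* A PPR update needs KVM_REQ_EVENT only if it unmasks the highest
 * pending IRR vector: the vector's priority class (top nibble) was
 * blocked by the old PPR and is deliverable under the new one.
 * highest_irr < 0 means nothing is pending at all. */
bool ppr_drop_needs_event(unsigned int old_ppr, unsigned int new_ppr,
			  int highest_irr)
{
	unsigned int irr_class;

	if (highest_irr < 0)
		return false;
	irr_class = (unsigned int)highest_irr & 0xf0;
	return irr_class <= old_ppr && irr_class > new_ppr;
}
```

Cases 1 and 3 (injection and the interrupt window opening) still need
KVM_REQ_EVENT unconditionally; the predicate only covers PPR movement.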


So in the end, as in the APICv case but hopefully with fewer pitfalls,
the extra KVM_REQ_EVENTs not only slow down the code; their apparent
extra safety comes at the cost of obfuscating it. I'll see if I can
come up with something not too ugly.

Paolo