2020-03-26 02:20:48

by Wanpeng Li

Subject: [PATCH 0/3] KVM: X86: Single target IPI fastpath enhancement

The original single target IPI fastpath patch forgot to filter the
ICR destination shorthand field. Multicast IPIs are not suitable for
this feature: waking up multiple sleeping vCPUs extends the time spent
with interrupts disabled, which is especially bad when the host is
over-subscribed and the VM has a relatively large number of vCPUs.
Let's narrow the fastpath down to single target IPIs. In addition, this
patchset micro-optimizes the virtual IPI emulation sequence for the
fastpath.

Wanpeng Li (3):
KVM: X86: Delay read msr data iff writes ICR MSR
KVM: X86: Narrow down the IPI fastpath to single target IPI
KVM: X86: Micro-optimize IPI fastpath delay

arch/x86/kvm/lapic.c | 4 ++--
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/x86.c | 14 +++++++++++---
3 files changed, 14 insertions(+), 5 deletions(-)

--
2.7.4


2020-03-26 02:21:09

by Wanpeng Li

Subject: [PATCH 1/3] KVM: X86: Delay read msr data iff writes ICR MSR

From: Wanpeng Li <[email protected]>

Delay reading the MSR data until we have identified that the guest is
writing the ICR MSR, to avoid penalizing all other MSR writes.

Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3156e25..9232b15 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1568,11 +1568,12 @@ static int handle_fastpath_set_x2apic_icr_irqoff(struct kvm_vcpu *vcpu, u64 data
 enum exit_fastpath_completion handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 msr = kvm_rcx_read(vcpu);
-	u64 data = kvm_read_edx_eax(vcpu);
+	u64 data;
 	int ret = 0;

 	switch (msr) {
 	case APIC_BASE_MSR + (APIC_ICR >> 4):
+		data = kvm_read_edx_eax(vcpu);
 		ret = handle_fastpath_set_x2apic_icr_irqoff(vcpu, data);
 		break;
 	default:
--
2.7.4
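
The effect of patch 1/3 can be illustrated outside the kernel. The following is a minimal standalone sketch, not the KVM code: `struct mock_vcpu`, `mock_read_edx_eax()`, `mock_fastpath_set_msr()` and the read counter are hypothetical stand-ins. Only the MSR index arithmetic (x2APIC MSR base 0x800 plus the APIC register offset shifted right by 4) mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* These two values mirror the kernel's definitions; the rest is a mock. */
#define APIC_BASE_MSR 0x800u
#define APIC_ICR      0x300u

/* Hypothetical stand-in for the guest register state of a vCPU. */
struct mock_vcpu {
	uint64_t rax, rcx, rdx;
	unsigned int edx_eax_reads;	/* how often EDX:EAX was assembled */
};

/* Hypothetical stand-in for kvm_read_edx_eax(): combine EDX:EAX. */
static uint64_t mock_read_edx_eax(struct mock_vcpu *vcpu)
{
	vcpu->edx_eax_reads++;
	return (vcpu->rax & 0xffffffffu) | ((vcpu->rdx & 0xffffffffu) << 32);
}

/*
 * Returns 0 and stores the payload if the write targets the x2APIC ICR
 * MSR, returns 1 otherwise. The payload is only read in the ICR case.
 */
static int mock_fastpath_set_msr(struct mock_vcpu *vcpu, uint64_t *icr_out)
{
	uint32_t msr = (uint32_t)vcpu->rcx;

	assert(icr_out != NULL);

	switch (msr) {
	case APIC_BASE_MSR + (APIC_ICR >> 4):	/* MSR 0x830 */
		*icr_out = mock_read_edx_eax(vcpu);
		return 0;
	default:
		return 1;
	}
}
```

In this sketch a write to any MSR other than 0x830 leaves `edx_eax_reads` untouched, which is the cost the patch aims to avoid for non-ICR writes.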

2020-03-26 02:21:10

by Wanpeng Li

Subject: [PATCH 2/3] KVM: X86: Narrow down the IPI fastpath to single target IPI

From: Wanpeng Li <[email protected]>

The original single target IPI fastpath patch forgot to filter the
ICR destination shorthand field. Multicast IPIs are not suitable for
this feature: waking up multiple sleeping vCPUs extends the time spent
with interrupts disabled, which is especially bad when the host is
over-subscribed and the VM has a relatively large number of vCPUs.
Let's narrow the fastpath down to single target IPIs.

Test setup: two VMs with 76 vCPUs each, one running 'ebizzy -M', the
other running cyclictest on all vCPUs. With this patch, the average
cyclictest score improves by more than 5%. (PV TLB, PV IPI and PV
sched yield are disabled during testing to avoid interference.)

Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/x86.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9232b15..50ef1c5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1554,7 +1554,10 @@ EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr);
  */
 static int handle_fastpath_set_x2apic_icr_irqoff(struct kvm_vcpu *vcpu, u64 data)
 {
-	if (lapic_in_kernel(vcpu) && apic_x2apic_mode(vcpu->arch.apic) &&
+	if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic))
+		return 1;
+
+	if (((data & APIC_SHORT_MASK) == APIC_DEST_NOSHORT) &&
 		((data & APIC_DEST_MASK) == APIC_DEST_PHYSICAL) &&
 		((data & APIC_MODE_MASK) == APIC_DM_FIXED)) {

--
2.7.4
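
The three ICR field checks in patch 2/3 can be tried in isolation. This is a minimal standalone sketch under stated assumptions: the mask and field constants are copied by value from the kernel's x86 APIC definitions as I understand them, and `icr_is_fastpath_candidate()` is a hypothetical helper name, not a KVM function. It only mirrors the three comparisons in the patch and does not look at the destination ID itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ICR field encodings, mirroring the kernel's x86 APIC definitions. */
#define APIC_SHORT_MASK    0xc0000u	/* bits 19:18, destination shorthand */
#define APIC_DEST_NOSHORT  0x00000u
#define APIC_DEST_SELF     0x40000u
#define APIC_DEST_ALLINC   0x80000u
#define APIC_DEST_ALLBUT   0xc0000u
#define APIC_DEST_MASK     0x00800u	/* bit 11, destination mode */
#define APIC_DEST_PHYSICAL 0x00000u
#define APIC_DEST_LOGICAL  0x00800u
#define APIC_MODE_MASK     0x00700u	/* bits 10:8, delivery mode */
#define APIC_DM_FIXED      0x00000u
#define APIC_DM_NMI        0x00400u

/* Sanity check: every shorthand encoding lies inside the shorthand mask. */
static_assert((APIC_DEST_ALLBUT & ~APIC_SHORT_MASK) == 0,
	      "shorthand encodings must fit in APIC_SHORT_MASK");

/* Hypothetical helper: true if the ICR value passes all three checks. */
static bool icr_is_fastpath_candidate(uint64_t data)
{
	return (data & APIC_SHORT_MASK) == APIC_DEST_NOSHORT &&
	       (data & APIC_DEST_MASK) == APIC_DEST_PHYSICAL &&
	       (data & APIC_MODE_MASK) == APIC_DM_FIXED;
}
```

With these encodings, an ICR value that sets any shorthand (self, all including self, all excluding self), logical destination mode, or a non-fixed delivery mode is rejected and would fall back to the regular WRMSR emulation path.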

2020-03-26 02:21:31

by Wanpeng Li

Subject: [PATCH 3/3] KVM: X86: Micro-optimize IPI fastpath delay

From: Wanpeng Li <[email protected]>

This patch optimizes the virtual IPI fastpath emulation sequence:

 write ICR2                     send virtual IPI
 read ICR2                      write ICR2
 send virtual IPI         ==>   write ICR
 write ICR

We observe a ~0.67% performance improvement for the IPI microbenchmark
(https://lore.kernel.org/kvm/[email protected]/)
on a Skylake server.

Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 4 ++--
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/x86.c | 6 +++++-
3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e3099c6..338de38 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1226,7 +1226,7 @@ void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
 }
 EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);

-static void apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
+void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 {
 	struct kvm_lapic_irq irq;

@@ -1940,7 +1940,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 	case APIC_ICR:
 		/* No delay here, so we always clear the pending bit */
 		val &= ~(1 << 12);
-		apic_send_ipi(apic, val, kvm_lapic_get_reg(apic, APIC_ICR2));
+		kvm_apic_send_ipi(apic, val, kvm_lapic_get_reg(apic, APIC_ICR2));
 		kvm_lapic_set_reg(apic, APIC_ICR, val);
 		break;

diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index ec6fbfe..bc76860 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -95,6 +95,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);

 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
+void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high);

 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu);
 int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 50ef1c5..c4bb7d8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1561,8 +1561,12 @@ static int handle_fastpath_set_x2apic_icr_irqoff(struct kvm_vcpu *vcpu, u64 data
 		((data & APIC_DEST_MASK) == APIC_DEST_PHYSICAL) &&
 		((data & APIC_MODE_MASK) == APIC_DM_FIXED)) {

+		data &= ~(1 << 12);
+		kvm_apic_send_ipi(vcpu->arch.apic, (u32)data, (u32)(data >> 32));
 		kvm_lapic_set_reg(vcpu->arch.apic, APIC_ICR2, (u32)(data >> 32));
-		return kvm_lapic_reg_write(vcpu->arch.apic, APIC_ICR, (u32)data);
+		kvm_lapic_set_reg(vcpu->arch.apic, APIC_ICR, (u32)data);
+		trace_kvm_apic_write(APIC_ICR, (u32)data);
+		return 0;
 	}

 	return 1;
--
2.7.4
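
The reordering in patch 3/3 can be compared side by side with a small mock. This is a hedged sketch, not the KVM implementation: `struct mock_apic` and all `mock_*` and `icr_write_*` names are hypothetical, and the register file is a plain array. It only shows that sending the IPI straight from the MSR payload produces the same register contents and the same IPI arguments while skipping the ICR2 read-back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APIC_ICR      0x300u
#define APIC_ICR2     0x310u
#define APIC_ICR_BUSY (1u << 12)	/* delivery status (pending) bit */

/* Hypothetical mock of the APIC register page plus an access counter. */
struct mock_apic {
	uint32_t regs[0x400 / 4];
	unsigned int reg_reads;
	uint32_t sent_low, sent_high;	/* arguments of the last IPI sent */
};

static void mock_set_reg(struct mock_apic *apic, uint32_t reg, uint32_t val)
{
	apic->regs[reg / 4] = val;
}

static uint32_t mock_get_reg(struct mock_apic *apic, uint32_t reg)
{
	apic->reg_reads++;
	return apic->regs[reg / 4];
}

static void mock_send_ipi(struct mock_apic *apic, uint32_t low, uint32_t high)
{
	apic->sent_low = low;
	apic->sent_high = high;
}

/* Old order: write ICR2, read ICR2 back, send the IPI, write ICR. */
static void icr_write_old(struct mock_apic *apic, uint64_t data)
{
	uint32_t low = (uint32_t)data & ~APIC_ICR_BUSY;

	assert(apic != NULL);
	mock_set_reg(apic, APIC_ICR2, (uint32_t)(data >> 32));
	mock_send_ipi(apic, low, mock_get_reg(apic, APIC_ICR2));
	mock_set_reg(apic, APIC_ICR, low);
}

/* New order: send from the MSR payload first, then update both registers. */
static void icr_write_new(struct mock_apic *apic, uint64_t data)
{
	assert(apic != NULL);
	data &= ~(uint64_t)APIC_ICR_BUSY;
	mock_send_ipi(apic, (uint32_t)data, (uint32_t)(data >> 32));
	mock_set_reg(apic, APIC_ICR2, (uint32_t)(data >> 32));
	mock_set_reg(apic, APIC_ICR, (uint32_t)data);
}
```

Calling both functions with the same 64-bit payload on two zero-initialized `struct mock_apic` instances should leave identical ICR and ICR2 contents, with `reg_reads` incremented only by the old sequence.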

2020-03-26 09:47:33

by Paolo Bonzini

Subject: Re: [PATCH 0/3] KVM: X86: Single target IPI fastpath enhancement

On 26/03/20 03:19, Wanpeng Li wrote:
> The original single target IPI fastpath patch forgot to filter the
> ICR destination shorthand field. Multicast IPI is not suitable for
> this feature since wakeup the multiple sleeping vCPUs will extend
> the interrupt disabled time, it especially worse in the over-subscribe
> and VM has a little bit more vCPUs scenario. Let's narrow it down to
> single target IPI. In addition, this patchset micro-optimize virtual
> IPI emulation sequence for fastpath.
>
> Wanpeng Li (3):
> KVM: X86: Delay read msr data iff writes ICR MSR
> KVM: X86: Narrow down the IPI fastpath to single target IPI
> KVM: X86: Micro-optimize IPI fastpath delay
>
> arch/x86/kvm/lapic.c | 4 ++--
> arch/x86/kvm/lapic.h | 1 +
> arch/x86/kvm/x86.c | 14 +++++++++++---
> 3 files changed, 14 insertions(+), 5 deletions(-)
>

Queued 2 for 5.6 and 1-3 for 5.7, thanks.

Paolo