2019-06-12 09:38:08

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v3 0/5] KVM: LAPIC: Optimize timer latency further

Advance lapic timer tries to hidden the hypervisor overhead between the
host emulated timer fires and the guest awares the timer is fired. However,
it just hidden the time between apic_timer_fn/handle_preemption_timer ->
wait_lapic_expire, instead of the real position of vmentry which is
mentioned in the orignial commit d0659d946be0 ("KVM: x86: add option to
advance tscdeadline hrtimer expiration"). There is 700+ cpu cycles between
the end of wait_lapic_expire and before world switch on my haswell desktop.

This patchset tries to narrow the last gap(wait_lapic_expire -> world switch),
it takes the real overhead time between apic_timer_fn/handle_preemption_timer
and before world switch into consideration when adaptively tuning timer
advancement. The patchset can reduce 40% latency (~1600+ cycles to ~1000+
cycles on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when
testing busy waits.

v2 -> v3:
* expose 'kvm_timer.timer_advance_ns' to userspace
* move the tracepoint below guest_exit_irqoff()
* move wait_lapic_expire() before flushing the L1

v1 -> v2:
* fix indent in patch 1/4
* remove the wait_lapic_expire() tracepoint and expose by debugfs
* move the call to wait_lapic_expire() into vmx.c and svm.c

Wanpeng Li (5):
KVM: LAPIC: Extract adaptive tune timer advancement logic
KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
KVM: LAPIC: Delay trace advance expire delta
KVM: LAPIC: Optimize timer latency further

arch/x86/kvm/debugfs.c | 16 +++++++++++++
arch/x86/kvm/lapic.c | 62 +++++++++++++++++++++++++++++---------------------
arch/x86/kvm/lapic.h | 3 ++-
arch/x86/kvm/svm.c | 4 ++++
arch/x86/kvm/vmx/vmx.c | 4 ++++
arch/x86/kvm/x86.c | 7 +++---
6 files changed, 65 insertions(+), 31 deletions(-)

--
2.7.4


2019-06-12 09:38:10

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v3 1/5] KVM: LAPIC: Extract adaptive tune timer advancement logic

From: Wanpeng Li <[email protected]>

Extract adaptive tune timer advancement logic to a single function.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Liran Alon <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 57 ++++++++++++++++++++++++++++++----------------------
1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bd13fdd..2f364fe 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1501,11 +1501,40 @@ static inline void __wait_lapic_expire(struct kvm_vcpu *vcpu, u64 guest_cycles)
}
}

-void wait_lapic_expire(struct kvm_vcpu *vcpu)
+static inline void adaptive_tune_timer_advancement(struct kvm_vcpu *vcpu,
+ u64 guest_tsc, u64 tsc_deadline)
{
struct kvm_lapic *apic = vcpu->arch.apic;
u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
- u64 guest_tsc, tsc_deadline, ns;
+ u64 ns;
+
+ /* too early */
+ if (guest_tsc < tsc_deadline) {
+ ns = (tsc_deadline - guest_tsc) * 1000000ULL;
+ do_div(ns, vcpu->arch.virtual_tsc_khz);
+ timer_advance_ns -= min((u32)ns,
+ timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+ } else {
+ /* too late */
+ ns = (guest_tsc - tsc_deadline) * 1000000ULL;
+ do_div(ns, vcpu->arch.virtual_tsc_khz);
+ timer_advance_ns += min((u32)ns,
+ timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+ }
+
+ if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
+ apic->lapic_timer.timer_advance_adjust_done = true;
+ if (unlikely(timer_advance_ns > 5000)) {
+ timer_advance_ns = 0;
+ apic->lapic_timer.timer_advance_adjust_done = true;
+ }
+ apic->lapic_timer.timer_advance_ns = timer_advance_ns;
+}
+
+void wait_lapic_expire(struct kvm_vcpu *vcpu)
+{
+ struct kvm_lapic *apic = vcpu->arch.apic;
+ u64 guest_tsc, tsc_deadline;

if (apic->lapic_timer.expired_tscdeadline == 0)
return;
@@ -1521,28 +1550,8 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
if (guest_tsc < tsc_deadline)
__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);

- if (!apic->lapic_timer.timer_advance_adjust_done) {
- /* too early */
- if (guest_tsc < tsc_deadline) {
- ns = (tsc_deadline - guest_tsc) * 1000000ULL;
- do_div(ns, vcpu->arch.virtual_tsc_khz);
- timer_advance_ns -= min((u32)ns,
- timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
- } else {
- /* too late */
- ns = (guest_tsc - tsc_deadline) * 1000000ULL;
- do_div(ns, vcpu->arch.virtual_tsc_khz);
- timer_advance_ns += min((u32)ns,
- timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
- }
- if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
- apic->lapic_timer.timer_advance_adjust_done = true;
- if (unlikely(timer_advance_ns > 5000)) {
- timer_advance_ns = 0;
- apic->lapic_timer.timer_advance_adjust_done = true;
- }
- apic->lapic_timer.timer_advance_ns = timer_advance_ns;
- }
+ if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
+ adaptive_tune_timer_advancement(vcpu, guest_tsc, tsc_deadline);
}

static void start_sw_tscdeadline(struct kvm_lapic *apic)
--
2.7.4

2019-06-12 09:38:24

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v3 5/5] KVM: LAPIC: Optimize timer latency further

From: Wanpeng Li <[email protected]>

Advance lapic timer tries to hidden the hypervisor overhead between the
host emulated timer fires and the guest awares the timer is fired. However,
it just hidden the time between apic_timer_fn/handle_preemption_timer ->
wait_lapic_expire, instead of the real position of vmentry which is
mentioned in the orignial commit d0659d946be0 ("KVM: x86: add option to
advance tscdeadline hrtimer expiration"). There is 700+ cpu cycles between
the end of wait_lapic_expire and before world switch on my haswell desktop.

This patch tries to narrow the last gap(wait_lapic_expire -> world switch),
it takes the real overhead time between apic_timer_fn/handle_preemption_timer
and before world switch into consideration when adaptively tuning timer
advancement. The patch can reduce 40% latency (~1600+ cycles to ~1000+ cycles
on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing
busy waits.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Liran Alon <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 3 ++-
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/svm.c | 4 ++++
arch/x86/kvm/vmx/vmx.c | 4 ++++
arch/x86/kvm/x86.c | 3 ---
5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index af38ece..63513de 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1531,7 +1531,7 @@ static inline void adaptive_tune_timer_advancement(struct kvm_vcpu *vcpu,
apic->lapic_timer.timer_advance_ns = timer_advance_ns;
}

-void wait_lapic_expire(struct kvm_vcpu *vcpu)
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
u64 guest_tsc, tsc_deadline;
@@ -1553,6 +1553,7 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
adaptive_tune_timer_advancement(vcpu, apic->lapic_timer.advance_expire_delta);
}
+EXPORT_SYMBOL_GPL(kvm_wait_lapic_expire);

static void start_sw_tscdeadline(struct kvm_lapic *apic)
{
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 3e72a25..f974a3d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -220,7 +220,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)

bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);

-void wait_lapic_expire(struct kvm_vcpu *vcpu);
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);

bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
struct kvm_vcpu **dest_vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6b92eaf..955cfcb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5638,6 +5638,10 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
clgi();
kvm_load_guest_xcr0(vcpu);

+ if (lapic_in_kernel(vcpu) &&
+ vcpu->arch.apic->lapic_timer.timer_advance_ns)
+ kvm_wait_lapic_expire(vcpu);
+
/*
* If this vCPU has touched SPEC_CTRL, restore the guest's value if
* it's non-zero. Since vmentry is serialising on affected CPUs, there
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e1fa935..771d3bf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6423,6 +6423,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)

vmx_update_hv_timer(vcpu);

+ if (lapic_in_kernel(vcpu) &&
+ vcpu->arch.apic->lapic_timer.timer_advance_ns)
+ kvm_wait_lapic_expire(vcpu);
+
/*
* If this vCPU has touched SPEC_CTRL, restore the guest's value if
* it's non-zero. Since vmentry is serialising on affected CPUs, there
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4a7b00c..e154f52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7903,9 +7903,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
}

trace_kvm_entry(vcpu->vcpu_id);
- if (lapic_in_kernel(vcpu) &&
- vcpu->arch.apic->lapic_timer.timer_advance_ns)
- wait_lapic_expire(vcpu);
guest_enter_irqoff();

fpregs_assert_state_consistent();
--
2.7.4

2019-06-12 09:38:30

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v3 4/5] KVM: LAPIC: Delay trace advance expire delta

From: Wanpeng Li <[email protected]>

wait_lapic_expire() call was moved above guest_enter_irqoff() because of
its tracepoint, which violated the RCU extended quiescent state invoked
by guest_enter_irqoff()[1][2]. This patch simply moves the tracepoint
below guest_exit_irqoff() in vcpu_enter_guest(). Snapshot the delta before
VM-Enter, but trace it after VM-Exit. This can help us to move
wait_lapic_expire() just before vmentry in the later patch.

[1] Commit 8b89fe1f6c43 ("kvm: x86: move tracepoints outside extended quiescent state")
[2] https://patchwork.kernel.org/patch/7821111/

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Liran Alon <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 16 ++++++++--------
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/x86.c | 2 ++
3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2f364fe..af38ece 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1502,27 +1502,27 @@ static inline void __wait_lapic_expire(struct kvm_vcpu *vcpu, u64 guest_cycles)
}

static inline void adaptive_tune_timer_advancement(struct kvm_vcpu *vcpu,
- u64 guest_tsc, u64 tsc_deadline)
+ s64 advance_expire_delta)
{
struct kvm_lapic *apic = vcpu->arch.apic;
u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
u64 ns;

/* too early */
- if (guest_tsc < tsc_deadline) {
- ns = (tsc_deadline - guest_tsc) * 1000000ULL;
+ if (advance_expire_delta < 0) {
+ ns = -advance_expire_delta * 1000000ULL;
do_div(ns, vcpu->arch.virtual_tsc_khz);
timer_advance_ns -= min((u32)ns,
timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
} else {
/* too late */
- ns = (guest_tsc - tsc_deadline) * 1000000ULL;
+ ns = advance_expire_delta * 1000000ULL;
do_div(ns, vcpu->arch.virtual_tsc_khz);
timer_advance_ns += min((u32)ns,
timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
}

- if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
+ if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
apic->lapic_timer.timer_advance_adjust_done = true;
if (unlikely(timer_advance_ns > 5000)) {
timer_advance_ns = 0;
@@ -1545,13 +1545,13 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
tsc_deadline = apic->lapic_timer.expired_tscdeadline;
apic->lapic_timer.expired_tscdeadline = 0;
guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
- trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
+ apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;

- if (guest_tsc < tsc_deadline)
+ if (apic->lapic_timer.advance_expire_delta < 0)
__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);

if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
- adaptive_tune_timer_advancement(vcpu, guest_tsc, tsc_deadline);
+ adaptive_tune_timer_advancement(vcpu, apic->lapic_timer.advance_expire_delta);
}

static void start_sw_tscdeadline(struct kvm_lapic *apic)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index d6d049b..3e72a25 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -32,6 +32,7 @@ struct kvm_timer {
u64 tscdeadline;
u64 expired_tscdeadline;
u32 timer_advance_ns;
+ s64 advance_expire_delta;
atomic_t pending; /* accumulated triggered timers */
bool hv_timer_in_use;
bool timer_advance_adjust_done;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f2e3847..4a7b00c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7961,6 +7961,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
++vcpu->stat.exits;

guest_exit_irqoff();
+ trace_kvm_wait_lapic_expire(vcpu->vcpu_id,
+ vcpu->arch.apic->lapic_timer.advance_expire_delta);

local_irq_enable();
preempt_enable();
--
2.7.4

2019-06-12 09:38:36

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v3 3/5] KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace

From: Wanpeng Li <[email protected]>

Expose per-vCPU timer_advance_ns to userspace, so it is able to
query the auto-adjusted value.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Liran Alon <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/debugfs.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index c19c7ed..a6f1f93 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -9,12 +9,22 @@
*/
#include <linux/kvm_host.h>
#include <linux/debugfs.h>
+#include "lapic.h"

bool kvm_arch_has_vcpu_debugfs(void)
{
return true;
}

+static int vcpu_get_timer_advance_ns(void *data, u64 *val)
+{
+ struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
+ *val = vcpu->arch.apic->lapic_timer.timer_advance_ns;
+ return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(vcpu_timer_advance_ns_fops, vcpu_get_timer_advance_ns, NULL, "%llu\n");
+
static int vcpu_get_tsc_offset(void *data, u64 *val)
{
struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
@@ -51,6 +61,12 @@ int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
if (!ret)
return -ENOMEM;

+ ret = debugfs_create_file("lapic_timer_advance_ns", 0444,
+ vcpu->debugfs_dentry,
+ vcpu, &vcpu_timer_advance_ns_fops);
+ if (!ret)
+ return -ENOMEM;
+
if (kvm_has_tsc_control) {
ret = debugfs_create_file("tsc-scaling-ratio", 0444,
vcpu->debugfs_dentry,
--
2.7.4

2019-06-12 09:39:25

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v3 2/5] KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow

From: Wanpeng Li <[email protected]>

After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of
timer advancement), '-1' enables adaptive tuning starting from default
advancment of 1000ns. However, we should expose an int instead of an overflow
uint module parameter.

Before patch:

/sys/module/kvm/parameters/lapic_timer_advance_ns:4294967295

After patch:

/sys/module/kvm/parameters/lapic_timer_advance_ns:-1

Fixes: c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement)
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Liran Alon <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b63e7b0..f2e3847 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -143,7 +143,7 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
* tuning, i.e. allows priveleged userspace to set an exact advancement time.
*/
static int __read_mostly lapic_timer_advance_ns = -1;
-module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR);
+module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR);

static bool __read_mostly vector_hashing = true;
module_param(vector_hashing, bool, S_IRUGO);
--
2.7.4

2019-06-12 09:41:22

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] KVM: LAPIC: Optimize timer latency further

Sorry, send out the wrong patchset, please ignore.
On Wed, 12 Jun 2019 at 17:36, Wanpeng Li <[email protected]> wrote:
>
> Advance lapic timer tries to hidden the hypervisor overhead between the
> host emulated timer fires and the guest awares the timer is fired. However,
> it just hidden the time between apic_timer_fn/handle_preemption_timer ->
> wait_lapic_expire, instead of the real position of vmentry which is
> mentioned in the orignial commit d0659d946be0 ("KVM: x86: add option to
> advance tscdeadline hrtimer expiration"). There is 700+ cpu cycles between
> the end of wait_lapic_expire and before world switch on my haswell desktop.
>
> This patchset tries to narrow the last gap(wait_lapic_expire -> world switch),
> it takes the real overhead time between apic_timer_fn/handle_preemption_timer
> and before world switch into consideration when adaptively tuning timer
> advancement. The patchset can reduce 40% latency (~1600+ cycles to ~1000+
> cycles on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when
> testing busy waits.
>
> v2 -> v3:
> * expose 'kvm_timer.timer_advance_ns' to userspace
> * move the tracepoint below guest_exit_irqoff()
> * move wait_lapic_expire() before flushing the L1
>
> v1 -> v2:
> * fix indent in patch 1/4
> * remove the wait_lapic_expire() tracepoint and expose by debugfs
> * move the call to wait_lapic_expire() into vmx.c and svm.c
>
> Wanpeng Li (5):
> KVM: LAPIC: Extract adaptive tune timer advancement logic
> KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
> KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
> KVM: LAPIC: Delay trace advance expire delta
> KVM: LAPIC: Optimize timer latency further
>
> arch/x86/kvm/debugfs.c | 16 +++++++++++++
> arch/x86/kvm/lapic.c | 62 +++++++++++++++++++++++++++++---------------------
> arch/x86/kvm/lapic.h | 3 ++-
> arch/x86/kvm/svm.c | 4 ++++
> arch/x86/kvm/vmx/vmx.c | 4 ++++
> arch/x86/kvm/x86.c | 7 +++---
> 6 files changed, 65 insertions(+), 31 deletions(-)
>
> --
> 2.7.4
>