2016-10-11 12:17:45

by Wanpeng Li

Subject: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode

Most Windows guests that I currently have on hand still use APIC Timer
periodic/oneshot mode instead of APIC Timer tsc-deadline mode:
- Windows Server 2008 R2
- Windows Server 2012 R2
- Windows 7
- Windows 10

This patchset adds support for using the VMX preemption timer for APIC Timer
periodic/oneshot mode.

I added a print to the oneshot mode testcase of kvm-unit-tests/apic.flat and
observed that, with the patch, the latency is ~2% of the latency without it.
Something may still not be right in the patchset; in addition, tmcct in
apic_get_tmcct() may not be calculated correctly. Your comments to improve the
patchset are greatly appreciated.

Wanpeng Li (2):
KVM: lapic: Extract start_sw_period() to handle oneshot/periodic mode
KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode

arch/x86/kvm/lapic.c | 162 ++++++++++++++++++++++++++++++---------------------
1 file changed, 95 insertions(+), 67 deletions(-)

--
1.9.1


2016-10-11 12:17:50

by Wanpeng Li

Subject: [PATCH RFC 1/2] KVM: lapic: Extract start_sw_period() to handle periodic/oneshot mode

From: Wanpeng Li <[email protected]>

Extract start_sw_period() to handle periodic/oneshot mode; it will be
used by a later patch.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Yunhong Jiang <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 89 +++++++++++++++++++++++++++-------------------------
1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 23b99f3..dad743e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1347,6 +1347,50 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
local_irq_restore(flags);
}

+static void start_sw_period(struct kvm_lapic *apic)
+{
+ ktime_t now;
+
+ /* lapic timer in oneshot or periodic mode */
+ now = apic->lapic_timer.timer.base->get_time();
+ apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+ * APIC_BUS_CYCLE_NS * apic->divide_count;
+
+ if (!apic->lapic_timer.period)
+ return;
+ /*
+ * Do not allow the guest to program periodic timers with small
+ * interval, since the hrtimers are not throttled by the host
+ * scheduler.
+ */
+ if (apic_lvtt_period(apic)) {
+ s64 min_period = min_timer_period_us * 1000LL;
+
+ if (apic->lapic_timer.period < min_period) {
+ pr_info_ratelimited(
+ "kvm: vcpu %i: requested %lld ns "
+ "lapic timer period limited to %lld ns\n",
+ apic->vcpu->vcpu_id,
+ apic->lapic_timer.period, min_period);
+ apic->lapic_timer.period = min_period;
+ }
+ }
+
+ hrtimer_start(&apic->lapic_timer.timer,
+ ktime_add_ns(now, apic->lapic_timer.period),
+ HRTIMER_MODE_ABS_PINNED);
+
+ apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
+ PRIx64 ", "
+ "timer initial count 0x%x, period %lldns, "
+ "expire @ 0x%016" PRIx64 ".\n", __func__,
+ APIC_BUS_CYCLE_NS, ktime_to_ns(now),
+ kvm_lapic_get_reg(apic, APIC_TMICT),
+ apic->lapic_timer.period,
+ ktime_to_ns(ktime_add_ns(now,
+ apic->lapic_timer.period)));
+}
+
bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
{
if (!lapic_in_kernel(vcpu))
@@ -1424,50 +1468,11 @@ EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_sw_timer);

static void start_apic_timer(struct kvm_lapic *apic)
{
- ktime_t now;
-
atomic_set(&apic->lapic_timer.pending, 0);

- if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
- /* lapic timer in oneshot or periodic mode */
- now = apic->lapic_timer.timer.base->get_time();
- apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
- * APIC_BUS_CYCLE_NS * apic->divide_count;
-
- if (!apic->lapic_timer.period)
- return;
- /*
- * Do not allow the guest to program periodic timers with small
- * interval, since the hrtimers are not throttled by the host
- * scheduler.
- */
- if (apic_lvtt_period(apic)) {
- s64 min_period = min_timer_period_us * 1000LL;
-
- if (apic->lapic_timer.period < min_period) {
- pr_info_ratelimited(
- "kvm: vcpu %i: requested %lld ns "
- "lapic timer period limited to %lld ns\n",
- apic->vcpu->vcpu_id,
- apic->lapic_timer.period, min_period);
- apic->lapic_timer.period = min_period;
- }
- }
-
- hrtimer_start(&apic->lapic_timer.timer,
- ktime_add_ns(now, apic->lapic_timer.period),
- HRTIMER_MODE_ABS_PINNED);
-
- apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
- PRIx64 ", "
- "timer initial count 0x%x, period %lldns, "
- "expire @ 0x%016" PRIx64 ".\n", __func__,
- APIC_BUS_CYCLE_NS, ktime_to_ns(now),
- kvm_lapic_get_reg(apic, APIC_TMICT),
- apic->lapic_timer.period,
- ktime_to_ns(ktime_add_ns(now,
- apic->lapic_timer.period)));
- } else if (apic_lvtt_tscdeadline(apic)) {
+ if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
+ start_sw_period(apic);
+ else if (apic_lvtt_tscdeadline(apic)) {
if (!(kvm_x86_ops->set_hv_timer && start_hv_tscdeadline(apic)))
start_sw_tscdeadline(apic);
}
--
1.9.1

2016-10-11 12:17:53

by Wanpeng Li

Subject: [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode

From: Wanpeng Li <[email protected]>

Most Windows guests still use APIC Timer periodic/oneshot mode instead
of tsc-deadline mode, and these modes are still emulated by a
high-overhead hrtimer on the host. This patch converts the expected
expiration time of periodic/oneshot mode to a guest deadline TSC in
order to leverage the VMX preemption timer logic used for APIC Timer
tsc-deadline mode.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Yunhong Jiang <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 83 +++++++++++++++++++++++++++++++++-------------------
1 file changed, 53 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dad743e..f9df9e3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1101,13 +1101,20 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
apic->lapic_timer.period == 0)
return 0;

- remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
- if (ktime_to_ns(remaining) < 0)
- remaining = ktime_set(0, 0);
+ if (kvm_lapic_hv_timer_in_use(apic->vcpu)) {
+ u64 tscl = rdtsc();

- ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
- tmcct = div64_u64(ns,
- (APIC_BUS_CYCLE_NS * apic->divide_count));
+ tmcct = apic->lapic_timer.tscdeadline -
+ kvm_read_l1_tsc(apic->vcpu, tscl);
+ } else {
+ remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
+ if (ktime_to_ns(remaining) < 0)
+ remaining = ktime_set(0, 0);
+
+ ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
+ tmcct = div64_u64(ns,
+ (APIC_BUS_CYCLE_NS * apic->divide_count));
+ }

return tmcct;
}
@@ -1400,52 +1407,65 @@ bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);

-static void cancel_hv_tscdeadline(struct kvm_lapic *apic)
+static void cancel_hv_timer(struct kvm_lapic *apic)
{
kvm_x86_ops->cancel_hv_timer(apic->vcpu);
apic->lapic_timer.hv_timer_in_use = false;
}

-void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
+static bool start_hv_timer(struct kvm_lapic *apic)
{
- struct kvm_lapic *apic = vcpu->arch.apic;
+ u64 tscdeadline;

- WARN_ON(!apic->lapic_timer.hv_timer_in_use);
- WARN_ON(swait_active(&vcpu->wq));
- cancel_hv_tscdeadline(apic);
- apic_timer_expired(apic);
-}
-EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
+ if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
+ u64 tscl = rdtsc();

-static bool start_hv_tscdeadline(struct kvm_lapic *apic)
-{
- u64 tscdeadline = apic->lapic_timer.tscdeadline;
+ apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+ * APIC_BUS_CYCLE_NS * apic->divide_count;
+ apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
+ nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
+ }
+
+ tscdeadline = apic->lapic_timer.tscdeadline;

if (atomic_read(&apic->lapic_timer.pending) ||
kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
if (apic->lapic_timer.hv_timer_in_use)
- cancel_hv_tscdeadline(apic);
+ cancel_hv_timer(apic);
} else {
apic->lapic_timer.hv_timer_in_use = true;
hrtimer_cancel(&apic->lapic_timer.timer);

/* In case the sw timer triggered in the window */
if (atomic_read(&apic->lapic_timer.pending))
- cancel_hv_tscdeadline(apic);
+ cancel_hv_timer(apic);
}
trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
apic->lapic_timer.hv_timer_in_use);
return apic->lapic_timer.hv_timer_in_use;
}

+void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
+{
+ struct kvm_lapic *apic = vcpu->arch.apic;
+
+ WARN_ON(!apic->lapic_timer.hv_timer_in_use);
+ WARN_ON(swait_active(&vcpu->wq));
+ cancel_hv_timer(apic);
+ apic_timer_expired(apic);
+
+ if (apic_lvtt_period(apic))
+ start_hv_timer(apic);
+}
+EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
+
void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;

WARN_ON(apic->lapic_timer.hv_timer_in_use);

- if (apic_lvtt_tscdeadline(apic))
- start_hv_tscdeadline(apic);
+ start_hv_timer(apic);
}
EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);

@@ -1457,12 +1477,15 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu)
if (!apic->lapic_timer.hv_timer_in_use)
return;

- cancel_hv_tscdeadline(apic);
+ cancel_hv_timer(apic);

if (atomic_read(&apic->lapic_timer.pending))
return;

- start_sw_tscdeadline(apic);
+ if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
+ start_sw_period(apic);
+ else if (apic_lvtt_tscdeadline(apic))
+ start_sw_tscdeadline(apic);
}
EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_sw_timer);

@@ -1470,10 +1493,11 @@ static void start_apic_timer(struct kvm_lapic *apic)
{
atomic_set(&apic->lapic_timer.pending, 0);

- if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
- start_sw_period(apic);
- else if (apic_lvtt_tscdeadline(apic)) {
- if (!(kvm_x86_ops->set_hv_timer && start_hv_tscdeadline(apic)))
+ if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
+ if (!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
+ start_sw_period(apic);
+ } else if (apic_lvtt_tscdeadline(apic)) {
+ if (!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
start_sw_tscdeadline(apic);
}
}
@@ -1711,8 +1735,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;

- if (!lapic_in_kernel(vcpu) || apic_lvtt_oneshot(apic) ||
- apic_lvtt_period(apic))
+ if (!lapic_in_kernel(vcpu))
return 0;

return apic->lapic_timer.tscdeadline;
--
1.9.1

2016-10-11 15:06:46

by Radim Krčmář

Subject: Re: [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode

2016-10-11 20:17+0800, Wanpeng Li:
> From: Wanpeng Li <[email protected]>
>
> Most windows guests still utilize APIC Timer periodic/oneshot mode
> instead of tsc-deadline mode, and the APIC Timer periodic/oneshot
> mode are still emulated by high overhead hrtimer on host. This patch
> converts the expected expire time of the periodic/oneshot mode to
> guest deadline tsc in order to leverage VMX preemption timer logic
> for APIC Timer tsc-deadline mode.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Yunhong Jiang <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> @@ -1101,13 +1101,20 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
> apic->lapic_timer.period == 0)
> return 0;
>
> + if (kvm_lapic_hv_timer_in_use(apic->vcpu)) {
> + u64 tscl = rdtsc();
> +
> + tmcct = apic->lapic_timer.tscdeadline -
> + kvm_read_l1_tsc(apic->vcpu, tscl);

Yes, this won't work. The easiest way to return a less bogus TMCCT
would be to remember the timeout when setting up the timer and replace
hrtimer_get_remaining() with it -- our delivery method shouldn't change
the expiration time.
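
For illustration, a minimal sketch of that approach (the target_expiration
field is hypothetical here; it does not exist in the posted patches):

static u32 apic_get_tmcct(struct kvm_lapic *apic)
{
	ktime_t remaining, now;
	s64 ns;
	u32 tmcct;

	if (kvm_lapic_get_reg(apic, APIC_TMICT) == 0 ||
	    apic->lapic_timer.period == 0)
		return 0;

	/*
	 * Use the expiration time remembered when the timer was armed
	 * instead of hrtimer_get_remaining(), so the same code works
	 * whether the hrtimer or the VMX preemption timer is in use.
	 */
	now = apic->lapic_timer.timer.base->get_time();
	remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
	if (ktime_to_ns(remaining) < 0)
		remaining = ktime_set(0, 0);

	ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
	tmcct = div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));

	return tmcct;
}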

> + } else {
> + remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
> + if (ktime_to_ns(remaining) < 0)
> + remaining = ktime_set(0, 0);
> +
> + ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> + tmcct = div64_u64(ns,
> + (APIC_BUS_CYCLE_NS * apic->divide_count));
> + }
>
> return tmcct;
> }
> @@ -1400,52 +1407,65 @@ bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
> }
> EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
>
> -static void cancel_hv_tscdeadline(struct kvm_lapic *apic)
> +static void cancel_hv_timer(struct kvm_lapic *apic)
> {
> kvm_x86_ops->cancel_hv_timer(apic->vcpu);
> apic->lapic_timer.hv_timer_in_use = false;
> }
>
> +static bool start_hv_timer(struct kvm_lapic *apic)
> {
> + u64 tscdeadline;
>
> + if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
> + u64 tscl = rdtsc();
>
> + apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
> + * APIC_BUS_CYCLE_NS * apic->divide_count;
> + apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
> + nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);

start_hv_timer() is called from kvm_lapic_switch_to_hv_timer(), which
can happen mid-period. The worst case is that the timer will never
fire, because we always convert back and forth. You have to compute the
equivalent deadline only once, and carry it around.
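
One possible shape (just a sketch of the idea, not a prescription for the
follow-up): compute the deadline once, when the guest arms the timer, and
have start_hv_timer() only program the value that was saved.

static void set_target_deadline(struct kvm_lapic *apic)
{
	u64 tscl = rdtsc();

	apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
		* APIC_BUS_CYCLE_NS * apic->divide_count;
	apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
		nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
}

static void start_apic_timer(struct kvm_lapic *apic)
{
	atomic_set(&apic->lapic_timer.pending, 0);

	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
		/*
		 * The deadline is fixed here, when the guest arms the
		 * timer; start_hv_timer() must not recompute it on a
		 * later sw->hv switch, only reuse lapic_timer.tscdeadline.
		 */
		set_target_deadline(apic);
		if (!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
			start_sw_period(apic);
	} else if (apic_lvtt_tscdeadline(apic)) {
		if (!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
			start_sw_tscdeadline(apic);
	}
}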

> + }
> +
> + tscdeadline = apic->lapic_timer.tscdeadline;
>
> if (atomic_read(&apic->lapic_timer.pending) ||
> kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
> if (apic->lapic_timer.hv_timer_in_use)
> + cancel_hv_timer(apic);
> } else {
> apic->lapic_timer.hv_timer_in_use = true;
> hrtimer_cancel(&apic->lapic_timer.timer);
>
> /* In case the sw timer triggered in the window */
> if (atomic_read(&apic->lapic_timer.pending))
> + cancel_hv_timer(apic);
> }
> trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
> apic->lapic_timer.hv_timer_in_use);
> return apic->lapic_timer.hv_timer_in_use;
> }
>
> +void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_lapic *apic = vcpu->arch.apic;
> +
> + WARN_ON(!apic->lapic_timer.hv_timer_in_use);
> + WARN_ON(swait_active(&vcpu->wq));
> + cancel_hv_timer(apic);
> + apic_timer_expired(apic);
> +
> + if (apic_lvtt_period(apic))
> + start_hv_timer(apic);
> +}
> +EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
> +
> void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
> {
> struct kvm_lapic *apic = vcpu->arch.apic;
>
> WARN_ON(apic->lapic_timer.hv_timer_in_use);
>
> - if (apic_lvtt_tscdeadline(apic))
> - start_hv_tscdeadline(apic);
> + start_hv_timer(apic);
> }
> EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
>

2016-10-12 08:35:32

by Wanpeng Li

Subject: Re: [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode

2016-10-11 23:06 GMT+08:00 Radim Krčmář <[email protected]>:
> 2016-10-11 20:17+0800, Wanpeng Li:
>> From: Wanpeng Li <[email protected]>
>>
>> Most windows guests still utilize APIC Timer periodic/oneshot mode
>> instead of tsc-deadline mode, and the APIC Timer periodic/oneshot
>> mode are still emulated by high overhead hrtimer on host. This patch
>> converts the expected expire time of the periodic/oneshot mode to
>> guest deadline tsc in order to leverage VMX preemption timer logic
>> for APIC Timer tsc-deadline mode.
>>
>> Cc: Paolo Bonzini <[email protected]>
>> Cc: Radim Krčmář <[email protected]>
>> Cc: Yunhong Jiang <[email protected]>
>> Signed-off-by: Wanpeng Li <[email protected]>
>> ---
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> @@ -1101,13 +1101,20 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
>> apic->lapic_timer.period == 0)
>> return 0;
>>
>> + if (kvm_lapic_hv_timer_in_use(apic->vcpu)) {
>> + u64 tscl = rdtsc();
>> +
>> + tmcct = apic->lapic_timer.tscdeadline -
>> + kvm_read_l1_tsc(apic->vcpu, tscl);
>
> Yes, this won't work. The easiest way to return a less bogus TMCCT
> would be remember the timeout when setting up the timer and replace
> hrtimer_get_remaining() with it -- our deliver method shouldn't change
> the expiration time.

Agreed.

>
>> + } else {
>> + remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
>> + if (ktime_to_ns(remaining) < 0)
>> + remaining = ktime_set(0, 0);
>> +
>> + ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
>> + tmcct = div64_u64(ns,
>> + (APIC_BUS_CYCLE_NS * apic->divide_count));
>> + }
>>
>> return tmcct;
>> }
>> @@ -1400,52 +1407,65 @@ bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
>> }
>> EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
>>
>> -static void cancel_hv_tscdeadline(struct kvm_lapic *apic)
>> +static void cancel_hv_timer(struct kvm_lapic *apic)
>> {
>> kvm_x86_ops->cancel_hv_timer(apic->vcpu);
>> apic->lapic_timer.hv_timer_in_use = false;
>> }
>>
>> +static bool start_hv_timer(struct kvm_lapic *apic)
>> {
>> + u64 tscdeadline;
>>
>> + if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
>> + u64 tscl = rdtsc();
>>
>> + apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
>> + * APIC_BUS_CYCLE_NS * apic->divide_count;
>> + apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
>> + nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
>
> start_hv_timer() is called from kvm_lapic_switch_to_hv_timer(), which
> can happen mid-period. The worst case is that the timer will never
> fire, because we always convert back and forth. You have to compute the
> equivalent deadline only once, and carry it around.

Agreed. Thanks for your review. :) Please see RFC V2.

Regards,
Wanpeng Li

>
>> + }
>> +
>> + tscdeadline = apic->lapic_timer.tscdeadline;
>>
>> if (atomic_read(&apic->lapic_timer.pending) ||
>> kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
>> if (apic->lapic_timer.hv_timer_in_use)
>> + cancel_hv_timer(apic);
>> } else {
>> apic->lapic_timer.hv_timer_in_use = true;
>> hrtimer_cancel(&apic->lapic_timer.timer);
>>
>> /* In case the sw timer triggered in the window */
>> if (atomic_read(&apic->lapic_timer.pending))
>> + cancel_hv_timer(apic);
>> }
>> trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
>> apic->lapic_timer.hv_timer_in_use);
>> return apic->lapic_timer.hv_timer_in_use;
>> }
>>
>> +void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
>> +{
>> + struct kvm_lapic *apic = vcpu->arch.apic;
>> +
>> + WARN_ON(!apic->lapic_timer.hv_timer_in_use);
>> + WARN_ON(swait_active(&vcpu->wq));
>> + cancel_hv_timer(apic);
>> + apic_timer_expired(apic);
>> +
>> + if (apic_lvtt_period(apic))
>> + start_hv_timer(apic);
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
>> +
>> void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
>> {
>> struct kvm_lapic *apic = vcpu->arch.apic;
>>
>> WARN_ON(apic->lapic_timer.hv_timer_in_use);
>>
>> - if (apic_lvtt_tscdeadline(apic))
>> - start_hv_tscdeadline(apic);
>> + start_hv_timer(apic);
>> }
>> EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
>>

2017-07-11 00:13:42

by Andy Lutomirski

Subject: Re: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode

On 10/11/2016 05:17 AM, Wanpeng Li wrote:
> Most windows guests which I have on hand currently still utilize APIC Timer
> periodic/oneshot mode instead of APIC Timer tsc-deadline mode:
> - windows 2008 server r2
> - windows 2012 server r2
> - windows 7
> - windows 10
>
> This patchset adds the support using the VMX preemption timer for APIC Timer
> periodic/oneshot mode.
>
> I add a print in oneshot mode testcase of kvm-unit-tests/apic.flat and observed
> that w/ patch the latency is ~2% of w/o patch. I think maybe something is still
> not right in the patchset, in addition, tmcct in apic_get_tmcct() maybe is not
> calculated correctly. Your comments to improve the patchset is a great appreciated.
>
> Wanpeng Li (2):
> KVM: lapic: Extract start_sw_period() to handle oneshot/periodic mode
> KVM: x86: Support using the vmx preemption timer for APIC Timer periodic/one mode
>
> arch/x86/kvm/lapic.c | 162 ++++++++++++++++++++++++++++++---------------------
> 1 file changed, 95 insertions(+), 67 deletions(-)
>

I think this is a step in the right direction, but I think there's a
different approach that would be much, much faster: use the VMX
preemption timer for *host* preemption. Specifically, do this:

1. Refactor the host TSC deadline timer a bit to allow the TSC deadline
timer to be "borrowed". It might look something like this:

u64 borrow_tsc_deadline(void (*timer_callback)());

The caller is now permitted to use the TSC deadline timer for its own
nefarious purposes. The caller promises to call return_tsc_deadline()
in a timely manner if the TSC exceeds the return value while the
deadline timer is borrowed.

If the TSC deadline fires while it's borrowed, timer_callback() will be
called.

void return_tsc_deadline(bool timer_fired);

The caller is done borrowing the TSC deadline timer. The caller need
not reset the TSC deadline timer MSR to its previous value before
calling this. It must be called with IRQs on and preemption off.

Getting this to work cleanly without races may be a bit tricky. So be it.

2. Teach KVM to use the VMX preemption timer as a substitute deadline
timer while in guest mode. Specifically, KVM will borrow_tsc_deadline()
(if TSC deadline is enabled) when entering guest mode and
return_tsc_deadline() when returning out of guest mode.

3. Now KVM can change its MSR bitmaps to allow the guest to program the
TSC deadline MSR directly. No exit at all needed to handle guest writes
to the deadline timer.
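
To make the proposal concrete, here is a rough skeleton. None of this exists
in the kernel today, and the helper names timer_borrowed_callback() and
enter_guest_with_preemption_timer() are made up purely for illustration:

/* Step 1: the interface the host timer code would expose. */
u64 borrow_tsc_deadline(void (*timer_callback)(void));
void return_tsc_deadline(bool timer_fired);

/*
 * Called if the borrowed hardware deadline fires while it is borrowed,
 * e.g. via a VMX preemption timer vmexit from guest mode.
 */
static void timer_borrowed_callback(void)
{
	/* Run the host's local APIC timer handling here. */
}

/* Step 2: sketch of how KVM could wrap guest entry/exit. */
static void vcpu_run_sketch(struct kvm_vcpu *vcpu)
{
	u64 host_deadline = borrow_tsc_deadline(timer_borrowed_callback);
	bool fired;

	/*
	 * Enter the guest with the VMX preemption timer armed for the
	 * earlier of host_deadline and the guest's own deadline.  With
	 * step 3, the guest writes MSR_IA32_TSC_DEADLINE directly
	 * because its intercept is cleared in the MSR bitmap, so no
	 * exit is taken for the write itself.
	 */
	fired = enter_guest_with_preemption_timer(vcpu, host_deadline);

	return_tsc_deadline(fired);
}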


Tglx, etc, would this be even remotely acceptable? I'm not personally
touching this code with a ten-foot pole, but there seem to be plenty of
people who really care about performance here.

--Andy

2017-07-11 02:44:26

by Wanpeng Li

Subject: Re: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode

2017-07-11 8:13 GMT+08:00 Andy Lutomirski <[email protected]>:
> On 10/11/2016 05:17 AM, Wanpeng Li wrote:
>>
>> Most windows guests which I have on hand currently still utilize APIC
>> Timer
>> periodic/oneshot mode instead of APIC Timer tsc-deadline mode:
>> - windows 2008 server r2
>> - windows 2012 server r2
>> - windows 7
>> - windows 10
>>
>> This patchset adds the support using the VMX preemption timer for APIC
>> Timer
>> periodic/oneshot mode.
>>
>> I add a print in oneshot mode testcase of kvm-unit-tests/apic.flat and
>> observed
>> that w/ patch the latency is ~2% of w/o patch. I think maybe something is
>> still
>> not right in the patchset, in addition, tmcct in apic_get_tmcct() maybe is
>> not
>> calculated correctly. Your comments to improve the patchset is a great
>> appreciated.
>>
>> Wanpeng Li (2):
>> KVM: lapic: Extract start_sw_period() to handle oneshot/periodic mode
>> KVM: x86: Support using the vmx preemption timer for APIC Timer
>> periodic/one mode
>>
>> arch/x86/kvm/lapic.c | 162
>> ++++++++++++++++++++++++++++++---------------------
>> 1 file changed, 95 insertions(+), 67 deletions(-)
>>
>
> I think this is a step in the right direction, but I think there's a
> different approach that would be much, much faster: use the VMX preemption
> timer for *host* preemption. Specifically, do this:

Yeah, I agree.

>
> 1. Refactor the host TSC deadline timer a bit to allow the TSC deadline
> timer to be "borrow". It might look something like this:
>
> u64 borrow_tsc_deadline(void (*timer_callback)());
>
> The caller is now permitted to use the TSC deadline timer for its own
> nefarious purposes. The caller promises to call return_tsc_deadline() in a
> timely manner if the TSC exceeds the return value while the deadline timer
> is borrowed.
>
> If the TSC deadline fires while it's borrowed, timer_callback() will be
> called.
>
> void return_tsc_deadline(bool timer_fired);
>
> The caller is done borrowing the TSC deadline timer. The caller need not
> reset the TSC deadline timer MSR to its previous value before calling this.
> It must be called with IRQs on and preemption off.
>
> Getting this to work cleanly without races may be a bit tricky. So be it.
>
> 2. Teach KVM to use the VMX preemption timer as a substitute deadline timer
> while in guest mode. Specifically, KVM will borrow_tsc_deadline() (if TSC
> deadline is enabled) when entering guest mode and return_tsc_deadline() when
> returning out of guest mode.
>
> 3. Now KVM can change its MSR bitmaps to allow the guest to program the TSC
> deadline MSR directly. No exit at all needed to handle guest writes to the
> deadline timer.
>
>
> Tglx, etc, would this be even remotely acceptable? I'm not personally
> touching this code with a ten-foot pole, but there seem to be plenty of
> people who really care about performance here.

Sigh, I heard the same idea from a company called Huawei; they have
applied for a patent on the idea.

Regards,
Wanpeng Li

2017-07-11 07:43:20

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode

On 11/07/2017 02:13, Andy Lutomirski wrote:
> On 10/11/2016 05:17 AM, Wanpeng Li wrote:
>> Most windows guests which I have on hand currently still utilize APIC
>> Timer
>> periodic/oneshot mode instead of APIC Timer tsc-deadline mode:
>> - windows 2008 server r2
>> - windows 2012 server r2
>> - windows 7
>> - windows 10
>>
>> This patchset adds the support using the VMX preemption timer for APIC
>> Timer
>> periodic/oneshot mode.
>>
>> I add a print in oneshot mode testcase of kvm-unit-tests/apic.flat and
>> observed
>> that w/ patch the latency is ~2% of w/o patch. I think maybe something
>> is still
>> not right in the patchset, in addition, tmcct in apic_get_tmcct()
>> maybe is not
>> calculated correctly. Your comments to improve the patchset is a great
>> appreciated.
>>
>> Wanpeng Li (2):
>> KVM: lapic: Extract start_sw_period() to handle oneshot/periodic mode
>> KVM: x86: Support using the vmx preemption timer for APIC Timer
>> periodic/one mode
>>
>> arch/x86/kvm/lapic.c | 162
>> ++++++++++++++++++++++++++++++---------------------
>> 1 file changed, 95 insertions(+), 67 deletions(-)
>>
>
> I think this is a step in the right direction, but I think there's a
> different approach that would be much, much faster: use the VMX
> preemption timer for *host* preemption. Specifically, do this:
>
> 1. Refactor the host TSC deadline timer a bit to allow the TSC deadline
> timer to be "borrow". It might look something like this:
>
> u64 borrow_tsc_deadline(void (*timer_callback)());
>
> The caller is now permitted to use the TSC deadline timer for its own
> nefarious purposes. The caller promises to call return_tsc_deadline()
> in a timely manner if the TSC exceeds the return value while the
> deadline timer is borrowed.
>
> If the TSC deadline fires while it's borrowed, timer_callback() will be
> called.
>
> void return_tsc_deadline(bool timer_fired);
>
> The caller is done borrowing the TSC deadline timer. The caller need
> not reset the TSC deadline timer MSR to its previous value before
> calling this. It must be called with IRQs on and preemption off.
>
> Getting this to work cleanly without races may be a bit tricky. So be it.
>
> 2. Teach KVM to use the VMX preemption timer as a substitute deadline
> timer while in guest mode. Specifically, KVM will borrow_tsc_deadline()
> (if TSC deadline is enabled) when entering guest mode and
> return_tsc_deadline() when returning out of guest mode.
>
> 3. Now KVM can change its MSR bitmaps to allow the guest to program the
> TSC deadline MSR directly. No exit at all needed to handle guest writes
> to the deadline timer.

This assumes that the TSC deadline MSR observes the guest TSC offset,
which I'm not at all sure of. If it doesn't, you break live migration.
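
To spell out the offset problem (a simplified sketch, ignoring TSC scaling):
KVM derives the guest TSC as guest_tsc = host_tsc + tsc_offset, while the
physical MSR_IA32_TSC_DEADLINE comparator works on the host TSC. If the guest
writes the MSR directly, there is no exit at which to apply a conversion like
the one below, and the offset changes across live migration.

/*
 * Hypothetical conversion a trap handler would have to perform so that
 * a guest-units deadline fires at the right host TSC value (assumes no
 * TSC scaling; the tsc_offset field name is as I recall it in KVM).
 */
static u64 guest_deadline_to_host_tsc(struct kvm_vcpu *vcpu,
				      u64 guest_deadline)
{
	return guest_deadline - vcpu->arch.tsc_offset;
}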

Also, while it would halve the cost of a guest's programming of the
timer tick, you would still incur the cost of a vmexit to call
timer_callback (it would be different if you could program the TSC
deadline timer to send a posted interrupt, of course). The slowdown
would be halved, but still a far cry from bare metal.

Really, we should just ask Intel to virtualize the TSC deadline MSR.

Paolo

2017-07-11 14:41:15

by Andy Lutomirski

Subject: Re: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode

On Mon, Jul 10, 2017 at 7:43 PM, Wanpeng Li <[email protected]> wrote:
>
> Sigh, I heard the same idea from a company called Huawei, they applied
> the patent for the idea.

Seriously? What arseholes.

Also, calling that an invention is a bit of a joke. It's called the
VMX *preemption* timer. Using it for preemption isn't exactly rocket
science. Neither is passing an MSR straight through.

That being said, the lack of scaling and offset support is indeed problematic.