2017-09-29 01:05:06

by Wanpeng Li

Subject: [PATCH v2 0/4] KVM: LAPIC: Rework lapic timer to behave more like real-hardware

The issue was reported in the Xen community.

Anthony PERARD pointed out:

https://www.mail-archive.com/[email protected]/msg117283.html#

| When developing PVH for OVMF, I've used the lapic timer. It turns out that the
| way it is used by OVMF did not work with Xen [1]. I tried to find out how
| real hardware behaves, and wrote XTF tests [2]. This patch series tries to fix
| the behavior of the vlapic timer.
|
|
| The OVMF driver for the APIC timer initializes the timer like this:
| write to TMICT (initial counter)
| write to TMDCR (divide configuration)
| enable the timer (this may change timer mode from one-shot to periodic)
| It turns out that TMICT is set to 0 on the last step, but OVMF expects the timer
| to run.
|
| Here is a description of the APIC timer, based on observation as well as on
| reading the Intel SDM. The description is also part of the patch descriptions
| (reworded).
|
| Maybe a way of thinking about how the APIC timer is evaluated is to think of how
| hardware would do it. There is a counter, TMCCT, which always keeps counting down.
|
| Setting TMICT also sets TMCCT; nothing else matters.
| Setting LVTT does not change anything right away.
| Setting TMDCR does not change much.
|
| Now TMCCT keeps counting down, at a rate related to TMDCR.
| Once TMCCT reaches 0, it is only at this time that LVTT is taken into account.
| Is there an interrupt to deliver? Should the timer restart counting from the
| value in TMICT?
|
| In the Intel SDM, the word "disarm" is used for the timer. I guess the
| easiest way to disarm the APIC timer (when in periodic or one-shot mode) is to set
| TMICT to 0. But if we take TSC-Deadline mode out of the picture, there is
| nothing in the manual that says that the timer is disarmed or stopped when
| changing timer mode (there are only two modes left, periodic and one-shot).
|
| As for the TSC-deadline timer mode, observation has shown that changing to it (or
| from it) does reset and disarm both timers, so effectively TMICT and the
| tscdeadline are set to 0.
|
| [1] https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg00959.html
| [2] v1:
| https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg02533.html
| v2: look for "[XTF PATCH V2 0/3] Testing vlapic timer"
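
The model described in the quote above can be sketched as a small user-space
simulation. This is only an illustration of the described behavior, not KVM or
Xen code: the struct, the function names and the idea of a "tick" (one
decrement of TMCCT, after the divider has been applied) are all assumptions
made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the one-shot/periodic APIC timer (no TSC-deadline). */
enum lapic_mode { MODE_ONESHOT, MODE_PERIODIC };

struct lapic_timer_model {
	uint32_t tmict;        /* initial count */
	uint32_t tmcct;        /* current count, 0 means disarmed */
	enum lapic_mode mode;  /* timer mode field of LVTT */
	bool masked;           /* mask bit of LVTT */
};

/* Writing TMICT reloads TMCCT; nothing else matters. */
static void model_write_tmict(struct lapic_timer_model *t, uint32_t val)
{
	t->tmict = val;
	t->tmcct = val;
}

/* Writing LVTT does not touch the counter; it is only consulted at zero. */
static void model_write_lvtt(struct lapic_timer_model *t,
			     enum lapic_mode mode, bool masked)
{
	t->mode = mode;
	t->masked = masked;
}

/*
 * Advance the counter by one tick. Returns true if an interrupt would be
 * delivered on this tick.
 */
static bool model_tick(struct lapic_timer_model *t)
{
	if (t->tmcct == 0)
		return false;          /* timer is disarmed */
	if (--t->tmcct != 0)
		return false;          /* still counting down */

	/* The counter just reached zero: only now is LVTT evaluated. */
	if (t->mode == MODE_PERIODIC)
		t->tmcct = t->tmict;   /* restart from the initial count */
	return !t->masked;
}
```

In this sketch, switching from one-shot to periodic while the counter is
running leaves TMCCT untouched, and the new mode only takes effect when the
counter reaches zero, which is the behavior OVMF relies on.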

In addition, Patch 4/4 implements the illegal vector error handling according to
SDM 10.5.2~10.5.3.

v1 -> v2:
* add cover-letter and collect recent lapic patches to one patchset

Wanpeng Li (4):
KVM: LAPIC: Fix lapic timer mode transition
KVM: LAPIC: Keep timer running when switching between one-shot and periodic mode
KVM: LAPIC: Apply change to TDCR right away to the timer
KVM: LAPIC: Don't silently accept bad vectors

arch/x86/include/asm/apicdef.h | 1 +
arch/x86/kvm/lapic.c | 90 ++++++++++++++++++++++++++++++++++--------
2 files changed, 74 insertions(+), 17 deletions(-)

--
2.7.4


2017-09-29 01:05:10

by Wanpeng Li

Subject: [PATCH v2 1/4] KVM: LAPIC: Fix lapic timer mode transition

From: Wanpeng Li <[email protected]>

SDM 10.5.4.1 (TSC-Deadline Mode) states that "Transitioning between TSC-Deadline
mode and other timer modes also disarms the timer". So the APIC Timer Initial Count
Register used by one-shot/periodic mode should be reset on such a transition. This
patch does that.
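
As a stand-alone illustration of the check added below, the following sketch
tests whether an LVTT write crosses the TSC-deadline boundary. The mode
constants mirror the values in apicdef.h (with an unsigned suffix added); the
two helper names are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Timer mode field of LVTT, bits 18:17 (values as in apicdef.h). */
#define APIC_LVT_TIMER_MASK		(3u << 17)
#define APIC_LVT_TIMER_ONESHOT		(0u << 17)
#define APIC_LVT_TIMER_PERIODIC		(1u << 17)
#define APIC_LVT_TIMER_TSCDEADLINE	(2u << 17)

/* Hypothetical helper: is this LVTT value selecting TSC-deadline mode? */
static bool lvtt_is_tscdeadline(uint32_t lvtt)
{
	return (lvtt & APIC_LVT_TIMER_MASK) == APIC_LVT_TIMER_TSCDEADLINE;
}

/*
 * Hypothetical helper: true when the write moves the timer into or out of
 * TSC-deadline mode, i.e. when TMICT should be reset to 0.
 */
static bool lvtt_write_disarms_timer(uint32_t old_lvtt, uint32_t new_lvtt)
{
	return lvtt_is_tscdeadline(old_lvtt) != lvtt_is_tscdeadline(new_lvtt);
}
```

A switch between one-shot and periodic returns false here, so the initial
count is left alone in that case.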

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/include/asm/apicdef.h | 1 +
arch/x86/kvm/lapic.c | 3 +++
2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index c46bb99..d8ef1b4 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -100,6 +100,7 @@
#define APIC_TIMER_BASE_CLKIN 0x0
#define APIC_TIMER_BASE_TMBASE 0x1
#define APIC_TIMER_BASE_DIV 0x2
+#define APIC_LVT_TIMER_MASK (3 << 17)
#define APIC_LVT_TIMER_ONESHOT (0 << 17)
#define APIC_LVT_TIMER_PERIODIC (1 << 17)
#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 69c5612..a739cbb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1722,6 +1722,9 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
break;

case APIC_LVTT:
+ if (apic_lvtt_tscdeadline(apic) != ((val &
+ APIC_LVT_TIMER_MASK) == APIC_LVT_TIMER_TSCDEADLINE))
+ kvm_lapic_set_reg(apic, APIC_TMICT, 0);
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
--
2.7.4

2017-09-29 01:05:18

by Wanpeng Li

Subject: [PATCH v2 3/4] KVM: LAPIC: Apply change to TDCR right away to the timer

From: Wanpeng Li <[email protected]>

The Intel SDM describes how the divide configuration register is used:
"The APIC timer frequency will be the processor's bus
clock or core crystal clock frequency divided by the value specified in
the divide configuration register."

Observation of bare metal showed that when the TDCR is changed, the TMCCT
does not change or make a big jump in value, but the rate at which it
counts down changes.

This patch updates the emulation of the APIC timer so that a change to the
divide configuration is reflected in the value of the counter and in
when the next interrupt is triggered.
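
The rescaling step can be illustrated in isolation. The remaining count in
TMCCT stays the same across a TDCR write, so the remaining wall-clock time
scales by new_divisor / old_divisor. The helper below is a hypothetical
user-space sketch of that arithmetic, not the function used in the patch, and
it assumes the intermediate product fits in 64 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rescale the time left until expiry after the divide
 * configuration changes. The number of counts left is unchanged; only the
 * duration of each count changes.
 */
static uint64_t rescale_remaining_ns(uint64_t remaining_ns,
				     uint32_t old_divisor,
				     uint32_t new_divisor)
{
	if (old_divisor == 0)
		return remaining_ns;	/* nothing sensible to scale by */

	/* Assumes remaining_ns * new_divisor does not overflow 64 bits. */
	return remaining_ns * new_divisor / old_divisor;
}
```

For example, with 1000 ns left at divisor 2, a switch to divisor 8 makes each
count four times longer, so 4000 ns remain.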

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 946c11b..6bafd06 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1432,7 +1432,7 @@ static void start_sw_period(struct kvm_lapic *apic)
HRTIMER_MODE_ABS_PINNED);
}

-static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
+static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update, uint32_t old_divisor)
{
ktime_t now, remaining;
u64 tscl = rdtsc(), delta;
@@ -1440,7 +1440,7 @@ static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
/* Calculate the next time the timer should trigger an interrupt */
now = ktime_get();
apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
- * APIC_BUS_CYCLE_NS * apic->divide_count;
+ * APIC_BUS_CYCLE_NS * old_divisor;

if (!apic->lapic_timer.period)
return false;
@@ -1485,6 +1485,12 @@ static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
if (!delta)
return false;

+ if (apic->divide_count != old_divisor) {
+ apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+ * APIC_BUS_CYCLE_NS * apic->divide_count;
+ delta = delta * apic->divide_count / old_divisor;
+ }
+
apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
nsec_to_cycles(apic->vcpu, delta);
apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
@@ -1624,12 +1630,13 @@ void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
restart_apic_timer(apic);
}

-static void start_apic_timer(struct kvm_lapic *apic, bool timer_update)
+static void start_apic_timer(struct kvm_lapic *apic, bool timer_update,
+ uint32_t old_divisor)
{
atomic_set(&apic->lapic_timer.pending, 0);

if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
- && !set_target_expiration(apic, timer_update))
+ && !set_target_expiration(apic, timer_update, old_divisor))
return;

restart_apic_timer(apic);
@@ -1745,7 +1752,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
kvm_lapic_set_reg(apic, APIC_LVTT, val);
if (apic_update_lvtt(apic) && !apic_lvtt_tscdeadline(apic))
- start_apic_timer(apic, true);
+ start_apic_timer(apic, true, apic->divide_count);
break;

case APIC_TMICT:
@@ -1754,16 +1761,20 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)

hrtimer_cancel(&apic->lapic_timer.timer);
kvm_lapic_set_reg(apic, APIC_TMICT, val);
- start_apic_timer(apic, false);
+ start_apic_timer(apic, false, apic->divide_count);
break;

- case APIC_TDCR:
+ case APIC_TDCR: {
+ uint32_t current_divisor = apic->divide_count;
+
if (val & 4)
apic_debug("KVM_WRITE:TDCR %x\n", val);
kvm_lapic_set_reg(apic, APIC_TDCR, val);
update_divide_count(apic);
+ hrtimer_cancel(&apic->lapic_timer.timer);
+ start_apic_timer(apic, true, current_divisor);
break;
-
+ }
case APIC_ESR:
if (apic_x2apic_mode(apic) && val != 0) {
apic_debug("KVM_WRITE:ESR not zero %x\n", val);
@@ -1888,7 +1899,7 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)

hrtimer_cancel(&apic->lapic_timer.timer);
apic->lapic_timer.tscdeadline = data;
- start_apic_timer(apic, false);
+ start_apic_timer(apic, false, apic->divide_count);
}

void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
@@ -2254,7 +2265,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
apic_update_lvtt(apic);
apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));
update_divide_count(apic);
- start_apic_timer(apic, false);
+ start_apic_timer(apic, false, apic->divide_count);
apic->irr_pending = true;
apic->isr_count = vcpu->arch.apicv_active ?
1 : count_vectors(apic->regs + APIC_ISR);
--
2.7.4

2017-09-29 01:05:34

by Wanpeng Li

Subject: [PATCH v2 4/4] KVM: LAPIC: Don't silently accept bad vectors

From: Wanpeng Li <[email protected]>

Vectors 0-15 are reserved, and a physical LAPIC - upon sending or
receiving one - would generate an APIC error instead of doing the
requested action. Make our emulation behave similarly.
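
The two checks involved can be sketched outside the kernel as follows. The
constants mirror apicdef.h; the helper names are hypothetical, and the ESR
handling is simplified (it ignores the write-to-latch behavior of the real
register).

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in apicdef.h. */
#define APIC_DM_FIXED		0x00000u
#define APIC_ESR_SENDILL	0x00020u
#define APIC_ESR_RECVILL	0x00040u

/* Hypothetical helper: fixed-mode interrupts may not use vectors 0-15. */
static bool vector_is_illegal(uint32_t delivery_mode, uint32_t vector)
{
	return delivery_mode == APIC_DM_FIXED && vector < 16;
}

/*
 * Hypothetical helper: record an error in a simplified ESR. Returns true
 * only when the error bits were not all set already, which is the case in
 * which the LVT error interrupt would be raised (if it is unmasked).
 */
static bool esr_record(uint32_t *esr, uint32_t errmask)
{
	if ((*esr & errmask) == errmask)
		return false;	/* already reported */
	*esr |= errmask;
	return true;
}
```

Other delivery modes such as NMI ignore the vector field, which is why the
check is limited to APIC_DM_FIXED.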

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6bafd06..a779ba9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -935,6 +935,25 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
return ret;
}

+static void apic_error(struct kvm_lapic *apic, unsigned long errmask)
+{
+ uint32_t esr;
+
+ esr = kvm_lapic_get_reg(apic, APIC_ESR);
+
+ if ((esr & errmask) != errmask) {
+ uint32_t lvterr = kvm_lapic_get_reg(apic, APIC_LVTERR);
+
+ kvm_lapic_set_reg(apic, APIC_ESR, esr | errmask);
+ if (!(lvterr & APIC_LVT_MASKED)) {
+ struct kvm_lapic_irq irq;
+
+ irq.vector = lvterr & 0xff;
+ kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
+ }
+ }
+}
+
/*
* Add a pending IRQ into lapic.
* Return 1 if successfully added and 0 if discarded.
@@ -946,6 +965,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
int result = 0;
struct kvm_vcpu *vcpu = apic->vcpu;

+ if (unlikely(vector < 16) && delivery_mode == APIC_DM_FIXED) {
+ apic_error(apic, APIC_ESR_RECVILL);
+ return 0;
+ }
+
trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
trig_mode, vector);
switch (delivery_mode) {
@@ -1146,7 +1170,10 @@ static void apic_send_ipi(struct kvm_lapic *apic)
irq.trig_mode, irq.level, irq.dest_mode, irq.delivery_mode,
irq.vector, irq.msi_redir_hint);

- kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
+ if (unlikely(irq.vector < 16 && irq.delivery_mode == APIC_DM_FIXED))
+ apic_error(apic, APIC_ESR_SENDILL);
+ else
+ kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
}

static u32 apic_get_tmcct(struct kvm_lapic *apic)
@@ -1734,7 +1761,6 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
case APIC_LVTPC:
case APIC_LVT1:
case APIC_LVTERR:
- /* TODO: Check vector */
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;

--
2.7.4

2017-09-29 01:05:58

by Wanpeng Li

Subject: [PATCH v2 2/4] KVM: LAPIC: Keep timer running when switching between one-shot and periodic mode

From: Wanpeng Li <[email protected]>

If we take TSC-deadline mode out of the picture, the Intel SDM
does not say that the timer is disabled when the timer mode is changed,
either from one-shot to periodic or vice versa.

After this patch, the timer is no longer disarmed on change of mode, so
the counter (TMCCT) keeps counting down.

So what does a write to LVTT change? On bare metal, the change of mode
is probably taken into account only when the counter reaches 0. When this
happens, LVTT is used to figure out whether the counter should restart counting
down from TMICT (periodic mode) or stop counting (one-shot mode).

This patch is based on observation of the behavior of the APIC timer on
bare metal, as well as on checking that the changes do not go against the
description in the Intel SDM.
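
The "keep counting" computation can be shown on its own: on a mode change the
next expiry is derived from the time still left on the old deadline rather
than from a full new period. The helper below is a hypothetical user-space
sketch using plain nanosecond integers in place of ktime_t; a result of 0
corresponds to the case in which the patch does not restart the timer.

```c
#include <stdint.h>

/*
 * Hypothetical helper: time until the next expiry when the timer mode
 * changes while the counter is running. An already-passed deadline is
 * clamped to zero, and the result is reduced modulo the period.
 */
static uint64_t next_delta_ns(int64_t target_expiration_ns, int64_t now_ns,
			      uint64_t period_ns)
{
	int64_t remaining = target_expiration_ns - now_ns;

	if (period_ns == 0)
		return 0;	/* TMICT is 0: timer is disarmed */
	if (remaining < 0)
		remaining = 0;	/* deadline already passed */

	return (uint64_t)remaining % period_ns;
}
```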

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 40 ++++++++++++++++++++++++++++------------
1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a739cbb..946c11b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1301,7 +1301,7 @@ static void update_divide_count(struct kvm_lapic *apic)
apic->divide_count);
}

-static void apic_update_lvtt(struct kvm_lapic *apic)
+static bool apic_update_lvtt(struct kvm_lapic *apic)
{
u32 timer_mode = kvm_lapic_get_reg(apic, APIC_LVTT) &
apic->lapic_timer.timer_mode_mask;
@@ -1309,7 +1309,9 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
if (apic->lapic_timer.timer_mode != timer_mode) {
apic->lapic_timer.timer_mode = timer_mode;
hrtimer_cancel(&apic->lapic_timer.timer);
+ return true;
}
+ return false;
}

static void apic_timer_expired(struct kvm_lapic *apic)
@@ -1430,11 +1432,12 @@ static void start_sw_period(struct kvm_lapic *apic)
HRTIMER_MODE_ABS_PINNED);
}

-static bool set_target_expiration(struct kvm_lapic *apic)
+static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
{
- ktime_t now;
- u64 tscl = rdtsc();
+ ktime_t now, remaining;
+ u64 tscl = rdtsc(), delta;

+ /* Calculate the next time the timer should trigger an interrupt */
now = ktime_get();
apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
* APIC_BUS_CYCLE_NS * apic->divide_count;
@@ -1470,9 +1473,21 @@ static bool set_target_expiration(struct kvm_lapic *apic)
ktime_to_ns(ktime_add_ns(now,
apic->lapic_timer.period)));

+ if (!timer_update)
+ delta = apic->lapic_timer.period;
+ else {
+ remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
+ if (ktime_to_ns(remaining) < 0)
+ remaining = 0;
+ delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
+ }
+
+ if (!delta)
+ return false;
+
apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
- nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
- apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
+ nsec_to_cycles(apic->vcpu, delta);
+ apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);

return true;
}
@@ -1609,12 +1624,12 @@ void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
restart_apic_timer(apic);
}

-static void start_apic_timer(struct kvm_lapic *apic)
+static void start_apic_timer(struct kvm_lapic *apic, bool timer_update)
{
atomic_set(&apic->lapic_timer.pending, 0);

if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
- && !set_target_expiration(apic))
+ && !set_target_expiration(apic, timer_update))
return;

restart_apic_timer(apic);
@@ -1729,7 +1744,8 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
val |= APIC_LVT_MASKED;
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
kvm_lapic_set_reg(apic, APIC_LVTT, val);
- apic_update_lvtt(apic);
+ if (apic_update_lvtt(apic) && !apic_lvtt_tscdeadline(apic))
+ start_apic_timer(apic, true);
break;

case APIC_TMICT:
@@ -1738,7 +1754,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)

hrtimer_cancel(&apic->lapic_timer.timer);
kvm_lapic_set_reg(apic, APIC_TMICT, val);
- start_apic_timer(apic);
+ start_apic_timer(apic, false);
break;

case APIC_TDCR:
@@ -1872,7 +1888,7 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)

hrtimer_cancel(&apic->lapic_timer.timer);
apic->lapic_timer.tscdeadline = data;
- start_apic_timer(apic);
+ start_apic_timer(apic, false);
}

void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
@@ -2238,7 +2254,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
apic_update_lvtt(apic);
apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));
update_divide_count(apic);
- start_apic_timer(apic);
+ start_apic_timer(apic, false);
apic->irr_pending = true;
apic->isr_count = vcpu->arch.apicv_active ?
1 : count_vectors(apic->regs + APIC_ISR);
--
2.7.4