2016-04-07 08:21:12

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 00/11] KVM: x86: Introduce SVM AVIC support

CHANGES FROM RFCv3:
==================
(https://lkml.org/lkml/2016/3/18/24)
* Add Reviewed-by tags to patches 3, 4, and 5.
* Make minor changes to some variable names (per Paolo).
* Merge the AVIC vmexit handler and tracepoint patches (per Paolo).
* Delay the x86_set_memory_region() call from the vm_init phase to
when we create the first vcpu.
* Clean up the x2APIC disabling code when enabling AVIC.
* Clean up the CR8 interception code when enabling AVIC.
* Clean up avic_vcpu_load() and avic_set_running() (per Radim).
* Note that nested VM is not currently supported with AVIC enabled.
However, it should still work with AVIC disabled. This is still work
in progress.

CHANGES FROM RFCv2:
==================
(https://lkml.org/lkml/2016/3/4/746)
* Do not use bit-field in VMCB.
* Get rid of struct svm_vm_data, and consolidate the struct members
into struct kvm_arch.
* Do not abbreviate AVIC vmexit function and structure names.
* Add VM init/uninit function hooks and use them to initialize /
uninitialize per-VM data structures.
* Simplify the AVIC backing page allocation to use the emulated LAPIC
register page.
* Introduce kvm_vcpu_wake_up() (from Radim).
* Clean up AVIC vmexit handling functions.
* Replace pr_debug with new AVIC tracepoints.
* Use a per-vcpu AVIC enable check (besides the global avic flag).
* Add smp_mb__after_atomic() in svm_deliver_avic_intr().
* Get rid of IOMMU-related code for now.

CHANGES FROM RFCv1:
==================
(https://lkml.org/lkml/2016/2/12/225)
* Rebase onto tip.git.
* Use the vAPIC backing page as the emulated LAPIC register page.
* Clarify with the HW engineer that AVIC sets the IRR of all the
targeted cores whether they are running or not. It sends doorbells to
the ones that are running to get them to evaluate their new state.
If one or more were not running, it triggers a #VMEXIT to indicate that
the host needs to go run the not-running guests. It gives the
ID of one of the guests that it observed not running, which should
be enough of a hint for the hypervisor to track them all down.
* Rewrite the logic for handling the AVIC_INCOMPLETE_IPI #vmexit for
the IPI-target-not-running case based on the information above.
* Rewrite the logic in avic_vcpu_load() and avic_set_running().
* Rewrite the interrupt injection code to remove the avic_pending_cnt
variable.
* Add vcpu blocking/unblocking function hooks.

GITHUB
======
Latest git tree can be found at:
http://github.com/ssuthiku/linux.git avic_part1_rfc_v4

OVERVIEW
========
This patch set is the first of a two-part patch series to introduce
the new AMD Advanced Virtual Interrupt Controller (AVIC) support.

Basically, SVM AVIC hardware virtualizes the local APIC registers of each
vCPU via the virtual APIC (vAPIC) backing page. This allows guest access
to certain APIC registers without the need to emulate the hardware behavior
in the hypervisor. More information about AVIC can be found in the
AMD64 Architecture Programmer’s Manual Volume 2 - System Programming.

http://support.amd.com/TechDocs/24593.pdf

For SVM AVIC, we extend the existing kvm_amd driver to:
* Check CPUID to detect AVIC support in the processor
* Program new fields in VMCB to enable AVIC
* Introduce new AVIC data structures and add code to manage them
* Handle two new AVIC #VMEXITs
* Add new interrupt injection code using the vAPIC backing page
instead of the existing V_IRQ, V_INTR_PRIO, V_INTR_VECTOR,
and V_IGN_TPR fields

Currently, this patch series does not enable AVIC by default.
Users can enable SVM AVIC by passing avic=1 when loading the kvm-amd module.
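
For example:

  modprobe kvm-amd avic=1

or, when inserting the module file directly:

  insmod kvm-amd.ko avic=1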

Later, in part 2, we will introduce the IOMMU AVIC support, which
provides a speedup for the PCI device pass-through use case by allowing
the IOMMU hardware to inject interrupts directly into the guest via
the vAPIC backing page.

PERFORMANCE RESULTS
===================
Currently, AVIC is supported in the AMD family 15h models 6Xh
(Carrizo) processors. Therefore, such a processor was used to collect
the performance data shown below.

Generally, SVM AVIC alone (w/o IOMMU AVIC) should provide a speedup for
IPIs, since the hypervisor does not require a VMEXIT to inject these
interrupts. It should also speed up the case when the hypervisor wants
to inject an interrupt into a running guest: it sets the corresponding
IRR bit in the vAPIC backing page and writes the AVIC_DOORBELL MSR.
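
This injection path is introduced in patch 07 (svm_deliver_avic_intr());
a sketch of it is shown below, with braces and comments added here for
illustration:

static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/* Mark the interrupt pending in the vAPIC backing page ... */
	kvm_lapic_set_irr(vec, svm->vcpu.arch.apic);
	/* ... and order the IRR update before the is-running check. */
	smp_mb__after_atomic();

	if (avic_vcpu_is_running(vcpu)) {
		/* Target vcpu is running: ring the doorbell of its
		 * physical core so the hardware re-evaluates the IRR,
		 * with no VMEXIT required.
		 */
		wrmsrl(SVM_AVIC_DOORBELL,
		       __default_cpu_present_to_apicid(vcpu->cpu));
	} else {
		/* Target vcpu is not running: wake it up so it picks
		 * up the pending interrupt on its next VMRUN.
		 */
		kvm_vcpu_wake_up(vcpu);
	}
}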

IPI PERFORMANCE
===============

* BENCHMARK 1: HACKBENCH
For IPIs, I have collected some performance numbers on 2 and 4 vcpus
running hackbench with the following details:

hackbench -p -l 100000
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 100000 messages of 100 bytes

              | 2 vcpus       | 4 vcpus
------------------------------------------------
Vanilla       | 273.76        | 190.21
AVIC disabled | 260.51 (~5%)  | 184.40 (~5%)
AVIC          | 248.53 (~10%) | 155.01 (~20%)

Note: The unit of the results above is seconds (lower is better).

OVERALL PERFORMANCE
===================
Enabling AVIC should help speed up workloads that generate a large
number of interrupts. However, it requires additional logic to
maintain AVIC-specific data structures during vCPU load/unload
due to vcpu scheduling.

The goal is to minimize the overhead of AVIC in most cases, so that
we can achieve equivalent or improved overall performance when
enabling AVIC.

* BENCHMARK 1: TAR DECOMPRESSION
This test measures the average running time (of 10 runs) of the following
tar decompression command with 1, 2, and 4 vcpus.

tar xf linux-4.3.3.tar.xz

              | 4 vcpus
---------------------------------
Vanilla       | 10.26
AVIC disabled | 10.10 (~1.5%)
AVIC          | 10.07 (~1.8%)

Note: The unit of the results above is seconds (lower is better).

* BENCHMARK 2: NETPERF w/ virtual network
This test creates a virtual network by setting up a bridge and tap device
on the host and passing it into the VM as a virtio-net-pci device w/ vhost.
Then it sets up netserver on the host machine and runs netperf
in the VM with the following options:

netperf -H <netserver ip> -l 60 -t TCP_RR -D 2

              | 1 vcpu
------------------------------------
Vanilla       | 21623.887
AVIC disabled | 21538.09 (~-0.4%)
AVIC          | 21712.68 (~0.4%)

Note: The unit of the results above is trans/sec (higher is better).

Preliminary results of both benchmarks show that AVIC performance is
slightly better than in the other two cases.

CURRENT UNTESTED USE-CASES
===========================
- VM Migration (work in progress)
- Nested VM

Any feedback and comments are very much appreciated.

Thank you,
Suravee

Radim Krčmář (1):
KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick

Suravee Suthikulpanit (10):
KVM: x86: Misc LAPIC changes to expose helper functions
KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks
KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
svm: Introduce new AVIC VMCB registers
KVM: x86: Detect and Initialize AVIC support
svm: Add interrupt injection via AVIC
svm: Add VMEXIT handlers for AVIC
svm: Do not expose x2APIC when enable AVIC
svm: Do not intercept CR8 when enable AVIC
svm: Manage vcpu load/unload when enable AVIC

arch/x86/include/asm/kvm_host.h | 25 +-
arch/x86/include/asm/svm.h | 12 +-
arch/x86/include/uapi/asm/svm.h | 9 +-
arch/x86/kvm/lapic.c | 127 ++++----
arch/x86/kvm/lapic.h | 32 ++
arch/x86/kvm/svm.c | 637 +++++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/trace.h | 57 ++++
arch/x86/kvm/x86.c | 8 +
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 19 +-
10 files changed, 833 insertions(+), 94 deletions(-)

--
1.9.1


2016-04-07 08:21:20

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 01/11] KVM: x86: Misc LAPIC changes to expose helper functions

Export LAPIC utility functions and macros for reuse in SVM code.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/lapic.c | 127 +++++++++++++++++++++------------------------------
arch/x86/kvm/lapic.h | 29 ++++++++++++
2 files changed, 82 insertions(+), 74 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 443d2a5..99c9757 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -59,9 +59,8 @@
/* #define apic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) */
#define apic_debug(fmt, arg...)

-#define APIC_LVT_NUM 6
/* 14 is the version for Xeon and Pentium 8.4.8*/
-#define APIC_VERSION (0x14UL | ((APIC_LVT_NUM - 1) << 16))
+#define APIC_VERSION (0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
#define LAPIC_MMIO_LENGTH (1 << 12)
/* followed define is not in apicdef.h */
#define APIC_SHORT_MASK 0xc0000
@@ -73,14 +72,6 @@
#define APIC_BROADCAST 0xFF
#define X2APIC_BROADCAST 0xFFFFFFFFul

-#define VEC_POS(v) ((v) & (32 - 1))
-#define REG_POS(v) (((v) >> 5) << 4)
-
-static inline void apic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
-{
- *((u32 *) (apic->regs + reg_off)) = val;
-}
-
static inline int apic_test_vector(int vec, void *bitmap)
{
return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
@@ -94,11 +85,6 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
apic_test_vector(vector, apic->regs + APIC_IRR);
}

-static inline void apic_set_vector(int vec, void *bitmap)
-{
- set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
-}
-
static inline void apic_clear_vector(int vec, void *bitmap)
{
clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
@@ -212,7 +198,7 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
{
bool enabled = val & APIC_SPIV_APIC_ENABLED;

- apic_set_reg(apic, APIC_SPIV, val);
+ kvm_lapic_set_reg(apic, APIC_SPIV, val);

if (enabled != apic->sw_enabled) {
apic->sw_enabled = enabled;
@@ -226,13 +212,13 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)

static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
{
- apic_set_reg(apic, APIC_ID, id << 24);
+ kvm_lapic_set_reg(apic, APIC_ID, id << 24);
recalculate_apic_map(apic->vcpu->kvm);
}

static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
{
- apic_set_reg(apic, APIC_LDR, id);
+ kvm_lapic_set_reg(apic, APIC_LDR, id);
recalculate_apic_map(apic->vcpu->kvm);
}

@@ -240,8 +226,8 @@ static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u8 id)
{
u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));

- apic_set_reg(apic, APIC_ID, id << 24);
- apic_set_reg(apic, APIC_LDR, ldr);
+ kvm_lapic_set_reg(apic, APIC_ID, id << 24);
+ kvm_lapic_set_reg(apic, APIC_LDR, ldr);
recalculate_apic_map(apic->vcpu->kvm);
}

@@ -287,10 +273,10 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))))
v |= APIC_LVR_DIRECTED_EOI;
- apic_set_reg(apic, APIC_LVR, v);
+ kvm_lapic_set_reg(apic, APIC_LVR, v);
}

-static const unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
+static const unsigned int apic_lvt_mask[KVM_APIC_LVT_NUM] = {
LVT_MASK , /* part LVTT mask, timer mode mask added at runtime */
LVT_MASK | APIC_MODE_MASK, /* LVTTHMR */
LVT_MASK | APIC_MODE_MASK, /* LVTPC */
@@ -349,16 +335,6 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
}
EXPORT_SYMBOL_GPL(kvm_apic_update_irr);

-static inline void apic_set_irr(int vec, struct kvm_lapic *apic)
-{
- apic_set_vector(vec, apic->regs + APIC_IRR);
- /*
- * irr_pending must be true if any interrupt is pending; set it after
- * APIC_IRR to avoid race with apic_clear_irr
- */
- apic->irr_pending = true;
-}
-
static inline int apic_search_irr(struct kvm_lapic *apic)
{
return find_highest_vector(apic->regs + APIC_IRR);
@@ -563,7 +539,7 @@ static void apic_update_ppr(struct kvm_lapic *apic)
apic, ppr, isr, isrv);

if (old_ppr != ppr) {
- apic_set_reg(apic, APIC_PROCPRI, ppr);
+ kvm_lapic_set_reg(apic, APIC_PROCPRI, ppr);
if (ppr < old_ppr)
kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
}
@@ -571,7 +547,7 @@ static void apic_update_ppr(struct kvm_lapic *apic)

static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
{
- apic_set_reg(apic, APIC_TASKPRI, tpr);
+ kvm_lapic_set_reg(apic, APIC_TASKPRI, tpr);
apic_update_ppr(apic);
}

@@ -668,6 +644,7 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
return false;
}
}
+EXPORT_SYMBOL_GPL(kvm_apic_match_dest);

int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
const unsigned long *bitmap, u32 bitmap_size)
@@ -921,7 +898,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,

if (apic_test_vector(vector, apic->regs + APIC_TMR) != !!trig_mode) {
if (trig_mode)
- apic_set_vector(vector, apic->regs + APIC_TMR);
+ kvm_lapic_set_vector(vector, apic->regs + APIC_TMR);
else
apic_clear_vector(vector, apic->regs + APIC_TMR);
}
@@ -929,7 +906,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
if (vcpu->arch.apicv_active)
kvm_x86_ops->deliver_posted_interrupt(vcpu, vector);
else {
- apic_set_irr(vector, apic);
+ kvm_lapic_set_irr(vector, apic);

kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_vcpu_kick(vcpu);
@@ -1186,7 +1163,7 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
return container_of(dev, struct kvm_lapic, dev);
}

-static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
+int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
void *data)
{
unsigned char alignment = offset & 0xf;
@@ -1223,6 +1200,7 @@ static int apic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
}
return 0;
}
+EXPORT_SYMBOL_GPL(kvm_lapic_reg_read);

static int apic_mmio_in_range(struct kvm_lapic *apic, gpa_t addr)
{
@@ -1240,7 +1218,7 @@ static int apic_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
if (!apic_mmio_in_range(apic, address))
return -EOPNOTSUPP;

- apic_reg_read(apic, offset, len, data);
+ kvm_lapic_reg_read(apic, offset, len, data);

return 0;
}
@@ -1425,7 +1403,7 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
}
}

-static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
+int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
{
int ret = 0;

@@ -1457,7 +1435,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)

case APIC_DFR:
if (!apic_x2apic_mode(apic)) {
- apic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
+ kvm_lapic_set_reg(apic, APIC_DFR, val | 0x0FFFFFFF);
recalculate_apic_map(apic->vcpu->kvm);
} else
ret = 1;
@@ -1472,10 +1450,10 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
int i;
u32 lvt_val;

- for (i = 0; i < APIC_LVT_NUM; i++) {
+ for (i = 0; i < KVM_APIC_LVT_NUM; i++) {
lvt_val = kvm_apic_get_reg(apic,
APIC_LVTT + 0x10 * i);
- apic_set_reg(apic, APIC_LVTT + 0x10 * i,
+ kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i,
lvt_val | APIC_LVT_MASKED);
}
apic_update_lvtt(apic);
@@ -1486,14 +1464,14 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
}
case APIC_ICR:
/* No delay here, so we always clear the pending bit */
- apic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
+ kvm_lapic_set_reg(apic, APIC_ICR, val & ~(1 << 12));
apic_send_ipi(apic);
break;

case APIC_ICR2:
if (!apic_x2apic_mode(apic))
val &= 0xff000000;
- apic_set_reg(apic, APIC_ICR2, val);
+ kvm_lapic_set_reg(apic, APIC_ICR2, val);
break;

case APIC_LVT0:
@@ -1507,7 +1485,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
val |= APIC_LVT_MASKED;

val &= apic_lvt_mask[(reg - APIC_LVTT) >> 4];
- apic_set_reg(apic, reg, val);
+ kvm_lapic_set_reg(apic, reg, val);

break;

@@ -1515,7 +1493,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
- apic_set_reg(apic, APIC_LVTT, val);
+ kvm_lapic_set_reg(apic, APIC_LVTT, val);
apic_update_lvtt(apic);
break;

@@ -1524,14 +1502,14 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
break;

hrtimer_cancel(&apic->lapic_timer.timer);
- apic_set_reg(apic, APIC_TMICT, val);
+ kvm_lapic_set_reg(apic, APIC_TMICT, val);
start_apic_timer(apic);
break;

case APIC_TDCR:
if (val & 4)
apic_debug("KVM_WRITE:TDCR %x\n", val);
- apic_set_reg(apic, APIC_TDCR, val);
+ kvm_lapic_set_reg(apic, APIC_TDCR, val);
update_divide_count(apic);
break;

@@ -1544,7 +1522,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)

case APIC_SELF_IPI:
if (apic_x2apic_mode(apic)) {
- apic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
+ kvm_lapic_reg_write(apic, APIC_ICR, 0x40000 | (val & 0xff));
} else
ret = 1;
break;
@@ -1556,6 +1534,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
apic_debug("Local APIC Write to read-only register %x\n", reg);
return ret;
}
+EXPORT_SYMBOL_GPL(kvm_lapic_reg_write);

static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
gpa_t address, int len, const void *data)
@@ -1585,14 +1564,14 @@ static int apic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
apic_debug("%s: offset 0x%x with length 0x%x, and value is "
"0x%x\n", __func__, offset, len, val);

- apic_reg_write(apic, offset & 0xff0, val);
+ kvm_lapic_reg_write(apic, offset & 0xff0, val);

return 0;
}

void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
{
- apic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
+ kvm_lapic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
}
EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);

@@ -1604,10 +1583,10 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
/* hw has done the conditional check and inst decode */
offset &= 0xff0;

- apic_reg_read(vcpu->arch.apic, offset, 4, &val);
+ kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);

/* TODO: optimize to just emulate side effect w/o one more write */
- apic_reg_write(vcpu->arch.apic, offset, val);
+ kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
}
EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);

@@ -1740,28 +1719,28 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
kvm_apic_set_id(apic, vcpu->vcpu_id);
kvm_apic_set_version(apic->vcpu);

- for (i = 0; i < APIC_LVT_NUM; i++)
- apic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
+ for (i = 0; i < KVM_APIC_LVT_NUM; i++)
+ kvm_lapic_set_reg(apic, APIC_LVTT + 0x10 * i, APIC_LVT_MASKED);
apic_update_lvtt(apic);
if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_LINT0_REENABLED))
- apic_set_reg(apic, APIC_LVT0,
+ kvm_lapic_set_reg(apic, APIC_LVT0,
SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
apic_manage_nmi_watchdog(apic, kvm_apic_get_reg(apic, APIC_LVT0));

- apic_set_reg(apic, APIC_DFR, 0xffffffffU);
+ kvm_lapic_set_reg(apic, APIC_DFR, 0xffffffffU);
apic_set_spiv(apic, 0xff);
- apic_set_reg(apic, APIC_TASKPRI, 0);
+ kvm_lapic_set_reg(apic, APIC_TASKPRI, 0);
if (!apic_x2apic_mode(apic))
kvm_apic_set_ldr(apic, 0);
- apic_set_reg(apic, APIC_ESR, 0);
- apic_set_reg(apic, APIC_ICR, 0);
- apic_set_reg(apic, APIC_ICR2, 0);
- apic_set_reg(apic, APIC_TDCR, 0);
- apic_set_reg(apic, APIC_TMICT, 0);
+ kvm_lapic_set_reg(apic, APIC_ESR, 0);
+ kvm_lapic_set_reg(apic, APIC_ICR, 0);
+ kvm_lapic_set_reg(apic, APIC_ICR2, 0);
+ kvm_lapic_set_reg(apic, APIC_TDCR, 0);
+ kvm_lapic_set_reg(apic, APIC_TMICT, 0);
for (i = 0; i < 8; i++) {
- apic_set_reg(apic, APIC_IRR + 0x10 * i, 0);
- apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
- apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
+ kvm_lapic_set_reg(apic, APIC_IRR + 0x10 * i, 0);
+ kvm_lapic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
+ kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
}
apic->irr_pending = vcpu->arch.apicv_active;
apic->isr_count = vcpu->arch.apicv_active ? 1 : 0;
@@ -2139,8 +2118,8 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)

/* if this is ICR write vector before command */
if (reg == APIC_ICR)
- apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
- return apic_reg_write(apic, reg, (u32)data);
+ kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
+ return kvm_lapic_reg_write(apic, reg, (u32)data);
}

int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
@@ -2157,10 +2136,10 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
return 1;
}

- if (apic_reg_read(apic, reg, 4, &low))
+ if (kvm_lapic_reg_read(apic, reg, 4, &low))
return 1;
if (reg == APIC_ICR)
- apic_reg_read(apic, APIC_ICR2, 4, &high);
+ kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);

*data = (((u64)high) << 32) | low;

@@ -2176,8 +2155,8 @@ int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 reg, u64 data)

/* if this is ICR write vector before command */
if (reg == APIC_ICR)
- apic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
- return apic_reg_write(apic, reg, (u32)data);
+ kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
+ return kvm_lapic_reg_write(apic, reg, (u32)data);
}

int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
@@ -2188,10 +2167,10 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
if (!lapic_in_kernel(vcpu))
return 1;

- if (apic_reg_read(apic, reg, 4, &low))
+ if (kvm_lapic_reg_read(apic, reg, 4, &low))
return 1;
if (reg == APIC_ICR)
- apic_reg_read(apic, APIC_ICR2, 4, &high);
+ kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);

*data = (((u64)high) << 32) | low;

diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index f71183e..a70cb62 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -7,6 +7,7 @@

#define KVM_APIC_INIT 0
#define KVM_APIC_SIPI 1
+#define KVM_APIC_LVT_NUM 6

struct kvm_timer {
struct hrtimer timer;
@@ -59,6 +60,11 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
void kvm_apic_set_version(struct kvm_vcpu *vcpu);
+int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val);
+int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
+ void *data);
+bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
+ int short_hand, unsigned int dest, int dest_mode);

void __kvm_apic_update_irr(u32 *pir, void *regs);
void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
@@ -99,11 +105,34 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
void kvm_lapic_init(void);

+#define VEC_POS(v) ((v) & (32 - 1))
+#define REG_POS(v) (((v) >> 5) << 4)
+
+static inline void kvm_lapic_set_vector(int vec, void *bitmap)
+{
+ set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
+}
+
+static inline void kvm_lapic_set_irr(int vec, struct kvm_lapic *apic)
+{
+ kvm_lapic_set_vector(vec, apic->regs + APIC_IRR);
+ /*
+ * irr_pending must be true if any interrupt is pending; set it after
+ * APIC_IRR to avoid race with apic_clear_irr
+ */
+ apic->irr_pending = true;
+}
+
static inline u32 kvm_apic_get_reg(struct kvm_lapic *apic, int reg_off)
{
return *((u32 *) (apic->regs + reg_off));
}

+static inline void kvm_lapic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
+{
+ *((u32 *) (apic->regs + reg_off)) = val;
+}
+
extern struct static_key kvm_no_apic_vcpu;

static inline bool lapic_in_kernel(struct kvm_vcpu *vcpu)
--
1.9.1

2016-04-07 08:21:27

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 02/11] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks

Add function pointers in struct kvm_x86_ops for the processor-specific
layer to provide hooks for when KVM initializes and un-initializes a VM.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/x86.c | 6 ++++++
2 files changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f62a9f37..22bd70c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -848,6 +848,9 @@ struct kvm_x86_ops {
bool (*cpu_has_high_real_mode_segbase)(void);
void (*cpuid_update)(struct kvm_vcpu *vcpu);

+ int (*vm_init)(struct kvm *kvm);
+ void (*vm_uninit)(struct kvm *kvm);
+
/* Create, but do not attach this VCPU */
struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id);
void (*vcpu_free)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 742d0f7..d12583e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7754,6 +7754,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm_page_track_init(kvm);
kvm_mmu_init_vm(kvm);

+ if (kvm_x86_ops->vm_init)
+ return kvm_x86_ops->vm_init(kvm);
+
return 0;
}

@@ -7781,6 +7784,9 @@ static void kvm_free_vcpus(struct kvm *kvm)
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_arch_vcpu_free(vcpu);

+ if (kvm_x86_ops->vm_uninit)
+ kvm_x86_ops->vm_uninit(kvm);
+
mutex_lock(&kvm->lock);
for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
kvm->vcpus[i] = NULL;
--
1.9.1

2016-04-07 08:21:40

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 04/11] KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick

From: Radim Krčmář <[email protected]>

AVIC has a use for kvm_vcpu_wake_up.

Signed-off-by: Radim Krčmář <[email protected]>
Tested-by: Suravee Suthikulpanit <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 19 +++++++++++++------
2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5276fe0..673749d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -651,6 +651,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
void kvm_vcpu_block(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
int kvm_vcpu_yield_to(struct kvm_vcpu *target);
void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4fd482f..5bf20f8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2071,13 +2071,8 @@ out:
EXPORT_SYMBOL_GPL(kvm_vcpu_block);

#ifndef CONFIG_S390
-/*
- * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
- */
-void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
+void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
{
- int me;
- int cpu = vcpu->cpu;
struct swait_queue_head *wqp;

wqp = kvm_arch_vcpu_wq(vcpu);
@@ -2086,6 +2081,18 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}

+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
+
+/*
+ * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
+ */
+void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
+{
+ int me;
+ int cpu = vcpu->cpu;
+
+ kvm_vcpu_wake_up(vcpu);
me = get_cpu();
if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
if (kvm_arch_vcpu_should_kick(vcpu))
--
1.9.1

2016-04-07 08:21:52

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 05/11] svm: Introduce new AVIC VMCB registers

Introduce new AVIC VMCB registers.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/svm.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 6136d99..4711fa4 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -78,7 +78,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u32 exit_int_info;
u32 exit_int_info_err;
u64 nested_ctl;
- u8 reserved_4[16];
+ u64 avic_vapic_bar;
+ u8 reserved_4[8];
u32 event_inj;
u32 event_inj_err;
u64 nested_cr3;
@@ -88,7 +89,11 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u64 next_rip;
u8 insn_len;
u8 insn_bytes[15];
- u8 reserved_6[800];
+ u64 avic_backing_page; /* Offset 0xe0 */
+ u8 reserved_6[8]; /* Offset 0xe8 */
+ u64 avic_logical_id; /* Offset 0xf0 */
+ u64 avic_physical_id; /* Offset 0xf8 */
+ u8 reserved_7[768];
};


--
1.9.1

2016-04-07 08:21:58

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 06/11] KVM: x86: Detect and Initialize AVIC support

This patch introduces the AVIC-related data structures and the AVIC
initialization code.

There are three main data structures for AVIC:
* Virtual APIC (vAPIC) backing page (per-VCPU)
* Physical APIC ID table (per-VM)
* Logical APIC ID table (per-VM)

Currently, AVIC is disabled by default. Users can manually
enable AVIC via the kernel boot option kvm-amd.avic=1 or during
kvm-amd module loading with the parameter avic=1.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 +
arch/x86/include/asm/svm.h | 3 +
arch/x86/kvm/svm.c | 238 +++++++++++++++++++++++++++++++++++++++-
3 files changed, 245 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2268f4c..1df946f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -774,6 +774,11 @@ struct kvm_arch {
u8 nr_reserved_ioapic_pins;

bool disabled_lapic_found;
+
+ /* Struct members for AVIC */
+ u32 ldr_mode;
+ struct page *avic_logical_id_table_page;
+ struct page *avic_physical_id_table_page;
};

struct kvm_vm_stat {
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 4711fa4..d0fe23e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -116,6 +116,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define V_INTR_MASKING_SHIFT 24
#define V_INTR_MASKING_MASK (1 << V_INTR_MASKING_SHIFT)

+#define AVIC_ENABLE_SHIFT 31
+#define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)
+
#define SVM_INTERRUPT_SHADOW_MASK 1

#define SVM_IOIO_STR_SHIFT 2
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 31346a3..137cf18 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -14,6 +14,9 @@
* the COPYING file in the top-level directory.
*
*/
+
+#define pr_fmt(fmt) "SVM: " fmt
+
#include <linux/kvm_host.h>

#include "irq.h"
@@ -78,6 +81,11 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
#define TSC_RATIO_MIN 0x0000000000000001ULL
#define TSC_RATIO_MAX 0x000000ffffffffffULL

+#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
+
+/* NOTE: Current max index allowed for physical APIC ID table is 255 */
+#define AVIC_PHYSICAL_ID_MAX 0xFF
+
static bool erratum_383_found __read_mostly;

static const u32 host_save_user_msrs[] = {
@@ -162,8 +170,19 @@ struct vcpu_svm {

/* cached guest cpuid flags for faster access */
bool nrips_enabled : 1;
+
+ struct page *avic_backing_page;
+ u64 *avic_physical_id_cache;
};

+#define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK (0xFF)
+#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK (1 << 31)
+
+#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK (0xFFULL)
+#define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK (0xFFFFFFFFFFULL << 12)
+#define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK (1ULL << 62)
+#define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK (1ULL << 63)
+
static DEFINE_PER_CPU(u64, current_tsc_ratio);
#define TSC_RATIO_DEFAULT 0x0100000000ULL

@@ -205,6 +224,10 @@ module_param(npt, int, S_IRUGO);
static int nested = true;
module_param(nested, int, S_IRUGO);

+/* enable / disable AVIC */
+static int avic;
+module_param(avic, int, S_IRUGO);
+
static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
static void svm_flush_tlb(struct kvm_vcpu *vcpu);
static void svm_complete_interrupts(struct vcpu_svm *svm);
@@ -234,6 +257,18 @@ enum {
/* TPR and CR2 are always written before VMRUN */
#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))

+#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL
+
+static inline bool svm_vcpu_avic_enabled(struct vcpu_svm *svm)
+{
+ return (avic && (svm->vmcb->control.int_ctl & AVIC_ENABLE_MASK));
+}
+
+static inline void avic_update_vapic_bar(struct vcpu_svm *svm, u64 data)
+{
+ svm->vmcb->control.avic_vapic_bar = data & VMCB_AVIC_APIC_BAR_MASK;
+}
+
static inline void mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
@@ -923,6 +958,13 @@ static __init int svm_hardware_setup(void)
} else
kvm_disable_tdp();

+ if (avic && (!npt_enabled || !boot_cpu_has(X86_FEATURE_AVIC)))
+ avic = false;
+
+ if (avic) {
+ printk(KERN_INFO "kvm: AVIC enabled\n");
+ }
+
return 0;

err:
@@ -1000,6 +1042,24 @@ static void svm_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
}

+static void avic_init_vmcb(struct vcpu_svm *svm)
+{
+ struct vmcb *vmcb = svm->vmcb;
+ struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
+ phys_addr_t bpa = page_to_phys(svm->avic_backing_page);
+ phys_addr_t lpa = page_to_phys(vm_data->avic_logical_id_table_page);
+ phys_addr_t ppa = page_to_phys(vm_data->avic_physical_id_table_page);
+
+ if (!vmcb)
+ return;
+
+ vmcb->control.avic_backing_page = bpa & AVIC_HPA_MASK;
+ vmcb->control.avic_logical_id = lpa & AVIC_HPA_MASK;
+ vmcb->control.avic_physical_id = ppa & AVIC_HPA_MASK;
+ vmcb->control.avic_physical_id |= AVIC_PHYSICAL_ID_MAX;
+ vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+}
+
static void init_vmcb(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -1113,6 +1173,142 @@ static void init_vmcb(struct vcpu_svm *svm)
mark_all_dirty(svm->vmcb);

enable_gif(svm);
+
+ if (avic)
+ avic_init_vmcb(svm);
+}
+
+static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu, int index)
+{
+ u64 *avic_physical_id_table;
+ struct kvm_arch *vm_data = &vcpu->kvm->arch;
+
+ /* Note: APIC ID = 0xff is used for broadcast.
+ * APIC ID > 0xff is reserved.
+ */
+ if (index >= 0xff)
+ return NULL;
+
+ avic_physical_id_table = page_address(vm_data->avic_physical_id_table_page);
+
+ return &avic_physical_id_table[index];
+}
+
+/**
+ * Note:
+ * AVIC hardware walks the nested page table to check permissions,
+ * but does not use the SPA address specified in the leaf page
+ * table entry since it uses address in the AVIC_BACKING_PAGE pointer
+ * field of the VMCB. Therefore, we set up the
+ * APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
+ */
+static int avic_init_access_page(struct kvm_vcpu *vcpu)
+{
+ int ret = 0;
+ struct kvm *kvm = vcpu->kvm;
+
+ if (!kvm->arch.apic_access_page_done) {
+ ret = x86_set_memory_region(kvm,
+ APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
+ APIC_DEFAULT_PHYS_BASE,
+ PAGE_SIZE);
+ if (ret)
+ return ret;
+ kvm->arch.apic_access_page_done = true;
+ }
+
+ return ret;
+}
+
+static int avic_init_backing_page(struct kvm_vcpu *vcpu)
+{
+ int ret;
+ u64 *entry, new_entry;
+ int id = vcpu->vcpu_id;
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ ret = avic_init_access_page(vcpu);
+ if (ret)
+ return ret;
+
+ if (id >= AVIC_PHYSICAL_ID_MAX)
+ return -EINVAL;
+
+ if (!svm->vcpu.arch.apic->regs)
+ return -EINVAL;
+
+ svm->avic_backing_page = virt_to_page(svm->vcpu.arch.apic->regs);
+
+ avic_init_vmcb(svm);
+
+ /* Setting AVIC backing page address in the phy APIC ID table */
+ entry = avic_get_physical_id_entry(vcpu, id);
+ if (!entry)
+ return -EINVAL;
+
+ new_entry = READ_ONCE(*entry);
+ new_entry = (page_to_phys(svm->avic_backing_page) &
+ AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK) |
+ AVIC_PHYSICAL_ID_ENTRY_VALID_MASK;
+ WRITE_ONCE(*entry, new_entry);
+
+ svm->avic_physical_id_cache = entry;
+
+ return 0;
+}
+
+static void avic_vm_uninit(struct kvm *kvm)
+{
+ struct kvm_arch *vm_data = &kvm->arch;
+
+ if (vm_data->avic_logical_id_table_page)
+ __free_page(vm_data->avic_logical_id_table_page);
+ if (vm_data->avic_physical_id_table_page)
+ __free_page(vm_data->avic_physical_id_table_page);
+}
+
+static void avic_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (!avic)
+ return;
+
+ if (svm->avic_physical_id_cache)
+ svm->avic_physical_id_cache = NULL;
+}
+
+static int avic_vm_init(struct kvm *kvm)
+{
+ int err = -ENOMEM;
+ struct kvm_arch *vm_data = &kvm->arch;
+ struct page *p_page;
+ struct page *l_page;
+
+ if (!avic)
+ return 0;
+
+ /* Allocating physical APIC ID table (4KB) */
+ p_page = alloc_page(GFP_KERNEL);
+ if (!p_page)
+ goto free_avic;
+
+ vm_data->avic_physical_id_table_page = p_page;
+ clear_page(page_address(p_page));
+
+ /* Allocating logical APIC ID table (4KB) */
+ l_page = alloc_page(GFP_KERNEL);
+ if (!l_page)
+ goto free_avic;
+
+ vm_data->avic_logical_id_table_page = l_page;
+ clear_page(page_address(l_page));
+
+ return 0;
+
+free_avic:
+ avic_vm_uninit(kvm);
+ return err;
}

static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -1131,6 +1327,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)

kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
+
+ if (svm_vcpu_avic_enabled(svm) && !init_event)
+ avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
}

static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
@@ -1169,6 +1368,14 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
if (!hsave_page)
goto free_page3;

+ if (avic) {
+ err = avic_init_backing_page(&svm->vcpu);
+ if (err) {
+ avic_vcpu_uninit(&svm->vcpu);
+ goto free_page4;
+ }
+ }
+
svm->nested.hsave = page_address(hsave_page);

svm->msrpm = page_address(msrpm_pages);
@@ -1187,6 +1394,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)

return &svm->vcpu;

+free_page4:
+ __free_page(hsave_page);
free_page3:
__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
free_page2:
@@ -1209,6 +1418,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
__free_page(virt_to_page(svm->nested.hsave));
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
+ avic_vcpu_uninit(vcpu);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
}
@@ -3212,6 +3422,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
case MSR_VM_IGNNE:
vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
break;
+ case MSR_IA32_APICBASE:
+ if (svm_vcpu_avic_enabled(svm))
+ avic_update_vapic_bar(to_svm(vcpu), data);
+ /* Fall through */
default:
return kvm_set_msr_common(vcpu, msr);
}
@@ -3375,6 +3589,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("%-20s%08x\n", "exit_int_info_err:", control->exit_int_info_err);
pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
+ pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
pr_err("%-20s%lld\n", "lbr_ctl:", control->lbr_ctl);
@@ -3606,11 +3821,27 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)

static bool svm_get_enable_apicv(void)
{
- return false;
+ return avic;
}

+static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
+{
+}
+
+static void svm_hwapic_isr_update(struct kvm *kvm, int isr)
+{
+}
+
+/* Note: Currently only used by Hyper-V. */
static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct vmcb *vmcb = svm->vmcb;
+
+ if (!avic)
+ return;
+
+ vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
}

static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
@@ -4322,6 +4553,9 @@ static struct kvm_x86_ops svm_x86_ops = {
.vcpu_free = svm_free_vcpu,
.vcpu_reset = svm_vcpu_reset,

+ .vm_init = avic_vm_init,
+ .vm_uninit = avic_vm_uninit,
+
.prepare_guest_switch = svm_prepare_guest_switch,
.vcpu_load = svm_vcpu_load,
.vcpu_put = svm_vcpu_put,
@@ -4382,6 +4616,8 @@ static struct kvm_x86_ops svm_x86_ops = {
.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.sync_pir_to_irr = svm_sync_pir_to_irr,
+ .hwapic_irr_update = svm_hwapic_irr_update,
+ .hwapic_isr_update = svm_hwapic_isr_update,

.set_tss_addr = svm_set_tss_addr,
.get_tdp_level = get_npt_level,
--
1.9.1

2016-04-07 08:22:10

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 07/11] svm: Add interrupt injection via AVIC

This patch introduces a new mechanism to inject interrupts using AVIC.
Since VINTR is not supported when AVIC is enabled, we need to inject
interrupts via the vAPIC backing page instead.

This patch also adds support for the AVIC doorbell, which is used by
KVM to signal a running vcpu to check the IRR for injected interrupts.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 43 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 137cf18..f9547bc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -71,6 +71,8 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
#define SVM_FEATURE_DECODE_ASSIST (1 << 7)
#define SVM_FEATURE_PAUSE_FILTER (1 << 10)

+#define SVM_AVIC_DOORBELL 0xc001011b
+
#define NESTED_EXIT_HOST 0 /* Exit handled on host level */
#define NESTED_EXIT_DONE 1 /* Exit caused nested vmexit */
#define NESTED_EXIT_CONTINUE 2 /* Further checks needed */
@@ -216,6 +218,8 @@ static bool npt_enabled = true;
static bool npt_enabled;
#endif

+static struct kvm_x86_ops svm_x86_ops;
+
/* allow nested paging (virtualized MMU) for all guests */
static int npt = true;
module_param(npt, int, S_IRUGO);
@@ -290,6 +294,18 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
return container_of(vcpu, struct vcpu_svm, vcpu);
}

+static inline bool avic_vcpu_is_running(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 *entry;
+
+ entry = svm->avic_physical_id_cache;
+ if (!entry)
+ return false;
+
+ return (READ_ONCE(*entry) & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+}
+
static void recalc_intercepts(struct vcpu_svm *svm)
{
struct vmcb_control_area *c, *h;
@@ -963,6 +979,8 @@ static __init int svm_hardware_setup(void)

if (avic) {
printk(KERN_INFO "kvm: AVIC enabled\n");
+ } else {
+ svm_x86_ops.deliver_posted_interrupt = NULL;
}

return 0;
@@ -2883,8 +2901,10 @@ static int clgi_interception(struct vcpu_svm *svm)
disable_gif(svm);

/* After a CLGI no interrupts should come */
- svm_clear_vintr(svm);
- svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+ if (!svm_vcpu_avic_enabled(svm)) {
+ svm_clear_vintr(svm);
+ svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+ }

mark_dirty(svm->vmcb, VMCB_INTR);

@@ -3777,6 +3797,7 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
{
struct vmcb_control_area *control;

+ /* The following fields are ignored when AVIC is enabled */
control = &svm->vmcb->control;
control->int_vector = irq;
control->int_ctl &= ~V_INTR_PRIO_MASK;
@@ -3854,6 +3875,20 @@ static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
return;
}

+static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ kvm_lapic_set_irr(vec, svm->vcpu.arch.apic);
+ smp_mb__after_atomic();
+
+ if (avic_vcpu_is_running(vcpu))
+ wrmsrl(SVM_AVIC_DOORBELL,
+ __default_cpu_present_to_apicid(vcpu->cpu));
+ else
+ kvm_vcpu_wake_up(vcpu);
+}
+
static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3908,6 +3943,9 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

+ if (svm_vcpu_avic_enabled(svm))
+ return;
+
/*
* In case GIF=0 we can't rely on the CPU to tell us when GIF becomes
* 1, because that's a separate STGI/VMRUN intercept. The next time we
@@ -4651,6 +4689,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.sched_in = svm_sched_in,

.pmu_ops = &amd_pmu_ops,
+ .deliver_posted_interrupt = svm_deliver_avic_intr,
};

static int __init svm_init(void)
--
1.9.1

2016-04-07 08:22:28

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 11/11] svm: Manage vcpu load/unload when enable AVIC

From: Suravee Suthikulpanit <[email protected]>

When a vcpu is loaded onto / unloaded from a physical core, we need to
update the host physical APIC ID information in the Physical APIC ID
table accordingly.

Also, when a vCPU is blocking/unblocking (due to the halt instruction),
we need to make sure that the is-running bit is set accordingly in the
Physical APIC ID table.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 83 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bccf6cb..f527afe 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -35,6 +35,7 @@
#include <linux/trace_events.h>
#include <linux/slab.h>

+#include <asm/apic.h>
#include <asm/perf_event.h>
#include <asm/tlbflush.h>
#include <asm/desc.h>
@@ -175,6 +176,7 @@ struct vcpu_svm {

struct page *avic_backing_page;
u64 *avic_physical_id_cache;
+ bool avic_is_blocking;
};

#define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK (0xFF)
@@ -1330,6 +1332,64 @@ free_avic:
return err;
}

+/**
+ * This function is called during VCPU halt/unhalt.
+ */
+static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+{
+ u64 entry;
+ int h_physical_id = __default_cpu_present_to_apicid(vcpu->cpu);
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (!svm_vcpu_avic_enabled(svm))
+ return 0;
+
+ /* ID = 0xff (broadcast), ID > 0xff (reserved) */
+ if (h_physical_id >= AVIC_PHYSICAL_ID_MAX)
+ return -EINVAL;
+
+ entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ if (is_run)
+ WARN_ON((entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK) != 0);
+ else
+ WARN_ON((entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK) == 0);
+
+ entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+ if (is_run)
+ entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+ WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+
+ return 0;
+}
+
+static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
+{
+ u64 entry;
+ int h_physical_id = __default_cpu_present_to_apicid(cpu);
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (!svm_vcpu_avic_enabled(svm))
+ return 0;
+
+ /* ID = 0xff (broadcast), ID > 0xff (reserved) */
+ if (h_physical_id >= AVIC_PHYSICAL_ID_MAX)
+ return -EINVAL;
+
+ entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ WARN_ON(is_load && (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK));
+
+ entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+ if (is_load) {
+ entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
+ entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
+ if (!svm->avic_is_blocking)
+ entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+ }
+ WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+
+ return 0;
+}
+
static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -1395,6 +1455,11 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
}
}

+ /* We initialize this flag to false to make sure that the is_running
+ * bit is set the first time the vcpu is loaded.
+ */
+ svm->avic_is_blocking = false;
+
svm->nested.hsave = page_address(hsave_page);

svm->msrpm = page_address(msrpm_pages);
@@ -1472,6 +1537,8 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* This assumes that the kernel never uses MSR_TSC_AUX */
if (static_cpu_has(X86_FEATURE_RDTSCP))
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
+
+ avic_vcpu_load(vcpu, cpu, true);
}

static void svm_vcpu_put(struct kvm_vcpu *vcpu)
@@ -1479,6 +1546,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
int i;

+ avic_vcpu_load(vcpu, 0, false);
+
++vcpu->stat.host_state_reload;
kvm_load_ldt(svm->host.ldt);
#ifdef CONFIG_X86_64
@@ -1494,6 +1563,18 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
}

+static void svm_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+ to_svm(vcpu)->avic_is_blocking = true;
+ avic_set_running(vcpu, false);
+}
+
+static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ to_svm(vcpu)->avic_is_blocking = false;
+ avic_set_running(vcpu, true);
+}
+
static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
{
return to_svm(vcpu)->vmcb->save.rflags;
@@ -4858,6 +4939,8 @@ static struct kvm_x86_ops svm_x86_ops = {
.prepare_guest_switch = svm_prepare_guest_switch,
.vcpu_load = svm_vcpu_load,
.vcpu_put = svm_vcpu_put,
+ .vcpu_blocking = svm_vcpu_blocking,
+ .vcpu_unblocking = svm_vcpu_unblocking,

.update_bp_intercept = update_bp_intercept,
.get_msr = svm_get_msr,
--
1.9.1

2016-04-07 08:23:08

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 09/11] svm: Do not expose x2APIC when enable AVIC

From: Suravee Suthikulpanit <[email protected]>

Since AVIC only virtualizes xAPIC hardware for the guest, this patch
disables x2APIC support in the guest CPUID.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 13fba3b..74b0751 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4560,14 +4560,26 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
static void svm_cpuid_update(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ struct kvm_cpuid_entry2 *entry;

/* Update nrips enabled cache */
svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
+
+ if (!svm_vcpu_avic_enabled(svm))
+ return;
+
+ entry = kvm_find_cpuid_entry(vcpu, 0x1, 0);
+ if (entry)
+ entry->ecx &= ~bit(X86_FEATURE_X2APIC);
}

static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
{
switch (func) {
+ case 0x00000001:
+ if (avic)
+ entry->ecx &= ~bit(X86_FEATURE_X2APIC);
+ break;
case 0x80000001:
if (nested)
entry->ecx |= (1 << 2); /* Set SVM bit */
--
1.9.1

2016-04-07 08:35:50

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 03/11] KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks

Add new function pointers in struct kvm_x86_ops, and call them
from kvm_arch_vcpu_blocking() / kvm_arch_vcpu_unblocking().

Signed-off-by: Suravee Suthikulpanit <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 22bd70c..2268f4c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -993,6 +993,10 @@ struct kvm_x86_ops {
*/
int (*pre_block)(struct kvm_vcpu *vcpu);
void (*post_block)(struct kvm_vcpu *vcpu);
+
+ void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
+ void (*vcpu_unblocking)(struct kvm_vcpu *vcpu);
+
int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
uint32_t guest_irq, bool set);
};
@@ -1344,7 +1348,16 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
struct kvm_lapic_irq *irq);

-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+ if (kvm_x86_ops->vcpu_blocking)
+ kvm_x86_ops->vcpu_blocking(vcpu);
+}
+
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ if (kvm_x86_ops->vcpu_unblocking)
+ kvm_x86_ops->vcpu_unblocking(vcpu);
+}

#endif /* _ASM_X86_KVM_HOST_H */
--
1.9.1

2016-04-07 08:37:24

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 10/11] svm: Do not intercept CR8 when enable AVIC

When AVIC is enabled:
* Do not intercept CR8, since this should be handled by the AVIC HW.
* Also, sync V_TPR from the APIC backing page (APIC_TASKPRI) instead
of reading CR8.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 74b0751..bccf6cb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1092,7 +1092,8 @@ static void init_vmcb(struct vcpu_svm *svm)
set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
- set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+ if (!svm_vcpu_avic_enabled(svm))
+ set_cr_intercept(svm, INTERCEPT_CR8_WRITE);

set_dr_intercepts(svm);

@@ -4069,7 +4070,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
{
struct vcpu_svm *svm = to_svm(vcpu);

- if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
+ if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
+ svm_vcpu_avic_enabled(svm))
return;

clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
@@ -4255,14 +4257,15 @@ static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- u64 cr8;
+ struct kvm_lapic *apic = vcpu->arch.apic;

- if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
+ if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
+ svm_vcpu_avic_enabled(svm))
return;

- cr8 = kvm_get_cr8(vcpu);
svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
- svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
+ svm->vmcb->control.int_ctl |= (kvm_apic_get_reg(apic,
+ APIC_TASKPRI) >> 4) & V_TPR_MASK;
}

static void svm_complete_interrupts(struct vcpu_svm *svm)
--
1.9.1

2016-04-07 08:37:48

by Suthikulpanit, Suravee

Subject: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

From: Suravee Suthikulpanit <[email protected]>

This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception()
and avic_unaccelerated_access_interception() along with two trace points
(trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access).

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/uapi/asm/svm.h | 9 +-
arch/x86/kvm/lapic.h | 3 +
arch/x86/kvm/svm.c | 246 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/trace.h | 57 ++++++++++
arch/x86/kvm/x86.c | 2 +
5 files changed, 316 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 8a4add8..b9e9bb2 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -73,6 +73,8 @@
#define SVM_EXIT_MWAIT_COND 0x08c
#define SVM_EXIT_XSETBV 0x08d
#define SVM_EXIT_NPF 0x400
+#define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401
+#define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402

#define SVM_EXIT_ERR -1

@@ -107,8 +109,10 @@
{ SVM_EXIT_SMI, "smi" }, \
{ SVM_EXIT_INIT, "init" }, \
{ SVM_EXIT_VINTR, "vintr" }, \
+ { SVM_EXIT_CR0_SEL_WRITE, "cr0_sel_write" }, \
{ SVM_EXIT_CPUID, "cpuid" }, \
{ SVM_EXIT_INVD, "invd" }, \
+ { SVM_EXIT_PAUSE, "pause" }, \
{ SVM_EXIT_HLT, "hlt" }, \
{ SVM_EXIT_INVLPG, "invlpg" }, \
{ SVM_EXIT_INVLPGA, "invlpga" }, \
@@ -127,7 +131,10 @@
{ SVM_EXIT_MONITOR, "monitor" }, \
{ SVM_EXIT_MWAIT, "mwait" }, \
{ SVM_EXIT_XSETBV, "xsetbv" }, \
- { SVM_EXIT_NPF, "npf" }
+ { SVM_EXIT_NPF, "npf" }, \
+ { SVM_EXIT_RSM, "rsm" }, \
+ { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
+ { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }


#endif /* _UAPI__SVM_H */
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index a70cb62..2fc86b7 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -9,6 +9,9 @@
#define KVM_APIC_SIPI 1
#define KVM_APIC_LVT_NUM 6

+#define KVM_APIC_SHORT_MASK 0xc0000
+#define KVM_APIC_DEST_MASK 0x800
+
struct kvm_timer {
struct hrtimer timer;
s64 period; /* unit: ns */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f9547bc..13fba3b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3515,6 +3515,250 @@ static int mwait_interception(struct vcpu_svm *svm)
return nop_interception(svm);
}

+enum avic_ipi_failure_cause {
+ AVIC_IPI_FAILURE_INVALID_INT_TYPE,
+ AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
+ AVIC_IPI_FAILURE_INVALID_TARGET,
+ AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
+};
+
+static int avic_incomplete_ipi_interception(struct vcpu_svm *svm)
+{
+ u32 icrh = svm->vmcb->control.exit_info_1 >> 32;
+ u32 icrl = svm->vmcb->control.exit_info_1;
+ u32 id = svm->vmcb->control.exit_info_2 >> 32;
+ u32 index = svm->vmcb->control.exit_info_2 & 0xFF;
+ struct kvm_lapic *apic = svm->vcpu.arch.apic;
+
+ trace_kvm_avic_incomplete_ipi(svm->vcpu.vcpu_id, icrh, icrl, id, index);
+
+ switch (id) {
+ case AVIC_IPI_FAILURE_INVALID_INT_TYPE:
+ /*
+ * AVIC hardware handles the generation of
+ * IPIs when the specified Message Type is Fixed
+ * (also known as fixed delivery mode) and
+ * the Trigger Mode is edge-triggered. The hardware
+ * also supports self and broadcast delivery modes
+ * specified via the Destination Shorthand(DSH)
+ * field of the ICRL. Logical and physical APIC ID
+ * formats are supported. All other IPI types cause
+ * a #VMEXIT, which needs to be emulated.
+ */
+ kvm_lapic_reg_write(apic, APIC_ICR2, icrh);
+ kvm_lapic_reg_write(apic, APIC_ICR, icrl);
+ break;
+ case AVIC_IPI_FAILURE_TARGET_NOT_RUNNING: {
+ int i;
+ struct kvm_vcpu *vcpu;
+ struct kvm *kvm = svm->vcpu.kvm;
+ struct kvm_lapic *apic = svm->vcpu.arch.apic;
+
+ /*
+ * At this point, we expect that the AVIC HW has already
+ * set the appropriate IRR bits on the valid target
+ * vcpus. So, we just need to kick the appropriate vcpu.
+ */
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ bool m = kvm_apic_match_dest(vcpu, apic,
+ icrl & KVM_APIC_SHORT_MASK,
+ GET_APIC_DEST_FIELD(icrh),
+ icrl & KVM_APIC_DEST_MASK);
+
+ if (m && !avic_vcpu_is_running(vcpu))
+ kvm_vcpu_wake_up(vcpu);
+ }
+ break;
+ }
+ case AVIC_IPI_FAILURE_INVALID_TARGET:
+ break;
+ case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
+ WARN_ONCE(1, "Invalid backing page\n");
+ break;
+ default:
+ pr_err("Unknown IPI interception\n");
+ }
+
+ return 1;
+}
+
+static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u8 mda, bool flat)
+{
+ struct kvm_arch *vm_data = &vcpu->kvm->arch;
+ int index;
+ u32 *logical_apic_id_table;
+
+ if (flat) { /* flat */
+ if (mda > 7)
+ return NULL;
+ index = mda;
+ } else { /* cluster */
+ int apic_id = mda & 0xf;
+ int cluster_id = (mda & 0xf0) >> 8;
+
+ if (apic_id > 4 || cluster_id >= 0xf)
+ return NULL;
+ index = (cluster_id << 2) + apic_id;
+ }
+ logical_apic_id_table = (u32 *) page_address(vm_data->avic_logical_id_table_page);
+
+ return &logical_apic_id_table[index];
+}
+
+static int avic_handle_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id,
+ u8 logical_id)
+{
+ u32 mod;
+ u32 *entry, new_entry;
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (!svm)
+ return -EINVAL;
+
+ mod = (kvm_apic_get_reg(svm->vcpu.arch.apic, APIC_DFR) >> 28) & 0xf;
+ entry = avic_get_logical_id_entry(vcpu, logical_id, (mod == 0xf));
+ if (!entry)
+ return -EINVAL;
+
+ new_entry = READ_ONCE(*entry);
+ new_entry &= ~AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
+ new_entry |= (g_physical_id & AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK);
+ new_entry |= AVIC_LOGICAL_ID_ENTRY_VALID_MASK;
+ WRITE_ONCE(*entry, new_entry);
+
+ return 0;
+}
+
+static int avic_unaccel_trap_write(struct vcpu_svm *svm)
+{
+ u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;
+ struct kvm_lapic *apic = svm->vcpu.arch.apic;
+ u32 reg = kvm_apic_get_reg(apic, offset);
+
+ switch (offset) {
+ case APIC_ID: {
+ u32 aid = (reg >> 24) & 0xff;
+ u64 *o_ent = avic_get_physical_id_entry(&svm->vcpu,
+ svm->vcpu.vcpu_id);
+ u64 *n_ent = avic_get_physical_id_entry(&svm->vcpu, aid);
+
+ if (!n_ent || !o_ent)
+ return 0;
+
+ /* We need to move physical_id_entry to new offset */
+ *n_ent = *o_ent;
+ *o_ent = 0ULL;
+ svm->avic_physical_id_cache = n_ent;
+ break;
+ }
+ case APIC_LDR: {
+ int ret, lid;
+ int dlid = (reg >> 24) & 0xff;
+
+ if (!dlid)
+ return 0;
+
+ lid = ffs(dlid) - 1;
+ ret = avic_handle_ldr_write(&svm->vcpu, svm->vcpu.vcpu_id, lid);
+ if (ret)
+ return 0;
+
+ break;
+ }
+ case APIC_DFR: {
+ struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
+ u32 mod = (reg >> 28) & 0xf;
+
+ /*
+ * We assume that all local APICs are using the same type.
+ * If this changes, we need to rebuild the AVIC logical
+ * APIC ID table with a subsequent write to APIC_LDR.
+ */
+ if (vm_data->ldr_mode != mod) {
+ clear_page(page_address(vm_data->avic_logical_id_table_page));
+ vm_data->ldr_mode = mod;
+ }
+ break;
+ }
+ default:
+ break;
+ }
+
+ kvm_lapic_reg_write(apic, offset, reg);
+
+ return 1;
+}
+
+static bool is_avic_unaccelerated_access_trap(u32 offset)
+{
+ bool ret = false;
+
+ switch (offset) {
+ case APIC_ID:
+ case APIC_EOI:
+ case APIC_RRR:
+ case APIC_LDR:
+ case APIC_DFR:
+ case APIC_SPIV:
+ case APIC_ESR:
+ case APIC_ICR:
+ case APIC_LVTT:
+ case APIC_LVTTHMR:
+ case APIC_LVTPC:
+ case APIC_LVT0:
+ case APIC_LVT1:
+ case APIC_LVTERR:
+ case APIC_TMICT:
+ case APIC_TDCR:
+ ret = true;
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+#define AVIC_UNACCEL_ACCESS_WRITE_MASK 1
+#define AVIC_UNACCEL_ACCESS_OFFSET_MASK 0xFF0
+#define AVIC_UNACCEL_ACCESS_VECTOR_MASK 0xFFFFFFFF
+
+static int avic_unaccelerated_access_interception(struct vcpu_svm *svm)
+{
+ int ret = 0;
+ u32 offset = svm->vmcb->control.exit_info_1 &
+ AVIC_UNACCEL_ACCESS_OFFSET_MASK;
+ u32 vector = svm->vmcb->control.exit_info_2 &
+ AVIC_UNACCEL_ACCESS_VECTOR_MASK;
+ bool write = (svm->vmcb->control.exit_info_1 >> 32) &
+ AVIC_UNACCEL_ACCESS_WRITE_MASK;
+ bool trap = is_avic_unaccelerated_access_trap(offset);
+
+ trace_kvm_avic_unaccelerated_access(svm->vcpu.vcpu_id, offset,
+ trap, write, vector);
+
+ /*
+ * AVIC does not support x2APIC registers, and we only advertise
+ * xAPIC when AVIC is enabled. Therefore, access to these registers
+ * is not supported.
+ */
+ if (offset >= 0x400) {
+ WARN(1, "Unsupported APIC offset %#x\n", offset);
+ return ret;
+ }
+
+ if (trap) {
+ /* Handling Trap */
+ if (!write) /* Trap read should never happen */
+ BUG();
+ ret = avic_unaccel_trap_write(svm);
+ } else {
+ /* Handling Fault */
+ ret = (emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE);
+ }
+
+ return ret;
+}
+
static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_READ_CR0] = cr_interception,
[SVM_EXIT_READ_CR3] = cr_interception,
@@ -3578,6 +3822,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_XSETBV] = xsetbv_interception,
[SVM_EXIT_NPF] = pf_interception,
[SVM_EXIT_RSM] = emulate_on_interception,
+ [SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception,
+ [SVM_EXIT_AVIC_UNACCELERATED_ACCESS] = avic_unaccelerated_access_interception,
};

static void dump_vmcb(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 2f1ea2f..39f264c 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1292,6 +1292,63 @@ TRACE_EVENT(kvm_hv_stimer_cleanup,
__entry->vcpu_id, __entry->timer_index)
);

+/*
+ * Tracepoint for AMD AVIC
+ */
+TRACE_EVENT(kvm_avic_incomplete_ipi,
+ TP_PROTO(u32 vcpu, u32 icrh, u32 icrl, u32 id, u32 index),
+ TP_ARGS(vcpu, icrh, icrl, id, index),
+
+ TP_STRUCT__entry(
+ __field(u32, vcpu)
+ __field(u32, icrh)
+ __field(u32, icrl)
+ __field(u32, id)
+ __field(u32, index)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu = vcpu;
+ __entry->icrh = icrh;
+ __entry->icrl = icrl;
+ __entry->id = id;
+ __entry->index = index;
+ ),
+
+ TP_printk("vcpu=%u, icrh:icrl=%#010x:%08x, id=%u, index=%u\n",
+ __entry->vcpu, __entry->icrh, __entry->icrl,
+ __entry->id, __entry->index)
+);
+
+TRACE_EVENT(kvm_avic_unaccelerated_access,
+ TP_PROTO(u32 vcpu, u32 offset, bool ft, bool rw, u32 vec),
+ TP_ARGS(vcpu, offset, ft, rw, vec),
+
+ TP_STRUCT__entry(
+ __field(u32, vcpu)
+ __field(u32, offset)
+ __field(bool, ft)
+ __field(bool, rw)
+ __field(u32, vec)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu = vcpu;
+ __entry->offset = offset;
+ __entry->ft = ft;
+ __entry->rw = rw;
+ __entry->vec = vec;
+ ),
+
+ TP_printk("vcpu=%u, offset=%#x(%s), %s, %s, vec=%#x\n",
+ __entry->vcpu,
+ __entry->offset,
+ __print_symbolic(__entry->offset, kvm_trace_symbol_apic),
+ __entry->ft ? "trap" : "fault",
+ __entry->rw ? "write" : "read",
+ __entry->vec)
+);
+
#endif /* _TRACE_KVM_H */

#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d12583e..b0f211c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8437,3 +8437,5 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
--
1.9.1

2016-04-11 20:34:57

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 01/11] KVM: x86: Misc LAPIC changes to expose helper functions

2016-04-07 03:20-0500, Suravee Suthikulpanit:
> Exporting LAPIC utility functions and macros for re-use in SVM code.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---

Adding kvm_lapic_set_reg looks weird when KVM has kvm_apic_get_reg, but
the structure they accept is called kvm_lapic, so I vote to rename get,

Reviewed-by: Radim Krčmář <[email protected]>

2016-04-11 20:48:34

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 06/11] KVM: x86: Detect and Initialize AVIC support

2016-04-07 03:20-0500, Suravee Suthikulpanit:
> This patch introduces AVIC-related data structures and AVIC
> initialization code.
>
> There are three main data structures for AVIC:
> * Virtual APIC (vAPIC) backing page (per-VCPU)
> * Physical APIC ID table (per-VM)
> * Logical APIC ID table (per-VM)
>
> Currently, AVIC is disabled by default. Users can manually
> enable AVIC via kernel boot option kvm-amd.avic=1 or during
> kvm-amd module loading with parameter avic=1.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -14,6 +14,9 @@
> * the COPYING file in the top-level directory.
> *
> */
> +
> +#define pr_fmt(fmt) "SVM: " fmt
> +
> #include <linux/kvm_host.h>
>
> #include "irq.h"
> @@ -78,6 +81,11 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
> #define TSC_RATIO_MIN 0x0000000000000001ULL
> #define TSC_RATIO_MAX 0x000000ffffffffffULL
>
> +#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
> +
> +/* NOTE: Current max index allowed for physical APIC ID table is 255 */
> +#define AVIC_PHYSICAL_ID_MAX 0xFF

0xff is broadcast, so shouldn't the actual last one be 0xfe?

> @@ -234,6 +257,18 @@ enum {
> /* TPR and CR2 are always written before VMRUN */
> #define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
>
> +#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL
> +
> +static inline bool svm_vcpu_avic_enabled(struct vcpu_svm *svm)
> +{
> + return (avic && (svm->vmcb->control.int_ctl & AVIC_ENABLE_MASK));
> +}

I think it'd be better to use kvm_vcpu_apicv_active() instead.
In any case, have just one function to tell whether avic is enabled.
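
E.g. something like this (untested sketch; it assumes kvm_vcpu_apicv_active()
is kept in sync with the per-vcpu AVIC state):

static inline bool svm_vcpu_avic_enabled(struct vcpu_svm *svm)
{
	return avic && kvm_vcpu_apicv_active(&svm->vcpu);
}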

> @@ -1000,6 +1042,24 @@ static void svm_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
> mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
> }
>
> +static void avic_init_vmcb(struct vcpu_svm *svm)
> +{
> + struct vmcb *vmcb = svm->vmcb;
> + struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
> + phys_addr_t bpa = page_to_phys(svm->avic_backing_page);
> + phys_addr_t lpa = page_to_phys(vm_data->avic_logical_id_table_page);
> + phys_addr_t ppa = page_to_phys(vm_data->avic_physical_id_table_page);
> +
> + if (!vmcb)
> + return;

We can call this function only from init_vmcb and therefore drop this
return.

> @@ -1113,6 +1173,142 @@ static void init_vmcb(struct vcpu_svm *svm)
> +static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu, int index)
> +{
> + u64 *avic_physical_id_table;
> + struct kvm_arch *vm_data = &vcpu->kvm->arch;
> +
> + /* Note: APIC ID = 0xff is used for broadcast.
> + * APIC ID > 0xff is reserved.
> + */
> + if (index >= 0xff)

Isn't this AVIC_PHYSICAL_ID_MAX?

> +static int avic_init_backing_page(struct kvm_vcpu *vcpu)
> +{
> + int ret;
> + u64 *entry, new_entry;
> + int id = vcpu->vcpu_id;
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + ret = avic_init_access_page(vcpu);
> + if (ret)
> + return ret;
> +
> + if (id >= AVIC_PHYSICAL_ID_MAX)
> + return -EINVAL;
> +
> + if (!svm->vcpu.arch.apic->regs)
> + return -EINVAL;
> +
> + svm->avic_backing_page = virt_to_page(svm->vcpu.arch.apic->regs);
> +
> + avic_init_vmcb(svm);

avic_init_vmcb() should be dropped from here. It does nothing.

> +static void avic_vcpu_uninit(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + if (!avic)
> + return;
> +
> + if (svm->avic_physical_id_cache)
> + svm->avic_physical_id_cache = NULL;
> +}

I think it's safe to drop avic_vcpu_uninit(). This function is only
called when we are freeing the vcpu, so the value is not used anymore.

> static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> {
> + struct vcpu_svm *svm = to_svm(vcpu);
> + struct vmcb *vmcb = svm->vmcb;
> +
> + if (!avic)
> + return;
> +
> + vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;

(This function doesn't really "refresh" avic, it just disables it.
VMX isn't much better, so it's bearable to do so ...)

2016-04-11 20:49:44

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 02/11] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks

2016-04-07 03:20-0500, Suravee Suthikulpanit:
> Adding function pointers in struct kvm_x86_ops for the processor-specific
> layer to provide hooks for when KVM initializes and un-initializes a VM.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -7781,6 +7784,9 @@ static void kvm_free_vcpus(struct kvm *kvm)
> kvm_for_each_vcpu(i, vcpu, kvm)
> kvm_arch_vcpu_free(vcpu);
>
> + if (kvm_x86_ops->vm_uninit)
> + kvm_x86_ops->vm_uninit(kvm);

vm_uninit() doesn't seem to have much to do with kvm_free_vcpus(),
please call it from kvm_arch_destroy_vm().

(kvm_x86_ops.vm_destroy would be a better name then.)

2016-04-11 20:52:10

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 07/11] svm: Add interrupt injection via AVIC

2016-04-07 03:20-0500, Suravee Suthikulpanit:
> This patch introduces a new mechanism to inject interrupts using AVIC.
> Since VINTR is not supported when AVIC is enabled, we need to inject
> interrupts via the APIC backing page instead.
>
> This patch also adds support for AVIC doorbell, which is used by
> KVM to signal a running vcpu to check IRR for injected interrupts.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -216,6 +218,8 @@ static bool npt_enabled = true;
> +static struct kvm_x86_ops svm_x86_ops;
> @@ -963,6 +979,8 @@ static __init int svm_hardware_setup(void)
>
> if (avic) {
> printk(KERN_INFO "kvm: AVIC enabled\n");
> + } else {
> + svm_x86_ops.deliver_posted_interrupt = NULL;

(No need to NULL this. deliver_posted_interrupt cannot be used to
determine whether avic should be used, so we can keep it !NULL.
Declaration of svm_x86_ops can be omitted then.)

> @@ -2883,8 +2901,10 @@ static int clgi_interception(struct vcpu_svm *svm)
> disable_gif(svm);
>
> /* After a CLGI no interrupts should come */
> - svm_clear_vintr(svm);
> - svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
> + if (!svm_vcpu_avic_enabled(svm)) {
> + svm_clear_vintr(svm);
> + svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
> + }
>
> mark_dirty(svm->vmcb, VMCB_INTR);

mark_dirty also belongs to the if, because avic doesn't touch vmcb.
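
I.e. (sketch):

	if (!svm_vcpu_avic_enabled(svm)) {
		svm_clear_vintr(svm);
		svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
		mark_dirty(svm->vmcb, VMCB_INTR);
	}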

> @@ -3854,6 +3875,20 @@ static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> +static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + kvm_lapic_set_irr(vec, svm->vcpu.arch.apic);

(vcpu->arch.apic would work too. ;])

2016-04-11 20:54:30

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 09/11] svm: Do not expose x2APIC when enable AVIC

2016-04-07 03:20-0500, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit <[email protected]>
>
> Since AVIC only virtualizes xAPIC hardware for the guest, this patch
> disables x2APIC support in the guest CPUID.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -4560,14 +4560,26 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> static void svm_cpuid_update(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> + struct kvm_cpuid_entry2 *entry;
>
> /* Update nrips enabled cache */
> svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
> +
> + if (!svm_vcpu_avic_enabled(svm))
> + return;
> +
> + entry = kvm_find_cpuid_entry(vcpu, 0x1, 0);
> + if (entry->function == 1)

entry->function == 1 will always be true here; if entry were NULL, we
would have crashed before reaching the check. Check entry for NULL instead.
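
E.g.:

	entry = kvm_find_cpuid_entry(vcpu, 0x1, 0);
	if (entry)
		entry->ecx &= ~bit(X86_FEATURE_X2APIC);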

> + entry->ecx &= ~bit(X86_FEATURE_X2APIC);
> }
>
> static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> {
> switch (func) {
> + case 0x00000001:

("case 1:" or "case 0x1:" would be easier to read.)

> + if (avic)
> + entry->ecx &= ~bit(X86_FEATURE_X2APIC);
> + break;


---
A rant for the unlikely case I get back to fix the broader situation:
Only one of these two additions is needed. If we do the second one,
then userspace should not set X2APIC, therefore the first one is
useless.

Omitting the second one allows userspace to clear apicv_active and set
X86_FEATURE_X2APIC, but it needs a non-intuitive order of ioctls, so I
think we should have the second one.

The problem is that KVM doesn't seem to check whether userspace sets
a cpuid that is a subset of the supported ones, so omitting the first one
needlessly expands the space for potential failures.

2016-04-12 14:18:30

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 10/11] svm: Do not intercept CR8 when enable AVIC

2016-04-07 03:20-0500, Suravee Suthikulpanit:
> When AVIC is enabled:
> * Do not intercept CR8 since this should be handled by AVIC HW.
> * Also, we don't need to sync cr8/V_TPR with the APIC backing page.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -4069,7 +4070,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
> - if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
> + if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
> + svm_vcpu_avic_enabled(svm))
> @@ -4255,14 +4257,15 @@ static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
> static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> - u64 cr8;
> + struct kvm_lapic *apic = vcpu->arch.apic;
>
> - if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
> + if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK) &&

Should be "||" at the end of line, like above.

(Naming this condition would reduce the chance of errors.)

> + svm_vcpu_avic_enabled(svm))
> return;
>
> - cr8 = kvm_get_cr8(vcpu);
> svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
> - svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
> + svm->vmcb->control.int_ctl |= (kvm_apic_get_reg(apic,
> + APIC_TASKPRI) >> 4) & V_TPR_MASK;

kvm_get_cr8 takes a different path without lapic_in_kernel (when avic
cannot be enabled), so the original code was better.
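
I.e. keep the avic check as a separate early return and restore the
kvm_get_cr8() path (untested sketch):

	if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
		return;
	if (svm_vcpu_avic_enabled(svm))
		return;

	cr8 = kvm_get_cr8(vcpu);
	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;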

2016-04-12 14:34:50

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 11/11] svm: Manage vcpu load/unload when enable AVIC

2016-04-07 03:20-0500, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit <[email protected]>
>
> When a vcpu is loaded to or unloaded from a physical core, we need to
> update the host physical APIC ID information in the Physical APIC-ID table
> accordingly.
>
> Also, when vCPU is blocking/un-blocking (due to halt instruction),
> we need to make sure that the is-running bit is set accordingly in the
> physical APIC-ID table.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---

Thanks,

Reviewed-by: Radim Krčmář <[email protected]>

> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> #define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK (0xFF)
> @@ -1330,6 +1332,64 @@ free_avic:
> +/**
> + * This function is called during VCPU halt/unhalt.
> + */
> +static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
> +{
> + u64 entry;
> + int h_physical_id = __default_cpu_present_to_apicid(vcpu->cpu);
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + if (!svm_vcpu_avic_enabled(svm))
> + return 0;
> +
> + /* ID = 0xff (broadcast), ID > 0xff (reserved) */
> + if (h_physical_id >= AVIC_PHYSICAL_ID_MAX)
> + return -EINVAL;
> +
> + entry = READ_ONCE(*(svm->avic_physical_id_cache));
> + if (is_run)
> + WARN_ON((entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK) != 0);
> + else
> + WARN_ON((entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK) == 0);

(We're pretty unlikely to hit this, so I'd give it just one line with:
"is_run == !!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)")

> +
> + entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
> + if (is_run)
> + entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
> + WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
> +
> + return 0;
> +}
> @@ -1395,6 +1455,11 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
> }
> }
>
> + /* We initialize this flag to one to make sure that the is_running
> + * bit would be set the first time the vcpu is loaded.
> + */
> + svm->avic_is_blocking = false;

(svm was zalloc'ed, so we know it to be false, but I guess that a bit of
documentation doesn't hurt.)

2016-04-12 16:22:52

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

2016-04-07 03:20-0500, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit <[email protected]>
>
> This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception()
> and avic_unaccelerated_access_interception() along with two trace points
> (trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access).
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -3515,6 +3515,250 @@ static int mwait_interception(struct vcpu_svm *svm)
> +static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u8 mda, bool flat)
> +{
> + struct kvm_arch *vm_data = &vcpu->kvm->arch;
> + int index;
> + u32 *logical_apic_id_table;
> +
> + if (flat) { /* flat */
> + if (mda > 7)

Don't you want to check that just one bit is set?

> + return NULL;
> + index = mda;
> + } else { /* cluster */
> + int apic_id = mda & 0xf;
> + int cluster_id = (mda & 0xf0) >> 8;

">> 4".

> +
> + if (apic_id > 4 || cluster_id >= 0xf)
> + return NULL;
> + index = (cluster_id << 2) + apic_id;

ffs(apic_id), because 'apic_id' must be compacted into 2 bits.

> + }
> + logical_apic_id_table = (u32 *) page_address(vm_data->avic_logical_id_table_page);
> +
> + return &logical_apic_id_table[index];
> +}
> +
> +static int avic_handle_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id,
> + u8 logical_id)
> +{
> + u32 mod;
> + u32 *entry, new_entry;
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + if (!svm)
> + return -EINVAL;

(No need to check, svm is taken for granted.)

> +
> + mod = (kvm_apic_get_reg(svm->vcpu.arch.apic, APIC_DFR) >> 28) & 0xf;
> + entry = avic_get_logical_id_entry(vcpu, logical_id, (mod == 0xf));

(Use "kvm_apic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT".)

> + if (!entry)
> + return -EINVAL;
> +
> + new_entry = READ_ONCE(*entry);
> + new_entry &= ~AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
> + new_entry |= (g_physical_id & AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK);
> + new_entry |= AVIC_LOGICAL_ID_ENTRY_VALID_MASK;
> + WRITE_ONCE(*entry, new_entry);
> +
> + return 0;
> +}
> +
> +static int avic_unaccel_trap_write(struct vcpu_svm *svm)
> +{
> + u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;

The previous function knew the offset, just pass it as an argument.
(Or use AVIC_UNACCEL_ACCESS_OFFSET_MASK, because defining and not using
it everywhere is too sad.)
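
I.e. the signature could become (sketch):

static int avic_unaccel_trap_write(struct vcpu_svm *svm, u32 offset)

with avic_unaccelerated_access_interception() passing in the offset it has
already masked out.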

> + struct kvm_lapic *apic = svm->vcpu.arch.apic;
> + u32 reg = kvm_apic_get_reg(apic, offset);
> +
> + switch (offset) {
> + case APIC_ID: {
> + u32 aid = (reg >> 24) & 0xff;
> + u64 *o_ent = avic_get_physical_id_entry(&svm->vcpu,
> + svm->vcpu.vcpu_id);
> + u64 *n_ent = avic_get_physical_id_entry(&svm->vcpu, aid);

Write old and new. (Skip "_ent" if you want to save characters.)

> + if (!n_ent || !o_ent)
> + return 0;
> +
> + /* We need to move physical_id_entry to new offset */
> + *n_ent = *o_ent;
> + *o_ent = 0ULL;
> + svm->avic_physical_id_cache = n_ent;
> + break;
> + }
> + case APIC_LDR: {
> + int ret, lid;
> + int dlid = (reg >> 24) & 0xff;
> +
> + if (!dlid)
> + return 0;

ldr == 0 should be handled as well.

> +
> + lid = ffs(dlid) - 1;
> + ret = avic_handle_ldr_write(&svm->vcpu, svm->vcpu.vcpu_id, lid);
> + if (ret)
> + return 0;

OS can actually change LDR, so the old one should be invalidated.

(No OS does, but that is not an important factor for the hypervisor.)

> +
> + break;
> + }
> + case APIC_DFR: {
> + struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
> + u32 mod = (reg >> 28) & 0xf;
> +
> + /*
> + * We assume that all local APICs are using the same type.
> + * If this changes, we need to rebuild the AVIC logical
> + * APIC ID table with a subsequent write to APIC_LDR.
> + */

The code doesn't rebuild avic_logical_id_table_page, it just flushes the
old one.

> + if (vm_data->ldr_mode != mod) {
> + clear_page(page_address(vm_data->avic_logical_id_table_page));
> + vm_data->ldr_mode = mod;
> + }
> + break;
> + }

All these cases need to be called on KVM_SET_LAPIC -- the userspace can
provide completely new set of APIC registers and AVIC should build its
maps with them. Test with save/restore or migration.

> +static int avic_unaccelerated_access_interception(struct vcpu_svm *svm)
> +{
> + int ret = 0;
> + u32 offset = svm->vmcb->control.exit_info_1 &
> + AVIC_UNACCEL_ACCESS_OFFSET_MASK;
> + u32 vector = svm->vmcb->control.exit_info_2 &
> + AVIC_UNACCEL_ACCESS_VECTOR_MASK;
> + bool write = (svm->vmcb->control.exit_info_1 >> 32) &
> + AVIC_UNACCEL_ACCESS_WRITE_MASK;
> + bool trap = is_avic_unaccelerated_access_trap(offset);
> +
> + trace_kvm_avic_unaccelerated_access(svm->vcpu.vcpu_id, offset,
> + trap, write, vector);
> +
> + /*
> + * AVIC does not support x2APIC registers, and we only advertise
> + * xAPIC when AVIC is enabled. Therefore, access to these registers
> + * is not supported.
> + */

x2APIC only has an MSR interface and I don't think it has much to do with
offset >= 0x400. AMD defines a few registers in the extended area, but we
don't seem to emulate them.

> + if (offset >= 0x400) {
> + WARN(1, "Unsupported APIC offset %#x\n", offset);

"printk_ratelimited(KERN_INFO " is the most severe message you could
give. I think that not printing anything is best,

> + return ret;

because we should not return, but continue to emulate the access.

> + }
> +
> + if (trap) {
> + /* Handling Trap */
> + if (!write) /* Trap read should never happen */
> + BUG();

(BUG_ON(!write) is shorter, though I would avoid BUG -- only guests are
going to fail, so we don't need to kill the host.)

> + ret = avic_unaccel_trap_write(svm);
> + } else {
> + /* Handling Fault */
> + ret = (emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE);
> + }
> +
> + return ret;
> +}

2016-04-12 21:54:15

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PART1 RFC v4 06/11] KVM: x86: Detect and Initialize AVIC support



On 11/04/2016 22:48, Radim Krčmář wrote:
> 2016-04-07 03:20-0500, Suravee Suthikulpanit:
>> This patch introduces AVIC-related data structures and AVIC
>> initialization code.
>>
>> There are three main data structures for AVIC:
>> * Virtual APIC (vAPIC) backing page (per-VCPU)
>> * Physical APIC ID table (per-VM)
>> * Logical APIC ID table (per-VM)
>>
>> Currently, AVIC is disabled by default. Users can manually
>> enable AVIC via kernel boot option kvm-amd.avic=1 or during
>> kvm-amd module loading with parameter avic=1.
>>
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> ---
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> @@ -14,6 +14,9 @@
>> * the COPYING file in the top-level directory.
>> *
>> */
>> +
>> +#define pr_fmt(fmt) "SVM: " fmt
>> +
>> #include <linux/kvm_host.h>
>>
>> #include "irq.h"
>> @@ -78,6 +81,11 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
>> #define TSC_RATIO_MIN 0x0000000000000001ULL
>> #define TSC_RATIO_MAX 0x000000ffffffffffULL
>>
>> +#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
>> +
>> +/* NOTE: Current max index allowed for physical APIC ID table is 255 */
>> +#define AVIC_PHYSICAL_ID_MAX 0xFF
>
> 0xff is broadcast, so shouldn't the actual last one be 0xfe?

Right, actually 0xFF is the maximum *number* of physical APICs (numbered
0 to 254). But the code is correct and written for that convention, so
we should just rename the macro:

/* 0xff is broadcast, so the max index allowed for physical APIC ID
* table is 0xfe. APIC IDs above 0xff are reserved.
*/
#define AVIC_MAX_PHYSICAL_ID_COUNT 255

You can then remove the comment where Radim pointed out you should use
the constant:

> + /* Note: APIC ID = 0xff is used for broadcast.
> + * APIC ID > 0xff is reserved.
> + */
> + if (index >= 0xff)

Paolo

2016-04-12 21:55:45

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PART1 RFC v4 02/11] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks



On 11/04/2016 22:49, Radim Krčmář wrote:
>> > @@ -7781,6 +7784,9 @@ static void kvm_free_vcpus(struct kvm *kvm)
>> > kvm_for_each_vcpu(i, vcpu, kvm)
>> > kvm_arch_vcpu_free(vcpu);
>> >
>> > + if (kvm_x86_ops->vm_uninit)
>> > + kvm_x86_ops->vm_uninit(kvm);
> vm_uninit() doesn't seem to have much to do with kvm_free_vcpus(),
> please call it from kvm_arch_destroy_vm().
>
> (kvm_x86_ops.vm_destroy would be a better name then.)

Especially, you're calling it with a struct kvm full of dangling pointers,
so please call it early, right after the "if (current->mm == kvm->mm)"
block.
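
Something like this (sketch; the rest of the function is elided):

void kvm_arch_destroy_vm(struct kvm *kvm)
{
	if (current->mm == kvm->mm) {
		/* free the memory regions set up on behalf of userspace */
	}

	if (kvm_x86_ops->vm_destroy)
		kvm_x86_ops->vm_destroy(kvm);

	/* ... the rest of the teardown, including kvm_free_vcpus() ... */
}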

Paolo

2016-04-12 22:09:35

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PART1 RFC v4 09/11] svm: Do not expose x2APIC when enable AVIC



On 11/04/2016 22:54, Radim Krčmář wrote:
>> >
>> > static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
>> > {
>> > switch (func) {
>> > + case 0x00000001:
> ("case 1:" or "case 0x1:" would be easier to read.)
>
>> > + if (avic)
>> > + entry->ecx &= ~bit(X86_FEATURE_X2APIC);
>> > + break;
>
> ---
> A rant for the unlikely case I get back to fix the broader situation:
> Only one of these two additions is needed. If we do the second one,
> then userspace should not set X2APIC, therefore the first one is
> useless.
>
> Omitting the second one allows userspace to clear apicv_active and set
> X86_FEATURE_X2APIC, but it needs a non-intuitive order of ioctls, so I
> think we should have the second one.
>
> The problem is that KVM doesn't seem to check whether userspace sets
> a cpuid that is a subset of the supported ones, so omitting the first one
> needlessly expands the space for potential failures.

Yes, we need both.

Paolo

2016-04-12 22:26:40

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PART1 RFC v4 10/11] svm: Do not intercept CR8 when enable AVIC



On 12/04/2016 16:18, Radim Krčmář wrote:
>> > @@ -4069,7 +4070,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>> > - if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
>> > + if ((is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK)) ||
>> > + svm_vcpu_avic_enabled(svm))
>> > @@ -4255,14 +4257,15 @@ static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
>> > static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
>> > {
>> > struct vcpu_svm *svm = to_svm(vcpu);
>> > - u64 cr8;
>> > + struct kvm_lapic *apic = vcpu->arch.apic;
>> >
>> > - if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
>> > + if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK) &&
> Should be "||" at the end of line, like above.
>
> (Naming this condition would reduce the chance of errors.)
>

I think it's just "is_guest_mode(vcpu) && (vcpu->arch.hflags &
HF_VINTR_MASK)" that should become a static inline. It is used also in
update_cr8_intercept. Then something like

if (svm_in_nested_interrupt_shadow(vcpu) &&
svm_vcpu_avic_enabled(svm))
return;

makes little sense, and written this way that stands out much better.

In fact, because nested SVM and AVIC have nothing to do with each other,
it's even better to write it like

if (svm_in_nested_interrupt_shadow(vcpu))
return;
if (svm_vcpu_avic_enabled(svm))
return;
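
where the helper would simply be (sketch):

static inline bool svm_in_nested_interrupt_shadow(struct kvm_vcpu *vcpu)
{
	return is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK);
}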

Thanks,

Paolo

2016-04-12 22:29:19

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC



On 12/04/2016 18:22, Radim Krčmář wrote:
>> > +
>> > + if (apic_id > 4 || cluster_id >= 0xf)
>> > + return NULL;
>> > + index = (cluster_id << 2) + apic_id;
> ffs(apic_id), because 'apic_id' must be compacted into 2 bits.
>

ffs(apic_id)-1 actually.
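
I.e. (sketch):

	index = (cluster_id << 2) + ffs(apic_id) - 1;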

Thanks for the review, Radim.

Paolo

2016-04-13 12:37:42

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

2016-04-13 00:29+0200, Paolo Bonzini:
> On 12/04/2016 18:22, Radim Krčmář wrote:
>>> > + if (apic_id > 4 || cluster_id >= 0xf)
>>> > + return NULL;
>>> > + index = (cluster_id << 2) + apic_id;
>> ffs(apic_id), because 'apic_id' must be compacted into 2 bits.
>
> ffs(apic_id)-1 actually.

Yes, thanks.

(And I missed that the confusion begins by passing "ffs(dlid) - 1" as
mda to avic_handle_ldr_write, because the rest cannot work after that.)

2016-04-18 19:57:26

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PART1 RFC v4 01/11] KVM: x86: Misc LAPIC changes to expose helper functions

Radim,

On 04/12/2016 03:34 AM, Radim Krčmář wrote:
> 2016-04-07 03:20-0500, Suravee Suthikulpanit:
>> Exporting LAPIC utility functions and macros for re-use in SVM code.
>>
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> ---
>
> Adding kvm_lapic_set_reg looks weird when KVM has kvm_apic_get_reg, but
> the structure they accept is called kvm_lapic, so I vote to rename get,
>
> Reviewed-by: Radim Krčmář <[email protected]>
>

Sure. Within the same patch, I can rename kvm_apic_get_reg to
kvm_lapic_get_reg to be consistent.

Thanks,
Suravee

2016-04-18 20:29:31

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PART1 RFC v4 01/11] KVM: x86: Misc LAPIC changes to expose helper functions



On 04/19/2016 02:57 AM, Suravee Suthikulpanit wrote:
> Radim,
>
> On 04/12/2016 03:34 AM, Radim Krčmář wrote:
>> 2016-04-07 03:20-0500, Suravee Suthikulpanit:
>>> Exporting LAPIC utility functions and macros for re-use in SVM code.
>>>
>>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>>> ---
>>
>> Adding kvm_lapic_set_reg looks weird when KVM has kvm_apic_get_reg, but
>> the structure they accept is called kvm_lapic, so I vote to rename get,
>>
>> Reviewed-by: Radim Krčmář <[email protected]>
>>
>
> Sure. Within the same patch, I can rename kvm_apic_get_reg to
> kvm_lapic_get_reg to be consistent.
>

Thinking about it again, I'll do this in a separate patch.

Suravee

2016-04-18 20:40:28

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PART1 RFC v4 02/11] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks

Radim,

On 04/12/2016 03:49 AM, Radim Krčmář wrote:
> 2016-04-07 03:20-0500, Suravee Suthikulpanit:
>> Adding function pointers in struct kvm_x86_ops for the processor-specific
>> layer to provide hooks for when KVM initializes and un-initializes a VM.
>>
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> ---
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> @@ -7781,6 +7784,9 @@ static void kvm_free_vcpus(struct kvm *kvm)
>> kvm_for_each_vcpu(i, vcpu, kvm)
>> kvm_arch_vcpu_free(vcpu);
>>
>> + if (kvm_x86_ops->vm_uninit)
>> + kvm_x86_ops->vm_uninit(kvm);
>
> vm_uninit() doesn't seem to have much to do with kvm_free_vcpus(),
> please call it from kvm_arch_destroy_vm().
>
> (kvm_x86_ops.vm_destroy would be a better name then.)
>

Okay. I'll rename this and move the hook to be called from
kvm_arch_destroy_vm().

Thanks,
Suravee

2016-04-18 22:01:20

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PART1 RFC v4 02/11] KVM: x86: Introducing kvm_x86_ops VM init/uninit hooks

Paolo,

On 04/12/2016 04:55 PM, Paolo Bonzini wrote:
>
>
> On 11/04/2016 22:49, Radim Krčmář wrote:
>>>> @@ -7781,6 +7784,9 @@ static void kvm_free_vcpus(struct kvm *kvm)
>>>> kvm_for_each_vcpu(i, vcpu, kvm)
>>>> kvm_arch_vcpu_free(vcpu);
>>>>
>>>> + if (kvm_x86_ops->vm_uninit)
>>>> + kvm_x86_ops->vm_uninit(kvm);
>> vm_uninit() doesn't seem to have much to do with kvm_free_vcpus(),
>> please call it from kvm_arch_destroy_vm().
>>
>> (kvm_x86_ops.vm_destroy would be a better name then.)
>
> Especially, you're calling it with a struct kvm full of dangling pointers,
> so please call it early, right after the "if (current->mm == kvm->mm)"
> block.
>
> Paolo

Good point.

Thanks,
Suravee

2016-04-28 22:08:31

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

Hi Radim / Paolo,

Sorry for the delay in response.

On 4/12/2016 11:22 AM, Radim Krčmář wrote:
> 2016-04-07 03:20-0500, Suravee Suthikulpanit:
>> From: Suravee Suthikulpanit <[email protected]>
>>
>> This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception()
>> and avic_unaccelerated_access_interception() along with two trace points
>> (trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access).
>>
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> ---
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> @@ -3515,6 +3515,250 @@ static int mwait_interception(struct vcpu_svm *svm)
>> +static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u8 mda, bool flat)
>> +{
>> + struct kvm_arch *vm_data = &vcpu->kvm->arch;
>> + int index;
>> + u32 *logical_apic_id_table;
>> +
>> + if (flat) { /* flat */
>> + if (mda > 7)
>
> Don't you want to check that just one bit is set?
>
>> + return NULL;
>> + index = mda;
>> + } else { /* cluster */
>> + int apic_id = mda & 0xf;
>> + int cluster_id = (mda & 0xf0) >> 8;
>
> ">> 4".
>
>> +
>> + if (apic_id > 4 || cluster_id >= 0xf)
>> + return NULL;
>> + index = (cluster_id << 2) + apic_id;
>
> ffs(apic_id), because 'apic_id' must be compacted into 2 bits.
>
>> + }
>> + logical_apic_id_table = (u32 *) page_address(vm_data->avic_logical_id_table_page);
>> +
>> + return &logical_apic_id_table[index];
>> +}

I have quite messed up the logic here for handling the logical
cluster ID. Sorry for not catching this earlier. I'm rewriting this
function altogether to simplify it in the V5.

>> [...]
>> + lid = ffs(dlid) - 1;
>> + ret = avic_handle_ldr_write(&svm->vcpu, svm->vcpu.vcpu_id, lid);
>> + if (ret)
>> + return 0;
>
> OS can actually change LDR, so the old one should be invalidated.
>
> (No OS does, but that is not an important factor for the hypervisor.)
>

By invalidating the old one, are you suggesting that we should disable the
logical APIC table entry previously used by this vcpu? If so, that means
we would need to cache the previous LDR value, since the one in the vAPIC
backing page would already be updated.

>> [...]
>
>> + if (vm_data->ldr_mode != mod) {
>> + clear_page(page_address(vm_data->avic_logical_id_table_page));
>> + vm_data->ldr_mode = mod;
>> + }
>> + break;
>> + }
>
> All these cases need to be called on KVM_SET_LAPIC -- the userspace can
> provide completely new set of APIC registers and AVIC should build its
> maps with them. Test with save/restore or migration.

Hm.. This means we might need to introduce a new hook which is called
from the arch/x86/kvm/lapic.c: kvm_apic_post_state_restore(). Probably
something like kvm_x86_ops->apic_post_state_restore(), which sets up the
new physical and logical APIC ID tables for AVIC.

If this works, I'll add support to handle this and test with the
migration stuff in the V5.

>> + if (offset >= 0x400) {
>> + WARN(1, "Unsupported APIC offset %#x\n", offset);
>
> "printk_ratelimited(KERN_INFO " is the most severe message you could
> give. I think that not printing anything is best,
>
>> + return ret;
>
> because we should not return, but continue to emulate the access.

Then, this would continue as if we were handling a normal fault access.

>
>> + }
>> +
>> + if (trap) {
>> + /* Handling Trap */
>> + if (!write) /* Trap read should never happen */
>> + BUG();
>
> (BUG_ON(!write) is shorter, though I would avoid BUG -- only guests are
> going to fail, so we don't need to kill the host.)

Ok. What about just WARN_ONCE(!write, "svm: Handling trap read.\n");

Thanks,
Suravee

2016-04-29 14:56:34

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

2016-04-28 17:08-0500, Suravee Suthikulpanit:
> On 4/12/2016 11:22 AM, Radim Krčmář wrote:
>> 2016-04-07 03:20-0500, Suravee Suthikulpanit:
>> > From: Suravee Suthikulpanit <[email protected]>
>> >
>> > This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception()
>> > and avic_unaccelerated_access_interception() along with two trace points
>> > (trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access).
>> >
>> > Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> > ---
>> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> > @@ -3515,6 +3515,250 @@ static int mwait_interception(struct vcpu_svm *svm)
>> > [...]
>> > + lid = ffs(dlid) - 1;
>> > + ret = avic_handle_ldr_write(&svm->vcpu, svm->vcpu.vcpu_id, lid);
>> > + if (ret)
>> > + return 0;
>>
>> OS can actually change LDR, so the old one should be invalidated.
>>
>> (No OS does, but that is not an important factor for the hypervisor.)
>>
>
> By invalidating the old one, are you suggesting that we should disable the
> logical APIC table entry previously used by this vcpu? If so, that means we
> would need to cache the previous LDR value, since the one in the vAPIC
> backing page would already be updated.

Yes, the cache could be used to speed up recalculate_apic_map() too, so
it might not be a total waste.
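
For example (untested sketch; ldr_reg would be a new, hypothetical field in
struct vcpu_svm caching the mda programmed on the last LDR write):

static void avic_invalidate_logical_id_entry(struct vcpu_svm *svm)
{
	bool flat = kvm_apic_get_reg(svm->vcpu.arch.apic, APIC_DFR)
			== APIC_DFR_FLAT;
	u32 *entry = avic_get_logical_id_entry(&svm->vcpu,
					       svm->ldr_reg, flat);

	if (entry)
		WRITE_ONCE(*entry, 0);
}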

Which reminds me that the physical APIC_ID handling doesn't use the
correct cache. svm->vcpu.vcpu_id is only the initial ID, but APIC_ID
could be changed more than once.
It would be great to make APIC_ID read-only in all cases, because the x86
spec allows us to do so, but KVM_SET_LAPIC can set the APIC ID too and I
think we don't retroactively modify userspace API ... Paolo?

>> > [...]
>>
>> > + if (vm_data->ldr_mode != mod) {
>> > + clear_page(page_address(vm_data->avic_logical_id_table_page));
>> > + vm_data->ldr_mode = mod;
>> > + }
>> > + break;
>> > + }
>>
>> All these cases need to be called on KVM_SET_LAPIC -- the userspace can
>> provide completely new set of APIC registers and AVIC should build its
>> maps with them. Test with save/restore or migration.
>
> Hm.. This means we might need to introduce a new hook which is called from
> the arch/x86/kvm/lapic.c: kvm_apic_post_state_restore(). Probably something
> like kvm_x86_ops->apic_post_state_restore(), which sets up the new physical
> and logical APIC ID tables for AVIC.

Sounds good. I imagine the callback would just call
avic_unaccel_trap_write() for relevant registers.
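
Roughly (untested sketch; the hook name is an assumption, and it assumes
avic_unaccel_trap_write() is refactored to take the register offset, as
suggested earlier in this thread):

static void avic_post_state_restore(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/* redo the trap-write handling for the registers AVIC caches,
	 * using the values userspace just restored */
	avic_unaccel_trap_write(svm, APIC_ID);
	avic_unaccel_trap_write(svm, APIC_DFR);
	avic_unaccel_trap_write(svm, APIC_LDR);
}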

>> > + return ret;
>>
>> because we should not return, but continue to emulate the access.
>
> Then, this would continue as if we were handling a normal fault access.

Exactly, it is a normal access to an undefined register.

>> > + }
>> > +
>> > + if (trap) {
>> > + /* Handling Trap */
>> > + if (!write) /* Trap read should never happen */
>> > + BUG();
>>
>> (BUG_ON(!write) is shorter, though I would avoid BUG -- only guests are
>> going to fail, so we don't need to kill the host.)
>
> Ok. What about just WARN_ONCE(!write, "svm: Handling trap read.\n");

Sure, a trap read would be a hardware bug, and calling avic_unaccel_trap_write()
on a read access shouldn't result in a bug. I am slightly inclined towards
'if (trap && write)' and optional 'WARN_ONCE(trap,' in the else branch.
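
I.e. (sketch):

	if (trap && write) {
		ret = avic_unaccel_trap_write(svm);
	} else {
		WARN_ONCE(trap, "svm: unexpected trap read\n");
		ret = (emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE);
	}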

Thanks.