2017-12-18 17:17:56

by Vitaly Kuznetsov

Subject: [PATCH RFC 0/7] KVM: nVMX: enlightened VMCS initial implementation

The original author of these patches no longer works at Red Hat; I
agreed to take them over and send them upstream. Here is his original
description:

"Makes KVM implement the enlightened VMCS feature per Hyper-V TLFS 5.0b.
I've measured about a 5% improvement in the cost of a nested VM exit
(Hyper-V enabled Windows Server 2016 nested in KVM)."

This is just an initial implementation. By leveraging the clean fields
mask we can further improve performance. I'm also interested in
implementing the other part of the feature: consuming an enlightened
VMCS when KVM is running on top of Hyper-V.

Ladi Prosek (7):
KVM: x86: rename HV_X64_MSR_APIC_ASSIST_PAGE to
HV_X64_MSR_VP_ASSIST_PAGE
KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS
KVM: nVMX: add I/O exit ECX, ESI, EDI, EIP vmcs12 fields
KVM: hyperv: define VP assist page structure and add helpers
KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability
KVM: nVMX: add enlightened VMCS state
KVM: nVMX: implement enlightened VMPTRLD

arch/x86/include/asm/kvm_host.h | 3 +
arch/x86/include/asm/vmx.h | 4 +
arch/x86/include/uapi/asm/hyperv.h | 20 +-
arch/x86/kvm/hyperv.c | 31 ++-
arch/x86/kvm/hyperv.h | 4 +
arch/x86/kvm/lapic.c | 4 +-
arch/x86/kvm/lapic.h | 4 +-
arch/x86/kvm/svm.c | 9 +
arch/x86/kvm/vmx.c | 467 ++++++++++++++++++++++++++-----------
arch/x86/kvm/x86.c | 19 +-
include/uapi/linux/kvm.h | 1 +
11 files changed, 407 insertions(+), 159 deletions(-)

--
2.14.3


2017-12-18 17:18:12

by Vitaly Kuznetsov

Subject: [PATCH RFC 3/7] KVM: nVMX: add I/O exit ECX, ESI, EDI, EIP vmcs12 fields

From: Ladi Prosek <[email protected]>

These non-synthetic VMCS fields were not supported by KVM thus far. The
layout is according to Hyper-V TLFS 5.0b, the physical encoding according
to the Intel SDM.

Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/vmx.h | 4 ++++
arch/x86/kvm/vmx.c | 9 ++++++++-
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 8b6780751132..92a10aa839e6 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -298,6 +298,10 @@ enum vmcs_field {
CR3_TARGET_VALUE2 = 0x0000600c,
CR3_TARGET_VALUE3 = 0x0000600e,
EXIT_QUALIFICATION = 0x00006400,
+ EXIT_IO_INSTR_ECX = 0x00006402,
+ EXIT_IO_INSTR_ESI = 0x00006404,
+ EXIT_IO_INSTR_EDI = 0x00006406,
+ EXIT_IO_INSTR_EIP = 0x00006408,
GUEST_LINEAR_ADDRESS = 0x0000640a,
GUEST_CR0 = 0x00006800,
GUEST_CR3 = 0x00006802,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cd5f29a57880..f3215b6a0531 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -391,7 +391,10 @@ struct __packed vmcs12 {
u32 vmx_instruction_info;

natural_width exit_qualification;
- natural_width padding64_3[4];
+ natural_width exit_io_instr_ecx;
+ natural_width exit_io_instr_esi;
+ natural_width exit_io_instr_edi;
+ natural_width exit_io_instr_eip;

natural_width guest_linear_address;
natural_width guest_rsp;
@@ -913,6 +916,10 @@ static const unsigned short vmcs_field_to_offset_table[] = {
FIELD(CR3_TARGET_VALUE2, cr3_target_value2),
FIELD(CR3_TARGET_VALUE3, cr3_target_value3),
FIELD(EXIT_QUALIFICATION, exit_qualification),
+ FIELD(EXIT_IO_INSTR_ECX, exit_io_instr_ecx),
+ FIELD(EXIT_IO_INSTR_ESI, exit_io_instr_esi),
+ FIELD(EXIT_IO_INSTR_EDI, exit_io_instr_edi),
+ FIELD(EXIT_IO_INSTR_EIP, exit_io_instr_eip),
FIELD(GUEST_LINEAR_ADDRESS, guest_linear_address),
FIELD(GUEST_CR0, guest_cr0),
FIELD(GUEST_CR3, guest_cr3),
--
2.14.3

2017-12-18 17:18:21

by Vitaly Kuznetsov

Subject: [PATCH RFC 4/7] KVM: hyperv: define VP assist page structure and add helpers

From: Ladi Prosek <[email protected]>

Structure layout is specified in Hyper-V TLFS 5.0b.

The state related to the VP assist page is still managed by the LAPIC
code in the pv_eoi field.

Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/uapi/asm/hyperv.h | 10 ++++++++++
arch/x86/kvm/hyperv.c | 23 +++++++++++++++++++++--
arch/x86/kvm/hyperv.h | 4 ++++
arch/x86/kvm/lapic.c | 4 ++--
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/x86.c | 2 +-
6 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index a742af3e8408..032580cab492 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -385,6 +385,16 @@ struct hv_timer_message_payload {
__u64 delivery_time; /* When the message was delivered */
};

+/* Define virtual processor assist page structure. */
+struct hv_vp_assist_page {
+ __u32 apic_assist;
+ __u32 reserved;
+ __u64 vtl_control[2];
+ __u64 nested_enlightenments_control[2];
+ __u32 enlighten_vmentry;
+ __u64 current_nested_vmcs;
+};
+
#define HV_STIMER_ENABLE (1ULL << 0)
#define HV_STIMER_PERIODIC (1ULL << 1)
#define HV_STIMER_LAZY (1ULL << 2)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 9fb0ed9b1670..f91db96ee2d6 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -667,6 +667,24 @@ void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu)
stimer_cleanup(&hv_vcpu->stimer[i]);
}

+bool kvm_hv_assist_page_enabled(struct kvm_vcpu *vcpu)
+{
+ if (!(vcpu->arch.hyperv.hv_vapic & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE))
+ return false;
+ return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
+}
+EXPORT_SYMBOL_GPL(kvm_hv_assist_page_enabled);
+
+bool kvm_hv_get_assist_page(struct kvm_vcpu *vcpu,
+ struct hv_vp_assist_page *assist_page)
+{
+ if (!kvm_hv_assist_page_enabled(vcpu))
+ return false;
+ return !kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data,
+ assist_page, sizeof(*assist_page));
+}
+EXPORT_SYMBOL_GPL(kvm_hv_get_assist_page);
+
static void stimer_prepare_msg(struct kvm_vcpu_hv_stimer *stimer)
{
struct hv_message *msg = &stimer->msg;
@@ -1015,7 +1033,7 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)

if (!(data & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE)) {
hv->hv_vapic = data;
- if (kvm_lapic_enable_pv_eoi(vcpu, 0))
+ if (kvm_lapic_enable_pv_eoi(vcpu, 0, 0))
return 1;
break;
}
@@ -1028,7 +1046,8 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
hv->hv_vapic = data;
kvm_vcpu_mark_page_dirty(vcpu, gfn);
if (kvm_lapic_enable_pv_eoi(vcpu,
- gfn_to_gpa(gfn) | KVM_MSR_ENABLED))
+ gfn_to_gpa(gfn) | KVM_MSR_ENABLED,
+ sizeof(struct hv_vp_assist_page)))
return 1;
break;
}
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index e637631a9574..d963ddb6cd63 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -62,6 +62,10 @@ void kvm_hv_vcpu_init(struct kvm_vcpu *vcpu);
void kvm_hv_vcpu_postcreate(struct kvm_vcpu *vcpu);
void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu);

+bool kvm_hv_assist_page_enabled(struct kvm_vcpu *vcpu);
+bool kvm_hv_get_assist_page(struct kvm_vcpu *vcpu,
+ struct hv_vp_assist_page *assist_page);
+
static inline struct kvm_vcpu_hv_stimer *vcpu_to_stimer(struct kvm_vcpu *vcpu,
int timer_index)
{
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e2c1fb8d35ce..846c1a192eb9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2524,7 +2524,7 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
return 0;
}

-int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len)
{
u64 addr = data & ~KVM_MSR_ENABLED;
if (!IS_ALIGNED(addr, 4))
@@ -2534,7 +2534,7 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
if (!pv_eoi_enabled(vcpu))
return 0;
return kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
- addr, sizeof(u8));
+ addr, len);
}

void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index f7ee56819add..f6c43f320cc2 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -112,7 +112,7 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
return vcpu->arch.hyperv.hv_vapic & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
}

-int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
+int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len);
void kvm_lapic_init(void);
void kvm_lapic_exit(void);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc06af73128e..08eff1cd64bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2290,7 +2290,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

break;
case MSR_KVM_PV_EOI_EN:
- if (kvm_lapic_enable_pv_eoi(vcpu, data))
+ if (kvm_lapic_enable_pv_eoi(vcpu, data, sizeof(u8)))
return 1;
break;

--
2.14.3

2017-12-18 17:18:32

by Vitaly Kuznetsov

Subject: [PATCH RFC 6/7] KVM: nVMX: add enlightened VMCS state

From: Ladi Prosek <[email protected]>

Adds two bool fields and implements copy_enlightened_to_vmcs12() and
copy_vmcs12_to_enlightened().

Unlike shadow VMCS, enlightened VMCS is para-virtual and active only if
the nested guest explicitly enables it. The pattern repeating itself a
few times throughout this patch:

if (vmx->nested.enlightened_vmcs_active) {
/* enlightened! */
} else if (enable_shadow_vmcs) {
/* fall-back */
}

reflects this. If the nested guest elects not to use the enlightened
VMCS, the regular HW-assisted shadow VMCS feature is used, if enabled.
enlightened_vmcs_active can never be true unless
enlightened_vmcs_enabled is set.

Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 52 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 320bb6670413..00b4a362351d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -503,6 +503,16 @@ struct nested_vmx {
* on what the enlightened VMCS supports.
*/
bool enlightened_vmcs_enabled;
+ /*
+ * Indicates that the nested hypervisor performed the last vmentry with
+ * a Hyper-V enlightened VMCS.
+ */
+ bool enlightened_vmcs_active;
+
+ /*
+ * Indicates that the enlightened VMCS must be synced with vmcs12
+ */
+ bool sync_enlightened_vmcs;

/* vmcs02_list cache of VMCSs recently used to run L2 guests */
struct list_head vmcs02_pool;
@@ -991,6 +1001,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
static bool guest_state_valid(struct kvm_vcpu *vcpu);
static u32 vmx_segment_access_rights(struct kvm_segment *var);
static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx);
static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu);
static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
@@ -7455,7 +7466,10 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
if (vmx->nested.current_vmptr == -1ull)
return;

- if (enable_shadow_vmcs) {
+ if (vmx->nested.enlightened_vmcs_active) {
+ copy_enlightened_to_vmcs12(vmx);
+ vmx->nested.sync_enlightened_vmcs = false;
+ } else if (enable_shadow_vmcs) {
/* copy to memory all shadowed fields in case
they were modified */
copy_shadow_to_vmcs12(vmx);
@@ -7642,6 +7656,20 @@ static inline int vmcs12_write_any(struct kvm_vcpu *vcpu,

}

+static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx)
+{
+ kvm_vcpu_read_guest_page(&vmx->vcpu,
+ vmx->nested.current_vmptr >> PAGE_SHIFT,
+ vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
+}
+
+static void copy_vmcs12_to_enlightened(struct vcpu_vmx *vmx)
+{
+ kvm_vcpu_write_guest_page(&vmx->vcpu,
+ vmx->nested.current_vmptr >> PAGE_SHIFT,
+ vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
+}
+
static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
{
int i;
@@ -7841,7 +7869,9 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
static void set_current_vmptr(struct vcpu_vmx *vmx, gpa_t vmptr)
{
vmx->nested.current_vmptr = vmptr;
- if (enable_shadow_vmcs) {
+ if (vmx->nested.enlightened_vmcs_active) {
+ vmx->nested.sync_enlightened_vmcs = true;
+ } else if (enable_shadow_vmcs) {
vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL,
SECONDARY_EXEC_SHADOW_VMCS);
vmcs_write64(VMCS_LINK_POINTER,
@@ -9396,7 +9426,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmcs_write32(PLE_WINDOW, vmx->ple_window);
}

- if (vmx->nested.sync_shadow_vmcs) {
+ if (vmx->nested.sync_enlightened_vmcs) {
+ copy_vmcs12_to_enlightened(vmx);
+ vmx->nested.sync_enlightened_vmcs = false;
+ } else if (vmx->nested.sync_shadow_vmcs) {
copy_vmcs12_to_shadow(vmx);
vmx->nested.sync_shadow_vmcs = false;
}
@@ -11017,7 +11050,9 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)

vmcs12 = get_vmcs12(vcpu);

- if (enable_shadow_vmcs)
+ if (vmx->nested.enlightened_vmcs_active)
+ copy_enlightened_to_vmcs12(vmx);
+ else if (enable_shadow_vmcs)
copy_shadow_to_vmcs12(vmx);

/*
@@ -11634,8 +11669,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
*/
kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);

- if (enable_shadow_vmcs && exit_reason != -1)
- vmx->nested.sync_shadow_vmcs = true;
+ if (exit_reason != -1) {
+ if (vmx->nested.enlightened_vmcs_active)
+ vmx->nested.sync_enlightened_vmcs = true;
+ else if (enable_shadow_vmcs)
+ vmx->nested.sync_shadow_vmcs = true;
+ }

/* in case we halted in L2 */
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
@@ -11714,12 +11753,17 @@ static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12,
u32 reason, unsigned long qualification)
{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
load_vmcs12_host_state(vcpu, vmcs12);
vmcs12->vm_exit_reason = reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
vmcs12->exit_qualification = qualification;
nested_vmx_succeed(vcpu);
- if (enable_shadow_vmcs)
- to_vmx(vcpu)->nested.sync_shadow_vmcs = true;
+
+ if (vmx->nested.enlightened_vmcs_active)
+ vmx->nested.sync_enlightened_vmcs = true;
+ else if (enable_shadow_vmcs)
+ vmx->nested.sync_shadow_vmcs = true;
}

static int vmx_check_intercept(struct kvm_vcpu *vcpu,
--
2.14.3

2017-12-18 17:18:03

by Vitaly Kuznetsov

Subject: [PATCH RFC 1/7] KVM: x86: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE

From: Ladi Prosek <[email protected]>

The assist page has been used only for the paravirtual EOI so far, hence
the "APIC" in the MSR name. Rename it to match the Hyper-V TLFS, where
it's called "Virtual VP Assist MSR".

Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/uapi/asm/hyperv.h | 10 +++++-----
arch/x86/kvm/hyperv.c | 8 ++++----
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/x86.c | 2 +-
4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 1a5bfead93b4..a742af3e8408 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -186,7 +186,7 @@
#define HV_X64_MSR_EOI 0x40000070
#define HV_X64_MSR_ICR 0x40000071
#define HV_X64_MSR_TPR 0x40000072
-#define HV_X64_MSR_APIC_ASSIST_PAGE 0x40000073
+#define HV_X64_MSR_VP_ASSIST_PAGE 0x40000073

/* Define synthetic interrupt controller model specific registers. */
#define HV_X64_MSR_SCONTROL 0x40000080
@@ -248,10 +248,10 @@
#define HVCALL_POST_MESSAGE 0x005c
#define HVCALL_SIGNAL_EVENT 0x005d

-#define HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE 0x00000001
-#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT 12
-#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \
- (~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
+#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE 0x00000001
+#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT 12
+#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK \
+ (~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))

#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x00000001
#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index dc97f2544b6f..9fb0ed9b1670 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1009,17 +1009,17 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
return 1;
hv->vp_index = (u32)data;
break;
- case HV_X64_MSR_APIC_ASSIST_PAGE: {
+ case HV_X64_MSR_VP_ASSIST_PAGE: {
u64 gfn;
unsigned long addr;

- if (!(data & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE)) {
+ if (!(data & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE)) {
hv->hv_vapic = data;
if (kvm_lapic_enable_pv_eoi(vcpu, 0))
return 1;
break;
}
- gfn = data >> HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT;
+ gfn = data >> HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT;
addr = kvm_vcpu_gfn_to_hva(vcpu, gfn);
if (kvm_is_error_hva(addr))
return 1;
@@ -1129,7 +1129,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
return kvm_hv_vapic_msr_read(vcpu, APIC_ICR, pdata);
case HV_X64_MSR_TPR:
return kvm_hv_vapic_msr_read(vcpu, APIC_TASKPRI, pdata);
- case HV_X64_MSR_APIC_ASSIST_PAGE:
+ case HV_X64_MSR_VP_ASSIST_PAGE:
data = hv->hv_vapic;
break;
case HV_X64_MSR_VP_RUNTIME:
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 4b9935a38347..f7ee56819add 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -109,7 +109,7 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);

static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
{
- return vcpu->arch.hyperv.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
+ return vcpu->arch.hyperv.hv_vapic & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
}

int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index faf843c9b916..fc06af73128e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1026,7 +1026,7 @@ static u32 emulated_msrs[] = {
HV_X64_MSR_VP_RUNTIME,
HV_X64_MSR_SCONTROL,
HV_X64_MSR_STIMER0_CONFIG,
- HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+ HV_X64_MSR_VP_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_KVM_PV_EOI_EN,

MSR_IA32_TSC_ADJUST,
--
2.14.3

2017-12-18 17:18:34

by Vitaly Kuznetsov

Subject: [PATCH RFC 7/7] KVM: nVMX: implement enlightened VMPTRLD

From: Ladi Prosek <[email protected]>

Per Hyper-V TLFS 5.0b:

"The L1 hypervisor may choose to use enlightened VMCSs by writing 1 to
the corresponding field in the VP assist page (see section 7.8.7).
Another field in the VP assist page controls the currently active
enlightened VMCS. Each enlightened VMCS is exactly one page (4 KB) in
size and must be initially zeroed. No VMPTRLD instruction must be
executed to make an enlightened VMCS active or current.

After the L1 hypervisor performs a VM entry with an enlightened VMCS,
the VMCS is considered active on the processor. An enlightened VMCS
can only be active on a single processor at the same time. The L1
hypervisor can execute a VMCLEAR instruction to transition an
enlightened VMCS from the active to the non-active state. Any VMREAD
or VMWRITE instructions while an enlightened VMCS is active is
unsupported and can result in unexpected behavior."

Note that we choose not to modify our VMREAD, VMWRITE, and VMPTRLD
handlers. They will not cause any explicit failure but may not have
the intended effect.

Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 00b4a362351d..f7f6f7d18ade 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -20,6 +20,7 @@
#include "mmu.h"
#include "cpuid.h"
#include "lapic.h"
+#include "hyperv.h"

#include <linux/kvm_host.h>
#include <linux/module.h>
@@ -7935,6 +7936,30 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
return kvm_skip_emulated_instruction(vcpu);
}

+static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct hv_vp_assist_page assist_page;
+
+ if (!vmx->nested.enlightened_vmcs_enabled)
+ return 1;
+
+ vmx->nested.enlightened_vmcs_active =
+ kvm_hv_get_assist_page(vcpu, &assist_page) &&
+ assist_page.enlighten_vmentry;
+
+ if (vmx->nested.enlightened_vmcs_active &&
+ assist_page.current_nested_vmcs != vmx->nested.current_vmptr) {
+ /*
+ * This is an equivalent of the nested hypervisor executing
+ * the vmptrld instruction.
+ */
+ set_current_vmptr(vmx, assist_page.current_nested_vmcs);
+ copy_enlightened_to_vmcs12(vmx);
+ }
+ return 1;
+}
+
/* Emulate the VMPTRST instruction */
static int handle_vmptrst(struct kvm_vcpu *vcpu)
{
@@ -11045,6 +11070,9 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
if (!nested_vmx_check_permission(vcpu))
return 1;

+ if (!nested_vmx_handle_enlightened_vmptrld(vcpu))
+ return 1;
+
if (!nested_vmx_check_vmcs12(vcpu))
goto out;

--
2.14.3

2017-12-18 17:19:09

by Vitaly Kuznetsov

Subject: [PATCH RFC 5/7] KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability

From: Ladi Prosek <[email protected]>

Enlightened VMCS is opt-in. The current version does not contain all
fields supported by nested VMX, so we must not advertise the
corresponding VMX features if enlightened VMCS is enabled.

Userspace is given the enlightened VMCS version supported by KVM as
part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to
be advertised to the nested hypervisor, currently done via a cpuid
leaf for Hyper-V.

Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/svm.c | 9 ++++++++
arch/x86/kvm/vmx.c | 51 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 15 ++++++++++++
include/uapi/linux/kvm.h | 1 +
5 files changed, 79 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 516798431328..79c188ae7837 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1079,6 +1079,9 @@ struct kvm_x86_ops {
int (*pre_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
int (*pre_leave_smm)(struct kvm_vcpu *vcpu, u64 smbase);
int (*enable_smi_window)(struct kvm_vcpu *vcpu);
+
+ int (*enable_enlightened_vmcs)(struct kvm_vcpu *vcpu,
+ uint16_t *vmcs_version);
};

struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index eb714f1cdf7e..6dc28d53bb89 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5505,6 +5505,13 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
return 0;
}

+static int enable_enlightened_vmcs(struct kvm_vcpu *vcpu,
+ uint16_t *vmcs_version)
+{
+ /* Intel-only feature */
+ return -ENODEV;
+}
+
static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -5620,6 +5627,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.pre_enter_smm = svm_pre_enter_smm,
.pre_leave_smm = svm_pre_leave_smm,
.enable_smi_window = enable_smi_window,
+
+ .enable_enlightened_vmcs = enable_enlightened_vmcs,
};

static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f3215b6a0531..320bb6670413 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -464,6 +464,8 @@ struct __packed vmcs12 {
*/
#define VMCS12_SIZE 0x1000

+#define ENLIGHTENED_VMCS_VERSION (1 | (1u << 8))
+
/* Used to remember the last vmcs02 used for some recently used vmcs12s */
struct vmcs02_list {
struct list_head list;
@@ -495,6 +497,13 @@ struct nested_vmx {
*/
bool sync_shadow_vmcs;

+ /*
+ * Enlightened VMCS has been enabled. It does not mean that L1 has to
+ * use it. However, VMX features available to L1 will be limited based
+ * on what the enlightened VMCS supports.
+ */
+ bool enlightened_vmcs_enabled;
+
/* vmcs02_list cache of VMCSs recently used to run L2 guests */
struct list_head vmcs02_pool;
int vmcs02_num;
@@ -12129,6 +12138,46 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
return 0;
}

+static int enable_enlightened_vmcs(struct kvm_vcpu *vcpu,
+ uint16_t *vmcs_version)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ /* We don't support disabling the feature for simplicity. */
+ if (vmx->nested.enlightened_vmcs_enabled)
+ return 0;
+ vmx->nested.enlightened_vmcs_enabled = true;
+ *vmcs_version = ENLIGHTENED_VMCS_VERSION;
+
+ /*
+ * Enlightened VMCS doesn't have the POSTED_INTR_DESC_ADDR,
+ * POSTED_INTR_NV, VMX_PREEMPTION_TIMER_VALUE,
+ * GUEST_IA32_PERF_GLOBAL_CTRL, and HOST_IA32_PERF_GLOBAL_CTRL
+ * fields.
+ */
+ vmx->nested.nested_vmx_pinbased_ctls_high &=
+ ~(PIN_BASED_POSTED_INTR |
+ PIN_BASED_VMX_PREEMPTION_TIMER);
+ vmx->nested.nested_vmx_entry_ctls_high &=
+ ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+ vmx->nested.nested_vmx_exit_ctls_high &=
+ ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+
+ /*
+ * Enlightened VMCS doesn't have the APIC_ACCESS_ADDR,
+ * EOI_EXIT_BITMAP*, GUEST_INTR_STATUS, VM_FUNCTION_CONTROL,
+ * EPTP_LIST_ADDRESS, PML_ADDRESS, and GUEST_PML_INDEX fields.
+ */
+ vmx->nested.nested_vmx_secondary_ctls_high &=
+ ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+ SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+ SECONDARY_EXEC_ENABLE_VMFUNC |
+ SECONDARY_EXEC_ENABLE_PML);
+ vmx->nested.nested_vmx_vmfunc_controls &=
+ ~VMX_VMFUNC_EPTP_SWITCHING;
+ return 0;
+}
+
static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -12259,6 +12308,8 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.pre_enter_smm = vmx_pre_enter_smm,
.pre_leave_smm = vmx_pre_leave_smm,
.enable_smi_window = enable_smi_window,
+
+ .enable_enlightened_vmcs = enable_enlightened_vmcs,
};

static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 08eff1cd64bd..9ab0988317d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2701,6 +2701,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_HYPERV_SYNIC:
case KVM_CAP_HYPERV_SYNIC2:
case KVM_CAP_HYPERV_VP_INDEX:
+ case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
case KVM_CAP_PCI_SEGMENT:
case KVM_CAP_DEBUGREGS:
case KVM_CAP_X86_ROBUST_SINGLESTEP:
@@ -3442,6 +3443,10 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
struct kvm_enable_cap *cap)
{
+ int r;
+ uint16_t vmcs_version;
+ void __user *user_ptr;
+
if (cap->flags)
return -EINVAL;

@@ -3454,6 +3459,16 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
return -EINVAL;
return kvm_hv_activate_synic(vcpu, cap->cap ==
KVM_CAP_HYPERV_SYNIC2);
+ case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
+ r = kvm_x86_ops->enable_enlightened_vmcs(vcpu, &vmcs_version);
+ if (!r) {
+ user_ptr = (void __user *)(uintptr_t)cap->args[0];
+ if (copy_to_user(user_ptr, &vmcs_version,
+ sizeof(vmcs_version)))
+ r = -EFAULT;
+ }
+ return r;
+
default:
return -EINVAL;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 496e59a2738b..728dfa2f5638 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_HYPERV_SYNIC2 148
#define KVM_CAP_HYPERV_VP_INDEX 149
#define KVM_CAP_S390_AIS_MIGRATION 150
+#define KVM_CAP_HYPERV_ENLIGHTENED_VMCS 151

#ifdef KVM_CAP_IRQ_ROUTING

--
2.14.3

2017-12-18 17:20:31

by Vitaly Kuznetsov

Subject: [PATCH RFC 2/7] KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS

From: Ladi Prosek <[email protected]>

Reorders existing fields and adds fields specific to Hyper-V. The layout
now matches Hyper-V TLFS 5.0b 16.11.2 Enlightened VMCS.

Fields used by KVM but missing from Hyper-V are placed in the second half
of the VMCS page to minimize the chances they will clash with future
enlightened VMCS versions.

Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
[Vitaly]: Update VMCS12_REVISION to some new arbitrary number.
---
arch/x86/kvm/vmx.c | 321 +++++++++++++++++++++++++++++++----------------------
1 file changed, 187 insertions(+), 134 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8eba631c4dbd..cd5f29a57880 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -239,159 +239,212 @@ struct __packed vmcs12 {
u32 revision_id;
u32 abort;

+ union {
+ u64 hv_vmcs[255];
+ struct {
+ u16 host_es_selector;
+ u16 host_cs_selector;
+ u16 host_ss_selector;
+ u16 host_ds_selector;
+ u16 host_fs_selector;
+ u16 host_gs_selector;
+ u16 host_tr_selector;
+
+ u64 host_ia32_pat;
+ u64 host_ia32_efer;
+
+ /*
+ * To allow migration of L1 (complete with its L2
+ * guests) between machines of different natural widths
+ * (32 or 64 bit), we cannot have unsigned long fields
+ * with no explicit size. We use u64 (aliased
+ * natural_width) instead. Luckily, x86 is
+ * little-endian.
+ */
+ natural_width host_cr0;
+ natural_width host_cr3;
+ natural_width host_cr4;
+
+ natural_width host_ia32_sysenter_esp;
+ natural_width host_ia32_sysenter_eip;
+ natural_width host_rip;
+ u32 host_ia32_sysenter_cs;
+
+ u32 pin_based_vm_exec_control;
+ u32 vm_exit_controls;
+ u32 secondary_vm_exec_control;
+
+ u64 io_bitmap_a;
+ u64 io_bitmap_b;
+ u64 msr_bitmap;
+
+ u16 guest_es_selector;
+ u16 guest_cs_selector;
+ u16 guest_ss_selector;
+ u16 guest_ds_selector;
+ u16 guest_fs_selector;
+ u16 guest_gs_selector;
+ u16 guest_ldtr_selector;
+ u16 guest_tr_selector;
+
+ u32 guest_es_limit;
+ u32 guest_cs_limit;
+ u32 guest_ss_limit;
+ u32 guest_ds_limit;
+ u32 guest_fs_limit;
+ u32 guest_gs_limit;
+ u32 guest_ldtr_limit;
+ u32 guest_tr_limit;
+ u32 guest_gdtr_limit;
+ u32 guest_idtr_limit;
+
+ u32 guest_es_ar_bytes;
+ u32 guest_cs_ar_bytes;
+ u32 guest_ss_ar_bytes;
+ u32 guest_ds_ar_bytes;
+ u32 guest_fs_ar_bytes;
+ u32 guest_gs_ar_bytes;
+ u32 guest_ldtr_ar_bytes;
+ u32 guest_tr_ar_bytes;
+
+ natural_width guest_es_base;
+ natural_width guest_cs_base;
+ natural_width guest_ss_base;
+ natural_width guest_ds_base;
+ natural_width guest_fs_base;
+ natural_width guest_gs_base;
+ natural_width guest_ldtr_base;
+ natural_width guest_tr_base;
+ natural_width guest_gdtr_base;
+ natural_width guest_idtr_base;
+
+ u64 padding64_1[3];
+
+ u64 vm_exit_msr_store_addr;
+ u64 vm_exit_msr_load_addr;
+ u64 vm_entry_msr_load_addr;
+
+ natural_width cr3_target_value0;
+ natural_width cr3_target_value1;
+ natural_width cr3_target_value2;
+ natural_width cr3_target_value3;
+
+ u32 page_fault_error_code_mask;
+ u32 page_fault_error_code_match;
+
+ u32 cr3_target_count;
+ u32 vm_exit_msr_store_count;
+ u32 vm_exit_msr_load_count;
+ u32 vm_entry_msr_load_count;
+
+ u64 tsc_offset;
+ u64 virtual_apic_page_addr;
+ u64 vmcs_link_pointer;
+
+ u64 guest_ia32_debugctl;
+ u64 guest_ia32_pat;
+ u64 guest_ia32_efer;
+
+ u64 guest_pdptr0;
+ u64 guest_pdptr1;
+ u64 guest_pdptr2;
+ u64 guest_pdptr3;
+
+ natural_width guest_pending_dbg_exceptions;
+ natural_width guest_sysenter_esp;
+ natural_width guest_sysenter_eip;
+
+ u32 guest_activity_state;
+ u32 guest_sysenter_cs;
+
+ natural_width cr0_guest_host_mask;
+ natural_width cr4_guest_host_mask;
+ natural_width cr0_read_shadow;
+ natural_width cr4_read_shadow;
+ natural_width guest_cr0;
+ natural_width guest_cr3;
+ natural_width guest_cr4;
+ natural_width guest_dr7;
+
+ natural_width host_fs_base;
+ natural_width host_gs_base;
+ natural_width host_tr_base;
+ natural_width host_gdtr_base;
+ natural_width host_idtr_base;
+ natural_width host_rsp;
+
+ u64 ept_pointer;
+
+ u16 virtual_processor_id;
+ u16 padding16[3];
+
+ u64 padding64_2[5];
+ u64 guest_physical_address;
+
+ u32 vm_instruction_error;
+ u32 vm_exit_reason;
+ u32 vm_exit_intr_info;
+ u32 vm_exit_intr_error_code;
+ u32 idt_vectoring_info_field;
+ u32 idt_vectoring_error_code;
+ u32 vm_exit_instruction_len;
+ u32 vmx_instruction_info;
+
+ natural_width exit_qualification;
+ natural_width padding64_3[4];
+
+ natural_width guest_linear_address;
+ natural_width guest_rsp;
+ natural_width guest_rflags;
+
+ u32 guest_interruptibility_info;
+ u32 cpu_based_vm_exec_control;
+ u32 exception_bitmap;
+ u32 vm_entry_controls;
+ u32 vm_entry_intr_info_field;
+ u32 vm_entry_exception_error_code;
+ u32 vm_entry_instruction_len;
+ u32 tpr_threshold;
+
+ natural_width guest_rip;
+
+ u32 hv_clean_fields;
+ u32 hv_padding_32;
+ u32 hv_synthetic_controls;
+ u32 hv_enlightenments_control;
+ u32 hv_vp_id;
+
+ u64 hv_vm_id;
+ u64 partition_assist_page;
+ u64 padding64_4[4];
+ u64 guest_bndcfgs;
+ u64 padding64_5[7];
+ u64 xss_exit_bitmap;
+ u64 padding64_6[7];
+ };
+ };
+
+ /* Synthetic and KVM-specific fields: */
u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
u32 padding[7]; /* room for future expansion */

- u64 io_bitmap_a;
- u64 io_bitmap_b;
- u64 msr_bitmap;
- u64 vm_exit_msr_store_addr;
- u64 vm_exit_msr_load_addr;
- u64 vm_entry_msr_load_addr;
- u64 tsc_offset;
- u64 virtual_apic_page_addr;
u64 apic_access_addr;
u64 posted_intr_desc_addr;
u64 vm_function_control;
- u64 ept_pointer;
u64 eoi_exit_bitmap0;
u64 eoi_exit_bitmap1;
u64 eoi_exit_bitmap2;
u64 eoi_exit_bitmap3;
u64 eptp_list_address;
- u64 xss_exit_bitmap;
- u64 guest_physical_address;
- u64 vmcs_link_pointer;
u64 pml_address;
- u64 guest_ia32_debugctl;
- u64 guest_ia32_pat;
- u64 guest_ia32_efer;
u64 guest_ia32_perf_global_ctrl;
- u64 guest_pdptr0;
- u64 guest_pdptr1;
- u64 guest_pdptr2;
- u64 guest_pdptr3;
- u64 guest_bndcfgs;
- u64 host_ia32_pat;
- u64 host_ia32_efer;
u64 host_ia32_perf_global_ctrl;
u64 padding64[8]; /* room for future expansion */
- /*
- * To allow migration of L1 (complete with its L2 guests) between
- * machines of different natural widths (32 or 64 bit), we cannot have
- * unsigned long fields with no explict size. We use u64 (aliased
- * natural_width) instead. Luckily, x86 is little-endian.
- */
- natural_width cr0_guest_host_mask;
- natural_width cr4_guest_host_mask;
- natural_width cr0_read_shadow;
- natural_width cr4_read_shadow;
- natural_width cr3_target_value0;
- natural_width cr3_target_value1;
- natural_width cr3_target_value2;
- natural_width cr3_target_value3;
- natural_width exit_qualification;
- natural_width guest_linear_address;
- natural_width guest_cr0;
- natural_width guest_cr3;
- natural_width guest_cr4;
- natural_width guest_es_base;
- natural_width guest_cs_base;
- natural_width guest_ss_base;
- natural_width guest_ds_base;
- natural_width guest_fs_base;
- natural_width guest_gs_base;
- natural_width guest_ldtr_base;
- natural_width guest_tr_base;
- natural_width guest_gdtr_base;
- natural_width guest_idtr_base;
- natural_width guest_dr7;
- natural_width guest_rsp;
- natural_width guest_rip;
- natural_width guest_rflags;
- natural_width guest_pending_dbg_exceptions;
- natural_width guest_sysenter_esp;
- natural_width guest_sysenter_eip;
- natural_width host_cr0;
- natural_width host_cr3;
- natural_width host_cr4;
- natural_width host_fs_base;
- natural_width host_gs_base;
- natural_width host_tr_base;
- natural_width host_gdtr_base;
- natural_width host_idtr_base;
- natural_width host_ia32_sysenter_esp;
- natural_width host_ia32_sysenter_eip;
- natural_width host_rsp;
- natural_width host_rip;
- natural_width paddingl[8]; /* room for future expansion */
- u32 pin_based_vm_exec_control;
- u32 cpu_based_vm_exec_control;
- u32 exception_bitmap;
- u32 page_fault_error_code_mask;
- u32 page_fault_error_code_match;
- u32 cr3_target_count;
- u32 vm_exit_controls;
- u32 vm_exit_msr_store_count;
- u32 vm_exit_msr_load_count;
- u32 vm_entry_controls;
- u32 vm_entry_msr_load_count;
- u32 vm_entry_intr_info_field;
- u32 vm_entry_exception_error_code;
- u32 vm_entry_instruction_len;
- u32 tpr_threshold;
- u32 secondary_vm_exec_control;
- u32 vm_instruction_error;
- u32 vm_exit_reason;
- u32 vm_exit_intr_info;
- u32 vm_exit_intr_error_code;
- u32 idt_vectoring_info_field;
- u32 idt_vectoring_error_code;
- u32 vm_exit_instruction_len;
- u32 vmx_instruction_info;
- u32 guest_es_limit;
- u32 guest_cs_limit;
- u32 guest_ss_limit;
- u32 guest_ds_limit;
- u32 guest_fs_limit;
- u32 guest_gs_limit;
- u32 guest_ldtr_limit;
- u32 guest_tr_limit;
- u32 guest_gdtr_limit;
- u32 guest_idtr_limit;
- u32 guest_es_ar_bytes;
- u32 guest_cs_ar_bytes;
- u32 guest_ss_ar_bytes;
- u32 guest_ds_ar_bytes;
- u32 guest_fs_ar_bytes;
- u32 guest_gs_ar_bytes;
- u32 guest_ldtr_ar_bytes;
- u32 guest_tr_ar_bytes;
- u32 guest_interruptibility_info;
- u32 guest_activity_state;
- u32 guest_sysenter_cs;
- u32 host_ia32_sysenter_cs;
u32 vmx_preemption_timer_value;
u32 padding32[7]; /* room for future expansion */
- u16 virtual_processor_id;
u16 posted_intr_nv;
- u16 guest_es_selector;
- u16 guest_cs_selector;
- u16 guest_ss_selector;
- u16 guest_ds_selector;
- u16 guest_fs_selector;
- u16 guest_gs_selector;
- u16 guest_ldtr_selector;
- u16 guest_tr_selector;
u16 guest_intr_status;
u16 guest_pml_index;
- u16 host_es_selector;
- u16 host_cs_selector;
- u16 host_ss_selector;
- u16 host_ds_selector;
- u16 host_fs_selector;
- u16 host_gs_selector;
- u16 host_tr_selector;
};

/*
@@ -399,7 +452,7 @@ struct __packed vmcs12 {
* layout of struct vmcs12 is changed. MSR_IA32_VMX_BASIC returns this id, and
* VMPTRLD verifies that the VMCS region that L1 is loading contains this id.
*/
-#define VMCS12_REVISION 0x11e57ed0
+#define VMCS12_REVISION 0x11e586b1

/*
* VMCS12_SIZE is the number of bytes L1 should allocate for the VMXON region
--
2.14.3

2017-12-18 20:23:05

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH RFC 2/7] KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS

Yikes! This breaks migration to/from older versions of kvm. Will you
be submitting another change to handle dynamic conversion between
formats?

On Mon, Dec 18, 2017 at 9:17 AM, Vitaly Kuznetsov <[email protected]> wrote:
> From: Ladi Prosek <[email protected]>
>
> Reorders existing fields and adds fields specific to Hyper-V. The layout
> now matches Hyper-V TLFS 5.0b 16.11.2 Enlightened VMCS.
>
> Fields used by KVM but missing from Hyper-V are placed in the second half
> of the VMCS page to minimize the chances they will clash with future
> enlightened VMCS versions.
>
> Signed-off-by: Ladi Prosek <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> [Vitaly]: Update VMCS12_REVISION to some new arbitrary number.
> ---
> arch/x86/kvm/vmx.c | 321 +++++++++++++++++++++++++++++++----------------------
> 1 file changed, 187 insertions(+), 134 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 8eba631c4dbd..cd5f29a57880 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -239,159 +239,212 @@ struct __packed vmcs12 {
> u32 revision_id;
> u32 abort;
>
> + union {
> + u64 hv_vmcs[255];
> + struct {
> + u16 host_es_selector;
> + u16 host_cs_selector;
> + u16 host_ss_selector;
> + u16 host_ds_selector;
> + u16 host_fs_selector;
> + u16 host_gs_selector;
> + u16 host_tr_selector;
> +
> + u64 host_ia32_pat;
> + u64 host_ia32_efer;
> +
> + /*
> + * To allow migration of L1 (complete with its L2
> + * guests) between machines of different natural widths
> + * (32 or 64 bit), we cannot have unsigned long fields
> + * with no explicit size. We use u64 (aliased
> + * natural_width) instead. Luckily, x86 is
> + * little-endian.
> + */
> + natural_width host_cr0;
> + natural_width host_cr3;
> + natural_width host_cr4;
> +
> + natural_width host_ia32_sysenter_esp;
> + natural_width host_ia32_sysenter_eip;
> + natural_width host_rip;
> + u32 host_ia32_sysenter_cs;
> +
> + u32 pin_based_vm_exec_control;
> + u32 vm_exit_controls;
> + u32 secondary_vm_exec_control;
> +
> + u64 io_bitmap_a;
> + u64 io_bitmap_b;
> + u64 msr_bitmap;
> +
> + u16 guest_es_selector;
> + u16 guest_cs_selector;
> + u16 guest_ss_selector;
> + u16 guest_ds_selector;
> + u16 guest_fs_selector;
> + u16 guest_gs_selector;
> + u16 guest_ldtr_selector;
> + u16 guest_tr_selector;
> +
> + u32 guest_es_limit;
> + u32 guest_cs_limit;
> + u32 guest_ss_limit;
> + u32 guest_ds_limit;
> + u32 guest_fs_limit;
> + u32 guest_gs_limit;
> + u32 guest_ldtr_limit;
> + u32 guest_tr_limit;
> + u32 guest_gdtr_limit;
> + u32 guest_idtr_limit;
> +
> + u32 guest_es_ar_bytes;
> + u32 guest_cs_ar_bytes;
> + u32 guest_ss_ar_bytes;
> + u32 guest_ds_ar_bytes;
> + u32 guest_fs_ar_bytes;
> + u32 guest_gs_ar_bytes;
> + u32 guest_ldtr_ar_bytes;
> + u32 guest_tr_ar_bytes;
> +
> + natural_width guest_es_base;
> + natural_width guest_cs_base;
> + natural_width guest_ss_base;
> + natural_width guest_ds_base;
> + natural_width guest_fs_base;
> + natural_width guest_gs_base;
> + natural_width guest_ldtr_base;
> + natural_width guest_tr_base;
> + natural_width guest_gdtr_base;
> + natural_width guest_idtr_base;
> +
> + u64 padding64_1[3];
> +
> + u64 vm_exit_msr_store_addr;
> + u64 vm_exit_msr_load_addr;
> + u64 vm_entry_msr_load_addr;
> +
> + natural_width cr3_target_value0;
> + natural_width cr3_target_value1;
> + natural_width cr3_target_value2;
> + natural_width cr3_target_value3;
> +
> + u32 page_fault_error_code_mask;
> + u32 page_fault_error_code_match;
> +
> + u32 cr3_target_count;
> + u32 vm_exit_msr_store_count;
> + u32 vm_exit_msr_load_count;
> + u32 vm_entry_msr_load_count;
> +
> + u64 tsc_offset;
> + u64 virtual_apic_page_addr;
> + u64 vmcs_link_pointer;
> +
> + u64 guest_ia32_debugctl;
> + u64 guest_ia32_pat;
> + u64 guest_ia32_efer;
> +
> + u64 guest_pdptr0;
> + u64 guest_pdptr1;
> + u64 guest_pdptr2;
> + u64 guest_pdptr3;
> +
> + natural_width guest_pending_dbg_exceptions;
> + natural_width guest_sysenter_esp;
> + natural_width guest_sysenter_eip;
> +
> + u32 guest_activity_state;
> + u32 guest_sysenter_cs;
> +
> + natural_width cr0_guest_host_mask;
> + natural_width cr4_guest_host_mask;
> + natural_width cr0_read_shadow;
> + natural_width cr4_read_shadow;
> + natural_width guest_cr0;
> + natural_width guest_cr3;
> + natural_width guest_cr4;
> + natural_width guest_dr7;
> +
> + natural_width host_fs_base;
> + natural_width host_gs_base;
> + natural_width host_tr_base;
> + natural_width host_gdtr_base;
> + natural_width host_idtr_base;
> + natural_width host_rsp;
> +
> + u64 ept_pointer;
> +
> + u16 virtual_processor_id;
> + u16 padding16[3];
> +
> + u64 padding64_2[5];
> + u64 guest_physical_address;
> +
> + u32 vm_instruction_error;
> + u32 vm_exit_reason;
> + u32 vm_exit_intr_info;
> + u32 vm_exit_intr_error_code;
> + u32 idt_vectoring_info_field;
> + u32 idt_vectoring_error_code;
> + u32 vm_exit_instruction_len;
> + u32 vmx_instruction_info;
> +
> + natural_width exit_qualification;
> + natural_width padding64_3[4];
> +
> + natural_width guest_linear_address;
> + natural_width guest_rsp;
> + natural_width guest_rflags;
> +
> + u32 guest_interruptibility_info;
> + u32 cpu_based_vm_exec_control;
> + u32 exception_bitmap;
> + u32 vm_entry_controls;
> + u32 vm_entry_intr_info_field;
> + u32 vm_entry_exception_error_code;
> + u32 vm_entry_instruction_len;
> + u32 tpr_threshold;
> +
> + natural_width guest_rip;
> +
> + u32 hv_clean_fields;
> + u32 hv_padding_32;
> + u32 hv_synthetic_controls;
> + u32 hv_enlightenments_control;
> + u32 hv_vp_id;
> +
> + u64 hv_vm_id;
> + u64 partition_assist_page;
> + u64 padding64_4[4];
> + u64 guest_bndcfgs;
> + u64 padding64_5[7];
> + u64 xss_exit_bitmap;
> + u64 padding64_6[7];
> + };
> + };
> +
> + /* Synthetic and KVM-specific fields: */
> u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
> u32 padding[7]; /* room for future expansion */
>
> - u64 io_bitmap_a;
> - u64 io_bitmap_b;
> - u64 msr_bitmap;
> - u64 vm_exit_msr_store_addr;
> - u64 vm_exit_msr_load_addr;
> - u64 vm_entry_msr_load_addr;
> - u64 tsc_offset;
> - u64 virtual_apic_page_addr;
> u64 apic_access_addr;
> u64 posted_intr_desc_addr;
> u64 vm_function_control;
> - u64 ept_pointer;
> u64 eoi_exit_bitmap0;
> u64 eoi_exit_bitmap1;
> u64 eoi_exit_bitmap2;
> u64 eoi_exit_bitmap3;
> u64 eptp_list_address;
> - u64 xss_exit_bitmap;
> - u64 guest_physical_address;
> - u64 vmcs_link_pointer;
> u64 pml_address;
> - u64 guest_ia32_debugctl;
> - u64 guest_ia32_pat;
> - u64 guest_ia32_efer;
> u64 guest_ia32_perf_global_ctrl;
> - u64 guest_pdptr0;
> - u64 guest_pdptr1;
> - u64 guest_pdptr2;
> - u64 guest_pdptr3;
> - u64 guest_bndcfgs;
> - u64 host_ia32_pat;
> - u64 host_ia32_efer;
> u64 host_ia32_perf_global_ctrl;
> u64 padding64[8]; /* room for future expansion */
> - /*
> - * To allow migration of L1 (complete with its L2 guests) between
> - * machines of different natural widths (32 or 64 bit), we cannot have
> - * unsigned long fields with no explict size. We use u64 (aliased
> - * natural_width) instead. Luckily, x86 is little-endian.
> - */
> - natural_width cr0_guest_host_mask;
> - natural_width cr4_guest_host_mask;
> - natural_width cr0_read_shadow;
> - natural_width cr4_read_shadow;
> - natural_width cr3_target_value0;
> - natural_width cr3_target_value1;
> - natural_width cr3_target_value2;
> - natural_width cr3_target_value3;
> - natural_width exit_qualification;
> - natural_width guest_linear_address;
> - natural_width guest_cr0;
> - natural_width guest_cr3;
> - natural_width guest_cr4;
> - natural_width guest_es_base;
> - natural_width guest_cs_base;
> - natural_width guest_ss_base;
> - natural_width guest_ds_base;
> - natural_width guest_fs_base;
> - natural_width guest_gs_base;
> - natural_width guest_ldtr_base;
> - natural_width guest_tr_base;
> - natural_width guest_gdtr_base;
> - natural_width guest_idtr_base;
> - natural_width guest_dr7;
> - natural_width guest_rsp;
> - natural_width guest_rip;
> - natural_width guest_rflags;
> - natural_width guest_pending_dbg_exceptions;
> - natural_width guest_sysenter_esp;
> - natural_width guest_sysenter_eip;
> - natural_width host_cr0;
> - natural_width host_cr3;
> - natural_width host_cr4;
> - natural_width host_fs_base;
> - natural_width host_gs_base;
> - natural_width host_tr_base;
> - natural_width host_gdtr_base;
> - natural_width host_idtr_base;
> - natural_width host_ia32_sysenter_esp;
> - natural_width host_ia32_sysenter_eip;
> - natural_width host_rsp;
> - natural_width host_rip;
> - natural_width paddingl[8]; /* room for future expansion */
> - u32 pin_based_vm_exec_control;
> - u32 cpu_based_vm_exec_control;
> - u32 exception_bitmap;
> - u32 page_fault_error_code_mask;
> - u32 page_fault_error_code_match;
> - u32 cr3_target_count;
> - u32 vm_exit_controls;
> - u32 vm_exit_msr_store_count;
> - u32 vm_exit_msr_load_count;
> - u32 vm_entry_controls;
> - u32 vm_entry_msr_load_count;
> - u32 vm_entry_intr_info_field;
> - u32 vm_entry_exception_error_code;
> - u32 vm_entry_instruction_len;
> - u32 tpr_threshold;
> - u32 secondary_vm_exec_control;
> - u32 vm_instruction_error;
> - u32 vm_exit_reason;
> - u32 vm_exit_intr_info;
> - u32 vm_exit_intr_error_code;
> - u32 idt_vectoring_info_field;
> - u32 idt_vectoring_error_code;
> - u32 vm_exit_instruction_len;
> - u32 vmx_instruction_info;
> - u32 guest_es_limit;
> - u32 guest_cs_limit;
> - u32 guest_ss_limit;
> - u32 guest_ds_limit;
> - u32 guest_fs_limit;
> - u32 guest_gs_limit;
> - u32 guest_ldtr_limit;
> - u32 guest_tr_limit;
> - u32 guest_gdtr_limit;
> - u32 guest_idtr_limit;
> - u32 guest_es_ar_bytes;
> - u32 guest_cs_ar_bytes;
> - u32 guest_ss_ar_bytes;
> - u32 guest_ds_ar_bytes;
> - u32 guest_fs_ar_bytes;
> - u32 guest_gs_ar_bytes;
> - u32 guest_ldtr_ar_bytes;
> - u32 guest_tr_ar_bytes;
> - u32 guest_interruptibility_info;
> - u32 guest_activity_state;
> - u32 guest_sysenter_cs;
> - u32 host_ia32_sysenter_cs;
> u32 vmx_preemption_timer_value;
> u32 padding32[7]; /* room for future expansion */
> - u16 virtual_processor_id;
> u16 posted_intr_nv;
> - u16 guest_es_selector;
> - u16 guest_cs_selector;
> - u16 guest_ss_selector;
> - u16 guest_ds_selector;
> - u16 guest_fs_selector;
> - u16 guest_gs_selector;
> - u16 guest_ldtr_selector;
> - u16 guest_tr_selector;
> u16 guest_intr_status;
> u16 guest_pml_index;
> - u16 host_es_selector;
> - u16 host_cs_selector;
> - u16 host_ss_selector;
> - u16 host_ds_selector;
> - u16 host_fs_selector;
> - u16 host_gs_selector;
> - u16 host_tr_selector;
> };
>
> /*
> @@ -399,7 +452,7 @@ struct __packed vmcs12 {
> * layout of struct vmcs12 is changed. MSR_IA32_VMX_BASIC returns this id, and
> * VMPTRLD verifies that the VMCS region that L1 is loading contains this id.
> */
> -#define VMCS12_REVISION 0x11e57ed0
> +#define VMCS12_REVISION 0x11e586b1
>
> /*
> * VMCS12_SIZE is the number of bytes L1 should allocate for the VMXON region
> --
> 2.14.3
>

2017-12-18 21:28:45

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH RFC 2/7] KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS

At this point in time, I don't think you can just blithely change the
virtual VMCS layout and revision number. Existing VMs using the old
layout and revision number must continue to work on versions of kvm
past this point. You could tie the layout and revision number changes
to KVM_CAP_HYPERV_ENLIGHTENED_VMCS if you like, but kvm must be able
to continue to service VMs using the previous layout and revision
number in perpetuity.

On Mon, Dec 18, 2017 at 12:23 PM, Jim Mattson <[email protected]> wrote:
> Yikes! This breaks migration to/from older versions of kvm. Will you
> be submitting another change to handle dynamic conversion between
> formats?

2017-12-19 12:25:26

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH RFC 2/7] KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS

Jim Mattson <[email protected]> writes:

> At this point in time, I don't think you can just blithely change the
> virtual VMCS layout and revision number. Existing VMs using the old
> layout and revision number must continue to work on versions of kvm
> past this point. You could tie the layout and revision number changes
> to KVM_CAP_HYPERV_ENLIGHTENED_VMCS if you like, but kvm must be able
> to continue to service VMs using the previous layout and revision
> number in perpetuity.
>

I see what you mean. If we need to keep migration of nested
workloads working between KVMs of different versions, we can't (ever)
touch vmcs12.

The way to go in this case, I think, is to create a completely separate
enlightened_vmcs12 struct and use it when appropriate. We can't possibly
support migrating workloads which use enlightened VMCS to an old KVM
which doesn't support it.
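
[A minimal sketch of how such a dispatch could look at VMPTRLD time. Only
VMCS12_REVISION (0x11e57ed0, the pre-patch value) comes from the code above;
the enlightened revision constant and all other names are invented for
illustration, not the actual KVM implementation:]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: on VMPTRLD, dispatch on the revision id found in
 * guest memory to pick the right vmcs12 layout, instead of changing the
 * single VMCS12_REVISION for everyone.
 */
#define VMCS12_REVISION		0x11e57ed0u	/* legacy layout (pre-patch) */
#define EVMCS12_REVISION	0x11e586b1u	/* assumed enlightened layout */

enum vmcs12_layout { LAYOUT_LEGACY, LAYOUT_ENLIGHTENED, LAYOUT_INVALID };

static enum vmcs12_layout vmcs12_layout_for(uint32_t revision_id)
{
	switch (revision_id) {
	case VMCS12_REVISION:
		return LAYOUT_LEGACY;
	case EVMCS12_REVISION:
		/* only valid when the evmcs capability was enabled */
		return LAYOUT_ENLIGHTENED;
	default:
		return LAYOUT_INVALID;	/* VMPTRLD fails */
	}
}
```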

P.S. "If there are changes in this struct, VMCS12_REVISION must be
changed." comment needs to be replaced with "Don't even think about
changing this" :-)

--
Vitaly

2017-12-19 12:38:07

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC 2/7] KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS

On 19/12/2017 13:25, Vitaly Kuznetsov wrote:
>
>> At this point in time, I don't think you can just blithely change the
>> virtual VMCS layout and revision number. Existing VMs using the old
>> layout and revision number must continue to work on versions of kvm
>> past this point. You could tie the layout and revision number changes
>> to KVM_CAP_HYPERV_ENLIGHTENED_VMCS if you like, but kvm must be able
>> to continue to service VMs using the previous layout and revision
>> number in perpetuity.
>>
> I see what you mean. In case we need to keep migration of nested
> workloads working between KVMs of different versions we can't (ever)
> touch vmcs12.

Actually we can, for two reasons.

First, the active VMCS is stored in host RAM (not in guest RAM). This
means there are clear points where to do the translation, namely vmptrld
and the (not yet upstream) ioctl to set VMX state.

Therefore you only need to keep an (offset, type) map from the old
layout to the new one; at those two points, if you detect an old
VMCS12_REVISION, you copy the fields one by one instead of doing a memcpy.
or vmptrld or get-VMX-state ioctl will automatically update to the new
VMCS12_REVISION. Of course, this is a one-way street unless you also
add support for writing old VMCS12_REVISIONs.
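
[The translation table Paolo describes could be sketched like this; the
struct name, helper, and all offsets below are made up for illustration:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of the suggestion above, with invented names: a table mapping
 * each field's offset in the old vmcs12 layout to its offset in the new
 * one. On vmptrld (or when VMX state is set via ioctl), an old-revision
 * image is converted field by field instead of memcpy'd.
 */
struct field_xlat {
	size_t old_off;
	size_t new_off;
	size_t size;		/* 2, 4 or 8 bytes */
};

/* Tiny example table; the real one would list every vmcs12 field. */
static const struct field_xlat xlat[] = {
	{ 0x08, 0x40, 8 },	/* e.g. io_bitmap_a; offsets are made up */
	{ 0x58, 0x00, 2 },	/* e.g. host_es_selector; ditto */
};

static void vmcs12_convert(const uint8_t *old_img, uint8_t *new_img)
{
	size_t i;

	for (i = 0; i < sizeof(xlat) / sizeof(xlat[0]); i++)
		memcpy(new_img + xlat[i].new_off,
		       old_img + xlat[i].old_off,
		       xlat[i].size);
}
```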

But, second, VMX state migration is not upstream yet, so nested
hypervisors are currently not migratable: the active VMCS12 state will
not be migrated at all! So in upstream KVM we wouldn't even need to
upgrade the VMCS12_REVISION to make changes to vmcs12.

That said...

> The way to go in this case, I think, is to create a completely separate
> enlightened_vmcs12 struct and use it when appropriate. We can't possibly
> support migrating workloads which use enlightened VMCS to an old KVM
> which doesn't support it.

... this is probably a good idea as well.

Paolo

2017-12-19 12:42:11

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC 0/7] KVM: nVMX: enlightened VMCS initial implementation

On 18/12/2017 18:17, Vitaly Kuznetsov wrote:
> The original author of these patches no longer works at Red Hat; I
> agreed to take this over and send upstream. Here is his original
> description:
>
> "Makes KVM implement the enlightened VMCS feature per Hyper-V TLFS 5.0b.
> I've measured about 5% improvement in cost of a nested VM exit (Hyper-V
> enabled Windows Server 2016 nested in KVM)."

Can you try reproducing this and see how much a simple CPUID loop costs in:

* Hyper-V on Hyper-V (with enlightened VMCS, as a proxy for a full
implementation including the clean fields mask)

* Hyper-V on KVM, with and without enlightened VMCS

The latest kvm/queue branch already cut a lot of the cost of a nested VM
exit (from ~22000 to ~14000 clock cycles for KVM on KVM), so we could
also see if Hyper-V needs shadowing of more fields.
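The measurement itself can be done with a tight CPUID loop, since CPUID
unconditionally exits to the hypervisor. A minimal sketch, assuming an x86
processor and GCC/Clang inline assembly (actual numbers will of course vary
with the hardware and hypervisor stack):

```c
#include <stdint.h>

/* Read the timestamp counter. */
static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	__asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* Execute CPUID, which always causes a VM exit when virtualized. */
static inline void cpuid(uint32_t leaf)
{
	uint32_t a, b, c, d;

	__asm__ volatile("cpuid"
			 : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
			 : "a"(leaf), "c"(0)
			 : "memory");
}

/* Average TSC cycles per CPUID round trip over 'iters' iterations. */
static uint64_t cpuid_cost(unsigned int iters)
{
	uint64_t start, end;
	unsigned int i;

	start = rdtsc();
	for (i = 0; i < iters; i++)
		cpuid(0);
	end = rdtsc();
	return (end - start) / iters;
}
```

Running this in L2 (Windows guest on the nested hypervisor) versus L1 gives
exactly the per-exit cycle figures quoted in this thread.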

> This is just an initial implementation. By leveraging clean fields mask
> we can further improve performance. I'm also interested in implementing
> the other part of the feature: consuming enlightened VMCS when KVM is
> running on top of Hyper-V.

I'm also interested in consuming enlightened VMCS on Hyper-V if that can
provide better performance for KVM on Azure.

Paolo

2017-12-19 13:21:15

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH RFC 0/7] KVM: nVMX: enlightened VMCS initial implementation

Paolo Bonzini <[email protected]> writes:

> On 18/12/2017 18:17, Vitaly Kuznetsov wrote:
>> The original author of these patches no longer works at Red Hat; I
>> agreed to take this over and send upstream. Here is his original
>> description:
>>
>> "Makes KVM implement the enlightened VMCS feature per Hyper-V TLFS 5.0b.
>> I've measured about 5% improvement in cost of a nested VM exit (Hyper-V
>> enabled Windows Server 2016 nested in KVM)."
>
> Can you try reproducing this and see how much a simple CPUID loop costs in:
>
> * Hyper-V on Hyper-V (with enlightened VMCS, as a proxy for a full
> implementation including the clean fields mask)
>
> * Hyper-V on KVM, with and without enlightened VMCS
>
> The latest kvm/queue branch already cut a lot of the cost of a nested VM
> exit (from ~22000 to ~14000 clock cycles for KVM on KVM), so we could
> also see if Hyper-V needs shadowing of more fields.

I tested this series before sending it out and was able to reproduce said
5% improvement with the feature (but didn't keep a record of clock
cycles). I'll try doing the tests you mentioned on the same hardware and
come back with the results. Hopefully I'll manage that before the holidays.

Thanks,

--
Vitaly

2017-12-19 17:40:51

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH RFC 2/7] KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS

On Tue, Dec 19, 2017 at 4:37 AM, Paolo Bonzini <[email protected]> wrote:
> On 19/12/2017 13:25, Vitaly Kuznetsov wrote:
>>
>>> At this point in time, I don't think you can just blithely change the
>>> virtual VMCS layout and revision number. Existing VMs using the old
>>> layout and revision number must continue to work on versions of kvm
>>> past this point. You could tie the layout and revision number changes
>>> to KVM_CAP_HYPERV_ENLIGHTENED_VMCS if you like, but kvm must be able
>>> to continue to service VMs using the previous layout and revision
>>> number in perpetuity.
>>>
>> I see what you mean. In case we need to keep migration of nested
>> workloads working between KVMs of different versions we can't (ever)
>> touch vmcs12.
>
> Actually we can, for two reasons.
>
> First, the active VMCS is stored in host RAM (not in guest RAM). This
> means there are clear points where to do the translation, namely vmptrld
> and the (not yet upstream) ioctl to set VMX state.
>
> Therefore you only need to keep an (offset, type) map from the old layout
> to the new; at those two points, if you detect an old VMCS12_REVISION, you
> copy the fields one by one instead of doing a memcpy. The next vmclear
> or vmptrld or get-VMX-state ioctl will automatically update to the new
> VMCS12_REVISION. Of course, this is a one-way street unless you also
> add support for writing old VMCS12_REVISIONs.

I'm not sure that's really the right way to go, since any guest that
has already read the IA32_VMX_BASIC MSR has a right to expect the VMCS
revision to remain unchanged. Also, this breaks migration back to
older versions of kvm, even for VMs that are not making use of the
enlightened VMCS. It would be nice to be able to maintain the ability
for an older VM to run on a newer kvm version without picking up any
taint that would make it impossible to migrate it back to the older
kvm version where it started.

> But, second, VMX state migration is not upstream yet, so nested
> hypervisors are currently not migratable: the active VMCS12 state will
> not be migrated at all! So in upstream KVM we wouldn't even need to
> upgrade the VMCS12_REVISION to make changes to vmcs12.

Mea culpa. Perhaps this is the motivation I needed to prioritize
incorporating the community feedback on that change set.

> That said...
>
>> The way to go in this case, I think, is to create a completely separate
>> enlightened_vmcs12 struct and use it when appropriate. We can't possibly
>> support migrating workloads which use enlightened VMCS to an old KVM
>> which doesn't support it.
>
> ... this is probably a good idea as well.
>
> Paolo

2017-12-19 17:44:07

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH RFC 2/7] KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS

You can change the default VMCS12_REVISION and associated layout, as
long as support is maintained for the old layout and userspace has the
ability (e.g. by setting the IA32_VMX_BASIC MSR) to specify that a VM
needs to use the old layout.
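A hedged sketch of what that revision-driven dispatch might look like. Only
the bit layout comes from the Intel SDM (bits 30:0 of IA32_VMX_BASIC report
the VMCS revision identifier); the revision values and layout enum are
invented for illustration:

```c
#include <stdint.h>

/* Per the Intel SDM, bits 30:0 of IA32_VMX_BASIC hold the VMCS
 * revision identifier; bit 31 of the MSR is always 0. */
static inline uint32_t vmx_basic_revision(uint64_t msr_vmx_basic)
{
	return (uint32_t)(msr_vmx_basic & 0x7fffffffu);
}

/* Hypothetical revision values; only the dispatch shape matters here. */
#define VMCS12_REVISION_OLD 0x11e57ed0u
#define VMCS12_REVISION_NEW 0x11e57ed1u

enum vmcs12_layout {
	VMCS12_LAYOUT_OLD,
	VMCS12_LAYOUT_NEW,
	VMCS12_LAYOUT_BAD,
};

/* Pick the in-memory vmcs12 layout from the revision userspace set
 * via the (emulated) IA32_VMX_BASIC MSR. */
static enum vmcs12_layout vmcs12_layout_from_revision(uint32_t rev)
{
	switch (rev) {
	case VMCS12_REVISION_OLD: return VMCS12_LAYOUT_OLD;
	case VMCS12_REVISION_NEW: return VMCS12_LAYOUT_NEW;
	default:		  return VMCS12_LAYOUT_BAD;
	}
}
```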

On Tue, Dec 19, 2017 at 4:25 AM, Vitaly Kuznetsov <[email protected]> wrote:
> Jim Mattson <[email protected]> writes:
>
>> At this point in time, I don't think you can just blithely change the
>> virtual VMCS layout and revision number. Existing VMs using the old
>> layout and revision number must continue to work on versions of kvm
>> past this point. You could tie the layout and revision number changes
>> to KVM_CAP_HYPERV_ENLIGHTENED_VMCS if you like, but kvm must be able
>> to continue to service VMs using the previous layout and revision
>> number in perpetuity.
>>
>
> I see what you mean. In case we need to keep migration of nested
> workloads working between KVMs of different versions we can't (ever)
> touch vmcs12.
>
> The way to go in this case, I think, is to create a completely separate
> enlightened_vmcs12 struct and use it when appropriate. We can't possibly
> support migrating workloads which use enlightened VMCS to an old KVM
> which doesn't support it.
>
> P.S. "If there are changes in this struct, VMCS12_REVISION must be
> changed." comment needs to be replaced with "Don't even think about
> changing this" :-)
>
> --
> Vitaly

2017-12-19 21:19:36

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC 2/7] KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS

On 19/12/2017 18:40, Jim Mattson wrote:
> I'm not sure that's really the right way to go, since any guest that
> has already read the IA32_VMX_BASIC MSR has a right to expect the VMCS
> revision to remain unchanged.

Hmm, not just that, "the VMCS revision identifier is never written by
the processor" according to the SDM. Maybe the code that accesses the
vmcs12 can be placed in a .h file and included more than once in vmx.c.
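The multiple-inclusion idea might look like this sketch, flattened into a
single file by turning the would-be .h body into a macro template (struct
and field names are hypothetical; in vmx.c the body would live in a header
included once per layout):

```c
#include <stdint.h>

/* Two vmcs12 layouts with the same logical fields at different offsets. */
struct vmcs12_v1 {
	uint32_t revision_id;
	uint32_t guest_limit;
	uint64_t guest_rip;
};

struct vmcs12_v2 {
	uint32_t revision_id;
	uint64_t guest_rip;
	uint32_t guest_limit;
};

/*
 * One accessor body, instantiated once per layout. In vmx.c this would
 * be a .h file #included twice with the layout name predefined.
 */
#define DEFINE_VMCS12_ACCESSORS(layout)					\
static uint64_t get_guest_rip_##layout(const struct vmcs12_##layout *v)	\
{									\
	return v->guest_rip;						\
}									\
static void set_guest_rip_##layout(struct vmcs12_##layout *v,		\
				   uint64_t rip)			\
{									\
	v->guest_rip = rip;						\
}

DEFINE_VMCS12_ACCESSORS(v1)
DEFINE_VMCS12_ACCESSORS(v2)
```

This keeps the revision identifier stable for each layout while sharing the
access logic, which is the constraint the SDM quote imposes.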

Paolo

2017-12-21 12:50:39

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH RFC 0/7] KVM: nVMX: enlightened VMCS initial implementation

Vitaly Kuznetsov <[email protected]> writes:

> Paolo Bonzini <[email protected]> writes:
>
>> On 18/12/2017 18:17, Vitaly Kuznetsov wrote:
>>> The original author of these patches no longer works at Red Hat; I
>>> agreed to take this over and send upstream. Here is his original
>>> description:
>>>
>>> "Makes KVM implement the enlightened VMCS feature per Hyper-V TLFS 5.0b.
>>> I've measured about 5% improvement in cost of a nested VM exit (Hyper-V
>>> enabled Windows Server 2016 nested in KVM)."
>>
>> Can you try reproducing this and see how much a simple CPUID loop costs in:
>>
>> * Hyper-V on Hyper-V (with enlightened VMCS, as a proxy for a full
>> implementation including the clean fields mask)
>>
>> * Hyper-V on KVM, with and without enlightened VMCS
>>
>> The latest kvm/queue branch already cut a lot of the cost of a nested VM
>> exit (from ~22000 to ~14000 clock cycles for KVM on KVM), so we could
>> also see if Hyper-V needs shadowing of more fields.
>
> I tested this series before sending it out and was able to reproduce said
> 5% improvement with the feature (but didn't keep a record of clock
> cycles). I'll try doing the tests you mentioned on the same hardware and
> come back with the results. Hopefully I'll manage that before the holidays.

I'm back with (somewhat frustrating) results (E5-2603):

1) Windows on Hyper-V (no nesting): 1350 cycles

2) Windows on Hyper-V on Hyper-V: 8600 cycles

3) Windows on KVM (no nesting): 1150 cycles

4) Windows on Hyper-V on KVM (no enlightened VMCS): 18200 cycles

5) Windows on Hyper-V on KVM (enlightened VMCS): 17100 cycles

--
Vitaly

2017-12-21 13:03:54

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH RFC 2/7] KVM: nVMX: modify vmcs12 fields to match Hyper-V enlightened VMCS

Paolo Bonzini <[email protected]> writes:

> On 19/12/2017 13:25, Vitaly Kuznetsov wrote:
>>
>>> At this point in time, I don't think you can just blithely change the
>>> virtual VMCS layout and revision number. Existing VMs using the old
>>> layout and revision number must continue to work on versions of kvm
>>> past this point. You could tie the layout and revision number changes
>>> to KVM_CAP_HYPERV_ENLIGHTENED_VMCS if you like, but kvm must be able
>>> to continue to service VMs using the previous layout and revision
>>> number in perpetuity.
>>>
>> I see what you mean. In case we need to keep migration of nested
>> workloads working between KVMs of different versions we can't (ever)
>> touch vmcs12.
>
> Actually we can, for two reasons.
>
> First, the active VMCS is stored in host RAM (not in guest RAM). This
> means there are clear points where to do the translation, namely vmptrld
> and the (not yet upstream) ioctl to set VMX state.
>
> Therefore you only need to keep an (offset, type) map from the old layout
> to the new; at those two points, if you detect an old VMCS12_REVISION, you
> copy the fields one by one instead of doing a memcpy. The next vmclear
> or vmptrld or get-VMX-state ioctl will automatically update to the new
> VMCS12_REVISION. Of course, this is a one-way street unless you also
> add support for writing old VMCS12_REVISIONs.
>
> But, second, VMX state migration is not upstream yet, so nested
> hypervisors are currently not migratable: the active VMCS12 state will
> not be migrated at all! So in upstream KVM we wouldn't even need to
> upgrade the VMCS12_REVISION to make changes to vmcs12.
>
> That said...
>
>> The way to go in this case, I think, is to create a completely separate
>> enlightened_vmcs12 struct and use it when appropriate. We can't possibly
>> support migrating workloads which use enlightened VMCS to an old KVM
>> which doesn't support it.
>
> ... this is probably a good idea as well.
>

One other thing I was thinking about is the shared definition of the
enlightened VMCS, which we'll use for both KVM-on-Hyper-V and
Hyper-V-on-KVM; for that purpose I'd like it to be placed outside of
struct vmcs12. We can, of course, embed it at the beginning of vmcs12.

Thinking long term (and bearing in mind that Microsoft will be updating
the enlightened VMCS on its own schedule) -- what would be the preferred
way to go? Personally, I'm leaning towards untangling it and keeping it
separate from vmcs12, but I can't really find a convincing
argument...
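The two options can be contrasted in a sketch (struct and field names here
are hypothetical, not the TLFS definitions):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared enlightened VMCS definition, usable both when KVM
 * runs on Hyper-V and when KVM emulates Hyper-V for a guest. */
struct hv_enlightened_vmcs {
	uint32_t version;
	uint64_t guest_rip;
	uint64_t clean_fields;
};

/* Option A: embed the shared struct at the beginning of vmcs12, so the
 * same memory can be handed to an enlightened guest directly. */
struct vmcs12_embedded {
	struct hv_enlightened_vmcs evmcs;	/* must stay at offset 0 */
	uint32_t kvm_only_field;
};

/* Option B: keep it a completely separate struct, selected at vmptrld
 * depending on whether the guest enabled the enlightenment. */
struct nested_state {
	struct hv_enlightened_vmcs *evmcs;	/* NULL unless enlightened */
};

_Static_assert(offsetof(struct vmcs12_embedded, evmcs) == 0,
	       "embedded eVMCS must start the struct");
```

Option B decouples the Microsoft-controlled layout from KVM's private
vmcs12, which is the untangling argument; option A avoids a second copy of
the shared fields.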

--
Vitaly

2017-12-21 14:32:28

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC 0/7] KVM: nVMX: enlightened VMCS initial implementation

On 21/12/2017 13:50, Vitaly Kuznetsov wrote:
> I'm back with (somewhat frustrating) results (E5-2603):

v4 (that would be Broadwell)?

> 1) Windows on Hyper-V (no nesting): 1350 cycles
>
> 2) Windows on Hyper-V on Hyper-V: 8600
>
> 3) Windows on KVM (no nesting): 1150 cycles
>
> 4) Windows on Hyper-V on KVM (no enlightened VMCS): 18200
>
> 5) Windows on Hyper-V on KVM (enlightened VMCS): 17100

What version were you using for KVM? There are quite a few nested virt
optimizations in kvm/queue (which may make enlightened VMCS either more
or less efficient).

In particular, with latest kvm/queue you could try tracing vmread and
vmwrite vmexits, and see if you get any. If you do, that might be an
easy few hundred cycles savings.

Paolo

2017-12-21 15:08:44

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH RFC 0/7] KVM: nVMX: enlightened VMCS initial implementation

Paolo Bonzini <[email protected]> writes:

> On 21/12/2017 13:50, Vitaly Kuznetsov wrote:
>> I'm back with (somewhat frustrating) results (E5-2603):
>
> v4 (that would be Broadwell)?
>

Sorry, v3 actually: Haswell (the first one supporting VMCS shadowing, AFAIU).

>> 1) Windows on Hyper-V (no nesting): 1350 cycles
>>
>> 2) Windows on Hyper-V on Hyper-V: 8600
>>
>> 3) Windows on KVM (no nesting): 1150 cycles
>>
>> 4) Windows on Hyper-V on KVM (no enlightened VMCS): 18200
>>
>> 5) Windows on Hyper-V on KVM (enlightened VMCS): 17100
>
> What version were you using for KVM? There are quite a few nested virt
> optimizations in kvm/queue (which may make enlightened VMCS either more
> or less efficient).

This is kvm/queue and I rebased enlightened VMCS patches to it.

>
> In particular, with latest kvm/queue you could try tracing vmread and
> vmwrite vmexits, and see if you get any. If you do, that might be an
> easy few hundred cycles savings.

Will do.

--
Vitaly