2016-11-30 02:15:29

by David Matlack

Subject: [PATCH v3 0/5] VMX Capability MSRs

This patchset adds support for setting the VMX capability MSRs from userspace.
This is required for migration of nested-capable VMs to different CPUs and
KVM versions.

Patch 1 generates the non-true VMX MSRs using the true MSRs, which allows
userspace to skip restoring them.

Patch 2 adds support for restoring the VMX capability MSRs.

Patches 3 and 4 make KVM's emulation of MSR_IA32_VMX_CR{0,4}_FIXED1 more
accurate.

Patch 5 fixes a bug in emulated VM-entry that came up when testing patches
3 and 4.

Changes since v2:
* Generate CR0_FIXED1 in addition to CR4_FIXED1
* Generate "non-true" capability MSRs from the "true" versions and remove
"non-true" MSRs from struct nested_vmx.
* Disallow restore of CR{0,4}_FIXED1 and "non-true" MSRs since they are
generated.

Changes since v1:
* Support restoring less-capable versions of MSR_IA32_VMX_BASIC,
MSR_IA32_VMX_CR{0,4}_FIXED{0,1}.
* Include VMX_INS_OUTS in MSR_IA32_VMX_BASIC initial value.

David Matlack (5):
KVM: nVMX: generate non-true VMX MSRs based on true versions
KVM: nVMX: support restore of VMX capability MSRs
KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID
KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry

arch/x86/include/asm/vmx.h | 31 +++
arch/x86/kvm/vmx.c | 479 +++++++++++++++++++++++++++++++++++++--------
2 files changed, 427 insertions(+), 83 deletions(-)

--
2.8.0.rc3.226.g39d4020


2016-11-30 02:15:30

by David Matlack

Subject: [PATCH v3 1/5] KVM: nVMX: generate non-true VMX MSRs based on true versions

The "non-true" VMX capability MSRs can be generated from their "true"
counterparts, by OR-ing the default1 bits. The default1 bits are fixed
and defined in the SDM.

Since we can generate the non-true VMX MSRs from the true versions,
there's no need to store both in struct nested_vmx. This also lets
userspace avoid having to restore the non-true MSRs.

Note this does not preclude emulating MSR_IA32_VMX_BASIC[55]=0. To do so,
we simply need to set all the default1 bits in the true MSRs (such that
the true MSRs and the generated non-true MSRs are equal).
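
As a sketch, the generation amounts to the following (illustrative
helper only, not the code below -- the patch does this inline in
vmx_get_vmx_msr(), with default1_bits being e.g.
CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR):

static u64 make_non_true_msr(u64 true_msr, u32 default1_bits)
{
	/*
	 * default1_bits land in the low ("must-be-1") half; the high
	 * ("may-be-1") half already reports them as allowed.
	 */
	return true_msr | default1_bits;
}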

Signed-off-by: David Matlack <[email protected]>
Suggested-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/vmx.c | 45 +++++++++++++++++++--------------------------
1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5382b82..0beb56a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -446,19 +446,21 @@ struct nested_vmx {
u16 vpid02;
u16 last_vpid;

+ /*
+ * We only store the "true" versions of the VMX capability MSRs. We
+ * generate the "non-true" versions by setting the must-be-1 bits
+ * according to the SDM.
+ */
u32 nested_vmx_procbased_ctls_low;
u32 nested_vmx_procbased_ctls_high;
- u32 nested_vmx_true_procbased_ctls_low;
u32 nested_vmx_secondary_ctls_low;
u32 nested_vmx_secondary_ctls_high;
u32 nested_vmx_pinbased_ctls_low;
u32 nested_vmx_pinbased_ctls_high;
u32 nested_vmx_exit_ctls_low;
u32 nested_vmx_exit_ctls_high;
- u32 nested_vmx_true_exit_ctls_low;
u32 nested_vmx_entry_ctls_low;
u32 nested_vmx_entry_ctls_high;
- u32 nested_vmx_true_entry_ctls_low;
u32 nested_vmx_misc_low;
u32 nested_vmx_misc_high;
u32 nested_vmx_ept_caps;
@@ -2712,9 +2714,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
vmx->nested.nested_vmx_exit_ctls_high |= VM_EXIT_CLEAR_BNDCFGS;

/* We support free control of debug control saving. */
- vmx->nested.nested_vmx_true_exit_ctls_low =
- vmx->nested.nested_vmx_exit_ctls_low &
- ~VM_EXIT_SAVE_DEBUG_CONTROLS;
+ vmx->nested.nested_vmx_exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;

/* entry controls */
rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
@@ -2733,9 +2733,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
vmx->nested.nested_vmx_entry_ctls_high |= VM_ENTRY_LOAD_BNDCFGS;

/* We support free control of debug control loading. */
- vmx->nested.nested_vmx_true_entry_ctls_low =
- vmx->nested.nested_vmx_entry_ctls_low &
- ~VM_ENTRY_LOAD_DEBUG_CONTROLS;
+ vmx->nested.nested_vmx_entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;

/* cpu-based controls */
rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
@@ -2768,8 +2766,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
CPU_BASED_USE_MSR_BITMAPS;

/* We support free control of CR3 access interception. */
- vmx->nested.nested_vmx_true_procbased_ctls_low =
- vmx->nested.nested_vmx_procbased_ctls_low &
+ vmx->nested.nested_vmx_procbased_ctls_low &=
~(CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING);

/* secondary cpu-based controls */
@@ -2868,36 +2865,32 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
*pdata = vmx_control_msr(
vmx->nested.nested_vmx_pinbased_ctls_low,
vmx->nested.nested_vmx_pinbased_ctls_high);
+ if (msr_index == MSR_IA32_VMX_PINBASED_CTLS)
+ *pdata |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
break;
case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
- *pdata = vmx_control_msr(
- vmx->nested.nested_vmx_true_procbased_ctls_low,
- vmx->nested.nested_vmx_procbased_ctls_high);
- break;
case MSR_IA32_VMX_PROCBASED_CTLS:
*pdata = vmx_control_msr(
vmx->nested.nested_vmx_procbased_ctls_low,
vmx->nested.nested_vmx_procbased_ctls_high);
+ if (msr_index == MSR_IA32_VMX_PROCBASED_CTLS)
+ *pdata |= CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
break;
case MSR_IA32_VMX_TRUE_EXIT_CTLS:
- *pdata = vmx_control_msr(
- vmx->nested.nested_vmx_true_exit_ctls_low,
- vmx->nested.nested_vmx_exit_ctls_high);
- break;
case MSR_IA32_VMX_EXIT_CTLS:
*pdata = vmx_control_msr(
vmx->nested.nested_vmx_exit_ctls_low,
vmx->nested.nested_vmx_exit_ctls_high);
+ if (msr_index == MSR_IA32_VMX_EXIT_CTLS)
+ *pdata |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
break;
case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
- *pdata = vmx_control_msr(
- vmx->nested.nested_vmx_true_entry_ctls_low,
- vmx->nested.nested_vmx_entry_ctls_high);
- break;
case MSR_IA32_VMX_ENTRY_CTLS:
*pdata = vmx_control_msr(
vmx->nested.nested_vmx_entry_ctls_low,
vmx->nested.nested_vmx_entry_ctls_high);
+ if (msr_index == MSR_IA32_VMX_ENTRY_CTLS)
+ *pdata |= VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
break;
case MSR_IA32_VMX_MISC:
*pdata = vmx_control_msr(
@@ -10184,7 +10177,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
}

if (!vmx_control_verify(vmcs12->cpu_based_vm_exec_control,
- vmx->nested.nested_vmx_true_procbased_ctls_low,
+ vmx->nested.nested_vmx_procbased_ctls_low,
vmx->nested.nested_vmx_procbased_ctls_high) ||
!vmx_control_verify(vmcs12->secondary_vm_exec_control,
vmx->nested.nested_vmx_secondary_ctls_low,
@@ -10193,10 +10186,10 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
vmx->nested.nested_vmx_pinbased_ctls_low,
vmx->nested.nested_vmx_pinbased_ctls_high) ||
!vmx_control_verify(vmcs12->vm_exit_controls,
- vmx->nested.nested_vmx_true_exit_ctls_low,
+ vmx->nested.nested_vmx_exit_ctls_low,
vmx->nested.nested_vmx_exit_ctls_high) ||
!vmx_control_verify(vmcs12->vm_entry_controls,
- vmx->nested.nested_vmx_true_entry_ctls_low,
+ vmx->nested.nested_vmx_entry_ctls_low,
vmx->nested.nested_vmx_entry_ctls_high))
{
nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
--
2.8.0.rc3.226.g39d4020

2016-11-30 02:15:27

by David Matlack

Subject: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation

KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
all CR0 and CR4 bits are allowed to be 1 during VMX operation.

This does not match real hardware, which disallows the high 32 bits of
CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits
which are defined in the SDM but missing according to CPUID). A guest
can induce a VM-entry failure by setting these bits in GUEST_CR0 and
GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are
valid.

Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
checks on these registers do not verify must-be-0 bits. Fix these checks
to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.

This patch should introduce no change in behavior in KVM, since these
MSRs are still -1ULL.
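
For intuition, a worked example (illustrative values only) of the
fixed-bits rule enforced by the fixed_bits_valid() helper this patch
introduces -- fixed0 holds must-be-1 bits, fixed1 holds may-be-1 bits:

static void fixed_bits_demo(void)
{
	u64 fixed0 = 0x1;	/* bit 0 must be 1 */
	u64 fixed1 = 0x3;	/* bits above bit 1 must be 0 */

	BUG_ON(!fixed_bits_valid(0x1, fixed0, fixed1));	/* valid */
	BUG_ON(fixed_bits_valid(0x2, fixed0, fixed1));	/* must-be-1 bit clear */
	BUG_ON(fixed_bits_valid(0x5, fixed0, fixed1));	/* must-be-0 bit set */
}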

Signed-off-by: David Matlack <[email protected]>
---
arch/x86/kvm/vmx.c | 77 +++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 53 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 01a2b9e..b414513 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2864,12 +2864,18 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
vmx->nested.nested_vmx_vmcs_enum = 0x2e;
}

+/*
+ * if fixed0[i] == 1: val[i] must be 1
+ * if fixed1[i] == 0: val[i] must be 0
+ */
+static inline bool fixed_bits_valid(u64 val, u64 fixed0, u64 fixed1)
+{
+ return ((val & fixed1) | fixed0) == val;
+}
+
static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
{
- /*
- * Bits 0 in high must be 0, and bits 1 in low must be 1.
- */
- return ((control & high) | low) == control;
+ return fixed_bits_valid(control, low, high);
}

static inline u64 vmx_control_msr(u32 low, u32 high)
@@ -4104,6 +4110,40 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
(unsigned long *)&vcpu->arch.regs_dirty);
}

+static bool nested_guest_cr0_valid(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
+ u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
+ struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+ if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high &
+ SECONDARY_EXEC_UNRESTRICTED_GUEST &&
+ nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
+ fixed0 &= ~(X86_CR0_PE | X86_CR0_PG);
+
+ return fixed_bits_valid(val, fixed0, fixed1);
+}
+
+static bool nested_host_cr0_valid(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
+ u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
+
+ return fixed_bits_valid(val, fixed0, fixed1);
+}
+
+static bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed0;
+ u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed1;
+
+ return fixed_bits_valid(val, fixed0, fixed1);
+}
+
+/* No difference in the restrictions on guest and host CR4 in VMX operation. */
+#define nested_guest_cr4_valid nested_cr4_valid
+#define nested_host_cr4_valid nested_cr4_valid
+
static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);

static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
@@ -4232,8 +4272,8 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
if (!nested_vmx_allowed(vcpu))
return 1;
}
- if (to_vmx(vcpu)->nested.vmxon &&
- ((cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON))
+
+ if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
return 1;

vcpu->arch.cr4 = cr4;
@@ -5852,18 +5892,6 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
hypercall[2] = 0xc1;
}

-static bool nested_cr0_valid(struct kvm_vcpu *vcpu, unsigned long val)
-{
- unsigned long always_on = VMXON_CR0_ALWAYSON;
- struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-
- if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high &
- SECONDARY_EXEC_UNRESTRICTED_GUEST &&
- nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
- always_on &= ~(X86_CR0_PE | X86_CR0_PG);
- return (val & always_on) == always_on;
-}
-
/* called to set cr0 as appropriate for a mov-to-cr0 exit. */
static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
{
@@ -5882,7 +5910,7 @@ static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
val = (val & ~vmcs12->cr0_guest_host_mask) |
(vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask);

- if (!nested_cr0_valid(vcpu, val))
+ if (!nested_guest_cr0_valid(vcpu, val))
return 1;

if (kvm_set_cr0(vcpu, val))
@@ -5891,8 +5919,9 @@ static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
return 0;
} else {
if (to_vmx(vcpu)->nested.vmxon &&
- ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
+ !nested_host_cr0_valid(vcpu, val))
return 1;
+
return kvm_set_cr0(vcpu, val);
}
}
@@ -10438,15 +10467,15 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
return 1;
}

- if (((vmcs12->host_cr0 & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) ||
- ((vmcs12->host_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
+ if (!nested_host_cr0_valid(vcpu, vmcs12->host_cr0) ||
+ !nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) {
nested_vmx_failValid(vcpu,
VMXERR_ENTRY_INVALID_HOST_STATE_FIELD);
return 1;
}

- if (!nested_cr0_valid(vcpu, vmcs12->guest_cr0) ||
- ((vmcs12->guest_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
+ if (!nested_guest_cr0_valid(vcpu, vmcs12->guest_cr0) ||
+ !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)) {
nested_vmx_entry_failure(vcpu, vmcs12,
EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
return 1;
--
2.8.0.rc3.226.g39d4020

2016-11-30 02:15:28

by David Matlack

Subject: [PATCH v3 2/5] KVM: nVMX: support restore of VMX capability MSRs

The VMX capability MSRs advertise the set of features the KVM virtual
CPU can support. This set of features varies across different host CPUs
and KVM versions. This patch addresses both sources of difference,
allowing VMs to be migrated across CPUs and KVM versions without
guest-visible changes to these MSRs. Note that cross-KVM-version
migration is only supported from this point forward.

When the VMX capability MSRs are restored, they are audited to check
that the set of features advertised is a subset of what KVM and the
CPU support.
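
The audit boils down to a masked subset test (the is_bitwise_subset()
helper added below). With illustrative values, if KVM supports the
feature set 0x7, then:

	is_bitwise_subset(0x7, 0x5, -1ULL)	/* true: only drops a feature */
	is_bitwise_subset(0x7, 0xd, -1ULL)	/* false: adds unsupported bit 3 */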

Since the VMX capability MSRs are read-only, they do not need to be on
the default MSR save/restore lists. The userspace hypervisor can set
the values of these MSRs, or read them from KVM at VCPU creation time,
and restore the same values after every save/restore.
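
A hypothetical userspace sketch (not part of this series; it assumes
only the standard KVM_GET_MSRS/KVM_SET_MSRS vcpu ioctls, shows a single
MSR, and uses made-up helper names -- a real VMM would cover all the
VMX capability MSRs):

#include <linux/kvm.h>
#include <sys/ioctl.h>

#define MSR_IA32_VMX_TRUE_PINBASED_CTLS 0x48d	/* MSR index per the SDM */

static struct {
	struct kvm_msrs hdr;
	struct kvm_msr_entry entry;
} vmx_caps = {
	.hdr.nmsrs = 1,
	.entry.index = MSR_IA32_VMX_TRUE_PINBASED_CTLS,
};

/* At vCPU creation: remember the value this KVM advertises. */
void save_vmx_caps(int vcpu_fd)
{
	ioctl(vcpu_fd, KVM_GET_MSRS, &vmx_caps);
}

/* After migration: write the same (possibly less capable) value back. */
void restore_vmx_caps(int vcpu_fd)
{
	ioctl(vcpu_fd, KVM_SET_MSRS, &vmx_caps);
}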

Signed-off-by: David Matlack <[email protected]>
---
arch/x86/include/asm/vmx.h | 31 +++++
arch/x86/kvm/vmx.c | 290 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 297 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index a002b07..a4ca897 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -25,6 +25,7 @@
#define VMX_H


+#include <linux/bitops.h>
#include <linux/types.h>
#include <uapi/asm/vmx.h>

@@ -110,6 +111,36 @@
#define VMX_MISC_SAVE_EFER_LMA 0x00000020
#define VMX_MISC_ACTIVITY_HLT 0x00000040

+static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
+{
+ return vmx_basic & GENMASK_ULL(30, 0);
+}
+
+static inline u32 vmx_basic_vmcs_size(u64 vmx_basic)
+{
+ return (vmx_basic & GENMASK_ULL(44, 32)) >> 32;
+}
+
+static inline int vmx_misc_preemption_timer_rate(u64 vmx_misc)
+{
+ return vmx_misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+}
+
+static inline int vmx_misc_cr3_count(u64 vmx_misc)
+{
+ return (vmx_misc & GENMASK_ULL(24, 16)) >> 16;
+}
+
+static inline int vmx_misc_max_msr(u64 vmx_misc)
+{
+ return (vmx_misc & GENMASK_ULL(27, 25)) >> 25;
+}
+
+static inline int vmx_misc_mseg_revid(u64 vmx_misc)
+{
+ return (vmx_misc & GENMASK_ULL(63, 32)) >> 32;
+}
+
/* VMCS Encodings */
enum vmcs_field {
VIRTUAL_PROCESSOR_ID = 0x00000000,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0beb56a..01a2b9e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -465,6 +465,12 @@ struct nested_vmx {
u32 nested_vmx_misc_high;
u32 nested_vmx_ept_caps;
u32 nested_vmx_vpid_caps;
+ u64 nested_vmx_basic;
+ u64 nested_vmx_cr0_fixed0;
+ u64 nested_vmx_cr0_fixed1;
+ u64 nested_vmx_cr4_fixed0;
+ u64 nested_vmx_cr4_fixed1;
+ u64 nested_vmx_vmcs_enum;
};

#define POSTED_INTR_ON 0
@@ -2826,6 +2832,36 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
VMX_MISC_ACTIVITY_HLT;
vmx->nested.nested_vmx_misc_high = 0;
+
+ /*
+ * This MSR reports some information about VMX support. We
+ * should return information about the VMX we emulate for the
+ * guest, and the VMCS structure we give it - not about the
+ * VMX support of the underlying hardware.
+ */
+ vmx->nested.nested_vmx_basic =
+ VMCS12_REVISION |
+ VMX_BASIC_TRUE_CTLS |
+ ((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) |
+ (VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT);
+
+ if (cpu_has_vmx_basic_inout())
+ vmx->nested.nested_vmx_basic |= VMX_BASIC_INOUT;
+
+ /*
+ * These MSRs specify bits which the guest must keep fixed (on or off)
+ * while L1 is in VMXON mode (in L1's root mode, or running an L2).
+ * We picked the standard core2 setting.
+ */
+#define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
+#define VMXON_CR4_ALWAYSON X86_CR4_VMXE
+ vmx->nested.nested_vmx_cr0_fixed0 = VMXON_CR0_ALWAYSON;
+ vmx->nested.nested_vmx_cr0_fixed1 = -1ULL;
+ vmx->nested.nested_vmx_cr4_fixed0 = VMXON_CR4_ALWAYSON;
+ vmx->nested.nested_vmx_cr4_fixed1 = -1ULL;
+
+ /* highest index: VMX_PREEMPTION_TIMER_VALUE */
+ vmx->nested.nested_vmx_vmcs_enum = 0x2e;
}

static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
@@ -2841,24 +2877,233 @@ static inline u64 vmx_control_msr(u32 low, u32 high)
return low | ((u64)high << 32);
}

-/* Returns 0 on success, non-0 otherwise. */
-static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
+static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
+{
+ superset &= mask;
+ subset &= mask;
+
+ return (superset | subset) == superset;
+}
+
+static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
+{
+ const u64 feature_and_reserved =
+ /* feature (except bit 48; see below) */
+ BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
+ /* reserved */
+ BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+ u64 vmx_basic = vmx->nested.nested_vmx_basic;
+
+ if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
+ return -EINVAL;
+
+ /*
+ * KVM does not emulate a version of VMX that constrains physical
+ * addresses of VMX structures (e.g. VMCS) to 32-bits.
+ */
+ if (data & BIT_ULL(48))
+ return -EINVAL;
+
+ if (vmx_basic_vmcs_revision_id(vmx_basic) !=
+ vmx_basic_vmcs_revision_id(data))
+ return -EINVAL;
+
+ if (vmx_basic_vmcs_size(vmx_basic) > vmx_basic_vmcs_size(data))
+ return -EINVAL;
+
+ vmx->nested.nested_vmx_basic = data;
+ return 0;
+}
+
+static int
+vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+{
+ u64 supported;
+ u32 *lowp, *highp;
+
+ switch (msr_index) {
+ case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
+ lowp = &vmx->nested.nested_vmx_pinbased_ctls_low;
+ highp = &vmx->nested.nested_vmx_pinbased_ctls_high;
+ break;
+ case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+ lowp = &vmx->nested.nested_vmx_procbased_ctls_low;
+ highp = &vmx->nested.nested_vmx_procbased_ctls_high;
+ break;
+ case MSR_IA32_VMX_TRUE_EXIT_CTLS:
+ lowp = &vmx->nested.nested_vmx_exit_ctls_low;
+ highp = &vmx->nested.nested_vmx_exit_ctls_high;
+ break;
+ case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
+ lowp = &vmx->nested.nested_vmx_entry_ctls_low;
+ highp = &vmx->nested.nested_vmx_entry_ctls_high;
+ break;
+ case MSR_IA32_VMX_PROCBASED_CTLS2:
+ lowp = &vmx->nested.nested_vmx_secondary_ctls_low;
+ highp = &vmx->nested.nested_vmx_secondary_ctls_high;
+ break;
+ default:
+ BUG();
+ }
+
+ supported = vmx_control_msr(*lowp, *highp);
+
+ /* Check must-be-1 bits are still 1. */
+ if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0)))
+ return -EINVAL;
+
+ /* Check must-be-0 bits are still 0. */
+ if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
+ return -EINVAL;
+
+ *lowp = data;
+ *highp = data >> 32;
+ return 0;
+}
+
+static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
+{
+ const u64 feature_and_reserved_bits =
+ /* feature */
+ BIT_ULL(5) | GENMASK_ULL(8, 6) | BIT_ULL(14) | BIT_ULL(15) |
+ BIT_ULL(28) | BIT_ULL(29) | BIT_ULL(30) |
+ /* reserved */
+ GENMASK_ULL(13, 9) | BIT_ULL(31);
+ u64 vmx_misc;
+
+ vmx_misc = vmx_control_msr(vmx->nested.nested_vmx_misc_low,
+ vmx->nested.nested_vmx_misc_high);
+
+ if (!is_bitwise_subset(vmx_misc, data, feature_and_reserved_bits))
+ return -EINVAL;
+
+ if ((vmx->nested.nested_vmx_pinbased_ctls_high &
+ PIN_BASED_VMX_PREEMPTION_TIMER) &&
+ vmx_misc_preemption_timer_rate(data) !=
+ vmx_misc_preemption_timer_rate(vmx_misc))
+ return -EINVAL;
+
+ if (vmx_misc_cr3_count(data) > vmx_misc_cr3_count(vmx_misc))
+ return -EINVAL;
+
+ if (vmx_misc_max_msr(data) > vmx_misc_max_msr(vmx_misc))
+ return -EINVAL;
+
+ if (vmx_misc_mseg_revid(data) != vmx_misc_mseg_revid(vmx_misc))
+ return -EINVAL;
+
+ vmx->nested.nested_vmx_misc_low = data;
+ vmx->nested.nested_vmx_misc_high = data >> 32;
+ return 0;
+}
+
+static int vmx_restore_vmx_ept_vpid_cap(struct vcpu_vmx *vmx, u64 data)
+{
+ u64 vmx_ept_vpid_cap;
+
+ vmx_ept_vpid_cap = vmx_control_msr(vmx->nested.nested_vmx_ept_caps,
+ vmx->nested.nested_vmx_vpid_caps);
+
+ /* Every bit is either reserved or a feature bit. */
+ if (!is_bitwise_subset(vmx_ept_vpid_cap, data, -1ULL))
+ return -EINVAL;
+
+ vmx->nested.nested_vmx_ept_caps = data;
+ vmx->nested.nested_vmx_vpid_caps = data >> 32;
+ return 0;
+}
+
+static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+{
+ u64 *msr;
+
+ switch (msr_index) {
+ case MSR_IA32_VMX_CR0_FIXED0:
+ msr = &vmx->nested.nested_vmx_cr0_fixed0;
+ break;
+ case MSR_IA32_VMX_CR4_FIXED0:
+ msr = &vmx->nested.nested_vmx_cr4_fixed0;
+ break;
+ default:
+ BUG();
+ }
+
+ /*
+ * 1 bits (which indicates bits which "must-be-1" during VMX operation)
+ * must be 1 in the restored value.
+ */
+ if (!is_bitwise_subset(data, *msr, -1ULL))
+ return -EINVAL;
+
+ *msr = data;
+ return 0;
+}
+
+/*
+ * Called when userspace is restoring VMX MSRs.
+ *
+ * Returns 0 on success, non-0 otherwise.
+ */
+static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);

switch (msr_index) {
case MSR_IA32_VMX_BASIC:
+ return vmx_restore_vmx_basic(vmx, data);
+ case MSR_IA32_VMX_PINBASED_CTLS:
+ case MSR_IA32_VMX_PROCBASED_CTLS:
+ case MSR_IA32_VMX_EXIT_CTLS:
+ case MSR_IA32_VMX_ENTRY_CTLS:
+ /*
+ * The "non-true" VMX capability MSRs are generated from the
+ * "true" MSRs, so we do not support restoring them directly.
+ *
+ * If userspace wants to emulate VMX_BASIC[55]=0, userspace
+ * should restore the "true" MSRs with the must-be-1 bits
+ * set according to the SDM Vol 3. A.2 "RESERVED CONTROLS AND
+ * DEFAULT SETTINGS".
+ */
+ return -EINVAL;
+ case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
+ case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+ case MSR_IA32_VMX_TRUE_EXIT_CTLS:
+ case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
+ case MSR_IA32_VMX_PROCBASED_CTLS2:
+ return vmx_restore_control_msr(vmx, msr_index, data);
+ case MSR_IA32_VMX_MISC:
+ return vmx_restore_vmx_misc(vmx, data);
+ case MSR_IA32_VMX_CR0_FIXED0:
+ case MSR_IA32_VMX_CR4_FIXED0:
+ return vmx_restore_fixed0_msr(vmx, msr_index, data);
+ case MSR_IA32_VMX_CR0_FIXED1:
+ case MSR_IA32_VMX_CR4_FIXED1:
+ /*
+ * These MSRs are generated based on the vCPU's CPUID, so we
+ * do not support restoring them directly.
+ */
+ return -EINVAL;
+ case MSR_IA32_VMX_EPT_VPID_CAP:
+ return vmx_restore_vmx_ept_vpid_cap(vmx, data);
+ case MSR_IA32_VMX_VMCS_ENUM:
+ vmx->nested.nested_vmx_vmcs_enum = data;
+ return 0;
+ default:
/*
- * This MSR reports some information about VMX support. We
- * should return information about the VMX we emulate for the
- * guest, and the VMCS structure we give it - not about the
- * VMX support of the underlying hardware.
+ * The rest of the VMX capability MSRs do not support restore.
*/
- *pdata = VMCS12_REVISION | VMX_BASIC_TRUE_CTLS |
- ((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) |
- (VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT);
- if (cpu_has_vmx_basic_inout())
- *pdata |= VMX_BASIC_INOUT;
+ return -EINVAL;
+ }
+}
+
+/* Returns 0 on success, non-0 otherwise. */
+static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ switch (msr_index) {
+ case MSR_IA32_VMX_BASIC:
+ *pdata = vmx->nested.nested_vmx_basic;
break;
case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
case MSR_IA32_VMX_PINBASED_CTLS:
@@ -2897,27 +3142,20 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
vmx->nested.nested_vmx_misc_low,
vmx->nested.nested_vmx_misc_high);
break;
- /*
- * These MSRs specify bits which the guest must keep fixed (on or off)
- * while L1 is in VMXON mode (in L1's root mode, or running an L2).
- * We picked the standard core2 setting.
- */
-#define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
-#define VMXON_CR4_ALWAYSON X86_CR4_VMXE
case MSR_IA32_VMX_CR0_FIXED0:
- *pdata = VMXON_CR0_ALWAYSON;
+ *pdata = vmx->nested.nested_vmx_cr0_fixed0;
break;
case MSR_IA32_VMX_CR0_FIXED1:
- *pdata = -1ULL;
+ *pdata = vmx->nested.nested_vmx_cr0_fixed1;
break;
case MSR_IA32_VMX_CR4_FIXED0:
- *pdata = VMXON_CR4_ALWAYSON;
+ *pdata = vmx->nested.nested_vmx_cr4_fixed0;
break;
case MSR_IA32_VMX_CR4_FIXED1:
- *pdata = -1ULL;
+ *pdata = vmx->nested.nested_vmx_cr4_fixed1;
break;
case MSR_IA32_VMX_VMCS_ENUM:
- *pdata = 0x2e; /* highest index: VMX_PREEMPTION_TIMER_VALUE */
+ *pdata = vmx->nested.nested_vmx_vmcs_enum;
break;
case MSR_IA32_VMX_PROCBASED_CTLS2:
*pdata = vmx_control_msr(
@@ -3100,7 +3338,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmx_leave_nested(vcpu);
break;
case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
- return 1; /* they are read-only */
+ if (!msr_info->host_initiated)
+ return 1; /* they are read-only */
+ if (!nested_vmx_allowed(vcpu))
+ return 1;
+ return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
if (!vmx_xsaves_supported())
return 1;
--
2.8.0.rc3.226.g39d4020

2016-11-30 02:15:26

by David Matlack

Subject: [PATCH v3 4/5] KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID

MSR_IA32_CR{0,4}_FIXED1 define which bits in CR0 and CR4 are allowed to
be 1 during VMX operation. Since the set of allowed-1 bits is the same
in and out of VMX operation, we can generate these MSRs entirely from
the guest's CPUID. This lets userspace avoid having to save/restore
these MSRs.

This patch also initializes MSR_IA32_CR{0,4}_FIXED1 from the CPU's MSRs
by default. This is saner than the current default of -1ULL, which
includes bits that the host CPU does not support.
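
For example (illustrative values, using fixed_bits_valid() from the
previous patch): a guest whose CPUID lacks SMEP now gets a CR4_FIXED1
with X86_CR4_SMEP clear, so

	u64 cr4_fixed0 = X86_CR4_VMXE;		/* must be 1 in VMX operation */
	u64 cr4_fixed1 = ~0ULL & ~X86_CR4_SMEP;	/* SMEP becomes must-be-0 */

	fixed_bits_valid(X86_CR4_VMXE, cr4_fixed0, cr4_fixed1);	/* true */
	fixed_bits_valid(X86_CR4_VMXE | X86_CR4_SMEP,
			 cr4_fixed0, cr4_fixed1);		/* false */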

Signed-off-by: David Matlack <[email protected]>
---
arch/x86/kvm/vmx.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b414513..49270c4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2849,16 +2849,18 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
vmx->nested.nested_vmx_basic |= VMX_BASIC_INOUT;

/*
- * These MSRs specify bits which the guest must keep fixed (on or off)
+ * These MSRs specify bits which the guest must keep fixed on
* while L1 is in VMXON mode (in L1's root mode, or running an L2).
* We picked the standard core2 setting.
*/
#define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
#define VMXON_CR4_ALWAYSON X86_CR4_VMXE
vmx->nested.nested_vmx_cr0_fixed0 = VMXON_CR0_ALWAYSON;
- vmx->nested.nested_vmx_cr0_fixed1 = -1ULL;
vmx->nested.nested_vmx_cr4_fixed0 = VMXON_CR4_ALWAYSON;
- vmx->nested.nested_vmx_cr4_fixed1 = -1ULL;
+
+ /* These MSRs specify bits which the guest must keep fixed off. */
+ rdmsrl(MSR_IA32_VMX_CR0_FIXED1, vmx->nested.nested_vmx_cr0_fixed1);
+ rdmsrl(MSR_IA32_VMX_CR4_FIXED1, vmx->nested.nested_vmx_cr4_fixed1);

/* highest index: VMX_PREEMPTION_TIMER_VALUE */
vmx->nested.nested_vmx_vmcs_enum = 0x2e;
@@ -9547,6 +9549,50 @@ static void vmcs_set_secondary_exec_control(u32 new_ctl)
(new_ctl & ~mask) | (cur_ctl & mask));
}

+/*
+ * Generate MSR_IA32_VMX_CR{0,4}_FIXED1 according to CPUID. Only set bits
+ * (indicating "allowed-1") if they are supported in the guest's CPUID.
+ */
+static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct kvm_cpuid_entry2 *entry;
+
+ vmx->nested.nested_vmx_cr0_fixed1 = 0xffffffff;
+ vmx->nested.nested_vmx_cr4_fixed1 = X86_CR4_PCE;
+
+#define cr4_fixed1_update(_cr4_mask, _reg, _cpuid_mask) do { \
+ if (entry && (entry->_reg & (_cpuid_mask))) \
+ vmx->nested.nested_vmx_cr4_fixed1 |= (_cr4_mask); \
+} while (0)
+
+ entry = kvm_find_cpuid_entry(vcpu, 0x1, 0);
+ cr4_fixed1_update(X86_CR4_VME, edx, bit(X86_FEATURE_VME));
+ cr4_fixed1_update(X86_CR4_PVI, edx, bit(X86_FEATURE_VME));
+ cr4_fixed1_update(X86_CR4_TSD, edx, bit(X86_FEATURE_TSC));
+ cr4_fixed1_update(X86_CR4_DE, edx, bit(X86_FEATURE_DE));
+ cr4_fixed1_update(X86_CR4_PSE, edx, bit(X86_FEATURE_PSE));
+ cr4_fixed1_update(X86_CR4_PAE, edx, bit(X86_FEATURE_PAE));
+ cr4_fixed1_update(X86_CR4_MCE, edx, bit(X86_FEATURE_MCE));
+ cr4_fixed1_update(X86_CR4_PGE, edx, bit(X86_FEATURE_PGE));
+ cr4_fixed1_update(X86_CR4_OSFXSR, edx, bit(X86_FEATURE_FXSR));
+ cr4_fixed1_update(X86_CR4_OSXMMEXCPT, edx, bit(X86_FEATURE_XMM));
+ cr4_fixed1_update(X86_CR4_VMXE, ecx, bit(X86_FEATURE_VMX));
+ cr4_fixed1_update(X86_CR4_SMXE, ecx, bit(X86_FEATURE_SMX));
+ cr4_fixed1_update(X86_CR4_PCIDE, ecx, bit(X86_FEATURE_PCID));
+ cr4_fixed1_update(X86_CR4_OSXSAVE, ecx, bit(X86_FEATURE_XSAVE));
+
+ entry = kvm_find_cpuid_entry(vcpu, 0x7, 0);
+ cr4_fixed1_update(X86_CR4_FSGSBASE, ebx, bit(X86_FEATURE_FSGSBASE));
+ cr4_fixed1_update(X86_CR4_SMEP, ebx, bit(X86_FEATURE_SMEP));
+ cr4_fixed1_update(X86_CR4_SMAP, ebx, bit(X86_FEATURE_SMAP));
+ cr4_fixed1_update(X86_CR4_PKE, ecx, bit(X86_FEATURE_PKU));
+ /* TODO: Use X86_CR4_UMIP and X86_FEATURE_UMIP macros */
+ cr4_fixed1_update(bit(11), ecx, bit(2));
+
+#undef cr4_fixed1_update
+}
+
static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -9588,6 +9634,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
else
to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
~FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
+
+ if (nested_vmx_allowed(vcpu))
+ nested_vmx_cr_fixed1_bits_update(vcpu);
}

static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
--
2.8.0.rc3.226.g39d4020

2016-11-30 02:15:25

by David Matlack

Subject: [PATCH v3 5/5] KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry

vmx_set_cr0() modifies GUEST_EFER and "IA-32e mode guest" in the current
VMCS. Call vmx_set_efer() after vmx_set_cr0() so that emulated VM-entry
is more faithful to VMCS12.

This patch correctly causes VM-entry to fail when "IA-32e mode guest" is
1 and GUEST_CR0.PG is 0. Previously this configuration would succeed and
"IA-32e mode guest" would silently be disabled by KVM.

Signed-off-by: David Matlack <[email protected]>
---
arch/x86/kvm/vmx.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 49270c4..776dc67 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10386,15 +10386,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
nested_ept_init_mmu_context(vcpu);
}

- if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
- vcpu->arch.efer = vmcs12->guest_ia32_efer;
- else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
- vcpu->arch.efer |= (EFER_LMA | EFER_LME);
- else
- vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
- /* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
- vmx_set_efer(vcpu, vcpu->arch.efer);
-
/*
* This sets GUEST_CR0 to vmcs12->guest_cr0, with possibly a modified
* TS bit (for lazy fpu) and bits which we consider mandatory enabled.
@@ -10409,6 +10400,15 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmx_set_cr4(vcpu, vmcs12->guest_cr4);
vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(vmcs12));

+ if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
+ vcpu->arch.efer = vmcs12->guest_ia32_efer;
+ else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
+ vcpu->arch.efer |= (EFER_LMA | EFER_LME);
+ else
+ vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
+ /* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
+ vmx_set_efer(vcpu, vcpu->arch.efer);
+
/* shadow page tables on either EPT or shadow page tables */
kvm_set_cr3(vcpu, vmcs12->guest_cr3);
kvm_mmu_reset_context(vcpu);
--
2.8.0.rc3.226.g39d4020

2016-11-30 11:16:26

by Paolo Bonzini

Subject: Re: [PATCH v3 1/5] KVM: nVMX: generate non-true VMX MSRs based on true versions



On 30/11/2016 03:14, David Matlack wrote:
> The "non-true" VMX capability MSRs can be generated from their "true"
> counterparts, by OR-ing the default1 bits. The default1 bits are fixed
> and defined in the SDM.
>
> Since we can generate the non-true VMX MSRs from the true versions,
> there's no need to store both in struct nested_vmx. This also lets
> userspace avoid having to restore the non-true MSRs.
>
> Note this does not preclude emulating MSR_IA32_VMX_BASIC[55]=0. To do so,
> we simply need to set all the default1 bits in the true MSRs (such that
> the true MSRs and the generated non-true MSRs are equal).
>
> Signed-off-by: David Matlack <[email protected]>
> Suggested-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 45 +++++++++++++++++++--------------------------
> 1 file changed, 19 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5382b82..0beb56a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -446,19 +446,21 @@ struct nested_vmx {
> u16 vpid02;
> u16 last_vpid;
>
> + /*
> + * We only store the "true" versions of the VMX capability MSRs. We
> + * generate the "non-true" versions by setting the must-be-1 bits
> + * according to the SDM.
> + */
> u32 nested_vmx_procbased_ctls_low;
> u32 nested_vmx_procbased_ctls_high;
> - u32 nested_vmx_true_procbased_ctls_low;
> u32 nested_vmx_secondary_ctls_low;
> u32 nested_vmx_secondary_ctls_high;
> u32 nested_vmx_pinbased_ctls_low;
> u32 nested_vmx_pinbased_ctls_high;
> u32 nested_vmx_exit_ctls_low;
> u32 nested_vmx_exit_ctls_high;
> - u32 nested_vmx_true_exit_ctls_low;
> u32 nested_vmx_entry_ctls_low;
> u32 nested_vmx_entry_ctls_high;
> - u32 nested_vmx_true_entry_ctls_low;
> u32 nested_vmx_misc_low;
> u32 nested_vmx_misc_high;
> u32 nested_vmx_ept_caps;
> @@ -2712,9 +2714,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
> vmx->nested.nested_vmx_exit_ctls_high |= VM_EXIT_CLEAR_BNDCFGS;
>
> /* We support free control of debug control saving. */
> - vmx->nested.nested_vmx_true_exit_ctls_low =
> - vmx->nested.nested_vmx_exit_ctls_low &
> - ~VM_EXIT_SAVE_DEBUG_CONTROLS;
> + vmx->nested.nested_vmx_exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;
>
> /* entry controls */
> rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
> @@ -2733,9 +2733,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
> vmx->nested.nested_vmx_entry_ctls_high |= VM_ENTRY_LOAD_BNDCFGS;
>
> /* We support free control of debug control loading. */
> - vmx->nested.nested_vmx_true_entry_ctls_low =
> - vmx->nested.nested_vmx_entry_ctls_low &
> - ~VM_ENTRY_LOAD_DEBUG_CONTROLS;
> + vmx->nested.nested_vmx_entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;
>
> /* cpu-based controls */
> rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
> @@ -2768,8 +2766,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
> CPU_BASED_USE_MSR_BITMAPS;
>
> /* We support free control of CR3 access interception. */
> - vmx->nested.nested_vmx_true_procbased_ctls_low =
> - vmx->nested.nested_vmx_procbased_ctls_low &
> + vmx->nested.nested_vmx_procbased_ctls_low &=
> ~(CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING);
>
> /* secondary cpu-based controls */
> @@ -2868,36 +2865,32 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> *pdata = vmx_control_msr(
> vmx->nested.nested_vmx_pinbased_ctls_low,
> vmx->nested.nested_vmx_pinbased_ctls_high);
> + if (msr_index == MSR_IA32_VMX_PINBASED_CTLS)
> + *pdata |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;

Almost: PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR must be
added to both the low and high parts. Likewise below.
I guess you can use vmx_control_msr to generate it, too.

Paolo

> break;
> case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
> - *pdata = vmx_control_msr(
> - vmx->nested.nested_vmx_true_procbased_ctls_low,
> - vmx->nested.nested_vmx_procbased_ctls_high);
> - break;
> case MSR_IA32_VMX_PROCBASED_CTLS:
> *pdata = vmx_control_msr(
> vmx->nested.nested_vmx_procbased_ctls_low,
> vmx->nested.nested_vmx_procbased_ctls_high);
> + if (msr_index == MSR_IA32_VMX_PROCBASED_CTLS)
> + *pdata |= CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
> break;
> case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> - *pdata = vmx_control_msr(
> - vmx->nested.nested_vmx_true_exit_ctls_low,
> - vmx->nested.nested_vmx_exit_ctls_high);
> - break;
> case MSR_IA32_VMX_EXIT_CTLS:
> *pdata = vmx_control_msr(
> vmx->nested.nested_vmx_exit_ctls_low,
> vmx->nested.nested_vmx_exit_ctls_high);
> + if (msr_index == MSR_IA32_VMX_EXIT_CTLS)
> + *pdata |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> break;
> case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> - *pdata = vmx_control_msr(
> - vmx->nested.nested_vmx_true_entry_ctls_low,
> - vmx->nested.nested_vmx_entry_ctls_high);
> - break;
> case MSR_IA32_VMX_ENTRY_CTLS:
> *pdata = vmx_control_msr(
> vmx->nested.nested_vmx_entry_ctls_low,
> vmx->nested.nested_vmx_entry_ctls_high);
> + if (msr_index == MSR_IA32_VMX_ENTRY_CTLS)
> + *pdata |= VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
> break;
> case MSR_IA32_VMX_MISC:
> *pdata = vmx_control_msr(
> @@ -10184,7 +10177,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> }
>
> if (!vmx_control_verify(vmcs12->cpu_based_vm_exec_control,
> - vmx->nested.nested_vmx_true_procbased_ctls_low,
> + vmx->nested.nested_vmx_procbased_ctls_low,
> vmx->nested.nested_vmx_procbased_ctls_high) ||
> !vmx_control_verify(vmcs12->secondary_vm_exec_control,
> vmx->nested.nested_vmx_secondary_ctls_low,
> @@ -10193,10 +10186,10 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> vmx->nested.nested_vmx_pinbased_ctls_low,
> vmx->nested.nested_vmx_pinbased_ctls_high) ||
> !vmx_control_verify(vmcs12->vm_exit_controls,
> - vmx->nested.nested_vmx_true_exit_ctls_low,
> + vmx->nested.nested_vmx_exit_ctls_low,
> vmx->nested.nested_vmx_exit_ctls_high) ||
> !vmx_control_verify(vmcs12->vm_entry_controls,
> - vmx->nested.nested_vmx_true_entry_ctls_low,
> + vmx->nested.nested_vmx_entry_ctls_low,
> vmx->nested.nested_vmx_entry_ctls_high))
> {
> nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
>

2016-11-30 11:22:11

by Paolo Bonzini

Subject: Re: [PATCH v3 0/5] VMX Capability MSRs



On 30/11/2016 03:14, David Matlack wrote:
> This patchset adds support for setting the VMX capability MSRs from userspace.
> This is required for migration of nested-capable VMs to different CPUs and
> KVM versions.
>
> Patch 1 generates the non-true VMX MSRs using the true MSRs, which allows
> userspace to skip restoring them.
>
> Patch 2 adds support for restoring the VMX capability MSRs.
>
> Patches 3 and 4 make KVM's emulation of MSR_IA32_VMX_CR{0,4}_FIXED1 more
> accurate.
>
> Patch 5 fixes a bug in emulated VM-entry that came up when testing patches
> 3 and 4.
>
> Changes since v2:
> * Generate CR0_FIXED1 in addition to CR4_FIXED1
> * Generate "non-true" capability MSRs from the "true" versions and remove
> "non-true" MSRs from struct nested_vmx.
> * Disallow restore of CR{0,4}_FIXED1 and "non-true" MSRs since they are
> generated.
>
> Changes since v1:
> * Support restoring less-capable versions of MSR_IA32_VMX_BASIC,
> MSR_IA32_VMX_CR{0,4}_FIXED{0,1}.
> * Include VMX_INS_OUTS in MSR_IA32_VMX_BASIC initial value.
>
> David Matlack (5):
> KVM: nVMX: generate non-true VMX MSRs based on true versions
> KVM: nVMX: support restore of VMX capability MSRs
> KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
> KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID
> KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
>
> arch/x86/include/asm/vmx.h | 31 +++
> arch/x86/kvm/vmx.c | 479 +++++++++++++++++++++++++++++++++++++--------
> 2 files changed, 427 insertions(+), 83 deletions(-)

Just a small nit that can be fixed on applying. Thanks!

Paolo

2016-11-30 18:05:45

by David Matlack

Subject: Re: [PATCH v3 1/5] KVM: nVMX: generate non-true VMX MSRs based on true versions

On Wed, Nov 30, 2016 at 3:16 AM, Paolo Bonzini <[email protected]> wrote:
> On 30/11/2016 03:14, David Matlack wrote:
>>
>> /* secondary cpu-based controls */
>> @@ -2868,36 +2865,32 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>> *pdata = vmx_control_msr(
>> vmx->nested.nested_vmx_pinbased_ctls_low,
>> vmx->nested.nested_vmx_pinbased_ctls_high);
>> + if (msr_index == MSR_IA32_VMX_PINBASED_CTLS)
>> + *pdata |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
>
> Almost: PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR must be
> added to both the low and high parts. Likewise below.
> I guess you can use vmx_control_msr to generate it, too.

SGTM.

Although that would mean the true MSRs indicate a bit must be 0 while
the non-true MSRs indicate it must be 1, which seems odd.

2016-11-30 18:06:38

by David Matlack

Subject: Re: [PATCH v3 0/5] VMX Capability MSRs

On Wed, Nov 30, 2016 at 3:22 AM, Paolo Bonzini <[email protected]> wrote:
>
>
> On 30/11/2016 03:14, David Matlack wrote:
>> This patchset adds support for setting the VMX capability MSRs from userspace.
>> This is required for migration of nested-capable VMs to different CPUs and
>> KVM versions.
>>
>> Patch 1 generates the non-true VMX MSRs using the true MSRs, which allows
>> userspace to skip restoring them.
>>
>> Patch 2 adds support for restoring the VMX capability MSRs.
>>
>> Patches 3 and 4 make KVM's emulation of MSR_IA32_VMX_CR{0,4}_FIXED1 more
>> accurate.
>>
>> Patch 5 fixes a bug in emulated VM-entry that came up when testing patches
>> 3 and 4.
>>
>> Changes since v2:
>> * Generate CR0_FIXED1 in addition to CR4_FIXED1
>> * Generate "non-true" capability MSRs from the "true" versions and remove
>> "non-true" MSRs from struct nested_vmx.
>> * Disallow restore of CR{0,4}_FIXED1 and "non-true" MSRs since they are
>> generated.
>>
>> Changes since v1:
>> * Support restoring less-capable versions of MSR_IA32_VMX_BASIC,
>> MSR_IA32_VMX_CR{0,4}_FIXED{0,1}.
>> * Include VMX_INS_OUTS in MSR_IA32_VMX_BASIC initial value.
>>
>> David Matlack (5):
>> KVM: nVMX: generate non-true VMX MSRs based on true versions
>> KVM: nVMX: support restore of VMX capability MSRs
>> KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
>> KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID
>> KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
>>
>> arch/x86/include/asm/vmx.h | 31 +++
>> arch/x86/kvm/vmx.c | 479 +++++++++++++++++++++++++++++++++++++--------
>> 2 files changed, 427 insertions(+), 83 deletions(-)
>
> Just a small nit that can be fixed on applying. Thanks!

Thanks for the thorough review!

>
> Paolo

2016-11-30 20:46:31

by Paolo Bonzini

Subject: Re: [PATCH v3 1/5] KVM: nVMX: generate non-true VMX MSRs based on true versions



----- Original Message -----
> From: "David Matlack" <[email protected]>
> To: "Paolo Bonzini" <[email protected]>
> Cc: "kvm list" <[email protected]>, [email protected], "Jim Mattson" <[email protected]>, "Radim
> Krčmář" <[email protected]>
> Sent: Wednesday, November 30, 2016 7:05:04 PM
> Subject: Re: [PATCH v3 1/5] KVM: nVMX: generate non-true VMX MSRs based on true versions
>
> On Wed, Nov 30, 2016 at 3:16 AM, Paolo Bonzini <[email protected]> wrote:
> > On 30/11/2016 03:14, David Matlack wrote:
> >>
> >> /* secondary cpu-based controls */
> >> @@ -2868,36 +2865,32 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu,
> >> u32 msr_index, u64 *pdata)
> >> *pdata = vmx_control_msr(
> >> vmx->nested.nested_vmx_pinbased_ctls_low,
> >> vmx->nested.nested_vmx_pinbased_ctls_high);
> >> + if (msr_index == MSR_IA32_VMX_PINBASED_CTLS)
> >> + *pdata |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
> >
> > Almost: PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR must be
> > added to both the low and high parts. Likewise below.
> > I guess you can use vmx_control_msr to generate it, too.
>
> SGTM.
>
> Although that would mean the true MSRs indicate a bit must-be-0 while
> the non-true MSRs are indicating it must-be-1, which seems odd.

You're right. The high part is "can be 1", so the true MSR's high part
must already include the always-on-without-true-MSR bits. Good!
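
To make that concrete with a real default1 bit,
CPU_BASED_CR3_LOAD_EXITING: in the true MSR the low bit is clear (the
control may be 0) while the high bit is set (it may be 1), so

	non_true = true_msr | CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;

only flips the low bit to 1 (making the control mandatory); the high
bit was already set.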

Paolo

2016-11-30 22:02:10

by Radim Krčmář

Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation

2016-11-29 18:14-0800, David Matlack:
> KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
> all CR0 and CR4 bits are allowed to be 1 during VMX operation.
>
> This does not match real hardware, which disallows the high 32 bits of
> CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits
> which are defined in the SDM but missing according to CPUID). A guest
> can induce a VM-entry failure by setting these bits in GUEST_CR0 and
> GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are
> valid.
>
> Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
> checks on these registers do not verify must-be-0 bits. Fix these checks
> to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
>
> This patch should introduce no change in behavior in KVM, since these
> MSRs are still -1ULL.
>
> Signed-off-by: David Matlack <[email protected]>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -4104,6 +4110,40 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
> +static bool nested_guest_cr0_valid(struct kvm_vcpu *vcpu, unsigned long val)
> +{
> + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
> + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
> + struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +
> + if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high &
> + SECONDARY_EXEC_UNRESTRICTED_GUEST &&
> + nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
> + fixed0 &= ~(X86_CR0_PE | X86_CR0_PG);

These bits also seem to be guaranteed in fixed1 ... complicated
dependencies.

There is another exception, SDM 26.3.1.1 (Checks on Guest Control
Registers, Debug Registers, and MSRs):

Bit 29 (corresponding to CR0.NW) and bit 30 (CD) are never checked
because the values of these bits are not changed by VM entry; see
Section 26.3.2.1.

And another check:

If bit 31 in the CR0 field (corresponding to PG) is 1, bit 0 in that
field (PE) must also be 1.

> +
> + return fixed_bits_valid(val, fixed0, fixed1);
> +}
> +
> +static bool nested_host_cr0_valid(struct kvm_vcpu *vcpu, unsigned long val)
> +{
> + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
> + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
> +
> + return fixed_bits_valid(val, fixed0, fixed1);
> +}
> +
> +static bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
> +{
> + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed0;
> + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed1;
> +
> + return fixed_bits_valid(val, fixed0, fixed1);
> +}
> +
> +/* No difference in the restrictions on guest and host CR4 in VMX operation. */
> +#define nested_guest_cr4_valid nested_cr4_valid
> +#define nested_host_cr4_valid nested_cr4_valid

We should use cr0 and cr4 checks also in handle_vmon().

I've applied this series to kvm/queue for early testing.
Please send replacement patch or patch(es) on top of this series.

Thanks.

2016-11-30 22:33:34

by Paolo Bonzini

Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation



----- Original Message -----
> From: "Radim Krčmář" <[email protected]>
> To: "David Matlack" <[email protected]>
> Cc: [email protected], [email protected], [email protected], [email protected]
> Sent: Wednesday, November 30, 2016 10:52:35 PM
> Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
>
> 2016-11-29 18:14-0800, David Matlack:
> > KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
> > all CR0 and CR4 bits are allowed to be 1 during VMX operation.
> >
> > This does not match real hardware, which disallows the high 32 bits of
> > CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits
> > which are defined in the SDM but missing according to CPUID). A guest
> > can induce a VM-entry failure by setting these bits in GUEST_CR0 and
> > GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are
> > valid.
> >
> > Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
> > checks on these registers do not verify must-be-0 bits. Fix these checks
> > to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
> >
> > This patch should introduce no change in behavior in KVM, since these
> > MSRs are still -1ULL.
> >
> > Signed-off-by: David Matlack <[email protected]>
> > ---
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > @@ -4104,6 +4110,40 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
> > +static bool nested_guest_cr0_valid(struct kvm_vcpu *vcpu, unsigned long
> > val)
> > +{
> > + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
> > + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
> > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> > +
> > + if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high &
> > + SECONDARY_EXEC_UNRESTRICTED_GUEST &&
> > + nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
> > + fixed0 &= ~(X86_CR0_PE | X86_CR0_PG);
>
> These bits also seem to be guaranteed in fixed1 ... complicated
> dependencies.

Bits that are set in fixed0 must be set in fixed1 too. Since patch 4
always sets CR0_FIXED1 to all-ones (matching bare metal), this is okay.

> There is another exception, SDM 26.3.1.1 (Checks on Guest Control
> Registers, Debug Registers, and MSRs):
>
> Bit 29 (corresponding to CR0.NW) and bit 30 (CD) are never checked
> because the values of these bits are not changed by VM entry; see
> Section 26.3.2.1.

Same here, we never check them anyway.

> And another check:
>
> If bit 31 in the CR0 field (corresponding to PG) is 1, bit 0 in that
> field (PE) must also be 1.

This should not be a problem: a failed vmentry is reflected into L1
anyway. The only concern is that we could have a more restrictive
check than what the processor does.

Paolo

> > +
> > + return fixed_bits_valid(val, fixed0, fixed1);
> > +}
> > +
> > +static bool nested_host_cr0_valid(struct kvm_vcpu *vcpu, unsigned long
> > val)
> > +{
> > + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
> > + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
> > +
> > + return fixed_bits_valid(val, fixed0, fixed1);
> > +}
> > +
> > +static bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
> > +{
> > + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed0;
> > + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed1;
> > +
> > + return fixed_bits_valid(val, fixed0, fixed1);
> > +}
> > +
> > +/* No difference in the restrictions on guest and host CR4 in VMX
> > operation. */
> > +#define nested_guest_cr4_valid nested_cr4_valid
> > +#define nested_host_cr4_valid nested_cr4_valid
>
> We should use cr0 and cr4 checks also in handle_vmon().
>
> I've applied this series to kvm/queue for early testing.
> Please send replacement patch or patch(es) on top of this series.
>
> Thanks.
>

2016-11-30 23:53:57

by David Matlack

Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation

On Wed, Nov 30, 2016 at 2:33 PM, Paolo Bonzini <[email protected]> wrote:
> ----- Original Message -----
>> From: "Radim Krčmář" <[email protected]>
>> To: "David Matlack" <[email protected]>
>> Cc: [email protected], [email protected], [email protected], [email protected]
>> Sent: Wednesday, November 30, 2016 10:52:35 PM
>> Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
>>
>> 2016-11-29 18:14-0800, David Matlack:
>> > KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
>> > all CR0 and CR4 bits are allowed to be 1 during VMX operation.
>> >
>> > This does not match real hardware, which disallows the high 32 bits of
>> > CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits
>> > which are defined in the SDM but missing according to CPUID). A guest
>> > can induce a VM-entry failure by setting these bits in GUEST_CR0 and
>> > GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are
>> > valid.
>> >
>> > Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
>> > checks on these registers do not verify must-be-0 bits. Fix these checks
>> > to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
>> >
>> > This patch should introduce no change in behavior in KVM, since these
>> > MSRs are still -1ULL.
>> >
>> > Signed-off-by: David Matlack <[email protected]>
>> > ---
>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> > @@ -4104,6 +4110,40 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
>> > +static bool nested_guest_cr0_valid(struct kvm_vcpu *vcpu, unsigned long
>> > val)
>> > +{
>> > + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
>> > + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
>> > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>> > +
>> > + if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high &
>> > + SECONDARY_EXEC_UNRESTRICTED_GUEST &&
>> > + nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
>> > + fixed0 &= ~(X86_CR0_PE | X86_CR0_PG);
>>
>> These bits also seem to be guaranteed in fixed1 ... complicated
>> dependencies.
>
> Bits that are set in fixed0 must be set in fixed1 too. Since patch 4
> always sets CR0_FIXED1 to all-ones (matching bare metal), this is okay.
>
>> There is another exception, SDM 26.3.1.1 (Checks on Guest Control
>> Registers, Debug Registers, and MSRs):
>>
>> Bit 29 (corresponding to CR0.NW) and bit 30 (CD) are never checked
>> because the values of these bits are not changed by VM entry; see
>> Section 26.3.2.1.
>
> Same here, we never check them anyway.
>
>> And another check:
>>
>> If bit 31 in the CR0 field (corresponding to PG) is 1, bit 0 in that
>> field (PE) must also be 1.
>
> This should not be a problem: a failed vmentry is reflected into L1
> anyway. The only concern is that we could have a more restrictive
> check than what the processor does.

I had the same thought when I was first writing this patch, Radim.
Maybe we should add a comment here. E.g.

/*
* CR0.PG && !CR0.PE is also invalid but caught by the CPU
* during VM-entry to L2.
*/

>
> Paolo
>
>> > +
>> > + return fixed_bits_valid(val, fixed0, fixed1);
>> > +}
>> > +
>> > +static bool nested_host_cr0_valid(struct kvm_vcpu *vcpu, unsigned long
>> > val)
>> > +{
>> > + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
>> > + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
>> > +
>> > + return fixed_bits_valid(val, fixed0, fixed1);
>> > +}
>> > +
>> > +static bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
>> > +{
>> > + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed0;
>> > + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed1;
>> > +
>> > + return fixed_bits_valid(val, fixed0, fixed1);
>> > +}
>> > +
>> > +/* No difference in the restrictions on guest and host CR4 in VMX
>> > operation. */
>> > +#define nested_guest_cr4_valid nested_cr4_valid
>> > +#define nested_host_cr4_valid nested_cr4_valid
>>
>> We should use cr0 and cr4 checks also in handle_vmon().
>>
>> I've applied this series to kvm/queue for early testing.
>> Please send replacement patch or patch(es) on top of this series.
>>
>> Thanks.
>>

2016-12-01 13:22:43

by Radim Krčmář

Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation

2016-11-30 15:53-0800, David Matlack:
> On Wed, Nov 30, 2016 at 2:33 PM, Paolo Bonzini <[email protected]> wrote:
>> ----- Original Message -----
>>> From: "Radim Krčmář" <[email protected]>
>>> To: "David Matlack" <[email protected]>
>>> Cc: [email protected], [email protected], [email protected], [email protected]
>>> Sent: Wednesday, November 30, 2016 10:52:35 PM
>>> Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
>>>
>>> 2016-11-29 18:14-0800, David Matlack:
>>> > KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
>>> > all CR0 and CR4 bits are allowed to be 1 during VMX operation.
>>> >
>>> > This does not match real hardware, which disallows the high 32 bits of
>>> > CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits
>>> > which are defined in the SDM but missing according to CPUID). A guest
>>> > can induce a VM-entry failure by setting these bits in GUEST_CR0 and
>>> > GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are
>>> > valid.
>>> >
>>> > Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
>>> > checks on these registers do not verify must-be-0 bits. Fix these checks
>>> > to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
>>> >
>>> > This patch should introduce no change in behavior in KVM, since these
>>> > MSRs are still -1ULL.
>>> >
>>> > Signed-off-by: David Matlack <[email protected]>
>>> > ---
>>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> > @@ -4104,6 +4110,40 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
>>> > +static bool nested_guest_cr0_valid(struct kvm_vcpu *vcpu, unsigned long
>>> > val)
>>> > +{
>>> > + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
>>> > + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
>>> > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>>> > +
>>> > + if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high &
>>> > + SECONDARY_EXEC_UNRESTRICTED_GUEST &&
>>> > + nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
>>> > + fixed0 &= ~(X86_CR0_PE | X86_CR0_PG);
>>>
>>> These bits also seem to be guaranteed in fixed1 ... complicated
>>> dependencies.
>>
>> Bits that are set in fixed0 must be set in fixed1 too. Since patch 4
>> always sets CR0_FIXED1 to all-ones (matching bare metal), this is okay.
>>
>>> There is another exception, SDM 26.3.1.1 (Checks on Guest Control
>>> Registers, Debug Registers, and MSRs):
>>>
>>> Bit 29 (corresponding to CR0.NW) and bit 30 (CD) are never checked
>>> because the values of these bits are not changed by VM entry; see
>>> Section 26.3.2.1.
>>
>> Same here, we never check them anyway.

True.

>>> And another check:
>>>
>>> If bit 31 in the CR0 field (corresponding to PG) is 1, bit 0 in that
>>> field (PE) must also be 1.
>>
>> This should not be a problem: a failed vmentry is reflected into L1
>> anyway. The only concern is that we could have a more restrictive
>> check than what the processor does.
>
> I had the same thought when I was first writing this patch, Radim.
> Maybe we should add a comment here. E.g.
>
> /*
> * CR0.PG && !CR0.PE is also invalid but caught by the CPU
> * during VM-entry to L2.
> */

Ah, no need, thanks. I expected that we'd want to kill L1 when a VM
entry failure happens, because it could have been L0's bug too ... we
also pass the failure on while checking host state, which is definitely
an L0 bug and shouldn't ever get to L1.

I'd like to assume that KVM is (going to be) sound, so both cases are
reasonable (just hindering development).