This patchset includes v2 of "KVM: nVMX: support restore of VMX capability
MSRs" (patch 1) as well as some additional related patches that came up
while preparing v2.
Patches 2 and 3 make KVM's emulation of MSR_IA32_VMX_CR{0,4}_FIXED1 more
accurate. Patch 4 fixes a bug in emulated VM-entry that came up when
testing patches 2 and 3.
Changes since v1:
* Support restoring less-capable versions of MSR_IA32_VMX_BASIC,
MSR_IA32_VMX_CR{0,4}_FIXED{0,1}.
* Include VMX_INS_OUTS in MSR_IA32_VMX_BASIC initial value.
David Matlack (4):
KVM: nVMX: support restore of VMX capability MSRs
KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
KVM: nVMX: accurate emulation of MSR_IA32_CR{0,4}_FIXED1
KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
arch/x86/include/asm/vmx.h | 31 ++++
arch/x86/kvm/vmx.c | 443 +++++++++++++++++++++++++++++++++++++++------
2 files changed, 421 insertions(+), 53 deletions(-)
--
2.8.0.rc3.226.g39d4020
The VMX capability MSRs advertise the set of features the KVM virtual
CPU can support. This set of features varies across different host CPUs
and KVM versions. This patch aims to address both sources of
difference, allowing VMs to be migrated across CPUs and KVM versions
without guest-visible changes to these MSRs. Note that cross-KVM-
version migration is only supported from this point forward.
When the VMX capability MSRs are restored, they are audited to check
that the set of features advertised is a subset of what KVM and the
CPU support.
Since the VMX capability MSRs are read-only, they do not need to be on
the default MSR save/restore lists. The userspace hypervisor can set
the values of these MSRs, or read them from KVM at VCPU creation time,
and restore the same values after every save/restore.
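A rough userspace-side sketch of that flow (illustrative only, not part
of this patch; it assumes a vcpu fd, and a real VMM would loop over all
of the VMX capability MSRs rather than just MSR_IA32_VMX_BASIC):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Architectural MSR index (SDM); not exported by <linux/kvm.h>. */
#define MSR_IA32_VMX_BASIC	0x480

struct one_msr {
	struct kvm_msrs hdr;
	struct kvm_msr_entry entry;
};

/* Read the MSR once, e.g. right after KVM_CREATE_VCPU... */
static int save_vmx_basic(int vcpu_fd, __u64 *val)
{
	struct one_msr m;

	memset(&m, 0, sizeof(m));
	m.hdr.nmsrs = 1;
	m.entry.index = MSR_IA32_VMX_BASIC;
	if (ioctl(vcpu_fd, KVM_GET_MSRS, &m.hdr) != 1)
		return -1;
	*val = m.entry.data;
	return 0;
}

/* ...and write the same value back after every save/restore. KVM
 * audits the restored value and fails the ioctl if it advertises
 * features that KVM and the host CPU cannot support. */
static int restore_vmx_basic(int vcpu_fd, __u64 val)
{
	struct one_msr m;

	memset(&m, 0, sizeof(m));
	m.hdr.nmsrs = 1;
	m.entry.index = MSR_IA32_VMX_BASIC;
	m.entry.data = val;
	return ioctl(vcpu_fd, KVM_SET_MSRS, &m.hdr) == 1 ? 0 : -1;
}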
Signed-off-by: David Matlack <[email protected]>
---
arch/x86/include/asm/vmx.h | 31 +++++
arch/x86/kvm/vmx.c | 317 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 324 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index a002b07..a4ca897 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -25,6 +25,7 @@
#define VMX_H
+#include <linux/bitops.h>
#include <linux/types.h>
#include <uapi/asm/vmx.h>
@@ -110,6 +111,36 @@
#define VMX_MISC_SAVE_EFER_LMA 0x00000020
#define VMX_MISC_ACTIVITY_HLT 0x00000040
+static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
+{
+ return vmx_basic & GENMASK_ULL(30, 0);
+}
+
+static inline u32 vmx_basic_vmcs_size(u64 vmx_basic)
+{
+ return (vmx_basic & GENMASK_ULL(44, 32)) >> 32;
+}
+
+static inline int vmx_misc_preemption_timer_rate(u64 vmx_misc)
+{
+ return vmx_misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+}
+
+static inline int vmx_misc_cr3_count(u64 vmx_misc)
+{
+ return (vmx_misc & GENMASK_ULL(24, 16)) >> 16;
+}
+
+static inline int vmx_misc_max_msr(u64 vmx_misc)
+{
+ return (vmx_misc & GENMASK_ULL(27, 25)) >> 25;
+}
+
+static inline int vmx_misc_mseg_revid(u64 vmx_misc)
+{
+ return (vmx_misc & GENMASK_ULL(63, 32)) >> 32;
+}
+
/* VMCS Encodings */
enum vmcs_field {
VIRTUAL_PROCESSOR_ID = 0x00000000,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5382b82..6ec3832 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -463,6 +463,12 @@ struct nested_vmx {
u32 nested_vmx_misc_high;
u32 nested_vmx_ept_caps;
u32 nested_vmx_vpid_caps;
+ u64 nested_vmx_basic;
+ u64 nested_vmx_cr0_fixed0;
+ u64 nested_vmx_cr0_fixed1;
+ u64 nested_vmx_cr4_fixed0;
+ u64 nested_vmx_cr4_fixed1;
+ u64 nested_vmx_vmcs_enum;
};
#define POSTED_INTR_ON 0
@@ -2829,6 +2835,36 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
VMX_MISC_ACTIVITY_HLT;
vmx->nested.nested_vmx_misc_high = 0;
+
+ /*
+ * This MSR reports some information about VMX support. We
+ * should return information about the VMX we emulate for the
+ * guest, and the VMCS structure we give it - not about the
+ * VMX support of the underlying hardware.
+ */
+ vmx->nested.nested_vmx_basic =
+ VMCS12_REVISION |
+ VMX_BASIC_TRUE_CTLS |
+ ((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) |
+ (VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT);
+
+ if (cpu_has_vmx_basic_inout())
+ vmx->nested.nested_vmx_basic |= VMX_BASIC_INOUT;
+
+ /*
+ * These MSRs specify bits which the guest must keep fixed (on or off)
+ * while L1 is in VMXON mode (in L1's root mode, or running an L2).
+ * We picked the standard core2 setting.
+ */
+#define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
+#define VMXON_CR4_ALWAYSON X86_CR4_VMXE
+ vmx->nested.nested_vmx_cr0_fixed0 = VMXON_CR0_ALWAYSON;
+ vmx->nested.nested_vmx_cr0_fixed1 = -1ULL;
+ vmx->nested.nested_vmx_cr4_fixed0 = VMXON_CR4_ALWAYSON;
+ vmx->nested.nested_vmx_cr4_fixed1 = -1ULL;
+
+ /* highest index: VMX_PREEMPTION_TIMER_VALUE */
+ vmx->nested.nested_vmx_vmcs_enum = 0x2e;
}
static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
@@ -2844,24 +2880,260 @@ static inline u64 vmx_control_msr(u32 low, u32 high)
return low | ((u64)high << 32);
}
-/* Returns 0 on success, non-0 otherwise. */
-static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
+static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
+{
+ superset &= mask;
+ subset &= mask;
+
+ return (superset | subset) == superset;
+}
+
+static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
+{
+ const u64 feature_and_reserved =
+ /* feature */
+ BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
+ /* reserved */
+ BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+ u64 vmx_basic = vmx->nested.nested_vmx_basic;
+
+ if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
+ return -EINVAL;
+
+ /*
+ * KVM does not emulate a version of VMX that constrains physical
+ * addresses of VMX structures (e.g. VMCS) to 32-bits.
+ */
+ if (data & BIT_ULL(48))
+ return -EINVAL;
+
+ if (vmx_basic_vmcs_revision_id(vmx_basic) !=
+ vmx_basic_vmcs_revision_id(data))
+ return -EINVAL;
+
+ if (vmx_basic_vmcs_size(vmx_basic) > vmx_basic_vmcs_size(data))
+ return -EINVAL;
+
+ vmx->nested.nested_vmx_basic = data;
+ return 0;
+}
+
+static int
+vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+{
+ u64 supported;
+ u32 *lowp, *highp;
+
+ switch (msr_index) {
+ case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
+ case MSR_IA32_VMX_PINBASED_CTLS:
+ lowp = &vmx->nested.nested_vmx_pinbased_ctls_low;
+ highp = &vmx->nested.nested_vmx_pinbased_ctls_high;
+ break;
+ case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+ lowp = &vmx->nested.nested_vmx_true_procbased_ctls_low;
+ highp = &vmx->nested.nested_vmx_procbased_ctls_high;
+ break;
+ case MSR_IA32_VMX_PROCBASED_CTLS:
+ lowp = &vmx->nested.nested_vmx_procbased_ctls_low;
+ highp = &vmx->nested.nested_vmx_procbased_ctls_high;
+ break;
+ case MSR_IA32_VMX_TRUE_EXIT_CTLS:
+ lowp = &vmx->nested.nested_vmx_true_exit_ctls_low;
+ highp = &vmx->nested.nested_vmx_exit_ctls_high;
+ break;
+ case MSR_IA32_VMX_EXIT_CTLS:
+ lowp = &vmx->nested.nested_vmx_exit_ctls_low;
+ highp = &vmx->nested.nested_vmx_exit_ctls_high;
+ break;
+ case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
+ lowp = &vmx->nested.nested_vmx_true_entry_ctls_low;
+ highp = &vmx->nested.nested_vmx_entry_ctls_high;
+ break;
+ case MSR_IA32_VMX_ENTRY_CTLS:
+ lowp = &vmx->nested.nested_vmx_entry_ctls_low;
+ highp = &vmx->nested.nested_vmx_entry_ctls_high;
+ break;
+ case MSR_IA32_VMX_PROCBASED_CTLS2:
+ lowp = &vmx->nested.nested_vmx_secondary_ctls_low;
+ highp = &vmx->nested.nested_vmx_secondary_ctls_high;
+ break;
+ default:
+ BUG();
+ }
+
+ supported = vmx_control_msr(*lowp, *highp);
+
+ /* Check must-be-1 bits are still 1. */
+ if (!is_bitwise_subset(data, supported, GENMASK_ULL(31, 0)))
+ return -EINVAL;
+
+ /* Check must-be-0 bits are still 0. */
+ if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
+ return -EINVAL;
+
+ *lowp = data;
+ *highp = data >> 32;
+ return 0;
+}
+
+static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
+{
+ const u64 feature_and_reserved_bits =
+ /* feature */
+ BIT_ULL(5) | GENMASK_ULL(8, 6) | BIT_ULL(14) | BIT_ULL(15) |
+ BIT_ULL(28) | BIT_ULL(29) | BIT_ULL(30) |
+ /* reserved */
+ GENMASK_ULL(13, 9) | BIT_ULL(31);
+ u64 vmx_misc;
+
+ vmx_misc = vmx_control_msr(vmx->nested.nested_vmx_misc_low,
+ vmx->nested.nested_vmx_misc_high);
+
+ if (!is_bitwise_subset(vmx_misc, data, feature_and_reserved_bits))
+ return -EINVAL;
+
+ if ((vmx->nested.nested_vmx_pinbased_ctls_high &
+ PIN_BASED_VMX_PREEMPTION_TIMER) &&
+ vmx_misc_preemption_timer_rate(data) !=
+ vmx_misc_preemption_timer_rate(vmx_misc))
+ return -EINVAL;
+
+ if (vmx_misc_cr3_count(data) > vmx_misc_cr3_count(vmx_misc))
+ return -EINVAL;
+
+ if (vmx_misc_max_msr(data) > vmx_misc_max_msr(vmx_misc))
+ return -EINVAL;
+
+ if (vmx_misc_mseg_revid(data) != vmx_misc_mseg_revid(vmx_misc))
+ return -EINVAL;
+
+ vmx->nested.nested_vmx_misc_low = data;
+ vmx->nested.nested_vmx_misc_high = data >> 32;
+ return 0;
+}
+
+static int vmx_restore_vmx_ept_vpid_cap(struct vcpu_vmx *vmx, u64 data)
+{
+ u64 vmx_ept_vpid_cap;
+
+ vmx_ept_vpid_cap = vmx_control_msr(vmx->nested.nested_vmx_ept_caps,
+ vmx->nested.nested_vmx_vpid_caps);
+
+ /* Every bit is either reserved or a feature bit. */
+ if (!is_bitwise_subset(vmx_ept_vpid_cap, data, -1ULL))
+ return -EINVAL;
+
+ vmx->nested.nested_vmx_ept_caps = data;
+ vmx->nested.nested_vmx_vpid_caps = data >> 32;
+ return 0;
+}
+
+static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+{
+ u64 *msr;
+
+ switch (msr_index) {
+ case MSR_IA32_VMX_CR0_FIXED0:
+ msr = &vmx->nested.nested_vmx_cr0_fixed0;
+ break;
+ case MSR_IA32_VMX_CR4_FIXED0:
+ msr = &vmx->nested.nested_vmx_cr4_fixed0;
+ break;
+ default:
+ BUG();
+ }
+
+ /*
+ * 1 bits (which indicate bits that "must-be-1" during VMX operation)
+ * must be 1 in the restored value.
+ */
+ if (!is_bitwise_subset(data, *msr, -1ULL))
+ return -EINVAL;
+
+ *msr = data;
+ return 0;
+}
+
+static int vmx_restore_fixed1_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
+{
+ u64 *msr;
+
+ switch (msr_index) {
+ case MSR_IA32_VMX_CR0_FIXED1:
+ msr = &vmx->nested.nested_vmx_cr0_fixed1;
+ break;
+ case MSR_IA32_VMX_CR4_FIXED1:
+ msr = &vmx->nested.nested_vmx_cr4_fixed1;
+ break;
+ default:
+ BUG();
+ }
+
+ /*
+ * 0 bits (which indicate bits that "must-be-0" during VMX operation)
+ * must be 0 in the restored value.
+ */
+ if (!is_bitwise_subset(*msr, data, -1ULL))
+ return -EINVAL;
+
+ *msr = data;
+ return 0;
+}
+
+/*
+ * Called when userspace is restoring VMX MSRs.
+ *
+ * Returns 0 on success, non-0 otherwise.
+ */
+static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
switch (msr_index) {
case MSR_IA32_VMX_BASIC:
+ return vmx_restore_vmx_basic(vmx, data);
+ case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
+ case MSR_IA32_VMX_PINBASED_CTLS:
+ case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+ case MSR_IA32_VMX_PROCBASED_CTLS:
+ case MSR_IA32_VMX_TRUE_EXIT_CTLS:
+ case MSR_IA32_VMX_EXIT_CTLS:
+ case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
+ case MSR_IA32_VMX_ENTRY_CTLS:
+ case MSR_IA32_VMX_PROCBASED_CTLS2:
+ return vmx_restore_control_msr(vmx, msr_index, data);
+ case MSR_IA32_VMX_MISC:
+ return vmx_restore_vmx_misc(vmx, data);
+ case MSR_IA32_VMX_CR0_FIXED0:
+ case MSR_IA32_VMX_CR4_FIXED0:
+ return vmx_restore_fixed0_msr(vmx, msr_index, data);
+ case MSR_IA32_VMX_CR0_FIXED1:
+ case MSR_IA32_VMX_CR4_FIXED1:
+ return vmx_restore_fixed1_msr(vmx, msr_index, data);
+ case MSR_IA32_VMX_EPT_VPID_CAP:
+ return vmx_restore_vmx_ept_vpid_cap(vmx, data);
+ case MSR_IA32_VMX_VMCS_ENUM:
+ vmx->nested.nested_vmx_vmcs_enum = data;
+ return 0;
+ default:
/*
- * This MSR reports some information about VMX support. We
- * should return information about the VMX we emulate for the
- * guest, and the VMCS structure we give it - not about the
- * VMX support of the underlying hardware.
+ * The rest of the VMX capability MSRs do not support restore.
*/
- *pdata = VMCS12_REVISION | VMX_BASIC_TRUE_CTLS |
- ((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) |
- (VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT);
- if (cpu_has_vmx_basic_inout())
- *pdata |= VMX_BASIC_INOUT;
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/* Returns 0 on success, non-0 otherwise. */
+static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ switch (msr_index) {
+ case MSR_IA32_VMX_BASIC:
+ *pdata = vmx->nested.nested_vmx_basic;
break;
case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
case MSR_IA32_VMX_PINBASED_CTLS:
@@ -2904,27 +3176,20 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
vmx->nested.nested_vmx_misc_low,
vmx->nested.nested_vmx_misc_high);
break;
- /*
- * These MSRs specify bits which the guest must keep fixed (on or off)
- * while L1 is in VMXON mode (in L1's root mode, or running an L2).
- * We picked the standard core2 setting.
- */
-#define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
-#define VMXON_CR4_ALWAYSON X86_CR4_VMXE
case MSR_IA32_VMX_CR0_FIXED0:
- *pdata = VMXON_CR0_ALWAYSON;
+ *pdata = vmx->nested.nested_vmx_cr0_fixed0;
break;
case MSR_IA32_VMX_CR0_FIXED1:
- *pdata = -1ULL;
+ *pdata = vmx->nested.nested_vmx_cr0_fixed1;
break;
case MSR_IA32_VMX_CR4_FIXED0:
- *pdata = VMXON_CR4_ALWAYSON;
+ *pdata = vmx->nested.nested_vmx_cr4_fixed0;
break;
case MSR_IA32_VMX_CR4_FIXED1:
- *pdata = -1ULL;
+ *pdata = vmx->nested.nested_vmx_cr4_fixed1;
break;
case MSR_IA32_VMX_VMCS_ENUM:
- *pdata = 0x2e; /* highest index: VMX_PREEMPTION_TIMER_VALUE */
+ *pdata = vmx->nested.nested_vmx_vmcs_enum;
break;
case MSR_IA32_VMX_PROCBASED_CTLS2:
*pdata = vmx_control_msr(
@@ -3107,7 +3372,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmx_leave_nested(vcpu);
break;
case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
- return 1; /* they are read-only */
+ if (!msr_info->host_initiated)
+ return 1; /* they are read-only */
+ if (!nested_vmx_allowed(vcpu))
+ return 1;
+ return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
if (!vmx_xsaves_supported())
return 1;
--
2.8.0.rc3.226.g39d4020
Set MSR_IA32_CR{0,4}_FIXED1 to match the CPU's MSRs.
In addition, MSR_IA32_CR4_FIXED1 should reflect the available CR4 bits
according to CPUID. Whenever guest CPUID is updated by userspace,
regenerate MSR_IA32_CR4_FIXED1 to match it.
Signed-off-by: David Matlack <[email protected]>
---
Note: "x86/cpufeature: Add User-Mode Instruction Prevention definitions" has
not hit kvm/master yet so the macros for X86_CR4_UMIP and X86_FEATURE_UMIP
are not available.
arch/x86/kvm/vmx.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a2a5ad8..ac5d9c0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2852,16 +2852,18 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
vmx->nested.nested_vmx_basic |= VMX_BASIC_INOUT;
/*
- * These MSRs specify bits which the guest must keep fixed (on or off)
+ * These MSRs specify bits which the guest must keep fixed on
* while L1 is in VMXON mode (in L1's root mode, or running an L2).
* We picked the standard core2 setting.
*/
#define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
#define VMXON_CR4_ALWAYSON X86_CR4_VMXE
vmx->nested.nested_vmx_cr0_fixed0 = VMXON_CR0_ALWAYSON;
- vmx->nested.nested_vmx_cr0_fixed1 = -1ULL;
vmx->nested.nested_vmx_cr4_fixed0 = VMXON_CR4_ALWAYSON;
- vmx->nested.nested_vmx_cr4_fixed1 = -1ULL;
+
+ /* These MSRs specify bits which the guest must keep fixed off. */
+ rdmsrl(MSR_IA32_VMX_CR0_FIXED1, vmx->nested.nested_vmx_cr0_fixed1);
+ rdmsrl(MSR_IA32_VMX_CR4_FIXED1, vmx->nested.nested_vmx_cr4_fixed1);
/* highest index: VMX_PREEMPTION_TIMER_VALUE */
vmx->nested.nested_vmx_vmcs_enum = 0x2e;
@@ -9580,6 +9582,49 @@ static void vmcs_set_secondary_exec_control(u32 new_ctl)
(new_ctl & ~mask) | (cur_ctl & mask));
}
+/*
+ * Generate MSR_IA32_VMX_CR4_FIXED1 according to CPUID. Only set bits
+ * (indicating "allowed-1") if they are supported in the guest's CPUID.
+ */
+static void nested_vmx_cr4_fixed1_update(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct kvm_cpuid_entry2 *entry;
+
+#define update(_cr4_mask, _reg, _cpuid_mask) do { \
+ if (entry && (entry->_reg & (_cpuid_mask))) \
+ vmx->nested.nested_vmx_cr4_fixed1 |= (_cr4_mask); \
+} while (0)
+
+ vmx->nested.nested_vmx_cr4_fixed1 = X86_CR4_PCE;
+
+ entry = kvm_find_cpuid_entry(vcpu, 0x1, 0);
+ update(X86_CR4_VME, edx, bit(X86_FEATURE_VME));
+ update(X86_CR4_PVI, edx, bit(X86_FEATURE_VME));
+ update(X86_CR4_TSD, edx, bit(X86_FEATURE_TSC));
+ update(X86_CR4_DE, edx, bit(X86_FEATURE_DE));
+ update(X86_CR4_PSE, edx, bit(X86_FEATURE_PSE));
+ update(X86_CR4_PAE, edx, bit(X86_FEATURE_PAE));
+ update(X86_CR4_MCE, edx, bit(X86_FEATURE_MCE));
+ update(X86_CR4_PGE, edx, bit(X86_FEATURE_PGE));
+ update(X86_CR4_OSFXSR, edx, bit(X86_FEATURE_FXSR));
+ update(X86_CR4_OSXMMEXCPT, edx, bit(X86_FEATURE_XMM));
+ update(X86_CR4_VMXE, ecx, bit(X86_FEATURE_VMX));
+ update(X86_CR4_SMXE, ecx, bit(X86_FEATURE_SMX));
+ update(X86_CR4_PCIDE, ecx, bit(X86_FEATURE_PCID));
+ update(X86_CR4_OSXSAVE, ecx, bit(X86_FEATURE_XSAVE));
+
+ entry = kvm_find_cpuid_entry(vcpu, 0x7, 0);
+ update(X86_CR4_FSGSBASE, ebx, bit(X86_FEATURE_FSGSBASE));
+ update(X86_CR4_SMEP, ebx, bit(X86_FEATURE_SMEP));
+ update(X86_CR4_SMAP, ebx, bit(X86_FEATURE_SMAP));
+ update(X86_CR4_PKE, ecx, bit(X86_FEATURE_PKU));
+ /* TODO: Use X86_CR4_UMIP and X86_FEATURE_UMIP macros */
+ update(bit(11), ecx, bit(2));
+
+#undef update
+}
+
static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -9621,6 +9666,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
else
to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
~FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
+
+ if (nested_vmx_allowed(vcpu))
+ nested_vmx_cr4_fixed1_update(vcpu);
}
static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
--
2.8.0.rc3.226.g39d4020
KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
all CR0 and CR4 bits are allowed to be 1 during VMX operation.
This does not match real hardware, which disallows the high 32 bits of
CR0 from being 1, and disallows reserved bits of CR4 from being 1
(including bits which are defined in the SDM but missing according to
CPUID). A guest can induce a VM-entry failure by setting these bits in
GUEST_CR0 and GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating
they are valid.
Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
checks on these registers do not verify must-be-0 bits. Fix these checks
to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
This patch should introduce no change in behavior in KVM, since these
MSRs are still -1ULL.
Signed-off-by: David Matlack <[email protected]>
---
arch/x86/kvm/vmx.c | 68 ++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 48 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6ec3832..a2a5ad8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4138,6 +4138,45 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
(unsigned long *)&vcpu->arch.regs_dirty);
}
+static bool fixed_bits_valid(u64 val, u64 fixed0, u64 fixed1)
+{
+ return ((val & fixed0) == fixed0) && ((~val & ~fixed1) == ~fixed1);
+}
+
+static bool nested_guest_cr0_valid(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
+ u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
+ struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+ if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high &
+ SECONDARY_EXEC_UNRESTRICTED_GUEST &&
+ nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
+ fixed0 &= ~(X86_CR0_PE | X86_CR0_PG);
+
+ return fixed_bits_valid(val, fixed0, fixed1);
+}
+
+static bool nested_host_cr0_valid(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
+ u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
+
+ return fixed_bits_valid(val, fixed0, fixed1);
+}
+
+static bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed0;
+ u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr4_fixed1;
+
+ return fixed_bits_valid(val, fixed0, fixed1);
+}
+
+/* No difference in the restrictions on guest and host CR4 in VMX operation. */
+#define nested_guest_cr4_valid nested_cr4_valid
+#define nested_host_cr4_valid nested_cr4_valid
+
static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
@@ -4266,8 +4305,8 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
if (!nested_vmx_allowed(vcpu))
return 1;
}
- if (to_vmx(vcpu)->nested.vmxon &&
- ((cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON))
+
+ if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
return 1;
vcpu->arch.cr4 = cr4;
@@ -5886,18 +5925,6 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
hypercall[2] = 0xc1;
}
-static bool nested_cr0_valid(struct kvm_vcpu *vcpu, unsigned long val)
-{
- unsigned long always_on = VMXON_CR0_ALWAYSON;
- struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-
- if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high &
- SECONDARY_EXEC_UNRESTRICTED_GUEST &&
- nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
- always_on &= ~(X86_CR0_PE | X86_CR0_PG);
- return (val & always_on) == always_on;
-}
-
/* called to set cr0 as appropriate for a mov-to-cr0 exit. */
static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
{
@@ -5916,7 +5943,7 @@ static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
val = (val & ~vmcs12->cr0_guest_host_mask) |
(vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask);
- if (!nested_cr0_valid(vcpu, val))
+ if (!nested_guest_cr0_valid(vcpu, val))
return 1;
if (kvm_set_cr0(vcpu, val))
@@ -5925,8 +5952,9 @@ static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
return 0;
} else {
if (to_vmx(vcpu)->nested.vmxon &&
- ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
+ !nested_host_cr0_valid(vcpu, val))
return 1;
+
return kvm_set_cr0(vcpu, val);
}
}
@@ -10472,15 +10500,15 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
return 1;
}
- if (((vmcs12->host_cr0 & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) ||
- ((vmcs12->host_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
+ if (!nested_host_cr0_valid(vcpu, vmcs12->host_cr0) ||
+ !nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) {
nested_vmx_failValid(vcpu,
VMXERR_ENTRY_INVALID_HOST_STATE_FIELD);
return 1;
}
- if (!nested_cr0_valid(vcpu, vmcs12->guest_cr0) ||
- ((vmcs12->guest_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
+ if (!nested_guest_cr0_valid(vcpu, vmcs12->guest_cr0) ||
+ !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)) {
nested_vmx_entry_failure(vcpu, vmcs12,
EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
return 1;
--
2.8.0.rc3.226.g39d4020
vmx_set_cr0() modifies GUEST_EFER and "IA-32e mode guest" in the current
VMCS. Call vmx_set_efer() after vmx_set_cr0() so that emulated VM-entry
is more faithful to VMCS12.
This patch correctly causes VM-entry to fail when "IA-32e mode guest" is
1 and GUEST_CR0.PG is 0. Previously this configuration would succeed and
"IA-32e mode guest" would silently be disabled by KVM.
Signed-off-by: David Matlack <[email protected]>
---
arch/x86/kvm/vmx.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ac5d9c0..86235fc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10418,15 +10418,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
nested_ept_init_mmu_context(vcpu);
}
- if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
- vcpu->arch.efer = vmcs12->guest_ia32_efer;
- else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
- vcpu->arch.efer |= (EFER_LMA | EFER_LME);
- else
- vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
- /* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
- vmx_set_efer(vcpu, vcpu->arch.efer);
-
/*
* This sets GUEST_CR0 to vmcs12->guest_cr0, with possibly a modified
* TS bit (for lazy fpu) and bits which we consider mandatory enabled.
@@ -10441,6 +10432,15 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmx_set_cr4(vcpu, vmcs12->guest_cr4);
vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(vmcs12));
+ if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
+ vcpu->arch.efer = vmcs12->guest_ia32_efer;
+ else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
+ vcpu->arch.efer |= (EFER_LMA | EFER_LME);
+ else
+ vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
+ /* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
+ vmx_set_efer(vcpu, vcpu->arch.efer);
+
/* shadow page tables on either EPT or shadow page tables */
kvm_set_cr3(vcpu, vmcs12->guest_cr3);
kvm_mmu_reset_context(vcpu);
--
2.8.0.rc3.226.g39d4020
> Set MSR_IA32_CR{0,4}_FIXED1 to match the CPU's MSRs.
>
> In addition, MSR_IA32_CR4_FIXED1 should reflect the available CR4 bits
> according to CPUID. Whenever guest CPUID is updated by userspace,
> regenerate MSR_IA32_CR4_FIXED1 to match it.
>
> Signed-off-by: David Matlack <[email protected]>
Oh, I thought userspace would do that! Doing it in KVM is fine as well,
but then do we need to give userspace access to CR{0,4}_FIXED{0,1} at all?
Paolo
> ---
> Note: "x86/cpufeature: Add User-Mode Instruction Prevention definitions" has
> not hit kvm/master yet so the macros for X86_CR4_UMIP and X86_FEATURE_UMIP
> are not available.
>
> arch/x86/kvm/vmx.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a2a5ad8..ac5d9c0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2852,16 +2852,18 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
> vmx->nested.nested_vmx_basic |= VMX_BASIC_INOUT;
>
> /*
> - * These MSRs specify bits which the guest must keep fixed (on or off)
> + * These MSRs specify bits which the guest must keep fixed on
> * while L1 is in VMXON mode (in L1's root mode, or running an L2).
> * We picked the standard core2 setting.
> */
> #define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
> #define VMXON_CR4_ALWAYSON X86_CR4_VMXE
> vmx->nested.nested_vmx_cr0_fixed0 = VMXON_CR0_ALWAYSON;
> - vmx->nested.nested_vmx_cr0_fixed1 = -1ULL;
> vmx->nested.nested_vmx_cr4_fixed0 = VMXON_CR4_ALWAYSON;
> - vmx->nested.nested_vmx_cr4_fixed1 = -1ULL;
> +
> + /* These MSRs specify bits which the guest must keep fixed off. */
> + rdmsrl(MSR_IA32_VMX_CR0_FIXED1, vmx->nested.nested_vmx_cr0_fixed1);
> + rdmsrl(MSR_IA32_VMX_CR4_FIXED1, vmx->nested.nested_vmx_cr4_fixed1);
>
> /* highest index: VMX_PREEMPTION_TIMER_VALUE */
> vmx->nested.nested_vmx_vmcs_enum = 0x2e;
> @@ -9580,6 +9582,49 @@ static void vmcs_set_secondary_exec_control(u32 new_ctl)
> (new_ctl & ~mask) | (cur_ctl & mask));
> }
>
> +/*
> + * Generate MSR_IA32_VMX_CR4_FIXED1 according to CPUID. Only set bits
> + * (indicating "allowed-1") if they are supported in the guest's CPUID.
> + */
> +static void nested_vmx_cr4_fixed1_update(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> + struct kvm_cpuid_entry2 *entry;
> +
> +#define update(_cr4_mask, _reg, _cpuid_mask) do { \
> + if (entry && (entry->_reg & (_cpuid_mask))) \
> + vmx->nested.nested_vmx_cr4_fixed1 |= (_cr4_mask); \
> +} while (0)
> +
> + vmx->nested.nested_vmx_cr4_fixed1 = X86_CR4_PCE;
> +
> + entry = kvm_find_cpuid_entry(vcpu, 0x1, 0);
> + update(X86_CR4_VME, edx, bit(X86_FEATURE_VME));
> + update(X86_CR4_PVI, edx, bit(X86_FEATURE_VME));
> + update(X86_CR4_TSD, edx, bit(X86_FEATURE_TSC));
> + update(X86_CR4_DE, edx, bit(X86_FEATURE_DE));
> + update(X86_CR4_PSE, edx, bit(X86_FEATURE_PSE));
> + update(X86_CR4_PAE, edx, bit(X86_FEATURE_PAE));
> + update(X86_CR4_MCE, edx, bit(X86_FEATURE_MCE));
> + update(X86_CR4_PGE, edx, bit(X86_FEATURE_PGE));
> + update(X86_CR4_OSFXSR, edx, bit(X86_FEATURE_FXSR));
> + update(X86_CR4_OSXMMEXCPT, edx, bit(X86_FEATURE_XMM));
> + update(X86_CR4_VMXE, ecx, bit(X86_FEATURE_VMX));
> + update(X86_CR4_SMXE, ecx, bit(X86_FEATURE_SMX));
> + update(X86_CR4_PCIDE, ecx, bit(X86_FEATURE_PCID));
> + update(X86_CR4_OSXSAVE, ecx, bit(X86_FEATURE_XSAVE));
> +
> + entry = kvm_find_cpuid_entry(vcpu, 0x7, 0);
> + update(X86_CR4_FSGSBASE, ebx, bit(X86_FEATURE_FSGSBASE));
> + update(X86_CR4_SMEP, ebx, bit(X86_FEATURE_SMEP));
> + update(X86_CR4_SMAP, ebx, bit(X86_FEATURE_SMAP));
> + update(X86_CR4_PKE, ecx, bit(X86_FEATURE_PKU));
> + /* TODO: Use X86_CR4_UMIP and X86_FEATURE_UMIP macros */
> + update(bit(11), ecx, bit(2));
> +
> +#undef update
> +}
> +
> static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> {
> struct kvm_cpuid_entry2 *best;
> @@ -9621,6 +9666,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> else
> to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
> ~FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
> +
> + if (nested_vmx_allowed(vcpu))
> + nested_vmx_cr4_fixed1_update(vcpu);
> }
>
> static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> --
> 2.8.0.rc3.226.g39d4020
>
>
On 23/11/2016 02:14, David Matlack wrote:
> +static bool fixed_bits_valid(u64 val, u64 fixed0, u64 fixed1)
> +{
> + return ((val & fixed0) == fixed0) && ((~val & ~fixed1) == ~fixed1);
> +}
> +
This is the same as vmx_control_verify (except with u64 arguments
instead of u32).
Paolo
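A standalone demo of that equivalence (hypothetical test code, not from
the series; control_verify only mirrors the shape of the u32 helper):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * (~val & ~fixed1) == ~fixed1 is, by De Morgan, (val | fixed1) == fixed1:
 * no bit of val may be set outside fixed1. Both predicates therefore
 * check fixed0/low <= val <= fixed1/high in the bitwise-subset sense.
 */
static bool fixed_bits_valid(uint64_t val, uint64_t fixed0, uint64_t fixed1)
{
	return ((val & fixed0) == fixed0) && ((~val & ~fixed1) == ~fixed1);
}

static bool control_verify(uint64_t control, uint64_t low, uint64_t high)
{
	return ((control & low) == low) && ((control | high) == high);
}

int main(void)
{
	/* Exhaustively compare the two forms over a small bit range. */
	for (uint64_t v = 0; v < 256; v++)
		assert(fixed_bits_valid(v, 0x21, 0xe3) ==
		       control_verify(v, 0x21, 0xe3));
	return 0;
}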
On 23/11/2016 02:14, David Matlack wrote:
> The VMX capability MSRs advertise the set of features the KVM virtual
> CPU can support. This set of features varies across different host CPUs
> and KVM versions. This patch aims to address both sources of
> difference, allowing VMs to be migrated across CPUs and KVM versions
> without guest-visible changes to these MSRs. Note that cross-KVM-
> version migration is only supported from this point forward.
>
> When the VMX capability MSRs are restored, they are audited to check
> that the set of features advertised is a subset of what KVM and the
> CPU support.
>
> Since the VMX capability MSRs are read-only, they do not need to be on
> the default MSR save/restore lists. The userspace hypervisor can set
> the values of these MSRs, or read them from KVM at VCPU creation time,
> and restore the same values after every save/restore.
>
> Signed-off-by: David Matlack <[email protected]>
> ---
> arch/x86/include/asm/vmx.h | 31 +++++
> arch/x86/kvm/vmx.c | 317 +++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 324 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index a002b07..a4ca897 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -25,6 +25,7 @@
> #define VMX_H
>
>
> +#include <linux/bitops.h>
> #include <linux/types.h>
> #include <uapi/asm/vmx.h>
>
> @@ -110,6 +111,36 @@
> #define VMX_MISC_SAVE_EFER_LMA 0x00000020
> #define VMX_MISC_ACTIVITY_HLT 0x00000040
>
> +static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
> +{
> + return vmx_basic & GENMASK_ULL(30, 0);
> +}
> +
> +static inline u32 vmx_basic_vmcs_size(u64 vmx_basic)
> +{
> + return (vmx_basic & GENMASK_ULL(44, 32)) >> 32;
> +}
> +
> +static inline int vmx_misc_preemption_timer_rate(u64 vmx_misc)
> +{
> + return vmx_misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
> +}
> +
> +static inline int vmx_misc_cr3_count(u64 vmx_misc)
> +{
> + return (vmx_misc & GENMASK_ULL(24, 16)) >> 16;
> +}
> +
> +static inline int vmx_misc_max_msr(u64 vmx_misc)
> +{
> + return (vmx_misc & GENMASK_ULL(27, 25)) >> 25;
> +}
> +
> +static inline int vmx_misc_mseg_revid(u64 vmx_misc)
> +{
> + return (vmx_misc & GENMASK_ULL(63, 32)) >> 32;
> +}
> +
> /* VMCS Encodings */
> enum vmcs_field {
> VIRTUAL_PROCESSOR_ID = 0x00000000,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5382b82..6ec3832 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -463,6 +463,12 @@ struct nested_vmx {
> u32 nested_vmx_misc_high;
> u32 nested_vmx_ept_caps;
> +/*
> + * Called when userspace is restoring VMX MSRs.
> + *
> + * Returns 0 on success, non-0 otherwise.
> + */
> +static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> switch (msr_index) {
> case MSR_IA32_VMX_BASIC:
> + return vmx_restore_vmx_basic(vmx, data);
> + case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> + case MSR_IA32_VMX_PINBASED_CTLS:
> + case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
> + case MSR_IA32_VMX_PROCBASED_CTLS:
> + case MSR_IA32_VMX_TRUE_EXIT_CTLS:
> + case MSR_IA32_VMX_EXIT_CTLS:
> + case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
> + case MSR_IA32_VMX_ENTRY_CTLS:
PINBASED_CTLS, PROCBASED_CTLS, EXIT_CTLS and ENTRY_CTLS can be derived
from their "true" counterparts, so I think it's better to remove the
"non-true" ones from struct nested_vmx (and/or add the "true" ones when
missing) and make them entirely computed. But it can be done on top.
Paolo
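A minimal sketch of that computation (hypothetical helper built on
vmx_control_msr(); it assumes the *_ALWAYSON_WITHOUT_TRUE_MSR masks KVM
defines for exactly this default1 class of bits):

/*
 * Compute a "default" (non-true) capability MSR from its "true"
 * counterpart: the default1 bits read as 1 in the allowed-0 (low)
 * word of the non-true MSR, so fold them in; the allowed-1 (high)
 * word is unchanged.
 */
static u64 nested_vmx_default_msr(u32 true_low, u32 true_high, u32 default1)
{
	return vmx_control_msr(true_low | default1, true_high);
}

/*
 * e.g. for MSR_IA32_VMX_PINBASED_CTLS:
 *	nested_vmx_default_msr(vmx->nested.nested_vmx_pinbased_ctls_low,
 *			       vmx->nested.nested_vmx_pinbased_ctls_high,
 *			       PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR);
 */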
> + case MSR_IA32_VMX_PROCBASED_CTLS2:
> + return vmx_restore_control_msr(vmx, msr_index, data);
> + case MSR_IA32_VMX_MISC:
> + return vmx_restore_vmx_misc(vmx, data);
> + case MSR_IA32_VMX_CR0_FIXED0:
> + case MSR_IA32_VMX_CR4_FIXED0:
> + return vmx_restore_fixed0_msr(vmx, msr_index, data);
> + case MSR_IA32_VMX_CR0_FIXED1:
> + case MSR_IA32_VMX_CR4_FIXED1:
> + return vmx_restore_fixed1_msr(vmx, msr_index, data);
> + case MSR_IA32_VMX_EPT_VPID_CAP:
> + return vmx_restore_vmx_ept_vpid_cap(vmx, data);
> + case MSR_IA32_VMX_VMCS_ENUM:
> + vmx->nested.nested_vmx_vmcs_enum = data;
> + return 0;
> + default:
> /*
> - * This MSR reports some information about VMX support. We
> - * should return information about the VMX we emulate for the
> - * guest, and the VMCS structure we give it - not about the
> - * VMX support of the underlying hardware.
> + * The rest of the VMX capability MSRs do not support restore.
> */
> - *pdata = VMCS12_REVISION | VMX_BASIC_TRUE_CTLS |
> - ((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) |
> - (VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT);
> - if (cpu_has_vmx_basic_inout())
> - *pdata |= VMX_BASIC_INOUT;
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +/* Returns 0 on success, non-0 otherwise. */
> +static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> +{
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> + switch (msr_index) {
> + case MSR_IA32_VMX_BASIC:
> + *pdata = vmx->nested.nested_vmx_basic;
> break;
> case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> case MSR_IA32_VMX_PINBASED_CTLS:
> @@ -2904,27 +3176,20 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> vmx->nested.nested_vmx_misc_low,
> vmx->nested.nested_vmx_misc_high);
> break;
> - /*
> - * These MSRs specify bits which the guest must keep fixed (on or off)
> - * while L1 is in VMXON mode (in L1's root mode, or running an L2).
> - * We picked the standard core2 setting.
> - */
> -#define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
> -#define VMXON_CR4_ALWAYSON X86_CR4_VMXE
> case MSR_IA32_VMX_CR0_FIXED0:
> - *pdata = VMXON_CR0_ALWAYSON;
> + *pdata = vmx->nested.nested_vmx_cr0_fixed0;
> break;
> case MSR_IA32_VMX_CR0_FIXED1:
> - *pdata = -1ULL;
> + *pdata = vmx->nested.nested_vmx_cr0_fixed1;
> break;
> case MSR_IA32_VMX_CR4_FIXED0:
> - *pdata = VMXON_CR4_ALWAYSON;
> + *pdata = vmx->nested.nested_vmx_cr4_fixed0;
> break;
> case MSR_IA32_VMX_CR4_FIXED1:
> - *pdata = -1ULL;
> + *pdata = vmx->nested.nested_vmx_cr4_fixed1;
> break;
> case MSR_IA32_VMX_VMCS_ENUM:
> - *pdata = 0x2e; /* highest index: VMX_PREEMPTION_TIMER_VALUE */
> + *pdata = vmx->nested.nested_vmx_vmcs_enum;
> break;
> case MSR_IA32_VMX_PROCBASED_CTLS2:
> *pdata = vmx_control_msr(
> @@ -3107,7 +3372,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vmx_leave_nested(vcpu);
> break;
> case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
> - return 1; /* they are read-only */
> + if (!msr_info->host_initiated)
> + return 1; /* they are read-only */
> + if (!nested_vmx_allowed(vcpu))
> + return 1;
> + return vmx_set_vmx_msr(vcpu, msr_index, data);
> case MSR_IA32_XSS:
> if (!vmx_xsaves_supported())
> return 1;
>
On 23/11/2016 02:14, David Matlack wrote:
> This patchset includes v2 of "KVM: nVMX: support restore of VMX capability
> MSRs" (patch 1) as well as some additional related patches that came up
> while preparing v2.
>
> Patches 2 and 3 make KVM's emulation of MSR_IA32_VMX_CR{0,4}_FIXED1 more
> accurate. Patch 4 fixes a bug in emulated VM-entry that came up when
> testing patches 2 and 3.
>
> Changes since v1:
> * Support restoring less-capable versions of MSR_IA32_VMX_BASIC,
> MSR_IA32_VMX_CR{0,4}_FIXED{0,1}.
> * Include VMX_INS_OUTS in MSR_IA32_VMX_BASIC initial value.
>
> David Matlack (4):
> KVM: nVMX: support restore of VMX capability MSRs
> KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
> KVM: nVMX: accurate emulation of MSR_IA32_CR{0,4}_FIXED1
> KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
>
> arch/x86/include/asm/vmx.h | 31 ++++
> arch/x86/kvm/vmx.c | 443 +++++++++++++++++++++++++++++++++++++++------
> 2 files changed, 421 insertions(+), 53 deletions(-)
>
The main question is whether patches 2-3 actually make
vmx_restore_fixed0/1_msr unnecessary, otherwise looks great.
It would be nice to have a testcase for patch 4, since it could go in
independently.
Paolo
On Wed, Nov 23, 2016 at 1:06 AM, Paolo Bonzini <[email protected]> wrote:
>
>> Set MSR_IA32_CR{0,4}_FIXED1 to match the CPU's MSRs.
>>
>> In addition, MSR_IA32_CR4_FIXED1 should reflect the available CR4 bits
>> according to CPUID. Whenever guest CPUID is updated by userspace,
>> regenerate MSR_IA32_CR4_FIXED1 to match it.
>>
>> Signed-off-by: David Matlack <[email protected]>
>
> Oh, I thought userspace would do that! Doing it in KVM is fine as well,
> but then do we need to give userspace access to CR{0,4}_FIXED{0,1} at all?
I think it should be safe for userspace to skip restoring CR4_FIXED1,
since it is 100% generated based on CPUID. But I'd prefer to keep it
accessible from userspace, for consistency with the other VMX MSRs and
for flexibility. The auditing should ensure userspace doesn't restore
a CR4_FIXED1 that is inconsistent with CPUID.
Userspace should restore CR0_FIXED1 in case future CPUs change which
bits of CR0 are valid in VMX operation. Userspace should also restore
CR{0,4}_FIXED0 so we have the flexibility to change the defaults in
KVM. Both of these situations seem unlikely but we might as well play
it safe, the cost is small.
>
> Paolo
>
>> ---
>> Note: "x86/cpufeature: Add User-Mode Instruction Prevention definitions" has
>> not hit kvm/master yet so the macros for X86_CR4_UMIP and X86_FEATURE_UMIP
>> are not available.
>>
>> arch/x86/kvm/vmx.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 51 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index a2a5ad8..ac5d9c0 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2852,16 +2852,18 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>> vmx->nested.nested_vmx_basic |= VMX_BASIC_INOUT;
>>
>> /*
>> - * These MSRs specify bits which the guest must keep fixed (on or off)
>> + * These MSRs specify bits which the guest must keep fixed on
>> * while L1 is in VMXON mode (in L1's root mode, or running an L2).
>> * We picked the standard core2 setting.
>> */
>> #define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
>> #define VMXON_CR4_ALWAYSON X86_CR4_VMXE
>> vmx->nested.nested_vmx_cr0_fixed0 = VMXON_CR0_ALWAYSON;
>> - vmx->nested.nested_vmx_cr0_fixed1 = -1ULL;
>> vmx->nested.nested_vmx_cr4_fixed0 = VMXON_CR4_ALWAYSON;
>> - vmx->nested.nested_vmx_cr4_fixed1 = -1ULL;
>> +
>> + /* These MSRs specify bits which the guest must keep fixed off. */
>> + rdmsrl(MSR_IA32_VMX_CR0_FIXED1, vmx->nested.nested_vmx_cr0_fixed1);
>> + rdmsrl(MSR_IA32_VMX_CR4_FIXED1, vmx->nested.nested_vmx_cr4_fixed1);
>>
>> /* highest index: VMX_PREEMPTION_TIMER_VALUE */
>> vmx->nested.nested_vmx_vmcs_enum = 0x2e;
>> @@ -9580,6 +9582,49 @@ static void vmcs_set_secondary_exec_control(u32 new_ctl)
>> (new_ctl & ~mask) | (cur_ctl & mask));
>> }
>>
>> +/*
>> + * Generate MSR_IA32_VMX_CR4_FIXED1 according to CPUID. Only set bits
>> + * (indicating "allowed-1") if they are supported in the guest's CPUID.
>> + */
>> +static void nested_vmx_cr4_fixed1_update(struct kvm_vcpu *vcpu)
>> +{
>> + struct vcpu_vmx *vmx = to_vmx(vcpu);
>> + struct kvm_cpuid_entry2 *entry;
>> +
>> +#define update(_cr4_mask, _reg, _cpuid_mask) do { \
>> + if (entry && (entry->_reg & (_cpuid_mask))) \
>> + vmx->nested.nested_vmx_cr4_fixed1 |= (_cr4_mask); \
>> +} while (0)
>> +
>> + vmx->nested.nested_vmx_cr4_fixed1 = X86_CR4_PCE;
>> +
>> + entry = kvm_find_cpuid_entry(vcpu, 0x1, 0);
>> + update(X86_CR4_VME, edx, bit(X86_FEATURE_VME));
>> + update(X86_CR4_PVI, edx, bit(X86_FEATURE_VME));
>> + update(X86_CR4_TSD, edx, bit(X86_FEATURE_TSC));
>> + update(X86_CR4_DE, edx, bit(X86_FEATURE_DE));
>> + update(X86_CR4_PSE, edx, bit(X86_FEATURE_PSE));
>> + update(X86_CR4_PAE, edx, bit(X86_FEATURE_PAE));
>> + update(X86_CR4_MCE, edx, bit(X86_FEATURE_MCE));
>> + update(X86_CR4_PGE, edx, bit(X86_FEATURE_PGE));
>> + update(X86_CR4_OSFXSR, edx, bit(X86_FEATURE_FXSR));
>> + update(X86_CR4_OSXMMEXCPT, edx, bit(X86_FEATURE_XMM));
>> + update(X86_CR4_VMXE, ecx, bit(X86_FEATURE_VMX));
>> + update(X86_CR4_SMXE, ecx, bit(X86_FEATURE_SMX));
>> + update(X86_CR4_PCIDE, ecx, bit(X86_FEATURE_PCID));
>> + update(X86_CR4_OSXSAVE, ecx, bit(X86_FEATURE_XSAVE));
>> +
>> + entry = kvm_find_cpuid_entry(vcpu, 0x7, 0);
>> + update(X86_CR4_FSGSBASE, ebx, bit(X86_FEATURE_FSGSBASE));
>> + update(X86_CR4_SMEP, ebx, bit(X86_FEATURE_SMEP));
>> + update(X86_CR4_SMAP, ebx, bit(X86_FEATURE_SMAP));
>> + update(X86_CR4_PKE, ecx, bit(X86_FEATURE_PKU));
>> + /* TODO: Use X86_CR4_UMIP and X86_FEATURE_UMIP macros */
>> + update(bit(11), ecx, bit(2));
>> +
>> +#undef update
>> +}
>> +
>> static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
>> {
>> struct kvm_cpuid_entry2 *best;
>> @@ -9621,6 +9666,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
>> else
>> to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
>> ~FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
>> +
>> + if (nested_vmx_allowed(vcpu))
>> + nested_vmx_cr4_fixed1_update(vcpu);
>> }
>>
>> static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
>> --
>> 2.8.0.rc3.226.g39d4020
>>
>>
On 23/11/2016 20:16, David Matlack wrote:
> > Oh, I thought userspace would do that! Doing it in KVM is fine as well,
> > but then do we need to give userspace access to CR{0,4}_FIXED{0,1} at all?
>
> I think it should be safe for userspace to skip restoring CR4_FIXED1,
> since it is 100% generated based on CPUID. But I'd prefer to keep it
> accessible from userspace, for consistency with the other VMX MSRs and
> for flexibility. The auditing should ensure userspace doesn't restore
> a CR4_FIXED1 that is inconsistent with CPUID.
Or would it just allow userspace to put anything into it, even if it's
inconsistent with CPUID, as long as it's consistent with the host?
> Userspace should restore CR0_FIXED1 in case future CPUs change which
> bits of CR0 are valid in VMX operation. Userspace should also restore
> CR{0,4}_FIXED0 so we have the flexibility to change the defaults in
> KVM. Both of these situations seem unlikely but we might as well play
> it safe, the cost is small.
I disagree, there is always a cost. Besides the fact that it's
unlikely that there'll be any future CR0 bits at all, any changes would
most likely be keyed by a new CPUID bit (the same as CR4) or execution
control (the same as unrestricted guest).
In the end, since we assume that userspace (any) has no idea of what to
do with it, I see no good reason to make the MSRs available.
Paolo
On Wed, Nov 23, 2016 at 11:24 AM, Paolo Bonzini <[email protected]> wrote:
>
>
> On 23/11/2016 20:16, David Matlack wrote:
>> > Oh, I thought userspace would do that! Doing it in KVM is fine as well,
>> > but then do we need to give userspace access to CR{0,4}_FIXED{0,1} at all?
>>
>> I think it should be safe for userspace to skip restoring CR4_FIXED1,
>> since it is 100% generated based on CPUID. But I'd prefer to keep it
>> accessible from userspace, for consistency with the other VMX MSRs and
>> for flexibility. The auditing should ensure userspace doesn't restore
>> a CR4_FIXED1 that is inconsistent with CPUID.
>
> Or would it just allow userspace to put anything into it, even if it's
> inconsistent with CPUID, as long as it's consistent with the host?
It would not allow anything inconsistent with guest CPUID. The
auditing on restore of CR4_FIXED1 compares the new value with
vmx->nested.nested_vmx_cr4_fixed1, which is updated as part of setting
the guest's CPUID.
>
>> Userspace should restore CR0_FIXED1 in case future CPUs change which
>> bits of CR0 are valid in VMX operation. Userspace should also restore
>> CR{0,4}_FIXED0 so we have the flexibility to change the defaults in
>> KVM. Both of these situations seem unlikely but we might as well play
>> it safe, the cost is small.
>
> I disagree, there is always a cost. Besides the fact that it's
> unlikely that there'll be any future CR0 bits at all, any changes would
> most likely be keyed by a new CPUID bit (the same as CR4) or execution
> control (the same as unrestricted guest).
That's true. So CR0_FIXED1 would not need to be accessible from
userspace either. This patch would need to be a little different then:
vmx_cpuid_update should also update vmx->nested.nested_vmx_cr0_fixed1
to 0xffffffff.
A downside of this scheme is we'd have to remember to update
nested_vmx_cr4_fixed1_update() before giving VMs new CPUID bits. If we
forget, a VM could end up with different values for CR{0,4}_FIXED0 for
the same CPUID depending on which version of KVM you're running on.
Hm, now I'm thinking you were right in the beginning. Userspace should
generate CR{0,4}_FIXED1, not the kernel. And KVM should allow
userspace to save/restore them.
>
> In the end, since we assume that userspace (any) has no idea of what to
> do with it, I see no good reason to make the MSRs available.
>
> Paolo
On 23/11/2016 23:07, David Matlack wrote:
> A downside of this scheme is we'd have to remember to update
> nested_vmx_cr4_fixed1_update() before giving VMs new CPUID bits. If we
> forget, a VM could end up with different values for CR{0,4}_FIXED0 for
> the same CPUID depending on which version of KVM you're running on.
If userspace doesn't obey KVM_GET_SUPPORTED_CPUID, all bets are off
anyway, so I don't think it's a big deal. However, if you want to make
it generated by userspace, that would be fine as well! That would
simply entail removing this patch, wouldn't it?
Paolo
> Hm, now I'm thinking you were right in the beginning. Userspace should
> generate CR{0,4}_FIXED1, not the kernel. And KVM should allow
> userspace to save/restore them.
On Wed, Nov 23, 2016 at 2:11 PM, Paolo Bonzini <[email protected]> wrote:
> On 23/11/2016 23:07, David Matlack wrote:
>> A downside of this scheme is we'd have to remember to update
>> nested_vmx_cr4_fixed1_update() before giving VMs new CPUID bits. If we
>> forget, a VM could end up with different values for CR{0,4}_FIXED0 for
>> the same CPUID depending on which version of KVM you're running on.
>
> If userspace doesn't obey KVM_GET_SUPPORTED_CPUID, all bets are off
> anyway, so I don't think it's a big deal. However, if you want to make
> it generated by userspace, that would be fine as well!
Ok let's generate them in userspace.
> That would simply entail removing this patch, wouldn't it?
Mostly. The first half of the patch (initialize from host MSRs) should stay.
On Wed, Nov 23, 2016 at 3:45 AM, Paolo Bonzini <[email protected]> wrote:
>
> On 23/11/2016 02:14, David Matlack wrote:
>> This patchset includes v2 of "KVM: nVMX: support restore of VMX capability
>> MSRs" (patch 1) as well as some additional related patches that came up
>> while preparing v2.
>>
>> Patches 2 and 3 make KVM's emulation of MSR_IA32_VMX_CR{0,4}_FIXED1 more
>> accurate. Patch 4 fixes a bug in emulated VM-entry that came up when
>> testing patches 2 and 3.
>>
>> Changes since v1:
>> * Support restoring less-capable versions of MSR_IA32_VMX_BASIC,
>> MSR_IA32_VMX_CR{0,4}_FIXED{0,1}.
>> * Include VMX_INS_OUTS in MSR_IA32_VMX_BASIC initial value.
>>
>> David Matlack (4):
>> KVM: nVMX: support restore of VMX capability MSRs
>> KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
>> KVM: nVMX: accurate emulation of MSR_IA32_CR{0,4}_FIXED1
>> KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
>>
>> arch/x86/include/asm/vmx.h | 31 ++++
>> arch/x86/kvm/vmx.c | 443 +++++++++++++++++++++++++++++++++++++++------
>> 2 files changed, 421 insertions(+), 53 deletions(-)
>>
>
> The main question is whether patches 2-3 actually make
> vmx_restore_fixed0/1_msr unnecessary, otherwise looks great.
>
> It would be nice to have a testcase for patch 4, since it could go in
> independently.
I've got a kvm-unit-test testcase for patches 2-4 but unfortunately it
depends on changes we've made internally to the kvm-unit-tests, and
we're a bit behind on getting those upstreamed.
>
> Paolo
On Wed, Nov 23, 2016 at 3:31 AM, Paolo Bonzini <[email protected]> wrote:
> On 23/11/2016 02:14, David Matlack wrote:
>> +static bool fixed_bits_valid(u64 val, u64 fixed0, u64 fixed1)
>> +{
>> + return ((val & fixed0) == fixed0) && ((~val & ~fixed1) == ~fixed1);
>> +}
>> +
>
> This is the same as vmx_control_verify (except with u64 arguments
> instead of u32).
Good point. I'll remove this duplication in v3.
On Wed, Nov 23, 2016 at 3:44 AM, Paolo Bonzini <[email protected]> wrote:
> On 23/11/2016 02:14, David Matlack wrote:
>> switch (msr_index) {
>> case MSR_IA32_VMX_BASIC:
>> + return vmx_restore_vmx_basic(vmx, data);
>> + case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
>> + case MSR_IA32_VMX_PINBASED_CTLS:
>> + case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
>> + case MSR_IA32_VMX_PROCBASED_CTLS:
>> + case MSR_IA32_VMX_TRUE_EXIT_CTLS:
>> + case MSR_IA32_VMX_EXIT_CTLS:
>> + case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
>> + case MSR_IA32_VMX_ENTRY_CTLS:
>
> PINBASED_CTLS, PROCBASED_CTLS, EXIT_CTLS and ENTRY_CTLS can be derived
> from their "true" counterparts, so I think it's better to remove the
> "non-true" ones from struct nested_vmx (and/or add the "true" ones when
> missing) and make them entirely computed. But it can be done on top.
Good point. And that would mean userspace does not need to restore the
non-true MSRs, right? KVM does not emulate MSR_IA32_VMX_BASIC[55]=0,
and will probably never want to.
On Wed, Nov 23, 2016 at 3:28 PM, David Matlack <[email protected]> wrote:
> On Wed, Nov 23, 2016 at 2:11 PM, Paolo Bonzini <[email protected]> wrote:
>> On 23/11/2016 23:07, David Matlack wrote:
>>> A downside of this scheme is we'd have to remember to update
>>> nested_vmx_cr4_fixed1_update() before giving VMs new CPUID bits. If we
>>> forget, a VM could end up with different values for CR{0,4}_FIXED0 for
>>> the same CPUID depending on which version of KVM you're running on.
I've realized my concern here doesn't make sense. Such a VM would
likely fail to enter VMX operation, or #GP (unexpectedly) at some
point later. Linux, for example, does not appear to consult
MSR_IA32_VMX_CR4_FIXED1 when determining which bits of CR4 it can use
(regardless of whether it is in VMX operation or not).
>>
>> If userspace doesn't obey KVM_GET_SUPPORTED_CPUID, all bets are off
>> anyway, so I don't think it's a big deal. However, if you want to make
>> it generated by userspace, that would be fine as well!
>
> Ok let's generate them in userspace.
I'm more inclined to generate them in the kernel, given the above.
On 28/11/2016 22:11, David Matlack wrote:
> > PINBASED_CTLS, PROCBASED_CTLS, EXIT_CTLS and ENTRY_CTLS can be derived
> > from their "true" counterparts, so I think it's better to remove the
> > "non-true" ones from struct nested_vmx (and/or add the "true" ones when
> > missing) and make them entirely computed. But it can be done on top.
>
> Good point. And that would mean userspace does not need to restore the
> non-true MSRs, right?
Yes, sorry for being a bit too concise. :)
> KVM does not emulate MSR_IA32_VMX_BASIC[55]=0,
> and will probably never want to.
That's a separate question, MSR_IA32_VMX_BASIC[55]=0 basically means
that the "true" capabilities are the same as the "default" capabilities.
If userspace wanted to set it that way, KVM right now would not hide
the "true" capability MSR, but on the other hand the nested hypervisor
should not even notice the difference.
Paolo
On Mon, Nov 28, 2016 at 2:48 PM, Paolo Bonzini <[email protected]> wrote:
> On 28/11/2016 22:11, David Matlack wrote:
>> > PINBASED_CTLS, PROCBASED_CTLS, EXIT_CTLS and ENTRY_CTLS can be derived
>> > from their "true" counterparts, so I think it's better to remove the
>> > "non-true" ones from struct nested_vmx (and/or add the "true" ones when
>> > missing) and make them entirely computed. But it can be done on top.
>>
>> Good point. And that would mean userspace does not need to restore the
>> non-true MSRs, right?
>
> Yes, sorry for being a bit too concise. :)
I'll include this cleanup in the next version of the patchset since it
affects which MSRs userspace will restore. It looks like a pretty
simple patch.
>
>> KVM does not emulate MSR_IA32_VMX_BASIC[55]=0,
>> and will probably never want to.
>
> That's a separate question, MSR_IA32_VMX_BASIC[55]=0 basically means
> that the "true" capabilities are the same as the "default" capabilities.
> If userspace wanted to set it that way, KVM right now would not hide
> the "true" capability MSR, but on the other hand the nested hypervisor
> should not even notice the difference.
KVM would also need to use the non-true MSR in place of the true MSRs
when checking VMCS12 during VM-entry.
>
> Paolo
> On Mon, Nov 28, 2016 at 2:48 PM, Paolo Bonzini <[email protected]> wrote:
> > On 28/11/2016 22:11, David Matlack wrote:
> >> > PINBASED_CTLS, PROCBASED_CTLS, EXIT_CTLS and ENTRY_CTLS can be derived
> >> > from their "true" counterparts, so I think it's better to remove the
> >> > "non-true" ones from struct nested_vmx (and/or add the "true" ones when
> >> > missing) and make them entirely computed. But it can be done on top.
> >>
> >> Good point. And that would mean userspace does not need to restore the
> >> non-true MSRs, right?
> >
> > Yes, sorry for being a bit too concise. :)
>
> I'll include this cleanup in the next version of the patchset since it
> affects which MSRs userspace will restore. It looks like a pretty
> simple patch.
Don't bother removing the "non-true" registers from nested_vmx; you only
need to adjust the userspace API.
> >
> >> KVM does not emulate MSR_IA32_VMX_BASIC[55]=0,
> >> and will probably never want to.
> >
> > That's a separate question, MSR_IA32_VMX_BASIC[55]=0 basically means
> > that the "true" capabilities are the same as the "default" capabilities.
> > If userspace wanted to set it that way, KVM right now would not hide
> > the "true" capability MSR, but on the other hand the nested hypervisor
> > should not even notice the difference.
>
> KVM would also need to use the non-true MSR in place of the true MSRs
> when checking VMCS12 during VM-entry.
It's not necessary, userspace would set the relevant bits to 1 in the true
MSRs, for both the low and high parts. If it doesn't, it's garbage in
garbage out.
Paolo
On Tue, Nov 29, 2016 at 12:01 AM, Paolo Bonzini <[email protected]> wrote:
>> On Mon, Nov 28, 2016 at 2:48 PM, Paolo Bonzini <[email protected]> wrote:
>> > On 28/11/2016 22:11, David Matlack wrote:
>> >> > PINBASED_CTLS, PROCBASED_CTLS, EXIT_CTLS and ENTRY_CTLS can be derived
>> >> > from their "true" counterparts, so I think it's better to remove the
>> >> > "non-true" ones from struct nested_vmx (and/or add the "true" ones when
>> >> > missing) and make them entirely computed. But it can be done on top.
>> >>
>> >> Good point. And that would mean userspace does not need to restore the
>> >> non-true MSRs, right?
>> >
>> > Yes, sorry for being a bit too concise. :)
>>
>> I'll include this cleanup in the next version of the patchset since it
>> affects which MSRs userspace will restore. It looks like a pretty
>> simple patch.
>
> Don't bother removing the "non-true" registers from nested_vmx; you only
> need to adjust the userspace API.
I already wrote the patch, so unless there's an argument against
removing them I'll include it in the next patchset. Thanks!
>
>> >
>> >> KVM does not emulate MSR_IA32_VMX_BASIC[55]=0,
>> >> and will probably never want to.
>> >
>> > That's a separate question, MSR_IA32_VMX_BASIC[55]=0 basically means
>> > that the "true" capabilities are the same as the "default" capabilities.
>> > If userspace wanted to set it that way, KVM right now would not hide
>> > the "true" capability MSR, but on the other hand the nested hypervisor
>> > should not even notice the difference.
>>
>> KVM would also need to use the non-true MSR in place of the true MSRs
>> when checking VMCS12 during VM-entry.
>
> It's not necessary, userspace would set the relevant bits to 1 in the true
> MSRs, for both the low and high parts. If it doesn't, it's garbage in
> garbage out.
>
> Paolo