2022-07-14 09:28:10

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 00/25] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs

Changes since v3 [Max]:
- Fix swapped encls_exiting_bitmap and host_ia32_perf_global_ctrl fields.
- Wipe 'hv_vcpu->cpuid_cache' with memset() on update.
- Add a comment explaining made-up HV_X64_NESTED_EVMCS1_2022_UPDATE name
and what this bit means.
- Add R-b tags.

Original description:

Enlightened VMCS v1 definition was updates to include fields for the
following features:
- PerfGlobalCtrl
- EnclsExitingBitmap
- TSC scaling
- GuestLbrCtl
- CET
- SSP
While the information is missing in the publicly available TLFS, the
updated definition comes with a new feature bit in CPUID.0x4000000A.EBX
(BIT 0). Use a made up HV_X64_NESTED_EVMCS1_2022_UPDATE name for it.

Add support for the new revision to KVM. SSP, CET and GuestLbrCtl
features are not currently supported by KVM.

While on it, implement Sean's idea to use vmcs_config for setting up
L1 VMX control MSRs instead of re-reading host MSRs.

Jim Mattson (1):
KVM: x86: VMX: Replace some Intel model numbers with mnemonics

Sean Christopherson (1):
KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not
setup

Vitaly Kuznetsov (23):
KVM: x86: hyper-v: Expose access to debug MSRs in the partition
privilege flags
x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields
KVM: nVMX: Support several new fields in eVMCSv1
KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf
KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields
KVM: selftests: Switch to updated eVMCSv1 definition
KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with
enlightened VMCS
KVM: selftests: Enable TSC scaling in evmcs selftest
KVM: VMX: Get rid of eVMCS specific VMX controls sanitization
KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config()
KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING in
setup_vmcs_config()
KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING
in setup_vmcs_config()
KVM: VMX: Extend VMX controls macro shenanigans
KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of
setup_vmcs_config()
KVM: VMX: Add missing VMEXIT controls to vmcs_config
KVM: VMX: Add missing CPU based VM execution controls to vmcs_config
KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of
setup_vmcs_config()
KVM: nVMX: Always set required-1 bits of pinbased_ctls to
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR
KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs
KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config
KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up
nested MSR

arch/x86/include/asm/hyperv-tlfs.h | 30 ++-
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/hyperv.c | 20 +-
arch/x86/kvm/vmx/capabilities.h | 14 +-
arch/x86/kvm/vmx/evmcs.c | 127 +++++++---
arch/x86/kvm/vmx/evmcs.h | 18 +-
arch/x86/kvm/vmx/nested.c | 70 ++++--
arch/x86/kvm/vmx/nested.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 235 ++++++++----------
arch/x86/kvm/vmx/vmx.h | 116 +++++++++
include/asm-generic/hyperv-tlfs.h | 2 +
.../selftests/kvm/include/x86_64/evmcs.h | 45 +++-
.../selftests/kvm/include/x86_64/vmx.h | 2 +
.../testing/selftests/kvm/x86_64/evmcs_test.c | 31 ++-
14 files changed, 484 insertions(+), 230 deletions(-)

--
2.35.3


2022-07-14 09:42:14

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 20/25] KVM: x86: VMX: Replace some Intel model numbers with mnemonics

From: Jim Mattson <[email protected]>

Intel processor code names are more familiar to many readers than
their decimal model numbers.

Signed-off-by: Jim Mattson <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index eca6875d6732..2dff5b94c535 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2580,11 +2580,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
*/
if (boot_cpu_data.x86 == 0x6) {
switch (boot_cpu_data.x86_model) {
- case 26: /* AAK155 */
- case 30: /* AAP115 */
- case 37: /* AAT100 */
- case 44: /* BC86,AAY89,BD102 */
- case 46: /* BA97 */
+ case INTEL_FAM6_NEHALEM_EP: /* AAK155 */
+ case INTEL_FAM6_NEHALEM: /* AAP115 */
+ case INTEL_FAM6_WESTMERE: /* AAT100 */
+ case INTEL_FAM6_WESTMERE_EP: /* BC86,AAY89,BD102 */
+ case INTEL_FAM6_NEHALEM_EX: /* BA97 */
_vmentry_control &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
_vmexit_control &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
--
2.35.3

2022-07-14 09:44:38

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 12/25] KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config()

VM_ENTRY_IA32E_MODE control is toggled dynamically by vmx_set_efer()
and setup_vmcs_config() doesn't check its existence. On the contrary,
nested_vmx_setup_ctls_msrs() doesn set it on x86_64. Add the missing
check and filter the bit out in vmx_vmentry_ctrl().

No (real) functional change intended as all existing CPUs supporting
long mode and VMX are supposed to have it.

Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index dd905ad72637..1aaec4d19e1b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2610,6 +2610,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;

min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
+#ifdef CONFIG_X86_64
+ min |= VM_ENTRY_IA32E_MODE;
+#endif
opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
VM_ENTRY_LOAD_IA32_PAT |
VM_ENTRY_LOAD_IA32_EFER |
@@ -4242,9 +4245,14 @@ static u32 vmx_vmentry_ctrl(void)
if (vmx_pt_mode_is_system())
vmentry_ctrl &= ~(VM_ENTRY_PT_CONCEAL_PIP |
VM_ENTRY_LOAD_IA32_RTIT_CTL);
- /* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
- return vmentry_ctrl &
- ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER);
+ /*
+ * IA32e mode, and loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically.
+ */
+ vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
+ VM_ENTRY_LOAD_IA32_EFER |
+ VM_ENTRY_IA32E_MODE);
+
+ return vmentry_ctrl;
}

static u32 vmx_vmexit_ctrl(void)
--
2.35.3

2022-07-14 09:44:47

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 23/25] KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs

Using raw host MSR values for setting up nested VMX control MSRs is
incorrect as some features need to disabled, e.g. when KVM runs as
a nested hypervisor on Hyper-V and uses Enlightened VMCS or when a
workaround for IA32_PERF_GLOBAL_CTRL is applied. For non-nested VMX, this
is done in setup_vmcs_config() and the result is stored in vmcs_config.
Use it for setting up allowed-1 bits in nested VMX MSRs too.

Suggested-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 30 ++++++++++++------------------
arch/x86/kvm/vmx/nested.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 5 ++---
3 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 09654d5c2144..3d386c663fac 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6565,8 +6565,10 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void)
* bit in the high half is on if the corresponding bit in the control field
* may be on. See also vmx_control_verify().
*/
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
+void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
{
+ struct nested_vmx_msrs *msrs = &vmcs_conf->nested;
+
/*
* Note that as a general rule, the high half of the MSRs (bits in
* the control fields which may be 1) should be initialized by the
@@ -6583,11 +6585,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
*/

/* pin-based controls */
- rdmsr(MSR_IA32_VMX_PINBASED_CTLS,
- msrs->pinbased_ctls_low,
- msrs->pinbased_ctls_high);
msrs->pinbased_ctls_low =
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+ msrs->pinbased_ctls_high = vmcs_conf->pin_based_exec_ctrl;
msrs->pinbased_ctls_high &=
PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING |
@@ -6598,12 +6599,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
PIN_BASED_VMX_PREEMPTION_TIMER;

/* exit controls */
- rdmsr(MSR_IA32_VMX_EXIT_CTLS,
- msrs->exit_ctls_low,
- msrs->exit_ctls_high);
msrs->exit_ctls_low =
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;

+ msrs->exit_ctls_high = vmcs_conf->vmexit_ctrl;
msrs->exit_ctls_high &=
#ifdef CONFIG_X86_64
VM_EXIT_HOST_ADDR_SPACE_SIZE |
@@ -6619,11 +6618,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;

/* entry controls */
- rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
- msrs->entry_ctls_low,
- msrs->entry_ctls_high);
msrs->entry_ctls_low =
VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
+
+ msrs->entry_ctls_high = vmcs_conf->vmentry_ctrl;
msrs->entry_ctls_high &=
#ifdef CONFIG_X86_64
VM_ENTRY_IA32E_MODE |
@@ -6637,11 +6635,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;

/* cpu-based controls */
- rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
- msrs->procbased_ctls_low,
- msrs->procbased_ctls_high);
msrs->procbased_ctls_low =
CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+ msrs->procbased_ctls_high = vmcs_conf->cpu_based_exec_ctrl;
msrs->procbased_ctls_high &=
CPU_BASED_INTR_WINDOW_EXITING |
CPU_BASED_NMI_WINDOW_EXITING | CPU_BASED_USE_TSC_OFFSETTING |
@@ -6675,12 +6672,9 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
* depend on CPUID bits, they are added later by
* vmx_vcpu_after_set_cpuid.
*/
- if (msrs->procbased_ctls_high & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)
- rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
- msrs->secondary_ctls_low,
- msrs->secondary_ctls_high);
-
msrs->secondary_ctls_low = 0;
+
+ msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
msrs->secondary_ctls_high &=
SECONDARY_EXEC_DESC |
SECONDARY_EXEC_ENABLE_RDTSCP |
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index c92cea0b8ccc..fae047c6204b 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,7 +17,7 @@ enum nvmx_vmentry_status {
};

void vmx_leave_nested(struct kvm_vcpu *vcpu);
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
+void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e462e5b9c0a1..35285109856f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7306,7 +7306,7 @@ static int __init vmx_check_processor_compat(void)
if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
return -EIO;
if (nested)
- nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
+ nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept);
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
smp_processor_id());
@@ -8281,8 +8281,7 @@ static __init int hardware_setup(void)
setup_default_sgx_lepubkeyhash();

if (nested) {
- nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
- vmx_capability.ept);
+ nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept);

r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
if (r)
--
2.35.3

2022-07-14 09:45:27

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 21/25] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()

As a preparation to reusing the result of setup_vmcs_config() for setting
up nested VMX control MSRs, move LOAD_IA32_PERF_GLOBAL_CTRL errata handling
to vmx_vmexit_ctrl()/vmx_vmentry_ctrl() and print the warning from
hardware_setup(). While it seems reasonable to not expose
LOAD_IA32_PERF_GLOBAL_CTRL controls to L1 hypervisor on buggy CPUs,
such change would inevitably break live migration from older KVMs
where the controls are exposed. Keep the status quo for know, L1 hypervisor
itself is supposed to take care of the errata.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 62 ++++++++++++++++++++++++++----------------
1 file changed, 38 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2dff5b94c535..e462e5b9c0a1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2416,6 +2416,31 @@ static bool cpu_has_sgx(void)
return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
}

+/*
+ * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
+ * can't be used due to errata where VM Exit may incorrectly clear
+ * IA32_PERF_GLOBAL_CTRL[34:32]. Work around the errata by using the
+ * MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL.
+ */
+static bool cpu_has_perf_global_ctrl_bug(void)
+{
+ if (boot_cpu_data.x86 == 0x6) {
+ switch (boot_cpu_data.x86_model) {
+ case INTEL_FAM6_NEHALEM_EP: /* AAK155 */
+ case INTEL_FAM6_NEHALEM: /* AAP115 */
+ case INTEL_FAM6_WESTMERE: /* AAT100 */
+ case INTEL_FAM6_WESTMERE_EP: /* BC86,AAY89,BD102 */
+ case INTEL_FAM6_NEHALEM_EX: /* BA97 */
+ return true;
+ default:
+ break;
+ }
+ }
+
+ return false;
+}
+
+
static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
u32 msr, u32 *result)
{
@@ -2572,30 +2597,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
_vmexit_control &= ~x_ctrl;
}

- /*
- * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
- * can't be used due to an errata where VM Exit may incorrectly clear
- * IA32_PERF_GLOBAL_CTRL[34:32]. Workaround the errata by using the
- * MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL.
- */
- if (boot_cpu_data.x86 == 0x6) {
- switch (boot_cpu_data.x86_model) {
- case INTEL_FAM6_NEHALEM_EP: /* AAK155 */
- case INTEL_FAM6_NEHALEM: /* AAP115 */
- case INTEL_FAM6_WESTMERE: /* AAT100 */
- case INTEL_FAM6_WESTMERE_EP: /* BC86,AAY89,BD102 */
- case INTEL_FAM6_NEHALEM_EX: /* BA97 */
- _vmentry_control &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
- _vmexit_control &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
- pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
- "does not work properly. Using workaround\n");
- break;
- default:
- break;
- }
- }
-
-
rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);

/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -4184,6 +4185,10 @@ static u32 vmx_vmentry_ctrl(void)
VM_ENTRY_LOAD_IA32_EFER |
VM_ENTRY_IA32E_MODE);

+
+ if (cpu_has_perf_global_ctrl_bug())
+ vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+
return vmentry_ctrl;
}

@@ -4198,6 +4203,10 @@ static u32 vmx_vmexit_ctrl(void)
if (vmx_pt_mode_is_system())
vmexit_ctrl &= ~(VM_EXIT_PT_CONCEAL_PIP |
VM_EXIT_CLEAR_IA32_RTIT_CTL);
+
+ if (cpu_has_perf_global_ctrl_bug())
+ vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+
/* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
return vmexit_ctrl &
~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
@@ -8113,6 +8122,11 @@ static __init int hardware_setup(void)
if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
return -EIO;

+ if (cpu_has_perf_global_ctrl_bug()) {
+ pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
+ "does not work properly. Using workaround\n");
+ }
+
if (boot_cpu_has(X86_FEATURE_NX))
kvm_enable_efer_bits(EFER_NX);

--
2.35.3

2022-07-14 09:45:30

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 02/25] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition

Section 1.9 of TLFS v6.0b says:

"All structures are padded in such a way that fields are aligned
naturally (that is, an 8-byte field is aligned to an offset of 8 bytes
and so on)".

'struct enlightened_vmcs' has a glitch:

...
struct {
u32 nested_flush_hypercall:1; /* 836: 0 4 */
u32 msr_bitmap:1; /* 836: 1 4 */
u32 reserved:30; /* 836: 2 4 */
} hv_enlightenments_control; /* 836 4 */
u32 hv_vp_id; /* 840 4 */
u64 hv_vm_id; /* 844 8 */
u64 partition_assist_page; /* 852 8 */
...

And the observed values in 'partition_assist_page' make no sense at
all. Fix the layout by padding the structure properly.

Fixes: 68d1eb72ee99 ("x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits")
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/hyperv-tlfs.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 0a9407dc0859..6f0acc45e67a 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -546,7 +546,7 @@ struct hv_enlightened_vmcs {
u64 guest_rip;

u32 hv_clean_fields;
- u32 hv_padding_32;
+ u32 padding32_1;
u32 hv_synthetic_controls;
struct {
u32 nested_flush_hypercall:1;
@@ -554,7 +554,7 @@ struct hv_enlightened_vmcs {
u32 reserved:30;
} __packed hv_enlightenments_control;
u32 hv_vp_id;
-
+ u32 padding32_2;
u64 hv_vm_id;
u64 partition_assist_page;
u64 padding64_4[4];
--
2.35.3

2022-07-14 09:45:50

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 11/25] KVM: VMX: Get rid of eVMCS specific VMX controls sanitization

With the updated eVMCSv1 definition, there's no known 'problematic'
controls which are exposed in VMX control MSRs but are not present in
eVMCSv1. Get rid of VMX control MSRs filtering for KVM on Hyper-V.

Note: VMX control MSRs filtering for Hyper-V on KVM
(nested_evmcs_filter_control_msr()) stays as even the updated eVMCSv1
definition doesn't have all the features implemented by KVM and some
fields are still missing. Moreover, nested_evmcs_filter_control_msr()
has to support the original eVMCSv1 version when VMM wishes so.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx/evmcs.c | 13 -------------
arch/x86/kvm/vmx/evmcs.h | 1 -
arch/x86/kvm/vmx/vmx.c | 5 -----
3 files changed, 19 deletions(-)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 52a53debd806..b5cfbf7d487b 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -320,19 +320,6 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
};
const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);

-#if IS_ENABLED(CONFIG_HYPERV)
-__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
-{
- vmcs_conf->cpu_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_EXEC_CTRL;
- vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL;
- vmcs_conf->cpu_based_2nd_exec_ctrl &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
- vmcs_conf->cpu_based_3rd_exec_ctrl = 0;
-
- vmcs_conf->vmexit_ctrl &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
- vmcs_conf->vmentry_ctrl &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
-}
-#endif
-
bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa)
{
struct hv_vp_assist_page assist_page;
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 4b809c79ae63..0feac101cce4 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -203,7 +203,6 @@ static inline void evmcs_load(u64 phys_addr)
vp_ap->enlighten_vmentry = 1;
}

-__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf);
#else /* !IS_ENABLED(CONFIG_HYPERV) */
static __always_inline void evmcs_write64(unsigned long field, u64 value) {}
static inline void evmcs_write32(unsigned long field, u32 value) {}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b4915d841357..dd905ad72637 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2689,11 +2689,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
vmcs_conf->vmexit_ctrl = _vmexit_control;
vmcs_conf->vmentry_ctrl = _vmentry_control;

-#if IS_ENABLED(CONFIG_HYPERV)
- if (enlightened_vmcs)
- evmcs_sanitize_exec_ctrls(vmcs_conf);
-#endif
-
return 0;
}

--
2.35.3

2022-07-14 09:46:50

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 14/25] KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING in setup_vmcs_config()

SECONDARY_EXEC_ENCLS_EXITING is conditionally added to the 'optional'
checklist in setup_vmcs_config() but there's little value in doing so.
First, as the control is optional, we can always check for its
presence, no harm done. Second, the only real value cpu_has_sgx() check
gives is that on the CPUs which support SECONDARY_EXEC_ENCLS_EXITING but
don't support SGX, the control is not getting enabled. It's highly unlikely
such CPUs exist but it's possible that some hypervisors expose broken vCPU
models.

Preserve cpu_has_sgx() check but filter the result of adjust_vmx_controls()
instead of the input.

Reviewed-by: Jim Mattson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ce54f13d8da1..566be73c6509 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2528,9 +2528,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
SECONDARY_EXEC_PT_CONCEAL_VMX |
SECONDARY_EXEC_ENABLE_VMFUNC |
SECONDARY_EXEC_BUS_LOCK_DETECTION |
- SECONDARY_EXEC_NOTIFY_VM_EXITING;
- if (cpu_has_sgx())
- opt2 |= SECONDARY_EXEC_ENCLS_EXITING;
+ SECONDARY_EXEC_NOTIFY_VM_EXITING |
+ SECONDARY_EXEC_ENCLS_EXITING;
+
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
&_cpu_based_2nd_exec_control) < 0)
@@ -2577,6 +2577,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
vmx_cap->vpid = 0;
}

+ if (!cpu_has_sgx())
+ _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_ENCLS_EXITING;
+
if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
u64 opt3 = TERTIARY_EXEC_IPI_VIRT;

--
2.35.3

2022-07-14 09:47:47

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 07/25] KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields

The updated Enlightened VMCS definition has 'encls_exiting_bitmap'
field which needs mapping to VMCS, add the missing encoding.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/vmx.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index cc3604f8f1d3..5292d30fb7f2 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -208,6 +208,8 @@ enum vmcs_field {
VMWRITE_BITMAP_HIGH = 0x00002029,
XSS_EXIT_BITMAP = 0x0000202C,
XSS_EXIT_BITMAP_HIGH = 0x0000202D,
+ ENCLS_EXITING_BITMAP = 0x0000202E,
+ ENCLS_EXITING_BITMAP_HIGH = 0x0000202F,
TSC_MULTIPLIER = 0x00002032,
TSC_MULTIPLIER_HIGH = 0x00002033,
GUEST_PHYSICAL_ADDRESS = 0x00002400,
--
2.35.3

2022-07-14 09:48:06

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 03/25] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition

Updated Hyper-V Enlightened VMCS specification lists several new
fields for the following features:

- PerfGlobalCtrl
- EnclsExitingBitmap
- Tsc Scaling
- GuestLbrCtl
- CET
- SSP

Update the definition. The updated definition is available only when
CPUID.0x4000000A.EBX BIT(0) is '1'. Add a define for it as well.

Note: The latest TLFS is available at
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/hyperv-tlfs.h | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 6f0acc45e67a..ebc27017fa48 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -138,6 +138,17 @@
#define HV_X64_NESTED_GUEST_MAPPING_FLUSH BIT(18)
#define HV_X64_NESTED_MSR_BITMAP BIT(19)

+/*
+ * Nested quirks. These are HYPERV_CPUID_NESTED_FEATURES.EBX bits.
+ *
+ * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
+ * published TLFS version. When the bit is set, nested hypervisor can use
+ * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
+ * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
+ * specification).
+ */
+#define HV_X64_NESTED_EVMCS1_2022_UPDATE BIT(0)
+
/*
* This is specific to AMD and specifies that enlightened TLB flush is
* supported. If guest opts in to this feature, ASID invalidations only
@@ -559,9 +570,20 @@ struct hv_enlightened_vmcs {
u64 partition_assist_page;
u64 padding64_4[4];
u64 guest_bndcfgs;
- u64 padding64_5[7];
+ u64 guest_ia32_perf_global_ctrl;
+ u64 guest_ia32_s_cet;
+ u64 guest_ssp;
+ u64 guest_ia32_int_ssp_table_addr;
+ u64 guest_ia32_lbr_ctl;
+ u64 padding64_5[2];
u64 xss_exit_bitmap;
- u64 padding64_6[7];
+ u64 encls_exiting_bitmap;
+ u64 host_ia32_perf_global_ctrl;
+ u64 tsc_multiplier;
+ u64 host_ia32_s_cet;
+ u64 host_ssp;
+ u64 host_ia32_int_ssp_table_addr;
+ u64 padding64_6;
} __packed;

#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE 0
--
2.35.3

2022-07-14 09:48:16

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 05/25] KVM: nVMX: Support several new fields in eVMCSv1

Enlightened VMCS v1 definition was updated with new fields, add
support for them for Hyper-V on KVM.

Note: SSP, CET and Guest LBR features are not supported by KVM yet
and 'struct vmcs12' has no corresponding fields.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 778f82015f03..4fc84f0f5875 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1603,6 +1603,10 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
vmcs12->guest_rflags = evmcs->guest_rflags;
vmcs12->guest_interruptibility_info =
evmcs->guest_interruptibility_info;
+ /*
+ * Not present in struct vmcs12:
+ * vmcs12->guest_ssp = evmcs->guest_ssp;
+ */
}

if (unlikely(!(hv_clean_fields &
@@ -1649,6 +1653,13 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
vmcs12->host_fs_selector = evmcs->host_fs_selector;
vmcs12->host_gs_selector = evmcs->host_gs_selector;
vmcs12->host_tr_selector = evmcs->host_tr_selector;
+ vmcs12->host_ia32_perf_global_ctrl = evmcs->host_ia32_perf_global_ctrl;
+ /*
+ * Not present in struct vmcs12:
+ * vmcs12->host_ia32_s_cet = evmcs->host_ia32_s_cet;
+ * vmcs12->host_ssp = evmcs->host_ssp;
+ * vmcs12->host_ia32_int_ssp_table_addr = evmcs->host_ia32_int_ssp_table_addr;
+ */
}

if (unlikely(!(hv_clean_fields &
@@ -1716,6 +1727,8 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
vmcs12->tsc_offset = evmcs->tsc_offset;
vmcs12->virtual_apic_page_addr = evmcs->virtual_apic_page_addr;
vmcs12->xss_exit_bitmap = evmcs->xss_exit_bitmap;
+ vmcs12->encls_exiting_bitmap = evmcs->encls_exiting_bitmap;
+ vmcs12->tsc_multiplier = evmcs->tsc_multiplier;
}

if (unlikely(!(hv_clean_fields &
@@ -1763,6 +1776,13 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
vmcs12->guest_bndcfgs = evmcs->guest_bndcfgs;
vmcs12->guest_activity_state = evmcs->guest_activity_state;
vmcs12->guest_sysenter_cs = evmcs->guest_sysenter_cs;
+ vmcs12->guest_ia32_perf_global_ctrl = evmcs->guest_ia32_perf_global_ctrl;
+ /*
+ * Not present in struct vmcs12:
+ * vmcs12->guest_ia32_s_cet = evmcs->guest_ia32_s_cet;
+ * vmcs12->guest_ia32_lbr_ctl = evmcs->guest_ia32_lbr_ctl;
+ * vmcs12->guest_ia32_int_ssp_table_addr = evmcs->guest_ia32_int_ssp_table_addr;
+ */
}

/*
@@ -1865,12 +1885,23 @@ static void copy_vmcs12_to_enlightened(struct vcpu_vmx *vmx)
* evmcs->vm_exit_msr_store_count = vmcs12->vm_exit_msr_store_count;
* evmcs->vm_exit_msr_load_count = vmcs12->vm_exit_msr_load_count;
* evmcs->vm_entry_msr_load_count = vmcs12->vm_entry_msr_load_count;
+ * evmcs->guest_ia32_perf_global_ctrl = vmcs12->guest_ia32_perf_global_ctrl;
+ * evmcs->host_ia32_perf_global_ctrl = vmcs12->host_ia32_perf_global_ctrl;
+ * evmcs->encls_exiting_bitmap = vmcs12->encls_exiting_bitmap;
+ * evmcs->tsc_multiplier = vmcs12->tsc_multiplier;
*
* Not present in struct vmcs12:
* evmcs->exit_io_instruction_ecx = vmcs12->exit_io_instruction_ecx;
* evmcs->exit_io_instruction_esi = vmcs12->exit_io_instruction_esi;
* evmcs->exit_io_instruction_edi = vmcs12->exit_io_instruction_edi;
* evmcs->exit_io_instruction_eip = vmcs12->exit_io_instruction_eip;
+ * evmcs->host_ia32_s_cet = vmcs12->host_ia32_s_cet;
+ * evmcs->host_ssp = vmcs12->host_ssp;
+ * evmcs->host_ia32_int_ssp_table_addr = vmcs12->host_ia32_int_ssp_table_addr;
+ * evmcs->guest_ia32_s_cet = vmcs12->guest_ia32_s_cet;
+ * evmcs->guest_ia32_lbr_ctl = vmcs12->guest_ia32_lbr_ctl;
+ * evmcs->guest_ia32_int_ssp_table_addr = vmcs12->guest_ia32_int_ssp_table_addr;
+ * evmcs->guest_ssp = vmcs12->guest_ssp;
*/

evmcs->guest_es_selector = vmcs12->guest_es_selector;
--
2.35.3

2022-07-14 09:48:16

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v4 01/25] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags

For some features, Hyper-V spec defines two separate CPUID bits: one
listing whether the feature is supported or not and another one showing
whether guest partition was granted access to the feature ("partition
privilege mask"). 'Debug MSRs available' is one of such features. Add
the missing 'access' bit.

Fixes: f97f5a56f597 ("x86/kvm/hyper-v: Add support for synthetic debugger interface")
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.c | 1 +
include/asm-generic/hyperv-tlfs.h | 2 ++
2 files changed, 3 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index e2e95a6fccfd..e08189211d9a 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2496,6 +2496,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->eax |= HV_MSR_RESET_AVAILABLE;
ent->eax |= HV_MSR_REFERENCE_TSC_AVAILABLE;
ent->eax |= HV_ACCESS_FREQUENCY_MSRS;
+ ent->eax |= HV_ACCESS_DEBUG_MSRS;
ent->eax |= HV_ACCESS_REENLIGHTENMENT;

ent->ebx |= HV_POST_MESSAGES;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index fdce7a4cfc6f..1d99dd296a76 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -70,6 +70,8 @@
#define HV_MSR_GUEST_IDLE_AVAILABLE BIT(10)
/* Partition local APIC and TSC frequency registers available */
#define HV_ACCESS_FREQUENCY_MSRS BIT(11)
+/* Debug MSRs available */
+#define HV_ACCESS_DEBUG_MSRS BIT(12)
/* AccessReenlightenmentControls privilege */
#define HV_ACCESS_REENLIGHTENMENT BIT(13)
/* AccessTscInvariantControls privilege */
--
2.35.3

2022-07-14 09:48:21

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v4 07/25] KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields

On Thu, 2022-07-14 at 11:13 +0200, Vitaly Kuznetsov wrote:
> The updated Enlightened VMCS definition has 'encls_exiting_bitmap'
> field which needs mapping to VMCS, add the missing encoding.
>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> tools/testing/selftests/kvm/include/x86_64/vmx.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> index cc3604f8f1d3..5292d30fb7f2 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> @@ -208,6 +208,8 @@ enum vmcs_field {
> VMWRITE_BITMAP_HIGH = 0x00002029,
> XSS_EXIT_BITMAP = 0x0000202C,
> XSS_EXIT_BITMAP_HIGH = 0x0000202D,
> + ENCLS_EXITING_BITMAP = 0x0000202E,
> + ENCLS_EXITING_BITMAP_HIGH = 0x0000202F,
> TSC_MULTIPLIER = 0x00002032,
> TSC_MULTIPLIER_HIGH = 0x00002033,
> GUEST_PHYSICAL_ADDRESS = 0x00002400,

Reviewed-by: Kai Huang <[email protected]>

--
Thanks,
-Kai


2022-07-14 10:04:01

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v4 03/25] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition

On Thu, 2022-07-14 at 11:13 +0200, Vitaly Kuznetsov wrote:
> Updated Hyper-V Enlightened VMCS specification lists several new
> fields for the following features:
>
> - PerfGlobalCtrl
> - EnclsExitingBitmap
> - Tsc Scaling
> - GuestLbrCtl
> - CET
> - SSP
>
> Update the definition. The updated definition is available only when
> CPUID.0x4000000A.EBX BIT(0) is '1'. Add a define for it as well.
>
> Note: The latest TLFS is available at
> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 6f0acc45e67a..ebc27017fa48 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -138,6 +138,17 @@
>  #define HV_X64_NESTED_GUEST_MAPPING_FLUSH              BIT(18)
>  #define HV_X64_NESTED_MSR_BITMAP                       BIT(19)
>  
> +/*
> + * Nested quirks. These are HYPERV_CPUID_NESTED_FEATURES.EBX bits.
> + *
> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
> + * published TLFS version. When the bit is set, nested hypervisor can use
> + * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
> + * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
> + * specification).
> + */
> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE               BIT(0)
> +
>  /*
>   * This is specific to AMD and specifies that enlightened TLB flush is
>   * supported. If guest opts in to this feature, ASID invalidations only
> @@ -559,9 +570,20 @@ struct hv_enlightened_vmcs {
>         u64 partition_assist_page;
>         u64 padding64_4[4];
>         u64 guest_bndcfgs;
> -       u64 padding64_5[7];
> +       u64 guest_ia32_perf_global_ctrl;
> +       u64 guest_ia32_s_cet;
> +       u64 guest_ssp;
> +       u64 guest_ia32_int_ssp_table_addr;
> +       u64 guest_ia32_lbr_ctl;
> +       u64 padding64_5[2];
>         u64 xss_exit_bitmap;
> -       u64 padding64_6[7];
> +       u64 encls_exiting_bitmap;
> +       u64 host_ia32_perf_global_ctrl;
> +       u64 tsc_multiplier;
> +       u64 host_ia32_s_cet;
> +       u64 host_ssp;
> +       u64 host_ia32_int_ssp_table_addr;
> +       u64 padding64_6;
>  } __packed;
>  
>  #define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE                    0

All look good now.

I really don't like the new 'online' TLFS spec - as you said,
they can indeed change it any moment without any traces.

Seems it was done with good intentions, and it much easier to use,
but they should also provide a PDF, or at least some form or archive of
these web pages.

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky


2022-07-14 10:16:58

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v4 11/25] KVM: VMX: Get rid of eVMCS specific VMX controls sanitization

On Thu, 2022-07-14 at 11:13 +0200, Vitaly Kuznetsov wrote:
> With the updated eVMCSv1 definition, there's no known 'problematic'
> controls which are exposed in VMX control MSRs but are not present in
> eVMCSv1. Get rid of VMX control MSRs filtering for KVM on Hyper-V.

I think it still might be worth it, mentioning at least in the commit message,
that as you said, the all known HyperV versions, either don't expose the 
new fields by not setting bits in the VMX feature controls, 
or support the new eVMCS revision.

But anyway:

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

>
> Note: VMX control MSRs filtering for Hyper-V on KVM
> (nested_evmcs_filter_control_msr()) stays as even the updated eVMCSv1
> definition doesn't have all the features implemented by KVM and some
> fields are still missing. Moreover, nested_evmcs_filter_control_msr()
> has to support the original eVMCSv1 version when VMM wishes so.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
>  arch/x86/kvm/vmx/evmcs.c | 13 -------------
>  arch/x86/kvm/vmx/evmcs.h |  1 -
>  arch/x86/kvm/vmx/vmx.c   |  5 -----
>  3 files changed, 19 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 52a53debd806..b5cfbf7d487b 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -320,19 +320,6 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
>  };
>  const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);
>  
> -#if IS_ENABLED(CONFIG_HYPERV)
> -__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
> -{
> -       vmcs_conf->cpu_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_EXEC_CTRL;
> -       vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL;
> -       vmcs_conf->cpu_based_2nd_exec_ctrl &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
> -       vmcs_conf->cpu_based_3rd_exec_ctrl = 0;
> -
> -       vmcs_conf->vmexit_ctrl &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> -       vmcs_conf->vmentry_ctrl &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> -}
> -#endif
> -
>  bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa)
>  {
>         struct hv_vp_assist_page assist_page;
> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index 4b809c79ae63..0feac101cce4 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -203,7 +203,6 @@ static inline void evmcs_load(u64 phys_addr)
>         vp_ap->enlighten_vmentry = 1;
>  }
>  
> -__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf);
>  #else /* !IS_ENABLED(CONFIG_HYPERV) */
>  static __always_inline void evmcs_write64(unsigned long field, u64 value) {}
>  static inline void evmcs_write32(unsigned long field, u32 value) {}
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b4915d841357..dd905ad72637 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2689,11 +2689,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>         vmcs_conf->vmexit_ctrl         = _vmexit_control;
>         vmcs_conf->vmentry_ctrl        = _vmentry_control;
>  
> -#if IS_ENABLED(CONFIG_HYPERV)
> -       if (enlightened_vmcs)
> -               evmcs_sanitize_exec_ctrls(vmcs_conf);
> -#endif
> -
>         return 0;
>  }
>  


2022-07-21 22:13:16

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 12/25] KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config()

On Thu, Jul 14, 2022, Vitaly Kuznetsov wrote:
> VM_ENTRY_IA32E_MODE control is toggled dynamically by vmx_set_efer()
> and setup_vmcs_config() doesn't check its existence. On the contrary,
> nested_vmx_setup_ctls_msrs() doesn set it on x86_64. Add the missing
> check and filter the bit out in vmx_vmentry_ctrl().
>
> No (real) functional change intended as all existing CPUs supporting
> long mode and VMX are supposed to have it.
>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Reviewed-by: Jim Mattson <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---

Reviewed-by: Sean Christopherson <[email protected]>

2022-07-21 22:15:47

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 14/25] KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING in setup_vmcs_config()

On Thu, Jul 14, 2022, Vitaly Kuznetsov wrote:
> SECONDARY_EXEC_ENCLS_EXITING is conditionally added to the 'optional'
> checklist in setup_vmcs_config() but there's little value in doing so.
> First, as the control is optional, we can always check for its
> presence, no harm done. Second, the only real value cpu_has_sgx() check
> gives is that on the CPUs which support SECONDARY_EXEC_ENCLS_EXITING but
> don't support SGX, the control is not getting enabled. It's highly unlikely
> such CPUs exist but it's possible that some hypervisors expose broken vCPU
> models.

It's not just broken vCPU models, SGX can be "soft-disabled" on bare metal, e.g. if
software writes MCE control MSRs or there's an uncorrectable #MC (may not be the
case on all platforms). This is architectural behavior and needs to be handled in
KVM. Obviously if SGX gets disabled after KVM is loaded then we're out of luck, but
having the ENCL-exiting control without SGX being enabled is 100% valid.

As for why KVM bothers with the check, it's to work around a suspected hardware
or XuCode bug (I'm still a bit shocked that's public now :-) ) where SGX got
_hard_ disabled across S3 on some CPUs and made the fields magically disappear.
The workaround was to soft-disable SGX in BIOS so that KVM wouldn't attempt to
enable the ENCLS-exiting control

> Preserve cpu_has_sgx() check but filter the result of adjust_vmx_controls()
> instead of the input.
>
> Reviewed-by: Jim Mattson <[email protected]>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ce54f13d8da1..566be73c6509 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2528,9 +2528,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> SECONDARY_EXEC_PT_CONCEAL_VMX |
> SECONDARY_EXEC_ENABLE_VMFUNC |
> SECONDARY_EXEC_BUS_LOCK_DETECTION |
> - SECONDARY_EXEC_NOTIFY_VM_EXITING;
> - if (cpu_has_sgx())
> - opt2 |= SECONDARY_EXEC_ENCLS_EXITING;
> + SECONDARY_EXEC_NOTIFY_VM_EXITING |
> + SECONDARY_EXEC_ENCLS_EXITING;
> +
> if (adjust_vmx_controls(min2, opt2,
> MSR_IA32_VMX_PROCBASED_CTLS2,
> &_cpu_based_2nd_exec_control) < 0)
> @@ -2577,6 +2577,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> vmx_cap->vpid = 0;
> }
>
> + if (!cpu_has_sgx())
> + _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_ENCLS_EXITING;
> +
> if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
> u64 opt3 = TERTIARY_EXEC_IPI_VIRT;
>
> --
> 2.35.3
>

2022-07-21 22:34:39

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags

On Thu, Jul 14, 2022, Vitaly Kuznetsov wrote:
> For some features, Hyper-V spec defines two separate CPUID bits: one
> listing whether the feature is supported or not and another one showing
> whether guest partition was granted access to the feature ("partition
> privilege mask"). 'Debug MSRs available' is one of such features. Add
> the missing 'access' bit.
>
> Fixes: f97f5a56f597 ("x86/kvm/hyper-v: Add support for synthetic debugger interface")
> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/hyperv.c | 1 +
> include/asm-generic/hyperv-tlfs.h | 2 ++
> 2 files changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index e2e95a6fccfd..e08189211d9a 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2496,6 +2496,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
> ent->eax |= HV_MSR_RESET_AVAILABLE;
> ent->eax |= HV_MSR_REFERENCE_TSC_AVAILABLE;
> ent->eax |= HV_ACCESS_FREQUENCY_MSRS;
> + ent->eax |= HV_ACCESS_DEBUG_MSRS;
> ent->eax |= HV_ACCESS_REENLIGHTENMENT;

Doesn't KVM also need to switch to enforcing the "access" flag?

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index c284a605e453..ca91547034e4 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1282,7 +1282,7 @@ static bool hv_check_msr_access(struct kvm_vcpu_hv *hv_vcpu, u32 msr)
case HV_X64_MSR_SYNDBG_OPTIONS:
case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
return hv_vcpu->cpuid_cache.features_edx &
- HV_FEATURE_DEBUG_MSRS_AVAILABLE;
+ HV_ACCESS_DEBUG_MSRS;
default:
break;
}

2022-07-21 23:00:22

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 21/25] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()

On Thu, Jul 14, 2022, Vitaly Kuznetsov wrote:
> As a preparation to reusing the result of setup_vmcs_config() for setting
> up nested VMX control MSRs, move LOAD_IA32_PERF_GLOBAL_CTRL errata handling
> to vmx_vmexit_ctrl()/vmx_vmentry_ctrl() and print the warning from
> hardware_setup(). While it seems reasonable to not expose
> LOAD_IA32_PERF_GLOBAL_CTRL controls to L1 hypervisor on buggy CPUs,
> such change would inevitably break live migration from older KVMs
> where the controls are exposed. Keep the status quo for know, L1 hypervisor

s/know/now

> itself is supposed to take care of the errata.

Except the errata are based on FMS and the FMS exposed to the L1 hypervisor may
not be the real FMS.

But that's moot, because they _should_ be fully emulated by KVM anyways; KVM
runs L2 with a MSR value modified by perf, not the raw MSR value requested by L1.

Of course KVM screws things up and fails to clear the flag in entry controls...
All exit controls are emulated so at least KVM gets those right.

Untested, but I believe KVM the fix is:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d0e781c7ac72..76926147b672 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2357,7 +2357,8 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
* we can avoid VMWrites during vmx_set_efer().
*/
exec_control = __vm_entry_controls_get(vmcs01);
- exec_control |= vmcs12->vm_entry_controls;
+ exec_control |= (vmcs12->vm_entry_controls &
+ ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
exec_control &= ~(VM_ENTRY_IA32E_MODE | VM_ENTRY_LOAD_IA32_EFER);
if (cpu_has_load_ia32_efer()) {
if (guest_efer & EFER_LMA)

> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 62 ++++++++++++++++++++++++++----------------
> 1 file changed, 38 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 2dff5b94c535..e462e5b9c0a1 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2416,6 +2416,31 @@ static bool cpu_has_sgx(void)
> return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
> }
>
> +/*
> + * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
> + * can't be used due to errata where VM Exit may incorrectly clear
> + * IA32_PERF_GLOBAL_CTRL[34:32]. Work around the errata by using the
> + * MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL.
> + */
> +static bool cpu_has_perf_global_ctrl_bug(void)
> +{
> + if (boot_cpu_data.x86 == 0x6) {
> + switch (boot_cpu_data.x86_model) {
> + case INTEL_FAM6_NEHALEM_EP: /* AAK155 */
> + case INTEL_FAM6_NEHALEM: /* AAP115 */
> + case INTEL_FAM6_WESTMERE: /* AAT100 */
> + case INTEL_FAM6_WESTMERE_EP: /* BC86,AAY89,BD102 */
> + case INTEL_FAM6_NEHALEM_EX: /* BA97 */
> + return true;
> + default:
> + break;
> + }
> + }
> +
> + return false;
> +}
> +
> +
> static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
> u32 msr, u32 *result)
> {
> @@ -2572,30 +2597,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> _vmexit_control &= ~x_ctrl;
> }
>
> - /*
> - * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
> - * can't be used due to an errata where VM Exit may incorrectly clear
> - * IA32_PERF_GLOBAL_CTRL[34:32]. Workaround the errata by using the
> - * MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL.
> - */
> - if (boot_cpu_data.x86 == 0x6) {
> - switch (boot_cpu_data.x86_model) {
> - case INTEL_FAM6_NEHALEM_EP: /* AAK155 */
> - case INTEL_FAM6_NEHALEM: /* AAP115 */
> - case INTEL_FAM6_WESTMERE: /* AAT100 */
> - case INTEL_FAM6_WESTMERE_EP: /* BC86,AAY89,BD102 */
> - case INTEL_FAM6_NEHALEM_EX: /* BA97 */
> - _vmentry_control &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> - _vmexit_control &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> - pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
> - "does not work properly. Using workaround\n");
> - break;
> - default:
> - break;
> - }
> - }
> -
> -
> rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);
>
> /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
> @@ -4184,6 +4185,10 @@ static u32 vmx_vmentry_ctrl(void)
> VM_ENTRY_LOAD_IA32_EFER |
> VM_ENTRY_IA32E_MODE);
>
> +

Extra line.

> + if (cpu_has_perf_global_ctrl_bug())
> + vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> +
> return vmentry_ctrl;
> }
>
> @@ -4198,6 +4203,10 @@ static u32 vmx_vmexit_ctrl(void)
> if (vmx_pt_mode_is_system())
> vmexit_ctrl &= ~(VM_EXIT_PT_CONCEAL_PIP |
> VM_EXIT_CLEAR_IA32_RTIT_CTL);
> +
> + if (cpu_has_perf_global_ctrl_bug())
> + vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> +
> /* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
> return vmexit_ctrl &
> ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
> @@ -8113,6 +8122,11 @@ static __init int hardware_setup(void)
> if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
> return -EIO;
>
> + if (cpu_has_perf_global_ctrl_bug()) {

Curly braces not needed.

> + pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
> + "does not work properly. Using workaround\n");
> + }
> +
> if (boot_cpu_has(X86_FEATURE_NX))
> kvm_enable_efer_bits(EFER_NX);
>
> --
> 2.35.3
>

2022-07-22 17:42:45

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags

On 7/21/22 23:43, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index c284a605e453..ca91547034e4 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1282,7 +1282,7 @@ static bool hv_check_msr_access(struct kvm_vcpu_hv *hv_vcpu, u32 msr)
> case HV_X64_MSR_SYNDBG_OPTIONS:
> case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
> return hv_vcpu->cpuid_cache.features_edx &
> - HV_FEATURE_DEBUG_MSRS_AVAILABLE;
> + HV_ACCESS_DEBUG_MSRS;
> default:
> break;
> }
>

Yes, and this will need some kind of hack in QEMU to expose both CPUID
bits. Fortunately hv-syndbg shouldn't be in much use in the wild, so I
think we can avoid quirks etc.

Paolo

2022-07-28 22:43:43

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v4 21/25] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()

On 7/22/22 00:56, Sean Christopherson wrote:
> Except the errata are based on FMS and the FMS exposed to the L1 hypervisor may
> not be the real FMS.
>
> But that's moot, because they_should_ be fully emulated by KVM anyways; KVM
> runs L2 with a MSR value modified by perf, not the raw MSR value requested by L1.
>
> Of course KVM screws things up and fails to clear the flag in entry controls...
> All exit controls are emulated so at least KVM gets those right.

Can you send this as a separate patch?

Paolo

> Untested, but I believe KVM the fix is:
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index d0e781c7ac72..76926147b672 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2357,7 +2357,8 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
> * we can avoid VMWrites during vmx_set_efer().
> */
> exec_control = __vm_entry_controls_get(vmcs01);
> - exec_control |= vmcs12->vm_entry_controls;
> + exec_control |= (vmcs12->vm_entry_controls &
> + ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
> exec_control &= ~(VM_ENTRY_IA32E_MODE | VM_ENTRY_LOAD_IA32_EFER);
> if (cpu_has_load_ia32_efer()) {
> if (guest_efer & EFER_LMA)

2022-07-28 22:44:17

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 21/25] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()

On Fri, Jul 29, 2022, Paolo Bonzini wrote:
> On 7/22/22 00:56, Sean Christopherson wrote:
> > Except the errata are based on FMS and the FMS exposed to the L1 hypervisor may
> > not be the real FMS.
> >
> > But that's moot, because they_should_ be fully emulated by KVM anyways; KVM
> > runs L2 with a MSR value modified by perf, not the raw MSR value requested by L1.
> >
> > Of course KVM screws things up and fails to clear the flag in entry controls...
> > All exit controls are emulated so at least KVM gets those right.
>
> Can you send this as a separate patch?

Will do.

2022-08-01 09:03:59

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags

Paolo Bonzini <[email protected]> writes:

> On 7/21/22 23:43, Sean Christopherson wrote:
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index c284a605e453..ca91547034e4 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -1282,7 +1282,7 @@ static bool hv_check_msr_access(struct kvm_vcpu_hv *hv_vcpu, u32 msr)
>> case HV_X64_MSR_SYNDBG_OPTIONS:
>> case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
>> return hv_vcpu->cpuid_cache.features_edx &
>> - HV_FEATURE_DEBUG_MSRS_AVAILABLE;
>> + HV_ACCESS_DEBUG_MSRS;
>> default:
>> break;
>> }
>>
>
> Yes, and this will need some kind of hack in QEMU to expose both CPUID
> bits. Fortunately hv-syndbg shouldn't be in much use in the wild, so I
> think we can avoid quirks etc.

Properly behaving VMM should always expose both bits. I'm not sure what
would it mean if only 'access' bit is present: you can try accessing the
missing feature but you get #GP anyway most likely. When the feature is
available but 'access' bit is not set -- the result is also #GP. In case
we really want to support this behavior in KVM we should probably check
*both* bits in hv_check_msr_access() but I don't really see a
use-case. I've lazily kept HV_FEATURE_DEBUG_MSRS_AVAILABLE here just to
be QEMU compatible.

--
Vitaly


2022-08-02 13:01:24

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v4 14/25] KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING in setup_vmcs_config()

Sean Christopherson <[email protected]> writes:

> On Thu, Jul 14, 2022, Vitaly Kuznetsov wrote:
>> SECONDARY_EXEC_ENCLS_EXITING is conditionally added to the 'optional'
>> checklist in setup_vmcs_config() but there's little value in doing so.
>> First, as the control is optional, we can always check for its
>> presence, no harm done. Second, the only real value cpu_has_sgx() check
>> gives is that on the CPUs which support SECONDARY_EXEC_ENCLS_EXITING but
>> don't support SGX, the control is not getting enabled. It's highly unlikely
>> such CPUs exist but it's possible that some hypervisors expose broken vCPU
>> models.
>
> It's not just broken vCPU models, SGX can be "soft-disabled" on bare metal, e.g. if
> software writes MCE control MSRs or there's an uncorrectable #MC (may not be the
> case on all platforms). This is architectural behavior and needs to be handled in
> KVM. Obviously if SGX gets disabled after KVM is loaded then we're out of luck, but
> having the ENCL-exiting control without SGX being enabled is 100% valid.
>
> As for why KVM bothers with the check, it's to work around a suspected hardware
> or XuCode bug (I'm still a bit shocked that's public now :-) ) where SGX got
> _hard_ disabled across S3 on some CPUs and made the fields magically disappear.
> The workaround was to soft-disable SGX in BIOS so that KVM wouldn't attempt to
> enable the ENCLS-exiting control

Oh, thanks for this insight, I had no idea! I'll adjust my commit
message accordingly.

>
>> Preserve cpu_has_sgx() check but filter the result of adjust_vmx_controls()
>> instead of the input.
>>
>> Reviewed-by: Jim Mattson <[email protected]>
>> Reviewed-by: Maxim Levitsky <[email protected]>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> arch/x86/kvm/vmx/vmx.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index ce54f13d8da1..566be73c6509 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -2528,9 +2528,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>> SECONDARY_EXEC_PT_CONCEAL_VMX |
>> SECONDARY_EXEC_ENABLE_VMFUNC |
>> SECONDARY_EXEC_BUS_LOCK_DETECTION |
>> - SECONDARY_EXEC_NOTIFY_VM_EXITING;
>> - if (cpu_has_sgx())
>> - opt2 |= SECONDARY_EXEC_ENCLS_EXITING;
>> + SECONDARY_EXEC_NOTIFY_VM_EXITING |
>> + SECONDARY_EXEC_ENCLS_EXITING;
>> +
>> if (adjust_vmx_controls(min2, opt2,
>> MSR_IA32_VMX_PROCBASED_CTLS2,
>> &_cpu_based_2nd_exec_control) < 0)
>> @@ -2577,6 +2577,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>> vmx_cap->vpid = 0;
>> }
>>
>> + if (!cpu_has_sgx())
>> + _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_ENCLS_EXITING;
>> +
>> if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
>> u64 opt3 = TERTIARY_EXEC_IPI_VIRT;
>>
>> --
>> 2.35.3
>>
>

--
Vitaly