2018-01-31 13:13:52

by KarimAllah Ahmed

Subject: [PATCH v4 0/5] KVM: Expose speculation control feature to guests

Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR, which future Intel processors will
use to indicate RDCL_NO and IBRS_ALL.
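
For orientation, the guest-side usage these MSRs enable looks roughly like the
sketch below (constants as used later in this series; purely illustrative and
not part of the patches):

	/* Inside the guest: Spectre v2 mitigation via the passed-through MSRs. */
	wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);	/* restrict indirect branch speculation */
	wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);	/* one-shot barrier: flush predictor state */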

v4:
- Add IBRS passthrough for SVM (5/5).
- Handle nested guests properly.
- Expose F(IBRS) in kvm_cpuid_8000_0008_ebx_x86_features.

Ashok Raj (1):
KVM: x86: Add IBPB support

KarimAllah Ahmed (4):
KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX
KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL
KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

arch/x86/kvm/cpuid.c | 22 +++++++---
arch/x86/kvm/cpuid.h | 1 +
arch/x86/kvm/svm.c | 85 ++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 1 +
5 files changed, 216 insertions(+), 7 deletions(-)

Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Asit Mallick <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Janakarajan Natarajan <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jun Nakajima <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

--
2.7.4



2018-01-31 13:13:54

by KarimAllah Ahmed

Subject: [PATCH v4 4/5] KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL

[ Based on a patch from Ashok Raj <[email protected]> ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of saving and restoring MSR_IA32_SPEC_CTRL for guests
that do not actually use the MSR, only start saving and restoring it once a
non-zero value is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.
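
For what it's worth, the write-filtering alluded to above might, hypothetically,
look like the sketch below; using X86_FEATURE_STIBP as the guest CPUID gate is
an assumption here, and none of this is in the present patch:

	/* Hypothetical future filtering: keep trapping writes and strip STIBP
	 * if the guest does not advertise it. */
	if ((data & SPEC_CTRL_STIBP) &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_STIBP))
		data &= ~SPEC_CTRL_STIBP;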

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Cc: Asit Mallick <[email protected]>
Cc: Arjan Van De Ven <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jun Nakajima <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ashok Raj <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
v4:
- Add IBRS to kvm_cpuid_8000_0008_ebx_x86_features
- Handling nested guests
v3:
- Save/restore manually
- Fix CPUID handling
- Fix a copy & paste error in the name of SPEC_CTRL MSR in
disable_intercept.
- support !cpu_has_vmx_msr_bitmap()
v2:
- remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
- special case writing '0' in SPEC_CTRL to avoid confusing live-migration
when the instance never used the MSR (dwmw@).
- depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
- add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
---
arch/x86/kvm/cpuid.c | 9 ++++---
arch/x86/kvm/vmx.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 2 +-
3 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..13f5d42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,

/* cpuid 0x80000008.ebx */
const u32 kvm_cpuid_8000_0008_ebx_x86_features =
- F(IBPB);
+ F(IBPB) | F(IBRS);

/* cpuid 0xC0000001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,

/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
- F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+ F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+ F(ARCH_CAPABILITIES);

/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
entry->edx = 0;
- /* IBPB isn't necessarily present in hardware cpuid */
+ /* IBRS and IBPB aren't necessarily present in hardware cpuid */
if (boot_cpu_has(X86_FEATURE_IBPB))
entry->ebx |= F(IBPB);
+ if (boot_cpu_has(X86_FEATURE_IBRS))
+ entry->ebx |= F(IBRS);
entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 40643b8..9080938 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -593,6 +593,8 @@ struct vcpu_vmx {
u64 msr_guest_kernel_gs_base;
#endif
u64 arch_capabilities;
+ u64 spec_ctrl;
+ bool save_spec_ctrl_on_exit;

u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
@@ -938,6 +940,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+ u32 msr, int type);

static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3238,6 +3242,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+ case MSR_IA32_SPEC_CTRL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+ return 1;
+
+ msr_info->data = to_vmx(vcpu)->spec_ctrl;
+ break;
case MSR_IA32_ARCH_CAPABILITIES:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
@@ -3351,6 +3362,35 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);
break;
+ case MSR_IA32_SPEC_CTRL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+ return 1;
+
+ /* The STIBP bit doesn't fault even if it's not advertised */
+ if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+ return 1;
+
+ vmx->spec_ctrl = data;
+
+ /*
+ * When it's written (to non-zero) for the first time, pass
+ * it through. This means we don't have to take the perf
+ * hit of saving it on vmexit for the common case of guests
+ * that don't use it.
+ */
+ if (cpu_has_vmx_msr_bitmap() && data &&
+ !vmx->save_spec_ctrl_on_exit) {
+ vmx->save_spec_ctrl_on_exit = true;
+
+ if (is_guest_mode(vcpu))
+ break;
+
+ vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
+ MSR_IA32_SPEC_CTRL,
+ MSR_TYPE_RW);
+ }
+ break;
case MSR_IA32_PRED_CMD:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
@@ -5667,6 +5707,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
u64 cr0;

vmx->rmode.vm86_active = 0;
+ vmx->spec_ctrl = 0;

vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vcpu, 0);
@@ -9338,6 +9379,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)

vmx_arm_hv_timer(vcpu);

+ /*
+ * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+ * it's non-zero. Since vmentry is serialising on affected CPUs, there
+ * is no need to worry about the conditional branch over the wrmsr
+ * being speculatively taken.
+ */
+ if (vmx->spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
vmx->__launched = vmx->loaded_vmcs->launched;
asm(
/* Store host registers */
@@ -9456,6 +9506,19 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
#endif
);

+ /*
+ * We do not use IBRS in the kernel. If this vCPU has used the
+ * SPEC_CTRL MSR it may have left it on; save the value and
+ * turn it off. This is much more efficient than blindly adding
+ * it to the atomic save/restore list. Especially as the former
+ * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+ */
+ if (vmx->save_spec_ctrl_on_exit)
+ rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
+
+ if (vmx->spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
/* Eliminate branch target predictions from guest mode */
vmexit_fill_RSB();

@@ -10119,6 +10182,11 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
nested_vmx_disable_intercept_for_msr(
msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_SPEC_CTRL,
+ MSR_TYPE_R | MSR_TYPE_W);
+
+ nested_vmx_disable_intercept_for_msr(
+ msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_PRED_CMD,
MSR_TYPE_R);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4ec142e..ac38143 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,7 +1009,7 @@ static u32 msrs_to_save[] = {
#endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
- MSR_IA32_ARCH_CAPABILITIES
+ MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
};

static unsigned num_msrs_to_save;
--
2.7.4


2018-01-31 13:14:17

by KarimAllah Ahmed

Subject: [PATCH v4 5/5] KVM: SVM: Allow direct access to MSR_IA32_SPEC_CTRL

[ Based on a patch from Paolo Bonzini <[email protected]> ]

... basically doing exactly what we do for VMX:

- Pass SPEC_CTRL through to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
actually used it.

Cc: Asit Mallick <[email protected]>
Cc: Arjan Van De Ven <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jun Nakajima <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ashok Raj <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/kvm/svm.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 89495cf..e1ba4c6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -184,6 +184,9 @@ struct vcpu_svm {
u64 gs_base;
} host;

+ u64 spec_ctrl;
+ bool save_spec_ctrl_on_exit;
+
u32 *msrpm;

ulong nmi_iret_rip;
@@ -1583,6 +1586,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
u32 dummy;
u32 eax = 1;

+ svm->spec_ctrl = 0;
+
if (!init_event) {
svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
MSR_IA32_APICBASE_ENABLE;
@@ -3604,6 +3609,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_VM_CR:
msr_info->data = svm->nested.vm_cr_msr;
break;
+ case MSR_IA32_SPEC_CTRL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+ return 1;
+
+ msr_info->data = svm->spec_ctrl;
+ break;
case MSR_IA32_UCODE_REV:
msr_info->data = 0x01000065;
break;
@@ -3695,6 +3707,30 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
+ case MSR_IA32_SPEC_CTRL:
+ if (!msr->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+ return 1;
+
+ /* The STIBP bit doesn't fault even if it's not advertised */
+ if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+ return 1;
+
+ svm->spec_ctrl = data;
+
+ /*
+ * When it's written (to non-zero) for the first time, pass
+ * it through. This means we don't have to take the perf
+ * hit of saving it on vmexit for the common case of guests
+ * that don't use it.
+ */
+ if (data && !svm->save_spec_ctrl_on_exit) {
+ svm->save_spec_ctrl_on_exit = true;
+ if (is_guest_mode(vcpu))
+ break;
+ set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+ }
+ break;
case MSR_IA32_PRED_CMD:
if (!msr->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
@@ -4963,6 +4999,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)

local_irq_enable();

+ /*
+ * If this vCPU has touched SPEC_CTRL, restore the guest's value if
+ * it's non-zero. Since vmentry is serialising on affected CPUs, there
+ * is no need to worry about the conditional branch over the wrmsr
+ * being speculatively taken.
+ */
+ if (svm->spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
asm volatile (
"push %%" _ASM_BP "; \n\t"
"mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
@@ -5055,6 +5100,19 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
#endif
);

+ /*
+ * We do not use IBRS in the kernel. If this vCPU has used the
+ * SPEC_CTRL MSR it may have left it on; save the value and
+ * turn it off. This is much more efficient than blindly adding
+ * it to the atomic save/restore list. Especially as the former
+ * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
+ */
+ if (svm->save_spec_ctrl_on_exit)
+ rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
+
+ if (svm->spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
/* Eliminate branch target predictions from guest mode */
vmexit_fill_RSB();

--
2.7.4


2018-01-31 13:14:50

by KarimAllah Ahmed

Subject: [PATCH v4 3/5] KVM: VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

Future Intel processors will use the MSR_IA32_ARCH_CAPABILITIES MSR to indicate
RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default its
contents come directly from the hardware, but user-space can still override it.
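
To make the two bits concrete, a guest could consume them roughly as follows
(sketch only; the bit positions are those stated above, and the accessors are
the guest kernel's usual MSR helpers):

	u64 arch_cap = 0;

	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, arch_cap);

	if (arch_cap & (1ULL << 0))	/* RDCL_NO: rogue data cache load not possible */
		/* Meltdown-style mitigations could be skipped. */;
	if (arch_cap & (1ULL << 1))	/* IBRS_ALL: enhanced IBRS supported */
		/* IBRS could be preferred over retpolines. */;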

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Cc: Asit Mallick <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Arjan Van De Ven <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jun Nakajima <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Ashok Raj <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/vmx.c | 15 +++++++++++++++
arch/x86/kvm/x86.c | 1 +
3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,

/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
- F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+ F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);

/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 96e672e..40643b8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -592,6 +592,8 @@ struct vcpu_vmx {
u64 msr_host_kernel_gs_base;
u64 msr_guest_kernel_gs_base;
#endif
+ u64 arch_capabilities;
+
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
u32 secondary_exec_control;
@@ -3236,6 +3238,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+ case MSR_IA32_ARCH_CAPABILITIES:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+ return 1;
+ msr_info->data = to_vmx(vcpu)->arch_capabilities;
+ break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
@@ -3362,6 +3370,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
MSR_TYPE_W);
break;
+ case MSR_IA32_ARCH_CAPABILITIES:
+ if (!msr_info->host_initiated)
+ return 1;
+ vmx->arch_capabilities = data;
+ break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5624,6 +5637,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}

+ if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+ rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);

vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..4ec142e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
#endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+ MSR_IA32_ARCH_CAPABILITIES
};

static unsigned num_msrs_to_save;
--
2.7.4


2018-01-31 13:15:40

by KarimAllah Ahmed

Subject: [PATCH v4 1/5] KVM: x86: Update the reverse_cpuid list to include CPUID_7_EDX

[dwmw2: Stop using KF() for bits in it, too]
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/kvm/cpuid.c | 8 +++-----
arch/x86/kvm/cpuid.h | 1 +
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..c0eb337 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)

#define F(x) bit(X86_FEATURE_##x)

-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW 2
-#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+/* For scattered features from cpufeatures.h; we currently expose none */
#define KF(x) bit(KVM_CPUID_BIT_##x)

int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,

/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
- KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+ F(AVX512_4VNNIW) | F(AVX512_4FMAPS);

/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
entry->ecx &= ~F(PKU);
entry->edx &= kvm_cpuid_7_0_edx_x86_features;
- entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+ cpuid_mask(&entry->edx, CPUID_7_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2cea66..9a327d5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
+ [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
};

static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
--
2.7.4


2018-01-31 13:15:54

by KarimAllah Ahmed

Subject: [PATCH v4 2/5] KVM: x86: Add IBPB support

From: Ashok Raj <[email protected]>

Add MSR passthrough for MSR_IA32_PRED_CMD and issue branch predictor barriers
when switching between VMs to avoid inter-VM Spectre v2 attacks.

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
- vmx: expose PRED_CMD if guest has it in CPUID
- svm: only pass through IBPB if guest has it in CPUID
- vmx: support !cpu_has_vmx_msr_bitmap()
- vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
PRED_CMD is a write-only MSR]

Cc: Asit Mallick <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Arjan Van De Ven <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jun Nakajima <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
---
arch/x86/kvm/cpuid.c | 11 ++++++++++-
arch/x86/kvm/svm.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/vmx.c | 31 ++++++++++++++++++++++++++++++-
3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);

+ /* cpuid 0x80000008.ebx */
+ const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+ F(IBPB);
+
/* cpuid 0xC0000001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
- entry->ebx = entry->edx = 0;
+ entry->edx = 0;
+ /* IBPB isn't necessarily present in hardware cpuid */
+ if (boot_cpu_has(X86_FEATURE_IBPB))
+ entry->ebx |= F(IBPB);
+ entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+ cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
}
case 0x80000019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..89495cf 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -529,6 +529,7 @@ struct svm_cpu_data {
struct kvm_ldttss_desc *tss_desc;

struct page *save_area;
+ struct vmcb *current_vmcb;
};

static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1703,11 +1704,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
+ /*
+ * The vmcb page can be recycled, causing a false negative in
+ * svm_vcpu_load(). So do a full IBPB now.
+ */
+ indirect_branch_prediction_barrier();
}

static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
int i;

if (unlikely(cpu != vcpu->cpu)) {
@@ -1736,6 +1743,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (static_cpu_has(X86_FEATURE_RDTSCP))
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);

+ if (sd->current_vmcb != svm->vmcb) {
+ sd->current_vmcb = svm->vmcb;
+ indirect_branch_prediction_barrier();
+ }
avic_vcpu_load(vcpu, cpu);
}

@@ -3684,6 +3695,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
+ case MSR_IA32_PRED_CMD:
+ if (!msr->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+ return 1;
+
+ if (data & ~PRED_CMD_IBPB)
+ return 1;
+
+ if (!data)
+ break;
+
+ wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+ if (is_guest_mode(vcpu))
+ break;
+ set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+ break;
case MSR_STAR:
svm->vmcb->save.star = data;
break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d46a61b..96e672e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
vmcs_load(vmx->loaded_vmcs->vmcs);
+ indirect_branch_prediction_barrier();
}

if (!already_loaded) {
@@ -3342,6 +3343,25 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);
break;
+ case MSR_IA32_PRED_CMD:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+ return 1;
+
+ if (data & ~PRED_CMD_IBPB)
+ return 1;
+
+ if (!data)
+ break;
+
+ wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+
+ if (is_guest_mode(vcpu))
+ break;
+
+ vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+ MSR_TYPE_W);
+ break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -10046,7 +10066,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;

/* This shortcut is ok because we support only x2APIC MSRs so far. */
- if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+ if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+ !to_vmx(vcpu)->save_spec_ctrl_on_exit)
return false;

page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -10079,6 +10100,14 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
MSR_TYPE_W);
}
}
+
+ if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
+ nested_vmx_disable_intercept_for_msr(
+ msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PRED_CMD,
+ MSR_TYPE_R);
+ }
+
kunmap(page);
kvm_release_page_clean(page);

--
2.7.4


2018-01-31 16:53:10

by Jim Mattson

Subject: Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed <[email protected]> wrote:

> + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> + MSR_TYPE_W);

Why not disable this intercept eagerly, rather than lazily? Unlike
MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
is no cost to disabling the intercept if the guest cpuid info declares
support for it.
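
A concrete shape of that eager variant -- hooked, hypothetically, wherever
guest CPUID updates are processed, and using only helpers already present in
the patch -- might be:

	/* Sketch: open the PRED_CMD intercept as soon as guest CPUID allows it;
	 * there is no guest value to track, so nothing needs to happen lazily. */
	if (cpu_has_vmx_msr_bitmap() &&
	    guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
					      MSR_IA32_PRED_CMD, MSR_TYPE_W);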


> + if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
> + nested_vmx_disable_intercept_for_msr(
> + msr_bitmap_l1, msr_bitmap_l0,
> + MSR_IA32_PRED_CMD,
> + MSR_TYPE_R);
> + }

I don't think this should be predicated on
"to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
"guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
MSR_TYPE_R.
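
The corrected nested call would presumably be:

	nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
					     MSR_IA32_PRED_CMD,
					     MSR_TYPE_W);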

2018-01-31 16:57:11

by Paolo Bonzini

Subject: Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

On 31/01/2018 11:50, Jim Mattson wrote:
>> + if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>> + nested_vmx_disable_intercept_for_msr(
>> + msr_bitmap_l1, msr_bitmap_l0,
>> + MSR_IA32_PRED_CMD,
>> + MSR_TYPE_R);
>> + }
> I don't think this should be predicated on
> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
> MSR_TYPE_R.

In fact this MSR can even be passed down unconditionally, since it needs
no save/restore and has no ill performance effect on the sibling
hyperthread.

Only MSR_IA32_SPEC_CTRL needs to be conditional on
"to_vmx(vcpu)->save_spec_ctrl_on_exit".

Paolo

2018-01-31 17:16:40

by KarimAllah Ahmed

Subject: Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

On 01/31/2018 05:50 PM, Jim Mattson wrote:
> On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed <[email protected]> wrote:
>
>> + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
>> + MSR_TYPE_W);
>
> Why not disable this intercept eagerly, rather than lazily? Unlike
> MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
> is no cost to disabling the intercept if the guest cpuid info declares
> support for it.
>
>
>> + if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>> + nested_vmx_disable_intercept_for_msr(
>> + msr_bitmap_l1, msr_bitmap_l0,
>> + MSR_IA32_PRED_CMD,
>> + MSR_TYPE_R);
>> + }
>
> I don't think this should be predicated on
> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"?

Paolo suggested this on the previous revision because guest_cpuid_has()
would be slow.

> Also, the final argument to
> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
> MSR_TYPE_R.
>
Oops! will fix!

2018-01-31 17:19:58

by Paolo Bonzini

Subject: Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

On 31/01/2018 12:11, KarimAllah Ahmed wrote:
> On 01/31/2018 05:50 PM, Jim Mattson wrote:
>> On Wed, Jan 31, 2018 at 5:10 AM, KarimAllah Ahmed <[email protected]>
>> wrote:
>>
>>> +               vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
>>> MSR_IA32_PRED_CMD,
>>> +                                             MSR_TYPE_W);
>>
>> Why not disable this intercept eagerly, rather than lazily? Unlike
>> MSR_IA32_SPEC_CTRL, there is no guest value to save/restore, so there
>> is no cost to disabling the intercept if the guest cpuid info declares
>> support for it.
>>
>>
>>> +       if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>>> +               nested_vmx_disable_intercept_for_msr(
>>> +                               msr_bitmap_l1, msr_bitmap_l0,
>>> +                               MSR_IA32_PRED_CMD,
>>> +                               MSR_TYPE_R);
>>> +       }
>>
>> I don't think this should be predicated on
>> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
>> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"?
>
> Paolo suggested this on the previous revision because guest_cpuid_has()
> would be slow.

Sorry, that was for spec_ctrl. Here there's no need to do any kind of
conditional check.

Paolo

>> Also, the final argument to
>> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
>> MSR_TYPE_R.
>>
> Oops! will fix!


2018-01-31 17:22:20

by KarimAllah Ahmed

Subject: Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

On 01/31/2018 05:55 PM, Paolo Bonzini wrote:
> On 31/01/2018 11:50, Jim Mattson wrote:
>>> + if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
>>> + nested_vmx_disable_intercept_for_msr(
>>> + msr_bitmap_l1, msr_bitmap_l0,
>>> + MSR_IA32_PRED_CMD,
>>> + MSR_TYPE_R);
>>> + }
>> I don't think this should be predicated on
>> "to_vmx(vcpu)->save_spec_ctrl_on_exit." Why not just
>> "guest_cpuid_has(vcpu, X86_FEATURE_IBPB)"? Also, the final argument to
>> nested_vmx_disable_intercept_for_msr should be MSR_TYPE_W rather than
>> MSR_TYPE_R.
>
> In fact this MSR can even be passed down unconditionally, since it needs
> no save/restore and has no ill performance effect on the sibling
> hyperthread.
>
> Only MSR_IA32_SPEC_CTRL needs to be conditional on
> "to_vmx(vcpu)->save_spec_ctrl_on_exit".

That used to be the case in an earlier version. There seem to be two
opinions here:

1) Pass it only if CPUID for the guest has it.
2) Pass it unconditionally.

I do not really have a preference.


>
> Paolo
>

2018-01-31 17:55:51

by Jim Mattson

Subject: Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

On Wed, Jan 31, 2018 at 8:55 AM, Paolo Bonzini <[email protected]> wrote:

> In fact this MSR can even be passed down unconditionally, since it needs
> no save/restore and has no ill performance effect on the sibling
> hyperthread.

I'm a bit surprised to hear that IBPB has no ill performance impact on
the sibling hyperthread. On current CPUs, this has to flush the BTB,
doesn't it? And since the BTB is shared between hyperthreads, doesn't
the sibling lose all of its branch predictions?

2018-01-31 18:05:37

by Van De Ven, Arjan

Subject: RE: [PATCH v4 2/5] KVM: x86: Add IBPB support


> On Wed, Jan 31, 2018 at 8:55 AM, Paolo Bonzini <[email protected]> wrote:
>
> > In fact this MSR can even be passed down unconditionally, since it needs
> > no save/restore and has no ill performance effect on the sibling
> > hyperthread.
>
> I'm a bit surprised to hear that IBPB has no ill performance impact on
> the sibling hyperthread. On current CPUs, this has to flush the BTB,
> doesn't it? And since the BTB is shared between hyperthreads, doesn't
> the sibling lose all of its branch predictions?

IBPB most definitely (in current implementations) will stop both hyperthreads for
the flush.

IBPB is not a cheap operation at a system level.

2018-01-31 18:11:18

by Paolo Bonzini

Subject: Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

Looking at SVM now...

On 31/01/2018 08:10, KarimAllah Ahmed wrote:
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f40d0da..89495cf 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -529,6 +529,7 @@ struct svm_cpu_data {
> struct kvm_ldttss_desc *tss_desc;
>
> struct page *save_area;
> + struct vmcb *current_vmcb;
> };
>
> static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -1703,11 +1704,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
> __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
> kvm_vcpu_uninit(vcpu);
> kmem_cache_free(kvm_vcpu_cache, svm);
> + /*
> + * The vmcb page can be recycled, causing a false negative in
> + * svm_vcpu_load(). So do a full IBPB now.
> + */
> + indirect_branch_prediction_barrier();
> }
>
> static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> + struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
> int i;
>
> if (unlikely(cpu != vcpu->cpu)) {
> @@ -1736,6 +1743,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> if (static_cpu_has(X86_FEATURE_RDTSCP))
> wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> + if (sd->current_vmcb != svm->vmcb) {
> + sd->current_vmcb = svm->vmcb;
> + indirect_branch_prediction_barrier();
> + }
> avic_vcpu_load(vcpu, cpu);
> }
>
> @@ -3684,6 +3695,22 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> case MSR_IA32_TSC:
> kvm_write_tsc(vcpu, msr);
> break;
> + case MSR_IA32_PRED_CMD:
> + if (!msr->host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> + return 1;
> +
> + if (data & ~PRED_CMD_IBPB)
> + return 1;
> +
> + if (!data)
> + break;
> +
> + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

This is missing an

{ .index = MSR_IA32_PRED_CMD, .always = false },

in the direct_access_msrs array. Once you do this, nested is taken care of
automatically.

Likewise for MSR_IA32_SPEC_CTRL in patch 5.
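
For reference, direct_access_msrs in svm.c is a table of { index, always }
pairs; the suggested addition would sit alongside the existing entries, roughly
like this (the neighbouring entry and field comments are a sketch from memory,
not quoted from the tree):

	static const struct svm_direct_access_msrs {
		u32 index;	/* Index of the MSR */
		bool always;	/* pass the MSR through unconditionally */
	} direct_access_msrs[] = {
		{ .index = MSR_STAR,		.always = true  },
		/* ... existing entries ... */
		{ .index = MSR_IA32_PRED_CMD,	.always = false },
	};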

Paolo

> + if (is_guest_mode(vcpu))
> + break;
> + set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
> + break;
> case MSR_STAR:
> svm->vmcb->save.star = data;
> break;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d46a61b..96e672e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2285,6 +2285,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
> per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
> vmcs_load(vmx->loaded_vmcs->vmcs);
> + indirect_branch_prediction_barrier();
> }
>
> if (!already_loaded) {
> @@ -3342,6 +3343,25 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_TSC:
> kvm_write_tsc(vcpu, msr_info);
> break;
> + case MSR_IA32_PRED_CMD:
> + if (!msr_info->host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
> + return 1;
> +
> + if (data & ~PRED_CMD_IBPB)
> + return 1;
> +
> + if (!data)
> + break;
> +
> + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
> +
> + if (is_guest_mode(vcpu))
> + break;
> +
> + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> + MSR_TYPE_W);
> + break;
> case MSR_IA32_CR_PAT:
> if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
> if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
> @@ -10046,7 +10066,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
>
> /* This shortcut is ok because we support only x2APIC MSRs so far. */
> - if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
> + if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
> + !to_vmx(vcpu)->save_spec_ctrl_on_exit)
> return false;
>
> page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
> @@ -10079,6 +10100,14 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
> MSR_TYPE_W);
> }
> }
> +
> + if (to_vmx(vcpu)->save_spec_ctrl_on_exit) {
> + nested_vmx_disable_intercept_for_msr(
> + msr_bitmap_l1, msr_bitmap_l0,
> + MSR_IA32_PRED_CMD,
> + MSR_TYPE_R);
> + }
> +
> kunmap(page);
> kvm_release_page_clean(page);
>
>


2018-01-31 18:55:03

by Paolo Bonzini

Subject: Re: [PATCH v4 2/5] KVM: x86: Add IBPB support

On 31/01/2018 12:39, Jim Mattson wrote:
> On Wed, Jan 31, 2018 at 8:55 AM, Paolo Bonzini <[email protected]> wrote:
>
>> In fact this MSR can even be passed down unconditionally, since it needs
>> no save/restore and has no ill performance effect on the sibling
>> hyperthread.
>
> I'm a bit surprised to hear that IBPB has no ill performance impact on
> the sibling hyperthread. On current CPUs, this has to flush the BTB,
> doesn't it? And since the BTB is shared between hyperthreads, doesn't
> the sibling lose all of its branch predictions?

Yeah, I knew about that, but I hadn't heard yet that it also blocks the
hyperthread while you do the write. It makes sense in retrospect.

In any case, there's no difference (unlike IBRS) in the vmexit cost if
the MSR is passed through.

Paolo