2024-04-10 15:01:55

by Chao Gao

[permalink] [raw]
Subject: [RFC PATCH v3 00/10] Virtualize Intel IA32_SPEC_CTRL

Hi all,

This series is tagged as RFC because I want to seek your feedback on

1. the KVM<->userspace ABI defined in patch 1

I am wondering if we can allow the userspace to configure the mask
and the shadow value during guest's lifetime and do it on a vCPU basis.
this way, in conjunction with "virtual MSRs" or any other interfaces,
the usespace can adjust hardware mitigations applied to the guest during
guest's lifetime e.g., for the best performance.

2. Intel-defined virtual MSRs vs. a new interface

The situation is some other OS already adopts the Intel-defined virtual
MSRs. Given this, I am not sure whether defining a new interface is
still preferable, as it will add more complexities if we end up with two
interfaces for the same purpose.

So, I just want to reconfirm whether the suggestion remains to define a
new interface through community collaboration as suggested at [1].



Below is the cover letter:

Background
==========

Branch History Injection (BHI) is a special form of Spectre variant 2,
where an attacker may manipulate branch history before transitioning
from user to supervisor mode (or from VMX non-root/guest to root mode)
in an effort to cause an indirect branch predictor to select a specific
predictor entry for an indirect branch, and a disclosure gadget at the
predicted target will transiently execute.

To mitigate BHI attacks, the kernel may use the hardware mitigation, i.e.,
BHI_DIS_S or resort to a SW loop, i.e., the BHB-clearing sequence, when the
hardware mitigation is not supported.


Problem
=======

However, the SW loop is effective on pre-SPR parts but not on SPR and
future parts. This creates a mitigation effectiveness problem for virtual
machines:

Migrating a guest using the SW loop on a pre-SPR part to parts where
the SW loop is ineffective (e.g., a SPR or future part) makes the
guest become vulnerable to BHI.

[For bare-metal, it isn't a problem. because parts on which the SW loop
is ineffective always support BHI_DIS_S, which is a more preferable
mitigation than the SW loop.]


Solution
========
This series proposes QEMU+KVM to deploy BHI_DIS_S using "virtualize
IA32_SPEC_CTRL" for the guest if the SW loop is ineffective on the host.

Note that: "virtualize IA32_SPEC_CTRL" allows the VMM to prevent the
guest from changing some bits of IA32_SPEC_CTRL MSR w/o intercepting
guest's writes to the MSR.


This solution leads to a new problem:

Deploying BHI_DIS_S for the guest may cause unnecessary performance loss
if the guest is using other mitigations for BHI or doesn't care BHI
attacks at all.

To overcome this unnecessary performance loss, we want to allow the guest
to opt out of BHI_DIS_S in this case. the idea is to let the guest report
whether it is using the SW loop to KVM/QEMU. Then KVM/QEMU won't deploy
BHI_DIS_S for the guest if the SW loop isn't in use.

Intel defines a set of para-virtualized MSRs [2] for guests to report
software mitigation status. This series emulates the para-virtualized
MSRs in KVM.

Overall, the series has two parts:
1. patch 1-3: Define the KVM ABI for userspace VMMs (e.g., QEMU) to deploy
hardware mitigations for the guest to solve the mitigation effectivenss
problem when migrating guests across parts w/ different microarchitecture.

2. patch 4-10: Emulate virtual MSRs so that the guest can report software
mitigation status to avoid the unnecessary performance loss.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html#inpage-nav-4


Chao Gao (4):
KVM: VMX: Cache IA32_SPEC_CTRL_SHADOW field of VMCS
KVM: nVMX: Enable SPEC_CTRL virtualizaton for vmcs02
KVM: VMX: Cache force_spec_ctrl_value/mask for each vCPU
KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT

Daniel Sneddon (1):
KVM: VMX: Virtualize Intel IA32_SPEC_CTRL

Pawan Gupta (2):
x86/bugs: Use Virtual MSRs to request BHI_DIS_S
x86/bugs: Use Virtual MSRs to request RRSBA_DIS_S

Zhang Chen (3):
KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
KVM: VMX: Advertise MITIGATION_CTRL support
KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT

Documentation/virt/kvm/api.rst | 39 +++++++
arch/x86/include/asm/kvm_host.h | 4 +
arch/x86/include/asm/msr-index.h | 24 +++++
arch/x86/include/asm/vmx.h | 5 +
arch/x86/include/asm/vmxfeatures.h | 2 +
arch/x86/kernel/cpu/bugs.c | 33 ++++++
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/cpu.h | 1 +
arch/x86/kvm/svm/svm.c | 3 +
arch/x86/kvm/vmx/capabilities.h | 5 +
arch/x86/kvm/vmx/nested.c | 30 ++++++
arch/x86/kvm/vmx/vmx.c | 162 +++++++++++++++++++++++++++--
arch/x86/kvm/vmx/vmx.h | 21 +++-
arch/x86/kvm/x86.c | 49 ++++++++-
arch/x86/kvm/x86.h | 1 +
include/uapi/linux/kvm.h | 4 +
16 files changed, 376 insertions(+), 8 deletions(-)


base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
--
2.39.3



2024-04-10 15:07:09

by Chao Gao

[permalink] [raw]
Subject: [RFC PATCH v3 08/10] KVM: VMX: Advertise MITIGATION_CTRL support

From: Zhang Chen <[email protected]>

Advertise MITIGATION_CTRL support and emulate accesses to two associated
MSRs.

MITIGATION_CTRL is enumerated by bit 0 of MSR_VIRTUAL_ENUMERATION. If
supported, two virtual MSRs MSR_VIRTUAL_MITIGATION_ENUM(0x50000001) and
MSR_VIRTUAL_MITIGATION_CTRL(0x50000002) are available.

The guest can use the two MSRs to report software mitigation status.
According to this information, KVM can deploy some alternative
mitigations (e.g., hardware mitigations) for the guest if some software
mitigations are not effective on the host.

Signed-off-by: Zhang Chen <[email protected]>
Co-developed-by: Chao Gao <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
---
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/vmx/vmx.c | 36 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 3 +++
arch/x86/kvm/x86.c | 3 +++
4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e3406971a8b7..8a080592aa54 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4289,6 +4289,8 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
switch (index) {
case MSR_IA32_MCG_EXT_CTL:
case MSR_VIRTUAL_ENUMERATION:
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ case MSR_VIRTUAL_MITIGATION_CTRL:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
return false;
case MSR_IA32_SMBASE:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index dcb06406fd09..cc260b14f8df 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1955,7 +1955,9 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
return !(msr->data & ~valid_bits);
}

-#define VIRTUAL_ENUMERATION_VALID_BITS 0ULL
+#define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
+#define MITI_ENUM_VALID_BITS 0ULL
+#define MITI_CTRL_VALID_BITS 0ULL

static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
@@ -1967,6 +1969,9 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
case MSR_VIRTUAL_ENUMERATION:
msr->data = VIRTUAL_ENUMERATION_VALID_BITS;
return 0;
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ msr->data = MITI_ENUM_VALID_BITS;
+ return 0;
default:
return KVM_MSR_RET_INVALID;
}
@@ -2124,6 +2129,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
msr_info->data = vmx->msr_virtual_enumeration;
break;
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ if (!msr_info->host_initiated &&
+ !(vmx->msr_virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return 1;
+ msr_info->data = vmx->msr_virtual_mitigation_enum;
+ break;
+ case MSR_VIRTUAL_MITIGATION_CTRL:
+ if (!msr_info->host_initiated &&
+ !(vmx->msr_virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return 1;
+ msr_info->data = vmx->msr_virtual_mitigation_ctrl;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2476,7 +2493,23 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

vmx->msr_virtual_enumeration = data;
break;
+ case MSR_VIRTUAL_MITIGATION_ENUM:
+ if (!msr_info->host_initiated)
+ return 1;
+ if (data & ~MITI_ENUM_VALID_BITS)
+ return 1;
+
+ vmx->msr_virtual_mitigation_enum = data;
+ break;
+ case MSR_VIRTUAL_MITIGATION_CTRL:
+ if (!msr_info->host_initiated &&
+ !(vmx->msr_virtual_enumeration & VIRT_ENUM_MITIGATION_CTRL_SUPPORT))
+ return 1;
+ if (data & ~MITI_CTRL_VALID_BITS)
+ return 1;

+ vmx->msr_virtual_mitigation_ctrl = data;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_index);
@@ -4901,6 +4934,7 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
*/
vmx->pi_desc.nv = POSTED_INTR_VECTOR;
vmx->pi_desc.sn = 1;
+ vmx->msr_virtual_mitigation_ctrl = 0;
}

static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0519cf6187ac..7be5dd5dde6c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -296,6 +296,9 @@ struct vcpu_vmx {

u64 msr_virtual_enumeration;

+ u64 msr_virtual_mitigation_enum;
+ u64 msr_virtual_mitigation_ctrl;
+
u32 msr_ia32_umwait_control;

/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4721b6fe7641..f55d26d7c79a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1565,6 +1565,8 @@ static const u32 emulated_msrs_all[] = {
MSR_K7_HWCR,
MSR_KVM_POLL_CONTROL,
MSR_VIRTUAL_ENUMERATION,
+ MSR_VIRTUAL_MITIGATION_ENUM,
+ MSR_VIRTUAL_MITIGATION_CTRL,
};

static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
@@ -1581,6 +1583,7 @@ static const u32 msr_based_features_all_except_vmx[] = {
MSR_IA32_ARCH_CAPABILITIES,
MSR_IA32_PERF_CAPABILITIES,
MSR_VIRTUAL_ENUMERATION,
+ MSR_VIRTUAL_MITIGATION_ENUM,
};

static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
--
2.39.3


2024-04-10 15:10:58

by Chao Gao

[permalink] [raw]
Subject: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT

From: Zhang Chen <[email protected]>

Allow guest to report if the short BHB-clearing sequence is in use.

KVM will deploy BHI_DIS_S for the guest if the short BHB-clearing
sequence is in use and the processor doesn't enumerate BHI_NO.

Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc260b14f8df..c5ceaebd954b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1956,8 +1956,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
}

#define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
-#define MITI_ENUM_VALID_BITS 0ULL
-#define MITI_CTRL_VALID_BITS 0ULL
+#define MITI_ENUM_VALID_BITS MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT
+#define MITI_CTRL_VALID_BITS MITI_CTRL_BHB_CLEAR_SEQ_S_USED

static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
@@ -2204,7 +2204,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
struct vmx_uret_msr *msr;
int ret = 0;
u32 msr_index = msr_info->index;
- u64 data = msr_info->data;
+ u64 data = msr_info->data, spec_ctrl_mask = 0;
u32 index;

switch (msr_index) {
@@ -2508,6 +2508,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & ~MITI_CTRL_VALID_BITS)
return 1;

+ if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
+ kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
+ !(host_arch_capabilities & ARCH_CAP_BHI_NO))
+ spec_ctrl_mask |= SPEC_CTRL_BHI_DIS_S;
+
+ /*
+ * Intercept IA32_SPEC_CTRL to disallow guest from changing
+ * certain bits if "virtualize IA32_SPEC_CTRL" isn't supported
+ * e.g., in nested case.
+ */
+ if (spec_ctrl_mask && !cpu_has_spec_ctrl_shadow())
+ vmx_enable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
+
+ /*
+ * KVM_CAP_FORCE_SPEC_CTRL takes precedence over
+ * MSR_VIRTUAL_MITIGATION_CTRL.
+ */
+ spec_ctrl_mask &= ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask;
+
+ vmx->force_spec_ctrl_mask = vmx->vcpu.kvm->arch.force_spec_ctrl_mask |
+ spec_ctrl_mask;
+ vmx->force_spec_ctrl_value = vmx->vcpu.kvm->arch.force_spec_ctrl_value |
+ spec_ctrl_mask;
+ vmx_set_spec_ctrl(&vmx->vcpu, vmx->spec_ctrl_shadow);
+
vmx->msr_virtual_mitigation_ctrl = data;
break;
default:
--
2.39.3


2024-04-10 15:15:26

by Chao Gao

[permalink] [raw]
Subject: [RFC PATCH v3 10/10] KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT

Allow guest to report if retpoline is used in supervisor mode.

KVM will deploy RRSBA_DIS_S for guest if guest is using retpoline and
the processor enumerates RRSBA.

Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c5ceaebd954b..235cb6ad69c0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1956,8 +1956,10 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
}

#define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
-#define MITI_ENUM_VALID_BITS MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT
-#define MITI_CTRL_VALID_BITS MITI_CTRL_BHB_CLEAR_SEQ_S_USED
+#define MITI_ENUM_VALID_BITS (MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT | \
+ MITI_ENUM_RETPOLINE_S_SUPPORT)
+#define MITI_CTRL_VALID_BITS (MITI_CTRL_BHB_CLEAR_SEQ_S_USED | \
+ MITI_CTRL_RETPOLINE_S_USED)

static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
@@ -2508,6 +2510,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & ~MITI_CTRL_VALID_BITS)
return 1;

+ if (data & MITI_CTRL_RETPOLINE_S_USED &&
+ kvm_cpu_cap_has(X86_FEATURE_RRSBA_CTRL) &&
+ host_arch_capabilities & ARCH_CAP_RRSBA)
+ spec_ctrl_mask |= SPEC_CTRL_RRSBA_DIS_S;
+
if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
!(host_arch_capabilities & ARCH_CAP_BHI_NO))
--
2.39.3


2024-06-11 01:34:23

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT

On Wed, Apr 10, 2024, Chao Gao wrote:
> From: Zhang Chen <[email protected]>
>
> Allow guest to report if the short BHB-clearing sequence is in use.
>
> KVM will deploy BHI_DIS_S for the guest if the short BHB-clearing
> sequence is in use and the processor doesn't enumerate BHI_NO.
>
> Signed-off-by: Zhang Chen <[email protected]>
> Signed-off-by: Chao Gao <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 31 ++++++++++++++++++++++++++++---
> 1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index cc260b14f8df..c5ceaebd954b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1956,8 +1956,8 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
> }
>
> #define VIRTUAL_ENUMERATION_VALID_BITS VIRT_ENUM_MITIGATION_CTRL_SUPPORT
> -#define MITI_ENUM_VALID_BITS 0ULL
> -#define MITI_CTRL_VALID_BITS 0ULL
> +#define MITI_ENUM_VALID_BITS MITI_ENUM_BHB_CLEAR_SEQ_S_SUPPORT
> +#define MITI_CTRL_VALID_BITS MITI_CTRL_BHB_CLEAR_SEQ_S_USED
>
> static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> {
> @@ -2204,7 +2204,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> struct vmx_uret_msr *msr;
> int ret = 0;
> u32 msr_index = msr_info->index;
> - u64 data = msr_info->data;
> + u64 data = msr_info->data, spec_ctrl_mask = 0;
> u32 index;
>
> switch (msr_index) {
> @@ -2508,6 +2508,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (data & ~MITI_CTRL_VALID_BITS)
> return 1;
>
> + if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
> + kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
> + !(host_arch_capabilities & ARCH_CAP_BHI_NO))
> + spec_ctrl_mask |= SPEC_CTRL_BHI_DIS_S;
> +
> + /*
> + * Intercept IA32_SPEC_CTRL to disallow guest from changing
> + * certain bits if "virtualize IA32_SPEC_CTRL" isn't supported
> + * e.g., in nested case.
> + */
> + if (spec_ctrl_mask && !cpu_has_spec_ctrl_shadow())
> + vmx_enable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
> +
> + /*
> + * KVM_CAP_FORCE_SPEC_CTRL takes precedence over
> + * MSR_VIRTUAL_MITIGATION_CTRL.
> + */
> + spec_ctrl_mask &= ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask;
> +
> + vmx->force_spec_ctrl_mask = vmx->vcpu.kvm->arch.force_spec_ctrl_mask |
> + spec_ctrl_mask;
> + vmx->force_spec_ctrl_value = vmx->vcpu.kvm->arch.force_spec_ctrl_value |
> + spec_ctrl_mask;
> + vmx_set_spec_ctrl(&vmx->vcpu, vmx->spec_ctrl_shadow);
> +
> vmx->msr_virtual_mitigation_ctrl = data;
> break;

I continue find all of this unpalatable. The guest tells KVM what software
mitigations the guest is using, and then KVM is supposed to translate that into
some hardware functionality? And merge that with userspace's own overrides?

Blech.

With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
there's no need for KVM to handle them for performance reasons.

That way KVM doesn't need to deal with the the virtual MSRs, userspace can make
an informed decision when deciding how to set KVM_CAP_FORCE_SPEC_CTRL, and as a
bonus, rollouts for new mitigation thingies should be faster as updating userspace
is typically easier than updating the kernel/KVM.

2024-06-11 10:49:18

by Chao Gao

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT

>> + if (data & MITI_CTRL_BHB_CLEAR_SEQ_S_USED &&
>> + kvm_cpu_cap_has(X86_FEATURE_BHI_CTRL) &&
>> + !(host_arch_capabilities & ARCH_CAP_BHI_NO))
>> + spec_ctrl_mask |= SPEC_CTRL_BHI_DIS_S;
>> +
>> + /*
>> + * Intercept IA32_SPEC_CTRL to disallow guest from changing
>> + * certain bits if "virtualize IA32_SPEC_CTRL" isn't supported
>> + * e.g., in nested case.
>> + */
>> + if (spec_ctrl_mask && !cpu_has_spec_ctrl_shadow())
>> + vmx_enable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
>> +
>> + /*
>> + * KVM_CAP_FORCE_SPEC_CTRL takes precedence over
>> + * MSR_VIRTUAL_MITIGATION_CTRL.
>> + */
>> + spec_ctrl_mask &= ~vmx->vcpu.kvm->arch.force_spec_ctrl_mask;
>> +
>> + vmx->force_spec_ctrl_mask = vmx->vcpu.kvm->arch.force_spec_ctrl_mask |
>> + spec_ctrl_mask;
>> + vmx->force_spec_ctrl_value = vmx->vcpu.kvm->arch.force_spec_ctrl_value |
>> + spec_ctrl_mask;
>> + vmx_set_spec_ctrl(&vmx->vcpu, vmx->spec_ctrl_shadow);
>> +
>> vmx->msr_virtual_mitigation_ctrl = data;
>> break;
>
>I continue find all of this unpalatable. The guest tells KVM what software
>mitigations the guest is using, and then KVM is supposed to translate that into
>some hardware functionality? And merge that with userspace's own overrides?

Yes. It is ugly. I will drop all Intel-defined stuff from KVM. Actually, I
wanted to punt to userspace ...

>
>Blech.
>
>With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
>Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
>Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
>to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
>there's no need for KVM to handle them for performance reasons.

... I had this idea of implementing policy-related stuff in userspace, and I wrote
in the cover-letter:

"""
1. the KVM<->userspace ABI defined in patch 1

I am wondering if we can allow the userspace to configure the mask
and the shadow value during guest's lifetime and do it on a vCPU basis.
this way, in conjunction with "virtual MSRs" or any other interfaces,
the usespace can adjust hardware mitigations applied to the guest during
guest's lifetime e.g., for the best performance.
"""

As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
the mask and shadow values adjustable and applicable on a per-vCPU basis. The
tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
preferable interfaces, they could also benefit from these changes.

Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?

>
>That way KVM doesn't need to deal with the the virtual MSRs, userspace can make
>an informed decision when deciding how to set KVM_CAP_FORCE_SPEC_CTRL, and as a
>bonus, rollouts for new mitigation thingies should be faster as updating userspace
>is typically easier than updating the kernel/KVM.

Good point!

2024-06-11 13:43:55

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT

On Tue, Jun 11, 2024, Chao Gao wrote:
> >I continue find all of this unpalatable. The guest tells KVM what software
> >mitigations the guest is using, and then KVM is supposed to translate that into
> >some hardware functionality? And merge that with userspace's own overrides?
>
> Yes. It is ugly. I will drop all Intel-defined stuff from KVM. Actually, I
> wanted to punt to userspace ...
>
> >
> >Blech.
> >
> >With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
> >Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
> >Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
> >to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
> >there's no need for KVM to handle them for performance reasons.
>
> ... I had this idea of implementing policy-related stuff in userspace, and I wrote
> in the cover-letter:
>
> """
> 1. the KVM<->userspace ABI defined in patch 1
>
> I am wondering if we can allow the userspace to configure the mask
> and the shadow value during guest's lifetime and do it on a vCPU basis.
> this way, in conjunction with "virtual MSRs" or any other interfaces,
> the usespace can adjust hardware mitigations applied to the guest during
> guest's lifetime e.g., for the best performance.
> """

Gah, sorry, I speed read the cover letter and didn't take the time to process that.

> As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
> the mask and shadow values adjustable and applicable on a per-vCPU basis. The
> tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
> preferable interfaces, they could also benefit from these changes.
>
> Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?

Why does KVM_CAP_FORCE_SPEC_CTRL need to be per-vCPU? Won't the CPU bugs and
mitigations be system-wide / VM-wide?

2024-06-11 14:18:03

by Chao Gao

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT

On Tue, Jun 11, 2024 at 06:34:49AM -0700, Sean Christopherson wrote:
>On Tue, Jun 11, 2024, Chao Gao wrote:
>> >I continue find all of this unpalatable. The guest tells KVM what software
>> >mitigations the guest is using, and then KVM is supposed to translate that into
>> >some hardware functionality? And merge that with userspace's own overrides?
>>
>> Yes. It is ugly. I will drop all Intel-defined stuff from KVM. Actually, I
>> wanted to punt to userspace ...
>>
>> >
>> >Blech.
>> >
>> >With KVM_CAP_FORCE_SPEC_CTRL, I don't see any reason for KVM to support the
>> >Intel-defined virtual MSRs. If the userspace VMM wants to play nice with the
>> >Intel-defined stuff, then userspace can advertise the MSRs and use an MSR filter
>> >to intercept and "emulate" the MSRs. They should be set-and-forget MSRs, so
>> >there's no need for KVM to handle them for performance reasons.
>>
>> ... I had this idea of implementing policy-related stuff in userspace, and I wrote
>> in the cover-letter:
>>
>> """
>> 1. the KVM<->userspace ABI defined in patch 1
>>
>> I am wondering if we can allow the userspace to configure the mask
>> and the shadow value during guest's lifetime and do it on a vCPU basis.
>> this way, in conjunction with "virtual MSRs" or any other interfaces,
>> the usespace can adjust hardware mitigations applied to the guest during
>> guest's lifetime e.g., for the best performance.
>> """
>
>Gah, sorry, I speed read the cover letter and didn't take the time to process that.
>
>> As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
>> the mask and shadow values adjustable and applicable on a per-vCPU basis. The
>> tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
>> preferable interfaces, they could also benefit from these changes.
>>
>> Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
>
>Why does KVM_CAP_FORCE_SPEC_CTRL need to be per-vCPU? Won't the CPU bugs and
>mitigations be system-wide / VM-wide?

Because spec_ctrl is per-vCPU and Intel-defined virtual MSRs are also per-vCPU.
i.e., a guest __can__ configure different values to virtual MSRs on different
vCPUs even though a sane guest won't do this. If KVM doesn't want to rule out
the possibility of supporting Intel-defined virtual MSRs in userspace or any
other per-vCPU interfaces, KVM_CAP_FORCE_SPEC_CTRL needs to be per-vCPU.

implementation-wise, being per-vCPU is simpler because, otherwise, once userspace
adjusts the hardware mitigations to enforce, KVM needs to kick all vCPUs. This
will add more complexity.

And IMO, requiring guests to deploy same mitigations on vCPUs is an unnecessary
limitation.

2024-06-11 16:33:17

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT

On Tue, Jun 11, 2024, Chao Gao wrote:
> On Tue, Jun 11, 2024 at 06:34:49AM -0700, Sean Christopherson wrote:
> >> As said, this requires some tweaks to KVM_CAP_FORCE_SPEC_CTRL, such as making
> >> the mask and shadow values adjustable and applicable on a per-vCPU basis. The
> >> tweaks are not necessarily for Intel-defined virtual MSRs; if there were other
> >> preferable interfaces, they could also benefit from these changes.
> >>
> >> Any objections to these tweaks to KVM_CAP_FORCE_SPEC_CTRL?
> >
> >Why does KVM_CAP_FORCE_SPEC_CTRL need to be per-vCPU? Won't the CPU bugs and
> >mitigations be system-wide / VM-wide?
>
> Because spec_ctrl is per-vCPU and Intel-defined virtual MSRs are also per-vCPU.

I figured that was the answer, but part of me was hopeful :-)

> i.e., a guest __can__ configure different values to virtual MSRs on different
> vCPUs even though a sane guest won't do this. If KVM doesn't want to rule out
> the possibility of supporting Intel-defined virtual MSRs in userspace or any
> other per-vCPU interfaces, KVM_CAP_FORCE_SPEC_CTRL needs to be per-vCPU.
>
> implementation-wise, being per-vCPU is simpler because, otherwise, once userspace
> adjusts the hardware mitigations to enforce, KVM needs to kick all vCPUs. This
> will add more complexity.

+1, I even typed up as much before reading this paragraph.

> And IMO, requiring guests to deploy same mitigations on vCPUs is an unnecessary
> limitation.

Yeah, I can see how it would make things weird for no good reason.

So yeah, if the only thing stopping us from letting userspace deal with the virtual
MSRs is converting to a vCPU-scoped ioctl(), then by all means, lets do that.