2019-11-18 18:19:22

by Paolo Bonzini

Subject: [PATCH 0/5] KVM: vmx: implement MSR_IA32_TSX_CTRL for guests

The current guest mitigation of TAA is both too heavy and not really
sufficient. It is too heavy because it will cause some affected CPUs
(those that have MDS_NO but lack TAA_NO) to fall back to VERW and
get the corresponding slowdown. It is not really sufficient because
it will cause the MDS_NO bit to disappear upon microcode update, so
that VMs started before the microcode update will not be runnable
anymore afterwards, even with tsx=on.

Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
the guest and let it run without the VERW mitigation. Even though
MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
it on every vmentry, we can use the shared MSR functionality because
the host kernel need not protect itself from TSX-based side-channels.
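
In pseudocode, the shared-MSR scheme this relies on looks roughly like
this (a minimal sketch with illustrative names, not the actual
kvm_set_shared_msr code):

struct shared_msr_value {
	u64 host;	/* restored on return to userspace */
	u64 curr;	/* what the hardware MSR currently holds */
};

static int set_shared_msr(u32 msr, struct shared_msr_value *v, u64 guest)
{
	if (guest == v->curr)
		return 0;		/* nothing to do, skip the WRMSR */
	if (wrmsrl_safe(msr, guest))
		return 1;		/* e.g. reserved bits were set */
	v->curr = guest;
	/* A user-return notifier, registered elsewhere, writes v->host
	 * back to the MSR before the next return to userspace. */
	return 0;
}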

Patch 1 is a minimal fix for MSR_IA32_ARCH_CAPABILITIES for stable
kernels. The other four patches implement the feature.

Getting some help testing this series with the kvm-unit-tests patch I
have just sent would be great; I could only test this on a machine that
I couldn't reboot, and therefore I could only work on an older kernel.

Paolo Bonzini (5):
KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
KVM: x86: do not modify masked bits of shared MSRs
KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality
KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guests that lack
it

arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.c | 8 +++--
arch/x86/kvm/vmx/vmx.c | 78 ++++++++++++++++++++++++++++++++---------
arch/x86/kvm/x86.c | 34 ++++++++----------
4 files changed, 82 insertions(+), 39 deletions(-)

--
1.8.3.1


2019-11-18 18:19:24

by Paolo Bonzini

Subject: [PATCH 1/5] KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES

KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented
to the guests. It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR &&
!RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not
hidden (it actually was), yet the value says that TSX is not vulnerable
to microarchitectural data sampling. Fix both.

Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/x86.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d530521f11d..6ea735d632e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1327,12 +1327,18 @@ static u64 kvm_get_arch_capabilities(void)
* If TSX is disabled on the system, guests are also mitigated against
* TAA and clear CPU buffer mitigation is not required for guests.
*/
- if (boot_cpu_has_bug(X86_BUG_TAA) && boot_cpu_has(X86_FEATURE_RTM) &&
- (data & ARCH_CAP_TSX_CTRL_MSR))
+ if (!boot_cpu_has(X86_FEATURE_RTM))
+ data &= ~ARCH_CAP_TAA_NO;
+ else if (!boot_cpu_has_bug(X86_BUG_TAA))
+ data |= ARCH_CAP_TAA_NO;
+ else if (data & ARCH_CAP_TSX_CTRL_MSR)
data &= ~ARCH_CAP_MDS_NO;

+ /* KVM does not emulate MSR_IA32_TSX_CTRL. */
+ data &= ~ARCH_CAP_TSX_CTRL_MSR;
return data;
}
+EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);

static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
{
--
1.8.3.1


2019-11-18 18:19:34

by Paolo Bonzini

Subject: [PATCH 5/5] KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guests that lack it

If X86_FEATURE_RTM is disabled, the guest should not be able to access
MSR_IA32_TSX_CTRL. We can therefore use it in KVM to force all
transactions from the guest to abort.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 44 ++++++++++++++++++++++++++++++--------------
1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ed25fe7d5234..8cba65eec0d3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -639,6 +639,23 @@ struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr)
return NULL;
}

+static int vmx_set_guest_msr(struct vcpu_vmx *vmx, struct shared_msr_entry *msr, u64 data)
+{
+ int ret = 0;
+
+ u64 old_msr_data = msr->data;
+ msr->data = data;
+ if (msr - vmx->guest_msrs < vmx->save_nmsrs) {
+ preempt_disable();
+ ret = kvm_set_shared_msr(msr->index, msr->data,
+ msr->mask);
+ preempt_enable();
+ if (ret)
+ msr->data = old_msr_data;
+ }
+ return ret;
+}
+
void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
{
vmcs_clear(loaded_vmcs->vmcs);
@@ -2174,20 +2191,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
default:
find_shared_msr:
msr = find_msr_entry(vmx, msr_index);
- if (msr) {
- u64 old_msr_data = msr->data;
- msr->data = data;
- if (msr - vmx->guest_msrs < vmx->save_nmsrs) {
- preempt_disable();
- ret = kvm_set_shared_msr(msr->index, msr->data,
- msr->mask);
- preempt_enable();
- if (ret)
- msr->data = old_msr_data;
- }
- break;
- }
- ret = kvm_set_msr_common(vcpu, msr_info);
+ if (msr)
+ ret = vmx_set_guest_msr(vmx, msr, data);
+ else
+ ret = kvm_set_msr_common(vcpu, msr_info);
}

return ret;
@@ -7138,6 +7145,15 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
update_intel_pt_cfg(vcpu);
+
+ if (boot_cpu_has(X86_FEATURE_RTM)) {
+ struct shared_msr_entry *msr;
+ msr = find_msr_entry(vmx, MSR_IA32_TSX_CTRL);
+ if (msr) {
+ bool enabled = guest_cpuid_has(vcpu, X86_FEATURE_RTM);
+ vmx_set_guest_msr(vmx, msr, enabled ? 0 : TSX_CTRL_RTM_DISABLE);
+ }
+ }
}

static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
--
1.8.3.1

2019-11-18 18:19:37

by Paolo Bonzini

Subject: [PATCH 3/5] KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID

Because KVM always emulates CPUID, the CPUID clear bit
(bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually"
by the hypervisor when performing said emulation.

Right now neither kvm-intel.ko nor kvm-amd.ko implement
MSR_IA32_TSX_CTRL but this will change in the next patch.
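
As a sketch of the guest-visible semantics being emulated (a guest-side
pseudo-sequence, not KVM code; CPUID bit numbers from the SDM):

	unsigned int eax, ebx, ecx, edx;

	/* The guest sets the CPUID clear bit of MSR_IA32_TSX_CTRL... */
	wrmsrl(MSR_IA32_TSX_CTRL, TSX_CTRL_CPUID_CLEAR);
	/* ...and CPUID.(EAX=7,ECX=0):EBX must now report HLE (bit 4)
	 * and RTM (bit 11) as zero. */
	cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
	WARN_ON(ebx & (BIT(4) | BIT(11)));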

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.c | 8 ++++++--
arch/x86/kvm/x86.c | 4 ++--
3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4fc61483919a..663d09ac7778 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1357,6 +1357,7 @@ int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,

void kvm_enable_efer_bits(u64);
bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
+int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated);
int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data);
int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data);
int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f68c0c753c38..c0aa07487eb8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -816,8 +816,6 @@ static int do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 func,
return __do_cpuid_func(entry, func, nent, maxnent);
}

-#undef F
-
struct kvm_cpuid_param {
u32 func;
bool (*qualifier)(const struct kvm_cpuid_param *param);
@@ -1015,6 +1013,12 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
*ebx = entry->ebx;
*ecx = entry->ecx;
*edx = entry->edx;
+ if (function == 7 && index == 0) {
+ u64 data;
+ if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
+ (data & TSX_CTRL_CPUID_CLEAR))
+ *ebx &= ~(F(RTM) | F(HLE));
+ }
} else {
*eax = *ebx = *ecx = *edx = 0;
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02863998af91..648e84e728fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1484,8 +1484,8 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
* Returns 0 on success, non-0 otherwise.
* Assumes vcpu_load() was already called.
*/
-static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
- bool host_initiated)
+int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
{
struct msr_data msr;
int ret;
--
1.8.3.1


2019-11-18 18:22:08

by Paolo Bonzini

Subject: [PATCH 2/5] KVM: x86: do not modify masked bits of shared MSRs

"Shared MSRs" are guest MSRs that are written to the host MSRs but
keep their value until the next return to userspace. They support
a mask, so that some bits keep the host value, but this mask is
only used to skip an unnecessary MSR write and the value written
to the MSR is always the guest MSR.

Fix this and, while at it, do not update smsr->values[slot].curr if
for whatever reason the wrmsr fails. This should only happen due to
reserved bits, in which case the value written to
smsr->values[slot].curr would not match the hardware MSR anyway;
the user-return notifier would still run and the host value would
always be restored. Still, it is untidy, and in rare cases skipping
the update can actually avoid spurious WRMSRs on return to userspace.

Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/x86.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6ea735d632e9..02863998af91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -300,13 +300,14 @@ int kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
int err;

- if (((value ^ smsr->values[slot].curr) & mask) == 0)
+ value = (value & mask) | (smsr->values[slot].host & ~mask);
+ if (value == smsr->values[slot].curr)
return 0;
- smsr->values[slot].curr = value;
err = wrmsrl_safe(shared_msrs_global.msrs[slot], value);
if (err)
return 1;

+ smsr->values[slot].curr = value;
if (!smsr->registered) {
smsr->urn.on_user_return = kvm_on_user_return;
user_return_notifier_register(&smsr->urn);
--
1.8.3.1


2019-11-18 18:22:52

by Paolo Bonzini

Subject: [PATCH 4/5] KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality

The current guest mitigation of TAA is both too heavy and not really
sufficient. It is too heavy because it will cause some affected CPUs
(those that have MDS_NO but lack TAA_NO) to fall back to VERW and
get the corresponding slowdown. It is not really sufficient because
it will cause the MDS_NO bit to disappear upon microcode update, so
that VMs started before the microcode update will not be runnable
anymore afterwards, even with tsx=on.

Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
the guest and let it run without the VERW mitigation. Even though
MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
it on every vmentry, we can use the shared MSR functionality because
the host kernel need not protect itself from TSX-based side-channels.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 34 +++++++++++++++++++++++++++++++---
arch/x86/kvm/x86.c | 23 +++++------------------
2 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 04a8212704c1..ed25fe7d5234 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -450,6 +450,7 @@ noinline void invept_error(unsigned long ext, u64 eptp, gpa_t gpa)
MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
#endif
MSR_EFER, MSR_TSC_AUX, MSR_STAR,
+ MSR_IA32_TSX_CTRL,
};

#if IS_ENABLED(CONFIG_HYPERV)
@@ -1683,6 +1684,9 @@ static void setup_msrs(struct vcpu_vmx *vmx)
index = __find_msr_index(vmx, MSR_TSC_AUX);
if (index >= 0 && guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP))
move_msr_up(vmx, index, save_nmsrs++);
+ index = __find_msr_index(vmx, MSR_IA32_TSX_CTRL);
+ if (index >= 0)
+ move_msr_up(vmx, index, save_nmsrs++);

vmx->save_nmsrs = save_nmsrs;
vmx->guest_msrs_ready = false;
@@ -1782,6 +1786,11 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
#endif
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_info);
+ case MSR_IA32_TSX_CTRL:
+ if (!msr_info->host_initiated &&
+ !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
+ return 1;
+ goto find_shared_msr;
case MSR_IA32_UMWAIT_CONTROL:
if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
return 1;
@@ -1884,8 +1893,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
return 1;
- /* Else, falls through */
+ goto find_shared_msr;
default:
+ find_shared_msr:
msr = find_msr_entry(vmx, msr_info->index);
if (msr) {
msr_info->data = msr->data;
@@ -2001,6 +2011,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_IA32_SPEC_CTRL,
MSR_TYPE_RW);
break;
+ case MSR_IA32_TSX_CTRL:
+ if (!msr_info->host_initiated &&
+ !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
+ return 1;
+ if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
+ return 1;
+ goto find_shared_msr;
case MSR_IA32_PRED_CMD:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
@@ -2152,8 +2169,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
/* Check reserved bit, higher 32 bits should be zero */
if ((data >> 32) != 0)
return 1;
- /* Else, falls through */
+ goto find_shared_msr;
+
default:
+ find_shared_msr:
msr = find_msr_entry(vmx, msr_index);
if (msr) {
u64 old_msr_data = msr->data;
@@ -4234,7 +4253,16 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
continue;
vmx->guest_msrs[j].index = i;
vmx->guest_msrs[j].data = 0;
- vmx->guest_msrs[j].mask = -1ull;
+
+ switch (index) {
+ case MSR_IA32_TSX_CTRL:
+ /* No need to pass TSX_CTRL_CPUID_CLEAR through. */
+ vmx->guest_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
+ break;
+ default:
+ vmx->guest_msrs[j].mask = -1ull;
+ break;
+ }
++vmx->nmsrs;
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 648e84e728fc..fc54e3905fe3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1314,29 +1314,16 @@ static u64 kvm_get_arch_capabilities(void)
data |= ARCH_CAP_MDS_NO;

/*
- * On TAA affected systems, export MDS_NO=0 when:
- * - TSX is enabled on the host, i.e. X86_FEATURE_RTM=1.
- * - Updated microcode is present. This is detected by
- * the presence of ARCH_CAP_TSX_CTRL_MSR and ensures
- * that VERW clears CPU buffers.
- *
- * When MDS_NO=0 is exported, guests deploy clear CPU buffer
- * mitigation and don't complain:
- *
- * "Vulnerable: Clear CPU buffers attempted, no microcode"
- *
- * If TSX is disabled on the system, guests are also mitigated against
- * TAA and clear CPU buffer mitigation is not required for guests.
+ * On TAA affected systems:
+ * - nothing to do if TSX is disabled on the host.
+ * - we emulate TSX_CTRL if present on the host.
+ * This lets the guest use VERW to clear CPU buffers.
*/
if (!boot_cpu_has(X86_FEATURE_RTM))
- data &= ~ARCH_CAP_TAA_NO;
+ data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
else if (!boot_cpu_has_bug(X86_BUG_TAA))
data |= ARCH_CAP_TAA_NO;
- else if (data & ARCH_CAP_TSX_CTRL_MSR)
- data &= ~ARCH_CAP_MDS_NO;

- /* KVM does not emulate MSR_IA32_TSX_CTRL. */
- data &= ~ARCH_CAP_TSX_CTRL_MSR;
return data;
}
EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
--
1.8.3.1


2019-11-18 19:43:46

by Jim Mattson

Subject: Re: [PATCH 1/5] KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES

On Mon, Nov 18, 2019 at 10:17 AM Paolo Bonzini <[email protected]> wrote:
>
> KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented
> to the guests. It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR &&
> !RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not
> hidden (it actually was), yet the value says that TSX is not vulnerable
> to microarchitectural data sampling. Fix both.

I actually think kvm should virtualize IA32_TSX_CTRL for VMs that have
exclusive use of their cores (i.e. the same VMs for which we disable
MWAIT and HLT exiting).

2019-11-18 20:50:41

by Jim Mattson

Subject: Re: [PATCH 1/5] KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES

On Mon, Nov 18, 2019 at 10:17 AM Paolo Bonzini <[email protected]> wrote:
>
> KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented
> to the guests. It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR &&
> !RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not
> hidden (it actually was), yet the value says that TSX is not vulnerable
> to microarchitectural data sampling. Fix both.
> Cc: [email protected]
> Signed-off-by: Paolo Bonzini <[email protected]>
Ignore my previous comment. I see that the functionality I want is
coming later in this series.

Reviewed-by: Jim Mattson <[email protected]>

2019-11-19 19:03:09

by Jim Mattson

Subject: Re: [PATCH 2/5] KVM: x86: do not modify masked bits of shared MSRs

On Mon, Nov 18, 2019 at 10:17 AM Paolo Bonzini <[email protected]> wrote:
>
> "Shared MSRs" are guest MSRs that are written to the host MSRs but
> keep their value until the next return to userspace. They support
> a mask, so that some bits keep the host value, but this mask is
> only used to skip an unnecessary MSR write and the value written
> to the MSR is always the guest MSR.
>
> Fix this and, while at it, do not update smsr->values[slot].curr if
> for whatever reason the wrmsr fails. This should only happen due to
> reserved bits, in which case the value written to
> smsr->values[slot].curr would not match the hardware MSR anyway;
> the user-return notifier would still run and the host value would
> always be restored. Still, it is untidy, and in rare cases skipping
> the update can actually avoid spurious WRMSRs on return to userspace.
>
> Cc: [email protected]
> Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2019-11-19 20:06:59

by Jim Mattson

Subject: Re: [PATCH 3/5] KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID

On Mon, Nov 18, 2019 at 10:17 AM Paolo Bonzini <[email protected]> wrote:
>
> Because KVM always emulates CPUID, the CPUID clear bit
> (bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually"
> by the hypervisor when performing said emulation.
>
> Right now neither kvm-intel.ko nor kvm-amd.ko implement
> MSR_IA32_TSX_CTRL but this will change in the next patch.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2019-11-19 21:09:13

by Jim Mattson

Subject: Re: [PATCH 4/5] KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality

On Mon, Nov 18, 2019 at 10:17 AM Paolo Bonzini <[email protected]> wrote:
>
> The current guest mitigation of TAA is both too heavy and not really
> sufficient. It is too heavy because it will cause some affected CPUs
> (those that have MDS_NO but lack TAA_NO) to fall back to VERW and
> get the corresponding slowdown. It is not really sufficient because
> it will cause the MDS_NO bit to disappear upon microcode update, so
> that VMs started before the microcode update will not be runnable
> anymore afterwards, even with tsx=on.
>
> Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
> the guest and let it run without the VERW mitigation. Even though
> MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
> it on every vmentry, we can use the shared MSR functionality because
> the host kernel need not protect itself from TSX-based side-channels.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 34 +++++++++++++++++++++++++++++++---
> arch/x86/kvm/x86.c | 23 +++++------------------
> 2 files changed, 36 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 04a8212704c1..ed25fe7d5234 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -450,6 +450,7 @@ noinline void invept_error(unsigned long ext, u64 eptp, gpa_t gpa)
> MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
> #endif
> MSR_EFER, MSR_TSC_AUX, MSR_STAR,
> + MSR_IA32_TSX_CTRL,
> };
>
> #if IS_ENABLED(CONFIG_HYPERV)
> @@ -1683,6 +1684,9 @@ static void setup_msrs(struct vcpu_vmx *vmx)
> index = __find_msr_index(vmx, MSR_TSC_AUX);
> if (index >= 0 && guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP))
> move_msr_up(vmx, index, save_nmsrs++);
> + index = __find_msr_index(vmx, MSR_IA32_TSX_CTRL);
> + if (index >= 0)
> + move_msr_up(vmx, index, save_nmsrs++);
>
> vmx->save_nmsrs = save_nmsrs;
> vmx->guest_msrs_ready = false;
> @@ -1782,6 +1786,11 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> #endif
> case MSR_EFER:
> return kvm_get_msr_common(vcpu, msr_info);
> + case MSR_IA32_TSX_CTRL:
> + if (!msr_info->host_initiated &&
> + !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
> + return 1;
> + goto find_shared_msr;
> case MSR_IA32_UMWAIT_CONTROL:
> if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
> return 1;
> @@ -1884,8 +1893,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> return 1;
> - /* Else, falls through */
> + goto find_shared_msr;
> default:
> + find_shared_msr:
> msr = find_msr_entry(vmx, msr_info->index);
> if (msr) {
> msr_info->data = msr->data;
> @@ -2001,6 +2011,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> MSR_IA32_SPEC_CTRL,
> MSR_TYPE_RW);
> break;
> + case MSR_IA32_TSX_CTRL:
> + if (!msr_info->host_initiated &&
> + !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
> + return 1;
> + if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
> + return 1;
> + goto find_shared_msr;
> case MSR_IA32_PRED_CMD:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> @@ -2152,8 +2169,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> /* Check reserved bit, higher 32 bits should be zero */
> if ((data >> 32) != 0)
> return 1;
> - /* Else, falls through */
> + goto find_shared_msr;
> +
> default:
> + find_shared_msr:
> msr = find_msr_entry(vmx, msr_index);
> if (msr) {
> u64 old_msr_data = msr->data;
> @@ -4234,7 +4253,16 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> continue;
> vmx->guest_msrs[j].index = i;
> vmx->guest_msrs[j].data = 0;
> - vmx->guest_msrs[j].mask = -1ull;
> +
> + switch (index) {
> + case MSR_IA32_TSX_CTRL:
> + /* No need to pass TSX_CTRL_CPUID_CLEAR through. */
> + vmx->guest_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> + break;

Why even bother with the special case here? Does this make the wrmsr faster?

> + default:
> + vmx->guest_msrs[j].mask = -1ull;
> + break;
> + }
> ++vmx->nmsrs;
> }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 648e84e728fc..fc54e3905fe3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1314,29 +1314,16 @@ static u64 kvm_get_arch_capabilities(void)
> data |= ARCH_CAP_MDS_NO;
>
> /*
> - * On TAA affected systems, export MDS_NO=0 when:
> - * - TSX is enabled on the host, i.e. X86_FEATURE_RTM=1.
> - * - Updated microcode is present. This is detected by
> - * the presence of ARCH_CAP_TSX_CTRL_MSR and ensures
> - * that VERW clears CPU buffers.
> - *
> - * When MDS_NO=0 is exported, guests deploy clear CPU buffer
> - * mitigation and don't complain:
> - *
> - * "Vulnerable: Clear CPU buffers attempted, no microcode"
> - *
> - * If TSX is disabled on the system, guests are also mitigated against
> - * TAA and clear CPU buffer mitigation is not required for guests.
> + * On TAA affected systems:
> + * - nothing to do if TSX is disabled on the host.
> + * - we emulate TSX_CTRL if present on the host.
> + * This lets the guest use VERW to clear CPU buffers.
> */
> if (!boot_cpu_has(X86_FEATURE_RTM))
> - data &= ~ARCH_CAP_TAA_NO;
> + data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
> else if (!boot_cpu_has_bug(X86_BUG_TAA))
> data |= ARCH_CAP_TAA_NO;
> - else if (data & ARCH_CAP_TSX_CTRL_MSR)
> - data &= ~ARCH_CAP_MDS_NO;
>
> - /* KVM does not emulate MSR_IA32_TSX_CTRL. */
> - data &= ~ARCH_CAP_TSX_CTRL_MSR;
> return data;
> }
> EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
> --
> 1.8.3.1
>
>

2019-11-20 16:22:19

by Paolo Bonzini

Subject: Re: [PATCH 4/5] KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality

On 19/11/19 22:06, Jim Mattson wrote:
>> + switch (index) {
>> + case MSR_IA32_TSX_CTRL:
>> + /* No need to pass TSX_CTRL_CPUID_CLEAR through. */
>> + vmx->guest_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
>> + break;
> Why even bother with the special case here? Does this make the wrmsr faster?
>

No, but it can avoid the wrmsr altogether if the guest uses the same
RTM_DISABLE setting but a different value for CPUID_CLEAR.
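
For instance, here is a worked sketch with the mask from this patch
(TSX_CTRL bit 0 is RTM_DISABLE, bit 1 is CPUID_CLEAR; illustrative
values):

	u64 host  = 0;			   /* TSX enabled on the host */
	u64 guest = TSX_CTRL_CPUID_CLEAR;  /* guest toggles CPUID_CLEAR */
	u64 mask  = ~(u64)TSX_CTRL_CPUID_CLEAR;

	/* kvm_set_shared_msr computes the effective value as: */
	u64 value = (guest & mask) | (host & ~mask);	/* == 0 == host */
	/* value matches smsr->values[slot].curr, so no WRMSR happens. */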

More importantly, while I am confident that re-enabling TSX while in the
kernel and only restoring MSR_IA32_TSX_CTRL on return to userspace is
safe, I'm more wary of changing CPUID bits while the kernel is running.
I will update the comment.

Paolo


2019-11-21 02:30:53

by Eduardo Habkost

Subject: Re: [PATCH 5/5] KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guests that lack it

On Mon, Nov 18, 2019 at 07:17:47PM +0100, Paolo Bonzini wrote:
> If X86_FEATURE_RTM is disabled, the guest should not be able to access
> MSR_IA32_TSX_CTRL. We can therefore use it in KVM to force all
> transactions from the guest to abort.
>
> Signed-off-by: Paolo Bonzini <[email protected]>

So, without this patch guest OSes will incorrectly report "Not
affected" at /sys/devices/system/cpu/vulnerabilities/tsx_async_abort
if RTM is disabled in the VM configuration.

Is there anything host userspace can do to detect this situation
and issue a warning in that case?

Is there anything the guest kernel can do to detect this and not
report a false negative at /sys/.../tsx_async_abort?

--
Eduardo

2019-11-21 09:07:02

by Paolo Bonzini

Subject: Re: [PATCH 5/5] KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guests that lack it

On 21/11/19 03:22, Eduardo Habkost wrote:
> On Mon, Nov 18, 2019 at 07:17:47PM +0100, Paolo Bonzini wrote:
>> If X86_FEATURE_RTM is disabled, the guest should not be able to access
>> MSR_IA32_TSX_CTRL. We can therefore use it in KVM to force all
>> transactions from the guest to abort.
>>
>> Signed-off-by: Paolo Bonzini <[email protected]>
>
> So, without this patch guest OSes will incorrectly report "Not
> affected" at /sys/devices/system/cpu/vulnerabilities/tsx_async_abort
> if RTM is disabled in the VM configuration.
>
> Is there anything host userspace can do to detect this situation
> and issue a warning in that case?
>
> Is there anything the guest kernel can do to detect this and not
> report a false negative at /sys/.../tsx_async_abort?

Unfortunately not. The hypervisor needs to know about TAA in order to
mitigate it on behalf of the guest. At least this doesn't require an
updated userspace and VM configuration!

Paolo

2019-11-22 20:17:00

by Sean Christopherson

Subject: Re: [PATCH 1/5] KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES

On Mon, Nov 18, 2019 at 07:17:43PM +0100, Paolo Bonzini wrote:
> KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented
> to the guests. It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR &&
> !RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not
> hidden (it actually was), yet the value says that TSX is not vulnerable
> to microarchitectural data sampling. Fix both.
>
> Cc: [email protected]
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/x86.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5d530521f11d..6ea735d632e9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1327,12 +1327,18 @@ static u64 kvm_get_arch_capabilities(void)
> * If TSX is disabled on the system, guests are also mitigated against
> * TAA and clear CPU buffer mitigation is not required for guests.
> */
> - if (boot_cpu_has_bug(X86_BUG_TAA) && boot_cpu_has(X86_FEATURE_RTM) &&
> - (data & ARCH_CAP_TSX_CTRL_MSR))
> + if (!boot_cpu_has(X86_FEATURE_RTM))
> + data &= ~ARCH_CAP_TAA_NO;
> + else if (!boot_cpu_has_bug(X86_BUG_TAA))
> + data |= ARCH_CAP_TAA_NO;
> + else if (data & ARCH_CAP_TSX_CTRL_MSR)
> data &= ~ARCH_CAP_MDS_NO;
>
> + /* KVM does not emulate MSR_IA32_TSX_CTRL. */
> + data &= ~ARCH_CAP_TSX_CTRL_MSR;
> return data;
> }
> +EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);

Whoever backports this patch should drop this spurious addition of
EXPORT_SYMBOL_GPL, unless they also want to backport the cleanup :-).

2019-12-04 23:50:57

by Jim Mattson

Subject: Re: [PATCH 4/5] KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality

On Mon, Nov 18, 2019 at 10:17 AM Paolo Bonzini <[email protected]> wrote:
>
> The current guest mitigation of TAA is both too heavy and not really
> sufficient. It is too heavy because it will cause some affected CPUs
> (those that have MDS_NO but lack TAA_NO) to fall back to VERW and
> get the corresponding slowdown. It is not really sufficient because
> it will cause the MDS_NO bit to disappear upon microcode update, so
> that VMs started before the microcode update will not be runnable
> anymore afterwards, even with tsx=on.
>
> Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
> the guest and let it run without the VERW mitigation. Even though
> MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
> it on every vmentry, we can use the shared MSR functionality because
> the host kernel need not protect itself from TSX-based side-channels.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 34 +++++++++++++++++++++++++++++++---
> arch/x86/kvm/x86.c | 23 +++++------------------
> 2 files changed, 36 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 04a8212704c1..ed25fe7d5234 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -450,6 +450,7 @@ noinline void invept_error(unsigned long ext, u64 eptp, gpa_t gpa)
> MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
> #endif
> MSR_EFER, MSR_TSC_AUX, MSR_STAR,
> + MSR_IA32_TSX_CTRL,
> };
>
> #if IS_ENABLED(CONFIG_HYPERV)
> @@ -1683,6 +1684,9 @@ static void setup_msrs(struct vcpu_vmx *vmx)
> index = __find_msr_index(vmx, MSR_TSC_AUX);
> if (index >= 0 && guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP))
> move_msr_up(vmx, index, save_nmsrs++);
> + index = __find_msr_index(vmx, MSR_IA32_TSX_CTRL);
> + if (index >= 0)
> + move_msr_up(vmx, index, save_nmsrs++);
>
> vmx->save_nmsrs = save_nmsrs;
> vmx->guest_msrs_ready = false;
> @@ -1782,6 +1786,11 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> #endif
> case MSR_EFER:
> return kvm_get_msr_common(vcpu, msr_info);
> + case MSR_IA32_TSX_CTRL:
> + if (!msr_info->host_initiated &&
> + !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
> + return 1;
> + goto find_shared_msr;
> case MSR_IA32_UMWAIT_CONTROL:
> if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
> return 1;
> @@ -1884,8 +1893,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> return 1;
> - /* Else, falls through */
> + goto find_shared_msr;
> default:
> + find_shared_msr:
> msr = find_msr_entry(vmx, msr_info->index);
> if (msr) {
> msr_info->data = msr->data;
> @@ -2001,6 +2011,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> MSR_IA32_SPEC_CTRL,
> MSR_TYPE_RW);
> break;
> + case MSR_IA32_TSX_CTRL:
> + if (!msr_info->host_initiated &&
> + !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
> + return 1;
> + if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
> + return 1;
> + goto find_shared_msr;
> case MSR_IA32_PRED_CMD:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> @@ -2152,8 +2169,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> /* Check reserved bit, higher 32 bits should be zero */
> if ((data >> 32) != 0)
> return 1;
> - /* Else, falls through */
> + goto find_shared_msr;
> +
> default:
> + find_shared_msr:
> msr = find_msr_entry(vmx, msr_index);
> if (msr) {
> u64 old_msr_data = msr->data;
> @@ -4234,7 +4253,16 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> continue;
> vmx->guest_msrs[j].index = i;
> vmx->guest_msrs[j].data = 0;
> - vmx->guest_msrs[j].mask = -1ull;
> +
> + switch (index) {
> + case MSR_IA32_TSX_CTRL:
> + /* No need to pass TSX_CTRL_CPUID_CLEAR through. */
> + vmx->guest_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
> + break;
> + default:
> + vmx->guest_msrs[j].mask = -1ull;
> + break;
> + }
> ++vmx->nmsrs;
> }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 648e84e728fc..fc54e3905fe3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1314,29 +1314,16 @@ static u64 kvm_get_arch_capabilities(void)
> data |= ARCH_CAP_MDS_NO;
>
> /*
> - * On TAA affected systems, export MDS_NO=0 when:
> - * - TSX is enabled on the host, i.e. X86_FEATURE_RTM=1.
> - * - Updated microcode is present. This is detected by
> - * the presence of ARCH_CAP_TSX_CTRL_MSR and ensures
> - * that VERW clears CPU buffers.
> - *
> - * When MDS_NO=0 is exported, guests deploy clear CPU buffer
> - * mitigation and don't complain:
> - *
> - * "Vulnerable: Clear CPU buffers attempted, no microcode"
> - *
> - * If TSX is disabled on the system, guests are also mitigated against
> - * TAA and clear CPU buffer mitigation is not required for guests.
> + * On TAA affected systems:
> + * - nothing to do if TSX is disabled on the host.
> + * - we emulate TSX_CTRL if present on the host.
> + * This lets the guest use VERW to clear CPU buffers.
> */
> if (!boot_cpu_has(X86_FEATURE_RTM))
> - data &= ~ARCH_CAP_TAA_NO;
> + data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
> else if (!boot_cpu_has_bug(X86_BUG_TAA))
> data |= ARCH_CAP_TAA_NO;
> - else if (data & ARCH_CAP_TSX_CTRL_MSR)
> - data &= ~ARCH_CAP_MDS_NO;
>
> - /* KVM does not emulate MSR_IA32_TSX_CTRL. */
> - data &= ~ARCH_CAP_TSX_CTRL_MSR;

Shouldn't kvm be masking off any bits that it doesn't know about here?
Who knows what future features we may claim to support?

> return data;
> }
> EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
> --
> 1.8.3.1
>
>

2019-12-05 10:17:05

by Paolo Bonzini

Subject: Re: [PATCH 4/5] KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality

On 05/12/19 00:49, Jim Mattson wrote:
>> if (!boot_cpu_has(X86_FEATURE_RTM))
>> - data &= ~ARCH_CAP_TAA_NO;
>> + data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
>> else if (!boot_cpu_has_bug(X86_BUG_TAA))
>> data |= ARCH_CAP_TAA_NO;
>> - else if (data & ARCH_CAP_TSX_CTRL_MSR)
>> - data &= ~ARCH_CAP_MDS_NO;
>>
>> - /* KVM does not emulate MSR_IA32_TSX_CTRL. */
>> - data &= ~ARCH_CAP_TSX_CTRL_MSR;
> Shouldn't kvm be masking off any bits that it doesn't know about here?
> Who knows what future features we may claim to support?

Good question. In the past the ARCH_CAPABILITIES bits were just "we
don't have this bug", so it made sense to pass everything through. Now
we have TSX_CTRL, which is of a different kind and arguably should have
been a CPUID bit, so we should indeed mask unknown capabilities.
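
Something along these lines, perhaps (a sketch; the exact list would be
whatever KVM actually knows how to virtualize):

/* Pass through only the ARCH_CAPABILITIES bits that KVM understands. */
#define KVM_SUPPORTED_ARCH_CAP \
	(ARCH_CAP_RDCL_NO | ARCH_CAP_IBRS_ALL | ARCH_CAP_RSBA | \
	 ARCH_CAP_SKIP_VMENTRY_L1DFLUSH | ARCH_CAP_SSB_NO | \
	 ARCH_CAP_MDS_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO)

static u64 kvm_get_arch_capabilities(void)
{
	u64 data = 0;

	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
	data &= KVM_SUPPORTED_ARCH_CAP;	/* drop unknown bits */
	/* ... existing MDS_NO/TAA_NO/TSX_CTRL adjustments ... */
	return data;
}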

Paolo