2022-01-18 02:34:24

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v2 0/4] KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN for CPU hotplug

Changes since v1:
- Drop the allowlist of items which were allowed to change and just allow
the exact same CPUID data [Sean, Paolo]. Adjust selftest accordingly.
- Drop PATCH1 as the exact same change got merged upstream.

Recently, KVM made it illegal to change CPUID after KVM_RUN but
unfortunately this change is not fully compatible with existing VMMs.
In particular, QEMU reuses vCPU fds for CPU hotplug after unplug and it
calls KVM_SET_CPUID2. Relax the requirement by implementing an allowing
KVM_SET_CPUID{,2} with the exact same data.

Vitaly Kuznetsov (4):
KVM: x86: Do runtime CPUID update before updating
vcpu->arch.cpuid_entries
KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN
KVM: selftests: Rename 'get_cpuid_test' to 'cpuid_test'
KVM: selftests: Test KVM_SET_CPUID2 after KVM_RUN

arch/x86/kvm/cpuid.c | 67 ++++++++++++++++---
arch/x86/kvm/x86.c | 19 ------
tools/testing/selftests/kvm/.gitignore | 2 +-
tools/testing/selftests/kvm/Makefile | 4 +-
.../selftests/kvm/include/x86_64/processor.h | 7 ++
.../selftests/kvm/lib/x86_64/processor.c | 33 +++++++--
.../x86_64/{get_cpuid_test.c => cpuid_test.c} | 30 +++++++++
7 files changed, 126 insertions(+), 36 deletions(-)
rename tools/testing/selftests/kvm/x86_64/{get_cpuid_test.c => cpuid_test.c} (83%)

--
2.34.1


2022-01-18 02:34:27

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v2 1/4] KVM: x86: Do runtime CPUID update before updating vcpu->arch.cpuid_entries

kvm_update_cpuid_runtime() mangles CPUID data coming from userspace
VMM after updating 'vcpu->arch.cpuid_entries', this makes it
impossible to compare an update with what was previously
supplied. Introduce __kvm_update_cpuid_runtime() version which can be
used to tweak the input before it goes to 'vcpu->arch.cpuid_entries'
so the upcoming update check can compare tweaked data.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/cpuid.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c55e57b30e81..812190a707f6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -145,14 +145,21 @@ static void kvm_update_kvm_cpuid_base(struct kvm_vcpu *vcpu)
}
}

-static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu)
+static struct kvm_cpuid_entry2 *__kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu,
+ struct kvm_cpuid_entry2 *entries, int nent)
{
u32 base = vcpu->arch.kvm_cpuid_base;

if (!base)
return NULL;

- return kvm_find_cpuid_entry(vcpu, base | KVM_CPUID_FEATURES, 0);
+ return cpuid_entry2_find(entries, nent, base | KVM_CPUID_FEATURES, 0);
+}
+
+static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu)
+{
+ return __kvm_find_kvm_cpuid_features(vcpu, vcpu->arch.cpuid_entries,
+ vcpu->arch.cpuid_nent);
}

void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
@@ -167,11 +174,12 @@ void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
vcpu->arch.pv_cpuid.features = best->eax;
}

-void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
+static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
+ int nent)
{
struct kvm_cpuid_entry2 *best;

- best = kvm_find_cpuid_entry(vcpu, 1, 0);
+ best = cpuid_entry2_find(entries, nent, 1, 0);
if (best) {
/* Update OSXSAVE bit */
if (boot_cpu_has(X86_FEATURE_XSAVE))
@@ -182,33 +190,38 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
}

- best = kvm_find_cpuid_entry(vcpu, 7, 0);
+ best = cpuid_entry2_find(entries, nent, 7, 0);
if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
cpuid_entry_change(best, X86_FEATURE_OSPKE,
kvm_read_cr4_bits(vcpu, X86_CR4_PKE));

- best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
+ best = cpuid_entry2_find(entries, nent, 0xD, 0);
if (best)
best->ebx = xstate_required_size(vcpu->arch.xcr0, false);

- best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
+ best = cpuid_entry2_find(entries, nent, 0xD, 1);
if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
best->ebx = xstate_required_size(vcpu->arch.xcr0, true);

- best = kvm_find_kvm_cpuid_features(vcpu);
+ best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
if (kvm_hlt_in_guest(vcpu->kvm) && best &&
(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);

if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
- best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
+ best = cpuid_entry2_find(entries, nent, 0x1, 0);
if (best)
cpuid_entry_change(best, X86_FEATURE_MWAIT,
vcpu->arch.ia32_misc_enable_msr &
MSR_IA32_MISC_ENABLE_MWAIT);
}
}
+
+void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
+{
+ __kvm_update_cpuid_runtime(vcpu, vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+}
EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);

static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
@@ -298,6 +311,8 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
{
int r;

+ __kvm_update_cpuid_runtime(vcpu, e2, nent);
+
r = kvm_check_cpuid(vcpu, e2, nent);
if (r)
return r;
@@ -307,7 +322,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
vcpu->arch.cpuid_nent = nent;

kvm_update_kvm_cpuid_base(vcpu);
- kvm_update_cpuid_runtime(vcpu);
kvm_vcpu_after_set_cpuid(vcpu);

return 0;
--
2.34.1

2022-01-20 12:49:24

by Igor Mammedov

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN for CPU hotplug

On Mon, 17 Jan 2022 16:05:38 +0100
Vitaly Kuznetsov <[email protected]> wrote:

> Changes since v1:
> - Drop the allowlist of items which were allowed to change and just allow
> the exact same CPUID data [Sean, Paolo]. Adjust selftest accordingly.
> - Drop PATCH1 as the exact same change got merged upstream.
>
> Recently, KVM made it illegal to change CPUID after KVM_RUN but
> unfortunately this change is not fully compatible with existing VMMs.
> In particular, QEMU reuses vCPU fds for CPU hotplug after unplug and it
> calls KVM_SET_CPUID2. Relax the requirement by implementing an allowing
> KVM_SET_CPUID{,2} with the exact same data.


Can you check following scenario:
* on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present)
* boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI

that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H
and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it cleared

* hotunplug a VCPU and then replug it again
if IA32_TSX_CTRL is reset to initial state, that should re-enable
RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference

and as Sean pointed out there might be other non constant leafs,
where exact match check could leave userspace broken.


> Vitaly Kuznetsov (4):
> KVM: x86: Do runtime CPUID update before updating
> vcpu->arch.cpuid_entries
> KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN
> KVM: selftests: Rename 'get_cpuid_test' to 'cpuid_test'
> KVM: selftests: Test KVM_SET_CPUID2 after KVM_RUN
>
> arch/x86/kvm/cpuid.c | 67 ++++++++++++++++---
> arch/x86/kvm/x86.c | 19 ------
> tools/testing/selftests/kvm/.gitignore | 2 +-
> tools/testing/selftests/kvm/Makefile | 4 +-
> .../selftests/kvm/include/x86_64/processor.h | 7 ++
> .../selftests/kvm/lib/x86_64/processor.c | 33 +++++++--
> .../x86_64/{get_cpuid_test.c => cpuid_test.c} | 30 +++++++++
> 7 files changed, 126 insertions(+), 36 deletions(-)
> rename tools/testing/selftests/kvm/x86_64/{get_cpuid_test.c => cpuid_test.c} (83%)
>

2022-01-20 17:31:48

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN for CPU hotplug

On 1/18/22 15:35, Igor Mammedov wrote:
> Can you check following scenario:
> * on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present)
> * boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI
>
> that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H
> and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it cleared
>
> * hotunplug a VCPU and then replug it again
> if IA32_TSX_CTRL is reset to initial state, that should re-enable
> RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference
>
> and as Sean pointed out there might be other non constant leafs,
> where exact match check could leave userspace broken.


MSR_IA32_TSX_CTRL is handled differently straight during the CPUID call:

if (function == 7 && index == 0) {
u64 data;
if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
(data & TSX_CTRL_CPUID_CLEAR))
*ebx &= ~(F(RTM) | F(HLE));
}


and I think we should redo all or most of kvm_update_cpuid_runtime
the same way.

Paolo

2022-01-20 17:36:35

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN for CPU hotplug

Igor Mammedov <[email protected]> writes:

> On Mon, 17 Jan 2022 16:05:38 +0100
> Vitaly Kuznetsov <[email protected]> wrote:
>
>> Changes since v1:
>> - Drop the allowlist of items which were allowed to change and just allow
>> the exact same CPUID data [Sean, Paolo]. Adjust selftest accordingly.
>> - Drop PATCH1 as the exact same change got merged upstream.
>>
>> Recently, KVM made it illegal to change CPUID after KVM_RUN but
>> unfortunately this change is not fully compatible with existing VMMs.
>> In particular, QEMU reuses vCPU fds for CPU hotplug after unplug and it
>> calls KVM_SET_CPUID2. Relax the requirement by implementing an allowing
>> KVM_SET_CPUID{,2} with the exact same data.
>
>
> Can you check following scenario:
> * on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present)
> * boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI
>
> that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H
> and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it
> cleared

Forgive me my ignorance around (not only) TSX :-) I took a "Intel(R)
Xeon(R) CPU E3-1270 v5 @ 3.60GHz" host which seems to have rtm/hle and
booted a guest with 'cpu=host' and with (and without) 'tsx=off' on the
kernel command line. I decided to check what's is MSR_IA32_TSX_CTRL but
I see the following:

# rdmsr 0x122
rdmsr: CPU 0 cannot read MSR 0x00000122

I tried adding 'tsx_ctrl' to my QEMU command line but it complains with
qemu-system-x86_64: warning: host doesn't support requested feature: MSR(10AH).tsx-ctrl [bit 7]

so I think my host is not good enough :-(

Also, I've looked at tsx_clear_cpuid() but it actually writes to
MSR_TSX_FORCE_ABORT MSR (0x10F), not MSR_IA32_TSX_CTRL so I'm confused.

>
> * hotunplug a VCPU and then replug it again
> if IA32_TSX_CTRL is reset to initial state, that should re-enable
> RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference

Could you please teach me this kung-fu, I mean hot to unplug a
cold-plugged CPU with QMP? Previoulsy, I only did un-plugging for what
I've hotplugged, something like:

(QEMU) device_add driver=host-x86_64-cpu socket-id=0 core-id=2 thread-id=0 id=cpu2
{"return": {}}
(QEMU) device_del id=cpu2
{"return": {}}

What's the ids of the cold-plugged CPUs?

>
> and as Sean pointed out there might be other non constant leafs,
> where exact match check could leave userspace broken.

Indeed, while testing your suggestion I've stumbled upon
CPUID.(EAX=0x12, ECX=1) (SGX) where we mangle ECX from
kvm_vcpu_after_set_cpuid():

best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1);
if (best) {
best->ecx &= vcpu->arch.guest_supported_xcr0 & 0xffffffff;
best->edx &= vcpu->arch.guest_supported_xcr0 >> 32;
best->ecx |= XFEATURE_MASK_FPSSE;
}

In theory, we should just move this to __kvm_update_cpuid_runtime()...
I'll take a look tomorrow.

--
Vitaly

2022-01-20 17:59:20

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN for CPU hotplug

On Tue, Jan 18, 2022, Paolo Bonzini wrote:
> On 1/18/22 15:35, Igor Mammedov wrote:
> > Can you check following scenario:
> > * on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present)
> > * boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI
> >
> > that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H
> > and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it cleared
> >
> > * hotunplug a VCPU and then replug it again
> > if IA32_TSX_CTRL is reset to initial state, that should re-enable
> > RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference
> >
> > and as Sean pointed out there might be other non constant leafs,
> > where exact match check could leave userspace broken.
>
>
> MSR_IA32_TSX_CTRL is handled differently straight during the CPUID call:
>
> if (function == 7 && index == 0) {
> u64 data;
> if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> (data & TSX_CTRL_CPUID_CLEAR))
> *ebx &= ~(F(RTM) | F(HLE));
> }
>
>
> and I think we should redo all or most of kvm_update_cpuid_runtime
> the same way.

Please no. xstate_required_size() requires multiple host CPUID calls, and glibc
does CPUID.0xD.0x0 and CPUID.0xD.0x1 as part of its initialization, i.e. launching
a new userspace process in the guest will see additional performance overhread due
to KVM dynamically computing the XSAVE size instead of caching it based on vCPU
state. Nested virtualization would be especially painful as every one of those
"host" CPUID invocations will trigger and exit from L1=>L0.

2022-01-20 19:14:37

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN for CPU hotplug

On 1/18/22 17:53, Sean Christopherson wrote:
>>
>> and I think we should redo all or most of kvm_update_cpuid_runtime
>> the same way.
> Please no. xstate_required_size() requires multiple host CPUID calls, and glibc
> does CPUID.0xD.0x0 and CPUID.0xD.0x1 as part of its initialization, i.e. launching
> a new userspace process in the guest will see additional performance overhread due
> to KVM dynamically computing the XSAVE size instead of caching it based on vCPU
> state. Nested virtualization would be especially painful as every one of those
> "host" CPUID invocations will trigger and exit from L1=>L0.
>

Agreed, but all of the required data is by Linux cached in
xstate_offsets, xstate_sizes and xstate_comp_offsets; moving
xstate_required_size to xstate.c and skipping the host CPUID would
probably be a good idea nevertheless.

Paolo

2022-01-21 18:54:57

by Igor Mammedov

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN for CPU hotplug

On Tue, 18 Jan 2022 17:34:09 +0100
Vitaly Kuznetsov <[email protected]> wrote:

> Igor Mammedov <[email protected]> writes:
>
> > On Mon, 17 Jan 2022 16:05:38 +0100
> > Vitaly Kuznetsov <[email protected]> wrote:
> >
> >> Changes since v1:
> >> - Drop the allowlist of items which were allowed to change and just allow
> >> the exact same CPUID data [Sean, Paolo]. Adjust selftest accordingly.
> >> - Drop PATCH1 as the exact same change got merged upstream.
> >>
> >> Recently, KVM made it illegal to change CPUID after KVM_RUN but
> >> unfortunately this change is not fully compatible with existing VMMs.
> >> In particular, QEMU reuses vCPU fds for CPU hotplug after unplug and it
> >> calls KVM_SET_CPUID2. Relax the requirement by implementing an allowing
> >> KVM_SET_CPUID{,2} with the exact same data.
> >
> >
> > Can you check following scenario:
> > * on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present)
> > * boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI
> >
> > that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H
> > and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it
> > cleared
>
> Forgive me my ignorance around (not only) TSX :-) I took a "Intel(R)
> Xeon(R) CPU E3-1270 v5 @ 3.60GHz" host which seems to have rtm/hle and
> booted a guest with 'cpu=host' and with (and without) 'tsx=off' on the
> kernel command line. I decided to check what's is MSR_IA32_TSX_CTRL but
> I see the following:
>
> # rdmsr 0x122
> rdmsr: CPU 0 cannot read MSR 0x00000122
>
> I tried adding 'tsx_ctrl' to my QEMU command line but it complains with
> qemu-system-x86_64: warning: host doesn't support requested feature: MSR(10AH).tsx-ctrl [bit 7]
>
> so I think my host is not good enough :-(

I've seen it being available on "COOPER LAKE" Xeon

>
> Also, I've looked at tsx_clear_cpuid() but it actually writes to
> MSR_TSX_FORCE_ABORT MSR (0x10F), not MSR_IA32_TSX_CTRL so I'm confused.

look at tsx_disable()

> > * hotunplug a VCPU and then replug it again
> > if IA32_TSX_CTRL is reset to initial state, that should re-enable
> > RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference
>
> Could you please teach me this kung-fu, I mean hot to unplug a
> cold-plugged CPU with QMP? Previoulsy, I only did un-plugging for what
> I've hotplugged, something like:
>
> (QEMU) device_add driver=host-x86_64-cpu socket-id=0 core-id=2 thread-id=0 id=cpu2
> {"return": {}}
> (QEMU) device_del id=cpu2
> {"return": {}}
>
> What's the ids of the cold-plugged CPUs?

it doesn't have to be coldplugged, hot(plug/unplug/plug) sequence is fine as well.
fyi you can use qom_path with device _del from 'info hotpluggable-cpus' output


> > and as Sean pointed out there might be other non constant leafs,
> > where exact match check could leave userspace broken.
>
> Indeed, while testing your suggestion I've stumbled upon
> CPUID.(EAX=0x12, ECX=1) (SGX) where we mangle ECX from
> kvm_vcpu_after_set_cpuid():
>
> best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1);
> if (best) {
> best->ecx &= vcpu->arch.guest_supported_xcr0 & 0xffffffff;
> best->edx &= vcpu->arch.guest_supported_xcr0 >> 32;
> best->ecx |= XFEATURE_MASK_FPSSE;
> }
>
> In theory, we should just move this to __kvm_update_cpuid_runtime()...
> I'll take a look tomorrow.
>