2022-10-27 09:56:33

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

Passing the host topology to the guest is almost certainly wrong
and will confuse the scheduler. In addition, several fields of
these CPUID leaves vary on each processor; it is simply impossible to
return the right values from KVM_GET_SUPPORTED_CPUID in such a way that
they can be passed to KVM_SET_CPUID2.

The values that will most likely prevent confusion are all zeroes.
Userspace will have to override it anyway if it wishes to present a
specific topology to the guest.

Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
---
Documentation/virt/kvm/api.rst | 14 ++++++++++++++
arch/x86/kvm/cpuid.c | 32 ++++++++++++++++----------------
2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eee9f857a986..20f4f6b302ff 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8249,6 +8249,20 @@ CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``
It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
has enabled in-kernel emulation of the local APIC.

+CPU topology
+~~~~~~~~~~~~
+
+Several CPUID values include topology information for the host CPU:
+0x0b and 0x1f for Intel systems, 0x8000001e for AMD systems. Different
+versions of KVM return different values for this information and userspace
+should not rely on it. Currently they return all zeroes.
+
+If userspace wishes to set up a guest topology, it should be careful that
+the values of these three leaves differ for each CPU. In particular,
+the APIC ID is found in EDX for all subleaves of 0x0b and 0x1f, and in EAX
+for 0x8000001e; the latter also encodes the core id and node id in bits
+7:0 of EBX and ECX respectively.
+
Obsolete ioctls and capabilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0810e93cbedc..164bfb7e7a16 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -759,16 +759,22 @@ struct kvm_cpuid_array {
int nent;
};

+static struct kvm_cpuid_entry2 *get_next_cpuid(struct kvm_cpuid_array *array)
+{
+ if (array->nent >= array->maxnent)
+ return NULL;
+
+ return &array->entries[array->nent++];
+}
+
static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
u32 function, u32 index)
{
- struct kvm_cpuid_entry2 *entry;
+ struct kvm_cpuid_entry2 *entry = get_next_cpuid(array);

- if (array->nent >= array->maxnent)
+ if (!entry)
return NULL;

- entry = &array->entries[array->nent++];
-
memset(entry, 0, sizeof(*entry));
entry->function = function;
entry->index = index;
@@ -945,22 +951,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->edx = edx.full;
break;
}
- /*
- * Per Intel's SDM, the 0x1f is a superset of 0xb,
- * thus they can be handled by common code.
- */
case 0x1f:
case 0xb:
/*
- * Populate entries until the level type (ECX[15:8]) of the
- * previous entry is zero. Note, CPUID EAX.{0x1f,0xb}.0 is
- * the starting entry, filled by the primary do_host_cpuid().
+ * No topology; a valid topology is indicated by the presence
+ * of subleaf 1.
*/
- for (i = 1; entry->ecx & 0xff00; ++i) {
- entry = do_host_cpuid(array, function, i);
- if (!entry)
- goto out;
- }
+ entry->eax = entry->ebx = entry->ecx = 0;
break;
case 0xd: {
u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm();
@@ -1193,6 +1190,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->ebx = entry->ecx = entry->edx = 0;
break;
case 0x8000001e:
+ /* Do not return host topology information. */
+ entry->eax = entry->ebx = entry->ecx = 0;
+ entry->edx = 0; /* reserved */
break;
case 0x8000001F:
if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
--
2.31.1



2023-01-24 23:16:58

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On Thu, Oct 27, 2022 at 2:21 AM Paolo Bonzini <[email protected]> wrote:
>
> Passing the host topology to the guest is almost certainly wrong
> and will confuse the scheduler. In addition, several fields of
> these CPUID leaves vary on each processor; it is simply impossible to
> return the right values from KVM_GET_SUPPORTED_CPUID in such a way that
> they can be passed to KVM_SET_CPUID2.
>
> The values that will most likely prevent confusion are all zeroes.
> Userspace will have to override it anyway if it wishes to present a
> specific topology to the guest.
>
> Cc: [email protected]
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 14 ++++++++++++++
> arch/x86/kvm/cpuid.c | 32 ++++++++++++++++----------------
> 2 files changed, 30 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..20f4f6b302ff 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8249,6 +8249,20 @@ CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``
> It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
> has enabled in-kernel emulation of the local APIC.
>
> +CPU topology
> +~~~~~~~~~~~~
> +
> +Several CPUID values include topology information for the host CPU:
> +0x0b and 0x1f for Intel systems, 0x8000001e for AMD systems. Different
> +versions of KVM return different values for this information and userspace
> +should not rely on it. Currently they return all zeroes.
> +
> +If userspace wishes to set up a guest topology, it should be careful that
> +the values of these three leaves differ for each CPU. In particular,
> +the APIC ID is found in EDX for all subleaves of 0x0b and 0x1f, and in EAX
> +for 0x8000001e; the latter also encodes the core id and node id in bits
> +7:0 of EBX and ECX respectively.
> +
> Obsolete ioctls and capabilities
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0810e93cbedc..164bfb7e7a16 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -759,16 +759,22 @@ struct kvm_cpuid_array {
> int nent;
> };
>
> +static struct kvm_cpuid_entry2 *get_next_cpuid(struct kvm_cpuid_array *array)
> +{
> + if (array->nent >= array->maxnent)
> + return NULL;
> +
> + return &array->entries[array->nent++];
> +}
> +
> static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
> u32 function, u32 index)
> {
> - struct kvm_cpuid_entry2 *entry;
> + struct kvm_cpuid_entry2 *entry = get_next_cpuid(array);
>
> - if (array->nent >= array->maxnent)
> + if (!entry)
> return NULL;
>
> - entry = &array->entries[array->nent++];
> -
> memset(entry, 0, sizeof(*entry));
> entry->function = function;
> entry->index = index;
> @@ -945,22 +951,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> entry->edx = edx.full;
> break;
> }
> - /*
> - * Per Intel's SDM, the 0x1f is a superset of 0xb,
> - * thus they can be handled by common code.
> - */
> case 0x1f:
> case 0xb:
> /*
> - * Populate entries until the level type (ECX[15:8]) of the
> - * previous entry is zero. Note, CPUID EAX.{0x1f,0xb}.0 is
> - * the starting entry, filled by the primary do_host_cpuid().
> + * No topology; a valid topology is indicated by the presence
> + * of subleaf 1.
> */
> - for (i = 1; entry->ecx & 0xff00; ++i) {
> - entry = do_host_cpuid(array, function, i);
> - if (!entry)
> - goto out;
> - }
> + entry->eax = entry->ebx = entry->ecx = 0;
> break;
> case 0xd: {
> u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm();
> @@ -1193,6 +1190,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> entry->ebx = entry->ecx = entry->edx = 0;
> break;
> case 0x8000001e:
> + /* Do not return host topology information. */
> + entry->eax = entry->ebx = entry->ecx = 0;
> + entry->edx = 0; /* reserved */
> break;
> case 0x8000001F:
> if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
> --
> 2.31.1
>

This is a userspace ABI change that breaks existing hypervisors.
Please don't do this. Userspace ABIs are supposed to be inviolate.

2023-01-25 14:18:21

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On 1/25/23 00:16, Jim Mattson wrote:
> This is a userspace ABI change that breaks existing hypervisors.
> Please don't do this. Userspace ABIs are supposed to be inviolate.

What exactly is broken?

Part of the definition of the API is that you can take
KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs.
Returning host topology information for a random host vCPU definitely
violates the contract.

Furthermore, any existing userspace should be prepared to deal with
nonexistent host topology leaves. If you mean that said userspace would
now pass no topology to the guest, that's not an API change either.

(Now, there are certainly other parts of the KVM_GET_SUPPORTED_CPUID
contract that should be specified better. But that should be done for
each leaf one by one, which I intend to do, and would not extend to
these three host topology leaves).

Paolo


2023-01-25 16:49:20

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On Wed, Jan 25, 2023 at 6:17 AM Paolo Bonzini <[email protected]> wrote:
>
> On 1/25/23 00:16, Jim Mattson wrote:
> > This is a userspace ABI change that breaks existing hypervisors.
> > Please don't do this. Userspace ABIs are supposed to be inviolate.
>
> What exactly is broken?

KVM_GET_SUPPORTED_CPUID no longer returns the host topology in leaf 0xB.

> Part of the definition of the API is that you can take
> KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs.
> Returning host topology information for a random host vCPU definitely
> violates the contract.

You are attempting to rewrite history. Leaf 0xB was added to
KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest
cpuid management"), and the only documentation of the
KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:

- KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
supports

There is nothing in the commit message or the official documentation
at the time that the ioctl was added that says anything about passing
the result to KVM_SET_CPUID2 for all vCPUs. Operationally, it is quite
clear from the committed code that the intention was to return the
host topology information for the current logical processor.

Any future changes to either the operational behavior or the
documented behavior of the ABI surely demand a version bump.

2023-01-25 21:49:31

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On 1/25/23 17:47, Jim Mattson wrote:
>> Part of the definition of the API is that you can take
>> KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs.
>> Returning host topology information for a random host vCPU definitely
>> violates the contract.
>
> You are attempting to rewrite history. Leaf 0xB was added to > KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest
> cpuid management"), and the only documentation of the
> KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:
>
> - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
> supports
>
> [...] the intention was to return the
> host topology information for the current logical processor.

The handling of unknown features is so naive in that commit, that I
don't think it is possible to read anything from the implementation; and
it certainly should not be a paragon for a future-proof implementation
of KVM_GET_SUPPORTED_CPUID.

For example, it only hid _known_ CPUID leaves or features and passed the
unknown ones through, which you'll agree is completely broken. It also
didn't try to handle all leaves for which ECX might turn out to be
significant---which happened for EAX=7 so the commit returns a wrong
output for CPUID[EAX=7,ECX=0].EAX.

In other words, the only reason it handles 0xB is because it was known
to have subleaves.

We can get more information about how userspace was intended to use it
from the qemu-kvm fork, which at the time was practically the only KVM
userspace. As of 2009 it was only checking a handful of leaves:

https://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git/tree/target-i386/kvm.c?h=kvm-88#n133

so shall we say that userspace is supposed to build each CPUID leaf one
by one and use KVM_GET_SUPPORTED_CPUID2 for validation only? I think
the first committed documentation agrees: "Userspace can use the
information returned by this ioctl to construct cpuid information (for
KVM_SET_CPUID2) that is consistent with hardware, kernel, and userspace
capabilities, and with user requirements".

However, that's the theory. "Do not break userspace" also involves
looking at how userspace *really* uses the API, and make compromises to
cater to those uses; which is different from rewriting history.

And in practice, people basically stopped reading after "(for
KVM_SET_CPUID2)".

For example in kvmtool:

kvm_cpuid->nent = MAX_KVM_CPUID_ENTRIES;
if (ioctl(vcpu->kvm->sys_fd, KVM_GET_SUPPORTED_CPUID, kvm_cpuid) < 0)
die_perror("KVM_GET_SUPPORTED_CPUID failed");

filter_cpuid(kvm_cpuid, vcpu->cpu_id);

if (ioctl(vcpu->vcpu_fd, KVM_SET_CPUID2, kvm_cpuid) < 0)
die_perror("KVM_SET_CPUID2 failed");

where filter_cpuid only does minor adjustments that do not include 0xB,
0x1F and 0x8000001E. The result is a topology that makes no sense if
host #vCPUs != guest #vCPUs, and which doesn't include the correct APIC
id in EDX.

https://github.com/kvmtool/kvmtool/blob/5657dd3e48b41bc6db38fa657994bc0e030fd31f/x86/cpuid.c


crosvm does optionally attempt to pass through leaves 0xB and 0x1F, but
it fails to adjust the APIC id in EDX. On the other hand it also passes
through 0x8000001E if ctx.cpu_config.host_cpu_topology is false,
incorrectly. So on one hand this patch breaks host_cpu_topology ==
true, on the other hand it fixes host_cpu_topology == false on AMD
processors.

https://github.com/google/crosvm/blob/cc79897fc0813ee8412e6395648593898962ec82/x86_64/src/cpuid.rs#L121


The rust-vmm reference hypervisor adjusts the APIC id in EDX for 0xB but
not for 0x1F. Apart from that it passes through the host topology
leaves, again resulting in nonsensical topology for host #vCPUs != guest
#vCPUs.

https://github.com/rust-vmm/vmm-reference/blob/5cde58bc955afca8a180585a9f01c82d6277a755/src/vm-vcpu-ref/src/x86_64/cpuid.rs


Firecracker, finally, ignores KVM_GET_SUPPORTED_CPUID's output for 0xb
and 0x8000001E (good!) but fails to do the same for 0x1F, so this patch
is again a fix of sorts---having all zeroes in 0x1F is better than
having a value that is valid but inconsistent with 0xB.

https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/intel.rs#L120
https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/amd.rs#L88


So basically the only open source userspace that is penalized (but not
broken, and also partly fixed) by the patch is crosvm. QEMU doesn't
care, while firecracker/kvmtool/vmm-reference are a net positive.

Paolo

> Any future changes to either the operational behavior or the
> documented behavior of the ABI surely demand a version bump.
>


2023-01-25 22:10:15

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On Wed, Jan 25, 2023 at 1:46 PM Paolo Bonzini <[email protected]> wrote:
>
> On 1/25/23 17:47, Jim Mattson wrote:
> >> Part of the definition of the API is that you can take
> >> KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs.
> >> Returning host topology information for a random host vCPU definitely
> >> violates the contract.
> >
> > You are attempting to rewrite history. Leaf 0xB was added to > KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest
> > cpuid management"), and the only documentation of the
> > KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:
> >
> > - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
> > supports
> >
> > [...] the intention was to return the
> > host topology information for the current logical processor.
>
> The handling of unknown features is so naive in that commit, that I
> don't think it is possible to read anything from the implementation; and
> it certainly should not be a paragon for a future-proof implementation
> of KVM_GET_SUPPORTED_CPUID.
>
> For example, it only hid _known_ CPUID leaves or features and passed the
> unknown ones through, which you'll agree is completely broken. It also
> didn't try to handle all leaves for which ECX might turn out to be
> significant---which happened for EAX=7 so the commit returns a wrong
> output for CPUID[EAX=7,ECX=0].EAX.
>
> In other words, the only reason it handles 0xB is because it was known
> to have subleaves.
>
> We can get more information about how userspace was intended to use it
> from the qemu-kvm fork, which at the time was practically the only KVM
> userspace. As of 2009 it was only checking a handful of leaves:
>
> https://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git/tree/target-i386/kvm.c?h=kvm-88#n133
>
> so shall we say that userspace is supposed to build each CPUID leaf one
> by one and use KVM_GET_SUPPORTED_CPUID2 for validation only? I think
> the first committed documentation agrees: "Userspace can use the
> information returned by this ioctl to construct cpuid information (for
> KVM_SET_CPUID2) that is consistent with hardware, kernel, and userspace
> capabilities, and with user requirements".
>
> However, that's the theory. "Do not break userspace" also involves
> looking at how userspace *really* uses the API, and make compromises to
> cater to those uses; which is different from rewriting history.
>
> And in practice, people basically stopped reading after "(for
> KVM_SET_CPUID2)".
>
> For example in kvmtool:
>
> kvm_cpuid->nent = MAX_KVM_CPUID_ENTRIES;
> if (ioctl(vcpu->kvm->sys_fd, KVM_GET_SUPPORTED_CPUID, kvm_cpuid) < 0)
> die_perror("KVM_GET_SUPPORTED_CPUID failed");
>
> filter_cpuid(kvm_cpuid, vcpu->cpu_id);
>
> if (ioctl(vcpu->vcpu_fd, KVM_SET_CPUID2, kvm_cpuid) < 0)
> die_perror("KVM_SET_CPUID2 failed");
>
> where filter_cpuid only does minor adjustments that do not include 0xB,
> 0x1F and 0x8000001E. The result is a topology that makes no sense if
> host #vCPUs != guest #vCPUs, and which doesn't include the correct APIC
> id in EDX.
>
> https://github.com/kvmtool/kvmtool/blob/5657dd3e48b41bc6db38fa657994bc0e030fd31f/x86/cpuid.c
>
>
> crosvm does optionally attempt to pass through leaves 0xB and 0x1F, but
> it fails to adjust the APIC id in EDX. On the other hand it also passes
> through 0x8000001E if ctx.cpu_config.host_cpu_topology is false,
> incorrectly. So on one hand this patch breaks host_cpu_topology ==
> true, on the other hand it fixes host_cpu_topology == false on AMD
> processors.
>
> https://github.com/google/crosvm/blob/cc79897fc0813ee8412e6395648593898962ec82/x86_64/src/cpuid.rs#L121
>
>
> The rust-vmm reference hypervisor adjusts the APIC id in EDX for 0xB but
> not for 0x1F. Apart from that it passes through the host topology
> leaves, again resulting in nonsensical topology for host #vCPUs != guest
> #vCPUs.
>
> https://github.com/rust-vmm/vmm-reference/blob/5cde58bc955afca8a180585a9f01c82d6277a755/src/vm-vcpu-ref/src/x86_64/cpuid.rs
>
>
> Firecracker, finally, ignores KVM_GET_SUPPORTED_CPUID's output for 0xb
> and 0x8000001E (good!) but fails to do the same for 0x1F, so this patch
> is again a fix of sorts---having all zeroes in 0x1F is better than
> having a value that is valid but inconsistent with 0xB.
>
> https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/intel.rs#L120
> https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/amd.rs#L88
>
>
> So basically the only open source userspace that is penalized (but not
> broken, and also partly fixed) by the patch is crosvm. QEMU doesn't
> care, while firecracker/kvmtool/vmm-reference are a net positive.
>
> Paolo

The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a
decade* have been passed through unmodified from the host. They have
never made sense for KVM_SET_CPUID2, with the unlikely exception of a
whole-host VM.

Our VMM populates the topology of the guest CPUID table on its own, as
any VMM must. However, it uses the host topology (which
KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a
decade*) to see if the requested guest topology is possible.

Changing a long-established ABI in a way that breaks userspace
applications is a bad practice. I didn't think we, as a community, did
that. I didn't realize that we were only catering to open source
implementations here.

2023-01-25 22:44:57

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On 1/25/23 23:09, Jim Mattson wrote:
> The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a
> decade* have been passed through unmodified from the host. They have
> never made sense for KVM_SET_CPUID2, with the unlikely exception of a
> whole-host VM.

True, unfortunately people have not read the nonexistent documentation
and they are:

1) rarely adjusting correctly all of 0xB, 0x1F and 0x8000001E;

2) never bounding CPUID[EAX=0].EAX to a known CPUID leaf, resulting for
example in inconsistencies between 0xB and 0x1F.

*But* (2) should not be needed unless you care about maintaining
homogeneous CPUID within a VM migration pool. For something like
kvmtool, having to do (2) would be a workaround for the bug that this
patch fixes.

> Our VMM populates the topology of the guest CPUID table on its own, as
> any VMM must. However, it uses the host topology (which
> KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a
> decade*) to see if the requested guest topology is possible.

Ok, thanks; this is useful to know.

> Changing a long-established ABI in a way that breaks userspace
> applications is a bad practice. I didn't think we, as a community, did
> that. I didn't realize that we were only catering to open source
> implementations here.

We aren't. But the open source implementations provide some guidance as
to how the API is being used in the wild, and what the pitfalls are.

You wrote it yourself: any VMM must either populate the topology on its
own, or possibly fill it with zeros. Returning a value that is
extremely unlikely to be used is worse in pretty much every way (apart
from not breaking your VMM, of course).

With a total of six known users (QEMU, crosvm, kvmtool, firecracker,
rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and
damned if it doesn't. There is a tension between fixing the one VMM
that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly,
and fixing 3-4 that were silently broken and are now fixed. I will
probably send a patch to crosvm, though.

The VMM being _proprietary_ doesn't really matter, however it does
matter to me that it is not _public_: it is only used within Google, and
the breakage is neither hard to fix in the VMM nor hard to temporarily
avoid by reverting the patch in the Google kernel.

Paolo


2023-01-26 00:58:56

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On Wed, Jan 25, 2023 at 2:44 PM Paolo Bonzini <[email protected]> wrote:
>
> On 1/25/23 23:09, Jim Mattson wrote:
> > The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a
> > decade* have been passed through unmodified from the host. They have
> > never made sense for KVM_SET_CPUID2, with the unlikely exception of a
> > whole-host VM.
>
> True, unfortunately people have not read the nonexistent documentation
> and they are:
>
> 1) rarely adjusting correctly all of 0xB, 0x1F and 0x8000001E;
>
> 2) never bounding CPUID[EAX=0].EAX to a known CPUID leaf, resulting for
> example in inconsistencies between 0xB and 0x1F.
>
> *But* (2) should not be needed unless you care about maintaining
> homogeneous CPUID within a VM migration pool. For something like
> kvmtool, having to do (2) would be a workaround for the bug that this
> patch fixes.

Maybe we should just populate up to leaf 3. :-)

> > Our VMM populates the topology of the guest CPUID table on its own, as
> > any VMM must. However, it uses the host topology (which
> > KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a
> > decade*) to see if the requested guest topology is possible.
>
> Ok, thanks; this is useful to know.
>
> > Changing a long-established ABI in a way that breaks userspace
> > applications is a bad practice. I didn't think we, as a community, did
> > that. I didn't realize that we were only catering to open source
> > implementations here.
>
> We aren't. But the open source implementations provide some guidance as
> to how the API is being used in the wild, and what the pitfalls are.
>
> You wrote it yourself: any VMM must either populate the topology on its
> own, or possibly fill it with zeros. Returning a value that is
> extremely unlikely to be used is worse in pretty much every way (apart
> from not breaking your VMM, of course).

I've complained about this particular ioctl more than I can remember.
This is just one of its many problems.

> With a total of six known users (QEMU, crosvm, kvmtool, firecracker,
> rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and
> damned if it doesn't. There is a tension between fixing the one VMM
> that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly,
> and fixing 3-4 that were silently broken and are now fixed. I will
> probably send a patch to crosvm, though.
>
> The VMM being _proprietary_ doesn't really matter, however it does
> matter to me that it is not _public_: it is only used within Google, and
> the breakage is neither hard to fix in the VMM nor hard to temporarily
> avoid by reverting the patch in the Google kernel.

Sadly, there isn't a single kernel involved. People running our VMM on
their desktops are going to be impacted as soon as this patch hits
that distro. (I don't know if I can say which distro that is.) So, now
we have to get the VMM folks to urgently accommodate this change and
get a new distribution out.

2023-01-26 09:41:34

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On 1/26/23 01:58, Jim Mattson wrote:
>> You wrote it yourself: any VMM must either populate the topology on its
>> own, or possibly fill it with zeros. Returning a value that is
>> extremely unlikely to be used is worse in pretty much every way (apart
>> from not breaking your VMM, of course).
>
> I've complained about this particular ioctl more than I can remember.
> This is just one of its many problems.

I agree. At the very least it should have been a VM ioctl.

>> With a total of six known users (QEMU, crosvm, kvmtool, firecracker,
>> rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and
>> damned if it doesn't. There is a tension between fixing the one VMM
>> that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly,
>> and fixing 3-4 that were silently broken and are now fixed. I will
>> probably send a patch to crosvm, though.
>>
>> The VMM being _proprietary_ doesn't really matter, however it does
>> matter to me that it is not _public_: it is only used within Google, and
>> the breakage is neither hard to fix in the VMM nor hard to temporarily
>> avoid by reverting the patch in the Google kernel.
>
> Sadly, there isn't a single kernel involved. People running our VMM on
> their desktops are going to be impacted as soon as this patch hits
> that distro. (I don't know if I can say which distro that is.) So, now
> we have to get the VMM folks to urgently accommodate this change and
> get a new distribution out.

Ok, this is what is needed to make a more informed choice. To be clear,
this is _still_ not public (for example it's not ChromeOS), so there is
at least some control on what version of the VMM they use? Would it
make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?

Thanks,

Paolo


2023-01-26 16:06:35

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On Thu, Jan 26, 2023 at 1:40 AM Paolo Bonzini <[email protected]> wrote:
>
> On 1/26/23 01:58, Jim Mattson wrote:
> >> You wrote it yourself: any VMM must either populate the topology on its
> >> own, or possibly fill it with zeros. Returning a value that is
> >> extremely unlikely to be used is worse in pretty much every way (apart
> >> from not breaking your VMM, of course).
> >
> > I've complained about this particular ioctl more than I can remember.
> > This is just one of its many problems.
>
> I agree. At the very least it should have been a VM ioctl.
>
> >> With a total of six known users (QEMU, crosvm, kvmtool, firecracker,
> >> rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and
> >> damned if it doesn't. There is a tension between fixing the one VMM
> >> that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly,
> >> and fixing 3-4 that were silently broken and are now fixed. I will
> >> probably send a patch to crosvm, though.
> >>
> >> The VMM being _proprietary_ doesn't really matter, however it does
> >> matter to me that it is not _public_: it is only used within Google, and
> >> the breakage is neither hard to fix in the VMM nor hard to temporarily
> >> avoid by reverting the patch in the Google kernel.
> >
> > Sadly, there isn't a single kernel involved. People running our VMM on
> > their desktops are going to be impacted as soon as this patch hits
> > that distro. (I don't know if I can say which distro that is.) So, now
> > we have to get the VMM folks to urgently accommodate this change and
> > get a new distribution out.
>
> Ok, this is what is needed to make a more informed choice. To be clear,
> this is _still_ not public (for example it's not ChromeOS), so there is
> at least some control on what version of the VMM they use? Would it
> make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?

Mainline isn't a problem. I'm more worried about 5.19 LTS.

Thanks!

2023-01-26 17:48:47

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On 1/26/23 17:06, Jim Mattson wrote:
>>> Sadly, there isn't a single kernel involved. People running our VMM on
>>> their desktops are going to be impacted as soon as this patch hits
>>> that distro. (I don't know if I can say which distro that is.) So, now
>>> we have to get the VMM folks to urgently accommodate this change and
>>> get a new distribution out.
>>
>> Ok, this is what is needed to make a more informed choice. To be clear,
>> this is _still_ not public (for example it's not ChromeOS), so there is
>> at least some control on what version of the VMM they use? Would it
>> make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
>
> Mainline isn't a problem. I'm more worried about 5.19 LTS.

5.19 is not LTS, is it? This patch is only in 6.1.7 and 6.1.8 as far as
stable kernels is concerned, should I ask Greg to revert it there?

Paolo


2023-01-26 20:47:10

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On Thu, Jan 26, 2023 at 9:47 AM Paolo Bonzini <[email protected]> wrote:
>
> On 1/26/23 17:06, Jim Mattson wrote:
> >>> Sadly, there isn't a single kernel involved. People running our VMM on
> >>> their desktops are going to be impacted as soon as this patch hits
> >>> that distro. (I don't know if I can say which distro that is.) So, now
> >>> we have to get the VMM folks to urgently accommodate this change and
> >>> get a new distribution out.
> >>
> >> Ok, this is what is needed to make a more informed choice. To be clear,
> >> this is _still_ not public (for example it's not ChromeOS), so there is
> >> at least some control on what version of the VMM they use? Would it
> >> make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
> >
> > Mainline isn't a problem. I'm more worried about 5.19 LTS.
>
> 5.19 is not LTS, is it? This patch is only in 6.1.7 and 6.1.8 as far as
> stable kernels is concerned, should I ask Greg to revert it there?

It came to my attention when commit 196c6f0c3e21 ("KVM: x86: Do not
return host topology information from KVM_GET_SUPPORTED_CPUID") broke
some of our tests under 5.10 LTS.

If it isn't bound for linux-5.19-y, then we have some breathing room.

2023-01-27 07:23:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID

On Thu, Jan 26, 2023 at 12:45:58PM -0800, Jim Mattson wrote:
> On Thu, Jan 26, 2023 at 9:47 AM Paolo Bonzini <[email protected]> wrote:
> >
> > On 1/26/23 17:06, Jim Mattson wrote:
> > >>> Sadly, there isn't a single kernel involved. People running our VMM on
> > >>> their desktops are going to be impacted as soon as this patch hits
> > >>> that distro. (I don't know if I can say which distro that is.) So, now
> > >>> we have to get the VMM folks to urgently accommodate this change and
> > >>> get a new distribution out.
> > >>
> > >> Ok, this is what is needed to make a more informed choice. To be clear,
> > >> this is _still_ not public (for example it's not ChromeOS), so there is
> > >> at least some control on what version of the VMM they use? Would it
> > >> make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
> > >
> > > Mainline isn't a problem. I'm more worried about 5.19 LTS.
> >
> > 5.19 is not LTS, is it? This patch is only in 6.1.7 and 6.1.8 as far as
> > stable kernels is concerned, should I ask Greg to revert it there?
>
> It came to my attention when commit 196c6f0c3e21 ("KVM: x86: Do not
> return host topology information from KVM_GET_SUPPORTED_CPUID") broke
> some of our tests under 5.10 LTS.
>
> If it isn't bound for linux-5.19-y, then we have some breathing room.

5.19 is long end-of-life, it dropped off of being maintained back in
October of last year.

You can always use the front page of kernel.org to determine what is
still being maintained.

thanks,

greg k-h