2020-09-15 16:33:13

by Vitaly Kuznetsov

Subject: [PATCH RFC 0/2] KVM: x86: allow for more CPUID entries

With QEMU and newer AMD CPUs (namely: Epyc 'Rome') the current limit for
KVM_MAX_CPUID_ENTRIES(80) is reported to be hit. Last time it was raised
from '40' in 2010. We can, of course, just bump it a little bit to fix
the immediate issue but the report made me wonder why we need to pre-
allocate vcpu->arch.cpuid_entries array instead of sizing it dynamically.
This RFC is intended to feed my curiosity.

Very mildly tested with selftests/kvm-unit-tests and nothing seems to
break. I also don't have access to the system where the original issue
was reported but chances we're fixing it are very good IMO as just the
second patch alone was reported to be sufficient.

Reported-by: Dr. David Alan Gilbert <[email protected]>

Vitaly Kuznetsov (2):
KVM: x86: allocate vcpu->arch.cpuid_entries dynamically
KVM: x86: bump KVM_MAX_CPUID_ENTRIES

arch/x86/include/asm/kvm_host.h | 4 +--
arch/x86/kvm/cpuid.c | 55 ++++++++++++++++++++++++---------
arch/x86/kvm/x86.c | 1 +
3 files changed, 43 insertions(+), 17 deletions(-)

--
2.25.4


2020-09-15 19:51:21

by Dr. David Alan Gilbert

Subject: Re: [PATCH RFC 0/2] KVM: x86: allow for more CPUID entries

* Vitaly Kuznetsov ([email protected]) wrote:
> With QEMU and newer AMD CPUs (namely: Epyc 'Rome') the current limit for
> KVM_MAX_CPUID_ENTRIES(80) is reported to be hit. Last time it was raised
> from '40' in 2010. We can, of course, just bump it a little bit to fix
> the immediate issue but the report made me wonder why we need to pre-
> allocate vcpu->arch.cpuid_entries array instead of sizing it dynamically.
> This RFC is intended to feed my curiosity.
>
> Very mildly tested with selftests/kvm-unit-tests and nothing seems to
> break. I also don't have access to the system where the original issue
> was reported but chances we're fixing it are very good IMO as just the
> second patch alone was reported to be sufficient.
>
> Reported-by: Dr. David Alan Gilbert <[email protected]>

Oh nice, I was just going to bump the magic number :-)

Anyway, this seems to work for me, so:

Tested-by: Dr. David Alan Gilbert <[email protected]>

> Vitaly Kuznetsov (2):
> KVM: x86: allocate vcpu->arch.cpuid_entries dynamically
> KVM: x86: bump KVM_MAX_CPUID_ENTRIES
>
> arch/x86/include/asm/kvm_host.h | 4 +--
> arch/x86/kvm/cpuid.c | 55 ++++++++++++++++++++++++---------
> arch/x86/kvm/x86.c | 1 +
> 3 files changed, 43 insertions(+), 17 deletions(-)
>
> --
> 2.25.4
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2020-09-15 22:50:24

by Vitaly Kuznetsov

Subject: [PATCH RFC 1/2] KVM: x86: allocate vcpu->arch.cpuid_entries dynamically

The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80)
is reported to be insufficient but before we bump it let's switch to
allocating vcpu->arch.cpuid_entries dynamically. Currenly,
'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is
3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch'
but having it pre-allocated (for all vCPUs which we also pre-allocate)
gives us no benefits.
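
For reference, a quick userspace sketch (not part of this patch) that
double-checks the numbers above; it only assumes the UAPI definition of
'struct kvm_cpuid_entry2' from <linux/kvm.h>, and the '80' is just
KVM_MAX_CPUID_ENTRIES spelled out since that define is kernel-internal:

#include <stdio.h>
#include <linux/kvm.h>

int main(void)
{
	/* 10 x __u32: function, index, flags, eax..edx, padding[3] */
	printf("sizeof(struct kvm_cpuid_entry2) = %zu\n",
	       sizeof(struct kvm_cpuid_entry2));
	/* 80 == KVM_MAX_CPUID_ENTRIES (kernel-internal, spelled out here) */
	printf("pre-allocated array             = %zu\n",
	       80 * sizeof(struct kvm_cpuid_entry2));
	return 0;
}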

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/cpuid.c | 55 ++++++++++++++++++++++++---------
arch/x86/kvm/x86.c | 1 +
3 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5303dbc5c9bc..0c5f2ca3e838 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -636,7 +636,7 @@ struct kvm_vcpu_arch {
int halt_request; /* real mode on Intel only */

int cpuid_nent;
- struct kvm_cpuid_entry2 cpuid_entries[KVM_MAX_CPUID_ENTRIES];
+ struct kvm_cpuid_entry2 *cpuid_entries;

int maxphyaddr;
int max_tdp_level;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 3fd6eec202d7..0ce943a8a39a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -195,6 +195,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
{
int r, i;
struct kvm_cpuid_entry *cpuid_entries = NULL;
+ struct kvm_cpuid_entry2 *cpuid_entries2 = NULL;

r = -E2BIG;
if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
@@ -207,31 +208,42 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
r = PTR_ERR(cpuid_entries);
goto out;
}
+ cpuid_entries2 = kvmalloc_array(cpuid->nent, sizeof(cpuid_entries2[0]),
+ GFP_KERNEL_ACCOUNT);
+ if (!cpuid_entries2) {
+ r = -ENOMEM;
+ goto out_free_cpuid;
+ }
}
for (i = 0; i < cpuid->nent; i++) {
- vcpu->arch.cpuid_entries[i].function = cpuid_entries[i].function;
- vcpu->arch.cpuid_entries[i].eax = cpuid_entries[i].eax;
- vcpu->arch.cpuid_entries[i].ebx = cpuid_entries[i].ebx;
- vcpu->arch.cpuid_entries[i].ecx = cpuid_entries[i].ecx;
- vcpu->arch.cpuid_entries[i].edx = cpuid_entries[i].edx;
- vcpu->arch.cpuid_entries[i].index = 0;
- vcpu->arch.cpuid_entries[i].flags = 0;
- vcpu->arch.cpuid_entries[i].padding[0] = 0;
- vcpu->arch.cpuid_entries[i].padding[1] = 0;
- vcpu->arch.cpuid_entries[i].padding[2] = 0;
+ cpuid_entries2[i].function = cpuid_entries[i].function;
+ cpuid_entries2[i].eax = cpuid_entries[i].eax;
+ cpuid_entries2[i].ebx = cpuid_entries[i].ebx;
+ cpuid_entries2[i].ecx = cpuid_entries[i].ecx;
+ cpuid_entries2[i].edx = cpuid_entries[i].edx;
+ cpuid_entries2[i].index = 0;
+ cpuid_entries2[i].flags = 0;
+ cpuid_entries2[i].padding[0] = 0;
+ cpuid_entries2[i].padding[1] = 0;
+ cpuid_entries2[i].padding[2] = 0;
}
+ kvfree(vcpu->arch.cpuid_entries);
+ vcpu->arch.cpuid_entries = cpuid_entries2;
vcpu->arch.cpuid_nent = cpuid->nent;
+
r = kvm_check_cpuid(vcpu);
if (r) {
+ kvfree(vcpu->arch.cpuid_entries);
+ vcpu->arch.cpuid_entries = NULL;
vcpu->arch.cpuid_nent = 0;
- kvfree(cpuid_entries);
- goto out;
+ goto out_free_cpuid;
}

cpuid_fix_nx_cap(vcpu);
kvm_update_cpuid_runtime(vcpu);
kvm_vcpu_after_set_cpuid(vcpu);

+out_free_cpuid:
kvfree(cpuid_entries);
out:
return r;
@@ -241,18 +253,31 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
struct kvm_cpuid2 *cpuid,
struct kvm_cpuid_entry2 __user *entries)
{
+ struct kvm_cpuid_entry2 *cpuid_entries2 = NULL;
int r;

r = -E2BIG;
if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
goto out;
r = -EFAULT;
- if (copy_from_user(&vcpu->arch.cpuid_entries, entries,
- cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
- goto out;
+
+ if (cpuid->nent) {
+ cpuid_entries2 = vmemdup_user(entries,
+ array_size(sizeof(cpuid_entries2[0]),
+ cpuid->nent));
+ if (IS_ERR(cpuid_entries2)) {
+ r = PTR_ERR(cpuid_entries2);
+ goto out;
+ }
+ }
+ kvfree(vcpu->arch.cpuid_entries);
+ vcpu->arch.cpuid_entries = cpuid_entries2;
vcpu->arch.cpuid_nent = cpuid->nent;
+
r = kvm_check_cpuid(vcpu);
if (r) {
+ kvfree(vcpu->arch.cpuid_entries);
+ vcpu->arch.cpuid_entries = NULL;
vcpu->arch.cpuid_nent = 0;
goto out;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1994602a0851..42259a6ec1d8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9610,6 +9610,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
kvm_mmu_destroy(vcpu);
srcu_read_unlock(&vcpu->kvm->srcu, idx);
free_page((unsigned long)vcpu->arch.pio_data);
+ kvfree(vcpu->arch.cpuid_entries);
if (!lapic_in_kernel(vcpu))
static_key_slow_dec(&kvm_no_apic_vcpu);
}
--
2.25.4

2020-09-15 22:52:04

by Vitaly Kuznetsov

Subject: [PATCH RFC 2/2] KVM: x86: bump KVM_MAX_CPUID_ENTRIES

As vcpu->arch.cpuid_entries is now allocated dynamically, the only
remaining use for KVM_MAX_CPUID_ENTRIES is to check KVM_SET_CPUID/
KVM_SET_CPUID2 input for sanity. Since it was reported that the
current limit (80) is insufficient for some CPUs, bump
KVM_MAX_CPUID_ENTRIES and use an arbitrary value '256' as the new
limit.
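
A rough userspace sketch (not part of this patch) of the remaining sanity
check: anything above the limit should still be rejected with -E2BIG. The
fd setup is minimal, error handling is mostly omitted, and '512' is just an
arbitrary over-the-limit value for the sketch:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);
	int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
	int nent = 512;	/* arbitrary value above the new limit of 256 */
	struct kvm_cpuid2 *c;

	if (kvm < 0 || vm < 0 || vcpu < 0) {
		perror("kvm setup");
		return 1;
	}

	c = calloc(1, sizeof(*c) + nent * sizeof(c->entries[0]));
	c->nent = nent;

	/* the entries are all zero; the nent check fires before validation */
	if (ioctl(vcpu, KVM_SET_CPUID2, c) < 0 && errno == E2BIG)
		printf("nent=%d rejected with E2BIG, as expected\n", nent);

	return 0;
}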

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0c5f2ca3e838..c1d942c9eb58 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -132,7 +132,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
#define KVM_MIN_FREE_MMU_PAGES 5
#define KVM_REFILL_PAGES 25
-#define KVM_MAX_CPUID_ENTRIES 80
+#define KVM_MAX_CPUID_ENTRIES 256
#define KVM_NR_FIXED_MTRR_REGION 88
#define KVM_NR_VAR_MTRR 8

--
2.25.4

2020-09-16 03:52:32

by Wei Huang

Subject: Re: [PATCH RFC 0/2] KVM: x86: allow for more CPUID entries

On 09/15 05:51, Dr. David Alan Gilbert wrote:
> * Vitaly Kuznetsov ([email protected]) wrote:
> > With QEMU and newer AMD CPUs (namely: Epyc 'Rome') the current limit for

Could you elaborate on this limit? On Rome, I counted ~35 CPUID functions which
include Fn0000_xxxx, Fn4000_xxxx and Fn8000_xxxx.

> > KVM_MAX_CPUID_ENTRIES(80) is reported to be hit. Last time it was raised
> > from '40' in 2010. We can, of course, just bump it a little bit to fix
> > the immediate issue but the report made me wonder why we need to pre-
> > allocate vcpu->arch.cpuid_entries array instead of sizing it dynamically.
> > This RFC is intended to feed my curiosity.
> >
> > Very mildly tested with selftests/kvm-unit-tests and nothing seems to
> > break. I also don't have access to the system where the original issue
> > was reported but chances we're fixing it are very good IMO as just the
> > second patch alone was reported to be sufficient.
> >
> > Reported-by: Dr. David Alan Gilbert <[email protected]>
>
> Oh nice, I was just going to bump the magic number :-)
>
> Anyway, this seems to work for me, so:
>
> Tested-by: Dr. David Alan Gilbert <[email protected]>
>

I tested on two platforms and the patches worked fine. So no objection on the
design.

Tested-by: Wei Huang <[email protected]>

> > Vitaly Kuznetsov (2):
> > KVM: x86: allocate vcpu->arch.cpuid_entries dynamically
> > KVM: x86: bump KVM_MAX_CPUID_ENTRIES
> >
> > arch/x86/include/asm/kvm_host.h | 4 +--
> > arch/x86/kvm/cpuid.c | 55 ++++++++++++++++++++++++---------
> > arch/x86/kvm/x86.c | 1 +
> > 3 files changed, 43 insertions(+), 17 deletions(-)
> >
> > --
> > 2.25.4
> >
> --
> Dr. David Alan Gilbert / [email protected] / Manchester, UK
>

2020-09-16 07:48:56

by Vitaly Kuznetsov

Subject: Re: [PATCH RFC 0/2] KVM: x86: allow for more CPUID entries

Wei Huang <[email protected]> writes:

> On 09/15 05:51, Dr. David Alan Gilbert wrote:
>> * Vitaly Kuznetsov ([email protected]) wrote:
>> > With QEMU and newer AMD CPUs (namely: Epyc 'Rome') the current limit for
>
> Could you elaborate on this limit? On Rome, I counted ~35 CPUID functions which
> include Fn0000_xxxx, Fn4000_xxxx and Fn8000_xxxx.

Some CPUID functions have indices (e.g. 0x00000004, 0x8000001d, ...)
and each existing index requires a separate entry. E.g. on my AMD EPYC
7401P host I can see:

# cpuid -r -1 | wc -l
53

Hyper-V emulation accounts for 11 leaves (we don't currently set them
all in QEMU but eventually we will). Also, we sometimes set both Intel
and AMD cpuid leaves in QEMU (David can probably elaborate on which/why)
and this all adds up. '80' seems to be an easy target to hit even with
existing CPUs :-)
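
For illustration, here is a quick host-side sketch of how a single indexed
leaf multiplies entries. It assumes GCC/clang's <cpuid.h> and AMD's
convention that 0x8000001d subleaves end when the cache type field
(EAX[4:0]) reads 0; the '64' is just a safety bound for the sketch:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx, idx;

	if (__get_cpuid_max(0x80000000, NULL) < 0x8000001d)
		return 1;	/* leaf not implemented on this CPU */

	for (idx = 0; idx < 64; idx++) {
		__cpuid_count(0x8000001d, idx, eax, ebx, ecx, edx);
		if ((eax & 0x1f) == 0)	/* cache type 0 => no more subleaves */
			break;
	}
	printf("leaf 0x8000001d alone needs %u CPUID entries\n", idx);
	return 0;
}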

--
Vitaly

2020-09-16 08:37:40

by Dr. David Alan Gilbert

Subject: Re: [PATCH RFC 0/2] KVM: x86: allow for more CPUID entries

* Wei Huang ([email protected]) wrote:
> On 09/15 05:51, Dr. David Alan Gilbert wrote:
> > * Vitaly Kuznetsov ([email protected]) wrote:
> > > With QEMU and newer AMD CPUs (namely: Epyc 'Rome') the current limit for
>
> Could you elaborate on this limit? On Rome, I counted ~35 CPUID functions which
> include Fn0000_xxxx, Fn4000_xxxx and Fn8000_xxxx.

On my 7302P the output of:
cpuid -1 -r | wc -l

is 61, there is one line of header in there.

However in a guest I see more; and I think that's because KVM tends to
list the CPUID entries for a lot of disabled Intel features, even on
AMD, e.g. 0x11-0x1f which AMD doesn't have, are listed in a KVM guest.
Then you add the KVM CPUIDs at 4...0 and 4....1.

IMHO we should be filtering those out for at least two reasons:
a) They're wrong
b) We're probably not keeping the set of visible CPUID fields the same
when we move between host kernels, and that can't be good for
migration.

Still, those are separate problems.
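
One quick way to see how many entries KVM itself would advertise (which is
roughly what a guest ends up with once QEMU adds its own leaves on top) is
the system-level KVM_GET_SUPPORTED_CPUID ioctl; a rough sketch, where '256'
is just an arbitrary buffer size:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int max = 256;	/* arbitrary buffer size for this sketch */
	struct kvm_cpuid2 *c;

	if (kvm < 0) {
		perror("/dev/kvm");
		return 1;
	}

	c = calloc(1, sizeof(*c) + max * sizeof(c->entries[0]));
	c->nent = max;

	if (ioctl(kvm, KVM_GET_SUPPORTED_CPUID, c) < 0) {
		perror("KVM_GET_SUPPORTED_CPUID");
		return 1;
	}
	printf("KVM advertises %u CPUID entries\n", c->nent);
	return 0;
}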

Dave

> > > KVM_MAX_CPUID_ENTRIES(80) is reported to be hit. Last time it was raised
> > > from '40' in 2010. We can, of course, just bump it a little bit to fix
> > > the immediate issue but the report made me wonder why we need to pre-
> > > allocate vcpu->arch.cpuid_entries array instead of sizing it dynamically.
> > > This RFC is intended to feed my curiosity.
> > >
> > > Very mildly tested with selftests/kvm-unit-tests and nothing seems to
> > > break. I also don't have access to the system where the original issue
> > > was reported but chances we're fixing it are very good IMO as just the
> > > second patch alone was reported to be sufficient.
> > >
> > > Reported-by: Dr. David Alan Gilbert <[email protected]>
> >
> > Oh nice, I was just going to bump the magic number :-)
> >
> > Anyway, this seems to work for me, so:
> >
> > Tested-by: Dr. David Alan Gilbert <[email protected]>
> >
>
> I tested on two platforms and the patches worked fine. So no objection on the
> design.
>
> Tested-by: Wei Huang <[email protected]>
>
> > > Vitaly Kuznetsov (2):
> > > KVM: x86: allocate vcpu->arch.cpuid_entries dynamically
> > > KVM: x86: bump KVM_MAX_CPUID_ENTRIES
> > >
> > > arch/x86/include/asm/kvm_host.h | 4 +--
> > > arch/x86/kvm/cpuid.c | 55 ++++++++++++++++++++++++---------
> > > arch/x86/kvm/x86.c | 1 +
> > > 3 files changed, 43 insertions(+), 17 deletions(-)
> > >
> > > --
> > > 2.25.4
> > >
> > --
> > Dr. David Alan Gilbert / [email protected] / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2020-09-16 19:27:21

by Wei Huang

Subject: Re: [PATCH RFC 0/2] KVM: x86: allow for more CPUID entries

On 09/16 09:33, Dr. David Alan Gilbert wrote:
> * Wei Huang ([email protected]) wrote:
> > On 09/15 05:51, Dr. David Alan Gilbert wrote:
> > > * Vitaly Kuznetsov ([email protected]) wrote:
> > > > With QEMU and newer AMD CPUs (namely: Epyc 'Rome') the current limit for
> >
> > Could you elaborate on this limit? On Rome, I counted ~35 CPUID functions which
> > include Fn0000_xxxx, Fn4000_xxxx and Fn8000_xxxx.
>
> On my 7302P the output of:
> cpuid -1 -r | wc -l
>
> is 61, there is one line of header in there.
>
> However in a guest I see more; and I think that's because KVM tends to
> list the CPUID entries for a lot of disabled Intel features, even on
> AMD, e.g. 0x11-0x1f which AMD doesn't have, are listed in a KVM guest.
> Then you add the KVM CPUIDs at 4...0 and 4....1.
>

It is indeed a mixed bag. Some are added even though the AMD CPU doesn't
define them. BTW I also believe the cpuid command lists more CPUIDs than the
real value of cpuid->nent in kvm_vcpu_ioctl_set_cpuid(2).

Anyway I don't have objection to this patchset.

> IMHO we should be filtering those out for at least two reasons:
> a) They're wrong
> b) We're probably not keeping the set of visible CPUID fields the same
> when we move between host kernels, and that can't be good for
> migration.
>
> Still, those are separate problems.
>
> Dave
>
> > > > KVM_MAX_CPUID_ENTRIES(80) is reported to be hit. Last time it was raised
> > > > from '40' in 2010. We can, of course, just bump it a little bit to fix
> > > > the immediate issue but the report made me wonder why we need to pre-
> > > > allocate vcpu->arch.cpuid_entries array instead of sizing it dynamically.
> > > > This RFC is intended to feed my curiosity.
> > > >
> > > > Very mildly tested with selftests/kvm-unit-tests and nothing seems to
> > > > break. I also don't have access to the system where the original issue
> > > > was reported but chances we're fixing it are very good IMO as just the
> > > > second patch alone was reported to be sufficient.
> > > >
> > > > Reported-by: Dr. David Alan Gilbert <[email protected]>
> > >
> > > Oh nice, I was just going to bump the magic number :-)
> > >
> > > Anyway, this seems to work for me, so:
> > >
> > > Tested-by: Dr. David Alan Gilbert <[email protected]>
> > >
> >
> > I tested on two platforms and the patches worked fine. So no objection on the
> > design.
> >
> > Tested-by: Wei Huang <[email protected]>
> >
> > > > Vitaly Kuznetsov (2):
> > > > KVM: x86: allocate vcpu->arch.cpuid_entries dynamically
> > > > KVM: x86: bump KVM_MAX_CPUID_ENTRIES
> > > >
> > > > arch/x86/include/asm/kvm_host.h | 4 +--
> > > > arch/x86/kvm/cpuid.c | 55 ++++++++++++++++++++++++---------
> > > > arch/x86/kvm/x86.c | 1 +
> > > > 3 files changed, 43 insertions(+), 17 deletions(-)
> > > >
> > > > --
> > > > 2.25.4
> > > >
> > > --
> > > Dr. David Alan Gilbert / [email protected] / Manchester, UK
> > >
> >
> --
> Dr. David Alan Gilbert / [email protected] / Manchester, UK
>

2020-09-18 02:43:58

by Sean Christopherson

Subject: Re: [PATCH RFC 1/2] KVM: x86: allocate vcpu->arch.cpuid_entries dynamically

On Tue, Sep 15, 2020 at 05:43:05PM +0200, Vitaly Kuznetsov wrote:
> The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80)
> is reported to be insufficient but before we bump it let's switch to
> allocating vcpu->arch.cpuid_entries dynamically. Currenly,

Currently,

> 'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is
> 3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch'
> but having it pre-allocated (for all vCPUs which we also pre-allocate)
> gives us no benefits.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---

...

> @@ -241,18 +253,31 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
> struct kvm_cpuid2 *cpuid,
> struct kvm_cpuid_entry2 __user *entries)
> {
> + struct kvm_cpuid_entry2 *cpuid_entries2 = NULL;
> int r;
>
> r = -E2BIG;
> if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
> goto out;
> r = -EFAULT;
> - if (copy_from_user(&vcpu->arch.cpuid_entries, entries,
> - cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
> - goto out;
> +
> + if (cpuid->nent) {
> + cpuid_entries2 = vmemdup_user(entries,
> + array_size(sizeof(cpuid_entries2[0]),
> + cpuid->nent));

Any objection to using something super short like "e2" instead of cpuid_entries2
so that this can squeeze on a single line, or at least be a little more sane?

> + if (IS_ERR(cpuid_entries2)) {
> + r = PTR_ERR(cpuid_entries2);
> + goto out;

Don't suppose you'd want to opportunistically kill off these gotos?

> + }
> + }
> + kvfree(vcpu->arch.cpuid_entries);

This is a bit odd. The previous vcpu->arch.cpuid_entries is preserved on
allocation failure, but not on kvm_check_cpuid() failure. Being destructive
on the "check" failure was always a bit odd, but it really stands out now.

Given that kvm_check_cpuid() now only does an actual check and not a big
pile of updates, what if we refactored the guts of kvm_find_cpuid_entry()
into yet another helper so that kvm_check_cpuid() could check the input
before crushing vcpu->arch.cpuid_entries?

if (cpuid->nent) {
e2 = vmemdup_user(entries, array_size(sizeof(e2[0]), cpuid->nent));
if (IS_ERR(e2))
return PTR_ERR(e2);

r = kvm_check_cpuid(e2, cpuid->nent);
if (r)
return r;
}

vcpu->arch.cpuid_entries = e2;
vcpu->arch.cpuid_nent = cpuid->nent;
return 0;

> + vcpu->arch.cpuid_entries = cpuid_entries2;
> vcpu->arch.cpuid_nent = cpuid->nent;
> +
> r = kvm_check_cpuid(vcpu);
> if (r) {
> + kvfree(vcpu->arch.cpuid_entries);
> + vcpu->arch.cpuid_entries = NULL;
> vcpu->arch.cpuid_nent = 0;
> goto out;
> }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1994602a0851..42259a6ec1d8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9610,6 +9610,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> kvm_mmu_destroy(vcpu);
> srcu_read_unlock(&vcpu->kvm->srcu, idx);
> free_page((unsigned long)vcpu->arch.pio_data);
> + kvfree(vcpu->arch.cpuid_entries);
> if (!lapic_in_kernel(vcpu))
> static_key_slow_dec(&kvm_no_apic_vcpu);
> }
> --
> 2.25.4
>

2020-10-01 10:05:53

by Vitaly Kuznetsov

Subject: Re: [PATCH RFC 1/2] KVM: x86: allocate vcpu->arch.cpuid_entries dynamically

Sean Christopherson <[email protected]> writes:

> On Tue, Sep 15, 2020 at 05:43:05PM +0200, Vitaly Kuznetsov wrote:
>> The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80)
>> is reported to be insufficient but before we bump it let's switch to
>> allocating vcpu->arch.cpuid_entries dynamically. Currenly,
>
> Currently,
>
>> 'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is
>> 3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch'
>> but having it pre-allocated (for all vCPUs which we also pre-allocate)
>> gives us no benefits.
>>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>
> ...
>
>> @@ -241,18 +253,31 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
>> struct kvm_cpuid2 *cpuid,
>> struct kvm_cpuid_entry2 __user *entries)
>> {
>> + struct kvm_cpuid_entry2 *cpuid_entries2 = NULL;
>> int r;
>>
>> r = -E2BIG;
>> if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
>> goto out;
>> r = -EFAULT;
>> - if (copy_from_user(&vcpu->arch.cpuid_entries, entries,
>> - cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
>> - goto out;
>> +
>> + if (cpuid->nent) {
>> + cpuid_entries2 = vmemdup_user(entries,
>> + array_size(sizeof(cpuid_entries2[0]),
>> + cpuid->nent));
>
> Any objection to using something super short like "e2" instead of cpuid_entries2
> so that this can squeeze on a single line, or at least be a little more sane?
>
>> + if (IS_ERR(cpuid_entries2)) {
>> + r = PTR_ERR(cpuid_entries2);
>> + goto out;
>
> Don't suppose you'd want to opportunistically kill off these gotos?
>
>> + }
>> + }
>> + kvfree(vcpu->arch.cpuid_entries);
>
> This is a bit odd. The previous vcpu->arch.cpuid_entries is preserved on
> allocation failure, but not on kvm_check_cpuid() failure. Being destructive
> on the "check" failure was always a bit odd, but it really stands out now.
>
> Given that kvm_check_cpuid() now only does an actual check and not a big
> pile of updates, what if we refactored the guts of kvm_find_cpuid_entry()
> into yet another helper so that kvm_check_cpuid() could check the input
> before crushing vcpu->arch.cpuid_entries?
>
> if (cpuid->nent) {
> e2 = vmemdup_user(entries, array_size(sizeof(e2[0]), cpuid->nent));
> if (IS_ERR(e2))
> return PTR_ERR(e2);
>
> r = kvm_check_cpuid(e2, cpuid->nent);
> if (r)
> return r;
> }
>
> vcpu->arch.cpuid_entries = e2;
> vcpu->arch.cpuid_nent = cpuid->nent;
> return 0;
>

(I somehow missed this and forgot about the whole thing).

Makes sense to me, will do in v1.

Overall, it seems nobody is against the general idea, so I'll prepare
v1.

Thanks!

>> + vcpu->arch.cpuid_entries = cpuid_entries2;
>> vcpu->arch.cpuid_nent = cpuid->nent;
>> +
>> r = kvm_check_cpuid(vcpu);
>> if (r) {
>> + kvfree(vcpu->arch.cpuid_entries);
>> + vcpu->arch.cpuid_entries = NULL;
>> vcpu->arch.cpuid_nent = 0;
>> goto out;
>> }
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 1994602a0851..42259a6ec1d8 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -9610,6 +9610,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>> kvm_mmu_destroy(vcpu);
>> srcu_read_unlock(&vcpu->kvm->srcu, idx);
>> free_page((unsigned long)vcpu->arch.pio_data);
>> + kvfree(vcpu->arch.cpuid_entries);
>> if (!lapic_in_kernel(vcpu))
>> static_key_slow_dec(&kvm_no_apic_vcpu);
>> }
>> --
>> 2.25.4
>>
>

--
Vitaly