2021-07-01 15:42:34

by Juergen Gross

Subject: [PATCH 0/6] x86/kvm: add boot parameters for max vcpu configs

In order to be able to support even huge numbers of vcpus per guest
with a single kernel, some arrays should be sized dynamically.

The easiest way to do that is to add boot parameters for the maximum
number of vcpus and the highest supported vcpu-id, overriding the
normal defaults.

This patch series does that for x86. The same scheme can easily be
adapted to other architectures, but I don't want to do that in the
first iteration.

In the long term I'd suggest a per-guest setting of the two
parameters, allowing some memory to be saved for smaller guests. OTOH
this would require new ioctl()s and corresponding qemu modifications,
so I have left those out for now.

I've tested that the series does not break normal guest operation and
that the new parameters are effective on x86. For Arm64 I did a
compile test only.
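
As an example (values purely illustrative), a host could then be booted
with

    kvm.max_vcpus=1024 kvm.max_vcpu_id=4095

allowing guests with up to 1024 vcpus and vcpu-ids up to 4095, at the
cost of slightly higher per-guest memory consumption.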

Juergen Gross (6):
x86/kvm: fix vcpu-id indexed array sizes
x86/kvm: remove non-x86 stuff from arch/x86/kvm/ioapic.h
x86/kvm: add boot parameter for maximum vcpu-id
x86/kvm: introduce per cpu vcpu masks
kvm: allocate vcpu pointer array separately
x86/kvm: add boot parameter for setting max number of vcpus per guest

.../admin-guide/kernel-parameters.txt | 18 +++++++
arch/arm64/kvm/arm.c | 28 +++++++++--
arch/x86/include/asm/kvm_host.h | 22 ++++++---
arch/x86/kvm/hyperv.c | 25 +++++++---
arch/x86/kvm/ioapic.c | 14 +++++-
arch/x86/kvm/ioapic.h | 8 +--
arch/x86/kvm/irq_comm.c | 9 +++-
arch/x86/kvm/x86.c | 49 ++++++++++++++++++-
include/linux/kvm_host.h | 17 ++++++-
9 files changed, 160 insertions(+), 30 deletions(-)

--
2.26.2


2021-07-01 15:42:38

by Juergen Gross

Subject: [PATCH 1/6] x86/kvm: fix vcpu-id indexed array sizes

KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, not the number
of vcpu-ids. Fix the arrays indexed by vcpu-id to have
KVM_MAX_VCPU_ID + 1 elements.

Note that this is currently not a real problem, as KVM_MAX_VCPU_ID is
an odd number, so there is always enough padding available at the end
of those arrays.

Nevertheless this should be fixed in order to avoid rare problems in
case someone uses an even number for KVM_MAX_VCPU_ID.
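
To illustrate (sketch only, using the current KVM_MAX_VCPU_ID of 1023):
valid vcpu-ids are 0..1023, i.e. 1024 distinct values, so any array
indexed by vcpu-id needs KVM_MAX_VCPU_ID + 1 entries:

	u8 vectors[KVM_MAX_VCPU_ID + 1];	/* valid indices: 0 .. KVM_MAX_VCPU_ID */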

Signed-off-by: Juergen Gross <[email protected]>
---
arch/x86/kvm/ioapic.c | 2 +-
arch/x86/kvm/ioapic.h | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 698969e18fe3..ff005fe738a4 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -96,7 +96,7 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
static void rtc_irq_eoi_tracking_reset(struct kvm_ioapic *ioapic)
{
ioapic->rtc_status.pending_eoi = 0;
- bitmap_zero(ioapic->rtc_status.dest_map.map, KVM_MAX_VCPU_ID);
+ bitmap_zero(ioapic->rtc_status.dest_map.map, KVM_MAX_VCPU_ID + 1);
}

static void kvm_rtc_eoi_tracking_restore_all(struct kvm_ioapic *ioapic);
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 660401700075..11e4065e1617 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -43,13 +43,13 @@ struct kvm_vcpu;

struct dest_map {
/* vcpu bitmap where IRQ has been sent */
- DECLARE_BITMAP(map, KVM_MAX_VCPU_ID);
+ DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);

/*
* Vector sent to a given vcpu, only valid when
* the vcpu's bit in map is set
*/
- u8 vectors[KVM_MAX_VCPU_ID];
+ u8 vectors[KVM_MAX_VCPU_ID + 1];
};


--
2.26.2

2021-07-01 15:43:53

by Juergen Gross

Subject: [PATCH 4/6] x86/kvm: introduce per cpu vcpu masks

In order to support high vcpu numbers per guest, don't use on-stack
vcpu bitmasks. As none of the currently used bitmasks are in functions
subject to recursion, it is fairly easy to replace them with percpu
bitmasks.

Disable preemption while such a bitmask is being used in order to
avoid the same mask being used twice in case we'd be switched to
another cpu.
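
The resulting usage pattern (sketch, mirroring the hunks below) is:

	preempt_disable();
	mask = this_cpu_ptr(kvm_pcpu_vcpu_mask);
	/* fill and consume the mask; no sleeping in this section */
	preempt_enable();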

Signed-off-by: Juergen Gross <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 7 +++++++
arch/x86/kvm/hyperv.c | 25 ++++++++++++++++++-------
arch/x86/kvm/irq_comm.c | 9 +++++++--
arch/x86/kvm/x86.c | 20 +++++++++++++++++++-
4 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 88b1ff898fb9..79138c91f83d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1514,6 +1514,13 @@ extern u64 kvm_default_tsc_scaling_ratio;
extern bool kvm_has_bus_lock_exit;
/* maximum vcpu-id */
extern unsigned int max_vcpu_id;
+/* per cpu vcpu bitmasks (disable preemption during usage) */
+extern unsigned long __percpu *kvm_pcpu_vcpu_mask;
+#define KVM_VCPU_MASK_SZ \
+ (sizeof(*kvm_pcpu_vcpu_mask) * BITS_TO_LONGS(KVM_MAX_VCPUS))
+extern u64 __percpu *kvm_hv_vp_bitmap;
+#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
+#define KVM_HV_VPMAP_SZ (sizeof(u64) * KVM_HV_MAX_SPARSE_VCPU_SET_BITS)

extern u64 kvm_mce_cap_supported;

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index f00830e5202f..32d31a7334fa 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -40,7 +40,7 @@
/* "Hv#1" signature */
#define HYPERV_CPUID_SIGNATURE_EAX 0x31237648

-#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
+u64 __percpu *kvm_hv_vp_bitmap;

static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer,
bool vcpu_kick);
@@ -1612,8 +1612,7 @@ static __always_inline unsigned long *sparse_set_to_vcpu_mask(
struct kvm_vcpu *vcpu;
int i, bank, sbank = 0;

- memset(vp_bitmap, 0,
- KVM_HV_MAX_SPARSE_VCPU_SET_BITS * sizeof(*vp_bitmap));
+ memset(vp_bitmap, 0, KVM_HV_VPMAP_SZ);
for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
KVM_HV_MAX_SPARSE_VCPU_SET_BITS)
vp_bitmap[bank] = sparse_banks[sbank++];
@@ -1637,8 +1636,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, u64 ingpa, u16 rep_cnt, bool
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
struct hv_tlb_flush_ex flush_ex;
struct hv_tlb_flush flush;
- u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
- DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
+ u64 *vp_bitmap;
+ unsigned long *vcpu_bitmap;
unsigned long *vcpu_mask;
u64 valid_bank_mask;
u64 sparse_banks[64];
@@ -1696,6 +1695,10 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, u64 ingpa, u16 rep_cnt, bool

cpumask_clear(&hv_vcpu->tlb_flush);

+ preempt_disable();
+ vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
+ vp_bitmap = this_cpu_ptr(kvm_hv_vp_bitmap);
+
vcpu_mask = all_cpus ? NULL :
sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
vp_bitmap, vcpu_bitmap);
@@ -1707,6 +1710,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, u64 ingpa, u16 rep_cnt, bool
kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH,
NULL, vcpu_mask, &hv_vcpu->tlb_flush);

+ preempt_enable();
+
ret_success:
/* We always do full TLB flush, set rep_done = rep_cnt. */
return (u64)HV_STATUS_SUCCESS |
@@ -1738,8 +1743,8 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, u64 ingpa, u64 outgpa,
struct kvm *kvm = vcpu->kvm;
struct hv_send_ipi_ex send_ipi_ex;
struct hv_send_ipi send_ipi;
- u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
- DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
+ u64 *vp_bitmap;
+ unsigned long *vcpu_bitmap;
unsigned long *vcpu_mask;
unsigned long valid_bank_mask;
u64 sparse_banks[64];
@@ -1796,12 +1801,18 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, u64 ingpa, u64 outgpa,
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
return HV_STATUS_INVALID_HYPERCALL_INPUT;

+ preempt_disable();
+ vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
+ vp_bitmap = this_cpu_ptr(kvm_hv_vp_bitmap);
+
vcpu_mask = all_cpus ? NULL :
sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
vp_bitmap, vcpu_bitmap);

kvm_send_ipi_to_many(kvm, vector, vcpu_mask);

+ preempt_enable();
+
ret_success:
return HV_STATUS_SUCCESS;
}
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index d5b72a08e566..be4424ddcd8f 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -47,7 +47,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
{
int i, r = -1;
struct kvm_vcpu *vcpu, *lowest = NULL;
- unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
+ unsigned long *dest_vcpu_bitmap;
unsigned int dest_vcpus = 0;

if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
@@ -59,7 +59,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
irq->delivery_mode = APIC_DM_FIXED;
}

- memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
+ preempt_disable();
+ dest_vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
+
+ memset(dest_vcpu_bitmap, 0, KVM_VCPU_MASK_SZ);

kvm_for_each_vcpu(i, vcpu, kvm) {
if (!kvm_apic_present(vcpu))
@@ -93,6 +96,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
lowest = kvm_get_vcpu(kvm, idx);
}

+ preempt_enable();
+
if (lowest)
r = kvm_apic_set_irq(lowest, irq, dest_map);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0390d90fd360..3af398ef1fc9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -180,6 +180,8 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
unsigned int __read_mostly max_vcpu_id = KVM_DEFAULT_MAX_VCPU_ID;
module_param(max_vcpu_id, uint, S_IRUGO);

+unsigned long __percpu *kvm_pcpu_vcpu_mask;
+
/*
* Restoring the host value for MSRs that are only consumed when running in
* usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
@@ -10646,9 +10648,18 @@ int kvm_arch_hardware_setup(void *opaque)
if (boot_cpu_has(X86_FEATURE_XSAVES))
rdmsrl(MSR_IA32_XSS, host_xss);

+ kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
+ sizeof(unsigned long));
+ kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64));
+
+ if (!kvm_pcpu_vcpu_mask || !kvm_hv_vp_bitmap) {
+ r = -ENOMEM;
+ goto err;
+ }
+
r = ops->hardware_setup();
if (r != 0)
- return r;
+ goto err;

memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
kvm_ops_static_call_update();
@@ -10676,11 +10687,18 @@ int kvm_arch_hardware_setup(void *opaque)

kvm_init_msr_list();
return 0;
+
+ err:
+ free_percpu(kvm_pcpu_vcpu_mask);
+ free_percpu(kvm_hv_vp_bitmap);
+ return r;
}

void kvm_arch_hardware_unsetup(void)
{
static_call(kvm_x86_hardware_unsetup)();
+ free_percpu(kvm_pcpu_vcpu_mask);
+ free_percpu(kvm_hv_vp_bitmap);
}

int kvm_arch_check_processor_compat(void *opaque)
--
2.26.2

2021-07-01 15:44:40

by Juergen Gross

Subject: [PATCH 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest

Today the maximum number of vcpus of a kvm guest is set via a #define
in a header file.

In order to support higher vcpu numbers for guests without generally
increasing the memory consumption of guests on the host, especially on
very large systems, add a boot parameter for specifying the maximum
number of vcpus allowed per guest.

The default will still be the current setting of 288. The special
value 0 limits the number of possible vcpus to the number of possible
cpus of the host.
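
E.g. booting with kvm.max_vcpus=1024 (illustrative value) raises the
limit to 1024 vcpus per guest, while kvm.max_vcpus=0 makes the limit
follow num_possible_cpus() of the host.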

Signed-off-by: Juergen Gross <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
arch/x86/include/asm/kvm_host.h | 5 ++++-
arch/x86/kvm/x86.c | 7 +++++++
3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 99bfa53a2bbd..8eb856396ffa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2373,6 +2373,16 @@
guest can't have more vcpus than the set value + 1.
Default: 1023

+ kvm.max_vcpus= [KVM,X86] Set the maximum allowed number of vcpus per
+ guest. The special value 0 sets the limit to the number
+ of physical cpus possible on the host (including not
+ yet hotplugged cpus). Higher values will result in
+ slightly higher memory consumption per guest. Depending
+ on the value and the virtual topology the maximum
+ allowed vcpu-id might need to be raised, too (see
+ kvm.max_vcpu_id parameter).
+ Default: 288
+
l1tf= [X86] Control mitigation of the L1TF vulnerability on
affected CPUs

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 39cbc4b6bffb..65ae82a5d444 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -37,7 +37,8 @@

#define __KVM_HAVE_ARCH_VCPU_DEBUGFS

-#define KVM_MAX_VCPUS 288
+#define KVM_DEFAULT_MAX_VCPUS 288
+#define KVM_MAX_VCPUS max_vcpus
#define KVM_SOFT_MAX_VCPUS 240
#define KVM_DEFAULT_MAX_VCPU_ID 1023
#define KVM_MAX_VCPU_ID max_vcpu_id
@@ -1509,6 +1510,8 @@ extern u64 kvm_max_tsc_scaling_ratio;
extern u64 kvm_default_tsc_scaling_ratio;
/* bus lock detection supported? */
extern bool kvm_has_bus_lock_exit;
+/* maximum number of vcpus per guest */
+extern unsigned int max_vcpus;
/* maximum vcpu-id */
extern unsigned int max_vcpu_id;
/* per cpu vcpu bitmasks (disable preemption during usage) */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9b0bb2221ea..888c4507504d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -177,6 +177,10 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
int __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);

+unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS;
+module_param(max_vcpus, uint, S_IRUGO);
+EXPORT_SYMBOL_GPL(max_vcpus);
+
unsigned int __read_mostly max_vcpu_id = KVM_DEFAULT_MAX_VCPU_ID;
module_param(max_vcpu_id, uint, S_IRUGO);

@@ -10648,6 +10652,9 @@ int kvm_arch_hardware_setup(void *opaque)
if (boot_cpu_has(X86_FEATURE_XSAVES))
rdmsrl(MSR_IA32_XSS, host_xss);

+ if (max_vcpus == 0)
+ max_vcpus = num_possible_cpus();
+
kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
sizeof(unsigned long));
kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64));
--
2.26.2

2021-07-01 15:45:48

by Juergen Gross

Subject: [PATCH 3/6] x86/kvm: add boot parameter for maximum vcpu-id

Today the maximum vcpu-id of a kvm guest on x86 systems is set via a
#define in a header file.

In order to support higher vcpu-ids without generally increasing the
memory consumption of guests on the host (some guest structures contain
arrays sized by KVM_MAX_VCPU_ID), add a boot parameter for selecting the
maximum vcpu-id. By default it will still be the current value of
1023, but it can be set manually to higher or lower values.

This requires the arrays sized by KVM_MAX_VCPU_ID to be allocated
dynamically.

Signed-off-by: Juergen Gross <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
arch/x86/include/asm/kvm_host.h | 5 ++++-
arch/x86/kvm/ioapic.c | 12 +++++++++++-
arch/x86/kvm/ioapic.h | 4 ++--
arch/x86/kvm/x86.c | 3 +++
5 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..99bfa53a2bbd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2365,6 +2365,14 @@
feature (tagged TLBs) on capable Intel chips.
Default is 1 (enabled)

+ kvm.max_vcpu_id=
+ [KVM,X86] Set the maximum allowed vcpu-id of a guest.
+ Some memory structure sizes depend on this value, so it
+ shouldn't be set too high. Note that each vcpu of a
+ guest needs to have a unique vcpu-id, so a single
+ guest can't have more vcpus than the set value + 1.
+ Default: 1023
+
l1tf= [X86] Control mitigation of the L1TF vulnerability on
affected CPUs

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9c7ced0e3171..88b1ff898fb9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -39,7 +39,8 @@

#define KVM_MAX_VCPUS 288
#define KVM_SOFT_MAX_VCPUS 240
-#define KVM_MAX_VCPU_ID 1023
+#define KVM_DEFAULT_MAX_VCPU_ID 1023
+#define KVM_MAX_VCPU_ID max_vcpu_id
/* memory slots that are not exposed to userspace */
#define KVM_PRIVATE_MEM_SLOTS 3

@@ -1511,6 +1512,8 @@ extern u64 kvm_max_tsc_scaling_ratio;
extern u64 kvm_default_tsc_scaling_ratio;
/* bus lock detection supported? */
extern bool kvm_has_bus_lock_exit;
+/* maximum vcpu-id */
+extern unsigned int max_vcpu_id;

extern u64 kvm_mce_cap_supported;

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index ff005fe738a4..52e8ea90c914 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
int kvm_ioapic_init(struct kvm *kvm)
{
struct kvm_ioapic *ioapic;
+ size_t sz;
int ret;

- ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT);
+ sz = sizeof(struct kvm_ioapic) +
+ sizeof(*ioapic->rtc_status.dest_map.map) *
+ BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1) +
+ sizeof(*ioapic->rtc_status.dest_map.vectors) *
+ (KVM_MAX_VCPU_ID + 1);
+ ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT);
if (!ioapic)
return -ENOMEM;
+ ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1);
+ ioapic->rtc_status.dest_map.vectors =
+ (void *)(ioapic->rtc_status.dest_map.map +
+ BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1));
spin_lock_init(&ioapic->lock);
INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
kvm->arch.vioapic = ioapic;
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index bbd4a5d18b5d..623a3c5afad7 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -39,13 +39,13 @@ struct kvm_vcpu;

struct dest_map {
/* vcpu bitmap where IRQ has been sent */
- DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);
+ unsigned long *map;

/*
* Vector sent to a given vcpu, only valid when
* the vcpu's bit in map is set
*/
- u8 vectors[KVM_MAX_VCPU_ID + 1];
+ u8 *vectors;
};


diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0f4a46649d7..0390d90fd360 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -177,6 +177,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
int __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);

+unsigned int __read_mostly max_vcpu_id = KVM_DEFAULT_MAX_VCPU_ID;
+module_param(max_vcpu_id, uint, S_IRUGO);
+
/*
* Restoring the host value for MSRs that are only consumed when running in
* usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
--
2.26.2

2021-07-01 15:45:54

by Juergen Gross

Subject: [PATCH 5/6] kvm: allocate vcpu pointer array separately

Prepare for supporting very large vcpu numbers per guest by moving the
vcpu pointer array out of struct kvm.

Signed-off-by: Juergen Gross <[email protected]>
---
arch/arm64/kvm/arm.c | 28 ++++++++++++++++++++++++----
arch/x86/include/asm/kvm_host.h | 5 +----
arch/x86/kvm/x86.c | 19 +++++++++++++++++++
include/linux/kvm_host.h | 17 +++++++++++++++--
4 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e720148232a0..4f055408fe9f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -280,18 +280,38 @@ long kvm_arch_dev_ioctl(struct file *filp,

struct kvm *kvm_arch_alloc_vm(void)
{
+ struct kvm *kvm;
+
if (!has_vhe())
- return kzalloc(sizeof(struct kvm), GFP_KERNEL);
+ kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+ else
+ kvm = vzalloc(sizeof(struct kvm));

- return vzalloc(sizeof(struct kvm));
+ if (!kvm)
+ return NULL;
+
+ if (!has_vhe())
+ kvm->vcpus = kcalloc(KVM_MAX_VCPUS, sizeof(void *), GFP_KERNEL);
+ else
+ kvm->vcpus = vzalloc(KVM_MAX_VCPUS * sizeof(void *));
+
+ if (!kvm->vcpus) {
+ kvm_arch_free_vm(kvm);
+ kvm = NULL;
+ }
+
+ return kvm;
}

void kvm_arch_free_vm(struct kvm *kvm)
{
- if (!has_vhe())
+ if (!has_vhe()) {
+ kfree(kvm->vcpus);
kfree(kvm);
- else
+ } else {
+ vfree(kvm->vcpus);
vfree(kvm);
+ }
}

int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 79138c91f83d..39cbc4b6bffb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1440,10 +1440,7 @@ static inline void kvm_ops_static_call_update(void)
}

#define __KVM_HAVE_ARCH_VM_ALLOC
-static inline struct kvm *kvm_arch_alloc_vm(void)
-{
- return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
-}
+struct kvm *kvm_arch_alloc_vm(void);
void kvm_arch_free_vm(struct kvm *kvm);

#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3af398ef1fc9..a9b0bb2221ea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10741,9 +10741,28 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
static_call(kvm_x86_sched_in)(vcpu, cpu);
}

+struct kvm *kvm_arch_alloc_vm(void)
+{
+ struct kvm *kvm;
+
+ kvm = __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (!kvm)
+ return NULL;
+
+ kvm->vcpus = __vmalloc(KVM_MAX_VCPUS * sizeof(void *),
+ GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (!kvm->vcpus) {
+ vfree(kvm);
+ kvm = NULL;
+ }
+
+ return kvm;
+}
+
void kvm_arch_free_vm(struct kvm *kvm)
{
kfree(to_kvm_hv(kvm)->hv_pa_pg);
+ vfree(kvm->vcpus);
vfree(kvm);
}

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8583ed3ff344..e424ef1078a1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -525,7 +525,7 @@ struct kvm {
struct mutex slots_lock;
struct mm_struct *mm; /* userspace tied to this vm */
struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
- struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+ struct kvm_vcpu **vcpus;

/*
* created_vcpus is protected by kvm->lock, and is incremented
@@ -1022,11 +1022,24 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm);
*/
static inline struct kvm *kvm_arch_alloc_vm(void)
{
- return kzalloc(sizeof(struct kvm), GFP_KERNEL);
+ struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+
+ if (!kvm)
+ return NULL;
+
+ kvm->vcpus = kcalloc(KVM_MAX_VCPUS, sizeof(void *), GFP_KERNEL);
+ if (!kvm->vcpus) {
+ kfree(kvm);
+ kvm = NULL;
+ }
+
+ return kvm;
}

static inline void kvm_arch_free_vm(struct kvm *kvm)
{
+ if (kvm)
+ kfree(kvm->vcpus);
kfree(kvm);
}
#endif
--
2.26.2

2021-07-14 11:16:10

by Vitaly Kuznetsov

Subject: Re: [PATCH 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest

Juergen Gross <[email protected]> writes:

> Today the maximum number of vcpus of a kvm guest is set via a #define
> in a header file.
>
> In order to support higher vcpu numbers for guests without generally
> increasing the memory consumption of guests on the host especially on
> very large systems add a boot parameter for specifying the number of
> allowed vcpus for guests.
>
> The default will still be the current setting of 288. The value 0 has
> the special meaning to limit the number of possible vcpus to the
> number of possible cpus of the host.
>
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
> arch/x86/include/asm/kvm_host.h | 5 ++++-
> arch/x86/kvm/x86.c | 7 +++++++
> 3 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 99bfa53a2bbd..8eb856396ffa 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2373,6 +2373,16 @@
> guest can't have more vcpus than the set value + 1.
> Default: 1023
>
> + kvm.max_vcpus= [KVM,X86] Set the maximum allowed numbers of vcpus per
> + guest. The special value 0 sets the limit to the number
> + of physical cpus possible on the host (including not
> + yet hotplugged cpus). Higher values will result in
> + slightly higher memory consumption per guest. Depending
> + on the value and the virtual topology the maximum
> + allowed vcpu-id might need to be raised, too (see
> + kvm.max_vcpu_id parameter).

I'd suggest at least adding a sanity check: 'max_vcpu_id' should always
be >= 'max_vcpus'. Alternatively, we can replace 'max_vcpu_id' with, say,
'vcpu_id_to_vcpus_ratio' and set it to e.g. '4' by default.
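
A rough, untested sketch of the first option, next to the existing
parameter handling in kvm_arch_hardware_setup():

	if (max_vcpu_id < max_vcpus) {
		pr_warn("kvm: raising max_vcpu_id (%u) to max_vcpus (%u)\n",
			max_vcpu_id, max_vcpus);
		max_vcpu_id = max_vcpus;
	}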

> + Default: 288
> +
> l1tf= [X86] Control mitigation of the L1TF vulnerability on
> affected CPUs
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 39cbc4b6bffb..65ae82a5d444 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -37,7 +37,8 @@
>
> #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>
> -#define KVM_MAX_VCPUS 288
> +#define KVM_DEFAULT_MAX_VCPUS 288
> +#define KVM_MAX_VCPUS max_vcpus
> #define KVM_SOFT_MAX_VCPUS 240
> #define KVM_DEFAULT_MAX_VCPU_ID 1023
> #define KVM_MAX_VCPU_ID max_vcpu_id
> @@ -1509,6 +1510,8 @@ extern u64 kvm_max_tsc_scaling_ratio;
> extern u64 kvm_default_tsc_scaling_ratio;
> /* bus lock detection supported? */
> extern bool kvm_has_bus_lock_exit;
> +/* maximum number of vcpus per guest */
> +extern unsigned int max_vcpus;
> /* maximum vcpu-id */
> extern unsigned int max_vcpu_id;
> /* per cpu vcpu bitmasks (disable preemption during usage) */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a9b0bb2221ea..888c4507504d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -177,6 +177,10 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
> int __read_mostly pi_inject_timer = -1;
> module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>
> +unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS;
> +module_param(max_vcpus, uint, S_IRUGO);
> +EXPORT_SYMBOL_GPL(max_vcpus);
> +
> unsigned int __read_mostly max_vcpu_id = KVM_DEFAULT_MAX_VCPU_ID;
> module_param(max_vcpu_id, uint, S_IRUGO);
>
> @@ -10648,6 +10652,9 @@ int kvm_arch_hardware_setup(void *opaque)
> if (boot_cpu_has(X86_FEATURE_XSAVES))
> rdmsrl(MSR_IA32_XSS, host_xss);
>
> + if (max_vcpus == 0)
> + max_vcpus = num_possible_cpus();

Is this special case really needed? I mean 'max_vcpus' is not '0' by
default so whoever sets it manually probably knows how big his guests
are going to be anyway and it is not always obvious how many CPUs are
reported by 'num_possible_cpus()' (ACPI tables can be weird for example).

> +
> kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
> sizeof(unsigned long));
> kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64));

--
Vitaly

2021-07-14 11:25:42

by Juergen Gross

Subject: Re: [PATCH 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest

On 14.07.21 13:15, Vitaly Kuznetsov wrote:
> Juergen Gross <[email protected]> writes:
>
>> Today the maximum number of vcpus of a kvm guest is set via a #define
>> in a header file.
>>
>> In order to support higher vcpu numbers for guests without generally
>> increasing the memory consumption of guests on the host especially on
>> very large systems add a boot parameter for specifying the number of
>> allowed vcpus for guests.
>>
>> The default will still be the current setting of 288. The value 0 has
>> the special meaning to limit the number of possible vcpus to the
>> number of possible cpus of the host.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
>> arch/x86/include/asm/kvm_host.h | 5 ++++-
>> arch/x86/kvm/x86.c | 7 +++++++
>> 3 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 99bfa53a2bbd..8eb856396ffa 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2373,6 +2373,16 @@
>> guest can't have more vcpus than the set value + 1.
>> Default: 1023
>>
>> + kvm.max_vcpus= [KVM,X86] Set the maximum allowed numbers of vcpus per
>> + guest. The special value 0 sets the limit to the number
>> + of physical cpus possible on the host (including not
>> + yet hotplugged cpus). Higher values will result in
>> + slightly higher memory consumption per guest. Depending
>> + on the value and the virtual topology the maximum
>> + allowed vcpu-id might need to be raised, too (see
>> + kvm.max_vcpu_id parameter).
>
> I'd suggest to at least add a sanity check: 'max_vcpu_id' should always
> be >= 'max_vcpus'. Alternatively, we can replace 'max_vcpu_id' with say
> 'vcpu_id_to_vcpus_ratio' and set it to e.g. '4' by default.

Either would be fine with me.

A default of '2' for the ratio would seem more appropriate to me,
however. A thread count per core not being a power of 2 is quite
unlikely, and the worst case scenario for cores per socket would be
2^n + 1.

>
>> + Default: 288
>> +
>> l1tf= [X86] Control mitigation of the L1TF vulnerability on
>> affected CPUs
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 39cbc4b6bffb..65ae82a5d444 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -37,7 +37,8 @@
>>
>> #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>>
>> -#define KVM_MAX_VCPUS 288
>> +#define KVM_DEFAULT_MAX_VCPUS 288
>> +#define KVM_MAX_VCPUS max_vcpus
>> #define KVM_SOFT_MAX_VCPUS 240
>> #define KVM_DEFAULT_MAX_VCPU_ID 1023
>> #define KVM_MAX_VCPU_ID max_vcpu_id
>> @@ -1509,6 +1510,8 @@ extern u64 kvm_max_tsc_scaling_ratio;
>> extern u64 kvm_default_tsc_scaling_ratio;
>> /* bus lock detection supported? */
>> extern bool kvm_has_bus_lock_exit;
>> +/* maximum number of vcpus per guest */
>> +extern unsigned int max_vcpus;
>> /* maximum vcpu-id */
>> extern unsigned int max_vcpu_id;
>> /* per cpu vcpu bitmasks (disable preemption during usage) */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index a9b0bb2221ea..888c4507504d 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -177,6 +177,10 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
>> int __read_mostly pi_inject_timer = -1;
>> module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>>
>> +unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS;
>> +module_param(max_vcpus, uint, S_IRUGO);
>> +EXPORT_SYMBOL_GPL(max_vcpus);
>> +
>> unsigned int __read_mostly max_vcpu_id = KVM_DEFAULT_MAX_VCPU_ID;
>> module_param(max_vcpu_id, uint, S_IRUGO);
>>
>> @@ -10648,6 +10652,9 @@ int kvm_arch_hardware_setup(void *opaque)
>> if (boot_cpu_has(X86_FEATURE_XSAVES))
>> rdmsrl(MSR_IA32_XSS, host_xss);
>>
>> + if (max_vcpus == 0)
>> + max_vcpus = num_possible_cpus();
>
> Is this special case really needed? I mean 'max_vcpus' is not '0' by
> default so whoever sets it manually probably knows how big his guests
> are going to be anyway and it is not always obvious how many CPUs are
> reported by 'num_possible_cpus()' (ACPI tables can be weird for example).

The idea was to make it easy for anyone managing a large fleet of hosts
and wanting to have a common setting for all of them.

It would even be possible to use '0' as the default (probably via config
option only).

>
>> +
>> kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
>> sizeof(unsigned long));
>> kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64));
>

Thanks for the feedback,


Juergen



2021-07-14 11:49:33

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest

Juergen Gross <[email protected]> writes:

> On 14.07.21 13:15, Vitaly Kuznetsov wrote:
>> Juergen Gross <[email protected]> writes:
>>
>>> Today the maximum number of vcpus of a kvm guest is set via a #define
>>> in a header file.
>>>
>>> In order to support higher vcpu numbers for guests without generally
>>> increasing the memory consumption of guests on the host especially on
>>> very large systems add a boot parameter for specifying the number of
>>> allowed vcpus for guests.
>>>
>>> The default will still be the current setting of 288. The value 0 has
>>> the special meaning to limit the number of possible vcpus to the
>>> number of possible cpus of the host.
>>>
>>> Signed-off-by: Juergen Gross <[email protected]>
>>> ---
>>> Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
>>> arch/x86/include/asm/kvm_host.h | 5 ++++-
>>> arch/x86/kvm/x86.c | 7 +++++++
>>> 3 files changed, 21 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index 99bfa53a2bbd..8eb856396ffa 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -2373,6 +2373,16 @@
>>> guest can't have more vcpus than the set value + 1.
>>> Default: 1023
>>>
>>> + kvm.max_vcpus= [KVM,X86] Set the maximum allowed numbers of vcpus per
>>> + guest. The special value 0 sets the limit to the number
>>> + of physical cpus possible on the host (including not
>>> + yet hotplugged cpus). Higher values will result in
>>> + slightly higher memory consumption per guest. Depending
>>> + on the value and the virtual topology the maximum
>>> + allowed vcpu-id might need to be raised, too (see
>>> + kvm.max_vcpu_id parameter).
>>
>> I'd suggest to at least add a sanity check: 'max_vcpu_id' should always
>> be >= 'max_vcpus'. Alternatively, we can replace 'max_vcpu_id' with say
>> 'vcpu_id_to_vcpus_ratio' and set it to e.g. '4' by default.
>
> Either would be fine with me.
>
> A default of '2' for the ratio would seem more appropriate for me,
> however. A thread count per core not being a power of 2 is quite
> unlikely, and the worst case scenario for cores per socket would be
> 2^n + 1.
>

(I vaguely recall AMD EPYC having more than the thread id (package id?)
encapsulated in the APIC id.)

Personally, I'd vote for introducing a 'ratio' parameter then so
generally users will only have to set 'kvm.max_vcpus'.

>>
>>> + Default: 288
>>> +
>>> l1tf= [X86] Control mitigation of the L1TF vulnerability on
>>> affected CPUs
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 39cbc4b6bffb..65ae82a5d444 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -37,7 +37,8 @@
>>>
>>> #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>>>
>>> -#define KVM_MAX_VCPUS 288
>>> +#define KVM_DEFAULT_MAX_VCPUS 288
>>> +#define KVM_MAX_VCPUS max_vcpus
>>> #define KVM_SOFT_MAX_VCPUS 240
>>> #define KVM_DEFAULT_MAX_VCPU_ID 1023
>>> #define KVM_MAX_VCPU_ID max_vcpu_id
>>> @@ -1509,6 +1510,8 @@ extern u64 kvm_max_tsc_scaling_ratio;
>>> extern u64 kvm_default_tsc_scaling_ratio;
>>> /* bus lock detection supported? */
>>> extern bool kvm_has_bus_lock_exit;
>>> +/* maximum number of vcpus per guest */
>>> +extern unsigned int max_vcpus;
>>> /* maximum vcpu-id */
>>> extern unsigned int max_vcpu_id;
>>> /* per cpu vcpu bitmasks (disable preemption during usage) */
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index a9b0bb2221ea..888c4507504d 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -177,6 +177,10 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
>>> int __read_mostly pi_inject_timer = -1;
>>> module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>>>
>>> +unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS;
>>> +module_param(max_vcpus, uint, S_IRUGO);
>>> +EXPORT_SYMBOL_GPL(max_vcpus);
>>> +
>>> unsigned int __read_mostly max_vcpu_id = KVM_DEFAULT_MAX_VCPU_ID;
>>> module_param(max_vcpu_id, uint, S_IRUGO);
>>>
>>> @@ -10648,6 +10652,9 @@ int kvm_arch_hardware_setup(void *opaque)
>>> if (boot_cpu_has(X86_FEATURE_XSAVES))
>>> rdmsrl(MSR_IA32_XSS, host_xss);
>>>
>>> + if (max_vcpus == 0)
>>> + max_vcpus = num_possible_cpus();
>>
>> Is this special case really needed? I mean 'max_vcpus' is not '0' by
>> default so whoever sets it manually probably knows how big his guests
>> are going to be anyway and it is not always obvious how many CPUs are
>> reported by 'num_possible_cpus()' (ACPI tables can be weird for example).
>
> The idea was to make it easy for anyone managing a large fleet of hosts
> and wanting to have a common setting for all of them.
>

I see. It seems to be uncommon indeed to run guests with more vCPUs than
host pCPUs so everything >= num_online_cpus() should be OK. My only
concern about num_possible_cpus() is that it is going to be hard to
explain what 'possible CPUs' means (but whoever cares that much about
wasting memory can always set the required value manually).

> It would even be possible to use '0' as the default (probably via config
> option only).
>
>>
>>> +
>>> kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
>>> sizeof(unsigned long));
>>> kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64));
>>

--
Vitaly

2021-07-14 13:06:47

by Juergen Gross

Subject: Re: [PATCH 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest

On 14.07.21 13:45, Vitaly Kuznetsov wrote:
> Juergen Gross <[email protected]> writes:
>
>> On 14.07.21 13:15, Vitaly Kuznetsov wrote:
>>> Juergen Gross <[email protected]> writes:
>>>
>>>> Today the maximum number of vcpus of a kvm guest is set via a #define
>>>> in a header file.
>>>>
>>>> In order to support higher vcpu numbers for guests without generally
>>>> increasing the memory consumption of guests on the host especially on
>>>> very large systems add a boot parameter for specifying the number of
>>>> allowed vcpus for guests.
>>>>
>>>> The default will still be the current setting of 288. The value 0 has
>>>> the special meaning to limit the number of possible vcpus to the
>>>> number of possible cpus of the host.
>>>>
>>>> Signed-off-by: Juergen Gross <[email protected]>
>>>> ---
>>>> Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
>>>> arch/x86/include/asm/kvm_host.h | 5 ++++-
>>>> arch/x86/kvm/x86.c | 7 +++++++
>>>> 3 files changed, 21 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>> index 99bfa53a2bbd..8eb856396ffa 100644
>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>> @@ -2373,6 +2373,16 @@
>>>> guest can't have more vcpus than the set value + 1.
>>>> Default: 1023
>>>>
>>>> + kvm.max_vcpus= [KVM,X86] Set the maximum allowed numbers of vcpus per
>>>> + guest. The special value 0 sets the limit to the number
>>>> + of physical cpus possible on the host (including not
>>>> + yet hotplugged cpus). Higher values will result in
>>>> + slightly higher memory consumption per guest. Depending
>>>> + on the value and the virtual topology the maximum
>>>> + allowed vcpu-id might need to be raised, too (see
>>>> + kvm.max_vcpu_id parameter).
>>>
>>> I'd suggest to at least add a sanity check: 'max_vcpu_id' should always
>>> be >= 'max_vcpus'. Alternatively, we can replace 'max_vcpu_id' with say
>>> 'vcpu_id_to_vcpus_ratio' and set it to e.g. '4' by default.
>>
>> Either would be fine with me.
>>
>> A default of '2' for the ratio would seem more appropriate for me,
>> however. A thread count per core not being a power of 2 is quite
>> unlikely, and the worst case scenario for cores per socket would be
>> 2^n + 1.
>>
>
> (I vaguely recall AMD EPYC had more than thread id (package id?)
> encapsulated into APIC id).

Ah, yes, that rings a bell.

> Personally, I'd vote for introducing a 'ratio' parameter then so
> generally users will only have to set 'kvm.max_vcpus'.

Okay.

Default '4' then? Or '2 ^ (topology_levels - 2)' (assuming
topology_levels of 3 on Intel: thread/core/socket, and 4 on EPYC:
thread/core/package/socket)?

>
>>>
>>>> + Default: 288
>>>> +
>>>> l1tf= [X86] Control mitigation of the L1TF vulnerability on
>>>> affected CPUs
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 39cbc4b6bffb..65ae82a5d444 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -37,7 +37,8 @@
>>>>
>>>> #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>>>>
>>>> -#define KVM_MAX_VCPUS 288
>>>> +#define KVM_DEFAULT_MAX_VCPUS 288
>>>> +#define KVM_MAX_VCPUS max_vcpus
>>>> #define KVM_SOFT_MAX_VCPUS 240
>>>> #define KVM_DEFAULT_MAX_VCPU_ID 1023
>>>> #define KVM_MAX_VCPU_ID max_vcpu_id
>>>> @@ -1509,6 +1510,8 @@ extern u64 kvm_max_tsc_scaling_ratio;
>>>> extern u64 kvm_default_tsc_scaling_ratio;
>>>> /* bus lock detection supported? */
>>>> extern bool kvm_has_bus_lock_exit;
>>>> +/* maximum number of vcpus per guest */
>>>> +extern unsigned int max_vcpus;
>>>> /* maximum vcpu-id */
>>>> extern unsigned int max_vcpu_id;
>>>> /* per cpu vcpu bitmasks (disable preemption during usage) */
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index a9b0bb2221ea..888c4507504d 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -177,6 +177,10 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
>>>> int __read_mostly pi_inject_timer = -1;
>>>> module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>>>>
>>>> +unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS;
>>>> +module_param(max_vcpus, uint, S_IRUGO);
>>>> +EXPORT_SYMBOL_GPL(max_vcpus);
>>>> +
>>>> unsigned int __read_mostly max_vcpu_id = KVM_DEFAULT_MAX_VCPU_ID;
>>>> module_param(max_vcpu_id, uint, S_IRUGO);
>>>>
>>>> @@ -10648,6 +10652,9 @@ int kvm_arch_hardware_setup(void *opaque)
>>>> if (boot_cpu_has(X86_FEATURE_XSAVES))
>>>> rdmsrl(MSR_IA32_XSS, host_xss);
>>>>
>>>> + if (max_vcpus == 0)
>>>> + max_vcpus = num_possible_cpus();
>>>
>>> Is this special case really needed? I mean 'max_vcpus' is not '0' by
>>> default so whoever sets it manually probably knows how big his guests
>>> are going to be anyway and it is not always obvious how many CPUs are
>>> reported by 'num_possible_cpus()' (ACPI tables can be weird for example).
>>
>> The idea was to make it easy for anyone managing a large fleet of hosts
>> and wanting to have a common setting for all of them.
>>
>
> I see. It seems to be uncommon indeed to run guests with more vCPUs than
> host pCPUs so everything >= num_online_cpus() should be OK. My only
> concern about num_possible_cpus() is that it is going to be hard to
> explain what 'possible CPUs' mean (but whoever cares that much about
> wasting memory can always set the required value manually).

Right.


Juergen



2021-07-14 13:23:42

by Vitaly Kuznetsov

Subject: Re: [PATCH 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest

Juergen Gross <[email protected]> writes:

> On 14.07.21 13:45, Vitaly Kuznetsov wrote:
>
>> Personally, I'd vote for introducing a 'ratio' parameter then so
>> generally users will only have to set 'kvm.max_vcpus'.
>
> Okay.
>
> Default '4' then? Or '2 ^ (topology_levels - 2)' (assuming a
> topology_level of 3 on Intel: thread/core/socket and 4 on EPYC:
> thread/core/package/socket).

I'd suggest we default to '4' for both Intel and AMD, as we haven't given
up completely on cross-vendor VMs (running AMD VMs on Intel CPUs and
vice versa). It would be great to leave a comment explaining where the
number comes from, of course.

--
Vitaly

2021-07-26 13:34:32

by Paolo Bonzini

Subject: Re: [PATCH 4/6] x86/kvm: introduce per cpu vcpu masks

On 01/07/21 17:41, Juergen Gross wrote:
> In order to support high vcpu numbers per guest don't use on stack
> vcpu bitmasks. As all those currently used bitmasks are not used in
> functions subject to recursion it is fairly easy to replace them with
> percpu bitmasks.
>
> Disable preemption while such a bitmask is being used in order to
> avoid double usage in case we'd switch cpus.
>
> Signed-off-by: Juergen Gross <[email protected]>

Please use a local_lock instead of disabling preemption.

Paolo

> ---
> arch/x86/include/asm/kvm_host.h | 7 +++++++
> arch/x86/kvm/hyperv.c | 25 ++++++++++++++++++-------
> arch/x86/kvm/irq_comm.c | 9 +++++++--
> arch/x86/kvm/x86.c | 20 +++++++++++++++++++-
> 4 files changed, 51 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 88b1ff898fb9..79138c91f83d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1514,6 +1514,13 @@ extern u64 kvm_default_tsc_scaling_ratio;
> extern bool kvm_has_bus_lock_exit;
> /* maximum vcpu-id */
> extern unsigned int max_vcpu_id;
> +/* per cpu vcpu bitmasks (disable preemption during usage) */
> +extern unsigned long __percpu *kvm_pcpu_vcpu_mask;
> +#define KVM_VCPU_MASK_SZ \
> + (sizeof(*kvm_pcpu_vcpu_mask) * BITS_TO_LONGS(KVM_MAX_VCPUS))
> +extern u64 __percpu *kvm_hv_vp_bitmap;
> +#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
> +#define KVM_HV_VPMAP_SZ (sizeof(u64) * KVM_HV_MAX_SPARSE_VCPU_SET_BITS)
>
> extern u64 kvm_mce_cap_supported;
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index f00830e5202f..32d31a7334fa 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -40,7 +40,7 @@
> /* "Hv#1" signature */
> #define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
>
> -#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
> +u64 __percpu *kvm_hv_vp_bitmap;
>
> static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer,
> bool vcpu_kick);
> @@ -1612,8 +1612,7 @@ static __always_inline unsigned long *sparse_set_to_vcpu_mask(
> struct kvm_vcpu *vcpu;
> int i, bank, sbank = 0;
>
> - memset(vp_bitmap, 0,
> - KVM_HV_MAX_SPARSE_VCPU_SET_BITS * sizeof(*vp_bitmap));
> + memset(vp_bitmap, 0, KVM_HV_VPMAP_SZ);
> for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
> KVM_HV_MAX_SPARSE_VCPU_SET_BITS)
> vp_bitmap[bank] = sparse_banks[sbank++];
> @@ -1637,8 +1636,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, u64 ingpa, u16 rep_cnt, bool
> struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> struct hv_tlb_flush_ex flush_ex;
> struct hv_tlb_flush flush;
> - u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
> - DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
> + u64 *vp_bitmap;
> + unsigned long *vcpu_bitmap;
> unsigned long *vcpu_mask;
> u64 valid_bank_mask;
> u64 sparse_banks[64];
> @@ -1696,6 +1695,10 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, u64 ingpa, u16 rep_cnt, bool
>
> cpumask_clear(&hv_vcpu->tlb_flush);
>
> + preempt_disable();
> + vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
> + vp_bitmap = this_cpu_ptr(kvm_hv_vp_bitmap);
> +
> vcpu_mask = all_cpus ? NULL :
> sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
> vp_bitmap, vcpu_bitmap);
> @@ -1707,6 +1710,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, u64 ingpa, u16 rep_cnt, bool
> kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH,
> NULL, vcpu_mask, &hv_vcpu->tlb_flush);
>
> + preempt_enable();
> +
> ret_success:
> /* We always do full TLB flush, set rep_done = rep_cnt. */
> return (u64)HV_STATUS_SUCCESS |
> @@ -1738,8 +1743,8 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, u64 ingpa, u64 outgpa,
> struct kvm *kvm = vcpu->kvm;
> struct hv_send_ipi_ex send_ipi_ex;
> struct hv_send_ipi send_ipi;
> - u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
> - DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
> + u64 *vp_bitmap;
> + unsigned long *vcpu_bitmap;
> unsigned long *vcpu_mask;
> unsigned long valid_bank_mask;
> u64 sparse_banks[64];
> @@ -1796,12 +1801,18 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, u64 ingpa, u64 outgpa,
> if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> return HV_STATUS_INVALID_HYPERCALL_INPUT;
>
> + preempt_disable();
> + vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
> + vp_bitmap = this_cpu_ptr(kvm_hv_vp_bitmap);
> +
> vcpu_mask = all_cpus ? NULL :
> sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
> vp_bitmap, vcpu_bitmap);
>
> kvm_send_ipi_to_many(kvm, vector, vcpu_mask);
>
> + preempt_enable();
> +
> ret_success:
> return HV_STATUS_SUCCESS;
> }
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index d5b72a08e566..be4424ddcd8f 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -47,7 +47,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
> {
> int i, r = -1;
> struct kvm_vcpu *vcpu, *lowest = NULL;
> - unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
> + unsigned long *dest_vcpu_bitmap;
> unsigned int dest_vcpus = 0;
>
> if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
> @@ -59,7 +59,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
> irq->delivery_mode = APIC_DM_FIXED;
> }
>
> - memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
> + preempt_disable();
> + dest_vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
> +
> + memset(dest_vcpu_bitmap, 0, KVM_VCPU_MASK_SZ);
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> if (!kvm_apic_present(vcpu))
> @@ -93,6 +96,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
> lowest = kvm_get_vcpu(kvm, idx);
> }
>
> + preempt_enable();
> +
> if (lowest)
> r = kvm_apic_set_irq(lowest, irq, dest_map);
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0390d90fd360..3af398ef1fc9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -180,6 +180,8 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
> unsigned int __read_mostly max_vcpu_id = KVM_DEFAULT_MAX_VCPU_ID;
> module_param(max_vcpu_id, uint, S_IRUGO);
>
> +unsigned long __percpu *kvm_pcpu_vcpu_mask;
> +
> /*
> * Restoring the host value for MSRs that are only consumed when running in
> * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
> @@ -10646,9 +10648,18 @@ int kvm_arch_hardware_setup(void *opaque)
> if (boot_cpu_has(X86_FEATURE_XSAVES))
> rdmsrl(MSR_IA32_XSS, host_xss);
>
> + kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
> + sizeof(unsigned long));
> + kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64));
> +
> + if (!kvm_pcpu_vcpu_mask || !kvm_hv_vp_bitmap) {
> + r = -ENOMEM;
> + goto err;
> + }
> +
> r = ops->hardware_setup();
> if (r != 0)
> - return r;
> + goto err;
>
> memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
> kvm_ops_static_call_update();
> @@ -10676,11 +10687,18 @@ int kvm_arch_hardware_setup(void *opaque)
>
> kvm_init_msr_list();
> return 0;
> +
> + err:
> + free_percpu(kvm_pcpu_vcpu_mask);
> + free_percpu(kvm_hv_vp_bitmap);
> + return r;
> }
>
> void kvm_arch_hardware_unsetup(void)
> {
> static_call(kvm_x86_hardware_unsetup)();
> + free_percpu(kvm_pcpu_vcpu_mask);
> + free_percpu(kvm_hv_vp_bitmap);
> }
>
> int kvm_arch_check_processor_compat(void *opaque)
>

2021-07-26 13:42:48

by Juergen Gross

Subject: Re: [PATCH 4/6] x86/kvm: introduce per cpu vcpu masks

On 26.07.21 15:32, Paolo Bonzini wrote:
> On 01/07/21 17:41, Juergen Gross wrote:
>> In order to support high vcpu numbers per guest don't use on stack
>> vcpu bitmasks. As all those currently used bitmasks are not used in
>> functions subject to recursion it is fairly easy to replace them with
>> percpu bitmasks.
>>
>> Disable preemption while such a bitmask is being used in order to
>> avoid double usage in case we'd switch cpus.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>
> Please use a local_lock instead of disabling preemption.

Okay.
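
A minimal sketch of what that could look like (names illustrative,
untested):

	static DEFINE_PER_CPU(local_lock_t, kvm_pcpu_mask_lock) =
		INIT_LOCAL_LOCK(kvm_pcpu_mask_lock);

	...
	local_lock(&kvm_pcpu_mask_lock);
	mask = this_cpu_ptr(kvm_pcpu_vcpu_mask);
	/* use the mask */
	local_unlock(&kvm_pcpu_mask_lock);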


Juergen



2021-07-26 13:43:49

by Paolo Bonzini

Subject: Re: [PATCH 0/6] x86/kvm: add boot parameters for max vcpu configs

On 01/07/21 17:40, Juergen Gross wrote:
> In order to be able to have a single kernel for supporting even huge
> numbers of vcpus per guest some arrays should be sized dynamically.
>
> The easiest way to do that is to add boot parameters for the maximum
> number of vcpus and the highest supported vcpu-id overwriting the
> normal default.
>
> This patch series is doing that for x86. The same scheme can be easily
> adapted to other architectures, but I don't want to do that in the
> first iteration.
>
> In the long term I'd suggest to have a per-guest setting of the two
> parameters allowing to spare some memory for smaller guests. OTOH this
> would require new ioctl()s and respective qemu modifications, so I let
> those away for now.
>
> I've tested the series not to break normal guest operation and the new
> parameters to be effective on x86. For Arm64 I did a compile test only.
>
> Juergen Gross (6):
> x86/kvm: fix vcpu-id indexed array sizes
> x86/kvm: remove non-x86 stuff from arch/x86/kvm/ioapic.h
> x86/kvm: add boot parameter for maximum vcpu-id
> x86/kvm: introduce per cpu vcpu masks
> kvm: allocate vcpu pointer array separately
> x86/kvm: add boot parameter for setting max number of vcpus per guest
>
> .../admin-guide/kernel-parameters.txt | 18 +++++++
> arch/arm64/kvm/arm.c | 28 +++++++++--
> arch/x86/include/asm/kvm_host.h | 22 ++++++---
> arch/x86/kvm/hyperv.c | 25 +++++++---
> arch/x86/kvm/ioapic.c | 14 +++++-
> arch/x86/kvm/ioapic.h | 8 +--
> arch/x86/kvm/irq_comm.c | 9 +++-
> arch/x86/kvm/x86.c | 49 ++++++++++++++++++-
> include/linux/kvm_host.h | 17 ++++++-
> 9 files changed, 160 insertions(+), 30 deletions(-)
>

Queued patches 1-2, thanks (1 for stable too).

Paolo

2021-07-26 13:43:54

by Paolo Bonzini

Subject: Re: [PATCH 5/6] kvm: allocate vcpu pointer array separately

On 01/07/21 17:41, Juergen Gross wrote:
> {
> - if (!has_vhe())
> + if (!has_vhe()) {
> + kfree(kvm->vcpus);
> kfree(kvm);
> - else
> + } else {
> + vfree(kvm->vcpus);
> vfree(kvm);
> + }
> }
>
> int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 79138c91f83d..39cbc4b6bffb 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1440,10 +1440,7 @@ static inline void kvm_ops_static_call_update(void)
> }
>
> #define __KVM_HAVE_ARCH_VM_ALLOC
> -static inline struct kvm *kvm_arch_alloc_vm(void)
> -{
> - return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> -}
> +struct kvm *kvm_arch_alloc_vm(void);
> void kvm_arch_free_vm(struct kvm *kvm);
>
> #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3af398ef1fc9..a9b0bb2221ea 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10741,9 +10741,28 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
> static_call(kvm_x86_sched_in)(vcpu, cpu);
> }
>
> +struct kvm *kvm_arch_alloc_vm(void)
> +{
> + struct kvm *kvm;
> +
> + kvm = __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + if (!kvm)
> + return NULL;
> +
> + kvm->vcpus = __vmalloc(KVM_MAX_VCPUS * sizeof(void *),
> + GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + if (!kvm->vcpus) {
> + vfree(kvm);
> + kvm = NULL;
> + }
> +

Let's keep this cleaner:

1) use kvfree in the common version of kvm_arch_free_vm

2) split __KVM_HAVE_ARCH_VM_ALLOC and __KVM_HAVE_ARCH_VM_FREE (ARM does
not need it once kvfree is used)

3) define a __kvm_arch_free_vm version that is defined even if
!__KVM_HAVE_ARCH_VM_FREE, and which can be used on x86.
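
A rough, untested sketch of how those three points could fit together:

	/* common code, always available: */
	static inline void __kvm_arch_free_vm(struct kvm *kvm)
	{
		kvfree(kvm->vcpus);
		kvfree(kvm);
	}

	#ifndef __KVM_HAVE_ARCH_VM_FREE
	static inline void kvm_arch_free_vm(struct kvm *kvm)
	{
		__kvm_arch_free_vm(kvm);
	}
	#endif

with the x86 kvm_arch_free_vm() freeing hv_pa_pg and then calling
__kvm_arch_free_vm().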

Paolo

2021-07-26 13:49:19

by Juergen Gross

Subject: Re: [PATCH 5/6] kvm: allocate vcpu pointer array separately

On 26.07.21 15:40, Paolo Bonzini wrote:
> On 01/07/21 17:41, Juergen Gross wrote:
>>   {
>> -    if (!has_vhe())
>> +    if (!has_vhe()) {
>> +        kfree(kvm->vcpus);
>>           kfree(kvm);
>> -    else
>> +    } else {
>> +        vfree(kvm->vcpus);
>>           vfree(kvm);
>> +    }
>>   }
>>   int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
>> diff --git a/arch/x86/include/asm/kvm_host.h
>> b/arch/x86/include/asm/kvm_host.h
>> index 79138c91f83d..39cbc4b6bffb 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1440,10 +1440,7 @@ static inline void
>> kvm_ops_static_call_update(void)
>>   }
>>   #define __KVM_HAVE_ARCH_VM_ALLOC
>> -static inline struct kvm *kvm_arch_alloc_vm(void)
>> -{
>> -    return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT |
>> __GFP_ZERO);
>> -}
>> +struct kvm *kvm_arch_alloc_vm(void);
>>   void kvm_arch_free_vm(struct kvm *kvm);
>>   #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 3af398ef1fc9..a9b0bb2221ea 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10741,9 +10741,28 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu,
>> int cpu)
>>       static_call(kvm_x86_sched_in)(vcpu, cpu);
>>   }
>> +struct kvm *kvm_arch_alloc_vm(void)
>> +{
>> +    struct kvm *kvm;
>> +
>> +    kvm = __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT |
>> __GFP_ZERO);
>> +    if (!kvm)
>> +        return NULL;
>> +
>> +    kvm->vcpus = __vmalloc(KVM_MAX_VCPUS * sizeof(void *),
>> +                   GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>> +    if (!kvm->vcpus) {
>> +        vfree(kvm);
>> +        kvm = NULL;
>> +    }
>> +
>
> Let's keep this cleaner:
>
> 1) use kvfree in the common version of kvm_arch_free_vm
>
> 2) split __KVM_HAVE_ARCH_VM_ALLOC and __KVM_HAVE_ARCH_VM_FREE (ARM does
> not need it once kvfree is used)
>
> 3) define a __kvm_arch_free_vm version that is defined even if
> !__KVM_HAVE_ARCH_VM_FREE, and which can be used on x86.

Okay, will do so.


Juergen



2021-07-26 13:59:26

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 5/6] kvm: allocate vcpu pointer array separately

On 2021-07-26 14:46, Juergen Gross wrote:
> On 26.07.21 15:40, Paolo Bonzini wrote:
>> On 01/07/21 17:41, Juergen Gross wrote:
>>>   {
>>> -    if (!has_vhe())
>>> +    if (!has_vhe()) {
>>> +        kfree(kvm->vcpus);
>>>           kfree(kvm);
>>> -    else
>>> +    } else {
>>> +        vfree(kvm->vcpus);
>>>           vfree(kvm);
>>> +    }
>>>   }
>>>   int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
>>> diff --git a/arch/x86/include/asm/kvm_host.h
>>> b/arch/x86/include/asm/kvm_host.h
>>> index 79138c91f83d..39cbc4b6bffb 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -1440,10 +1440,7 @@ static inline void
>>> kvm_ops_static_call_update(void)
>>>   }
>>>   #define __KVM_HAVE_ARCH_VM_ALLOC
>>> -static inline struct kvm *kvm_arch_alloc_vm(void)
>>> -{
>>> -    return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT |
>>> __GFP_ZERO);
>>> -}
>>> +struct kvm *kvm_arch_alloc_vm(void);
>>>   void kvm_arch_free_vm(struct kvm *kvm);
>>>   #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 3af398ef1fc9..a9b0bb2221ea 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -10741,9 +10741,28 @@ void kvm_arch_sched_in(struct kvm_vcpu
>>> *vcpu, int cpu)
>>>       static_call(kvm_x86_sched_in)(vcpu, cpu);
>>>   }
>>> +struct kvm *kvm_arch_alloc_vm(void)
>>> +{
>>> +    struct kvm *kvm;
>>> +
>>> +    kvm = __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT |
>>> __GFP_ZERO);
>>> +    if (!kvm)
>>> +        return NULL;
>>> +
>>> +    kvm->vcpus = __vmalloc(KVM_MAX_VCPUS * sizeof(void *),
>>> +                   GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>>> +    if (!kvm->vcpus) {
>>> +        vfree(kvm);
>>> +        kvm = NULL;
>>> +    }
>>> +
>>
>> Let's keep this cleaner:
>>
>> 1) use kvfree in the common version of kvm_arch_free_vm
>>
>> 2) split __KVM_HAVE_ARCH_VM_ALLOC and __KVM_HAVE_ARCH_VM_FREE (ARM
>> does not need it once kvfree is used)
>>
>> 3) define a __kvm_arch_free_vm version that is defined even if
>> !__KVM_HAVE_ARCH_VM_FREE, and which can be used on x86.
>
> Okay, will do so.

I'd appreciate it if you could Cc me on the whole series, and
not just the single arm64 patch.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2021-09-03 07:03:59

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest

On 14.07.21 15:21, Vitaly Kuznetsov wrote:
> Juergen Gross <[email protected]> writes:
>
>> On 14.07.21 13:45, Vitaly Kuznetsov wrote:
>>
>>> Personally, I'd vote for introducing a 'ratio' parameter then so
>>> generally users will only have to set 'kvm.max_vcpus'.
>>
>> Okay.
>>
>> Default '4' then? Or '2 ^ (topology_levels - 2)' (assuming a
>> topology_level of 3 on Intel: thread/core/socket and 4 on EPYC:
>> thread/core/package/socket).
>
> I'd suggest we default to '4' for both Intel and AMD as we haven't given
> up completely on cross-vendor VMs (running AMD VMs on Intel CPUs and
> vice versa). It would be great to leave a comment where the number comes
> from of course.
>

Thinking more about it, I believe it would be better to make the
parameter something like "additional vcpu-id bits" with a default of
topology_levels - 2 (cross-vendor VMs are special enough that having
to specify another value explicitly in this case seems acceptable).

The reasons are:

- the ability to specify factor values that are not a power of 2 is weird
- specifying just the additional number of bits leads to compatible
  behavior (e.g. a max vcpu-id of 1023 with max_vcpus being 288 and the
  default value of 1)
- the max vcpu-id should (normally) be 2^n - 1
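
For illustration, a small sketch of how the proposed parameter could map
max_vcpus plus the additional bits to a vcpu-id limit. The names
kvm_max_vcpu_id_limit and vcpu_id_add_bits are invented for this example;
only the numbers come from the paragraph above:

    /* sketch only -- helper and variable names are made up for illustration */
    #include <linux/log2.h>

    static unsigned int vcpu_id_add_bits = 1;  /* proposed default for a 3-level topology */

    static unsigned int kvm_max_vcpu_id_limit(unsigned int max_vcpus)
    {
            /* round max_vcpus up to a power of two, then widen by the extra bits */
            return (roundup_pow_of_two(max_vcpus) << vcpu_id_add_bits) - 1;
    }

    /* kvm_max_vcpu_id_limit(288) == (512 << 1) - 1 == 1023, matching the example above */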


Juergen



2021-09-03 07:45:01

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest

Juergen Gross <[email protected]> writes:

> On 14.07.21 15:21, Vitaly Kuznetsov wrote:
>> Juergen Gross <[email protected]> writes:
>>
>>> On 14.07.21 13:45, Vitaly Kuznetsov wrote:
>>>
>>>> Personally, I'd vote for introducing a 'ratio' parameter then so
>>>> generally users will only have to set 'kvm.max_vcpus'.
>>>
>>> Okay.
>>>
>>> Default '4' then? Or '2 ^ (topology_levels - 2)' (assuming a
>>> topology_level of 3 on Intel: thread/core/socket and 4 on EPYC:
>>> thread/core/package/socket).
>>
>> I'd suggest we default to '4' for both Intel and AMD as we haven't given
>> up completely on cross-vendor VMs (running AMD VMs on Intel CPUs and
>> vice versa). It would be great to leave a comment where the number comes
>> from of course.
>>
>
> Thinking more about it, I believe it would be better to make the
> parameter something like "additional vcpu-id bits" with a default of
> topology_levels - 2 (cross-vendor VMs are special enough that having
> to specify another value explicitly in this case seems acceptable).
>
> The reasons are:
>
> - the ability to specify factor values that are not a power of 2 is weird
> - specifying just the additional number of bits leads to compatible
> behavior (e.g. a max vcpu-id of 1023 with max_vcpus being 288 and the
> default value of 1)
> - the max vcpu-id should (normally) be 2^n - 1

Sounds good to me!

Also, there's ongoing work by Eduardo (Cc) to raise the default
KVM_MAX_VCPUS number:

https://lore.kernel.org/kvm/[email protected]/

It would be great if you could unify your efforts)

--
Vitaly

2021-09-03 15:32:28

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 1/6] x86/kvm: fix vcpu-id indexed array sizes

On Thu, Jul 01, 2021 at 05:41:00PM +0200, Juergen Gross wrote:
> KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number
> of vcpu-ids. Fix array indexed by vcpu-id to have KVM_MAX_VCPU_ID+1
> elements.

I don't think that's true. kvm_vm_ioctl_create_vcpu() refuses to
create a VCPU with id==KVM_MAX_VCPU_ID.
Documentation/virt/kvm/api.rst also states that
"The vcpu id is an integer in the range [0, max_vcpu_id)."

>
> Note that this is currently no real problem, as KVM_MAX_VCPU_ID is
> an odd number, resulting in always enough padding being available at
> the end of those arrays.
>
> Nevertheless this should be fixed in order to avoid rare problems in
> case someone is using an even number for KVM_MAX_VCPU_ID.
>
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> arch/x86/kvm/ioapic.c | 2 +-
> arch/x86/kvm/ioapic.h | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 698969e18fe3..ff005fe738a4 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -96,7 +96,7 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
> static void rtc_irq_eoi_tracking_reset(struct kvm_ioapic *ioapic)
> {
> ioapic->rtc_status.pending_eoi = 0;
> - bitmap_zero(ioapic->rtc_status.dest_map.map, KVM_MAX_VCPU_ID);
> + bitmap_zero(ioapic->rtc_status.dest_map.map, KVM_MAX_VCPU_ID + 1);
> }
>
> static void kvm_rtc_eoi_tracking_restore_all(struct kvm_ioapic *ioapic);
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index 660401700075..11e4065e1617 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -43,13 +43,13 @@ struct kvm_vcpu;
>
> struct dest_map {
> /* vcpu bitmap where IRQ has been sent */
> - DECLARE_BITMAP(map, KVM_MAX_VCPU_ID);
> + DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);
>
> /*
> * Vector sent to a given vcpu, only valid when
> * the vcpu's bit in map is set
> */
> - u8 vectors[KVM_MAX_VCPU_ID];
> + u8 vectors[KVM_MAX_VCPU_ID + 1];
> };
>
>
> --
> 2.26.2
>

--
Eduardo