2018-07-03 06:24:43

by Wanpeng Li

Subject: [PATCH v3 0/6] KVM: X86: Implement PV IPIs support

Use a hypercall to send multicast IPIs with a single vmexit, instead of
one vmexit per destination in xAPIC/x2APIC physical mode and one vmexit
per cluster in x2APIC cluster mode. An Intel guest can enter x2apic
cluster mode when interrupt remapping is enabled in qemu; however, the
latest AMD EPYC still supports only xapic mode, which therefore gains
the most from PV IPIs. This patchset supports PV IPIs for VMs with up
to 128 vCPUs, which is big enough for current cloud environments;
supporting more vCPUs needs more complex logic, so this might be
extended in the future if needed.
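
For reference, the guest-visible interface added by this series boils
down to packing the target APIC IDs into two 64-bit bitmaps and issuing
a single hypercall (a sketch distilled from patches 2/6 and 5/6 below;
patch 5/6 later turns the third argument into an ICR value so that NMIs
can be expressed):

        /* one hypercall for the whole destination set instead of one
         * ICR write (and vmexit) per target CPU */
        for_each_cpu(cpu, mask) {
                apic_id = per_cpu(x86_cpu_to_apicid, cpu);
                if (apic_id < BITS_PER_LONG)
                        __set_bit(apic_id, &ipi_bitmap_low);
                else if (apic_id < 2 * BITS_PER_LONG)
                        __set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);
        }
        kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);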

Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads; the VM
has 80 vCPUs. IPI microbenchmark (https://lkml.org/lkml/2017/12/19/141):

x2apic cluster mode, vanilla

Dry-run: 0, 2392199 ns
Self-IPI: 6907514, 15027589 ns
Normal IPI: 223910476, 251301666 ns
Broadcast IPI: 0, 9282161150 ns
Broadcast lock: 0, 8812934104 ns

x2apic cluster mode, pv-ipi

Dry-run: 0, 2449341 ns
Self-IPI: 6720360, 15028732 ns
Normal IPI: 228643307, 255708477 ns
Broadcast IPI: 0, 7572293590 ns => 22% performance boost
Broadcast lock: 0, 8316124651 ns

x2apic physical mode, vanilla

Dry-run: 0, 3135933 ns
Self-IPI: 8572670, 17901757 ns
Normal IPI: 226444334, 255421709 ns
Broadcast IPI: 0, 19845070887 ns
Broadcast lock: 0, 19827383656 ns

x2apic physical mode, pv-ipi

Dry-run: 0, 2446381 ns
Self-IPI: 6788217, 15021056 ns
Normal IPI: 219454441, 249583458 ns
Broadcast IPI: 0, 7806540019 ns => 154% performance boost
Broadcast lock: 0, 9143618799 ns

v2 -> v3:
* rename ipi_mask_done to irq_restore_exit; make __send_ipi_mask return
int instead of bool
* fix build errors reported by 0day
* split patches, no functional change

v1 -> v2:
* fall back to the original apic hooks on a sparse apic id > 128 or on
any other error
* use two bitmask arguments so that one hypercall handles 128 vCPUs
* fix the KVM_FEATURE_PV_SEND_IPI doc
* document the hypercall
* fix NMI selftest failures
* fix build errors reported by 0day

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>

Wanpeng Li (6):
KVM: X86: Add kvm hypervisor init time platform setup callback
KVM: X86: Implement PV IPIs in linux guest
KVM: X86: Fallback to original apic hooks when bad happens
KVM: X86: Implement PV IPIs send hypercall
KVM: X86: Add NMI support to PV IPIs
KVM: X86: Expose PV_SEND_IPI CPUID feature bit to guest

Documentation/virtual/kvm/cpuid.txt | 4 ++
Documentation/virtual/kvm/hypercalls.txt | 6 ++
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kernel/kvm.c | 101 +++++++++++++++++++++++++++++++
arch/x86/kvm/cpuid.c | 3 +-
arch/x86/kvm/x86.c | 42 +++++++++++++
include/uapi/linux/kvm_para.h | 1 +
7 files changed, 157 insertions(+), 1 deletion(-)

--
2.7.4



2018-07-03 06:23:34

by Wanpeng Li

Subject: [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall

From: Wanpeng Li <[email protected]>

Use a hypercall to send multicast IPIs with a single vmexit, instead of
one vmexit per destination in xAPIC/x2APIC physical mode and one vmexit
per cluster in x2APIC cluster mode. An Intel guest can enter x2apic
cluster mode when interrupt remapping is enabled in qemu; however, the
latest AMD EPYC still supports only xapic mode, which therefore gains
the most from PV IPIs.

Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads; the VM
has 80 vCPUs. IPI microbenchmark (https://lkml.org/lkml/2017/12/19/141):

x2apic cluster mode, vanilla

Dry-run: 0, 2392199 ns
Self-IPI: 6907514, 15027589 ns
Normal IPI: 223910476, 251301666 ns
Broadcast IPI: 0, 9282161150 ns
Broadcast lock: 0, 8812934104 ns

x2apic cluster mode, pv-ipi

Dry-run: 0, 2449341 ns
Self-IPI: 6720360, 15028732 ns
Normal IPI: 228643307, 255708477 ns
Broadcast IPI: 0, 7572293590 ns => 22% performance boost
Broadcast lock: 0, 8316124651 ns

x2apic physical mode, vanilla

Dry-run: 0, 3135933 ns
Self-IPI: 8572670, 17901757 ns
Normal IPI: 226444334, 255421709 ns
Broadcast IPI: 0, 19845070887 ns
Broadcast lock: 0, 19827383656 ns

x2apic physical mode, pv-ipi

Dry-run: 0, 2446381 ns
Self-IPI: 6788217, 15021056 ns
Normal IPI: 219454441, 249583458 ns
Broadcast IPI: 0, 7806540019 ns => 154% performance boost
Broadcast lock: 0, 9143618799 ns

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
Documentation/virtual/kvm/hypercalls.txt | 6 ++++++
arch/x86/kvm/x86.c | 36 ++++++++++++++++++++++++++++++++
2 files changed, 42 insertions(+)

diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
index a890529..a771ee8 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -121,3 +121,9 @@ compute the CLOCK_REALTIME for its clock, at the same instant.

Returns KVM_EOPNOTSUPP if the host does not use TSC clocksource,
or if clock type is different than KVM_CLOCK_PAIRING_WALLCLOCK.
+
+6. KVM_HC_SEND_IPI
+------------------------
+Architecture: x86
+Status: active
+Purpose: Hypercall used to send IPIs.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0046aa7..46657ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6689,6 +6689,39 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
}

+/*
+ * Return 0 if successfully added and 1 if discarded.
+ */
+static int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
+ unsigned long ipi_bitmap_high, int vector)
+{
+ int i;
+ struct kvm_apic_map *map;
+ struct kvm_vcpu *vcpu;
+ struct kvm_lapic_irq irq = {
+ .delivery_mode = APIC_DM_FIXED,
+ .vector = vector,
+ };
+
+ rcu_read_lock();
+ map = rcu_dereference(kvm->arch.apic_map);
+
+ for_each_set_bit(i, &ipi_bitmap_low, BITS_PER_LONG) {
+ vcpu = map->phys_map[i]->vcpu;
+ if (!kvm_apic_set_irq(vcpu, &irq, NULL))
+ return 1;
+ }
+
+ for_each_set_bit(i, &ipi_bitmap_high, BITS_PER_LONG) {
+ vcpu = map->phys_map[i + BITS_PER_LONG]->vcpu;
+ if (!kvm_apic_set_irq(vcpu, &irq, NULL))
+ return 1;
+ }
+
+ rcu_read_unlock();
+ return 0;
+}
+
void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)
{
vcpu->arch.apicv_active = false;
@@ -6737,6 +6770,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
case KVM_HC_CLOCK_PAIRING:
ret = kvm_pv_clock_pairing(vcpu, a0, a1);
break;
+ case KVM_HC_SEND_IPI:
+ ret = kvm_pv_send_ipi(vcpu->kvm, a0, a1, a2);
+ break;
#endif
default:
ret = -KVM_ENOSYS;
--
2.7.4


2018-07-03 06:23:39

by Wanpeng Li

Subject: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs

From: Wanpeng Li <[email protected]>

The NMI delivery mode of ICR is used to deliver an NMI to the processor,
and the vector information is ignored.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kernel/kvm.c | 13 +++++++++++--
arch/x86/kvm/x86.c | 16 +++++++++++-----
2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 81be75b..b866518 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -459,7 +459,7 @@ static void __init sev_map_percpu_data(void)
#ifdef CONFIG_X86_64
static int __send_ipi_mask(const struct cpumask *mask, int vector)
{
- unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
+ unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0, icr = 0;
int cpu, apic_id, ret = 0;

if (cpumask_empty(mask))
@@ -479,7 +479,16 @@ static int __send_ipi_mask(const struct cpumask *mask, int vector)
}
}

- ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
+ switch (vector) {
+ default:
+ icr = APIC_DM_FIXED | vector;
+ break;
+ case NMI_VECTOR:
+ icr = APIC_DM_NMI;
+ break;
+ }
+
+ ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, icr);

irq_restore_exit:
local_irq_restore(flags);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 46657ef..2e4c4d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6693,15 +6693,21 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
* Return 0 if successfully added and 1 if discarded.
*/
static int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
- unsigned long ipi_bitmap_high, int vector)
+ unsigned long ipi_bitmap_high, unsigned long icr)
{
int i;
struct kvm_apic_map *map;
struct kvm_vcpu *vcpu;
- struct kvm_lapic_irq irq = {
- .delivery_mode = APIC_DM_FIXED,
- .vector = vector,
- };
+ struct kvm_lapic_irq irq = {0};
+
+ switch (icr & APIC_VECTOR_MASK) {
+ default:
+ irq.vector = icr & APIC_VECTOR_MASK;
+ break;
+ case NMI_VECTOR:
+ break;
+ }
+ irq.delivery_mode = icr & APIC_MODE_MASK;

rcu_read_lock();
map = rcu_dereference(kvm->arch.apic_map);
--
2.7.4


2018-07-03 06:23:42

by Wanpeng Li

Subject: [PATCH v3 6/6] KVM: X86: Expose PV_SEND_IPI CPUID feature bit to guest

From: Wanpeng Li <[email protected]>

Expose the PV_SEND_IPI feature bit to the guest; the guest can check
this feature bit before using paravirtualized IPIs.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
Documentation/virtual/kvm/cpuid.txt | 4 ++++
arch/x86/kvm/cpuid.c | 3 ++-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index ab022dc..97ca194 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -62,6 +62,10 @@ KVM_FEATURE_ASYNC_PF_VMEXIT || 10 || paravirtualized async PF VM exit
|| || can be enabled by setting bit 2
|| || when writing to msr 0x4b564d02
------------------------------------------------------------------------------
+KVM_FEATURE_PV_SEND_IPI || 11 || guest checks this feature bit
+ || || before using paravirtualized
+ || || send IPIs.
+------------------------------------------------------------------------------
KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
|| || per-cpu warps are expected in
|| || kvmclock.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7e042e3..7bcfa61 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -621,7 +621,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
(1 << KVM_FEATURE_PV_UNHALT) |
(1 << KVM_FEATURE_PV_TLB_FLUSH) |
- (1 << KVM_FEATURE_ASYNC_PF_VMEXIT);
+ (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
+ (1 << KVM_FEATURE_PV_SEND_IPI);

if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
--
2.7.4


2018-07-03 06:24:14

by Wanpeng Li

Subject: [PATCH v3 3/6] KVM: X86: Fallback to original apic hooks when bad happens

From: Wanpeng Li <[email protected]>

Fall back to the original apic hooks when the apic id is sparse and
larger than 128, or when kvm fails to add the pending IRQ to the lapic.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kernel/kvm.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 2fe1420..81be75b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -47,6 +47,7 @@
#include <asm/hypervisor.h>
#include <asm/kvm_guest.h>

+static struct apic orig_apic;
static int kvmapf = 1;

static int __init parse_no_kvmapf(char *arg)
@@ -456,13 +457,13 @@ static void __init sev_map_percpu_data(void)
#ifdef CONFIG_SMP

#ifdef CONFIG_X86_64
-static void __send_ipi_mask(const struct cpumask *mask, int vector)
+static int __send_ipi_mask(const struct cpumask *mask, int vector)
{
unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
- int cpu, apic_id;
+ int cpu, apic_id, ret = 0;

if (cpumask_empty(mask))
- return;
+ return 0;

local_irq_save(flags);

@@ -472,16 +473,23 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
__set_bit(apic_id, &ipi_bitmap_low);
else if (apic_id < 2 * BITS_PER_LONG)
__set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);
+ else {
+ ret = -EFAULT;
+ goto irq_restore_exit;
+ }
}

- kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
+ ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);

+irq_restore_exit:
local_irq_restore(flags);
+ return ret;
}

static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
{
- __send_ipi_mask(mask, vector);
+ if (__send_ipi_mask(mask, vector))
+ orig_apic.send_IPI_mask(mask, vector);
}

static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
@@ -493,7 +501,8 @@ static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
cpumask_copy(&new_mask, mask);
cpumask_clear_cpu(this_cpu, &new_mask);
local_mask = &new_mask;
- __send_ipi_mask(local_mask, vector);
+ if (__send_ipi_mask(local_mask, vector))
+ orig_apic.send_IPI_mask_allbutself(mask, vector);
}

static void kvm_send_ipi_allbutself(int vector)
@@ -503,7 +512,8 @@ static void kvm_send_ipi_allbutself(int vector)

static void kvm_send_ipi_all(int vector)
{
- __send_ipi_mask(cpu_online_mask, vector);
+ if (__send_ipi_mask(cpu_online_mask, vector))
+ orig_apic.send_IPI_all(vector);
}

/*
@@ -511,6 +521,8 @@ static void kvm_send_ipi_all(int vector)
*/
static void kvm_setup_pv_ipi(void)
{
+ orig_apic = *apic;
+
apic->send_IPI_mask = kvm_send_ipi_mask;
apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
--
2.7.4


2018-07-03 06:24:34

by Wanpeng Li

Subject: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

From: Wanpeng Li <[email protected]>

Implement paravirtual apic hooks to enable PV IPIs.

apic->send_IPI_mask
apic->send_IPI_mask_allbutself
apic->send_IPI_allbutself
apic->send_IPI_all

PV IPIs support VMs with up to 128 vCPUs, which is big enough for
current cloud environments; supporting more vCPUs needs more complex
logic, so this might be extended in the future if needed.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kernel/kvm.c | 70 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm_para.h | 1 +
3 files changed, 72 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 0ede697..19980ec 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -28,6 +28,7 @@
#define KVM_FEATURE_PV_UNHALT 7
#define KVM_FEATURE_PV_TLB_FLUSH 9
#define KVM_FEATURE_ASYNC_PF_VMEXIT 10
+#define KVM_FEATURE_PV_SEND_IPI 11

#define KVM_HINTS_REALTIME 0

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 591bcf2..2fe1420 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -454,6 +454,71 @@ static void __init sev_map_percpu_data(void)
}

#ifdef CONFIG_SMP
+
+#ifdef CONFIG_X86_64
+static void __send_ipi_mask(const struct cpumask *mask, int vector)
+{
+ unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
+ int cpu, apic_id;
+
+ if (cpumask_empty(mask))
+ return;
+
+ local_irq_save(flags);
+
+ for_each_cpu(cpu, mask) {
+ apic_id = per_cpu(x86_cpu_to_apicid, cpu);
+ if (apic_id < BITS_PER_LONG)
+ __set_bit(apic_id, &ipi_bitmap_low);
+ else if (apic_id < 2 * BITS_PER_LONG)
+ __set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);
+ }
+
+ kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
+
+ local_irq_restore(flags);
+}
+
+static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
+{
+ __send_ipi_mask(mask, vector);
+}
+
+static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
+{
+ unsigned int this_cpu = smp_processor_id();
+ struct cpumask new_mask;
+ const struct cpumask *local_mask;
+
+ cpumask_copy(&new_mask, mask);
+ cpumask_clear_cpu(this_cpu, &new_mask);
+ local_mask = &new_mask;
+ __send_ipi_mask(local_mask, vector);
+}
+
+static void kvm_send_ipi_allbutself(int vector)
+{
+ kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
+}
+
+static void kvm_send_ipi_all(int vector)
+{
+ __send_ipi_mask(cpu_online_mask, vector);
+}
+
+/*
+ * Set the IPI entry points
+ */
+static void kvm_setup_pv_ipi(void)
+{
+ apic->send_IPI_mask = kvm_send_ipi_mask;
+ apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
+ apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
+ apic->send_IPI_all = kvm_send_ipi_all;
+ pr_info("KVM setup pv IPIs\n");
+}
+#endif
+
static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
{
native_smp_prepare_cpus(max_cpus);
@@ -626,6 +691,11 @@ static uint32_t __init kvm_detect(void)

static void __init kvm_apic_init(void)
{
+#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
+ if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
+ num_possible_cpus() <= 2 * BITS_PER_LONG)
+ kvm_setup_pv_ipi();
+#endif
}

static void __init kvm_init_platform(void)
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index dcf629d..84f8fe3 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -26,6 +26,7 @@
#define KVM_HC_MIPS_EXIT_VM 7
#define KVM_HC_MIPS_CONSOLE_OUTPUT 8
#define KVM_HC_CLOCK_PAIRING 9
+#define KVM_HC_SEND_IPI 10

/*
* hypercalls use architecture specific
--
2.7.4


2018-07-03 06:24:49

by Wanpeng Li

Subject: [PATCH v3 1/6] KVM: X86: Add kvm hypervisor init time platform setup callback

From: Wanpeng Li <[email protected]>

Add a kvm hypervisor init-time platform setup callback which will be
used to replace the native apic hooks with paravirtual hooks.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kernel/kvm.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5b2300b..591bcf2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -624,12 +624,22 @@ static uint32_t __init kvm_detect(void)
return kvm_cpuid_base();
}

+static void __init kvm_apic_init(void)
+{
+}
+
+static void __init kvm_init_platform(void)
+{
+ x86_platform.apic_post_init = kvm_apic_init;
+}
+
const __initconst struct hypervisor_x86 x86_hyper_kvm = {
.name = "KVM",
.detect = kvm_detect,
.type = X86_HYPER_KVM,
.init.guest_late_init = kvm_guest_init,
.init.x2apic_available = kvm_para_available,
+ .init.init_platform = kvm_init_platform,
};

static __init int activate_jump_labels(void)
--
2.7.4


2018-07-18 03:01:50

by Wanpeng Li

Subject: Re: [PATCH v3 0/6] KVM: X86: Implement PV IPIs support

Gentle ping, hope this series can catch up with the next merge window. :)
On Tue, 3 Jul 2018 at 14:21, Wanpeng Li <[email protected]> wrote:
>
> Using hypercall to send IPIs by one vmexit instead of one by one for
> xAPIC/x2APIC physical mode and one vmexit per-cluster for x2APIC cluster
> mode. Intel guest can enter x2apic cluster mode when interrupt remmaping
> is enabled in qemu, however, latest AMD EPYC still just supports xapic
> mode which can get great improvement by PV IPIs. This patchset supports
> PV IPIs for maximal 128 vCPUs VM, it is big enough for cloud environment
> currently, supporting more vCPUs needs to introduce more complex logic,
> in the future this might be extended if needed.
>
> Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads, the VM
> is 80 vCPUs, IPI microbenchmark(https://lkml.org/lkml/2017/12/19/141):
>
> x2apic cluster mode, vanilla
>
> Dry-run: 0, 2392199 ns
> Self-IPI: 6907514, 15027589 ns
> Normal IPI: 223910476, 251301666 ns
> Broadcast IPI: 0, 9282161150 ns
> Broadcast lock: 0, 8812934104 ns
>
> x2apic cluster mode, pv-ipi
>
> Dry-run: 0, 2449341 ns
> Self-IPI: 6720360, 15028732 ns
> Normal IPI: 228643307, 255708477 ns
> Broadcast IPI: 0, 7572293590 ns => 22% performance boost
> Broadcast lock: 0, 8316124651 ns
>
> x2apic physical mode, vanilla
>
> Dry-run: 0, 3135933 ns
> Self-IPI: 8572670, 17901757 ns
> Normal IPI: 226444334, 255421709 ns
> Broadcast IPI: 0, 19845070887 ns
> Broadcast lock: 0, 19827383656 ns
>
> x2apic physical mode, pv-ipi
>
> Dry-run: 0, 2446381 ns
> Self-IPI: 6788217, 15021056 ns
> Normal IPI: 219454441, 249583458 ns
> Broadcast IPI: 0, 7806540019 ns => 154% performance boost
> Broadcast lock: 0, 9143618799 ns
>
> v2 -> v3:
> * rename ipi_mask_done to irq_restore_exit, __send_ipi_mask return int
> instead of bool
> * fix build errors reported by 0day
> * split patches, nothing change
>
> v1 -> v2:
> * sparse apic id > 128, or any other errors, fallback to original apic hooks
> * have two bitmask arguments so that one hypercall handles 128 vCPUs
> * fix KVM_FEATURE_PV_SEND_IPI doc
> * document hypercall
> * fix NMI selftest fails
> * fix build errors reported by 0day
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Vitaly Kuznetsov <[email protected]>
>
> Wanpeng Li (6):
> KVM: X86: Add kvm hypervisor init time platform setup callback
> KVM: X86: Implement PV IPIs in linux guest
> KVM: X86: Fallback to original apic hooks when bad happens
> KVM: X86: Implement PV IPIs send hypercall
> KVM: X86: Add NMI support to PV IPIs
> KVM: X86: Expose PV_SEND_IPI CPUID feature bit to guest
>
> Documentation/virtual/kvm/cpuid.txt | 4 ++
> Documentation/virtual/kvm/hypercalls.txt | 6 ++
> arch/x86/include/uapi/asm/kvm_para.h | 1 +
> arch/x86/kernel/kvm.c | 101 +++++++++++++++++++++++++++++++
> arch/x86/kvm/cpuid.c | 3 +-
> arch/x86/kvm/x86.c | 42 +++++++++++++
> include/uapi/linux/kvm_para.h | 1 +
> 7 files changed, 157 insertions(+), 1 deletion(-)
>
> --
> 2.7.4
>

2018-07-19 16:30:04

by Radim Krčmář

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

2018-07-03 14:21+0800, Wanpeng Li:
> From: Wanpeng Li <[email protected]>
>
> Implement paravirtual apic hooks to enable PV IPIs.
>
> apic->send_IPI_mask
> apic->send_IPI_mask_allbutself
> apic->send_IPI_allbutself
> apic->send_IPI_all
>
> The PV IPIs supports maximal 128 vCPUs VM, it is big enough for cloud
> environment currently, supporting more vCPUs needs to introduce more
> complex logic, in the future this might be extended if needed.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Vitaly Kuznetsov <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> @@ -454,6 +454,71 @@ static void __init sev_map_percpu_data(void)
> }
>
> #ifdef CONFIG_SMP
> +
> +#ifdef CONFIG_X86_64
> +static void __send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> + unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
> + int cpu, apic_id;
> +
> + if (cpumask_empty(mask))
> + return;
> +
> + local_irq_save(flags);
> +
> + for_each_cpu(cpu, mask) {
> + apic_id = per_cpu(x86_cpu_to_apicid, cpu);
> + if (apic_id < BITS_PER_LONG)
> + __set_bit(apic_id, &ipi_bitmap_low);
> + else if (apic_id < 2 * BITS_PER_LONG)
> + __set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);

It'd be nicer with 'unsigned long ipi_bitmap[2]' and a single

__set_bit(apic_id, ipi_bitmap);

> + }
> +
> + kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);

and

kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);

Still, the main problem is that we can only address 128 APICs.

A simple improvement would reuse the vector field (as we need only 8
bits) and put an 'offset' in the rest. The offset would say which
cluster of 128 we are addressing. 24 bits of offset results in 2^31
total addressable CPUs (we probably should even use that many bits).
The downside of this is that we can only address 128 at a time.

It's basically the same as x2apic cluster mode, only with 128 cluster
size instead of 16, so the code should be a straightforward port.
And because x2apic code doesn't seem to use any division by the cluster
size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
make the cluster size 192. :)
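
For illustration, the packed third argument could look something like
this on the guest side (the field split below is only an assumption to
make the idea concrete, not a proposed ABI):

        /* bits 0-7: vector; remaining bits: which block of 2*64 APIC IDs
         * the two bitmap arguments refer to (block = lowest_apic_id / 128) */
        unsigned long arg = ((unsigned long)block << 8) | (vector & 0xff);

        kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], arg);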

But because it is very similar to x2apic, I'd really need some real
performance data to see if this benefits a real workload.
Hardware could further optimize LAPIC (apicv, vapic) in the future,
which we'd lose by using paravirt.

e.g. AMD's acceleration should be superior to this when using < 8 VCPUs
as they can use logical xAPIC and send without VM exits (when all VCPUs
are running).

> +
> + local_irq_restore(flags);
> +}
> +
> +static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> + __send_ipi_mask(mask, vector);
> +}
> +
> +static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
> +{
> + unsigned int this_cpu = smp_processor_id();
> + struct cpumask new_mask;
> + const struct cpumask *local_mask;
> +
> + cpumask_copy(&new_mask, mask);
> + cpumask_clear_cpu(this_cpu, &new_mask);
> + local_mask = &new_mask;
> + __send_ipi_mask(local_mask, vector);
> +}
> +
> +static void kvm_send_ipi_allbutself(int vector)
> +{
> + kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
> +}
> +
> +static void kvm_send_ipi_all(int vector)
> +{
> + __send_ipi_mask(cpu_online_mask, vector);

These should be faster when using the native APIC shorthand -- is this
the "Broadcast" in your tests?

> +}
> +
> +/*
> + * Set the IPI entry points
> + */
> +static void kvm_setup_pv_ipi(void)
> +{
> + apic->send_IPI_mask = kvm_send_ipi_mask;
> + apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
> + apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
> + apic->send_IPI_all = kvm_send_ipi_all;
> + pr_info("KVM setup pv IPIs\n");
> +}
> +#endif
> +
> static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
> {
> native_smp_prepare_cpus(max_cpus);
> @@ -626,6 +691,11 @@ static uint32_t __init kvm_detect(void)
>
> static void __init kvm_apic_init(void)
> {
> +#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
> + if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
> + num_possible_cpus() <= 2 * BITS_PER_LONG)

It looks like num_possible_cpus() is actually NR_CPUS, so the feature
would never be used on a standard Linux distro.
And we're using the APIC_ID, which can be higher even if the maximum
CPU number is lower. Just remove the check.

2018-07-19 16:32:35

by Radim Krčmář

Subject: Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs

2018-07-03 14:21+0800, Wanpeng Li:
> From: Wanpeng Li <[email protected]>
>
> The NMI delivery mode of ICR is used to deliver an NMI to the processor,
> and the vector information is ignored.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Vitaly Kuznetsov <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> @@ -479,7 +479,16 @@ static int __send_ipi_mask(const struct cpumask *mask, int vector)
> }
> }
>
> - ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> + switch (vector) {
> + default:
> + icr = APIC_DM_FIXED | vector;
> + break;
> + case NMI_VECTOR:
> + icr = APIC_DM_NMI;

I think it would be better to say that KVM interprets NMI_VECTOR and
sends the interrupt as APIC_DM_NMI.

> + break;
> + }
> +
> + ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, icr);
>

2018-07-19 16:48:11

by Paolo Bonzini

Subject: Re: [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall

On 03/07/2018 08:21, Wanpeng Li wrote:
> +
> + rcu_read_lock();
> + map = rcu_dereference(kvm->arch.apic_map);
> +
> + for_each_set_bit(i, &ipi_bitmap_low, BITS_PER_LONG) {
> + vcpu = map->phys_map[i]->vcpu;
> + if (!kvm_apic_set_irq(vcpu, &irq, NULL))
> + return 1;
> + }
> +
> + for_each_set_bit(i, &ipi_bitmap_high, BITS_PER_LONG) {
> + vcpu = map->phys_map[i + BITS_PER_LONG]->vcpu;
> + if (!kvm_apic_set_irq(vcpu, &irq, NULL))
> + return 1;
> + }
> +

This should be the guest's BITS_PER_LONG, not the host's (i.e. you need
to pass op_64_bit from kvm_emulate_hypercall).

Thanks,

Paolo
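
A minimal sketch of how the hunk above could honor the guest word size,
assuming kvm_emulate_hypercall() passes op_64_bit down (the extra
parameter is the assumption here; the sketch also drops the RCU read
lock on the early-exit path):

        static int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
                                   unsigned long ipi_bitmap_high, int vector,
                                   bool op_64_bit)
        {
                int bits = op_64_bit ? 64 : 32; /* the guest's BITS_PER_LONG */
                struct kvm_apic_map *map;
                struct kvm_vcpu *vcpu;
                struct kvm_lapic_irq irq = {
                        .delivery_mode = APIC_DM_FIXED,
                        .vector = vector,
                };
                int i, ret = 0;

                rcu_read_lock();
                map = rcu_dereference(kvm->arch.apic_map);

                for_each_set_bit(i, &ipi_bitmap_low, bits) {
                        vcpu = map->phys_map[i]->vcpu;
                        if (!kvm_apic_set_irq(vcpu, &irq, NULL)) {
                                ret = 1;
                                goto out;
                        }
                }
                for_each_set_bit(i, &ipi_bitmap_high, bits) {
                        vcpu = map->phys_map[i + bits]->vcpu;
                        if (!kvm_apic_set_irq(vcpu, &irq, NULL)) {
                                ret = 1;
                                goto out;
                        }
                }
        out:
                rcu_read_unlock();
                return ret;
        }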

2018-07-19 16:49:22

by Paolo Bonzini

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

On 19/07/2018 18:28, Radim Krčmář wrote:
>> +
>> + kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> and
>
> kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
>
> Still, the main problem is that we can only address 128 APICs.
>
> A simple improvement would reuse the vector field (as we need only 8
> bits) and put a 'offset' in the rest. The offset would say which
> cluster of 128 are we addressing. 24 bits of offset results in 2^31
> total addressable CPUs (we probably should even use that many bits).
> The downside of this is that we can only address 128 at a time.
>
> It's basically the same as x2apic cluster mode, only with 128 cluster
> size instead of 16, so the code should be a straightforward port.
> And because x2apic code doesn't seem to use any division by the cluster
> size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> make the cluster size 192. :)

I did suggest an offset earlier in the discussion.

The main problem is that consecutive CPU ids do not map to consecutive
APIC ids. But still, we could do a hypercall whenever the total range
exceeds 64. Something like

u64 ipi_bitmap = 0;
int cpu, min = 0, max = 0;

for_each_cpu(cpu, mask) {
        if (!ipi_bitmap) {
                min = max = cpu;
        } else if (cpu < min && max - cpu < 64) {
                ipi_bitmap <<= min - cpu;
                min = cpu;
        } else if (cpu < min + 64) {
                max = cpu < max ? max : cpu;
        } else {
                /* ... send hypercall... */
                min = max = cpu;
                ipi_bitmap = 0;
        }
        __set_bit(cpu - min, (unsigned long *)&ipi_bitmap);
}
if (ipi_bitmap) {
        /* ... send hypercall... */
}

We could keep the cluster size of 128, but it would be more complicated
to do the left shift in the first "else if". If the limit is 64, you
can keep the two arguments in the hypercall, and just pass 0 as the
"high" bitmap on 64-bit kernels.

Paolo

> But because it is very similar to x2apic, I'd really need some real
> performance data to see if this benefits a real workload.
> Hardware could further optimize LAPIC (apicv, vapic) in the future,
> which we'd lose by using paravirt.
>
> e.g. AMD's acceleration should be superior to this when using < 8 VCPUs
> as they can use logical xAPIC and send without VM exits (when all VCPUs
> are running).
>


2018-07-19 17:22:54

by Radim Krčmář

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

2018-07-19 18:47+0200, Paolo Bonzini:
> On 19/07/2018 18:28, Radim Krčmář wrote:
> >> +
> >> + kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > and
> >
> > kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
> >
> > Still, the main problem is that we can only address 128 APICs.
> >
> > A simple improvement would reuse the vector field (as we need only 8
> > bits) and put a 'offset' in the rest. The offset would say which
> > cluster of 128 are we addressing. 24 bits of offset results in 2^31
> > total addressable CPUs (we probably should even use that many bits).
> > The downside of this is that we can only address 128 at a time.
> >
> > It's basically the same as x2apic cluster mode, only with 128 cluster
> > size instead of 16, so the code should be a straightforward port.
> > And because x2apic code doesn't seem to use any division by the cluster
> > size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> > make the cluster size 192. :)
>
> I did suggest an offset earlier in the discussion.
>
> The main problem is that consecutive CPU ids do not map to consecutive
> APIC ids. But still, we could do an hypercall whenever the total range
> exceeds 64. Something like

Right, the cluster x2apic implementation came with a second mapping to
make this linear time and send as few IPIs as possible:

        /* Collapse cpus in a cluster so a single IPI per cluster is sent */
        for_each_cpu(cpu, tmpmsk) {
                struct cluster_mask *cmsk = per_cpu(cluster_masks, cpu);

                dest = 0;
                for_each_cpu_and(clustercpu, tmpmsk, &cmsk->mask)
                        dest |= per_cpu(x86_cpu_to_logical_apicid, clustercpu);

                if (!dest)
                        continue;

                __x2apic_send_IPI_dest(dest, vector, apic->dest_logical);
                /* Remove cluster CPUs from tmpmask */
                cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
        }

I think that the extra memory consumption would be excusable.

2018-07-19 23:07:17

by David Matlack

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

On Mon, Jul 2, 2018 at 11:23 PM Wanpeng Li <[email protected]> wrote:
>
> From: Wanpeng Li <[email protected]>
>
> Implement paravirtual apic hooks to enable PV IPIs.

Very cool. Thanks for working on this!

>
> apic->send_IPI_mask
> apic->send_IPI_mask_allbutself
> apic->send_IPI_allbutself
> apic->send_IPI_all
>
> The PV IPIs supports maximal 128 vCPUs VM, it is big enough for cloud
> environment currently,

From the Cloud perspective, 128 vCPUs is already obsolete. GCE's
n1-ultramem-160 VMs have 160 vCPUs where the maximum APIC ID is 231.
I'd definitely prefer an approach that scales to higher APIC IDs, like
Paolo's offset idea.

To Radim's point of real world performance testing, do you know what
is the primary source of multi-target IPIs? If it's TLB shootdowns we
might get a bigger bang for our buck with a PV TLB Shootdown.

> supporting more vCPUs needs to introduce more
> complex logic, in the future this might be extended if needed.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Vitaly Kuznetsov <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> arch/x86/include/uapi/asm/kvm_para.h | 1 +
> arch/x86/kernel/kvm.c | 70 ++++++++++++++++++++++++++++++++++++
> include/uapi/linux/kvm_para.h | 1 +
> 3 files changed, 72 insertions(+)
>
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 0ede697..19980ec 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -28,6 +28,7 @@
> #define KVM_FEATURE_PV_UNHALT 7
> #define KVM_FEATURE_PV_TLB_FLUSH 9
> #define KVM_FEATURE_ASYNC_PF_VMEXIT 10
> +#define KVM_FEATURE_PV_SEND_IPI 11
>
> #define KVM_HINTS_REALTIME 0
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 591bcf2..2fe1420 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -454,6 +454,71 @@ static void __init sev_map_percpu_data(void)
> }
>
> #ifdef CONFIG_SMP
> +
> +#ifdef CONFIG_X86_64
> +static void __send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> + unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
> + int cpu, apic_id;
> +
> + if (cpumask_empty(mask))
> + return;
> +
> + local_irq_save(flags);
> +
> + for_each_cpu(cpu, mask) {
> + apic_id = per_cpu(x86_cpu_to_apicid, cpu);
> + if (apic_id < BITS_PER_LONG)
> + __set_bit(apic_id, &ipi_bitmap_low);
> + else if (apic_id < 2 * BITS_PER_LONG)
> + __set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);
> + }
> +
> + kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> +
> + local_irq_restore(flags);
> +}
> +
> +static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> + __send_ipi_mask(mask, vector);
> +}
> +
> +static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
> +{
> + unsigned int this_cpu = smp_processor_id();
> + struct cpumask new_mask;
> + const struct cpumask *local_mask;
> +
> + cpumask_copy(&new_mask, mask);
> + cpumask_clear_cpu(this_cpu, &new_mask);
> + local_mask = &new_mask;
> + __send_ipi_mask(local_mask, vector);
> +}
> +
> +static void kvm_send_ipi_allbutself(int vector)
> +{
> + kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
> +}
> +
> +static void kvm_send_ipi_all(int vector)
> +{
> + __send_ipi_mask(cpu_online_mask, vector);
> +}
> +
> +/*
> + * Set the IPI entry points
> + */
> +static void kvm_setup_pv_ipi(void)
> +{
> + apic->send_IPI_mask = kvm_send_ipi_mask;
> + apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
> + apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
> + apic->send_IPI_all = kvm_send_ipi_all;
> + pr_info("KVM setup pv IPIs\n");
> +}
> +#endif
> +
> static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
> {
> native_smp_prepare_cpus(max_cpus);
> @@ -626,6 +691,11 @@ static uint32_t __init kvm_detect(void)
>
> static void __init kvm_apic_init(void)
> {
> +#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
> + if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
> + num_possible_cpus() <= 2 * BITS_PER_LONG)
> + kvm_setup_pv_ipi();
> +#endif
> }
>
> static void __init kvm_init_platform(void)
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index dcf629d..84f8fe3 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -26,6 +26,7 @@
> #define KVM_HC_MIPS_EXIT_VM 7
> #define KVM_HC_MIPS_CONSOLE_OUTPUT 8
> #define KVM_HC_CLOCK_PAIRING 9
> +#define KVM_HC_SEND_IPI 10
>
> /*
> * hypercalls use architecture specific
> --
> 2.7.4
>

2018-07-20 03:34:22

by Wanpeng Li

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

On Fri, 20 Jul 2018 at 00:28, Radim Krčmář <[email protected]> wrote:
>
> 2018-07-03 14:21+0800, Wanpeng Li:
> > From: Wanpeng Li <[email protected]>
> >
> > Implement paravirtual apic hooks to enable PV IPIs.
> >
> > apic->send_IPI_mask
> > apic->send_IPI_mask_allbutself
> > apic->send_IPI_allbutself
> > apic->send_IPI_all
> >
> > The PV IPIs supports maximal 128 vCPUs VM, it is big enough for cloud
> > environment currently, supporting more vCPUs needs to introduce more
> > complex logic, in the future this might be extended if needed.
> >
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: Radim Krčmář <[email protected]>
> > Cc: Vitaly Kuznetsov <[email protected]>
> > Signed-off-by: Wanpeng Li <[email protected]>
> > ---
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > @@ -454,6 +454,71 @@ static void __init sev_map_percpu_data(void)
> > }
> >
> > #ifdef CONFIG_SMP
> > +
> > +#ifdef CONFIG_X86_64
> > +static void __send_ipi_mask(const struct cpumask *mask, int vector)
> > +{
> > + unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
> > + int cpu, apic_id;
> > +
> > + if (cpumask_empty(mask))
> > + return;
> > +
> > + local_irq_save(flags);
> > +
> > + for_each_cpu(cpu, mask) {
> > + apic_id = per_cpu(x86_cpu_to_apicid, cpu);
> > + if (apic_id < BITS_PER_LONG)
> > + __set_bit(apic_id, &ipi_bitmap_low);
> > + else if (apic_id < 2 * BITS_PER_LONG)
> > + __set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);
>
> It'd be nicer with 'unsigned long ipi_bitmap[2]' and a single
>
> __set_bit(apic_id, ipi_bitmap);
>
> > + }
> > +
> > + kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
>
> and
>
> kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
>
> Still, the main problem is that we can only address 128 APICs.
>
> A simple improvement would reuse the vector field (as we need only 8
> bits) and put a 'offset' in the rest. The offset would say which
> cluster of 128 are we addressing. 24 bits of offset results in 2^31
> total addressable CPUs (we probably should even use that many bits).
> The downside of this is that we can only address 128 at a time.
>
> It's basically the same as x2apic cluster mode, only with 128 cluster
> size instead of 16, so the code should be a straightforward port.
> And because x2apic code doesn't seem to use any division by the cluster
> size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> make the cluster size 192. :)
>
> But because it is very similar to x2apic, I'd really need some real
> performance data to see if this benefits a real workload.

Thanks for your review, Radim! :) I will find another real benchmark
instead of the micro one to evaluate the performance.

> Hardware could further optimize LAPIC (apicv, vapic) in the future,
> which we'd lose by using paravirt.
>
> e.g. AMD's acceleration should be superior to this when using < 8 VCPUs
> as they can use logical xAPIC and send without VM exits (when all VCPUs
> are running).
>
> > +
> > + local_irq_restore(flags);
> > +}
> > +
> > +static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
> > +{
> > + __send_ipi_mask(mask, vector);
> > +}
> > +
> > +static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
> > +{
> > + unsigned int this_cpu = smp_processor_id();
> > + struct cpumask new_mask;
> > + const struct cpumask *local_mask;
> > +
> > + cpumask_copy(&new_mask, mask);
> > + cpumask_clear_cpu(this_cpu, &new_mask);
> > + local_mask = &new_mask;
> > + __send_ipi_mask(local_mask, vector);
> > +}
> > +
> > +static void kvm_send_ipi_allbutself(int vector)
> > +{
> > + kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
> > +}
> > +
> > +static void kvm_send_ipi_all(int vector)
> > +{
> > + __send_ipi_mask(cpu_online_mask, vector);
>
> These should be faster when using the native APIC shorthand -- is this
> the "Broadcast" in your tests?

Not true: .send_IPI_all has almost no callers even though the linux
apic drivers implement this hook. In addition, the shorthand is not
used in x2apic mode (__x2apic_send_IPI_dest()), and it sees very
limited use in other scenarios according to the linux apic drivers.

>
> > +}
> > +
> > +/*
> > + * Set the IPI entry points
> > + */
> > +static void kvm_setup_pv_ipi(void)
> > +{
> > + apic->send_IPI_mask = kvm_send_ipi_mask;
> > + apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
> > + apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
> > + apic->send_IPI_all = kvm_send_ipi_all;
> > + pr_info("KVM setup pv IPIs\n");
> > +}
> > +#endif
> > +
> > static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
> > {
> > native_smp_prepare_cpus(max_cpus);
> > @@ -626,6 +691,11 @@ static uint32_t __init kvm_detect(void)
> >
> > static void __init kvm_apic_init(void)
> > {
> > +#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
> > + if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
> > + num_possible_cpus() <= 2 * BITS_PER_LONG)
>
> It looks that num_possible_cpus() is actually NR_CPUS, so the feature
> would never be used on a standard Linux distro.
> And we're using APIC_ID, which can be higher even if maximum CPU the
> number is lower. Just remove it.

Will do.

Regards,
Wanpeng Li

2018-07-20 03:35:55

by Wanpeng Li

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

On Fri, 20 Jul 2018 at 00:47, Paolo Bonzini <[email protected]> wrote:
>
> On 19/07/2018 18:28, Radim Krčmář wrote:
> >> +
> >> + kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > and
> >
> > kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
> >
> > Still, the main problem is that we can only address 128 APICs.
> >
> > A simple improvement would reuse the vector field (as we need only 8
> > bits) and put a 'offset' in the rest. The offset would say which
> > cluster of 128 are we addressing. 24 bits of offset results in 2^31
> > total addressable CPUs (we probably should even use that many bits).
> > The downside of this is that we can only address 128 at a time.
> >
> > It's basically the same as x2apic cluster mode, only with 128 cluster
> > size instead of 16, so the code should be a straightforward port.
> > And because x2apic code doesn't seem to use any division by the cluster
> > size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> > make the cluster size 192. :)
>
> I did suggest an offset earlier in the discussion.
>
> The main problem is that consecutive CPU ids do not map to consecutive
> APIC ids. But still, we could do an hypercall whenever the total range
> exceeds 64. Something like
>
> u64 ipi_bitmap = 0;
> for_each_cpu(cpu, mask)
> if (!ipi_bitmap) {
> min = max = cpu;
> } else if (cpu < min && max - cpu < 64) {
> ipi_bitmap <<= min - cpu;
> min = cpu;
> } else if (id < min + 64) {
> max = cpu < max ? max : cpu;
> } else {
> /* ... send hypercall... */
> min = max = cpu;
> ipi_bitmap = 0;
> }
> __set_bit(ipi_bitmap, cpu - min);
> }
> if (ipi_bitmap) {
> /* ... send hypercall... */
> }
>
> We could keep the cluster size of 128, but it would be more complicated
> to do the left shift in the first "else if". If the limit is 64, you
> can keep the two arguments in the hypercall, and just pass 0 as the
> "high" bitmap on 64-bit kernels.

Cool, I will try the offset method in the next version. Thanks for your
review, Paolo! :)

Regards,
Wanpeng Li

2018-07-20 03:46:24

by Wanpeng Li

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

On Fri, 20 Jul 2018 at 07:05, David Matlack <[email protected]> wrote:
>
> On Mon, Jul 2, 2018 at 11:23 PM Wanpeng Li <[email protected]> wrote:
> >
> > From: Wanpeng Li <[email protected]>
> >
> > Implement paravirtual apic hooks to enable PV IPIs.
>
> Very cool. Thanks for working on this!

Thanks David!

>
> >
> > apic->send_IPI_mask
> > apic->send_IPI_mask_allbutself
> > apic->send_IPI_allbutself
> > apic->send_IPI_all
> >
> > The PV IPIs supports maximal 128 vCPUs VM, it is big enough for cloud
> > environment currently,
>
> From the Cloud perspective, 128 vCPUs is already obsolete. GCE's
> n1-utlramem-160 VMs have 160 vCPUs where the maximum APIC ID is 231.
> I'd definitely prefer an approach that scales to higher APIC IDs, like
> Paolo's offset idea.

Ok, I will try the offset method in next version.

>
> To Radim's point of real world performance testing, do you know what
> is the primary source of multi-target IPIs? If it's TLB shootdowns we
> might get a bigger bang for our buck with a PV TLB Shootdown.

The "Function Call interrupts", there is a lot of callers for
smp_call_function_many() except TLB Shootdowns in linux kernel which
try to run a function on a set of other CPUs. TLB Shootdown still can
get benefit from PV IPIs even if PV TLB Shootdown is enabled since
IPIs should be sent to the vCPUs which are active and will incur
vmexits. PV IPIs will benefit both vCPUs overcommit and
non-overcommit(which PV TLB Shootdown can't help) scenarios. Btw,
hyperv also implements PV IPIs even if PV TLB Shootdown is present.
https://lkml.org/lkml/2018/7/3/537
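
As a purely illustrative example of such a caller, anything of this
shape ends up as a multi-target IPI and therefore goes through
.send_IPI_mask (function names here are made up):

        #include <linux/smp.h>

        static void do_remote_work(void *info)
        {
                /* runs on every target CPU, in interrupt context */
        }

        static void kick_cpus(const struct cpumask *mask)
        {
                /* natively this costs one ICR write (one vmexit) per CPU
                 * in 'mask'; with PV IPIs it collapses into one hypercall */
                preempt_disable();
                smp_call_function_many(mask, do_remote_work, NULL, 1);
                preempt_enable();
        }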

Regards,
Wanpeng Li

2018-07-20 03:50:21

by Wanpeng Li

Subject: Re: [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall

On Fri, 20 Jul 2018 at 00:47, Paolo Bonzini <[email protected]> wrote:
>
> On 03/07/2018 08:21, Wanpeng Li wrote:
> > +
> > + rcu_read_lock();
> > + map = rcu_dereference(kvm->arch.apic_map);
> > +
> > + for_each_set_bit(i, &ipi_bitmap_low, BITS_PER_LONG) {
> > + vcpu = map->phys_map[i]->vcpu;
> > + if (!kvm_apic_set_irq(vcpu, &irq, NULL))
> > + return 1;
> > + }
> > +
> > + for_each_set_bit(i, &ipi_bitmap_high, BITS_PER_LONG) {
> > + vcpu = map->phys_map[i + BITS_PER_LONG]->vcpu;
> > + if (!kvm_apic_set_irq(vcpu, &irq, NULL))
> > + return 1;
> > + }
> > +
>
> This should be the guest's BITS_PER_LONG, not the host's (i.e. you need
> to pass op_64_bit from kvm_emulate_hypercall).

Will do in next version.

Regards,
Wanpeng Li

2018-07-20 03:53:51

by Wanpeng Li

Subject: Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs

On Fri, 20 Jul 2018 at 00:31, Radim Krčmář <[email protected]> wrote:
>
> 2018-07-03 14:21+0800, Wanpeng Li:
> > From: Wanpeng Li <[email protected]>
> >
> > The NMI delivery mode of ICR is used to deliver an NMI to the processor,
> > and the vector information is ignored.
> >
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: Radim Krčmář <[email protected]>
> > Cc: Vitaly Kuznetsov <[email protected]>
> > Signed-off-by: Wanpeng Li <[email protected]>
> > ---
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > @@ -479,7 +479,16 @@ static int __send_ipi_mask(const struct cpumask *mask, int vector)
> > }
> > }
> >
> > - ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > + switch (vector) {
> > + default:
> > + icr = APIC_DM_FIXED | vector;
> > + break;
> > + case NMI_VECTOR:
> > + icr = APIC_DM_NMI;
>
> I think it would be better to say that KVM interprets NMI_VECTOR and
> sends the interrupt as APIC_DM_NMI.

Yeah, in addition, SDM 10.6.1 also mentioned that:
Delivery mode:
100 (NMI) Delivers an NMI interrupt to the target processor or
processors. The vector information is ignored.

Regards,
Wanpeng Li

2018-07-20 06:00:03

by Wanpeng Li

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

On Fri, 20 Jul 2018 at 00:47, Paolo Bonzini <[email protected]> wrote:
>
> On 19/07/2018 18:28, Radim Krčmář wrote:
> >> +
> >> + kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > and
> >
> > kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
> >
> > Still, the main problem is that we can only address 128 APICs.
> >
> > A simple improvement would reuse the vector field (as we need only 8
> > bits) and put a 'offset' in the rest. The offset would say which
> > cluster of 128 are we addressing. 24 bits of offset results in 2^31
> > total addressable CPUs (we probably should even use that many bits).
> > The downside of this is that we can only address 128 at a time.
> >
> > It's basically the same as x2apic cluster mode, only with 128 cluster
> > size instead of 16, so the code should be a straightforward port.
> > And because x2apic code doesn't seem to use any division by the cluster
> > size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> > make the cluster size 192. :)
>
> I did suggest an offset earlier in the discussion.
>
> The main problem is that consecutive CPU ids do not map to consecutive
> APIC ids. But still, we could do an hypercall whenever the total range
> exceeds 64. Something like
>
> u64 ipi_bitmap = 0;
> for_each_cpu(cpu, mask)
> if (!ipi_bitmap) {
> min = max = cpu;
> } else if (cpu < min && max - cpu < 64) {
> ipi_bitmap <<= min - cpu;
> min = cpu;
> } else if (id < min + 64) {
> max = cpu < max ? max : cpu;
> } else {
> /* ... send hypercall... */
> min = max = cpu;
> ipi_bitmap = 0;
> }
> __set_bit(ipi_bitmap, cpu - min);
> }
> if (ipi_bitmap) {
> /* ... send hypercall... */
> }
>
> We could keep the cluster size of 128, but it would be more complicated
> to do the left shift in the first "else if". If the limit is 64, you
> can keep the two arguments in the hypercall, and just pass 0 as the
> "high" bitmap on 64-bit kernels.

As David pointed out, we need to scale to higher APIC IDs. I will add
the cpu id to apic id translation in the for loop. How about calling
kvm_hypercall2(KVM_HC_SEND_IPI, ipi_bitmap, vector) directly? In
addition, why do we need to pass 0 as the "high" bitmap even for the
128 vCPU case?

Regards,
Wanpeng Li

2018-07-20 08:05:56

by Paolo Bonzini

Subject: Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs

On 20/07/2018 05:53, Wanpeng Li wrote:
>>> - ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
>>> + switch (vector) {
>>> + default:
>>> + icr = APIC_DM_FIXED | vector;
>>> + break;
>>> + case NMI_VECTOR:
>>> + icr = APIC_DM_NMI;
>> I think it would be better to say that KVM interprets NMI_VECTOR and
>> sends the interrupt as APIC_DM_NMI.

It's not KVM; this is arch/x86/kernel/kvm.c, i.e. the guest side.

Paolo

> Yeah, in addition, SDM 10.6.1 also mentioned that:
> Delivery mode:
> 100 (NMI) Delivers an NMI interrupt to the target processor or
> processors. The vector information is ignored.


2018-07-20 08:08:32

by Paolo Bonzini

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

On 20/07/2018 07:58, Wanpeng Li wrote:
>>
>> We could keep the cluster size of 128, but it would be more complicated
>> to do the left shift in the first "else if". If the limit is 64, you
>> can keep the two arguments in the hypercall, and just pass 0 as the
>> "high" bitmap on 64-bit kernels.
> As David pointed out, we need to scale to higher APIC IDs.

The offset is enough to scale to higher APIC IDs.

It's just an optimization to allow 128 CPUs per hypercall instead of 64
CPUs. But actually you can use __uint128_t on 64-bit machines, I forgot
about that. With u64 on 32-bit and __uint128_t on 64-bit, you can do 64
CPUs per hypercall on 32-bit and 128 CPUs per hypercall on 64-bit.

> I will add
> the cpu id to apic id transfer in the for loop. How about:
> kvm_hypercall2(KVM_HC_SEND_IPI, ipi_bitmap, vector); directly. In
> addition, why need to pass the 0 as the "high" bitmap even if for 128
> vCPUs case?
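
A minimal sketch of the __uint128_t variant (illustrative only; the
window/offset handling is elided and the variable names are
assumptions):

        __uint128_t ipi_bitmap = 0;
        int cpu, apic_id;

        for_each_cpu(cpu, mask) {
                apic_id = per_cpu(x86_cpu_to_apicid, cpu);
                if (apic_id < 128)  /* IDs outside the window need the offset */
                        ipi_bitmap |= (__uint128_t)1 << apic_id;
        }

        /* keep the existing pair of unsigned long arguments */
        kvm_hypercall3(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
                       (unsigned long)(ipi_bitmap >> 64), vector);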


2018-07-20 09:52:28

by Radim Krčmář

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

2018-07-20 11:33+0800, Wanpeng Li:
> On Fri, 20 Jul 2018 at 00:28, Radim Krčmář <[email protected]> wrote:
> > 2018-07-03 14:21+0800, Wanpeng Li:
> > But because it is very similar to x2apic, I'd really need some real
> > performance data to see if this benefits a real workload.
>
> Thanks for your review, Radim! :) I will find another real benchmark
> instead of the micro one to evaluate the performance.

Analyzing the cpu bitmap for every IPI request on a non-small guest (at
least 32 VCPUs, ideally >256) during various workloads could also
provide some insight regardless of workload/benchmark result -- we want
to know how many VM exits we would save.

> > > +static void kvm_send_ipi_all(int vector)
> > > +{
> > > + __send_ipi_mask(cpu_online_mask, vector);
> >
> > These should be faster when using the native APIC shorthand -- is this
> > the "Broadcast" in your tests?
>
> Not true, .send_IPI_all almost no callers though linux apic drivers
> implement this hook, in addition, shortcut is not used for x2apic
> mode(__x2apic_send_IPI_dest()), and very limited using in other
> scenarios according to linux apic drivers.

Good point,
(xAPIC is using shorthands, so I didn't expect we'd stop doing so on
x2APIC, but there was probably no need.)

thanks.

2018-07-20 10:18:32

by Wanpeng Li

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

On Fri, 20 Jul 2018 at 17:51, Radim Krcmar <[email protected]> wrote:
>
> 2018-07-20 11:33+0800, Wanpeng Li:
> > On Fri, 20 Jul 2018 at 00:28, Radim Krčmář <[email protected]> wrote:
> > > 2018-07-03 14:21+0800, Wanpeng Li:
> > > But because it is very similar to x2apic, I'd really need some real
> > > performance data to see if this benefits a real workload.
> >
> > Thanks for your review, Radim! :) I will find another real benchmark
> > instead of the micro one to evaluate the performance.
>
> Analyzing the cpu bitmap for every IPI request on a non-small guest (at
> least 32 VCPUs, ideally >256) during various workloads could also
> provide some insight regardless of workload/benchmark result -- we want
> to know how many VM exits we would save.

I will try the ebizzy benchmark once I complete the patchset w/ the
__uint128 approach which Paolo just suggested. In addition, I remember
Aliyun posted the performance numbers for their online real workload
"message oriented middleware": from 800 k/s vmexits w/o PV IPIs down to
150 k/s vmexits w/ PV IPIs.

Regards,
Wanpeng Li

2018-07-20 13:13:40

by Radim Krčmář

Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest

2018-07-20 11:45+0800, Wanpeng Li:
> On Fri, 20 Jul 2018 at 07:05, David Matlack <[email protected]> wrote:
> >
> > On Mon, Jul 2, 2018 at 11:23 PM Wanpeng Li <[email protected]> wrote:
> > >
> > > From: Wanpeng Li <[email protected]>
> > >
> > > Implement paravirtual apic hooks to enable PV IPIs.
> >
> > Very cool. Thanks for working on this!
>
> Thanks David!
>
> >
> > >
> > > apic->send_IPI_mask
> > > apic->send_IPI_mask_allbutself
> > > apic->send_IPI_allbutself
> > > apic->send_IPI_all
> > >
> > > The PV IPIs supports maximal 128 vCPUs VM, it is big enough for cloud
> > > environment currently,
> >
> > From the Cloud perspective, 128 vCPUs is already obsolete. GCE's
> > n1-utlramem-160 VMs have 160 vCPUs where the maximum APIC ID is 231.
> > I'd definitely prefer an approach that scales to higher APIC IDs, like
> > Paolo's offset idea.
>
> Ok, I will try the offset method in next version.
>
> >
> > To Radim's point of real world performance testing, do you know what
> > is the primary source of multi-target IPIs? If it's TLB shootdowns we
> > might get a bigger bang for our buck with a PV TLB Shootdown.

I assume it is the TLB shootdown by a large margin, but never profiled
it.

We could add a more complex PV TLB shootdown that does the whole TLB
invalidation inside KVM, but I don't think it would be significantly
faster, because we need to force a VM exit if the target VCPU is
running. With PV IPI, we get exit-less IPI injection and a VM exit for
TLB invalidation.

> The "Function Call interrupts", there is a lot of callers for
> smp_call_function_many() except TLB Shootdowns in linux kernel which
> try to run a function on a set of other CPUs. TLB Shootdown still can
> get benefit from PV IPIs even if PV TLB Shootdown is enabled since
> IPIs should be sent to the vCPUs which are active and will incur
> vmexits. PV IPIs will benefit both vCPUs overcommit and
> non-overcommit(which PV TLB Shootdown can't help) scenarios. Btw,
> hyperv also implements PV IPIs even if PV TLB Shootdown is present.
> https://lkml.org/lkml/2018/7/3/537

Hyper-V is in a better spot, as guests can use the hypervisor's VPIDX
for cpu bitmaps and pass it to the IPI hypercall, so there is no APIC ID
translation on either side -- the IPI hypercall seems worth it just for
the simpler logic.

Hyper-V can also address up to 4096 VCPUs in one hypercall (optimized as
a sparse 2-level 64-ary tree), so we might want to do something like
that (with a higher ceiling), as the probability of staying within 128
APIC IDs should rapidly diminish with a growing number of VCPUs.
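
For reference, the sparse set Radim mentions boils down to (bank index,
64-bit mask) pairs; a rough sketch of filling such a structure on the
guest side (the layout is illustrative, not the Hyper-V ABI verbatim):

        struct sparse_ipi_bank {
                u64 index;      /* which block of 64 APIC IDs */
                u64 mask;       /* bit i => APIC ID index * 64 + i */
        };

        static int fill_sparse_banks(const struct cpumask *mask,
                                     struct sparse_ipi_bank *banks, int max_banks)
        {
                int cpu, apic_id, i, nr = 0;

                for_each_cpu(cpu, mask) {
                        apic_id = per_cpu(x86_cpu_to_apicid, cpu);
                        for (i = 0; i < nr; i++)
                                if (banks[i].index == apic_id / 64)
                                        break;
                        if (i == nr) {          /* start a new bank */
                                if (nr == max_banks)
                                        return -1;      /* caller falls back */
                                banks[nr].index = apic_id / 64;
                                banks[nr].mask = 0;
                                nr++;
                        }
                        banks[i].mask |= 1ULL << (apic_id % 64);
                }
                return nr;      /* number of banks to pass to the hypercall */
        }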

2018-07-20 13:28:13

by Radim Krčmář

Subject: Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs

2018-07-20 10:04+0200, Paolo Bonzini:
> On 20/07/2018 05:53, Wanpeng Li wrote:
> >>> - ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> >>> + switch (vector) {
> >>> + default:
> >>> + icr = APIC_DM_FIXED | vector;
> >>> + break;
> >>> + case NMI_VECTOR:
> >>> + icr = APIC_DM_NMI;
> >> I think it would be better to say that KVM interprets NMI_VECTOR and
> >> sends the interrupt as APIC_DM_NMI.
>
> It's not KVM, this is arch/x86/kernel/kvm.c so the guest side.

Yes, I was preparing to drop the delivery mode bits as they are not
needed for NMI_VECTOR: a real APIC can't send or receive vectors 0-15,
so we could just say that NMI_VECTOR is sendable through this interface
and is delivered as APIC_DM_NMI.

I didn't realize that this allows the other delivery modes. They are
not useful [1], but we don't really have a better use for those three
bits. The only other option so far is extending the cluster index to
increase addressability ... I'm starting to reconsider, as addressing
another few million VCPUs would probably be even more useless.

Wanpeng, sorry for the confusion, feel free to use any version.
Document the hypercall layout, though,

thanks.

---
1: I think that OSes will use unicast INIT+SIPI as it's easier to call
it after preparing structures for the new CPU; SMI is pointless;
LowestPriority shouldn't be supported; and Reserved (APIC_DM_REMRD)
is reused for KVM unhalt, where unicast is most likely.

2018-07-23 00:53:24

by Wanpeng Li

Subject: Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs

On Fri, 20 Jul 2018 at 21:27, Radim Krcmar <[email protected]> wrote:
>
> 2018-07-20 10:04+0200, Paolo Bonzini:
> > On 20/07/2018 05:53, Wanpeng Li wrote:
> > >>> - ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > >>> + switch (vector) {
> > >>> + default:
> > >>> + icr = APIC_DM_FIXED | vector;
> > >>> + break;
> > >>> + case NMI_VECTOR:
> > >>> + icr = APIC_DM_NMI;
> > >> I think it would be better to say that KVM interprets NMI_VECTOR and
> > >> sends the interrupt as APIC_DM_NMI.
> >
> > It's not KVM, this is arch/x86/kernel/kvm.c so the guest side.
>
> Yes, I was preparing to drop the delivery mode bits as they are not
> needed for NMI_VECTOR: real APIC can't send nor receive vectors 0-15, so
> we could just say that NMI_VECTOR is sendable through this interface and
> is delivered as APIC_DM_NMI.
>
> I didn't realize that this allows the other delivery modes. They are
> not useful [1], but we don't really have a better use for the three
> bits. The only other option so far is extending the cluster index to
> increase addressability ... I'm starting to reconsider as addressing
> another few millions VCPUs would probably be even more useless.
>
> Wanpeng, sorry for the confusion, feel free to use any version.
> Document the hypercall layout, though,

No problem, you and Paolo always help me a lot, thanks for that! :)

Regards,
Wanpeng Li

>
> thanks.
>
> ---
> 1: I think that OSes will use unicast INIT+SIPI as it's easier to call
> it after preparing structures for the new CPU; SMI is pointless;
> LowestPriority shouldn't be supported; and Reserved (APIC_DM_REMRD)
> is reused for KVM unhalt, where unicast is most likely.