2021-10-20 12:08:54

by zhenwei pi

Subject: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi

Although the host side exposes the KVM PV SEND IPI feature to the guest
side, the guest should still have a chance to disable it.

A typical use case for this parameter:
if the host AMD server enables the AVIC feature, the flat mode of the APIC
gets better performance in the guest.
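
With this patch applied, a guest can opt out on its kernel command line,
e.g. (the rest of the cmdline below is just a placeholder):

  ... root=/dev/vda1 no-kvm-pvipi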

Signed-off-by: zhenwei pi <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 2 ++
arch/x86/kernel/kvm.c | 13 ++++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..73b8712b94b0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3495,6 +3495,8 @@
	no-kvmapf	[X86,KVM] Disable paravirtualized asynchronous page
			fault handling.

+	no-kvm-pvipi	[X86,KVM] Disable paravirtualized KVM send IPI.
+
	no-vmw-sched-clock
			[X86,PV_OPS] Disable paravirtualized VMware scheduler
			clock and use the default one.
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b656456c3a94..911f1cd2bec5 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -62,6 +62,17 @@ static int __init parse_no_stealacc(char *arg)

early_param("no-steal-acc", parse_no_stealacc);

+static int kvm_pvipi = 1;
+
+static int __init parse_no_kvm_pvipi(char *arg)
+{
+	kvm_pvipi = 0;
+
+	return 0;
+}
+
+early_param("no-kvm-pvipi", parse_no_kvm_pvipi);
+
static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64) __visible;
static int has_steal_clock = 0;
@@ -795,7 +806,7 @@ static uint32_t __init kvm_detect(void)
static void __init kvm_apic_init(void)
{
#ifdef CONFIG_SMP
-	if (pv_ipi_supported())
+	if (pv_ipi_supported() && kvm_pvipi)
		kvm_setup_pv_ipi();
#endif
}
--
2.25.1


2021-10-20 12:24:28

by Wanpeng Li

Subject: Re: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi

On Wed, 20 Oct 2021 at 20:08, zhenwei pi <[email protected]> wrote:
>
> Although the host side exposes the KVM PV SEND IPI feature to the guest
> side, the guest should still have a chance to disable it.
>
> A typical use case for this parameter:
> if the host AMD server enables the AVIC feature, the flat mode of the APIC
> gets better performance in the guest.

Hmm, I didn't find enough valuable information in your posting. We have
done a lot of evaluation on AMD before:
https://lore.kernel.org/all/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com/T/#u

Wanpeng

2021-10-20 20:16:35

by Sean Christopherson

Subject: Re: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi

On Wed, Oct 20, 2021, Wanpeng Li wrote:
> On Wed, 20 Oct 2021 at 20:08, zhenwei pi <[email protected]> wrote:
> >
> > Although the host side exposes the KVM PV SEND IPI feature to the guest
> > side, the guest should still have a chance to disable it.
> >
> > A typical use case for this parameter:
> > if the host AMD server enables the AVIC feature, the flat mode of the APIC
> > gets better performance in the guest.
>
> Hmm, I didn't find enough valuable information in your posting. We have
> done a lot of evaluation on AMD before:
> https://lore.kernel.org/all/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com/T/#u

I too would like to see numbers. I suspect the answer is going to be that
AVIC performs poorly in CPU overcommit scenarios because of the cost of managing
the tables and handling "failed delivery" exits, but that AVIC does quite well
when vCPUs are pinned 1:1 and IPIs rarely require an exit to the host.

2021-10-21 03:10:39

by zhenwei pi

Subject: Re: Re: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi


On 10/21/21 4:12 AM, Sean Christopherson wrote:
> On Wed, Oct 20, 2021, Wanpeng Li wrote:
>> On Wed, 20 Oct 2021 at 20:08, zhenwei pi <[email protected]> wrote:
>>>
>>> Although the host side exposes the KVM PV SEND IPI feature to the guest
>>> side, the guest should still have a chance to disable it.
>>>
>>> A typical use case for this parameter:
>>> if the host AMD server enables the AVIC feature, the flat mode of the APIC
>>> gets better performance in the guest.
>>
>> Hmm, I didn't find enough valuable information in your posting. We have
>> done a lot of evaluation on AMD before:
>> https://lore.kernel.org/all/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com/T/#u
>
> I too would like to see numbers. I suspect the answer is going to be that
> AVIC performs poorly in CPU overcommit scenarios because of the cost of managing
> the tables and handling "failed delivery" exits, but that AVIC does quite well
> when vCPUs are pinned 1:1 and IPIs rarely require an exit to the host.
>

Test env:
CPU: AMD EPYC 7642 48-Core Processor

Kmod args(enable avic and disable nested):
modprobe kvm-amd nested=0 avic=1 npt=1

QEMU args(disable x2apic):
... -cpu host,x2apic=off ...

Benchmark tool:
https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/apic-ipi

~# insmod apic_ipi.ko options=5 && dmesg -c

apic_ipi: 1 NUMA node(s)
apic_ipi: apic [flat]
apic_ipi: apic->send_IPI[default_send_IPI_single+0x0/0x40]
apic_ipi: apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
apic_ipi: IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
apic_ipi: total cycles 375671259, avg 3756
apic_ipi: IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
apic_ipi: total cycles 221961822, avg 2219


apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
-> This line shows that the current send_IPI_mask is kvm_send_ipi_mask (because
of the PV SEND IPI feature).

apic_ipi: IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
apic_ipi: total cycles 375671259, avg 3756
--> These lines show the average cycles per kvm_send_ipi_mask call: 3756

apic_ipi: IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
apic_ipi: total cycles 221961822, avg 2219
--> These lines show the average cycles per flat_send_IPI_mask call: 2219
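
(For reference, avg is total cycles / number of IPIs sent, presumably ~100000
iterations here given total/avg. So on this host, flat_send_IPI_mask under AVIC
saves about (3756 - 2219) / 3756 ≈ 41% of the cycles per single-target IPI.)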


--
zhenwei pi

2021-10-21 05:10:54

by Wanpeng Li

Subject: Re: Re: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi

On Thu, 21 Oct 2021 at 11:05, zhenwei pi <[email protected]> wrote:
>
>
> On 10/21/21 4:12 AM, Sean Christopherson wrote:
> > On Wed, Oct 20, 2021, Wanpeng Li wrote:
> >> On Wed, 20 Oct 2021 at 20:08, zhenwei pi <[email protected]> wrote:
> >>>
> >>> Although the host side exposes the KVM PV SEND IPI feature to the guest
> >>> side, the guest should still have a chance to disable it.
> >>>
> >>> A typical use case for this parameter:
> >>> if the host AMD server enables the AVIC feature, the flat mode of the APIC
> >>> gets better performance in the guest.
> >>
> >> Hmm, I didn't find enough valuable information in your posting. We have
> >> done a lot of evaluation on AMD before:
> >> https://lore.kernel.org/all/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com/T/#u
> >
> > I too would like to see numbers. I suspect the answer is going to be that
> > AVIC performs poorly in CPU overcommit scenarios because of the cost of managing
> > the tables and handling "failed delivery" exits, but that AVIC does quite well
> > when vCPUs are pinned 1:1 and IPIs rarely require an exit to the host.
> >
>
> Test env:
> CPU: AMD EPYC 7642 48-Core Processor
>
> Kmod args(enable avic and disable nested):
> modprobe kvm-amd nested=0 avic=1 npt=1
>
> QEMU args(disable x2apic):
> ... -cpu host,x2apic=off ...
>
> Benchmark tool:
> https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/apic-ipi
>
> ~# insmod apic_ipi.ko options=5 && dmesg -c
>
> apic_ipi: 1 NUMA node(s)
> apic_ipi: apic [flat]
> apic_ipi: apic->send_IPI[default_send_IPI_single+0x0/0x40]
> apic_ipi: apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
> apic_ipi: IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
> apic_ipi: total cycles 375671259, avg 3756
> apic_ipi: IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
> apic_ipi: total cycles 221961822, avg 2219
>
>
> apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
> -> This line shows that the current send_IPI_mask is kvm_send_ipi_mask (because
> of the PV SEND IPI feature).
>
> apic_ipi: IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
> apic_ipi: total cycles 375671259, avg 3756
> --> These lines show the average cycles per kvm_send_ipi_mask call: 3756
>
> apic_ipi: IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
> apic_ipi: total cycles 221961822, avg 2219
> --> These lines show the average cycles per flat_send_IPI_mask call: 2219

Just a single-target IPI is not enough.

Wanpeng

2021-10-21 07:22:46

by zhenwei pi

Subject: Re: Re: Re: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi

On 10/21/21 1:03 PM, Wanpeng Li wrote:
> On Thu, 21 Oct 2021 at 11:05, zhenwei pi <[email protected]> wrote:
>>
>>
>> On 10/21/21 4:12 AM, Sean Christopherson wrote:
>>> On Wed, Oct 20, 2021, Wanpeng Li wrote:
>>>> On Wed, 20 Oct 2021 at 20:08, zhenwei pi <[email protected]> wrote:
>>>>>
>>>>> Although the host side exposes the KVM PV SEND IPI feature to the guest
>>>>> side, the guest should still have a chance to disable it.
>>>>>
>>>>> A typical use case for this parameter:
>>>>> if the host AMD server enables the AVIC feature, the flat mode of the APIC
>>>>> gets better performance in the guest.
>>>>
>>>> Hmm, I didn't find enough valuable information in your posting. We have
>>>> done a lot of evaluation on AMD before:
>>>> https://lore.kernel.org/all/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com/T/#u
>>>
>>> I too would like to see numbers. I suspect the answer is going to be that
>>> AVIC performs poorly in CPU overcommit scenarios because of the cost of managing
>>> the tables and handling "failed delivery" exits, but that AVIC does quite well
>>> when vCPUs are pinned 1:1 and IPIs rarely require an exit to the host.
>>>
>>
>> Test env:
>> CPU: AMD EPYC 7642 48-Core Processor
>>
>> Kmod args(enable avic and disable nested):
>> modprobe kvm-amd nested=0 avic=1 npt=1
>>
>> QEMU args(disable x2apic):
>> ... -cpu host,x2apic=off ...
>>
>> Benchmark tool:
>> https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/apic-ipi
>>
>> ~# insmod apic_ipi.ko options=5 && dmesg -c
>>
>> apic_ipi: 1 NUMA node(s)
>> apic_ipi: apic [flat]
>> apic_ipi: apic->send_IPI[default_send_IPI_single+0x0/0x40]
>> apic_ipi: apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
>> apic_ipi: IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
>> apic_ipi: total cycles 375671259, avg 3756
>> apic_ipi: IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
>> apic_ipi: total cycles 221961822, avg 2219
>>
>>
>> apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
>> -> This line shows that the current send_IPI_mask is kvm_send_ipi_mask (because
>> of the PV SEND IPI feature).
>>
>> apic_ipi: IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
>> apic_ipi: total cycles 375671259, avg 3756
>> --> These lines show the average cycles per kvm_send_ipi_mask call: 3756
>>
>> apic_ipi: IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
>> apic_ipi: total cycles 221961822, avg 2219
>> --> These lines show the average cycles per flat_send_IPI_mask call: 2219
>
> Just a single-target IPI is not enough.
>
> Wanpeng
>

Benchmark smp_call_function_single
(https://github.com/bytedance/kvm-utils/blob/master/microbenchmark/ipi-bench/ipi_bench.c):

Test env:
CPU: AMD EPYC 7642 48-Core Processor

Kmod args(enable avic and disable nested):
modprobe kvm-amd nested=0 avic=1 npt=1

QEMU args(disable x2apic):
... -cpu host,x2apic=off ...

1> without no-kvm-pvipi:
ipi_bench_single wait[1], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
elapsed = 424945631 cycles, average = 4249 cycles
ipitime = 385246136 cycles, average = 3852 cycles
ipi_bench_single wait[0], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
elapsed = 419057953 cycles, average = 4190 cycles

2> with no-kvm-pvipi:
ipi_bench_single wait[1], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
elapsed = 321756407 cycles, average = 3217 cycles
ipitime = 299433550 cycles, average = 2994 cycles
ipi_bench_single wait[0], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
elapsed = 295382146 cycles, average = 2953 cycles
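
Summarizing the wait[1] case: elapsed cycles per IPI drop from 4249 to 3217,
i.e. (4249 - 3217) / 4249 ≈ 24% fewer cycles with no-kvm-pvipi (flat mode +
AVIC) than with the PV send-IPI hypercall in this test.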


--
zhenwei pi

2021-10-25 03:41:12

by zhenwei pi

Subject: Re: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi

On 10/21/21 3:17 PM, zhenwei pi wrote:
> On 10/21/21 1:03 PM, Wanpeng Li wrote:
>> On Thu, 21 Oct 2021 at 11:05, zhenwei pi <[email protected]> wrote:
>>>
>>>
>>> On 10/21/21 4:12 AM, Sean Christopherson wrote:
>>>> On Wed, Oct 20, 2021, Wanpeng Li wrote:
>>>>> On Wed, 20 Oct 2021 at 20:08, zhenwei pi <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Although the host side exposes the KVM PV SEND IPI feature to the guest
>>>>>> side, the guest should still have a chance to disable it.
>>>>>>
>>>>>> A typical use case for this parameter:
>>>>>> if the host AMD server enables the AVIC feature, the flat mode of the APIC
>>>>>> gets better performance in the guest.
>>>>>
>>>>> Hmm, I didn't find enough valuable information in your posting. We have
>>>>> done a lot of evaluation on AMD before:
>>>>> https://lore.kernel.org/all/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com/T/#u
>>>>>
>>>>
>>>> I too would like to see numbers.  I suspect the answer is going to
>>>> be that
>>>> AVIC performs poorly in CPU overcommit scenarios because of the cost
>>>> of managing
>>>> the tables and handling "failed delivery" exits, but that AVIC does
>>>> quite well
>>>> when vCPUs are pinned 1:1 and IPIs rarely require an exit to the host.
>>>>
>>>
>>> Test env:
>>> CPU: AMD EPYC 7642 48-Core Processor
>>>
>>> Kmod args(enable avic and disable nested):
>>> modprobe kvm-amd nested=0 avic=1 npt=1
>>>
>>> QEMU args(disable x2apic):
>>> ... -cpu host,x2apic=off ...
>>>
>>> Benchmark tool:
>>> https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/apic-ipi
>>>
>>>
>>> ~# insmod apic_ipi.ko options=5 && dmesg -c
>>>
>>>    apic_ipi: 1 NUMA node(s)
>>>    apic_ipi: apic [flat]
>>>    apic_ipi: apic->send_IPI[default_send_IPI_single+0x0/0x40]
>>>    apic_ipi: apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
>>>    apic_ipi:     IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
>>>    apic_ipi:             total cycles 375671259, avg 3756
>>>    apic_ipi:     IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
>>>    apic_ipi:             total cycles 221961822, avg 2219
>>>
>>>
>>> apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
>>>     -> This line shows that the current send_IPI_mask is kvm_send_ipi_mask
>>> (because of the PV SEND IPI feature).
>>>
>>> apic_ipi:       IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
>>> apic_ipi:               total cycles 375671259, avg 3756
>>>     --> These lines show the average cycles per kvm_send_ipi_mask call:
>>> 3756
>>>
>>> apic_ipi:       IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
>>> apic_ipi:               total cycles 221961822, avg 2219
>>>     --> These lines show the average cycles per
>>> flat_send_IPI_mask call: 2219
>>
>> Just a single-target IPI is not enough.
>>
>>      Wanpeng
>>
>
> Benchmark smp_call_function_single
> (https://github.com/bytedance/kvm-utils/blob/master/microbenchmark/ipi-bench/ipi_bench.c):
>
>
>  Test env:
>  CPU: AMD EPYC 7642 48-Core Processor
>
>  Kmod args(enable avic and disable nested):
>  modprobe kvm-amd nested=0 avic=1 npt=1
>
>  QEMU args(disable x2apic):
>  ... -cpu host,x2apic=off ...
>
> 1> without no-kvm-pvipi:
> ipi_bench_single wait[1], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
>      elapsed =        424945631 cycles, average =     4249 cycles
>      ipitime =        385246136 cycles, average =     3852 cycles
> ipi_bench_single wait[0], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
>      elapsed =        419057953 cycles, average =     4190 cycles
>
> 2> with no-kvm-pvipi:
> ipi_bench_single wait[1], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
>      elapsed =        321756407 cycles, average =     3217 cycles
>      ipitime =        299433550 cycles, average =     2994 cycles
> ipi_bench_single wait[0], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
>      elapsed =        295382146 cycles, average =     2953 cycles
>
>
Hi, Wanpeng & Sean

I also benchmarked redis (via 127.0.0.1) in a guest (2 vCPUs); 'no-kvm-pvipi'
gets better performance.

Test env:
Host side: pin 2 vCPUs to 2 cores in a die.
Guest side: run command:
taskset -c 1 ./redis-server --appendonly no
taskset -c 0 ./redis-benchmark -h 127.0.0.1 -d 1024 -n 10000000 -t get

1> without no-kvm-pvipi:
redis QPS: 193203.12 requests per second
kvm_pv_send_ipi exit: ~18K/s

2> with no-kvm-pvipi:
redis QPS: 196028.47 requests per second
avic_incomplete_ipi_interception exit: ~5K/s
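
That is a QPS gain of (196028.47 - 193203.12) / 193203.12 ≈ 1.5%, while guest
exits drop from ~18K/s (kvm_pv_send_ipi) to ~5K/s
(avic_incomplete_ipi_interception), presumably because AVIC delivers most of
the remaining IPIs without a VM exit.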

--
zhenwei pi

2021-10-26 21:29:17

by Sean Christopherson

Subject: Re: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi

On Mon, Oct 25, 2021, zhenwei pi wrote:
> Hi, Wanpeng & Sean
>
> I also benchmarked redis (via 127.0.0.1) in a guest (2 vCPUs); 'no-kvm-pvipi'
> gets better performance.
>
> Test env:
> Host side: pin 2 vCPUs to 2 cores in a die.
> Guest side: run command:
> taskset -c 1 ./redis-server --appendonly no
> taskset -c 0 ./redis-benchmark -h 127.0.0.1 -d 1024 -n 10000000 -t get
>
> 1> without no-kvm-pvipi:
> redis QPS: 193203.12 requests per second
> kvm_pv_send_ipi exit: ~18K/s
>
> 2> with no-kvm-pvipi:
> redis QPS: 196028.47 requests per second
> avic_incomplete_ipi_interception exit: ~5K/s

Numbers look sane, but I don't think that adding a guest-side kernel param is
the correct "fix". As evidenced by Wanpeng's tests, PV IPI can outperform AVIC
in overcommit scenarios, and there's also no guarantee that AVIC/APICv is even
supported/enabled. In other words, blindly disabling PV IPIs from within the
guest makes sense if and only if the guest knows that AVIC is enabled and that
its vCPUs are pinned. If the guest has that info, then the host also has that
info, in which case the correct way to handle this is to simply not advertise
KVM_FEATURE_PV_SEND_IPI to the guest in CPUID.
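
E.g. with QEMU that should just be a per-VM CPU property (kvm-pv-ipi, if I
recall the name correctly), something like:

  ... -cpu host,x2apic=off,kvm-pv-ipi=off ...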

2021-10-27 13:55:05

by Wanpeng Li

Subject: Re: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi

On Wed, 27 Oct 2021 at 00:04, Sean Christopherson <[email protected]> wrote:
>
> On Mon, Oct 25, 2021, zhenwei pi wrote:
> > Hi, Wanpeng & Sean
> >
> > I also benchmarked redis (via 127.0.0.1) in a guest (2 vCPUs); 'no-kvm-pvipi'
> > gets better performance.
> >
> > Test env:
> > Host side: pin 2 vCPUs to 2 cores in a die.
> > Guest side: run command:
> > taskset -c 1 ./redis-server --appendonly no
> > taskset -c 0 ./redis-benchmark -h 127.0.0.1 -d 1024 -n 10000000 -t get
> >
> > 1> without no-kvm-pvipi:
> > redis QPS: 193203.12 requests per second
> > kvm_pv_send_ipi exit: ~18K/s
> >
> > 2> with no-kvm-pvipi:
> > redis QPS: 196028.47 requests per second
> > avic_incomplete_ipi_interception exit: ~5K/s
>
> Numbers look sane, but I don't think that adding a guest-side kernel param is
> the correct "fix". As evidenced by Wanpeng's tests, PV IPI can outperform AVIC
> in overcommit scenarios, and there's also no guarantee that AVIC/APICv is even

Our evaluation was in a dedicated scenario w/ a big VM. The testing
above gives a one-sided view.

Wanpeng