Currently, issuing an IPI other than a self-IPI from a guest on an Intel
CPU always causes a VM-exit. This adds non-negligible overhead for
workloads that send IPIs frequently when running in VMs.

IPI virtualization is a new VT-x feature that eliminates VM-exits on the
source vCPU when it issues unicast, physical-addressing IPIs. Once it is
enabled, the processor virtualizes the following kinds of IPI-sending
operations without causing VM-exits:
- Memory-mapped ICR writes
- MSR-mapped ICR writes
- SENDUIPI execution
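
To make "unicast, physical-addressing" concrete, here is a minimal sketch that classifies an x2APIC ICR value. The field positions follow the ICR layout in the Intel SDM (destination mode in bit 11, destination shorthand in bits 19:18, destination in bits 63:32 for x2APIC). The helper name is hypothetical and is not part of this series, and the exact conditions under which hardware virtualizes an ICR write are defined by the Intel documentation, not by this check.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICR fields per the Intel SDM local APIC chapter. */
#define ICR_DEST_MODE_LOGICAL	(1ULL << 11)	/* 0 = physical, 1 = logical */
#define ICR_SHORTHAND_MASK	(3ULL << 18)	/* 00b = no shorthand */
#define X2APIC_BROADCAST	0xFFFFFFFFu	/* destination meaning "all CPUs" */

/*
 * Hypothetical helper (illustration only): returns true if the 64-bit
 * x2APIC ICR value describes an IPI to exactly one CPU, addressed in
 * physical destination mode without a destination shorthand.
 */
static bool x2apic_icr_is_unicast_physical(uint64_t icr)
{
	uint32_t dest = (uint32_t)(icr >> 32);

	if (icr & ICR_SHORTHAND_MASK)		/* self, all, or all-but-self */
		return false;
	if (icr & ICR_DEST_MODE_LOGICAL)	/* logical destination mode */
		return false;
	if (dest == X2APIC_BROADCAST)		/* physical broadcast */
		return false;
	return true;
}
```

For example, an ICR value of `(1ULL << 32) | 0xF0` (vector 0xF0 sent to APIC ID 1) passes the check, while the same value with the "self" shorthand (bit 18) set does not.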
This patch series implements IPI virtualization support in KVM.

Patches 1-4 add the tertiary processor-based VM-execution control
framework, which is used to enumerate IPI virtualization.
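
As a rough illustration of what this enumeration has to decide, the sketch below checks two capability values. The bit positions (bit 17 of the primary processor-based controls for "activate tertiary controls", bit 4 of the tertiary controls for IPI virtualization) are assumptions based on the Intel documentation and should be checked against it. The helper name and its parameters are hypothetical and do not appear in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against the Intel documentation. */
#define CPU_BASED_ACTIVATE_TERTIARY_CONTROLS	(1u << 17)
#define TERTIARY_EXEC_IPI_VIRT			(1ULL << 4)

/*
 * Hypothetical helper (illustration only).
 *
 * primary_allowed1: bits the CPU allows to be set in the 32-bit primary
 *                   processor-based VM-execution controls.
 * tertiary_allowed1: bits the CPU allows to be set in the 64-bit
 *                    tertiary processor-based VM-execution controls.
 *
 * IPI virtualization is usable only if the tertiary controls can be
 * activated at all and the IPI virtualization bit is allowed.
 */
static bool ipiv_supported(uint32_t primary_allowed1, uint64_t tertiary_allowed1)
{
	if (!(primary_allowed1 & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS))
		return false;
	return (tertiary_allowed1 & TERTIARY_EXEC_IPI_VIRT) != 0;
}
```

The tertiary controls are a 64-bit field, unlike the 32-bit primary and secondary controls, which is why the series extends the control shadow macro with a 64-bit variation.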

Patch 5 handles the APIC-write VM-exit caused by writes to the ICR MSR
when the guest runs in x2APIC mode. This is a new case introduced by
Intel VT-x.

Patch 6 disallows changing the APIC ID in all cases.

Patch 7 implements the IPI virtualization functionality itself: enabling
the feature through the tertiary processor-based VM-execution controls
in the various VMCS configuration scenarios, setting up the PID table at
vCPU creation, and accounting for vCPU blocking.

Patches 8-9 let userspace set the maximum possible vCPU ID for the
current VM. IPIv can use this value to allocate the memory needed for the
PID-pointer table instead of sizing it by KVM_MAX_VCPU_IDS, which reduces
the overall memory footprint.
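
The memory saving can be estimated with a small calculation. This is only a sketch: it assumes each PID-pointer table entry is an 8-byte pointer and that pages are 4 KiB, and the helper name is hypothetical. The kernel may also round the allocation differently (for example to a power-of-two number of pages).

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES	4096u	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper (illustration only): number of pages needed for a
 * PID-pointer table with one 8-byte entry per possible vCPU ID.
 */
static size_t pid_table_pages(uint32_t max_vcpu_ids)
{
	size_t bytes = (size_t)max_vcpu_ids * sizeof(uint64_t);

	/* Round up to whole pages. */
	return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}
```

Under these assumptions, a table sized for 4096 possible vCPU IDs needs 8 pages, while a VM that tells KVM it will never use a vCPU ID of 8 or above needs only one.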

Documentation for IPI virtualization is available in the latest "Intel
Architecture Instruction Set Extensions Programming Reference".

Document Link:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

We measured the average time from the source vCPU sending an IPI to the
target vCPU completing the IPI handling, using the KVM unit test with and
without IPI virtualization. With IPI virtualization enabled, the cycle
count drops by 22.21% in xAPIC mode and by 15.98% in x2APIC mode.

--------------------------------------
KVM unittest:vmexit/ipi
2 vCPUs; the AP was modified to run in an idle loop instead of halting,
so that no VM-exit on the target vCPU affects the measurement.

Cycles of IPI
                    xAPIC mode              x2APIC mode
test            w/o IPIv    w/ IPIv     w/o IPIv    w/ IPIv
1               6106        4816        4265        3768
2               6244        4656        4404        3546
3               6165        4658        4233        3474
4               5992        4710        4363        3430
5               6083        4741        4215        3551
6               6238        4904        4304        3547
7               6164        4617        4263        3709
8               5984        4763        4518        3779
9               5931        4712        4645        3667
10              5955        4530        4332        3724
11              5897        4673        4283        3569
12              6140        4794        4178        3598
13              6183        4728        4363        3628
14              5991        4994        4509        3842
15              5866        4665        4520        3739
16              6032        4654        4229        3701
17              6050        4653        4185        3726
18              6004        4792        4319        3746
19              5961        4626        4196        3392
20              6194        4576        4433        3760
Average cycles  6059        4713.1      4337.85     3644.8
%Reduction                  -22.21%                 -15.98%
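
The %Reduction rows in this cover letter are the relative drop from the "w/o IPIv" average to the "w/ IPIv" average. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <stddef.h>

/* Arithmetic mean of n samples; n must be greater than zero. */
static double mean(const double *samples, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/*
 * Relative reduction in percent when going from "before" to "after".
 * A positive result means "after" is smaller than "before".
 */
static double reduction_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the averages from the table, `reduction_percent(6059.0, 4713.1)` is about 22.21 for xAPIC mode and `reduction_percent(4337.85, 3644.8)` is about 15.98 for x2APIC mode.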
--------------------------------------
IPI microbenchmark:
(https://lore.kernel.org/kvm/[email protected])
2 vCPUs, each vCPU pinned 1:1 to a pCPU; the guest runs with idle=poll in
x2APIC mode.

Result with IPIv enabled:
Dry-run: 0, 272798 ns
Self-IPI: 5094123, 11114037 ns
Normal IPI: 131697087, 173321200 ns
Broadcast IPI: 0, 155649075 ns
Broadcast lock: 0, 161518031 ns
Result with IPIv disabled:
Dry-run: 0, 272766 ns
Self-IPI: 5091788, 11123699 ns
Normal IPI: 145215772, 174558920 ns
Broadcast IPI: 0, 175785384 ns
Broadcast lock: 0, 149076195 ns

Because IPIv benefits unicast IPIs sent to other CPUs, the Normal IPI test
case takes about 9.73% less time on average over 15 test runs when IPIv is
enabled.

Normal IPI statistics (unit: ns):
test            w/o IPIv        w/ IPIv
1               153346049       140907046
2               147218648       141660618
3               145215772       117890672
4               146621682       136430470
5               144821472       136199421
6               144704378       131676928
7               141403224       131697087
8               144775766       125476250
9               140658192       137263330
10              144768626       138593127
11              145166679       131946752
12              145020451       116852889
13              148161353       131406280
14              148378655       130174353
15              148903652       127969674
Average time    145944306.6     131742993.1 ns
%Reduction                      -9.73%
--------------------------------------
hackbench:
8 vCPUs, guest VM free run, x2APIC mode
./hackbench -p -l 100000
                w/o IPIv    w/ IPIv
Time            91.887      74.605
%Reduction                  -18.808%

96 vCPUs, guest VM free run, x2APIC mode
./hackbench -p -l 1000000
                w/o IPIv    w/ IPIv
Time            287.504     235.185
%Reduction                  -18.198%
--------------------------------------
v5 -> v6:
1. Adapt the kvm_apic_write_nodecode() implementation to
   Sean's fix for x2APIC ICR register handling.
2. Drop the patch that updated the IPIv table entry when the
   APIC ID changed; instead apply Levitsky's patch, which
   disallows changing the APIC ID in all cases.
3. Drop the patch that resized the PID-pointer table on demand.
   Allow userspace to set the maximum vCPU ID at runtime so that
   IPIv can use the actual value to allocate the memory needed
   for the PID-pointer table.

v4 -> v5:
1. Handle the enable_ipiv parameter following the current
   VMCS configuration rules.
2. Allocate memory for the PID-pointer table dynamically.
3. Support the guest modifying the APIC ID at runtime in xAPIC mode.
4. Add a helper to determine whether PI blocking is possible in
   the IPIv case.

v3 -> v4:
1. Refine the code style of patch 2.
2. Move the tertiary control shadow build into patch 3.
3. Make vmx_tertiary_exec_control() a static function.

v2 -> v3:
1. Miscellaneous changes to the tertiary execution control
   definition and capability setup.
2. Use an alternative way to get the tertiary execution
   control configuration.

v1 -> v2:
1. Refine the IPIv enabling logic for the VM.
   Remove the per-vCPU ipiv_active definition.
--------------------------------------
Gao Chao (1):
  KVM: VMX: enable IPI virtualization

Maxim Levitsky (1):
  KVM: x86: lapic: don't allow to change APIC ID unconditionally

Robert Hoo (4):
  x86/cpu: Add new VMX feature, Tertiary VM-Execution control
  KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit
    variation
  KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config
  KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well

Zeng Guang (3):
  KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode
  KVM: x86: Allow userspace set maximum VCPU id for VM
  KVM: VMX: Optimize memory allocation for PID-pointer table

arch/x86/include/asm/kvm_host.h | 6 ++
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/vmx.h | 11 +++
arch/x86/include/asm/vmxfeatures.h | 5 +-
arch/x86/kernel/cpu/feat_ctl.c | 9 +-
arch/x86/kvm/lapic.c | 50 ++++++++---
arch/x86/kvm/vmx/capabilities.h | 13 +++
arch/x86/kvm/vmx/evmcs.c | 2 +
arch/x86/kvm/vmx/evmcs.h | 1 +
arch/x86/kvm/vmx/posted_intr.c | 12 ++-
arch/x86/kvm/vmx/vmcs.h | 1 +
arch/x86/kvm/vmx/vmx.c | 140 ++++++++++++++++++++++++++---
arch/x86/kvm/vmx/vmx.h | 63 +++++++------
arch/x86/kvm/x86.c | 11 +++
14 files changed, 274 insertions(+), 51 deletions(-)
--
2.27.0
From: Robert Hoo <[email protected]>

Report the tertiary_exec_control field in dump_vmcs() as well.

Signed-off-by: Robert Hoo <[email protected]>
Signed-off-by: Zeng Guang <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8a5713d49635..7beba7a9f247 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5891,6 +5891,7 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 vmentry_ctl, vmexit_ctl;
 	u32 cpu_based_exec_ctrl, pin_based_exec_ctrl, secondary_exec_control;
+	u64 tertiary_exec_control;
 	unsigned long cr4;
 	int efer_slot;
 
@@ -5904,9 +5905,16 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	cpu_based_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL);
 	cr4 = vmcs_readl(GUEST_CR4);
-	secondary_exec_control = 0;
+
 	if (cpu_has_secondary_exec_ctrls())
 		secondary_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+	else
+		secondary_exec_control = 0;
+
+	if (cpu_has_tertiary_exec_ctrls())
+		tertiary_exec_control = vmcs_read64(TERTIARY_VM_EXEC_CONTROL);
+	else
+		tertiary_exec_control = 0;
 
 	pr_err("VMCS %p, last attempted VM-entry on CPU %d\n",
 	       vmx->loaded_vmcs->vmcs, vcpu->arch.last_vmentry_cpu);
@@ -6006,9 +6014,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 		vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
 
 	pr_err("*** Control State ***\n");
-	pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n",
-	       pin_based_exec_ctrl, cpu_based_exec_ctrl, secondary_exec_control);
-	pr_err("EntryControls=%08x ExitControls=%08x\n", vmentry_ctl, vmexit_ctl);
+	pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
+	       cpu_based_exec_ctrl, secondary_exec_control, tertiary_exec_control);
+	pr_err("PinBased=0x%08x EntryControls=%08x ExitControls=%08x\n",
+	       pin_based_exec_ctrl, vmentry_ctl, vmexit_ctl);
 	pr_err("ExceptionBitmap=%08x PFECmask=%08x PFECmatch=%08x\n",
 	       vmcs_read32(EXCEPTION_BITMAP),
 	       vmcs_read32(PAGE_FAULT_ERROR_CODE_MASK),
--
2.27.0
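
For reference, the first control-state line produced by the new format string can be previewed in userspace. This is only a sketch: the helper name is hypothetical, snprintf() stands in for pr_err(), and the values in the usage note below are made up rather than taken from a real VMCS dump.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper (illustration only): formats the three
 * processor-based execution controls the way the patched dump_vmcs()
 * prints them, without the trailing newline. Returns the number of
 * characters that the full line requires, as snprintf() does.
 */
static int format_exec_controls(char *buf, size_t len, uint32_t cpu_based,
				uint32_t secondary, uint64_t tertiary)
{
	return snprintf(buf, len,
			"CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx",
			(unsigned int)cpu_based, (unsigned int)secondary,
			(unsigned long long)tertiary);
}
```

Calling it with the made-up values `0x00020000`, `0` and `0x10` yields `CPUBased=0x00020000 SecondaryExec=0x00000000 TertiaryExec=0x0000000000000010`, which shows the 64-bit tertiary field printed with 16 hex digits next to the 32-bit fields.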