Subject: Re: [PATCH v3 1/2] KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath
From: Liran Alon
Date: Thu, 21 Nov 2019 01:36:57 +0200
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini,
    Radim Krčmář, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel
Message-Id: <61E34902-0743-4DAF-A7DF-94C0E51CDA08@oracle.com>
In-Reply-To: <1574221329-12370-1-git-send-email-wanpengli@tencent.com>
References: <1574221329-12370-1-git-send-email-wanpengli@tencent.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On 20 Nov 2019, at 5:42, Wanpeng Li wrote:
>
> From: Wanpeng Li
>
> ICR and TSCDEADLINE MSRs write cause the main MSRs write vmexits in
> our product observation, multicast IPIs are not as common as unicast
> IPI like RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR etc.

Have you also had the chance to measure non-Linux workloads, such as
Windows?

>
> This patch tries to optimize x2apic physical destination mode, fixed
> delivery mode single target IPI. The fast path is invoked at
> ->handle_exit_irqoff() to emulate only the effect of the ICR write
> itself, i.e. the sending of IPIs. Sending IPIs early in the VM-Exit
> flow reduces the latency of virtual IPIs by avoiding the expensive bits
> of transitioning from guest to host, e.g. reacquiring KVM's SRCU lock.
> Especially when running guest w/ KVM_CAP_X86_DISABLE_EXITS capability
> enabled or guest can keep running, IPI can be injected to target vCPU
> by posted-interrupt immediately.

May I suggest an alternative phrasing? Something such as:

"""
This patch introduces a mechanism to handle certain performance-critical
WRMSRs at a very early stage of the KVM VMExit handler.

This mechanism is specifically used for accelerating writes to the x2APIC
ICR that attempt to send a virtual IPI with physical destination-mode,
fixed delivery-mode and a single target, which was found to be one of the
main causes of VMExits for Linux workloads.

This mechanism significantly reduces the latency of such virtual IPIs by
sending the physical IPI to the target vCPU at a very early stage of the
KVM VMExit handler, before host interrupts are enabled and before
expensive operations such as reacquiring KVM's SRCU lock.
Latency is reduced even more when KVM is able to use the APICv
posted-interrupt mechanism (which allows delivering the virtual IPI
directly to the target vCPU without the need to kick it to the host).
"""

>
> Testing on Xeon Skylake server:
>
> The virtual IPI latency from sender send to receiver receive reduces
> more than 200+ cpu cycles.
>
> Cc: Paolo Bonzini
> Cc: Radim Krčmář
> Cc: Sean Christopherson
> Cc: Vitaly Kuznetsov
> Cc: Liran Alon
> Signed-off-by: Wanpeng Li

I see you used the code I provided in my reply to v2. :)
I have only some very minor comments inline below. Therefore:
Reviewed-by: Liran Alon

Thanks for doing this optimisation.
-Liran

> ---
> v2 -> v3:
>  * for both VMX and SVM
>  * vmx_handle_exit() get second parameter by value and not by pointer
>  * rename parameter to "accel_exit_completion"
>  * preserve tracepoint ordering
>  * rename handler to handle_accel_set_msr_irqoff and more generic
>  * add comments above handle_accel_set_msr_irqoff
>  * msr index APIC_BASE_MSR + (APIC_ICR >> 4)
> v1 -> v2:
>  * add tracepoint
>  * Instead of a separate vcpu->fast_vmexit, set exit_reason
>    to vmx->exit_reason to -1 if the fast path succeeds.
>  * move the "kvm_skip_emulated_instruction(vcpu)" to vmx_handle_exit
>  * moving the handling into vmx_handle_exit_irqoff()
>
>  arch/x86/include/asm/kvm_host.h | 11 ++++++++--
>  arch/x86/kvm/svm.c              | 14 ++++++++----
>  arch/x86/kvm/vmx/vmx.c          | 13 ++++++++---
>  arch/x86/kvm/x86.c              | 48 +++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/x86.h              |  1 +
>  5 files changed, 76 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 898ab9e..67c7889 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -175,6 +175,11 @@ enum {
>  	VCPU_SREG_LDTR,
>  };
>
> +enum accel_exit_completion {
> +	ACCEL_EXIT_NONE,
> +	ACCEL_EXIT_SKIP_EMUL_INS = -1,

You should remove the "= -1".

> +};
> +
>  #include
>
>  #define KVM_NR_MEM_OBJS 40
> @@ -1084,7 +1089,8 @@ struct kvm_x86_ops {
>  	void (*tlb_flush_gva)(struct kvm_vcpu *vcpu, gva_t addr);
>
>  	void (*run)(struct kvm_vcpu *vcpu);
> -	int (*handle_exit)(struct kvm_vcpu *vcpu);
> +	int (*handle_exit)(struct kvm_vcpu *vcpu,
> +		enum accel_exit_completion accel_exit);
>  	int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
>  	void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
>  	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
> @@ -1134,7 +1140,8 @@ struct kvm_x86_ops {
>  	int (*check_intercept)(struct kvm_vcpu *vcpu,
>  			       struct x86_instruction_info *info,
>  			       enum x86_intercept_stage stage);
> -	void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu);
> +	void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu,
> +		enum accel_exit_completion *accel_exit);
>  	bool (*mpx_supported)(void);
>  	bool (*xsaves_supported)(void);
>  	bool (*umip_emulated)(void);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d02a73a..060b11d 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -4929,7 +4929,8 @@ static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>  	*info2 = control->exit_info_2;
>  }
>
> -static int handle_exit(struct kvm_vcpu *vcpu)
> +static int handle_exit(struct kvm_vcpu *vcpu,
> +	enum accel_exit_completion accel_exit)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct kvm_run *kvm_run = vcpu->run;
> @@ -4987,7 +4988,10 @@ static int handle_exit(struct kvm_vcpu *vcpu)
>  		       __func__, svm->vmcb->control.exit_int_info,
>  		       exit_code);
>
> -	if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
> +	if (accel_exit == ACCEL_EXIT_SKIP_EMUL_INS) {
> +		kvm_skip_emulated_instruction(vcpu);
> +		return 1;
> +	} else if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
>  	    || !svm_exit_handlers[exit_code]) {
>  		vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%x\n", exit_code);
>  		dump_vmcb(vcpu);
> @@ -6187,9 +6191,11 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
>  	return ret;
>  }
>
> -static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu)
> +static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu,
> +	enum accel_exit_completion *accel_exit)
>  {
> -
> +	if (to_svm(vcpu)->vmcb->control.exit_code == EXIT_REASON_MSR_WRITE)
> +		*accel_exit = handle_accel_set_msr_irqoff(vcpu);
>  }
>
>  static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 621142e5..86c0a23 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5792,7 +5792,8 @@ void dump_vmcs(void)
>   * The guest has exited. See if we can fix it or if we need userspace
>   * assistance.
>   */
> -static int vmx_handle_exit(struct kvm_vcpu *vcpu)
> +static int vmx_handle_exit(struct kvm_vcpu *vcpu,
> +	enum accel_exit_completion accel_exit)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 exit_reason = vmx->exit_reason;
> @@ -5878,7 +5879,10 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>  		}
>  	}
>
> -	if (exit_reason < kvm_vmx_max_exit_handlers
> +	if (accel_exit == ACCEL_EXIT_SKIP_EMUL_INS) {
> +		kvm_skip_emulated_instruction(vcpu);
> +		return 1;
> +	} else if (exit_reason < kvm_vmx_max_exit_handlers
>  	    && kvm_vmx_exit_handlers[exit_reason]) {
>  #ifdef CONFIG_RETPOLINE
>  		if (exit_reason == EXIT_REASON_MSR_WRITE)
> @@ -6223,7 +6227,8 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
>  }
>  STACK_FRAME_NON_STANDARD(handle_external_interrupt_irqoff);
>
> -static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
> +static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu,
> +	enum accel_exit_completion *accel_exit)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> @@ -6231,6 +6236,8 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
>  		handle_external_interrupt_irqoff(vcpu);
>  	else if (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI)
>  		handle_exception_nmi_irqoff(vmx);
> +	else if (vmx->exit_reason == EXIT_REASON_MSR_WRITE)
> +		*accel_exit = handle_accel_set_msr_irqoff(vcpu);
>  }
>
>  static bool vmx_has_emulated_msr(int index)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 991dd01..966c659 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1509,6 +1509,49 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr);
>
> +static int handle_accel_set_x2apic_icr_irqoff(struct kvm_vcpu *vcpu, u64 data)
> +{
> +	if (lapic_in_kernel(vcpu) && apic_x2apic_mode(vcpu->arch.apic) &&
> +		((data & KVM_APIC_DEST_MASK) == APIC_DEST_PHYSICAL) &&
> +		((data & APIC_MODE_MASK) == APIC_DM_FIXED)) {
> +
> +		kvm_lapic_set_reg(vcpu->arch.apic, APIC_ICR2, (u32)(data >> 32));
> +		return kvm_lapic_reg_write(vcpu->arch.apic, APIC_ICR, (u32)data);
> +	}
> +
> +	return 1;
> +}
> +
> +/*
> + * The fast path for frequent and performance sensitive wrmsr emulation,
> + * i.e. the sending of IPI, sending IPI early in the VM-Exit flow reduces
> + * the latency of virtual IPI by avoiding the expensive bits of transitioning
> + * from guest to host, e.g. reacquiring KVM's SRCU lock. In contrast to the
> + * other cases which must be called after interrupts are enabled on the host.
> + */

This comment belongs on top of handle_accel_set_x2apic_icr_irqoff()
instead, as handle_accel_set_msr_irqoff() is in theory written so that it
may be used for other MSRs as well.

> +enum accel_exit_completion handle_accel_set_msr_irqoff(struct kvm_vcpu *vcpu)
> +{
> +	u32 msr = kvm_rcx_read(vcpu);
> +	u64 data = kvm_read_edx_eax(vcpu);
> +	int ret = 0;
> +
> +	switch (msr) {
> +	case APIC_BASE_MSR + (APIC_ICR >> 4):
> +		ret = handle_accel_set_x2apic_icr_irqoff(vcpu, data);
> +		break;
> +	default:
> +		return ACCEL_EXIT_NONE;
> +	}
> +
> +	if (!ret) {
> +		trace_kvm_msr_write(msr, data);
> +		return ACCEL_EXIT_SKIP_EMUL_INS;
> +	}
> +
> +	return ACCEL_EXIT_NONE;
> +}
> +EXPORT_SYMBOL_GPL(handle_accel_set_msr_irqoff);
> +
>  /*
>   * Adapt set_msr() to msr_io()'s calling convention
>   */
> @@ -7984,6 +8027,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	bool req_int_win =
>  		dm_request_for_irq_injection(vcpu) &&
>  		kvm_cpu_accept_dm_intr(vcpu);
> +	enum accel_exit_completion accel_exit = ACCEL_EXIT_NONE;
>
>  	bool req_immediate_exit = false;
>
> @@ -8226,7 +8270,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	vcpu->mode = OUTSIDE_GUEST_MODE;
>  	smp_wmb();
>
> -	kvm_x86_ops->handle_exit_irqoff(vcpu);
> +	kvm_x86_ops->handle_exit_irqoff(vcpu, &accel_exit);
>
>  	/*
>  	 * Consume any pending interrupts, including the possible source of
> @@ -8270,7 +8314,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	kvm_lapic_sync_from_vapic(vcpu);
>
>  	vcpu->arch.gpa_available = false;
> -	r = kvm_x86_ops->handle_exit(vcpu);
> +	r = kvm_x86_ops->handle_exit(vcpu, accel_exit);
>  	return r;
>
>  cancel_injection:
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 29391af..f14ec14 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -291,6 +291,7 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
>  bool kvm_vector_hashing_enabled(void);
>  int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
>  			    int emulation_type, void *insn, int insn_len);
> +enum accel_exit_completion handle_accel_set_msr_irqoff(struct kvm_vcpu *vcpu);
>
>  #define KVM_SUPPORTED_XCR0	(XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
>  				| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> --
> 2.7.4
>