Date: Sun, 16 Oct 2016 06:21:41 +0300
From: "Michael S. Tsirkin"
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, rkrcmar@redhat.com,
	yang.zhang.wz@gmail.com, feng.wu@intel.com
Subject: Re: [PATCH 1/5] KVM: x86: avoid atomic operations on APICv vmentry
Message-ID: <20161016060320-mutt-send-email-mst@kernel.org>
In-Reply-To: <1476469291-5039-2-git-send-email-pbonzini@redhat.com>
References: <1476469291-5039-1-git-send-email-pbonzini@redhat.com>
 <1476469291-5039-2-git-send-email-pbonzini@redhat.com>

On Fri, Oct 14, 2016 at 08:21:27PM +0200, Paolo Bonzini wrote:
> On some benchmarks (e.g. netperf with ioeventfd disabled), APICv
> posted interrupts turn out to be slower than interrupt injection via
> KVM_REQ_EVENT.
>
> This patch optimizes the IRR update a bit, avoiding expensive atomic
> operations in the common case where PI.ON=0 at vmentry or the PIR vector
> is mostly zero.  This saves at least 20 cycles (1%) per vmexit, as
> measured by kvm-unit-tests' inl_from_qemu test (20 runs):
>
>              | enable_apicv=1  | enable_apicv=0
>              | mean     stdev  | mean     stdev
>    ----------|-----------------|------------------
>    before    | 5826     32.65  | 5765     47.09
>    after     | 5809     43.42  | 5777     77.02
>
> Of course, any change in the right column is just placebo effect. :)
> The savings are bigger if interrupts are frequent.
>
> Signed-off-by: Paolo Bonzini
> ---
>  arch/x86/kvm/lapic.c | 6 ++++--
>  arch/x86/kvm/vmx.c   | 9 ++++++++-
>  2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 23b99f305382..63a442aefc12 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -342,9 +342,11 @@ void __kvm_apic_update_irr(u32 *pir, void *regs)
>  	u32 i, pir_val;
>
>  	for (i = 0; i <= 7; i++) {
> -		pir_val = xchg(&pir[i], 0);
> -		if (pir_val)
> +		pir_val = READ_ONCE(pir[i]);
> +		if (pir_val) {
> +			pir_val = xchg(&pir[i], 0);
>  			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
> +		}
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);

gcc doesn't seem to unroll this loop, and it's probably worth unrolling.
The following seems to do the trick for me on upstream - I didn't
benchmark it, though.  Is there a kvm unit test for interrupts?

--->

kvm: unroll the loop in __kvm_apic_update_irr

This is a hot path in interrupt-rich workloads, so it is worth unrolling.

Signed-off-by: Michael S. Tsirkin

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b62c852..0c3462c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -337,7 +337,8 @@ static u8 count_vectors(void *bitmap)
 	return count;
 }

-void __kvm_apic_update_irr(u32 *pir, void *regs)
+void __attribute__((optimize("unroll-loops")))
+__kvm_apic_update_irr(u32 *pir, void *regs)
 {
 	u32 i, pir_val;
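
For readers who want to see the shape of the optimization outside the kernel
tree, here is a minimal standalone sketch of the "plain read first, atomic
exchange only if non-zero" pattern from the patch above.  GCC's __atomic
builtins stand in for the kernel's READ_ONCE()/xchg(); the names (pir, irr,
NR_PIR_WORDS, update_irr) and the build command are illustrative assumptions,
not KVM code.

/*
 * Standalone sketch (not kernel code): plain load first, atomic
 * exchange only when the word is non-zero.  __atomic builtins stand
 * in for the kernel's READ_ONCE()/xchg(); all names are illustrative.
 *
 * Build (assumed): gcc -O2 -funroll-loops -o pir-sketch pir-sketch.c
 */
#include <stdint.h>
#include <stdio.h>

#define NR_PIR_WORDS 8			/* 256 vectors / 32 bits per word */

static uint32_t pir[NR_PIR_WORDS];	/* posted-interrupt request bits */
static uint32_t irr[NR_PIR_WORDS];	/* accumulated IRR-like copy */

static void update_irr(void)
{
	for (int i = 0; i < NR_PIR_WORDS; i++) {
		/* Cheap relaxed load; most words are zero most of the time. */
		uint32_t val = __atomic_load_n(&pir[i], __ATOMIC_RELAXED);

		if (val) {
			/* Only now pay for the atomic exchange (the xchg). */
			val = __atomic_exchange_n(&pir[i], 0, __ATOMIC_ACQ_REL);
			irr[i] |= val;
		}
	}
}

int main(void)
{
	/* Pretend another CPU posted vector 101 (word 3, bit 5). */
	__atomic_fetch_or(&pir[3], UINT32_C(1) << 5, __ATOMIC_RELEASE);

	update_irr();
	printf("irr[3] = %#x\n", (unsigned)irr[3]);
	return 0;
}

The point of the sketch is only the cost profile: in the common case where
every PIR word is zero, the loop does eight cheap loads and nothing else,
and the atomic exchange is paid only for words that actually have bits set.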