Date: Sun, 16 Oct 2016 06:21:41 +0300
From: "Michael S. Tsirkin"
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, rkrcmar@redhat.com,
	yang.zhang.wz@gmail.com, feng.wu@intel.com
Subject: Re: [PATCH 1/5] KVM: x86: avoid atomic operations on APICv vmentry
Message-ID: <20161016060320-mutt-send-email-mst@kernel.org>
In-Reply-To: <1476469291-5039-2-git-send-email-pbonzini@redhat.com>
References: <1476469291-5039-1-git-send-email-pbonzini@redhat.com>
 <1476469291-5039-2-git-send-email-pbonzini@redhat.com>

On Fri, Oct 14, 2016 at 08:21:27PM +0200, Paolo Bonzini wrote:
> On some benchmarks (e.g. netperf with ioeventfd disabled), APICv
> posted interrupts turn out to be slower than interrupt injection via
> KVM_REQ_EVENT.
>
> This patch optimizes the IRR update a bit, avoiding expensive atomic
> operations in the common case where PI.ON=0 at vmentry or the PIR vector
> is mostly zero.  This saves at least 20 cycles (1%) per vmexit, as
> measured by kvm-unit-tests' inl_from_qemu test (20 runs):
>
>              | enable_apicv=1  | enable_apicv=0
>              | mean     stdev  | mean     stdev
>    ----------|-----------------|------------------
>    before    | 5826     32.65  | 5765     47.09
>    after     | 5809     43.42  | 5777     77.02
>
> Of course, any change in the right column is just placebo effect. :)
> The savings are bigger if interrupts are frequent.
>
> Signed-off-by: Paolo Bonzini
> ---
>  arch/x86/kvm/lapic.c | 6 ++++--
>  arch/x86/kvm/vmx.c   | 9 ++++++++-
>  2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 23b99f305382..63a442aefc12 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -342,9 +342,11 @@ void __kvm_apic_update_irr(u32 *pir, void *regs)
>  	u32 i, pir_val;
>
>  	for (i = 0; i <= 7; i++) {
> -		pir_val = xchg(&pir[i], 0);
> -		if (pir_val)
> +		pir_val = READ_ONCE(pir[i]);
> +		if (pir_val) {
> +			pir_val = xchg(&pir[i], 0);
>  			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
> +		}
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);

gcc doesn't seem to unroll this loop, and it's probably worth unrolling.
The following seems to do the trick for me on upstream - I didn't
benchmark it, though.  Is there a kvm unit test for interrupts?

--->

kvm: unroll the loop in __kvm_apic_update_irr

This is a hot path in interrupt-rich workloads, so it is worth unrolling.

Signed-off-by: Michael S. Tsirkin

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b62c852..0c3462c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -337,7 +337,8 @@ static u8 count_vectors(void *bitmap)
 	return count;
 }

-void __kvm_apic_update_irr(u32 *pir, void *regs)
+void __attribute__((optimize("unroll-loops")))
+__kvm_apic_update_irr(u32 *pir, void *regs)
 {
 	u32 i, pir_val;
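
For readers who want to see the shape of the optimization outside the kernel
tree, here is a minimal standalone sketch of the "plain read first, atomic
exchange only if non-zero" pattern from the patch above.  GCC's __atomic
builtins stand in for the kernel's READ_ONCE()/xchg(); the names (pir, irr,
NR_PIR_WORDS, update_irr) and the build command are illustrative assumptions,
not KVM code.

/*
 * Standalone sketch (not kernel code): plain load first, atomic
 * exchange only when the word is non-zero.  __atomic builtins stand
 * in for the kernel's READ_ONCE()/xchg(); all names are illustrative.
 *
 * Build (assumed): gcc -O2 -funroll-loops -o pir-sketch pir-sketch.c
 */
#include <stdint.h>
#include <stdio.h>

#define NR_PIR_WORDS 8			/* 256 vectors / 32 bits per word */

static uint32_t pir[NR_PIR_WORDS];	/* posted-interrupt request bits */
static uint32_t irr[NR_PIR_WORDS];	/* accumulated IRR-like copy */

static void update_irr(void)
{
	for (int i = 0; i < NR_PIR_WORDS; i++) {
		/* Cheap relaxed load; most words are zero most of the time. */
		uint32_t val = __atomic_load_n(&pir[i], __ATOMIC_RELAXED);

		if (val) {
			/* Only now pay for the atomic exchange (the xchg). */
			val = __atomic_exchange_n(&pir[i], 0, __ATOMIC_ACQ_REL);
			irr[i] |= val;
		}
	}
}

int main(void)
{
	/* Pretend another CPU posted vector 101 (word 3, bit 5). */
	__atomic_fetch_or(&pir[3], UINT32_C(1) << 5, __ATOMIC_RELEASE);

	update_irr();
	printf("irr[3] = %#x\n", (unsigned)irr[3]);
	return 0;
}

The point of the sketch is only the cost profile: in the common case where
every PIR word is zero, the loop does eight cheap loads and nothing else,
and the atomic exchange is paid only for words that actually have bits set.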