From: "Longpeng(Mike)" <longpeng2@huawei.com>
To: <pbonzini@redhat.com>, <rkrcmar@redhat.com>
CC: <agraf@suse.com>, <borntraeger@de.ibm.com>, <cohuck@redhat.com>,
        <christoffer.dall@linaro.org>, <marc.zyngier@arm.com>,
        <james.hogan@imgtec.com>, <kvm@vger.kernel.org>,
        <linux-kernel@vger.kernel.org>, <weidong.huang@huawei.com>,
        <arei.gonglei@huawei.com>, <wangxinxin.wang@huawei.com>,
        <longpeng.mike@gmail.com>, <david@redhat.com>,
        "Longpeng(Mike)" <longpeng2@huawei.com>
Subject: [PATCH v2 0/4] KVM: optimize the kvm_vcpu_on_spin
Date: Tue, 8 Aug 2017 12:05:31 +0800
Message-ID: <1502165135-4784-1-git-send-email-longpeng2@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3184
Lines: 105

This series is a simple optimization for kvm_vcpu_on_spin(); the
main idea is described in the commit message of patch 1.

I ran some tests based on the RFC version; the results show
that it improves performance slightly.

== Geekbench-3.4.1 ==
VM1: 	8U,4G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19)
	running Geekbench-3.4.1 *10 runs*
VM2/VM3/VM4: configuration is the same as VM1
	stress each vcpu usage (as seen by top in the guest) to 40%

The comparison of each testcase's score:
(higher is better)
		before		after		improve
Integer
 single		1176.7		1179.0		0.2%
 multi		3459.5		3426.5		-0.9%
Float
 single		1150.5		1150.9		0.0%
 multi		3364.5		3391.9		0.8%
Memory(stream)
 single		1768.7		1773.1		0.2%
 multi		2511.6		2557.2		1.8%
Overall
 single		1284.2		1286.2		0.2%
 multi		3231.4		3238.4		0.2%


== kernbench-0.42 ==
VM1:    8U,12G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19)
        running "kernbench -n 10"
VM2/VM3/VM4: configuration is the same as VM1
        stress each vcpu usage (as seen by top in the guest) to 40%

The comparison of 'Elapsed Time':
(lower is better)
		before		after		improve
load -j4	12.762		12.751		0.1%
load -j32	9.743		8.955		8.1%
load -j		9.688		9.229		4.7%
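
For anyone who wants to re-check the "improve" columns, here is a
minimal sketch in Python. It assumes the percentage is taken relative
to the "before" value, with the sign flipped for the elapsed-time
table so that a positive number always means "better"; the helper
name improvement_pct is made up for illustration only.

```python
def improvement_pct(before, after, higher_is_better=True):
    """Relative change from `before` to `after`, in percent.

    Assumption: the change is normalized by `before`, and the sign is
    chosen so that a positive result always means an improvement.
    """
    delta = (after - before) if higher_is_better else (before - after)
    return 100.0 * delta / before


# Geekbench score, Memory(stream) multi: higher is better
print(f"{improvement_pct(2511.6, 2557.2):.1f}%")  # prints 1.8%

# kernbench elapsed time, load -j32: lower is better
print(f"{improvement_pct(9.743, 8.955, higher_is_better=False):.1f}%")  # prints 8.1%
```

The same helper covers both tables; only the direction flag changes.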


Physical Machine:
  Architecture:          x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
  CPU(s):                24
  On-line CPU(s) list:   0-23
  Thread(s) per core:    2
  Core(s) per socket:    6
  Socket(s):             2
  NUMA node(s):          2
  Vendor ID:             GenuineIntel
  CPU family:            6
  Model:                 45
  Model name:            Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
  Stepping:              7
  CPU MHz:               2799.902
  BogoMIPS:              5004.67
  Virtualization:        VT-x
  L1d cache:             32K
  L1i cache:             32K
  L2 cache:              256K
  L3 cache:              15360K
  NUMA node0 CPU(s):     0-5,12-17
  NUMA node1 CPU(s):     6-11,18-23

---
Changes since V1:
 - split the implementation of s390 & arm. [David]
 - refactor the implementation as suggested. [Paolo]

Changes since RFC:
 - only cache the result for x86. [David & Cornelia & Paolo]
 - add performance numbers. [David]
 - implement it for arm/s390. [Christoffer & David]
 - refactor the implementation. [me]

---
Longpeng(Mike) (4):
  KVM: add spinlock optimization framework
  KVM: X86: implement the logic for spinlock optimization
  KVM: s390: implements the kvm_arch_vcpu_in_kernel()
  KVM: arm: implements the kvm_arch_vcpu_in_kernel()

 arch/arm/kvm/handle_exit.c      |  2 +-
 arch/arm64/kvm/handle_exit.c    |  2 +-
 arch/mips/kvm/mips.c            |  6 ++++++
 arch/powerpc/kvm/powerpc.c      |  6 ++++++
 arch/s390/kvm/diag.c            |  2 +-
 arch/s390/kvm/kvm-s390.c        |  6 ++++++
 arch/x86/include/asm/kvm_host.h |  5 +++++
 arch/x86/kvm/hyperv.c           |  2 +-
 arch/x86/kvm/svm.c              | 10 +++++++++-
 arch/x86/kvm/vmx.c              | 16 +++++++++++++++-
 arch/x86/kvm/x86.c              | 11 +++++++++++
 include/linux/kvm_host.h        |  3 ++-
 virt/kvm/arm/arm.c              |  5 +++++
 virt/kvm/kvm_main.c             |  4 +++-
 14 files changed, 72 insertions(+), 8 deletions(-)

-- 
1.8.3.1