From: "Longpeng(Mike)"
Subject: [PATCH v2 0/4] KVM: optimize the kvm_vcpu_on_spin
Date: Tue, 8 Aug 2017 12:05:31 +0800
Message-ID: <1502165135-4784-1-git-send-email-longpeng2@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This is a simple optimization for kvm_vcpu_on_spin; the main idea is
described in the commit message of patch 1.

I ran some tests based on the RFC version. The results show a slight
improvement: Geekbench scores change by -0.9% to +1.8%, and kernbench
elapsed time drops by up to 8.1% (load -j32).
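The 'improve' columns in the results appear to be the relative change against
the 'before' baseline, with the sign flipped for kernbench, where a lower
elapsed time is better. A minimal sketch that recomputes a few entries under
that assumption (the helper name `improvement` is hypothetical, not part of
Geekbench or kernbench):

```python
def improvement(before: float, after: float, higher_is_better: bool = True) -> float:
    """Return the relative change of `after` vs. `before`, in percent.

    Positive always means "better": for scores (higher is better) that is an
    increase, for elapsed times (lower is better) it is a decrease.
    Assumes the baseline is `before`, as the reported numbers suggest.
    """
    delta = after - before if higher_is_better else before - after
    return 100.0 * delta / before


# Geekbench score, Memory(stream) multi: higher is better.
print(f"{improvement(2511.6, 2557.2):.1f}%")  # prints 1.8%
# kernbench elapsed time, load -j32: lower is better.
print(f"{improvement(9.743, 8.955, higher_is_better=False):.1f}%")  # prints 8.1%
```

With this formula, Integer multi comes out at about -0.95%, which the table
lists as -0.9%.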
== Geekbench-3.4.1 ==
VM1:          8U, 4G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19),
              running Geekbench-3.4.1 (10 runs)
VM2/VM3/VM4:  configured the same as VM1, with each vcpu stressed to
              40% usage (as seen by top in the guest)

The comparison of each testcase's score (higher is better):

                          before     after   improve
Integer         single    1176.7    1179.0      0.2%
                multi     3459.5    3426.5     -0.9%
Float           single    1150.5    1150.9      0.0%
                multi     3364.5    3391.9      0.8%
Memory(stream)  single    1768.7    1773.1      0.2%
                multi     2511.6    2557.2      1.8%
Overall         single    1284.2    1286.2      0.2%
                multi     3231.4    3238.4      0.2%

== kernbench-0.42 ==
VM1:          8U, 12G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19),
              running "kernbench -n 10"
VM2/VM3/VM4:  configured the same as VM1, with each vcpu stressed to
              40% usage (as seen by top in the guest)

The comparison of 'Elapsed Time' (lower is better):

                 before     after   improve
load -j4         12.762    12.751      0.1%
load -j32         9.743     8.955      8.1%
load -j           9.688     9.229      4.7%

Physical Machine:
  Architecture:          x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
  CPU(s):                24
  On-line CPU(s) list:   0-23
  Thread(s) per core:    2
  Core(s) per socket:    6
  Socket(s):             2
  NUMA node(s):          2
  Vendor ID:             GenuineIntel
  CPU family:            6
  Model:                 45
  Model name:            Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
  Stepping:              7
  CPU MHz:               2799.902
  BogoMIPS:              5004.67
  Virtualization:        VT-x
  L1d cache:             32K
  L1i cache:             32K
  L2 cache:              256K
  L3 cache:              15360K
  NUMA node0 CPU(s):     0-5,12-17
  NUMA node1 CPU(s):     6-11,18-23

---
Changes since V1:
 - split the implementation of s390 & arm. [David]
 - refactor the impls according to the suggestions. [Paolo]

Changes since RFC:
 - only cache the result for X86. [David & Cornelia & Paolo]
 - add performance numbers. [David]
 - implement arm/s390. [Christoffer & David]
 - refactor the impls. [me]

---
Longpeng(Mike) (4):
  KVM: add spinlock optimization framework
  KVM: X86: implement the logic for spinlock optimization
  KVM: s390: implements the kvm_arch_vcpu_in_kernel()
  KVM: arm: implements the kvm_arch_vcpu_in_kernel()

 arch/arm/kvm/handle_exit.c      |  2 +-
 arch/arm64/kvm/handle_exit.c    |  2 +-
 arch/mips/kvm/mips.c            |  6 ++++++
 arch/powerpc/kvm/powerpc.c      |  6 ++++++
 arch/s390/kvm/diag.c            |  2 +-
 arch/s390/kvm/kvm-s390.c        |  6 ++++++
 arch/x86/include/asm/kvm_host.h |  5 +++++
 arch/x86/kvm/hyperv.c           |  2 +-
 arch/x86/kvm/svm.c              | 10 +++++++++-
 arch/x86/kvm/vmx.c              | 16 +++++++++++++++-
 arch/x86/kvm/x86.c              | 11 +++++++++++
 include/linux/kvm_host.h        |  3 ++-
 virt/kvm/arm/arm.c              |  5 +++++
 virt/kvm/kvm_main.c             |  4 +++-
 14 files changed, 72 insertions(+), 8 deletions(-)

--
1.8.3.1