From: "Longpeng (Mike)"
To: David Hildenbrand
Subject: Re: [PATCH v2 0/4] KVM: optimize the kvm_vcpu_on_spin
Date: Tue, 8 Aug 2017 19:49:15 +0800
Message-ID: <5989A53B.8070406@huawei.com>
References: <1502165135-4784-1-git-send-email-longpeng2@huawei.com>

On 2017/8/8 19:25, David Hildenbrand wrote:
> On 08.08.2017 06:05, Longpeng(Mike) wrote:
>> This is a simple optimization for kvm_vcpu_on_spin; the
>> main idea is described in patch 1's commit message.
>>
>> I ran some tests based on the RFC version, and the results show
>> that it improves performance slightly.
>>
>> == Geekbench-3.4.1 ==
>> VM1: 8U,4G, vcpu(0...7) 1:1 pinned to pcpu(6...11,18,19),
>>      running Geekbench-3.4.1 *10 runs*
>> VM2/VM3/VM4: same configuration as VM1, with each vcpu's usage
>>      (as seen by top in the guest) stressed to 40%
>>
>> Comparison of each testcase's score (higher is better):
>>
>>                    before     after    improve
>> Integer
>>   single           1176.7    1179.0       0.2%
>>   multi            3459.5    3426.5      -0.9%
>> Float
>>   single           1150.5    1150.9       0.0%
>>   multi            3364.5    3391.9       0.8%
>> Memory(stream)
>>   single           1768.7    1773.1       0.2%
>>   multi            2511.6    2557.2       1.8%
>> Overall
>>   single           1284.2    1286.2       0.2%
>>   multi            3231.4    3238.4       0.2%
>>
>>
>> == kernbench-0.42 ==
>> VM1: 8U,12G, vcpu(0...7) 1:1 pinned to pcpu(6...11,18,19),
>>      running "kernbench -n 10"
>> VM2/VM3/VM4: same configuration as VM1, with each vcpu's usage
>>      (as seen by top in the guest) stressed to 40%
>>
>> Comparison of 'Elapsed Time' (lower is better):
>>
>>                    before     after    improve
>> load -j4           12.762    12.751       0.1%
>> load -j32           9.743     8.955       8.1%
>> load -j             9.688     9.229       4.7%
>>
>>
>> Physical Machine:
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                24
>> On-line CPU(s) list:   0-23
>> Thread(s) per core:    2
>> Core(s) per socket:    6
>> Socket(s):             2
>> NUMA node(s):          2
>> Vendor ID:             GenuineIntel
>> CPU family:            6
>> Model:                 45
>> Model name:            Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>> Stepping:              7
>> CPU MHz:               2799.902
>> BogoMIPS:              5004.67
>> Virtualization:        VT-x
>> L1d cache:             32K
>> L1i cache:             32K
>> L2 cache:              256K
>> L3 cache:              15360K
>> NUMA node0 CPU(s):     0-5,12-17
>> NUMA node1 CPU(s):     6-11,18-23
>>
>> ---
>> Changes since V1:
>>  - split the implementations of s390 & arm. [David]
>>  - refactor the implementations according to the suggestions. [Paolo]
>>
>> Changes since RFC:
>>  - only cache the result for x86. [David & Cornelia & Paolo]
>>  - add performance numbers. [David]
>>  - implement arm/s390. [Christoffer & David]
>>  - refactor the implementations. [me]
>>
>> ---
>> Longpeng(Mike) (4):
>>   KVM: add spinlock optimization framework
>>   KVM: X86: implement the logic for spinlock optimization
>>   KVM: s390: implements the kvm_arch_vcpu_in_kernel()
>>   KVM: arm: implements the kvm_arch_vcpu_in_kernel()
>>
>>  arch/arm/kvm/handle_exit.c      |  2 +-
>>  arch/arm64/kvm/handle_exit.c    |  2 +-
>>  arch/mips/kvm/mips.c            |  6 ++++++
>>  arch/powerpc/kvm/powerpc.c      |  6 ++++++
>>  arch/s390/kvm/diag.c            |  2 +-
>>  arch/s390/kvm/kvm-s390.c        |  6 ++++++
>>  arch/x86/include/asm/kvm_host.h |  5 +++++
>>  arch/x86/kvm/hyperv.c           |  2 +-
>>  arch/x86/kvm/svm.c              | 10 +++++++++-
>>  arch/x86/kvm/vmx.c              | 16 +++++++++++++++-
>>  arch/x86/kvm/x86.c              | 11 +++++++++++
>>  include/linux/kvm_host.h        |  3 ++-
>>  virt/kvm/arm/arm.c              |  5 +++++
>>  virt/kvm/kvm_main.c             |  4 +++-
>>  14 files changed, 72 insertions(+), 8 deletions(-)
>>
>
> I am curious, is there any architecture that allows to trigger
> kvm_vcpu_on_spin(vcpu); while _not_ in kernel mode?

IIUC, on x86/SVM the PAUSE instruction traps to the host no matter
whether the vcpu is in kernel mode or user mode.

>
> I would have guessed that user space should never be allowed to make cpu
> wide decisions (giving up the CPU to the hypervisor).
>
> E.g. s390x diag can only be executed from kernel space. VMX PAUSE is
> only valid from kernel space.

x86/VMX has both "PAUSE exiting" and "PAUSE-loop exiting" (PLE). KVM
only uses PLE, which, as you said, can only trigger from kernel space.
However, "PAUSE exiting" can cause a user-mode vcpu to exit too.
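
To make that concrete: the x86 detection in this series essentially
boils down to a CPL check. Roughly (a simplified sketch rather than the
literal diff; see patch 2 for the real code):

  /* x86: the vcpu is in kernel mode iff the guest is running at CPL 0. */
  bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
  {
          return kvm_x86_ops->get_cpl(vcpu) == 0;
  }

  /*
   * SVM intercepts PAUSE in both user and kernel mode, so the intercept
   * handler computes the "was the spinner in kernel mode" hint itself.
   */
  static int pause_interception(struct vcpu_svm *svm)
  {
          struct kvm_vcpu *vcpu = &svm->vcpu;

          kvm_vcpu_on_spin(vcpu, svm_get_cpl(vcpu) == 0);
          return 1;
  }

On VMX, PLE can only trigger at CPL 0, so its handler can simply pass
the hint as constant true.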
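
That "PAUSE exiting" case is also why (to your question below)
"me_in_kernel" is not always true on x86, and why patch 1 threads it
through as a parameter. The framework then uses the hint to filter the
yield candidates, along these lines (again a sketch, using the
"me_in_kernel" name from your mail; the elided lines are the existing
directed-yield logic in virt/kvm/kvm_main.c):

  void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool me_in_kernel)
  {
          struct kvm *kvm = me->kvm;
          struct kvm_vcpu *vcpu;
          int i;
          ...
          kvm_for_each_vcpu(i, vcpu, kvm) {
                  ...
                  /*
                   * A vcpu that was spinning in kernel mode should only
                   * yield to vcpus that are themselves in kernel mode;
                   * a user-mode spinner does not restrict the candidates.
                   */
                  if (me_in_kernel && !kvm_arch_vcpu_in_kernel(vcpu))
                          continue;
                  ...
          }
          ...
  }

Architectures where the exit can only happen from kernel mode can
simply pass true at the call site.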

>
> I.o.w. do we need a parameter to kvm_vcpu_on_spin(vcpu); at all, or is
> "me_in_kernel" basically always true?
>

--
Regards,
Longpeng(Mike)