Subject: Re: [PATCH v2 0/4] KVM: optimize the kvm_vcpu_on_spin
From: David Hildenbrand
Organization: Red Hat GmbH
Date: Tue, 8 Aug 2017 13:50:57 +0200
To: "Longpeng (Mike)"
Cc: pbonzini@redhat.com, rkrcmar@redhat.com, agraf@suse.com,
 borntraeger@de.ibm.com, cohuck@redhat.com, christoffer.dall@linaro.org,
 marc.zyngier@arm.com, james.hogan@imgtec.com, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, weidong.huang@huawei.com,
 arei.gonglei@huawei.com, wangxinxin.wang@huawei.com, longpeng.mike@gmail.com
Message-ID: <1187734c-7bf4-60fc-306c-6e498fc3d4c4@redhat.com>
In-Reply-To: <5989A53B.8070406@huawei.com>
References: <1502165135-4784-1-git-send-email-longpeng2@huawei.com>
 <5989A53B.8070406@huawei.com>

On 08.08.2017 13:49, Longpeng (Mike) wrote:
>
>
> On 2017/8/8 19:25, David Hildenbrand wrote:
>
>> On 08.08.2017 06:05, Longpeng(Mike) wrote:
>>> This is a simple optimization for
kvm_vcpu_on_spin; the
>>> main idea is described in patch 1's commit message.
>>>
>>> I did some tests based on the RFC version; the results show
>>> that it improves performance slightly.
>>>
>>> == Geekbench-3.4.1 ==
>>> VM1: 8U,4G, vcpu(0...7) 1:1 pinned to pcpu(6...11,18,19),
>>>      running Geekbench-3.4.1, 10 runs
>>> VM2/VM3/VM4: configuration is the same as VM1,
>>>      each vcpu's usage (seen by top in the guest) stressed to 40%
>>>
>>> The comparison of each testcase's score:
>>> (higher is better)
>>>                     before      after    improve
>>> Integer
>>>   single            1176.7     1179.0       0.2%
>>>   multi             3459.5     3426.5      -0.9%
>>> Float
>>>   single            1150.5     1150.9       0.0%
>>>   multi             3364.5     3391.9       0.8%
>>> Memory(stream)
>>>   single            1768.7     1773.1       0.2%
>>>   multi             2511.6     2557.2       1.8%
>>> Overall
>>>   single            1284.2     1286.2       0.2%
>>>   multi             3231.4     3238.4       0.2%
>>>
>>>
>>> == kernbench-0.42 ==
>>> VM1: 8U,12G, vcpu(0...7) 1:1 pinned to pcpu(6...11,18,19),
>>>      running "kernbench -n 10"
>>> VM2/VM3/VM4: configuration is the same as VM1,
>>>      each vcpu's usage (seen by top in the guest) stressed to 40%
>>>
>>> The comparison of 'Elapsed Time':
>>> (lower is better)
>>>                   before      after    improve
>>> load -j4          12.762     12.751       0.1%
>>> load -j32          9.743      8.955       8.1%
>>> load -j            9.688      9.229       4.7%
>>>
>>>
>>> Physical Machine:
>>> Architecture:          x86_64
>>> CPU op-mode(s):        32-bit, 64-bit
>>> Byte Order:            Little Endian
>>> CPU(s):                24
>>> On-line CPU(s) list:   0-23
>>> Thread(s) per core:    2
>>> Core(s) per socket:    6
>>> Socket(s):             2
>>> NUMA node(s):          2
>>> Vendor ID:             GenuineIntel
>>> CPU family:            6
>>> Model:                 45
>>> Model name:            Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>>> Stepping:              7
>>> CPU MHz:               2799.902
>>> BogoMIPS:              5004.67
>>> Virtualization:        VT-x
>>> L1d cache:             32K
>>> L1i cache:             32K
>>> L2 cache:              256K
>>> L3 cache:              15360K
>>> NUMA node0 CPU(s):     0-5,12-17
>>> NUMA node1 CPU(s):     6-11,18-23
>>>
>>> ---
>>> Changes since V1:
>>>  - split the implementation of s390 & arm. [David]
>>>  - refactor the impls according to the suggestion.
[Paolo]
>>>
>>> Changes since RFC:
>>>  - only cache the result for x86. [David & Cornelia & Paolo]
>>>  - add performance numbers. [David]
>>>  - impls arm/s390. [Christoffer & David]
>>>  - refactor the impls. [me]
>>>
>>> ---
>>> Longpeng(Mike) (4):
>>>   KVM: add spinlock optimization framework
>>>   KVM: X86: implement the logic for spinlock optimization
>>>   KVM: s390: implements the kvm_arch_vcpu_in_kernel()
>>>   KVM: arm: implements the kvm_arch_vcpu_in_kernel()
>>>
>>>  arch/arm/kvm/handle_exit.c      |  2 +-
>>>  arch/arm64/kvm/handle_exit.c    |  2 +-
>>>  arch/mips/kvm/mips.c            |  6 ++++++
>>>  arch/powerpc/kvm/powerpc.c      |  6 ++++++
>>>  arch/s390/kvm/diag.c            |  2 +-
>>>  arch/s390/kvm/kvm-s390.c        |  6 ++++++
>>>  arch/x86/include/asm/kvm_host.h |  5 +++++
>>>  arch/x86/kvm/hyperv.c           |  2 +-
>>>  arch/x86/kvm/svm.c              | 10 +++++++++-
>>>  arch/x86/kvm/vmx.c              | 16 +++++++++++++++-
>>>  arch/x86/kvm/x86.c              | 11 +++++++++++
>>>  include/linux/kvm_host.h        |  3 ++-
>>>  virt/kvm/arm/arm.c              |  5 +++++
>>>  virt/kvm/kvm_main.c             |  4 +++-
>>>  14 files changed, 72 insertions(+), 8 deletions(-)
>>>
>>
>> I am curious: is there any architecture that allows triggering
>> kvm_vcpu_on_spin(vcpu) while _not_ in kernel mode?
>
>
> IIUC, x86/SVM traps to the host on a PAUSE instruction regardless of
> whether the vcpu is in kernel mode or user mode.
>
>>
>> I would have guessed that user space should never be allowed to make
>> CPU-wide decisions (giving up the CPU to the hypervisor).
>>
>> E.g. the s390x diag can only be executed from kernel space. VMX PAUSE
>> is only valid from kernel space.
>
>
> x86/VMX has both "PAUSE exiting" and "PAUSE-loop exiting" (PLE). KVM
> only uses PLE, which is, as you said, only triggered from kernel space.
>
> However, "PAUSE exiting" can cause a user-mode vcpu to exit too.

Thanks Longpeng and Christoffer!

-- 
Thanks,

David
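[Editor's note, appended for readers of the archive] The optimization the series discusses can be sketched as a small user-space model: when a spinning vcpu causes a PLE exit, kvm_vcpu_on_spin() looks for another vcpu to yield to, and the new per-arch hook lets it skip vcpus that were running in user mode (a user-mode vcpu cannot hold the contended kernel spinlock). The struct and the `yield_candidate` helper below are simplified stand-ins, not the actual kernel code; only the name `kvm_arch_vcpu_in_kernel` comes from the series itself.

```c
/* Simplified model of the directed-yield candidate filter discussed
 * in this thread.  Types and the yield_candidate() helper are
 * illustrative stand-ins, not kernel code. */
#include <stdbool.h>

struct vcpu {
    int  vcpu_id;
    bool preempted;      /* was this vcpu preempted by the host? */
    bool in_kernel;      /* was the guest in kernel mode at exit time? */
    bool dy_eligible;    /* directed-yield eligibility heuristic */
};

/* Stand-in for the per-arch hook the series introduces. */
static bool kvm_arch_vcpu_in_kernel(const struct vcpu *v)
{
    return v->in_kernel;
}

/* Returns true if @v is a plausible target for a directed yield from
 * a spinning vcpu: it must have been preempted, must pass the
 * eligibility heuristic, and -- the new check in this series, applied
 * only when @yield_to_kernel_mode is set -- must have been running in
 * kernel mode, since a user-mode vcpu cannot hold the spinlock that
 * the yielding vcpu is waiting on. */
static bool yield_candidate(const struct vcpu *v, bool yield_to_kernel_mode)
{
    if (!v->preempted)
        return false;
    if (yield_to_kernel_mode && !kvm_arch_vcpu_in_kernel(v))
        return false;
    return v->dy_eligible;
}
```

When `yield_to_kernel_mode` is false (e.g. an exit where user-mode targets are still useful, as with SVM's unconditional PAUSE intercept mentioned above), the user-mode check is simply skipped and the old behavior is preserved.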