Subject: Re: [PATCH v2 0/4] KVM: optimize the kvm_vcpu_on_spin
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat GmbH
To: "Longpeng(Mike)", pbonzini@redhat.com, rkrcmar@redhat.com
Cc: agraf@suse.com, borntraeger@de.ibm.com, cohuck@redhat.com,
    christoffer.dall@linaro.org, marc.zyngier@arm.com,
    james.hogan@imgtec.com, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, weidong.huang@huawei.com,
    arei.gonglei@huawei.com, wangxinxin.wang@huawei.com,
    longpeng.mike@gmail.com
Date: Tue, 8 Aug 2017 13:25:31 +0200
In-Reply-To: <1502165135-4784-1-git-send-email-longpeng2@huawei.com>
References: <1502165135-4784-1-git-send-email-longpeng2@huawei.com>

On 08.08.2017 06:05, Longpeng(Mike) wrote:
> This is a simple optimization for kvm_vcpu_on_spin; the
> main idea is described in patch 1's commit message.
>
> I ran some tests based on the RFC version; the results show
> that it improves performance slightly.
>
> == Geekbench-3.4.1 ==
> VM1: 8U,4G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19),
>      running Geekbench-3.4.1 *10 runs*
> VM2/VM3/VM4: configuration is the same as VM1,
>      stress each vcpu usage (seen by top in guest) to 40%
>
> The comparison of each testcase's score:
> (higher is better)
>                     before      after     improve
> Integer
>   single            1176.7     1179.0        0.2%
>   multi             3459.5     3426.5       -0.9%
> Float
>   single            1150.5     1150.9        0.0%
>   multi             3364.5     3391.9        0.8%
> Memory(stream)
>   single            1768.7     1773.1        0.2%
>   multi             2511.6     2557.2        1.8%
> Overall
>   single            1284.2     1286.2        0.2%
>   multi             3231.4     3238.4        0.2%
>
>
> == kernbench-0.42 ==
> VM1: 8U,12G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19),
>      running "kernbench -n 10"
> VM2/VM3/VM4: configuration is the same as VM1,
>      stress each vcpu usage (seen by top in guest) to 40%
>
> The comparison of 'Elapsed Time':
> (lower is better)
>                     before      after     improve
> load -j4            12.762     12.751        0.1%
> load -j32            9.743      8.955        8.1%
> load -j              9.688      9.229        4.7%
>
>
> Physical Machine:
>   Architecture:          x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Byte Order:            Little Endian
>   CPU(s):                24
>   On-line CPU(s) list:   0-23
>   Thread(s) per core:    2
>   Core(s) per socket:    6
>   Socket(s):             2
>   NUMA node(s):          2
>   Vendor ID:             GenuineIntel
>   CPU family:            6
>   Model:                 45
>   Model name:            Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>   Stepping:              7
>   CPU MHz:               2799.902
>   BogoMIPS:              5004.67
>   Virtualization:        VT-x
>   L1d cache:             32K
>   L1i cache:             32K
>   L2 cache:              256K
>   L3 cache:              15360K
>   NUMA node0 CPU(s):     0-5,12-17
>   NUMA node1 CPU(s):     6-11,18-23
>
> ---
> Changes since V1:
>  - split the implementation of s390 & arm. [David]
>  - refactor the impls according to the suggestion. [Paolo]
>
> Changes since RFC:
>  - only cache result for X86. [David & Cornelia & Paolo]
>  - add performance numbers. [David]
>  - impls arm/s390. [Christoffer & David]
>  - refactor the impls. [me]
>
> ---
> Longpeng(Mike) (4):
>   KVM: add spinlock optimization framework
>   KVM: X86: implement the logic for spinlock optimization
>   KVM: s390: implements the kvm_arch_vcpu_in_kernel()
>   KVM: arm: implements the kvm_arch_vcpu_in_kernel()
>
>  arch/arm/kvm/handle_exit.c      |  2 +-
>  arch/arm64/kvm/handle_exit.c    |  2 +-
>  arch/mips/kvm/mips.c            |  6 ++++++
>  arch/powerpc/kvm/powerpc.c      |  6 ++++++
>  arch/s390/kvm/diag.c            |  2 +-
>  arch/s390/kvm/kvm-s390.c        |  6 ++++++
>  arch/x86/include/asm/kvm_host.h |  5 +++++
>  arch/x86/kvm/hyperv.c           |  2 +-
>  arch/x86/kvm/svm.c              | 10 +++++++++-
>  arch/x86/kvm/vmx.c              | 16 +++++++++++++++-
>  arch/x86/kvm/x86.c              | 11 +++++++++++
>  include/linux/kvm_host.h        |  3 ++-
>  virt/kvm/arm/arm.c              |  5 +++++
>  virt/kvm/kvm_main.c             |  4 +++-
>  14 files changed, 72 insertions(+), 8 deletions(-)
>

I am curious: is there any architecture that allows triggering

  kvm_vcpu_on_spin(vcpu);

while _not_ in kernel mode? I would have guessed that user space should
never be allowed to make CPU-wide decisions (giving up the CPU to the
hypervisor).

E.g. s390x diag can only be executed from kernel space, and VMX PAUSE is
only valid from kernel space.

In other words, do we need a parameter to kvm_vcpu_on_spin(vcpu) at all,
or is "me_in_kernel" basically always true?

-- 

Thanks,

David
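
P.S.: To make sure we are talking about the same interface, below is a
minimal, self-contained sketch of how I read the proposed change from the
diffstat above. The names (kvm_arch_vcpu_in_kernel, the "me_in_kernel"
parameter, the candidate loop) and the user-space demo harness are my
assumptions for illustration, not necessarily what the patches actually do:

  /* sketch.c - illustrative only, not the actual kernel code */
  #include <stdbool.h>
  #include <stdio.h>

  struct kvm_vcpu {
          int id;
          bool in_kernel;   /* cached "was in kernel mode on last exit" */
          bool preempted;
  };

  /* per-arch hook the series seems to introduce (assumed name/shape) */
  static bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
  {
          return vcpu->in_kernel;
  }

  /*
   * Directed yield: if the spinning vCPU was in kernel mode, prefer
   * yielding to preempted vCPUs that are also in kernel mode, since
   * only those can hold the contended kernel spinlock.
   */
  static void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool me_in_kernel,
                               struct kvm_vcpu *vcpus, int n)
  {
          for (int i = 0; i < n; i++) {
                  struct kvm_vcpu *cand = &vcpus[i];

                  if (cand == me || !cand->preempted)
                          continue;
                  if (me_in_kernel && !kvm_arch_vcpu_in_kernel(cand))
                          continue;   /* skip user-mode candidates */
                  printf("yield to vcpu %d\n", cand->id);
                  return;
          }
  }

  int main(void)
  {
          struct kvm_vcpu vcpus[3] = {
                  { .id = 0, .in_kernel = true  },
                  { .id = 1, .in_kernel = false, .preempted = true },
                  { .id = 2, .in_kernel = true,  .preempted = true },
          };

          /* e.g. a PAUSE/diag exit handler would then do roughly: */
          kvm_vcpu_on_spin(&vcpus[0], kvm_arch_vcpu_in_kernel(&vcpus[0]),
                           vcpus, 3);   /* prints "yield to vcpu 2" */
          return 0;
  }

If "me_in_kernel" really is always true, the second parameter collapses
into the existing behaviour plus the per-candidate in-kernel check.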