Message-ID: <598D0BB0.2040901@huawei.com>
Date: Fri, 11 Aug 2017 09:43:12 +0800
From: "Longpeng (Mike)"
To: Eric Farman
CC: Cornelia Huck
Subject: Re: [PATCH v2 0/4] KVM: optimize the kvm_vcpu_on_spin
References: <1502165135-4784-1-git-send-email-longpeng2@huawei.com>
 <20170808094153.1b5bf8f4@gondolin>
 <598972E3.1030807@huawei.com>
 <65ce708e-480a-6173-f678-d7934c630439@linux.vnet.ibm.com>
In-Reply-To: <65ce708e-480a-6173-f678-d7934c630439@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017/8/10 21:18, Eric Farman wrote:

>
>
> On 08/08/2017 04:14 AM, Longpeng (Mike) wrote:
>>
>>
>> On 2017/8/8 15:41, Cornelia Huck wrote:
>>
>>> On Tue, 8 Aug 2017 12:05:31 +0800
>>> "Longpeng(Mike)" wrote:
>>>
>>>> This is a simple optimization for kvm_vcpu_on_spin, the
>>>> main idea is described in patch-1's commit msg.
>>>
>>> I think this generally looks good now.
>>>
>>>>
>>>> I did some tests base on the RFC version, the result shows
>>>> that it can improves the performance slightly.
>>>
>>> Did you re-run tests on this version?
>>
>>
>> Hi Cornelia,
>>
>> I didn't re-run tests on V2. But the major difference between RFC and V2
>> is that V2 only cache result for X86 (s390/arm needn't) and V2 saves a
>> expensive operation ( 440-1400 cycles on my test machine ) for X86/VMX.
>>
>> So I think V2's performance is at least the same as RFC or even slightly
>> better. :)
>>
>>>
>>> I would also like to see some s390 numbers; unfortunately I only have a
>>> z/VM environment and any performance numbers would be nearly useless
>>> there. Maybe somebody within IBM with a better setup can run a quick
>>> test?
>
> Won't swear I didn't screw something up, but here's some quick numbers. Host was
> 4.12.0 with and without this series, running QEMU 2.10.0-rc0. Created 4 guests,
> each with 4 CPU (unpinned) and 4GB RAM. VM1 did full kernel compiles with
> kernbench, which took averages of 5 runs of different job sizes (I threw away
> the "-j 1" numbers). VM2-VM4 ran cpu burners on 2 of their 4 cpus.
>
> Numbers from VM1 kernbench output, and the delta between runs:
>
> load -j 3            before      after      delta
> Elapsed Time         183.178     182.58     -0.598
> User Time            534.19      531.52     -2.67
> System Time          32.538      33.37      0.832
> Percent CPU          308.8       309        0.2
> Context Switches     98484.6     99001      516.4
> Sleeps               227347      228752     1405
>
> load -j 16           before      after      delta
> Elapsed Time         153.352     147.59     -5.762
> User Time            545.829     533.41     -12.419
> System Time          34.289      34.85      0.561
> Percent CPU          347.6       348        0.4
> Context Switches     160518      159120     -1398
> Sleeps               240740      240536     -204
>

Thanks Eric!

The `Elapsed Time` is smaller with this series, which is consistent with
the numbers in my cover letter.

>
> - Eric
>
>
> .
>

-- 
Regards,
Longpeng(Mike)
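
For reference, the relative size of the `Elapsed Time` deltas in Eric's table
can be worked out with a small sketch like the one below. It is illustrative
only: the helper name `pct_change` is hypothetical, and the inputs are simply
the before/after values copied from the kernbench table above.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent.

    Negative values mean the metric got smaller with the series applied.
    """
    return (after - before) / before * 100.0


# Elapsed Time rows from the kernbench table (seconds): (before, after).
elapsed = {
    "-j 3": (183.178, 182.58),
    "-j 16": (153.352, 147.59),
}

for load, (before, after) in elapsed.items():
    print(f"{load}: {pct_change(before, after):+.2f}%")
# -j 3: -0.33%
# -j 16: -3.76%
```

On these numbers the improvement is small for the `-j 3` load and a few
percent for `-j 16`; a single set of averaged runs does not say how much of
that is noise.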