Message-ID: <598D0BB0.2040901@huawei.com>
Date: Fri, 11 Aug 2017 09:43:12 +0800
From: "Longpeng (Mike)" <longpeng2@huawei.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
To: Eric Farman <farman@linux.vnet.ibm.com>
CC: Cornelia Huck <cohuck@redhat.com>, <pbonzini@redhat.com>,
        <rkrcmar@redhat.com>, <agraf@suse.com>, <borntraeger@de.ibm.com>,
        <christoffer.dall@linaro.org>, <marc.zyngier@arm.com>,
        <james.hogan@imgtec.com>, <kvm@vger.kernel.org>,
        <linux-kernel@vger.kernel.org>, <weidong.huang@huawei.com>,
        <arei.gonglei@huawei.com>, <wangxinxin.wang@huawei.com>,
        <longpeng.mike@gmail.com>, <david@redhat.com>
Subject: Re: [PATCH v2 0/4] KVM: optimize the kvm_vcpu_on_spin
References: <1502165135-4784-1-git-send-email-longpeng2@huawei.com> <20170808094153.1b5bf8f4@gondolin> <598972E3.1030807@huawei.com> <65ce708e-480a-6173-f678-d7934c630439@linux.vnet.ibm.com>
In-Reply-To: <65ce708e-480a-6173-f678-d7934c630439@linux.vnet.ibm.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2557
Lines: 83


On 2017/8/10 21:18, Eric Farman wrote:

> 
> 
> On 08/08/2017 04:14 AM, Longpeng (Mike) wrote:
>>
>>
>> On 2017/8/8 15:41, Cornelia Huck wrote:
>>
>>> On Tue, 8 Aug 2017 12:05:31 +0800
>>> "Longpeng(Mike)" <longpeng2@huawei.com> wrote:
>>>
>>>> This is a simple optimization for kvm_vcpu_on_spin, the
>>>> main idea is described in patch-1's commit msg.
>>>
>>> I think this generally looks good now.
>>>
>>>>
>>>> I did some tests based on the RFC version; the results show
>>>> that it can improve the performance slightly.
>>>
>>> Did you re-run tests on this version?
>>
>>
>> Hi Cornelia,
>>
>> I didn't re-run tests on V2. But the major difference between RFC and V2
>> is that V2 only caches the result for X86 (s390/arm don't need it) and V2
>> saves an expensive operation (440-1400 cycles on my test machine) for X86/VMX.
>>
>> So I think V2's performance is at least the same as RFC or even slightly
>> better. :)
>>
>>>
>>> I would also like to see some s390 numbers; unfortunately I only have a
>>> z/VM environment and any performance numbers would be nearly useless
>>> there. Maybe somebody within IBM with a better setup can run a quick
>>> test?
> 
> Won't swear I didn't screw something up, but here are some quick numbers. Host
> was 4.12.0 with and without this series, running QEMU 2.10.0-rc0. Created 4
> guests, each with 4 CPUs (unpinned) and 4GB RAM. VM1 did full kernel compiles
> with kernbench, which took averages of 5 runs of different job sizes (I threw
> away the "-j 1" numbers). VM2-VM4 ran CPU burners on 2 of their 4 CPUs.
> 
> Numbers from VM1 kernbench output, and the delta between runs:
> 
> load -j 3               before       after       delta
> Elapsed Time           183.178      182.58      -0.598
> User Time              534.19       531.52      -2.67
> System Time             32.538       33.37       0.832
> Percent CPU            308.8        309          0.2
> Context Switches     98484.6      99001        516.4
> Sleeps              227347       228752       1405
> 
> load -j 16              before       after       delta
> Elapsed Time           153.352      147.59      -5.762
> User Time              545.829      533.41     -12.419
> System Time             34.289       34.85       0.561
> Percent CPU            347.6        348          0.4
> Context Switches    160518       159120      -1398
> Sleeps              240740       240536       -204
> 


Thanks Eric!

The `Elapsed Time` is smaller with this series, which is consistent with the
numbers in my cover letter.
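
To put the deltas in proportion, here is a rough sketch that turns the kernbench
numbers quoted above into relative changes. The dict layout and the helper name
pct_change are only illustrative, they are not part of kernbench or of this
series.

```python
# Relative change in the kernbench metrics, using the numbers quoted above.
# RESULTS and pct_change are illustrative names, not kernbench output fields.

RESULTS = {
    "load -j 3": {
        "Elapsed Time": (183.178, 182.58),
        "User Time": (534.19, 531.52),
        "System Time": (32.538, 33.37),
    },
    "load -j 16": {
        "Elapsed Time": (153.352, 147.59),
        "User Time": (545.829, 533.41),
        "System Time": (34.289, 34.85),
    },
}


def pct_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


for load, metrics in RESULTS.items():
    print(load)
    for name, (before, after) in metrics.items():
        delta = after - before
        print(f"  {name:<12} {delta:+9.3f}  ({pct_change(before, after):+.2f}%)")
```

By this arithmetic the elapsed time drops by roughly 0.3% for "-j 3" and roughly
3.8% for "-j 16". These are averages from a single setup, so I would read them
as a trend rather than a precise measurement.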

> 
>  - Eric
> 
> 
> .
> 


-- 
Regards,
Longpeng(Mike)