Subject: Re: [RFC PATCH v2 0/7] x86/idle: add halt poll support
From: Quan Xu
To: Wanpeng Li
Cc: Yang Zhang, "Michael S. Tsirkin", linux-kernel@vger.kernel.org, kvm,
    Wanpeng Li, Paolo Bonzini, Thomas Gleixner, Radim Krcmar,
    David Matlack, Alexander Graf, Peter Zijlstra, linux-doc@vger.kernel.org
Date: Thu, 14 Sep 2017 17:40:56 +0800

On 2017/9/14 17:19, Wanpeng Li wrote:
> 2017-09-14 16:36 GMT+08:00 Quan Xu:
>> on 2017/9/13 19:56, Yang Zhang wrote:
>>> On 2017/8/29 22:56, Michael S. Tsirkin wrote:
>>>> On Tue, Aug 29, 2017 at 11:46:34AM +0000, Yang Zhang wrote:
>>>>> Some latency-intensive workloads see an obvious performance
>>>>> drop when running inside a VM.
>>>>
>>>> But are we trading a lot of CPU for a bit of lower latency?
>>>>
>>>>> The main reason is that the overhead is amplified when running
>>>>> inside a VM. The biggest cost I have seen is in the idle path.
>>>>>
>>>>> This patch introduces a new mechanism to poll for a while before
>>>>> entering the idle state. If a reschedule is needed during the
>>>>> poll, we don't need to go through the heavy overhead path.
>>>>
>>>> Isn't it the job of an idle driver to find the best way to
>>>> halt the CPU?
>>>>
>>>> It looks like just by adding a cstate we can make it
>>>> halt at higher latencies only. And at lower latencies,
>>>> if it's doing a good job, we can hopefully use mwait to
>>>> stop the CPU.
>>>>
>>>> In fact I have been experimenting with exactly that.
>>>> Some initial results are encouraging, but I could use help
>>>> with testing and especially tuning. If you can help,
>>>> please let me know!
>>>
>>> Quan, can you help test it and share the results? Thanks.
>>>
>> Hi MST,
>>
>> I have tested the patch "intel_idle: add pv cstates when running on
>> kvm" on a recent host that allows guests to execute mwait without an
>> exit. I have also tested our patch "[RFC PATCH v2 0/7] x86/idle: add
>> halt poll support", upstream Linux, and idle=poll.
>>
>> The following are the results (better than ever before, as I ran the
>> test cases on a more powerful machine).
>>
>> For __netperf__, the first column is the transaction rate per second,
>> the second column is CPU utilization.
>>
>> 1. upstream linux
> This "upstream linux" means that the kvm adaptive halt-polling is
> disabled, after confirming with Xu Quan.

upstream linux -- the source code is just upstream Linux, without our
patch or MST's patch. And yes, we disable KVM halt-polling
(halt_poll_ns=0) for _all_ of the following cases.
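For anyone skimming the thread: a minimal sketch of the poll-before-halt
idea described in the quoted patch text above. This is illustrative
only, not the actual patch code -- poll_before_halt() is a made-up name
here; only the halt_poll_threshold tunable comes from the series, and
the includes are indicative:

/* Illustrative sketch of poll-before-halt; not the actual patch code. */
#include <linux/ktime.h>     /* ktime_get(), ktime_sub(), ktime_to_ns() */
#include <linux/sched.h>     /* need_resched() */
#include <asm/processor.h>   /* cpu_relax() */
#include <asm/irqflags.h>    /* safe_halt() */

static unsigned long halt_poll_threshold = 30000;   /* ns, tunable */

static void poll_before_halt(void)
{
	ktime_t start = ktime_get();

	/*
	 * Spin for up to halt_poll_threshold ns. If a task becomes
	 * runnable during the poll window, return immediately and let
	 * the scheduler run it, skipping the expensive entry into (and
	 * exit from) a deep idle state -- the cost that is amplified
	 * when running inside a VM.
	 */
	while (ktime_to_ns(ktime_sub(ktime_get(), start)) < halt_poll_threshold) {
		if (need_resched())
			return;
		cpu_relax();
	}

	/* Poll window expired with nothing to do: halt as usual. */
	safe_halt();
}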
Quan

> Regards,
> Wanpeng Li
>
>> 28371.7 trans/s -- 76.6 %CPU
>>
>> 2. idle=poll
>>
>> 34372 trans/s -- 999.3 %CPU
>>
>> 3. "[RFC PATCH v2 0/7] x86/idle: add halt poll support", with
>> different values of the parameter 'halt_poll_threshold':
>>
>> 28362.7 trans/s --  74.7 %CPU  (halt_poll_threshold=10000)
>> 32949.5 trans/s --  82.5 %CPU  (halt_poll_threshold=20000)
>> 39717.9 trans/s -- 104.1 %CPU  (halt_poll_threshold=30000)
>> 40137.9 trans/s -- 104.4 %CPU  (halt_poll_threshold=40000)
>> 40079.8 trans/s -- 105.6 %CPU  (halt_poll_threshold=50000)
>>
>> 4. "intel_idle: add pv cstates when running on kvm"
>>
>> 33041.8 trans/s -- 999.4 %CPU
>>
>>
>> For __ctxsw__, the first column is the time per process context
>> switch, the second column is CPU utilization.
>>
>> 1. upstream linux
>>
>> 3624.19 ns/ctxsw -- 191.9 %CPU
>>
>> 2. idle=poll
>>
>> 3419.66 ns/ctxsw -- 999.2 %CPU
>>
>> 3. "[RFC PATCH v2 0/7] x86/idle: add halt poll support", with
>> different values of the parameter 'halt_poll_threshold':
>>
>> 1123.40 ns/ctxsw -- 199.6 %CPU  (halt_poll_threshold=10000)
>> 1127.38 ns/ctxsw -- 199.7 %CPU  (halt_poll_threshold=20000)
>> 1113.58 ns/ctxsw -- 199.6 %CPU  (halt_poll_threshold=30000)
>> 1117.12 ns/ctxsw -- 199.6 %CPU  (halt_poll_threshold=40000)
>> 1121.62 ns/ctxsw -- 199.6 %CPU  (halt_poll_threshold=50000)
>>
>> 4. "intel_idle: add pv cstates when running on kvm"
>>
>> 3427.59 ns/ctxsw -- 999.4 %CPU
>>
>> -Quan
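P.S. The thread does not name the exact __ctxsw__ tool. For anyone
trying to reproduce numbers of this shape, a typical way to measure
"ns per context switch" is a pipe ping-pong between two processes
pinned to one CPU -- a sketch of that style of benchmark, not the tool
used for the results above:

/* Pipe ping-pong microbenchmark: each iteration forces two context
 * switches (parent -> child -> parent). Pin both processes to one CPU
 * (e.g. run under "taskset -c 0") for a meaningful number. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <time.h>

#define ITERS 100000

static long long now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	int p1[2], p2[2];
	char b = 0;

	if (pipe(p1) || pipe(p2))
		return 1;

	if (fork() == 0) {
		/* Child: echo each byte back, forcing a switch per hop. */
		for (int i = 0; i < ITERS; i++) {
			read(p1[0], &b, 1);
			write(p2[1], &b, 1);
		}
		exit(0);
	}

	long long start = now_ns();
	for (int i = 0; i < ITERS; i++) {
		write(p1[1], &b, 1);
		read(p2[0], &b, 1);
	}
	long long elapsed = now_ns() - start;
	wait(NULL);

	/* Two switches per round trip. */
	printf("%.2f ns/ctxsw\n", (double)elapsed / (2.0 * ITERS));
	return 0;
}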