Subject: Re: [RFC PATCH v2 0/7] x86/idle: add halt poll support
From: Alexander Graf
To: Yang Zhang, linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, wanpeng.li@hotmail.com, mst@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, rkrcmar@redhat.com,
    dmatlack@google.com, peterz@infradead.org, linux-doc@vger.kernel.org
Date: Tue, 29 Aug 2017 13:58:12 +0200
Message-ID: <6ba7f198-4403-c9d1-f0be-7069cc8cd421@suse.de>
In-Reply-To: <1504007201-12904-1-git-send-email-yang.zhang.wz@gmail.com>

On 08/29/2017 01:46 PM, Yang Zhang wrote:
> Some latency-intensive workloads see an obvious performance drop when
> running inside a VM. The main reason is that the overhead is amplified
> when running inside a VM; the largest cost I have seen is in the idle
> path.
>
> This patch introduces a new mechanism to poll for a while before
> entering the idle state. If a reschedule is needed during the poll, we
> don't need to go through the heavy overhead path.
>
> Here is the data we get when running the context-switch benchmark to
> measure latency (lower is better):
>
> 1. w/o patch:
>    2493.14 ns/ctxsw -- 200.3 %CPU
>
> 2. w/ patch:
>    halt_poll_threshold=10000  -- 1485.96 ns/ctxsw -- 201.0 %CPU
>    halt_poll_threshold=20000  -- 1391.26 ns/ctxsw -- 200.7 %CPU
>    halt_poll_threshold=30000  -- 1488.55 ns/ctxsw -- 200.1 %CPU
>    halt_poll_threshold=500000 -- 1159.14 ns/ctxsw -- 201.5 %CPU
>
> 3. kvm dynamic poll:
>    halt_poll_ns=10000  -- 2296.11 ns/ctxsw -- 201.2 %CPU
>    halt_poll_ns=20000  -- 2599.7  ns/ctxsw -- 201.7 %CPU
>    halt_poll_ns=30000  -- 2588.68 ns/ctxsw -- 211.6 %CPU
>    halt_poll_ns=500000 -- 2423.20 ns/ctxsw -- 229.2 %CPU
>
> 4. idle=poll:
>    2050.1 ns/ctxsw -- 1003 %CPU
>
> 5. idle=mwait:
>    2188.06 ns/ctxsw -- 206.3 %CPU

Could you please try to create another metric for guest-initiated,
host-aborted mwait?

For a quick benchmark, reserve 4 registers for a magic value and set
them to that magic value before you enter MWAIT in the guest. Then
allow native MWAIT execution on the host. If you see that the guest
wants to enter with the 4 registers containing the magic contents and
no events are pending, go directly into the vcpu block function on the
host.

That way, any time a guest gets naturally aborted while in mwait, it
will only re-enter mwait when an event has actually occurred. While the
guest is running normally (and nobody else wants to run on the host),
we just stay in guest context, but with a sleeping CPU.

Overall, that might give us even better performance, as it allows
turbo boost and HT to work properly.

Alex
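
---

For reference, a minimal sketch of the poll-before-halt idea described
in the cover letter above. halt_poll_threshold is the tunable named in
the benchmark numbers; the function name, the default value, and the
exact loop body are illustrative assumptions, not the actual patch:

#include <linux/kernel.h>
#include <linux/sched.h>		/* need_resched() */
#include <linux/timekeeping.h>		/* ktime_get_ns() */
#include <asm/processor.h>		/* cpu_relax() */
#include <asm/irqflags.h>		/* safe_halt() */

/* Tunable from the benchmarks above, in ns; the default is made up. */
static unsigned long halt_poll_threshold = 20000;

/*
 * Illustrative idle entry: spin briefly before really halting, so a
 * wakeup that arrives right away is handled without paying the heavy
 * idle-path (and, inside a guest, VM-exit) cost.
 */
static void poll_then_halt(void)
{
	u64 start = ktime_get_ns();

	while (ktime_get_ns() - start < halt_poll_threshold) {
		if (need_resched())
			return;		/* work arrived during the poll window */
		cpu_relax();		/* be polite to the sibling HT thread */
	}

	safe_halt();			/* nothing pending: take the normal HLT path */
}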
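And a rough host-side sketch of the guest-initiated, host-aborted MWAIT
handling suggested in the reply, assuming KVM's existing
kvm_register_read(), kvm_arch_vcpu_runnable() and kvm_vcpu_block()
helpers; the hook point, the choice of r12-r15, and the magic value are
made up for illustration (the reply only says "reserve 4 registers"):

#include <linux/kvm_host.h>
#include "kvm_cache_regs.h"		/* kvm_register_read(), x86 KVM tree */

#define IDLE_MWAIT_MAGIC 0x1d1e1d1e1d1e1d1eULL	/* arbitrary marker value */

/* The guest would set these four registers to the magic value right
 * before executing MWAIT from its idle loop, and to anything else
 * otherwise, so the host can tell an idle MWAIT from any other use. */
static bool guest_flagged_idle_mwait(struct kvm_vcpu *vcpu)
{
	return kvm_register_read(vcpu, VCPU_REGS_R12) == IDLE_MWAIT_MAGIC &&
	       kvm_register_read(vcpu, VCPU_REGS_R13) == IDLE_MWAIT_MAGIC &&
	       kvm_register_read(vcpu, VCPU_REGS_R14) == IDLE_MWAIT_MAGIC &&
	       kvm_register_read(vcpu, VCPU_REGS_R15) == IDLE_MWAIT_MAGIC;
}

/* Hypothetical hook, run when a natively executing guest MWAIT was
 * aborted and the host regained control: if the guest marked this as an
 * idle MWAIT and nothing is pending, block the vCPU so it only re-enters
 * the guest (and thus MWAIT) once an event actually occurs. */
static int handle_aborted_idle_mwait(struct kvm_vcpu *vcpu)
{
	if (guest_flagged_idle_mwait(vcpu) && !kvm_arch_vcpu_runnable(vcpu))
		kvm_vcpu_block(vcpu);	/* sleep until an event arrives */

	return 1;			/* resume the guest */
}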