Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods
From: "Li, Aubrey"
Date: Tue, 18 Jul 2017 11:14:57 +0800
To: Peter Zijlstra, Andi Kleen
Cc: Frederic Weisbecker, Christoph Lameter, Aubrey Li, tglx@linutronix.de, len.brown@intel.com, rjw@rjwysocki.net, tim.c.chen@linux.intel.com, arjan@linux.intel.com, paulmck@linux.vnet.ibm.com, yang.zhang.wz@gmail.com, x86@kernel.org, linux-kernel@vger.kernel.org, daniel.lezcano@linaro.org
Message-ID: <34371ef8-b8bc-d2bf-93de-3fccd6beb032@linux.intel.com>
In-Reply-To: <20170717192309.ubn5muvc3u7htuaw@hirez.programming.kicks-ass.net>

On 2017/7/18 3:23, Peter Zijlstra wrote:
> On Fri, Jul 14, 2017 at 09:26:19AM -0700, Andi Kleen wrote:
>>> And as said; Daniel has been working on a better predictor -- now he's
>>> probably not used it on the network workload you're looking at, so that
>>> might be something to consider.
>>
>> Deriving a better idle predictor is a bit orthogonal to fast idle.
>
> No. If you want a different C state selected we need to fix the current
> C state selector. We're not going to tinker.
>
> And the predictor is probably the most fundamental part of the whole C
> state selection logic.
>
> Now I think the problem is that the current predictor goes for an
> average idle duration. This means that we, on average, get it wrong 50%
> of the time. For performance that's bad.
>
> If you want to improve the worst case, we need to consider a cumulative
> distribution function, and staying with the Gaussian assumption already
> present, that would mean using:
>
>            1              x - mu
>   CDF(x) = - [ 1 + erf(-------------) ]
>            2           sigma sqrt(2)
>
> Where, per the normal convention mu is the average and sigma^2 the
> variance. See also:
>
>   https://en.wikipedia.org/wiki/Normal_distribution
>
> We then solve CDF(x) = n% to find the x for which we get it wrong n% of
> the time (IIRC something like: 'mu - 2sigma' ends up being 5% or so).
>
> This conceptually gets us better exit latency for the cases where we got
> it wrong before, and practically pushes down the estimate which gets us
> C1 longer.
>
> Of course, this all assumes a Gaussian distribution to begin with, if we
> get bimodal (or worse) distributions we can still get it wrong. To fix
> that, we'd need to do something better than what we currently have.
>
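To make the quantile idea in the quoted text concrete, here is a minimal sketch in Python. It uses floating point, so it only illustrates the math and is not something that could go into the kernel as-is. It evaluates the Gaussian CDF from the formula above with `math.erf` and solves CDF(x) = p by bisection, since the standard library has no inverse erf. The function names and the example statistics (mu = 100 us, sigma = 20 us) are hypothetical. For reference, for a normal distribution the point mu - 2sigma corresponds to roughly 2.3%, and the 5% point lies at about mu - 1.645sigma.

```python
import math


def gauss_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF(x) = 1/2 * [1 + erf((x - mu) / (sigma * sqrt(2)))]."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gauss_quantile(p: float, mu: float, sigma: float, iters: int = 200) -> float:
    """Solve CDF(x) = p for x by bisection.

    The CDF is monotonically increasing, so [mu - 10*sigma, mu + 10*sigma]
    brackets the root for any p that is not absurdly close to 0 or 1.
    """
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gauss_cdf(mid, mu, sigma) < p:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical idle-duration statistics, in microseconds.
    mu, sigma = 100.0, 20.0
    print(f"CDF(mu)           = {gauss_cdf(mu, mu, sigma):.3f}")
    print(f"CDF(mu - 2*sigma) = {gauss_cdf(mu - 2 * sigma, mu, sigma):.4f}")
    print(f"x with CDF(x)=5%  = {gauss_quantile(0.05, mu, sigma):.2f} us")
```

Running it prints CDF(mu) = 0.500, which is the 50% miss rate of a predictor that targets the mean, while the 5% estimate comes out at about 67.1 us, well below mu; picking a smaller n pushes the estimate lower still.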
Maybe you are talking about applying some machine learning algorithm online to fit a multivariate normal distribution :)

Well, back to the problem: when the scheduler picks the idle thread, it does not look at the history, nor does it make a prediction. So it may have to switch back to a task almost immediately after it starts going idle (this is very common under some workloads). That is, the overhead of entering and leaving idle exceeds the idle period itself:

    (idle_entry + idle_exit) > idle

If the system has multiple hardware idle states, then the combined software and hardware overhead can exceed the time actually spent in the hardware sleep state:

    (idle_entry + idle_exit + HW_entry + HW_exit) > HW_sleep

So ultimately we want the idle path to be lighter than it is now. A complex predictor may be highly accurate, but its cost could be high as well. We need a tradeoff here IMHO.

I'll look at Daniel's work to understand how/if it's better than the menu governor.

Thanks,
-Aubrey
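The break-even condition in the reply above can be sketched as follows. This is a minimal Python illustration, assuming that the predicted idle duration stands in for HW_sleep and that hardware states are listed from shallowest to deepest with increasing latencies. The `HwIdleState` type, the helper names, and all latency numbers are hypothetical and are not taken from the menu governor or any real hardware.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HwIdleState:
    """A hypothetical hardware idle state with made-up latencies."""
    name: str
    hw_entry_us: float
    hw_exit_us: float


# Hypothetical table, ordered shallow -> deep with increasing latencies.
STATES = [
    HwIdleState("shallow", 1.0, 1.0),
    HwIdleState("deep", 40.0, 60.0),
]


def overhead_dominates(idle_entry_us: float, idle_exit_us: float,
                       sleep_us: float,
                       state: Optional[HwIdleState] = None) -> bool:
    """Without a HW state: (idle_entry + idle_exit) > idle.
    With one: (idle_entry + idle_exit + HW_entry + HW_exit) > HW_sleep."""
    cost = idle_entry_us + idle_exit_us
    if state is not None:
        cost += state.hw_entry_us + state.hw_exit_us
    return cost > sleep_us


def deepest_worthwhile(states: Sequence[HwIdleState], idle_entry_us: float,
                       idle_exit_us: float,
                       predicted_sleep_us: float) -> Optional[HwIdleState]:
    """Return the deepest state whose total overhead does not exceed the
    predicted sleep, or None if no state pays off."""
    best = None
    for state in states:  # relies on the shallow -> deep ordering
        if not overhead_dominates(idle_entry_us, idle_exit_us,
                                  predicted_sleep_us, state):
            best = state
    return best


if __name__ == "__main__":
    # Same 20 us idle period, heavy vs. light software idle path.
    for entry, exit_ in ((10.0, 10.0), (2.0, 2.0)):
        choice = deepest_worthwhile(STATES, entry, exit_, 20.0)
        print(f"sw path {entry}+{exit_} us ->", choice.name if choice else "none")
```

With these made-up numbers, a 20 us idle period does not justify even the shallow state when the software path costs 10 + 10 us, but it does when the path costs 2 + 2 us. This is the sense in which a lighter idle path, and not only a better predictor, changes the outcome.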