From: Arjan van de Ven
Date: Mon, 04 Jun 2012 09:53:55 -0700
To: Peter Zijlstra
CC: Vladimir Davydov, Ingo Molnar, Len Brown, Andrew Morton, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cpuidle: menu: use nr_running instead of cpuload for calculating perf mult
Message-ID: <4FCCE823.8090700@linux.intel.com>

> > False, you can have 0 idle time and still have low load.

1 is not low in this context fwiw.

> >> but because idle time tends to be bursty, we can still be idle for,
> >> say, a millisecond every 10 milliseconds. In this scenario, the load
> >> average is used to ensure that the 200 usecond cost of exiting idle
> >> is acceptable.
>
> So what you're saying is that if you have 1ms idle in 10ms, it might not
> be a continuous 1ms.
> And you're using load as a measure of how many fragments it comes
> apart in?

No, what I'm saying is that a workload with 10 msec of work, then 1 msec
of idle, then 10 msec of work, 1 msec of idle, etc. etc., is very
different from 100 msec of work, 10 msec of idle, 100 msec of work...
even though the utilization is the same.

What the logic is trying to do, at the 10 km level, is to limit the
damage of accumulated C state exit time. (I'll avoid the word "latency"
here, since the real time people will then immediately think this is
about controlling latency response, which it isn't.)

Now, if you're very idle for a sustained duration (e.g. low load),
you're assumed not to be sensitive to a bit of performance cost. But if
you're actually busy (over a longer period, not just "right now"),
you're assumed to be sensitive to the performance cost, and what the
algorithm does is make it harder to go into the expensive states.

The closest metric we have right now to "sensitive to performance cost"
that I know of is the load average. If the scheduler has a better
metric, I'd be more than happy to switch the idle selection code over
to it...

Note that the idle selection code weighs 3 metrics; this is only one of
them:

1. PM_QOS latency tolerance
2. Energy break even
3. Performance tolerance