From: Arjan van de Ven
Date: Mon, 04 Jun 2012 09:53:55 -0700
To: Peter Zijlstra
CC: Vladimir Davydov, Ingo Molnar, Len Brown, Andrew Morton, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cpuidle: menu: use nr_running instead of cpuload for calculating perf mult
Message-ID: <4FCCE823.8090700@linux.intel.com>

> > False, you can have 0 idle time and still have low load.

1 is not low in this context fwiw.

> >> but because idle time tends to be bursty, we can still be idle for,
> >> say, a millisecond every 10 milliseconds. In this scenario, the load
> >> average is used to ensure that the 200 usecond cost of exiting idle
> >> is acceptable.
>
> So what you're saying is that if you have 1ms idle in 10ms, it might not
> be a continuous 1ms.
> And you're using load as a measure of how many fragments it comes
> apart in?

No, what I'm saying is that a workload with 10 msec of work, then 1 msec
of idle, then 10 msec of work, 1 msec of idle, etc. etc., is very
different from 100 msec of work, 10 msec of idle, 100 msec of work...
even though the utilization is the same.

What the logic is trying to do, at the 10 km level, is to limit the
damage of accumulated C state exit time. (I'll avoid the word "latency"
here, since the real time people will then immediately think this is
about controlling latency response, which it isn't.)

Now, if you're very idle for a sustained duration (e.g. low load),
you're assumed not to be sensitive to a bit of performance cost. But if
you're actually busy (over a longer period, not just "right now"),
you're assumed to be sensitive to the performance cost, and what the
algorithm does is make it harder to go into the expensive states.

The closest metric we have right now to "sensitive to performance cost"
that I know of is the load average. If the scheduler has a better
metric, I'd be more than happy to switch the idle selection code over
to it...

Note that the idle selection code weighs 3 metrics; this is only one of
them:

1. PM_QOS latency tolerance
2. Energy break even
3. Performance tolerance