2014-01-15 04:08:21

by Alex Shi

Subject: [RFC PATCH] sched: find the latest idle cpu

Currently we only try to find the least loaded cpu. If several cpus are
idle, we just pick the first one in the cpu mask.

In fact we can pick an idle cpu that is servicing an interrupt, or the
most recently idled cpu, and may then benefit in both latency and power.
The selected cpu may not be the best, since another cpu may be
interrupted while we are selecting. But being more exact would cost too
much.

Signed-off-by: Alex Shi <[email protected]>
---
kernel/sched/fair.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..fb52d26 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4167,6 +4167,26 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!min_load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			s64 latest_wake = 0;
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
+				idlest = i;
+		}
+#endif
 	}

 	return idlest;
--
1.8.1.2


2014-01-15 04:31:33

by Michael wang

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

Hi, Alex

On 01/15/2014 12:07 PM, Alex Shi wrote:
[snip] }
> +#ifdef CONFIG_NO_HZ_COMMON
> + /*
> + * Coarsely to get the latest idle cpu for shorter latency and
> + * possible power benefit.
> + */
> + if (!min_load) {
> + struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
> +
> + s64 latest_wake = 0;

I guess we missed some code for latest_wake here?

Regards,
Michael Wang

> + /* idle cpu doing irq */
> + if (ts->inidle && !ts->idle_active)
> + idlest = i;
> + /* the cpu resched */
> + else if (!ts->inidle)
> + idlest = i;
> + /* find latest idle cpu */
> + else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
> + idlest = i;
> + }
> +#endif
> }
>
> return idlest;
>

2014-01-15 04:49:01

by Alex Shi

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/15/2014 12:31 PM, Michael wang wrote:
> Hi, Alex
>
> On 01/15/2014 12:07 PM, Alex Shi wrote:
> [snip] }
>> +#ifdef CONFIG_NO_HZ_COMMON
>> + /*
>> + * Coarsely to get the latest idle cpu for shorter latency and
>> + * possible power benefit.
>> + */
>> + if (!min_load) {

here should be !load.
>> + struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
>> +
>> + s64 latest_wake = 0;
>
> I guess we missed some code for latest_wake here?

Yes, thanks for the reminder!

so updated patch:

====

From c3a88e73fed3da96549b5a922076e996832685f8 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we only try to find the least loaded cpu. If several cpus are
idle, we just pick the first one in the cpu mask.

In fact we can pick an idle cpu that is servicing an interrupt, or the
most recently idled cpu, and may then benefit in both latency and power.
The selected cpu may not be the best, since another cpu may be
interrupted while we are selecting. But being more exact would cost too
much.

Signed-off-by: Alex Shi <[email protected]>
---
kernel/sched/fair.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..73a2a07 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4167,6 +4167,31 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			s64 latest_wake = 0;
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else {
+				s64 temp = ktime_to_us(ts->idle_entrytime);
+				if (temp > latest_wake) {
+					latest_wake = temp;
+					idlest = i;
+				}
+			}
+		}
+#endif
 	}

 	return idlest;
--
1.8.1.2

--
Thanks
Alex

2014-01-15 04:53:37

by Alex Shi

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/15/2014 12:48 PM, Alex Shi wrote:
> On 01/15/2014 12:31 PM, Michael wang wrote:
>> Hi, Alex
>>
>> On 01/15/2014 12:07 PM, Alex Shi wrote:
>> [snip] }
>>> +#ifdef CONFIG_NO_HZ_COMMON
>>> + /*
>>> + * Coarsely to get the latest idle cpu for shorter latency and
>>> + * possible power benefit.
>>> + */
>>> + if (!min_load) {
>
> here should be !load.
>>> + struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
>>> +
>>> + s64 latest_wake = 0;
>>
>> I guess we missed some code for latest_wake here?
>
> Yes, thanks for reminder!
>
> so updated patch:
>

Oops, still incorrect. Re-updated:

===

From 5d48303b3eb3b5ca7fde54a6dfcab79cff360403 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we only try to find the least loaded cpu. If several cpus are
idle, we just pick the first one in the cpu mask.

In fact we can pick an idle cpu that is servicing an interrupt, or the
most recently idled cpu, and may then benefit in both latency and power.
The selected cpu may not be the best, since another cpu may be
interrupted while we are selecting. But being more exact would cost too
much.

Signed-off-by: Alex Shi <[email protected]>
---
kernel/sched/fair.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..e2c4cd9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4161,12 +4161,38 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)

 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
+		s64 latest_wake = 0;
+
 		load = weighted_cpuload(i);

 		if (load < min_load || (load == min_load && i == this_cpu)) {
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else {
+				s64 temp = ktime_to_us(ts->idle_entrytime);
+				if (temp > latest_wake) {
+					latest_wake = temp;
+					idlest = i;
+				}
+			}
+		}
+#endif
 	}

 	return idlest;
--
1.8.1.2

--
Thanks
Alex

2014-01-15 05:07:09

by Alex Shi

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/15/2014 12:53 PM, Alex Shi wrote:
>>> >> I guess we missed some code for latest_wake here?
>> >
>> > Yes, thanks for reminder!
>> >
>> > so updated patch:
>> >
> ops, still incorrect. re-updated:

I updated the wrong file. Re-re-updated. :(

===

From b75e43bb77df14e2209532c1e5c48e0e03afa414 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we only try to find the least loaded cpu. If several cpus are
idle, we just pick the first one in the cpu mask.

In fact we can pick an idle cpu that is servicing an interrupt, or the
most recently idled cpu, and may then benefit in both latency and power.
The selected cpu may not be the best, since another cpu may be
interrupted while we are selecting. But being more exact would cost too
much.

Signed-off-by: Alex Shi <[email protected]>
---
kernel/sched/fair.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..f82ca3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4159,6 +4159,10 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 	int idlest = -1;
 	int i;

+#ifdef CONFIG_NO_HZ_COMMON
+	s64 latest_wake = 0;
+#endif
+
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
 		load = weighted_cpuload(i);
@@ -4167,6 +4171,30 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else {
+				s64 temp = ktime_to_us(ts->idle_entrytime);
+				if (temp > latest_wake) {
+					latest_wake = temp;
+					idlest = i;
+				}
+			}
+		}
+#endif
 	}

 	return idlest;
--
1.8.1.2

--
Thanks
Alex
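
For readers following the revisions, the selection logic of this last
version can be modelled in user space. This is only a sketch under stated
assumptions: struct cpu_snapshot, pick_latest_idle() and their fields are
hypothetical stand-ins for weighted_cpuload() and the tick_sched fields
the patch reads, not kernel API. It also shows why latest_wake has to be
declared outside the loop, which is what the earlier revisions got wrong.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the per-CPU values the patch reads. */
struct cpu_snapshot {
	unsigned long load;    /* stand-in for weighted_cpuload(i) */
	bool inidle;           /* stand-in for ts->inidle */
	bool idle_active;      /* stand-in for ts->idle_active */
	int64_t idle_entry_us; /* stand-in for ktime_to_us(ts->idle_entrytime) */
};

/* Returns the index of the chosen CPU, or -1 when n is 0. */
int pick_latest_idle(const struct cpu_snapshot *cpus, size_t n, int this_cpu)
{
	unsigned long min_load = ULONG_MAX;
	int64_t latest_wake = 0; /* outside the loop, so it persists */
	int idlest = -1;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		unsigned long load = cpus[i].load;

		/* Original rule: least loaded CPU, preferring this_cpu on ties. */
		if (load < min_load || (load == min_load && (int)i == this_cpu)) {
			min_load = load;
			idlest = (int)i;
		}
		if (load != 0)
			continue;

		if (cpus[i].inidle && !cpus[i].idle_active) {
			/* idle CPU currently servicing an interrupt */
			idlest = (int)i;
		} else if (!cpus[i].inidle) {
			/* CPU has already left idle, e.g. to reschedule */
			idlest = (int)i;
		} else if (cpus[i].idle_entry_us > latest_wake) {
			/* most recently idled CPU seen so far */
			latest_wake = cpus[i].idle_entry_us;
			idlest = (int)i;
		}
	}
	return idlest;
}
```

Note that, as in the patch, the three branches all write the same idlest
variable, so a CPU later in the mask that entered idle recently can still
override an earlier CPU that was caught servicing an interrupt.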

2014-01-15 05:33:35

by Michael wang

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/15/2014 12:07 PM, Alex Shi wrote:
> Currently we just try to find least load cpu. If some cpus idled,
> we just pick the first cpu in cpu mask.
>
> In fact we can get the interrupted idle cpu or the latest idled cpu,
> then we may get the benefit from both latency and power.
> The selected cpu maybe not the best, since other cpu may be interrupted
> during our selecting. But be captious costs too much.

So the idea here is that we want to choose the most recently idled cpu
when there are multiple idle cpus to choose from, correct?

And I guess that is in order to avoid choosing a tickless cpu while there
is an idle one that is still ticking, is that right?

What confuses me is: what about those cpus that are just about to recover
from tickless, as you mentioned? That means the latest idle cpu is not
necessarily the best choice, and could even be the worst (with just two
choices, where the longer-tickless one is about to recover while the
latest one is about to go tickless).

So what about just checking 'ts->tick_stopped' and recording one ticking
idle cpu? The cost could be lower than the time comparison, and we might
reduce the risk... (well, it is not so risky, since the logic only takes
effect when the system is relaxed, with several cpus idle)

Regards,
Michael Wang
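
The 'tick_stopped' suggestion above could look roughly like the following
user-space sketch. The names struct idle_candidate and
pick_ticking_idle() are hypothetical, and tick_stopped is only a stand-in
for ts->tick_stopped; falling back to the first idle cpu when none is
still ticking is an assumption of this sketch, not something stated in
the mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU view used only for this illustration. */
struct idle_candidate {
	unsigned long load; /* 0 means the CPU is idle */
	bool tick_stopped;  /* stand-in for ts->tick_stopped */
};

/*
 * Prefer the first idle CPU whose tick is still running. If every idle
 * CPU has stopped its tick, fall back to the first idle CPU. Returns -1
 * when no CPU is idle.
 */
int pick_ticking_idle(const struct idle_candidate *cpus, size_t n)
{
	int fallback = -1;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (cpus[i].load != 0)
			continue;
		if (!cpus[i].tick_stopped)
			return (int)i; /* one flag test, no time comparison */
		if (fallback < 0)
			fallback = (int)i;
	}
	return fallback;
}
```

The appeal is that the check is a single flag test per cpu and the scan
can stop at the first ticking idle cpu.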

>
> Signed-off-by: Alex Shi <[email protected]>
> ---
> kernel/sched/fair.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c7395d9..fb52d26 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4167,6 +4167,26 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> min_load = load;
> idlest = i;
> }
> +#ifdef CONFIG_NO_HZ_COMMON
> + /*
> + * Coarsely to get the latest idle cpu for shorter latency and
> + * possible power benefit.
> + */
> + if (!min_load) {
> + struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
> +
> + s64 latest_wake = 0;
> + /* idle cpu doing irq */
> + if (ts->inidle && !ts->idle_active)
> + idlest = i;
> + /* the cpu resched */
> + else if (!ts->inidle)
> + idlest = i;
> + /* find latest idle cpu */
> + else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
> + idlest = i;
> + }
> +#endif
> }
>
> return idlest;
>

2014-01-15 06:46:13

by Alex Shi

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/15/2014 01:33 PM, Michael wang wrote:
> On 01/15/2014 12:07 PM, Alex Shi wrote:
>> > Currently we just try to find least load cpu. If some cpus idled,
>> > we just pick the first cpu in cpu mask.
>> >
>> > In fact we can get the interrupted idle cpu or the latest idled cpu,
>> > then we may get the benefit from both latency and power.
>> > The selected cpu maybe not the best, since other cpu may be interrupted
>> > during our selecting. But be captious costs too much.
> So the idea here is we want to choose the latest idle cpu if we have
> multiple idle cpu for choosing, correct?

yes.
>
> And I guess that was in order to avoid choosing tickless cpu while there
> are un-tickless idle one, is that right?

No, the current logic chooses the least loaded cpu regardless of whether
it is idle.
>
> What confused me is, what about those cpu who just going to recover from
> tickless as you mentioned, which means latest idle doesn't mean the best
> choice, or even could be the worst (if just two choice, and the longer
> tickless one is just going to recover while the latest is going to
> tickless).

Yes. To handle your scenario we would need to know the next timer of each
idle cpu, but even that is not enough, since interrupts are totally
unpredictable. So I'd rather live with the coarse method for now.
>
> So what about just check 'ts->tick_stopped' and record one ticking idle
> cpu? the cost could be lower than time comparison, we could reduce the
> risk may be...(well, not so risky since the logical only works when
> system is relaxing with several cpu idle)

First, nohz full also stops the tick. Second, tick_stopped cannot reflect
interrupts: when an idle cpu is interrupted, it is awake, and is then a
good candidate for running the task.

--
Thanks
Alex

2014-01-15 07:36:02

by Peter Zijlstra

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
> Currently we just try to find least load cpu. If some cpus idled,
> we just pick the first cpu in cpu mask.
>
> In fact we can get the interrupted idle cpu or the latest idled cpu,
> then we may get the benefit from both latency and power.
> The selected cpu maybe not the best, since other cpu may be interrupted
> during our selecting. But be captious costs too much.

No, we should not do anything like this without first integrating
cpuidle.

At which point we have a sane view of the idle states and can make a
sane choice between them.

2014-01-15 08:05:17

by Michael wang

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/15/2014 02:45 PM, Alex Shi wrote:
[snip]
>
> yes, to save your scenario, we need to know the next timer for idle cpu,
> but that is not enough, interrupt is totally unpredictable. So, I'd
> rather bear the coarse method now.
>>
>> So what about just check 'ts->tick_stopped' and record one ticking idle
>> cpu? the cost could be lower than time comparison, we could reduce the
>> risk may be...(well, not so risky since the logical only works when
>> system is relaxing with several cpu idle)
>
> first, nohz full also stop tick. second, tick_stopped can not reflect
> the interrupt. when the idle cpu was interrupted, it's waken, then be a
> good candidate for task running.

IMHO, if we have to gamble here, we had better choose the cheaper bet,
unless we can prove that this 'coarse method' has a higher chance of
BINGO than just checking 'tick_stopped'...

BTW, maybe the logic should be in select_idle_sibling()?

Regards,
Michael Wang

>

2014-01-15 14:28:56

by Alex Shi

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/15/2014 04:05 PM, Michael wang wrote:
> On 01/15/2014 02:45 PM, Alex Shi wrote:
> [snip]
>>
>> yes, to save your scenario, we need to know the next timer for idle cpu,
>> but that is not enough, interrupt is totally unpredictable. So, I'd
>> rather bear the coarse method now.
>>>
>>> So what about just check 'ts->tick_stopped' and record one ticking idle
>>> cpu? the cost could be lower than time comparison, we could reduce the
>>> risk may be...(well, not so risky since the logical only works when
>>> system is relaxing with several cpu idle)
>>
>> first, nohz full also stop tick. second, tick_stopped can not reflect
>> the interrupt. when the idle cpu was interrupted, it's waken, then be a
>> good candidate for task running.
>
> IMHO, if we have to do gamble here, we better choose the cheaper bet,
> unless we could prove this 'coarse method' have more higher chance for
> BINGO than just check 'tick_stopped'...

The tick is stopped on a nohz full CPU, but that cpu still has a task
running...
>
> BTW, may be the logical should be in the select_idle_sibling()?

Both functions need to be considered.
>
> Regards,
> Michael Wang
>
>>
>


--
Thanks
Alex

2014-01-15 14:38:06

by Alex Shi

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/15/2014 03:35 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
>> Currently we just try to find least load cpu. If some cpus idled,
>> we just pick the first cpu in cpu mask.
>>
>> In fact we can get the interrupted idle cpu or the latest idled cpu,
>> then we may get the benefit from both latency and power.
>> The selected cpu maybe not the best, since other cpu may be interrupted
>> during our selecting. But be captious costs too much.
>
> No, we should not do anything like this without first integrating
> cpuidle.
>
> At which point we have a sane view of the idle states and can make a
> sane choice between them.
>


Daniel,

Any comments to make it better?

--
Thanks
Alex

2014-01-16 11:03:20

by Daniel Lezcano

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/15/2014 03:37 PM, Alex Shi wrote:
> On 01/15/2014 03:35 PM, Peter Zijlstra wrote:
>> On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
>>> Currently we just try to find least load cpu. If some cpus idled,
>>> we just pick the first cpu in cpu mask.
>>>
>>> In fact we can get the interrupted idle cpu or the latest idled cpu,
>>> then we may get the benefit from both latency and power.
>>> The selected cpu maybe not the best, since other cpu may be interrupted
>>> during our selecting. But be captious costs too much.
>>
>> No, we should not do anything like this without first integrating
>> cpuidle.
>>
>> At which point we have a sane view of the idle states and can make a
>> sane choice between them.
>>
>
>
> Daniel,
>
> Any comments to make it better?

Hi Alex,

it is a nice optimization attempt but I agree with Peter we should focus
on integrating cpuidle.

The question is "how do we integrate cpuidle ?"

IMHO, the main problem is the governors, especially the menu governor.

The menu governor tries to predict the events per cpu. This approach,
which gave us a nice power-saving benefit, may not fit the scheduler
well.

I think we can classify the events in three categories:

1. fully predictable (timers)
2. partially predictable (eg. MMC, SSD or network)
3. unpredictable (eg. keyboard, network ingress after quiescent period)

The menu governor mixes 2 and 3, using statistics and a performance
multiplier to reach shallow states, based on heuristics and
experimentation for a specific platform.

I was wondering whether we should create per-task io latency tracking.

Mostly based on io_schedule and io_schedule_timeout, we track the
latency of each task for each device and keep an rb-tree up to date,
where the left-most leaf is the minimum latency over all the tasks
running on a specific cpu. That allows better tracking when moving tasks
across cpus.
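
As a rough illustration of the data structure described above, here is a
user-space sketch of an ordered tree keyed by latency whose left-most
node is the minimum. Everything in it is an assumption made for
illustration: struct io_lat_node, io_lat_insert() and io_lat_min() are
hypothetical names, and a plain unbalanced binary search tree stands in
for the kernel rb-tree. Removing a node when a task migrates is left out
to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task record of the expected io latency. */
struct io_lat_node {
	uint64_t latency_us;
	int task_id;
	struct io_lat_node *left, *right;
};

/* Insert a caller-owned node into the tree rooted at *root. */
void io_lat_insert(struct io_lat_node **root, struct io_lat_node *node)
{
	assert(root != NULL && node != NULL);

	node->left = node->right = NULL;
	while (*root != NULL) {
		if (node->latency_us < (*root)->latency_us)
			root = &(*root)->left;
		else
			root = &(*root)->right;
	}
	*root = node;
}

/* The left-most node holds the minimum latency; NULL for an empty tree. */
const struct io_lat_node *io_lat_min(const struct io_lat_node *root)
{
	if (root == NULL)
		return NULL;
	while (root->left != NULL)
		root = root->left;
	return root;
}
```

With one such tree per cpu, the value at io_lat_min() would play the role
of the next expected wake-up for that cpu.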

With this approach, we have something consistent with the per-task load
tracking.

This io latency tracking gives us the next wake-up event, which we can
inject into the cpuidle framework directly. That removes all the code
related to the menu governor statistics based on IO events and simplifies
the menu governor code a lot. So we replace a piece of the cpuidle code
with scheduler code, which I hope could be better at prediction, leading
to a partial integration.

In order to finish integrating the cpuidle framework in the scheduler,
there are pending questions about the impact on the current design.

Peter or Ingo, if you have time, could you have a look at the email I
sent previously [1] ?

Thanks

-- Daniel


[1] https://lkml.org/lkml/2013/12/17/106

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-16 11:38:33

by Peter Zijlstra

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On Thu, Jan 16, 2014 at 12:03:13PM +0100, Daniel Lezcano wrote:
> Hi Alex,
>
> it is a nice optimization attempt but I agree with Peter we should focus on
> integrating cpuidle.
>
> The question is "how do we integrate cpuidle ?"
>
> IMHO, the main problem are the governors, especially the menu governor.

Yah.

> The menu governor tries to predict the events per cpu. This approach which
> gave us a nice benefit for the power saving may not fit well for the
> scheduler.

So the way to start all this is I think to gradually share more and
more.

Start by pulling in the actual idle state; such that we can indeed
observe what the relative cost is of waking a cpu (against another), and
maybe even the predicted wakeup time.

Then pull in the various statistics gathering bits -- without improving
them.

Then improve the statistics; try and remove duplicate statistics -- if
there's such things, try and use the extra information the scheduler has
etc..

Then worry about the governors, or what's left of them.
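
The first step above, letting the scheduler see the actual idle state so
that it can compare the cost of waking one cpu against another, could be
sketched in user space like this. The names struct idle_state_view and
cheapest_idle_cpu() are hypothetical, and using an exit latency in
microseconds as the wake-up cost is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a CPU's current idle state. */
struct idle_state_view {
	bool idle;                    /* is the CPU idle at all? */
	unsigned int exit_latency_us; /* assumed cost of waking it up */
};

/*
 * Return the idle CPU that is cheapest to wake, i.e. the one with the
 * smallest exit latency. Ties keep the first CPU found. Returns -1 when
 * no CPU is idle.
 */
int cheapest_idle_cpu(const struct idle_state_view *cpus, size_t n)
{
	int best = -1;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].idle)
			continue;
		if (best < 0 ||
		    cpus[i].exit_latency_us < cpus[best].exit_latency_us)
			best = (int)i;
	}
	return best;
}
```

Compared with looking at idle entry times, this compares what waking each
cpu is expected to cost rather than how long it has been idle.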

> In order to finish integrating the cpuidle framework in the scheduler, there
> are pending questions about the impact in the current design.
>
> Peter or Ingo, if you have time, could you have a look at the email I sent
> previously [1] ?

I read it once, it didn't make sense at the time, I just read it again,
still doesn't make sense.

We need the idle task, since we need to DO something to go idle, the
scheduler needs to pick a task to go do that something. This is the idle
task.

You cannot get rid of that.

In fact, the 'doing' of that task is running much of the cpuidle code,
so by getting rid of it, there's nobody left to execute that code.

Also, since it's already running that cpuidle stuff, integrating it more
closely with the scheduler will not in fact change much, it will still
run it.

Could of course be I'm not reading what you meant to write, if so, do
try again ;-)

2014-01-16 12:16:20

by Daniel Lezcano

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On 01/16/2014 12:38 PM, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 12:03:13PM +0100, Daniel Lezcano wrote:
>> Hi Alex,
>>
>> it is a nice optimization attempt but I agree with Peter we should focus on
>> integrating cpuidle.
>>
>> The question is "how do we integrate cpuidle ?"
>>
>> IMHO, the main problem are the governors, especially the menu governor.
>
> Yah.
>
>> The menu governor tries to predict the events per cpu. This approach which
>> gave us a nice benefit for the power saving may not fit well for the
>> scheduler.
>
> So the way to start all this is I think to gradually share more and
> more.
>
> Start by pulling in the actual idle state; such that we can indeed
> observe what the relative cost is of waking a cpu (against another), and
> maybe even the predicted wakeup time.

Ok, I will send a patch for this.

> Then pull in the various statistics gathering bits -- without improving
> them.
>
> Then improve the statistics; try and remove duplicate statistics -- if
> there's such things, try and use the extra information the scheduler has
> etc..
>
> Then worry about the governors, or what's left of them.
>
>> In order to finish integrating the cpuidle framework in the scheduler, there
>> are pending questions about the impact in the current design.
>>
>> Peter or Ingo, if you have time, could you have a look at the email I sent
>> previously [1] ?
>
> I read it once, it didn't make sense at the time, I just read it again,
> still doesn't make sense.

:)

The question arose when I looked closely at how to fully integrate
cpuidle with the scheduler, in particular the idle time.
The scheduler idle time is not the same as the cpuidle idle time.
A cpu can be idle for 1s from the scheduler's point of view but be
interrupted several times during that second, so the idle time seen by
cpuidle is different. But anyway ...
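
To make the difference concrete, here is a small user-space sketch that
takes one scheduler-idle span and the interrupt timestamps that fall
inside it, and returns the longest uninterrupted stretch. The function
name longest_uninterrupted_us() is hypothetical, and the sketch assumes
that the timestamps are sorted, lie within the span, and that servicing
an interrupt takes no time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * start_us..end_us is the span during which the scheduler considers the
 * CPU idle. irq_us[] holds the sorted times of the interrupts that woke
 * the CPU within that span. Returns the longest stretch without any
 * interrupt.
 */
uint64_t longest_uninterrupted_us(uint64_t start_us, uint64_t end_us,
				  const uint64_t *irq_us, size_t n)
{
	uint64_t prev = start_us;
	uint64_t best = 0;

	assert(irq_us != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t gap = irq_us[i] - prev;

		if (gap > best)
			best = gap;
		prev = irq_us[i];
	}
	/* the stretch between the last interrupt and the end of the span */
	if (end_us - prev > best)
		best = end_us - prev;
	return best;
}
```

For a 1s span with interrupts at 250ms, 300ms and 900ms, the scheduler
sees 1000000us of idle time, while the longest stretch without an
interrupt is 600000us.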

> We need the idle task, since we need to DO something to go idle, the
> scheduler needs to pick a task to go do that something. This is the idle
> task.
>
> You cannot get rid of that.
>
> In fact, the 'doing' of that task is running much of the cpuidle code,
> so by getting rid of it, there's nobody left to execute that code.
>
> Also, since its already running that cpuidle stuff, integrating it more
> closely with the scheduler will not in fact change much, it will still
> run it.
>
> Could of course be I'm not reading what you meant to write, if so, do
> try again ;-)

Well, I wanted to clarify what your feeling was about how to integrate
cpuidle in the scheduler. If removing the idle task (in the future) does
not make sense to you, I will not insist. Let's see how the code evolves
by integrating cpuidle and we will figure out what the impact on the idle
task will be.

Thanks for your feedback

-- Daniel


2014-01-17 02:40:33

by Nicolas Pitre

Subject: Re: [RFC PATCH] sched: find the latest idle cpu

On Thu, 16 Jan 2014, Daniel Lezcano wrote:

> The question raised when I looked closely how to fully integrate cpuidle with
> the scheduler; in particular, the idle time.
> The scheduler idle time is not the same than the cpuidle idle time.
> A cpu can be idle for the scheduler 1s but it could be interrupted several
> times by an interrupt thus the idle time for cpuidle is different. But anyway
> ...

The idle task would run each time an interrupt has been serviced, either
to yield to a newly awakened task or to put the CPU back to sleep. In the
latter case the idle task may simply do extra idleness accounting
locally. If the former case happens most of the time then the scheduler
idle time would be most representative already.

And if threaded IRQs are used then the scheduler idle time would be
the same as cpuidle's.

> > We need the idle task, since we need to DO something to go idle, the
> > scheduler needs to pick a task to go do that something. This is the idle
> > task.
> >
> > You cannot get rid of that.
> >
> > In fact, the 'doing' of that task is running much of the cpuidle code,
> > so by getting rid of it, there's nobody left to execute that code.
> >
> > Also, since its already running that cpuidle stuff, integrating it more
> > closely with the scheduler will not in fact change much, it will still
> > run it.
> >
> > Could of course be I'm not reading what you meant to write, if so, do
> > try again ;-)
>
> Well, I wanted to have a clarification of what was your feeling about how to
> integrate cpuidle in the scheduler. If removing the idle task (in the future)
> does not make sense for you, I will not insist. Let's see how the code evolves
> by integrating cpuidle and we will figure out what will be the impact on the
> idle task.

I think we should be able to get rid of architecture specific idle
loops. The idle loop could be moved close to the scheduler and
architectures would only need to provide a default CPU halt method for
when there is nothing else registered with the cpuidle subsystem.


Nicolas