2014-04-16 11:34:40

by Alex Shi

[permalink] [raw]
Subject: [RFC PATCH] sched: let task migration destination cpu do active balance

Chris Redpath found an issue on active balance:
We let the task source cpu, the busiest cpu, do the active balance,
while the destination cpu maybe idle. thus we take the busiest cpu
time, but left the idlest cpu wait. That is not good for performance.

This patch let the destination cpu do active balance. It will give tasks
more running time.

Signed-off-by: Alex Shi <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9b4c4f3..cccee76 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6308,7 +6308,7 @@ more_balance:
raw_spin_unlock_irqrestore(&busiest->lock, flags);

if (active_balance) {
- stop_one_cpu_nowait(cpu_of(busiest),
+ stop_one_cpu_nowait(busiest->push_cpu,
active_load_balance_cpu_stop, busiest,
&busiest->active_balance_work);
}
--
1.8.1.2


2014-04-16 12:13:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH] sched: let task migration destination cpu do active balance

On Wed, Apr 16, 2014 at 07:34:29PM +0800, Alex Shi wrote:
> Chris Redpath found an issue on active balance:
> We let the task source cpu, the busiest cpu, do the active balance,
> while the destination cpu maybe idle. thus we take the busiest cpu
> time, but left the idlest cpu wait. That is not good for performance.
>
> This patch let the destination cpu do active balance. It will give tasks
> more running time.
>
> Signed-off-by: Alex Shi <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9b4c4f3..cccee76 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6308,7 +6308,7 @@ more_balance:
> raw_spin_unlock_irqrestore(&busiest->lock, flags);
>
> if (active_balance) {
> - stop_one_cpu_nowait(cpu_of(busiest),
> + stop_one_cpu_nowait(busiest->push_cpu,
> active_load_balance_cpu_stop, busiest,
> &busiest->active_balance_work);
> }

This doesn't make sense, the whole point of active balance is that we're
going to move current, for that to work we have to interrupt the CPU
current is running on and make sure another task (the stopper task in
this case) is running, so that the previous current is now a !running
task and we can move it around.

2014-04-17 05:42:55

by Alex Shi

[permalink] [raw]
Subject: Re: [RFC PATCH] sched: let task migration destination cpu do active balance

On 04/16/2014 08:13 PM, Peter Zijlstra wrote:
> On Wed, Apr 16, 2014 at 07:34:29PM +0800, Alex Shi wrote:
>> Chris Redpath found an issue on active balance:
>> We let the task source cpu, the busiest cpu, do the active balance,
>> while the destination cpu maybe idle. thus we take the busiest cpu
>> time, but left the idlest cpu wait. That is not good for performance.
>>
>> This patch let the destination cpu do active balance. It will give tasks
>> more running time.
>>
>> Signed-off-by: Alex Shi <[email protected]>
>> ---
>> kernel/sched/fair.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9b4c4f3..cccee76 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6308,7 +6308,7 @@ more_balance:
>> raw_spin_unlock_irqrestore(&busiest->lock, flags);
>>
>> if (active_balance) {
>> - stop_one_cpu_nowait(cpu_of(busiest),
>> + stop_one_cpu_nowait(busiest->push_cpu,
>> active_load_balance_cpu_stop, busiest,
>> &busiest->active_balance_work);
>> }
>
> This doesn't make sense, the whole point of active balance is that we're
> going to move current, for that to work we have to interrupt the CPU
> current is running on and make sure another task (the stopper task in
> this case) is running, so that the previous current is now a !running
> task and we can move it around.
>

Sure, you are right. thanks for correction!

--
Thanks
Alex

2014-04-23 15:45:54

by Chris Redpath

[permalink] [raw]
Subject: Re: [RFC PATCH] sched: let task migration destination cpu do active balance

On 17/04/14 06:42, Alex Shi wrote:
> On 04/16/2014 08:13 PM, Peter Zijlstra wrote:
>> On Wed, Apr 16, 2014 at 07:34:29PM +0800, Alex Shi wrote:
>>> Chris Redpath found an issue on active balance:
>>> We let the task source cpu, the busiest cpu, do the active balance,
>>> while the destination cpu maybe idle. thus we take the busiest cpu
>>> time, but left the idlest cpu wait. That is not good for performance.
>>>
>>> This patch let the destination cpu do active balance. It will give tasks
>>> more running time.
>>>
>>> Signed-off-by: Alex Shi <[email protected]>
>>> ---
>>> kernel/sched/fair.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 9b4c4f3..cccee76 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6308,7 +6308,7 @@ more_balance:
>>> raw_spin_unlock_irqrestore(&busiest->lock, flags);
>>>
>>> if (active_balance) {
>>> - stop_one_cpu_nowait(cpu_of(busiest),
>>> + stop_one_cpu_nowait(busiest->push_cpu,
>>> active_load_balance_cpu_stop, busiest,
>>> &busiest->active_balance_work);
>>> }
>>
>> This doesn't make sense, the whole point of active balance is that we're
>> going to move current, for that to work we have to interrupt the CPU
>> current is running on and make sure another task (the stopper task in
>> this case) is running, so that the previous current is now a !running
>> task and we can move it around.
>>
>
> Sure, you are right. thanks for correction!
>
Hi Alex, Peter,

I've been away but just to clarify, the issue I found wasn't a problem
in the scheduler. It would be more accurately described as an
optimization for our big.LITTLE MP support patches (called HMP in the
sources) that live in Linaro's LSK tree (https://wiki.linaro.org/LSK).

I don't think there is anything to talk about in the mainline scheduler
yet, but perhaps if the scheduler knows more about the idle states of
CPUs some similar optimizations could be useful. I'm not convinced that
this kind of optimization scales but in any case I've included a
description below the line for anyone interested.

--Chris


--------
In ARM's big.LITTLE MP solution we arrange the CPUs such that all the
little CPUs are in one MC domain and all the big CPUs are in another MC
domain. We disable load balancing at the CPU level, and use the task
load to decide if a task ought to be running in the big or little MC
domain. If we decide to move a running task, we use a forced migration
which is functionally identical to the active balance but given a
different name to reflect being triggered under different circumstances.

When we try to do a forced migration, it's constrained to only target an
idle big CPU as it's supposed to be increasing the CPU performance
available to the task. The big.LITTLE architecture has big and little
CPUs in separate clusters and even the simplest power management
implementations have total power off for an idle cluster.

It is likely that a lightly-loaded system will have an idle big cluster
if we hit this code path since we also have idle-pull between big and
little domains for eligible tasks when a CPU is about to enter idle and
cannot pull something within its own domain, so there's a good chance
that the target cpu is completely powered down.

The optimization is for us to set a flag on the RQ of the target CPU and
send a reschedule IPI instead of running the stopper immediately. The
target CPU is woken to handle the IPI and the flag tells it to look at
the tasks in the slower domain and set up a task migration if something
appropriate should be found. This allows a busy task to continue to
execute on a little CPU during the power-on time for the big CPU and the
big CPU will be already awake when the task is pushed there by the
stopper. There are some details to make it work as expected but that's
basically it.

The intent is to try to avoid stalling forward progress in a task which
is ramping up CPU activity.
--------