2009-06-17 18:36:46

by Pallipadi, Venkatesh

Subject: [patch 0/2] RFC sched: Change nohz ilb logic from poll to push model

Existing nohz idle load balance (ilb) logic uses the pull model, with one
idle load balancer CPU nominated on any partially idle system and that
balancer CPU not going into nohz mode. With the periodic tick, the
balancer does the idle balancing on behalf of all the CPUs in nohz mode.

This is not optimal and has a few issues:
* The balancer continues to take periodic ticks and wakes up
frequently (at HZ rate), even though it may have no rebalancing to do on
behalf of any of the idle CPUs.
* On x86 and CPUs that have APIC timer stoppage on idle CPUs, this periodic
wakeup can result in an additional interrupt on a CPU doing the timer
broadcast.
* The balancer may end up spending a lot of time doing the balancing on
behalf of nohz CPUs, especially as the number of sockets and cores in
the platform increases.

The alternative is a push model, where all idle CPUs can enter nohz
mode and a busy CPU kicks one of the idle CPUs to take care of idle
balancing on behalf of a group of idle CPUs.
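
As a rough illustration of the idle side of such a push model, here is a
minimal userspace sketch. It is not the code from the patches: the names
(enter_nohz, exit_nohz, nohz_idle_balance, nohz_mask), the 8-CPU limit and
the fixed rebalance interval are all assumptions made for the example, and
the actual balancing work is reduced to a counter.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8          /* hypothetical small system */

/* Hypothetical per-CPU bookkeeping; real balancing is reduced to a counter. */
struct cpu_state {
    unsigned long next_balance;  /* tick at which the next rebalance is due */
    unsigned int balance_count;  /* times this CPU was balanced on its behalf */
};

static struct cpu_state cpus[NR_CPUS];
static uint32_t nohz_mask;       /* bit n set => CPU n is idle, tick stopped */

/* An idle CPU stops its tick and records itself in the nohz mask. */
static void enter_nohz(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    nohz_mask |= UINT32_C(1) << cpu;
}

/* A CPU that picks up work leaves nohz mode. */
static void exit_nohz(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    nohz_mask &= ~(UINT32_C(1) << cpu);
}

/*
 * Runs on the idle CPU that was kicked: balance on behalf of every nohz CPU
 * whose rebalance is due, then return so the kicked CPU can sleep again.
 * Returns the number of CPUs balanced.
 */
static int nohz_idle_balance(unsigned long now, unsigned long interval)
{
    int balanced = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!(nohz_mask & (UINT32_C(1) << cpu)))
            continue;                       /* CPU is busy, skip it */
        if ((long)(now - cpus[cpu].next_balance) < 0)
            continue;                       /* rebalance not due yet */
        cpus[cpu].balance_count++;          /* stand-in for real balancing */
        cpus[cpu].next_balance = now + interval;
        balanced++;
    }
    return balanced;
}
```

The point of the sketch is that no CPU keeps a periodic tick just to poll:
the walk over the mask happens only when a busy CPU asks for it.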

The following patches try that approach. There are still some rough
edges in the patches related to the use of #defines around the code, but
I wanted to get opinions on this approach as an RFC (not for inclusion
into the tree yet).

Thanks,
Venki

Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>

--


2009-06-17 19:17:08

by Vaidyanathan Srinivasan

Subject: Re: [patch 0/2] RFC sched: Change nohz ilb logic from poll to push model

* [email protected] <[email protected]> [2009-06-17 11:26:49]:

> Existing nohz idle load balance (ilb) logic uses the pull model, with one
> idle load balancer CPU nominated on any partially idle system and that
> balancer CPU not going into nohz mode. With the periodic tick, the
> balancer does the idle balancing on behalf of all the CPUs in nohz mode.
>
> This is not very optimal and has few issues:
> * the balancer will continue to have periodic ticks and wakeup
> frequently (HZ rate), even though it may not have any rebalancing to do on
> behalf of any of the idle CPUs.
> * On x86 and CPUs that have APIC timer stoppage on idle CPUs, this periodic
> wakeup can result in an additional interrupt on a CPU doing the timer
> broadcast.
> * The balancer may end up spending a lot of time doing the balancing on
> behalf of nohz CPUs, especially with increasing number of sockets and
> cores in the platform.
>
> The alternative is to have a push model, where all idle CPUs can enter nohz
> mode and busy CPU kicks one of the idle CPUs to take care of idle balancing
> on behalf of a group of idle CPUs.

Hi Venki,

The idea is very useful and further extends the power savings in an idle
system. However, the kick from a busy CPU should not add to scheduling
latency during a sudden burst of work.

Does adding nohz_balancer_kick() to the trigger_load_balance() path on
a busy CPU add to its overhead?


> Following patches tries that approach. There are still some rough edges
> in the patches related to use of #defines around the code. But, wanted
> to get opinion on this approach as an RFC (not for inclusion into the
> tree yet).

I like the idea, but my only concern is the performance impact on busy
CPUs with this push model.

--Vaidy

2009-06-18 23:42:59

by Pallipadi, Venkatesh

Subject: Re: [patch 0/2] RFC sched: Change nohz ilb logic from poll to push model

On Wed, 2009-06-17 at 12:16 -0700, Vaidyanathan Srinivasan wrote:
> * [email protected] <[email protected]> [2009-06-17 11:26:49]:
>
> > Existing nohz idle load balance (ilb) logic uses the pull model, with one
> > idle load balancer CPU nominated on any partially idle system and that
> > balancer CPU not going into nohz mode. With the periodic tick, the
> > balancer does the idle balancing on behalf of all the CPUs in nohz mode.
> >
> > This is not very optimal and has few issues:
> > * the balancer will continue to have periodic ticks and wakeup
> > frequently (HZ rate), even though it may not have any rebalancing to do on
> > behalf of any of the idle CPUs.
> > * On x86 and CPUs that have APIC timer stoppage on idle CPUs, this periodic
> > wakeup can result in an additional interrupt on a CPU doing the timer
> > broadcast.
> > * The balancer may end up spending a lot of time doing the balancing on
> > behalf of nohz CPUs, especially with increasing number of sockets and
> > cores in the platform.
> >
> > The alternative is to have a push model, where all idle CPUs can enter nohz
> > mode and busy CPU kicks one of the idle CPUs to take care of idle balancing
> > on behalf of a group of idle CPUs.
>
> Hi Venki,
>
> The idea is very useful and further extends the power savings in idle
> system. However the kick method from busy CPU should not add to
> scheduling latency during a sudden burst of work.
>
> Does adding nohz_balancer_kick() in trigger_load_balance() path in
> a busy CPU add to its overhead?
>
>
> > Following patches tries that approach. There are still some rough edges
> > in the patches related to use of #defines around the code. But, wanted
> > to get opinion on this approach as an RFC (not for inclusion into the
> > tree yet).
>
> I like the idea but my only concern is the performance impact on busy
> cpus with this push model.

Vaidy,

I tried to keep the overhead on the busy CPU low in this RFC. There is a
check for the next_balance time, and if a load balance CPU is nominated
we just send a resched to it. We do look at cpu_mask to find the first
bit set when there is no assigned load balance CPU (that is, when, say,
the load balance CPU has started running and no other CPU has nominated
itself yet). But that's the only overhead there. All the other
complexity is handled on the idle CPU side.
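
To make the busy-side cost concrete, here is a minimal userspace sketch of
the path described above. It is not the code from the patches: the names
busy_cpu_kick, send_resched, first_nohz_cpu, ilb_cpu, nohz_cpu_mask and
nohz_next_balance are hypothetical, the mask is a plain 32-bit word, and
the resched IPI is replaced by a stub that only records its target.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8          /* hypothetical small system */
#define NO_BALANCER (-1)

static uint32_t nohz_cpu_mask;          /* bit n set => CPU n is in nohz mode */
static int ilb_cpu = NO_BALANCER;       /* currently nominated balancer, if any */
static unsigned long nohz_next_balance; /* earliest tick a kick is useful */
static int last_kicked = NO_BALANCER;   /* records the stubbed resched target */

/* Stand-in for sending a resched IPI to the chosen idle CPU. */
static void send_resched(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    last_kicked = cpu;
}

/* Fallback: scan the mask for the first nohz CPU. */
static int first_nohz_cpu(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (nohz_cpu_mask & (UINT32_C(1) << cpu))
            return cpu;
    return NO_BALANCER;
}

/*
 * Called from the busy CPU's tick. Returns the CPU that was kicked,
 * or NO_BALANCER if no kick was needed or no idle CPU exists.
 */
static int busy_cpu_kick(unsigned long now)
{
    int target;

    if ((long)(now - nohz_next_balance) < 0)
        return NO_BALANCER;         /* idle balance not due: one compare */

    target = ilb_cpu;               /* common case: balancer is nominated */
    if (target == NO_BALANCER)
        target = first_nohz_cpu();  /* rare case: scan for first set bit */

    if (target != NO_BALANCER)
        send_resched(target);
    return target;
}
```

In this sketch the busy CPU pays one time comparison per tick, plus one
resched when a balance is due. The mask scan happens only in the window
where no balancer is nominated.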

Thanks,
Venki