2003-01-14 14:46:23

by Andrew Theurer

Subject: Re: [Lse-tech] Re: NUMA scheduler 2nd approach

> I suppose I should not have been so dang lazy and cut-n-pasted
> the line I changed. The change was (((5*4*this_load)/4) + 4)
> which should be the same as your second choice.
> >
> > We def need some constant to avoid low load ping pong, right?
>
> Yep. Without the constant, one could have 6 processes on node
> A and 4 on node B, and node B would end up stealing. While making
> a perfect balance, the expense of the off-node traffic does not
> justify it. At least on the NUMAQ box. It might be justified
> for a different NUMA architecture, which is why I propose putting
> this check in a macro that can be defined in topology.h for each
> architecture.

Yes, I was also concerned about one task in one node and none in the others.
Without some sort of constant we will ping-pong the task between nodes
endlessly, since there is no % threshold that could make any difference when
the original load value is 0. Your +4 gets rid of the 1-task case.
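
Just to illustrate what I have in mind, the per-arch check could look roughly
like this in topology.h. The macro name and the helper below are placeholders
of mine, not from the actual patch, and I'm assuming remote_load/this_load are
the same scaled node loads the patch already computes:

/*
 * Rough sketch only -- names are placeholders, not the posted code.
 * A remote node is only considered busy enough to steal from if its
 * load clears this threshold; the "+ 4" keeps a single task on an
 * otherwise idle remote node from ever looking worth the off-node
 * traffic, which is what stops the ping-pong.
 */
#define NODE_BALANCE_THRESHOLD(this_load) \
	((((5 * 4 * (this_load)) / 4) + 4))

static inline int worth_cross_node_steal(unsigned long remote_load,
					 unsigned long this_load)
{
	return remote_load > NODE_BALANCE_THRESHOLD(this_load);
}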

-Andrew


2003-01-14 15:04:16

by Erich Focht

Subject: Re: [Lse-tech] Re: NUMA scheduler 2nd approach

On Tuesday 14 January 2003 17:52, Andrew Theurer wrote:
> > I suppose I should not have been so dang lazy and cut-n-pasted
> > the line I changed. The change was (((5*4*this_load)/4) + 4)
> > which should be the same as your second choice.
> >
> > > We def need some constant to avoid low load ping pong, right?
> >
> > Yep. Without the constant, one could have 6 processes on node
> > A and 4 on node B, and node B would end up stealing. While making
> > a perfect balance, the expense of the off-node traffic does not
> > justify it. At least on the NUMAQ box. It might be justified
> > for a different NUMA architecture, which is why I propose putting
> > this check in a macro that can be defined in topology.h for each
> > architecture.
>
> Yes, I was also concerned about one task in one node and none in the
> others. Without some sort of constant we will ping-pong the task between
> nodes endlessly, since there is no % threshold that could make any
> difference when the original load value is 0. Your +4 gets rid of the
> 1-task case.

That won't happen because:
- find_busiest_queue() won't detect a sufficient imbalance
  (imbalance == 0),
- load_balance() won't be able to steal the task, as it will be
  running (the rq->curr task).

So the worst thing that happens is that load_balance() finds out that
it cannot steal the only task running on the runqueue and returns.
But we won't even get there, because find_busiest_queue() returns
NULL. This happens even in the bad case:
  node#0 : 0 tasks
  node#1 : 4 tasks (each on its own CPU)
where we would actually want to distribute the tasks equally among the
nodes. At least on our IA64 platform...
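
In very simplified form the two checks look roughly like this. This is a
standalone sketch of mine, not the actual kernel/sched.c code, and the
struct and function names are placeholders:

/*
 * Standalone sketch, not the real scheduler.  struct rq_sketch stands
 * in for a per-CPU runqueue.
 */
struct rq_sketch {
	unsigned long nr_running;	/* tasks on this runqueue */
	void *curr;			/* task currently running here */
};

/*
 * find_busiest_queue() side: with each remote CPU running a single
 * task and this CPU idle, (1 - 0) / 2 rounds down to 0, so no
 * imbalance is reported and no busiest queue is returned.
 */
static inline unsigned long imbalance_between(struct rq_sketch *busiest,
					      struct rq_sketch *self)
{
	return (busiest->nr_running - self->nr_running) / 2;
}

/*
 * load_balance() side: the task currently running on the busiest
 * runqueue (its ->curr) is never migrated, so a lone running task
 * stays where it is even if we got this far.
 */
static inline int can_steal(void *task, struct rq_sketch *busiest)
{
	return task != busiest->curr;
}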

Regards,
Erich