2015-12-11 14:10:36

by Peter Zijlstra

Subject: SCHED_RR vs push-pull

Hai,

Thomas just reported a 'fun' problem with our rt 'load-balancer'.

The problem is 2 CPUs and 4 equal prio RR tasks.
Suppose an unequal distribution of these tasks among the CPUs, e.g. 1:3.

Now one would expect, barring other constraints, that each CPU would get
2 of the tasks and they would RR on their prio level.

This does not happen.

The push-pull thing only acts when there are idle CPUs or new tasks, and
in the above scenario, the CPU with only the single RR task will happily
continue running that task, while the other CPU will have to RR between
the remaining 3.
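
A reproducer would be something like the below (untested sketch; needs
root for SCHED_RR, assumes CPUs 0 and 1 exist, and relies on an unlucky
initial placement rather than forcing the 1:3 split):

/* gcc -o rr-imbalance rr-imbalance.c -lpthread */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *spin(void *arg)
{
        (void)arg;
        for (;;)
                ;       /* pure CPU hog, so only RR timeslicing matters */
}

int main(void)
{
        struct sched_param sp = { .sched_priority = 50 };
        cpu_set_t set;
        pthread_t th[4];
        int i;

        /* Confine everything to CPUs 0 and 1. */
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        CPU_SET(1, &set);
        if (sched_setaffinity(0, sizeof(set), &set))
                perror("sched_setaffinity");

        /* Main thread goes SCHED_RR; the spinners inherit it. */
        if (sched_setscheduler(0, SCHED_RR, &sp))
                perror("sched_setscheduler");

        for (i = 0; i < 4; i++)
                pthread_create(&th[i], NULL, spin, NULL);

        /* Watch the per-CPU split with top or trace-cmd; once it ends
         * up 1:3 it stays 1:3. */
        for (i = 0; i < 4; i++)
                pthread_join(th[i], NULL);

        return 0;
}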


Now my initial thoughts were to define a global RR order using a
virtual timeline and you'll get something like EEVDF on a per RR prio
level with push-pull state between that.

Which might be a tad over engineered.

Is there a 'sane' solution to this problem? One that is still
deterministic, because this is, after all, RT scheduling.

Happy thinking ;-)


2015-12-11 19:39:37

by Luca Abeni

Subject: Re: SCHED_RR vs push-pull

Hi Peter,

On Fri, 11 Dec 2015 15:10:28 +0100
Peter Zijlstra <[email protected]> wrote:
[...]
> Thomas just reported a 'fun' problem with our rt 'load-balancer'.
I suspect the root of the problem is that rt push/pull do not implement
a load balancer, but just make sure that the M high priority tasks
(where M is the number of CPUs) are scheduled for execution.
The difference with a "real" load balancer can be seen when there are
multiple tasks with the same priority :)


> The problem is 2 CPUs and 4 equal prio RR tasks.
> Suppose an unequal distribution of these tasks among the CPUs, e.g. 1:3.
>
> Now one would expect, barring other constraints, that each CPU would
> get 2 of the tasks and they would RR on their prio level.
>
> This does not happen.
>
> The push-pull thing only acts when there are idle CPUs or new tasks,
> and in the above scenario, the CPU with only the single RR task will
> happily continue running that task, while the other CPU will have to
> RR between the remaining 3.
I might be wrong, but I think this is due to the
if (lowest_rq->rt.highest_prio.curr <= task->prio) {
in rt.c::find_lock_lowest_rq().
I suspect that changing "<=" to "<" might fix the issue, at the cost of
generating a lot of useless task migrations.
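
In case it helps to see the two conditions side by side, here they are
modeled as plain user-space C (just an illustration of the logic, not
the actual rt.c code; prio follows the kernel convention where a lower
number means higher priority):

#include <stdbool.h>
#include <stdio.h>

/*
 * Current behaviour: find_lock_lowest_rq() refuses the target when
 * lowest_rq->rt.highest_prio.curr <= task->prio, i.e. a push is only
 * allowed towards a runqueue whose best task is strictly lower
 * priority.
 */
static bool push_allowed_current(int lowest_rq_prio, int task_prio)
{
        return lowest_rq_prio > task_prio;
}

/*
 * The "<=" -> "<" hack also allows pushing towards a runqueue whose
 * best task has the *same* priority; that covers the 1:3 case above,
 * but it can migrate tasks even when both runqueues are already
 * balanced.
 */
static bool push_allowed_hack(int lowest_rq_prio, int task_prio)
{
        return lowest_rq_prio >= task_prio;
}

int main(void)
{
        /* 4 RR tasks at prio 50, split 1:3: both runqueues see 50. */
        printf("current: %d  hack: %d\n",
               push_allowed_current(50, 50), push_allowed_hack(50, 50));
        return 0;
}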

> Now my initial thoughts were to define a global RR order using a
> virtual timeline and you'll get something like EEVDF on a per RR prio
> level with push-pull state between that.
>
> Which might be a tad over engineered.
I suspect this issue can be fixed in a simpler way, by changing the
check I mentioned above.

If you want to balance SCHED_RR tasks with the same priority, I think
the "lowest_rq->rt.highest_prio.curr <= task->prio" check should be
extended to do the migration if (rough sketch below):
- the local task has a higher priority than the highest priority task
  on lowest_rq (this is what's currently done), or
- the local task has the same priority as the highest priority task on
  lowest_rq, they are both SCHED_RR, and the number of tasks at
  task->prio on the local RQ exceeds the number of tasks at
  lowest_rq->rt.highest_prio.curr on lowest_rq by more than 1.
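
Extending the same toy model, the check I have in mind would be
something like this (the per-priority task counts are just assumed
inputs here; getting them cheaply out of the runqueues is the part I
have not really thought about):

#include <sched.h>      /* SCHED_RR */
#include <stdbool.h>
#include <stdio.h>

/*
 * Sketch of the extended condition: push if the target runqueue only
 * has strictly lower priority work, or if both sides are SCHED_RR at
 * the same priority and the local runqueue has at least two more
 * tasks at that priority than the target (so that after one push the
 * two runqueues do not immediately want to push back).
 */
static bool should_push(int task_prio, int task_policy,
                        int lowest_rq_prio, int lowest_rq_policy,
                        int nr_local, int nr_lowest)
{
        /* lower number == higher priority */
        if (lowest_rq_prio > task_prio)
                return true;    /* what the current check already allows */

        if (lowest_rq_prio == task_prio &&
            task_policy == SCHED_RR && lowest_rq_policy == SCHED_RR &&
            nr_local > nr_lowest + 1)
                return true;

        return false;
}

int main(void)
{
        /* Peter's case: 4 RR tasks at prio 50 split 3:1 -> push one. */
        printf("3:1 -> %d\n", should_push(50, SCHED_RR, 50, SCHED_RR, 3, 1));
        /* 5 RR tasks split 3:2 -> already as balanced as it gets. */
        printf("3:2 -> %d\n", should_push(50, SCHED_RR, 50, SCHED_RR, 3, 2));
        return 0;
}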

I think this could work, but I just looked at the code, without any
real test. If you provide a simple program implementing a testcase, I
can do some experiments next week.

The alternative (of course I have to mention it :) would be to use
SCHED_DEADLINE instead of SCHED_RR.



Luca

>
> Is there a 'sane' solution to this problem? One that is still
> deterministic, because this is, after all, RT scheduling.
>
> Happy thinking ;-)

2015-12-11 19:54:06

by Steven Rostedt

Subject: Re: SCHED_RR vs push-pull

On Fri, 11 Dec 2015 20:39:18 +0100
Luca Abeni <[email protected]> wrote:

> Hi Peter,
>
> On Fri, 11 Dec 2015 15:10:28 +0100
> Peter Zijlstra <[email protected]> wrote:
> [...]
> > Thomas just reported a 'fun' problem with our rt 'load-balancer'.
> I suspect the root of the problem is that rt push/pull do not implement
> a load balancer, but just make sure that the M high priority tasks
> (where M is the number of CPUs) are scheduled for execution.
> The difference with a "real" load balancer can be seen when there are
> multiple tasks with the same priority :)

Yep.

>
>
> > The problem is 2 CPUs and 4 equal prio RR tasks.
> > Suppose an unequal distribution of these tasks among the CPUs, e.g. 1:3.
> >
> > Now one would expect, barring other constraints, that each CPU would
> > get 2 of the tasks and they would RR on their prio level.
> >
> > This does not happen.
> >
> > The push-pull thing only acts when there are idle CPUs or new tasks,
> > and in the above scenario, the CPU with only the single RR task will
> > happily continue running that task, while the other CPU will have to
> > RR between the remaining 3.
> I might be wrong, but I think this is due to the
> if (lowest_rq->rt.highest_prio.curr <= task->prio) {
> in rt.c::find_lock_lowest_rq().
> I suspect that changing "<=" to "<" might fix the issue, at the cost of
> generating a lot of useless task migrations.

I'm against this. Unless we only do this when current and the task we
want to move are both RR. It might help.


>
> > Now my initial thoughts were to define a global RR order using a
> > virtual timeline and you'll get something like EEVDF on a per RR prio
> > level with push-pull state between that.
> >
> > Which might be a tad over engineered.
> I suspect this issue can be fixed in a simpler way, by changing the
> check I mentioned above.

What happens when current is FIFO, and we just moved an RR task over
that will now never run?

>
> If you want to balance SCHED_RR tasks with the same priority, I think
> the "lowest_rq->rt.highest_prio.curr <= task->prio" check should be
> extended to do the migration if:
> - the local task has a higher priority than the highest priority task
>   on lowest_rq (this is what's currently done), or
> - the local task has the same priority as the highest priority task on
>   lowest_rq, they are both SCHED_RR, and the number of tasks at
>   task->prio on the local RQ exceeds the number of tasks at
>   lowest_rq->rt.highest_prio.curr on lowest_rq by more than 1.

Well, the number of tasks may not be good enough. We need to only look
at RR tasks. Perhaps if current is RR and the waiter on the other CPU
is RR, we can do a scan to see if a balance should be done.

>
> I think this could work, but I just looked at the code, without any
> real test. If you provide a simple program implementing a testcase, I
> can do some experiments next week.
>
> The alternative (of course I have to mention it :) would be to use
> SCHED_DEADLINE instead of SCHED_RR.

Hmm, I wonder if we could have a wrapper around SCHED_DEADLINE to
implement SCHED_RR. Probably not, because SCHED_RR has hard coded
priorities and SCHED_DEADLINE is more dynamic (and still higher than
SCHED_FIFO).

> >
> > Happy thinking ;-)

Heh, I originally thought Peter said "Happy Thanksgiving".

-- Steve

2015-12-11 21:27:46

by Luca Abeni

Subject: Re: SCHED_RR vs push-pull

Hi Steven,

On Fri, 11 Dec 2015 14:53:59 -0500
Steven Rostedt <[email protected]> wrote:
[...]
> > > The push-pull thing only acts when there are idle CPUs or new
> > > tasks, and in the above scenario, the CPU with only the single RR
> > > task will happily continue running that task, while the other CPU
> > > will have to RR between the remaining 3.
> > I might be wrong, but I think this is due to the
> > if (lowest_rq->rt.highest_prio.curr <= task->prio) {
> > in rt.c::find_lock_lowest_rq().
> > I suspect that changing "<=" to "<" might fix the issue, at the
> > cost of generating a lot of useless task migrations.
>
> I'm against this.
I agree; this was just a quick hack to check if my theory is correct
(and to work around the issue for someone needing an immediate solution
to this problem).


> Unless we only do this when current and the task we
> want to move are both RR. It might help.
Yes, in a proper solution a check for RR is needed. But I think it is
also important to check the number of tasks having the highest priority
on the two runqueues (otherwise, we risk continuously bouncing tasks
between the two CPUs).


> >
> > > Now my initial thoughts were to define a global RR order using a
> > > virtual timeline and you'll get something like EEVDF on a per RR
> > > prio level with push-pull state between that.
> > >
> > > Which might be a tad over engineered.
> > I suspect this issue can be fixed in a simpler way, by changing the
> > check I mentioned above.
>
> What happens when current is FIFO, and we just moved an RR task over
> that will now never run?
Uh... I did not think about it. Having SCHED_FIFO and SCHED_RR tasks at
the same priority level is... Dangerous. Probably, when a SCHED_FIFO
task is executing on a CPU, all the SCHED_RR tasks with the same
priority must be pushed to other CPUs... And tasks should never be
pushed to a CPU where there is a SCHED_FIFO task with the same priority.

> > If you want to balance SCHED_RR tasks with the same priority, I
> > think the "lowest_rq->rt.highest_prio.curr <= task->prio" check
> > should be extended to do the migration if:
> > - the local task has a higher priority than the highest priority
> >   task on lowest_rq (this is what's currently done), or
> > - the local task has the same priority as the highest priority task
> >   on lowest_rq, they are both SCHED_RR, and the number of tasks at
> >   task->prio on the local RQ exceeds the number of tasks at
> >   lowest_rq->rt.highest_prio.curr on lowest_rq by more than 1.
>
> Well, the number of tasks may not be good enough. We need to only look
> at RR tasks.
I agree... The proper check is more complex than what I wrote.

> Perhaps if current is RR and the waiter on the other CPU
> is RR, we can do a scan to see if a balance should be done.
Yes; but I suspect we should check how many RR tasks with this priority
are on this CPU, and how many RR tasks with this priority are on the
other CPU... When the difference is larger than 1, the task can be
pushed (otherwise, we again risk bouncing tasks between the two CPUs).
Think about 5 RR tasks, all with the same priority, and 2 CPUs: I guess
the best thing to do is to put 3 tasks on one CPU and 2 on the other,
and not try to balance any further. Otherwise, we end up with a
migration at every timeslice.
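
The scan could be something like this (completely untested; field names
taken from kernel/sched/rt.c, and it ignores RT group scheduling, where
the queued entities are not necessarily tasks):

/* Hypothetical helper for kernel/sched/rt.c: count the queued
 * SCHED_RR tasks at a given priority on one rt_rq.  Untested sketch;
 * assumes !RT_GROUP_SCHED so every queued entity maps to a task. */
static int rr_nr_queued_at_prio(struct rt_rq *rt_rq, int prio)
{
        struct sched_rt_entity *rt_se;
        int nr = 0;

        list_for_each_entry(rt_se, rt_rq->active.queue + prio, run_list) {
                if (rt_task_of(rt_se)->policy == SCHED_RR)
                        nr++;
        }

        return nr;
}

/* Then, with both rq locks held, only push when the imbalance is
 * bigger than 1:
 *
 *      if (rr_nr_queued_at_prio(&rq->rt, prio) >
 *          rr_nr_queued_at_prio(&lowest_rq->rt, prio) + 1)
 *              ... push ...
 */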


> > I think this could work, but I just looked at the code, without any
> > real test. If you provide a simple program implementing a testcase,
> > I can do some experiments next week.
> >
> > The alternative (of course I have to mention it :) would be to use
> > SCHED_DEADLINE instead of SCHED_RR.
>
> Hmm, I wonder if we could have a wrapper around SCHED_DEADLINE to
> implement SCHED_RR. Probably not, because SCHED_RR has hard coded
> priorities and SCHED_DEADLINE is more dynamic (and still higher than
> SCHED_FIFO).
I think we can do something similar to SCHED_RR (though we would need
to implement a "soft" runtime enforcement), but of course SCHED_DEADLINE
cannot provide the fixed-priority behaviour.
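
To show what I mean on the user-space side, each of those tasks would
be set up roughly like this (sketch only; the 10ms/100ms values are an
arbitrary stand-in for an RR timeslice, it needs privileges just like
SCHED_RR, and with the current hard runtime enforcement the task is
throttled once the runtime is gone, which is exactly the behaviour that
would have to become "soft"):

/* gcc -o dl-rr dl-rr.c ; assumes <sys/syscall.h> knows SYS_sched_setattr */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE  6
#endif

/* Glibc has no wrapper for sched_setattr(); go through syscall(). */
struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        /* SCHED_DEADLINE parameters, in nanoseconds */
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr,
                         unsigned int flags)
{
        return syscall(SYS_sched_setattr, pid, attr, flags);
}

int main(void)
{
        struct sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_policy = SCHED_DEADLINE;
        /* Same reservation for every "RR-like" task: 10ms every 100ms.
         * Once the 10ms are consumed the task is throttled until the
         * next period, instead of just yielding the timeslice. */
        attr.sched_runtime  =  10ULL * 1000 * 1000;
        attr.sched_deadline = 100ULL * 1000 * 1000;
        attr.sched_period   = 100ULL * 1000 * 1000;

        if (sched_setattr(0, &attr, 0)) {
                perror("sched_setattr");
                return 1;
        }

        for (;;)
                ;       /* CPU hog, like the SCHED_RR tasks in the example */
}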



Luca

>
> > >
> > > Happy thinking ;-)
>
> Heh, I originally thought Peter said "Happy Thanksgiving".
>
> -- Steve