On Wed, May 28, 2014 at 04:47:03PM +0100, Morten Rasmussen wrote:
> Since we may do periodic load-balance every 10 ms or so, we will perform
> a number of load-balances where runnable_avg_sum will mostly be
> reflecting the state of the world before a change (new task queued or
> moved a task to a different cpu). If you had have two tasks continuously
> on one cpu and your other cpu is idle, and you move one of the tasks to
> the other cpu, runnable_avg_sum will remain unchanged, 47742, on the
> first cpu while it starts from 0 on the other one. 10 ms later it will
> have increased a bit, 32 ms later it will be 47742/2, and 345 ms later
> it reaches 47742. In the mean time the cpu doesn't appear fully utilized
> and we might decide to put more tasks on it because we don't know if
> runnable_avg_sum represents a partially utilized cpu (for example a 50%
> task) or if it will continue to rise and eventually get to 47742.
Ah, no, since we track per task, and update the per-cpu ones when we
migrate tasks, the per-cpu values should be instantly updated.
If we were to increase per task storage, we might as well also track
running_avg not only runnable_avg.
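[As an aside, the ramp-up numbers Morten quotes (47742, half of that after 32 ms, saturation around 345 ms) follow from the per-millisecond geometric decay; a quick sketch, assuming the usual y^32 = 1/2 decay factor — the kernel's exact maximum of 47742 comes from the same series with fixed-point arithmetic:]

```python
# Geometric ramp of a PELT-style runnable_avg_sum, assuming a per-ms
# decay factor y with y^32 = 1/2 (a sketch; the kernel uses fixed-point
# arithmetic and tops out at exactly 47742).
y = 0.5 ** (1 / 32)

s = 0.0
history = {}
for ms in range(1, 346):
    s = s * y + 1024         # one fully runnable millisecond
    history[ms] = s

limit = 1024 / (1 - y)       # steady-state maximum, ~47788 in this approximation
print(history[32] / limit)   # 0.5 -> half the maximum after 32 ms
print(history[345] / limit)  # ~0.999 -> effectively saturated after 345 ms
```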
On Tue, Jun 03, 2014 at 04:50:07PM +0100, Peter Zijlstra wrote:
> On Wed, May 28, 2014 at 04:47:03PM +0100, Morten Rasmussen wrote:
> > Since we may do periodic load-balance every 10 ms or so, we will perform
> > a number of load-balances where runnable_avg_sum will mostly be
> > reflecting the state of the world before a change (new task queued or
> > moved a task to a different cpu). If you had have two tasks continuously
> > on one cpu and your other cpu is idle, and you move one of the tasks to
> > the other cpu, runnable_avg_sum will remain unchanged, 47742, on the
> > first cpu while it starts from 0 on the other one. 10 ms later it will
> > have increased a bit, 32 ms later it will be 47742/2, and 345 ms later
> > it reaches 47742. In the mean time the cpu doesn't appear fully utilized
> > and we might decide to put more tasks on it because we don't know if
> > runnable_avg_sum represents a partially utilized cpu (for example a 50%
> > task) or if it will continue to rise and eventually get to 47742.
>
> Ah, no, since we track per task, and update the per-cpu ones when we
> migrate tasks, the per-cpu values should be instantly updated.
No, not for this per-cpu tracking metric :) For cfs.runnable_load_avg
you are right, but this patch set is using rq->avg.runnable_avg_sum
which is different. See my other reply.
> If we were to increase per task storage, we might as well also track
> running_avg not only runnable_avg.
That could probably make sense. We had that in pjt's first proposal.
On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
> On Wed, May 28, 2014 at 04:47:03PM +0100, Morten Rasmussen wrote:
>> Since we may do periodic load-balance every 10 ms or so, we will perform
>> a number of load-balances where runnable_avg_sum will mostly be
>> reflecting the state of the world before a change (new task queued or
>> moved a task to a different cpu). If you had have two tasks continuously
>> on one cpu and your other cpu is idle, and you move one of the tasks to
>> the other cpu, runnable_avg_sum will remain unchanged, 47742, on the
>> first cpu while it starts from 0 on the other one. 10 ms later it will
>> have increased a bit, 32 ms later it will be 47742/2, and 345 ms later
>> it reaches 47742. In the mean time the cpu doesn't appear fully utilized
>> and we might decide to put more tasks on it because we don't know if
>> runnable_avg_sum represents a partially utilized cpu (for example a 50%
>> task) or if it will continue to rise and eventually get to 47742.
>
> Ah, no, since we track per task, and update the per-cpu ones when we
> migrate tasks, the per-cpu values should be instantly updated.
>
> If we were to increase per task storage, we might as well also track
> running_avg not only runnable_avg.
I agree that the removed running_avg should give more useful
information about the load of a CPU.

The main issue with runnable_avg is that it's disturbed by other tasks
(as pointed out previously). As a typical example, if we have 2 tasks
with a load of 25% each on 1 CPU, the unweighted runnable_load_avg will
be in the range [50% - 100%] depending on how much the runtime of the
two tasks overlaps, whereas the reality is 50%, and the use of
running_avg would return this value.
Vincent
On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
> On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
> >
> > Ah, no, since we track per task, and update the per-cpu ones when we
> > migrate tasks, the per-cpu values should be instantly updated.
> >
> > If we were to increase per task storage, we might as well also track
> > running_avg not only runnable_avg.
>
> I agree that the removed running_avg should give more useful
> information about the the load of a CPU.
>
> The main issue with running_avg is that it's disturbed by other tasks
> (as point out previously). As a typical example, if we have 2 tasks
> with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
> in the range of [100% - 50%] depending of the parallelism of the
> runtime of the tasks whereas the reality is 50% and the use of
> running_avg will return this value
I'm not sure I see how 100% is possible, but yes I agree that runnable
can indeed be inflated due to this queueing effect.
On Wed, Jun 04, 2014 at 09:08:09AM +0100, Peter Zijlstra wrote:
> On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
> > On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
> > >
> > > Ah, no, since we track per task, and update the per-cpu ones when we
> > > migrate tasks, the per-cpu values should be instantly updated.
> > >
> > > If we were to increase per task storage, we might as well also track
> > > running_avg not only runnable_avg.
> >
> > I agree that the removed running_avg should give more useful
> > information about the the load of a CPU.
> >
> > The main issue with running_avg is that it's disturbed by other tasks
> > (as point out previously). As a typical example, if we have 2 tasks
> > with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
> > in the range of [100% - 50%] depending of the parallelism of the
> > runtime of the tasks whereas the reality is 50% and the use of
> > running_avg will return this value
Both running_avg and runnable_avg are affected by other tasks on the
same cpu, but in different ways. They are equal if you only have one
task on a cpu. If you have more, running_avg will give you the true
requirement of the tasks until the cpu is fully utilized, at which
point the task running_avg will drop if you add more tasks (the
unweighted sum of task running_avgs remains constant).

runnable_avg, on the other hand, may be affected as soon as you have
two tasks on the same cpu if they are runnable at the same time.
That isn't necessarily a bad thing for load-balancing purposes, because
tasks that are runnable at the same time are likely to be run more
efficiently by placing them on different cpus. You might view it as a
sort of built-in concurrency factor, somewhat similar to what Yuyang is
proposing. runnable_avg increases rapidly when the cpu is over-utilized.
> I'm not sure I see how 100% is possible, but yes I agree that runnable
> can indeed be inflated due to this queueing effect.
You should only be able to get to 75% worst case for runnable_avg for
that example. The total running_avg is 50% no matter if the tasks
overlaps or not.
If you had five tasks on one cpu that each have a 25% requirement, you
can get individual task runnable_avgs of up to 100% (the cpu unweighted
runnable_load_avg can get up to 500%, I think), but the task
running_avgs would be 20% each (a total of 100%).
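[The arithmetic behind that five-task example, treating loads as plain fractions — a sketch only, ignoring PELT decay:]

```python
n_tasks, demand = 5, 0.25

total_demand = n_tasks * demand            # 1.25 -> the cpu is over-subscribed
running_each = min(demand, 1.0 / n_tasks)  # fair sharing caps each task's running time
print(running_each)                        # 0.2 -> each task running_avg drops to 20%
print(n_tasks * running_each)              # 1.0 -> unweighted sum pinned at 100%
```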
On Wed, Jun 04, 2014 at 09:55:42AM +0100, Morten Rasmussen wrote:
> Both running_avg and runnable_avg are affected by other tasks on the
> same cpus, but in different ways. They are equal if you only have one
> task on a cpu. If you have more, running_avg will give you the true
> requirement of the tasks until the cpu is fully utilized. At which point
> the task running_avg will drop if you add more tasks (the unweighted sum
> of task running_avgs remains constant).
>
> runnable_avg on the other hand, might be affected as soon as you have
> two task running on the same cpu if they are runnable at the same time.
> That isn't necessarily a bad thing for load-balancing purposes, because
> tasks that are runnable at the same time are likely to be run more
> efficiently by placing them on different cpus. You might view as at sort
> of built in concurrency factor, somewhat similar to what Yuyang is
> proposing. runnable_avg increases rapidly when the cpu is over-utilized.
Agreed.
> > I'm not sure I see how 100% is possible, but yes I agree that runnable
> > can indeed be inflated due to this queueing effect.
>
> You should only be able to get to 75% worst case for runnable_avg for
> that example. The total running_avg is 50% no matter if the tasks
> overlaps or not.
Yes, 75% is what I ended up with.
> f you had five tasks on one cpu that each have a 25% requirement you can
> get individual task runnable_avgs of up to 100% (cpu unweighted
> runnable_load_avg can get up 500%, I think), but the task running_avgs
> would be 20% each (total of 100%).
Yeah, more or less so indeed. I had not considered the queueing effects
on runnable_avg yesterday, so good that that got raised.
That does indeed invalidate my: runnable - running := extra cpu required
thing. It ends up being the extra cpu required for 0 latency but gobs of
idle time, which is something else entirely.
On 4 June 2014 10:08, Peter Zijlstra <[email protected]> wrote:
> On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
>> On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
>> >
>> > Ah, no, since we track per task, and update the per-cpu ones when we
>> > migrate tasks, the per-cpu values should be instantly updated.
>> >
>> > If we were to increase per task storage, we might as well also track
>> > running_avg not only runnable_avg.
>>
>> I agree that the removed running_avg should give more useful
>> information about the the load of a CPU.
>>
>> The main issue with running_avg is that it's disturbed by other tasks
>> (as point out previously). As a typical example, if we have 2 tasks
>> with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
>> in the range of [100% - 50%] depending of the parallelism of the
>> runtime of the tasks whereas the reality is 50% and the use of
>> running_avg will return this value
>
> I'm not sure I see how 100% is possible, but yes I agree that runnable
> can indeed be inflated due to this queueing effect.
In fact, it can be even worse than that, because I forgot to take into
account the geometric-series effect, which implies that the result
depends on the runtime (and idle time) of the task.
Take 3 examples:

2 tasks that each need to run 10 ms simultaneously every 40 ms. If they
share the same CPU, they will be on the runqueue for 20 ms (in fact a
bit less for one of them). Their load
(runnable_avg_sum/runnable_avg_period) will be 33% each, so the
unweighted runnable_load_avg of the CPU will be 66%.

2 tasks that each need to run 25 ms simultaneously every 100 ms. If they
share the same CPU, they will be on the runqueue for 50 ms (in fact a
bit less for one of them). Their load
(runnable_avg_sum/runnable_avg_period) will be 74% each, so the
unweighted runnable_load_avg of the CPU will be 148%.

2 tasks that each need to run 50 ms simultaneously every 200 ms. If they
share the same CPU, they will be on the runqueue for 100 ms (in fact a
bit less for one of them). Their load
(runnable_avg_sum/runnable_avg_period) will be 89% each, so the
unweighted runnable_load_avg of the CPU will be 180%.
Vincent
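[The geometric-series effect in Vincent's examples can be sketched numerically. For a periodic task that is runnable r ms out of every p ms, the steady-state peak of runnable_avg_sum/runnable_avg_period works out to (1 - y^r)/(1 - y^p), assuming the usual per-ms decay factor y with y^32 = 1/2; `steady_state_load` below is an illustrative name:]

```python
Y = 0.5 ** (1 / 32)  # per-ms decay factor, Y^32 = 1/2

def steady_state_load(runnable_ms, period_ms):
    # Steady-state peak of runnable_avg_sum / runnable_avg_period for a
    # task runnable `runnable_ms` out of every `period_ms`, sampled right
    # after the runnable window (a sketch of the geometric-series sum).
    return (1 - Y ** runnable_ms) / (1 - Y ** period_ms)

print(steady_state_load(50, 100))   # ~0.747 -> the "74% each" case (25 ms tasks sharing)
print(steady_state_load(100, 200))  # ~0.897 -> the "89% each" case (50 ms tasks sharing)
```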
On 4 June 2014 11:23, Peter Zijlstra <[email protected]> wrote:
> On Wed, Jun 04, 2014 at 09:55:42AM +0100, Morten Rasmussen wrote:
>> Both running_avg and runnable_avg are affected by other tasks on the
>> same cpus, but in different ways. They are equal if you only have one
>> task on a cpu. If you have more, running_avg will give you the true
>> requirement of the tasks until the cpu is fully utilized. At which point
>> the task running_avg will drop if you add more tasks (the unweighted sum
>> of task running_avgs remains constant).
>>
>> runnable_avg on the other hand, might be affected as soon as you have
>> two task running on the same cpu if they are runnable at the same time.
>> That isn't necessarily a bad thing for load-balancing purposes, because
>> tasks that are runnable at the same time are likely to be run more
>> efficiently by placing them on different cpus. You might view as at sort
>> of built in concurrency factor, somewhat similar to what Yuyang is
>> proposing. runnable_avg increases rapidly when the cpu is over-utilized.
>
> Agreed.
>
>> > I'm not sure I see how 100% is possible, but yes I agree that runnable
>> > can indeed be inflated due to this queueing effect.
>>
>> You should only be able to get to 75% worst case for runnable_avg for
>> that example. The total running_avg is 50% no matter if the tasks
>> overlaps or not.
>
> Yes, 75% is what I ended up with.
Can you explain how you reach 75%? It depends on the runtime, and a
runtime longer than 345 ms will end up at a 100% load whatever the
idle time was previously.
>
>> f you had five tasks on one cpu that each have a 25% requirement you can
>> get individual task runnable_avgs of up to 100% (cpu unweighted
>> runnable_load_avg can get up 500%, I think), but the task running_avgs
>> would be 20% each (total of 100%).
>
> Yeah, more or less so indeed. I had not considered the queueing effects
> on runnable_avg yesterday, so good that that got raised.
>
> That does indeed invalidate my: runnable - running := extra cpu required
> thing. It ends up being the extra cpu required for 0 latency but gobs of
> idle time, which is something else entirely.
On Wed, Jun 04, 2014 at 10:23:13AM +0100, Peter Zijlstra wrote:
> > f you had five tasks on one cpu that each have a 25% requirement you can
> > get individual task runnable_avgs of up to 100% (cpu unweighted
> > runnable_load_avg can get up 500%, I think), but the task running_avgs
> > would be 20% each (total of 100%).
>
> Yeah, more or less so indeed. I had not considered the queueing effects
> on runnable_avg yesterday, so good that that got raised.
>
> That does indeed invalidate my: runnable - running := extra cpu required
> thing. It ends up being the extra cpu required for 0 latency but gobs of
> idle time, which is something else entirely.
Agreed, but I think it is still a useful estimate of the required
compute capacity. If there is a significant difference between runnable
and running on a cpu, the current mix of tasks is not good for latency.
However, we need to treat it as a worst case estimate and not necessarily
try to move exactly runnable-running worth of tasks to another cpu.
So far I haven't been able to come up with something better.
On Wed, Jun 04, 2014 at 10:32:10AM +0100, Vincent Guittot wrote:
> On 4 June 2014 10:08, Peter Zijlstra <[email protected]> wrote:
> > On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
> >> On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
> >> >
> >> > Ah, no, since we track per task, and update the per-cpu ones when we
> >> > migrate tasks, the per-cpu values should be instantly updated.
> >> >
> >> > If we were to increase per task storage, we might as well also track
> >> > running_avg not only runnable_avg.
> >>
> >> I agree that the removed running_avg should give more useful
> >> information about the the load of a CPU.
> >>
> >> The main issue with running_avg is that it's disturbed by other tasks
> >> (as point out previously). As a typical example, if we have 2 tasks
> >> with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
> >> in the range of [100% - 50%] depending of the parallelism of the
> >> runtime of the tasks whereas the reality is 50% and the use of
> >> running_avg will return this value
> >
> > I'm not sure I see how 100% is possible, but yes I agree that runnable
> > can indeed be inflated due to this queueing effect.
>
> In fact, it can be even worse than that because i forgot to take into
> account the geometric series effect which implies that it depends of
> the runtime (idletime) of the task
>
> Take 3 examples:
>
> 2 tasks that need to run 10ms simultaneously each 40ms. If they share
> the same CPU, they will be on the runqueue 20ms (in fact a bit less
> for one of them), Their load (runnable_avg_sum/runnable_avg_period)
> will be 33% each so the unweighted runnable_load_avg of the CPU will
> be 66%
Right, it actually depends on how often you switch between the tasks.
If your sched_tick happens every 10 ms, then in this example one task
will run for 10 ms and be done, while the other one waits for 10 ms and
then runs to completion in 10 ms. The result is that one task is
runnable for 10 ms and the other one is runnable for 20 ms. That gives
you 25% and 50% for a total of 75%.
>
> 2 tasks that need to run 25ms simultaneously each 100ms. If they share
> the same CPU, they will be on the runqueue 50ms (in fact a bit less
> for one of them), Their load (runnable_avg_sum/runnable_avg_period)
> will be 74% each so the unweighted runnable_load_avg of the CPU will
> be 148%
>
> 2 tasks that need to run 50ms simultaneously each 200ms. If they
> share the same CPU, they will be on the runqueue 100ms (in fact a bit
> less for one of them), Their load
> (runnable_avg_sum/runnable_avg_period) will be 89% each so the
> unweighted runnable_load_avg of the CPU will be 180%
In this case you switch between the tasks before they complete, so both
tasks are waiting and runnable will get much higher than 75%.
You are right. I was thinking about tasks where the busy period is
short enough that they run to completion when they are scheduled.
On Wed, Jun 04, 2014 at 11:32:10AM +0200, Vincent Guittot wrote:
> On 4 June 2014 10:08, Peter Zijlstra <[email protected]> wrote:
> > On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
> >> On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
> >> >
> >> > Ah, no, since we track per task, and update the per-cpu ones when we
> >> > migrate tasks, the per-cpu values should be instantly updated.
> >> >
> >> > If we were to increase per task storage, we might as well also track
> >> > running_avg not only runnable_avg.
> >>
> >> I agree that the removed running_avg should give more useful
> >> information about the the load of a CPU.
> >>
> >> The main issue with running_avg is that it's disturbed by other tasks
> >> (as point out previously). As a typical example, if we have 2 tasks
> >> with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
> >> in the range of [100% - 50%] depending of the parallelism of the
> >> runtime of the tasks whereas the reality is 50% and the use of
> >> running_avg will return this value
> >
> > I'm not sure I see how 100% is possible, but yes I agree that runnable
> > can indeed be inflated due to this queueing effect.
Let me explain the 75%. Take any one of the above scenarios. Let's call
the two tasks A and B, and for a moment assume A always wins and runs
first, and then B.
So A will be runnable for 25%, B otoh will be runnable the entire time A
is actually running plus its own running time, giving 50%. Together that
makes 75%.
If you release the assumption that A runs first, but instead assume they
equally win the first execution, you get them averaging at 37.5% each,
which combined will still give 75%.
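[In numbers — a sketch of the run-to-completion case, two tasks each needing 10 ms per 40 ms window, with A assumed to always run first:]

```python
period_ms, burst_ms = 40, 10

runnable_a = burst_ms             # A runs immediately: 10 ms runnable
runnable_b = burst_ms + burst_ms  # B waits out A, then runs: 20 ms runnable
print(runnable_a / period_ms)                 # 0.25
print(runnable_b / period_ms)                 # 0.5
print((runnable_a + runnable_b) / period_ms)  # 0.75
```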
> In fact, it can be even worse than that because i forgot to take into
> account the geometric series effect which implies that it depends of
> the runtime (idletime) of the task
>
> Take 3 examples:
>
> 2 tasks that need to run 10ms simultaneously each 40ms. If they share
> the same CPU, they will be on the runqueue 20ms (in fact a bit less
> for one of them), Their load (runnable_avg_sum/runnable_avg_period)
> will be 33% each so the unweighted runnable_load_avg of the CPU will
> be 66%
>
> 2 tasks that need to run 25ms simultaneously each 100ms. If they share
> the same CPU, they will be on the runqueue 50ms (in fact a bit less
> for one of them), Their load (runnable_avg_sum/runnable_avg_period)
> will be 74% each so the unweighted runnable_load_avg of the CPU will
> be 148%
>
> 2 tasks that need to run 50ms simultaneously each 200ms. If they
> share the same CPU, they will be on the runqueue 100ms (in fact a bit
> less for one of them), Their load
> (runnable_avg_sum/runnable_avg_period) will be 89% each so the
> unweighted runnable_load_avg of the CPU will be 180%
And this is because the running time is 'large' compared to the decay
and we get hit by the weight of the recent state? Yes, I can see that,
the avg will fluctuate due to the nature of this thing.
On Wed, Jun 04, 2014 at 10:35:15AM +0100, Vincent Guittot wrote:
> On 4 June 2014 11:23, Peter Zijlstra <[email protected]> wrote:
> > On Wed, Jun 04, 2014 at 09:55:42AM +0100, Morten Rasmussen wrote:
> >> Both running_avg and runnable_avg are affected by other tasks on the
> >> same cpus, but in different ways. They are equal if you only have one
> >> task on a cpu. If you have more, running_avg will give you the true
> >> requirement of the tasks until the cpu is fully utilized. At which point
> >> the task running_avg will drop if you add more tasks (the unweighted sum
> >> of task running_avgs remains constant).
> >>
> >> runnable_avg on the other hand, might be affected as soon as you have
> >> two task running on the same cpu if they are runnable at the same time.
> >> That isn't necessarily a bad thing for load-balancing purposes, because
> >> tasks that are runnable at the same time are likely to be run more
> >> efficiently by placing them on different cpus. You might view as at sort
> >> of built in concurrency factor, somewhat similar to what Yuyang is
> >> proposing. runnable_avg increases rapidly when the cpu is over-utilized.
> >
> > Agreed.
> >
> >> > I'm not sure I see how 100% is possible, but yes I agree that runnable
> >> > can indeed be inflated due to this queueing effect.
> >>
> >> You should only be able to get to 75% worst case for runnable_avg for
> >> that example. The total running_avg is 50% no matter if the tasks
> >> overlaps or not.
> >
> > Yes, 75% is what I ended up with.
>
> Can you explain how you reach 75% as it depends on the runtime and a
> runtime longer than 345ms will end to a 100% load whatever the
> idletime was previously ?
If the busy period of each task is long enough that the first one to
run runs to completion before the other task is scheduled, you get 25%
and 50%. But as I said in my other reply, you can get higher task
runnable_avg if the tasks' busy periods are long enough that you switch
between them before the first one completes.
On Wed, Jun 04, 2014 at 11:17:24AM +0100, Peter Zijlstra wrote:
> On Wed, Jun 04, 2014 at 11:32:10AM +0200, Vincent Guittot wrote:
> > On 4 June 2014 10:08, Peter Zijlstra <[email protected]> wrote:
> > > On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
> > >> On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
> > >> >
> > >> > Ah, no, since we track per task, and update the per-cpu ones when we
> > >> > migrate tasks, the per-cpu values should be instantly updated.
> > >> >
> > >> > If we were to increase per task storage, we might as well also track
> > >> > running_avg not only runnable_avg.
> > >>
> > >> I agree that the removed running_avg should give more useful
> > >> information about the the load of a CPU.
> > >>
> > >> The main issue with running_avg is that it's disturbed by other tasks
> > >> (as point out previously). As a typical example, if we have 2 tasks
> > >> with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
> > >> in the range of [100% - 50%] depending of the parallelism of the
> > >> runtime of the tasks whereas the reality is 50% and the use of
> > >> running_avg will return this value
> > >
> > > I'm not sure I see how 100% is possible, but yes I agree that runnable
> > > can indeed be inflated due to this queueing effect.
>
> Let me explain the 75%, take any one of the above scenarios. Lets call
> the two tasks A and B, and let for a moment assume A always wins and
> runs first, and then B.
>
> So A will be runnable for 25%, B otoh will be runnable the entire time A
> is actually running plus its own running time, giving 50%. Together that
> makes 75%.
>
> If you release the assumption that A runs first, but instead assume they
> equally win the first execution, you get them averaging at 37.5% each,
> which combined will still give 75%.
But that is assuming that the first task gets to run to completion of
its busy period. If it uses up its sched_slice and we switch to the
other task, they both get to wait.

For example, if the sched_slice is 5 ms and the busy period is 10 ms,
the execution pattern would be: A, B, A, B, idle, ... In that case A is
runnable for 15 ms and B for 20 ms. Assuming that the overall period is
40 ms, A's runnable is 37.5% and B's is 50%.
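[That preemption pattern can be checked with a tiny round-robin sketch — fixed slices, raw runnable time only, no PELT decay; `runnable_fractions` is an illustrative name:]

```python
def runnable_fractions(slice_ms, burst_ms, period_ms):
    # Two tasks, each needing burst_ms of CPU per period_ms, sharing one
    # cpu round-robin in slice_ms quanta; returns each task's fraction of
    # the period spent runnable (waiting on the runqueue or running).
    remaining = [burst_ms, burst_ms]
    runnable = [0, 0]
    cur = 0
    while any(remaining):
        run = min(slice_ms, remaining[cur])
        for t in (0, 1):
            if remaining[t]:
                runnable[t] += run   # waiting and running both count as runnable
        remaining[cur] -= run
        if remaining[1 - cur]:
            cur = 1 - cur            # switch if the other task still has work
    return [r / period_ms for r in runnable]

print(runnable_fractions(5, 10, 40))   # [0.375, 0.5] -> the A, B, A, B, idle pattern
print(runnable_fractions(10, 10, 40))  # [0.25, 0.5]  -> run-to-completion case
```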
On Wed, Jun 04, 2014 at 11:36:19AM +0100, Morten Rasmussen wrote:
> On Wed, Jun 04, 2014 at 11:17:24AM +0100, Peter Zijlstra wrote:
> > Let me explain the 75%, take any one of the above scenarios. Lets call
> > the two tasks A and B, and let for a moment assume A always wins and
> > runs first, and then B.
> >
> > So A will be runnable for 25%, B otoh will be runnable the entire time A
> > is actually running plus its own running time, giving 50%. Together that
> > makes 75%.
> >
> > If you release the assumption that A runs first, but instead assume they
> > equally win the first execution, you get them averaging at 37.5% each,
> > which combined will still give 75%.
>
> But that is assuming that the first task gets to run to completion of it
> busy period. If it uses up its sched_slice and we switch to the other
> tasks, they both get to wait.
>
> For example, if the sched_slice is 5 ms and the busy period is 10 ms,
> the execution pattern would be: A, B, A, B, idle, ... In that case A is
> runnable for 15 ms and B is for 20 ms. Assuming that the overall period
> is 40 ms, the A runnable is 37.5% and B is 50%.
Indeed, with preemption added you can push this out further; you can
then get infinitesimally close to 100%.
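A toy round-robin model (plain Python, illustrative only; the function name and parameters are made up for the sketch) shows how shrinking the preemption slice pushes the combined runnable fraction of the two 25% tasks toward 100%:

```python
def combined_runnable_pct(slice_ms, busy_ms=10, period_ms=40):
    """Round-robin A/B with a given preemption slice; returns the
    combined runnable time of both tasks as a % of the period."""
    work = [busy_ms, busy_ms]     # remaining busy time (ms) for A, B
    runnable = [0, 0]             # time on_rq (queued or running)
    t, turn = 0, 0
    while t < period_ms:
        if work[0] == 0 and work[1] == 0:
            t += 1                # both tasks sleeping: idle tick
            continue
        if work[turn] == 0:
            turn ^= 1             # skip a task that has finished
        run = min(slice_ms, work[turn])
        for i in (0, 1):
            if work[i] > 0:       # waiting and running both count
                runnable[i] += run
        work[turn] -= run
        turn ^= 1
        t += run
    return 100.0 * sum(runnable) / period_ms

for s_ms in (10, 5, 1):
    print(s_ms, combined_runnable_pct(s_ms))  # 75.0, 87.5, 97.5
```

With a 10 ms slice (run to completion) the combined figure is the 75% discussed above; at 5 ms it is 87.5%, and it keeps climbing as the slice shrinks.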
On 4 June 2014 12:36, Morten Rasmussen <[email protected]> wrote:
> On Wed, Jun 04, 2014 at 11:17:24AM +0100, Peter Zijlstra wrote:
>> On Wed, Jun 04, 2014 at 11:32:10AM +0200, Vincent Guittot wrote:
>> > On 4 June 2014 10:08, Peter Zijlstra <[email protected]> wrote:
>> > > On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
>> > >> On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
>> > >> > On Wed, May 28, 2014 at 04:47:03PM +0100, Morten Rasmussen wrote:
>> > >> >> Since we may do periodic load-balance every 10 ms or so, we will perform
>> > >> >> a number of load-balances where runnable_avg_sum will mostly be
>> > >> >> reflecting the state of the world before a change (new task queued or
>> > >> >> moved a task to a different cpu). If you had have two tasks continuously
>> > >> >> on one cpu and your other cpu is idle, and you move one of the tasks to
>> > >> >> the other cpu, runnable_avg_sum will remain unchanged, 47742, on the
>> > >> >> first cpu while it starts from 0 on the other one. 10 ms later it will
>> > >> >> have increased a bit, 32 ms later it will be 47742/2, and 345 ms later
>> > >> >> it reaches 47742. In the mean time the cpu doesn't appear fully utilized
>> > >> >> and we might decide to put more tasks on it because we don't know if
>> > >> >> runnable_avg_sum represents a partially utilized cpu (for example a 50%
>> > >> >> task) or if it will continue to rise and eventually get to 47742.
>> > >> >
>> > >> > Ah, no, since we track per task, and update the per-cpu ones when we
>> > >> > migrate tasks, the per-cpu values should be instantly updated.
>> > >> >
>> > >> > If we were to increase per task storage, we might as well also track
>> > >> > running_avg not only runnable_avg.
>> > >>
>> > >> I agree that the removed running_avg should give more useful
>> > >> information about the the load of a CPU.
>> > >>
>> > >> The main issue with running_avg is that it's disturbed by other tasks
>> > >> (as point out previously). As a typical example, if we have 2 tasks
>> > >> with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
>> > >> in the range of [100% - 50%] depending of the parallelism of the
>> > >> runtime of the tasks whereas the reality is 50% and the use of
>> > >> running_avg will return this value
>> > >
>> > > I'm not sure I see how 100% is possible, but yes I agree that runnable
>> > > can indeed be inflated due to this queueing effect.
>>
>> Let me explain the 75%, take any one of the above scenarios. Lets call
>> the two tasks A and B, and let for a moment assume A always wins and
>> runs first, and then B.
>>
>> So A will be runnable for 25%, B otoh will be runnable the entire time A
>> is actually running plus its own running time, giving 50%. Together that
>> makes 75%.
>>
>> If you release the assumption that A runs first, but instead assume they
>> equally win the first execution, you get them averaging at 37.5% each,
>> which combined will still give 75%.
>
> But that is assuming that the first task gets to run to completion of it
> busy period. If it uses up its sched_slice and we switch to the other
> tasks, they both get to wait.
>
> For example, if the sched_slice is 5 ms and the busy period is 10 ms,
> the execution pattern would be: A, B, A, B, idle, ... In that case A is
> runnable for 15 ms and B is for 20 ms. Assuming that the overall period
> is 40 ms, the A runnable is 37.5% and B is 50%.
The exact values for your scheduling example above are:
A's runnable will be 47% and B's runnable will be 60% (unless I made a
mistake in my computation),
and the CPU's runnable will be 60% too.
Vincent
>
On Wed, Jun 04, 2014 at 12:07:29PM +0100, Vincent Guittot wrote:
> On 4 June 2014 12:36, Morten Rasmussen <[email protected]> wrote:
> > On Wed, Jun 04, 2014 at 11:17:24AM +0100, Peter Zijlstra wrote:
> >> On Wed, Jun 04, 2014 at 11:32:10AM +0200, Vincent Guittot wrote:
> >> > On 4 June 2014 10:08, Peter Zijlstra <[email protected]> wrote:
> >> > > On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
> >> > >> On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
> >> > >> > On Wed, May 28, 2014 at 04:47:03PM +0100, Morten Rasmussen wrote:
> >> > >> >> Since we may do periodic load-balance every 10 ms or so, we will perform
> >> > >> >> a number of load-balances where runnable_avg_sum will mostly be
> >> > >> >> reflecting the state of the world before a change (new task queued or
> >> > >> >> moved a task to a different cpu). If you had have two tasks continuously
> >> > >> >> on one cpu and your other cpu is idle, and you move one of the tasks to
> >> > >> >> the other cpu, runnable_avg_sum will remain unchanged, 47742, on the
> >> > >> >> first cpu while it starts from 0 on the other one. 10 ms later it will
> >> > >> >> have increased a bit, 32 ms later it will be 47742/2, and 345 ms later
> >> > >> >> it reaches 47742. In the mean time the cpu doesn't appear fully utilized
> >> > >> >> and we might decide to put more tasks on it because we don't know if
> >> > >> >> runnable_avg_sum represents a partially utilized cpu (for example a 50%
> >> > >> >> task) or if it will continue to rise and eventually get to 47742.
> >> > >> >
> >> > >> > Ah, no, since we track per task, and update the per-cpu ones when we
> >> > >> > migrate tasks, the per-cpu values should be instantly updated.
> >> > >> >
> >> > >> > If we were to increase per task storage, we might as well also track
> >> > >> > running_avg not only runnable_avg.
> >> > >>
> >> > >> I agree that the removed running_avg should give more useful
> >> > >> information about the the load of a CPU.
> >> > >>
> >> > >> The main issue with running_avg is that it's disturbed by other tasks
> >> > >> (as point out previously). As a typical example, if we have 2 tasks
> >> > >> with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
> >> > >> in the range of [100% - 50%] depending of the parallelism of the
> >> > >> runtime of the tasks whereas the reality is 50% and the use of
> >> > >> running_avg will return this value
> >> > >
> >> > > I'm not sure I see how 100% is possible, but yes I agree that runnable
> >> > > can indeed be inflated due to this queueing effect.
> >>
> >> Let me explain the 75%, take any one of the above scenarios. Lets call
> >> the two tasks A and B, and let for a moment assume A always wins and
> >> runs first, and then B.
> >>
> >> So A will be runnable for 25%, B otoh will be runnable the entire time A
> >> is actually running plus its own running time, giving 50%. Together that
> >> makes 75%.
> >>
> >> If you release the assumption that A runs first, but instead assume they
> >> equally win the first execution, you get them averaging at 37.5% each,
> >> which combined will still give 75%.
> >
> > But that is assuming that the first task gets to run to completion of it
> > busy period. If it uses up its sched_slice and we switch to the other
> > tasks, they both get to wait.
> >
> > For example, if the sched_slice is 5 ms and the busy period is 10 ms,
> > the execution pattern would be: A, B, A, B, idle, ... In that case A is
> > runnable for 15 ms and B is for 20 ms. Assuming that the overall period
> > is 40 ms, the A runnable is 37.5% and B is 50%.
>
> The exact value for your scheduling example above is:
> A runnable will be 47% and B runnable will be 60% (unless i make a
> mistake in my computation)
I get:
A: 15/40 ms = 37.5%
B: 20/40 ms = 50%
Schedule:
| 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms |
A: run rq run ----------- sleeping ------------- run
B: rq run rq run ---- sleeping ------------- rq
> and CPU runnable will be 60% too
rq->avg.runnable_avg_sum should be 50%. You have two tasks running for
20 ms every 40 ms.
Right?
Morten
On 4 June 2014 13:23, Morten Rasmussen <[email protected]> wrote:
> On Wed, Jun 04, 2014 at 12:07:29PM +0100, Vincent Guittot wrote:
>> On 4 June 2014 12:36, Morten Rasmussen <[email protected]> wrote:
>> > On Wed, Jun 04, 2014 at 11:17:24AM +0100, Peter Zijlstra wrote:
>> >> On Wed, Jun 04, 2014 at 11:32:10AM +0200, Vincent Guittot wrote:
>> >> > On 4 June 2014 10:08, Peter Zijlstra <[email protected]> wrote:
>> >> > > On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
>> >> > >> On 3 June 2014 17:50, Peter Zijlstra <[email protected]> wrote:
>> >> > >> > On Wed, May 28, 2014 at 04:47:03PM +0100, Morten Rasmussen wrote:
>> >> > >> >> Since we may do periodic load-balance every 10 ms or so, we will perform
>> >> > >> >> a number of load-balances where runnable_avg_sum will mostly be
>> >> > >> >> reflecting the state of the world before a change (new task queued or
>> >> > >> >> moved a task to a different cpu). If you had have two tasks continuously
>> >> > >> >> on one cpu and your other cpu is idle, and you move one of the tasks to
>> >> > >> >> the other cpu, runnable_avg_sum will remain unchanged, 47742, on the
>> >> > >> >> first cpu while it starts from 0 on the other one. 10 ms later it will
>> >> > >> >> have increased a bit, 32 ms later it will be 47742/2, and 345 ms later
>> >> > >> >> it reaches 47742. In the mean time the cpu doesn't appear fully utilized
>> >> > >> >> and we might decide to put more tasks on it because we don't know if
>> >> > >> >> runnable_avg_sum represents a partially utilized cpu (for example a 50%
>> >> > >> >> task) or if it will continue to rise and eventually get to 47742.
>> >> > >> >
>> >> > >> > Ah, no, since we track per task, and update the per-cpu ones when we
>> >> > >> > migrate tasks, the per-cpu values should be instantly updated.
>> >> > >> >
>> >> > >> > If we were to increase per task storage, we might as well also track
>> >> > >> > running_avg not only runnable_avg.
>> >> > >>
>> >> > >> I agree that the removed running_avg should give more useful
>> >> > >> information about the the load of a CPU.
>> >> > >>
>> >> > >> The main issue with running_avg is that it's disturbed by other tasks
>> >> > >> (as point out previously). As a typical example, if we have 2 tasks
>> >> > >> with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
>> >> > >> in the range of [100% - 50%] depending of the parallelism of the
>> >> > >> runtime of the tasks whereas the reality is 50% and the use of
>> >> > >> running_avg will return this value
>> >> > >
>> >> > > I'm not sure I see how 100% is possible, but yes I agree that runnable
>> >> > > can indeed be inflated due to this queueing effect.
>> >>
>> >> Let me explain the 75%, take any one of the above scenarios. Lets call
>> >> the two tasks A and B, and let for a moment assume A always wins and
>> >> runs first, and then B.
>> >>
>> >> So A will be runnable for 25%, B otoh will be runnable the entire time A
>> >> is actually running plus its own running time, giving 50%. Together that
>> >> makes 75%.
>> >>
>> >> If you release the assumption that A runs first, but instead assume they
>> >> equally win the first execution, you get them averaging at 37.5% each,
>> >> which combined will still give 75%.
>> >
>> > But that is assuming that the first task gets to run to completion of it
>> > busy period. If it uses up its sched_slice and we switch to the other
>> > tasks, they both get to wait.
>> >
>> > For example, if the sched_slice is 5 ms and the busy period is 10 ms,
>> > the execution pattern would be: A, B, A, B, idle, ... In that case A is
>> > runnable for 15 ms and B is for 20 ms. Assuming that the overall period
>> > is 40 ms, the A runnable is 37.5% and B is 50%.
>>
>> The exact value for your scheduling example above is:
>> A runnable will be 47% and B runnable will be 60% (unless i make a
>> mistake in my computation)
>
> I get:
>
> A: 15/40 ms = 37.5%
> B: 20/40 ms = 50%
>
> Schedule:
>
> | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms |
> A: run rq run ----------- sleeping ------------- run
> B: rq run rq run ---- sleeping ------------- rq
>
>> and CPU runnable will be 60% too
>
> rq->avg.runnable_avg_sum should be 50%. You have two tasks running for
> 20 ms every 40 ms.
>
> Right?
OK, I see the misunderstanding.
It depends on what we mean by runnable. You take the % of time,
whereas I take runnable_avg_sum/period.
So A is on_rq 15/40 ms = 37.5% of the time, which gives a
runnable_avg_sum/runnable_avg_period of 47%.
B is on_rq 20/40 ms = 50% of the time, which gives a
runnable_avg_sum/runnable_avg_period of 60%.
And the CPU has a task on its rq 20/40 ms = 50% of the time, which gives
a runnable_avg_sum/runnable_avg_period of 60%.
Vincent
>
> Morten
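The two notions can be checked with a small sketch (plain Python, not kernel code; it follows the PELT convention of 1024 contribution units per ms and a decay factor y with y^32 = 0.5, and samples each task's ratio just as its runnable stretch ends; the function name is made up):

```python
# Illustrative PELT-style sketch, not kernel code: a task contributes
# 1024 per runnable ms, and both the sum and the period decay by y
# each ms, with y chosen so that the weight halves every 32 ms.
Y = 0.5 ** (1.0 / 32)

def pelt_ratio(runnable_ms, period_ms, sample_ms, periods=200):
    """Steady-state runnable_avg_sum / runnable_avg_period, sampled
    just after the update at sample_ms of the final period."""
    avg_sum = avg_period = 0.0
    for n in range(periods):
        for ms in range(period_ms):
            avg_sum = avg_sum * Y + (1024 if ms in runnable_ms else 0)
            avg_period = avg_period * Y + 1024
            if n == periods - 1 and ms == sample_ms:
                return avg_sum / avg_period

# A is on_rq for ms 0..14; B (and hence the CPU's rq) for ms 0..19:
ratio_a = pelt_ratio(set(range(15)), 40, 14)  # about 0.48, vs 37.5% of wall time
ratio_b = pelt_ratio(set(range(20)), 40, 19)  # about 0.61, vs 50% of wall time
```

Because the decay weights recent milliseconds more heavily, a ratio sampled right after a runnable stretch sits above the plain time fraction, which is how 37.5% of wall time becomes roughly 47-48% and 50% becomes roughly 60%.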
On Wed, Jun 04, 2014 at 12:52:46PM +0100, Vincent Guittot wrote:
> On 4 June 2014 13:23, Morten Rasmussen <[email protected]> wrote:
> > I get:
> >
> > A: 15/40 ms = 37.5%
> > B: 20/40 ms = 50%
> >
> > Schedule:
> >
> > | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms |
> > A: run rq run ----------- sleeping ------------- run
> > B: rq run rq run ---- sleeping ------------- rq
> >
> >> and CPU runnable will be 60% too
> >
> > rq->avg.runnable_avg_sum should be 50%. You have two tasks running for
> > 20 ms every 40 ms.
> >
> > Right?
>
> ok, i see the misunderstood.
> it's depends of what we mean by runnable. You take the % of time
> whereas i take the runnable_avg_sum/period
Right. There is a difference.
>
> so A is on_rq 15/40 ms = 37.5% of the time which gives a
> runnable_avg_sum/runnable_avg_period of 47%
> B is on_rq 20/40 ms = 50% of the time which gives a
> runnable_avg_sum/runnable_avg_period of 60%
> and CPU has a task on its rq 20/40ms = 50% of the time which gives a
> runnable_avg_sum/runnable_avg_period of 60%
Yes, that seems about right.
On Wed, Jun 04, 2014 at 02:09:18PM +0100, Morten Rasmussen wrote:
> On Wed, Jun 04, 2014 at 12:52:46PM +0100, Vincent Guittot wrote:
> > On 4 June 2014 13:23, Morten Rasmussen <[email protected]> wrote:
> > > I get:
> > >
> > > A: 15/40 ms = 37.5%
> > > B: 20/40 ms = 50%
> > >
> > > Schedule:
> > >
> > > | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms | 5 ms |
> > > A: run rq run ----------- sleeping ------------- run
> > > B: rq run rq run ---- sleeping ------------- rq
> > >
> > >> and CPU runnable will be 60% too
> > >
> > > rq->avg.runnable_avg_sum should be 50%. You have two tasks running for
> > > 20 ms every 40 ms.
> > >
> > > Right?
> >
> > ok, i see the misunderstood.
> > it's depends of what we mean by runnable. You take the % of time
> > whereas i take the runnable_avg_sum/period
>
> Right. There is a difference.
>
> >
> > so A is on_rq 15/40 ms = 37.5% of the time which gives a
> > runnable_avg_sum/runnable_avg_period of 47%
> > B is on_rq 20/40 ms = 50% of the time which gives a
> > runnable_avg_sum/runnable_avg_period of 60%
> > and CPU has a task on its rq 20/40ms = 50% of the time which gives a
> > runnable_avg_sum/runnable_avg_period of 60%
>
> Yes, that seems about right.
If my calculations are right, cfs.runnable_load_avg would peak 15 ms
into the period at about 104%. After 20 ms, A has decayed more than B
has gained, so the sum is slightly lower.
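A rough check of this (plain Python, illustrative only; cfs.runnable_load_avg is approximated as the plain unweighted sum of the two tasks' decayed ratios, and the helper name is made up):

```python
# Illustrative only, not kernel code: track both tasks' decayed ratios
# through a steady-state 40 ms period; y halves the weight every 32 ms.
Y = 0.5 ** (1.0 / 32)

def trajectory(runnable_ms, period_ms=40, periods=200):
    """Per-ms runnable_avg_sum/runnable_avg_period over the final,
    steady-state period."""
    s = p = 0.0
    for n in range(periods):
        out = []
        for ms in range(period_ms):
            s = s * Y + (1024 if ms in runnable_ms else 0)
            p = p * Y + 1024
            out.append(s / p)
    return out

a = trajectory(set(range(15)))         # A: on_rq for ms 0..14
b = trajectory(set(range(20)))         # B: on_rq for ms 0..19
total = [x + y for x, y in zip(a, b)]  # unweighted sum of task averages
peak_ms = max(range(40), key=lambda t: total[t])
# the sum peaks as A's runnable stretch ends, ~15 ms in, at about 104%
```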