2009-07-24 10:57:37

by sen wang

Subject: report a bug about sched_rt

I have found something wrong in sched_rt.

While debugging my system with RT bandwidth enabled, there was a
runnable realtime FIFO task in the sched_rt run queue and the fair
run queue was empty. I found that the idle task gets scheduled even
though the FIFO task is still sitting in the sched_rt run queue!

This happens when the rt run queue has exceeded its rt_bandwidth
runtime; the scheduler then chooses the idle task instead of the
realtime FIFO task.

The reason lies here: when the scheduler tries to pick a realtime
FIFO task, it checks whether rt_throttled is set. If so, it returns
and tries the fair queue, but that queue is empty, so it falls
through to the idle class.

I don't think this is reasonable; we should give the realtime FIFO
task a chance even when the rt run queue has exceeded its runtime,
because that time is otherwise free CPU time.

To fix this while keeping rt_bandwidth working as before, I think
pick_next_task_rt() is the best place.

pick_next_task_rt() should check one more condition: rq->cfs.nr_running.

So I modified pick_next_task_rt() like this and tested it on my
OMAP3430 Zoom2 board; it works:


static struct task_struct *pick_next_task_rt(struct rq *rq)
{
	struct sched_rt_entity *rt_se;
	struct task_struct *p;
	struct rt_rq *rt_rq;

	...

	if (rt_rq_throttled(rt_rq) && rq->cfs.nr_running)
		return NULL;

	...
}


2009-07-24 12:13:07

by Peter Zijlstra

Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 18:57 +0800, sen wang wrote:
> I have found something wrong in sched_rt.
>
> While debugging my system with RT bandwidth enabled, there was a
> runnable realtime FIFO task in the sched_rt run queue and the fair
> run queue was empty. I found that the idle task gets scheduled even
> though the FIFO task is still sitting in the sched_rt run queue!
>
> This happens when the rt run queue has exceeded its rt_bandwidth
> runtime; the scheduler then chooses the idle task instead of the
> realtime FIFO task.
>
> The reason lies here: when the scheduler tries to pick a realtime
> FIFO task, it checks whether rt_throttled is set. If so, it returns
> and tries the fair queue, but that queue is empty, so it falls
> through to the idle class.
>
> I don't think this is reasonable; we should give the realtime FIFO
> task a chance even when the rt run queue has exceeded its runtime,
> because that time is otherwise free CPU time.
>
> To fix this while keeping rt_bandwidth working as before, I think
> pick_next_task_rt() is the best place.

RT is about determinism; sometimes having some extra time, dependent on
the runnability of SCHED_OTHER tasks, is utterly useless.

If you don't like the throttle, disable it.

2009-07-24 13:12:20

by sen wang

Subject: Re: report a bug about sched_rt

Linux is used in many fields, and SCHED_OTHER tasks are important in
embedded systems.

If there is a runnable task (a realtime task), how can we schedule
the idle task instead? It is ridiculous!

Since the throttle has a bug, why not fix it?

We only need to modify the condition checked when picking rt tasks:

static struct task_struct *pick_next_task_rt(struct rq *rq)
{
	...

	if (rt_rq_throttled(rt_rq) && rq->cfs.nr_running)
		return NULL;

	...
}

2009/7/24 Peter Zijlstra <[email protected]>:
> On Fri, 2009-07-24 at 18:57 +0800, sen wang wrote:
>> I have found something wrong in sched_rt.
>>
>> While debugging my system with RT bandwidth enabled, there was a
>> runnable realtime FIFO task in the sched_rt run queue and the fair
>> run queue was empty. I found that the idle task gets scheduled even
>> though the FIFO task is still sitting in the sched_rt run queue!
>>
>> This happens when the rt run queue has exceeded its rt_bandwidth
>> runtime; the scheduler then chooses the idle task instead of the
>> realtime FIFO task.
>>
>> The reason lies here: when the scheduler tries to pick a realtime
>> FIFO task, it checks whether rt_throttled is set. If so, it returns
>> and tries the fair queue, but that queue is empty, so it falls
>> through to the idle class.
>>
>> I don't think this is reasonable; we should give the realtime FIFO
>> task a chance even when the rt run queue has exceeded its runtime,
>> because that time is otherwise free CPU time.
>>
>> To fix this while keeping rt_bandwidth working as before, I think
>> pick_next_task_rt() is the best place.
>
> RT is about determinism; sometimes having some extra time, dependent on
> the runnability of SCHED_OTHER tasks, is utterly useless.
>
> If you don't like the throttle, disable it.
>
>
>

2009-07-24 13:13:17

by Peter Zijlstra

Subject: Re: report a bug about sched_rt

Don't top post -- again!

On Fri, 2009-07-24 at 21:04 +0800, sen wang wrote:
> Linux is used in many fields, and SCHED_OTHER tasks are important in
> embedded systems.

Irrelevant.

> If there is a runnable task (a realtime task), how can we schedule
> the idle task instead?

Because it ran out of bandwidth.

> It is ridiculous!
>
> Since the throttle has a bug, why not fix it?

It doesn't have a bug, therefore I won't fix it.

The throttle limits the RT tasks to a bandwidth w of u/p.
Since real-time scheduling is about determinism a maximum bandwidth
larger than the minimum bandwidth specified by w is useless since it
cannot be relied upon.

Therefore we don't run RT tasks beyond their bandwidth limit.

Go read up on scheduling theory.

Now you might want a bandwidth of 100% for your RT application (not
something I can recommend for the overall health of your machine) in
which case you're free to change this setting:

echo -1 > /proc/sys/kernel/sched_rt_runtime_us

Should do that for you. Also read:

Documentation/scheduler/sched-rt-group.txt

2009-07-24 13:26:44

by sen wang

Subject: Re: report a bug about sched_rt

Don't lecture me about theory; don't be so doctrinaire, OK?
If the CPU is free and there is a runnable task, how can you schedule
the idle task?
I tell you again: we are not talking about a bandwidth of 100% for RT!
The bug lies with a bandwidth of (100-X)% (X < 100): even during the
remaining time, if there is an rt task, you should not idle the system.


2009/7/24 Peter Zijlstra <[email protected]>:
> Don't top post -- again!
>
> On Fri, 2009-07-24 at 21:04 +0800, sen wang wrote:
>> Linux is used in many fields, and SCHED_OTHER tasks are important in
>> embedded systems.
>
> Irrelevant.
>
>> If there is a runnable task (a realtime task), how can we schedule
>> the idle task instead?
>
> Because it ran out of bandwidth.
>
>> It is ridiculous!
>>
>> Since the throttle has a bug, why not fix it?
>
> It doesn't have a bug, therefore I won't fix it.
>
> The throttle limits the RT tasks to a bandwidth w of u/p.
> Since real-time scheduling is about determinism a maximum bandwidth
> larger than the minimum bandwidth specified by w is useless since it
> cannot be relied upon.
>
> Therefore we don't run RT tasks beyond their bandwidth limit.
>
> Go read up on scheduling theory.
>
> Now you might want a bandwidth of 100% for your RT application (not
> something I can recommend for the overall health of your machine) in
> which case you're free to change this setting:
>
>  echo -1 > /proc/sys/kernel/sched_rt_runtime_us
>
> Should do that for you. Also read:
>
>  Documentation/scheduler/sched-rt-group.txt
>
>
>

2009-07-24 13:32:00

by Peter Zijlstra

Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 21:26 +0800, sen wang wrote:
> Don't lecture me about theory; don't be so doctrinaire, OK?
> If the CPU is free and there is a runnable task, how can you schedule
> the idle task?
> I tell you again: we are not talking about a bandwidth of 100% for RT!
> The bug lies with a bandwidth of (100-X)% (X < 100): even during the
> remaining time, if there is an rt task, you should not idle the system.

*sigh*

Yes we should. I appreciate that you might assume otherwise, but you're
wrong. Suppose you have two competing bandwidth groups, which one will
run over, to what purpose?

Also, your next top post will go to /dev/null.

2009-07-24 13:44:47

by sen wang

Subject: Re: report a bug about sched_rt

2009/7/24 Peter Zijlstra <[email protected]>:
> On Fri, 2009-07-24 at 21:26 +0800, sen wang wrote:
>> Don't lecture me about theory; don't be so doctrinaire, OK?
>> If the CPU is free and there is a runnable task, how can you schedule
>> the idle task?
>> I tell you again: we are not talking about a bandwidth of 100% for RT!
>> The bug lies with a bandwidth of (100-X)% (X < 100): even during the
>> remaining time, if there is an rt task, you should not idle the system.
>
> *sigh*
>
> Yes we should. I appreciate that you might assume otherwise, but you're
> wrong. Suppose you have two competing bandwidth groups, which one will
> run over, to what purpose?
>
> Also, your next top post will go to /dev/null.
>


OK! Maybe you have not understood what I said.
It is not two competing bandwidth groups: there is one active group
and the other is empty. What do you do then?
Why not try it yourself: empty the fair queue, run an rt task, enable
the bandwidth throttle and see what happens!

In many embedded systems the idle task will shut things down, but an
rt task assumes that while it is runnable, idle will not happen!

2009-07-24 13:52:41

by Peter Zijlstra

Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 21:44 +0800, sen wang wrote:
> 2009/7/24 Peter Zijlstra <[email protected]>:
> > On Fri, 2009-07-24 at 21:26 +0800, sen wang wrote:
> >> Don't lecture me about theory; don't be so doctrinaire, OK?
> >> If the CPU is free and there is a runnable task, how can you schedule
> >> the idle task?
> >> I tell you again: we are not talking about a bandwidth of 100% for RT!
> >> The bug lies with a bandwidth of (100-X)% (X < 100): even during the
> >> remaining time, if there is an rt task, you should not idle the system.
> >
> > *sigh*
> >
> > Yes we should. I appreciate that you might assume otherwise, but you're
> > wrong. Suppose you have two competing bandwidth groups, which one will
> > run over, to what purpose?
> >
> > Also, your next top post will go to /dev/null.
> >
>
>
> OK! Maybe you have not understood what I said.
> It is not two competing bandwidth groups: there is one active group
> and the other is empty. What do you do then?

No, but the 1 group is the trivial case of many groups. Changing the
semantics for the trivial case is inconsistent at best, and confusing at
worst.

> Why not try it yourself: empty the fair queue, run an rt task, enable
> the bandwidth throttle and see what happens!

Oh, I know, I wrote the code.

> In many embedded systems the idle task will shut things down, but an
> rt task assumes that while it is runnable, idle will not happen!

How is it my problem when you design your system wrong?

If you want your 1 RT group to not get throttled, disable the throttle,
or adjust it to fit the parameters of your workload. If you don't want
idle to have latency impact on your RT tasks, fix your idle behaviour.

2009-07-24 14:04:18

by sen wang

Subject: Re: report a bug about sched_rt

2009/7/24 Peter Zijlstra <[email protected]>:
> On Fri, 2009-07-24 at 21:44 +0800, sen wang wrote:
>> 2009/7/24 Peter Zijlstra <[email protected]>:
>> > On Fri, 2009-07-24 at 21:26 +0800, sen wang wrote:
>> >> Don't lecture me about theory; don't be so doctrinaire, OK?
>> >> If the CPU is free and there is a runnable task, how can you schedule
>> >> the idle task?
>> >> I tell you again: we are not talking about a bandwidth of 100% for RT!
>> >> The bug lies with a bandwidth of (100-X)% (X < 100): even during the
>> >> remaining time, if there is an rt task, you should not idle the system.
>> >
>> > *sigh*
>> >
>> > Yes we should. I appreciate that you might assume otherwise, but you're
>> > wrong. Suppose you have two competing bandwidth groups, which one will
>> > run over, to what purpose?
>> >
>> > Also, your next top post will go to /dev/null.
>> >
>>
>>
>> OK! Maybe you have not understood what I said.
>> It is not two competing bandwidth groups: there is one active group
>> and the other is empty. What do you do then?
>
> No, but the 1 group is the trivial case of many groups. Changing the
> semantics for the trivial case is inconsistent at best, and confusing at
> worst.
>
>> Why not try it yourself: empty the fair queue, run an rt task, enable
>> the bandwidth throttle and see what happens!
>
> Oh, I know, I wrote the code.
>
>> In many embedded systems the idle task will shut things down, but an
>> rt task assumes that while it is runnable, idle will not happen!
>
> How is it my problem when you design your system wrong?
>
> If you want your 1 RT group to not get throttled, disable the throttle,
> or adjust it to fit the parameters of your workload. If you don't want
> idle to have latency impact on your RT tasks, fix your idle behaviour.
>
>
>

OK,
just one question:
if the CPU is free and there is a runnable task, what do you do?
Schedule that task, or schedule the idle task?

2009-07-24 14:24:07

by sen wang

Subject: Re: report a bug about sched_rt

2009/7/24 Peter Zijlstra <[email protected]>:
> On Fri, 2009-07-24 at 21:44 +0800, sen wang wrote:
>> 2009/7/24 Peter Zijlstra <[email protected]>:
>> > On Fri, 2009-07-24 at 21:26 +0800, sen wang wrote:
>> >> Don't lecture me about theory; don't be so doctrinaire, OK?
>> >> If the CPU is free and there is a runnable task, how can you schedule
>> >> the idle task?
>> >> I tell you again: we are not talking about a bandwidth of 100% for RT!
>> >> The bug lies with a bandwidth of (100-X)% (X < 100): even during the
>> >> remaining time, if there is an rt task, you should not idle the system.
>> >
>> > *sigh*
>> >
>> > Yes we should. I appreciate that you might assume otherwise, but you're
>> > wrong. Suppose you have two competing bandwidth groups, which one will
>> > run over, to what purpose?
>> >
>> > Also, your next top post will go to /dev/null.
>> >
>>
>>
>> OK! Maybe you have not understood what I said.
>> It is not two competing bandwidth groups: there is one active group
>> and the other is empty. What do you do then?
>
> No, but the 1 group is the trivial case of many groups. Changing the
> semantics for the trivial case is inconsistent at best, and confusing at
> worst.
Yes, one group is the trivial case, but you can't say it is useless;
in some systems it is important!
I have read through the scheduler code and tried this approach; it
works:
static struct task_struct *pick_next_task_rt(struct rq *rq)
{
	...

	if (rt_rq_throttled(rt_rq) && rq->cfs.nr_running)
		return NULL;

	...
}

>> Why not try it yourself: empty the fair queue, run an rt task, enable
>> the bandwidth throttle and see what happens!
>
> Oh, I know, I wrote the code.
>
>> In many embedded systems the idle task will shut things down, but an
>> rt task assumes that while it is runnable, idle will not happen!
>
> How is it my problem when you design your system wrong?

My system is fine, but there are no rules about what the idle task
may do. People write idle-task code under the assumption that no task
in the system is runnable, and they write rt-task code under the
assumption that while the task is runnable, the system will not go
idle.

What I said above sounds something like theory, but I don't like the
word "theory"; I call it common sense. The behavior of a throttled RT
group departs from common sense, so don't tell me common sense is
wrong, OK?



>
> If you want your 1 RT group to not get throttled, disable the throttle,
> or adjust it to fit the parameters of your workload. If you don't want
> idle to have latency impact on your RT tasks, fix your idle behaviour.
>

One RT group is important to me, but I also have fair tasks, so the
throttle is also important to me. And don't say idle has a latency
impact on RT tasks; that is ludicrous. Why would we inflict latency on
ourselves with a misbehaving idle task?

2009-07-24 14:46:55

by Peter Zijlstra

Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 22:04 +0800, sen wang wrote:

> just one question:
> if the CPU is free and there is a runnable task, what do you do?
> Schedule that task, or schedule the idle task?

Well, when an RT group is over the bandwidth limit I don't consider them
runnable. Therefore, failing to find any other tasks, we run the idle
task.

2009-07-24 14:47:06

by Peter Zijlstra

Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 22:24 +0800, sen wang wrote:

> > No, but the 1 group is the trivial case of many groups. Changing the
> > semantics for the trivial case is inconsistent at best, and confusing at
> > worst.

> Yes, one group is the trivial case, but you can't say it is useless;
> in some systems it is important!
> I have read through the scheduler code and tried this approach; it
> works:
>
> static struct task_struct *pick_next_task_rt(struct rq *rq)
> {
> 	...
> 	if (rt_rq_throttled(rt_rq) && rq->cfs.nr_running)
> 		return NULL;
> 	...
> }

That might work in the current implementation, but like I already
explained, it's not consistent with the multi-group case. Also, people
are working on making this a proper EDF-scheduled CBS, so such a change
won't generalize.

> > How is it my problem when you design your system wrong?
>
> My system is fine, but there are no rules about what the idle task
> may do. People write idle-task code under the assumption that no task
> in the system is runnable, and they write rt-task code under the
> assumption that while the task is runnable, the system will not go
> idle.
>
> What I said above sounds something like theory, but I don't like the
> word "theory"; I call it common sense. The behavior of a throttled RT
> group departs from common sense, so don't tell me common sense is
> wrong, OK?

There are plenty of examples where common sense utterly fails, the one
that comes to mind is Probability Theory.

> > If you want your 1 RT group to not get throttled, disable the throttle,
> > or adjust it to fit the parameters of your workload. If you don't want
> > idle to have latency impact on your RT tasks, fix your idle behaviour.
> >
>
> One RT group is important to me, but I also have fair tasks, so the
> throttle is also important to me. And don't say idle has a latency
> impact on RT tasks; that is ludicrous. Why would we inflict latency
> on ourselves with a misbehaving idle task?

Yes, configurable idle tasks are nothing new. If you care about wakeup
latency then idle=poll is preferred (it sucks for power saving, but such
is life).

On your embedded board you seem to have a particularly aggressive idle
function wrt power savings, which would result in rather large wake from
idle latencies, regardless of the bandwidth throttle, so what is the
problem?

If you're using the bandwidth throttle to control your RT tasks so as
not to starve your SCHED_OTHER tasks, then I will call your system ill
designed.

2009-07-24 14:53:16

by sen wang

Subject: Re: report a bug about sched_rt

2009/7/24 Peter Zijlstra <[email protected]>:
> On Fri, 2009-07-24 at 22:04 +0800, sen wang wrote:
>
>> just one question:
>> if the CPU is free and there is a runnable task, what do you do?
>> Schedule that task, or schedule the idle task?
>
> Well, when an RT group is over the bandwidth limit I don't consider them
> runnable. Therefore, failing to find any other tasks, we run the idle
> task.
>

You consider them not runnable, but sorry, what you consider is wrong!

2009-07-24 15:02:52

by sen wang

Subject: Re: report a bug about sched_rt

2009/7/24 Peter Zijlstra <[email protected]>:
> On Fri, 2009-07-24 at 22:24 +0800, sen wang wrote:
>
>> > No, but the 1 group is the trivial case of many groups. Changing the
>> > semantics for the trivial case is inconsistent at best, and confusing at
>> > worst.
>
>> Yes, one group is the trivial case, but you can't say it is useless;
>> in some systems it is important!
>> I have read through the scheduler code and tried this approach; it
>> works:
>>
>> static struct task_struct *pick_next_task_rt(struct rq *rq)
>> {
>> 	...
>> 	if (rt_rq_throttled(rt_rq) && rq->cfs.nr_running)
>> 		return NULL;
>> 	...
>> }
>
> That might work in the current implementation, but like I already
> explained, it's not consistent with the multi-group case. Also, people
> are working on making this a proper EDF-scheduled CBS, so such a change
> won't generalize.
>
>> > How is it my problem when you design your system wrong?
>>
>> My system is fine, but there are no rules about what the idle task
>> may do. People write idle-task code under the assumption that no task
>> in the system is runnable, and they write rt-task code under the
>> assumption that while the task is runnable, the system will not go
>> idle.
>>
>> What I said above sounds something like theory, but I don't like the
>> word "theory"; I call it common sense. The behavior of a throttled RT
>> group departs from common sense, so don't tell me common sense is
>> wrong, OK?
>
> There are plenty of examples where common sense utterly fails, the one
> that comes to mind is Probability Theory.
>
>> > If you want your 1 RT group to not get throttled, disable the throttle,
>> > or adjust it to fit the parameters of your workload. If you don't want
>> > idle to have latency impact on your RT tasks, fix your idle behaviour.
>> >
>>
>> One RT group is important to me, but I also have fair tasks, so the
>> throttle is also important to me. And don't say idle has a latency
>> impact on RT tasks; that is ludicrous. Why would we inflict latency
>> on ourselves with a misbehaving idle task?
>
> Yes, configurable idle tasks are nothing new. If you care about wakeup
> latency then idle=poll is preferred (it sucks for power saving, but such
> is life).
>
> On your embedded board you seem to have a particularly aggressive idle
> function wrt power savings, which would result in rather large wake from
> idle latencies, regardless of the bandwidth throttle, so what is the
> problem?
>
Don't guess at what I do in my idle task; my idle is fine!
And don't assume that only you understand scheduling and that what you
consider is right.
Linux is a free world.

> If you're using the bandwidth throttle to control your RT tasks so as
> not to starve your SCHED_OTHER tasks, then I will call your system ill
> designed.
>
The bandwidth throttle for controlling RT tasks is useful. Of course I
know how to keep SCHED_OTHER tasks from being starved; we are only
discussing how to handle the remaining (100-X)% of the time, and
unfortunately you are wrong.

2009-07-24 15:08:00

by sen wang

Subject: Re: report a bug about sched_rt

2009/7/24 Peter Zijlstra <[email protected]>:
> On Fri, 2009-07-24 at 22:04 +0800, sen wang wrote:
>
>> just one question:
>> if the CPU is free and there is a runnable task, what do you do?
>> Schedule that task, or schedule the idle task?
>
> Well, when an RT group is over the bandwidth limit I don't consider them
> runnable. Therefore, failing to find any other tasks, we run the idle
> task.
>

You haven't answered the question: if the CPU is free, should we
schedule the runnable task or the idle task?

Face the error and fix it, OK?

2009-07-24 15:23:07

by Peter Zijlstra

Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 23:07 +0800, sen wang wrote:
> 2009/7/24 Peter Zijlstra <[email protected]>:
> > On Fri, 2009-07-24 at 22:04 +0800, sen wang wrote:
> >
> >> just one question:
> >> if the CPU is free and there is a runnable task, what do you do?
> >> Schedule that task, or schedule the idle task?
> >
> > Well, when an RT group is over the bandwidth limit I don't consider them
> > runnable. Therefore, failing to find any other tasks, we run the idle
> > task.
> >
>
> You haven't answered the question: if the CPU is free, should we
> schedule the runnable task or the idle task?

It is not runnable because the group is over its limit.

> Face the error and fix it, OK?

Please tone down and re-read the explanations I gave.

The throttle is an H-CBS service for RT task groups, meant to provide
isolation through a fixed resource guarantee.

Any process actually hitting the throttle means a misconfigured system
-- unless it's a temporary overload and you're able to deal with those.

The single group case is simply the trivial case thereof.

Your proposed change does not generalize to such a framework, and while
it might work with the current code, it doesn't serve a use-case
considered in this architecture and will render the interface
inconsistent.

Furthermore, future work in this area will not be able to support your
changed semantics in a sane fashion.

I've yet to see any coherent explanation of your problem, and quite
frankly I find your attitude offensive.

As you say, Linux is an open-source effort, and you're free to do with
your copy as you see fit (provided you stick to the rules stipulated by
the GPLv2). However as co-maintainer of the mainline scheduler I see no
reason to entertain your change, nor for that matter to continue this
discussion.

2009-07-24 15:35:03

by Thomas Gleixner

Subject: Re: report a bug about sched_rt

On Fri, 24 Jul 2009, sen wang wrote:
> 2009/7/24 Peter Zijlstra <[email protected]>:
> > On Fri, 2009-07-24 at 22:04 +0800, sen wang wrote:
> >
> >> just one question:
> >> if the CPU is free and there is a runnable task, what do you do?
> >> Schedule that task, or schedule the idle task?
> >
> > Well, when an RT group is over the bandwidth limit I don't consider them
> > runnable. Therefore, failing to find any other tasks, we run the idle
> > task.
> >
>
> You haven't answered the question: if the CPU is free, should we
> schedule the runnable task or the idle task?

Peter explained how it's implemented and why he considers it to be
correct and that it can be disabled.

> Face the error and fix it, OK?

Can you please stop yelling at Peter?

He politely answered your questions. Have you even thought about his
answers before shouting "error" ? Also be aware that you can yell "fix
it" as often as you want, all you are achieving is an entry in a
couple of /dev/null procmail rules.

Please read http://www.tux.org/lkml/#s3-12 before you answer again.

Thanks,

tglx

2009-07-24 15:41:02

by Jamie Lokier

Subject: Re: report a bug about sched_rt

Peter Zijlstra wrote:
> If you're using the bandwidth throttle to control your RT tasks so as
> not to starve your SCHED_OTHER tasks, then I will call your system ill
> designed.

What mechanism should be used to avoid starving SCHED_OTHER tasks, in
the event there are unforeseen bugs or unpredictable calculation times
in an RT task?

Thanks,
-- Jamie

2009-07-24 15:43:26

by sen wang

Subject: Re: report a bug about sched_rt

2009/7/24 Peter Zijlstra <[email protected]>:
> On Fri, 2009-07-24 at 23:07 +0800, sen wang wrote:
>> 2009/7/24 Peter Zijlstra <[email protected]>:
>> > On Fri, 2009-07-24 at 22:04 +0800, sen wang wrote:
>> >
>> >> just one question:
>> >> if the CPU is free and there is a runnable task, what do you do?
>> >> Schedule that task, or schedule the idle task?
>> >
>> > Well, when an RT group is over the bandwidth limit I don't consider them
>> > runnable. Therefore, failing to find any other tasks, we run the idle
>> > task.
>> >
>>
>> You haven't answered the question: if the CPU is free, should we
>> schedule the runnable task or the idle task?
>
> It is not runnable because the group is over its limit.
>
>> Face the error and fix it, OK?
>
> Please tone down and re-read the explanations I gave.
>
> The throttle is an H-CBS service for RT task groups, meant to provide
> isolation through a fixed resource guarantee.
>
> Any process actually hitting the throttle means a misconfigured system
> -- unless it's a temporary overload and you're able to deal with those.
>
> The single group case is simply the trivial case thereof.
>
> Your proposed change does not generalize to such a framework, and while
> it might work with the current code, it doesn't serve a use-case
> considered in this architecture and will render the interface
> inconsistent.
>
> Furthermore, future work in this area will not be able to support your
> changed semantics in a sane fashion.
>
> I've yet to see any coherent explanation of your problem, and quite
> frankly I find your attitude offensive.
>
> As you say, Linux is an open-source effort, and you're free to do with
> your copy as you see fit (provided you stick to the rules stipulated by
> the GPLv2). However as co-maintainer of the mainline scheduler I see no
> reason to entertain your change, nor for that matter to continue this
> discussion.
>
>

Sorry for my tone; if you feel hurt, I apologize.

But I still hold my viewpoint: I just want the remaining (100-X)% of
the time to be usable by a runnable task.

2009-07-24 15:59:52

by Peter Zijlstra

Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 16:40 +0100, Jamie Lokier wrote:
> Peter Zijlstra wrote:
> > If you're using the bandwidth throttle to control your RT tasks so as
> > not to starve your SCHED_OTHER tasks, then I will call your system ill
> > designed.
>
> What mechanism should be used to avoid starving SCHED_OTHER tasks, in
> the event there are unforeseen bugs or unpredictable calculation times
> in an RT task?

For bugs the throttle works, like I said a well functioning system is
not supposed to hit the throttle, obviously a bug precludes the well
functioning qualification :-)

Unpredictable calculation times can be dealt with on the application
design level, for example using techniques such as outlined here:

http://feanor.sssup.it/~faggioli/papers/OSPERT-2009-dlexception.pdf

These really are things you should know about before writing an RT
application ;-)

2009-07-24 23:31:08

by Jamie Lokier

Subject: Re: report a bug about sched_rt

Peter Zijlstra wrote:
> On Fri, 2009-07-24 at 16:40 +0100, Jamie Lokier wrote:
> > Peter Zijlstra wrote:
> > > If you're using the bandwidth throttle to control your RT tasks so as
> > > not to starve your SCHED_OTHER tasks, then I will call your system ill
> > > designed.
> >
> > What mechanism should be used to avoid starving SCHED_OTHER tasks, in
> > the event there are unforeseen bugs or unpredictable calculation times
> > in an RT task?
>
> For bugs the throttle works, like I said a well functioning system is
> not supposed to hit the throttle, obviously a bug precludes the well
> functioning qualification :-)
>
> Unpredictable calculation times can be dealt with on the application
> design level, for example using techniques such as outlined here:
>
> http://feanor.sssup.it/~faggioli/papers/OSPERT-2009-dlexception.pdf
>
> These really are things you should know about before writing an RT
> application ;-)

Certainly those things can be used, if you are really serious about RT
behaviour. They are quite complex.

For simple things like "try to keep the buffer to my DVD writer full"
(no I don't know how much CPU that requires - it's a kind of "best
effort but try very hard!"), it would be quite useful to have
something like RT-bandwidth which grants a certain percentage of time
as an RT task, and effectively downgrades it to SCHED_OTHER when that
time is exceeded to permit some fairness with the rest of the system.

You can do that in userspace using the techniques in the PDF, and I
have looked at such techniques many years ago (2.2 days!), but the
same could be said about RT-bandwidth. But it's much easier to just
set a kernel parameter.

-- Jamie

2009-07-25 05:22:47

by Bill Gatliff

Subject: Re: report a bug about sched_rt

Jamie Lokier wrote:
> For simple things like "try to keep the buffer to my DVD writer full"
> (no I don't know how much CPU that requires - it's a kind of "best
> effort but try very hard!"), it would be quite useful to have
> something like RT-bandwidth which grants a certain percentage of time
> as an RT task, and effectively downgrades it to SCHED_OTHER when that
> time is exceeded to permit some fairness with the rest of the system.
>

Useful perhaps, but an application design that explicitly communicates
your desires to the scheduler will be more robust, even if it does seem
more complex at the outset.

I'm with Peter on this one. My impression of RT-bandwidth is that you
shouldn't ever see it doing anything unless your system contains an
error. In those situations, it's definitely a handy alternative to
rebooting to get your shell back. But I don't think you want to build a
system that depends on it, perhaps for no other reason than the fact
that if RT-bandwidth doesn't make your system behave itself then you
don't have a Plan B anymore.


b.g.

--
Bill Gatliff
[email protected]

2009-07-25 11:10:29

by Dario Faggioli

[permalink] [raw]
Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 21:26 +0800, sen wang wrote:
> If the CPU is free and there is a task in the running state, how can you
> schedule the idle task?
>
Well, if you take a look at what Peter is trying to point out, you'll
find a lot of examples where providing an RT application with _more_ CPU
than it asks for leads to catastrophic consequences... These are some of
what we call "scheduling anomalies", and there are plenty of examples of
that! :-O

I think sched_rt determinism could be improved... But giving some random
task some random extra bandwidth is just going in the opposite
direction! :-(

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-07-25 11:12:09

by Dario Faggioli

[permalink] [raw]
Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 23:07 +0800, sen wang wrote:
> face the error and fix it! ok?
Wow... You're a very nice and reasonable guy, aren't you? :-O

Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2009-07-25 12:19:40

by Dario Faggioli

[permalink] [raw]
Subject: Re: report a bug about sched_rt

On Fri, 2009-07-24 at 18:01 +0200, Peter Zijlstra wrote:
> For bugs the throttle works, like I said a well functioning system is
> not supposed to hit the throttle, obviously a bug precludes the well
> functioning qualification :-)
>
Yes, I also think a bandwidth isolation/throttling mechanism could help
a lot, both with bugs and when you need hard real-time, soft real-time
and non real-time applications to live together in one single system,
such as Linux is --or is about to become.

> Unpredictable calculation times can be dealt with on the application
> design level, for example using techniques such as outlined here:
>
> http://feanor.sssup.it/~faggioli/papers/OSPERT-2009-dlexception.pdf
>
Thanks Peter! :-)

We're getting more citations on this ML than in 'our' academic world...
I'm not sure it is useful for our PhD and research careers, but, indeed,
I like that very much anyway! :-P

The mechanism proposed in that paper is one way for providing developers
with the capability of specifying some typical real time "attributes" of
an application (or part of it), such as deadline and/or expected (worst
case?) execution time.

It is probably not always the best way of doing things, but it's
something we think could be useful somewhere. Therefore, we are still
working on it, e.g., improving timer resolution, adding support for new
semantics and programming models, etc. Moreover, we are open to any
suggestion and contribution about this work, especially from the
community!

> These really are things you should know about before writing an RT
> application ;-)
:-D

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2009-07-25 12:33:33

by Dario Faggioli

[permalink] [raw]
Subject: Re: report a bug about sched_rt

On Sat, 2009-07-25 at 00:30 +0100, Jamie Lokier wrote:
> > Unpredictable calculation times can be dealt with on the application
> > design level, for example using techniques such as outlined here:
> >
> > http://feanor.sssup.it/~faggioli/papers/OSPERT-2009-dlexception.pdf
> >
> > These really are things you should know about before writing an RT
> > application ;-)
>
> Certainly those things can be used, if you are really serious about RT
> behaviour.
Well... True... But not that much! :-)

> They are quite complex.
>
Agree... But it's just at its start; we are still working on it and
trying to improve it...

> For simple things like "try to keep the buffer to my DVD writer full"
> (no I don't know how much CPU that requires - it's a kind of "best
> effort but try very hard!"), it would be quite useful to have
> something like RT-bandwidth which grants a certain percentage of time
> as an RT task, and effectively downgrades it to SCHED_OTHER when that
> time is exceeded to permit some fairness with the rest of the system.
>
Well, agree, again. If you want something very useful, you need the
combination of the two: user space techniques and kernel space support.

The mechanism described in the paper works at its best if run on top of
the proper scheduling policies/framework... And the rt-throttling
mechanism which is currently in place --or some improvement of it--
could definitely be one of those.

> You can do that in userspace using the techniques in the PDF, and I
> have looked at such techniques many years ago (2.2 days!), but the
> same could be said about RT-bandwidth. But it's much easier to just
> set a kernel parameter.
>
The aim of the mechanism was not to move RT to userspace, forgetting
about kernel support... Believe me: we are far from that! :-)

As said, the point is to let the user specify some --typically--
real-time characteristics of his apps, and have them enforced somehow.

I don't think comparing kernel-space throttling with our user-space
deadline/wcet violation notification is the right thing to do, since
they have very different objectives, actually!

Throttling is aimed at limiting the bandwidth of real-time apps (or
groups of them) without them needing to be aware of it.
Our exception-based mechanism is aimed at giving the application
developer the capability of being aware of exactly that!

So, different tools for different goals, I think, which however could
work together, if needed...

I hope it doesn't seem like I'm trying to push our mechanism over
anything... I'm just trying to clarify a little bit why we conceived it
and how it works. :-)

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]



2009-07-25 14:58:21

by Tommaso Cucinotta

[permalink] [raw]
Subject: Re: report a bug about sched_rt

Hi all,

Raistlin ha scritto:
>> For simple things like "try to keep the buffer to my DVD writer full"
>> (no I don't know how much CPU that requires - it's a kind of "best
>> effort but try very hard!"), it would be quite useful to have
>> something like RT-bandwidth which grants a certain percentage of time
>> as an RT task, and effectively downgrades it to SCHED_OTHER when that
>> time is exceeded to permit some fairness with the rest of the system
> Well, agree, again. If you want something very useful, you need the
> combination of the two: user space techniques and kernel space support.
>
I didn't follow the entire discussion, but I'd like to add a comment,
if it may be of any help. What is useful actually depends on the usage
scenario and its requirements, comprising for example real-time and
security requirements.

On one hand, giving a real-time task the opportunity to keep running
even when its budget is exhausted may of course be useful for the
real-time task. In fact, in the real-time literature you can find the
term "soft reservations" to denote those real-time scheduling
mechanisms that have such a property (and still preserve theoretical
schedulability), with various different ways of distributing the spare
capacity among the real-time tasks. On a GPOS like Linux, it may also
be useful to "downgrade" an RT task to SCHED_OTHER when its budget is
exhausted. In fact, in the AQuoSA EDF-based scheduler [academic], if
the flag "SOFT_SERVER" is specified when creating a server, this is
exactly what happens :-). On a related note, in the POSIX
SPORADIC_SERVER (and e.g., its implementation by Dario Faggioli) there
is a "low priority" field specifying the priority at which the task
should run when the budget is exhausted.

However, if you depart from the traditional "embedded" context (i.e.,
industrial control), switching for example to a "multi-user server"
context, then a task "triggering" the throttling might not necessarily
constitute a system bug that "needs a reboot"; it may simply be due to
an application trying to over-use the system as compared to how much
it is supposed to use it. Imagine a "pay-per-compute" context in which
the share of a server is granted to a user (i.e., to a VM). Then, a
provider would not necessarily want to grant a user more computation
capability than the user has paid for. In fact, in the AQuoSA scheduler
[again, academic], an access-control model exists by which the
sys-admin may decide which users (and user groups) are authorized to
access the "SOFT_SERVER" facility (i.e., real-time reservations for
"gold" users might be allowed to be soft, but the ones for "bronze"
users might not).

Therefore, IMHO there is no "silver bullet": what behavior is best
depends on the security requirements that may be in place. Access to
the "soft server" mentioned above is just an example, but plenty of
other issues may arise, including: the maximum system capacity that
users may be authorized to occupy, the maximum RT server periods that
users may be authorized to use (so as not to starve the background OS
for too long), the minimum RT server period (so as not to cause too
much scheduling overhead), etc. A more detailed discussion of the
security requirements arising when granting real-time facilities to
unprivileged users on a GPOS may be found in [1], in case anyone is
interested.

Regards,

T.

[1] Tommaso Cucinotta "Access Control for Adaptive Reservations on
Multi-User Systems", in Proceedings of the 14th IEEE Real-Time and
Embedded Technology and Applications Symposium (RTAS 2008), St. Louis,
MO, United States, April 2008, available at:
http://feanor.sssup.it/~tommaso/publications/RTAS-2008.pdf

--
Tommaso Cucinotta, Computer Engineering PhD, Researcher
ReTiS Lab, Scuola Superiore Sant'Anna, Pisa, Italy
Tel +39 050 882 024, Fax +39 050 882 003
http://feanor.sssup.it/~tommaso

2009-07-25 17:55:09

by Arjan van de Ven

[permalink] [raw]
Subject: Re: report a bug about sched_rt

On Fri, 24 Jul 2009 18:57:35 +0800
sen wang <[email protected]> wrote:

> I find something is wrong about sched_rt.
>
> when I am debugging my system with rt_bandwidth_enabled, there is a
> running realtime FIFO task in the sched_rt running queue and
> the fair running queue is empty. I found the idle task will be
> scheduled up when the running task still lie in the sched_rt running
> queue!
>
> this will happen when rt runqueue passed it's rt_bandwidth_enabled
> runtime,then the scheduler choose the idle task instead of realtime
> FIFO task.
>
> the reason lie in: when scheduler try to pick up a realtime FIFO task,
> it will check if rt_throttled is enabled,
> if so, it'll return and try fair queue but it is empty, then it come
> to the sched_idle class.
>
> I don't think it reasonable, we should give the realtime FIFO task the
> chance, even when rt runqueue passed it's runtime.
> because it is cpu's free time.


sounds like a good power limiting feature...


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-07-25 22:49:11

by Jamie Lokier

[permalink] [raw]
Subject: Re: report a bug about sched_rt

Bill Gatliff wrote:
> Jamie Lokier wrote:
> >For simple things like "try to keep the buffer to my DVD writer full"
> >(no I don't know how much CPU that requires - it's a kind of "best
> >effort but try very hard!"), it would be quite useful to have
> >something like RT-bandwidth which grants a certain percentage of time
> >as an RT task, and effectively downgrades it to SCHED_OTHER when that
> >time is exceeded to permit some fairness with the rest of the system.
> >
>
> Useful perhaps, but an application design that explicitly communicates
> your desires to the scheduler will be more robust, even if it does seem
> more complex at the outset.

I agree with communicating the desire explicitly to the scheduler.

In the above example, the exact desire is "give me as much CPU as I
ask for, because my hardware servicing will be adversely but
non-fatally affected if you don't, and the amount of CPU needed to
service the hardware cannot be determined in advance, but prevent me
from blocking progress in the rest of the system by limiting my
exclusive ownership of the CPU".

How do you propose to communicate that to the scheduler, if not by
something rather like RT-bandwidth with downgrading to SCHED_OTHER
when a policy limit is exceeded?

-- Jamie

2009-07-25 22:55:11

by Jamie Lokier

[permalink] [raw]
Subject: Re: report a bug about sched_rt

Raistlin wrote:
> > http://feanor.sssup.it/~faggioli/papers/OSPERT-2009-dlexception.pdf
>
> It is probably not always the best way of doing, but it's something we
> think it could be useful somewhere. Therefore, we are still working on
> it, e.g., improving timer resolution, adding the support for new
> semantic and programming models, etc. Moreover, we are open to any
> suggestion and contribution about this work, especially from the
> community!

The biggest weakness I see is if the application has a bug such as
overwriting random memory or terminating the thread which is receiving
timer signals, it can easily break the scheduling policy by accident.

When the scheduling policy is implemented in the kernel, it can only
be broken by system calls requesting a change of scheduling policy,
which are relatively unlikely, and if necessary that can be completely
prevented by security controls.

-- Jamie

2009-07-25 23:01:10

by Jamie Lokier

[permalink] [raw]
Subject: Re: report a bug about sched_rt

sen wang wrote:
> but, for a realtime system like a decoder, I think we should serve the rt
> task as well as we can.

If the realtime task is something like a video decoder feeding a
display, and it is bandwidth-throttled only to ensure things like SSH
and filesystem I/O are still available, then I have to agree with Sen,
you would want any "spare" CPU to go the video decoder in that
application, not the idle task.

> the RT scheduler is mainly used for realtime system which should have the
> different policy from fair task.

The RT scheduler is used for lots of different systems which you
haven't considered.

For your application, probably the way RT-bandwidth works is not
useful. It's better for some other applications.

-- Jamie

2009-07-25 23:24:30

by Tommaso Cucinotta

[permalink] [raw]
Subject: Re: report a bug about sched_rt

Hi,

Jamie Lokier ha scritto:
> Raistlin wrote:
>
>>> http://feanor.sssup.it/~faggioli/papers/OSPERT-2009-dlexception.pdf
>>>
>> Moreover, we are open to any
>> suggestion and contribution about this work, especially from the
>> community!
>>
>
> The biggest weakness I see is if the application has a bug such as
> overwriting random memory or terminating the thread which is receiving
> timer signals, it can easily break the scheduling policy by accident.
>
> When the scheduling policy is implemented in the kernel, it can only
> be broken by system calls requesting a change of scheduling policy,
> which are relatively unlikely, and if necessary that can be completely
> prevented by security controls.
>
First, thanks for your comments and interest. The mentioned mechanism
should be regarded as something that helps in the development of
programs that need to meet timing requirements. It is not meant at all
to constitute a "user-level" scheduler, nor to replace a real-time
scheduler's job. On the contrary, its purpose is solely to push towards
a software design paradigm in which the "awareness" of the existing
timing constraints, and the "awareness" of the possibility that they
might be violated (on a GPOS), is coded at the program level (by means
of an exception-like paradigm). The mechanism has been designed as a
complement to the real-time scheduling facilities that the kernel
provides.

As an example, imagine that, by a proper configuration of the
scheduling policy and parameters, an application may be guaranteed a
certain budget every period. However, the requested budget cannot be
the actual worst case, because it would not be practical to compute it
(nor feasible, because it would depend on a lot of external factors
such as interrupt load etc.) and it would lead to too much under-usage
of resources. By using this exception-like paradigm, the developer may
code, into an "exception-handler" segment, the recovery actions needed
whenever the real-time task is about to violate, for example, its WCET
constraints (e.g., one could use a "try_wcet()" block with a WCET spec
slightly lower than the budget configured into the scheduler, the
difference being the WCET of the exception-handling code). The
real-time scheduler will still enforce the configured budget for this
application, so if the recovery logic embedded inside the exception
handler takes too long (i.e., its WCET has been under-estimated),
temporal isolation is guaranteed in any case, and the application won't
negatively impact the real-time guarantees the scheduler provides to
other applications.

In fact, we're working on a practical case study where both the
timing-exception mechanism and one of the real-time schedulers we have
here at SSSA are used. For example, a potential issue we're thinking
about is if and how to "synchronize" somehow the vision of time of the
exception-based mechanism with that of the scheduler, because in prior
experience modifying multimedia applications to take advantage of
feedback-based real-time scheduling, this was one of the burdens to
face before the application started behaving as "theoretically"
foreseen.

Hope this clarifies our view. Please, feel free to post further comments.

Regards,

T.

2009-07-26 02:44:06

by Bill Gatliff

[permalink] [raw]
Subject: Re: report a bug about sched_rt

Jamie Lokier wrote:
> Bill Gatliff wrote:
>
>> Jamie Lokier wrote:
>>
>>> For simple things like "try to keep the buffer to my DVD writer full"
>>> (no I don't know how much CPU that requires - it's a kind of "best
>>> effort but try very hard!"), it would be quite useful to have
>>> something like RT-bandwidth which grants a certain percentage of time
>>> as an RT task, and effectively downgrades it to SCHED_OTHER when that
>>> time is exceeded to permit some fairness with the rest of the system.
>>>
>>>
>> Useful perhaps, but an application design that explicitly communicates
>> your desires to the scheduler will be more robust, even if it does seem
>> more complex at the outset.
>>
>
> I agree with communicting the desire explicitly to the scheduler.
>
> In the above example, the exact desire is "give me as much CPU as I
> ask for, because my hardware servicing will be adversely but
> non-fatally affected if you don't, and the amount of CPU needed to
> service the hardware cannot be determined in advance, but prevent me
> from blocking progress in the rest of the system by limiting my
> exclusive ownership of the CPU".
>
> How do you propose to communicate that to the scheduler, if not by
> something rather like RT-bandwidth with downgrading to SCHED_OTHER
> when a policy limit is exceeded?
>

This is a great real-world problem. And there's no one-size-fits-all
answer, unfortunately.

RT-bandwidth will give you the system behavior you are after, but it's a
pretty blunt instrument.

I'd consider putting some throttling in your interrupt handler that
prevents it from running more than a certain amount of calculation per
interrupt event. And perhaps it's looking at execution timestamps to
determine how often it's running, and can therefore do a rough
calculation of how much CPU it's eating. At least until threaded
interrupt scheduling is widespread, a runaway interrupt handler is
definitely an opportunity to hang up a system.

Tasklets are nice for this, because the kernel won't re-queue one that
is already scheduled. So if your interrupt handler's job is just to
launch the tasklet, and you know how much time the tasklet takes to run,
then if you get a burst of interrupts you don't end up launching an
equivalent burst of scheduled work: eventually the interrupt handler
overtakes the tasklet, and the additional interrupt events get dropped.
That's often a decent way to deal with system overload, especially if it
leaves the system functional enough to take some sort of "evasive
action" like reverting to polled i/o, issuing a diagnostic message, or
doing an orderly transition to a safe mode.

A flood ping, lots of paging, and driver bugs are just a few ways you
can encounter an unexpected burst of interrupt activity that might, if
not dealt with on some level, cause the system to suddenly destabilize.

Point is, keep a mentality that you want to fall back onto RT-bandwidth
(or any other type of watchdog timer expiration) only after you've
exhausted all other options. Pretend it isn't there--- but definitely
know what will happen if it ever steps in. A system coded that way is
much more resistant to breakage, in my experience anyway.


b.g.

--
Bill Gatliff
[email protected]

2009-07-26 03:55:51

by sen wang

[permalink] [raw]
Subject: Re: report a bug about sched_rt

2009/7/24 Arjan van de Ven <[email protected]>
>
> On Fri, 24 Jul 2009 18:57:35 +0800
> sen wang <[email protected]> wrote:
>
> > I find something is wrong about sched_rt.
> >
> >  when I am debugging my system with rt_bandwidth_enabled, there is a
> > running realtime FIFO task in the sched_rt running queue and
> >  the fair running queue is empty.  I found the idle task will be
> > scheduled up when the running task still lie in the  sched_rt running
> > queue!
> >
> > this will happen when rt runqueue passed it's rt_bandwidth_enabled
> > runtime,then the scheduler choose the idle task instead of realtime
> > FIFO task.
> >
> > the reason lie in: when scheduler try to pick up a realtime FIFO task,
> > it will check if rt_throttled is enabled,
> > if so, it'll return and try fair queue but it is empty, then it come
> > to the sched_idle class.
> >
> > I don't think it reasonable, we should give the realtime FIFO task the
> > chance, even when rt runqueue passed it's runtime.
> > because it is cpu's free time.
>
>
> sounds like a good power limiting feature...
>
>

What I want to say is:
if we give the CPU to the RT task in that situation, the normal fair
tasks still have a chance to get the CPU in the remaining 50ms (say the
throttle runtime is 950ms out of a 1000ms period), because in those
50ms the throttle is still enabled: on every tick, the RT scheduler
will check for normal fair tasks, and if it finds one, the RT task will
yield the CPU to the newly arrived normal fair task.

Even in a realtime system there are still normal fair tasks, so the
bandwidth mechanism is useful; we can't simply turn it off.

I think a userspace scheduling policy is not feasible, because when,
where, and by whom would it be implemented? glibc, uclibc, Android
bionic? Can you make sure they will all implement compatible policies?
The kernel is the best place to do it.

And since Linux is used in so many fields, why not provide different
features for different systems? By adding a new kernel config item, say
CONFIG_REAL_TIME_SYSTEM, we can give the CPU to the RT task when the
throttle is on; with CONFIG_REAL_TIME_SYSTEM disabled, everything is
kept untouched for servers and desktops.

I don't know if this viewpoint will offend somebody; sorry in
advance :)


> --
> Arjan van de Ven        Intel Open Source Technology Centre
> For development, discussion and tips for power savings,
> visit http://www.lesswatts.org

2009-07-26 19:04:15

by Jamie Lokier

[permalink] [raw]
Subject: Re: report a bug about sched_rt

Bill Gatliff wrote:
> Jamie Lokier wrote:
> >I agree with communicating the desire explicitly to the scheduler.
> >
> >In the above example, the exact desire is "give me as much CPU as I
> >ask for, because my hardware servicing will be adversely but
> >non-fatally affected if you don't, and the amount of CPU needed to
> >service the hardware cannot be determined in advance, but prevent me
> >from blocking progress in the rest of the system by limiting my
> >exclusive ownership of the CPU".
> >
> >How do you propose to communicate that to the scheduler, if not by
> >something rather like RT-bandwidth with downgrading to SCHED_OTHER
> >when a policy limit is exceeded?
>
> This is a great real-world problem. And there's no one-size-fits-all
> answer, unfortunately.
>
> RT-bandwidth will give you the system behavior you are after, but it's a
> pretty blunt instrument.

I'm under the impression that RT-bandwidth will *not* give the above
system behaviour, and that is the whole reason for this thread.

> I'd consider putting some throttling in your interrupt handler that
> prevents it from running more than a certain amount of calculation per
> interrupt event.

There is no interrupt handler in my specification above...

> And perhaps it's looking at execution timestamps to
> determine how often it's running, and can therefore do a rough
> calculation of how much CPU it's eating. At least until threaded
> interrupt scheduling is widespread, a runaway interrupt handler is
> definitely an opportunity to hang up a system.

With threaded interrupt scheduling using RT priority, that opportunity
to hang the system is exactly the same.

Indeed, threaded interrupts are a good example of when you might want
a limited fraction of the CPU allocated to that thread at RT priority,
falling down to SCHED_OTHER if the handler needs to continue to run.
That is, in fact, how

> Tasklets

tasklets, bottom halves and things like that work :-)

[snip explanation of tasklets]
> That's often a decent way to deal with system overload, especially if it
> leaves the system functional enough to take some sort of "evasive
> action" like reverting to polled i/o, issuing a diagnostic message, or
> doing an orderly transition to a safe mode.

Polled I/O is good when this happens. You can revert to polled I/O
automatically without coding it explicitly in interrupt handlers, if
the scheduler provides appropriate support.

When a threaded interrupt (with RT priority, naturally) is run too
often, then you stop scheduling it as RT and bring it down to
SCHED_OTHER or lower, periodically allowing it to have a fair share of
the CPU when there are other runnable tasks. That's quite close to
polling I/O, without coding it explicitly in the device driver.

So RT-bandwidth would be nice for those threaded interrupts.

-- Jamie

2009-07-27 10:43:46

by Peter Zijlstra

[permalink] [raw]
Subject: Re: report a bug about sched_rt

On Sun, 2009-07-26 at 20:03 +0100, Jamie Lokier wrote:

> So RT-bandwidth would be nice for those threaded interrupts.

No, a different/better scheduling policy would be - maybe.

People mentioned SCHED_SPORADIC, but I really really dislike that
because for the actual sporadic task model we can do so much better
using deadline schedulers.

Furthermore, SCHED_SPORADIC as specified by POSIX is a useless piece of
crap, so we would have to deviate from POSIX, which would create
confusion -- although good documentation might help a little here.

The current RT-bandwidth comes from the RT cgroup code, and its only
purpose in life is to provide isolation between multiple groups through
guaranteeing the bandwidth of others by hard limiting. It does that.

It's certainly not flawless; in fact it's not what I would call complete
(hence its still-EXPERIMENTAL status), but Fabio is working on
implementing a deadline H-CBS for this, which would greatly improve the
situation.

Extending the deadline model with a soft mode might be useful as
mentioned by Tommaso, but I would only be looking at that after we've
completed work on the normal deadline bits (both group and task). And
then we'd have to consistently and fully integrate it with both.

2009-07-27 13:35:22

by Bill Gatliff

[permalink] [raw]
Subject: Re: report a bug about sched_rt

Jamie Lokier wrote:
> Bill Gatliff wrote:
>
>> Jamie Lokier wrote:
>>
>>> I agree with communicting the desire explicitly to the scheduler.
>>>
>>> In the above example, the exact desire is "give me as much CPU as I
>>> ask for, because my hardware servicing will be adversely but
>>> non-fatally affected if you don't, and the amount of CPU needed to
>>> service the hardware cannot be determined in advance, but prevent me
>>> from blocking progress in the rest of the system by limiting my
>>> exclusive ownership of the CPU".
>>>
>>> How do you propose to communicate that to the scheduler, if not by
>>> something rather like RT-bandwidth with downgrading to SCHED_OTHER
>>> when a policy limit is exceeded?
>>>
>> This is a great real-world problem. And there's no one-size-fits-all
>> answer, unfortunately.
>>
>> RT-bandwidth will give you the system behavior you are after, but it's a
>> pretty blunt instrument.
>>
>
> I'm under the impression that RT-bandwidth will *not* give the above
> system behaviour, and that is the whole reason for this thread.
>

I think I misspoke. What I meant to say is that RT-bandwidth will
(probably) prevent the hardware handler from eating 100% of the CPU.
But the system will suffer quite a, um, "discontinuity" when the
throttling happens.

>
>> I'd consider putting some throttling in your interrupt handler that
>> prevents it from running more than a certain amount of calculation per
>> interrupt event.
>>
>
> There is no interrupt handler in my specification above...
>

True. But in practice, I think such devices are typically
interrupt-driven at some level.


b.g.

--
Bill Gatliff
[email protected]