2008-08-19 10:44:21

by Peter Zijlstra

Subject: [PATCH 6/6] sched: disabled rt-bandwidth by default

Disable bandwidth control by default.

Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -824,9 +824,9 @@ static __read_mostly int scheduler_runni

/*
* part of the period that we allow rt tasks to run in us.
- * default: 0.95s
+ * default: inf
*/
-int sysctl_sched_rt_runtime = 950000;
+int sysctl_sched_rt_runtime = -1;

static inline u64 global_rt_period(void)
{

--


2008-08-19 11:06:45

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Peter Zijlstra <[email protected]> wrote:

> Disable bandwidth control by default.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> kernel/sched.c | 17 +++++++----------
> 1 file changed, 7 insertions(+), 10 deletions(-)
>
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -824,9 +824,9 @@ static __read_mostly int scheduler_runni
>
> /*
> * part of the period that we allow rt tasks to run in us.
> - * default: 0.95s
> + * default: inf
> */
> -int sysctl_sched_rt_runtime = 950000;
> +int sysctl_sched_rt_runtime = -1;

The fixes look good to me, but this enabling of infinite RT task lockups
is not an improvement.

The thing is, i got far more bugreports about locked up RT tasks where
the lockup was unintentional, than real bugreports about anyone
_intending_ for the whole box to come to a grinding halt because a
high-prio RT task is monopolizing the CPU.

In fact there's only been this artificial test so far.

So could you please just increase the chunking to 10 seconds or so, from
the current 1 second? Anyone locking up the system for more than 10
seconds via an RT task has to deal with many other issues already.

I.e. keep the system borderline debuggable (up to 10 seconds delays are
_not_ nice so people will notice) - but it's still a marked improvement
from completely locked up desktops.

And those who really need longer than 10 second periods can set it
higher, or even (if they want to live dangerously or run POSIX
conformance tests) make it infinite (set it to -1) - and will have to
deal with other things like the softlockup watchdog as well.

Ok?

Ingo
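
To make the two knobs in this exchange concrete, here is a minimal sketch of how the runtime/period pair translates into a CPU share. It assumes the procfs paths /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us; the helper names read_long() and rt_share_permille() are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stdio.h>

/* procfs files that expose the runtime and period sysctls (values in us). */
#define RT_RUNTIME_PATH "/proc/sys/kernel/sched_rt_runtime_us"
#define RT_PERIOD_PATH  "/proc/sys/kernel/sched_rt_period_us"

/* Read one signed integer from a file; returns 0 on success, -1 on failure. */
int read_long(const char *path, long long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%lld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Share of each period that RT tasks may use, in tenths of a percent.
 * A negative runtime (the -1 from the patch) means "no limit", so the
 * share is the full 1000. This sketch also clamps runtime > period.
 */
long long rt_share_permille(long long runtime_us, long long period_us)
{
    assert(period_us > 0);
    if (runtime_us < 0 || runtime_us >= period_us)
        return 1000;
    return runtime_us * 1000 / period_us;
}
```

With the old default of 950000 us in a 1000000 us period this gives 950, i.e. 95% of the CPU; with -1 it gives 1000. A caller could feed it the values returned by read_long(RT_RUNTIME_PATH, ...) and read_long(RT_PERIOD_PATH, ...).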

2008-08-19 11:12:21

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Ingo Molnar <[email protected]> wrote:

> The fixes look good to me, but this enabling of infinite RT task
> lockups is not an improvement.
>
> The thing is, i got far more bugreports about locked up RT tasks where
> the lockup was unintentional, than real bugreports about anyone
> _intending_ for the whole box to come to a grinding halt because a
> high-prio RT task is monopolizing the CPU.
>
> In fact there's only been this artificial test so far.
>
> So could you please just increase the chunking to 10 seconds or so,
> from the current 1 second? Anyone locking up the system for more than
> 10 seconds via an RT task has to deal with many other issues already.
>
> I.e. keep the system borderline debuggable (up to 10 seconds delays
> are _not_ nice so people will notice) - but it's still a marked
> improvement from completely locked up desktops.
>
> And those who really need longer than 10 second periods can set it
> higher, or even (if they want to live dangerously or run POSIX
> conformance tests) make it infinite (set it to -1) - and will have to
> deal with other things like the softlockup watchdog as well.
>
> Ok?

ok - i've queued the fixes up in tip/sched/rt (not in tip/sched/urgent
yet, they need a bit of test-time, but are potential v2.6.27 commits) -
see the shortlog below.

Ingo

------------------>
Ingo Molnar (1):
sched: set rt-bandwidth period from 1 second to 10 seconds

Peter Zijlstra (5):
sched: rt-bandwidth for user grouping interface
sched: rt-bandwidth accounting fix
sched: rt-bandwidth group disable fixes
sched: extract walk_tg_tree()
sched: rt-bandwidth fixes


kernel/sched.c | 215 +++++++++++++++++++++++++++++------------------------
kernel/sched_rt.c | 16 ++--
kernel/user.c | 4 +-
3 files changed, 129 insertions(+), 106 deletions(-)

2008-08-19 11:18:20

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tuesday 19 August 2008 21:05, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
> > Disable bandwidth control by default.
> >
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > ---
> > kernel/sched.c | 17 +++++++----------
> > 1 file changed, 7 insertions(+), 10 deletions(-)
> >
> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -824,9 +824,9 @@ static __read_mostly int scheduler_runni
> >
> > /*
> > * part of the period that we allow rt tasks to run in us.
> > - * default: 0.95s
> > + * default: inf
> > */
> > -int sysctl_sched_rt_runtime = 950000;
> > +int sysctl_sched_rt_runtime = -1;
>
> The fixes look good to me, but this enabling of infinite RT task lockups
> is not an improvement.
>
> The thing is, i got far more bugreports about locked up RT tasks where
> the lockup was unintentional, than real bugreports about anyone
> _intending_ for the whole box to come to a grinding halt because a
> > high-prio RT task is monopolizing the CPU.

Why are all these people running poorly written apps then?

We don't cater to poorly written code at the expense of properly written
code.


> In fact there's only been this artificial test so far.

No, someone reported that it broke their app.


> So could you please just increase the chunking to 10 seconds or so, from
> the current 1 second? Anyone locking up the system for more than 10
> seconds via an RT task has to deal with many other issues already.
>
> I.e. keep the system borderline debuggable (up to 10 seconds delays are
> _not_ nice so people will notice) - but it's still a marked improvement
> from completely locked up desktops.
>
> And those who really need longer than 10 second periods can set it
> higher, or even (if they want to live dangerously or run POSIX
> conformance tests) make it infinite (set it to -1) - and will have to
> deal with other things like the softlockup watchdog as well.
>
> Ok?

Nack. Let's retain our API specifications and backwards compatibility
by default. Advertise the sysrq switch and the setting of the sysctl
to throttle, but don't break this by default please.

2008-08-19 11:42:48

by Ingo Molnar

Subject: [PATCH] sched: extract walk_tg_tree(), fix


From fc21334298056c1e0d6428d3abe46b104188a05e Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Tue, 19 Aug 2008 13:40:47 +0200
Subject: [PATCH] sched: extract walk_tg_tree(), fix

fix:

kernel/sched.c: In function '__rt_schedulable':
kernel/sched.c:8771: error: implicit declaration of function 'walk_tg_tree'
kernel/sched.c:8771: error: 'tg_nop' undeclared (first use in this function)
kernel/sched.c:8771: error: (Each undeclared identifier is reported only once
kernel/sched.c:8771: error: for each function it appears in.)

Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 59c6683..10f7ad2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1387,7 +1387,7 @@ static inline void dec_cpu_load(struct rq *rq, unsigned long load)
update_load_sub(&rq->load, load);
}

-#if (defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)) || defined(SCHED_RT_GROUP_SCHED)
+#if (defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)) || defined(CONFIG_SCHED_RT_GROUP_SCHED)
typedef int (*tg_visitor)(struct task_group *, void *);

/*
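
The one-line change in this patch matters because `defined()` applied to a misspelled macro name does not fail to compile; it simply evaluates to false, so the guarded code silently disappears. The build errors quoted in the changelog are consistent with a configuration where the RT group option was on while the SMP/fair-group half of the condition was off. A minimal sketch of the effect follows; the `#define` stands in for what the kernel build would generate, and the GUARD_* macros and functions are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Stand-in for the definition generated when the Kconfig option is enabled. */
#define CONFIG_SCHED_RT_GROUP_SCHED 1

/* Guard as written before the fix: the CONFIG_ prefix is missing. */
#if defined(SCHED_RT_GROUP_SCHED)
#define GUARD_WITHOUT_PREFIX 1
#else
#define GUARD_WITHOUT_PREFIX 0
#endif

/* Guard as written after the fix. */
#if defined(CONFIG_SCHED_RT_GROUP_SCHED)
#define GUARD_WITH_PREFIX 1
#else
#define GUARD_WITH_PREFIX 0
#endif

int guard_without_prefix(void) { return GUARD_WITHOUT_PREFIX; }
int guard_with_prefix(void)    { return GUARD_WITH_PREFIX; }

/* The unprefixed name is never defined, so only the fixed guard is true. */
void check_guards(void)
{
    assert(guard_without_prefix() == 0);
    assert(guard_with_prefix() == 1);
}
```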

2008-08-19 13:01:06

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Nick Piggin <[email protected]> wrote:

> [...] Let's retain our API specifications and backwards compatibility
> by default. [...]

I agree with you that the 1 second default was a bit too tight - and we
should definitely change that (and it's changed already).

So changing the "allow RT tasks up to 10 seconds uninterrupted CPU
monopolization" is OK to me - it still keeps runaway CPU loops (which
are in the vast majority) debuggable, while allowing common-sense RT
task usage.

But changing that back to the other extreme: "allow lockups by default"
is unreasonable IMO - especially in the face of rtlimit that allows
unprivileged tasks to gain RT privileges.

As an experiment try running a 100% CPU using SCHED_FIFO:99 RT task. It
does not result in a usable Linux system - it interacts with too many
normal system activities. It is a very, very special mode of operation
and anyone using Linux in such a way has to take precautions and has to
tune things specially anyway. (has to turn off the softlockup watchdog,
has to make sure IO requests do not time out artificially, etc.) You
wont even get normal keyboard or console behavior in most cases.

Furthermore, if by "API specifications" you mean POSIX - to get a
conformant POSIX run one has to change a lot of things on a typical
Linux system anyway. APIs and utilities have to be crippled to be "POSIX
compliant".

In other words: we use common sense when thinking about specifications.
The kernel's defaults are about being reasonable by default.

I have no _strong_ feelings about it, but i dont see the practical value
in going beyond 10 seconds - as it turns a rather useful robustness
feature off by default (and keeps it untested, etc.).

Ingo

2008-08-19 18:15:40

by Max Krasnyansky

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
>> [...] Let's retain our API specifications and backwards compatibility
>> by default. [...]
>
> I agree with you that the 1 second default was a bit too tight - and we
> should definitely change that (and it's changed already).
>
> So changing the "allow RT tasks up to 10 seconds uninterrupted CPU
> monopolization" is OK to me - it still keeps runaway CPU loops (which
> are in the vast majority) debuggable, while allowing common-sense RT
> task usage.
>
> But changing that back to the other extreme: "allow lockups by default"
> is unreasonable IMO - especially in the face of rtlimit that allows
> unprivileged tasks to gain RT privileges.
>
> As an experiment try running a 100% CPU using SCHED_FIFO:99 RT task. It
> does not result in a usable Linux system - it interacts with too many
> normal system activities. It is a very, very special mode of operation
> and anyone using Linux in such a way has to take precautions and has to
> tune things specially anyway. (has to turn off the softlockup watchdog,
> has to make sure IO requests do not time out artificially, etc.)
btw The tuning is actually very easy and straightforward ie not so
special anymore. That's one of the use cases that my cpu isolation work
was addressing. 2.6.27 will have most of the mechanisms available. All
the tuning is done by the 'syspart' package:
http://git.kernel.org/?p=linux/kernel/git/maxk/syspart.git;a=summary

> You wont even get normal keyboard or console behavior in most cases.
Only on a single processor system.

> Furthermore, if by "API specifications" you mean POSIX - to get a
> conformant POSIX run one has to change a lot of things on a typical
> Linux system anyway. APIs and utilities have to be crippled to be "POSIX
> compliant".
>
> In other words: we use common sense when thinking about specifications.
> The kernel's defaults are about being reasonable by default.
>
> I have no _strong_ feelings about it, but i dont see the practical value
> in going beyond 10 seconds - as it turns a rather useful robustness
> feature off by default (and keeps it untested, etc.).
Same here. I do not mind setting sysctls. At the same time I agree with
Nick that ideally we should not change the meaning of SCHED_FIFO.

Max

2008-08-20 11:56:22

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tuesday 19 August 2008 22:59, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > [...] Let's retain our API specifications and backwards compatibility
> > by default. [...]
>
> I agree with you that the 1 second default was a bit too tight - and we
> should definitely change that (and it's changed already).

I do not agree that it is too tight, it is just plain wrong.


> So changing the "allow RT tasks up to 10 seconds uninterrupted CPU
> monopolization" is OK to me - it still keeps runaway CPU loops (which
> are in the vast majority) debuggable, while allowing common-sense RT
> task usage.


RT tasks have always been debuggable by using a simple watchdog thread.
As I said before, someone who develops a non-trivial RT app without a
watchdog thread or isolated CPU basically doesn't deserve the honour of
us breaking our API to cater for their idiocy.

But even for those people, we now have the sysrq trigger too. And also
we'll still have the rt throttle sysctl that can be changed at runtime.

There are so many options... "oh but maybe they didn't research the
options either so let's break our APIs instead" is not common sense
IMO.


> But changing that back to the other extreme: "allow lockups by default"
> is unreasonable IMO - especially in the face of rtlimit that allows
> unprivileged tasks to gain RT privileges.

No, it's not "allow lockups by default". It is "follow the API and
backwards compatibility by default".

If some distro has gone and given all users RTPRIO rlimit by default
and allowed unprivileged users to lock up the system, it is not the
problem of the upstream kernel. That distro can set the rt throttle
default if it wants to. Or provide a watchdog thread for debugging
RT tasks.


> As an experiment try running a 100% CPU using SCHED_FIFO:99 RT task. It
> does not result in a usable Linux system - it interacts with too many
> normal system activities. It is a very, very special mode of operation
> and anyone using Linux in such a way has to take precautions and has to
> tune things specially anyway. (has to turn off the softlockup watchdog,
> has to make sure IO requests do not time out artificially, etc.) You
> wont even get normal keyboard or console behavior in most cases.

This is exactly what *real* RT app/system developers do. I'm not
talking about an untuned desktop system!!


> Furthermore, if by "API specifications" you mean POSIX - to get a
> conformant POSIX run one has to change a lot of things on a typical
> Linux system anyway. APIs and utilities have to be crippled to be "POSIX
> compliant".

By that argument we can break any userspace API for any reason.


> In other words: we use common sense when thinking about specifications.
> The kernel's defaults are about being reasonable by default.

It's not common sense to change this. It would be perfectly valid to
engineer a realtime process that uses a peak of say 90% of the CPU with
a 10% margin for safety and other services. Now they only have 5%.

Or a realtime app could definitely use the CPU adaptively up to 100% but
still be unable to tolerate an unexpected preemption.

I don't know how you can change this so significantly and be so sure of
yourself that you won't break anything (actually you already have one
reported breakage in this thread).


> I have no _strong_ feelings about it, but i dont see the practical value
> in going beyond 10 seconds - as it turns a rather useful robustness
> feature off by default (and keeps it untested, etc.).

I feel strongly about it.

The primary issue is whether we have broken the API with respect to both the
specification and the previous implementation, and the answer is yes. That
*you* can't see any reason to use the API in that way kind of pales in
comparison, with all due respect. Especially as you already got a counter
example of someone's app that broke.
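
The 90% example in this message can be worked through directly. The sketch below is only an illustration of that arithmetic, assuming the throttle cap is runtime/period and that a runtime of -1 means no cap; rt_margin_percent() is a hypothetical helper, not a kernel interface.

```c
#include <assert.h>

/*
 * Percentage points of CPU left between an RT task's peak demand and the
 * throttle cap. runtime_us == -1 means unthrottled, so the cap is 100%.
 * A negative result means the peak demand exceeds the cap.
 */
int rt_margin_percent(int peak_percent, long long runtime_us,
                      long long period_us)
{
    int cap_percent;

    assert(period_us > 0);
    assert(peak_percent >= 0 && peak_percent <= 100);

    cap_percent = (runtime_us < 0)
        ? 100
        : (int)(runtime_us * 100 / period_us);
    return cap_percent - peak_percent;
}
```

For a task engineered around a 90% peak, rt_margin_percent(90, -1, 1000000) is 10, while rt_margin_percent(90, 950000, 1000000) is 5: the margin the designer planned for is halved. A task that adaptively uses up to 100% gets a margin of -5 under the 0.95 s default, meaning it will be throttled.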

2008-08-26 09:00:27

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

So... no reply to this? I'm really wondering how it's OK to break documented
standards and previous Linux behaviour by default for something that is
trivial to solve in userspace? All the arguments for it IMO are weak, and
the argument against is obviously pretty strong but doesn't seem to have
been acknowledged.

On Wednesday 20 August 2008 21:56, Nick Piggin wrote:
> On Tuesday 19 August 2008 22:59, Ingo Molnar wrote:
> > * Nick Piggin <[email protected]> wrote:
> > > > [...] Let's retain our API specifications and backwards compatibility
> > > by default. [...]
> >
> > I agree with you that the 1 second default was a bit too tight - and we
> > should definitely change that (and it's changed already).
>
> I do not agree that it is too tight, it is just plain wrong.
>
> > So changing the "allow RT tasks up to 10 seconds uninterrupted CPU
> > monopolization" is OK to me - it still keeps runaway CPU loops (which
> > are in the vast majority) debuggable, while allowing common-sense RT
> > task usage.
>
> RT tasks have always been debuggable by using a simple watchdog thread.
> As I said before, someone who develops a non-trivial RT app without a
> watchdog thread or isolated CPU basically doesn't deserve the honour of
> us breaking our API to cater for their idiocy.
>
> But even for those people, we now have the sysrq trigger too. And also
> we'll still have the rt throttle sysctl that can be changed at runtime.
>
> There are so many options... "oh but maybe they didn't research the
> options either so let's break our APIs instead" is not common sense
> IMO.
>
> > But changing that back to the other extreme: "allow lockups by default"
> > is unreasonable IMO - especially in the face of rtlimit that allows
> > unprivileged tasks to gain RT privileges.
>
> No, it's not "allow lockups by default". It is "follow the API and
> backwards compatibility by default".
>
> If some distro has gone and given all users RTPRIO rlimit by default
> and allowed unprivileged users to lock up the system, it is not the
> problem of the upstream kernel. That distro can set the rt throttle
> default if it wants to. Or provide a watchdog thread for debugging
> RT tasks.
>
> > As an experiment try running a 100% CPU using SCHED_FIFO:99 RT task. It
> > does not result in a usable Linux system - it interacts with too many
> > normal system activities. It is a very, very special mode of operation
> > and anyone using Linux in such a way has to take precautions and has to
> > tune things specially anyway. (has to turn off the softlockup watchdog,
> > has to make sure IO requests do not time out artificially, etc.) You
> > wont even get normal keyboard or console behavior in most cases.
>
> This is exactly what *real* RT app/system developers do. I'm not
> talking about an untuned desktop system!!
>
> > Furthermore, if by "API specifications" you mean POSIX - to get a
> > conformant POSIX run one has to change a lot of things on a typical
> > Linux system anyway. APIs and utilities have to be crippled to be "POSIX
> > compliant".
>
> By that argument we can break any userspace API for any reason.
>
> > In other words: we use common sense when thinking about specifications.
> > The kernel's defaults are about being reasonable by default.
>
> It's not common sense to change this. It would be perfectly valid to
> engineer a realtime process that uses a peak of say 90% of the CPU with
> a 10% margin for safety and other services. Now they only have 5%.
>
> Or a realtime app could definitely use the CPU adaptively up to 100% but
> still be unable to tolerate an unexpected preemption.
>
> I don't know how you can change this so significantly and be so sure of
> yourself that you won't break anything (actually you already have one
> reported breakage in this thread).
>
> > I have no _strong_ feelings about it, but i dont see the practical value
> > in going beyond 10 seconds - as it turns a rather useful robustness
> > feature off by default (and keeps it untested, etc.).
>
> I feel strongly about it.
>
> The primary issue is that we have broken the API from both specification
> and previous implementation, the answer is yes. That *you* can't see any
> reason to use the API in that way kind of pales in comparison with all
> due respect. Especially as you already got a counter example of someone's
> app that broke.

2008-08-26 09:31:33

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Nick Piggin <[email protected]> wrote:

> So... no reply to this? I'm really wondering how it's OK to break
> documented standards and previous Linux behaviour by default for
> something that it is trivial to solve in userspace? [...]

I disagree and what do you mean by "trivial to solve in user-space"?

Ingo

2008-08-26 09:45:05

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > So... no reply to this? I'm really wondering how it's OK to break
> > documented standards and previous Linux behaviour by default for
> > something that it is trivial to solve in userspace? [...]
>
> I disagree

Disagree with what? That it's a problem to basically break the guarantee
realtime SCHED_ policies have previously provided?


> and what do you mean by "trivial to solve in user-space"?

I mean that if some distro has turned on the RT scheduling ulimit by
default and now finds themselves with a local DoS for unprivileged users
as a result, then either that distro should just make their init scripts
set the throttle and break the API themselves, or they should start a
watchdog at a higher priority than an unprivileged user can set.
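
A rough sketch of the watchdog idea described here, under stated assumptions: the monitored application periodically records a heartbeat timestamp, and a separate watchdog running at the maximum SCHED_FIFO priority checks whether the heartbeat has gone stale. The struct and function names are hypothetical; what the watchdog does on expiry (kill the task, lower its priority) is left out, and raising the priority needs the appropriate privilege.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heartbeat record shared by the RT app and its watchdog. */
struct heartbeat {
    uint64_t last_beat_ms;  /* time of the most recent beat */
    uint64_t timeout_ms;    /* how long the app may stay silent */
};

/* True once the app has been silent for longer than its timeout. */
bool watchdog_expired(const struct heartbeat *hb, uint64_t now_ms)
{
    assert(hb != NULL);
    if (now_ms < hb->last_beat_ms)
        return false;       /* clock went backwards: do not fire */
    return now_ms - hb->last_beat_ms > hb->timeout_ms;
}

/*
 * Move the calling process to the highest SCHED_FIFO priority so that it
 * can preempt a runaway task at any lower RT priority.
 * Returns 0 on success or a negative errno value (e.g. -EPERM).
 */
int watchdog_raise_priority(void)
{
    struct sched_param sp;

    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sp.sched_priority < 0)
        return -errno;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -errno;
    return 0;
}
```

The watchdog loop itself would sleep for a short interval, call watchdog_expired() with the current time, and act when it returns true. Because it sleeps between checks, it does not monopolize the CPU despite its priority.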

2008-08-26 09:55:26

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > So... no reply to this? I'm really wondering how it's OK to break
> > documented standards and previous Linux behaviour by default for
> > something that it is trivial to solve in userspace? [...]
>
> I disagree

Your arguments were along the line of:

* It probably doesn't break anything (except we had somebody report
that it breaks their app)

* If it does break something then they must be doing something stupid
(I refuted that because there are several legitimate ways to use rt
scheduling that are broken by this)

* We have many other APIs and tools that don't conform to posix (why
is that a reason to break this one?)

* We should break the API to cater for stupid users and distros who
create local DoS and/or lock up their boxes (except this is trivial
to solve by setting sysctls or having a watchdog or using sysrq)

So did I miss some really good argument, or do you really think the
above arguments are a good reason to break the API? If the latter,
then we have to just agree to disagree and I'll ask Linus to arbitrate.
OK?

2008-08-26 10:30:23

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Nick Piggin <[email protected]> wrote:

> On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > * Nick Piggin <[email protected]> wrote:
> > > So... no reply to this? I'm really wondering how it's OK to break
> > > documented standards and previous Linux behaviour by default for
> > > something that it is trivial to solve in userspace? [...]
> >
> > I disagree
>
> Disagree with what? That it's a problem to basically break the
> guarantee realtime SCHED_ policies have previously provided?

I think you are sticking to the rigid letter of some standard without
seeing the bigger picture.

Firstly, please realize that to do a "successful" POSIX or other
conformance run a default Linux distribution has to be tweaked and often
crippled literally dozens and often hundreds of ways. In this case you
also have to add one more entry to /etc/sysctl.conf, to allow RT tasks
to monopolize CPU time. So you can still get the POSIX sticker if you
want to - nothing changed about that.

Secondly, my big picture point is that our task is to make Linux more
useful and more usable by default. You seem to be arguing that RT tasks
should be allowed by default to monopolize all CPU time forever, and i
disagree with that proposition.

But do _you_ actually use such runaway CPU-monopolizing RT tasks? Try it
one day and you'll quickly meet various practical problems. Let a
SCHED_FIFO:99 RT task run long enough and on all the main distributions
you will get:

BUG: soft lockup - CPU#1 stuck for 61s! [bash:3659]

But monopolizing any resource in a 100% way (which you are arguing for)
is just not a generic Linux system and for years (seeing all the
practical problems with it) we tried various methods to contain
SCHED_FIFO tasks in the scheduler, none was really acceptable for
mainline.

Peter's changes were clean and useful at last. There's lots of apps that
use SCHED_FIFO for a short burst of activity, and 100% of the ones i
know do not want to run for longer than 10 seconds.

Thirdly, your argument can only be consistent if you also argue for the
softlockup watchdog to be disabled. Do you make that point?

> > and what do you mean by "trivial to solve in user-space"?
>
> I mean that if some distro has turned on the RT scheduling ulimit by
> default and now finds themselves with a local DoS for unprivileged
> users as a result, then either that distro should just make their init
> scripts set the throttle and break the API themselves, or they should
> start a watchdog at a higher priority than an unprivileged user can set.

... but that's by far not the only usecase. Very frequently i've seen
bugreports from people with runaway RT tasks (which tasks were running
as root) where that runaway behavior was completely unintended. Audio
apps or other apps getting into a loop and locking up the system.

Worse than that, such bugs prevented the system from being debugged by
plain users. A runaway RT task that monopolizes the CPU will lock it up
completely, requiring a hard reset or a power cycle. That can lose data,
etc. If we allow it to lock up the CPU for up to 10 seconds it will
still be noticed if that is unintentional (the system is very slow), but
the problem can be debugged.

By making RT tasks not lock up like that by default and allowing them to
'only' monopolize the CPU up to 10 seconds, we make the system more
debuggable and more useful in general. It is a quite reasonable
proposition that makes Linux useful in general, and you seem to be
ignoring that practical angle altogether. It's not about allowing
user-space rtprio-rlimit driven apps to not run away, it's about
allowing _any_ RT task to be throttled by default if they run away.

On the other side of the equation, what exact application do you know
that absolutely relies on being able to monopolize all CPU time in
excess of 10 seconds? I havent heard much about that usecase. Why does
that particular RT app do it, because that behavior sounds _very_ weird
to me.

If it's some embedded system or other special-purpose app then it can
tweak the sysctl no problem. (it will have to do it anyway, to turn off
the softlockup watchdog)

If it's some general purpose Linux app, exactly which one is it? If it's
an OSS app please give me a URL to its source code, we need to fix it
urgently. Running for more than 10 seconds wastes power like mad and is
generally a very un-nice thing to do.

All in all, since the 'buggy RT app runs into a loop and monopolizes the
CPU' case is much more common, i do think that supporting that usecase
is the better choice for a default.

... and in any case, i agree with some of the observations in this
thread, in particular that the 1 second default limit was too low
(_occasional_ spurts of a couple of seconds activities by RT tasks ought
to be OK) - that's why we upped it to 10 seconds already in sched/devel
tree, a week ago or so.

Ingo
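
The throttling behaviour being argued over can be pictured with a toy budget model: RT execution time is charged against a per-period runtime budget, and once the budget is used up RT tasks are held back until the next period begins. This is a simplified sketch under those assumptions, not the kernel's implementation; struct rt_budget and its functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-period budget for RT execution time, all values in microseconds. */
struct rt_budget {
    long long period_us;   /* length of one accounting period */
    long long runtime_us;  /* RT time allowed per period, -1 = unlimited */
    long long used_us;     /* RT time consumed in the current period */
    bool throttled;        /* true while RT tasks are being held back */
};

void rt_budget_init(struct rt_budget *b, long long period_us,
                    long long runtime_us)
{
    assert(period_us > 0);
    b->period_us = period_us;
    b->runtime_us = runtime_us;
    b->used_us = 0;
    b->throttled = false;
}

/* Charge delta_us of RT execution; returns true once throttling applies. */
bool rt_budget_charge(struct rt_budget *b, long long delta_us)
{
    assert(delta_us >= 0);
    b->used_us += delta_us;
    if (b->runtime_us >= 0 && b->used_us >= b->runtime_us)
        b->throttled = true;
    return b->throttled;
}

/* Start a new period: the budget is refilled and throttling is lifted. */
void rt_budget_new_period(struct rt_budget *b)
{
    b->used_us = 0;
    b->throttled = false;
}
```

In this model a runaway task with a 950000 us budget in a 1000000 us period is held back for the remaining 50000 us of each period, which is the window in which a shell or debugger gets to run. With the runtime set to -1 the budget check never triggers.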

2008-08-26 11:03:52

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tuesday 26 August 2008 20:29, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > > * Nick Piggin <[email protected]> wrote:
> > > > So... no reply to this? I'm really wondering how it's OK to break
> > > > documented standards and previous Linux behaviour by default for
> > > > something that it is trivial to solve in userspace? [...]
> > >
> > > I disagree
> >
> > Disagree with what? That it's a problem to basically break the
> > guarantee realtime SCHED_ policies have previously provided?
>
> I think you are sticking to the rigid letter of some standard without
> seeing the bigger picture.
>
> Firstly, please realize that to do a "successful" POSIX or other
> conformance run a default Linux distribution has to be tweaked and often
> crippled literally dozens and often hundreds of ways. In this case you
> also have to add one more entry to /etc/sysctl.conf, to allow RT tasks
> to monopolize CPU time. So you can still get the POSIX sticker if you
> want to - nothing changed about that.

I'm not talking about anything else except this particular interface.
I'm also not talking about getting a sticker or anything, but providing
_expected_ and _documented_ and _matching with previous_ behaviour.


> Secondly, my big picture point is that our task is to make Linux more
> useful and more usable by default. You seem to be arguing that RT tasks
> should be allowed by default to monopolize all CPU time forever, and i
> disagree with that proposition.

Then that's not SCHED_FIFO/SCHED_RR, so just make another scheduling class.
SCHED_FIFO and SCHED_RR can use up all CPU time, but that's why they are
privileged by default. root has always been able to do silly things, that's
nothing new.

It is the easiest thing in the world to have made a new scheduling class
rather than break existing ones.


> But do _you_ actually use such runaway CPU-monopolizing RT tasks? Try it
> one day and you'll quickly meet various practical problems. Let a
> SCHED_FIFO:99 RT task run long enough and on all the main distributions
> you will get:
>
> BUG: soft lockup - CPU#1 stuck for 61s! [bash:3659]

Again, I'm talking about the upstream kernel, and I'm not actually interested
in other bugs or problems because the way to fix things is to solve one bug
at a time and not give up just because there are some other bugs.

I don't think the soft lockup message causes much pain, except that it may be
useful to actually panic and do failover with it, but AFAICS it is not enabled
by default anyway.


> But monopolizing any resource in a 100% way (which you are arguing for)
> is just not a generic Linux system and for years (seeing all the
> practical problems with it) we tried various methods to contain
> SCHED_FIFO tasks in the scheduler, none was really acceptable for
> mainline.

Actually you can pretty well isolate kernel services and interrupts from one
CPU and run rt tasks on that. But anyway, who are you to impose a magical
10s limit on it and _really_ break it by design?

> Peter's changes were clean and useful at last. There's lots of apps that
> use SCHED_FIFO for a short burst of activity, and 100% of the ones i
> know do not want to run for longer than 10 seconds.
>
> Thirdly, your argument can only be consistent if you also argue for the
> softlockup watchdog to be disabled. Do you make that point?

It is disabled by default.


> > > and what do you mean by "trivial to solve in user-space"?
> >
> > I mean that if some distro has turned on the RT scheduling ulimit by
> > default and now finds themselves with a local DoS for unprivileged
> > users as a result, then either that distro should just make their init
> > scripts set the throttle and break the API themselves, or they should
> > start a watchdog at a higher priority than unprivileged user can set.
>
> ... but that's by far not the only usecase. Very frequently i've seen
> bugreports from people with runaway RT tasks (which tasks were running
> as root) where that runaway behavior was completely unintended. Audio
> apps or other apps getting into a loop and locking up the system.

And how is that a kernel problem? Should we fix the kernel against
a stupid user running rm -rf / as root?


> Worse than that, such bugs prevented the system from being debugged by
> plain users. A runaway RT task that monopolizes the CPU will lock it up
> completely, requiring a hard reset or a power cycle. That can lose data,
> etc. If we allow it to lock up the CPU for up to 10 seconds it will
> still be noticed if that is unintentional (the system is very slow), but
> the problem can be debugged.

Tell the stupid audio program writers to run a watchdog task if they
are running a non-trivial amount of code with rt sched policy. Like any
other sane rt apps should have.


> By making RT tasks not lock up like that by default and allowing them to
> 'only' monopolize the CPU up to 10 seconds, we make the system more
> debuggable and more useful in general. It is a quite reasonable
> proposition that makes Linux useful in general, and you seem to be
> ignoring that practical angle altogether. It's not about allowing
> user-space rtprio-rlimit driven apps to not run away, it's about
> allowing _any_ RT task to be throttled by default if they run away.

Privileged users can break the kernel and kill everyone so easily anyway,
that this seems insane.


> On the other side of the equation, what exact application do you know
> that absolutely relies on being able to monopolize all CPU time in
> excess of 10 seconds? I havent heard much about that usecase. Why does
> that particular RT app do it, because that behavior sounds _very_ weird
> to me.

Somebody already reported their app failed with 1s. What makes you
think there are none around that fail with 10s? Changing old existing
userspace APIs can't be done just because a single person (you) can't
think of a counter example.

Especially not when it could equally be done just by introducing a new
API.


> If it's some embedded system or other special-purpose app then it can
> tweak the sysctl no problem. (it will have to do it anyway, to turn off
> the softlockup watchdog)

It won't because it won't be on by default.


> If it's some general purpose Linux app, exactly which one is it? If it's
> an OSS app please give me an URL to its source code, we need to fix it
> urgently. Running for more than 10 seconds wastes power like mad and is
> generally a very un-nice thing to do.

No, what's not nice is to subtly change behaviour in a way that's not
going to be detected except by random failures in the field.


> All in one, since the 'buggy RT app runs into a loop and monopolizes the
> CPU' case is much more common, i do think that supporting that usecase
> is the better choice for a default.

I disagree.

And given the amount of dual core CPUs around these days, I suspect you
exaggerate the number of bug reports you get about this too. But anyway
as I said, if you're enabling rt prio ulimit by default in your distro
and then dislike the local DoS it opens up, then why can't you also just
change the rt throttle yourself rather than breaking upstream?
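
For reference, the throttle being argued over is a runtime budget per period. The sketch below shows only the arithmetic. `rt_share` and `describe_rt_budget` are hypothetical helper names, not kernel functions, and the 1000000 us (1 s) period is assumed; the 950000 us runtime and the -1 "infinite" value come from the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not kernel code): fraction of each period that
 * RT tasks may consume before being throttled.
 * A negative runtime means "no limit", mirroring the -1 in the patch.
 */
double rt_share(long runtime_us, long period_us)
{
    if (runtime_us < 0)
        return -1.0;            /* throttling disabled */
    if (period_us <= 0)
        return 0.0;             /* defensive guard for nonsensical input */
    if (runtime_us >= period_us)
        return 1.0;             /* budget covers the whole period */
    return (double)runtime_us / (double)period_us;
}

/* Print a one-line summary of a runtime/period pair. */
void describe_rt_budget(long runtime_us, long period_us)
{
    double share = rt_share(runtime_us, period_us);

    if (share < 0.0)
        printf("runtime=%ld us: RT tasks are not throttled\n", runtime_us);
    else
        printf("runtime=%ld us of %ld us: RT tasks get up to %.0f%%\n",
               runtime_us, period_us, share * 100.0);
}
```

Calling `describe_rt_budget(950000, 1000000)` reports a 95% cap, which leaves 50 ms of every second for non-RT tasks; `describe_rt_budget(-1, 1000000)` reports that nothing is throttled.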

2008-08-26 11:10:40

by Thomas Gleixner

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tue, 26 Aug 2008, Nick Piggin wrote:

> On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > * Nick Piggin <[email protected]> wrote:
> > > So... no reply to this? I'm really wondering how it's OK to break
> > > documented standards and previous Linux behaviour by default for
> > > something that it is trivial to solve in userspace? [...]
> >
> > I disagree
>
> Your arguments were along the line of:
>
> * It probably doesn't break anything (except we had somebody report
> that it breaks their app)

I'm a real-time oldtimer. An application which hogs the CPU for 9.9
seconds with SCHED_FIFO priority is just broken. It's broken beyond
all limits, whether POSIX allows to do that or Linux obeyed the
request of the braindamaged application design.

> * If it does break something then they must be doing something stupid
> (I refuted that because there are several legitimate ways to use rt
> scheduling that is broken by this)
>
> * We have many other APIs and tools that don't conform to posix (why
> is that a reason to break this one?)

Simply because we use common sense instead of following every single
POSIX brainfart by the letter.

> * We should break the API to cater for stupid users and distros who
> create local DoS and/or lock up their boxes (except this is trivial
> to solve by setting sysctls or having a watchdog or using sysrq)

For the vast majority of users and RT developers a sane default of
sanity measures is useful and sensible.

If someone wants to shoot himself in the foot then it's not an
unreasonable request that he needs to disable the safety guards before
pulling the trigger.

Thanks,

tglx

2008-08-26 11:27:50

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tuesday 26 August 2008 21:09, Thomas Gleixner wrote:
> On Tue, 26 Aug 2008, Nick Piggin wrote:
> > On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > > * Nick Piggin <[email protected]> wrote:
> > > > So... no reply to this? I'm really wondering how it's OK to break
> > > > documented standards and previous Linux behaviour by default for
> > > > something that it is trivial to solve in userspace? [...]
> > >
> > > I disagree
> >
> > Your arguments were along the line of:
> >
> > * It probably doesn't break anything (except we had somebody report
> > that it breaks their app)
>
> I'm a real-time oldtimer. An application which hogs the CPU for 9.9
> seconds with SCHED_FIFO priority is just broken. It's broken beyond
> all limits, whether POSIX allows to do that or Linux obeyed the
> request of the braindamaged application design.

Oh with this much handwaving from you old timers I feel much better
about it ;) I bet before the bug report and change to 10s, any
application that hogged the CPU for more than 0.9 seconds was just
broken too, right? But 10s is more than enough for everybody?

I may not be an old timer, but I can say the kernel is just broken
if it deliberately deviates from standards to undocumented behaviour,
and even more so if it changes from working to broken behaviour for
reasons that can be worked around in userspace (eg. running a higher
priority watchdog).


> > * If it does break something then they must be doing something stupid
> > (I refuted that because there are several legitimate ways to use rt
> > scheduling that is broken by this)
> >
> > * We have many other APIs and tools that don't conform to posix (why
> > is that a reason to break this one?)
>
> Simply because we use common sense instead of following every single
> POSIX brainfart by the letter.

How is that a brainfart? It is simple, relatively unambiguous, and not
arbitrary. You really say the POSIX specified behaviour is "a brainfart",
but adding an arbitrary 10s throttle "but the process might be preempted
and lose the CPU to a lower priority task if it uses 10s of consecutive
CPU time" would eliminate that brainfart? I have to laugh.


> > * We should break the API to cater for stupid users and distros who
> > create local DoS and/or lock up their boxes (except this is trivial
> > to solve by setting sysctls or having a watchdog or using sysrq)
>
> For the vast majority of users and RT developers a sane default of
> sanity measures is useful and sensible.

You seriously develop complex rt tasks without having at least a simple
watchdog task?


> If someone wants to shoot himself in the foot then it's not an
> unreasonable request that he needs to disable the safety guards before
> pulling the trigger.

root is allowed to shoot themselves in the foot. root is the safeguard.
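
The higher-priority watchdog mentioned in this message can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the helper names and the heartbeat-counter scheme are hypothetical, and only `sched_get_priority_max()` and `sched_setscheduler()` are real calls.

```c
#include <sched.h>

/*
 * Priority for a watchdog that must be able to preempt a SCHED_FIFO
 * worker running at worker_prio: one level higher, capped at the
 * maximum the system reports for SCHED_FIFO. A worker already at the
 * maximum cannot be preempted by such a watchdog.
 */
int watchdog_priority(int worker_prio)
{
    int max = sched_get_priority_max(SCHED_FIFO);

    return worker_prio < max ? worker_prio + 1 : max;
}

/*
 * Hypothetical helper: switch the calling thread to SCHED_FIFO at prio.
 * Needs root or a sufficient RT priority rlimit; returns -1 with errno
 * set otherwise.
 */
int become_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/*
 * Hypothetical stall check: the worker increments a heartbeat counter
 * each time it finishes a unit of work; the watchdog samples the counter
 * periodically and treats an unchanged value as a runaway or stuck worker.
 */
int worker_stalled(unsigned long previous_beat, unsigned long current_beat)
{
    return previous_beat == current_beat;
}
```

A watchdog thread would call `become_fifo(watchdog_priority(worker_prio))`, sleep between samples, and kill or demote the worker when `worker_stalled()` returns nonzero. What to do on a stall is a policy choice left to the application.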

2008-08-26 12:51:18

by Theodore Ts'o

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tue, Aug 26, 2008 at 09:27:26PM +1000, Nick Piggin wrote:
>
> Oh with this much handwaving from you old timers I feel much better
> about it ;) I bet before the bug report and change to 10s, any
> application that hogged the CPU for more than 0.9 seconds was just
> broken too, right? But 10s is more than enough for everybody?
>

Actually, any real-time application which hogs the CPU at a high
real-time priority for more than one second is probably doing
something broken. The whole point of high real-time priorities is to
do something really fast, get in and get out. Usually such routines
are measured in milliseconds or microseconds.

Think about it *this* way --- what would you think of some device
driver which hogged an interrupt for a full second, never mind 10
seconds. You'd say it was broken, right? Now consider that a high
real-time priority thread might be running at a higher priority than
interrupt handlers, and in fact could preempt interrupt handlers....

> > Simply because we use common sense instead of following every single
> > POSIX brainfart by the letter.
>
> How is that a brainfart? It is simple, relatively unambiguous, and not
> arbitrary. You really say the POSIX specified behaviour is "a brainfart",
> but adding an arbitrary 10s throttle "but the process might be preempted
> and lose the CPU to a lower priority task if it uses 10s of consecutive
> CPU time" would eliminate that brainfart? I have to laugh.

We've not followed POSIX before when it hasn't made sense. For
example, "df" and "du" report their output in kilobytes instead of the
512-byte sectors that POSIX demands.

> root is allowed to shoot themselves in the foot. root is the safeguard.

We've done things before to make things harder for root; for example
we've restricted what /dev/mem can do. And root can always lift the
ulimit.

- Ted

2008-08-26 13:30:22

by Stefani Seibold

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default



Am Dienstag, den 26.08.2008, 08:50 -0400 schrieb Theodore Tso:
> On Tue, Aug 26, 2008 at 09:27:26PM +1000, Nick Piggin wrote:
> >
> > Oh with this much handwaving from you old timers I feel much better
> > about it ;) I bet before the bug report and change to 10s, any
> > application that hogged the CPU for more than 0.9 seconds was just
> > broken too, right? But 10s is more than enough for everybody?
> >

Sorry, the world of embedded programming is sometimes stranger than in
theory. Normally a real-time process would not lock the CPU for more
than 1 sec. But in some circumstances, especially FPGA initialisation
and long-term measurements, it is possible that the real-time process
locks the CPU for more than a second, sometimes for more than 10 sec.
If the embedded program has been designed that way, this behaviour is
desired.


> Actually, any real-time application which hogs the CPU at a high
> real-time priority for more than one second is probably doing
> something broken. The whole point of high real-time priorities is to
> do something really fast, get in and get out. Usually such routines
> are measured in milliseconds or microseconds.


> Think about it *this* way --- what would you think of some device
> driver which hogged an interrupt for a full second, never mind 10
> seconds. You'd say it was broken, right? Now consider that a high
> real-time priority thread might be running at a higher priority than
> interrupt handlers, and in fact could preempt interrupt handlers....
>
> > > Simply because we use common sense instead of following every single
> > > POSIX brainfart by the letter.
> >
> > How is that a brainfart? It is simple, relatively unambiguous, and not
> > arbitrary. You really say the POSIX specified behaviour is "a brainfart",
> > but adding an arbitrary 10s throttle "but the process might be preempted
> > and lose the CPU to a lower priority task if it uses 10s of consecutive
> > CPU time" would eliminate that brainfart? I have to laugh.
>
> We've not followed POSIX before when it hasn't made sense. For
> example, "df" and "du" report their output in kilobytes instead of the
> 512-byte sectors that POSIX demands.
>

This has nothing to do with POSIX. It is standard real-time behaviour.
RT programming is a job like writing device drivers. You must know what
you are doing.

Modifying the scheduler so that a real-time process gives up the CPU
after a given time will certainly break some embedded applications.

Don't think only of desktop or enterprise Linux boxes; there are many more
embedded Linux devices on this planet, and not a few of them rely on the
old scheduler behaviour.

The basic Linux guideline is simple: the kernel will never break
userland applications.

> > root is allowed to shoot themselves in the foot. root is the safeguard.
>
> We've done things before to make things harder for root; for example
> we've restricted what /dev/mem can do. And root can always lift the
> ulimit.
>
> - Ted

What is coming next? A device driver manager which kills any driver
that uses too much CPU? Or one that throttles or kicks off the
responsible driver if the hardware generates too many interrupts?

Kernel and embedded real-time programmers should know what they are doing.

Stefani

2008-08-26 17:56:12

by Theodore Ts'o

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tue, Aug 26, 2008 at 03:31:27PM +0200, Stefani Seibold wrote:
>
> Sorry, the world of embedded programming is sometimes stranger than in
> theory. Normally a real-time process would not lock the CPU for more
> than 1 sec. But in some circumstances, especially FPGA initialisation
> and long-term measurements, it is possible that the real-time process
> locks the CPU for more than a second, sometimes for more than 10 sec.
> If the embedded program has been designed that way, this behaviour is
> desired.
>

And if that's true, the embedded program can adjust the ulimit to
change the priority levels as appropriate. Real-time programming
will always require a bit more configuration, such as what priority
various hard and soft interrupt routines will run at. This is just
one more configuration option.

> What is coming next? A device driver manager which kills any driver
> that uses too much CPU? Or one that throttles or kicks off the
> responsible driver if the hardware generates too many interrupts?

Actually, we have both of these already. :-)

- Ted

2008-08-26 17:59:39

by Mark Hounschell

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

Thomas Gleixner wrote:
> On Tue, 26 Aug 2008, Nick Piggin wrote:
>
>> On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
>>> * Nick Piggin <[email protected]> wrote:
>>>> So... no reply to this? I'm really wondering how it's OK to break
>>>> documented standards and previous Linux behaviour by default for
>>>> something that it is trivial to solve in userspace? [...]
>>> I disagree
>> Your arguments were along the line of:
>>
>> * It probably doesn't break anything (except we had somebody report
>> that it breaks their app)
>
> I'm a real-time oldtimer. An application which hogs the CPU for 9.9
> seconds with SCHED_FIFO priority is just broken. It's broken beyond
> all limits, whether POSIX allows to do that or Linux obeyed the
> request of the braindamaged application design.
>

Well, I've been working on RT hardware (mostly) and software since 1977.
With all due respect, that's crapola. I for one have this requirement and
there is _no_ way around it in my world. In fact it's the kernel that's broken
by stealing precious usecs from me.

From my point of view, as an RT user, any kernel that supports SMP yet can't
guarantee me 100% of even one of _my_ processors is just a plainly broken kernel.

>> * If it does break something then they must be doing something stupid
>> (I refuted that because there are several legitimate ways to use rt
>> scheduling that is broken by this)
>>
>> * We have many other APIs and tools that don't conform to posix (why
>> is that a reason to break this one?)
>
> Simply because we use common sense instead of following every single
> POSIX brainfart by the letter.
>
>> * We should break the API to cater for stupid users and distros who
>> create local DoS and/or lock up their boxes (except this is trivial
>> to solve by setting sysctls or having a watchdog or using sysrq)
>
> For the vast majority of users and RT developers a sane default of
> sanity measures is useful and sensible.
>
> If someone wants to shoot himself in the foot then it's not an
> unreasonable request that he needs to disable the safety guards before
> pulling the trigger.
>

Again that is also crapola. If I want to shoot myself in the foot, it's
none of your concern. I know perfectly well what will happen when
I pull the trigger.

My 2 cents
Regards
Mark

2008-08-26 21:38:21

by Thomas Gleixner

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tue, 26 Aug 2008, Nick Piggin wrote:
> On Tuesday 26 August 2008 21:09, Thomas Gleixner wrote:
> > On Tue, 26 Aug 2008, Nick Piggin wrote:
> > > On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > > > * Nick Piggin <[email protected]> wrote:
> > > > > So... no reply to this? I'm really wondering how it's OK to break
> > > > > documented standards and previous Linux behaviour by default for
> > > > > something that it is trivial to solve in userspace? [...]
> > > >
> > > > I disagree
> > >
> > > Your arguments were along the line of:
> > >
> > > * It probably doesn't break anything (except we had somebody report
> > > that it breaks their app)
> >
> > I'm a real-time oldtimer. An application which hogs the CPU for 9.9
> > seconds with SCHED_FIFO priority is just broken. It's broken beyond
> > all limits, whether POSIX allows to do that or Linux obeyed the
> > request of the braindamaged application design.
>
> Oh with this much handwaving from you old timers I feel much better
> about it ;) I bet before the bug report and change to 10s, any
> application that hogged the CPU for more than 0.9 seconds was just
> broken too, right? But 10s is more than enough for everybody?

Well, we might have a public opinion poll, whether a system is
declared frozen after 1, 10 or 100 seconds. Even a one-second
unresponsiveness shows up on the kernel bugzilla and you request that
unlimited unresponsiveness w/o a chance to debug it is the sane
default.

A one-second RT CPU hog is just a broken application, nothing
else. Your precious customer use case is simply crap.

Real-time is about determinism and not about the allowance to fuck up
a system at will. If a system failed to prevent the fuckup once then
this is not at all a guarantee that it allows to do that forever.

Especially not in the Open Source space, where developers are still
allowed to use their brain and apply common sense to prevent such a
wreckage and abuse. Still, your not yet specified use case can
continue to do stupid things forever with the simple tweak that it
needs to declare itself broken by turning off the kernel sanity
checks.

> I may not be an old timer, but I can say the kernel is just broken
> if it deliberately deviates from standards to undocumented behaviour,
> and even more so if it changes from working to broken behaviour for
> reasons that can be worked around in userspace (eg. running a higher
> priority watchdog).

Right. I appreciate the nitpicking janitor of the most important POSIX
feature:

"The unlimited right to monopolize the CPU for any given timeframe."

Get your brain together. Just because it worked before and POSIX
allows it is not an argument at all that it is something useful. If
you want to do this you still can do it by resetting the limit.

Your request to enforce that stupid and braindead behaviour on
everyone is simply annoying.

> > > * If it does break something then they must be doing something stupid
> > > (I refuted that because there are several legitimate ways to use rt
> > > scheduling that is broken by this)
> > >
> > > * We have many other APIs and tools that don't conform to posix (why
> > > is that a reason to break this one?)
> >
> > Simply because we use common sense instead of following every single
> > POSIX brainfart by the letter.
>
> How is that a brainfart? It is simple, relatively unambiguous, and not
> arbitrary. You really say the POSIX specified behaviour is "a brainfart",
> but adding an arbitrary 10s throttle "but the process might be preempted
> and lose the CPU to a lower priority task if it uses 10s of consecutive
> CPU time" would eliminate that brainfart? I have to laugh.

No, I did not say that. All I said is that giving the normal and
common sense capable user/developer the chance to debug a runaway task
w/o rebooting the system via the power off button is a sensible and
useful default.

Your request to default to a possibly unusable system serves some yet
to be explained higher goal, which is definitely out of the scope of
common sense.

You still did not explain why this behaviour is useful and your
handwaving vs. some (probably closed source) customer application is
not an argument at all.

> > > * We should break the API to cater for stupid users and distros who
> > > create local DoS and/or lock up their boxes (except this is trivial
> > > to solve by setting sysctls or having a watchdog or using sysrq)
> >
> > For the vast majority of users and RT developers a sane default of
> > sanity measures is useful and sensible.
>
> You seriously develop complex rt tasks without having at least a simple
> watchdog task?

Dude, don't tell me how to design and debug a real time system.

It's not about me, but about the general usability and debuggability
of Linux even in extreme situations, e.g. an involuntary runaway task,
which we see from time to time in bug reports. Having a sensible
default guard helps in the common case, and denying it is just a
self-serving attitude to keep some braindamaged customer niche
application alive. Linux and Open Source are not about the customer
application; they are about having a sane and safe environment for 99% of
the use cases. Your precious CPU hog SCHED_FIFO application is an
engineering brainfart which is really not relevant to any community
decision on a sane OS that is safeguarded by default.

> > If someone wants to shoot himself in the foot then it's not an
> > unreasonable request that he needs to disable the safety guards before
> > pulling the trigger.
>
> root is allowed to shoot themselves in the foot. root is the safeguard.

Sure. You are allowed to shoot yourself in the foot as well. Does the
gun manufacturer omit safety guards just because you are allowed to
and just because the 1990 version of the gun did not have that safety
guard ?

Again. Common sense is way more important than some green table
specification and some esoteric customer application.

Thanks,

tglx

2008-08-26 22:49:52

by Andi Kleen

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

Thomas Gleixner <[email protected]> writes:

> Well, we might have a public opinion poll, whether a system is
> declared frozen after 1, 10 or 100 seconds. Even a one-second
> unresponsiveness shows up on the kernel bugzilla and you request that
> unlimited unresponsiveness w/o a chance to debug it is the sane
> default.

That assumes a single CPU. With multiple CPUs, and not all of them
hogged, the system should still be responsive?

-Andi

2008-08-26 23:00:33

by Steven Rostedt

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tue, Aug 26, 2008 at 09:47:33AM -0400, Mark Hounschell wrote:
> Thomas Gleixner wrote:
>> On Tue, 26 Aug 2008, Nick Piggin wrote:
>>
>>> On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
>>>> * Nick Piggin <[email protected]> wrote:
>>>>> So... no reply to this? I'm really wondering how it's OK to break
>>>>> documented standards and previous Linux behaviour by default for
>>>>> something that it is trivial to solve in userspace? [...]
>>>> I disagree
>>> Your arguments were along the line of:
>>>
>>> * It probably doesn't break anything (except we had somebody report
>>> that it breaks their app)
>>
>> I'm a real-time oldtimer. An application which hogs the CPU for 9.9
>> seconds with SCHED_FIFO priority is just broken. It's broken beyond
>> all limits, whether POSIX allows to do that or Linux obeyed the
>> request of the braindamaged application design.
>>
>
> Well, I've been working on RT hardware (mostly) and software since 1977.
> With all due respect, that's crapola. I for one have this requirement and
> there is _no_ way around it in my world. In fact it's the kernel that's broken
> by stealing precious usecs from me.

I'm sorry, but I need to agree with this. I've been focused more on RT
and in military apps since 1991 (not as long as 77 though :-)

There's two issues here.

1) What FIFO means

2) Protecting the 99% of the users


What most real RT centric folks will want is the true meaning of FIFO.
That is, a FIFO task can run as long as it wants using as much CPU as it
wants until a) a higher RT task preempts it, or b) it voluntarily
releases the CPU.

This change, without doubt, breaks the definition of what a FIFO task
is. This is the kernel imposing policy onto userspace.

What Thomas Gleixner and Ingo Molnar are doing is focusing on 2 above
(protecting the 99% of users). This is reasonable, since that's who will
bug them the most when things break.

The problem I have, is that this is breaking a defined user API. A
default that is well known within the RT community. The simple
definition of FIFO.

What I would suggest is this.

1) Keep the default as infinite for those that know what they are
doing.

2) Change the sysctl scripts in the distros to set the default to a sane
time that will protect the users.

An RT app that would break the 10s limit would probably be using busybox
anyway, so the default for that would be what the kernel comes up with.

The default the 99% of users would have, is what the distro set it to
for them.

This seems like a sane solution to satisfy both camps.

-- Steve
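
The second suggestion above, a distro setting the value at boot, amounts to writing one integer to a sysctl file. A minimal sketch, assuming the knob is exposed under procfs as `/proc/sys/kernel/sched_rt_runtime_us`; the function names are hypothetical and the stream-based helpers exist only so the logic can be exercised without root.

```c
#include <stdio.h>

/* Assumed procfs path of the sysctl discussed in this thread. */
#define RT_RUNTIME_PATH "/proc/sys/kernel/sched_rt_runtime_us"

/* Write value as decimal text to an already-open stream; 0 on success. */
int put_runtime_us(FILE *f, long value)
{
    if (fprintf(f, "%ld\n", value) < 0)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Read one decimal value from an already-open stream; 0 on success. */
int get_runtime_us(FILE *f, long *out)
{
    return fscanf(f, "%ld", out) == 1 ? 0 : -1;
}

/* Hypothetical boot-time helper: set the RT throttle; needs root. */
int set_rt_runtime_us(long value)
{
    FILE *f = fopen(RT_RUNTIME_PATH, "w");
    int rc;

    if (!f)
        return -1;
    rc = put_runtime_us(f, value);
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Under this proposal a desktop distro's init script would do the equivalent of `set_rt_runtime_us(950000)` to keep a cap, while a system left at the kernel default (-1) would see plain FIFO behaviour with no throttling.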

2008-08-27 10:05:09

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Wednesday 27 August 2008 07:37, Thomas Gleixner wrote:
> On Tue, 26 Aug 2008, Nick Piggin wrote:
> > On Tuesday 26 August 2008 21:09, Thomas Gleixner wrote:
> > > On Tue, 26 Aug 2008, Nick Piggin wrote:
> > > > On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > > > > * Nick Piggin <[email protected]> wrote:
> > > > > > So... no reply to this? I'm really wondering how it's OK to break
> > > > > > documented standards and previous Linux behaviour by default for
> > > > > > something that it is trivial to solve in userspace? [...]
> > > > >
> > > > > I disagree
> > > >
> > > > Your arguments were along the line of:
> > > >
> > > > * It probably doesn't break anything (except we had somebody report
> > > > that it breaks their app)
> > >
> > > I'm a real-time oldtimer. An application which hogs the CPU for 9.9
> > > seconds with SCHED_FIFO priority is just broken. It's broken beyond
> > > all limits, whether POSIX allows to do that or Linux obeyed the
> > > request of the braindamaged application design.
> >
> > Oh with this much handwaving from you old timers I feel much better
> > about it ;) I bet before the bug report and change to 10s, any
> > application that hogged the CPU for more than 0.9 seconds was just
> > broken too, right? But 10s is more than enough for everybody?
>
> Well, we might have a public opinion poll, whether a system is
> declared frozen after 1, 10 or 100 seconds.

I don't understand the fixation on declaring a system frozen. I repeat:
how do you know "rt task code that hogs the CPU for 10s is broken"? This
still hasn't been adequately explained to me, and from responses to this
post, it seems that others have a different view than you do.


> Even a one-second
> unresponsiveness shows up on the kernel bugzilla and you request that
> unlimited unresponsiveness w/o a chance to debug it is the sane
> default.
>
> A one-second RT CPU hog is just a broken application, nothing
> else. Your precious customer use case is simply crap.

What customer use case are you talking about? I never mentioned one and
have none. Are you confusing me with someone else?

But OK, so if someone else has a customer use case that breaks, what
makes you think you can just declare it is crap and we don't care about
it? For that matter, what has closed source got to do with it? We don't
break kernel userspace API regardless of closed source or open source.


> Real-time is about determinism and not about the allowance to fuck up
> a system at will. If a system failed to prevent the fuckup once then
> this is not at all a guarantee that it allows to do that forever.


This is just handwaving and ignoring the issue at hand. SCHED_FIFO and
SCHED_RR are exactly about being able to hog the CPU. That is exactly
how they are defined.


> Especially not in the Open Source space, where developers are still
> allowed to use their brain and apply common sense to prevent such a
> wreckage and abuse. Still, your not yet specified use case can
> continue to do stupid things forever with the simple tweak that it
> needs to declare itself broken by turning off the kernel sanity
> checks.

Huh? Again, I don't have a use case, and even ignoring the several posts
of people who do, I would still make the same argument because it is
plain for me to see that breaking the API by default is the wrong thing
to do.


> > I may not be an old timer, but I can say the kernel is just broken
> > if it deliberately deviates from standards to undocumented behaviour,
> > and even more so if it changes from working to broken behaviour for
> > reasons that can be worked around in userspace (eg. running a higher
> > priority watchdog).
>
> Right. I appreciate the nitpicking janitor of the most important POSIX
> feature:
>
> "The unlimited right to monopolize the CPU for any given timeframe."

Umm... yeah. That's exactly one of the important properties of SCHED_FIFO
and SCHED_RR. Why do you think it is OK to change this?


> Get your brain together. Just because it worked before and POSIX
> allows it is not an argument at all that it is something useful. If
> you want to do this you still can do it by resetting the limit.
>
> Your request to enforce that stupid and braindead behaviour on
> everyone is simply annoying.

Get my brain together? You're the one with faulty reasoning on this issue.


> > > > * If it does break something then they must be doing something stupid
> > > > (I refuted that because there are several legitimate ways to use rt
> > > > scheduling that is broken by this)
> > > >
> > > > * We have many other APIs and tools that don't conform to posix (why
> > > > is that a reason to break this one?)
> > >
> > > Simply because we use common sense instead of following every single
> > > POSIX brainfart by the letter.
> >
> > How is that a brainfart? It is simple, relatively unambiguous, and not
> > arbitrary. You really say the POSIX specified behaviour is "a brainfart",
> > but adding an arbitrary 10s throttle "but the process might be preempted
> > and lose the CPU to a lower priority task if it uses 10s of consecutive
> > CPU time" would eliminate that brainfart? I have to laugh.
>
> No, I did not say that. All I said is that giving the normal and
> common sense capable user/developer the chance to debug a runaway task
> w/o rebooting the system via the power off button is a sensible and
> useful default.

I don't deny that the runaway task thing is a *small* advantage. But
it is the only one, and weighed against lots of negatives.


> Your request to default to a possibly unusable system serves some yet
> to be explained higher goal, which is definitely out of the scope of
> common sense.
>
> You still did not explain why this behaviour is useful and your
> handwaving vs. some (probably closed source) customer application is
> not an argument at all.

You have it completely backwards. If someone wants to change a userspace API,
it is *they* who must not handwave about why "anybody who wants to do that is
broken anyway so we don't care about them".

I, on the other hand, opposing the API change, sure can handwave or find one
or two counter examples as to why we might have users relying on the old
behaviour.

The replies you got might convince you that your view of the rt world is not
the complete and only picture. But if not, then consider that rt tasks need
not have a fixed amount of work to be done per unit of time but they may
scale work according to the available CPU power. Or it may be something
that runs a polling loop I guess.


> > > > * We should break the API to cater for stupid users and distros who
> > > > create local DoS and/or lock up their boxes (except this is trivial
> > > > to solve by setting sysctls or having a watchdog or using sysrq)
> > >
> > > For the vast majority of users and RT developers a sane default of
> > > sanity measures is useful and sensible.
> >
> > You seriously develop complex rt tasks without having at least a simple
> > watchdog task?
>
> Dude, don't tell me how to design and debug a real time system.

I didn't tell you, I asked you. Do you develop without a watchdog? Do
you think the majority of RT developers do?

Because if so, then I certainly will tell you to use a watchdog to get
the debuggability you ask for, rather than break the kernel interface
for everyone else. If not, then the RT developers' debuggability
argument is false.


> It's not about me, but about the general usability and debuggability
> of Linux even in extreme situations, e.g. an involuntary runaway task,
> which we see even from time to time in bug reports. Having a sensible
> default guard is helping in the common case and denying it is just a
> selfserving attitude to keep some braindamaged customer niche
> application alive. Linux and Open Source is not about the customer
> application, it is about having a sane and safe environment for 99% of
> the use cases. Your precious CPU hog SCHED_FIFO application is an
> engineering brainfart which is really not relevant to any community
> decision of a sane and per default safe guarded OS.

Enough with this strawman, please. I never argued in the context of having
a specific broken application. It is the concept of changing this interface
which is what I am arguing against.

However, assuming I did have some customer application, I would want to know
why you think it is OK that it has been broken "because it must be crap anyway".


> > > If someone wants to shoot himself in the foot then it's not an
> > > unreasonable request that he needs to disable the safety guards before
> > > pulling the trigger.
> >
> > root is allowed to shoot themselves in the foot. root is the safeguard.
>
> Sure. You are allowed to shoot yourself in the foot as well. Does the
> gun manufacturer omit safety guards just because you are allowed to
> and just because the 1990 version of the gun did not have that safety
> guard ?

Making arguments with metaphors like this is useless. How are we supposed
to have a sane technical argument otherwise?

So: root can shoot themselves in the foot, easily, in many ways. Lots of
ways do not have safeguards. This has never been considered a problem before.


> Again. Common sense is way more important than some green table
> specification and some esoteric customer application.

It is not some green table specification. It is really widely accepted
and implemented behaviour, and perhaps most importantly it has existed
that way in Linux for a long time.

I can't believe I have to argue so hard against this change to the API.

If you and your users or developers want a different scheduling policy
that throttles, WTF not just create a new SCHED_ policy? People that
ask for SCHED_FIFO are expecting to get what SCHED_FIFO gives in other
operating systems, in older Linux versions, and in specifications. You
can't tell me that *I'm* wrong for advocating that we implement this
correctly -- you have to tell all users of this API that they're wrong
for asking for it, and then you can provide a SCHED_FIFO_THROTTLED or
something for them to use.

2008-08-27 10:08:33

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Wednesday 27 August 2008 08:49, Andi Kleen wrote:
> Thomas Gleixner <[email protected]> writes:
> > Well, we might have a public opinion poll, whether a system is
> > declared frozen after 1, 10 or 100 seconds. Even a one second
> > unresponsiveness shows up on the kernel bugzilla and you request that
> > unlimited unresponsiveness w/o a chance to debug it is the sane
> > default.
>
> That assumes single CPU. With multiple CPUs and not
> all hogged the system should be still responsive?

Right.

But also it assumes desktop/general purpose server thing.

There may not even be any user interface to be unresponsive. Or it
may be something implemented with a userspace driven scheduling
system. Or an event loop in a single process.

2008-08-27 18:56:19

by Chris Friesen

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

Steven Rostedt wrote:

> What I would suggest is this.
>
> 1) Keep the default as the infinite for those that know what they are
> doing.
>
> 2) Change the sysctl scripts in the distros to set the default to a sane
> time that will protect the users.
>
> An RT app that would break the 10s limit would probably be using busybox
> anyway, so the default for that would be what the kernel comes up with.
>
> The default the 99% of users would have, is what the distro set it to
> for them.
>
> This seems like a sane solution to satisfy both camps.


Makes sense to me. It could even get sent out to users about as fast as
a new kernel by itself, since they could just add a package dependency
to update the init scripts when the end-user installs the new kernel
package.

Anyone messing with the kernel directly is likely 1) smart enough to
deal with existing FIFO semantics, and 2) able to modify their own init
scripts to get some additional security if they so desire.

Chris

2008-08-28 10:54:45

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Nick Piggin <[email protected]> wrote:

> On Wednesday 27 August 2008 08:49, Andi Kleen wrote:
> > Thomas Gleixner <[email protected]> writes:
> > > Well, we might have a public opinion poll, whether a system is
> > > declared frozen after 1, 10 or 100 seconds. Even a one second
> > > unresponsiveness shows up on the kernel bugzilla and you request that
> > > unlimited unresponsiveness w/o a chance to debug it is the sane
> > > default.
> >
> > That assumes single CPU. With multiple CPUs and not
> > all hogged the system should be still responsive?
>
> Right.

Wrong.

Even if the system has multiple CPUs, and even if just a single CPU is
fully utilized by an RT task, without the rt-limit the system will still
lock up in practice due to various other factors: workqueues and tasks
being 'stuck' on CPUs that host an RT hog. While there's obviously CPU
time available on other CPUs, you cannot run 'top', the desktop will
freeze, work flows of the system can be stuck, etc, etc..

With the rt limit in place, it's all pretty smooth and debuggable. Even
with all CPUs hogged by SCHED_FIFO prio 99 the system is laggy but
debuggable - the user can run 'top' and can resolve the situation.

Really, this reply of yours shows something startling: that despite this
many mails you still have never actually tried to run the scenario you
are complaining about: you have never tried to run a CPU hog high-prio
RT task on a Linux system before, and you have never observed the
effects it has on general system stability and debuggability.

This fundamental lack of experience weakens all your arguments and i
dont even know why you are arguing about it. Do you perhaps have some
customer application/workload you are worried about? If you have then
please tell us about the exact specifics - this handwaving about
compliance really makes little sense.

In other words: in our car the air-bag continues to be enabled by
default, and if someone wants to use the car for stunts the air-bag can
be disabled via that handy sysctl.

In any case i think i'm going to ignore this thread from now on, nothing
new has been said really, just the general tone of discussion is
deteriorating. You are also very late with raising objections in any
case - the rt-limit feature has been posted 10 months ago and went
upstream 8 months ago - two full kernel cycles have been completed with
this change in place and a third one has almost been finished.

Ingo

2008-08-28 11:06:59

by Andi Kleen

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

> Even if the system has multiple CPUs, and even if just a single CPU is
> fully utilized by an RT task, without the rt-limit the system will still
> lock up in practice due to various other factors: workqueues and tasks
> being 'stuck' on CPUs that host an RT hog.

The load balancer will not notice that a particular CPU is busy
with real time tasks?

> While there's obviously CPU
> time available on other CPUs, you cannot run 'top', the desktop will
> freeze, work flows of the system can be stuck, etc, etc..

I had such a situation at least once in the past (not due to a
runaway RT task but due to a kernel bug) and even with 2 out of 4 CPUs blocked
the system was still quite usable. top/kill definitely worked. The system
didn't have a desktop, but I didn't notice many problems in shell use.
Ok it's just one sample.

That said I don't think having such a limit by default is a bad idea actually.

Just handling it in the scheduler anyways is also probably good because
it can happen even due to other issues than just run away RT tasks.

-Andi

2008-08-28 11:19:32

by Peter Zijlstra

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> > Even if the system has multiple CPUs, and even if just a single CPU is
> > fully utilized by an RT task, without the rt-limit the system will still
> > lock up in practice due to various other factors: workqueues and tasks
> > being 'stuck' on CPUs that host an RT hog.
>
> The load balancer will not notice that a particular CPU is busy
> with real time tasks?

Not currently, working on that though.

2008-08-28 11:29:29

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Peter Zijlstra <[email protected]> wrote:

> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> > > Even if the system has multiple CPUs, and even if just a single CPU is
> > > fully utilized by an RT task, without the rt-limit the system will still
> > > lock up in practice due to various other factors: workqueues and tasks
> > > being 'stuck' on CPUs that host an RT hog.
> >
> > The load balancer will not notice that a particular CPU is busy
> > with real time tasks?
>
> Not currently, working on that though.

yeah, that's nice - i tried the earlier iteration of your patch already.
It doesnt solve the UP case obviously, nor the case where all CPUs are
hogged by RT tasks, nor any other (or future) per CPU aspect of Linux
that we have in place currently.

Ingo

2008-08-28 11:48:16

by Andi Kleen

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> > > Even if the system has multiple CPUs, and even if just a single CPU is
> > > fully utilized by an RT task, without the rt-limit the system will still
> > > lock up in practice due to various other factors: workqueues and tasks
> > > being 'stuck' on CPUs that host an RT hog.
> >
> > The load balancer will not notice that a particular CPU is busy
> > with real time tasks?
>
> Not currently, working on that though.

I wonder if it would make sense to break affinities in extreme case?
With that even the workqueues would work again.

-Andi

--
[email protected]

2008-08-28 12:01:27

by Peter Zijlstra

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thu, 2008-08-28 at 13:50 +0200, Andi Kleen wrote:
> On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
> > On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> > > > Even if the system has multiple CPUs, and even if just a single CPU is
> > > > fully utilized by an RT task, without the rt-limit the system will still
> > > > lock up in practice due to various other factors: workqueues and tasks
> > > > being 'stuck' on CPUs that host an RT hog.
> > >
> > > The load balancer will not notice that a particular CPU is busy
> > > with real time tasks?
> >
> > Not currently, working on that though.
>
> I wonder if it would make sense to break affinities in extreme case?
> With that even the workqueues would work again.

Then people can no longer assume stuff like queue_work_on() etc.. works.
Users of such code might depend on it actually running on the specified
cpu.


2008-08-28 12:04:12

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thursday 28 August 2008 20:54, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > On Wednesday 27 August 2008 08:49, Andi Kleen wrote:
> > > Thomas Gleixner <[email protected]> writes:
> > > > Well, we might have a public opinion poll, whether a system is
> > > > declared frozen after 1, 10 or 100 seconds. Even a one second
> > > > unresponsiveness shows up on the kernel bugzilla and you request that
> > > > unlimited unresponsiveness w/o a chance to debug it is the sane
> > > > default.
> > >
> > > That assumes single CPU. With multiple CPUs and not
> > > all hogged the system should be still responsive?
> >
> > Right.
>
> Wrong.
>
> Even if the system has multiple CPUs, and even if just a single CPU is
> fully utilized by an RT task, without the rt-limit the system will still
> lock up in practice due to various other factors: workqueues and tasks
> being 'stuck' on CPUs that host an RT hog. While there's obviously CPU
> time available on other CPUs, you cannot run 'top', the desktop will
> freeze, work flows of the system can be stuck, etc, etc..

No, it is right. With caveats. Because you can pretty well isolate a
CPU from running kernel threads or work. At any rate, I don't think it
is your decision to just mandate this.


> With the rt limit in place, it's all pretty smooth and debuggable. Even
> with all CPUs hogged by SCHED_FIFO prio 99 the system is laggy but
> debuggable - the user can run 'top' and can resolve the situation.

When I write rt apps, I run a watchdog thread which detects a hung
task and kills it.
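
A minimal sketch of the progress check such a watchdog could run, assuming
a heartbeat counter that the worker bumps on every loop iteration. The
struct and function names are hypothetical and not taken from any real
application. In a real program the watchdog thread would run at a higher
SCHED_FIFO priority than the worker, the counter would need atomic access,
and the caller would kill() or otherwise stop the worker once the check
reports a hang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heartbeat record shared between a worker and its watchdog. */
struct heartbeat {
    uint64_t beats;      /* incremented by the worker after each unit of work */
    uint64_t last_seen;  /* value the watchdog observed on its previous check */
    unsigned missed;     /* consecutive checks that saw no progress */
};

/* Called by the worker whenever it completes a unit of work. */
static void heartbeat_tick(struct heartbeat *hb)
{
    hb->beats++;
}

/*
 * Called periodically by the watchdog. Returns true once the worker has
 * made no progress for max_missed consecutive checks; any progress
 * resets the miss counter.
 */
static bool watchdog_worker_hung(struct heartbeat *hb, unsigned max_missed)
{
    if (hb->beats != hb->last_seen) {
        hb->last_seen = hb->beats;
        hb->missed = 0;
        return false;
    }
    hb->missed++;
    return hb->missed >= max_missed;
}
```

The point of the design is that the debuggability comes from a task the
application itself controls, so it does not depend on the kernel
throttling the rt class.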


> Really, this reply of yours shows something startling: that despite this
> many mails you still have never actually tried to run the scenario you
> are complaining about: you have never tried to run a CPU hog high-prio
> RT task on a Linux system before, and you have never observed the
> effects it has on general system stability and debuggability.

Of course I have and of course I know what it does if you run a
for (;;) rt thread on an ordinary Linux desktop system. Trying to
"fix" that for people is not a good reason to break the API.


> This fundamental lack of experience weakens all your arguments and i
> dont even know why you are arguing about it. Do you perhaps have some
> customer application/workload you are worried about? If you have then
> please tell us about the exact specifics - this handwaving about
> compliance really makes little sense.

You're continually ignoring all of my arguments and instead raising
irrelevant things like this.

You ignored others in this thread who replied with real uses of the
rt scheduling that is being prevented by this API breakage, and
you're ignoring my examples of how it could be used and just keep
asserting that "anybody who does that is broken anyway".

You also ignored when I told you how you can fix this correctly by
introducing new SCHED_xxx scheduling policies that won't break
backwards compatibility and will be defined from the outset to be
throttled as such.

There is no customer issue and there is no handwaving about compliance;
it is a black and white issue: this behaviour breaks all documentation,
previous Linux behaviour, other systems.


> In other words: in our car the air-bag continues to be enabled by
> default, and if someone wants to use the car for stunts the air-bag can
> be disabled via that handy sysctl.

How am I supposed to respond to that? My car doesn't have an air bag
but its brakes don't stop working every 10 seconds.

Can we stop with the car and gun analogies now?


> In any case i think i'm going to ignore this thread from now on, nothing
> new has been said really, just the general tone of discussion is
> deteriorating.

OK, if you don't wish to have further discussion then I will submit a
patch to Linus and I'll see what he says.


> You are also very late with raising objections in any
> case - the rt-limit feature has been posted 10 months ago and went
> upstream 8 months ago - two full kernel cycles have been completed with
> this change in place and a third one has almost been finished.

So what?

2008-08-28 12:12:20

by Andi Kleen

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

> Then people can no longer assume stuff like queue_work_on() etc.. works.
> Users of such code might depend on it actually running on the specified
> cpu.

If they assume that they're already buggy because CPU hot unplug will break
affinities.

-Andi

2008-08-28 12:18:33

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thursday 28 August 2008 22:14, Andi Kleen wrote:
> > Then people can no longer assume stuff like queue_work_on() etc.. works.
> > Users of such code might depend on it actually running on the specified
> > cpu.
>
> If they assume that they're already buggy because CPU hot unplug will break
> affinities.

It is actually possible (with fairly little work, last time I looked,
maybe it is already integrated in the kernel) to avoid all this kind of
thing from isolated CPUs.

But even then, note that the types of programs using the CPU for long
periods are obviously not going to be run on an average desktop system.
So the responsiveness argument is laughable. Responsive as defined how?
And in relation to what type of systems?

2008-08-28 12:29:51

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thursday 28 August 2008 20:54, Ingo Molnar wrote:

> This fundamental lack of experience weakens all your arguments and i
> dont even know why you are arguing about it.

BTW, it is funny that you just decide you can somehow "weaken"
my technical arguments because of some personal attribute
you believe I have.

You don't know why I am arguing? I'll put it very simply one more
time.

- This behaviour has changed the kernel's userspace API in a way
that can break existing applications.

That is my primary point. If you think it gets somehow weaker
because you don't think I have ever locked up my workstation with
an RT task, then I give up arguing with you.

2008-08-28 13:07:43

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Nick Piggin <[email protected]> wrote:

> There is no customer issue and there is no handwaving about
> compliance;

well, the reason i'm asking is that i cannot for anything in the world
imagine you being so upset about _anything_ but something that involves
benchmark runs ;-)

And what does SCHED_FIFO RT policy scheduling have to do with
performance and benchmarks? Nothing usually in the real world, except
for this little known fact: a common 'tuning' for TPC database
benchmarks is to run all DB threads as SCHED_FIFO to squeeze the last
0.1% of performance out of the setup.

So - and i'm taking an educated guess here - is SCHED_FIFO+TPC
performance perhaps one of the factors that played a role in you
initiating this thread? If yes then it's obviously an incredibly broken
use of SCHED_FIFO and we can add the sysctl tuning to the long list of
dozens of other tunings that happen before a TPC run anyway.

Hm?

Ingo

2008-08-28 13:45:34

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thursday 28 August 2008 23:07, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > There is no customer issue and there is no handwaving about
> > compliance;
>
> well, the reason i'm asking is that i cannot for anything in the world
> imagine you being so upset about _anything_ but something that involves
> benchmark runs ;-)

;) Well yes as you know I'm not actively doing much scheduler work for
a while now. Luckily there are a lot of really good people who probably
do a better job on it than me anyway, so on the whole I'm quite happy
with it.

But ironically that's also why I hadn't raised my concerns earlier... I
simply was not aware of the change. So I wish I had participated in the
discussion earlier, but that's life, so I have to raise my concern now.


> And what does SCHED_FIFO RT policy scheduling have to do with
> performance and benchmarks? Nothing usually in the real world, except
> for this little known fact: a common 'tuning' for TPC database
> benchmarks is to run all DB threads as SCHED_FIFO to squeeze the last
> 0.1% of performance out of the setup.
>
> So - and i'm taking an educated guess here - is SCHED_FIFO+TPC
> performance perhaps one of the factors that played a role in you
> initiating this thread? If yes then it's obviously an incredibly broken
> use of SCHED_FIFO and we can add the sysctl tuning to the long list of
> dozens of other tunings that happen before a TPC run anyway.
>
> Hm?

To address this concern: no, it is not tpc ;) Actually I don't know a
thing about tpc except what scant information can basically be
gained on the list (disclaimer: I probably could find out more under
NDA, but I don't care to).

No, there is no customer behind the scenes and nor do I have a use
case myself. I really would have told you about it by now.

I'm concerned because I honestly think there is a risk of breaking
systems. I also think that in this problem space, people often care
about guard bands and worst case scenarios so even if the app does
not do a cpu hogging polling loop or cooperative scheduling or
anything like that, then I think it is risky to add this source of
uncertainty.

The other issue is that the old behaviour (and, dare I say it,
specification) is quite straightforward. At least it is simpler and thus
I guess easier to analyze than this behaviour with the added caveat.

I realise that as Linux gets better at this, people are wanting to use
-rt programs like audio mixing on their desktops and for that kind of
thing, throttling is probably often the desired behaviour. So I can
see why it was implemented. I just think it is a nasty surprise to
have this behaviour by default in the kernel.

I hope I explained myself better now. I was not being too constructive
when I was getting heated.

What I would like to see is maybe a new SCHED_ policy or two which can
be defined basically as rt-with-throttle which some apps could use. I
also think the sysctl to throttle it is a fine idea. And for desktop
installations there is probably a much stronger argument for it. But I
disagree with having it default from kernel.org like this.
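
To put numbers on what the throttle takes away from an rt task, here is a
small sketch of the arithmetic. It assumes the runtime value from the
patch (950000 us, or -1 for unlimited) and a 1 second period; the function
name is made up for illustration, and treating every negative runtime as
unlimited is an assumption of the sketch.

```c
#include <assert.h>

/*
 * Microseconds per period that rt tasks are NOT allowed to use, i.e. the
 * slice left for everything else. Returns 0 when throttling is off
 * (negative runtime) or when the runtime covers the whole period.
 */
static long non_rt_reserve_us(long rt_runtime_us, long rt_period_us)
{
    assert(rt_period_us > 0);
    if (rt_runtime_us < 0 || rt_runtime_us >= rt_period_us)
        return 0;
    return rt_period_us - rt_runtime_us;
}
```

With the old default, non_rt_reserve_us(950000, 1000000) gives 50000, so
an rt task that never sleeps can lose the CPU for up to 50 ms in every
second. With -1 the reserve is 0 and the task keeps the CPU until it
blocks or is preempted by a higher priority rt task.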

2008-08-28 14:15:26

by Steven Rostedt

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tue, Aug 19, 2008 at 01:05:57PM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > Disable bandwidth control by default.
> >
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > ---
> > kernel/sched.c | 17 +++++++----------
> > 1 file changed, 7 insertions(+), 10 deletions(-)
> >
> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -824,9 +824,9 @@ static __read_mostly int scheduler_runni
> >
> > /*
> > * part of the period that we allow rt tasks to run in us.
> > - * default: 0.95s
> > + * default: inf
> > */
> > -int sysctl_sched_rt_runtime = 950000;
> > +int sysctl_sched_rt_runtime = -1;
>
> The fixes look good to me, but this enabling of infinite RT task lockups
> is not an improvement.
>
> The thing is, i got far more bugreports about locked up RT tasks where
> the lockup was unintentional, than real bugreports about anyone
> _intending_ for the whole box to come to a grinding halt because a
> high-prio RT task is monopolizing the CPU.
>
> In fact there's only been this artificial test so far.
>
> So could you please just increase the chunking to 10 seconds or so, from
> the current 1 second? Anyone locking up the system for more than 10
> seconds via an RT task has to deal with many other issues already.
>
> I.e. keep the system borderline debuggable (up to 10 seconds delays are
> _not_ nice so people will notice) - but it's still a marked improvement
> from completely locked up desktops.
>
> And those who really need longer than 10 second periods can set it
> higher, or even (if they want to live dangerously or run POSIX
> conformance tests) make it infinite (set it to -1) - and will have to
> deal with other things like the softlockup watchdog as well.

My biggest concern about adding a limit to FIFO is that an RT developer
would spend weeks trying to debug their system wondering why their
planned CPU RT hog is being preempted by a non-RT task.

For this, if this time limit does kick in, we should at the very least
print something out to let the user know this happened. After all, this
is more of a safety net anyway, and if we are hitting the limit, the
user should be notified. Perhaps even tell the user that if this
behaviour is expected, to up the sysctl <var> by more.
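
A rough userspace sketch of that idea, to show the shape of a
notify-once check. This is not the kernel's accounting code; the struct,
the function name and the message text are all hypothetical, and a
negative runtime is assumed to mean unlimited.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-period budget for rt execution time. */
struct rt_budget {
    long runtime_us;  /* allowed rt time per period; negative = unlimited */
    long used_us;     /* rt time consumed so far in the current period */
    bool warned;      /* set after the first notification was printed */
};

/*
 * Charge delta_us of rt execution against the budget. Returns true when
 * the budget is exceeded and rt tasks would be throttled. The message is
 * printed only the first time, so a recurring overrun does not flood
 * the log.
 */
static bool rt_budget_charge(struct rt_budget *b, long delta_us)
{
    if (b->runtime_us < 0)
        return false;
    b->used_us += delta_us;
    if (b->used_us <= b->runtime_us)
        return false;
    if (!b->warned) {
        b->warned = true;
        fprintf(stderr, "rt throttling activated; raise the rt runtime "
                        "sysctl if this behaviour is expected\n");
    }
    return true;
}
```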

Peter, another question. Is this limit for a single RT task running, or
all RT tasks? I'm assuming here that it is a single RT task. If you have
20 RT tasks all running, would this let non RT tasks in? In that case,
this could be an even bigger issue.

Thanks,

-- Steve

2008-08-28 14:31:18

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Steven Rostedt <[email protected]> wrote:

> For this, if this time limit does kick in, we should at the very least
> print something out to let the user know this happened. After all,
> this is more of a safety net anyway, and if we are hitting the limit,
> the user should be notified. Perhaps even tell the user that if this
> behaviour is expected, to up the sysctl <var> by more.

yeah, agreed, this is a reasonable suggestion. Peter, do you agree?

Ingo

2008-08-28 14:36:53

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Friday 29 August 2008 00:30, Ingo Molnar wrote:
> * Steven Rostedt <[email protected]> wrote:
> > For this, if this time limit does kick in, we should at the very least
> > print something out to let the user know this happened. After all,
> > this is more of a safety net anyway, and if we are hitting the limit,
> > the user should be notified. Perhaps even tell the user that if this
> > behaviour is expected, to up the sysctl <var> by more.
>
> yeah, agreed, this is a reasonable suggestion. Peter, do you agree?

Seems reasonable. But I still think it should be disabled by default
(it might not get caught in testing for example).

2008-08-28 15:12:52

by Steven Rostedt

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


On Fri, 29 Aug 2008, Nick Piggin wrote:

> On Friday 29 August 2008 00:30, Ingo Molnar wrote:
> > * Steven Rostedt <[email protected]> wrote:
> > > For this, if this time limit does kick in, we should at the very least
> > > print something out to let the user know this happened. After all,
> > > this is more of a safety net anyway, and if we are hitting the limit,
> > > the user should be notified. Perhaps even tell the user that if this
> > > behaviour is expected, to up the sysctl <var> by more.
> >
> > yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
>
> Seems reasonable. But I still think it should be disabled by default
> (it might not get caught in testing for example).

Perhaps we should default it to 1sec, that way it would be hit more often,
and educate the users of this new feature.

-- Steve

2008-08-28 15:34:56

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Friday 29 August 2008 01:12, Steven Rostedt wrote:
> On Fri, 29 Aug 2008, Nick Piggin wrote:
> > On Friday 29 August 2008 00:30, Ingo Molnar wrote:
> > > * Steven Rostedt <[email protected]> wrote:
> > > > For this, if this time limit does kick in, we should at the very
> > > > least print something out to let the user know this happened. After
> > > > all, this is more of a safety net anyway, and if we are hitting the
> > > > limit, the user should be notified. Perhaps even tell the user that
> > > > if this behaviour is expected, to up the sysctl <var> by more.
> > >
> > > yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
> >
> > Seems reasonable. But I still think it should be disabled by default
> > (it might not get caught in testing for example).
>
> Perhaps we should default it to 1sec, that way it would be hit more often,
> and educate the users of this new feature.

There is only one sane default, as far as I can see.

Before anybody attacks me again because I haven't got my brain together or
am an annoying standards nitpicker:

I'm very well aware of the consequences of unlimited hogging of the CPU.
And I know exactly why people might want rt throttling. But just think for
a minute about the _negative_ consequences of changing the API, and remember
that it is close to the #1 rule of Linux development not to break user API.

And put it this way: the sysctl is right there. Any distro that cares about
this problem will probably find this thread as #1 hit and work out how to
enable the sysctl and break the API if they are happy to do that. On the
flip side, not every application development or deployment is even going to
know about this, and it may not be trivial to catch in testing, so it could
cause failures in the field.
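
For what it is worth, an application that depends on unthrottled
SCHED_FIFO could check the setting itself at startup. Below is a minimal
sketch, assuming the value is exposed as a decimal number in
/proc/sys/kernel/sched_rt_runtime_us and that a negative value means
unlimited. The enum and function names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

enum rt_throttle {
    RT_THROTTLE_UNKNOWN = -1,  /* value missing or unparsable */
    RT_THROTTLE_OFF = 0,       /* negative runtime: no limit on rt tasks */
    RT_THROTTLE_ON = 1         /* rt tasks are limited per period */
};

/* Interpret the text of the runtime sysctl. */
static enum rt_throttle parse_rt_runtime(const char *text)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || errno != 0)
        return RT_THROTTLE_UNKNOWN;
    return v < 0 ? RT_THROTTLE_OFF : RT_THROTTLE_ON;
}

/* Read the sysctl file at path and report the throttling state. */
static enum rt_throttle read_rt_throttle(const char *path)
{
    char buf[32];
    enum rt_throttle state = RT_THROTTLE_UNKNOWN;
    FILE *f = fopen(path, "r");

    if (!f)
        return state;
    if (fgets(buf, sizeof buf, f))
        state = parse_rt_runtime(buf);
    fclose(f);
    return state;
}
```

A caller would pass "/proc/sys/kernel/sched_rt_runtime_us" to
read_rt_throttle() and refuse to start, or at least log loudly, when the
result is RT_THROTTLE_ON. That only helps applications whose authors know
the knob exists, which is the concern above.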

2008-08-28 15:50:18

by Steven Rostedt

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


On Fri, 29 Aug 2008, Nick Piggin wrote:

> On Friday 29 August 2008 01:12, Steven Rostedt wrote:
> > On Fri, 29 Aug 2008, Nick Piggin wrote:
> > > On Friday 29 August 2008 00:30, Ingo Molnar wrote:
> > > > * Steven Rostedt <[email protected]> wrote:
> > > > > For this, if this time limit does kick in, we should at the very
> > > > > least print something out to let the user know this happened. After
> > > > > all, this is more of a safety net anyway, and if we are hitting the
> > > > > limit, the user should be notified. Perhaps even tell the user that
> > > > > if this behaviour is expected, to up the sysctl <var> by more.
> > > >
> > > > yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
> > >
> > > Seems reasonable. But I still think it should be disabled by default
> > > (it might not get caught in testing for example).
> >
> > Perhaps we should default it to 1sec, that way it would be hit more often,
> > and educate the users of this new feature.
>
> There is only one sane default, as far as I can see.
>
> Before anybody attacks me again because I haven't got my brain together or
> am an annoying standards nitpicker:
>
> I'm very well aware of the consequences of unlimited hogging of the CPU.
> And I know exactly why people might want rt throttling. But just think for
> a minute about the _negative_ consequences of changing the API, and remember
> that it is close to the #1 rule of Linux development not to break user API.
>
> And put it this way: the sysctl is right there. Any distro that cares about
> this problem will probably find this thread as #1 hit and work out how to
> enable the sysctl and break the API if they are happy to do that. On the
> flip side, not every application development or deployment is even going to
> know about this, and it may not be trivial to catch in testing, so it could
> cause failures in the field.
>

The issue here is where to place the policy of protecting the user. Is it
in the kernel, or is it up to the distro?

I've always thought that the policy settings belong in the distro, and the
kernel should never enforce a policy (by setting this as default, it is
enforcing a policy, even though an RT user can change it).

I've recently been told that the kernel has, of late, indeed been
starting to set policies, with protection of memory and such. If this is
the case, that the kernel is the place to implement policy, then the
"sane" default belongs there. If the distro is the place to instill
policy, then that is the place to put the "sane" default.

Basically, I'm not in a position to say where Linux should place the
default policies (distro or kernel). I've always thought the kernel should
be bare bones, allowing the distros to do all the policy settings, and
those that compile and build their own kernels/distros do so at their own
risks. But if this is no longer the case, then who am I to argue.

I guess this decision belongs to those above (Linus, Andrew)?

-- Steve

2008-08-28 16:05:34

by Peter Zijlstra

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thu, 2008-08-28 at 10:15 -0400, Steven Rostedt wrote:

> My biggest concern about adding a limit to FIFO is that an RT developer
> would spend weeks trying to debug their system wondering why their
> planned CPU RT hog, is being preempted by a non-RT task.
>
> For this, if this time limit does kick in, we should at the very least
> print something out to let the user know this happened. After all, this
> is more of a safety net anyway, and if we are hitting the limit, the
> user should be notified. Perhaps even tell the user that if this
> behaviour is expected, to up the sysctl <var> by more.

Should be easy enough to do -

> Peter, another question. Is this limit for a single RT task running, or
> all RT tasks. I'm assuming here that it is a single RT task. If you have
> 20 RT tasks all running, would this let non RT tasks in? In that case,
> this could be an even bigger issue.

No, it's not per task. It's per group (and trivially the !group case is one
group).

All this bandwidth code comes from RT group scheduling. We do that by
assigning a bandwidth to each group so that within that bandwidth each
group can use RT tasks and have them behave like they should.

I don't fully agree with the statement that the most important thing for
SCHED_FIFO is to run as long as you want.

The most important thing SCHED_FIFO brings us are deterministic
scheduling rules. And RT group scheduling maintains that determinism by
using a constant bandwidth assignment.

Now the thing that we've been bickering about - bandwidth limits on the
root group, which just fell out of the whole ordeal due to symmetry.

On the one hand, a program that ran deterministic will still run
deterministically at n% (although of course, just like running on less
powerful hardware, you could miss deadlines you previously did not). On
the other hand, people might not expect that.

Having a lower than 100% bandwidth limit by default gives a safer
environment because it avoids total starvation, nor does it take away
determinism [*].

It does however bring the risk of surprising a few folks.

[*] - there is some added jitter due to the throttling logic, and since
the default period might not align nicely with actual deadlines it's not
perfect. An EDF based scheduler with <100% bandwidth caps would do
better.

Other scheduling classes have been mentioned... I've been on the point
of writing SCHED_ISO, a bandwidth throttled SCHED_FIFO that doesn't
require root privileges and comes with say a 10% bandwidth limit.

Doing that should not be too hard - it will just add more code and a
bigger configuration space.
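
The per-group accounting described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel's implementation: the struct and function names are hypothetical, a single CPU is assumed, and the RT tasks are assumed to want the CPU continuously. The 950000/1000000 figures used in the usage note correspond to the 0.95s runtime mentioned in the patch and an assumed 1s period.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one RT group's budget; not the kernel's data structure. */
struct rt_budget {
    int64_t runtime_us; /* RT time allowed per period; negative means unlimited */
    int64_t period_us;  /* length of one accounting period */
};

/*
 * Microseconds that the group's RT tasks, taken together, may run during
 * wall_us of wall-clock time. The budget is shared by every task in the
 * group, so the number of tasks does not appear in the calculation.
 */
int64_t rt_time_granted(const struct rt_budget *b, int64_t wall_us)
{
    assert(b->period_us > 0 && wall_us >= 0);

    /* Throttling disabled, or the limit does not actually restrict anything. */
    if (b->runtime_us < 0 || b->runtime_us >= b->period_us)
        return wall_us;

    int64_t full_periods = wall_us / b->period_us;
    int64_t tail_us = wall_us % b->period_us;
    /* In the last, partial period the group runs until its budget is spent. */
    int64_t tail_granted = tail_us < b->runtime_us ? tail_us : b->runtime_us;

    return full_periods * b->runtime_us + tail_granted;
}
```

With a budget of { 950000, 1000000 }, rt_time_granted() over 10000000 us of wall time yields 9500000 us, whether that time is consumed by one SCHED_FIFO task or shared among a hundred. A runtime of -1 returns the full wall time, which is the behaviour the patch proposes as the default.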


2008-08-28 16:15:56

by Steven Rostedt

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default



On Thu, 28 Aug 2008, Peter Zijlstra wrote:

> On Thu, 2008-08-28 at 10:15 -0400, Steven Rostedt wrote:
>
> > My biggest concern about adding a limit to FIFO is that an RT developer
> > would spend weeks trying to debug their system wondering why their
> > planned CPU RT hog, is being preempted by a non-RT task.
> >
> > For this, if this time limit does kick in, we should at the very least
> > print something out to let the user know this happened. After all, this
> > is more of a safety net anyway, and if we are hitting the limit, the
> > user should be notified. Perhaps even tell the user that if this
> > behaviour is expected, to up the sysctl <var> by more.
>
> Should be easy enough to do -
>
> > Peter, another question. Is this limit for a single RT task running, or
> > all RT tasks. I'm assuming here that it is a single RT task. If you have
> > 20 RT tasks all running, would this let non RT tasks in? In that case,
> > this could be an even bigger issue.
>
> No, it's not per task. It's per group (and trivially the !group case is one
> group).

Does this mean, if I have 100 RT tasks that will together run for 10 secs,
they will only run for 9.5 secs?

This looks like an even bigger issue. Now we don't have one RT FIFO CPU
hog, we are now hitting 100 RT FIFO tasks that try to get a bunch done in
10 secs.

-- Steve

2008-08-28 16:19:20

by Max Krasnyansky

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

Andi Kleen wrote:
> On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
>> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
>>>> Even if the system has multiple CPUs, and even if just a single CPU is
>>>> fully utilized by an RT task, without the rt-limit the system will still
>>>> lock up in practice due to various other factors: workqueues and tasks
>>>> being 'stuck' on CPUs that host an RT hog.
>>> The load balancer will not notice that a particular CPU is busy
>>> with real time tasks?
>> Not currently, working on that though.
>
> I wonder if it would make sense to break affinities in extreme case?
> With that even the workqueues would work again.

Please let's not break affinity :).

I'm going to submit patches (soonish) that convert drivers/etc to use
cancel_work_sync()/flush_work() instead of flush_scheduled_work().
That takes care of the
"machine getting stuck because workqueue thread is starved"
case.

Max

2008-08-28 16:26:22

by Ingo Molnar

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Max Krasnyansky <[email protected]> wrote:

> Andi Kleen wrote:
>> On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
>>> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
>>>>> Even if the system has multiple CPUs, and even if just a single CPU is
>>>>> fully utilized by an RT task, without the rt-limit the system will still
>>>>> lock up in practice due to various other factors: workqueues and tasks
>>>>> being 'stuck' on CPUs that host an RT hog.
>>>> The load balancer will not notice that a particular CPU is busy
>>>> with real time tasks?
>>> Not currently, working on that though.
>>
>> I wonder if it would make sense to break affinities in extreme case?
>> With that even the workqueues would work again.
>
> Please let's not break affinity :).

correct, breaking affinity is a rather stupid idea.

Ingo

2008-08-28 16:29:49

by Peter Zijlstra

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thu, 2008-08-28 at 12:15 -0400, Steven Rostedt wrote:
>
> On Thu, 28 Aug 2008, Peter Zijlstra wrote:
>
> > On Thu, 2008-08-28 at 10:15 -0400, Steven Rostedt wrote:
> >
> > > My biggest concern about adding a limit to FIFO is that an RT developer
> > > would spend weeks trying to debug their system wondering why their
> > > planned CPU RT hog, is being preempted by a non-RT task.
> > >
> > > For this, if this time limit does kick in, we should at the very least
> > > print something out to let the user know this happened. After all, this
> > > is more of a safety net anyway, and if we are hitting the limit, the
> > > user should be notified. Perhaps even tell the user that if this
> > > behaviour is expected, to up the sysctl <var> by more.
> >
> > Should be easy enough to do -
> >
> > > Peter, another question. Is this limit for a single RT task running, or
> > > all RT tasks. I'm assuming here that it is a single RT task. If you have
> > > 20 RT tasks all running, would this let non RT tasks in? In that case,
> > > this could be an even bigger issue.
> >
> > No, it's not per task. It's per group (and trivially the !group case is one
> > group).
>
> Does this mean, if I have 100 RT tasks that will together run for 10 secs,
> they will only run for 9.5 secs?
>
> This looks like an even bigger issue. Now we don't have one RT FIFO CPU
> hog, we are now hitting 100 RT FIFO tasks that try to get a bunch done in
> 10 secs.

Yes.

But say you were doing rate monotonic scheduling (as is not uncommonly
done on top of SCHED_FIFO); then you could not get 100% CPU utilisation
anyway, as RMS has a ~69% utilisation bound.
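
For reference, the ~69% figure appears to be the classic Liu and Layland sufficient schedulability bound for rate monotonic scheduling. A short worked form, assuming n independent periodic tasks on one CPU with deadlines equal to periods (the symbols C_i, T_i and U are introduced here for illustration and do not appear in the thread):

```latex
% C_i = worst-case execution time of task i, T_i = its period.
% A task set is guaranteed schedulable under RMS if its utilisation U satisfies:
\[
  U \;=\; \sum_{i=1}^{n} \frac{C_i}{T_i} \;\le\; n\left(2^{1/n} - 1\right)
\]
% The bound decreases with n and converges to ln 2:
\[
  n = 1:\ 1.000, \qquad
  n = 2:\ \approx 0.828, \qquad
  n = 3:\ \approx 0.780, \qquad
  \lim_{n \to \infty} n\left(2^{1/n} - 1\right) = \ln 2 \approx 0.693
\]
```

The bound is sufficient rather than necessary, so some task sets above it are still schedulable; the point being made is only that a 95% cap leaves room for any task set that relies on this guarantee.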


2008-08-28 16:30:51

by Andi Kleen

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thu, Aug 28, 2008 at 06:25:48PM +0200, Ingo Molnar wrote:
>
> * Max Krasnyansky <[email protected]> wrote:
>
> > Andi Kleen wrote:
> >> On Thu, Aug 28, 2008 at 01:19:13PM +0200, Peter Zijlstra wrote:
> >>> On Thu, 2008-08-28 at 13:09 +0200, Andi Kleen wrote:
> >>>>> Even if the system has multiple CPUs, and even if just a single CPU is
> >>>>> fully utilized by an RT task, without the rt-limit the system will still
> >>>>> lock up in practice due to various other factors: workqueues and tasks
> >>>>> being 'stuck' on CPUs that host an RT hog.
> >>>> The load balancer will not notice that a particular CPU is busy
> >>>> with real time tasks?
> >>> Not currently, working on that though.
> >>
> >> I wonder if it would make sense to break affinities in extreme case?
> >> With that even the workqueues would work again.
> >
> > Please let's not break affinity :).
>
> correct, breaking affinity is a rather stupid idea.

Ok let's remove cpu hotunplug then. Probably nobody uses it anyways @)

Seriously, CPU affinity on all non-BP CPUs is currently broken on every
suspend to RAM; doing it in a few more cases when it makes the system
more robust is unlikely to hurt anybody.

-Andi

--
[email protected]

2008-08-28 16:33:34

by Max Krasnyansky

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

Nick Piggin wrote:
> On Friday 29 August 2008 00:30, Ingo Molnar wrote:
>> * Steven Rostedt <[email protected]> wrote:
>>> For this, if this time limit does kick in, we should at the very least
>>> print something out to let the user know this happened. After all,
>>> this is more of a safety net anyway, and if we are hitting the limit,
>>> the user should be notified. Perhaps even tell the user that if this
>>> behaviour is expected, to up the sysctl <var> by more.
>> yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
>
> Seems reasonable. But I still think it should be disabled by default
> (it might not get caught in testing for example).

I cannot believe you guys are still arguing about this and calling each
other stupid/incompetent/braindead and such (not this particular email
but all the stuff before) :)

Seems to me like leaving RT throttling disabled by default is a
reasonable compromise. Several people suggested that and the advantage
is that it does not change the definition of SCHED_FIFO/RR by default.

I personally do not care that much what the default is. If Fedora, for
example, starts enabling it by default I'll still have to change it. So
it's not much different from enabled by default in the kernel.

Max

2008-08-28 17:22:18

by John Kacur

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thu, Aug 28, 2008 at 6:33 PM, Max Krasnyansky <[email protected]> wrote:
> Nick Piggin wrote:
>>
>> On Friday 29 August 2008 00:30, Ingo Molnar wrote:
>>>
>>> * Steven Rostedt <[email protected]> wrote:
>>>>
>>>> For this, if this time limit does kick in, we should at the very least
>>>> print something out to let the user know this happened. After all,
>>>> this is more of a safety net anyway, and if we are hitting the limit,
>>>> the user should be notified. Perhaps even tell the user that if this
>>>> behaviour is expected, to up the sysctl <var> by more.
>>>
>>> yeah, agreed, this is a reasonable suggestion. Peter, do you agree?
>>
>> Seems reasonable. But I still think it should be disabled by default
>> (it might not get caught in testing for example).
>
> I cannot believe you guys are still arguing about this and calling each
> other stupid/incompetent/braindead and such (not this particular email but
> all the stuff before) :)
>
> Seems to me like leaving RT throttling disabled by default is a reasonable
> compromise. Several people suggested that and the advantage is that it does
> not change the definition of SCHED_FIFO/RR by default.
>
> I personally do not care that much what the default is. If Fedora, for
> example, starts enabling it by default I'll still have to change it. So it's
> not much different from enabled by default in the kernel.
>
> Max
>

I'm rather surprised at this whole conversation. I think it is pretty
simple:
1. The kernel should not set policy but provide capabilities.
a.) It would be more appropriate for a distro to set the policy, but
even here, the default policy should match the expectation of what
SCHED_FIFO is and standards such as POSIX unless there is a really
really good reason to show why the standard is wrong. (and I haven't
heard it here)
b.) The fact that it is possible to change the settings is an
excellent feature, but that cannot be used as an argument to change
the default settings to something unexpected. Rather, the feature can
be used to change what the standard default is.

2. SCHED_FIFO doesn't have limitations to it, even if the application
programmer can abuse it. That to me seems to be the whole purpose of
SCHED_FIFO - it does let you do things if you have the proper
privileges that a standard kernel protects against, but if the kernel
sets a limitation on it, then it simply isn't SCHED_FIFO anymore, it's
something else. I really dislike this talk about what a good
application programmer should do anyway, I like that we can be
surprised at human creativity and how things can be used in unexpected
ways, so I don't see why that should be throttled. And this argument
about false kernel lock-ups seems bogus to me too.

John

2008-08-28 17:27:45

by Linus Torvalds

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default



On Thu, 28 Aug 2008, Steven Rostedt wrote:
>
> I've always thought that the policy settings belong in the distro, and the
> kernel should never enforce a policy (by setting this as default, it is
> enforcing a policy, even though an RT user can change it).

The kernel has always done a certain amount of "default policy".

What do you think things like "swappiness" etc are? Or things like
overcommit settings? They're all policies, and there is always a default
one. So in that sense the kernel always has - and fundamentally _must_ -
set some kind of policy.

And the default policy should generally be the one that makes sense for
most people. Quite frankly, if it's an issue where all normal distros
would basically be expected to set a value, then that value should _be_
the default policy, and none of the normal distros should ever need to
worry.

Whether this case is one such, I dunno. Quite frankly, I don't think it's
even _nearly_ important enough to get this kind of noise.

Linus

2008-08-28 18:05:17

by Steven Rostedt

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


On Thu, 28 Aug 2008, Linus Torvalds wrote:

>
>
> On Thu, 28 Aug 2008, Steven Rostedt wrote:
> >
> > I've always thought that the policy settings belong in the distro, and the
> > kernel should never enforce a policy (by setting this as default, it is
> > enforcing a policy, even though an RT user can change it).
>
> The kernel has always done a certain amount of "default policy".
>
> What do you think things like "swappiness" etc are? Or things like
> overcommit settings? They're all policies, and there is always a default
> one. So in that sense the kernel always has - and fundamentally _must_ -
> set some kind of policy.
>
> And the default policy should generally be the one that makes sense for
> most people. Quite frankly, if it's an issue where all normal distros
> would basically be expected to set a value, then that value should _be_
> the default policy, and none of the normal distros should ever need to
> worry.
>
> Whether this case is one such, I dunno. Quite frankly, I don't think it's
> even _nearly_ important enough to get this kind of noise.

I guess the reason that this is getting so much noise over other default
policies, is that this default policy is changing a well known definition:
The meaning of FIFO.

By making the default policy limit the time an RT task runs, we have, in
essence, changed a user API. Applications that expect to be able to run
uninterrupted by SCHED_OTHER tasks, will now break.

No one is arguing that this new feature is not useful. The argument is,
should the kernel set the default policy of an old well known scheduling
policy to something different than what is expected?

Distros set SE Linux on by default, should the kernel do that too?

-- Steve

2008-08-28 18:10:45

by Darren Hart

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thu, Aug 28, 2008 at 11:04 AM, Steven Rostedt <[email protected]> wrote:
>
> On Thu, 28 Aug 2008, Linus Torvalds wrote:
>
>>
>>
>> On Thu, 28 Aug 2008, Steven Rostedt wrote:
>> >
>> > I've always thought that the policy settings belong in the distro, and the
>> > kernel should never enforce a policy (by setting this as default, it is
>> > enforcing a policy, even though an RT user can change it).
>>
>> The kernel has always done a certain amount of "default policy".
>>
>> What do you think things like "swappiness" etc are? Or things like
>> overcommit settings? They're all policies, and there is always a default
>> one. So in that sense the kernel always has - and fundamentally _must_ -
>> set some kind of policy.
>>
>> And the default policy should generally be the one that makes sense for
>> most people. Quite frankly, if it's an issue where all normal distros
>> would basically be expected to set a value, then that value should _be_
>> the default policy, and none of the normal distros should ever need to
>> worry.
>>
>> Whether this case is one such, I dunno. Quite frankly, I don't think it's
>> even _nearly_ important enough to get this kind of noise.
>
> I guess the reason that this is getting so much noise over other default
> policies, is that this default policy is changing a well known definition:
> The meaning of FIFO.
>
> By making the default policy limit the time an RT task runs, we have, in
> essence, changed a user API. Applications that expect to be able to run
> uninterrupted by SCHED_OTHER tasks, will now break.
>
> No one is arguing that this new feature is not useful. The argument is,
> should the kernel set the default policy of an old well known scheduling
> policy to something different than what is expected?
>
> Distros set SE Linux on by default, should the kernel do that too?
>
> -- Steve
>

A lot of people I have an immense amount of respect for with vastly differing
opinions. There was mention of a user poll so I'll share my .000000002 USD
here.

I have accepted in my dealings with real-time that it is a special programming
paradigm. The developer has much greater control and must exercise it
responsibly. From this, I have accepted that I can bring my system to its
knees rather easily if I'm not careful. I agree with Nick and Max that this
default behavior should be preserved. I like Steven's suggestion of disabling
the throttling in the upstream kernel, and leaving it to the distros to
safeguard the user from themselves should they choose to. There is already
some precedent for this with the updated default kernel thread priorities and
realtime group and pam limits.conf settings in Red Hat's MRG product. When
doing real-time application development, I use various mechanisms to ensure
debugability, and it varies based on what I'm doing and how I access the
machine. Sometimes I need a special watchdog application, sometimes I need to
boost all the kernel threads related to networking or serial consoles and the
respective login apps (ssh, agetty, etc.). It seems reasonable to consider
this throttling as another _optional_ tool in my debugging toolkit.

--
Darren Hart

2008-08-28 18:16:39

by Mark Hounschell

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

Linus Torvalds wrote:
>
> On Thu, 28 Aug 2008, Steven Rostedt wrote:
>> I've always thought that the policy settings belong in the distro, and the
>> kernel should never enforce a policy (by setting this as default, it is
>> enforcing a policy, even though an RT user can change it).
>
> The kernel has always done a certain amount of "default policy".
>
> What do you think things like "swappiness" etc are? Or things like
> overcommit settings? They're all policies, and there is always a default
> one. So in that sense the kernel always has - and fundamentally _must_ -
> set some kind of policy.
>
> And the default policy should generally be the one that makes sense for
> most people. Quite frankly, if it's an issue where all normal distros
> would basically be expected to set a value, then that value should _be_
> the default policy, and none of the normal distros should ever need to
> worry.
>
> Whether this case is one such, I dunno. Quite frankly, I don't think it's
> even _nearly_ important enough to get this kind of noise.
>
> Linus

More and more are wanting and now finding the Linux kernel to be more
RT capable. I seem to remember way back you saying it was one thing
you didn't really care much about one way or the other. That's OK. But,
you _are_ the man. Put an end to this. Are you going to allow the long
understood meaning of SCHED_FIFO to change in the Linux kernel
just to protect a few _supposedly_ bad programmers???

Regards
Mark

2008-08-28 18:43:32

by Linus Torvalds

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default



On Thu, 28 Aug 2008, Mark Hounschell wrote:
>
> More and more are wanting and now finding the Linux kernel to be more
> RT capable. I seem to remember way back you saying it was one thing you didn't
> really care much about one way or the other. That's OK. But, you _are_ the man.

The thing is, the reason I dislike RT is that so many people have such
different understandings of what RT means.

Quite frankly, I think that the people who are complaining (like you)
think that RT means "hard realtime". You think about literally specialized
devices.

A lot of _other_ people think that RT means "good audio latency", where it
really is a lot softer.

And neither camp seems to ever admit that they are just a small camp, and
that the other camp exists or is even valid.

And I'm not really interested. Quite frankly, I suspect the "we want to
run something like pulseaudio with RT priorities" camp is the more common
one, and in that context I understand limiting SCHED_FIFO sounds perfectly
understandable.

As to your

> "just to protect a few _supposedly_ bad programmers???"

quite frankly, most programmers aren't "supposedly bad". And if you think
that the hard-RT "real man" programmers aren't bad, I really have nothing
to say.

Linus

2008-08-28 18:53:22

by Steven Rostedt

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


On Thu, 28 Aug 2008, Linus Torvalds wrote:
>
> And I'm not really interested. Quite frankly, I suspect the "we want to
> run something like pulseaudio with RT priorities" camp is the more common
> one, and in that context I understand limiting SCHED_FIFO sounds perfectly
> understandable.

The fact that it actually limits a SCHED_FIFO task group, over a single
task thread does bother me a little.

But that said, I and others have made our complaints known, and will
forever be documented in the halls of the Internet abyss. Thus, the
verdict has been laid. Seems the default shall be something other than
infinite.

I will now remain silent.

-- Steve

2008-08-28 19:38:46

by Stefani Seibold

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

I started this discussion last week with an apparent bug in the new CFS.

As it turns out, it was not a bug, it was a feature, an (undocumented?)
feature.

In the world of embedded devices and real time programming it is not a
hard job to compile the kernel right for the desired usage and fix the
startup script to use the desired policy.

Getting back the old behaviour would be nice and in my opinion the right
way, because the new one breaks with POSIX. But I have a working
solution and that is for me what matters.

By the way - RT does not mean hard real time. Hard-RT is a marketing phrase.
A given combination of OS and hardware must handle an event in a given
time. That's all.

Thanks for the support.

Regards,
Stefani, the hard RT "real woman" programmer ;-)

On Thursday, 2008-08-28 at 11:42 -0700, Linus Torvalds wrote:
>
> On Thu, 28 Aug 2008, Mark Hounschell wrote:
> >
> > More and more are wanting and now finding the Linux kernel to be more
> > RT capable. I seem to remember way back you saying it was one thing you didn't
> > really care much about one way or the other. That's OK. But, you _are_ the man.
>
> The thing is, the reason I dislike RT is that so many people have such
> different understandings of what RT means.
>
> Quite frankly, I think that the people who are complaining (like you)
> think that RT means "hard realtime". You think about literally specialized
> devices.
>
> A lot of _other_ people think that RT means "good audio latency", where it
> really is a lot softer.
>
> And neither camp seems to ever admit that they are just a small camp, and
> that the other camp exists or is even valid.
>
> And I'm not really interested. Quite frankly, I suspect the "we want to
> run something like pulseaudio with RT priorities" camp is the more common
> one, and in that context I understand limiting SCHED_FIFO sounds perfectly
> understandable.
>
> As to your
>
> > "just to protect a few _supposedly_ bad programmers???"
>
> quite frankly, most programmers aren't "supposedly bad". And if you think
> that the hard-RT "real man" programmers aren't bad, I really have nothing
> to say.
>
> Linus
>

2008-08-28 21:12:17

by Alan

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

> And I'm not really interested. Quite frankly, I suspect the "we want to
> run something like pulseaudio with RT priorities" camp is the more common
> one, and in that context I understand limiting SCHED_FIFO sounds perfectly
> understandable.

Is there actually a reason we can't have two forms of SCHED_FIFO? For
hard RT the existing behaviour is a lot more useful and it is hard to see
how you'd emulate it.

> quite frankly, most programmers aren't "supposedly bad". And if you think
> that the hard-RT "real man" programmers aren't bad, I really have nothing
> to say.

"real man" programmers stare at the code in Zen contemplation and debug
by powercycling - that's one thing even hard RT processes can't beat.

Alan

2008-08-29 07:57:09

by Mike Galbraith

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Thu, 2008-08-28 at 14:53 -0400, Steven Rostedt wrote:
> On Thu, 28 Aug 2008, Linus Torvalds wrote:
> >
> > And I'm not really interested. Quite frankly, I suspect the "we want to
> > run something like pulseaudio with RT priorities" camp is the more common
> > one, and in that context I understand limiting SCHED_FIFO sounds perfectly
> > understandable.
>
> The fact that it actually limits a SCHED_FIFO task group, over a single
> task thread does bother me a little.

It bothers me some too. You have to patch/re-compile the kernel if you
need to turn it off and don't have SCHED_DEBUG enabled (not free).

I tripped over this recently while regression testing. I didn't expect
a gaggle of SCHED_RR tasks to be throttled on an otherwise idle box.
Hitting that perturbed test results in an unexpected manner, and sent me
off on a tangent.

-Mike

2008-08-29 08:06:35

by Peter Zijlstra

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Fri, 2008-08-29 at 09:56 +0200, Mike Galbraith wrote:
> On Thu, 2008-08-28 at 14:53 -0400, Steven Rostedt wrote:
> > On Thu, 28 Aug 2008, Linus Torvalds wrote:
> > >
> > > And I'm not really interested. Quite frankly, I suspect the "we want to
> > > run something like pulseaudio with RT priorities" camp is the more common
> > > one, and in that context I understand limiting SCHED_FIFO sounds perfectly
> > > understandable.
> >
> > The fact that it actually limits a SCHED_FIFO task group, over a single
> > task thread does bother me a little.
>
> It bothers me some too. You have to patch/re-compile the kernel if you
> need to turn it off and don't have SCHED_DEBUG enabled (not free).

/proc/sys/kernel/sched_rt_{runtime,period}_us don't require SCHED_DEBUG.
If they are in any way non-functional on SCHED_DEBUG=n then that's a
clear bug.
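
As an illustration of how an application might inspect these two knobs at start-up, here is a minimal user-space sketch. The helper names are hypothetical and error handling is deliberately simple; the only things taken from the thread are the two /proc paths and the convention, from the patch, that a runtime of -1 means no limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse sysctl file contents such as "950000\n". Returns 0 on success, -1 on bad input. */
int parse_sysctl_long(const char *text, long *out)
{
    char *end;

    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || errno == ERANGE)
        return -1;
    /* Allow trailing whitespace (the kernel appends a newline), nothing else. */
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;
    if (*end != '\0')
        return -1;
    *out = value;
    return 0;
}

/* Read a single integer from a sysctl file. Returns 0 on success, -1 on failure. */
int read_sysctl_long(const char *path, long *out)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line ? parse_sysctl_long(buf, out) : -1;
}

/* Percentage of each period available to RT tasks; 100 when throttling is off. */
double rt_share_percent(long runtime_us, long period_us)
{
    assert(period_us > 0);
    if (runtime_us < 0 || runtime_us >= period_us)
        return 100.0;
    return 100.0 * (double)runtime_us / (double)period_us;
}
```

A program that depends on unthrottled SCHED_FIFO could call read_sysctl_long() on /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us and warn if rt_share_percent() reports less than 100, rather than discovering the limit in the field.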

2008-08-29 08:47:51

by Mike Galbraith

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Fri, 2008-08-29 at 10:06 +0200, Peter Zijlstra wrote:
> On Fri, 2008-08-29 at 09:56 +0200, Mike Galbraith wrote:
> >
> > It bothers me some too. You have to patch/re-compile the kernel if you
> > need to turn it off and don't have SCHED_DEBUG enabled (not free).
>
> /proc/sys/kernel/sched_rt_{runtime,period}_us don't require SCHED_DEBUG.
> If they are in any way non-functional on SCHED_DEBUG=n then that's a
> clear bug.

Gee, you're right. I guess my eyeballs didn't want to see them without
their friends.

-Mike

2008-08-30 06:33:42

by Nick Piggin

Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Friday 29 August 2008 03:26, Linus Torvalds wrote:
> On Thu, 28 Aug 2008, Steven Rostedt wrote:
> > I've always thought that the policy settings belong in the distro, and
> > the kernel should never enforce a policy (by setting this as default, it
> > is enforcing a policy, even though an RT user can change it).
>
> The kernel has always done a certain amount of "default policy".
>
> What do you think things like "swappiness" etc are? Or things like
> overcommit settings? They're all policies, and there is always a default
> one. So in that sense the kernel always has - and fundamentally _must_ -
> set some kind of policy.

There is a difference. You *have* to pick some value for those things.
The settings can't necessarily be called correct or incorrect.

The default rt sched policy is definitely "broken" in that it very clearly
changes our previous behaviour, documentation, and what other systems do.

You could say that "realtime" in general is not really a single accepted
definition, but *SCHED_FIFO* and *SCHED_RR* in particular do have a well
defined, simple, and widely accepted definition that is undeniably changed
by this "policy".

Given that a) we can easily introduce new SCHED_xxx policies to implement
the new behaviour, and b) there are quite a few users of this API in this
thread who are concerned about the change, I think it is wisest just to
revert to our old behaviour.

I thought the rule of thumb is "if in doubt, we don't break user APIs".
It's funny that nobody has really answered any of my points of concern.

Anyway, I won't keep harping on about it.


> And the default policy should generally be the one that makes sense for
> most people. Quite frankly, if it's an issue where all normal distros
> would basically be expected to set a value, then that value should _be_
> the default policy, and none of the normal distros should ever need to
> worry.
>
> Whether this case is one such, I dunno. Quite frankly, I don't think it's
> even _nearly_ important enough to get this kind of noise.

That's cause you don't care about rt that much. You do care about back
compatibility though so I thought you'd be more interested. Anyway, I won't
post any more.