2005-01-03 14:04:06

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, Dec 29, 2004 at 09:43:22PM -0500, Lee Revell wrote:
> The realtime LSM has been previously explained on this list. Its
> function is to allow selected nonroot users to run RT tasks. The most
> common application is low latency audio with JACK, http://jackit.sf.net.
>
> Several people have reported that 2.6.10 is the best kernel yet for
> audio latency, see
> http://ccrma-mail.stanford.edu/pipermail/planetccrma/2004-December/007341.html. If the realtime LSM were merged, then this would be the last step to making low latency audio work well with the stock kernel.
>
> We (the authors and the Linux audio community) would like to request its
> inclusion in the next -mm release, with the eventual goal of having it
> in mainline.
>
> This is identical to the last version Jack O'Quin posted (but didn't cc:
> Andrew, or make clear that we would like this added to -mm), so I
> preserved his Signed-Off-By.

This is far too specialized. An option to the capability LSM to grant
capabilities to certain uids/gids sounds like the better choice - and
would also allow getting rid of the magic hugetlb uid horrors.


2005-01-03 14:15:43

by Arjan van de Ven

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Mon, 2005-01-03 at 14:03 +0000, Christoph Hellwig wrote:
> On Wed, Dec 29, 2004 at 09:43:22PM -0500, Lee Revell wrote:
> > The realtime LSM has been previously explained on this list. Its
> > function is to allow selected nonroot users to run RT tasks. The most
> > common application is low latency audio with JACK, http://jackit.sf.net.
> >
> > Several people have reported that 2.6.10 is the best kernel yet for
> > audio latency, see
> > http://ccrma-mail.stanford.edu/pipermail/planetccrma/2004-December/007341.html. If the realtime LSM were merged, then this would be the last step to making low latency audio work well with the stock kernel.
> >
> > We (the authors and the Linux audio community) would like to request its
> > inclusion in the next -mm release, with the eventual goal of having it
> > in mainline.
> >
> > This is identical to the last version Jack O'Quin posted (but didn't cc:
> > Andrew, or make clear that we would like this added to -mm), so I
> > preserved his Signed-Off-By.
>
> This is far too specialized. An option to the capability LSM to grant
> capabilities to certain uids/gids sounds like the better choice - and
> would also allow getting rid of the magic hugetlb uid horrors.
those can go away anyway now that there is an rlimit to achieve the
exact same thing.....
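Arjan's point, that an rlimit can replace the special hugetlb group, can be illustrated from user space. A minimal sketch, assuming the rlimit meant is RLIMIT_MEMLOCK, which caps how much memory an unprivileged process may lock; the helper names `fits_memlock` and `can_lock_bytes` are hypothetical, and the code only compares a size against the limit rather than locking anything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: would locking `bytes` stay under the soft limit in `rl`?
 * (Processes holding CAP_IPC_LOCK are not bound by this limit at all.) */
static bool fits_memlock(size_t bytes, const struct rlimit *rl)
{
    if (rl->rlim_cur == RLIM_INFINITY)
        return true;                        /* no limit configured */
    return (rlim_t)bytes <= rl->rlim_cur;
}

/* Hypothetical helper: read this process's RLIMIT_MEMLOCK and apply the
 * check above. Returns false if the limit cannot be read. */
static bool can_lock_bytes(size_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return false;
    return fits_memlock(bytes, &rl);
}
```

An administrator raises the limit per user (for example through PAM) instead of maintaining a dedicated group.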

I can see the point of making an rlimit-like thing instead, for both the
nice levels allowed and maybe the "can do rt" bit.


2005-01-04 18:17:09

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Mon, 2005-01-03 at 14:03 +0000, Christoph Hellwig wrote:
> On Wed, Dec 29, 2004 at 09:43:22PM -0500, Lee Revell wrote:
> > The realtime LSM has been previously explained on this list. Its
> > function is to allow selected nonroot users to run RT tasks. The most
> > common application is low latency audio with JACK, http://jackit.sf.net.
> >
> > Several people have reported that 2.6.10 is the best kernel yet for
> > audio latency, see
> > http://ccrma-mail.stanford.edu/pipermail/planetccrma/2004-December/007341.html. If the realtime LSM were merged, then this would be the last step to making low latency audio work well with the stock kernel.
> >
> > We (the authors and the Linux audio community) would like to request its
> > inclusion in the next -mm release, with the eventual goal of having it
> > in mainline.
> >
> > This is identical to the last version Jack O'Quin posted (but didn't cc:
> > Andrew, or make clear that we would like this added to -mm), so I
> > preserved his Signed-Off-By.
>
> This is far too specialized. An option to the capability LSM to grant
> capabilities to certain uids/gids sounds like the better choice - and
> would also allow getting rid of the magic hugetlb uid horrors.
>

Got a patch? Code talks, BS walks. This is working perfectly, right
now, and is being used by thousands of Linux audio users.

Lee

2005-01-04 18:20:20

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 04, 2005 at 01:16:54PM -0500, Lee Revell wrote:
> Got a patch? Code talks, BS walks. This is working perfectly, right
> now, and is being used by thousands of Linux audio users.

Which still doesn't mean it's the right design. And no, I don't need the
feature so I won't write it. If you want a certain feature it's up to
you to implement it in a way that's considered mergeable.

2005-01-04 18:55:09

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Christoph Hellwig <[email protected]> writes:

> Which still doesn't mean it's the right design. And no, I don't
> need the feature so I won't write it. If you want a certain feature
> it's up to you to implement it in a way that's considered mergeable.

Which is what I have done. I worked on it because no "real" kernel
developer seemed willing to solve it. Having worked on other kernels
in an "earlier lifetime", I have *no* desire to do that any more. I
would much rather write audio software.

But, the lack of this feature has been a continual impediment for
years now. It affects not just me, but most other serious Linux audio
developers and many of our users. We need a simple way for users to
configure a Digital Audio Workstation without having to run large,
complex, insecure audio applications as `root'. Our competition runs
on Windows and Mac systems where no such configuration is needed.

Statements of the form "had I cared enough to do something about this
problem, I would have implemented it differently" are not much help.
This patch is small and clean. It meshes with existing kernel LSM
mechanisms. It solves a real problem affecting many Linux desktop
users.

I respectfully request that it be accepted for inclusion in 2.6.11.
--
joq

2005-01-04 18:57:32

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-04 at 18:20 +0000, Christoph Hellwig wrote:
> On Tue, Jan 04, 2005 at 01:16:54PM -0500, Lee Revell wrote:
> > Got a patch? Code talks, BS walks. This is working perfectly, right
> > now, and is being used by thousands of Linux audio users.
>
> Which still doesn't mean it's the right design. And no, I don't need the
> feature so I won't write it. If you want a certain feature it's up to
> you to implement it in a way that's considered mergeable.
>

Please specify what's wrong with it. So far all your objection amounts
to is "I don't like it".

If you do have anything other than your opinion to back up your
assertion that it's a bad design, you should have raised it months ago
when this was first posted. Now that we have it to a mergeable state
(as far as the people who worked on it are concerned), you want to pop
up and say "Nope, bad design"?

Sorry but last time I checked you were not the ultimate arbiter of good
design on LKML. If you want to shitcan the _only known good, field
tested, working solution_ then you have to have overwhelming technical
arguments. So far I've seen zero.

Lee

2005-01-04 19:00:15

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-04 at 12:55 -0600, Jack O'Quin wrote:
> But, the lack of this feature has been a continual impediment for
> years now. It affects not just me, but most other serious Linux audio
> developers and many of our users. We need a simple way for users to
> configure a Digital Audio Workstation without having to run large,
> complex, insecure audio applications as `root'. Our competition runs
> on Windows and Mac systems where no such configuration is needed.

We could do it the way OSX (our real competition) does if that would
make people happy. They just let any user run RT tasks. Oh wait, but
that's a "broken design", everyone knows that OSX is a joke, no one
would use *that* OS to mix a CD or score a movie. :-)

Lee



2005-01-05 01:09:07

by Alan

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-04 at 18:59, Lee Revell wrote:
> We could do it the way OSX (our real competition) does if that would
> make people happy. They just let any user run RT tasks. Oh wait, but
> that's a "broken design", everyone knows that OSX is a joke, no one
> would use *that* OS to mix a CD or score a movie. :-)

You can do that already, just make everyone root

The problem with uid/gid based hacks is that they get really ugly to
administer really fast. Especially once you have users who need realtime
and hugetlb, and users who need one only.

It would be far cleaner to split CAP_SYS_NICE capability down - which
should cover the real time OS functions nicely. Right now it gives a few
too many rights but that could be fixed easily.
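To see whether a process already holds CAP_SYS_NICE, the capability Alan suggests splitting, one can read the effective capability set from /proc/self/status on Linux. A minimal sketch, assuming the `CapEff:` line holds a hexadecimal bitmask and that CAP_SYS_NICE is capability number 23 as in <linux/capability.h>; the helper names are hypothetical, and the code only inspects the existing capability rather than implementing the proposed split.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP_SYS_NICE_BIT 23   /* value of CAP_SYS_NICE in <linux/capability.h> */

/* Is capability number `cap` present in the bitmask `mask`? Returns 0 or 1. */
static int cap_bit_set(uint64_t mask, int cap)
{
    return (int)((mask >> cap) & 1u);
}

/* Read the effective capability set of the calling process.
 * Returns 0 on success, -1 if /proc is unavailable or the line is missing. */
static int read_cap_eff(uint64_t *out)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    int rc = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            /* strtoull skips the tab that follows the colon */
            *out = strtoull(line + 7, NULL, 16);
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```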

2005-01-05 01:29:33

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, 2005-01-05 at 00:01 +0000, Alan Cox wrote:
> The problem with uid/gid based hacks is that they get really ugly to
> administer really fast. Especially once you have users who need realtime
> and hugetlb, and users who need one only.
>

Sorry, how does hugetlb relate to this?

> It would be far cleaner to split CAP_SYS_NICE capability down - which
> should cover the real time OS functions nicely. Right now it gives a few
> too many rights but that could be fixed easily.
>

We need selected nonroot users to be able to run SCHED_FIFO tasks and
mlock(). It has to be easy to administer. That's it.
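The two operations Lee lists are ordinary POSIX calls. On a stock 2.6.10 kernel an ordinary user is refused SCHED_FIFO, and mlockall() can fail as well (typically EPERM, or ENOMEM once the RLIMIT_MEMLOCK allowance is exceeded). A minimal sketch of what a JACK-style program might do, returning the errno so the caller can report it; the function names and the use of priority 1 are illustrative assumptions, not taken from JACK's source.

```c
#include <errno.h>
#include <sched.h>
#include <sys/mman.h>

/* Put the calling process under SCHED_FIFO at priority `prio`.
 * Returns 0 on success or the errno value (EPERM when not privileged). */
static int request_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}

/* Return to the default time-sharing policy. Returns 0 or errno. */
static int drop_to_other(void)
{
    struct sched_param sp = { .sched_priority = 0 };

    if (sched_setscheduler(0, SCHED_OTHER, &sp) != 0)
        return errno;
    return 0;
}

/* Lock current and future pages so audio buffers are never paged out.
 * Returns 0 or errno (for example EPERM, ENOMEM or EAGAIN). */
static int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;
    return 0;
}
```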

As Jack mentioned, the developers of this patch are not kernel hackers
by trade, they wrote this to solve a real problem. In other words, a
patch is worth a thousand words.

It seems distro vendors would be interested in solving this problem.
The linux audio market is smaller than the general desktop of course but
many of the users are professionals who would gladly pay for support.
Look how many people pay for OSX. Wouldn't Red Hat and SuSE like some
of those customers?

Lee

2005-01-05 01:33:06

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, 2005-01-05 at 00:01 +0000, Alan Cox wrote:
> The problem with uid/gid based hacks is that they get really ugly to
> administer really fast. Especially once you have users who need realtime
> and hugetlb, and users who need one only.

Why? Just make a realtime group and a hugetlb group and add users to
one, the other, or both.
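Lee's scheme maps each privilege to its own group and tests membership independently. A minimal sketch of that policy, with hypothetical group IDs standing in for a `realtime` and a `hugetlb` group; a real check would take the list from the process's credentials rather than from an array passed in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical group IDs chosen by the administrator. */
#define REALTIME_GID ((gid_t)1001)
#define HUGETLB_GID  ((gid_t)1002)

/* Is `gid` among the groups in groups[0..n)? */
static bool gid_in_list(gid_t gid, const gid_t *groups, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (groups[i] == gid)
            return true;
    return false;
}

/* Each privilege is granted independently, so a user may be in
 * one group, the other, or both. */
static bool may_run_realtime(const gid_t *groups, size_t n)
{
    return gid_in_list(REALTIME_GID, groups, n);
}

static bool may_use_hugetlb(const gid_t *groups, size_t n)
{
    return gid_in_list(HUGETLB_GID, groups, n);
}
```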

Lee

2005-01-05 01:38:11

by Andreas Steinmetz

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Lee Revell wrote:
> On Tue, 2005-01-04 at 18:20 +0000, Christoph Hellwig wrote:
>
>>On Tue, Jan 04, 2005 at 01:16:54PM -0500, Lee Revell wrote:
>>
>>>Got a patch? Code talks, BS walks. This is working perfectly, right
>>>now, and is being used by thousands of Linux audio users.
>>
>>Which still doesn't mean it's the right design. And no, I don't need the
>>feature so I won't write it. If you want a certain feature it's up to
>>you to implement it in a way that's considered mergeable.
>>
>
>
> Please specify what's wrong with it. So far all your objection amounts
> to is "I don't like it".
>
> If you do have anything other than your opinion to back up your
> assertion that it's a bad design, you should have raised it months ago
> when this was first posted. Now that we have it to a mergeable state
> (as far as the people who worked on it are concerned), you want to pop
> up and say "Nope, bad design"?

Let me remind you all that according to lkml history hch has always been
biased and objecting to anything related to lsm. Nobody can take hch's
opinion here as objective. I would even go so far that when things are
related to lsm(s) he's just tro...
--
Andreas Steinmetz SPAMmers use [email protected]

2005-01-05 01:52:47

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Alan Cox ([email protected]) wrote:
> On Tue, 2005-01-04 at 18:59, Lee Revell wrote:
> > We could do it the way OSX (our real competition) does if that would
> > make people happy. They just let any user run RT tasks. Oh wait, but
> > that's a "broken design", everyone knows that OSX is a joke, no one
> > would use *that* OS to mix a CD or score a movie. :-)
>
> You can do that already, just make everyone root
>
> The problem with uid/gid based hacks is that they get really ugly to
> administer really fast. Especially once you have users who need realtime
> and hugetlb, and users who need one only.

I don't believe the hugetlb gid stuff is useful anymore. It should be
handled nicely via rlimits.

> It would be far cleaner to split CAP_SYS_NICE capability down - which
> should cover the real time OS functions nicely. Right now it gives a few
> too many rights but that could be fixed easily.

Hmm, how do we do this w/out breaking things? Maybe I'm misunderstanding
your idea.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-05 01:56:00

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-04 at 17:50 -0800, Chris Wright wrote:
> * Alan Cox ([email protected]) wrote:
> >
> > The problem with uid/gid based hacks is that they get really ugly to
> > administer really fast. Especially once you have users who need realtime
> > and hugetlb, and users who need one only.
>
> I don't believe the hugetlb gid stuff is useful anymore. It should be
> handled nicely via rlimits.

The last time I checked users could belong to more than one group. Am I
missing something?

Lee

2005-01-05 02:05:56

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Lee Revell ([email protected]) wrote:
> The last time I checked users could belong to more than one group. Am I
> missing something?

No, you're not. I think Alan's just saying the gid based checks
are suboptimal if there's a cleaner way to do it (to which I agree).
Personally, I don't have a big problem with the Realtime LSM. I've helped
you with it, and suggested a few times that I'd prefer it to be generic;
but never stepped up to deliver code of that sort. Since it's your itch,
you've scratched it, and it's quite simple and contained, I consider
it acceptable.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-05 03:02:22

by Kyle Moffett

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Jan 04, 2005, at 21:05, Chris Wright wrote:
> No, you're not. I think Alan's just saying the gid based checks
> are suboptimal if there's a cleaner way to do it (to which I agree).
> Personally, I don't have a big problem with the Realtime LSM. I've helped
> you with it, and suggested a few times that I'd prefer it to be generic;
> but never stepped up to deliver code of that sort. Since it's your itch,
> you've scratched it, and it's quite simple and contained, I consider
> it acceptable.

Here's a relatively simple idea: Why not make the "Realtime LSM"
just check for a certain "Realtime" credential in the new credential
store (Patch is in 2.6.10, see [1] for control program). You would
mark it as a system credential and give access to that credential via
the appropriate capability with a small utility program.

Of course, I _do_ respect that I am not providing a patch which they
have done. I think this serves a useful place and should probably be
included as-is, for now. A later update to make it use a better
mechanism would be nice, though. :-)

[1] http://people.redhat.com/~dhowells/keys/keyctl.c

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r
!y?(-)
------END GEEK CODE BLOCK------


2005-01-05 03:46:01

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Kyle Moffett ([email protected]) wrote:
> Here's a relatively simple idea: Why not make the "Realtime LSM"
> just check for a certain "Realtime" credential in the new credential
> store (Patch is in 2.6.10, see [1] for control program). You would
> mark it as a system credential and give access to that credential via
> the appropriate capability with a small utility program.

Well, that's basically what the gid is in this case. It's the credential
that's set at login time and has all the proper sharing and inheritance
rules. So, I'm not yet convinced that this would buy us much.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-05 04:03:47

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Alan Cox <[email protected]> writes:

> On Tue, 2005-01-04 at 18:59, Lee Revell wrote:
>> We could do it the way OSX (our real competition) does if that would
>> make people happy. They just let any user run RT tasks. Oh wait, but
>> that's a "broken design", everyone knows that OSX is a joke, no one
>> would use *that* OS to mix a CD or score a movie. :-)
>
> You can do that already, just make everyone root

Surely you're joking. Is this actually a serious proposal?

> The problem with uid/gid based hacks is that they get really ugly to
> administer really fast. Especially once you have users who need realtime
> and hugetlb, and users who need one only.

This is why POSIX requires supplementary groups.

All I had to do on my system was...

# adduser joq audio

That is considerably easier than hacking rlimits values via PAM.
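After `adduser joq audio` and a fresh login, the new group appears in the process's supplementary group list, which is the kind of membership a gid-based check consults. A minimal user-space sketch of the same test using getgroups(2); the function name `process_in_group` is hypothetical, and the kernel-side check in the patch is not reproduced here.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Does the calling process carry `gid` as its effective group or as one of
 * its supplementary groups? Returns false on allocation or syscall failure. */
static bool process_in_group(gid_t gid)
{
    if (getegid() == gid)
        return true;

    int n = getgroups(0, NULL);           /* first ask how many groups exist */
    if (n <= 0)
        return false;

    gid_t *groups = malloc((size_t)n * sizeof *groups);
    if (!groups)
        return false;

    bool found = false;
    n = getgroups(n, groups);             /* -1 on error: loop body is skipped */
    for (int i = 0; i < n; i++) {
        if (groups[i] == gid) {
            found = true;
            break;
        }
    }
    free(groups);
    return found;
}
```

The numeric gid of a named group such as `audio` can be looked up with getgrnam(3).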
--
joq

2005-01-05 04:05:04

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Chris Wright <[email protected]> writes:

> * Lee Revell ([email protected]) wrote:
>> The last time I checked users could belong to more than one group. Am I
>> missing something?
>
> No, you're not. I think Alan's just saying the gid based checks
> are suboptimal if there's a cleaner way to do it (to which I agree).
> Personally, I don't have a big problem with the Realtime LSM. I've helped
> you with it, and suggested a few times that I'd prefer it to be generic;
> but never stepped up to deliver code of that sort. Since it's your itch,
> you've scratched it, and it's quite simple and contained, I consider
> it acceptable.

We appreciate the help, Chris. The patch is considerably smaller and
cleaner thanks to your efforts.
--
joq

2005-01-05 05:22:57

by Alan

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, 2005-01-05 at 01:35, Andreas Steinmetz wrote:
> Let me remind you all that according to lkml history hch has always been
> biased and objecting to anything related to lsm. Nobody can take hch's
> opinion here as objective. I would even go so far that when things are
> related to lsm(s) he's just tro...

Oh I don't think so. Everyone thinks Christoph has it in for their
project (me included quite often). He's just blessed with a lot of taste
and determination to enforce it, and cursed (or perhaps blessed) with
the ability to explain bluntly and clearly his opinion.

gid hacks are not a good long term plan.

Can we use capabilities? If not, why not, and how do we fix it so we can
do the job right? Do we need some more capability bits that are
implicitly inherited and not touched by setuidness?


2005-01-05 05:50:45

by Andrew Morton

2005-01-05 11:21:08

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 04, 2005 at 12:55:15PM -0600, Jack O'Quin wrote:
> Statements of the form "had I cared enough to do something about this
> problem, I would have implemented it differently" are not much help.
> This patch is small and clean. It meshes with existing kernel LSM
> mechanisms. It solves a real problem affecting many Linux desktop
> users.

It solves problems - most kernel patches do that. But it solves
these problems in a way that doesn't fit very well into the grand design.

2005-01-05 11:24:31

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 04, 2005 at 01:57:13PM -0500, Lee Revell wrote:
> On Tue, 2005-01-04 at 18:20 +0000, Christoph Hellwig wrote:
> > On Tue, Jan 04, 2005 at 01:16:54PM -0500, Lee Revell wrote:
> > > Got a patch? Code talks, BS walks. This is working perfectly, right
> > > now, and is being used by thousands of Linux audio users.
> >
> > Which still doesn't mean it's the right design. And no, I don't need the
> > feature so I won't write it. If you want a certain feature it's up to
> > you to implement it in a way that's considered mergeable.
> >
>
> Please specify what's wrong with it. So far all your objection amounts
> to is "I don't like it".

It's tying privileges to uids/gids, and it does so in an overcomplicated
way and just for an extremely tiny, specialized subset of available
privileges.

In short it's a very specialized hack.

> If you do have anything other than your opinion to back up your
> assertion that it's a bad design, you should have raised it months ago
> when this was first posted. Now that we have it to a mergeable state
> (as far as the people who worked on it are concerned), you want to pop
> up and say "Nope, bad design"?

I'm very sorry but I don't have the time to comment on every single patch
posted somewhere. All the review and core kernel work I do on lkml is in my
unpaid spare time. If you want me to review specific things by a deadline
or want me to implement features in a way that fits the kernel grand plan
(which isn't the same as it actually being accepted by other kernel
developers), you're free to contract me.

2005-01-05 11:28:04

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 04, 2005 at 01:59:57PM -0500, Lee Revell wrote:
> We could do it the way OSX (our real competition) does if that would
> make people happy. They just let any user run RT tasks. Oh wait, but
> that's a "broken design", everyone knows that OSX is a joke, no one
> would use *that* OS to mix a CD or score a movie. :-)

No one sane (well, no one sane with a background in Operating Systems)
would use OS X at all.

2005-01-05 11:42:41

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

> Let me remind you all that according to lkml history hch has always been
> biased and objecting to anything related to lsm. Nobody can take hch's
> opinion here as objective. I would even go so far that when things are
> related to lsm(s) he's just tro...

I'm not a big fan of LSM, and I've explained the rationale why multiple
times. That doesn't mean everything done using LSM is bad - in practice
most things are bad though (from the things I've seen everything but lsm)

btw, any reason you drop me from the Cc list once you start the personal
attacks?

2005-01-05 11:52:42

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


the RT-LSM thing is a bit dangerous because it doesn't really protect
against a runaway, buggy app. So i think the right way to approach this
problem is to not apply RT-LSM for the time being, but to provide an
'advanced latency needs' scheduling class that is _still_ safe even if
the task is runaway, but behaves with near-RT priorities if the task is
'nice' (i.e. doesn't use up a large amount of CPU time.)

incidentally, there is such a scheduling class already: negative nice
levels. Please skip any preconceptions you might have about nice levels,
nice levels have been improved in 2.6.10, the timeslices are now given
out exponentially, giving nice -20 tasks far more weight and priority
than they used to have. (They are obviously still preemptable if they
keep looping burning CPU - but that we can consider a feature.) (Also,
in 2.6 the negative nice levels have a much more aggressive interactivity
setting, allowing them to preempt everything lower-prio.)

so, could you try vanilla 2.6.10 (without LSM and without jackd running
with RT priorities), with jackd set to nice -20? Make sure the
jack-client process gets this priority too. Best to achieve this is to
renice a shell to -20 and start up everything from there - the nice
settings will be inherited. How does such an audio test compare to a
test done with jackd running at SCHED_FIFO with RT priority 1?
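Ingo's suggestion to renice a shell and start everything from it depends on the nice value being inherited across fork(). A minimal sketch showing that inheritance with a throwaway "shell" child and a "client" grandchild; it uses a non-negative nice value because an unprivileged caller cannot lower its nice value (the -20 asked for here needs privilege), and the helper name `inherited_nice` is hypothetical.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a "shell" child that sets its nice value to `nice_val`, which then
 * forks a "client" grandchild that reports the nice value it inherited.
 * Returns that value, or -100 on failure. Use 0..19 when unprivileged. */
static int inherited_nice(int nice_val)
{
    pid_t shell = fork();
    if (shell < 0)
        return -100;

    if (shell == 0) {                                  /* the "reniced shell" */
        if (setpriority(PRIO_PROCESS, 0, nice_val) != 0)
            _exit(255);
        pid_t client = fork();
        if (client < 0)
            _exit(255);
        if (client == 0) {                             /* the "jack client" */
            errno = 0;
            int p = getpriority(PRIO_PROCESS, 0);      /* -1 can be valid */
            _exit(errno ? 255 : p + 20);               /* encode -20..19 as 0..39 */
        }
        int st;
        if (waitpid(client, &st, 0) < 0 || !WIFEXITED(st))
            _exit(255);
        _exit(WEXITSTATUS(st));                        /* pass the result up */
    }

    int st;
    if (waitpid(shell, &st, 0) < 0 || !WIFEXITED(st) || WEXITSTATUS(st) == 255)
        return -100;
    return WEXITSTATUS(st) - 20;
}
```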

if this works out well then we could achieve something comparable to
RT-LSM, via nice levels alone.

Ingo

2005-01-05 12:06:06

by Herbert Poetzl

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 04, 2005 at 09:50:10PM -0800, Andrew Morton wrote:
> Alan Cox <[email protected]> wrote:
> >
> > Can we use capabilities
>
> capabilities don't work :(
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.0/0502.html

well, maybe it is time to fix them ..

I already proposed some methods to extend them,
and I'm also willing to dig into the various things
required to allow to use the capability system for
what it was intended.

best,
Herbert


2005-01-05 15:41:34

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, 2005-01-05 at 12:52 +0100, Ingo Molnar wrote:
> the RT-LSM thing is a bit dangerous because it doesn't really protect
> against a runaway, buggy app. So i think the right way to approach this
> problem is to not apply RT-LSM for the time being, but to provide an
> 'advanced latency needs' scheduling class that is _still_ safe even if
> the task is runaway, but behaves with near-RT priorities if the task is
> 'nice' (i.e. doesn't use up a large amount of CPU time.)
>
> incidentally, there is such a scheduling class already: negative nice
> levels. Please skip any preconceptions you might have about nice levels,
> nice levels have been improved in 2.6.10, the timeslices are now given
> out exponentially, giving nice -20 tasks far more weight and priority
> than they used to have. (They are obviously still preemptable if they
> keep looping burning CPU - but that we can consider a feature.) (Also,
> in 2.6 the negative nice levels have a much more aggressive interactivity
> setting, allowing them to preempt everything lower-prio.)
>
> so, could you try vanilla 2.6.10 (without LSM and without jackd running
> with RT priorities), with jackd set to nice -20? Make sure the
> jack-client process gets this priority too. Best to achieve this is to
> renice a shell to -20 and start up everything from there - the nice
> settings will be inherited. How does such an audio test compare to a
> test done with jackd running at SCHED_FIFO with RT priority 1?
>
> if this works out well then we could achieve something comparable to
> RT-LSM, via nice levels alone.

Ugh, screwed up the cc: list. Sorry for the WOB.

Paul, care to comment on the above?

Lee

2005-01-05 15:48:05

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, 2005-01-05 at 12:52 +0100, Ingo Molnar wrote:
> the RT-LSM thing is a bit dangerous because it doesn't really protect
> against a runaway, buggy app. So i think the right way to approach this
> problem is to not apply RT-LSM for the time being, but to provide an
> 'advanced latency needs' scheduling class that is _still_ safe even if
> the task is runaway, but behaves with near-RT priorities if the task is
> 'nice' (i.e. doesn't use up a large amount of CPU time.)
>
> incidentally, there is such a scheduling class already: negative nice
> levels. Please skip any preconceptions you might have about nice levels,
> nice levels have been improved in 2.6.10, the timeslices are now given
> out exponentially, giving nice -20 tasks far more weight and priority
> than they used to have. (They are obviously still preemptable if they
> keep looping burning CPU - but that we can consider a feature.) (Also,
> in 2.6 the negative nice levels have a much more aggressive interactivity
> setting, allowing them to preempt everything lower-prio.)
>
> so, could you try vanilla 2.6.10 (without LSM and without jackd running
> with RT priorities), with jackd set to nice -20? Make sure the
> jack-client process gets this priority too. Best to achieve this is to
> renice a shell to -20 and start up everything from there - the nice
> settings will be inherited. How does such an audio test compare to a
> test done with jackd running at SCHED_FIFO with RT priority 1?
>
> if this works out well then we could achieve something comparable to
> RT-LSM, via nice levels alone.
>

Adding Paul Davis to the cc:, as he has expressed very strong opinions
on this in the past.

Of course this does not address the problem as you still need to be root
to run at a negative nice value.
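Lee's objection is easy to check: asking for nice -20 as an ordinary user is refused, and Linux reports that refusal from setpriority() as EACCES. A minimal sketch that attempts the change in a throwaway child so the caller's own priority is untouched; the helper name `try_nice_minus_20` is hypothetical.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to set nice -20 in a child process. Returns 0 if it worked, the errno
 * from setpriority() otherwise (EACCES for an unprivileged user), or -1 if
 * fork/wait failed. */
static int try_nice_minus_20(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(setpriority(PRIO_PROCESS, 0, -20) == 0 ? 0 : errno);

    int st;
    if (waitpid(pid, &st, 0) < 0 || !WIFEXITED(st))
        return -1;
    return WEXITSTATUS(st);
}
```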

Lee

2005-01-05 17:36:32

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, 2005-01-05 at 11:39 +0000, Christoph Hellwig wrote:
> I'm not a big fan of LSM, and I've explained the rationale why multiple
> times. That doesn't mean everything done using LSM is bad - in practice
> most things are bad though (from the things I've seen everything but lsm)
                                                                       ^^^

Is this a typo? Maybe you mean SELinux?

Lee

2005-01-05 17:36:34

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, 2005-01-05 at 11:25 +0000, Christoph Hellwig wrote:
> On Tue, Jan 04, 2005 at 01:59:57PM -0500, Lee Revell wrote:
> > We could do it the way OSX (our real competition) does if that would
> > make people happy. They just let any user run RT tasks. Oh wait, but
> > that's a "broken design", everyone knows that OSX is a joke, no one
> > would use *that* OS to mix a CD or score a movie. :-)
>
> No one sane (well, no one sane with a background in Operating Systems)
> would use OS X at all.
>

Really? I would expect any sane engineer to use the best tool for the
job. If you actually think it's Linux, I suggest you try it sometime.

Lee

2005-01-05 18:18:16

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Ingo Molnar <[email protected]> writes:

> the RT-LSM thing is a bit dangerous because it doesn't really protect
> against a runaway, buggy app. So i think the right way to approach this
> problem is to not apply RT-LSM for the time being, but to provide an
> 'advanced latency needs' scheduling class that is _still_ safe even if
> the task is runaway, but behaves with near-RT priorities if the task is
> 'nice' (i.e. doesn't use up a large amount of CPU time.)

You are right that a runaway SCHED_FIFO application can freeze the
system. But, this really has nothing to do with the permissions
problem addressed by the realtime-lsm. In fact, it is needed by
non-root users for running `nice -20', just as for SCHED_FIFO.

I have no objection to creating a "better" RT scheduling class than
SCHED_FIFO. The "much-maligned" Mac OS X has a deadline scheduler
that works quite well for running JACK and its applications.

> so, could you try vanilla 2.6.10 (without LSM and without jackd running
> with RT priorities), with jackd set to nice -20? Make sure the
> jack-client process gets this priority too. Best to achieve this is to
> renice a shell to -20 and start up everything from there - the nice
> settings will be inherited. How does such an audio test compare to a
> test done with jackd running at SCHED_FIFO with RT priority 1?
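The two settings being compared map onto two different system calls. The following is a minimal C sketch, not taken from jackd; the helper names `set_fifo`, `set_other` and `set_nice` are hypothetical. `sched_setscheduler()` moves a process into the SCHED_FIFO class, while `setpriority()` only lowers its nice value inside the normal time-sharing class. Both the policy and the nice value are inherited across fork(), which is why starting everything from a reniced shell works.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Switch process `pid` (0 = caller) to SCHED_FIFO at `prio`, the kind of
 * request jackd makes when started with -R. Returns 0 or -errno; an
 * unprivileged caller normally gets -EPERM. */
int set_fifo(pid_t pid, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    return sched_setscheduler(pid, SCHED_FIFO, &sp) == -1 ? -errno : 0;
}

/* Return the caller to the normal time-sharing class (SCHED_OTHER). */
int set_other(void)
{
    struct sched_param sp = { .sched_priority = 0 };
    return sched_setscheduler(0, SCHED_OTHER, &sp) == -1 ? -errno : 0;
}

/* Set the nice value of `pid` (0 = caller), e.g. to -20. The task stays
 * in the time-sharing class; lowering nice without privilege fails
 * with -EACCES on Linux. */
int set_nice(pid_t pid, int niceval)
{
    return setpriority(PRIO_PROCESS, (id_t)pid, niceval) == -1 ? -errno : 0;
}
```

Run unprivileged, both requests are refused, which is the permissions gap the realtime LSM was written to close.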

For a quick comparison, I used a slightly modified version of the
jack_test3.2 script, that runs jackd without the -R (--realtime)
option...

                                With -R         Without -R
                                (SCHED_FIFO)    (nice -20)

************* SUMMARY RESULT ****************
Total seconds ran . . . . . . : 300
Number of clients . . . . . . : 20
Ports per client  . . . . . . : 4
Frames per buffer . . . . . . : 64
*********************************************
Timeout Count . . . . . . . . : ( 1)            ( 1)
XRUN Count  . . . . . . . . . : 2               2837
Delay Count (>spare time) . . : 0               0
Delay Count (>1000 usecs) . . : 0               0
Delay Maximum . . . . . . . . : 3130 usecs      5038044 usecs
Cycle Maximum . . . . . . . . : 960 usecs       18802 usecs
Average DSP Load. . . . . . . : 34.3 %          44.1 %
Average CPU System Load . . . : 8.7 %           7.5 %
Average CPU User Load . . . . : 29.8 %          5.2 %
Average CPU Nice Load . . . . : 0.0 %           20.3 %
Average CPU I/O Wait Load . . : 3.2 %           5.2 %
Average CPU IRQ Load  . . . . : 0.7 %           0.7 %
Average CPU Soft-IRQ Load . . : 0.0 %           0.2 %
Average Interrupt Rate  . . . : 1707.6 /sec     1677.3 /sec
Average Context-Switch Rate . : 11914.9 /sec    11197.6 /sec
*********************************************

This was not exactly the test you requested. The LSM is still
present. But, it makes no difference. In fact, I used it to grant
nice privileges, since I didn't feel like running it as root.

But this is otherwise vanilla 2.6.10, and the two scheduling
algorithms are fairly represented. Try it yourself, I think you'll
see similarly dramatic differences.

Note that 2.6.10 has by far the best realtime performance of any
vanilla Linux kernel I have ever tried. Although, much better results
can be obtained with your Realtime Preemption patches, this is still a
very creditable result, quite usable for many relatively low-latency
applications. Kudos to you and the many others who contributed to
this achievement.

> if this works out well then we could achieve something comparable to
> RT-LSM, via nice levels alone.

As you see, it does not work at all.
--
joq

2005-01-05 19:11:16

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, Jan 05, 2005 at 12:32:47PM -0500, Lee Revell wrote:
> Really? I would expect any sane engineer to use the best tool for the
> job.

Sure.

> If you actually think it's Linux, I suggest you try it sometime.

You don't want to run Darwin, trust me. If you don't read through their
sources..

2005-01-05 19:12:07

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, Jan 05, 2005 at 12:35:56PM -0500, Lee Revell wrote:
> On Wed, 2005-01-05 at 11:39 +0000, Christoph Hellwig wrote:
> > I'm not a big fan of LSM, and I've explained the rationale why multiple
> > times. That doesn't mean everything done using LSM is bad - in practice
> > most things are bad though (from the things I've seen everything but lsm)
>                                                                        ^^^
>
> Is this a typo? Maybe you mean SELinux?

Yes.

2005-01-07 01:16:48

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, Jan 05, 2005 at 01:06:02PM +0100, Herbert Poetzl wrote:
> On Tue, Jan 04, 2005 at 09:50:10PM -0800, Andrew Morton wrote:
> > Alan Cox wrote:
> > >
> > > Can we use capabilities
> >
> > capabilities don't work :(
> >
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.0/0502.html
>
> well, maybe it is time to fix them ..
>
> I already proposed some methods to extend them,
> and I'm also willing to dig into the various things
> required to allow to use the capability system for
> what it was intended.

You can't fix them without changing the semantics for existing users
in ways they didn't expect. It could be done with a new personality flag,
but..

--
Mathematics is the supreme nostalgia of our time.

2005-01-07 01:22:14

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, Jan 05, 2005 at 04:18:15AM +0000, Alan Cox wrote:
> On Mer, 2005-01-05 at 01:35, Andreas Steinmetz wrote:
> > Let me remind you all that according to lkml history hch has always been
> > biased and objecting to anything related to lsm. Nobody can take hch's
> > opinion here as objective. I would even go so far that when things are
> > related to lsm(s) he's just tro...
>
> Oh I don't think so. Everyone thinks Christoph has it in for their
> project (me included quite often). He's just blessed with a lot of taste
> and determination to enforce it, and cursed (or perhaps blessed) with
> the ability to explain bluntly and clearly his opinion.
>
> gid hacks are not a good long term plan.
>
> Can we use capabilities, if not - why not and how do we fix it so we can
> do the job right. Do we need some more capability bits that are
> implicitly inherited and not touched by setuidness ?

Why can't this be done with a simple SUID helper to promote given
tasks to RT with sched_setscheduler, doing essentially all the checks
this LSM is doing?

Objections of "because it requires dangerous root or suid" don't fly,
an RT app under user control can DoS the box trivially. Never mind you
need root to configure the LSM anyway..
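A minimal sketch of the core of such a setuid helper, assuming the policy is "members of one configured group may request SCHED_FIFO". The function names and the group-based policy are illustrative assumptions, and a real helper would also have to verify that the target process belongs to the invoking user. It checks the *real* group IDs because in a setuid program the effective IDs belong to the file owner, while the real gid and the supplementary groups still describe the invoking user.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* True if the invoking user has `gid` as real primary group or as a
 * supplementary group (setuid exec changes neither). */
bool real_user_in_group(gid_t gid)
{
    if (getgid() == gid)
        return true;
    int n = getgroups(0, NULL);           /* number of supplementary groups */
    if (n <= 0)
        return false;
    gid_t *groups = malloc((size_t)n * sizeof *groups);
    if (!groups)
        return false;
    n = getgroups(n, groups);
    bool found = false;
    for (int i = 0; i < n && !found; i++)
        found = (groups[i] == gid);
    free(groups);
    return found;
}

/* Hypothetical helper core: refuse unless the caller is in `allowed_gid`
 * and `prio` is a valid SCHED_FIFO priority, then switch `target`.
 * Returns 0 or -errno. */
int promote_if_allowed(pid_t target, int prio, gid_t allowed_gid)
{
    if (!real_user_in_group(allowed_gid))
        return -EPERM;
    if (prio < sched_get_priority_min(SCHED_FIFO) ||
        prio > sched_get_priority_max(SCHED_FIFO))
        return -EINVAL;
    struct sched_param sp = { .sched_priority = prio };
    return sched_setscheduler(target, SCHED_FIFO, &sp) == -1 ? -errno : 0;
}
```

Such a helper only covers scheduling. It cannot mlock() memory on behalf of another process, which is the objection raised later in the thread.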

--
Mathematics is the supreme nostalgia of our time.

2005-01-07 02:36:49

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, 2005-01-06 at 17:18 -0800, Matt Mackall wrote:
> Why can't this be done with a simple SUID helper to promote given
> tasks to RT with sched_setscheduler, doing essentially all the checks
> this LSM is doing?
>
> Objections of "because it requires dangerous root or suid" don't fly,
> an RT app under user control can DoS the box trivially. Never mind you
> need root to configure the LSM anyway..

Yes but a bug in an app running as root can trash the filesystem. The
worst you can do with RT privileges is lock up the machine.

Lee

2005-01-07 03:00:47

by Alan

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Gwe, 2005-01-07 at 01:13, Matt Mackall wrote:
> You can't fix them without changing the semantics for existing users
> in ways they didn't expect. It could be done with a new personality flag,
> but..

I disagree. At the most trivial you could just add another 32 bits of
sticky capability that are never touched by setuid/non-setuidness and
represent additional "user" (or more rightly session) abilities to do
limited overrides.
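The proposal can be illustrated with a small data-structure model. Everything here is hypothetical (field names, bit names and the exec model); it only shows the intended property: ordinary capabilities are recomputed on exec, while the extra "sticky" session bits pass through unchanged whatever the binary's setuid-ness.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented sticky bits, for illustration only. */
enum {
    STICKY_RT_SCHED = 1u << 0,   /* may use realtime scheduling */
    STICKY_MLOCK    = 1u << 1,   /* may lock memory */
};

struct task_caps {
    uint32_t caps;     /* ordinary caps, subject to the setuid rules */
    uint32_t sticky;   /* session abilities, never touched by exec */
};

/* Model exec(): ordinary caps become whatever the setuid rules dictate
 * for the new binary; the sticky mask is copied over unchanged. */
struct task_caps model_exec(struct task_caps parent, uint32_t caps_after_exec)
{
    struct task_caps child = { .caps = caps_after_exec, .sticky = parent.sticky };
    return child;
}

/* Does the task hold a given sticky ability? */
bool may(const struct task_caps *t, uint32_t sticky_bit)
{
    return (t->sticky & sticky_bit) != 0;
}
```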

2005-01-07 05:53:35

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


[Adding linux-audio-dev to the CC list]

Matt Mackall writes:

> On Wed, Jan 05, 2005 at 04:18:15AM +0000, Alan Cox wrote:
>> gid hacks are not a good long term plan.
>>
>> Can we use capabilities, if not - why not and how do we fix it so
>> we can do the job right. Do we need some more capability bits that
>> are implicitly inherited and not touched by setuidness ?
>
> Why can't this be done with a simple SUID helper to promote given
> tasks to RT with sched_setscheduler, doing essentially all the checks
> this LSM is doing?

The answer to your simple question is a long, sad story. :-(

There is clearly no practical way to write large audio applications
(many with elaborate graphical interfaces) securely enough to run them
as root. So, we have used capabilities with linux-2.4 systems for
several years. It was never a satisfactory solution, but was all we
could do at the time.

There is a small setuid program called `jackstart' that exec()s the
JACK server (`jackd') with appropriate privileges so it can pass
realtime privileges to its applications. Each client needs to create
a realtime thread and mlock() its storage to do its part of the
realtime audio cycle. Note that sched_setscheduler() provides no way
to handle the mlock() requirement, which cannot be done from another
process. Clients may come and go at any time, so dropping the
privilege after initialization is not an option.
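The per-client work described here can be sketched in C. This is not JACK's actual code and the helper names are hypothetical. It shows the two privileged operations a client performs: locking its memory with mlockall() and starting a SCHED_FIFO thread. mlockall() only ever affects the calling process, so unlike the scheduling change, that step cannot be delegated to a separate helper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>

/* Lock all current and future pages of the calling process into RAM so
 * the realtime thread never stalls on a page fault. Returns 0 or -errno
 * (-EPERM or -ENOMEM when the caller lacks CAP_IPC_LOCK and the memory
 * limit does not allow it). */
int lock_process_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE) == -1 ? -errno : 0;
}

/* Placeholder for the per-cycle audio work a client would do. */
void *rt_noop(void *arg)
{
    return arg;
}

/* Start fn(arg) in a new thread scheduled SCHED_FIFO at prio.
 * PTHREAD_EXPLICIT_SCHED makes the attributes below take effect instead
 * of inheriting the creator's policy. Returns 0 or an error number
 * (EPERM when the caller may not use SCHED_FIFO). */
int start_rt_thread(pthread_t *tid, void *(*fn)(void *), void *arg, int prio)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = prio };
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &sp);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```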

Unfortunately, all this heavyweight mechanism only helps with JACK and
its many clients. Lots of other audio or video oriented applications
also have realtime needs.

The biggest problem was CAP_SETPCAP, which for good reasons[1] is
disabled in distributed kernels. This forced every user to patch and
build a custom kernel. Worse, it opened all our systems up to the
problems reported by this sendmail security advisory.

[1] http://www.securiteam.com/unixfocus/5KQ040A1RI.html

While stumbling along with this very unsatisfactory state of affairs,
many on the Linux Audio Developers mailing list were shocked[2] to
hear about an LKML discussion[3] suggesting a significant lack of
developer commitment to addressing these issues...

> Quoting Albert Cahalan[3]: "The authors of our code seem to have
> given up and moved on. Nobody cleaned up the mess. Is it any wonder
> the POSIX draft didn't ever make it beyond the draft state?"

[2] http://www.music.columbia.edu/pipermail/linux-audio-dev/2003-November/005332.html
[3] http://www.kerneltraffic.org/kernel-traffic/kt20031101_239.html#3

So, all our work, frustration and user confusion while trying to "do
the right thing" seemed doomed to failure. Since the Linux kernel
developers continued to show little interest in our needs, we started
a discussion about how to meet them ourselves[4].

[4] http://www.music.columbia.edu/pipermail/linux-audio-dev/2003-November/005345.html

Looking at our security requirements in a practical manner, we quickly
concluded that CAP_SETPCAP is the work of the devil. A true
filesystem-based privilege vector solution might be adequate, but is
clearly beyond the scope of what we audio programmers could hope to
accomplish. Even then, it would be difficult to administer.

A simple group ID test is far more secure than CAP_SETPCAP, and
perfectly adequate for us. When configuring a Digital Audio
Workstation, one is not terribly concerned about local Denial of
Service attacks or runaway realtime threads. That would be
unacceptable for many other systems, but not ours. Yet, we want to
avoid system integrity holes in network daemons like sendmail[1]. In
other words: we can tolerate the bad guys crashing the system, but we
don't want them turning it into an open spam relay or corrupting the
filesystem.

So, we needed to provide a simple way for an unskilled system admin
(aka "musician") to configure a personal workstation to run realtime
applications without opening egregious security holes. Equally
important, it must be easy for other system admins to ensure that
these privileges are *not* available on their server systems. It soon
became apparent that the then-new LSM framework provided a good
solution. Because LSM's can be built outside the kernel source tree,
we were no longer forced to wait for some kernel developer to take an
interest.

The realtime-lsm is the solution we evolved. It has been actively
used by thousands of Linux audio users for over a year now[5]. The
first supported SourceForge release was in April of 2004[6]. It is
now used by many popular audio-oriented distributions, including
Planet CCRMA[7] from Stanford University and the Debian Music
Distribution[8] from the AGNULA project.

[5] http://www.music.columbia.edu/pipermail/linux-audio-dev/2003-December/005745.html
[6] http://eca.cx/laa/2004/04/0028.html
[7] http://ccrma.stanford.edu/planetccrma/software/
[8] http://www.agnula.org/

I understand that kernel developers are busy and have other problems
they consider more important than ours. But, you ought to at least
understand that this is really important to us. We needed a clean
solution two or three years ago. Now we finally have one.

Distributing it with the kernel sources would be a great convenience
for our users and would significantly simplify maintenance. It would
also (IMHO) close a significant security and usability deficiency in
the standard kernel. Any of the NSA and DoD experts will tell you: a
security solution that is difficult to administer is not secure.

It is no surprise that kernel developers should consider our solution
technically inferior to their own ideas on the subject. I would have
been delighted to have some kernel developer step in and provide a
clean, well-thought out solution several years ago. This is a kernel
deficiency, not an audio problem. I don't want to work on kernels.

But, I am feeling quite discouraged that so many kernel developers
still seem to consider this problem unimportant. I sense a distinct
unwillingness to move forward on this issue. I really hope I am wrong
about that.
--
joq

2005-01-07 12:56:23

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

I just read the thread of messages about this, and I am just
dumbfounded. Jack O'Quin has very politely explained the whole thing,
and it appears that almost nobody actually paid attention to what
he was saying.

1) capabilities: it has been explained by several people that
capabilities do not work, and in the past there has been an utter lack
of interest on the part of the kernel crowd to fix them, sometimes
even going as far as "it can't be fixed".

2) this is *not* only about scheduling. Realtime tasks need
mlockall() and/or mlock as well. even the man page for mlock
recognizes this, yet almost all the discussion here has focused on
scheduling.

3) christoph claims that using uid/gid to define privilege scope
is a bad idea. but that is the *desired* method. uid/gid corresponds exactly
to what the users of these systems want. they don't want privilege
accorded to specific applications - it's the *users* not the
applications that have the right to get RT scheduling, lock down
memory and so on. these applications will run without RT privileges,
just not very well (in general, so badly that they are unusable for
their intended purpose).

4) christoph's claims about OS X are nothing but ridiculous. whatever
the internals of Darwin may or may not be (and they certainly include
some of the best ideas about media-friendly kernels from the last 20
years, unlike our favorite OS), professional people are using OS X
(like they used OS 9 and OS 8 before) to get serious, paid work done
in a way that they cannot on Linux. and if attitudes like christoph's
prevail, in a way that they will never get to do on Linux without
going through steps that they will consider absurd. Alan jokes (i
presume) "oh, that's easy, make everyone root", but that's not what OS X
does. OS X says "we know that running realtime applications matters
for a broad class of our likely users, and so anyone can do it, not
just root". And note: "realtime applications" does not mean just
"rt-scheduled", as noted above.

5) setuid wrappers don't work for this, because even though you can
change the scheduling class of another process, you cannot "grant" it
the ability to use mlock. at least not without capabilities, so back
to (1) above ...

So, what do we have here? The two most successful media-friendly OS's
(BeOS and OS X) demonstrate clearly the way things need to be from the
user experience perspective, a development community within the Linux
world evolves a solution using the very nice new security modules in
2.6, and then people who don't appear to understand anything about
what is required or what the use cases are say "i don't like it and
because nobody pays me i don't have to tell you why".

I've probably burnt through $250,000 supporting myself and my
family over the last 5 years while I develop pro-level audio software
for Linux. I don't expect to see any of that back. So when Christoph
chimes in with the "I'm not paid, I don't have to tell you why I don't
like it, I just don't" ... that really, really, really irritates me in
a way that few other comments do.

We (Jack, Lee and now myself) have tried to explain what the problem
with the kernel is, how LSM makes a solution possible, acknowledged
issues and attempted to address them, and finally have offered up a
working patch that makes life easier for a bunch of people who don't
want to run webservers or compile kernels all day. If you're going to
publicly argue that what the "realtime" LSM does should not be part
of the kernel, at least do us the favor of showing us enough respect
to provide technical or policy-based reasons for why it's such a bad
solution.

--p

2005-01-07 13:04:44

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 07:56:02AM -0500, Paul Davis wrote:
> 2) this is *not* only about scheduling. Realtime tasks need
> mlockall() and/or mlock as well. even the man page for mlock
> recognizes this, yet almost all the discussion here has focused on
> scheduling.

RLIMIT_MEMLOCK is your friend.
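On kernels where RLIMIT_MEMLOCK governs unprivileged mlock() (2.6.9 and later, per Arjan's reply further down the thread), a process may raise its own soft limit up to the hard limit without any privilege; the administrator controls the hard limit. A minimal sketch, with a hypothetical helper name:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

/* Raise the caller's soft RLIMIT_MEMLOCK to its hard limit. Any process
 * may do this; only raising the hard limit needs privilege. Stores the
 * resulting soft limit (bytes, or RLIM_INFINITY) in *out if non-NULL.
 * Returns 0 or -errno. */
int raise_memlock_to_hard(rlim_t *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -errno;
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -errno;
    if (out)
        *out = rl.rlim_cur;
    return 0;
}
```

Raising the hard limit itself requires CAP_SYS_RESOURCE, so per-user hard limits are normally set up at login by a privileged process.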

> 3) christoph claims that using uid/gid to define privilege scope
> is a bad idea. but that is the *desired* method. uid/gid corresponds exactly
> to what the users of these systems want. they don't want privilege
> accorded to specific applications - it's the *users* not the
> applications that have the right to get RT scheduling, lock down
> memory and so on. these applications will run without RT privileges,
> just not very well (in general, so badly that they are unusable for
> their intended purpose).

it doesn't really matter what you want, but how we can implement
something that fits in the kernel design.

> 4) christoph's claims about OS X are nothing but ridiculous. whatever
> the internals of Darwin may or may not be (and they certainly include
> some of the best ideas about media-friendly kernels from the last 20
> years, unlike our favorite OS), professional people are using OS X

professional people are also using Windows or Solaris. That doesn't
mean we have to copy every bad idea from them.

> 5) setuid wrappers don't work for this, because even though you can
> change the scheduling class of another process, you cannot "grant" it
> the ability to use mlock. at least not without capabilities, so back
> to (1) above ...

See above (RLIMIT_MEMLOCK).

> I've probably burnt through $250,000 supporting myself and my
> family over the last 5 years while I develop pro-level audio software
> for Linux. I don't expect to see any of that back. So when Christoph
> chimes in with the "I'm not paid, I don't have to tell you why I don't
> like it, I just don't" ... that really, really, really irritates me in
> a way that few other comments do.

I think you're taking things totally out of context here. Lee complained
I didn't review his patch earlier. I only have a limited time available
so I'll select patches that I'm gonna review - and that means they have
to either be very interesting or be proposed for inclusion. If you want
me to review other things you'll have to either pay me or ask me really
nicely offlist.

> We (Jack, Lee and now myself) have tried to explain what the problem
> with the kernel is, how LSM makes a solution possible, acknowledged
> issues and attempted to address them, and finally have offered up a
> working patch that makes life easier for a bunch of people who don't
> want to run webservers or compile kernels all day.

And we have told you that this solution is not okay. You can spend
more time whining which won't do anything or you could help brainstorming
how to implement a workable solution.

2005-01-07 14:18:41

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>On Fri, Jan 07, 2005 at 07:56:02AM -0500, Paul Davis wrote:
>> 2) this is *not* only about scheduling. Realtime tasks need
>> mlockall() and/or mlock as well. even the man page for mlock
>> recognizes this, yet almost all the discussion here has focused on
>> scheduling.
>
>RLIMIT_MEMLOCK is your friend.

rlimit_memlock limits the *amount* of memory that mlock() can be used
on, not whether mlock can be used. at least, that's my understanding of
the POSIX design for this. the man page and the source code for mlock
support make that reasonably clear.

moreover, AFAIK all the issues that existed for granting capabilities
exist for rlimit-based privileges. if they are not granted to all
users/processes, how are they granted, and can they be controlled by a
non-root process? last time i looked, the hard limit used by rlimits is
system-wide. you want to copy that idea from OSX or not?

>it doesn't really matter what you want, but how we can implement
>something that fits in the kernel design.

"realtime" LSM does fit into the kernel, quite demonstrably so. it
doesn't, it appears, fit into *your* idea of kernel design.

>> 4) christoph's claims about OS X are nothing but ridiculous. whatever
>> the internals of Darwin may or may not be (and they certainly include
>> some of the best ideas about media-friendly kernels from the last 20
>> years, unlike our favorite OS), professional people are using OS X
>
>professional people are also using Windows or Solaris. That doesn't
>mean we have to copy every bad idea from them.

I didn't say "copy every idea from them". The point of "realtime" LSM
is precisely *not* to copy every idea from them - instead of every
user being able to run RT apps, only specifically root-administered
uids and/or gids can.

>And we have told you that this solution is not okay. You can spend

You, Christoph, have told us that. There is no "we" here. You provided
no rationale other than "uid/gid based privilege control is the wrong
method".

>more time whining which won't do anything or you could help brainstorming
>how to implement a workable solution.

We (Jack, Torben and others on LAD) did brainstorm. We were told on
lkml that LSM was the right way to do this kind of things these days,
because capabilities were broken. But you don't like LSM, so now,
totally post-facto you're telling us that this is not a "workable
solution."

Newsflash: it's a totally workable and working solution, and it's one
that distributions will adopt whether you get paid or i suck up and
ask you nicely offline. The question was whether we could make
distributions' and users' lives a little easier by not requiring them
to download additional stuff first. Apparently, your unexplained
convictions about the right and wrong way to grant privileges,
(something that no OS has ever really gotten its head around except
VMS (maybe)), is more important.

Fine, we'll continue to tell people to use "realtime" LSM for audio
work. The people this really affects probably won't use vanilla
kernels anyway.

--p


2005-01-07 14:27:46

by Arjan van de Ven

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 09:16:50AM -0500, Paul Davis wrote:
> >On Fri, Jan 07, 2005 at 07:56:02AM -0500, Paul Davis wrote:
> >> 2) this is *not* only about scheduling. Realtime tasks need
> >> mlockall() and/or mlock as well. even the man page for mlock
> >> recognizes this, yet almost all the discussion here has focused on
> >> scheduling.
> >
> >RLIMIT_MEMLOCK is your friend.
>
> rlimit_memlock limits the *amount* of memory that mlock() can be used
> on, not whether mlock can be used. at least, that's my understanding of
> the POSIX design for this. the man page and the source code for mlock
> support make that reasonably clear.

eh no. It defaults to zero, but if you increase it for a specific user, that
user is allowed to mlock more.

>
> Fine, we'll continue to tell people to use "realtime" LSM for audio
> work. The people this really affects probably won't use vanilla
> kernels anyway.

that is so not a constructive way to make progress.
The realtime LSM is the wrong concept. It's a hack to work around other
design issues with linux. *THAT* is what makes it wrong. Not the fact that
it wouldn't work (I believe it works, I don't think anyone doubts that
much). If you are unwilling to even discuss fixing the underlying design
issues then I'm scared that this issue will never come to any workable
solution.

2005-01-07 14:40:59

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>> rlimit_memlock limits the *amount* of memory that mlock() can be used
>> on, not whether mlock can be used. at least, that's my understanding of
>> the POSIX design for this. the man page and the source code for mlock
>> support make that reasonably clear.
>
>eh no. It defaults to zero, but if you increase it for a specific user, that
>user is allowed to mlock more.

from mm/mlock.c:do_mlock() in 2.6.8:

    if (on && !capable(CAP_IPC_LOCK))
            return -EPERM;

i.e. only root or capabilities can make mlock() usable.

>much). If you are unwilling to even discuss fixing the underlying design
>issues then I'm scared that this issue will never come to any workable
>solution.

Lee, Jack and I have been very willing to discuss the issue. Christoph
isn't willing to discuss it, he's just told us "it's the wrong design,
and I'm not telling you why or what's better". If there is a better
design that will end up in the mainstream kernel, we'd love to see it
implemented, and will likely be involved in doing it, because it's
really important to us.

--p

2005-01-07 14:43:17

by Arjan van de Ven

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 09:38:38AM -0500, Paul Davis wrote:
> >> rlimit_memlock limits the *amount* of memory that mlock() can be used
> >> on, not whether mlock can be used. at least, that's my understanding of
> >> the POSIX design for this. the man page and the source code for mlock
> >> support make that reasonably clear.
> >
> >eh no. It defaults to zero, but if you increase it for a specific user, that
> >user is allowed to mlock more.
>
> from mm/mlock.c:do_mlock() in 2.6.8:
>
>     if (on && !capable(CAP_IPC_LOCK))
>             return -EPERM;

now try 2.6.9 ;)
this deficiency got already fixed
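The difference can be summarized as a small userspace model. This is a simplified sketch, not the kernel source: it reflects our understanding of the post-2.6.9 rule (CAP_IPC_LOCK bypasses the check; otherwise RLIMIT_MEMLOCK must be nonzero and the total locked memory must fit within it), it works in bytes rather than pages, and the function name is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the mlock() permission rule after 2.6.9.
 * Returns 0 (allowed), -EPERM or -ENOMEM. Sizes are in bytes. */
int model_mlock_check(bool cap_ipc_lock, size_t memlock_limit,
                      size_t already_locked, size_t request)
{
    if (cap_ipc_lock)
        return 0;          /* privileged: no limit applies */
    if (memlock_limit == 0)
        return -EPERM;     /* in 2.6.8 every non-CAP_IPC_LOCK caller ended here */
    if (request > memlock_limit || already_locked > memlock_limit - request)
        return -ENOMEM;    /* would exceed the per-process limit */
    return 0;
}
```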

2005-01-07 14:47:29

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 09:38:38AM -0500, Paul Davis wrote:
> Lee, Jack and I have been very willing to discuss the issue. Christoph
> isn't willing to discuss it, he's just told us "it's the wrong design,
> and I'm not telling you why or what's better". If there is a better
> design that will end up in the mainstream kernel, we'd love to see it
> implemented, and will likely be involved in doing it, because it's
> really important to us.

Calm down and read through the thread again.

2005-01-07 15:26:41

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>On Fri, Jan 07, 2005 at 09:38:38AM -0500, Paul Davis wrote:
>> Lee, Jack and I have been very willing to discuss the issue. Christoph
>> isn't willing to discuss it, he's just told us "it's the wrong design,
>> and I'm not telling you why or what's better". If there is a better
>> design that will end up in the mainstream kernel, we'd love to see it
>> implemented, and will likely be involved in doing it, because it's
>> really important to us.
>
>Calm down and read through the thread again.

Sure, lets. Distilling out the responses from kernel developers:

======================================================================

Christoph:
---------
This is far too specialized. An option to the capability LSM to grant
capabilities to certain uids/gids sounds like the better choice - and
would also allow to get rid of the magic hugetlb uid horrors.

Which still doesn't mean it's the right design. And no, I don't need the
feature so I won't write it. If you want a certain feature it's up to
you to implement it in a way that's considered mergeable.

Alan:
-----
The problem with uid/gid based hacks is that they get really ugly to
administer really fast. Especially once you have users who need realtime
and hugetlb, and users who need one only.

It would be far cleaner to split CAP_SYS_NICE capability down - which
should cover the real time OS functions nicely. Right now it gives a few
too many rights but that could be fixed easily.

gid hacks are not a good long term plan.

Can we use capabilities, if not - why not and how do we fix it so we can
do the job right. Do we need some more capability bits that are
implicitly inherited and not touched by setuidness ?

Andrew:
-------

capabilities don't work :(

Herbert:
--------

well, maybe it is time to fix them ..

I already proposed some methods to extend them,
and I'm also willing to dig into the various things
required to allow to use the capability system for
what it was intended.

Matt:
-----

You can't fix them without changing the semantics for existing users
in ways they didn't expect. It could be done with a new personality flag,
but..

Alan:
-----
I disagree. At the most trivial you could just add another 32 bits of
sticky capability that are never touched by setuid/non-setuidness and
represent additional "user" (or more rightly session) abilities to do
limited overrides.

Olaf:
-----
Capabilities don't work, because of missing filesystem
capabilities. If you have them, it's a question of setting the
appropriate permitted, inheritable and effective capability sets.

I didn't follow the whole thread. But if you want to grant
capabilities on a per user/group basis, may I suggest accessfs user
based capabilities, for example? :-)

======================================================================
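For reference, the permitted, inheritable and effective sets mentioned above can be inspected per process. A minimal sketch, assuming a current Linux kernel and its version-3 capget ABI with 64-bit sets (kernels of this thread's era used 32-bit sets); the helper names are hypothetical, and the raw syscall is used to avoid a libcap dependency.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The three capability sets of one process, as 64-bit masks. */
struct cap_sets {
    uint64_t effective, permitted, inheritable;
};

/* Read the sets of `pid` (0 = caller). Returns 0, or -1 with errno set. */
int read_cap_sets(pid_t pid, struct cap_sets *out)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = pid,
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
    if (syscall(SYS_capget, &hdr, data) == -1)
        return -1;
    /* data[0] holds bits 0-31, data[1] bits 32-63. */
    out->effective   = ((uint64_t)data[1].effective   << 32) | data[0].effective;
    out->permitted   = ((uint64_t)data[1].permitted   << 32) | data[0].permitted;
    out->inheritable = ((uint64_t)data[1].inheritable << 32) | data[0].inheritable;
    return 0;
}

/* Test one capability bit, e.g. CAP_SYS_NICE or CAP_IPC_LOCK. */
bool has_cap(uint64_t set, int cap)
{
    return (set >> cap) & 1u;
}
```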

So, we have a few responses, some references to various potential
solutions all of which have problems just as deep if not deeper than
the uid/gid-based model that this particular LSM adopts. No proposal
for any system that would actually work and address anyone's real
needs in a useful way. Please recall that we developed a
capability-based solution for 2.4, but it was cumbersome because the
vanilla kernel doesn't have capabilities enabled and there are lots of
reasons to not enable them given their current status.

Meanwhile, Jack already provided a very detailed, cross-referenced and
clear explanation of why various other ideas won't work very well from
a user-space perspective. And in this thread, both Lee and Jack have
attempted to deal with issues that have been raised about the uid/gid
approach.

In summary, on the one hand, we have a working, defensible solution,
and on the other some misgivings and suggestions to try again at
implementing some more generic privilege-granting system, something
that lkml has been arguing about for years, along with the rest of the
OS design community. Something that I suspect will never be properly
resolved, merely "muddled towards". There is no right way to grant
privileges - there are many ways, and the benefits and downfalls of
each depend on what you are trying to achieve. For years, POSIX based
systems have relied on uid/gid solutions and they continue to do
so. People understand how to manage them (as best as can be done), and
what the issues are. Capabilities were supposed to be the solution to
this, and instead have essentially been a dead-end. So I trust that
you'll be understanding of any scepticism that I might have of the
suggestion that we go away and work on some other "more generic"
system.

--p

2005-01-07 15:28:40

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>now try 2.6.9 ;)
>this deficiency got already fixed

well that's good, i hope someone updated the man page too :)

but is there actually any way to grant specific users a reasonable
rlimit, or are you proposing that we adopt another "bad idea" from OS
X and let everybody do this?

--p

2005-01-07 15:34:20

by Arjan van de Ven

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


On Fri, Jan 07, 2005 at 10:27:33AM -0500, Paul Davis wrote:
> >now try 2.6.9 ;)
> >this deficiency got already fixed
>
> well that's good, i hope someone updated the man page too :)
>
> but is there actually any way to grant specific users a reasonable
> rlimit,

yes; most distributions will use pam for this, you can set per user or per
group limits there.

2005-01-07 15:42:25

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>> well, that's good, i hope someone updated the man page too :)
>>
>> but is there actually any way to grant specific users a reasonable
>> rlimit,
>
>yes; most distributions will use pam for this, you can set per user or per
>group limits there.

isn't that a uid/gid based system? ok, i'm being a little snide :)

fine, so the mlock situation may have improved enough post-2.6.9 that
it can be considered fixed. that leaves the scheduler issue. but
apparently, a uid/gid solution is OK for mlock, and not for the
scheduler. am i missing something?

--p


2005-01-07 16:04:57

by Arjan van de Ven

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 10:41:40AM -0500, Paul Davis wrote:
>
> fine, so the mlock situation may have improved enough post-2.6.9 that
> it can be considered fixed. that leaves the scheduler issue. but
> apparently, a uid/gid solution is OK for mlock, and not for the
> scheduler. am i missing something?

I think you skipped a step. You don't have a scheduler requirement, you have
a latency requirement. You currently *solve* that latency requirement via a
scheduler "hack", yet it is quite clear that the "hard" realtime solution is
most likely not the right approach. Note that I'm not saying that you
shouldn't get the latency that that currently provides, but the downsides
(can hang the machine) are bad; a solution that solves that would be far
preferable.
Something like a soft realtime flag that acts as if it's the hard realtime
one unless the app shows "misbehavior" (eg eats its timeslice for X times in
a row) might for example be such a solution. And with the anti-abuse
protection it can run with far lighter privileges.
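A minimal sketch of the kind of policy Arjan describes, assuming a hypothetical per-task counter (`full_slice_streak`) and threshold (`max_streak`, his "X"); none of these names come from any posted kernel patch. The task keeps realtime treatment until it uses up its entire timeslice `max_streak` times in a row:

```c
#include <stdbool.h>

/* Hypothetical per-task state for a "soft realtime" policy. */
struct soft_rt_task {
    unsigned int full_slice_streak; /* consecutive timeslices used up entirely */
    unsigned int max_streak;        /* Arjan's "X": misbehavior threshold */
    bool demoted;                   /* set once the task has misbehaved */
};

/*
 * Account for one scheduling period. used_full_slice is true when the task
 * ran until its timeslice expired instead of blocking on its own (e.g. to
 * wait for the next audio interrupt). Returns true while the task should
 * still be treated like SCHED_FIFO, false once it has been demoted.
 */
bool soft_rt_account(struct soft_rt_task *t, bool used_full_slice)
{
    if (t->demoted)
        return false;

    if (used_full_slice)
        t->full_slice_streak++;
    else
        t->full_slice_streak = 0; /* blocking in time resets the streak */

    if (t->full_slice_streak >= t->max_streak)
        t->demoted = true;

    return !t->demoted;
}
```

Demotion is permanent here for simplicity; a real scheduler would presumably drop the task to normal priority and might let it earn realtime treatment back later.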

2005-01-07 16:04:23

by Martin Mares

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Hello!

> >yes; most distributions will use pam for this, you can set per user or per
> >group limits there.
>
> isn't that a uid/gid based system? ok, i'm being a little snide :)

:) The big difference between this and a pure uid/gid based system is that
pam_limits is not the only place where you can change the ulimits. If your
system is simple enough that deciding on uid/gid is enough, you can use
pam_limits; if not and you for example want to make the limits depend
on the phase of the moon, it's easy to do so -- just write a simple user space
program which will set the limits accordingly. Also, if the user wishes to
restrict his abilities, because he's going to do some experiment and he
doesn't want to lock up the machine, he can easily do so.
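As a sketch of Martin's last point, assuming a Linux system: an unprivileged process can always lower its own soft limit with the standard `setrlimit()` call, no special module required. The helper name `restrict_memlock` is illustrative only:

```c
#include <sys/resource.h>

/*
 * Lower this process's soft RLIMIT_MEMLOCK to at most new_soft bytes.
 * Shrinking a limit needs no privilege. The hard limit is left alone, so
 * the soft limit can later be raised again, up to the hard limit.
 * Returns 0 on success, -1 on error (errno set by getrlimit/setrlimit).
 */
int restrict_memlock(rlim_t new_soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    /* On Linux RLIM_INFINITY is the largest rlim_t, so it compares greater. */
    if (rl.rlim_cur > new_soft)
        rl.rlim_cur = new_soft;
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```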

Except for filesystem permissions, I think that it's exactly the usual UNIX
way of controlling access -- the kernel takes care of access checks based
on some trivial attributes like ulimits and capabilities, and user space
decides who should get which. I don't see any reason why the right to use
realtime scheduling should be treated differently. Do you?

It's quite probable that the current system of capabilities is not well
suited for this, but I think that although it's tempting to work around it
by introducing a new security module, in the long term it's much better
to extend and/or fix the capabilities -- I don't see any fundamental reason
for capabilities being unusable for this goal, it's much more likely to be
just minor details in the implementation.

Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Always remember that you are absolutely unique ... just like everyone else.

2005-01-07 16:08:28

by Martin Mares

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Hello!

> Olaf:
> -----
> Capabilities don't work, because of missing filesystem
> capabilities. If you have them, it's a question of setting the
> appropriate permitted, inheritable and effective capability sets.

Sure, filesystem capabilities would be nice, but for the stuff Paul
mentions they aren't needed -- what you need is to grant capabilities
to the user's session, which can be easily done by a PAM module.

Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"C++: an octopus made by nailing extra legs onto a dog." -- Steve Taylor

2005-01-07 16:14:46

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>Sure, filesystem capabilities would be nice, but for the stuff Paul
>mentions they aren't needed -- what you need is to grant capabilities
>to the user's session, which can be easily done by a PAM module.

i think this is true only if the kernel comes with capabilities
enabled.

various media-centric distributions (CCRMA, demudi, dyne:bolic and
others) enabled them for their 2.4 kernels, but not the major
desktop-centric ones. then the impression began to spread that in
2.6, capabilities were an even more questionable mechanism to use.
In addition, the LSM system appeared, and seemed to offer a much
better solution entirely: no need to patch the kernel at all, or at
least it appeared to be so in the beginning. Hence the "realtime" LSM.

--p

2005-01-07 16:20:31

by Takashi Iwai

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

At Fri, 7 Jan 2005 17:03:51 +0100,
Arjan van de Ven wrote:
>
> On Fri, Jan 07, 2005 at 10:41:40AM -0500, Paul Davis wrote:
> >
> > fine, so the mlock situation may have improved enough post-2.6.9 that
> > it can be considered fixed. that leaves the scheduler issue. but
> > apparently, a uid/gid solution is OK for mlock, and not for the
> > scheduler. am i missing something?
>
> I think you skipped a step. You don't have a scheduler requirement, you have
> a latency requirement. You currently *solve* that latency requirement via a
> scheduler "hack", yet it is quite clear that the "hard" realtime solution is
> most likely not the right approach. Note that I'm not saying that you
> shouldn't get the latency that that currently provides, but the downsides
> (can hang the machine) are bad; a solution that solves that would be far
> preferable.
> Something like a soft realtime flag that acts as if it's the hard realtime
> one unless the app shows "misbehavior" (eg eats its timeslice for X times in
> a row) might for example be such a solution. And with the anti-abuse
> protection it can run with far lighter privileges.

This reminds me of the soft-RT patch posted quite some time ago.
I feel such a handy pseudo-RT scheduler class would be useful for
other systems than JACK, too...


Takashi

2005-01-07 16:21:40

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>On Fri, Jan 07, 2005 at 10:41:40AM -0500, Paul Davis wrote:
>>
>> fine, so the mlock situation may have improved enough post-2.6.9 that
>> it can be considered fixed. that leaves the scheduler issue. but
>> apparently, a uid/gid solution is OK for mlock, and not for the
>> scheduler. am i missing something?
>
>I think you skipped a step. You don't have a scheduler requirement, you have
>a latency requirement. You currently *solve* that latency requirement via a
>scheduler "hack", yet it is quite clear that the "hard" realtime solution is
>most likely not the right approach. Note that I'm not saying that you

Why is that clear? In just about every respect, realtime audio has the
same characteristics as hard realtime, except that nobody gets hurt
when a deadline is missed :) We have an IRQ source, and a deadline
(sometimes on the sub-msec range, but more typically 1-5msec) for the
work that has to be done. This deadline is tight enough that the task
essentially *has* to run with SCHED_FIFO scheduling, because doing
almost anything else instead will cause the deadline to be missed.
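For reference, this is roughly how a process asks for SCHED_FIFO through the standard `sched_setscheduler()` call; the wrapper name `request_fifo` is hypothetical. Without CAP_SYS_NICE the call normally fails with EPERM, which is the gap the realtime LSM is meant to fill:

```c
#include <errno.h>
#include <sched.h>

/*
 * Ask for SCHED_FIFO at the given priority for the calling process.
 * Returns 0 on success, otherwise the errno from sched_setscheduler().
 * An unprivileged caller without CAP_SYS_NICE normally gets EPERM.
 */
int request_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) == 0)
        return 0;
    return errno;
}
```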

>shouldn't get the latency that that currently provides, but the downsides
>(can hang the machine) are bad; a solution that solves that would be far
>preferable.

OS X's deadline scheduler is arguably better, though I don't believe
it can actually offer the guarantees it claims to with 100%
reliability. But they essentially do hard realtime via deadline
scheduling, combined with a task killer for any RT task that exceeds
its stated cycle consumption.

To do that in Linux would be great, but it's really an addition to the
current scheduling mechanisms, not a replacement. The OS X realtime
task (it's actually a Mach RT thread, to be more precise) can still
theoretically cause DOS *if* the kernel task killer was not present,
so it's just the task killer that would be needed, presumably driven by
the timer interrupt.
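A rough sketch of the "task killer" half of Paul's suggestion, assuming each realtime task declares a CPU budget per period and a hypothetical check driven by the timer tick charges it. The struct and function names are invented for illustration, this models only the budget check (not OS X's actual scheduler), and a real implementation would kill or demote the task rather than just set a flag:

```c
#include <stdbool.h>

/* Hypothetical contract: CPU time a realtime task may use per period. */
struct rt_contract {
    unsigned long budget_us; /* declared CPU time per period */
    unsigned long used_us;   /* CPU time charged so far this period */
    bool killed;             /* set once the task has overrun its budget */
};

/* Start of a new period (e.g. the next audio interrupt): reset usage. */
void rt_new_period(struct rt_contract *c)
{
    c->used_us = 0;
}

/*
 * Charge tick_us of CPU time, as a check run from the timer interrupt
 * might. Returns true while the task is within budget; once it overruns,
 * it is marked killed and stays that way.
 */
bool rt_charge(struct rt_contract *c, unsigned long tick_us)
{
    if (c->killed)
        return false;
    c->used_us += tick_us;
    if (c->used_us > c->budget_us)
        c->killed = true;
    return !c->killed;
}
```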

>Something like a soft realtime flag that acts as if it's the hard realtime
>one unless the app shows "misbehavior" (eg eats its timeslice for X times in
>a row) might for example be such a solution. And with the anti-abuse
>protection it can run with far lighter privileges.

i guess we're suggesting almost the same thing, except that i consider
this to be hard realtime plus a task killer, not "soft realtime
pretending to be hard realtime" :)

--p

2005-01-07 16:24:24

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>It's quite probable that the current system of capabilities is not well
>suited for this, but I think that although it's tempting to work around it
>by introducing a new security module, in the long term it's much better
>to extend and/or fix the capabilities -- I don't see any fundamental reason
>for capabilities being unusable for this goal, it's much more likely to be
>just minor details in the implementation.

capabilities work - we use them in 2.4 where a helper suid application
gets the ball rolling, and then its child grants capabilities to new
clients.

the problem we have with capabilities is that capabilities are not
enabled by default in the vanilla kernel, and there seems to be
considerable advice suggesting that they should not be enabled.

--p

2005-01-07 16:29:10

by Martin Mares

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Hello!

> i think this is true only if the kernel comes with capabilities
> enabled.
>
> various media-centric distributions (CCRMA, demudi, dyne:bolic and
> others) enabled them for their 2.4 kernels, but not the major
> desktop-centric ones. then the impression began to spread that in
> 2.6, capabilities were an even more questionable mechanism to use.
> In addition, the LSM system appeared, and seemed to offer a much
> better solution entirely: no need to patch the kernel at all, or at
> least it appeared to be so in the beginning. Hence the "realtime" LSM.

Yes, but is there really some difference between people having to enable
LSM and add a new LSM module, and people recompiling the kernel to include
capabilities?

Also, is somebody really shipping 2.4 kernels without capabilities?
I'm unable to find any such config switch in 2.4.28 -- maybe it's because
I'm almost sleeping now, but it doesn't seem to be there.

Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
return(ECRAY); /* Program exited before being run */

2005-01-07 16:37:27

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>Yes, but is there really some difference between people having to enable
>LSM and add a new LSM module, and people recompiling the kernel to include
>capabilities?

Well, one is a configuration issue, the other involves hacking the
kernel headers before recompiling. Maybe you and I would not see much
difference, but many people would. One of them says "the kernel gang
think this is OK to use if you want to", the other one says "err, you
can do this but don't call me if it goes wrong".

>Also, is somebody really shipping 2.4 kernels without capabilities?
>I'm unable to find any such config switch in 2.4.28 -- maybe it's because
>I'm almost sleeping now, but it doesn't seem to be there.

They are present but disabled by default. You have to hack the initial
values of CAP_INIT_EFF_SET and CAP_INIT_INH_SET.

--p

2005-01-07 16:40:14

by Takashi Iwai

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

At Fri, 7 Jan 2005 17:29:02 +0100,
Martin Mares wrote:
>
> Hello!
>
> > i think this is true only if the kernel comes with capabilities
> > enabled.
> >
> > various media-centric distributions (CCRMA, demudi, dyne:bolic and
> > others) enabled them for their 2.4 kernels, but not the major
> > desktop-centric ones. then the impression began to spread that in
> > 2.6, capabilities were an even more questionable mechanism to use.
> > In addition, the LSM system appeared, and seemed to offer a much
> > better solution entirely: no need to patch the kernel at all, or at
> > least it appeared to be so in the beginning. Hence the "realtime" LSM.
>
> Yes, but is there really some difference between people having to enable
> LSM and add a new LSM module, and people recompiling the kernel to include
> capabilities?

For distributors, it's much easier to provide an additional module
than to let people recompile kernels.


Takashi

2005-01-07 16:44:24

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

[added Paul to cc:]

On Mon, 2005-01-03 at 15:15 +0100, Arjan van de Ven wrote:
> On Mon, 2005-01-03 at 14:03 +0000, Christoph Hellwig wrote:
> > On Wed, Dec 29, 2004 at 09:43:22PM -0500, Lee Revell wrote:
> > > The realtime LSM has been previously explained on this list. Its
> > > function is to allow selected nonroot users to run RT tasks. The most
> > > common application is low latency audio with JACK, http://jackit.sf.net.
> > >
> >
> > This is far too specialized. And option to the capability LSM to grant
> > capabilities to certain uids/gids sounds like the better choise - and
> > would also allow to get rid of the magic hugetlb uid horrors.
> those can go away anyway now that there is an rlimit to achieve the
> exact same thing.....
>
> I can see the point of making an rlimit like thing instead for both the
> nice levels allowed and maybe the "can do rt" bit
>

How about a "max RT prio" rlimit, that defaults to -1 (can't do RT).
Set it to 90 or something for audio users so you can still run a higher
prio watchdog thread.
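Lee's proposed limit could be checked roughly as follows. This is a sketch under his stated semantics (-1 means no realtime scheduling at all); `may_set_rt_prio` and `RT_PRIO_NONE` are hypothetical names, not an existing kernel interface:

```c
#include <stdbool.h>

/* Lee's proposed default: the user may not use realtime scheduling. */
#define RT_PRIO_NONE (-1)

/*
 * Hypothetical check for a "max RT prio" rlimit. Privileged callers are
 * unrestricted; everyone else may request SCHED_FIFO/SCHED_RR priorities
 * up to max_rt_prio, and nothing at all when it is RT_PRIO_NONE.
 */
bool may_set_rt_prio(int max_rt_prio, int requested, bool privileged)
{
    if (privileged)
        return true;
    if (max_rt_prio == RT_PRIO_NONE)
        return false;
    return requested <= max_rt_prio;
}
```

With the limit at 90, audio threads can use priorities up to 90, while a privileged watchdog at, say, 95 can still preempt a runaway audio thread.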

Lee


2005-01-07 16:44:49

by Martin Mares

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Hello!

> > Yes, but is there really some difference between people having to enable
> > LSM and add a new LSM module, and people recompiling the kernel to include
> > capabilities?
>
> For distributors, it's much easier to provide an additional module
> than to let people recompile kernels.

Well, if LSM is enabled in the kernel, enabling capabilities should be
a single insmod, shouldn't it?

Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
The better the better, the better the bet.

2005-01-07 17:06:06

by Martin Mares

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Hello!

> They are present but disabled by default. You have to hack the initial
> values of CAP_INIT_EFF_SET and CAP_INIT_INH_SET.

Oops. Does anybody know why this has been done?

Also, it seems that it has a relatively easy work-around: boot with
init=/sbin/simple-wrapper and let the wrapper set the cap_bset and exec real
init. (I agree that it's a hack, but a temporarily usable one.)

Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"When I was a boy I was told that anybody could become President; I'm beginning to believe it." -- C. Darrow

2005-01-07 17:29:27

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Martin Mares ([email protected]) wrote:
> Hello!
>
> > They are present but disabled by default. You have to hack the initial
> > values of CAP_INIT_EFF_SET and CAP_INIT_INH_SET.
>
> Oops. Does anybody know why this has been done?

Yes, SETPCAP became a gaping security hole. Recall the sendmail hole.

> Also, it seems that it has a relatively easy work-around: boot with
> init=/sbin/simple-wrapper and let the wrapper set the cap_bset and exec real
> init. (I agree that it's a hack, but a temporarily usable one.)

This won't work, you can't increase the bset, which is hardcoded to
leave out SETPCAP. Also, init is hard coded to start without SETPCAP.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-07 17:32:51

by Martin Mares

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Hello!

> Yes, SETPCAP became a gaping security hole. Recall the sendmail hole.

Hmmm, I don't remember now, could you give me some pointer, please?

> This won't work, you can't increase the bset, which is hardcoded to
> leave out SETPCAP. Also, init is hard coded to start without SETPCAP.

If I read the source correctly, init is allowed to increase the bset,
the other processes aren't.

Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
American patent law: two monkeys, fourteen days.

2005-01-07 17:41:12

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Martin Mares ([email protected]) wrote:
> Hello!
>
> > Yes, SETPCAP became a gaping security hole. Recall the sendmail hole.
>
> Hmmm, I don't remember now, could you give me some pointer, please?

Sure, the Wagner/Chen paper on setuid demystified has some references to
it IIRC. http://www.cs.ucdavis.edu/~hchen/paper/usenix02.ps

> > This won't work, you can't increase the bset, which is hardcoded to
> > leave out SETPCAP. Also, init is hard coded to start without SETPCAP.
>
> If I read the source correctly, init is allowed to increase the bset,
> the other processes aren't.

Yes, you're right I forgot about that.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-07 17:56:32

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Paul Davis ([email protected]) wrote:
> So, we have a few responses, some references to various potential
> solutions all of which have problems just as deep if not deeper than
> the uid/gid-based model that this particular LSM adopts. No proposal
> for any system that would actually work and address anyone's real
> needs in a useful way.

I don't think that's quite true. One repeated recommendation was to
simply generalize the idea so that it applies to all capabilities.
Another, which at this point appears quite workable, was Arjan's
recommendation to make scheduling policy/priority protected by an rlimit
(complicated only by representing the combinations sanely in a single
number).

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-07 18:04:47

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Arjan van de Ven ([email protected]) wrote:
> eh no. It defaults to zero, but if you increase it for a specific user, that
> user is allowed to mlock more.

Actually, I think it defaults to 32k to keep gpg happy (at least in
mainline) ;-)
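To see which limit a given session actually ended up with, a program can read it back with `getrlimit()`. The default depends on the kernel version and distribution, so this sketch simply reports it; the helper name is illustrative:

```c
#include <sys/resource.h>

/*
 * Return the calling process's soft RLIMIT_MEMLOCK in bytes
 * (RLIM_INFINITY if unlimited), or 0 if it cannot be read.
 */
rlim_t memlock_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}
```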

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-07 19:57:17

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Martin Mares <[email protected]> writes:

>> Yes, SETPCAP became a gaping security hole. Recall the sendmail hole.
>
> Hmmm, I don't remember now, could you give me some pointer, please?

I already did that...

> Jack O'Quin wrote:
> > The biggest problem was CAP_SETPCAP, which for good reasons[1] is
> > disabled in distributed kernels. This forced every user to patch and
> > build a custom kernel. Worse, it opened all our systems up to the
> > problems reported by this sendmail security advisory.

[1] http://www.securiteam.com/unixfocus/5KQ040A1RI.html

--
joq

2005-01-07 20:08:48

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 01:55:09AM +0000, Alan Cox wrote:
> On Fri, 2005-01-07 at 01:13, Matt Mackall wrote:
> > You can't fix them without changing the semantics for existing users
> > in ways they didn't expect. It could be done with a new personality flag,
> > but..
>
> I disagree. At the most trivial you could just add another 32bits of
> sticky capability that are never touched by setuid/non-setuidness and
> represent additional "user" (or more rightly session) abilities to do
> limited overrides

I think we're referring to different brokenness. The problems I see
are with the semantics of inheritance of capabilities which make
wrapping applications painful. Those can't be changed without creating
holes in existing apps so the general utility of caps is limited.

--
Mathematics is the supreme nostalgia of our time.

2005-01-07 20:05:48

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, Jan 06, 2005 at 11:54:05PM -0600, Jack O'Quin wrote:
> Note that sched_setscheduler() provides no way to handle the mlock()
> requirement, which cannot be done from another process.

I'm pretty sure that part can be done by a privileged server handing
out mlocked shared memory segments.

The trouble with introducing something into the kernel is that once
done, it can't be undone. So you're absolutely going to meet
resistance to anything that can be a) done sufficiently in userspace
or b) can reasonably be done in a more generic manner so as to meet
the needs of a wider future audience. The onus is on the submitter to
meet these requirements because we can't easily kick out a broken API
after we accept it.

--
Mathematics is the supreme nostalgia of our time.

2005-01-07 20:30:01

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Matt Mackall <[email protected]> writes:

> On Thu, Jan 06, 2005 at 11:54:05PM -0600, Jack O'Quin wrote:
>> Note that sched_setscheduler() provides no way to handle the mlock()
>> requirement, which cannot be done from another process.
>
> I'm pretty sure that part can be done by a privileged server handing
> out mlocked shared memory segments.

If you're "pretty sure", please explain how locking a shared memory
segment prevents the code and stack of the client's realtime thread
from page faulting.
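Jack's objection can be made concrete. The usual way a realtime client avoids page faults is to lock its own address space with `mlockall()`, which covers code, data, heap and stack; a segment locked by a server covers only that segment. A minimal sketch (the wrapper name is hypothetical):

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Lock every page currently mapped by the calling process (code, data,
 * heap, stack, shared libraries) and every page it maps in future, so a
 * realtime thread does not stall on a page fault. mlockall() acts only on
 * the caller's own address space. Returns 0 or the errno value.
 */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == 0)
        return 0;
    return errno;
}
```

Since 2.6.9 an unprivileged caller is bounded by RLIMIT_MEMLOCK, so on a default setup this typically fails with ENOMEM (or EPERM if the limit is zero) unless the limit has been raised.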
--
joq

2005-01-07 20:24:34

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Matt Mackall ([email protected]) wrote:
> On Thu, Jan 06, 2005 at 11:54:05PM -0600, Jack O'Quin wrote:
> > Note that sched_setscheduler() provides no way to handle the mlock()
> > requirement, which cannot be done from another process.
>
> I'm pretty sure that part can be done by a privileged server handing
> out mlocked shared memory segments.

It can actually be done with plain ol' rlimits (RLIMIT_MEMLOCK).

> The trouble with introducing something into the kernel is that once
> done, it can't be undone. So you're absolutely going to meet
> resistance to anything that can be a) done sufficiently in userspace
> or b) can reasonably be done in a more generic manner so as to meet
> the needs of a wider future audience. The onus is on the submitter to
> meet these requirements because we can't easily kick out a broken API
> after we accept it.

Indeed (although in this case it's not adding an API as much as using an
existing one).

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-07 20:46:07

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-07 at 12:02 -0800, Matt Mackall wrote:
> The trouble with introducing something into the kernel is that once
> done, it can't be undone. So you're absolutely going to meet
> resistance to anything that can be a) done sufficiently in userspace
> or b) can reasonably be done in a more generic manner so as to meet
> the needs of a wider future audience. The onus is on the submitter to
> meet these requirements because we can't easily kick out a broken API
> after we accept it.

For a big subsystem that exposes an API, you would be right. But this
is a *really* simple problem, all you need is a way to tell it who gets
RT privileges, which means uid or gid. So any future solution will be
orthogonal to this one, and when users upgrade even a not very smart
Perl script will be able to migrate the configuration. How many
different ways are there to say "these are the non-root users who have
realtime privileges", anyway?

Unless, of course, the solution that's eventually merged is *really*
overcomplicated by comparison, in which case users will (rightly) reject
it, and the system will have worked.

Lee



2005-01-07 20:49:00

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 02:27:26PM -0600, Jack O'Quin wrote:
> Matt Mackall <[email protected]> writes:
>
> > On Thu, Jan 06, 2005 at 11:54:05PM -0600, Jack O'Quin wrote:
> >> Note that sched_setscheduler() provides no way to handle the mlock()
> >> requirement, which cannot be done from another process.
> >
> > I'm pretty sure that part can be done by a privileged server handing
> > out mlocked shared memory segments.
>
> If you're "pretty sure", please explain how locking a shared memory
> segment prevents the code and stack of the client's realtime thread
> from page faulting.

You just map your RT-dependent routine (PIC, of course) into the
segment and move your stack pointer into a second segment. I didn't
say it was easy, but it's all just bits. There's also the rlimit
issue.

Or, going the other way, the client app can pass map handles to the
server to bless. Some juggling might be involved but it's obviously
doable.

As has been pointed out, an rlimit solution exists now as well.

--
Mathematics is the supreme nostalgia of our time.

2005-01-07 20:56:56

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-07 at 12:46 -0800, Matt Mackall wrote:
> You just map your RT-dependent routine (PIC, of course) into the
> segment and move your stack pointer into a second segment. I didn't
> say it was easy, but it's all just bits. There's also the rlimit
> issue.
>
> Or, going the other way, the client app can pass map handles to the
> server to bless. Some juggling might be involved but it's obviously
> doable.
>

Christ, what a nightmare! Since when does "obviously doable" mean it's
a good idea? Please, reread your above statements, then go back and
look at the realtime LSM patch (it's less than 200 lines), and tell me
again that your way is more secure.

Please keep in mind that there are already 1000s of users using the
realtime LSM to do audio work. Sorry, but I will take a known good,
well understood, PROVEN solution over "it's obviously doable, it's all
bits anyway". Get back to me when you have some code, or at least some
reasonable suggestions as Alan, Christoph and others have made.

> As has been pointed out, an rlimit solution exists now as well.

Wrong, as was said repeatedly, rlimits only help with mlock! Have you
even been reading the thread?

Lee

2005-01-07 21:22:56

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 03:55:12PM -0500, Lee Revell wrote:
> On Fri, 2005-01-07 at 12:46 -0800, Matt Mackall wrote:
> > You just map your RT-dependent routine (PIC, of course) into the
> > segment and move your stack pointer into a second segment. I didn't
> > say it was easy, but it's all just bits. There's also the rlimit
> > issue.
> >
> > Or, going the other way, the client app can pass map handles to the
> > server to bless. Some juggling might be involved but it's obviously
> > doable.
> >
>
> Christ, what a nightmare! Since when does "obviously doable" mean it's
> a good idea? Please, reread your above statements, then go back and
> look at the realtime LSM patch (it's less than 200 lines), and tell me
> again that your way is more secure.

My way simply proves that existing userspace methods have not been
exhausted. It's not impossible as was claimed and cleaner methods or
nicely wrapped variants of the above probably exist. And yes, doing
ugly things in userspace is preferable to adding application-specific
baggage to the kernel.

> > As has been pointed out, an rlimit solution exists now as well.
>
> Wrong, as was said repeatedly, rlimits only help with mlock! Have you
> even been reading the thread?

Feh. The RT scheduling class issue is orthogonal. Addressing mlock and
scheduling class at once (and nothing else) is actually an ugliness of
your LSM approach as there are folks who want mlock and not RT.

--
Mathematics is the supreme nostalgia of our time.

2005-01-07 21:28:22

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-07 at 11:20 -0500, Paul Davis wrote:
> >On Fri, Jan 07, 2005 at 10:41:40AM -0500, Paul Davis wrote:
> >>
> >> fine, so the mlock situation may have improved enough post-2.6.9 that
> >> it can be considered fixed. that leaves the scheduler issue. but
> >> apparently, a uid/gid solution is OK for mlock, and not for the
> >> scheduler. am i missing something?
> >
> >I think you skipped a step. You don't have a scheduler requirement, you have
> >a latency requirement. You currently *solve* that latency requirement via a
> >scheduler "hack", yet it is quite clear that the "hard" realtime solution is
> >most likely not the right approach. Note that I'm not saying that you
>
> Why is that clear? In just about every respect, realtime audio has the
> same characteristics as hard realtime, except that nobody gets hurt
> when a deadline is missed :) We have an IRQ source, and a deadline
> (sometimes on the sub-msec range, but more typically 1-5msec) for the
> work that has to be done. This deadline is tight enough that the task
> essentially *has* to run with SCHED_FIFO scheduling, because doing
> almost anything else instead will cause the deadline to be missed.
>

It's not like hard realtime, it is. All that makes a hard RT system is
that missing a deadline means the system has utterly failed. How is
this any different than an xrun causing a loud pop or click in a live
performance?
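The deadlines Paul quotes follow directly from the buffer size: a period of N frames at a sample rate of R Hz has to be processed within N/R seconds, or the device runs dry (an xrun). A small helper, with illustrative figures:

```c
/*
 * Time available to process one audio period of `frames` frames at
 * `rate_hz` samples per second, in microseconds (rounded down). Missing
 * this deadline means the device under- or overruns (an xrun).
 */
unsigned long long period_us(unsigned long long frames,
                             unsigned long long rate_hz)
{
    return frames * 1000000ULL / rate_hz;
}
```

For example, 64 frames at 48 kHz leaves about 1.3 ms, and 32 frames leaves a sub-millisecond deadline.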

Really, I think Linux has owned the server space for so long that some
folks on this list are getting hubristic. Just because you have the
best server OS does not mean it's the best at everything.

Lee

2005-01-07 21:38:21

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Matt Mackall ([email protected]) wrote:
> Feh. The RT scheduling class issue is orthogonal. Addressing mlock and
> scheduling class at once (and nothing else) is actually an ugliness of
> your LSM approach as there are folks who want mlock and not RT.

Last I checked they could be controlled separately in that module. It
has been suggested (by me and others) that one possible solution would
be to expand it to be generic for all caps.
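One way to read Chris's suggestion, sketched with invented names: instead of a realtime-only switch, a table maps groups to arbitrary capability bits. The capability numbers are those in <linux/capability.h> (CAP_IPC_LOCK is 14, CAP_SYS_NICE is 23); the table format and lookup function are hypothetical. Such a table would also let an administrator grant mlock without realtime scheduling, the case Matt raised:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_IPC_LOCK_NR 14
#define CAP_SYS_NICE_NR 23

/* One hypothetical rule: members of `gid` get the bits set in `caps`. */
struct cap_grant {
    gid_t gid;
    uint64_t caps;
};

/* Return the union of capability bits granted by every rule whose group
 * appears in the user's group list. */
uint64_t granted_caps(const struct cap_grant *rules, size_t nrules,
                      const gid_t *groups, size_t ngroups)
{
    uint64_t caps = 0;

    for (size_t i = 0; i < nrules; i++)
        for (size_t j = 0; j < ngroups; j++)
            if (rules[i].gid == groups[j])
                caps |= rules[i].caps;
    return caps;
}
```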

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-07 21:46:02

by Andrew Morton

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Lee Revell <[email protected]> wrote:
>
> Really, I think Linux has owned the server space for so long that some
> folks on this list are getting hubristic. Just because you have the
> best server OS does not mean it's the best at everything.

nah, the requirement is clearly valid, and longstanding. We need to
satisfy it. It's just a matter of working out the best way.

Chris Wright <[email protected]> wrote:
>
> ...
> Last I checked they could be controlled separately in that module. It
> has been suggested (by me and others) that one possible solution would
> be to expand it to be generic for all caps.

Maybe this is the way?

2005-01-07 22:09:28

by Valdis Klētnieks

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 07 Jan 2005 13:49:41 PST, Andrew Morton said:

> Chris Wright <[email protected]> wrote:

> > Last I checked they could be controlled separately in that module. It
> > has been suggested (by me and others) that one possible solution would
> > be to expand it to be generic for all caps.
>
> Maybe this is the way?

We already *know* how to (in principle) fix the capabilities system to make
it useful. We should probably investigate doing that and at the same time
fixing the current CAP_SYS_ADMIN mess (which we also have at least some ideas
on fixing). The remaining problem is possible breakage of software that's doing
capability things The Old Way (as the inheritance rules are incompatible).

Linus at one time said that a 2.7 might open if there was some issue that
caused enough disruption to require a fork - could this be it, or does somebody
have a better way to address the backward-compatibility problem?

2005-01-07 22:17:58

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 01:49:41PM -0800, Andrew Morton wrote:
> Chris Wright <[email protected]> wrote:
> >
> > ...
> > Last I checked they could be controlled separately in that module. It
> > has been suggested (by me and others) that one possible solution would
> > be to expand it to be generic for all caps.
>
> Maybe this is the way?

It's at least not as bad as the current hack (when properly done in
the capabilities modules instead of adding one on top).

I must say I'm not exactly happy with that idea still. It ties the
privileges we have been separating from a special uid (0) to filesystem
permissions again. It's not necessarily a bad idea per se, but it doesn't
really fit into the model we've been working towards. I'd expect quite a few
unpleasant devices when a user detects that the distribution had been
binding various capabilities to uids/gids behind his back.

So to make forward progress I'd like the audio people to confirm whether
the mlock bits in 2.6.9+ do help that half of their requirement first
(and if not find a way to fix it) and then tackle the scheduling part.
For that one I really wonder whether the combination of the now actually
working nicelevels (see Mingo's post) and a simple wrapper for the really
high requirements cases doesn't work.

2005-01-07 22:34:08

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>> Last I checked they could be controlled separately in that module. It
>> has been suggested (by me and others) that one possible solution would
>> be to expand it to be generic for all caps.
>
>Maybe this is the way?

that would make a much more complex LSM, and thus opens the doors to
some inadvertent security hazard that doesn't arise in the simpler
tool we have now.

other than that, it's not a terrible suggestion at all, just a lot, lot
more work.

--p


2005-01-07 22:31:27

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Christoph Hellwig ([email protected]) wrote:
> On Fri, Jan 07, 2005 at 01:49:41PM -0800, Andrew Morton wrote:
> > Chris Wright <[email protected]> wrote:
> > >
> > > ...
> > > Last I checked they could be controlled separately in that module. It
> > > has been suggested (by me and others) that one possible solution would
> > > be to expand it to be generic for all caps.
> >
> > Maybe this is the way?
>
> It's at least not as bad as the current hack (when properly done in
> the capabilities modules instead of adding one on top).
>
> I must say I'm not exactly happy with that idea still. It ties the
> privileges we have been separating from a special uid (0) to filesystem
> permissions again. It's not necessarily a bad idea per se, but it doesn't
> really fit into the model we've been working towards. I'd expect quite a few
> unpleasant devices when a user detects that the distribution had been
> binding various capabilities to uids/gids behind his back.

I agree, it's still a hack, just a generic and complete hack ;-)

> So to make forward progress I'd like the audio people to confirm whether
> the mlock bits in 2.6.9+ do help that half of their requirement first

It sure should, but I guess they can reply on that.

> (and if not find a way to fix it) and then tackle the scheduling part.
> For that one I really wonder whether the combination of the now actually
> working nicelevels (see Mingo's post) and a simple wrapper for the really
> high requirements cases doesn't work.

I saw Jack (I think) post some numbers showing that it wasn't enough.
What about making priority level protected via rlimit?

Here's an uncompiled, untested patch doing that (probably has some math
error or logic hole in it, but idea seems sound enough). I think it has
at least one problem, where a nice 19 process could renice itself back to
0. And it doesn't really handle the different scheduling policies,
other than implicit 40 - 139 being used for SCHED_FIFO/SCHED_RR.

It takes the 140 priority levels (0-139), inverts their priority
order, and then uses that number as the basis for the rlimit (so that a
larger rlimit means higher priority, to fall in line with normal rlimit
semantics). Defaults to 19 (which should be niceval of 0). And allows
CAP_SYS_NICE to continue to override if the rlimit is too low.

===== kernel/sched.c 1.386 vs edited =====
--- 1.386/kernel/sched.c 2005-01-04 18:48:21 -08:00
+++ edited/kernel/sched.c 2005-01-07 14:23:32 -08:00
@@ -3009,12 +3009,8 @@ asmlinkage long sys_nice(int increment)
 	 * We don't have to worry. Conceptually one call occurs first
 	 * and we have a single winner.
 	 */
-	if (increment < 0) {
-		if (!capable(CAP_SYS_NICE))
-			return -EPERM;
-		if (increment < -40)
-			increment = -40;
-	}
+	if (increment < -40)
+		increment = -40;
 	if (increment > 40)
 		increment = 40;
 
@@ -3024,6 +3020,11 @@ asmlinkage long sys_nice(int increment)
 	if (nice > 19)
 		nice = 19;
 
+	if ((MAX_PRIO-1) - NICE_TO_PRIO(nice) >
+			current->signal->rlim[RLIMIT_PRIO].rlim_cur &&
+			!capable(CAP_SYS_NICE))
+		return -EPERM;
+
 	retval = security_task_setnice(current, nice);
 	if (retval)
 		return retval;
@@ -3057,6 +3058,15 @@ int task_nice(const task_t *p)
 }
 
 /**
+ * nice_to_prio - return priority of given nice value
+ * @nice: nice value
+ */
+int nice_to_prio(const int nice)
+{
+	return NICE_TO_PRIO(nice);
+}
+
+/**
  * idle_cpu - is a given cpu idle currently?
  * @cpu: the processor in question.
  */
@@ -3140,6 +3150,7 @@ recheck:
 
 	retval = -EPERM;
 	if ((policy == SCHED_FIFO || policy == SCHED_RR) &&
+	    lp.sched_priority+40 > p->signal->rlim[RLIMIT_PRIO].rlim_cur &&
 	    !capable(CAP_SYS_NICE))
 		goto out_unlock;
 	if ((current->euid != p->euid) && (current->euid != p->uid) &&
===== kernel/sys.c 1.102 vs edited =====
--- 1.102/kernel/sys.c 2005-01-06 23:25:46 -08:00
+++ edited/kernel/sys.c 2005-01-07 14:13:37 -08:00
@@ -225,7 +225,9 @@ static int set_one_prio(struct task_stru
 		error = -EPERM;
 		goto out;
 	}
-	if (niceval < task_nice(p) && !capable(CAP_SYS_NICE)) {
+	if ((MAX_PRIO-1) - nice_to_prio(niceval) >
+			p->signal->rlim[RLIMIT_PRIO].rlim_cur &&
+			!capable(CAP_SYS_NICE)) {
 		error = -EACCES;
 		goto out;
 	}
===== include/asm-i386/resource.h 1.5 vs edited =====
--- 1.5/include/asm-i386/resource.h 2004-08-23 01:15:26 -07:00
+++ edited/include/asm-i386/resource.h 2005-01-07 13:55:37 -08:00
@@ -18,8 +18,9 @@
 #define RLIMIT_LOCKS		10	/* maximum file locks held */
 #define RLIMIT_SIGPENDING	11	/* max number of pending signals */
 #define RLIMIT_MSGQUEUE		12	/* maximum bytes in POSIX mqueues */
+#define RLIMIT_PRIO		13	/* maximum scheduling priority */
 
-#define RLIM_NLIMITS	13
+#define RLIM_NLIMITS	14
 
 
 /*
@@ -45,6 +46,7 @@
 	{ RLIM_INFINITY, RLIM_INFINITY },		\
 	{ MAX_SIGPENDING, MAX_SIGPENDING },		\
 	{ MQ_BYTES_MAX, MQ_BYTES_MAX },			\
+	{ 19, 19 },					\
 }
 
 #endif /* __KERNEL__ */
===== include/linux/sched.h 1.280 vs edited =====
--- 1.280/include/linux/sched.h 2005-01-04 18:48:20 -08:00
+++ edited/include/linux/sched.h 2005-01-07 14:14:16 -08:00
@@ -760,6 +760,7 @@ extern void sched_idle_next(void);
 extern void set_user_nice(task_t *p, long nice);
 extern int task_prio(const task_t *p);
 extern int task_nice(const task_t *p);
+extern int nice_to_prio(const int nice);
 extern int task_curr(const task_t *p);
 extern int idle_cpu(int cpu);

2005-01-07 22:39:34

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* [email protected] ([email protected]) wrote:
> On Fri, 07 Jan 2005 13:49:41 PST, Andrew Morton said:
>
> > Chris Wright <[email protected]> wrote:
>
> > > Last I checked they could be controlled separately in that module. It
> > > has been suggested (by me and others) that one possible solution would
> > > be to expand it to be generic for all caps.
> >
> > Maybe this is the way?
>
> We already *know* how to (in principle) fix the capabilities system to make
> it useful. We should probably investigate doing that and at the same time
> fixing the current CAP_SYS_ADMIN mess (which we also have at least some ideas
> on fixing). The remaining problem is possible breakage of software that's doing
> capability things The Old Way (as the inheritance rules are incompatible).

Fixing CAP_SYS_ADMIN is a whole other can o' worms. No point in tangling the
two.

> Linus at one time said that a 2.7 might open if there was some issue that
> caused enough disruption to require a fork - could this be it, or does somebody
> have a better way to address the backward-compatibility problem?

There are at least two ways. Introduce a new capability module or introduce
a PF flag to opt in. Neither is great.

2005-01-07 22:47:56

by Andreas Steinmetz

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Andrew Morton wrote:
> Lee Revell <[email protected]> wrote:
>
>>Really, I think Linux has owned the server space for so long that some
>>folks on this list are getting hubristic. Just because you have the
>>best server OS does not mean it's the best at everything.
>
>
> nah, the requirement is clearly valid, and longstanding. We need to
> satisfy it. It's just a matter of working out the best way.
>
> Chris Wright <[email protected]> wrote:
>
>>...
>>Last I checked they could be controlled separately in that module. It
>>has been suggested (by me and others) that one possible solution would
>>be to expand it to be generic for all caps.
>
>
> Maybe this is the way?

This could give an advantage for e.g. networked daemons, too. No more
root privilege necessary for applications just to bind to a privileged
port which does make life easier (CAP_NET_BIND_SERVICE). Other ideas for
e.g. CAP_NET_RAW or CAP_SYS_RAWIO come to mind. Using the current
capabilities in this design as all-including supersets that can be defined
more fine-grained in a later step should, I guess, suit others, too. The
remaining problem would then be the design of an extensible interface
that is backwards compatible.

--
Andreas Steinmetz SPAMmers use [email protected]

2005-01-07 23:06:52

by Valdis Klētnieks

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 07 Jan 2005 14:36:38 PST, Chris Wright said:

> > We already *know* how to (in principle) fix the capabilities system to make
> > it useful. We should probably investigate doing that and at the same time
> > fixing the current CAP_SYS_ADMIN mess (which we also have at least some ideas
> > on fixing). The remaining problem is possible breakage of software that's doing
> > capability things The Old Way (as the inheritance rules are incompatible).
>
> Fixing CAP_SYS_ADMIN is a whole other can o' worms. No point in tangling the
> two.

Yes, it's two entire cans. The problem is that in *both* cases, we're probably
going to have to do an API change. It may be preferable to only require changes
on the userspace side once, rather than change it once to fix the inheritance
problems in 2.7/2.6.N+10 or whatever it will be, and then again in 2.9/2.6.N+20
or whatever....

> > Linus at one time said that a 2.7 might open if there was some issue that
> > caused enough disruption to require a fork - could this be it, or does somebody
> > have a better way to address the backward-compatibility problem?
>
> There are at least two ways. Introduce a new capability module or introduce
> a PF flag to opt in. Neither is great.

A new PF flag strikes me as marginally better, especially if we have a way to
propagate from ELF headers in a way similar to Execshield's use of elf_ex.e_phnum
to set the executable-stack...

2005-01-07 23:09:59

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-07 at 22:10 +0000, Christoph Hellwig wrote:
> It's not necessarily a bad idea per se, but it doesn't
> really fit into the model we've been working towards. I'd expect quite a few
> unpleasant devices when a user detects that the distribution had been
> binding various capabilities to uids/gids behind his back.
>

Point taken, but do keep in mind that this will *certainly* be disabled
by default, unless you run an audio oriented distro, and we assume those
people know what they're doing ;-)

> For that one I really wonder whether the combination of the now actually
> working nicelevels (see Mingo's post)

Ingo said "it should work". It currently doesn't, as you can see from
Jack's post. My concern here is, the semantics of SCHED_FIFO are well
defined and stable. The highest priority runnable SCHED_FIFO process
*always* runs. The semantics of "nice -20" apparently change from
release to release, as you can see. We can't have the scheduler
deciding to run something else when jackd needs to run because it
decides jackd is hogging the CPU or whatever. Everyone knows that when
dealing with realtime constraints the important case is not the average
but the worst.

In a live audio situation an xrun storm and a complete system lockup are
both catastrophic failures.

Lee

2005-01-07 23:21:06

by Andrew Morton

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

[email protected] wrote:
>
> fix the inheritance problems

Does anyone actually have a handle on what's involved in fixing the
inheritance problem?

It's risky, but it is something which we should do.

<grumpytroll> We really shouldn't have merged all that new fancy security
stuff when the existing security framework was known-badly-broken.
Especially as the new stuff seems incapable of doing simple things which
unbroken inherited caps would do perfectly.</grumpytroll>

2005-01-07 23:37:54

by Valdis Klētnieks

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 07 Jan 2005 15:20:04 PST, Andrew Morton said:

> Does anyone actually have a handle on what's involved in fixing the
> inheritance problem?

Andy Lutomirski was looking at that, and it's actually a very small but
incompatible change that allows filesystem support for set-capability files to
be actually usable. He posted some patches back in May....

2005-01-07 22:38:24

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>So to make forward progress I'd like the audio people to confirm whether
>the mlock bits in 2.6.9+ do help that half of their requirement first

it does, although it would be nicer to not have two separate
components to administering the usability of realtime applications.

>(and if not find a way to fix it) and then tackle the scheduling part.
>For that one I really wonder whether the combination of the now actually
>working nicelevels (see Mingo's post) and a simple wrapper for the really
>high requirements cases doesn't work.

Jack already posted results: the nice levels are massively inferior as
they currently stand.

The wrapper is incredibly inconvenient for applications: when you use
JACK, starting clients would require a different command depending on
whether JACK is using RT mode or not. That is extremely inelegant, and
it's why we've developed these solutions (caps+jackstart for 2.4,
"realtime" LSM for 2.6).

--p

2005-01-08 05:38:17

by Con Kolivas

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Takashi Iwai wrote:
> At Fri, 7 Jan 2005 17:03:51 +0100,
> Arjan van de Ven wrote:
>>something like a soft realtime flag that acts as if it's the hard realtime
>>one unless the app shows "misbehavior" (eg eats its timeslice for X times in
>>a row) might for example be such a solution. And with the anti abuse
>>protection it can run with far lighter privileges.
>
>
> This reminds me about the soft-RT patch posted quite some time ago.
> I feel such a handy pseudo-RT scheduler class would be useful for
> other systems than JACK, too...

You've already proven that soft RT does not suit your requirements. The
current scheduler running a task at nice -20 has extremely long periods
of cpu availability at the expense of lower priority tasks and is close
to the behaviour you would get with a soft RT patch. Your concern is
exactly the scenario where nice -20 fails, and would be the same
scenario where a soft RT policy would fail. Doing this with a scheduling
policy, you want cpu time long after there is any hope for fairness or
safety from hanging. From experimentation with such soft RT policies, we
find average latencies can be reduced but the maximum ones, which are
the ones that concern professional audio, remain the same.

Cheers,
Con

2005-01-08 09:44:48

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Con Kolivas <[email protected]> writes:

> Takashi Iwai wrote:
>> At Fri, 7 Jan 2005 17:03:51 +0100,
>> Arjan van de Ven wrote:
>>>something like a soft realtime flag that acts as if it's the hard realtime
>>>one unless the app shows "misbehavior" (eg eats its timeslice for X times in
>>>a row) might for example be such a solution. And with the anti abuse
>>>protection it can run with far lighter privileges.

>> This reminds me about the soft-RT patch posted quite some time ago.
>> I feel such a handy pseudo-RT scheduler class would be useful for
>> other systems than JACK, too...
>
> You've already proven that soft RT does not suit your
> requirements. The current scheduler running a task at nice -20 has
> extremely long periods of cpu availability at the expense of lower
> priority tasks and is close to the behaviour you would get with a soft
> RT patch. Your concern is exactly the scenario where nice -20 fails,
> and would be the same scenario where a soft RT policy would
> fail. Doing this with a scheduling policy, you want cpu time long
> after there is any hope for fairness or safety from hanging. From
> experimentation with such soft RT policies, we find average latencies
> can be reduced but the maximum ones, which are the ones that concern
> professional audio, remain the same.

Yes, this is exactly right. The corrected test results I just posted
support your contention.

For realtime, most of the OS tricks we all know and love are
counter-productive. It's the worst case that matters, not the
average.
--
joq

2005-01-08 09:48:21

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Chris Wright <[email protected]> writes:

> * Christoph Hellwig ([email protected]) wrote:
>> So to make forward progress I'd like the audio people to confirm whether
>> the mlock bits in 2.6.9+ do help that half of their requirement first
>
> It sure should, but I guess they can reply on that.

That does seem to work now (finally). It looks like that longstanding
CAP_IPC_LOCK bug is finally fixed, too.

I find it hard to understand why some of you think PAM is an adequate
solution. As currently deployed, it is poorly documented and nearly
impossible for non-experts to administer securely. On my Debian woody
system, when I login from the console I get one fairly sensible set of
ulimit values, but from gdm I get a much more permissive set (with
unlimited mlocking, BTW). Apparently, this is because the `gdm' PAM
config includes `session required pam_limits.so' but the system comes
with an empty /etc/security/limits.conf. I'm just guessing about that
because I can't find any decent documentation for any of this crap.

Remember, if something is difficult to administer, it's *not* secure.

>> (and if not find a way to fix it) and then tackle the scheduling part.
>> For that one I really wonder whether the combination of the now actually
>> working nicelevels (see Mingo's post) and a simple wrapper for the really
>> high requirements cases doesn't work.
>
> I saw Jack (I think) post some numbers showing that it wasn't enough.
> What about making priority level protected via rlimit?

The numbers I reported yesterday were so bad I couldn't figure out why
anyone even thought it was worth trying. Now I realize why.

When Ingo said to try "nice -20", I took him literally, forgetting
that the stupid command to achieve a nice value of -20 is `nice --20'.
So I was actually testing with a nice value of 19. Bah! No wonder it
sucked.

Running `nice --20' is still significantly worse than SCHED_FIFO, but
not the unmitigated disaster shown in the middle column. But, this
improved performance is still not adequate for audio work. The worst
delay was absurdly long (~1/2 sec).

Here are the corrected results...

                                With -R         Without -R      Without -R
                                (SCHED_FIFO)    (nice -20)      (nice --20)

************* SUMMARY RESULT ****************
Total seconds ran . . . . . . : 300
Number of clients . . . . . . : 20
Ports per client . . . . . .  : 4
Frames per buffer . . . . . . : 64
*********************************************
Timeout Count . . . . . . . . : ( 1)            ( 1)            ( 1)
XRUN Count . . . . . . . . .  : 2               2837            43
Delay Count (>spare time) . . : 0               0               0
Delay Count (>1000 usecs) . . : 0               0               0
Delay Maximum . . . . . . . . : 3130 usecs      5038044 usecs   501374 usecs
Cycle Maximum . . . . . . . . : 960 usecs       18802 usecs     1036 usecs
Average DSP Load. . . . . . . : 34.3 %          44.1 %          34.3 %
Average CPU System Load . . . : 8.7 %           7.5 %           7.8 %
Average CPU User Load . . . . : 29.8 %          5.2 %           25.3 %
Average CPU Nice Load . . . . : 0.0 %           20.3 %          0.0 %
Average CPU I/O Wait Load . . : 3.2 %           5.2 %           0.1 %
Average CPU IRQ Load . . . .  : 0.7 %           0.7 %           0.7 %
Average CPU Soft-IRQ Load . . : 0.0 %           0.2 %           0.0 %
Average Interrupt Rate . . .  : 1707.6 /sec     1677.3 /sec     1692.9 /sec
Average Context-Switch Rate . : 11914.9 /sec    11197.6 /sec    11611.2 /sec
*********************************************

> Here's an uncompiled, untested patch doing that (probably has some math
> error or logic hole in it, but idea seems sound enough). I think it has
> at least one problem, where a nice 19 process could renice itself back to
> 0. And it doesn't really handle the different scheduling policies,
> other than implicit 40 - 139 being used for SCHED_FIFO/SCHED_RR.
>
> It takes the 140 priority levels (0-139), inverts their priority
> order, and then uses that number as the basis for the rlimit (so that a
> larger rlimit means higher priority, to fall in line with normal rlimit
> semantics). Defaults to 19 (which should be niceval of 0). And allows
> CAP_SYS_NICE to continue to override if the rlimit is too low.

If you really want to use PAM for everything, then this idea makes a
lot of sense.

But, what about all the other programs that would need updating to
make it useful? We'd need at least a new pam_limits.so module and a
new shell (since ulimit is built-in). I expect I will need to
maintain the realtime-lsm for at least another year before all that
can trickle down to actual end users.

--
joq

2005-01-08 13:07:37

by Paul Jakma

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 7 Jan 2005, Paul Davis wrote:

> capabilities work - we use them in 2.4 where a helper suid application
> gets the ball rolling, and then its child grants capabilities to new
> clients.

We use them too in Quagga. Reasonably happy with them.

Not a panacea, but far better to retain just a few capabilities than
retaining ruid 0 (as we must on other systems).

Only issue really is "graininess" of capabilities, which i'd guess is
a double-edged sword.

regards,
--
Paul Jakma [email protected] [email protected] Key ID: 64A2FF6A
Fortune:
Kill Ugly Radio
- Frank Zappa

2005-01-08 16:56:59

by ross

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Sat, Jan 08, 2005 at 12:12:59AM -0600, Jack O'Quin wrote:
> I find it hard to understand why some of you think PAM is an adequate
> solution. As currently deployed, it is poorly documented and nearly
> impossible for non-experts to administer securely. On my Debian woody
> system, when I login from the console I get one fairly sensible set of
> ulimit values, but from gdm I get a much more permissive set (with
> unlimited mlocking, BTW). Apparently, this is because the `gdm' PAM
> config includes `session required pam_limits.so' but the system comes
> with an empty /etc/security/limits.conf. I'm just guessing about that
> because I can't find any decent documentation for any of this crap.
>
> Remember, if something is difficult to administer, it's *not* secure.

Not to mention that not everyone chooses to use PAM for precisely this
reason. Slackware has never included PAM and probably never will.
My audio workstation has worked swell with the 2.4+caps solution and
the 2.6+LSM solution. PAM would break me ::-(

--
Ross Vandegrift
[email protected]

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37

2005-01-08 18:26:13

by Christoph Hellwig

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Sat, Jan 08, 2005 at 11:56:57AM -0500, [email protected] wrote:
> On Sat, Jan 08, 2005 at 12:12:59AM -0600, Jack O'Quin wrote:
> > I find it hard to understand why some of you think PAM is an adequate
> > solution. As currently deployed, it is poorly documented and nearly
> > impossible for non-experts to administer securely. On my Debian woody
> > system, when I login from the console I get one fairly sensible set of
> > ulimit values, but from gdm I get a much more permissive set (with
> > unlimited mlocking, BTW). Apparently, this is because the `gdm' PAM
> > config includes `session required pam_limits.so' but the system comes
> > with an empty /etc/security/limits.conf. I'm just guessing about that
> > because I can't find any decent documentation for any of this crap.
> >
> > Remember, if something is difficult to administer, it's *not* secure.
>
> Not to mention that not everyone chooses to use PAM for precisely this
> reason. Slackware has never included PAM and probably never will.
> My audio workstation has worked swell with the 2.4+caps solution and
> the 2.6+LSM solution. PAM would break me ::-(

you can set rlimits as well without pam. it's just more complicated.
But hey, it was you who didn't want to use it :)

2005-01-08 22:17:53

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Sat, 2005-01-08 at 00:12 -0600, Jack O'Quin wrote:
> I find it hard to understand why some of you think PAM is an adequate
> solution. As currently deployed, it is poorly documented and nearly
> impossible for non-experts to administer securely. On my Debian woody
> system, when I login from the console I get one fairly sensible set of
> ulimit values, but from gdm I get a much more permissive set (with
> unlimited mlocking, BTW). Apparently, this is because the `gdm' PAM
> config includes `session required pam_limits.so' but the system comes
> with an empty /etc/security/limits.conf. I'm just guessing about that
> because I can't find any decent documentation for any of this crap.

Eh, PAM is a perfectly fine solution. Documentation is lacking, but
it's easy to find examples. On my system /etc/security/limits.conf has
this sample config, commented out:

#<domain> <type> <item> <value>
#

#* soft core 0
#* hard rss 10000
#@student hard nproc 20
#@faculty soft nproc 20
#@faculty hard nproc 50
#ftp hard nproc 0

So add your audio users (or cdrecord users, or whoever) to group
realtime and add:

@realtime hard memlock 100000
@realtime soft prio 100

Problem solved.

Lee


2005-01-08 22:22:05

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Sat, 2005-01-08 at 11:56 -0500, [email protected] wrote:
> Not to mention that not everyone chooses to use PAM for precisely this
> reason. Slackware has never included PAM and probably never will.
> My audio workstation has worked swell with the 2.4+caps solution and
> the 2.6+LSM solution. PAM would break me ::-(

Hmm. How could you (for example) configure all your machines to
authenticate against an LDAP server without PAM?

Lee

2005-01-08 22:29:51

by Andreas Steinmetz

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Lee Revell wrote:
> On Sat, 2005-01-08 at 11:56 -0500, [email protected] wrote:
>
>>Not to mention that not everyone chooses to use PAM for precisely this
>>reason. Slackware has never included PAM and probably never will.
>>My audio workstation has worked swell with the 2.4+caps solution and
>>the 2.6+LSM solution. PAM would break me ::-(
>
>
> Hmm. How could you (for example) configure all your machines to
> authenticate against an LDAP server without PAM?

nss_ldap :-)

2005-01-10 21:10:35

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 07, 2005 at 03:20:04PM -0800, Andrew Morton wrote:
> [email protected] wrote:
> >
> > fix the inheritance problems
>
> Does anyone actually have a handle on what's involved in fixing the
> inheritance problem?

Probably not, in the sense that it's a complex enough problem that
something will likely be found to be fatally flawed a year down the
road. Just like the situation we're in now.

> It's risky, but it is something which we should do.
>
> <grumpytroll> We really shouldn't have merged all that new fancy security
> stuff when the existing security framework was known-badly-broken.
> Especially as the new stuff seems incapable of doing simple things which
> unbroken inherited caps would do perfectly.</grumpytroll>

It's taken some decades to ferret out all the gotchas of the standard
UNIX permission model. None of this fancy new stuff is "simpler" by
any stretch, so expect it to be quite some time before all the
implications of any of them are completely understood.

--
Mathematics is the supreme nostalgia of our time.

2005-01-11 02:36:00

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Sat, Jan 08, 2005 at 12:12:59AM -0600, Jack O'Quin wrote:
> Chris Wright <[email protected]> writes:
>
> > * Christoph Hellwig ([email protected]) wrote:
> >> So to make forward progress I'd like the audio people to confirm whether
> >> the mlock bits in 2.6.9+ do help that half of their requirement first
> >
> > It sure should, but I guess they can reply on that.
>
> That does seem to work now (finally). It looks like that longstanding
> CAP_IPC_LOCK bug is finally fixed, too.
>
> I find it hard to understand why some of you think PAM is an adequate
> solution.

The best we can do _here_ is present something that userspace can use
sensibly. We can't make userspace actually use it that way though.

Rlimits are neither UID/GID- nor PAM-specific. They fit well within
the general model of UNIX security, extending an existing mechanism
rather than adding a completely new one. That PAM happens to be the
way rlimits are usually administered may be unfortunate, yes, but it
doesn't mean that rlimits is the wrong way.

> Running `nice --20' is still significantly worse than SCHED_FIFO, but
> not the unmitigated disaster shown in the middle column. But, this
> improved performance is still not adequate for audio work. The worst
> delay was absurdly long (~1/2 sec).

Let's work on that. It'd be _far_ better to have unprivileged near-RT
capability everywhere without potential scheduling DoS.

--
Mathematics is the supreme nostalgia of our time.
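For readers comparing the two setups Jack measured, here is a minimal C sketch of how a program might ask for SCHED_FIFO and fall back to the best SCHED_OTHER priority (nice -20). It assumes a Linux/POSIX system; acquire_low_latency() and enum rt_outcome are hypothetical names, not JACK's actual code. Both sched_setscheduler() with SCHED_FIFO and lowering the nice value normally require privilege (CAP_SYS_NICE), which is the access the realtime LSM and the proposed rlimits are meant to grant. On Linux both calls act on the calling thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/resource.h>

/* Hypothetical result codes for the sketch below. */
enum rt_outcome { RT_FIFO, RT_NICE, RT_NONE };

/* Try SCHED_FIFO at `rtprio` for the calling thread; if the kernel refuses
 * (typically EPERM without CAP_SYS_NICE), fall back to SCHED_OTHER at
 * nice -20, which is also privileged; if both fail, nothing is changed. */
static enum rt_outcome acquire_low_latency(int rtprio)
{
    struct sched_param sp = { .sched_priority = rtprio };

    assert(rtprio >= sched_get_priority_min(SCHED_FIFO) &&
           rtprio <= sched_get_priority_max(SCHED_FIFO));

    if (sched_setscheduler(0, SCHED_FIFO, &sp) == 0)
        return RT_FIFO;   /* real-time: preempts every SCHED_OTHER task */
    if (setpriority(PRIO_PROCESS, 0, -20) == 0)
        return RT_NICE;   /* best SCHED_OTHER priority, still timeshared */
    return RT_NONE;       /* unprivileged: ordinary timesharing */
}
```

An audio thread would call this once before entering its processing loop and could warn the user when the result is RT_NONE.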

2005-01-11 13:05:24

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>Rlimits are neither UID/GID- nor PAM-specific. They fit well within
>the general model of UNIX security, extending an existing mechanism
>rather than adding a completely new one. That PAM happens to be the
>way rlimits are usually administered may be unfortunate, yes, but it
>doesn't mean that rlimits is the wrong way.

agreed, although i note with interest the flap over RLIMIT_MEMLOCK
being made accessible to unprivileged users by people working on
grsecurity.

>> Running `nice --20' is still significantly worse than SCHED_FIFO, but
>> not the unmitigated disaster shown in the middle column. But, this
>> improved performance is still not adequate for audio work. The worst
>> delay was absurdly long (~1/2 sec).
>
>Let's work on that. It'd be _far_ better to have unprivileged near-RT
>capability everywhere without potential scheduling DoS.

I am not sure what you mean here. I think we've established that
SCHED_OTHER cannot be made adequate for realtime audio work. Its
intended purpose (timesharing the machine in ways that should
generally benefit tasks that don't do a lot and/or are dominated by
user interaction, thus rendering the machine apparently responsive) is
really at odds with what we need.

Con has discussed the idea of a new scheduling class, one that has no
internal priority, runs like SCHED_RR but is subject to cpu
utilization limits, and is accessible to unprivileged users. I think
this makes a lot of sense. It can be controlled using sysctl's and/or
rlimit.

But please note: in any sane world, adding stuff like this could only
take place in an unstable tree. It seems really odd to me that anyone
can be talking about adding any of these *mechanisms* to 2.6. That was
the whole reason we (well, Jack, Torben and others) worked with LSM:
LSM appeared to be the "blessed" method in 2.6 of allowing changes to
security policy to be made. We are now finding out that even if Linus
"blessed" it by inclusion, there is enough vocal opposition to
actually using it for something useful that something else has to be
done. I wouldn't want to run an important machine on 2.6 if adding,
say SCHED_ISO or even RLIMIT_RT_CPU is part of 2.6's "maintenance".

Meanwhile, as I mentioned before, every realtime audio user of 2.6 is
*still* going to use "realtime" LSM because it's really the only
effective way to get the privilege needed to do what they want to get
done.

--p

2005-01-11 14:30:25

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

> On Sat, Jan 08, 2005 at 12:12:59AM -0600, Jack O'Quin wrote:
>> I find it hard to understand why some of you think PAM is an adequate
>> solution.

Matt Mackall <[email protected]> writes:
> The best we can do _here_ is present something that userspace can use
> sensibly. We can't make userspace actually use it that way though.

"O'Quin's law" states that "every system reflects the structure of the
organization creating it". (Probably not original, I "discovered"
this about 25 years ago, while doing OS development at IBM.) Compared
to most other operating systems, GNU/Linux has a much larger
organizational gap between kernel development and the rest of the OS.
Like anything else, this is both a strength and a weakness.

> Rlimits are neither UID/GID- nor PAM-specific. They fit well within
> the general model of UNIX security, extending an existing mechanism
> rather than adding a completely new one. That PAM happens to be the
> way rlimits are usually administered may be unfortunate, yes, but it
> doesn't mean that rlimits is the wrong way.

This whole RLIMIT_MEMLOCK situation with PAM is a good example of how
that disconnect causes systemic troubles. AFAICT, fixing the
longstanding bug in mlock() introduced a Denial of Service bug in
Debian (and perhaps other distributions) when running 2.6.10.

Clearly, this is not a kernel bug. In fact, the kernel was broken
before. But, it is an excellent example of how depending on a
Byzantine mechanism like PAM *harms* system security. The Debian
developers are very careful about things like this. If they can't get
the default install right, something is badly amiss, damaging
complexity in the overall system. The kernel is not solely
responsible for that, but ignoring its contribution would be a
mistake.

>> Running `nice --20' is still significantly worse than SCHED_FIFO, but
>> not the unmitigated disaster shown in the middle column. But, this
>> improved performance is still not adequate for audio work. The worst
>> delay was absurdly long (~1/2 sec).
>
> Let's work on that. It'd be _far_ better to have unprivileged near-RT
> capability everywhere without potential scheduling DoS.

"Near-RT" is about the most useless concept I've heard of in a long
time. It sounds like the answer to a question nobody asked. ;-)

Linux can and should develop a better unprivileged realtime scheduling
algorithm. But, this is not an "escape hatch" to avoid confronting
mainline 2.6 security problems. We still have 2005 problems to solve.
--
joq

2005-01-11 16:41:26

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Paul Davis <[email protected]> writes:

>>Rlimits are neither UID/GID- nor PAM-specific. They fit well within
>>the general model of UNIX security, extending an existing mechanism
>>rather than adding a completely new one. That PAM happens to be the
>>way rlimits are usually administered may be unfortunate, yes, but it
>>doesn't mean that rlimits is the wrong way.

PAM is how most GNU/Linux systems manage rlimits. It is very UID/GID
oriented. So from the sysadmin perspective, claiming that rlimits is
"better" or "easier to manage" than "GID hacks" is bogus.

> agreed, although i note with interest the flap over RLIMIT_MEMLOCK
> being made accessible to unprivileged users by people working on
> grsecurity.

:-)

>>Let's work on that. It'd be _far_ better to have unprivileged near-RT
>>capability everywhere without potential scheduling DoS.
>
> I am not sure what you mean here. I think we've established that
> SCHED_OTHER cannot be made adequate for realtime audio work. Its
> intended purpose (timesharing the machine in ways that should
> generally benefit tasks that don't do a lot and/or are dominated by
> user interaction, thus rendering the machine apparently responsive) is
> really at odds with what we need.
>
> Con has discussed the idea of a new scheduling class, one that has no
> internal priority, runs like SCHED_RR but is subject to cpu
> utilization limits, and is accessible to unprivileged users. I think
> this makes a lot of sense. It can be controlled using sysctl's and/or
> rlimit.

A good isochronous scheduler in 2.8 would be great. We can experiment
with it this year in patch form.

Meanwhile...

> But please note: in any sane world, adding stuff like this could only
> take place in an unstable tree. It seems really odd to me that anyone
> can be talking about adding any of these *mechanisms* to 2.6. That was
> the whole reason we (well, Jack, Torben and others) worked with LSM:
> LSM appeared to be the "blessed" method in 2.6 of allowing changes to
> security policy to be made. We are now finding out that even if Linus
> "blessed" it by inclusion, there is enough vocal opposition to
> actually using it for something useful that something else has to be
> done. I wouldn't want to run an important machine on 2.6 if adding,
> say SCHED_ISO or even RLIMIT_RT_CPU is part of 2.6's "maintenance".
>
> Meanwhile, as I mentioned before, every realtime audio user of 2.6 is
> *still* going to use "realtime" LSM because it's really the only
> effective way to get the privilege needed to do what they want to get
> done.

I am surprised and dismayed by the ignorance of realtime programming
expressed by some (not all) messages in this thread. Worse, many
developers seem unaware of how much they don't know. This stuff is
difficult, even for smart people. Maybe even "especially for smart
people".

I am very conscious of my own matching ignorance of Linux kernel
internals. If possible, I'd like to keep it that way. ;-)

Kernel developers really don't have the equivalent luxury of ignoring
realtime design issues.
--
joq

2005-01-11 19:02:20

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 10:28:13AM -0600, Jack O'Quin wrote:
> Paul Davis <[email protected]> writes:
>
> >>Rlimits are neither UID/GID- nor PAM-specific. They fit well within
> >>the general model of UNIX security, extending an existing mechanism
> >>rather than adding a completely new one. That PAM happens to be the
> >>way rlimits are usually administered may be unfortunate, yes, but it
> >>doesn't mean that rlimits is the wrong way.
>
> PAM is how most GNU/Linux systems manage rlimits. It is very UID/GID
> oriented. So from the sysadmin perspective, claiming that rlimits is
> "better" or "easier to manage" than "GID hacks" is bogus.

Yes, you're right, so let's invent something completely new and
inherently much less flexible so that the problem is made worse on
both fronts.

--
Mathematics is the supreme nostalgia of our time.

2005-01-11 19:24:40

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 08:05:08AM -0500, Paul Davis wrote:
> >> Running `nice --20' is still significantly worse than SCHED_FIFO, but
> >> not the unmitigated disaster shown in the middle column. But, this
> >> improved performance is still not adequate for audio work. The worst
> >> delay was absurdly long (~1/2 sec).
> >
> >Let's work on that. It'd be _far_ better to have unprivileged near-RT
> >capability everywhere without potential scheduling DoS.
>
> I am not sure what you mean here. I think we've established that
> SCHED_OTHER cannot be made adequate for realtime audio work. Its
> intended purpose (timesharing the machine in ways that should
> generally benefit tasks that don't do a lot and/or are dominated by
> user interaction, thus rendering the machine apparently responsive) is
> really at odds with what we need.

We have not established that at all. In principle, because SCHED_OTHER
tasks running at full priority lie on the boundary between SCHED_OTHER
and SCHED_FIFO, they can be made to run arbitrarily close to the
performance of tasks in SCHED_FIFO. With the upside that they won't be
able to deadlock the machine.

And I mean arbitrarily close in the strict delta-epsilon sense.
It's not perfect, but neither is SCHED_FIFO, in principle or in
practice.

--
Mathematics is the supreme nostalgia of our time.

2005-01-11 19:46:35

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

> On Tue, Jan 11, 2005 at 08:05:08AM -0500, Paul Davis wrote:
>> I am not sure what you mean here. I think we've established that
>> SCHED_OTHER cannot be made adequate for realtime audio work. Its
>> intended purpose (timesharing the machine in ways that should
>> generally benefit tasks that don't do a lot and/or are dominated by
>> user interaction, thus rendering the machine apparently responsive) is
>> really at odds with what we need.

Matt Mackall <[email protected]> writes:
> We have not established that at all. In principle, because SCHED_OTHER
> tasks running at full priority lie on the boundary between SCHED_OTHER
> and SCHED_FIFO, they can be made to run arbitrarily close to the
> performance of tasks in SCHED_FIFO. With the upside that they won't be
> able to deadlock the machine.
>
> And I mean arbitrarily close in the strict delta-epsilon sense.
> It's not perfect, but neither is SCHED_FIFO, in principle or in
> practice.

Though inelegant in theory, SCHED_FIFO *has* been shown to work in
practice. The POSIX 1003.4 committee were not all a bunch of idiots.
That stuff *is* useful and *does* work (given appropriate privileges).

Your assertions have not been reduced to practice. This is a
significant difference. Write some code, then we can discuss whether
it solves any problems or not. I doubt it, but prove me wrong and
next year you can be the proud author of a scheduler used for hundreds
of audio applications.

Meanwhile, what about 2005? It's "almost upon us". :-/
--
joq

2005-01-11 19:51:23

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 08:30:50AM -0600, Jack O'Quin wrote:
> > Let's work on that. It'd be _far_ better to have unprivileged near-RT
> > capability everywhere without potential scheduling DoS.
>
> "Near-RT" is about the most useless concept I've heard of in a long
> time. It sounds like the answer to a question nobody asked. ;-)

To my way of thinking, it's a pretty good description of Ingo's work
or anything you're ever going to see on a PC. If you think you're
going to get real hard RT performance on your off-the-shelf x86 box
running a conventional OS, you are fooling yourself.

Thankfully a buffer underrun is no more fatal for pro audio than a
broken guitar string. CDs skip, DATs glitch, XLR cables flake out,
circuit breakers trip, amps clip, Powerbooks crash, and the show goes
on. I've done more than enough stage tech to know it's a huge pain in
the ass, but let's stop pretending we require absolute perfection,
please.

--
Mathematics is the supreme nostalgia of our time.

2005-01-11 20:00:55

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Matt Mackall <[email protected]> writes:

> On Tue, Jan 11, 2005 at 08:30:50AM -0600, Jack O'Quin wrote:
>> "Near-RT" is about the most useless concept I've heard of in a long
>> time. It sounds like the answer to a question nobody asked. ;-)
>
> To my way of thinking, it's a pretty good description of Ingo's work
> or anything you're ever going to see on a PC. If you think you're
> going to get real hard RT performance on your off-the-shelf x86 box
> running a conventional OS, you are fooling yourself.
>
> Thankfully a buffer underrun is no more fatal for pro audio than a
> broken guitar string. CDs skip, DATs glitch, XLR cables flake out,
> circuit breakers trip, amps clip, Powerbooks crash, and the show goes
> on. I've done more than enough stage tech to know it's a huge pain in
> the ass, but let's stop pretending we require absolute perfection,
> please.

In _practice_, Ingo's patches are considerably better than what you
seem to consider "good enough for mere audio work".
--
joq

2005-01-11 20:06:41

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 01:57:11PM -0600, Jack O'Quin wrote:
> Matt Mackall <[email protected]> writes:
>
> > On Tue, Jan 11, 2005 at 08:30:50AM -0600, Jack O'Quin wrote:
> >> "Near-RT" is about the most useless concept I've heard of in a long
> >> time. It sounds like the answer to a question nobody asked. ;-)
> >
> > To my way of thinking, it's a pretty good description of Ingo's work
> > or anything you're ever going to see on a PC. If you think you're
> > going to get real hard RT performance on your off-the-shelf x86 box
> > running a conventional OS, you are fooling yourself.
> >
> > Thankfully a buffer underrun is no more fatal for pro audio than a
> > broken guitar string. CDs skip, DATs glitch, XLR cables flake out,
> > circuit breakers trip, amps clip, Powerbooks crash, and the show goes
> > on. I've done more than enough stage tech to know it's a huge pain in
> > the ass, but let's stop pretending we require absolute perfection,
> > please.
>
> In _practice_, Ingo's patches are considerably better than what you
> seem to consider "good enough for mere audio work".

Eh? I never implied mainstream was good enough.

What I said was that high priority SCHED_OTHER could be made good
enough and that that would be preferable to SCHED_FIFO in many cases.

Anyway, *plonk*.

--
Mathematics is the supreme nostalgia of our time.

2005-01-11 20:20:11

by Chris Friesen

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Jack O'Quin wrote:
> Matt Mackall <[email protected]> writes:

>>Thankfully a buffer underrun is no more fatal for pro audio than a
>>broken guitar string. CDs skip, DATs glitch, XLR cables flake out,
>>circuit breakers trip, amps clip, Powerbooks crash, and the show goes
>>on. I've done more than enough stage tech to know it's a huge pain in
>>the ass, but let's stop pretending we require absolute perfection,
>>please.

> In _practice_, Ingo's patches are considerably better than what you
> seem to consider "good enough for mere audio work".

I don't see anywhere that Matt was criticising Ingo's work. He just said
that it wasn't hard realtime--which is true.

A hard realtime system will *guarantee* that the deadlines will be met,
*no matter what*. It makes all kinds of other sacrifices to do it, and
it makes additional demands on the application designer as well.

I don't think Ingo would claim that his patches make Linux a hard RT
operating system.

Chris

2005-01-11 20:29:42

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-11 at 12:05 -0800, Matt Mackall wrote:
> Anyway, *plonk*.
>

Plonk? WTF? Jack comes up with what many people think is a reasonable
solution to a real problem, that affects thousands of users, and in the
middle of what seems to me a civilized discussion, you killfile him
because he disagrees with you?

Plonk to you too, asshole.

Lee

2005-01-11 20:47:13

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Lee Revell ([email protected]) wrote:
> On Tue, 2005-01-11 at 12:05 -0800, Matt Mackall wrote:
> > Anyway, *plonk*.
>
> Plonk? WTF? Jack comes up with what many people think is a reasonable
> solution to a real problem, that affects thousands of users, and in the
> middle of what seems to me a civilized discussion, you killfile him
> because he disagrees with you?
>
> Plonk to you too, asshole.

Guys, could we please bring this back to a useful discussion. None of
you have commented on whether the rlimits for priority are useful. As I
said before, I've no real problem with the module as it stands since it's
tiny, quite contained, and does something people need. But I agree it'd
be better to find something that's workable as long term solution.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-11 20:48:45

by utz lehmann

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-11 at 10:28 -0600, Jack O'Quin wrote:
> Paul Davis <[email protected]> writes:
>
> >>Rlimits are neither UID/GID- nor PAM-specific. They fit well within
> >>the general model of UNIX security, extending an existing mechanism
> >>rather than adding a completely new one. That PAM happens to be the
> >>way rlimits are usually administered may be unfortunate, yes, but it
> >>doesn't mean that rlimits is the wrong way.
>
> PAM is how most GNU/Linux systems manage rlimits. It is very UID/GID
> oriented. So from the sysadmin perspective, claiming that rlimits is
> "better" or "easier to manage" than "GID hacks" is bogus.

Why do you have such a problem with an rlimit-based approach?
IMHO it's not a hack like the realtime LSM; it's usable for other things
besides pro audio (see the "scheduling priorities with rlimit" thread),
more secure, and more user-friendly.

With realtime LSM a user in the realtime group can change the nice
values and RT priorities of other users' processes, including those
owned by root and kernel threads. This has to be fixed. I think this
means a rewrite (not using CAP_SYS_NICE).

It can't be used with distro kernels which have common-caps compiled in,
e.g. Fedora.

IMHO, for a possible mainline inclusion the mlock part has to be taken
away, because RLIMIT_MEMLOCK is a better solution. Otherwise a pro audio
user has to deal with rlimits for mlock and the realtime LSM for the RT
priority part. Doing both with rlimits is more user-friendly. Most users
would only have to put something like this in limits.conf:

me hard memlock 500000
me soft memlock 500000
me hard realtime 60
me soft realtime 60

And with rlimits you can drop privileges on a per-process basis: just
set the hard RLIMIT_RT to 0 (ulimit). You can't do this with realtime LSM.

With a realtime rlimit you could even consider giving users realtime
priorities on a multi-user machine: limit the RT prio for users to 10
and have an rt-watchdog process with a higher priority which kills
runaway user RT processes.
With realtime LSM you can't limit the RT prio. It's all or nothing.
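utz's point about dropping a limit per process can be illustrated with setrlimit(). The RLIMIT_RT limit mentioned above was only a proposal at the time, so this minimal C sketch uses RLIMIT_MEMLOCK instead, which 2.6 kernels already had; drop_rlimit() is a hypothetical helper. Any process may lower its own hard limit, but only a privileged one can raise it again, so the drop sticks for the process and is inherited by its children.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: lower both the soft and the hard limit of
 * `resource` to `value`, never raising either of them.  Raising a hard
 * limit back needs CAP_SYS_RESOURCE, so for an unprivileged process the
 * drop is permanent and is inherited across fork() and exec(). */
static int drop_rlimit(int resource, rlim_t value)
{
    struct rlimit rl;

    assert(resource >= 0);
    if (getrlimit(resource, &rl) != 0)
        return -1;
    /* On Linux RLIM_INFINITY is the largest rlim_t, so this only clamps. */
    if (value > rl.rlim_max)
        value = rl.rlim_max;
    rl.rlim_cur = value;
    rl.rlim_max = value;
    return setrlimit(resource, &rl);
}
```

For example, a session manager could call drop_rlimit(RLIMIT_MEMLOCK, 0) before starting a program that has no business locking memory.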


2005-01-11 20:50:31

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Matt Mackall ([email protected]) wrote:
> On Tue, Jan 11, 2005 at 08:05:08AM -0500, Paul Davis wrote:
> > I am not sure what you mean here. I think we've established that
> > SCHED_OTHER cannot be made adequate for realtime audio work. Its
> > intended purpose (timesharing the machine in ways that should
> > generally benefit tasks that don't do a lot and/or are dominated by
> > user interaction, thus rendering the machine apparently responsive) is
> > really at odds with what we need.
>
> We have not established that at all. In principle, because SCHED_OTHER
> tasks running at full priority lie on the boundary between SCHED_OTHER
> and SCHED_FIFO, they can be made to run arbitrarily close to the
> performance of tasks in SCHED_FIFO. With the upside that they won't be
> able to deadlock the machine.

I don't think they lie quite so neatly on this boundary. There's one
fundamental difference which is how the dynamic priority is adjusted
which alters the basic preemptibility rules.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-11 20:59:27

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Chris Wright <[email protected]> wrote:

> > We have not established that at all. In principle, because SCHED_OTHER
> > tasks running at full priority lie on the boundary between SCHED_OTHER
> > and SCHED_FIFO, they can be made to run arbitrarily close to the
> > performance of tasks in SCHED_FIFO. With the upside that they won't be
> > able to deadlock the machine.
>
> I don't think they lie quite so neatly on this boundary. There's one
> fundamental difference which is how the dynamic priority is adjusted
> which alters the basic preemptibility rules.

but at nice level -20 this adjustment is at most +5 priority levels -
i.e. down to an equivalent of nice -15. A nice 0 task can get at most
a -5 priority boost, i.e. a nice -5 equivalent in the worst case - so
the nice -20 task still preempts the lower prio task.

so this could work in theory. But practice shows it doesn't work at the
moment, and nobody has analyzed why, yet.

(There are some other differences in scheduling like starvation
prevention adding potential delays, but those should in theory not
affect the basic tests that were done so far. There are also some
differences in timeslice management, but with the huge timeslices that
nice -20 tasks get this shouldn't be causing problems either. So my
current thinking is that there's an unknown scheduling effect causing
latency regression of nice -20 tasks, compared to the latencies of
RT-prio-1 tasks.)

Ingo
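Ingo's arithmetic can be checked against a simplified model of the 2.6 O(1) scheduler. The sketch below is modelled on the MAX_RT_PRIO, MAX_PRIO, MAX_BONUS and NICE_TO_PRIO definitions in kernel/sched.c, but it is not the kernel code: it ignores starvation prevention and timeslice handling, and the hypothetical nice_to_prio()/effective_prio() helpers take a nice value and an explicit interactivity bonus rather than a task. Lower numbers mean higher priority.

```c
#include <assert.h>

/* Constants modelled on 2.6 kernel/sched.c: RT priorities use 0..99,
 * nice -20..19 maps to static priorities 100..139. */
#define MAX_RT_PRIO 100
#define MAX_PRIO    (MAX_RT_PRIO + 40)
#define MAX_BONUS   10

static int nice_to_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return MAX_RT_PRIO + nice + 20;
}

/* `bonus` is the interactivity bonus (0..MAX_BONUS) derived from the
 * task's sleep average; a neutral task has MAX_BONUS/2, so the dynamic
 * priority moves at most 5 levels either way from the static one. */
static int effective_prio(int nice, int bonus)
{
    int prio;

    assert(bonus >= 0 && bonus <= MAX_BONUS);
    prio = nice_to_prio(nice) - (bonus - MAX_BONUS / 2);
    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;      /* never enters the RT range */
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}
```

For example, effective_prio(-20, 0) gives 105, the static priority of a nice -15 task, while effective_prio(0, MAX_BONUS) gives 115, the static priority of a nice -5 task; 105 is the better of the two.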

2005-01-11 21:11:02

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-11 at 10:28 -0600, Jack O'Quin wrote:
> Paul Davis <[email protected]> writes:
>
> >>Rlimits are neither UID/GID- nor PAM-specific. They fit well within
> >>the general model of UNIX security, extending an existing mechanism
> >>rather than adding a completely new one. That PAM happens to be the
> >>way rlimits are usually administered may be unfortunate, yes, but it
> >>doesn't mean that rlimits is the wrong way.
>
> PAM is how most GNU/Linux systems manage rlimits. It is very UID/GID
> oriented. So from the sysadmin perspective, claiming that rlimits is
> "better" or "easier to manage" than "GID hacks" is bogus.
>

Sorry, I have to agree with Matt, let's just use PAM. Maybe I have been
a Linux admin for too long but I don't think PAM is so bad. Yes it
could be better documented but if this was a showstopper then no one
would use Linux at all. It's not like every naive user will have to
figure out PAM now, the audio oriented distributions will just set it up
right by default. And if people want to use the mainstream distros to
do audio work OOTB, they'll just have to bug their vendor about it.

> > agreed, although i note with interest the flap over RLIMIT_MEMLOCK
> > being made accessible to unprivileged users by people working on
> > grsecurity.
>
> :-)

But we are not talking about unprivileged users. Do not take
"unprivileged" to mean "nonroot". We need an easy mechanism for root to
tell the kernel 'the following users get to do things that could
potentially lock up the system'. No general purpose Linux distro would
ship with this enabled by default for everyone. But, to quote another
LKML thread 'you can't prevent root from doing stupid things because
that would also keep him from doing clever things'.

It's a fine line between stupid and clever.

Lee


2005-01-11 21:12:39

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-11 at 12:47 -0800, Chris Wright wrote:
> * Lee Revell ([email protected]) wrote:
> > On Tue, 2005-01-11 at 12:05 -0800, Matt Mackall wrote:
> > > Anyway, *plonk*.
> >
> > Plonk? WTF? Jack comes up with what many people think is a reasonable
> > solution to a real problem, that affects thousands of users, and in the
> > middle of what seems to me a civilized discussion, you killfile him
> > because he disagrees with you?
> >
> > Plonk to you too, asshole.
>
> Guys, could we please bring this back to a useful discussion. None of
> you have commented on whether the rlimits for priority are useful. As I
> said before, I've no real problem with the module as it stands since it's
> tiny, quite contained, and does something people need. But I agree it'd
> be better to find something that's workable as long term solution.

Chris, I did comment on it, see
[email protected] from around 5:15 on
Saturday.

from the above message:

Eh, PAM is a perfectly fine solution. Documentation is lacking, but
it's easy to find examples. On my system /etc/security/limits.conf has
this sample config, commented out:

#<domain> <type> <item> <value>
#

#* soft core 0
#* hard rss 10000
#@student hard nproc 20
#@faculty soft nproc 20
#@faculty hard nproc 50
#ftp hard nproc 0

So add your audio users (or cdrecord users, or whoever) to group
realtime and add:

@realtime hard memlock 100000
@realtime soft prio 100

Problem solved.

Lee
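One detail the limits.conf example glosses over: pam_limits reads the memlock value in kilobytes, so "memlock 100000" allows roughly 100 MB of locked memory. The minimal C sketch below, with hypothetical helper names, converts such an entry to bytes and checks whether the calling process's current soft RLIMIT_MEMLOCK would permit locking a given amount. That is a cheap way for an application to confirm the PAM configuration actually took effect for its login session.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: limits.conf "memlock" values are in KiB. */
static rlim_t memlock_kb_to_bytes(unsigned long kb)
{
    assert((rlim_t)kb <= RLIM_INFINITY / 1024);   /* no overflow */
    return (rlim_t)kb * 1024;
}

/* Hypothetical helper: would the current soft RLIMIT_MEMLOCK let this
 * process lock `bytes` bytes?  Ignores CAP_IPC_LOCK, which bypasses the
 * limit, and any memory the process has already locked. */
static bool memlock_allows(size_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return false;
    return rl.rlim_cur == RLIM_INFINITY || (rlim_t)bytes <= rl.rlim_cur;
}
```

A program could call memlock_allows() with the size it intends to lock before calling mlockall(), and print a hint about limits.conf when it returns false.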


2005-01-11 21:16:53

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Ingo Molnar ([email protected]) wrote:
> * Chris Wright <[email protected]> wrote:
> > I don't think they lie quite so neatly on this boundary. There's one
> > fundamental difference which is how the dynamic priority is adjusted
> > which alters the basic preemptibility rules.
>
> but at nice level -20 this adjustment is at most +5 priority levels -
> i.e. down to an equivalent of nice -15. A nice 0 task can get at most
> a -5 priority boost, i.e. a nice -5 equivalent in the worst case - so
> the nice -20 task still preempts the lower prio task.

Yeah, I realize it provides some safety, I just wanted to point out
the fundamental difference. And one point being made is that it's
the occasional worst case latencies which are the problem. Dynamic
adjustments could be one culprit for this.

Hmm, I wonder if this could have anything to do with it. These are
within striking range:

  PID COMMAND          NI PRI
    9 events/1        -10  34
  931 kcryptd/1       -10  33
  930 kcryptd/0       -10  34
    8 events/0        -10  34
  892 ata/1           -10  34
  891 ata/0           -10  34
 3747 udevd           -10  33
   26 kacpid          -10  31
  238 aio/1           -10  34
  237 aio/0           -10  31
  117 kblockd/1       -10  34
  116 kblockd/0       -10  34
   10 khelper         -10  34

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-11 21:21:19

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Lee Revell ([email protected]) wrote:
> On Tue, 2005-01-11 at 12:47 -0800, Chris Wright wrote:
> > Guys, could we please bring this back to a useful discussion. None of
> > you have commented on whether the rlimits for priority are useful. As I
> > said before, I've no real problem with the module as it stands since it's
> > tiny, quite contained, and does something people need. But I agree it'd
> > be better to find something that's workable as long term solution.
>
> Chris, I did comment on it, see
> [email protected] from around 5:15 on
> Saturday.

Eeek, I missed/forgot (let me guess, I replied too? ;-)

Thanks Lee.
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-11 21:25:21

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Jack O'Quin <[email protected]> wrote:

> The numbers I reported yesterday were so bad I couldn't figure out why
> anyone even thought it was worth trying. Now I realize why.
>
> When Ingo said to try "nice -20", I took him literally, forgetting
> that the stupid command to achieve a nice value of -20 is `nice --20'.
> So I was actually testing with a nice value of 19. Bah! No wonder it
> sucked.
>
> Running `nice --20' is still significantly worse than SCHED_FIFO, but
> not the unmitigated disaster shown in the middle column. But, this
> improved performance is still not adequate for audio work. The worst
> delay was absurdly long (~1/2 sec).
>
> Here are the corrected results...
>
>                                 With -R         Without -R      Without -R
>                                 (SCHED_FIFO)    (nice -20)      (nice --20)
>
> ************* SUMMARY RESULT ****************
> Total seconds ran . . . . . . : 300
> Number of clients . . . . . . : 20
> Ports per client . . . . . . :  4
> Frames per buffer . . . . . . : 64
> *********************************************
> Timeout Count . . . . . . . . : ( 1)            ( 1)            ( 1)
> XRUN Count . . . . . . . . . :  2               2837            43
> Delay Count (>spare time) . . : 0               0               0
> Delay Count (>1000 usecs) . . : 0               0               0
> Delay Maximum . . . . . . . . : 3130 usecs      5038044 usecs   501374 usecs
> Cycle Maximum . . . . . . . . : 960 usecs       18802 usecs     1036 usecs
> Average DSP Load. . . . . . . : 34.3 %          44.1 %          34.3 %

what kind of non-audio workload was there during this test? 43 xruns
aren't nice but aren't that bad either.

plus, is it 100% sure that all audio threads inherited the nice --20
priority - including the client threads? Nornally jackd does a
setscheduler for the client threads so that they get boosted to
SCHED_FIFO, but there is no parallel to that in the nice --20 case, did
you do that manually (or did you start the clients up from the nice --20
shell too?))

If the nice --20 priority setup is perfect and there are still xruns
then could you try the following hack, change this line in
kernel/sched.c:

#define STARVATION_LIMIT (MAX_SLEEP_AVG)

to:

#define STARVATION_LIMIT 0

this will turn off starvation checking, for testing purposes. (to see
whether there's anything else but anti-starvation causing xruns.)
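What the hack disables can be illustrated with a small model. The sketch below is loosely based on the EXPIRED_STARVING() check in 2.6-era kernel/sched.c: starvation is declared once the expired array has waited longer than STARVATION_LIMIT multiplied by the number of runnable tasks. The function and parameter names are illustrative, and the real macro has an additional priority-based condition that this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified, hypothetical model of the time-based part of the 2.6 O(1)
 * scheduler's anti-starvation test. While it returns true, an interactive
 * task whose timeslice runs out is moved to the expired array instead of
 * being requeued on the active array, so it must wait until the active
 * array drains and the two arrays are swapped.
 */
bool expired_starving(unsigned long starvation_limit, /* STARVATION_LIMIT */
                      unsigned long now,              /* current jiffies */
                      unsigned long expired_since,    /* 0: nothing expired yet */
                      unsigned long nr_running)       /* runnable tasks on this CPU */
{
    /* Setting STARVATION_LIMIT to 0 disables the check (the test hack). */
    if (starvation_limit == 0 || expired_since == 0)
        return false;

    assert(expired_since <= now);   /* this model ignores jiffies wraparound */
    return now - expired_since >= starvation_limit * nr_running + 1;
}
```

With a limit of 0 the function always returns false, which is the effect of the test hack; with a non-zero limit, a long enough wait makes even an interactive nice -20 audio thread wait for the array switch when its timeslice ends.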

Ingo

2005-01-11 21:31:57

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Chris Wright <[email protected]> wrote:

> Hmm, I wonder if this could have anything to do with it. These are
> within striking range:
>
>   PID COMMAND      NI PRI
>     9 events/1    -10  34
>   931 kcryptd/1   -10  33
>   930 kcryptd/0   -10  34
>     8 events/0    -10  34
>   892 ata/1       -10  34
>   891 ata/0       -10  34
>  3747 udevd       -10  33
>    26 kacpid      -10  31
>   238 aio/1       -10  34
>   237 aio/0       -10  31
>   117 kblockd/1   -10  34
>   116 kblockd/0   -10  34
>    10 khelper     -10  34

You are right, I forgot about kernel threads. If they are nice -10 on
Jack's system too, then they are within striking range indeed, especially
since they are typically idle, and when they do run it is in short
bursts, so they get the maximum boost. Jack, could you renice these
to -5, to make sure they don't interfere?

btw., why are these at nice -10? workqueue.c sets nice value to -5
normally.
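"Within striking range" can be made concrete with the 2.6 O(1) scheduler's priority arithmetic, where a lower number means a higher priority. This sketch assumes the 2.6-era constants (MAX_RT_PRIO = 100, static priority = 120 + nice, and a sleep-average bonus that moves a task by at most 5 levels either way). It is a simplified model that ignores timeslices, the expired array and SMP, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RT_PRIO 100                 /* priorities 0..99 are RT */
#define MAX_PRIO    (MAX_RT_PRIO + 40)  /* SCHED_OTHER uses 100..139 */
#define MAX_BONUS   10                  /* bonus of -5..+5 levels */

/* Static priority of a SCHED_OTHER task: nice -20..19 maps to 100..139. */
static int static_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return MAX_RT_PRIO + nice + 20;
}

/* Dynamic priority for a sleep-average bonus of 0..MAX_BONUS. */
static int effective_prio(int nice, int bonus)
{
    assert(bonus >= 0 && bonus <= MAX_BONUS);
    int prio = static_prio(nice) - (bonus - MAX_BONUS / 2);

    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}

/*
 * Can a task at nice `other` ever reach a priority at least as high
 * (numerically <=) as a task at nice `audio`? Compare the other task's
 * best case (full bonus) with the audio task's worst case (no bonus).
 */
bool within_striking_range(int other, int audio)
{
    return effective_prio(other, MAX_BONUS) <= effective_prio(audio, 0);
}
```

With these numbers a nice -10 thread that has earned the full bonus reaches priority 105, exactly where a nice -20 audio thread with no bonus sits, so within_striking_range(-10, -20) is true. At nice -5 the best case is 110 and within_striking_range(-5, -20) is false, which is why renicing the kernel threads should take them out of contention.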

Ingo

2005-01-11 21:33:32

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 12:47:07PM -0800, Chris Wright wrote:
> Guys, could we please bring this back to a useful discussion. None of
> you have commented on whether the rlimits for priority are useful. As I
> said before, I've no real problem with the module as it stands since it's
> tiny, quite contained, and does something people need. But I agree it'd
> be better to find something that's workable as a long-term solution.

I almost like it. I don't like that it exposes the internal scheduler
priorities directly (-tiny in fact has options to change these!). So
perhaps some thought could be given to either stratifying it a bit
more (>2000 for SCHED_FIFO, >1000 for SCHED_RR, then SCHED_OTHER) or
separate limits for the different scheduling disciplines.

Right now, you can make a good argument that SCHED_FIFO > SCHED_RR >
SCHED_OTHER from a privilege point of view, but that could change if
we add a pseudo-RT scheduling class of some sort. Similarly, adding a
discipline means adding an rlimit with the split approach, so that's
not very friendly either.

Another way:

0-20: normal nice values (inverted)
>20: privilege to set any RT priority

Limiting to below normal nice is a little weird and the offset to make
everything positive is weird as well. Above 20, any RT app can starve
SCHED_OTHER and it's less important to dole out fine-grained levels
here as these apps must be engineered to cooperate to some degree
anyway.
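Matt's single-limit encoding could be enforced with a check like the one below. This is one possible reading of the proposal, not an existing kernel interface: the function name is hypothetical, and it assumes a limit L from 0 to 20 lets a task lower its nice value to -L, while any L above 20 also allows every RT priority.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check for a single "priority" rlimit, under one reading
 * of the proposal:
 *   0..20 : the task may lower its nice value down to -limit
 *           (raising nice, i.e. lowering priority, is always allowed)
 *   > 20  : the task may also request any RT priority
 */
bool prio_limit_allows(unsigned long limit, bool realtime, int value)
{
    if (realtime) {
        assert(value >= 1 && value <= 99);    /* SCHED_FIFO/SCHED_RR range */
        return limit > 20;
    }

    assert(value >= -20 && value <= 19);      /* nice range */
    /*
     * Test "limit > 20" first: it grants every nice value and also keeps a
     * huge limit such as RLIM_INFINITY from overflowing when negated below.
     */
    return limit > 20 || value >= -(int)limit;
}
```

Because the RT grant is all-or-nothing, the check never needs to know which RT policy or level is requested, which matches the argument that fine-grained RT levels matter less once a task can starve SCHED_OTHER.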

But I'm also still not convinced this policy can't be most flexibly
handled by a setuid helper together with the mlock rlimit.

--
Mathematics is the supreme nostalgia of our time.

2005-01-11 21:42:30

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-11 at 13:28 -0800, Matt Mackall wrote:
> But I'm also still not convinced this policy can't be most flexibly
> handled by a setuid helper together with the mlock rlimit.
>

Quoting my message from a few days ago:

On Thu, 2005-01-06 at 17:18 -0800, Matt Mackall wrote:
> Why can't this be done with a simple SUID helper to promote given
> tasks to RT with sched_setscheduler, doing essentially all the checks
> this LSM is doing?
>
> Objections of "because it requires dangerous root or suid" don't fly,
> an RT app under user control can DoS the box trivially. Never mind you
> need root to configure the LSM anyway..

Yes but a bug in an app running as root can trash the filesystem. The
worst you can do with RT privileges is lock up the machine.

Lee

2005-01-11 21:46:25

by Arjan van de Ven

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 04:38:14PM -0500, Lee Revell wrote:
> Yes but a bug in an app running as root can trash the filesystem. The
> worst you can do with RT privileges is lock up the machine.

several filesystem and IO threads run at prio -10 but not RT.
That makes me a bit less sure of your statement....

2005-01-11 21:46:25

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Matt Mackall ([email protected]) wrote:
> On Tue, Jan 11, 2005 at 12:47:07PM -0800, Chris Wright wrote:
> > Guys, could we please bring this back to a useful discussion. None of
> > you have commented on whether the rlimits for priority are useful. As I
> > said before, I've no real problem with the module as it stands since it's
> > tiny, quite contained, and does something people need. But I agree it'd
> > be better to find something that's workable as a long-term solution.
>
> I almost like it. I don't like that it exposes the internal scheduler
> priorities directly (-tiny in fact has options to change these!). So
> perhaps some thought could be given to either stratifying it a bit
> more (>2000 for SCHED_FIFO, >1000 for SCHED_RR, then SCHED_OTHER) or
> separate limits for the different scheduling disciplines.

Yeah, I don't like that either (I thought I mentioned it in the earliest
patch). I thought about the method you mentioned, but didn't like it
much better. I also suggested using 0 == default, 1 == can nice down,
2 == can set RT prio. Utz suggests just splitting nice limit from rt
limit.

> But I'm also still not convinced this policy can't be most flexibly
> handled by a setuid helper together with the mlock rlimit.

Wait, why can't it be done with (to date fictitious) pam_prio, which
simply calls sched_setscheduler? It's already privileged while it's
doing these things...

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-11 22:11:32

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 04:38:14PM -0500, Lee Revell wrote:
> On Tue, 2005-01-11 at 13:28 -0800, Matt Mackall wrote:
> > But I'm also still not convinced this policy can't be most flexibly
> > handled by a setuid helper together with the mlock rlimit.
>
> Quoting my message from a few days ago:
>
> On Thu, 2005-01-06 at 17:18 -0800, Matt Mackall wrote:
> > Why can't this be done with a simple SUID helper to promote given
> > tasks to RT with sched_setscheduler, doing essentially all the checks
> > this LSM is doing?
> >
> > Objections of "because it requires dangerous root or suid" don't fly,
> > an RT app under user control can DoS the box trivially. Never mind you
> > need root to configure the LSM anyway..
>
> Yes but a bug in an app running as root can trash the filesystem. The
> worst you can do with RT privileges is lock up the machine.

Yes. So can a bug in an LSM or in new rlimits code.

But bugs can be fixed. Poorly designed APIs cannot. That's why the
best API from the kernel perspective is no API: do it in userspace
wherever possible. Bring the kernel in only when the kernel can do it
better, more cleanly, and more generally. The rlimits-on-priorities
approach may qualify in that it might solve problems for other folks
(games on the desktop, CD burning, and the like) and isn't a bad fit
into the rest of the standard security model, but it's still got a
wart or two.

I suppose I ought to spell out my personal LSM bias while I'm at it:

- it invites ad-hoc extensions like this
- we have enough security issues without supporting a proliferation of
incompatible security models

So while I think it's perfectly fine for people to kludge up things
like this, I don't think they belong in the tree unless they're _very_
generally applicable and _very_ well thought out. LSMs should not be
treated like drivers.

--
Mathematics is the supreme nostalgia of our time.

2005-01-11 22:15:30

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Ingo Molnar ([email protected]) wrote:
> You are right, I forgot about kernel threads. If they are nice -10 on
> Jack's system too, then they are within striking range indeed, especially
> since they are typically idle, and when they do run it is in short
> bursts, so they get the maximum boost. Jack, could you renice these
> to -5, to make sure they don't interfere?

Yup, their bursty nature makes them seem a likely culprit.

> btw., why are these at nice -10? workqueue.c sets nice value to -5
> normally.

Heh, I was just wondering the same thing.

BTW, grepping set_user_nice shows a few more possible culprits.
One more reason that there may be value in promoting the audio app to
rt scheduling.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-11 22:19:26

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 01:42:51PM -0800, Chris Wright wrote:
> > But I'm also still not convinced this policy can't be most flexibly
> > handled by a setuid helper together with the mlock rlimit.
>
> Wait, why can't it be done with (to date fictitious) pam_prio, which
> simply calls sched_setscheduler? It's already privileged while it's
> doing these things...

You certainly do not want to run everything at RT from login on.
That'd be bad.

Also, tying to UIDs rather than (UID, executable) is worrisome as
random_game_with_audio in Gnome might decide it needs RT, much to the
admin's surprise.

--
Mathematics is the supreme nostalgia of our time.

2005-01-11 22:21:57

by utz

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-11 at 13:28 -0800, Matt Mackall wrote:
> On Tue, Jan 11, 2005 at 12:47:07PM -0800, Chris Wright wrote:
> > Guys, could we please bring this back to a useful discussion. None of
> > you have commented on whether the rlimits for priority are useful. As I
> > said before, I've no real problem with the module as it stands since it's
> > tiny, quite contained, and does something people need. But I agree it'd
> > be better to find something that's workable as a long-term solution.
>
> I almost like it. I don't like that it exposes the internal scheduler
> priorities directly (-tiny in fact has options to change these!). So
> perhaps some thought could be given to either stratifying it a bit
> more (>2000 for SCHED_FIFO, >1000 for SCHED_RR, then SCHED_OTHER) or
> separate limits for the different scheduling disciplines.
>
> Right now, you can make a good argument that SCHED_FIFO > SCHED_RR >
> SCHED_OTHER from a privilege point of view, but that could change if
> we add a pseudo-RT scheduling class of some sort. Similarly, adding a
> discipline means adding an rlimit with the split approach, so that's
> not very friendly either.
>
> Another way:
>
> 0-20: normal nice values (inverted)
> >20: privilege to set any RT priority
>
> Limiting to below normal nice is a little weird and the offset to make
> everything positive is weird as well. Above 20, any RT app can starve
> SCHED_OTHER and it's less important to dole out fine-grained levels
> here as these apps must be engineered to cooperate to some degree
> anyway.

Limiting to positive nice values is needed too. At least I need such a
thing. Normal users are only allowed to increase the nice value (lower
prio). If a user job runs at nice 15, they can't renice it to 5. I get
about 3 calls a week to do this as root.

And the presentation of the usual nice values can be done in userspace.
pamlimits and ulimit already convert values (min -> s, KiB -> Bytes).

And separating the nice and RT parts is useful to prevent confusion in
userspace tools and for the admin.

I patched PAM to allow setting nice and realtime rlimits in
limits.conf:

nice goes from 19 to -20 (internally converted to 0-39).
realtime from 0 - 99.
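The nice conversion described above can be sketched as internal = 19 - nice, which maps 19 to 0 and -20 to 39, so that a larger internal value always means a more favourable priority. The function names are illustrative, not taken from utz's PAM patch, and the realtime item is assumed to be stored unchanged as 0..99.

```c
#include <assert.h>

/* limits.conf "nice" item: 19 (least favourable) .. -20 (most favourable). */
int nice_to_internal(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return 19 - nice;           /* 19 -> 0, 0 -> 19, -20 -> 39 */
}

/* Inverse mapping, e.g. for displaying the stored limit to the admin. */
int internal_to_nice(int internal)
{
    assert(internal >= 0 && internal <= 39);
    return 19 - internal;
}
```

Because the mapping is monotonic, a kernel-side check can stay a plain "requested <= limit" comparison on the internal scale, while tools like ulimit keep showing familiar nice values.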



2005-01-11 22:23:37

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Matt Mackall ([email protected]) wrote:
> On Tue, Jan 11, 2005 at 01:42:51PM -0800, Chris Wright wrote:
> > > But I'm also still not convinced this policy can't be most flexibly
> > > handled by a setuid helper together with the mlock rlimit.
> >
> > Wait, why can't it be done with (to date fictitious) pam_prio, which
> > simply calls sched_setscheduler? It's already privileged while it's
> > doing these things...
>
> You certainly do not want to run everything at RT from login on.
> That'd be bad.

Yup, true.

> Also, tying to UIDs rather than (UID, executable) is worrisome as
> random_game_with_audio in Gnome might decide it needs RT, much to the
> admin's surprise.

Hmm, well, the pam_limit approach has that problem.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-11 22:29:54

by Con Kolivas

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Chris Wright wrote:
> * Ingo Molnar ([email protected]) wrote:
>
>>You are right, I forgot about kernel threads. If they are nice -10 on
>>Jack's system too, then they are within striking range indeed, especially
>>since they are typically idle, and when they do run it is in short
>>bursts, so they get the maximum boost. Jack, could you renice these
>>to -5, to make sure they don't interfere?
>
>
> Yup, their bursty nature makes them seem a likely culprit.
>
>
>>btw., why are these at nice -10? workqueue.c sets nice value to -5
>>normally.
>
>
> Heh, I was just wondering the same thing.
>
> BTW, grepping set_user_nice shows a few more possible culprits.
> One more reason that there may be value in promoting the audio app to
> rt scheduling.

They were nice -10. I changed them to nice -5 recently in -mm, and that
just got committed to -bk post 2.6.10.

Cheers,
Con



2005-01-11 22:42:56

by utz lehmann

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, 2005-01-11 at 14:21 -0800, Chris Wright wrote:
> * Matt Mackall ([email protected]) wrote:
> > On Tue, Jan 11, 2005 at 01:42:51PM -0800, Chris Wright wrote:

> > Also, tying to UIDs rather than (UID, executable) is worrisome as
> > random_game_with_audio in Gnome might decide it needs RT, much to the
> > admin's surprise.
>
> Hmm, well, the pam_limit approach has that problem.

selinux?


2005-01-11 22:53:26

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>But I'm also still not convinced this policy can't be most flexibly
>handled by a setuid helper together with the mlock rlimit.

This has been explained several times already.

When you run a JACK client, the user should not be required to use a
different command sequence depending on whether or not JACK is running
with RT scheduling or not. That's almost more arcane than the current
situation and is a step backwards from even 2.4, where we use
capabilities to allow JACK itself to pass on the ability to use RT
scheduling and memlock to its clients.

--p

2005-01-11 22:57:31

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>On Tue, Jan 11, 2005 at 04:38:14PM -0500, Lee Revell wrote:
>> Yes but a bug in an app running as root can trash the filesystem. The
>> worst you can do with RT privileges is lock up the machine.
>
>several filesystem and IO threads run at prio -10 but not RT.
>That makes me a bit less sure of your statement....

It's completely orthogonal. Lee didn't say "tasks running without RT
can't mess up filesystems". He said "tasks running as root can trash
the filesystem" and "tasks running as RT can lock up the
machine". obviously, the intersection point (a root, RT task) is
double trouble.

--p

2005-01-11 22:53:24

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>Thankfully a buffer underrun is no more fatal for pro audio than a
>broken guitar string. CDs skip, DATs glitch, XLR cables flake out,
>circuit breakers trip, amps clip, Powerbooks crash, and the show goes
>on. I've done more than enough stage tech to know it's a huge pain in
>the ass, but let's stop pretending we require absolute perfection,
>please.

Are you really serious? Nobody said anything about absolute
perfection. We've got 2 kernels (2.4+lowlat, and 2.6
+realtime_preempt) whose performance *far* exceeds that of any vanilla
kernel, and in the latter case, probably any other desktop OS. We've
even got a kernel (2.6.9 or maybe .10) whose performance is getting
closer to par with OSX. We want people to be able to access this
performance relatively hassle free. Right now, people who want this
have to jump through a lot of hoops to access something they can, and
should, be able to do quite easily.

*That* is what this is all about, nothing more. From the looks of
things, the performance of vanilla 2.6 in this area is going to
continue to improve, but users' ability to actually use it will remain
in the same primitive condition it's in now.

--p




2005-01-11 22:45:01

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* utz lehmann ([email protected]) wrote:
> On Tue, 2005-01-11 at 14:21 -0800, Chris Wright wrote:
> > * Matt Mackall ([email protected]) wrote:
> > > On Tue, Jan 11, 2005 at 01:42:51PM -0800, Chris Wright wrote:
>
> > > Also, tying to UIDs rather than (UID, executable) is worrisome as
> > > random_game_with_audio in Gnome might decide it needs RT, much to the
> > > admin's surprise.
> >
> > Hmm, well, the pam_limit approach has that problem.
>
> selinux?

Won't work (at least not now). LIDS should be able to do it.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-11 23:09:55

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 05:48:43PM -0500, Paul Davis wrote:
> >But I'm also still not convinced this policy can't be most flexibly
> >handled by a setuid helper together with the mlock rlimit.
>
> This has been explained several times already.
>
> When you run a JACK client, the user should not be required to use a
> different command sequence depending on whether or not JACK is running
> with RT scheduling or not. That's almost more arcane than the current
> situation and is a step backwards from even 2.4, where we use
> capabilities to allow JACK itself to pass on the ability to use RT
> scheduling and memlock to its clients.

And that is a failure of imagination on the part of the JACK
developers. Simply add a library function to libjack or whatever:

jack_make_me_important(...); /* pretty please */

A client starts at normal priority, asks jack nicely to promote it to
RT, then jackd, if so configured/enabled, calls the wrapper with a PID
and a priority level. The wrapper checks the UID/priority/executable
name against its permission table and does sched_set{scheduler,param}
if allowed.

This is nice because Jack gets to make the decisions about what the
appropriate priorities for its clients are (eg they can't be higher
priority than jackd, etc.) and it all gracefully falls back if the
helper isn't enabled.
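A minimal sketch of the setuid wrapper described above, assuming a compiled-in permission table keyed on (UID, executable path) with a maximum SCHED_FIFO priority per entry. The table contents, paths and function names are hypothetical. A real helper would also have to establish the caller's identity and the target's executable securely (for example via /proc/PID/exe), and it would need root or CAP_SYS_NICE for sched_setscheduler() to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical permission table: who may run what at which RT priority. */
struct rt_grant {
    uid_t       uid;
    const char *exe;        /* absolute path of the permitted executable */
    int         max_prio;   /* highest SCHED_FIFO priority allowed */
};

static const struct rt_grant grants[] = {
    { 1000, "/usr/bin/jackd",          70 },
    { 1000, "/usr/local/bin/my-synth", 60 },
};

/* Pure policy check: may (uid, exe) run at SCHED_FIFO priority prio? */
bool rt_grant_allows(uid_t uid, const char *exe, int prio)
{
    assert(exe != NULL);
    for (size_t i = 0; i < sizeof(grants) / sizeof(grants[0]); i++) {
        if (grants[i].uid == uid && strcmp(grants[i].exe, exe) == 0)
            return prio >= 1 && prio <= grants[i].max_prio;
    }
    return false;
}

/*
 * Promote `pid` to SCHED_FIFO if the table allows it. Returns 0 on
 * success or an errno value; EPERM means the table refused the request
 * (or the helper itself lacks privilege).
 */
int rt_promote(pid_t pid, uid_t uid, const char *exe, int prio)
{
    struct sched_param param = { .sched_priority = prio };

    if (!rt_grant_allows(uid, exe, prio))
        return EPERM;
    if (sched_setscheduler(pid, SCHED_FIFO, &param) == -1)
        return errno;
    return 0;
}
```

Keeping the policy check separate from the system call makes it testable without privileges, and jackd can keep each client below its own priority simply through the values it passes in.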

--
Mathematics is the supreme nostalgia of our time.

2005-01-11 23:13:31

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Paul Davis ([email protected]) wrote:
> >On Tue, Jan 11, 2005 at 04:38:14PM -0500, Lee Revell wrote:
> >> Yes but a bug in an app running as root can trash the filesystem. The
> >> worst you can do with RT privileges is lock up the machine.
> >
> >several filesystem and IO threads run at prio -10 but not RT.
> >That makes me a bit less sure of your statement....
>
> It's completely orthogonal. Lee didn't say "tasks running without RT
> can't mess up filesystems". He said "tasks running as root can trash
> the filesystem" and "tasks running as RT can lock up the
> machine". obviously, the intersection point (a root, RT task) is
> double trouble.

This is straying from the core issue... But, Arjan's saying that an RT
(non-root) task could trash the filesystem if it deadlocks the machine
(because those important fs and IO threads don't run).

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-12 01:42:47

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Chris Wright <[email protected]> writes:

> * Paul Davis ([email protected]) wrote:
>> >On Tue, Jan 11, 2005 at 04:38:14PM -0500, Lee Revell wrote:
>> >> Yes but a bug in an app running as root can trash the filesystem. The
>> >> worst you can do with RT privileges is lock up the machine.
>> >
>> >several filesystem and IO threads run at prio -10 but not RT.
>> >That makes me a bit less sure of your statement....
>>
>> It's completely orthogonal. Lee didn't say "tasks running without RT
>> can't mess up filesystems". He said "tasks running as root can trash
>> the filesystem" and "tasks running as RT can lock up the
>> machine". obviously, the intersection point (a root, RT task) is
>> double trouble.
>
> This is straying from the core issue... But, Arjan's saying that an RT
> (non-root) task could trash the filesystem if it deadlocks the machine
> (because those important fs and IO threads don't run).

Lexicographic ambiguity: Lee and Paul are using "trash" for things
like installing a hidden suid root shell or co-opting sendmail into an
open spam relay. Arjan just means crashing the system which forces
reboot to run fsck.
--
joq

2005-01-12 02:10:21

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Ingo Molnar <[email protected]> writes:

> * Jack O'Quin <[email protected]> wrote:
>
>> Here are the corrected results...
>>
>>                                 With -R        Without -R     Without -R
>>                                 (SCHED_FIFO)   (nice -20)     (nice --20)
>>
>> XRUN Count. . . . . . . . . . : 2              2837           43
>> Delay Maximum . . . . . . . . : 3130 usecs     5038044 usecs  501374 usecs
>> Cycle Maximum . . . . . . . . : 960 usecs      18802 usecs    1036 usecs
>
> What kind of non-audio workload was there during this test? 43 xruns
> aren't nice, but they aren't that bad either.

Nothing heavy, but I was reading mail, and switching GNOME workspaces.
Workspace switching often caused trouble in the past, but I had
already hacked my X server not to run nice -10 (which is the Debian
default).

> Plus, are you 100% sure that all audio threads inherited the nice --20
> priority, including the client threads? Normally jackd does a
> setscheduler for the client threads so that they get boosted to
> SCHED_FIFO, but there is no parallel to that in the nice --20 case. Did
> you do that manually, or did you start the clients up from the nice --20
> shell too?

Having totally screwed up the test once already, I hesitate to claim
100% surety about anything. :-)

The script starts all the clients. I ran it with nice --20. I just
started it again so I could check the nice values with GNOME system
monitor. They all have -20, AFAICS. There are a bunch of them at
-20, and I don't see any process that looks relevant without -20.

> If the nice --20 priority setup is perfect and there are still xruns
> then could you try the following hack, change this line in
> kernel/sched.c:
>
> #define STARVATION_LIMIT (MAX_SLEEP_AVG)
>
> to:
>
> #define STARVATION_LIMIT 0
>
> this will turn off starvation checking, for testing purposes. (to see
> whether there's anything else but anti-starvation causing xruns.)

No problem (it might be Thursday before I have time to try it).
--
joq

2005-01-12 02:16:59

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>And that is a failure of imagination on the part of the JACK

Please be careful with your words. Based on your comments below, it
appears that you've never read any of the technical docs on it, and
almost certainly never read the source code.

>developers. Simply add a library function to libjack or whatever:
>
> jack_make_me_important(...); /* pretty please */

like:

int jack_set_client_capabilities (jack_engine_t *engine, jack_client_id_t id);

along with various other things that will ultimately get the client to
call functions like:

int jack_drop_real_time_scheduling (pthread_t thread);
int jack_acquire_real_time_scheduling (pthread_t thread, int priority);

these functions are exported to clients, because some clients have
other threads that require RT scheduling.
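Per-thread helpers with this shape are commonly built on pthread_setschedparam(). The sketch below is not JACK's implementation; the function names are hypothetical, and it assumes SCHED_FIFO when acquiring RT scheduling and SCHED_OTHER when dropping back.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Switch one thread to SCHED_FIFO at `priority`; returns 0 or an errno value. */
int sketch_acquire_rt(pthread_t thread, int priority)
{
    struct sched_param param = { .sched_priority = priority };

    assert(priority >= sched_get_priority_min(SCHED_FIFO) &&
           priority <= sched_get_priority_max(SCHED_FIFO));
    /* Typically fails with EPERM unless the caller is privileged. */
    return pthread_setschedparam(thread, SCHED_FIFO, &param);
}

/* Return one thread to normal time-sharing; returns 0 or an errno value. */
int sketch_drop_rt(pthread_t thread)
{
    struct sched_param param = { .sched_priority = 0 };

    return pthread_setschedparam(thread, SCHED_OTHER, &param);
}
```

Because each call targets a pthread_t rather than a whole process, a client can promote only its audio thread and leave its other threads at normal priority.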

>A client starts at normal priority, asks jack nicely to promote it to
>RT, then jackd, if so configured/enabled, calls the wrapper with a PID

a PID? clients are multithreaded, and only specific threads run with
RT scheduling (normally just the one created for them by
libjack). So you presumably mean a TID, which in turn creates a
problem for any system (e.g. 2.4) where all threads share the PID, and
sched_setscheduler() really does use the PID as a PID, not a TID.
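On Linux 2.6 with NPTL, every thread has its own kernel task ID, and passing that ID to sched_setscheduler() changes only that thread. The sketch below shows the pattern. It calls the raw gettid system call because glibc had no gettid() wrapper until version 2.30, the names are illustrative, and it does not attempt to cover the 2.4 behaviour Paul mentions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel task ID of the calling thread (equals getpid() in the main thread). */
pid_t sketch_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/*
 * Put only the calling thread into SCHED_FIFO. On Linux the first
 * argument of sched_setscheduler() is treated as a thread ID, so the
 * other threads of the process keep their policy.
 * Returns 0, or -1 with errno set.
 */
int sketch_thread_fifo(int priority)
{
    struct sched_param param = { .sched_priority = priority };

    assert(priority >= 1 && priority <= 99);
    return sched_setscheduler(sketch_gettid(), SCHED_FIFO, &param);
}
```

In the main thread the task ID equals getpid(), which is why PID-based code can appear to work until a second thread needs different scheduling.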

but it gets worse. JACK clients need to drop RT scheduling under
certain, well-defined circumstances. how do they get it back under
this scheme?

--p

2005-01-12 03:20:36

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Ingo Molnar <[email protected]> writes:

> * Chris Wright <[email protected]> wrote:
>
>> Hmm, I wonder if this could have anything to do with it. These are
>> within striking range:
>>
>>   PID COMMAND      NI PRI
>>     9 events/1    -10  34
>>   931 kcryptd/1   -10  33
>>   930 kcryptd/0   -10  34
>>     8 events/0    -10  34
>>   892 ata/1       -10  34
>>   891 ata/0       -10  34
>>  3747 udevd       -10  33
>>    26 kacpid      -10  31
>>   238 aio/1       -10  34
>>   237 aio/0       -10  31
>>   117 kblockd/1   -10  34
>>   116 kblockd/0   -10  34
>>    10 khelper     -10  34
>
> You are right, I forgot about kernel threads. If they are nice -10 on
> Jack's system too, then they are within striking range indeed, especially
> since they are typically idle, and when they do run it is in short
> bursts, so they get the maximum boost. Jack, could you renice these
> to -5, to make sure they don't interfere?

Sure. My system does have some of these running at nice -10. Where
(how) do I change them?

BTW, let's not lose sight of the fact that `nice --20 foo' requires
CAP_SYS_NICE just like SCHED_FIFO does. From a privilege perspective,
this recurses to the same (still unsolved) problem.
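Jack's observation can be checked from user space: a negative nice value and SCHED_FIFO both go through the same privilege test. The probes below are a sketch with illustrative names; each tries the change on the calling process, reports the errno, and undoes any change that succeeds. On kernels of this era an unprivileged caller would typically get EACCES from setpriority() and EPERM from sched_setscheduler(); later 2.6 kernels can also grant these through RLIMIT_NICE and RLIMIT_RTPRIO.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>

/* Try nice -20 on the calling process; returns 0 or the errno from setpriority(). */
int probe_nice_minus_20(void)
{
    errno = 0;
    int old = getpriority(PRIO_PROCESS, 0);  /* -1 is a valid nice value */
    assert(errno == 0);

    if (setpriority(PRIO_PROCESS, 0, -20) == -1)
        return errno;                        /* typically EACCES when unprivileged */
    setpriority(PRIO_PROCESS, 0, old);       /* raising nice needs no privilege */
    return 0;
}

/* Try SCHED_FIFO priority 1; returns 0 or the errno from sched_setscheduler(). */
int probe_sched_fifo(void)
{
    struct sched_param fifo = { .sched_priority = 1 };
    struct sched_param normal = { .sched_priority = 0 };

    if (sched_setscheduler(0, SCHED_FIFO, &fifo) == -1)
        return errno;                        /* typically EPERM when unprivileged */
    sched_setscheduler(0, SCHED_OTHER, &normal);  /* assumes we started as SCHED_OTHER */
    return 0;
}
```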

Chris's rlimits proposal was the only workable suggestion I've seen
for that. Is there any hope of doing something like that in the 2.6.x
timeframe?

At this point, I no longer even care that PAM will probably start
randomly assigning users unlimited scheduling rights like it recently
did for mlock. Eventually, that will get fixed. :-(
--
joq

2005-01-12 04:29:48

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Jack O'Quin ([email protected]) wrote:
> Ingo Molnar <[email protected]> writes:
> > You are right, I forgot about kernel threads. If they are nice -10 on
> > Jack's system too, then they are within striking range indeed, especially
> > since they are typically idle, and when they do run it is in short
> > bursts, so they get the maximum boost. Jack, could you renice these
> > to -5, to make sure they don't interfere?
>
> Sure. My system does have some of these running at nice -10. Where
> (how) do I change them?

For a one-off test you can brute force it with the plain old renice(8).
Or (depending on which kernel you're using -- Con changed this post
2.6.10) you can apply a patch like:

diff -Nru a/kernel/workqueue.c b/kernel/workqueue.c
--- a/kernel/workqueue.c	2005-01-11 20:26:26 -08:00
+++ b/kernel/workqueue.c	2005-01-11 20:26:26 -08:00
@@ -188,7 +188,7 @@
 
 	current->flags |= PF_NOFREEZE;
 
-	set_user_nice(current, -10);
+	set_user_nice(current, -5);
 
 	/* Block and flush all signals */
 	sigfillset(&blocked);

> BTW, let's not lose sight of the fact that `nice --20 foo' requires
> CAP_SYS_NICE just like SCHED_FIFO does. From a privilege perspective,
> this recurses to the same (still unsolved) problem.

Yup, not forgotten ;-)

> Chris's rlimits proposal was the only workable suggestion I've seen
> for that. Is there any hope of doing something like that in the 2.6.x
> timeframe?

Yes there is. We've made other rlimits changes in 2.6, and this one isn't
that invasive. The main issues are: getting semantics right, making sure
it actually solves the problem, making sure it keeps sane defaults (not
creating some new ugly hole), and making sure it's in step with the Grand
Plan (TM). None of these issues are showstoppers, all quite workable.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-12 07:51:56

by Arjan van de Ven

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 07:43:29PM -0600, Jack O'Quin wrote:
> > This is straying from the core issue... But, Arjan's saying that an RT
> > (non-root) task could trash the filesystem if it deadlocks the machine
> > (because those important fs and IO threads don't run).
>
> Lexicographic ambiguity: Lee and Paul are using "trash" for things
> like installing a hidden suid root shell or co-opting sendmail into an
> open spam relay. Arjan just means crashing the system which forces
> reboot to run fsck.

I actually meant data corruption.

2005-01-12 19:21:39

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Tue, Jan 11, 2005 at 09:13:44PM -0500, Paul Davis wrote:
> >And that is a failure of imagination on the part of the JACK
>
> Please be careful with your words. Based on your comments below, it
> appears that you've never read any of the technical docs on it, and
> almost certainly never read the source code.

I thought I made it clear that I didn't even know the name of the library.
And I thought I understood from you that you had to do different
start-up per client depending on whether RT was available. Have I
misunderstood you?

> >A client starts at normal priority, asks jack nicely to promote it to
> >RT, then jackd, if so configured/enabled, calls the wrapper with a PID
>
> a PID? clients are multithreaded, and only specific threads run with
> RT scheduling (normally just the one created for them by
> libjack). So you presumably mean a TID, which in turn creates a
> problem for any system (e.g. 2.4) where all threads share the PID, and
> sched_setscheduler() really does use the PID as a PID, not a TID.

That actually sounds like an independent API problem.

> but it gets worse. JACK clients need to drop RT scheduling under
> certain, well-defined circumstances. how do they get it back under
> this scheme?

Assuming a more thread-aware API, they just ask for privileges again.
But with the non-thread-aware API, my first reaction would be the thread in
question clones, and the clone drops privileges.

--
Mathematics is the supreme nostalgia of our time.

2005-01-12 22:09:17

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, 2005-01-12 at 08:49 +0100, Arjan van de Ven wrote:
> On Tue, Jan 11, 2005 at 07:43:29PM -0600, Jack O'Quin wrote:
> > > This is straying from the core issue... But, Arjan's saying that an RT
> > > (non-root) task could trash the filesystem if it deadlocks the machine
> > > (because those important fs and IO threads don't run).
> >
> > Lexicographic ambiguity: Lee and Paul are using "trash" for things
> > like installing a hidden suid root shell or co-opting sendmail into an
> > open spam relay. Arjan just means crashing the system which forces
> > reboot to run fsck.
>
> I actually meant data corruption.

OK, so the ability to run RT tasks implies the ability to possibly
corrupt data. It appears that this can't be fixed until we have a real
isochronous scheduling class; for the foreseeable future RT tasks will
need SCHED_FIFO and nonroot users will need to run them.

Anyway it's good to see the problem finally being taken seriously.

Lee

2005-01-13 00:50:07

by Jack O'Quin

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Arjan van de Ven <[email protected]> writes:

> On Tue, Jan 11, 2005 at 07:43:29PM -0600, Jack O'Quin wrote:
>> Lexicographic ambiguity: Lee and Paul are using "trash" for things
>> like installing a hidden suid root shell or co-opting sendmail into an
>> open spam relay. Arjan just means crashing the system which forces
>> reboot to run fsck.
>
> I actually meant data corruption.

Are you concerned about something different from the "normal" risk of
data corruption when the kernel panics or someone trips over the power
cord?
--
joq

2005-01-13 01:10:09

by Lee Revell

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, 2005-01-12 at 11:09 -0800, Matt Mackall wrote:
> On Tue, Jan 11, 2005 at 09:13:44PM -0500, Paul Davis wrote:
> > >A client starts at normal priority, asks jack nicely to promote it to
> > >RT, then jackd, if so configured/enabled, calls the wrapper with a PID
> >
> > a PID? clients are multithreaded, and only specific threads run with
> > RT scheduling (normally just the one created for them by
> > libjack). So you presumably mean a TID, which in turn creates a
> > problem for any system (e.g. 2.4) where all threads share the PID, and
> > sched_setscheduler() really does use the PID as a PID, not a TID.
>
> That actually sounds like an independent API problem.
>

What's your point? It has to work on 2.4, so this is not a feasible
solution.

> > but it gets worse. JACK clients need to drop RT scheduling under
> > certain, well-defined circumstances. how do they get it back under
> > this scheme?
>
> Assuming a more thread-aware API, they just ask for privileges again.
> But with the non-thread-aware API, my first reaction would be the thread in
> question clones, and the clone drops privileges.
>

Clones? Seems pretty inefficient compared to having a simple mechanism
for root to grant users the ability to run RT tasks. We have such a
system now, and it works perfectly, so any solution that makes people
jump through hoops will be rejected.

Lee

2005-01-13 05:44:01

by Jack O'Quin

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Ingo Molnar <[email protected]> writes:

> * Chris Wright <[email protected]> wrote:
>
>> Hmm, I wonder if this could have anything to do with it. These are
>> within striking range:
>>
>>  PID COMMAND       NI  PRI
>>    9 events/1     -10   34
>>  931 kcryptd/1    -10   33
>>  930 kcryptd/0    -10   34
>>    8 events/0     -10   34
>>  892 ata/1        -10   34
>>  891 ata/0        -10   34
>> 3747 udevd        -10   33
>>   26 kacpid       -10   31
>>  238 aio/1        -10   34
>>  237 aio/0        -10   31
>>  117 kblockd/1    -10   34
>>  116 kblockd/0    -10   34
>>   10 khelper      -10   34
>
> you are right, i forgot about kernel threads. If they are nice -10 on
> Jack's system too then they are within striking range indeed, especially
> since they are typically idle and if then they are active for short
> bursts of time and get the maximum boost. Jack, could you renice these
> to -5, to make sure they dont interfere?

OK, I reran with just 5 processes reniced from -10 to -5. On my
system they were: events, khelper, kblockd, aio and reiserfs. In
addition, I reniced loop0 from -20 to -5.

I made no changes to the kernel, yet. It's still vanilla 2.6.10 with
realtime-lsm built-in.

A whole bunch of jackd, sh and jack_test3_client processes were run at
nice -20.

It didn't make any significant difference...

                                With -R        Without -R     Without -R
                                (SCHED_FIFO)   (nice --20)    (kprocs reniced)

************* SUMMARY RESULT ****************
Total seconds ran . . . . . . : 300
Number of clients . . . . . . : 20
Ports per client. . . . . . . : 4
Frames per buffer . . . . . . : 64
*********************************************
Timeout Count . . . . . . . . : ( 1)           ( 1)           ( 1)
XRUN Count. . . . . . . . . . : 2              43             49
Delay Count (>spare time) . . : 0              0              0
Delay Count (>1000 usecs) . . : 0              0              0
Delay Maximum . . . . . . . . : 3130 usecs     501374 usecs   501415 usecs
Cycle Maximum . . . . . . . . : 960 usecs      1036 usecs     902 usecs
Average DSP Load. . . . . . . : 34.3 %         34.3 %         34.7 %
Average CPU System Load . . . : 8.7 %          7.8 %          8.5 %
Average CPU User Load . . . . : 29.8 %         25.3 %         23.9 %
Average CPU Nice Load . . . . : 0.0 %          0.0 %          0.0 %
Average CPU I/O Wait Load . . : 3.2 %          0.1 %          0.0 %
Average CPU IRQ Load. . . . . : 0.7 %          0.7 %          0.7 %
Average CPU Soft-IRQ Load . . : 0.0 %          0.0 %          0.0 %
Average Interrupt Rate. . . . : 1707.6 /sec    1692.9 /sec    1695.7 /sec
Average Context-Switch Rate . : 11914.9 /sec   11611.2 /sec   11603.6 /sec
*********************************************


One major problem: this `nice --20' hack affects every thread, not
just the critical realtime ones. That's not what we want. Audio
applications make very conscious choices which threads run with high
priority and which do not.

That JACK scheduling test doesn't have any graphical component, so it
cannot detect the problems of audio applications with GTK or Qt
threads running at nice -20 which will interfere with their own signal
processing loop. I expect that to cause a horrible mess.

Plus, we maintain JACK for several platforms including GNU/Linux,
FreeBSD, and Mac OS X. IRIX support is planned soon, possibly Solaris
some day. I would really prefer for Linux to support genuine POSIX
realtime with SCHED_FIFO scheduling. Since that is our primary
development platform, it makes our code a lot more portable.

And, this is not just about JACK. We could change to call nice()
instead of pthread_setschedparam() on Linux, but what about all the
other audio applications? I don't think this is a reasonable thing to
ask of people. It would take a year just to get them all changed,
like herding cats.
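A minimal C sketch makes the per-thread choice concrete. It assumes POSIX threads on Linux, and the helper names are hypothetical. It starts one audio thread with an explicit SCHED_FIFO policy set through thread attributes. The creating thread (say, a GTK or Qt GUI thread) stays at SCHED_OTHER. Launching the whole program under `nice -n -20` cannot express this distinction. If the caller lacks realtime privileges, pthread_create() returns EPERM instead of silently running the thread at normal priority.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Stand-in for the client's audio (process) thread: it only records the
 * scheduling policy it actually received. */
static void *audio_thread(void *arg)
{
    int policy = -1;
    struct sched_param sp;

    if (pthread_getschedparam(pthread_self(), &policy, &sp) != 0)
        policy = -1;
    *(int *)arg = policy;
    return NULL;
}

/* Start one SCHED_FIFO thread at priority prio and wait for it; the
 * calling thread keeps its own (normally SCHED_OTHER) policy. Returns 0
 * or a pthread error code, typically EPERM without RT privileges. */
static int start_rt_thread(int prio, int *observed_policy)
{
    pthread_attr_t attr;
    struct sched_param sp;
    pthread_t tid;
    int err;

    assert(prio >= sched_get_priority_min(SCHED_FIFO) &&
           prio <= sched_get_priority_max(SCHED_FIFO));
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = prio;

    pthread_attr_init(&attr);
    /* Without EXPLICIT_SCHED the new thread inherits the creator's policy. */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);

    err = pthread_create(&tid, &attr, audio_thread, observed_policy);
    pthread_attr_destroy(&attr);
    if (err == 0)
        err = pthread_join(tid, NULL);
    return err;
}
```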

This whole approach seems like a "dry well" to me.

Tomorrow, I'll try the test again after making a new kernel with
STARVATION_LIMIT set to zero.

Anything else I should try?
--
joq

2005-01-13 06:36:28

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, Jan 12, 2005 at 11:44:34PM -0600, Jack O'Quin wrote:
>
> One major problem: this `nice --20' hack affects every thread, not
> just the critical realtime ones. That's not what we want. Audio
> applications make very conscious choices which threads run with high
> priority and which do not.

I don't think it was intended as a final solution but rather as a
feasibility experiment.

> Plus, we maintain JACK for several platforms including GNU/Linux,
> FreeBSD, and Mac OS X. IRIX support is planned soon, possibly Solaris
> some day. I would really prefer for Linux to support genuine POSIX
> realtime with SCHED_FIFO scheduling. Since that is our primary
> development platform, it makes our code a lot more portable.

Good realtime support is looking like a certainty at this point,
though it may take a while for all the bits to be fully merged.

> And, this is not just about JACK. We could change to call nice()
> instead of pthread_setschedparam() on Linux, but what about all the
> other audio applications? I don't think this is a reasonable thing to
> ask of people. It would take a year just to get them all changed,
> like herding cats.

If we can get high priority SCHED_OTHER working sufficiently well,
that will be preferable in the long run as the security implications
are slightly less dire. It's already been noted that it doesn't solve
your privilege problem, but it's still interesting to us because it
has potential to address the deadlock issue.

Doesn't mean you have to use it (though you'll probably want to give
your users the option).

> This whole approach seems like a "dry well" to me.

It may turn out to be. Please continue testing it though - you've got
a good test case handy.

> Tomorrow, I'll try the test again after making a new kernel with
> STARVATION_LIMIT set to zero.
>
> Anything else I should try?

Testing feedback on the bits from Ingo that have gone to -mm will
probably help speed their acceptance in mainline.

--
Mathematics is the supreme nostalgia of our time.

2005-01-13 07:30:08

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Wed, Jan 12, 2005 at 06:44:23PM -0600, Jack O'Quin wrote:
> Arjan van de Ven <[email protected]> writes:
>
> > On Tue, Jan 11, 2005 at 07:43:29PM -0600, Jack O'Quin wrote:
> >> Lexicographic ambiguity: Lee and Paul are using "trash" for things
> >> like installing a hidden suid root shell or co-opting sendmail into an
> >> open spam relay. Arjan just means crashing the system which forces
> >> reboot to run fsck.
> >
> > I actually meant data corruption.
>
> Are you concerned about something different from the "normal" risk of
> data corruption when the kernel panics or someone trips over the power
> cord?

yes; the "normal" risk is time-limited, e.g. the kernel will wait at most 30
seconds before writing back your dirty data, 5 seconds for ext3 actually.
With the "RT-abuse" hang, this 30 second thing goes on hold (because it's
done from those kernel threads that cause you those hiccups in sound :-) and
you can starve for a far longer period of time, which may well mean a far
larger dataset not hitting the disk.
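On 2.6 kernels the 30-second figure corresponds to the vm.dirty_expire_centisecs tunable (default 3000, in hundredths of a second). The flushing itself is done by kernel writeback threads, which is why an RT task that never yields can postpone it indefinitely. The small C sketch below checks the value on a given machine; its helper names are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Convert the text of a /proc/sys/vm centisecond tunable to seconds. */
static double centisecs_text_to_secs(const char *text)
{
    assert(text != NULL);
    return strtol(text, NULL, 10) / 100.0;
}

/* Read dirty_expire_centisecs: how old dirty data may become before the
 * kernel's writeback threads flush it. Returns -1.0 if unreadable. */
static double dirty_expire_secs(void)
{
    char buf[32];
    FILE *f = fopen("/proc/sys/vm/dirty_expire_centisecs", "r");

    if (f == NULL)
        return -1.0;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1.0;
    }
    fclose(f);
    return centisecs_text_to_secs(buf);
}
```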

2005-01-13 19:20:13

by Jack O'Quin

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Matt Mackall <[email protected]> writes:

> If we can get high priority SCHED_OTHER working sufficiently well,
> that will be preferable in the long run as the security implications
> are slightly less dire. It's already been noted that it doesn't solve
> your privilege problem, but it's still interesting to us because it
> has potential to address the deadlock issue.

True.

But there may be other, better solutions to the deadlock problem.
Several years ago, Roger Larsson wrote a completely user-space
realtime monitor program that works perfectly well for revoking
realtime privileges when it detects CPU starvation. I still use it
occasionally to help debug problems if the built-in JACK watchdog
timer doesn't catch them.
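That program isn't shown in this thread. The following is only a minimal C sketch of the general technique such a monitor can use, not a description of Roger Larsson's code. A SCHED_OTHER "canary" thread records when it last ran. A watchdog running at the top SCHED_FIFO priority wakes periodically. If the canary has been starved for longer than a limit, the watchdog demotes the listed realtime threads to SCHED_OTHER. The function names and the millisecond bookkeeping are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* True when the SCHED_OTHER canary, last seen at canary_last_ms, has not
 * run for more than limit_ms: a sign that RT threads are starving the
 * rest of the system. */
static int starving(long now_ms, long canary_last_ms, long limit_ms)
{
    assert(limit_ms > 0);
    return now_ms - canary_last_ms > limit_ms;
}

/* Demote each listed thread to SCHED_OTHER; returns how many succeeded.
 * Demoting other users' threads requires privilege. */
static size_t demote_all(const pid_t *tids, size_t n)
{
    struct sched_param sp;
    size_t ok = 0;

    memset(&sp, 0, sizeof sp);          /* SCHED_OTHER requires priority 0 */
    for (size_t i = 0; i < n; i++)
        if (sched_setscheduler(tids[i], SCHED_OTHER, &sp) == 0)
            ok++;
    return ok;
}
```

In a full program, the watchdog would loop: sleep (e.g. with nanosleep()), read a monotonic clock, and call starving(). On a positive result, it would call demote_all() with the TIDs of the RT threads it guards.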

In my view, Con Kolivas' SCHED_ISO prototype is a good avenue to
explore for mainstream kernel support. With that approach, it is
relatively easy to build in protection against programs that abuse
their promised cycle reservations. This appears to be similar to what
Apple is doing.

SCHED_OTHER is so timesharing-oriented that I seriously doubt its
appropriateness for soft realtime. I say this naively without any
first-hand study of the current Linux implementation. I do understand
traditional Unix schedulers (at one time in detail). The general idea
was to punish CPU-bound processes and reward I/O-bound processes.

> Doesn't mean you have to use it (though you'll probably want to give
> your users the option).

We already do. That's why I was able to experiment with nice --20 so
quickly. In fact, SCHED_OTHER is the default. Users have to specify
-R (--realtime) before JACK requests SCHED_FIFO privileges.

> On Wed, Jan 12, 2005 at 11:44:34PM -0600, Jack O'Quin wrote:
>> This whole approach seems like a "dry well" to me.
>
> It may turn out to be. Please continue testing it though - you've got
> a good test case handy.

Sure.

I didn't write that test script, BTW. (I'd like to know who did.)
IIUC, it is one Lee Revell and Rui Nuno Capela have been using to test
Ingo's RP patches. I got it from Rui. I chose it because it was
handy and I figured Ingo would be familiar with its output.

We are considering including it (or some variant) in the JACK sources,
so any interested user can download JACK, configure, compile, and then
run `make test'.

It is a fairly heavy test. The system takes an interrupt from the
audio card every 1.45 msec, then must schedule 22 realtime threads
belonging to 21 different processes (the JACK server and twenty
clients) before the next interrupt arrives. An XRUN means the system
was late servicing the interrupt (very bad). The "DSP load" indicates
that these threads are using a little over 1/3 of the total bandwidth
of my 1.5GHz Athlon XP.
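The 1.45 msec figure follows from the test parameters: a period is the number of frames per buffer divided by the sample rate. The sample rate isn't stated here, so 44.1 kHz is an assumption. At that rate, 64 frames gives about 1451 usecs, and a DSP load of 34.3% then means roughly 498 usecs of processing per cycle. A C sketch of the arithmetic:

```c
#include <assert.h>

/* Length of one audio period in microseconds: frames / sample rate. */
static double period_usecs(unsigned frames, unsigned rate_hz)
{
    assert(rate_hz > 0);
    return frames * 1e6 / rate_hz;
}

/* Processing time per period implied by a DSP load given in percent. */
static double dsp_usecs(double load_pct, double period_us)
{
    assert(load_pct >= 0.0);
    return period_us * load_pct / 100.0;
}
```

At 48 kHz, the same 64-frame buffer would give a period of about 1333 usecs.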

>> Tomorrow, I'll try the test again after making a new kernel with
>> STARVATION_LIMIT set to zero.
>>
>> Anything else I should try?
>
> Testing feedback on the bits from Ingo that have gone to -mm will
> probably help speed their acceptance in mainline.

Several people continue working with him on that. Lee and Rui have
been instrumental in testing with Ingo's kernels and in developing
JACK patches to gather needed information. Much of their
instrumentation will be included in the next JACK release.

We are all highly motivated to help. We want Linux to have the best
soft realtime possible, while working within the very real constraints
of what it is practical to do in a general-purpose OS.

I'd hate for the OSX folks to do better.
--
joq

2005-01-13 21:10:57

by Jack O'Quin

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM


>> > On Tue, Jan 11, 2005 at 07:43:29PM -0600, Jack O'Quin wrote:
>> >> Arjan just means crashing the system which forces reboot to run
>> >> fsck.

>> Arjan van de Ven <[email protected]> writes:
>> > I actually meant data corruption.

> On Wed, Jan 12, 2005 at 06:44:23PM -0600, Jack O'Quin wrote:
>> Are you concerned about something different from the "normal" risk of
>> data corruption when the kernel panics or someone trips over the power
>> cord?

Arjan van de Ven <[email protected]> writes:
> yes; the "normal" risk is time-limited, e.g. the kernel will wait at most 30
> seconds before writing back your dirty data, 5 seconds for ext3 actually.
> With the "RT-abuse" hang, this 30 second thing goes on hold (because it's
> done from those kernel threads that cause you those hiccups in sound :-) and
> you can starve for a far longer period of time, which may well mean a far
> larger dataset not hitting the disk.

Ah, good point.

Just thinking about this naively, I come up with two scenarios:

(1) SMP -- RT thread hangs one CPU. Kernel threads can still run on
other processors. Rest of system continues running (degraded) until
more RT threads hang the remaining CPUs at which time we end up
with...

(2) UP -- RT thread hangs the last remaining CPU. Kernel threads
can't run. User processes can no longer write data to FS.

(Probably, this simplistic analysis misses some other, more subtle,
factors.)

RT threads should not do FS writes of their own. But, a badly broken
or malicious one could, I suppose. So, that might provide a mechanism
for losing more data than usual. Is that what you had in mind?
--
joq

2005-01-13 21:14:49

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, Jan 13, 2005 at 03:04:26PM -0600, Jack O'Quin wrote:
>
> (Probably, this simplistic analysis misses some other, more subtle,
> factors.)

I think you can do nasty things to the locks held by those threads too

>
> RT threads should not do FS writes of their own. But, a badly broken
> or malicious one could, I suppose. So, that might provide a mechanism
> for losing more data than usual. Is that what you had in mind?

basically yes.
note that "FS writes" can come from various things, including library calls
made and such. But I think you got my point; even though it might seem a bit
theoretical it sure is unpleasant.

2005-01-13 21:30:32

by Lee Revell

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, 2005-01-13 at 22:07 +0100, Arjan van de Ven wrote:
> On Thu, Jan 13, 2005 at 03:04:26PM -0600, Jack O'Quin wrote:
> >
> > (Probably, this simplistic analysis misses some other, more subtle,
> > factors.)
>
> I think you can do nasty things to the locks held by those threads too
>
> >
> > RT threads should not do FS writes of their own. But, a badly broken
> > or malicious one could, I suppose. So, that might provide a mechanism
> > for losing more data than usual. Is that what you had in mind?
>
> basically yes.
> note that "FS writes" can come from various things, including library calls
> made and such. But I think you got my point; even though it might seem a bit
> theoretical it sure is unpleasant.
>

I added Con to the cc: because this thread is starting to converge with
an email discussion we've been having.

The basic issue is that the current semantics of SCHED_FIFO seem to make
the deadlock/data corruption due to runaway RT thread issue difficult.
The obvious solution is a new scheduling class equivalent to SCHED_FIFO
but with a mechanism for the kernel to demote the offending thread to
SCHED_OTHER in an emergency. The problem can be solved in userspace
with a SCHED_FIFO watchdog thread that runs at a higher RT priority than
all other RT processes.

This all seems to imply that introducing an rlimit for MAX_RT_PRIO is an
excellent solution. The RT watchdog thread could run as root, and the
rlimit would be used to ensure that even nonroot users in the RT group
could never preempt the watchdog thread.
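Below is a minimal C sketch of the arrangement Lee describes. It assumes the watchdog takes the highest SCHED_FIFO priority and the RT group's ceiling is set one level below it. The function names are hypothetical. For what it's worth, kernels from 2.6.12 onward provide a limit of this kind as RLIMIT_RTPRIO.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Highest SCHED_FIFO priority; the root-owned watchdog would run here. */
static int watchdog_prio(void)
{
    return sched_get_priority_max(SCHED_FIFO);
}

/* Hypothetical per-group RT priority ceiling, kept one level below the
 * watchdog so that no group member can ever preempt it. */
static int rt_group_ceiling(void)
{
    return watchdog_prio() - 1;
}

/* The check such an rlimit implies: a SCHED_FIFO request is allowed only
 * if it is a valid FIFO priority not exceeding the caller's ceiling. */
static int rt_request_allowed(int requested, int ceiling)
{
    assert(ceiling >= 0);
    return requested >= sched_get_priority_min(SCHED_FIFO) &&
           requested <= ceiling;
}
```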

Lee

2005-01-13 21:51:54

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM


On Thu, Jan 13, 2005 at 04:25:08PM -0500, Lee Revell wrote:
> The basic issue is that the current semantics of SCHED_FIFO seem to make
> the deadlock/data corruption due to runaway RT thread issue difficult.
> The obvious solution is a new scheduling class equivalent to SCHED_FIFO
> but with a mechanism for the kernel to demote the offending thread to
> SCHED_OTHER in an emergency.

and this is getting really close to the original "counter proposal" to the
LSM module that was basically "let's make lower nice limit an rlimit, and
have -20 mean "basically FIFO" *if* the task behaves itself".
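Rlimit values cannot be negative, so a "lower nice limit" rlimit needs some encoding of the nice range. The C sketch below uses the 20 - rlim_cur mapping that was later adopted for RLIMIT_NICE (Linux 2.6.12 onward). Under that mapping, limits 1 to 40 correspond to nice 19 down to -20. The function names are illustrative, and CAP_SYS_NICE is not modelled. The check applies only when a task asks to lower its nice value; raising it needs no permission.

```c
#include <assert.h>

/* Modelled on the RLIMIT_NICE rule: an unprivileged task may lower its
 * nice value to `nice` only if 20 - nice <= rlim_cur. CAP_SYS_NICE,
 * which bypasses the check, is not modelled. */
static int can_nice_to(int nice, unsigned long rlim_cur)
{
    assert(nice >= -20 && nice <= 19);
    return (unsigned long)(20 - nice) <= rlim_cur;
}

/* Most favourable nice value reachable under a given limit. A result of
 * 20 (limit 0) means no nice value can be reached by lowering. */
static int lowest_nice_allowed(unsigned long rlim_cur)
{
    if (rlim_cur >= 40)       /* also covers RLIM_INFINITY */
        return -20;
    return 20 - (int)rlim_cur;
}
```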

2005-01-13 23:43:04

by Jack O'Quin

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Arjan van de Ven <[email protected]> writes:

> On Thu, Jan 13, 2005 at 04:25:08PM -0500, Lee Revell wrote:
>> The basic issue is that the current semantics of SCHED_FIFO seem to make
>> the deadlock/data corruption due to runaway RT thread issue difficult.
>> The obvious solution is a new scheduling class equivalent to SCHED_FIFO
>> but with a mechanism for the kernel to demote the offending thread to
>> SCHED_OTHER in an emergency.
>
> and this is getting really close to the original "counter proposal" to the
> LSM module that was basically "let's make lower nice limit an rlimit, and
> have -20 mean "basically FIFO" *if* the task behaves itself".

Yes. However, my tests have so far shown a need for "actual FIFO as
long as the task behaves itself."

Otherwise, your rlimits proposal is fine. I still think it puts more
of a burden on the sysadmin, but nobody else seems to care about that.
--
joq

2005-01-14 00:40:14

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Jack O'Quin ([email protected]) wrote:
> Otherwise, your rlimits proposal is fine. I still think it puts more
> of a burden on the sysadmin, but nobody else seems to care about that.

Actually, I care. However, I don't think the burden is really too
much greater. It may put some extra burden on the how-to-audio writer.
But adding a group and editing /etc/security/limits.conf doesn't sound
too bad to me.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-14 00:56:45

by Con Kolivas

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Jack O'Quin wrote:
> Arjan van de Ven <[email protected]> writes:
>
>
>>On Thu, Jan 13, 2005 at 04:25:08PM -0500, Lee Revell wrote:
>>
>>>The basic issue is that the current semantics of SCHED_FIFO seem to make
>>>the deadlock/data corruption due to runaway RT thread issue difficult.
>>>The obvious solution is a new scheduling class equivalent to SCHED_FIFO
>>>but with a mechanism for the kernel to demote the offending thread to
>>>SCHED_OTHER in an emergency.
>>
>>and this is getting really close to the original "counter proposal" to the
>>LSM module that was basically "let's make lower nice limit an rlimit, and
>>have -20 mean "basically FIFO" *if* the task behaves itself".
>
>
> Yes. However, my tests have so far shown a need for "actual FIFO as
> long as the task behaves itself."

I should comment on this thread on lkml. After some
investigation/discussion and testing I came up with a proposal for this
problem. Since we are a general purpose operating system and not a hard
rt system (although addon patches are clearly making that a future
possibility) we need a solution that is satisfactory to a general...

There are two ways I suggested for this.

First, (and I am increasingly believing in the second) is to implement a
new scheduling class for isochronous scheduling. This would be a class
for unprivileged users, and behave like SCHED_RR (to avoid complications
of QoS features we don't have infrastructure for) at a priority just
above SCHED_NORMAL, but below all privileged SCHED_RR and SCHED_FIFO.
Importantly, a soft cpu limit and rate period can be set by default for
this scheduling class that provides good true SCHED_RR performance, and
is configurable. Literature suggests that 70% is adequate cpu for good
real time performance and would be starvation free. I believe setting
70% with 10% hysteresis (dropping to say 63% on hitting limit) would be
a good start. Beyond this, however, to satisfy the needs of those with
more demanding setups, a simple configurable runtime setting to set both
the cpu% and the rate period could be available to something as simple
as proc
/proc/sys/kernel/iso_cpu
/proc/sys/kernel/iso_cpu_period
where iso_cpu is set to 70, and period to maybe 1 second. The actual
mode of setting this tunable is not important, and could be in /sys or
whatever
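The proposed accounting can be pictured with a hedged C sketch; this is not Con's actual patch. The sketch keeps a decaying average of the CPU share used by SCHED_ISO tasks. It throttles them once that average exceeds iso_cpu (70%) and lifts the throttle only when it falls below the 10% hysteresis point (63%). Three choices here are assumptions made for illustration: the struct, the exponential average, and expressing iso_cpu_period in timer ticks (e.g. 1000 ticks for 1 second at HZ=1000).

```c
#include <assert.h>
#include <stdbool.h>

struct iso_acct {
    unsigned limit_pct;   /* iso_cpu, e.g. 70 */
    unsigned resume_pct;  /* limit less 10% hysteresis, e.g. 63 */
    unsigned period;      /* iso_cpu_period expressed in ticks */
    double usage;         /* decaying average of ISO CPU use, in percent */
    bool throttled;       /* true while ISO tasks are demoted */
};

static void iso_init(struct iso_acct *a, unsigned limit_pct,
                     unsigned period_ticks)
{
    assert(limit_pct <= 100 && period_ticks > 0);
    a->limit_pct = limit_pct;
    a->resume_pct = limit_pct * 9 / 10;   /* 70 -> 63 */
    a->period = period_ticks;
    a->usage = 0.0;
    a->throttled = false;
}

/* Account one timer tick; ran_iso says whether an ISO task used it. */
static void iso_tick(struct iso_acct *a, bool ran_iso)
{
    double sample = ran_iso ? 100.0 : 0.0;

    /* Exponential moving average with a time constant of one period. */
    a->usage += (sample - a->usage) / a->period;
    if (!a->throttled && a->usage > a->limit_pct)
        a->throttled = true;
    else if (a->throttled && a->usage < a->resume_pct)
        a->throttled = false;
}
```

The hysteresis keeps a task that hovers around the limit from flapping between throttled and unthrottled on every tick.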

The second option is to not implement a new scheduling class at all, and
allow unprivileged users to use either SCHED_FIFO or SCHED_RR, but to
make the cpu constraints described for SCHED_ISO above apply to their
use of those classes. Supporting priority settings for these could be
possible, but in my opinion, it would work as a better class if they
only had one priority level, as for the SCHED_ISO implementation above
(better than any SCHED_NORMAL, but lower than privileged SCHED_RR/FIFO).

This latter approach to me seems the least invasive and most user and
sysadmin friendly method.

What was amusing to me was that after I suggested the latter option, I
discovered that was basically what OSX does; however, not being a real
multi-user operating system, they had absurd limits for cpu at 90% by
default. Theory suggests 70% should be a good default limit.

Cheers,
Con


Attachments:
signature.asc (256.00 B)
OpenPGP digital signature

2005-01-14 01:22:04

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 14, 2005 at 11:50:14AM +1100, Con Kolivas wrote:
> Jack O'Quin wrote:
> >Arjan van de Ven <[email protected]> writes:
> >
> >
> >>On Thu, Jan 13, 2005 at 04:25:08PM -0500, Lee Revell wrote:
> >>
> >>>The basic issue is that the current semantics of SCHED_FIFO seem to make
> >>>the deadlock/data corruption due to runaway RT thread issue difficult.
> >>>The obvious solution is a new scheduling class equivalent to SCHED_FIFO
> >>>but with a mechanism for the kernel to demote the offending thread to
> >>>SCHED_OTHER in an emergency.
> >>
> >>and this is getting really close to the original "counter proposal" to the
> >>LSM module that was basically "let's make lower nice limit an rlimit, and
> >>have -20 mean "basically FIFO" *if* the task behaves itself".
> >
> >
> >Yes. However, my tests have so far shown a need for "actual FIFO as
> >long as the task behaves itself."
>
> I should comment on this thread on lkml. After some
> investigation/discussion and testing I came up with a proposal for this
> problem. Since we are a general purpose operating system and not a hard
> rt system (although addon patches are clearly making that a future
> possibility) we need a solution that is satisfactory to a general...
>
> There are two ways I suggested for this.
>
> First, (and I am increasingly believing in the second) is to implement a
> new scheduling class for isochronous scheduling. This would be a class
> for unprivileged users, and behave like SCHED_RR (to avoid complications
> of QoS features we don't have infrastructure for) at a priority just
> above SCHED_NORMAL, but below all privileged SCHED_RR and SCHED_FIFO.
> Importantly, a soft cpu limit and rate period can be set by default for
> this scheduling class that provides good true SCHED_RR performance, and
> is configurable. Literature suggests that 70% is adequate cpu for good
> real time performance and would be starvation free. I believe setting
> 70% with 10% hysteresis (dropping to say 63% on hitting limit) would be
> a good start. Beyond this, however, to satisfy the needs of those with
> more demanding setups, a simple configurable runtime setting to set both
> the cpu% and the rate period could be available to something as simple
> as proc
> /proc/sys/kernel/iso_cpu
> /proc/sys/kernel/iso_cpu_period
> where iso_cpu is set to 70, and period to maybe 1 second. The actual
> mode of setting this tunable is not important, and could be in /sys or
> whatever

This sounds promising, but I think it still needs to be privileged.
See a) below.

> The second option is to not implement a new scheduling class at all, and
> allow unprivileged users to use either SCHED_FIFO or SCHED_RR, but to
> make the cpu constraints described for SCHED_ISO above apply to their
> use of those classes. Supporting priority settings for these could be
> possible, but in my opinion, it would work as a better class if they
> only had one priority level, as for the SCHED_ISO implementation above
> (better than any SCHED_NORMAL, but lower than privileged SCHED_RR/FIFO).

a) How to arbitrate between competing unprivileged users that want
pseudo-SCHED_FIFO? Do they all lose?
b) Priority levels are important here, we want to be able to do things
like have audio run at higher priority than video.

>
> This latter approach to me seems the least invasive and most user and
> sysadmin friendly method.
>
> What was amusing to me was that after I suggested the latter option, I
> discovered that was basically what OSX does; however, not being a real
> multi-user operating system, they had absurd limits for cpu at 90% by
> default. Theory suggests 70% should be a good default limit.
>
> Cheers,
> Con



--
Mathematics is the supreme nostalgia of our time.

2005-01-14 01:33:10

by Con Kolivas

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Matt Mackall wrote:
> On Fri, Jan 14, 2005 at 11:50:14AM +1100, Con Kolivas wrote:
>
>>Jack O'Quin wrote:
>>
>>>Arjan van de Ven <[email protected]> writes:
>>>
>>>
>>>
>>>>On Thu, Jan 13, 2005 at 04:25:08PM -0500, Lee Revell wrote:
>>>>
>>>>
>>>>>The basic issue is that the current semantics of SCHED_FIFO seem to make
>>>>>the deadlock/data corruption due to runaway RT thread issue difficult.
>>>>>The obvious solution is a new scheduling class equivalent to SCHED_FIFO
>>>>>but with a mechanism for the kernel to demote the offending thread to
>>>>>SCHED_OTHER in an emergency.
>>>>
>>>>and this is getting really close to the original "counter proposal" to the
>>>>LSM module that was basically "let's make lower nice limit an rlimit, and
>>>>have -20 mean "basically FIFO" *if* the task behaves itself".
>>>
>>>
>>>Yes. However, my tests have so far shown a need for "actual FIFO as
>>>long as the task behaves itself."
>>
>>I should comment on this thread on lkml. After some
>>investigation/discussion and testing I came up with a proposal for this
>>problem. Since we are a general purpose operating system and not a hard
>>rt system (although addon patches are clearly making that a future
>>possibility) we need a solution that is satisfactory to a general...
>>
>>There are two ways I suggested for this.
>>
>>First, (and I am increasingly believing in the second) is to implement a
>>new scheduling class for isochronous scheduling. This would be a class
>>for unprivileged users, and behave like SCHED_RR (to avoid complications
>>of QoS features we don't have infrastructure for) at a priority just
>>above SCHED_NORMAL, but below all privileged SCHED_RR and SCHED_FIFO.
>>Importantly, a soft cpu limit and rate period can be set by default for
>>this scheduling class that provides good true SCHED_RR performance, and
>>is configurable. Literature suggests that 70% is adequate cpu for good
>>real time performance and would be starvation free. I believe setting
>>70% with 10% hysteresis (dropping to say 63% on hitting limit) would be
>>a good start. Beyond this, however, to satisfy the needs of those with
>>more demanding setups, a simple configurable runtime setting to set both
>>the cpu% and the rate period could be available to something as simple
>>as proc
>>/proc/sys/kernel/iso_cpu
>>/proc/sys/kernel/iso_cpu_period
>>where iso_cpu is set to 70, and period to maybe 1 second. The actual
>>mode of setting this tunable is not important, and could be in /sys or
>>whatever
>
>
> This sounds promising, but I think it still needs to be privileged.
> See a) below.

>>The second option is to not implement a new scheduling class at all, and
>>allow unprivileged users to use either SCHED_FIFO or SCHED_RR, but to
>>make the cpu constraints described for SCHED_ISO above apply to their
>>use of those classes. Supporting priority settings for these could be
>>possible, but in my opinion, it would work as a better class if they
>>only had one priority level, as for the SCHED_ISO implementation above
>>(better than any SCHED_NORMAL, but lower than privileged SCHED_RR/FIFO).
>
>
> a) How to arbitrate between competing unprivileged users that want
> pseudo-SCHED_FIFO? Do they all lose?
> b) Priority levels are important here, we want to be able to do things
> like have audio run at higher priority than video.

Indeed, what OSX does is pretend to pay attention to the QoS requests
given to it. In actual fact all it does is RR frequently enough, and
then everything tends to work anyway...

The reason I suggest not supporting priorities (at the moment) is for
them to even work I would need to implement a complete Earliest Deadline
First scheduler, ideally with more syscalls and transferring QoS
requests to the scheduler. This is umm non-trivial to say the least...
but possible on a landscape not living in a 2.6 forever development world.
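For readers unfamiliar with the term: Earliest Deadline First always runs the runnable task whose current deadline is soonest. It therefore needs to be told each task's deadlines, hence the extra syscalls and QoS plumbing mentioned above. The toy C sketch below shows the selection rule. It also includes the classic Liu and Layland feasibility test: on one CPU, preemptive periodic tasks whose deadlines equal their periods are EDF-schedulable if and only if total utilization sum(C_i/T_i) is at most 1. The task names and numbers are made up.

```c
#include <assert.h>
#include <stddef.h>

struct edf_task {
    const char *name;
    unsigned long deadline_us;   /* absolute deadline of the current job */
    int runnable;
};

/* Earliest Deadline First: among runnable tasks, pick the one whose
 * deadline is soonest. Returns NULL if nothing is runnable. */
static const struct edf_task *edf_pick(const struct edf_task *t, size_t n)
{
    const struct edf_task *best = NULL;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (t[i].runnable &&
            (best == NULL || t[i].deadline_us < best->deadline_us))
            best = &t[i];
    return best;
}

/* Liu & Layland: on one CPU, preemptive periodic tasks with deadline equal
 * to period are EDF-schedulable iff sum(cost_i / period_i) <= 1. */
static int edf_feasible(const double *cost, const double *period, size_t n)
{
    double u = 0.0;

    for (size_t i = 0; i < n; i++) {
        assert(period[i] > 0.0);
        u += cost[i] / period[i];
    }
    return u <= 1.0;
}
```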

Cheers,
Con


Attachments:
signature.asc (256.00 B)
OpenPGP digital signature

2005-01-14 02:05:59

by utz lehmann

[permalink] [raw]
Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, 2005-01-13 at 16:25 -0500, Lee Revell wrote:
> On Thu, 2005-01-13 at 22:07 +0100, Arjan van de Ven wrote:
> > On Thu, Jan 13, 2005 at 03:04:26PM -0600, Jack O'Quin wrote:
> > >
> > > (Probably, this simplistic analysis misses some other, more subtle,
> > > factors.)
> >
> > I think you can do nasty things to the locks held by those threads too
> >
> > >
> > > RT threads should not do FS writes of their own. But, a badly broken
> > > or malicious one could, I suppose. So, that might provide a mechanism
> > > for losing more data than usual. Is that what you had in mind?
> >
> > basically yes.
> > note that "FS writes" can come from various things, including library calls
> > made and such. But I think you got my point; even though it might seem a bit
> > theoretical it sure is unpleasant.
> >
>
> I added Con to the cc: because this thread is starting to converge with
> an email discussion we've been having.
>
> The basic issue is that the current semantics of SCHED_FIFO seem to make
> the deadlock/data corruption due to runaway RT thread issue difficult.
> The obvious solution is a new scheduling class equivalent to SCHED_FIFO
> but with a mechanism for the kernel to demote the offending thread to
> SCHED_OTHER in an emergency. The problem can be solved in userspace
> with a SCHED_FIFO watchdog thread that runs at a higher RT priority than
> all other RT processes.
>
> This all seems to imply that introducing an rlimit for MAX_RT_PRIO is an
> excellent solution. The RT watchdog thread could run as root, and the
> rlimit would be used to ensure that even nonroot users in the RT group
> could never preempt the watchdog thread.

Just an idea. What about throttling runaway RT tasks?
If the system spends more than 98% of its time in RT tasks for 5s, treat
this as a _fatal error_. Print an error message and throttle RT tasks by
inserting ticks in which only SCHED_OTHER tasks are allowed to run. For a
limit of 98%, this means one SCHED_OTHER-only tick in every 50 ticks.

The limit and timeout should be configurable and of course it can be
disabled.

I know this goes against the rule that RT tasks preempt all SCHED_OTHER
tasks, but this is only for a fatal system state, to be able to recover
sanely. A locked-up machine is the worse alternative.
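The throttling proposal can be sketched as per-tick accounting. This is a minimal sketch, not code from any posted patch: `struct rt_throttle`, `rt_throttle_tick()` and the `HZ` value of 1000 are assumptions for illustration. It averages the RT share over a fixed window (5s at the assumed HZ), and once that share exceeds the limit it reserves one tick in every `100 / (100 - limit)` for SCHED_OTHER tasks, which gives one in 50 for a 98% limit. A limit of 100 disables the mechanism. The sketch never leaves the throttled state, since the proposal treats the condition as fatal.

```c
#include <stdbool.h>
#include <stdio.h>

#define HZ 1000  /* assumed tick rate for this sketch */

/* Hypothetical throttle state; not from any real patch. */
struct rt_throttle {
    unsigned int limit_pct;    /* e.g. 98; 100 disables throttling */
    unsigned int window_ticks; /* e.g. 5 * HZ for a 5s window */
    unsigned int ticks;        /* ticks seen in the current window */
    unsigned int rt_ticks;     /* ticks of the window spent in RT tasks */
    bool throttled;            /* set once the fatal state is detected */
    unsigned int since_other;  /* ticks since the last SCHED_OTHER-only tick */
};

/* Account one tick (ran_rt: an RT task used it). Returns true when the
 * scheduler should reserve the upcoming tick for SCHED_OTHER tasks only. */
static bool rt_throttle_tick(struct rt_throttle *t, bool ran_rt)
{
    if (t->limit_pct >= 100)
        return false;                   /* throttling disabled */

    if (t->throttled) {
        /* 98% -> 100 / 2 = 50: one SCHED_OTHER-only tick in every 50 */
        unsigned int period = 100 / (100 - t->limit_pct);
        if (++t->since_other >= period) {
            t->since_other = 0;
            return true;
        }
        return false;
    }

    t->ticks++;
    if (ran_rt)
        t->rt_ticks++;
    if (t->ticks >= t->window_ticks) {
        /* Integer form of: rt_ticks / ticks > limit_pct / 100 */
        if (t->rt_ticks * 100 > t->limit_pct * t->ticks) {
            fprintf(stderr,
                    "RT tasks exceeded %u%% of CPU for %u ticks, throttling\n",
                    t->limit_pct, t->ticks);
            t->throttled = true;
        }
        t->ticks = 0;
        t->rt_ticks = 0;
    }
    return false;
}
```

For limits that do not divide 100 evenly, the integer division only approximates the requested share; a real implementation would want finer-grained accounting.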


2005-01-14 02:09:53

by Con Kolivas

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

utz lehmann wrote:
> On Thu, 2005-01-13 at 16:25 -0500, Lee Revell wrote:
>
>>On Thu, 2005-01-13 at 22:07 +0100, Arjan van de Ven wrote:
>>
>>>On Thu, Jan 13, 2005 at 03:04:26PM -0600, Jack O'Quin wrote:
>>>
>>>>(Probably, this simplistic analysis misses some other, more subtle,
>>>>factors.)
>>>
>>>I think you can do nasty things to the locks held by those threads too
>>>
>>>
>>>>RT threads should not do FS writes of their own. But, a badly broken
>>>>or malicious one could, I suppose. So, that might provide a mechanism
>>>>for losing more data than usual. Is that what you had in mind?
>>>
>>>basically yes.
>>>note that "FS writes" can come from various things, including library calls
>>>made and such. But I think you got my point; even though it might seem a bit
>>>theoretical it sure is unpleasant.
>>>
>>
>>I added Con to the cc: because this thread is starting to converge with
>>an email discussion we've been having.
>>
>>The basic issue is that the current semantics of SCHED_FIFO seem to make
>>the deadlock/data corruption due to runaway RT thread issue difficult.
>>The obvious solution is a new scheduling class equivalent to SCHED_FIFO
>>but with a mechanism for the kernel to demote the offending thread to
>>SCHED_OTHER in an emergency. The problem can be solved in userspace
>>with a SCHED_FIFO watchdog thread that runs at a higher RT priority than
>>all other RT processes.
>>
>>This all seems to imply that introducing an rlimit for MAX_RT_PRIO is an
>>excellent solution. The RT watchdog thread could run as root, and the
>>rlimit would be used to ensure that even nonroot users in the RT group
>>could never preempt the watchdog thread.
>
>
> Just an idea. What about throttling runaway RT tasks?
> If the system spends more than 98% of its time in RT tasks for 5s, treat
> this as a _fatal error_. Print an error message and throttle RT tasks by
> inserting ticks in which only SCHED_OTHER tasks are allowed to run. For a
> limit of 98%, this means one SCHED_OTHER-only tick in every 50 ticks.
>
> The limit and timeout should be configurable and of course it can be
> disabled.
>
> I know this goes against the rule that RT tasks preempt all SCHED_OTHER
> tasks, but this is only for a fatal system state, to be able to recover
> sanely. A locked-up machine is the worse alternative.

There is a patch in -mm currently designed to use a sysrq key
combination which converts all real time tasks to sched normal to save
you if you desire in a lockup situation. We do want to preserve RT
scheduling behaviour at all times without caveats for privileged users.

Cheers,
Con


2005-01-14 02:24:29

by Nick Piggin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-14 at 03:05 +0100, utz lehmann wrote:
> On Thu, 2005-01-13 at 16:25 -0500, Lee Revell wrote:

> > This all seems to imply that introducing an rlimit for MAX_RT_PRIO is an
> > excellent solution. The RT watchdog thread could run as root, and the
> > rlimit would be used to ensure that even nonroot users in the RT group
> > could never preempt the watchdog thread.
>
> Just an idea. What about throttling runaway RT tasks?
> If the system spends more than 98% of its time in RT tasks for 5s, treat
> this as a _fatal error_. Print an error message and throttle RT tasks by
> inserting ticks in which only SCHED_OTHER tasks are allowed to run. For a
> limit of 98%, this means one SCHED_OTHER-only tick in every 50 ticks.
>
> The limit and timeout should be configurable and of course it can be
> disabled.
>

This is just a hack. Realtime scheduling is pretty rigidly specified,
and we satisfy that. Thus it is useful for systems that need to make
use of it. The way SCHED_FIFO and SCHED_RR scheduling is specified is
inherently insecure/incompatible with a multi-user machine; I don't
understand why people are getting heated over this debate. You literally
can't run more than one realtime system on the same CPU(s) if they have
no knowledge of one another.

SCHED_FIFO and SCHED_RR are definitely privileged operations and you
can't really change them without making them useless to legitimate
users, I think.
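The privileged status of SCHED_FIFO and SCHED_RR shows up directly in the POSIX API: an unprivileged process that asks for a real-time policy through `sched_setscheduler()` is refused, normally with `EPERM`. The probe below is a small sketch (the helper name `probe_sched_fifo()` is hypothetical); if the request succeeds, it immediately drops back to SCHED_OTHER. Which callers count as privileged depends on the kernel: in the kernels discussed here it is essentially a CAP_SYS_NICE check, and later kernels also consult an `RLIMIT_RTPRIO` resource limit.

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical probe: ask for SCHED_FIFO at priority `prio` for the
 * caller. If the kernel allows it, switch straight back to SCHED_OTHER
 * so the probe leaves no lasting change.
 * Returns 0 if the request was granted, otherwise the errno value
 * (typically EPERM for an unprivileged caller). */
static int probe_sched_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return errno;

    sp.sched_priority = 0;  /* SCHED_OTHER requires priority 0 */
    (void)sched_setscheduler(0, SCHED_OTHER, &sp);
    return 0;
}
```

Run as an ordinary user without any real-time allowance, the probe is expected to return `EPERM`; run with CAP_SYS_NICE (for example as root), it returns 0.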



2005-01-14 02:24:21

by Andrew Morton

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Con Kolivas <[email protected]> wrote:
>
> There is a patch in -mm currently designed to use a sysrq key
> combination which converts all real time tasks to sched normal to save
> you if you desire in a lockup situation.

That's in 2.6.11-rc1 now. sysrq-n.
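Besides the keyboard, sysrq-n can be sent on kernels built with magic SysRq support by writing the command character to `/proc/sysrq-trigger` as root. The helper below is a small sketch (`sysrq_send()` is a hypothetical name) that takes the trigger path as a parameter so it can be exercised against an ordinary file; on a live system the path would be `/proc/sysrq-trigger` and the command `'n'`. This only helps if a root shell can still get scheduled while the runaway RT task is spinning.

```c
#include <errno.h>
#include <stdio.h>

/* Write a single SysRq command character to a trigger file.
 * On a real system: sysrq_send("/proc/sysrq-trigger", 'n') as root.
 * Returns 0 on success, or the errno value on failure. */
static int sysrq_send(const char *trigger_path, char cmd)
{
    FILE *f = fopen(trigger_path, "w");
    if (f == NULL)
        return errno;

    if (fputc(cmd, f) == EOF) {
        int err = errno;
        fclose(f);
        return err;
    }
    if (fclose(f) == EOF)   /* the write may only be flushed here */
        return errno;
    return 0;
}
```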

2005-01-14 02:36:28

by utz lehmann

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-14 at 13:08 +1100, Con Kolivas wrote:
> utz lehmann wrote:
> > On Thu, 2005-01-13 at 16:25 -0500, Lee Revell wrote:
> >
> >>On Thu, 2005-01-13 at 22:07 +0100, Arjan van de Ven wrote:
> >>
> >>>On Thu, Jan 13, 2005 at 03:04:26PM -0600, Jack O'Quin wrote:
> >>>
> >>>>(Probably, this simplistic analysis misses some other, more subtle,
> >>>>factors.)
> >>>
> >>>I think you can do nasty things to the locks held by those threads too
> >>>
> >>>
> >>>>RT threads should not do FS writes of their own. But, a badly broken
> >>>>or malicious one could, I suppose. So, that might provide a mechanism
> >>>>for losing more data than usual. Is that what you had in mind?
> >>>
> >>>basically yes.
> >>>note that "FS writes" can come from various things, including library calls
> >>>made and such. But I think you got my point; even though it might seem a bit
> >>>theoretical it sure is unpleasant.
> >>>
> >>
> >>I added Con to the cc: because this thread is starting to converge with
> >>an email discussion we've been having.
> >>
> >>The basic issue is that the current semantics of SCHED_FIFO seem to make
> >>the deadlock/data corruption due to runaway RT thread issue difficult.
> >>The obvious solution is a new scheduling class equivalent to SCHED_FIFO
> >>but with a mechanism for the kernel to demote the offending thread to
> >>SCHED_OTHER in an emergency. The problem can be solved in userspace
> >>with a SCHED_FIFO watchdog thread that runs at a higher RT priority than
> >>all other RT processes.
> >>
> >>This all seems to imply that introducing an rlimit for MAX_RT_PRIO is an
> >>excellent solution. The RT watchdog thread could run as root, and the
> >>rlimit would be used to ensure that even nonroot users in the RT group
> >>could never preempt the watchdog thread.
> >
> >
> > Just an idea. What about throttling runaway RT tasks?
> > If the system spends more than 98% of its time in RT tasks for 5s, treat
> > this as a _fatal error_. Print an error message and throttle RT tasks by
> > inserting ticks in which only SCHED_OTHER tasks are allowed to run. For a
> > limit of 98%, this means one SCHED_OTHER-only tick in every 50 ticks.
> >
> > The limit and timeout should be configurable and of course it can be
> > disabled.
> >
> > I know this goes against the rule that RT tasks preempt all SCHED_OTHER
> > tasks, but this is only for a fatal system state, to be able to recover
> > sanely. A locked-up machine is the worse alternative.
>
> There is a patch in -mm currently designed to use a sysrq key
> combination which converts all real time tasks to sched normal to save
> you if you desire in a lockup situation. We do want to preserve RT
> scheduling behaviour at all times without caveats for privileged users.

The sysrq is already in 2.6.10. I had to use it a few times in the last
few days. But it does help if you have no access to the console.

The RT throttling idea is not to change the behavior in normal
conditions. It's only for a fatal system state. If you have a runaway RT
task you can't guarantee the system is working properly anyway. It's
blocking vital kernel threads, filesystems, swap, keyboard, ...

It's a bit like running out of memory. You can do nothing and panic, or
try something bad (killing processes), which is hopefully better than
the former.
btw: Are RT tasks excluded by the oom killer?


2005-01-14 02:42:43

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>SCHED_FIFO and SCHED_RR are definitely privileged operations and you

this is the crux of what this whole debate is about. for all of you
people who think about linux on multi-user systems with network
connectivity, running servers and so forth, this is clearly a given.

but there is a large and growing body of machines that run linux where
the sole human user of the machine has a strong and overwhelming
desire to have tasks run with the characteristics offered by
SCHED_FIFO and/or SCHED_RR. are they still "privileged" operations on
this class of linux system? what about linux installed on an embedded
system, with a small LCD screen and the sole purpose of running audio
apps live? are they still privileged then?

i think there is room for debate, but in general,
SCHED_FIFO/SCHED_RR's "definite" status as privileged operations is
not clear. we are trying to find ways to provide access to it in ways
that don't conflict with the other categories of linux systems where
it clearly needs to be off-limits to unprivileged users.

--p


2005-01-14 02:44:37

by Con Kolivas

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

utz lehmann wrote:
> On Fri, 2005-01-14 at 13:08 +1100, Con Kolivas wrote:
>
>>utz lehmann wrote:
>>>Just an idea. What about throttling runaway RT tasks?
>>>If the system spends more than 98% of its time in RT tasks for 5s, treat
>>>this as a _fatal error_. Print an error message and throttle RT tasks by
>>>inserting ticks in which only SCHED_OTHER tasks are allowed to run. For a
>>>limit of 98%, this means one SCHED_OTHER-only tick in every 50 ticks.
>>>
>>>The limit and timeout should be configurable and of course it can be
>>>disabled.
>>>
>>>I know this goes against the rule that RT tasks preempt all SCHED_OTHER
>>>tasks, but this is only for a fatal system state, to be able to recover
>>>sanely. A locked-up machine is the worse alternative.
>>
>>There is a patch in -mm currently designed to use a sysrq key
>>combination which converts all real time tasks to sched normal to save
>>you if you desire in a lockup situation. We do want to preserve RT
>>scheduling behaviour at all times without caveats for privileged users.
>
>
> The sysrq is already in 2.6.10. I had to use it a few times in the last
> few days. But it does help if you have no access to the console.
>
> The RT throttling idea is not to change the behavior in normal
> conditions. It's only for a fatal system state. If you have a runaway RT
> task you can't guarantee the system is working properly anyway. It's
> blocking vital kernel threads, filesystems, swap, keyboard, ...

I understand fully your concern. If such a thing were to be introduced
it would have to be disabled by default. Since I'm looking at
implementing such throttling for user RT tasks, it should be trivial to
add it to other RT tasks, and have 100% as the default cpu limit. How
does that sound?

> It's a bit like running out of memory. You can do nothing and panic, or
> try something bad (killing processes), which is hopefully better than
> the former.
> btw: Are RT tasks excluded by the oom killer?

I haven't looked. VM hackers?

Con


2005-01-14 03:14:43

by Nick Piggin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, 2005-01-13 at 21:40 -0500, Paul Davis wrote:
> >SCHED_FIFO and SCHED_RR are definitely privileged operations and you
>
> this is the crux of what this whole debate is about. for all of you
> people who think about linux on multi-user systems with network
> connectivity, running servers and so forth, this is clearly a given.
>
> but there is a large and growing body of machines that run linux where
> the sole human user of the machine has a strong and overwhelming
> desire to have tasks run with the characteristics offered by
> SCHED_FIFO and/or SCHED_RR. are they still "privileged" operations on
> this class of linux system? what about linux installed on an embedded
> system, with a small LCD screen and the sole purpose of running audio
> apps live? are they still privileged then?
>

I think yes, because their misuse can trivially take down the
machine by definition. So it is still privileged in the context
of that system.

> i think there is room for debate, but in general,
> SCHED_FIFO/SCHED_RR's "definite" status as privileged operations is
> not clear. we are trying to find ways to provide access to it in ways
> that don't conflict with the other categories of linux systems where
> it clearly needs to be off-limits to unprivileged users.
>

In such a system, sure you could make allowances by elevating
privileges or what have you.

I guess the tricky part is exactly how to make these allowances. I've
joined the thread too late (and don't have the knowledge) to really get
into that... but I just wanted to be clear that watering down SCHED_RR
and SCHED_FIFO basically just makes them no good to anyone.

I personally can't see how a scheduling policy can allow deterministic
access to the CPU without being a privileged operation. If you don't
need deterministic access to the scheduler, then let's talk about why
SCHED_OTHER isn't good enough. If you do, then we're talking about
security access I think.

Nick


2005-01-14 03:17:18

by Andrew Morton

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Paul Davis <[email protected]> wrote:
>
> >SCHED_FIFO and SCHED_RR are definitely privileged operations and you
>
> this is the crux of what this whole debate is about. for all of you
> people who think about linux on multi-user systems with network
> connectivity, running servers and so forth, this is clearly a given.
>
> but there is a large and growing body of machines that run linux where
> the sole human user of the machine has a strong and overwhelming
> desire to have tasks run with the characteristics offered by
> SCHED_FIFO and/or SCHED_RR. are they still "privileged" operations on
> this class of linux system? what about linux installed on an embedded
> system, with a small LCD screen and the sole purpose of running audio
> apps live? are they still privileged then?
>

Paul. Everyone agrees with you. I think. We just need to work out
the best way of doing it.

Would I be right in suspecting that we know what to do, but nobody has
stepped up to write the code? It's kinda looking like that?

2005-01-14 03:25:03

by Andrew Morton

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Con Kolivas <[email protected]> wrote:
>
> > btw: Are RT tasks excluded by the oom killer?
>
> I haven't looked. VM hackers?

Nope. We're nastier to tasks which have been niced down, but we're not
nicer to tasks which have been given elevated priority/policy.
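The asymmetry described here can be illustrated with a toy scoring function. This is deliberately simplified and hypothetical: `oom_badness()` below only looks at a memory footprint, and the factor of 2 for niced-down tasks is an illustrative assumption; the real heuristic weighs several other inputs. What it shows is that a positive nice value raises the score, while an RT policy or a negative nice value does not lower it.

```c
#include <stdbool.h>

/* Toy OOM "badness" score (hypothetical, heavily simplified).
 * A higher score means the task is more likely to be killed. */
static unsigned long oom_badness(unsigned long mem_pages, int nice, bool is_rt)
{
    unsigned long points = mem_pages;   /* base score: memory footprint */

    if (nice > 0)
        points *= 2;    /* niced-down tasks are penalised (factor assumed) */

    /* No bonus in the other direction: negative nice values and RT
     * policies leave the score unchanged. */
    (void)is_rt;
    return points;
}
```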

2005-01-14 03:25:05

by Con Kolivas

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Andrew Morton wrote:
> Paul Davis <[email protected]> wrote:
>
>>>SCHED_FIFO and SCHED_RR are definitely privileged operations and you
>>
>>this is the crux of what this whole debate is about. for all of you
>>people who think about linux on multi-user systems with network
>>connectivity, running servers and so forth, this is clearly a given.
>>
>>but there is a large and growing body of machines that run linux where
>>the sole human user of the machine has a strong and overwhelming
>>desire to have tasks run with the characteristics offered by
>>SCHED_FIFO and/or SCHED_RR. are they still "privileged" operations on
>>this class of linux system? what about linux installed on an embedded
>>system, with a small LCD screen and the sole purpose of running audio
>>apps live? are they still privileged then?
>>
>
>
> Paul. Everyone agrees with you. I think. We just need to work out
> the best way of doing it.
>
> Would I be right in suspecting that we know what to do, but nobody has
> stepped up to write the code? It's kinda looking like that?

I thought I made it clear I had already volunteered. I was after a
response to my proposal for how to do it.

Cheers,
Con


2005-01-14 03:32:57

by utz lehmann

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-14 at 13:42 +1100, Con Kolivas wrote:
> utz lehmann wrote:
> > On Fri, 2005-01-14 at 13:08 +1100, Con Kolivas wrote:
> >
> >>utz lehmann wrote:
> >>>Just an idea. What about throttling runaway RT tasks?
> >>>If the system spends more than 98% of its time in RT tasks for 5s, treat
> >>>this as a _fatal error_. Print an error message and throttle RT tasks by
> >>>inserting ticks in which only SCHED_OTHER tasks are allowed to run. For a
> >>>limit of 98%, this means one SCHED_OTHER-only tick in every 50 ticks.
> >>>
> >>>The limit and timeout should be configurable and of course it can be
> >>>disabled.
> >>>
> >>>I know this goes against the rule that RT tasks preempt all SCHED_OTHER
> >>>tasks, but this is only for a fatal system state, to be able to recover
> >>>sanely. A locked-up machine is the worse alternative.
> >>
> >>There is a patch in -mm currently designed to use a sysrq key
> >>combination which converts all real time tasks to sched normal to save
> >>you if you desire in a lockup situation. We do want to preserve RT
> >>scheduling behaviour at all times without caveats for privileged users.
> >
> >
> > The sysrq is already in 2.6.10. I had to use it a few times in the last
> > few days. But it does help if you have no access to the console.
> >
> > The RT throttling idea is not to change the behavior in normal
> > conditions. It's only for a fatal system state. If you have a runaway RT
> > task you can't guarantee the system is working properly anyway. It's
> > blocking vital kernel threads, filesystems, swap, keyboard, ...
>
> I understand fully your concern. If such a thing were to be introduced
> it would have to be disabled by default. Since I'm looking at
> implementing such throttling for user RT tasks, it should be trivial to
> add it to other RT tasks, and have 100% as the default cpu limit. How
> does that sound?

Sounds good :-)
The kernel should have a 100% limit (i.e. disabled) as the default. Users
and distros can change it to a sane value for their needs.


2005-01-14 03:33:36

by Nick Piggin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-14 at 14:18 +1100, Con Kolivas wrote:
> Andrew Morton wrote:
> > Paul Davis <[email protected]> wrote:
> >
> >>>SCHED_FIFO and SCHED_RR are definitely privileged operations and you
> >>
> >>this is the crux of what this whole debate is about. for all of you
> >>people who think about linux on multi-user systems with network
> >>connectivity, running servers and so forth, this is clearly a given.
> >>
> >>but there is a large and growing body of machines that run linux where
> >>the sole human user of the machine has a strong and overwhelming
> >>desire to have tasks run with the characteristics offered by
> >>SCHED_FIFO and/or SCHED_RR. are they still "privileged" operations on
> >>this class of linux system? what about linux installed on an embedded
> >>system, with a small LCD screen and the sole purpose of running audio
> >>apps live? are they still privileged then?
> >>
> >
> >
> > Paul. Everyone agrees with you. I think. We just need to work out
> > the best way of doing it.
> >
> > Would I be right in suspecting that we know what to do, but nobody has
> > stepped up to write the code? It's kinda looking like that?
>
> I thought I made it clear I had already volunteered. I was after a
> response to my proposal for how to do it.
>

It sounds to me like both your proposals may be too complex and not
sufficiently deterministic (I don't know for sure, maybe that's
exactly what the RT people want).

I wouldn't have thought it is so much a matter of having real-time-ish
scheduling available that tries to play nicely in a multi user machine.
That must still imply that either the user is able to unduly tie up
resources (and thus it has to be a privileged operation), or that it
sometimes can't meet its "guarantees" (in which case, is it useful?).

I was thinking that the solution might be more along the lines of
a nice way to handle privileges for these guys.

I could be completely off the rails though. I haven't really been
following this thread so please shoot me in my foot if I have put it
in my mouth.



2005-01-14 03:36:42

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>> Paul. Everyone agrees with you. I think. We just need to work out
>> the best way of doing it.
>>
>> Would I be right in suspecting that we know what to do, but nobody has
>> stepped up to write the code? It's kinda looking like that?
>
>I thought I made it clear I had already volunteered. I was after a
>response to my proposal for how to do it.

I think your proposal is a good (maybe even excellent) one, but it
somewhat sidesteps the issue (which may be the best thing to
do). Rather than answering the question "how best to allow regular
users access to SCHED_FIFO", it says "let's offer regular users
SCHED_ISO which is essentially identical to SCHED_FIFO unless tasks
running SCHED_ISO use too much cpu time".
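The SCHED_ISO idea as summarised here (FIFO behaviour unless the class uses too much CPU) suggests accounting against a single budget shared by every task in the class. The sketch below is an assumption-laden illustration, not Con's design: the struct, the `POLICY_*` names and the fixed accounting window are hypothetical. Usage is measured for the class as a whole over each window, and for the following window ISO tasks are treated as SCHED_NORMAL if that total exceeded `limit_pct`. Because usage can never exceed 100% of the window, a limit of 100 never triggers demotion, so ISO tasks then keep FIFO semantics.

```c
#include <stdbool.h>

enum policy { POLICY_NORMAL, POLICY_FIFO };

/* Hypothetical class-wide accounting for a SCHED_ISO-like class.
 * All ISO tasks share one budget, so the check is on the class total,
 * not on any single task. */
struct iso_class {
    unsigned int limit_pct;     /* e.g. 70; 100 behaves like plain FIFO */
    unsigned int window_ticks;  /* length of the accounting window */
    unsigned int ticks;         /* ticks elapsed in the current window */
    unsigned int iso_ticks;     /* ticks consumed by any ISO task */
    bool over_limit;            /* decision applied during the next window */
};

/* Account one scheduler tick; iso_ran says whether an ISO task used it. */
static void iso_tick(struct iso_class *c, bool iso_ran)
{
    c->ticks++;
    if (iso_ran)
        c->iso_ticks++;
    if (c->ticks >= c->window_ticks) {
        c->over_limit = c->iso_ticks * 100 > c->limit_pct * c->ticks;
        c->ticks = 0;
        c->iso_ticks = 0;
    }
}

/* ISO tasks run with FIFO semantics unless the class as a whole
 * exceeded its share, in which case they are treated as SCHED_NORMAL. */
static enum policy iso_effective_policy(const struct iso_class *c)
{
    return c->over_limit ? POLICY_NORMAL : POLICY_FIFO;
}
```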

it's a fine answer, but it's the answer to a slightly different
question. if anyone (maybe us audio freaks, maybe someone else) comes
up with a reason to want "The Real SCHED_FIFO", the original question
will have gone unanswered.

--p

2005-01-14 03:36:41

by utz lehmann

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, 2005-01-13 at 19:20 -0800, Andrew Morton wrote:
> Con Kolivas <[email protected]> wrote:
> >
> > > btw: Are RT tasks excluded by the oom killer?
> >
> > I haven't looked. VM hackers?
>
> Nope. We're nastier to tasks which have been niced down, but we're not
> nicer to tasks which have been given elevated priority/policy.

Maybe this should be done?
RT tasks are somewhat important, I think.


2005-01-14 03:41:31

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>I wouldn't have thought it is so much a matter of having real-time-ish
>scheduling available that tries to play nicely in a multi user machine.
>That must still imply that either the user is able to unduly tie up
>resources (and thus it has to be a privileged operation), or that it
>sometimes can't meet its "guarantees" (in which case, is it useful?).

most audio hackers and users are perfectly comfortable with the OSX
compromise - tasks with no special privilege get deterministic access
to the CPU as long as they do not consume excessive cycles.

this raises the question of what happens when the entire class of
SCHED_ISO (to use Con's working name for such a scheduling class)
tasks is eating too much CPU, rather than any one of them, but i'll
leave that to Con :)

--p

2005-01-14 03:46:00

by Con Kolivas

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Paul Davis wrote:
>>>Paul. Everyone agrees with you. I think. We just need to work out
>>>the best way of doing it.
>>>
>>>Would I be right in suspecting that we know what to do, but nobody has
>>>stepped up to write the code? It's kinda looking like that?
>>
>>I thought I made it clear I had already volunteered. I was after a
>>response to my proposal for how to do it.
>
>
> I think your proposal is a good (maybe even excellent) one, but it
> somewhat sidesteps the issue (which may be the best thing to
> do). Rather than answering the question "how best to allow regular
> users access to SCHED_FIFO", it says "let's offer regular users
> SCHED_ISO which is essentially identical to SCHED_FIFO unless tasks
> running SCHED_ISO use too much cpu time".
>
> it's a fine answer, but it's the answer to a slightly different
> question. if anyone (maybe us audio freaks, maybe someone else) comes
> up with a reason to want "The Real SCHED_FIFO", the original question
> will have gone unanswered.

Ah then you missed something. You can set the max cpu of SCHED_ISO to
100% and then you have it.

Cheers,
Con


2005-01-14 03:54:26

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>> it's a fine answer, but it's the answer to a slightly different
>> question. if anyone (maybe us audio freaks, maybe someone else) comes
>> up with a reason to want "The Real SCHED_FIFO", the original question
>> will have gone unanswered.
>
>Ah then you missed something. You can set the max cpu of SCHED_ISO to
>100% and then you have it.

true, i missed that :) but i also recall you saying you were thinking
of having no prioritization within SCHED_ISO ... or am i remembering
wrong? also, is it just me, or does having two ways to achieve the exact
same result seem very un-linux-like ... and if they are not exactly the
same results, how does a regular user get the SCHED_FIFO ones? is the
answer just "they don't" ?

--p

2005-01-14 04:01:43

by Con Kolivas

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Paul Davis wrote:
>>>it's a fine answer, but it's the answer to a slightly different
>>>question. if anyone (maybe us audio freaks, maybe someone else) comes
>>>up with a reason to want "The Real SCHED_FIFO", the original question
>>>will have gone unanswered.
>>
>>Ah then you missed something. You can set the max cpu of SCHED_ISO to
>>100% and then you have it.
>
>
> true, i missed that :) but i also recall you saying you were thinking
> of having no prioritization within SCHED_ISO ... or am i remembering
> wrong?

Nothing is set in stone. I won't even look at code until Ingo or Linus
rules on this. Ingo has expressed interest in SCHED_ISO on a previous
thread with me.

> also, is it just me, or does having two ways to achieve the exact
> same result seem very un-linux-like ... and if they are not exactly the
> same results, how does a regular user get the SCHED_FIFO ones? is the
> answer just "they don't" ?

To answer your question, the second of my proposals was not to have a
separate scheduling class at all, but to let normal users set SCHED_FIFO
and SCHED_RR, possibly with all their priorities intact, with limits
placed on their usage of these classes. The reason I suggested
not supporting priorities is that proper real time scheduling would
entail being able to say "I need x cycles, to complete by y time and I
can or cannot be preempted". With these QoS requirements, a whole new
scheduling style (EDF) would need to be implemented. Without actually
implementing this, if you set a cpu limit of 70%, all it takes is one
FIFO process running long enough at a high enough priority, and all your
other soft real time tasks fall back to SCHED_NORMAL, which is nothing
like what happens with true RT scheduling. Forcing all soft RT threads
to round robin at the same priority would make them sort themselves out.
It's a compromise either way, and in fact this latter approach is what
OSX does, and it works well in practice as well as in theory.
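The difference between strict priorities and round robin at one shared priority can be seen in a toy model. Under strict FIFO priorities, an always-runnable top-priority task takes every tick and the lower-priority soft RT tasks get none; under round robin at a single priority, the CPU rotates through all of them. The simulation below is a hypothetical sketch (`simulate()` and the policy names are illustrative) with no blocking, no CPU limit and one tick per turn, so it only shows the ordering effect, not how a real scheduler behaves.

```c
#include <stddef.h>

enum sim_policy { SIM_FIFO_PRIO, SIM_RR_SAME_PRIO };

/* Toy model: n soft RT tasks that are always runnable, task 0 having the
 * highest priority under SIM_FIFO_PRIO. Counts how many of `ticks` ticks
 * each task receives. No blocking, no CPU limit, one tick per turn. */
static void simulate(enum sim_policy pol, unsigned int ticks,
                     unsigned int *counts, size_t n)
{
    if (n == 0)
        return;
    for (size_t i = 0; i < n; i++)
        counts[i] = 0;

    for (unsigned int t = 0; t < ticks; t++) {
        /* FIFO with priorities: the highest-priority runnable task never
         * yields. Round robin at one priority: rotate through the tasks. */
        size_t pick = (pol == SIM_FIFO_PRIO) ? 0 : (size_t)t % n;
        counts[pick]++;
    }
}
```

With three tasks and 30 ticks, FIFO with priorities gives the counts 30, 0, 0, while round robin at one priority gives 10, 10, 10.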

Cheers,
Con


2005-01-14 04:04:43

by Nick Piggin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-14 at 14:38 +1100, Con Kolivas wrote:
> Paul Davis wrote:

> > it's a fine answer, but it's the answer to a slightly different
> > question. if anyone (maybe us audio freaks, maybe someone else) comes
> > up with a reason to want "The Real SCHED_FIFO", the original question
> > will have gone unanswered.
>
> Ah then you missed something. You can set the max cpu of SCHED_ISO to
> 100% and then you have it.
>

Is that a good solution? I'm not sure if it is wise to try to
masquerade SCHED_ISO as an unprivileged RT class.

I mean what happens if two users are trying to run independent
SCHED_ISO systems? Both will probably break, right?

And how can you provide _any_ guarantees in an arbitrary environment
without this becoming a privileged operation? I can't quite get my head
around that at the moment...

I guess if you have SCHED_ISO start out with 0 guarantees, and have root
dole some out, then it may be workable. But then that is just another
specialised ad hoc sort of hack wouldn't it? (not talking about
SCHED_ISO itself, but the granting of the privilege to use it).




2005-01-14 04:12:50

by Con Kolivas

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Nick Piggin wrote:
> It sounds to me like both your proposals may be too complex and not
> sufficiently deterministic (I don't know for sure, maybe that's
> exactly what the RT people want).

This is the solution already employed in the real world by OSX. It works
well, and the audio people have told me they are happy with it.

> I could be completely off the rails though. I haven't really been
> following this thread so please shoot me in my foot if I have put it
> in my mouth.

If your foot is in your mouth and you ask me to shoot you in the foot it
would blow your head off... Hmm it's tempting...

Cheers,
Con


2005-01-14 04:17:34

by Nick Piggin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-14 at 15:00 +1100, Con Kolivas wrote:
> Paul Davis wrote:
> >>>it's a fine answer, but it's the answer to a slightly different
> >>>question. if anyone (maybe us audio freaks, maybe someone else) comes
> >>>up with a reason to want "The Real SCHED_FIFO", the original question
> >>>will have gone unanswered.
> >>
> >>Ah then you missed something. You can set the max cpu of SCHED_ISO to
> >>100% and then you have it.
> >
> >
> > true, i missed that :) but i also recall you saying you were thinking
> > of having no prioritization within SCHED_ISO ... or am i remembering
> > wrong?
>
> Nothing is set in stone. I won't even look at code until Ingo or Linus
> rules on this. Ingo has expressed interest in SCHED_ISO on a previous
> thread with me.
>

You may have a chicken and egg problem :) I don't think anybody will
rule on this unless there is at least a demand for it. For there to
be a demand for it I think you'd need to come up with a rigorous
specification, wouldn't you? And then implement it even.

Unfortunately this is just how kernel development goes if you're brave
enough to try out new things.

I'm leaning toward the opinion that the entire problem would be better
handled purely with the existing RT scheduling classes, and a good way
to handle the security side of things.

> > also, is it just me, or having two ways to achieve the exact
> > same result seems very un-linux-like ... and if they are not exact
> > same results, how does a regular user get the SCHED_FIFO ones? is the
> > answer just "they don't" ?
>
> To answer your question, the second of my proposals was to not have a
> separate scheduling class at all. To let normal users set SCHED_FIFO and
> SCHED_RR, possibly with all their priorities intact, but for there to be
> limits placed on their usage of these classes. The reason I suggested
> not supporting priorities is that proper real time scheduling would
> entail being able to say "I need x cycles, to complete by y time and I
> can or cannot be preempted". With these QoS requirements, a whole new
> scheduling style (EDF) would need to be implemented. Without actually

This sort of thing is pretty well specialised enough that it doesn't
belong in the kernel scheduler, however. Often it can be satisfied by
userspace managers... but you'd have to be talking about a hard RT
system anyway, which Linux isn't.
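Con's description of "proper" real time scheduling ("I need x cycles, to complete by y time") is the task model that earliest-deadline-first (EDF) scheduling works from. As a rough illustration, not something proposed in this thread: for independent periodic tasks on one CPU whose deadline equals their period, EDF meets every deadline exactly when the total utilization sum(cost/period) is at most 1. An admission check built on that bound might look like this sketch; the `edf_task` struct and `edf_admissible` function are hypothetical names, and the floating-point sum is a simplification.

```c
#include <stddef.h>

/* Hypothetical task description: the task needs `cost` units of CPU
 * time every `period` units, and its deadline equals its period. */
struct edf_task {
    unsigned long cost;   /* "x cycles" */
    unsigned long period; /* "by y time", repeating */
};

/* Returns 1 if the task set passes the single-CPU EDF utilization
 * bound sum(cost / period) <= 1, and 0 otherwise. */
int edf_admissible(const struct edf_task *tasks, size_t n)
{
    double utilization = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* A zero period, or a cost larger than the period, can never
         * be satisfied. */
        if (tasks[i].period == 0 || tasks[i].cost > tasks[i].period)
            return 0;
        utilization += (double)tasks[i].cost / (double)tasks[i].period;
    }
    return utilization <= 1.0;
}
```

Any admission test of this kind needs the kernel to know each task's cost and period, which is the extra interface Nick considers too specialised for the mainline scheduler.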



2005-01-14 04:24:08

by Nick Piggin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 2005-01-14 at 15:11 +1100, Con Kolivas wrote:
> Nick Piggin wrote:
> > It sounds to me like both your proposals may be too complex and not
> > sufficiently deterministic (I don't know for sure, maybe that's
> > exactly what the RT people want).
>
> This is the solution already employed in the real world by OSX. It works
> well, and the audio people have told me they are happy with it.
>

Alternatively, could you grant the required capabilities to use real
RT scheduling and not foul up the scheduler?

Or do a similar sort of thing with a userspace daemon that manages
priorities and watches CPU usage?

Basically I'd prefer not to put hacks in the (mainline) scheduler to
handle this pretty specific special case.

> > I could be completely off the rails though. I haven't really been
> > following this thread so please shoot me in my foot if I have put it
> > in my mouth.
>
> If your foot is in your mouth and you ask me to shoot you in the foot it
> would blow your head off... Hmm it's tempting...
>

Meeeow!



2005-01-14 04:45:17

by Paul Davis

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>Alternatively, could you grant the required capabilities to use real
>RT scheduling and not foul up the scheduler?

this is precisely the point i was making. either you agree that
unprivileged users can get easy access to a scheduling class that can
reliably DOS the system, or they can't. if they can't, what kind of
scheduling class can they access easily?

according to andrew, and i agree with his conclusion, many people
agree that it's OK for them to get access to the DOS class, but there's
little agreement on the security model to allow this. Con is
suggesting that they are not, but instead get a different scheduling
class that is functionally equivalent except that it can't
(theoretically) be used to DOS the system.

--p

2005-01-14 05:15:12

by Nick Piggin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, 2005-01-13 at 23:45 -0500, Paul Davis wrote:
> >Alternatively, could you grant the required capabilities to use real
> >RT scheduling and not foul up the scheduler?
>
> this is precisely the point i was making. either you agree that
> unprivileged users can get easy access to a scheduling class that can
> reliably DOS the system, or they can't. if they can't, what kind of
> scheduling class can they access easily?
>
> according to andrew, and i agree with his conclusion, many people
> agree that it's OK for them to get access to the DOS class, but there's
> little agreement on the security model to allow this. Con is
> suggesting that they are not, but instead get a different scheduling
> class that is functionally equivalent except that it can't
> (theoretically) be used to DOS the system.

Well IMO that would be preferable if there are no other objections.
I can't think how any sort of unprivileged "real time" scheduling
would have a place on multi-user systems.

And if it is only really used on single user systems then presumably
the priority elevation isn't a big problem provided it can be properly
managed. So this would appear to be the better solution.

Supposing you do want some sort of DOS prevention in the system, I'd
much prefer it be handled by a trusted user-space daemon for example,
rather than scheduler smarts (which may require a little bit of work
to limit priorities but would be relatively straightforward).
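Con's SCHED_ISO idea (a cap on the share of CPU that unprivileged "realtime" tasks may use, where a cap of 100% gives back "The Real SCHED_FIFO") and Nick's trusted user-space daemon both reduce to the same test: how much CPU did an elevated task use over a recent window, and is that over budget? Here is a minimal sketch of that test, assuming the daemon samples per-thread CPU time (for example from /proc) at a fixed interval. The struct, the function name, and the 70% figure used in the example are illustrative assumptions, not values from the thread.

```c
/* Hypothetical per-window sample for one elevated thread. */
struct rt_sample {
    unsigned long long cpu_ns;    /* CPU time consumed during the window */
    unsigned long long window_ns; /* wall-clock length of the window */
};

/* Returns 1 if the thread used more than `max_percent` of one CPU during
 * the window and should be demoted to SCHED_OTHER, 0 otherwise.
 * With max_percent == 100 a single thread can never trip the check,
 * which matches "set the max cpu of SCHED_ISO to 100%". */
int should_demote(const struct rt_sample *s, unsigned int max_percent)
{
    if (s->window_ns == 0)
        return 0; /* no measurement yet */
    /* cpu/window > max_percent/100, compared without floating point */
    return s->cpu_ns * 100ULL > s->window_ns * (unsigned long long)max_percent;
}
```

The demotion itself would be the daemon (running as root) calling sched_setscheduler() on the offending thread with SCHED_OTHER and priority 0, which keeps the policy decision out of the scheduler, as Nick prefers.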



2005-01-14 06:57:55

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, Jan 13, 2005 at 07:12:37PM -0800, Andrew Morton wrote:
> Paul Davis <[email protected]> wrote:
> >
> > >SCHED_FIFO and SCHED_RR are definitely privileged operations and you
> >
> > this is the crux of what this whole debate is about. for all of you
> > people who think about linux on multi-user systems with network
> > connectivity, running servers and so forth, this is clearly a given.
> >
> > but there is large and growing body of machines that run linux where
> > the sole human user of the machine has a strong and overwhelming
> > desire to have tasks run with the characteristics offered by
> > SCHED_FIFO and/or SCHED_RR. are they still "privileged" operations on
> > this class of linux system? what about linux installed on an embedded
> > system, with a small LCD screen and the sole purpose of running audio
> > apps live? are they still privileged then?
> >
>
> Paul. Everyone agrees with you. I think. We just need to work out
> the best way of doing it.
>
> Would I be right in suspecting that we know what to do, but nobody has
> stepped up to write the code? It's kinda looking like that?

The closest thing to consensus I've seen yet was a new rlimit for
scheduling with code from Chris Wright. The version I last saw had
some rough edges on the API (exposing the internal scheduler priority
levels) but wasn't too bad in principle. We really ought not get in
the habit of adding new rlimits though.

Perhaps he can post whatever he has again, I'm not sure what the
current state is.

--
Mathematics is the supreme nostalgia of our time.

2005-01-14 07:05:13

by Andrew Morton

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Matt Mackall <[email protected]> wrote:
>
> The closest thing to consensus I've seen yet was a new rlimit for
> scheduling with code from Chris Wright.

hmm, yes. It doesn't feel like an rlimity thing to me, unless the rlimit
actually _limits_ something. Say, minimum permissible nice level. But
scheduling policy sounds more like a capability than an rlimit.

> We really ought not get in
> the habit of adding new rlimits though.

How come? It's a real pita that the standard shells don't appear to have a
way of setting an unknown rlimit. But what else?
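Andrew's complaint is about shells. A program can still set a limit that its shell's ulimit doesn't know about, because setrlimit() takes the resource as a plain number. Below is a minimal launcher helper, with a hypothetical name, that sets a soft limit by number and caps it at the hard limit, since an unprivileged process cannot raise its own hard limit.

```c
#include <sys/resource.h>

/* Hypothetical helper: set the soft limit of an arbitrary resource,
 * given by number, capped at the current hard limit.
 * Returns 0 on success, -1 on error (errno set by getrlimit/setrlimit). */
int set_soft_limit(int resource, rlim_t value)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;
    /* Unprivileged callers may only move the soft limit up to the hard one. */
    if (rl.rlim_max != RLIM_INFINITY && value > rl.rlim_max)
        value = rl.rlim_max;
    rl.rlim_cur = value;
    return setrlimit(resource, &rl);
}
```

A wrapper could call set_soft_limit(14, 50), 14 being the RLIMIT_RTPRIO number used in Chris Wright's patch later in this thread, before exec'ing an audio application. Because that patch defaults both limits to 0, an administrator would still have to raise the hard limit first, for example through pam_limits.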

2005-01-14 07:55:43

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Andrew Morton ([email protected]) wrote:
> Matt Mackall <[email protected]> wrote:
> >
> > The closest thing to consensus I've seen yet was a new rlimit for
> > scheduling with code from Chris Wright.
>
> hmm, yes. It doesn't feel like an rlimity thing to me, unless the rlimit
> actually _limits_ something. Say, minimum permissible nice level. But
> scheduling policy sounds more like a capability than an rlimit.

It's had a few incarnations with minor tweaks. But they each did
provide a limit, an upper bound, on how the user could prioritize its
task with the scheduler (both nice values and rt priorities).

> > We really ought not get in
> > the habit of adding new rlimits though.
>
> How come? It's a real pita that the standard shells don't appear to have a
> way of setting an unknown rlimit. But what else?

It's got that slippery slope feeling. When do you decide that you're
just punting everything to an rlimit and it becomes an unmanaged mess?
However, in this case, at least it's easy to justify cpu time as a
resource. I'll repost in the AM...sleep calls.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-01-14 09:21:41

by Will Dyson

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, 14 Jan 2005 14:31:21 +1100, Nick Piggin <[email protected]> wrote:

> It sounds to me like both your proposals may be too complex and not
> sufficiently deterministic (I don't know for sure, maybe that's
> exactly what the RT people want).
>
> I wouldn't have thought it is so much a matter of having real-time-ish
> scheduling available that tries to play nicely in a multi user machine.
> That must still imply that either the user is able to unduly tie up
> resources (and thus it has to be a privileged operation), or that it
> sometimes can't meet its "guarantees" (in which case, is it useful?).

The VM system with overcommit is in a similar pickle. It can't honor
the "guarantees" it makes. Yet, I think it is in wide use. Overcommit
is a useful behavior for many people, despite the fact that it allows
any user to turn loose the oom_killer on the system.

So I think many people would also find a best-effort-at-realtime
SCHED_ISO type thing pretty useful, even if it allowed unprivileged
users to tie up resources (while protecting the system from DOS).
Heck, we don't have to allow unprivileged users to tie up resources.
SCHED_ISO use could be limited to members of a certain group, possibly
implemented using some sort of LSM module... :)

Of course, suggesting that access to SCHED_ISO be limited pretty much
admits that running processes as SCHED_ISO should be a privileged
operation, like accessing /dev/dsp (a privilege that is granted
through group membership on most desktops).
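Will's suggestion is to gate access on group membership, the way /dev/dsp is usually granted. Here is a minimal userspace sketch of that membership test, for example for a privileged helper deciding whether to grant elevated scheduling. The function name is hypothetical, and a kernel-side check would look at the task's credentials instead.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the calling process has `gid` as its effective group or
 * among its supplementary groups, 0 if not, -1 on error. */
int in_group(gid_t gid)
{
    gid_t *groups;
    int n, i, found = 0;

    if (getegid() == gid)
        return 1;

    n = getgroups(0, NULL); /* first call only reports the count */
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;

    groups = malloc((size_t)n * sizeof(*groups));
    if (groups == NULL)
        return -1;
    n = getgroups(n, groups);
    if (n < 0) {
        free(groups);
        return -1;
    }
    for (i = 0; i < n; i++) {
        if (groups[i] == gid) {
            found = 1;
            break;
        }
    }
    free(groups);
    return found;
}
```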

> I was thinking that the solution might be more along the lines of
> a nice way to handle privileges for these guys.

A nice, flexible way to hand out scheduler (and perhaps other)
privileges would be... nice. Are you thinking of something more
fine-grained than per-user?

--
Will Dyson

2005-01-14 09:54:47

by Nick Piggin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Will Dyson wrote:
> On Fri, 14 Jan 2005 14:31:21 +1100, Nick Piggin <[email protected]> wrote:
>
>
>>It sounds to me like both your proposals may be too complex and not
>>sufficiently deterministic (I don't know for sure, maybe that's
>>exactly what the RT people want).
>>
>>I wouldn't have thought it is so much a matter of having real-time-ish
>>scheduling available that tries to play nicely in a multi user machine.
>>That must still imply that either the user is able to unduly tie up
>>resources (and thus it has to be a privileged operation), or that it
>>sometimes can't meet its "guarantees" (in which case, is it useful?).
>
>
> The VM system with overcommit is in a similar pickle. It can't honor
> the "guarantees" it makes. Yet, I think it is in wide use. Overcommit
> is a useful behavior for many people, despite the fact that it allows
> any user to turn loose the oom_killer on the system.
>

I'm not sure if that is a really good comparison.

> So I think many people would also find a best-effort-at-realtime
> SCHED_ISO type thing pretty useful, even if it allowed unprivileged
> users to tie up resources (while protecting the system from DOS).
> Heck, we don't have to allow unprivileged users to tie up resources.
> SCHED_ISO use could be limited to members of a certain group, possibly
> implemented using some sort of LSM module... :)
>
> Of course, suggesting that access to SCHED_ISO be limited pretty much
> admits that running processes as SCHED_ISO should be a privileged
> operation, like accessing /dev/dsp (a privilege that is granted
> through group membership on most desktops).
>

Now I'm not averse to cool hacks, and I haven't thought about
SCHED_ISO enough to comment on it much (nor has its behaviour
even been firmly defined as far as I know).

But regarding the kernel in general and the scheduler especially:
it is pretty important to fight feature creep. SCHED_ISO will have
a non zero cost in terms of complexity, maintainability, and
probably performance.

So the only way it can go in is if a non trivial number of people
really need it for things that can't be satisfied in userspace or
with a good privilege system or [something elegant], etc.

I'm not by any means stopping anyone from coming up with a firm
definition for SCHED_ISO, implementing it, and demonstrating that
it is the best way to solve a problem that X people care about,
and that its benefits outweigh its inevitable costs...
I'm just giving some frank advice.

>
>>I was thinking that the solution might be more along the lines of
>>a nice way to handle privileges for these guys.
>
>
> A nice, flexible way to hand out scheduler (and perhaps other)
> privileges would be... nice. Are you thinking of something more
> fine-grained than per-user?
>

I'm not too sure, that topic's out of my league... But that is
basically what is sought after by the guys behind the realtime LSM.
So I'd better stop hijacking their thread!

2005-01-14 17:33:01

by Mike Galbraith

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

At 05:31 PM 1/13/2005 -0600, Jack O'Quin wrote:
>Arjan van de Ven <[email protected]> writes:
>
> > On Thu, Jan 13, 2005 at 04:25:08PM -0500, Lee Revell wrote:
> >> The basic issue is that the current semantics of SCHED_FIFO seem to make
> >> the deadlock/data corruption due to runaway RT thread issue difficult.
> >> The obvious solution is a new scheduling class equivalent to SCHED_FIFO
> >> but with a mechanism for the kernel to demote the offending thread to
> >> SCHED_OTHER in an emergency.
> >
> > and this is getting really close to the original "counter proposal" to the
> > LSM module that was basically "lets make lower nice limit an rlimit, and
> > have -20 mean "basically FIFO" *if* the task behaves itself".
>
>Yes. However, my tests have so far shown a need for "actual FIFO as
>long as the task behaves itself."

I for one wonder why that appears to be so. What happens if you use
SCHED_RR instead of SCHED_FIFO?

(ie is the problem just one of running out of slice at a bad time, or is it
the dynamic priority adjustment)

-Mike

2005-01-14 20:16:14

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Matt Mackall ([email protected]) wrote:
> The closest thing to consensus I've seen yet was a new rlimit for
> scheduling with code from Chris Wright. The version I last saw had
> some rough edges on the API (exposing the internal scheduler priority
> levels) but wasn't too bad in principle. We really ought not get in
> the habit of adding new rlimits though.
>
> Perhaps he can post whatever he has again, I'm not sure what the
> current state is.

This is the latest version, with the idea from Utz to break nice and
rtprio apart.

The basic issue on the rlimit value is how to sanely encode nice values,
realtime priorities and scheduler policies into a number. The first
incarnation was the clumsiest, and tried to pack it all into a number
in range of [0,139]. This, as many agree, too closely reflects kernel
internal values. This one gives 0-39 (nice values 19,-20) to RLIMIT_NICE,
and 0-99 (rt priorities) to RLIMIT_RTPRIO. There's no distinction in rt
policy, and the traditional override (CAP_SYS_NICE) is still in place.
The defaults for both rlimits are 0, and behaviour should be backwards
compatible. I tested this one a bit, and it worked as expected. I've
got a patch to pam_limits as well, although it's untested.
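The encoding Chris describes maps nice 19 .. -20 onto RLIMIT_NICE 0 .. 39, so a larger limit allows a more favourable nice level. Below is a userspace model of that mapping; the function names are hypothetical and not from the patch. With the default limit of 0, the only level the limit admits is nice 19, so an unprivileged task still cannot lower its nice value. That is why the change stays backwards compatible.

```c
/* Nice level (-20..19) to RLIMIT_NICE encoding (39..0).
 * Out-of-range input is clamped first so the result never wraps. */
unsigned long nice_to_rlimit(int nice)
{
    if (nice > 19)
        nice = 19;
    if (nice < -20)
        nice = -20;
    return (unsigned long)(19 - nice);
}

/* Most favourable nice level permitted by an RLIMIT_NICE value.
 * Anything above 39 (including RLIM_INFINITY) permits nice -20. */
int rlimit_to_nice(unsigned long rlim)
{
    if (rlim > 39)
        rlim = 39;
    return 19 - (int)rlim;
}
```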

thanks,
-chris
--

===== include/asm-i386/resource.h 1.5 vs edited =====
--- 1.5/include/asm-i386/resource.h 2004-08-23 01:15:26 -07:00
+++ edited/include/asm-i386/resource.h 2005-01-14 10:28:19 -08:00
@@ -18,8 +18,11 @@
#define RLIMIT_LOCKS 10 /* maximum file locks held */
#define RLIMIT_SIGPENDING 11 /* max number of pending signals */
#define RLIMIT_MSGQUEUE 12 /* maximum bytes in POSIX mqueues */
+#define RLIMIT_NICE 13 /* max nice prio allowed to raise to
+ 0-39 for nice level 19 .. -20 */
+#define RLIMIT_RTPRIO 14 /* maximum realtime priority */

-#define RLIM_NLIMITS 13
+#define RLIM_NLIMITS 15


/*
@@ -45,6 +48,8 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ MAX_SIGPENDING, MAX_SIGPENDING }, \
{ MQ_BYTES_MAX, MQ_BYTES_MAX }, \
+ { 0, 0 }, \
+ { 0, 0 }, \
}

#endif /* __KERNEL__ */
===== include/linux/sched.h 1.291 vs edited =====
--- 1.291/include/linux/sched.h 2005-01-11 16:42:57 -08:00
+++ edited/include/linux/sched.h 2005-01-14 10:11:13 -08:00
@@ -767,6 +767,7 @@ extern void sched_idle_next(void);
extern void set_user_nice(task_t *p, long nice);
extern int task_prio(const task_t *p);
extern int task_nice(const task_t *p);
+extern unsigned long nice_to_rlimit_nice(const int nice);
extern int task_curr(const task_t *p);
extern int idle_cpu(int cpu);
extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
===== kernel/sched.c 1.407 vs edited =====
--- 1.407/kernel/sched.c 2005-01-11 16:42:35 -08:00
+++ edited/kernel/sched.c 2005-01-14 10:38:21 -08:00
@@ -68,6 +68,12 @@
#define MAX_USER_PRIO (USER_PRIO(MAX_PRIO))

/*
+ * convert nice to RLIMIT_NICE values ([ 19 ... -20 ] to [ 0 ... 39 ])
+ */
+
+#define NICE_TO_RLIMIT_NICE(nice) (19 - (nice))
+
+/*
* Some helpers for converting nanosecond timing to jiffy resolution
*/
#define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 / HZ))
@@ -3140,12 +3146,8 @@ asmlinkage long sys_nice(int increment)
* We don't have to worry. Conceptually one call occurs first
* and we have a single winner.
*/
- if (increment < 0) {
- if (!capable(CAP_SYS_NICE))
- return -EPERM;
- if (increment < -40)
- increment = -40;
- }
+ if (increment < -40)
+ increment = -40;
if (increment > 40)
increment = 40;

@@ -3155,6 +3157,12 @@ asmlinkage long sys_nice(int increment)
if (nice > 19)
nice = 19;

+ if (increment < 0 &&
+ NICE_TO_RLIMIT_NICE(nice) >
+ current->signal->rlim[RLIMIT_NICE].rlim_cur &&
+ !capable(CAP_SYS_NICE))
+ return -EPERM;
+
retval = security_task_setnice(current, nice);
if (retval)
return retval;
@@ -3188,6 +3196,15 @@ int task_nice(const task_t *p)
}

/**
+ * nice_to_rlimit_nice - return rlimit_nice priority of given nice value
+ * @nice: nice value
+ */
+unsigned long nice_to_rlimit_nice(const int nice)
+{
+ return NICE_TO_RLIMIT_NICE(nice);
+}
+
+/**
* idle_cpu - is a given cpu idle currently?
* @cpu: the processor in question.
*/
@@ -3252,6 +3269,7 @@ recheck:
return -EINVAL;

if ((policy == SCHED_FIFO || policy == SCHED_RR) &&
+ param->sched_priority > p->signal->rlim[RLIMIT_RTPRIO].rlim_cur &&
!capable(CAP_SYS_NICE))
return -EPERM;
if ((current->euid != p->euid) && (current->euid != p->uid) &&
===== kernel/sys.c 1.104 vs edited =====
--- 1.104/kernel/sys.c 2005-01-11 16:42:35 -08:00
+++ edited/kernel/sys.c 2005-01-14 10:11:13 -08:00
@@ -225,7 +225,10 @@ static int set_one_prio(struct task_stru
error = -EPERM;
goto out;
}
- if (niceval < task_nice(p) && !capable(CAP_SYS_NICE)) {
+ if (niceval < task_nice(p) &&
+ nice_to_rlimit_nice(niceval) >
+ p->signal->rlim[RLIMIT_NICE].rlim_cur &&
+ !capable(CAP_SYS_NICE)) {
error = -EACCES;
goto out;
}


-----
And the patch for pam.

--- Linux-PAM-0.77/modules/pam_limits/pam_limits.c.prio 2005-01-14 10:47:03.000000000 -0800
+++ Linux-PAM-0.77/modules/pam_limits/pam_limits.c 2005-01-14 10:55:13.000000000 -0800
@@ -39,6 +39,11 @@
#include <grp.h>
#include <pwd.h>

+/* Hack to test new rlimit values */
+#define RLIMIT_NICE 13
+#define RLIMIT_RTPRIO 14
+#define RLIM_NLIMITS 15
+
/* Module defines */
#define LINE_LENGTH 1024

@@ -293,6 +298,10 @@ static void process_limit(int source, co
else if (strcmp(lim_item, "locks") == 0)
limit_item = RLIMIT_LOCKS;
#endif
+ else if (strcmp(lim_item, "rt_priority") == 0)
+ limit_item = RLIMIT_RTPRIO;
+ else if (strcmp(lim_item, "nice") == 0)
+ limit_item = RLIMIT_NICE;
else if (strcmp(lim_item, "maxlogins") == 0) {
limit_item = LIMIT_LOGIN;
pl->flag_numsyslogins = 0;
@@ -360,6 +369,19 @@ static void process_limit(int source, co
case RLIMIT_AS:
limit_value *= 1024;
break;
+ case RLIMIT_NICE:
+ limit_value = 19 - limit_value;
+ if (limit_value > 39)
+ limit_value = 39;
+ if (limit_value < 0)
+ limit_value = 0;
+ break;
+ case RLIMIT_RTPRIO:
+ if (limit_value > 99)
+ limit_value = 99;
+ if (limit_value < 0)
+ limit_value = 0;
+ break;
}

if ( (limit_item != LIMIT_LOGIN)
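The pam_limits hunk converts a limits.conf value into the kernel encoding and clamps it. Here is the same logic as standalone functions. This is a sketch, assuming (as the hunk implies) that the `nice` item is written as a nice level (-20 to 19) and `rt_priority` as a priority (0 to 99). The function names are hypothetical.

```c
/* limits.conf "nice" item (a nice level) to the RLIMIT_NICE encoding
 * 0..39. Each clamp is a conditional statement: only out-of-range
 * values are changed. */
long pam_nice_to_rlimit(long nice_level)
{
    long v = 19 - nice_level;

    if (v > 39)
        v = 39;
    if (v < 0)
        v = 0;
    return v;
}

/* limits.conf "rt_priority" item, clamped to the RLIMIT_RTPRIO range 0..99. */
long pam_clamp_rtprio(long prio)
{
    if (prio > 99)
        prio = 99;
    if (prio < 0)
        prio = 0;
    return prio;
}
```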

2005-01-14 20:57:14

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 14, 2005 at 12:10:21PM -0800, Chris Wright wrote:
> * Matt Mackall ([email protected]) wrote:
> > The closest thing to consensus I've seen yet was a new rlimit for
> > scheduling with code from Chris Wright. The version I last saw had
> > some rough edges on the API (exposing the internal scheduler priority
> > levels) but wasn't too bad in principle. We really ought not get in
> > the habit of adding new rlimits though.
> >
> > Perhaps he can post whatever he has again, I'm not sure what the
> > current state is.
>
> This is the latest version, with the idea from Utz to break nice and
> rtprio apart.
>
> The basic issue on the rlimit value is how to sanely encode nice values,
> realtime priorities and scheduler policies into a number. The first
> incarnation was the clumsiest, and tried to pack it all into a number
> in range of [0,139]. This, as many agree, too closely reflects kernel
> internal values. This one gives 0-39 (nice values 19,-20) to RLIMIT_NICE,
> and 0-99 (rt priorities) to RLIMIT_RTPRIO. There's no distinction in rt
> policy, and the traditional override (CAP_SYS_NICE) is still in place.
> The defaults for both rlimits are 0, and behaviour should be backwards
> compatible. I tested this one a bit, and it worked as expected. I've
> got a patch to pam_limits as well, although it's untested.

This is looking pretty good.

> +#define NICE_TO_RLIMIT_NICE(nice) (19 - nice)
...
> +unsigned long nice_to_rlimit_nice(const int nice)
> +{
> + return NICE_TO_RLIMIT_NICE(nice);
> +}

This is a bit silly.

> - if (niceval < task_nice(p) && !capable(CAP_SYS_NICE)) {
> + if (niceval < task_nice(p) &&
> + nice_to_rlimit_nice(niceval) >
> + p->signal->rlim[RLIMIT_NICE].rlim_cur &&
> + !capable(CAP_SYS_NICE)) {

Perhaps we want another helper function to do the rlim and
CAP_SYS_NICE check together.

--
Mathematics is the supreme nostalgia of our time.

2005-01-14 21:03:24

by Lee Revell

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Thu, 2005-01-13 at 13:17 -0600, Jack O'Quin wrote:
> But there may be other, better solutions to the deadlock problem.
> Several years ago, Roger Larsson wrote a completely user-space
> realtime monitor program that works perfectly well for revoking
> realtime privileges when it detects CPU starvation. I still use it
> occasionally to help debug problems if the built-in JACK watchdog
> timer doesn't catch them.

Jack,

Do you have a link to Roger Larsson's RT watchdog?

Since we seem to have a consensus that the rlimit approach is the way to
go, I think it will be important to have a generic watchdog thread
running as root at a higher RT prio than the RT group. JACK solves the
problem with its own watchdog thread but as more and more apps migrate
to (in our opinion) the "correct" RT programming model, where you have
multithreaded apps with normal prio disk and GUI threads feeding an RT
rendering thread, a system wide watchdog daemon becomes more attractive.
Keep in mind there are many other applications than audio, for example
CD burning has an obvious RT constraint and cdrecord will take advantage
of SCHED_FIFO and mlockall() if it can get them.

Lee
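The watchdog Lee describes, and Roger Larsson's rt_monitor as Jack describes it, both depend on detecting CPU starvation. One way to do that is sketched here under stated assumptions; this is not rt_monitor's code. A SCHED_OTHER "canary" thread bumps a counter whenever it gets to run, and a root watchdog at a higher RT priority than the RT group wakes periodically. If the counter has not moved for several consecutive wakeups, normal tasks are being starved and the RT tasks should be demoted. The struct and function names are hypothetical.

```c
/* Hypothetical state kept by the watchdog thread. */
struct starvation_watch {
    unsigned long last_seen; /* canary counter at the previous check */
    unsigned int stalled;    /* consecutive checks without progress */
    unsigned int limit;      /* stalled checks tolerated before firing */
};

/* Called on each watchdog wakeup with the canary's current counter.
 * Returns 1 when RT tasks should be demoted, 0 otherwise. */
int watchdog_check(struct starvation_watch *w, unsigned long canary)
{
    if (canary != w->last_seen) {
        /* The SCHED_OTHER canary ran since the last check: no starvation. */
        w->last_seen = canary;
        w->stalled = 0;
        return 0;
    }
    if (++w->stalled >= w->limit) {
        w->stalled = 0; /* re-arm after firing */
        return 1;
    }
    return 0;
}
```

Keeping this in one system-wide daemon, rather than in each application as JACK does today, is the design choice Lee is arguing for.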

2005-01-14 23:12:46

by Chris Wright

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

* Matt Mackall ([email protected]) wrote:
> On Fri, Jan 14, 2005 at 12:10:21PM -0800, Chris Wright wrote:
> > The basic issue on the rlimit value is how to sanely encode nice values,
> > realtime priorities and scheduler policies into a number. The first
> > incarnation was the clumsiest, and tried to pack it all into a number
> > in range of [0,139]. This, as many agree, too closely reflects kernel
> > internal values. This one gives 0-39 (nice values 19,-20) to RLIMIT_NICE,
> > and 0-99 (rt priorities) to RLIMIT_RTPRIO. There's no distinction in rt
> > policy, and the traditional override (CAP_SYS_NICE) is still in place.
> > The defaults for both rlimits are 0, and behaviour should be backwards
> > compatible. I tested this one a bit, and it worked as expected. I've
> > got a patch to pam_limits as well, although it's untested.
>
> This is looking pretty good.
>
> > +#define NICE_TO_RLIMIT_NICE(nice) (19 - nice)
> ...
> > +unsigned long nice_to_rlimit_nice(const int nice)
> > +{
> > + return NICE_TO_RLIMIT_NICE(nice);
> > +}
>
> This is a bit silly.

Heh, I wondered what comment that would get ;-) It's gone.

> > - if (niceval < task_nice(p) && !capable(CAP_SYS_NICE)) {
> > + if (niceval < task_nice(p) &&
> > + nice_to_rlimit_nice(niceval) >
> > + p->signal->rlim[RLIMIT_NICE].rlim_cur &&
> > + !capable(CAP_SYS_NICE)) {
>
> Perhaps we want another helper function to do the rlim and
> CAP_SYS_NICE check together.

Sure.
-chris
--

===== include/asm-i386/resource.h 1.5 vs edited =====
--- 1.5/include/asm-i386/resource.h 2004-08-23 01:15:26 -07:00
+++ edited/include/asm-i386/resource.h 2005-01-14 13:48:53 -08:00
@@ -18,8 +18,11 @@
#define RLIMIT_LOCKS 10 /* maximum file locks held */
#define RLIMIT_SIGPENDING 11 /* max number of pending signals */
#define RLIMIT_MSGQUEUE 12 /* maximum bytes in POSIX mqueues */
+#define RLIMIT_NICE 13 /* max nice prio allowed to raise to
+ 0-39 for nice level 19 .. -20 */
+#define RLIMIT_RTPRIO 14 /* maximum realtime priority */

-#define RLIM_NLIMITS 13
+#define RLIM_NLIMITS 15


/*
@@ -45,6 +48,8 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ MAX_SIGPENDING, MAX_SIGPENDING }, \
{ MQ_BYTES_MAX, MQ_BYTES_MAX }, \
+ { 0, 0 }, \
+ { 0, 0 }, \
}

#endif /* __KERNEL__ */
===== include/asm-x86_64/resource.h 1.5 vs edited =====
--- 1.5/include/asm-x86_64/resource.h 2004-08-23 01:15:26 -07:00
+++ edited/include/asm-x86_64/resource.h 2005-01-14 14:17:38 -08:00
@@ -18,8 +18,11 @@
#define RLIMIT_LOCKS 10 /* maximum file locks held */
#define RLIMIT_SIGPENDING 11 /* max number of pending signals */
#define RLIMIT_MSGQUEUE 12 /* maximum bytes in POSIX mqueues */
+#define RLIMIT_NICE 13 /* max nice prio allowed to raise to
+ 0-39 for nice level 19 .. -20 */
+#define RLIMIT_RTPRIO 14 /* maximum realtime priority */

-#define RLIM_NLIMITS 13
+#define RLIM_NLIMITS 15

/*
* SuS says limits have to be unsigned.
@@ -44,6 +47,8 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ MAX_SIGPENDING, MAX_SIGPENDING }, \
{ MQ_BYTES_MAX, MQ_BYTES_MAX }, \
+ { 0, 0 }, \
+ { 0, 0 }, \
}

#endif /* __KERNEL__ */
===== include/linux/sched.h 1.291 vs edited =====
--- 1.291/include/linux/sched.h 2005-01-11 16:42:57 -08:00
+++ edited/include/linux/sched.h 2005-01-14 13:58:32 -08:00
@@ -767,6 +767,7 @@ extern void sched_idle_next(void);
extern void set_user_nice(task_t *p, long nice);
extern int task_prio(const task_t *p);
extern int task_nice(const task_t *p);
+extern int can_nice(const task_t *p, const int nice);
extern int task_curr(const task_t *p);
extern int idle_cpu(int cpu);
extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
===== kernel/sched.c 1.407 vs edited =====
--- 1.407/kernel/sched.c 2005-01-11 16:42:35 -08:00
+++ edited/kernel/sched.c 2005-01-14 15:03:44 -08:00
@@ -3121,6 +3121,19 @@ out_unlock:

EXPORT_SYMBOL(set_user_nice);

+/**
+ * can_nice - check if a task can reduce its nice value
+ * @p: task
+ * @nice: nice value
+ */
+int can_nice(const task_t *p, const int nice)
+{
+ /* convert nice value [19,-20] to rlimit style value [0,39] */
+ int nice_rlim = 19 - nice;
+ return (nice_rlim <= p->signal->rlim[RLIMIT_NICE].rlim_cur ||
+ capable(CAP_SYS_NICE));
+}
+
#ifdef __ARCH_WANT_SYS_NICE

/*
@@ -3140,12 +3153,8 @@ asmlinkage long sys_nice(int increment)
* We don't have to worry. Conceptually one call occurs first
* and we have a single winner.
*/
- if (increment < 0) {
- if (!capable(CAP_SYS_NICE))
- return -EPERM;
- if (increment < -40)
- increment = -40;
- }
+ if (increment < -40)
+ increment = -40;
if (increment > 40)
increment = 40;

@@ -3155,6 +3164,9 @@ asmlinkage long sys_nice(int increment)
if (nice > 19)
nice = 19;

+ if (increment < 0 && !can_nice(current, nice))
+ return -EPERM;
+
retval = security_task_setnice(current, nice);
if (retval)
return retval;
@@ -3252,6 +3264,7 @@ recheck:
return -EINVAL;

if ((policy == SCHED_FIFO || policy == SCHED_RR) &&
+ param->sched_priority > p->signal->rlim[RLIMIT_RTPRIO].rlim_cur &&
!capable(CAP_SYS_NICE))
return -EPERM;
if ((current->euid != p->euid) && (current->euid != p->uid) &&
===== kernel/sys.c 1.104 vs edited =====
--- 1.104/kernel/sys.c 2005-01-11 16:42:35 -08:00
+++ edited/kernel/sys.c 2005-01-14 14:10:11 -08:00
@@ -225,7 +225,7 @@ static int set_one_prio(struct task_stru
error = -EPERM;
goto out;
}
- if (niceval < task_nice(p) && !capable(CAP_SYS_NICE)) {
+ if (niceval < task_nice(p) && !can_nice(p, niceval)) {
error = -EACCES;
goto out;
}
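The two permission checks in this final patch can be modeled in userspace to show how the rlimits and CAP_SYS_NICE combine. In the kernel the inputs come from p->signal->rlim[] and capable(CAP_SYS_NICE); here they are plain parameters, and the function names are hypothetical. When a check fails, the patch has sys_nice() return -EPERM, set_one_prio() return -EACCES, and sched_setscheduler() return -EPERM.

```c
#include <stdbool.h>

/* Model of can_nice(): nice is assumed already clamped to [-20,19],
 * as sys_nice() does before calling it. */
bool model_can_nice(int nice, unsigned long rlim_nice, bool cap_sys_nice)
{
    /* convert nice value [19,-20] to rlimit style value [0,39] */
    long nice_rlim = 19L - nice;
    return (unsigned long)nice_rlim <= rlim_nice || cap_sys_nice;
}

/* Model of the RLIMIT_RTPRIO test in sched_setscheduler(). It applies to
 * SCHED_FIFO and SCHED_RR alike; sched_priority is assumed already
 * validated (1..99). */
bool model_can_set_rt(int sched_priority, unsigned long rlim_rtprio,
                      bool cap_sys_nice)
{
    return (unsigned long)sched_priority <= rlim_rtprio || cap_sys_nice;
}
```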

2005-01-15 00:42:51

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


> On Thu, 2005-01-13 at 13:17 -0600, Jack O'Quin wrote:
>> But there may be other, better solutions to the deadlock problem.
>> Several years ago, Roger Larsson wrote a completely user-space
>> realtime monitor program that works perfectly well for revoking
>> realtime privileges when it detects CPU starvation. I still use it
>> occasionally to help debug problems if the built-in JACK watchdog
>> timer doesn't catch them.

Lee Revell <[email protected]> writes:
> Do you have a link to Roger Larsson's RT watchdog?

No official, supported version. With his permission, I posted a copy
on my home system a year ago for some audio users who had inquired
about it. That copy is here...

http://www.joq.us/joq/rt_monitor.tgz
--
joq

2005-01-15 01:17:14

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Mike Galbraith <[email protected]> writes:

> At 05:31 PM 1/13/2005 -0600, Jack O'Quin wrote:
>>Yes. However, my tests have so far shown a need for "actual FIFO as
>>long as the task behaves itself."
>
> I for one wonder why that appears to be so. What happens if you use
> SCHED_RR instead of SCHED_FIFO?
>
> (ie is the problem just one of running out of slice at a bad time, or
> is it the dynamic priority adjustment)

I have no quick and easy test for that.

If it's important, I can modify a version of JACK to use SCHED_RR,
instead.

I very much doubt it would make any difference, since we normally only
run one realtime thread at a time. Each client taps the next on the
shoulder when it is time for it to run, so there is essentially no
concurrency among them.
--
joq
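Mike's experiment only requires changing the policy passed to sched_setscheduler(). On Linux, SCHED_FIFO and SCHED_RR share the same static priority range (1 to 99). SCHED_RR differs only in time-slicing among runnable tasks at the same priority. Given Jack's point that only one realtime thread normally runs at a time, that slicing would rarely come into play. Below is a small helper, with a hypothetical name, that keeps a requested priority inside the valid range for whichever policy is chosen.

```c
#include <sched.h>

/* Clamp a requested realtime priority into the range the kernel accepts
 * for `policy` (SCHED_FIFO or SCHED_RR). Returns -1 if the policy is
 * not recognised. */
int clamp_rt_priority(int policy, int requested)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}
```

A JACK-like client would put the result in a struct sched_param and call sched_setscheduler(0, SCHED_RR, &param). That call still needs CAP_SYS_NICE or, under the rlimit patch in this thread, a high enough RLIMIT_RTPRIO. sched_rr_get_interval(0, &ts) then reports the round-robin slice length.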

2005-01-15 01:03:18

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Fri, Jan 14, 2005 at 03:04:18PM -0800, Chris Wright wrote:
> > Perhaps we want another helper function to do the rlim and
> > CAP_SYS_NICE check together.
>
> Sure.
> -chris

This last version looks good to me. My only concern right now is
increasing the list of rlimits, but I can probably save those for the
next rlimit addition.

--
Mathematics is the supreme nostalgia of our time.

2005-01-15 02:28:48

by Randy.Dunlap

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Jack O'Quin wrote:
>>On Thu, 2005-01-13 at 13:17 -0600, Jack O'Quin wrote:
>>
>>>But there may be other, better solutions to the deadlock problem.
>>>Several years ago, Roger Larsson wrote a completely user-space
>>>realtime monitor program that works perfectly well for revoking
>>>realtime privileges when it detects CPU starvation. I still use it
>>>occasionally to help debug problems if the built-in JACK watchdog
>>>timer doesn't catch them.
>
>
> Lee Revell writes:
>
>>Do you have a link to Roger Larsson's RT watchdog?
>
>
> No official, supported version. With his permission, I posted a copy
> on my home system a year ago for some audio users who had inquired
> about it. That copy is here...
>
> http://www.joq.us/joq/rt_monitor.tgz

Bad URL, not found....

--
~Randy

2005-01-15 04:09:39

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

>> Lee Revell writes:
>>>Do you have a link to Roger Larsson's RT watchdog?

> Jack O'Quin wrote:
>> No official, supported version. With his permission, I posted a copy
>> on my home system a year ago for some audio users who had inquired
>> about it. That copy is here...
>> http://www.joq.us/joq/rt_monitor.tgz

"Randy.Dunlap" writes:
> Bad URL, not found....

Sorry, that was a typo...

http://www.joq.us/jack/rt_monitor.tgz
--
joq

2005-01-15 04:58:35

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


Ingo Molnar writes:

> what kind of non-audio workload was there during this test? 43 xruns
> aren't nice but aren't that bad either.

Audio playback through JACK didn't work well at all with nice --20,
even at relatively high latencies (23 msec cycle). It was bad enough
that I would not want to use it for anything.

The 1/2 second max delay was probably more of an issue than the number
of xruns. Something really bad happened there.

> this will turn off starvation checking, for testing purposes. (to see
> whether there's anything else but anti-starvation causing xruns.)

I built a 2.6.10 kernel with just these two changes...

--- kernel/sched.c~ Fri Dec 24 15:35:24 2004
+++ kernel/sched.c Wed Jan 12 23:48:49 2005
@@ -95,7 +95,7 @@
#define MAX_BONUS (MAX_USER_PRIO * PRIO_BONUS_RATIO / 100)
#define INTERACTIVE_DELTA 2
#define MAX_SLEEP_AVG (DEF_TIMESLICE * MAX_BONUS)
-#define STARVATION_LIMIT (MAX_SLEEP_AVG)
+#define STARVATION_LIMIT 0
#define NS_MAX_SLEEP_AVG (JIFFIES_TO_NS(MAX_SLEEP_AVG))
#define CREDIT_LIMIT 100

--- kernel/workqueue.c~ Fri Dec 24 15:35:40 2004
+++ kernel/workqueue.c Fri Jan 14 19:34:10 2005
@@ -188,7 +188,7 @@

current->flags |= PF_NOFREEZE;

- set_user_nice(current, -10);
+ set_user_nice(current, -5);

/* Block and flush all signals */
sigfillset(&blocked);

Since realtime-lsm was not available, I ran the test as root. Overall
system performance was not good. Trying to do mail with xemacs and
gnus (as I had done before) hung for long periods of time.

The test did not work correctly. A number of segfaults occurred. The
jackd server hung and had to be killed manually.

So, these results aren't worth much, but here's what it reported
(compared with earlier results)...


                                With -R        Without -R     Without -R
                                (SCHED_FIFO)   (nice --20)    (STARVATION_LIMIT=0)

************* SUMMARY RESULT ****************
Total seconds ran . . . . . . : 300
Number of clients . . . . . . : 20
Ports per client  . . . . . . : 4
Frames per buffer . . . . . . : 64
*********************************************
Timeout Count . . . . . . . . : ( 1)           ( 1)           ( 2)
XRUN Count  . . . . . . . . . : 2              43             46
Delay Count (>spare time) . . : 0              0              0
Delay Count (>1000 usecs) . . : 0              0              0
Delay Maximum . . . . . . . . : 3130 usecs     501374 usecs   0 usecs
Cycle Maximum . . . . . . . . : 960 usecs      1036 usecs     0 usecs
Average DSP Load. . . . . . . : 34.3 %         34.3 %         19.5 %

The "{Delay|Cycle} Maximum" values were apparently not reported
because jackd hung. I suspect the DSP load went down because many of
the clients crashed.

I ran it again with -R on this kernel, just to check. The DSP load
was back to 33.2%. It performed as before, except the "{Delay|Cycle}
Maximum" values were also reported as zero. Not sure why, don't have
time to debug it right now. Running with -R did not impact other
interactive processes as badly as nice --20 on this kernel.

If you want, I can dig into this some more and try to figure out what
went wrong. Did I make the exact changes you wanted?

If it's not interesting, I probably won't bother.
--
joq

2005-01-15 08:18:37

by Mike Galbraith

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

At 07:14 PM 1/14/2005 -0600, Jack O'Quin wrote:
>Mike Galbraith writes:
>
> > At 05:31 PM 1/13/2005 -0600, Jack O'Quin wrote:
> >>Yes. However, my tests have so far shown a need for "actual FIFO as
> >>long as the task behaves itself."
> >
> > I for one wonder why that appears to be so. What happens if you use
> > SCHED_RR instead of SCHED_FIFO?
> >
> > (ie is the problem just one of running out of slice at a bad time, or
> > is it the dynamic priority adjustment)
>
>I have no quick and easy test for that.
>
>If it's important, I can modify a version of JACK to use SCHED_RR,
>instead.

I think the problem you're seeing is strange enough to consider trying the
(possibly odd sounding) test. I haven't seen an explanation of why nice
-20 doesn't work for you.

>I very much doubt it would make any difference, since we normally only
>run one realtime thread at a time. Each client taps the next on the
>shoulder when it is time for it to run, so there is essentially no
>concurrency among them.

It may not make any difference. Seeing that would at least be an
additional datapoint. The only significant difference I see between a
gaggle of SCHED_FIFO tasks and one of nice -20 tasks, who are alone in
their top-of-the-heap queue, and who are not cpu hogs, is the timeslice. I
don't recall there being any wakeup/preempt logic differences, ergo the
SCHED_RR suggestion.

-Mike

2005-01-15 13:50:07

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Jack O'Quin wrote:

> OK, I reran with just 5 processes reniced from -10 to -5. On my
> system they were: events, khelper, kblockd, aio and reiserfs. In
> addition, I reniced loop0 from -20 to -5.

> One major problem: this `nice --20' hack affects every thread, not
> just the critical realtime ones. That's not what we want. Audio
> applications make very conscious choices which threads run with high
> priority and which do not.

how much did this problem affect your test? Could the source of the 500
msec delays be the non-highprio components of the test that somehow
became nice --20?

Ingo

2005-01-15 14:43:33

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Jack O'Quin wrote:

> --- kernel/sched.c~ Fri Dec 24 15:35:24 2004
> +++ kernel/sched.c Wed Jan 12 23:48:49 2005
> @@ -95,7 +95,7 @@
> #define MAX_BONUS (MAX_USER_PRIO * PRIO_BONUS_RATIO / 100)
> #define INTERACTIVE_DELTA 2
> #define MAX_SLEEP_AVG (DEF_TIMESLICE * MAX_BONUS)
> -#define STARVATION_LIMIT (MAX_SLEEP_AVG)
> +#define STARVATION_LIMIT 0
> #define NS_MAX_SLEEP_AVG (JIFFIES_TO_NS(MAX_SLEEP_AVG))
> #define CREDIT_LIMIT 100

could you try the patch below? The above patch wasn't enough. With the
patch below we turn off the starvation limits for nice --20 tasks only.
This is still a hack only. If we cannot make nice --20 perform like
RT-prio-1 then there's some problem with SCHED_OTHER scheduling.

Ingo

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -2245,10 +2245,10 @@ EXPORT_PER_CPU_SYMBOL(kstat);
* if a better static_prio task has expired:
*/
#define EXPIRED_STARVING(rq) \
- ((STARVATION_LIMIT && ((rq)->expired_timestamp && \
+ ((task_nice(current) > -20) && ((STARVATION_LIMIT && ((rq)->expired_timestamp && \
(jiffies - (rq)->expired_timestamp >= \
STARVATION_LIMIT * ((rq)->nr_running) + 1))) || \
- ((rq)->curr->static_prio > (rq)->best_expired_prio))
+ ((rq)->curr->static_prio > (rq)->best_expired_prio)))

/*
* Do the virtual cpu time signal calculations.
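A user-space model of the patched EXPIRED_STARVING() test makes the effect of the extra parenthesization explicit. It is a sketch: `struct rq_model` and both functions are hypothetical stand-ins for the runqueue fields the macro reads; only the boolean structure is taken from the patch. The `task_nice(current) > -20` term gates both halves of the original OR, so a nice -20 current task never reports the runqueue as starving.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the runqueue fields EXPIRED_STARVING reads. */
struct rq_model {
    unsigned long expired_timestamp; /* 0 while the expired array is empty */
    unsigned long nr_running;
    int curr_static_prio;            /* static_prio of the running task */
    int best_expired_prio;           /* best static_prio in expired array */
};

/* The unpatched predicate from kernel/sched.c. */
static bool expired_starving_orig(const struct rq_model *rq,
                                  unsigned long jiffies,
                                  unsigned long starvation_limit)
{
    /* Expired tasks have waited longer than the limit allows. */
    bool waited_too_long = starvation_limit && rq->expired_timestamp &&
        jiffies - rq->expired_timestamp >=
            starvation_limit * rq->nr_running + 1;
    /* A task with better static priority is sitting in the expired array. */
    bool better_expired = rq->curr_static_prio > rq->best_expired_prio;

    return waited_too_long || better_expired;
}

/* Ingo's patch: the nice test gates the whole expression. */
static bool expired_starving_patched(const struct rq_model *rq,
                                     int curr_nice, unsigned long jiffies,
                                     unsigned long starvation_limit)
{
    return curr_nice > -20 &&
           expired_starving_orig(rq, jiffies, starvation_limit);
}
```

The model also shows that setting the limit to 0, as in the earlier STARVATION_LIMIT patch, removes only the time-based term; the static-priority comparison still applies.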

2005-01-15 23:02:24

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Ingo Molnar writes:

> * Jack O'Quin wrote:
>
>> OK, I reran with just 5 processes reniced from -10 to -5. On my
>> system they were: events, khelper, kblockd, aio and reiserfs. In
>> addition, I reniced loop0 from -20 to -5.
>
>> One major problem: this `nice --20' hack affects every thread, not
>> just the critical realtime ones. That's not what we want. Audio
>> applications make very conscious choices which threads run with high
>> priority and which do not.
>
> how much did this problem affect your test? Could the source of the 500
> msec delays be the non-highprio components of the test that somehow
> became nice --20?

Some interference is definitely possible. But, the test does not
involve any graphical interface, so I'd expect that to be small.
Looking at jack_test3_client.cpp, the main thread just does a sleep()
while the process cycle is running.

Still, it's hard to be sure.

Probably, the best way to tell would be patching JACK so it uses
nice(-20) instead of pthread_setschedparam() for the realtime threads.
As a hack, that looks easy. I'll build a working directory with just
that change, so we can experiment with it better.
--
joq

2005-01-15 23:10:08

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


> * Jack O'Quin wrote:
>
>> --- kernel/sched.c~ Fri Dec 24 15:35:24 2004
>> +++ kernel/sched.c Wed Jan 12 23:48:49 2005
>> @@ -95,7 +95,7 @@
>> #define MAX_BONUS (MAX_USER_PRIO * PRIO_BONUS_RATIO / 100)
>> #define INTERACTIVE_DELTA 2
>> #define MAX_SLEEP_AVG (DEF_TIMESLICE * MAX_BONUS)
>> -#define STARVATION_LIMIT (MAX_SLEEP_AVG)
>> +#define STARVATION_LIMIT 0
>> #define NS_MAX_SLEEP_AVG (JIFFIES_TO_NS(MAX_SLEEP_AVG))
>> #define CREDIT_LIMIT 100

Ingo Molnar writes:
> could you try the patch below? The above patch wasn't enough. With the
> patch below we turn off the starvation limits for nice --20 tasks only.
> This is still a hack only. If we cannot make nice --20 perform like
> RT-prio-1 then there's some problem with SCHED_OTHER scheduling.

I am building again with your new patch and with STARVATION_LIMIT
defined as (MAX_SLEEP_AVG) again. I'll run that with a modified JACK
to reduce the interference of all those other non-realtime threads.

Will let you know what happens.
--
joq

2005-01-15 23:38:22

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM



>> * Jack O'Quin wrote:
>>> One major problem: this `nice --20' hack affects every thread, not
>>> just the critical realtime ones. That's not what we want. Audio
>>> applications make very conscious choices which threads run with high
>>> priority and which do not.

> Ingo Molnar writes:
>> how much did this problem affect your test? Could the source of the 500
>> msec delays be the non-highprio components of the test that somehow
>> became nice --20?

Jack O'Quin writes:
> Probably, the best way to tell would be patching JACK so it uses
> nice(-20) instead of pthread_setschedparam() for the realtime threads.
> As a hack, that looks easy. I'll build a working directory with just
> that change, so we can experiment with it better.

Bah! Nothing is ever as easy as it looks.

According to the manpage, nice(2) is per-process not per-thread. That
does not give the granularity we need.

Is that correct? If so, I can't think of any way to make this work.
Suggestions?

We need to allow both realtime and non-realtime threads in the same
process. Anything less would require an enormous rewrite for most
audio programs, an unreasonable thing to ask.
--
joq

2005-01-15 23:48:37

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Mike Galbraith writes:

> At 07:14 PM 1/14/2005 -0600, Jack O'Quin wrote:
>>Mike Galbraith writes:
>>
>> > At 05:31 PM 1/13/2005 -0600, Jack O'Quin wrote:
>> >>Yes. However, my tests have so far shown a need for "actual FIFO as
>> >>long as the task behaves itself."
>> >
>> > I for one wonder why that appears to be so. What happens if you use
>> > SCHED_RR instead of SCHED_FIFO?
>> >
>> > (ie is the problem just one of running out of slice at a bad time, or
>> > is it the dynamic priority adjustment)
>>
>>I have no quick and easy test for that.
>>
>>If it's important, I can modify a version of JACK to use SCHED_RR,
>>instead.
>
> I think the problem you're seeing is strange enough to consider trying
> the (possibly odd sounding) test. I haven't seen an explanation of
> why nice -20 doesn't work for you.

The simplest explanation that makes any sense to me is that the
non-realtime threads are interfering with the realtime ones. These
threads don't do much in this test, although they would in a real
audio application. Still, there are enough things going on before and
after the sleep() in the main thread to possibly generate the number
of xruns we're seeing.

This is why I don't think nice is an appropriate solution for the
problem we're trying to solve. It's too blunt an instrument for audio
work.

>>I very much doubt it would make any difference, since we normally only
>>run one realtime thread at a time. Each client taps the next on the
>>shoulder when it is time for it to run, so there is essentially no
>>concurrency among them.
>
> It may not make any difference. Seeing that would at least be an
> additional datapoint. The only significant difference I see between a
> gaggle of SCHED_FIFO tasks and one of nice -20 tasks, who are alone in
> their top-of-the-heap queue, and who are not cpu hogs, is the
> timeslice. I don't recall there being any wakeup/preempt logic
> differences, ergo the SCHED_RR suggestion.

I think you're missing the fact that SCHED_FIFO is per-thread while
nice() is per-process.
--
joq

2005-01-16 01:48:20

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


> Ingo Molnar writes:
>> could you try the patch below? The above patch wasn't enough. With the
>> patch below we turn off the starvation limits for nice --20 tasks only.
>> This is still a hack only. If we cannot make nice --20 perform like
>> RT-prio-1 then there's some problem with SCHED_OTHER scheduling.

I made your suggested sched.c change. It works much better. I was
not able to modify JACK (for reasons explained in an earlier message).
So, this test has the same interference problems with non-realtime
threads. Of course, I don't know for sure that they are the source of
our xruns and long delays, but that's my suspicion.

I didn't have the same problems with normal processes being
unresponsive (compared to the previous sched.c patch). The test ran
normally, with about the same results we saw before for the nice --20
experiments.

*** Terminated Sat Jan 15 18:15:13 CST 2005 ***
************* SUMMARY RESULT ****************
Total seconds ran . . . . . . : 300
Number of clients . . . . . . : 20
Ports per client . . . . . . : 4
Frames per buffer . . . . . . : 64
*********************************************
Timeout Count . . . . . . . . :( 1)
XRUN Count . . . . . . . . . : 47
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . : 500544 usecs
Cycle Maximum . . . . . . . . : 1086 usecs
Average DSP Load. . . . . . . : 36.1 %
Average CPU System Load . . . : 8.2 %
Average CPU User Load . . . . : 26.3 %
Average CPU Nice Load . . . . : 0.0 %
Average CPU I/O Wait Load . . : 0.4 %
Average CPU IRQ Load . . . . : 0.7 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate . . . : 1703.3 /sec
Average Context-Switch Rate . : 11600.6 /sec
*********************************************

I think this means the starvation test was not the problem. So far,
I've seen no proof that there is any problem with the 2.6.10
scheduler, just some evidence that nice --20 does not work for
multi-threaded realtime audio.

If someone can suggest a way to run certain threads of a process with
a different nice value than the others, I can probably hack that into
JACK in some crude way. That should tell us whether my intuition is
right about the source of scheduling interference.

Otherwise, I'm out of ideas at the moment. I don't think SCHED_RR
will be any different from SCHED_FIFO in this test. Even if it were,
I'm not sure what that would prove.
--
joq

2005-01-16 04:29:41

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Jack O'Quin writes:

> *** Terminated Sat Jan 15 18:15:13 CST 2005 ***
> ************* SUMMARY RESULT ****************
> Total seconds ran . . . . . . : 300
> Number of clients . . . . . . : 20
> Ports per client . . . . . . : 4
> Frames per buffer . . . . . . : 64
> *********************************************
> Timeout Count . . . . . . . . :( 1)
> XRUN Count . . . . . . . . . : 47
> Delay Count (>spare time) . . : 0
> Delay Count (>1000 usecs) . . : 0
> Delay Maximum . . . . . . . . : 500544 usecs
> Cycle Maximum . . . . . . . . : 1086 usecs
> Average DSP Load. . . . . . . : 36.1 %
> Average CPU System Load . . . : 8.2 %
> Average CPU User Load . . . . : 26.3 %
> Average CPU Nice Load . . . . : 0.0 %
> Average CPU I/O Wait Load . . : 0.4 %
> Average CPU IRQ Load . . . . : 0.7 %
> Average CPU Soft-IRQ Load . . : 0.0 %
> Average Interrupt Rate . . . : 1703.3 /sec
> Average Context-Switch Rate . : 11600.6 /sec
> *********************************************
>
> I think this means the starvation test was not the problem. So far,
> I've seen no proof that there is any problem with the 2.6.10
> scheduler, just some evidence that nice --20 does not work for
> multi-threaded realtime audio.
>
> If someone can suggest a way to run certain threads of a process with
> a different nice value than the others, I can probably hack that into
> JACK in some crude way. That should tell us whether my intuition is
> right about the source of scheduling interference.
>
> Otherwise, I'm out of ideas at the moment. I don't think SCHED_RR
> will be any different from SCHED_FIFO in this test. Even if it were,
> I'm not sure what that would prove.

Studying the test script, I discovered that it starts a separate
program running in the background. So, I hacked the script to run it
with nice -15 in order not to interfere with the realtime threads.
The XRUNS didn't get much better, but the maximum delay went way down,
from 1/2 sec to a much more believable (but still too high) 32.5 msec.
I ran this with the same patched scheduler.

*** Terminated Sat Jan 15 21:22:00 CST 2005 ***
************* SUMMARY RESULT ****************
Total seconds ran . . . . . . : 300
Number of clients . . . . . . : 20
Ports per client . . . . . . : 4
Frames per buffer . . . . . . : 64
*********************************************
Timeout Count . . . . . . . . :( 0)
XRUN Count . . . . . . . . . : 43
Delay Count (>spare time) . . : 0
Delay Count (>1000 usecs) . . : 0
Delay Maximum . . . . . . . . : 32518 usecs
Cycle Maximum . . . . . . . . : 820 usecs
Average DSP Load. . . . . . . : 34.9 %
Average CPU System Load . . . : 8.5 %
Average CPU User Load . . . . : 23.8 %
Average CPU Nice Load . . . . : 0.0 %
Average CPU I/O Wait Load . . : 0.0 %
Average CPU IRQ Load . . . . : 0.7 %
Average CPU Soft-IRQ Load . . : 0.0 %
Average Interrupt Rate . . . : 1688.5 /sec
Average Context-Switch Rate . : 11704.9 /sec
*********************************************

This supports my intuition that lack of per-thread granularity is the
main problem. Where I was able to isolate some non-realtime code and
run it at lower priority, it helped quite a bit.
--
joq
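Jack's fix lowers the priority of the test script's background program, the same effect as starting it with the shell's `nice -15 prog &` (an increment of +15, i.e. lower priority). A minimal C sketch of the equivalent fork/exec, assuming a POSIX system; `spawn_niced` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] in the background with its nice
 * value raised by `incr`, so a non-realtime load generator competes
 * less with the audio threads. Returns the child's pid, or -1 if
 * fork() failed. */
static pid_t spawn_niced(char *const argv[], int incr)
{
    pid_t pid = fork();

    if (pid != 0)
        return pid;            /* parent, or -1 on fork failure */

    /* Child: nice() returns the new value, which may legitimately be
     * -1, so clear errno first to detect real failures. */
    errno = 0;
    if (nice(incr) == -1 && errno != 0)
        _exit(126);
    execvp(argv[0], argv);
    _exit(127);                /* exec failed */
}
```

Raising a nice value needs no privilege, so this works for an ordinary user; only negative increments need privilege.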

2005-01-16 23:13:38

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Jack O'Quin wrote:

> According to the manpage, nice(2) is per-process not per-thread. That
> does not give the granularity we need.

the manpage is incorrect - sys_nice() is per-thread. (Btw., you could
use setpriority() too.)

Ingo
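A small experiment to check this on a Linux/NPTL system: a worker thread calls nice() on itself, and getpriority() shows that the worker changed while the main thread did not. It is a sketch, assuming Linux; the struct and helper names are hypothetical. It uses a positive increment so it runs unprivileged, since nice(-20) would need root or CAP_SYS_NICE. The TID is read with syscall(SYS_gettid) because glibc had no gettid() wrapper at the time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

struct nice_probe {
    int incr;        /* how much the worker raises its own nice value */
    pid_t tid;       /* worker's TID, for comparison with the caller's */
    int nice_after;  /* worker's nice value after calling nice() */
    int failed;      /* nonzero if nice() reported an error */
};

static void *probe_thread(void *arg)
{
    struct nice_probe *p = arg;

    p->tid = current_tid();
    errno = 0;
    p->failed = (nice(p->incr) == -1 && errno != 0);
    /* With PRIO_PROCESS, who == 0 means the calling thread on Linux. */
    p->nice_after = getpriority(PRIO_PROCESS, 0);
    return NULL;
}

/* Run one worker that renices only itself; 0 or a pthread error. */
static int run_probe(struct nice_probe *p)
{
    pthread_t t;
    int err = pthread_create(&t, NULL, probe_thread, p);

    return err ? err : pthread_join(t, NULL);
}
```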

2005-01-16 23:22:52

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Jack O'Quin wrote:

> Studying the test script, I discovered that it starts a separate
> program running in the background. So, I hacked the script to run it
> with nice -15 in order not to interfere with the realtime threads. The
> XRUNS didn't get much better, but the maximum delay went way down,
> from 1/2 sec to a much more believable (but still too high) 32.5 msec.
> I ran this with the same patched scheduler.

> This supports my intuition that lack of per-thread granularity is the
> main problem. Where I was able to isolate some non-realtime code and
> run it at lower priority, it helped quite a bit.

ok, makes perfect sense. My suggestion for the next step would be to try
nice() or setpriority() to do priority isolation.

If that experiment works out fine (i.e. the xrun count is comparable to
the SCHED_FIFO case) then it would also be nice to do a nice --19 run
(under the hacked kernel), which is a priority level that doesn't have
starvation turned off in the patched kernel but is otherwise very close
in behavior to nice --20.

i.e. as an end result we'd have the following 3 priority setups
compared: SCHED_FIFO:RT-prio-1, SCHED_NORMAL:nice--20,
SCHED_NORMAL:nice--19. The (ideal) goal would be for them to have
near-identical audio-latency performance.

Ingo
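The three configurations can be applied per thread roughly as follows. This is a sketch, assuming Linux, where setpriority() with PRIO_PROCESS and a TID renices a single thread; the enum and function names are hypothetical, and each call has to be made from the thread being configured.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical labels for the three configurations to be compared. */
enum prio_setup { SETUP_FIFO_1, SETUP_NICE_M20, SETUP_NICE_M19 };

/* Apply one configuration to the calling thread only. Returns 0, or an
 * errno value: EPERM/EACCES without the needed privilege, EINVAL for an
 * unknown setup. */
static int apply_setup(enum prio_setup s)
{
    struct sched_param sp = { .sched_priority = 0 };
    int nice_val, err;

    switch (s) {
    case SETUP_FIFO_1:              /* SCHED_FIFO:RT-prio-1 */
        sp.sched_priority = 1;
        return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    case SETUP_NICE_M20:            /* SCHED_NORMAL:nice--20 */
        nice_val = -20;
        break;
    case SETUP_NICE_M19:            /* SCHED_NORMAL:nice--19 */
        nice_val = -19;
        break;
    default:
        return EINVAL;
    }

    /* SCHED_OTHER is the user-space name for SCHED_NORMAL. */
    err = pthread_setschedparam(pthread_self(), SCHED_OTHER, &sp);
    if (err)
        return err;
    /* Renice just this thread, addressed by its TID. */
    if (setpriority(PRIO_PROCESS, (id_t)syscall(SYS_gettid), nice_val) != 0)
        return errno;
    return 0;
}
```

Without privilege, SETUP_FIFO_1 fails with EPERM and the two nice setups fail with EACCES, so the comparison runs need root or equivalent capabilities.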

2005-01-16 23:56:30

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


> * Jack O'Quin wrote:
>> According to the manpage, nice(2) is per-process not per-thread. That
>> does not give the granularity we need.

Ingo Molnar writes:
> the manpage is incorrect - sys_nice() is per-thread. (Btw., you could
> use setpriority() too.)

OK. Where is this stuff documented?

BTW, I think this violates POSIX, which states...

The nice value set with nice() shall be applied to the process. If
the process is multi-threaded, the nice value shall affect all
system scope threads in the process.

(It does not affect SCHED_FIFO or SCHED_RR threads, however.)

Is it possible to call sched_setscheduler() with a thread ID instead
of a pid? That's what I really need. JACK sets and resets the thread
priorities from a different thread.
--
joq

2005-01-17 09:19:10

by Sytse Wielinga

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

On Sun, Jan 16, 2005 at 05:57:23PM -0600, Jack O'Quin wrote:
> > * Jack O'Quin wrote:
> >> According to the manpage, nice(2) is per-process not per-thread. That
> >> does not give the granularity we need.
>
> Ingo Molnar writes:
> > the manpage is incorrect - sys_nice() is per-thread. (Btw., you could
> > use setpriority() too.)
>
> OK. Where is this stuff documented?
>
> BTW, I think this violates POSIX, which states...
>
> The nice value set with nice() shall be applied to the process. If
> the process is multi-threaded, the nice value shall affect all
> system scope threads in the process.

We are talking about two different things here. POSIX is just about API and
has, correct me if I'm wrong, nothing to do with system calls whatsoever. The
manpage nice(2) is about the libc library call nice(), which is per-process,
which it should be according to POSIX. The system call, called sys_nice() in C,
is per-thread. Apparently glibc or some thread library contains some magic to
make the translation.

Sytse

2005-01-17 10:07:22

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Jack O'Quin wrote:

> Is it possible to call sched_setscheduler() with a thread ID instead
> of a pid? That's what I really need. JACK sets and resets the thread
> priorities from a different thread.

yes. The PID arguments in these APIs are all treated as 'TIDs'. One day
the APIs themselves might switch over to what POSIX specifies, and there
will be new, thread-specific APIs - but at the moment they are all
thread-granular.

(If this switchover happens, it will be done in a controlled manner via glibc,
not via the kernel. I.e. the kernel will introduce new syscalls to do the
per-process priority changing, then the newest glibc will utilize them - i.e.
already existing apps will stay compatible.)

Ingo
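Given that, one thread can change another thread's scheduling policy as long as it knows the target's kernel TID (not its pthread_t). A minimal sketch, assuming Linux; `set_thread_policy` and `current_tid` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the caller (glibc had no gettid() wrapper then). */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Hypothetical helper: change the policy of the one thread whose TID is
 * `tid`. Linux treats the pid argument as a TID, so this does not touch
 * the other threads of the process. Returns 0 or an errno value. */
static int set_thread_policy(pid_t tid, int policy, int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    return sched_setscheduler(tid, policy, &sp) == 0 ? 0 : errno;
}
```

SCHED_OTHER requires a priority of 0, and the realtime policies need privilege, so an unprivileged caller gets EINVAL or EPERM respectively for those mistakes.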

2005-01-17 14:38:08

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Sytse Wielinga wrote:

> We are talking about two different things here. POSIX is just about
> API and has, correct me if I'm wrong, nothing to do with system calls
> whatsoever. The manpage nice(2) is about the libc library call nice(),
> which is per-process, which it should be according to POSIX. The
> system call, called sys_nice() in C, is per-thread. Apparently glibc
> or some thread library contains some magic to make the translation.

AFAIK there's no such translation at the glibc level - i.e. you'll get
per-thread semantics. (glibc really needs kernel help to do the
per-process things cleanly.) Anyway, this hasn't been a big issue in the
past, and especially for the current testing purpose this behavior is
what we need right now.

Ingo

2005-01-18 08:04:21

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Jack O'Quin wrote:

> In the absence of any documentation, I'm guessing about storing the
> nice value in the priority field of the sched_param struct. But, I
> have not been able to figure out how to make that work.

the call you need is:

setpriority(PRIO_PROCESS, tid, -20);

where 'tid' is the TID (pid) of the thread in question. There's no way I
know of to utilize the pthread_t ID to do this, so you'll have to figure
the TID out via gettid() - which needs to happen in the child context -
how hard would it be to attach the TID field to some per-thread Jack
structure? [while the purpose is still a quick hack.]

Ingo
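This suggestion, combined with the race raised in the reply, can be sketched as follows: the child stores its own TID and posts a semaphore, and the creator waits on it before calling setpriority() on that TID. A quick-hack sketch assuming Linux and POSIX semaphores; `struct rt_thread`, `start_with_nice()` and `stop_thread()` are hypothetical names, not JACK structures. The demo uses a nice value an unprivileged caller may set; -20 needs root or CAP_SYS_NICE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread record (not an existing JACK structure). */
struct rt_thread {
    pthread_t handle;
    pid_t tid;          /* written by the child itself */
    sem_t tid_ready;    /* posted once `tid` is valid */
    sem_t may_exit;     /* keeps the demo child alive until told to stop */
};

static void *child_main(void *arg)
{
    struct rt_thread *t = arg;

    t->tid = (pid_t)syscall(SYS_gettid);  /* only knowable in the child */
    sem_post(&t->tid_ready);              /* publish it to the creator */
    sem_wait(&t->may_exit);               /* stand-in for the real work */
    return NULL;
}

static void stop_thread(struct rt_thread *t)
{
    sem_post(&t->may_exit);
    pthread_join(t->handle, NULL);
    sem_destroy(&t->tid_ready);
    sem_destroy(&t->may_exit);
}

/* Create a thread, then set its nice value from the creating thread.
 * On success the child runs until stop_thread(); returns 0 or errno. */
static int start_with_nice(struct rt_thread *t, int nice_val)
{
    int err;

    if (sem_init(&t->tid_ready, 0, 0) != 0 ||
        sem_init(&t->may_exit, 0, 0) != 0)
        return errno;
    err = pthread_create(&t->handle, NULL, child_main, t);
    if (err) {
        sem_destroy(&t->tid_ready);
        sem_destroy(&t->may_exit);
        return err;
    }
    /* This wait closes the race: the TID is not read until the child
     * has stored it. Retry if a signal interrupts the wait. */
    while (sem_wait(&t->tid_ready) == -1 && errno == EINTR)
        ;
    if (setpriority(PRIO_PROCESS, t->tid, nice_val) != 0) {
        err = errno;
        stop_thread(t);
        return err;
    }
    return 0;
}
```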

2005-01-18 17:04:51

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Ingo Molnar writes:

> * Jack O'Quin wrote:
>
>> In the absence of any documentation, I'm guessing about storing the
>> nice value in the priority field of the sched_param struct. But, I
>> have not been able to figure out how to make that work.
>
> the call you need is:
>
> setpriority(PRIO_PROCESS, tid, -20);
>
> where 'tid' is the TID (pid) of the thread in question. There's no way I
> know of to utilize the pthread_t ID to do this, so you'll have to figure
> the TID out via gettid() - which needs to happen in the child context -
> how hard would it be to attach the TID field to some per-thread Jack
> structure? [while the purpose is still a quick hack.]

Adding a tid field is relatively easy. Fixing the race condition
between setting it in the new thread and using it in the creating
thread is harder, but not impossible. But, even setting it in the new
thread would create an incompatible interface. With hundreds of JACK
client applications, binary compatibility is a serious consideration.

Due to the absurd difficulty of successfully creating a realtime
thread under the various incompatible Linux kernels and pthread
libraries, we export jack_create_thread() to applications. That way,
they can take advantage of our latest fix for the latest NPTL botch
(0.60 was particularly bad).

So, the new thread's start_routine is not necessarily ours. I suppose
we could provide an internal function to initialize the thread and then
call the requester's start_routine. But, this is getting to be a
significant time sink.

Eventually, I can probably cobble something together that will
establish whether your current 2.6.10 SCHED_OTHER works with nice -20.
Is that all we're trying to accomplish? I do think it can be made to
work (on some kernel versions, given appropriate privileges, with
kernel thread priorities adjusted properly, etc.).

But, that does not meet any of my needs.
--
joq
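The internal wrapper described above can be sketched as a trampoline: the library allocates a small record holding the application's start_routine and argument, and the new thread renices itself first (a per-thread nice() call) and then calls the application's routine. No TID has to cross threads, so there is no race to fix. This is a sketch under stated assumptions: all names are hypothetical, and it is not the signature of jack_create_thread().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical trampoline state; heap-allocated because the creator may
 * return before the new thread starts running. */
struct trampoline_arg {
    void *(*start_routine)(void *);   /* the application's routine */
    void *user_arg;
    int nice_incr;                    /* e.g. -20 for the experiment */
};

static void *trampoline(void *p)
{
    struct trampoline_arg a = *(struct trampoline_arg *)p;
    int rc;

    free(p);
    /* Runs in the new thread, so on Linux it changes only this thread.
     * A negative increment fails with EPERM without privilege; this
     * sketch then just keeps the inherited nice value. */
    rc = nice(a.nice_incr);
    (void)rc;
    return a.start_routine(a.user_arg);
}

/* Hypothetical pthread_create()-like wrapper: the application still
 * supplies its own start_routine and argument. */
static int create_niced_thread(pthread_t *thread, int nice_incr,
                               void *(*start_routine)(void *), void *arg)
{
    struct trampoline_arg *a = malloc(sizeof *a);
    int err;

    if (a == NULL)
        return ENOMEM;
    a->start_routine = start_routine;
    a->user_arg = arg;
    a->nice_incr = nice_incr;
    err = pthread_create(thread, NULL, trampoline, a);
    if (err)
        free(a);
    return err;
}

/* Example start routine: records the nice value it actually runs at. */
static void *report_nice(void *out)
{
    *(int *)out = getpriority(PRIO_PROCESS, 0);
    return NULL;
}
```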

2005-01-19 08:34:30

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Jack O'Quin wrote:

> Adding a tid field is relatively easy. Fixing the race condition
> between setting it in the new thread and using it in the creating
> thread is harder, but not impossible. But, even setting it in the new
> thread would create an incompatible interface. With hundreds of JACK
> client applications, binary compatibility is a serious consideration.

I'm not suggesting that this is the way to go, it's just to test how
nice--20 tasks would perform (on the hacked kernel). We still don't have
this data, because in the other tests you tried, some non-highprio
threads got nice--20 priority as well, which can (and apparently do)
interfere with the highprio threads.

is it possible to call a function from the highprio-threads (and only
from them) themselves, during the setup of those threads? If this is
possible then all you need to add is a nice(-20); function call, which
only affects the current thread. (you don't have to know the TID or PID
and don't have to extend any Jack APIs and structures for this hack.)

('highprio threads' are the ones that normally get SCHED_FIFO priority
with -R, 'lowprio threads' are the other client-side threads, if any.)

Ingo

2005-01-19 14:40:38

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Ingo Molnar wrote:

> I'm not suggesting that this is the way to go, it's just to test how
> nice--20 tasks would perform (on the hacked kernel). We still don't
> have this data, because in the other tests you tried, some
> non-highprio threads got nice--20 priority as well, which can (and
> apparently do) interfere with the highprio threads.

to make it easier to test, I've written an API hack: with the kernel
patch below setscheduler() will set the task to nice --20 if you use
SCHED_FIFO and sched_priority of 1. I.e. all you need to do is to run
Jack with -R and use an RT priority of 1 - all the highprio threads
should then become nice --20. If you use RT prio 2 (or higher) it should
be SCHED_FIFO again. Just apply the patch to 2.6.11-rc1 (2.6.10 might
work too) and it will work automatically. (the hack also includes the
earlier 'no starvation for nice--20 tasks' hack.)

Ingo

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -2245,10 +2245,10 @@ EXPORT_PER_CPU_SYMBOL(kstat);
* if a better static_prio task has expired:
*/
#define EXPIRED_STARVING(rq) \
- ((STARVATION_LIMIT && ((rq)->expired_timestamp && \
+ ((task_nice(current) > -20) && ((STARVATION_LIMIT && ((rq)->expired_timestamp && \
(jiffies - (rq)->expired_timestamp >= \
STARVATION_LIMIT * ((rq)->nr_running) + 1))) || \
- ((rq)->curr->static_prio > (rq)->best_expired_prio))
+ ((rq)->curr->static_prio > (rq)->best_expired_prio)))

/*
* Do the virtual cpu time signal calculations.
@@ -3211,6 +3211,12 @@ static inline task_t *find_process_by_pi
static void __setscheduler(struct task_struct *p, int policy, int prio)
{
BUG_ON(p->array);
+ if (prio == 1 && policy != SCHED_NORMAL) {
+ p->policy = SCHED_NORMAL;
+ p->static_prio = NICE_TO_PRIO(-20);
+ p->prio = p->static_prio;
+ return;
+ }
p->policy = policy;
p->rt_priority = prio;
if (policy != SCHED_NORMAL)

2005-01-19 17:48:20

by Jack O'Quin

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

Ingo Molnar writes:

> * Ingo Molnar wrote:
>
>> I'm not suggesting that this is the way to go, it's just to test how
>> nice--20 tasks would perform (on the hacked kernel). We still don't
>> have this data, because in the other tests you tried, some
>> non-highprio threads got nice--20 priority as well, which can (and
>> apparently do) interfere with the highprio threads.

I could hack just the threads that the test actually uses, to get some
numbers. But that would break some existing JACK clients.

> ('highprio threads' are the ones that normally get SCHED_FIFO priority
> with -R, 'lowprio threads' are the other client-side threads, if any.)

I usually call them `realtime threads' and `non-realtime threads'; it
means the same thing. I think of them that way because any code
running in a realtime thread is severely constrained. It must be
written *very* carefully, almost like a hardware interrupt handler.

> To make it easier to test, I've written an API hack: with the kernel
> patch below, setscheduler() will set the task to nice --20 if you use
> SCHED_FIFO (or SCHED_RR) and a sched_priority of 1. I.e. all you need to
> do is run JACK with -R and use an RT priority of 1; all the highprio
> threads should then become nice --20. If you use RT prio 2 (or higher),
> the task stays SCHED_FIFO as usual. Just apply the patch to 2.6.11-rc1
> (2.6.10 might work too) and it will take effect automatically. (The patch
> also includes the earlier 'no starvation for nice--20 tasks' hack.)

Good idea, thanks.

These tests mean a lot more when run with "real" audio programs. :-)

> @@ -3211,6 +3211,12 @@ static inline task_t *find_process_by_pi
> static void __setscheduler(struct task_struct *p, int policy, int prio)
> {
> BUG_ON(p->array);
> + if (prio == 1 && policy != SCHED_NORMAL) {
> + p->policy = SCHED_NORMAL;
> + p->static_prio = NICE_TO_PRIO(-20);
> + p->prio = p->static_prio;
> + return;
> + }
> p->policy = policy;
> p->rt_priority = prio;
> if (policy != SCHED_NORMAL)
>

JACK actually uses three different priorities, the defaults are 9, 10
and 20. How about if I change this test?

if (prio <= 20 && policy != SCHED_NORMAL) {

Or, should that be?

if (prio > 0 && prio <= 20 && policy != SCHED_NORMAL) {
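The two proposed conditions can be compared mechanically. The sketch below is a hypothetical userspace model; `remap_v1()`, `remap_v2()` and `first_disagreement()` are illustrative names, not kernel functions. It assumes, as the 2.6 `sched_setscheduler()` argument checks are understood to require, that an RT policy always arrives with a priority between 1 and 99. Under that assumption the two conditions never disagree; they differ only at priority 0, which an RT request cannot carry.

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_SCHED_NORMAL = 0, MODEL_SCHED_FIFO = 1 };

/* First proposal: remap every RT request at priority 20 or below. */
static bool remap_v1(int policy, int prio)
{
	return prio <= 20 && policy != MODEL_SCHED_NORMAL;
}

/* Second proposal: the same, with an explicit lower bound. */
static bool remap_v2(int policy, int prio)
{
	return prio > 0 && prio <= 20 && policy != MODEL_SCHED_NORMAL;
}

/* Returns the lowest priority in [lo, hi] at which the two conditions
 * disagree for a SCHED_FIFO request, or -1 if they always agree. */
static int first_disagreement(int lo, int hi)
{
	assert(lo <= hi);
	for (int prio = lo; prio <= hi; prio++)
		if (remap_v1(MODEL_SCHED_FIFO, prio) != remap_v2(MODEL_SCHED_FIFO, prio))
			return prio;
	return -1;
}
```

Both conditions catch JACK's default priorities 9, 10 and 20, and both leave SCHED_NORMAL requests alone.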
--
joq

2005-01-19 18:32:55

by Matt Mackall

Subject: Re: [PATCH] [request for inclusion] Realtime LSM

> > @@ -3211,6 +3211,12 @@ static inline task_t *find_process_by_pi
> > static void __setscheduler(struct task_struct *p, int policy, int prio)
> > {
> > BUG_ON(p->array);
> > + if (prio == 1 && policy != SCHED_NORMAL) {
> > + p->policy = SCHED_NORMAL;
> > + p->static_prio = NICE_TO_PRIO(-20);
> > + p->prio = p->static_prio;
> > + return;
> > + }
> > p->policy = policy;
> > p->rt_priority = prio;
> > if (policy != SCHED_NORMAL)
> >
>
> JACK actually uses three different priorities, the defaults are 9, 10
> and 20. How about if I change this test?
>
> if (prio <= 20 && policy != SCHED_NORMAL) {
>
> Or, should that be?
>
> if (prio > 0 && prio <= 20 && policy != SCHED_NORMAL) {

Or you can just drop the 'prio == 1 &&' part for this test. Ingo was
trying to be clever to allow some RT bits, but that's not really
necessary.

--
Mathematics is the supreme nostalgia of our time.

2005-01-20 08:06:20

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Jack O'Quin wrote:

> JACK actually uses three different priorities, the defaults are 9, 10
> and 20. How about if I change this test?
>
> if (prio <= 20 && policy != SCHED_NORMAL) {

Yeah, this is OK. 20 is used for the watchdog thread, right? (So it has
minimal latency impact.) What's the difference between the prio 9 and
prio 10 threads? If there's a real difference between them, you might
want to map the prio 9 ones to nice--15 and the prio 10 ones to nice--20.
But for the first test I'd suggest using nice--20 for both, to make sure
SCHED_OTHER tasks interfere as rarely as possible.
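The graded alternative could look like the following hypothetical helper, which is not part of any posted patch; `jack_prio_to_nice()` and its `graded` flag are illustrative names. Leaving the priority-20 watchdog thread at nice -20 in graded mode is an assumption, since the message does not say where it should go. `MODEL_NICE_TO_PRIO()` assumes the 2.6 definition (`MAX_RT_PRIO + nice + 20`, with `MAX_RT_PRIO` = 100), so nice -15 and nice -20 correspond to static priorities 105 and 100.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumes the 2.6 definition: NICE_TO_PRIO(nice) = MAX_RT_PRIO + nice + 20. */
#define MODEL_MAX_RT_PRIO 100
#define MODEL_NICE_TO_PRIO(nice) (MODEL_MAX_RT_PRIO + (nice) + 20)

/* Hypothetical: pick the nice level for a remapped JACK thread.
 * graded == false: every remapped thread gets nice -20 (the first test).
 * graded == true:  prio 9 threads get nice -15, all others nice -20. */
static int jack_prio_to_nice(int rt_prio, bool graded)
{
	assert(rt_prio >= 1 && rt_prio <= 20);	/* only remapped priorities */
	if (graded && rt_prio == 9)
		return -15;
	return -20;
}
```

Because a lower static priority wins in the 2.6 scheduler, graded mode keeps the prio 10 threads ahead of the prio 9 threads, preserving their original RT ordering.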

> Or, should that be?
>
> if (prio > 0 && prio <= 20 && policy != SCHED_NORMAL) {

'prio' cannot be negative here, so the first test works just as well.

Ingo

2005-01-20 08:09:07

by Ingo Molnar

Subject: Re: [PATCH] [request for inclusion] Realtime LSM


* Matt Mackall wrote:

> > Or, should that be?
> >
> > if (prio > 0 && prio <= 20 && policy != SCHED_NORMAL) {
>
> Or you can just drop the 'prio == 1 &&' part for this test. Ingo was
> trying to be clever to allow some RT bits, but that's not really
> necessary.

Actually, there may be some kernel threads that run at RT priority
99. But I agree, dropping the prio==1 test should work just as well.
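The caveat about priority-99 kernel threads can be seen with one more hypothetical model; the function names are illustrative. Dropping only the `prio == 1 &&` part, as suggested, remaps every RT request to SCHED_NORMAL, including a request for priority 99. The `prio <= 20` bound leaves such a request on SCHED_FIFO while still catching JACK's default priorities.

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_SCHED_NORMAL = 0, MODEL_SCHED_FIFO = 1 };

/* Variant with the 'prio == 1 &&' part dropped: any RT request is remapped. */
static bool remap_all_rt(int policy, int prio)
{
	(void)prio;	/* priority no longer matters */
	return policy != MODEL_SCHED_NORMAL;
}

/* Bounded variant: only RT requests at priority 20 or below are remapped. */
static bool remap_up_to_20(int policy, int prio)
{
	return prio <= 20 && policy != MODEL_SCHED_NORMAL;
}

/* Policy a request ends up with under a given remap rule. */
static int resulting_policy(bool (*remap)(int, int), int policy, int prio)
{
	assert(prio >= 0 && prio <= 99);
	return remap(policy, prio) ? MODEL_SCHED_NORMAL : policy;
}
```

For a test kernel either rule is fine for JACK itself; the bounded one simply leaves the high-priority RT threads untouched.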

Ingo