2004-06-30 13:42:14

by Paul Davis

Subject: 2.6.X, NPTL, SCHED_FIFO and JACK

JACK is the de-facto standard for low latency audio and
inter-application audio routing on Linux (it's also widely appreciated
on OS X). It makes heavy use of threads to provide the
functionality relied on by more than 2 dozen serious Linux audio
applications. For many users, it's a requirement to use SCHED_FIFO and
mlockall() with audio applications, because of the realtime, low
latency nature of their configurations/goals.

Because of the recognition by kernel developers that 2.6 does not
perform as well as 2.4+lowlat (the Andrew Morton patches) when it
comes to scheduling latency, most audio developers and users have
remained with 2.4. Recently however, several brave souls have
attempted to test 2.6. The results have been mixed.

On the one hand, it does seem possible to get performance from an
unpatched 2.6 kernel that is pretty close to the 2.4+lowlat
numbers. Using the CKolivas patches for 2.6 only improves things
further.

However, the ONLY way to get even vaguely reasonable performance in
this area is to disable the use of NPTL using LD_ASSUME_KERNEL. With
NPTL in use, there are a series of apparently interlocking problems
with scheduler parameter inheritance, scheduler performance and
decision making. It's more or less impossible to run JACK-enabled audio
systems on 2.6 with NPTL. A series of ugly kludges is beginning to
emerge within the Linux audio community, and I think it's time we cut
them off before things get out of hand.

The JACK group is entirely open to the idea that we have made an error
in our use of the pthreads API, and that NPTL is simply exposing our
mistake. We can't see the error, however, and so for the moment, we
are working on the assumption that there are genuine kernel+glibc
errors.

The first and most visible issue is with inheritance of SCHED_FIFO
scheduling. Although there are other mechanisms available under 2.6,
many people use the "jackstart" helper application which runs setuid
root and uses capabilities to start up JACK with the required caps to
allow use of SCHED_FIFO and mlockall(). This has worked very well in
2.4 for about 2 years, but in 2.6 JACK fails to get its threads to be
in the SCHED_FIFO scheduling class without a bunch of nasty kludges.

Things work correctly as soon as LD_ASSUME_KERNEL is used.

We also see apparently impossible thread scheduling, where a thread
that should run immediately is delayed by a significant time, and the
thread that woke the first one up (and should be waiting for it to
execute) runs again, apparently without ever having blocked. Once
more, it all works correctly if LD_ASSUME_KERNEL is used to avoid
NPTL.

Are there known issues with the implementation of NPTL that might give
rise to this behaviour? What can we do to help understand and debug
it?

thanks,

Paul Davis <[email protected]> Bala Cynwyd, PA, USA
Linux Audio Systems 610-667-4807
----------------------------------------------------------------------------
hybrid rather than pure; compromising rather than clean;
distorted rather than straightforward; ambiguous rather than
articulated; both-and rather than either-or; the difficult
unity of inclusion rather than the easy unity of exclusion. Robert Venturi
----------------------------------------------------------------------------


2004-06-30 15:03:19

by Ingo Molnar

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK


* Paul Davis <[email protected]> wrote:

> The first and most visible issue is with inheritance of SCHED_FIFO
> scheduling. Although there are other mechanisms available under 2.6,
> many people use the "jackstart" helper application which runs setuid
> root and uses capabilities to start up JACK with the required caps to
> allow use of SCHED_FIFO and mlockall(). This has worked very well in
> 2.4 for about 2 years, but in 2.6 JACK fails to get its threads to be
> in the SCHED_FIFO scheduling class without a bunch of nasty kludges.
>
> Things work correctly as soon as LD_ASSUME_KERNEL is used.

A simple "strace -f" should show whether the setscheduler() call
succeeds or not. Does 'jackstart' do anything with glibc internals?

> We also see apparently impossible thread scheduling, where a thread
> that should run immediately is delayed by a significant time, and the
> thread that woke the first one up (and should be waiting for it to
> execute) runs again, apparently without ever having blocked. Once
> more, it all works correctly if LD_ASSUME_KERNEL is used to avoid
> NPTL.

there was a SCHED_FIFO bug in all 2.6 kernels prior to 2.6.5, causing
erratic scheduling. Have you tried 2.6.6 or 2.6.7?

> Are there known issues with the implementation of NPTL that might give
> rise to this behaviour? What can we do to help understand and debug
> it?

there's nothing special about NPTL, scheduling-wise. But if SCHED_FIFO
is not properly set for all JACK threads that could explain the
symptoms. You talked about kludges that are necessary to make all
threads SCHED_FIFO - are you 100% sure that all JACK threads are indeed
SCHED_FIFO after these kludges are applied? If yes and you are running a
later kernel then it's something new and probably NPTL-unrelated.

Ingo
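
One way to answer the "are all JACK threads indeed SCHED_FIFO" question from inside the process, rather than from strace output, is to have each thread report its own policy. The following is only a minimal sketch: the helper names policy_name() and report_own_sched() are hypothetical and are not part of JACK or of any code discussed in this thread.

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: map a scheduling policy constant to a name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    case SCHED_OTHER: return "SCHED_OTHER";
    default:          return "unknown";
    }
}

/*
 * Hypothetical helper: print the policy and priority of the calling
 * thread to stderr. Returns 0 on success or an errno-style code from
 * pthread_getschedparam(). If policy_out is non-NULL, the policy is
 * also stored there.
 */
int report_own_sched(const char *label, int *policy_out)
{
    struct sched_param param;
    int policy;
    int rc = pthread_getschedparam(pthread_self(), &policy, &param);

    if (rc != 0)
        return rc;

    fprintf(stderr, "%s: policy=%s priority=%d\n",
            label, policy_name(policy), param.sched_priority);
    if (policy_out != NULL)
        *policy_out = policy;
    return 0;
}
```

Calling report_own_sched("watchdog", NULL) at the top of each thread function, and again after any attempt to change the policy, would show which threads actually ended up in the realtime class.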

2004-06-30 15:04:47

by Ingo Molnar

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK


another question: do all JACK threads run at SCHED_FIFO, and do they all
have the same rt_priority value?

Ingo

2004-06-30 15:17:47

by Ingo Molnar

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK


* Ingo Molnar <[email protected]> wrote:

> A simple "strace -f" should show whether the setscheduler() call
> succeeds or not. Does 'jackstart' do anything with glibc internals?

it seems part of the problem is that the setscheduler() calls 'succeed',
but the policy is not changed to SCHED_FIFO. The question here is,
are the correct PIDs used?

Ingo

2004-06-30 15:37:15

by Jakub Jelinek

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK

On Wed, Jun 30, 2004 at 05:04:30PM +0200, Ingo Molnar wrote:
> > Are there known issues with the implementation of NPTL that might give
> > rise to this behaviour? What can we do to help understand and debug
> > it?
>
> there's nothing special about NPTL, scheduling-wise. But if SCHED_FIFO
> is not properly set for all JACK threads that could explain the
> symptoms. You talked about kludges that are necessary to make all
> threads SCHED_FIFO - are you 100% sure that all JACK threads are indeed
> SCHED_FIFO after these kludges are applied? If yes and you are running a
> later kernel then it's something new and probably NPTL-unrelated.

One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
So, if you care about what scheduling created threads will have
and want it to work with both NPTL and LinuxThreads, you want
pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
explicitly.

Jakub

2004-06-30 16:16:07

by Paul Davis

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK

>another question: do all JACK threads run at SCHED_FIFO, and do they all
>have the same rt_priority value?

They don't all run SCHED_FIFO. Just two threads in the server (one is
a watchdog designed to prevent system lockups) and at least one in
each client (there may be more depending on what the client does, but
it's not created by JACK and JACK doesn't know about it). The client
threads run at 1 level lower priority than the server's main thread,
and that runs 1 level lower than the watchdog.

but ...

>it seems part of the problem is that the setscheduler() calls 'succeed',
>but the policy is not changed to SCHED_FIFO. The question here is,
>are the correct PIDs used?

this has me thinking. one of the major changes with NPTL is that all
threads share the same PID. so how in the world do we ever set the
scheduling policy of a single thread (as opposed to something
identified by a pid_t) to SCHED_FIFO?

--p

2004-06-30 16:32:07

by Paul Davis

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK

>One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
>while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
>So, if you care about what scheduling created threads will have
>and want it to work with both NPTL and LinuxThreads, you want
>pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
>explicitly.

But since we always set the scheduling class explicitly, should the
inherited scheduler class make any difference?

--p

2004-06-30 16:59:46

by Jakub Jelinek

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK

On Wed, Jun 30, 2004 at 12:32:03PM -0400, Paul Davis wrote:
> >One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
> >while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
> >So, if you care about what scheduling created threads will have
> >and want it to work with both NPTL and LinuxThreads, you want
> >pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
> >explicitly.
>
> But since we always set the scheduling class explicitly, should the
> inherited scheduler class make any difference?

Of course.
If you say
pthread_attr_init (&attr);
pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
pthread_attr_setschedparam (&attr, &param);
pthread_create (&th, &attr, fn, arg);
then with LinuxThreads the thread will have FIFO policy while with
NPTL it won't unless the current thread has it.
If you:
pthread_attr_init (&attr);
pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
pthread_attr_setschedparam (&attr, &param);
pthread_attr_setinheritsched (&attr, PTHREAD_INHERIT_SCHED);
pthread_create (&th, &attr, fn, arg);
then the thread will inherit scheduling parameters from current thread,
so unless it has FIFO the fn thread will not have FIFO policy.
If you:
pthread_attr_init (&attr);
pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
pthread_attr_setschedparam (&attr, &param);
pthread_attr_setinheritsched (&attr, PTHREAD_EXPLICIT_SCHED);
pthread_create (&th, &attr, fn, arg);
then thread will have FIFO policy in both NPTL and LinuxThreads.
For details see
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_attr_getinheritsched.html

The reason why LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED and
NPTL defaults to PTHREAD_INHERIT_SCHED is that those are the cheaper
variants. LinuxThreads has a manager thread which creates the child
threads, so for INHERIT_SCHED it needs to issue some syscalls to query
scheduling parameters of the thread which called pthread_create.
In addition to this, no matter what inheritsched setting was, if the
desired sched parameters are different from the initial thread, it
needs to issue a system call to set it for the new thread.
NPTL doesn't have a manager thread and a child thread inherits parent
thread's settings without any syscalls anywhere. For
PTHREAD_EXPLICIT_SCHED, it needs to issue a system call to set scheduling
params to the desired ones.

Jakub
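
The third variant above (PTHREAD_EXPLICIT_SCHED) can be wrapped in a small function with error checking. This is a sketch only: create_fifo_thread() and noop_thread() are hypothetical names, and the caller is assumed to pick a priority within the range reported by sched_get_priority_min()/sched_get_priority_max() for SCHED_FIFO. Without the necessary privileges, pthread_create() is expected to fail with EPERM rather than silently fall back to another policy.

```c
#include <errno.h>   /* EPERM: the result an unprivileged caller should expect */
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: create a thread that explicitly requests
 * SCHED_FIFO at the given priority, regardless of the creator's own
 * policy. Returns 0 on success or an errno-style code (for example
 * EPERM when the process may not use realtime scheduling).
 */
int create_fifo_thread(pthread_t *th, int priority,
                       void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    struct sched_param param = { .sched_priority = priority };
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Do not inherit from the creating thread: use attr's values. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &param);
    if (rc == 0)
        rc = pthread_create(th, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}

/* Trivial thread body, used only to exercise the helper. */
void *noop_thread(void *arg)
{
    return arg;
}
```

Setting the inheritsched attribute explicitly means the same call behaves the same way under both LinuxThreads and NPTL, which is the portability point made above.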

2004-06-30 17:20:57

by Ulrich Drepper

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK

Paul Davis wrote:

> this has me thinking. one of the major changes with NPTL is that all
> threads share the same PID. so how in the world do we ever set the
> scheduling policy of a single thread (as opposed to something
> identified by a pid_t) to SCHED_FIFO?

If you have to ask this question then it's no wonder you get erratic
behavior. It means you haven't looked at the pthread interface at all.

Define a pthread_attr_t with the appropriate setting (with
pthread_attr_setschedparam etc) and create the thread (and use
pthread_attr_setinheritsched correctly). Alternatively use
pthread_setschedparam on already running threads.

And use a recent enough nptl version. Very early versions didn't have
any of the scheduler handling implemented.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2004-06-30 17:50:48

by Paul Davis

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK

>> this has me thinking. one of the major changes with NPTL is that all
>> threads share the same PID. so how in the world do we ever set the
>> scheduling policy of a single thread (as opposed to something
>> identified by a pid_t) to SCHED_FIFO?
>
>If you have to ask this question then it's no wonder you get erratic
>behavior. It means you haven't looked at the pthread interface at all.

thanks, i appreciate the ad hominem remarks. you think we could ever
get SCHED_FIFO if we were not familiar with these calls? this is
really unnecessary...

my question wasn't about the pthread API. it was about what kernel API
was used to implement it. the simple answer would have been that we
use the TID, not the PID, or to have just pointed me at the source.

>And use a recent enough nptl version. Very early versions didn't have
>any of the scheduler handling implemented.

we already discovered that. the people testing this stuff are using
the most recent "stable" release of glibc, for the most part.

--p
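
The PID/TID distinction mentioned above can be made visible with a few lines of C. This is a sketch under stated assumptions: it assumes Linux with NPTL, where getpid() returns the same value in every thread while the gettid system call returns a per-thread ID, and it assumes a C library whose sched_getscheduler() passes its argument straight to the kernel (glibc does; some other C libraries do not implement the call). The helper names are hypothetical. Whether code that passes getpid() to sched_setscheduler() from a non-main thread is what affects JACK here is not established in this thread.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread (invoked via syscall()). */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Thread body: store the new thread's own TID through arg. */
void *record_tid(void *arg)
{
    *(pid_t *)arg = current_tid();
    return NULL;
}

/* Spawn a short-lived thread and return its TID, or -1 on failure. */
pid_t tid_of_new_thread(void)
{
    pthread_t th;
    pid_t tid = -1;

    if (pthread_create(&th, NULL, record_tid, &tid) != 0)
        return -1;
    pthread_join(th, NULL);
    return tid;
}

/*
 * Scheduling policy of one specific thread, looked up by TID.
 * Passing 0 refers to the calling thread. Returns -1 on error.
 */
int policy_of_tid(pid_t tid)
{
    return sched_getscheduler(tid);
}
```

In the main thread current_tid() equals getpid(); in any other thread of the same process it does not, so a pid_t obtained from getpid() identifies only the main thread when handed to the sched_* calls.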

2004-06-30 17:55:14

by Paul Davis

Subject: Re: 2.6.X, NPTL, SCHED_FIFO and JACK

>On Wed, Jun 30, 2004 at 12:32:03PM -0400, Paul Davis wrote:
>> >One thing to note is that NPTL defaults to PTHREAD_INHERIT_SCHED
>> >while LinuxThreads defaults to PTHREAD_EXPLICIT_SCHED.
>> >So, if you care about what scheduling created threads will have
>> >and want it to work with both NPTL and LinuxThreads, you want
>> >pthread_attr_setinheritsched (&attr, PTHREAD_*_SCHED);
>> >explicitly.
>>
>> But since we always set the scheduling class explicitly, should the
>> inherited scheduler class make any difference?
>
>Of course.

i understand that in the context of "pthread_attr_*; pthread_create();",
but we use pthread_create() and then set scheduling class/priority
within the new thread. Why would INHERIT_SCHED affect that? Does it?
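
The pattern described here, creating the thread with default attributes and having it change its own scheduling class afterwards, can be reduced to a small test case. The sketch below does not answer whether INHERIT_SCHED affects this path; it only shows one way to check the result from inside the thread. The names become_fifo(), promote_result and self_promoting_thread() are hypothetical and do not come from JACK.

```c
#include <errno.h>   /* EPERM: expected when realtime scheduling is not permitted */
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: switch the calling thread to SCHED_FIFO at the
 * given priority. Returns 0 on success or an errno-style code.
 */
int become_fifo(int priority)
{
    struct sched_param param = { .sched_priority = priority };

    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}

/* What the thread observed: the set call's result and the policy afterwards. */
struct promote_result {
    int rc;
    int policy;
};

/*
 * Thread body: try to promote this thread, then read the policy back
 * instead of trusting the return code alone.
 */
void *self_promoting_thread(void *arg)
{
    struct promote_result *res = arg;
    struct sched_param param;

    res->rc = become_fifo(sched_get_priority_min(SCHED_FIFO));
    if (pthread_getschedparam(pthread_self(), &res->policy, &param) != 0)
        res->policy = -1;
    return NULL;
}
```

Reading the policy back after the set call separates "the call failed" from "the call returned success but the policy did not change", which is the distinction raised earlier in the thread.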