2006-05-10 09:22:37

by Sébastien Dugué

[permalink] [raw]
Subject: [RFC][PATCH RT 0/2] futex priority based wakeup



Hi,

in the current futex implementation, tasks are woken up in FIFO order,
(i.e. in the order they were put to sleep). For realtime systems needing
system wide strict realtime priority scheduling, tasks should be woken up
in priority order.

This patchset achieves this by changing the futex hash bucket list into
a plist. Tasks waiting on a futex are enqueued in this plist based on
their priority so that they can be woken up in priority order.

The 1st patch is a small optimization to futex_requeue() which could
go into mainline and does not depend on the RT patch.

The 2nd patch does the real job of converting the futex hash bucket list
into a plist.
There is however a drawback with this patch in that it is not possible to use
CONFIG_DEBUG_PI_LIST as the plist debugging code expects the spinlock protecting
the plist to be a raw_spinlock while the futex hash bucket lock is a regular
spinlock (rt-mutex). Maybe the plist debugging checks could be made generic to
accept both types.

Any thoughts?

Comments welcomed.

S?bastien.



-----------------------------------------------------

S?bastien Dugu? BULL/FREC:B1-247
phone: (+33) 476 29 77 70 Bullcom: 229-7770

mailto:[email protected]

Linux POSIX AIO: http://www.bullopensource.org/posix
http://sourceforge.net/projects/paiol

-----------------------------------------------------


2006-05-10 10:09:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH RT 0/2] futex priority based wakeup


* S?bastien Dugu? <[email protected]> wrote:

> in the current futex implementation, tasks are woken up in FIFO
> order, (i.e. in the order they were put to sleep). For realtime
> systems needing system wide strict realtime priority scheduling, tasks
> should be woken up in priority order.
>
> This patchset achieves this by changing the futex hash bucket list
> into a plist. Tasks waiting on a futex are enqueued in this plist
> based on their priority so that they can be woken up in priority
> order.

hm, i dont think this is enough. Basically, waking up in priority order
is just the (easier) half of the story - what you want is to also
propagate priorities when you block. We provided a complete solution via
the PI-futex patchset (currently included in -mm).

In other words: as long as locking primitives go, i dont think real-time
applications should use wakeup-priority-ordered futexes, they should use
the real thing, PI futexes.

There is one exception: when a normal futex is used as a waitqueue
without any contention properties. (for example a waitqueue for worker
threads) But those are both rare, and typically dont muster tasks with
different priorities - i.e. FIFO is good enough.

Also, there's a performance cost to this. Could you try to measure the
impact to SCHED_OTHER tasks via some pthread locking benchmark?

Ingo

2006-05-10 13:02:25

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [RFC][PATCH RT 0/2] futex priority based wakeup

On Wed, 2006-05-10 at 12:08 +0200, Ingo Molnar wrote:
> * S?bastien Dugu? <[email protected]> wrote:
>
> > in the current futex implementation, tasks are woken up in FIFO
> > order, (i.e. in the order they were put to sleep). For realtime
> > systems needing system wide strict realtime priority scheduling, tasks
> > should be woken up in priority order.
> >
> > This patchset achieves this by changing the futex hash bucket list
> > into a plist. Tasks waiting on a futex are enqueued in this plist
> > based on their priority so that they can be woken up in priority
> > order.
>
> hm, i dont think this is enough. Basically, waking up in priority order
> is just the (easier) half of the story - what you want is to also
> propagate priorities when you block. We provided a complete solution via
> the PI-futex patchset (currently included in -mm).
>
> In other words: as long as locking primitives go, i dont think real-time
> applications should use wakeup-priority-ordered futexes, they should use
> the real thing, PI futexes.

Yeah, that's right as long as userland pthread_mutexes are used, but
currently, pthread_condvars do not use PI-futexes. Therefore when we
have some threads sleeping on a condvar and the condvar is broadcasted,
those blocked threads are woken up through futex_requeue() in the order
they were put to sleep and not in their priority order.

Maybe the pthread_cond_*() function should be made to use the
PI-futexes support in glibc.

>
> There is one exception: when a normal futex is used as a waitqueue
> without any contention properties. (for example a waitqueue for worker
> threads) But those are both rare, and typically dont muster tasks with
> different priorities - i.e. FIFO is good enough.
>
> Also, there's a performance cost to this. Could you try to measure the
> impact to SCHED_OTHER tasks via some pthread locking benchmark?

Right, I will try to quantify the impact for SCHED_OTHER tasks. Any
pointers to such a benchmark?

Thanks,

S?bastien.


2006-05-10 14:29:15

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC][PATCH RT 0/2] futex priority based wakeup

On Wed, 2006-05-10 at 15:03 +0200, Sébastien Dugué wrote:
> Maybe the pthread_cond_*() function should be made to use the
> PI-futexes support in glibc.

<hack_alert>

There is a simple way to do so. Just remove the assembler version of
pthread_cond_xx and use the generic c code implementation in glibc. That
allows you to use a PI mutex for the outer mutex.

</hack_alert>

tglx


2006-05-10 15:01:59

by Jakub Jelinek

[permalink] [raw]
Subject: Re: [RFC][PATCH RT 0/2] futex priority based wakeup

On Wed, May 10, 2006 at 04:32:14PM +0200, Thomas Gleixner wrote:
> On Wed, 2006-05-10 at 15:03 +0200, S?bastien Dugu? wrote:
> > Maybe the pthread_cond_*() function should be made to use the
> > PI-futexes support in glibc.
>
> <hack_alert>
>
> There is a simple way to do so. Just remove the assembler version of
> pthread_cond_xx and use the generic c code implementation in glibc. That
> allows you to use a PI mutex for the outer mutex.
>
> </hack_alert>

I don't see how would that help. Both asm and sysdeps/pthread/*.c
versions call __pthread_mutex_unlock_usercnt/__pthread_mutex_cond_lock,
which will DTRT for the mutex passed as second argument to
pthread_mutex_*wait (assuming you have Ulrich's/mine PI nptl patch).
But, there are 2 other futexes used by condvars - internal condvar's lock
and __data.__futex. If the associated mutex uses PI protocol, then
I'm afraid the internal condvar lock needs to follow the same protocol
(i.e. use FUTEX_*LOCK_PI), otherwise a low priority task calling
pthread_cond_* and acquiring the internal lock, then scheduled away
indefinitely because of some middle-priority CPU hog could delay
some high priority task calling pthread_cond_* on the same condvar.
But, there is a problem here - pthread_cond_{signal,broadcast} don't
have any associated mutexes, so you often don't know which type
of protocol you want to use for the internal condvar lock.
Now, for the __data.__futex lock POSIX seems to be more clear,
all it says is that the wake up should happen according to the scheduling
policy (but, on the other side for pthread_mutex_unlock it says the same
and we still use FIFO there).

Jakub

2006-05-10 16:59:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC][PATCH RT 0/2] futex priority based wakeup

On Wed, 2006-05-10 at 11:01 -0400, Jakub Jelinek wrote:
> But, there are 2 other futexes used by condvars - internal condvar's lock
> and __data.__futex. If the associated mutex uses PI protocol, then
> I'm afraid the internal condvar lock needs to follow the same protocol
> (i.e. use FUTEX_*LOCK_PI), otherwise a low priority task calling
> pthread_cond_* and acquiring the internal lock, then scheduled away
> indefinitely because of some middle-priority CPU hog could delay
> some high priority task calling pthread_cond_* on the same condvar.

Did not think about __data.__lock

I said "hack_alert" explicitely as this was just a work around, so we
could use an PI futex for the outer one, which was used to protect other
things too. The assmebler code corrupted the lock field of the outer
mutex with 0/1/2.

> But, there is a problem here - pthread_cond_{signal,broadcast} don't
> have any associated mutexes, so you often don't know which type
> of protocol you want to use for the internal condvar lock.
> Now, for the __data.__futex lock POSIX seems to be more clear,
> all it says is that the wake up should happen according to the scheduling
> policy (but, on the other side for pthread_mutex_unlock it says the same
> and we still use FIFO there).

Sebastians patch might solve this.

Is there a way to (pre)set the policy of the inner lock or is there any
other solution in sight ?

tglx


2006-05-11 08:52:41

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [RFC][PATCH RT 0/2] futex priority based wakeup

On Wed, 2006-05-10 at 19:02 +0200, Thomas Gleixner wrote:
> On Wed, 2006-05-10 at 11:01 -0400, Jakub Jelinek wrote:
> > But, there are 2 other futexes used by condvars - internal condvar's lock
> > and __data.__futex. If the associated mutex uses PI protocol, then
> > I'm afraid the internal condvar lock needs to follow the same protocol
> > (i.e. use FUTEX_*LOCK_PI), otherwise a low priority task calling
> > pthread_cond_* and acquiring the internal lock, then scheduled away
> > indefinitely because of some middle-priority CPU hog could delay
> > some high priority task calling pthread_cond_* on the same condvar.
>
> Did not think about __data.__lock
>
> I said "hack_alert" explicitely as this was just a work around, so we
> could use an PI futex for the outer one, which was used to protect other
> things too. The assmebler code corrupted the lock field of the outer
> mutex with 0/1/2.

Hm, I wonder what would be the effect of having the external mutex
and __data.__lock use a PI futex and __data.__futex use a regular
futex when the waiters on __data.__futex are requeued on the external
mutex during a broadcast.

>
> > But, there is a problem here - pthread_cond_{signal,broadcast} don't
> > have any associated mutexes, so you often don't know which type
> > of protocol you want to use for the internal condvar lock.

Just a wild guess here, but in the broadcast or signal path, couldn't
__data.__mutex be looked up to determine what protocol to use for the
__data.__futex?

> > Now, for the __data.__futex lock POSIX seems to be more clear,
> > all it says is that the wake up should happen according to the scheduling
> > policy (but, on the other side for pthread_mutex_unlock it says the same
> > and we still use FIFO there).
>
> Sebastians patch might solve this.
>
> Is there a way to (pre)set the policy of the inner lock or is there any
> other solution in sight ?

As a corolary to the above couldn't the protocol used by the external
mutex dictates what the __data.__futex should use (PI or not)?

Maybe there is a need for something like a futex_requeue_pi() otherwise
we'll end up mixing PI and non-PI futexes or maybe I'm not understanding
the thing.

S?bastien.




2006-05-12 13:31:26

by Pierre Peiffer

[permalink] [raw]
Subject: Re: [RFC][PATCH RT 0/2] futex priority based wakeup

Hi,

Just few comments for my understanding:

Ingo Molnar a ?crit :
> * S?bastien Dugu? <[email protected]> wrote:
>
>> in the current futex implementation, tasks are woken up in FIFO
>> order, (i.e. in the order they were put to sleep). For realtime
>> systems needing system wide strict realtime priority scheduling, tasks
>> should be woken up in priority order.
>>
>> This patchset achieves this by changing the futex hash bucket list
>> into a plist. Tasks waiting on a futex are enqueued in this plist
>> based on their priority so that they can be woken up in priority
>> order.
>
> hm, i dont think this is enough. Basically, waking up in priority order
> is just the (easier) half of the story - what you want is to also
> propagate priorities when you block. We provided a complete solution via
> the PI-futex patchset (currently included in -mm).
>
> In other words: as long as locking primitives go, i dont think real-time
> applications should use wakeup-priority-ordered futexes, they should use
> the real thing, PI futexes.

In fact, I agree with that for a lock (pthread_mutex, etc).

>
> There is one exception: when a normal futex is used as a waitqueue
> without any contention properties. (for example a waitqueue for worker
> threads) But those are both rare, and typically dont muster tasks with
> different priorities - i.e. FIFO is good enough.
>

But here, I think this is what we have with the condvar, no ? When some
threads are blocked on the condvar (pthread_cond_wait), they must be
woken in priority order with pthread_broadcast, but there is no
"lock-owner" to boost here.
Even if all threads but one are requeued on the second futex (i.e. the
mutex used with the condvar), with the patch from Seb, they are requeued
in priority order and thus get woken in priority order: we don't need
any priority propagation here, I think.

So, I think that the PI-futexes are the right solution for the mutexes
and rwlocks. But this patch seems to me correct for condvar
(FUTEX_REQUEUE), I don't think that PI-futexes will add any benefit for
condvar (?). But I may have missed something ?


> Also, there's a performance cost to this. Could you try to measure the
> impact to SCHED_OTHER tasks via some pthread locking benchmark?
>
> Ingo

--
Pierre

2006-05-16 10:36:58

by Jakub Jelinek

[permalink] [raw]
Subject: Re: [RFC][PATCH RT 0/2] futex priority based wakeup

On Thu, May 11, 2006 at 10:56:57AM +0200, S?bastien Dugu? wrote:
> Hm, I wonder what would be the effect of having the external mutex
> and __data.__lock use a PI futex and __data.__futex use a regular
> futex when the waiters on __data.__futex are requeued on the external
> mutex during a broadcast.

Well, either glibc can stop using requeue if the mutex is PI mutex (and use
the slower fallback), or kernel would need to handle requeueing from normal to
PI futex.

> > > But, there is a problem here - pthread_cond_{signal,broadcast} don't
> > > have any associated mutexes, so you often don't know which type
> > > of protocol you want to use for the internal condvar lock.
>
> Just a wild guess here, but in the broadcast or signal path, couldn't
> __data.__mutex be looked up to determine what protocol to use for the
> __data.__futex?

Not always.
Say if you do:
thread1 (low prio) thread2 (very high prio) thread3 (mid prio)
pthread_cond_signal (&cv)
# first use of cv in the program, no mutex has been ever used with this
# condvar
lll_mutex_lock (&cv->__data.__lock)
pthread_cond_wait (&cv, &pi_mutex)
lll_mutex_lock (&cv->__data.__lock)
use_all_CPU
# Then thread2 is stuck, waiting on thread1 which waits on thread3

At pthread_cond_signal enter time you don't know the type of associated
mutex, so you don't know which kind of internal lock to use.
It doesn't have to be the first use of cv in the program, similarly it can
be any pthread_cond_{signal,broadcast} called while there are no threads
in pthread_cond_*wait on that cv.

Jakub

2006-05-18 08:46:56

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [RFC][PATCH RT 0/2] futex priority based wakeup

On Tue, 2006-05-16 at 06:36 -0400, Jakub Jelinek wrote:
> On Thu, May 11, 2006 at 10:56:57AM +0200, S?bastien Dugu? wrote:
> > Hm, I wonder what would be the effect of having the external mutex
> > and __data.__lock use a PI futex and __data.__futex use a regular
> > futex when the waiters on __data.__futex are requeued on the external
> > mutex during a broadcast.
>
> Well, either glibc can stop using requeue if the mutex is PI mutex (and use
> the slower fallback), or kernel would need to handle requeueing from normal to
> PI futex.

Yes, or make the condvar use a PI futex and implement a
futex_requeue_pi() and have glibc use that in the broadcast case. I
think the requeue thing is a good idea and should not be dropped.

>
> > > > But, there is a problem here - pthread_cond_{signal,broadcast} don't
> > > > have any associated mutexes, so you often don't know which type
> > > > of protocol you want to use for the internal condvar lock.
> >
> > Just a wild guess here, but in the broadcast or signal path, couldn't
> > __data.__mutex be looked up to determine what protocol to use for the
> > __data.__futex?
>
> Not always.
> Say if you do:
> thread1 (low prio) thread2 (very high prio) thread3 (mid prio)
> pthread_cond_signal (&cv)
> # first use of cv in the program, no mutex has been ever used with this
> # condvar
> lll_mutex_lock (&cv->__data.__lock)
> pthread_cond_wait (&cv, &pi_mutex)
> lll_mutex_lock (&cv->__data.__lock)
> use_all_CPU
> # Then thread2 is stuck, waiting on thread1 which waits on thread3
>
> At pthread_cond_signal enter time you don't know the type of associated
> mutex, so you don't know which kind of internal lock to use.
> It doesn't have to be the first use of cv in the program, similarly it can
> be any pthread_cond_{signal,broadcast} called while there are no threads
> in pthread_cond_*wait on that cv.

OK, it makes sense now, thanks.

S?bastien.