Hi,
Today, several futex functionalities and improvements are included in
the -rt kernel tree which, I think, make sense to have in mainline.
Among them are:
* futex use of a prio list: allows threads to be woken in priority order
instead of FIFO order.
* futex_wait use of hrtimers: allows the use of finer timer resolution.
* futex_requeue_pi functionality: allows use of the requeue optimisation
for PI-mutexes/PI-futexes.
* futex64 syscall: allows use of 64-bit futexes instead of 32-bit ones.
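For context, all of today's futex operations act on a 32-bit word in user
memory (this is the limitation the futex64 patch addresses). Below is a
minimal sketch of how the interface is driven from userspace; glibc provides
no futex() wrapper, so the raw syscall is used, and the lock_word protocol
here is illustrative only:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <errno.h>
#include <time.h>

/* glibc exposes no futex() wrapper; invoke the syscall directly. */
static long futex(uint32_t *uaddr, int op, uint32_t val,
                  const struct timespec *timeout)
{
        return syscall(SYS_futex, uaddr, op, val, timeout, NULL, 0);
}

static uint32_t lock_word;      /* the 32-bit futex word in user memory */

static void wait_while_locked(void)
{
        /* Sleep only while the word still holds the expected value (1);
         * if it has already changed, the kernel returns EWOULDBLOCK. */
        while (futex(&lock_word, FUTEX_WAIT, 1, NULL) == -1 &&
               errno == EINTR)
                ;
}

static void wake_one_waiter(void)
{
        /* Wake at most one thread blocked on the word. */
        futex(&lock_word, FUTEX_WAKE, 1, NULL);
}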
The following mails provide the corresponding patches.
Comments, suggestions, feedback, etc. are welcome, as usual.
--
Pierre Peiffer
Andrew,
if the patches allow this, I'd like to see parts 2, 3, and 4 be in
-mm ASAP. Especially the 64-bit variants are urgently needed. Just
hold off on adding the plist use; I am still not convinced that
unconditional use is a good thing, especially with one single global list.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Andrew Morton wrote:
> OK. Unfortunately patches 2-4 don't apply without #1 present and the fix
> is not immediately obvious, so we'll need a respin+retest, please.
Ok, I'll provide updated patches for -mm ASAP.
> On Thu, 11 Jan 2007 09:47:28 -0800
> Ulrich Drepper <[email protected]> wrote:
>> if the patches allow this, I'd like to see parts 2, 3, and 4 be in
>> -mm ASAP. Especially the 64-bit variants are urgently needed. Just
>> hold off on adding the plist use; I am still not convinced that
>> unconditional use is a good thing, especially with one single global list.
Just to avoid any misunderstanding (I really do understand your point about
the performance issue):
* The problem I mentioned, of several futexes hashing to the same key and
thus having all their potential waiters queued on the same list, is _not_ a
new problem introduced by this patch: it already exists today with the
simple list.
* Measuring performance with pthread_cond_broadcast (and thus with
futex_requeue) is a good way to expose the performance impact (well, maybe
not a realistic one when considering real applications (*)), better than
having threads do FUTEX_WAIT/FUTEX_WAKE: what is expensive with a plist is
the plist_add operation (which occurs in FUTEX_WAIT), not plist_del (which
occurs during FUTEX_WAKE, so no big impact should be noticeable there). Any
measurement will be difficult to do with only FUTEX_WAIT/WAKE.
=> futex_requeue performs as many plist_del/plist_add pairs as there are
waiting threads (minus one), and thus has a direct impact on the time needed
to wake everybody (or, more precisely, to wake the first thread); see the
toy model sketched below.
(*) I'll try the volano bench, if I have time.
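To illustrate that last point with something concrete, here is a toy
userspace model (a plain sorted list standing in for the kernel's plist;
names and numbers are made up for the sketch) showing where the cost sits:
insertion scans for the priority position, deletion of the head is O(1),
and a requeue pays one del+add pair per moved waiter:

#include <stdio.h>
#include <stdlib.h>

struct waiter {
        int prio;                       /* lower value = higher priority */
        struct waiter *next;
};

/* The expensive side (what FUTEX_WAIT pays): scan to the right
 * priority position to keep the list sorted. */
static void prio_add(struct waiter **head, struct waiter *w)
{
        while (*head && (*head)->prio <= w->prio)
                head = &(*head)->next;
        w->next = *head;
        *head = w;
}

/* The cheap side (what FUTEX_WAKE pays): unlink the head, O(1). */
static struct waiter *del_first(struct waiter **head)
{
        struct waiter *w = *head;
        if (w)
                *head = w->next;
        return w;
}

/* Requeue: wake one waiter, then move the remaining n-1 to the target
 * futex's list, paying one del_first() + prio_add() per moved waiter. */
static void requeue_all_but_one(struct waiter **from, struct waiter **to)
{
        struct waiter *w;

        free(del_first(from));          /* the single thread woken */
        while ((w = del_first(from)) != NULL)
                prio_add(to, w);
}

int main(void)
{
        struct waiter *futex1 = NULL, *futex2 = NULL;

        for (int i = 0; i < 1000; i++) {
                struct waiter *w = malloc(sizeof(*w));
                w->prio = rand() % 140; /* 0..139, like kernel priorities */
                prio_add(&futex1, w);
        }
        requeue_all_but_one(&futex1, &futex2);
        printf("requeued 999 waiters, one del+add pair each\n");

        while (futex2)
                free(del_first(&futex2));
        return 0;
}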
--
Pierre
* Pierre Peiffer <[email protected]> wrote:
> [...] Any measurement will be difficult to do with only FUTEX_WAIT/WAKE.
that's not a problem - just do such a measurement and show that it does
/not/ impact performance measurably. That's what we want to know...
> (*) I'll try the volano bench, if I have time.
yeah. As an alternative, it might be a good idea to pthread-ify
hackbench.c - that should replicate the Volano workload pretty
accurately. I've attached hackbench.c. (it's process-based right now, so
it won't trigger contended futex ops)
Ingo
Hi,
Ingo Molnar wrote:
> yeah. As an alternative, it might be a good idea to pthread-ify
> hackbench.c - that should replicate the Volano workload pretty
> accurately. I've attached hackbench.c. (it's process-based right now, so
> it won't trigger contended futex ops)
Ok, thanks. I've adapted your test, Ingo, and done some measurements. (I
only replaced fork with pthread_create; I didn't use a condvar or barrier
for the initial synchronization.)
The modified hackbench is available here:
http://www.bullopensource.org/posix/pi-futex/hackbench_pth.c
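The change is essentially the following (an illustrative fragment, not the
actual hackbench_pth.c; worker() and the task count are placeholders):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_TASKS 20                    /* placeholder group size */

/* Stand-in for hackbench's sender/receiver loop. */
static void *worker(void *arg)
{
        /* ... shuttle messages over the pipes here ... */
        return arg;
}

int main(void)
{
        pthread_t tids[NUM_TASKS];

        /* Where hackbench.c forks one process per task, the pthread
         * version creates threads instead, so all tasks share one
         * address space and their synchronisation goes through
         * (potentially contended) futex ops. */
        for (long i = 0; i < NUM_TASKS; i++)
                if (pthread_create(&tids[i], NULL, worker, (void *)i)) {
                        fprintf(stderr, "pthread_create failed\n");
                        exit(1);
                }

        /* The fork version reaps children with waitpid(); join instead. */
        for (int i = 0; i < NUM_TASKS; i++)
                pthread_join(tids[i], NULL);

        return 0;
}

(Compile with -lpthread.)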
I've run this bench 1000 times with pipe and 800 groups.
Here are the results:
Test1 - with simple list (i.e. without any futex patches)
=========================================================
Iterations=1000
Latency (s):    min      max      avg      stddev
                26.67    27.89    27.14    0.19
Test2 - with plist (i.e. with only patch 1/4 as is)
===================================================
Iterations=1000
Latency (s):    min      max      avg      stddev
                26.87    28.18    27.30    0.18
Test3 - with plist but all SCHED_OTHER registered
with the same priority (MAX_RT_PRIO)
(i.e. with modified patch 1/4, patch not yet posted here)
=========================================================
Iterations=1000
Latency (s):    min      max      avg      stddev
                26.74    27.84    27.16    0.18
--
Pierre
* Pierre Peiffer <[email protected]> wrote:
> The modified hackbench is available here:
>
> http://www.bullopensource.org/posix/pi-futex/hackbench_pth.c
cool!
> I've run this bench 1000 times with pipe and 800 groups.
> Here are the results:
>
> Test1 - with simple list (i.e. without any futex patches)
> =========================================================
> Latency (s) min max avg stddev
> 26.67 27.89 27.14 0.19
> Test2 - with plist (i.e. with only patch 1/4 as is)
> 26.87 28.18 27.30 0.18
> Test3 - with plist but all SCHED_OTHER registered
> 26.74 27.84 27.16 0.18
ok, seems like the last one is the winner - it's the same as unmodified,
within noise.
Ingo
Pierre Peiffer wrote:
> I've run this bench 1000 times with pipe and 800 groups.
> Here are the results:
This is not what I'm mostly concerned about. The patches create a
bottleneck since _all_ processes use the same resource. Plus, this code
has to be run on a machine with multiple processors to get RFOs
(read-for-ownership cache traffic) into play.
So, please do this: on an SMP (4p or more) machine, rig the test so that
it runs quite a while. Then, in a script, start the program a bunch of
times, all in parallel. Have the script wait until all program runs are
done and measure the time until the last program finishes.
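Something along these lines would do it (a sketch; NR_INSTANCES and the
benchmark path are placeholders to adjust):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>

#define NR_INSTANCES 16                 /* placeholder: tune to the machine */

int main(void)
{
        char *const bench_argv[] = { "./hackbench_pth", NULL }; /* placeholder */
        struct timeval start, end;

        gettimeofday(&start, NULL);

        /* Start all instances in parallel... */
        for (int i = 0; i < NR_INSTANCES; i++) {
                pid_t pid = fork();
                if (pid == 0) {
                        execv(bench_argv[0], bench_argv);
                        _exit(127);     /* exec failed */
                }
                if (pid < 0) {
                        perror("fork");
                        exit(1);
                }
        }

        /* ...wait until every run is done... */
        while (wait(NULL) > 0)
                ;

        /* ...and report the time until the last one finished. */
        gettimeofday(&end, NULL);
        printf("wall time: %.2f s\n",
               (end.tv_sec - start.tv_sec) +
               (end.tv_usec - start.tv_usec) / 1e6);
        return 0;
}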
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Ulrich Drepper <[email protected]> wrote:
> Pierre Peiffer wrote:
> > I've run this bench 1000 times with pipe and 800 groups.
> > Here are the results:
>
> This is not what I'm mostly concerned about. The patches create a
> bottleneck since _all_ processes use the same resource. [...]
what do you mean by that - which is this same resource?
Ingo
Ingo Molnar wrote:
> what do you mean by that - which is this same resource?
From what has been said here before, all futexes are stored in the same
list or hash table or whatever it was. I want to see how that code
behaves if many separate processes concurrently use futexes.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Ulrich Drepper <[email protected]> wrote:
> > what do you mean by that - which is this same resource?
>
> From what has been said here before, all futexes are stored in the
> same list or hash table or whatever it was. I want to see how that
> code behaves if many separate processes concurrently use futexes.
futexes are stored in the bucket hash, and these patches do not change
that. The pi-list that was talked about is per-futex. So there's no
change to the way futexes are hashed nor should there be any scalability
impact - besides the micro-impact that was measured in a number of ways
- AFAICS.
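i.e., roughly this layout (a simplified sketch, not the exact kernel
source; FUTEX_HASHBITS and the field names follow the futex code of that
era):

/* The hash is one fixed global array of buckets; a waiter queues itself
 * on the chain of whichever bucket its futex key hashes to.  The plist
 * patch changes only how entries within a single chain are ordered -
 * the hashing, and hence the behaviour across different futexes, is
 * untouched. */
#define FUTEX_HASHBITS 8

struct futex_hash_bucket {
        spinlock_t lock;
        struct plist_head chain;        /* a plain list_head before patch 1/4 */
};

static struct futex_hash_bucket futex_queues[1 << FUTEX_HASHBITS];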
Ingo
Ingo Molnar wrote:
> * Ulrich Drepper <[email protected]> wrote:
>
>>> what do you mean by that - which is this same resource?
>> From what has been said here before, all futexes are stored in the
>> same list or hash table or whatever it was. I want to see how that
>> code behaves if many separate processes concurrently use futexes.
>
> futexes are stored in the bucket hash, and these patches do not change
> that. The pi-list that was talked about is per-futex. So there's no
> change to the way futexes are hashed nor should there be any scalability
> impact - besides the micro-impact that was measured in a number of ways
> - AFAICS.
Yes, that's completely right!
--
Pierre