Hi,
I would like to propose adding a new FUTEX_SET_WAIT operation to the futex
code, as per the following patch against 2.6.32-rc7
The change adds a new FUTEX_SET_WAIT operation, which is similar to
FUTEX_WAIT_BITSET except for the addition of an additional 'val2'
parameter, which is an integer and is passed in place of the 'uaddr2'
parameter to the futex syscall.
When val2==val, the behavior is identical to FUTEX_WAIT_BITSET: The
kernel verifies that *uval == val, and waits if that condition is
satisfied. The typical use case is that userspace inspects the futex
value, finds out that it needs to block based on that value, changes the
value to something that indicates it wants to be woken up, and then goes
to execute the futex syscall with a FUTEX_WAIT or FUTEX_WAIT_BITSET
operation.
With the new FUTEX_SET_WAIT operation, the change of the futex value to
indicate a wakeup is desired is done atomically with the kernel's
inspection of the futex value to figure out if it is still necessary to
wait. That is, the kernel inspects the futex value and if 'val' is
found, atomically changes it to 'val2'. Then if the new futex value is
'val2' (either because that was the original value when the kernel first
inspected it, or because the atomic update from val to val2 succeeded),
the thread goes to wait on the futex.
By doing the futex value update atomically with the kernel's inspection
of it to decide to wait, we avoid the time window where the futex has
been set to the 'please wake me up' state, but the thread has not been
queued onto the hash bucket yet. This has two effects:
- Avoids a futex syscall with the FUTEX_WAKE operation if there is no
thread to be woken yet
- In the heavily contended case, avoids waking an extra thread that's
only likely to make the contention problem worse.
Sample test results, on a sun x4600 M2, using the test program included
after the diff (dumb lock/unlock stress test, comparing FUTEX_SET_WAIT
with FUTEX_WAIT):
FUTEX_SET_WAIT test
1 threads: 45662 Kiter/s (2.19s user 0.00s system 2.19s wall 1.00 cores)
2 threads: 11834 Kiter/s (11.07s user 4.70s system 8.45s wall 1.87 cores)
3 threads: 9425 Kiter/s (11.10s user 14.73s system 10.61s wall 2.43 cores)
4 threads: 20790 Kiter/s (5.73s user 10.53s system 4.81s wall 3.38 cores)
5 threads: 21505 Kiter/s (5.05s user 14.02s system 4.65s wall 4.10 cores)
6 threads: 18904 Kiter/s (5.64s user 19.07s system 5.29s wall 4.67 cores)
8 threads: 17212 Kiter/s (6.10s user 28.39s system 5.81s wall 5.94 cores)
10 threads: 19493 Kiter/s (5.20s user 35.82s system 5.13s wall 8.00 cores)
12 threads: 20325 Kiter/s (4.92s user 42.52s system 4.92s wall 9.64 cores)
16 threads: 22026 Kiter/s (4.64s user 56.58s system 4.54s wall 13.48 cores)
24 threads: 23041 Kiter/s (4.33s user 84.76s system 4.34s wall 20.53 cores)
32 threads: 23585 Kiter/s (4.11s user 112.75s system 4.24s wall 27.56 cores)
64 threads: 25907 Kiter/s (3.93s user 115.99s system 3.86s wall 31.07 cores)
128 threads: 26455 Kiter/s (4.02s user 114.50s system 3.78s wall 31.35 cores)
256 threads: 26596 Kiter/s (3.93s user 114.55s system 3.76s wall 31.51 cores)
512 threads: 26596 Kiter/s (3.92s user 114.74s system 3.76s wall 31.56 cores)
1024 threads: 26525 Kiter/s (3.95s user 115.96s system 3.77s wall 31.81 cores)
FUTEX_WAIT test
1 threads: 46083 Kiter/s (2.17s user 0.00s system 2.17s wall 1.00 cores)
2 threads: 10811 Kiter/s (12.39s user 4.71s system 9.25s wall 1.85 cores)
3 threads: 5353 Kiter/s (21.02s user 25.85s system 18.68s wall 2.51 cores)
4 threads: 4277 Kiter/s (27.12s user 54.89s system 23.38s wall 3.51 cores)
5 threads: 3861 Kiter/s (24.51s user 85.21s system 25.90s wall 4.24 cores)
6 threads: 3540 Kiter/s (20.37s user 125.47s system 28.25s wall 5.16 cores)
8 threads: 7257 Kiter/s (12.11s user 81.09s system 13.78s wall 6.76 cores)
10 threads: 8271 Kiter/s (10.87s user 90.33s system 12.09s wall 8.37 cores)
12 threads: 10965 Kiter/s (9.16s user 88.66s system 9.12s wall 10.73 cores)
16 threads: 14472 Kiter/s (7.24s user 95.69s system 6.91s wall 14.90 cores)
24 threads: 17331 Kiter/s (6.01s user 123.90s system 5.77s wall 22.51 cores)
32 threads: 18939 Kiter/s (5.60s user 155.93s system 5.28s wall 30.59 cores)
64 threads: 18727 Kiter/s (5.66s user 162.57s system 5.34s wall 31.50 cores)
128 threads: 18349 Kiter/s (5.56s user 167.46s system 5.45s wall 31.75 cores)
256 threads: 17271 Kiter/s (5.41s user 178.54s system 5.79s wall 31.77 cores)
512 threads: 16207 Kiter/s (5.15s user 191.55s system 6.17s wall 31.88 cores)
1024 threads: 14948 Kiter/s (4.72s user 208.38s system 6.69s wall 31.85 cores)
Signed-off-by: Michel Lespinasse <[email protected]>
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 1e5a26d..c5e887d 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -20,6 +20,7 @@
#define FUTEX_WAKE_BITSET 10
#define FUTEX_WAIT_REQUEUE_PI 11
#define FUTEX_CMP_REQUEUE_PI 12
+#define FUTEX_SET_WAIT 13
#define FUTEX_PRIVATE_FLAG 128
#define FUTEX_CLOCK_REALTIME 256
@@ -39,6 +40,7 @@
FUTEX_PRIVATE_FLAG)
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
+#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
/*
* Support for robust futexes: the kernel cleans up held futexes at
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index a8cc4e1..a199606 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -25,6 +25,7 @@ struct restart_block {
struct {
u32 *uaddr;
u32 val;
+ u32 val2;
u32 flags;
u32 bitset;
u64 time;
diff --git a/kernel/futex.c b/kernel/futex.c
index fb65e82..be9de2b 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1694,6 +1694,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* futex_wait_setup() - Prepare to wait on a futex
* @uaddr: the futex userspace address
* @val: the expected value
+ * @val2: the value we want to set, in replacement of val
* @fshared: whether the futex is shared (1) or not (0)
* @q: the associated futex_q
* @hb: storage for hash_bucket pointer to be returned to caller
@@ -1704,10 +1705,10 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* with no q.key reference on failure.
*
* Returns:
- * 0 - uaddr contains val and hb has been locked
+ * 0 - uaddr contains val2 and hb has been locked
* <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlcoked
*/
-static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
+static int futex_wait_setup(u32 __user *uaddr, u32 val, u32 val2, int fshared,
struct futex_q *q, struct futex_hash_bucket **hb)
{
u32 uval;
@@ -1722,52 +1723,61 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
*
* The basic logical guarantee of a futex is that it blocks ONLY
* if cond(var) is known to be true at the time of blocking, for
- * any cond. If we queued after testing *uaddr, that would open
- * a race condition where we could block indefinitely with
+ * any cond. If we locked the hash-bucket after testing *uaddr, that
+ * would open a race condition where we could block indefinitely with
* cond(var) false, which would violate the guarantee.
*
- * A consequence is that futex_wait() can return zero and absorb
- * a wakeup when *uaddr != val on entry to the syscall. This is
- * rare, but normal.
+ * On the other hand, we insert q and release the hash-bucket only
+ * after testing *uaddr. This guarantees that futex_wait() will NOT
+ * absorb a wakeup if *uaddr does not match the desired values
+ * while the syscall executes.
*/
retry:
q->key = FUTEX_KEY_INIT;
- ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
+ ret = get_futex_key(uaddr, fshared, &q->key,
+ (val == val2) ? VERIFY_READ : VERIFY_WRITE);
if (unlikely(ret != 0))
return ret;
retry_private:
*hb = queue_lock(q);
- ret = get_futex_value_locked(&uval, uaddr);
-
- if (ret) {
+ pagefault_disable();
+ if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
+ pagefault_enable();
queue_unlock(q, *hb);
-
ret = get_user(uval, uaddr);
+ fault_common:
if (ret)
goto out;
-
if (!fshared)
goto retry_private;
-
put_futex_key(fshared, &q->key);
goto retry;
}
-
- if (uval != val) {
- queue_unlock(q, *hb);
- ret = -EWOULDBLOCK;
+ if (val != val2 && uval == val) {
+ uval = futex_atomic_cmpxchg_inatomic(uaddr, val, val2);
+ if (unlikely(uval == -EFAULT)) {
+ pagefault_enable();
+ queue_unlock(q, *hb);
+ ret = fault_in_user_writeable(uaddr);
+ goto fault_common;
+ }
}
+ pagefault_enable();
+
+ if (uval == val || uval == val2)
+ return 0; /* success */
+ queue_unlock(q, *hb);
+ ret = -EWOULDBLOCK;
out:
- if (ret)
- put_futex_key(fshared, &q->key);
+ put_futex_key(fshared, &q->key);
return ret;
}
-static int futex_wait(u32 __user *uaddr, int fshared,
- u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
+static int futex_wait(u32 __user *uaddr, int fshared, u32 val, u32 val2,
+ ktime_t *abs_time, u32 bitset, int clockrt)
{
struct hrtimer_sleeper timeout, *to = NULL;
struct restart_block *restart;
@@ -1795,7 +1805,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
retry:
/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, val2, fshared, &q, &hb);
if (ret)
goto out;
@@ -1827,6 +1837,7 @@ retry:
restart->fn = futex_wait_restart;
restart->futex.uaddr = (u32 *)uaddr;
restart->futex.val = val;
+ restart->futex.val2 = val2;
restart->futex.time = abs_time->tv64;
restart->futex.bitset = bitset;
restart->futex.flags = FLAGS_HAS_TIMEOUT;
@@ -1862,8 +1873,8 @@ static long futex_wait_restart(struct restart_block *restart)
restart->fn = do_no_restart_syscall;
if (restart->futex.flags & FLAGS_SHARED)
fshared = 1;
- return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
- restart->futex.bitset,
+ return (long)futex_wait(uaddr, fshared, restart->futex.val,
+ restart->futex.val2, tp, restart->futex.bitset,
restart->futex.flags & FLAGS_CLOCKRT);
}
@@ -2219,7 +2230,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
q.requeue_pi_key = &key2;
/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, val, fshared, &q, &hb);
if (ret)
goto out_key2;
@@ -2532,14 +2543,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
fshared = 1;
clockrt = op & FUTEX_CLOCK_REALTIME;
- if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+ if (clockrt && cmd != FUTEX_WAIT_BITSET &&
+ cmd != FUTEX_WAIT_REQUEUE_PI && cmd != FUTEX_SET_WAIT)
return -ENOSYS;
switch (cmd) {
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAIT_BITSET:
- ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
+ val2 = val;
+ case FUTEX_SET_WAIT:
+ ret = futex_wait(uaddr, fshared, val, val2, timeout, val3,
+ clockrt);
break;
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
@@ -2595,7 +2610,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
cmd == FUTEX_WAIT_BITSET ||
- cmd == FUTEX_WAIT_REQUEUE_PI)) {
+ cmd == FUTEX_WAIT_REQUEUE_PI || cmd == FUTEX_SET_WAIT)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
@@ -2609,10 +2624,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
/*
* requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
* number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
+ * new uval value in 'uaddr2' if cmd == FUTEX_SET_WAIT.
*/
if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
cmd == FUTEX_CMP_REQUEUE_PI || cmd == FUTEX_WAKE_OP)
val2 = (u32) (unsigned long) utime;
+ else if (cmd == FUTEX_SET_WAIT) {
+ if (!futex_cmpxchg_enabled)
+ return -ENOSYS;
+ val2 = (u32) (unsigned long) uaddr2;
+ }
return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}
test code:
#include <stdio.h>
#include <errno.h>
#include <linux/unistd.h>
#include <linux/futex.h>
#include <limits.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/times.h>
#include <sys/syscall.h>
#define FUTEX_SET_WAIT 13
#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
static inline int sys_futex(int * uaddr, int op, int val,
struct timespec * timeout, int * uaddr2, int val3)
{
return syscall(__NR_futex, uaddr, op, val, timeout, uaddr2, val3);
}
static inline int cmpxchg(volatile int *ptr, int old, int new)
{
int prev;
asm volatile("lock; cmpxchgl %1,%2"
: "=a" (prev)
: "r"(new), "m"(*ptr), "0"(old)
: "memory");
return prev;
}
/***** Sample futex based lock/unlock operations *****/
/* States: 0 = unlocked; 1 = locked ; 2 = locked with possible waiters */
/* Classic lock, using FUTEX_WAIT */
static inline void futex_wait_lock(volatile int *ptr)
{
int status = *ptr;
if (status == 0)
status = cmpxchg(ptr, 0, 1);
while (status != 0) {
if (status == 1)
status = cmpxchg(ptr, 1, 2);
if (status != 0) {
sys_futex((int *)ptr, FUTEX_WAIT_PRIVATE, 2, NULL, NULL, 0);
status = *ptr;
}
if (status == 0)
status = cmpxchg(ptr, 0, 2);
}
}
/* Optimized lock, using the proposed FUTEX_SET_WAIT operation */
static inline void futex_setwait_lock(volatile int *ptr)
{
int status = *ptr;
if (status == 0)
status = cmpxchg(ptr, 0, 1);
if (status != 0) {
int desired_status = 1;
do {
if (sys_futex((int *)ptr, FUTEX_SET_WAIT_PRIVATE, 1, NULL,
(int *)2, ~0) == 0) {
/* We absorbed a wakeup; so make sure to unblock next thread */
desired_status = 2;
}
status = *ptr;
if (status == 0)
status = cmpxchg(ptr, 0, desired_status);
} while (status != 0);
}
}
/* Unlock. cmpxchg is not optimal for this sample 3-state locking protocol;
but it is often required when implementing more complex locking protocols */
static inline void futex_cmpxchg_unlock(volatile int *ptr)
{
int status = *ptr;
if (status == 1)
status = cmpxchg(ptr, 1, 0);
if (status == 2) {
cmpxchg(ptr, 2, 0); /* Guaranteed to succeed in 3-state protocol */
sys_futex((int *)ptr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}
}
/***** Lock/unlock perf test *****/
struct thread_barrier {
int threads;
int unblock;
};
static void barrier_sync(struct thread_barrier *barrier);
struct locktest_shared {
struct thread_barrier barrier_before;
struct thread_barrier barrier_after;
int loops;
int test_futex;
};
static void * futex_wait_test(void * dummy)
{
struct locktest_shared * shared = dummy;
int i = shared->loops;
barrier_sync(&shared->barrier_before);
while (i--) {
futex_wait_lock(&shared->test_futex);
futex_cmpxchg_unlock(&shared->test_futex);
}
barrier_sync(&shared->barrier_after);
return NULL;
}
static void * futex_setwait_test(void * dummy)
{
struct locktest_shared * shared = dummy;
int i = shared->loops;
barrier_sync(&shared->barrier_before);
while (i--) {
futex_setwait_lock(&shared->test_futex);
futex_cmpxchg_unlock(&shared->test_futex);
}
barrier_sync(&shared->barrier_after);
return NULL;
}
/***** Test harness & support functions *****/
static inline void decrement(int *ptr)
{
asm volatile("lock; decl %0"
: "=m" (*ptr)
: "m" (*ptr));
}
/* Called by main thread to initialize barrier */
static void barrier_init(struct thread_barrier *barrier, int threads)
{
barrier->threads = threads;
barrier->unblock = 0;
}
/* Called by worker threads to synchronize with main thread */
static void barrier_sync(struct thread_barrier *barrier)
{
decrement(&barrier->threads);
if (barrier->threads == 0)
sys_futex(&barrier->threads, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
while (barrier->unblock == 0)
sys_futex(&barrier->unblock, FUTEX_WAIT_PRIVATE, 0, NULL, NULL, 0);
}
/* Called by main thread to wait for all workers to reach sync point */
static void barrier_wait(struct thread_barrier *barrier)
{
int threads;
while ((threads = barrier->threads) > 0)
sys_futex(&barrier->threads, FUTEX_WAIT_PRIVATE, threads, NULL, NULL, 0);
}
/* Called by main thread to unblock worker threads from their sync point */
static void barrier_unblock(struct thread_barrier *barrier)
{
barrier->unblock = 1;
sys_futex(&barrier->unblock, FUTEX_WAKE_PRIVATE, INT_MAX, NULL, NULL, 0);
}
static void locktest(void * thread_function(void *), int iterations,
int num_threads)
{
struct locktest_shared shared;
pthread_t thread[num_threads];
int i;
clock_t before, after;
struct tms tms_before, tms_after;
int wall, user, system;
double tick;
barrier_init(&shared.barrier_before, num_threads);
barrier_init(&shared.barrier_after, num_threads);
shared.loops = iterations / num_threads;
shared.test_futex = 0;
for (i = 0; i < num_threads; i++)
pthread_create(thread + i, NULL, thread_function, &shared);
barrier_wait(&shared.barrier_before);
before = times(&tms_before);
barrier_unblock(&shared.barrier_before);
barrier_wait(&shared.barrier_after);
after = times(&tms_after);
wall = after - before;
user = tms_after.tms_utime - tms_before.tms_utime;
system = tms_after.tms_stime - tms_before.tms_stime;
tick = 1.0 / sysconf(_SC_CLK_TCK);
printf("%d threads: %.0f Kiter/s "
"(%.2fs user %.2fs system %.2fs wall %.2f cores)\n",
num_threads, (num_threads * shared.loops) / (wall * tick * 1000),
user * tick, system * tick, wall * tick,
wall ? (user + system) * 1. / wall : 1.);
barrier_unblock(&shared.barrier_after);
for (i = 0; i < num_threads; i++)
pthread_join(thread[i], NULL);
}
int futex_setwait_supported(void)
{
int futex = 0;
sys_futex(&futex, FUTEX_SET_WAIT_PRIVATE, 1, NULL, (int *)2, ~0);
return errno == EWOULDBLOCK;
}
int main (void)
{
int threads[] = { 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 24, 32,
64, 128, 256, 512, 1024, 0 };
int i;
if (futex_setwait_supported()) {
printf("\nFUTEX_SET_WAIT test\n");
for (i = 0; threads[i]; i++)
locktest(futex_setwait_test, 100000000, threads[i]);
}
printf("\nFUTEX_WAIT test\n");
for (i = 0; threads[i]; i++)
locktest(futex_wait_test, 100000000, threads[i]);
printf("\n");
return 0;
}
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
* Michel Lespinasse <[email protected]> wrote:
> Sample test results, on a sun x4600 M2, using the test program
> included after the diff (dumb lock/unlock stress test, comparing
> FUTEX_SET_WAIT with FUTEX_WAIT):
>
> FUTEX_SET_WAIT test
> 1 threads: 45662 Kiter/s (2.19s user 0.00s system 2.19s wall 1.00 cores)
> 2 threads: 11834 Kiter/s (11.07s user 4.70s system 8.45s wall 1.87 cores)
> 3 threads: 9425 Kiter/s (11.10s user 14.73s system 10.61s wall 2.43 cores)
> 4 threads: 20790 Kiter/s (5.73s user 10.53s system 4.81s wall 3.38 cores)
> 5 threads: 21505 Kiter/s (5.05s user 14.02s system 4.65s wall 4.10 cores)
> 6 threads: 18904 Kiter/s (5.64s user 19.07s system 5.29s wall 4.67 cores)
> 8 threads: 17212 Kiter/s (6.10s user 28.39s system 5.81s wall 5.94 cores)
> 10 threads: 19493 Kiter/s (5.20s user 35.82s system 5.13s wall 8.00 cores)
> 12 threads: 20325 Kiter/s (4.92s user 42.52s system 4.92s wall 9.64 cores)
> 16 threads: 22026 Kiter/s (4.64s user 56.58s system 4.54s wall 13.48 cores)
> 24 threads: 23041 Kiter/s (4.33s user 84.76s system 4.34s wall 20.53 cores)
> 32 threads: 23585 Kiter/s (4.11s user 112.75s system 4.24s wall 27.56 cores)
> 64 threads: 25907 Kiter/s (3.93s user 115.99s system 3.86s wall 31.07 cores)
> 128 threads: 26455 Kiter/s (4.02s user 114.50s system 3.78s wall 31.35 cores)
> 256 threads: 26596 Kiter/s (3.93s user 114.55s system 3.76s wall 31.51 cores)
> 512 threads: 26596 Kiter/s (3.92s user 114.74s system 3.76s wall 31.56 cores)
> 1024 threads: 26525 Kiter/s (3.95s user 115.96s system 3.77s wall 31.81 cores)
>
> FUTEX_WAIT test
> 1 threads: 46083 Kiter/s (2.17s user 0.00s system 2.17s wall 1.00 cores)
> 2 threads: 10811 Kiter/s (12.39s user 4.71s system 9.25s wall 1.85 cores)
> 3 threads: 5353 Kiter/s (21.02s user 25.85s system 18.68s wall 2.51 cores)
> 4 threads: 4277 Kiter/s (27.12s user 54.89s system 23.38s wall 3.51 cores)
> 5 threads: 3861 Kiter/s (24.51s user 85.21s system 25.90s wall 4.24 cores)
> 6 threads: 3540 Kiter/s (20.37s user 125.47s system 28.25s wall 5.16 cores)
> 8 threads: 7257 Kiter/s (12.11s user 81.09s system 13.78s wall 6.76 cores)
> 10 threads: 8271 Kiter/s (10.87s user 90.33s system 12.09s wall 8.37 cores)
> 12 threads: 10965 Kiter/s (9.16s user 88.66s system 9.12s wall 10.73 cores)
> 16 threads: 14472 Kiter/s (7.24s user 95.69s system 6.91s wall 14.90 cores)
> 24 threads: 17331 Kiter/s (6.01s user 123.90s system 5.77s wall 22.51 cores)
> 32 threads: 18939 Kiter/s (5.60s user 155.93s system 5.28s wall 30.59 cores)
> 64 threads: 18727 Kiter/s (5.66s user 162.57s system 5.34s wall 31.50 cores)
> 128 threads: 18349 Kiter/s (5.56s user 167.46s system 5.45s wall 31.75 cores)
> 256 threads: 17271 Kiter/s (5.41s user 178.54s system 5.79s wall 31.77 cores)
> 512 threads: 16207 Kiter/s (5.15s user 191.55s system 6.17s wall 31.88 cores)
> 1024 threads: 14948 Kiter/s (4.72s user 208.38s system 6.69s wall 31.85 cores)
This test program looks really useful.
Would you be interested in adding it as a 'perf bench futex' testcase?
That way kernel developers could monitor futex performance in the future
as well.
See 'perf bench' in the latest perf events tree:
http://people.redhat.com/mingo/tip.git/README
Do 'cd tools/perf; make -j install' to install perf.
Ingo
(long quote for Linus' benefit, added an old patch to the tail)
On Mon, 2009-11-16 at 23:46 -0800, Michel Lespinasse wrote:
>
> I would like to propose adding a new FUTEX_SET_WAIT operation to the futex
> code, as per the following patch against 2.6.32-rc7
>
> The change adds a new FUTEX_SET_WAIT operation, which is similar to
> FUTEX_WAIT_BITSET except for the addition of an additional 'val2'
> parameter, which is an integer and is passed in place of the 'uaddr2'
> parameter to the futex syscall.
>
> When val2==val, the behavior is identical to FUTEX_WAIT_BITSET: The
> kernel verifies that *uval == val, and waits if that condition is
> satisfied. The typical use case is that userspace inspects the futex
> value, finds out that it needs to block based on that value, changes the
> value to something that indicates it wants to be woken up, and then goes
> to execute the futex syscall with a FUTEX_WAIT or FUTEX_WAIT_BITSET
> operation.
>
> With the new FUTEX_SET_WAIT operation, the change of the futex value to
> indicate a wakeup is desired is done atomically with the kernel's
> inspection of the futex value to figure out if it is still necessary to
> wait. That is, the kernel inspects the futex value and if 'val' is
> found, atomically changes it to 'val2'. Then if the new futex value is
> 'val2' (either because that was the original value when the kernel first
> inspected it, or because the atomic update from val to val2 succeeded),
> the thread goes to wait on the futex.
Right, so this is something I was needing to make adaptive spinning work
with futexes (aside from braving the glibc futex horror).
I've attached an old patch (which was never actually tested), that shows
how to do the adaptive wait. It also does the cmpxchg thing you do, and
it imposes a structure on the futex value.
Maybe we can merge these things.. One more point below.
> By doing the futex value update atomically with the kernel's inspection
> of it to decide to wait, we avoid the time window where the futex has
> been set to the 'please wake me up' state, but the thread has not been
> queued onto the hash bucket yet. This has two effects:
> - Avoids a futex syscall with the FUTEX_WAKE operation if there is no
> thread to be woken yet
> - In the heavily contended case, avoids waking an extra thread that's
> only likely to make the contention problem worse.
- avoids lock stealing, which might make real world loads suck a
little?
> Sample test results, on a sun x4600 M2, using the test program included
> after the diff (dumb lock/unlock stress test, comparing FUTEX_SET_WAIT
> with FUTEX_WAIT):
[ snipped results showing FUTEX_SET_WAIT is faster for this workload ]
> Signed-off-by: Michel Lespinasse <[email protected]>
>
> diff --git a/include/linux/futex.h b/include/linux/futex.h
> index 1e5a26d..c5e887d 100644
> --- a/include/linux/futex.h
> +++ b/include/linux/futex.h
> @@ -20,6 +20,7 @@
> #define FUTEX_WAKE_BITSET 10
> #define FUTEX_WAIT_REQUEUE_PI 11
> #define FUTEX_CMP_REQUEUE_PI 12
> +#define FUTEX_SET_WAIT 13
>
> #define FUTEX_PRIVATE_FLAG 128
> #define FUTEX_CLOCK_REALTIME 256
> @@ -39,6 +40,7 @@
> FUTEX_PRIVATE_FLAG)
> #define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
> FUTEX_PRIVATE_FLAG)
> +#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
>
> /*
> * Support for robust futexes: the kernel cleans up held futexes at
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index a8cc4e1..a199606 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -25,6 +25,7 @@ struct restart_block {
> struct {
> u32 *uaddr;
> u32 val;
> + u32 val2;
> u32 flags;
> u32 bitset;
> u64 time;
> diff --git a/kernel/futex.c b/kernel/futex.c
> index fb65e82..be9de2b 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1694,6 +1694,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
> * futex_wait_setup() - Prepare to wait on a futex
> * @uaddr: the futex userspace address
> * @val: the expected value
> + * @val2: the value we want to set, in replacement of val
> * @fshared: whether the futex is shared (1) or not (0)
> * @q: the associated futex_q
> * @hb: storage for hash_bucket pointer to be returned to caller
> @@ -1704,10 +1705,10 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
> * with no q.key reference on failure.
> *
> * Returns:
> - * 0 - uaddr contains val and hb has been locked
> + * 0 - uaddr contains val2 and hb has been locked
> * <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlcoked
> */
> -static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
> +static int futex_wait_setup(u32 __user *uaddr, u32 val, u32 val2, int fshared,
> struct futex_q *q, struct futex_hash_bucket **hb)
> {
> u32 uval;
> @@ -1722,52 +1723,61 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
> *
> * The basic logical guarantee of a futex is that it blocks ONLY
> * if cond(var) is known to be true at the time of blocking, for
> - * any cond. If we queued after testing *uaddr, that would open
> - * a race condition where we could block indefinitely with
> + * any cond. If we locked the hash-bucket after testing *uaddr, that
> + * would open a race condition where we could block indefinitely with
> * cond(var) false, which would violate the guarantee.
> *
> - * A consequence is that futex_wait() can return zero and absorb
> - * a wakeup when *uaddr != val on entry to the syscall. This is
> - * rare, but normal.
> + * On the other hand, we insert q and release the hash-bucket only
> + * after testing *uaddr. This guarantees that futex_wait() will NOT
> + * absorb a wakeup if *uaddr does not match the desired values
> + * while the syscall executes.
> */
> retry:
> q->key = FUTEX_KEY_INIT;
> - ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
> + ret = get_futex_key(uaddr, fshared, &q->key,
> + (val == val2) ? VERIFY_READ : VERIFY_WRITE);
> if (unlikely(ret != 0))
> return ret;
>
> retry_private:
> *hb = queue_lock(q);
>
> - ret = get_futex_value_locked(&uval, uaddr);
> -
> - if (ret) {
> + pagefault_disable();
> + if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
> + pagefault_enable();
> queue_unlock(q, *hb);
> -
> ret = get_user(uval, uaddr);
> + fault_common:
> if (ret)
> goto out;
> -
> if (!fshared)
> goto retry_private;
> -
> put_futex_key(fshared, &q->key);
> goto retry;
> }
> -
> - if (uval != val) {
> - queue_unlock(q, *hb);
> - ret = -EWOULDBLOCK;
> + if (val != val2 && uval == val) {
> + uval = futex_atomic_cmpxchg_inatomic(uaddr, val, val2);
> + if (unlikely(uval == -EFAULT)) {
> + pagefault_enable();
> + queue_unlock(q, *hb);
> + ret = fault_in_user_writeable(uaddr);
> + goto fault_common;
> + }
> }
> + pagefault_enable();
> +
> + if (uval == val || uval == val2)
> + return 0; /* success */
>
> + queue_unlock(q, *hb);
> + ret = -EWOULDBLOCK;
> out:
> - if (ret)
> - put_futex_key(fshared, &q->key);
> + put_futex_key(fshared, &q->key);
> return ret;
> }
>
> -static int futex_wait(u32 __user *uaddr, int fshared,
> - u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
> +static int futex_wait(u32 __user *uaddr, int fshared, u32 val, u32 val2,
> + ktime_t *abs_time, u32 bitset, int clockrt)
> {
> struct hrtimer_sleeper timeout, *to = NULL;
> struct restart_block *restart;
> @@ -1795,7 +1805,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
>
> retry:
> /* Prepare to wait on uaddr. */
> - ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
> + ret = futex_wait_setup(uaddr, val, val2, fshared, &q, &hb);
> if (ret)
> goto out;
>
> @@ -1827,6 +1837,7 @@ retry:
> restart->fn = futex_wait_restart;
> restart->futex.uaddr = (u32 *)uaddr;
> restart->futex.val = val;
> + restart->futex.val2 = val2;
> restart->futex.time = abs_time->tv64;
> restart->futex.bitset = bitset;
> restart->futex.flags = FLAGS_HAS_TIMEOUT;
> @@ -1862,8 +1873,8 @@ static long futex_wait_restart(struct restart_block *restart)
> restart->fn = do_no_restart_syscall;
> if (restart->futex.flags & FLAGS_SHARED)
> fshared = 1;
> - return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
> - restart->futex.bitset,
> + return (long)futex_wait(uaddr, fshared, restart->futex.val,
> + restart->futex.val2, tp, restart->futex.bitset,
> restart->futex.flags & FLAGS_CLOCKRT);
> }
>
> @@ -2219,7 +2230,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
> q.requeue_pi_key = &key2;
>
> /* Prepare to wait on uaddr. */
> - ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
> + ret = futex_wait_setup(uaddr, val, val, fshared, &q, &hb);
> if (ret)
> goto out_key2;
>
> @@ -2532,14 +2543,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
> fshared = 1;
>
> clockrt = op & FUTEX_CLOCK_REALTIME;
> - if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
> + if (clockrt && cmd != FUTEX_WAIT_BITSET &&
> + cmd != FUTEX_WAIT_REQUEUE_PI && cmd != FUTEX_SET_WAIT)
> return -ENOSYS;
>
> switch (cmd) {
> case FUTEX_WAIT:
> val3 = FUTEX_BITSET_MATCH_ANY;
> case FUTEX_WAIT_BITSET:
> - ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
> + val2 = val;
> + case FUTEX_SET_WAIT:
> + ret = futex_wait(uaddr, fshared, val, val2, timeout, val3,
> + clockrt);
> break;
> case FUTEX_WAKE:
> val3 = FUTEX_BITSET_MATCH_ANY;
> @@ -2595,7 +2610,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
>
> if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
> cmd == FUTEX_WAIT_BITSET ||
> - cmd == FUTEX_WAIT_REQUEUE_PI)) {
> + cmd == FUTEX_WAIT_REQUEUE_PI || cmd == FUTEX_SET_WAIT)) {
> if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
> return -EFAULT;
> if (!timespec_valid(&ts))
> @@ -2609,10 +2624,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
> /*
> * requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
> * number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
> + * new uval value in 'uaddr2' if cmd == FUTEX_SET_WAIT.
> */
> if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
> cmd == FUTEX_CMP_REQUEUE_PI || cmd == FUTEX_WAKE_OP)
> val2 = (u32) (unsigned long) utime;
> + else if (cmd == FUTEX_SET_WAIT) {
> + if (!futex_cmpxchg_enabled)
> + return -ENOSYS;
> + val2 = (u32) (unsigned long) uaddr2;
> + }
>
> return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
> }
>
---
Subject: futex: implement adaptive spin
From: Peter Zijlstra <[email protected]>
Date: Tue Jan 20 14:40:36 CET 2009
Implement kernel side adaptive spining for futexes.
This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
requires the futex lock field to contain a TID and regular futexes do
not have that restriction.
FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a
different cpu) or until the task gets preempted. After that it behaves
like FUTEX_WAIT.
The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
on the lower 30 bits (TID_MASK) of the lock field -- leaving the
WAITERS and OWNER_DIED flags in tact.
NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
adding a lock_owner function argument.
void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
void *lock, struct thread_info *owner);
Signed-off-by: Peter Zijlstra <[email protected]>
---
include/linux/futex.h | 2 +
include/linux/sched.h | 1
kernel/futex.c | 96 ++++++++++++++++++++++++++++++++++++++++++--------
kernel/sched.c | 59 ++++++++++++++++++++++++++++++
4 files changed, 143 insertions(+), 15 deletions(-)
Index: linux-2.6/include/linux/futex.h
===================================================================
--- linux-2.6.orig/include/linux/futex.h
+++ linux-2.6/include/linux/futex.h
@@ -23,6 +23,7 @@ union ktime;
#define FUTEX_TRYLOCK_PI 8
#define FUTEX_WAIT_BITSET 9
#define FUTEX_WAKE_BITSET 10
+#define FUTEX_WAIT_ADAPTIVE 11
#define FUTEX_PRIVATE_FLAG 128
#define FUTEX_CLOCK_REALTIME 256
@@ -171,6 +172,7 @@ union futex_key {
extern void exit_robust_list(struct task_struct *curr);
extern void exit_pi_state_list(struct task_struct *curr);
extern int futex_cmpxchg_enabled;
+extern struct thread_info *futex_owner(u32 __user *uaddr);
#else
static inline void exit_robust_list(struct task_struct *curr)
{
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -341,6 +341,7 @@ extern signed long schedule_timeout_unin
asmlinkage void __schedule(void);
asmlinkage void schedule(void);
extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner);
struct nsproxy;
struct user_namespace;
Index: linux-2.6/kernel/futex.c
===================================================================
--- linux-2.6.orig/kernel/futex.c
+++ linux-2.6/kernel/futex.c
@@ -1158,10 +1158,40 @@ handle_fault:
*/
#define FLAGS_SHARED 0x01
#define FLAGS_CLOCKRT 0x02
+#define FLAGS_ADAPTIVE 0x03
static long futex_wait_restart(struct restart_block *restart);
-static int futex_wait(u32 __user *uaddr, int fshared,
+#ifdef CONFIG_SMP
+struct thread_info *futex_owner(u32 __user *uaddr)
+{
+ struct thread_info *ti = NULL;
+ struct task_struct *p;
+ pid_t pid;
+ u32 uval;
+
+ if (get_futex_value_locked(&uval, uaddr))
+ return NULL;
+
+ pid = uval & FUTEX_TID_MASK;
+ rcu_read_lock();
+ p = find_taks_by_vpid(pid);
+ if (p) {
+ const struct cred *cred, *pcred;
+
+ cread = current_cred();
+ pcred = __task_cred(p);
+ if (cred->euid == pcred->euid ||
+ cred->euid == pcred->uid)
+ ti = task_thread_info(p);
+ }
+ rcu_read_unlock();
+
+ return ti;
+}
+#endif
+
+static int futex_wait(u32 __user *uaddr, int flags,
u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
{
struct task_struct *curr = current;
@@ -1176,11 +1206,43 @@ static int futex_wait(u32 __user *uaddr,
if (!bitset)
return -EINVAL;
+#ifdef CONFIG_SMP
+ if (!futex_cmpxchg_enabled || !(flags & FLAGS_ADAPTIVE))
+ goto skip_adaptive;
+
+ preempt_disable();
+ for (;;) {
+ struct thread_info *owner;
+ u32 curval = 0, newval = task_pid_vnr(curr);
+
+ owner = futex_owner(uaddr);
+ if (owner && futex_spin_on_owner(uaddr, owner))
+ break;
+
+ if (get_futex_value_locked(&uval, uaddr))
+ break;
+
+ curval |= uval & ~FUTEX_TID_MASK;
+ newval |= uval & ~FUTEX_TID_MASK;
+
+ if (cmpxchg_futex_value_locked(uaddr, curval, newval)
+ == newval)
+ return 0;
+
+ if (!owner && (need_resched() || rt_task(curr)))
+ break;
+
+ cpu_relax();
+ }
+ preempt_enable();
+skip_adaptive:
+#endif
+
q.pi_state = NULL;
q.bitset = bitset;
retry:
q.key = FUTEX_KEY_INIT;
- ret = get_futex_key(uaddr, fshared, &q.key);
+ ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
if (unlikely(ret != 0))
goto out;
@@ -1210,7 +1272,7 @@ retry:
if (unlikely(ret)) {
queue_unlock(&q, hb);
- put_futex_key(fshared, &q.key);
+ put_futex_key(flags & FLAGS_SHARED, &q.key);
ret = get_user(uval, uaddr);
@@ -1305,7 +1367,7 @@ retry:
restart->futex.bitset = bitset;
restart->futex.flags = 0;
- if (fshared)
+ if (flags & FLAGS_SHARED)
restart->futex.flags |= FLAGS_SHARED;
if (clockrt)
restart->futex.flags |= FLAGS_CLOCKRT;
@@ -1314,7 +1376,7 @@ retry:
out_unlock_put_key:
queue_unlock(&q, hb);
- put_futex_key(fshared, &q.key);
+ put_futex_key(flags & FLAGS_SHARED, &q.key);
out:
return ret;
@@ -1929,46 +1991,50 @@ long do_futex(u32 __user *uaddr, int op,
{
int clockrt, ret = -ENOSYS;
int cmd = op & FUTEX_CMD_MASK;
- int fshared = 0;
+ int flags = 0;
if (!(op & FUTEX_PRIVATE_FLAG))
- fshared = 1;
+ flags |= FLAGS_SHARED;
clockrt = op & FUTEX_CLOCK_REALTIME;
if (clockrt && cmd != FUTEX_WAIT_BITSET)
return -ENOSYS;
switch (cmd) {
+ case FUTEX_WAIT_ADAPTIVE:
+ flags |= FLAGS_ADAPTIVE;
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAIT_BITSET:
- ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
+ ret = futex_wait(uaddr, flags, val, timeout, val3, clockrt);
break;
+
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAKE_BITSET:
- ret = futex_wake(uaddr, fshared, val, val3);
+ ret = futex_wake(uaddr, flags, val, val3);
break;
+
case FUTEX_REQUEUE:
- ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL);
+ ret = futex_requeue(uaddr, flags, uaddr2, val, val2, NULL);
break;
case FUTEX_CMP_REQUEUE:
- ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3);
+ ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3);
break;
case FUTEX_WAKE_OP:
- ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
+ ret = futex_wake_op(uaddr, flags, uaddr2, val, val2, val3);
break;
case FUTEX_LOCK_PI:
if (futex_cmpxchg_enabled)
- ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
+ ret = futex_lock_pi(uaddr, flags, val, timeout, 0);
break;
case FUTEX_UNLOCK_PI:
if (futex_cmpxchg_enabled)
- ret = futex_unlock_pi(uaddr, fshared);
+ ret = futex_unlock_pi(uaddr, flags);
break;
case FUTEX_TRYLOCK_PI:
if (futex_cmpxchg_enabled)
- ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1);
+ ret = futex_lock_pi(uaddr, flags, 0, timeout, 1);
break;
default:
ret = -ENOSYS;
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4834,6 +4834,65 @@ int mutex_spin_on_owner(struct mutex *lo
out:
return 1;
}
+
+#ifdef CONFIG_FUTEX
+#include <linux/futex.h>
+
+int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
+{
+ unsigned int cpu;
+ struct rq *rq;
+
+ if (!sched_feat(OWNER_SPIN))
+ return 0;
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+ /*
+ * Need to access the cpu field knowing that
+ * DEBUG_PAGEALLOC could have unmapped it if
+ * the mutex owner just released it and exited.
+ */
+ if (probe_kernel_address(&owner->cpu, cpu))
+ goto out;
+#else
+ cpu = owner->cpu;
+#endif
+
+ /*
+ * Even if the access succeeded (likely case),
+ * the cpu field may no longer be valid.
+ */
+ if (cpu >= nr_cpumask_bits)
+ goto out;
+
+ /*
+ * We need to validate that we can do a
+ * get_cpu() and that we have the percpu area.
+ */
+ if (!cpu_online(cpu))
+ goto out;
+
+ rq = cpu_rq(cpu);
+
+ for (;;) {
+ /*
+ * Owner changed, break to re-assess state.
+ */
+ if (futex_owner(uaddr) != owner)
+ break;
+
+ /*
+ * Is that owner really running on that cpu?
+ */
+ if (task_thread_info(rq->curr) != owner || need_resched())
+ return 0;
+
+ cpu_relax();
+ }
+out:
+ return 1;
+}
+#endif
#endif
#ifdef CONFIG_PREEMPT
On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
>
> Would you be interested in adding it as a 'perf bench futex' testcase?
> That way kernel developers could monitor futex performance in the future
> as well.
>
> See 'perf bench' in the latest perf events tree:
>
> http://people.redhat.com/mingo/tip.git/README
>
> Do 'cd tools/perf; make -j install' to install perf.
Darren has recently been putting a lot of effort in making a futex test
suite:
http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
Its not yet complete, but hopefully it will soon be :-)
On Tue, 17 Nov 2009, Peter Zijlstra wrote:
>
> (long quote for Linus' benefit, added an old patch to the tail)
Both look conceptually sane to me.
The FUTEX_SET_WAIT concept seems well-defined, although it sounds more
like a FUTEX_CMPXCHG_WAIT to me than a "SET" operation. I'm not entirely
sure that we really want to do the CMPXCHG in the kernel rather than in
user space, since lock stealing generally isn't a problem, but I don't
think it's _wrong_ to add this concept.
In fact, CMPXCHG is generally seen to be the "fundamental" base for
implementing locking, so in that sense it makes perfect sense to have it
as a FUTEX model.
That said, I personally think the adaptive wait model is (a) more likely
to fix many performance issues and (b) a bit more high-level concept, so I
like Peter's patch too, but I don't see that the patches would really be
mutually exclusive.
Of course, it's possible that Michel's performance problem is fixed by the
adaptive approach too, in which case the FUTEX_SET_WAIT (or _CMPXCHG_WAIT)
patch is just fundamentally less interesting. But some people do need
fairness - even when it's bad for performance - so...
One thing that does strike me is that _if_ we want to do both interfaces,
then I would assume that we quite likely also want to have an adaptive
version of the FUTEX_SET|CMPXCHG_WAIT thing. Which perhaps implies that
the "ADAPTIVE" part should be a bitflag in the command value?
Linus
Peter Zijlstra wrote:
> On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
>> Would you be interested in adding it as a 'perf bench futex' testcase?
>> That way kernel developers could monitor futex performance in the future
>> as well.
>>
>> See 'perf bench' in the latest perf events tree:
>>
>> http://people.redhat.com/mingo/tip.git/README
>>
>> Do 'cd tools/perf; make -j install' to install perf.
>
> Darren has recently been putting a lot of effort in making a futex test
> suite:
>
> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>
> Its not yet complete, but hopefully it will soon be :-)
>
Michael, would you be willing to include a version of this test in the
above test suite? If so, then in keeping with the rest of the test
suite, I would recommend splitting into two tests, one of each opcode
being tested, and add argument to define thread count. The run.sh
script would then run each thread count as a separate test run.
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Michel Lespinasse wrote:
> Hi,
>
Hi Michel,
Thanks for the excellent writeup. The concept looks reasonable, and
useful, to me. Just a few thoughts, comments below.
> ...
>
> By doing the futex value update atomically with the kernel's inspection
> of it to decide to wait, we avoid the time window where the futex has
> been set to the 'please wake me up' state, but the thread has not been
> queued onto the hash bucket yet. This has two effects:
> - Avoids a futex syscall with the FUTEX_WAKE operation if there is no
> thread to be woken yet
This also reduces lock contention on the hash-bucket locks, another plus.
> - In the heavily contended case, avoids waking an extra thread that's
> only likely to make the contention problem worse.
I'm not seeing this. What is the extra thread that would be woken which
isn't with FUTEX_SET_WAIT?
> ...
>
> Signed-off-by: Michel Lespinasse <[email protected]>
>
> diff --git a/include/linux/futex.h b/include/linux/futex.h
> index 1e5a26d..c5e887d 100644
> --- a/include/linux/futex.h
> +++ b/include/linux/futex.h
> @@ -20,6 +20,7 @@
> #define FUTEX_WAKE_BITSET 10
> #define FUTEX_WAIT_REQUEUE_PI 11
> #define FUTEX_CMP_REQUEUE_PI 12
> +#define FUTEX_SET_WAIT 13
>
> #define FUTEX_PRIVATE_FLAG 128
> #define FUTEX_CLOCK_REALTIME 256
> @@ -39,6 +40,7 @@
> FUTEX_PRIVATE_FLAG)
> #define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
> FUTEX_PRIVATE_FLAG)
> +#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
>
> /*
> * Support for robust futexes: the kernel cleans up held futexes at
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index a8cc4e1..a199606 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -25,6 +25,7 @@ struct restart_block {
> struct {
> u32 *uaddr;
> u32 val;
> + u32 val2;
It's a nitpic, but val2 is used in the futex syscall arguments already
and another variable of the same name that is actually initially derived
from uaddr2... is more likely to confuse than not. Perhaps "setval"?
Throughout the patch.
> @@ -1722,52 +1723,61 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
> *
> * The basic logical guarantee of a futex is that it blocks ONLY
> * if cond(var) is known to be true at the time of blocking, for
> - * any cond. If we queued after testing *uaddr, that would open
> - * a race condition where we could block indefinitely with
> + * any cond. If we locked the hash-bucket after testing *uaddr, that
> + * would open a race condition where we could block indefinitely with
> * cond(var) false, which would violate the guarantee.
> *
> - * A consequence is that futex_wait() can return zero and absorb
> - * a wakeup when *uaddr != val on entry to the syscall. This is
> - * rare, but normal.
> + * On the other hand, we insert q and release the hash-bucket only
> + * after testing *uaddr. This guarantees that futex_wait() will NOT
> + * absorb a wakeup if *uaddr does not match the desired values
> + * while the syscall executes.
> */
> retry:
> q->key = FUTEX_KEY_INIT;
> - ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
> + ret = get_futex_key(uaddr, fshared, &q->key,
> + (val == val2) ? VERIFY_READ : VERIFY_WRITE);
Have you compared the performance of FUTEX_WAIT before and after the
application of this patch? I'd be interested to see your test results on
a prepatched kernel (with the FUTEX_SET_WAIT side commented out of course).
> if (unlikely(ret != 0))
> return ret;
>
> retry_private:
> *hb = queue_lock(q);
>
> - ret = get_futex_value_locked(&uval, uaddr);
> -
> - if (ret) {
> + pagefault_disable();
> + if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
> + pagefault_enable();
What about the addition of val2 makes it so we have to expand
get_futex_value_locked() here with the nested fault handling?
> queue_unlock(q, *hb);
> -
Superfluous whitespace change
> ret = get_user(uval, uaddr);
> + fault_common:
Inconsistent label indentation with the rest of the file.
> if (ret)
> goto out;
> -
> if (!fshared)
> goto retry_private;
> -
> put_futex_key(fshared, &q->key);
> goto retry;
> }
> -
Several of superfluous whitespace changes
> - if (uval != val) {
> - queue_unlock(q, *hb);
> - ret = -EWOULDBLOCK;
> + if (val != val2 && uval == val) {
> + uval = futex_atomic_cmpxchg_inatomic(uaddr, val, val2);
> + if (unlikely(uval == -EFAULT)) {
> + pagefault_enable();
> + queue_unlock(q, *hb);
> + ret = fault_in_user_writeable(uaddr);
> + goto fault_common;
> + }
> }
> + pagefault_enable();
> +
> + if (uval == val || uval == val2)
> + return 0; /* success */
If the comment is necessary, please give it its own line (I've done a
lot of futex commentary cleanup recently and am a little sensitive to
maintaining that :-).
I'd rather not add another return point to the code, even if it saves us
an if statement. You've already tested for uval == val above, perhaps
this test can be integrated in the above blocks and then use the common
out: label?
Now I need to review Peter's Adaptive bits and think on how these two
relate...
Thanks,
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
* Peter Zijlstra <[email protected]> wrote:
> On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
> >
> > Would you be interested in adding it as a 'perf bench futex' testcase?
> > That way kernel developers could monitor futex performance in the future
> > as well.
> >
> > See 'perf bench' in the latest perf events tree:
> >
> > http://people.redhat.com/mingo/tip.git/README
> >
> > Do 'cd tools/perf; make -j install' to install perf.
>
> Darren has recently been putting a lot of effort in making a futex test
> suite:
>
> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>
> Its not yet complete, but hopefully it will soon be :-)
Looks nice! Any futex performance measurements within it could be added
to perf bench - with standardized options and output, etc.
Ingo
Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
>> On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
>>> Would you be interested in adding it as a 'perf bench futex' testcase?
>>> That way kernel developers could monitor futex performance in the future
>>> as well.
>>>
>>> See 'perf bench' in the latest perf events tree:
>>>
>>> http://people.redhat.com/mingo/tip.git/README
>>>
>>> Do 'cd tools/perf; make -j install' to install perf.
>> Darren has recently been putting a lot of effort in making a futex test
>> suite:
>>
>> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>>
>> Its not yet complete, but hopefully it will soon be :-)
>
> Looks nice! Any futex performance measurements within it could be added
> to perf bench - with standardized options and output, etc.
>
> Ingo
I'll take a look at perf this week and see what would be required to
share (?) tests between perf and futextest.
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Michel Lespinasse wrote:
> Hi,
>
> I would like to propose adding a new FUTEX_SET_WAIT operation to the futex
> code, as per the following patch against 2.6.32-rc7
If we go ahead with this, I'd like to adapt FUTEX_WAIT_REQUEUE_PI to use
this as well, as it faces all the same races related to the calling task
getting to sleep in the kernel before a wake call (FUTEX_CMP_REQUEUE_PI
in this case) as futex_wait does. FUTEX_SET_WAIT_REQUEUE_PI is
ridiculously long... but it is what it is I guess :-)
--
Darren "not-as-brave-as-peterz (short quote for Linus)" Hart
IBM Linux Technology Center
Real-Time Linux Team
From: Ingo Molnar <[email protected]>
Subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation
Date: Tue, 17 Nov 2009 18:24:56 +0100
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
> > >
> > > Would you be interested in adding it as a 'perf bench futex' testcase?
> > > That way kernel developers could monitor futex performance in the future
> > > as well.
> > >
> > > See 'perf bench' in the latest perf events tree:
> > >
> > > http://people.redhat.com/mingo/tip.git/README
> > >
> > > Do 'cd tools/perf; make -j install' to install perf.
> >
> > Darren has recently been putting a lot of effort in making a futex test
> > suite:
> >
> > http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
> >
> > Its not yet complete, but hopefully it will soon be :-)
>
> Looks nice! Any futex performance measurements within it could be added
> to perf bench - with standardized options and output, etc.
It is interesting for me, too.
Can I add your test programs to perf bench, Michel and Darren?
Of course, I'll do every odd job if you permit.
Hitoshi
On Tue, Nov 17, 2009 at 09:22:18AM -0800, Darren Hart wrote:
>> By doing the futex value update atomically with the kernel's inspection
>> of it to decide to wait, we avoid the time window where the futex has
>> been set to the 'please wake me up' state, but the thread has not been
>> queued onto the hash bucket yet. This has two effects:
>> - Avoids a futex syscall with the FUTEX_WAKE operation if there is no
>> thread to be woken yet
>
> This also reduces lock contention on the hash-bucket locks, another plus.
Yes :)
>> - In the heavily contended case, avoids waking an extra thread that's
>> only likely to make the contention problem worse.
>
> I'm not seeing this. What is the extra thread that would be woken which
> isn't with FUTEX_SET_WAIT?
I meant that when several threads get queued on the futex, one can
choose to wake only one of them when releasing the futex - the remaining
ones stay queued and will ideally be woken only by a thread that already
blocked, or when a new thread blocks.
> It's a nitpic, but val2 is used in the futex syscall arguments already
> and another variable of the same name that is actually initially derived
> from uaddr2... is more likely to confuse than not. Perhaps "setval"?
> Throughout the patch.
OK, I renamed val2 into setval in futex_wait and futex_wait_setup.
I did not do this in do_futex - I followed the example of bitset vs val3
(val2 is just a made up name for one of the sys_futex pointer arguments
casted to integer, I did not want to add a second such variable).
> Have you compared the performance of FUTEX_WAIT before and after the
> application of this patch? I'd be interested to see your test results on
> a prepatched kernel (with the FUTEX_SET_WAIT side commented out of
> course).
Performance seems to be similar as far as I can tell. It's hard to gauge it
for the 2-24 threads case, as there is quite a bit of run-to-run noise there
depending on where your threads land up. In the >=32 threads case, it looks
like the patched kernel is actually slightly faster. It's not clear to me if
this is pure luck or if it's due to getting the fault patch out of the way
with the unlikely() if statement.
FUTEX_WAIT test (with unpatched 2.6.32-rc7):
1 threads: 45045 Kiter/s (2.21s user 0.00s system 2.22s wall 1.00 cores)
2 threads: 17483 Kiter/s (3.98s user 3.80s system 5.72s wall 1.36 cores)
3 threads: 5984 Kiter/s (16.78s user 24.06s system 16.71s wall 2.44 cores)
4 threads: 2910 Kiter/s (21.61s user 86.48s system 34.36s wall 3.15 cores)
5 threads: 4619 Kiter/s (17.53s user 78.59s system 21.65s wall 4.44 cores)
6 threads: 6549 Kiter/s (14.01s user 68.49s system 15.27s wall 5.40 cores)
8 threads: 7663 Kiter/s (12.21s user 72.55s system 13.05s wall 6.50 cores)
10 threads: 7831 Kiter/s (11.54s user 91.41s system 12.77s wall 8.06 cores)
12 threads: 10941 Kiter/s (8.81s user 79.86s system 9.14s wall 9.70 cores)
16 threads: 13661 Kiter/s (7.51s user 92.31s system 7.32s wall 13.64 cores)
24 threads: 17483 Kiter/s (6.13s user 118.48s system 5.72s wall 21.78 cores)
32 threads: 18215 Kiter/s (5.91s user 148.16s system 5.49s wall 28.06 cores)
64 threads: 18349 Kiter/s (5.72s user 160.06s system 5.45s wall 30.42 cores)
128 threads: 18382 Kiter/s (5.68s user 166.96s system 5.44s wall 31.74 cores)
256 threads: 17007 Kiter/s (5.30s user 181.25s system 5.88s wall 31.73 cores)
512 threads: 16026 Kiter/s (4.79s user 193.34s system 6.24s wall 31.75 cores)
1024 threads: 15432 Kiter/s (4.38s user 202.08s system 6.48s wall 31.86 cores)
>> - ret = get_futex_value_locked(&uval, uaddr);
>> -
>> - if (ret) {
>> + pagefault_disable();
>> + if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
>> + pagefault_enable();
>
> What about the addition of val2 makes it so we have to expand
> get_futex_value_locked() here with the nested fault handling?
This is because I did not want to re-enable page faults right afterwards
if we end up going to the futex_atomic_cmpxchg_inatomic path...
I applied your other feedback into the updated change down below.
Signed-off-by: Michel Lespinasse <[email protected]>
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 1e5a26d..c5e887d 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -20,6 +20,7 @@
#define FUTEX_WAKE_BITSET 10
#define FUTEX_WAIT_REQUEUE_PI 11
#define FUTEX_CMP_REQUEUE_PI 12
+#define FUTEX_SET_WAIT 13
#define FUTEX_PRIVATE_FLAG 128
#define FUTEX_CLOCK_REALTIME 256
@@ -39,6 +40,7 @@
FUTEX_PRIVATE_FLAG)
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
+#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
/*
* Support for robust futexes: the kernel cleans up held futexes at
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index a8cc4e1..b08c3ec 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -25,6 +25,7 @@ struct restart_block {
struct {
u32 *uaddr;
u32 val;
+ u32 setval;
u32 flags;
u32 bitset;
u64 time;
diff --git a/kernel/futex.c b/kernel/futex.c
index fb65e82..7ed0869 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1694,6 +1694,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* futex_wait_setup() - Prepare to wait on a futex
* @uaddr: the futex userspace address
* @val: the expected value
+ * @setval: the value we want to set, in replacement of val
* @fshared: whether the futex is shared (1) or not (0)
* @q: the associated futex_q
* @hb: storage for hash_bucket pointer to be returned to caller
@@ -1704,11 +1705,12 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* with no q.key reference on failure.
*
* Returns:
- * 0 - uaddr contains val and hb has been locked
+ * 0 - uaddr contains setval and hb has been locked
* <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlcoked
*/
-static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
- struct futex_q *q, struct futex_hash_bucket **hb)
+static int futex_wait_setup(u32 __user *uaddr, u32 val, u32 setval,
+ int fshared, struct futex_q *q,
+ struct futex_hash_bucket **hb)
{
u32 uval;
int ret;
@@ -1722,29 +1724,33 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
*
* The basic logical guarantee of a futex is that it blocks ONLY
* if cond(var) is known to be true at the time of blocking, for
- * any cond. If we queued after testing *uaddr, that would open
- * a race condition where we could block indefinitely with
+ * any cond. If we locked the hash-bucket after testing *uaddr, that
+ * would open a race condition where we could block indefinitely with
* cond(var) false, which would violate the guarantee.
*
- * A consequence is that futex_wait() can return zero and absorb
- * a wakeup when *uaddr != val on entry to the syscall. This is
- * rare, but normal.
+ * On the other hand, we insert q and release the hash-bucket only
+ * after testing *uaddr. This guarantees that futex_wait() will NOT
+ * absorb a wakeup if *uaddr does not match the desired values
+ * while the syscall executes.
*/
retry:
q->key = FUTEX_KEY_INIT;
- ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
+ ret = get_futex_key(uaddr, fshared, &q->key,
+ (val == setval) ? VERIFY_READ : VERIFY_WRITE);
if (unlikely(ret != 0))
return ret;
retry_private:
*hb = queue_lock(q);
- ret = get_futex_value_locked(&uval, uaddr);
+ pagefault_disable();
+ if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
+ pagefault_enable();
- if (ret) {
queue_unlock(q, *hb);
ret = get_user(uval, uaddr);
+fault_common:
if (ret)
goto out;
@@ -1755,7 +1761,21 @@ retry_private:
goto retry;
}
- if (uval != val) {
+ if (val != setval && uval == val) {
+ uval = futex_atomic_cmpxchg_inatomic(uaddr, val, setval);
+ if (unlikely(uval == -EFAULT)) {
+ pagefault_enable();
+
+ queue_unlock(q, *hb);
+
+ ret = fault_in_user_writeable(uaddr);
+ goto fault_common;
+ }
+ }
+
+ pagefault_enable();
+
+ if (uval != val && uval != setval) {
queue_unlock(q, *hb);
ret = -EWOULDBLOCK;
}
@@ -1766,8 +1786,8 @@ out:
return ret;
}
-static int futex_wait(u32 __user *uaddr, int fshared,
- u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
+static int futex_wait(u32 __user *uaddr, int fshared, u32 val, u32 setval,
+ ktime_t *abs_time, u32 bitset, int clockrt)
{
struct hrtimer_sleeper timeout, *to = NULL;
struct restart_block *restart;
@@ -1795,7 +1815,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
retry:
/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, setval, fshared, &q, &hb);
if (ret)
goto out;
@@ -1827,6 +1847,7 @@ retry:
restart->fn = futex_wait_restart;
restart->futex.uaddr = (u32 *)uaddr;
restart->futex.val = val;
+ restart->futex.setval = setval;
restart->futex.time = abs_time->tv64;
restart->futex.bitset = bitset;
restart->futex.flags = FLAGS_HAS_TIMEOUT;
@@ -1862,7 +1883,8 @@ static long futex_wait_restart(struct restart_block *restart)
restart->fn = do_no_restart_syscall;
if (restart->futex.flags & FLAGS_SHARED)
fshared = 1;
- return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
+ return (long)futex_wait(uaddr, fshared, restart->futex.val,
+ restart->futex.setval, tp,
restart->futex.bitset,
restart->futex.flags & FLAGS_CLOCKRT);
}
@@ -2219,7 +2241,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
q.requeue_pi_key = &key2;
/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, val, fshared, &q, &hb);
if (ret)
goto out_key2;
@@ -2532,14 +2554,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
fshared = 1;
clockrt = op & FUTEX_CLOCK_REALTIME;
- if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+ if (clockrt && cmd != FUTEX_WAIT_BITSET &&
+ cmd != FUTEX_WAIT_REQUEUE_PI && cmd != FUTEX_SET_WAIT)
return -ENOSYS;
switch (cmd) {
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAIT_BITSET:
- ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
+ val2 = val;
+ case FUTEX_SET_WAIT:
+ ret = futex_wait(uaddr, fshared, val, val2, timeout, val3,
+ clockrt);
break;
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
@@ -2595,7 +2621,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
cmd == FUTEX_WAIT_BITSET ||
- cmd == FUTEX_WAIT_REQUEUE_PI)) {
+ cmd == FUTEX_WAIT_REQUEUE_PI || cmd == FUTEX_SET_WAIT)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
@@ -2609,10 +2635,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
/*
* requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
* number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
+ * new uval value in 'uaddr2' if cmd == FUTEX_SET_WAIT.
*/
if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
cmd == FUTEX_CMP_REQUEUE_PI || cmd == FUTEX_WAKE_OP)
val2 = (u32) (unsigned long) utime;
+ else if (cmd == FUTEX_SET_WAIT) {
+ if (!futex_cmpxchg_enabled)
+ return -ENOSYS;
+ val2 = (u32) (unsigned long) uaddr2;
+ }
return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
On Tue, Nov 17, 2009 at 08:16:06AM -0800, Darren Hart wrote:
>> Darren has recently been putting a lot of effort in making a futex test
>> suite:
>>
>> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>>
>> Its not yet complete, but hopefully it will soon be :-)
>
> Michael, would you be willing to include a version of this test in the
> above test suite? If so, then in keeping with the rest of the test suite,
> I would recommend splitting into two tests, one of each opcode being
> tested, and add argument to define thread count. The run.sh script would
> then run each thread count as a separate test run.
I had a quick look at the tree (using the http interface). I sure would not
mind adding the tests, and the futextest.h file would make them look
quite cleaner than my previous version. Just to be sure, would you put them
under performance/ or stress/ ? This would be the first test to be added
in these directories in any case, right ? (just asking in case I missed
something obvious).
And if Hitoshi wants to do it first, I sure won't mind the help either :)
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
On Tue, Nov 17, 2009 at 07:24:09AM -0800, Linus Torvalds wrote:
> The FUTEX_SET_WAIT concept seems well-defined, although it sounds more
> like a FUTEX_CMPXCHG_WAIT to me than a "SET" operation. I'm not entirely
> sure that we really want to do the CMPXCHG in the kernel rather than in
> user space, since lock stealing generally isn't a problem, but I don't
> think it's _wrong_ to add this concept.
>
> In fact, CMPXCHG is generally seen to be the "fundamental" base for
> implementing locking, so in that sense it makes perfect sense to have it
> as a FUTEX model.
My first version called the operation that way, but it did *NOT* block if
val2 (now renamed setval) was already set in the futex. Turned out it helps
my use case if I do block in that situation, so I changed the operation
accordingly and renamed it into FUTEX_SET_WAIT (with a CAS model in mind,
though it's still also similar to cmpxchg in that it just returns if
the uval is not 'val' or 'setval').
> That said, I personally think the adaptive wait model is (a) more likely
> to fix many performance issues and (b) a bit more high-level concept, so I
> like Peter's patch too, but I don't see that the patches would really be
> mutually exclusive.
>
> Of course, it's possible that Michel's performance problem is fixed by the
> adaptive approach too, in which case the FUTEX_SET_WAIT (or _CMPXCHG_WAIT)
> patch is just fundamentally less interesting. But some people do need
> fairness - even when it's bad for performance - so...
>
> One thing that does strike me is that _if_ we want to do both interfaces,
> then I would assume that we quite likely also want to have an adaptive
> version of the FUTEX_SET|CMPXCHG_WAIT thing. Which perhaps implies that
> the "ADAPTIVE" part should be a bitflag in the command value?
I like the adaptive approach as well, though I'm not sure yet if it'd work
for us. I can try it but it'll take a bit of time.
One difficulty with adaptive spinning is that we want to avoid deadlocks.
If two threads end up spinning in-kernel waiting for each other, we better
have preemption enabled... or detect and deal with the situation somehow.
Also one aspect I dislike is that this would impose a given format on the
futex for storing the TID. I would prefer if there were several bits available
in the futex for userspace to do whatever they want. 8 bits would likely
be enough, which leaves 24 for the TID - enough for us, but I have no idea
if that's good enough for upstream inclusion. It that's not possible,
one possible compromise could be:
- userspace passes a TID (which it extracted from the futex value; but kernel
does not necessarily know how)
- kernel spins until that TID goes to sleep, or the futex value is not equal
to val or setval anymore
- if val != setval and the futex value is val, set it to setval
- if the futex valus is setval, block, otherwise -EWOULDBLOCK.
If the lock got stolen from a different thread, userspace can decide to
retry with or without adaptive spinning.
That would be the most generic interface I can think of, though it's
starting to be a LOT of parameters - actually, too many to pass through
the _syscall6 interface.
I also like Darren's suggestion to do a FUTEX_SET_WAIT_REQUEUE_PI,
but it's hitting the same 'too many parameters' limitation as well :/
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
Michel Lespinasse wrote:
> On Tue, Nov 17, 2009 at 08:16:06AM -0800, Darren Hart wrote:
>>> Darren has recently been putting a lot of effort in making a futex test
>>> suite:
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>>>
>>> Its not yet complete, but hopefully it will soon be :-)
>> Michael, would you be willing to include a version of this test in the
>> above test suite? If so, then in keeping with the rest of the test suite,
>> I would recommend splitting into two tests, one of each opcode being
>> tested, and add argument to define thread count. The run.sh script would
>> then run each thread count as a separate test run.
>
> I had a quick look at the tree (using the http interface). I sure would not
> mind adding the tests, and the futextest.h file would make them look
> quite cleaner than my previous version. Just to be sure, would you put them
> under performance/ or stress/ ? This would be the first test to be added
> in these directories in any case, right ? (just asking in case I missed
> something obvious).
Michel, excellent!
I think performance sounds like the right place to me. Performance is
where I plan to put measurement tests where you report some score.
Stress tests are considered to have passed if the run to completion
without locking up or crashing the kernel :-)
We should take a look at the perf tool to see what requirements it may
impose on tests we want to share between perf and futextest.
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Michel Lespinasse wrote:
> One difficulty with adaptive spinning is that we want to avoid deadlocks.
> If two threads end up spinning in-kernel waiting for each other, we better
> have preemption enabled... or detect and deal with the situation somehow.
This is really only a problem for SCHED_FIFO tasks right? (SCHED_OTHER
should get scheduled() out when CFS deems they've exhausted their fair
share). Real-Time tasks typically should be using PI anyway as adaptive
locking is non-deterministic and doesn't provide for PI. So I'm not sure
how critical this problem is in practice.
> Also one aspect I dislike is that this would impose a given format on the
> futex for storing the TID.
We do have a precedent for this with robust as well as PI futexes.
I would prefer if there were several bits available
> in the futex for userspace to do whatever they want. 8 bits would likely
> be enough, which leaves 24 for the TID - enough for us, but I have no idea
> if that's good enough for upstream inclusion. It that's not possible,
> one possible compromise could be:
And we already use two of those bits for OWNER_DIED and FUTEX_WAITERS.
Perhaps you just have to choose between your own value scheme and
adaptive spinning (sounds horribly limiting as I'm typing this...).
>
> - userspace passes a TID (which it extracted from the futex value; but kernel
> does not necessarily know how)
> - kernel spins until that TID goes to sleep, or the futex value is not equal
> to val or setval anymore
> - if val != setval and the futex value is val, set it to setval
> - if the futex valus is setval, block, otherwise -EWOULDBLOCK.
>
> If the lock got stolen from a different thread, userspace can decide to
> retry with or without adaptive spinning.
I'll think on this a bit more...
>
> That would be the most generic interface I can think of, though it's
> starting to be a LOT of parameters - actually, too many to pass through
> the _syscall6 interface.
>
>
> I also like Darren's suggestion to do a FUTEX_SET_WAIT_REQUEUE_PI,
> but it's hitting the same 'too many parameters' limitation as well :/
We don't use val2 for FUTEX_WAIT_REQUEUE_PI, so we should be able to use
that for setval.
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
On Tue, Nov 17, 2009 at 08:16:06AM -0800, Darren Hart wrote:
>> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>
> Michael, would you be willing to include a version of this test in the
> above test suite? If so, then in keeping with the rest of the test suite,
> I would recommend splitting into two tests, one of each opcode being
> tested, and add argument to define thread count. The run.sh script would
> then run each thread count as a separate test run.
There you go. Hope this helps. Feel free to adapt as needed.
Signed-off-by: Michel Lespinasse <[email protected]>
diff --git a/functional/futex_requeue_pi_mismatched_ops.c b/functional/futex_requeue_pi_mismatched_ops.c
index 50bd07b..529f5a8 100644
--- a/functional/futex_requeue_pi_mismatched_ops.c
+++ b/functional/futex_requeue_pi_mismatched_ops.c
@@ -113,7 +113,7 @@ int main(int argc, char *argv[])
* requeue_pi target and aborted. Wake the child with
* FUTEX_WAKE.
*/
- ret = futex_wake(&f1, f1, 1, FUTEX_PRIVATE_FLAG);
+ ret = futex_wake(&f1, 1, FUTEX_PRIVATE_FLAG);
if (ret == 1)
ret = 0;
else if (ret < 0)
diff --git a/include/futextest.h b/include/futextest.h
index 2b64f79..853c3c4 100644
--- a/include/futextest.h
+++ b/include/futextest.h
@@ -78,6 +78,9 @@ int _verbose = VCRITICAL;
#ifndef FUTEX_CMP_REQUEUE_PI
#define FUTEX_CMP_REQUEUE_PI 12
#endif
+#ifndef FUTEX_SET_WAIT
+#define FUTEX_SET_WAIT 13
+#endif
#ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
#define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
@@ -86,6 +89,9 @@ int _verbose = VCRITICAL;
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
#endif
+#ifndef FUTEX_SET_WAIT_PRIVATE
+#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
+#endif
/**
* futex() - SYS_futex syscall wrapper
@@ -106,7 +112,7 @@ int _verbose = VCRITICAL;
* like-named arguments in the following wrappers except where noted below.
*/
#define futex(uaddr, op, val, timeout, uaddr2, val3, opflags) \
- syscall(SYS_futex, uaddr, op | opflags, val, timeout, uaddr2, val3);
+ syscall(SYS_futex, uaddr, op | opflags, val, timeout, uaddr2, val3)
/**
* futex_wait() - block on uaddr with optional timeout
@@ -119,8 +125,8 @@ int _verbose = VCRITICAL;
* futex_wake() - wake one or more tasks blocked on uaddr
* @nr_wake: wake up to this many tasks
*/
-#define futex_wake(uaddr, val, nr_wake, opflags) \
- futex(uaddr, FUTEX_WAKE, val, NULL, NULL, nr_wake, opflags)
+#define futex_wake(uaddr, nr_wake, opflags) \
+ futex(uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0, opflags)
/**
* futex_wait_bitset() - block on uaddr with bitset
@@ -133,8 +139,8 @@ int _verbose = VCRITICAL;
* futex_wake_bitset() - wake one or more tasks blocked on uaddr with bitset
* @bitset: bitset to compare with that used in futex_wait_bitset
*/
-#define futex_wake_bitset(uaddr, val, nr_wake, bitset, opflags) \
- futex(uaddr, FUTEX_WAKE_BITSET, val, NULL, NULL, bitset, opflags)
+#define futex_wake_bitset(uaddr, nr_wake, bitset, opflags) \
+ futex(uaddr, FUTEX_WAKE_BITSET, nr_wake, NULL, NULL, bitset, opflags)
/**
* futex_lock_pi() - block on uaddr as a PI mutex
@@ -198,6 +204,14 @@ int _verbose = VCRITICAL;
opflags)
/**
+ * futex_set_wait() - block on uaddr with bitset
+ * @setval: value to set futex to if blocking
+ * @bitset: bitset to be used with futex_wake_bitset
+ */
+#define futex_set_wait(uaddr, val, setval, timeout, bitset, opflags) \
+ futex(uaddr, FUTEX_SET_WAIT, val, timeout, setval, bitset, opflags)
+
+/**
* futex_cmpxchg() - Atomic compare and exchange
* @uaddr: The address of the futex to be modified
* @oldval: The expected value of the futex
diff --git a/performance/Makefile b/performance/Makefile
index 9589e49..c4999a8 100644
--- a/performance/Makefile
+++ b/performance/Makefile
@@ -3,7 +3,7 @@ CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE $(INCLUDES)
LDFLAGS := $(LDFLAGS) -lpthread -lrt
HEADERS := ../include/futextest.h
-TARGETS :=
+TARGETS := futex_wait_test futex_setwait_test
.PHONY: all clean
all: $(TARGETS)
diff --git a/performance/futex_setwait_test.c b/performance/futex_setwait_test.c
new file mode 100644
index 0000000..0d09365
--- /dev/null
+++ b/performance/futex_setwait_test.c
@@ -0,0 +1,71 @@
+// Copyright 2009 Google Inc.
+// Author: [email protected] (Michel Lespinasse)
+
+#include "futextest.h"
+#include "harness.h"
+
+#include <stdio.h>
+#include <errno.h>
+
+
+static inline void futex_setwait_lock(futex_t *futex)
+{
+ int status = *futex;
+ if (status == 0)
+ status = futex_cmpxchg(futex, 0, 1);
+ if (status != 0) {
+ int desired_status = 1;
+ do {
+ if (futex_set_wait(futex, 1, 2, NULL, ~0,
+ FUTEX_PRIVATE_FLAG) == 0) {
+ /* We absorbed a wakeup; so make sure to
+ unblock next thread */
+ desired_status = 2;
+ }
+ status = *futex;
+ if (status == 0)
+ status = futex_cmpxchg(futex, 0,
+ desired_status);
+ } while (status != 0);
+ }
+}
+
+static inline void futex_cmpxchg_unlock(futex_t *futex)
+{
+ int status = *futex;
+ if (status == 1)
+ status = futex_cmpxchg(futex, 1, 0);
+ if (status == 2) {
+ futex_cmpxchg(futex, 2, 0);
+ futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
+ }
+}
+
+static void * futex_setwait_test(void * dummy)
+{
+ struct locktest_shared * shared = dummy;
+ int i = shared->loops;
+ barrier_sync(&shared->barrier_before);
+ while (i--) {
+ futex_setwait_lock(&shared->futex);
+ futex_cmpxchg_unlock(&shared->futex);
+ }
+ barrier_sync(&shared->barrier_after);
+ return NULL;
+}
+
+static int futex_setwait_supported(void)
+{
+ int futex = 0;
+ futex_set_wait(futex, 1, 2, NULL, ~0, FUTEX_PRIVATE_FLAG);
+ return errno == EWOULDBLOCK;
+}
+
+int main (void)
+{
+ if (futex_setwait_supported()) {
+ printf("\nFUTEX_SET_WAIT test\n");
+ locktest(futex_setwait_test, 100000000);
+ }
+ return 0;
+}
diff --git a/performance/futex_wait_test.c b/performance/futex_wait_test.c
new file mode 100644
index 0000000..88ce2f2
--- /dev/null
+++ b/performance/futex_wait_test.c
@@ -0,0 +1,56 @@
+// Copyright 2009 Google Inc.
+// Author: [email protected] (Michel Lespinasse)
+
+#include "futextest.h"
+#include "harness.h"
+
+#include <stdio.h>
+
+
+static inline void futex_wait_lock(futex_t *futex)
+{
+ int status = *futex;
+ if (status == 0)
+ status = futex_cmpxchg(futex, 0, 1);
+ while (status != 0) {
+ if (status == 1)
+ status = futex_cmpxchg(futex, 1, 2);
+ if (status != 0) {
+ futex_wait(futex, 2, NULL, FUTEX_PRIVATE_FLAG);
+ status = *futex;
+ }
+ if (status == 0)
+ status = futex_cmpxchg(futex, 0, 2);
+ }
+}
+
+static inline void futex_cmpxchg_unlock(futex_t *futex)
+{
+ int status = *futex;
+ if (status == 1)
+ status = futex_cmpxchg(futex, 1, 0);
+ if (status == 2) {
+ futex_cmpxchg(futex, 2, 0);
+ futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
+ }
+}
+
+static void * futex_wait_test(void * dummy)
+{
+ struct locktest_shared * shared = dummy;
+ int i = shared->loops;
+ barrier_sync(&shared->barrier_before);
+ while (i--) {
+ futex_wait_lock(&shared->futex);
+ futex_cmpxchg_unlock(&shared->futex);
+ }
+ barrier_sync(&shared->barrier_after);
+ return NULL;
+}
+
+int main (void)
+{
+ printf("FUTEX_WAIT test\n");
+ locktest(futex_wait_test, 100000000);
+ return 0;
+}
diff --git a/performance/harness.h b/performance/harness.h
new file mode 100644
index 0000000..9d74d17
--- /dev/null
+++ b/performance/harness.h
@@ -0,0 +1,103 @@
+// Copyright 2009 Google Inc.
+// Author: [email protected] (Michel Lespinasse)
+
+#include <limits.h>
+#include <sys/times.h>
+#include <stdio.h>
+#include <pthread.h>
+
+
+struct thread_barrier {
+ futex_t threads;
+ futex_t unblock;
+};
+
+struct locktest_shared {
+ struct thread_barrier barrier_before;
+ struct thread_barrier barrier_after;
+ int loops;
+ futex_t futex;
+};
+
+static inline void decrement(futex_t *ptr)
+{
+ __sync_fetch_and_add(ptr, -1);
+}
+
+/* Called by main thread to initialize barrier */
+static void barrier_init(struct thread_barrier *barrier, int threads)
+{
+ barrier->threads = threads;
+ barrier->unblock = 0;
+}
+
+/* Called by worker threads to synchronize with main thread */
+static void barrier_sync(struct thread_barrier *barrier)
+{
+ decrement(&barrier->threads);
+ if (barrier->threads == 0)
+ futex_wake(&barrier->threads, 1, FUTEX_PRIVATE_FLAG);
+ while (barrier->unblock == 0)
+ futex_wait(&barrier->unblock, 0, NULL, FUTEX_PRIVATE_FLAG);
+}
+
+/* Called by main thread to wait for all workers to reach sync point */
+static void barrier_wait(struct thread_barrier *barrier)
+{
+ int threads;
+ while ((threads = barrier->threads) > 0)
+ futex_wait(&barrier->threads, threads, NULL,
+ FUTEX_PRIVATE_FLAG);
+}
+
+/* Called by main thread to unblock worker threads from their sync point */
+static void barrier_unblock(struct thread_barrier *barrier)
+{
+ barrier->unblock = 1;
+ futex_wake(&barrier->unblock, INT_MAX, FUTEX_PRIVATE_FLAG);
+}
+
+
+static void locktest(void * thread_function(void *), int iterations)
+{
+ int threads[] = { 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 24, 32,
+ 64, 128, 256, 512, 1024, 0 };
+ int t;
+ for (t = 0; threads[t]; t++) {
+ int num_threads = threads[t];
+ struct locktest_shared shared;
+ pthread_t thread[num_threads];
+ int i;
+ clock_t before, after;
+ struct tms tms_before, tms_after;
+ int wall, user, system;
+ double tick;
+
+ barrier_init(&shared.barrier_before, num_threads);
+ barrier_init(&shared.barrier_after, num_threads);
+ shared.loops = iterations / num_threads;
+ shared.futex = 0;
+
+ for (i = 0; i < num_threads; i++)
+ pthread_create(thread + i, NULL, thread_function,
+ &shared);
+ barrier_wait(&shared.barrier_before);
+ before = times(&tms_before);
+ barrier_unblock(&shared.barrier_before);
+ barrier_wait(&shared.barrier_after);
+ after = times(&tms_after);
+ wall = after - before;
+ user = tms_after.tms_utime - tms_before.tms_utime;
+ system = tms_after.tms_stime - tms_before.tms_stime;
+ tick = 1.0 / sysconf(_SC_CLK_TCK);
+ printf("%d threads: %.0f Kiter/s "
+ "(%.2fs user %.2fs system %.2fs wall %.2f cores)\n",
+ num_threads,
+ (num_threads * shared.loops) / (wall * tick * 1000),
+ user * tick, system * tick, wall * tick,
+ wall ? (user + system) * 1. / wall : 1.);
+ barrier_unblock(&shared.barrier_after);
+ for (i = 0; i < num_threads; i++)
+ pthread_join(thread[i], NULL);
+ }
+}
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
Michel Lespinasse wrote:
> On Tue, Nov 17, 2009 at 08:16:06AM -0800, Darren Hart wrote:
>>> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>> Michael, would you be willing to include a version of this test in the
>> above test suite? If so, then in keeping with the rest of the test suite,
>> I would recommend splitting into two tests, one of each opcode being
>> tested, and add argument to define thread count. The run.sh script would
>> then run each thread count as a separate test run.
>
> There you go. Hope this helps. Feel free to adapt as needed.
>
> Signed-off-by: Michel Lespinasse <[email protected]>
Thanks Michel. I've split the patch into two, one with the futex_wait
test and infrastructure and one for the futex_set_wait patch. This way
we can pull in the futex_wait bits independently of however this
futex_set_wait() patch turns out (merge with adaptive or whatever). I
still need to do a lot of integration work, so for now these patches
exist in the "set_wait" branch in the repo.
Thanks for the contributions!
Darren Hart
>
> diff --git a/functional/futex_requeue_pi_mismatched_ops.c b/functional/futex_requeue_pi_mismatched_ops.c
> index 50bd07b..529f5a8 100644
> --- a/functional/futex_requeue_pi_mismatched_ops.c
> +++ b/functional/futex_requeue_pi_mismatched_ops.c
> @@ -113,7 +113,7 @@ int main(int argc, char *argv[])
> * requeue_pi target and aborted. Wake the child with
> * FUTEX_WAKE.
> */
> - ret = futex_wake(&f1, f1, 1, FUTEX_PRIVATE_FLAG);
> + ret = futex_wake(&f1, 1, FUTEX_PRIVATE_FLAG);
> if (ret == 1)
> ret = 0;
> else if (ret < 0)
> diff --git a/include/futextest.h b/include/futextest.h
> index 2b64f79..853c3c4 100644
> --- a/include/futextest.h
> +++ b/include/futextest.h
> @@ -78,6 +78,9 @@ int _verbose = VCRITICAL;
> #ifndef FUTEX_CMP_REQUEUE_PI
> #define FUTEX_CMP_REQUEUE_PI 12
> #endif
> +#ifndef FUTEX_SET_WAIT
> +#define FUTEX_SET_WAIT 13
> +#endif
> #ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
> #define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \
> FUTEX_PRIVATE_FLAG)
> @@ -86,6 +89,9 @@ int _verbose = VCRITICAL;
> #define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
> FUTEX_PRIVATE_FLAG)
> #endif
> +#ifndef FUTEX_SET_WAIT_PRIVATE
> +#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
> +#endif
>
> /**
> * futex() - SYS_futex syscall wrapper
> @@ -106,7 +112,7 @@ int _verbose = VCRITICAL;
> * like-named arguments in the following wrappers except where noted below.
> */
> #define futex(uaddr, op, val, timeout, uaddr2, val3, opflags) \
> - syscall(SYS_futex, uaddr, op | opflags, val, timeout, uaddr2, val3);
> + syscall(SYS_futex, uaddr, op | opflags, val, timeout, uaddr2, val3)
>
> /**
> * futex_wait() - block on uaddr with optional timeout
> @@ -119,8 +125,8 @@ int _verbose = VCRITICAL;
> * futex_wake() - wake one or more tasks blocked on uaddr
> * @nr_wake: wake up to this many tasks
> */
> -#define futex_wake(uaddr, val, nr_wake, opflags) \
> - futex(uaddr, FUTEX_WAKE, val, NULL, NULL, nr_wake, opflags)
> +#define futex_wake(uaddr, nr_wake, opflags) \
> + futex(uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0, opflags)
>
> /**
> * futex_wait_bitset() - block on uaddr with bitset
> @@ -133,8 +139,8 @@ int _verbose = VCRITICAL;
> * futex_wake_bitset() - wake one or more tasks blocked on uaddr with bitset
> * @bitset: bitset to compare with that used in futex_wait_bitset
> */
> -#define futex_wake_bitset(uaddr, val, nr_wake, bitset, opflags) \
> - futex(uaddr, FUTEX_WAKE_BITSET, val, NULL, NULL, bitset, opflags)
> +#define futex_wake_bitset(uaddr, nr_wake, bitset, opflags) \
> + futex(uaddr, FUTEX_WAKE_BITSET, nr_wake, NULL, NULL, bitset, opflags)
>
> /**
> * futex_lock_pi() - block on uaddr as a PI mutex
> @@ -198,6 +204,14 @@ int _verbose = VCRITICAL;
> opflags)
>
> /**
> + * futex_set_wait() - block on uaddr with bitset
> + * @setval: value to set futex to if blocking
> + * @bitset: bitset to be used with futex_wake_bitset
> + */
> +#define futex_set_wait(uaddr, val, setval, timeout, bitset, opflags) \
> + futex(uaddr, FUTEX_SET_WAIT, val, timeout, setval, bitset, opflags)
> +
> +/**
> * futex_cmpxchg() - Atomic compare and exchange
> * @uaddr: The address of the futex to be modified
> * @oldval: The expected value of the futex
> diff --git a/performance/Makefile b/performance/Makefile
> index 9589e49..c4999a8 100644
> --- a/performance/Makefile
> +++ b/performance/Makefile
> @@ -3,7 +3,7 @@ CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE $(INCLUDES)
> LDFLAGS := $(LDFLAGS) -lpthread -lrt
>
> HEADERS := ../include/futextest.h
> -TARGETS :=
> +TARGETS := futex_wait_test futex_setwait_test
>
> .PHONY: all clean
> all: $(TARGETS)
> diff --git a/performance/futex_setwait_test.c b/performance/futex_setwait_test.c
> new file mode 100644
> index 0000000..0d09365
> --- /dev/null
> +++ b/performance/futex_setwait_test.c
> @@ -0,0 +1,71 @@
> +// Copyright 2009 Google Inc.
> +// Author: [email protected] (Michel Lespinasse)
> +
> +#include "futextest.h"
> +#include "harness.h"
> +
> +#include <stdio.h>
> +#include <errno.h>
> +
> +
> +static inline void futex_setwait_lock(futex_t *futex)
> +{
> + int status = *futex;
> + if (status == 0)
> + status = futex_cmpxchg(futex, 0, 1);
> + if (status != 0) {
> + int desired_status = 1;
> + do {
> + if (futex_set_wait(futex, 1, 2, NULL, ~0,
> + FUTEX_PRIVATE_FLAG) == 0) {
> + /* We absorbed a wakeup; so make sure to
> + unblock next thread */
> + desired_status = 2;
> + }
> + status = *futex;
> + if (status == 0)
> + status = futex_cmpxchg(futex, 0,
> + desired_status);
> + } while (status != 0);
> + }
> +}
> +
> +static inline void futex_cmpxchg_unlock(futex_t *futex)
> +{
> + int status = *futex;
> + if (status == 1)
> + status = futex_cmpxchg(futex, 1, 0);
> + if (status == 2) {
> + futex_cmpxchg(futex, 2, 0);
> + futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
> + }
> +}
> +
> +static void * futex_setwait_test(void * dummy)
> +{
> + struct locktest_shared * shared = dummy;
> + int i = shared->loops;
> + barrier_sync(&shared->barrier_before);
> + while (i--) {
> + futex_setwait_lock(&shared->futex);
> + futex_cmpxchg_unlock(&shared->futex);
> + }
> + barrier_sync(&shared->barrier_after);
> + return NULL;
> +}
> +
> +static int futex_setwait_supported(void)
> +{
> + int futex = 0;
> + futex_set_wait(futex, 1, 2, NULL, ~0, FUTEX_PRIVATE_FLAG);
> + return errno == EWOULDBLOCK;
> +}
> +
> +int main (void)
> +{
> + if (futex_setwait_supported()) {
> + printf("\nFUTEX_SET_WAIT test\n");
> + locktest(futex_setwait_test, 100000000);
> + }
> + return 0;
> +}
> diff --git a/performance/futex_wait_test.c b/performance/futex_wait_test.c
> new file mode 100644
> index 0000000..88ce2f2
> --- /dev/null
> +++ b/performance/futex_wait_test.c
> @@ -0,0 +1,56 @@
> +// Copyright 2009 Google Inc.
> +// Author: [email protected] (Michel Lespinasse)
> +
> +#include "futextest.h"
> +#include "harness.h"
> +
> +#include <stdio.h>
> +
> +
> +static inline void futex_wait_lock(futex_t *futex)
> +{
> + int status = *futex;
> + if (status == 0)
> + status = futex_cmpxchg(futex, 0, 1);
> + while (status != 0) {
> + if (status == 1)
> + status = futex_cmpxchg(futex, 1, 2);
> + if (status != 0) {
> + futex_wait(futex, 2, NULL, FUTEX_PRIVATE_FLAG);
> + status = *futex;
> + }
> + if (status == 0)
> + status = futex_cmpxchg(futex, 0, 2);
> + }
> +}
> +
> +static inline void futex_cmpxchg_unlock(futex_t *futex)
> +{
> + int status = *futex;
> + if (status == 1)
> + status = futex_cmpxchg(futex, 1, 0);
> + if (status == 2) {
> + futex_cmpxchg(futex, 2, 0);
> + futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
> + }
> +}
> +
> +static void * futex_wait_test(void * dummy)
> +{
> + struct locktest_shared * shared = dummy;
> + int i = shared->loops;
> + barrier_sync(&shared->barrier_before);
> + while (i--) {
> + futex_wait_lock(&shared->futex);
> + futex_cmpxchg_unlock(&shared->futex);
> + }
> + barrier_sync(&shared->barrier_after);
> + return NULL;
> +}
> +
> +int main (void)
> +{
> + printf("FUTEX_WAIT test\n");
> + locktest(futex_wait_test, 100000000);
> + return 0;
> +}
> diff --git a/performance/harness.h b/performance/harness.h
> new file mode 100644
> index 0000000..9d74d17
> --- /dev/null
> +++ b/performance/harness.h
> @@ -0,0 +1,103 @@
> +// Copyright 2009 Google Inc.
> +// Author: [email protected] (Michel Lespinasse)
> +
> +#include <limits.h>
> +#include <sys/times.h>
> +#include <stdio.h>
> +#include <pthread.h>
> +
> +
> +struct thread_barrier {
> + futex_t threads;
> + futex_t unblock;
> +};
> +
> +struct locktest_shared {
> + struct thread_barrier barrier_before;
> + struct thread_barrier barrier_after;
> + int loops;
> + futex_t futex;
> +};
> +
> +static inline void decrement(futex_t *ptr)
> +{
> + __sync_fetch_and_add(ptr, -1);
> +}
> +
> +/* Called by main thread to initialize barrier */
> +static void barrier_init(struct thread_barrier *barrier, int threads)
> +{
> + barrier->threads = threads;
> + barrier->unblock = 0;
> +}
> +
> +/* Called by worker threads to synchronize with main thread */
> +static void barrier_sync(struct thread_barrier *barrier)
> +{
> + decrement(&barrier->threads);
> + if (barrier->threads == 0)
> + futex_wake(&barrier->threads, 1, FUTEX_PRIVATE_FLAG);
> + while (barrier->unblock == 0)
> + futex_wait(&barrier->unblock, 0, NULL, FUTEX_PRIVATE_FLAG);
> +}
> +
> +/* Called by main thread to wait for all workers to reach sync point */
> +static void barrier_wait(struct thread_barrier *barrier)
> +{
> + int threads;
> + while ((threads = barrier->threads) > 0)
> + futex_wait(&barrier->threads, threads, NULL,
> + FUTEX_PRIVATE_FLAG);
> +}
> +
> +/* Called by main thread to unblock worker threads from their sync point */
> +static void barrier_unblock(struct thread_barrier *barrier)
> +{
> + barrier->unblock = 1;
> + futex_wake(&barrier->unblock, INT_MAX, FUTEX_PRIVATE_FLAG);
> +}
> +
> +
> +static void locktest(void * thread_function(void *), int iterations)
> +{
> + int threads[] = { 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 24, 32,
> + 64, 128, 256, 512, 1024, 0 };
> + int t;
> + for (t = 0; threads[t]; t++) {
> + int num_threads = threads[t];
> + struct locktest_shared shared;
> + pthread_t thread[num_threads];
> + int i;
> + clock_t before, after;
> + struct tms tms_before, tms_after;
> + int wall, user, system;
> + double tick;
> +
> + barrier_init(&shared.barrier_before, num_threads);
> + barrier_init(&shared.barrier_after, num_threads);
> + shared.loops = iterations / num_threads;
> + shared.futex = 0;
> +
> + for (i = 0; i < num_threads; i++)
> + pthread_create(thread + i, NULL, thread_function,
> + &shared);
> + barrier_wait(&shared.barrier_before);
> + before = times(&tms_before);
> + barrier_unblock(&shared.barrier_before);
> + barrier_wait(&shared.barrier_after);
> + after = times(&tms_after);
> + wall = after - before;
> + user = tms_after.tms_utime - tms_before.tms_utime;
> + system = tms_after.tms_stime - tms_before.tms_stime;
> + tick = 1.0 / sysconf(_SC_CLK_TCK);
> + printf("%d threads: %.0f Kiter/s "
> + "(%.2fs user %.2fs system %.2fs wall %.2f cores)\n",
> + num_threads,
> + (num_threads * shared.loops) / (wall * tick * 1000),
> + user * tick, system * tick, wall * tick,
> + wall ? (user + system) * 1. / wall : 1.);
> + barrier_unblock(&shared.barrier_after);
> + for (i = 0; i < num_threads; i++)
> + pthread_join(thread[i], NULL);
> + }
> +}
>
>
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Michel Lespinasse wrote:
> On Tue, Nov 17, 2009 at 08:16:06AM -0800, Darren Hart wrote:
>>> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>> Michael, would you be willing to include a version of this test in the
>> above test suite? If so, then in keeping with the rest of the test suite,
>> I would recommend splitting into two tests, one of each opcode being
>> tested, and add argument to define thread count. The run.sh script would
>> then run each thread count as a separate test run.
>
> There you go. Hope this helps. Feel free to adapt as needed.
>
> Signed-off-by: Michel Lespinasse <[email protected]>
My core-duo laptop hung after 256 threads. I left it running all night
and woke to it still sitting at:
256 threads: 11792 Kiter/s (14.18s user 0.28s system 8.48s wall 1.71 cores)
Have experienced a hang with this test on any platform? I'll take a
closer look at the source today to see if there is anything in there
that requires a certain number of CPUs to function properly.
--
Darren
>
> diff --git a/functional/futex_requeue_pi_mismatched_ops.c b/functional/futex_requeue_pi_mismatched_ops.c
> index 50bd07b..529f5a8 100644
> --- a/functional/futex_requeue_pi_mismatched_ops.c
> +++ b/functional/futex_requeue_pi_mismatched_ops.c
> @@ -113,7 +113,7 @@ int main(int argc, char *argv[])
> * requeue_pi target and aborted. Wake the child with
> * FUTEX_WAKE.
> */
> - ret = futex_wake(&f1, f1, 1, FUTEX_PRIVATE_FLAG);
> + ret = futex_wake(&f1, 1, FUTEX_PRIVATE_FLAG);
> if (ret == 1)
> ret = 0;
> else if (ret < 0)
> diff --git a/include/futextest.h b/include/futextest.h
> index 2b64f79..853c3c4 100644
> --- a/include/futextest.h
> +++ b/include/futextest.h
> @@ -78,6 +78,9 @@ int _verbose = VCRITICAL;
> #ifndef FUTEX_CMP_REQUEUE_PI
> #define FUTEX_CMP_REQUEUE_PI 12
> #endif
> +#ifndef FUTEX_SET_WAIT
> +#define FUTEX_SET_WAIT 13
> +#endif
> #ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
> #define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \
> FUTEX_PRIVATE_FLAG)
> @@ -86,6 +89,9 @@ int _verbose = VCRITICAL;
> #define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
> FUTEX_PRIVATE_FLAG)
> #endif
> +#ifndef FUTEX_SET_WAIT_PRIVATE
> +#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
> +#endif
>
> /**
> * futex() - SYS_futex syscall wrapper
> @@ -106,7 +112,7 @@ int _verbose = VCRITICAL;
> * like-named arguments in the following wrappers except where noted below.
> */
> #define futex(uaddr, op, val, timeout, uaddr2, val3, opflags) \
> - syscall(SYS_futex, uaddr, op | opflags, val, timeout, uaddr2, val3);
> + syscall(SYS_futex, uaddr, op | opflags, val, timeout, uaddr2, val3)
>
> /**
> * futex_wait() - block on uaddr with optional timeout
> @@ -119,8 +125,8 @@ int _verbose = VCRITICAL;
> * futex_wake() - wake one or more tasks blocked on uaddr
> * @nr_wake: wake up to this many tasks
> */
> -#define futex_wake(uaddr, val, nr_wake, opflags) \
> - futex(uaddr, FUTEX_WAKE, val, NULL, NULL, nr_wake, opflags)
> +#define futex_wake(uaddr, nr_wake, opflags) \
> + futex(uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0, opflags)
>
> /**
> * futex_wait_bitset() - block on uaddr with bitset
> @@ -133,8 +139,8 @@ int _verbose = VCRITICAL;
> * futex_wake_bitset() - wake one or more tasks blocked on uaddr with bitset
> * @bitset: bitset to compare with that used in futex_wait_bitset
> */
> -#define futex_wake_bitset(uaddr, val, nr_wake, bitset, opflags) \
> - futex(uaddr, FUTEX_WAKE_BITSET, val, NULL, NULL, bitset, opflags)
> +#define futex_wake_bitset(uaddr, nr_wake, bitset, opflags) \
> + futex(uaddr, FUTEX_WAKE_BITSET, nr_wake, NULL, NULL, bitset, opflags)
>
> /**
> * futex_lock_pi() - block on uaddr as a PI mutex
> @@ -198,6 +204,14 @@ int _verbose = VCRITICAL;
> opflags)
>
> /**
> + * futex_set_wait() - block on uaddr with bitset
> + * @setval: value to set futex to if blocking
> + * @bitset: bitset to be used with futex_wake_bitset
> + */
> +#define futex_set_wait(uaddr, val, setval, timeout, bitset, opflags) \
> + futex(uaddr, FUTEX_SET_WAIT, val, timeout, setval, bitset, opflags)
> +
> +/**
> * futex_cmpxchg() - Atomic compare and exchange
> * @uaddr: The address of the futex to be modified
> * @oldval: The expected value of the futex
> diff --git a/performance/Makefile b/performance/Makefile
> index 9589e49..c4999a8 100644
> --- a/performance/Makefile
> +++ b/performance/Makefile
> @@ -3,7 +3,7 @@ CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE $(INCLUDES)
> LDFLAGS := $(LDFLAGS) -lpthread -lrt
>
> HEADERS := ../include/futextest.h
> -TARGETS :=
> +TARGETS := futex_wait_test futex_setwait_test
>
> .PHONY: all clean
> all: $(TARGETS)
> diff --git a/performance/futex_setwait_test.c b/performance/futex_setwait_test.c
> new file mode 100644
> index 0000000..0d09365
> --- /dev/null
> +++ b/performance/futex_setwait_test.c
> @@ -0,0 +1,71 @@
> +// Copyright 2009 Google Inc.
> +// Author: [email protected] (Michel Lespinasse)
> +
> +#include "futextest.h"
> +#include "harness.h"
> +
> +#include <stdio.h>
> +#include <errno.h>
> +
> +
> +static inline void futex_setwait_lock(futex_t *futex)
> +{
> + int status = *futex;
> + if (status == 0)
> + status = futex_cmpxchg(futex, 0, 1);
> + if (status != 0) {
> + int desired_status = 1;
> + do {
> + if (futex_set_wait(futex, 1, 2, NULL, ~0,
> + FUTEX_PRIVATE_FLAG) == 0) {
> + /* We absorbed a wakeup; so make sure to
> + unblock next thread */
> + desired_status = 2;
> + }
> + status = *futex;
> + if (status == 0)
> + status = futex_cmpxchg(futex, 0,
> + desired_status);
> + } while (status != 0);
> + }
> +}
> +
> +static inline void futex_cmpxchg_unlock(futex_t *futex)
> +{
> + int status = *futex;
> + if (status == 1)
> + status = futex_cmpxchg(futex, 1, 0);
> + if (status == 2) {
> + futex_cmpxchg(futex, 2, 0);
> + futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
> + }
> +}
> +
> +static void * futex_setwait_test(void * dummy)
> +{
> + struct locktest_shared * shared = dummy;
> + int i = shared->loops;
> + barrier_sync(&shared->barrier_before);
> + while (i--) {
> + futex_setwait_lock(&shared->futex);
> + futex_cmpxchg_unlock(&shared->futex);
> + }
> + barrier_sync(&shared->barrier_after);
> + return NULL;
> +}
> +
> +static int futex_setwait_supported(void)
> +{
> + int futex = 0;
> + futex_set_wait(futex, 1, 2, NULL, ~0, FUTEX_PRIVATE_FLAG);
> + return errno == EWOULDBLOCK;
> +}
> +
> +int main (void)
> +{
> + if (futex_setwait_supported()) {
> + printf("\nFUTEX_SET_WAIT test\n");
> + locktest(futex_setwait_test, 100000000);
> + }
> + return 0;
> +}
> diff --git a/performance/futex_wait_test.c b/performance/futex_wait_test.c
> new file mode 100644
> index 0000000..88ce2f2
> --- /dev/null
> +++ b/performance/futex_wait_test.c
> @@ -0,0 +1,56 @@
> +// Copyright 2009 Google Inc.
> +// Author: [email protected] (Michel Lespinasse)
> +
> +#include "futextest.h"
> +#include "harness.h"
> +
> +#include <stdio.h>
> +
> +
> +static inline void futex_wait_lock(futex_t *futex)
> +{
> + int status = *futex;
> + if (status == 0)
> + status = futex_cmpxchg(futex, 0, 1);
> + while (status != 0) {
> + if (status == 1)
> + status = futex_cmpxchg(futex, 1, 2);
> + if (status != 0) {
> + futex_wait(futex, 2, NULL, FUTEX_PRIVATE_FLAG);
> + status = *futex;
> + }
> + if (status == 0)
> + status = futex_cmpxchg(futex, 0, 2);
> + }
> +}
> +
> +static inline void futex_cmpxchg_unlock(futex_t *futex)
> +{
> + int status = *futex;
> + if (status == 1)
> + status = futex_cmpxchg(futex, 1, 0);
> + if (status == 2) {
> + futex_cmpxchg(futex, 2, 0);
> + futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
> + }
> +}
> +
> +static void * futex_wait_test(void * dummy)
> +{
> + struct locktest_shared * shared = dummy;
> + int i = shared->loops;
> + barrier_sync(&shared->barrier_before);
> + while (i--) {
> + futex_wait_lock(&shared->futex);
> + futex_cmpxchg_unlock(&shared->futex);
> + }
> + barrier_sync(&shared->barrier_after);
> + return NULL;
> +}
> +
> +int main (void)
> +{
> + printf("FUTEX_WAIT test\n");
> + locktest(futex_wait_test, 100000000);
> + return 0;
> +}
> diff --git a/performance/harness.h b/performance/harness.h
> new file mode 100644
> index 0000000..9d74d17
> --- /dev/null
> +++ b/performance/harness.h
> @@ -0,0 +1,103 @@
> +// Copyright 2009 Google Inc.
> +// Author: [email protected] (Michel Lespinasse)
> +
> +#include <limits.h>
> +#include <sys/times.h>
> +#include <stdio.h>
> +#include <pthread.h>
> +
> +
> +struct thread_barrier {
> + futex_t threads;
> + futex_t unblock;
> +};
> +
> +struct locktest_shared {
> + struct thread_barrier barrier_before;
> + struct thread_barrier barrier_after;
> + int loops;
> + futex_t futex;
> +};
> +
> +static inline void decrement(futex_t *ptr)
> +{
> + __sync_fetch_and_add(ptr, -1);
> +}
> +
> +/* Called by main thread to initialize barrier */
> +static void barrier_init(struct thread_barrier *barrier, int threads)
> +{
> + barrier->threads = threads;
> + barrier->unblock = 0;
> +}
> +
> +/* Called by worker threads to synchronize with main thread */
> +static void barrier_sync(struct thread_barrier *barrier)
> +{
> + decrement(&barrier->threads);
> + if (barrier->threads == 0)
> + futex_wake(&barrier->threads, 1, FUTEX_PRIVATE_FLAG);
> + while (barrier->unblock == 0)
> + futex_wait(&barrier->unblock, 0, NULL, FUTEX_PRIVATE_FLAG);
> +}
> +
> +/* Called by main thread to wait for all workers to reach sync point */
> +static void barrier_wait(struct thread_barrier *barrier)
> +{
> + int threads;
> + while ((threads = barrier->threads) > 0)
> + futex_wait(&barrier->threads, threads, NULL,
> + FUTEX_PRIVATE_FLAG);
> +}
> +
> +/* Called by main thread to unblock worker threads from their sync point */
> +static void barrier_unblock(struct thread_barrier *barrier)
> +{
> + barrier->unblock = 1;
> + futex_wake(&barrier->unblock, INT_MAX, FUTEX_PRIVATE_FLAG);
> +}
> +
> +
> +static void locktest(void * thread_function(void *), int iterations)
> +{
> + int threads[] = { 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 24, 32,
> + 64, 128, 256, 512, 1024, 0 };
> + int t;
> + for (t = 0; threads[t]; t++) {
> + int num_threads = threads[t];
> + struct locktest_shared shared;
> + pthread_t thread[num_threads];
> + int i;
> + clock_t before, after;
> + struct tms tms_before, tms_after;
> + int wall, user, system;
> + double tick;
> +
> + barrier_init(&shared.barrier_before, num_threads);
> + barrier_init(&shared.barrier_after, num_threads);
> + shared.loops = iterations / num_threads;
> + shared.futex = 0;
> +
> + for (i = 0; i < num_threads; i++)
> + pthread_create(thread + i, NULL, thread_function,
> + &shared);
> + barrier_wait(&shared.barrier_before);
> + before = times(&tms_before);
> + barrier_unblock(&shared.barrier_before);
> + barrier_wait(&shared.barrier_after);
> + after = times(&tms_after);
> + wall = after - before;
> + user = tms_after.tms_utime - tms_before.tms_utime;
> + system = tms_after.tms_stime - tms_before.tms_stime;
> + tick = 1.0 / sysconf(_SC_CLK_TCK);
> + printf("%d threads: %.0f Kiter/s "
> + "(%.2fs user %.2fs system %.2fs wall %.2f cores)\n",
> + num_threads,
> + (num_threads * shared.loops) / (wall * tick * 1000),
> + user * tick, system * tick, wall * tick,
> + wall ? (user + system) * 1. / wall : 1.);
> + barrier_unblock(&shared.barrier_after);
> + for (i = 0; i < num_threads; i++)
> + pthread_join(thread[i], NULL);
> + }
> +}
>
>
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Michel Lespinasse wrote:
>
>
> On Thu, Nov 19, 2009 at 9:03 AM, Darren Hart <[email protected]
> <mailto:[email protected]>> wrote:
>
> Michel Lespinasse wrote:
>
> On Tue, Nov 17, 2009 at 08:16:06AM -0800, Darren Hart wrote:
>
> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>
> Michael, would you be willing to include a version of this
> test in the above test suite? If so, then in keeping with
> the rest of the test suite, I would recommend splitting into
> two tests, one of each opcode being tested, and add
> argument to define thread count. The run.sh script would
> then run each thread count as a separate test run.
>
>
> There you go. Hope this helps. Feel free to adapt as needed.
>
> Signed-off-by: Michel Lespinasse <[email protected]
> <mailto:[email protected]>>
>
>
> My core-duo laptop hung after 256 threads. I left it running all
> night and woke to it still sitting at:
>
> 256 threads: 11792 Kiter/s (14.18s user 0.28s system 8.48s wall 1.71
> cores)
>
> Have experienced a hang with this test on any platform? I'll take a
> closer look at the source today to see if there is anything in there
> that requires a certain number of CPUs to function properly.
>
>
> Which test were you running, futex_wait_test or futex_setwait_test ?
This is futex_wait_test
>
> This is not supposed to require any particular number of CPUs, so I am
> concerned about the hang.
>
> How reproducible is this for you ? Do you know if the original test code
> I sent hanged in the same way ?
I'm basically 2 for 2 on each version of the futex_wait_test. I haven't
seen it run to completion yet. This is on a stock Ubuntu kernel
(2.6.31-15-generic) on my core duo laptop (32 bit).
Futex locking constructs are tricky. I'll spend some time looking over
the barriers and locks used in the test. I tried to do some simple
instrumenting, but that masked the hang (not unexpectedly). I'll keep
looking.
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
On Thu, Nov 19, 2009 at 03:13:40PM -0800, Darren Hart wrote:
>> This is not supposed to require any particular number of CPUs, so I am
>> concerned about the hang.
>>
>> How reproducible is this for you ? Do you know if the original test code
>> I sent hanged in the same way ?
>
> I'm basically 2 for 2 on each version of the futex_wait_test. I haven't
> seen it run to completion yet. This is on a stock Ubuntu kernel
> (2.6.31-15-generic) on my core duo laptop (32 bit).
As we found out in private email conversation before, this was due to hitting
resource allocation limits while creating the threads. The patch below
(relative to your set_wait branch) should fix this.
Patch summary:
* Added FUTEX_WAIT_BITSET/FUTEX_WAKE_BITSET definitions - I do have
old enough headers that this was a problem now that you use
inline functions in futextest.h
* Changed futex_cmpxchg return type to int - I dont really like this change,
but otherwise gcc complains about the volatile keyword being ignored in
the returned function type. Feel free to ignore this if you like.
* futex_setwait_lock: changed to a more compact & faster implementation
(same algorithm but wrote the loop in a different way)
* test harness: look at pthread_create return code; if thread creation fails
then join with existing threads and exit. Also moved the before/after
barrier logic into the test harness rather than the futex_[set]wait_test
functions.
Hope this helps.
Signed-off-by: Michel Lespinasse <[email protected]>
diff --git a/include/futextest.h b/include/futextest.h
index 835867e..77d6a72 100644
--- a/include/futextest.h
+++ b/include/futextest.h
@@ -39,6 +39,12 @@ typedef volatile u_int32_t futex_t;
#define FUTEX_INITIALIZER 0
/* Define the newer op codes if the system header file is not up to date. */
+#ifndef FUTEX_WAIT_BITSET
+#define FUTEX_WAIT_BITSET 9
+#endif
+#ifndef FUTEX_WAKE_BITSET
+#define FUTEX_WAKE_BITSET 10
+#endif
#ifndef FUTEX_WAIT_REQUEUE_PI
#define FUTEX_WAIT_REQUEUE_PI 11
#endif
@@ -239,7 +245,7 @@ futex_set_wait(futex_t *uaddr, futex_t val, u_int32_t setval,
* Implement cmpxchg using gcc atomic builtins.
* http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html
*/
-static inline futex_t
+static inline int
futex_cmpxchg(futex_t *uaddr, u_int32_t oldval, u_int32_t newval)
{
return __sync_val_compare_and_swap(uaddr, oldval, newval);
diff --git a/performance/futex_set_wait.c b/performance/futex_set_wait.c
index 4f89e40..d804778 100644
--- a/performance/futex_set_wait.c
+++ b/performance/futex_set_wait.c
@@ -10,23 +10,14 @@
static inline void futex_setwait_lock(futex_t *futex)
{
- int status = *futex;
- if (status == 0)
- status = futex_cmpxchg(futex, 0, 1);
- if (status != 0) {
- int desired_status = 1;
- do {
- if (futex_set_wait(futex, 1, 2, NULL, ~0,
- FUTEX_PRIVATE_FLAG) == 0) {
- /* We absorbed a wakeup; so make sure to
- unblock next thread */
- desired_status = 2;
- }
- status = *futex;
- if (status == 0)
- status = futex_cmpxchg(futex, 0,
- desired_status);
- } while (status != 0);
+ int desired_status = 1;
+ while (*futex != 0 || futex_cmpxchg(futex, 0, desired_status) != 0) {
+ if (futex_set_wait(futex, 1, 2, NULL, ~0,
+ FUTEX_PRIVATE_FLAG) == 0) {
+ /* We absorbed a wakeup; so make sure to
+ unblock next thread */
+ desired_status = 2;
+ }
}
}
@@ -41,17 +32,12 @@ static inline void futex_cmpxchg_unlock(futex_t *futex)
}
}
-static void * futex_setwait_test(void * dummy)
+static void futex_setwait_test(futex_t *futex, int loops)
{
- struct locktest_shared * shared = dummy;
- int i = shared->loops;
- barrier_sync(&shared->barrier_before);
- while (i--) {
- futex_setwait_lock(&shared->futex);
- futex_cmpxchg_unlock(&shared->futex);
+ while (loops--) {
+ futex_setwait_lock(futex);
+ futex_cmpxchg_unlock(futex);
}
- barrier_sync(&shared->barrier_after);
- return NULL;
}
static int futex_setwait_supported(void)
diff --git a/performance/futex_wait.c b/performance/futex_wait.c
index 88ce2f2..bcbca77 100644
--- a/performance/futex_wait.c
+++ b/performance/futex_wait.c
@@ -35,17 +35,12 @@ static inline void futex_cmpxchg_unlock(futex_t *futex)
}
}
-static void * futex_wait_test(void * dummy)
+static void futex_wait_test(futex_t *futex, int loops)
{
- struct locktest_shared * shared = dummy;
- int i = shared->loops;
- barrier_sync(&shared->barrier_before);
- while (i--) {
- futex_wait_lock(&shared->futex);
- futex_cmpxchg_unlock(&shared->futex);
+ while (loops--) {
+ futex_wait_lock(futex);
+ futex_cmpxchg_unlock(futex);
}
- barrier_sync(&shared->barrier_after);
- return NULL;
}
int main (void)
diff --git a/performance/harness.h b/performance/harness.h
index 9d74d17..d0dd392 100644
--- a/performance/harness.h
+++ b/performance/harness.h
@@ -15,13 +15,14 @@ struct thread_barrier {
struct locktest_shared {
struct thread_barrier barrier_before;
struct thread_barrier barrier_after;
+ void (* locktest_function)(futex_t * ptr, int loops);
int loops;
futex_t futex;
};
static inline void decrement(futex_t *ptr)
{
- __sync_fetch_and_add(ptr, -1);
+ __sync_sub_and_fetch(ptr, 1);
}
/* Called by main thread to initialize barrier */
@@ -32,13 +33,14 @@ static void barrier_init(struct thread_barrier *barrier, int threads)
}
/* Called by worker threads to synchronize with main thread */
-static void barrier_sync(struct thread_barrier *barrier)
+static int barrier_sync(struct thread_barrier *barrier)
{
decrement(&barrier->threads);
if (barrier->threads == 0)
futex_wake(&barrier->threads, 1, FUTEX_PRIVATE_FLAG);
while (barrier->unblock == 0)
futex_wait(&barrier->unblock, 0, NULL, FUTEX_PRIVATE_FLAG);
+ return barrier->unblock;
}
/* Called by main thread to wait for all workers to reach sync point */
@@ -51,14 +53,24 @@ static void barrier_wait(struct thread_barrier *barrier)
}
/* Called by main thread to unblock worker threads from their sync point */
-static void barrier_unblock(struct thread_barrier *barrier)
+static void barrier_unblock(struct thread_barrier *barrier, int value)
{
- barrier->unblock = 1;
+ barrier->unblock = value;
futex_wake(&barrier->unblock, INT_MAX, FUTEX_PRIVATE_FLAG);
}
+static void * locktest_thread(void * dummy)
+{
+ struct locktest_shared * shared = dummy;
+ if (barrier_sync(&shared->barrier_before) > 0) {
+ shared->locktest_function(&shared->futex, shared->loops);
+ barrier_sync(&shared->barrier_after);
+ }
+ return NULL;
+}
-static void locktest(void * thread_function(void *), int iterations)
+static void locktest(void locktest_function(futex_t * ptr, int loops),
+ int iterations)
{
int threads[] = { 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 24, 32,
64, 128, 256, 512, 1024, 0 };
@@ -75,15 +87,22 @@ static void locktest(void * thread_function(void *), int iterations)
barrier_init(&shared.barrier_before, num_threads);
barrier_init(&shared.barrier_after, num_threads);
+ shared.locktest_function = locktest_function;
shared.loops = iterations / num_threads;
shared.futex = 0;
for (i = 0; i < num_threads; i++)
- pthread_create(thread + i, NULL, thread_function,
- &shared);
+ if (pthread_create(thread + i, NULL, locktest_thread,
+ &shared)) {
+ /* Could not create thread; abort */
+ barrier_unblock(&shared.barrier_before, -1);
+ while (--i >= 0)
+ pthread_join(thread[i], NULL);
+ return;
+ }
barrier_wait(&shared.barrier_before);
before = times(&tms_before);
- barrier_unblock(&shared.barrier_before);
+ barrier_unblock(&shared.barrier_before, 1);
barrier_wait(&shared.barrier_after);
after = times(&tms_after);
wall = after - before;
@@ -96,7 +115,7 @@ static void locktest(void * thread_function(void *), int iterations)
(num_threads * shared.loops) / (wall * tick * 1000),
user * tick, system * tick, wall * tick,
wall ? (user + system) * 1. / wall : 1.);
- barrier_unblock(&shared.barrier_after);
+ barrier_unblock(&shared.barrier_after, 1);
for (i = 0; i < num_threads; i++)
pthread_join(thread[i], NULL);
}
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
Michel Lespinasse wrote:
> On Thu, Nov 19, 2009 at 03:13:40PM -0800, Darren Hart wrote:
>>> This is not supposed to require any particular number of CPUs, so I am
>>> concerned about the hang.
>>>
>>> How reproducible is this for you ? Do you know if the original test code
>>> I sent hanged in the same way ?
>> I'm basically 2 for 2 on each version of the futex_wait_test. I haven't
>> seen it run to completion yet. This is on a stock Ubuntu kernel
>> (2.6.31-15-generic) on my core duo laptop (32 bit).
>
> As we found out in private email conversation before, this was due to hitting
> resource allocation limits while creating the threads. The patch below
> (relative to your set_wait branch) should fix this.
>
> Patch summary:
> * Added FUTEX_WAIT_BITSET/FUTEX_WAKE_BITSET definitions - I do have
> old enough headers that this was a problem now that you use
> inline functions in futextest.h
Great, thanks.
> * Changed futex_cmpxchg return type to int - I dont really like this change,
> but otherwise gcc complains about the volatile keyword being ignored in
> the returned function type. Feel free to ignore this if you like.
Hrm. Which version of gcc? I've tested with 4.1 and 4.4 with the -Wall
option, neither complain for me.
> * futex_setwait_lock: changed to a more compact & faster implementation
> (same algorithm but wrote the loop in a different way)
> * test harness: look at pthread_create return code; if thread creation fails
> then join with existing threads and exit. Also moved the before/after
> barrier logic into the test harness rather than the futex_[set]wait_test
> functions.
I'll split these up and apply - with the exception of the futex_cmpxchg
bit until we sort out what's going on. It would facilitate patch
integration in the future if you could break things up functionally and
use use "git format-patch" to generate the patches.
Again, thank you for the contributions.
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Hi folks, sorry for my very slow response...
I found that Darren's futextest contains Michel's test program now.
So I added it to 'perf bench' as sample.
If you like this style, I'd like to add rest part of futextest.
Example of use:
| % ./perf bench futex wait -t 48 -l 100000000
| # Running futex/wait benchmark...
| # Running 48 threads
| # Total number of iteration: 100000000
| 16233.763636 Kiter/sec
| 6.100000 user sec
| 0.000000 system sec
| 6.160000 wall sec
| 0.990260 cores
| % perf bench --format=simple futex wait
| 16207.455429
This patch series contains two typedefs (and one volatile).
I know typedef and volatile is not a thing to welcome.
But I judged these are not problematic things,
could you review this?
Hitoshi Mitake (3):
perf bench: Add wrappers for atomic operation of GCC
perf bench: Add new files for futex performance test
perf bench: Fix misc files to build files related to futex
tools/perf/Makefile | 1 +
tools/perf/bench/bench.h | 3 +-
tools/perf/bench/futex-wait.c | 218 ++++++++++++++++++++++++++
tools/perf/bench/futextest.h | 280 ++++++++++++++++++++++++++++++++++
tools/perf/builtin-bench.c | 16 ++-
tools/perf/util/include/asm/atomic.h | 91 +++++++++++
6 files changed, 606 insertions(+), 3 deletions(-)
create mode 100644 tools/perf/bench/futex-wait.c
create mode 100644 tools/perf/bench/futextest.h
create mode 100644 tools/perf/util/include/asm/atomic.h
This patch adds new file util/include/asm/atomic.h.
It contains wrappers for atomic operation of GCC,
I think this is useful not only for 'perf bench',
but also for entire of perf command.
This patch adds new typedefed struct 'atomic_t'.
I know new typedef is not a thing to welcome,
but I believe that atomic_t is worth to typedef
because it is much general.
I borrowed this file from Darren Hart's futextest.
http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
---
tools/perf/util/include/asm/atomic.h | 91 ++++++++++++++++++++++++++++++++++
1 files changed, 91 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/util/include/asm/atomic.h
diff --git a/tools/perf/util/include/asm/atomic.h b/tools/perf/util/include/asm/atomic.h
new file mode 100644
index 0000000..1cef451
--- /dev/null
+++ b/tools/perf/util/include/asm/atomic.h
@@ -0,0 +1,91 @@
+/******************************************************************************
+ *
+ * Copyright © International Business Machines Corp., 2009
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * NAME
+ * atomic.h
+ *
+ * DESCRIPTION
+ * GCC atomic builtin wrappers
+ * http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html
+ *
+ * AUTHOR
+ * Darren Hart <[email protected]>
+ *
+ * HISTORY
+ * 2009-Nov-17: Initial version by Darren Hart <[email protected]>
+ *
+ *****************************************************************************/
+
+#ifndef _PERF_ASM_ATOMIC_H
+#define _PERF_ASM_ATOMIC_H
+
+typedef struct {
+ volatile int val;
+} atomic_t;
+
+#define ATOMIC_INITIALIZER { 0 }
+
+/**
+ * atomic_cmpxchg() - Atomic compare and exchange
+ * @uaddr: The address of the futex to be modified
+ * @oldval: The expected value of the futex
+ * @newval: The new value to try and assign the futex
+ *
+ * Return the old value of addr->val.
+ */
+static inline int atomic_cmpxchg(atomic_t *addr, int oldval, int newval)
+{
+ return __sync_val_compare_and_swap(&addr->val, oldval, newval);
+}
+
+/**
+ * atomic_inc() - Atomic incrememnt
+ * @addr: Address of the variable to increment
+ *
+ * Return the new value of addr->val.
+ */
+static inline int atomic_inc(atomic_t *addr)
+{
+ return __sync_add_and_fetch(&addr->val, 1);
+}
+
+/**
+ * atomic_dec() - Atomic decrement
+ * @addr: Address of the variable to decrement
+ *
+ * Return the new value of addr-val.
+ */
+static inline int atomic_dec(atomic_t *addr)
+{
+ return __sync_sub_and_fetch(&addr->val, 1);
+}
+
+/**
+ * atomic_set() - Atomic set
+ * @addr: Address of the variable to set
+ * @newval: New value for the atomic_t
+ *
+ * Return the new value of addr->val.
+ */
+static inline int atomic_set(atomic_t *addr, int newval)
+{
+ addr->val = newval;
+ return newval;
+}
+
+#endif /* _PERF_ASM_ATOMIC_H */
--
1.6.5.2
This patch adds two new files.
futextest.h provides general wrappers for futex() system call.
This patch containts the line:
typedef volatile u_int32_t futex_t;
I know that new typedef is not a thing to welcome,
but this is useful thing.
futex-wait.c is a new suite to test performance of FUTEX_WAIT.
These files are from Darren Hart's futex test,
and futex-wait.c is based on the program originally
written by Michel Lespinasse.
Example of use:
| % ./perf bench futex wait -t 48 -l 100000000
| # Running futex/wait benchmark...
| # Running 48 threads
| # Total number of iteration: 100000000
| 16233.763636 Kiter/sec
| 6.100000 user sec
| 0.000000 system sec
| 6.160000 wall sec
| 0.990260 cores
| % perf bench --format=simple futex wait
| 16207.455429
Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
---
tools/perf/bench/futex-wait.c | 218 ++++++++++++++++++++++++++++++++
tools/perf/bench/futextest.h | 280 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 498 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/bench/futex-wait.c
create mode 100644 tools/perf/bench/futextest.h
diff --git a/tools/perf/bench/futex-wait.c b/tools/perf/bench/futex-wait.c
new file mode 100644
index 0000000..81335ed
--- /dev/null
+++ b/tools/perf/bench/futex-wait.c
@@ -0,0 +1,218 @@
+/*
+ *
+ * futex-wait.c
+ *
+ * wait: Performance test for wait op of futex()
+ *
+ * Copyright 2009 Google Inc.
+ * Original Author: [email protected] (Michel Lespinasse)
+ * Ported to perf by Hitoshi Mitake <[email protected]>
+ */
+
+#include "../perf.h"
+#include "../util/util.h"
+#include "../util/parse-options.h"
+#include "../util/header.h"
+#include "../builtin.h"
+#include "bench.h"
+#include "futextest.h"
+#include "../util/include/asm/atomic.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <limits.h>
+#include <sys/times.h>
+#include <stdio.h>
+#include <pthread.h>
+
+struct thread_barrier {
+ futex_t threads;
+ futex_t unblock;
+};
+
+struct locktest_shared {
+ struct thread_barrier barrier_before;
+ struct thread_barrier barrier_after;
+ void (*locktest_function)(futex_t *ptr, int loops);
+ int loops;
+ futex_t futex;
+};
+
+/* Called by main thread to initialize barrier */
+static void barrier_init(struct thread_barrier *barrier, int threads)
+{
+ barrier->threads = threads;
+ barrier->unblock = 0;
+}
+
+/* Called by worker threads to synchronize with main thread */
+static int barrier_sync(struct thread_barrier *barrier)
+{
+ futex_dec(&barrier->threads);
+ if (barrier->threads == 0)
+ futex_wake(&barrier->threads, 1, FUTEX_PRIVATE_FLAG);
+ while (barrier->unblock == 0)
+ futex_wait(&barrier->unblock, 0, NULL, FUTEX_PRIVATE_FLAG);
+ return barrier->unblock;
+}
+
+/* Called by main thread to wait for all workers to reach sync point */
+static void barrier_wait(struct thread_barrier *barrier)
+{
+ int threads;
+ while ((threads = barrier->threads) > 0) {
+ futex_wait(&barrier->threads, threads, NULL,
+ FUTEX_PRIVATE_FLAG);
+ }
+}
+
+/* Called by main thread to unblock worker threads from their sync point */
+static void barrier_unblock(struct thread_barrier *barrier, int value)
+{
+ barrier->unblock = value;
+ futex_wake(&barrier->unblock, INT_MAX, FUTEX_PRIVATE_FLAG);
+}
+
+static void *locktest_thread(void * dummy)
+{
+ struct locktest_shared *shared = dummy;
+ if (barrier_sync(&shared->barrier_before) > 0) {
+ shared->locktest_function(&shared->futex, shared->loops);
+ barrier_sync(&shared->barrier_after);
+ }
+ return NULL;
+}
+
+static void locktest(void locktest_function(futex_t *ptr, int loops),
+ int num_threads, int iterations)
+{
+ struct locktest_shared shared;
+ pthread_t *thread;
+ int i;
+ clock_t before, after;
+ struct tms tms_before, tms_after;
+ int wall_time, user_time, system_time;
+ double tick;
+
+ thread = calloc(sizeof(pthread_t), num_threads);
+ BUG_ON(!thread);
+ barrier_init(&shared.barrier_before, num_threads);
+ barrier_init(&shared.barrier_after, num_threads);
+ shared.locktest_function = locktest_function;
+ shared.loops = iterations / num_threads;
+ shared.futex = 0;
+
+ for (i = 0; i < num_threads; i++) {
+ BUG_ON(pthread_create(thread + i, NULL,
+ locktest_thread, &shared));
+ }
+
+ barrier_wait(&shared.barrier_before);
+ before = times(&tms_before);
+ barrier_unblock(&shared.barrier_before, 1);
+ barrier_wait(&shared.barrier_after);
+ after = times(&tms_after);
+ wall_time = after - before;
+ user_time = tms_after.tms_utime - tms_before.tms_utime;
+ system_time = tms_after.tms_stime - tms_before.tms_stime;
+ tick = 1.0 / sysconf(_SC_CLK_TCK);
+
+ switch (bench_format) {
+ case BENCH_FORMAT_DEFAULT:
+ printf(" %14f Kiter/sec\n",
+ (num_threads * shared.loops)
+ / (wall_time * tick * 1000));
+ printf(" %14f user sec\n", user_time * tick);
+ printf(" %14f system sec\n", system_time * tick);
+ printf(" %14f wall sec\n", wall_time * tick);
+ printf(" %14f cores\n",
+ wall_time ?
+ (user_time + system_time) * 1. / wall_time : 1.);
+ break;
+ case BENCH_FORMAT_SIMPLE:
+ printf("%f\n",
+ (num_threads * shared.loops)
+ / (wall_time * tick * 1000));
+ break;
+ default:
+ /* reaching this means there's some disaster: */
+ die("unknown format: %d\n", bench_format);
+ break;
+ }
+
+ barrier_unblock(&shared.barrier_after, 1);
+
+ for (i = 0; i < num_threads; i++)
+ BUG_ON(pthread_join(thread[i], NULL));
+}
+
+static inline void futex_wait_lock(futex_t *futex)
+{
+ int status = *futex;
+
+ if (status == 0)
+ status = futex_cmpxchg(futex, 0, 1);
+
+ while (status != 0) {
+ if (status == 1)
+ status = futex_cmpxchg(futex, 1, 2);
+ if (status != 0) {
+ futex_wait(futex, 2, NULL, FUTEX_PRIVATE_FLAG);
+ status = *futex;
+ }
+ if (status == 0)
+ status = futex_cmpxchg(futex, 0, 2);
+ }
+}
+
+static inline void futex_cmpxchg_unlock(futex_t *futex)
+{
+ int status = *futex;
+
+ if (status == 1)
+ status = futex_cmpxchg(futex, 1, 0);
+
+ if (status == 2) {
+ futex_cmpxchg(futex, 2, 0);
+ futex_wake(futex, 1, FUTEX_PRIVATE_FLAG);
+ }
+}
+
+static void futex_wait_test(futex_t *futex, int loops)
+{
+ while (loops--) {
+ futex_wait_lock(futex);
+ futex_cmpxchg_unlock(futex);
+ }
+}
+
+static int threads = 32;
+static int loops = 100000000;
+
+static const struct option options[] = {
+ OPT_INTEGER('t', "threads", &threads,
+ "Specify number of threads"),
+ OPT_INTEGER('l', "loops", &loops,
+ "Specify number of loops"),
+ OPT_END()
+};
+
+static const char * const bench_futex_wait_usage[] = {
+ "perf bench futex wait <options>",
+ NULL
+};
+
+int bench_futex_wait(int argc, const char **argv,
+ const char *prefix __used)
+{
+ argc = parse_options(argc, argv, options,
+ bench_futex_wait_usage, 0);
+
+ if (bench_format == BENCH_FORMAT_DEFAULT) {
+ printf("# Running %d threads\n", threads);
+ printf("# Total number of iteration: %d\n", loops);
+ }
+
+ locktest(futex_wait_test, threads, loops);
+ return 0;
+}
diff --git a/tools/perf/bench/futextest.h b/tools/perf/bench/futextest.h
new file mode 100644
index 0000000..09d8b94
--- /dev/null
+++ b/tools/perf/bench/futextest.h
@@ -0,0 +1,280 @@
+/******************************************************************************
+ *
+ * Copyright © International Business Machines Corp., 2009
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * NAME
+ * futextest.h
+ *
+ * DESCRIPTION
+ * Glibc independent futex library for testing kernel functionality.
+ *
+ * AUTHOR
+ * Darren Hart <[email protected]>
+ *
+ * HISTORY
+ * 2009-Nov-24:
+ * Ported to perf by Hitoshi Mitake <[email protected]>
+ * 2009-Nov-06: Initial version by Darren Hart <[email protected]>
+ *
+ *****************************************************************************/
+
+#ifndef _FUTEXTEST_H
+#define _FUTEXTEST_H
+
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <linux/futex.h>
+
+typedef volatile u_int32_t futex_t;
+#define FUTEX_INITIALIZER 0
+
+/* Define the newer op codes if the system header file is not up to date. */
+#ifndef FUTEX_WAIT_BITSET
+#define FUTEX_WAIT_BITSET 9
+#endif
+#ifndef FUTEX_WAKE_BITSET
+#define FUTEX_WAKE_BITSET 10
+#endif
+#ifndef FUTEX_WAIT_REQUEUE_PI
+#define FUTEX_WAIT_REQUEUE_PI 11
+#endif
+#ifndef FUTEX_CMP_REQUEUE_PI
+#define FUTEX_CMP_REQUEUE_PI 12
+#endif
+#ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
+#define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \
+ FUTEX_PRIVATE_FLAG)
+#endif
+#ifndef FUTEX_REQUEUE_PI_PRIVATE
+#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
+ FUTEX_PRIVATE_FLAG)
+#endif
+
+/**
+ * futex() - SYS_futex syscall wrapper
+ * @uaddr: address of first futex
+ * @op: futex op code
+ * @val: typically expected value of uaddr, but varies by op
+ * @timeout: typically an absolute struct timespec (except where noted
+ * otherwise). Overloaded by some ops
+ * @uaddr2: address of second futex for some ops\
+ * @val3: varies by op
+ * @opflags: flags to be bitwise OR'd with op, such as FUTEX_PRIVATE_FLAG
+ *
+ * futex() is used by all the following futex op wrappers. It can also be
+ * used for misuse and abuse testing. Generally, the specific op wrappers
+ * should be used instead. It is a macro instead of an static inline function as
+ * some of the types over overloaded (timeout is used for nr_requeue for
+ * example).
+ *
+ * These argument descriptions are the defaults for all
+ * like-named arguments in the following wrappers except where noted below.
+ */
+#define futex(uaddr, op, val, timeout, uaddr2, val3, opflags) \
+ syscall(SYS_futex, uaddr, op | opflags, val, timeout, uaddr2, val3)
+
+/**
+ * futex_wait() - block on uaddr with optional timeout
+ * @timeout: relative timeout
+ */
+static inline int
+futex_wait(futex_t *uaddr, futex_t val, struct timespec *timeout, int opflags)
+{
+ return futex(uaddr, FUTEX_WAIT, val, timeout, NULL, 0, opflags);
+}
+
+/**
+ * futex_wake() - wake one or more tasks blocked on uaddr
+ * @nr_wake: wake up to this many tasks
+ */
+static inline int
+futex_wake(futex_t *uaddr, int nr_wake, int opflags)
+{
+ return futex(uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0, opflags);
+}
+
+/**
+ * futex_wait_bitset() - block on uaddr with bitset
+ * @bitset: bitset to be used with futex_wake_bitset
+ */
+static inline int
+futex_wait_bitset(futex_t *uaddr, futex_t val, struct timespec *timeout,
+ u_int32_t bitset, int opflags)
+{
+ return futex(uaddr, FUTEX_WAIT_BITSET, val, timeout, NULL, bitset,
+ opflags);
+}
+
+/**
+ * futex_wake_bitset() - wake one or more tasks blocked on uaddr with bitset
+ * @bitset: bitset to compare with that used in futex_wait_bitset
+ */
+static inline int
+futex_wake_bitset(futex_t *uaddr, int nr_wake, u_int32_t bitset, int opflags)
+{
+ return futex(uaddr, FUTEX_WAKE_BITSET, nr_wake, NULL, NULL, bitset,
+ opflags);
+}
+
+/**
+ * futex_lock_pi() - block on uaddr as a PI mutex
+ * @detect: whether (1) or not (0) to perform deadlock detection
+ */
+static inline int
+futex_lock_pi(futex_t *uaddr, struct timespec *timeout, int detect,
+ int opflags)
+{
+ return futex(uaddr, FUTEX_LOCK_PI, detect, timeout, NULL, 0, opflags);
+}
+
+/**
+ * futex_unlock_pi() - release uaddr as a PI mutex, waking the top waiter
+ */
+static inline int
+futex_unlock_pi(futex_t *uaddr, int opflags)
+{
+ return futex(uaddr, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0, opflags);
+}
+
+/**
+ * futex_wake_op() - FIXME: COME UP WITH A GOOD ONE LINE DESCRIPTION
+ */
+static inline int
+futex_wake_op(futex_t *uaddr, futex_t *uaddr2, int nr_wake, int nr_wake2,
+ int wake_op, int opflags)
+{
+ return futex(uaddr, FUTEX_WAKE_OP, nr_wake, nr_wake2, uaddr2, wake_op,
+ opflags);
+}
+
+/**
+ * futex_requeue() - requeue without expected value comparison, deprecated
+ * @nr_wake: wake up to this many tasks
+ * @nr_requeue: requeue up to this many tasks
+ *
+ * Due to its inherently racy implementation, futex_requeue() is deprecated in
+ * favor of futex_cmp_requeue().
+ */
+static inline int
+futex_requeue(futex_t *uaddr, futex_t *uaddr2, int nr_wake, int nr_requeue,
+ int opflags)
+{
+ return futex(uaddr, FUTEX_REQUEUE, nr_wake, nr_requeue, uaddr2, 0,
+ opflags);
+}
+
+/**
+ * futex_cmp_requeue() - requeue tasks from uaddr to uaddr2
+ * @nr_wake: wake up to this many tasks
+ * @nr_requeue: requeue up to this many tasks
+ */
+static inline int
+futex_cmp_requeue(futex_t *uaddr, futex_t val, futex_t *uaddr2, int nr_wake,
+ int nr_requeue, int opflags)
+{
+ return futex(uaddr, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, uaddr2,
+ val, opflags);
+}
+
+/**
+ * futex_wait_requeue_pi() - block on uaddr and prepare to requeue to uaddr2
+ * @uaddr: non-PI futex source
+ * @uaddr2: PI futex target
+ *
+ * This is the first half of the requeue_pi mechanism. It shall always be
+ * paired with futex_cmp_requeue_pi().
+ */
+static inline int
+futex_wait_requeue_pi(futex_t *uaddr, futex_t val, futex_t *uaddr2,
+ struct timespec *timeout, int opflags)
+{
+ return futex(uaddr, FUTEX_WAIT_REQUEUE_PI, val, timeout, uaddr2, 0,
+ opflags);
+}
+
+/**
+ * futex_cmp_requeue_pi() - requeue tasks from uaddr to uaddr2 (PI aware)
+ * @uaddr: non-PI futex source
+ * @uaddr2: PI futex target
+ * @nr_wake: wake up to this many tasks
+ * @nr_requeue: requeue up to this many tasks
+ */
+static inline int
+futex_cmp_requeue_pi(futex_t *uaddr, futex_t val, futex_t *uaddr2, int nr_wake,
+ int nr_requeue, int opflags)
+{
+ return futex(uaddr, FUTEX_CMP_REQUEUE_PI,
+ nr_wake, nr_requeue, uaddr2, val, opflags);
+}
+
+/**
+ * futex_cmpxchg() - atomic compare and exchange
+ * @uaddr: The address of the futex to be modified
+ * @oldval: The expected value of the futex
+ * @newval: The new value to try and assign the futex
+ *
+ * Implement cmpxchg using gcc atomic builtins.
+ * http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html
+ *
+ * Return the old futex value.
+ */
+static inline u_int32_t
+futex_cmpxchg(futex_t *uaddr, u_int32_t oldval, u_int32_t newval)
+{
+ return __sync_val_compare_and_swap(uaddr, oldval, newval);
+}
+
+/**
+ * futex_dec() - atomic decrement of the futex value
+ * @uaddr: The address of the futex to be modified
+ *
+ * Return the new futex value.
+ */
+static inline u_int32_t
+futex_dec(futex_t *uaddr)
+{
+ return __sync_sub_and_fetch(uaddr, 1);
+}
+
+/**
+ * futex_inc() - atomic increment of the futex value
+ * @uaddr: the address of the futex to be modified
+ *
+ * Return the new futex value.
+ */
+static inline u_int32_t
+futex_inc(futex_t *uaddr)
+{
+ return __sync_add_and_fetch(uaddr, 1);
+}
+
+/**
+ * futex_set() - atomic decrement of the futex value
+ * @uaddr: the address of the futex to be modified
+ * @newval: New value for the atomic_t
+ *
+ * Return the new futex value.
+ */
+static inline u_int32_t
+futex_set(futex_t *uaddr, u_int32_t newval)
+{
+ *uaddr = newval;
+ return newval;
+}
+
+#endif
--
1.6.5.2
This patch fixes misc files including Makefile
to build files related to futex.
Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
---
tools/perf/Makefile | 1 +
tools/perf/bench/bench.h | 3 ++-
tools/perf/builtin-bench.c | 16 ++++++++++++++--
3 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index f1537a9..fdc4920 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -420,6 +420,7 @@ BUILTIN_OBJS += builtin-bench.o
BUILTIN_OBJS += bench/sched-messaging.o
BUILTIN_OBJS += bench/sched-pipe.o
BUILTIN_OBJS += bench/mem-memcpy.o
+BUILTIN_OBJS += bench/futex-wait.o
BUILTIN_OBJS += builtin-help.o
BUILTIN_OBJS += builtin-sched.o
diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index f7781c6..8010f97 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -3,7 +3,8 @@
extern int bench_sched_messaging(int argc, const char **argv, const char *prefix);
extern int bench_sched_pipe(int argc, const char **argv, const char *prefix);
-extern int bench_mem_memcpy(int argc, const char **argv, const char *prefix __used);
+extern int bench_mem_memcpy(int argc, const char **argv, const char *prefix);
+extern int bench_futex_wait(int argc, const char **argv, const char *prefix);
#define BENCH_FORMAT_DEFAULT_STR "default"
#define BENCH_FORMAT_DEFAULT 0
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index e043eb8..0c421bf 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -53,6 +53,15 @@ static struct bench_suite mem_suites[] = {
NULL }
};
+static struct bench_suite futex_suites[] = {
+ { "wait",
+ "Iterating locking/unlocking with many threads",
+ bench_futex_wait },
+ { NULL,
+ NULL,
+ NULL }
+};
+
struct bench_subsys {
const char *name;
const char *summary;
@@ -65,10 +74,13 @@ static struct bench_subsys subsystems[] = {
sched_suites },
{ "mem",
"memory access performance",
- mem_suites },
+ mem_suites },
+ { "futex",
+ "fast userspace mutex",
+ futex_suites },
{ NULL,
NULL,
- NULL }
+ NULL }
};
static void dump_suites(int subsys_index)
--
1.6.5.2
Hitoshi Mitake wrote:
> This patch adds new file util/include/asm/atomic.h.
> It contains wrappers for atomic operation of GCC,
> I think this is useful not only for 'perf bench',
> but also for entire of perf command.
>
> This patch adds new typedefed struct 'atomic_t'.
> I know new typedef is not a thing to welcome,
> but I believe that atomic_t is worth to typedef
> because it is much general.
>
> I borrowed this file from Darren Hart's futextest.
> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
Hi Hitoshi-san,
I took the gcc built-ins approach for futextest.h because I didn't want
the hassle of maintaining per-arch asm files. Since perf is already in
the kernel source, I wonder if you could leverage the already existing
kernel atomic code? See Documentation/atomic_ops.txt.
>
> Signed-off-by: Hitoshi Mitake <[email protected]>
> Cc: Michel Lespinasse <[email protected]>
> Cc: Darren Hart <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> ---
> tools/perf/util/include/asm/atomic.h | 91 ++++++++++++++++++++++++++++++++++
> 1 files changed, 91 insertions(+), 0 deletions(-)
> create mode 100644 tools/perf/util/include/asm/atomic.h
>
> diff --git a/tools/perf/util/include/asm/atomic.h b/tools/perf/util/include/asm/atomic.h
> new file mode 100644
> index 0000000..1cef451
> --- /dev/null
> +++ b/tools/perf/util/include/asm/atomic.h
> @@ -0,0 +1,91 @@
> +/******************************************************************************
> + *
> + * Copyright B) International Business Machines Corp., 2009
B) should be ? (or (C) at the very least), I'm guessing character set issue?
Thanks,
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Hitoshi Mitake wrote:
> This patch adds two new files.
>
Hi Hitoshi-san,
> futextest.h provides general wrappers for futex() system call.
> This patch containts the line:
> typedef volatile u_int32_t futex_t;
> I know that new typedef is not a thing to welcome,
> but this is useful thing.
>
> futex-wait.c is a new suite to test performance of FUTEX_WAIT.
>
> These files are from Darren Hart's futex test,
> and futex-wait.c is based on the program originally
> written by Michel Lespinasse.
Thanks for looking at getting the futex performance tests into perf.
Just wanted to make sure you are aware that there will likely be several
more futextest/performance tests in the near future as the project is
under early active development. I see you have merged harness.h into
futex-wait.c, which is fine, but I will likely be adding a new
include/locking.h to add things like barrier, lock, and lock_pi locking
primitives to the futextest testsuite. futex-wait.c would then be
updated to use those directly if feasible and remove the custom locking
primitives in the test itself. I mention this so you are aware and perf
futex benchmarks don't get too far out of sync with futextest.
I'd be interested in any ideas you have to make futextest/performance/*
tests integrate more easily into perf as I'd like to include each new
test into perf as well.
Couple of nits below:
> diff --git a/tools/perf/bench/futextest.h b/tools/perf/bench/futextest.h
> new file mode 100644
> index 0000000..09d8b94
> --- /dev/null
> +++ b/tools/perf/bench/futextest.h
> @@ -0,0 +1,280 @@
> +/******************************************************************************
> + *
> + * Copyright B) International Business Machines Corp., 2009
Copyright character issue...
> + * HISTORY
> + * 2009-Nov-24:
> + * Ported to perf by Hitoshi Mitake <[email protected]>
> + * 2009-Nov-06: Initial version by Darren Hart <[email protected]>
I usually add the dates in ascending chronological order.
> + *
> + *****************************************************************************/
> +
> +#ifndef _FUTEXTEST_H
> +#define _FUTEXTEST_H
> +
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +#include <linux/futex.h>
> +
> +typedef volatile u_int32_t futex_t;
I have been considering making this into a val wrapped in a struct like
the atomic_t as that will make adding things like flags to the futex_t
easier. Again, just a head's up that it may be changing in the near future.
Thanks,
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
From: Darren Hart <[email protected]>
Subject: Re: [PATCH 1/3] perf bench: Add wrappers for atomic operation of GCC
Date: Tue, 24 Nov 2009 08:20:18 -0800
Hi Darren,
> > It contains wrappers for atomic operation of GCC,
> > I think this is useful not only for 'perf bench',
> > but also for entire of perf command.
> >
> > This patch adds new typedefed struct 'atomic_t'.
> > I know new typedef is not a thing to welcome,
> > but I believe that atomic_t is worth to typedef
> > because it is much general.
> >
> > I borrowed this file from Darren Hart's futextest.
> > http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>
> Hi Hitoshi-san,
>
> I took the gcc built-ins approach for futextest.h because I didn't want
> the hassle of maintaining per-arch asm files. Since perf is already in
> the kernel source, I wonder if you could leverage the already existing
> kernel atomic code? See Documentation/atomic_ops.txt.
Yes, kernel source has atomic operations for every architecture.
But some architectures which has no 64bit atomic ops, like x86_32,
uses lib/atomic64.c.
And headers provided by arch depends on autoconf.h.
So if I use kernel atomic code, 'make menuconfig' must be done
before building perf. I think it's not good for users.
So I borrowed your approach.
>
> >
> > Signed-off-by: Hitoshi Mitake <[email protected]>
> > Cc: Michel Lespinasse <[email protected]>
> > Cc: Darren Hart <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Paul Mackerras <[email protected]>
> > Cc: Frederic Weisbecker <[email protected]>
> > ---
> > tools/perf/util/include/asm/atomic.h | 91 ++++++++++++++++++++++++++++++++++
> > 1 files changed, 91 insertions(+), 0 deletions(-)
> > create mode 100644 tools/perf/util/include/asm/atomic.h
> >
> > diff --git a/tools/perf/util/include/asm/atomic.h b/tools/perf/util/include/asm/atomic.h
> > new file mode 100644
> > index 0000000..1cef451
> > --- /dev/null
> > +++ b/tools/perf/util/include/asm/atomic.h
> > @@ -0,0 +1,91 @@
> > +/******************************************************************************
> > + *
> > + * Copyright B) International Business Machines Corp., 2009
>
> B) should be ? (or (C) at the very least), I'm guessing character set issue?
Sorry:( This must be character set issue. I'll fix.
Thanks
Hitoshi
From: Darren Hart <[email protected]>
Subject: Re: [PATCH 2/3] perf bench: Add new files for futex performance test
Date: Tue, 24 Nov 2009 08:33:16 -0800
Hi Darren,
>
> Hitoshi Mitake wrote:
> > This patch adds two new files.
> >
>
> Hi Hitoshi-san,
>
> > futextest.h provides general wrappers for futex() system call.
> > This patch containts the line:
> > typedef volatile u_int32_t futex_t;
> > I know that new typedef is not a thing to welcome,
> > but this is useful thing.
> >
> > futex-wait.c is a new suite to test performance of FUTEX_WAIT.
> >
> > These files are from Darren Hart's futex test,
> > and futex-wait.c is based on the program originally
> > written by Michel Lespinasse.
>
> Thanks for looking at getting the futex performance tests into perf.
> Just wanted to make sure you are aware that there will likely be several
> more futextest/performance tests in the near future as the project is
> under early active development. I see you have merged harness.h into
> futex-wait.c, which is fine, but I will likely be adding a new
> include/locking.h to add things like barrier, lock, and lock_pi locking
> primitives to the futextest testsuite. futex-wait.c would then be
> updated to use those directly if feasible and remove the custom locking
> primitives in the test itself. I mention this so you are aware and perf
> futex benchmarks don't get too far out of sync with futextest.
>
> I'd be interested in any ideas you have to make futextest/performance/*
> tests integrate more easily into perf as I'd like to include each new
> test into perf as well.
>
There are two reasons why I changed and merged harness.h into futex-wait.c.
1) Thread numbers are fixed
2) Output style are fixed
But it seems that there is no other problem.
So I wrote the patch making locktest() more general.
I'll post it later in this thread, could you review this?
> Couple of nits below:
>
>
> > diff --git a/tools/perf/bench/futextest.h b/tools/perf/bench/futextest.h
> > new file mode 100644
> > index 0000000..09d8b94
> > --- /dev/null
> > +++ b/tools/perf/bench/futextest.h
> > @@ -0,0 +1,280 @@
> > +/******************************************************************************
> > + *
> > + * Copyright B) International Business Machines Corp., 2009
>
>
> Copyright character issue...
Sorry again...
>
> > + * HISTORY
> > + * 2009-Nov-24:
> > + * Ported to perf by Hitoshi Mitake <[email protected]>
> > + * 2009-Nov-06: Initial version by Darren Hart <[email protected]>
>
> I usually add the dates in ascending chronological order.
Sorry, I'll fix it.
>
> > + *
> > + *****************************************************************************/
> > +
> > +#ifndef _FUTEXTEST_H
> > +#define _FUTEXTEST_H
> > +
> > +#include <unistd.h>
> > +#include <sys/syscall.h>
> > +#include <sys/types.h>
> > +#include <linux/futex.h>
> > +
> > +typedef volatile u_int32_t futex_t;
>
> I have been considering making this into a val wrapped in a struct like
> the atomic_t as that will make adding things like flags to the futex_t
> easier. Again, just a head's up that it may be changing in the near future.
Of course I'll chase your changes in the future :)
Thanks
Hitoshi
This patch adds new function locktest_preset() to harness.h,
which is a wrapper for locktest().
Current locktest() has array of threads number for test,
but this makes usefulness of harness.h less.
So preset arrays should go to outside of locktest().
locktest_preset() has preset array now.
And every output has gone to locktest_preset() too.
Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
performance/futex_wait.c | 2 +-
performance/harness.h | 98 +++++++++++++++++++++++++++-------------------
2 files changed, 59 insertions(+), 41 deletions(-)
diff --git a/performance/futex_wait.c b/performance/futex_wait.c
index bcbca77..62a7330 100644
--- a/performance/futex_wait.c
+++ b/performance/futex_wait.c
@@ -46,6 +46,6 @@ static void futex_wait_test(futex_t *futex, int loops)
int main (void)
{
printf("FUTEX_WAIT test\n");
- locktest(futex_wait_test, 100000000);
+ locktest_preset(futex_wait_test, 100000000);
return 0;
}
diff --git a/performance/harness.h b/performance/harness.h
index 7b23bdc..d08d31d 100644
--- a/performance/harness.h
+++ b/performance/harness.h
@@ -4,8 +4,9 @@
#include <limits.h>
#include <sys/times.h>
#include <stdio.h>
+#include <stdlib.h>
#include <pthread.h>
-
+#include <string.h>
struct thread_barrier {
futex_t threads;
@@ -64,54 +65,71 @@ static void * locktest_thread(void * dummy)
return NULL;
}
-static void locktest(void locktest_function(futex_t * ptr, int loops),
- int iterations)
+static int locktest(void locktest_function(futex_t * ptr, int loops),
+ int num_threads, int iterations,
+ struct tms *tms_before, struct tms *tms_after)
+{
+ struct locktest_shared shared;
+ pthread_t thread[num_threads];
+ int i;
+ clock_t before, after;
+
+ barrier_init(&shared.barrier_before, num_threads);
+ barrier_init(&shared.barrier_after, num_threads);
+ shared.locktest_function = locktest_function;
+ shared.loops = iterations / num_threads;
+ shared.futex = 0;
+
+ for (i = 0; i < num_threads; i++)
+ if (pthread_create(thread + i, NULL, locktest_thread,
+ &shared)) {
+ /* Could not create thread; abort */
+ barrier_unblock(&shared.barrier_before, -1);
+ while (--i >= 0)
+ pthread_join(thread[i], NULL);
+ return -1;
+ }
+ barrier_wait(&shared.barrier_before);
+ before = times(tms_before);
+ barrier_unblock(&shared.barrier_before, 1);
+ barrier_wait(&shared.barrier_after);
+ after = times(tms_after);
+ barrier_unblock(&shared.barrier_after, 1);
+ for (i = 0; i < num_threads; i++)
+ pthread_join(thread[i], NULL);
+ return after - before;
+}
+
+static void locktest_preset(void locktest_function(futex_t * ptr, int loops),
+ int iterations)
{
int threads[] = { 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 24, 32,
64, 128, 256, 512, 1024, 0 };
int t;
+ struct tms before, after;
+ int wall, user, system;
+ double tick;
+
for (t = 0; threads[t]; t++) {
- int num_threads = threads[t];
- struct locktest_shared shared;
- pthread_t thread[num_threads];
- int i;
- clock_t before, after;
- struct tms tms_before, tms_after;
- int wall, user, system;
- double tick;
-
- barrier_init(&shared.barrier_before, num_threads);
- barrier_init(&shared.barrier_after, num_threads);
- shared.locktest_function = locktest_function;
- shared.loops = iterations / num_threads;
- shared.futex = 0;
-
- for (i = 0; i < num_threads; i++)
- if (pthread_create(thread + i, NULL, locktest_thread,
- &shared)) {
- /* Could not create thread; abort */
- barrier_unblock(&shared.barrier_before, -1);
- while (--i >= 0)
- pthread_join(thread[i], NULL);
- return;
- }
- barrier_wait(&shared.barrier_before);
- before = times(&tms_before);
- barrier_unblock(&shared.barrier_before, 1);
- barrier_wait(&shared.barrier_after);
- after = times(&tms_after);
- wall = after - before;
- user = tms_after.tms_utime - tms_before.tms_utime;
- system = tms_after.tms_stime - tms_before.tms_stime;
+ bzero(&before, sizeof(struct tms));
+ bzero(&after, sizeof(struct tms));
+ wall = locktest(locktest_function, threads[t], iterations,
+ &before, &after);
+
+ if (wall < 0) {
+ fprintf(stderr, "locktest() failed.\n");
+ exit(1);
+ }
+
+ user = after.tms_utime - before.tms_utime;
+ system = after.tms_stime - before.tms_stime;
tick = 1.0 / sysconf(_SC_CLK_TCK);
printf("%d threads: %.0f Kiter/s "
"(%.2fs user %.2fs system %.2fs wall %.2f cores)\n",
- num_threads,
- (num_threads * shared.loops) / (wall * tick * 1000),
+ threads[t],
+ iterations / (wall * tick * 1000),
user * tick, system * tick, wall * tick,
wall ? (user + system) * 1. / wall : 1.);
- barrier_unblock(&shared.barrier_after, 1);
- for (i = 0; i < num_threads; i++)
- pthread_join(thread[i], NULL);
+
}
}
--
1.6.5.2
Peter Zijlstra wrote:
Hey Peter,
Some thoughts on adaptive futexes ...
> Subject: futex: implement adaptive spin
> From: Peter Zijlstra <[email protected]>
> Date: Tue Jan 20 14:40:36 CET 2009
>
> Implement kernel side adaptive spining for futexes.
>
> This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
> requires the futex lock field to contain a TID and regular futexes do
> not have that restriction.
>
> FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a
> different cpu) or until the task gets preempted. After that it behaves
> like FUTEX_WAIT.
>
> The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
> on the lower 30 bits (TID_MASK) of the lock field -- leaving the
> WAITERS and OWNER_DIED flags in tact.
>
> NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
> adding a lock_owner function argument.
>
> void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
> void *lock, struct thread_info *owner);
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ...
> Index: linux-2.6/kernel/futex.c
> ===================================================================
> --- linux-2.6.orig/kernel/futex.c
> +++ linux-2.6/kernel/futex.c
> @@ -1158,10 +1158,40 @@ handle_fault:
> */
> #define FLAGS_SHARED 0x01
> #define FLAGS_CLOCKRT 0x02
> +#define FLAGS_ADAPTIVE 0x03
>
> static long futex_wait_restart(struct restart_block *restart);
>
> -static int futex_wait(u32 __user *uaddr, int fshared,
> +#ifdef CONFIG_SMP
> +struct thread_info *futex_owner(u32 __user *uaddr)
> +{
> + struct thread_info *ti = NULL;
> + struct task_struct *p;
> + pid_t pid;
> + u32 uval;
> +
> + if (get_futex_value_locked(&uval, uaddr))
> + return NULL;
Just give up if it would cause a fault?
> +
> + pid = uval & FUTEX_TID_MASK;
> + rcu_read_lock();
> + p = find_taks_by_vpid(pid);
I'm impressed that you can create such a solid patch without compiling it!
> + if (p) {
> + const struct cred *cred, *pcred;
> +
> + cread = current_cred();
> + pcred = __task_cred(p);
> + if (cred->euid == pcred->euid ||
> + cred->euid == pcred->uid)
> + ti = task_thread_info(p);
> + }
> + rcu_read_unlock();
> +
> + return ti;
> +}
> +#endif
> +
> +static int futex_wait(u32 __user *uaddr, int flags,
> u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
> {
> struct task_struct *curr = current;
> @@ -1176,11 +1206,43 @@ static int futex_wait(u32 __user *uaddr,
> if (!bitset)
> return -EINVAL;
>
> +#ifdef CONFIG_SMP
> + if (!futex_cmpxchg_enabled || !(flags & FLAGS_ADAPTIVE))
> + goto skip_adaptive;
> +
> + preempt_disable();
> + for (;;) {
> + struct thread_info *owner;
> + u32 curval = 0, newval = task_pid_vnr(curr);
Do we need to lookup newval every iteration?
> +
> + owner = futex_owner(uaddr);
> + if (owner && futex_spin_on_owner(uaddr, owner))
> + break;
> +
> + if (get_futex_value_locked(&uval, uaddr))
> + break;
> +
> + curval |= uval & ~FUTEX_TID_MASK;
> + newval |= uval & ~FUTEX_TID_MASK;
> +
> + if (cmpxchg_futex_value_locked(uaddr, curval, newval)
> + == newval)
"== curval" isn't it? futex_atomic_cmpxchg_inatomic() returns the
oldval, not the newval.
> + return 0;
> +
> + if (!owner && (need_resched() || rt_task(curr)))
> + break;
Hrm... why go through the loop at all for an rt_task if we bail on the
first iteration?
> +
> + cpu_relax();
> + }
> + preempt_enable();
> +skip_adaptive:
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
Peter Zijlstra wrote:
> ---
> Subject: futex: implement adaptive spin
> From: Peter Zijlstra <[email protected]>
> Date: Tue Jan 20 14:40:36 CET 2009
>
> Implement kernel side adaptive spining for futexes.
Hi Peter,
I'm working on reconciling the SET_WAIT patch from Michel and your
ADAPTIVE_WAIT patch. So far I've forward ported the adaptive patch and
cleaned up a few things. I'd like your take on the updated adaptive spin
in futex_wait() below.
Also, in the adaptive case, we have to decide how to handle the val !=
uval case after the adaptive spin has failed. Otherwise
futex_wait_setup() will return EWOULDBLOCK. Just checking for the flag
may not be enough if the lock is subsequently released as we should then
acquire it. We face many of the same problems here as we do with
FUTEX_LOCK_PI, except we don't have the rtmutex code to handle some of
them for us.
I'm starting to think this may be best implemented as a new function and
op code, maybe something like FUTEX_LOCK which could take ADAPTIVE as a
flag. FUTEX_LOCK would demux to futex_lock() in do_futex. This op would
define the futex value policy like PI and Robust futexes do. We would
basicaly be implementing a mutex, indicated by the LOCK term (as with
FUTEX_LOCK_PI), as opposed to the WAIT term which is really meant to be
a simple wait queue. This would keep some complexity out of the wait
paths.
>
> This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
> requires the futex lock field to contain a TID and regular futexes do
> not have that restriction.
>
> FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a
> different cpu) or until the task gets preempted. After that it behaves
> like FUTEX_WAIT.
>
> The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
> on the lower 30 bits (TID_MASK) of the lock field -- leaving the
> WAITERS and OWNER_DIED flags in tact.
>
> NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
> adding a lock_owner function argument.
>
> void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
> void *lock, struct thread_info *owner);
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> include/linux/futex.h | 2 +
> include/linux/sched.h | 1
> kernel/futex.c | 96 ++++++++++++++++++++++++++++++++++++++++++--------
> kernel/sched.c | 59 ++++++++++++++++++++++++++++++
> 4 files changed, 143 insertions(+), 15 deletions(-)
>
> ...
> +static int futex_wait(u32 __user *uaddr, int flags,
> u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
> {
> struct task_struct *curr = current;
> @@ -1176,11 +1206,43 @@ static int futex_wait(u32 __user *uaddr,
> if (!bitset)
> return -EINVAL;
>
> +#ifdef CONFIG_SMP
> + if (!futex_cmpxchg_enabled || !(flags & FLAGS_ADAPTIVE))
> + goto skip_adaptive;
> +
> + preempt_disable();
> + for (;;) {
> + struct thread_info *owner;
> + u32 curval = 0, newval = task_pid_vnr(curr);
> +
> + owner = futex_owner(uaddr);
> + if (owner && futex_spin_on_owner(uaddr, owner))
> + break;
> +
> + if (get_futex_value_locked(&uval, uaddr))
> + break;
> +
> + curval |= uval & ~FUTEX_TID_MASK;
> + newval |= uval & ~FUTEX_TID_MASK;
> +
> + if (cmpxchg_futex_value_locked(uaddr, curval, newval)
> + == newval)
> + return 0;
> +
> + if (!owner && (need_resched() || rt_task(curr)))
> + break;
> +
> + cpu_relax();
> + }
> + preempt_enable();
> +skip_adaptive:
> +#endif
> +
Maybe something more like:
#ifdef CONFIG_SMP
if ((flags & FLAGS_ADAPTIVE) && futex_cmpxchg_enabled) {
preempt_disable();
tid = task_pid_vnr(current);
for (;;) {
struct thread_info *owner;
u32 uval, curval, newval;
owner = futex_owner(uaddr);
if (owner && futex_spin_on_owner(uaddr, owner))
break;
if (get_futex_value_locked(&uval, uaddr))
break;
/*
* Preserve robust and waiters bits.
*/
curval = uval & ~FUTEX_TID_MASK;
newval = tid | curval;
if (cmpxchg_futex_value_locked(uaddr, curval, newval)
== curval)
return 0;
/*
* Adaptive futexes manage the owner atomically. We
* are not in danger of deadlock by preempting a pending
* owner. Abort the loop if our time slice has expired.
* RT Threads can spin until they preempt the owner, at
* which point they will break out of the loop anyway.
*/
if (need_resched())
break;
/* Do we need this here? */
cpu_relax();
}
preempt_enable();
}
#endif
Thanks,
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team