2010-04-09 05:16:10

by Darren Hart

Subject: [PATCH V5 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning

NOT FOR INCLUSION

The following patch series implements a new experimental kernel-side futex mutex
via the new FUTEX_LOCK and FUTEX_LOCK_ADAPTIVE futex op codes. The adaptive spin
allows multiple spinners until the lock is released or the owner is
descheduled. The patch currently lets the user specify whether they want
spinning or not, but the final version will provide only the adaptive variant,
as blocking locks are already implemented very efficiently with the existing
operations.
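
For those who haven't followed the earlier versions: the futex value policy
here is 0 for unlocked and the owner's TID for locked, with FUTEX_WAITERS
(and FUTEX_OWNER_DIED) or'd in as appropriate (see PATCH 4/4 below). Here is
a minimal sketch of the intended userspace usage, assuming a patched
linux/futex.h with the op codes from this series and gcc atomic builtins;
the wrapper names are illustrative and error handling is omitted
(performance/futex_lock.c in futextest is the authoritative implementation):

	#include <linux/futex.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static void futex_lock(unsigned int *futex)
	{
		pid_t tid = syscall(SYS_gettid);

		/* Uncontended fastpath: 0 -> TID, no syscall. */
		if (__sync_val_compare_and_swap(futex, 0, tid) != 0)
			/* Contended: the kernel spins on the owner and/or
			 * blocks until we hold the lock. */
			syscall(SYS_futex, futex, FUTEX_LOCK_ADAPTIVE_PRIVATE,
				0, NULL, NULL, 0);
	}

	static void futex_unlock(unsigned int *futex)
	{
		pid_t tid = syscall(SYS_gettid);

		/* Uncontended fastpath: TID -> 0, no syscall. */
		if (__sync_val_compare_and_swap(futex, tid, 0) != tid) {
			/* FUTEX_WAITERS was set: clear the word in userspace
			 * (per the V4 notes in PATCH 4/4), then wake a waiter. */
			__sync_lock_test_and_set(futex, 0);
			syscall(SYS_futex, futex, FUTEX_WAKE_PRIVATE,
				1, NULL, NULL, 0);
		}
	}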

This version greatly outperforms the last, and actually shows some benefit from
adaptive locks! The primary difference between the implementations is the
removal of any attempt to limit the number of spinners, as we have no means to
track spinners. This version also allows spinners to continue spinning if the
owner of a lock changes, aborting only when the current owner deschedules.
Aborting the spin whenever the owner changed proved ineffective: this locking
mechanism does a lot of lock stealing, so many of the spin loops lasted a
single cycle and then moved on to block. Note that the adaptive runs can make
significantly more syscalls than the non-adaptive runs and still show a
significant improvement, as the results below demonstrate.

Normal Lock
===========
# ./futex_lock -i100000000 -n8 -p1000 -d10
futex_lock: Measure FUTEX_LOCK operations per second
Arguments: iterations=100000000 threads=8 adaptive=no
period=1000 duty-cycle=10%
Result: 606 Kiter/s
lock calls: 100000000
lock syscalls: 35729444 (35.73%)
unlock calls: 100000000
unlock syscalls: 41740294 (41.74%)

Adaptive Lock
=============
# ./futex_lock -i100000000 -n8 -p1000 -d10 -a
futex_lock: Measure FUTEX_LOCK operations per second
Arguments: iterations=100000000 threads=8 adaptive=yes
period=1000 duty-cycle=10%
Result: 749 Kiter/s
lock calls: 100000000
lock syscalls: 72257679 (72.26%)
unlock calls: 100000000
unlock syscalls: 72265869 (72.27%)

I'm using the futex_lock branch of my futextest test suite to gather results.
The test is performance/futex_lock.c and can be checked out here:

git://git.kernel.org/pub/scm/linux/kernel/git/dvhart/futextest.git

At Avi's suggestion, I prepared plots for multiple thread counts; they are
available at the URL below. These plots are all from runs with a period of 1000
instructions, with one plot per each of several duty cycles.

http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v5/1000-period/plots.html

Now that an advantage can be shown using FUTEX_LOCK_ADAPTIVE over FUTEX_LOCK,
the next steps as I see them are:

o Try to show an improvement of FUTEX_LOCK_ADAPTIVE over FUTEX_WAIT-based
implementations (pthread_mutex specifically).

o Improve the workload definition. The current workload is a CPU hog: it runs a
fixed set of instructions in a loop, with some percentage of those contained
within the lock/unlock block (a sketch follows this list). Adding sleeps to
reduce CPU overhead just added so much scheduling overhead that the numbers
dropped absurdly for both normal and adaptive. I'd appreciate any assistance in
preparing a test case for a real-world usage scenario. I will work with some of
IBM's product teams to do so, and would welcome input from others with an
interest in kernel-assisted userspace spinlocks.

o Attempt the kernel-assisted userspace spinning version proposed by Avi, Alan,
and Ulrich for comparison.
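
As a reference for the workload item above, here is a minimal sketch of the
duty-cycle loop as described (my reconstruction, not a copy of
performance/futex_lock.c; the names and exact instruction mix are
illustrative, and futex_lock()/futex_unlock() are the userspace wrappers
sketched near the top of this mail):

	/* Hypothetical cpu-hog workload: each iteration burns roughly
	 * 'period' units of work, 'duty_cycle' percent of them while
	 * holding the lock. */
	static void locktest(unsigned int *futex, long iterations,
			     int period, int duty_cycle)
	{
		int in_lock = period * duty_cycle / 100;
		volatile long x = 0;
		long i;
		int j;

		for (i = 0; i < iterations; i++) {
			futex_lock(futex);
			for (j = 0; j < in_lock; j++)
				x++;		/* work under the lock */
			futex_unlock(futex);
			for (j = 0; j < period - in_lock; j++)
				x++;		/* work outside the lock */
		}
	}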

Thanks,

Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


2010-04-09 05:16:37

by Darren Hart

Subject: [PATCH 1/4] futex: replace fshared and clockrt with combined flags

From: Darren Hart <[email protected]>

In the early days we passed the mmap semaphore around. That became
"int fshared" with the fast-gup improvements. Then we added
"int clockrt" in places. This patch unifies these options as "flags" and
cleans up the various calls that had fshared in their signature but no
longer used it.

V4: Fix an operator precedence bug

Signed-off-by: Darren Hart <[email protected]>
---
kernel/futex.c | 215 +++++++++++++++++++++++++++-----------------------------
1 files changed, 104 insertions(+), 111 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e7a35f1..ed08cfd 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -69,6 +69,14 @@ int __read_mostly futex_cmpxchg_enabled;
#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)

/*
+ * Futex flags used to encode options to functions and preserve them across
+ * restarts.
+ */
+#define FLAGS_SHARED 0x01
+#define FLAGS_CLOCKRT 0x02
+#define FLAGS_HAS_TIMEOUT 0x04
+
+/*
* Priority Inheritance state:
*/
struct futex_pi_state {
@@ -283,7 +291,7 @@ again:
}

static inline
-void put_futex_key(int fshared, union futex_key *key)
+void put_futex_key(union futex_key *key)
{
drop_futex_key_refs(key);
}
@@ -878,7 +886,7 @@ double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
/*
* Wake up waiters matching bitset queued on this futex (uaddr).
*/
-static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
+static int futex_wake(u32 __user *uaddr, int flags, int nr_wake, u32 bitset)
{
struct futex_hash_bucket *hb;
struct futex_q *this, *next;
@@ -889,7 +897,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
if (!bitset)
return -EINVAL;

- ret = get_futex_key(uaddr, fshared, &key);
+ ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key);
if (unlikely(ret != 0))
goto out;

@@ -915,7 +923,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
}

spin_unlock(&hb->lock);
- put_futex_key(fshared, &key);
+ put_futex_key(&key);
out:
return ret;
}
@@ -925,7 +933,7 @@ out:
* to this virtual address:
*/
static int
-futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
+futex_wake_op(u32 __user *uaddr1, int flags, u32 __user *uaddr2,
int nr_wake, int nr_wake2, int op)
{
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
@@ -935,10 +943,10 @@ futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
int ret, op_ret;

retry:
- ret = get_futex_key(uaddr1, fshared, &key1);
+ ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1);
if (unlikely(ret != 0))
goto out;
- ret = get_futex_key(uaddr2, fshared, &key2);
+ ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
if (unlikely(ret != 0))
goto out_put_key1;

@@ -970,11 +978,11 @@ retry_private:
if (ret)
goto out_put_keys;

- if (!fshared)
+ if (!(flags & FLAGS_SHARED))
goto retry_private;

- put_futex_key(fshared, &key2);
- put_futex_key(fshared, &key1);
+ put_futex_key(&key2);
+ put_futex_key(&key1);
goto retry;
}

@@ -1004,9 +1012,9 @@ retry_private:

double_unlock_hb(hb1, hb2);
out_put_keys:
- put_futex_key(fshared, &key2);
+ put_futex_key(&key2);
out_put_key1:
- put_futex_key(fshared, &key1);
+ put_futex_key(&key1);
out:
return ret;
}
@@ -1140,11 +1148,13 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,

/**
* futex_requeue() - Requeue waiters from uaddr1 to uaddr2
- * uaddr1: source futex user address
- * uaddr2: target futex user address
- * nr_wake: number of waiters to wake (must be 1 for requeue_pi)
- * nr_requeue: number of waiters to requeue (0-INT_MAX)
- * requeue_pi: if we are attempting to requeue from a non-pi futex to a
+ * @uaddr1: source futex user address
+ * @flags: futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, etc.), they must be
+ * the same type, no requeueing from private to shared, etc.
+ * @uaddr2: target futex user address
+ * @nr_wake: number of waiters to wake (must be 1 for requeue_pi)
+ * @nr_requeue: number of waiters to requeue (0-INT_MAX)
+ * @requeue_pi: if we are attempting to requeue from a non-pi futex to a
* pi futex (pi to pi requeue is not supported)
*
* Requeue waiters on uaddr1 to uaddr2. In the requeue_pi case, try to acquire
@@ -1154,7 +1164,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
* >=0 - on success, the number of tasks requeued or woken
* <0 - on error
*/
-static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
+static int futex_requeue(u32 __user *uaddr1, int flags, u32 __user *uaddr2,
int nr_wake, int nr_requeue, u32 *cmpval,
int requeue_pi)
{
@@ -1197,10 +1207,10 @@ retry:
pi_state = NULL;
}

- ret = get_futex_key(uaddr1, fshared, &key1);
+ ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1);
if (unlikely(ret != 0))
goto out;
- ret = get_futex_key(uaddr2, fshared, &key2);
+ ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
if (unlikely(ret != 0))
goto out_put_key1;

@@ -1222,11 +1232,11 @@ retry_private:
if (ret)
goto out_put_keys;

- if (!fshared)
+ if (!(flags & FLAGS_SHARED))
goto retry_private;

- put_futex_key(fshared, &key2);
- put_futex_key(fshared, &key1);
+ put_futex_key(&key2);
+ put_futex_key(&key1);
goto retry;
}
if (curval != *cmpval) {
@@ -1266,8 +1276,8 @@ retry_private:
break;
case -EFAULT:
double_unlock_hb(hb1, hb2);
- put_futex_key(fshared, &key2);
- put_futex_key(fshared, &key1);
+ put_futex_key(&key2);
+ put_futex_key(&key1);
ret = fault_in_user_writeable(uaddr2);
if (!ret)
goto retry;
@@ -1275,8 +1285,8 @@ retry_private:
case -EAGAIN:
/* The owner was exiting, try again. */
double_unlock_hb(hb1, hb2);
- put_futex_key(fshared, &key2);
- put_futex_key(fshared, &key1);
+ put_futex_key(&key2);
+ put_futex_key(&key1);
cond_resched();
goto retry;
default:
@@ -1358,9 +1368,9 @@ out_unlock:
drop_futex_key_refs(&key1);

out_put_keys:
- put_futex_key(fshared, &key2);
+ put_futex_key(&key2);
out_put_key1:
- put_futex_key(fshared, &key1);
+ put_futex_key(&key1);
out:
if (pi_state != NULL)
free_pi_state(pi_state);
@@ -1500,7 +1510,7 @@ static void unqueue_me_pi(struct futex_q *q)
* private futexes.
*/
static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
- struct task_struct *newowner, int fshared)
+ struct task_struct *newowner)
{
u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
struct futex_pi_state *pi_state = q->pi_state;
@@ -1593,20 +1603,11 @@ handle_fault:
goto retry;
}

-/*
- * In case we must use restart_block to restart a futex_wait,
- * we encode in the 'flags' shared capability
- */
-#define FLAGS_SHARED 0x01
-#define FLAGS_CLOCKRT 0x02
-#define FLAGS_HAS_TIMEOUT 0x04
-
static long futex_wait_restart(struct restart_block *restart);

/**
* fixup_owner() - Post lock pi_state and corner case management
* @uaddr: user address of the futex
- * @fshared: whether the futex is shared (1) or not (0)
* @q: futex_q (contains pi_state and access to the rt_mutex)
* @locked: if the attempt to take the rt_mutex succeeded (1) or not (0)
*
@@ -1619,8 +1620,7 @@ static long futex_wait_restart(struct restart_block *restart);
* 0 - success, lock not taken
* <0 - on error (-EFAULT)
*/
-static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q,
- int locked)
+static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
{
struct task_struct *owner;
int ret = 0;
@@ -1631,7 +1631,7 @@ static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q,
* did a lock-steal - fix up the PI-state in that case:
*/
if (q->pi_state->owner != current)
- ret = fixup_pi_state_owner(uaddr, q, current, fshared);
+ ret = fixup_pi_state_owner(uaddr, q, current);
goto out;
}

@@ -1658,7 +1658,7 @@ static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q,
* lock. Fix the state up.
*/
owner = rt_mutex_owner(&q->pi_state->pi_mutex);
- ret = fixup_pi_state_owner(uaddr, q, owner, fshared);
+ ret = fixup_pi_state_owner(uaddr, q, owner);
goto out;
}

@@ -1721,7 +1721,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* futex_wait_setup() - Prepare to wait on a futex
* @uaddr: the futex userspace address
* @val: the expected value
- * @fshared: whether the futex is shared (1) or not (0)
+ * @flags: futex flags (FLAGS_SHARED, etc.)
* @q: the associated futex_q
* @hb: storage for hash_bucket pointer to be returned to caller
*
@@ -1734,7 +1734,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* 0 - uaddr contains val and hb has been locked
* <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked
*/
-static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
+static int futex_wait_setup(u32 __user *uaddr, u32 val, int flags,
struct futex_q *q, struct futex_hash_bucket **hb)
{
u32 uval;
@@ -1759,7 +1759,7 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
*/
retry:
q->key = FUTEX_KEY_INIT;
- ret = get_futex_key(uaddr, fshared, &q->key);
+ ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key);
if (unlikely(ret != 0))
return ret;

@@ -1775,10 +1775,10 @@ retry_private:
if (ret)
goto out;

- if (!fshared)
+ if (!(flags & FLAGS_SHARED))
goto retry_private;

- put_futex_key(fshared, &q->key);
+ put_futex_key(&q->key);
goto retry;
}

@@ -1789,12 +1789,12 @@ retry_private:

out:
if (ret)
- put_futex_key(fshared, &q->key);
+ put_futex_key(&q->key);
return ret;
}

-static int futex_wait(u32 __user *uaddr, int fshared,
- u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
+static int futex_wait(u32 __user *uaddr, int flags, u32 val, ktime_t *abs_time,
+ u32 bitset)
{
struct hrtimer_sleeper timeout, *to = NULL;
struct restart_block *restart;
@@ -1813,8 +1813,9 @@ static int futex_wait(u32 __user *uaddr, int fshared,
if (abs_time) {
to = &timeout;

- hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
- CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+ hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
+ CLOCK_REALTIME : CLOCK_MONOTONIC,
+ HRTIMER_MODE_ABS);
hrtimer_init_sleeper(to, current);
hrtimer_set_expires_range_ns(&to->timer, *abs_time,
current->timer_slack_ns);
@@ -1822,7 +1823,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,

retry:
/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
if (ret)
goto out;

@@ -1842,7 +1843,7 @@ retry:
* victim of a spurious wakeup as well.
*/
if (!signal_pending(current)) {
- put_futex_key(fshared, &q.key);
+ put_futex_key(&q.key);
goto retry;
}

@@ -1856,17 +1857,12 @@ retry:
restart->futex.val = val;
restart->futex.time = abs_time->tv64;
restart->futex.bitset = bitset;
- restart->futex.flags = FLAGS_HAS_TIMEOUT;
-
- if (fshared)
- restart->futex.flags |= FLAGS_SHARED;
- if (clockrt)
- restart->futex.flags |= FLAGS_CLOCKRT;
+ restart->futex.flags = flags;

ret = -ERESTART_RESTARTBLOCK;

out_put_key:
- put_futex_key(fshared, &q.key);
+ put_futex_key(&q.key);
out:
if (to) {
hrtimer_cancel(&to->timer);
@@ -1879,7 +1875,6 @@ out:
static long futex_wait_restart(struct restart_block *restart)
{
u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
- int fshared = 0;
ktime_t t, *tp = NULL;

if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
@@ -1887,11 +1882,9 @@ static long futex_wait_restart(struct restart_block *restart)
tp = &t;
}
restart->fn = do_no_restart_syscall;
- if (restart->futex.flags & FLAGS_SHARED)
- fshared = 1;
- return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
- restart->futex.bitset,
- restart->futex.flags & FLAGS_CLOCKRT);
+
+ return (long)futex_wait(uaddr, restart->futex.flags,
+ restart->futex.val, tp, restart->futex.bitset);
}


@@ -1901,8 +1894,8 @@ static long futex_wait_restart(struct restart_block *restart)
* if there are waiters then it will block, it does PI, etc. (Due to
* races the kernel might see a 0 value of the futex too.)
*/
-static int futex_lock_pi(u32 __user *uaddr, int fshared,
- int detect, ktime_t *time, int trylock)
+static int futex_lock_pi(u32 __user *uaddr, int flags, int detect,
+ ktime_t *time, int trylock)
{
struct hrtimer_sleeper timeout, *to = NULL;
struct futex_hash_bucket *hb;
@@ -1925,7 +1918,7 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
q.requeue_pi_key = NULL;
retry:
q.key = FUTEX_KEY_INIT;
- ret = get_futex_key(uaddr, fshared, &q.key);
+ ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
if (unlikely(ret != 0))
goto out;

@@ -1947,7 +1940,7 @@ retry_private:
* exit to complete.
*/
queue_unlock(&q, hb);
- put_futex_key(fshared, &q.key);
+ put_futex_key(&q.key);
cond_resched();
goto retry;
default:
@@ -1977,7 +1970,7 @@ retry_private:
* Fixup the pi_state owner and possibly acquire the lock if we
* haven't already.
*/
- res = fixup_owner(uaddr, fshared, &q, !ret);
+ res = fixup_owner(uaddr, &q, !ret);
/*
* If fixup_owner() returned an error, propagate that. If it acquired
* the lock, clear our -ETIMEDOUT or -EINTR.
@@ -2001,7 +1994,7 @@ out_unlock_put_key:
queue_unlock(&q, hb);

out_put_key:
- put_futex_key(fshared, &q.key);
+ put_futex_key(&q.key);
out:
if (to)
destroy_hrtimer_on_stack(&to->timer);
@@ -2014,10 +2007,10 @@ uaddr_faulted:
if (ret)
goto out_put_key;

- if (!fshared)
+ if (!(flags & FLAGS_SHARED))
goto retry_private;

- put_futex_key(fshared, &q.key);
+ put_futex_key(&q.key);
goto retry;
}

@@ -2026,7 +2019,7 @@ uaddr_faulted:
* This is the in-kernel slowpath: we look up the PI state (if any),
* and do the rt-mutex unlock.
*/
-static int futex_unlock_pi(u32 __user *uaddr, int fshared)
+static int futex_unlock_pi(u32 __user *uaddr, int flags)
{
struct futex_hash_bucket *hb;
struct futex_q *this, *next;
@@ -2044,7 +2037,7 @@ retry:
if ((uval & FUTEX_TID_MASK) != task_pid_vnr(current))
return -EPERM;

- ret = get_futex_key(uaddr, fshared, &key);
+ ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key);
if (unlikely(ret != 0))
goto out;

@@ -2099,14 +2092,14 @@ retry:

out_unlock:
spin_unlock(&hb->lock);
- put_futex_key(fshared, &key);
+ put_futex_key(&key);

out:
return ret;

pi_faulted:
spin_unlock(&hb->lock);
- put_futex_key(fshared, &key);
+ put_futex_key(&key);

ret = fault_in_user_writeable(uaddr);
if (!ret)
@@ -2166,7 +2159,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
/**
* futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
* @uaddr: the futex we initially wait on (non-pi)
- * @fshared: whether the futexes are shared (1) or not (0). They must be
+ * @flags: futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, etc.), they must be
* the same type, no requeueing from private to shared, etc.
* @val: the expected value of uaddr
* @abs_time: absolute timeout
@@ -2204,9 +2197,9 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
* 0 - On success
* <0 - On error
*/
-static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
+static int futex_wait_requeue_pi(u32 __user *uaddr, int flags,
u32 val, ktime_t *abs_time, u32 bitset,
- int clockrt, u32 __user *uaddr2)
+ u32 __user *uaddr2)
{
struct hrtimer_sleeper timeout, *to = NULL;
struct rt_mutex_waiter rt_waiter;
@@ -2221,8 +2214,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,

if (abs_time) {
to = &timeout;
- hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
- CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+ hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
+ CLOCK_REALTIME : CLOCK_MONOTONIC,
+ HRTIMER_MODE_ABS);
hrtimer_init_sleeper(to, current);
hrtimer_set_expires_range_ns(&to->timer, *abs_time,
current->timer_slack_ns);
@@ -2236,7 +2230,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
rt_waiter.task = NULL;

key2 = FUTEX_KEY_INIT;
- ret = get_futex_key(uaddr2, fshared, &key2);
+ ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
if (unlikely(ret != 0))
goto out;

@@ -2246,7 +2240,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
q.requeue_pi_key = &key2;

/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
if (ret)
goto out_key2;

@@ -2274,8 +2268,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
*/
if (q.pi_state && (q.pi_state->owner != current)) {
spin_lock(q.lock_ptr);
- ret = fixup_pi_state_owner(uaddr2, &q, current,
- fshared);
+ ret = fixup_pi_state_owner(uaddr2, &q, current);
spin_unlock(q.lock_ptr);
}
} else {
@@ -2294,7 +2287,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
* Fixup the pi_state owner and possibly acquire the lock if we
* haven't already.
*/
- res = fixup_owner(uaddr2, fshared, &q, !ret);
+ res = fixup_owner(uaddr2, &q, !ret);
/*
* If fixup_owner() returned an error, propagate that. If it
* acquired the lock, clear -ETIMEDOUT or -EINTR.
@@ -2325,9 +2318,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
}

out_put_keys:
- put_futex_key(fshared, &q.key);
+ put_futex_key(&q.key);
out_key2:
- put_futex_key(fshared, &key2);
+ put_futex_key(&key2);

out:
if (to) {
@@ -2551,58 +2544,58 @@ void exit_robust_list(struct task_struct *curr)
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
- int clockrt, ret = -ENOSYS;
+ int ret = -ENOSYS;
int cmd = op & FUTEX_CMD_MASK;
- int fshared = 0;
+ int flags = 0;

if (!(op & FUTEX_PRIVATE_FLAG))
- fshared = 1;
+ flags |= FLAGS_SHARED;

- clockrt = op & FUTEX_CLOCK_REALTIME;
- if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
- return -ENOSYS;
+ if (op & FUTEX_CLOCK_REALTIME) {
+ flags |= FLAGS_CLOCKRT;
+ if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+ return -ENOSYS;
+ }

switch (cmd) {
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAIT_BITSET:
- ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
+ ret = futex_wait(uaddr, flags, val, timeout, val3);
break;
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAKE_BITSET:
- ret = futex_wake(uaddr, fshared, val, val3);
+ ret = futex_wake(uaddr, flags, val, val3);
break;
case FUTEX_REQUEUE:
- ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0);
+ ret = futex_requeue(uaddr, flags, uaddr2, val, val2, NULL, 0);
break;
case FUTEX_CMP_REQUEUE:
- ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
- 0);
+ ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 0);
break;
case FUTEX_WAKE_OP:
- ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
+ ret = futex_wake_op(uaddr, flags, uaddr2, val, val2, val3);
break;
case FUTEX_LOCK_PI:
if (futex_cmpxchg_enabled)
- ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
+ ret = futex_lock_pi(uaddr, flags, val, timeout, 0);
break;
case FUTEX_UNLOCK_PI:
if (futex_cmpxchg_enabled)
- ret = futex_unlock_pi(uaddr, fshared);
+ ret = futex_unlock_pi(uaddr, flags);
break;
case FUTEX_TRYLOCK_PI:
if (futex_cmpxchg_enabled)
- ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1);
+ ret = futex_lock_pi(uaddr, flags, 0, timeout, 1);
break;
case FUTEX_WAIT_REQUEUE_PI:
val3 = FUTEX_BITSET_MATCH_ANY;
- ret = futex_wait_requeue_pi(uaddr, fshared, val, timeout, val3,
- clockrt, uaddr2);
+ ret = futex_wait_requeue_pi(uaddr, flags, val, timeout, val3,
+ uaddr2);
break;
case FUTEX_CMP_REQUEUE_PI:
- ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
- 1);
+ ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
break;
default:
ret = -ENOSYS;
--
1.6.3.3

2010-04-09 05:17:26

by Darren Hart

Subject: [PATCH 2/4] futex: add futex_q static initializer

From: Darren Hart <[email protected]>

The futex_q struct has grown considerably over the last year or so. I
believe it now merits a static initializer to avoid uninitialized data
errors (having just spent more time than I care to admit debugging
an uninitialized q.bitset in an experimental new op code).

I originally planned on following the __THING_INITIALIZER/DECLARE_THING
method, but since we already had FUTEX_KEY_INIT, and I personally prefer
that method, I went that route.

Signed-off-by: Darren Hart <[email protected]>
---
kernel/futex.c | 19 +++++++++----------
1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index ed08cfd..2ae18cd 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -130,6 +130,12 @@ struct futex_q {
u32 bitset;
};

+#define FUTEX_Q_INIT \
+ { /* list gets initialized in queue_me()*/ \
+ .task = NULL, NULL, FUTEX_KEY_INIT \
+ , NULL, NULL, NULL, FUTEX_BITSET_MATCH_ANY }
+
+
/*
* Hash buckets are shared by all the futex_keys that hash to the same
* location. Each key may have multiple futex_q structures, one for each task
@@ -1799,16 +1805,13 @@ static int futex_wait(u32 __user *uaddr, int flags, u32 val, ktime_t *abs_time,
struct hrtimer_sleeper timeout, *to = NULL;
struct restart_block *restart;
struct futex_hash_bucket *hb;
- struct futex_q q;
+ struct futex_q q = FUTEX_Q_INIT;
int ret;

if (!bitset)
return -EINVAL;

- q.pi_state = NULL;
q.bitset = bitset;
- q.rt_waiter = NULL;
- q.requeue_pi_key = NULL;

if (abs_time) {
to = &timeout;
@@ -1899,7 +1902,7 @@ static int futex_lock_pi(u32 __user *uaddr, int flags, int detect,
{
struct hrtimer_sleeper timeout, *to = NULL;
struct futex_hash_bucket *hb;
- struct futex_q q;
+ struct futex_q q = FUTEX_Q_INIT;
int res, ret;

if (refill_pi_state_cache())
@@ -1913,9 +1916,6 @@ static int futex_lock_pi(u32 __user *uaddr, int flags, int detect,
hrtimer_set_expires(&to->timer, *time);
}

- q.pi_state = NULL;
- q.rt_waiter = NULL;
- q.requeue_pi_key = NULL;
retry:
q.key = FUTEX_KEY_INIT;
ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
@@ -2206,7 +2206,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int flags,
struct rt_mutex *pi_mutex = NULL;
struct futex_hash_bucket *hb;
union futex_key key2;
- struct futex_q q;
+ struct futex_q q = FUTEX_Q_INIT;
int res, ret;

if (!bitset)
@@ -2234,7 +2234,6 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int flags,
if (unlikely(ret != 0))
goto out;

- q.pi_state = NULL;
q.bitset = bitset;
q.rt_waiter = &rt_waiter;
q.requeue_pi_key = &key2;
--
1.6.3.3

2010-04-09 05:17:32

by Darren Hart

Subject: [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning

From: Darren Hart <[email protected]>

Add a non-pi, TID-value-based futex locking mechanism. This enables the
use of adaptive spinning, which was problematic with the basic FUTEX_WAIT
operation.

V5: Handle timeout inside adaptive lock spin (needs optimization)
Fix task_struct ref counting
Allow CLOCK_RT and default to CLOCK_MONOTONIC for FUTEX_LOCK
Cleanup futex_lock ret handling (EINTR) cruft
Remove notion of aggressive adaptive spinning
Continue spinning if owner changes, only abort spin if owner deschedules
V4: Remove update_futex_waiters() and perform all waiter clearing in
userspace. Turns out to be generally better to do an O(1) operation in
userspace that forces one extra syscall than to do an O(N) operation during
every syscall. Go figure :).
Cleanup timeout when exiting adaptive loop.
Perform adaptive spinning after retry:
General cleanups.
V3: Fix some locking bugs
Use cmpxchg directly in the adaptive loop
Move the adaptive loop into its own function
Use FUTEX_WAIT instead of FUTEX_UNLOCK
Add update_futex_waiters()
V2: Fix preempt-count breakage

This patch is a forward port of the following patch from Peter Zijlstra
with some bug fixes and a reworking of the futex op codes.

> futex: Implement kernel-side adaptive spinning for futexes
>
> This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
> requires the futex lock field to contain a TID and regular futexes do
> not have that restriction.
>
> FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a
> different cpu) or until the task gets preempted. After that it behaves
> like FUTEX_WAIT.
>
> The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
> on the lower 30 bits (TID_MASK) of the lock field -- leaving the
> WAITERS and OWNER_DIED flags intact.
>
> NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
> adding a lock_owner function argument.
>
> void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
> void *lock, struct thread_info *owner);
>
> Peter Zijlstra's original patch forward ported by Darren Hart.
>

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Darren Hart <[email protected]>
---
include/linux/futex.h | 6 +
include/linux/sched.h | 1 +
kernel/futex.c | 311 +++++++++++++++++++++++++++++++++++++++++++------
kernel/sched.c | 66 +++++++++++
4 files changed, 349 insertions(+), 35 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 1e5a26d..3ca93d1 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -20,6 +20,8 @@
#define FUTEX_WAKE_BITSET 10
#define FUTEX_WAIT_REQUEUE_PI 11
#define FUTEX_CMP_REQUEUE_PI 12
+#define FUTEX_LOCK 13
+#define FUTEX_LOCK_ADAPTIVE 14

#define FUTEX_PRIVATE_FLAG 128
#define FUTEX_CLOCK_REALTIME 256
@@ -39,6 +41,9 @@
FUTEX_PRIVATE_FLAG)
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PRIVATE (FUTEX_LOCK | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_ADAPTIVE_PRIVATE (FUTEX_LOCK_ADAPTIVE | \
+ FUTEX_PRIVATE_FLAG)

/*
* Support for robust futexes: the kernel cleans up held futexes at
@@ -177,6 +182,7 @@ union futex_key {
extern void exit_robust_list(struct task_struct *curr);
extern void exit_pi_state_list(struct task_struct *curr);
extern int futex_cmpxchg_enabled;
+extern struct thread_info *futex_owner(u32 __user *uaddr);
#else
static inline void exit_robust_list(struct task_struct *curr)
{
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dad7f66..885d659 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -362,6 +362,7 @@ extern signed long schedule_timeout_killable(signed long timeout);
extern signed long schedule_timeout_uninterruptible(signed long timeout);
asmlinkage void schedule(void);
extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner);

struct nsproxy;
struct user_namespace;
diff --git a/kernel/futex.c b/kernel/futex.c
index 8c1bb16..2b757b7 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -75,6 +75,7 @@ int __read_mostly futex_cmpxchg_enabled;
#define FLAGS_SHARED 0x01
#define FLAGS_CLOCKRT 0x02
#define FLAGS_HAS_TIMEOUT 0x04
+#define FLAGS_ADAPTIVE 0x08

/*
* Priority Inheritance state:
@@ -368,6 +369,43 @@ static int get_futex_value_locked(u32 *dest, u32 __user *from)
return ret ? -EFAULT : 0;
}

+#ifdef CONFIG_SMP
+/**
+ * futex_owner() - Look up the thread_info struct of the futex owner
+ * @uaddr: The user address containing the TID of the owner
+ *
+ * If a non-NULL ti is returned, the caller must call
+ * put_task_struct(ti->task).
+ */
+struct thread_info *futex_owner(u32 __user *uaddr)
+{
+ struct thread_info *ti = NULL;
+ struct task_struct *p;
+ pid_t pid;
+ u32 uval;
+
+ if (get_futex_value_locked(&uval, uaddr))
+ return NULL;
+
+ pid = uval & FUTEX_TID_MASK;
+ rcu_read_lock();
+ p = find_task_by_vpid(pid);
+ if (p) {
+ const struct cred *cred, *pcred;
+
+ cred = current_cred();
+ pcred = __task_cred(p);
+ if (cred->euid == pcred->euid || cred->euid == pcred->uid) {
+ ti = task_thread_info(p);
+ get_task_struct(ti->task);
+ }
+ }
+ rcu_read_unlock();
+
+ return ti;
+}
+#endif
+

/*
* PI code:
@@ -717,7 +755,7 @@ retry:
* @hb: the pi futex hash bucket
* @key: the futex key associated with uaddr and hb
* @ps: the pi_state pointer where we store the result of the
- * lookup
+ * lookup. If non-NULL, hb and key must also be non-NULL.
* @task: the task to perform the atomic lock work for. This will
* be "current" except in the case of requeue pi.
* @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0)
@@ -727,7 +765,8 @@ retry:
* 1 - acquired the lock
* <0 - error
*
- * The hb->lock and futex_key refs shall be held by the caller.
+ * The hb->lock and futex_key refs shall be held by the caller if the pi_state
+ * pointer is non-NULL.
*/
static int lock_pi_futex_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
union futex_key *key,
@@ -741,37 +780,32 @@ retry:
ret = lock_futex_atomic(uaddr, task, set_waiters);
if (ret)
return ret;
-
- /*
- * We dont have the lock. Look up the PI state (or create it if
- * we are the first waiter):
+ /*
+ * We don't have the lock.
+ * Look up the PI state (or create it if we are the first waiter).
+ * Callers may pass ps == NULL for non-pi atomic futex acquisition.
*/
- if (get_futex_value_locked(&uval, uaddr))
- return -EFAULT;
- ret = lookup_pi_state(uval, hb, key, ps);
+ if (ps)
+ ret = lookup_pi_state(uval, hb, key, ps);

- if (unlikely(ret)) {
- switch (ret) {
- case -ESRCH:
- /*
- * No owner found for this futex. Check if the
- * OWNER_DIED bit is set to figure out whether
- * this is a robust futex or not.
- */
- if (get_futex_value_locked(&uval, uaddr))
- return -EFAULT;
+ if (!ps || unlikely(ret == -ESRCH)) {
+ /*
+ * No owner found for this futex. Check if the
+ * OWNER_DIED bit is set to figure out whether
+ * this is a robust futex or not.
+ */
+ if (get_futex_value_locked(&uval, uaddr))
+ return -EFAULT;

- /*
- * We simply start over in case of a robust
- * futex. The code above will take the futex
- * and return happy.
- */
- if (uval & FUTEX_OWNER_DIED)
- goto retry;
- default:
- break;
- }
- }
+ /*
+ * We simply start over in case of a robust
+ * futex. The code above will take the futex
+ * and return happy.
+ */
+ if (uval & FUTEX_OWNER_DIED)
+ goto retry;
+ }

return ret;
}
@@ -874,13 +908,13 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
return 0;
}

-static int unlock_futex_pi(u32 __user *uaddr, u32 uval)
+static int unlock_futex(u32 __user *uaddr, u32 uval)
{
u32 oldval;

/*
* There is no waiter, so we unlock the futex. The owner died
- * bit has not to be preserved here. We are the owner:
+ * bit does not need to be preserved here. We are the owner:
*/
oldval = cmpxchg_futex_value_locked(uaddr, uval, 0);

@@ -2112,7 +2146,7 @@ retry:
* No waiters - kernel unlocks the futex:
*/
if (!(uval & FUTEX_OWNER_DIED)) {
- ret = unlock_futex_pi(uaddr, uval);
+ ret = unlock_futex(uaddr, uval);
if (ret == -EFAULT)
goto pi_faulted;
}
@@ -2356,6 +2390,205 @@ out:
return ret;
}

+/**
+ * trylock_futex_adaptive() - Try to acquire the futex lock in a busy loop
+ * @uaddr: the futex user address
+ * @timeout: absolute timeout or NULL if none
+ *
+ * Try to acquire a futex lock in a loop until it is acquired or the current
+ * owner is descheduled (spinning continues across owner changes). To lock
+ * the futex, set the value to the current TID.
+ *
+ * Returns:
+ * 0 - Gave up, lock not acquired
+ * 1 - Futex lock acquired
+ * <0 - On error
+ */
+static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
+{
+ struct thread_info *owner = NULL;
+ pid_t curtid, ownertid = 0;
+ ktime_t now;
+ int ret = 0;
+ u32 curval;
+
+ curtid = task_pid_vnr(current);
+
+ for (;;) {
+ /*
+ * Try to acquire the lock. If we acquire it or error out,
+ * break to enable preemption and handle ret outside the loop.
+ */
+ curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
+
+ if (curval == 0) {
+ ret = 1;
+ break;
+ }
+
+ if (unlikely(curval == -EFAULT)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ /*
+ * Futex locks manage the owner atomically. We are not in
+ * danger of deadlock by preempting a pending owner. Abort the
+ * loop if our time slice has expired. RT Threads can spin
+ * until they preempt the owner, at which point they will break
+ * out of the loop anyway.
+ */
+ if (need_resched())
+ break;
+
+ if (timeout) {
+ now = ktime_get();
+ if (timeout->tv64 < now.tv64)
+ break;
+ }
+
+ cpu_relax();
+
+ if ((curval & FUTEX_TID_MASK) != ownertid) {
+ ownertid = curval & FUTEX_TID_MASK;
+ if (owner)
+ put_task_struct(owner->task);
+ owner = futex_owner(uaddr);
+ }
+ if (owner && !futex_spin_on_owner(uaddr, owner))
+ break;
+ }
+
+ if (owner)
+ put_task_struct(owner->task);
+
+ return ret;
+}
+
+/**
+ * futex_lock() - Acquire the lock and set the futex value
+ * @uaddr: the futex user address
+ * @flags: futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, FLAGS_ADAPTIVE, etc.)
+ * @detect: detect deadlock (1) or not (0)
+ * @time: absolute timeout
+ *
+ * futex_(un)lock() define a futex value policy and implement a full mutex. The
+ * futex value stores the owner's TID or'd with FUTEX_WAITERS and/or
+ * FUTEX_OWNER_DIED as appropriate.
+ *
+ * Userspace tried a 0 -> TID atomic transition of the futex value and failed.
+ * Try to acquire the lock in the kernel, blocking if necessary. Return to
+ * userspace with the lock held and the futex value set accordingly (or after a
+ * timeout).
+ *
+ * Returns:
+ * 0 - On success
+ * <0 - On error
+ */
+static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
+{
+ struct hrtimer_sleeper timeout, *to = NULL;
+ struct futex_hash_bucket *hb;
+ struct futex_q q = FUTEX_Q_INIT;
+ int ret = 0;
+
+ if (refill_pi_state_cache())
+ return -ENOMEM;
+
+ if (time) {
+ to = &timeout;
+ hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
+ CLOCK_REALTIME : CLOCK_MONOTONIC,
+ HRTIMER_MODE_ABS);
+ hrtimer_init_sleeper(to, current);
+ hrtimer_set_expires(&to->timer, *time);
+ }
+
+retry:
+#ifdef CONFIG_SMP
+ if (flags & FLAGS_ADAPTIVE) {
+ preempt_disable();
+ ret = trylock_futex_adaptive(uaddr, time);
+ preempt_enable();
+
+ /* We got the lock. */
+ if (ret == 1) {
+ ret = 0;
+ goto out;
+ }
+
+ /* We encountered an error, -EFAULT or -ETIMEDOUT */
+ if (ret)
+ goto out;
+ }
+#endif
+ q.key = FUTEX_KEY_INIT;
+ ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
+ if (unlikely(ret != 0))
+ goto out;
+
+ hb = queue_lock(&q);
+
+ ret = lock_futex_atomic(uaddr, current, 0);
+ if (unlikely(ret)) {
+ switch (ret) {
+ case 1:
+ /* We got the lock. */
+ ret = 0;
+ goto out_unlock_put_key;
+ case -EFAULT:
+ goto uaddr_faulted;
+ default:
+ goto out_unlock_put_key;
+ }
+ }
+ /*
+ * Only actually queue now that the atomic ops are done:
+ */
+ futex_wait_queue_me(hb, &q, to);
+
+ ret = unqueue_me(&q);
+
+ /* If we were woken by the owner, try and acquire the lock. */
+ if (!ret)
+ goto retry;
+
+ ret = -ETIMEDOUT;
+ if (to && !to->task)
+ goto out_put_key;
+
+ /*
+ * We expect signal_pending(current), but we might be the
+ * victim of a spurious wakeup as well.
+ */
+ ret = -ERESTARTNOINTR;
+ if (!signal_pending(current)) {
+ put_futex_key(&q.key);
+ goto retry;
+ }
+
+ goto out_put_key;
+
+out_unlock_put_key:
+ queue_unlock(&q, hb);
+
+out_put_key:
+ put_futex_key(&q.key);
+out:
+ if (to)
+ destroy_hrtimer_on_stack(&to->timer);
+ return ret;
+
+uaddr_faulted:
+ queue_unlock(&q, hb);
+
+ ret = fault_in_user_writeable(uaddr);
+ if (ret)
+ goto out_put_key;
+
+ put_futex_key(&q.key);
+ goto retry;
+}
+
+
/*
* Support for robust futexes: the kernel cleans up held futexes at
* thread exit time.
@@ -2579,7 +2812,8 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,

if (op & FUTEX_CLOCK_REALTIME) {
flags |= FLAGS_CLOCKRT;
- if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+ if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI &&
+ cmd != FUTEX_LOCK && cmd != FUTEX_LOCK_ADAPTIVE)
return -ENOSYS;
}

@@ -2623,6 +2857,12 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
case FUTEX_CMP_REQUEUE_PI:
ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
break;
+ case FUTEX_LOCK_ADAPTIVE:
+ flags |= FLAGS_ADAPTIVE;
+ case FUTEX_LOCK:
+ if (futex_cmpxchg_enabled)
+ ret = futex_lock(uaddr, flags, val, timeout);
+ break;
default:
ret = -ENOSYS;
}
@@ -2641,7 +2881,8 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,

if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
cmd == FUTEX_WAIT_BITSET ||
- cmd == FUTEX_WAIT_REQUEUE_PI)) {
+ cmd == FUTEX_WAIT_REQUEUE_PI ||
+ cmd == FUTEX_LOCK || cmd == FUTEX_LOCK_ADAPTIVE)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
diff --git a/kernel/sched.c b/kernel/sched.c
index 9ab3cd7..4bb519e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3818,6 +3818,72 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
out:
return 1;
}
+
+#ifdef CONFIG_FUTEX
+#include <linux/futex.h>
+
+int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
+{
+ u32 ownertid, uval;
+ unsigned int cpu;
+ struct rq *rq;
+
+ if (!sched_feat(OWNER_SPIN))
+ return 0;
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+ /*
+ * Need to access the cpu field knowing that
+ * DEBUG_PAGEALLOC could have unmapped it if
+ * the mutex owner just released it and exited.
+ */
+ if (probe_kernel_address(&owner->cpu, cpu))
+ goto out;
+#else
+ cpu = owner->cpu;
+#endif
+
+ /*
+ * Even if the access succeeded (likely case),
+ * the cpu field may no longer be valid.
+ */
+ if (cpu >= nr_cpumask_bits)
+ goto out;
+
+ /*
+ * We need to validate that we can do a
+ * get_cpu() and that we have the percpu area.
+ */
+ if (!cpu_online(cpu))
+ goto out;
+
+ rq = cpu_rq(cpu);
+
+ if (get_user(ownertid, uaddr))
+ return -EFAULT;
+ ownertid &= FUTEX_TID_MASK;
+
+ for (;;) {
+ /*
+ * Owner changed, break to re-assess state.
+ */
+ if (get_user(uval, uaddr))
+ return -EFAULT;
+ if ((uval & FUTEX_TID_MASK) != ownertid)
+ break;
+
+ /*
+ * Is that owner really running on that cpu?
+ */
+ if (task_thread_info(rq->curr) != owner || need_resched())
+ return 0;
+
+ cpu_relax();
+ }
+out:
+ return 1;
+}
+#endif
#endif

#ifdef CONFIG_PREEMPT
--
1.6.3.3

2010-04-09 05:17:39

by Darren Hart

Subject: [PATCH 3/4] futex: refactor futex_lock_pi_atomic

From: Darren Hart <[email protected]>

Prepare for FUTEX_LOCK by refactoring futex_lock_pi_atomic() into
lock_futex_atomic() and lock_pi_futex_atomic(). The name change is meant to
reflect the naming convention in futex.c that futex_*() functions map directly
to futex op codes and the others are internal helper functions.

Signed-off-by: Darren Hart <[email protected]>

---
kernel/futex.c | 79 +++++++++++++++++++++++++++++++++++++------------------
1 files changed, 53 insertions(+), 26 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 2ae18cd..8c1bb16 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -625,30 +625,23 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
}

/**
- * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
- * @uaddr: the pi futex user address
- * @hb: the pi futex hash bucket
- * @key: the futex key associated with uaddr and hb
- * @ps: the pi_state pointer where we store the result of the
- * lookup
- * @task: the task to perform the atomic lock work for. This will
- * be "current" except in the case of requeue pi.
- * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0)
+ * lock_futex_atomic() - Try to acquire the futex lock atomically
+ * @uaddr: user address of the futex
+ * @task: the task to perform the atomic lock work for.
+ * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0)
+ *
+ * The hb->lock shall be held by the caller.
*
* Returns:
* 0 - ready to wait
* 1 - acquired the lock
* <0 - error
- *
- * The hb->lock and futex_key refs shall be held by the caller.
*/
-static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
- union futex_key *key,
- struct futex_pi_state **ps,
- struct task_struct *task, int set_waiters)
+static int lock_futex_atomic(u32 __user *uaddr, struct task_struct *task,
+ int set_waiters)
{
- int lock_taken, ret, ownerdied = 0;
u32 uval, newval, curval;
+ int lock_taken, ret;

retry:
ret = lock_taken = 0;
@@ -695,10 +688,10 @@ retry:
*
* This is safe as we are protected by the hash bucket lock !
*/
- if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
+ if (unlikely((curval & FUTEX_OWNER_DIED) ||
+ !(curval & FUTEX_TID_MASK))) {
/* Keep the OWNER_DIED bit */
newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
- ownerdied = 0;
lock_taken = 1;
}

@@ -715,10 +708,46 @@ retry:
if (unlikely(lock_taken))
return 1;

+ return ret;
+}
+
+/**
+ * lock_pi_futex_atomic() - Atomic work required to acquire a pi aware futex
+ * @uaddr: the pi futex user address
+ * @hb: the pi futex hash bucket
+ * @key: the futex key associated with uaddr and hb
+ * @ps: the pi_state pointer where we store the result of the
+ * lookup
+ * @task: the task to perform the atomic lock work for. This will
+ * be "current" except in the case of requeue pi.
+ * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0)
+ *
+ * Returns:
+ * 0 - ready to wait
+ * 1 - acquired the lock
+ * <0 - error
+ *
+ * The hb->lock and futex_key refs shall be held by the caller.
+ */
+static int lock_pi_futex_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
+ union futex_key *key,
+ struct futex_pi_state **ps,
+ struct task_struct *task, int set_waiters)
+{
+ u32 uval;
+ int ret;
+
+retry:
+ ret = lock_futex_atomic(uaddr, task, set_waiters);
+ if (ret)
+ return ret;
+
/*
* We dont have the lock. Look up the PI state (or create it if
* we are the first waiter):
*/
+ if (get_futex_value_locked(&uval, uaddr))
+ return -EFAULT;
ret = lookup_pi_state(uval, hb, key, ps);

if (unlikely(ret)) {
@@ -729,7 +758,7 @@ retry:
* OWNER_DIED bit is set to figure out whether
* this is a robust futex or not.
*/
- if (get_futex_value_locked(&curval, uaddr))
+ if (get_futex_value_locked(&uval, uaddr))
return -EFAULT;

/*
@@ -737,10 +766,8 @@ retry:
* futex. The code above will take the futex
* and return happy.
*/
- if (curval & FUTEX_OWNER_DIED) {
- ownerdied = 1;
+ if (uval & FUTEX_OWNER_DIED)
goto retry;
- }
default:
break;
}
@@ -1100,7 +1127,7 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
*
* Try and get the lock on behalf of the top waiter if we can do it atomically.
* Wake the top waiter if we succeed. If the caller specified set_waiters,
- * then direct futex_lock_pi_atomic() to force setting the FUTEX_WAITERS bit.
+ * then direct lock_pi_futex_atomic() to force setting the FUTEX_WAITERS bit.
* hb1 and hb2 must be held by the caller.
*
* Returns:
@@ -1124,7 +1151,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
/*
* Find the top_waiter and determine if there are additional waiters.
* If the caller intends to requeue more than 1 waiter to pifutex,
- * force futex_lock_pi_atomic() to set the FUTEX_WAITERS bit now,
+ * force lock_pi_futex_atomic() to set the FUTEX_WAITERS bit now,
* as we have means to handle the possible fault. If not, don't set
* the bit unnecessarily as it will force the subsequent unlock to enter
* the kernel.
@@ -1144,7 +1171,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
* the contended case or if set_waiters is 1. The pi_state is returned
* in ps in contended cases.
*/
- ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task,
+ ret = lock_pi_futex_atomic(pifutex, hb2, key2, ps, top_waiter->task,
set_waiters);
if (ret == 1)
requeue_pi_wake_futex(top_waiter, key2, hb2);
@@ -1925,7 +1952,7 @@ retry:
retry_private:
hb = queue_lock(&q);

- ret = futex_lock_pi_atomic(uaddr, hb, &q.key, &q.pi_state, current, 0);
+ ret = lock_pi_futex_atomic(uaddr, hb, &q.key, &q.pi_state, current, 0);
if (unlikely(ret)) {
switch (ret) {
case 1:
--
1.6.3.3

2010-04-15 06:13:32

by Darren Hart

Subject: Re: [PATCH V5 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning

[email protected] wrote:

> Now that an advantage can be shown using FUTEX_LOCK_ADAPTIVE over FUTEX_LOCK,
> the next steps as I see them are:
>
> o Try and show improvement of FUTEX_LOCK_ADAPTIVE over FUTEX_WAIT based
> implementations (pthread_mutex specifically).

I've spent a bit of time on this, and made huge improvements through
some simple optimizations of the testcase lock/unlock routines. I'll be
away for a few days and wanted to let people know where things stand
with FUTEX_LOCK_ADAPTIVE.

I ran all the tests with the following options:
-i 1000000 -p 1000 -d 20
where:
-i iterations
-p period (in instructions)
-d duty cycle (in percent)
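
For reference, the FUTEX_LOCK runs correspond to invocations like the
following (assuming the same futex_lock binary from the cover letter, with -a
selecting FUTEX_LOCK_ADAPTIVE; the pthread_mutex numbers come from an
analogous pthread-based test):

# ./futex_lock -i1000000 -p1000 -d20
# ./futex_lock -i1000000 -p1000 -d20 -a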

MECHANISM                 KITERS/SEC
------------------------------------
pthread_mutex_adaptive          1562
FUTEX_LOCK_ADAPTIVE             1190
pthread_mutex                   1010
FUTEX_LOCK                       532


I took some perf data while running each of the above tests as well. Any
thoughts on getting more out of perf are appreciated; this is my first
pass at it. I recorded with "perf record -fg", and snippets of "perf
report" follow:

FUTEX_LOCK (not adaptive) spends a lot of time spinning on the futex
hashbucket lock.
# Overhead Command Shared Object Symbol
# ........ .......... .................. ......
#
40.76% futex_lock [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--62.16%-- do_futex
| sys_futex
| system_call_fastpath
| syscall
|
|--31.05%-- futex_wake
| do_futex
| sys_futex
| system_call_fastpath
| syscall
...
14.98% futex_lock futex_lock [.] locktest


FUTEX_LOCK_ADAPTIVE spends much of its time in the test loop itself,
followed by the actual adaptive loop in the kernel. It appears much of
our savings over FUTEX_LOCK comes from not contending on the hashbucket
lock.
# Overhead Command Shared Object Symbol
# ........ .......... .................. ......
#
36.07% futex_lock futex_lock [.] locktest
|
--- locktest
|
--100.00%-- 0x400e7000000000

9.12% futex_lock perf [.] 0x00000000000eee
...
8.26% futex_lock [kernel.kallsyms] [k] futex_spin_on_owner


Pthread Mutex Adaptive spends most of its time in glibc's heuristic
spinning, as expected, followed by the test loop itself. An impressively
minimal 3.35% is spent on the hashbucket lock.
# Overhead Command Shared Object Symbol
# ........ ............... ........................ ......
#
47.88% pthread_mutex_2 libpthread-2.5.so [.] __pthread_mutex_lock_internal
|
--- __pthread_mutex_lock_internal

22.78% pthread_mutex_2 pthread_mutex_2 [.] locktest
...
15.16% pthread_mutex_2 perf [.] ...
...
3.35% pthread_mutex_2 [kernel.kallsyms] [k] _raw_spin_lock


Pthread Mutex (not adaptive) spends much of its time on the hashbucket
lock as expected, followed by the test loop.
33.89% pthread_mutex_2 [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--56.90%-- futex_wake
| do_futex
| sys_futex
| system_call_fastpath
| __lll_unlock_wake
|
|--28.95%-- futex_wait_setup
| futex_wait
| do_futex
| sys_futex
| system_call_fastpath
| __lll_lock_wait
...
16.60% pthread_mutex_2 pthread_mutex_2 [.] locktest


These results mostly confirm what was expected: the adaptive versions spend
more time in their spin loops and less time contending for hashbucket
locks, while the non-adaptive versions take the hashbucket lock more
often and therefore show more contention there.

I believe I should be able to get the plain FUTEX_LOCK implementation to
be much closer in performance to the plain pthread mutex version. I
expect much of the work done to benefit FUTEX_LOCK will also benefit
FUTEX_LOCK_ADAPTIVE. If that's true, and I can make a significant
improvement to FUTEX_LOCK, it wouldn't take much to get
FUTEX_LOCK_ADAPTIVE to beat the heuristic spinning in glibc.

It could also be that this synthetic benchmark is an ideal situation for
glibc's heuristics, and a more realistic load with varying lock hold
times wouldn't favor the adaptive pthread mutex over FUTEX_LOCK_ADAPTIVE
by such a large margin.

More next week.

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team