2009-09-22 05:29:51

by Darren Hart

[permalink] [raw]
Subject: [PATCH 0/5] Futex cleanups and race fix

The following series corrects several innacurate comments in the futex code as
well as reformat some of the existing comments to be mostly consistent
throughout futex.c. The final patch fixes a nasty little race in the requeue pi
code resulting in missed wakeups and hung userspace application threads.

Signed-off-by: Darren Hart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: Dinakar Guniguntala <[email protected]>
CC: John Stultz <[email protected]>

---

Darren Hart (5):
futex: fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me
futex: correct futex_q woken state commentary
futex: make function kernel-doc commentary consistent
futex: correct queue_me and unqueue_me commentary
futex: correct futex_wait_requeue_pi() commentary


kernel/futex.c | 135 ++++++++++++++++++++++++++++++--------------------------
1 files changed, 72 insertions(+), 63 deletions(-)

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


2009-09-22 05:30:10

by Darren Hart

[permalink] [raw]
Subject: [PATCH 1/5] futex: correct futex_wait_requeue_pi() commentary

Correct various typos and formatting inconsistencies in the commentary of
futex_wait_requeue_pi().

Signed-off-by: Darren Hart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: Dinakar Guniguntala <[email protected]>
CC: John Stultz <[email protected]>
---

kernel/futex.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 248dd11..6c498b1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2114,12 +2114,12 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,

/**
* futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
- * @uaddr: the futex we initialyl wait on (non-pi)
+ * @uaddr: the futex we initially wait on (non-pi)
* @fshared: whether the futexes are shared (1) or not (0). They must be
* the same type, no requeueing from private to shared, etc.
* @val: the expected value of uaddr
* @abs_time: absolute timeout
- * @bitset: 32 bit wakeup bitset set by userspace, defaults to all.
+ * @bitset: 32 bit wakeup bitset set by userspace, defaults to all
* @clockrt: whether to use CLOCK_REALTIME (1) or CLOCK_MONOTONIC (0)
* @uaddr2: the pi futex we will take prior to returning to user-space
*
@@ -2246,7 +2246,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
res = fixup_owner(uaddr2, fshared, &q, !ret);
/*
* If fixup_owner() returned an error, proprogate that. If it
- * acquired the lock, clear our -ETIMEDOUT or -EINTR.
+ * acquired the lock, clear -ETIMEDOUT or -EINTR.
*/
if (res)
ret = (res < 0) ? res : 0;

2009-09-22 05:30:19

by Darren Hart

[permalink] [raw]
Subject: [PATCH 2/5] futex: correct queue_me and unqueue_me commentary

The queue_me/unqueue_me commentary is oddly placed and out of date.
Clean it up and correct the inaccurate bits.

Signed-off-by: Darren Hart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: Dinakar Guniguntala <[email protected]>
CC: John Stultz <[email protected]>
---

kernel/futex.c | 41 +++++++++++++++++++++++++++++------------
1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 6c498b1..cedcd60 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1350,6 +1350,25 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q)
return hb;
}

+static inline void
+queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
+{
+ spin_unlock(&hb->lock);
+ drop_futex_key_refs(&q->key);
+}
+
+/**
+ * queue_me() - Enqueue the futex_q on the futex_hash_bucket
+ * @q: The futex_q to enqueue
+ * @hb: The destination hash bucket
+ *
+ * The hb->lock must be held by the caller, and is released here. A call to
+ * queue_me() is typically paired with exactly one call to unqueue_me(). The
+ * exceptions involve the PI related operations, which may use unqueue_me_pi()
+ * or nothing if the unqueue is done as part of the wake process and the unqueue
+ * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
+ * an example).
+ */
static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
{
int prio;
@@ -1373,19 +1392,17 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
spin_unlock(&hb->lock);
}

-static inline void
-queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
-{
- spin_unlock(&hb->lock);
- drop_futex_key_refs(&q->key);
-}
-
-/*
- * queue_me and unqueue_me must be called as a pair, each
- * exactly once. They are called with the hashed spinlock held.
+/**
+ * unqueue_me() - Remove the futex_q from its futex_hash_bucket
+ * @q: The futex_q to unqueue
+ *
+ * The q->lock_ptr must not be held by the caller. A call to unqueue_me() must
+ * be paired with exactly one earlier call to queue_me().
+ *
+ * Returns:
+ * 1 - if the futex_q was still queued (and we removed unqueued it)
+ * 0 - if the futex_q was already removed by the waking thread
*/
-
-/* Return 1 if we were still queued (ie. 0 means we were woken) */
static int unqueue_me(struct futex_q *q)
{
spinlock_t *lock_ptr;

2009-09-22 05:30:29

by Darren Hart

[permalink] [raw]
Subject: [PATCH 3/5] futex: make function kernel-doc commentary consistent

Make the existing function kernel-doc consistent throughout futex.c, following
Documentation/kernel-doc-nano-howto.txt as closely as possible. When unsure,
at least be consistent within futex.c.

Signed-off-by: Darren Hart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: Dinakar Guniguntala <[email protected]>
CC: John Stultz <[email protected]>
---

kernel/futex.c | 41 +++++++++++++++++++++--------------------
1 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index cedcd60..720fa3d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -198,11 +198,12 @@ static void drop_futex_key_refs(union futex_key *key)
}

/**
- * get_futex_key - Get parameters which are the keys for a futex.
- * @uaddr: virtual address of the futex
- * @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED
- * @key: address where result is stored.
- * @rw: mapping needs to be read/write (values: VERIFY_READ, VERIFY_WRITE)
+ * get_futex_key() - Get parameters which are the keys for a futex
+ * @uaddr: virtual address of the futex
+ * @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED
+ * @key: address where result is stored.
+ * @rw: mapping needs to be read/write (values: VERIFY_READ,
+ * VERIFY_WRITE)
*
* Returns a negative error code or 0
* The key words are stored in *key on success.
@@ -288,8 +289,8 @@ void put_futex_key(int fshared, union futex_key *key)
drop_futex_key_refs(key);
}

-/*
- * fault_in_user_writeable - fault in user address and verify RW access
+/**
+ * fault_in_user_writeable() - Fault in user address and verify RW access
* @uaddr: pointer to faulting user space address
*
* Slow path to fixup the fault we just took in the atomic write
@@ -309,8 +310,8 @@ static int fault_in_user_writeable(u32 __user *uaddr)

/**
* futex_top_waiter() - Return the highest priority waiter on a futex
- * @hb: the hash bucket the futex_q's reside in
- * @key: the futex key (to distinguish it from other futex futex_q's)
+ * @hb: the hash bucket the futex_q's reside in
+ * @key: the futex key (to distinguish it from other futex futex_q's)
*
* Must be called with the hb lock held.
*/
@@ -588,7 +589,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
}

/**
- * futex_lock_pi_atomic() - atomic work required to acquire a pi aware futex
+ * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
* @uaddr: the pi futex user address
* @hb: the pi futex hash bucket
* @key: the futex key associated with uaddr and hb
@@ -1011,9 +1012,9 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,

/**
* requeue_pi_wake_futex() - Wake a task that acquired the lock during requeue
- * q: the futex_q
- * key: the key of the requeue target futex
- * hb: the hash_bucket of the requeue target futex
+ * @q: the futex_q
+ * @key: the key of the requeue target futex
+ * @hb: the hash_bucket of the requeue target futex
*
* During futex_requeue, with requeue_pi=1, it is possible to acquire the
* target futex if it is uncontended or via a lock steal. Set the futex_q key
@@ -2319,9 +2320,9 @@ out:
*/

/**
- * sys_set_robust_list - set the robust-futex list head of a task
- * @head: pointer to the list-head
- * @len: length of the list-head, as userspace expects
+ * sys_set_robust_list() - Set the robust-futex list head of a task
+ * @head: pointer to the list-head
+ * @len: length of the list-head, as userspace expects
*/
SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
size_t, len)
@@ -2340,10 +2341,10 @@ SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
}

/**
- * sys_get_robust_list - get the robust-futex list head of a task
- * @pid: pid of the process [zero for current task]
- * @head_ptr: pointer to a list-head pointer, the kernel fills it in
- * @len_ptr: pointer to a length field, the kernel fills in the header size
+ * sys_get_robust_list() - Get the robust-futex list head of a task
+ * @pid: pid of the process [zero for current task]
+ * @head_ptr: pointer to a list-head pointer, the kernel fills it in
+ * @len_ptr: pointer to a length field, the kernel fills in the header size
*/
SYSCALL_DEFINE3(get_robust_list, int, pid,
struct robust_list_head __user * __user *, head_ptr,

2009-09-22 05:30:37

by Darren Hart

[permalink] [raw]
Subject: [PATCH 4/5] futex: correct futex_q woken state commentary

Use kernel-doc format to describe struct futex_q. Correct the wakeup
definition to eliminate the statement about waking the waiter between
the plist_del() and the q->lock_ptr = 0. Note in the comment that PI
futexes have a different definition of the woken state.

Signed-off-by: Darren Hart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: Dinakar Guniguntala <[email protected]>
CC: John Stultz <[email protected]>
---

kernel/futex.c | 32 ++++++++++++++++----------------
1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 720fa3d..f92afbe 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -89,36 +89,36 @@ struct futex_pi_state {
union futex_key key;
};

-/*
- * We use this hashed waitqueue instead of a normal wait_queue_t, so
+/**
+ * struct futex_q - The hashed futex queue entry, one per waiting task
+ * @task: the task waiting on the futex
+ * @lock_ptr: the hash bucket lock
+ * @key: the key the futex is hashed on
+ * @pi_state: optional priority inheritance state
+ * @rt_waiter: rt_waiter storage for use with requeue_pi
+ * @requeue_pi_key: the requeue_pi target futex key
+ * @bitset: bitset for the optional bitmasked wakeup
+ *
+ * We use this hashed waitqueue, instead of a normal wait_queue_t, so
* we can wake only the relevant ones (hashed queues may be shared).
*
* A futex_q has a woken state, just like tasks have TASK_RUNNING.
* It is considered woken when plist_node_empty(&q->list) || q->lock_ptr == 0.
* The order of wakup is always to make the first condition true, then
- * wake up q->waiter, then make the second condition true.
+ * the second.
+ *
+ * PI futexes are typically woken before they are removed from the hash list via
+ * the rt_mutex code. See unqueue_me_pi().
*/
struct futex_q {
struct plist_node list;
- /* Waiter reference */
- struct task_struct *task;

- /* Which hash list lock to use: */
+ struct task_struct *task;
spinlock_t *lock_ptr;
-
- /* Key which the futex is hashed on: */
union futex_key key;
-
- /* Optional priority inheritance state: */
struct futex_pi_state *pi_state;
-
- /* rt_waiter storage for requeue_pi: */
struct rt_mutex_waiter *rt_waiter;
-
- /* The expected requeue pi target futex key: */
union futex_key *requeue_pi_key;
-
- /* Bitset for the optional bitmasked wakeup */
u32 bitset;
};

2009-09-22 05:30:46

by Darren Hart

[permalink] [raw]
Subject: [PATCH 5/5] futex: fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me

PI futexes do not use the same plist_node_empty() test for wakeup. It was
possible for the waiter (in futex_wait_requeue_pi()) to set TASK_INTERRUPTIBLE
after the waker assigned the rtmutex to the waiter. The waiter would then note
the plist was not empty and call schedule(). The task would not be found by any
subsequeuent futex wakeups, resulting in a userspace hang. By moving the
setting of TASK_INTERRUPTIBLE to before the call to queue_me(), the race with
the waker is eliminated. Since we no longer call get_user() from within
queue_me(), there is no need to delay the setting of TASK_INTERRUPTIBLE until
after the call to queue_me().

The FUTEX_LOCK_PI operation is not affected as futex_lock_pi() relies entirely
on the rtmutex code to handle schedule() and wakeup. The requeue PI code is
affected because the waiter starts as a non-PI waiter and is woken on a PI
futex.

Remove the crusty old comment about holding spinlocks() across get_user() as we
no longer do that. Correct the locking statement with a description of why the
test is performed.

Signed-off-by: Darren Hart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: Dinakar Guniguntala <[email protected]>
CC: John Stultz <[email protected]>
---

kernel/futex.c | 15 +++------------
1 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f92afbe..463af2e 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1656,17 +1656,8 @@ out:
static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
struct hrtimer_sleeper *timeout)
{
- queue_me(q, hb);
-
- /*
- * There might have been scheduling since the queue_me(), as we
- * cannot hold a spinlock across the get_user() in case it
- * faults, and we cannot just set TASK_INTERRUPTIBLE state when
- * queueing ourselves into the futex hash. This code thus has to
- * rely on the futex_wake() code removing us from hash when it
- * wakes us up.
- */
set_current_state(TASK_INTERRUPTIBLE);
+ queue_me(q, hb);

/* Arm the timer */
if (timeout) {
@@ -1676,8 +1667,8 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
}

/*
- * !plist_node_empty() is safe here without any lock.
- * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
+ * If we have been removed from the hash list, then another task
+ * has tried to wake us, and we can skip the call to schedule().
*/
if (likely(!plist_node_empty(&q->list))) {
/*

2009-09-22 08:38:59

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 5/5] futex: fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me

Darren Hart a écrit :
> PI futexes do not use the same plist_node_empty() test for wakeup. It was
> possible for the waiter (in futex_wait_requeue_pi()) to set TASK_INTERRUPTIBLE
> after the waker assigned the rtmutex to the waiter. The waiter would then note
> the plist was not empty and call schedule(). The task would not be found by any
> subsequeuent futex wakeups, resulting in a userspace hang. By moving the
> setting of TASK_INTERRUPTIBLE to before the call to queue_me(), the race with
> the waker is eliminated. Since we no longer call get_user() from within
> queue_me(), there is no need to delay the setting of TASK_INTERRUPTIBLE until
> after the call to queue_me().
>
> The FUTEX_LOCK_PI operation is not affected as futex_lock_pi() relies entirely
> on the rtmutex code to handle schedule() and wakeup. The requeue PI code is
> affected because the waiter starts as a non-PI waiter and is woken on a PI
> futex.
>
> Remove the crusty old comment about holding spinlocks() across get_user() as we
> no longer do that. Correct the locking statement with a description of why the
> test is performed.

I am very confused by this ChangeLog...

>
> Signed-off-by: Darren Hart <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> CC: Eric Dumazet <[email protected]>
> CC: Dinakar Guniguntala <[email protected]>
> CC: John Stultz <[email protected]>
> ---
>
> kernel/futex.c | 15 +++------------
> 1 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index f92afbe..463af2e 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1656,17 +1656,8 @@ out:
> static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
> struct hrtimer_sleeper *timeout)
> {
> - queue_me(q, hb);
> -
> - /*
> - * There might have been scheduling since the queue_me(), as we
> - * cannot hold a spinlock across the get_user() in case it
> - * faults, and we cannot just set TASK_INTERRUPTIBLE state when
> - * queueing ourselves into the futex hash. This code thus has to
> - * rely on the futex_wake() code removing us from hash when it
> - * wakes us up.
> - */
> set_current_state(TASK_INTERRUPTIBLE);

Hmm, you missed the smp_mb() properties here...

Before :
queue_me()
set_mb(current->state, TASK_INTERRUPTIBLE);
if (timeout) {...}
if (likely(!plist_node_empty(&q->list))) {
...
}

After :
set_mb(current->state, TASK_INTERRUPTIBLE);
queue_me();
if (timeout) {...}
// no barrier... why ar we still testing q->list
// since it has no synchro between queue_me() and test ?
if (likely(!plist_node_empty(&q->list))) {
...
}



> + queue_me(q, hb);
>
> /* Arm the timer */
> if (timeout) {
> @@ -1676,8 +1667,8 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
> }
>
> /*
> - * !plist_node_empty() is safe here without any lock.
> - * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
> + * If we have been removed from the hash list, then another task
> + * has tried to wake us, and we can skip the call to schedule().
> */
> if (likely(!plist_node_empty(&q->list))) {
> /*
>

2009-09-22 08:41:09

by Darren Hart

[permalink] [raw]
Subject: [tip:core/urgent] futex: Correct futex_wait_requeue_pi() commentary

Commit-ID: 56ec1607b15b6a5f6de5aab14f16a763b88296fc
Gitweb: http://git.kernel.org/tip/56ec1607b15b6a5f6de5aab14f16a763b88296fc
Author: Darren Hart <[email protected]>
AuthorDate: Mon, 21 Sep 2009 22:29:59 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 22 Sep 2009 10:37:42 +0200

futex: Correct futex_wait_requeue_pi() commentary

Correct various typos and formatting inconsistencies in the
commentary of futex_wait_requeue_pi().

Signed-off-by: Darren Hart <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Dinakar Guniguntala <[email protected]>
Cc: John Stultz <[email protected]>
LKML-Reference: <20090922052958.8717.21932.stgit@Aeon>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/futex.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 248dd11..6c498b1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2114,12 +2114,12 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,

/**
* futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
- * @uaddr: the futex we initialyl wait on (non-pi)
+ * @uaddr: the futex we initially wait on (non-pi)
* @fshared: whether the futexes are shared (1) or not (0). They must be
* the same type, no requeueing from private to shared, etc.
* @val: the expected value of uaddr
* @abs_time: absolute timeout
- * @bitset: 32 bit wakeup bitset set by userspace, defaults to all.
+ * @bitset: 32 bit wakeup bitset set by userspace, defaults to all
* @clockrt: whether to use CLOCK_REALTIME (1) or CLOCK_MONOTONIC (0)
* @uaddr2: the pi futex we will take prior to returning to user-space
*
@@ -2246,7 +2246,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
res = fixup_owner(uaddr2, fshared, &q, !ret);
/*
* If fixup_owner() returned an error, proprogate that. If it
- * acquired the lock, clear our -ETIMEDOUT or -EINTR.
+ * acquired the lock, clear -ETIMEDOUT or -EINTR.
*/
if (res)
ret = (res < 0) ? res : 0;

2009-09-22 08:41:24

by Darren Hart

[permalink] [raw]
Subject: [tip:core/urgent] futex: Correct queue_me and unqueue_me commentary

Commit-ID: d40d65c8dbdd39f0b64e043f6bd08f8a38f55194
Gitweb: http://git.kernel.org/tip/d40d65c8dbdd39f0b64e043f6bd08f8a38f55194
Author: Darren Hart <[email protected]>
AuthorDate: Mon, 21 Sep 2009 22:30:15 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 22 Sep 2009 10:37:43 +0200

futex: Correct queue_me and unqueue_me commentary

The queue_me/unqueue_me commentary is oddly placed and out of date.
Clean it up and correct the inaccurate bits.

Signed-off-by: Darren Hart <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Dinakar Guniguntala <[email protected]>
Cc: John Stultz <[email protected]>
LKML-Reference: <20090922053015.8717.71713.stgit@Aeon>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/futex.c | 41 +++++++++++++++++++++++++++++------------
1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 6c498b1..cedcd60 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1350,6 +1350,25 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q)
return hb;
}

+static inline void
+queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
+{
+ spin_unlock(&hb->lock);
+ drop_futex_key_refs(&q->key);
+}
+
+/**
+ * queue_me() - Enqueue the futex_q on the futex_hash_bucket
+ * @q: The futex_q to enqueue
+ * @hb: The destination hash bucket
+ *
+ * The hb->lock must be held by the caller, and is released here. A call to
+ * queue_me() is typically paired with exactly one call to unqueue_me(). The
+ * exceptions involve the PI related operations, which may use unqueue_me_pi()
+ * or nothing if the unqueue is done as part of the wake process and the unqueue
+ * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
+ * an example).
+ */
static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
{
int prio;
@@ -1373,19 +1392,17 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
spin_unlock(&hb->lock);
}

-static inline void
-queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
-{
- spin_unlock(&hb->lock);
- drop_futex_key_refs(&q->key);
-}
-
-/*
- * queue_me and unqueue_me must be called as a pair, each
- * exactly once. They are called with the hashed spinlock held.
+/**
+ * unqueue_me() - Remove the futex_q from its futex_hash_bucket
+ * @q: The futex_q to unqueue
+ *
+ * The q->lock_ptr must not be held by the caller. A call to unqueue_me() must
+ * be paired with exactly one earlier call to queue_me().
+ *
+ * Returns:
+ * 1 - if the futex_q was still queued (and we removed unqueued it)
+ * 0 - if the futex_q was already removed by the waking thread
*/
-
-/* Return 1 if we were still queued (ie. 0 means we were woken) */
static int unqueue_me(struct futex_q *q)
{
spinlock_t *lock_ptr;

2009-09-22 08:41:38

by Darren Hart

[permalink] [raw]
Subject: [tip:core/urgent] futex: Make function kernel-doc commentary consistent

Commit-ID: d96ee56ce0582967fba9f3f0ac4503957147bea6
Gitweb: http://git.kernel.org/tip/d96ee56ce0582967fba9f3f0ac4503957147bea6
Author: Darren Hart <[email protected]>
AuthorDate: Mon, 21 Sep 2009 22:30:22 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 22 Sep 2009 10:37:43 +0200

futex: Make function kernel-doc commentary consistent

Make the existing function kernel-doc consistent throughout
futex.c, following Documentation/kernel-doc-nano-howto.txt as
closely as possible.

When unsure, at least be consistent within futex.c.

Signed-off-by: Darren Hart <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Dinakar Guniguntala <[email protected]>
Cc: John Stultz <[email protected]>
LKML-Reference: <20090922053022.8717.13339.stgit@Aeon>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/futex.c | 41 +++++++++++++++++++++--------------------
1 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index cedcd60..720fa3d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -198,11 +198,12 @@ static void drop_futex_key_refs(union futex_key *key)
}

/**
- * get_futex_key - Get parameters which are the keys for a futex.
- * @uaddr: virtual address of the futex
- * @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED
- * @key: address where result is stored.
- * @rw: mapping needs to be read/write (values: VERIFY_READ, VERIFY_WRITE)
+ * get_futex_key() - Get parameters which are the keys for a futex
+ * @uaddr: virtual address of the futex
+ * @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED
+ * @key: address where result is stored.
+ * @rw: mapping needs to be read/write (values: VERIFY_READ,
+ * VERIFY_WRITE)
*
* Returns a negative error code or 0
* The key words are stored in *key on success.
@@ -288,8 +289,8 @@ void put_futex_key(int fshared, union futex_key *key)
drop_futex_key_refs(key);
}

-/*
- * fault_in_user_writeable - fault in user address and verify RW access
+/**
+ * fault_in_user_writeable() - Fault in user address and verify RW access
* @uaddr: pointer to faulting user space address
*
* Slow path to fixup the fault we just took in the atomic write
@@ -309,8 +310,8 @@ static int fault_in_user_writeable(u32 __user *uaddr)

/**
* futex_top_waiter() - Return the highest priority waiter on a futex
- * @hb: the hash bucket the futex_q's reside in
- * @key: the futex key (to distinguish it from other futex futex_q's)
+ * @hb: the hash bucket the futex_q's reside in
+ * @key: the futex key (to distinguish it from other futex futex_q's)
*
* Must be called with the hb lock held.
*/
@@ -588,7 +589,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
}

/**
- * futex_lock_pi_atomic() - atomic work required to acquire a pi aware futex
+ * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
* @uaddr: the pi futex user address
* @hb: the pi futex hash bucket
* @key: the futex key associated with uaddr and hb
@@ -1011,9 +1012,9 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,

/**
* requeue_pi_wake_futex() - Wake a task that acquired the lock during requeue
- * q: the futex_q
- * key: the key of the requeue target futex
- * hb: the hash_bucket of the requeue target futex
+ * @q: the futex_q
+ * @key: the key of the requeue target futex
+ * @hb: the hash_bucket of the requeue target futex
*
* During futex_requeue, with requeue_pi=1, it is possible to acquire the
* target futex if it is uncontended or via a lock steal. Set the futex_q key
@@ -2319,9 +2320,9 @@ out:
*/

/**
- * sys_set_robust_list - set the robust-futex list head of a task
- * @head: pointer to the list-head
- * @len: length of the list-head, as userspace expects
+ * sys_set_robust_list() - Set the robust-futex list head of a task
+ * @head: pointer to the list-head
+ * @len: length of the list-head, as userspace expects
*/
SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
size_t, len)
@@ -2340,10 +2341,10 @@ SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
}

/**
- * sys_get_robust_list - get the robust-futex list head of a task
- * @pid: pid of the process [zero for current task]
- * @head_ptr: pointer to a list-head pointer, the kernel fills it in
- * @len_ptr: pointer to a length field, the kernel fills in the header size
+ * sys_get_robust_list() - Get the robust-futex list head of a task
+ * @pid: pid of the process [zero for current task]
+ * @head_ptr: pointer to a list-head pointer, the kernel fills it in
+ * @len_ptr: pointer to a length field, the kernel fills in the header size
*/
SYSCALL_DEFINE3(get_robust_list, int, pid,
struct robust_list_head __user * __user *, head_ptr,

2009-09-22 08:41:48

by Darren Hart

[permalink] [raw]
Subject: [tip:core/urgent] futex: Correct futex_q woken state commentary

Commit-ID: d8d88fbb186fe3ea37b2a58adb32413c98b59656
Gitweb: http://git.kernel.org/tip/d8d88fbb186fe3ea37b2a58adb32413c98b59656
Author: Darren Hart <[email protected]>
AuthorDate: Mon, 21 Sep 2009 22:30:30 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 22 Sep 2009 10:37:44 +0200

futex: Correct futex_q woken state commentary

Use kernel-doc format to describe struct futex_q.

Correct the wakeup definition to eliminate the statement about
waking the waiter between the plist_del() and the q->lock_ptr = 0.

Note in the comment that PI futexes have a different definition of
the woken state.

Signed-off-by: Darren Hart <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Dinakar Guniguntala <[email protected]>
Cc: John Stultz <[email protected]>
LKML-Reference: <20090922053029.8717.62798.stgit@Aeon>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/futex.c | 32 ++++++++++++++++----------------
1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 720fa3d..f92afbe 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -89,36 +89,36 @@ struct futex_pi_state {
union futex_key key;
};

-/*
- * We use this hashed waitqueue instead of a normal wait_queue_t, so
+/**
+ * struct futex_q - The hashed futex queue entry, one per waiting task
+ * @task: the task waiting on the futex
+ * @lock_ptr: the hash bucket lock
+ * @key: the key the futex is hashed on
+ * @pi_state: optional priority inheritance state
+ * @rt_waiter: rt_waiter storage for use with requeue_pi
+ * @requeue_pi_key: the requeue_pi target futex key
+ * @bitset: bitset for the optional bitmasked wakeup
+ *
+ * We use this hashed waitqueue, instead of a normal wait_queue_t, so
* we can wake only the relevant ones (hashed queues may be shared).
*
* A futex_q has a woken state, just like tasks have TASK_RUNNING.
* It is considered woken when plist_node_empty(&q->list) || q->lock_ptr == 0.
* The order of wakup is always to make the first condition true, then
- * wake up q->waiter, then make the second condition true.
+ * the second.
+ *
+ * PI futexes are typically woken before they are removed from the hash list via
+ * the rt_mutex code. See unqueue_me_pi().
*/
struct futex_q {
struct plist_node list;
- /* Waiter reference */
- struct task_struct *task;

- /* Which hash list lock to use: */
+ struct task_struct *task;
spinlock_t *lock_ptr;
-
- /* Key which the futex is hashed on: */
union futex_key key;
-
- /* Optional priority inheritance state: */
struct futex_pi_state *pi_state;
-
- /* rt_waiter storage for requeue_pi: */
struct rt_mutex_waiter *rt_waiter;
-
- /* The expected requeue pi target futex key: */
union futex_key *requeue_pi_key;
-
- /* Bitset for the optional bitmasked wakeup */
u32 bitset;
};

2009-09-22 08:41:59

by Darren Hart

[permalink] [raw]
Subject: [tip:core/urgent] futex: Fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me()

Commit-ID: 0729e196147692d84d4c099fcff056eba2ed61d8
Gitweb: http://git.kernel.org/tip/0729e196147692d84d4c099fcff056eba2ed61d8
Author: Darren Hart <[email protected]>
AuthorDate: Mon, 21 Sep 2009 22:30:38 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 22 Sep 2009 10:37:44 +0200

futex: Fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me()

PI futexes do not use the same plist_node_empty() test for wakeup.
It was possible for the waiter (in futex_wait_requeue_pi()) to set
TASK_INTERRUPTIBLE after the waker assigned the rtmutex to the
waiter. The waiter would then note the plist was not empty and call
schedule(). The task would not be found by any subsequeuent futex
wakeups, resulting in a userspace hang.

By moving the setting of TASK_INTERRUPTIBLE to before the call to
queue_me(), the race with the waker is eliminated. Since we no
longer call get_user() from within queue_me(), there is no need to
delay the setting of TASK_INTERRUPTIBLE until after the call to
queue_me().

The FUTEX_LOCK_PI operation is not affected as futex_lock_pi()
relies entirely on the rtmutex code to handle schedule() and
wakeup. The requeue PI code is affected because the waiter starts
as a non-PI waiter and is woken on a PI futex.

Remove the crusty old comment about holding spinlocks() across
get_user() as we no longer do that. Correct the locking statement
with a description of why the test is performed.

Signed-off-by: Darren Hart <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Dinakar Guniguntala <[email protected]>
Cc: John Stultz <[email protected]>
LKML-Reference: <20090922053038.8717.97838.stgit@Aeon>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/futex.c | 15 +++------------
1 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f92afbe..463af2e 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1656,17 +1656,8 @@ out:
static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
struct hrtimer_sleeper *timeout)
{
- queue_me(q, hb);
-
- /*
- * There might have been scheduling since the queue_me(), as we
- * cannot hold a spinlock across the get_user() in case it
- * faults, and we cannot just set TASK_INTERRUPTIBLE state when
- * queueing ourselves into the futex hash. This code thus has to
- * rely on the futex_wake() code removing us from hash when it
- * wakes us up.
- */
set_current_state(TASK_INTERRUPTIBLE);
+ queue_me(q, hb);

/* Arm the timer */
if (timeout) {
@@ -1676,8 +1667,8 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
}

/*
- * !plist_node_empty() is safe here without any lock.
- * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
+ * If we have been removed from the hash list, then another task
+ * has tried to wake us, and we can skip the call to schedule().
*/
if (likely(!plist_node_empty(&q->list))) {
/*

2009-09-22 09:10:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 5/5] futex: fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me


* Eric Dumazet <[email protected]> wrote:

> Darren Hart a ??crit :
> > PI futexes do not use the same plist_node_empty() test for wakeup. It was
> > possible for the waiter (in futex_wait_requeue_pi()) to set TASK_INTERRUPTIBLE
> > after the waker assigned the rtmutex to the waiter. The waiter would then note
> > the plist was not empty and call schedule(). The task would not be found by any
> > subsequeuent futex wakeups, resulting in a userspace hang. By moving the
> > setting of TASK_INTERRUPTIBLE to before the call to queue_me(), the race with
> > the waker is eliminated. Since we no longer call get_user() from within
> > queue_me(), there is no need to delay the setting of TASK_INTERRUPTIBLE until
> > after the call to queue_me().
> >
> > The FUTEX_LOCK_PI operation is not affected as futex_lock_pi() relies entirely
> > on the rtmutex code to handle schedule() and wakeup. The requeue PI code is
> > affected because the waiter starts as a non-PI waiter and is woken on a PI
> > futex.
> >
> > Remove the crusty old comment about holding spinlocks() across get_user() as we
> > no longer do that. Correct the locking statement with a description of why the
> > test is performed.
>
> I am very confused by this ChangeLog...
>
> >
> > Signed-off-by: Darren Hart <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > CC: Eric Dumazet <[email protected]>
> > CC: Dinakar Guniguntala <[email protected]>
> > CC: John Stultz <[email protected]>
> > ---
> >
> > kernel/futex.c | 15 +++------------
> > 1 files changed, 3 insertions(+), 12 deletions(-)
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index f92afbe..463af2e 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -1656,17 +1656,8 @@ out:
> > static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
> > struct hrtimer_sleeper *timeout)
> > {
> > - queue_me(q, hb);
> > -
> > - /*
> > - * There might have been scheduling since the queue_me(), as we
> > - * cannot hold a spinlock across the get_user() in case it
> > - * faults, and we cannot just set TASK_INTERRUPTIBLE state when
> > - * queueing ourselves into the futex hash. This code thus has to
> > - * rely on the futex_wake() code removing us from hash when it
> > - * wakes us up.
> > - */
> > set_current_state(TASK_INTERRUPTIBLE);
>
> Hmm, you missed the smp_mb() properties here...
>
> Before :
> queue_me()
> set_mb(current->state, TASK_INTERRUPTIBLE);
> if (timeout) {...}
> if (likely(!plist_node_empty(&q->list))) {
> ...
> }
>
> After :
> set_mb(current->state, TASK_INTERRUPTIBLE);
> queue_me();
> if (timeout) {...}
> // no barrier... why ar we still testing q->list
> // since it has no synchro between queue_me() and test ?
> if (likely(!plist_node_empty(&q->list))) {
> ...
> }

queue_me() itself does a spin_unlock(), so at least for the bits
protected by hb->lock it should be half-serializing.

Ingo

2009-09-22 17:22:20

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH 5/5] futex: fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me

Eric Dumazet wrote:
> Darren Hart a écrit :
>> PI futexes do not use the same plist_node_empty() test for wakeup. It was
>> possible for the waiter (in futex_wait_requeue_pi()) to set TASK_INTERRUPTIBLE
>> after the waker assigned the rtmutex to the waiter. The waiter would then note
>> the plist was not empty and call schedule(). The task would not be found by any
>> subsequeuent futex wakeups, resulting in a userspace hang. By moving the
>> setting of TASK_INTERRUPTIBLE to before the call to queue_me(), the race with
>> the waker is eliminated. Since we no longer call get_user() from within
>> queue_me(), there is no need to delay the setting of TASK_INTERRUPTIBLE until
>> after the call to queue_me().
>>
>> The FUTEX_LOCK_PI operation is not affected as futex_lock_pi() relies entirely
>> on the rtmutex code to handle schedule() and wakeup. The requeue PI code is
>> affected because the waiter starts as a non-PI waiter and is woken on a PI
>> futex.
>>
>> Remove the crusty old comment about holding spinlocks() across get_user() as we
>> no longer do that. Correct the locking statement with a description of why the
>> test is performed.
>
> I am very confused by this ChangeLog...
>
>> Signed-off-by: Darren Hart <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Steven Rostedt <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> CC: Eric Dumazet <[email protected]>
>> CC: Dinakar Guniguntala <[email protected]>
>> CC: John Stultz <[email protected]>
>> ---
>>
>> kernel/futex.c | 15 +++------------
>> 1 files changed, 3 insertions(+), 12 deletions(-)
>>
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index f92afbe..463af2e 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -1656,17 +1656,8 @@ out:
>> static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
>> struct hrtimer_sleeper *timeout)
>> {
>> - queue_me(q, hb);
>> -
>> - /*
>> - * There might have been scheduling since the queue_me(), as we
>> - * cannot hold a spinlock across the get_user() in case it
>> - * faults, and we cannot just set TASK_INTERRUPTIBLE state when
>> - * queueing ourselves into the futex hash. This code thus has to
>> - * rely on the futex_wake() code removing us from hash when it
>> - * wakes us up.
>> - */
>> set_current_state(TASK_INTERRUPTIBLE);
>
> Hmm, you missed the smp_mb() properties here...
>
> Before :
> queue_me()
> set_mb(current->state, TASK_INTERRUPTIBLE);
> if (timeout) {...}
> if (likely(!plist_node_empty(&q->list))) {
> ...
> }
>
> After :
> set_mb(current->state, TASK_INTERRUPTIBLE);
> queue_me();
> if (timeout) {...}
> // no barrier... why ar we still testing q->list
> // since it has no synchro between queue_me() and test ?

As Ingo said, the barrier is covered by the spin_unlock() in queue_me()
according to memory-barriers.txt:


(2) UNLOCK operation implication:

Memory operations issued before the UNLOCK will be completed before
the UNLOCK operation has completed.

> if (likely(!plist_node_empty(&q->list))) {

Note that this test is really just an optimization to avoid calling
schedule() if the waker has already removed the futex_q from the list.
If it is about to wake us, but hasn't removed us from the list, it will
have set TASK_RUNNING and schedule() will do the right thing, with a
little more overhead than is truly necessary.

Thanks,

Darren Hart

> ...
> }
>
>
>
>> + queue_me(q, hb);
>>
>> /* Arm the timer */
>> if (timeout) {
>> @@ -1676,8 +1667,8 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
>> }
>>
>> /*
>> - * !plist_node_empty() is safe here without any lock.
>> - * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
>> + * If we have been removed from the hash list, then another task
>> + * has tried to wake us, and we can skip the call to schedule().
>> */
>> if (likely(!plist_node_empty(&q->list))) {
>> /*
>>
>


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

2009-09-22 19:27:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 5/5] futex: fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me


* Darren Hart <[email protected]> wrote:

> Eric Dumazet wrote:
>> Darren Hart a ??crit :
>>> PI futexes do not use the same plist_node_empty() test for wakeup. It was
>>> possible for the waiter (in futex_wait_requeue_pi()) to set TASK_INTERRUPTIBLE
>>> after the waker assigned the rtmutex to the waiter. The waiter would then note
>>> the plist was not empty and call schedule(). The task would not be found by any
>>> subsequeuent futex wakeups, resulting in a userspace hang. By moving the
>>> setting of TASK_INTERRUPTIBLE to before the call to queue_me(), the race with
>>> the waker is eliminated. Since we no longer call get_user() from within
>>> queue_me(), there is no need to delay the setting of TASK_INTERRUPTIBLE until
>>> after the call to queue_me().
>>>
>>> The FUTEX_LOCK_PI operation is not affected as futex_lock_pi() relies entirely
>>> on the rtmutex code to handle schedule() and wakeup. The requeue PI code is
>>> affected because the waiter starts as a non-PI waiter and is woken on a PI
>>> futex.
>>>
>>> Remove the crusty old comment about holding spinlocks() across get_user() as we
>>> no longer do that. Correct the locking statement with a description of why the
>>> test is performed.
>>
>> I am very confused by this ChangeLog...
>>
>>> Signed-off-by: Darren Hart <[email protected]>
>>> Cc: Thomas Gleixner <[email protected]>
>>> Cc: Peter Zijlstra <[email protected]>
>>> Cc: Steven Rostedt <[email protected]>
>>> Cc: Ingo Molnar <[email protected]>
>>> CC: Eric Dumazet <[email protected]>
>>> CC: Dinakar Guniguntala <[email protected]>
>>> CC: John Stultz <[email protected]>
>>> ---
>>>
>>> kernel/futex.c | 15 +++------------
>>> 1 files changed, 3 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/kernel/futex.c b/kernel/futex.c
>>> index f92afbe..463af2e 100644
>>> --- a/kernel/futex.c
>>> +++ b/kernel/futex.c
>>> @@ -1656,17 +1656,8 @@ out:
>>> static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
>>> struct hrtimer_sleeper *timeout)
>>> {
>>> - queue_me(q, hb);
>>> -
>>> - /*
>>> - * There might have been scheduling since the queue_me(), as we
>>> - * cannot hold a spinlock across the get_user() in case it
>>> - * faults, and we cannot just set TASK_INTERRUPTIBLE state when
>>> - * queueing ourselves into the futex hash. This code thus has to
>>> - * rely on the futex_wake() code removing us from hash when it
>>> - * wakes us up.
>>> - */
>>> set_current_state(TASK_INTERRUPTIBLE);
>>
>> Hmm, you missed the smp_mb() properties here...
>>
>> Before :
>> queue_me()
>> set_mb(current->state, TASK_INTERRUPTIBLE);
>> if (timeout) {...}
>> if (likely(!plist_node_empty(&q->list))) {
>> ...
>> }
>>
>> After :
>> set_mb(current->state, TASK_INTERRUPTIBLE);
>> queue_me();
>> if (timeout) {...}
>> // no barrier... why ar we still testing q->list
>> // since it has no synchro between queue_me() and test ?
>
> As Ingo said, the barrier is covered by the spin_unlock() in queue_me()
> according to memory-barriers.txt:
>
>
> (2) UNLOCK operation implication:
>
> Memory operations issued before the UNLOCK will be completed before
> the UNLOCK operation has completed.

btw., it might make sense to add a comment about this - it's not trivial
at all and the barrier rules here are tricky ...

Ingo