2014-10-17 16:38:58

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616bd07f (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.

However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread to have read "futex_val ==
locked" (in kernel).

Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.

Signed-off-by: Catalin Marinas <[email protected]>
Reported-by: Matteo Franchin <[email protected]>
Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Cc: <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul E. McKenney <[email protected]>
---
kernel/futex.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/futex.c b/kernel/futex.c
index 815d7af2ffe8..f3a3a071283c 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -343,6 +343,8 @@ static void get_futex_key_refs(union futex_key *key)
case FUT_OFF_MMSHARED:
futex_get_mm(key); /* implies MB (B) */
break;
+ default:
+ smp_mb(); /* explicit MB (B) */
}
}


2014-10-18 06:54:52

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Fri, 2014-10-17 at 17:38 +0100, Catalin Marinas wrote:
> Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
> nothing to wake up) changes the futex code to avoid taking a lock when
> there are no waiters. This code has been subsequently fixed in commit
> 11d4616bd07f (futex: revert back to the explicit waiter counting code).
> Both the original commit and the fix-up rely on get_futex_key_refs() to
> always imply a barrier.
>
> However, for private futexes, none of the cases in the switch statement
> of get_futex_key_refs() would be hit and the function completes without
> a memory barrier as required before checking the "waiters" in
> futex_wake() -> hb_waiters_pending(). The consequence is a race with a
> thread waiting on a futex on another CPU, allowing the waker thread to
> read "waiters == 0" while the waiter thread to have read "futex_val ==
> locked" (in kernel).
>
> Without this fix, the problem (user space deadlocks) can be seen with
> Android bionic's mutex implementation on an arm64 multi-cluster system.

How 'bout that, you just triggered my "watch this pot" alarm.

https://lkml.org/lkml/2014/10/8/406

The hang I encountered with stockfish only ever happened on one specific
box. Linus/Thomas said it I was likely a problem with the futex usage,
but it suspiciously deterministic, so I put this on the "watch out for
further evidence" back burner.

The barrier fixing up my problematic box smells a lot like evidence.

> Signed-off-by: Catalin Marinas <[email protected]>
> Reported-by: Matteo Franchin <[email protected]>
> Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
> Cc: <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Darren Hart <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> ---
> kernel/futex.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 815d7af2ffe8..f3a3a071283c 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -343,6 +343,8 @@ static void get_futex_key_refs(union futex_key *key)
> case FUT_OFF_MMSHARED:
> futex_get_mm(key); /* implies MB (B) */
> break;
> + default:
> + smp_mb(); /* explicit MB (B) */
> }
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2014-10-18 07:09:22

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

(fixes Davidlohr bounce)

On Sat, 2014-10-18 at 08:54 +0200, Mike Galbraith wrote:
> On Fri, 2014-10-17 at 17:38 +0100, Catalin Marinas wrote:
> > Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
> > nothing to wake up) changes the futex code to avoid taking a lock when
> > there are no waiters. This code has been subsequently fixed in commit
> > 11d4616bd07f (futex: revert back to the explicit waiter counting code).
> > Both the original commit and the fix-up rely on get_futex_key_refs() to
> > always imply a barrier.
> >
> > However, for private futexes, none of the cases in the switch statement
> > of get_futex_key_refs() would be hit and the function completes without
> > a memory barrier as required before checking the "waiters" in
> > futex_wake() -> hb_waiters_pending(). The consequence is a race with a
> > thread waiting on a futex on another CPU, allowing the waker thread to
> > read "waiters == 0" while the waiter thread to have read "futex_val ==
> > locked" (in kernel).
> >
> > Without this fix, the problem (user space deadlocks) can be seen with
> > Android bionic's mutex implementation on an arm64 multi-cluster system.
>
> How 'bout that, you just triggered my "watch this pot" alarm.
>
> https://lkml.org/lkml/2014/10/8/406
>
> The hang I encountered with stockfish only ever happened on one specific
> box. Linus/Thomas said it I was likely a problem with the futex usage,
> but it suspiciously deterministic, so I put this on the "watch out for
> further evidence" back burner.
>
> The barrier fixing up my problematic box smells a lot like evidence.
>
> > Signed-off-by: Catalin Marinas <[email protected]>
> > Reported-by: Matteo Franchin <[email protected]>
> > Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
> > Cc: <[email protected]>
> > Cc: Davidlohr Bueso <[email protected]>
> > Cc: Linus Torvalds <[email protected]>
> > Cc: Darren Hart <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Paul E. McKenney <[email protected]>
> > ---
> > kernel/futex.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index 815d7af2ffe8..f3a3a071283c 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -343,6 +343,8 @@ static void get_futex_key_refs(union futex_key *key)
> > case FUT_OFF_MMSHARED:
> > futex_get_mm(key); /* implies MB (B) */
> > break;
> > + default:
> > + smp_mb(); /* explicit MB (B) */
> > }
> > }
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
>
>

2014-10-18 07:33:18

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Fri, 2014-10-17 at 17:38 +0100, Catalin Marinas wrote:
> Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
> nothing to wake up) changes the futex code to avoid taking a lock when
> there are no waiters. This code has been subsequently fixed in commit
> 11d4616bd07f (futex: revert back to the explicit waiter counting code).
> Both the original commit and the fix-up rely on get_futex_key_refs() to
> always imply a barrier.
>
> However, for private futexes, none of the cases in the switch statement
> of get_futex_key_refs() would be hit and the function completes without
> a memory barrier as required before checking the "waiters" in
> futex_wake() -> hb_waiters_pending().

Good catch, glad I ran into this thread (my email recently changed).
Private process futex (PTHREAD_PROCESS_PRIVATE) have no reference on an
inode or mm so it would need the explicit barrier in those cases.

> The consequence is a race with a
> thread waiting on a futex on another CPU, allowing the waker thread to
> read "waiters == 0" while the waiter thread to have read "futex_val ==
> locked" (in kernel).

Yeah missing wakeups are a strong sign of a problem with the
hb_waiters_pending() side.

> Without this fix, the problem (user space deadlocks) can be seen with
> Android bionic's mutex implementation on an arm64 multi-cluster system.
> Signed-off-by: Catalin Marinas <[email protected]>
> Reported-by: Matteo Franchin <[email protected]>
> Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
> Cc: <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Darren Hart <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> ---
> kernel/futex.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 815d7af2ffe8..f3a3a071283c 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -343,6 +343,8 @@ static void get_futex_key_refs(union futex_key *key)
> case FUT_OFF_MMSHARED:
> futex_get_mm(key); /* implies MB (B) */
> break;
> + default:
> + smp_mb(); /* explicit MB (B) */
> }

Should we comment that this default is for the private futex case?
Otherwise:

Acked-by: Davidlohr Bueso <[email protected]>

2014-10-18 15:28:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Fri, Oct 17, 2014 at 11:54 PM, Mike Galbraith
<[email protected]> wrote:
>
> The barrier fixing up my problematic box smells a lot like evidence.

Is this a "tested-by"? Did you actuallyu verify that the patch ends up
fixing the problem you saw?

Linus

2014-10-18 16:15:36

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Sat, 2014-10-18 at 08:28 -0700, Linus Torvalds wrote:
> On Fri, Oct 17, 2014 at 11:54 PM, Mike Galbraith
> <[email protected]> wrote:
> >
> > The barrier fixing up my problematic box smells a lot like evidence.
>
> Is this a "tested-by"? Did you actuallyu verify that the patch ends up
> fixing the problem you saw?

Yup, it definitely fixed it up.

Weird that it didn't show at all on the 2 socket 20 core box the problem
I was looking into was reported on, but was 100% busted on the similar 2
socket 28 core box I borrowed to have a peek, and only that box. My
(crippled/slow) 64 core DL980 was perfectly happy, as were my 3 home
boxen, not a peep from anywhere but that one 28 core box.

Hohum.. Tested-by: Mike Galbraith <[email protected]>

-Mike

2014-10-18 19:32:17

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On 10/17/14 11:38, Catalin Marinas wrote:
> Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
> nothing to wake up) changes the futex code to avoid taking a lock when
> there are no waiters. This code has been subsequently fixed in commit
> 11d4616bd07f (futex: revert back to the explicit waiter counting code).
> Both the original commit and the fix-up rely on get_futex_key_refs() to
> always imply a barrier.
>
> However, for private futexes, none of the cases in the switch statement
> of get_futex_key_refs() would be hit and the function completes without
> a memory barrier as required before checking the "waiters" in
> futex_wake() -> hb_waiters_pending(). The consequence is a race with a
> thread waiting on a futex on another CPU, allowing the waker thread to
> read "waiters == 0" while the waiter thread to have read "futex_val ==
> locked" (in kernel).


Verified that this is:

a) how it is documented to work
b) not how it actually works currently

Nice catch indeed.

...

> diff --git a/kernel/futex.c b/kernel/futex.c
> index 815d7af2ffe8..f3a3a071283c 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -343,6 +343,8 @@ static void get_futex_key_refs(union futex_key *key)
> case FUT_OFF_MMSHARED:
> futex_get_mm(key); /* implies MB (B) */
> break;
> + default:

A comment here indicating this covers the PROCESS_PRIVATE futex case
would be welcome, given the complexity involved.

> + smp_mb(); /* explicit MB (B) */

Also, the "Basic" futex operation and ordering guarantees documentation
currently reads:

* Where (A) orders the waiters increment and the futex value read through
* atomic operations (see hb_waiters_inc) and where (B) orders the write
* to futex and the waiters read -- this is done by the barriers in
* get_futex_key_refs(), through either ihold or atomic_inc, depending
on the
* futex type.

Which is not incomplete (lacking the explicit smp_mb()) added by this
patch. Perhaps the MB implementation of get_futex_key_refs() need not be
explicitly enumerated here?

--
Darren Hart
Intel Open Source Technology Center

2014-10-18 19:58:52

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Sat, 2014-10-18 at 00:33 -0700, Davidlohr Bueso wrote:
> On Fri, 2014-10-17 at 17:38 +0100, Catalin Marinas wrote:
> > Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
> > nothing to wake up) changes the futex code to avoid taking a lock when
> > there are no waiters. This code has been subsequently fixed in commit
> > 11d4616bd07f (futex: revert back to the explicit waiter counting code).
> > Both the original commit and the fix-up rely on get_futex_key_refs() to
> > always imply a barrier.
> >
> > However, for private futexes, none of the cases in the switch statement
> > of get_futex_key_refs() would be hit and the function completes without
> > a memory barrier as required before checking the "waiters" in
> > futex_wake() -> hb_waiters_pending().
>
> Good catch, glad I ran into this thread (my email recently changed).
> Private process futex (PTHREAD_PROCESS_PRIVATE) have no reference on an
> inode or mm so it would need the explicit barrier in those cases.

And [get/put]_futex_keys() shouldn't even be called for private futexes.
The following patch had some very minor testing on a 60 core box last
night, but passes both Darren's and perf's tests. So I *think* this is
right, but lack of sleep and I overall just don't trust them futexes!

8<----------------------------------------------------------
From: Davidlohr Bueso <[email protected]>
Date: Sat, 18 Oct 2014 12:30:37 -0700
Subject: [PATCH 2/1] futex: No key referencing for private futexes

Because private futexes do not hold references on either
an inode or mm, they should not be calling key referencing
operations (even though they are practically a nop). However,
we need to call the get part only because we need the barrier
in order to maintain correct ordering guarantees for the lockless
waiter checking. In addition, we can avoid calling the put part
for private futexes altogether, as it serves no purpose in the
ordering.

This patch 1) documents the situation, 2) explicitly avoids calling
drop_futex_key_refs() when calling put_futex_keys() for private futexes
and 3) changes the interface of the function to pass the 'fshared'
variable, similarly to get_futex_key_refs(). In theory this should apply
to all drop_futex_key_refs() callers, but just keep it simple and apply
it as the get/put alternatives when calling futex(2).

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/futex.c | 99 ++++++++++++++++++++++++++++++++--------------------------
1 file changed, 55 insertions(+), 44 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 815d7af..21f7e41 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -415,6 +415,11 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
if (!fshared) {
key->private.mm = mm;
key->private.address = address;
+ /*
+ * Private futexes do not hold reference on an inode or
+ * mm, therefore the only purpose of calling get_futex_key_refs
+ * is because we need the barrier for the lockless waiter check.
+ */
get_futex_key_refs(key); /* implies MB (B) */
return 0;
}
@@ -530,9 +535,14 @@ out:
return err;
}

-static inline void put_futex_key(union futex_key *key)
+static inline void put_futex_key(int fshared, union futex_key *key)
{
- drop_futex_key_refs(key);
+ /*
+ * See comment in get_futex_key() about key
+ * referencing when dealing with private futexes.
+ */
+ if (fshared)
+ drop_futex_key_refs(key);
}

/**
@@ -1202,12 +1212,12 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
struct futex_hash_bucket *hb;
struct futex_q *this, *next;
union futex_key key = FUTEX_KEY_INIT;
- int ret;
+ int ret, fshared = flags & FLAGS_SHARED;

if (!bitset)
return -EINVAL;

- ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, VERIFY_READ);
+ ret = get_futex_key(uaddr, fshared, &key, VERIFY_READ);
if (unlikely(ret != 0))
goto out;

@@ -1238,7 +1248,7 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)

spin_unlock(&hb->lock);
out_put_key:
- put_futex_key(&key);
+ put_futex_key(fshared, &key);
out:
return ret;
}
@@ -1254,13 +1264,13 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
struct futex_hash_bucket *hb1, *hb2;
struct futex_q *this, *next;
- int ret, op_ret;
+ int ret, op_ret, fshared = flags & FLAGS_SHARED;

retry:
- ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, VERIFY_READ);
+ ret = get_futex_key(uaddr1, fshared, &key1, VERIFY_READ);
if (unlikely(ret != 0))
goto out;
- ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, VERIFY_WRITE);
+ ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_WRITE);
if (unlikely(ret != 0))
goto out_put_key1;

@@ -1292,11 +1302,11 @@ retry_private:
if (ret)
goto out_put_keys;

- if (!(flags & FLAGS_SHARED))
+ if (!fshared)
goto retry_private;

- put_futex_key(&key2);
- put_futex_key(&key1);
+ put_futex_key(fshared, &key2);
+ put_futex_key(fshared, &key1);
goto retry;
}

@@ -1331,9 +1341,9 @@ retry_private:
out_unlock:
double_unlock_hb(hb1, hb2);
out_put_keys:
- put_futex_key(&key2);
+ put_futex_key(fshared, &key2);
out_put_key1:
- put_futex_key(&key1);
+ put_futex_key(fshared, &key1);
out:
return ret;
}
@@ -1485,10 +1495,11 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
u32 *cmpval, int requeue_pi)
{
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
- int drop_count = 0, task_count = 0, ret;
+ int drop_count = 0, task_count = 0;
struct futex_pi_state *pi_state = NULL;
struct futex_hash_bucket *hb1, *hb2;
struct futex_q *this, *next;
+ int ret, fshared = flags & FLAGS_SHARED;

if (requeue_pi) {
/*
@@ -1528,10 +1539,10 @@ retry:
pi_state = NULL;
}

- ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, VERIFY_READ);
+ ret = get_futex_key(uaddr1, fshared, &key1, VERIFY_READ);
if (unlikely(ret != 0))
goto out;
- ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2,
+ ret = get_futex_key(uaddr2, fshared, &key2,
requeue_pi ? VERIFY_WRITE : VERIFY_READ);
if (unlikely(ret != 0))
goto out_put_key1;
@@ -1565,11 +1576,11 @@ retry_private:
if (ret)
goto out_put_keys;

- if (!(flags & FLAGS_SHARED))
+ if (!fshared)
goto retry_private;

- put_futex_key(&key2);
- put_futex_key(&key1);
+ put_futex_key(fshared, &key2);
+ put_futex_key(fshared, &key1);
goto retry;
}
if (curval != *cmpval) {
@@ -1619,8 +1630,8 @@ retry_private:
case -EFAULT:
double_unlock_hb(hb1, hb2);
hb_waiters_dec(hb2);
- put_futex_key(&key2);
- put_futex_key(&key1);
+ put_futex_key(fshared, &key2);
+ put_futex_key(fshared, &key1);
ret = fault_in_user_writeable(uaddr2);
if (!ret)
goto retry;
@@ -1634,8 +1645,8 @@ retry_private:
*/
double_unlock_hb(hb1, hb2);
hb_waiters_dec(hb2);
- put_futex_key(&key2);
- put_futex_key(&key1);
+ put_futex_key(fshared, &key2);
+ put_futex_key(fshared, &key1);
cond_resched();
goto retry;
default:
@@ -1721,9 +1732,9 @@ out_unlock:
drop_futex_key_refs(&key1);

out_put_keys:
- put_futex_key(&key2);
+ put_futex_key(fshared, &key2);
out_put_key1:
- put_futex_key(&key1);
+ put_futex_key(fshared, &key1);
out:
if (pi_state != NULL)
free_pi_state(pi_state);
@@ -2098,7 +2109,7 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
struct futex_q *q, struct futex_hash_bucket **hb)
{
u32 uval;
- int ret;
+ int ret, fshared = flags & FLAGS_SHARED;

/*
* Access the page AFTER the hash-bucket is locked.
@@ -2119,7 +2130,7 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
* while the syscall executes.
*/
retry:
- ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key, VERIFY_READ);
+ ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
if (unlikely(ret != 0))
return ret;

@@ -2135,10 +2146,10 @@ retry_private:
if (ret)
goto out;

- if (!(flags & FLAGS_SHARED))
+ if (!fshared)
goto retry_private;

- put_futex_key(&q->key);
+ put_futex_key(fshared, &q->key);
goto retry;
}

@@ -2149,7 +2160,7 @@ retry_private:

out:
if (ret)
- put_futex_key(&q->key);
+ put_futex_key(fshared, &q->key);
return ret;
}

@@ -2256,7 +2267,7 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags, int detect,
struct hrtimer_sleeper timeout, *to = NULL;
struct futex_hash_bucket *hb;
struct futex_q q = futex_q_init;
- int res, ret;
+ int res, ret, fshared = flags & FLAGS_SHARED;

if (refill_pi_state_cache())
return -ENOMEM;
@@ -2270,7 +2281,7 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags, int detect,
}

retry:
- ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key, VERIFY_WRITE);
+ ret = get_futex_key(uaddr, fshared, &q.key, VERIFY_WRITE);
if (unlikely(ret != 0))
goto out;

@@ -2294,7 +2305,7 @@ retry_private:
* - The user space value changed.
*/
queue_unlock(hb);
- put_futex_key(&q.key);
+ put_futex_key(fshared, &q.key);
cond_resched();
goto retry;
default:
@@ -2348,7 +2359,7 @@ out_unlock_put_key:
queue_unlock(hb);

out_put_key:
- put_futex_key(&q.key);
+ put_futex_key(fshared, &q.key);
out:
if (to)
destroy_hrtimer_on_stack(&to->timer);
@@ -2361,10 +2372,10 @@ uaddr_faulted:
if (ret)
goto out_put_key;

- if (!(flags & FLAGS_SHARED))
+ if (!fshared)
goto retry_private;

- put_futex_key(&q.key);
+ put_futex_key(fshared, &q.key);
goto retry;
}

@@ -2379,7 +2390,7 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
union futex_key key = FUTEX_KEY_INIT;
struct futex_hash_bucket *hb;
struct futex_q *match;
- int ret;
+ int ret, fshared = flags & FLAGS_SHARED;

retry:
if (get_user(uval, uaddr))
@@ -2390,7 +2401,7 @@ retry:
if ((uval & FUTEX_TID_MASK) != vpid)
return -EPERM;

- ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, VERIFY_WRITE);
+ ret = get_futex_key(uaddr, fshared, &key, VERIFY_WRITE);
if (ret)
return ret;

@@ -2431,12 +2442,12 @@ retry:

out_unlock:
spin_unlock(&hb->lock);
- put_futex_key(&key);
+ put_futex_key(fshared, &key);
return ret;

pi_faulted:
spin_unlock(&hb->lock);
- put_futex_key(&key);
+ put_futex_key(fshared, &key);

ret = fault_in_user_writeable(uaddr);
if (!ret)
@@ -2544,7 +2555,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
struct futex_hash_bucket *hb;
union futex_key key2 = FUTEX_KEY_INIT;
struct futex_q q = futex_q_init;
- int res, ret;
+ int res, ret, fshared = flags & FLAGS_SHARED;

if (uaddr == uaddr2)
return -EINVAL;
@@ -2571,7 +2582,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
RB_CLEAR_NODE(&rt_waiter.tree_entry);
rt_waiter.task = NULL;

- ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, VERIFY_WRITE);
+ ret = get_futex_key(uaddr2, fshared, &key2, VERIFY_WRITE);
if (unlikely(ret != 0))
goto out;

@@ -2673,9 +2684,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
}

out_put_keys:
- put_futex_key(&q.key);
+ put_futex_key(fshared, &q.key);
out_key2:
- put_futex_key(&key2);
+ put_futex_key(fshared, &key2);

out:
if (to) {
--
1.8.4.5


2014-10-18 20:20:09

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Sat, 2014-10-18 at 14:32 -0500, Darren Hart wrote:
> Which is not incomplete (lacking the explicit smp_mb()) added by this
> patch. Perhaps the MB implementation of get_futex_key_refs() need not be
> explicitly enumerated here?

Agreed, how about this:

diff --git a/kernel/futex.c b/kernel/futex.c
index 21f7e41..7a0805a 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -143,9 +143,8 @@
*
* Where (A) orders the waiters increment and the futex value read through
* atomic operations (see hb_waiters_inc) and where (B) orders the write
- * to futex and the waiters read -- this is done by the barriers in
- * get_futex_key_refs(), through either ihold or atomic_inc, depending on the
- * futex type.
+ * to futex and the waiters read -- this is done by the barriers for both
+ * shared and private futexes in get_futex_key_refs().
*
* This yields the following case (where X:=waiters, Y:=futex):
*


2014-10-18 20:50:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Sat, Oct 18, 2014 at 12:58 PM, Davidlohr Bueso <[email protected]> wrote:
>
> And [get/put]_futex_keys() shouldn't even be called for private futexes.
> The following patch had some very minor testing on a 60 core box last
> night, but passes both Darren's and perf's tests. So I *think* this is
> right, but lack of sleep and I overall just don't trust them futexes!

Hmm. I don't see the advantage of making the code more complex in
order to avoid the functions that are no-ops for the !fshared case?

IOW, as far as I can tell, this patch doesn't actually really *change*
anything. Am I missing something?

Linus

2014-10-19 01:48:44

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier



On October 18, 2014 3:19:50 PM CDT, Davidlohr Bueso <[email protected]> wrote:
>On Sat, 2014-10-18 at 14:32 -0500, Darren Hart wrote:
>> Which is not incomplete (lacking the explicit smp_mb()) added by this
>> patch. Perhaps the MB implementation of get_futex_key_refs() need not
>be
>> explicitly enumerated here?
>
>Agreed, how about this:
>
>diff --git a/kernel/futex.c b/kernel/futex.c
>index 21f7e41..7a0805a 100644
>--- a/kernel/futex.c
>+++ b/kernel/futex.c
>@@ -143,9 +143,8 @@
> *
>* Where (A) orders the waiters increment and the futex value read
>through
>* atomic operations (see hb_waiters_inc) and where (B) orders the write
>- * to futex and the waiters read -- this is done by the barriers in
>- * get_futex_key_refs(), through either ihold or atomic_inc, depending
>on the
>- * futex type.
>+ * to futex and the waiters read -- this is done by the barriers for
>both
>+ * shared and private futexes in get_futex_key_refs().

Works for me.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2014-10-19 02:16:36

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Sat, 2014-10-18 at 13:50 -0700, Linus Torvalds wrote:
> On Sat, Oct 18, 2014 at 12:58 PM, Davidlohr Bueso <[email protected]> wrote:
> >
> > And [get/put]_futex_keys() shouldn't even be called for private futexes.
> > The following patch had some very minor testing on a 60 core box last
> > night, but passes both Darren's and perf's tests. So I *think* this is
> > right, but lack of sleep and I overall just don't trust them futexes!
>
> Hmm. I don't see the advantage of making the code more complex in
> order to avoid the functions that are no-ops for the !fshared case?
>
> IOW, as far as I can tell, this patch doesn't actually really *change*
> anything. Am I missing something?

Right, all we do is avoid a NOP, but I don't see how this patch makes
the code more complex. In fact, the whole idea is to make it easier to
read and makes the key referencing differences between shared and
private futexes crystal clear, hoping to mitigate future bugs.

Thanks,
Davidlohr

2014-10-20 09:11:49

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Sat, 18 Oct 2014, Davidlohr Bueso wrote:
> On Sat, 2014-10-18 at 13:50 -0700, Linus Torvalds wrote:
> > On Sat, Oct 18, 2014 at 12:58 PM, Davidlohr Bueso <[email protected]> wrote:
> > >
> > > And [get/put]_futex_keys() shouldn't even be called for private futexes.
> > > The following patch had some very minor testing on a 60 core box last
> > > night, but passes both Darren's and perf's tests. So I *think* this is
> > > right, but lack of sleep and I overall just don't trust them futexes!
> >
> > Hmm. I don't see the advantage of making the code more complex in
> > order to avoid the functions that are no-ops for the !fshared case?
> >
> > IOW, as far as I can tell, this patch doesn't actually really *change*
> > anything. Am I missing something?
>
> Right, all we do is avoid a NOP, but I don't see how this patch makes
> the code more complex. In fact, the whole idea is to make it easier to
> read and makes the key referencing differences between shared and
> private futexes crystal clear, hoping to mitigate future bugs.

I tend to disagree. The current code is symetric versus get/drop and
you make it unsymetric by avoiding the drop call with a pointless
extra conditional in all call sites.

I really had to look twice to figure out that the patch is correct,
but I really cannot see any value and definitely have a hard time how
this makes the code clearer and would prevent future bugs.

I rather keep it symetric and document the NOP property for private
futexes in both get and drop.

Thanks,

tglx

2014-10-20 10:15:42

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Sat, Oct 18, 2014 at 09:19:50PM +0100, Davidlohr Bueso wrote:
> On Sat, 2014-10-18 at 14:32 -0500, Darren Hart wrote:
> > Which is not incomplete (lacking the explicit smp_mb()) added by this
> > patch. Perhaps the MB implementation of get_futex_key_refs() need not be
> > explicitly enumerated here?
>
> Agreed, how about this:
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 21f7e41..7a0805a 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -143,9 +143,8 @@
> *
> * Where (A) orders the waiters increment and the futex value read through
> * atomic operations (see hb_waiters_inc) and where (B) orders the write
> - * to futex and the waiters read -- this is done by the barriers in
> - * get_futex_key_refs(), through either ihold or atomic_inc, depending on the
> - * futex type.
> + * to futex and the waiters read -- this is done by the barriers for both
> + * shared and private futexes in get_futex_key_refs().
> *
> * This yields the following case (where X:=waiters, Y:=futex):

Looks fine to me. Since Linus already picked the original patch, if you
plan to send an update for the comments please also mention the
"explicit MB (B) for private futexes" in get_futex_key_refs().

Thanks.

--
Catalin

2014-10-20 10:49:38

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Mon, Oct 20, 2014 at 10:11:40AM +0100, Thomas Gleixner wrote:
> On Sat, 18 Oct 2014, Davidlohr Bueso wrote:
> > On Sat, 2014-10-18 at 13:50 -0700, Linus Torvalds wrote:
> > > On Sat, Oct 18, 2014 at 12:58 PM, Davidlohr Bueso <[email protected]> wrote:
> > > >
> > > > And [get/put]_futex_keys() shouldn't even be called for private futexes.
> > > > The following patch had some very minor testing on a 60 core box last
> > > > night, but passes both Darren's and perf's tests. So I *think* this is
> > > > right, but lack of sleep and I overall just don't trust them futexes!
> > >
> > > Hmm. I don't see the advantage of making the code more complex in
> > > order to avoid the functions that are no-ops for the !fshared case?
> > >
> > > IOW, as far as I can tell, this patch doesn't actually really *change*
> > > anything. Am I missing something?
> >
> > Right, all we do is avoid a NOP, but I don't see how this patch makes
> > the code more complex. In fact, the whole idea is to make it easier to
> > read and makes the key referencing differences between shared and
> > private futexes crystal clear, hoping to mitigate future bugs.
>
> I tend to disagree. The current code is symetric versus get/drop and
> you make it unsymetric by avoiding the drop call with a pointless
> extra conditional in all call sites.

Since you mention symmetry, something like below makes the barriers more
explicit. However, it removes the implied barrier for get_futex_key_refs
and the other calling places need to be verified (requeue_futex and
requeue_pi_futex; if barrier is not required here, the result may be
slightly more optimal, not sure you would see the difference though).


diff --git a/kernel/futex.c b/kernel/futex.c
index f3a3a071283c..5b9d857d0816 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -143,9 +143,7 @@
*
* Where (A) orders the waiters increment and the futex value read through
* atomic operations (see hb_waiters_inc) and where (B) orders the write
- * to futex and the waiters read -- this is done by the barriers in
- * get_futex_key_refs(), through either ihold or atomic_inc, depending on the
- * futex type.
+ * to futex and the waiters read (see hb_waiters_pending).
*
* This yields the following case (where X:=waiters, Y:=futex):
*
@@ -262,12 +260,6 @@ static struct futex_hash_bucket *futex_queues;
static inline void futex_get_mm(union futex_key *key)
{
atomic_inc(&key->private.mm->mm_count);
- /*
- * Ensure futex_get_mm() implies a full barrier such that
- * get_futex_key() implies a full barrier. This is relied upon
- * as full barrier (B), see the ordering comment above.
- */
- smp_mb__after_atomic();
}

/*
@@ -297,6 +289,10 @@ static inline void hb_waiters_dec(struct futex_hash_bucket *hb)

static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
{
+ /*
+ * Full barrier (B), see the ordering comment above.
+ */
+ smp_mb__before_atomic();
#ifdef CONFIG_SMP
return atomic_read(&hb->waiters);
#else
@@ -338,13 +334,11 @@ static void get_futex_key_refs(union futex_key *key)

switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
case FUT_OFF_INODE:
- ihold(key->shared.inode); /* implies MB (B) */
+ __iget(key->shared.inode);
break;
case FUT_OFF_MMSHARED:
- futex_get_mm(key); /* implies MB (B) */
+ futex_get_mm(key);
break;
- default:
- smp_mb(); /* explicit MB (B) */
}
}

@@ -417,7 +411,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
if (!fshared) {
key->private.mm = mm;
key->private.address = address;
- get_futex_key_refs(key); /* implies MB (B) */
+ get_futex_key_refs(key);
return 0;
}

@@ -524,7 +518,7 @@ again:
key->shared.pgoff = basepage_index(page);
}

- get_futex_key_refs(key); /* implies MB (B) */
+ get_futex_key_refs(key);

out:
unlock_page(page_head);

2014-10-20 15:32:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Mon, Oct 20, 2014 at 3:49 AM, Catalin Marinas
<[email protected]> wrote:
>
> Since you mention symmetry, something like below makes the barriers more
> explicit.

Borken, for two reasons:

> diff --git a/kernel/futex.c b/kernel/futex.c
> index f3a3a071283c..5b9d857d0816 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -143,9 +143,7 @@
> static inline void futex_get_mm(union futex_key *key)
> {
> atomic_inc(&key->private.mm->mm_count);
> - /*
> - * Ensure futex_get_mm() implies a full barrier such that
> - * get_futex_key() implies a full barrier. This is relied upon
> - * as full barrier (B), see the ordering comment above.
> - */
> - smp_mb__after_atomic();
> }

So the thing is, this means that we can't take advantage of the fact
that "atomic_inc" is already an atomic. So this is just a performance
breakage. But:

>
> static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
> {
> + /*
> + * Full barrier (B), see the ordering comment above.
> + */
> + smp_mb__before_atomic();
> #ifdef CONFIG_SMP
> return atomic_read(&hb->waiters);

This is just entirely broken.

"atomic_read()" isn't really an "atomic op" at all. despite the name,
it's just a read that is basically ACCESS_ONCE.

So smp_mb__before_atomic() doesn't work for atomic_read(), and the
code is nonsensical and doesn't work. It would need to be a full
memory barrier.

Linus

2014-10-20 16:00:22

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH] futex: Ensure get_futex_key_refs() always implies a barrier

On Mon, Oct 20, 2014 at 04:32:00PM +0100, Linus Torvalds wrote:
> On Mon, Oct 20, 2014 at 3:49 AM, Catalin Marinas
> <[email protected]> wrote:
> > Since you mention symmetry, something like below makes the barriers more
> > explicit.
>
> Borken, for two reasons:
>
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index f3a3a071283c..5b9d857d0816 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -143,9 +143,7 @@
> > static inline void futex_get_mm(union futex_key *key)
> > {
> > atomic_inc(&key->private.mm->mm_count);
> > - /*
> > - * Ensure futex_get_mm() implies a full barrier such that
> > - * get_futex_key() implies a full barrier. This is relied upon
> > - * as full barrier (B), see the ordering comment above.
> > - */
> > - smp_mb__after_atomic();
> > }
>
> So the thing is, this means that we can't take advantage of the fact
> that "atomic_inc" is already an atomic. So this is just a performance
> breakage. But:

OK, I looked at this from an ARM perspective only and it would not make
any difference. But it seems that MIPS makes a distinction between
"before" and "after" barriers with "before" defined as wmb in some
configuration, so the hunk below would break it.

> > static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
> > {
> > + /*
> > + * Full barrier (B), see the ordering comment above.
> > + */
> > + smp_mb__before_atomic();
> > #ifdef CONFIG_SMP
> > return atomic_read(&hb->waiters);
>
> This is just entirely broken.
>
> "atomic_read()" isn't really an "atomic op" at all. despite the name,
> it's just a read that is basically ACCESS_ONCE.
>
> So smp_mb__before_atomic() doesn't work for atomic_read(), and the
> code is nonsensical and doesn't work. It would need to be a full
> memory barrier.

Looking at the semantics of smp_mb__*_atomic(), it would indeed have to
be a full smp_mb() here.

--
Catalin

2014-10-24 03:27:17

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH] futex: Mention key referencing differences between shared and private futexes

From: Davidlohr Bueso <[email protected]>

Update our documentation as of fix 76835b0ebf8 (futex: Ensure
get_futex_key_refs() always implies a barrier). Explicitly
state that we don't do key referencing for private futexes.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/futex.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f3a3a07..bbf071f 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -143,9 +143,8 @@
*
* Where (A) orders the waiters increment and the futex value read through
* atomic operations (see hb_waiters_inc) and where (B) orders the write
- * to futex and the waiters read -- this is done by the barriers in
- * get_futex_key_refs(), through either ihold or atomic_inc, depending on the
- * futex type.
+ * to futex and the waiters read -- this is done by the barriers for both
+ * shared and private futexes in get_futex_key_refs().
*
* This yields the following case (where X:=waiters, Y:=futex):
*
@@ -344,13 +343,20 @@ static void get_futex_key_refs(union futex_key *key)
futex_get_mm(key); /* implies MB (B) */
break;
default:
+ /*
+ * Private futexes do not hold reference on an inode or
+ * mm, therefore the only purpose of calling get_futex_key_refs
+ * is because we need the barrier for the lockless waiter check.
+ */
smp_mb(); /* explicit MB (B) */
}
}

/*
* Drop a reference to the resource addressed by a key.
- * The hash bucket spinlock must not be held.
+ * The hash bucket spinlock must not be held. This is
+ * a no-op for private futexes, see comment in the get
+ * counterpart.
*/
static void drop_futex_key_refs(union futex_key *key)
{
--
1.8.4.5


2014-10-24 09:11:50

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH] futex: Mention key referencing differences between shared and private futexes

On Fri, Oct 24, 2014 at 04:27:00AM +0100, Davidlohr Bueso wrote:
> From: Davidlohr Bueso <[email protected]>
>
> Update our documentation as of fix 76835b0ebf8 (futex: Ensure
> get_futex_key_refs() always implies a barrier). Explicitly
> state that we don't do key referencing for private futexes.
>
> Signed-off-by: Davidlohr Bueso <[email protected]>

Looks fine to me.

Acked-by: Catalin Marinas <[email protected]>

Thanks.

Subject: [tip:locking/urgent] futex: Mention key referencing differences between shared and private futexes

Commit-ID: 993b2ff221999066fcff231590593d0b98f45d32
Gitweb: http://git.kernel.org/tip/993b2ff221999066fcff231590593d0b98f45d32
Author: Davidlohr Bueso <[email protected]>
AuthorDate: Thu, 23 Oct 2014 20:27:00 -0700
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sun, 26 Oct 2014 16:16:18 +0100

futex: Mention key referencing differences between shared and private futexes

Update our documentation as of fix 76835b0ebf8 (futex: Ensure
get_futex_key_refs() always implies a barrier). Explicitly
state that we don't do key referencing for private futexes.

Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Matteo Franchin <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/futex.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f3a3a07..bbf071f 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -143,9 +143,8 @@
*
* Where (A) orders the waiters increment and the futex value read through
* atomic operations (see hb_waiters_inc) and where (B) orders the write
- * to futex and the waiters read -- this is done by the barriers in
- * get_futex_key_refs(), through either ihold or atomic_inc, depending on the
- * futex type.
+ * to futex and the waiters read -- this is done by the barriers for both
+ * shared and private futexes in get_futex_key_refs().
*
* This yields the following case (where X:=waiters, Y:=futex):
*
@@ -344,13 +343,20 @@ static void get_futex_key_refs(union futex_key *key)
futex_get_mm(key); /* implies MB (B) */
break;
default:
+ /*
+ * Private futexes do not hold reference on an inode or
+ * mm, therefore the only purpose of calling get_futex_key_refs
+ * is because we need the barrier for the lockless waiter check.
+ */
smp_mb(); /* explicit MB (B) */
}
}

/*
* Drop a reference to the resource addressed by a key.
- * The hash bucket spinlock must not be held.
+ * The hash bucket spinlock must not be held. This is
+ * a no-op for private futexes, see comment in the get
+ * counterpart.
*/
static void drop_futex_key_refs(union futex_key *key)
{