2017-12-06 23:46:32

by Peter Zijlstra

Subject: Re: PI futexes + lock stealing woes

On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:

> fixup_owner() used to have additional seemingly relevant checks in place
> that were removed 73d786bd043eb ("futex: Rework inconsistent
> rt_mutex/futex_q state").

*groan*... yes. I completely missed that extra case also applied to
requeue_pi (requeue always did hurt my brain).

I got as far as actually understanding the problem, but it's near 1am and
I desperately need some sleep. I'll see if I can come up with a solution
tomorrow; I just wanted to let you know I finally got around to looking
at this.


2017-12-07 02:17:26

by Gratian Crisan

Subject: Re: PI futexes + lock stealing woes


Peter Zijlstra writes:

> On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
>
>> fixup_owner() used to have additional seemingly relevant checks in place
>> that were removed 73d786bd043eb ("futex: Rework inconsistent
>> rt_mutex/futex_q state").
>
> *groan*... yes. I completely missed that extra case also applied to
> requeue_pi (requeue always did hurt my brain).

FWIW I have been testing for about two days now with the fixup_owner()
hunk of 73d786bd043eb ("futex: Rework inconsistent rt_mutex/futex_q
state") reverted. So far it hasn't hit the race/deadlock. It normally
takes around 8 hours to reproduce.

I've also been running with Julia's msleep() trick for expanding the race
window for the last 4 hours or so of testing, and it's still going.

Thanks,
Gratian

2017-12-07 10:45:24

by Peter Zijlstra

Subject: Re: PI futexes + lock stealing woes

On Wed, Dec 06, 2017 at 08:09:28PM -0600, Gratian Crisan wrote:
>
> Peter Zijlstra writes:
>
> > On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
> >
> >> fixup_owner() used to have additional seemingly relevant checks in place
> >> that were removed 73d786bd043eb ("futex: Rework inconsistent
> >> rt_mutex/futex_q state").
> >
> > *groan*... yes. I completely missed that extra case also applied to
> > requeue_pi (requeue always did hurt my brain).
>
> FWIW I have been testing for about two days now with the fixup_owner()
> hunk of 73d786bd043eb ("futex: Rework inconsistent rt_mutex/futex_q
> state") reverted. So far it hasn't hit the race/deadlock. It normally
> takes around 8 hours to reproduce.

Yeah, that should more-or-less work, I think. I'm trying to see if
there's anything saner we can do, but so far my brain keeps slipping
off.

At the very least I want to kill the various wait_lock lockbreaks in
there; those hurt my brain and make me nervous as hell. That fixup does
_3_ consecutive wait_lock sections, and it becomes a very complicated
story to argue why that's not riddled with holes.
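
For reference, the fixup_owner() sequence in question, as it stood before
73d786bd043e (it is quoted again in the changelog further down this
thread):

	if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
		locked = 1;
		goto out;
	}

	raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
	owner = rt_mutex_owner(&q->pi_state->pi_mutex);
	if (!owner)
		owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
	raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
	ret = fixup_pi_state_owner(uaddr, q, owner);

The trylock takes and drops wait_lock internally, the owner lookup is a
second wait_lock section, and fixup_pi_state_owner() takes it a third
time; the rt_mutex owner can change between any two of them.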

For now I have something like the below, which obviously doesn't
compile yet. Let me grab lunch and such things before attempting more.

---
kernel/futex.c | 33 ++++++++++++++++++++++++++++++++-
kernel/locking/rtmutex.c | 27 +++++++++++++++++++--------
kernel/locking/rtmutex_common.h | 1 +
3 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 76ed5921117a..8ad5221fbd84 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2303,14 +2303,35 @@ static void unqueue_me_pi(struct futex_q *q)
static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
struct task_struct *newowner)
{
- u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
struct futex_pi_state *pi_state = q->pi_state;
u32 uval, uninitialized_var(curval), newval;
struct task_struct *oldowner;
+ u32 newtid;
int ret;

raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);

+ if (!newowner) {
+ if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+ ret = 0;
+ goto out_unlock;
+ }
+
+ newowner = rt_mutex_owner(&pi_state->pi_mutex);
+ if (WARN_ON_ONCE(!newowner)) {
+ /*
+ * We just attempted a trylock; since that failed there
+ * must be an owner, right?
+ */
+ ret = -EFUCKED; /* XXX: check return paths */
+ goto out_unlock;
+ }
+
+ /* OK we have a newowner, fixup uval */
+ }
+
+ newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
oldowner = pi_state->owner;
/* Owner died? */
if (!pi_state->owner)
@@ -2443,6 +2464,16 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
goto out;
}

+ /*
+ * If we didn't get the lock; check if nobody stole it from us.
+ * In that case, we need to fix up the uval to point to them
+ * instead of us, otherwise bad things happen.
+ */
+ if (q->pi_state->owner == current) {
+ ret = fixup_pi_state_owner(uaddr, q, NULL);
+ goto out;
+ }
+
/*
* Paranoia check. If we did not take the lock, then we should not be
* the owner of the rt_mutex.
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6f3dba6e4e9e..21705f2fae1c 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1290,13 +1290,25 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
return ret;
}

+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+ int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+ /*
+ * try_to_take_rt_mutex() sets the lock waiters bit
+ * unconditionally. Clean this up.
+ */
+ fixup_rt_mutex_waiters(lock);
+
+ return ret;
+}
+
/*
* Slow path try-lock function:
*/
static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
{
unsigned long flags;
- int ret;

/*
* If the lock already has an owner we fail to get the lock.
@@ -1312,13 +1324,7 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
*/
raw_spin_lock_irqsave(&lock->wait_lock, flags);

- ret = try_to_take_rt_mutex(lock, current, NULL);
-
- /*
- * try_to_take_rt_mutex() sets the lock waiters bit
- * unconditionally. Clean this up.
- */
- fixup_rt_mutex_waiters(lock);
+ ret = __rt_mutex_slowtrylock(lock);

raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

@@ -1505,6 +1511,11 @@ int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
return rt_mutex_slowtrylock(lock);
}

+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+ return __rt_mutex_slowtrylock(lock);
+}
+
/**
* rt_mutex_timed_lock - lock a rt_mutex interruptible
* the timeout structure is provided
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 124e98ca0b17..68686b3ec3c1 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter);

extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);

extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,

2017-12-07 14:27:00

by Peter Zijlstra

Subject: Re: PI futexes + lock stealing woes

On Thu, Dec 07, 2017 at 11:45:16AM +0100, Peter Zijlstra wrote:
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 76ed5921117a..8ad5221fbd84 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2303,14 +2303,35 @@ static void unqueue_me_pi(struct futex_q *q)
> static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
> struct task_struct *newowner)
> {
> - u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
> struct futex_pi_state *pi_state = q->pi_state;
> u32 uval, uninitialized_var(curval), newval;
> struct task_struct *oldowner;
> + u32 newtid;
> int ret;
>
> raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
>
> + if (!newowner) {
> + if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
> + ret = 0;
> + goto out_unlock;
> + }
> +
> + newowner = rt_mutex_owner(&pi_state->pi_mutex);
> + if (WARN_ON_ONCE(!newowner)) {
> + /*
> + * We just attempted a trylock; since that failed there
> + * must be an owner, right?
> + */
> + ret = -EFUCKED; /* XXX: check return paths */
> + goto out_unlock;
> + }
> +
> + /* OK we have a newowner, fixup uval */
> + }
> +
> + newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
> +
> oldowner = pi_state->owner;
> /* Owner died? */
> if (!pi_state->owner)

OK, this is broken because it needs to be inside the retry label, since
we drop the locks when we take a fault. And if we drop the lock, the
rt_mutex owner can change, etc.

> @@ -2443,6 +2464,16 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
> goto out;
> }
>
> + /*
> + * If we didn't get the lock; check if nobody stole it from us.
> + * In that case, we need to fix up the uval to point to them
> + * instead of us, otherwise bad things happen.
> + */
> + if (q->pi_state->owner == current) {
> + ret = fixup_pi_state_owner(uaddr, q, NULL);
> + goto out;
> + }

And this had me read the comment just above here, which states that the
unlocked access of ->owner is fine because only the rt_mutex owner
changes pi_state; except, of course, this just changed that.

So now we need to double check the state after we've taken the lock in
fixup_pi_state_owner().
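
For context, the fault path in fixup_pi_state_owner() looks roughly like
the below in the kernels under discussion (paraphrased; exact details may
vary by version). Note how both locks get dropped around the fault:

handle_fault:
	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
	spin_unlock(q->lock_ptr);

	ret = fault_in_user_writeable(uaddr);

	spin_lock(q->lock_ptr);
	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);

	/* Check if someone else fixed it up while the locks were dropped: */
	if (pi_state->owner != oldowner) {
		ret = 0;
		goto out_unlock;
	}

	if (ret)
		goto out_unlock;

	goto retry;

Anything computed before the retry label, like a newowner read via
rt_mutex_owner(), is stale after this round trip, which is why the new
checks have to live inside it.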


The below compiles and boots, but is otherwise untested. Could you give
it a spin?

---
kernel/futex.c | 83 +++++++++++++++++++++++++++++++++--------
kernel/locking/rtmutex.c | 26 +++++++++----
kernel/locking/rtmutex_common.h | 1 +
3 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 76ed5921117a..29ac5b64e7c7 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2294,21 +2294,17 @@ static void unqueue_me_pi(struct futex_q *q)
spin_unlock(q->lock_ptr);
}

-/*
- * Fixup the pi_state owner with the new owner.
- *
- * Must be called with hash bucket lock held and mm->sem held for non
- * private futexes.
- */
static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
- struct task_struct *newowner)
+ struct task_struct *argowner)
{
- u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
struct futex_pi_state *pi_state = q->pi_state;
u32 uval, uninitialized_var(curval), newval;
- struct task_struct *oldowner;
+ struct task_struct *oldowner, *newowner;
+ u32 newtid;
int ret;

+ lockdep_assert_held(q->lock_ptr);
+
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);

oldowner = pi_state->owner;
@@ -2317,11 +2313,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
newtid |= FUTEX_OWNER_DIED;

/*
- * We are here either because we stole the rtmutex from the
- * previous highest priority waiter or we are the highest priority
- * waiter but have failed to get the rtmutex the first time.
+ * We are here because either:
+ *
+ * - we stole the lock and pi_state->owner needs updating to reflect
+ * that (@argowner == current),
*
- * We have to replace the newowner TID in the user space variable.
+ * or:
+ *
+ * - someone stole our lock and we need to fix things to point to the
+ * new owner (@argowner == NULL).
+ *
+ * Either way, we have to replace the TID in the user space variable.
* This must be atomic as we have to preserve the owner died bit here.
*
* Note: We write the user space value _before_ changing the pi_state
@@ -2334,6 +2336,42 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
* in the PID check in lookup_pi_state.
*/
retry:
+ if (!argowner) {
+ if (oldowner != current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+ /* We got the lock after all, nothing to fix. */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ /*
+ * Since we just failed the trylock; there must be an owner.
+ */
+ newowner = rt_mutex_owner(&pi_state->pi_mutex);
+ BUG_ON(!newowner);
+ } else {
+ WARN_ON_ONCE(argowner != current);
+ if (oldowner == current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+ newowner = argowner;
+ }
+
+ newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
if (get_futex_value_locked(&uval, uaddr))
goto handle_fault;

@@ -2434,15 +2472,28 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
* Got the lock. We might not be the anticipated owner if we
* did a lock-steal - fix up the PI-state in that case:
*
- * We can safely read pi_state->owner without holding wait_lock
- * because we now own the rt_mutex, only the owner will attempt
- * to change it.
+ * Speculative pi_state->owner read (we don't hold wait_lock);
+ * since we own the lock pi_state->owner == current is the
+ * stable state, anything else needs more attention.
*/
if (q->pi_state->owner != current)
ret = fixup_pi_state_owner(uaddr, q, current);
goto out;
}

+ /*
+ * If we didn't get the lock; check if anybody stole it from us. In
+ * that case, we need to fix up the uval to point to them instead of
+ * us, otherwise bad things happen. [10]
+ *
+ * Another speculative read; pi_state->owner == current is unstable
+ * but needs our attention.
+ */
+ if (q->pi_state->owner == current) {
+ ret = fixup_pi_state_owner(uaddr, q, NULL);
+ goto out;
+ }
+
/*
* Paranoia check. If we did not take the lock, then we should not be
* the owner of the rt_mutex.
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6f3dba6e4e9e..65cc0cb984e6 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1290,6 +1290,19 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
return ret;
}

+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+ int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+ /*
+ * try_to_take_rt_mutex() sets the lock waiters bit
+ * unconditionally. Clean this up.
+ */
+ fixup_rt_mutex_waiters(lock);
+
+ return ret;
+}
+
/*
* Slow path try-lock function:
*/
@@ -1312,13 +1325,7 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
*/
raw_spin_lock_irqsave(&lock->wait_lock, flags);

- ret = try_to_take_rt_mutex(lock, current, NULL);
-
- /*
- * try_to_take_rt_mutex() sets the lock waiters bit
- * unconditionally. Clean this up.
- */
- fixup_rt_mutex_waiters(lock);
+ ret = __rt_mutex_slowtrylock(lock);

raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

@@ -1505,6 +1512,11 @@ int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
return rt_mutex_slowtrylock(lock);
}

+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+ return __rt_mutex_slowtrylock(lock);
+}
+
/**
* rt_mutex_timed_lock - lock a rt_mutex interruptible
* the timeout structure is provided
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 124e98ca0b17..68686b3ec3c1 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter);

extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);

extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,

2017-12-07 14:58:43

by Gratian Crisan

Subject: Re: PI futexes + lock stealing woes


Peter Zijlstra writes:
> The below compiles and boots, but is otherwise untested. Could you give
> it a spin?

Thank you! Yes, I'll start a test now.

-Gratian

2017-12-07 19:51:33

by Julia Cartwright

Subject: Re: PI futexes + lock stealing woes

On Thu, Dec 07, 2017 at 08:57:59AM -0600, Gratian Crisan wrote:
>
> Peter Zijlstra writes:
> > The below compiles and boots, but is otherwise untested. Could you give
> > it a spin?
>
> Thank you! Yes, I'll start a test now.

I gave it a spin with my minimal reproducing case w/ the strategic
msleep(). Previously, I could reproduce it in less than a minute in
that setup; now it's been running for a couple hours. So ... looks
good? :)

Thanks,
Julia

2017-12-07 23:04:07

by Gratian Crisan

Subject: Re: PI futexes + lock stealing woes


Julia Cartwright writes:

> On Thu, Dec 07, 2017 at 08:57:59AM -0600, Gratian Crisan wrote:
>>
>> Peter Zijlstra writes:
>> > The below compiles and boots, but is otherwise untested. Could you give
>> > it a spin?
>>
>> Thank you! Yes, I'll start a test now.
>
> I gave it a spin with my minimal reproducing case w/ the strategic
> msleep(). Previously, I could reproduce it in less than a minute in
> that setup; now it's been running for a couple hours. So ... looks
> good? :)

Yep ... looks good to me. I've been running two targets with the
original reproducer for 8 hours now plus a target running the C test.
All of them are still going.

I'm going to let them run overnight to make sure but I'm going to call
it fixed.

-Gratian
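
Neither reproducer appears in the thread itself. Purely for illustration,
a minimal sketch of the kind of test being described might look like the
below. It assumes a glibc whose condvars still use
FUTEX_WAIT_REQUEUE_PI/FUTEX_CMP_REQUEUE_PI when paired with a
PTHREAD_PRIO_INHERIT mutex (newer glibc dropped requeue-PI, in which case
raw futex(2) calls are needed), and it needs root for SCHED_FIFO:

/* Hypothetical reproducer sketch; build with: gcc -O2 -pthread repro.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <time.h>

static pthread_mutex_t lock;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void set_fifo(int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}

static void *waiter(void *arg)
{
	struct timespec ts;

	set_fifo(1);
	for (;;) {
		pthread_mutex_lock(&lock);
		clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_nsec += 5 * 1000 * 1000;	/* short timeout, as in the trace */
		if (ts.tv_nsec >= 1000000000L) {
			ts.tv_sec++;
			ts.tv_nsec -= 1000000000L;
		}
		/* FUTEX_WAIT_REQUEUE_PI under the hood (on requeue-PI glibc) */
		pthread_cond_timedwait(&cond, &lock, &ts);
		pthread_mutex_unlock(&lock);
	}
	return arg;
}

static void *stealer(void *arg)
{
	set_fifo(2);				/* higher prio: can steal the lock */
	for (;;) {
		pthread_mutex_lock(&lock);	/* FUTEX_LOCK_PI */
		pthread_mutex_unlock(&lock);	/* FUTEX_UNLOCK_PI */
	}
	return arg;
}

int main(void)
{
	pthread_mutexattr_t ma;
	pthread_t t1, t2;

	pthread_mutexattr_init(&ma);
	pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&lock, &ma);

	pthread_create(&t1, NULL, waiter, NULL);
	pthread_create(&t2, NULL, stealer, NULL);

	for (;;) {				/* waker */
		pthread_mutex_lock(&lock);
		pthread_cond_broadcast(&cond);	/* FUTEX_CMP_REQUEUE_PI */
		pthread_mutex_unlock(&lock);
	}
}

The high-priority stealer loop recreates the steal, and the short
pthread_cond_timedwait() timeout recreates the -ETIMEDOUT race from the
trace in the changelog below.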

2017-12-08 12:49:46

by Peter Zijlstra

Subject: [PATCH] futex: Avoid violating the 10th rule of futex

On Thu, Dec 07, 2017 at 05:02:40PM -0600, Gratian Crisan wrote:

> Yep ... looks good to me. I've been running two targets with the
> original reproducer for 8 hours now plus a target running the C test.
> All of them are still going.
>
> I'm going to let them run overnight to make sure but I'm going to call
> it fixed.

Assuming nothing bad happens, find below the patch with a Changelog
attached.

---
Subject: futex: Avoid violating the 10th rule of futex
From: Peter Zijlstra <[email protected]>
Date: Thu Dec 7 16:54:23 CET 2017

Julia reported futex state corruption in the following scenario:

waiter                               waker                             stealer (prio > waiter)

futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
      timeout=[N ms])
   futex_wait_requeue_pi()
      futex_wait_queue_me()
      freezable_schedule()
      <scheduled out>
                                     futex(LOCK_PI, uaddr2)
                                     futex(CMP_REQUEUE_PI, uaddr,
                                           uaddr2, 1, 0)
                                        /* requeues waiter to uaddr2 */
                                     futex(UNLOCK_PI, uaddr2)
                                           wake_futex_pi()
                                              cmp_futex_value_locked(uaddr2, waiter)
                                              wake_up_q()
   <woken by waker>
   <hrtimer_wakeup() fires,
    clears sleeper->task>
                                                                       futex(LOCK_PI, uaddr2)
                                                                          __rt_mutex_start_proxy_lock()
                                                                             try_to_take_rt_mutex() /* steals lock */
                                                                                rt_mutex_set_owner(lock, stealer)
                                                                       <preempted>
   <scheduled in>
   rt_mutex_wait_proxy_lock()
      __rt_mutex_slowlock()
         try_to_take_rt_mutex() /* fails, lock held by stealer */
         if (timeout && !timeout->task)
            return -ETIMEDOUT;
   fixup_owner()
      /* lock wasn't acquired, so,
         fixup_pi_state_owner skipped */

   return -ETIMEDOUT;

   /* At this point, we've returned -ETIMEDOUT to userspace, but the
    * futex word shows waiter to be the owner, and the pi_mutex has
    * stealer as the owner */

   futex_lock(LOCK_PI, uaddr2)
      -> bails with EDEADLK, futex word says we're owner.

And suggested that what commit:

73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")

removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:

16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")

changed all the locking rules. Even without that, the sequence:

- if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
- locked = 1;
- goto out;
- }

- raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
- owner = rt_mutex_owner(&q->pi_state->pi_mutex);
- if (!owner)
- owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
- raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
- ret = fixup_pi_state_owner(uaddr, q, owner);

already suggests there were races; otherwise we'd never have to look
at next_owner.

So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.

Luckily the values can only change away and not to the value we're
testing; this means we can do a speculative test and double check once
we have the wait_lock.
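
Condensed, with the fault handling and the actual uval update elided, the
new single wait_lock section amounts to:

	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
retry:
	if (!argowner) {			/* someone may have stolen our lock */
		if (oldowner != current)	/* raced; already fixed up */
			goto out_unlock;
		if (__rt_mutex_futex_trylock(&pi_state->pi_mutex))
			goto out_unlock;	/* got the lock after all */
		newowner = rt_mutex_owner(&pi_state->pi_mutex);
	} else {				/* we stole the lock; argowner == current */
		if (oldowner == current)	/* raced; already fixed up */
			goto out_unlock;
		newowner = argowner;
	}

	newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
	/* ... write newtid into the futex word, then update pi_state ... */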

Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-and-Tested-by: Julia Cartwright <[email protected]>
Reported-and-Tested-by: Gratian Crisan <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2294,21 +2294,17 @@ static void unqueue_me_pi(struct futex_q
spin_unlock(q->lock_ptr);
}

-/*
- * Fixup the pi_state owner with the new owner.
- *
- * Must be called with hash bucket lock held and mm->sem held for non
- * private futexes.
- */
static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
- struct task_struct *newowner)
+ struct task_struct *argowner)
{
- u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
struct futex_pi_state *pi_state = q->pi_state;
u32 uval, uninitialized_var(curval), newval;
- struct task_struct *oldowner;
+ struct task_struct *oldowner, *newowner;
+ u32 newtid;
int ret;

+ lockdep_assert_held(q->lock_ptr);
+
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);

oldowner = pi_state->owner;
@@ -2317,11 +2313,17 @@ static int fixup_pi_state_owner(u32 __us
newtid |= FUTEX_OWNER_DIED;

/*
- * We are here either because we stole the rtmutex from the
- * previous highest priority waiter or we are the highest priority
- * waiter but have failed to get the rtmutex the first time.
+ * We are here because either:
+ *
+ * - we stole the lock and pi_state->owner needs updating to reflect
+ * that (@argowner == current),
+ *
+ * or:
*
- * We have to replace the newowner TID in the user space variable.
+ * - someone stole our lock and we need to fix things to point to the
+ * new owner (@argowner == NULL).
+ *
+ * Either way, we have to replace the TID in the user space variable.
* This must be atomic as we have to preserve the owner died bit here.
*
* Note: We write the user space value _before_ changing the pi_state
@@ -2334,6 +2336,42 @@ static int fixup_pi_state_owner(u32 __us
* in the PID check in lookup_pi_state.
*/
retry:
+ if (!argowner) {
+ if (oldowner != current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+ /* We got the lock after all, nothing to fix. */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ /*
+ * Since we just failed the trylock; there must be an owner.
+ */
+ newowner = rt_mutex_owner(&pi_state->pi_mutex);
+ BUG_ON(!newowner);
+ } else {
+ WARN_ON_ONCE(argowner != current);
+ if (oldowner == current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+ newowner = argowner;
+ }
+
+ newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
if (get_futex_value_locked(&uval, uaddr))
goto handle_fault;

@@ -2434,15 +2472,28 @@ static int fixup_owner(u32 __user *uaddr
* Got the lock. We might not be the anticipated owner if we
* did a lock-steal - fix up the PI-state in that case:
*
- * We can safely read pi_state->owner without holding wait_lock
- * because we now own the rt_mutex, only the owner will attempt
- * to change it.
+ * Speculative pi_state->owner read (we don't hold wait_lock);
+ * since we own the lock pi_state->owner == current is the
+ * stable state, anything else needs more attention.
*/
if (q->pi_state->owner != current)
ret = fixup_pi_state_owner(uaddr, q, current);
goto out;
}

+ /*
+ * If we didn't get the lock; check if anybody stole it from us. In
+ * that case, we need to fix up the uval to point to them instead of
+ * us, otherwise bad things happen. [10]
+ *
+ * Another speculative read; pi_state->owner == current is unstable
+ * but needs our attention.
+ */
+ if (q->pi_state->owner == current) {
+ ret = fixup_pi_state_owner(uaddr, q, NULL);
+ goto out;
+ }
+
/*
* Paranoia check. If we did not take the lock, then we should not be
* the owner of the rt_mutex.
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1290,6 +1290,19 @@ rt_mutex_slowlock(struct rt_mutex *lock,
return ret;
}

+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+ int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+ /*
+ * try_to_take_rt_mutex() sets the lock waiters bit
+ * unconditionally. Clean this up.
+ */
+ fixup_rt_mutex_waiters(lock);
+
+ return ret;
+}
+
/*
* Slow path try-lock function:
*/
@@ -1312,13 +1325,7 @@ static inline int rt_mutex_slowtrylock(s
*/
raw_spin_lock_irqsave(&lock->wait_lock, flags);

- ret = try_to_take_rt_mutex(lock, current, NULL);
-
- /*
- * try_to_take_rt_mutex() sets the lock waiters bit
- * unconditionally. Clean this up.
- */
- fixup_rt_mutex_waiters(lock);
+ ret = __rt_mutex_slowtrylock(lock);

raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

@@ -1505,6 +1512,11 @@ int __sched rt_mutex_futex_trylock(struc
return rt_mutex_slowtrylock(lock);
}

+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+ return __rt_mutex_slowtrylock(lock);
+}
+
/**
* rt_mutex_timed_lock - lock a rt_mutex interruptible
* the timeout structure is provided
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(
struct rt_mutex_waiter *waiter);

extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);

extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,

2017-12-08 16:05:11

by Gratian Crisan

Subject: Re: [PATCH] futex: Avoid violating the 10th rule of futex


Peter Zijlstra writes:

> On Thu, Dec 07, 2017 at 05:02:40PM -0600, Gratian Crisan wrote:
>
>> Yep ... looks good to me. I've been running two targets with the
>> original reproducer for 8 hours now plus a target running the C test.
>> All of them are still going.
>>
>> I'm going to let them run overnight to make sure but I'm going to call
>> it fixed.
>
> Assuming nothing bad happens, find below the patch with a Changelog
> attached.

Looks good after overnight testing. Thank you so much.

-Gratian

2018-01-08 21:09:58

by Julia Cartwright

Subject: Re: [PATCH] futex: Avoid violating the 10th rule of futex

On Fri, Dec 08, 2017 at 01:49:39PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 07, 2017 at 05:02:40PM -0600, Gratian Crisan wrote:
[..]
> Assuming nothing bad happens, find below the patch with a Changelog
> attached.
>
> ---
> Subject: futex: Avoid violating the 10th rule of futex
[..]

Hey Peter-

We've been running w/ this patch now without further regression. I was
expecting to see this hit 4.15-rc* eventually, but haven't seen it land
anywhere. Is this in the queue for 4.16, then?

Thanks!
Julia

Subject: [tip:locking/urgent] futex: Avoid violating the 10th rule of futex

Commit-ID: c1e2f0eaf015fb7076d51a339011f2383e6dd389
Gitweb: https://git.kernel.org/tip/c1e2f0eaf015fb7076d51a339011f2383e6dd389
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 8 Dec 2017 13:49:39 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sun, 14 Jan 2018 18:49:16 +0100

futex: Avoid violating the 10th rule of futex

Julia reported futex state corruption in the following scenario:

waiter                               waker                             stealer (prio > waiter)

futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
      timeout=[N ms])
   futex_wait_requeue_pi()
      futex_wait_queue_me()
      freezable_schedule()
      <scheduled out>
                                     futex(LOCK_PI, uaddr2)
                                     futex(CMP_REQUEUE_PI, uaddr,
                                           uaddr2, 1, 0)
                                        /* requeues waiter to uaddr2 */
                                     futex(UNLOCK_PI, uaddr2)
                                           wake_futex_pi()
                                              cmp_futex_value_locked(uaddr2, waiter)
                                              wake_up_q()
   <woken by waker>
   <hrtimer_wakeup() fires,
    clears sleeper->task>
                                                                       futex(LOCK_PI, uaddr2)
                                                                          __rt_mutex_start_proxy_lock()
                                                                             try_to_take_rt_mutex() /* steals lock */
                                                                                rt_mutex_set_owner(lock, stealer)
                                                                       <preempted>
   <scheduled in>
   rt_mutex_wait_proxy_lock()
      __rt_mutex_slowlock()
         try_to_take_rt_mutex() /* fails, lock held by stealer */
         if (timeout && !timeout->task)
            return -ETIMEDOUT;
   fixup_owner()
      /* lock wasn't acquired, so,
         fixup_pi_state_owner skipped */

   return -ETIMEDOUT;

   /* At this point, we've returned -ETIMEDOUT to userspace, but the
    * futex word shows waiter to be the owner, and the pi_mutex has
    * stealer as the owner */

   futex_lock(LOCK_PI, uaddr2)
      -> bails with EDEADLK, futex word says we're owner.

And suggested that what commit:

73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")

removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:

16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")

changed all the locking rules. Even without that, the sequence:

- if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
- locked = 1;
- goto out;
- }

- raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
- owner = rt_mutex_owner(&q->pi_state->pi_mutex);
- if (!owner)
- owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
- raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
- ret = fixup_pi_state_owner(uaddr, q, owner);

already suggests there were races; otherwise we'd never have to look
at next_owner.

So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.

Luckily the values can only change away and not to the value we're
testing; this means we can do a speculative test and double check once
we have the wait_lock.

Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-by: Julia Cartwright <[email protected]>
Reported-by: Gratian Crisan <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Julia Cartwright <[email protected]>
Tested-by: Gratian Crisan <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/futex.c | 83 +++++++++++++++++++++++++++++++++--------
kernel/locking/rtmutex.c | 26 +++++++++----
kernel/locking/rtmutex_common.h | 1 +
3 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 57d0b36..9e69589 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2294,21 +2294,17 @@ static void unqueue_me_pi(struct futex_q *q)
spin_unlock(q->lock_ptr);
}

-/*
- * Fixup the pi_state owner with the new owner.
- *
- * Must be called with hash bucket lock held and mm->sem held for non
- * private futexes.
- */
static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
- struct task_struct *newowner)
+ struct task_struct *argowner)
{
- u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
struct futex_pi_state *pi_state = q->pi_state;
u32 uval, uninitialized_var(curval), newval;
- struct task_struct *oldowner;
+ struct task_struct *oldowner, *newowner;
+ u32 newtid;
int ret;

+ lockdep_assert_held(q->lock_ptr);
+
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);

oldowner = pi_state->owner;
@@ -2317,11 +2313,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
newtid |= FUTEX_OWNER_DIED;

/*
- * We are here either because we stole the rtmutex from the
- * previous highest priority waiter or we are the highest priority
- * waiter but have failed to get the rtmutex the first time.
+ * We are here because either:
+ *
+ * - we stole the lock and pi_state->owner needs updating to reflect
+ * that (@argowner == current),
*
- * We have to replace the newowner TID in the user space variable.
+ * or:
+ *
+ * - someone stole our lock and we need to fix things to point to the
+ * new owner (@argowner == NULL).
+ *
+ * Either way, we have to replace the TID in the user space variable.
* This must be atomic as we have to preserve the owner died bit here.
*
* Note: We write the user space value _before_ changing the pi_state
@@ -2334,6 +2336,42 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
* in the PID check in lookup_pi_state.
*/
retry:
+ if (!argowner) {
+ if (oldowner != current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+ /* We got the lock after all, nothing to fix. */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ /*
+ * Since we just failed the trylock; there must be an owner.
+ */
+ newowner = rt_mutex_owner(&pi_state->pi_mutex);
+ BUG_ON(!newowner);
+ } else {
+ WARN_ON_ONCE(argowner != current);
+ if (oldowner == current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+ newowner = argowner;
+ }
+
+ newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
if (get_futex_value_locked(&uval, uaddr))
goto handle_fault;

@@ -2434,9 +2472,9 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
* Got the lock. We might not be the anticipated owner if we
* did a lock-steal - fix up the PI-state in that case:
*
- * We can safely read pi_state->owner without holding wait_lock
- * because we now own the rt_mutex, only the owner will attempt
- * to change it.
+ * Speculative pi_state->owner read (we don't hold wait_lock);
+ * since we own the lock pi_state->owner == current is the
+ * stable state, anything else needs more attention.
*/
if (q->pi_state->owner != current)
ret = fixup_pi_state_owner(uaddr, q, current);
@@ -2444,6 +2482,19 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
}

/*
+ * If we didn't get the lock; check if anybody stole it from us. In
+ * that case, we need to fix up the uval to point to them instead of
+ * us, otherwise bad things happen. [10]
+ *
+ * Another speculative read; pi_state->owner == current is unstable
+ * but needs our attention.
+ */
+ if (q->pi_state->owner == current) {
+ ret = fixup_pi_state_owner(uaddr, q, NULL);
+ goto out;
+ }
+
+ /*
* Paranoia check. If we did not take the lock, then we should not be
* the owner of the rt_mutex.
*/
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6f3dba6..65cc0cb 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1290,6 +1290,19 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
return ret;
}

+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+ int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+ /*
+ * try_to_take_rt_mutex() sets the lock waiters bit
+ * unconditionally. Clean this up.
+ */
+ fixup_rt_mutex_waiters(lock);
+
+ return ret;
+}
+
/*
* Slow path try-lock function:
*/
@@ -1312,13 +1325,7 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
*/
raw_spin_lock_irqsave(&lock->wait_lock, flags);

- ret = try_to_take_rt_mutex(lock, current, NULL);
-
- /*
- * try_to_take_rt_mutex() sets the lock waiters bit
- * unconditionally. Clean this up.
- */
- fixup_rt_mutex_waiters(lock);
+ ret = __rt_mutex_slowtrylock(lock);

raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

@@ -1505,6 +1512,11 @@ int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
return rt_mutex_slowtrylock(lock);
}

+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+ return __rt_mutex_slowtrylock(lock);
+}
+
/**
* rt_mutex_timed_lock - lock a rt_mutex interruptible
* the timeout structure is provided
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 124e98c..68686b3 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter);

extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);

extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,