2016-11-23 11:25:53

by Nicolai Hähnle

Subject: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

From: Nicolai Hähnle <[email protected]>

Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
the following example. Acquire context stamps are ordered like the thread
numbers, i.e. thread #1 should back off when it encounters a mutex locked
by thread #0 etc.

Thread #0    Thread #1    Thread #2    Thread #3
---------    ---------    ---------    ---------
                                       lock(ww)
                                       success
             lock(ww')
             success
                          lock(ww)
             lock(ww)     .
             .            .            unlock(ww) part 1
lock(ww)     .            .            .
success      .            .            .
             .            .            unlock(ww) part 2
             .            back off
lock(ww')    .
.            .
(stuck)      (stuck)

Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
(without being protected by lock->base.wait_lock), meaning that thread #0
can acquire ww in the fast path or, much more likely, the medium path
in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
won't wake up any of the waiters in ww_mutex_set_context_fastpath.
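
(For reference, the check in question looks roughly like this; a paraphrased,
simplified sketch of the pre-4.9 ww_mutex_set_context_fastpath, not the exact
upstream code:)

static __always_inline void
ww_mutex_set_context_fastpath(struct ww_mutex *lock,
                              struct ww_acquire_ctx *ctx)
{
        unsigned long flags;
        struct mutex_waiter *cur;

        ww_mutex_lock_acquired(lock, ctx);
        lock->ctx = ctx;

        /* make the lock->ctx store visible before reading the counter */
        smp_mb();

        /*
         * count == 0 makes the lock look "locked, no waiters", so nobody is
         * woken up. In the race above this is wrong: threads #1 and #2 are
         * still on the wait list, but unlock(ww) part 1 already set the
         * counter to 1 without taking wait_lock, which let thread #0 take
         * the lock via the fast/medium path.
         */
        if (likely(atomic_read(&lock->base.count) == 0))
                return;

        /* contended: wake everybody so they can recheck against the new ctx */
        spin_lock_mutex(&lock->base.wait_lock, flags);
        list_for_each_entry(cur, &lock->base.wait_list, list) {
                debug_mutex_wake_waiter(&lock->base, cur);
                wake_up_process(cur->task);
        }
        spin_unlock_mutex(&lock->base.wait_lock, flags);
}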

Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
thread #2, since waiters are added at the tail. Thread #2 wakes up and
backs off since it sees ww owned by a context with a lower stamp.

Meanwhile, thread #1 is never woken up, and so it won't back off its lock
on ww'. So thread #0 gets stuck waiting for ww' to be released.

This patch fixes the deadlock by waking up all waiters in the slow path
of ww_mutex_unlock.

We have an internal test case for amdgpu which continuously submits
command streams from tens of threads, where all command streams reference
hundreds of GPU buffer objects with a lot of overlap in the buffer lists
between command streams. This test reliably caused a deadlock, and while I
haven't completely confirmed that it is exactly the scenario outlined
above, this patch does fix the test case.

v2:
- use wake_q_add
- add additional explanations

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Christian König <[email protected]> (v1)
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..7fbf9b4 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -409,6 +409,9 @@ static bool mutex_optimistic_spin(struct mutex *lock,
__visible __used noinline
void __sched __mutex_unlock_slowpath(atomic_t *lock_count);

+static __used noinline
+void __sched __mutex_unlock_slowpath_wakeall(atomic_t *lock_count);
+
/**
* mutex_unlock - release the mutex
* @lock: the mutex to be released
@@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
*/
mutex_clear_owner(&lock->base);
#endif
- __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+ /*
+ * A previously _not_ waiting task may acquire the lock via the fast
+ * path during our unlock. In that case, already waiting tasks may have
+ * to back off to avoid a deadlock. Wake up all waiters so that they
+ * can check their acquire context stamp against the new owner.
+ */
+ __mutex_fastpath_unlock(&lock->base.count,
+ __mutex_unlock_slowpath_wakeall);
}
EXPORT_SYMBOL(ww_mutex_unlock);

@@ -716,7 +726,7 @@ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
* Release the lock, slowpath:
*/
static inline void
-__mutex_unlock_common_slowpath(struct mutex *lock, int nested)
+__mutex_unlock_common_slowpath(struct mutex *lock, int nested, int wake_all)
{
unsigned long flags;
WAKE_Q(wake_q);
@@ -740,7 +750,14 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
mutex_release(&lock->dep_map, nested, _RET_IP_);
debug_mutex_unlock(lock);

- if (!list_empty(&lock->wait_list)) {
+ if (wake_all) {
+ struct mutex_waiter *waiter;
+
+ list_for_each_entry(waiter, &lock->wait_list, list) {
+ debug_mutex_wake_waiter(lock, waiter);
+ wake_q_add(&wake_q, waiter->task);
+ }
+ } else if (!list_empty(&lock->wait_list)) {
/* get the first entry from the wait-list: */
struct mutex_waiter *waiter =
list_entry(lock->wait_list.next,
@@ -762,7 +779,15 @@ __mutex_unlock_slowpath(atomic_t *lock_count)
{
struct mutex *lock = container_of(lock_count, struct mutex, count);

- __mutex_unlock_common_slowpath(lock, 1);
+ __mutex_unlock_common_slowpath(lock, 1, 0);
+}
+
+static void
+__mutex_unlock_slowpath_wakeall(atomic_t *lock_count)
+{
+ struct mutex *lock = container_of(lock_count, struct mutex, count);
+
+ __mutex_unlock_common_slowpath(lock, 1, 1);
}

#ifndef CONFIG_DEBUG_LOCK_ALLOC
--
2.7.4


2016-11-23 11:25:55

by Nicolai Hähnle

Subject: [PATCH 2/4] locking/ww_mutex: Remove redundant wakeups in ww_mutex_set_context_slowpath

From: Nicolai Hähnle <[email protected]>

When ww_mutex_set_context_slowpath runs, we are in one of two situations:

1. The current task was woken up by ww_mutex_unlock.

2. The current task is racing with ww_mutex_unlock: We entered the slow
path while lock->base.count <= 0, but skipped the wait in
__mutex_lock_common.

In both cases, all tasks that are currently contending for the lock are
either racing as in point 2 and blocked on lock->wait_lock, or they have
been or will be woken up by ww_mutex_unlock.

Either way, they will check their stamp against ours without having to be
woken up again.
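
(For reference, the stamp check each contender performs is essentially the
following; a simplified sketch of __ww_mutex_lock_check_stamp from the pre-4.9
mutex.c, with the debug and corner-case handling left out:)

static inline int __sched
__ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
{
        struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
        struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);

        if (!hold_ctx)
                return 0;

        /* back off (-EDEADLK) if the current owner has an older stamp */
        if (ctx->stamp - hold_ctx->stamp <= LONG_MAX &&
            (ctx->stamp != hold_ctx->stamp || ctx > hold_ctx))
                return -EDEADLK;

        return 0;
}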

This improvement is possible only after the changed behavior of
ww_mutex_unlock from the previous patch ("locking/ww_mutex: Fix a deadlock
affecting ww_mutexes").

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 7fbf9b4..7c09376 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -192,8 +192,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
}

/*
- * After acquiring lock in the slowpath set ctx and wake up any
- * waiters so they can recheck.
+ * After acquiring lock in the slowpath set ctx.
+ *
+ * Unlike the fast path, other waiters are already woken up by ww_mutex_unlock,
+ * so we don't have to do it again here.
*
* Callers must hold the mutex wait_lock.
*/
@@ -201,19 +203,8 @@ static __always_inline void
ww_mutex_set_context_slowpath(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx)
{
- struct mutex_waiter *cur;
-
ww_mutex_lock_acquired(lock, ctx);
lock->ctx = ctx;
-
- /*
- * Give any possible sleeping processes the chance to wake up,
- * so they can recheck if they have to back off.
- */
- list_for_each_entry(cur, &lock->base.wait_list, list) {
- debug_mutex_wake_waiter(&lock->base, cur);
- wake_up_process(cur->task);
- }
}

#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
--
2.7.4

2016-11-23 11:26:22

by Nicolai Hähnle

Subject: [PATCH 4/4] locking/ww_mutex: Fix a comment typo

From: Nicolai Hähnle <[email protected]>

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
include/linux/ww_mutex.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 760399a..fd93cd3 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -97,7 +97,7 @@ static inline void ww_mutex_init(struct ww_mutex *lock,
* @ctx: w/w acquire context to initialize
* @ww_class: w/w class of the context
*
- * Initializes an context to acquire multiple mutexes of the given w/w class.
+ * Initializes a context to acquire multiple mutexes of the given w/w class.
*
* Context-based w/w mutex acquiring can be done in any order whatsoever within
* a given lock class. Deadlocks will be detected and handled with the
--
2.7.4

2016-11-23 11:26:41

by Nicolai Hähnle

Subject: [PATCH 3/4] locking/Documentation: fix a typo

From: Nicolai Hähnle <[email protected]>

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
Documentation/locking/00-INDEX | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/locking/00-INDEX b/Documentation/locking/00-INDEX
index c256c9b..adc1d289 100644
--- a/Documentation/locking/00-INDEX
+++ b/Documentation/locking/00-INDEX
@@ -13,4 +13,4 @@ rt-mutex.txt
spinlocks.txt
- info on using spinlocks to provide exclusive access in kernel.
ww-mutex-design.txt
- - Intro to Mutex wait/would deadlock handling.s
+ - Intro to Mutex wait/wound deadlock handling.
--
2.7.4

2016-11-23 12:50:57

by Daniel Vetter

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <[email protected]>
>
> Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
> the following example. Acquire context stamps are ordered like the thread
> numbers, i.e. thread #1 should back off when it encounters a mutex locked
> by thread #0 etc.
>
> Thread #0 Thread #1 Thread #2 Thread #3
> --------- --------- --------- ---------
> lock(ww)
> success
> lock(ww')
> success
> lock(ww)
> lock(ww) .
> . . unlock(ww) part 1
> lock(ww) . . .
> success . . .
> . . unlock(ww) part 2
> . back off
> lock(ww') .
> . .
> (stuck) (stuck)
>
> Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
> (without being protected by lock->base.wait_lock), meaning that thread #0
> can acquire ww in the fast path or, much more likely, the medium path
> in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
> won't wake up any of the waiters in ww_mutex_set_context_fastpath.
>
> Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
> thread #2, since waiters are added at the tail. Thread #2 wakes up and
> backs off since it sees ww owned by a context with a lower stamp.
>
> Meanwhile, thread #1 is never woken up, and so it won't back off its lock
> on ww'. So thread #0 gets stuck waiting for ww' to be released.
>
> This patch fixes the deadlock by waking up all waiters in the slow path
> of ww_mutex_unlock.
>
> We have an internal test case for amdgpu which continuously submits
> command streams from tens of threads, where all command streams reference
> hundreds of GPU buffer objects with a lot of overlap in the buffer lists
> between command streams. This test reliably caused a deadlock, and while I
> haven't completely confirmed that it is exactly the scenario outlined
> above, this patch does fix the test case.
>
> v2:
> - use wake_q_add
> - add additional explanations
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Chris Wilson <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Reviewed-by: Christian König <[email protected]> (v1)
> Signed-off-by: Nicolai Hähnle <[email protected]>

Yeah, when the owning ctx changes we need to wake up all waiters, to make
sure we catch all (new) deadlock scenarios. And I tried poking at your
example, and I think it's solid and can't be minimized any further. I
don't have much clue on mutex.c code itself, but the changes seem
reasonable. With that caveat:

Reviewed-by: Daniel Vetter <[email protected]>

Cheers, Daniel

> ---
> kernel/locking/mutex.c | 33 +++++++++++++++++++++++++++++----
> 1 file changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index a70b90d..7fbf9b4 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -409,6 +409,9 @@ static bool mutex_optimistic_spin(struct mutex *lock,
> __visible __used noinline
> void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
>
> +static __used noinline
> +void __sched __mutex_unlock_slowpath_wakeall(atomic_t *lock_count);
> +
> /**
> * mutex_unlock - release the mutex
> * @lock: the mutex to be released
> @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
> */
> mutex_clear_owner(&lock->base);
> #endif
> - __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
> + /*
> + * A previously _not_ waiting task may acquire the lock via the fast
> + * path during our unlock. In that case, already waiting tasks may have
> + * to back off to avoid a deadlock. Wake up all waiters so that they
> + * can check their acquire context stamp against the new owner.
> + */
> + __mutex_fastpath_unlock(&lock->base.count,
> + __mutex_unlock_slowpath_wakeall);
> }
> EXPORT_SYMBOL(ww_mutex_unlock);
>
> @@ -716,7 +726,7 @@ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
> * Release the lock, slowpath:
> */
> static inline void
> -__mutex_unlock_common_slowpath(struct mutex *lock, int nested)
> +__mutex_unlock_common_slowpath(struct mutex *lock, int nested, int wake_all)
> {
> unsigned long flags;
> WAKE_Q(wake_q);
> @@ -740,7 +750,14 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
> mutex_release(&lock->dep_map, nested, _RET_IP_);
> debug_mutex_unlock(lock);
>
> - if (!list_empty(&lock->wait_list)) {
> + if (wake_all) {
> + struct mutex_waiter *waiter;
> +
> + list_for_each_entry(waiter, &lock->wait_list, list) {
> + debug_mutex_wake_waiter(lock, waiter);
> + wake_q_add(&wake_q, waiter->task);
> + }
> + } else if (!list_empty(&lock->wait_list)) {
> /* get the first entry from the wait-list: */
> struct mutex_waiter *waiter =
> list_entry(lock->wait_list.next,
> @@ -762,7 +779,15 @@ __mutex_unlock_slowpath(atomic_t *lock_count)
> {
> struct mutex *lock = container_of(lock_count, struct mutex, count);
>
> - __mutex_unlock_common_slowpath(lock, 1);
> + __mutex_unlock_common_slowpath(lock, 1, 0);
> +}
> +
> +static void
> +__mutex_unlock_slowpath_wakeall(atomic_t *lock_count)
> +{
> + struct mutex *lock = container_of(lock_count, struct mutex, count);
> +
> + __mutex_unlock_common_slowpath(lock, 1, 1);
> }
>
> #ifndef CONFIG_DEBUG_LOCK_ALLOC
> --
> 2.7.4
>
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2016-11-23 13:01:04

by Peter Zijlstra

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <[email protected]>
>
> Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
> the following example. Acquire context stamps are ordered like the thread
> numbers, i.e. thread #1 should back off when it encounters a mutex locked
> by thread #0 etc.
>
> Thread #0 Thread #1 Thread #2 Thread #3
> --------- --------- --------- ---------
> lock(ww)
> success
> lock(ww')
> success
> lock(ww)
> lock(ww) .
> . . unlock(ww) part 1
> lock(ww) . . .
> success . . .
> . . unlock(ww) part 2
> . back off
> lock(ww') .
> . .
> (stuck) (stuck)
>
> Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
> (without being protected by lock->base.wait_lock), meaning that thread #0
> can acquire ww in the fast path or, much more likely, the medium path
> in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
> won't wake up any of the waiters in ww_mutex_set_context_fastpath.
>
> Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
> thread #2, since waiters are added at the tail. Thread #2 wakes up and
> backs off since it sees ww owned by a context with a lower stamp.
>
> Meanwhile, thread #1 is never woken up, and so it won't back off its lock
> on ww'. So thread #0 gets stuck waiting for ww' to be released.
>
> This patch fixes the deadlock by waking up all waiters in the slow path
> of ww_mutex_unlock.
>
> We have an internal test case for amdgpu which continuously submits
> command streams from tens of threads, where all command streams reference
> hundreds of GPU buffer objects with a lot of overlap in the buffer lists
> between command streams. This test reliably caused a deadlock, and while I
> haven't completely confirmed that it is exactly the scenario outlined
> above, this patch does fix the test case.
>
> v2:
> - use wake_q_add
> - add additional explanations
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Chris Wilson <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Reviewed-by: Christian König <[email protected]> (v1)
> Signed-off-by: Nicolai Hähnle <[email protected]>

Completely and utterly fails to apply; I think this patch is based on
code prior to the mutex rewrite.

Please rebase on tip/locking/core.

Also, is this a regression, or has this been a 'feature' of the ww_mutex
code from early on?

2016-11-23 13:08:52

by Daniel Vetter

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Wed, Nov 23, 2016 at 02:00:46PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> > From: Nicolai Hähnle <[email protected]>
> >
> > Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
> > the following example. Acquire context stamps are ordered like the thread
> > numbers, i.e. thread #1 should back off when it encounters a mutex locked
> > by thread #0 etc.
> >
> > Thread #0 Thread #1 Thread #2 Thread #3
> > --------- --------- --------- ---------
> > lock(ww)
> > success
> > lock(ww')
> > success
> > lock(ww)
> > lock(ww) .
> > . . unlock(ww) part 1
> > lock(ww) . . .
> > success . . .
> > . . unlock(ww) part 2
> > . back off
> > lock(ww') .
> > . .
> > (stuck) (stuck)
> >
> > Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
> > (without being protected by lock->base.wait_lock), meaning that thread #0
> > can acquire ww in the fast path or, much more likely, the medium path
> > in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
> > won't wake up any of the waiters in ww_mutex_set_context_fastpath.
> >
> > Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
> > thread #2, since waiters are added at the tail. Thread #2 wakes up and
> > backs off since it sees ww owned by a context with a lower stamp.
> >
> > Meanwhile, thread #1 is never woken up, and so it won't back off its lock
> > on ww'. So thread #0 gets stuck waiting for ww' to be released.
> >
> > This patch fixes the deadlock by waking up all waiters in the slow path
> > of ww_mutex_unlock.
> >
> > We have an internal test case for amdgpu which continuously submits
> > command streams from tens of threads, where all command streams reference
> > hundreds of GPU buffer objects with a lot of overlap in the buffer lists
> > between command streams. This test reliably caused a deadlock, and while I
> > haven't completely confirmed that it is exactly the scenario outlined
> > above, this patch does fix the test case.
> >
> > v2:
> > - use wake_q_add
> > - add additional explanations
> >
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Chris Wilson <[email protected]>
> > Cc: Maarten Lankhorst <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Reviewed-by: Christian König <[email protected]> (v1)
> > Signed-off-by: Nicolai Hähnle <[email protected]>
>
> Completely and utterly fails to apply; I think this patch is based on
> code prior to the mutex rewrite.
>
> Please rebase on tip/locking/core.
>
> Also, is this a regression, or has this been a 'feature' of the ww_mutex
> code from early on?

Sorry forgot to mention that, but I checked. Seems to have been broken
since day 1, at least looking at the original code the wake-single-waiter
stuff is as old as the mutex code added in 2006.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2016-11-23 13:11:59

by Daniel Vetter

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Wed, Nov 23, 2016 at 02:08:48PM +0100, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 02:00:46PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> > > From: Nicolai Hähnle <[email protected]>
> > >
> > > Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
> > > the following example. Acquire context stamps are ordered like the thread
> > > numbers, i.e. thread #1 should back off when it encounters a mutex locked
> > > by thread #0 etc.
> > >
> > > Thread #0 Thread #1 Thread #2 Thread #3
> > > --------- --------- --------- ---------
> > > lock(ww)
> > > success
> > > lock(ww')
> > > success
> > > lock(ww)
> > > lock(ww) .
> > > . . unlock(ww) part 1
> > > lock(ww) . . .
> > > success . . .
> > > . . unlock(ww) part 2
> > > . back off
> > > lock(ww') .
> > > . .
> > > (stuck) (stuck)
> > >
> > > Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
> > > (without being protected by lock->base.wait_lock), meaning that thread #0
> > > can acquire ww in the fast path or, much more likely, the medium path
> > > in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
> > > won't wake up any of the waiters in ww_mutex_set_context_fastpath.
> > >
> > > Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
> > > thread #2, since waiters are added at the tail. Thread #2 wakes up and
> > > backs off since it sees ww owned by a context with a lower stamp.
> > >
> > > Meanwhile, thread #1 is never woken up, and so it won't back off its lock
> > > on ww'. So thread #0 gets stuck waiting for ww' to be released.
> > >
> > > This patch fixes the deadlock by waking up all waiters in the slow path
> > > of ww_mutex_unlock.
> > >
> > > We have an internal test case for amdgpu which continuously submits
> > > command streams from tens of threads, where all command streams reference
> > > hundreds of GPU buffer objects with a lot of overlap in the buffer lists
> > > between command streams. This test reliably caused a deadlock, and while I
> > > haven't completely confirmed that it is exactly the scenario outlined
> > > above, this patch does fix the test case.
> > >
> > > v2:
> > > - use wake_q_add
> > > - add additional explanations
> > >
> > > Cc: Peter Zijlstra <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: Chris Wilson <[email protected]>
> > > Cc: Maarten Lankhorst <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Reviewed-by: Christian König <[email protected]> (v1)
> > > Signed-off-by: Nicolai Hähnle <[email protected]>
> >
> > Completely and utterly fails to apply; I think this patch is based on
> > code prior to the mutex rewrite.
> >
> > Please rebase on tip/locking/core.
> >
> > Also, is this a regression, or has this been a 'feature' of the ww_mutex
> > code from early on?
>
> Sorry forgot to mention that, but I checked. Seems to have been broken
> since day 1, at least looking at the original code the wake-single-waiter
> stuff is as old as the mutex code added in 2006.

More details: For gpu drivers this was originally working, since the
ww_mutex implementation in ttm did use wake_up_all. So we need to add a

Fixes: 5e338405119a ("drm/ttm: convert to the reservation api")
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2016-11-23 13:42:53

by Maarten Lankhorst

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On 23-11-16 at 14:11, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 02:08:48PM +0100, Daniel Vetter wrote:
>> On Wed, Nov 23, 2016 at 02:00:46PM +0100, Peter Zijlstra wrote:
>>> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
>>>> From: Nicolai Hähnle <[email protected]>
>>>>
>>>> Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
>>>> the following example. Acquire context stamps are ordered like the thread
>>>> numbers, i.e. thread #1 should back off when it encounters a mutex locked
>>>> by thread #0 etc.
>>>>
>>>> Thread #0 Thread #1 Thread #2 Thread #3
>>>> --------- --------- --------- ---------
>>>> lock(ww)
>>>> success
>>>> lock(ww')
>>>> success
>>>> lock(ww)
>>>> lock(ww) .
>>>> . . unlock(ww) part 1
>>>> lock(ww) . . .
>>>> success . . .
>>>> . . unlock(ww) part 2
>>>> . back off
>>>> lock(ww') .
>>>> . .
>>>> (stuck) (stuck)
>>>>
>>>> Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
>>>> (without being protected by lock->base.wait_lock), meaning that thread #0
>>>> can acquire ww in the fast path or, much more likely, the medium path
>>>> in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
>>>> won't wake up any of the waiters in ww_mutex_set_context_fastpath.
>>>>
>>>> Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
>>>> thread #2, since waiters are added at the tail. Thread #2 wakes up and
>>>> backs off since it sees ww owned by a context with a lower stamp.
>>>>
>>>> Meanwhile, thread #1 is never woken up, and so it won't back off its lock
>>>> on ww'. So thread #0 gets stuck waiting for ww' to be released.
>>>>
>>>> This patch fixes the deadlock by waking up all waiters in the slow path
>>>> of ww_mutex_unlock.
>>>>
>>>> We have an internal test case for amdgpu which continuously submits
>>>> command streams from tens of threads, where all command streams reference
>>>> hundreds of GPU buffer objects with a lot of overlap in the buffer lists
>>>> between command streams. This test reliably caused a deadlock, and while I
>>>> haven't completely confirmed that it is exactly the scenario outlined
>>>> above, this patch does fix the test case.
>>>>
>>>> v2:
>>>> - use wake_q_add
>>>> - add additional explanations
>>>>
>>>> Cc: Peter Zijlstra <[email protected]>
>>>> Cc: Ingo Molnar <[email protected]>
>>>> Cc: Chris Wilson <[email protected]>
>>>> Cc: Maarten Lankhorst <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Reviewed-by: Christian König <[email protected]> (v1)
>>>> Signed-off-by: Nicolai Hähnle <[email protected]>
>>> Completely and utterly fails to apply; I think this patch is based on
>>> code prior to the mutex rewrite.
>>>
>>> Please rebase on tip/locking/core.
>>>
>>> Also, is this a regression, or has this been a 'feature' of the ww_mutex
>>> code from early on?
>> Sorry forgot to mention that, but I checked. Seems to have been broken
>> since day 1, at least looking at the original code the wake-single-waiter
>> stuff is as old as the mutex code added in 2006.
> More details: For gpu drivers this was originally working, since the
> ww_mutex implementation in ttm did use wake_up_all. So need to add a
>
> Fixes: 5e338405119a ("drm/ttm: convert to the reservation api")

But it's broken in the original kernel ww_mutex implementation; I guess it doesn't matter much since ttm was the first in-kernel user anyway. :)

~Maarten

2016-11-23 14:33:30

by Peter Zijlstra

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Wed, Nov 23, 2016 at 03:25:25PM +0100, Daniel Vetter wrote:

> Without thinking it through in detail this is a PI issue, except that we
> replace boosting with wakeup&back-off. Could we perhaps steal something
> from rt mutexes to make it fair&efficient?

rt_mutexes order the waiters by 'priority' and wake the top most.

2016-11-23 14:25:41

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Wed, Nov 23, 2016 at 03:03:36PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> > @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
> > */
> > mutex_clear_owner(&lock->base);
> > #endif
> > - __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
> > + /*
> > + * A previously _not_ waiting task may acquire the lock via the fast
> > + * path during our unlock. In that case, already waiting tasks may have
> > + * to back off to avoid a deadlock. Wake up all waiters so that they
> > + * can check their acquire context stamp against the new owner.
> > + */
> > + __mutex_fastpath_unlock(&lock->base.count,
> > + __mutex_unlock_slowpath_wakeall);
> > }
>
> So doing a wake-all has obvious issues with thundering herd etc.. Also,
> with the new mutex, you'd not be able to do hand-off, which would
> introduce starvation cases.
>
> Ideally we'd iterate the blocked list and pick the waiter with the
> earliest stamp, or we'd maintain the list in stamp order instead of
> FIFO, for ww_mutex.

Not sure we'll win that much, at least I think we still need to wake up
everyone with earlier stamp than the one of the task that just released
the lock. Otherwise there's deadlocks. So just cuts the wakeups in half,
on average.

What we could do is do a handoff-hint with the timestamp of earliest task
we believe should get the lock. Everyone with a later timestamp that gets
woken then knows that they definitely have a deadlock situation and need
to back off (thread 2 in the example).

thread 1 would get woken, and would be able to take the lock, except when
thread 0 successfully raced it and stole the lock. And anyone else racing
in with later timestamps would also immediately back off, ensuring
fairness.

Without thinking it through in detail this is a PI issue, except that we
replace boosting with wakeup&back-off. Could we perhaps steal something
from rt mutexes to make it fair&efficient?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2016-11-23 15:19:30

by Peter Zijlstra

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
> */
> mutex_clear_owner(&lock->base);
> #endif
> - __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
> + /*
> + * A previously _not_ waiting task may acquire the lock via the fast
> + * path during our unlock. In that case, already waiting tasks may have
> + * to back off to avoid a deadlock. Wake up all waiters so that they
> + * can check their acquire context stamp against the new owner.
> + */
> + __mutex_fastpath_unlock(&lock->base.count,
> + __mutex_unlock_slowpath_wakeall);
> }

So doing a wake-all has obvious issues with thundering herd etc.. Also,
with the new mutex, you'd not be able to do hand-off, which would
introduce starvation cases.

Ideally we'd iterate the blocked list and pick the waiter with the
earliest stamp, or we'd maintain the list in stamp order instead of
FIFO, for ww_mutex.
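
A minimal sketch of the first option, purely illustrative: it assumes
struct mutex_waiter grew a ww_ctx pointer (which it does not have today)
and simply skips !ww waiters:

/* Hypothetical: wake only the waiter whose acquire context has the
 * earliest stamp, instead of waking everybody. */
static void __ww_mutex_wake_earliest(struct mutex *lock,
                                     struct wake_q_head *wake_q)
{
        struct mutex_waiter *cur, *earliest = NULL;

        list_for_each_entry(cur, &lock->wait_list, list) {
                if (!cur->ww_ctx)       /* assumed new field, NULL for !ww waiters */
                        continue;
                if (!earliest ||
                    (long)(cur->ww_ctx->stamp - earliest->ww_ctx->stamp) < 0)
                        earliest = cur;
        }

        if (earliest) {
                debug_mutex_wake_waiter(lock, earliest);
                wake_q_add(wake_q, earliest->task);
        }
}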

2016-11-24 11:27:03

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On 23.11.2016 15:25, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 03:03:36PM +0100, Peter Zijlstra wrote:
>> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
>>> @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
>>> */
>>> mutex_clear_owner(&lock->base);
>>> #endif
>>> - __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
>>> + /*
>>> + * A previously _not_ waiting task may acquire the lock via the fast
>>> + * path during our unlock. In that case, already waiting tasks may have
>>> + * to back off to avoid a deadlock. Wake up all waiters so that they
>>> + * can check their acquire context stamp against the new owner.
>>> + */
>>> + __mutex_fastpath_unlock(&lock->base.count,
>>> + __mutex_unlock_slowpath_wakeall);
>>> }
>>
>> So doing a wake-all has obvious issues with thundering herd etc.. Also,
>> with the new mutex, you'd not be able to do hand-off, which would
>> introduce starvation cases.
>>
>> Ideally we'd iterate the blocked list and pick the waiter with the
>> earliest stamp, or we'd maintain the list in stamp order instead of
>> FIFO, for ww_mutex.
>
> Not sure we'll win that much, at least I think we still need to wake up
> everyone with earlier stamp than the one of the task that just released
> the lock. Otherwise there's deadlocks. So just cuts the wakeups in half,
> on average.
>
> What we could do is do a handoff-hint with the timestamp of earliest task
> we believe should get the lock. Everyone with a later timestamp that gets
> woken then knows that they definitely have a deadlock situation and need
> to back off (thread 2 in the example).
>
> thread 1 would get woken, and would be able to take the lock, except when
> thread 0 successfully raced it and stole the lock. And anyone else racing
> in with later timestamps would also immediately back off, ensuring
> fairness.

I did consider maintaining stamp order in the waiter list and originally
decided against it because I just wanted a simple and conservative fix
to avoid adding new regressions.

Now that a different approach is needed for >= 4.9 anyway mostly due to
the hand-off logic, I'm reconsidering this.

I do believe we can win a bit by keeping the wait list sorted, if we
also make sure that waiters don't add themselves in the first place if
they see that a deadlock situation cannot be avoided.

I will probably want to extend struct mutex_waiter with
ww_mutex-specific fields to facilitate this (i.e. ctx pointer, perhaps
stamp as well to reduce pointer-chasing). That should be fine since it
lives on the stack.
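
Roughly along these lines (a sketch only, field names and placement are not
final):

struct mutex_waiter {
        struct list_head        list;
        struct task_struct      *task;
#ifdef CONFIG_DEBUG_MUTEXES
        void                    *magic;
#endif
        /* sketch: ww_mutex-specific state, lives on the waiter's stack */
        struct ww_acquire_ctx   *ww_ctx;        /* NULL for ctx-less lockers */
        unsigned long           stamp;          /* cached ctx->stamp, avoids pointer-chasing */
};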

In the meantime, I'd appreciate it if patch #1 could be accepted as-is
for stable updates to <= 4.8. It fixes a real (if rare) bug, and the
stampede inefficiency isn't a problem in practice at least for GPU
applications.

Thanks,
Nicolai

>
> Without thinking it through in detail this is a PI issue, except that we
> replace boosting with wakeup&back-off. Could we perhaps steal something
> from rt mutexes to make it fair&efficient?
> -Daniel
>

2016-11-24 11:40:35

by Peter Zijlstra

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Thu, Nov 24, 2016 at 12:26:57PM +0100, Nicolai Hähnle wrote:

> I do believe we can win a bit by keeping the wait list sorted, if we also
> make sure that waiters don't add themselves in the first place if they see
> that a deadlock situation cannot be avoided.
>
> I will probably want to extend struct mutex_waiter with ww_mutex-specific
> fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to reduce
> pointer-chasing). That should be fine since it lives on the stack.

Right, shouldn't be a problem I think.

The only 'problem' I can see with using that is that it's possible to mix
ww and !ww waiters through ww_mutex_lock(.ctx = NULL). This makes the
list order somewhat tricky.

Ideally we'd remove that feature, although I see it's actually used quite
a bit :/

> In the meantime, I'd appreciate it if patch #1 could be accepted as-is for
> stable updates to <= 4.8. It fixes a real (if rare) bug, and the stampede
> inefficiency isn't a problem in practice at least for GPU applications.

Sorry can't do. We don't do stable patches that don't have anything
upstream.

2016-11-24 11:53:08

by Daniel Vetter

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Thu, Nov 24, 2016 at 12:40 PM, Peter Zijlstra <[email protected]> wrote:
>
>> I do believe we can win a bit by keeping the wait list sorted, if we also
>> make sure that waiters don't add themselves in the first place if they see
>> that a deadlock situation cannot be avoided.
>>
>> I will probably want to extend struct mutex_waiter with ww_mutex-specific
>> fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to reduce
>> pointer-chasing). That should be fine since it lives on the stack.
>
> Right, shouldn't be a problem I think.
>
> The only 'problem' I can see with using that is that its possible to mix
> ww and !ww waiters through ww_mutex_lock(.ctx = NULL). This makes the
> list order somewhat tricky.
>
> Ideally we'd remove that feature, although I see its actually used quite
> a bit :/

I guess we could create a small fake acquire_ctx for single-lock
paths. That way callers still don't need to deal with having an
explicit ctx, but we can assume the timestamp (for ensuring fairness)
is available for all cases. Otherwise there's indeed a problem with
correctly (well fairly) interleaving ctx and non-ctx lockers I think.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2016-11-24 12:19:23

by Nicolai Hähnle

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On 24.11.2016 12:56, Peter Zijlstra wrote:
> On Thu, Nov 24, 2016 at 12:52:25PM +0100, Daniel Vetter wrote:
>> On Thu, Nov 24, 2016 at 12:40 PM, Peter Zijlstra <[email protected]> wrote:
>>>
>>>> I do believe we can win a bit by keeping the wait list sorted, if we also
>>>> make sure that waiters don't add themselves in the first place if they see
>>>> that a deadlock situation cannot be avoided.
>>>>
>>>> I will probably want to extend struct mutex_waiter with ww_mutex-specific
>>>> fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to reduce
>>>> pointer-chasing). That should be fine since it lives on the stack.
>>>
>>> Right, shouldn't be a problem I think.
>>>
>>> The only 'problem' I can see with using that is that its possible to mix
>>> ww and !ww waiters through ww_mutex_lock(.ctx = NULL). This makes the
>>> list order somewhat tricky.
>>>
>>> Ideally we'd remove that feature, although I see its actually used quite
>>> a bit :/
>>
>> I guess we could create a small fake acquire_ctx for single-lock
>> paths. That way callers still don't need to deal with having an
>> explicit ctx, but we can assume the timestamp (for ensuring fairness)
>> is available for all cases. Otherwise there's indeed a problem with
>> correctly (well fairly) interleaving ctx and non-ctx lockers I think.
>
> Actually tried that, but we need a ww_class to get a stamp from, and
> ww_mutex_lock() doesn't have one of those..

The acquire context needs to be live until the unlock anyway, so this is
something that requires modifying the callers of ww_mutex_lock. Those
should all have a ww_class available, or something is very wrong :)

Nicolai

2016-11-24 12:19:36

by Peter Zijlstra

Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

On Thu, Nov 24, 2016 at 12:52:25PM +0100, Daniel Vetter wrote:
> On Thu, Nov 24, 2016 at 12:40 PM, Peter Zijlstra <[email protected]> wrote:
> >
> >> I do believe we can win a bit by keeping the wait list sorted, if we also
> >> make sure that waiters don't add themselves in the first place if they see
> >> that a deadlock situation cannot be avoided.
> >>
> >> I will probably want to extend struct mutex_waiter with ww_mutex-specific
> >> fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to reduce
> >> pointer-chasing). That should be fine since it lives on the stack.
> >
> > Right, shouldn't be a problem I think.
> >
> > The only 'problem' I can see with using that is that its possible to mix
> > ww and !ww waiters through ww_mutex_lock(.ctx = NULL). This makes the
> > list order somewhat tricky.
> >
> > Ideally we'd remove that feature, although I see its actually used quite
> > a bit :/
>
> I guess we could create a small fake acquire_ctx for single-lock
> paths. That way callers still don't need to deal with having an
> explicit ctx, but we can assume the timestamp (for ensuring fairness)
> is available for all cases. Otherwise there's indeed a problem with
> correctly (well fairly) interleaving ctx and non-ctx lockers I think.

Actually tried that, but we need a ww_class to get a stamp from, and
ww_mutex_lock() doesn't have one of those..