LinuxLists.cc - [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes

2016-11-28 12:20:30

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes

It turns out that the deadlock that I found last week was already implicitly
fixed during the lock->owner redesign, by checking the WAITERS bit in the
w/w lock fast path. However, since I had already started looking into
sorting the wait list, here goes.

The basic idea is to make sure that:

1. All waiters that have a ww_ctx appear in stamp order in the wait list.
Waiters without a ww_ctx are still supported and appear in FIFO order as
before.

2. At most one of the waiters can be in a state where it has to check for
back off (i.e., ww_ctx->acquire > 0). Technically, there are short time
windows in which more than one such waiter can be on the list, but all
but the first one are running. This happens when a new waiter with
ww_ctx->acquire > 0 adds itself at the front of the list and wakes up the
previous head of the list, and of course multiple such chained cases can
be in-flight simultaneously.

Then we only ever have to wake up one task at a time. This is _not_ always
the head of the wait list, since there may be waiters without a context. But
among waiters with a context, we only ever have to wake the first one.

To achieve all this, the series adds a new field to mutex_waiter which is
only used for the w/w lock case. As a consequence, calling mutex_lock
directly on w/w locks is now definitely incorrect. That was likely the
intention previously anyway, but grepping through the source I did find one
place that had slipped through.

I've included timings taken from a contention-heavy stress test to some of
the patches. The stress test performs actual GPU operations which take a
good chunk of the wall time, but even so, the series still manages to
improve the wall time quite a bit.

Cheers,
Nicolai

2016-11-28 12:20:42

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 03/11] locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after

From: Nicolai Hähnle <[email protected]>

The function will be re-used in subsequent patches.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0afa998..200629a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -277,6 +277,13 @@ static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
ww_ctx->acquired++;
}

+static inline bool __sched
+__ww_mutex_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
+{
+ return a->stamp - b->stamp <= LONG_MAX &&
+ (a->stamp != b->stamp || a > b);
+}
+
/*
* After acquiring lock with fastpath or when we lost out in contested
* slowpath, set ctx and wake up any waiters so they can recheck.
@@ -610,8 +617,7 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
if (!hold_ctx)
return 0;

- if (ctx->stamp - hold_ctx->stamp <= LONG_MAX &&
- (ctx->stamp != hold_ctx->stamp || ctx > hold_ctx)) {
+ if (__ww_mutex_stamp_after(ctx, hold_ctx)) {
#ifdef CONFIG_DEBUG_MUTEXES
DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
ctx->contending_lock = ww;
--
2.7.4

2016-11-28 12:20:53

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 05/11] locking/ww_mutex: Add waiters in stamp order

From: Nicolai Hähnle <[email protected]>

Add regular waiters in stamp order. Keep adding waiters that have no
context in FIFO order and take care not to starve them.

While adding our task as a waiter, back off if we detect that there is a
waiter with a lower stamp in front of us.

Make sure to call lock_contended even when we back off early.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
include/linux/mutex.h | 3 ++
kernel/locking/mutex.c | 76 +++++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index b97870f..118a3b6 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -20,6 +20,8 @@
#include <linux/osq_lock.h>
#include <linux/debug_locks.h>

+struct ww_acquire_ctx;
+
/*
* Simple, straightforward mutexes with strict semantics:
*
@@ -75,6 +77,7 @@ static inline struct task_struct *__mutex_owner(struct mutex *lock)
struct mutex_waiter {
struct list_head list;
struct task_struct *task;
+ struct ww_acquire_ctx *ww_ctx;
#ifdef CONFIG_DEBUG_MUTEXES
void *magic;
#endif
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 585627f..01dcae7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -628,6 +628,40 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
return 0;
}

+static inline int __sched
+__ww_mutex_add_waiter(struct mutex_waiter *waiter,
+ struct mutex *lock,
+ struct ww_acquire_ctx *ww_ctx)
+{
+ if (ww_ctx) {
+ struct mutex_waiter *cur;
+
+ /*
+ * Add the waiter before the first waiter with a higher stamp.
+ * Waiters without a context are skipped to avoid starving
+ * them.
+ */
+ list_for_each_entry(cur, &lock->wait_list, list) {
+ if (!cur->ww_ctx)
+ continue;
+
+ if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
+ /* Back off immediately if necessary. */
+ if (ww_ctx->acquired > 0)
+ return -EDEADLK;
+
+ continue;
+ }
+
+ list_add_tail(&waiter->list, &cur->list);
+ return 0;
+ }
+ }
+
+ list_add_tail(&waiter->list, &lock->wait_list);
+ return 0;
+}
+
/*
* Lock a mutex (possibly interruptible), slowpath:
*/
@@ -677,15 +711,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
debug_mutex_lock_common(lock, &waiter);
debug_mutex_add_waiter(lock, &waiter, task);

- /* add waiting tasks to the end of the waitqueue (FIFO): */
- list_add_tail(&waiter.list, &lock->wait_list);
+ lock_contended(&lock->dep_map, ip);
+
+ if (!use_ww_ctx) {
+ /* add waiting tasks to the end of the waitqueue (FIFO): */
+ list_add_tail(&waiter.list, &lock->wait_list);
+ } else {
+ /* Add in stamp order, waking up waiters that must back off. */
+ ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
+ if (ret)
+ goto err_early_backoff;
+
+ waiter.ww_ctx = ww_ctx;
+ }
+
waiter.task = task;

if (__mutex_waiter_is_first(lock, &waiter))
__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);

- lock_contended(&lock->dep_map, ip);
-
set_task_state(task, state);
for (;;) {
/*
@@ -693,8 +737,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* mutex_unlock() handing the lock off to us, do a trylock
* before testing the error conditions to make sure we pick up
* the handoff.
+ *
+ * For w/w locks, we always need to do this even if we're not
+ * currently the first waiter, because we may have been the
+ * first waiter during the unlock.
*/
- if (__mutex_trylock(lock, first))
+ if (__mutex_trylock(lock, use_ww_ctx || first))
goto acquired;

/*
@@ -716,7 +764,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
spin_unlock_mutex(&lock->wait_lock, flags);
schedule_preempt_disabled();

- if (!first && __mutex_waiter_is_first(lock, &waiter)) {
+ if (use_ww_ctx && ww_ctx) {
+ /*
+ * Always re-check whether we're in first position. We
+ * don't want to spin if another task with a lower
+ * stamp has taken our position.
+ *
+ * We also may have to set the handoff flag again, if
+ * our position at the head was temporarily taken away.
+ */
+ first = __mutex_waiter_is_first(lock, &waiter);
+
+ if (first)
+ __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
+ } else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
first = true;
__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
}
@@ -728,7 +789,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* or we must see its unlock and acquire.
*/
if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
- __mutex_trylock(lock, first))
+ __mutex_trylock(lock, use_ww_ctx || first))
break;

spin_lock_mutex(&lock->wait_lock, flags);
@@ -761,6 +822,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
err:
__set_task_state(task, TASK_RUNNING);
mutex_remove_waiter(lock, &waiter, task);
+err_early_backoff:
spin_unlock_mutex(&lock->wait_lock, flags);
debug_mutex_free_waiter(&waiter);
mutex_release(&lock->dep_map, 1, ip);
--
2.7.4

2016-11-28 12:21:02

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop

From: Nicolai Hähnle <[email protected]>

In the following scenario, thread #1 should back off its attempt to lock
ww1 and unlock ww2 (assuming the acquire context stamps are ordered
accordingly).

Thread #0 Thread #1
--------- ---------
successfully lock ww2
set ww1->base.owner
attempt to lock ww1
confirm ww1->ctx == NULL
enter mutex_spin_on_owner
set ww1->ctx

What was likely to happen previously is:

attempt to lock ww2
refuse to spin because
ww2->ctx != NULL
schedule()
detect thread #0 is off CPU
stop optimistic spin
return -EDEADLK
unlock ww2
wakeup thread #0
lock ww2

Now, we are more likely to see:

detect ww1->ctx != NULL
stop optimistic spin
return -EDEADLK
unlock ww2
successfully lock ww2

... because thread #1 will stop its optimistic spin as soon as possible.

The whole scenario is quite unlikely, since it requires thread #1 to get
between thread #0 setting the owner and setting the ctx. But since we're
idling here anyway, the additional check is basically free.

Found by inspection.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 9b34961..0afa998 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -350,7 +350,8 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
* access and not reliable.
*/
static noinline
-bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
+ bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
{
bool ret = true;

@@ -373,6 +374,28 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
break;
}

+ if (use_ww_ctx && ww_ctx->acquired > 0) {
+ struct ww_mutex *ww;
+
+ ww = container_of(lock, struct ww_mutex, base);
+
+ /*
+ * If ww->ctx is set the contents are undefined, only
+ * by acquiring wait_lock there is a guarantee that
+ * they are not invalid when reading.
+ *
+ * As such, when deadlock detection needs to be
+ * performed the optimistic spinning cannot be done.
+ *
+ * Check this in every inner iteration because we may
+ * be racing against another thread's ww_mutex_lock.
+ */
+ if (READ_ONCE(ww->ctx)) {
+ ret = false;
+ break;
+ }
+ }
+
cpu_relax();
}
rcu_read_unlock();
@@ -460,22 +483,6 @@ static bool mutex_optimistic_spin(struct mutex *lock,
for (;;) {
struct task_struct *owner;

- if (use_ww_ctx && ww_ctx->acquired > 0) {
- struct ww_mutex *ww;
-
- ww = container_of(lock, struct ww_mutex, base);
- /*
- * If ww->ctx is set the contents are undefined, only
- * by acquiring wait_lock there is a guarantee that
- * they are not invalid when reading.
- *
- * As such, when deadlock detection needs to be
- * performed the optimistic spinning cannot be done.
- */
- if (READ_ONCE(ww->ctx))
- goto fail_unlock;
- }
-
/*
* If there's an owner, wait for it to either
* release the lock or go to sleep.
@@ -487,7 +494,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
break;
}

- if (!mutex_spin_on_owner(lock, owner))
+ if (!mutex_spin_on_owner(lock, owner, use_ww_ctx,
+ ww_ctx))
goto fail_unlock;
}

--
2.7.4

2016-11-28 12:21:13

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context

From: Nicolai Hähnle <[email protected]>

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
drivers/gpu/drm/vgem/vgem_fence.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c
index 488909a..e1d516f 100644
--- a/drivers/gpu/drm/vgem/vgem_fence.c
+++ b/drivers/gpu/drm/vgem/vgem_fence.c
@@ -191,12 +191,12 @@ int vgem_fence_attach_ioctl(struct drm_device *dev,

/* Expose the fence via the dma-buf */
ret = 0;
- mutex_lock(&resv->lock.base);
+ ww_mutex_lock(&resv->lock.base, NULL);
if (arg->flags & VGEM_FENCE_WRITE)
reservation_object_add_excl_fence(resv, fence);
else if ((ret = reservation_object_reserve_shared(resv)) == 0)
reservation_object_add_shared_fence(resv, fence);
- mutex_unlock(&resv->lock.base);
+ ww_mutex_unlock(&resv->lock.base);

/* Record the fence in our idr for later signaling */
if (ret == 0) {
--
2.7.4

2016-11-28 12:21:24

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context

From: Nicolai Hähnle <[email protected]>

We will add a new field to struct mutex_waiter. This field must be
initialized for all waiters if any waiter uses the ww_use_ctx path.

So there is a trade-off: Keep ww_mutex locking without a context on the
faster non-use_ww_ctx path, at the cost of adding the initialization to all
mutex locks (including non-ww_mutexes), or avoid the additional cost for
non-ww_mutex locks, at the cost of adding additional checks to the
use_ww_ctx path.

We take the latter choice. It may be worth eliminating the users of
ww_mutex_lock(lock, NULL), but there are a lot of them.

Move the declaration of ww in __mutex_lock_common closer to its uses
because gcc otherwise (incorrectly) believes ww may be used without
initialization.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
include/linux/ww_mutex.h | 11 ++---------
kernel/locking/mutex.c | 37 +++++++++++++++++++++++++------------
2 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 2bb5deb..a5960e5 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -222,11 +222,7 @@ extern int __must_check __ww_mutex_lock_interruptible(struct ww_mutex *lock,
*/
static inline int ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
{
- if (ctx)
- return __ww_mutex_lock(lock, ctx);
-
- mutex_lock(&lock->base);
- return 0;
+ return __ww_mutex_lock(lock, ctx);
}

/**
@@ -262,10 +258,7 @@ static inline int ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ct
static inline int __must_check ww_mutex_lock_interruptible(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx)
{
- if (ctx)
- return __ww_mutex_lock_interruptible(lock, ctx);
- else
- return mutex_lock_interruptible(&lock->base);
+ return __ww_mutex_lock_interruptible(lock, ctx);
}

/**
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 200629a..585627f 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -381,7 +381,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
break;
}

- if (use_ww_ctx && ww_ctx->acquired > 0) {
+ if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
struct ww_mutex *ww;

ww = container_of(lock, struct ww_mutex, base);
@@ -640,10 +640,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
struct mutex_waiter waiter;
unsigned long flags;
bool first = false;
- struct ww_mutex *ww;
int ret;

- if (use_ww_ctx) {
+ if (use_ww_ctx && ww_ctx) {
+ struct ww_mutex *ww;
+
ww = container_of(lock, struct ww_mutex, base);
if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
return -EALREADY;
@@ -656,8 +657,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, false)) {
/* got the lock, yay! */
lock_acquired(&lock->dep_map, ip);
- if (use_ww_ctx)
+ if (use_ww_ctx && ww_ctx) {
+ struct ww_mutex *ww;
+
+ ww = container_of(lock, struct ww_mutex, base);
ww_mutex_set_context_fastpath(ww, ww_ctx);
+ }
preempt_enable();
return 0;
}
@@ -702,7 +707,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
goto err;
}

- if (use_ww_ctx && ww_ctx->acquired > 0) {
+ if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
if (ret)
goto err;
@@ -742,8 +747,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
/* got the lock - cleanup and rejoice! */
lock_acquired(&lock->dep_map, ip);

- if (use_ww_ctx)
+ if (use_ww_ctx && ww_ctx) {
+ struct ww_mutex *ww;
+
+ ww = container_of(lock, struct ww_mutex, base);
ww_mutex_set_context_slowpath(ww, ww_ctx);
+ }

spin_unlock_mutex(&lock->wait_lock, flags);
preempt_enable();
@@ -830,8 +839,9 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)

might_sleep();
ret = __mutex_lock_common(&lock->base, TASK_UNINTERRUPTIBLE,
- 0, &ctx->dep_map, _RET_IP_, ctx, 1);
- if (!ret && ctx->acquired > 1)
+ 0, ctx ? &ctx->dep_map : NULL, _RET_IP_,
+ ctx, 1);
+ if (!ret && ctx && ctx->acquired > 1)
return ww_mutex_deadlock_injection(lock, ctx);

return ret;
@@ -845,9 +855,10 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)

might_sleep();
ret = __mutex_lock_common(&lock->base, TASK_INTERRUPTIBLE,
- 0, &ctx->dep_map, _RET_IP_, ctx, 1);
+ 0, ctx ? &ctx->dep_map : NULL, _RET_IP_,
+ ctx, 1);

- if (!ret && ctx->acquired > 1)
+ if (!ret && ctx && ctx->acquired > 1)
return ww_mutex_deadlock_injection(lock, ctx);

return ret;
@@ -1034,7 +1045,8 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
might_sleep();

if (__mutex_trylock_fast(&lock->base)) {
- ww_mutex_set_context_fastpath(lock, ctx);
+ if (ctx)
+ ww_mutex_set_context_fastpath(lock, ctx);
return 0;
}

@@ -1048,7 +1060,8 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
might_sleep();

if (__mutex_trylock_fast(&lock->base)) {
- ww_mutex_set_context_fastpath(lock, ctx);
+ if (ctx)
+ ww_mutex_set_context_fastpath(lock, ctx);
return 0;
}

--
2.7.4

2016-11-28 12:21:34

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 11/11] [rfc] locking/ww_mutex: Always spin optimistically for the first waiter

From: Nicolai Hähnle <[email protected]>

Check the current owner's context once against our stamp. If our stamp is
lower, we continue to spin optimistically instead of backing off.

This is correct with respect to deadlock detection because while the
(owner, ww_ctx) pair may re-appear if the owner task manages to unlock
and re-acquire the lock while we're spinning, the context could only have
been re-initialized with an even higher stamp. We also still detect when
we have to back off for other waiters that join the list while we're
spinning.

But taking the wait_lock in mutex_spin_on_owner feels iffy, even if it is
done only once.

Median timings taken of a contention-heavy GPU workload:

Before:
real 0m53.086s
user 0m7.360s
sys 1m46.204s

After:
real 0m52.577s
user 0m7.544s
sys 1m49.200s

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 35 ++++++++++++++++++++++++++++++++---
1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index ee84007..e7d5fac 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -378,6 +378,28 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
struct mutex_waiter *waiter)
{
bool ret = true;
+ struct ww_acquire_ctx *owner_ww_ctx = NULL;
+
+ if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
+ struct ww_mutex *ww;
+ unsigned long flags;
+
+ ww = container_of(lock, struct ww_mutex, base);
+
+ /*
+ * Check the stamp of the current owner once. This allows us
+ * to spin optimistically in the case where the current owner
+ * has a higher stamp than us.
+ */
+ spin_lock_mutex(&lock->wait_lock, flags);
+ owner_ww_ctx = ww->ctx;
+ if (owner_ww_ctx &&
+ __ww_mutex_stamp_after(ww_ctx, owner_ww_ctx)) {
+ spin_unlock_mutex(&lock->wait_lock, flags);
+ return false;
+ }
+ spin_unlock_mutex(&lock->wait_lock, flags);
+ }

rcu_read_lock();
while (__mutex_owner(lock) == owner) {
@@ -414,9 +436,16 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
* Check this in every inner iteration because we may
* be racing against another thread's ww_mutex_lock.
*/
- if (ww_ctx->acquired > 0 && READ_ONCE(ww->ctx)) {
- ret = false;
- break;
+ if (ww_ctx->acquired > 0) {
+ struct ww_acquire_ctx *current_ctx;
+
+ current_ctx = READ_ONCE(ww->ctx);
+
+ if (current_ctx &&
+ current_ctx != owner_ww_ctx) {
+ ret = false;
+ break;
+ }
}

/*
--
2.7.4

2016-11-28 12:22:04

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 10/11] Documentation/locking/ww_mutex: Update the design document

From: Nicolai Hähnle <[email protected]>

Document the invariants we maintain for the wait list of ww_mutexes.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
Documentation/locking/ww-mutex-design.txt | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt
index 8a112dc..34c3a1b 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -309,11 +309,15 @@ Design:
normal mutex locks, which are far more common. As such there is only a small
increase in code size if wait/wound mutexes are not used.

+ We maintain the following invariants for the wait list:
+ (1) Waiters with an acquire context are sorted by stamp order; waiters
+ without an acquire context are interspersed in FIFO order.
+ (2) Among waiters with contexts, only the first one can have other locks
+ acquired already (ctx->acquired > 0). Note that this waiter may come
+ after other waiters without contexts in the list.
+
In general, not much contention is expected. The locks are typically used to
- serialize access to resources for devices. The only way to make wakeups
- smarter would be at the cost of adding a field to struct mutex_waiter. This
- would add overhead to all cases where normal mutexes are used, and
- ww_mutexes are generally less performance sensitive.
+ serialize access to resources for devices.

Lockdep:
Special care has been taken to warn for as many cases of api abuse
--
2.7.4

2016-11-28 12:23:22

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 09/11] locking/mutex: Initialize mutex_waiter::ww_ctx with poison when debugging

From: Nicolai Hähnle <[email protected]>

Help catch cases where mutex_lock is used directly on w/w mutexes, which
otherwise result in the w/w tasks reading uninitialized data.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
include/linux/poison.h | 1 +
kernel/locking/mutex.c | 4 ++++
2 files changed, 5 insertions(+)

diff --git a/include/linux/poison.h b/include/linux/poison.h
index 51334ed..a395403 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -80,6 +80,7 @@
/********** kernel/mutexes **********/
#define MUTEX_DEBUG_INIT 0x11
#define MUTEX_DEBUG_FREE 0x22
+#define MUTEX_POISON_WW_CTX ((void *) 0x500 + POISON_POINTER_DELTA)

/********** lib/flex_array.c **********/
#define FLEX_ARRAY_FREE 0x6c /* for use-after-free poisoning */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 1e4da21..ee84007 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -783,6 +783,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
if (!use_ww_ctx) {
/* add waiting tasks to the end of the waitqueue (FIFO): */
list_add_tail(&waiter.list, &lock->wait_list);
+
+#ifdef CONFIG_DEBUG_MUTEXES
+ waiter.ww_ctx = MUTEX_POISON_WW_CTX;
+#endif
} else {
/* Add in stamp order, waking up waiters that must back off. */
ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
--
2.7.4

2016-11-28 12:23:31

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 08/11] locking/ww_mutex: Yield to other waiters from optimistic spin

From: Nicolai Hähnle <[email protected]>

Lock stealing is less beneficial for w/w mutexes since we may just end up
backing off if we stole from a thread with an earlier acquire stamp that
already holds another w/w mutex that we also need. So don't spin
optimistically unless we are sure that there is no other waiter that might
cause us to back off.

Median timings taken of a contention-heavy GPU workload:

Before:
real 0m52.946s
user 0m7.272s
sys 1m55.964s

After:
real 0m53.086s
user 0m7.360s
sys 1m46.204s

This particular workload still spends 20%-25% of CPU in mutex_spin_on_owner
according to perf, but my attempts to further reduce this spinning based on
various heuristics all lead to an increase in measured wall time despite
the decrease in sys time.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 48 ++++++++++++++++++++++++++++++++++++++----------
1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0f6f1da..1e4da21 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -374,7 +374,8 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
*/
static noinline
bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
- bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
+ bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx,
+ struct mutex_waiter *waiter)
{
bool ret = true;

@@ -397,7 +398,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
break;
}

- if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
+ if (use_ww_ctx && ww_ctx) {
struct ww_mutex *ww;

ww = container_of(lock, struct ww_mutex, base);
@@ -413,7 +414,30 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
* Check this in every inner iteration because we may
* be racing against another thread's ww_mutex_lock.
*/
- if (READ_ONCE(ww->ctx)) {
+ if (ww_ctx->acquired > 0 && READ_ONCE(ww->ctx)) {
+ ret = false;
+ break;
+ }
+
+ /*
+ * If we aren't on the wait list yet, cancel the spin
+ * if there are waiters. We want to avoid stealing the
+ * lock from a waiter with an earlier stamp, since the
+ * other thread may already own a lock that we also
+ * need.
+ */
+ if (!waiter &&
+ (atomic_long_read(&lock->owner) &
+ MUTEX_FLAG_WAITERS)) {
+ ret = false;
+ break;
+ }
+
+ /*
+ * Similarly, stop spinning if we are no longer the
+ * first waiter.
+ */
+ if (waiter && !__mutex_waiter_is_first(lock, waiter)) {
ret = false;
break;
}
@@ -479,7 +503,8 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
*/
static bool mutex_optimistic_spin(struct mutex *lock,
struct ww_acquire_ctx *ww_ctx,
- const bool use_ww_ctx, const bool waiter)
+ const bool use_ww_ctx,
+ struct mutex_waiter *waiter)
{
struct task_struct *task = current;

@@ -518,12 +543,12 @@ static bool mutex_optimistic_spin(struct mutex *lock,
}

if (!mutex_spin_on_owner(lock, owner, use_ww_ctx,
- ww_ctx))
+ ww_ctx, waiter))
goto fail_unlock;
}

/* Try to acquire the mutex if it is unlocked. */
- if (__mutex_trylock(lock, waiter))
+ if (__mutex_trylock(lock, waiter != NULL))
break;

/*
@@ -565,7 +590,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
#else
static bool mutex_optimistic_spin(struct mutex *lock,
struct ww_acquire_ctx *ww_ctx,
- const bool use_ww_ctx, const bool waiter)
+ const bool use_ww_ctx,
+ struct mutex_waiter *waiter)
{
return false;
}
@@ -725,7 +751,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);

if (__mutex_trylock(lock, false) ||
- mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, false)) {
+ mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, NULL)) {
/* got the lock, yay! */
lock_acquired(&lock->dep_map, ip);
if (use_ww_ctx && ww_ctx) {
@@ -830,8 +856,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* state back to RUNNING and fall through the next schedule(),
* or we must see its unlock and acquire.
*/
- if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
- __mutex_trylock(lock, use_ww_ctx || first))
+ if ((first &&
+ mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
+ &waiter)) ||
+ __mutex_trylock(lock, use_ww_ctx || first))
break;

spin_lock_mutex(&lock->wait_lock, flags);
--
2.7.4

2016-11-28 12:23:42

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 07/11] locking/ww_mutex: Wake at most one waiter for back off when acquiring the lock

From: Nicolai Hähnle <[email protected]>

The wait list is sorted by stamp order, and the only waiting task that may
have to back off is the first waiter with a context.

The regular slow path does not have to wake any other tasks at all, since
all other waiters that would have to back off were either woken up when
the waiter was added to the list, or detected the condition before they
added themselves.

Median timings taken of a contention-heavy GPU workload:

Without this series:
real 0m59.900s
user 0m7.516s
sys 2m16.076s

With changes up to and including this patch:
real 0m52.946s
user 0m7.272s
sys 1m55.964s

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 58 +++++++++++++++++++++++++++++++++-----------------
1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index d310703e..0f6f1da 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -285,6 +285,35 @@ __ww_mutex_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
}

/*
+ * Wake up any waiters that may have to back off when the lock is held by the
+ * given context.
+ *
+ * Due to the invariants on the wait list, this can only affect the first
+ * waiter with a context.
+ *
+ * Must be called with wait_lock held. The current task must not be on the
+ * wait list.
+ */
+static void __sched
+__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+{
+ struct mutex_waiter *cur;
+
+ list_for_each_entry(cur, &lock->wait_list, list) {
+ if (!cur->ww_ctx)
+ continue;
+
+ if (cur->ww_ctx->acquired > 0 &&
+ __ww_mutex_stamp_after(cur->ww_ctx, ww_ctx)) {
+ debug_mutex_wake_waiter(lock, cur);
+ wake_up_process(cur->task);
+ }
+
+ break;
+ }
+}
+
+/*
* After acquiring lock with fastpath or when we lost out in contested
* slowpath, set ctx and wake up any waiters so they can recheck.
*/
@@ -293,7 +322,6 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx)
{
unsigned long flags;
- struct mutex_waiter *cur;

ww_mutex_lock_acquired(lock, ctx);

@@ -319,16 +347,15 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
* so they can see the new lock->ctx.
*/
spin_lock_mutex(&lock->base.wait_lock, flags);
- list_for_each_entry(cur, &lock->base.wait_list, list) {
- debug_mutex_wake_waiter(&lock->base, cur);
- wake_up_process(cur->task);
- }
+ __ww_mutex_wakeup_for_backoff(&lock->base, ctx);
spin_unlock_mutex(&lock->base.wait_lock, flags);
}

/*
- * After acquiring lock in the slowpath set ctx and wake up any
- * waiters so they can recheck.
+ * After acquiring lock in the slowpath set ctx.
+ *
+ * Unlike for the fast path, the caller ensures that waiters are woken up where
+ * necessary.
*
* Callers must hold the mutex wait_lock.
*/
@@ -336,19 +363,8 @@ static __always_inline void
ww_mutex_set_context_slowpath(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx)
{
- struct mutex_waiter *cur;
-
ww_mutex_lock_acquired(lock, ctx);
lock->ctx = ctx;
-
- /*
- * Give any possible sleeping processes the chance to wake up,
- * so they can recheck if they have to back off.
- */
- list_for_each_entry(cur, &lock->base.wait_list, list) {
- debug_mutex_wake_waiter(&lock->base, cur);
- wake_up_process(cur->task);
- }
}

#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
@@ -726,8 +742,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
/*
* After waiting to acquire the wait_lock, try again.
*/
- if (__mutex_trylock(lock, false))
+ if (__mutex_trylock(lock, false)) {
+ if (use_ww_ctx && ww_ctx)
+ __ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+
goto skip_wait;
+ }

debug_mutex_lock_common(lock, &waiter);
debug_mutex_add_waiter(lock, &waiter, task);
--
2.7.4

2016-11-28 12:23:53

by Nicolai Hähnle

[permalink] [raw]

Subject: [PATCH 06/11] locking/ww_mutex: Notify waiters that have to back off while adding tasks to wait list

From: Nicolai Hähnle <[email protected]>

While adding our task as a waiter, detect if another task should back off
because of us.

With this patch, we establish the invariant that the wait list contains
at most one (sleeping) waiter with ww_ctx->acquired > 0, and this waiter
will be the first waiter with a context.

Since only waiters with ww_ctx->acquired > 0 have to back off, this allows
us to be much more economical with wakeups.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 42 ++++++++++++++++++++++++++++++++----------
1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 01dcae7..d310703e 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -609,23 +609,34 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
EXPORT_SYMBOL(ww_mutex_unlock);

static inline int __sched
-__ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
+__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
+ struct ww_acquire_ctx *ctx)
{
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
+ struct mutex_waiter *cur;

- if (!hold_ctx)
- return 0;
+ if (hold_ctx && __ww_mutex_stamp_after(ctx, hold_ctx))
+ goto deadlock;

- if (__ww_mutex_stamp_after(ctx, hold_ctx)) {
-#ifdef CONFIG_DEBUG_MUTEXES
- DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
- ctx->contending_lock = ww;
-#endif
- return -EDEADLK;
+ /*
+ * If there is a waiter in front of us that has a context, then its
+ * stamp is earlier than ours and we must back off.
+ */
+ cur = waiter;
+ list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
+ if (cur->ww_ctx)
+ goto deadlock;
}

return 0;
+
+deadlock:
+#ifdef CONFIG_DEBUG_MUTEXES
+ DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
+ ctx->contending_lock = ww;
+#endif
+ return -EDEADLK;
}

static inline int __sched
@@ -654,6 +665,16 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
}

list_add_tail(&waiter->list, &cur->list);
+
+ /*
+ * Wake up the waiter so that it gets a chance to back
+ * off.
+ */
+ if (cur->ww_ctx->acquired > 0) {
+ debug_mutex_wake_waiter(lock, cur);
+ wake_up_process(cur->task);
+ }
+
return 0;
}
}
@@ -756,7 +777,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
}

if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
- ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
+ ret = __ww_mutex_lock_check_stamp(lock, &waiter,
+ ww_ctx);
if (ret)
goto err;
}
--
2.7.4

2016-11-28 12:42:37

by Maarten Lankhorst

[permalink] [raw]

Subject: Re: [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context

Op 28-11-16 om 13:20 schreef Nicolai Hähnle:
> From: Nicolai Hähnle <[email protected]>
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Chris Wilson <[email protected]>
> Cc: [email protected]
> Signed-off-by: Nicolai Hähnle <[email protected]>
> ---
> drivers/gpu/drm/vgem/vgem_fence.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c
> index 488909a..e1d516f 100644
> --- a/drivers/gpu/drm/vgem/vgem_fence.c
> +++ b/drivers/gpu/drm/vgem/vgem_fence.c
> @@ -191,12 +191,12 @@ int vgem_fence_attach_ioctl(struct drm_device *dev,
>
> /* Expose the fence via the dma-buf */
> ret = 0;
> - mutex_lock(&resv->lock.base);
> + ww_mutex_lock(&resv->lock.base, NULL);
Yuck, can we rename base to __NEVER_TOUCH_DIRECTLY_OUTSIDE_LOCKING_CORE?
It's harder to get them confused like that, even with a null context it's illegal to call mutex_lock/unlock directly.

~Maarten

2016-11-28 12:50:43

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context

On Mon, Nov 28, 2016 at 01:42:26PM +0100, Maarten Lankhorst wrote:

> > + ww_mutex_lock(&resv->lock.base, NULL);
> Yuck, can we rename base to __NEVER_TOUCH_DIRECTLY_OUTSIDE_LOCKING_CORE?
> It's harder to get them confused like that, even with a null context it's illegal to call mutex_lock/unlock directly.

I think there's a __private sparse annotation that could be used.

2016-11-28 12:59:11

by Christian König

[permalink] [raw]

Subject: Re: [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context

Am 28.11.2016 um 13:20 schrieb Nicolai Hähnle:
> From: Nicolai Hähnle <[email protected]>
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Chris Wilson <[email protected]>
> Cc: [email protected]
> Signed-off-by: Nicolai Hähnle <[email protected]>
> ---
> drivers/gpu/drm/vgem/vgem_fence.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c
> index 488909a..e1d516f 100644
> --- a/drivers/gpu/drm/vgem/vgem_fence.c
> +++ b/drivers/gpu/drm/vgem/vgem_fence.c
> @@ -191,12 +191,12 @@ int vgem_fence_attach_ioctl(struct drm_device *dev,
>
> /* Expose the fence via the dma-buf */
> ret = 0;
> - mutex_lock(&resv->lock.base);
> + ww_mutex_lock(&resv->lock.base, NULL);

Accessing the base member here is probably superfluous now and will most
likely raise a warning.

Christian.

> if (arg->flags & VGEM_FENCE_WRITE)
> reservation_object_add_excl_fence(resv, fence);
> else if ((ret = reservation_object_reserve_shared(resv)) == 0)
> reservation_object_add_shared_fence(resv, fence);
> - mutex_unlock(&resv->lock.base);
> + ww_mutex_unlock(&resv->lock.base);
>
> /* Record the fence in our idr for later signaling */
> if (ret == 0) {

2016-11-30 00:36:08

by Chris Wilson

[permalink] [raw]

Subject: [PATCH 1/4] locking: Begin kselftests for ww_mutex

Signed-off-by: Chris Wilson <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Nicolai Hähnle <[email protected]>
---
kernel/locking/Makefile | 1 +
kernel/locking/test-ww_mutex.c | 137 +++++++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 10 +++
3 files changed, 148 insertions(+)
create mode 100644 kernel/locking/test-ww_mutex.c

diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 6f88e352cd4f..760158d9d98d 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -28,3 +28,4 @@ obj-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
obj-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem-xadd.o
obj-$(CONFIG_QUEUED_RWLOCKS) += qrwlock.o
obj-$(CONFIG_LOCK_TORTURE_TEST) += locktorture.o
+obj-$(CONFIG_WW_MUTEX_SELFTEST) += test-ww_mutex.o
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
new file mode 100644
index 000000000000..e94b807e06c2
--- /dev/null
+++ b/kernel/locking/test-ww_mutex.c
@@ -0,0 +1,137 @@
+/*
+ * Module-based API test facility for ww_mutexes
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, you can access it online at
+ * http://www.gnu.org/licenses/gpl-2.0.html.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/ww_mutex.h>
+#include <linux/completion.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Intel Corporation");
+
+static DEFINE_WW_CLASS(ww_class);
+
+struct test_mutex {
+ struct work_struct work;
+ struct ww_mutex mutex;
+ struct completion ready, go, done;
+ unsigned flags;
+#define TEST_AB_SPIN BIT(0)
+#define TEST_AB_TRY BIT(1)
+};
+
+static void test_mutex_work(struct work_struct *work)
+{
+ struct test_mutex *mtx = container_of(work, typeof(*mtx), work);
+
+ complete(&mtx->ready);
+ wait_for_completion(&mtx->go);
+
+ if (mtx->flags & TEST_AB_TRY) {
+ while (!ww_mutex_trylock(&mtx->mutex))
+ cpu_relax();
+ } else {
+ ww_mutex_lock(&mtx->mutex, NULL);
+ }
+ complete(&mtx->done);
+ ww_mutex_unlock(&mtx->mutex);
+}
+
+static int __test_mutex(unsigned flags)
+{
+ struct test_mutex mtx;
+ int ret;
+
+ ww_mutex_init(&mtx.mutex, &ww_class);
+ INIT_WORK_ONSTACK(&mtx.work, test_mutex_work);
+ init_completion(&mtx.ready);
+ init_completion(&mtx.go);
+ init_completion(&mtx.done);
+ mtx.flags = flags;
+
+ schedule_work(&mtx.work);
+
+ wait_for_completion(&mtx.ready);
+ ww_mutex_lock(&mtx.mutex, NULL);
+ complete(&mtx.go);
+ if (flags & TEST_AB_SPIN) {
+ unsigned long timeout = jiffies + HZ;
+
+ ret = 0;
+ do {
+ if (completion_done(&mtx.done)) {
+ ret = -EINVAL;
+ break;
+ }
+ cpu_relax();
+ } while (time_before(jiffies, timeout));
+ } else {
+ ret = wait_for_completion_timeout(&mtx.done, HZ);
+ }
+ ww_mutex_unlock(&mtx.mutex);
+
+ if (ret) {
+ pr_err("%s(flags=%x): mutual exclusion failure\n",
+ __func__, flags);
+ ret = -EINVAL;
+ }
+
+ flush_work(&mtx.work);
+ destroy_work_on_stack(&mtx.work);
+ return ret;
+}
+
+static int test_mutex(void)
+{
+ unsigned int modes[] = {
+ 0,
+ TEST_AB_SPIN,
+ TEST_AB_TRY,
+ TEST_AB_SPIN | TEST_AB_TRY,
+ };
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(modes); i++) {
+ int ret;
+
+ ret = __test_mutex(modes[i]);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static int __init test_ww_mutex_init(void)
+{
+ int ret;
+
+ ret = test_mutex();
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static void __exit test_ww_mutex_exit(void)
+{
+}
+
+module_init(test_ww_mutex_init);
+module_exit(test_ww_mutex_exit);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a6c8db1d62f6..5ffe7fd34c50 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1161,6 +1161,16 @@ config LOCK_TORTURE_TEST
Say M if you want these torture tests to build as a module.
Say N if you are unsure.

+config WW_MUTEX_SELFTEST
+ tristate "Wait/wound mutex selftests"
+ select DEBUG_WW_MUTEX_SLOWPATH
+ help
+ This option provides a kernel module that runs tests on the
+ on the struct ww_mutex locking API.
+
+ Say M if you want these self tests to build as a module.
+ Say N if you are unsure.
+
endmenu # lock debugging

config TRACE_IRQFLAGS
--
2.10.2

2016-11-30 00:36:12

by Chris Wilson

[permalink] [raw]

Subject: [PATCH 2/4] locking: Add kselftests for ww_mutex AA deadlock detection

Signed-off-by: Chris Wilson <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Nicolai Hähnle <[email protected]>
---
kernel/locking/test-ww_mutex.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index e94b807e06c2..02a4bacf8aac 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -118,6 +118,41 @@ static int test_mutex(void)
return 0;
}

+static int test_aa(void)
+{
+ struct ww_mutex mutex;
+ struct ww_acquire_ctx ctx;
+ int ret;
+
+ ww_mutex_init(&mutex, &ww_class);
+ ww_acquire_init(&ctx, &ww_class);
+
+ ww_mutex_lock(&mutex, &ctx);
+
+ if (ww_mutex_trylock(&mutex)) {
+ pr_err("%s: trylocked itself!\n", __func__);
+ ww_mutex_unlock(&mutex);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = ww_mutex_lock(&mutex, &ctx);
+ if (ret != -EALREADY) {
+ pr_err("%s: missed deadlock for recursing, ret=%d\n",
+ __func__, ret);
+ if (!ret)
+ ww_mutex_unlock(&mutex);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = 0;
+out:
+ ww_mutex_unlock(&mutex);
+ ww_acquire_fini(&ctx);
+ return ret;
+}
+
static int __init test_ww_mutex_init(void)
{
int ret;
@@ -126,6 +161,10 @@ static int __init test_ww_mutex_init(void)
if (ret)
return ret;

+ ret = test_aa();
+ if (ret)
+ return ret;
+
return 0;
}

--
2.10.2

2016-11-30 00:36:21

by Chris Wilson

[permalink] [raw]

Subject: [PATCH 4/4] locking: Add kselftests for ww_mutex stress

Signed-off-by: Chris Wilson <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Nicolai Hähnle <[email protected]>
---
kernel/locking/test-ww_mutex.c | 134 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 134 insertions(+)

diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 63a5031de138..c367014f62dc 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -21,6 +21,9 @@
#include <linux/kthread.h>
#include <linux/ww_mutex.h>
#include <linux/completion.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/delay.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Intel Corporation");
@@ -224,6 +227,129 @@ static int test_abba(void)
return ret;
}

+struct stress {
+ struct work_struct work;
+ struct ww_mutex *locks;
+ int nlocks;
+};
+
+static int *get_random_order(int count)
+{
+ int *order;
+ int n, r, tmp;
+
+ order = kmalloc_array(count, sizeof(*order), GFP_TEMPORARY);
+ if (!order)
+ return order;
+
+ for (n = 0; n < count; n++)
+ order[n] = n;
+
+ for (n = count - 1; n > 1; n--) {
+ r = get_random_int() % (n + 1);
+ if (r != n) {
+ tmp = order[n];
+ order[n] = order[r];
+ order[r] = tmp;
+ }
+ }
+
+ return order;
+}
+
+static void stress_work(struct work_struct *work)
+{
+ struct stress *stress = container_of(work, typeof(*stress), work);
+ const int nlocks = stress->nlocks;
+ struct ww_mutex *locks = stress->locks;
+ struct ww_acquire_ctx ctx;
+ int contended = -1;
+ int *order;
+ int n, ret;
+
+ order = get_random_order(nlocks);
+ if (!order)
+ return;
+
+ ww_acquire_init(&ctx, &ww_class);
+
+retry:
+ ret = 0;
+ for (n = 0; n < nlocks; n++) {
+ if (n == contended)
+ continue;
+
+ ret = ww_mutex_lock(&locks[order[n]], &ctx);
+ if (ret < 0)
+ break;
+ }
+ if (!ret)
+ usleep_range(1000, 2000); /* dummy load */
+
+ if (contended > n)
+ ww_mutex_unlock(&locks[order[contended]]);
+ contended = n;
+ while (n--)
+ ww_mutex_unlock(&locks[order[n]]);
+
+ if (ret == -EDEADLK) {
+ ww_mutex_lock_slow(&locks[order[contended]], &ctx);
+ goto retry;
+ }
+
+ if (ret)
+ pr_err_once("ww_mutex stress test failed with %d\n", ret);
+
+ ww_acquire_fini(&ctx);
+
+ kfree(order);
+ kfree(stress);
+}
+
+static int stress(int nlocks, int count)
+{
+ struct ww_mutex *locks;
+ struct workqueue_struct *wq;
+ int ret = -ENOMEM;
+ int n;
+
+ wq = alloc_workqueue("test-ww_mutex", WQ_UNBOUND, 0);
+ if (!wq)
+ return -ENOMEM;
+
+ locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
+ if (!locks)
+ goto err;
+
+ for (n = 0; n < nlocks; n++)
+ ww_mutex_init(&locks[n], &ww_class);
+
+ for (n = 0; n < count; n++) {
+ struct stress *stress;
+
+ stress = kmalloc(sizeof(*stress), GFP_KERNEL);
+ if (!stress)
+ break;
+
+ INIT_WORK(&stress->work, stress_work);
+ stress->locks = locks;
+ stress->nlocks = nlocks;
+
+ queue_work(wq, &stress->work);
+ }
+
+ flush_workqueue(wq);
+
+ for (n = 0; n < nlocks; n++)
+ ww_mutex_destroy(&locks[n]);
+ kfree(locks);
+
+ ret = 0;
+err:
+ destroy_workqueue(wq);
+ return ret;
+}
+
static int __init test_ww_mutex_init(void)
{
int ret;
@@ -240,6 +366,14 @@ static int __init test_ww_mutex_init(void)
if (ret)
return ret;

+ ret = stress(16, 1024);
+ if (ret)
+ return ret;
+
+ ret = stress(4096, 1024);
+ if (ret)
+ return ret;
+
return 0;
}

--
2.10.2

2016-11-30 00:36:43

by Chris Wilson

[permalink] [raw]

Subject: [PATCH 3/4] locking: Add kselftests for ww_mutex ABBA deadlock detection

Signed-off-by: Chris Wilson <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Nicolai Hähnle <[email protected]>
---
kernel/locking/test-ww_mutex.c | 75 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)

diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 02a4bacf8aac..63a5031de138 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -153,6 +153,77 @@ static int test_aa(void)
return ret;
}

+struct test_abba {
+ struct work_struct work;
+ struct ww_mutex a_mutex;
+ struct ww_mutex b_mutex;
+ struct completion a_ready;
+ struct completion b_ready;
+ struct completion done;
+ int ret;
+};
+
+static void test_abba_work(struct work_struct *work)
+{
+ struct test_abba *abba = container_of(work, typeof(*abba), work);
+ struct ww_acquire_ctx ctx;
+
+ ww_acquire_init(&ctx, &ww_class);
+ ww_mutex_lock(&abba->b_mutex, &ctx);
+
+ complete(&abba->b_ready);
+ wait_for_completion(&abba->a_ready);
+ abba->ret = ww_mutex_lock(&abba->a_mutex, &ctx);
+ if (!abba->ret)
+ ww_mutex_unlock(&abba->a_mutex);
+
+ ww_mutex_unlock(&abba->b_mutex);
+ ww_acquire_fini(&ctx);
+
+ complete(&abba->done);
+}
+
+static int test_abba(void)
+{
+ struct test_abba abba;
+ struct ww_acquire_ctx ctx;
+ int ret;
+
+ ww_mutex_init(&abba.a_mutex, &ww_class);
+ ww_mutex_init(&abba.b_mutex, &ww_class);
+ INIT_WORK_ONSTACK(&abba.work, test_abba_work);
+ init_completion(&abba.a_ready);
+ init_completion(&abba.b_ready);
+ init_completion(&abba.done);
+
+ schedule_work(&abba.work);
+
+ ww_acquire_init(&ctx, &ww_class);
+ ww_mutex_lock(&abba.a_mutex, &ctx);
+ complete(&abba.a_ready);
+ wait_for_completion(&abba.b_ready);
+ ret = ww_mutex_lock(&abba.b_mutex, &ctx);
+ if (!ret)
+ ww_mutex_unlock(&abba.b_mutex);
+ ww_mutex_unlock(&abba.a_mutex);
+
+ wait_for_completion(&abba.done);
+
+ if (ret != -EDEADLK && abba.ret != -EDEADLK) {
+ pr_err("%s: missed ABBA deadlock, A ret=%d, B ret=%d\n",
+ __func__, ret, abba.ret);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = 0;
+out:
+ ww_acquire_fini(&ctx);
+ flush_work(&abba.work);
+ destroy_work_on_stack(&abba.work);
+ return ret;
+}
+
static int __init test_ww_mutex_init(void)
{
int ret;
@@ -165,6 +236,10 @@ static int __init test_ww_mutex_init(void)
if (ret)
return ret;

+ ret = test_abba();
+ if (ret)
+ return ret;
+
return 0;
}

--
2.10.2

2016-11-30 08:01:08

by Nicolai Hähnle

[permalink] [raw]

Subject: Re: [PATCH 1/4] locking: Begin kselftests for ww_mutex

On 30.11.2016 01:35, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Nicolai Hähnle <[email protected]>
> ---
> kernel/locking/Makefile | 1 +
> kernel/locking/test-ww_mutex.c | 137 +++++++++++++++++++++++++++++++++++++++++
> lib/Kconfig.debug | 10 +++
> 3 files changed, 148 insertions(+)
> create mode 100644 kernel/locking/test-ww_mutex.c
>
> diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
> index 6f88e352cd4f..760158d9d98d 100644
> --- a/kernel/locking/Makefile
> +++ b/kernel/locking/Makefile
> @@ -28,3 +28,4 @@ obj-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
> obj-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem-xadd.o
> obj-$(CONFIG_QUEUED_RWLOCKS) += qrwlock.o
> obj-$(CONFIG_LOCK_TORTURE_TEST) += locktorture.o
> +obj-$(CONFIG_WW_MUTEX_SELFTEST) += test-ww_mutex.o
> diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
> new file mode 100644
> index 000000000000..e94b807e06c2
> --- /dev/null
> +++ b/kernel/locking/test-ww_mutex.c
> @@ -0,0 +1,137 @@
> +/*
> + * Module-based API test facility for ww_mutexes
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, you can access it online at
> + * http://www.gnu.org/licenses/gpl-2.0.html.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/kthread.h>
> +#include <linux/ww_mutex.h>
> +#include <linux/completion.h>
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Intel Corporation");
> +
> +static DEFINE_WW_CLASS(ww_class);
> +
> +struct test_mutex {
> + struct work_struct work;
> + struct ww_mutex mutex;
> + struct completion ready, go, done;
> + unsigned flags;
> +#define TEST_AB_SPIN BIT(0)
> +#define TEST_AB_TRY BIT(1)
> +};

Is it common to put #defines inside structs like that? It looks odd to
me. Apart from that, patches 1-4 all make sense to me.

Thanks,
Nicolai

2016-11-30 09:40:50

by Chris Wilson

[permalink] [raw]

Subject: Re: [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes

On Mon, Nov 28, 2016 at 01:20:01PM +0100, Nicolai H?hnle wrote:
> I've included timings taken from a contention-heavy stress test to some of
> the patches. The stress test performs actual GPU operations which take a
> good chunk of the wall time, but even so, the series still manages to
> improve the wall time quite a bit.

In looking at your contention scenarios, what was the average/max list
size? Just wondering if it makes sense to use an rbtree + first_waiter
instead of a sorted list from the start.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2016-11-30 12:01:07

by Nicolai Hähnle

[permalink] [raw]

Subject: Re: [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes

On 30.11.2016 10:40, Chris Wilson wrote:
> On Mon, Nov 28, 2016 at 01:20:01PM +0100, Nicolai H?hnle wrote:
>> I've included timings taken from a contention-heavy stress test to some of
>> the patches. The stress test performs actual GPU operations which take a
>> good chunk of the wall time, but even so, the series still manages to
>> improve the wall time quite a bit.
>
> In looking at your contention scenarios, what was the average/max list
> size? Just wondering if it makes sense to use an rbtree + first_waiter
> instead of a sorted list from the start.

I haven't measured this with the new series; previously, while I was
debugging the deadlock on older kernels, I occasionally saw wait lists
of up to ~20 tasks, spit-balling the average over all the deadlock cases
I'd say the average was not more than ~5. The average _without_
deadlocks should be lower, if anything.

I saw that your test cases go quite a bit higher, but even the rather
extreme load I was testing with -- which is not quite a load from an
actual application, though it is related to one -- has 40 threads and so
a theoretical maximum of 40.

Nicolai

> -Chris
>

2016-11-30 12:20:43

by Chris Wilson

[permalink] [raw]

Subject: Re: [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes

On Wed, Nov 30, 2016 at 12:52:28PM +0100, Nicolai H?hnle wrote:
> On 30.11.2016 10:40, Chris Wilson wrote:
> >On Mon, Nov 28, 2016 at 01:20:01PM +0100, Nicolai H?hnle wrote:
> >>I've included timings taken from a contention-heavy stress test to some of
> >>the patches. The stress test performs actual GPU operations which take a
> >>good chunk of the wall time, but even so, the series still manages to
> >>improve the wall time quite a bit.
> >
> >In looking at your contention scenarios, what was the average/max list
> >size? Just wondering if it makes sense to use an rbtree + first_waiter
> >instead of a sorted list from the start.
>
> I haven't measured this with the new series; previously, while I was
> debugging the deadlock on older kernels, I occasionally saw wait
> lists of up to ~20 tasks, spit-balling the average over all the
> deadlock cases I'd say the average was not more than ~5. The average
> _without_ deadlocks should be lower, if anything.

Right, I wasn't expecting the list to be large, certainly no larger than
cores typically. On the borderline of where a more complex tree starts
to pay off.

> I saw that your test cases go quite a bit higher, but even the
> rather extreme load I was testing with -- which is not quite a load
> from an actual application, though it is related to one -- has 40
> threads and so a theoretical maximum of 40.

The stress loads were just values plucked out of nowhere to try and have
a reasonable stab at hitting the deadlock. Certainly if we were to wrap
that up in a microbenchmark we would want to have wider coverage (so the
graph against contention is more useful).

Do you have a branch I can pull the patches for (or what did you use as
the base)?
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2016-11-30 12:29:51

by Maarten Lankhorst

[permalink] [raw]

Subject: Re: [PATCH 4/4] locking: Add kselftests for ww_mutex stress

Op 30-11-16 om 01:35 schreef Chris Wilson:
> Signed-off-by: Chris Wilson <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Nicolai Hähnle <[email protected]>
> ---
> kernel/locking/test-ww_mutex.c | 134 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 134 insertions(+)
>
> diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
> index 63a5031de138..c367014f62dc 100644
> --- a/kernel/locking/test-ww_mutex.c
> +++ b/kernel/locking/test-ww_mutex.c
> @@ -21,6 +21,9 @@
> #include <linux/kthread.h>
> #include <linux/ww_mutex.h>
> #include <linux/completion.h>
> +#include <linux/random.h>
> +#include <linux/slab.h>
> +#include <linux/delay.h>
>
> MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Intel Corporation");
> @@ -224,6 +227,129 @@ static int test_abba(void)
> return ret;
> }
>
> +struct stress {
> + struct work_struct work;
> + struct ww_mutex *locks;
> + int nlocks;
> +};
> +
> +static int *get_random_order(int count)
> +{
> + int *order;
> + int n, r, tmp;
> +
> + order = kmalloc_array(count, sizeof(*order), GFP_TEMPORARY);
> + if (!order)
> + return order;
> +
> + for (n = 0; n < count; n++)
> + order[n] = n;
> +
> + for (n = count - 1; n > 1; n--) {
> + r = get_random_int() % (n + 1);
> + if (r != n) {
> + tmp = order[n];
> + order[n] = order[r];
> + order[r] = tmp;
> + }
> + }
> +
> + return order;
> +}
> +
> +static void stress_work(struct work_struct *work)
> +{
> + struct stress *stress = container_of(work, typeof(*stress), work);
> + const int nlocks = stress->nlocks;
> + struct ww_mutex *locks = stress->locks;
> + struct ww_acquire_ctx ctx;
> + int contended = -1;
> + int *order;
> + int n, ret;
> +
> + order = get_random_order(nlocks);
> + if (!order)
> + return;
> +
> + ww_acquire_init(&ctx, &ww_class);
> +
> +retry:
> + ret = 0;
> + for (n = 0; n < nlocks; n++) {
> + if (n == contended)
> + continue;
> +
> + ret = ww_mutex_lock(&locks[order[n]], &ctx);
> + if (ret < 0)
> + break;
> + }
What's wrong with attempting to lock the contended lock here?
Who knows, this might find some more bugs than the functional tests already do.
> + if (!ret)
> + usleep_range(1000, 2000); /* dummy load */
> +
> + if (contended > n)
> + ww_mutex_unlock(&locks[order[contended]]);
> + contended = n;
> + while (n--)
> + ww_mutex_unlock(&locks[order[n]]);
> +
> + if (ret == -EDEADLK) {
> + ww_mutex_lock_slow(&locks[order[contended]], &ctx);
> + goto retry;
> + }
> +
> + if (ret)
> + pr_err_once("ww_mutex stress test failed with %d\n", ret);
> +
> + ww_acquire_fini(&ctx);
> +
> + kfree(order);
> + kfree(stress);
> +}
> +
> +static int stress(int nlocks, int count)
> +{
> + struct ww_mutex *locks;
> + struct workqueue_struct *wq;
> + int ret = -ENOMEM;
> + int n;
> +
> + wq = alloc_workqueue("test-ww_mutex", WQ_UNBOUND, 0);
> + if (!wq)
> + return -ENOMEM;
> +
> + locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
> + if (!locks)
> + goto err;
> +
> + for (n = 0; n < nlocks; n++)
> + ww_mutex_init(&locks[n], &ww_class);
> +
> + for (n = 0; n < count; n++) {
> + struct stress *stress;
> +
> + stress = kmalloc(sizeof(*stress), GFP_KERNEL);
> + if (!stress)
> + break;
> +
> + INIT_WORK(&stress->work, stress_work);
> + stress->locks = locks;
> + stress->nlocks = nlocks;
> +
> + queue_work(wq, &stress->work);
> + }
> +
> + flush_workqueue(wq);
> +
> + for (n = 0; n < nlocks; n++)
> + ww_mutex_destroy(&locks[n]);
> + kfree(locks);
> +
> + ret = 0;
> +err:
> + destroy_workqueue(wq);
> + return ret;
> +}
> +
> static int __init test_ww_mutex_init(void)
> {
> int ret;
> @@ -240,6 +366,14 @@ static int __init test_ww_mutex_init(void)
> if (ret)
> return ret;
>
> + ret = stress(16, 1024);
> + if (ret)
> + return ret;
> +
> + ret = stress(4096, 1024);
> + if (ret)
> + return ret;
> +
> return 0;
> }
>

2016-11-30 12:53:31

by Chris Wilson

[permalink] [raw]

Subject: Re: [PATCH 4/4] locking: Add kselftests for ww_mutex stress

On Wed, Nov 30, 2016 at 01:29:39PM +0100, Maarten Lankhorst wrote:
> > +static void stress_work(struct work_struct *work)
> > +{
> > + struct stress *stress = container_of(work, typeof(*stress), work);
> > + const int nlocks = stress->nlocks;
> > + struct ww_mutex *locks = stress->locks;
> > + struct ww_acquire_ctx ctx;
> > + int contended = -1;
> > + int *order;
> > + int n, ret;
> > +
> > + order = get_random_order(nlocks);
> > + if (!order)
> > + return;
> > +
> > + ww_acquire_init(&ctx, &ww_class);
> > +
> > +retry:
> > + ret = 0;
> > + for (n = 0; n < nlocks; n++) {
> > + if (n == contended)
> > + continue;
> > +
> > + ret = ww_mutex_lock(&locks[order[n]], &ctx);
> > + if (ret < 0)
> > + break;
> > + }
> What's wrong with attempting to lock the contended lock here?
> Who knows, this might find some more bugs than the functional tests already do.

I was trying to follow the guide, which was lock, backoff by unlocking
everything, slowlock the contended lock, then lock everything else.

I have now a second worker that follows the reordering method as well.
(As well as a test that slowlock after the ABBA deadlock detection
resolves the locking order.)

If you have a sketch of something else to try, I'll add it.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2016-11-30 13:41:03

by Nicolai Hähnle

[permalink] [raw]

Subject: Re: [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes

On 30.11.2016 13:20, Chris Wilson wrote:
> On Wed, Nov 30, 2016 at 12:52:28PM +0100, Nicolai H?hnle wrote:
>> On 30.11.2016 10:40, Chris Wilson wrote:
>>> On Mon, Nov 28, 2016 at 01:20:01PM +0100, Nicolai H?hnle wrote:
>>>> I've included timings taken from a contention-heavy stress test to some of
>>>> the patches. The stress test performs actual GPU operations which take a
>>>> good chunk of the wall time, but even so, the series still manages to
>>>> improve the wall time quite a bit.
>>>
>>> In looking at your contention scenarios, what was the average/max list
>>> size? Just wondering if it makes sense to use an rbtree + first_waiter
>>> instead of a sorted list from the start.
>>
>> I haven't measured this with the new series; previously, while I was
>> debugging the deadlock on older kernels, I occasionally saw wait
>> lists of up to ~20 tasks, spit-balling the average over all the
>> deadlock cases I'd say the average was not more than ~5. The average
>> _without_ deadlocks should be lower, if anything.
>
> Right, I wasn't expecting the list to be large, certainly no larger than
> cores typically. On the borderline of where a more complex tree starts
> to pay off.
>
>> I saw that your test cases go quite a bit higher, but even the
>> rather extreme load I was testing with -- which is not quite a load
>> from an actual application, though it is related to one -- has 40
>> threads and so a theoretical maximum of 40.
>
> The stress loads were just values plucked out of nowhere to try and have
> a reasonable stab at hitting the deadlock. Certainly if we were to wrap
> that up in a microbenchmark we would want to have wider coverage (so the
> graph against contention is more useful).
>
> Do you have a branch I can pull the patches for (or what did you use as
> the base)?

See git://people.freedesktop.org/~nh/linux mutex or
https://cgit.freedesktop.org/~nh/linux/log/?h=mutex.

It's based on tip/core/locking + agd5f's drm-next, the latter only
because I needed it for the test application.

Nicolai

2016-11-30 14:11:51

by Chris Wilson

[permalink] [raw]

Subject: Re: [PATCH 05/11] locking/ww_mutex: Add waiters in stamp order

On Mon, Nov 28, 2016 at 01:20:06PM +0100, Nicolai H?hnle wrote:
> From: Nicolai H?hnle <[email protected]>
>
> Add regular waiters in stamp order. Keep adding waiters that have no
> context in FIFO order and take care not to starve them.
>
> While adding our task as a waiter, back off if we detect that there is a
> waiter with a lower stamp in front of us.
>
> Make sure to call lock_contended even when we back off early.

I'm hitting
[ 86.202749] WARNING: CPU: 1 PID: 813 at ./include/linux/ww_mutex.h:292 stress_inorder_work+0x436/0x4b5 [test_ww_mutex]
[ 86.202885] DEBUG_LOCKS_WARN_ON(!ctx->contending_lock)

which if I understand correctly is due to

> +static inline int __sched
> +__ww_mutex_add_waiter(struct mutex_waiter *waiter,
> + struct mutex *lock,
> + struct ww_acquire_ctx *ww_ctx)
> +{
> + if (ww_ctx) {
> + struct mutex_waiter *cur;
> +
> + /*
> + * Add the waiter before the first waiter with a higher stamp.
> + * Waiters without a context are skipped to avoid starving
> + * them.
> + */
> + list_for_each_entry(cur, &lock->wait_list, list) {
> + if (!cur->ww_ctx)
> + continue;
> +
> + if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
> + /* Back off immediately if necessary. */
> + if (ww_ctx->acquired > 0)
> + return -EDEADLK;

not setting ww_ctx->contending_lock here.

> +
> + continue;
> + }
> +
> + list_add_tail(&waiter->list, &cur->list);
> + return 0;
> + }
> + }
> +
> + list_add_tail(&waiter->list, &lock->wait_list);
> + return 0;
> +}

--
Chris Wilson, Intel Open Source Technology Centre

2016-11-30 19:09:33

by kernel test robot

[permalink] [raw]

Subject: Re: [PATCH 1/4] locking: Begin kselftests for ww_mutex

Hi Chris,

[auto build test WARNING on tip/locking/core]
[also build test WARNING on v4.9-rc7 next-20161130]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Chris-Wilson/locking-Begin-kselftests-for-ww_mutex/20161130-221058
config: alpha-allmodconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=alpha

All warnings (new ones prefixed by >>):

warning: (WW_MUTEX_SELFTEST) selects DEBUG_WW_MUTEX_SLOWPATH which has unmet direct dependencies (DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT)

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation

Attachments:

(No filename) (1.04 kB)
.config.gz (46.52 kB)
Download all attachments