2016-12-01 14:07:03

by Nicolai Hähnle

Subject: [PATCH v2 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes

Changes to patches 1 & 5 based on feedback. I've also updated the branch
at https://cgit.freedesktop.org/~nh/linux/log/?h=mutex.

There's been the question of using a balanced tree rather than a list.
Frankly, I'd say the 99% use case doesn't need it. Also, dealing with
waiters without a context is easy in the list, but becomes trickier with a
tree. I think one can do it with an additional counter on the lock itself
to establish the FIFO order of context-less waiters, but it'd make the
code harder to follow. I doubt that it's worth it.

(original cover letter below)
The basic idea is to make sure that:

1. All waiters that have a ww_ctx appear in stamp order in the wait list.
Waiters without a ww_ctx are still supported and appear in FIFO order as
before.

2. At most one of the waiters can be in a state where it has to check for
back off (i.e., ww_ctx->acquired > 0). Technically, there are short time
windows in which more than one such waiter can be on the list, but all
but the first one are running. This happens when a new waiter with
ww_ctx->acquired > 0 adds itself at the front of the list and wakes up the
previous head of the list, and of course multiple such chained cases can
be in-flight simultaneously.

Then we only ever have to wake up one task at a time. This is _not_ always
the head of the wait list, since there may be waiters without a context. But
among waiters with a context, we only ever have to wake the first one.
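
To make the invariants concrete, a debug-only walk over the wait list could
check them roughly like this (a sketch, not part of the series;
ww_mutex_check_wait_list is a made-up name, and it assumes it lives in
kernel/locking/mutex.c so it can use __ww_mutex_stamp_after from patch 3 and
the mutex_waiter::ww_ctx field from patch 5; it also ignores the short race
window described in point 2):

/* Caller must hold lock->wait_lock. */
static void ww_mutex_check_wait_list(struct mutex *lock)
{
	struct mutex_waiter *cur;
	struct ww_acquire_ctx *prev_ctx = NULL;

	list_for_each_entry(cur, &lock->wait_list, list) {
		if (!cur->ww_ctx)
			continue;	/* context-less waiters stay in FIFO order */

		/* (1) waiters with a context appear in stamp order */
		if (prev_ctx)
			DEBUG_LOCKS_WARN_ON(__ww_mutex_stamp_after(prev_ctx,
								   cur->ww_ctx));

		/* (2) only the first waiter with a context may hold other locks */
		if (prev_ctx)
			DEBUG_LOCKS_WARN_ON(cur->ww_ctx->acquired > 0);

		prev_ctx = cur->ww_ctx;
	}
}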

To achieve all this, the series adds a new field to mutex_waiter which is
only used for the w/w lock case. As a consequence, calling mutex_lock
directly on w/w locks is now definitely incorrect. That was likely the
intention previously anyway, but grepping through the source I did find one
place that had slipped through.

I've included timings taken from a contention-heavy stress test in some of
the patches. The stress test performs actual GPU operations which take a
good chunk of the wall time, but even so, the series still manages to
improve the wall time quite a bit.

Cheers,
Nicolai


2016-12-01 14:07:11

by Nicolai Hähnle

Subject: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

From: Nicolai Hähnle <[email protected]>

Add regular waiters in stamp order. Keep adding waiters that have no
context in FIFO order and take care not to starve them.

While adding our task as a waiter, back off if we detect that there is a
waiter with a lower stamp in front of us.

Make sure to call lock_contended even when we back off early.
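
For context, the backoff a caller performs on -EDEADLK follows the usual
ww_mutex API pattern; a minimal two-lock sketch (my_obj, my_ww_class and
lock_pair are made-up names for illustration; the canonical version of this
pattern is in Documentation/locking/ww-mutex-design.txt):

static DEFINE_WW_CLASS(my_ww_class);

struct my_obj {
	struct ww_mutex lock;
	/* ... */
};

/*
 * Lock both objects (assumes first != second); the caller unlocks them
 * and calls ww_acquire_fini(ctx) when it is done.
 */
static void lock_pair(struct my_obj *first, struct my_obj *second,
		      struct ww_acquire_ctx *ctx)
{
	struct my_obj *contended = NULL;
	int ret;

	ww_acquire_init(ctx, &my_ww_class);
retry:
	if (contended) {
		/* We backed off; now sleep until the contended lock is ours. */
		ww_mutex_lock_slow(&contended->lock, ctx);
	}
	if (first != contended) {
		ret = ww_mutex_lock(&first->lock, ctx);
		if (ret == -EDEADLK) {
			/* Only reachable while holding 'second': drop it. */
			ww_mutex_unlock(&second->lock);
			contended = first;
			goto retry;
		}
	}
	if (second != contended) {
		ret = ww_mutex_lock(&second->lock, ctx);
		if (ret == -EDEADLK) {
			ww_mutex_unlock(&first->lock);
			contended = second;
			goto retry;
		}
	}
	ww_acquire_done(ctx);
}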

v2:
- rein in the indentation of __ww_mutex_add_waiter a bit
- set contending_lock in __ww_mutex_add_waiter (Chris Wilson)

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
include/linux/mutex.h | 3 ++
kernel/locking/mutex.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 83 insertions(+), 7 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index b97870f..118a3b6 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -20,6 +20,8 @@
#include <linux/osq_lock.h>
#include <linux/debug_locks.h>

+struct ww_acquire_ctx;
+
/*
* Simple, straightforward mutexes with strict semantics:
*
@@ -75,6 +77,7 @@ static inline struct task_struct *__mutex_owner(struct mutex *lock)
struct mutex_waiter {
struct list_head list;
struct task_struct *task;
+ struct ww_acquire_ctx *ww_ctx;
#ifdef CONFIG_DEBUG_MUTEXES
void *magic;
#endif
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 585627f..d63eebb 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -628,6 +628,51 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
return 0;
}

+static inline int __sched
+__ww_mutex_add_waiter(struct mutex_waiter *waiter,
+ struct mutex *lock,
+ struct ww_acquire_ctx *ww_ctx)
+{
+ struct mutex_waiter *cur;
+
+ if (!ww_ctx) {
+ list_add_tail(&waiter->list, &lock->wait_list);
+ return 0;
+ }
+
+ /*
+ * Add the waiter before the first waiter with a higher stamp.
+ * Waiters without a context are skipped to avoid starving
+ * them.
+ */
+ list_for_each_entry(cur, &lock->wait_list, list) {
+ if (!cur->ww_ctx)
+ continue;
+
+ if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
+ /* Back off immediately if necessary. */
+ if (ww_ctx->acquired > 0) {
+#ifdef CONFIG_DEBUG_MUTEXES
+ struct ww_mutex *ww;
+
+ ww = container_of(lock, struct ww_mutex, base);
+ DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
+ ww_ctx->contending_lock = ww;
+#endif
+ return -EDEADLK;
+ }
+
+ continue;
+ }
+
+ list_add_tail(&waiter->list, &cur->list);
+ return 0;
+ }
+
+ list_add_tail(&waiter->list, &lock->wait_list);
+ return 0;
+}
+
/*
* Lock a mutex (possibly interruptible), slowpath:
*/
@@ -677,15 +722,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
debug_mutex_lock_common(lock, &waiter);
debug_mutex_add_waiter(lock, &waiter, task);

- /* add waiting tasks to the end of the waitqueue (FIFO): */
- list_add_tail(&waiter.list, &lock->wait_list);
+ lock_contended(&lock->dep_map, ip);
+
+ if (!use_ww_ctx) {
+ /* add waiting tasks to the end of the waitqueue (FIFO): */
+ list_add_tail(&waiter.list, &lock->wait_list);
+ } else {
+ /* Add in stamp order, waking up waiters that must back off. */
+ ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
+ if (ret)
+ goto err_early_backoff;
+
+ waiter.ww_ctx = ww_ctx;
+ }
+
waiter.task = task;

if (__mutex_waiter_is_first(lock, &waiter))
__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);

- lock_contended(&lock->dep_map, ip);
-
set_task_state(task, state);
for (;;) {
/*
@@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* mutex_unlock() handing the lock off to us, do a trylock
* before testing the error conditions to make sure we pick up
* the handoff.
+ *
+ * For w/w locks, we always need to do this even if we're not
+ * currently the first waiter, because we may have been the
+ * first waiter during the unlock.
*/
- if (__mutex_trylock(lock, first))
+ if (__mutex_trylock(lock, use_ww_ctx || first))
goto acquired;

/*
@@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
spin_unlock_mutex(&lock->wait_lock, flags);
schedule_preempt_disabled();

- if (!first && __mutex_waiter_is_first(lock, &waiter)) {
+ if (use_ww_ctx && ww_ctx) {
+ /*
+ * Always re-check whether we're in first position. We
+ * don't want to spin if another task with a lower
+ * stamp has taken our position.
+ *
+ * We also may have to set the handoff flag again, if
+ * our position at the head was temporarily taken away.
+ */
+ first = __mutex_waiter_is_first(lock, &waiter);
+
+ if (first)
+ __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
+ } else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
first = true;
__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
}
@@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* or we must see its unlock and acquire.
*/
if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
- __mutex_trylock(lock, first))
+ __mutex_trylock(lock, use_ww_ctx || first))
break;

spin_lock_mutex(&lock->wait_lock, flags);
@@ -761,6 +833,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
err:
__set_task_state(task, TASK_RUNNING);
mutex_remove_waiter(lock, &waiter, task);
+err_early_backoff:
spin_unlock_mutex(&lock->wait_lock, flags);
debug_mutex_free_waiter(&waiter);
mutex_release(&lock->dep_map, 1, ip);
--
2.7.4

2016-12-01 14:07:07

by Nicolai Hähnle

Subject: [PATCH v2 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop

From: Nicolai Hähnle <[email protected]>

In the following scenario, thread #1 should back off its attempt to lock
ww1 and unlock ww2 (assuming the acquire context stamps are ordered
accordingly).

Thread #0                       Thread #1
---------                       ---------
                                successfully lock ww2
set ww1->base.owner
                                attempt to lock ww1
                                confirm ww1->ctx == NULL
                                enter mutex_spin_on_owner
set ww1->ctx

What was likely to happen previously is:

attempt to lock ww2
refuse to spin because
  ww2->ctx != NULL
schedule()
                                detect thread #0 is off CPU
                                stop optimistic spin
                                return -EDEADLK
                                unlock ww2
                                wakeup thread #0
lock ww2

Now, we are more likely to see:

                                detect ww1->ctx != NULL
                                stop optimistic spin
                                return -EDEADLK
                                unlock ww2
successfully lock ww2

... because thread #1 will stop its optimistic spin as soon as possible.

The whole scenario is quite unlikely, since it requires thread #1 to get
between thread #0 setting the owner and setting the ctx. But since we're
idling here anyway, the additional check is basically free.

Found by inspection.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 9b34961..0afa998 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -350,7 +350,8 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
* access and not reliable.
*/
static noinline
-bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
+ bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
{
bool ret = true;

@@ -373,6 +374,28 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
break;
}

+ if (use_ww_ctx && ww_ctx->acquired > 0) {
+ struct ww_mutex *ww;
+
+ ww = container_of(lock, struct ww_mutex, base);
+
+ /*
+ * If ww->ctx is set the contents are undefined, only
+ * by acquiring wait_lock there is a guarantee that
+ * they are not invalid when reading.
+ *
+ * As such, when deadlock detection needs to be
+ * performed the optimistic spinning cannot be done.
+ *
+ * Check this in every inner iteration because we may
+ * be racing against another thread's ww_mutex_lock.
+ */
+ if (READ_ONCE(ww->ctx)) {
+ ret = false;
+ break;
+ }
+ }
+
cpu_relax();
}
rcu_read_unlock();
@@ -460,22 +483,6 @@ static bool mutex_optimistic_spin(struct mutex *lock,
for (;;) {
struct task_struct *owner;

- if (use_ww_ctx && ww_ctx->acquired > 0) {
- struct ww_mutex *ww;
-
- ww = container_of(lock, struct ww_mutex, base);
- /*
- * If ww->ctx is set the contents are undefined, only
- * by acquiring wait_lock there is a guarantee that
- * they are not invalid when reading.
- *
- * As such, when deadlock detection needs to be
- * performed the optimistic spinning cannot be done.
- */
- if (READ_ONCE(ww->ctx))
- goto fail_unlock;
- }
-
/*
* If there's an owner, wait for it to either
* release the lock or go to sleep.
@@ -487,7 +494,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
break;
}

- if (!mutex_spin_on_owner(lock, owner))
+ if (!mutex_spin_on_owner(lock, owner, use_ww_ctx,
+ ww_ctx))
goto fail_unlock;
}

--
2.7.4

2016-12-01 14:07:15

by Nicolai Hähnle

Subject: [PATCH v2 08/11] locking/ww_mutex: Yield to other waiters from optimistic spin

From: Nicolai Hähnle <[email protected]>

Lock stealing is less beneficial for w/w mutexes since we may just end up
backing off if we stole from a thread with an earlier acquire stamp that
already holds another w/w mutex that we also need. So don't spin
optimistically unless we are sure that there is no other waiter that might
cause us to back off.

Median timings taken of a contention-heavy GPU workload:

Before:
real 0m52.946s
user 0m7.272s
sys 1m55.964s

After:
real 0m53.086s
user 0m7.360s
sys 1m46.204s

This particular workload still spends 20%-25% of CPU in mutex_spin_on_owner
according to perf, but my attempts to further reduce this spinning based on
various heuristics all led to an increase in measured wall time despite
the decrease in sys time.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 48 ++++++++++++++++++++++++++++++++++++++----------
1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index d2ca447..296605c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -374,7 +374,8 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
*/
static noinline
bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
- bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
+ bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx,
+ struct mutex_waiter *waiter)
{
bool ret = true;

@@ -397,7 +398,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
break;
}

- if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
+ if (use_ww_ctx && ww_ctx) {
struct ww_mutex *ww;

ww = container_of(lock, struct ww_mutex, base);
@@ -413,7 +414,30 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
* Check this in every inner iteration because we may
* be racing against another thread's ww_mutex_lock.
*/
- if (READ_ONCE(ww->ctx)) {
+ if (ww_ctx->acquired > 0 && READ_ONCE(ww->ctx)) {
+ ret = false;
+ break;
+ }
+
+ /*
+ * If we aren't on the wait list yet, cancel the spin
+ * if there are waiters. We want to avoid stealing the
+ * lock from a waiter with an earlier stamp, since the
+ * other thread may already own a lock that we also
+ * need.
+ */
+ if (!waiter &&
+ (atomic_long_read(&lock->owner) &
+ MUTEX_FLAG_WAITERS)) {
+ ret = false;
+ break;
+ }
+
+ /*
+ * Similarly, stop spinning if we are no longer the
+ * first waiter.
+ */
+ if (waiter && !__mutex_waiter_is_first(lock, waiter)) {
ret = false;
break;
}
@@ -479,7 +503,8 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
*/
static bool mutex_optimistic_spin(struct mutex *lock,
struct ww_acquire_ctx *ww_ctx,
- const bool use_ww_ctx, const bool waiter)
+ const bool use_ww_ctx,
+ struct mutex_waiter *waiter)
{
struct task_struct *task = current;

@@ -518,12 +543,12 @@ static bool mutex_optimistic_spin(struct mutex *lock,
}

if (!mutex_spin_on_owner(lock, owner, use_ww_ctx,
- ww_ctx))
+ ww_ctx, waiter))
goto fail_unlock;
}

/* Try to acquire the mutex if it is unlocked. */
- if (__mutex_trylock(lock, waiter))
+ if (__mutex_trylock(lock, waiter != NULL))
break;

/*
@@ -565,7 +590,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
#else
static bool mutex_optimistic_spin(struct mutex *lock,
struct ww_acquire_ctx *ww_ctx,
- const bool use_ww_ctx, const bool waiter)
+ const bool use_ww_ctx,
+ struct mutex_waiter *waiter)
{
return false;
}
@@ -736,7 +762,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);

if (__mutex_trylock(lock, false) ||
- mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, false)) {
+ mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, NULL)) {
/* got the lock, yay! */
lock_acquired(&lock->dep_map, ip);
if (use_ww_ctx && ww_ctx) {
@@ -841,8 +867,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* state back to RUNNING and fall through the next schedule(),
* or we must see its unlock and acquire.
*/
- if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
- __mutex_trylock(lock, use_ww_ctx || first))
+ if ((first &&
+ mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
+ &waiter)) ||
+ __mutex_trylock(lock, use_ww_ctx || first))
break;

spin_lock_mutex(&lock->wait_lock, flags);
--
2.7.4

2016-12-01 14:07:22

by Nicolai Hähnle

Subject: [PATCH v2 06/11] locking/ww_mutex: Notify waiters that have to back off while adding tasks to wait list

From: Nicolai Hähnle <[email protected]>

While adding our task as a waiter, detect if another task should back off
because of us.

With this patch, we establish the invariant that the wait list contains
at most one (sleeping) waiter with ww_ctx->acquired > 0, and this waiter
will be the first waiter with a context.

Since only waiters with ww_ctx->acquired > 0 have to back off, this allows
us to be much more economical with wakeups.

v2: rebase on v2 of earlier patches

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 42 ++++++++++++++++++++++++++++++++----------
1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index d63eebb..01e9438 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -609,23 +609,34 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
EXPORT_SYMBOL(ww_mutex_unlock);

static inline int __sched
-__ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
+__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
+ struct ww_acquire_ctx *ctx)
{
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
+ struct mutex_waiter *cur;

- if (!hold_ctx)
- return 0;
+ if (hold_ctx && __ww_mutex_stamp_after(ctx, hold_ctx))
+ goto deadlock;

- if (__ww_mutex_stamp_after(ctx, hold_ctx)) {
-#ifdef CONFIG_DEBUG_MUTEXES
- DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
- ctx->contending_lock = ww;
-#endif
- return -EDEADLK;
+ /*
+ * If there is a waiter in front of us that has a context, then its
+ * stamp is earlier than ours and we must back off.
+ */
+ cur = waiter;
+ list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
+ if (cur->ww_ctx)
+ goto deadlock;
}

return 0;
+
+deadlock:
+#ifdef CONFIG_DEBUG_MUTEXES
+ DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
+ ctx->contending_lock = ww;
+#endif
+ return -EDEADLK;
}

static inline int __sched
@@ -666,6 +677,16 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
}

list_add_tail(&waiter->list, &cur->list);
+
+ /*
+ * Wake up the waiter so that it gets a chance to back
+ * off.
+ */
+ if (cur->ww_ctx->acquired > 0) {
+ debug_mutex_wake_waiter(lock, cur);
+ wake_up_process(cur->task);
+ }
+
return 0;
}

@@ -767,7 +788,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
}

if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
- ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
+ ret = __ww_mutex_lock_check_stamp(lock, &waiter,
+ ww_ctx);
if (ret)
goto err;
}
--
2.7.4

2016-12-01 14:07:19

by Nicolai Hähnle

Subject: [PATCH v2 11/11] [rfc] locking/ww_mutex: Always spin optimistically for the first waiter

From: Nicolai Hähnle <[email protected]>

Check the current owner's context once against our stamp. If our stamp is
lower, we continue to spin optimistically instead of backing off.

This is correct with respect to deadlock detection because while the
(owner, ww_ctx) pair may re-appear if the owner task manages to unlock
and re-acquire the lock while we're spinning, the context could only have
been re-initialized with an even higher stamp. We also still detect when
we have to back off for other waiters that join the list while we're
spinning.
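
(For reference, the stamp monotonicity this relies on comes from
ww_acquire_init(); roughly, and abridged, at the time of this series:)

/* include/linux/ww_mutex.h, abridged */
static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
				   struct ww_class *ww_class)
{
	ctx->task = current;
	/* every (re-)initialization draws a strictly newer per-class stamp */
	ctx->stamp = atomic_long_inc_return(&ww_class->stamp);
	ctx->acquired = 0;
	/* ... debug/lockdep initialization elided ... */
}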

But taking the wait_lock in mutex_spin_on_owner feels iffy, even if it is
done only once.

Median timings taken of a contention-heavy GPU workload:

Before:
real 0m53.086s
user 0m7.360s
sys 1m46.204s

After:
real 0m52.577s
user 0m7.544s
sys 1m49.200s

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 35 ++++++++++++++++++++++++++++++++---
1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 38d173c..9216239 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -378,6 +378,28 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
struct mutex_waiter *waiter)
{
bool ret = true;
+ struct ww_acquire_ctx *owner_ww_ctx = NULL;
+
+ if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
+ struct ww_mutex *ww;
+ unsigned long flags;
+
+ ww = container_of(lock, struct ww_mutex, base);
+
+ /*
+ * Check the stamp of the current owner once. This allows us
+ * to spin optimistically in the case where the current owner
+ * has a higher stamp than us.
+ */
+ spin_lock_mutex(&lock->wait_lock, flags);
+ owner_ww_ctx = ww->ctx;
+ if (owner_ww_ctx &&
+ __ww_mutex_stamp_after(ww_ctx, owner_ww_ctx)) {
+ spin_unlock_mutex(&lock->wait_lock, flags);
+ return false;
+ }
+ spin_unlock_mutex(&lock->wait_lock, flags);
+ }

rcu_read_lock();
while (__mutex_owner(lock) == owner) {
@@ -414,9 +436,16 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
* Check this in every inner iteration because we may
* be racing against another thread's ww_mutex_lock.
*/
- if (ww_ctx->acquired > 0 && READ_ONCE(ww->ctx)) {
- ret = false;
- break;
+ if (ww_ctx->acquired > 0) {
+ struct ww_acquire_ctx *current_ctx;
+
+ current_ctx = READ_ONCE(ww->ctx);
+
+ if (current_ctx &&
+ current_ctx != owner_ww_ctx) {
+ ret = false;
+ break;
+ }
}

/*
--
2.7.4

2016-12-01 14:07:46

by Nicolai Hähnle

Subject: [PATCH v2 10/11] Documentation/locking/ww_mutex: Update the design document

From: Nicolai Hähnle <[email protected]>

Document the invariants we maintain for the wait list of ww_mutexes.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
Documentation/locking/ww-mutex-design.txt | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt
index 8a112dc..34c3a1b 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -309,11 +309,15 @@ Design:
normal mutex locks, which are far more common. As such there is only a small
increase in code size if wait/wound mutexes are not used.

+ We maintain the following invariants for the wait list:
+ (1) Waiters with an acquire context are sorted by stamp order; waiters
+ without an acquire context are interspersed in FIFO order.
+ (2) Among waiters with contexts, only the first one can have other locks
+ acquired already (ctx->acquired > 0). Note that this waiter may come
+ after other waiters without contexts in the list.
+
In general, not much contention is expected. The locks are typically used to
- serialize access to resources for devices. The only way to make wakeups
- smarter would be at the cost of adding a field to struct mutex_waiter. This
- would add overhead to all cases where normal mutexes are used, and
- ww_mutexes are generally less performance sensitive.
+ serialize access to resources for devices.

Lockdep:
Special care has been taken to warn for as many cases of api abuse
--
2.7.4

2016-12-01 14:08:09

by Nicolai Hähnle

Subject: [PATCH v2 09/11] locking/mutex: Initialize mutex_waiter::ww_ctx with poison when debugging

From: Nicolai Hähnle <[email protected]>

Help catch cases where mutex_lock is used directly on w/w mutexes, which
otherwise result in the w/w tasks reading uninitialized data.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
include/linux/poison.h | 1 +
kernel/locking/mutex.c | 4 ++++
2 files changed, 5 insertions(+)

diff --git a/include/linux/poison.h b/include/linux/poison.h
index 51334ed..a395403 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -80,6 +80,7 @@
/********** kernel/mutexes **********/
#define MUTEX_DEBUG_INIT 0x11
#define MUTEX_DEBUG_FREE 0x22
+#define MUTEX_POISON_WW_CTX ((void *) 0x500 + POISON_POINTER_DELTA)

/********** lib/flex_array.c **********/
#define FLEX_ARRAY_FREE 0x6c /* for use-after-free poisoning */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 296605c..38d173c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -794,6 +794,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
if (!use_ww_ctx) {
/* add waiting tasks to the end of the waitqueue (FIFO): */
list_add_tail(&waiter.list, &lock->wait_list);
+
+#ifdef CONFIG_DEBUG_MUTEXES
+ waiter.ww_ctx = MUTEX_POISON_WW_CTX;
+#endif
} else {
/* Add in stamp order, waking up waiters that must back off. */
ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
--
2.7.4

2016-12-01 14:08:26

by Nicolai Hähnle

Subject: [PATCH v2 07/11] locking/ww_mutex: Wake at most one waiter for back off when acquiring the lock

From: Nicolai Hähnle <[email protected]>

The wait list is sorted by stamp order, and the only waiting task that may
have to back off is the first waiter with a context.

The regular slow path does not have to wake any other tasks at all, since
all other waiters that would have to back off were either woken up when
the waiter was added to the list, or detected the condition before they
added themselves.

Median timings taken of a contention-heavy GPU workload:

Without this series:
real 0m59.900s
user 0m7.516s
sys 2m16.076s

With changes up to and including this patch:
real 0m52.946s
user 0m7.272s
sys 1m55.964s

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 58 +++++++++++++++++++++++++++++++++-----------------
1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 01e9438..d2ca447 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -285,6 +285,35 @@ __ww_mutex_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
}

/*
+ * Wake up any waiters that may have to back off when the lock is held by the
+ * given context.
+ *
+ * Due to the invariants on the wait list, this can only affect the first
+ * waiter with a context.
+ *
+ * Must be called with wait_lock held. The current task must not be on the
+ * wait list.
+ */
+static void __sched
+__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+{
+ struct mutex_waiter *cur;
+
+ list_for_each_entry(cur, &lock->wait_list, list) {
+ if (!cur->ww_ctx)
+ continue;
+
+ if (cur->ww_ctx->acquired > 0 &&
+ __ww_mutex_stamp_after(cur->ww_ctx, ww_ctx)) {
+ debug_mutex_wake_waiter(lock, cur);
+ wake_up_process(cur->task);
+ }
+
+ break;
+ }
+}
+
+/*
* After acquiring lock with fastpath or when we lost out in contested
* slowpath, set ctx and wake up any waiters so they can recheck.
*/
@@ -293,7 +322,6 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx)
{
unsigned long flags;
- struct mutex_waiter *cur;

ww_mutex_lock_acquired(lock, ctx);

@@ -319,16 +347,15 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
* so they can see the new lock->ctx.
*/
spin_lock_mutex(&lock->base.wait_lock, flags);
- list_for_each_entry(cur, &lock->base.wait_list, list) {
- debug_mutex_wake_waiter(&lock->base, cur);
- wake_up_process(cur->task);
- }
+ __ww_mutex_wakeup_for_backoff(&lock->base, ctx);
spin_unlock_mutex(&lock->base.wait_lock, flags);
}

/*
- * After acquiring lock in the slowpath set ctx and wake up any
- * waiters so they can recheck.
+ * After acquiring lock in the slowpath set ctx.
+ *
+ * Unlike for the fast path, the caller ensures that waiters are woken up where
+ * necessary.
*
* Callers must hold the mutex wait_lock.
*/
@@ -336,19 +363,8 @@ static __always_inline void
ww_mutex_set_context_slowpath(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx)
{
- struct mutex_waiter *cur;
-
ww_mutex_lock_acquired(lock, ctx);
lock->ctx = ctx;
-
- /*
- * Give any possible sleeping processes the chance to wake up,
- * so they can recheck if they have to back off.
- */
- list_for_each_entry(cur, &lock->base.wait_list, list) {
- debug_mutex_wake_waiter(&lock->base, cur);
- wake_up_process(cur->task);
- }
}

#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
@@ -737,8 +753,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
/*
* After waiting to acquire the wait_lock, try again.
*/
- if (__mutex_trylock(lock, false))
+ if (__mutex_trylock(lock, false)) {
+ if (use_ww_ctx && ww_ctx)
+ __ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+
goto skip_wait;
+ }

debug_mutex_lock_common(lock, &waiter);
debug_mutex_add_waiter(lock, &waiter, task);
--
2.7.4

2016-12-01 14:08:47

by Nicolai Hähnle

Subject: [PATCH v2 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context

From: Nicolai Hähnle <[email protected]>

We will add a new field to struct mutex_waiter. This field must be
initialized for all waiters if any waiter uses the use_ww_ctx path.

So there is a trade-off: Keep ww_mutex locking without a context on the
faster non-use_ww_ctx path, at the cost of adding the initialization to all
mutex locks (including non-ww_mutexes), or avoid the additional cost for
non-ww_mutex locks, at the cost of adding additional checks to the
use_ww_ctx path.

We take the latter choice. It may be worth eliminating the users of
ww_mutex_lock(lock, NULL), but there are a lot of them.

Move the declaration of ww in __mutex_lock_common closer to its uses
because gcc otherwise (incorrectly) believes ww may be used without
initialization.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
include/linux/ww_mutex.h | 11 ++---------
kernel/locking/mutex.c | 37 +++++++++++++++++++++++++------------
2 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 2bb5deb..a5960e5 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -222,11 +222,7 @@ extern int __must_check __ww_mutex_lock_interruptible(struct ww_mutex *lock,
*/
static inline int ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
{
- if (ctx)
- return __ww_mutex_lock(lock, ctx);
-
- mutex_lock(&lock->base);
- return 0;
+ return __ww_mutex_lock(lock, ctx);
}

/**
@@ -262,10 +258,7 @@ static inline int ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ct
static inline int __must_check ww_mutex_lock_interruptible(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx)
{
- if (ctx)
- return __ww_mutex_lock_interruptible(lock, ctx);
- else
- return mutex_lock_interruptible(&lock->base);
+ return __ww_mutex_lock_interruptible(lock, ctx);
}

/**
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 200629a..585627f 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -381,7 +381,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
break;
}

- if (use_ww_ctx && ww_ctx->acquired > 0) {
+ if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
struct ww_mutex *ww;

ww = container_of(lock, struct ww_mutex, base);
@@ -640,10 +640,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
struct mutex_waiter waiter;
unsigned long flags;
bool first = false;
- struct ww_mutex *ww;
int ret;

- if (use_ww_ctx) {
+ if (use_ww_ctx && ww_ctx) {
+ struct ww_mutex *ww;
+
ww = container_of(lock, struct ww_mutex, base);
if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
return -EALREADY;
@@ -656,8 +657,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, false)) {
/* got the lock, yay! */
lock_acquired(&lock->dep_map, ip);
- if (use_ww_ctx)
+ if (use_ww_ctx && ww_ctx) {
+ struct ww_mutex *ww;
+
+ ww = container_of(lock, struct ww_mutex, base);
ww_mutex_set_context_fastpath(ww, ww_ctx);
+ }
preempt_enable();
return 0;
}
@@ -702,7 +707,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
goto err;
}

- if (use_ww_ctx && ww_ctx->acquired > 0) {
+ if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
if (ret)
goto err;
@@ -742,8 +747,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
/* got the lock - cleanup and rejoice! */
lock_acquired(&lock->dep_map, ip);

- if (use_ww_ctx)
+ if (use_ww_ctx && ww_ctx) {
+ struct ww_mutex *ww;
+
+ ww = container_of(lock, struct ww_mutex, base);
ww_mutex_set_context_slowpath(ww, ww_ctx);
+ }

spin_unlock_mutex(&lock->wait_lock, flags);
preempt_enable();
@@ -830,8 +839,9 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)

might_sleep();
ret = __mutex_lock_common(&lock->base, TASK_UNINTERRUPTIBLE,
- 0, &ctx->dep_map, _RET_IP_, ctx, 1);
- if (!ret && ctx->acquired > 1)
+ 0, ctx ? &ctx->dep_map : NULL, _RET_IP_,
+ ctx, 1);
+ if (!ret && ctx && ctx->acquired > 1)
return ww_mutex_deadlock_injection(lock, ctx);

return ret;
@@ -845,9 +855,10 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)

might_sleep();
ret = __mutex_lock_common(&lock->base, TASK_INTERRUPTIBLE,
- 0, &ctx->dep_map, _RET_IP_, ctx, 1);
+ 0, ctx ? &ctx->dep_map : NULL, _RET_IP_,
+ ctx, 1);

- if (!ret && ctx->acquired > 1)
+ if (!ret && ctx && ctx->acquired > 1)
return ww_mutex_deadlock_injection(lock, ctx);

return ret;
@@ -1034,7 +1045,8 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
might_sleep();

if (__mutex_trylock_fast(&lock->base)) {
- ww_mutex_set_context_fastpath(lock, ctx);
+ if (ctx)
+ ww_mutex_set_context_fastpath(lock, ctx);
return 0;
}

@@ -1048,7 +1060,8 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
might_sleep();

if (__mutex_trylock_fast(&lock->base)) {
- ww_mutex_set_context_fastpath(lock, ctx);
+ if (ctx)
+ ww_mutex_set_context_fastpath(lock, ctx);
return 0;
}

--
2.7.4

2016-12-01 14:09:06

by Nicolai Hähnle

Subject: [PATCH v2 03/11] locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after

From: Nicolai Hähnle <[email protected]>

The function will be re-used in subsequent patches.
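
For context, the comparison being extracted is wraparound-safe: a is treated
as "after" b whenever the unsigned difference a->stamp - b->stamp is at most
LONG_MAX, with a pointer comparison as a tie-break for equal stamps. A small
userspace sketch of the same arithmetic (illustration only, not kernel code;
the pointer tie-break is omitted):

#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* same arithmetic as __ww_mutex_stamp_after(), minus the pointer tie-break */
static bool stamp_after(unsigned long a, unsigned long b)
{
	return a - b <= LONG_MAX && a != b;
}

int main(void)
{
	assert( stamp_after(100, 42));			/* plain case */
	assert(!stamp_after(42, 100));
	/* b issued just before the counter wrapped, a just after: still correct */
	assert( stamp_after(5, ULONG_MAX - 2));
	assert(!stamp_after(ULONG_MAX - 2, 5));
	return 0;
}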

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
kernel/locking/mutex.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0afa998..200629a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -277,6 +277,13 @@ static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
ww_ctx->acquired++;
}

+static inline bool __sched
+__ww_mutex_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
+{
+ return a->stamp - b->stamp <= LONG_MAX &&
+ (a->stamp != b->stamp || a > b);
+}
+
/*
* After acquiring lock with fastpath or when we lost out in contested
* slowpath, set ctx and wake up any waiters so they can recheck.
@@ -610,8 +617,7 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
if (!hold_ctx)
return 0;

- if (ctx->stamp - hold_ctx->stamp <= LONG_MAX &&
- (ctx->stamp != hold_ctx->stamp || ctx > hold_ctx)) {
+ if (__ww_mutex_stamp_after(ctx, hold_ctx)) {
#ifdef CONFIG_DEBUG_MUTEXES
DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
ctx->contending_lock = ww;
--
2.7.4

2016-12-01 14:09:24

by Nicolai Hähnle

Subject: [PATCH v2 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context

From: Nicolai Hähnle <[email protected]>

v2: use resv->lock instead of resv->lock.base (Christian König)

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: [email protected]
Signed-off-by: Nicolai Hähnle <[email protected]>
---
drivers/gpu/drm/vgem/vgem_fence.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c
index 488909a..9cb00a5 100644
--- a/drivers/gpu/drm/vgem/vgem_fence.c
+++ b/drivers/gpu/drm/vgem/vgem_fence.c
@@ -191,12 +191,12 @@ int vgem_fence_attach_ioctl(struct drm_device *dev,

/* Expose the fence via the dma-buf */
ret = 0;
- mutex_lock(&resv->lock.base);
+ ww_mutex_lock(&resv->lock, NULL);
if (arg->flags & VGEM_FENCE_WRITE)
reservation_object_add_excl_fence(resv, fence);
else if ((ret = reservation_object_reserve_shared(resv)) == 0)
reservation_object_add_shared_fence(resv, fence);
- mutex_unlock(&resv->lock.base);
+ ww_mutex_unlock(&resv->lock);

/* Record the fence in our idr for later signaling */
if (ret == 0) {
--
2.7.4

2016-12-01 14:19:07

by Chris Wilson

Subject: Re: [PATCH v2 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context

On Thu, Dec 01, 2016 at 03:06:44PM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <[email protected]>
>
> v2: use resv->lock instead of resv->lock.base (Christian König)
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Chris Wilson <[email protected]>
> Cc: [email protected]
> Signed-off-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2016-12-01 14:36:53

by Chris Wilson

Subject: Re: [PATCH v2 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop

On Thu, Dec 01, 2016 at 03:06:45PM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <[email protected]>
>
> In the following scenario, thread #1 should back off its attempt to lock
> ww1 and unlock ww2 (assuming the acquire context stamps are ordered
> accordingly).
>
> Thread #0                       Thread #1
> ---------                       ---------
>                                 successfully lock ww2
> set ww1->base.owner
>                                 attempt to lock ww1
>                                 confirm ww1->ctx == NULL
>                                 enter mutex_spin_on_owner
> set ww1->ctx
>
> What was likely to happen previously is:
>
> attempt to lock ww2
> refuse to spin because
>   ww2->ctx != NULL
> schedule()
>                                 detect thread #0 is off CPU
>                                 stop optimistic spin
>                                 return -EDEADLK
>                                 unlock ww2
>                                 wakeup thread #0
> lock ww2
>
> Now, we are more likely to see:
>
>                                 detect ww1->ctx != NULL
>                                 stop optimistic spin
>                                 return -EDEADLK
>                                 unlock ww2
> successfully lock ww2
>
> ... because thread #1 will stop its optimistic spin as soon as possible.
>
> The whole scenario is quite unlikely, since it requires thread #1 to get
> between thread #0 setting the owner and setting the ctx. But since we're
> idling here anyway, the additional check is basically free.
>
> Found by inspection.

A similar question can be raised for can_spin_on_owner() as well. Is it
worth it for a contending ww_mutex to enter the osq queue if we expect an
EDEADLK? It seems to boil down to how likely the EDEADLK is to evaporate
if we wait for the owner to finish and unlock.

The patch looks reasonable, just a question of desirability.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2016-12-01 14:42:58

by Chris Wilson

Subject: Re: [PATCH v2 03/11] locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after

On Thu, Dec 01, 2016 at 03:06:46PM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <[email protected]>
>
> The function will be re-used in subsequent patches.
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Chris Wilson <[email protected]>
> Cc: [email protected]
> Signed-off-by: Nicolai Hähnle <[email protected]>
> ---
> kernel/locking/mutex.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0afa998..200629a 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -277,6 +277,13 @@ static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
> ww_ctx->acquired++;
> }
>
> +static inline bool __sched
> +__ww_mutex_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)

Should it be ww_mutex_stamp or ww_acquire_stamp / ww_ctx_stamp?

Nothing else operates on the ww_acquire_ctx without a ww_mutex so it
might look a bit odd if it didn't use ww_mutex.

Patch only does what it says on the tin, so
Reviewed-by: Chris Wilson <[email protected]>
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2016-12-01 15:14:27

by Daniel Vetter

Subject: Re: [PATCH v2 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context

On Thu, Dec 01, 2016 at 02:18:04PM +0000, Chris Wilson wrote:
> On Thu, Dec 01, 2016 at 03:06:44PM +0100, Nicolai Hähnle wrote:
> > From: Nicolai Hähnle <[email protected]>
> >
> > v2: use resv->lock instead of resv->lock.base (Christian König)
> >
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Maarten Lankhorst <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: Chris Wilson <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Nicolai Hähnle <[email protected]>
> Reviewed-by: Chris Wilson <[email protected]>

Applied to drm-misc, thanks.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2016-12-01 16:00:44

by Chris Wilson

Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> @@ -677,15 +722,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> debug_mutex_lock_common(lock, &waiter);
> debug_mutex_add_waiter(lock, &waiter, task);
>
> - /* add waiting tasks to the end of the waitqueue (FIFO): */
> - list_add_tail(&waiter.list, &lock->wait_list);
> + lock_contended(&lock->dep_map, ip);
> +
> + if (!use_ww_ctx) {
> + /* add waiting tasks to the end of the waitqueue (FIFO): */
> + list_add_tail(&waiter.list, &lock->wait_list);
> + } else {
> + /* Add in stamp order, waking up waiters that must back off. */
> + ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
> + if (ret)
> + goto err_early_backoff;
> +
> + waiter.ww_ctx = ww_ctx;
> + }
> +
> waiter.task = task;

Would an unconditional waiter.ww_ctx = ww_ctx be cheap enough? (Same
cacheline write and all that?)

Makes the above clearer in that you have

if (!ww_ctx) {
list_add_tail();
} else {
ret = __ww_mutex_add_waiter(); /* no need to handle !ww_ctx */
if (ret)
goto err_early_backoff;
}

waiter.ww_ctx = ww_ctx;
waiter.task = task;

>
> if (__mutex_waiter_is_first(lock, &waiter))
> __mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
>
> - lock_contended(&lock->dep_map, ip);
> -
> set_task_state(task, state);
> for (;;) {
> /*
> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> * mutex_unlock() handing the lock off to us, do a trylock
> * before testing the error conditions to make sure we pick up
> * the handoff.
> + *
> + * For w/w locks, we always need to do this even if we're not
> + * currently the first waiter, because we may have been the
> + * first waiter during the unlock.
> */
> - if (__mutex_trylock(lock, first))
> + if (__mutex_trylock(lock, use_ww_ctx || first))

I'm not certain about the magic of first vs HANDOFF. Afaict, first ==
HANDOFF and this patch breaks that relationship. I think you need to add
bool handoff; as a separate tracker to first.

> goto acquired;
>
> /*
> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> spin_unlock_mutex(&lock->wait_lock, flags);
> schedule_preempt_disabled();
>
> - if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> + if (use_ww_ctx && ww_ctx) {
> + /*
> + * Always re-check whether we're in first position. We
> + * don't want to spin if another task with a lower
> + * stamp has taken our position.
> + *
> + * We also may have to set the handoff flag again, if
> + * our position at the head was temporarily taken away.

Comment makes sense.

Ah. Should this be just if (use_ww_ctx) { /* always recheck... */ ?
Except that !ww_ctx waiters are never gazumped in the list, so if they are
first, then they are always first.

Could you explain that as well (about why !ww_ctx is special here but
not above). And then it can even be reduced to if (ww_ctx) {} to match
the first chunk if the revision is acceptable.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2016-12-01 16:24:53

by Peter Zijlstra

Subject: Re: [PATCH v2 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context


Just to let you know, I've not been ignoring all the ww_mutex stuff,
I've been ill this week. I'm slowly returning to feeling human again,
but am still fairly wrecked.

2016-12-06 15:06:41

by Peter Zijlstra

Subject: Re: [PATCH v2 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop

On Thu, Dec 01, 2016 at 03:06:45PM +0100, Nicolai Hähnle wrote:
> +++ b/kernel/locking/mutex.c
> @@ -350,7 +350,8 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
> * access and not reliable.
> */
> static noinline
> -bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
> +bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
> + bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
> {
> bool ret = true;
>
> @@ -373,6 +374,28 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
> break;
> }
>
> + if (use_ww_ctx && ww_ctx->acquired > 0) {
> + struct ww_mutex *ww;
> +
> + ww = container_of(lock, struct ww_mutex, base);
> +
> + /*
> + * If ww->ctx is set the contents are undefined, only
> + * by acquiring wait_lock there is a guarantee that
> + * they are not invalid when reading.
> + *
> + * As such, when deadlock detection needs to be
> + * performed the optimistic spinning cannot be done.
> + *
> + * Check this in every inner iteration because we may
> + * be racing against another thread's ww_mutex_lock.
> + */
> + if (READ_ONCE(ww->ctx)) {
> + ret = false;
> + break;
> + }
> + }
> +
> cpu_relax();
> }
> rcu_read_unlock();

Aside from the valid question about mutex_can_spin_on_owner(), there's
another 'problem' here: mutex_spin_on_owner() is marked noinline, so all
the use_ww_ctx stuff doesn't 'work' here.

As is, I think we're missing an __always_inline on
mutex_optimistic_spin; I'll have to go see what that does for code
generation, but given both @use_ww_ctx and @waiter there that makes
sense.

2016-12-06 15:15:20

by Peter Zijlstra

Subject: Re: [PATCH v2 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context

On Thu, Dec 01, 2016 at 03:06:47PM +0100, Nicolai Hähnle wrote:
> +++ b/include/linux/ww_mutex.h
> @@ -222,11 +222,7 @@ extern int __must_check __ww_mutex_lock_interruptible(struct ww_mutex *lock,
> */
> static inline int ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
> {
> - if (ctx)
> - return __ww_mutex_lock(lock, ctx);
> -
> - mutex_lock(&lock->base);
> - return 0;
> + return __ww_mutex_lock(lock, ctx);
> }
>
> /**
> @@ -262,10 +258,7 @@ static inline int ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ct
> static inline int __must_check ww_mutex_lock_interruptible(struct ww_mutex *lock,
> struct ww_acquire_ctx *ctx)
> {
> - if (ctx)
> - return __ww_mutex_lock_interruptible(lock, ctx);
> - else
> - return mutex_lock_interruptible(&lock->base);
> + return __ww_mutex_lock_interruptible(lock, ctx);
> }
>

After this the entire point of __ww_mutex_lock*() is gone, right? Might
as well rename them to ww_mutex_lock() and remove this pointless
wrapper.

2016-12-06 15:26:00

by Peter Zijlstra

Subject: Re: [PATCH v2 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context

On Thu, Dec 01, 2016 at 03:06:47PM +0100, Nicolai Hähnle wrote:

> @@ -640,10 +640,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> struct mutex_waiter waiter;
> unsigned long flags;
> bool first = false;
> - struct ww_mutex *ww;
> int ret;
>
> - if (use_ww_ctx) {
> + if (use_ww_ctx && ww_ctx) {
> + struct ww_mutex *ww;
> +
> ww = container_of(lock, struct ww_mutex, base);
> if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
> return -EALREADY;

So I don't see the point of removing *ww from the function scope; we can
still compute that container_of() even if !ww_ctx, right? That would
save a ton of churn below, adding all those struct ww_mutex declarations
and container_of() casts.

(and note that the container_of() is a fancy NO-OP because base is the
first member).

> @@ -656,8 +657,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, false)) {
> /* got the lock, yay! */
> lock_acquired(&lock->dep_map, ip);
> - if (use_ww_ctx)
> + if (use_ww_ctx && ww_ctx) {
> + struct ww_mutex *ww;
> +
> + ww = container_of(lock, struct ww_mutex, base);
> ww_mutex_set_context_fastpath(ww, ww_ctx);
> + }
> preempt_enable();
> return 0;
> }
> @@ -702,7 +707,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> goto err;
> }
>
> - if (use_ww_ctx && ww_ctx->acquired > 0) {
> + if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
> ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
> if (ret)
> goto err;
> @@ -742,8 +747,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> /* got the lock - cleanup and rejoice! */
> lock_acquired(&lock->dep_map, ip);
>
> - if (use_ww_ctx)
> + if (use_ww_ctx && ww_ctx) {
> + struct ww_mutex *ww;
> +
> + ww = container_of(lock, struct ww_mutex, base);
> ww_mutex_set_context_slowpath(ww, ww_ctx);
> + }
>
> spin_unlock_mutex(&lock->wait_lock, flags);
> preempt_enable();

All that then reverts to:

- if (use_ww_ctx)
+ if (use_ww_ctx && ww_ctx)


2016-12-06 15:36:18

by Peter Zijlstra

Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> +static inline int __sched
> +__ww_mutex_add_waiter(struct mutex_waiter *waiter,
> + struct mutex *lock,
> + struct ww_acquire_ctx *ww_ctx)
> +{
> + struct mutex_waiter *cur;
> +
> + if (!ww_ctx) {
> + list_add_tail(&waiter->list, &lock->wait_list);
> + return 0;
> + }
> +
> + /*
> + * Add the waiter before the first waiter with a higher stamp.
> + * Waiters without a context are skipped to avoid starving
> + * them.
> + */
> + list_for_each_entry(cur, &lock->wait_list, list) {
> + if (!cur->ww_ctx)
> + continue;
> +
> + if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
> + /* Back off immediately if necessary. */
> + if (ww_ctx->acquired > 0) {
> +#ifdef CONFIG_DEBUG_MUTEXES
> + struct ww_mutex *ww;
> +
> + ww = container_of(lock, struct ww_mutex, base);
> + DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
> + ww_ctx->contending_lock = ww;
> +#endif
> + return -EDEADLK;
> + }
> +
> + continue;
> + }
> +
> + list_add_tail(&waiter->list, &cur->list);
> + return 0;
> + }
> +
> + list_add_tail(&waiter->list, &lock->wait_list);
> + return 0;
> +}

So you keep the list in order of stamp, and in general stamps come in
in order. That is, barring races on concurrent ww_mutex_lock(), things
are already ordered.

So it doesn't make sense to scan the entire list forwards; that almost
guarantees you scan the entire list every single time.

Or am I reading this wrong? Which in itself is a hint a comment might be
in place.
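
For illustration, a reverse scan along those lines might look roughly like
this (a sketch of the suggestion, reusing the names from the patch; not code
from the series):

static inline int __sched
__ww_mutex_add_waiter(struct mutex_waiter *waiter,
		      struct mutex *lock,
		      struct ww_acquire_ctx *ww_ctx)
{
	struct mutex_waiter *cur, *pos = NULL;

	if (!ww_ctx) {
		list_add_tail(&waiter->list, &lock->wait_list);
		return 0;
	}

	/*
	 * Walk backwards: since stamps mostly arrive in order, the new
	 * waiter usually belongs at (or near) the tail, so the scan
	 * terminates after looking at only a few entries.
	 */
	list_for_each_entry_reverse(cur, &lock->wait_list, list) {
		if (!cur->ww_ctx)
			continue;

		if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
			/*
			 * cur has an earlier stamp, so we belong after it;
			 * the "back off immediately if ww_ctx->acquired > 0"
			 * check from the patch would go here.
			 */
			break;
		}

		/* remember the earliest waiter we must stay in front of */
		pos = cur;
	}

	if (pos)
		list_add_tail(&waiter->list, &pos->list);	/* insert before pos */
	else
		list_add_tail(&waiter->list, &lock->wait_list);
	return 0;
}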

2016-12-06 16:03:33

by Waiman Long

Subject: Re: [PATCH v2 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop

On 12/06/2016 10:06 AM, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:45PM +0100, Nicolai Hähnle wrote:
>> +++ b/kernel/locking/mutex.c
>> @@ -350,7 +350,8 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
>> * access and not reliable.
>> */
>> static noinline
>> -bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
>> +bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
>> + bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
>> {
>> bool ret = true;
>>
>> @@ -373,6 +374,28 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
>> break;
>> }
>>
>> + if (use_ww_ctx && ww_ctx->acquired > 0) {
>> + struct ww_mutex *ww;
>> +
>> + ww = container_of(lock, struct ww_mutex, base);
>> +
>> + /*
>> + * If ww->ctx is set the contents are undefined, only
>> + * by acquiring wait_lock there is a guarantee that
>> + * they are not invalid when reading.
>> + *
>> + * As such, when deadlock detection needs to be
>> + * performed the optimistic spinning cannot be done.
>> + *
>> + * Check this in every inner iteration because we may
>> + * be racing against another thread's ww_mutex_lock.
>> + */
>> + if (READ_ONCE(ww->ctx)) {
>> + ret = false;
>> + break;
>> + }
>> + }
>> +
>> cpu_relax();
>> }
>> rcu_read_unlock();
> Aside from the valid question about mutex_can_spin_on_owner() there's
> another 'problem' here, mutex_spin_on_owner() is marked noinline, so all
> the use_ww_ctx stuff doesn't 'work' here.

The mutex_spin_on_owner() function was originally marked noinline
because it could be a major consumer of CPU cycles in a contended lock.
Having it shown separately in the perf output will help the users have a
better understanding of what is consuming all the CPU cycles. So I would
still like to keep it this way.

If you have concern about additional latency for non-ww_mutex calls, one
alternative can be:

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0afa998..777338d 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -349,9 +349,9 @@ static __always_inline void ww_mutex_lock_acquired(struct ww
* Look out! "owner" is an entirely speculative pointer
* access and not reliable.
*/
-static noinline
-bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
- bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
+static __always_inline
+bool __mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
+ const bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
{
bool ret = true;

@@ -403,6 +403,19 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_st
return ret;
}

+static noinline
+bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+{
+ return __mutex_spin_on_owner(lock, owner, false, NULL);
+}
+
+static noinline
+bool ww_mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
+ struct ww_acquire_ctx *ww_ctx)
+{
+ return __mutex_spin_on_owner(lock, owner, true, ww_ctx);
+}
+
/*
* Initial check for entering the mutex spinning loop
*/
@@ -489,13 +502,17 @@ static bool mutex_optimistic_spin(struct mutex *lock,
*/
owner = __mutex_owner(lock);
if (owner) {
+ bool spin_ok;
+
if (waiter && owner == task) {
smp_mb(); /* ACQUIRE */
break;
}

- if (!mutex_spin_on_owner(lock, owner, use_ww_ctx,
- ww_ctx))
+ spin_ok = use_ww_ctx
+ ? ww_mutex_spin_on_owner(lock, owner, ww_ctx)
+ : mutex_spin_on_owner(lock, owner);
+ if (!spin_ok)
goto fail_unlock;
}



2016-12-06 16:56:00

by Peter Zijlstra

Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> * mutex_unlock() handing the lock off to us, do a trylock
> * before testing the error conditions to make sure we pick up
> * the handoff.
> + *
> + * For w/w locks, we always need to do this even if we're not
> + * currently the first waiter, because we may have been the
> + * first waiter during the unlock.
> */
> - if (__mutex_trylock(lock, first))
> + if (__mutex_trylock(lock, use_ww_ctx || first))
> goto acquired;

So I'm somewhat uncomfortable with this. The point is that with the
.handoff logic it is very easy to accidentally allow:

mutex_lock(&a);
mutex_lock(&a);

And I'm not sure this doesn't make that happen for ww_mutexes. We get to
this __mutex_trylock() without first having blocked.


> /*
> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> spin_unlock_mutex(&lock->wait_lock, flags);
> schedule_preempt_disabled();
>
> - if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> + if (use_ww_ctx && ww_ctx) {
> + /*
> + * Always re-check whether we're in first position. We
> + * don't want to spin if another task with a lower
> + * stamp has taken our position.
> + *
> + * We also may have to set the handoff flag again, if
> + * our position at the head was temporarily taken away.
> + */
> + first = __mutex_waiter_is_first(lock, &waiter);
> +
> + if (first)
> + __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> + } else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> first = true;
> __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> }

So the point is that !ww_ctx entries are 'skipped' during the insertion
and therefore, if one becomes first, it must stay first?

> @@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> * or we must see its unlock and acquire.
> */
> if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
> - __mutex_trylock(lock, first))
> + __mutex_trylock(lock, use_ww_ctx || first))
> break;
>
> spin_lock_mutex(&lock->wait_lock, flags);


2016-12-06 18:47:40

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v2 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop

On 12/06/2016 01:29 PM, Peter Zijlstra wrote:
> On Tue, Dec 06, 2016 at 11:03:28AM -0500, Waiman Long wrote:
>> The mutex_spin_on_owner() function was originally marked noinline
>> because it could be a major consumer of CPU cycles in a contended lock.
>> Having it shown separately in the perf output will help the users have a
>> better understanding of what is consuming all the CPU cycles. So I would
>> still like to keep it this way.
> ah!, I tried to dig through history but couldn't find a reason for it.
>
>> If you have concern about additional latency for non-ww_mutex calls, one
>> alternative can be:
> That's pretty horrific :/

I am not totally against making mutex_spin_on_owner() an inline
function. If you think it is the right way to go, I am OK with that.

-Longman

2016-12-06 19:08:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop

On Tue, Dec 06, 2016 at 11:03:28AM -0500, Waiman Long wrote:
>
> The mutex_spin_on_owner() function was originally marked noinline
> because it could be a major consumer of CPU cycles in a contended lock.
> Having it shown separately in the perf output will help the users have a
> better understanding of what is consuming all the CPU cycles. So I would
> still like to keep it this way.

ah!, I tried to dig through history but couldn't find a reason for it.

>
> If you have concern about additional latency for non-ww_mutex calls, one
> alternative can be:

That's pretty horrific :/

2016-12-16 13:17:38

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context

On 06.12.2016 16:25, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:47PM +0100, Nicolai Hähnle wrote:
>
>> @@ -640,10 +640,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> struct mutex_waiter waiter;
>> unsigned long flags;
>> bool first = false;
>> - struct ww_mutex *ww;
>> int ret;
>>
>> - if (use_ww_ctx) {
>> + if (use_ww_ctx && ww_ctx) {
>> + struct ww_mutex *ww;
>> +
>> ww = container_of(lock, struct ww_mutex, base);
>> if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
>> return -EALREADY;
>
> So I don't see the point of removing *ww from the function scope, we can
> still compute that container_of() even if !ww_ctx, right? That would
> save a ton of churn below, adding all those struct ww_mutex declarations
> and container_of() casts.
>
> (and note that the container_of() is a fancy NO-OP because base is the
> first member).

Sorry for taking so long to get back to you.

In my experience, the undefined behavior sanitizer in GCC for userspace
programs complains about merely casting a pointer to the wrong type. I
never went into the standards rabbit hole to figure out the details. It
might be a C++ only thing (ubsan cannot tell the difference otherwise
anyway), but that was the reason for doing the change in this more
complicated way.

Are you sure that this is defined behavior in C? If so, I'd be happy to
go with the version that has less churn.

I'll also get rid of those ww_mutex_lock* wrapper functions.

Thanks,
Nicolai

>
>> @@ -656,8 +657,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, false)) {
>> /* got the lock, yay! */
>> lock_acquired(&lock->dep_map, ip);
>> - if (use_ww_ctx)
>> + if (use_ww_ctx && ww_ctx) {
>> + struct ww_mutex *ww;
>> +
>> + ww = container_of(lock, struct ww_mutex, base);
>> ww_mutex_set_context_fastpath(ww, ww_ctx);
>> + }
>> preempt_enable();
>> return 0;
>> }
>> @@ -702,7 +707,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> goto err;
>> }
>>
>> - if (use_ww_ctx && ww_ctx->acquired > 0) {
>> + if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
>> ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
>> if (ret)
>> goto err;
>> @@ -742,8 +747,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> /* got the lock - cleanup and rejoice! */
>> lock_acquired(&lock->dep_map, ip);
>>
>> - if (use_ww_ctx)
>> + if (use_ww_ctx && ww_ctx) {
>> + struct ww_mutex *ww;
>> +
>> + ww = container_of(lock, struct ww_mutex, base);
>> ww_mutex_set_context_slowpath(ww, ww_ctx);
>> + }
>>
>> spin_unlock_mutex(&lock->wait_lock, flags);
>> preempt_enable();
>
> All that then reverts to:
>
> - if (use_ww_ctx)
> + if (use_ww_ctx && ww_ctx)
>
>

2016-12-16 13:35:04

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On 06.12.2016 16:36, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
>> +static inline int __sched
>> +__ww_mutex_add_waiter(struct mutex_waiter *waiter,
>> + struct mutex *lock,
>> + struct ww_acquire_ctx *ww_ctx)
>> +{
>> + struct mutex_waiter *cur;
>> +
>> + if (!ww_ctx) {
>> + list_add_tail(&waiter->list, &lock->wait_list);
>> + return 0;
>> + }
>> +
>> + /*
>> + * Add the waiter before the first waiter with a higher stamp.
>> + * Waiters without a context are skipped to avoid starving
>> + * them.
>> + */
>> + list_for_each_entry(cur, &lock->wait_list, list) {
>> + if (!cur->ww_ctx)
>> + continue;
>> +
>> + if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
>> + /* Back off immediately if necessary. */
>> + if (ww_ctx->acquired > 0) {
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> + struct ww_mutex *ww;
>> +
>> + ww = container_of(lock, struct ww_mutex, base);
>> + DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
>> + ww_ctx->contending_lock = ww;
>> +#endif
>> + return -EDEADLK;
>> + }
>> +
>> + continue;
>> + }
>> +
>> + list_add_tail(&waiter->list, &cur->list);
>> + return 0;
>> + }
>> +
>> + list_add_tail(&waiter->list, &lock->wait_list);
>> + return 0;
>> +}
>
> So you keep the list in order of stamp, and in general stamps come in,
> in-order. That is, barring races on concurrent ww_mutex_lock(), things
> are already ordered.
>
> So it doesn't make sense to scan the entire list forwards, that almost
> guarantees you scan the entire list every single time.
>
> Or am I reading this wrong? Which in itself is a hint a comment might be
> in place.

No, it's a reasonable question. Some things to keep in mind:

1. Each ww_acquire_ctx may be used with hundreds of locks. It's not that
clear that things will be ordered in a contention scenario, especially
since the old stamp is re-used when a context backs off and goes into
the slow path (with ww_ctx->acquired == 0).

2. We want to add waiters directly before the first waiter with a higher
stamp. Keeping in mind that there may be non-ww_ctx waiters in between,
and we don't want to starve them, traversing the list backwards would
require keeping the best insertion point around in a separate variable.
Clearly possible, but it seemed more awkward.

In hindsight, backwards iteration may not be so bad.
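
For illustration, a minimal sketch of what such a backwards iteration
could look like (purely hypothetical, not part of the posted series;
__ww_mutex_add_waiter_reverse is a made-up name and the DEBUG_MUTEXES
bookkeeping for contending_lock is omitted):

static inline int __sched
__ww_mutex_add_waiter_reverse(struct mutex_waiter *waiter,
                              struct mutex *lock,
                              struct ww_acquire_ctx *ww_ctx)
{
        struct mutex_waiter *cur;
        struct list_head *pos = &lock->wait_list;

        if (!ww_ctx) {
                list_add_tail(&waiter->list, &lock->wait_list);
                return 0;
        }

        /*
         * Walk from the tail, remembering the earliest waiter seen with
         * a higher stamp, and stop at the first waiter with a lower
         * stamp. Waiters without a context are skipped so that they are
         * not starved.
         */
        list_for_each_entry_reverse(cur, &lock->wait_list, list) {
                if (!cur->ww_ctx)
                        continue;

                if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
                        /* cur has a lower stamp and stays ahead of us. */
                        if (ww_ctx->acquired > 0)
                                return -EDEADLK;        /* back off */
                        break;
                }

                /* cur has a higher stamp: best insertion point so far. */
                pos = &cur->list;
        }

        /* Insert before the earliest higher-stamp waiter, or at the tail. */
        list_add_tail(&waiter->list, pos);
        return 0;
}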

Nicolai

2016-12-16 14:20:03

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

Hi Peter and Chris,

(trying to combine the handoff discussion here)

On 06.12.2016 17:55, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
>> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> * mutex_unlock() handing the lock off to us, do a trylock
>> * before testing the error conditions to make sure we pick up
>> * the handoff.
>> + *
>> + * For w/w locks, we always need to do this even if we're not
>> + * currently the first waiter, because we may have been the
>> + * first waiter during the unlock.
>> */
>> - if (__mutex_trylock(lock, first))
>> + if (__mutex_trylock(lock, use_ww_ctx || first))
>> goto acquired;
>
> So I'm somewhat uncomfortable with this. The point is that with the
> .handoff logic it is very easy to accidentally allow:
>
> mutex_lock(&a);
> mutex_lock(&a);
>
> And I'm not sure this doesn't make that happen for ww_mutexes. We get to
> this __mutex_trylock() without first having blocked.

Okay, took me a while, but I see the problem. If we have:

ww_mutex_lock(&a, NULL);
ww_mutex_lock(&a, ctx);

then it's possible that another currently waiting task sets the HANDOFF
flag between those calls and we'll allow the second ww_mutex_lock to go
through.

The concern about picking up a handoff that we didn't request is real,
though it cannot happen in the first iteration. Perhaps this
__mutex_trylock can be moved to the end of the loop? See below...


>
>
>> /*
>> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> spin_unlock_mutex(&lock->wait_lock, flags);
>> schedule_preempt_disabled();
>>
>> - if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>> + if (use_ww_ctx && ww_ctx) {
>> + /*
>> + * Always re-check whether we're in first position. We
>> + * don't want to spin if another task with a lower
>> + * stamp has taken our position.
>> + *
>> + * We also may have to set the handoff flag again, if
>> + * our position at the head was temporarily taken away.
>> + */
>> + first = __mutex_waiter_is_first(lock, &waiter);
>> +
>> + if (first)
>> + __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>> + } else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>> first = true;
>> __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>> }
>
> So the point is that !ww_ctx entries are 'skipped' during the insertion
> and therefore, if one becomes first, it must stay first?

Yes. Actually, it should be possible to replace all the cases of
use_ww_ctx || first with ww_ctx. Similarly, all cases of use_ww_ctx &&
ww_ctx could be replaced by just ww_ctx.

>
>> @@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> * or we must see its unlock and acquire.
>> */
>> if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
>> - __mutex_trylock(lock, first))
>> + __mutex_trylock(lock, use_ww_ctx || first))
>> break;
>>
>> spin_lock_mutex(&lock->wait_lock, flags);

Change this code to:

        acquired = first &&
                   mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
                                         &waiter);
        spin_lock_mutex(&lock->wait_lock, flags);

        if (acquired ||
            __mutex_trylock(lock, use_ww_ctx || first))
                break;
}

This changes the trylock to always be under the wait_lock, but we
previously had that at the beginning of the loop anyway. It also removes
back-to-back calls to __mutex_trylock when going through the loop; and
for the first iteration, there is a __mutex_trylock under wait_lock
already before adding ourselves to the wait list.

What do you think?

Nicolai

2016-12-16 14:21:49

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On 01.12.2016 16:59, Chris Wilson wrote:
> On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
>> @@ -677,15 +722,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> debug_mutex_lock_common(lock, &waiter);
>> debug_mutex_add_waiter(lock, &waiter, task);
>>
>> - /* add waiting tasks to the end of the waitqueue (FIFO): */
>> - list_add_tail(&waiter.list, &lock->wait_list);
>> + lock_contended(&lock->dep_map, ip);
>> +
>> + if (!use_ww_ctx) {
>> + /* add waiting tasks to the end of the waitqueue (FIFO): */
>> + list_add_tail(&waiter.list, &lock->wait_list);
>> + } else {
>> + /* Add in stamp order, waking up waiters that must back off. */
>> + ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
>> + if (ret)
>> + goto err_early_backoff;
>> +
>> + waiter.ww_ctx = ww_ctx;
>> + }
>> +
>> waiter.task = task;
>
> Would an unconditional waiter.ww_ctx = ww_ctx be cheap enough? (Same
> cacheline write and all that?)
>
> Makes the above clearer in that you have
>
> if (!ww_ctx) {
>         list_add_tail();
> } else {
>         ret = __ww_mutex_add_waiter(); /* no need to handle !ww_ctx */
>         if (ret)
>                 goto err_early_backoff;
> }
>
> waiter.ww_ctx = ww_ctx;
> waiter.task = task;

I don't feel strongly either way. I thought it'd be nice to have an
explicit distinction between mutex_lock(&a) and ww_mutex_lock(&a, NULL)
though.

>
>>
>> if (__mutex_waiter_is_first(lock, &waiter))
>> __mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
>>
>> - lock_contended(&lock->dep_map, ip);
>> -
>> set_task_state(task, state);
>> for (;;) {
>> /*
>> @@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> * mutex_unlock() handing the lock off to us, do a trylock
>> * before testing the error conditions to make sure we pick up
>> * the handoff.
>> + *
>> + * For w/w locks, we always need to do this even if we're not
>> + * currently the first waiter, because we may have been the
>> + * first waiter during the unlock.
>> */
>> - if (__mutex_trylock(lock, first))
>> + if (__mutex_trylock(lock, use_ww_ctx || first))
>
> I'm not certain about the magic of first vs HANDOFF. Afaict, first ==
> HANDOFF and this patch breaks that relationship. I think you need to add
> bool handoff; as a separate tracker to first.
>
>> goto acquired;
>>
>> /*
>> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> spin_unlock_mutex(&lock->wait_lock, flags);
>> schedule_preempt_disabled();
>>
>> - if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>> + if (use_ww_ctx && ww_ctx) {
>> + /*
>> + * Always re-check whether we're in first position. We
>> + * don't want to spin if another task with a lower
>> + * stamp has taken our position.
>> + *
>> + * We also may have to set the handoff flag again, if
>> + * our position at the head was temporarily taken away.
>
> Comment makes sense.
>
> Ah. Should this be just if (use_ww_ctx) { /* always recheck... */ ?
> Except that !ww_ctx waiters are never gazumped in the list, so if they are
> first, then they are always first.

Right. See also the other mail.

Nicolai

>
> Could you explain that as well (about why !ww_ctx is special here but
> not above). And then it can even be reduced to if (ww_ctx) {} to match
> the first chunk if the revision is acceptable.
> -Chris
>

2016-12-16 16:08:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
> Hi Peter and Chris,
>
> (trying to combine the handoff discussion here)
>
> On 06.12.2016 17:55, Peter Zijlstra wrote:
> >On Thu, Dec 01, 2016 at 03:06:48PM +0100, Nicolai Hähnle wrote:
> >>@@ -693,8 +748,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> * mutex_unlock() handing the lock off to us, do a trylock
> >> * before testing the error conditions to make sure we pick up
> >> * the handoff.
> >>+ *
> >>+ * For w/w locks, we always need to do this even if we're not
> >>+ * currently the first waiter, because we may have been the
> >>+ * first waiter during the unlock.
> >> */
> >>- if (__mutex_trylock(lock, first))
> >>+ if (__mutex_trylock(lock, use_ww_ctx || first))
> >> goto acquired;
> >
> >So I'm somewhat uncomfortable with this. The point is that with the
> >.handoff logic it is very easy to accidentally allow:
> >
> > mutex_lock(&a);
> > mutex_lock(&a);
> >
> >And I'm not sure this doesn't make that happen for ww_mutexes. We get to
> >this __mutex_trylock() without first having blocked.
>
> Okay, took me a while, but I see the problem. If we have:
>
> ww_mutex_lock(&a, NULL);
> ww_mutex_lock(&a, ctx);
>
> then it's possible that another currently waiting task sets the HANDOFF flag
> between those calls and we'll allow the second ww_mutex_lock to go through.

It's worse: __mutex_trylock() doesn't check if MUTEX_FLAG_HANDOFF is set;
if .handoff == true && __owner_task() == current, we 'acquire'.

And since 'use_ww_ctx' is unconditionally true for ww_mutex_lock(), the
sequence:

ww_mutex_lock(&a, ...);
ww_mutex_lock(&a, ...);

will 'work'.
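
To spell out the failure mode in code form (a rough illustration of the
behaviour described above, not the actual __mutex_trylock() body;
handoff_branch_sketch is a made-up name and the normal acquisition paths
are elided):

static bool handoff_branch_sketch(struct mutex *lock, bool handoff)
{
        struct task_struct *owner = __mutex_owner(lock);

        /*
         * The problematic case: with handoff == true and the owner
         * already being current, the lock is treated as handed off to
         * us even though MUTEX_FLAG_HANDOFF was never checked and we
         * never blocked. That is how the back-to-back ww_mutex_lock()
         * calls above can both 'work'.
         */
        if (handoff && owner == current)
                return true;

        /* normal acquisition/contention handling elided */
        return false;
}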

2016-12-16 17:15:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
> The concern about picking up a handoff that we didn't request is real,
> though it cannot happen in the first iteration. Perhaps this __mutex_trylock
> can be moved to the end of the loop? See below...


> >>@@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> * or we must see its unlock and acquire.
> >> */
> >> if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
> >>- __mutex_trylock(lock, first))
> >>+ __mutex_trylock(lock, use_ww_ctx || first))
> >> break;
> >>
> >> spin_lock_mutex(&lock->wait_lock, flags);
>
> Change this code to:
>
> acquired = first &&
> mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
> &waiter);
> spin_lock_mutex(&lock->wait_lock, flags);
>
> if (acquired ||
> __mutex_trylock(lock, use_ww_ctx || first))
> break;

goto acquired;

will work lots better.

> }
>
> This changes the trylock to always be under the wait_lock, but we previously
> had that at the beginning of the loop anyway.

> It also removes back-to-back
> calls to __mutex_trylock when going through the loop;

Yeah, I had that explicitly. It allows taking the mutex when
mutex_unlock() is still holding the wait_lock.

> and for the first
> iteration, there is a __mutex_trylock under wait_lock already before adding
> ourselves to the wait list.

Correct.

2016-12-16 17:20:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
> >>@@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> spin_unlock_mutex(&lock->wait_lock, flags);
> >> schedule_preempt_disabled();
> >>
> >>- if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> >>+ if (use_ww_ctx && ww_ctx) {
> >>+ /*
> >>+ * Always re-check whether we're in first position. We
> >>+ * don't want to spin if another task with a lower
> >>+ * stamp has taken our position.
> >>+ *
> >>+ * We also may have to set the handoff flag again, if
> >>+ * our position at the head was temporarily taken away.
> >>+ */
> >>+ first = __mutex_waiter_is_first(lock, &waiter);
> >>+
> >>+ if (first)
> >>+ __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> >>+ } else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
> >> first = true;
> >> __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> >> }
> >
> >So the point is that !ww_ctx entries are 'skipped' during the insertion
> >and therefore, if one becomes first, it must stay first?
>
> Yes. Actually, it should be possible to replace all the cases of use_ww_ctx
> || first with ww_ctx. Similarly, all cases of use_ww_ctx && ww_ctx could be
> replaced by just ww_ctx.


I'm not seeing how "use_ww_ctx || first" -> "ww_ctx" works. And while
"use_ww_ctx && ww_ctx" -> "ww_ctx" is correct, it didn't work right on
some older GCCs, which choked on value propagation for ww_ctx and kept
emitting code even if we passed in NULL. Hence use_ww_ctx.

Arnd is now looking to raise the minimum supported GCC version, so maybe
we should look at that again if he gets anywhere.
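
As an aside, a toy sketch of why the constant bool helps (made-up names,
not the kernel's actual functions): when every call site passes a literal
constant for use_ww_ctx, the ww-specific blocks are statically dead for
plain mutexes regardless of whether the compiler manages to propagate
ww_ctx == NULL through the inlined body.

static __always_inline int
lock_common_sketch(struct mutex *lock, struct ww_acquire_ctx *ww_ctx,
                   const bool use_ww_ctx)
{
        if (use_ww_ctx && ww_ctx) {
                /* ww_mutex-specific work, compiled out for plain mutexes */
        }
        /* common slow path ... */
        return 0;
}

static int mutex_lock_sketch(struct mutex *lock)
{
        return lock_common_sketch(lock, NULL, false);
}

static int ww_mutex_lock_sketch(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
{
        return lock_common_sketch(&ww->base, ctx, true);
}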

2016-12-16 18:11:52

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On 16.12.2016 18:15, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
>> The concern about picking up a handoff that we didn't request is real,
>> though it cannot happen in the first iteration. Perhaps this __mutex_trylock
>> can be moved to the end of the loop? See below...
>
>
>>>> @@ -728,7 +800,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>> * or we must see its unlock and acquire.
>>>> */
>>>> if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
>>>> - __mutex_trylock(lock, first))
>>>> + __mutex_trylock(lock, use_ww_ctx || first))
>>>> break;
>>>>
>>>> spin_lock_mutex(&lock->wait_lock, flags);
>>
>> Change this code to:
>>
>> acquired = first &&
>> mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
>> &waiter);
>> spin_lock_mutex(&lock->wait_lock, flags);
>>
>> if (acquired ||
>> __mutex_trylock(lock, use_ww_ctx || first))
>> break;
>
> goto acquired;
>
> will work lots better.

Wasn't explicit enough, sorry. The idea was to get rid of the acquired
label and change things so that all paths exit the loop with wait_lock
held. That seems cleaner to me.


>> }
>>
>> This changes the trylock to always be under the wait_lock, but we previously
>> had that at the beginning of the loop anyway.
>
>> It also removes back-to-back
>> calls to __mutex_trylock when going through the loop;
>
> Yeah, I had that explicitly. It allows taking the mutex when
> mutex_unlock() is still holding the wait_lock.

mutex_optimistic_spin() already calls __mutex_trylock, and for the
no-spin case, __mutex_unlock_slowpath() only calls wake_up_q() after
releasing the wait_lock.

So I don't see the purpose of the back-to-back __mutex_trylocks,
especially considering that if the first one succeeds, we immediately
take the wait_lock anyway.

Nicolai



>> and for the first
>> iteration, there is a __mutex_trylock under wait_lock already before adding
>> ourselves to the wait list.
>
> Correct.
>

2016-12-16 18:13:20

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order



On 16.12.2016 18:20, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 03:19:43PM +0100, Nicolai Hähnle wrote:
>>>> @@ -716,7 +775,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>> spin_unlock_mutex(&lock->wait_lock, flags);
>>>> schedule_preempt_disabled();
>>>>
>>>> - if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>>>> + if (use_ww_ctx && ww_ctx) {
>>>> + /*
>>>> + * Always re-check whether we're in first position. We
>>>> + * don't want to spin if another task with a lower
>>>> + * stamp has taken our position.
>>>> + *
>>>> + * We also may have to set the handoff flag again, if
>>>> + * our position at the head was temporarily taken away.
>>>> + */
>>>> + first = __mutex_waiter_is_first(lock, &waiter);
>>>> +
>>>> + if (first)
>>>> + __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>> + } else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
>>>> first = true;
>>>> __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>> }
>>>
>>> So the point is that !ww_ctx entries are 'skipped' during the insertion
>>> and therefore, if one becomes first, it must stay first?
>>
>> Yes. Actually, it should be possible to replace all the cases of use_ww_ctx
>> || first with ww_ctx. Similarly, all cases of use_ww_ctx && ww_ctx could be
>> replaced by just ww_ctx.
>
>
> I'm not seeing how "use_ww_ctx || first" -> "ww_ctx" works.

My bad, missing the '|| first'.


> And while
> "use_ww_ctx && ww_ctx" -> "ww_ctx" is correct, it didn't work right on
> some older GCCs, which choked on value propagation for ww_ctx and kept
> emitting code even if we passed in NULL. Hence use_ww_ctx.

Okay, I'll stick with use_ww_ctx. Thanks for the explanation.

Nicolai


> Arnd is now looking to raise the minimum supported GCC version, so maybe
> we should look at that again if he gets anywhere.
>

2016-12-16 20:10:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On Fri, Dec 16, 2016 at 07:11:41PM +0100, Nicolai Hähnle wrote:
> mutex_optimistic_spin() already calls __mutex_trylock, and for the no-spin
> case, __mutex_unlock_slowpath() only calls wake_up_q() after releasing the
> wait_lock.

mutex_optimistic_spin() is a no-op when !CONFIG_MUTEX_SPIN_ON_OWNER

2016-12-16 22:35:20

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] locking/ww_mutex: Add waiters in stamp order

On 16.12.2016 21:00, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 07:11:41PM +0100, Nicolai Hähnle wrote:
>> mutex_optimistic_spin() already calls __mutex_trylock, and for the no-spin
>> case, __mutex_unlock_slowpath() only calls wake_up_q() after releasing the
>> wait_lock.
>
> mutex_optimistic_spin() is a no-op when !CONFIG_MUTEX_SPIN_ON_OWNER

Does this change the conclusion in a meaningful way? I did mention the
no-spin case in the very part you quoted...

Again, AFAIU we're talking about the part of my proposal that turns what
is effectively

__mutex_trylock(lock, ...);
spin_lock_mutex(&lock->wait_lock, flags);

(independent of whether the trylock succeeds or not!) into

spin_lock_mutex(&lock->wait_lock, flags);
__mutex_trylock(lock, ...);

in an effort to streamline the code overall.

Also AFAIU, you're concerned that spin_lock_mutex(...) has to wait for
an unlock from mutex_unlock(), but when does that actually happen with
relevant probability?

When we spin optimistically, that could happen -- except that
__mutex_trylock is already called in mutex_optimistic_spin, so it
doesn't matter. When we don't spin -- whether due to .config or !first
-- then the chance of overlap with mutex_unlock is exceedingly small.

Even if we do overlap, we'll have to wait for mutex_unlock to release
the wait_lock anyway! So what good does acquiring the lock first really do?

Anyway, this is really more of an argument about whether there's a good
reason to call __mutex_trylock twice in that loop. I don't think there
is, and your arguments so far haven't been convincing, but the issue can
be side-stepped for this patch by keeping the trylock calls as they are
and just setting first = true unconditionally for ww_ctx != NULL (but
keeping the logic for when to set the HANDOFF flag as-is). Should
probably rename the variable s/first/handoff/ then.

Nicolai

2016-12-17 07:53:57

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context

On 16-12-16 at 14:17, Nicolai Hähnle wrote:
> On 06.12.2016 16:25, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:47PM +0100, Nicolai Hähnle wrote:
>>
>>> @@ -640,10 +640,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>> struct mutex_waiter waiter;
>>> unsigned long flags;
>>> bool first = false;
>>> - struct ww_mutex *ww;
>>> int ret;
>>>
>>> - if (use_ww_ctx) {
>>> + if (use_ww_ctx && ww_ctx) {
>>> + struct ww_mutex *ww;
>>> +
>>> ww = container_of(lock, struct ww_mutex, base);
>>> if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
>>> return -EALREADY;
>>
>> So I don't see the point of removing *ww from the function scope, we can
>> still compute that container_of() even if !ww_ctx, right? That would
>> save a ton of churn below, adding all those struct ww_mutex declarations
>> and container_of() casts.
>>
>> (and note that the container_of() is a fancy NO-OP because base is the
>> first member).
>
> Sorry for taking so long to get back to you.
>
> In my experience, the undefined behavior sanitizer in GCC for userspace programs complains about merely casting a pointer to the wrong type. I never went into the standards rabbit hole to figure out the details. It might be a C++ only thing (ubsan cannot tell the difference otherwise anyway), but that was the reason for doing the change in this more complicated way.
>
> Are you sure that this is defined behavior in C? If so, I'd be happy to go with the version that has less churn.
>
> I'll also get rid of those ww_mutex_lock* wrapper functions.

ww_ctx = use_ww_ctx ? container_of : NULL ?

2016-12-17 13:50:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context

On Fri, Dec 16, 2016 at 02:17:25PM +0100, Nicolai Hähnle wrote:
> On 06.12.2016 16:25, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 03:06:47PM +0100, Nicolai Hähnle wrote:
> >
> >>@@ -640,10 +640,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >> struct mutex_waiter waiter;
> >> unsigned long flags;
> >> bool first = false;
> >>- struct ww_mutex *ww;
> >> int ret;
> >>
> >>- if (use_ww_ctx) {
> >>+ if (use_ww_ctx && ww_ctx) {
> >>+ struct ww_mutex *ww;
> >>+
> >> ww = container_of(lock, struct ww_mutex, base);
> >> if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
> >> return -EALREADY;
> >
> >So I don't see the point of removing *ww from the function scope, we can
> >still compute that container_of() even if !ww_ctx, right? That would
> >save a ton of churn below, adding all those struct ww_mutex declarations
> >and container_of() casts.
> >
> >(and note that the container_of() is a fancy NO-OP because base is the
> >first member).
>
> Sorry for taking so long to get back to you.
>
> In my experience, the undefined behavior sanitizer in GCC for userspace
> programs complains about merely casting a pointer to the wrong type. I never
> went into the standards rabbit hole to figure out the details. It might be a
> C++ only thing (ubsan cannot tell the difference otherwise anyway), but that
> was the reason for doing the change in this more complicated way.

Note that C only has what C++ calls reinterpret_cast<>(). It cannot
complain about a 'wrong' cast; there is no such thing.

Also, container_of() works, irrespective of what C language says about
it -- note that the kernel in general hard relies on a lot of things C
calls undefined behaviour.

> Are you sure that this is defined behavior in C? If so, I'd be happy to go
> with the version that has less churn.

It should very much work with kernel C.
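
For completeness, a small standalone illustration of why the
container_of() in question is effectively a no-op (simplified structs for
a userspace test, not the kernel definitions; the real macro in the
kernel also type-checks the pointer):

#include <stddef.h>

struct mutex { int dummy; };
struct ww_acquire_ctx;

struct ww_mutex {
        struct mutex base;              /* first member: offset 0 */
        struct ww_acquire_ctx *ctx;
};

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

static struct ww_mutex *to_ww_mutex(struct mutex *lock)
{
        /*
         * offsetof(struct ww_mutex, base) is 0, so the subtraction
         * vanishes and this is just a pointer cast in disguise.
         */
        return container_of(lock, struct ww_mutex, base);
}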