2018-06-14 07:30:54

by Thomas Hellstrom

[permalink] [raw]
Subject: [PATCH v2 0/2] locking,drm: Fix ww mutex naming / algorithm inconsistency

This is a small fallout from a work to allow batching WW mutex locks and
unlocks.

Our Wound-Wait mutexes actually don't use the Wound-Wait algorithm but
the Wait-Die algorithm. One could perhaps rename those mutexes tree-wide to
"Wait-Die mutexes" or "Deadlock Avoidance mutexes". Another approach,
suggested here, is to also implement the Wound-Wait algorithm as a
per-WW-class choice, since it has advantages in some cases. See for example

http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html

Wound-Wait is a preemptive algorithm, and the preemption is implemented
using a lazy scheme: if a wounded transaction is about to go to sleep on
a contended WW mutex, we return -EDEADLK instead. That is sufficient for
deadlock prevention. Since with WW mutexes we also require the aborted
transaction to sleep waiting to lock the WW mutex it was aborted on, this
choice also provides a suitable WW mutex to sleep on. If we instead returned
-EDEADLK on the first WW mutex lock after the transaction was wounded,
whether the WW mutex was contended or not, the transaction might frequently
be restarted without a wait, which is far from optimal. Note also that with
the lazy preemption scheme, contrary to Wait-Die, there will be no rollbacks
on contention for locks held by a transaction that has completed its locking
sequence.
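The two decision rules differ only in who backs off on contention. A minimal userspace sketch of both rules (illustrative names and integer stamps, not the kernel's implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Stamps are acquire tickets: a lower value means an older transaction. */

/* Wait-Die: on contention, the *contender* backs off (dies) iff it is
 * younger than the lock holder; otherwise it waits for the lock. */
static bool wait_die_contender_backs_off(unsigned long holder_stamp,
					 unsigned long contender_stamp)
{
	return contender_stamp > holder_stamp;
}

/* Wound-Wait: the contender never backs off; instead the *holder* is
 * wounded (preempted) iff the holder is younger than the contender. */
static bool wound_wait_holder_is_wounded(unsigned long holder_stamp,
					 unsigned long contender_stamp)
{
	return holder_stamp > contender_stamp;
}
```

In both schemes the older transaction always wins, which is what makes them deadlock-free and fair.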

The modeset locks are then changed from Wait-Die to Wound-Wait since the
typical locking pattern of those locks very well matches the criterion for
a substantial reduction in the number of rollbacks. For reservation objects,
the benefit is more unclear at this point and they remain using Wait-Die.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Philippe Ombredanne <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

v2:
Updated the DEFINE_WW_CLASS() API; minor code and comment fixes as
detailed in each patch.


2018-06-14 07:32:22

by Thomas Hellstrom

[permalink] [raw]
Subject: [PATCH v2 2/2] drm: Change deadlock-avoidance algorithm for the modeset locks.

For modeset locks we don't expect a high number of contending
transactions so change algorithm from Wait-Die to Wound-Wait.

Signed-off-by: Thomas Hellstrom <[email protected]>

---
v2: Update to API change.
---
drivers/gpu/drm/drm_modeset_lock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
index ff00a814f617..8a5100685875 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -70,7 +70,7 @@
* lists and lookup data structures.
*/

-static DEFINE_WW_CLASS_WDIE(crtc_ww_class);
+static DEFINE_WW_CLASS(crtc_ww_class);

/**
* drm_modeset_lock_all - take all modeset locks
--
2.14.3


2018-06-14 07:32:41

by Thomas Hellstrom

[permalink] [raw]
Subject: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die a preemptive algorithm and is known to generate
fewer backoffs. Testing reveals that this is true if the
number of simultaneous contending transactions is small.
As the number of simultaneous contending threads increases, Wound-Wait
becomes inferior to Wait-Die in terms of elapsed time, possibly due to
the larger number of locks held by sleeping transactions.

Update documentation and callers.

Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
tag patch-18-06-14

Each thread runs 100000 batches of lock / unlock 800 ww mutexes randomly
chosen out of 100000. Four core Intel x86_64:

Algorithm    #threads  Rollbacks   Time
Wound-Wait       4         ~100    ~17s
Wait-Die         4      ~150000    ~19s
Wound-Wait      16      ~360000   ~109s
Wait-Die        16      ~450000    ~82s
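The lazy-preemption behavior measured above can be sketched in a few lines of userspace C. The struct and function names here are hypothetical, chosen for illustration; they are not the kernel's actual types:

```c
#include <assert.h>
#include <stdbool.h>

#define EDEADLK 35

struct tx {
	unsigned long stamp;	/* lower = older */
	int wounded;		/* set by an older contending transaction */
};

/* Called when transaction t tries to take a lock.  The wounded state is
 * acted upon only when the lock is contended, i.e. only when t would
 * have to sleep and a deadlock is actually possible. */
static int lock_step(struct tx *t, bool contended)
{
	if (contended && t->wounded)
		return -EDEADLK;	/* back off: caller rolls back its locks */
	return 0;			/* acquire the lock */
}

/* After dropping all held locks the transaction clears its wounded
 * state and retries, typically starting with the lock it was aborted
 * on, so the retry begins with a wait rather than an immediate abort. */
static void tx_backoff(struct tx *t)
{
	t->wounded = 0;
}
```

A wounded transaction thus keeps acquiring uncontended locks and only restarts when it would otherwise sleep, which is the reduction in pointless restarts described above.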

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Philippe Ombredanne <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Thomas Hellstrom <[email protected]>

---
v2:
* Update API according to review comment by Greg Kroah-Hartman.
* Address review comments by Peter Zijlstra:
- Avoid _Bool in composites
- Fix typo
- Use __mutex_owner() where applicable
- Rely on built-in barriers for the main loop exit condition,
struct ww_acquire_ctx::wounded. Update code comments.
- Explain unlocked use of list_empty().
---
Documentation/locking/ww-mutex-design.txt | 54 ++++++++++++----
drivers/dma-buf/reservation.c | 2 +-
drivers/gpu/drm/drm_modeset_lock.c | 2 +-
include/linux/ww_mutex.h | 19 ++++--
kernel/locking/locktorture.c | 2 +-
kernel/locking/mutex.c | 103 +++++++++++++++++++++++++++---
kernel/locking/test-ww_mutex.c | 2 +-
lib/locking-selftest.c | 2 +-
8 files changed, 156 insertions(+), 30 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt
index 34c3a1b50b9a..b9597def9581 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -1,4 +1,4 @@
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
======================================

Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the
younger task) unlocks all of the buffers that it has already locked, and then
tries again.

-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction,
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+preempts the transaction holding the lock, requiring it to back off. It
+Wounds the other transaction.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm which requires a reliable way to preempt another transaction.

Concepts
--------
@@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is important the a task
trying to acquire locks doesn't grab a new reservation id, but keeps the one it
acquired when starting the lock acquisition. This ticket is stored in the
acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context represents a
+transaction.

W/w class: In contrast to normal mutexes the lock class needs to be explicit for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.

Furthermore there are three different class of w/w lock acquire functions:

@@ -90,6 +105,12 @@ provided.
Usage
-----

+The algorithm (Wait-Die vs Wound-Wait) is chosen by using either
+DEFINE_WW_CLASS_WDIE() for Wait-Die or DEFINE_WW_CLASS() for Wound-Wait.
+As a rough rule of thumb, use Wound-Wait iff you typically expect the number
+of simultaneous competing transactions to be small, and the rollback cost can
+be substantial.
+
Three different ways to acquire locks within the same w/w class. Common
definitions for methods #1 and #2:

@@ -312,12 +333,23 @@ Design:
We maintain the following invariants for the wait list:
(1) Waiters with an acquire context are sorted by stamp order; waiters
without an acquire context are interspersed in FIFO order.
- (2) Among waiters with contexts, only the first one can have other locks
- acquired already (ctx->acquired > 0). Note that this waiter may come
- after other waiters without contexts in the list.
+ (2) For Wait-Die, among waiters with contexts, only the first one can have
+ other locks acquired already (ctx->acquired > 0). Note that this waiter
+ may come after other waiters without contexts in the list.
+
+ The Wound-Wait preemption is implemented with a lazy-preemption scheme:
+ The wounded status of the transaction is checked only when there is
+ contention for a new lock and hence a true chance of deadlock. In that
+ situation, if the transaction is wounded, it backs off, clears the
+ wounded status and retries. A great benefit of implementing preemption in
+ this way is that the wounded transaction can identify a contending lock to
+ wait for before restarting the transaction. Just blindly restarting the
+ transaction would likely make the transaction end up in a situation where
+ it would have to back off again.

In general, not much contention is expected. The locks are typically used to
- serialize access to resources for devices.
+ serialize access to resources for devices, and optimization focus should
+ therefore be directed towards the uncontended cases.

Lockdep:
Special care has been taken to warn for as many cases of api abuse
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..b94a4bab2ecd 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -46,7 +46,7 @@
* write-side updates.
*/

-DEFINE_WW_CLASS(reservation_ww_class);
+DEFINE_WW_CLASS_WDIE(reservation_ww_class);
EXPORT_SYMBOL(reservation_ww_class);

struct lock_class_key reservation_seqcount_class;
diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
index 8a5100685875..ff00a814f617 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -70,7 +70,7 @@
* lists and lookup data structures.
*/

-static DEFINE_WW_CLASS(crtc_ww_class);
+static DEFINE_WW_CLASS_WDIE(crtc_ww_class);

/**
* drm_modeset_lock_all - take all modeset locks
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 39fda195bf78..3880813b7db5 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -8,6 +8,8 @@
*
* Wound/wait implementation:
* Copyright (C) 2013 Canonical Ltd.
+ * Choice of algorithm:
+ * Copyright (C) 2018 VMware Inc.
*
* This file contains the main data structure and API definitions.
*/
@@ -23,15 +25,17 @@ struct ww_class {
struct lock_class_key mutex_key;
const char *acquire_name;
const char *mutex_name;
+ unsigned int is_wait_die;
};

struct ww_acquire_ctx {
struct task_struct *task;
unsigned long stamp;
unsigned acquired;
+ unsigned int wounded;
+ struct ww_class *ww_class;
#ifdef CONFIG_DEBUG_MUTEXES
unsigned done_acquire;
- struct ww_class *ww_class;
struct ww_mutex *contending_lock;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -58,17 +62,21 @@ struct ww_mutex {
# define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
#endif

-#define __WW_CLASS_INITIALIZER(ww_class) \
+#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die) \
{ .stamp = ATOMIC_LONG_INIT(0) \
, .acquire_name = #ww_class "_acquire" \
- , .mutex_name = #ww_class "_mutex" }
+ , .mutex_name = #ww_class "_mutex" \
+ , .is_wait_die = _is_wait_die }

#define __WW_MUTEX_INITIALIZER(lockname, class) \
{ .base = __MUTEX_INITIALIZER(lockname.base) \
__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }

#define DEFINE_WW_CLASS(classname) \
- struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
+ struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0)
+
+#define DEFINE_WW_CLASS_WDIE(classname) \
+ struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1)

#define DEFINE_WW_MUTEX(mutexname, ww_class) \
struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
@@ -123,8 +131,9 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
ctx->task = current;
ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
ctx->acquired = 0;
-#ifdef CONFIG_DEBUG_MUTEXES
ctx->ww_class = ww_class;
+ ctx->wounded = false;
+#ifdef CONFIG_DEBUG_MUTEXES
ctx->done_acquire = 0;
ctx->contending_lock = NULL;
#endif
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 6850ffd69125..e861c1bf0e1e 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -365,7 +365,7 @@ static struct lock_torture_ops mutex_lock_ops = {
};

#include <linux/ww_mutex.h>
-static DEFINE_WW_CLASS(torture_ww_class);
+static DEFINE_WW_CLASS_WDIE(torture_ww_class);
static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class);
static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class);
static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2048359f33d2..ffa00b5aaf03 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -290,12 +290,49 @@ __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
(a->stamp != b->stamp || a > b);
}

+/*
+ * Wound the lock holder transaction if it's younger than the contending
+ * transaction, and there is a possibility of a deadlock.
+ * Also if the lock holder transaction isn't the current transaction,
+ * make sure it's woken up in case it's sleeping on another ww mutex.
+ */
+static bool __ww_mutex_wound(struct mutex *lock,
+ struct ww_acquire_ctx *ww_ctx,
+ struct ww_acquire_ctx *hold_ctx)
+{
+ struct task_struct *owner = __mutex_owner(lock);
+
+ lockdep_assert_held(&lock->wait_lock);
+
+ if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
+ ww_ctx->acquired > 0) {
+ hold_ctx->wounded = 1;
+
+ /*
+ * wake_up_process() paired with set_current_state() inserts
+ * sufficient barriers to make sure @owner either sees it's
+ * wounded or has a wakeup pending to re-read the wounded
+ * state.
+ *
+ * The value of hold_ctx->wounded is then read in
+ * __ww_mutex_lock_check_stamp().
+ */
+ if (owner != current)
+ wake_up_process(owner);
+
+ return true;
+ }
+
+ return false;
+}
+
/*
* Wake up any waiters that may have to back off when the lock is held by the
* given context.
*
* Due to the invariants on the wait list, this can only affect the first
- * waiter with a context.
+ * waiter with a context, unless the Wound-Wait algorithm is used, in which
+ * case subsequent waiters with a context may also wound the lock holder.
*
* The current task must not be on the wait list.
*/
@@ -303,6 +340,7 @@ static void __sched
__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
{
struct mutex_waiter *cur;
+ unsigned int is_wait_die = ww_ctx->ww_class->is_wait_die;

lockdep_assert_held(&lock->wait_lock);

@@ -310,13 +348,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
if (!cur->ww_ctx)
continue;

- if (cur->ww_ctx->acquired > 0 &&
+ if (is_wait_die && cur->ww_ctx->acquired > 0 &&
__ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
debug_mutex_wake_waiter(lock, cur);
wake_up_process(cur->task);
}

- break;
+ if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+ break;
}
}

@@ -338,12 +377,18 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
* and keep spinning, or it will acquire wait_lock, add itself
* to waiter list and sleep.
*/
- smp_mb(); /* ^^^ */
+ smp_mb(); /* See comments above and below. */

/*
- * Check if lock is contended, if not there is nobody to wake up
+ * Check if lock is contended, if not there is nobody to wake up.
+ * We can use list_empty() unlocked here since it only compares a
+ * list_head field pointer to the address of the list head
+ * itself, similarly to how list_empty() can be considered RCU-safe.
+ * The memory barrier above pairs with the memory barrier in
+ * __ww_mutex_add_waiter and makes sure lock->ctx is visible before
+ * we check for waiters.
*/
- if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
+ if (likely(list_empty(&lock->base.wait_list)))
return;

/*
@@ -653,6 +698,13 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
struct mutex_waiter *cur;

+ if (!ctx->ww_class->is_wait_die) {
+ if (ctx->wounded)
+ goto deadlock;
+ else
+ return 0;
+ }
+
if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
goto deadlock;

@@ -683,16 +735,21 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
{
struct mutex_waiter *cur;
struct list_head *pos;
+ unsigned int is_wait_die;

if (!ww_ctx) {
list_add_tail(&waiter->list, &lock->wait_list);
return 0;
}

+ is_wait_die = ww_ctx->ww_class->is_wait_die;
+
/*
* Add the waiter before the first waiter with a higher stamp.
* Waiters without a context are skipped to avoid starving
- * them.
+ * them. Wait-Die waiters may back off here. Wound-Wait waiters
+ * never back off here, but they are sorted in stamp order and
+ * may wound the lock holder.
*/
pos = &lock->wait_list;
list_for_each_entry_reverse(cur, &lock->wait_list, list) {
@@ -701,7 +758,7 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,

if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
/* Back off immediately if necessary. */
- if (ww_ctx->acquired > 0) {
+ if (is_wait_die && ww_ctx->acquired > 0) {
#ifdef CONFIG_DEBUG_MUTEXES
struct ww_mutex *ww;

@@ -721,13 +778,28 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
* Wake up the waiter so that it gets a chance to back
* off.
*/
- if (cur->ww_ctx->acquired > 0) {
+ if (is_wait_die && cur->ww_ctx->acquired > 0) {
debug_mutex_wake_waiter(lock, cur);
wake_up_process(cur->task);
}
}

list_add_tail(&waiter->list, pos);
+ if (!is_wait_die) {
+ struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
+
+ /*
+ * Make sure a racing lock taker sees a non-empty waiting list
+ * before we read ww->ctx, so that if we miss ww->ctx, the
+ * racing lock taker will see a non-empty list and call
+ * __ww_mutex_wakeup_for_backoff() and wound itself. The
+ * memory barrier pairs with the one in
+ * ww_mutex_set_context_fastpath().
+ */
+ smp_mb();
+ __ww_mutex_wound(lock, ww_ctx, ww->ctx);
+ }
+
return 0;
}

@@ -750,6 +822,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
if (use_ww_ctx && ww_ctx) {
if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
return -EALREADY;
+
+ /*
+ * Reset the wounded flag after a backoff.
+ * No other process can race and wound us here since they
+ * can't have a valid owner pointer at this time.
+ */
+ if (ww_ctx->acquired == 0)
+ ww_ctx->wounded = 0;
}

preempt_disable();
@@ -858,6 +938,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
acquired:
__set_current_state(TASK_RUNNING);

+ /* We stole the lock. Need to check wounded status. */
+ if (use_ww_ctx && ww_ctx && !ww_ctx->ww_class->is_wait_die &&
+ !__mutex_waiter_is_first(lock, &waiter))
+ __ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+
mutex_remove_waiter(lock, &waiter, current);
if (likely(list_empty(&lock->wait_list)))
__mutex_clear_flag(lock, MUTEX_FLAGS);
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 0e4cd64ad2c0..3413430611d8 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -26,7 +26,7 @@
#include <linux/slab.h>
#include <linux/ww_mutex.h>

-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS_WDIE(ww_class);
struct workqueue_struct *wq;

struct test_mutex {
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index b5c1293ce147..d0abf65ba9ad 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -29,7 +29,7 @@
*/
static unsigned int debug_locks_verbose;

-static DEFINE_WW_CLASS(ww_lockdep);
+static DEFINE_WW_CLASS_WDIE(ww_lockdep);

static int __init setup_debug_locks_verbose(char *str)
{
--
2.14.3


2018-06-14 10:40:16

by Andrea Parri

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

Hi Thomas,

On Thu, Jun 14, 2018 at 09:29:21AM +0200, Thomas Hellstrom wrote:
> The current Wound-Wait mutex algorithm is actually not Wound-Wait but
> Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
> is, contrary to Wait-Die a preemptive algorithm and is known to generate
> fewer backoffs. Testing reveals that this is true if the
> number of simultaneous contending transactions is small.
> As the number of simultaneous contending threads increases, Wait-Wound
> becomes inferior to Wait-Die in terms of elapsed time.
> Possibly due to the larger number of held locks of sleeping transactions.
> [snip]
> +/*
> + * Wound the lock holder transaction if it's younger than the contending
> + * transaction, and there is a possibility of a deadlock.
> + * Also if the lock holder transaction isn't the current transaction,
> + * make sure it's woken up in case it's sleeping on another ww mutex.
> + */
> +static bool __ww_mutex_wound(struct mutex *lock,
> + struct ww_acquire_ctx *ww_ctx,
> + struct ww_acquire_ctx *hold_ctx)
> +{
> + struct task_struct *owner = __mutex_owner(lock);
> +
> + lockdep_assert_held(&lock->wait_lock);
> +
> + if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
> + ww_ctx->acquired > 0) {
> + hold_ctx->wounded = 1;
> +
> + /*
> + * wake_up_process() paired with set_current_state() inserts
> + * sufficient barriers to make sure @owner either sees it's
> + * wounded or has a wakeup pending to re-read the wounded
> + * state.

IIUC, "sufficient barriers" = full memory barriers (here). (You may
want to be more specific.)

> + *
> + * The value of hold_ctx->wounded in
> + * __ww_mutex_lock_check_stamp();

Missing parts/incomplete sentence?

Andrea


> + */
> + if (owner != current)
> + wake_up_process(owner);
> +
> + return true;
> + }
> +
> + return false;
> +}
> +
> /*
> * Wake up any waiters that may have to back off when the lock is held by the
> * given context.
> *
> * Due to the invariants on the wait list, this can only affect the first
> - * waiter with a context.
> + * waiter with a context, unless the Wound-Wait algorithm is used, where
> + * subsequent waiters with a context may also wound the lock holder.
> *
> * The current task must not be on the wait list.
> */
> @@ -303,6 +340,7 @@ static void __sched
> __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> {
> struct mutex_waiter *cur;
> + unsigned int is_wait_die = ww_ctx->ww_class->is_wait_die;
>
> lockdep_assert_held(&lock->wait_lock);
>
> @@ -310,13 +348,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> if (!cur->ww_ctx)
> continue;
>
> - if (cur->ww_ctx->acquired > 0 &&
> + if (is_wait_die && cur->ww_ctx->acquired > 0 &&
> __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
> debug_mutex_wake_waiter(lock, cur);
> wake_up_process(cur->task);
> }
>
> - break;
> + if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
> + break;
> }
> }
>
> @@ -338,12 +377,18 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
> * and keep spinning, or it will acquire wait_lock, add itself
> * to waiter list and sleep.
> */
> - smp_mb(); /* ^^^ */
> + smp_mb(); /* See comments above and below. */
>
> /*
> - * Check if lock is contended, if not there is nobody to wake up
> + * Check if lock is contended, if not there is nobody to wake up.
> + * We can use list_empty() unlocked here since it only compares a
> + * list_head field pointer to the address of the list head
> + * itself, similarly to how list_empty() can be considered RCU-safe.
> + * The memory barrier above pairs with the memory barrier in
> + * __ww_mutex_add_waiter and makes sure lock->ctx is visible before
> + * we check for waiters.
> */
> - if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
> + if (likely(list_empty(&lock->base.wait_list)))
> return;
>
> /*
> @@ -653,6 +698,13 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
> struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
> struct mutex_waiter *cur;
>
> + if (!ctx->ww_class->is_wait_die) {
> + if (ctx->wounded)
> + goto deadlock;
> + else
> + return 0;
> + }
> +
> if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
> goto deadlock;
>
> @@ -683,16 +735,21 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
> {
> struct mutex_waiter *cur;
> struct list_head *pos;
> + unsigned int is_wait_die;
>
> if (!ww_ctx) {
> list_add_tail(&waiter->list, &lock->wait_list);
> return 0;
> }
>
> + is_wait_die = ww_ctx->ww_class->is_wait_die;
> +
> /*
> * Add the waiter before the first waiter with a higher stamp.
> * Waiters without a context are skipped to avoid starving
> - * them.
> + * them. Wait-Die waiters may back off here. Wound-Wait waiters
> + * never back off here, but they are sorted in stamp order and
> + * may wound the lock holder.
> */
> pos = &lock->wait_list;
> list_for_each_entry_reverse(cur, &lock->wait_list, list) {
> @@ -701,7 +758,7 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>
> if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
> /* Back off immediately if necessary. */
> - if (ww_ctx->acquired > 0) {
> + if (is_wait_die && ww_ctx->acquired > 0) {
> #ifdef CONFIG_DEBUG_MUTEXES
> struct ww_mutex *ww;
>
> @@ -721,13 +778,28 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
> * Wake up the waiter so that it gets a chance to back
> * off.
> */
> - if (cur->ww_ctx->acquired > 0) {
> + if (is_wait_die && cur->ww_ctx->acquired > 0) {
> debug_mutex_wake_waiter(lock, cur);
> wake_up_process(cur->task);
> }
> }
>
> list_add_tail(&waiter->list, pos);
> + if (!is_wait_die) {
> + struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
> +
> + /*
> + * Make sure a racing lock taker sees a non-empty waiting list
> + * before we read ww->ctx, so that if we miss ww->ctx, the
> + * racing lock taker will see a non-empty list and call
> + * __ww_mutex_wake_up_for_backoff() and wound itself. The
> + * memory barrier pairs with the one in
> + * ww_mutex_set_context_fastpath().
> + */
> + smp_mb();
> + __ww_mutex_wound(lock, ww_ctx, ww->ctx);
> + }
> +
> return 0;
> }
>
> @@ -750,6 +822,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> if (use_ww_ctx && ww_ctx) {
> if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
> return -EALREADY;
> +
> + /*
> + * Reset the wounded flag after a backoff.
> + * No other process can race and wound us here since they
> + * can't have a valid owner pointer at this time
> + */
> + if (ww_ctx->acquired == 0)
> + ww_ctx->wounded = 0;
> }
>
> preempt_disable();
> @@ -858,6 +938,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> acquired:
> __set_current_state(TASK_RUNNING);
>
> + /* We stole the lock. Need to check wounded status. */
> + if (use_ww_ctx && ww_ctx && !ww_ctx->ww_class->is_wait_die &&
> + !__mutex_waiter_is_first(lock, &waiter))
> + __ww_mutex_wakeup_for_backoff(lock, ww_ctx);
> +
> mutex_remove_waiter(lock, &waiter, current);
> if (likely(list_empty(&lock->wait_list)))
> __mutex_clear_flag(lock, MUTEX_FLAGS);
> diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
> index 0e4cd64ad2c0..3413430611d8 100644
> --- a/kernel/locking/test-ww_mutex.c
> +++ b/kernel/locking/test-ww_mutex.c
> @@ -26,7 +26,7 @@
> #include <linux/slab.h>
> #include <linux/ww_mutex.h>
>
> -static DEFINE_WW_CLASS(ww_class);
> +static DEFINE_WW_CLASS_WDIE(ww_class);
> struct workqueue_struct *wq;
>
> struct test_mutex {
> diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
> index b5c1293ce147..d0abf65ba9ad 100644
> --- a/lib/locking-selftest.c
> +++ b/lib/locking-selftest.c
> @@ -29,7 +29,7 @@
> */
> static unsigned int debug_locks_verbose;
>
> -static DEFINE_WW_CLASS(ww_lockdep);
> +static DEFINE_WW_CLASS_WDIE(ww_lockdep);
>
> static int __init setup_debug_locks_verbose(char *str)
> {
> --
> 2.14.3
>
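
[Editorial aside: the wound decision in the __ww_mutex_wound() hunk above can be
modeled in user space. All names below are invented for illustration; the stamp
comparison is simplified relative to __ww_ctx_stamp_after() and the owner lookup
and wakeup handling are omitted.]

```c
#include <stdbool.h>

/* Invented stand-in for struct ww_acquire_ctx. */
struct model_ctx {
	unsigned long stamp;	/* lower stamp == older transaction */
	unsigned int acquired;	/* ww mutexes already held */
	unsigned int wounded;
};

/* Simplified model of __ww_ctx_stamp_after(): true if @a is younger than @b. */
bool model_stamp_after(const struct model_ctx *a, const struct model_ctx *b)
{
	return (long)(a->stamp - b->stamp) > 0;
}

/*
 * An older contender that already holds locks wounds a younger holder;
 * otherwise nothing happens and the caller keeps scanning the wait list.
 */
bool model_wound(struct model_ctx *contender, struct model_ctx *holder)
{
	if (holder && model_stamp_after(holder, contender) &&
	    contender->acquired > 0) {
		holder->wounded = 1;
		return true;
	}
	return false;
}
```

In the kernel code the wound additionally wakes the owner when it is not the
current task; the model only records the wounded state.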

2018-06-14 11:12:59

by Thomas Hellstrom

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

On 06/14/2018 12:38 PM, Andrea Parri wrote:
> Hi Thomas,
>
> On Thu, Jun 14, 2018 at 09:29:21AM +0200, Thomas Hellstrom wrote:
>> The current Wound-Wait mutex algorithm is actually not Wound-Wait but
>> Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
>> is, contrary to Wait-Die, a preemptive algorithm and is known to generate
>> fewer backoffs. Testing reveals that this is true if the
>> number of simultaneous contending transactions is small.
>> As the number of simultaneous contending threads increases, Wound-Wait
>> becomes inferior to Wait-Die in terms of elapsed time.
>> Possibly due to the larger number of held locks of sleeping transactions.
>>
>> Update documentation and callers.
>>
>> Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
>> tag patch-18-06-14
>>
>> Each thread runs 100000 batches of lock / unlock 800 ww mutexes randomly
>> chosen out of 100000. Four core Intel x86_64:
>>
>> Algorithm #threads Rollbacks time
>> Wound-Wait 4 ~100 ~17s.
>> Wait-Die 4 ~150000 ~19s.
>> Wound-Wait 16 ~360000 ~109s.
>> Wait-Die 16 ~450000 ~82s.
>>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Jonathan Corbet <[email protected]>
>> Cc: Gustavo Padovan <[email protected]>
>> Cc: Maarten Lankhorst <[email protected]>
>> Cc: Sean Paul <[email protected]>
>> Cc: David Airlie <[email protected]>
>> Cc: Davidlohr Bueso <[email protected]>
>> Cc: "Paul E. McKenney" <[email protected]>
>> Cc: Josh Triplett <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Kate Stewart <[email protected]>
>> Cc: Philippe Ombredanne <[email protected]>
>> Cc: Greg Kroah-Hartman <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Thomas Hellstrom <[email protected]>
>>
>> ---
>> v2:
>> * Update API according to review comment by Greg Kroah-Hartman.
>> * Address review comments by Peter Zijlstra:
>> - Avoid _Bool in composites
>> - Fix typo
>> - Use __mutex_owner() where applicable
>> - Rely on built-in barriers for the main loop exit condition,
>> struct ww_acquire_ctx::wounded. Update code comments.
>> - Explain unlocked use of list_empty().
>> ---
>> Documentation/locking/ww-mutex-design.txt | 54 ++++++++++++----
>> drivers/dma-buf/reservation.c | 2 +-
>> drivers/gpu/drm/drm_modeset_lock.c | 2 +-
>> include/linux/ww_mutex.h | 19 ++++--
>> kernel/locking/locktorture.c | 2 +-
>> kernel/locking/mutex.c | 103 +++++++++++++++++++++++++++---
>> kernel/locking/test-ww_mutex.c | 2 +-
>> lib/locking-selftest.c | 2 +-
>> 8 files changed, 156 insertions(+), 30 deletions(-)
>>
>> diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt
>> index 34c3a1b50b9a..b9597def9581 100644
>> --- a/Documentation/locking/ww-mutex-design.txt
>> +++ b/Documentation/locking/ww-mutex-design.txt
>> @@ -1,4 +1,4 @@
>> -Wait/Wound Deadlock-Proof Mutex Design
>> +Wound/Wait Deadlock-Proof Mutex Design
>> ======================================
>>
>> Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
>> @@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the
>> younger task) unlocks all of the buffers that it has already locked, and then
>> tries again.
>>
>> -In the RDBMS literature this deadlock handling approach is called wait/wound:
>> -The older tasks waits until it can acquire the contended lock. The younger tasks
>> -needs to back off and drop all the locks it is currently holding, i.e. the
>> -younger task is wounded.
>> +In the RDBMS literature, a reservation ticket is associated with a transaction,
>> +and the deadlock handling approach is called Wait-Die. The name is based on
>> +the actions of a locking thread when it encounters an already locked mutex.
>> +If the transaction holding the lock is younger, the locking transaction waits.
>> +If the transaction holding the lock is older, the locking transaction backs off
>> +and dies. Hence Wait-Die.
>> +There is also another algorithm called Wound-Wait:
>> +If the transaction holding the lock is younger, the locking transaction
>> +preempts the transaction holding the lock, requiring it to back off. It
>> +Wounds the other transaction.
>> +If the transaction holding the lock is older, it waits for the other
>> +transaction. Hence Wound-Wait.
>> +The two algorithms are both fair in that a transaction will eventually succeed.
>> +However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
>> +compared to Wait-Die, but is, on the other hand, associated with more work than
>> +Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
>> +algorithm which requires a reliable way to preempt another transaction.
>>
>> Concepts
>> --------
>> @@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is important the a task
>> trying to acquire locks doesn't grab a new reservation id, but keeps the one it
>> acquired when starting the lock acquisition. This ticket is stored in the
>> acquire context. Furthermore the acquire context keeps track of debugging state
>> -to catch w/w mutex interface abuse.
>> +to catch w/w mutex interface abuse. An acquire context represents a
>> +transaction.
>>
>> W/w class: In contrast to normal mutexes the lock class needs to be explicit for
>> -w/w mutexes, since it is required to initialize the acquire context.
>> +w/w mutexes, since it is required to initialize the acquire context. The lock
>> +class also specifies what algorithm to use, Wound-Wait or Wait-Die.
>>
>> Furthermore there are three different class of w/w lock acquire functions:
>>
>> @@ -90,6 +105,12 @@ provided.
>> Usage
>> -----
>>
>> +The algorithm (Wait-Die vs Wound-Wait) is chosen by using either
>> +DEFINE_WW_CLASS_WDIE() for Wait-Die or DEFINE_WW_CLASS() for Wound-Wait.
>> +As a rough rule of thumb, use Wound-Wait iff you typically expect the number
>> +of simultaneous competing transactions to be small, and the rollback cost can
>> +be substantial.
>> +
>> Three different ways to acquire locks within the same w/w class. Common
>> definitions for methods #1 and #2:
>>
>> @@ -312,12 +333,23 @@ Design:
>> We maintain the following invariants for the wait list:
>> (1) Waiters with an acquire context are sorted by stamp order; waiters
>> without an acquire context are interspersed in FIFO order.
>> - (2) Among waiters with contexts, only the first one can have other locks
>> - acquired already (ctx->acquired > 0). Note that this waiter may come
>> - after other waiters without contexts in the list.
>> + (2) For Wait-Die, among waiters with contexts, only the first one can have
>> + other locks acquired already (ctx->acquired > 0). Note that this waiter
>> + may come after other waiters without contexts in the list.
>> +
>> + The Wound-Wait preemption is implemented with a lazy-preemption scheme:
>> + The wounded status of the transaction is checked only when there is
>> + contention for a new lock and hence a true chance of deadlock. In that
>> + situation, if the transaction is wounded, it backs off, clears the
>> + wounded status and retries. A great benefit of implementing preemption in
>> + this way is that the wounded transaction can identify a contending lock to
>> + wait for before restarting the transaction. Just blindly restarting the
>> + transaction would likely make the transaction end up in a situation where
>> + it would have to back off again.
>>
>> In general, not much contention is expected. The locks are typically used to
>> - serialize access to resources for devices.
>> + serialize access to resources for devices, and optimization focus should
>> + therefore be directed towards the uncontended cases.
>>
>> Lockdep:
>> Special care has been taken to warn for as many cases of api abuse
>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>> index 314eb1071cce..b94a4bab2ecd 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -46,7 +46,7 @@
>> * write-side updates.
>> */
>>
>> -DEFINE_WW_CLASS(reservation_ww_class);
>> +DEFINE_WW_CLASS_WDIE(reservation_ww_class);
>> EXPORT_SYMBOL(reservation_ww_class);
>>
>> struct lock_class_key reservation_seqcount_class;
>> diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
>> index 8a5100685875..ff00a814f617 100644
>> --- a/drivers/gpu/drm/drm_modeset_lock.c
>> +++ b/drivers/gpu/drm/drm_modeset_lock.c
>> @@ -70,7 +70,7 @@
>> * lists and lookup data structures.
>> */
>>
>> -static DEFINE_WW_CLASS(crtc_ww_class);
>> +static DEFINE_WW_CLASS_WDIE(crtc_ww_class);
>>
>> /**
>> * drm_modeset_lock_all - take all modeset locks
>> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
>> index 39fda195bf78..3880813b7db5 100644
>> --- a/include/linux/ww_mutex.h
>> +++ b/include/linux/ww_mutex.h
>> @@ -8,6 +8,8 @@
>> *
>> * Wound/wait implementation:
>> * Copyright (C) 2013 Canonical Ltd.
>> + * Choice of algorithm:
>> + * Copyright (C) 2018 VMware Inc.
>> *
>> * This file contains the main data structure and API definitions.
>> */
>> @@ -23,15 +25,17 @@ struct ww_class {
>> struct lock_class_key mutex_key;
>> const char *acquire_name;
>> const char *mutex_name;
>> + unsigned int is_wait_die;
>> };
>>
>> struct ww_acquire_ctx {
>> struct task_struct *task;
>> unsigned long stamp;
>> unsigned acquired;
>> + unsigned int wounded;
>> + struct ww_class *ww_class;
>> #ifdef CONFIG_DEBUG_MUTEXES
>> unsigned done_acquire;
>> - struct ww_class *ww_class;
>> struct ww_mutex *contending_lock;
>> #endif
>> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>> @@ -58,17 +62,21 @@ struct ww_mutex {
>> # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
>> #endif
>>
>> -#define __WW_CLASS_INITIALIZER(ww_class) \
>> +#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die) \
>> { .stamp = ATOMIC_LONG_INIT(0) \
>> , .acquire_name = #ww_class "_acquire" \
>> - , .mutex_name = #ww_class "_mutex" }
>> + , .mutex_name = #ww_class "_mutex" \
>> + , .is_wait_die = _is_wait_die }
>>
>> #define __WW_MUTEX_INITIALIZER(lockname, class) \
>> { .base = __MUTEX_INITIALIZER(lockname.base) \
>> __WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
>>
>> #define DEFINE_WW_CLASS(classname) \
>> - struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
>> + struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0)
>> +
>> +#define DEFINE_WW_CLASS_WDIE(classname) \
>> + struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1)
>>
>> #define DEFINE_WW_MUTEX(mutexname, ww_class) \
>> struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
>> @@ -123,8 +131,9 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
>> ctx->task = current;
>> ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
>> ctx->acquired = 0;
>> -#ifdef CONFIG_DEBUG_MUTEXES
>> ctx->ww_class = ww_class;
>> + ctx->wounded = false;
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> ctx->done_acquire = 0;
>> ctx->contending_lock = NULL;
>> #endif
>> diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
>> index 6850ffd69125..e861c1bf0e1e 100644
>> --- a/kernel/locking/locktorture.c
>> +++ b/kernel/locking/locktorture.c
>> @@ -365,7 +365,7 @@ static struct lock_torture_ops mutex_lock_ops = {
>> };
>>
>> #include <linux/ww_mutex.h>
>> -static DEFINE_WW_CLASS(torture_ww_class);
>> +static DEFINE_WW_CLASS_WDIE(torture_ww_class);
>> static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class);
>> static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class);
>> static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class);
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 2048359f33d2..ffa00b5aaf03 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -290,12 +290,49 @@ __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
>> (a->stamp != b->stamp || a > b);
>> }
>>
>> +/*
>> + * Wound the lock holder transaction if it's younger than the contending
>> + * transaction, and there is a possibility of a deadlock.
>> + * Also if the lock holder transaction isn't the current transaction,
>> + * make sure it's woken up in case it's sleeping on another ww mutex.
>> + */
>> +static bool __ww_mutex_wound(struct mutex *lock,
>> + struct ww_acquire_ctx *ww_ctx,
>> + struct ww_acquire_ctx *hold_ctx)
>> +{
>> + struct task_struct *owner = __mutex_owner(lock);
>> +
>> + lockdep_assert_held(&lock->wait_lock);
>> +
>> + if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
>> + ww_ctx->acquired > 0) {
>> + hold_ctx->wounded = 1;
>> +
>> + /*
>> + * wake_up_process() paired with set_current_state() inserts
>> + * sufficient barriers to make sure @owner either sees it's
>> + * wounded or has a wakeup pending to re-read the wounded
>> + * state.
> IIUC, "sufficient barriers" = full memory barriers (here). (You may
> want to be more specific.)

Thanks for reviewing!
OK. What about if someone relaxes that in the future? I mean, what we
care about in this code is just that we have sufficient barriers for
that statement to be true, regardless what type of barriers those really
are?
>
>> + *
>> + * The value of hold_ctx->wounded in
>> + * __ww_mutex_lock_check_stamp();
> Missing parts/incomplete sentence?

Oops. I'll fix in next version.


>
> Andrea
Thanks,
Thomas
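
[Editorial aside: the lazy-preemption rule under discussion, where a wounded
Wound-Wait transaction only fails with -EDEADLK once it actually contends a
lock, can be sketched as below. Names are invented and the Wait-Die stamp
checks are elided.]

```c
#include <errno.h>

/* Invented stand-in for struct ww_acquire_ctx. */
struct model_ctx {
	unsigned int wounded;
	unsigned int is_wait_die;
};

/*
 * Models the Wound-Wait branch added to __ww_mutex_lock_check_stamp():
 * this path is only reached on contention, so a wounded transaction backs
 * off here and has a natural lock to wait for before restarting.
 */
int model_check_stamp(struct model_ctx *ctx)
{
	if (!ctx->is_wait_die)
		return ctx->wounded ? -EDEADLK : 0;
	/* The Wait-Die stamp comparison against the holder would go here. */
	return 0;
}
```

Returning -EDEADLK only on contention avoids the back-to-back restarts that
would occur if a wounded transaction aborted on its very next, possibly
uncontended, lock attempt.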



2018-06-14 11:37:46

by Peter Zijlstra

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

On Thu, Jun 14, 2018 at 09:29:21AM +0200, Thomas Hellstrom wrote:

> __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> {
> struct mutex_waiter *cur;
> + unsigned int is_wait_die = ww_ctx->ww_class->is_wait_die;
>
> lockdep_assert_held(&lock->wait_lock);
>
> @@ -310,13 +348,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> if (!cur->ww_ctx)
> continue;
>
> - if (cur->ww_ctx->acquired > 0 &&
> + if (is_wait_die && cur->ww_ctx->acquired > 0 &&
> __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
> debug_mutex_wake_waiter(lock, cur);
> wake_up_process(cur->task);
> }
>
> - break;
> + if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
> + break;
> }
> }

I ended up with:


static void __sched
__ww_mutex_check_waiters(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
{
	bool is_wait_die = ww_ctx->ww_class->is_wait_die;
	struct mutex_waiter *cur;

	lockdep_assert_held(&lock->wait_lock);

	list_for_each_entry(cur, &lock->wait_list, list) {
		if (!cur->ww_ctx)
			continue;

		if (is_wait_die) {
			/*
			 * Because __ww_mutex_add_waiter() and
			 * __ww_mutex_check_stamp() wake any but the earliest
			 * context, this can only affect the first waiter (with
			 * a context).
			 */
			if (cur->ww_ctx->acquired > 0 &&
			    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
				debug_mutex_wake_waiter(lock, cur);
				wake_up_process(cur->task);
			}

			break;
		}

		if (__ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
			break;
	}
}


Currently you don't allow mixing WD and WW contexts (which is not
immediately obvious from the above code), and the above hard relies on
that. Are there sensible use cases for mixing them? IOW will your
current restriction stand without hassle?

2018-06-14 11:51:55

by Andrea Parri

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

[...]

> >>+ /*
> >>+ * wake_up_process() paired with set_current_state() inserts
> >>+ * sufficient barriers to make sure @owner either sees it's
> >>+ * wounded or has a wakeup pending to re-read the wounded
> >>+ * state.
> >IIUC, "sufficient barriers" = full memory barriers (here). (You may
> >want to be more specific.)
>
> Thanks for reviewing!
> OK. What about if someone relaxes that in the future?

This is actually one of my main concerns ;-) as, IIUC, those barriers are
not only sufficient but also necessary: anything "less than a full barrier"
(in either wake_up_process() or set_current_state()) would _not_ guarantee
the "condition" above unless I'm misunderstanding it.

But am I misunderstanding it? Which barriers/guarantee do you _need_ from
the above mentioned pairing? (hence my comment...)

Andrea

2018-06-14 11:55:23

by Thomas Hellstrom

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

On 06/14/2018 01:36 PM, Peter Zijlstra wrote:
> On Thu, Jun 14, 2018 at 09:29:21AM +0200, Thomas Hellstrom wrote:
>
>> __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>> {
>> struct mutex_waiter *cur;
>> + unsigned int is_wait_die = ww_ctx->ww_class->is_wait_die;
>>
>> lockdep_assert_held(&lock->wait_lock);
>>
>> @@ -310,13 +348,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>> if (!cur->ww_ctx)
>> continue;
>>
>> - if (cur->ww_ctx->acquired > 0 &&
>> + if (is_wait_die && cur->ww_ctx->acquired > 0 &&
>> __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
>> debug_mutex_wake_waiter(lock, cur);
>> wake_up_process(cur->task);
>> }
>>
>> - break;
>> + if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
>> + break;
>> }
>> }
> I ended up with:
>
>
> static void __sched
> __ww_mutex_check_waiters(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> {
> 	bool is_wait_die = ww_ctx->ww_class->is_wait_die;
> 	struct mutex_waiter *cur;
>
> 	lockdep_assert_held(&lock->wait_lock);
>
> 	list_for_each_entry(cur, &lock->wait_list, list) {
> 		if (!cur->ww_ctx)
> 			continue;
>
> 		if (is_wait_die) {
> 			/*
> 			 * Because __ww_mutex_add_waiter() and
> 			 * __ww_mutex_check_stamp() wake any but the earliest
> 			 * context, this can only affect the first waiter (with
> 			 * a context).
> 			 */
> 			if (cur->ww_ctx->acquired > 0 &&
> 			    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
> 				debug_mutex_wake_waiter(lock, cur);
> 				wake_up_process(cur->task);
> 			}
>
> 			break;
> 		}
>
> 		if (__ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
> 			break;
> 	}
> }

Looks OK to me.

>
> Currently you don't allow mixing WD and WW contexts (which is not
> immediately obvious from the above code), and the above hard relies on
> that. Are there sensible use cases for mixing them? IOW will your
> current restriction stand without hassle?

Contexts _must_ agree on the algorithm used to resolve deadlocks. With
Wait-Die, for example, older transactions will wait if a lock is held by
a younger transaction and with Wound-Wait, younger transactions will
wait if a lock is held by an older transaction so there is no way of
mixing them.

Thanks,

/Thomas
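
[Editorial aside: the point that all contexts must agree on the algorithm can be
made concrete with a toy decision table. Names are invented; lower stamp means
older transaction.]

```c
enum model_action { MODEL_WAIT, MODEL_BACK_OFF, MODEL_WOUND_HOLDER };

/* Wait-Die: an older requester waits; a younger requester backs off (dies). */
enum model_action model_wait_die(unsigned long req, unsigned long hold)
{
	return req < hold ? MODEL_WAIT : MODEL_BACK_OFF;
}

/* Wound-Wait: an older requester wounds the holder; a younger one waits. */
enum model_action model_wound_wait(unsigned long req, unsigned long hold)
{
	return req < hold ? MODEL_WOUND_HOLDER : MODEL_WAIT;
}
```

Mixing the two would break deadlock avoidance: an older Wait-Die requester
waits on a younger holder, while under Wound-Wait the younger transaction also
waits when it contends a lock the older one holds, so neither side ever backs
off.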



2018-06-14 12:05:29

by Thomas Hellstrom

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

On 06/14/2018 01:49 PM, Andrea Parri wrote:
> [...]
>
>>>> + /*
>>>> + * wake_up_process() paired with set_current_state() inserts
>>>> + * sufficient barriers to make sure @owner either sees it's
>>>> + * wounded or has a wakeup pending to re-read the wounded
>>>> + * state.
>>> IIUC, "sufficient barriers" = full memory barriers (here). (You may
>>> want to be more specific.)
>> Thanks for reviewing!
>> OK. What about if someone relaxes that in the future?
> This is actually one of my main concerns ;-) as, IIUC, those barriers are
> not only sufficient but also necessary: anything "less than a full barrier"
> (in either wake_up_process() or set_current_state()) would _not_ guarantee
> the "condition" above unless I'm misunderstanding it.
>
> But am I misunderstanding it? Which barriers/guarantee do you _need_ from
> the above mentioned pairing? (hence my comment...)
>
> Andrea

No you are probably not misunderstanding me at all. My comment
originated from the reading of the kerneldoc of set_current_state()

* set_current_state() includes a barrier so that the write of current->state
* is correctly serialised wrt the caller's subsequent test of whether to
* actually sleep:
*
* for (;;) {
*	set_current_state(TASK_UNINTERRUPTIBLE);
*	if (!need_sleep)
*		break;
*
*	schedule();
* }
* __set_current_state(TASK_RUNNING);
*
* If the caller does not need such serialisation (because, for instance, the
* condition test and condition change and wakeup are under the same
* lock) then
* use __set_current_state().
*
* The above is typically ordered against the wakeup, which does:
*
* need_sleep = false;
* wake_up_state(p, TASK_UNINTERRUPTIBLE);
*
* Where wake_up_state() (and all other wakeup primitives) imply enough
* barriers to order the store of the variable against wakeup.

And with ctx->wounded := !need_sleep this exactly matches what's
happening in my code. So what I was trying to say in the comment was
that this above contract is sufficient to guarantee the "condition"
above, without me actually knowing exactly what barriers are required.

Thanks,
Thomas


2018-06-14 12:10:08

by Thomas Hellstrom

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

Resending, hopefully better formatted.


On 06/14/2018 01:49 PM, Andrea Parri wrote:
> [...]
>
>>>> + /*
>>>> + * wake_up_process() paired with set_current_state() inserts
>>>> + * sufficient barriers to make sure @owner either sees it's
>>>> + * wounded or has a wakeup pending to re-read the wounded
>>>> + * state.
>>> IIUC, "sufficient barriers" = full memory barriers (here). (You may
>>> want to be more specific.)
>> Thanks for reviewing!
>> OK. What about if someone relaxes that in the future?
> This is actually one of my main concerns ;-) as, IIUC, those barriers are
> not only sufficient but also necessary: anything "less than a full barrier"
> (in either wake_up_process() or set_current_state()) would _not_ guarantee
> the "condition" above unless I'm misunderstanding it.
>
> But am I misunderstanding it? Which barriers/guarantee do you _need_ from
> the above mentioned pairing? (hence my comment...)
>
> Andrea

No you are probably not misunderstanding me at all. My comment
originated from the reading of the kerneldoc of set_current_state()

/*
* set_current_state() includes a barrier so that the write of current->state
* is correctly serialised wrt the caller's subsequent test of whether to
* actually sleep:
*
* for (;;) {
*	set_current_state(TASK_UNINTERRUPTIBLE);
*	if (!need_sleep)
*		break;
*
*	schedule();
* }
* __set_current_state(TASK_RUNNING);
*
* If the caller does not need such serialisation (because, for instance, the
* condition test and condition change and wakeup are under the same
* lock) then
* use __set_current_state().
*
* The above is typically ordered against the wakeup, which does:
*
* need_sleep = false;
* wake_up_state(p, TASK_UNINTERRUPTIBLE);
*
* Where wake_up_state() (and all other wakeup primitives) imply enough
* barriers to order the store of the variable against wakeup.
---
*/

And with ctx->wounded := !need_sleep this exactly matches what's
happening in my code. So what I was trying to say in the comment was
that this above contract is sufficient to guarantee the "condition"
above, without me actually knowing exactly what barriers are required.

Thanks,
Thomas
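
[Editorial aside: the quoted contract is the classic store-buffering pattern.
A user-space analogue using C11 seq_cst atomics, not the kernel primitives;
all names are invented.]

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_int model_task_state;		/* 0 = RUNNING, 1 = SLEEPING */
atomic_bool model_wounded;		/* the !need_sleep condition */
atomic_bool model_wakeup_pending;

/*
 * Sleeper side: the seq_cst store models the full barrier implied by
 * set_current_state(); the condition is re-checked after that store.
 */
bool model_sleeper_must_sleep(void)
{
	atomic_store(&model_task_state, 1);	/* set_current_state() */
	if (atomic_load(&model_wounded))
		return false;			/* saw wounded, don't sleep */
	return true;				/* sleep; rely on the wakeup */
}

/*
 * Waker side: store the condition first, then model wake_up_process()
 * as reading the task state and posting a wakeup. With seq_cst the
 * outcome where the sleeper misses "wounded" *and* the waker misses the
 * sleeping state is forbidden, which is the guarantee the comment in
 * __ww_mutex_wound() relies on.
 */
void model_waker_wound(void)
{
	atomic_store(&model_wounded, true);
	if (atomic_load(&model_task_state) == 1)
		atomic_store(&model_wakeup_pending, true);
}
```

Anything weaker than a full barrier on either side would reintroduce the
store-buffering outcome, which is the concern Andrea raises above.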


2018-06-14 12:42:27

by Peter Zijlstra

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

On Thu, Jun 14, 2018 at 09:29:21AM +0200, Thomas Hellstrom wrote:
> +static bool __ww_mutex_wound(struct mutex *lock,
> + struct ww_acquire_ctx *ww_ctx,
> + struct ww_acquire_ctx *hold_ctx)
> +{
> + struct task_struct *owner = __mutex_owner(lock);
> +
> + lockdep_assert_held(&lock->wait_lock);
> +
> + if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
> + ww_ctx->acquired > 0) {
> + hold_ctx->wounded = 1;
> +
> + /*
> + * wake_up_process() paired with set_current_state() inserts
> + * sufficient barriers to make sure @owner either sees it's
> + * wounded or has a wakeup pending to re-read the wounded
> + * state.
> + *
> + * The value of hold_ctx->wounded is read in
> + * __ww_mutex_lock_check_stamp().
> + */
> + if (owner != current)
> + wake_up_process(owner);
> +
> + return true;
> + }
> +
> + return false;
> +}

> @@ -338,12 +377,18 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
> * and keep spinning, or it will acquire wait_lock, add itself
> * to waiter list and sleep.
> */
> - smp_mb(); /* ^^^ */
> + smp_mb(); /* See comments above and below. */
>
> /*
> - * Check if lock is contended, if not there is nobody to wake up
> + * Check if lock is contended, if not there is nobody to wake up.
> + * We can use list_empty() unlocked here since it only compares a
> + * list_head field pointer to the address of the list head
> + * itself, similarly to how list_empty() can be considered RCU-safe.
> + * The memory barrier above pairs with the memory barrier in
> + * __ww_mutex_add_waiter and makes sure lock->ctx is visible before
> + * we check for waiters.
> */
> - if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
> + if (likely(list_empty(&lock->base.wait_list)))
> return;
>

OK, so what happens is that if we see a !empty list, we take wait_lock,
and if we end up in __ww_mutex_wound() we must really have a !empty wait-list.

It can however still see !owner because __mutex_unlock_slowpath() can
clear the owner field. But if owner is set, it must stay valid because
FLAG_WAITERS is set and we're holding wait_lock.

So the wake_up_process() is in fact safe.

Let me put that in a comment.

2018-06-14 12:49:39

by Thomas Hellstrom

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

Hi, Peter,

On 06/14/2018 02:41 PM, Peter Zijlstra wrote:
> On Thu, Jun 14, 2018 at 09:29:21AM +0200, Thomas Hellstrom wrote:
>> +static bool __ww_mutex_wound(struct mutex *lock,
>> + struct ww_acquire_ctx *ww_ctx,
>> + struct ww_acquire_ctx *hold_ctx)
>> +{
>> + struct task_struct *owner = __mutex_owner(lock);
>> +
>> + lockdep_assert_held(&lock->wait_lock);
>> +
>> + if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
>> + ww_ctx->acquired > 0) {
>> + hold_ctx->wounded = 1;
>> +
>> + /*
>> + * wake_up_process() paired with set_current_state() inserts
>> + * sufficient barriers to make sure @owner either sees it's
>> + * wounded or has a wakeup pending to re-read the wounded
>> + * state.
>> + *
>> + * The value of hold_ctx->wounded is read in
>> + * __ww_mutex_lock_check_stamp().
>> + */
>> + if (owner != current)
>> + wake_up_process(owner);
>> +
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>> @@ -338,12 +377,18 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>> * and keep spinning, or it will acquire wait_lock, add itself
>> * to waiter list and sleep.
>> */
>> - smp_mb(); /* ^^^ */
>> + smp_mb(); /* See comments above and below. */
>>
>> /*
>> - * Check if lock is contended, if not there is nobody to wake up
>> + * Check if lock is contended, if not there is nobody to wake up.
>> + * We can use list_empty() unlocked here since it only compares a
>> + * list_head field pointer to the address of the list head
>> + * itself, similarly to how list_empty() can be considered RCU-safe.
>> + * The memory barrier above pairs with the memory barrier in
>> + * __ww_mutex_add_waiter and makes sure lock->ctx is visible before
>> + * we check for waiters.
>> */
>> - if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
>> + if (likely(list_empty(&lock->base.wait_list)))
>> return;
>>
> OK, so what happens is that if we see a !empty list, we take wait_lock,
> and if we end up in __ww_mutex_wound() we must really have a !empty wait-list.
>
> It can however still see !owner because __mutex_unlock_slowpath() can
> clear the owner field. But if owner is set, it must stay valid because
> FLAG_WAITERS is set and we're holding wait_lock.

If __ww_mutex_wound() is called from ww_mutex_set_context_fastpath()
owner is the current process so we can never see !owner. However if
__ww_mutex_wound() is called from __ww_mutex_add_waiter() then the above
is true.

>
> So the wake_up_process() is in fact safe.
>
> Let me put that in a comment.


Thanks,

Thomas



2018-06-14 13:20:03

by Thomas Hellstrom

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

On 06/14/2018 02:48 PM, Thomas Hellstrom wrote:
> Hi, Peter,
>
> On 06/14/2018 02:41 PM, Peter Zijlstra wrote:
>> On Thu, Jun 14, 2018 at 09:29:21AM +0200, Thomas Hellstrom wrote:
>>> +static bool __ww_mutex_wound(struct mutex *lock,
>>> +                 struct ww_acquire_ctx *ww_ctx,
>>> +                 struct ww_acquire_ctx *hold_ctx)
>>> +{
>>> +    struct task_struct *owner = __mutex_owner(lock);
>>> +
>>> +    lockdep_assert_held(&lock->wait_lock);
>>> +
>>> +    if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
>>> +        ww_ctx->acquired > 0) {
>>> +        hold_ctx->wounded = 1;
>>> +
>>> +        /*
>>> +         * wake_up_process() paired with set_current_state() inserts
>>> +         * sufficient barriers to make sure @owner either sees it's
>>> +         * wounded or has a wakeup pending to re-read the wounded
>>> +         * state.
>>> +         *
>>> +         * The value of hold_ctx->wounded is read in
>>> +         * __ww_mutex_lock_check_stamp().
>>> +         */
>>> +        if (owner != current)
>>> +            wake_up_process(owner);
>>> +
>>> +        return true;
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> @@ -338,12 +377,18 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>>>        * and keep spinning, or it will acquire wait_lock, add itself
>>>        * to waiter list and sleep.
>>>        */
>>> -    smp_mb(); /* ^^^ */
>>> +    smp_mb(); /* See comments above and below. */
>>>
>>>     /*
>>> -     * Check if lock is contended, if not there is nobody to wake up
>>> +     * Check if lock is contended, if not there is nobody to wake up.
>>> +     * We can use list_empty() unlocked here since it only compares a
>>> +     * list_head field pointer to the address of the list head
>>> +     * itself, similarly to how list_empty() can be considered RCU-safe.
>>> +     * The memory barrier above pairs with the memory barrier in
>>> +     * __ww_mutex_add_waiter and makes sure lock->ctx is visible before
>>> +     * we check for waiters.
>>>        */
>>> -    if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
>>> +    if (likely(list_empty(&lock->base.wait_list)))
>>>           return;
>> OK, so what happens is that if we see a !empty list, we take wait_lock,
>> and if we end up in __ww_mutex_wound() we must really have a !empty wait-list.
>>
>> It can however still see !owner because __mutex_unlock_slowpath() can
>> clear the owner field. But if owner is set, it must stay valid because
>> FLAG_WAITERS is set and we're holding wait_lock.
>
> If __ww_mutex_wound() is called from ww_mutex_set_context_fastpath()
> owner is the current process so we can never see !owner. However if
> __ww_mutex_wound() is called from __ww_mutex_add_waiter() then the
> above is true.

Or actually it was intended to be true, but FLAG_WAITERS is set too
late. It needs to be moved to just after we actually add the waiter to
the list.

Then the hunk that replaces a FLAG_WAITERS read with a lockless
list_empty() can also be ditched.

/Thomas


>
>>
>> So the wake_up_process() is in fact safe.
>>
>> Let me put that in a comment.
>
>
> Thanks,
>
> Thomas
>
>


2018-06-14 13:30:00

by Matthew Wilcox

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

On Thu, Jun 14, 2018 at 01:54:15PM +0200, Thomas Hellstrom wrote:
> On 06/14/2018 01:36 PM, Peter Zijlstra wrote:
> > Currently you don't allow mixing WD and WW contexts (which is not
> > immediately obvious from the above code), and the above hard relies on
> > that. Are there sensible use cases for mixing them? IOW will your
> > current restriction stand without hassle?
>
> Contexts _must_ agree on the algorithm used to resolve deadlocks. With
> Wait-Die, for example, older transactions will wait if a lock is held by a
> younger transaction and with Wound-Wait, younger transactions will wait if a
> lock is held by an older transaction so there is no way of mixing them.

Maybe the compiler should be enforcing that; ie make it a different type?

2018-06-14 13:44:29

by Thomas Hellstrom

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

On 06/14/2018 03:29 PM, Matthew Wilcox wrote:
> On Thu, Jun 14, 2018 at 01:54:15PM +0200, Thomas Hellstrom wrote:
>> On 06/14/2018 01:36 PM, Peter Zijlstra wrote:
>>> Currently you don't allow mixing WD and WW contexts (which is not
>>> immediately obvious from the above code), and the above hard relies on
>>> that. Are there sensible use cases for mixing them? IOW will your
>>> current restriction stand without hassle?
>> Contexts _must_ agree on the algorithm used to resolve deadlocks. With
>> Wait-Die, for example, older transactions will wait if a lock is held by a
>> younger transaction and with Wound-Wait, younger transactions will wait if a
>> lock is held by an older transaction so there is no way of mixing them.
> Maybe the compiler should be enforcing that; ie make it a different type?

It's intended to be enforced by storing the algorithm choice in the
WW_MUTEX_CLASS which must be common for an acquire context and the
ww_mutexes it acquires. However, I don't think there is a check that
that holds. I guess we could add it as a DEBUG_MUTEX test in
ww_mutex_lock().

Thanks,

Thomas



2018-06-14 14:48:27

by Peter Zijlstra

Subject: Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

On Thu, Jun 14, 2018 at 03:43:04PM +0200, Thomas Hellstrom wrote:
> It's intended to be enforced by storing the algorithm choice in the
> WW_MUTEX_CLASS which must be common for an acquire context and the
> ww_mutexes it acquires. However, I don't think there is a check that that
> holds. I guess we could add it as a DEBUG_MUTEX test in ww_mutex_lock().

There is ww_mutex_lock_acquired():

DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class)

which should trigger if you try and be clever.