2014-12-28 09:12:19

by Davidlohr Bueso

Subject: [PATCH 0/8] locking: Various updates

Hello,

A little bit of everything really, with the last two patches being
the most interesting ones.

Patch 1 fixes a silly typo.
Patches 2-5 clean up a bit of ww mutex code.
Patch 6 isolates the osq code.
Patch 7 uses the brand-new READ_ONCE()/ASSIGN_ONCE() primitives.
Patch 8 is a performance patch that gets rid of barrier calls when
polling for the (osq) lock.

More details obviously in the individual patches. Applies on today's -tip.

Please consider for 3.20, thanks!

Davidlohr Bueso (8):
Documentation/memory-barriers.txt: Fix smp typo
locking/mutex: Get rid of use_ww_ctx param in common paths
locking/mutex: Checking the stamp is ww only
locking/mutex: Delete useless comment
locking/mutex: Introduce ww_mutex_set_context_slowpath
locking/mcs: Better differentiate between mcs variants
locking: Use [READ,ASSIGN]_ONCE() for non-scalar types
locking/osq: No need for load/acquire when acquire-polling

Documentation/memory-barriers.txt | 2 +-
include/linux/osq_lock.h | 12 ++-
kernel/Kconfig.locks | 4 +
kernel/locking/Makefile | 3 +-
kernel/locking/mcs_spinlock.c | 208 --------------------------------------
kernel/locking/mcs_spinlock.h | 22 +---
kernel/locking/mutex.c | 99 +++++++++---------
kernel/locking/osq_lock.c | 203 +++++++++++++++++++++++++++++++++++++
kernel/locking/rwsem-xadd.c | 4 +-
9 files changed, 276 insertions(+), 281 deletions(-)
delete mode 100644 kernel/locking/mcs_spinlock.c
create mode 100644 kernel/locking/osq_lock.c

--
2.1.2


2014-12-28 09:12:20

by Davidlohr Bueso

Subject: [PATCH 1/8] Documentation/memory-barriers.txt: Fix smp typo

Signed-off-by: Davidlohr Bueso <[email protected]>
---
Documentation/memory-barriers.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 70a09f8..7086f83 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -750,7 +750,7 @@ In summary:
However, they do -not- guarantee any other sort of ordering:
Not prior loads against later loads, nor prior stores against
later anything. If you need these other forms of ordering,
- use smb_rmb(), smp_wmb(), or, in the case of prior stores and
+ use smp_rmb(), smp_wmb(), or, in the case of prior stores and
later loads, smp_mb().

(*) If both legs of the "if" statement begin with identical stores
--
2.1.2

2014-12-28 09:12:33

by Davidlohr Bueso

Subject: [PATCH 8/8] locking/osq: No need for load/acquire when acquire-polling

Both mutexes and rwsems took a performance hit when we switched
over from the original MCS code to the cancelable variant (osq).
The culprit is the use of smp_load_acquire() when polling for
node->locked. Paul describes the scenario nicely:
https://lkml.org/lkml/2013/11/19/405

The smp_load_acquire() when unqueuing still makes sense. In addition,
we don't need to worry about leaking the critical region, as
osq is only used internally.
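
To illustrate the point (a minimal sketch of the change, not taken
verbatim from the diff below): on weakly ordered architectures the
acquire load implies ordering on every spin iteration, while a plain
READ_ONCE() is just a volatile load that keeps the compiler from
caching the value:

	/* before: acquire semantics (and the ordering it implies)
	 * paid on every iteration of the spin loop */
	while (!smp_load_acquire(&node->locked))
		cpu_relax_lowlatency();

	/* after: plain volatile load, no per-iteration ordering cost */
	while (!READ_ONCE(node->locked))
		cpu_relax_lowlatency();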

This helps at both regular and large levels of concurrency and
hardware, e.g. on a 40-core system with a disk-intensive workload
(before vs. after this patch):

workload          before                 after
disk-1           804.83 (  0.00%)      828.16 (  2.90%)
disk-61         8063.45 (  0.00%)    18181.82 (125.48%)
disk-121        7187.41 (  0.00%)    20119.17 (179.92%)
disk-181        6933.32 (  0.00%)    20509.91 (195.82%)
disk-241        6850.81 (  0.00%)    20397.80 (197.74%)
disk-301        6815.22 (  0.00%)    20287.58 (197.68%)
disk-361        7080.40 (  0.00%)    20205.22 (185.37%)
disk-421        7076.13 (  0.00%)    19957.33 (182.04%)
disk-481        7083.25 (  0.00%)    19784.06 (179.31%)
disk-541        7038.39 (  0.00%)    19610.92 (178.63%)
disk-601        7072.04 (  0.00%)    19464.53 (175.23%)
disk-661        7010.97 (  0.00%)    19348.23 (175.97%)
disk-721        7069.44 (  0.00%)    19255.33 (172.37%)
disk-781        7007.58 (  0.00%)    19103.14 (172.61%)
disk-841        6981.18 (  0.00%)    18964.22 (171.65%)
disk-901        6968.47 (  0.00%)    18826.72 (170.17%)
disk-961        6964.61 (  0.00%)    18708.02 (168.62%)

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/osq_lock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 9c6e251..d10dfb9 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -109,7 +109,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
* cmpxchg in an attempt to undo our queueing.
*/

- while (!smp_load_acquire(&node->locked)) {
+ while (!READ_ONCE(node->locked)) {
/*
* If we need to reschedule bail... so we can block.
*/
--
2.1.2

2014-12-28 09:12:56

by Davidlohr Bueso

Subject: [PATCH 6/8] locking/mcs: Better differentiate between mcs variants

We have two flavors of the MCS spinlock: standard and cancelable (osq).
While each one is independent of the other, we currently mix and match
them. This patch:

o Moves the osq code out of mcs_spinlock.h (which only deals with the traditional
version) into include/linux/osq_lock.h. No unnecessary code is added to the
more global header file, as any locks that make use of osq must include
it anyway (a short usage sketch follows below the list).

o Renames mcs_spinlock.c to osq_lock.c. This file only contains osq code.

o Introduces CONFIG_LOCK_SPIN_ON_OWNER so that osq_lock.c is only built
when there is support for it.
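
For reference, a rough usage sketch, modeled on the existing mutex/rwsem
spinning paths, of how a sleeping lock consumes the relocated API
(try_acquire() is a stand-in for the lock-specific acquisition attempt,
not an actual identifier):

	#include <linux/osq_lock.h>

	/* optimistic spinning path, preemption already disabled */
	if (!osq_lock(&lock->osq))
		goto slowpath;		/* queue too contended, block instead */

	while (!try_acquire(lock)) {	/* lock-specific acquisition attempt */
		if (need_resched())
			break;		/* bail and block */
		cpu_relax_lowlatency();
	}

	osq_unlock(&lock->osq);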

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/osq_lock.h | 12 ++-
kernel/Kconfig.locks | 4 +
kernel/locking/Makefile | 3 +-
kernel/locking/mcs_spinlock.c | 208 ------------------------------------------
kernel/locking/mcs_spinlock.h | 16 ----
kernel/locking/osq_lock.c | 203 +++++++++++++++++++++++++++++++++++++++++
6 files changed, 219 insertions(+), 227 deletions(-)
delete mode 100644 kernel/locking/mcs_spinlock.c
create mode 100644 kernel/locking/osq_lock.c

diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 90230d5..3a6490e 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -5,8 +5,11 @@
* An MCS like lock especially tailored for optimistic spinning for sleeping
* lock implementations (mutex, rwsem, etc).
*/
-
-#define OSQ_UNLOCKED_VAL (0)
+struct optimistic_spin_node {
+ struct optimistic_spin_node *next, *prev;
+ int locked; /* 1 if lock acquired */
+ int cpu; /* encoded CPU # + 1 value */
+};

struct optimistic_spin_queue {
/*
@@ -16,6 +19,8 @@ struct optimistic_spin_queue {
atomic_t tail;
};

+#define OSQ_UNLOCKED_VAL (0)
+
/* Init macro and function. */
#define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) }

@@ -24,4 +29,7 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
}

+extern bool osq_lock(struct optimistic_spin_queue *lock);
+extern void osq_unlock(struct optimistic_spin_queue *lock);
+
#endif
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 76768ee..08561f1 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -231,6 +231,10 @@ config RWSEM_SPIN_ON_OWNER
def_bool y
depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW

+config LOCK_SPIN_ON_OWNER
+ def_bool y
+ depends on MUTEX_SPIN_ON_OWNER || RWSEM_SPIN_ON_OWNER
+
config ARCH_USE_QUEUE_RWLOCK
bool

diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 8541bfd..4ca8eb1 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -1,5 +1,5 @@

-obj-y += mutex.o semaphore.o rwsem.o mcs_spinlock.o
+obj-y += mutex.o semaphore.o rwsem.o

ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = -pg
@@ -14,6 +14,7 @@ ifeq ($(CONFIG_PROC_FS),y)
obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
endif
obj-$(CONFIG_SMP) += spinlock.o
+obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
obj-$(CONFIG_SMP) += lglock.o
obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c
deleted file mode 100644
index 9887a90..0000000
--- a/kernel/locking/mcs_spinlock.c
+++ /dev/null
@@ -1,208 +0,0 @@
-#include <linux/percpu.h>
-#include <linux/sched.h>
-#include "mcs_spinlock.h"
-
-#ifdef CONFIG_SMP
-
-/*
- * An MCS like lock especially tailored for optimistic spinning for sleeping
- * lock implementations (mutex, rwsem, etc).
- *
- * Using a single mcs node per CPU is safe because sleeping locks should not be
- * called from interrupt context and we have preemption disabled while
- * spinning.
- */
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
-
-/*
- * We use the value 0 to represent "no CPU", thus the encoded value
- * will be the CPU number incremented by 1.
- */
-static inline int encode_cpu(int cpu_nr)
-{
- return cpu_nr + 1;
-}
-
-static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
-{
- int cpu_nr = encoded_cpu_val - 1;
-
- return per_cpu_ptr(&osq_node, cpu_nr);
-}
-
-/*
- * Get a stable @node->next pointer, either for unlock() or unqueue() purposes.
- * Can return NULL in case we were the last queued and we updated @lock instead.
- */
-static inline struct optimistic_spin_node *
-osq_wait_next(struct optimistic_spin_queue *lock,
- struct optimistic_spin_node *node,
- struct optimistic_spin_node *prev)
-{
- struct optimistic_spin_node *next = NULL;
- int curr = encode_cpu(smp_processor_id());
- int old;
-
- /*
- * If there is a prev node in queue, then the 'old' value will be
- * the prev node's CPU #, else it's set to OSQ_UNLOCKED_VAL since if
- * we're currently last in queue, then the queue will then become empty.
- */
- old = prev ? prev->cpu : OSQ_UNLOCKED_VAL;
-
- for (;;) {
- if (atomic_read(&lock->tail) == curr &&
- atomic_cmpxchg(&lock->tail, curr, old) == curr) {
- /*
- * We were the last queued, we moved @lock back. @prev
- * will now observe @lock and will complete its
- * unlock()/unqueue().
- */
- break;
- }
-
- /*
- * We must xchg() the @node->next value, because if we were to
- * leave it in, a concurrent unlock()/unqueue() from
- * @node->next might complete Step-A and think its @prev is
- * still valid.
- *
- * If the concurrent unlock()/unqueue() wins the race, we'll
- * wait for either @lock to point to us, through its Step-B, or
- * wait for a new @node->next from its Step-C.
- */
- if (node->next) {
- next = xchg(&node->next, NULL);
- if (next)
- break;
- }
-
- cpu_relax_lowlatency();
- }
-
- return next;
-}
-
-bool osq_lock(struct optimistic_spin_queue *lock)
-{
- struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
- struct optimistic_spin_node *prev, *next;
- int curr = encode_cpu(smp_processor_id());
- int old;
-
- node->locked = 0;
- node->next = NULL;
- node->cpu = curr;
-
- old = atomic_xchg(&lock->tail, curr);
- if (old == OSQ_UNLOCKED_VAL)
- return true;
-
- prev = decode_cpu(old);
- node->prev = prev;
- ACCESS_ONCE(prev->next) = node;
-
- /*
- * Normally @prev is untouchable after the above store; because at that
- * moment unlock can proceed and wipe the node element from stack.
- *
- * However, since our nodes are static per-cpu storage, we're
- * guaranteed their existence -- this allows us to apply
- * cmpxchg in an attempt to undo our queueing.
- */
-
- while (!smp_load_acquire(&node->locked)) {
- /*
- * If we need to reschedule bail... so we can block.
- */
- if (need_resched())
- goto unqueue;
-
- cpu_relax_lowlatency();
- }
- return true;
-
-unqueue:
- /*
- * Step - A -- stabilize @prev
- *
- * Undo our @prev->next assignment; this will make @prev's
- * unlock()/unqueue() wait for a next pointer since @lock points to us
- * (or later).
- */
-
- for (;;) {
- if (prev->next == node &&
- cmpxchg(&prev->next, node, NULL) == node)
- break;
-
- /*
- * We can only fail the cmpxchg() racing against an unlock(),
- * in which case we should observe @node->locked becomming
- * true.
- */
- if (smp_load_acquire(&node->locked))
- return true;
-
- cpu_relax_lowlatency();
-
- /*
- * Or we race against a concurrent unqueue()'s step-B, in which
- * case its step-C will write us a new @node->prev pointer.
- */
- prev = ACCESS_ONCE(node->prev);
- }
-
- /*
- * Step - B -- stabilize @next
- *
- * Similar to unlock(), wait for @node->next or move @lock from @node
- * back to @prev.
- */
-
- next = osq_wait_next(lock, node, prev);
- if (!next)
- return false;
-
- /*
- * Step - C -- unlink
- *
- * @prev is stable because its still waiting for a new @prev->next
- * pointer, @next is stable because our @node->next pointer is NULL and
- * it will wait in Step-A.
- */
-
- ACCESS_ONCE(next->prev) = prev;
- ACCESS_ONCE(prev->next) = next;
-
- return false;
-}
-
-void osq_unlock(struct optimistic_spin_queue *lock)
-{
- struct optimistic_spin_node *node, *next;
- int curr = encode_cpu(smp_processor_id());
-
- /*
- * Fast path for the uncontended case.
- */
- if (likely(atomic_cmpxchg(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr))
- return;
-
- /*
- * Second most likely case.
- */
- node = this_cpu_ptr(&osq_node);
- next = xchg(&node->next, NULL);
- if (next) {
- ACCESS_ONCE(next->locked) = 1;
- return;
- }
-
- next = osq_wait_next(lock, node, NULL);
- if (next)
- ACCESS_ONCE(next->locked) = 1;
-}
-
-#endif
-
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 4d60986..d1fe2ba 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -108,20 +108,4 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
arch_mcs_spin_unlock_contended(&next->locked);
}

-/*
- * Cancellable version of the MCS lock above.
- *
- * Intended for adaptive spinning of sleeping locks:
- * mutex_lock()/rwsem_down_{read,write}() etc.
- */
-
-struct optimistic_spin_node {
- struct optimistic_spin_node *next, *prev;
- int locked; /* 1 if lock acquired */
- int cpu; /* encoded CPU # value */
-};
-
-extern bool osq_lock(struct optimistic_spin_queue *lock);
-extern void osq_unlock(struct optimistic_spin_queue *lock);
-
#endif /* __LINUX_MCS_SPINLOCK_H */
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
new file mode 100644
index 0000000..ec83d4d
--- /dev/null
+++ b/kernel/locking/osq_lock.c
@@ -0,0 +1,203 @@
+#include <linux/percpu.h>
+#include <linux/sched.h>
+#include <linux/osq_lock.h>
+
+/*
+ * An MCS like lock especially tailored for optimistic spinning for sleeping
+ * lock implementations (mutex, rwsem, etc).
+ *
+ * Using a single mcs node per CPU is safe because sleeping locks should not be
+ * called from interrupt context and we have preemption disabled while
+ * spinning.
+ */
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
+
+/*
+ * We use the value 0 to represent "no CPU", thus the encoded value
+ * will be the CPU number incremented by 1.
+ */
+static inline int encode_cpu(int cpu_nr)
+{
+ return cpu_nr + 1;
+}
+
+static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
+{
+ int cpu_nr = encoded_cpu_val - 1;
+
+ return per_cpu_ptr(&osq_node, cpu_nr);
+}
+
+/*
+ * Get a stable @node->next pointer, either for unlock() or unqueue() purposes.
+ * Can return NULL in case we were the last queued and we updated @lock instead.
+ */
+static inline struct optimistic_spin_node *
+osq_wait_next(struct optimistic_spin_queue *lock,
+ struct optimistic_spin_node *node,
+ struct optimistic_spin_node *prev)
+{
+ struct optimistic_spin_node *next = NULL;
+ int curr = encode_cpu(smp_processor_id());
+ int old;
+
+ /*
+ * If there is a prev node in queue, then the 'old' value will be
+ * the prev node's CPU #, else it's set to OSQ_UNLOCKED_VAL since if
+ * we're currently last in queue, then the queue will then become empty.
+ */
+ old = prev ? prev->cpu : OSQ_UNLOCKED_VAL;
+
+ for (;;) {
+ if (atomic_read(&lock->tail) == curr &&
+ atomic_cmpxchg(&lock->tail, curr, old) == curr) {
+ /*
+ * We were the last queued, we moved @lock back. @prev
+ * will now observe @lock and will complete its
+ * unlock()/unqueue().
+ */
+ break;
+ }
+
+ /*
+ * We must xchg() the @node->next value, because if we were to
+ * leave it in, a concurrent unlock()/unqueue() from
+ * @node->next might complete Step-A and think its @prev is
+ * still valid.
+ *
+ * If the concurrent unlock()/unqueue() wins the race, we'll
+ * wait for either @lock to point to us, through its Step-B, or
+ * wait for a new @node->next from its Step-C.
+ */
+ if (node->next) {
+ next = xchg(&node->next, NULL);
+ if (next)
+ break;
+ }
+
+ cpu_relax_lowlatency();
+ }
+
+ return next;
+}
+
+bool osq_lock(struct optimistic_spin_queue *lock)
+{
+ struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
+ struct optimistic_spin_node *prev, *next;
+ int curr = encode_cpu(smp_processor_id());
+ int old;
+
+ node->locked = 0;
+ node->next = NULL;
+ node->cpu = curr;
+
+ old = atomic_xchg(&lock->tail, curr);
+ if (old == OSQ_UNLOCKED_VAL)
+ return true;
+
+ prev = decode_cpu(old);
+ node->prev = prev;
+ ACCESS_ONCE(prev->next) = node;
+
+ /*
+ * Normally @prev is untouchable after the above store; because at that
+ * moment unlock can proceed and wipe the node element from stack.
+ *
+ * However, since our nodes are static per-cpu storage, we're
+ * guaranteed their existence -- this allows us to apply
+ * cmpxchg in an attempt to undo our queueing.
+ */
+
+ while (!smp_load_acquire(&node->locked)) {
+ /*
+ * If we need to reschedule bail... so we can block.
+ */
+ if (need_resched())
+ goto unqueue;
+
+ cpu_relax_lowlatency();
+ }
+ return true;
+
+unqueue:
+ /*
+ * Step - A -- stabilize @prev
+ *
+ * Undo our @prev->next assignment; this will make @prev's
+ * unlock()/unqueue() wait for a next pointer since @lock points to us
+ * (or later).
+ */
+
+ for (;;) {
+ if (prev->next == node &&
+ cmpxchg(&prev->next, node, NULL) == node)
+ break;
+
+ /*
+ * We can only fail the cmpxchg() racing against an unlock(),
+ * in which case we should observe @node->locked becomming
+ * true.
+ */
+ if (smp_load_acquire(&node->locked))
+ return true;
+
+ cpu_relax_lowlatency();
+
+ /*
+ * Or we race against a concurrent unqueue()'s step-B, in which
+ * case its step-C will write us a new @node->prev pointer.
+ */
+ prev = ACCESS_ONCE(node->prev);
+ }
+
+ /*
+ * Step - B -- stabilize @next
+ *
+ * Similar to unlock(), wait for @node->next or move @lock from @node
+ * back to @prev.
+ */
+
+ next = osq_wait_next(lock, node, prev);
+ if (!next)
+ return false;
+
+ /*
+ * Step - C -- unlink
+ *
+ * @prev is stable because its still waiting for a new @prev->next
+ * pointer, @next is stable because our @node->next pointer is NULL and
+ * it will wait in Step-A.
+ */
+
+ ACCESS_ONCE(next->prev) = prev;
+ ACCESS_ONCE(prev->next) = next;
+
+ return false;
+}
+
+void osq_unlock(struct optimistic_spin_queue *lock)
+{
+ struct optimistic_spin_node *node, *next;
+ int curr = encode_cpu(smp_processor_id());
+
+ /*
+ * Fast path for the uncontended case.
+ */
+ if (likely(atomic_cmpxchg(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr))
+ return;
+
+ /*
+ * Second most likely case.
+ */
+ node = this_cpu_ptr(&osq_node);
+ next = xchg(&node->next, NULL);
+ if (next) {
+ ACCESS_ONCE(next->locked) = 1;
+ return;
+ }
+
+ next = osq_wait_next(lock, node, NULL);
+ if (next)
+ ACCESS_ONCE(next->locked) = 1;
+}
--
2.1.2

2014-12-28 09:12:54

by Davidlohr Bueso

Subject: [PATCH 7/8] locking: Use [READ,ASSIGN]_ONCE() for non-scalar types

I guess everyone will eventually have to update, so let's
do our part... and become the first users of ASSIGN_ONCE().
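
For reference, the conversion pattern used throughout the series
(argument order follows the ASSIGN_ONCE() definition as of this
writing: value first, location second):

	/* loads: ACCESS_ONCE(x) on the right-hand side becomes READ_ONCE(x) */
	struct mcs_spinlock *next = READ_ONCE(node->next);

	/* stores: ACCESS_ONCE(prev->next) = node becomes */
	ASSIGN_ONCE(node, prev->next);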

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/mcs_spinlock.h | 6 +++---
kernel/locking/mutex.c | 8 ++++----
kernel/locking/osq_lock.c | 8 ++++----
kernel/locking/rwsem-xadd.c | 4 ++--
4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index d1fe2ba..903009a 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -78,7 +78,7 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
*/
return;
}
- ACCESS_ONCE(prev->next) = node;
+ ASSIGN_ONCE(node, prev->next);

/* Wait until the lock holder passes the lock down. */
arch_mcs_spin_lock_contended(&node->locked);
@@ -91,7 +91,7 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
static inline
void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
{
- struct mcs_spinlock *next = ACCESS_ONCE(node->next);
+ struct mcs_spinlock *next = READ_ONCE(node->next);

if (likely(!next)) {
/*
@@ -100,7 +100,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
if (likely(cmpxchg(lock, node, NULL) == node))
return;
/* Wait until the next pointer is set */
- while (!(next = ACCESS_ONCE(node->next)))
+ while (!(next = READ_ONCE(node->next)))
cpu_relax_lowlatency();
}

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 5b6df69..d143427 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -274,7 +274,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
return 0;

rcu_read_lock();
- owner = ACCESS_ONCE(lock->owner);
+ owner = READ_ONCE(lock->owner);
if (owner)
retval = owner->on_cpu;
rcu_read_unlock();
@@ -343,7 +343,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
* As such, when deadlock detection needs to be
* performed the optimistic spinning cannot be done.
*/
- if (ACCESS_ONCE(ww->ctx))
+ if (READ_ONCE(ww->ctx))
break;
}

@@ -351,7 +351,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
* If there's an owner, wait for it to either
* release the lock or go to sleep.
*/
- owner = ACCESS_ONCE(lock->owner);
+ owner = READ_ONCE(lock->owner);
if (owner && !mutex_spin_on_owner(lock, owner))
break;

@@ -490,7 +490,7 @@ static inline int __sched
__ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
{
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
- struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
+ struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);

if (!hold_ctx)
return 0;
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index ec83d4d..9c6e251 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -98,7 +98,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)

prev = decode_cpu(old);
node->prev = prev;
- ACCESS_ONCE(prev->next) = node;
+ ASSIGN_ONCE(node, prev->next);

/*
* Normally @prev is untouchable after the above store; because at that
@@ -148,7 +148,7 @@ unqueue:
* Or we race against a concurrent unqueue()'s step-B, in which
* case its step-C will write us a new @node->prev pointer.
*/
- prev = ACCESS_ONCE(node->prev);
+ prev = READ_ONCE(node->prev);
}

/*
@@ -170,8 +170,8 @@ unqueue:
* it will wait in Step-A.
*/

- ACCESS_ONCE(next->prev) = prev;
- ACCESS_ONCE(prev->next) = next;
+ ASSIGN_ONCE(prev, next->prev);
+ ASSIGN_ONCE(next, prev->next);

return false;
}
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 7628c3f..2e651f6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -294,7 +294,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
return false;

rcu_read_lock();
- owner = ACCESS_ONCE(sem->owner);
+ owner = READ_ONCE(sem->owner);
if (owner)
on_cpu = owner->on_cpu;
rcu_read_unlock();
@@ -359,7 +359,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
goto done;

while (true) {
- owner = ACCESS_ONCE(sem->owner);
+ owner = READ_ONCE(sem->owner);
if (owner && !rwsem_spin_on_owner(sem, owner))
break;

--
2.1.2

2014-12-28 09:13:48

by Davidlohr Bueso

Subject: [PATCH 5/8] locking/mutex: Introduce ww_mutex_set_context_slowpath

... which is equivalent to the fastpath counterpart. This
mainly allows getting some ww-specific code out of the
generic mutex paths.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/mutex.c | 43 ++++++++++++++++++++++++++-----------------
1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index caa3c9f..5b6df69 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -155,7 +155,7 @@ static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
*/
static __always_inline void
ww_mutex_set_context_fastpath(struct ww_mutex *lock,
- struct ww_acquire_ctx *ctx)
+ struct ww_acquire_ctx *ctx)
{
unsigned long flags;
struct mutex_waiter *cur;
@@ -191,6 +191,30 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
spin_unlock_mutex(&lock->base.wait_lock, flags);
}

+/*
+ * after acquiring lock in the slowpath or when we lost out in contested
+ * slowpath, set ctx and wake up any waiters so they can recheck.
+ *
+ * Callers must hold the mutex wait_lock.
+ */
+static __always_inline void
+ww_mutex_set_context_slowpath(struct ww_mutex *lock,
+ struct ww_acquire_ctx *ctx)
+{
+ struct mutex_waiter *cur;
+
+ ww_mutex_lock_acquired(lock, ctx);
+ lock->ctx = ctx;
+
+ /*
+ * Give any possible sleeping processes the chance to wake up,
+ * so they can recheck if they have to back off.
+ */
+ list_for_each_entry(cur, &lock->base.wait_list, list) {
+ debug_mutex_wake_waiter(&lock->base, cur);
+ wake_up_process(cur->task);
+ }
+}

#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
/*
@@ -576,23 +600,8 @@ skip_wait:

if (ww_ctx) {
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
- struct mutex_waiter *cur;

- /*
- * This branch gets optimized out for the common case,
- * and is only important for ww_mutex_lock.
- */
- ww_mutex_lock_acquired(ww, ww_ctx);
- ww->ctx = ww_ctx;
-
- /*
- * Give any possible sleeping processes the chance to wake up,
- * so they can recheck if they have to back off.
- */
- list_for_each_entry(cur, &lock->wait_list, list) {
- debug_mutex_wake_waiter(lock, cur);
- wake_up_process(cur->task);
- }
+ ww_mutex_set_context_slowpath(ww, ww_ctx);
}

spin_unlock_mutex(&lock->wait_lock, flags);
--
2.1.2

2014-12-28 09:14:18

by Davidlohr Bueso

Subject: [PATCH 4/8] locking/mutex: Delete useless comment

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/mutex.c | 6 ------
1 file changed, 6 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 3252a7d..caa3c9f 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -197,13 +197,7 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
* In order to avoid a stampede of mutex spinners from acquiring the mutex
* more or less simultaneously, the spinners need to acquire a MCS lock
* first before spinning on the owner field.
- *
*/
-
-/*
- * Mutex spinning code migrated from kernel/sched/core.c
- */
-
static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
{
if (lock->owner != owner)
--
2.1.2

2014-12-28 09:14:24

by Davidlohr Bueso

Subject: [PATCH 2/8] locking/mutex: Get rid of use_ww_ctx param in common paths

... and simplify the parameters of __mutex_lock_common() a bit. This
is straightforward, as we can simply check the ww ctx for NULL when
called from non wait/wound paths.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/mutex.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4541951..2e687a8 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -300,7 +300,7 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
* that we need to jump to the slowpath and sleep.
*/
static bool mutex_optimistic_spin(struct mutex *lock,
- struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+ struct ww_acquire_ctx *ww_ctx)
{
struct task_struct *task = current;

@@ -313,7 +313,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
while (true) {
struct task_struct *owner;

- if (use_ww_ctx && ww_ctx->acquired > 0) {
+ if (ww_ctx && ww_ctx->acquired > 0) {
struct ww_mutex *ww;

ww = container_of(lock, struct ww_mutex, base);
@@ -341,7 +341,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
if (mutex_try_to_acquire(lock)) {
lock_acquired(&lock->dep_map, ip);

- if (use_ww_ctx) {
+ if (ww_ctx) {
struct ww_mutex *ww;
ww = container_of(lock, struct ww_mutex, base);

@@ -391,7 +391,7 @@ done:
}
#else
static bool mutex_optimistic_spin(struct mutex *lock,
- struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+ struct ww_acquire_ctx *ww_ctx)
{
return false;
}
@@ -498,7 +498,7 @@ __mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
static __always_inline int __sched
__mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
struct lockdep_map *nest_lock, unsigned long ip,
- struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+ struct ww_acquire_ctx *ww_ctx)
{
struct task_struct *task = current;
struct mutex_waiter waiter;
@@ -508,7 +508,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
preempt_disable();
mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);

- if (mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx)) {
+ if (mutex_optimistic_spin(lock, ww_ctx)) {
/* got the lock, yay! */
preempt_enable();
return 0;
@@ -556,7 +556,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
goto err;
}

- if (use_ww_ctx && ww_ctx->acquired > 0) {
+ if (ww_ctx && ww_ctx->acquired > 0) {
ret = __mutex_lock_check_stamp(lock, ww_ctx);
if (ret)
goto err;
@@ -580,7 +580,7 @@ skip_wait:
lock_acquired(&lock->dep_map, ip);
mutex_set_owner(lock);

- if (use_ww_ctx) {
+ if (ww_ctx) {
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
struct mutex_waiter *cur;

@@ -620,7 +620,7 @@ mutex_lock_nested(struct mutex *lock, unsigned int subclass)
{
might_sleep();
__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
- subclass, NULL, _RET_IP_, NULL, 0);
+ subclass, NULL, _RET_IP_, NULL);
}

EXPORT_SYMBOL_GPL(mutex_lock_nested);
@@ -630,7 +630,7 @@ _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest)
{
might_sleep();
__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
- 0, nest, _RET_IP_, NULL, 0);
+ 0, nest, _RET_IP_, NULL);
}

EXPORT_SYMBOL_GPL(_mutex_lock_nest_lock);
@@ -640,7 +640,7 @@ mutex_lock_killable_nested(struct mutex *lock, unsigned int subclass)
{
might_sleep();
return __mutex_lock_common(lock, TASK_KILLABLE,
- subclass, NULL, _RET_IP_, NULL, 0);
+ subclass, NULL, _RET_IP_, NULL);
}
EXPORT_SYMBOL_GPL(mutex_lock_killable_nested);

@@ -649,7 +649,7 @@ mutex_lock_interruptible_nested(struct mutex *lock, unsigned int subclass)
{
might_sleep();
return __mutex_lock_common(lock, TASK_INTERRUPTIBLE,
- subclass, NULL, _RET_IP_, NULL, 0);
+ subclass, NULL, _RET_IP_, NULL);
}

EXPORT_SYMBOL_GPL(mutex_lock_interruptible_nested);
@@ -687,7 +687,7 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)

might_sleep();
ret = __mutex_lock_common(&lock->base, TASK_UNINTERRUPTIBLE,
- 0, &ctx->dep_map, _RET_IP_, ctx, 1);
+ 0, &ctx->dep_map, _RET_IP_, ctx);
if (!ret && ctx->acquired > 1)
return ww_mutex_deadlock_injection(lock, ctx);

@@ -702,7 +702,7 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)

might_sleep();
ret = __mutex_lock_common(&lock->base, TASK_INTERRUPTIBLE,
- 0, &ctx->dep_map, _RET_IP_, ctx, 1);
+ 0, &ctx->dep_map, _RET_IP_, ctx);

if (!ret && ctx->acquired > 1)
return ww_mutex_deadlock_injection(lock, ctx);
@@ -822,28 +822,28 @@ __mutex_lock_slowpath(atomic_t *lock_count)
struct mutex *lock = container_of(lock_count, struct mutex, count);

__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0,
- NULL, _RET_IP_, NULL, 0);
+ NULL, _RET_IP_, NULL);
}

static noinline int __sched
__mutex_lock_killable_slowpath(struct mutex *lock)
{
return __mutex_lock_common(lock, TASK_KILLABLE, 0,
- NULL, _RET_IP_, NULL, 0);
+ NULL, _RET_IP_, NULL);
}

static noinline int __sched
__mutex_lock_interruptible_slowpath(struct mutex *lock)
{
return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0,
- NULL, _RET_IP_, NULL, 0);
+ NULL, _RET_IP_, NULL);
}

static noinline int __sched
__ww_mutex_lock_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
{
return __mutex_lock_common(&lock->base, TASK_UNINTERRUPTIBLE, 0,
- NULL, _RET_IP_, ctx, 1);
+ NULL, _RET_IP_, ctx);
}

static noinline int __sched
@@ -851,7 +851,7 @@ __ww_mutex_lock_interruptible_slowpath(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx)
{
return __mutex_lock_common(&lock->base, TASK_INTERRUPTIBLE, 0,
- NULL, _RET_IP_, ctx, 1);
+ NULL, _RET_IP_, ctx);
}

#endif
--
2.1.2

2014-12-28 09:14:21

by Davidlohr Bueso

Subject: [PATCH 3/8] locking/mutex: Checking the stamp is ww only

Mark it so by renaming __mutex_lock_check_stamp() to __ww_mutex_lock_check_stamp().

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/mutex.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2e687a8..3252a7d 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -469,7 +469,7 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
EXPORT_SYMBOL(ww_mutex_unlock);

static inline int __sched
-__mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
+__ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
{
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
@@ -557,7 +557,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
}

if (ww_ctx && ww_ctx->acquired > 0) {
- ret = __mutex_lock_check_stamp(lock, ww_ctx);
+ ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
if (ret)
goto err;
}
--
2.1.2

2014-12-28 16:30:06

by Peter Zijlstra

Subject: Re: [PATCH 2/8] locking/mutex: Get rid of use_ww_ctx param in common paths

On Sun, Dec 28, 2014 at 01:11:17AM -0800, Davidlohr Bueso wrote:
> ... and simplify the parameters of __mutex_lock_common() a bit. This
> is straightforward, as we can simply check the ww ctx for NULL when
> called from non wait/wound paths.


The reason for this trainwreck is that some older GCC versions did not
actually do the const propagation of the NULL pointer right.

There were at least two threads on this; commit
b0267507dfd0187fb7840a0ec461a510a7f041c5 might give clues.
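
Roughly, the intent of the original signature was the following (a
sketch of the expectation, not a statement about any particular GCC
version): the non-ww call sites pass compile-time constants for both
ww arguments, so the compiler should be able to discard the ww-only
branches outright:

	/* non-ww caller: constant NULL ctx and constant false flag */
	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
			    subclass, NULL, _RET_IP_, NULL, 0);

	/* in the callee: with use_ww_ctx being a constant 0 this branch
	 * is provably dead and can be dropped at compile time; folding
	 * it away based only on ww_ctx == NULL is what some older GCCs
	 * got wrong */
	if (use_ww_ctx && ww_ctx->acquired > 0)
		ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);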

> static bool mutex_optimistic_spin(struct mutex *lock,
> - struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
> + struct ww_acquire_ctx *ww_ctx)
> {
> struct task_struct *task = current;
>
> @@ -313,7 +313,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
> while (true) {
> struct task_struct *owner;
>
> - if (use_ww_ctx && ww_ctx->acquired > 0) {
> + if (ww_ctx && ww_ctx->acquired > 0) {
> struct ww_mutex *ww;
>
> ww = container_of(lock, struct ww_mutex, base);

2014-12-28 16:38:06

by Peter Zijlstra

Subject: Re: [PATCH 7/8] locking: Use [READ,ASSIGN]_ONCE() for non-scalar types

On Sun, Dec 28, 2014 at 01:11:22AM -0800, Davidlohr Bueso wrote:
> I guess everyone will eventually have to update, so let's
> do our part... and become the first users of ASSIGN_ONCE().

> - ACCESS_ONCE(prev->next) = node;
> + ASSIGN_ONCE(node, prev->next);

> - struct mcs_spinlock *next = ACCESS_ONCE(node->next);
> + struct mcs_spinlock *next = READ_ONCE(node->next);

That's disgusting and sad, doubly so because I was entirely unaware of that stuff.

2014-12-28 18:13:11

by Davidlohr Bueso

Subject: Re: [PATCH 2/8] locking/mutex: Get rid of use_ww_ctx param in common paths

On Sun, 2014-12-28 at 17:23 +0100, Peter Zijlstra wrote:
> On Sun, Dec 28, 2014 at 01:11:17AM -0800, Davidlohr Bueso wrote:
> > ... and simplify the parameters of __mutex_lock_common() a bit. This
> > is straightforward, as we can simply check the ww ctx for NULL when
> > called from non wait/wound paths.
>
>
> The reason for this trainwreck is that some older GCC versions did not
> actually do the const propagation of the NULL pointer right.

That's a shame, I really detest all this ww mess.

2014-12-31 12:10:36

by Paul E. McKenney

Subject: Re: [PATCH 1/8] Documentation/memory-barriers.txt: Fix smp typo

On Sun, Dec 28, 2014 at 01:11:16AM -0800, Davidlohr Bueso wrote:
> Signed-off-by: Davidlohr Bueso <[email protected]>

Good catch, queued for 3.20.

Thanx, Paul

> ---
> Documentation/memory-barriers.txt | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 70a09f8..7086f83 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -750,7 +750,7 @@ In summary:
> However, they do -not- guarantee any other sort of ordering:
> Not prior loads against later loads, nor prior stores against
> later anything. If you need these other forms of ordering,
> - use smb_rmb(), smp_wmb(), or, in the case of prior stores and
> + use smp_rmb(), smp_wmb(), or, in the case of prior stores and
> later loads, smp_mb().
>
> (*) If both legs of the "if" statement begin with identical stores
> --
> 2.1.2
>