2015-08-03 17:03:44

by Will Deacon

[permalink] [raw]
Subject: [PATCH v4 0/8] Add generic support for relaxed atomics

Hello,

Here is version four of the patches previously posted here:

v1: https://lwn.net/Articles/650862/
v2: https://lwn.net/Articles/651293/
v3: https://lwn.net/Articles/652369/

The series adds support for a family of relaxed atomics to the kernel.
More specifically:

- acquire/release/relaxed flavours of xchg, cmpxchg and {add,sub}_return
- atomic_read_acquire
- atomic_set_release

This came out of a separate patch series porting the (barrier-heavy)
qrwlock code to arm64. Rather than have arch-specific hooks littered
around the place, it makes more sense to define a core set of relaxed
atomics that can be used regardless of architecture.

The only change since v3 is an extension to the comment in
linux/atomic.h, as suggested by Peter.

Build tested on ARM, arm64, PowerPC and x86.

All feedback welcome,

Will

--->8

Will Deacon (8):
atomics: add acquire/release/relaxed variants of some atomic
operations
asm-generic: rework atomic-long.h to avoid bulk code duplication
asm-generic: add relaxed/acquire/release variants for atomic_long_t
lockref: remove homebrew cmpxchg64_relaxed macro definition
locking/qrwlock: implement queue_write_unlock using smp_store_release
locking/qrwlock: make use of acquire/release/relaxed atomics
include/llist: use linux/atomic.h instead of asm/cmpxchg.h
ARM: atomics: define our SMP atomics in terms of _relaxed operations

arch/arm/include/asm/atomic.h | 37 ++---
arch/arm/include/asm/cmpxchg.h | 47 +-----
arch/x86/include/asm/qrwlock.h | 10 --
include/asm-generic/atomic-long.h | 263 +++++++++++-------------------
include/asm-generic/qrwlock.h | 22 +--
include/linux/atomic.h | 328 ++++++++++++++++++++++++++++++++++++++
include/linux/llist.h | 2 +-
kernel/locking/qrwlock.c | 23 ++-
lib/lockref.c | 8 -
9 files changed, 468 insertions(+), 272 deletions(-)

--
2.1.4


2015-08-03 17:16:36

by Will Deacon

[permalink] [raw]
Subject: [PATCH v4 1/8] atomics: add acquire/release/relaxed variants of some atomic operations

Whilst porting the generic qrwlock code over to arm64, it became
apparent that any portable locking code needs finer-grained control of
the memory-ordering guarantees provided by our atomic routines.

In particular: xchg, cmpxchg, {add,sub}_return are often used in
situations where full barrier semantics (currently the only option
available) are not required. For example, when a reader increments a
reader count to obtain a lock, checking the old value to see if a writer
was present, only acquire semantics are strictly needed.

This patch introduces three new ordering semantics for these operations:

- *_relaxed: No ordering guarantees. This is similar to what we have
already for the non-return atomics (e.g. atomic_add).

- *_acquire: ACQUIRE semantics, similar to smp_load_acquire.

- *_release: RELEASE semantics, similar to smp_store_release.

In memory-ordering speak, this means that the acquire/release semantics
are RCpc as opposed to RCsc. Consequently a RELEASE followed by an
ACQUIRE does not imply a full barrier, as already documented in
memory-barriers.txt.

Currently, all the new macros are conditionally mapped to the full-mb
variants, however if the *_relaxed version is provided by the
architecture, then the acquire/release variants are constructed by
supplementing the relaxed routine with an explicit barrier.

Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
include/linux/atomic.h | 328 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 328 insertions(+)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 5b08a8540ecf..d2515c05e7c8 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -2,6 +2,334 @@
#ifndef _LINUX_ATOMIC_H
#define _LINUX_ATOMIC_H
#include <asm/atomic.h>
+#include <asm/barrier.h>
+
+/*
+ * Relaxed variants of xchg, cmpxchg and some atomic operations.
+ *
+ * We support four variants:
+ *
+ * - Fully ordered: The default implementation, no suffix required.
+ * - Acquire: Provides ACQUIRE semantics, _acquire suffix.
+ * - Release: Provides RELEASE semantics, _release suffix.
+ * - Relaxed: No ordering guarantees, _relaxed suffix.
+ *
+ * For compound atomics performing both a load and a store, ACQUIRE
+ * semantics apply only to the load and RELEASE semantics only to the
+ * store portion of the operation. Note that a failed cmpxchg_acquire
+ * does -not- imply any memory ordering constraints.
+ *
+ * See Documentation/memory-barriers.txt for ACQUIRE/RELEASE definitions.
+ */
+
+#ifndef atomic_read_acquire
+#define atomic_read_acquire(v) smp_load_acquire(&(v)->counter)
+#endif
+
+#ifndef atomic_set_release
+#define atomic_set_release(v, i) smp_store_release(&(v)->counter, (i))
+#endif
+
+/*
+ * The idea here is to build acquire/release variants by adding explicit
+ * barriers on top of the relaxed variant. In the case where the relaxed
+ * variant is already fully ordered, no additional barriers are needed.
+ */
+#define __atomic_op_acquire(ret_t, op, ...) \
+({ \
+ ret_t __ret = op##_relaxed(__VA_ARGS__); \
+ smp_mb__after_atomic(); \
+ __ret; \
+})
+
+#define __atomic_op_release(ret_t, op, ...) \
+({ \
+ ret_t __ret; \
+ smp_mb__before_atomic(); \
+ __ret = op##_relaxed(__VA_ARGS__); \
+ __ret; \
+})
+
+#define __atomic_op_fence(ret_t, op, ...) \
+({ \
+ ret_t __ret; \
+ smp_mb__before_atomic(); \
+ __ret = op##_relaxed(__VA_ARGS__); \
+ smp_mb__after_atomic(); \
+ __ret; \
+})
+
+/* atomic_add_return_relaxed */
+#ifndef atomic_add_return_relaxed
+#define atomic_add_return_relaxed atomic_add_return
+#define atomic_add_return_acquire atomic_add_return
+#define atomic_add_return_release atomic_add_return
+
+#else /* atomic_add_return_relaxed */
+
+#ifndef atomic_add_return_acquire
+#define atomic_add_return_acquire(...) \
+ __atomic_op_acquire(int, atomic_add_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic_add_return_release
+#define atomic_add_return_release(...) \
+ __atomic_op_release(int, atomic_add_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic_add_return
+#define atomic_add_return(...) \
+ __atomic_op_fence(int, atomic_add_return, __VA_ARGS__)
+#endif
+#endif /* atomic_add_return_relaxed */
+
+/* atomic_sub_return_relaxed */
+#ifndef atomic_sub_return_relaxed
+#define atomic_sub_return_relaxed atomic_sub_return
+#define atomic_sub_return_acquire atomic_sub_return
+#define atomic_sub_return_release atomic_sub_return
+
+#else /* atomic_sub_return_relaxed */
+
+#ifndef atomic_sub_return_acquire
+#define atomic_sub_return_acquire(...) \
+ __atomic_op_acquire(int, atomic_sub_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic_sub_return_release
+#define atomic_sub_return_release(...) \
+ __atomic_op_release(int, atomic_sub_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic_sub_return
+#define atomic_sub_return(...) \
+ __atomic_op_fence(int, atomic_sub_return, __VA_ARGS__)
+#endif
+#endif /* atomic_sub_return_relaxed */
+
+/* atomic_xchg_relaxed */
+#ifndef atomic_xchg_relaxed
+#define atomic_xchg_relaxed atomic_xchg
+#define atomic_xchg_acquire atomic_xchg
+#define atomic_xchg_release atomic_xchg
+
+#else /* atomic_xchg_relaxed */
+
+#ifndef atomic_xchg_acquire
+#define atomic_xchg_acquire(...) \
+ __atomic_op_acquire(int, atomic_xchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic_xchg_release
+#define atomic_xchg_release(...) \
+ __atomic_op_release(int, atomic_xchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic_xchg
+#define atomic_xchg(...) \
+ __atomic_op_fence(int, atomic_xchg, __VA_ARGS__)
+#endif
+#endif /* atomic_xchg_relaxed */
+
+/* atomic_cmpxchg_relaxed */
+#ifndef atomic_cmpxchg_relaxed
+#define atomic_cmpxchg_relaxed atomic_cmpxchg
+#define atomic_cmpxchg_acquire atomic_cmpxchg
+#define atomic_cmpxchg_release atomic_cmpxchg
+
+#else /* atomic_cmpxchg_relaxed */
+
+#ifndef atomic_cmpxchg_acquire
+#define atomic_cmpxchg_acquire(...) \
+ __atomic_op_acquire(int, atomic_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic_cmpxchg_release
+#define atomic_cmpxchg_release(...) \
+ __atomic_op_release(int, atomic_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic_cmpxchg
+#define atomic_cmpxchg(...) \
+ __atomic_op_fence(int, atomic_cmpxchg, __VA_ARGS__)
+#endif
+#endif /* atomic_cmpxchg_relaxed */
+
+#ifndef atomic64_read_acquire
+#define atomic64_read_acquire(v) smp_load_acquire(&(v)->counter)
+#endif
+
+#ifndef atomic64_set_release
+#define atomic64_set_release(v, i) smp_store_release(&(v)->counter, (i))
+#endif
+
+/* atomic64_add_return_relaxed */
+#ifndef atomic64_add_return_relaxed
+#define atomic64_add_return_relaxed atomic64_add_return
+#define atomic64_add_return_acquire atomic64_add_return
+#define atomic64_add_return_release atomic64_add_return
+
+#else /* atomic64_add_return_relaxed */
+
+#ifndef atomic64_add_return_acquire
+#define atomic64_add_return_acquire(...) \
+ __atomic_op_acquire(long long, atomic64_add_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_add_return_release
+#define atomic64_add_return_release(...) \
+ __atomic_op_release(long long, atomic64_add_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_add_return
+#define atomic64_add_return(...) \
+ __atomic_op_fence(long long, atomic64_add_return, __VA_ARGS__)
+#endif
+#endif /* atomic64_add_return_relaxed */
+
+/* atomic64_sub_return_relaxed */
+#ifndef atomic64_sub_return_relaxed
+#define atomic64_sub_return_relaxed atomic64_sub_return
+#define atomic64_sub_return_acquire atomic64_sub_return
+#define atomic64_sub_return_release atomic64_sub_return
+
+#else /* atomic64_sub_return_relaxed */
+
+#ifndef atomic64_sub_return_acquire
+#define atomic64_sub_return_acquire(...) \
+ __atomic_op_acquire(long long, atomic64_sub_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_sub_return_release
+#define atomic64_sub_return_release(...) \
+ __atomic_op_release(long long, atomic64_sub_return, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_sub_return
+#define atomic64_sub_return(...) \
+ __atomic_op_fence(long long, atomic64_sub_return, __VA_ARGS__)
+#endif
+#endif /* atomic64_sub_return_relaxed */
+
+/* atomic64_xchg_relaxed */
+#ifndef atomic64_xchg_relaxed
+#define atomic64_xchg_relaxed atomic64_xchg
+#define atomic64_xchg_acquire atomic64_xchg
+#define atomic64_xchg_release atomic64_xchg
+
+#else /* atomic64_xchg_relaxed */
+
+#ifndef atomic64_xchg_acquire
+#define atomic64_xchg_acquire(...) \
+ __atomic_op_acquire(long long, atomic64_xchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_xchg_release
+#define atomic64_xchg_release(...) \
+ __atomic_op_release(long long, atomic64_xchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_xchg
+#define atomic64_xchg(...) \
+ __atomic_op_fence(long long, atomic64_xchg, __VA_ARGS__)
+#endif
+#endif /* atomic64_xchg_relaxed */
+
+/* atomic64_cmpxchg_relaxed */
+#ifndef atomic64_cmpxchg_relaxed
+#define atomic64_cmpxchg_relaxed atomic64_cmpxchg
+#define atomic64_cmpxchg_acquire atomic64_cmpxchg
+#define atomic64_cmpxchg_release atomic64_cmpxchg
+
+#else /* atomic64_cmpxchg_relaxed */
+
+#ifndef atomic64_cmpxchg_acquire
+#define atomic64_cmpxchg_acquire(...) \
+ __atomic_op_acquire(long long, atomic64_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_cmpxchg_release
+#define atomic64_cmpxchg_release(...) \
+ __atomic_op_release(long long, atomic64_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef atomic64_cmpxchg
+#define atomic64_cmpxchg(...) \
+ __atomic_op_fence(long long, atomic64_cmpxchg, __VA_ARGS__)
+#endif
+#endif /* atomic64_cmpxchg_relaxed */
+
+/* cmpxchg_relaxed */
+#ifndef cmpxchg_relaxed
+#define cmpxchg_relaxed cmpxchg
+#define cmpxchg_acquire cmpxchg
+#define cmpxchg_release cmpxchg
+
+#else /* cmpxchg_relaxed */
+
+#ifndef cmpxchg_acquire
+#define cmpxchg_acquire(ptr, ...) \
+ __atomic_op_acquire(__typeof__(*ptr), cmpxchg, ptr, __VA_ARGS__)
+#endif
+
+#ifndef cmpxchg_release
+#define cmpxchg_release(ptr, ...) \
+ __atomic_op_release(__typeof__(*ptr), cmpxchg, ptr, __VA_ARGS__)
+#endif
+
+#ifndef cmpxchg
+#define cmpxchg(ptr, ...) \
+ __atomic_op_fence(__typeof__(*ptr), cmpxchg, ptr, __VA_ARGS__)
+#endif
+#endif /* cmpxchg_relaxed */
+
+/* cmpxchg64_relaxed */
+#ifndef cmpxchg64_relaxed
+#define cmpxchg64_relaxed cmpxchg64
+#define cmpxchg64_acquire cmpxchg64
+#define cmpxchg64_release cmpxchg64
+
+#else /* cmpxchg64_relaxed */
+
+#ifndef cmpxchg64_acquire
+#define cmpxchg64_acquire(ptr, ...) \
+ __atomic_op_acquire(__typeof__(*ptr), cmpxchg64, ptr, __VA_ARGS__)
+#endif
+
+#ifndef cmpxchg64_release
+#define cmpxchg64_release(ptr, ...) \
+ __atomic_op_release(__typeof__(*ptr), cmpxchg64, ptr, __VA_ARGS__)
+#endif
+
+#ifndef cmpxchg64
+#define cmpxchg64(ptr, ...) \
+ __atomic_op_fence(__typeof__(*ptr), cmpxchg64, ptr, __VA_ARGS__)
+#endif
+#endif /* cmpxchg64_relaxed */
+
+/* xchg_relaxed */
+#ifndef xchg_relaxed
+#define xchg_relaxed xchg
+#define xchg_acquire xchg
+#define xchg_release xchg
+
+#else /* xchg_relaxed */
+
+#ifndef xchg_acquire
+#define xchg_acquire(ptr, ...) \
+ __atomic_op_acquire(__typeof__(*ptr), xchg, ptr, __VA_ARGS__)
+#endif
+
+#ifndef xchg_release
+#define xchg_release(ptr, ...) \
+ __atomic_op_release(__typeof__(*ptr), xchg, ptr, __VA_ARGS__)
+#endif
+
+#ifndef xchg
+#define xchg(ptr, ...) \
+ __atomic_op_fence(__typeof__(*ptr), xchg, ptr, __VA_ARGS__)
+#endif
+#endif /* xchg_relaxed */

/**
* atomic_add_unless - add unless the number is already a given value
--
2.1.4

2015-08-03 17:15:44

by Will Deacon

[permalink] [raw]
Subject: [PATCH v4 2/8] asm-generic: rework atomic-long.h to avoid bulk code duplication

We can use some (admittedly ugly) macros to generate the 32-bit and
64-bit based atomic_long implementations from the same code.

Signed-off-by: Will Deacon <[email protected]>
---
include/asm-generic/atomic-long.h | 189 ++++++++------------------------------
1 file changed, 40 insertions(+), 149 deletions(-)

diff --git a/include/asm-generic/atomic-long.h b/include/asm-generic/atomic-long.h
index b7babf0206b8..beaea541adfb 100644
--- a/include/asm-generic/atomic-long.h
+++ b/include/asm-generic/atomic-long.h
@@ -23,236 +23,127 @@
typedef atomic64_t atomic_long_t;

#define ATOMIC_LONG_INIT(i) ATOMIC64_INIT(i)
+#define ATOMIC_LONG_PFX(x) atomic64 ## x

-static inline long atomic_long_read(atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return (long)atomic64_read(v);
-}
-
-static inline void atomic_long_set(atomic_long_t *l, long i)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- atomic64_set(v, i);
-}
-
-static inline void atomic_long_inc(atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- atomic64_inc(v);
-}
-
-static inline void atomic_long_dec(atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- atomic64_dec(v);
-}
-
-static inline void atomic_long_add(long i, atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- atomic64_add(i, v);
-}
-
-static inline void atomic_long_sub(long i, atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- atomic64_sub(i, v);
-}
-
-static inline int atomic_long_sub_and_test(long i, atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return atomic64_sub_and_test(i, v);
-}
-
-static inline int atomic_long_dec_and_test(atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return atomic64_dec_and_test(v);
-}
-
-static inline int atomic_long_inc_and_test(atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return atomic64_inc_and_test(v);
-}
-
-static inline int atomic_long_add_negative(long i, atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return atomic64_add_negative(i, v);
-}
-
-static inline long atomic_long_add_return(long i, atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return (long)atomic64_add_return(i, v);
-}
-
-static inline long atomic_long_sub_return(long i, atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return (long)atomic64_sub_return(i, v);
-}
-
-static inline long atomic_long_inc_return(atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return (long)atomic64_inc_return(v);
-}
-
-static inline long atomic_long_dec_return(atomic_long_t *l)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return (long)atomic64_dec_return(v);
-}
-
-static inline long atomic_long_add_unless(atomic_long_t *l, long a, long u)
-{
- atomic64_t *v = (atomic64_t *)l;
-
- return (long)atomic64_add_unless(v, a, u);
-}
-
-#define atomic_long_inc_not_zero(l) atomic64_inc_not_zero((atomic64_t *)(l))
-
-#define atomic_long_cmpxchg(l, old, new) \
- (atomic64_cmpxchg((atomic64_t *)(l), (old), (new)))
-#define atomic_long_xchg(v, new) \
- (atomic64_xchg((atomic64_t *)(v), (new)))
-
-#else /* BITS_PER_LONG == 64 */
+#else

typedef atomic_t atomic_long_t;

#define ATOMIC_LONG_INIT(i) ATOMIC_INIT(i)
+#define ATOMIC_LONG_PFX(x) atomic ## x
+
+#endif
+
static inline long atomic_long_read(atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return (long)atomic_read(v);
+ return (long)ATOMIC_LONG_PFX(_read)(v);
}

static inline void atomic_long_set(atomic_long_t *l, long i)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- atomic_set(v, i);
+ ATOMIC_LONG_PFX(_set)(v, i);
}

static inline void atomic_long_inc(atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- atomic_inc(v);
+ ATOMIC_LONG_PFX(_inc)(v);
}

static inline void atomic_long_dec(atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- atomic_dec(v);
+ ATOMIC_LONG_PFX(_dec)(v);
}

static inline void atomic_long_add(long i, atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- atomic_add(i, v);
+ ATOMIC_LONG_PFX(_add)(i, v);
}

static inline void atomic_long_sub(long i, atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- atomic_sub(i, v);
+ ATOMIC_LONG_PFX(_sub)(i, v);
}

static inline int atomic_long_sub_and_test(long i, atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return atomic_sub_and_test(i, v);
+ return ATOMIC_LONG_PFX(_sub_and_test)(i, v);
}

static inline int atomic_long_dec_and_test(atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return atomic_dec_and_test(v);
+ return ATOMIC_LONG_PFX(_dec_and_test)(v);
}

static inline int atomic_long_inc_and_test(atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return atomic_inc_and_test(v);
+ return ATOMIC_LONG_PFX(_inc_and_test)(v);
}

static inline int atomic_long_add_negative(long i, atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return atomic_add_negative(i, v);
+ return ATOMIC_LONG_PFX(_add_negative)(i, v);
}

static inline long atomic_long_add_return(long i, atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return (long)atomic_add_return(i, v);
+ return (long)ATOMIC_LONG_PFX(_add_return)(i, v);
}

static inline long atomic_long_sub_return(long i, atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return (long)atomic_sub_return(i, v);
+ return (long)ATOMIC_LONG_PFX(_sub_return)(i, v);
}

static inline long atomic_long_inc_return(atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return (long)atomic_inc_return(v);
+ return (long)ATOMIC_LONG_PFX(_inc_return)(v);
}

static inline long atomic_long_dec_return(atomic_long_t *l)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return (long)atomic_dec_return(v);
+ return (long)ATOMIC_LONG_PFX(_dec_return)(v);
}

static inline long atomic_long_add_unless(atomic_long_t *l, long a, long u)
{
- atomic_t *v = (atomic_t *)l;
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;

- return (long)atomic_add_unless(v, a, u);
+ return (long)ATOMIC_LONG_PFX(_add_unless)(v, a, u);
}

-#define atomic_long_inc_not_zero(l) atomic_inc_not_zero((atomic_t *)(l))
-
+#define atomic_long_inc_not_zero(l) \
+ ATOMIC_LONG_PFX(_inc_not_zero)((ATOMIC_LONG_PFX(_t) *)(l))
#define atomic_long_cmpxchg(l, old, new) \
- (atomic_cmpxchg((atomic_t *)(l), (old), (new)))
+ (ATOMIC_LONG_PFX(_cmpxchg)((ATOMIC_LONG_PFX(_t) *)(l), (old), (new)))
#define atomic_long_xchg(v, new) \
- (atomic_xchg((atomic_t *)(v), (new)))
-
-#endif /* BITS_PER_LONG == 64 */
+ (ATOMIC_LONG_PFX(_xchg)((ATOMIC_LONG_PFX(_t) *)(v), (new)))

#endif /* _ASM_GENERIC_ATOMIC_LONG_H */
--
2.1.4

2015-08-03 17:03:34

by Will Deacon

[permalink] [raw]
Subject: [PATCH v4 3/8] asm-generic: add relaxed/acquire/release variants for atomic_long_t

This patch adds atomic_long_t wrappers for the new relaxed atomic
operations.

Signed-off-by: Will Deacon <[email protected]>
---
include/asm-generic/atomic-long.h | 86 +++++++++++++++++++++++++++------------
1 file changed, 59 insertions(+), 27 deletions(-)

diff --git a/include/asm-generic/atomic-long.h b/include/asm-generic/atomic-long.h
index beaea541adfb..a94cbebbc33d 100644
--- a/include/asm-generic/atomic-long.h
+++ b/include/asm-generic/atomic-long.h
@@ -34,19 +34,69 @@ typedef atomic_t atomic_long_t;

#endif

-static inline long atomic_long_read(atomic_long_t *l)
-{
- ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
-
- return (long)ATOMIC_LONG_PFX(_read)(v);
+#define ATOMIC_LONG_READ_OP(mo) \
+static inline long atomic_long_read##mo(atomic_long_t *l) \
+{ \
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l; \
+ \
+ return (long)ATOMIC_LONG_PFX(_read##mo)(v); \
}
+ATOMIC_LONG_READ_OP()
+ATOMIC_LONG_READ_OP(_acquire)

-static inline void atomic_long_set(atomic_long_t *l, long i)
-{
- ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
+#undef ATOMIC_LONG_READ_OP

- ATOMIC_LONG_PFX(_set)(v, i);
+#define ATOMIC_LONG_SET_OP(mo) \
+static inline void atomic_long_set##mo(atomic_long_t *l, long i) \
+{ \
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l; \
+ \
+ ATOMIC_LONG_PFX(_set##mo)(v, i); \
+}
+ATOMIC_LONG_SET_OP()
+ATOMIC_LONG_SET_OP(_release)
+
+#undef ATOMIC_LONG_SET_OP
+
+#define ATOMIC_LONG_ADD_SUB_OP(op, mo) \
+static inline long \
+atomic_long_##op##_return##mo(long i, atomic_long_t *l) \
+{ \
+ ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l; \
+ \
+ return (long)ATOMIC_LONG_PFX(_##op##_return##mo)(i, v); \
}
+ATOMIC_LONG_ADD_SUB_OP(add,)
+ATOMIC_LONG_ADD_SUB_OP(add, _relaxed)
+ATOMIC_LONG_ADD_SUB_OP(add, _acquire)
+ATOMIC_LONG_ADD_SUB_OP(add, _release)
+ATOMIC_LONG_ADD_SUB_OP(sub,)
+ATOMIC_LONG_ADD_SUB_OP(sub, _relaxed)
+ATOMIC_LONG_ADD_SUB_OP(sub, _acquire)
+ATOMIC_LONG_ADD_SUB_OP(sub, _release)
+
+#undef ATOMIC_LONG_ADD_SUB_OP
+
+#define atomic_long_cmpxchg_relaxed(l, old, new) \
+ (ATOMIC_LONG_PFX(_cmpxchg_relaxed)((ATOMIC_LONG_PFX(_t) *)(l), \
+ (old), (new)))
+#define atomic_long_cmpxchg_acquire(l, old, new) \
+ (ATOMIC_LONG_PFX(_cmpxchg_acquire)((ATOMIC_LONG_PFX(_t) *)(l), \
+ (old), (new)))
+#define atomic_long_cmpxchg_release(l, old, new) \
+ (ATOMIC_LONG_PFX(_cmpxchg_release)((ATOMIC_LONG_PFX(_t) *)(l), \
+ (old), (new)))
+#define atomic_long_cmpxchg(l, old, new) \
+ (ATOMIC_LONG_PFX(_cmpxchg)((ATOMIC_LONG_PFX(_t) *)(l), (old), (new)))
+
+#define atomic_long_xchg_relaxed(v, new) \
+ (ATOMIC_LONG_PFX(_xchg_relaxed)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
+#define atomic_long_xchg_acquire(v, new) \
+ (ATOMIC_LONG_PFX(_xchg_acquire)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
+#define atomic_long_xchg_release(v, new) \
+ (ATOMIC_LONG_PFX(_xchg_release)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
+#define atomic_long_xchg(v, new) \
+ (ATOMIC_LONG_PFX(_xchg)((ATOMIC_LONG_PFX(_t) *)(v), (new)))

static inline void atomic_long_inc(atomic_long_t *l)
{
@@ -104,20 +154,6 @@ static inline int atomic_long_add_negative(long i, atomic_long_t *l)
return ATOMIC_LONG_PFX(_add_negative)(i, v);
}

-static inline long atomic_long_add_return(long i, atomic_long_t *l)
-{
- ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
-
- return (long)ATOMIC_LONG_PFX(_add_return)(i, v);
-}
-
-static inline long atomic_long_sub_return(long i, atomic_long_t *l)
-{
- ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
-
- return (long)ATOMIC_LONG_PFX(_sub_return)(i, v);
-}
-
static inline long atomic_long_inc_return(atomic_long_t *l)
{
ATOMIC_LONG_PFX(_t) *v = (ATOMIC_LONG_PFX(_t) *)l;
@@ -141,9 +177,5 @@ static inline long atomic_long_add_unless(atomic_long_t *l, long a, long u)

#define atomic_long_inc_not_zero(l) \
ATOMIC_LONG_PFX(_inc_not_zero)((ATOMIC_LONG_PFX(_t) *)(l))
-#define atomic_long_cmpxchg(l, old, new) \
- (ATOMIC_LONG_PFX(_cmpxchg)((ATOMIC_LONG_PFX(_t) *)(l), (old), (new)))
-#define atomic_long_xchg(v, new) \
- (ATOMIC_LONG_PFX(_xchg)((ATOMIC_LONG_PFX(_t) *)(v), (new)))

#endif /* _ASM_GENERIC_ATOMIC_LONG_H */
--
2.1.4

2015-08-03 17:03:36

by Will Deacon

[permalink] [raw]
Subject: [PATCH v4 4/8] lockref: remove homebrew cmpxchg64_relaxed macro definition

cmpxchg64_relaxed is now defined by linux/atomic.h, so we can
remove our local definition from the lockref code.

Signed-off-by: Will Deacon <[email protected]>
---
lib/lockref.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/lib/lockref.c b/lib/lockref.c
index 494994bf17c8..5a92189ad711 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -4,14 +4,6 @@
#if USE_CMPXCHG_LOCKREF

/*
- * Allow weakly-ordered memory architectures to provide barrier-less
- * cmpxchg semantics for lockref updates.
- */
-#ifndef cmpxchg64_relaxed
-# define cmpxchg64_relaxed cmpxchg64
-#endif
-
-/*
* Note that the "cmpxchg()" reloads the "old" value for the
* failure case.
*/
--
2.1.4

2015-08-03 17:03:37

by Will Deacon

[permalink] [raw]
Subject: [PATCH v4 5/8] locking/qrwlock: implement queue_write_unlock using smp_store_release

Since 536fa402221f ("compiler: Allow 1- and 2-byte smp_load_acquire()
and smp_store_release()"), smp_store_release supports byte accesses, so
use that in writer unlock and remove the conditional macro override.

Signed-off-by: Will Deacon <[email protected]>
---
arch/x86/include/asm/qrwlock.h | 10 ----------
include/asm-generic/qrwlock.h | 9 +--------
2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/qrwlock.h b/arch/x86/include/asm/qrwlock.h
index a8810bf135ab..c537cbb038a7 100644
--- a/arch/x86/include/asm/qrwlock.h
+++ b/arch/x86/include/asm/qrwlock.h
@@ -2,16 +2,6 @@
#define _ASM_X86_QRWLOCK_H

#include <asm-generic/qrwlock_types.h>
-
-#ifndef CONFIG_X86_PPRO_FENCE
-#define queued_write_unlock queued_write_unlock
-static inline void queued_write_unlock(struct qrwlock *lock)
-{
- barrier();
- ACCESS_ONCE(*(u8 *)&lock->cnts) = 0;
-}
-#endif
-
#include <asm-generic/qrwlock.h>

#endif /* _ASM_X86_QRWLOCK_H */
diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index deb9e8b0eb9e..eb673dde8879 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -134,21 +134,14 @@ static inline void queued_read_unlock(struct qrwlock *lock)
atomic_sub(_QR_BIAS, &lock->cnts);
}

-#ifndef queued_write_unlock
/**
* queued_write_unlock - release write lock of a queue rwlock
* @lock : Pointer to queue rwlock structure
*/
static inline void queued_write_unlock(struct qrwlock *lock)
{
- /*
- * If the writer field is atomic, it can be cleared directly.
- * Otherwise, an atomic subtraction will be used to clear it.
- */
- smp_mb__before_atomic();
- atomic_sub(_QW_LOCKED, &lock->cnts);
+ smp_store_release((u8 *)&lock->cnts, 0);
}
-#endif

/*
* Remapping rwlock architecture specific functions to the corresponding
--
2.1.4

2015-08-03 17:03:13

by Will Deacon

[permalink] [raw]
Subject: [PATCH v4 6/8] locking/qrwlock: make use of acquire/release/relaxed atomics

The qrwlock implementation is slightly heavy in its use of memory
barriers, mainly through the use of cmpxchg and _return atomics, which
imply full barrier semantics.

This patch modifies the qrwlock code to use the more relaxed atomic
routines so that we can reduce the unnecessary barrier overhead on
weakly-ordered architectures.

Signed-off-by: Will Deacon <[email protected]>
---
include/asm-generic/qrwlock.h | 13 ++++++-------
kernel/locking/qrwlock.c | 23 +++++++++++++++--------
2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index eb673dde8879..54a8e65e18b6 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -68,7 +68,7 @@ static inline int queued_read_trylock(struct qrwlock *lock)

cnts = atomic_read(&lock->cnts);
if (likely(!(cnts & _QW_WMASK))) {
- cnts = (u32)atomic_add_return(_QR_BIAS, &lock->cnts);
+ cnts = (u32)atomic_add_return_acquire(_QR_BIAS, &lock->cnts);
if (likely(!(cnts & _QW_WMASK)))
return 1;
atomic_sub(_QR_BIAS, &lock->cnts);
@@ -89,8 +89,8 @@ static inline int queued_write_trylock(struct qrwlock *lock)
if (unlikely(cnts))
return 0;

- return likely(atomic_cmpxchg(&lock->cnts,
- cnts, cnts | _QW_LOCKED) == cnts);
+ return likely(atomic_cmpxchg_acquire(&lock->cnts,
+ cnts, cnts | _QW_LOCKED) == cnts);
}
/**
* queued_read_lock - acquire read lock of a queue rwlock
@@ -100,7 +100,7 @@ static inline void queued_read_lock(struct qrwlock *lock)
{
u32 cnts;

- cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
+ cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts);
if (likely(!(cnts & _QW_WMASK)))
return;

@@ -115,7 +115,7 @@ static inline void queued_read_lock(struct qrwlock *lock)
static inline void queued_write_lock(struct qrwlock *lock)
{
/* Optimize for the unfair lock case where the fair flag is 0. */
- if (atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0)
+ if (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0)
return;

queued_write_lock_slowpath(lock);
@@ -130,8 +130,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
/*
* Atomically decrement the reader count
*/
- smp_mb__before_atomic();
- atomic_sub(_QR_BIAS, &lock->cnts);
+ (void)atomic_sub_return_release(_QR_BIAS, &lock->cnts);
}

/**
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index d9c36c5f5711..fb4ef2d636f2 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -55,7 +55,7 @@ rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
{
while ((cnts & _QW_WMASK) == _QW_LOCKED) {
cpu_relax_lowlatency();
- cnts = smp_load_acquire((u32 *)&lock->cnts);
+ cnts = atomic_read_acquire(&lock->cnts);
}
}

@@ -74,8 +74,9 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
* Readers in interrupt context will get the lock immediately
* if the writer is just waiting (not holding the lock yet).
* The rspin_until_writer_unlock() function returns immediately
- * in this case. Otherwise, they will spin until the lock
- * is available without waiting in the queue.
+ * in this case. Otherwise, they will spin (with ACQUIRE
+ * semantics) until the lock is available without waiting in
+ * the queue.
*/
rspin_until_writer_unlock(lock, cnts);
return;
@@ -97,7 +98,13 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
while (atomic_read(&lock->cnts) & _QW_WMASK)
cpu_relax_lowlatency();

- cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
+ cnts = atomic_add_return_relaxed(_QR_BIAS, &lock->cnts) - _QR_BIAS;
+
+ /*
+ * The ACQUIRE semantics of the spinning code ensure that
+ * accesses can't leak upwards out of our subsequent critical
+ * section.
+ */
rspin_until_writer_unlock(lock, cnts);

/*
@@ -120,7 +127,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)

/* Try to acquire the lock directly if no reader is present */
if (!atomic_read(&lock->cnts) &&
- (atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0))
+ (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0))
goto unlock;

/*
@@ -131,7 +138,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
struct __qrwlock *l = (struct __qrwlock *)lock;

if (!READ_ONCE(l->wmode) &&
- (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
+ (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0))
break;

cpu_relax_lowlatency();
@@ -141,8 +148,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
for (;;) {
cnts = atomic_read(&lock->cnts);
if ((cnts == _QW_WAITING) &&
- (atomic_cmpxchg(&lock->cnts, _QW_WAITING,
- _QW_LOCKED) == _QW_WAITING))
+ (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
+ _QW_LOCKED) == _QW_WAITING))
break;

cpu_relax_lowlatency();
--
2.1.4

2015-08-03 17:15:57

by Will Deacon

[permalink] [raw]
Subject: [PATCH v4 7/8] include/llist: use linux/atomic.h instead of asm/cmpxchg.h

Including an asm/ header directly is best avoided, so use linux/atomic.h
instead of asm/cmpxchg.h in linux/llist.h.

Signed-off-by: Will Deacon <[email protected]>
---
include/linux/llist.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/llist.h b/include/linux/llist.h
index fbf10a0bc095..fd4ca0b4fe0f 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -55,8 +55,8 @@
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

+#include <linux/atomic.h>
#include <linux/kernel.h>
-#include <asm/cmpxchg.h>

struct llist_head {
struct llist_node *first;
--
2.1.4

2015-08-03 17:15:59

by Will Deacon

[permalink] [raw]
Subject: [PATCH v4 8/8] ARM: atomics: define our SMP atomics in terms of _relaxed operations

By defining our SMP atomics in terms of relaxed operations, we gain
a small reduction in code size and have acquire/release/fence variants
generated automatically by the core code.

Signed-off-by: Will Deacon <[email protected]>
---
arch/arm/include/asm/atomic.h | 37 ++++++++++++++-------------------
arch/arm/include/asm/cmpxchg.h | 47 +++++++-----------------------------------
2 files changed, 24 insertions(+), 60 deletions(-)

diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index e22c11970b7b..bc9da3a3ff5e 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -57,12 +57,11 @@ static inline void atomic_##op(int i, atomic_t *v) \
} \

#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
-static inline int atomic_##op##_return(int i, atomic_t *v) \
+static inline int atomic_##op##_return_relaxed(int i, atomic_t *v) \
{ \
unsigned long tmp; \
int result; \
\
- smp_mb(); \
prefetchw(&v->counter); \
\
__asm__ __volatile__("@ atomic_" #op "_return\n" \
@@ -75,17 +74,17 @@ static inline int atomic_##op##_return(int i, atomic_t *v) \
: "r" (&v->counter), "Ir" (i) \
: "cc"); \
\
- smp_mb(); \
- \
return result; \
}

-static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
+#define atomic_add_return_relaxed atomic_add_return_relaxed
+#define atomic_sub_return_relaxed atomic_sub_return_relaxed
+
+static inline int atomic_cmpxchg_relaxed(atomic_t *ptr, int old, int new)
{
int oldval;
unsigned long res;

- smp_mb();
prefetchw(&ptr->counter);

do {
@@ -99,10 +98,9 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
: "cc");
} while (res);

- smp_mb();
-
return oldval;
}
+#define atomic_cmpxchg_relaxed atomic_cmpxchg_relaxed

static inline int __atomic_add_unless(atomic_t *v, int a, int u)
{
@@ -290,12 +288,12 @@ static inline void atomic64_##op(long long i, atomic64_t *v) \
} \

#define ATOMIC64_OP_RETURN(op, op1, op2) \
-static inline long long atomic64_##op##_return(long long i, atomic64_t *v) \
+static inline long long \
+atomic64_##op##_return_relaxed(long long i, atomic64_t *v) \
{ \
long long result; \
unsigned long tmp; \
\
- smp_mb(); \
prefetchw(&v->counter); \
\
__asm__ __volatile__("@ atomic64_" #op "_return\n" \
@@ -309,8 +307,6 @@ static inline long long atomic64_##op##_return(long long i, atomic64_t *v) \
: "r" (&v->counter), "r" (i) \
: "cc"); \
\
- smp_mb(); \
- \
return result; \
}

@@ -321,17 +317,19 @@ static inline long long atomic64_##op##_return(long long i, atomic64_t *v) \
ATOMIC64_OPS(add, adds, adc)
ATOMIC64_OPS(sub, subs, sbc)

+#define atomic64_add_return_relaxed atomic64_add_return_relaxed
+#define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
+
#undef ATOMIC64_OPS
#undef ATOMIC64_OP_RETURN
#undef ATOMIC64_OP

-static inline long long atomic64_cmpxchg(atomic64_t *ptr, long long old,
- long long new)
+static inline long long
+atomic64_cmpxchg_relaxed(atomic64_t *ptr, long long old, long long new)
{
long long oldval;
unsigned long res;

- smp_mb();
prefetchw(&ptr->counter);

do {
@@ -346,17 +344,15 @@ static inline long long atomic64_cmpxchg(atomic64_t *ptr, long long old,
: "cc");
} while (res);

- smp_mb();
-
return oldval;
}
+#define atomic64_cmpxchg_relaxed atomic64_cmpxchg_relaxed

-static inline long long atomic64_xchg(atomic64_t *ptr, long long new)
+static inline long long atomic64_xchg_relaxed(atomic64_t *ptr, long long new)
{
long long result;
unsigned long tmp;

- smp_mb();
prefetchw(&ptr->counter);

__asm__ __volatile__("@ atomic64_xchg\n"
@@ -368,10 +364,9 @@ static inline long long atomic64_xchg(atomic64_t *ptr, long long new)
: "r" (&ptr->counter), "r" (new)
: "cc");

- smp_mb();
-
return result;
}
+#define atomic64_xchg_relaxed atomic64_xchg_relaxed

static inline long long atomic64_dec_if_positive(atomic64_t *v)
{
diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
index 1692a05d3207..916a2744d5c6 100644
--- a/arch/arm/include/asm/cmpxchg.h
+++ b/arch/arm/include/asm/cmpxchg.h
@@ -35,7 +35,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
unsigned int tmp;
#endif

- smp_mb();
prefetchw((const void *)ptr);

switch (size) {
@@ -98,12 +97,11 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
__bad_xchg(ptr, size), ret = 0;
break;
}
- smp_mb();

return ret;
}

-#define xchg(ptr, x) ({ \
+#define xchg_relaxed(ptr, x) ({ \
(__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr), \
sizeof(*(ptr))); \
})
@@ -117,6 +115,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
#error "SMP is not supported on this platform"
#endif

+#define xchg xchg_relaxed
+
/*
* cmpxchg_local and cmpxchg64_local are atomic wrt current CPU. Always make
* them available.
@@ -194,23 +194,11 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
return oldval;
}

-static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
- unsigned long new, int size)
-{
- unsigned long ret;
-
- smp_mb();
- ret = __cmpxchg(ptr, old, new, size);
- smp_mb();
-
- return ret;
-}
-
-#define cmpxchg(ptr,o,n) ({ \
- (__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
- (unsigned long)(o), \
- (unsigned long)(n), \
- sizeof(*(ptr))); \
+#define cmpxchg_relaxed(ptr,o,n) ({ \
+ (__typeof__(*(ptr)))__cmpxchg((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))); \
})

static inline unsigned long __cmpxchg_local(volatile void *ptr,
@@ -273,25 +261,6 @@ static inline unsigned long long __cmpxchg64(unsigned long long *ptr,

#define cmpxchg64_local(ptr, o, n) cmpxchg64_relaxed((ptr), (o), (n))

-static inline unsigned long long __cmpxchg64_mb(unsigned long long *ptr,
- unsigned long long old,
- unsigned long long new)
-{
- unsigned long long ret;
-
- smp_mb();
- ret = __cmpxchg64(ptr, old, new);
- smp_mb();
-
- return ret;
-}
-
-#define cmpxchg64(ptr, o, n) ({ \
- (__typeof__(*(ptr)))__cmpxchg64_mb((ptr), \
- (unsigned long long)(o), \
- (unsigned long long)(n)); \
-})
-
#endif /* __LINUX_ARM_ARCH__ >= 6 */

#endif /* __ASM_ARM_CMPXCHG_H */
--
2.1.4

2015-08-03 17:27:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 1/8] atomics: add acquire/release/relaxed variants of some atomic operations

On Mon, Aug 03, 2015 at 06:02:24PM +0100, Will Deacon wrote:
> +/*
> + * The idea here is to build acquire/release variants by adding explicit
> + * barriers on top of the relaxed variant. In the case where the relaxed
> + * variant is already fully ordered, no additional barriers are needed.
> + */
> +#define __atomic_op_acquire(ret_t, op, ...) \
> +({ \
> + ret_t __ret = op##_relaxed(__VA_ARGS__); \

Do you really need ret_t? Can't we use typeof() on the expression?

> + smp_mb__after_atomic(); \
> + __ret; \
> +})

2015-08-03 18:21:41

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v4 1/8] atomics: add acquire/release/relaxed variants of some atomic operations

On Mon, Aug 03, 2015 at 06:26:58PM +0100, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 06:02:24PM +0100, Will Deacon wrote:
> > +/*
> > + * The idea here is to build acquire/release variants by adding explicit
> > + * barriers on top of the relaxed variant. In the case where the relaxed
> > + * variant is already fully ordered, no additional barriers are needed.
> > + */
> > +#define __atomic_op_acquire(ret_t, op, ...) \
> > +({ \
> > + ret_t __ret = op##_relaxed(__VA_ARGS__); \
>
> Do you really need ret_t? Can't we use typeof() on the expression?

*gulp*! I was slightly worried about this from the GNU docs:

`The operand of typeof is evaluated for its side effects if and only if
it is an expression of variably modified type or the name of such a
type.'

but since none of our atomic functions return "variably modified types",
then there shouldn't be anything to worry about. It also means I can
slightly simplify the xchg/cmpxchg wrappers where we were previously
passing through the typeof(*ptr).

Incremental diff below (I'll post a v5 when the build testing comes back
clean).

Will

--->8

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index d2515c05e7c8..41ea776052be 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -35,26 +35,24 @@
* barriers on top of the relaxed variant. In the case where the relaxed
* variant is already fully ordered, no additional barriers are needed.
*/
-#define __atomic_op_acquire(ret_t, op, ...) \
+#define __atomic_op_acquire(op, args...) \
({ \
- ret_t __ret = op##_relaxed(__VA_ARGS__); \
+ typeof(op##_relaxed(args)) __ret = op##_relaxed(args); \
smp_mb__after_atomic(); \
__ret; \
})

-#define __atomic_op_release(ret_t, op, ...) \
+#define __atomic_op_release(op, args...) \
({ \
- ret_t __ret; \
smp_mb__before_atomic(); \
- __ret = op##_relaxed(__VA_ARGS__); \
- __ret; \
+ op##_relaxed(args); \
})

-#define __atomic_op_fence(ret_t, op, ...) \
+#define __atomic_op_fence(op, args...) \
({ \
- ret_t __ret; \
+ typeof(op##_relaxed(args)) __ret; \
smp_mb__before_atomic(); \
- __ret = op##_relaxed(__VA_ARGS__); \
+ __ret = op##_relaxed(args); \
smp_mb__after_atomic(); \
__ret; \
})
@@ -69,17 +67,17 @@

#ifndef atomic_add_return_acquire
#define atomic_add_return_acquire(...) \
- __atomic_op_acquire(int, atomic_add_return, __VA_ARGS__)
+ __atomic_op_acquire(atomic_add_return, __VA_ARGS__)
#endif

#ifndef atomic_add_return_release
#define atomic_add_return_release(...) \
- __atomic_op_release(int, atomic_add_return, __VA_ARGS__)
+ __atomic_op_release(atomic_add_return, __VA_ARGS__)
#endif

#ifndef atomic_add_return
#define atomic_add_return(...) \
- __atomic_op_fence(int, atomic_add_return, __VA_ARGS__)
+ __atomic_op_fence(atomic_add_return, __VA_ARGS__)
#endif
#endif /* atomic_add_return_relaxed */

@@ -93,17 +91,17 @@

#ifndef atomic_sub_return_acquire
#define atomic_sub_return_acquire(...) \
- __atomic_op_acquire(int, atomic_sub_return, __VA_ARGS__)
+ __atomic_op_acquire(atomic_sub_return, __VA_ARGS__)
#endif

#ifndef atomic_sub_return_release
#define atomic_sub_return_release(...) \
- __atomic_op_release(int, atomic_sub_return, __VA_ARGS__)
+ __atomic_op_release(atomic_sub_return, __VA_ARGS__)
#endif

#ifndef atomic_sub_return
#define atomic_sub_return(...) \
- __atomic_op_fence(int, atomic_sub_return, __VA_ARGS__)
+ __atomic_op_fence(atomic_sub_return, __VA_ARGS__)
#endif
#endif /* atomic_sub_return_relaxed */

@@ -117,17 +115,17 @@

#ifndef atomic_xchg_acquire
#define atomic_xchg_acquire(...) \
- __atomic_op_acquire(int, atomic_xchg, __VA_ARGS__)
+ __atomic_op_acquire(atomic_xchg, __VA_ARGS__)
#endif

#ifndef atomic_xchg_release
#define atomic_xchg_release(...) \
- __atomic_op_release(int, atomic_xchg, __VA_ARGS__)
+ __atomic_op_release(atomic_xchg, __VA_ARGS__)
#endif

#ifndef atomic_xchg
#define atomic_xchg(...) \
- __atomic_op_fence(int, atomic_xchg, __VA_ARGS__)
+ __atomic_op_fence(atomic_xchg, __VA_ARGS__)
#endif
#endif /* atomic_xchg_relaxed */

@@ -141,17 +139,17 @@

#ifndef atomic_cmpxchg_acquire
#define atomic_cmpxchg_acquire(...) \
- __atomic_op_acquire(int, atomic_cmpxchg, __VA_ARGS__)
+ __atomic_op_acquire(atomic_cmpxchg, __VA_ARGS__)
#endif

#ifndef atomic_cmpxchg_release
#define atomic_cmpxchg_release(...) \
- __atomic_op_release(int, atomic_cmpxchg, __VA_ARGS__)
+ __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)
#endif

#ifndef atomic_cmpxchg
#define atomic_cmpxchg(...) \
- __atomic_op_fence(int, atomic_cmpxchg, __VA_ARGS__)
+ __atomic_op_fence(atomic_cmpxchg, __VA_ARGS__)
#endif
#endif /* atomic_cmpxchg_relaxed */

@@ -173,17 +171,17 @@

#ifndef atomic64_add_return_acquire
#define atomic64_add_return_acquire(...) \
- __atomic_op_acquire(long long, atomic64_add_return, __VA_ARGS__)
+ __atomic_op_acquire(atomic64_add_return, __VA_ARGS__)
#endif

#ifndef atomic64_add_return_release
#define atomic64_add_return_release(...) \
- __atomic_op_release(long long, atomic64_add_return, __VA_ARGS__)
+ __atomic_op_release(atomic64_add_return, __VA_ARGS__)
#endif

#ifndef atomic64_add_return
#define atomic64_add_return(...) \
- __atomic_op_fence(long long, atomic64_add_return, __VA_ARGS__)
+ __atomic_op_fence(atomic64_add_return, __VA_ARGS__)
#endif
#endif /* atomic64_add_return_relaxed */

@@ -197,17 +195,17 @@

#ifndef atomic64_sub_return_acquire
#define atomic64_sub_return_acquire(...) \
- __atomic_op_acquire(long long, atomic64_sub_return, __VA_ARGS__)
+ __atomic_op_acquire(atomic64_sub_return, __VA_ARGS__)
#endif

#ifndef atomic64_sub_return_release
#define atomic64_sub_return_release(...) \
- __atomic_op_release(long long, atomic64_sub_return, __VA_ARGS__)
+ __atomic_op_release(atomic64_sub_return, __VA_ARGS__)
#endif

#ifndef atomic64_sub_return
#define atomic64_sub_return(...) \
- __atomic_op_fence(long long, atomic64_sub_return, __VA_ARGS__)
+ __atomic_op_fence(atomic64_sub_return, __VA_ARGS__)
#endif
#endif /* atomic64_sub_return_relaxed */

@@ -221,17 +219,17 @@

#ifndef atomic64_xchg_acquire
#define atomic64_xchg_acquire(...) \
- __atomic_op_acquire(long long, atomic64_xchg, __VA_ARGS__)
+ __atomic_op_acquire(atomic64_xchg, __VA_ARGS__)
#endif

#ifndef atomic64_xchg_release
#define atomic64_xchg_release(...) \
- __atomic_op_release(long long, atomic64_xchg, __VA_ARGS__)
+ __atomic_op_release(atomic64_xchg, __VA_ARGS__)
#endif

#ifndef atomic64_xchg
#define atomic64_xchg(...) \
- __atomic_op_fence(long long, atomic64_xchg, __VA_ARGS__)
+ __atomic_op_fence(atomic64_xchg, __VA_ARGS__)
#endif
#endif /* atomic64_xchg_relaxed */

@@ -245,17 +243,17 @@

#ifndef atomic64_cmpxchg_acquire
#define atomic64_cmpxchg_acquire(...) \
- __atomic_op_acquire(long long, atomic64_cmpxchg, __VA_ARGS__)
+ __atomic_op_acquire(atomic64_cmpxchg, __VA_ARGS__)
#endif

#ifndef atomic64_cmpxchg_release
#define atomic64_cmpxchg_release(...) \
- __atomic_op_release(long long, atomic64_cmpxchg, __VA_ARGS__)
+ __atomic_op_release(atomic64_cmpxchg, __VA_ARGS__)
#endif

#ifndef atomic64_cmpxchg
#define atomic64_cmpxchg(...) \
- __atomic_op_fence(long long, atomic64_cmpxchg, __VA_ARGS__)
+ __atomic_op_fence(atomic64_cmpxchg, __VA_ARGS__)
#endif
#endif /* atomic64_cmpxchg_relaxed */

@@ -268,18 +266,18 @@
#else /* cmpxchg_relaxed */

#ifndef cmpxchg_acquire
-#define cmpxchg_acquire(ptr, ...) \
- __atomic_op_acquire(__typeof__(*ptr), cmpxchg, ptr, __VA_ARGS__)
+#define cmpxchg_acquire(...) \
+ __atomic_op_acquire(cmpxchg, __VA_ARGS__)
#endif

#ifndef cmpxchg_release
-#define cmpxchg_release(ptr, ...) \
- __atomic_op_release(__typeof__(*ptr), cmpxchg, ptr, __VA_ARGS__)
+#define cmpxchg_release(...) \
+ __atomic_op_release(cmpxchg, __VA_ARGS__)
#endif

#ifndef cmpxchg
-#define cmpxchg(ptr, ...) \
- __atomic_op_fence(__typeof__(*ptr), cmpxchg, ptr, __VA_ARGS__)
+#define cmpxchg(...) \
+ __atomic_op_fence(cmpxchg, __VA_ARGS__)
#endif
#endif /* cmpxchg_relaxed */

@@ -292,18 +290,18 @@
#else /* cmpxchg64_relaxed */

#ifndef cmpxchg64_acquire
-#define cmpxchg64_acquire(ptr, ...) \
- __atomic_op_acquire(__typeof__(*ptr), cmpxchg64, ptr, __VA_ARGS__)
+#define cmpxchg64_acquire(...) \
+ __atomic_op_acquire(cmpxchg64, __VA_ARGS__)
#endif

#ifndef cmpxchg64_release
-#define cmpxchg64_release(ptr, ...) \
- __atomic_op_release(__typeof__(*ptr), cmpxchg64, ptr, __VA_ARGS__)
+#define cmpxchg64_release(...) \
+ __atomic_op_release(cmpxchg64, __VA_ARGS__)
#endif

#ifndef cmpxchg64
-#define cmpxchg64(ptr, ...) \
- __atomic_op_fence(__typeof__(*ptr), cmpxchg64, ptr, __VA_ARGS__)
+#define cmpxchg64(...) \
+ __atomic_op_fence(cmpxchg64, __VA_ARGS__)
#endif
#endif /* cmpxchg64_relaxed */

@@ -316,18 +314,15 @@
#else /* xchg_relaxed */

#ifndef xchg_acquire
-#define xchg_acquire(ptr, ...) \
- __atomic_op_acquire(__typeof__(*ptr), xchg, ptr, __VA_ARGS__)
+#define xchg_acquire(...) __atomic_op_acquire(xchg, __VA_ARGS__)
#endif

#ifndef xchg_release
-#define xchg_release(ptr, ...) \
- __atomic_op_release(__typeof__(*ptr), xchg, ptr, __VA_ARGS__)
+#define xchg_release(...) __atomic_op_release(xchg, __VA_ARGS__)
#endif

#ifndef xchg
-#define xchg(ptr, ...) \
- __atomic_op_fence(__typeof__(*ptr), xchg, ptr, __VA_ARGS__)
+#define xchg(...) __atomic_op_fence(xchg, __VA_ARGS__)
#endif
#endif /* xchg_relaxed */

2015-08-03 20:44:49

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v4 5/8] locking/qrwlock: implement queue_write_unlock using smp_store_release

On 08/03/2015 01:02 PM, Will Deacon wrote:
> Since 536fa402221f ("compiler: Allow 1- and 2-byte smp_load_acquire()
> and smp_store_release()"), smp_store_release supports byte accesses, so
> use that in writer unlock and remove the conditional macro override.
>
> Signed-off-by: Will Deacon<[email protected]>
> ---
> arch/x86/include/asm/qrwlock.h | 10 ----------
> include/asm-generic/qrwlock.h | 9 +--------
> 2 files changed, 1 insertion(+), 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/qrwlock.h b/arch/x86/include/asm/qrwlock.h
> index a8810bf135ab..c537cbb038a7 100644
> --- a/arch/x86/include/asm/qrwlock.h
> +++ b/arch/x86/include/asm/qrwlock.h
> @@ -2,16 +2,6 @@
> #define _ASM_X86_QRWLOCK_H
>
> #include<asm-generic/qrwlock_types.h>
> -
> -#ifndef CONFIG_X86_PPRO_FENCE
> -#define queued_write_unlock queued_write_unlock
> -static inline void queued_write_unlock(struct qrwlock *lock)
> -{
> - barrier();
> - ACCESS_ONCE(*(u8 *)&lock->cnts) = 0;
> -}
> -#endif
> -
> #include<asm-generic/qrwlock.h>
>
> #endif /* _ASM_X86_QRWLOCK_H */
> diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
> index deb9e8b0eb9e..eb673dde8879 100644
> --- a/include/asm-generic/qrwlock.h
> +++ b/include/asm-generic/qrwlock.h
> @@ -134,21 +134,14 @@ static inline void queued_read_unlock(struct qrwlock *lock)
> atomic_sub(_QR_BIAS,&lock->cnts);
> }
>
> -#ifndef queued_write_unlock
> /**
> * queued_write_unlock - release write lock of a queue rwlock
> * @lock : Pointer to queue rwlock structure
> */
> static inline void queued_write_unlock(struct qrwlock *lock)
> {
> - /*
> - * If the writer field is atomic, it can be cleared directly.
> - * Otherwise, an atomic subtraction will be used to clear it.
> - */
> - smp_mb__before_atomic();
> - atomic_sub(_QW_LOCKED,&lock->cnts);
> + smp_store_release((u8 *)&lock->cnts, 0);
> }
> -#endif
>
> /*
> * Remapping rwlock architecture specific functions to the corresponding

Reviewed-by: Waiman Long <[email protected]>

2015-08-03 20:49:35

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v4 6/8] locking/qrwlock: make use of acquire/release/relaxed atomics

On 08/03/2015 01:02 PM, Will Deacon wrote:
> The qrwlock implementation is slightly heavy in its use of memory
> barriers, mainly through the use of cmpxchg and _return atomics, which
> imply full barrier semantics.
>
> This patch modifies the qrwlock code to use the more relaxed atomic
> routines so that we can reduce the unnecessary barrier overhead on
> weakly-ordered architectures.
>
> Signed-off-by: Will Deacon<[email protected]>
> ---
> include/asm-generic/qrwlock.h | 13 ++++++-------
> kernel/locking/qrwlock.c | 23 +++++++++++++++--------
> 2 files changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
> index eb673dde8879..54a8e65e18b6 100644
> --- a/include/asm-generic/qrwlock.h
> +++ b/include/asm-generic/qrwlock.h
> @@ -68,7 +68,7 @@ static inline int queued_read_trylock(struct qrwlock *lock)
>
> cnts = atomic_read(&lock->cnts);
> if (likely(!(cnts& _QW_WMASK))) {
> - cnts = (u32)atomic_add_return(_QR_BIAS,&lock->cnts);
> + cnts = (u32)atomic_add_return_acquire(_QR_BIAS,&lock->cnts);
> if (likely(!(cnts& _QW_WMASK)))
> return 1;
> atomic_sub(_QR_BIAS,&lock->cnts);
> @@ -89,8 +89,8 @@ static inline int queued_write_trylock(struct qrwlock *lock)
> if (unlikely(cnts))
> return 0;
>
> - return likely(atomic_cmpxchg(&lock->cnts,
> - cnts, cnts | _QW_LOCKED) == cnts);
> + return likely(atomic_cmpxchg_acquire(&lock->cnts,
> + cnts, cnts | _QW_LOCKED) == cnts);
> }
> /**
> * queued_read_lock - acquire read lock of a queue rwlock
> @@ -100,7 +100,7 @@ static inline void queued_read_lock(struct qrwlock *lock)
> {
> u32 cnts;
>
> - cnts = atomic_add_return(_QR_BIAS,&lock->cnts);
> + cnts = atomic_add_return_acquire(_QR_BIAS,&lock->cnts);
> if (likely(!(cnts& _QW_WMASK)))
> return;
>
> @@ -115,7 +115,7 @@ static inline void queued_read_lock(struct qrwlock *lock)
> static inline void queued_write_lock(struct qrwlock *lock)
> {
> /* Optimize for the unfair lock case where the fair flag is 0. */
> - if (atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0)
> + if (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0)
> return;
>
> queued_write_lock_slowpath(lock);
> @@ -130,8 +130,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
> /*
> * Atomically decrement the reader count
> */
> - smp_mb__before_atomic();
> - atomic_sub(_QR_BIAS,&lock->cnts);
> + (void)atomic_sub_return_release(_QR_BIAS,&lock->cnts);
> }
>
> /**
> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index d9c36c5f5711..fb4ef2d636f2 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -55,7 +55,7 @@ rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
> {
> while ((cnts& _QW_WMASK) == _QW_LOCKED) {
> cpu_relax_lowlatency();
> - cnts = smp_load_acquire((u32 *)&lock->cnts);
> + cnts = atomic_read_acquire(&lock->cnts);
> }
> }
>
> @@ -74,8 +74,9 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
> * Readers in interrupt context will get the lock immediately
> * if the writer is just waiting (not holding the lock yet).
> * The rspin_until_writer_unlock() function returns immediately
> - * in this case. Otherwise, they will spin until the lock
> - * is available without waiting in the queue.
> + * in this case. Otherwise, they will spin (with ACQUIRE
> + * semantics) until the lock is available without waiting in
> + * the queue.
> */
> rspin_until_writer_unlock(lock, cnts);
> return;
> @@ -97,7 +98,13 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
> while (atomic_read(&lock->cnts)& _QW_WMASK)
> cpu_relax_lowlatency();
>
> - cnts = atomic_add_return(_QR_BIAS,&lock->cnts) - _QR_BIAS;
> + cnts = atomic_add_return_relaxed(_QR_BIAS,&lock->cnts) - _QR_BIAS;
> +
> + /*
> + * The ACQUIRE semantics of the spinning code ensure that
> + * accesses can't leak upwards out of our subsequent critical
> + * section.
> + */

Maybe you should be more specific to mention the arch_spin_lock() call
above. Other than that,

Reviewed-by: Waiman Long <[email protected]>

2015-08-04 11:20:32

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v4 6/8] locking/qrwlock: make use of acquire/release/relaxed atomics

Hi Waiman,

Thanks for having a look.

On Mon, Aug 03, 2015 at 09:49:26PM +0100, Waiman Long wrote:
> On 08/03/2015 01:02 PM, Will Deacon wrote:
> > The qrwlock implementation is slightly heavy in its use of memory
> > barriers, mainly through the use of cmpxchg and _return atomics, which
> > imply full barrier semantics.
> >
> > This patch modifies the qrwlock code to use the more relaxed atomic
> > routines so that we can reduce the unnecessary barrier overhead on
> > weakly-ordered architectures.
> >
> > Signed-off-by: Will Deacon<[email protected]>

[...]

> > @@ -74,8 +74,9 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
> > * Readers in interrupt context will get the lock immediately
> > * if the writer is just waiting (not holding the lock yet).
> > * The rspin_until_writer_unlock() function returns immediately
> > - * in this case. Otherwise, they will spin until the lock
> > - * is available without waiting in the queue.
> > + * in this case. Otherwise, they will spin (with ACQUIRE
> > + * semantics) until the lock is available without waiting in
> > + * the queue.
> > */
> > rspin_until_writer_unlock(lock, cnts);
> > return;
> > @@ -97,7 +98,13 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
> > while (atomic_read(&lock->cnts)& _QW_WMASK)
> > cpu_relax_lowlatency();
> >
> > - cnts = atomic_add_return(_QR_BIAS,&lock->cnts) - _QR_BIAS;
> > + cnts = atomic_add_return_relaxed(_QR_BIAS,&lock->cnts) - _QR_BIAS;
> > +
> > + /*
> > + * The ACQUIRE semantics of the spinning code ensure that
> > + * accesses can't leak upwards out of our subsequent critical
> > + * section.
> > + */
>
> Maybe you should be more specific to mention the arch_spin_lock() call
> above. Other than that,

Actually, I think you've uncovered a bug! Initially, I based this on top
of my qrwlock series that made the acquire unconditional in
rspin_until_writer_unlock, but you (reasonably) objected to the extra
overhead on the interrupt path, so now we only get an acquire if the
initial test of (cnts & _QW_WMASK) == _QW_LOCKED) succeeds.

So actually, the atomic_add_return needs to be made an
atomic_add_return_acquire. I'll make that change and adjust the comment
accordingly.

Fixup below.

Cheers,

Will

--->8

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index fb4ef2d636f2..1724eac4c84b 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -98,13 +98,12 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
while (atomic_read(&lock->cnts) & _QW_WMASK)
cpu_relax_lowlatency();

- cnts = atomic_add_return_relaxed(_QR_BIAS, &lock->cnts) - _QR_BIAS;
-
/*
- * The ACQUIRE semantics of the spinning code ensure that
- * accesses can't leak upwards out of our subsequent critical
- * section.
+ * The ACQUIRE semantics of the following spinning code ensure
+ * that accesses can't leak upwards out of our subsequent critical
+ * section in the case that the lock is currently held for write.
*/
+ cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
rspin_until_writer_unlock(lock, cnts);

/*