LinuxLists.cc - [PATCH 00/11] ARC atomics update

2021-08-04 21:14:58

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 00/11] ARC atomics update

Hi,

This series contains long due update to ARC atomics, discussed back
in 2018 [1] and [2]. I had them for arc64 port and decided to post them
here for some review and inclusion, after Mark's rework.

The main changes are use of relaxed atomics and generic bitops. Latter
does cause some cogen bloat on ARC due to signed args but that can be
reviewd seperately consider cross-arch impact.

The changes survive glibc testsuite with no regressions whatsoever.

Please review and provide any feedback.

Thx,
-Vineet

[1] https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/[email protected]

Vineet Gupta (10):
ARC: atomics: disintegrate header
ARC: atomic: !LLSC: remove hack in atomic_set() for for UP
ARC: atomic: !LLSC: use int data type consistently
ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
ARC: atomics: implement relaxed variants
ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
ARC: xchg: !LLSC: remove UP micro-optimization/hack
ARC: cmpxchg/xchg: rewrite as macros to make type safe
ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants

Will Deacon (1):
ARC: switch to generic bitops

arch/arc/include/asm/atomic-llsc.h | 97 ++++++
arch/arc/include/asm/atomic-spinlock.h | 102 ++++++
arch/arc/include/asm/atomic.h | 444 ++-----------------------
arch/arc/include/asm/atomic64-arcv2.h | 250 ++++++++++++++
arch/arc/include/asm/bitops.h | 188 +----------
arch/arc/include/asm/cmpxchg.h | 233 ++++++-------
arch/arc/include/asm/smp.h | 14 -
arch/arc/kernel/smp.c | 2 -
8 files changed, 588 insertions(+), 742 deletions(-)
create mode 100644 arch/arc/include/asm/atomic-llsc.h
create mode 100644 arch/arc/include/asm/atomic-spinlock.h
create mode 100644 arch/arc/include/asm/atomic64-arcv2.h

--
2.25.1

2021-08-04 21:15:30

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 04/11] ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return

This is a non-functional change since those wrappers are not
used in kernel sources at all.

Link: http://lists.infradead.org/pipermail/linux-snps-arc/2018-August/004246.html
Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/atomic64-arcv2.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/arc/include/asm/atomic64-arcv2.h b/arch/arc/include/asm/atomic64-arcv2.h
index 53996b11b551..22ef1cbb94e2 100644
--- a/arch/arc/include/asm/atomic64-arcv2.h
+++ b/arch/arc/include/asm/atomic64-arcv2.h
@@ -116,6 +116,12 @@ static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v) \

ATOMIC64_OPS(add, add.f, adc)
ATOMIC64_OPS(sub, sub.f, sbc)
+
+#undef ATOMIC64_OPS
+#define ATOMIC64_OPS(op, op1, op2) \
+ ATOMIC64_OP(op, op1, op2) \
+ ATOMIC64_FETCH_OP(op, op1, op2)
+
ATOMIC64_OPS(and, and, and)
ATOMIC64_OPS(andnot, bic, bic)
ATOMIC64_OPS(or, or, or)
--
2.25.1

2021-08-04 21:16:23

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 01/11] ARC: atomics: disintegrate header

Non functional change, to ease future addition/removal

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/atomic-llsc.h | 103 ++++++
arch/arc/include/asm/atomic-spinlock.h | 111 +++++++
arch/arc/include/asm/atomic.h | 429 +------------------------
arch/arc/include/asm/atomic64-arcv2.h | 242 ++++++++++++++
4 files changed, 461 insertions(+), 424 deletions(-)
create mode 100644 arch/arc/include/asm/atomic-llsc.h
create mode 100644 arch/arc/include/asm/atomic-spinlock.h
create mode 100644 arch/arc/include/asm/atomic64-arcv2.h

diff --git a/arch/arc/include/asm/atomic-llsc.h b/arch/arc/include/asm/atomic-llsc.h
new file mode 100644
index 000000000000..aab4f2855457
--- /dev/null
+++ b/arch/arc/include/asm/atomic-llsc.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_ARC_ATOMIC_LLSC_H
+#define _ASM_ARC_ATOMIC_LLSC_H
+
+#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
+
+#define ATOMIC_OP(op, c_op, asm_op) \
+static inline void arch_atomic_##op(int i, atomic_t *v) \
+{ \
+ unsigned int val; \
+ \
+ __asm__ __volatile__( \
+ "1: llock %[val], [%[ctr]] \n" \
+ " " #asm_op " %[val], %[val], %[i] \n" \
+ " scond %[val], [%[ctr]] \n" \
+ " bnz 1b \n" \
+ : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
+ : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \
+ [i] "ir" (i) \
+ : "cc"); \
+} \
+
+#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
+static inline int arch_atomic_##op##_return(int i, atomic_t *v) \
+{ \
+ unsigned int val; \
+ \
+ /* \
+ * Explicit full memory barrier needed before/after as \
+ * LLOCK/SCOND themselves don't provide any such semantics \
+ */ \
+ smp_mb(); \
+ \
+ __asm__ __volatile__( \
+ "1: llock %[val], [%[ctr]] \n" \
+ " " #asm_op " %[val], %[val], %[i] \n" \
+ " scond %[val], [%[ctr]] \n" \
+ " bnz 1b \n" \
+ : [val] "=&r" (val) \
+ : [ctr] "r" (&v->counter), \
+ [i] "ir" (i) \
+ : "cc"); \
+ \
+ smp_mb(); \
+ \
+ return val; \
+}
+
+#define ATOMIC_FETCH_OP(op, c_op, asm_op) \
+static inline int arch_atomic_fetch_##op(int i, atomic_t *v) \
+{ \
+ unsigned int val, orig; \
+ \
+ /* \
+ * Explicit full memory barrier needed before/after as \
+ * LLOCK/SCOND themselves don't provide any such semantics \
+ */ \
+ smp_mb(); \
+ \
+ __asm__ __volatile__( \
+ "1: llock %[orig], [%[ctr]] \n" \
+ " " #asm_op " %[val], %[orig], %[i] \n" \
+ " scond %[val], [%[ctr]] \n" \
+ " bnz 1b \n" \
+ : [val] "=&r" (val), \
+ [orig] "=&r" (orig) \
+ : [ctr] "r" (&v->counter), \
+ [i] "ir" (i) \
+ : "cc"); \
+ \
+ smp_mb(); \
+ \
+ return orig; \
+}
+
+#define ATOMIC_OPS(op, c_op, asm_op) \
+ ATOMIC_OP(op, c_op, asm_op) \
+ ATOMIC_OP_RETURN(op, c_op, asm_op) \
+ ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(add, +=, add)
+ATOMIC_OPS(sub, -=, sub)
+
+#undef ATOMIC_OPS
+#define ATOMIC_OPS(op, c_op, asm_op) \
+ ATOMIC_OP(op, c_op, asm_op) \
+ ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(and, &=, and)
+ATOMIC_OPS(andnot, &= ~, bic)
+ATOMIC_OPS(or, |=, or)
+ATOMIC_OPS(xor, ^=, xor)
+
+#define arch_atomic_andnot arch_atomic_andnot
+#define arch_atomic_fetch_andnot arch_atomic_fetch_andnot
+
+#undef ATOMIC_OPS
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP
+
+#endif
diff --git a/arch/arc/include/asm/atomic-spinlock.h b/arch/arc/include/asm/atomic-spinlock.h
new file mode 100644
index 000000000000..bdf87610b2d7
--- /dev/null
+++ b/arch/arc/include/asm/atomic-spinlock.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_ARC_ATOMIC_SPLOCK_H
+#define _ASM_ARC_ATOMIC_SPLOCK_H
+
+#ifndef CONFIG_SMP
+
+ /* violating atomic_xxx API locking protocol in UP for optimization sake */
+#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
+
+#else
+
+static inline void arch_atomic_set(atomic_t *v, int i)
+{
+ /*
+ * Independent of hardware support, all of the atomic_xxx() APIs need
+ * to follow the same locking rules to make sure that a "hardware"
+ * atomic insn (e.g. LD) doesn't clobber an "emulated" atomic insn
+ * sequence
+ *
+ * Thus atomic_set() despite being 1 insn (and seemingly atomic)
+ * requires the locking.
+ */
+ unsigned long flags;
+
+ atomic_ops_lock(flags);
+ WRITE_ONCE(v->counter, i);
+ atomic_ops_unlock(flags);
+}
+
+#define arch_atomic_set_release(v, i) arch_atomic_set((v), (i))
+
+#endif
+
+/*
+ * Non hardware assisted Atomic-R-M-W
+ * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
+ */
+
+#define ATOMIC_OP(op, c_op, asm_op) \
+static inline void arch_atomic_##op(int i, atomic_t *v) \
+{ \
+ unsigned long flags; \
+ \
+ atomic_ops_lock(flags); \
+ v->counter c_op i; \
+ atomic_ops_unlock(flags); \
+}
+
+#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
+static inline int arch_atomic_##op##_return(int i, atomic_t *v) \
+{ \
+ unsigned long flags; \
+ unsigned long temp; \
+ \
+ /* \
+ * spin lock/unlock provides the needed smp_mb() before/after \
+ */ \
+ atomic_ops_lock(flags); \
+ temp = v->counter; \
+ temp c_op i; \
+ v->counter = temp; \
+ atomic_ops_unlock(flags); \
+ \
+ return temp; \
+}
+
+#define ATOMIC_FETCH_OP(op, c_op, asm_op) \
+static inline int arch_atomic_fetch_##op(int i, atomic_t *v) \
+{ \
+ unsigned long flags; \
+ unsigned long orig; \
+ \
+ /* \
+ * spin lock/unlock provides the needed smp_mb() before/after \
+ */ \
+ atomic_ops_lock(flags); \
+ orig = v->counter; \
+ v->counter c_op i; \
+ atomic_ops_unlock(flags); \
+ \
+ return orig; \
+}
+
+#define ATOMIC_OPS(op, c_op, asm_op) \
+ ATOMIC_OP(op, c_op, asm_op) \
+ ATOMIC_OP_RETURN(op, c_op, asm_op) \
+ ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(add, +=, add)
+ATOMIC_OPS(sub, -=, sub)
+
+#undef ATOMIC_OPS
+#define ATOMIC_OPS(op, c_op, asm_op) \
+ ATOMIC_OP(op, c_op, asm_op) \
+ ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(and, &=, and)
+ATOMIC_OPS(andnot, &= ~, bic)
+ATOMIC_OPS(or, |=, or)
+ATOMIC_OPS(xor, ^=, xor)
+
+#define arch_atomic_andnot arch_atomic_andnot
+#define arch_atomic_fetch_andnot arch_atomic_fetch_andnot
+
+#undef ATOMIC_OPS
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP
+
+#endif
diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 7a36d79b5b2f..ee88e1dbaab5 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -17,435 +17,16 @@
#define arch_atomic_read(v) READ_ONCE((v)->counter)

#ifdef CONFIG_ARC_HAS_LLSC
-
-#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
-
-#define ATOMIC_OP(op, c_op, asm_op) \
-static inline void arch_atomic_##op(int i, atomic_t *v) \
-{ \
- unsigned int val; \
- \
- __asm__ __volatile__( \
- "1: llock %[val], [%[ctr]] \n" \
- " " #asm_op " %[val], %[val], %[i] \n" \
- " scond %[val], [%[ctr]] \n" \
- " bnz 1b \n" \
- : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
- : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \
- [i] "ir" (i) \
- : "cc"); \
-} \
-
-#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
-static inline int arch_atomic_##op##_return(int i, atomic_t *v) \
-{ \
- unsigned int val; \
- \
- /* \
- * Explicit full memory barrier needed before/after as \
- * LLOCK/SCOND themselves don't provide any such semantics \
- */ \
- smp_mb(); \
- \
- __asm__ __volatile__( \
- "1: llock %[val], [%[ctr]] \n" \
- " " #asm_op " %[val], %[val], %[i] \n" \
- " scond %[val], [%[ctr]] \n" \
- " bnz 1b \n" \
- : [val] "=&r" (val) \
- : [ctr] "r" (&v->counter), \
- [i] "ir" (i) \
- : "cc"); \
- \
- smp_mb(); \
- \
- return val; \
-}
-
-#define ATOMIC_FETCH_OP(op, c_op, asm_op) \
-static inline int arch_atomic_fetch_##op(int i, atomic_t *v) \
-{ \
- unsigned int val, orig; \
- \
- /* \
- * Explicit full memory barrier needed before/after as \
- * LLOCK/SCOND themselves don't provide any such semantics \
- */ \
- smp_mb(); \
- \
- __asm__ __volatile__( \
- "1: llock %[orig], [%[ctr]] \n" \
- " " #asm_op " %[val], %[orig], %[i] \n" \
- " scond %[val], [%[ctr]] \n" \
- " bnz 1b \n" \
- : [val] "=&r" (val), \
- [orig] "=&r" (orig) \
- : [ctr] "r" (&v->counter), \
- [i] "ir" (i) \
- : "cc"); \
- \
- smp_mb(); \
- \
- return orig; \
-}
-
-#else /* !CONFIG_ARC_HAS_LLSC */
-
-#ifndef CONFIG_SMP
-
- /* violating atomic_xxx API locking protocol in UP for optimization sake */
-#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
-
+#include <asm/atomic-llsc.h>
#else
-
-static inline void arch_atomic_set(atomic_t *v, int i)
-{
- /*
- * Independent of hardware support, all of the atomic_xxx() APIs need
- * to follow the same locking rules to make sure that a "hardware"
- * atomic insn (e.g. LD) doesn't clobber an "emulated" atomic insn
- * sequence
- *
- * Thus atomic_set() despite being 1 insn (and seemingly atomic)
- * requires the locking.
- */
- unsigned long flags;
-
- atomic_ops_lock(flags);
- WRITE_ONCE(v->counter, i);
- atomic_ops_unlock(flags);
-}
-
-#define arch_atomic_set_release(v, i) arch_atomic_set((v), (i))
-
+#include <asm/atomic-spinlock.h>
#endif

-/*
- * Non hardware assisted Atomic-R-M-W
- * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
- */
-
-#define ATOMIC_OP(op, c_op, asm_op) \
-static inline void arch_atomic_##op(int i, atomic_t *v) \
-{ \
- unsigned long flags; \
- \
- atomic_ops_lock(flags); \
- v->counter c_op i; \
- atomic_ops_unlock(flags); \
-}
-
-#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
-static inline int arch_atomic_##op##_return(int i, atomic_t *v) \
-{ \
- unsigned long flags; \
- unsigned long temp; \
- \
- /* \
- * spin lock/unlock provides the needed smp_mb() before/after \
- */ \
- atomic_ops_lock(flags); \
- temp = v->counter; \
- temp c_op i; \
- v->counter = temp; \
- atomic_ops_unlock(flags); \
- \
- return temp; \
-}
-
-#define ATOMIC_FETCH_OP(op, c_op, asm_op) \
-static inline int arch_atomic_fetch_##op(int i, atomic_t *v) \
-{ \
- unsigned long flags; \
- unsigned long orig; \
- \
- /* \
- * spin lock/unlock provides the needed smp_mb() before/after \
- */ \
- atomic_ops_lock(flags); \
- orig = v->counter; \
- v->counter c_op i; \
- atomic_ops_unlock(flags); \
- \
- return orig; \
-}
-
-#endif /* !CONFIG_ARC_HAS_LLSC */
-
-#define ATOMIC_OPS(op, c_op, asm_op) \
- ATOMIC_OP(op, c_op, asm_op) \
- ATOMIC_OP_RETURN(op, c_op, asm_op) \
- ATOMIC_FETCH_OP(op, c_op, asm_op)
-
-ATOMIC_OPS(add, +=, add)
-ATOMIC_OPS(sub, -=, sub)
-
-#undef ATOMIC_OPS
-#define ATOMIC_OPS(op, c_op, asm_op) \
- ATOMIC_OP(op, c_op, asm_op) \
- ATOMIC_FETCH_OP(op, c_op, asm_op)
-
-ATOMIC_OPS(and, &=, and)
-ATOMIC_OPS(andnot, &= ~, bic)
-ATOMIC_OPS(or, |=, or)
-ATOMIC_OPS(xor, ^=, xor)
-
-#define arch_atomic_andnot arch_atomic_andnot
-#define arch_atomic_fetch_andnot arch_atomic_fetch_andnot
-
-#undef ATOMIC_OPS
-#undef ATOMIC_FETCH_OP
-#undef ATOMIC_OP_RETURN
-#undef ATOMIC_OP
-
#ifdef CONFIG_GENERIC_ATOMIC64
-
#include <asm-generic/atomic64.h>
-
-#else /* Kconfig ensures this is only enabled with needed h/w assist */
-
-/*
- * ARCv2 supports 64-bit exclusive load (LLOCKD) / store (SCONDD)
- * - The address HAS to be 64-bit aligned
- * - There are 2 semantics involved here:
- * = exclusive implies no interim update between load/store to same addr
- * = both words are observed/updated together: this is guaranteed even
- * for regular 64-bit load (LDD) / store (STD). Thus atomic64_set()
- * is NOT required to use LLOCKD+SCONDD, STD suffices
- */
-
-typedef struct {
- s64 __aligned(8) counter;
-} atomic64_t;
-
-#define ATOMIC64_INIT(a) { (a) }
-
-static inline s64 arch_atomic64_read(const atomic64_t *v)
-{
- s64 val;
-
- __asm__ __volatile__(
- " ldd %0, [%1] \n"
- : "=r"(val)
- : "r"(&v->counter));
-
- return val;
-}
-
-static inline void arch_atomic64_set(atomic64_t *v, s64 a)
-{
- /*
- * This could have been a simple assignment in "C" but would need
- * explicit volatile. Otherwise gcc optimizers could elide the store
- * which borked atomic64 self-test
- * In the inline asm version, memory clobber needed for exact same
- * reason, to tell gcc about the store.
- *
- * This however is not needed for sibling atomic64_add() etc since both
- * load/store are explicitly done in inline asm. As long as API is used
- * for each access, gcc has no way to optimize away any load/store
- */
- __asm__ __volatile__(
- " std %0, [%1] \n"
- :
- : "r"(a), "r"(&v->counter)
- : "memory");
-}
-
-#define ATOMIC64_OP(op, op1, op2) \
-static inline void arch_atomic64_##op(s64 a, atomic64_t *v) \
-{ \
- s64 val; \
- \
- __asm__ __volatile__( \
- "1: \n" \
- " llockd %0, [%1] \n" \
- " " #op1 " %L0, %L0, %L2 \n" \
- " " #op2 " %H0, %H0, %H2 \n" \
- " scondd %0, [%1] \n" \
- " bnz 1b \n" \
- : "=&r"(val) \
- : "r"(&v->counter), "ir"(a) \
- : "cc"); \
-} \
-
-#define ATOMIC64_OP_RETURN(op, op1, op2) \
-static inline s64 arch_atomic64_##op##_return(s64 a, atomic64_t *v) \
-{ \
- s64 val; \
- \
- smp_mb(); \
- \
- __asm__ __volatile__( \
- "1: \n" \
- " llockd %0, [%1] \n" \
- " " #op1 " %L0, %L0, %L2 \n" \
- " " #op2 " %H0, %H0, %H2 \n" \
- " scondd %0, [%1] \n" \
- " bnz 1b \n" \
- : [val] "=&r"(val) \
- : "r"(&v->counter), "ir"(a) \
- : "cc"); /* memory clobber comes from smp_mb() */ \
- \
- smp_mb(); \
- \
- return val; \
-}
-
-#define ATOMIC64_FETCH_OP(op, op1, op2) \
-static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v) \
-{ \
- s64 val, orig; \
- \
- smp_mb(); \
- \
- __asm__ __volatile__( \
- "1: \n" \
- " llockd %0, [%2] \n" \
- " " #op1 " %L1, %L0, %L3 \n" \
- " " #op2 " %H1, %H0, %H3 \n" \
- " scondd %1, [%2] \n" \
- " bnz 1b \n" \
- : "=&r"(orig), "=&r"(val) \
- : "r"(&v->counter), "ir"(a) \
- : "cc"); /* memory clobber comes from smp_mb() */ \
- \
- smp_mb(); \
- \
- return orig; \
-}
-
-#define ATOMIC64_OPS(op, op1, op2) \
- ATOMIC64_OP(op, op1, op2) \
- ATOMIC64_OP_RETURN(op, op1, op2) \
- ATOMIC64_FETCH_OP(op, op1, op2)
-
-ATOMIC64_OPS(add, add.f, adc)
-ATOMIC64_OPS(sub, sub.f, sbc)
-ATOMIC64_OPS(and, and, and)
-ATOMIC64_OPS(andnot, bic, bic)
-ATOMIC64_OPS(or, or, or)
-ATOMIC64_OPS(xor, xor, xor)
-
-#define arch_atomic64_andnot arch_atomic64_andnot
-#define arch_atomic64_fetch_andnot arch_atomic64_fetch_andnot
-
-#undef ATOMIC64_OPS
-#undef ATOMIC64_FETCH_OP
-#undef ATOMIC64_OP_RETURN
-#undef ATOMIC64_OP
-
-static inline s64
-arch_atomic64_cmpxchg(atomic64_t *ptr, s64 expected, s64 new)
-{
- s64 prev;
-
- smp_mb();
-
- __asm__ __volatile__(
- "1: llockd %0, [%1] \n"
- " brne %L0, %L2, 2f \n"
- " brne %H0, %H2, 2f \n"
- " scondd %3, [%1] \n"
- " bnz 1b \n"
- "2: \n"
- : "=&r"(prev)
- : "r"(ptr), "ir"(expected), "r"(new)
- : "cc"); /* memory clobber comes from smp_mb() */
-
- smp_mb();
-
- return prev;
-}
-
-static inline s64 arch_atomic64_xchg(atomic64_t *ptr, s64 new)
-{
- s64 prev;
-
- smp_mb();
-
- __asm__ __volatile__(
- "1: llockd %0, [%1] \n"
- " scondd %2, [%1] \n"
- " bnz 1b \n"
- "2: \n"
- : "=&r"(prev)
- : "r"(ptr), "r"(new)
- : "cc"); /* memory clobber comes from smp_mb() */
-
- smp_mb();
-
- return prev;
-}
-
-/**
- * arch_atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic64_t
- *
- * The function returns the old value of *v minus 1, even if
- * the atomic variable, v, was not decremented.
- */
-
-static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
-{
- s64 val;
-
- smp_mb();
-
- __asm__ __volatile__(
- "1: llockd %0, [%1] \n"
- " sub.f %L0, %L0, 1 # w0 - 1, set C on borrow\n"
- " sub.c %H0, %H0, 1 # if C set, w1 - 1\n"
- " brlt %H0, 0, 2f \n"
- " scondd %0, [%1] \n"
- " bnz 1b \n"
- "2: \n"
- : "=&r"(val)
- : "r"(&v->counter)
- : "cc"); /* memory clobber comes from smp_mb() */
-
- smp_mb();
-
- return val;
-}
-#define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
-
-/**
- * arch_atomic64_fetch_add_unless - add unless the number is a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, if it was not @u.
- * Returns the old value of @v
- */
-static inline s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
-{
- s64 old, temp;
-
- smp_mb();
-
- __asm__ __volatile__(
- "1: llockd %0, [%2] \n"
- " brne %L0, %L4, 2f # continue to add since v != u \n"
- " breq.d %H0, %H4, 3f # return since v == u \n"
- "2: \n"
- " add.f %L1, %L0, %L3 \n"
- " adc %H1, %H0, %H3 \n"
- " scondd %1, [%2] \n"
- " bnz 1b \n"
- "3: \n"
- : "=&r"(old), "=&r" (temp)
- : "r"(&v->counter), "r"(a), "r"(u)
- : "cc"); /* memory clobber comes from smp_mb() */
-
- smp_mb();
-
- return old;
-}
-#define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
-
-#endif /* !CONFIG_GENERIC_ATOMIC64 */
+#else
+#include <asm/atomic64-arcv2.h>
+#endif

#endif /* !__ASSEMBLY__ */

diff --git a/arch/arc/include/asm/atomic64-arcv2.h b/arch/arc/include/asm/atomic64-arcv2.h
new file mode 100644
index 000000000000..53996b11b551
--- /dev/null
+++ b/arch/arc/include/asm/atomic64-arcv2.h
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/*
+ * ARCv2 supports 64-bit exclusive load (LLOCKD) / store (SCONDD)
+ * - The address HAS to be 64-bit aligned
+ */
+
+#ifndef _ASM_ARC_ATOMIC64_ARCV2_H
+#define _ASM_ARC_ATOMIC64_ARCV2_H
+
+typedef struct {
+ s64 __aligned(8) counter;
+} atomic64_t;
+
+#define ATOMIC64_INIT(a) { (a) }
+
+static inline s64 arch_atomic64_read(const atomic64_t *v)
+{
+ s64 val;
+
+ __asm__ __volatile__(
+ " ldd %0, [%1] \n"
+ : "=r"(val)
+ : "r"(&v->counter));
+
+ return val;
+}
+
+static inline void arch_atomic64_set(atomic64_t *v, s64 a)
+{
+ /*
+ * This could have been a simple assignment in "C" but would need
+ * explicit volatile. Otherwise gcc optimizers could elide the store
+ * which borked atomic64 self-test
+ * In the inline asm version, memory clobber needed for exact same
+ * reason, to tell gcc about the store.
+ *
+ * This however is not needed for sibling atomic64_add() etc since both
+ * load/store are explicitly done in inline asm. As long as API is used
+ * for each access, gcc has no way to optimize away any load/store
+ */
+ __asm__ __volatile__(
+ " std %0, [%1] \n"
+ :
+ : "r"(a), "r"(&v->counter)
+ : "memory");
+}
+
+#define ATOMIC64_OP(op, op1, op2) \
+static inline void arch_atomic64_##op(s64 a, atomic64_t *v) \
+{ \
+ s64 val; \
+ \
+ __asm__ __volatile__( \
+ "1: \n" \
+ " llockd %0, [%1] \n" \
+ " " #op1 " %L0, %L0, %L2 \n" \
+ " " #op2 " %H0, %H0, %H2 \n" \
+ " scondd %0, [%1] \n" \
+ " bnz 1b \n" \
+ : "=&r"(val) \
+ : "r"(&v->counter), "ir"(a) \
+ : "cc"); \
+} \
+
+#define ATOMIC64_OP_RETURN(op, op1, op2) \
+static inline s64 arch_atomic64_##op##_return(s64 a, atomic64_t *v) \
+{ \
+ s64 val; \
+ \
+ smp_mb(); \
+ \
+ __asm__ __volatile__( \
+ "1: \n" \
+ " llockd %0, [%1] \n" \
+ " " #op1 " %L0, %L0, %L2 \n" \
+ " " #op2 " %H0, %H0, %H2 \n" \
+ " scondd %0, [%1] \n" \
+ " bnz 1b \n" \
+ : [val] "=&r"(val) \
+ : "r"(&v->counter), "ir"(a) \
+ : "cc"); /* memory clobber comes from smp_mb() */ \
+ \
+ smp_mb(); \
+ \
+ return val; \
+}
+
+#define ATOMIC64_FETCH_OP(op, op1, op2) \
+static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v) \
+{ \
+ s64 val, orig; \
+ \
+ smp_mb(); \
+ \
+ __asm__ __volatile__( \
+ "1: \n" \
+ " llockd %0, [%2] \n" \
+ " " #op1 " %L1, %L0, %L3 \n" \
+ " " #op2 " %H1, %H0, %H3 \n" \
+ " scondd %1, [%2] \n" \
+ " bnz 1b \n" \
+ : "=&r"(orig), "=&r"(val) \
+ : "r"(&v->counter), "ir"(a) \
+ : "cc"); /* memory clobber comes from smp_mb() */ \
+ \
+ smp_mb(); \
+ \
+ return orig; \
+}
+
+#define ATOMIC64_OPS(op, op1, op2) \
+ ATOMIC64_OP(op, op1, op2) \
+ ATOMIC64_OP_RETURN(op, op1, op2) \
+ ATOMIC64_FETCH_OP(op, op1, op2)
+
+ATOMIC64_OPS(add, add.f, adc)
+ATOMIC64_OPS(sub, sub.f, sbc)
+ATOMIC64_OPS(and, and, and)
+ATOMIC64_OPS(andnot, bic, bic)
+ATOMIC64_OPS(or, or, or)
+ATOMIC64_OPS(xor, xor, xor)
+
+#define arch_atomic64_andnot arch_atomic64_andnot
+#define arch_atomic64_fetch_andnot arch_atomic64_fetch_andnot
+
+#undef ATOMIC64_OPS
+#undef ATOMIC64_FETCH_OP
+#undef ATOMIC64_OP_RETURN
+#undef ATOMIC64_OP
+
+static inline s64
+arch_atomic64_cmpxchg(atomic64_t *ptr, s64 expected, s64 new)
+{
+ s64 prev;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "1: llockd %0, [%1] \n"
+ " brne %L0, %L2, 2f \n"
+ " brne %H0, %H2, 2f \n"
+ " scondd %3, [%1] \n"
+ " bnz 1b \n"
+ "2: \n"
+ : "=&r"(prev)
+ : "r"(ptr), "ir"(expected), "r"(new)
+ : "cc"); /* memory clobber comes from smp_mb() */
+
+ smp_mb();
+
+ return prev;
+}
+
+static inline s64 arch_atomic64_xchg(atomic64_t *ptr, s64 new)
+{
+ s64 prev;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "1: llockd %0, [%1] \n"
+ " scondd %2, [%1] \n"
+ " bnz 1b \n"
+ "2: \n"
+ : "=&r"(prev)
+ : "r"(ptr), "r"(new)
+ : "cc"); /* memory clobber comes from smp_mb() */
+
+ smp_mb();
+
+ return prev;
+}
+
+/**
+ * arch_atomic64_dec_if_positive - decrement by 1 if old value positive
+ * @v: pointer of type atomic64_t
+ *
+ * The function returns the old value of *v minus 1, even if
+ * the atomic variable, v, was not decremented.
+ */
+
+static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
+{
+ s64 val;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "1: llockd %0, [%1] \n"
+ " sub.f %L0, %L0, 1 # w0 - 1, set C on borrow\n"
+ " sub.c %H0, %H0, 1 # if C set, w1 - 1\n"
+ " brlt %H0, 0, 2f \n"
+ " scondd %0, [%1] \n"
+ " bnz 1b \n"
+ "2: \n"
+ : "=&r"(val)
+ : "r"(&v->counter)
+ : "cc"); /* memory clobber comes from smp_mb() */
+
+ smp_mb();
+
+ return val;
+}
+#define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
+
+/**
+ * arch_atomic64_fetch_add_unless - add unless the number is a given value
+ * @v: pointer of type atomic64_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, if it was not @u.
+ * Returns the old value of @v
+ */
+static inline s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
+{
+ s64 old, temp;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "1: llockd %0, [%2] \n"
+ " brne %L0, %L4, 2f # continue to add since v != u \n"
+ " breq.d %H0, %H4, 3f # return since v == u \n"
+ "2: \n"
+ " add.f %L1, %L0, %L3 \n"
+ " adc %H1, %H0, %H3 \n"
+ " scondd %1, [%2] \n"
+ " bnz 1b \n"
+ "3: \n"
+ : "=&r"(old), "=&r" (temp)
+ : "r"(&v->counter), "r"(a), "r"(u)
+ : "cc"); /* memory clobber comes from smp_mb() */
+
+ smp_mb();
+
+ return old;
+}
+#define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
+
+#endif
--
2.25.1

2021-08-04 21:19:14

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 05/11] ARC: atomics: implement relaxed variants

The current ARC fetch/return atomics provide fully ordered semantics
only with 2 full barriers around the operation.

Instead implement them as relaxed variants without any barriers and
rely on generic code to generate the fully-ordered, acquire and release
varaints by adding the appropriate full barriers.

This helps elide some extra barriers in case of acquire/release/relaxed
calls.

bloat-o-meter for hsdk defconfig shows codegen improvements, although
numbers below inflated due to unrelated inlining heuristic changes

| bloat-o-meter vmlinux-643babe34fd7-non-relaxed vmlinux-45aa05cb44d7-relaxed
| add/remove: 2/5 grow/shrink: 42/1222 up/down: 4158/-14312 (-10154)
| Function old new delta
| ..
| sys_renameat 462 476 +14
| ip_mc_inc_group 424 436 +12
| do_read_cache_page 1882 1894 +12
| ..
| refcount_dec_and_mutex_lock 254 250 -4
| refcount_dec_and_lock_irqsave 258 254 -4
| refcount_dec_and_lock 254 250 -4
| ..
| tcp_v6_route_req 246 238 -8
| tcp_v4_destroy_sock 286 278 -8
| tcp_twsk_unique 352 344 -8

Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/atomic-llsc.h | 32 +++++++++++----------------
arch/arc/include/asm/atomic64-arcv2.h | 24 +++++++++++---------
2 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/arch/arc/include/asm/atomic-llsc.h b/arch/arc/include/asm/atomic-llsc.h
index aab4f2855457..088d348781c1 100644
--- a/arch/arc/include/asm/atomic-llsc.h
+++ b/arch/arc/include/asm/atomic-llsc.h
@@ -22,16 +22,10 @@ static inline void arch_atomic_##op(int i, atomic_t *v) \
} \

#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
-static inline int arch_atomic_##op##_return(int i, atomic_t *v) \
+static inline int arch_atomic_##op##_return_relaxed(int i, atomic_t *v) \
{ \
unsigned int val; \
\
- /* \
- * Explicit full memory barrier needed before/after as \
- * LLOCK/SCOND themselves don't provide any such semantics \
- */ \
- smp_mb(); \
- \
__asm__ __volatile__( \
"1: llock %[val], [%[ctr]] \n" \
" " #asm_op " %[val], %[val], %[i] \n" \
@@ -42,22 +36,17 @@ static inline int arch_atomic_##op##_return(int i, atomic_t *v) \
[i] "ir" (i) \
: "cc"); \
\
- smp_mb(); \
- \
return val; \
}

+#define arch_atomic_add_return_relaxed arch_atomic_add_return_relaxed
+#define arch_atomic_sub_return_relaxed arch_atomic_sub_return_relaxed
+
#define ATOMIC_FETCH_OP(op, c_op, asm_op) \
-static inline int arch_atomic_fetch_##op(int i, atomic_t *v) \
+static inline int arch_atomic_fetch_##op##_relaxed(int i, atomic_t *v) \
{ \
unsigned int val, orig; \
\
- /* \
- * Explicit full memory barrier needed before/after as \
- * LLOCK/SCOND themselves don't provide any such semantics \
- */ \
- smp_mb(); \
- \
__asm__ __volatile__( \
"1: llock %[orig], [%[ctr]] \n" \
" " #asm_op " %[val], %[orig], %[i] \n" \
@@ -69,11 +58,17 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t *v) \
[i] "ir" (i) \
: "cc"); \
\
- smp_mb(); \
- \
return orig; \
}

+#define arch_atomic_fetch_add_relaxed arch_atomic_fetch_add_relaxed
+#define arch_atomic_fetch_sub_relaxed arch_atomic_fetch_sub_relaxed
+
+#define arch_atomic_fetch_and_relaxed arch_atomic_fetch_and_relaxed
+#define arch_atomic_fetch_andnot_relaxed arch_atomic_fetch_andnot_relaxed
+#define arch_atomic_fetch_or_relaxed arch_atomic_fetch_or_relaxed
+#define arch_atomic_fetch_xor_relaxed arch_atomic_fetch_xor_relaxed
+
#define ATOMIC_OPS(op, c_op, asm_op) \
ATOMIC_OP(op, c_op, asm_op) \
ATOMIC_OP_RETURN(op, c_op, asm_op) \
@@ -93,7 +88,6 @@ ATOMIC_OPS(or, |=, or)
ATOMIC_OPS(xor, ^=, xor)

#define arch_atomic_andnot arch_atomic_andnot
-#define arch_atomic_fetch_andnot arch_atomic_fetch_andnot

#undef ATOMIC_OPS
#undef ATOMIC_FETCH_OP
diff --git a/arch/arc/include/asm/atomic64-arcv2.h b/arch/arc/include/asm/atomic64-arcv2.h
index 22ef1cbb94e2..c5a8010fdc97 100644
--- a/arch/arc/include/asm/atomic64-arcv2.h
+++ b/arch/arc/include/asm/atomic64-arcv2.h
@@ -64,12 +64,10 @@ static inline void arch_atomic64_##op(s64 a, atomic64_t *v) \
} \

#define ATOMIC64_OP_RETURN(op, op1, op2) \
-static inline s64 arch_atomic64_##op##_return(s64 a, atomic64_t *v) \
+static inline s64 arch_atomic64_##op##_return_relaxed(s64 a, atomic64_t *v) \
{ \
s64 val; \
\
- smp_mb(); \
- \
__asm__ __volatile__( \
"1: \n" \
" llockd %0, [%1] \n" \
@@ -81,18 +79,17 @@ static inline s64 arch_atomic64_##op##_return(s64 a, atomic64_t *v) \
: "r"(&v->counter), "ir"(a) \
: "cc"); /* memory clobber comes from smp_mb() */ \
\
- smp_mb(); \
- \
return val; \
}

+#define arch_atomic64_add_return_relaxed arch_atomic64_add_return_relaxed
+#define arch_atomic64_sub_return_relaxed arch_atomic64_sub_return_relaxed
+
#define ATOMIC64_FETCH_OP(op, op1, op2) \
-static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v) \
+static inline s64 arch_atomic64_fetch_##op##_relaxed(s64 a, atomic64_t *v) \
{ \
s64 val, orig; \
\
- smp_mb(); \
- \
__asm__ __volatile__( \
"1: \n" \
" llockd %0, [%2] \n" \
@@ -104,11 +101,17 @@ static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v) \
: "r"(&v->counter), "ir"(a) \
: "cc"); /* memory clobber comes from smp_mb() */ \
\
- smp_mb(); \
- \
return orig; \
}

+#define arch_atomic64_fetch_add_relaxed arch_atomic64_fetch_add_relaxed
+#define arch_atomic64_fetch_sub_relaxed arch_atomic64_fetch_sub_relaxed
+
+#define arch_atomic64_fetch_and_relaxed arch_atomic64_fetch_and_relaxed
+#define arch_atomic64_fetch_andnot_relaxed arch_atomic64_fetch_andnot_relaxed
+#define arch_atomic64_fetch_or_relaxed arch_atomic64_fetch_or_relaxed
+#define arch_atomic64_fetch_xor_relaxed arch_atomic64_fetch_xor_relaxed
+
#define ATOMIC64_OPS(op, op1, op2) \
ATOMIC64_OP(op, op1, op2) \
ATOMIC64_OP_RETURN(op, op1, op2) \
@@ -128,7 +131,6 @@ ATOMIC64_OPS(or, or, or)
ATOMIC64_OPS(xor, xor, xor)

#define arch_atomic64_andnot arch_atomic64_andnot
-#define arch_atomic64_fetch_andnot arch_atomic64_fetch_andnot

#undef ATOMIC64_OPS
#undef ATOMIC64_FETCH_OP
--
2.25.1

2021-08-04 21:20:10

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 11/11] ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants

And move them out of cmpxchg.h to canonical atomic.h

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/atomic.h | 27 +++++++++++++++++++++++++++
arch/arc/include/asm/cmpxchg.h | 23 -----------------------
2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index ee88e1dbaab5..52ee51e1ff7c 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -22,6 +22,33 @@
#include <asm/atomic-spinlock.h>
#endif

+#define arch_atomic_cmpxchg(v, o, n) \
+({ \
+ arch_cmpxchg(&((v)->counter), (o), (n)); \
+})
+
+#ifdef arch_cmpxchg_relaxed
+#define arch_atomic_cmpxchg_relaxed(v, o, n) \
+({ \
+ arch_cmpxchg_relaxed(&((v)->counter), (o), (n)); \
+})
+#endif
+
+#define arch_atomic_xchg(v, n) \
+({ \
+ arch_xchg(&((v)->counter), (n)); \
+})
+
+#ifdef arch_xchg_relaxed
+#define arch_atomic_xchg_relaxed(v, n) \
+({ \
+ arch_xchg_relaxed(&((v)->counter), (n)); \
+})
+#endif
+
+/*
+ * 64-bit atomics
+ */
#ifdef CONFIG_GENERIC_ATOMIC64
#include <asm-generic/atomic64.h>
#else
diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index e2ae0eb1ca07..c5b544a5fe81 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -80,14 +80,6 @@

#endif

-/*
- * atomic_cmpxchg is same as cmpxchg
- * LLSC: only different in data-type, semantics are exactly same
- * !LLSC: cmpxchg() has to use an external lock atomic_ops_lock to guarantee
- * semantics, and this lock also happens to be used by atomic_*()
- */
-#define arch_atomic_cmpxchg(v, o, n) ((int)arch_cmpxchg(&((v)->counter), (o), (n)))
-
/*
* xchg
*/
@@ -148,19 +140,4 @@

#endif

-/*
- * "atomic" variant of xchg()
- * REQ: It needs to follow the same serialization rules as other atomic_xxx()
- * Since xchg() doesn't always do that, it would seem that following definition
- * is incorrect. But here's the rationale:
- * SMP : Even xchg() takes the atomic_ops_lock, so OK.
- * LLSC: atomic_ops_lock are not relevant at all (even if SMP, since LLSC
- * is natively "SMP safe", no serialization required).
- * UP : other atomics disable IRQ, so no way a difft ctxt atomic_xchg()
- * could clobber them. atomic_xchg() itself would be 1 insn, so it
- * can't be clobbered by others. Thus no serialization required when
- * atomic_xchg is involved.
- */
-#define arch_atomic_xchg(v, new) (arch_xchg(&((v)->counter), new))
-
#endif
--
2.25.1

2021-08-04 21:21:22

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 09/11] ARC: cmpxchg/xchg: rewrite as macros to make type safe

Existing code forces/assume args to type "long" which won't work in LP64
regime, so prepare code for that

Interestingly this should be a non functional change but I do see
some codegen changes

| bloat-o-meter vmlinux-cmpxchg-A vmlinux-cmpxchg-B
| add/remove: 0/0 grow/shrink: 17/12 up/down: 218/-150 (68)
|
| Function old new delta
| rwsem_optimistic_spin 518 550 +32
| rwsem_down_write_slowpath 1244 1274 +30
| __do_sys_perf_event_open 2576 2600 +24
| down_read 192 200 +8
| __down_read 192 200 +8
...
| task_work_run 168 148 -20
| dma_fence_chain_walk.part 760 736 -24
| __genradix_ptr_alloc 674 646 -28

Total: Before=6187409, After=6187477, chg +0.00%

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/cmpxchg.h | 209 ++++++++++++++++++---------------
1 file changed, 117 insertions(+), 92 deletions(-)

diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index bac9b564a140..00deb076d6f6 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -6,6 +6,7 @@
#ifndef __ASM_ARC_CMPXCHG_H
#define __ASM_ARC_CMPXCHG_H

+#include <linux/build_bug.h>
#include <linux/types.h>

#include <asm/barrier.h>
@@ -13,62 +14,77 @@

#ifdef CONFIG_ARC_HAS_LLSC

-static inline unsigned long
-__cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
-{
- unsigned long prev;
-
- /*
- * Explicit full memory barrier needed before/after as
- * LLOCK/SCOND themselves don't provide any such semantics
- */
- smp_mb();
-
- __asm__ __volatile__(
- "1: llock %0, [%1] \n"
- " brne %0, %2, 2f \n"
- " scond %3, [%1] \n"
- " bnz 1b \n"
- "2: \n"
- : "=&r"(prev) /* Early clobber, to prevent reg reuse */
- : "r"(ptr), /* Not "m": llock only supports reg direct addr mode */
- "ir"(expected),
- "r"(new) /* can't be "ir". scond can't take LIMM for "b" */
- : "cc", "memory"); /* so that gcc knows memory is being written here */
-
- smp_mb();
-
- return prev;
-}
-
-#else /* !CONFIG_ARC_HAS_LLSC */
-
-static inline unsigned long
-__cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
-{
- unsigned long flags;
- int prev;
- volatile unsigned long *p = ptr;
-
- /*
- * spin lock/unlock provide the needed smp_mb() before/after
- */
- atomic_ops_lock(flags);
- prev = *p;
- if (prev == expected)
- *p = new;
- atomic_ops_unlock(flags);
- return prev;
-}
+/*
+ * if (*ptr == @old)
+ * *ptr = @new
+ */
+#define __cmpxchg(ptr, old, new) \
+({ \
+ __typeof__(*(ptr)) _prev; \
+ \
+ __asm__ __volatile__( \
+ "1: llock %0, [%1] \n" \
+ " brne %0, %2, 2f \n" \
+ " scond %3, [%1] \n" \
+ " bnz 1b \n" \
+ "2: \n" \
+ : "=&r"(_prev) /* Early clobber prevent reg reuse */ \
+ : "r"(ptr), /* Not "m": llock only supports reg */ \
+ "ir"(old), \
+ "r"(new) /* Not "ir": scond can't take LIMM */ \
+ : "cc", \
+ "memory"); /* gcc knows memory is clobbered */ \
+ \
+ _prev; \
+})

-#endif
+#define arch_cmpxchg(ptr, old, new) \
+({ \
+ __typeof__(ptr) _p_ = (ptr); \
+ __typeof__(*(ptr)) _o_ = (old); \
+ __typeof__(*(ptr)) _n_ = (new); \
+ __typeof__(*(ptr)) _prev_; \
+ \
+ switch(sizeof((_p_))) { \
+ case 4: \
+ /* \
+ * Explicit full memory barrier needed before/after \
+ */ \
+ smp_mb(); \
+ _prev_ = __cmpxchg(_p_, _o_, _n_); \
+ smp_mb(); \
+ break; \
+ default: \
+ BUILD_BUG(); \
+ } \
+ _prev_; \
+})

-#define arch_cmpxchg(ptr, o, n) ({ \
- (typeof(*(ptr)))__cmpxchg((ptr), \
- (unsigned long)(o), \
- (unsigned long)(n)); \
+#else
+
+#define arch_cmpxchg(ptr, old, new) \
+({ \
+ volatile __typeof__(ptr) _p_ = (ptr); \
+ __typeof__(*(ptr)) _o_ = (old); \
+ __typeof__(*(ptr)) _n_ = (new); \
+ __typeof__(*(ptr)) _prev_; \
+ unsigned long __flags; \
+ \
+ BUILD_BUG_ON(sizeof(_p_) != 4); \
+ \
+ /* \
+ * spin lock/unlock provide the needed smp_mb() before/after \
+ */ \
+ atomic_ops_lock(__flags); \
+ _prev_ = *_p_; \
+ if (_prev_ == _o_) \
+ *_p_ = _n_; \
+ atomic_ops_unlock(__flags); \
+ _prev_; \
})

+#endif
+
/*
* atomic_cmpxchg is same as cmpxchg
* LLSC: only different in data-type, semantics are exactly same
@@ -77,55 +93,64 @@ __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
*/
#define arch_atomic_cmpxchg(v, o, n) ((int)arch_cmpxchg(&((v)->counter), (o), (n)))

-
/*
- * xchg (reg with memory) based on "Native atomic" EX insn
+ * xchg
*/
-static inline unsigned long __xchg(unsigned long val, volatile void *ptr,
- int size)
-{
- extern unsigned long __xchg_bad_pointer(void);
-
- switch (size) {
- case 4:
- smp_mb();
-
- __asm__ __volatile__(
- " ex %0, [%1] \n"
- : "+r"(val)
- : "r"(ptr)
- : "memory");
+#ifdef CONFIG_ARC_HAS_LLSC

- smp_mb();
+#define __xchg(ptr, val) \
+({ \
+ __asm__ __volatile__( \
+ " ex %0, [%1] \n" /* set new value */ \
+ : "+r"(val) \
+ : "r"(ptr) \
+ : "memory"); \
+ _val_; /* get old value */ \
+})

- return val;
- }
- return __xchg_bad_pointer();
-}
+#define arch_xchg(ptr, val) \
+({ \
+ __typeof__(ptr) _p_ = (ptr); \
+ __typeof__(*(ptr)) _val_ = (val); \
+ \
+ switch(sizeof(*(_p_))) { \
+ case 4: \
+ smp_mb(); \
+ _val_ = __xchg(_p_, _val_); \
+ smp_mb(); \
+ break; \
+ default: \
+ BUILD_BUG(); \
+ } \
+ _val_; \
+})

-#define _xchg(ptr, with) ((typeof(*(ptr)))__xchg((unsigned long)(with), (ptr), \
- sizeof(*(ptr))))
+#else /* !CONFIG_ARC_HAS_LLSC */

/*
- * xchg() maps directly to ARC EX instruction which guarantees atomicity.
- * However in !LLSC config, it also needs to be use @atomic_ops_lock spinlock
- * due to a subtle reason:
- * - For !LLSC, cmpxchg() needs to use that lock (see above) and there is lot
- * of kernel code which calls xchg()/cmpxchg() on same data (see llist.h)
- * Hence xchg() needs to follow same locking rules.
+ * EX instructions is baseline and present in !LLSC too. But in this
+ * regime it still needs use @atomic_ops_lock spinlock to allow interop
+ * with cmpxchg() which uses spinlock in !LLSC
+ * (llist.h use xchg and cmpxchg on sama data)
*/

-#ifndef CONFIG_ARC_HAS_LLSC
-
-#define arch_xchg(ptr, with) \
-({ \
- unsigned long flags; \
- typeof(*(ptr)) old_val; \
- \
- atomic_ops_lock(flags); \
- old_val = _xchg(ptr, with); \
- atomic_ops_unlock(flags); \
- old_val; \
+#define arch_xchg(ptr, val) \
+({ \
+ __typeof__(ptr) _p_ = (ptr); \
+ __typeof__(*(ptr)) _val_ = (val); \
+ \
+ unsigned long __flags; \
+ \
+ atomic_ops_lock(__flags); \
+ \
+ __asm__ __volatile__( \
+ " ex %0, [%1] \n" \
+ : "+r"(_val_) \
+ : "r"(_p_) \
+ : "memory"); \
+ \
+ atomic_ops_unlock(__flags); \
+ _val_; \
})

#endif
--
2.25.1

2021-08-05 01:26:39

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 07/11] ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/bitops.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index 4f35130f5ba3..a7daaf64ae34 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -114,7 +114,7 @@ static inline __attribute__ ((const)) unsigned long __ffs(unsigned long word)
* @result: [1-32]
* fls(1) = 1, fls(0x80000000) = 32, fls(0) = 0
*/
-static inline __attribute__ ((const)) int fls(unsigned long x)
+static inline __attribute__ ((const)) int fls(unsigned int x)
{
int n;

@@ -141,7 +141,7 @@ static inline __attribute__ ((const)) int __fls(unsigned long x)
* ffs = Find First Set in word (LSB to MSB)
* @result: [1-32], 0 if all 0's
*/
-static inline __attribute__ ((const)) int ffs(unsigned long x)
+static inline __attribute__ ((const)) int ffs(unsigned int x)
{
int n;

--
2.25.1

2021-08-05 01:26:55

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 02/11] ARC: atomic: !LLSC: remove hack in atomic_set() for for UP

!LLSC atomics use spinlock (SMP) or irq-disable (UP) to implement
criticla regions. UP atomic_set() however was "cheating" by not doing
any of that so and still being functional.

Remove this anomaly (primarily as cleanup for future code improvements)
given that this config is not worth hassle of special case code.

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/atomic-spinlock.h | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/arc/include/asm/atomic-spinlock.h b/arch/arc/include/asm/atomic-spinlock.h
index bdf87610b2d7..8c6fd0e651e5 100644
--- a/arch/arc/include/asm/atomic-spinlock.h
+++ b/arch/arc/include/asm/atomic-spinlock.h
@@ -3,12 +3,10 @@
#ifndef _ASM_ARC_ATOMIC_SPLOCK_H
#define _ASM_ARC_ATOMIC_SPLOCK_H

-#ifndef CONFIG_SMP
-
- /* violating atomic_xxx API locking protocol in UP for optimization sake */
-#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
-
-#else
+/*
+ * Non hardware assisted Atomic-R-M-W
+ * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
+ */

static inline void arch_atomic_set(atomic_t *v, int i)
{
@@ -30,13 +28,6 @@ static inline void arch_atomic_set(atomic_t *v, int i)

#define arch_atomic_set_release(v, i) arch_atomic_set((v), (i))

-#endif
-
-/*
- * Non hardware assisted Atomic-R-M-W
- * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
- */
-
#define ATOMIC_OP(op, c_op, asm_op) \
static inline void arch_atomic_##op(int i, atomic_t *v) \
{ \
--
2.25.1

2021-08-05 01:27:03

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 06/11] ARC: switch to generic bitops

From: Will Deacon <[email protected]>

- !LLSC now only needs a single spinlock for atomics and bitops

- Some codegen changes (slight bloat) with generic bitops

1. code increase due to LD-check-atomic paradigm vs. unconditonal
atomic (but dirty'ing the cache line even if set already).
So despite increase, generic is right thing to do.

2. code decrease (but use of costlier instructions such as DIV vs.
shifts based math) due to signed arithmetic.
This needs to be revisited seperately.

arc:
static inline int test_bit(unsigned int nr, const volatile unsigned long *addr)
^^^^^^^^^^^^
generic:
static inline int test_bit(int nr, const volatile unsigned long *addr)
^^^

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
[vgupta: wrote patch based on Will's poc, analysed codegen diffs]
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/bitops.h | 184 +---------------------------------
arch/arc/include/asm/smp.h | 14 ---
arch/arc/kernel/smp.c | 2 -
3 files changed, 2 insertions(+), 198 deletions(-)

diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index fb98440c0bd4..4f35130f5ba3 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -14,188 +14,6 @@

#include <linux/types.h>
#include <linux/compiler.h>
-#include <asm/barrier.h>
-#ifndef CONFIG_ARC_HAS_LLSC
-#include <asm/smp.h>
-#endif
-
-#ifdef CONFIG_ARC_HAS_LLSC
-
-/*
- * Hardware assisted Atomic-R-M-W
- */
-
-#define BIT_OP(op, c_op, asm_op) \
-static inline void op##_bit(unsigned long nr, volatile unsigned long *m)\
-{ \
- unsigned int temp; \
- \
- m += nr >> 5; \
- \
- nr &= 0x1f; \
- \
- __asm__ __volatile__( \
- "1: llock %0, [%1] \n" \
- " " #asm_op " %0, %0, %2 \n" \
- " scond %0, [%1] \n" \
- " bnz 1b \n" \
- : "=&r"(temp) /* Early clobber, to prevent reg reuse */ \
- : "r"(m), /* Not "m": llock only supports reg direct addr mode */ \
- "ir"(nr) \
- : "cc"); \
-}
-
-/*
- * Semantically:
- * Test the bit
- * if clear
- * set it and return 0 (old value)
- * else
- * return 1 (old value).
- *
- * Since ARC lacks a equivalent h/w primitive, the bit is set unconditionally
- * and the old value of bit is returned
- */
-#define TEST_N_BIT_OP(op, c_op, asm_op) \
-static inline int test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
-{ \
- unsigned long old, temp; \
- \
- m += nr >> 5; \
- \
- nr &= 0x1f; \
- \
- /* \
- * Explicit full memory barrier needed before/after as \
- * LLOCK/SCOND themselves don't provide any such smenatic \
- */ \
- smp_mb(); \
- \
- __asm__ __volatile__( \
- "1: llock %0, [%2] \n" \
- " " #asm_op " %1, %0, %3 \n" \
- " scond %1, [%2] \n" \
- " bnz 1b \n" \
- : "=&r"(old), "=&r"(temp) \
- : "r"(m), "ir"(nr) \
- : "cc"); \
- \
- smp_mb(); \
- \
- return (old & (1 << nr)) != 0; \
-}
-
-#else /* !CONFIG_ARC_HAS_LLSC */
-
-/*
- * Non hardware assisted Atomic-R-M-W
- * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
- *
- * There's "significant" micro-optimization in writing our own variants of
- * bitops (over generic variants)
- *
- * (1) The generic APIs have "signed" @nr while we have it "unsigned"
- * This avoids extra code to be generated for pointer arithmatic, since
- * is "not sure" that index is NOT -ve
- * (2) Utilize the fact that ARCompact bit fidding insn (BSET/BCLR/ASL) etc
- * only consider bottom 5 bits of @nr, so NO need to mask them off.
- * (GCC Quirk: however for constant @nr we still need to do the masking
- * at compile time)
- */
-
-#define BIT_OP(op, c_op, asm_op) \
-static inline void op##_bit(unsigned long nr, volatile unsigned long *m)\
-{ \
- unsigned long temp, flags; \
- m += nr >> 5; \
- \
- /* \
- * spin lock/unlock provide the needed smp_mb() before/after \
- */ \
- bitops_lock(flags); \
- \
- temp = *m; \
- *m = temp c_op (1UL << (nr & 0x1f)); \
- \
- bitops_unlock(flags); \
-}
-
-#define TEST_N_BIT_OP(op, c_op, asm_op) \
-static inline int test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
-{ \
- unsigned long old, flags; \
- m += nr >> 5; \
- \
- bitops_lock(flags); \
- \
- old = *m; \
- *m = old c_op (1UL << (nr & 0x1f)); \
- \
- bitops_unlock(flags); \
- \
- return (old & (1UL << (nr & 0x1f))) != 0; \
-}
-
-#endif
-
-/***************************************
- * Non atomic variants
- **************************************/
-
-#define __BIT_OP(op, c_op, asm_op) \
-static inline void __##op##_bit(unsigned long nr, volatile unsigned long *m) \
-{ \
- unsigned long temp; \
- m += nr >> 5; \
- \
- temp = *m; \
- *m = temp c_op (1UL << (nr & 0x1f)); \
-}
-
-#define __TEST_N_BIT_OP(op, c_op, asm_op) \
-static inline int __test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
-{ \
- unsigned long old; \
- m += nr >> 5; \
- \
- old = *m; \
- *m = old c_op (1UL << (nr & 0x1f)); \
- \
- return (old & (1UL << (nr & 0x1f))) != 0; \
-}
-
-#define BIT_OPS(op, c_op, asm_op) \
- \
- /* set_bit(), clear_bit(), change_bit() */ \
- BIT_OP(op, c_op, asm_op) \
- \
- /* test_and_set_bit(), test_and_clear_bit(), test_and_change_bit() */\
- TEST_N_BIT_OP(op, c_op, asm_op) \
- \
- /* __set_bit(), __clear_bit(), __change_bit() */ \
- __BIT_OP(op, c_op, asm_op) \
- \
- /* __test_and_set_bit(), __test_and_clear_bit(), __test_and_change_bit() */\
- __TEST_N_BIT_OP(op, c_op, asm_op)
-
-BIT_OPS(set, |, bset)
-BIT_OPS(clear, & ~, bclr)
-BIT_OPS(change, ^, bxor)
-
-/*
- * This routine doesn't need to be atomic.
- */
-static inline int
-test_bit(unsigned int nr, const volatile unsigned long *addr)
-{
- unsigned long mask;
-
- addr += nr >> 5;
-
- mask = 1UL << (nr & 0x1f);
-
- return ((mask & *addr) != 0);
-}

#ifdef CONFIG_ISA_ARCOMPACT

@@ -368,6 +186,8 @@ static inline __attribute__ ((const)) unsigned long __ffs(unsigned long x)
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/lock.h>
+#include <asm-generic/bitops/atomic.h>
+#include <asm-generic/bitops/non-atomic.h>

#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/le.h>
diff --git a/arch/arc/include/asm/smp.h b/arch/arc/include/asm/smp.h
index c5de4008d19f..d856491606ac 100644
--- a/arch/arc/include/asm/smp.h
+++ b/arch/arc/include/asm/smp.h
@@ -105,7 +105,6 @@ static inline const char *arc_platform_smp_cpuinfo(void)
#include <asm/spinlock.h>

extern arch_spinlock_t smp_atomic_ops_lock;
-extern arch_spinlock_t smp_bitops_lock;

#define atomic_ops_lock(flags) do { \
local_irq_save(flags); \
@@ -117,24 +116,11 @@ extern arch_spinlock_t smp_bitops_lock;
local_irq_restore(flags); \
} while (0)

-#define bitops_lock(flags) do { \
- local_irq_save(flags); \
- arch_spin_lock(&smp_bitops_lock); \
-} while (0)
-
-#define bitops_unlock(flags) do { \
- arch_spin_unlock(&smp_bitops_lock); \
- local_irq_restore(flags); \
-} while (0)
-
#else /* !CONFIG_SMP */

#define atomic_ops_lock(flags) local_irq_save(flags)
#define atomic_ops_unlock(flags) local_irq_restore(flags)

-#define bitops_lock(flags) local_irq_save(flags)
-#define bitops_unlock(flags) local_irq_restore(flags)
-
#endif /* !CONFIG_SMP */

#endif /* !CONFIG_ARC_HAS_LLSC */
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index db0e104d6835..62f641c9a493 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -29,10 +29,8 @@

#ifndef CONFIG_ARC_HAS_LLSC
arch_spinlock_t smp_atomic_ops_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-arch_spinlock_t smp_bitops_lock = __ARCH_SPIN_LOCK_UNLOCKED;

EXPORT_SYMBOL_GPL(smp_atomic_ops_lock);
-EXPORT_SYMBOL_GPL(smp_bitops_lock);
#endif

struct plat_smp_ops __weak plat_smp_ops;
--
2.25.1

2021-08-05 01:34:08

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 08/11] ARC: xchg: !LLSC: remove UP micro-optimization/hack

It gets in the way of cleaning things up and is a maintenance
pain-in-neck !

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/cmpxchg.h | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index d42917e803e1..bac9b564a140 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -113,15 +113,9 @@ static inline unsigned long __xchg(unsigned long val, volatile void *ptr,
* - For !LLSC, cmpxchg() needs to use that lock (see above) and there is lot
* of kernel code which calls xchg()/cmpxchg() on same data (see llist.h)
* Hence xchg() needs to follow same locking rules.
- *
- * Technically the lock is also needed for UP (boils down to irq save/restore)
- * but we can cheat a bit since cmpxchg() atomic_ops_lock() would cause irqs to
- * be disabled thus can't possibly be interrupted/preempted/clobbered by xchg()
- * Other way around, xchg is one instruction anyways, so can't be interrupted
- * as such
*/

-#if !defined(CONFIG_ARC_HAS_LLSC) && defined(CONFIG_SMP)
+#ifndef CONFIG_ARC_HAS_LLSC

#define arch_xchg(ptr, with) \
({ \
@@ -134,10 +128,6 @@ static inline unsigned long __xchg(unsigned long val, volatile void *ptr,
old_val; \
})

-#else
-
-#define arch_xchg(ptr, with) _xchg(ptr, with)
-
#endif

/*
--
2.25.1

2021-08-05 01:34:53

by Vineet Gupta

[permalink] [raw]

Subject: [PATCH 10/11] ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)

It only makes sense to do this for the LLSC config

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/cmpxchg.h | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index 00deb076d6f6..e2ae0eb1ca07 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -38,7 +38,7 @@
_prev; \
})

-#define arch_cmpxchg(ptr, old, new) \
+#define arch_cmpxchg_relaxed(ptr, old, new) \
({ \
__typeof__(ptr) _p_ = (ptr); \
__typeof__(*(ptr)) _o_ = (old); \
@@ -47,12 +47,7 @@
\
switch(sizeof((_p_))) { \
case 4: \
- /* \
- * Explicit full memory barrier needed before/after \
- */ \
- smp_mb(); \
_prev_ = __cmpxchg(_p_, _o_, _n_); \
- smp_mb(); \
break; \
default: \
BUILD_BUG(); \
@@ -108,16 +103,14 @@
_val_; /* get old value */ \
})

-#define arch_xchg(ptr, val) \
+#define arch_xchg_relaxed(ptr, val) \
({ \
__typeof__(ptr) _p_ = (ptr); \
__typeof__(*(ptr)) _val_ = (val); \
\
switch(sizeof(*(_p_))) { \
case 4: \
- smp_mb(); \
_val_ = __xchg(_p_, _val_); \
- smp_mb(); \
break; \
default: \
BUILD_BUG(); \
--
2.25.1

2021-08-05 22:37:00

by Vineet Gupta

[permalink] [raw]

Subject: Re: [PATCH 00/11] ARC atomics update

On 8/5/21 2:02 AM, Peter Zijlstra wrote:
> On Wed, Aug 04, 2021 at 12:15:43PM -0700, Vineet Gupta wrote:
>
>> Vineet Gupta (10):
>> ARC: atomics: disintegrate header
>> ARC: atomic: !LLSC: remove hack in atomic_set() for for UP
>> ARC: atomic: !LLSC: use int data type consistently
>> ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
>> ARC: atomics: implement relaxed variants
>> ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
>> ARC: xchg: !LLSC: remove UP micro-optimization/hack
>> ARC: cmpxchg/xchg: rewrite as macros to make type safe
>> ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
>> ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants
>>
>> Will Deacon (1):
>> ARC: switch to generic bitops
>
> Didn't see any weird things:
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>

Thx Peter. A lot of this is your code anyways ;-)

Any initial thoughts/comments on patch 06/11 - is there an obvious
reason that generic bitops take signed @nr or the hurdle is need to be
done consistently cross-arch.

-Vineet

2021-08-06 14:53:28

[permalink] [raw]

Subject: Re: [PATCH 00/11] ARC atomics update

On Thu, Aug 05, 2021 at 07:04:32PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 05, 2021 at 04:18:29PM +0000, Vineet Gupta wrote:
> > On 8/5/21 2:02 AM, Peter Zijlstra wrote:
> > > On Wed, Aug 04, 2021 at 12:15:43PM -0700, Vineet Gupta wrote:
> > >
> > >> Vineet Gupta (10):
> > >> ARC: atomics: disintegrate header
> > >> ARC: atomic: !LLSC: remove hack in atomic_set() for for UP
> > >> ARC: atomic: !LLSC: use int data type consistently
> > >> ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
> > >> ARC: atomics: implement relaxed variants
> > >> ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
> > >> ARC: xchg: !LLSC: remove UP micro-optimization/hack
> > >> ARC: cmpxchg/xchg: rewrite as macros to make type safe
> > >> ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
> > >> ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants
> > >>
> > >> Will Deacon (1):
> > >> ARC: switch to generic bitops
> > >
> > > Didn't see any weird things:
> > >
> > > Acked-by: Peter Zijlstra (Intel) <[email protected]>
> >
> > Thx Peter. A lot of this is your code anyways ;-)
> >
> > Any initial thoughts/comments on patch 06/11 - is there an obvious
> > reason that generic bitops take signed @nr or the hurdle is need to be
> > done consistently cross-arch.
>
> That does indeed seem daft and ready for a cleanup. Will any
> recollection from when you touched this?

I had a patch to fix this but it blew up in the robot and I didn't get round
to reworking it:

https://lore.kernel.org/patchwork/patch/1245555/

Will