2019-09-30 23:12:28

by Paul Burton

Subject: [PATCH 00/37] MIPS: barriers & atomics cleanups

This series consists of a bunch of cleanups to the way we handle memory
barriers (though no changes to the sync instructions we use to implement
them) & atomic memory accesses. One major goal was to ensure the
Loongson3 LL/SC errata workarounds are applied in a safe manner from
within inline-asm & that we can automatically verify the resulting
kernel binary looks reasonable. Many patches are cleanups found along
the way.

Applies atop v5.4-rc1.

Paul Burton (37):
MIPS: Unify sc beqz definition
MIPS: Use compact branch for LL/SC loops on MIPSr6+
MIPS: barrier: Add __SYNC() infrastructure
MIPS: barrier: Clean up rmb() & wmb() definitions
MIPS: barrier: Clean up __smp_mb() definition
MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery
MIPS: barrier: Clean up __sync() definition
MIPS: barrier: Clean up sync_ginv()
MIPS: atomic: Fix whitespace in ATOMIC_OP macros
MIPS: atomic: Handle !kernel_uses_llsc first
MIPS: atomic: Use one macro to generate 32b & 64b functions
MIPS: atomic: Emit Loongson3 sync workarounds within asm
MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive()
MIPS: atomic: Unify 32b & 64b sub_if_positive
MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg
MIPS: bitops: Use generic builtin ffs/fls; drop cpu_has_clo_clz
MIPS: bitops: Handle !kernel_uses_llsc first
MIPS: bitops: Only use ins for bit 16 or higher
MIPS: bitops: Use MIPS_ISA_REV, not #ifdefs
MIPS: bitops: ins start position is always an immediate
MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant
MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit
MIPS: bitops: Use the BIT() macro
MIPS: bitops: Avoid redundant zero-comparison for non-LLSC
MIPS: bitops: Abstract LL/SC loops
MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG
MIPS: bitops: Emit Loongson3 sync workarounds within asm
MIPS: bitops: Use smp_mb__before_atomic in test_* ops
MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm
MIPS: cmpxchg: Omit redundant barriers for Loongson3
MIPS: futex: Emit Loongson3 sync workarounds within asm
MIPS: syscall: Emit Loongson3 sync workarounds within asm
MIPS: barrier: Remove loongson_llsc_mb()
MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3
MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler
MIPS: genex: Don't reload address unnecessarily
MIPS: Check Loongson3 LL/SC errata workaround correctness

arch/mips/Makefile | 2 +-
arch/mips/Makefile.postlink | 10 +-
arch/mips/include/asm/atomic.h | 571 ++++++-----------
arch/mips/include/asm/barrier.h | 215 +------
arch/mips/include/asm/bitops.h | 593 ++++--------------
arch/mips/include/asm/cmpxchg.h | 59 +-
arch/mips/include/asm/cpu-features.h | 10 -
arch/mips/include/asm/futex.h | 9 +-
arch/mips/include/asm/llsc.h | 19 +-
.../asm/mach-malta/cpu-feature-overrides.h | 2 -
arch/mips/include/asm/sync.h | 207 ++++++
arch/mips/kernel/genex.S | 6 +-
arch/mips/kernel/pm-cps.c | 20 +-
arch/mips/kernel/syscall.c | 3 +-
arch/mips/lib/bitops.c | 57 +-
arch/mips/loongson64/Platform | 2 +-
arch/mips/tools/.gitignore | 1 +
arch/mips/tools/Makefile | 5 +
arch/mips/tools/loongson3-llsc-check.c | 307 +++++++++
19 files changed, 975 insertions(+), 1123 deletions(-)
create mode 100644 arch/mips/include/asm/sync.h
create mode 100644 arch/mips/tools/loongson3-llsc-check.c

--
2.23.0


2019-09-30 23:12:38

by Paul Burton

Subject: [PATCH 32/37] MIPS: syscall: Emit Loongson3 sync workarounds within asm

Generate the sync instructions required to work around Loongson3 LL/SC
errata within inline asm blocks. This feels a little safer than doing it
from C, where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, which contained sync & ll instructions respectively.
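
For illustration, a minimal sketch of the difference (hypothetical
ll_word_*() helpers using the kernel's existing __SYNC(),
loongson_llsc_mb() & GCC_OFF_SMALL_ASM() definitions - not the code
touched by this patch):

/*
 * Old pattern: the barrier lives in a separate asm statement, so the
 * compiler may legally schedule an unrelated memory access between the
 * sync & the ll.
 */
static inline int ll_word_old(int *p)
{
	int val;

	loongson_llsc_mb();
	__asm__ __volatile__(
	"	ll	%0, %1			\n"
	: "=&r" (val)
	: GCC_OFF_SMALL_ASM() (*p));
	return val;
}

/*
 * New pattern: the sync is emitted within the same asm block, so
 * nothing can be scheduled between it & the ll.
 */
static inline int ll_word_new(int *p)
{
	int val;

	__asm__ __volatile__(
	"	" __SYNC(full, loongson3_war) "	\n"
	"	ll	%0, %1			\n"
	: "=&r" (val)
	: GCC_OFF_SMALL_ASM() (*p));
	return val;
}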

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/kernel/syscall.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index b0e25e913bdb..3ea288ca35f1 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -37,6 +37,7 @@
#include <asm/signal.h>
#include <asm/sim.h>
#include <asm/shmparam.h>
+#include <asm/sync.h>
#include <asm/sysmips.h>
#include <asm/switch_to.h>

@@ -132,12 +133,12 @@ static inline int mips_atomic_set(unsigned long addr, unsigned long new)
[efault] "i" (-EFAULT)
: "memory");
} else if (cpu_has_llsc) {
- loongson_llsc_mb();
__asm__ __volatile__ (
" .set push \n"
" .set "MIPS_ISA_ARCH_LEVEL" \n"
" li %[err], 0 \n"
"1: \n"
+ " " __SYNC(full, loongson3_war) " \n"
user_ll("%[old]", "(%[addr])")
" move %[tmp], %[new] \n"
"2: \n"
--
2.23.0

2019-09-30 23:12:41

by Paul Burton

Subject: [PATCH 30/37] MIPS: cmpxchg: Omit redundant barriers for Loongson3

When building a kernel configured to support Loongson3 LL/SC workarounds
(ie. CONFIG_CPU_LOONGSON3_WORKAROUNDS=y) the inline assembly in
__xchg_asm() & __cmpxchg_asm() already emits completion barriers, and as
such we don't need to emit extra barriers from the xchg() or cmpxchg()
macros. Add compile-time constant checks causing us to omit the
redundant memory barriers.
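
The check relies on __SYNC_loongson3_war being a compile-time constant;
roughly (sketch with a hypothetical helper name, not the kernel macro
itself):

/*
 * When CONFIG_CPU_LOONGSON3_WORKAROUNDS=y the asm already contains the
 * completion barrier, so the C-level barrier is skipped; when the
 * workarounds are disabled __SYNC_loongson3_war is 0 & this collapses
 * to a plain smp_mb__before_llsc() with no runtime test.
 */
static inline void pre_llsc_barrier(void)
{
	if (!__SYNC_loongson3_war)
		smp_mb__before_llsc();
}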

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/include/asm/cmpxchg.h | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index fc121d20a980..820df68e32e1 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -94,7 +94,13 @@ static inline unsigned long __xchg(volatile void *ptr, unsigned long x,
({ \
__typeof__(*(ptr)) __res; \
\
- smp_mb__before_llsc(); \
+ /* \
+ * In the Loongson3 workaround case __xchg_asm() already \
+ * contains a completion barrier prior to the LL, so we don't \
+ * need to emit an extra one here. \
+ */ \
+ if (!__SYNC_loongson3_war) \
+ smp_mb__before_llsc(); \
\
__res = (__typeof__(*(ptr))) \
__xchg((ptr), (unsigned long)(x), sizeof(*(ptr))); \
@@ -179,9 +185,23 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
({ \
__typeof__(*(ptr)) __res; \
\
- smp_mb__before_llsc(); \
+ /* \
+ * In the Loongson3 workaround case __cmpxchg_asm() already \
+ * contains a completion barrier prior to the LL, so we don't \
+ * need to emit an extra one here. \
+ */ \
+ if (!__SYNC_loongson3_war) \
+ smp_mb__before_llsc(); \
+ \
__res = cmpxchg_local((ptr), (old), (new)); \
- smp_llsc_mb(); \
+ \
+ /* \
+ * In the Loongson3 workaround case __cmpxchg_asm() already \
+ * contains a completion barrier after the SC, so we don't \
+ * need to emit an extra one here. \
+ */ \
+ if (!__SYNC_loongson3_war) \
+ smp_llsc_mb(); \
\
__res; \
})
--
2.23.0

2019-09-30 23:12:56

by Paul Burton

Subject: [PATCH 36/37] MIPS: genex: Don't reload address unnecessarily

In ejtag_debug_handler() we must reload the address of
ejtag_debug_buffer_spinlock if an sc fails, since the address in k0 will
have been clobbered by the result of the sc instruction. In the case
where the ll simply reads a non-zero value (ie. there's contention for
the lock) the address will not be clobbered & we can branch straight
back to repeat the load from memory without reloading the address into
k0.

The primary motivation for this change is that it moves the target of
the bnez instruction to an instruction within the LL/SC loop (the LL
itself), which we know contains no other memory accesses & therefore
isn't affected by Loongson3 LL/SC errata.

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/kernel/genex.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index ac4f2b835165..60ede6b75a3b 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -355,8 +355,8 @@ NESTED(ejtag_debug_handler, PT_SIZE, sp)
#ifdef CONFIG_SMP
1: PTR_LA k0, ejtag_debug_buffer_spinlock
__SYNC(full, loongson3_war)
- ll k0, 0(k0)
- bnez k0, 1b
+2: ll k0, 0(k0)
+ bnez k0, 2b
PTR_LA k0, ejtag_debug_buffer_spinlock
sc k0, 0(k0)
beqz k0, 1b
--
2.23.0

2019-09-30 23:13:08

by Paul Burton

Subject: [PATCH 33/37] MIPS: barrier: Remove loongson_llsc_mb()

The loongson_llsc_mb() macro is no longer used - instead barriers are
emitted as part of inline asm using the __SYNC() macro. Remove the
now-defunct loongson_llsc_mb() macro.

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/include/asm/barrier.h | 40 ---------------------------------
arch/mips/loongson64/Platform | 2 +-
2 files changed, 1 insertion(+), 41 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index c7e05e832da9..1a99a6c5b5dd 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -121,46 +121,6 @@ static inline void wmb(void)
#define __smp_mb__before_atomic() __smp_mb__before_llsc()
#define __smp_mb__after_atomic() smp_llsc_mb()

-/*
- * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
- * store or prefetch) in between an LL & SC can cause the SC instruction to
- * erroneously succeed, breaking atomicity. Whilst it's unusual to write code
- * containing such sequences, this bug bites harder than we might otherwise
- * expect due to reordering & speculation:
- *
- * 1) A memory access appearing prior to the LL in program order may actually
- * be executed after the LL - this is the reordering case.
- *
- * In order to avoid this we need to place a memory barrier (ie. a SYNC
- * instruction) prior to every LL instruction, in between it and any earlier
- * memory access instructions.
- *
- * This reordering case is fixed by 3A R2 CPUs, ie. 3A2000 models and later.
- *
- * 2) If a conditional branch exists between an LL & SC with a target outside
- * of the LL-SC loop, for example an exit upon value mismatch in cmpxchg()
- * or similar, then misprediction of the branch may allow speculative
- * execution of memory accesses from outside of the LL-SC loop.
- *
- * In order to avoid this we need a memory barrier (ie. a SYNC instruction)
- * at each affected branch target, for which we also use loongson_llsc_mb()
- * defined below.
- *
- * This case affects all current Loongson 3 CPUs.
- *
- * The above described cases cause an error in the cache coherence protocol;
- * such that the Invalidate of a competing LL-SC goes 'missing' and SC
- * erroneously observes its core still has Exclusive state and lets the SC
- * proceed.
- *
- * Therefore the error only occurs on SMP systems.
- */
-#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS /* Loongson-3's LLSC workaround */
-#define loongson_llsc_mb() __asm__ __volatile__("sync" : : :"memory")
-#else
-#define loongson_llsc_mb() do { } while (0)
-#endif
-
static inline void sync_ginv(void)
{
asm volatile(__SYNC(ginv, always));
diff --git a/arch/mips/loongson64/Platform b/arch/mips/loongson64/Platform
index c1a4d4dc4665..28172500f95a 100644
--- a/arch/mips/loongson64/Platform
+++ b/arch/mips/loongson64/Platform
@@ -27,7 +27,7 @@ cflags-$(CONFIG_CPU_LOONGSON3) += -Wa,--trap
#
# Some versions of binutils, not currently mainline as of 2019/02/04, support
# an -mfix-loongson3-llsc flag which emits a sync prior to each ll instruction
-# to work around a CPU bug (see loongson_llsc_mb() in asm/barrier.h for a
+# to work around a CPU bug (see __SYNC_loongson3_war in asm/sync.h for a
# description).
#
# We disable this in order to prevent the assembler meddling with the
--
2.23.0

2019-09-30 23:13:19

by Paul Burton

Subject: [PATCH 18/37] MIPS: bitops: Only use ins for bit 16 or higher

set_bit() can set bits 0-15 using an ori instruction, rather than
loading the value -1 into a register & then using an ins instruction.

That is, rather than the following:

li t0, -1
ll t1, 0(t2)
ins t1, t0, 4, 1
sc t1, 0(t2)

We can have the simpler:

ll t1, 0(t2)
ori t1, t1, 0x10
sc t1, 0(t2)

The or path already allows immediates to be used, & since ori's
immediate field is 16 bits a single-bit mask for any of bits 0-15 fits
directly. Simply restricting the ins path to bits 16 & above is
therefore sufficient to take advantage of this.

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/include/asm/bitops.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index d3f3f37ca0b1..3ea4f172ac08 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -77,7 +77,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
}

#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
- if (__builtin_constant_p(bit)) {
+ if (__builtin_constant_p(bit) && (bit >= 16)) {
loongson_llsc_mb();
do {
__asm__ __volatile__(
--
2.23.0

2019-09-30 23:13:34

by Paul Burton

Subject: [PATCH 11/37] MIPS: atomic: Use one macro to generate 32b & 64b functions

Cut down on duplication by generalizing the ATOMIC_OP(),
ATOMIC_OP_RETURN() & ATOMIC_FETCH_OP() macros to work for both 32b &
64b atomics, and removing the ATOMIC64_ variants. This ensures
consistency between our atomic_* & atomic64_* functions.
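
For example, a single invocation such as:

  ATOMIC_OPS(atomic64, add, s64, +=, daddu, lld, scd)

now generates (sketch of the resulting signatures, not the full
expansion):

static __inline__ void atomic64_add(s64 i, atomic64_t *v);
static __inline__ s64 atomic64_add_return_relaxed(s64 i, atomic64_t *v);
static __inline__ s64 atomic64_fetch_add_relaxed(s64 i, atomic64_t *v);

...using lld/scd in place of the ll/sc used by the equivalent atomic_*
functions generated from ATOMIC_OPS(atomic, add, int, +=, addu, ll, sc).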

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/include/asm/atomic.h | 196 ++++++++-------------------------
1 file changed, 45 insertions(+), 151 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index ace2ea005588..b834af5a7382 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -42,10 +42,10 @@
*/
#define atomic_set(v, i) WRITE_ONCE((v)->counter, (i))

-#define ATOMIC_OP(op, c_op, asm_op) \
-static __inline__ void atomic_##op(int i, atomic_t * v) \
+#define ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc) \
+static __inline__ void pfx##_##op(type i, pfx##_t * v) \
{ \
- int temp; \
+ type temp; \
\
if (!kernel_uses_llsc) { \
unsigned long flags; \
@@ -60,19 +60,19 @@ static __inline__ void atomic_##op(int i, atomic_t * v) \
__asm__ __volatile__( \
" .set push \n" \
" .set " MIPS_ISA_LEVEL " \n" \
- "1: ll %0, %1 # atomic_" #op " \n" \
+ "1: " #ll " %0, %1 # " #pfx "_" #op " \n" \
" " #asm_op " %0, %2 \n" \
- " sc %0, %1 \n" \
+ " " #sc " %0, %1 \n" \
"\t" __SC_BEQZ "%0, 1b \n" \
" .set pop \n" \
: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter) \
: "Ir" (i) : __LLSC_CLOBBER); \
}

-#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
-static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v) \
+#define ATOMIC_OP_RETURN(pfx, op, type, c_op, asm_op, ll, sc) \
+static __inline__ type pfx##_##op##_return_relaxed(type i, pfx##_t * v) \
{ \
- int temp, result; \
+ type temp, result; \
\
if (!kernel_uses_llsc) { \
unsigned long flags; \
@@ -89,9 +89,9 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v) \
__asm__ __volatile__( \
" .set push \n" \
" .set " MIPS_ISA_LEVEL " \n" \
- "1: ll %1, %2 # atomic_" #op "_return \n" \
+ "1: " #ll " %1, %2 # " #pfx "_" #op "_return\n" \
" " #asm_op " %0, %1, %3 \n" \
- " sc %0, %2 \n" \
+ " " #sc " %0, %2 \n" \
"\t" __SC_BEQZ "%0, 1b \n" \
" " #asm_op " %0, %1, %3 \n" \
" .set pop \n" \
@@ -102,8 +102,8 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v) \
return result; \
}

-#define ATOMIC_FETCH_OP(op, c_op, asm_op) \
-static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v) \
+#define ATOMIC_FETCH_OP(pfx, op, type, c_op, asm_op, ll, sc) \
+static __inline__ type pfx##_fetch_##op##_relaxed(type i, pfx##_t * v) \
{ \
int temp, result; \
\
@@ -120,10 +120,10 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v) \
loongson_llsc_mb(); \
__asm__ __volatile__( \
" .set push \n" \
- " .set "MIPS_ISA_LEVEL" \n" \
- "1: ll %1, %2 # atomic_fetch_" #op " \n" \
+ " .set " MIPS_ISA_LEVEL " \n" \
+ "1: " #ll " %1, %2 # " #pfx "_fetch_" #op "\n" \
" " #asm_op " %0, %1, %3 \n" \
- " sc %0, %2 \n" \
+ " " #sc " %0, %2 \n" \
"\t" __SC_BEQZ "%0, 1b \n" \
" .set pop \n" \
" move %0, %1 \n" \
@@ -134,32 +134,50 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v) \
return result; \
}

-#define ATOMIC_OPS(op, c_op, asm_op) \
- ATOMIC_OP(op, c_op, asm_op) \
- ATOMIC_OP_RETURN(op, c_op, asm_op) \
- ATOMIC_FETCH_OP(op, c_op, asm_op)
+#define ATOMIC_OPS(pfx, op, type, c_op, asm_op, ll, sc) \
+ ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc) \
+ ATOMIC_OP_RETURN(pfx, op, type, c_op, asm_op, ll, sc) \
+ ATOMIC_FETCH_OP(pfx, op, type, c_op, asm_op, ll, sc)

-ATOMIC_OPS(add, +=, addu)
-ATOMIC_OPS(sub, -=, subu)
+ATOMIC_OPS(atomic, add, int, +=, addu, ll, sc)
+ATOMIC_OPS(atomic, sub, int, -=, subu, ll, sc)

#define atomic_add_return_relaxed atomic_add_return_relaxed
#define atomic_sub_return_relaxed atomic_sub_return_relaxed
#define atomic_fetch_add_relaxed atomic_fetch_add_relaxed
#define atomic_fetch_sub_relaxed atomic_fetch_sub_relaxed

+#ifdef CONFIG_64BIT
+ATOMIC_OPS(atomic64, add, s64, +=, daddu, lld, scd)
+ATOMIC_OPS(atomic64, sub, s64, -=, dsubu, lld, scd)
+# define atomic64_add_return_relaxed atomic64_add_return_relaxed
+# define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
+# define atomic64_fetch_add_relaxed atomic64_fetch_add_relaxed
+# define atomic64_fetch_sub_relaxed atomic64_fetch_sub_relaxed
+#endif /* CONFIG_64BIT */
+
#undef ATOMIC_OPS
-#define ATOMIC_OPS(op, c_op, asm_op) \
- ATOMIC_OP(op, c_op, asm_op) \
- ATOMIC_FETCH_OP(op, c_op, asm_op)
+#define ATOMIC_OPS(pfx, op, type, c_op, asm_op, ll, sc) \
+ ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc) \
+ ATOMIC_FETCH_OP(pfx, op, type, c_op, asm_op, ll, sc)

-ATOMIC_OPS(and, &=, and)
-ATOMIC_OPS(or, |=, or)
-ATOMIC_OPS(xor, ^=, xor)
+ATOMIC_OPS(atomic, and, int, &=, and, ll, sc)
+ATOMIC_OPS(atomic, or, int, |=, or, ll, sc)
+ATOMIC_OPS(atomic, xor, int, ^=, xor, ll, sc)

#define atomic_fetch_and_relaxed atomic_fetch_and_relaxed
#define atomic_fetch_or_relaxed atomic_fetch_or_relaxed
#define atomic_fetch_xor_relaxed atomic_fetch_xor_relaxed

+#ifdef CONFIG_64BIT
+ATOMIC_OPS(atomic64, and, s64, &=, and, lld, scd)
+ATOMIC_OPS(atomic64, or, s64, |=, or, lld, scd)
+ATOMIC_OPS(atomic64, xor, s64, ^=, xor, lld, scd)
+# define atomic64_fetch_and_relaxed atomic64_fetch_and_relaxed
+# define atomic64_fetch_or_relaxed atomic64_fetch_or_relaxed
+# define atomic64_fetch_xor_relaxed atomic64_fetch_xor_relaxed
+#endif
+
#undef ATOMIC_OPS
#undef ATOMIC_FETCH_OP
#undef ATOMIC_OP_RETURN
@@ -243,130 +261,6 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
*/
#define atomic64_set(v, i) WRITE_ONCE((v)->counter, (i))

-#define ATOMIC64_OP(op, c_op, asm_op) \
-static __inline__ void atomic64_##op(s64 i, atomic64_t * v) \
-{ \
- if (kernel_uses_llsc) { \
- s64 temp; \
- \
- loongson_llsc_mb(); \
- __asm__ __volatile__( \
- " .set push \n" \
- " .set "MIPS_ISA_LEVEL" \n" \
- "1: lld %0, %1 # atomic64_" #op " \n" \
- " " #asm_op " %0, %2 \n" \
- " scd %0, %1 \n" \
- "\t" __SC_BEQZ "%0, 1b \n" \
- " .set pop \n" \
- : "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter) \
- : "Ir" (i) : __LLSC_CLOBBER); \
- } else { \
- unsigned long flags; \
- \
- raw_local_irq_save(flags); \
- v->counter c_op i; \
- raw_local_irq_restore(flags); \
- } \
-}
-
-#define ATOMIC64_OP_RETURN(op, c_op, asm_op) \
-static __inline__ s64 atomic64_##op##_return_relaxed(s64 i, atomic64_t * v) \
-{ \
- s64 result; \
- \
- if (kernel_uses_llsc) { \
- s64 temp; \
- \
- loongson_llsc_mb(); \
- __asm__ __volatile__( \
- " .set push \n" \
- " .set "MIPS_ISA_LEVEL" \n" \
- "1: lld %1, %2 # atomic64_" #op "_return\n" \
- " " #asm_op " %0, %1, %3 \n" \
- " scd %0, %2 \n" \
- "\t" __SC_BEQZ "%0, 1b \n" \
- " " #asm_op " %0, %1, %3 \n" \
- " .set pop \n" \
- : "=&r" (result), "=&r" (temp), \
- "+" GCC_OFF_SMALL_ASM() (v->counter) \
- : "Ir" (i) : __LLSC_CLOBBER); \
- } else { \
- unsigned long flags; \
- \
- raw_local_irq_save(flags); \
- result = v->counter; \
- result c_op i; \
- v->counter = result; \
- raw_local_irq_restore(flags); \
- } \
- \
- return result; \
-}
-
-#define ATOMIC64_FETCH_OP(op, c_op, asm_op) \
-static __inline__ s64 atomic64_fetch_##op##_relaxed(s64 i, atomic64_t * v) \
-{ \
- s64 result; \
- \
- if (kernel_uses_llsc) { \
- s64 temp; \
- \
- loongson_llsc_mb(); \
- __asm__ __volatile__( \
- " .set push \n" \
- " .set "MIPS_ISA_LEVEL" \n" \
- "1: lld %1, %2 # atomic64_fetch_" #op "\n" \
- " " #asm_op " %0, %1, %3 \n" \
- " scd %0, %2 \n" \
- "\t" __SC_BEQZ "%0, 1b \n" \
- " move %0, %1 \n" \
- " .set pop \n" \
- : "=&r" (result), "=&r" (temp), \
- "+" GCC_OFF_SMALL_ASM() (v->counter) \
- : "Ir" (i) : __LLSC_CLOBBER); \
- } else { \
- unsigned long flags; \
- \
- raw_local_irq_save(flags); \
- result = v->counter; \
- v->counter c_op i; \
- raw_local_irq_restore(flags); \
- } \
- \
- return result; \
-}
-
-#define ATOMIC64_OPS(op, c_op, asm_op) \
- ATOMIC64_OP(op, c_op, asm_op) \
- ATOMIC64_OP_RETURN(op, c_op, asm_op) \
- ATOMIC64_FETCH_OP(op, c_op, asm_op)
-
-ATOMIC64_OPS(add, +=, daddu)
-ATOMIC64_OPS(sub, -=, dsubu)
-
-#define atomic64_add_return_relaxed atomic64_add_return_relaxed
-#define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
-#define atomic64_fetch_add_relaxed atomic64_fetch_add_relaxed
-#define atomic64_fetch_sub_relaxed atomic64_fetch_sub_relaxed
-
-#undef ATOMIC64_OPS
-#define ATOMIC64_OPS(op, c_op, asm_op) \
- ATOMIC64_OP(op, c_op, asm_op) \
- ATOMIC64_FETCH_OP(op, c_op, asm_op)
-
-ATOMIC64_OPS(and, &=, and)
-ATOMIC64_OPS(or, |=, or)
-ATOMIC64_OPS(xor, ^=, xor)
-
-#define atomic64_fetch_and_relaxed atomic64_fetch_and_relaxed
-#define atomic64_fetch_or_relaxed atomic64_fetch_or_relaxed
-#define atomic64_fetch_xor_relaxed atomic64_fetch_xor_relaxed
-
-#undef ATOMIC64_OPS
-#undef ATOMIC64_FETCH_OP
-#undef ATOMIC64_OP_RETURN
-#undef ATOMIC64_OP
-
/*
* atomic64_sub_if_positive - conditionally subtract integer from atomic
* variable
--
2.23.0

2019-09-30 23:13:46

by Paul Burton

Subject: [PATCH 20/37] MIPS: bitops: ins start position is always an immediate

The start position for an ins instruction is always encoded as an
immediate, so allowing registers to be used by the inline asm makes no
sense. It should never happen anyway since a bit index should always be
small enough to be treated as an immediate, but remove the nonsensical
"r" for sanity.

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/include/asm/bitops.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index b8785bdf3507..83fd1f1c3ab4 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -85,7 +85,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
" " __INS "%0, %3, %2, 1 \n"
" " __SC "%0, %1 \n"
: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
- : "ir" (bit), "r" (~0)
+ : "i" (bit), "r" (~0)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
return;
@@ -150,7 +150,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
" " __INS "%0, $0, %2, 1 \n"
" " __SC "%0, %1 \n"
: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
- : "ir" (bit)
+ : "i" (bit)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
return;
@@ -383,7 +383,7 @@ static inline int test_and_clear_bit(unsigned long nr,
" " __INS "%0, $0, %3, 1 \n"
" " __SC "%0, %1 \n"
: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
- : "ir" (bit)
+ : "i" (bit)
: __LLSC_CLOBBER);
} while (unlikely(!temp));
} else {
--
2.23.0

2019-09-30 23:13:46

by Paul Burton

Subject: [PATCH 15/37] MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg

Remove the remaining duplication between 32b & 64b in asm/atomic.h by
making use of an ATOMIC_OPS() macro to generate:

- atomic_read()/atomic64_read()
- atomic_set()/atomic64_set()
- atomic_cmpxchg()/atomic64_cmpxchg()
- atomic_xchg()/atomic64_xchg()

This is consistent with the way all other functions in asm/atomic.h are
generated, and ensures consistency between the 32b & 64b functions.

Of note is that this results in the above now being static inline
functions rather than macros.
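
Callers are unaffected; illustrative usage of the generated helpers
(example_counter & example() are hypothetical, not from the patch):

static atomic64_t example_counter = ATOMIC64_INIT(0);

static void example(void)
{
	s64 old;

	atomic64_set(&example_counter, 1);
	old = atomic64_cmpxchg(&example_counter, 1, 2);	/* returns 1 */
	old = atomic64_xchg(&example_counter, 5);	/* returns 2 */

	WARN_ON(old != 2 || atomic64_read(&example_counter) != 5);
}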

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/include/asm/atomic.h | 70 +++++++++++++---------------------
1 file changed, 27 insertions(+), 43 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 96ef50fa2817..e5ac88392d1f 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -24,24 +24,34 @@
#include <asm/sync.h>
#include <asm/war.h>

-#define ATOMIC_INIT(i) { (i) }
+#define ATOMIC_OPS(pfx, type) \
+static __always_inline type pfx##_read(const pfx##_t *v) \
+{ \
+ return READ_ONCE(v->counter); \
+} \
+ \
+static __always_inline void pfx##_set(pfx##_t *v, type i) \
+{ \
+ WRITE_ONCE(v->counter, i); \
+} \
+ \
+static __always_inline type pfx##_cmpxchg(pfx##_t *v, type o, type n) \
+{ \
+ return cmpxchg(&v->counter, o, n); \
+} \
+ \
+static __always_inline type pfx##_xchg(pfx##_t *v, type n) \
+{ \
+ return xchg(&v->counter, n); \
+}

-/*
- * atomic_read - read atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically reads the value of @v.
- */
-#define atomic_read(v) READ_ONCE((v)->counter)
+#define ATOMIC_INIT(i) { (i) }
+ATOMIC_OPS(atomic, int)

-/*
- * atomic_set - set atomic variable
- * @v: pointer of type atomic_t
- * @i: required value
- *
- * Atomically sets the value of @v to @i.
- */
-#define atomic_set(v, i) WRITE_ONCE((v)->counter, (i))
+#ifdef CONFIG_64BIT
+# define ATOMIC64_INIT(i) { (i) }
+ATOMIC_OPS(atomic64, s64)
+#endif

#define ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc) \
static __inline__ void pfx##_##op(type i, pfx##_t * v) \
@@ -135,6 +145,7 @@ static __inline__ type pfx##_fetch_##op##_relaxed(type i, pfx##_t * v) \
return result; \
}

+#undef ATOMIC_OPS
#define ATOMIC_OPS(pfx, op, type, c_op, asm_op, ll, sc) \
ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc) \
ATOMIC_OP_RETURN(pfx, op, type, c_op, asm_op, ll, sc) \
@@ -254,31 +265,4 @@ ATOMIC_SIP_OP(atomic64, s64, dsubu, lld, scd)

#undef ATOMIC_SIP_OP

-#define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
-
-#ifdef CONFIG_64BIT
-
-#define ATOMIC64_INIT(i) { (i) }
-
-/*
- * atomic64_read - read atomic variable
- * @v: pointer of type atomic64_t
- *
- */
-#define atomic64_read(v) READ_ONCE((v)->counter)
-
-/*
- * atomic64_set - set atomic variable
- * @v: pointer of type atomic64_t
- * @i: required value
- */
-#define atomic64_set(v, i) WRITE_ONCE((v)->counter, (i))
-
-#define atomic64_cmpxchg(v, o, n) \
- ((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n)))
-#define atomic64_xchg(v, new) (xchg(&((v)->counter), (new)))
-
-#endif /* CONFIG_64BIT */
-
#endif /* _ASM_ATOMIC_H */
--
2.23.0

2019-09-30 23:13:49

by Paul Burton

Subject: [PATCH 12/37] MIPS: atomic: Emit Loongson3 sync workarounds within asm

Generate the sync instructions required to work around Loongson3 LL/SC
errata within inline asm blocks. This feels a little safer than doing it
from C, where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, which contained sync & ll instructions respectively.

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/include/asm/atomic.h | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index b834af5a7382..841ff274ada6 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -21,6 +21,7 @@
#include <asm/cpu-features.h>
#include <asm/cmpxchg.h>
#include <asm/llsc.h>
+#include <asm/sync.h>
#include <asm/war.h>

#define ATOMIC_INIT(i) { (i) }
@@ -56,10 +57,10 @@ static __inline__ void pfx##_##op(type i, pfx##_t * v) \
return; \
} \
\
- loongson_llsc_mb(); \
__asm__ __volatile__( \
" .set push \n" \
" .set " MIPS_ISA_LEVEL " \n" \
+ " " __SYNC(full, loongson3_war) " \n" \
"1: " #ll " %0, %1 # " #pfx "_" #op " \n" \
" " #asm_op " %0, %2 \n" \
" " #sc " %0, %1 \n" \
@@ -85,10 +86,10 @@ static __inline__ type pfx##_##op##_return_relaxed(type i, pfx##_t * v) \
return result; \
} \
\
- loongson_llsc_mb(); \
__asm__ __volatile__( \
" .set push \n" \
" .set " MIPS_ISA_LEVEL " \n" \
+ " " __SYNC(full, loongson3_war) " \n" \
"1: " #ll " %1, %2 # " #pfx "_" #op "_return\n" \
" " #asm_op " %0, %1, %3 \n" \
" " #sc " %0, %2 \n" \
@@ -117,10 +118,10 @@ static __inline__ type pfx##_fetch_##op##_relaxed(type i, pfx##_t * v) \
return result; \
} \
\
- loongson_llsc_mb(); \
__asm__ __volatile__( \
" .set push \n" \
" .set " MIPS_ISA_LEVEL " \n" \
+ " " __SYNC(full, loongson3_war) " \n" \
"1: " #ll " %1, %2 # " #pfx "_fetch_" #op "\n" \
" " #asm_op " %0, %1, %3 \n" \
" " #sc " %0, %2 \n" \
@@ -200,10 +201,10 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
if (kernel_uses_llsc) {
int temp;

- loongson_llsc_mb();
__asm__ __volatile__(
" .set push \n"
" .set "MIPS_ISA_LEVEL" \n"
+ " " __SYNC(full, loongson3_war) " \n"
"1: ll %1, %2 # atomic_sub_if_positive\n"
" .set pop \n"
" subu %0, %1, %3 \n"
@@ -213,7 +214,7 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
" .set "MIPS_ISA_LEVEL" \n"
" sc %1, %2 \n"
"\t" __SC_BEQZ "%1, 1b \n"
- "2: \n"
+ "2: " __SYNC(full, loongson3_war) " \n"
" .set pop \n"
: "=&r" (result), "=&r" (temp),
"+" GCC_OFF_SMALL_ASM() (v->counter)
@@ -229,7 +230,14 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
raw_local_irq_restore(flags);
}

- smp_llsc_mb();
+ /*
+ * In the Loongson3 workaround case we already have a completion
+ * barrier at 2: above, which is needed due to the bltz that can branch
+ * to code outside of the LL/SC loop. As such, we don't need to emit
+ * another barrier here.
+ */
+ if (!__SYNC_loongson3_war)
+ smp_llsc_mb();

return result;
}
--
2.23.0

2019-09-30 23:14:14

by Paul Burton

Subject: [PATCH 07/37] MIPS: barrier: Clean up __sync() definition

Implement __sync() using the new __SYNC() infrastructure, which will
take care of not emitting an instruction for old R3k CPUs that don't
support it. The only behavioral difference is that __sync() will now
provide a compiler barrier on these old CPUs, but that seems like
reasonable behavior anyway.
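
On such a CPU the new definition therefore degenerates to roughly
(sketch):

static inline void __sync(void)
{
	/*
	 * __SYNC(full, always) emits no instruction on CPUs lacking
	 * sync, leaving only the "memory" clobber - ie. a compiler
	 * barrier instead of the old empty do { } while (0).
	 */
	asm volatile("" ::: "memory");
}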

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/include/asm/barrier.h | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 657ec01120a4..a117c6d95038 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -11,20 +11,10 @@
#include <asm/addrspace.h>
#include <asm/sync.h>

-#ifdef CONFIG_CPU_HAS_SYNC
-#define __sync() \
- __asm__ __volatile__( \
- ".set push\n\t" \
- ".set noreorder\n\t" \
- ".set mips2\n\t" \
- "sync\n\t" \
- ".set pop" \
- : /* no output */ \
- : /* no input */ \
- : "memory")
-#else
-#define __sync() do { } while(0)
-#endif
+static inline void __sync(void)
+{
+ asm volatile(__SYNC(full, always) ::: "memory");
+}

static inline void rmb(void)
{
--
2.23.0

2019-09-30 23:14:20

by Paul Burton

Subject: [PATCH 04/37] MIPS: barrier: Clean up rmb() & wmb() definitions

Simplify our definitions of rmb() & wmb() using the new __SYNC()
infrastructure.

The fast_rmb() & fast_wmb() macros are removed, since they only provided
a level of indirection that made the code less readable & weren't
directly used anywhere in the kernel tree.

The Octeon #ifdef'ery is removed, since the "syncw" instruction
previously used is merely an alias for "sync 4" which __SYNC() will emit
for the wmb sync type when the kernel is configured for an Octeon CPU.
Similarly __SYNC() will emit nothing for the rmb sync type in Octeon
configurations.
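
The net effect on an Octeon configuration is roughly (sketch, assuming
the behaviour described above):

static inline void octeon_barrier_example(void)
{
	wmb();	/* __SYNC(wmb, always) emits "sync 4", ie. syncw */
	rmb();	/* __SYNC(rmb, always) emits nothing; compiler barrier only */
}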

Signed-off-by: Paul Burton <[email protected]>
---

arch/mips/include/asm/barrier.h | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 5ad39bfd3b6d..f36cab87cfde 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -26,6 +26,18 @@
#define __sync() do { } while(0)
#endif

+static inline void rmb(void)
+{
+ asm volatile(__SYNC(rmb, always) ::: "memory");
+}
+#define rmb rmb
+
+static inline void wmb(void)
+{
+ asm volatile(__SYNC(wmb, always) ::: "memory");
+}
+#define wmb wmb
+
#define __fast_iob() \
__asm__ __volatile__( \
".set push\n\t" \
@@ -37,16 +49,9 @@
: "m" (*(int *)CKSEG1) \
: "memory")
#ifdef CONFIG_CPU_CAVIUM_OCTEON
-# define OCTEON_SYNCW_STR ".set push\n.set arch=octeon\nsyncw\nsyncw\n.set pop\n"
-# define __syncw() __asm__ __volatile__(OCTEON_SYNCW_STR : : : "memory")
-
-# define fast_wmb() __syncw()
-# define fast_rmb() barrier()
# define fast_mb() __sync()
# define fast_iob() do { } while (0)
#else /* ! CONFIG_CPU_CAVIUM_OCTEON */
-# define fast_wmb() __sync()
-# define fast_rmb() __sync()
# define fast_mb() __sync()
# ifdef CONFIG_SGI_IP28
# define fast_iob() \
@@ -83,19 +88,14 @@

#endif /* !CONFIG_CPU_HAS_WB */

-#define wmb() fast_wmb()
-#define rmb() fast_rmb()
-
#if defined(CONFIG_WEAK_ORDERING)
# ifdef CONFIG_CPU_CAVIUM_OCTEON
# define __smp_mb() __sync()
-# define __smp_rmb() barrier()
-# define __smp_wmb() __syncw()
# else
# define __smp_mb() __asm__ __volatile__("sync" : : :"memory")
-# define __smp_rmb() __asm__ __volatile__("sync" : : :"memory")
-# define __smp_wmb() __asm__ __volatile__("sync" : : :"memory")
# endif
+# define __smp_rmb() rmb()
+# define __smp_wmb() wmb()
#else
#define __smp_mb() barrier()
#define __smp_rmb() barrier()
--
2.23.0