2015-08-03 10:03:34

by Vineet Gupta

[permalink] [raw]
Subject: [PATCH 0/6] ARC: spinlocks/atomics rework

Hi Peter,

Just running this by you for a quick eyeballing and also as an FYI.
The PREFETCHW workaround for the llock/scond livelock was not sufficient after
all, and we had to do some work there. Extended testing of quad core FPGA
builds shows things to be pretty stable, whereas w/o these patches some of the
LTP tests (shm_open/23-1) would cause the system to go bonkers.

I need to send this to Linus for 4.2-rcx, so I would appreciate it if you could
take a quick peek.

Thx,
-Vineet

Vineet Gupta (6):
Revert "ARCv2: STAR 9000837815 workaround hardware exclusive
transactions livelock"
ARC: refactor atomic inline asm operands with symbolic names
ARC: LLOCK/SCOND based spin_lock
ARC: LLOCK/SCOND based rwlock
ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with
exponential backoff
ARCv2: spinlock/rwlock: Reset retry delay when starting a new
spin-wait cycle

arch/arc/Kconfig | 5 +
arch/arc/include/asm/atomic.h | 113 +++++--
arch/arc/include/asm/spinlock.h | 551 +++++++++++++++++++++++++++++++++-
arch/arc/include/asm/spinlock_types.h | 2 +
arch/arc/kernel/setup.c | 4 +
5 files changed, 636 insertions(+), 39 deletions(-)

--
1.9.1


2015-08-03 10:03:39

by Vineet Gupta

[permalink] [raw]
Subject: [PATCH 1/6] Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock"

Extended testing of the quad core configuration revealed that this fix was
insufficient. Specifically, the LTP Open POSIX test shm_op/23-1 would cause a
hardware livelock in the llock/scond loop in update_cpu_load_active().

So remove this and make way for a proper workaround.

This reverts commit a5c8b52abe677977883655166796f167ef1e0084.

Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/atomic.h | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 03484cb4d16d..20b7dc17979e 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -23,21 +23,13 @@

#define atomic_set(v, i) (((v)->counter) = (i))

-#ifdef CONFIG_ISA_ARCV2
-#define PREFETCHW " prefetchw [%1] \n"
-#else
-#define PREFETCHW
-#endif
-
#define ATOMIC_OP(op, c_op, asm_op) \
static inline void atomic_##op(int i, atomic_t *v) \
{ \
unsigned int temp; \
\
__asm__ __volatile__( \
- "1: \n" \
- PREFETCHW \
- " llock %0, [%1] \n" \
+ "1: llock %0, [%1] \n" \
" " #asm_op " %0, %0, %2 \n" \
" scond %0, [%1] \n" \
" bnz 1b \n" \
@@ -58,9 +50,7 @@ static inline int atomic_##op##_return(int i, atomic_t *v) \
smp_mb(); \
\
__asm__ __volatile__( \
- "1: \n" \
- PREFETCHW \
- " llock %0, [%1] \n" \
+ "1: llock %0, [%1] \n" \
" " #asm_op " %0, %0, %2 \n" \
" scond %0, [%1] \n" \
" bnz 1b \n" \
--
1.9.1

2015-08-03 10:04:28

by Vineet Gupta

[permalink] [raw]
Subject: [PATCH 2/6] ARC: refactor atomic inline asm operands with symbolic names

This reduces the diff in forthcoming patches and also helps to better understand
the incremental changes to the inline asm.

Cc: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/atomic.h | 32 +++++++++++++++++---------------
1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 20b7dc17979e..3dd36c1efee1 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -26,22 +26,23 @@
#define ATOMIC_OP(op, c_op, asm_op) \
static inline void atomic_##op(int i, atomic_t *v) \
{ \
- unsigned int temp; \
+ unsigned int val; \
\
__asm__ __volatile__( \
- "1: llock %0, [%1] \n" \
- " " #asm_op " %0, %0, %2 \n" \
- " scond %0, [%1] \n" \
- " bnz 1b \n" \
- : "=&r"(temp) /* Early clobber, to prevent reg reuse */ \
- : "r"(&v->counter), "ir"(i) \
+ "1: llock %[val], [%[ctr]] \n" \
+ " " #asm_op " %[val], %[val], %[i] \n" \
+ " scond %[val], [%[ctr]] \n" \
+ " bnz 1b \n" \
+ : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
+ : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \
+ [i] "ir" (i) \
: "cc"); \
} \

#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
static inline int atomic_##op##_return(int i, atomic_t *v) \
{ \
- unsigned int temp; \
+ unsigned int val; \
\
/* \
* Explicit full memory barrier needed before/after as \
@@ -50,17 +51,18 @@ static inline int atomic_##op##_return(int i, atomic_t *v) \
smp_mb(); \
\
__asm__ __volatile__( \
- "1: llock %0, [%1] \n" \
- " " #asm_op " %0, %0, %2 \n" \
- " scond %0, [%1] \n" \
- " bnz 1b \n" \
- : "=&r"(temp) \
- : "r"(&v->counter), "ir"(i) \
+ "1: llock %[val], [%[ctr]] \n" \
+ " " #asm_op " %[val], %[val], %[i] \n" \
+ " scond %[val], [%[ctr]] \n" \
+ " bnz 1b \n" \
+ : [val] "=&r" (val) \
+ : [ctr] "r" (&v->counter), \
+ [i] "ir" (i) \
: "cc"); \
\
smp_mb(); \
\
- return temp; \
+ return val; \
}

#else /* !CONFIG_ARC_HAS_LLSC */
--
1.9.1

2015-08-03 10:03:52

by Vineet Gupta

[permalink] [raw]
Subject: [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock

EX causes the cache line to be in Exclusive state and if done
concurrently by multiple cores, it keeps bouncing around.
In LLOCK/SCOND regime, spinning only involves LLOCK which doesn't
change the line state hence better solution.
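
For illustration, the acquire logic that the LLOCK/SCOND sequence below encodes is
roughly equivalent to this C sketch. A C11 compare-exchange stands in for the
LLOCK/SCOND pair and the lock values are illustrative; this is not the kernel
implementation, it only shows that the spin is a plain load and that only the
conditional store needs exclusive ownership of the line:

#include <stdatomic.h>

#define SKETCH_UNLOCKED	0	/* illustrative values only */
#define SKETCH_LOCKED	1

static inline void sketch_spin_lock(atomic_uint *slock)
{
	unsigned int val;

	for (;;) {
		/* "LLOCK": a plain load, the line can stay in shared state */
		val = atomic_load_explicit(slock, memory_order_relaxed);
		if (val == SKETCH_LOCKED)
			continue;		/* spin while LOCKED */

		/* "SCOND": only this store needs exclusive ownership */
		if (atomic_compare_exchange_weak_explicit(slock, &val, SKETCH_LOCKED,
				memory_order_acquire, memory_order_relaxed))
			return;			/* acquired */
	}
}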

Cc: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/spinlock.h | 76 +++++++++++++++++++++++++++++++++++++----
1 file changed, 69 insertions(+), 7 deletions(-)

diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index e1651df6a93d..4f6c90a0a68a 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -18,9 +18,68 @@
#define arch_spin_unlock_wait(x) \
do { while (arch_spin_is_locked(x)) cpu_relax(); } while (0)

+#ifdef CONFIG_ARC_HAS_LLSC
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+ unsigned int val;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "1: llock %[val], [%[slock]] \n"
+ " breq %[val], %[LOCKED], 1b \n" /* spin while LOCKED */
+ " scond %[LOCKED], [%[slock]] \n" /* acquire */
+ " bnz 1b \n"
+ " \n"
+ : [val] "=&r" (val)
+ : [slock] "r" (&(lock->slock)),
+ [LOCKED] "r" (__ARCH_SPIN_LOCK_LOCKED__)
+ : "memory", "cc");
+
+ smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+ unsigned int val, got_it = 0;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "1: llock %[val], [%[slock]] \n"
+ " breq %[val], %[LOCKED], 4f \n" /* already LOCKED, just bail */
+ " scond %[LOCKED], [%[slock]] \n" /* acquire */
+ " bnz 1b \n"
+ " mov %[got_it], 1 \n"
+ "4: \n"
+ " \n"
+ : [val] "=&r" (val),
+ [got_it] "+&r" (got_it)
+ : [slock] "r" (&(lock->slock)),
+ [LOCKED] "r" (__ARCH_SPIN_LOCK_LOCKED__)
+ : "memory", "cc");
+
+ smp_mb();
+
+ return got_it;
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+ smp_mb();
+
+ lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
+
+ smp_mb();
+}
+
+#else /* !CONFIG_ARC_HAS_LLSC */
+
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
- unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
+ unsigned int val = __ARCH_SPIN_LOCK_LOCKED__;

/*
* This smp_mb() is technically superfluous, we only need the one
@@ -33,7 +92,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
__asm__ __volatile__(
"1: ex %0, [%1] \n"
" breq %0, %2, 1b \n"
- : "+&r" (tmp)
+ : "+&r" (val)
: "r"(&(lock->slock)), "ir"(__ARCH_SPIN_LOCK_LOCKED__)
: "memory");

@@ -48,26 +107,27 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
smp_mb();
}

+/* 1 - lock taken successfully */
static inline int arch_spin_trylock(arch_spinlock_t *lock)
{
- unsigned int tmp = __ARCH_SPIN_LOCK_LOCKED__;
+ unsigned int val = __ARCH_SPIN_LOCK_LOCKED__;

smp_mb();

__asm__ __volatile__(
"1: ex %0, [%1] \n"
- : "+r" (tmp)
+ : "+r" (val)
: "r"(&(lock->slock))
: "memory");

smp_mb();

- return (tmp == __ARCH_SPIN_LOCK_UNLOCKED__);
+ return (val == __ARCH_SPIN_LOCK_UNLOCKED__);
}

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
- unsigned int tmp = __ARCH_SPIN_LOCK_UNLOCKED__;
+ unsigned int val = __ARCH_SPIN_LOCK_UNLOCKED__;

/*
* RELEASE barrier: given the instructions avail on ARCv2, full barrier
@@ -77,7 +137,7 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)

__asm__ __volatile__(
" ex %0, [%1] \n"
- : "+r" (tmp)
+ : "+r" (val)
: "r"(&(lock->slock))
: "memory");

@@ -88,6 +148,8 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
smp_mb();
}

+#endif
+
/*
* Read-write spinlocks, allowing multiple readers but only one writer.
*
--
1.9.1

2015-08-03 10:03:57

by Vineet Gupta

[permalink] [raw]
Subject: [PATCH 4/6] ARC: LLOCK/SCOND based rwlock

With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need
for a guarding spin lock.
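
The counter convention, in C terms: the lock starts at __ARCH_RW_LOCK_UNLOCKED__
(0x01000000), each reader decrements it, and a writer takes it all the way down to
0. A rough sketch of the trylock fast paths that the LLOCK/SCOND sequences below
implement, with a C11 compare-exchange standing in for LLOCK/SCOND (illustration
only, not the kernel code):

#include <stdatomic.h>

#define SKETCH_RW_UNLOCKED	0x01000000	/* mirrors __ARCH_RW_LOCK_UNLOCKED__ */

static inline int sketch_read_trylock(atomic_int *counter)
{
	int val = atomic_load_explicit(counter, memory_order_relaxed);

	/* val <= 0 means a writer holds the lock: deny the reader */
	while (val > 0) {
		if (atomic_compare_exchange_weak_explicit(counter, &val, val - 1,
				memory_order_acquire, memory_order_relaxed))
			return 1;	/* counter--, reader granted */
	}
	return 0;
}

static inline int sketch_write_trylock(atomic_int *counter)
{
	int val = SKETCH_RW_UNLOCKED;

	/* writer gets the lock only if nobody else holds it at all */
	return atomic_compare_exchange_strong_explicit(counter, &val, 0,
			memory_order_acquire, memory_order_relaxed);
}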

Cc: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/spinlock.h | 187 ++++++++++++++++++++++++++++++++--
arch/arc/include/asm/spinlock_types.h | 2 +
2 files changed, 179 insertions(+), 10 deletions(-)

diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 4f6c90a0a68a..9a7a7293f127 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -75,6 +75,177 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
smp_mb();
}

+/*
+ * Read-write spinlocks, allowing multiple readers but only one writer.
+ * Unfair locking as Writers could be starved indefinitely by Reader(s)
+ */
+
+static inline void arch_read_lock(arch_rwlock_t *rw)
+{
+ unsigned int val;
+
+ smp_mb();
+
+ /*
+ * zero means writer holds the lock exclusively, deny Reader.
+ * Otherwise grant lock to first/subseq reader
+ *
+ * if (rw->counter > 0) {
+ * rw->counter--;
+ * ret = 1;
+ * }
+ */
+
+ __asm__ __volatile__(
+ "1: llock %[val], [%[rwlock]] \n"
+ " brls %[val], %[WR_LOCKED], 1b\n" /* <= 0: spin while write locked */
+ " sub %[val], %[val], 1 \n" /* reader lock */
+ " scond %[val], [%[rwlock]] \n"
+ " bnz 1b \n"
+ " \n"
+ : [val] "=&r" (val)
+ : [rwlock] "r" (&(rw->counter)),
+ [WR_LOCKED] "ir" (0)
+ : "memory", "cc");
+
+ smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_read_trylock(arch_rwlock_t *rw)
+{
+ unsigned int val, got_it = 0;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "1: llock %[val], [%[rwlock]] \n"
+ " brls %[val], %[WR_LOCKED], 4f\n" /* <= 0: already write locked, bail */
+ " sub %[val], %[val], 1 \n" /* counter-- */
+ " scond %[val], [%[rwlock]] \n"
+ " bnz 1b \n" /* retry if collided with someone */
+ " mov %[got_it], 1 \n"
+ " \n"
+ "4: ; --- done --- \n"
+
+ : [val] "=&r" (val),
+ [got_it] "+&r" (got_it)
+ : [rwlock] "r" (&(rw->counter)),
+ [WR_LOCKED] "ir" (0)
+ : "memory", "cc");
+
+ smp_mb();
+
+ return got_it;
+}
+
+static inline void arch_write_lock(arch_rwlock_t *rw)
+{
+ unsigned int val;
+
+ smp_mb();
+
+ /*
+ * If reader(s) hold lock (lock < __ARCH_RW_LOCK_UNLOCKED__),
+ * deny writer. Otherwise if unlocked grant to writer
+ * Hence the claim that Linux rwlocks are unfair to writers.
+ * (can be starved for an indefinite time by readers).
+ *
+ * if (rw->counter == __ARCH_RW_LOCK_UNLOCKED__) {
+ * rw->counter = 0;
+ * ret = 1;
+ * }
+ */
+
+ __asm__ __volatile__(
+ "1: llock %[val], [%[rwlock]] \n"
+ " brne %[val], %[UNLOCKED], 1b \n" /* while !UNLOCKED spin */
+ " mov %[val], %[WR_LOCKED] \n"
+ " scond %[val], [%[rwlock]] \n"
+ " bnz 1b \n"
+ " \n"
+ : [val] "=&r" (val)
+ : [rwlock] "r" (&(rw->counter)),
+ [UNLOCKED] "ir" (__ARCH_RW_LOCK_UNLOCKED__),
+ [WR_LOCKED] "ir" (0)
+ : "memory", "cc");
+
+ smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_write_trylock(arch_rwlock_t *rw)
+{
+ unsigned int val, got_it = 0;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "1: llock %[val], [%[rwlock]] \n"
+ " brne %[val], %[UNLOCKED], 4f \n" /* !UNLOCKED, bail */
+ " mov %[val], %[WR_LOCKED] \n"
+ " scond %[val], [%[rwlock]] \n"
+ " bnz 1b \n" /* retry if collided with someone */
+ " mov %[got_it], 1 \n"
+ " \n"
+ "4: ; --- done --- \n"
+
+ : [val] "=&r" (val),
+ [got_it] "+&r" (got_it)
+ : [rwlock] "r" (&(rw->counter)),
+ [UNLOCKED] "ir" (__ARCH_RW_LOCK_UNLOCKED__),
+ [WR_LOCKED] "ir" (0)
+ : "memory", "cc");
+
+ smp_mb();
+
+ return got_it;
+}
+
+static inline void arch_read_unlock(arch_rwlock_t *rw)
+{
+ unsigned int val;
+
+ smp_mb();
+
+ /*
+ * rw->counter++;
+ */
+ __asm__ __volatile__(
+ "1: llock %[val], [%[rwlock]] \n"
+ " add %[val], %[val], 1 \n"
+ " scond %[val], [%[rwlock]] \n"
+ " bnz 1b \n"
+ " \n"
+ : [val] "=&r" (val)
+ : [rwlock] "r" (&(rw->counter))
+ : "memory", "cc");
+
+ smp_mb();
+}
+
+static inline void arch_write_unlock(arch_rwlock_t *rw)
+{
+ unsigned int val;
+
+ smp_mb();
+
+ /*
+ * rw->counter = __ARCH_RW_LOCK_UNLOCKED__;
+ */
+ __asm__ __volatile__(
+ "1: llock %[val], [%[rwlock]] \n"
+ " scond %[UNLOCKED], [%[rwlock]]\n"
+ " bnz 1b \n"
+ " \n"
+ : [val] "=&r" (val)
+ : [rwlock] "r" (&(rw->counter)),
+ [UNLOCKED] "r" (__ARCH_RW_LOCK_UNLOCKED__)
+ : "memory", "cc");
+
+ smp_mb();
+}
+
#else /* !CONFIG_ARC_HAS_LLSC */

static inline void arch_spin_lock(arch_spinlock_t *lock)
@@ -148,23 +319,14 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
smp_mb();
}

-#endif
-
/*
* Read-write spinlocks, allowing multiple readers but only one writer.
+ * Unfair locking as Writers could be starved indefinitely by Reader(s)
*
* The spinlock itself is contained in @counter and access to it is
* serialized with @lock_mutex.
- *
- * Unfair locking as Writers could be starved indefinitely by Reader(s)
*/

-/* Would read_trylock() succeed? */
-#define arch_read_can_lock(x) ((x)->counter > 0)
-
-/* Would write_trylock() succeed? */
-#define arch_write_can_lock(x) ((x)->counter == __ARCH_RW_LOCK_UNLOCKED__)
-
/* 1 - lock taken successfully */
static inline int arch_read_trylock(arch_rwlock_t *rw)
{
@@ -235,6 +397,11 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
arch_spin_unlock(&(rw->lock_mutex));
}

+#endif
+
+#define arch_read_can_lock(x) ((x)->counter > 0)
+#define arch_write_can_lock(x) ((x)->counter == __ARCH_RW_LOCK_UNLOCKED__)
+
#define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
#define arch_write_lock_flags(lock, flags) arch_write_lock(lock)

diff --git a/arch/arc/include/asm/spinlock_types.h b/arch/arc/include/asm/spinlock_types.h
index 662627ced4f2..4e1ef5f650c6 100644
--- a/arch/arc/include/asm/spinlock_types.h
+++ b/arch/arc/include/asm/spinlock_types.h
@@ -26,7 +26,9 @@ typedef struct {
*/
typedef struct {
volatile unsigned int counter;
+#ifndef CONFIG_ARC_HAS_LLSC
arch_spinlock_t lock_mutex;
+#endif
} arch_rwlock_t;

#define __ARCH_RW_LOCK_UNLOCKED__ 0x01000000
--
1.9.1

2015-08-03 10:04:05

by Vineet Gupta

[permalink] [raw]
Subject: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff

This is to work around the hardware livelock seen when multiple cores are trying
to SCOND to the same location (see code comments for details).
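
The shape of the workaround in C terms: when the exclusive store (SCOND) fails,
busy-wait for "delay" iterations and double the delay before retrying the whole
LLOCK/SCOND sequence, so contending cores drift apart in time. A rough sketch with
a C11 compare-exchange standing in for LLOCK/SCOND (illustration only; the real
thing is the SCOND_FAIL_RETRY_* asm macros below):

#include <stdatomic.h>

static inline void sketch_atomic_add(atomic_uint *ctr, unsigned int i)
{
	unsigned int delay = 1;

	for (;;) {
		unsigned int val = atomic_load_explicit(ctr, memory_order_relaxed);

		if (atomic_compare_exchange_weak_explicit(ctr, &val, val + i,
				memory_order_relaxed, memory_order_relaxed))
			return;				/* "SCOND" succeeded */

		/* "SCOND" failed: back off, doubling the delay each time */
		for (volatile unsigned int tmp = delay; tmp != 0; tmp--)
			;
		delay *= 2;
	}
}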

Cc: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/Kconfig | 5 +
arch/arc/include/asm/atomic.h | 73 ++++++++++
arch/arc/include/asm/spinlock.h | 292 ++++++++++++++++++++++++++++++++++++++++
arch/arc/kernel/setup.c | 4 +
4 files changed, 374 insertions(+)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index a5fccdfbfc8f..bd4670d1b89b 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -365,6 +365,11 @@ config ARC_HAS_LLSC
default y
depends on !ARC_CANT_LLSC

+config ARC_STAR_9000923308
+ bool "Workaround for llock/scond livelock"
+ default y
+ depends on ISA_ARCV2 && SMP && ARC_HAS_LLSC
+
config ARC_HAS_SWAPE
bool "Insn: SWAPE (endian-swap)"
default y
diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 3dd36c1efee1..f59d2cfc00fc 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -23,6 +23,8 @@

#define atomic_set(v, i) (((v)->counter) = (i))

+#ifndef CONFIG_ARC_STAR_9000923308
+
#define ATOMIC_OP(op, c_op, asm_op) \
static inline void atomic_##op(int i, atomic_t *v) \
{ \
@@ -65,6 +67,74 @@ static inline int atomic_##op##_return(int i, atomic_t *v) \
return val; \
}

+#else /* CONFIG_ARC_STAR_9000923308 */
+
+#define SCOND_FAIL_RETRY_VAR_DEF \
+ unsigned int delay = 1, tmp; \
+
+#define SCOND_FAIL_RETRY_ASM \
+ " bz 4f \n" \
+ " ; --- scond fail delay --- \n" \
+ " mov %[tmp], %[delay] \n" /* tmp = delay */ \
+ "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
+ " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
+ " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
+ " b 1b \n" /* start over */ \
+ "4: ; --- success --- \n" \
+
+#define SCOND_FAIL_RETRY_VARS \
+ ,[delay] "+&r" (delay),[tmp] "=&r" (tmp) \
+
+#define ATOMIC_OP(op, c_op, asm_op) \
+static inline void atomic_##op(int i, atomic_t *v) \
+{ \
+ unsigned int val, delay = 1, tmp; \
+ \
+ __asm__ __volatile__( \
+ "1: llock %[val], [%[ctr]] \n" \
+ " " #asm_op " %[val], %[val], %[i] \n" \
+ " scond %[val], [%[ctr]] \n" \
+ " \n" \
+ SCOND_FAIL_RETRY_ASM \
+ \
+ : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
+ SCOND_FAIL_RETRY_VARS \
+ : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \
+ [i] "ir" (i) \
+ : "cc"); \
+} \
+
+#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
+static inline int atomic_##op##_return(int i, atomic_t *v) \
+{ \
+ unsigned int val, delay = 1, tmp; \
+ \
+ /* \
+ * Explicit full memory barrier needed before/after as \
+ * LLOCK/SCOND themselves don't provide any such semantics \
+ */ \
+ smp_mb(); \
+ \
+ __asm__ __volatile__( \
+ "1: llock %[val], [%[ctr]] \n" \
+ " " #asm_op " %[val], %[val], %[i] \n" \
+ " scond %[val], [%[ctr]] \n" \
+ " \n" \
+ SCOND_FAIL_RETRY_ASM \
+ \
+ : [val] "=&r" (val) \
+ SCOND_FAIL_RETRY_VARS \
+ : [ctr] "r" (&v->counter), \
+ [i] "ir" (i) \
+ : "cc"); \
+ \
+ smp_mb(); \
+ \
+ return val; \
+}
+
+#endif /* CONFIG_ARC_STAR_9000923308 */
+
#else /* !CONFIG_ARC_HAS_LLSC */

#ifndef CONFIG_SMP
@@ -142,6 +212,9 @@ ATOMIC_OP(and, &=, and)
#undef ATOMIC_OPS
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
+#undef SCOND_FAIL_RETRY_VAR_DEF
+#undef SCOND_FAIL_RETRY_ASM
+#undef SCOND_FAIL_RETRY_VARS

/**
* __atomic_add_unless - add unless the number is a given value
diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 9a7a7293f127..297c54484013 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -20,6 +20,11 @@

#ifdef CONFIG_ARC_HAS_LLSC

+/*
+ * A normal LLOCK/SCOND based system, w/o need for livelock workaround
+ */
+#ifndef CONFIG_ARC_STAR_9000923308
+
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
unsigned int val;
@@ -246,6 +251,293 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
smp_mb();
}

+#else /* CONFIG_ARC_STAR_9000923308 */
+
+/*
+ * HS38x4 could get into a LLOCK/SCOND livelock in case of multiple overlapping
+ * coherency transactions in the SCU. The exclusive line state keeps rotating
+ * among contending cores, leading to a never ending cycle. So break the cycle
+ * by deferring the retry of a failed exclusive access (SCOND). The actual delay
+ * needed is a function of the number of contending cores as well as the unrelated
+ * coherency traffic from other cores. To keep the code simple, start off with a
+ * small delay of 1 which would suffice in most cases, and in case of contention
+ * double the delay. Eventually the delay is sufficient such that the coherency
+ * pipeline is drained, thus a subsequent exclusive access would succeed.
+ */
+
+#define SCOND_FAIL_RETRY_VAR_DEF \
+ unsigned int delay, tmp; \
+
+#define SCOND_FAIL_RETRY_ASM \
+ " ; --- scond fail delay --- \n" \
+ " mov %[tmp], %[delay] \n" /* tmp = delay */ \
+ "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
+ " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
+ " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
+ " b 1b \n" /* start over */ \
+ " \n" \
+ "4: ; --- done --- \n" \
+
+#define SCOND_FAIL_RETRY_VARS \
+ ,[delay] "=&r" (delay), [tmp] "=&r" (tmp) \
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+ unsigned int val;
+ SCOND_FAIL_RETRY_VAR_DEF;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "0: mov %[delay], 1 \n"
+ "1: llock %[val], [%[slock]] \n"
+ " breq %[val], %[LOCKED], 1b \n" /* spin while LOCKED */
+ " scond %[LOCKED], [%[slock]] \n" /* acquire */
+ " bz 4f \n" /* done */
+ " \n"
+ SCOND_FAIL_RETRY_ASM
+
+ : [val] "=&r" (val)
+ SCOND_FAIL_RETRY_VARS
+ : [slock] "r" (&(lock->slock)),
+ [LOCKED] "r" (__ARCH_SPIN_LOCK_LOCKED__)
+ : "memory", "cc");
+
+ smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+ unsigned int val, got_it = 0;
+ SCOND_FAIL_RETRY_VAR_DEF;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "0: mov %[delay], 1 \n"
+ "1: llock %[val], [%[slock]] \n"
+ " breq %[val], %[LOCKED], 4f \n" /* already LOCKED, just bail */
+ " scond %[LOCKED], [%[slock]] \n" /* acquire */
+ " bz.d 4f \n"
+ " mov.z %[got_it], 1 \n" /* got it */
+ " \n"
+ SCOND_FAIL_RETRY_ASM
+
+ : [val] "=&r" (val),
+ [got_it] "+&r" (got_it)
+ SCOND_FAIL_RETRY_VARS
+ : [slock] "r" (&(lock->slock)),
+ [LOCKED] "r" (__ARCH_SPIN_LOCK_LOCKED__)
+ : "memory", "cc");
+
+ smp_mb();
+
+ return got_it;
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+ smp_mb();
+
+ lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
+
+ smp_mb();
+}
+
+/*
+ * Read-write spinlocks, allowing multiple readers but only one writer.
+ * Unfair locking as Writers could be starved indefinitely by Reader(s)
+ */
+
+static inline void arch_read_lock(arch_rwlock_t *rw)
+{
+ unsigned int val;
+ SCOND_FAIL_RETRY_VAR_DEF;
+
+ smp_mb();
+
+ /*
+ * zero means writer holds the lock exclusively, deny Reader.
+ * Otherwise grant lock to first/subseq reader
+ *
+ * if (rw->counter > 0) {
+ * rw->counter--;
+ * ret = 1;
+ * }
+ */
+
+ __asm__ __volatile__(
+ "0: mov %[delay], 1 \n"
+ "1: llock %[val], [%[rwlock]] \n"
+ " brls %[val], %[WR_LOCKED], 1b\n" /* <= 0: spin while write locked */
+ " sub %[val], %[val], 1 \n" /* reader lock */
+ " scond %[val], [%[rwlock]] \n"
+ " bz 4f \n" /* done */
+ " \n"
+ SCOND_FAIL_RETRY_ASM
+
+ : [val] "=&r" (val)
+ SCOND_FAIL_RETRY_VARS
+ : [rwlock] "r" (&(rw->counter)),
+ [WR_LOCKED] "ir" (0)
+ : "memory", "cc");
+
+ smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_read_trylock(arch_rwlock_t *rw)
+{
+ unsigned int val, got_it = 0;
+ SCOND_FAIL_RETRY_VAR_DEF;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "0: mov %[delay], 1 \n"
+ "1: llock %[val], [%[rwlock]] \n"
+ " brls %[val], %[WR_LOCKED], 4f\n" /* <= 0: already write locked, bail */
+ " sub %[val], %[val], 1 \n" /* counter-- */
+ " scond %[val], [%[rwlock]] \n"
+ " bz.d 4f \n"
+ " mov.z %[got_it], 1 \n" /* got it */
+ " \n"
+ SCOND_FAIL_RETRY_ASM
+
+ : [val] "=&r" (val),
+ [got_it] "+&r" (got_it)
+ SCOND_FAIL_RETRY_VARS
+ : [rwlock] "r" (&(rw->counter)),
+ [WR_LOCKED] "ir" (0)
+ : "memory", "cc");
+
+ smp_mb();
+
+ return got_it;
+}
+
+static inline void arch_write_lock(arch_rwlock_t *rw)
+{
+ unsigned int val;
+ SCOND_FAIL_RETRY_VAR_DEF;
+
+ smp_mb();
+
+ /*
+ * If reader(s) hold lock (lock < __ARCH_RW_LOCK_UNLOCKED__),
+ * deny writer. Otherwise if unlocked grant to writer
+ * Hence the claim that Linux rwlocks are unfair to writers.
+ * (can be starved for an indefinite time by readers).
+ *
+ * if (rw->counter == __ARCH_RW_LOCK_UNLOCKED__) {
+ * rw->counter = 0;
+ * ret = 1;
+ * }
+ */
+
+ __asm__ __volatile__(
+ "0: mov %[delay], 1 \n"
+ "1: llock %[val], [%[rwlock]] \n"
+ " brne %[val], %[UNLOCKED], 1b \n" /* while !UNLOCKED spin */
+ " mov %[val], %[WR_LOCKED] \n"
+ " scond %[val], [%[rwlock]] \n"
+ " bz 4f \n"
+ " \n"
+ SCOND_FAIL_RETRY_ASM
+
+ : [val] "=&r" (val)
+ SCOND_FAIL_RETRY_VARS
+ : [rwlock] "r" (&(rw->counter)),
+ [UNLOCKED] "ir" (__ARCH_RW_LOCK_UNLOCKED__),
+ [WR_LOCKED] "ir" (0)
+ : "memory", "cc");
+
+ smp_mb();
+}
+
+/* 1 - lock taken successfully */
+static inline int arch_write_trylock(arch_rwlock_t *rw)
+{
+ unsigned int val, got_it = 0;
+ SCOND_FAIL_RETRY_VAR_DEF;
+
+ smp_mb();
+
+ __asm__ __volatile__(
+ "0: mov %[delay], 1 \n"
+ "1: llock %[val], [%[rwlock]] \n"
+ " brne %[val], %[UNLOCKED], 4f \n" /* !UNLOCKED, bail */
+ " mov %[val], %[WR_LOCKED] \n"
+ " scond %[val], [%[rwlock]] \n"
+ " bz.d 4f \n"
+ " mov.z %[got_it], 1 \n" /* got it */
+ " \n"
+ SCOND_FAIL_RETRY_ASM
+
+ : [val] "=&r" (val),
+ [got_it] "+&r" (got_it)
+ SCOND_FAIL_RETRY_VARS
+ : [rwlock] "r" (&(rw->counter)),
+ [UNLOCKED] "ir" (__ARCH_RW_LOCK_UNLOCKED__),
+ [WR_LOCKED] "ir" (0)
+ : "memory", "cc");
+
+ smp_mb();
+
+ return got_it;
+}
+
+static inline void arch_read_unlock(arch_rwlock_t *rw)
+{
+ unsigned int val;
+
+ smp_mb();
+
+ /*
+ * rw->counter++;
+ */
+ __asm__ __volatile__(
+ "1: llock %[val], [%[rwlock]] \n"
+ " add %[val], %[val], 1 \n"
+ " scond %[val], [%[rwlock]] \n"
+ " bnz 1b \n"
+ " \n"
+ : [val] "=&r" (val)
+ : [rwlock] "r" (&(rw->counter))
+ : "memory", "cc");
+
+ smp_mb();
+}
+
+static inline void arch_write_unlock(arch_rwlock_t *rw)
+{
+ unsigned int val;
+
+ smp_mb();
+
+ /*
+ * rw->counter = __ARCH_RW_LOCK_UNLOCKED__;
+ */
+ __asm__ __volatile__(
+ "1: llock %[val], [%[rwlock]] \n"
+ " scond %[UNLOCKED], [%[rwlock]]\n"
+ " bnz 1b \n"
+ " \n"
+ : [val] "=&r" (val)
+ : [rwlock] "r" (&(rw->counter)),
+ [UNLOCKED] "r" (__ARCH_RW_LOCK_UNLOCKED__)
+ : "memory", "cc");
+
+ smp_mb();
+}
+
+#undef SCOND_FAIL_RETRY_VAR_DEF
+#undef SCOND_FAIL_RETRY_ASM
+#undef SCOND_FAIL_RETRY_VARS
+
+#endif /* CONFIG_ARC_STAR_9000923308 */
+
#else /* !CONFIG_ARC_HAS_LLSC */

static inline void arch_spin_lock(arch_spinlock_t *lock)
diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
index 18cc01591c96..1f3d9b3160fa 100644
--- a/arch/arc/kernel/setup.c
+++ b/arch/arc/kernel/setup.c
@@ -330,6 +330,10 @@ static void arc_chk_core_config(void)
pr_warn("CONFIG_ARC_FPU_SAVE_RESTORE needed for working apps\n");
else if (!cpu->extn.fpu_dp && fpu_enabled)
panic("FPU non-existent, disable CONFIG_ARC_FPU_SAVE_RESTORE\n");
+
+ if (is_isa_arcv2() && IS_ENABLED(CONFIG_SMP) && cpu->isa.atomic &&
+ !IS_ENABLED(CONFIG_ARC_STAR_9000923308))
+ panic("llock/scond livelock workaround missing\n");
}

/*
--
1.9.1

2015-08-03 10:04:09

by Vineet Gupta

[permalink] [raw]
Subject: [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle

A spin lock could be available momentarily, but the SCOND to actually
acquire it might still fail due to concurrent update from other core(s).
To elide hardware lock, the sequence is retried after a "delay" which is
increased exponentially to get a nice backoff behaviour.

However, this could cause the delay counter to get to a high value. Thus
when the next spin cycle restarts, reset the counter back to starting
value of 1.
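
In C terms, the change is only about where the delay gets re-initialised: at the
top of every spin-wait cycle (i.e. whenever the lock is observed to be held again)
instead of once per call, so that only consecutive SCOND failures keep doubling it.
A rough sketch, with a C11 compare-exchange standing in for LLOCK/SCOND and
illustrative lock values (0 = free, 1 = locked); not the kernel code:

#include <stdatomic.h>

static inline void sketch_spin_lock_backoff(atomic_uint *slock)
{
	unsigned int delay = 1;

	for (;;) {
		unsigned int val = atomic_load_explicit(slock, memory_order_relaxed);

		if (val != 0) {			/* still LOCKED: spin ... */
			delay = 1;		/* ... restarting the backoff at 1 */
			continue;		/* the "breq ... 0b" in the patch */
		}
		if (atomic_compare_exchange_weak_explicit(slock, &val, 1,
				memory_order_acquire, memory_order_relaxed))
			return;			/* acquired */

		/* lost the SCOND race: back off, then double the delay */
		for (volatile unsigned int tmp = delay; tmp != 0; tmp--)
			;
		delay *= 2;
	}
}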

Cc: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/spinlock.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 297c54484013..dce6e3d8583c 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -291,7 +291,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
__asm__ __volatile__(
"0: mov %[delay], 1 \n"
"1: llock %[val], [%[slock]] \n"
- " breq %[val], %[LOCKED], 1b \n" /* spin while LOCKED */
+ " breq %[val], %[LOCKED], 0b \n" /* spin while LOCKED */
" scond %[LOCKED], [%[slock]] \n" /* acquire */
" bz 4f \n" /* done */
" \n"
@@ -370,7 +370,7 @@ static inline void arch_read_lock(arch_rwlock_t *rw)
__asm__ __volatile__(
"0: mov %[delay], 1 \n"
"1: llock %[val], [%[rwlock]] \n"
- " brls %[val], %[WR_LOCKED], 1b\n" /* <= 0: spin while write locked */
+ " brls %[val], %[WR_LOCKED], 0b\n" /* <= 0: spin while write locked */
" sub %[val], %[val], 1 \n" /* reader lock */
" scond %[val], [%[rwlock]] \n"
" bz 4f \n" /* done */
@@ -439,7 +439,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
__asm__ __volatile__(
"0: mov %[delay], 1 \n"
"1: llock %[val], [%[rwlock]] \n"
- " brne %[val], %[UNLOCKED], 1b \n" /* while !UNLOCKED spin */
+ " brne %[val], %[UNLOCKED], 0b \n" /* while !UNLOCKED spin */
" mov %[val], %[WR_LOCKED] \n"
" scond %[val], [%[rwlock]] \n"
" bz 4f \n"
--
1.9.1

2015-08-03 11:29:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock

On Mon, Aug 03, 2015 at 03:33:05PM +0530, Vineet Gupta wrote:
> EX causes the cache line to be in Exclusive state and if done
> concurrently by multiple cores, it keeps bouncing around.
> In LLOCK/SCOND regime, spinning only involves LLOCK which doesn't
> change the line state hence better solution.

Maybe write like:

"The EXchange instruction forces the cacheline into exclusive state
(because of the modify) and concurrent loops with it will bounce the
cacheline between the cores

Instead use LLOCK/SCOND to form the test-and-set lock implementation
since LLOCK can keep the line in shared state."

Because it wasn't clear to me what EX meant, and surely a LOAD must
change the cacheline state if it previously was in exclusive on another
core. It's just that shared is a whole lot better to spin on than
exclusive.

Also, since you're using LL/SC now, a slightly more complex lock is
trivial to implement, might I suggest you look at implementing ticket
locks next?

> Cc: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Vineet Gupta <[email protected]>
> ---
> arch/arc/include/asm/spinlock.h | 76 +++++++++++++++++++++++++++++++++++++----
> 1 file changed, 69 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
> index e1651df6a93d..4f6c90a0a68a 100644
> --- a/arch/arc/include/asm/spinlock.h
> +++ b/arch/arc/include/asm/spinlock.h
> @@ -18,9 +18,68 @@
> #define arch_spin_unlock_wait(x) \
> do { while (arch_spin_is_locked(x)) cpu_relax(); } while (0)
>
> +#ifdef CONFIG_ARC_HAS_LLSC
> +
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> + unsigned int val;
> +
> + smp_mb();

I'm still puzzled by your need of this one ...

> +
> + __asm__ __volatile__(
> + "1: llock %[val], [%[slock]] \n"
> + " breq %[val], %[LOCKED], 1b \n" /* spin while LOCKED */
> + " scond %[LOCKED], [%[slock]] \n" /* acquire */
> + " bnz 1b \n"
> + " \n"
> + : [val] "=&r" (val)
> + : [slock] "r" (&(lock->slock)),
> + [LOCKED] "r" (__ARCH_SPIN_LOCK_LOCKED__)
> + : "memory", "cc");
> +
> + smp_mb();
> +}
> +
> +/* 1 - lock taken successfully */
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> + unsigned int val, got_it = 0;
> +
> + smp_mb();

Idem.

> +
> + __asm__ __volatile__(
> + "1: llock %[val], [%[slock]] \n"
> + " breq %[val], %[LOCKED], 4f \n" /* already LOCKED, just bail */
> + " scond %[LOCKED], [%[slock]] \n" /* acquire */
> + " bnz 1b \n"
> + " mov %[got_it], 1 \n"
> + "4: \n"
> + " \n"
> + : [val] "=&r" (val),
> + [got_it] "+&r" (got_it)
> + : [slock] "r" (&(lock->slock)),
> + [LOCKED] "r" (__ARCH_SPIN_LOCK_LOCKED__)
> + : "memory", "cc");
> +
> + smp_mb();
> +
> + return got_it;
> +}
> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> + smp_mb();
> +
> + lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
> +
> + smp_mb();

Idem.

> +}
> +
> +#else /* !CONFIG_ARC_HAS_LLSC */

2015-08-03 11:33:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/6] ARC: LLOCK/SCOND based rwlock

On Mon, Aug 03, 2015 at 03:33:06PM +0530, Vineet Gupta wrote:
> With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need
> for a guarding spin lock.

Maybe re-iterate the exclusive vs shared spin story again.

And aside from the far too many full barriers (again), I was just
wondering about:

> +static inline void arch_write_unlock(arch_rwlock_t *rw)
> +{
> + unsigned int val;
> +
> + smp_mb();
> +
> + /*
> + * rw->counter = __ARCH_RW_LOCK_UNLOCKED__;
> + */
> + __asm__ __volatile__(
> + "1: llock %[val], [%[rwlock]] \n"
> + " scond %[UNLOCKED], [%[rwlock]]\n"
> + " bnz 1b \n"
> + " \n"
> + : [val] "=&r" (val)
> + : [rwlock] "r" (&(rw->counter)),
> + [UNLOCKED] "r" (__ARCH_RW_LOCK_UNLOCKED__)
> + : "memory", "cc");
> +
> + smp_mb();
> +}

Why can't that be a straight store?

2015-08-03 11:41:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff

On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
> +#define SCOND_FAIL_RETRY_VAR_DEF \
> + unsigned int delay = 1, tmp; \
> +
> +#define SCOND_FAIL_RETRY_ASM \
> + " bz 4f \n" \
> + " ; --- scond fail delay --- \n" \
> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
> + " b 1b \n" /* start over */ \
> + "4: ; --- success --- \n" \
> +
> +#define SCOND_FAIL_RETRY_VARS \
> + ,[delay] "+&r" (delay),[tmp] "=&r" (tmp) \
> +
> +#define ATOMIC_OP(op, c_op, asm_op) \
> +static inline void atomic_##op(int i, atomic_t *v) \
> +{ \
> + unsigned int val, delay = 1, tmp; \

Maybe use your SCOND_FAIL_RETRY_VAR_DEF ?

> + \
> + __asm__ __volatile__( \
> + "1: llock %[val], [%[ctr]] \n" \
> + " " #asm_op " %[val], %[val], %[i] \n" \
> + " scond %[val], [%[ctr]] \n" \
> + " \n" \
> + SCOND_FAIL_RETRY_ASM \
> + \
> + : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
> + SCOND_FAIL_RETRY_VARS \
> + : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \
> + [i] "ir" (i) \
> + : "cc"); \
> +} \
> +
> +#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
> +static inline int atomic_##op##_return(int i, atomic_t *v) \
> +{ \
> + unsigned int val, delay = 1, tmp; \

Idem.

> + \
> + /* \
> + * Explicit full memory barrier needed before/after as \
> + * LLOCK/SCOND themselves don't provide any such semantics \
> + */ \
> + smp_mb(); \
> + \
> + __asm__ __volatile__( \
> + "1: llock %[val], [%[ctr]] \n" \
> + " " #asm_op " %[val], %[val], %[i] \n" \
> + " scond %[val], [%[ctr]] \n" \
> + " \n" \
> + SCOND_FAIL_RETRY_ASM \
> + \
> + : [val] "=&r" (val) \
> + SCOND_FAIL_RETRY_VARS \
> + : [ctr] "r" (&v->counter), \
> + [i] "ir" (i) \
> + : "cc"); \
> + \
> + smp_mb(); \
> + \
> + return val; \
> +}

> +#define SCOND_FAIL_RETRY_VAR_DEF \
> + unsigned int delay, tmp; \
> +
> +#define SCOND_FAIL_RETRY_ASM \
> + " ; --- scond fail delay --- \n" \
> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
> + " b 1b \n" /* start over */ \
> + " \n" \
> + "4: ; --- done --- \n" \
> +
> +#define SCOND_FAIL_RETRY_VARS \
> + ,[delay] "=&r" (delay), [tmp] "=&r" (tmp) \

This is looking remarkably similar to the previous ones, why not a
shared header?

> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> + unsigned int val;
> + SCOND_FAIL_RETRY_VAR_DEF;
> +
> + smp_mb();
> +
> + __asm__ __volatile__(
> + "0: mov %[delay], 1 \n"
> + "1: llock %[val], [%[slock]] \n"
> + " breq %[val], %[LOCKED], 1b \n" /* spin while LOCKED */
> + " scond %[LOCKED], [%[slock]] \n" /* acquire */
> + " bz 4f \n" /* done */
> + " \n"
> + SCOND_FAIL_RETRY_ASM

But,... in the case that macro is empty, the label 4 does not actually
exist. I see no real reason for this to be different from the previous
incarnation either.

2015-08-03 11:43:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle

On Mon, Aug 03, 2015 at 03:33:08PM +0530, Vineet Gupta wrote:
> A spin lock could be available momentarily, but the SCOND to actually
> acquire it might still fail due to concurrent update from other core(s).
> To elide hardware lock, the sequence is retried after a "delay" which is
> increased exponentially to get a nice backoff behaviour.
>
> However, this could cause the delay counter to get to a high value. Thus
> when the next spin cycle restarts, reset the counter back to starting
> value of 1.


Cute.. fwiw, did you look at what Sparc64 does?

2015-08-03 11:44:08

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH 3/6] ARC: LLOCK/SCOND based spin_lock

On Monday 03 August 2015 04:59 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:05PM +0530, Vineet Gupta wrote:
>> EX causes the cache line to be in Exclusive state and if done
>> concurrently by multiple cores, it keeps bouncing around.
>> In LLOCK/SCOND regime, spinning only involves LLOCK which doesn't
>> change the line state hence better solution.
> Maybe write like:
>
> "The EXchange instruction forces the cacheline into exclusive state
> (because of the modify) and concurrent loops with it will bounce the
> cacheline between the cores
>
> Instead use LLOCK/SCOND to form the test-and-set lock implementation
> since LLOCK can keep the line in shared state."

OK ! Will change this for v2.

> Because it wasn't clear to me what EX meant, and surely a LOAD must
> change the cacheline state if it previously was in exclusive on another
> core. It's just that shared is a whole lot better to spin on than
> exclusive.
>
> Also, since you're using LL/SC now, a slightly more complex lock is
> trivial to implement, might I suggest you look at implementing ticket
> locks next?

Sure !

>> Cc: Peter Zijlstra (Intel) <[email protected]>
>> Signed-off-by: Vineet Gupta <[email protected]>
>> ---
>> arch/arc/include/asm/spinlock.h | 76 +++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 69 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
>> index e1651df6a93d..4f6c90a0a68a 100644
>> --- a/arch/arc/include/asm/spinlock.h
>> +++ b/arch/arc/include/asm/spinlock.h
>> @@ -18,9 +18,68 @@
>> #define arch_spin_unlock_wait(x) \
>> do { while (arch_spin_is_locked(x)) cpu_relax(); } while (0)
>>
>> +#ifdef CONFIG_ARC_HAS_LLSC
>> +
>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> + unsigned int val;
>> +
>> + smp_mb();
> I'm still puzzled by your need of this one ...

My initial experiment with removing this caused weird jumps in hackbench numbers
and I've not had the chance to investigate it at all - but hopefully after this
series I can go back to it. So for now I'd like to keep it as is...

>> +
>> + __asm__ __volatile__(
>> + "1: llock %[val], [%[slock]] \n"
>> + " breq %[val], %[LOCKED], 1b \n" /* spin while LOCKED */
>> + " scond %[LOCKED], [%[slock]] \n" /* acquire */
>> + " bnz 1b \n"
>> + " \n"
>> + : [val] "=&r" (val)
>> + : [slock] "r" (&(lock->slock)),
>> + [LOCKED] "r" (__ARCH_SPIN_LOCK_LOCKED__)
>> + : "memory", "cc");
>> +
>> + smp_mb();
>> +}
>> +
>> +/* 1 - lock taken successfully */
>> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
>> +{
>> + unsigned int val, got_it = 0;
>> +
>> + smp_mb();
> Idem.
>
>> +
>> + __asm__ __volatile__(
>> + "1: llock %[val], [%[slock]] \n"
>> + " breq %[val], %[LOCKED], 4f \n" /* already LOCKED, just bail */
>> + " scond %[LOCKED], [%[slock]] \n" /* acquire */
>> + " bnz 1b \n"
>> + " mov %[got_it], 1 \n"
>> + "4: \n"
>> + " \n"
>> + : [val] "=&r" (val),
>> + [got_it] "+&r" (got_it)
>> + : [slock] "r" (&(lock->slock)),
>> + [LOCKED] "r" (__ARCH_SPIN_LOCK_LOCKED__)
>> + : "memory", "cc");
>> +
>> + smp_mb();
>> +
>> + return got_it;
>> +}
>> +
>> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
>> +{
>> + smp_mb();
>> +
>> + lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
>> +
>> + smp_mb();
> Idem.
>
>> +}
>> +
>> +#else /* !CONFIG_ARC_HAS_LLSC */

2015-08-03 11:51:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff

On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
> +#define SCOND_FAIL_RETRY_ASM \
> + " bz 4f \n" \
> + " ; --- scond fail delay --- \n" \
> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
> + " b 1b \n" /* start over */ \
> + "4: ; --- success --- \n" \

One more note, you might want a test to handle the case where delay *= 2
overflows and results in a 0.

2015-08-03 11:51:22

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH 4/6] ARC: LLOCK/SCOND based rwlock

On Monday 03 August 2015 05:03 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:06PM +0530, Vineet Gupta wrote:
>> With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need
>> for a guarding spin lock.
> Maybe re-iterate the exclusive vs shared spin story again.
>
> And aside from the far too many full barriers (again), I was just
> wondering about:
>
>> +static inline void arch_write_unlock(arch_rwlock_t *rw)
>> +{
>> + unsigned int val;
>> +
>> + smp_mb();
>> +
>> + /*
>> + * rw->counter = __ARCH_RW_LOCK_UNLOCKED__;
>> + */
>> + __asm__ __volatile__(
>> + "1: llock %[val], [%[rwlock]] \n"
>> + " scond %[UNLOCKED], [%[rwlock]]\n"
>> + " bnz 1b \n"
>> + " \n"
>> + : [val] "=&r" (val)
>> + : [rwlock] "r" (&(rw->counter)),
>> + [UNLOCKED] "r" (__ARCH_RW_LOCK_UNLOCKED__)
>> + : "memory", "cc");
>> +
>> + smp_mb();
>> +}
> Why can't that be a straight store?

Right - that was my overly cautious initial implementation. I did switch the spin
unlock to a regular ST after some initial experimenting, but missed this one. I'll
take it out in the next version.

-Vineet

2015-08-03 13:02:04

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff

On Monday 03 August 2015 05:11 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
>> +#define SCOND_FAIL_RETRY_VAR_DEF \
>> + unsigned int delay = 1, tmp; \
>> +
>> +#define SCOND_FAIL_RETRY_ASM \
>> + " bz 4f \n" \
>> + " ; --- scond fail delay --- \n" \
>> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
>> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
>> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
>> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
>> + " b 1b \n" /* start over */ \
>> + "4: ; --- success --- \n" \
>> +
>> +#define SCOND_FAIL_RETRY_VARS \
>> + ,[delay] "+&r" (delay),[tmp] "=&r" (tmp) \
>> +
>> +#define ATOMIC_OP(op, c_op, asm_op) \
>> +static inline void atomic_##op(int i, atomic_t *v) \
>> +{ \
>> + unsigned int val, delay = 1, tmp; \
> Maybe use your SCOND_FAIL_RETRY_VAR_DEF ?

Right - not sure how I missed that !

>
>> + \
>> + __asm__ __volatile__( \
>> + "1: llock %[val], [%[ctr]] \n" \
>> + " " #asm_op " %[val], %[val], %[i] \n" \
>> + " scond %[val], [%[ctr]] \n" \
>> + " \n" \
>> + SCOND_FAIL_RETRY_ASM \
>> + \
>> + : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
>> + SCOND_FAIL_RETRY_VARS \
>> + : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \
>> + [i] "ir" (i) \
>> + : "cc"); \
>> +} \
>> +
>> +#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
>> +static inline int atomic_##op##_return(int i, atomic_t *v) \
>> +{ \
>> + unsigned int val, delay = 1, tmp; \
> Idem.

OK !

>> + \
>> + /* \
>> + * Explicit full memory barrier needed before/after as \
>> + * LLOCK/SCOND themselves don't provide any such semantics \
>> + */ \
>> + smp_mb(); \
>> + \
>> + __asm__ __volatile__( \
>> + "1: llock %[val], [%[ctr]] \n" \
>> + " " #asm_op " %[val], %[val], %[i] \n" \
>> + " scond %[val], [%[ctr]] \n" \
>> + " \n" \
>> + SCOND_FAIL_RETRY_ASM \
>> + \
>> + : [val] "=&r" (val) \
>> + SCOND_FAIL_RETRY_VARS \
>> + : [ctr] "r" (&v->counter), \
>> + [i] "ir" (i) \
>> + : "cc"); \
>> + \
>> + smp_mb(); \
>> + \
>> + return val; \
>> +}
>> +#define SCOND_FAIL_RETRY_VAR_DEF \
>> + unsigned int delay, tmp; \
>> +
>> +#define SCOND_FAIL_RETRY_ASM \
>> + " ; --- scond fail delay --- \n" \
>> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
>> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
>> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
>> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
>> + " b 1b \n" /* start over */ \
>> + " \n" \
>> + "4: ; --- done --- \n" \
>> +
>> +#define SCOND_FAIL_RETRY_VARS \
>> + ,[delay] "=&r" (delay), [tmp] "=&r" (tmp) \
> This is looking remarkably similar to the previous ones, why not a
> shared header?

I thought about it when duplicating the code - however it seemed that readability
was better if the code was present in the same file, rather than having to look it
up in a different header with no context at all.

Plus there are some subtle differences between the two when looked at closely.
Basically spinlocks need the reset-to-1 quirk which atomics don't, which means we
need the delay reset to 1 in the spinlock inline asm (and a different inline asm
constraint). Plus for atomics, the success branch (bz 4f) is folded away into the
macro, while we can't do that for the lock try routines, as that branch uses a
delay slot. Agreed that all of this is in the micro-optimization realm, but I
suppose it is worth it when you have a 10 stage pipeline.


>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> + unsigned int val;
>> + SCOND_FAIL_RETRY_VAR_DEF;
>> +
>> + smp_mb();
>> +
>> + __asm__ __volatile__(
>> + "0: mov %[delay], 1 \n"
>> + "1: llock %[val], [%[slock]] \n"
>> + " breq %[val], %[LOCKED], 1b \n" /* spin while LOCKED */
>> + " scond %[LOCKED], [%[slock]] \n" /* acquire */
>> + " bz 4f \n" /* done */
>> + " \n"
>> + SCOND_FAIL_RETRY_ASM
> But,... in the case that macro is empty, the label 4 does not actually
> exist. I see no real reason for this to be different from the previous
> incarnation either.

Per the current code, the macro is never empty. I initially wrote it to have one
version of the routines with different macro definitions, but then it was getting
terribly difficult to follow, so I resorted to duplicating all the routines, with
macros to kind of compensate for the duplication by factoring out the common code
in the duplicated code :-)

For locks, I can again fold the bz into the macro, but then we can't use the delay
slot in the try versions !

2015-08-03 13:02:17

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff

On Monday 03 August 2015 05:21 PM, Peter Zijlstra wrote:

On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:


> +#define SCOND_FAIL_RETRY_ASM \
> + " bz 4f \n" \
> + " ; --- scond fail delay --- \n" \
> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
> + " b 1b \n" /* start over */ \
> + "4: ; --- success --- \n" \


One more note, you might want a test to handle the case where delay *= 2
overflows and results in a 0.


yeah !

- asl %[delay], %[delay], 1
+ asl.f %[delay], %[delay], 1
+ mov.z %[delay], 1

2015-08-03 13:06:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff

On Mon, Aug 03, 2015 at 01:02:09PM +0000, Vineet Gupta wrote:
> On Monday 03 August 2015 05:21 PM, Peter Zijlstra wrote:
>
> On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
>
>
> > +#define SCOND_FAIL_RETRY_ASM \
> > + " bz 4f \n" \
> > + " ; --- scond fail delay --- \n" \
> > + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
> > + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
> > + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
> > + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
> > + " b 1b \n" /* start over */ \
> > + "4: ; --- success --- \n" \
>
>
> One more note, you might want a test to handle the case where delay *= 2
> overflows and results in a 0.
>
>
> yeah !
>
> - asl %[delay], %[delay], 1
> + asl.f %[delay], %[delay], 1
> + mov.z %[delay], 1

The other obvious option is: 0x80000000 :-)

2015-08-03 13:50:08

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff

On Monday 03 August 2015 06:31 PM, Vineet Gupta wrote:
> On Monday 03 August 2015 05:11 PM, Peter Zijlstra wrote:
>> > On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
>>> >> +#define SCOND_FAIL_RETRY_VAR_DEF \
>>> >> + unsigned int delay = 1, tmp; \
>>> >> +
>>> >> +#define SCOND_FAIL_RETRY_ASM \
>>> >> + " bz 4f \n" \
>>> >> + " ; --- scond fail delay --- \n" \
>>> >> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
>>> >> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
>>> >> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
>>> >> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
>>> >> + " b 1b \n" /* start over */ \
>>> >> + "4: ; --- success --- \n" \
>>> >> +
>>> >> +#define SCOND_FAIL_RETRY_VARS \
>>> >> + ,[delay] "+&r" (delay),[tmp] "=&r" (tmp) \
>>> >> +
>>> >> +#define ATOMIC_OP(op, c_op, asm_op) \
>>> >> +static inline void atomic_##op(int i, atomic_t *v) \
>>> >> +{ \
>>> >> + unsigned int val, delay = 1, tmp; \
>> > Maybe use your SCOND_FAIL_RETRY_VAR_DEF ?
> Right - not sure how I missed that !
>
>> >
>>> >> + \
>>> >> + __asm__ __volatile__( \
>>> >> + "1: llock %[val], [%[ctr]] \n" \
>>> >> + " " #asm_op " %[val], %[val], %[i] \n" \
>>> >> + " scond %[val], [%[ctr]] \n" \
>>> >> + " \n" \
>>> >> + SCOND_FAIL_RETRY_ASM \
>>> >> + \
>>> >> + : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
>>> >> + SCOND_FAIL_RETRY_VARS \
>>> >> + : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \
>>> >> + [i] "ir" (i) \
>>> >> + : "cc"); \
>>> >> +} \
>>> >> +
>>> >> +#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
>>> >> +static inline int atomic_##op##_return(int i, atomic_t *v) \
>>> >> +{ \
>>> >> + unsigned int val, delay = 1, tmp; \
>> > Idem.
> OK !
>
>>> >> + \
>>> >> + /* \
>>> >> + * Explicit full memory barrier needed before/after as \
>>> >> + * LLOCK/SCOND themselves don't provide any such semantics \
>>> >> + */ \
>>> >> + smp_mb(); \
>>> >> + \
>>> >> + __asm__ __volatile__( \
>>> >> + "1: llock %[val], [%[ctr]] \n" \
>>> >> + " " #asm_op " %[val], %[val], %[i] \n" \
>>> >> + " scond %[val], [%[ctr]] \n" \
>>> >> + " \n" \
>>> >> + SCOND_FAIL_RETRY_ASM \
>>> >> + \
>>> >> + : [val] "=&r" (val) \
>>> >> + SCOND_FAIL_RETRY_VARS \
>>> >> + : [ctr] "r" (&v->counter), \
>>> >> + [i] "ir" (i) \
>>> >> + : "cc"); \
>>> >> + \
>>> >> + smp_mb(); \
>>> >> + \
>>> >> + return val; \
>>> >> +}
>>> >> +#define SCOND_FAIL_RETRY_VAR_DEF \
>>> >> + unsigned int delay, tmp; \
>>> >> +
>>> >> +#define SCOND_FAIL_RETRY_ASM \
>>> >> + " ; --- scond fail delay --- \n" \
>>> >> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
>>> >> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
>>> >> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
>>> >> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
>>> >> + " b 1b \n" /* start over */ \
>>> >> + " \n" \
>>> >> + "4: ; --- done --- \n" \
>>> >> +
>>> >> +#define SCOND_FAIL_RETRY_VARS \
>>> >> + ,[delay] "=&r" (delay), [tmp] "=&r" (tmp) \
>> > This is looking remarkably similar to the previous ones, why not a
>> > shared header?

On second thoughts, the duplication of the atomic generator macros seems to be
superfluous ...

Below is much more readable and shorter.

------------>
diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 3dd36c1efee1..c2e012ca4560 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -23,17 +23,50 @@

#define atomic_set(v, i) (((v)->counter) = (i))

+#ifdef CONFIG_ARC_STAR_9000923308
+
+#define SCOND_FAIL_RETRY_VAR_DEF \
+ unsigned int delay = 1, tmp; \
+
+#define SCOND_FAIL_RETRY_ASM \
+ " bz 4f \n" \
+ " ; --- scond fail delay --- \n" \
+ " mov %[tmp], %[delay] \n" /* tmp = delay */ \
+ "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
+ " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
+ " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
+ " b 1b \n" /* start over */ \
+ "4: ; --- success --- \n" \
+
+#define SCOND_FAIL_RETRY_VARS \
+ ,[delay] "+&r" (delay),[tmp] "=&r" (tmp) \
+
+#else /* !CONFIG_ARC_STAR_9000923308 */
+
+#define SCOND_FAIL_RETRY_VAR_DEF
+
+#define SCOND_FAIL_RETRY_ASM \
+ " bnz 1b \n" \
+
+#define SCOND_FAIL_RETRY_VARS
+
+#endif
+
#define ATOMIC_OP(op, c_op, asm_op) \
static inline void atomic_##op(int i, atomic_t *v) \
{ \
- unsigned int val; \
+ unsigned int val; \
+ SCOND_FAIL_RETRY_VAR_DEF \
\
__asm__ __volatile__( \
"1: llock %[val], [%[ctr]] \n" \
" " #asm_op " %[val], %[val], %[i] \n" \
" scond %[val], [%[ctr]] \n" \
- " bnz 1b \n" \
+ " \n" \
+ SCOND_FAIL_RETRY_ASM \
+ \
: [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
+ SCOND_FAIL_RETRY_VARS \
: [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct
addr mode */ \
[i] "ir" (i) \
: "cc"); \
@@ -42,7 +75,8 @@ static inline void atomic_##op(int i, atomic_t *v) \
#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
static inline int atomic_##op##_return(int i, atomic_t *v) \
{ \
- unsigned int val; \
+ unsigned int val; \
+ SCOND_FAIL_RETRY_VAR_DEF \
\
/* \
* Explicit full memory barrier needed before/after as \
@@ -54,8 +88,11 @@ static inline int atomic_##op##_return(int i, atomic_t *v) \
"1: llock %[val], [%[ctr]] \n" \
" " #asm_op " %[val], %[val], %[i] \n" \
" scond %[val], [%[ctr]] \n" \
- " bnz 1b \n" \
+ " \n" \
+ SCOND_FAIL_RETRY_ASM \
+ \
: [val] "=&r" (val) \
+ SCOND_FAIL_RETRY_VARS \
: [ctr] "r" (&v->counter), \
[i] "ir" (i) \
: "cc"); \
@@ -142,6 +179,9 @@ ATOMIC_OP(and, &=, and)
#undef ATOMIC_OPS
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
+#undef SCOND_FAIL_RETRY_VAR_DEF
+#undef SCOND_FAIL_RETRY_ASM
+#undef SCOND_FAIL_RETRY_VARS

/**
* __atomic_add_unless - add unless the number is a given value




2015-08-03 14:08:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff

On Mon, Aug 03, 2015 at 01:50:01PM +0000, Vineet Gupta wrote:
> diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
> index 3dd36c1efee1..c2e012ca4560 100644
> --- a/arch/arc/include/asm/atomic.h
> +++ b/arch/arc/include/asm/atomic.h
> @@ -23,17 +23,50 @@
>
> #define atomic_set(v, i) (((v)->counter) = (i))
>
> +#ifdef CONFIG_ARC_STAR_9000923308
> +
> +#define SCOND_FAIL_RETRY_VAR_DEF \
> + unsigned int delay = 1, tmp; \
> +
> +#define SCOND_FAIL_RETRY_ASM \
> + " bz 4f \n" \
> + " ; --- scond fail delay --- \n" \
> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \

You forgot the overflow test .. :-)

> + " b 1b \n" /* start over */ \
> + "4: ; --- success --- \n" \
> +
> +#define SCOND_FAIL_RETRY_VARS \
> + ,[delay] "+&r" (delay),[tmp] "=&r" (tmp) \
> +
> +#else /* !CONFIG_ARC_STAR_9000923308 */
> +
> +#define SCOND_FAIL_RETRY_VAR_DEF
> +
> +#define SCOND_FAIL_RETRY_ASM \
> + " bnz 1b \n" \
> +
> +#define SCOND_FAIL_RETRY_VARS
> +
> +#endif

But yes, much better.

2015-08-03 14:40:19

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle

On Monday 03 August 2015 05:14 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:08PM +0530, Vineet Gupta wrote:
>> > A spin lock could be available momentarily, but the SCOND to actually
>> > acquire it might still fail due to concurrent update from other core(s).
>> > To elide hardware lock, the sequence is retried after a "delay" which is
>> > increased exponentially to get a nice backoff behaviour.
>> >
>> > However, this could cause the delay counter to get to a high value. Thus
>> > when the next spin cycle restarts, reset the counter back to starting
>> > value of 1.
> Cute.. fwiw, did you look at what Sparc64 does?
>

Can't really comprehend what's special there - are you referring to the special
branching or to the out of line slow path code for the 32 bit version?

2015-08-03 14:42:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 6/6] ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle

On Mon, Aug 03, 2015 at 02:40:13PM +0000, Vineet Gupta wrote:
> On Monday 03 August 2015 05:14 PM, Peter Zijlstra wrote:
> > On Mon, Aug 03, 2015 at 03:33:08PM +0530, Vineet Gupta wrote:
> >> > A spin lock could be available momentarily, but the SCOND to actually
> >> > acquire it might still fail due to concurrent update from other core(s).
> >> > To elide hardware lock, the sequence is retried after a "delay" which is
> >> > increased exponentially to get a nice backoff behaviour.
> >> >
> >> > However, this could cause the delay counter to get to a high value. Thus
> >> > when the next spin cycle restarts, reset the counter back to starting
> >> > value of 1.
> > Cute.. fwiw, did you look at what Sparc64 does?
> >
>
> Can't really comprehend what's special there - are you referring to the special
> branching or to the out of line slow path code for the 32 bit version?

It's the only arch that I can remember seeing decay logic in.