LinuxLists.cc - [PATCH v6 0/9] riscv: Add qspinlock/qrwlock

2021-03-31 14:33:40

Subject: [PATCH v6 0/9] riscv: Add qspinlock/qrwlock

From: Guo Ren <[email protected]>

Current riscv is still using baby spinlock implementation. It'll cause
fairness and cache line bouncing problems. Many people are involved
and pay the efforts to improve it:

- The first version of patch was made in 2019.1:
https://lore.kernel.org/linux-riscv/[email protected]/#r

- The second version was made in 2020.11:
https://lore.kernel.org/linux-riscv/[email protected]/

- A good discussion at Platform HSC.2021-03-08:
https://drive.google.com/drive/folders/1ooqdnIsYx7XKor5O1XTtM6D1CHp4hc0p

- A good discussion on V4 in mailling list:
https://lore.kernel.org/linux-riscv/[email protected]/T/#t

- Openrisc's maintainer want to implement arch_cmpxchg infrastructure.
https://lore.kernel.org/linux-riscv/[email protected]/T/#m11b712fb6a4fda043811b1f4c3d61446951ed65a

Hope your comments and Tested-by or Co-developed-by or Reviewed-by ...

Let's kick the qspinlock into riscv right now (Also for the
architecture which hasn't xchg16 atomic instruction.)

Change V6:
- Add ticket-lock for riscv, default is qspinlock
- Keep ticket-lock for csky, default is ticketlock
- Using smp_cond_load for riscv ticket-lock
- Optimize csky ticketlock with smp_cond_load, store_release
- Add PPC_LBARX_LWARX for powerpc

Change V5:
- Fixup #endif comment typo by Waiman
- Remove cmpxchg coding convention patches which will get into a
separate patchset later by Arnd's advice
- Try to involve more architectures in the discussion

Change V4:
- Remove custom sub-word xchg implementation
- Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32 in locking/qspinlock

Change V3:
- Coding convention by Peter Zijlstra's advices

Change V2:
- Coding convention in cmpxchg.h
- Re-implement short xchg
- Remove char & cmpxchg implementations

Guo Ren (8):
locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32
riscv: locks: Introduce ticket-based spinlock implementation
csky: locks: Optimize coding convention
csky: Convert custom spinlock/rwlock to generic qspinlock/qrwlock
openrisc: qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32
sparc: qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32
xtensa: qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32
powerpc/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

Michael Clark (1):
riscv: Convert custom spinlock/rwlock to generic qspinlock/qrwlock

arch/csky/Kconfig | 8 ++
arch/csky/include/asm/Kbuild | 2 +
arch/csky/include/asm/spinlock.h | 15 +--
arch/csky/include/asm/spinlock_types.h | 4 +
arch/openrisc/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/riscv/Kconfig | 8 ++
arch/riscv/include/asm/Kbuild | 3 +
arch/riscv/include/asm/spinlock.h | 158 +++++++++---------------
arch/riscv/include/asm/spinlock_types.h | 26 ++--
arch/sparc/Kconfig | 1 +
arch/xtensa/Kconfig | 1 +
kernel/Kconfig.locks | 3 +
kernel/locking/qspinlock.c | 46 +++----
14 files changed, 142 insertions(+), 135 deletions(-)

--
2.17.1

2021-03-31 14:33:40

by Guo Ren

[permalink] [raw]

Subject: [PATCH v6 1/9] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

From: Guo Ren <[email protected]>

Some architectures don't have sub-word swap atomic instruction,
they only have the full word's one.

The sub-word swap only improve the performance when:
NR_CPUS < 16K
* 0- 7: locked byte
* 8: pending
* 9-15: not used
* 16-17: tail index
* 18-31: tail cpu (+1)

The 9-15 bits are wasted to use xchg16 in xchg_tail.

Please let architecture select xchg16/xchg32 to implement
xchg_tail.

Signed-off-by: Guo Ren <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Anup Patel <[email protected]>
---
kernel/Kconfig.locks | 3 +++
kernel/locking/qspinlock.c | 46 +++++++++++++++++++++-----------------
2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 3de8fd11873b..d02f1261f73f 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -239,6 +239,9 @@ config LOCK_SPIN_ON_OWNER
config ARCH_USE_QUEUED_SPINLOCKS
bool

+config ARCH_USE_QUEUED_SPINLOCKS_XCHG32
+ bool
+
config QUEUED_SPINLOCKS
def_bool y if ARCH_USE_QUEUED_SPINLOCKS
depends on SMP
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index cbff6ba53d56..4bfaa969bd15 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -163,26 +163,6 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
}

-/*
- * xchg_tail - Put in the new queue tail code word & retrieve previous one
- * @lock : Pointer to queued spinlock structure
- * @tail : The new queue tail code word
- * Return: The previous queue tail code word
- *
- * xchg(lock, tail), which heads an address dependency
- *
- * p,*,* -> n,*,* ; prev = xchg(lock, node)
- */
-static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
-{
- /*
- * We can use relaxed semantics since the caller ensures that the
- * MCS node is properly initialized before updating the tail.
- */
- return (u32)xchg_relaxed(&lock->tail,
- tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
-}
-
#else /* _Q_PENDING_BITS == 8 */

/**
@@ -206,6 +186,30 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
{
atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
}
+#endif /* _Q_PENDING_BITS == 8 */
+
+#if _Q_PENDING_BITS == 8 && !defined(CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32)
+/*
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail), which heads an address dependency
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+ /*
+ * We can use relaxed semantics since the caller ensures that the
+ * MCS node is properly initialized before updating the tail.
+ */
+ return (u32)xchg_relaxed(&lock->tail,
+ tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
+}
+
+#else

/**
* xchg_tail - Put in the new queue tail code word & retrieve previous one
@@ -236,7 +240,7 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
}
return old;
}
-#endif /* _Q_PENDING_BITS == 8 */
+#endif

/**
* queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
--
2.17.1

2021-03-31 14:33:43

by Guo Ren

[permalink] [raw]

Subject: [PATCH v6 2/9] riscv: Convert custom spinlock/rwlock to generic qspinlock/qrwlock

From: Michael Clark <[email protected]>

Update the RISC-V port to use the generic qspinlock and qrwlock.

This patch requires support for xchg_xtail for full-word which
are added by a previous patch:

Guo added select ARCH_USE_QUEUED_SPINLOCKS_XCHG32 in Kconfig

Guo fixed up compile error which made by below include sequence:
+#include <asm/qrwlock.h>
+#include <asm/qspinlock.h>

Signed-off-by: Michael Clark <[email protected]>
Co-developed-by: Guo Ren <[email protected]>
Tested-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Link: https://lore.kernel.org/linux-riscv/[email protected]/
Cc: Peter Zijlstra <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
---
arch/riscv/Kconfig | 3 +
arch/riscv/include/asm/Kbuild | 3 +
arch/riscv/include/asm/spinlock.h | 126 +-----------------------
arch/riscv/include/asm/spinlock_types.h | 15 +--
4 files changed, 11 insertions(+), 136 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 87d7b52f278f..67cc65ba1ea1 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -33,6 +33,9 @@ config RISCV
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
+ select ARCH_USE_QUEUED_RWLOCKS
+ select ARCH_USE_QUEUED_SPINLOCKS
+ select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
select CLONE_BACKWARDS
select CLINT_TIMER if !MMU
select COMMON_CLK
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 445ccc97305a..750c1056b90f 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -3,5 +3,8 @@ generic-y += early_ioremap.h
generic-y += extable.h
generic-y += flat.h
generic-y += kvm_para.h
+generic-y += mcs_spinlock.h
+generic-y += qrwlock.h
+generic-y += qspinlock.h
generic-y += user.h
generic-y += vmlinux.lds.h
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index f4f7fa1b7ca8..a557de67a425 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -7,129 +7,7 @@
#ifndef _ASM_RISCV_SPINLOCK_H
#define _ASM_RISCV_SPINLOCK_H

-#include <linux/kernel.h>
-#include <asm/current.h>
-#include <asm/fence.h>
-
-/*
- * Simple spin lock operations. These provide no fairness guarantees.
- */
-
-/* FIXME: Replace this with a ticket lock, like MIPS. */
-
-#define arch_spin_is_locked(x) (READ_ONCE((x)->lock) != 0)
-
-static inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
- smp_store_release(&lock->lock, 0);
-}
-
-static inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
- int tmp = 1, busy;
-
- __asm__ __volatile__ (
- " amoswap.w %0, %2, %1\n"
- RISCV_ACQUIRE_BARRIER
- : "=r" (busy), "+A" (lock->lock)
- : "r" (tmp)
- : "memory");
-
- return !busy;
-}
-
-static inline void arch_spin_lock(arch_spinlock_t *lock)
-{
- while (1) {
- if (arch_spin_is_locked(lock))
- continue;
-
- if (arch_spin_trylock(lock))
- break;
- }
-}
-
-/***********************************************************/
-
-static inline void arch_read_lock(arch_rwlock_t *lock)
-{
- int tmp;
-
- __asm__ __volatile__(
- "1: lr.w %1, %0\n"
- " bltz %1, 1b\n"
- " addi %1, %1, 1\n"
- " sc.w %1, %1, %0\n"
- " bnez %1, 1b\n"
- RISCV_ACQUIRE_BARRIER
- : "+A" (lock->lock), "=&r" (tmp)
- :: "memory");
-}
-
-static inline void arch_write_lock(arch_rwlock_t *lock)
-{
- int tmp;
-
- __asm__ __volatile__(
- "1: lr.w %1, %0\n"
- " bnez %1, 1b\n"
- " li %1, -1\n"
- " sc.w %1, %1, %0\n"
- " bnez %1, 1b\n"
- RISCV_ACQUIRE_BARRIER
- : "+A" (lock->lock), "=&r" (tmp)
- :: "memory");
-}
-
-static inline int arch_read_trylock(arch_rwlock_t *lock)
-{
- int busy;
-
- __asm__ __volatile__(
- "1: lr.w %1, %0\n"
- " bltz %1, 1f\n"
- " addi %1, %1, 1\n"
- " sc.w %1, %1, %0\n"
- " bnez %1, 1b\n"
- RISCV_ACQUIRE_BARRIER
- "1:\n"
- : "+A" (lock->lock), "=&r" (busy)
- :: "memory");
-
- return !busy;
-}
-
-static inline int arch_write_trylock(arch_rwlock_t *lock)
-{
- int busy;
-
- __asm__ __volatile__(
- "1: lr.w %1, %0\n"
- " bnez %1, 1f\n"
- " li %1, -1\n"
- " sc.w %1, %1, %0\n"
- " bnez %1, 1b\n"
- RISCV_ACQUIRE_BARRIER
- "1:\n"
- : "+A" (lock->lock), "=&r" (busy)
- :: "memory");
-
- return !busy;
-}
-
-static inline void arch_read_unlock(arch_rwlock_t *lock)
-{
- __asm__ __volatile__(
- RISCV_RELEASE_BARRIER
- " amoadd.w x0, %1, %0\n"
- : "+A" (lock->lock)
- : "r" (-1)
- : "memory");
-}
-
-static inline void arch_write_unlock(arch_rwlock_t *lock)
-{
- smp_store_release(&lock->lock, 0);
-}
+#include <asm/qspinlock.h>
+#include <asm/qrwlock.h>

#endif /* _ASM_RISCV_SPINLOCK_H */
diff --git a/arch/riscv/include/asm/spinlock_types.h b/arch/riscv/include/asm/spinlock_types.h
index f398e7638dd6..d033a973f287 100644
--- a/arch/riscv/include/asm/spinlock_types.h
+++ b/arch/riscv/include/asm/spinlock_types.h
@@ -6,20 +6,11 @@
#ifndef _ASM_RISCV_SPINLOCK_TYPES_H
#define _ASM_RISCV_SPINLOCK_TYPES_H

-#ifndef __LINUX_SPINLOCK_TYPES_H
+#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(_ASM_RISCV_SPINLOCK_H)
# error "please don't include this file directly"
#endif

-typedef struct {
- volatile unsigned int lock;
-} arch_spinlock_t;
-
-#define __ARCH_SPIN_LOCK_UNLOCKED { 0 }
-
-typedef struct {
- volatile unsigned int lock;
-} arch_rwlock_t;
-
-#define __ARCH_RW_LOCK_UNLOCKED { 0 }
+#include <asm-generic/qspinlock_types.h>
+#include <asm-generic/qrwlock_types.h>

#endif /* _ASM_RISCV_SPINLOCK_TYPES_H */
--
2.17.1

2021-03-31 14:33:50

by Guo Ren

[permalink] [raw]

Subject: [PATCH v6 4/9] csky: locks: Optimize coding convention

From: Guo Ren <[email protected]>

- Using smp_cond_load_acquire in arch_spin_lock by Peter's
advice.
- Using __smp_acquire_fence in arch_spin_trylock
- Using smp_store_release in arch_spin_unlock

All above are just coding conventions and won't affect the
function.

TODO in smp_cond_load_acquire for architecture:
- current csky only has:
lr.w val, <p0>
sc.w <p0>. val2
(Any other stores to p0 will let sc.w failed)

- But smp_cond_load_acquire need:
lr.w val, <p0>
wfe
(Any stores to p0 will send the event to let wfe retired)

Signed-off-by: Guo Ren <[email protected]>
Link: https://lore.kernel.org/linux-riscv/CAAhSdy1JHLUFwu7RuCaQ+RUWRBks2KsDva7EpRt8--4ZfofSUQ@mail.gmail.com/T/#m13adac285b7f51f4f879a5d6b65753ecb1a7524e
Cc: Peter Zijlstra <[email protected]>
Cc: Arnd Bergmann <[email protected]>
---
arch/csky/include/asm/spinlock.h | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
index 69f5aa249c5f..69677167977a 100644
--- a/arch/csky/include/asm/spinlock.h
+++ b/arch/csky/include/asm/spinlock.h
@@ -26,10 +26,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
: "r"(p), "r"(ticket_next)
: "cc");

- while (lockval.tickets.next != lockval.tickets.owner)
- lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
-
- smp_mb();
+ smp_cond_load_acquire(&lock->tickets.owner,
+ VAL == lockval.tickets.next);
}

static inline int arch_spin_trylock(arch_spinlock_t *lock)
@@ -55,15 +53,14 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
} while (!res);

if (!contended)
- smp_mb();
+ __smp_acquire_fence();

return !contended;
}

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
- smp_mb();
- WRITE_ONCE(lock->tickets.owner, lock->tickets.owner + 1);
+ smp_store_release(&lock->tickets.owner, lock->tickets.owner + 1);
}

static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
--
2.17.1

2021-03-31 14:34:13

by Guo Ren

[permalink] [raw]

Subject: [PATCH v6 5/9] csky: Convert custom spinlock/rwlock to generic qspinlock/qrwlock

From: Guo Ren <[email protected]>

Update the C-SKY port to use the generic qspinlock and qrwlock.

C-SKY only support ldex.w/stex.w with word(double word) size &
align access. So it must select XCHG32 to let qspinlock only use
word atomic xchg_tail.

Default is still ticket lock.

Signed-off-by: Guo Ren <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Arnd Bergmann <[email protected]>
---
arch/csky/Kconfig | 8 ++++++++
arch/csky/include/asm/Kbuild | 2 ++
arch/csky/include/asm/spinlock.h | 4 ++++
arch/csky/include/asm/spinlock_types.h | 4 ++++
4 files changed, 18 insertions(+)

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 34e91224adc3..ae12332edb7b 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -8,6 +8,8 @@ config CSKY
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
+ select ARCH_USE_QUEUED_SPINLOCKS if !CSKY_TICKET_LOCK
+ select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
select ARCH_WANT_FRAME_POINTERS if !CPU_CK610
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
select COMMON_CLK
@@ -304,6 +306,12 @@ config NR_CPUS
depends on SMP
default "4"

+config CSKY_TICKET_LOCK
+ bool "Ticket-based spin-locking"
+ default y
+ help
+ Say Y here to use ticket-based spin-locking.
+
config HIGHMEM
bool "High Memory Support"
depends on !CPU_CK610
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index cc24bb8e539f..2a2d09963bb9 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -2,6 +2,8 @@
generic-y += asm-offsets.h
generic-y += gpio.h
generic-y += kvm_para.h
+generic-y += mcs_spinlock.h
generic-y += qrwlock.h
+generic-y += qspinlock.h
generic-y += user.h
generic-y += vmlinux.lds.h
diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
index 69677167977a..fe98ad8ece51 100644
--- a/arch/csky/include/asm/spinlock.h
+++ b/arch/csky/include/asm/spinlock.h
@@ -6,6 +6,7 @@
#include <linux/spinlock_types.h>
#include <asm/barrier.h>

+#ifdef CONFIG_CSKY_TICKET_LOCK
/*
* Ticket-based spin-locking.
*/
@@ -80,6 +81,9 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
return (tickets.next - tickets.owner) > 1;
}
#define arch_spin_is_contended arch_spin_is_contended
+#else /* CONFIG_CSKY_TICKET_LOCK */
+#include <asm/qspinlock.h>
+#endif /* CONFIG_CSKY_TICKET_LOCK */

#include <asm/qrwlock.h>

diff --git a/arch/csky/include/asm/spinlock_types.h b/arch/csky/include/asm/spinlock_types.h
index 8ff0f6ff3a00..547f035f6dd5 100644
--- a/arch/csky/include/asm/spinlock_types.h
+++ b/arch/csky/include/asm/spinlock_types.h
@@ -7,6 +7,7 @@
# error "please don't include this file directly"
#endif

+#ifdef CONFIG_CSKY_TICKET_LOCK
#define TICKET_NEXT 16

typedef struct {
@@ -21,6 +22,9 @@ typedef struct {
} arch_spinlock_t;

#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
+#else
+#include <asm-generic/qspinlock_types.h>
+#endif

#include <asm-generic/qrwlock_types.h>

--
2.17.1

2021-03-31 14:34:15

by Guo Ren

[permalink] [raw]

Subject: [PATCH v6 6/9] openrisc: qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

From: Guo Ren <[email protected]>

We don't have native hw xchg16 instruction, so let qspinlock
generic code to deal with it.

Using the full-word atomic xchg instructions implement xchg16 has
the semantic risk for atomic operations.

This patch cancels the dependency of on qspinlock generic code on
architecture's xchg16.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: [email protected]
---
arch/openrisc/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 591acc5990dc..b299e409429f 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -33,6 +33,7 @@ config OPENRISC
select OR1K_PIC
select CPU_NO_EFFICIENT_FFS if !OPENRISC_HAVE_INST_FF1
select ARCH_USE_QUEUED_SPINLOCKS
+ select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
select ARCH_USE_QUEUED_RWLOCKS
select OMPIC if SMP
select ARCH_WANT_FRAME_POINTERS
--
2.17.1

2021-03-31 14:34:41

by Guo Ren

[permalink] [raw]

Subject: [PATCH v6 8/9] xtensa: qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

From: Guo Ren <[email protected]>

We don't have native hw xchg16 instruction, so let qspinlock
generic code to deal with it.

Using the full-word atomic xchg instructions implement xchg16 has
the semantic risk for atomic operations.

This patch cancels the dependency of on qspinlock generic code on
architecture's xchg16.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Max Filippov <[email protected]>
---
arch/xtensa/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index 9ad6b7b82707..f19d780638f7 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -9,6 +9,7 @@ config XTENSA
select ARCH_HAS_DMA_SET_UNCACHED if MMU
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
+ select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_TABLE_SORT
--
2.17.1

2021-03-31 14:34:45

by Guo Ren

[permalink] [raw]

Subject: [PATCH v6 9/9] powerpc/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

From: Guo Ren <[email protected]>

We don't have native hw xchg16 instruction, so let qspinlock
generic code to deal with it.

Using the full-word atomic xchg instructions implement xchg16 has
the semantic risk for atomic operations.

This patch cancels the dependency of on qspinlock generic code on
architecture's xchg16.

Also no need when PPC_LBARX_LWARX is enabled, see the link below.

Signed-off-by: Guo Ren <[email protected]>
Link: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/[email protected]/
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
---
arch/powerpc/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 386ae12d8523..6133ad51690e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -151,6 +151,7 @@ config PPC
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_QUEUED_RWLOCKS if PPC_QUEUED_SPINLOCKS
select ARCH_USE_QUEUED_SPINLOCKS if PPC_QUEUED_SPINLOCKS
+ select ARCH_USE_QUEUED_SPINLOCKS_XCHG32 if PPC_QUEUED_SPINLOCKS && !PPC_LBARX_LWARX
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
select ARCH_WANT_LD_ORPHAN_WARN
--
2.17.1

2021-03-31 14:35:47

by Guo Ren

[permalink] [raw]

Subject: [PATCH v6 3/9] riscv: locks: Introduce ticket-based spinlock implementation

From: Guo Ren <[email protected]>

This patch introduces a ticket lock implementation for riscv, along the
same lines as the implementation for arch/arm & arch/csky.

We still use qspinlock as default.

Signed-off-by: Guo Ren <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
---
arch/riscv/Kconfig | 7 ++-
arch/riscv/include/asm/spinlock.h | 84 +++++++++++++++++++++++++
arch/riscv/include/asm/spinlock_types.h | 17 +++++
3 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 67cc65ba1ea1..34d0276f01d5 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -34,7 +34,7 @@ config RISCV
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
select ARCH_USE_QUEUED_RWLOCKS
- select ARCH_USE_QUEUED_SPINLOCKS
+ select ARCH_USE_QUEUED_SPINLOCKS if !RISCV_TICKET_LOCK
select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
select CLONE_BACKWARDS
select CLINT_TIMER if !MMU
@@ -344,6 +344,11 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
def_bool y
depends on NUMA

+config RISCV_TICKET_LOCK
+ bool "Ticket-based spin-locking"
+ help
+ Say Y here to use ticket-based spin-locking.
+
config RISCV_ISA_C
bool "Emit compressed instructions when building Linux"
default y
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index a557de67a425..90b7eaa950cf 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -7,7 +7,91 @@
#ifndef _ASM_RISCV_SPINLOCK_H
#define _ASM_RISCV_SPINLOCK_H

+#ifdef CONFIG_RISCV_TICKET_LOCK
+#ifdef CONFIG_32BIT
+#define __ASM_SLLIW "slli\t"
+#define __ASM_SRLIW "srli\t"
+#else
+#define __ASM_SLLIW "slliw\t"
+#define __ASM_SRLIW "srliw\t"
+#endif
+
+/*
+ * Ticket-based spin-locking.
+ */
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+ arch_spinlock_t lockval;
+ u32 tmp;
+
+ asm volatile (
+ "1: lr.w %0, %2 \n"
+ " mv %1, %0 \n"
+ " addw %0, %0, %3 \n"
+ " sc.w %0, %0, %2 \n"
+ " bnez %0, 1b \n"
+ : "=&r" (tmp), "=&r" (lockval), "+A" (lock->lock)
+ : "r" (1 << TICKET_NEXT)
+ : "memory");
+
+ smp_cond_load_acquire(&lock->tickets.owner,
+ VAL == lockval.tickets.next);
+}
+
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+ u32 tmp, contended, res;
+
+ do {
+ asm volatile (
+ " lr.w %0, %3 \n"
+ __ASM_SRLIW "%1, %0, %5 \n"
+ __ASM_SLLIW "%2, %0, %5 \n"
+ " or %1, %2, %1 \n"
+ " li %2, 0 \n"
+ " sub %1, %1, %0 \n"
+ " bnez %1, 1f \n"
+ " addw %0, %0, %4 \n"
+ " sc.w %2, %0, %3 \n"
+ "1: \n"
+ : "=&r" (tmp), "=&r" (contended), "=&r" (res),
+ "+A" (lock->lock)
+ : "r" (1 << TICKET_NEXT), "I" (TICKET_NEXT)
+ : "memory");
+ } while (res);
+
+ if (!contended)
+ __atomic_acquire_fence();
+
+ return !contended;
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+ smp_store_release(&lock->tickets.owner, lock->tickets.owner + 1);
+}
+
+static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
+{
+ return lock.tickets.owner == lock.tickets.next;
+}
+
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
+{
+ return !arch_spin_value_unlocked(READ_ONCE(*lock));
+}
+
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
+{
+ struct __raw_tickets tickets = READ_ONCE(lock->tickets);
+
+ return (tickets.next - tickets.owner) > 1;
+}
+#define arch_spin_is_contended arch_spin_is_contended
+#else /* CONFIG_RISCV_TICKET_LOCK */
#include <asm/qspinlock.h>
+#endif /* CONFIG_RISCV_TICKET_LOCK */
+
#include <asm/qrwlock.h>

#endif /* _ASM_RISCV_SPINLOCK_H */
diff --git a/arch/riscv/include/asm/spinlock_types.h b/arch/riscv/include/asm/spinlock_types.h
index d033a973f287..afbb19841d0f 100644
--- a/arch/riscv/include/asm/spinlock_types.h
+++ b/arch/riscv/include/asm/spinlock_types.h
@@ -10,7 +10,24 @@
# error "please don't include this file directly"
#endif

+#ifdef CONFIG_RISCV_TICKET_LOCK
+#define TICKET_NEXT 16
+
+typedef struct {
+ union {
+ u32 lock;
+ struct __raw_tickets {
+ /* little endian */
+ u16 owner;
+ u16 next;
+ } tickets;
+ };
+} arch_spinlock_t;
+
+#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
+#else
#include <asm-generic/qspinlock_types.h>
+#endif
#include <asm-generic/qrwlock_types.h>

#endif /* _ASM_RISCV_SPINLOCK_TYPES_H */
--
2.17.1

2021-03-31 14:36:34

by Guo Ren

[permalink] [raw]

Subject: [PATCH v6 7/9] sparc: qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

From: Guo Ren <[email protected]>

We don't have native hw xchg16 instruction, so let qspinlock
generic code to deal with it.

Using the full-word atomic xchg instructions implement xchg16 has
the semantic risk for atomic operations.

This patch cancels the dependency of on qspinlock generic code on
architecture's xchg16.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Rob Gardner <[email protected]>
---
arch/sparc/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 164a5254c91c..1079fe3f058c 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -91,6 +91,7 @@ config SPARC64
select HAVE_REGS_AND_STACK_ACCESS_API
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
+ select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
select GENERIC_TIME_VSYSCALL
select ARCH_CLOCKSOURCE_DATA
select ARCH_HAS_PTE_SPECIAL
--
2.17.1

2021-04-05 05:55:16

by Guo Ren

[permalink] [raw]

Subject: Re: [PATCH v6 3/9] riscv: locks: Introduce ticket-based spinlock implementation

On Wed, Mar 31, 2021 at 10:32 PM <[email protected]> wrote:
>
> From: Guo Ren <[email protected]>
>
> This patch introduces a ticket lock implementation for riscv, along the
> same lines as the implementation for arch/arm & arch/csky.
>
> We still use qspinlock as default.
>
> Signed-off-by: Guo Ren <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Anup Patel <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> ---
> arch/riscv/Kconfig | 7 ++-
> arch/riscv/include/asm/spinlock.h | 84 +++++++++++++++++++++++++
> arch/riscv/include/asm/spinlock_types.h | 17 +++++
> 3 files changed, 107 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 67cc65ba1ea1..34d0276f01d5 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -34,7 +34,7 @@ config RISCV
> select ARCH_WANT_FRAME_POINTERS
> select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> select ARCH_USE_QUEUED_RWLOCKS
> - select ARCH_USE_QUEUED_SPINLOCKS
> + select ARCH_USE_QUEUED_SPINLOCKS if !RISCV_TICKET_LOCK
> select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> select CLONE_BACKWARDS
> select CLINT_TIMER if !MMU
> @@ -344,6 +344,11 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> def_bool y
> depends on NUMA
>
> +config RISCV_TICKET_LOCK
> + bool "Ticket-based spin-locking"
> + help
> + Say Y here to use ticket-based spin-locking.
> +
> config RISCV_ISA_C
> bool "Emit compressed instructions when building Linux"
> default y
> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> index a557de67a425..90b7eaa950cf 100644
> --- a/arch/riscv/include/asm/spinlock.h
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -7,7 +7,91 @@
> #ifndef _ASM_RISCV_SPINLOCK_H
> #define _ASM_RISCV_SPINLOCK_H
>
> +#ifdef CONFIG_RISCV_TICKET_LOCK
> +#ifdef CONFIG_32BIT
> +#define __ASM_SLLIW "slli\t"
> +#define __ASM_SRLIW "srli\t"
> +#else
> +#define __ASM_SLLIW "slliw\t"
> +#define __ASM_SRLIW "srliw\t"
> +#endif
> +
> +/*
> + * Ticket-based spin-locking.
> + */
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> + arch_spinlock_t lockval;
> + u32 tmp;
> +
> + asm volatile (
> + "1: lr.w %0, %2 \n"
> + " mv %1, %0 \n"
> + " addw %0, %0, %3 \n"
> + " sc.w %0, %0, %2 \n"
> + " bnez %0, 1b \n"
> + : "=&r" (tmp), "=&r" (lockval), "+A" (lock->lock)
> + : "r" (1 << TICKET_NEXT)
> + : "memory");
It's could be optimized by amoadd.w with Anup advice, and I'll update
it in the next patchset version:
diff --git a/arch/riscv/include/asm/spinlock.h
b/arch/riscv/include/asm/spinlock.h
index 90b7eaa950cf..435286ad342b 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -22,15 +22,10 @@
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
arch_spinlock_t lockval;
- u32 tmp;

asm volatile (
- "1: lr.w %0, %2 \n"
- " mv %1, %0 \n"
- " addw %0, %0, %3 \n"
- " sc.w %0, %0, %2 \n"
- " bnez %0, 1b \n"
- : "=&r" (tmp), "=&r" (lockval), "+A" (lock->lock)
+ " amoadd.w %0, %2, %1 \n"
+ : "=&r" (lockval), "+A" (lock->lock)
: "r" (1 << TICKET_NEXT)
: "memory");

> +
> + smp_cond_load_acquire(&lock->tickets.owner,
> + VAL == lockval.tickets.next);
> +}
> +
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> + u32 tmp, contended, res;
> +
> + do {
> + asm volatile (
> + " lr.w %0, %3 \n"
> + __ASM_SRLIW "%1, %0, %5 \n"
> + __ASM_SLLIW "%2, %0, %5 \n"
> + " or %1, %2, %1 \n"
> + " li %2, 0 \n"
> + " sub %1, %1, %0 \n"
> + " bnez %1, 1f \n"
> + " addw %0, %0, %4 \n"
> + " sc.w %2, %0, %3 \n"
> + "1: \n"
> + : "=&r" (tmp), "=&r" (contended), "=&r" (res),
> + "+A" (lock->lock)
> + : "r" (1 << TICKET_NEXT), "I" (TICKET_NEXT)
> + : "memory");
> + } while (res);
> +
> + if (!contended)
> + __atomic_acquire_fence();
> +
> + return !contended;
> +}
> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> + smp_store_release(&lock->tickets.owner, lock->tickets.owner + 1);
> +}
> +
> +static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> +{
> + return lock.tickets.owner == lock.tickets.next;
> +}
> +
> +static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> +{
> + return !arch_spin_value_unlocked(READ_ONCE(*lock));
> +}
> +
> +static inline int arch_spin_is_contended(arch_spinlock_t *lock)
> +{
> + struct __raw_tickets tickets = READ_ONCE(lock->tickets);
> +
> + return (tickets.next - tickets.owner) > 1;
> +}
> +#define arch_spin_is_contended arch_spin_is_contended
> +#else /* CONFIG_RISCV_TICKET_LOCK */
> #include <asm/qspinlock.h>
> +#endif /* CONFIG_RISCV_TICKET_LOCK */
> +
> #include <asm/qrwlock.h>
>
> #endif /* _ASM_RISCV_SPINLOCK_H */
> diff --git a/arch/riscv/include/asm/spinlock_types.h b/arch/riscv/include/asm/spinlock_types.h
> index d033a973f287..afbb19841d0f 100644
> --- a/arch/riscv/include/asm/spinlock_types.h
> +++ b/arch/riscv/include/asm/spinlock_types.h
> @@ -10,7 +10,24 @@
> # error "please don't include this file directly"
> #endif
>
> +#ifdef CONFIG_RISCV_TICKET_LOCK
> +#define TICKET_NEXT 16
> +
> +typedef struct {
> + union {
> + u32 lock;
> + struct __raw_tickets {
> + /* little endian */
> + u16 owner;
> + u16 next;
> + } tickets;
> + };
> +} arch_spinlock_t;
> +
> +#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
> +#else
> #include <asm-generic/qspinlock_types.h>
> +#endif
> #include <asm-generic/qrwlock_types.h>
>
> #endif /* _ASM_RISCV_SPINLOCK_TYPES_H */
> --
> 2.17.1
>

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-04-06 17:51:09

by Stafford Horne

[permalink] [raw]

Subject: Re: [PATCH v6 6/9] openrisc: qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

On Wed, Mar 31, 2021 at 02:30:37PM +0000, [email protected] wrote:
> From: Guo Ren <[email protected]>
>
> We don't have native hw xchg16 instruction, so let qspinlock
> generic code to deal with it.
>
> Using the full-word atomic xchg instructions implement xchg16 has
> the semantic risk for atomic operations.
>
> This patch cancels the dependency of on qspinlock generic code on
> architecture's xchg16.
>
> Signed-off-by: Guo Ren <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Jonas Bonn <[email protected]>
> Cc: Stefan Kristiansson <[email protected]>
> Cc: Stafford Horne <[email protected]>
> Cc: [email protected]

Acked-by: Stafford Horne <[email protected]>

> ---
> arch/openrisc/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
> index 591acc5990dc..b299e409429f 100644
> --- a/arch/openrisc/Kconfig
> +++ b/arch/openrisc/Kconfig
> @@ -33,6 +33,7 @@ config OPENRISC
> select OR1K_PIC
> select CPU_NO_EFFICIENT_FFS if !OPENRISC_HAVE_INST_FF1
> select ARCH_USE_QUEUED_SPINLOCKS
> + select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> select ARCH_USE_QUEUED_RWLOCKS
> select OMPIC if SMP
> select ARCH_WANT_FRAME_POINTERS
> --
> 2.17.1
>

2021-04-07 07:28:19

by Boqun Feng

[permalink] [raw]

Subject: Re: [PATCH v6 1/9] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

Hi,

On Wed, Mar 31, 2021 at 02:30:32PM +0000, [email protected] wrote:
> From: Guo Ren <[email protected]>
>
> Some architectures don't have sub-word swap atomic instruction,
> they only have the full word's one.
>
> The sub-word swap only improve the performance when:
> NR_CPUS < 16K
> * 0- 7: locked byte
> * 8: pending
> * 9-15: not used
> * 16-17: tail index
> * 18-31: tail cpu (+1)
>
> The 9-15 bits are wasted to use xchg16 in xchg_tail.
>
> Please let architecture select xchg16/xchg32 to implement
> xchg_tail.
>

If the architecture doesn't have sub-word swap atomic, won't it generate
the same/similar code no matter which version xchg_tail() is used? That
is even CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y, xchg_tail() acts
similar to an xchg16() implemented by cmpxchg(), which means we still
don't have forward progress guarantee. So this configuration doesn't
solve the problem.

I think it's OK to introduce this config and don't provide xchg16() for
risc-v. But I don't see the point of converting other architectures to
use it.

Regards,
Boqun

> Signed-off-by: Guo Ren <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Waiman Long <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Anup Patel <[email protected]>
> ---
> kernel/Kconfig.locks | 3 +++
> kernel/locking/qspinlock.c | 46 +++++++++++++++++++++-----------------
> 2 files changed, 28 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
> index 3de8fd11873b..d02f1261f73f 100644
> --- a/kernel/Kconfig.locks
> +++ b/kernel/Kconfig.locks
> @@ -239,6 +239,9 @@ config LOCK_SPIN_ON_OWNER
> config ARCH_USE_QUEUED_SPINLOCKS
> bool
>
> +config ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> + bool
> +
> config QUEUED_SPINLOCKS
> def_bool y if ARCH_USE_QUEUED_SPINLOCKS
> depends on SMP
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index cbff6ba53d56..4bfaa969bd15 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -163,26 +163,6 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
> WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
> }
>
> -/*
> - * xchg_tail - Put in the new queue tail code word & retrieve previous one
> - * @lock : Pointer to queued spinlock structure
> - * @tail : The new queue tail code word
> - * Return: The previous queue tail code word
> - *
> - * xchg(lock, tail), which heads an address dependency
> - *
> - * p,*,* -> n,*,* ; prev = xchg(lock, node)
> - */
> -static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
> -{
> - /*
> - * We can use relaxed semantics since the caller ensures that the
> - * MCS node is properly initialized before updating the tail.
> - */
> - return (u32)xchg_relaxed(&lock->tail,
> - tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
> -}
> -
> #else /* _Q_PENDING_BITS == 8 */
>
> /**
> @@ -206,6 +186,30 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
> {
> atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
> }
> +#endif /* _Q_PENDING_BITS == 8 */
> +
> +#if _Q_PENDING_BITS == 8 && !defined(CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32)
> +/*
> + * xchg_tail - Put in the new queue tail code word & retrieve previous one
> + * @lock : Pointer to queued spinlock structure
> + * @tail : The new queue tail code word
> + * Return: The previous queue tail code word
> + *
> + * xchg(lock, tail), which heads an address dependency
> + *
> + * p,*,* -> n,*,* ; prev = xchg(lock, node)
> + */
> +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
> +{
> + /*
> + * We can use relaxed semantics since the caller ensures that the
> + * MCS node is properly initialized before updating the tail.
> + */
> + return (u32)xchg_relaxed(&lock->tail,
> + tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
> +}
> +
> +#else
>
> /**
> * xchg_tail - Put in the new queue tail code word & retrieve previous one
> @@ -236,7 +240,7 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
> }
> return old;
> }
> -#endif /* _Q_PENDING_BITS == 8 */
> +#endif
>
> /**
> * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
> --
> 2.17.1
>

2021-04-07 15:51:02

by Stafford Horne

[permalink] [raw]

Subject: Re: [OpenRISC] [PATCH v6 1/9] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

On Wed, Apr 07, 2021 at 12:51:56AM +0800, Boqun Feng wrote:
> Hi,
>
> On Wed, Mar 31, 2021 at 02:30:32PM +0000, [email protected] wrote:
> > From: Guo Ren <[email protected]>
> >
> > Some architectures don't have sub-word swap atomic instruction,
> > they only have the full word's one.
> >
> > The sub-word swap only improve the performance when:
> > NR_CPUS < 16K
> > * 0- 7: locked byte
> > * 8: pending
> > * 9-15: not used
> > * 16-17: tail index
> > * 18-31: tail cpu (+1)
> >
> > The 9-15 bits are wasted to use xchg16 in xchg_tail.
> >
> > Please let architecture select xchg16/xchg32 to implement
> > xchg_tail.
> >
>
> If the architecture doesn't have sub-word swap atomic, won't it generate
> the same/similar code no matter which version xchg_tail() is used? That
> is even CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y, xchg_tail() acts
> similar to an xchg16() implemented by cmpxchg(), which means we still
> don't have forward progress guarantee. So this configuration doesn't
> solve the problem.
>
> I think it's OK to introduce this config and don't provide xchg16() for
> risc-v. But I don't see the point of converting other architectures to
> use it.

Hello,

For OpenRISC I did ack the patch to convert to
CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y. But I think you are right, the
generic code in xchg_tail and the xchg16 emulation code in produced by OpenRISC
using xchg32 would produce very similar code. I have not compared instructions,
but it does seem like duplicate functionality.

Why doesn't RISC-V add the xchg16 emulation code similar to OpenRISC? For
OpenRISC we added xchg16 and xchg8 emulation code to enable qspinlocks. So
one thought is with CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y, can we remove our
xchg16/xchg8 emulation code?

-Stafford

2021-04-07 20:42:42

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [OpenRISC] [PATCH v6 1/9] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

On Wed, Apr 07, 2021 at 08:52:08AM +0900, Stafford Horne wrote:
> Why doesn't RISC-V add the xchg16 emulation code similar to OpenRISC? For
> OpenRISC we added xchg16 and xchg8 emulation code to enable qspinlocks. So
> one thought is with CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y, can we remove our
> xchg16/xchg8 emulation code?

CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32 is guaranteed crap.

All the architectures that have wanted it are RISC style LL/SC archs,
and for them a cmpxchg loop is a daft thing to do, since it reduces the
chance of it behaving sanely.

Why would we provide something that's known to be suboptimal? If an
architecture chooses to not care about determinism and or fwd progress,
then that's their choice. But not one, I feel, we should encourage.

2021-04-07 22:42:59

by Stafford Horne

[permalink] [raw]

Subject: Re: [OpenRISC] [PATCH v6 1/9] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

On Wed, Apr 07, 2021 at 11:47:49AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 07, 2021 at 08:52:08AM +0900, Stafford Horne wrote:
> > Why doesn't RISC-V add the xchg16 emulation code similar to OpenRISC? For
> > OpenRISC we added xchg16 and xchg8 emulation code to enable qspinlocks. So
> > one thought is with CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y, can we remove our
> > xchg16/xchg8 emulation code?
>
> CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32 is guaranteed crap.
>
> All the architectures that have wanted it are RISC style LL/SC archs,
> and for them a cmpxchg loop is a daft thing to do, since it reduces the
> chance of it behaving sanely.
>
> Why would we provide something that's known to be suboptimal? If an
> architecture chooses to not care about determinism and or fwd progress,
> then that's their choice. But not one, I feel, we should encourage.

Thanks, this is the response I was hoping my comment would provoke.

So not enabling CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32 for architectures
unless they really want it should be the way.

-Stafford

2021-04-08 19:02:23

by Waiman Long

[permalink] [raw]

Subject: Re: [OpenRISC] [PATCH v6 1/9] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

On 4/6/21 7:52 PM, Stafford Horne wrote:
>
> For OpenRISC I did ack the patch to convert to
> CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y. But I think you are right, the
> generic code in xchg_tail and the xchg16 emulation code in produced by OpenRISC
> using xchg32 would produce very similar code. I have not compared instructions,
> but it does seem like duplicate functionality.
>
> Why doesn't RISC-V add the xchg16 emulation code similar to OpenRISC? For
> OpenRISC we added xchg16 and xchg8 emulation code to enable qspinlocks. So
> one thought is with CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y, can we remove our
> xchg16/xchg8 emulation code?

For the record, the latest qspinlock code doesn't use xchg8 anymore. It
still need xchg16, though.

Cheers,
Longman

2021-04-11 17:44:57

by Guo Ren

[permalink] [raw]

Subject: Re: [PATCH v6 3/9] riscv: locks: Introduce ticket-based spinlock implementation

On Wed, Mar 31, 2021 at 10:32 PM <[email protected]> wrote:
>
> From: Guo Ren <[email protected]>
>
> This patch introduces a ticket lock implementation for riscv, along the
> same lines as the implementation for arch/arm & arch/csky.
>
> We still use qspinlock as default.
>
> Signed-off-by: Guo Ren <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Anup Patel <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> ---
> arch/riscv/Kconfig | 7 ++-
> arch/riscv/include/asm/spinlock.h | 84 +++++++++++++++++++++++++
> arch/riscv/include/asm/spinlock_types.h | 17 +++++
> 3 files changed, 107 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 67cc65ba1ea1..34d0276f01d5 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -34,7 +34,7 @@ config RISCV
> select ARCH_WANT_FRAME_POINTERS
> select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> select ARCH_USE_QUEUED_RWLOCKS
> - select ARCH_USE_QUEUED_SPINLOCKS
> + select ARCH_USE_QUEUED_SPINLOCKS if !RISCV_TICKET_LOCK
> select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> select CLONE_BACKWARDS
> select CLINT_TIMER if !MMU
> @@ -344,6 +344,11 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> def_bool y
> depends on NUMA
>
> +config RISCV_TICKET_LOCK
> + bool "Ticket-based spin-locking"
> + help
> + Say Y here to use ticket-based spin-locking.
> +
> config RISCV_ISA_C
> bool "Emit compressed instructions when building Linux"
> default y
> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> index a557de67a425..90b7eaa950cf 100644
> --- a/arch/riscv/include/asm/spinlock.h
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -7,7 +7,91 @@
> #ifndef _ASM_RISCV_SPINLOCK_H
> #define _ASM_RISCV_SPINLOCK_H
>
> +#ifdef CONFIG_RISCV_TICKET_LOCK
> +#ifdef CONFIG_32BIT
> +#define __ASM_SLLIW "slli\t"
> +#define __ASM_SRLIW "srli\t"
> +#else
> +#define __ASM_SLLIW "slliw\t"
> +#define __ASM_SRLIW "srliw\t"
> +#endif
> +
> +/*
> + * Ticket-based spin-locking.
> + */
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> + arch_spinlock_t lockval;
> + u32 tmp;
> +
> + asm volatile (
> + "1: lr.w %0, %2 \n"
> + " mv %1, %0 \n"
> + " addw %0, %0, %3 \n"
> + " sc.w %0, %0, %2 \n"
> + " bnez %0, 1b \n"
> + : "=&r" (tmp), "=&r" (lockval), "+A" (lock->lock)
> + : "r" (1 << TICKET_NEXT)
> + : "memory");
> +
> + smp_cond_load_acquire(&lock->tickets.owner,
> + VAL == lockval.tickets.next);
It's wrong, blew is fixup:

diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
index fe98ad8ece51..2be627ceb9df 100644
--- a/arch/csky/include/asm/spinlock.h
+++ b/arch/csky/include/asm/spinlock.h
@@ -27,7 +27,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
: "r"(p), "r"(ticket_next)
: "cc");

- smp_cond_load_acquire(&lock->tickets.owner,
+ if (lockval.owner != lockval.tickets.next)
+ smp_cond_load_acquire(&lock->tickets.owner,
VAL == lockval.tickets.next);
> +}
> +
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> + u32 tmp, contended, res;
> +
> + do {
> + asm volatile (
> + " lr.w %0, %3 \n"
> + __ASM_SRLIW "%1, %0, %5 \n"
> + __ASM_SLLIW "%2, %0, %5 \n"
> + " or %1, %2, %1 \n"
> + " li %2, 0 \n"
> + " sub %1, %1, %0 \n"
> + " bnez %1, 1f \n"
> + " addw %0, %0, %4 \n"
> + " sc.w %2, %0, %3 \n"
> + "1: \n"
> + : "=&r" (tmp), "=&r" (contended), "=&r" (res),
> + "+A" (lock->lock)
> + : "r" (1 << TICKET_NEXT), "I" (TICKET_NEXT)
> + : "memory");
> + } while (res);
> +
> + if (!contended)
> + __atomic_acquire_fence();
> +
> + return !contended;
> +}
> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> + smp_store_release(&lock->tickets.owner, lock->tickets.owner + 1);
> +}
> +
> +static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> +{
> + return lock.tickets.owner == lock.tickets.next;
> +}
> +
> +static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> +{
> + return !arch_spin_value_unlocked(READ_ONCE(*lock));
> +}
> +
> +static inline int arch_spin_is_contended(arch_spinlock_t *lock)
> +{
> + struct __raw_tickets tickets = READ_ONCE(lock->tickets);
> +
> + return (tickets.next - tickets.owner) > 1;
> +}
> +#define arch_spin_is_contended arch_spin_is_contended
> +#else /* CONFIG_RISCV_TICKET_LOCK */
> #include <asm/qspinlock.h>
> +#endif /* CONFIG_RISCV_TICKET_LOCK */
> +
> #include <asm/qrwlock.h>
>
> #endif /* _ASM_RISCV_SPINLOCK_H */
> diff --git a/arch/riscv/include/asm/spinlock_types.h b/arch/riscv/include/asm/spinlock_types.h
> index d033a973f287..afbb19841d0f 100644
> --- a/arch/riscv/include/asm/spinlock_types.h
> +++ b/arch/riscv/include/asm/spinlock_types.h
> @@ -10,7 +10,24 @@
> # error "please don't include this file directly"
> #endif
>
> +#ifdef CONFIG_RISCV_TICKET_LOCK
> +#define TICKET_NEXT 16
> +
> +typedef struct {
> + union {
> + u32 lock;
> + struct __raw_tickets {
> + /* little endian */
> + u16 owner;
> + u16 next;
> + } tickets;
> + };
> +} arch_spinlock_t;
> +
> +#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
> +#else
> #include <asm-generic/qspinlock_types.h>
> +#endif
> #include <asm-generic/qrwlock_types.h>
>
> #endif /* _ASM_RISCV_SPINLOCK_TYPES_H */
> --
> 2.17.1
>

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-04-11 19:27:54

by Guo Ren

[permalink] [raw]

Subject: Re: [PATCH v6 4/9] csky: locks: Optimize coding convention

On Wed, Mar 31, 2021 at 10:32 PM <[email protected]> wrote:
>
> From: Guo Ren <[email protected]>
>
> - Using smp_cond_load_acquire in arch_spin_lock by Peter's
> advice.
> - Using __smp_acquire_fence in arch_spin_trylock
> - Using smp_store_release in arch_spin_unlock
>
> All above are just coding conventions and won't affect the
> function.
>
> TODO in smp_cond_load_acquire for architecture:
> - current csky only has:
> lr.w val, <p0>
> sc.w <p0>. val2
> (Any other stores to p0 will let sc.w failed)
>
> - But smp_cond_load_acquire need:
> lr.w val, <p0>
> wfe
> (Any stores to p0 will send the event to let wfe retired)
>
> Signed-off-by: Guo Ren <[email protected]>
> Link: https://lore.kernel.org/linux-riscv/CAAhSdy1JHLUFwu7RuCaQ+RUWRBks2KsDva7EpRt8--4ZfofSUQ@mail.gmail.com/T/#m13adac285b7f51f4f879a5d6b65753ecb1a7524e
> Cc: Peter Zijlstra <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> ---
> arch/csky/include/asm/spinlock.h | 11 ++++-------
> 1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
> index 69f5aa249c5f..69677167977a 100644
> --- a/arch/csky/include/asm/spinlock.h
> +++ b/arch/csky/include/asm/spinlock.h
> @@ -26,10 +26,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
> : "r"(p), "r"(ticket_next)
> : "cc");
>
> - while (lockval.tickets.next != lockval.tickets.owner)
> - lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
> -
> - smp_mb();
> + smp_cond_load_acquire(&lock->tickets.owner,
> + VAL == lockval.tickets.next);
It's wrong, we should determine lockval before next read.

Fixup:

diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
index fe98ad8ece51..2be627ceb9df 100644
--- a/arch/csky/include/asm/spinlock.h
+++ b/arch/csky/include/asm/spinlock.h
@@ -27,7 +27,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
: "r"(p), "r"(ticket_next)
: "cc");

- smp_cond_load_acquire(&lock->tickets.owner,
+ if (lockval.owner != lockval.tickets.next)
+ smp_cond_load_acquire(&lock->tickets.owner,
VAL == lockval.tickets.next);

> }
>
> static inline int arch_spin_trylock(arch_spinlock_t *lock)
> @@ -55,15 +53,14 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
> } while (!res);
>
> if (!contended)
> - smp_mb();
> + __smp_acquire_fence();
>
> return !contended;
> }
>
> static inline void arch_spin_unlock(arch_spinlock_t *lock)
> {
> - smp_mb();
> - WRITE_ONCE(lock->tickets.owner, lock->tickets.owner + 1);
> + smp_store_release(&lock->tickets.owner, lock->tickets.owner + 1);
> }
>
> static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> --
> 2.17.1
>

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-04-11 19:47:33

by Guo Ren

[permalink] [raw]

Subject: Re: [PATCH v6 3/9] riscv: locks: Introduce ticket-based spinlock implementation

On Mon, Apr 12, 2021 at 12:02 AM Guo Ren <[email protected]> wrote:
>
> On Wed, Mar 31, 2021 at 10:32 PM <[email protected]> wrote:
> >
> > From: Guo Ren <[email protected]>
> >
> > This patch introduces a ticket lock implementation for riscv, along the
> > same lines as the implementation for arch/arm & arch/csky.
> >
> > We still use qspinlock as default.
> >
> > Signed-off-by: Guo Ren <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Anup Patel <[email protected]>
> > Cc: Arnd Bergmann <[email protected]>
> > ---
> > arch/riscv/Kconfig | 7 ++-
> > arch/riscv/include/asm/spinlock.h | 84 +++++++++++++++++++++++++
> > arch/riscv/include/asm/spinlock_types.h | 17 +++++
> > 3 files changed, 107 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 67cc65ba1ea1..34d0276f01d5 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -34,7 +34,7 @@ config RISCV
> > select ARCH_WANT_FRAME_POINTERS
> > select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> > select ARCH_USE_QUEUED_RWLOCKS
> > - select ARCH_USE_QUEUED_SPINLOCKS
> > + select ARCH_USE_QUEUED_SPINLOCKS if !RISCV_TICKET_LOCK
> > select ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> > select CLONE_BACKWARDS
> > select CLINT_TIMER if !MMU
> > @@ -344,6 +344,11 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> > def_bool y
> > depends on NUMA
> >
> > +config RISCV_TICKET_LOCK
> > + bool "Ticket-based spin-locking"
> > + help
> > + Say Y here to use ticket-based spin-locking.
> > +
> > config RISCV_ISA_C
> > bool "Emit compressed instructions when building Linux"
> > default y
> > diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> > index a557de67a425..90b7eaa950cf 100644
> > --- a/arch/riscv/include/asm/spinlock.h
> > +++ b/arch/riscv/include/asm/spinlock.h
> > @@ -7,7 +7,91 @@
> > #ifndef _ASM_RISCV_SPINLOCK_H
> > #define _ASM_RISCV_SPINLOCK_H
> >
> > +#ifdef CONFIG_RISCV_TICKET_LOCK
> > +#ifdef CONFIG_32BIT
> > +#define __ASM_SLLIW "slli\t"
> > +#define __ASM_SRLIW "srli\t"
> > +#else
> > +#define __ASM_SLLIW "slliw\t"
> > +#define __ASM_SRLIW "srliw\t"
> > +#endif
> > +
> > +/*
> > + * Ticket-based spin-locking.
> > + */
> > +static inline void arch_spin_lock(arch_spinlock_t *lock)
> > +{
> > + arch_spinlock_t lockval;
> > + u32 tmp;
> > +
> > + asm volatile (
> > + "1: lr.w %0, %2 \n"
> > + " mv %1, %0 \n"
> > + " addw %0, %0, %3 \n"
> > + " sc.w %0, %0, %2 \n"
> > + " bnez %0, 1b \n"
> > + : "=&r" (tmp), "=&r" (lockval), "+A" (lock->lock)
> > + : "r" (1 << TICKET_NEXT)
> > + : "memory");
> > +
> > + smp_cond_load_acquire(&lock->tickets.owner,
> > + VAL == lockval.tickets.next);
> It's wrong, blew is fixup:
>
> diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
> index fe98ad8ece51..2be627ceb9df 100644
> --- a/arch/csky/include/asm/spinlock.h
> +++ b/arch/csky/include/asm/spinlock.h
> @@ -27,7 +27,8 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
> : "r"(p), "r"(ticket_next)
> : "cc");
>
> - smp_cond_load_acquire(&lock->tickets.owner,
> + if (lockval.owner != lockval.tickets.next)
> + smp_cond_load_acquire(&lock->tickets.owner,
> VAL == lockval.tickets.next);
eh... plus __smp_acquire_fence:

if (lockval.owner != lockval.tickets.next)
smp_cond_load_acquire(&lock->tickets.owner,
VAL == lockval.tickets.next);
else
__smp_acquire_fence();

> > +}
> > +
> > +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> > +{
> > + u32 tmp, contended, res;
> > +
> > + do {
> > + asm volatile (
> > + " lr.w %0, %3 \n"
> > + __ASM_SRLIW "%1, %0, %5 \n"
> > + __ASM_SLLIW "%2, %0, %5 \n"
> > + " or %1, %2, %1 \n"
> > + " li %2, 0 \n"
> > + " sub %1, %1, %0 \n"
> > + " bnez %1, 1f \n"
> > + " addw %0, %0, %4 \n"
> > + " sc.w %2, %0, %3 \n"
> > + "1: \n"
> > + : "=&r" (tmp), "=&r" (contended), "=&r" (res),
> > + "+A" (lock->lock)
> > + : "r" (1 << TICKET_NEXT), "I" (TICKET_NEXT)
> > + : "memory");
> > + } while (res);
> > +
> > + if (!contended)
> > + __atomic_acquire_fence();
> > +
> > + return !contended;
> > +}
> > +
> > +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > +{
> > + smp_store_release(&lock->tickets.owner, lock->tickets.owner + 1);
> > +}
> > +
> > +static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> > +{
> > + return lock.tickets.owner == lock.tickets.next;
> > +}
> > +
> > +static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> > +{
> > + return !arch_spin_value_unlocked(READ_ONCE(*lock));
> > +}
> > +
> > +static inline int arch_spin_is_contended(arch_spinlock_t *lock)
> > +{
> > + struct __raw_tickets tickets = READ_ONCE(lock->tickets);
> > +
> > + return (tickets.next - tickets.owner) > 1;
> > +}
> > +#define arch_spin_is_contended arch_spin_is_contended
> > +#else /* CONFIG_RISCV_TICKET_LOCK */
> > #include <asm/qspinlock.h>
> > +#endif /* CONFIG_RISCV_TICKET_LOCK */
> > +
> > #include <asm/qrwlock.h>
> >
> > #endif /* _ASM_RISCV_SPINLOCK_H */
> > diff --git a/arch/riscv/include/asm/spinlock_types.h b/arch/riscv/include/asm/spinlock_types.h
> > index d033a973f287..afbb19841d0f 100644
> > --- a/arch/riscv/include/asm/spinlock_types.h
> > +++ b/arch/riscv/include/asm/spinlock_types.h
> > @@ -10,7 +10,24 @@
> > # error "please don't include this file directly"
> > #endif
> >
> > +#ifdef CONFIG_RISCV_TICKET_LOCK
> > +#define TICKET_NEXT 16
> > +
> > +typedef struct {
> > + union {
> > + u32 lock;
> > + struct __raw_tickets {
> > + /* little endian */
> > + u16 owner;
> > + u16 next;
> > + } tickets;
> > + };
> > +} arch_spinlock_t;
> > +
> > +#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
> > +#else
> > #include <asm-generic/qspinlock_types.h>
> > +#endif
> > #include <asm-generic/qrwlock_types.h>
> >
> > #endif /* _ASM_RISCV_SPINLOCK_TYPES_H */
> > --
> > 2.17.1
> >
>
>
> --
> Best Regards
> Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/