2021-03-25 08:00:20

by Guo Ren

Subject: [PATCH v3 0/4] riscv: Add qspinlock/qrwlock

From: Guo Ren <[email protected]>

The current riscv port still uses a baby spinlock implementation, which
causes fairness and cache-line bouncing problems. Many people have been
involved and have put effort into improving it:

- The first version of the patch was made in 2019.1:
https://lore.kernel.org/linux-riscv/[email protected]/#r

- The second version was made in 2020.11:
https://lore.kernel.org/linux-riscv/[email protected]/

- A good discussion at Platform HSC.2021-03-08:
https://drive.google.com/drive/folders/1ooqdnIsYx7XKor5O1XTtM6D1CHp4hc0p

Comments are welcome, as are Tested-by, Co-developed-by, or Reviewed-by tags.

Let's kick qspinlock into riscv right now (it's also useful for
architectures which don't have short atomic xchg instructions).

Change V3:
- Fixup short-xchg asm code (slli -> slliw, srli -> srliw)
- Coding-convention fixes following Peter Zijlstra's advice

Change V2:
- Coding convention in cmpxchg.h
- Re-implement short xchg
- Remove char & cmpxchg implementations

V1: (by Michael)

Guo Ren (3):
riscv: cmpxchg.h: Cleanup unused code
riscv: cmpxchg.h: Merge macros
riscv: cmpxchg.h: Implement xchg for short

Michael Clark (1):
riscv: Convert custom spinlock/rwlock to generic qspinlock/qrwlock

arch/riscv/Kconfig | 2 +
arch/riscv/include/asm/Kbuild | 3 +
arch/riscv/include/asm/cmpxchg.h | 211 ++++++------------------
arch/riscv/include/asm/spinlock.h | 126 +-------------
arch/riscv/include/asm/spinlock_types.h | 15 +-
5 files changed, 58 insertions(+), 299 deletions(-)

--
2.17.1


2021-03-25 08:00:52

by Guo Ren

Subject: [PATCH v3 3/4] riscv: cmpxchg.h: Implement xchg for short

From: Guo Ren <[email protected]>

riscv only supports lr.w/lr.d and sc.w/sc.d, i.e. word (double-word)
sized and aligned accesses. There are no lr.h/sc.h instructions, but
qspinlock.c needs xchg on a short-typed variable:

xchg_tail -> xchg_relaxed(&lock->tail, ...

typedef struct qspinlock {
        union {
                atomic_t val;

                /*
                 * By using the whole 2nd least significant byte for the
                 * pending bit, we can allow better optimization of the lock
                 * acquisition for the pending bit holder.
                 */
                struct {
                        u8 locked;
                        u8 pending;
                };
                struct {
                        u16 locked_pending;
                        u16 tail; /* half word */
                };
        };
} arch_spinlock_t;
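
(For context, the qspinlock code that triggers this requirement is
xchg_tail() in kernel/locking/qspinlock.c, roughly the following
simplified sketch; it is quoted here for illustration and is not part
of this patch:)

static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
{
        /*
         * The 16-bit xchg on &lock->tail is what requires a
         * short-sized xchg from the architecture.
         */
        return (u32)xchg_relaxed(&lock->tail,
                                 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
}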

So we add short-type emulation to xchg, implemented with word-length
lr.w/sc.w accesses; it only covers qspinlock's requirement.
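
For clarity, the case-2 emulation added below behaves like the
following C sketch. The helper name and the compiler CAS builtin are
illustrative only, standing in for the patch's lr.w/sc.w retry loop;
little-endian byte order is assumed, as on riscv:

/*
 * Illustrative C equivalent of the 16-bit xchg emulation; the actual
 * patch open-codes the retry loop with lr.w/sc.w in inline asm.
 */
static inline unsigned short xchg16_sketch(volatile unsigned short *p,
                                           unsigned short new)
{
        /* 32-bit aligned word that contains the halfword. */
        volatile unsigned int *w =
                (volatile unsigned int *)((unsigned long)p & ~0x3UL);
        /* Halfword position within the word: 0 or 16 (little-endian). */
        unsigned int shift = ((unsigned long)p & 0x2) * 8;
        unsigned int mask = 0xffffU << shift;
        unsigned int old, tmp;

        do {
                old = *w;
                /* Splice the new halfword into its slot, keep the rest. */
                tmp = (old & ~mask) | ((unsigned int)new << shift);
        } while (!__atomic_compare_exchange_n(w, &old, tmp, 0,
                                              __ATOMIC_RELAXED,
                                              __ATOMIC_RELAXED));

        return (unsigned short)((old & mask) >> shift);
}

The align/!align branches in the asm correspond to shift == 16 and
shift == 0 here; the slliw/srliw pairs implement the masking and the
final extraction of the old halfword.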

Michael has sent another implementation; see the Link below.

Signed-off-by: Guo Ren <[email protected]>
Co-developed-by: Michael Clark <[email protected]>
Tested-by: Guo Ren <[email protected]>
Link: https://lore.kernel.org/linux-riscv/[email protected]/
Cc: Peter Zijlstra <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
---
arch/riscv/include/asm/cmpxchg.h | 36 ++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 50513b95411d..5ca41152cf4b 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -22,7 +22,43 @@
         __typeof__(ptr) __ptr = (ptr); \
         __typeof__(new) __new = (new); \
         __typeof__(*(ptr)) __ret; \
+        register unsigned long __rc, tmp, align, addr; \
         switch (size) { \
+        case 2: \
+                align = ((unsigned long) __ptr & 0x3); \
+                addr = ((unsigned long) __ptr & ~0x3); \
+                if (align) { \
+                        __asm__ __volatile__ ( \
+                                "0: lr.w %0, (%4) \n" \
+                                " mv %1, %0 \n" \
+                                " slliw %1, %1, 16 \n" \
+                                " srliw %1, %1, 16 \n" \
+                                " mv %2, %3 \n" \
+                                " slliw %2, %2, 16 \n" \
+                                " or %1, %2, %1 \n" \
+                                " sc.w %2, %1, (%4) \n" \
+                                " bnez %2, 0b \n" \
+                                " srliw %0, %0, 16 \n" \
+                                : "=&r" (__ret), "=&r" (tmp), "=&r" (__rc) \
+                                : "r" (__new), "r" (addr) \
+                                : "memory"); \
+                } else { \
+                        __asm__ __volatile__ ( \
+                                "0: lr.w %0, (%4) \n" \
+                                " mv %1, %0 \n" \
+                                " srliw %1, %1, 16 \n" \
+                                " slliw %1, %1, 16 \n" \
+                                " mv %2, %3 \n" \
+                                " or %1, %2, %1 \n" \
+                                " sc.w %2, %1, (%4) \n" \
+                                " bnez %2, 0b \n" \
+                                " slliw %0, %0, 16 \n" \
+                                " srliw %0, %0, 16 \n" \
+                                : "=&r" (__ret), "=&r" (tmp), "=&r" (__rc) \
+                                : "r" (__new), "r" (addr) \
+                                : "memory"); \
+                } \
+                break; \
         case 4: \
                 __asm__ __volatile__ ( \
                         " amoswap.w %0, %2, %1\n" \
--
2.17.1