From: Guo Ren <[email protected]>
patch[1-8]: Native qspinlock
patch[9-14]: Paravirt qspinlock
This series is based on:
- v6.7-rc7
- Rework & improve riscv cmpxchg.h and atomic.h
https://lore.kernel.org/linux-riscv/[email protected]/
You can try it directly:
https://github.com/guoren83/linux/tree/qspinlock_v12
Native qspinlock
================
This time we've validated qspinlock on th1520 [1] & sg2042 [2], with
stability and performance improvements. All T-HEAD processors have a
stronger LR/SC forward progress guarantee than the ISA requires, which
satisfies the xchg_tail requirement of native_qspinlock. qspinlock has
now been running with us for more than a year, and we have enough
confidence to enable it for all T-HEAD processors. However, we found a
livelock in the qspinlock lock torture test caused by the CPU store
merge buffer delay mechanism, which turned the queued spinlock into a
dead ring and triggered RCU warnings. We introduce a custom WRITE_ONCE
to work around this; the problem will be fixed in the next generation
of hardware.
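As background for the xchg_tail requirement, here is a minimal userspace sketch of how the generic qspinlock packs its 32-bit word when NR_CPUS < 16K: a locked byte, a pending byte, and a tail that encodes the last queued CPU (stored as cpu + 1, so 0 means "no tail") plus its per-CPU MCS node index. The Q_* macros and helper names are hypothetical mirrors of the layout in kernel/locking/qspinlock.c, not code from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the qspinlock word when NR_CPUS < 16K:
 * bits 0-7 locked, 8-15 pending, 16-17 tail index, 18-31 tail CPU. */
#define Q_LOCKED_OFFSET   0
#define Q_PENDING_OFFSET  8
#define Q_TAIL_IDX_OFFSET 16
#define Q_TAIL_IDX_BITS   2
#define Q_TAIL_CPU_OFFSET (Q_TAIL_IDX_OFFSET + Q_TAIL_IDX_BITS)
#define Q_TAIL_IDX_MASK   (((1u << Q_TAIL_IDX_BITS) - 1) << Q_TAIL_IDX_OFFSET)

/* Build the tail for "cpu is queued using its idx-th MCS node".
 * The CPU is stored as cpu + 1 so that an all-zero tail means empty. */
static inline uint32_t encode_tail(uint32_t cpu, uint32_t idx)
{
	assert(idx < (1u << Q_TAIL_IDX_BITS));
	return ((cpu + 1) << Q_TAIL_CPU_OFFSET) | (idx << Q_TAIL_IDX_OFFSET);
}

/* Recover the queued CPU number from a full lock word. */
static inline uint32_t decode_tail_cpu(uint32_t val)
{
	return (val >> Q_TAIL_CPU_OFFSET) - 1;
}

/* Recover the MCS node index from a full lock word. */
static inline uint32_t decode_tail_idx(uint32_t val)
{
	return (val & Q_TAIL_IDX_MASK) >> Q_TAIL_IDX_OFFSET;
}
```

For example, a lock that is held (locked byte = 1) while CPU 5 is queued on node index 1 reads 0x00190001. xchg_tail only swaps the upper 16 bits of this word, which is where the LR/SC forward progress guarantee comes in.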
We've tested the patch on SOPHGO sg2042 & th1520, and it passed stress
tests on Fedora, Ubuntu, OpenEuler, ... Here is the performance
comparison between qspinlock and ticket_lock on sg2042 (64 cores):
sysbench test=threads threads=32 yields=100 lock=8 (+13.8%):
queued_spinlock 0.5109/0.00
ticket_spinlock 0.5814/0.00
perf futex/hash (+6.7%):
queued_spinlock 1444393 operations/sec (+- 0.09%)
ticket_spinlock 1353215 operations/sec (+- 0.15%)
perf futex/wake-parallel (+8.6%):
queued_spinlock (waking 1/64 threads) in 0.0253 ms (+-2.90%)
ticket_spinlock (waking 1/64 threads) in 0.0275 ms (+-3.12%)
perf futex/requeue (+4.2%):
queued_spinlock Requeued 64 of 64 threads in 0.0785 ms (+-0.55%)
ticket_spinlock Requeued 64 of 64 threads in 0.0818 ms (+-4.12%)
System Benchmarks (+6.4%):
queued_spinlock:
System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0  628613745.4  53865.8
Double-Precision Whetstone                    55.0     182422.8  33167.8
Execl Throughput                              43.0      13116.6   3050.4
File Copy 1024 bufsize 2000 maxblocks       3960.0    7762306.2  19601.8
File Copy 256 bufsize 500 maxblocks         1655.0    3417556.8  20649.9
File Copy 4096 bufsize 8000 maxblocks       5800.0    7427995.7  12806.9
Pipe Throughput                            12440.0   23058600.5  18535.9
Pipe-based Context Switching                4000.0    2835617.7   7089.0
Process Creation                             126.0      12537.3    995.0
Shell Scripts (1 concurrent)                  42.4      57057.4  13456.9
Shell Scripts (8 concurrent)                   6.0       7367.1  12278.5
System Call Overhead                       15000.0   33308301.3  22205.5
                                                                ========
System Benchmarks Index Score                                    12426.1
ticket_spinlock:
System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0  626541701.9  53688.2
Double-Precision Whetstone                    55.0     181921.0  33076.5
Execl Throughput                              43.0      12625.1   2936.1
File Copy 1024 bufsize 2000 maxblocks       3960.0    6553792.9  16550.0
File Copy 256 bufsize 500 maxblocks         1655.0    3189231.6  19270.3
File Copy 4096 bufsize 8000 maxblocks       5800.0    7221277.0  12450.5
Pipe Throughput                            12440.0   20594018.7  16554.7
Pipe-based Context Switching                4000.0    2571117.7   6427.8
Process Creation                             126.0      10798.4    857.0
Shell Scripts (1 concurrent)                  42.4      57227.5  13497.1
Shell Scripts (8 concurrent)                   6.0       7329.2  12215.3
System Call Overhead                       15000.0   30766778.4  20511.2
                                                                ========
System Benchmarks Index Score                                    11670.7
On the 64-core SOPHGO SG2042 platform, qspinlock performs significantly
better than ticket_lock.
Paravirt qspinlock
==================
We implement kvm_kick_cpu/kvm_wait_cpu, add tracepoints to observe their
behavior, and introduce a new SBI extension, SBI_EXT_PVLOCK.
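The kick/wait pair follows the usual paravirt qspinlock contract: a waiter halts its vCPU only while the watched lock byte still holds the expected value, and the lock holder kicks the waiter's vCPU after changing that byte. The sketch below models only that contract with pthreads, under the assumption that the RISC-V backend keeps the generic pv_wait()/pv_kick() semantics. All names (struct vcpu_model, model_pv_wait(), model_pv_kick()) are hypothetical; the real guest traps into KVM through SBI_EXT_PVLOCK instead of using a condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one vCPU's halt state. */
struct vcpu_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool kicked;	/* a kick arrived and has not been consumed yet */
};

#define VCPU_MODEL_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Model of pv_wait(): "halt" only while *ptr still equals val and no
 * kick is pending, so a kick that raced ahead is never lost. Returning
 * early is allowed; callers always re-check the lock word. */
static void model_pv_wait(struct vcpu_model *v, _Atomic uint8_t *ptr,
			  uint8_t val)
{
	pthread_mutex_lock(&v->lock);
	while (atomic_load(ptr) == val && !v->kicked)
		pthread_cond_wait(&v->cond, &v->lock);
	v->kicked = false;	/* consume the kick, if any */
	pthread_mutex_unlock(&v->lock);
}

/* Model of pv_kick(): mark the vCPU kicked and wake it if halted. */
static void model_pv_kick(struct vcpu_model *v)
{
	pthread_mutex_lock(&v->lock);
	v->kicked = true;
	pthread_cond_signal(&v->cond);
	pthread_mutex_unlock(&v->lock);
}
```

The pending-kick flag is the design point worth noting: because the kicker records the kick under the same mutex the waiter checks, a kick issued before the waiter halts still releases it.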
Changelog:
V12:
- Remove force thead qspinlock with errata
- Separate Zicbop patch from this series
- Remove cpus >= 16 patch
- Cleanup rebase and move it on v6.7-rc7
- Reorder the code structure following review advice on the last version.
V11:
https://lore.kernel.org/linux-riscv/[email protected]/
- Based on Leonardo Bras's cmpxchg_small patches v5.
- Based on Guo Ren's Optimize arch_spin_value_unlocked patch v3.
- Remove abusing alternative framework and use jump_label instead.
- Introduce prefetch.w to improve T-HEAD processors' LR/SC forward
progress guarantee.
- Optimize qspinlock xchg_tail when NR_CPUS >= 16K.
V10:
https://lore.kernel.org/linux-riscv/[email protected]/
- Use the alternative framework instead of static_key_branch in
asm/spinlock.h.
- Fixup store merge buffer problem, which causes qspinlock lock
torture test livelock.
- Add paravirt qspinlock support, including KVM backend
- Add Compact NUMA-aware qspinlock support
V9:
https://lore.kernel.org/linux-riscv/[email protected]/
- Clean up generic ticket-lock code (using smp_mb__after_spinlock as
RCsc)
- Add qspinlock and combo-lock for riscv
- Add qspinlock to openrisc
- Use generic header in csky
- Optimize cmpxchg & atomic code
V8:
https://lore.kernel.org/linux-riscv/[email protected]/
- Coding convention ticket fixup
- Move combo spinlock into riscv and simplify asm-generic/spinlock.h
- Fixup xchg16 with wrong return value
- Add csky qspinlock
- Add combo & qspinlock & ticket-lock comparison
- Clean up unnecessary riscv acquire and release definitions
- Enable ARCH_INLINE_READ*/WRITE*/SPIN* for riscv & csky
V7:
https://lore.kernel.org/linux-riscv/[email protected]/
- Add combo spinlock (ticket & queued) support
- Rename ticket_spinlock.h
- Remove unnecessary atomic_read in ticket_spin_value_unlocked
V6:
https://lore.kernel.org/linux-riscv/[email protected]/
- Fix Clang compile problem (Reported-by: kernel test robot)
- Cleanup asm-generic/spinlock.h
- Remove changelog from the main commit message, suggested by
Conor Dooley
- Remove "default y if NUMA" in Kconfig
V5:
https://lore.kernel.org/linux-riscv/[email protected]/
- Update comment with RISC-V forward guarantee feature.
- Back to V3 direction and optimize asm code.
V4:
https://lore.kernel.org/linux-riscv/[email protected]/
- Remove custom sub-word xchg implementation
- Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32 in locking/qspinlock
V3:
https://lore.kernel.org/linux-riscv/[email protected]/
- Coding convention fixes per Peter Zijlstra's advice
V2:
https://lore.kernel.org/linux-riscv/[email protected]/
- Coding convention in cmpxchg.h
- Re-implement short xchg
- Remove char & cmpxchg implementations
V1:
https://lore.kernel.org/linux-riscv/[email protected]/
- Using cmpxchg loop to implement sub-word atomic
Guo Ren (14):
asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
asm-generic: ticket-lock: Add separate ticket-lock.h
riscv: errata: Move errata vendor func-id into vendorid_list.h
riscv: qspinlock: errata: Add ERRATA_THEAD_WRITE_ONCE fixup
riscv: qspinlock: Add basic queued_spinlock support
riscv: qspinlock: Introduce combo spinlock
riscv: qspinlock: Add virt_spin_lock() support for VM guest
riscv: qspinlock: Force virt_spin_lock for KVM guests
RISC-V: paravirt: Add pvqspinlock KVM backend
RISC-V: paravirt: Add pvqspinlock frontend skeleton
RISC-V: paravirt: pvqspinlock: Add SBI implementation
RISC-V: paravirt: pvqspinlock: Add nopvspin kernel parameter
RISC-V: paravirt: pvqspinlock: Add kconfig entry
RISC-V: paravirt: pvqspinlock: Add trace point for pv_kick/wait
.../admin-guide/kernel-parameters.txt | 8 +-
arch/riscv/Kconfig | 35 ++++++
arch/riscv/Kconfig.errata | 19 ++++
arch/riscv/errata/thead/errata.c | 20 ++++
arch/riscv/include/asm/Kbuild | 3 +-
arch/riscv/include/asm/errata_list.h | 18 ---
arch/riscv/include/asm/kvm_vcpu_sbi.h | 1 +
arch/riscv/include/asm/qspinlock.h | 35 ++++++
arch/riscv/include/asm/qspinlock_paravirt.h | 29 +++++
arch/riscv/include/asm/rwonce.h | 31 ++++++
arch/riscv/include/asm/sbi.h | 14 +++
arch/riscv/include/asm/spinlock.h | 88 +++++++++++++++
arch/riscv/include/asm/vendorid_list.h | 19 ++++
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/qspinlock_paravirt.c | 83 ++++++++++++++
arch/riscv/kernel/sbi.c | 2 +-
arch/riscv/kernel/setup.c | 68 ++++++++++++
.../kernel/trace_events_filter_paravirt.h | 60 ++++++++++
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/vcpu_sbi.c | 4 +
arch/riscv/kvm/vcpu_sbi_pvlock.c | 57 ++++++++++
include/asm-generic/qspinlock.h | 2 +
include/asm-generic/rwonce.h | 2 +
include/asm-generic/spinlock.h | 87 +--------------
include/asm-generic/spinlock_types.h | 12 +-
include/asm-generic/ticket_spinlock.h | 105 ++++++++++++++++++
27 files changed, 688 insertions(+), 117 deletions(-)
create mode 100644 arch/riscv/include/asm/qspinlock.h
create mode 100644 arch/riscv/include/asm/qspinlock_paravirt.h
create mode 100644 arch/riscv/include/asm/rwonce.h
create mode 100644 arch/riscv/include/asm/spinlock.h
create mode 100644 arch/riscv/kernel/qspinlock_paravirt.c
create mode 100644 arch/riscv/kernel/trace_events_filter_paravirt.h
create mode 100644 arch/riscv/kvm/vcpu_sbi_pvlock.c
create mode 100644 include/asm-generic/ticket_spinlock.h
--
2.40.1
From: Guo Ren <[email protected]>
The arch_spinlock_t of qspinlock already contains the atomic_t val, which
satisfies the ticket-lock requirement. Thus, unify the arch_spinlock_t
into qspinlock_types.h. This is preparation for the upcoming combo
spinlock.
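The ticket lock only needs one 32-bit value, which is why qspinlock's arch_spinlock_t (with its atomic_t val) is enough. The sketch below shows how the ticket code reads that value: the upper 16 bits are the next ticket to hand out and the lower 16 bits are the ticket being served. The helper names are hypothetical userspace versions of ticket_spin_value_unlocked() and arch_spin_is_contended(), operating on a plain u32 instead of arch_spinlock_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper half: next ticket to hand out. Lower half: ticket being served. */
static inline uint16_t ticket_next(uint32_t val)  { return val >> 16; }
static inline uint16_t ticket_owner(uint32_t val) { return (uint16_t)val; }

/* Unlocked when every ticket handed out has also been served. */
static inline bool ticket_val_unlocked(uint32_t val)
{
	return ticket_next(val) == ticket_owner(val);
}

/* Contended when more than one ticket is outstanding, i.e. someone is
 * queued behind the holder. The int16_t cast keeps the difference
 * correct across the 16-bit wrap-around, like the (s16) cast in
 * arch_spin_is_contended(). */
static inline bool ticket_val_contended(uint32_t val)
{
	return (int16_t)(ticket_next(val) - ticket_owner(val)) > 1;
}
```

For instance, 0x00020000 means tickets 0 and 1 are handed out while ticket 0 is served, so the lock is held and contended; 0x0001ffff is the same situation after the counters wrapped.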
Reviewed-by: Leonardo Bras <[email protected]>
Suggested-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
include/asm-generic/spinlock.h | 14 +++++++-------
include/asm-generic/spinlock_types.h | 12 ++----------
2 files changed, 9 insertions(+), 17 deletions(-)
diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 90803a826ba0..4773334ee638 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -32,7 +32,7 @@
static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
{
- u32 val = atomic_fetch_add(1<<16, lock);
+ u32 val = atomic_fetch_add(1<<16, &lock->val);
u16 ticket = val >> 16;
if (ticket == (u16)val)
@@ -46,31 +46,31 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
* have no outstanding writes due to the atomic_fetch_add() the extra
* orderings are free.
*/
- atomic_cond_read_acquire(lock, ticket == (u16)VAL);
+ atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
smp_mb();
}
static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
{
- u32 old = atomic_read(lock);
+ u32 old = atomic_read(&lock->val);
if ((old >> 16) != (old & 0xffff))
return false;
- return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
+ return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
}
static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
- u32 val = atomic_read(lock);
+ u32 val = atomic_read(&lock->val);
smp_store_release(ptr, (u16)val + 1);
}
static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
{
- u32 val = lock.counter;
+ u32 val = lock.val.counter;
return ((val >> 16) == (val & 0xffff));
}
@@ -84,7 +84,7 @@ static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
{
- u32 val = atomic_read(lock);
+ u32 val = atomic_read(&lock->val);
return (s16)((val >> 16) - (val & 0xffff)) > 1;
}
diff --git a/include/asm-generic/spinlock_types.h b/include/asm-generic/spinlock_types.h
index 8962bb730945..f534aa5de394 100644
--- a/include/asm-generic/spinlock_types.h
+++ b/include/asm-generic/spinlock_types.h
@@ -3,15 +3,7 @@
#ifndef __ASM_GENERIC_SPINLOCK_TYPES_H
#define __ASM_GENERIC_SPINLOCK_TYPES_H
-#include <linux/types.h>
-typedef atomic_t arch_spinlock_t;
-
-/*
- * qrwlock_types depends on arch_spinlock_t, so we must typedef that before the
- * include.
- */
-#include <asm/qrwlock_types.h>
-
-#define __ARCH_SPIN_LOCK_UNLOCKED ATOMIC_INIT(0)
+#include <asm-generic/qspinlock_types.h>
+#include <asm-generic/qrwlock_types.h>
#endif /* __ASM_GENERIC_SPINLOCK_TYPES_H */
--
2.40.1
From: Guo Ren <[email protected]>
Add a separate ticket_spinlock.h so that multiple spinlock versions can
be included and one selected at compile time or runtime.
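The point of the split is that the implementation carries its own prefix (ticket_spin_*) and a small block of macros maps the arch_spin_* names onto it, so another header could map the same names onto queued_spin_* instead. Below is a minimal userspace sketch of that pattern using C11 atomics. demo_spinlock_t is a hypothetical stand-in for arch_spinlock_t, and the unlock path uses a CAS loop on the whole word rather than the kernel's 16-bit smp_store_release(), to stay within portable C11.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for arch_spinlock_t. */
typedef struct { _Atomic uint32_t val; } demo_spinlock_t;

static inline void ticket_spin_lock(demo_spinlock_t *lock)
{
	/* Take a ticket: bump the "next" half and keep the old value. */
	uint32_t val = atomic_fetch_add(&lock->val, 1u << 16);
	uint16_t ticket = val >> 16;

	/* Wait until the "owner" half reaches our ticket. */
	while ((uint16_t)atomic_load_explicit(&lock->val,
					      memory_order_acquire) != ticket)
		; /* busy-wait; the kernel uses atomic_cond_read_acquire() */
}

static inline bool ticket_spin_trylock(demo_spinlock_t *lock)
{
	uint32_t old = atomic_load(&lock->val);

	if ((old >> 16) != (old & 0xffff))
		return false;	/* held by someone */
	return atomic_compare_exchange_strong(&lock->val, &old,
					      old + (1u << 16));
}

static inline void ticket_spin_unlock(demo_spinlock_t *lock)
{
	/* Serve the next ticket: increment only the lower 16 bits. */
	uint32_t old = atomic_load_explicit(&lock->val, memory_order_relaxed);
	uint32_t new;

	do {
		new = (old & 0xffff0000u) | (uint16_t)(old + 1);
	} while (!atomic_compare_exchange_weak_explicit(&lock->val, &old, new,
			memory_order_release, memory_order_relaxed));
}

/* Same remapping idea as ticket_spinlock.h: callers use arch_* names,
 * and the header decides which implementation backs them. */
#define arch_spin_lock(l)	ticket_spin_lock(l)
#define arch_spin_trylock(l)	ticket_spin_trylock(l)
#define arch_spin_unlock(l)	ticket_spin_unlock(l)
```

Because the remapping is only a set of macros, a combo header can leave the prefixed functions in place and route arch_spin_*() to either implementation.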
Reviewed-by: Leonardo Bras <[email protected]>
Suggested-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
include/asm-generic/spinlock.h | 87 +---------------------
include/asm-generic/ticket_spinlock.h | 103 ++++++++++++++++++++++++++
2 files changed, 104 insertions(+), 86 deletions(-)
create mode 100644 include/asm-generic/ticket_spinlock.h
diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 4773334ee638..970590baf61b 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -1,94 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * 'Generic' ticket-lock implementation.
- *
- * It relies on atomic_fetch_add() having well defined forward progress
- * guarantees under contention. If your architecture cannot provide this, stick
- * to a test-and-set lock.
- *
- * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
- * sub-word of the value. This is generally true for anything LL/SC although
- * you'd be hard pressed to find anything useful in architecture specifications
- * about this. If your architecture cannot do this you might be better off with
- * a test-and-set.
- *
- * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
- * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
- * a full fence after the spin to upgrade the otherwise-RCpc
- * atomic_cond_read_acquire().
- *
- * The implementation uses smp_cond_load_acquire() to spin, so if the
- * architecture has WFE like instructions to sleep instead of poll for word
- * modifications be sure to implement that (see ARM64 for example).
- *
- */
-
#ifndef __ASM_GENERIC_SPINLOCK_H
#define __ASM_GENERIC_SPINLOCK_H
-#include <linux/atomic.h>
-#include <asm-generic/spinlock_types.h>
-
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
- u32 val = atomic_fetch_add(1<<16, &lock->val);
- u16 ticket = val >> 16;
-
- if (ticket == (u16)val)
- return;
-
- /*
- * atomic_cond_read_acquire() is RCpc, but rather than defining a
- * custom cond_read_rcsc() here we just emit a full fence. We only
- * need the prior reads before subsequent writes ordering from
- * smb_mb(), but as atomic_cond_read_acquire() just emits reads and we
- * have no outstanding writes due to the atomic_fetch_add() the extra
- * orderings are free.
- */
- atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
- smp_mb();
-}
-
-static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
-{
- u32 old = atomic_read(&lock->val);
-
- if ((old >> 16) != (old & 0xffff))
- return false;
-
- return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
- u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
- u32 val = atomic_read(&lock->val);
-
- smp_store_release(ptr, (u16)val + 1);
-}
-
-static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
- u32 val = lock.val.counter;
-
- return ((val >> 16) == (val & 0xffff));
-}
-
-static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
- arch_spinlock_t val = READ_ONCE(*lock);
-
- return !arch_spin_value_unlocked(val);
-}
-
-static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
- u32 val = atomic_read(&lock->val);
-
- return (s16)((val >> 16) - (val & 0xffff)) > 1;
-}
-
+#include <asm-generic/ticket_spinlock.h>
#include <asm/qrwlock.h>
#endif /* __ASM_GENERIC_SPINLOCK_H */
diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
new file mode 100644
index 000000000000..cfcff22b37b3
--- /dev/null
+++ b/include/asm-generic/ticket_spinlock.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * 'Generic' ticket-lock implementation.
+ *
+ * It relies on atomic_fetch_add() having well defined forward progress
+ * guarantees under contention. If your architecture cannot provide this, stick
+ * to a test-and-set lock.
+ *
+ * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
+ * sub-word of the value. This is generally true for anything LL/SC although
+ * you'd be hard pressed to find anything useful in architecture specifications
+ * about this. If your architecture cannot do this you might be better off with
+ * a test-and-set.
+ *
+ * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
+ * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
+ * a full fence after the spin to upgrade the otherwise-RCpc
+ * atomic_cond_read_acquire().
+ *
+ * The implementation uses smp_cond_load_acquire() to spin, so if the
+ * architecture has WFE like instructions to sleep instead of poll for word
+ * modifications be sure to implement that (see ARM64 for example).
+ *
+ */
+
+#ifndef __ASM_GENERIC_TICKET_SPINLOCK_H
+#define __ASM_GENERIC_TICKET_SPINLOCK_H
+
+#include <linux/atomic.h>
+#include <asm-generic/spinlock_types.h>
+
+static __always_inline void ticket_spin_lock(arch_spinlock_t *lock)
+{
+ u32 val = atomic_fetch_add(1<<16, &lock->val);
+ u16 ticket = val >> 16;
+
+ if (ticket == (u16)val)
+ return;
+
+ /*
+ * atomic_cond_read_acquire() is RCpc, but rather than defining a
+ * custom cond_read_rcsc() here we just emit a full fence. We only
+ * need the prior reads before subsequent writes ordering from
+ * smb_mb(), but as atomic_cond_read_acquire() just emits reads and we
+ * have no outstanding writes due to the atomic_fetch_add() the extra
+ * orderings are free.
+ */
+ atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
+ smp_mb();
+}
+
+static __always_inline bool ticket_spin_trylock(arch_spinlock_t *lock)
+{
+ u32 old = atomic_read(&lock->val);
+
+ if ((old >> 16) != (old & 0xffff))
+ return false;
+
+ return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
+}
+
+static __always_inline void ticket_spin_unlock(arch_spinlock_t *lock)
+{
+ u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
+ u32 val = atomic_read(&lock->val);
+
+ smp_store_release(ptr, (u16)val + 1);
+}
+
+static __always_inline int ticket_spin_value_unlocked(arch_spinlock_t lock)
+{
+ u32 val = lock.val.counter;
+
+ return ((val >> 16) == (val & 0xffff));
+}
+
+static __always_inline int ticket_spin_is_locked(arch_spinlock_t *lock)
+{
+ arch_spinlock_t val = READ_ONCE(*lock);
+
+ return !ticket_spin_value_unlocked(val);
+}
+
+static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
+{
+ u32 val = atomic_read(&lock->val);
+
+ return (s16)((val >> 16) - (val & 0xffff)) > 1;
+}
+
+/*
+ * Remapping spinlock architecture specific functions to the corresponding
+ * ticket spinlock functions.
+ */
+#define arch_spin_is_locked(l) ticket_spin_is_locked(l)
+#define arch_spin_is_contended(l) ticket_spin_is_contended(l)
+#define arch_spin_value_unlocked(l) ticket_spin_value_unlocked(l)
+#define arch_spin_lock(l) ticket_spin_lock(l)
+#define arch_spin_trylock(l) ticket_spin_trylock(l)
+#define arch_spin_unlock(l) ticket_spin_unlock(l)
+
+#endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
--
2.40.1
From: Guo Ren <[email protected]>
Move the errata vendor func-id definitions from errata_list.h into
vendorid_list.h. Unifying these definitions also prepares for the
following rwonce errata implementation.
Suggested-by: Leonardo Bras <[email protected]>
Link: https://lore.kernel.org/linux-riscv/[email protected]/
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/errata_list.h | 18 ------------------
arch/riscv/include/asm/vendorid_list.h | 18 ++++++++++++++++++
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 83ed25e43553..31bbd9840e97 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -11,24 +11,6 @@
#include <asm/hwcap.h>
#include <asm/vendorid_list.h>
-#ifdef CONFIG_ERRATA_ANDES
-#define ERRATA_ANDESTECH_NO_IOCP 0
-#define ERRATA_ANDESTECH_NUMBER 1
-#endif
-
-#ifdef CONFIG_ERRATA_SIFIVE
-#define ERRATA_SIFIVE_CIP_453 0
-#define ERRATA_SIFIVE_CIP_1200 1
-#define ERRATA_SIFIVE_NUMBER 2
-#endif
-
-#ifdef CONFIG_ERRATA_THEAD
-#define ERRATA_THEAD_PBMT 0
-#define ERRATA_THEAD_CMO 1
-#define ERRATA_THEAD_PMU 2
-#define ERRATA_THEAD_NUMBER 3
-#endif
-
#ifdef __ASSEMBLY__
#define ALT_INSN_FAULT(x) \
diff --git a/arch/riscv/include/asm/vendorid_list.h b/arch/riscv/include/asm/vendorid_list.h
index e55407ace0c3..c503373193d2 100644
--- a/arch/riscv/include/asm/vendorid_list.h
+++ b/arch/riscv/include/asm/vendorid_list.h
@@ -9,4 +9,22 @@
#define SIFIVE_VENDOR_ID 0x489
#define THEAD_VENDOR_ID 0x5b7
+#ifdef CONFIG_ERRATA_ANDES
+#define ERRATA_ANDESTECH_NO_IOCP 0
+#define ERRATA_ANDESTECH_NUMBER 1
+#endif
+
+#ifdef CONFIG_ERRATA_SIFIVE
+#define ERRATA_SIFIVE_CIP_453 0
+#define ERRATA_SIFIVE_CIP_1200 1
+#define ERRATA_SIFIVE_NUMBER 2
+#endif
+
+#ifdef CONFIG_ERRATA_THEAD
+#define ERRATA_THEAD_PBMT 0
+#define ERRATA_THEAD_CMO 1
+#define ERRATA_THEAD_PMU 2
+#define ERRATA_THEAD_NUMBER 3
+#endif
+
#endif
--
2.40.1
From: Guo Ren <[email protected]>
Early versions of T-Head C9xx cores have a store merge buffer delay
problem. The store merge buffer improves store queue performance by
merging multiple store requests, but when no further store requests
follow, a prior single store request can wait in the store queue for a
long time. That causes significant problems for communication between
cores. This problem was found on sg2042 & th1520 platforms with the
qspinlock lock torture test.
Appending a fence w,o flushes the store merge buffer immediately and
lets other cores see the written value.
This patch applies the WRITE_ONCE errata to handle the non-standard
behavior by appending a fence w,o instruction to WRITE_ONCE().
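The override works because arch/riscv/include/asm/rwonce.h defines __WRITE_ONCE before including asm-generic/rwonce.h, and the generic header now only defines its plain store under #ifndef __WRITE_ONCE. Here is a minimal userspace sketch of that include-order pattern. demo_write_once_flush() and demo_flush_count are hypothetical: they merely count where the errata would emit "fence w, o", since the real code patches that instruction in via ALTERNATIVE() on affected cores only.

```c
#include <assert.h>

/* Hypothetical stand-in for the errata fence: count the flushes. */
static int demo_flush_count;
#define demo_write_once_flush() (demo_flush_count++)

/* "Arch" header: provide its own __WRITE_ONCE first. */
#define __WRITE_ONCE(x, val)					\
do {								\
	*(volatile __typeof__(x) *)&(x) = (val);		\
	demo_write_once_flush();				\
} while (0)

/* "Generic" header: only defines the plain store if nobody else did,
 * so this block is skipped here. */
#ifndef __WRITE_ONCE
#define __WRITE_ONCE(x, val)					\
do {								\
	*(volatile __typeof__(x) *)&(x) = (val);		\
} while (0)
#endif

/* WRITE_ONCE() itself stays generic and picks up whichever one won. */
#define WRITE_ONCE(x, val) __WRITE_ONCE(x, val)
```

Keeping WRITE_ONCE() generic means callers such as the qspinlock code need no changes; only the arch header decides whether a flush follows each store.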
Reviewed-by: Leonardo Bras <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/Kconfig.errata | 19 ++++++++++++++++
arch/riscv/errata/thead/errata.c | 20 +++++++++++++++++
arch/riscv/include/asm/rwonce.h | 31 ++++++++++++++++++++++++++
arch/riscv/include/asm/vendorid_list.h | 3 ++-
include/asm-generic/rwonce.h | 2 ++
5 files changed, 74 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/rwonce.h
diff --git a/arch/riscv/Kconfig.errata b/arch/riscv/Kconfig.errata
index e2c731cfed8c..2824ff165741 100644
--- a/arch/riscv/Kconfig.errata
+++ b/arch/riscv/Kconfig.errata
@@ -99,4 +99,23 @@ config ERRATA_THEAD_PMU
If you don't know what to do here, say "Y".
+config ERRATA_THEAD_WRITE_ONCE
+ bool "Apply T-Head WRITE_ONCE errata"
+ depends on ERRATA_THEAD
+ default y
+ help
+ Early versions of the T-Head C9xx cores in sg2042 have a store merge
+ buffer delay problem. The store merge buffer improves store queue
+ performance by merging multiple store requests, but when no further
+ store requests follow, a prior single store can wait in the store
+ queue for a long time. That causes significant problems for
+ communication between cores. Appending a fence w,o instruction
+ immediately flushes the store merge buffer and lets other cores see
+ the written value.
+
+ This errata handles the non-standard behavior by appending a
+ fence w,o instruction to WRITE_ONCE().
+
+ If you don't know what to do here, say "Y".
+
endmenu # "CPU errata selection"
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index 0554ed4bf087..f6c1da819670 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -69,6 +69,23 @@ static bool errata_probe_pmu(unsigned int stage,
return true;
}
+static bool errata_probe_write_once(unsigned int stage,
+ unsigned long arch_id, unsigned long impid)
+{
+ if (!IS_ENABLED(CONFIG_ERRATA_THEAD_WRITE_ONCE))
+ return false;
+
+ /* target-c9xx cores report arch_id and impid as 0 */
+ if (arch_id != 0 || impid != 0)
+ return false;
+
+ if (stage == RISCV_ALTERNATIVES_BOOT ||
+ stage == RISCV_ALTERNATIVES_MODULE)
+ return true;
+
+ return false;
+}
+
static u32 thead_errata_probe(unsigned int stage,
unsigned long archid, unsigned long impid)
{
@@ -83,6 +100,9 @@ static u32 thead_errata_probe(unsigned int stage,
if (errata_probe_pmu(stage, archid, impid))
cpu_req_errata |= BIT(ERRATA_THEAD_PMU);
+ if (errata_probe_write_once(stage, archid, impid))
+ cpu_req_errata |= BIT(ERRATA_THEAD_WRITE_ONCE);
+
return cpu_req_errata;
}
diff --git a/arch/riscv/include/asm/rwonce.h b/arch/riscv/include/asm/rwonce.h
new file mode 100644
index 000000000000..4c407c482ed0
--- /dev/null
+++ b/arch/riscv/include/asm/rwonce.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_RWONCE_H
+#define __ASM_RWONCE_H
+
+#include <linux/compiler_types.h>
+#include <asm/alternative-macros.h>
+#include <asm/vendorid_list.h>
+
+#if defined(CONFIG_ERRATA_THEAD_WRITE_ONCE) && !defined(NO_ALTERNATIVE)
+#define write_once_flush() \
+do { \
+ asm volatile(ALTERNATIVE( \
+ __nops(1), \
+ "fence w, o\n\t", \
+ THEAD_VENDOR_ID, \
+ ERRATA_THEAD_WRITE_ONCE, \
+ CONFIG_ERRATA_THEAD_WRITE_ONCE) \
+ : : : "memory"); \
+} while (0)
+
+#define __WRITE_ONCE(x, val) \
+do { \
+ *(volatile typeof(x) *)&(x) = (val); \
+ write_once_flush(); \
+} while (0)
+#endif
+
+#include <asm-generic/rwonce.h>
+
+#endif /* __ASM_RWONCE_H */
diff --git a/arch/riscv/include/asm/vendorid_list.h b/arch/riscv/include/asm/vendorid_list.h
index c503373193d2..5df1862bf0c9 100644
--- a/arch/riscv/include/asm/vendorid_list.h
+++ b/arch/riscv/include/asm/vendorid_list.h
@@ -24,7 +24,8 @@
#define ERRATA_THEAD_PBMT 0
#define ERRATA_THEAD_CMO 1
#define ERRATA_THEAD_PMU 2
-#define ERRATA_THEAD_NUMBER 3
+#define ERRATA_THEAD_WRITE_ONCE 3
+#define ERRATA_THEAD_NUMBER 4
#endif
#endif
diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h
index 8d0a6280e982..fb07fe8c6e45 100644
--- a/include/asm-generic/rwonce.h
+++ b/include/asm-generic/rwonce.h
@@ -50,10 +50,12 @@
__READ_ONCE(x); \
})
+#ifndef __WRITE_ONCE
#define __WRITE_ONCE(x, val) \
do { \
*(volatile typeof(x) *)&(x) = (val); \
} while (0)
+#endif
#define WRITE_ONCE(x, val) \
do { \
--
2.40.1
From: Guo Ren <[email protected]>
The requirements of qspinlock have been documented by commit:
a8ad07e5240c ("asm-generic: qspinlock: Indicate the use of mixed-size
atomics").
Although the RISC-V ISA only gives a weak forward progress guarantee for
LR/SC, which doesn't satisfy the qspinlock requirements above, it doesn't
prevent RISC-V vendors from implementing LR/SC with a strong forward
progress guarantee in their microarchitecture to meet the xchg_tail
requirement. The T-HEAD C9xx processor is one of them.
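To make the xchg_tail requirement concrete: xchg_tail() publishes the new queue tail while preserving the locked and pending bytes, and returns the previous lock word. The sketch below mirrors the generic fallback that does this with a compare-and-swap loop on the whole 32-bit word. When _Q_PENDING_BITS == 8 the kernel instead uses a 16-bit xchg on the tail half, which on RISC-V (no sub-word AMOs in the base A extension) is typically an LR/SC loop with masking. Either way, under contention the loop only completes if the hardware's LR/SC forward progress is strong enough. The Q_* macros and demo_xchg_tail() are hypothetical userspace names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mirror of the NR_CPUS < 16K layout: locked and pending
 * bytes in bits 0-15, tail in bits 16-31. */
#define Q_TAIL_OFFSET		16
#define Q_LOCKED_PENDING_MASK	0x0000ffffu

/* Swap in our tail, keep locked/pending, return the old lock word. */
static uint32_t demo_xchg_tail(_Atomic uint32_t *lock, uint32_t tail)
{
	uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);
	uint32_t new;

	do {
		new = (old & Q_LOCKED_PENDING_MASK) | tail;
		/* On failure 'old' is reloaded and we retry; this retry is
		 * exactly what needs a forward progress guarantee. */
	} while (!atomic_compare_exchange_weak_explicit(lock, &old, new,
			memory_order_relaxed, memory_order_relaxed));

	return old;
}
```

The caller uses the returned word to find the previous tail (if any) and link its MCS node behind it.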
We've tested the patch on SOPHGO sg2042 & th1520, and it passed stress
tests on Fedora, Ubuntu, OpenEuler, ... Here is the performance
comparison between qspinlock and ticket_lock on sg2042 (64 cores):
sysbench test=threads threads=32 yields=100 lock=8 (+13.8%):
queued_spinlock 0.5109/0.00
ticket_spinlock 0.5814/0.00
perf futex/hash (+6.7%):
queued_spinlock 1444393 operations/sec (+- 0.09%)
ticket_spinlock 1353215 operations/sec (+- 0.15%)
perf futex/wake-parallel (+8.6%):
queued_spinlock (waking 1/64 threads) in 0.0253 ms (+-2.90%)
ticket_spinlock (waking 1/64 threads) in 0.0275 ms (+-3.12%)
perf futex/requeue (+4.2%):
queued_spinlock Requeued 64 of 64 threads in 0.0785 ms (+-0.55%)
ticket_spinlock Requeued 64 of 64 threads in 0.0818 ms (+-4.12%)
System Benchmarks (+6.4%):
queued_spinlock:
System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0  628613745.4  53865.8
Double-Precision Whetstone                    55.0     182422.8  33167.8
Execl Throughput                              43.0      13116.6   3050.4
File Copy 1024 bufsize 2000 maxblocks       3960.0    7762306.2  19601.8
File Copy 256 bufsize 500 maxblocks         1655.0    3417556.8  20649.9
File Copy 4096 bufsize 8000 maxblocks       5800.0    7427995.7  12806.9
Pipe Throughput                            12440.0   23058600.5  18535.9
Pipe-based Context Switching                4000.0    2835617.7   7089.0
Process Creation                             126.0      12537.3    995.0
Shell Scripts (1 concurrent)                  42.4      57057.4  13456.9
Shell Scripts (8 concurrent)                   6.0       7367.1  12278.5
System Call Overhead                       15000.0   33308301.3  22205.5
                                                                ========
System Benchmarks Index Score                                    12426.1
ticket_spinlock:
System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0  626541701.9  53688.2
Double-Precision Whetstone                    55.0     181921.0  33076.5
Execl Throughput                              43.0      12625.1   2936.1
File Copy 1024 bufsize 2000 maxblocks       3960.0    6553792.9  16550.0
File Copy 256 bufsize 500 maxblocks         1655.0    3189231.6  19270.3
File Copy 4096 bufsize 8000 maxblocks       5800.0    7221277.0  12450.5
Pipe Throughput                            12440.0   20594018.7  16554.7
Pipe-based Context Switching                4000.0    2571117.7   6427.8
Process Creation                             126.0      10798.4    857.0
Shell Scripts (1 concurrent)                  42.4      57227.5  13497.1
Shell Scripts (8 concurrent)                   6.0       7329.2  12215.3
System Call Overhead                       15000.0   30766778.4  20511.2
                                                                ========
System Benchmarks Index Score                                    11670.7
On the 64-core SOPHGO SG2042 platform, qspinlock performs significantly
better than ticket_lock.
Reviewed-by: Leonardo Bras <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/Kconfig | 16 ++++++++++++++++
arch/riscv/include/asm/Kbuild | 4 +++-
arch/riscv/include/asm/spinlock.h | 18 ++++++++++++++++++
3 files changed, 37 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/spinlock.h
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 24c1799e2ec4..f345df0763b2 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -432,6 +432,22 @@ config NODES_SHIFT
Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accommodate various tables.
+choice
+ prompt "RISC-V spinlock type"
+ default RISCV_TICKET_SPINLOCKS
+
+config RISCV_TICKET_SPINLOCKS
+ bool "Using ticket spinlock"
+
+config RISCV_QUEUED_SPINLOCKS
+ bool "Using queued spinlock"
+ depends on SMP && MMU
+ select ARCH_USE_QUEUED_SPINLOCKS
+ help
+ Make sure your microarchitecture gives cmpxchg/xchg a forward
+ progress guarantee. Otherwise, stay with the ticket lock.
+endchoice
+
config RISCV_ALTERNATIVE
bool
depends on !XIP_KERNEL
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 504f8b7e72d4..ad72f2bd4cc9 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -2,10 +2,12 @@
generic-y += early_ioremap.h
generic-y += flat.h
generic-y += kvm_para.h
+generic-y += mcs_spinlock.h
generic-y += parport.h
-generic-y += spinlock.h
generic-y += spinlock_types.h
+generic-y += ticket_spinlock.h
generic-y += qrwlock.h
generic-y += qrwlock_types.h
+generic-y += qspinlock.h
generic-y += user.h
generic-y += vmlinux.lds.h
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
new file mode 100644
index 000000000000..98a3da4b1056
--- /dev/null
+++ b/arch/riscv/include/asm/spinlock.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_RISCV_SPINLOCK_H
+#define __ASM_RISCV_SPINLOCK_H
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#define _Q_PENDING_LOOPS (1 << 9)
+#endif
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm/qspinlock.h>
+#else
+#include <asm/ticket_spinlock.h>
+#endif
+
+#include <asm/qrwlock.h>
+
+#endif /* __ASM_RISCV_SPINLOCK_H */
--
2.40.1
From: Guo Ren <[email protected]>
The combo spinlock supports both queued and ticket spinlocks in one Linux
image and selects between them at boot time via the command line. Here
is a comparison of function sizes (bytes):
TYPE                     : COMBO | TICKET | QUEUED
arch_spin_lock           :   106 |     60 |     50
arch_spin_unlock         :    54 |     36 |     26
arch_spin_trylock        :   110 |     72 |     54
arch_spin_is_locked      :    48 |     34 |     20
arch_spin_is_contended   :    56 |     40 |     24
arch_spin_value_unlocked :    48 |     34 |     24
Here is an example disassembly of the combo arch_spin_unlock:
<+14>: nop # detour slot
<+18>: fence rw,w --+-> queued_spin_unlock
<+22>: sb zero,0(a4) --+ (2 instructions)
<+26>: ld s0,8(sp)
<+28>: addi sp,sp,16
<+30>: ret
<+32>: lw a5,0(a4) --+-> ticket_spin_unlock
<+34>: sext.w a5,a5 | (7 instructions)
<+36>: fence rw,w |
<+40>: addiw a5,a5,1 |
<+42>: slli a5,a5,0x30 |
<+44>: srli a5,a5,0x30 |
<+46>: sh a5,0(a4) --+
<+50>: ld s0,8(sp)
<+52>: addi sp,sp,16
<+54>: ret
The qspinlock is smaller and faster than the ticket-lock when both are
on the fast path.
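The unlock difference in the disassembly comes down to a release store of zero versus a read-modify-write of the owner half. Below is a minimal userspace sketch of that difference using C11 atomics. It is not the asm-generic code: the split owner/next fields and the byte-sized "locked" field are simplifying assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Ticket lock model: "next" hands out tickets, "owner" is now-serving. */
typedef struct {
	_Atomic uint16_t owner;
	_Atomic uint16_t next;
} ticket_lock_t;

/* Queued lock model: only the "locked" byte is touched on the fast path. */
typedef struct {
	_Atomic uint8_t locked;
} queued_lock_t;

static void ticket_lock(ticket_lock_t *l)
{
	uint16_t my = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

	while (atomic_load_explicit(&l->owner, memory_order_acquire) != my)
		; /* spin until our ticket is served */
}

static void ticket_unlock(ticket_lock_t *l)
{
	/* Read-modify-write: load owner, add 1 with 16-bit wrap, release-store. */
	uint16_t o = atomic_load_explicit(&l->owner, memory_order_relaxed);

	atomic_store_explicit(&l->owner, (uint16_t)(o + 1), memory_order_release);
}

static bool queued_trylock(queued_lock_t *l)
{
	uint8_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void queued_unlock(queued_lock_t *l)
{
	/* One release store of zero (a fence rw,w + sb pair on RISC-V). */
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The queued unlock never reads the lock word, so it needs no load/add/mask sequence.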
The combo spinlock provides a single Linux image that works on
micro-architecture designs with or without a forward progress
guarantee. A command line option selects between qspinlock and
ticket-lock; the default is ticket-lock.
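The selection logic can be sketched in userspace as follows. This is only a sketch under stated assumptions. A plain global bool stands in for the combo_qspinlock_key static key, whose branch the kernel patches in place instead of reading a variable. The token matcher cmdline_has() is hypothetical and only recognizes a bare "qspinlock" token. As in riscv_spinlock_init(), the state starts as qspinlock and the only transition is to ticket-lock.

```c
#include <stdbool.h>
#include <string.h>

enum lock_kind { LOCK_TICKET, LOCK_QUEUED };

/* Stand-in for combo_qspinlock_key (DEFINE_STATIC_KEY_TRUE in the patch). */
static bool use_qspinlock = true;

/* Hypothetical helper: true if param appears as a whole space-separated token. */
static bool cmdline_has(const char *cmdline, const char *param)
{
	size_t n = strlen(param);
	const char *p = cmdline;

	if (n == 0)
		return false;
	while ((p = strstr(p, param)) != NULL) {
		bool start_ok = (p == cmdline) || (p[-1] == ' ');
		bool end_ok = (p[n] == '\0') || (p[n] == ' ');

		if (start_ok && end_ok)
			return true;
		p += n;
	}
	return false;
}

/* One-way switch: qspinlock -> ticket-lock, never ticket-lock -> qspinlock. */
static void spinlock_init(const char *cmdline)
{
	if (!cmdline_has(cmdline, "qspinlock"))
		use_qspinlock = false;
}

static enum lock_kind current_lock_kind(void)
{
	return use_qspinlock ? LOCK_QUEUED : LOCK_TICKET;
}
```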
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 2 +
arch/riscv/Kconfig | 9 +++-
arch/riscv/include/asm/spinlock.h | 48 +++++++++++++++++++
arch/riscv/kernel/setup.c | 34 +++++++++++++
include/asm-generic/qspinlock.h | 2 +
include/asm-generic/ticket_spinlock.h | 2 +
6 files changed, 96 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 65731b060e3f..2ac9f1511774 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4753,6 +4753,8 @@
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
+ qspinlock [RISCV] Use native qspinlock.
+
quiet [KNL] Disable most log messages
r128= [HW,DRM]
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index f345df0763b2..b7673c5c0997 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -434,7 +434,7 @@ config NODES_SHIFT
choice
prompt "RISC-V spinlock type"
- default RISCV_TICKET_SPINLOCKS
+ default RISCV_COMBO_SPINLOCKS
config RISCV_TICKET_SPINLOCKS
bool "Using ticket spinlock"
@@ -446,6 +446,13 @@ config RISCV_QUEUED_SPINLOCKS
help
Make sure your micro arch give cmpxchg/xchg forward progress
guarantee. Otherwise, stay at ticket-lock.
+
+config RISCV_COMBO_SPINLOCKS
+ bool "Using combo spinlock"
+ depends on SMP && MMU
+ select ARCH_USE_QUEUED_SPINLOCKS
+ help
+ Select queued spinlock or ticket-lock via the kernel command line.
endchoice
config RISCV_ALTERNATIVE
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index 98a3da4b1056..d07643c07aae 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -7,12 +7,60 @@
#define _Q_PENDING_LOOPS (1 << 9)
#endif
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+#define __no_arch_spinlock_redefine
+#include <asm/ticket_spinlock.h>
+#include <asm/qspinlock.h>
+#include <linux/jump_label.h>
+
+DECLARE_STATIC_KEY_TRUE(combo_qspinlock_key);
+
+#define COMBO_SPINLOCK_BASE_DECLARE(op) \
+static __always_inline void arch_spin_##op(arch_spinlock_t *lock) \
+{ \
+ if (static_branch_likely(&combo_qspinlock_key)) \
+ queued_spin_##op(lock); \
+ else \
+ ticket_spin_##op(lock); \
+}
+COMBO_SPINLOCK_BASE_DECLARE(lock)
+COMBO_SPINLOCK_BASE_DECLARE(unlock)
+
+#define COMBO_SPINLOCK_IS_DECLARE(op) \
+static __always_inline int arch_spin_##op(arch_spinlock_t *lock) \
+{ \
+ if (static_branch_likely(&combo_qspinlock_key)) \
+ return queued_spin_##op(lock); \
+ else \
+ return ticket_spin_##op(lock); \
+}
+COMBO_SPINLOCK_IS_DECLARE(is_locked)
+COMBO_SPINLOCK_IS_DECLARE(is_contended)
+
+static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
+{
+ if (static_branch_likely(&combo_qspinlock_key))
+ return queued_spin_trylock(lock);
+ else
+ return ticket_spin_trylock(lock);
+}
+
+static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
+{
+ if (static_branch_likely(&combo_qspinlock_key))
+ return queued_spin_value_unlocked(lock);
+ else
+ return ticket_spin_value_unlocked(lock);
+}
+
+#else /* CONFIG_RISCV_COMBO_SPINLOCKS */
#ifdef CONFIG_QUEUED_SPINLOCKS
#include <asm/qspinlock.h>
#else
#include <asm/ticket_spinlock.h>
#endif
+#endif /* CONFIG_RISCV_COMBO_SPINLOCKS */
#include <asm/qrwlock.h>
#endif /* __ASM_RISCV_SPINLOCK_H */
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 535a837de55d..d9072a59831c 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -246,6 +246,37 @@ static void __init parse_dtb(void)
#endif
}
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+static bool enable_qspinlock __ro_after_init;
+static int __init queued_spinlock_setup(char *p)
+{
+ enable_qspinlock = true;
+
+ return 0;
+}
+early_param("qspinlock", queued_spinlock_setup);
+
+/*
+ * Ticket-lock would dirty the lock value, so force qspinlock at
+ * first and switch to ticket-lock later.
+ * - key is true : qspinlock -> qspinlock (no change)
+ * - key is false: qspinlock -> ticket-lock
+ * (No ticket-lock -> qspinlock)
+ */
+DEFINE_STATIC_KEY_TRUE(combo_qspinlock_key);
+EXPORT_SYMBOL(combo_qspinlock_key);
+
+static void __init riscv_spinlock_init(void)
+{
+ if (!enable_qspinlock) {
+ static_branch_disable(&combo_qspinlock_key);
+ pr_info("Ticket spinlock: enabled\n");
+ } else {
+ pr_info("Queued spinlock: enabled\n");
+ }
+}
+#endif
+
extern void __init init_rt_signal_env(void);
void __init setup_arch(char **cmdline_p)
@@ -297,6 +328,9 @@ void __init setup_arch(char **cmdline_p)
riscv_set_dma_cache_alignment();
riscv_user_isa_enable();
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+ riscv_spinlock_init();
+#endif
}
static int __init topology_init(void)
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 0655aa5b57b2..bf47cca2c375 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
}
#endif
+#ifndef __no_arch_spinlock_redefine
/*
* Remapping spinlock architecture specific functions to the corresponding
* queued spinlock functions.
@@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
#define arch_spin_lock(l) queued_spin_lock(l)
#define arch_spin_trylock(l) queued_spin_trylock(l)
#define arch_spin_unlock(l) queued_spin_unlock(l)
+#endif
#endif /* __ASM_GENERIC_QSPINLOCK_H */
diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
index cfcff22b37b3..325779970d8a 100644
--- a/include/asm-generic/ticket_spinlock.h
+++ b/include/asm-generic/ticket_spinlock.h
@@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
return (s16)((val >> 16) - (val & 0xffff)) > 1;
}
+#ifndef __no_arch_spinlock_redefine
/*
* Remapping spinlock architecture specific functions to the corresponding
* ticket spinlock functions.
@@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
#define arch_spin_lock(l) ticket_spin_lock(l)
#define arch_spin_trylock(l) ticket_spin_trylock(l)
#define arch_spin_unlock(l) ticket_spin_unlock(l)
+#endif
#endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
--
2.40.1
From: Guo Ren <[email protected]>
Add a static key controlling whether virt_spin_lock() should be
called or not. When running on bare metal, set the new key to
false.
VM guests should fall back to a test-and-set spinlock, because
fair locks suffer badly from lock-holder preemption. The
virt_spin_lock_key enables a shortcut in
queued_spin_lock_slowpath() that allows virt_spin_lock() to
hijack it.
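Below is a userspace sketch of the test-and-set fallback, modeled on the virt_spin_lock() added to asm/spinlock.h. Assumptions: a global bool replaces virt_spin_lock_key, Q_LOCKED_VAL stands in for _Q_LOCKED_VAL, and the model lock is a bare 32-bit atomic. The lock is unfair because any waiter that sees zero may win. A preempted waiter therefore never holds up the others queued behind it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define Q_LOCKED_VAL 1u /* stand-in for _Q_LOCKED_VAL */

struct qspinlock_model {
	_Atomic uint32_t val;
};

/* Stand-in for virt_spin_lock_key (disabled by "no_virt_spin"). */
static bool virt_spin_lock_enabled = true;

/*
 * Returns true if the lock was taken here, bypassing the queued slowpath;
 * false tells the caller to continue with the fair MCS queueing.
 */
static bool virt_spin_lock(struct qspinlock_model *lock)
{
	if (!virt_spin_lock_enabled)
		return false;

	for (;;) {
		uint32_t expected = 0;

		/* Test: wait on plain loads (cpu_relax() in the kernel). */
		while (atomic_load_explicit(&lock->val, memory_order_relaxed) != 0)
			;
		/* Set: try to grab it; another waiter may win the race. */
		if (atomic_compare_exchange_weak_explicit(&lock->val, &expected,
							  Q_LOCKED_VAL,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
}
```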
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 4 +++
arch/riscv/include/asm/spinlock.h | 22 ++++++++++++++++
arch/riscv/kernel/setup.c | 26 +++++++++++++++++++
3 files changed, 52 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2ac9f1511774..b7794c96d91e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3997,6 +3997,10 @@
no_uaccess_flush
[PPC] Don't flush the L1-D cache after accessing user data.
+ no_virt_spin [RISC-V] Disable virt_spin_lock in a VM guest so that
+ native_queued_spinlock is used when the nopvspin option is enabled.
+ This helps vcpu=pcpu scenarios.
+
novmcoredd [KNL,KDUMP]
Disable device dump. Device dump allows drivers to
append dump data to vmcore so you can collect driver
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index d07643c07aae..7bbcf3d9fff0 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -4,6 +4,28 @@
#define __ASM_RISCV_SPINLOCK_H
#ifdef CONFIG_QUEUED_SPINLOCKS
+/*
+ * KVM guests fall back to a test-and-set spinlock, because fair locks
+ * suffer badly from lock-holder preemption. The virt_spin_lock_key
+ * enables a shortcut in queued_spin_lock_slowpath() that allows
+ * virt_spin_lock() to hijack it.
+ */
+DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
+
+#define virt_spin_lock virt_spin_lock
+static inline bool virt_spin_lock(struct qspinlock *lock)
+{
+ if (!static_branch_likely(&virt_spin_lock_key))
+ return false;
+
+ do {
+ while (atomic_read(&lock->val) != 0)
+ cpu_relax();
+ } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
+
+ return true;
+}
+
#define _Q_PENDING_LOOPS (1 << 9)
#endif
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index d9072a59831c..0bafb9fd6ea3 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -27,6 +27,7 @@
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/cpu_ops.h>
+#include <asm/cpufeature.h>
#include <asm/early_ioremap.h>
#include <asm/pgtable.h>
#include <asm/setup.h>
@@ -266,6 +267,27 @@ early_param("qspinlock", queued_spinlock_setup);
DEFINE_STATIC_KEY_TRUE(combo_qspinlock_key);
EXPORT_SYMBOL(combo_qspinlock_key);
+#ifdef CONFIG_QUEUED_SPINLOCKS
+static bool no_virt_spin __ro_after_init;
+static int __init no_virt_spin_setup(char *p)
+{
+ no_virt_spin = true;
+
+ return 0;
+}
+early_param("no_virt_spin", no_virt_spin_setup);
+
+DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
+
+static void __init virt_spin_lock_init(void)
+{
+ if (no_virt_spin)
+ static_branch_disable(&virt_spin_lock_key);
+ else
+ pr_info("Enable virt_spin_lock\n");
+}
+#endif
+
static void __init riscv_spinlock_init(void)
{
if (!enable_qspinlock) {
@@ -274,6 +296,10 @@ static void __init riscv_spinlock_init(void)
} else {
pr_info("Queued spinlock: enabled\n");
}
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+ virt_spin_lock_init();
+#endif
}
#endif
--
2.40.1
From: Guo Ren <[email protected]>
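Force-enable virt_spin_lock when running as a KVM guest, because fair
locks suffer badly from lock-holder preemption. Also keep the combo
spinlock on qspinlock in a KVM guest, since virt_spin_lock() is
reached through the qspinlock slow path.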
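Below is a sketch of the combined boot-time decision after this patch. It is modeled on riscv_spinlock_init() and virt_spin_lock_init(). The function choose_locks() and its parameters are hypothetical: they replace the command line flags and sbi_get_firmware_id(). The enum values follow enum sbi_ext_base_impl_id from the sbi.h hunk.

```c
#include <stdbool.h>

/* Same ordering as enum sbi_ext_base_impl_id added to asm/sbi.h. */
enum sbi_impl_id { IMPL_BBL = 0, IMPL_OPENSBI, IMPL_XVISOR, IMPL_KVM };

struct lock_choice {
	bool qspinlock; /* combo_qspinlock_key left enabled */
	bool virt_spin; /* virt_spin_lock_key left enabled  */
};

static struct lock_choice choose_locks(bool enable_qspinlock,
				       bool no_virt_spin, long firmware_id)
{
	struct lock_choice c = { .qspinlock = true, .virt_spin = true };
	bool kvm = (firmware_id == IMPL_KVM);

	/* Ticket-lock only when not requested AND not a KVM guest. */
	if (!enable_qspinlock && !kvm)
		c.qspinlock = false;
	/* Non-KVM firmware behaves as if "no_virt_spin" was given. */
	if (no_virt_spin || !kvm)
		c.virt_spin = false;
	return c;
}
```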
Suggested-by: Leonardo Bras <[email protected]>
Link: https://lkml.kernel.org/kvm/[email protected]/
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/sbi.h | 8 ++++++++
arch/riscv/kernel/sbi.c | 2 +-
arch/riscv/kernel/setup.c | 6 +++++-
3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 0892f4421bc4..8f748d9e1b85 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -51,6 +51,13 @@ enum sbi_ext_base_fid {
SBI_EXT_BASE_GET_MIMPID,
};
+enum sbi_ext_base_impl_id {
+ SBI_EXT_BASE_IMPL_ID_BBL = 0,
+ SBI_EXT_BASE_IMPL_ID_OPENSBI,
+ SBI_EXT_BASE_IMPL_ID_XVISOR,
+ SBI_EXT_BASE_IMPL_ID_KVM,
+};
+
enum sbi_ext_time_fid {
SBI_EXT_TIME_SET_TIMER = 0,
};
@@ -276,6 +283,7 @@ int sbi_console_getchar(void);
long sbi_get_mvendorid(void);
long sbi_get_marchid(void);
long sbi_get_mimpid(void);
+long sbi_get_firmware_id(void);
void sbi_set_timer(uint64_t stime_value);
void sbi_shutdown(void);
void sbi_send_ipi(unsigned int cpu);
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index 5a62ed1da453..4330aedf65fd 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -543,7 +543,7 @@ static inline long sbi_get_spec_version(void)
return __sbi_base_ecall(SBI_EXT_BASE_GET_SPEC_VERSION);
}
-static inline long sbi_get_firmware_id(void)
+long sbi_get_firmware_id(void)
{
return __sbi_base_ecall(SBI_EXT_BASE_GET_IMP_ID);
}
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 0bafb9fd6ea3..e33430e9d97e 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -281,6 +281,9 @@ DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
static void __init virt_spin_lock_init(void)
{
+ if (sbi_get_firmware_id() != SBI_EXT_BASE_IMPL_ID_KVM)
+ no_virt_spin = true;
+
if (no_virt_spin)
static_branch_disable(&virt_spin_lock_key);
else
@@ -290,7 +293,8 @@ static void __init virt_spin_lock_init(void)
static void __init riscv_spinlock_init(void)
{
- if (!enable_qspinlock) {
+ if ((!enable_qspinlock) &&
+ (sbi_get_firmware_id() != SBI_EXT_BASE_IMPL_ID_KVM)) {
static_branch_disable(&combo_qspinlock_key);
pr_info("Ticket spinlock: enabled\n");
} else {
--
2.40.1
From: Guo Ren <[email protected]>
Add the files and functions needed to support the SBI PVLOCK
(paravirt qspinlock kick_cpu) extension. Implement
kvm_sbi_ext_pvlock_kick_cpu(), which only needs to call
kvm_vcpu_kick() to bring the target vCPU out of the halt state. No
IRQ is raised and no other request is made; it is a pure vcpu kick.
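Below is a userspace model of the handler's dispatch, assuming a fixed, hypothetical array of vCPUs. In the model, a kick counter replaces kvm_vcpu_kick(), and the optional kvm_vcpu_yield_to() is omitted. The function ID arrives in a6 and the target in a0, as in kvm_sbi_ext_pvlock_handler(). The error values are the standard SBI return codes.

```c
#include <stddef.h>

#define SBI_SUCCESS              0
#define SBI_ERR_NOT_SUPPORTED    (-2)
#define SBI_ERR_INVALID_PARAM    (-3)
#define SBI_EXT_PVLOCK_KICK_CPU  0

#define NR_VCPUS 4 /* hypothetical guest size */

struct vcpu_model {
	unsigned int kicks; /* counts stand-in kvm_vcpu_kick() calls */
};

struct guest_regs {
	unsigned long a0; /* target vCPU id */
	unsigned long a6; /* SBI function id */
};

static struct vcpu_model vcpus[NR_VCPUS];

/* Hypothetical stand-in for kvm_get_vcpu_by_id(). */
static struct vcpu_model *get_vcpu_by_id(unsigned long id)
{
	return id < NR_VCPUS ? &vcpus[id] : NULL;
}

/* Returns the value the real handler stores in retdata->err_val. */
static long pvlock_handler(const struct guest_regs *r)
{
	struct vcpu_model *target;

	switch (r->a6) {
	case SBI_EXT_PVLOCK_KICK_CPU:
		target = get_vcpu_by_id(r->a0);
		if (!target)
			return SBI_ERR_INVALID_PARAM;
		target->kicks++; /* wake it from halt; no IRQ injected */
		return SBI_SUCCESS;
	default:
		return SBI_ERR_NOT_SUPPORTED;
	}
}
```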
Reviewed-by: Leonardo Bras <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/kvm_vcpu_sbi.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/vcpu_sbi.c | 4 ++
arch/riscv/kvm/vcpu_sbi_pvlock.c | 57 +++++++++++++++++++++++++++
5 files changed, 64 insertions(+)
create mode 100644 arch/riscv/kvm/vcpu_sbi_pvlock.c
diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
index 6a453f7f8b56..a051e4875542 100644
--- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
+++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
@@ -76,6 +76,7 @@ extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_hsm;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_dbcn;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_experimental;
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_vendor;
+extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pvlock;
#ifdef CONFIG_RISCV_PMU_SBI
extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu;
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index 60d3b21dead7..24bbada1a9fb 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -157,6 +157,7 @@ enum KVM_RISCV_SBI_EXT_ID {
KVM_RISCV_SBI_EXT_EXPERIMENTAL,
KVM_RISCV_SBI_EXT_VENDOR,
KVM_RISCV_SBI_EXT_DBCN,
+ KVM_RISCV_SBI_EXT_PVLOCK,
KVM_RISCV_SBI_EXT_MAX,
};
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 4c2067fc59fc..6112750a3a0c 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -26,6 +26,7 @@ kvm-$(CONFIG_RISCV_SBI_V01) += vcpu_sbi_v01.o
kvm-y += vcpu_sbi_base.o
kvm-y += vcpu_sbi_replace.o
kvm-y += vcpu_sbi_hsm.o
+kvm-y += vcpu_sbi_pvlock.o
kvm-y += vcpu_timer.o
kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o vcpu_sbi_pmu.o
kvm-y += aia.o
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index a04ff98085d9..7078bd57806b 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -78,6 +78,10 @@ static const struct kvm_riscv_sbi_extension_entry sbi_ext[] = {
.ext_idx = KVM_RISCV_SBI_EXT_VENDOR,
.ext_ptr = &vcpu_sbi_ext_vendor,
},
+ {
+ .ext_idx = KVM_RISCV_SBI_EXT_PVLOCK,
+ .ext_ptr = &vcpu_sbi_ext_pvlock,
+ },
};
void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run)
diff --git a/arch/riscv/kvm/vcpu_sbi_pvlock.c b/arch/riscv/kvm/vcpu_sbi_pvlock.c
new file mode 100644
index 000000000000..914fc58aedfe
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_sbi_pvlock.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c), 2023 Alibaba Cloud
+ *
+ * Authors:
+ * Guo Ren <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <asm/sbi.h>
+#include <asm/kvm_vcpu_sbi.h>
+
+static int kvm_sbi_ext_pvlock_kick_cpu(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_vcpu *target;
+
+ target = kvm_get_vcpu_by_id(kvm, cp->a0);
+ if (!target)
+ return SBI_ERR_INVALID_PARAM;
+
+ kvm_vcpu_kick(target);
+
+ if (READ_ONCE(target->ready))
+ kvm_vcpu_yield_to(target);
+
+ return SBI_SUCCESS;
+}
+
+static int kvm_sbi_ext_pvlock_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ struct kvm_vcpu_sbi_return *retdata)
+{
+ int ret = 0;
+ struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+ unsigned long funcid = cp->a6;
+
+ switch (funcid) {
+ case SBI_EXT_PVLOCK_KICK_CPU:
+ ret = kvm_sbi_ext_pvlock_kick_cpu(vcpu);
+ break;
+ default:
+ ret = SBI_ERR_NOT_SUPPORTED;
+ }
+
+ retdata->err_val = ret;
+
+ return 0;
+}
+
+const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pvlock = {
+ .extid_start = SBI_EXT_PVLOCK,
+ .extid_end = SBI_EXT_PVLOCK,
+ .handler = kvm_sbi_ext_pvlock_handler,
+};
--
2.40.1
From: Guo Ren <[email protected]>
Use static_call to switch between:
  native_queued_spin_lock_slowpath()  <->  __pv_queued_spin_lock_slowpath()
  native_queued_spin_unlock()         <->  __pv_queued_spin_unlock()
Finish the pv_wait implementation; pv_kick needs the SBI definitions
added in the following patches.
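Below is a userspace approximation of the static_call switch using function pointers. The kernel instead rewrites the call sites when static_call_update() runs, so there is no indirect call on the fast path. All names ending in _model, plus the path-tracking enum, are hypothetical. The switching condition mirrors pv_qspinlock_init(): only SMP KVM guests move to the PV variants.

```c
#include <stdbool.h>

struct qspinlock_model {
	unsigned int val;
};

enum lock_path { PATH_NONE, PATH_NATIVE, PATH_PV };
static enum lock_path last_slowpath = PATH_NONE;
static enum lock_path last_unlock = PATH_NONE;

static void native_slowpath(struct qspinlock_model *lock, unsigned int val)
{
	(void)lock; (void)val;
	last_slowpath = PATH_NATIVE;
}

static void pv_slowpath(struct qspinlock_model *lock, unsigned int val)
{
	(void)lock; (void)val;
	last_slowpath = PATH_PV;
}

static void native_unlock(struct qspinlock_model *lock)
{
	lock->val = 0;
	last_unlock = PATH_NATIVE;
}

static void pv_unlock(struct qspinlock_model *lock)
{
	lock->val = 0;
	last_unlock = PATH_PV;
}

/* Stand-ins for DEFINE_STATIC_CALL(..., native_*): start with native. */
static void (*slowpath_call)(struct qspinlock_model *, unsigned int) = native_slowpath;
static void (*unlock_call)(struct qspinlock_model *) = native_unlock;

static void queued_spin_lock_slowpath(struct qspinlock_model *l, unsigned int v)
{
	slowpath_call(l, v);
}

static void queued_spin_unlock(struct qspinlock_model *l)
{
	unlock_call(l);
}

/* Mirrors pv_qspinlock_init(): switch only for SMP KVM guests. */
static void pv_init_model(unsigned int ncpus, bool is_kvm)
{
	if (ncpus == 1 || !is_kvm)
		return;
	slowpath_call = pv_slowpath; /* static_call_update() */
	unlock_call = pv_unlock;
}
```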
Reviewed-by: Leonardo Bras <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/Kbuild | 1 -
arch/riscv/include/asm/qspinlock.h | 35 +++++++++++++
arch/riscv/include/asm/qspinlock_paravirt.h | 29 +++++++++++
arch/riscv/kernel/qspinlock_paravirt.c | 57 +++++++++++++++++++++
arch/riscv/kernel/setup.c | 4 ++
5 files changed, 125 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/qspinlock.h
create mode 100644 arch/riscv/include/asm/qspinlock_paravirt.h
create mode 100644 arch/riscv/kernel/qspinlock_paravirt.c
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index ad72f2bd4cc9..85a428ad116d 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -8,6 +8,5 @@ generic-y += spinlock_types.h
generic-y += ticket_spinlock.h
generic-y += qrwlock.h
generic-y += qrwlock_types.h
-generic-y += qspinlock.h
generic-y += user.h
generic-y += vmlinux.lds.h
diff --git a/arch/riscv/include/asm/qspinlock.h b/arch/riscv/include/asm/qspinlock.h
new file mode 100644
index 000000000000..02ce973b5b6e
--- /dev/null
+++ b/arch/riscv/include/asm/qspinlock.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c), 2023 Alibaba
+ * Authors:
+ * Guo Ren <[email protected]>
+ */
+
+#ifndef _ASM_RISCV_QSPINLOCK_H
+#define _ASM_RISCV_QSPINLOCK_H
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#include <asm/qspinlock_paravirt.h>
+
+/* How long a lock should spin before we consider blocking */
+#define SPIN_THRESHOLD (1 << 15)
+
+void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+void __pv_init_lock_hash(void);
+void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+
+static inline void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+{
+ static_call(pv_queued_spin_lock_slowpath)(lock, val);
+}
+
+#define queued_spin_unlock queued_spin_unlock
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+ static_call(pv_queued_spin_unlock)(lock);
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#include <asm-generic/qspinlock.h>
+
+#endif /* _ASM_RISCV_QSPINLOCK_H */
diff --git a/arch/riscv/include/asm/qspinlock_paravirt.h b/arch/riscv/include/asm/qspinlock_paravirt.h
new file mode 100644
index 000000000000..9681e851f69d
--- /dev/null
+++ b/arch/riscv/include/asm/qspinlock_paravirt.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c), 2023 Alibaba Cloud
+ * Authors:
+ * Guo Ren <[email protected]>
+ */
+
+#ifndef _ASM_RISCV_QSPINLOCK_PARAVIRT_H
+#define _ASM_RISCV_QSPINLOCK_PARAVIRT_H
+
+void pv_wait(u8 *ptr, u8 val);
+void pv_kick(int cpu);
+
+void dummy_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+void dummy_queued_spin_unlock(struct qspinlock *lock);
+
+DECLARE_STATIC_CALL(pv_queued_spin_lock_slowpath, dummy_queued_spin_lock_slowpath);
+DECLARE_STATIC_CALL(pv_queued_spin_unlock, dummy_queued_spin_unlock);
+
+void __init pv_qspinlock_init(void);
+
+static inline bool pv_is_native_spin_unlock(void)
+{
+ return false;
+}
+
+void __pv_queued_spin_unlock(struct qspinlock *lock);
+
+#endif /* _ASM_RISCV_QSPINLOCK_PARAVIRT_H */
diff --git a/arch/riscv/kernel/qspinlock_paravirt.c b/arch/riscv/kernel/qspinlock_paravirt.c
new file mode 100644
index 000000000000..85ff5a3ec234
--- /dev/null
+++ b/arch/riscv/kernel/qspinlock_paravirt.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c), 2023 Alibaba Cloud
+ * Authors:
+ * Guo Ren <[email protected]>
+ */
+
+#include <linux/static_call.h>
+#include <asm/qspinlock_paravirt.h>
+#include <asm/sbi.h>
+
+void pv_kick(int cpu)
+{
+ return;
+}
+
+void pv_wait(u8 *ptr, u8 val)
+{
+ unsigned long flags;
+
+ if (in_nmi())
+ return;
+
+ local_irq_save(flags);
+ if (READ_ONCE(*ptr) != val)
+ goto out;
+
+ /* wait_for_interrupt(); */
+out:
+ local_irq_restore(flags);
+}
+
+static void native_queued_spin_unlock(struct qspinlock *lock)
+{
+ smp_store_release(&lock->locked, 0);
+}
+
+DEFINE_STATIC_CALL(pv_queued_spin_lock_slowpath, native_queued_spin_lock_slowpath);
+EXPORT_STATIC_CALL(pv_queued_spin_lock_slowpath);
+
+DEFINE_STATIC_CALL(pv_queued_spin_unlock, native_queued_spin_unlock);
+EXPORT_STATIC_CALL(pv_queued_spin_unlock);
+
+void __init pv_qspinlock_init(void)
+{
+ if (num_possible_cpus() == 1)
+ return;
+
+ if(sbi_get_firmware_id() != SBI_EXT_BASE_IMPL_ID_KVM)
+ return;
+
+ pr_info("PV qspinlocks enabled\n");
+ __pv_init_lock_hash();
+
+ static_call_update(pv_queued_spin_lock_slowpath, __pv_queued_spin_lock_slowpath);
+ static_call_update(pv_queued_spin_unlock, __pv_queued_spin_unlock);
+}
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index e33430e9d97e..052bbfbb7f32 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -304,6 +304,10 @@ static void __init riscv_spinlock_init(void)
#ifdef CONFIG_QUEUED_SPINLOCKS
virt_spin_lock_init();
#endif
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+ pv_qspinlock_init();
+#endif
}
#endif
--
2.40.1
From: Guo Ren <[email protected]>
Implement pv_kick using the SBI PVLOCK extension on the guest side,
and add SBI_EXT_PVLOCK extension detection. The host backend is in
the KVM pvqspinlock patch.
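The guest-side pv_kick() places values in registers according to the SBI calling convention: a7 holds the extension ID, a6 the function ID, and a0..a5 the arguments. Below is a sketch that builds that register set without issuing an ecall. The hartid_map[] table is a hypothetical stand-in for cpuid_to_hartid_map(). It shows why the guest passes a hart ID rather than its Linux CPU number: hart IDs need not be contiguous.

```c
#define SBI_EXT_PVLOCK          0xAB0401UL /* from enum sbi_ext_id in this patch */
#define SBI_EXT_PVLOCK_KICK_CPU 0UL

struct sbi_call_regs {
	unsigned long a0, a1, a2, a3, a4, a5, a6, a7;
};

/* Hypothetical logical CPU -> hart ID table. */
static const unsigned long hartid_map[] = { 0, 1, 4, 5 };

/* Registers pv_kick(cpu) would load before the ecall instruction. */
static struct sbi_call_regs pv_kick_regs(int cpu)
{
	struct sbi_call_regs r = { 0 };

	r.a7 = SBI_EXT_PVLOCK;          /* extension ID */
	r.a6 = SBI_EXT_PVLOCK_KICK_CPU; /* function ID */
	r.a0 = hartid_map[cpu];         /* target hart; remaining args stay 0 */
	return r;
}
```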
Reviewed-by: Leonardo Bras <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/sbi.h | 6 ++++++
arch/riscv/kernel/qspinlock_paravirt.c | 7 ++++++-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 8f748d9e1b85..318b3dd92958 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -31,6 +31,7 @@ enum sbi_ext_id {
SBI_EXT_SRST = 0x53525354,
SBI_EXT_PMU = 0x504D55,
SBI_EXT_DBCN = 0x4442434E,
+ SBI_EXT_PVLOCK = 0xAB0401,
/* Experimentals extensions must lie within this range */
SBI_EXT_EXPERIMENTAL_START = 0x08000000,
@@ -250,6 +251,11 @@ enum sbi_ext_dbcn_fid {
SBI_EXT_DBCN_CONSOLE_WRITE_BYTE = 2,
};
+/* kick cpu out of wfi */
+enum sbi_ext_pvlock_fid {
+ SBI_EXT_PVLOCK_KICK_CPU = 0,
+};
+
#define SBI_SPEC_VERSION_DEFAULT 0x1
#define SBI_SPEC_VERSION_MAJOR_SHIFT 24
#define SBI_SPEC_VERSION_MAJOR_MASK 0x7f
diff --git a/arch/riscv/kernel/qspinlock_paravirt.c b/arch/riscv/kernel/qspinlock_paravirt.c
index 85ff5a3ec234..7d1b99412222 100644
--- a/arch/riscv/kernel/qspinlock_paravirt.c
+++ b/arch/riscv/kernel/qspinlock_paravirt.c
@@ -11,6 +11,8 @@
void pv_kick(int cpu)
{
+ sbi_ecall(SBI_EXT_PVLOCK, SBI_EXT_PVLOCK_KICK_CPU,
+ cpuid_to_hartid_map(cpu), 0, 0, 0, 0, 0);
return;
}
@@ -25,7 +27,7 @@ void pv_wait(u8 *ptr, u8 val)
if (READ_ONCE(*ptr) != val)
goto out;
- /* wait_for_interrupt(); */
+ wait_for_interrupt();
out:
local_irq_restore(flags);
}
@@ -49,6 +51,9 @@ void __init pv_qspinlock_init(void)
if(sbi_get_firmware_id() != SBI_EXT_BASE_IMPL_ID_KVM)
return;
+ if (!sbi_probe_extension(SBI_EXT_PVLOCK))
+ return;
+
pr_info("PV qspinlocks enabled\n");
__pv_init_lock_hash();
--
2.40.1
From: Guo Ren <[email protected]>
Add the nopvspin option for RISC-V, which disables the qspinlock slow
path using PV optimizations that allow the hypervisor to 'idle' the
guest on lock contention.
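Below is a sketch of the check order that pv_qspinlock_init() ends up with after this patch. The enum and the function parameters are hypothetical. They replace the nopvspin flag, num_possible_cpus(), sbi_get_firmware_id(), and sbi_probe_extension(). Because nopvspin is tested first, it wins even on an SMP KVM guest that advertises SBI_EXT_PVLOCK.

```c
#include <stdbool.h>

enum pv_init_result {
	PV_DISABLED_BY_CMDLINE, /* "nopvspin" given */
	PV_SKIPPED_UP,          /* single possible CPU */
	PV_SKIPPED_NOT_KVM,     /* firmware is not KVM */
	PV_SKIPPED_NO_EXT,      /* SBI_EXT_PVLOCK not probed */
	PV_ENABLED,
};

static enum pv_init_result pv_qspinlock_init_model(bool nopvspin,
						   unsigned int ncpus,
						   bool is_kvm,
						   bool has_pvlock_ext)
{
	if (nopvspin)
		return PV_DISABLED_BY_CMDLINE;
	if (ncpus == 1)
		return PV_SKIPPED_UP;
	if (!is_kvm)
		return PV_SKIPPED_NOT_KVM;
	if (!has_pvlock_ext)
		return PV_SKIPPED_NO_EXT;
	return PV_ENABLED;
}
```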
Reviewed-by: Leonardo Bras <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
arch/riscv/kernel/qspinlock_paravirt.c | 13 +++++++++++++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b7794c96d91e..4aff81d741e2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3927,7 +3927,7 @@
as generic guest with no PV drivers. Currently support
XEN HVM, KVM, HYPER_V and VMWARE guest.
- nopvspin [X86,XEN,KVM]
+ nopvspin [X86,XEN,KVM,RISC-V]
Disables the qspinlock slow path using PV optimizations
which allow the hypervisor to 'idle' the guest on lock
contention.
diff --git a/arch/riscv/kernel/qspinlock_paravirt.c b/arch/riscv/kernel/qspinlock_paravirt.c
index 7d1b99412222..4b04c93c4b9b 100644
--- a/arch/riscv/kernel/qspinlock_paravirt.c
+++ b/arch/riscv/kernel/qspinlock_paravirt.c
@@ -43,8 +43,21 @@ EXPORT_STATIC_CALL(pv_queued_spin_lock_slowpath);
DEFINE_STATIC_CALL(pv_queued_spin_unlock, native_queued_spin_unlock);
EXPORT_STATIC_CALL(pv_queued_spin_unlock);
+static bool nopvspin __initdata;
+static __init int parse_nopvspin(char *arg)
+{
+ nopvspin = true;
+ return 0;
+}
+early_param("nopvspin", parse_nopvspin);
+
void __init pv_qspinlock_init(void)
{
+ if (nopvspin) {
+ pr_info("PV qspinlocks disabled\n");
+ return;
+ }
+
if (num_possible_cpus() == 1)
return;
--
2.40.1
From: Guo Ren <[email protected]>
Add a Kconfig entry for PARAVIRT_SPINLOCKS, a virtualization-friendly
qspinlock backend that halts the virtual CPU rather than spinning.
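"Halting rather than spinning" roughly means the following: spin for a bounded number of iterations, then ask the hypervisor to park the vCPU until it is kicked. Below is a simplified userspace sketch of that pattern, using SPIN_THRESHOLD from the qspinlock.h in this series. It is not the generic PV slowpath, which waits on MCS node state. The callback demo_kick_wait() is hypothetical and simulates pv_wait() followed by a kick from the lock holder.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SPIN_THRESHOLD (1 << 15) /* same value as asm/qspinlock.h above */

/* Hook standing in for pv_wait(ptr, val). */
typedef void (*wait_fn)(_Atomic uint8_t *ptr, uint8_t val);

/* Hypothetical: "halt" only if *ptr still equals val, then simulate the
 * holder releasing the lock and calling pv_kick(). */
static void demo_kick_wait(_Atomic uint8_t *ptr, uint8_t val)
{
	if (atomic_load(ptr) == val)
		atomic_store(ptr, 0);
}

/* Spin briefly, then wait; returns how many times the wait hook ran. */
static unsigned int spin_then_wait(_Atomic uint8_t *ptr, uint8_t busy_val,
				   wait_fn wait)
{
	unsigned int waits = 0;

	for (;;) {
		for (int loop = SPIN_THRESHOLD; loop; loop--) {
			if (atomic_load_explicit(ptr, memory_order_acquire) != busy_val)
				return waits;
		}
		wait(ptr, busy_val); /* give the pCPU back instead of burning it */
		waits++;
	}
}
```

pv_wait() re-checks the value before halting. This matters because a kick that arrives between the last spin and the halt would otherwise be lost.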
Reviewed-by: Leonardo Bras <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/Kconfig | 12 ++++++++++++
arch/riscv/kernel/Makefile | 1 +
2 files changed, 13 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index b7673c5c0997..7df3d50733c6 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -780,6 +780,18 @@ config RANDOMIZE_BASE
If unsure, say N.
+config PARAVIRT_SPINLOCKS
+ bool "Paravirtualization layer for spinlocks"
+ depends on QUEUED_SPINLOCKS
+ default y
+ help
+ Paravirtualized spinlocks allow the qspinlock to replace the unfair
+ test-and-set KVM guest virt spinlock with something
+ virtualization-friendly, for example, halting the virtual CPU rather
+ than spinning.
+
+ If you are unsure how to answer this question, answer Y.
+
endmenu # "Kernel features"
menu "Boot options"
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index fee22a3d1b53..2e0754e422cf 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -102,3 +102,4 @@ obj-$(CONFIG_COMPAT) += compat_vdso/
obj-$(CONFIG_64BIT) += pi/
obj-$(CONFIG_ACPI) += acpi.o
+obj-$(CONFIG_PARAVIRT_SPINLOCKS) += qspinlock_paravirt.o
--
2.40.1
From: Guo Ren <[email protected]>
Add tracepoints for pv_kick and pv_wait. Here is the output:
ls /sys/kernel/debug/tracing/events/paravirt/
enable filter pv_kick pv_wait
cat /sys/kernel/debug/tracing/trace
entries-in-buffer/entries-written: 33927/33927 #P:12
_-----=> irqs-off/BH-disabled
/ _----=> need-resched
| / _---=> hardirq/softirq
|| / _--=> preempt-depth
||| / _-=> migrate-disable
|||| / delay
TASK-PID CPU# ||||| TIMESTAMP FUNCTION
| | | ||||| | |
sh-100 [001] d..2. 28.312294: pv_wait: cpu 1 out of wfi
<idle>-0 [000] d.h4. 28.322030: pv_kick: cpu 0 kick target cpu 1
sh-100 [001] d..2. 30.982631: pv_wait: cpu 1 out of wfi
<idle>-0 [000] d.h4. 30.993289: pv_kick: cpu 0 kick target cpu 1
sh-100 [002] d..2. 44.987573: pv_wait: cpu 2 out of wfi
<idle>-0 [000] d.h4. 44.989000: pv_kick: cpu 0 kick target cpu 2
<idle>-0 [003] d.s3. 51.593978: pv_kick: cpu 3 kick target cpu 4
rcu_sched-15 [004] d..2. 51.595192: pv_wait: cpu 4 out of wfi
lock_torture_wr-115 [004] ...2. 52.656482: pv_kick: cpu 4 kick target cpu 2
lock_torture_wr-113 [002] d..2. 52.659146: pv_wait: cpu 2 out of wfi
lock_torture_wr-114 [008] d..2. 52.659507: pv_wait: cpu 8 out of wfi
lock_torture_wr-114 [008] d..2. 52.663503: pv_wait: cpu 8 out of wfi
lock_torture_wr-113 [002] ...2. 52.666128: pv_kick: cpu 2 kick target cpu 8
lock_torture_wr-114 [008] d..2. 52.667261: pv_wait: cpu 8 out of wfi
lock_torture_wr-114 [009] .n.2. 53.141515: pv_kick: cpu 9 kick target cpu 11
lock_torture_wr-113 [002] d..2. 53.143339: pv_wait: cpu 2 out of wfi
lock_torture_wr-116 [007] d..2. 53.143412: pv_wait: cpu 7 out of wfi
lock_torture_wr-118 [000] d..2. 53.143457: pv_wait: cpu 0 out of wfi
lock_torture_wr-115 [008] d..2. 53.143481: pv_wait: cpu 8 out of wfi
lock_torture_wr-117 [011] d..2. 53.143522: pv_wait: cpu 11 out of wfi
lock_torture_wr-117 [011] ...2. 53.143987: pv_kick: cpu 11 kick target cpu 8
lock_torture_wr-115 [008] ...2. 53.144269: pv_kick: cpu 8 kick target cpu 7
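To summarize a trace like the one above, one option is to parse the TP_printk() formats defined in trace_events_filter_paravirt.h. Below is a sketch that counts pv_kick events per target CPU from individual trace lines. MAX_CPUS is a hypothetical bound, and the input is assumed to be lines copied from the trace file.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CPUS 64 /* hypothetical histogram size */

/*
 * If line is a "pv_kick: cpu X kick target cpu Y" event, bump kicks[Y] and
 * return 1; otherwise (e.g. a pv_wait line) return 0.
 */
static int count_pv_kick(const char *line, unsigned int kicks[MAX_CPUS])
{
	const char *p = strstr(line, "pv_kick: ");
	int src, dst;

	if (!p || sscanf(p, "pv_kick: cpu %d kick target cpu %d", &src, &dst) != 2)
		return 0;
	if (dst < 0 || dst >= MAX_CPUS)
		return 0;
	kicks[dst]++;
	return 1;
}
```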
Reviewed-by: Leonardo Bras <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/kernel/qspinlock_paravirt.c | 8 +++
.../kernel/trace_events_filter_paravirt.h | 60 +++++++++++++++++++
2 files changed, 68 insertions(+)
create mode 100644 arch/riscv/kernel/trace_events_filter_paravirt.h
diff --git a/arch/riscv/kernel/qspinlock_paravirt.c b/arch/riscv/kernel/qspinlock_paravirt.c
index 4b04c93c4b9b..0e6d11357243 100644
--- a/arch/riscv/kernel/qspinlock_paravirt.c
+++ b/arch/riscv/kernel/qspinlock_paravirt.c
@@ -9,10 +9,16 @@
#include <asm/qspinlock_paravirt.h>
#include <asm/sbi.h>
+#define CREATE_TRACE_POINTS
+#include "trace_events_filter_paravirt.h"
+
void pv_kick(int cpu)
{
sbi_ecall(SBI_EXT_PVLOCK, SBI_EXT_PVLOCK_KICK_CPU,
cpuid_to_hartid_map(cpu), 0, 0, 0, 0, 0);
+
+ trace_pv_kick(smp_processor_id(), cpu);
+
return;
}
@@ -28,6 +34,8 @@ void pv_wait(u8 *ptr, u8 val)
goto out;
wait_for_interrupt();
+
+ trace_pv_wait(smp_processor_id());
out:
local_irq_restore(flags);
}
diff --git a/arch/riscv/kernel/trace_events_filter_paravirt.h b/arch/riscv/kernel/trace_events_filter_paravirt.h
new file mode 100644
index 000000000000..9ff5aa451b12
--- /dev/null
+++ b/arch/riscv/kernel/trace_events_filter_paravirt.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c), 2023 Alibaba Cloud
+ * Authors:
+ * Guo Ren <[email protected]>
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM paravirt
+
+#if !defined(_TRACE_PARAVIRT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PARAVIRT_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(pv_kick,
+ TP_PROTO(int cpu, int target),
+ TP_ARGS(cpu, target),
+
+ TP_STRUCT__entry(
+ __field(int, cpu)
+ __field(int, target)
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cpu;
+ __entry->target = target;
+ ),
+
+ TP_printk("cpu %d kick target cpu %d",
+ __entry->cpu,
+ __entry->target
+ )
+);
+
+TRACE_EVENT(pv_wait,
+ TP_PROTO(int cpu),
+ TP_ARGS(cpu),
+
+ TP_STRUCT__entry(
+ __field(int, cpu)
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cpu;
+ ),
+
+ TP_printk("cpu %d out of wfi",
+ __entry->cpu
+ )
+);
+
+#endif /* _TRACE_PARAVIRT_H || TRACE_HEADER_MULTI_READ */
+
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_PATH ../../../arch/riscv/kernel/
+#define TRACE_INCLUDE_FILE trace_events_filter_paravirt
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
2.40.1
Sorry, I missed people on the list.
FYI:
Waiman Long <[email protected]>
Boqun Feng <[email protected]>
Here is the link:
https://lore.kernel.org/linux-riscv/[email protected]/
On Mon, Dec 25, 2023 at 8:59 PM <[email protected]> wrote:
>
> From: Guo Ren <[email protected]>
>
> patch[1 - 8]: Native qspinlock
> patch[9 -14]: Paravirt qspinlock
>
> This series based on:
> - v6.7-rc7
> - Rework & improve riscv cmpxchg.h and atomic.h
> https://lore.kernel.org/linux-riscv/[email protected]/
>
> You can directly try it:
> https://github.com/guoren83/linux/tree/qspinlock_v12
>
> Native qspinlock
> ================
>
> This time we've validated the qspinlock on th1520 [1] & sg2042 [2], which
> gives stability and performance improvements. All T-HEAD processors have
> a stronger LR/SC forward progress guarantee than the ISA requires, which
> satisfies the xchg_tail of native_qspinlock. Now, qspinlock has been
> running with us for more than 1 year, and we have enough confidence to
> enable it for all the T-HEAD processors. Of course, we found a livelock
> problem with the qspinlock lock torture test, caused by the CPU store
> merge buffer delay mechanism: the queued spinlock became a dead ring and
> RCU warnings came out. We introduce a custom WRITE_ONCE to solve this,
> and the issue will be fixed in the next generation of hardware.
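To make the xchg_tail requirement concrete: with the 8-bit pending layout (NR_CPUS < 16K), the generic qspinlock swaps the 16-bit tail halfword. Assuming, as in the cmpxchg rework this series depends on, that RISC-V without sub-word AMOs emulates that halfword xchg with an LR/SC loop on the containing 32-bit word, the user-space C11 sketch below models it as an equivalent compare-and-swap loop. The loop only terminates if a retry eventually succeeds, which is the forward progress property discussed above. The macro names and helpers are local to this sketch, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Lock-word layout assumed here (generic qspinlock, NR_CPUS < 16K). */
#define Q_LOCKED_PENDING_MASK 0x0000ffffu /* bits 0-7 locked, bits 8-15 pending */
#define Q_TAIL_IDX_OFFSET     16          /* bits 16-17: per-CPU MCS node index */
#define Q_TAIL_CPU_OFFSET     18          /* bits 18-31: CPU number + 1 */

/* Build a tail value in the style of encode_tail(): 0 means "no waiter". */
uint32_t encode_tail(uint32_t cpu, uint32_t idx)
{
	assert(idx < 4);
	return ((cpu + 1) << Q_TAIL_CPU_OFFSET) | (idx << Q_TAIL_IDX_OFFSET);
}

/*
 * Replace the tail and return the previous lock word, leaving the locked
 * and pending bytes untouched. The loop retries whenever another CPU
 * changed the word in between, so the hardware must let a retry
 * eventually succeed (the forward progress requirement).
 */
uint32_t xchg_tail(_Atomic uint32_t *val, uint32_t tail)
{
	uint32_t old = atomic_load_explicit(val, memory_order_relaxed);
	uint32_t new;

	assert((tail & Q_LOCKED_PENDING_MASK) == 0);
	do {
		new = (old & Q_LOCKED_PENDING_MASK) | tail;
	} while (!atomic_compare_exchange_weak_explicit(val, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
	return old;
}
```

The returned old word is what the slow path uses to find the previous tail, if any, and link itself behind it.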
>
> We've tested the patch on SOPHGO sg2042 & th1520 and passed the stress
> test on Fedora & Ubuntu & OpenEuler ... Here is the performance
> comparison between qspinlock and ticket_lock on sg2042 (64 cores):
>
> sysbench test=threads threads=32 yields=100 lock=8 (+13.8%):
> queued_spinlock 0.5109/0.00
> ticket_spinlock 0.5814/0.00
>
> perf futex/hash (+6.7%):
> queued_spinlock 1444393 operations/sec (+- 0.09%)
> ticket_spinlock 1353215 operations/sec (+- 0.15%)
>
> perf futex/wake-parallel (+8.6%):
> queued_spinlock (waking 1/64 threads) in 0.0253 ms (+-2.90%)
> ticket_spinlock (waking 1/64 threads) in 0.0275 ms (+-3.12%)
>
> perf futex/requeue (+4.2%):
> queued_spinlock Requeued 64 of 64 threads in 0.0785 ms (+-0.55%)
> ticket_spinlock Requeued 64 of 64 threads in 0.0818 ms (+-4.12%)
>
>
> System Benchmarks (+6.4%)
> queued_spinlock:
> System Benchmarks Index Values BASELINE RESULT INDEX
> Dhrystone 2 using register variables 116700.0 628613745.4 53865.8
> Double-Precision Whetstone 55.0 182422.8 33167.8
> Execl Throughput 43.0 13116.6 3050.4
> File Copy 1024 bufsize 2000 maxblocks 3960.0 7762306.2 19601.8
> File Copy 256 bufsize 500 maxblocks 1655.0 3417556.8 20649.9
> File Copy 4096 bufsize 8000 maxblocks 5800.0 7427995.7 12806.9
> Pipe Throughput 12440.0 23058600.5 18535.9
> Pipe-based Context Switching 4000.0 2835617.7 7089.0
> Process Creation 126.0 12537.3 995.0
> Shell Scripts (1 concurrent) 42.4 57057.4 13456.9
> Shell Scripts (8 concurrent) 6.0 7367.1 12278.5
> System Call Overhead 15000.0 33308301.3 22205.5
> ========
> System Benchmarks Index Score 12426.1
>
> ticket_spinlock:
> System Benchmarks Index Values BASELINE RESULT INDEX
> Dhrystone 2 using register variables 116700.0 626541701.9 53688.2
> Double-Precision Whetstone 55.0 181921.0 33076.5
> Execl Throughput 43.0 12625.1 2936.1
> File Copy 1024 bufsize 2000 maxblocks 3960.0 6553792.9 16550.0
> File Copy 256 bufsize 500 maxblocks 1655.0 3189231.6 19270.3
> File Copy 4096 bufsize 8000 maxblocks 5800.0 7221277.0 12450.5
> Pipe Throughput 12440.0 20594018.7 16554.7
> Pipe-based Context Switching 4000.0 2571117.7 6427.8
> Process Creation 126.0 10798.4 857.0
> Shell Scripts (1 concurrent) 42.4 57227.5 13497.1
> Shell Scripts (8 concurrent) 6.0 7329.2 12215.3
> System Call Overhead 15000.0 30766778.4 20511.2
> ========
> System Benchmarks Index Score 11670.7
>
> On the 64-core SOPHGO SG2042 platform, the qspinlock shows a significant
> improvement over the ticket_lock.
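The percentages in the benchmark headers can be reproduced from the raw numbers. A small C sketch follows. It assumes, as the quoted figures suggest, that the sysbench, wake-parallel and requeue results are times (smaller is better) and that futex/hash and the System Benchmarks index are throughput-style scores (larger is better). Recomputed this way, wake-parallel and the index come out at about 8.7% and 6.5%, within 0.1 percentage point of the quoted 8.6% and 6.4%. The function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Speed-up when a smaller number is better (elapsed time, latency). */
double gain_lower_is_better(double ticket, double queued)
{
	assert(queued > 0.0);
	return ticket / queued - 1.0;
}

/* Speed-up when a larger number is better (operations/sec, index score). */
double gain_higher_is_better(double ticket, double queued)
{
	assert(ticket > 0.0);
	return queued / ticket - 1.0;
}

/* Print the gains for the numbers quoted in this cover letter. */
void print_gains(void)
{
	printf("sysbench threads     %+.1f%%\n",
	       100.0 * gain_lower_is_better(0.5814, 0.5109));
	printf("futex/hash           %+.1f%%\n",
	       100.0 * gain_higher_is_better(1353215, 1444393));
	printf("futex/wake-parallel  %+.1f%%\n",
	       100.0 * gain_lower_is_better(0.0275, 0.0253));
	printf("futex/requeue        %+.1f%%\n",
	       100.0 * gain_lower_is_better(0.0818, 0.0785));
	printf("System Benchmarks    %+.1f%%\n",
	       100.0 * gain_higher_is_better(11670.7, 12426.1));
}
```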
>
> Paravirt qspinlock
> ==================
>
> We implemented kvm_kick_cpu/kvm_wait_cpu, introduced a new SBI extension,
> SBI_EXT_PVLOCK, and added tracepoints to observe their behavior.
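The wait/kick contract behind the paravirt path can be modelled in user space. In the kernel, pv_wait(ptr, val) returns at once if *ptr no longer equals val and otherwise halts the hart with wfi until an interrupt arrives, and pv_kick(cpu) wakes the target through the SBI_EXT_PVLOCK kick call. The sketch below is only an analogy: it assumes a pthread condition variable stands in for wfi plus the kick IPI, and struct vcpu_slot, model_pv_wait(), model_pv_kick() and waiter_thread() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a halted vCPU: a condition variable plays wfi + IPI. */
struct vcpu_slot {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            kicked;
};

#define VCPU_SLOT_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/*
 * Like pv_wait(ptr, val): return at once if *ptr no longer equals val,
 * otherwise sleep until kicked. Callers must re-check their condition,
 * because a stale kick can also end the wait early.
 */
void model_pv_wait(struct vcpu_slot *self, _Atomic uint8_t *ptr, uint8_t val)
{
	pthread_mutex_lock(&self->lock);
	while (atomic_load(ptr) == val && !self->kicked)
		pthread_cond_wait(&self->cond, &self->lock);
	self->kicked = false;
	pthread_mutex_unlock(&self->lock);
}

/* Like pv_kick(cpu): the SBI kick call becomes a condition signal. */
void model_pv_kick(struct vcpu_slot *target)
{
	pthread_mutex_lock(&target->lock);
	target->kicked = true;
	pthread_cond_signal(&target->cond);
	pthread_mutex_unlock(&target->lock);
}

/* Usage helper: a waiter that sleeps while *state == 0. */
struct waiter_arg {
	struct vcpu_slot *slot;
	_Atomic uint8_t  *state;
};

void *waiter_thread(void *p)
{
	struct waiter_arg *a = p;

	model_pv_wait(a->slot, a->state, 0);
	return NULL;
}
```

Checking the value under the waiter's own mutex before sleeping is what prevents a lost wakeup here, playing the role of the local_irq_save() window around wfi in the real pv_wait().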
>
> Changelog:
> V12:
> - Remove force thead qspinlock with errata
> - Separate Zicbop patch from this series
> - Remove cpus >= 16 patch
> - Cleanup rebase and move it on v6.7-rc7
> - Reorder the code structure following the advice on the last version.
>
> V11:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Based on Leonardo Bras's cmpxchg_small patches v5.
> - Based on Guo Ren's Optimize arch_spin_value_unlocked patch v3.
> - Remove abusing alternative framework and use jump_label instead.
> - Introduce prefetch.w to improve T-HEAD processors' LR/SC forward
> progress guarantee.
> - Optimize qspinlock xchg_tail when NR_CPUS >= 16K.
>
> V10:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Using an alternative framework instead of static_key_branch in the
> asm/spinlock.h.
> - Fixup store merge buffer problem, which causes qspinlock lock
> torture test livelock.
> - Add paravirt qspinlock support, include KVM backend
> - Add Compact NUMA-aware qspinlock support
>
> V9:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Cleanup generic ticket-lock code, (Using smp_mb__after_spinlock as
> RCsc)
> - Add qspinlock and combo-lock for riscv
> - Add qspinlock to openrisc
> - Use generic header in csky
> - Optimize cmpxchg & atomic code
>
> V8:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Coding convention ticket fixup
> - Move combo spinlock into riscv and simplify asm-generic/spinlock.h
> - Fixup xchg16 with wrong return value
> - Add csky qspinlock
> - Add combo & qspinlock & ticket-lock comparison
> - Clean up unnecessary riscv acquire and release definitions
> - Enable ARCH_INLINE_READ*/WRITE*/SPIN* for riscv & csky
>
> V7:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Add combo spinlock (ticket & queued) support
> - Rename ticket_spinlock.h
> - Remove unnecessary atomic_read in ticket_spin_value_unlocked
>
> V6:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Fixup Clang compile problem Reported-by: kernel test robot
> - Cleanup asm-generic/spinlock.h
> - Remove changelog in patch main comment part, suggested by
> Conor.Dooley
> - Remove "default y if NUMA" in Kconfig
>
> V5:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Update comment with the RISC-V forward progress guarantee feature.
> - Back to V3 direction and optimize asm code.
>
> V4:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Remove custom sub-word xchg implementation
> - Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32 in locking/qspinlock
>
> V3:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Coding convention fixes following Peter Zijlstra's advice
>
> V2:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Coding convention in cmpxchg.h
> - Re-implement short xchg
> - Remove char & cmpxchg implementations
>
> V1:
> https://lore.kernel.org/linux-riscv/[email protected]/
> - Using cmpxchg loop to implement sub-word atomic
>
> Guo Ren (14):
> asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
> asm-generic: ticket-lock: Add separate ticket-lock.h
> riscv: errata: Move errata vendor func-id into vendorid_list.h
> riscv: qspinlock: errata: Add ERRATA_THEAD_WRITE_ONCE fixup
> riscv: qspinlock: Add basic queued_spinlock support
> riscv: qspinlock: Introduce combo spinlock
> riscv: qspinlock: Add virt_spin_lock() support for VM guest
> riscv: qspinlock: Force virt_spin_lock for KVM guests
> RISC-V: paravirt: Add pvqspinlock KVM backend
> RISC-V: paravirt: Add pvqspinlock frontend skeleton
> RISC-V: paravirt: pvqspinlock: Add SBI implementation
> RISC-V: paravirt: pvqspinlock: Add nopvspin kernel parameter
> RISC-V: paravirt: pvqspinlock: Add kconfig entry
> RISC-V: paravirt: pvqspinlock: Add trace point for pv_kick/wait
>
> .../admin-guide/kernel-parameters.txt | 8 +-
> arch/riscv/Kconfig | 35 ++++++
> arch/riscv/Kconfig.errata | 19 ++++
> arch/riscv/errata/thead/errata.c | 20 ++++
> arch/riscv/include/asm/Kbuild | 3 +-
> arch/riscv/include/asm/errata_list.h | 18 ---
> arch/riscv/include/asm/kvm_vcpu_sbi.h | 1 +
> arch/riscv/include/asm/qspinlock.h | 35 ++++++
> arch/riscv/include/asm/qspinlock_paravirt.h | 29 +++++
> arch/riscv/include/asm/rwonce.h | 31 ++++++
> arch/riscv/include/asm/sbi.h | 14 +++
> arch/riscv/include/asm/spinlock.h | 88 +++++++++++++++
> arch/riscv/include/asm/vendorid_list.h | 19 ++++
> arch/riscv/include/uapi/asm/kvm.h | 1 +
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/qspinlock_paravirt.c | 83 ++++++++++++++
> arch/riscv/kernel/sbi.c | 2 +-
> arch/riscv/kernel/setup.c | 68 ++++++++++++
> .../kernel/trace_events_filter_paravirt.h | 60 ++++++++++
> arch/riscv/kvm/Makefile | 1 +
> arch/riscv/kvm/vcpu_sbi.c | 4 +
> arch/riscv/kvm/vcpu_sbi_pvlock.c | 57 ++++++++++
> include/asm-generic/qspinlock.h | 2 +
> include/asm-generic/rwonce.h | 2 +
> include/asm-generic/spinlock.h | 87 +--------------
> include/asm-generic/spinlock_types.h | 12 +-
> include/asm-generic/ticket_spinlock.h | 105 ++++++++++++++++++
> 27 files changed, 688 insertions(+), 117 deletions(-)
> create mode 100644 arch/riscv/include/asm/qspinlock.h
> create mode 100644 arch/riscv/include/asm/qspinlock_paravirt.h
> create mode 100644 arch/riscv/include/asm/rwonce.h
> create mode 100644 arch/riscv/include/asm/spinlock.h
> create mode 100644 arch/riscv/kernel/qspinlock_paravirt.c
> create mode 100644 arch/riscv/kernel/trace_events_filter_paravirt.h
> create mode 100644 arch/riscv/kvm/vcpu_sbi_pvlock.c
> create mode 100644 include/asm-generic/ticket_spinlock.h
>
> --
> 2.40.1
>
>
--
Best Regards
Guo Ren
On Mon, Dec 25, 2023 at 07:58:36AM -0500, [email protected] wrote:
> From: Guo Ren <[email protected]>
>
> Move errata vendor func-id definitions from errata_list.h into
> vendorid_list.h. Unifying these definitions also prepares for the
> following rwonce errata implementation.
>
> Suggested-by: Leonardo Bras <[email protected]>
> Link: https://lore.kernel.org/linux-riscv/[email protected]/
> Signed-off-by: Guo Ren <[email protected]>
> Signed-off-by: Guo Ren <[email protected]>
> ---
> arch/riscv/include/asm/errata_list.h | 18 ------------------
> arch/riscv/include/asm/vendorid_list.h | 18 ++++++++++++++++++
> 2 files changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 83ed25e43553..31bbd9840e97 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -11,24 +11,6 @@
> #include <asm/hwcap.h>
> #include <asm/vendorid_list.h>
>
> -#ifdef CONFIG_ERRATA_ANDES
> -#define ERRATA_ANDESTECH_NO_IOCP 0
> -#define ERRATA_ANDESTECH_NUMBER 1
> -#endif
> -
> -#ifdef CONFIG_ERRATA_SIFIVE
> -#define ERRATA_SIFIVE_CIP_453 0
> -#define ERRATA_SIFIVE_CIP_1200 1
> -#define ERRATA_SIFIVE_NUMBER 2
> -#endif
> -
> -#ifdef CONFIG_ERRATA_THEAD
> -#define ERRATA_THEAD_PBMT 0
> -#define ERRATA_THEAD_CMO 1
> -#define ERRATA_THEAD_PMU 2
> -#define ERRATA_THEAD_NUMBER 3
> -#endif
> -
> #ifdef __ASSEMBLY__
>
> #define ALT_INSN_FAULT(x) \
> diff --git a/arch/riscv/include/asm/vendorid_list.h b/arch/riscv/include/asm/vendorid_list.h
> index e55407ace0c3..c503373193d2 100644
> --- a/arch/riscv/include/asm/vendorid_list.h
> +++ b/arch/riscv/include/asm/vendorid_list.h
> @@ -9,4 +9,22 @@
> #define SIFIVE_VENDOR_ID 0x489
> #define THEAD_VENDOR_ID 0x5b7
>
> +#ifdef CONFIG_ERRATA_ANDES
> +#define ERRATA_ANDESTECH_NO_IOCP 0
> +#define ERRATA_ANDESTECH_NUMBER 1
> +#endif
> +
> +#ifdef CONFIG_ERRATA_SIFIVE
> +#define ERRATA_SIFIVE_CIP_453 0
> +#define ERRATA_SIFIVE_CIP_1200 1
> +#define ERRATA_SIFIVE_NUMBER 2
> +#endif
> +
> +#ifdef CONFIG_ERRATA_THEAD
> +#define ERRATA_THEAD_PBMT 0
> +#define ERRATA_THEAD_CMO 1
> +#define ERRATA_THEAD_PMU 2
> +#define ERRATA_THEAD_NUMBER 3
> +#endif
> +
> #endif
> --
> 2.40.1
>
LGTM:
Reviewed-by: Leonardo Bras <[email protected]>
On Mon, Dec 25, 2023 at 07:58:39AM -0500, [email protected] wrote:
> From: Guo Ren <[email protected]>
>
> The combo spinlock supports both queued and ticket locks in one Linux
> image and selects between them at boot time via the command line. Here
> is a comparison of function sizes (bytes):
>
> TYPE                     : COMBO | TICKET | QUEUED
> arch_spin_lock           :  106  |   60   |   50
> arch_spin_unlock         :   54  |   36   |   26
> arch_spin_trylock        :  110  |   72   |   54
> arch_spin_is_locked      :   48  |   34   |   20
> arch_spin_is_contended   :   56  |   40   |   24
> arch_spin_value_unlocked :   48  |   34   |   24
>
> An example disassembly of the combo arch_spin_unlock:
> <+14>: nop # detour slot
> <+18>: fence rw,w --+-> queued_spin_unlock
> <+22>: sb zero,0(a4) --+ (2 instructions)
> <+26>: ld s0,8(sp)
> <+28>: addi sp,sp,16
> <+30>: ret
> <+32>: lw a5,0(a4) --+-> ticket_spin_unlock
> <+34>: sext.w a5,a5 | (7 instructions)
> <+36>: fence rw,w |
> <+40>: addiw a5,a5,1 |
> <+42>: slli a5,a5,0x30 |
> <+44>: srli a5,a5,0x30 |
> <+46>: sh a5,0(a4) --+
> <+50>: ld s0,8(sp)
> <+52>: addi sp,sp,16
> <+54>: ret
> The qspinlock is smaller and faster than the ticket-lock when both are
> on their fast paths.
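The size difference in the two unlock paths of the disassembly can be mirrored in portable C11. The ticket unlock reads the owner half, adds one, truncates to 16 bits and release-stores it. The queued fast-path unlock is a single release store of zero to the locked byte, which is what "fence rw,w; sb zero" implements. For portability this sketch splits the ticket word into two separate atomic halves, whereas the kernel keeps a single atomic_t; all names here are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Ticket lock: owner = ticket being served, next = next ticket to hand out. */
struct ticket_lock {
	_Atomic uint16_t owner;
	_Atomic uint16_t next;
};

void ticket_acquire(struct ticket_lock *l)
{
	uint16_t ticket = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

	/* Wait for our turn; the kernel uses atomic_cond_read_acquire(). */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != ticket)
		;
}

/* Read owner, add 1, truncate to 16 bits, release-store the halfword. */
void ticket_release(struct ticket_lock *l)
{
	uint16_t owner = atomic_load_explicit(&l->owner, memory_order_relaxed);

	atomic_store_explicit(&l->owner, (uint16_t)(owner + 1), memory_order_release);
}

/* Contended when more than one ticket is outstanding (holder + waiter). */
int ticket_is_contended(struct ticket_lock *l)
{
	uint16_t owner = atomic_load_explicit(&l->owner, memory_order_relaxed);
	uint16_t next = atomic_load_explicit(&l->next, memory_order_relaxed);

	return (int16_t)(uint16_t)(next - owner) > 1;
}

/* Queued-lock fast path: only the locked byte matters when uncontended. */
struct queued_lock_fastpath {
	_Atomic uint8_t locked;
};

/* A single release store of 0, as in the "fence rw,w; sb zero" pair. */
void queued_release(struct queued_lock_fastpath *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```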
>
> The combo spinlock provides a single Linux image for micro-arch designs
> with or without a forward progress guarantee. Use the command line to
> select between qspinlock and ticket-lock; the default is ticket-lock.
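The setup.c comment in this patch explains why the static key starts as qspinlock and may only switch to ticket-lock, never the reverse. The reason is how each lock reads the same 32-bit word. An idle qspinlock word (no owner, no pending bit, no queued waiters) is 0, which a ticket lock also reads as unlocked because next == owner. However, one ticket lock/unlock cycle leaves 0x00010001, where the ticket lock still sees an unlocked word but a qspinlock would see the locked byte set and a non-zero tail. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Idle qspinlock word: locked byte, pending byte and tail all zero. */
bool queued_word_idle(uint32_t val)
{
	return val == 0;
}

/* Ticket lock is unlocked whenever next (high half) == owner (low half). */
bool ticket_value_unlocked(uint32_t val)
{
	return (val >> 16) == (val & 0xffff);
}

/* Lock word after one uncontended ticket lock + unlock cycle. */
uint32_t ticket_lock_unlock_cycle(uint32_t val)
{
	assert(ticket_value_unlocked(val));
	val += 1u << 16;                                      /* lock: take a ticket */
	return (val & 0xffff0000u) | ((val + 1) & 0xffffu);   /* unlock: owner + 1 */
}
```

So a word last used by the ticket lock may look busy to the qspinlock, while a word left idle by the qspinlock is always valid for the ticket lock.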
>
> Signed-off-by: Guo Ren <[email protected]>
> Signed-off-by: Guo Ren <[email protected]>
> ---
> .../admin-guide/kernel-parameters.txt | 2 +
> arch/riscv/Kconfig | 9 +++-
> arch/riscv/include/asm/spinlock.h | 48 +++++++++++++++++++
> arch/riscv/kernel/setup.c | 34 +++++++++++++
> include/asm-generic/qspinlock.h | 2 +
> include/asm-generic/ticket_spinlock.h | 2 +
> 6 files changed, 96 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 65731b060e3f..2ac9f1511774 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4753,6 +4753,8 @@
> [KNL] Number of legacy pty's. Overwrites compiled-in
> default number.
>
> + qspinlock [RISCV] Use native qspinlock.
> +
> quiet [KNL] Disable most log messages
>
> r128= [HW,DRM]
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index f345df0763b2..b7673c5c0997 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -434,7 +434,7 @@ config NODES_SHIFT
>
> choice
> prompt "RISC-V spinlock type"
> - default RISCV_TICKET_SPINLOCKS
> + default RISCV_COMBO_SPINLOCKS
>
> config RISCV_TICKET_SPINLOCKS
> bool "Using ticket spinlock"
> @@ -446,6 +446,13 @@ config RISCV_QUEUED_SPINLOCKS
> help
> Make sure your micro arch give cmpxchg/xchg forward progress
> guarantee. Otherwise, stay at ticket-lock.
> +
> +config RISCV_COMBO_SPINLOCKS
> + bool "Using combo spinlock"
> + depends on SMP && MMU
> + select ARCH_USE_QUEUED_SPINLOCKS
> + help
> + Select queued spinlock or ticket-lock by cmdline.
> endchoice
>
> config RISCV_ALTERNATIVE
> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> index 98a3da4b1056..d07643c07aae 100644
> --- a/arch/riscv/include/asm/spinlock.h
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -7,12 +7,60 @@
> #define _Q_PENDING_LOOPS (1 << 9)
> #endif
>
> +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> +#define __no_arch_spinlock_redefine
> +#include <asm/ticket_spinlock.h>
> +#include <asm/qspinlock.h>
> +#include <linux/jump_label.h>
> +
> +DECLARE_STATIC_KEY_TRUE(combo_qspinlock_key);
> +
> +#define COMBO_SPINLOCK_BASE_DECLARE(op) \
> +static __always_inline void arch_spin_##op(arch_spinlock_t *lock) \
> +{ \
> + if (static_branch_likely(&combo_qspinlock_key)) \
> + queued_spin_##op(lock); \
> + else \
> + ticket_spin_##op(lock); \
> +}
> +COMBO_SPINLOCK_BASE_DECLARE(lock)
> +COMBO_SPINLOCK_BASE_DECLARE(unlock)
> +
> +#define COMBO_SPINLOCK_IS_DECLARE(op) \
> +static __always_inline int arch_spin_##op(arch_spinlock_t *lock) \
> +{ \
> + if (static_branch_likely(&combo_qspinlock_key)) \
> + return queued_spin_##op(lock); \
> + else \
> + return ticket_spin_##op(lock); \
> +}
> +COMBO_SPINLOCK_IS_DECLARE(is_locked)
> +COMBO_SPINLOCK_IS_DECLARE(is_contended)
> +
> +static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
> +{
> + if (static_branch_likely(&combo_qspinlock_key))
> + return queued_spin_trylock(lock);
> + else
> + return ticket_spin_trylock(lock);
> +}
> +
> +static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> +{
> + if (static_branch_likely(&combo_qspinlock_key))
> + return queued_spin_value_unlocked(lock);
> + else
> + return ticket_spin_value_unlocked(lock);
> +}
> +
Hello Guo Ren,
The above is much better than v11, but can be improved as I mentioned in
my reply in v11. Okay, I noticed there is a type issue: some return int,
others return bool and some return void, and arch_spin_value_unlocked()
takes the lock by value instead of by pointer, but it can be improved:
+#define COMBO_SPINLOCK_DECLARE(op, type, type_lock) \
+static __always_inline type arch_spin_##op(type_lock lock) \
+{ \
+ if (static_branch_likely(&combo_qspinlock_key)) \
+ return queued_spin_##op(lock); \
+ else \
+ return ticket_spin_##op(lock); \
+}
+
+COMBO_SPINLOCK_DECLARE(lock, void, arch_spinlock_t *)
+COMBO_SPINLOCK_DECLARE(unlock, void, arch_spinlock_t *)
+COMBO_SPINLOCK_DECLARE(is_locked, int, arch_spinlock_t *)
+COMBO_SPINLOCK_DECLARE(is_contended, int, arch_spinlock_t *)
+COMBO_SPINLOCK_DECLARE(value_unlocked, int, arch_spinlock_t)
+COMBO_SPINLOCK_DECLARE(trylock, bool, arch_spinlock_t *)
===
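IIRC returning a void f1() from a void f2() is fine in practice: ISO C
forbids it, but GCC and Clang accept it and only warn under -pedantic:
void f1() {}
void f2() {
return f1(); /* <- accepted by GCC/Clang, warns with -pedantic :) */
}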
===
> +#else /* CONFIG_RISCV_COMBO_SPINLOCKS */
> #ifdef CONFIG_QUEUED_SPINLOCKS
> #include <asm/qspinlock.h>
> #else
> #include <asm/ticket_spinlock.h>
> #endif
>
> +#endif /* CONFIG_RISCV_COMBO_SPINLOCKS */
> #include <asm/qrwlock.h>
>
> #endif /* __ASM_RISCV_SPINLOCK_H */
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 535a837de55d..d9072a59831c 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -246,6 +246,37 @@ static void __init parse_dtb(void)
> #endif
> }
>
> +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> +static bool enable_qspinlock __ro_after_init;
> +static int __init queued_spinlock_setup(char *p)
> +{
> + enable_qspinlock = true;
> +
> + return 0;
> +}
> +early_param("qspinlock", queued_spinlock_setup);
> +
> +/*
> + * Ticket-lock would dirty the lock value, so force qspinlock at
> + * first and switch to ticket-lock later.
> + * - key is true : qspinlock -> qspinlock (no change)
> + * - key is false: qspinlock -> ticket-lock
> + * (No ticket-lock -> qspinlock)
> + */
> +DEFINE_STATIC_KEY_TRUE(combo_qspinlock_key);
> +EXPORT_SYMBOL(combo_qspinlock_key);
> +
> +static void __init riscv_spinlock_init(void)
> +{
> + if (!enable_qspinlock) {
> + static_branch_disable(&combo_qspinlock_key);
> + pr_info("Ticket spinlock: enabled\n");
> + } else {
> + pr_info("Queued spinlock: enabled\n");
> + }
> +}
> +#endif
> +
> extern void __init init_rt_signal_env(void);
>
> void __init setup_arch(char **cmdline_p)
> @@ -297,6 +328,9 @@ void __init setup_arch(char **cmdline_p)
> riscv_set_dma_cache_alignment();
>
> riscv_user_isa_enable();
> +#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> + riscv_spinlock_init();
> +#endif
> }
>
> static int __init topology_init(void)
> diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
> index 0655aa5b57b2..bf47cca2c375 100644
> --- a/include/asm-generic/qspinlock.h
> +++ b/include/asm-generic/qspinlock.h
> @@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> }
> #endif
>
> +#ifndef __no_arch_spinlock_redefine
> /*
> * Remapping spinlock architecture specific functions to the corresponding
> * queued spinlock functions.
> @@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
> #define arch_spin_lock(l) queued_spin_lock(l)
> #define arch_spin_trylock(l) queued_spin_trylock(l)
> #define arch_spin_unlock(l) queued_spin_unlock(l)
> +#endif
>
> #endif /* __ASM_GENERIC_QSPINLOCK_H */
> diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
> index cfcff22b37b3..325779970d8a 100644
> --- a/include/asm-generic/ticket_spinlock.h
> +++ b/include/asm-generic/ticket_spinlock.h
> @@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> return (s16)((val >> 16) - (val & 0xffff)) > 1;
> }
>
> +#ifndef __no_arch_spinlock_redefine
> /*
> * Remapping spinlock architecture specific functions to the corresponding
> * ticket spinlock functions.
> @@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
> #define arch_spin_lock(l) ticket_spin_lock(l)
> #define arch_spin_trylock(l) ticket_spin_trylock(l)
> #define arch_spin_unlock(l) ticket_spin_unlock(l)
> +#endif
>
> #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
> --
> 2.40.1
>
On Mon, Dec 25, 2023 at 07:58:40AM -0500, [email protected] wrote:
> From: Guo Ren <[email protected]>
>
> Add a static key controlling whether virt_spin_lock() should be
> called or not. When running on bare metal set the new key to
> false.
>
> The VM guests should fall back to a Test-and-Set spinlock,
> because fair locks have horrible lock 'holder' preemption issues.
> The virt_spin_lock_key provides a shortcut in the
> queued_spin_lock_slowpath() function that allows virt_spin_lock() to
> hijack it.
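The fallback that virt_spin_lock() installs is a plain test-and-set loop: read until the word is 0, then try to claim it with one compare-and-swap. The C11 sketch below mirrors that loop in user space. It assumes a release store of the whole word is an acceptable unlock, which holds here because this path never sets the pending or tail bits. tas_lock(), tas_unlock(), struct counter and bump_counter() are hypothetical names, and the kernel's cmpxchg is fully ordered where this sketch only uses acquire.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define Q_LOCKED_VAL 1u  /* locked byte set, like _Q_LOCKED_VAL */

/*
 * Test-and-set lock in the style of virt_spin_lock(): spin on plain loads
 * until the word reads 0, then try to claim it with one CAS. There is no
 * FIFO hand-off, so nobody waits for a specific (possibly preempted) vCPU
 * to take its turn.
 */
void tas_lock(_Atomic uint32_t *val)
{
	uint32_t expected;

	do {
		while (atomic_load_explicit(val, memory_order_relaxed) != 0)
			;	/* cpu_relax() in the kernel */
		expected = 0;
	} while (!atomic_compare_exchange_strong_explicit(val, &expected, Q_LOCKED_VAL,
							  memory_order_acquire,
							  memory_order_relaxed));
}

/* Whole-word release store; the TAS path never sets pending or tail bits. */
void tas_unlock(_Atomic uint32_t *val)
{
	atomic_store_explicit(val, 0, memory_order_release);
}

/* Usage helper: bump a shared counter under the lock. */
struct counter {
	_Atomic uint32_t lock;
	long value;
};

void *bump_counter(void *arg)
{
	struct counter *c = arg;

	for (int i = 0; i < 10000; i++) {
		tas_lock(&c->lock);
		c->value++;
		tas_unlock(&c->lock);
	}
	return NULL;
}
```

The trade-off is fairness: a test-and-set lock can starve a waiter, which is why the patch enables it only for guests and lets no_virt_spin turn it off.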
>
> Signed-off-by: Guo Ren <[email protected]>
> Signed-off-by: Guo Ren <[email protected]>
> ---
> .../admin-guide/kernel-parameters.txt | 4 +++
> arch/riscv/include/asm/spinlock.h | 22 ++++++++++++++++
> arch/riscv/kernel/setup.c | 26 +++++++++++++++++++
> 3 files changed, 52 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 2ac9f1511774..b7794c96d91e 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3997,6 +3997,10 @@
> no_uaccess_flush
> [PPC] Don't flush the L1-D cache after accessing user data.
>
> + no_virt_spin [RISC-V] Disable virt_spin_lock in VM guest to use
> + native_queued_spinlock when the nopvspin option is enabled.
> + This would help vcpu=pcpu scenarios.
> +
> novmcoredd [KNL,KDUMP]
> Disable device dump. Device dump allows drivers to
> append dump data to vmcore so you can collect driver
> diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
> index d07643c07aae..7bbcf3d9fff0 100644
> --- a/arch/riscv/include/asm/spinlock.h
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -4,6 +4,28 @@
> #define __ASM_RISCV_SPINLOCK_H
>
> #ifdef CONFIG_QUEUED_SPINLOCKS
> +/*
> + * KVM guests fall back to a Test-and-Set spinlock, because fair locks
> + * have horrible lock 'holder' preemption issues. The virt_spin_lock_key
> + * provides a shortcut in queued_spin_lock_slowpath() that allows
> + * virt_spin_lock() to hijack it.
> + */
> +DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> +
> +#define virt_spin_lock virt_spin_lock
> +static inline bool virt_spin_lock(struct qspinlock *lock)
> +{
> + if (!static_branch_likely(&virt_spin_lock_key))
> + return false;
> +
> + do {
> + while (atomic_read(&lock->val) != 0)
> + cpu_relax();
> + } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> +
> + return true;
> +}
> +
> #define _Q_PENDING_LOOPS (1 << 9)
> #endif
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index d9072a59831c..0bafb9fd6ea3 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -27,6 +27,7 @@
> #include <asm/cacheflush.h>
> #include <asm/cpufeature.h>
> #include <asm/cpu_ops.h>
> +#include <asm/cpufeature.h>
> #include <asm/early_ioremap.h>
> #include <asm/pgtable.h>
> #include <asm/setup.h>
> @@ -266,6 +267,27 @@ early_param("qspinlock", queued_spinlock_setup);
> DEFINE_STATIC_KEY_TRUE(combo_qspinlock_key);
> EXPORT_SYMBOL(combo_qspinlock_key);
>
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +static bool no_virt_spin __ro_after_init;
> +static int __init no_virt_spin_setup(char *p)
> +{
> + no_virt_spin = true;
> +
> + return 0;
> +}
> +early_param("no_virt_spin", no_virt_spin_setup);
> +
> +DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
> +
> +static void __init virt_spin_lock_init(void)
> +{
> + if (no_virt_spin)
> + static_branch_disable(&virt_spin_lock_key);
> + else
> + pr_info("Enable virt_spin_lock\n");
> +}
> +#endif
> +
> static void __init riscv_spinlock_init(void)
> {
> if (!enable_qspinlock) {
> @@ -274,6 +296,10 @@ static void __init riscv_spinlock_init(void)
> } else {
> pr_info("Queued spinlock: enabled\n");
> }
> +
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> + virt_spin_lock_init();
> +#endif
> }
> #endif
>
> --
> 2.40.1
>
LGTM:
Reviewed-by: Leonardo Bras <[email protected]>
On Mon, Dec 25, 2023 at 07:58:41AM -0500, [email protected] wrote:
> From: Guo Ren <[email protected]>
>
> Force-enable virt_spin_lock in KVM guests, because fair locks
> have horrible lock 'holder' preemption issues.
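The net effect of the setup.c changes in this patch (combo spinlock build with CONFIG_QUEUED_SPINLOCKS=y) can be restated as a pure function of the SBI firmware ID and the two command-line options. In this sketch, choose_spinlock() and struct lock_choice are hypothetical names; the kernel flips combo_qspinlock_key and virt_spin_lock_key instead of returning a struct. The implementation IDs follow the enum added to sbi.h below.

```c
#include <assert.h>
#include <stdbool.h>

/* SBI implementation IDs, as in the sbi_ext_base_impl_id enum of this patch. */
enum sbi_impl_id { IMPL_BBL = 0, IMPL_OPENSBI = 1, IMPL_XVISOR = 2, IMPL_KVM = 3 };

struct lock_choice {
	bool qspinlock;  /* combo_qspinlock_key left enabled */
	bool virt_spin;  /* virt_spin_lock_key left enabled */
};

/* Restates riscv_spinlock_init() + virt_spin_lock_init() after this patch. */
struct lock_choice choose_spinlock(long firmware_id, bool cmdline_qspinlock,
				   bool cmdline_no_virt_spin)
{
	bool is_kvm = firmware_id == IMPL_KVM;
	struct lock_choice c;

	/* Ticket-lock only when not asked for qspinlock and not a KVM guest. */
	c.qspinlock = cmdline_qspinlock || is_kvm;
	/* Non-KVM firmware forces no_virt_spin; KVM honours the option. */
	c.virt_spin = is_kvm && !cmdline_no_virt_spin;
	return c;
}
```

In other words, a KVM guest always runs the queued lock, with the test-and-set fallback unless no_virt_spin is given, while bare metal keeps the ticket-lock default.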
>
> Suggested-by: Leonardo Bras <[email protected]>
> Link: https://lkml.kernel.org/kvm/[email protected]/
> Signed-off-by: Guo Ren <[email protected]>
> Signed-off-by: Guo Ren <[email protected]>
> ---
> arch/riscv/include/asm/sbi.h | 8 ++++++++
> arch/riscv/kernel/sbi.c | 2 +-
> arch/riscv/kernel/setup.c | 6 +++++-
> 3 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 0892f4421bc4..8f748d9e1b85 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -51,6 +51,13 @@ enum sbi_ext_base_fid {
> SBI_EXT_BASE_GET_MIMPID,
> };
>
> +enum sbi_ext_base_impl_id {
> + SBI_EXT_BASE_IMPL_ID_BBL = 0,
> + SBI_EXT_BASE_IMPL_ID_OPENSBI,
> + SBI_EXT_BASE_IMPL_ID_XVISOR,
> + SBI_EXT_BASE_IMPL_ID_KVM,
> +};
> +
> enum sbi_ext_time_fid {
> SBI_EXT_TIME_SET_TIMER = 0,
> };
> @@ -276,6 +283,7 @@ int sbi_console_getchar(void);
> long sbi_get_mvendorid(void);
> long sbi_get_marchid(void);
> long sbi_get_mimpid(void);
> +long sbi_get_firmware_id(void);
> void sbi_set_timer(uint64_t stime_value);
> void sbi_shutdown(void);
> void sbi_send_ipi(unsigned int cpu);
> diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> index 5a62ed1da453..4330aedf65fd 100644
> --- a/arch/riscv/kernel/sbi.c
> +++ b/arch/riscv/kernel/sbi.c
> @@ -543,7 +543,7 @@ static inline long sbi_get_spec_version(void)
> return __sbi_base_ecall(SBI_EXT_BASE_GET_SPEC_VERSION);
> }
>
> -static inline long sbi_get_firmware_id(void)
> +long sbi_get_firmware_id(void)
> {
> return __sbi_base_ecall(SBI_EXT_BASE_GET_IMP_ID);
> }
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 0bafb9fd6ea3..e33430e9d97e 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -281,6 +281,9 @@ DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
>
> static void __init virt_spin_lock_init(void)
> {
> + if (sbi_get_firmware_id() != SBI_EXT_BASE_IMPL_ID_KVM)
> + no_virt_spin = true;
> +
> if (no_virt_spin)
> static_branch_disable(&virt_spin_lock_key);
> else
> @@ -290,7 +293,8 @@ static void __init virt_spin_lock_init(void)
>
> static void __init riscv_spinlock_init(void)
> {
> - if (!enable_qspinlock) {
> + if ((!enable_qspinlock) &&
> + (sbi_get_firmware_id() != SBI_EXT_BASE_IMPL_ID_KVM)) {
> static_branch_disable(&combo_qspinlock_key);
> pr_info("Ticket spinlock: enabled\n");
> } else {
> --
> 2.40.1
>
LGTM:
Reviewed-by: Leonardo Bras <[email protected]>