This series selects ARCH_USE_CMPXCHG_LOCKREF to enable the
cmpxchg-based lockless lockref implementation for riscv. Then,
implement arch_cmpxchg64_{relaxed|acquire|release}.
After patch1:
Using Linus' test case[1] on TH1520 platform, I see a 11.2% improvement.
On JH7110 platform, I see 12.0% improvement.
After patch2:
on both TH1520 and JH7110 platforms, I didn't see obvious
performance improvement with Linus' test case [1]. IMHO, this may
be related with the fence and lr.d/sc.d hw implementations. In theory,
lr/sc without fence could give performance improvement over lr/sc plus
fence, so add the code here to leave performance improvement room on
newer HW platforms.
Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1]
Since v1:
- only select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
Jisheng Zhang (2):
riscv: select ARCH_USE_CMPXCHG_LOCKREF
riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release}
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/cmpxchg.h | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)
--
2.42.0
After selecting ARCH_USE_CMPXCHG_LOCKREF, one straight futher
optimization is implementing the arch_cmpxchg64_relaxed() because the
lockref code does not need the cmpxchg to have barrier semantics. At
the same time, implement arch_cmpxchg64_acquire and
arch_cmpxchg64_release as well.
However, on both TH1520 and JH7110 platforms, I didn't see obvious
performance improvement with Linus' test case [1]. IMHO, this may
be related with the fence and lr.d/sc.d hw implementations. In theory,
lr/sc without fence could give performance improvement over lr/sc plus
fence, so add the code here to leave performance improvement room on
newer HW platforms.
Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1]
Signed-off-by: Jisheng Zhang <[email protected]>
---
arch/riscv/include/asm/cmpxchg.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 2f4726d3cfcc..6318187f426f 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -360,4 +360,22 @@
arch_cmpxchg_relaxed((ptr), (o), (n)); \
})
+#define arch_cmpxchg64_relaxed(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
+ arch_cmpxchg_relaxed((ptr), (o), (n)); \
+})
+
+#define arch_cmpxchg64_acquire(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
+ arch_cmpxchg_acquire((ptr), (o), (n)); \
+})
+
+#define arch_cmpxchg64_release(ptr, o, n) \
+({ \
+ BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
+ arch_cmpxchg_release((ptr), (o), (n)); \
+})
+
#endif /* _ASM_RISCV_CMPXCHG_H */
--
2.42.0
On Sat, Dec 02, 2023 at 10:03:21PM +0800, Jisheng Zhang wrote:
> This series selects ARCH_USE_CMPXCHG_LOCKREF to enable the
> cmpxchg-based lockless lockref implementation for riscv. Then,
> implement arch_cmpxchg64_{relaxed|acquire|release}.
>
> After patch1:
> Using Linus' test case[1] on TH1520 platform, I see a 11.2% improvement.
> On JH7110 platform, I see 12.0% improvement.
>
> After patch2:
> on both TH1520 and JH7110 platforms, I didn't see obvious
> performance improvement with Linus' test case [1]. IMHO, this may
> be related with the fence and lr.d/sc.d hw implementations. In theory,
> lr/sc without fence could give performance improvement over lr/sc plus
> fence, so add the code here to leave performance improvement room on
> newer HW platforms.
>
> Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1]
Hi Palmer,
this series is also missed, let me know if there's something need to be
done.
Thanks
>
> Since v1:
> - only select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
>
> Jisheng Zhang (2):
> riscv: select ARCH_USE_CMPXCHG_LOCKREF
> riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release}
>
> arch/riscv/Kconfig | 1 +
> arch/riscv/include/asm/cmpxchg.h | 18 ++++++++++++++++++
> 2 files changed, 19 insertions(+)
>
> --
> 2.42.0
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
On Sat, Dec 02, 2023 at 10:03:21PM +0800, Jisheng Zhang wrote:
> This series selects ARCH_USE_CMPXCHG_LOCKREF to enable the
> cmpxchg-based lockless lockref implementation for riscv. Then,
> implement arch_cmpxchg64_{relaxed|acquire|release}.
>
> After patch1:
> Using Linus' test case[1] on TH1520 platform, I see a 11.2% improvement.
> On JH7110 platform, I see 12.0% improvement.
>
> After patch2:
> on both TH1520 and JH7110 platforms, I didn't see obvious
> performance improvement with Linus' test case [1]. IMHO, this may
> be related with the fence and lr.d/sc.d hw implementations. In theory,
> lr/sc without fence could give performance improvement over lr/sc plus
> fence, so add the code here to leave performance improvement room on
> newer HW platforms.
>
> Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1]
>
> Since v1:
> - only select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
>
> Jisheng Zhang (2):
> riscv: select ARCH_USE_CMPXCHG_LOCKREF
> riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release}
For the series,
Reviewed-by: Andrea Parri <[email protected]> # code audit, QEMU tests
Andrea