2024-04-01 21:39:31

by Paul E. McKenney

Subject: [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg()

Hello!

This series provides emulation functions for one-byte and two-byte
cmpxchg, and uses them for those architectures not supporting these
in hardware. The emulation is in terms of the fully ordered four-byte
cmpxchg() that is supplied by all of these architectures. These were
tested by making x86 forget that it can do these natively:

88a9b3f7a924 ("EXP arch/x86: Test one-byte cmpxchg emulation")

This commit is local to -rcu and is of course not intended for mainline.

If accepted, RCU Tasks will use this capability in place of the current
rcu_trc_cmpxchg_need_qs() open-coding of this emulation.

1. Add one-byte and two-byte cmpxchg() emulation functions.

2. sparc: Emulate one-byte and two-byte cmpxchg.

3. ARC: Emulate one-byte and two-byte cmpxchg.

4. csky: Emulate one-byte and two-byte cmpxchg.

5. sh: Emulate one-byte and two-byte cmpxchg.

6. xtensa: Emulate one-byte and two-byte cmpxchg.

7. parisc: Emulate two-byte cmpxchg.

8. riscv: Emulate one-byte and two-byte cmpxchg.

Thanx, Paul

------------------------------------------------------------------------

arch/Kconfig | 3 +
arch/arc/Kconfig | 1
arch/arc/include/asm/cmpxchg.h | 38 ++++++++++++++----
arch/csky/Kconfig | 1
arch/csky/include/asm/cmpxchg.h | 18 ++++++++
arch/parisc/Kconfig | 1
arch/parisc/include/asm/cmpxchg.h | 1
arch/riscv/Kconfig | 1
arch/riscv/include/asm/cmpxchg.h | 25 ++++++++++++
arch/sh/Kconfig | 1
arch/sh/include/asm/cmpxchg.h | 4 +
arch/sparc/Kconfig | 1
arch/sparc/include/asm/cmpxchg_32.h | 6 ++
arch/xtensa/Kconfig | 1
arch/xtensa/include/asm/cmpxchg.h | 3 +
include/linux/cmpxchg-emu.h | 16 +++++++
lib/Makefile | 1
lib/cmpxchg-emu.c | 74 ++++++++++++++++++++++++++++++++++++
18 files changed, 187 insertions(+), 9 deletions(-)


2024-04-01 21:40:01

by Paul E. McKenney

Subject: [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions

Architectures are required to provide four-byte cmpxchg() and 64-bit
architectures are additionally required to provide eight-byte cmpxchg().
However, there are cases where one-byte and two-byte cmpxchg()
would be extremely useful. Therefore, provide cmpxchg_emu_u8() and
cmpxchg_emu_u16() that emulate one-byte and two-byte cmpxchg() in terms
of four-byte cmpxchg().

Note that these emulations are fully ordered, and can (for example)
cause one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
If this causes problems for a given architecture, that architecture is
free to provide its own lighter-weight primitives.
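
For readers following along, here is a minimal user-space sketch of the
technique described above. It is not the kernel code: emu_cmpxchg_u8()
is a hypothetical name, and the GCC/Clang __atomic builtins are assumed
as a stand-in for the kernel's fully ordered four-byte cmpxchg().

```c
#include <stdint.h>
#include <string.h>

/*
 * Emulate a one-byte compare-and-swap with a four-byte one on the
 * enclosing aligned word. Returns the byte value observed at *p; the
 * swap took place if and only if that value equals old.
 */
static uint8_t emu_cmpxchg_u8(uint8_t *p, uint8_t old, uint8_t new_)
{
	/* Round the address down to the enclosing 4-byte-aligned word. */
	uint32_t *p32 = (uint32_t *)((uintptr_t)p & ~(uintptr_t)0x3);
	unsigned int i = (uintptr_t)p & 0x3;	/* byte index in that word */
	uint32_t cur = __atomic_load_n(p32, __ATOMIC_SEQ_CST);
	uint32_t want;
	uint8_t bytes[4];

	for (;;) {
		/* memcpy() gives an endian-neutral view of the word. */
		memcpy(bytes, &cur, sizeof(bytes));
		if (bytes[i] != old)
			return bytes[i];	/* mismatch: report what is there */
		bytes[i] = new_;
		memcpy(&want, bytes, sizeof(want));
		/* On failure cur is refreshed with the current word; retry. */
		if (__atomic_compare_exchange_n(p32, &cur, want, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_SEQ_CST))
			return old;
	}
}
```

The retry loop is needed because the four-byte operation can fail when
a neighboring byte in the same word changes, even though the target byte
still holds the expected value.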

[ paulmck: Apply Marco Elver feedback. ]
[ paulmck: Apply kernel test robot feedback. ]

Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Douglas Anderson <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: <[email protected]>
---
arch/Kconfig | 3 ++
include/linux/cmpxchg-emu.h | 16 ++++++++
lib/Makefile | 1 +
lib/cmpxchg-emu.c | 74 +++++++++++++++++++++++++++++++++++++
4 files changed, 94 insertions(+)
create mode 100644 include/linux/cmpxchg-emu.h
create mode 100644 lib/cmpxchg-emu.c

diff --git a/arch/Kconfig b/arch/Kconfig
index ae4a4f37bbf08..01093c60952a5 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
# strict alignment always, even with -falign-functions.
def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG

+config ARCH_NEED_CMPXCHG_1_2_EMU
+ bool
+
endmenu
diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
new file mode 100644
index 0000000000000..fee8171fa05eb
--- /dev/null
+++ b/include/linux/cmpxchg-emu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Emulated 1-byte and 2-byte cmpxchg operations for architectures
+ * lacking direct support for these sizes. These are implemented in terms
+ * of 4-byte cmpxchg operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#ifndef __LINUX_CMPXCHG_EMU_H
+#define __LINUX_CMPXCHG_EMU_H
+
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
+uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new);
+
+#endif /* __LINUX_CMPXCHG_EMU_H */
diff --git a/lib/Makefile b/lib/Makefile
index ffc6b2341b45a..1d93b61a7ecbe 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
lib-$(CONFIG_GENERIC_BUG) += bug.o

obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
+obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o

obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
#ensure exported functions have prototypes
diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
new file mode 100644
index 0000000000000..a88c4f3c88430
--- /dev/null
+++ b/lib/cmpxchg-emu.c
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Emulated 1-byte and 2-byte cmpxchg operations for architectures
+ * lacking direct support for these sizes. These are implemented in terms
+ * of 4-byte cmpxchg operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#include <linux/types.h>
+#include <linux/export.h>
+#include <linux/instrumented.h>
+#include <linux/atomic.h>
+#include <linux/panic.h>
+#include <linux/bug.h>
+#include <asm-generic/rwonce.h>
+#include <linux/cmpxchg-emu.h>
+
+union u8_32 {
+ u8 b[4];
+ u32 w;
+};
+
+/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
+{
+ u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
+ int i = ((uintptr_t)p) & 0x3;
+ union u8_32 old32;
+ union u8_32 new32;
+ u32 ret;
+
+ ret = READ_ONCE(*p32);
+ do {
+ old32.w = ret;
+ if (old32.b[i] != old)
+ return old32.b[i];
+ new32.w = old32.w;
+ new32.b[i] = new;
+ instrument_atomic_read_write(p, 1);
+ ret = data_race(cmpxchg(p32, old32.w, new32.w));
+ } while (ret != old32.w);
+ return old;
+}
+EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
+
+union u16_32 {
+ u16 h[2];
+ u32 w;
+};
+
+/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
+uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
+{
+ u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
+ int i = (((uintptr_t)p) & 0x2) / 2;
+ union u16_32 old32;
+ union u16_32 new32;
+ u32 ret;
+
+ WARN_ON_ONCE(((uintptr_t)p) & 0x1);
+ ret = READ_ONCE(*p32);
+ do {
+ old32.w = ret;
+ if (old32.h[i] != old)
+ return old32.h[i];
+ new32.w = old32.w;
+ new32.h[i] = new;
+ instrument_atomic_read_write(p, 2);
+ ret = data_race(cmpxchg(p32, old32.w, new32.w));
+ } while (ret != old32.w);
+ return old;
+}
+EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
--
2.40.1


2024-04-01 21:40:24

by Paul E. McKenney

Subject: [PATCH RFC cmpxchg 5/8] sh: Emulate one-byte and two-byte cmpxchg

Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
and two-byte cmpxchg() on sh.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: <[email protected]>
---
arch/sh/Kconfig | 1 +
arch/sh/include/asm/cmpxchg.h | 4 ++++
2 files changed, 5 insertions(+)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 2ad3e29f0ebec..43a121c6ca0f3 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -16,6 +16,7 @@ config SUPERH
select ARCH_HIBERNATION_POSSIBLE if MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_WANT_IPC_PARSE_VERSION
+ select ARCH_NEED_CMPXCHG_1_2_EMU
select CPU_NO_EFFICIENT_FFS
select DMA_DECLARE_COHERENT
select GENERIC_ATOMIC64
diff --git a/arch/sh/include/asm/cmpxchg.h b/arch/sh/include/asm/cmpxchg.h
index 5d617b3ef78f7..18233cc14419e 100644
--- a/arch/sh/include/asm/cmpxchg.h
+++ b/arch/sh/include/asm/cmpxchg.h
@@ -56,6 +56,10 @@ static inline unsigned long __cmpxchg(volatile void * ptr, unsigned long old,
unsigned long new, int size)
{
switch (size) {
+ case 1:
+ return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
+ case 2:
+ return cmpxchg_emu_u16((volatile u16 *)ptr, old, new);
case 4:
return __cmpxchg_u32(ptr, old, new);
}
--
2.40.1


2024-04-01 21:40:27

by Paul E. McKenney

Subject: [PATCH RFC cmpxchg 3/8] ARC: Emulate one-byte and two-byte cmpxchg

Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
and two-byte cmpxchg() on arc.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: <[email protected]>
---
arch/arc/Kconfig | 1 +
arch/arc/include/asm/cmpxchg.h | 38 ++++++++++++++++++++++++++--------
2 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 99d2845f3feb9..0b40039f38eb2 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -14,6 +14,7 @@ config ARC
select ARCH_HAS_SETUP_DMA_OPS
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ select ARCH_NEED_CMPXCHG_1_2_EMU
select ARCH_SUPPORTS_ATOMIC_RMW if ARC_HAS_LLSC
select ARCH_32BIT_OFF_T
select BUILDTIME_TABLE_SORT
diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index e138fde067dea..1e3e23adaca13 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -46,6 +46,12 @@
__typeof__(*(ptr)) _prev_; \
\
switch(sizeof((_p_))) { \
+ case 1: \
+ _prev_ = cmpxchg_emu_u8((volatile u8 *)_p_, _o_, _n_); \
+ break; \
+ case 2: \
+ _prev_ = cmpxchg_emu_u16((volatile u16 *)_p_, _o_, _n_); \
+ break; \
case 4: \
_prev_ = __cmpxchg(_p_, _o_, _n_); \
break; \
@@ -65,16 +71,30 @@
__typeof__(*(ptr)) _prev_; \
unsigned long __flags; \
\
- BUILD_BUG_ON(sizeof(_p_) != 4); \
+ switch(sizeof(*(_p_))) { \
+ case 1: \
+ __flags = cmpxchg_emu_u8((volatile u8 *)_p_, _o_, _n_); \
+ _prev_ = (__typeof__(*(ptr)))__flags; \
+ break; \
+ case 2: \
+ __flags = cmpxchg_emu_u16((volatile u16 *)_p_, _o_, _n_); \
+ _prev_ = (__typeof__(*(ptr)))__flags; \
+ break; \
+ case 4: \
+ /* \
+ * spin lock/unlock provide the needed smp_mb() \
+ * before/after \
+ */ \
+ atomic_ops_lock(__flags); \
+ _prev_ = *_p_; \
+ if (_prev_ == _o_) \
+ *_p_ = _n_; \
+ atomic_ops_unlock(__flags); \
+ break; \
+ default: \
+ BUILD_BUG(); \
+ } \
\
- /* \
- * spin lock/unlock provide the needed smp_mb() before/after \
- */ \
- atomic_ops_lock(__flags); \
- _prev_ = *_p_; \
- if (_prev_ == _o_) \
- *_p_ = _n_; \
- atomic_ops_unlock(__flags); \
_prev_; \
})

--
2.40.1


2024-04-01 21:40:28

by Paul E. McKenney

Subject: [PATCH RFC cmpxchg 2/8] sparc: Emulate one-byte and two-byte cmpxchg

Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
and two-byte cmpxchg() on 32-bit sparc.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Marco Elver <[email protected]>
---
arch/sparc/Kconfig | 1 +
arch/sparc/include/asm/cmpxchg_32.h | 6 ++++++
2 files changed, 7 insertions(+)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 11bf9d312318c..e6d1bbc4fedd0 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -55,6 +55,7 @@ config SPARC32
select ARCH_32BIT_OFF_T
select ARCH_HAS_CPU_FINALIZE_INIT if !SMP
select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select ARCH_NEED_CMPXCHG_1_2_EMU
select CLZ_TAB
select DMA_DIRECT_REMAP
select GENERIC_ATOMIC64
diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index d0af82c240b73..8eb483b5887a1 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -41,11 +41,17 @@ void __cmpxchg_called_with_bad_pointer(void);
/* we only need to support cmpxchg of a u32 on sparc */
unsigned long __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);

+#include <linux/cmpxchg-emu.h>
+
/* don't worry...optimizer will get rid of most of this */
static inline unsigned long
__cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
{
switch (size) {
+ case 1:
+ return cmpxchg_emu_u8((volatile u8 *)ptr, old, new_);
+ case 2:
+ return cmpxchg_emu_u16((volatile u16 *)ptr, old, new_);
case 4:
return __cmpxchg_u32((u32 *)ptr, (u32)old, (u32)new_);
default:
--
2.40.1


2024-04-01 21:40:30

by Paul E. McKenney

Subject: [PATCH RFC cmpxchg 4/8] csky: Emulate one-byte and two-byte cmpxchg

Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
and two-byte cmpxchg() on csky.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: <[email protected]>
---
arch/csky/Kconfig | 1 +
arch/csky/include/asm/cmpxchg.h | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index d3ac36751ad1f..860d4e02d6295 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -37,6 +37,7 @@ config CSKY
select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
+ select ARCH_NEED_CMPXCHG_1_2_EMU
select ARCH_WANT_FRAME_POINTERS if !CPU_CK610 && $(cc-option,-mbacktrace)
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
select COMMON_CLK
diff --git a/arch/csky/include/asm/cmpxchg.h b/arch/csky/include/asm/cmpxchg.h
index 916043b845f14..848a8691c5a2a 100644
--- a/arch/csky/include/asm/cmpxchg.h
+++ b/arch/csky/include/asm/cmpxchg.h
@@ -61,6 +61,12 @@
__typeof__(old) __old = (old); \
__typeof__(*(ptr)) __ret; \
switch (size) { \
+ case 1: \
+ __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
+ case 2: \
+ __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
+ break; \
case 4: \
asm volatile ( \
"1: ldex.w %0, (%3) \n" \
@@ -91,6 +97,12 @@
__typeof__(old) __old = (old); \
__typeof__(*(ptr)) __ret; \
switch (size) { \
+ case 1: \
+ __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
+ case 2: \
+ __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
+ break; \
case 4: \
asm volatile ( \
"1: ldex.w %0, (%3) \n" \
@@ -122,6 +134,12 @@
__typeof__(old) __old = (old); \
__typeof__(*(ptr)) __ret; \
switch (size) { \
+ case 1: \
+ __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
+ case 2: \
+ __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
+ break; \
case 4: \
asm volatile ( \
RELEASE_FENCE \
--
2.40.1


2024-04-01 21:40:31

by Paul E. McKenney

Subject: [PATCH RFC cmpxchg 6/8] xtensa: Emulate one-byte and two-byte cmpxchg

Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
and two-byte cmpxchg() on xtensa.

[ paulmck: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
---
arch/xtensa/Kconfig | 1 +
arch/xtensa/include/asm/cmpxchg.h | 3 +++
2 files changed, 4 insertions(+)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index f200a4ec044e6..8131f0d75bb58 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -14,6 +14,7 @@ config XTENSA
select ARCH_HAS_DMA_SET_UNCACHED if MMU
select ARCH_HAS_STRNCPY_FROM_USER if !KASAN
select ARCH_HAS_STRNLEN_USER
+ select ARCH_NEED_CMPXCHG_1_2_EMU
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/xtensa/include/asm/cmpxchg.h b/arch/xtensa/include/asm/cmpxchg.h
index 675a11ea8de76..a0f9a2070209b 100644
--- a/arch/xtensa/include/asm/cmpxchg.h
+++ b/arch/xtensa/include/asm/cmpxchg.h
@@ -15,6 +15,7 @@

#include <linux/bits.h>
#include <linux/stringify.h>
+#include <linux/cmpxchg-emu.h>

/*
* cmpxchg
@@ -74,6 +75,8 @@ static __inline__ unsigned long
__cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
{
switch (size) {
+ case 1: return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
+ case 2: return cmpxchg_emu_u16((volatile u16 *)ptr, old, new);
case 4: return __cmpxchg_u32(ptr, old, new);
default: __cmpxchg_called_with_bad_pointer();
return old;
--
2.40.1


2024-04-01 21:40:31

by Paul E. McKenney

Subject: [PATCH RFC cmpxchg 7/8] parisc: Emulate two-byte cmpxchg

Use the new cmpxchg_emu_u16() to emulate two-byte cmpxchg() on parisc.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: <[email protected]>
---
arch/parisc/Kconfig | 1 +
arch/parisc/include/asm/cmpxchg.h | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index daafeb20f9937..06f221a3f3459 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -15,6 +15,7 @@ config PARISC
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_UBSAN
select ARCH_HAS_PTE_SPECIAL
+ select ARCH_NEED_CMPXCHG_1_2_EMU
select ARCH_NO_SG_CHAIN
select ARCH_SUPPORTS_HUGETLBFS if PA20
select ARCH_SUPPORTS_MEMORY_FAILURE
diff --git a/arch/parisc/include/asm/cmpxchg.h b/arch/parisc/include/asm/cmpxchg.h
index c1d776bb16b4e..f909c000b6577 100644
--- a/arch/parisc/include/asm/cmpxchg.h
+++ b/arch/parisc/include/asm/cmpxchg.h
@@ -72,6 +72,7 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
#endif
case 4: return __cmpxchg_u32((unsigned int *)ptr,
(unsigned int)old, (unsigned int)new_);
+ case 2: return cmpxchg_emu_u16((volatile u16 *)ptr, old, new_);
case 1: return __cmpxchg_u8((u8 *)ptr, old & 0xff, new_ & 0xff);
}
__cmpxchg_called_with_bad_pointer();
--
2.40.1


2024-04-01 21:42:14

by Paul E. McKenney

Subject: [PATCH RFC cmpxchg 8/8] riscv: Emulate one-byte and two-byte cmpxchg

Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
and two-byte cmpxchg() on riscv.

[ paulmck: Apply kernel test robot feedback. ]

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: <[email protected]>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/cmpxchg.h | 25 +++++++++++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56b..4eaf40d0a52ec 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -44,6 +44,7 @@ config RISCV
select ARCH_HAS_UBSAN
select ARCH_HAS_VDSO_DATA
select ARCH_KEEP_MEMBLOCK if ACPI
+ select ARCH_NEED_CMPXCHG_1_2_EMU
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
select ARCH_STACKWALK
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 2fee65cc84432..a5b377481785c 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -9,6 +9,7 @@
#include <linux/bug.h>

#include <asm/fence.h>
+#include <linux/cmpxchg-emu.h>

#define __xchg_relaxed(ptr, new, size) \
({ \
@@ -170,6 +171,12 @@
__typeof__(*(ptr)) __ret; \
register unsigned int __rc; \
switch (size) { \
+ case 1: \
+ __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
+ case 2: \
+ __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
+ break; \
case 4: \
__asm__ __volatile__ ( \
"0: lr.w %0, %2\n" \
@@ -214,6 +221,12 @@
__typeof__(*(ptr)) __ret; \
register unsigned int __rc; \
switch (size) { \
+ case 1: \
+ __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
+ case 2: \
+ __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
+ break; \
case 4: \
__asm__ __volatile__ ( \
"0: lr.w %0, %2\n" \
@@ -260,6 +273,12 @@
__typeof__(*(ptr)) __ret; \
register unsigned int __rc; \
switch (size) { \
+ case 1: \
+ __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
+ case 2: \
+ __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
+ break; \
case 4: \
__asm__ __volatile__ ( \
RISCV_RELEASE_BARRIER \
@@ -306,6 +325,12 @@
__typeof__(*(ptr)) __ret; \
register unsigned int __rc; \
switch (size) { \
+ case 1: \
+ __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
+ case 2: \
+ __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
+ break; \
case 4: \
__asm__ __volatile__ ( \
"0: lr.w %0, %2\n" \
--
2.40.1


2024-04-01 22:38:19

by Al Viro

Subject: Re: [PATCH RFC cmpxchg 2/8] sparc: Emulate one-byte and two-byte cmpxchg

On Mon, Apr 01, 2024 at 02:39:44PM -0700, Paul E. McKenney wrote:
> Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
> and two-byte cmpxchg() on 32-bit sparc.

> __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
> {
> switch (size) {
> + case 1:
> + return cmpxchg_emu_u8((volatile u8 *)ptr, old, new_);
> + case 2:
> + return cmpxchg_emu_u16((volatile u16 *)ptr, old, new_);
> case 4:
> return __cmpxchg_u32((u32 *)ptr, (u32)old, (u32)new_);
> default:

Considering how awful sparc32 32bit cmpxchg is, it might be better to
implement those directly rather than trying to do them on top of
that. Something like

#define CMPXCHG(T) \
T __cmpxchg_##T(volatile ##T *ptr, ##T old, ##T new) \
{ \
unsigned long flags; \
##T prev; \
\
spin_lock_irqsave(ATOMIC_HASH(ptr), flags); \
if ((prev = *ptr) == old) \
*ptr = new; \
spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);\
return prev; \
}

CMPXCHG(u8)
CMPXCHG(u16)
CMPXCHG(u32)
CMPXCHG(u64)

in arch/sparc/lib/atomic32.c, replacing equivalent __cmpxchg_u{32,64}()
definitions already there and use of those in that switch in __cmpxchg()
definition...
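
A user-space sketch of the lock-based approach suggested above, for
illustration only. Everything here is an assumption rather than sparc32
code: an atomic_flag table with a hypothetical 5-bit shift stands in for
ATOMIC_HASH(), a simple spin loop stands in for spin_lock_irqsave(), the
u8..u64 typedefs mirror the kernel names, and the types in the macro are
written as plain T.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical stand-in for the hashed spinlock table. */
#define LOCK_TABLE_SIZE 4
static atomic_flag lock_table[LOCK_TABLE_SIZE] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/*
 * Drop the low address bits so that accesses of every size within one
 * aligned 32-byte block map to the same lock.
 */
static atomic_flag *lock_for(const volatile void *p)
{
	return &lock_table[((uintptr_t)p >> 5) % LOCK_TABLE_SIZE];
}

#define DEFINE_LOCKED_CMPXCHG(T)					\
static T locked_cmpxchg_##T(volatile T *ptr, T old, T new_)		\
{									\
	atomic_flag *lock = lock_for(ptr);				\
	T prev;								\
									\
	while (atomic_flag_test_and_set(lock))				\
		;	/* spin until the lock is ours */		\
	prev = *ptr;							\
	if (prev == old)						\
		*ptr = new_;						\
	atomic_flag_clear(lock);					\
	return prev;							\
}

DEFINE_LOCKED_CMPXCHG(u8)
DEFINE_LOCKED_CMPXCHG(u16)
DEFINE_LOCKED_CMPXCHG(u32)
DEFINE_LOCKED_CMPXCHG(u64)
```

Because the lock is chosen from the address alone, a one-byte and a
four-byte operation on overlapping memory serialize against each other.
Unlike the kernel version, this sketch does nothing about interrupts.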

2024-04-01 23:58:15

by Paul E. McKenney

Subject: Re: [PATCH RFC cmpxchg 2/8] sparc: Emulate one-byte and two-byte cmpxchg

On Mon, Apr 01, 2024 at 11:38:03PM +0100, Al Viro wrote:
> On Mon, Apr 01, 2024 at 02:39:44PM -0700, Paul E. McKenney wrote:
> > Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
> > and two-byte cmpxchg() on 32-bit sparc.
>
> > __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
> > {
> > switch (size) {
> > + case 1:
> > + return cmpxchg_emu_u8((volatile u8 *)ptr, old, new_);
> > + case 2:
> > + return cmpxchg_emu_u16((volatile u16 *)ptr, old, new_);
> > case 4:
> > return __cmpxchg_u32((u32 *)ptr, (u32)old, (u32)new_);
> > default:
>
> Considering how awful sparc32 32bit cmpxchg is, it might be better to
> implement those directly rather than trying to do them on top of
> that. Something like
>
> #define CMPXCHG(T) \
> T __cmpxchg_##T(volatile ##T *ptr, ##T old, ##T new) \
> { \
> unsigned long flags; \
> ##T prev; \
> \
> spin_lock_irqsave(ATOMIC_HASH(ptr), flags); \
> if ((prev = *ptr) == old) \
> *ptr = new; \
> spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);\
> return prev; \
> }
>
> CMPXCHG(u8)
> CMPXCHG(u16)
> CMPXCHG(u32)
> CMPXCHG(u64)
>
> in arch/sparc/lib/atomic32.c, replacing equivalent __cmpxchg_u{32,64}()
> definitions already there and use of those in that switch in __cmpxchg()
> definition...

Fair enough, and ATOMIC_HASH() is set up to permit mixed-size atomic
accesses courtesy of ignoring the bottom bits, though ignoring more
of them than absolutely necessary. Maybe 32-bit sparc has 32-byte
cache lines?

Would you like to do that patch? If so, I would be happy to drop mine
in favor of yours. If not, could I please have your Signed-off-by so
I can do the Co-developed-by dance?

Thanx, Paul

2024-04-02 02:11:47

by Al Viro

Subject: Re: [PATCH RFC cmpxchg 2/8] sparc: Emulate one-byte and two-byte cmpxchg

On Mon, Apr 01, 2024 at 04:58:03PM -0700, Paul E. McKenney wrote:
> > #define CMPXCHG(T) \
> > T __cmpxchg_##T(volatile ##T *ptr, ##T old, ##T new) \
                             ^^^

*blink*

I understand which search-and-replace produced that, but not
how I failed to notice the results... Sorry ;-/

> > { \
> > unsigned long flags; \
> > ##T prev; \
> > \
> > spin_lock_irqsave(ATOMIC_HASH(ptr), flags); \
> > if ((prev = *ptr) == old) \
> > *ptr = new; \
> > spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);\
> > return prev; \
> > }
> >
> > CMPXCHG(u8)
> > CMPXCHG(u16)
> > CMPXCHG(u32)
> > CMPXCHG(u64)
> >
> > in arch/sparc/lib/atomic32.c, replacing equivalent __cmpxchg_u{32,64}()
> > definitions already there and use of those in that switch in __cmpxchg()
> > definition...
>
> Fair enough, and ATOMIC_HASH() is set up to permit mixed-size atomic
> accesses courtesy of ignoring the bottom bits, though ignoring more
> of them than absolutely necessary. Maybe 32-bit sparc has 32-byte
> cache lines?

It does, IIRC.

> Would you like to do that patch? If so, I would be happy to drop mine
> in favor of yours. If not, could I please have your Signed-off-by so
> I can do the Co-developed-by dance?

Will do once I dig my way from under the pile of mail (sick for a week
and subscribed to l-k, among other lists)...

2024-04-02 03:38:10

by Al Viro

Subject: Re: [PATCH RFC cmpxchg 2/8] sparc: Emulate one-byte and two-byte cmpxchg

On Tue, Apr 02, 2024 at 01:07:58AM +0100, Al Viro wrote:

> It does, IIRC.
>
> > Would you like to do that patch? If so, I would be happy to drop mine
> > in favor of yours. If not, could I please have your Signed-off-by so
> > I can do the Co-developed-by dance?
>
> Will do once I dig my way from under the pile of mail (sick for a week
> and subscribed to l-k, among other lists)...

FWIW, parisc is in the same situation - atomics-by-cached-spinlocks.
've a candidate branch, will post if it survives build...

Re parisc: why does it bother with arch_cmpxchg_local()? Default is
* save and disable local interrupts
* read the current value, compare to old
* if equal, store new there
* restore local interrupts
For 32bit case parisc goes for __cmpxchg_u32(), which is
* if (SMP) choose the spinlock (indexed by hash of address)
* save and disable local interrupts
* if (SMP) arch_spin_lock(spinlock)
* read the current value, compare to old
* if equal, store new there
* if (SMP) arch_spin_unlock(spinlock)
* restore local interrupts
In UP case it's identical to generic; on SMP it's strictly more work.
Unless I'm very confused about cmpxchg_local() semantics, the
callers do not expect atomicity wrt other CPUs, so why do we bother?
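
The generic sequence listed above can be modeled in a few lines. This is
a user-space illustration under stated assumptions, not kernel code: the
sim_irq_save()/sim_irq_restore() helpers and the irqs_enabled flag are
hypothetical stand-ins for the real local interrupt save/restore
primitives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated interrupt-enable state of the local CPU (hypothetical). */
static bool irqs_enabled = true;

/* Save the current state and disable "interrupts". */
static bool sim_irq_save(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Restore the state saved by sim_irq_save(). */
static void sim_irq_restore(bool flags)
{
	irqs_enabled = flags;
}

/*
 * Model of the generic local cmpxchg: atomic only with respect to the
 * local CPU, so no spinlock is taken.
 */
static uint32_t sim_cmpxchg_local_u32(volatile uint32_t *ptr,
				      uint32_t old, uint32_t new_)
{
	bool flags = sim_irq_save();	/* save and disable local interrupts */
	uint32_t prev = *ptr;		/* read the current value */

	if (prev == old)		/* compare to old */
		*ptr = new_;		/* if equal, store new */
	sim_irq_restore(flags);		/* restore local interrupts */
	return prev;
}
```

The SMP path described for parisc adds a hashed spinlock acquire and
release around the same read-compare-store, which is the extra work
being questioned.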

2024-04-02 04:17:53

by Paul E. McKenney

Subject: Re: [PATCH RFC cmpxchg 2/8] sparc: Emulate one-byte and two-byte cmpxchg

On Tue, Apr 02, 2024 at 04:37:53AM +0100, Al Viro wrote:
> On Tue, Apr 02, 2024 at 01:07:58AM +0100, Al Viro wrote:
>
> > It does, IIRC.
> >
> > > Would you like to do that patch? If so, I would be happy to drop mine
> > > in favor of yours. If not, could I please have your Signed-off-by so
> > > I can do the Co-developed-by dance?
> >
> > Will do once I dig my way from under the pile of mail (sick for a week
> > and subscribed to l-k, among other lists)...
>
> FWIW, parisc is in the same situation - atomics-by-cached-spinlocks.
> 've a candidate branch, will post if it survives build...

I am sure that it seemed like a good idea at the time. ;-)

> Re parisc: why does it bother with arch_cmpxchg_local()? Default is
> * save and disable local interrupts
> * read the current value, compare to old
> * if equal, store new there
> * restore local interrupts
> For 32bit case parisc goes for __cmpxchg_u32(), which is
> * if (SMP) choose the spinlock (indexed by hash of address)
> * save and disable local interrupts
> * if (SMP) arch_spin_lock(spinlock)
> * read the current value, compare to old
> * if equal, store new there
> * if (SMP) arch_spin_unlock(spinlock)
> * restore local interrupts
> In UP case it's identical to generic; on SMP it's strictly more work.
> Unless I'm very confused about cmpxchg_local() semantics, the
> callers do not expect atomicity wrt other CPUs, so why do we bother?

;-) ;-) ;-)

In any case, happy to replace my patches with yours whenever you have
them ready.

Thanx, Paul

2024-04-02 04:18:50

by Paul E. McKenney

Subject: Re: [PATCH RFC cmpxchg 2/8] sparc: Emulate one-byte and two-byte cmpxchg

On Tue, Apr 02, 2024 at 05:11:38AM +0100, Al Viro wrote:
> On Tue, Apr 02, 2024 at 04:37:53AM +0100, Al Viro wrote:
> > On Tue, Apr 02, 2024 at 01:07:58AM +0100, Al Viro wrote:
> >
> > > It does, IIRC.
> > >
> > > > Would you like to do that patch? If so, I would be happy to drop mine
> > > > in favor of yours. If not, could I please have your Signed-off-by so
> > > > I can do the Co-developed-by dance?
> > >
> > > Will do once I dig my way from under the pile of mail (sick for a week
> > > and subscribed to l-k, among other lists)...
> >
> > FWIW, parisc is in the same situation - atomics-by-cached-spinlocks.
> > 've a candidate branch, will post if it survives build...
>
> Seems to survive. See
> git://git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git misc.cmpxchg
>
> Completely untested; builds on several configs, but that's it.
> Al Viro (8):
> sparc32: make __cmpxchg_u32() return u32
> sparc32: make the first argument of __cmpxchg_u64() volatile u64 *
> sparc32: unify __cmpxchg_u{32,64}
> sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes
> parisc: __cmpxchg_u32(): lift conversion into the callers
> parisc: unify implementations of __cmpxchg_u{8,32,64}
> parisc: add missing export of __cmpxchg_u8()
> parisc: add u16 support to cmpxchg()
>
> arch/parisc/include/asm/cmpxchg.h | 16 ++++++------
> arch/parisc/kernel/parisc_ksyms.c | 2 ++
> arch/parisc/lib/bitops.c | 52 ++++++++++++-------------------------
> arch/sparc/include/asm/cmpxchg_32.h | 11 +++++---
> arch/sparc/lib/atomic32.c | 45 ++++++++++++++------------------
> 5 files changed, 55 insertions(+), 71 deletions(-)
>
> Individual patches in followups.

Very good, thank you! I will take yours in place of mine on my next
rebase.

Thanx, Paul

2024-04-02 07:17:51

by Al Viro

Subject: Re: [PATCH RFC cmpxchg 2/8] sparc: Emulate one-byte and two-byte cmpxchg

On Tue, Apr 02, 2024 at 04:37:53AM +0100, Al Viro wrote:
> On Tue, Apr 02, 2024 at 01:07:58AM +0100, Al Viro wrote:
>
> > It does, IIRC.
> >
> > > Would you like to do that patch? If so, I would be happy to drop mine
> > > in favor of yours. If not, could I please have your Signed-off-by so
> > > I can do the Co-developed-by dance?
> >
> > Will do once I dig my way from under the pile of mail (sick for a week
> > and subscribed to l-k, among other lists)...
>
> FWIW, parisc is in the same situation - atomics-by-cached-spinlocks.
> 've a candidate branch, will post if it survives build...

Seems to survive. See
git://git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git misc.cmpxchg

Completely untested; builds on several configs, but that's it.
Al Viro (8):
sparc32: make __cmpxchg_u32() return u32
sparc32: make the first argument of __cmpxchg_u64() volatile u64 *
sparc32: unify __cmpxchg_u{32,64}
sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes
parisc: __cmpxchg_u32(): lift conversion into the callers
parisc: unify implementations of __cmpxchg_u{8,32,64}
parisc: add missing export of __cmpxchg_u8()
parisc: add u16 support to cmpxchg()

arch/parisc/include/asm/cmpxchg.h | 16 ++++++------
arch/parisc/kernel/parisc_ksyms.c | 2 ++
arch/parisc/lib/bitops.c | 52 ++++++++++++-------------------------
arch/sparc/include/asm/cmpxchg_32.h | 11 +++++---
arch/sparc/lib/atomic32.c | 45 ++++++++++++++------------------
5 files changed, 55 insertions(+), 71 deletions(-)

Individual patches in followups.

2024-04-02 08:14:48

by Arnd Bergmann

Subject: Re: [PATCH RFC cmpxchg 3/8] ARC: Emulate one-byte and two-byte cmpxchg

On Mon, Apr 1, 2024, at 23:39, Paul E. McKenney wrote:
> Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
> and two-byte cmpxchg() on arc.
>
> Signed-off-by: Paul E. McKenney <[email protected]>

I'm missing the context here, is it now mandatory to have 16-bit
cmpxchg() everywhere? I think we've historically tried hard to
keep this out of common code since it's expensive on architectures
that don't have native 16-bit load/store instructions (alpha, armv3)
and/or sub-word atomics (armv5, riscv, mips).

Does the code that uses this rely on working concurrently with
non-atomic stores to part of the 32-bit word? If we want to
allow that, we need to merge my alpha ev4/45/5 removal series
first.

For the cmpxchg() interface, I would prefer to handle the
8-bit and 16-bit versions the same way as cmpxchg64() and
provide separate cmpxchg8()/cmpxchg16()/cmpxchg32() functions
by architectures that operate on fixed-size integer values
but not compounds or pointers, and a generic cmpxchg() wrapper
in common code that can handle the abstraction for pointers,
long and (if absolutely necessary) compounds by multiplexing
between cmpxchg32() and cmpxchg64() where needed.

I did a prototype a few years ago and found that there is
probably under a dozen users of the sub-word atomics in
the tree, so this mostly requires changes to architecture
code and less to drivers and core code.

Arnd
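
The interface proposed above (fixed-size primitives plus a generic
wrapper in common code) could look roughly like the following user-space
sketch. The names cmpxchg32(), cmpxchg64() and cmpxchg_any() are
hypothetical, the GCC/Clang __atomic builtins stand in for architecture
code, and an LP64 target is assumed so that uintptr_t can carry every
operand. It is an illustration only, not the prototype mentioned in the
message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-size primitives; each returns the value found in memory. */
static inline uint32_t cmpxchg32(uint32_t *p, uint32_t old, uint32_t newval)
{
	assert(((uintptr_t)p & 0x3) == 0);	/* natural alignment is required */
	/* On failure the builtin writes the observed value into old. */
	__atomic_compare_exchange_n(p, &old, newval, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

static inline uint64_t cmpxchg64(uint64_t *p, uint64_t old, uint64_t newval)
{
	assert(((uintptr_t)p & 0x7) == 0);	/* natural alignment is required */
	__atomic_compare_exchange_n(p, &old, newval, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/*
 * Hypothetical generic wrapper for int, long and pointer operands.
 * It multiplexes between the two fixed-size primitives by operand size.
 * Assumption: LP64, so 8-byte operands fit in uintptr_t.
 */
#define cmpxchg_any(ptr, old, newval)					\
	((__typeof__(*(ptr)))(uintptr_t)(sizeof(*(ptr)) == 4 ?		\
		cmpxchg32((uint32_t *)(ptr),				\
			  (uint32_t)(uintptr_t)(old),			\
			  (uint32_t)(uintptr_t)(newval)) :		\
		cmpxchg64((uint64_t *)(ptr),				\
			  (uint64_t)(uintptr_t)(old),			\
			  (uint64_t)(uintptr_t)(newval))))
```

With this shape, only the wrapper has to know about pointers and long;
the fixed-size functions are ordinary inline functions with ordinary
argument type checking.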

2024-04-02 13:45:58

by Marco Elver

Subject: Re: [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions

On Mon, 1 Apr 2024 at 23:39, Paul E. McKenney <[email protected]> wrote:
>
> Architectures are required to provide four-byte cmpxchg() and 64-bit
> architectures are additionally required to provide eight-byte cmpxchg().
> However, there are cases where one-byte and two-byte cmpxchg()
> would be extremely useful. Therefore, provide cmpxchg_emu_u8() and
> cmpxchg_emu_u16() that emulate one-byte and two-byte cmpxchg() in terms
> of four-byte cmpxchg().
>
> Note that these emulations are fully ordered, and can (for example)
> cause one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
> If this causes problems for a given architecture, that architecture is
> free to provide its own lighter-weight primitives.
>
> [ paulmck: Apply Marco Elver feedback. ]
> [ paulmck: Apply kernel test robot feedback. ]
>
> Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/
>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Cc: Marco Elver <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: "Peter Zijlstra (Intel)" <[email protected]>
> Cc: Douglas Anderson <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: <[email protected]>

Acked-by: Marco Elver <[email protected]>

> ---
> arch/Kconfig | 3 ++
> include/linux/cmpxchg-emu.h | 16 ++++++++
> lib/Makefile | 1 +
> lib/cmpxchg-emu.c | 74 +++++++++++++++++++++++++++++++++++++
> 4 files changed, 94 insertions(+)
> create mode 100644 include/linux/cmpxchg-emu.h
> create mode 100644 lib/cmpxchg-emu.c
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index ae4a4f37bbf08..01093c60952a5 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
> # strict alignment always, even with -falign-functions.
> def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
>
> +config ARCH_NEED_CMPXCHG_1_2_EMU
> + bool
> +
> endmenu
> diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> new file mode 100644
> index 0000000000000..fee8171fa05eb
> --- /dev/null
> +++ b/include/linux/cmpxchg-emu.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> + * lacking direct support for these sizes. These are implemented in terms
> + * of 4-byte cmpxchg operations.
> + *
> + * Copyright (C) 2024 Paul E. McKenney.
> + */
> +
> +#ifndef __LINUX_CMPXCHG_EMU_H
> +#define __LINUX_CMPXCHG_EMU_H
> +
> +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new);
> +
> +#endif /* __LINUX_CMPXCHG_EMU_H */
> diff --git a/lib/Makefile b/lib/Makefile
> index ffc6b2341b45a..1d93b61a7ecbe 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> lib-$(CONFIG_GENERIC_BUG) += bug.o
>
> obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o
>
> obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
> #ensure exported functions have prototypes
> diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> new file mode 100644
> index 0000000000000..a88c4f3c88430
> --- /dev/null
> +++ b/lib/cmpxchg-emu.c
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> + * lacking direct support for these sizes. These are implemented in terms
> + * of 4-byte cmpxchg operations.
> + *
> + * Copyright (C) 2024 Paul E. McKenney.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/export.h>
> +#include <linux/instrumented.h>
> +#include <linux/atomic.h>
> +#include <linux/panic.h>
> +#include <linux/bug.h>
> +#include <asm-generic/rwonce.h>
> +#include <linux/cmpxchg-emu.h>
> +
> +union u8_32 {
> + u8 b[4];
> + u32 w;
> +};
> +
> +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> +{
> + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> + int i = ((uintptr_t)p) & 0x3;
> + union u8_32 old32;
> + union u8_32 new32;
> + u32 ret;
> +
> + ret = READ_ONCE(*p32);
> + do {
> + old32.w = ret;
> + if (old32.b[i] != old)
> + return old32.b[i];
> + new32.w = old32.w;
> + new32.b[i] = new;
> + instrument_atomic_read_write(p, 1);
> + ret = data_race(cmpxchg(p32, old32.w, new32.w));
> + } while (ret != old32.w);
> + return old;
> +}
> +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
> +
> +union u16_32 {
> + u16 h[2];
> + u32 w;
> +};
> +
> +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
> +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
> +{
> + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> + int i = (((uintptr_t)p) & 0x2) / 2;
> + union u16_32 old32;
> + union u16_32 new32;
> + u32 ret;
> +
> + WARN_ON_ONCE(((uintptr_t)p) & 0x1);
> + ret = READ_ONCE(*p32);
> + do {
> + old32.w = ret;
> + if (old32.h[i] != old)
> + return old32.h[i];
> + new32.w = old32.w;
> + new32.h[i] = new;
> + instrument_atomic_read_write(p, 2);
> + ret = data_race(cmpxchg(p32, old32.w, new32.w));
> + } while (ret != old32.w);
> + return old;
> +}
> +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
> --
> 2.40.1
>
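
For readers who want to experiment with the technique outside the
kernel, here is a minimal user-space sketch of the one-byte case. It
assumes GCC or Clang (the __atomic builtins replace the kernel's
cmpxchg() and READ_ONCE()), and the name emu_cmpxchg_u8() is made up for
this example. It mirrors the structure of the quoted patch but is not
the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulate a one-byte compare-and-swap using a four-byte one. */
uint8_t emu_cmpxchg_u8(volatile uint8_t *p, uint8_t old, uint8_t newval)
{
	/* Round the address down to the enclosing aligned 32-bit word. */
	uint32_t *p32 = (uint32_t *)((uintptr_t)p & ~(uintptr_t)0x3);
	/* Index of the target byte within that word, in memory order. */
	int i = (int)((uintptr_t)p & 0x3);
	uint32_t cur = __atomic_load_n(p32, __ATOMIC_RELAXED);

	for (;;) {
		uint8_t bytes[4];
		uint32_t want;

		memcpy(bytes, &cur, sizeof(cur));
		if (bytes[i] != old)
			return bytes[i];	/* mismatch: report what was found */
		bytes[i] = newval;
		memcpy(&want, bytes, sizeof(want));
		/*
		 * If any byte of the word changed since cur was read, the
		 * builtin fails, refreshes cur, and the loop retries.
		 */
		if (__atomic_compare_exchange_n(p32, &cur, want, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_SEQ_CST))
			return old;
	}
}
```

Copying through a byte array keeps the byte index independent of
endianness, which is the same effect the union in the patch achieves.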

2024-04-02 17:06:29

by Paul E. McKenney

Subject: Re: [PATCH RFC cmpxchg 3/8] ARC: Emulate one-byte and two-byte cmpxchg

On Tue, Apr 02, 2024 at 10:14:08AM +0200, Arnd Bergmann wrote:
> On Mon, Apr 1, 2024, at 23:39, Paul E. McKenney wrote:
> > Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
> > and two-byte cmpxchg() on arc.
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> I'm missing the context here, is it now mandatory to have 16-bit
> cmpxchg() everywhere? I think we've historically tried hard to
> keep this out of common code since it's expensive on architectures
> that don't have native 16-bit load/store instructions (alpha, armv3)
> and/or sub-word atomics (armv5, riscv, mips).

I need 8-bit, and just added 16-bit because it was easy to do so.
I would be OK dropping the 16-bit portions of this series, assuming
that no-one needs it. And assuming that it is easier to drop it than
to explain why it is not available. ;-)

> Does the code that uses this rely on working concurrently with
> non-atomic stores to part of the 32-bit word? If we want to
> allow that, we need to merge my alpha ev4/45/5 removal series
> first.

For 8-bit cmpxchg(), yes. There are potentially concurrent
smp_load_acquire() and smp_store_release() operations to this same byte.

Or is your question specific to the 16-bit primitives? (Full disclosure:
I have no objection to removing Alpha ev4/45/5, having several times
suggested removing Alpha entirely. And having the scars to prove it.)

> For the cmpxchg() interface, I would prefer to handle the
> 8-bit and 16-bit versions the same way as cmpxchg64() and
> provide separate cmpxchg8()/cmpxchg16()/cmpxchg32() functions
> by architectures that operate on fixed-size integer values
> but not compounds or pointers, and a generic cmpxchg() wrapper
> in common code that can handle the abstraction for pointers,
> long and (if absolutely necessary) compounds by multiplexing
> between cmpxchg32() and cmpxchg64() where needed.

So as to support _acquire(), _relaxed(), and _release()?

If so, I don't have any use cases for other than full ordering.

> I did a prototype a few years ago and found that there is
> probably under a dozen users of the sub-word atomics in
> the tree, so this mostly requires changes to architecture
> code and less to drivers and core code.

Given this approach, the predominance of changes to architecture code
seems quite likely to me.

But do we really wish to invest that much work into architectures that
might not be all that long for the world? (Quickly donning my old
asbestos suit, the one with the tungsten pinstripes...)

Thanx, Paul

2024-04-02 17:15:41

by Paul E. McKenney

Subject: Re: [PATCH RFC cmpxchg 1/8] lib: Add one-byte and two-byte cmpxchg() emulation functions

On Tue, Apr 02, 2024 at 03:07:22PM +0200, Marco Elver wrote:
> On Mon, 1 Apr 2024 at 23:39, Paul E. McKenney <[email protected]> wrote:
> >
> > Architectures are required to provide four-byte cmpxchg() and 64-bit
> > architectures are additionally required to provide eight-byte cmpxchg().
> > However, there are cases where one-byte and two-byte cmpxchg()
> > would be extremely useful. Therefore, provide cmpxchg_emu_u8() and
> > cmpxchg_emu_u16() that emulate one-byte and two-byte cmpxchg() in terms
> > of four-byte cmpxchg().
> >
> > Note that these emulations are fully ordered, and can (for example)
> > cause one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
> > If this causes problems for a given architecture, that architecture is
> > free to provide its own lighter-weight primitives.
> >
> > [ paulmck: Apply Marco Elver feedback. ]
> > [ paulmck: Apply kernel test robot feedback. ]
> >
> > Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > Cc: Marco Elver <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: "Peter Zijlstra (Intel)" <[email protected]>
> > Cc: Douglas Anderson <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > Cc: <[email protected]>
>
> Acked-by: Marco Elver <[email protected]>

Thank you! I will apply on my next rebase.

Thanx, Paul

> > ---
> > arch/Kconfig | 3 ++
> > include/linux/cmpxchg-emu.h | 16 ++++++++
> > lib/Makefile | 1 +
> > lib/cmpxchg-emu.c | 74 +++++++++++++++++++++++++++++++++++++
> > 4 files changed, 94 insertions(+)
> > create mode 100644 include/linux/cmpxchg-emu.h
> > create mode 100644 lib/cmpxchg-emu.c
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index ae4a4f37bbf08..01093c60952a5 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
> > # strict alignment always, even with -falign-functions.
> > def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG
> >
> > +config ARCH_NEED_CMPXCHG_1_2_EMU
> > + bool
> > +
> > endmenu
> > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
> > new file mode 100644
> > index 0000000000000..fee8171fa05eb
> > --- /dev/null
> > +++ b/include/linux/cmpxchg-emu.h
> > @@ -0,0 +1,16 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +/*
> > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> > + * lacking direct support for these sizes. These are implemented in terms
> > + * of 4-byte cmpxchg operations.
> > + *
> > + * Copyright (C) 2024 Paul E. McKenney.
> > + */
> > +
> > +#ifndef __LINUX_CMPXCHG_EMU_H
> > +#define __LINUX_CMPXCHG_EMU_H
> > +
> > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
> > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new);
> > +
> > +#endif /* __LINUX_CMPXCHG_EMU_H */
> > diff --git a/lib/Makefile b/lib/Makefile
> > index ffc6b2341b45a..1d93b61a7ecbe 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> > lib-$(CONFIG_GENERIC_BUG) += bug.o
> >
> > obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o
> >
> > obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
> > #ensure exported functions have prototypes
> > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
> > new file mode 100644
> > index 0000000000000..a88c4f3c88430
> > --- /dev/null
> > +++ b/lib/cmpxchg-emu.c
> > @@ -0,0 +1,74 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +/*
> > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures
> > + * lacking direct support for these sizes. These are implemented in terms
> > + * of 4-byte cmpxchg operations.
> > + *
> > + * Copyright (C) 2024 Paul E. McKenney.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/export.h>
> > +#include <linux/instrumented.h>
> > +#include <linux/atomic.h>
> > +#include <linux/panic.h>
> > +#include <linux/bug.h>
> > +#include <asm-generic/rwonce.h>
> > +#include <linux/cmpxchg-emu.h>
> > +
> > +union u8_32 {
> > + u8 b[4];
> > + u32 w;
> > +};
> > +
> > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
> > +{
> > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > + int i = ((uintptr_t)p) & 0x3;
> > + union u8_32 old32;
> > + union u8_32 new32;
> > + u32 ret;
> > +
> > + ret = READ_ONCE(*p32);
> > + do {
> > + old32.w = ret;
> > + if (old32.b[i] != old)
> > + return old32.b[i];
> > + new32.w = old32.w;
> > + new32.b[i] = new;
> > + instrument_atomic_read_write(p, 1);
> > + ret = data_race(cmpxchg(p32, old32.w, new32.w));
> > + } while (ret != old32.w);
> > + return old;
> > +}
> > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
> > +
> > +union u16_32 {
> > + u16 h[2];
> > + u32 w;
> > +};
> > +
> > +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */
> > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new)
> > +{
> > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
> > + int i = (((uintptr_t)p) & 0x2) / 2;
> > + union u16_32 old32;
> > + union u16_32 new32;
> > + u32 ret;
> > +
> > + WARN_ON_ONCE(((uintptr_t)p) & 0x1);
> > + ret = READ_ONCE(*p32);
> > + do {
> > + old32.w = ret;
> > + if (old32.h[i] != old)
> > + return old32.h[i];
> > + new32.w = old32.w;
> > + new32.h[i] = new;
> > + instrument_atomic_read_write(p, 2);
> > + ret = data_race(cmpxchg(p32, old32.w, new32.w));
> > + } while (ret != old32.w);
> > + return old;
> > +}
> > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);
> > --
> > 2.40.1
> >

2024-04-02 20:52:33

by Paul E. McKenney

Subject: Re: [PATCH RFC cmpxchg 3/8] ARC: Emulate one-byte and two-byte cmpxchg

On Tue, Apr 02, 2024 at 10:06:14AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 02, 2024 at 10:14:08AM +0200, Arnd Bergmann wrote:
> > On Mon, Apr 1, 2024, at 23:39, Paul E. McKenney wrote:
> > > Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
> > > and two-byte cmpxchg() on arc.
> > >
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > I'm missing the context here, is it now mandatory to have 16-bit
> > cmpxchg() everywhere? I think we've historically tried hard to
> > keep this out of common code since it's expensive on architectures
> > that don't have native 16-bit load/store instructions (alpha, armv3)
> > and/or sub-word atomics (armv5, riscv, mips).
>
> I need 8-bit, and just added 16-bit because it was easy to do so.
> I would be OK dropping the 16-bit portions of this series, assuming
> that no-one needs it. And assuming that it is easier to drop it than
> to explain why it is not available. ;-)
>
> > Does the code that uses this rely on working concurrently with
> > non-atomic stores to part of the 32-bit word? If we want to
> > allow that, we need to merge my alpha ev4/45/5 removal series
> > first.
>
> For 8-bit cmpxchg(), yes. There are potentially concurrent
> smp_load_acquire() and smp_store_release() operations to this same byte.
>
> Or is your question specific to the 16-bit primitives? (Full disclosure:
> I have no objection to removing Alpha ev4/45/5, having several times
> suggested removing Alpha entirely. And having the scars to prove it.)
>
> > For the cmpxchg() interface, I would prefer to handle the
> > 8-bit and 16-bit versions the same way as cmpxchg64() and
> > provide separate cmpxchg8()/cmpxchg16()/cmpxchg32() functions
> > by architectures that operate on fixed-size integer values
> > but not compounds or pointers, and a generic cmpxchg() wrapper
> > in common code that can handle the abstraction for pointers,
> > long and (if absolutely necessary) compounds by multiplexing
> > between cmpxchg32() and cmpxchg64() where needed.
>
> So as to support _acquire(), _relaxed(), and _release()?
>
> If so, I don't have any use cases for other than full ordering.

Nor any use cases other than integers. (In case another thing you are
after here is good type-checking for non-integers combined with allowing
C-language implicit conversions for integers.)

Thanx, Paul

> > I did a prototype a few years ago and found that there is
> > probably under a dozen users of the sub-word atomics in
> > the tree, so this mostly requires changes to architecture
> > code and less to drivers and core code.
>
> Given this approach, the predominance of changes to architecture code
> seems quite likely to me.
>
> But do we really wish to invest that much work into architectures that
> might not be all that long for the world? (Quickly donning my old
> asbestos suit, the one with the tungsten pinstripes...)
>
> Thanx, Paul

2024-04-04 11:58:31

by Arnd Bergmann

Subject: Re: [PATCH RFC cmpxchg 3/8] ARC: Emulate one-byte and two-byte cmpxchg

On Tue, Apr 2, 2024, at 19:06, Paul E. McKenney wrote:
> On Tue, Apr 02, 2024 at 10:14:08AM +0200, Arnd Bergmann wrote:
>> On Mon, Apr 1, 2024, at 23:39, Paul E. McKenney wrote:
>> > Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
>> > and two-byte cmpxchg() on arc.
>> >
>> > Signed-off-by: Paul E. McKenney <[email protected]>
>>
>> I'm missing the context here, is it now mandatory to have 16-bit
>> cmpxchg() everywhere? I think we've historically tried hard to
>> keep this out of common code since it's expensive on architectures
>> that don't have native 16-bit load/store instructions (alpha, armv3)
>> and/or sub-word atomics (armv5, riscv, mips).
>
> I need 8-bit, and just added 16-bit because it was easy to do so.
> I would be OK dropping the 16-bit portions of this series, assuming
> that no-one needs it. And assuming that it is easier to drop it than
> to explain why it is not available. ;-)

It certainly makes sense to handle both the same, whichever
way we do this.

>> Does the code that uses this rely on working concurrently with
>> non-atomic stores to part of the 32-bit word? If we want to
>> allow that, we need to merge my alpha ev4/45/5 removal series
>> first.
>
> For 8-bit cmpxchg(), yes. There are potentially concurrent
> smp_load_acquire() and smp_store_release() operations to this same byte.
>
> Or is your question specific to the 16-bit primitives? (Full disclosure:
> I have no objection to removing Alpha ev4/45/5, having several times
> suggested removing Alpha entirely. And having the scars to prove it.)

For the native sub-word access, alpha ev5 cannot do either of
them, while armv3 could do byte access but not 16-bit words.

It sounds like the old alphas are already broken then, which
is a good reason to finally drop support. It would appear that
the arm riscpc is not affected by this though.

>> For the cmpxchg() interface, I would prefer to handle the
>> 8-bit and 16-bit versions the same way as cmpxchg64() and
>> provide separate cmpxchg8()/cmpxchg16()/cmpxchg32() functions
>> by architectures that operate on fixed-size integer values
>> but not compounds or pointers, and a generic cmpxchg() wrapper
>> in common code that can handle the abstraction for pointers,
>> long and (if absolutely necessary) compounds by multiplexing
>> between cmpxchg32() and cmpxchg64() where needed.
>
> So as to support _acquire(), _relaxed(), and _release()?
>
> If so, I don't have any use cases for other than full ordering.

My main goal here would be to simplify the definition of
the very commonly used cmpxchg() macro so it doesn't have
to deal with so many corner cases, and make it work the
same way across all architectures. Without the type
agnostic wrapper, there would also be the benefit of
additional type checking that we get by replacing the
macros with inline functions.

We'd still need all the combinations of cmpxchg() and xchg()
with the four sizes and ordering variants, but at least the
latter should easily collapse on most architectures. At the
moment, most architectures only provide the full-ordering
version.

Arnd
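
The hazard behind the question about non-atomic stores can be
illustrated with a single-threaded simulation. It assumes a machine
without native byte stores, where a byte store is carried out as a
load/merge/store of the whole word; the interleaving of the two "CPUs"
is forced by hand, and all names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Step 1 of a byte store done as read-modify-write: load the whole word. */
static uint32_t rmw_store_begin(const uint32_t *word)
{
	return *word;
}

/* Step 2: merge the new byte into the stale snapshot and store the word. */
static void rmw_store_finish(uint32_t *word, uint32_t snapshot,
			     int idx, uint8_t val)
{
	uint8_t bytes[4];

	memcpy(bytes, &snapshot, sizeof(snapshot));
	bytes[idx] = val;
	memcpy(word, bytes, sizeof(bytes));
}

/* Returns 1 if the compare-and-swap update to byte 0 was lost. */
int lost_update_demo(void)
{
	uint32_t word = 0;
	uint32_t expect = 0, desired;
	uint8_t b[4] = { 0xAA, 0, 0, 0 };
	uint32_t snap;

	/* "CPU 0" begins a plain store to byte 1. */
	snap = rmw_store_begin(&word);

	/* "CPU 1" runs an emulated cmpxchg on byte 0, which succeeds. */
	memcpy(&desired, b, sizeof(desired));
	__atomic_compare_exchange_n(&word, &expect, desired, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);

	/* "CPU 0" finishes and writes its stale copy of byte 0 back. */
	rmw_store_finish(&word, snap, 1, 0xBB);

	memcpy(b, &word, sizeof(word));
	return b[0] != 0xAA;
}
```

With real byte stores the second step touches only byte 1, so the update
to byte 0 survives; that is the difference between the two classes of
hardware discussed above.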

2024-04-04 14:36:23

by Palmer Dabbelt

Subject: Re: [PATCH RFC cmpxchg 8/8] riscv: Emulate one-byte and two-byte cmpxchg

On Mon, 01 Apr 2024 14:39:50 PDT (-0700), [email protected] wrote:
> Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
> and two-byte cmpxchg() on riscv.
>
> [ paulmck: Apply kernel test robot feedback. ]

I'm not entirely following the thread, but sounds like there's going to
be generic kernel users of this now? Before we'd said "no" to the
byte/half atomic emulation routines because they weren't used, but if
it's a generic thing then I'm fine adding them.

There's a patch set over here
<https://lore.kernel.org/all/[email protected]/>
that implements these more directly using LR/SC. I was sort of on the
fence about just taking it even with no direct users right now, as the
byte/half atomic extension is working its way through the spec process
so we'll have them for real soon. I stopped right there for the last
merge window, though, as I figured it was too late to be messing with
the atomics...

So

Acked-by: Palmer Dabbelt <[email protected]>

if you guys want to take some sort of tree-wide change to make the
byte/half stuff be required everywhere. We'll eventually end up with
arch routines for the extension, so at that point we might as well also
have the more direct LR/SC flavors.

If you want I can go review/merge that RISC-V patch set and then it'll
have time to bake for a shared tag you can pick up for all this stuff?
No rush on my end, just LMK.

> Signed-off-by: Paul E. McKenney <[email protected]>
> Cc: Andi Shyti <[email protected]>
> Cc: Andrzej Hajda <[email protected]>
> Cc: <[email protected]>
> ---
> arch/riscv/Kconfig | 1 +
> arch/riscv/include/asm/cmpxchg.h | 25 +++++++++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index be09c8836d56b..4eaf40d0a52ec 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -44,6 +44,7 @@ config RISCV
> select ARCH_HAS_UBSAN
> select ARCH_HAS_VDSO_DATA
> select ARCH_KEEP_MEMBLOCK if ACPI
> + select ARCH_NEED_CMPXCHG_1_2_EMU
> select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
> select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
> select ARCH_STACKWALK
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index 2fee65cc84432..a5b377481785c 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -9,6 +9,7 @@
> #include <linux/bug.h>
>
> #include <asm/fence.h>
> +#include <linux/cmpxchg-emu.h>
>
> #define __xchg_relaxed(ptr, new, size) \
> ({ \
> @@ -170,6 +171,12 @@
> __typeof__(*(ptr)) __ret; \
> register unsigned int __rc; \
> switch (size) { \
> + case 1: \
> + __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> + break; \
> + case 2: \
> + __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
> + break; \
> case 4: \
> __asm__ __volatile__ ( \
> "0: lr.w %0, %2\n" \
> @@ -214,6 +221,12 @@
> __typeof__(*(ptr)) __ret; \
> register unsigned int __rc; \
> switch (size) { \
> + case 1: \
> + __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> + break; \
> + case 2: \
> + __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
> + break; \
> case 4: \
> __asm__ __volatile__ ( \
> "0: lr.w %0, %2\n" \
> @@ -260,6 +273,12 @@
> __typeof__(*(ptr)) __ret; \
> register unsigned int __rc; \
> switch (size) { \
> + case 1: \
> + __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> + break; \
> + case 2: \
> + __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
> + break; \
> case 4: \
> __asm__ __volatile__ ( \
> RISCV_RELEASE_BARRIER \
> @@ -306,6 +325,12 @@
> __typeof__(*(ptr)) __ret; \
> register unsigned int __rc; \
> switch (size) { \
> + case 1: \
> + __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> + break; \
> + case 2: \
> + __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
> + break; \
> case 4: \
> __asm__ __volatile__ ( \
> "0: lr.w %0, %2\n" \

2024-04-04 14:46:04

by Paul E. McKenney

Subject: Re: [PATCH RFC cmpxchg 3/8] ARC: Emulate one-byte and two-byte cmpxchg

On Thu, Apr 04, 2024 at 01:57:32PM +0200, Arnd Bergmann wrote:
> On Tue, Apr 2, 2024, at 19:06, Paul E. McKenney wrote:
> > On Tue, Apr 02, 2024 at 10:14:08AM +0200, Arnd Bergmann wrote:
> >> On Mon, Apr 1, 2024, at 23:39, Paul E. McKenney wrote:
> >> > Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
> >> > and two-byte cmpxchg() on arc.
> >> >
> >> > Signed-off-by: Paul E. McKenney <[email protected]>
> >>
> >> I'm missing the context here, is it now mandatory to have 16-bit
> >> cmpxchg() everywhere? I think we've historically tried hard to
> >> keep this out of common code since it's expensive on architectures
> >> that don't have native 16-bit load/store instructions (alpha, armv3)
> >> and/or sub-word atomics (armv5, riscv, mips).
> >
> > I need 8-bit, and just added 16-bit because it was easy to do so.
> > I would be OK dropping the 16-bit portions of this series, assuming
> > that no-one needs it. And assuming that it is easier to drop it than
> > to explain why it is not available. ;-)
>
> It certainly makes sense to handle both the same, whichever
> way we do this.

Agreed, at least as long as the properties of the relevant hardware are
consistent with doing so.

> >> Does the code that uses this rely on working concurrently with
> >> non-atomic stores to part of the 32-bit word? If we want to
> >> allow that, we need to merge my alpha ev4/45/5 removal series
> >> first.
> >
> > For 8-bit cmpxchg(), yes. There are potentially concurrent
> > smp_load_acquire() and smp_store_release() operations to this same byte.
> >
> > Or is your question specific to the 16-bit primitives? (Full disclosure:
> > I have no objection to removing Alpha ev4/45/5, having several times
> > suggested removing Alpha entirely. And having the scars to prove it.)
>
> For the native sub-word access, alpha ev5 cannot do either of
> them, while armv3 could do byte access but not 16-bit words.
>
> It sounds like the old alphas are already broken then, which
> is a good reason to finally drop support. It would appear that
> the arm riscpc is not affected by this though.

Good point, given that single-byte load/store and emulated single-byte
cmpxchg() has been in common code for some time.

So armv3 is OK with one-byte emulated cmpxchg(), but not with two-byte,
which is consistent with the current state of this series in the -rcu
tree. (My plan is to wait until Monday to re-send the series in order
to allow the test robots to find yet more bugs.)

Or is the plan to also drop support for armv3 in the near term?

> >> For the cmpxchg() interface, I would prefer to handle the
> >> 8-bit and 16-bit versions the same way as cmpxchg64() and
> >> provide separate cmpxchg8()/cmpxchg16()/cmpxchg32() functions
> >> by architectures that operate on fixed-size integer values
> >> but not compounds or pointers, and a generic cmpxchg() wrapper
> >> in common code that can handle the abstraction for pointers,
> >> long and (if absolutely necessary) compounds by multiplexing
> >> between cmpxchg32() and cmpxchg64() where needed.
> >
> > So as to support _acquire(), _relaxed(), and _release()?
> >
> > If so, I don't have any use cases for other than full ordering.
>
> My main goal here would be to simplify the definition of
> the very commonly used cmpxchg() macro so it doesn't have
> to deal with so many corner cases, and make it work the
> same way across all architectures. Without the type
> agnostic wrapper, there would also be the benefit of
> additional type checking that we get by replacing the
> macros with inline functions.
>
> We'd still need all the combinations of cmpxchg() and xchg()
> with the four sizes and ordering variants, but at least the
> latter should easily collapse on most architectures. At the
> moment, most architectures only provide the full-ordering
> version.

That does make a lot of sense to me. Though C-language inline functions
have some trouble with type-generic parameters.

However, my patch is down at architecture-specific level, so should not
affect the cmpxchg() macro, right? Or am I missing some aspect of your
proposed refactoring?

Thanx, Paul
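
One way around the type-generic difficulty mentioned above is to let
C11 _Generic pick a typed inline function from the pointer type, so each
call is still checked against a real prototype. This is a sketch of that
idea only, with invented names (cas_u8(), cas_u32(), cas()) and the
GCC/Clang __atomic builtins; it is not something the thread proposes.

```c
#include <assert.h>
#include <stdint.h>

/* Typed primitives; each returns the value found in memory. */
static inline uint8_t cas_u8(uint8_t *p, uint8_t old, uint8_t newval)
{
	/* On failure the builtin writes the observed value into old. */
	__atomic_compare_exchange_n(p, &old, newval, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

static inline uint32_t cas_u32(uint32_t *p, uint32_t old, uint32_t newval)
{
	__atomic_compare_exchange_n(p, &old, newval, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/*
 * Dispatch on the pointer type. A pointer type that is not listed
 * fails to compile instead of being silently reinterpreted.
 */
#define cas(ptr, old, newval)						\
	_Generic((ptr),							\
		 uint8_t *: cas_u8,					\
		 uint32_t *: cas_u32)((ptr), (old), (newval))
```

The integer arguments still undergo the usual implicit conversions,
while a wrong pointer type is rejected at build time.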

2024-04-04 14:50:52

by Paul E. McKenney

Subject: Re: [PATCH RFC cmpxchg 8/8] riscv: Emulate one-byte and two-byte cmpxchg

On Thu, Apr 04, 2024 at 07:15:40AM -0700, Palmer Dabbelt wrote:
> On Mon, 01 Apr 2024 14:39:50 PDT (-0700), [email protected] wrote:
> > Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
> > and two-byte cmpxchg() on riscv.
> >
> > [ paulmck: Apply kernel test robot feedback. ]
>
> I'm not entirely following the thread, but sounds like there's going to be
> generic kernel users of this now? Before we'd said "no" to the byte/half
> atomic emulation routines because they weren't used, but if it's a generic
> thing then I'm fine adding them.

RCU currently contains an open-coded counterpart of the proposed
cmpxchg_emu_u8() function, so yes. ;-)

> There's a patch set over here
> <https://lore.kernel.org/all/[email protected]/>
> that implements these more directly using LR/SC. I was sort of on the fence
> about just taking it even with no direct users right now, as the byte/half
> atomic extension is working its way through the spec process so we'll have
> them for real soon. I stopped right there for the last merge window,
> though, as I figured it was too late to be messing with the atomics...

I would be extremely happy to drop my riscv patch in favor of an
architecture-specific implementation, especially a more-efficient
implementation. ;-)

> So
>
> Acked-by: Palmer Dabbelt <[email protected]>
>
> if you guys want to take some sort of tree-wide change to make the byte/half
> stuff be required everywhere. We'll eventually end up with arch routines
> for the extension, so at that point we might as well also have the more
> direct LR/SC flavors.
>
> If you want I can go review/merge that RISC-V patch set and then it'll have
> time to bake for a shared tag you can pick up for all this stuff? No rush
> on my end, just LMK.

That sounds very good! I will apply your ack to my emulation commit
in the meantime, so your schedule is my schedule. And a big "thank
you!" for both!!!

Thanx, Paul

> > Signed-off-by: Paul E. McKenney <[email protected]>
> > Cc: Andi Shyti <[email protected]>
> > Cc: Andrzej Hajda <[email protected]>
> > Cc: <[email protected]>
> > ---
> > arch/riscv/Kconfig | 1 +
> > arch/riscv/include/asm/cmpxchg.h | 25 +++++++++++++++++++++++++
> > 2 files changed, 26 insertions(+)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index be09c8836d56b..4eaf40d0a52ec 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -44,6 +44,7 @@ config RISCV
> > select ARCH_HAS_UBSAN
> > select ARCH_HAS_VDSO_DATA
> > select ARCH_KEEP_MEMBLOCK if ACPI
> > + select ARCH_NEED_CMPXCHG_1_2_EMU
> > select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
> > select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
> > select ARCH_STACKWALK
> > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > index 2fee65cc84432..a5b377481785c 100644
> > --- a/arch/riscv/include/asm/cmpxchg.h
> > +++ b/arch/riscv/include/asm/cmpxchg.h
> > @@ -9,6 +9,7 @@
> > #include <linux/bug.h>
> >
> > #include <asm/fence.h>
> > +#include <linux/cmpxchg-emu.h>
> >
> > #define __xchg_relaxed(ptr, new, size) \
> > ({ \
> > @@ -170,6 +171,12 @@
> > __typeof__(*(ptr)) __ret; \
> > register unsigned int __rc; \
> > switch (size) { \
> > + case 1: \
> > + __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > + break; \
> > + case 2: \
> > + __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
> > + break; \
> > case 4: \
> > __asm__ __volatile__ ( \
> > "0: lr.w %0, %2\n" \
> > @@ -214,6 +221,12 @@
> > __typeof__(*(ptr)) __ret; \
> > register unsigned int __rc; \
> > switch (size) { \
> > + case 1: \
> > + __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > + break; \
> > + case 2: \
> > + __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
> > + break; \
> > case 4: \
> > __asm__ __volatile__ ( \
> > "0: lr.w %0, %2\n" \
> > @@ -260,6 +273,12 @@
> > __typeof__(*(ptr)) __ret; \
> > register unsigned int __rc; \
> > switch (size) { \
> > + case 1: \
> > + __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > + break; \
> > + case 2: \
> > + __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
> > + break; \
> > case 4: \
> > __asm__ __volatile__ ( \
> > RISCV_RELEASE_BARRIER \
> > @@ -306,6 +325,12 @@
> > __typeof__(*(ptr)) __ret; \
> > register unsigned int __rc; \
> > switch (size) { \
> > + case 1: \
> > + __ret = cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > + break; \
> > + case 2: \
> > + __ret = cmpxchg_emu_u16((volatile u16 *)__ptr, __old, __new); \
> > + break; \
> > case 4: \
> > __asm__ __volatile__ ( \
> > "0: lr.w %0, %2\n" \

2024-04-04 15:40:38

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH RFC cmpxchg 3/8] ARC: Emulate one-byte and two-byte cmpxchg

On Thu, Apr 4, 2024, at 16:44, Paul E. McKenney wrote:
> On Thu, Apr 04, 2024 at 01:57:32PM +0200, Arnd Bergmann wrote:
>> On Tue, Apr 2, 2024, at 19:06, Paul E. McKenney wrote:
>> > Or is your question specific to the 16-bit primitives? (Full disclosure:
>> > I have no objection to removing Alpha ev4/45/5, having several times
>> > suggested removing Alpha entirely. And having the scars to prove it.)
>>
>> For the native sub-word access, alpha ev5 cannot do either of
>> them, while armv3 could do byte access but not 16-bit words.
>>
>> It sounds like the old alphas are already broken then, which
>> is a good reason to finally drop support. It would appear that
>> the arm riscpc is not affected by this though.
>
> Good point, given that single-byte load/store and emulated single-byte
> cmpxchg() have been in common code for some time.
>
> So armv3 is OK with one-byte emulated cmpxchg(), but not with two-byte,
> which is consistent with the current state of this series in the -rcu
> tree. (My plan is to wait until Monday to re-send the series in order
> to allow the test robots to find yet more bugs.)
>
> Or is the plan to also drop support for armv3 in the near term?

Russell still has his RiscPC, which is probably the only one
using armv3 kernels (it's actually v4 but relies on -march=armv3
to work around hardware quirks). Since armv3 support was dropped
in gcc-9, it's a matter of time before we drop this as well, but
it's not now.

>> We'd still need all the combinations of cmpxchg() and xchg()
>> with the four sizes and ordering variants, but at least the
>> latter should easily collapse on most architectures. At the
>> moment, most architectures only provide the full-ordering
>> version.
>
> That does make a lot of sense to me. Though C-language inline functions
> have some trouble with type-generic parameters.
>
> However, my patch is down at architecture-specific level, so should not
> affect the cmpxchg() macro, right? Or am I missing some aspect of your
> proposed refactoring?

Today, arch_cmpxchg() and its variants are defined in each
architecture to handle some subset of integer sizes.

The way I'd like to do this in the future would be to remove that
macro in favor of arch_{cmp,}xchg{8,16,32,64}{,_relaxed,_release,
_acquire,_local} inline functions that each architecture needs
to provide either directly or through a generic helper, with
all the macros wrapping them moved to common code.
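
For illustration only, here is a user-space sketch of the shape described
above: fixed-size inline helpers plus one common size-dispatching macro.
It is not the proposed kernel interface. The names my_cmpxchg8() through
my_cmpxchg64(), my_cmpxchg() and my_cmpxchg_demo() are hypothetical, the
GCC/Clang __atomic builtins stand in for whatever each architecture would
provide, only the fully ordered variant is shown, and GNU C statement
expressions are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-size helpers; each architecture would supply its own. */
#define DEFINE_MY_CMPXCHG(bits)						\
static inline uint##bits##_t my_cmpxchg##bits(volatile uint##bits##_t *p, \
					      uint##bits##_t old,	\
					      uint##bits##_t new)	\
{									\
	/* On failure the builtin copies the value it found into old. */ \
	__atomic_compare_exchange_n(p, &old, new, 0,			\
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); \
	return old;							\
}

DEFINE_MY_CMPXCHG(8)
DEFINE_MY_CMPXCHG(16)
DEFINE_MY_CMPXCHG(32)
DEFINE_MY_CMPXCHG(64)

/* Common wrapper: pick the helper from the operand size. */
#define my_cmpxchg(ptr, old, new)					\
({									\
	__typeof__(ptr) p_ = (ptr);					\
	__typeof__(*(ptr)) o_ = (old);					\
	__typeof__(*(ptr)) n_ = (new);					\
	__typeof__(*(ptr)) ret_;					\
	switch (sizeof(*p_)) {						\
	case 1:								\
		ret_ = my_cmpxchg8((volatile uint8_t *)p_, o_, n_);	\
		break;							\
	case 2:								\
		ret_ = my_cmpxchg16((volatile uint16_t *)p_, o_, n_);	\
		break;							\
	case 4:								\
		ret_ = my_cmpxchg32((volatile uint32_t *)p_, o_, n_);	\
		break;							\
	case 8:								\
		ret_ = my_cmpxchg64((volatile uint64_t *)p_, o_, n_);	\
		break;							\
	default:							\
		__builtin_trap();	/* unsupported operand size */	\
	}								\
	ret_;								\
})

/* Exercise two operand sizes; returns 0 once every check has passed. */
int my_cmpxchg_demo(void)
{
	uint8_t b = 5, prev8;
	uint64_t q = 1, prev64;

	prev8 = my_cmpxchg(&b, 5, 6);		/* match: store happens */
	assert(prev8 == 5 && b == 6);
	prev8 = my_cmpxchg(&b, 5, 7);		/* mismatch: no store */
	assert(prev8 == 6 && b == 6);
	prev64 = my_cmpxchg(&q, 1, 1ULL << 40);	/* 64-bit operand */
	assert(prev64 == 1 && q == (1ULL << 40));
	return 0;
}
```

The size switch is resolved at compile time, so only one helper call
survives per call site; the other branches are dead code.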

I've been wanting to change this for years now and it never
quite makes it to the top of my todo pile, so I guess it can wait
a little longer.

Arnd

2024-04-08 17:48:07

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg()

Hello!

This series provides an emulation function for one-byte cmpxchg(), and uses it
for those architectures not supporting this in hardware. The emulation
is in terms of the fully ordered four-byte cmpxchg() that is supplied
by all of these architectures. This was tested by making x86 forget
that it can do one-byte cmpxchg() natively:

88a9b3f7a924 ("EXP arch/x86: Test one-byte cmpxchg emulation")

This commit is local to -rcu and is of course not intended for mainline.

If accepted, RCU Tasks will use this capability in place of the current
rcu_trc_cmpxchg_need_qs() open-coding of this emulation.

1. sparc32: make __cmpxchg_u32() return u32, courtesy of Al Viro.

2. sparc32: make the first argument of __cmpxchg_u64() volatile u64 *,
courtesy of Al Viro.

3. sparc32: unify __cmpxchg_u{32,64}, courtesy of Al Viro.

4. sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those
sizes, courtesy of Al Viro.

5. parisc: __cmpxchg_u32(): lift conversion into the callers, courtesy of
Al Viro.

6. parisc: unify implementations of __cmpxchg_u{8,32,64}, courtesy of
Al Viro.

7. parisc: add missing export of __cmpxchg_u8(), courtesy of Al Viro.

8. parisc: add u16 support to cmpxchg(), courtesy of Al Viro.

9. lib: Add one-byte emulation function.

10. ARC: Emulate one-byte cmpxchg.

11. csky: Emulate one-byte cmpxchg.

12. sh: Emulate one-byte cmpxchg.

13. xtensa: Emulate one-byte cmpxchg.

14. riscv: Emulate one-byte cmpxchg.

Changes since v1:

o Add native support for sparc32 and parisc, courtesy of Al Viro.

o Remove two-byte emulation due to architectures that still do not
support two-byte load and store instructions, per Arnd Bergmann
feedback. (Yes, there are a few systems out there that do not
even support one-byte load instructions, but these are slated
for removal anyway.)

o Fix numerous casting bugs spotted by kernel test robot.

o Fix SPDX header. "//" for .c files and "/*" for .h files.
I am sure that there is a good reason for this. ;-)

Thanx, Paul

------------------------------------------------------------------------

arch/parisc/include/asm/cmpxchg.h | 19 +++++-------
arch/parisc/kernel/parisc_ksyms.c | 1
arch/parisc/lib/bitops.c | 52 +++++++++++-----------------------
arch/sparc/include/asm/cmpxchg_32.h | 18 +++++------
arch/sparc/lib/atomic32.c | 47 +++++++++++++-----------------
b/arch/Kconfig | 3 +
b/arch/arc/Kconfig | 1
b/arch/arc/include/asm/cmpxchg.h | 32 +++++++++++++++-----
b/arch/csky/Kconfig | 1
b/arch/csky/include/asm/cmpxchg.h | 10 ++++++
b/arch/parisc/include/asm/cmpxchg.h | 3 -
b/arch/parisc/kernel/parisc_ksyms.c | 1
b/arch/parisc/lib/bitops.c | 6 +--
b/arch/riscv/Kconfig | 1
b/arch/riscv/include/asm/cmpxchg.h | 13 ++++++++
b/arch/sh/Kconfig | 1
b/arch/sh/include/asm/cmpxchg.h | 2 +
b/arch/sparc/include/asm/cmpxchg_32.h | 4 +-
b/arch/sparc/lib/atomic32.c | 4 +-
b/arch/xtensa/Kconfig | 1
b/arch/xtensa/include/asm/cmpxchg.h | 2 +
b/include/linux/cmpxchg-emu.h | 15 +++++++++
b/lib/Makefile | 1
b/lib/cmpxchg-emu.c | 45 +++++++++++++++++++++++++++++
24 files changed, 184 insertions(+), 99 deletions(-)

2024-04-08 17:50:20

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 03/14] sparc32: unify __cmpxchg_u{32,64}

From: Al Viro <[email protected]>

Add a macro that expands to one of those when given u32 or u64
as an argument - atomic32.c has a lot of similar stuff already.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/sparc/lib/atomic32.c | 41 +++++++++++++++------------------------
1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index e15affbbb5238..0d215a772428e 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -159,32 +159,23 @@ unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
}
EXPORT_SYMBOL(sp32___change_bit);

-u32 __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
-{
- unsigned long flags;
- u32 prev;
-
- spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
- if ((prev = *ptr) == old)
- *ptr = new;
- spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
-
- return prev;
-}
+#define CMPXCHG(T) \
+ T __cmpxchg_##T(volatile T *ptr, T old, T new) \
+ { \
+ unsigned long flags; \
+ T prev; \
+ \
+ spin_lock_irqsave(ATOMIC_HASH(ptr), flags); \
+ if ((prev = *ptr) == old) \
+ *ptr = new; \
+ spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);\
+ \
+ return prev; \
+ }
+
+CMPXCHG(u32)
+CMPXCHG(u64)
EXPORT_SYMBOL(__cmpxchg_u32);
-
-u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
-{
- unsigned long flags;
- u64 prev;
-
- spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
- if ((prev = *ptr) == old)
- *ptr = new;
- spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
-
- return prev;
-}
EXPORT_SYMBOL(__cmpxchg_u64);

unsigned long __xchg_u32(volatile u32 *ptr, u32 new)
--
2.40.1
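
For illustration only, the type-parameterized macro from the patch above
can be tried in user space roughly as follows. This is a sketch under
stated assumptions: a single pthread mutex stands in for the kernel's
hashed spinlocks (ATOMIC_HASH()) and there is no interrupt masking; the
demo_cmpxchg_ prefix, cmpxchg_lock and cmpxchg_macro_demo() are
hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Assumption: one global mutex replaces the kernel's hashed spinlocks. */
static pthread_mutex_t cmpxchg_lock = PTHREAD_MUTEX_INITIALIZER;

/* Expands to "T demo_cmpxchg_T(volatile T *ptr, T old, T new)". */
#define CMPXCHG(T)						\
	T demo_cmpxchg_##T(volatile T *ptr, T old, T new)	\
	{							\
		T prev;						\
								\
		pthread_mutex_lock(&cmpxchg_lock);		\
		if ((prev = *ptr) == old)			\
			*ptr = new;				\
		pthread_mutex_unlock(&cmpxchg_lock);		\
								\
		return prev;					\
	}

CMPXCHG(u32)
CMPXCHG(u64)

/* Returns 0 once both generated functions have been checked. */
int cmpxchg_macro_demo(void)
{
	u32 w = 10, pw;
	u64 q = 0x100000000ULL, pq;

	pw = demo_cmpxchg_u32(&w, 10, 11);	/* match: w becomes 11 */
	assert(pw == 10 && w == 11);
	pq = demo_cmpxchg_u64(&q, 1, 2);	/* mismatch: q is unchanged */
	assert(pq == 0x100000000ULL && q == 0x100000000ULL);
	return 0;
}
```

Adding another width is then a one-line CMPXCHG(T) instantiation, which
is what makes the later u8/u16 patch in this series so small.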


2024-04-08 17:50:21

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 02/14] sparc32: make the first argument of __cmpxchg_u64() volatile u64 *

From: Al Viro <[email protected]>

.. to match all cmpxchg variants.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/sparc/include/asm/cmpxchg_32.h | 2 +-
arch/sparc/lib/atomic32.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index 2a05cb236480c..05d5f86a56dc2 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -63,7 +63,7 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
(unsigned long)_n_, sizeof(*(ptr))); \
})

-u64 __cmpxchg_u64(u64 *ptr, u64 old, u64 new);
+u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new);
#define arch_cmpxchg64(ptr, old, new) __cmpxchg_u64(ptr, old, new)

#include <asm-generic/cmpxchg-local.h>
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index d90d756123d81..e15affbbb5238 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -173,7 +173,7 @@ u32 __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
}
EXPORT_SYMBOL(__cmpxchg_u32);

-u64 __cmpxchg_u64(u64 *ptr, u64 old, u64 new)
+u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
{
unsigned long flags;
u64 prev;
--
2.40.1


2024-04-08 17:50:24

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 01/14] sparc32: make __cmpxchg_u32() return u32

From: Al Viro <[email protected]>

Conversion between u32 and unsigned long is tautological there,
and the only use of return value is to return it from
__cmpxchg() (which returns unsigned long).

Get rid of explicit casts in __cmpxchg_u32() call, while we are
at it - normal conversions for arguments will do just fine.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/sparc/include/asm/cmpxchg_32.h | 4 ++--
arch/sparc/lib/atomic32.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index d0af82c240b73..2a05cb236480c 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -39,7 +39,7 @@ static __always_inline unsigned long __arch_xchg(unsigned long x, __volatile__ v
/* bug catcher for when unsupported size is used - won't link */
void __cmpxchg_called_with_bad_pointer(void);
/* we only need to support cmpxchg of a u32 on sparc */
-unsigned long __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
+u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);

/* don't worry...optimizer will get rid of most of this */
static inline unsigned long
@@ -47,7 +47,7 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
{
switch (size) {
case 4:
- return __cmpxchg_u32((u32 *)ptr, (u32)old, (u32)new_);
+ return __cmpxchg_u32(ptr, old, new_);
default:
__cmpxchg_called_with_bad_pointer();
break;
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index cf80d1ae352be..d90d756123d81 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -159,7 +159,7 @@ unsigned long sp32___change_bit(unsigned long *addr, unsigned long mask)
}
EXPORT_SYMBOL(sp32___change_bit);

-unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
+u32 __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
{
unsigned long flags;
u32 prev;
@@ -169,7 +169,7 @@ unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
*ptr = new;
spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);

- return (unsigned long)prev;
+ return prev;
}
EXPORT_SYMBOL(__cmpxchg_u32);

--
2.40.1


2024-04-08 17:50:30

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 05/14] parisc: __cmpxchg_u32(): lift conversion into the callers

From: Al Viro <[email protected]>

__cmpxchg_u32() return value is unsigned int explicitly cast to
unsigned long. Both callers are returns from functions that
return unsigned long; might as well have __cmpxchg_u32()
return that unsigned int (aka u32) and let the callers convert
implicitly.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/parisc/include/asm/cmpxchg.h | 3 +--
arch/parisc/lib/bitops.c | 6 +++---
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/include/asm/cmpxchg.h b/arch/parisc/include/asm/cmpxchg.h
index c1d776bb16b4e..0924ebc576d28 100644
--- a/arch/parisc/include/asm/cmpxchg.h
+++ b/arch/parisc/include/asm/cmpxchg.h
@@ -57,8 +57,7 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size)
extern void __cmpxchg_called_with_bad_pointer(void);

/* __cmpxchg_u32/u64 defined in arch/parisc/lib/bitops.c */
-extern unsigned long __cmpxchg_u32(volatile unsigned int *m, unsigned int old,
- unsigned int new_);
+extern u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
extern u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new_);
extern u8 __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new_);

diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index 36a3141990746..ae2231d921985 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -68,16 +68,16 @@ u64 notrace __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
return prev;
}

-unsigned long notrace __cmpxchg_u32(volatile unsigned int *ptr, unsigned int old, unsigned int new)
+u32 notrace __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
{
unsigned long flags;
- unsigned int prev;
+ u32 prev;

_atomic_spin_lock_irqsave(ptr, flags);
if ((prev = *ptr) == old)
*ptr = new;
_atomic_spin_unlock_irqrestore(ptr, flags);
- return (unsigned long)prev;
+ return prev;
}

u8 notrace __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new)
--
2.40.1


2024-04-08 17:51:59

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 04/14] sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes

From: Al Viro <[email protected]>

trivial now

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/sparc/include/asm/cmpxchg_32.h | 16 +++++++---------
arch/sparc/lib/atomic32.c | 4 ++++
2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/include/asm/cmpxchg_32.h b/arch/sparc/include/asm/cmpxchg_32.h
index 05d5f86a56dc2..8c1a3ca34eeb7 100644
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -38,21 +38,19 @@ static __always_inline unsigned long __arch_xchg(unsigned long x, __volatile__ v

/* bug catcher for when unsupported size is used - won't link */
void __cmpxchg_called_with_bad_pointer(void);
-/* we only need to support cmpxchg of a u32 on sparc */
+u8 __cmpxchg_u8(volatile u8 *m, u8 old, u8 new_);
+u16 __cmpxchg_u16(volatile u16 *m, u16 old, u16 new_);
u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);

/* don't worry...optimizer will get rid of most of this */
static inline unsigned long
__cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
{
- switch (size) {
- case 4:
- return __cmpxchg_u32(ptr, old, new_);
- default:
- __cmpxchg_called_with_bad_pointer();
- break;
- }
- return old;
+ return
+ size == 1 ? __cmpxchg_u8(ptr, old, new_) :
+ size == 2 ? __cmpxchg_u16(ptr, old, new_) :
+ size == 4 ? __cmpxchg_u32(ptr, old, new_) :
+ (__cmpxchg_called_with_bad_pointer(), old);
}

#define arch_cmpxchg(ptr, o, n) \
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index 0d215a772428e..8ae880ebf07aa 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -173,8 +173,12 @@ EXPORT_SYMBOL(sp32___change_bit);
return prev; \
}

+CMPXCHG(u8)
+CMPXCHG(u16)
CMPXCHG(u32)
CMPXCHG(u64)
+EXPORT_SYMBOL(__cmpxchg_u8);
+EXPORT_SYMBOL(__cmpxchg_u16);
EXPORT_SYMBOL(__cmpxchg_u32);
EXPORT_SYMBOL(__cmpxchg_u64);

--
2.40.1


2024-04-08 17:51:58

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 10/14] ARC: Emulate one-byte cmpxchg

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on arc.

[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: <[email protected]>
---
arch/arc/Kconfig | 1 +
arch/arc/include/asm/cmpxchg.h | 32 +++++++++++++++++++++++---------
2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 99d2845f3feb9..5bf6137f0fd47 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -14,6 +14,7 @@ config ARC
select ARCH_HAS_SETUP_DMA_OPS
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ select ARCH_NEED_CMPXCHG_1_EMU
select ARCH_SUPPORTS_ATOMIC_RMW if ARC_HAS_LLSC
select ARCH_32BIT_OFF_T
select BUILDTIME_TABLE_SORT
diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index e138fde067dea..c3833e18389f4 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -46,6 +46,9 @@
__typeof__(*(ptr)) _prev_; \
\
switch(sizeof((_p_))) { \
+ case 1: \
+ _prev_ = cmpxchg_emu_u8((volatile u8 *)_p_, _o_, _n_); \
+ break; \
case 4: \
_prev_ = __cmpxchg(_p_, _o_, _n_); \
break; \
@@ -65,16 +68,27 @@
__typeof__(*(ptr)) _prev_; \
unsigned long __flags; \
\
- BUILD_BUG_ON(sizeof(_p_) != 4); \
+ switch(sizeof((_p_))) { \
+ case 1: \
+ __flags = cmpxchg_emu_u8((volatile u8 *)_p_, _o_, _n_); \
+ _prev_ = (__typeof__(*(ptr)))__flags; \
+ break; \
+ break; \
+ case 4: \
+ /* \
+ * spin lock/unlock provide the needed smp_mb() \
+ * before/after \
+ */ \
+ atomic_ops_lock(__flags); \
+ _prev_ = *_p_; \
+ if (_prev_ == _o_) \
+ *_p_ = _n_; \
+ atomic_ops_unlock(__flags); \
+ break; \
+ default: \
+ BUILD_BUG(); \
+ } \
\
- /* \
- * spin lock/unlock provide the needed smp_mb() before/after \
- */ \
- atomic_ops_lock(__flags); \
- _prev_ = *_p_; \
- if (_prev_ == _o_) \
- *_p_ = _n_; \
- atomic_ops_unlock(__flags); \
_prev_; \
})

--
2.40.1


2024-04-08 17:52:00

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 07/14] parisc: add missing export of __cmpxchg_u8()

From: Al Viro <[email protected]>

__cmpxchg_u8() had been added (initially) for the sake of
drivers/phy/ti/phy-tusb1210.c; the thing is, that driver is
modular, so we need an export.

Fixes: b344d6a83d01 ("parisc: add support for cmpxchg on u8 pointers")
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/parisc/kernel/parisc_ksyms.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index 6f0c92e8149d8..dcf61cbd31470 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -22,6 +22,7 @@ EXPORT_SYMBOL(memset);
#include <linux/atomic.h>
EXPORT_SYMBOL(__xchg8);
EXPORT_SYMBOL(__xchg32);
+EXPORT_SYMBOL(__cmpxchg_u8);
EXPORT_SYMBOL(__cmpxchg_u32);
EXPORT_SYMBOL(__cmpxchg_u64);
#ifdef CONFIG_SMP
--
2.40.1


2024-04-08 17:52:05

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 12/14] sh: Emulate one-byte cmpxchg

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on sh.

[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: <[email protected]>
---
arch/sh/Kconfig | 1 +
arch/sh/include/asm/cmpxchg.h | 2 ++
2 files changed, 3 insertions(+)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 2ad3e29f0ebec..f47e9ccf4efd2 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -16,6 +16,7 @@ config SUPERH
select ARCH_HIBERNATION_POSSIBLE if MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_WANT_IPC_PARSE_VERSION
+ select ARCH_NEED_CMPXCHG_1_EMU
select CPU_NO_EFFICIENT_FFS
select DMA_DECLARE_COHERENT
select GENERIC_ATOMIC64
diff --git a/arch/sh/include/asm/cmpxchg.h b/arch/sh/include/asm/cmpxchg.h
index 5d617b3ef78f7..27a9040983cfe 100644
--- a/arch/sh/include/asm/cmpxchg.h
+++ b/arch/sh/include/asm/cmpxchg.h
@@ -56,6 +56,8 @@ static inline unsigned long __cmpxchg(volatile void * ptr, unsigned long old,
unsigned long new, int size)
{
switch (size) {
+ case 1:
+ return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
case 4:
return __cmpxchg_u32(ptr, old, new);
}
--
2.40.1


2024-04-08 17:52:12

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 09/14] lib: Add one-byte emulation function

Architectures are required to provide four-byte cmpxchg() and 64-bit
architectures are additionally required to provide eight-byte cmpxchg().
However, there are cases where one-byte cmpxchg() would be extremely
useful. Therefore, provide cmpxchg_emu_u8() that emulates one-byte
cmpxchg() in terms of four-byte cmpxchg().

Note that this emulation is fully ordered, and can (for example) cause
one-byte cmpxchg_relaxed() to incur the overhead of full ordering.
If this causes problems for a given architecture, that architecture is
free to provide its own lighter-weight primitives.

[ paulmck: Apply Marco Elver feedback. ]
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Link: https://lore.kernel.org/all/0733eb10-5e7a-4450-9b8a-527b97c842ff@paulmck-laptop/

Signed-off-by: Paul E. McKenney <[email protected]>
Acked-by: Marco Elver <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Douglas Anderson <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: <[email protected]>
---
arch/Kconfig | 3 +++
include/linux/cmpxchg-emu.h | 15 +++++++++++++
lib/Makefile | 1 +
lib/cmpxchg-emu.c | 45 +++++++++++++++++++++++++++++++++++++
4 files changed, 64 insertions(+)
create mode 100644 include/linux/cmpxchg-emu.h
create mode 100644 lib/cmpxchg-emu.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 9f066785bb71d..284663392eef8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1609,4 +1609,7 @@ config CC_HAS_SANE_FUNCTION_ALIGNMENT
# strict alignment always, even with -falign-functions.
def_bool CC_HAS_MIN_FUNCTION_ALIGNMENT || CC_IS_CLANG

+config ARCH_NEED_CMPXCHG_1_EMU
+ bool
+
endmenu
diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h
new file mode 100644
index 0000000000000..998deec67740a
--- /dev/null
+++ b/include/linux/cmpxchg-emu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Emulated 1-byte cmpxchg operation for architectures lacking direct
+ * support for this size. This is implemented in terms of 4-byte cmpxchg
+ * operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#ifndef __LINUX_CMPXCHG_EMU_H
+#define __LINUX_CMPXCHG_EMU_H
+
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new);
+
+#endif /* __LINUX_CMPXCHG_EMU_H */
diff --git a/lib/Makefile b/lib/Makefile
index ffc6b2341b45a..cc3d52fdb477d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -236,6 +236,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
lib-$(CONFIG_GENERIC_BUG) += bug.o

obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
+obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_EMU) += cmpxchg-emu.o

obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
#ensure exported functions have prototypes
diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c
new file mode 100644
index 0000000000000..27f6f97cb60dd
--- /dev/null
+++ b/lib/cmpxchg-emu.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Emulated 1-byte cmpxchg operation for architectures lacking direct
+ * support for this size. This is implemented in terms of 4-byte cmpxchg
+ * operations.
+ *
+ * Copyright (C) 2024 Paul E. McKenney.
+ */
+
+#include <linux/types.h>
+#include <linux/export.h>
+#include <linux/instrumented.h>
+#include <linux/atomic.h>
+#include <linux/panic.h>
+#include <linux/bug.h>
+#include <asm-generic/rwonce.h>
+#include <linux/cmpxchg-emu.h>
+
+union u8_32 {
+ u8 b[4];
+ u32 w;
+};
+
+/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */
+uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new)
+{
+ u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3);
+ int i = ((uintptr_t)p) & 0x3;
+ union u8_32 old32;
+ union u8_32 new32;
+ u32 ret;
+
+ ret = READ_ONCE(*p32);
+ do {
+ old32.w = ret;
+ if (old32.b[i] != old)
+ return old32.b[i];
+ new32.w = old32.w;
+ new32.b[i] = new;
+ instrument_atomic_read_write(p, 1);
+ ret = data_race(cmpxchg(p32, old32.w, new32.w)); // Overridden above.
+ } while (ret != old32.w);
+ return old;
+}
+EXPORT_SYMBOL_GPL(cmpxchg_emu_u8);
--
2.40.1
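
For illustration only, the technique in the patch above can be exercised
in user space with a sketch like the following. It is not the kernel
code: the GCC/Clang __atomic builtins replace READ_ONCE() and cmpxchg(),
the KCSAN instrumentation is omitted, the old and new arguments are
uint8_t rather than uintptr_t, and emu_cmpxchg_u8() and
emu_cmpxchg_u8_demo() are hypothetical names. Because the union indexes
bytes in memory order, the byte selection does not depend on endianness.

```c
#include <assert.h>
#include <stdint.h>

union u8_32 {
	uint8_t b[4];
	uint32_t w;
};

/* Hypothetical user-space analogue of cmpxchg_emu_u8(). */
uint8_t emu_cmpxchg_u8(volatile uint8_t *p, uint8_t old, uint8_t new)
{
	/* Aligned 32-bit word containing *p, and the byte's index in it. */
	volatile uint32_t *p32 =
		(volatile uint32_t *)((uintptr_t)p & ~(uintptr_t)0x3);
	int i = (uintptr_t)p & 0x3;
	union u8_32 old32, new32;

	old32.w = __atomic_load_n(p32, __ATOMIC_RELAXED);
	for (;;) {
		if (old32.b[i] != old)
			return old32.b[i];	/* byte mismatch: no store */
		new32.w = old32.w;
		new32.b[i] = new;
		/*
		 * Fully ordered 4-byte compare-and-swap. If a neighboring
		 * byte changed meanwhile, this fails, refreshes old32.w
		 * with the current word, and the loop retries.
		 */
		if (__atomic_compare_exchange_n(p32, &old32.w, new32.w, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_SEQ_CST))
			return old;
	}
}

/* Returns 0 once the success and failure paths have been checked. */
int emu_cmpxchg_u8_demo(void)
{
	union u8_32 u = { .b = { 1, 2, 3, 4 } };
	uint8_t prev;

	prev = emu_cmpxchg_u8(&u.b[2], 3, 9);	/* match: only b[2] changes */
	assert(prev == 3);
	assert(u.b[0] == 1 && u.b[1] == 2 && u.b[2] == 9 && u.b[3] == 4);
	prev = emu_cmpxchg_u8(&u.b[2], 3, 7);	/* mismatch: nothing changes */
	assert(prev == 9 && u.b[2] == 9);
	return 0;
}
```

The demo keeps the target byte inside a 4-byte-aligned union so that the
enclosing word is valid storage; the kernel version likewise relies on
the whole aligned word around the byte being accessible.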


2024-04-08 17:52:12

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Yujie Liu <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
---
arch/xtensa/Kconfig | 1 +
arch/xtensa/include/asm/cmpxchg.h | 2 ++
2 files changed, 3 insertions(+)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index f200a4ec044e6..d3db28f2f8110 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -14,6 +14,7 @@ config XTENSA
select ARCH_HAS_DMA_SET_UNCACHED if MMU
select ARCH_HAS_STRNCPY_FROM_USER if !KASAN
select ARCH_HAS_STRNLEN_USER
+ select ARCH_NEED_CMPXCHG_1_EMU
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/xtensa/include/asm/cmpxchg.h b/arch/xtensa/include/asm/cmpxchg.h
index 675a11ea8de76..29f8f594d5592 100644
--- a/arch/xtensa/include/asm/cmpxchg.h
+++ b/arch/xtensa/include/asm/cmpxchg.h
@@ -15,6 +15,7 @@

#include <linux/bits.h>
#include <linux/stringify.h>
+#include <linux/cmpxchg-emu.h>

/*
* cmpxchg
@@ -74,6 +75,7 @@ static __inline__ unsigned long
__cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
{
switch (size) {
+ case 1: return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
case 4: return __cmpxchg_u32(ptr, old, new);
default: __cmpxchg_called_with_bad_pointer();
return old;
--
2.40.1


2024-04-08 17:52:14

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 11/14] csky: Emulate one-byte cmpxchg

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on csky.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Co-developed-by: Yujie Liu <[email protected]>
Signed-off-by: Yujie Liu <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Yujie Liu <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: <[email protected]>
---
arch/csky/Kconfig | 1 +
arch/csky/include/asm/cmpxchg.h | 10 ++++++++++
2 files changed, 11 insertions(+)

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index d3ac36751ad1f..5479707eb5d10 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -37,6 +37,7 @@ config CSKY
select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
+ select ARCH_NEED_CMPXCHG_1_EMU
select ARCH_WANT_FRAME_POINTERS if !CPU_CK610 && $(cc-option,-mbacktrace)
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
select COMMON_CLK
diff --git a/arch/csky/include/asm/cmpxchg.h b/arch/csky/include/asm/cmpxchg.h
index 916043b845f14..db6dda47184e4 100644
--- a/arch/csky/include/asm/cmpxchg.h
+++ b/arch/csky/include/asm/cmpxchg.h
@@ -6,6 +6,7 @@
#ifdef CONFIG_SMP
#include <linux/bug.h>
#include <asm/barrier.h>
+#include <linux/cmpxchg-emu.h>

#define __xchg_relaxed(new, ptr, size) \
({ \
@@ -61,6 +62,9 @@
__typeof__(old) __old = (old); \
__typeof__(*(ptr)) __ret; \
switch (size) { \
+ case 1: \
+ __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+ break; \
case 4: \
asm volatile ( \
"1: ldex.w %0, (%3) \n" \
@@ -91,6 +95,9 @@
__typeof__(old) __old = (old); \
__typeof__(*(ptr)) __ret; \
switch (size) { \
+ case 1: \
+ __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+ break; \
case 4: \
asm volatile ( \
"1: ldex.w %0, (%3) \n" \
@@ -122,6 +129,9 @@
__typeof__(old) __old = (old); \
__typeof__(*(ptr)) __ret; \
switch (size) { \
+ case 1: \
+ __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+ break; \
case 4: \
asm volatile ( \
RELEASE_FENCE \
--
2.40.1


2024-04-08 17:52:16

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH cmpxchg 14/14] riscv: Emulate one-byte cmpxchg

Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on riscv.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]

Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Yujie Liu <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/cmpxchg.h | 13 +++++++++++++
2 files changed, 14 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56b..3bab9c5c0f465 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -44,6 +44,7 @@ config RISCV
select ARCH_HAS_UBSAN
select ARCH_HAS_VDSO_DATA
select ARCH_KEEP_MEMBLOCK if ACPI
+ select ARCH_NEED_CMPXCHG_1_EMU
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
select ARCH_STACKWALK
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 2fee65cc84432..abcd5543b861b 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -9,6 +9,7 @@
#include <linux/bug.h>

#include <asm/fence.h>
+#include <linux/cmpxchg-emu.h>

#define __xchg_relaxed(ptr, new, size) \
({ \
@@ -170,6 +171,9 @@
__typeof__(*(ptr)) __ret; \
register unsigned int __rc; \
switch (size) { \
+ case 1: \
+ __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
+ break; \
case 4: \
__asm__ __volatile__ ( \
"0: lr.w %0, %2\n" \
@@ -214,6 +218,9 @@
__typeof__(*(ptr)) __ret; \
register unsigned int __rc; \
switch (size) { \
+ case 1: \
+ __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
case 4: \
__asm__ __volatile__ ( \
"0: lr.w %0, %2\n" \
@@ -260,6 +267,9 @@
__typeof__(*(ptr)) __ret; \
register unsigned int __rc; \
switch (size) { \
+ case 1: \
+ __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
case 4: \
__asm__ __volatile__ ( \
RISCV_RELEASE_BARRIER \
@@ -306,6 +316,9 @@
__typeof__(*(ptr)) __ret; \
register unsigned int __rc; \
switch (size) { \
+ case 1: \
+ __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
+ break; \
case 4: \
__asm__ __volatile__ ( \
"0: lr.w %0, %2\n" \
--
2.40.1
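
For readers unfamiliar with the technique behind cmpxchg_emu_u8(), here is a minimal user-space sketch of emulating a one-byte compare-and-swap with a four-byte one. It is not the kernel's lib/cmpxchg-emu.c: the name cmpxchg_u8_sketch() and the "lane" parameter are assumptions made for illustration. The caller passes the aligned 32-bit word plus a logical byte lane (0 is the least-significant byte), which sidesteps the pointer masking and endianness handling that a real implementation needs.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare-and-swap one byte lane of *word using only a
 * four-byte compare-and-swap. Returns the byte that was observed in the lane;
 * the swap took place if and only if that value equals old.
 */
static uint8_t cmpxchg_u8_sketch(_Atomic uint32_t *word, unsigned int lane,
                                 uint8_t old, uint8_t new_)
{
    unsigned int shift = (lane & 3u) * 8u;
    uint32_t mask = (uint32_t)0xff << shift;
    uint32_t cur = atomic_load(word);

    for (;;) {
        uint8_t seen = (uint8_t)((cur & mask) >> shift);
        uint32_t want;

        if (seen != old)
            return seen;    /* comparison failed, nothing is written */

        /* Keep the three neighbouring bytes, replace only our lane. */
        want = (cur & ~mask) | ((uint32_t)new_ << shift);

        /* On failure cur is reloaded with the current word, so retry. */
        if (atomic_compare_exchange_strong(word, &cur, want))
            return old;
    }
}
```

The retry loop is what makes this safe against concurrent updates to the other three bytes: a change to a neighbouring byte makes the four-byte compare fail, but the one-byte comparison is then simply repeated against the fresh word.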


2024-04-08 18:30:59

by Paul E. McKenney

Subject: [PATCH cmpxchg 06/14] parisc: unify implementations of __cmpxchg_u{8,32,64}

From: Al Viro <[email protected]>

identical except for type name involved

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/parisc/lib/bitops.c | 51 +++++++++++++---------------------------
1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index ae2231d921985..cae30a3eb6d9b 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -56,38 +56,19 @@ unsigned long notrace __xchg8(char x, volatile char *ptr)
}


-u64 notrace __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new)
-{
- unsigned long flags;
- u64 prev;
-
- _atomic_spin_lock_irqsave(ptr, flags);
- if ((prev = *ptr) == old)
- *ptr = new;
- _atomic_spin_unlock_irqrestore(ptr, flags);
- return prev;
-}
-
-u32 notrace __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
-{
- unsigned long flags;
- u32 prev;
-
- _atomic_spin_lock_irqsave(ptr, flags);
- if ((prev = *ptr) == old)
- *ptr = new;
- _atomic_spin_unlock_irqrestore(ptr, flags);
- return prev;
-}
-
-u8 notrace __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new)
-{
- unsigned long flags;
- u8 prev;
-
- _atomic_spin_lock_irqsave(ptr, flags);
- if ((prev = *ptr) == old)
- *ptr = new;
- _atomic_spin_unlock_irqrestore(ptr, flags);
- return prev;
-}
+#define CMPXCHG(T) \
+ T notrace __cmpxchg_##T(volatile T *ptr, T old, T new) \
+ { \
+ unsigned long flags; \
+ T prev; \
+ \
+ _atomic_spin_lock_irqsave(ptr, flags); \
+ if ((prev = *ptr) == old) \
+ *ptr = new; \
+ _atomic_spin_unlock_irqrestore(ptr, flags); \
+ return prev; \
+ }
+
+CMPXCHG(u64)
+CMPXCHG(u32)
+CMPXCHG(u8)
--
2.40.1
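
The CMPXCHG(T) pattern in the patch above can be tried outside the kernel. The following is only a sketch under stated assumptions: a single global atomic_flag spinlock stands in for parisc's _atomic_spin_lock_irqsave(), and the sketch_ names and the local u8/u32/u64 typedefs are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint32_t u32;
typedef uint64_t u64;

/* Assumption: one global spinlock replaces the kernel's hashed, IRQ-safe locks. */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

static void sketch_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&sketch_lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void sketch_lock_release(void)
{
    atomic_flag_clear_explicit(&sketch_lock, memory_order_release);
}

/* One macro body generates sketch_cmpxchg_u64(), _u32() and _u8(). */
#define CMPXCHG(T)                                                  \
    static T sketch_cmpxchg_##T(volatile T *ptr, T old, T new_)     \
    {                                                               \
        T prev;                                                     \
                                                                    \
        sketch_lock_acquire();                                      \
        if ((prev = *ptr) == old)                                   \
            *ptr = new_;                                            \
        sketch_lock_release();                                      \
        return prev;                                                \
    }

CMPXCHG(u64)
CMPXCHG(u32)
CMPXCHG(u8)
```

Because the type name is pasted into the function name with ##, T has to be a single identifier such as u8, which is why the typedef names are used rather than spelled-out types like "unsigned char".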


2024-04-08 18:34:09

by Paul E. McKenney

Subject: [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg()

From: Al Viro <[email protected]>

Add (and export) __cmpxchg_u16(), teach __cmpxchg() to use it.

And get rid of manual truncation down to u8, etc. in there - the
only reason for those is to avoid bogus warnings about constant
truncation from sparse, and those are easy to avoid by turning
that switch into conditional expression.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/parisc/include/asm/cmpxchg.h | 19 +++++++++----------
arch/parisc/kernel/parisc_ksyms.c | 1 +
arch/parisc/lib/bitops.c | 1 +
3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/parisc/include/asm/cmpxchg.h b/arch/parisc/include/asm/cmpxchg.h
index 0924ebc576d28..bf0a0f1189eb2 100644
--- a/arch/parisc/include/asm/cmpxchg.h
+++ b/arch/parisc/include/asm/cmpxchg.h
@@ -56,25 +56,24 @@ __arch_xchg(unsigned long x, volatile void *ptr, int size)
/* bug catcher for when unsupported size is used - won't link */
extern void __cmpxchg_called_with_bad_pointer(void);

-/* __cmpxchg_u32/u64 defined in arch/parisc/lib/bitops.c */
+/* __cmpxchg_u... defined in arch/parisc/lib/bitops.c */
+extern u8 __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new_);
+extern u16 __cmpxchg_u16(volatile u16 *ptr, u16 old, u16 new_);
extern u32 __cmpxchg_u32(volatile u32 *m, u32 old, u32 new_);
extern u64 __cmpxchg_u64(volatile u64 *ptr, u64 old, u64 new_);
-extern u8 __cmpxchg_u8(volatile u8 *ptr, u8 old, u8 new_);

/* don't worry...optimizer will get rid of most of this */
static inline unsigned long
__cmpxchg(volatile void *ptr, unsigned long old, unsigned long new_, int size)
{
- switch (size) {
+ return
#ifdef CONFIG_64BIT
- case 8: return __cmpxchg_u64((u64 *)ptr, old, new_);
+ size == 8 ? __cmpxchg_u64(ptr, old, new_) :
#endif
- case 4: return __cmpxchg_u32((unsigned int *)ptr,
- (unsigned int)old, (unsigned int)new_);
- case 1: return __cmpxchg_u8((u8 *)ptr, old & 0xff, new_ & 0xff);
- }
- __cmpxchg_called_with_bad_pointer();
- return old;
+ size == 4 ? __cmpxchg_u32(ptr, old, new_) :
+ size == 2 ? __cmpxchg_u16(ptr, old, new_) :
+ size == 1 ? __cmpxchg_u8(ptr, old, new_) :
+ (__cmpxchg_called_with_bad_pointer(), old);
}

#define arch_cmpxchg(ptr, o, n) \
diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index dcf61cbd31470..c1587aa35beb6 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -23,6 +23,7 @@ EXPORT_SYMBOL(memset);
EXPORT_SYMBOL(__xchg8);
EXPORT_SYMBOL(__xchg32);
EXPORT_SYMBOL(__cmpxchg_u8);
+EXPORT_SYMBOL(__cmpxchg_u16);
EXPORT_SYMBOL(__cmpxchg_u32);
EXPORT_SYMBOL(__cmpxchg_u64);
#ifdef CONFIG_SMP
diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index cae30a3eb6d9b..9df8100506427 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -71,4 +71,5 @@ unsigned long notrace __xchg8(char x, volatile char *ptr)

CMPXCHG(u64)
CMPXCHG(u32)
+CMPXCHG(u16)
CMPXCHG(u8)
--
2.40.1
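
The switch-to-conditional change above can be sketched in plain C. Everything named *_sketch below is hypothetical, the helpers are deliberately non-atomic stand-ins for the __cmpxchg_u...() functions, and abort() replaces the kernel's link-time bug catcher. The point illustrated is that the function prototypes already narrow the unsigned long arguments, so no manual "& 0xff" or "(unsigned int)" truncation is needed at the call sites.

```c
#include <stdint.h>
#include <stdlib.h>

/* Non-atomic stand-ins, for illustration only. */
static uint8_t cas_u8_sketch(volatile uint8_t *p, uint8_t old, uint8_t new_)
{
    uint8_t prev = *p;

    if (prev == old)
        *p = new_;
    return prev;
}

static uint32_t cas_u32_sketch(volatile uint32_t *p, uint32_t old, uint32_t new_)
{
    uint32_t prev = *p;

    if (prev == old)
        *p = new_;
    return prev;
}

/* The kernel leaves its bug catcher undefined so that bad sizes fail to link. */
static void bad_size_sketch(void)
{
    abort();
}

static unsigned long dispatch_sketch(volatile void *ptr, unsigned long old,
                                     unsigned long new_, int size)
{
    /* Arguments are converted to each callee's parameter types implicitly. */
    return size == 4 ? cas_u32_sketch(ptr, old, new_) :
           size == 1 ? cas_u8_sketch(ptr, old, new_) :
           (bad_size_sketch(), old);
}
```

Note that ptr is a "volatile void *" here, so it converts to either helper's pointer type without a cast.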


2024-04-08 20:12:23

by Linus Torvalds

Subject: Re: [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg()

On Mon, 8 Apr 2024 at 10:50, Paul E. McKenney <[email protected]> wrote:
>
> And get rid of manual truncation down to u8, etc. in there - the
> only reason for those is to avoid bogus warnings about constant
> truncation from sparse, and those are easy to avoid by turning
> that switch into conditional expression.

I support the use of the conditional, but why add the 16-bit case when
it turns out we don't want it after all?

Linus

2024-04-08 20:56:48

by Paul E. McKenney

Subject: Re: [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg()

On Mon, Apr 08, 2024 at 01:10:40PM -0700, Linus Torvalds wrote:
> On Mon, 8 Apr 2024 at 10:50, Paul E. McKenney <[email protected]> wrote:
> >
> > And get rid of manual truncation down to u8, etc. in there - the
> > only reason for those is to avoid bogus warnings about constant
> > truncation from sparse, and those are easy to avoid by turning
> > that switch into conditional expression.
>
> I support the use of the conditional, but why add the 16-bit case when
> it turns out we don't want it after all?

You are quite right that we do not want it for emulation. However, this
commit is providing native parisc support for the full set of cases,
just like x86 already does.

Plus this native parisc/sparc32 support is harmless. If someone adds a
16-bit cmpxchg() in core code, which (as you say) is a bug given that
some systems do not support 16-bit loads and stores, then kernel test
robot builds of arc, csky, sh, xtensa, and riscv will complain bitterly.

Plus I hope that ongoing removal of support for antique systems will allow
us to support 16-bit cmpxchg() in core code sooner rather than later.
(Hey, I can dream, can't I?)

Thanx, Paul

2024-04-09 17:35:54

by Andrea Parri

Subject: Re: [PATCH cmpxchg 14/14] riscv: Emulate one-byte cmpxchg

Hi Paul,

> @@ -170,6 +171,9 @@
> __typeof__(*(ptr)) __ret; \
> register unsigned int __rc; \
> switch (size) { \
> + case 1: \
> + __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> + break; \
> case 4: \
> __asm__ __volatile__ ( \
> "0: lr.w %0, %2\n" \
> @@ -214,6 +218,9 @@
> __typeof__(*(ptr)) __ret; \
> register unsigned int __rc; \
> switch (size) { \
> + case 1: \
> + __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> + break; \
> case 4: \
> __asm__ __volatile__ ( \
> "0: lr.w %0, %2\n" \
> @@ -260,6 +267,9 @@
> __typeof__(*(ptr)) __ret; \
> register unsigned int __rc; \
> switch (size) { \
> + case 1: \
> + __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> + break; \
> case 4: \
> __asm__ __volatile__ ( \
> RISCV_RELEASE_BARRIER \
> @@ -306,6 +316,9 @@
> __typeof__(*(ptr)) __ret; \
> register unsigned int __rc; \
> switch (size) { \
> + case 1: \
> + __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> + break; \
> case 4: \
> __asm__ __volatile__ ( \
> "0: lr.w %0, %2\n" \

Seems the last three are missing uintptr_t casts?

Andrea

2024-04-09 18:08:51

by Paul E. McKenney

Subject: Re: [PATCH cmpxchg 14/14] riscv: Emulate one-byte cmpxchg

On Tue, Apr 09, 2024 at 07:35:39PM +0200, Andrea Parri wrote:
> Hi Paul,
>
> > @@ -170,6 +171,9 @@
> > __typeof__(*(ptr)) __ret; \
> > register unsigned int __rc; \
> > switch (size) { \
> > + case 1: \
> > + __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, (uintptr_t)__old, (uintptr_t)__new); \
> > + break; \
> > case 4: \
> > __asm__ __volatile__ ( \
> > "0: lr.w %0, %2\n" \
> > @@ -214,6 +218,9 @@
> > __typeof__(*(ptr)) __ret; \
> > register unsigned int __rc; \
> > switch (size) { \
> > + case 1: \
> > + __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > + break; \
> > case 4: \
> > __asm__ __volatile__ ( \
> > "0: lr.w %0, %2\n" \
> > @@ -260,6 +267,9 @@
> > __typeof__(*(ptr)) __ret; \
> > register unsigned int __rc; \
> > switch (size) { \
> > + case 1: \
> > + __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > + break; \
> > case 4: \
> > __asm__ __volatile__ ( \
> > RISCV_RELEASE_BARRIER \
> > @@ -306,6 +316,9 @@
> > __typeof__(*(ptr)) __ret; \
> > register unsigned int __rc; \
> > switch (size) { \
> > + case 1: \
> > + __ret = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)__ptr, __old, __new); \
> > + break; \
> > case 4: \
> > __asm__ __volatile__ ( \
> > "0: lr.w %0, %2\n" \
>
> Seems the last three are missing uintptr_t casts?

Indeed they are, and good eyes!

However, Liu, Yujie beat you to it, and this commit contains the fix:

4d5c72a34948 ("riscv: Emulate one-byte cmpxchg")

Thanx, Paul
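
As background on why the uintptr_t casts matter, here is a small sketch. It assumes GCC or Clang (statement expressions, __typeof__ and the __atomic builtins), and sketch_cmpxchg() and emu_u8_stub() are hypothetical names, not kernel code. The key observation is that every case of the switch is compiled for every type the macro is used with, so the one-byte arm must still type-check when the operands are pointers, even though that arm can never run for them.

```c
#include <stdint.h>

/* Non-atomic stand-in with the same parameter shape as cmpxchg_emu_u8(). */
static uintptr_t emu_u8_stub(volatile uint8_t *p, uintptr_t old, uintptr_t new_)
{
    uint8_t prev = *p;

    if (prev == (uint8_t)old)
        *p = (uint8_t)new_;
    return prev;
}

#define sketch_cmpxchg(ptr, old, new_)                                      \
({                                                                          \
    __typeof__(ptr) p_ = (ptr);                                             \
    __typeof__(*(ptr)) o_ = (old);                                          \
    __typeof__(*(ptr)) n_ = (new_);                                         \
    __typeof__(*(ptr)) ret_;                                                \
    switch (sizeof(*p_)) {                                                  \
    case 1:                                                                 \
        ret_ = (__typeof__(*(ptr)))emu_u8_stub((volatile uint8_t *)p_,      \
                                               (uintptr_t)o_,               \
                                               (uintptr_t)n_);              \
        break;                                                              \
    default:                                                                \
        ret_ = o_;                                                          \
        __atomic_compare_exchange_n(p_, &ret_, n_, 0,                       \
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);    \
        break;                                                              \
    }                                                                       \
    ret_;                                                                   \
})
```

With a pointer-typed target, as in sketch_cmpxchg(&slot, &a, &b) where slot is an "int *", dropping the two (uintptr_t) casts would pass pointers to integer parameters in the dead "case 1" arm, which compilers diagnose as making an integer from a pointer without a cast.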

2024-04-18 08:05:21

by Geert Uytterhoeven

Subject: Re: [PATCH cmpxchg 12/14] sh: Emulate one-byte cmpxchg

Hi Paul,

On Mon, Apr 8, 2024 at 7:50 PM Paul E. McKenney <[email protected]> wrote:
> Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on sh.
>
> [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
>
> Signed-off-by: Paul E. McKenney <[email protected]>

Thanks for your patch!

> --- a/arch/sh/include/asm/cmpxchg.h
> +++ b/arch/sh/include/asm/cmpxchg.h
> @@ -56,6 +56,8 @@ static inline unsigned long __cmpxchg(volatile void * ptr, unsigned long old,
> unsigned long new, int size)
> {
> switch (size) {
> + case 1:
> + return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);

The cast is not needed.

> case 4:
> return __cmpxchg_u32(ptr, old, new);
> }

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2024-04-18 08:07:11

by Geert Uytterhoeven

Subject: Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg

Hi Paul,

On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <[email protected]> wrote:
> Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
>
> [ paulmck: Apply kernel test robot feedback. ]
> [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
>
> Signed-off-by: Paul E. McKenney <[email protected]>

Thanks for your patch!

> --- a/arch/xtensa/include/asm/cmpxchg.h
> +++ b/arch/xtensa/include/asm/cmpxchg.h
> @@ -74,6 +75,7 @@ static __inline__ unsigned long
> __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
> {
> switch (size) {
> + case 1: return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);

The cast is not needed.

> case 4: return __cmpxchg_u32(ptr, old, new);
> default: __cmpxchg_called_with_bad_pointer();
> return old;

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2024-04-18 23:23:11

by Paul E. McKenney

Subject: Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg

On Thu, Apr 18, 2024 at 10:06:21AM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
>
> On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <[email protected]> wrote:
> > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
> >
> > [ paulmck: Apply kernel test robot feedback. ]
> > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> Thanks for your patch!
>
> > --- a/arch/xtensa/include/asm/cmpxchg.h
> > +++ b/arch/xtensa/include/asm/cmpxchg.h
> > @@ -74,6 +75,7 @@ static __inline__ unsigned long
> > __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
> > {
> > switch (size) {
> > + case 1: return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
>
> The cast is not needed.

In both cases, kernel test robot yelled at me when it was not present.

Happy to resubmit without it, though, if that is a yell that I should
have ignored.

Thanx, Paul

> > case 4: return __cmpxchg_u32(ptr, old, new);
> > default: __cmpxchg_called_with_bad_pointer();
> > return old;
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds

2024-04-19 05:14:38

by Yujie Liu

Subject: Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg

On Thu, Apr 18, 2024 at 04:21:46PM -0700, Paul E. McKenney wrote:
> On Thu, Apr 18, 2024 at 10:06:21AM +0200, Geert Uytterhoeven wrote:
> > Hi Paul,
> >
> > On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <[email protected]> wrote:
> > > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
> > >
> > > [ paulmck: Apply kernel test robot feedback. ]
> > > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> > >
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > Thanks for your patch!
> >
> > > --- a/arch/xtensa/include/asm/cmpxchg.h
> > > +++ b/arch/xtensa/include/asm/cmpxchg.h
> > > @@ -74,6 +75,7 @@ static __inline__ unsigned long
> > > __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
> > > {
> > > switch (size) {
> > > + case 1: return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
> >
> > The cast is not needed.
>
> In both cases, kernel test robot yelled at me when it was not present.
>
> Happy to resubmit without it, though, if that is a yell that I should
> have ignored.

FYI, the kernel test robot did report issues on various architectures, such as:

[1] https://lore.kernel.org/oe-kbuild-all/[email protected]/
[2] https://lore.kernel.org/oe-kbuild-all/[email protected]/
[3] https://lore.kernel.org/oe-kbuild-all/[email protected]/

In brief, there were mainly three types of issues:

* The cmpxchg-emu.h header is missing
* The parameters of cmpxchg_emu_u8 need to be cast to corresponding types
* The return value of cmpxchg_emu_u8 needs to be cast to the "ret" type

As for this specific case of xtensa arch, the compiler doesn't warn
regardless of whether there is an explicit cast for "ptr" or not.
The "ptr" being passed in is "void *", and it seems that a "void *"
pointer can be automatically cast to any other type of pointer, so it
is not necessary to have an explicit cast of "u8 *".

As for the implementations of other architectures that don't pass the
"ptr" as "void *" (such as a macro implementation), the explicit cast to
"u8 *" may still be required.

Thanks,
Yujie

>
> Thanx, Paul
>
> > > case 4: return __cmpxchg_u32(ptr, old, new);
> > > default: __cmpxchg_called_with_bad_pointer();
> > > return old;
> >
> > Gr{oetje,eeting}s,
> >
> > Geert
> >
> > --
> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
> >
> > In personal conversations with technical people, I call myself a hacker. But
> > when I'm talking to journalists I just say "programmer" or something like that.
> > -- Linus Torvalds

2024-04-19 08:05:26

by Geert Uytterhoeven

Subject: Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg

On Fri, Apr 19, 2024 at 7:14 AM Yujie Liu <[email protected]> wrote:
> On Thu, Apr 18, 2024 at 04:21:46PM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 18, 2024 at 10:06:21AM +0200, Geert Uytterhoeven wrote:
> > > On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <[email protected]> wrote:
> > > > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
> > > >
> > > > [ paulmck: Apply kernel test robot feedback. ]
> > > > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> > > >
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > >
> > > Thanks for your patch!
> > >
> > > > --- a/arch/xtensa/include/asm/cmpxchg.h
> > > > +++ b/arch/xtensa/include/asm/cmpxchg.h
> > > > @@ -74,6 +75,7 @@ static __inline__ unsigned long
> > > > __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
> > > > {
> > > > switch (size) {
> > > > + case 1: return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
> > >
> > > The cast is not needed.
> >
> > In both cases, kernel test robot yelled at me when it was not present.
> >
> > Happy to resubmit without it, though, if that is a yell that I should
> > have ignored.
>
> FYI, kernel test robot did yell some reports on various architectures such as:
>
> [1] https://lore.kernel.org/oe-kbuild-all/202403292321.T55etywH-lkp@intel.com/
> [2] https://lore.kernel.org/oe-kbuild-all/202404040526.GVzaL2io-lkp@intel.com/
> [3] https://lore.kernel.org/oe-kbuild-all/202404022106.mYwpypit-lkp@intel.com/
>
> In brief, there were mainly three types of issues:
>
> * The cmpxchg-emu.h header is missing
> * The parameters of cmpxchg_emu_u8 need to be cast to corresponding types
> * The return value of cmpxchg_emu_u8 needs to be cast to the "ret" type
>
> As for this specific case of xtensa arch, the compiler doesn't warn
> regardless of whether there is an explicit cast for "ptr" or not.
> The "ptr" being passed in is "void *", and it seems that a "void *"
> pointer can be automatically cast to any other type of pointer, so it
> is not necessary to have an explicit cast of "u8 *".
>
> As for the implementations of other architectures that don't pass the
> "ptr" as "void *" (such as a macro implementation), the explicit cast to
> "u8 *" may still be required.

Exactly. On sh and xtensa, the original pointer is of type
"volatile void *", so no cast is needed.
On e.g. arc, the original pointer is of type "volatile __typeof__(ptr) _p_",
which is not always compatible with "volatile u8 *".

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
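
The pointer-cast distinction discussed in this subthread can be reproduced in a few lines of C. This is only a sketch: take_u8(), via_void() and via_macro() are hypothetical names chosen for illustration, and take_u8() merely reads a byte instead of performing a compare-and-swap.

```c
#include <stdint.h>

/* Stand-in for a function that, like cmpxchg_emu_u8(), wants a byte pointer. */
static uint8_t take_u8(volatile uint8_t *p)
{
    return *p;
}

/*
 * Function-style wrapper, as on sh and xtensa: the parameter is already
 * "volatile void *", which C converts to any object pointer type implicitly,
 * so no cast is required.
 */
static uint8_t via_void(volatile void *ptr)
{
    return take_u8(ptr);
}

/*
 * Macro-style wrapper, as on arc, csky or riscv: ptr keeps the caller's
 * type (for example "uint32_t *"), which is not compatible with
 * "volatile uint8_t *", so the explicit cast is needed.
 */
#define via_macro(ptr) take_u8((volatile uint8_t *)(ptr))
```

Removing the cast from via_macro() and calling it with a "uint32_t *" argument draws an incompatible-pointer-type diagnostic, whereas via_void() accepts the same argument silently.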

2024-04-20 14:03:22

by Paul E. McKenney

Subject: Re: [PATCH cmpxchg 13/14] xtensa: Emulate one-byte cmpxchg

On Fri, Apr 19, 2024 at 10:02:47AM +0200, Geert Uytterhoeven wrote:
> On Fri, Apr 19, 2024 at 7:14 AM Yujie Liu <[email protected]> wrote:
> > On Thu, Apr 18, 2024 at 04:21:46PM -0700, Paul E. McKenney wrote:
> > > On Thu, Apr 18, 2024 at 10:06:21AM +0200, Geert Uytterhoeven wrote:
> > > > On Mon, Apr 8, 2024 at 7:49 PM Paul E. McKenney <[email protected]> wrote:
> > > > > Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
> > > > >
> > > > > [ paulmck: Apply kernel test robot feedback. ]
> > > > > [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
> > > > >
> > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > >
> > > > Thanks for your patch!
> > > >
> > > > > --- a/arch/xtensa/include/asm/cmpxchg.h
> > > > > +++ b/arch/xtensa/include/asm/cmpxchg.h
> > > > > @@ -74,6 +75,7 @@ static __inline__ unsigned long
> > > > > __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new, int size)
> > > > > {
> > > > > switch (size) {
> > > > > + case 1: return cmpxchg_emu_u8((volatile u8 *)ptr, old, new);
> > > >
> > > > The cast is not needed.
> > >
> > > In both cases, kernel test robot yelled at me when it was not present.
> > >
> > > Happy to resubmit without it, though, if that is a yell that I should
> > > have ignored.
> >
> > FYI, kernel test robot did yell some reports on various architectures such as:
> >
> > [1] https://lore.kernel.org/oe-kbuild-all/[email protected]/
> > [2] https://lore.kernel.org/oe-kbuild-all/[email protected]/
> > [3] https://lore.kernel.org/oe-kbuild-all/[email protected]/
> >
> > In brief, there were mainly three types of issues:
> >
> > * The cmpxchg-emu.h header is missing
> > * The parameters of cmpxchg_emu_u8 need to be cast to corresponding types
> > * The return value of cmpxchg_emu_u8 needs to be cast to the "ret" type
> >
> > As for this specific case of xtensa arch, the compiler doesn't warn
> > regardless of whether there is an explicit cast for "ptr" or not.
> > The "ptr" being passed in is "void *", and it seems that a "void *"
> > pointer can be automatically cast to any other type of pointer, so it
> > is not necessary to have an explicit cast of "u8 *".
> >
> > As for the implementations of other architectures that don't pass the
> > "ptr" as "void *" (such as a macro implementation), the explicit cast to
> > "u8 *" may still be required.
>
> Exactly. On sh and xtensa, the original pointer is of type
> "volatile void *", so no cast is needed.
> On e.g. arc, the original pointer is of type "volatile __typeof__(ptr) _p_",
> which is not always compatible with "volatile u8 *".

Very good, and thank you both! I will spin updated patches and send
them out early this coming week.

Thanx, Paul

> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds