2024-01-03 16:32:41

by Leonardo Bras

Subject: [PATCH v1 0/5] Rework & improve riscv cmpxchg.h and atomic.h

While studying riscv's cmpxchg.h file, I got really interested in
understanding how RISCV asm implements the different versions of
{cmp,}xchg.

When I understood the pattern, it made sense to me to remove the
duplication and create macros that make it easier to understand what
exactly changes between the versions: instruction suffixes & barriers.

I also did the same kind of work on atomic.h.

After that, I noted both cmpxchg and xchg only accept variables of
size 4 and 8, compared to x86 and arm64 which support sizes 1, 2, 4 and 8.

Now that the deduplication is done, it is quite straightforward to
implement them for variable sizes 1 and 2, so I did it. Guo Ren has
already presented me with some possible users :)

I compared the asm generated from a test.c that contained a usage of every
changed function, and could not detect any change for patches 1 + 2 + 3
compared with upstream.
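
For reference, a minimal sketch of what such a test could look like
(hypothetical names; the actual test.c is not part of this series):

/* Hypothetical sketch: exercise the changed functions so the generated
 * asm can be diffed against an upstream build.
 */
#include <asm/cmpxchg.h>

static u32 w;
static u64 d;

void xchg_asm_probe(void)
{
	(void)arch_xchg_relaxed(&w, 1U);
	(void)arch_xchg_acquire(&w, 2U);
	(void)arch_xchg_release(&d, 3ULL);
	(void)arch_xchg(&d, 4ULL);
	(void)arch_cmpxchg_relaxed(&w, 1U, 5U);
	(void)arch_cmpxchg(&d, 4ULL, 6ULL);
}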

Patches 4 & 5 were compile-tested, merged with guoren/qspinlock_v11, and
booted just fine with qemu -machine virt -append "qspinlock".

(tree: https://gitlab.com/LeoBras/linux/-/commits/guo_qspinlock_v11)

Latest tests happened based on this tree:
https://github.com/guoren83/linux/tree/qspinlock_v12

Thanks!
Leo

Changes since squashed cmpxchg RFCv5:
- Resend as v1
https://lore.kernel.org/all/[email protected]/

Changes since squashed cmpxchg RFCv4:
- Added (__typeof__(*(p))) before returning from {cmp,}xchg, as done
in current upstream, (possibly) fixing the bug from kernel test robot
https://lore.kernel.org/all/[email protected]/

Changes since squashed cmpxchg RFCv3:
- Fixed bug on cmpxchg macro for var size 1 & 2: now working
- Macros for var sizes 1 & 2's lr.w and sc.w are now guaranteed to receive
a 32-bit-aligned address as input
- Renamed internal macros from _mask to _masked for patches 4 & 5
- __rc variable on macros for var size 1 & 2 changed from register to ulong
https://lore.kernel.org/all/[email protected]/

Changes since squashed cmpxchg RFCv2:
- Removed rc parameter from the new macro: it can be internal to the macro
- 2 new patches: cmpxchg size 1 and 2, xchg size 1 and 2
https://lore.kernel.org/all/[email protected]/

Changes since squashed cmpxchg RFCv1:
- Unified with atomic.c patchset
- Rebased on top of torvalds/master (thanks Andrea Parri!)
- Removed helper macros that were not being used elsewhere in the kernel.
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/

Changes since (cmpxchg) RFCv3:
- Squashed the 6 original patches into 2: one for cmpxchg and one for xchg
https://lore.kernel.org/all/[email protected]/

Changes since (cmpxchg) RFCv2:
- Fixed macros that depend on having a local variable with a magic name
- Previous cast to (long) is now only applied on 4-bytes cmpxchg
https://lore.kernel.org/all/[email protected]/

Changes since (cmpxchg) RFCv1:
- Fixed patch 4/6 suffix from 'w.aqrl' to '.w.aqrl', to avoid build error
https://lore.kernel.org/all/[email protected]/

Leonardo Bras (5):
riscv/cmpxchg: Deduplicate xchg() asm functions
riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
riscv/atomic.h : Deduplicate arch_atomic.*
riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
riscv/cmpxchg: Implement xchg for variables of size 1 and 2

arch/riscv/include/asm/atomic.h | 164 ++++++-------
arch/riscv/include/asm/cmpxchg.h | 404 ++++++++++---------------------
2 files changed, 200 insertions(+), 368 deletions(-)


base-commit: 610a9b8f49fbcf1100716370d3b5f6f884a2835a
--
2.43.0



2024-01-03 16:32:57

by Leonardo Bras

Subject: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

In this header, every xchg define (_relaxed, _acquire, _release, vanilla)
contains its own asm block, for both 4-byte and 8-byte variables, for a
total of 8 versions of mostly the same asm.

This is usually bad, as it means any change may need to be made in up to 8
different places.

Unify those versions by creating a new define with enough parameters to
generate any version of the previous 8.

Then unify the result under a more general define, and simplify
arch_xchg* generation.

(This did not cause any change in generated asm)
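
For illustration, a sketch of how the new defines compose for a 4-byte
acquire (derived from the diff below; this is what the preprocessor
conceptually produces, not extra code in the patch):

	/* arch_xchg_acquire(ptr, x) is now
	 * _arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER), which for
	 * sizeof(*ptr) == 4 selects __arch_xchg(".w", ...):
	 */
	__asm__ __volatile__ (
		" amoswap.w %0, %2, %1\n"	/* ".w" chosen by the size switch */
		RISCV_ACQUIRE_BARRIER		/* the "append" argument */
		: "=r" (__ret), "+A" (*(__ptr))
		: "r" (__new)
		: "memory");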

Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
Tested-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/cmpxchg.h | 138 ++++++-------------------------
1 file changed, 23 insertions(+), 115 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 2f4726d3cfcc2..48478a8eecee7 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,140 +11,48 @@
#include <asm/barrier.h>
#include <asm/fence.h>

-#define __xchg_relaxed(ptr, new, size) \
+#define __arch_xchg(sfx, prepend, append, r, p, n) \
({ \
- __typeof__(ptr) __ptr = (ptr); \
- __typeof__(new) __new = (new); \
- __typeof__(*(ptr)) __ret; \
- switch (size) { \
- case 4: \
- __asm__ __volatile__ ( \
- " amoswap.w %0, %2, %1\n" \
- : "=r" (__ret), "+A" (*__ptr) \
- : "r" (__new) \
- : "memory"); \
- break; \
- case 8: \
- __asm__ __volatile__ ( \
- " amoswap.d %0, %2, %1\n" \
- : "=r" (__ret), "+A" (*__ptr) \
- : "r" (__new) \
- : "memory"); \
- break; \
- default: \
- BUILD_BUG(); \
- } \
- __ret; \
-})
-
-#define arch_xchg_relaxed(ptr, x) \
-({ \
- __typeof__(*(ptr)) _x_ = (x); \
- (__typeof__(*(ptr))) __xchg_relaxed((ptr), \
- _x_, sizeof(*(ptr))); \
+ __asm__ __volatile__ ( \
+ prepend \
+ " amoswap" sfx " %0, %2, %1\n" \
+ append \
+ : "=r" (r), "+A" (*(p)) \
+ : "r" (n) \
+ : "memory"); \
})

-#define __xchg_acquire(ptr, new, size) \
+#define _arch_xchg(ptr, new, sfx, prepend, append) \
({ \
__typeof__(ptr) __ptr = (ptr); \
- __typeof__(new) __new = (new); \
- __typeof__(*(ptr)) __ret; \
- switch (size) { \
+ __typeof__(*(__ptr)) __new = (new); \
+ __typeof__(*(__ptr)) __ret; \
+ switch (sizeof(*__ptr)) { \
case 4: \
- __asm__ __volatile__ ( \
- " amoswap.w %0, %2, %1\n" \
- RISCV_ACQUIRE_BARRIER \
- : "=r" (__ret), "+A" (*__ptr) \
- : "r" (__new) \
- : "memory"); \
+ __arch_xchg(".w" sfx, prepend, append, \
+ __ret, __ptr, __new); \
break; \
case 8: \
- __asm__ __volatile__ ( \
- " amoswap.d %0, %2, %1\n" \
- RISCV_ACQUIRE_BARRIER \
- : "=r" (__ret), "+A" (*__ptr) \
- : "r" (__new) \
- : "memory"); \
+ __arch_xchg(".d" sfx, prepend, append, \
+ __ret, __ptr, __new); \
break; \
default: \
BUILD_BUG(); \
} \
- __ret; \
+ (__typeof__(*(__ptr)))__ret; \
})

-#define arch_xchg_acquire(ptr, x) \
-({ \
- __typeof__(*(ptr)) _x_ = (x); \
- (__typeof__(*(ptr))) __xchg_acquire((ptr), \
- _x_, sizeof(*(ptr))); \
-})
+#define arch_xchg_relaxed(ptr, x) \
+ _arch_xchg(ptr, x, "", "", "")

-#define __xchg_release(ptr, new, size) \
-({ \
- __typeof__(ptr) __ptr = (ptr); \
- __typeof__(new) __new = (new); \
- __typeof__(*(ptr)) __ret; \
- switch (size) { \
- case 4: \
- __asm__ __volatile__ ( \
- RISCV_RELEASE_BARRIER \
- " amoswap.w %0, %2, %1\n" \
- : "=r" (__ret), "+A" (*__ptr) \
- : "r" (__new) \
- : "memory"); \
- break; \
- case 8: \
- __asm__ __volatile__ ( \
- RISCV_RELEASE_BARRIER \
- " amoswap.d %0, %2, %1\n" \
- : "=r" (__ret), "+A" (*__ptr) \
- : "r" (__new) \
- : "memory"); \
- break; \
- default: \
- BUILD_BUG(); \
- } \
- __ret; \
-})
+#define arch_xchg_acquire(ptr, x) \
+ _arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER)

#define arch_xchg_release(ptr, x) \
-({ \
- __typeof__(*(ptr)) _x_ = (x); \
- (__typeof__(*(ptr))) __xchg_release((ptr), \
- _x_, sizeof(*(ptr))); \
-})
-
-#define __arch_xchg(ptr, new, size) \
-({ \
- __typeof__(ptr) __ptr = (ptr); \
- __typeof__(new) __new = (new); \
- __typeof__(*(ptr)) __ret; \
- switch (size) { \
- case 4: \
- __asm__ __volatile__ ( \
- " amoswap.w.aqrl %0, %2, %1\n" \
- : "=r" (__ret), "+A" (*__ptr) \
- : "r" (__new) \
- : "memory"); \
- break; \
- case 8: \
- __asm__ __volatile__ ( \
- " amoswap.d.aqrl %0, %2, %1\n" \
- : "=r" (__ret), "+A" (*__ptr) \
- : "r" (__new) \
- : "memory"); \
- break; \
- default: \
- BUILD_BUG(); \
- } \
- __ret; \
-})
+ _arch_xchg(ptr, x, "", RISCV_RELEASE_BARRIER, "")

#define arch_xchg(ptr, x) \
-({ \
- __typeof__(*(ptr)) _x_ = (x); \
- (__typeof__(*(ptr))) __arch_xchg((ptr), _x_, sizeof(*(ptr))); \
-})
+ _arch_xchg(ptr, x, ".aqrl", "", "")

#define xchg32(ptr, x) \
({ \
--
2.43.0


2024-01-03 16:33:10

by Leonardo Bras

Subject: [PATCH v1 2/5] riscv/cmpxchg: Deduplicate cmpxchg() asm and macros

In this header, every cmpxchg define (_relaxed, _acquire, _release,
vanilla) contains its own asm block, for both 4-byte and 8-byte variables,
for a total of 8 versions of mostly the same asm.

This is usually bad, as it means any change may need to be made in up to 8
different places.

Unify those versions by creating a new define with enough parameters to
generate any version of the previous 8.

Then unify the result under a more general define, and simplify
arch_cmpxchg* generation.

(This did not cause any change in generated asm)
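
For illustration, a sketch of the fully-ordered case: arch_cmpxchg(ptr, o, n)
is now _arch_cmpxchg((ptr), (o), (n), ".rl", "", " fence rw, rw\n"), which
for a 4-byte variable conceptually produces the same asm the old
__cmpxchg() emitted:

	__asm__ __volatile__ (
		"0:	lr.w %0, %2\n"
		"	bne %0, %z3, 1f\n"
		"	sc.w.rl %1, %z4, %2\n"
		"	bnez %1, 0b\n"
		"	fence rw, rw\n"
		"1:\n"
		: "=&r" (__ret), "=&r" (__rc), "+A" (*(__ptr))
		: "rJ" ((long)__old), "rJ" (__new)
		: "memory");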

Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
Tested-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/cmpxchg.h | 195 ++++++-------------------------
1 file changed, 33 insertions(+), 162 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 48478a8eecee7..e3e0ac7ba061b 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -71,190 +71,61 @@
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*/
-#define __cmpxchg_relaxed(ptr, old, new, size) \
-({ \
- __typeof__(ptr) __ptr = (ptr); \
- __typeof__(*(ptr)) __old = (old); \
- __typeof__(*(ptr)) __new = (new); \
- __typeof__(*(ptr)) __ret; \
- register unsigned int __rc; \
- switch (size) { \
- case 4: \
- __asm__ __volatile__ ( \
- "0: lr.w %0, %2\n" \
- " bne %0, %z3, 1f\n" \
- " sc.w %1, %z4, %2\n" \
- " bnez %1, 0b\n" \
- "1:\n" \
- : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
- : "rJ" ((long)__old), "rJ" (__new) \
- : "memory"); \
- break; \
- case 8: \
- __asm__ __volatile__ ( \
- "0: lr.d %0, %2\n" \
- " bne %0, %z3, 1f\n" \
- " sc.d %1, %z4, %2\n" \
- " bnez %1, 0b\n" \
- "1:\n" \
- : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
- : "rJ" (__old), "rJ" (__new) \
- : "memory"); \
- break; \
- default: \
- BUILD_BUG(); \
- } \
- __ret; \
-})

-#define arch_cmpxchg_relaxed(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) _o_ = (o); \
- __typeof__(*(ptr)) _n_ = (n); \
- (__typeof__(*(ptr))) __cmpxchg_relaxed((ptr), \
- _o_, _n_, sizeof(*(ptr))); \
-})

-#define __cmpxchg_acquire(ptr, old, new, size) \
+#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n) \
({ \
- __typeof__(ptr) __ptr = (ptr); \
- __typeof__(*(ptr)) __old = (old); \
- __typeof__(*(ptr)) __new = (new); \
- __typeof__(*(ptr)) __ret; \
register unsigned int __rc; \
- switch (size) { \
- case 4: \
- __asm__ __volatile__ ( \
- "0: lr.w %0, %2\n" \
- " bne %0, %z3, 1f\n" \
- " sc.w %1, %z4, %2\n" \
- " bnez %1, 0b\n" \
- RISCV_ACQUIRE_BARRIER \
- "1:\n" \
- : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
- : "rJ" ((long)__old), "rJ" (__new) \
- : "memory"); \
- break; \
- case 8: \
- __asm__ __volatile__ ( \
- "0: lr.d %0, %2\n" \
- " bne %0, %z3, 1f\n" \
- " sc.d %1, %z4, %2\n" \
- " bnez %1, 0b\n" \
- RISCV_ACQUIRE_BARRIER \
- "1:\n" \
- : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
- : "rJ" (__old), "rJ" (__new) \
- : "memory"); \
- break; \
- default: \
- BUILD_BUG(); \
- } \
- __ret; \
-})
-
-#define arch_cmpxchg_acquire(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) _o_ = (o); \
- __typeof__(*(ptr)) _n_ = (n); \
- (__typeof__(*(ptr))) __cmpxchg_acquire((ptr), \
- _o_, _n_, sizeof(*(ptr))); \
+ \
+ __asm__ __volatile__ ( \
+ prepend \
+ "0: lr" lr_sfx " %0, %2\n" \
+ " bne %0, %z3, 1f\n" \
+ " sc" sc_sfx " %1, %z4, %2\n" \
+ " bnez %1, 0b\n" \
+ append \
+ "1:\n" \
+ : "=&r" (r), "=&r" (__rc), "+A" (*(p)) \
+ : "rJ" (co o), "rJ" (n) \
+ : "memory"); \
})

-#define __cmpxchg_release(ptr, old, new, size) \
+#define _arch_cmpxchg(ptr, old, new, sc_sfx, prepend, append) \
({ \
__typeof__(ptr) __ptr = (ptr); \
- __typeof__(*(ptr)) __old = (old); \
- __typeof__(*(ptr)) __new = (new); \
- __typeof__(*(ptr)) __ret; \
- register unsigned int __rc; \
- switch (size) { \
+ __typeof__(*(__ptr)) __old = (old); \
+ __typeof__(*(__ptr)) __new = (new); \
+ __typeof__(*(__ptr)) __ret; \
+ \
+ switch (sizeof(*__ptr)) { \
case 4: \
- __asm__ __volatile__ ( \
- RISCV_RELEASE_BARRIER \
- "0: lr.w %0, %2\n" \
- " bne %0, %z3, 1f\n" \
- " sc.w %1, %z4, %2\n" \
- " bnez %1, 0b\n" \
- "1:\n" \
- : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
- : "rJ" ((long)__old), "rJ" (__new) \
- : "memory"); \
+ __arch_cmpxchg(".w", ".w" sc_sfx, prepend, append, \
+ __ret, __ptr, (long), __old, __new); \
break; \
case 8: \
- __asm__ __volatile__ ( \
- RISCV_RELEASE_BARRIER \
- "0: lr.d %0, %2\n" \
- " bne %0, %z3, 1f\n" \
- " sc.d %1, %z4, %2\n" \
- " bnez %1, 0b\n" \
- "1:\n" \
- : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
- : "rJ" (__old), "rJ" (__new) \
- : "memory"); \
+ __arch_cmpxchg(".d", ".d" sc_sfx, prepend, append, \
+ __ret, __ptr, /**/, __old, __new); \
break; \
default: \
BUILD_BUG(); \
} \
- __ret; \
+ (__typeof__(*(__ptr)))__ret; \
})

-#define arch_cmpxchg_release(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) _o_ = (o); \
- __typeof__(*(ptr)) _n_ = (n); \
- (__typeof__(*(ptr))) __cmpxchg_release((ptr), \
- _o_, _n_, sizeof(*(ptr))); \
-})
+#define arch_cmpxchg_relaxed(ptr, o, n) \
+ _arch_cmpxchg((ptr), (o), (n), "", "", "")

-#define __cmpxchg(ptr, old, new, size) \
-({ \
- __typeof__(ptr) __ptr = (ptr); \
- __typeof__(*(ptr)) __old = (old); \
- __typeof__(*(ptr)) __new = (new); \
- __typeof__(*(ptr)) __ret; \
- register unsigned int __rc; \
- switch (size) { \
- case 4: \
- __asm__ __volatile__ ( \
- "0: lr.w %0, %2\n" \
- " bne %0, %z3, 1f\n" \
- " sc.w.rl %1, %z4, %2\n" \
- " bnez %1, 0b\n" \
- " fence rw, rw\n" \
- "1:\n" \
- : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
- : "rJ" ((long)__old), "rJ" (__new) \
- : "memory"); \
- break; \
- case 8: \
- __asm__ __volatile__ ( \
- "0: lr.d %0, %2\n" \
- " bne %0, %z3, 1f\n" \
- " sc.d.rl %1, %z4, %2\n" \
- " bnez %1, 0b\n" \
- " fence rw, rw\n" \
- "1:\n" \
- : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
- : "rJ" (__old), "rJ" (__new) \
- : "memory"); \
- break; \
- default: \
- BUILD_BUG(); \
- } \
- __ret; \
-})
+#define arch_cmpxchg_acquire(ptr, o, n) \
+ _arch_cmpxchg((ptr), (o), (n), "", "", RISCV_ACQUIRE_BARRIER)
+
+#define arch_cmpxchg_release(ptr, o, n) \
+ _arch_cmpxchg((ptr), (o), (n), "", RISCV_RELEASE_BARRIER, "")

#define arch_cmpxchg(ptr, o, n) \
-({ \
- __typeof__(*(ptr)) _o_ = (o); \
- __typeof__(*(ptr)) _n_ = (n); \
- (__typeof__(*(ptr))) __cmpxchg((ptr), \
- _o_, _n_, sizeof(*(ptr))); \
-})
+ _arch_cmpxchg((ptr), (o), (n), ".rl", "", " fence rw, rw\n")

#define arch_cmpxchg_local(ptr, o, n) \
- (__cmpxchg_relaxed((ptr), (o), (n), sizeof(*(ptr))))
+ arch_cmpxchg_relaxed((ptr), (o), (n))

#define arch_cmpxchg64(ptr, o, n) \
({ \
--
2.43.0


2024-01-03 16:33:26

by Leonardo Bras

Subject: [PATCH v1 4/5] riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2

cmpxchg for variables of size 1 byte and 2 bytes is not yet available on
riscv, even though it's present in other architectures such as arm64 and
x86. This can make some locking mechanisms impossible to implement, or
require extra rework to make them work properly.

Implement 1-byte and 2-byte cmpxchg in order to achieve parity with other
architectures.
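
Since riscv has no sub-word lr/sc, the new macro below operates on the
aligned 32-bit word that contains the variable. A non-atomic C sketch of
the masking scheme (hypothetical helper, shown for a u8; the real code
keeps the whole sequence inside one lr.w/sc.w retry loop):

	static u8 cmpxchg8_sketch(u8 *p, u8 old, u8 new)
	{
		u32 *p32 = (u32 *)((ulong)p & ~0x3);		/* aligned word */
		ulong s = ((ulong)p & (0x4 - sizeof(*p))) * BITS_PER_BYTE;
		ulong mask = GENMASK(sizeof(*p) * BITS_PER_BYTE - 1, 0) << s;
		u32 word = *p32;				/* lr.w */

		if ((word & mask) == ((ulong)old << s))		/* and; bne */
			*p32 = (word & ~mask) | ((ulong)new << s); /* and; or; sc.w */

		return (word & mask) >> s;			/* old sub-word */
	}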

Signed-off-by: Leonardo Bras <[email protected]>
Tested-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/cmpxchg.h | 34 ++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index e3e0ac7ba061b..ac9d0eeb74e67 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -72,6 +72,35 @@
* indicated by comparing RETURN with OLD.
*/

+#define __arch_cmpxchg_masked(sc_sfx, prepend, append, r, p, o, n) \
+({ \
+ u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3); \
+ ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE; \
+ ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0) \
+ << __s; \
+ ulong __newx = (ulong)(n) << __s; \
+ ulong __oldx = (ulong)(o) << __s; \
+ ulong __retx; \
+ ulong __rc; \
+ \
+ __asm__ __volatile__ ( \
+ prepend \
+ "0: lr.w %0, %2\n" \
+ " and %1, %0, %z5\n" \
+ " bne %1, %z3, 1f\n" \
+ " and %1, %0, %z6\n" \
+ " or %1, %1, %z4\n" \
+ " sc.w" sc_sfx " %1, %1, %2\n" \
+ " bnez %1, 0b\n" \
+ append \
+ "1:\n" \
+ : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b)) \
+ : "rJ" ((long)__oldx), "rJ" (__newx), \
+ "rJ" (__mask), "rJ" (~__mask) \
+ : "memory"); \
+ \
+ r = (__typeof__(*(p)))((__retx & __mask) >> __s); \
+})

#define __arch_cmpxchg(lr_sfx, sc_sfx, prepend, append, r, p, co, o, n) \
({ \
@@ -98,6 +127,11 @@
__typeof__(*(__ptr)) __ret; \
\
switch (sizeof(*__ptr)) { \
+ case 1: \
+ case 2: \
+ __arch_cmpxchg_masked(sc_sfx, prepend, append, \
+ __ret, __ptr, __old, __new); \
+ break; \
case 4: \
__arch_cmpxchg(".w", ".w" sc_sfx, prepend, append, \
__ret, __ptr, (long), __old, __new); \
--
2.43.0


2024-01-03 16:33:31

by Leonardo Bras

Subject: [PATCH v1 3/5] riscv/atomic.h : Deduplicate arch_atomic.*

Some functions use mostly the same asm for 32-bit and 64-bit versions.

Create a macro that is generic enough to cover both versions and avoid
code duplication.

(This did not cause any change in generated asm)
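
As a plain-C illustration, this is roughly what the shared
_arch_atomic_fetch_add_unless() loop implements (non-atomic sketch for
readability only; the real code does it in one lr/sc retry loop and adds
a full barrier on success):

	static int fetch_add_unless_sketch(int *counter, int a, int u)
	{
		int prev = *counter;		/* lr.{w,d} */
		if (prev != u)			/* beq %[p], %[u], 1f */
			*counter = prev + a;	/* add + sc.{w,d}.rl */
		return prev;
	}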

Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
Tested-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/atomic.h | 164 +++++++++++++++-----------------
1 file changed, 76 insertions(+), 88 deletions(-)

diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index f5dfef6c2153f..80cca7ac16fd3 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -196,22 +196,28 @@ ATOMIC_OPS(xor, xor, i)
#undef ATOMIC_FETCH_OP
#undef ATOMIC_OP_RETURN

+#define _arch_atomic_fetch_add_unless(_prev, _rc, counter, _a, _u, sfx) \
+({ \
+ __asm__ __volatile__ ( \
+ "0: lr." sfx " %[p], %[c]\n" \
+ " beq %[p], %[u], 1f\n" \
+ " add %[rc], %[p], %[a]\n" \
+ " sc." sfx ".rl %[rc], %[rc], %[c]\n" \
+ " bnez %[rc], 0b\n" \
+ " fence rw, rw\n" \
+ "1:\n" \
+ : [p]"=&r" (_prev), [rc]"=&r" (_rc), [c]"+A" (counter) \
+ : [a]"r" (_a), [u]"r" (_u) \
+ : "memory"); \
+})
+
/* This is required to provide a full barrier on success. */
static __always_inline int arch_atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int prev, rc;

- __asm__ __volatile__ (
- "0: lr.w %[p], %[c]\n"
- " beq %[p], %[u], 1f\n"
- " add %[rc], %[p], %[a]\n"
- " sc.w.rl %[rc], %[rc], %[c]\n"
- " bnez %[rc], 0b\n"
- " fence rw, rw\n"
- "1:\n"
- : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- : [a]"r" (a), [u]"r" (u)
- : "memory");
+ _arch_atomic_fetch_add_unless(prev, rc, v->counter, a, u, "w");
+
return prev;
}
#define arch_atomic_fetch_add_unless arch_atomic_fetch_add_unless
@@ -222,77 +228,86 @@ static __always_inline s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a,
s64 prev;
long rc;

- __asm__ __volatile__ (
- "0: lr.d %[p], %[c]\n"
- " beq %[p], %[u], 1f\n"
- " add %[rc], %[p], %[a]\n"
- " sc.d.rl %[rc], %[rc], %[c]\n"
- " bnez %[rc], 0b\n"
- " fence rw, rw\n"
- "1:\n"
- : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- : [a]"r" (a), [u]"r" (u)
- : "memory");
+ _arch_atomic_fetch_add_unless(prev, rc, v->counter, a, u, "d");
+
return prev;
}
#define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
#endif

+#define _arch_atomic_inc_unless_negative(_prev, _rc, counter, sfx) \
+({ \
+ __asm__ __volatile__ ( \
+ "0: lr." sfx " %[p], %[c]\n" \
+ " bltz %[p], 1f\n" \
+ " addi %[rc], %[p], 1\n" \
+ " sc." sfx ".rl %[rc], %[rc], %[c]\n" \
+ " bnez %[rc], 0b\n" \
+ " fence rw, rw\n" \
+ "1:\n" \
+ : [p]"=&r" (_prev), [rc]"=&r" (_rc), [c]"+A" (counter) \
+ : \
+ : "memory"); \
+})
+
static __always_inline bool arch_atomic_inc_unless_negative(atomic_t *v)
{
int prev, rc;

- __asm__ __volatile__ (
- "0: lr.w %[p], %[c]\n"
- " bltz %[p], 1f\n"
- " addi %[rc], %[p], 1\n"
- " sc.w.rl %[rc], %[rc], %[c]\n"
- " bnez %[rc], 0b\n"
- " fence rw, rw\n"
- "1:\n"
- : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- :
- : "memory");
+ _arch_atomic_inc_unless_negative(prev, rc, v->counter, "w");
+
return !(prev < 0);
}

#define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative

+#define _arch_atomic_dec_unless_positive(_prev, _rc, counter, sfx) \
+({ \
+ __asm__ __volatile__ ( \
+ "0: lr." sfx " %[p], %[c]\n" \
+ " bgtz %[p], 1f\n" \
+ " addi %[rc], %[p], -1\n" \
+ " sc." sfx ".rl %[rc], %[rc], %[c]\n" \
+ " bnez %[rc], 0b\n" \
+ " fence rw, rw\n" \
+ "1:\n" \
+ : [p]"=&r" (_prev), [rc]"=&r" (_rc), [c]"+A" (counter) \
+ : \
+ : "memory"); \
+})
+
static __always_inline bool arch_atomic_dec_unless_positive(atomic_t *v)
{
int prev, rc;

- __asm__ __volatile__ (
- "0: lr.w %[p], %[c]\n"
- " bgtz %[p], 1f\n"
- " addi %[rc], %[p], -1\n"
- " sc.w.rl %[rc], %[rc], %[c]\n"
- " bnez %[rc], 0b\n"
- " fence rw, rw\n"
- "1:\n"
- : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- :
- : "memory");
+ _arch_atomic_dec_unless_positive(prev, rc, v->counter, "w");
+
return !(prev > 0);
}

#define arch_atomic_dec_unless_positive arch_atomic_dec_unless_positive

+#define _arch_atomic_dec_if_positive(_prev, _rc, counter, sfx) \
+({ \
+ __asm__ __volatile__ ( \
+ "0: lr." sfx " %[p], %[c]\n" \
+ " addi %[rc], %[p], -1\n" \
+ " bltz %[rc], 1f\n" \
+ " sc." sfx ".rl %[rc], %[rc], %[c]\n" \
+ " bnez %[rc], 0b\n" \
+ " fence rw, rw\n" \
+ "1:\n" \
+ : [p]"=&r" (_prev), [rc]"=&r" (_rc), [c]"+A" (counter) \
+ : \
+ : "memory"); \
+})
+
static __always_inline int arch_atomic_dec_if_positive(atomic_t *v)
{
int prev, rc;

- __asm__ __volatile__ (
- "0: lr.w %[p], %[c]\n"
- " addi %[rc], %[p], -1\n"
- " bltz %[rc], 1f\n"
- " sc.w.rl %[rc], %[rc], %[c]\n"
- " bnez %[rc], 0b\n"
- " fence rw, rw\n"
- "1:\n"
- : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- :
- : "memory");
+ _arch_atomic_dec_if_positive(prev, rc, v->counter, "w");
+
return prev - 1;
}

@@ -304,17 +319,8 @@ static __always_inline bool arch_atomic64_inc_unless_negative(atomic64_t *v)
s64 prev;
long rc;

- __asm__ __volatile__ (
- "0: lr.d %[p], %[c]\n"
- " bltz %[p], 1f\n"
- " addi %[rc], %[p], 1\n"
- " sc.d.rl %[rc], %[rc], %[c]\n"
- " bnez %[rc], 0b\n"
- " fence rw, rw\n"
- "1:\n"
- : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- :
- : "memory");
+ _arch_atomic_inc_unless_negative(prev, rc, v->counter, "d");
+
return !(prev < 0);
}

@@ -325,17 +331,8 @@ static __always_inline bool arch_atomic64_dec_unless_positive(atomic64_t *v)
s64 prev;
long rc;

- __asm__ __volatile__ (
- "0: lr.d %[p], %[c]\n"
- " bgtz %[p], 1f\n"
- " addi %[rc], %[p], -1\n"
- " sc.d.rl %[rc], %[rc], %[c]\n"
- " bnez %[rc], 0b\n"
- " fence rw, rw\n"
- "1:\n"
- : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- :
- : "memory");
+ _arch_atomic_dec_unless_positive(prev, rc, v->counter, "d");
+
return !(prev > 0);
}

@@ -346,17 +343,8 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
s64 prev;
long rc;

- __asm__ __volatile__ (
- "0: lr.d %[p], %[c]\n"
- " addi %[rc], %[p], -1\n"
- " bltz %[rc], 1f\n"
- " sc.d.rl %[rc], %[rc], %[c]\n"
- " bnez %[rc], 0b\n"
- " fence rw, rw\n"
- "1:\n"
- : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
- :
- : "memory");
+ _arch_atomic_dec_if_positive(prev, rc, v->counter, "d");
+
return prev - 1;
}

--
2.43.0


2024-01-03 16:33:53

by Leonardo Bras

Subject: [PATCH v1 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

xchg for variables of size 1 byte and 2 bytes is not yet available on
riscv, even though it's present in other architectures such as arm64 and
x86. This can make some locking mechanisms impossible to implement, or
require extra rework to make them work properly.

Implement 1-byte and 2-byte xchg in order to achieve parity with other
architectures.
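
The word-aligned masking scheme is the same as in the cmpxchg patch; the
difference is that xchg has no compare step, so only the shifted new value
and ~mask are needed as asm inputs. A non-atomic C sketch (hypothetical
helper, shown for a u8):

	static u8 xchg8_sketch(u8 *p, u8 new)
	{
		u32 *p32 = (u32 *)((ulong)p & ~0x3);
		ulong s = ((ulong)p & (0x4 - sizeof(*p))) * BITS_PER_BYTE;
		ulong mask = GENMASK(sizeof(*p) * BITS_PER_BYTE - 1, 0) << s;
		u32 word = *p32;				/* lr.w */

		*p32 = (word & ~mask) | ((ulong)new << s);	/* and; or; sc.w */
		return (word & mask) >> s;
	}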

Signed-off-by: Leonardo Bras <[email protected]>
Tested-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/cmpxchg.h | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index ac9d0eeb74e67..26cea2395aae8 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,6 +11,31 @@
#include <asm/barrier.h>
#include <asm/fence.h>

+#define __arch_xchg_masked(prepend, append, r, p, n) \
+({ \
+ u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3); \
+ ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE; \
+ ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0) \
+ << __s; \
+ ulong __newx = (ulong)(n) << __s; \
+ ulong __retx; \
+ ulong __rc; \
+ \
+ __asm__ __volatile__ ( \
+ prepend \
+ "0: lr.w %0, %2\n" \
+ " and %1, %0, %z4\n" \
+ " or %1, %1, %z3\n" \
+ " sc.w %1, %1, %2\n" \
+ " bnez %1, 0b\n" \
+ append \
+ : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b)) \
+ : "rJ" (__newx), "rJ" (~__mask) \
+ : "memory"); \
+ \
+ r = (__typeof__(*(p)))((__retx & __mask) >> __s); \
+})
+
#define __arch_xchg(sfx, prepend, append, r, p, n) \
({ \
__asm__ __volatile__ ( \
@@ -27,7 +52,13 @@
__typeof__(ptr) __ptr = (ptr); \
__typeof__(*(__ptr)) __new = (new); \
__typeof__(*(__ptr)) __ret; \
+ \
switch (sizeof(*__ptr)) { \
+ case 1: \
+ case 2: \
+ __arch_xchg_masked(prepend, append, \
+ __ret, __ptr, __new); \
+ break; \
case 4: \
__arch_xchg(".w" sfx, prepend, append, \
__ret, __ptr, __new); \
--
2.43.0


2024-01-03 16:35:01

by Leonardo Bras

Subject: Re: [PATCH v1 0/5] Rework & improve riscv cmpxchg.h and atomic.h

On Wed, Jan 03, 2024 at 01:31:58PM -0300, Leonardo Bras wrote:
> While studying riscv's cmpxchg.h file, I got really interested in
> understanding how RISCV asm implements the different versions of
> {cmp,}xchg.
>
> When I understood the pattern, it made sense to me to remove the
> duplication and create macros that make it easier to understand what
> exactly changes between the versions: instruction suffixes & barriers.
>
> I also did the same kind of work on atomic.h.
>
> After that, I noted both cmpxchg and xchg only accept variables of
> size 4 and 8, compared to x86 and arm64 which support sizes 1, 2, 4 and 8.
>
> Now that the deduplication is done, it is quite straightforward to
> implement them for variable sizes 1 and 2, so I did it. Guo Ren has
> already presented me with some possible users :)
>
> I compared the asm generated from a test.c that contained a usage of every
> changed function, and could not detect any change for patches 1 + 2 + 3
> compared with upstream.
>
> Patches 4 & 5 were compile-tested, merged with guoren/qspinlock_v11, and
> booted just fine with qemu -machine virt -append "qspinlock".
>
> (tree: https://gitlab.com/LeoBras/linux/-/commits/guo_qspinlock_v11)
>
> Latest tests happened based on this tree:
> https://github.com/guoren83/linux/tree/qspinlock_v12
>
> Thanks!
> Leo
>
> Changes since squashed cmpxchg RFCv5:
> - Resend as v1

Oh, forgot to mention:
I added some Reviewed-by tags that were missing.
Thanks Guo Ren!


[...]


2024-01-04 19:55:06

by Boqun Feng

Subject: Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

On Wed, Jan 03, 2024 at 01:31:59PM -0300, Leonardo Bras wrote:
> In this header, every xchg define (_relaxed, _acquire, _release, vanilla)
> contains its own asm block, for both 4-byte and 8-byte variables, for a
> total of 8 versions of mostly the same asm.
>
> This is usually bad, as it means any change may need to be made in up to 8
> different places.
>
> Unify those versions by creating a new define with enough parameters to
> generate any version of the previous 8.
>
> Then unify the result under a more general define, and simplify
> arch_xchg* generation.
>
> (This did not cause any change in generated asm)
>
> Signed-off-by: Leonardo Bras <[email protected]>
> Reviewed-by: Guo Ren <[email protected]>
> Reviewed-by: Andrea Parri <[email protected]>
> Tested-by: Guo Ren <[email protected]>
> ---
> arch/riscv/include/asm/cmpxchg.h | 138 ++++++-------------------------
> 1 file changed, 23 insertions(+), 115 deletions(-)
>
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index 2f4726d3cfcc2..48478a8eecee7 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -11,140 +11,48 @@
> #include <asm/barrier.h>
> #include <asm/fence.h>
>
> -#define __xchg_relaxed(ptr, new, size) \
> +#define __arch_xchg(sfx, prepend, append, r, p, n) \
> ({ \
> - __typeof__(ptr) __ptr = (ptr); \
> - __typeof__(new) __new = (new); \
> - __typeof__(*(ptr)) __ret; \
> - switch (size) { \
> - case 4: \
> - __asm__ __volatile__ ( \
> - " amoswap.w %0, %2, %1\n" \
> - : "=r" (__ret), "+A" (*__ptr) \
> - : "r" (__new) \
> - : "memory"); \

Hmm... actually xchg_relaxed() doesn't need to be a barrier(), so the
"memory" clobber is not needed here. Of course, it's out of the scope
of this series, but I'm curious to see what would happen if we removed
the "memory" clobber from _relaxed() ;-)

Regards,
Boqun

[...]

2024-01-04 20:42:30

by Leonardo Bras

Subject: Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

On Thu, Jan 04, 2024 at 11:53:45AM -0800, Boqun Feng wrote:
> On Wed, Jan 03, 2024 at 01:31:59PM -0300, Leonardo Bras wrote:
> > In this header, every xchg define (_relaxed, _acquire, _release, vanilla)
> > contains its own asm block, for both 4-byte and 8-byte variables, for a
> > total of 8 versions of mostly the same asm.
> >
> > This is usually bad, as it means any change may need to be made in up to 8
> > different places.
> >
> > Unify those versions by creating a new define with enough parameters to
> > generate any version of the previous 8.
> >
> > Then unify the result under a more general define, and simplify
> > arch_xchg* generation.
> >
> > (This did not cause any change in generated asm)
> >
> > Signed-off-by: Leonardo Bras <[email protected]>
> > Reviewed-by: Guo Ren <[email protected]>
> > Reviewed-by: Andrea Parri <[email protected]>
> > Tested-by: Guo Ren <[email protected]>
> > ---
> > arch/riscv/include/asm/cmpxchg.h | 138 ++++++-------------------------
> > 1 file changed, 23 insertions(+), 115 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > index 2f4726d3cfcc2..48478a8eecee7 100644
> > --- a/arch/riscv/include/asm/cmpxchg.h
> > +++ b/arch/riscv/include/asm/cmpxchg.h
> > @@ -11,140 +11,48 @@
> > #include <asm/barrier.h>
> > #include <asm/fence.h>
> >
> > -#define __xchg_relaxed(ptr, new, size) \
> > +#define __arch_xchg(sfx, prepend, append, r, p, n) \
> > ({ \
> > - __typeof__(ptr) __ptr = (ptr); \
> > - __typeof__(new) __new = (new); \
> > - __typeof__(*(ptr)) __ret; \
> > - switch (size) { \
> > - case 4: \
> > - __asm__ __volatile__ ( \
> > - " amoswap.w %0, %2, %1\n" \
> > - : "=r" (__ret), "+A" (*__ptr) \
> > - : "r" (__new) \
> > - : "memory"); \

Hello Boqun, thanks for reviewing!

>
> Hmm... actually xchg_relaxed() doesn't need to be a barrier(), so the
> "memory" clobber is not needed here. Of course, it's out of the scope
> of this series, but I'm curious to see what would happen if we removed
> the "memory" clobber from _relaxed() ;-)

Nice question :)
I am happy my patch can help bring up those ideas :)


According to gcc.gnu.org:

---
"memory" [clobber]:

The "memory" clobber tells the compiler that the assembly code
performs memory reads or writes to items other than those listed in
the input and output operands (for example, accessing the memory
pointed to by one of the input parameters). To ensure memory contains
correct values, GCC may need to flush specific register values to
memory before executing the asm. Further, the compiler does not assume
that any values read from memory before an asm remain unchanged after
that asm ; it reloads them as needed. Using the "memory" clobber
effectively forms a read/write memory barrier for the compiler.

Note that this clobber does not prevent the processor from doing
speculative reads past the asm statement. To prevent that, you need
processor-specific fence instructions.
---

IIUC the above text says that having memory accesses to *__ptr requires
the above asm to have the "memory" clobber, so memory accesses don't get
reordered by the compiler.

By that reasoning, all asm in this file should have the "memory"
clobber, since all atomic operations change the memory pointed to by an
input ptr. Is that correct?

Thanks!
Leo


[...]


2024-01-04 21:52:44

by Boqun Feng

Subject: Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

On Thu, Jan 04, 2024 at 05:41:26PM -0300, Leonardo Bras wrote:
> On Thu, Jan 04, 2024 at 11:53:45AM -0800, Boqun Feng wrote:
> > On Wed, Jan 03, 2024 at 01:31:59PM -0300, Leonardo Bras wrote:
> > > In this header, every xchg define (_relaxed, _acquire, _release, vanilla)
> > > contains its own asm block, for both 4-byte and 8-byte variables, for a
> > > total of 8 versions of mostly the same asm.
> > >
> > > This is usually bad, as it means any change may need to be made in up to 8
> > > different places.
> > >
> > > Unify those versions by creating a new define with enough parameters to
> > > generate any version of the previous 8.
> > >
> > > Then unify the result under a more general define, and simplify
> > > arch_xchg* generation.
> > >
> > > (This did not cause any change in generated asm)
> > >
> > > Signed-off-by: Leonardo Bras <[email protected]>
> > > Reviewed-by: Guo Ren <[email protected]>
> > > Reviewed-by: Andrea Parri <[email protected]>
> > > Tested-by: Guo Ren <[email protected]>
> > > ---
> > > arch/riscv/include/asm/cmpxchg.h | 138 ++++++-------------------------
> > > 1 file changed, 23 insertions(+), 115 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > > index 2f4726d3cfcc2..48478a8eecee7 100644
> > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > @@ -11,140 +11,48 @@
> > > #include <asm/barrier.h>
> > > #include <asm/fence.h>
> > >
> > > -#define __xchg_relaxed(ptr, new, size) \
> > > +#define __arch_xchg(sfx, prepend, append, r, p, n) \
> > > ({ \
> > > - __typeof__(ptr) __ptr = (ptr); \
> > > - __typeof__(new) __new = (new); \
> > > - __typeof__(*(ptr)) __ret; \
> > > - switch (size) { \
> > > - case 4: \
> > > - __asm__ __volatile__ ( \
> > > - " amoswap.w %0, %2, %1\n" \
> > > - : "=r" (__ret), "+A" (*__ptr) \
> > > - : "r" (__new) \
> > > - : "memory"); \
>
> Hello Boqun, thanks for reviewing!
>
> >
> > Hmm... actually xchg_relaxed() doesn't need to be a barrier(), so the
> > "memory" clobber is not needed here. Of course, it's out of the scope
> > of this series, but I'm curious to see what would happen if we removed
> > the "memory" clobber from _relaxed() ;-)
>
> Nice question :)
> I am happy my patch can help bring up those ideas :)
>
>
> According to gcc.gnu.org:
>
> ---
> "memory" [clobber]:
>
> The "memory" clobber tells the compiler that the assembly code
> performs memory reads or writes to items other than those listed in
> the input and output operands (for example, accessing the memory
> pointed to by one of the input parameters). To ensure memory contains

Note here it says "other than those listed in the input and output
operands", and in the above asm block the memory pointed to by "__ptr" is
already marked as read-and-write via "+A" (*__ptr), so the compiler knows
the asm block may modify the memory pointed to by "__ptr"; therefore, in
the _relaxed() case, the "memory" clobber can be avoided.

Here is an example showing the difference; consider the following case:

this_val = *this;
that_val = *that;
xchg_relaxed(this, 1);
reread_this = *this;

by the semantics of _relaxed, compilers can optimize the above into

this_val = *this;
xchg_relaxed(this, 1);
that_val = *that;
reread_this = *this;

but the "memory" clobber in xchg_relaxed() will prevent this.
Needless to say, the '"+A" (*__ptr)' prevents the compiler from the
following optimization:

this_val = *this;
that_val = *that;
xchg_relaxed(this, 1);
reread_this = this_val;

since the compiler knows the asm block will read and write *this.
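
To make that concrete, here is a minimal sketch of what a relaxed xchg
without the clobber could look like (hypothetical, for illustration only):

	static inline long xchg64_relaxed_sketch(long *p, long v)
	{
		long ret;

		/* No "memory" clobber: the compiler may still move accesses
		 * to *other* locations across this asm, but "+A" (*p) forces
		 * it to treat *p itself as read and written here.
		 */
		__asm__ __volatile__ (
			"	amoswap.d %0, %2, %1\n"
			: "=r" (ret), "+A" (*p)
			: "r" (v));
		return ret;
	}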

Regards,
Boqun

[...]

2024-01-05 04:46:12

by Leonardo Bras

Subject: Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

On Thu, Jan 04, 2024 at 01:51:20PM -0800, Boqun Feng wrote:
> On Thu, Jan 04, 2024 at 05:41:26PM -0300, Leonardo Bras wrote:
> > On Thu, Jan 04, 2024 at 11:53:45AM -0800, Boqun Feng wrote:
> > > On Wed, Jan 03, 2024 at 01:31:59PM -0300, Leonardo Bras wrote:
> > > > In this header, every xchg define (_relaxed, _acquire, _release, vanilla)
> > > > contains its own asm block, for both 4-byte and 8-byte variables, for a
> > > > total of 8 versions of mostly the same asm.
> > > >
> > > > This is usually bad, as it means any change may need to be made in up to 8
> > > > different places.
> > > >
> > > > Unify those versions by creating a new define with enough parameters to
> > > > generate any version of the previous 8.
> > > >
> > > > Then unify the result under a more general define, and simplify
> > > > arch_xchg* generation.
> > > >
> > > > (This did not cause any change in generated asm)
> > > >
> > > > Signed-off-by: Leonardo Bras <[email protected]>
> > > > Reviewed-by: Guo Ren <[email protected]>
> > > > Reviewed-by: Andrea Parri <[email protected]>
> > > > Tested-by: Guo Ren <[email protected]>
> > > > ---
> > > > arch/riscv/include/asm/cmpxchg.h | 138 ++++++-------------------------
> > > > 1 file changed, 23 insertions(+), 115 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > > > index 2f4726d3cfcc2..48478a8eecee7 100644
> > > > --- a/arch/riscv/include/asm/cmpxchg.h
> > > > +++ b/arch/riscv/include/asm/cmpxchg.h
> > > > @@ -11,140 +11,48 @@
> > > > #include <asm/barrier.h>
> > > > #include <asm/fence.h>
> > > >
> > > > -#define __xchg_relaxed(ptr, new, size) \
> > > > +#define __arch_xchg(sfx, prepend, append, r, p, n) \
> > > > ({ \
> > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > - __typeof__(new) __new = (new); \
> > > > - __typeof__(*(ptr)) __ret; \
> > > > - switch (size) { \
> > > > - case 4: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.w %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> >
> > Hello Boqun, thanks for reviewing!
> >
> > >
> > > Hmm... actually xchg_relaxed() doesn't need to be a barrier(), so the
> > > "memory" clobber is not needed here. Of course, it's out of the scope
> > > of this series, but I'm curious to see what would happen if we removed
> > > the "memory" clobber from _relaxed() ;-)
> >
> > Nice question :)
> > I am happy my patch can help bring up those ideas :)
> >
> >
> > According to gcc.gnu.org:
> >
> > ---
> > "memory" [clobber]:
> >
> > The "memory" clobber tells the compiler that the assembly code
> > performs memory reads or writes to items other than those listed in
> > the input and output operands (for example, accessing the memory
> > pointed to by one of the input parameters). To ensure memory contains
>
> Note here it says "other than those listed in the input and output
> operands", and in the above asm block the memory pointed to by "__ptr" is
> already marked as read-and-write via "+A" (*__ptr), so the compiler knows
> the asm block may modify the memory pointed to by "__ptr"; therefore, in
> the _relaxed() case, the "memory" clobber can be avoided.

Thanks for pointing that out!
That helped me improve my understanding of constraints for asm operands :)
(I ended up getting even more info from the gcc manual)

So the "+A" constraint means the operand is read/write, and it's an
address stored in a register.

>
> Here is an example showing the difference; consider the following case:
>
> this_val = *this;
> that_val = *that;
> xchg_relaxed(this, 1);
> reread_this = *this;
>
> by the semantics of _relaxed, compilers can optimize the above into
>
> this_val = *this;
> xchg_relaxed(this, 1);
> that_val = *that;
> reread_this = *this;
>

Seems correct, since there is no barrier().

> but the "memory" clobber in xchg_relaxed() will prevent this.

So IIUC the "memory" clobber is what avoids the above optimization, right?

> Needless to say, the '"+A" (*__ptr)' prevents the compiler from the
> following optimization:
>
> this_val = *this;
> that_val = *that;
> xchg_relaxed(this, 1);
> reread_this = this_val;
>
> since the compiler knows the asm block will read and write *this.

Right, the compiler knows that address will be written by the asm block, and
so it reloads the value instead of re-using the old one.


A question, though:
Do we need the "memory" clobber in any other xchg / cmpxchg asm?
I mean, usually the only write to memory will happen to *__ptr, which
should already be covered by "+A".

I understand that since the others are not "relaxed" they will need a
barrier, but isn't the compiler supposed to understand the barrier
instruction and avoid compiler reordering / optimizations across it?


Thanks!
Leo

> Regards,
> Boqun
>
> > correct values, GCC may need to flush specific register values to
> > memory before executing the asm. Further, the compiler does not assume
> > that any values read from memory before an asm remain unchanged after
> > that asm ; it reloads them as needed. Using the "memory" clobber
> > effectively forms a read/write memory barrier for the compiler.
> >
> > Note that this clobber does not prevent the processor from doing
> > speculative reads past the asm statement. To prevent that, you need
> > processor-specific fence instructions.
> > ---
> >
> > IIUC the above text says that having memory accesses to *__ptr would require
> > the above asm to have the "memory" clobber, so memory accesses don't get
> > reordered by the compiler.
> >
> > By the above affirmation, all asm in this file should have the "memory"
> > clobber, since all atomic operations will change the memory pointed to by an
> > input ptr. Is that correct?
> >
> > Thanks!
> > Leo
> >
> >
> > >
> > > Regards,
> > > Boqun
> > >
> > > > - break; \
> > > > - case 8: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.d %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - default: \
> > > > - BUILD_BUG(); \
> > > > - } \
> > > > - __ret; \
> > > > -})
> > > > -
> > > > -#define arch_xchg_relaxed(ptr, x) \
> > > > -({ \
> > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > - (__typeof__(*(ptr))) __xchg_relaxed((ptr), \
> > > > - _x_, sizeof(*(ptr))); \
> > > > + __asm__ __volatile__ ( \
> > > > + prepend \
> > > > + " amoswap" sfx " %0, %2, %1\n" \
> > > > + append \
> > > > + : "=r" (r), "+A" (*(p)) \
> > > > + : "r" (n) \
> > > > + : "memory"); \
> > > > })
> > > >
> > > > -#define __xchg_acquire(ptr, new, size) \
> > > > +#define _arch_xchg(ptr, new, sfx, prepend, append) \
> > > > ({ \
> > > > __typeof__(ptr) __ptr = (ptr); \
> > > > - __typeof__(new) __new = (new); \
> > > > - __typeof__(*(ptr)) __ret; \
> > > > - switch (size) { \
> > > > + __typeof__(*(__ptr)) __new = (new); \
> > > > + __typeof__(*(__ptr)) __ret; \
> > > > + switch (sizeof(*__ptr)) { \
> > > > case 4: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.w %0, %2, %1\n" \
> > > > - RISCV_ACQUIRE_BARRIER \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > + __arch_xchg(".w" sfx, prepend, append, \
> > > > + __ret, __ptr, __new); \
> > > > break; \
> > > > case 8: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.d %0, %2, %1\n" \
> > > > - RISCV_ACQUIRE_BARRIER \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > + __arch_xchg(".d" sfx, prepend, append, \
> > > > + __ret, __ptr, __new); \
> > > > break; \
> > > > default: \
> > > > BUILD_BUG(); \
> > > > } \
> > > > - __ret; \
> > > > + (__typeof__(*(__ptr)))__ret; \
> > > > })
> > > >
> > > > -#define arch_xchg_acquire(ptr, x) \
> > > > -({ \
> > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > - (__typeof__(*(ptr))) __xchg_acquire((ptr), \
> > > > - _x_, sizeof(*(ptr))); \
> > > > -})
> > > > +#define arch_xchg_relaxed(ptr, x) \
> > > > + _arch_xchg(ptr, x, "", "", "")
> > > >
> > > > -#define __xchg_release(ptr, new, size) \
> > > > -({ \
> > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > - __typeof__(new) __new = (new); \
> > > > - __typeof__(*(ptr)) __ret; \
> > > > - switch (size) { \
> > > > - case 4: \
> > > > - __asm__ __volatile__ ( \
> > > > - RISCV_RELEASE_BARRIER \
> > > > - " amoswap.w %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - case 8: \
> > > > - __asm__ __volatile__ ( \
> > > > - RISCV_RELEASE_BARRIER \
> > > > - " amoswap.d %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - default: \
> > > > - BUILD_BUG(); \
> > > > - } \
> > > > - __ret; \
> > > > -})
> > > > +#define arch_xchg_acquire(ptr, x) \
> > > > + _arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER)
> > > >
> > > > #define arch_xchg_release(ptr, x) \
> > > > -({ \
> > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > - (__typeof__(*(ptr))) __xchg_release((ptr), \
> > > > - _x_, sizeof(*(ptr))); \
> > > > -})
> > > > -
> > > > -#define __arch_xchg(ptr, new, size) \
> > > > -({ \
> > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > - __typeof__(new) __new = (new); \
> > > > - __typeof__(*(ptr)) __ret; \
> > > > - switch (size) { \
> > > > - case 4: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.w.aqrl %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - case 8: \
> > > > - __asm__ __volatile__ ( \
> > > > - " amoswap.d.aqrl %0, %2, %1\n" \
> > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > - : "r" (__new) \
> > > > - : "memory"); \
> > > > - break; \
> > > > - default: \
> > > > - BUILD_BUG(); \
> > > > - } \
> > > > - __ret; \
> > > > -})
> > > > + _arch_xchg(ptr, x, "", RISCV_RELEASE_BARRIER, "")
> > > >
> > > > #define arch_xchg(ptr, x) \
> > > > -({ \
> > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > - (__typeof__(*(ptr))) __arch_xchg((ptr), _x_, sizeof(*(ptr))); \
> > > > -})
> > > > + _arch_xchg(ptr, x, ".aqrl", "", "")
> > > >
> > > > #define xchg32(ptr, x) \
> > > > ({ \
> > > > --
> > > > 2.43.0
> > > >
> > >
> >
>


2024-01-05 05:18:33

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

On Fri, Jan 05, 2024 at 01:45:42AM -0300, Leonardo Bras wrote:
[...]
> > > According to gcc.gnu.org:
> > >
> > > ---
> > > "memory" [clobber]:
> > >
> > > The "memory" clobber tells the compiler that the assembly code
> > > performs memory reads or writes to items other than those listed in
> > > the input and output operands (for example, accessing the memory
> > > pointed to by one of the input parameters). To ensure memory contains
> >
> > Note here it says "other than those listed in the input and output
> > operands", and in the above asm block, the memory pointed by "__ptr" is
> > already marked as read-and-write by the asm block via "+A" (*__ptr), so
> > the compiler knows the asm block may modify the memory pointed by
> > "__ptr", therefore in _relaxed() case, "memory" clobber can be avoided.
>
> Thanks for pointing that out!
> That helped me improve my understanding of constraints for asm operands :)
> (I ended up getting even more info from the gcc manual)
>
> So "+A" constraints means the operand will get read/write and it's an
> address stored into a register.
>
> >
> > Here is an example showing the difference, considering the follow case:
> >
> > this_val = *this;
> > that_val = *that;
> > xchg_relaxed(this, 1);
> > reread_this = *this;
> >
> > by the semantics of _relaxed, compilers can optimize the above into
> >
> > this_val = *this;
> > xchg_relaxed(this, 1);
> > that_val = *that;
> > reread_this = *this;
> >
>
> Seems correct, since there is no barrier().
>
> > but the "memory" clobber in the xchg_relexed() will provide this.
>
> By 'this' here do you mean the barrier? I mean, IIUC the "memory" clobber
> will avoid the above optimization, right?
>

Right, it seems I mis-typed "provide" (I meant "prevent").

> > Needless to say the '"+A" (*__ptr)' prevents compiler from the following
> > optimization:
> >
> > this_val = *this;
> > that_val = *that;
> > xchg_relaxed(this, 1);
> > reread_this = this_val;
> >
> > since the compiler knows the asm block will read and write *this.
>
> Right, the compiler knows that address will be written by the asm block, and
> so it reloads the value instead of re-using the old one.
>

Correct.

>
> A question, though:

Good question ;-)

> Do we need the "memory" clobber in any other xchg / cmpxchg asm?

The "memory" clobber is needed for others, see below:

> I mean, usually the only write to memory will happen to *__ptr, which
> should be covered by "+A".
>
> I understand that since the others are not "relaxed" they will need to
> have a barrier, but isn't the compiler supposed to understand the barrier
> instruction and avoid compiler reordering / optimizations across that
> instruction?
>

The barrier semantics (ACQUIRE/RELEASE/FULL) are provided by the combined
effort of both 1) preventing compiler optimization via the "memory" clobber
and 2) preventing CPU/memory reordering via arch-specific instructions.

In other words, an asm block that contains a hardware barrier instruction
should always have the "memory" clobber; otherwise, there is a
possibility that the compiler reorders the asm block and therefore breaks the
ordering provided by the hardware instructions.
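
As a concrete sketch (hand-written here, not taken from the series;
upstream RISCV_ACQUIRE_BARRIER expands to a "fence r, rw"), an acquire
xchg pairs both pieces:

	/* Both parts are needed: the fence orders the CPU, and the
	 * "memory" clobber stops the compiler from moving unrelated
	 * memory accesses across the asm block. */
	static inline int xchg32_acquire(int *p, int v)
	{
		int ret;

		__asm__ __volatile__ (
			"	amoswap.w %0, %2, %1\n"
			"	fence r, rw\n"	/* RISCV_ACQUIRE_BARRIER */
			: "=r" (ret), "+A" (*p)
			: "r" (v)
			: "memory");		/* compiler-side barrier */
		return ret;
	}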

Regards,
Boqun

>
> Thanks!
> Leo
>
> > Regards,
> > Boqun
> >
> > > correct values, GCC may need to flush specific register values to
> > > memory before executing the asm. Further, the compiler does not assume
> > > that any values read from memory before an asm remain unchanged after
> > > that asm; it reloads them as needed. Using the "memory" clobber
> > > effectively forms a read/write memory barrier for the compiler.
> > >
> > > Note that this clobber does not prevent the processor from doing
> > > speculative reads past the asm statement. To prevent that, you need
> > > processor-specific fence instructions.
> > > ---
> > >
> > > IIUC the above text says that having memory accesses to *__ptr would require
> > > the above asm to have the "memory" clobber, so memory accesses don't get
> > > reordered by the compiler.
> > >
> > > By the above affirmation, all asm in this file should have the "memory"
> > > clobber, since all atomic operations will change the memory pointed to by an
> > > input ptr. Is that correct?
> > >
> > > Thanks!
> > > Leo
> > >
> > >
> > > >
> > > > Regards,
> > > > Boqun
> > > >
> > > > > - break; \
> > > > > - case 8: \
> > > > > - __asm__ __volatile__ ( \
> > > > > - " amoswap.d %0, %2, %1\n" \
> > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > - : "r" (__new) \
> > > > > - : "memory"); \
> > > > > - break; \
> > > > > - default: \
> > > > > - BUILD_BUG(); \
> > > > > - } \
> > > > > - __ret; \
> > > > > -})
> > > > > -
> > > > > -#define arch_xchg_relaxed(ptr, x) \
> > > > > -({ \
> > > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > > - (__typeof__(*(ptr))) __xchg_relaxed((ptr), \
> > > > > - _x_, sizeof(*(ptr))); \
> > > > > + __asm__ __volatile__ ( \
> > > > > + prepend \
> > > > > + " amoswap" sfx " %0, %2, %1\n" \
> > > > > + append \
> > > > > + : "=r" (r), "+A" (*(p)) \
> > > > > + : "r" (n) \
> > > > > + : "memory"); \
> > > > > })
> > > > >
> > > > > -#define __xchg_acquire(ptr, new, size) \
> > > > > +#define _arch_xchg(ptr, new, sfx, prepend, append) \
> > > > > ({ \
> > > > > __typeof__(ptr) __ptr = (ptr); \
> > > > > - __typeof__(new) __new = (new); \
> > > > > - __typeof__(*(ptr)) __ret; \
> > > > > - switch (size) { \
> > > > > + __typeof__(*(__ptr)) __new = (new); \
> > > > > + __typeof__(*(__ptr)) __ret; \
> > > > > + switch (sizeof(*__ptr)) { \
> > > > > case 4: \
> > > > > - __asm__ __volatile__ ( \
> > > > > - " amoswap.w %0, %2, %1\n" \
> > > > > - RISCV_ACQUIRE_BARRIER \
> > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > - : "r" (__new) \
> > > > > - : "memory"); \
> > > > > + __arch_xchg(".w" sfx, prepend, append, \
> > > > > + __ret, __ptr, __new); \
> > > > > break; \
> > > > > case 8: \
> > > > > - __asm__ __volatile__ ( \
> > > > > - " amoswap.d %0, %2, %1\n" \
> > > > > - RISCV_ACQUIRE_BARRIER \
> > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > - : "r" (__new) \
> > > > > - : "memory"); \
> > > > > + __arch_xchg(".d" sfx, prepend, append, \
> > > > > + __ret, __ptr, __new); \
> > > > > break; \
> > > > > default: \
> > > > > BUILD_BUG(); \
> > > > > } \
> > > > > - __ret; \
> > > > > + (__typeof__(*(__ptr)))__ret; \
> > > > > })
> > > > >
> > > > > -#define arch_xchg_acquire(ptr, x) \
> > > > > -({ \
> > > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > > - (__typeof__(*(ptr))) __xchg_acquire((ptr), \
> > > > > - _x_, sizeof(*(ptr))); \
> > > > > -})
> > > > > +#define arch_xchg_relaxed(ptr, x) \
> > > > > + _arch_xchg(ptr, x, "", "", "")
> > > > >
> > > > > -#define __xchg_release(ptr, new, size) \
> > > > > -({ \
> > > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > > - __typeof__(new) __new = (new); \
> > > > > - __typeof__(*(ptr)) __ret; \
> > > > > - switch (size) { \
> > > > > - case 4: \
> > > > > - __asm__ __volatile__ ( \
> > > > > - RISCV_RELEASE_BARRIER \
> > > > > - " amoswap.w %0, %2, %1\n" \
> > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > - : "r" (__new) \
> > > > > - : "memory"); \
> > > > > - break; \
> > > > > - case 8: \
> > > > > - __asm__ __volatile__ ( \
> > > > > - RISCV_RELEASE_BARRIER \
> > > > > - " amoswap.d %0, %2, %1\n" \
> > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > - : "r" (__new) \
> > > > > - : "memory"); \
> > > > > - break; \
> > > > > - default: \
> > > > > - BUILD_BUG(); \
> > > > > - } \
> > > > > - __ret; \
> > > > > -})
> > > > > +#define arch_xchg_acquire(ptr, x) \
> > > > > + _arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER)
> > > > >
> > > > > #define arch_xchg_release(ptr, x) \
> > > > > -({ \
> > > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > > - (__typeof__(*(ptr))) __xchg_release((ptr), \
> > > > > - _x_, sizeof(*(ptr))); \
> > > > > -})
> > > > > -
> > > > > -#define __arch_xchg(ptr, new, size) \
> > > > > -({ \
> > > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > > - __typeof__(new) __new = (new); \
> > > > > - __typeof__(*(ptr)) __ret; \
> > > > > - switch (size) { \
> > > > > - case 4: \
> > > > > - __asm__ __volatile__ ( \
> > > > > - " amoswap.w.aqrl %0, %2, %1\n" \
> > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > - : "r" (__new) \
> > > > > - : "memory"); \
> > > > > - break; \
> > > > > - case 8: \
> > > > > - __asm__ __volatile__ ( \
> > > > > - " amoswap.d.aqrl %0, %2, %1\n" \
> > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > - : "r" (__new) \
> > > > > - : "memory"); \
> > > > > - break; \
> > > > > - default: \
> > > > > - BUILD_BUG(); \
> > > > > - } \
> > > > > - __ret; \
> > > > > -})
> > > > > + _arch_xchg(ptr, x, "", RISCV_RELEASE_BARRIER, "")
> > > > >
> > > > > #define arch_xchg(ptr, x) \
> > > > > -({ \
> > > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > > - (__typeof__(*(ptr))) __arch_xchg((ptr), _x_, sizeof(*(ptr))); \
> > > > > -})
> > > > > + _arch_xchg(ptr, x, ".aqrl", "", "")
> > > > >
> > > > > #define xchg32(ptr, x) \
> > > > > ({ \
> > > > > --
> > > > > 2.43.0
> > > > >
> > > >
> > >
> >
>

2024-01-05 07:00:09

by Leonardo Bras

[permalink] [raw]
Subject: Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

On Thu, Jan 04, 2024 at 09:18:15PM -0800, Boqun Feng wrote:
> On Fri, Jan 05, 2024 at 01:45:42AM -0300, Leonardo Bras wrote:
> [...]
> > > > According to gcc.gnu.org:
> > > >
> > > > ---
> > > > "memory" [clobber]:
> > > >
> > > > The "memory" clobber tells the compiler that the assembly code
> > > > performs memory reads or writes to items other than those listed in
> > > > the input and output operands (for example, accessing the memory
> > > > pointed to by one of the input parameters). To ensure memory contains
> > >
> > > Note here it says "other than those listed in the input and output
> > > operands", and in the above asm block, the memory pointed by "__ptr" is
> > > already marked as read-and-write by the asm block via "+A" (*__ptr), so
> > > the compiler knows the asm block may modify the memory pointed by
> > > "__ptr", therefore in _relaxed() case, "memory" clobber can be avoided.
> >
> > Thanks for pointing that out!
> > That helped me improve my understanding of constraints for asm operands :)
> > (I ended up getting even more info from the gcc manual)
> >
> > So "+A" constraints means the operand will get read/write and it's an
> > address stored into a register.
> >
> > >
> > > Here is an example showing the difference, considering the follow case:
> > >
> > > this_val = *this;
> > > that_val = *that;
> > > xchg_relaxed(this, 1);
> > > reread_this = *this;
> > >
> > > by the semantics of _relaxed, compilers can optimize the above into
> > >
> > > this_val = *this;
> > > xchg_relaxed(this, 1);
> > > that_val = *that;
> > > reread_this = *this;
> > >
> >
> > Seems correct, since there is no barrier().
> >
> > > but the "memory" clobber in the xchg_relexed() will provide this.
> >
> > By 'this' here do you mean the barrier? I mean, IIUC the "memory" clobber
> > will avoid the above optimization, right?
> >
>
> Right, it seems I mis-typed "provide" (I meant "prevent").
>
> > > Needless to say the '"+A" (*__ptr)' prevents compiler from the following
> > > optimization:
> > >
> > > this_val = *this;
> > > that_val = *that;
> > > xchg_relaxed(this, 1);
> > > reread_this = this_val;
> > >
> > > since the compiler knows the asm block will read and write *this.
> >
> > Right, the compiler knows that address will be written by the asm block, and
> > so it reloads the value instead of re-using the old one.
> >
>
> Correct.
>
> >
> > A question, though:
>
> Good question ;-)
>
> > Do we need the "memory" clobber in any other xchg / cmpxchg asm?
>
> The "memory" clobber is needed for others, see below:
>
> > I mean, usually the only write to memory will happen to *__ptr, which
> > should be covered by "+A".
> >
> > I understand that since the others are not "relaxed" they will need to
> > have a barrier, but isn't the compiler supposed to understand the barrier
> > instruction and avoid compiler reordering / optimizations across that
> > instruction?
> >
>
> The barrier semantics (ACQUIRE/RELEASE/FULL) are provided by the combined
> effort of both 1) preventing compiler optimization via the "memory" clobber
> and 2) preventing CPU/memory reordering via arch-specific instructions.
>
> In other words, an asm block that contains a hardware barrier instruction
> should always have the "memory" clobber; otherwise, there is a
> possibility that the compiler reorders the asm block and therefore breaks the
> ordering provided by the hardware instructions.

Oh, I see.
So this means the compiler does not inspect the asm for memory barrier
instructions before reordering loads/stores, right?

Meaning it needs a way to signal a compiler barrier, on top of the
hardware barrier instructions.
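
(IIUC that is exactly what the kernel's barrier() is, as a minimal
example: an empty asm block whose only effect is the "memory" clobber,
paraphrasing include/linux/compiler.h:)

	/* No instructions at all: a compiler-level barrier only. */
	#define barrier() __asm__ __volatile__("" : : : "memory")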

Thanks for helping me improve my understanding of this!
Leo

>
> Regards,
> Boqun
>
> >
> > Thanks!
> > Leo
> >
> > > Regards,
> > > Boqun
> > >
> > > > correct values, GCC may need to flush specific register values to
> > > > memory before executing the asm. Further, the compiler does not assume
> > > > that any values read from memory before an asm remain unchanged after
> > > > that asm; it reloads them as needed. Using the "memory" clobber
> > > > effectively forms a read/write memory barrier for the compiler.
> > > >
> > > > Note that this clobber does not prevent the processor from doing
> > > > speculative reads past the asm statement. To prevent that, you need
> > > > processor-specific fence instructions.
> > > > ---
> > > >
> > > > IIUC the above text says that having memory accesses to *__ptr would require
> > > > the above asm to have the "memory" clobber, so memory accesses don't get
> > > > reordered by the compiler.
> > > >
> > > > By the above affirmation, all asm in this file should have the "memory"
> > > > clobber, since all atomic operations will change the memory pointed to by an
> > > > input ptr. Is that correct?
> > > >
> > > > Thanks!
> > > > Leo
> > > >
> > > >
> > > > >
> > > > > Regards,
> > > > > Boqun
> > > > >
> > > > > > - break; \
> > > > > > - case 8: \
> > > > > > - __asm__ __volatile__ ( \
> > > > > > - " amoswap.d %0, %2, %1\n" \
> > > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > > - : "r" (__new) \
> > > > > > - : "memory"); \
> > > > > > - break; \
> > > > > > - default: \
> > > > > > - BUILD_BUG(); \
> > > > > > - } \
> > > > > > - __ret; \
> > > > > > -})
> > > > > > -
> > > > > > -#define arch_xchg_relaxed(ptr, x) \
> > > > > > -({ \
> > > > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > > > - (__typeof__(*(ptr))) __xchg_relaxed((ptr), \
> > > > > > - _x_, sizeof(*(ptr))); \
> > > > > > + __asm__ __volatile__ ( \
> > > > > > + prepend \
> > > > > > + " amoswap" sfx " %0, %2, %1\n" \
> > > > > > + append \
> > > > > > + : "=r" (r), "+A" (*(p)) \
> > > > > > + : "r" (n) \
> > > > > > + : "memory"); \
> > > > > > })
> > > > > >
> > > > > > -#define __xchg_acquire(ptr, new, size) \
> > > > > > +#define _arch_xchg(ptr, new, sfx, prepend, append) \
> > > > > > ({ \
> > > > > > __typeof__(ptr) __ptr = (ptr); \
> > > > > > - __typeof__(new) __new = (new); \
> > > > > > - __typeof__(*(ptr)) __ret; \
> > > > > > - switch (size) { \
> > > > > > + __typeof__(*(__ptr)) __new = (new); \
> > > > > > + __typeof__(*(__ptr)) __ret; \
> > > > > > + switch (sizeof(*__ptr)) { \
> > > > > > case 4: \
> > > > > > - __asm__ __volatile__ ( \
> > > > > > - " amoswap.w %0, %2, %1\n" \
> > > > > > - RISCV_ACQUIRE_BARRIER \
> > > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > > - : "r" (__new) \
> > > > > > - : "memory"); \
> > > > > > + __arch_xchg(".w" sfx, prepend, append, \
> > > > > > + __ret, __ptr, __new); \
> > > > > > break; \
> > > > > > case 8: \
> > > > > > - __asm__ __volatile__ ( \
> > > > > > - " amoswap.d %0, %2, %1\n" \
> > > > > > - RISCV_ACQUIRE_BARRIER \
> > > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > > - : "r" (__new) \
> > > > > > - : "memory"); \
> > > > > > + __arch_xchg(".d" sfx, prepend, append, \
> > > > > > + __ret, __ptr, __new); \
> > > > > > break; \
> > > > > > default: \
> > > > > > BUILD_BUG(); \
> > > > > > } \
> > > > > > - __ret; \
> > > > > > + (__typeof__(*(__ptr)))__ret; \
> > > > > > })
> > > > > >
> > > > > > -#define arch_xchg_acquire(ptr, x) \
> > > > > > -({ \
> > > > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > > > - (__typeof__(*(ptr))) __xchg_acquire((ptr), \
> > > > > > - _x_, sizeof(*(ptr))); \
> > > > > > -})
> > > > > > +#define arch_xchg_relaxed(ptr, x) \
> > > > > > + _arch_xchg(ptr, x, "", "", "")
> > > > > >
> > > > > > -#define __xchg_release(ptr, new, size) \
> > > > > > -({ \
> > > > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > > > - __typeof__(new) __new = (new); \
> > > > > > - __typeof__(*(ptr)) __ret; \
> > > > > > - switch (size) { \
> > > > > > - case 4: \
> > > > > > - __asm__ __volatile__ ( \
> > > > > > - RISCV_RELEASE_BARRIER \
> > > > > > - " amoswap.w %0, %2, %1\n" \
> > > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > > - : "r" (__new) \
> > > > > > - : "memory"); \
> > > > > > - break; \
> > > > > > - case 8: \
> > > > > > - __asm__ __volatile__ ( \
> > > > > > - RISCV_RELEASE_BARRIER \
> > > > > > - " amoswap.d %0, %2, %1\n" \
> > > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > > - : "r" (__new) \
> > > > > > - : "memory"); \
> > > > > > - break; \
> > > > > > - default: \
> > > > > > - BUILD_BUG(); \
> > > > > > - } \
> > > > > > - __ret; \
> > > > > > -})
> > > > > > +#define arch_xchg_acquire(ptr, x) \
> > > > > > + _arch_xchg(ptr, x, "", "", RISCV_ACQUIRE_BARRIER)
> > > > > >
> > > > > > #define arch_xchg_release(ptr, x) \
> > > > > > -({ \
> > > > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > > > - (__typeof__(*(ptr))) __xchg_release((ptr), \
> > > > > > - _x_, sizeof(*(ptr))); \
> > > > > > -})
> > > > > > -
> > > > > > -#define __arch_xchg(ptr, new, size) \
> > > > > > -({ \
> > > > > > - __typeof__(ptr) __ptr = (ptr); \
> > > > > > - __typeof__(new) __new = (new); \
> > > > > > - __typeof__(*(ptr)) __ret; \
> > > > > > - switch (size) { \
> > > > > > - case 4: \
> > > > > > - __asm__ __volatile__ ( \
> > > > > > - " amoswap.w.aqrl %0, %2, %1\n" \
> > > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > > - : "r" (__new) \
> > > > > > - : "memory"); \
> > > > > > - break; \
> > > > > > - case 8: \
> > > > > > - __asm__ __volatile__ ( \
> > > > > > - " amoswap.d.aqrl %0, %2, %1\n" \
> > > > > > - : "=r" (__ret), "+A" (*__ptr) \
> > > > > > - : "r" (__new) \
> > > > > > - : "memory"); \
> > > > > > - break; \
> > > > > > - default: \
> > > > > > - BUILD_BUG(); \
> > > > > > - } \
> > > > > > - __ret; \
> > > > > > -})
> > > > > > + _arch_xchg(ptr, x, "", RISCV_RELEASE_BARRIER, "")
> > > > > >
> > > > > > #define arch_xchg(ptr, x) \
> > > > > > -({ \
> > > > > > - __typeof__(*(ptr)) _x_ = (x); \
> > > > > > - (__typeof__(*(ptr))) __arch_xchg((ptr), _x_, sizeof(*(ptr))); \
> > > > > > -})
> > > > > > + _arch_xchg(ptr, x, ".aqrl", "", "")
> > > > > >
> > > > > > #define xchg32(ptr, x) \
> > > > > > ({ \
> > > > > > --
> > > > > > 2.43.0
> > > > > >
> > > > >
> > > >
> > >
> >
>


2024-01-13 06:54:49

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

Hi Leonardo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 610a9b8f49fbcf1100716370d3b5f6f884a2835a]

url: https://github.com/intel-lab-lkp/linux/commits/Leonardo-Bras/riscv-cmpxchg-Deduplicate-xchg-asm-functions/20240104-003501
base: 610a9b8f49fbcf1100716370d3b5f6f884a2835a
patch link: https://lore.kernel.org/r/20240103163203.72768-3-leobras%40redhat.com
patch subject: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions
config: riscv-randconfig-r111-20240112 (https://download.01.org/0day-ci/archive/20240113/[email protected]/config)
compiler: clang version 18.0.0git (https://github.com/llvm/llvm-project 9bde5becb44ea071f5e1fa1f5d4071dc8788b18c)
reproduce: (https://download.01.org/0day-ci/archive/20240113/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

sparse warnings: (new ones prefixed by >>)
>> net/ipv4/tcp_cong.c:300:24: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct tcp_congestion_ops const [noderef] __rcu *__new @@ got struct tcp_congestion_ops *[assigned] ca @@
net/ipv4/tcp_cong.c:300:24: sparse: expected struct tcp_congestion_ops const [noderef] __rcu *__new
net/ipv4/tcp_cong.c:300:24: sparse: got struct tcp_congestion_ops *[assigned] ca
net/ipv4/tcp_cong.c:300:22: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct tcp_congestion_ops const *prev @@ got struct tcp_congestion_ops const [noderef] __rcu * @@
net/ipv4/tcp_cong.c:300:22: sparse: expected struct tcp_congestion_ops const *prev
net/ipv4/tcp_cong.c:300:22: sparse: got struct tcp_congestion_ops const [noderef] __rcu *
net/ipv4/tcp_cong.c: note: in included file (through include/linux/module.h):
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true

vim +300 net/ipv4/tcp_cong.c

317a76f9a44b43 Stephen Hemminger 2005-06-23 281
317a76f9a44b43 Stephen Hemminger 2005-06-23 282 /* Used by sysctl to change default congestion control */
6670e152447732 Stephen Hemminger 2017-11-14 283 int tcp_set_default_congestion_control(struct net *net, const char *name)
317a76f9a44b43 Stephen Hemminger 2005-06-23 284 {
317a76f9a44b43 Stephen Hemminger 2005-06-23 285 struct tcp_congestion_ops *ca;
6670e152447732 Stephen Hemminger 2017-11-14 286 const struct tcp_congestion_ops *prev;
6670e152447732 Stephen Hemminger 2017-11-14 287 int ret;
317a76f9a44b43 Stephen Hemminger 2005-06-23 288
6670e152447732 Stephen Hemminger 2017-11-14 289 rcu_read_lock();
6670e152447732 Stephen Hemminger 2017-11-14 290 ca = tcp_ca_find_autoload(net, name);
6670e152447732 Stephen Hemminger 2017-11-14 291 if (!ca) {
6670e152447732 Stephen Hemminger 2017-11-14 292 ret = -ENOENT;
0baf26b0fcd74b Martin KaFai Lau 2020-01-08 293 } else if (!bpf_try_module_get(ca, ca->owner)) {
6670e152447732 Stephen Hemminger 2017-11-14 294 ret = -EBUSY;
8d432592f30fcc Jonathon Reinhart 2021-05-01 295 } else if (!net_eq(net, &init_net) &&
8d432592f30fcc Jonathon Reinhart 2021-05-01 296 !(ca->flags & TCP_CONG_NON_RESTRICTED)) {
8d432592f30fcc Jonathon Reinhart 2021-05-01 297 /* Only init netns can set default to a restricted algorithm */
8d432592f30fcc Jonathon Reinhart 2021-05-01 298 ret = -EPERM;
6670e152447732 Stephen Hemminger 2017-11-14 299 } else {
6670e152447732 Stephen Hemminger 2017-11-14 @300 prev = xchg(&net->ipv4.tcp_congestion_control, ca);
6670e152447732 Stephen Hemminger 2017-11-14 301 if (prev)
0baf26b0fcd74b Martin KaFai Lau 2020-01-08 302 bpf_module_put(prev, prev->owner);
317a76f9a44b43 Stephen Hemminger 2005-06-23 303
6670e152447732 Stephen Hemminger 2017-11-14 304 ca->flags |= TCP_CONG_NON_RESTRICTED;
317a76f9a44b43 Stephen Hemminger 2005-06-23 305 ret = 0;
317a76f9a44b43 Stephen Hemminger 2005-06-23 306 }
6670e152447732 Stephen Hemminger 2017-11-14 307 rcu_read_unlock();
317a76f9a44b43 Stephen Hemminger 2005-06-23 308
317a76f9a44b43 Stephen Hemminger 2005-06-23 309 return ret;
317a76f9a44b43 Stephen Hemminger 2005-06-23 310 }
317a76f9a44b43 Stephen Hemminger 2005-06-23 311

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-01-16 19:27:37

by Leonardo Bras

[permalink] [raw]
Subject: Re: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions

On Sat, Jan 13, 2024 at 02:54:17PM +0800, kernel test robot wrote:
> Hi Leonardo,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on 610a9b8f49fbcf1100716370d3b5f6f884a2835a]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Leonardo-Bras/riscv-cmpxchg-Deduplicate-xchg-asm-functions/20240104-003501

Cloned this repo

> base: 610a9b8f49fbcf1100716370d3b5f6f884a2835a
> patch link: https://lore.kernel.org/r/20240103163203.72768-3-leobras%40redhat.com
> patch subject: [PATCH v1 1/5] riscv/cmpxchg: Deduplicate xchg() asm functions
> config: riscv-randconfig-r111-20240112 (https://download.01.org/0day-ci/archive/20240113/[email protected]/config)
> compiler: clang version 18.0.0git (https://github.com/llvm/llvm-project 9bde5becb44ea071f5e1fa1f5d4071dc8788b18c)
> reproduce: (https://download.01.org/0day-ci/archive/20240113/[email protected]/reproduce)

And followed those instructions, while using sparse v0.6.4-52-g1cf3d98c.

>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
>
> sparse warnings: (new ones prefixed by >>)
> >> net/ipv4/tcp_cong.c:300:24: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct tcp_congestion_ops const [noderef] __rcu *__new @@ got struct tcp_congestion_ops *[assigned] ca @@
> net/ipv4/tcp_cong.c:300:24: sparse: expected struct tcp_congestion_ops const [noderef] __rcu *__new
> net/ipv4/tcp_cong.c:300:24: sparse: got struct tcp_congestion_ops *[assigned] ca
> net/ipv4/tcp_cong.c:300:22: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct tcp_congestion_ops const *prev @@ got struct tcp_congestion_ops const [noderef] __rcu * @@
> net/ipv4/tcp_cong.c:300:22: sparse: expected struct tcp_congestion_ops const *prev
> net/ipv4/tcp_cong.c:300:22: sparse: got struct tcp_congestion_ops const [noderef] __rcu *
> net/ipv4/tcp_cong.c: note: in included file (through include/linux/module.h):
> include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
> include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
>
> vim +300 net/ipv4/tcp_cong.c
>
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 281
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 282 /* Used by sysctl to change default congestion control */
> 6670e152447732 Stephen Hemminger 2017-11-14 283 int tcp_set_default_congestion_control(struct net *net, const char *name)
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 284 {
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 285 struct tcp_congestion_ops *ca;
> 6670e152447732 Stephen Hemminger 2017-11-14 286 const struct tcp_congestion_ops *prev;
> 6670e152447732 Stephen Hemminger 2017-11-14 287 int ret;
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 288
> 6670e152447732 Stephen Hemminger 2017-11-14 289 rcu_read_lock();
> 6670e152447732 Stephen Hemminger 2017-11-14 290 ca = tcp_ca_find_autoload(net, name);
> 6670e152447732 Stephen Hemminger 2017-11-14 291 if (!ca) {
> 6670e152447732 Stephen Hemminger 2017-11-14 292 ret = -ENOENT;
> 0baf26b0fcd74b Martin KaFai Lau 2020-01-08 293 } else if (!bpf_try_module_get(ca, ca->owner)) {
> 6670e152447732 Stephen Hemminger 2017-11-14 294 ret = -EBUSY;
> 8d432592f30fcc Jonathon Reinhart 2021-05-01 295 } else if (!net_eq(net, &init_net) &&
> 8d432592f30fcc Jonathon Reinhart 2021-05-01 296 !(ca->flags & TCP_CONG_NON_RESTRICTED)) {
> 8d432592f30fcc Jonathon Reinhart 2021-05-01 297 /* Only init netns can set default to a restricted algorithm */
> 8d432592f30fcc Jonathon Reinhart 2021-05-01 298 ret = -EPERM;
> 6670e152447732 Stephen Hemminger 2017-11-14 299 } else {
> 6670e152447732 Stephen Hemminger 2017-11-14 @300 prev = xchg(&net->ipv4.tcp_congestion_control, ca);
> 6670e152447732 Stephen Hemminger 2017-11-14 301 if (prev)
> 0baf26b0fcd74b Martin KaFai Lau 2020-01-08 302 bpf_module_put(prev, prev->owner);
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 303
> 6670e152447732 Stephen Hemminger 2017-11-14 304 ca->flags |= TCP_CONG_NON_RESTRICTED;
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 305 ret = 0;
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 306 }
> 6670e152447732 Stephen Hemminger 2017-11-14 307 rcu_read_unlock();
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 308
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 309 return ret;
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 310 }
> 317a76f9a44b43 Stephen Hemminger 2005-06-23 311
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>

I did some testing using the instructions above on the above file, and patch
1/5 hasn't introduced anything new.


Command for gathering sparse warnings:
COMPILER_INSTALL_PATH=$HOME/0day ~/lkp-tests/kbuild/make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__ -fmax-errors=unlimited -fmax-warnings=unlimited' O=build_dir ARCH=riscv SHELL=/bin/bash net/ipv4/tcp_cong.o 2> sparse

I ran this for the commit mentioned in the reproduction instructions
(7931dc023: riscv/cmpxchg: Deduplicate xchg() asm functions) and for its
parent (610a9b8f49: Linux 6.7-rc8). The diff -u on the output was:

# diff -u sparse_vanilla sparse_p1_5
--- sparse_vanilla 2024-01-16 14:16:36.217965076 -0500
+++ sparse_p1_5 2024-01-16 14:15:29.942712160 -0500
@@ -1,5 +1,5 @@
../net/ipv4/tcp_cong.c:300:24: sparse: warning: incorrect type in initializer (different address spaces)
-../net/ipv4/tcp_cong.c:300:24: sparse: expected struct tcp_congestion_ops const [noderef] __rcu *_x_
+../net/ipv4/tcp_cong.c:300:24: sparse: expected struct tcp_congestion_ops const [noderef] __rcu *__new
../net/ipv4/tcp_cong.c:300:24: sparse: got struct tcp_congestion_ops *[assigned] ca
../net/ipv4/tcp_cong.c:300:22: sparse: warning: incorrect type in assignment (different address spaces)
../net/ipv4/tcp_cong.c:300:22: sparse: expected struct tcp_congestion_ops const *prev

So I did not introduce anything new, as per sparse v0.6.4-52-g1cf3d98c.

I noticed the output is slightly different, and that the reproduction
steps used:
# sparse version: v0.6.4-52-g1cf3d98c-dirty

Since there is no indication of what the -dirty stands for, it's hard for me
to get the exact same reproduction, but as far as I could test there are no
new errors.

Thanks!
Leo



Subject: Re: [PATCH v1 0/5] Rework & improve riscv cmpxchg.h and atomic.h

Hello:

This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <[email protected]>:

On Wed, 3 Jan 2024 13:31:58 -0300 you wrote:
> While studying riscv's cmpxchg.h file, I got really interested in
> understanding how RISCV asm implemented the different versions of
> {cmp,}xchg.
>
> When I understood the pattern, it made sense for me to remove the
> duplications and create macros to make it easier to understand what exactly
> changes between the versions: Instruction suffixes & barriers.
>
> [...]

Here is the summary with links:
- [v1,1/5] riscv/cmpxchg: Deduplicate xchg() asm functions
https://git.kernel.org/riscv/c/4bfa185fe3f0
- [v1,2/5] riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
https://git.kernel.org/riscv/c/07a0a41cb77d
- [v1,3/5] riscv/atomic.h : Deduplicate arch_atomic.*
https://git.kernel.org/riscv/c/906123739272
- [v1,4/5] riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
https://git.kernel.org/riscv/c/54280ca64626
- [v1,5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2
https://git.kernel.org/riscv/c/a8ed2b7a2c13

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html