2023-03-05 20:56:56

by Uros Bizjak

Subject: [PATCH 00/10] locking: Introduce local{,64}_try_cmpxchg

Add generic and target specific support for local{,64}_try_cmpxchg
and wire up support for all targets that use local_t infrastructure.
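
For reference, a minimal sketch of the new primitive's shape and
semantics (this mirrors the generic fallback and the powerpc
implementation in patch 06/10):

static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
{
	long o = *old;
	long r = local_cmpxchg(l, o, new);

	if (unlikely(r != o))
		*old = r;	/* on failure, report the observed value */
	return likely(r == o);	/* true iff the exchange happened */
}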

The series enables x86 targets to emit a special instruction for
local_try_cmpxchg, and also for local64_try_cmpxchg on x86_64.

The last patch changes __perf_output_begin in events/ring_buffer
to use the new locking primitive, improving the generated code from

4b3: 48 8b 82 e8 00 00 00 mov 0xe8(%rdx),%rax
4ba: 48 8b b8 08 04 00 00 mov 0x408(%rax),%rdi
4c1: 8b 42 1c mov 0x1c(%rdx),%eax
4c4: 48 8b 4a 28 mov 0x28(%rdx),%rcx
4c8: 85 c0 test %eax,%eax
...
4ef: 48 89 c8 mov %rcx,%rax
4f2: 48 0f b1 7a 28 cmpxchg %rdi,0x28(%rdx)
4f7: 48 39 c1 cmp %rax,%rcx
4fa: 75 b7 jne 4b3 <...>

to

4b2: 48 8b 4a 28 mov 0x28(%rdx),%rcx
4b6: 48 8b 82 e8 00 00 00 mov 0xe8(%rdx),%rax
4bd: 48 8b b0 08 04 00 00 mov 0x408(%rax),%rsi
4c4: 8b 42 1c mov 0x1c(%rdx),%eax
4c7: 85 c0 test %eax,%eax
...
4d4: 48 89 c8 mov %rcx,%rax
4d7: 48 0f b1 72 28 cmpxchg %rsi,0x28(%rdx)
4dc: 0f 85 d0 00 00 00 jne 5b2 <...>
...
5b2: 48 89 c1 mov %rax,%rcx
5b5: e9 fc fe ff ff jmp 4b6 <...>

Please note that in addition to the removed compare, the load from
0x28(%rdx) is moved out of the loop and the code is rearranged
according to the likely/unlikely annotations in the source.
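
At the source level, the transformation behind the two dumps above
is, condensed from patch 10/10 (elisions marked with ...):

	/* before: compare the cmpxchg result with the expected value */
	do {
		tail = READ_ONCE(rb->user_page->data_tail);
		offset = head = local_read(&rb->head);
		...
	} while (local_cmpxchg(&rb->head, offset, head) != offset);

	/* after: CMPXCHG sets ZF on success, and on failure the observed
	 * value is written back to "offset", so the load of rb->head can
	 * be hoisted out of the loop */
	offset = local_read(&rb->head);
	do {
		head = offset;
		tail = READ_ONCE(rb->user_page->data_tail);
		...
	} while (!local_try_cmpxchg(&rb->head, &offset, head));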

Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Jun Yi <[email protected]>

Uros Bizjak (10):
locking/atomic: Add missing cast to try_cmpxchg() fallbacks
locking/atomic: Add generic try_cmpxchg{,64}_local support
locking/alpha: Wire up local_try_cmpxchg
locking/loongarch: Wire up local_try_cmpxchg
locking/mips: Wire up local_try_cmpxchg
locking/powerpc: Wire up local_try_cmpxchg
locking/x86: Wire up local_try_cmpxchg
locking/generic: Wire up local{,64}_try_cmpxchg
locking/x86: Enable local{,64}_try_cmpxchg
perf/ring_buffer: use local_try_cmpxchg in __perf_output_begin

arch/alpha/include/asm/local.h | 2 ++
arch/loongarch/include/asm/local.h | 2 ++
arch/mips/include/asm/local.h | 2 ++
arch/powerpc/include/asm/local.h | 11 ++++++
arch/x86/include/asm/cmpxchg.h | 6 ++++
arch/x86/include/asm/local.h | 2 ++
include/asm-generic/local.h | 1 +
include/asm-generic/local64.h | 2 ++
include/linux/atomic/atomic-arch-fallback.h | 40 ++++++++++++++++-----
include/linux/atomic/atomic-instrumented.h | 20 ++++++++++-
kernel/events/ring_buffer.c | 5 +--
scripts/atomic/gen-atomic-fallback.sh | 6 +++-
scripts/atomic/gen-atomic-instrumented.sh | 2 +-
13 files changed, 87 insertions(+), 14 deletions(-)

--
2.39.2



2023-03-05 20:57:05

by Uros Bizjak

Subject: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
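
A condensed sketch of the situation this fixes (the types are taken
from patch 10/10 of this series; the variable names are hypothetical):

	long counter;		/* what _ptr points to */
	unsigned long old;	/* what _oldp points to */

	typeof(counter) *___op = &old;			/* warning: incompatible pointer types */
	typeof(counter) *___op2 = (typeof(&counter))&old;	/* OK with the added cast */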

Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Mark Rutland <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
include/linux/atomic/atomic-arch-fallback.h | 18 +++++++++---------
scripts/atomic/gen-atomic-fallback.sh | 2 +-
2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 77bc5522e61c..19debd501ee7 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -87,7 +87,7 @@
#ifndef arch_try_cmpxchg
#define arch_try_cmpxchg(_ptr, _oldp, _new) \
({ \
- typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -98,7 +98,7 @@
#ifndef arch_try_cmpxchg_acquire
#define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
({ \
- typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -109,7 +109,7 @@
#ifndef arch_try_cmpxchg_release
#define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
({ \
- typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -120,7 +120,7 @@
#ifndef arch_try_cmpxchg_relaxed
#define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
({ \
- typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -157,7 +157,7 @@
#ifndef arch_try_cmpxchg64
#define arch_try_cmpxchg64(_ptr, _oldp, _new) \
({ \
- typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -168,7 +168,7 @@
#ifndef arch_try_cmpxchg64_acquire
#define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \
({ \
- typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -179,7 +179,7 @@
#ifndef arch_try_cmpxchg64_release
#define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \
({ \
- typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -190,7 +190,7 @@
#ifndef arch_try_cmpxchg64_relaxed
#define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \
({ \
- typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -2456,4 +2456,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// b5e87bdd5ede61470c29f7a7e4de781af3770f09
+// 1b4d4c82ae653389cd1538d5b07170267d9b3837
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695e3c89..39f447161108 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -171,7 +171,7 @@ cat <<EOF
#ifndef arch_try_${cmpxchg}${order}
#define arch_try_${cmpxchg}${order}(_ptr, _oldp, _new) \\
({ \\
- typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \\
___r = arch_${cmpxchg}${order}((_ptr), ___o, (_new)); \\
if (unlikely(___r != ___o)) \\
*___op = ___r; \\
--
2.39.2


2023-03-05 20:57:08

by Uros Bizjak

Subject: [PATCH 02/10] locking/atomic: Add generic try_cmpxchg{,64}_local support

Add generic support for try_cmpxchg{,64}_local and their fallbacks.
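
A minimal usage sketch (the caller is hypothetical; on failure the
primitive updates "old" with the observed value, so no re-read is
needed):

static void inc_counter(unsigned long *p)
{
	unsigned long old = READ_ONCE(*p);

	while (!try_cmpxchg_local(p, &old, old + 1))
		;
}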

Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Mark Rutland <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++++-
include/linux/atomic/atomic-instrumented.h | 20 ++++++++++++++++-
scripts/atomic/gen-atomic-fallback.sh | 4 ++++
scripts/atomic/gen-atomic-instrumented.sh | 2 +-
4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 19debd501ee7..a7d2e5f4e548 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@

#endif /* arch_try_cmpxchg64_relaxed */

+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
#ifndef arch_atomic_read_acquire
static __always_inline int
arch_atomic_read_acquire(const atomic_t *v)
@@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 1b4d4c82ae653389cd1538d5b07170267d9b3837
+// 9bb8cca3d4cbc000e7068eb7cb4481cb3e48c45a
diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index 7a139ec030b0..14a9212cc987 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
})

+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
#define cmpxchg_double(ptr, ...) \
({ \
typeof(ptr) __ai_ptr = (ptr); \
@@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
})

#endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
+// 456e206c7e4e681126c482e4edcc6f46921ac731
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 39f447161108..26829a75b644 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
gen_try_cmpxchg_fallbacks "${cmpxchg}"
done

+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+ gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "atomic" "int" ${args}
done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 77c06526a574..c8165e9431bf 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
done
done

-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
gen_xchg "${xchg}" "" ""
printf "\n"
done
--
2.39.2


2023-03-05 20:57:11

by Uros Bizjak

Subject: [PATCH 03/10] locking/alpha: Wire up local_try_cmpxchg

Implement target-specific support for local_try_cmpxchg.

Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
arch/alpha/include/asm/local.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1c93d5..7eef027e0dde 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -54,6 +54,8 @@ static __inline__ long local_sub_return(long i, local_t * l)

#define local_cmpxchg(l, o, n) \
(cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+ (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
#define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))

/**
--
2.39.2


2023-03-05 20:57:15

by Uros Bizjak

Subject: [PATCH 04/10] locking/loongarch: Wire up local_try_cmpxchg

Implement target-specific support for local_try_cmpxchg.

Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Jun Yi <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
arch/loongarch/include/asm/local.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/loongarch/include/asm/local.h b/arch/loongarch/include/asm/local.h
index 65fbbae9fc4d..dff6bcbe4821 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -58,6 +58,8 @@ static inline long local_sub_return(long i, local_t *l)

#define local_cmpxchg(l, o, n) \
((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+ (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
--
2.39.2


2023-03-05 20:57:18

by Uros Bizjak

Subject: [PATCH 06/10] locking/powerpc: Wire up local_try_cmpxchg

Implement target-specific support for local_try_cmpxchg.

Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
arch/powerpc/include/asm/local.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19b7fc2..45492fb5bf22 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
return t;
}

+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+ long o = *po, r;
+
+ r = local_cmpxchg(l, o, n);
+ if (unlikely(r != o))
+ *po = r;
+
+ return likely(r == o);
+}
+
static __inline__ long local_xchg(local_t *l, long n)
{
long t;
--
2.39.2


2023-03-05 20:57:22

by Uros Bizjak

Subject: [PATCH 05/10] locking/mips: Wire up local_try_cmpxchg

Implement target-specific support for local_try_cmpxchg.

Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
arch/mips/include/asm/local.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1fd273..b611361a760b 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -96,6 +96,8 @@ static __inline__ long local_sub_return(long i, local_t * l)

#define local_cmpxchg(l, o, n) \
((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+ (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
--
2.39.2


2023-03-05 20:57:25

by Uros Bizjak

Subject: [PATCH 07/10] locking/x86: Wire up local_try_cmpxchg

Implement target-specific support for local_try_cmpxchg.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
arch/x86/include/asm/local.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47acaa4a..4bcc72906524 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -122,6 +122,8 @@ static inline long local_sub_return(long i, local_t *l)

#define local_cmpxchg(l, o, n) \
(cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+ (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
/* Always has a lock prefix */
#define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))

--
2.39.2


2023-03-05 20:57:40

by Uros Bizjak

Subject: [PATCH 08/10] locking/generic: Wire up local{,64}_try_cmpxchg

Implement generic support for local{,64}_try_cmpxchg.

Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
include/asm-generic/local.h | 1 +
include/asm-generic/local64.h | 2 ++
2 files changed, 3 insertions(+)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d84818..7f97018df66f 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
#define local_inc_return(l) atomic_long_inc_return(&(l)->a)

#define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
#define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
#define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
#define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b7d883..54b91e93ae76 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -43,6 +43,7 @@ typedef struct {
#define local64_inc_return(l) local_inc_return(&(l)->a)

#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) local_try_cmpxchg((&(l)->a), (po), (n))
#define local64_xchg(l, n) local_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) local_inc_not_zero(&(l)->a)
@@ -81,6 +82,7 @@ typedef struct {
#define local64_inc_return(l) atomic64_inc_return(&(l)->a)

#define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
#define local64_xchg(l, n) atomic64_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) atomic64_inc_not_zero(&(l)->a)
--
2.39.2


2023-03-05 20:58:00

by Uros Bizjak

Subject: [PATCH 09/10] locking/x86: Enable local{,64}_try_cmpxchg

Enable local_try_cmpxchg and also local64_try_cmpxchg for x86_64.
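
Condensed, the only difference from the SMP-safe variant is the lock
prefix passed down to __raw_try_cmpxchg() (sketch of the two entry
points; see the diff below for the actual definitions):

#define arch_try_cmpxchg(ptr, pold, new) \
	__raw_try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)), LOCK_PREFIX)
#define arch_try_cmpxchg_local(ptr, pold, new) \
	__raw_try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)), "")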

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
arch/x86/include/asm/cmpxchg.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6ae7431..e8f939e8432e 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
#define __try_cmpxchg(ptr, pold, new, size) \
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)

+#define __try_cmpxchg_local(ptr, pold, new, size) \
+ __raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
#define arch_try_cmpxchg(ptr, pold, new) \
__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))

+#define arch_try_cmpxchg_local(ptr, pold, new) \
+ __try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
/*
* xadd() adds "inc" to "*ptr" and atomically returns the previous
* value of "*ptr".
--
2.39.2


2023-03-05 20:58:10

by Uros Bizjak

Subject: [PATCH 10/10] perf/ring_buffer: use local_try_cmpxchg in __perf_output_begin

Use local_try_cmpxchg instead of the local_cmpxchg(*ptr, old, new) == old
pattern in __perf_output_begin. The x86 CMPXCHG instruction returns
success in the ZF flag, so this change saves a compare after cmpxchg.

Also, local_try_cmpxchg implicitly assigns the old *ptr value to "old"
when cmpxchg fails, so there is no need to re-read the value in the loop.

No functional change intended.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Ian Rogers <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
kernel/events/ring_buffer.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 273a0fe7910a..e07c10f4d141 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -191,9 +191,10 @@ __perf_output_begin(struct perf_output_handle *handle,

perf_output_get_handle(handle);

+ offset = local_read(&rb->head);
do {
+ head = offset;
tail = READ_ONCE(rb->user_page->data_tail);
- offset = head = local_read(&rb->head);
if (!rb->overwrite) {
if (unlikely(!ring_buffer_has_space(head, tail,
perf_data_size(rb),
@@ -217,7 +218,7 @@ __perf_output_begin(struct perf_output_handle *handle,
head += size;
else
head -= size;
- } while (local_cmpxchg(&rb->head, offset, head) != offset);
+ } while (!local_try_cmpxchg(&rb->head, &offset, head));

if (backward) {
offset = head;
--
2.39.2


2023-03-24 14:17:44

by Mark Rutland

Subject: Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.

Can you give an example of where we are passing an incompatible pointer?

That sounds indicative of a bug in the caller, but maybe I'm missing some
reason this is necessary due to some indirection.

> Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")

I'm not sure that this needs a fixes tag. Does anything go wrong today, or only
later in this series?

Thanks,
Mark.

> Cc: Will Deacon <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Signed-off-by: Uros Bizjak <[email protected]>
> ---
> include/linux/atomic/atomic-arch-fallback.h | 18 +++++++++---------
> scripts/atomic/gen-atomic-fallback.sh | 2 +-
> 2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
> index 77bc5522e61c..19debd501ee7 100644
> --- a/include/linux/atomic/atomic-arch-fallback.h
> +++ b/include/linux/atomic/atomic-arch-fallback.h
> @@ -87,7 +87,7 @@
> #ifndef arch_try_cmpxchg
> #define arch_try_cmpxchg(_ptr, _oldp, _new) \
> ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
> ___r = arch_cmpxchg((_ptr), ___o, (_new)); \
> if (unlikely(___r != ___o)) \
> *___op = ___r; \
> @@ -98,7 +98,7 @@
> #ifndef arch_try_cmpxchg_acquire
> #define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
> ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
> ___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
> if (unlikely(___r != ___o)) \
> *___op = ___r; \
> @@ -109,7 +109,7 @@
> #ifndef arch_try_cmpxchg_release
> #define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
> ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
> ___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
> if (unlikely(___r != ___o)) \
> *___op = ___r; \
> @@ -120,7 +120,7 @@
> #ifndef arch_try_cmpxchg_relaxed
> #define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
> ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
> ___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
> if (unlikely(___r != ___o)) \
> *___op = ___r; \
> @@ -157,7 +157,7 @@
> #ifndef arch_try_cmpxchg64
> #define arch_try_cmpxchg64(_ptr, _oldp, _new) \
> ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
> ___r = arch_cmpxchg64((_ptr), ___o, (_new)); \
> if (unlikely(___r != ___o)) \
> *___op = ___r; \
> @@ -168,7 +168,7 @@
> #ifndef arch_try_cmpxchg64_acquire
> #define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \
> ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
> ___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \
> if (unlikely(___r != ___o)) \
> *___op = ___r; \
> @@ -179,7 +179,7 @@
> #ifndef arch_try_cmpxchg64_release
> #define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \
> ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
> ___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \
> if (unlikely(___r != ___o)) \
> *___op = ___r; \
> @@ -190,7 +190,7 @@
> #ifndef arch_try_cmpxchg64_relaxed
> #define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \
> ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
> ___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \
> if (unlikely(___r != ___o)) \
> *___op = ___r; \
> @@ -2456,4 +2456,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
> #endif
>
> #endif /* _LINUX_ATOMIC_FALLBACK_H */
> -// b5e87bdd5ede61470c29f7a7e4de781af3770f09
> +// 1b4d4c82ae653389cd1538d5b07170267d9b3837
> diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
> index 3a07695e3c89..39f447161108 100755
> --- a/scripts/atomic/gen-atomic-fallback.sh
> +++ b/scripts/atomic/gen-atomic-fallback.sh
> @@ -171,7 +171,7 @@ cat <<EOF
> #ifndef arch_try_${cmpxchg}${order}
> #define arch_try_${cmpxchg}${order}(_ptr, _oldp, _new) \\
> ({ \\
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \\
> ___r = arch_${cmpxchg}${order}((_ptr), ___o, (_new)); \\
> if (unlikely(___r != ___o)) \\
> *___op = ___r; \\
> --
> 2.39.2
>

2023-03-24 15:46:33

by Uros Bizjak

Subject: Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland <[email protected]> wrote:
>
> On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
>
> Can you give an example of where we are passing an incompatible pointer?

An example is patch 10/10 from the series, which will fail without
this fix when fallback code is used. We have:

- } while (local_cmpxchg(&rb->head, offset, head) != offset);
+ } while (!local_try_cmpxchg(&rb->head, &offset, head));

where rb->head is defined as:

typedef struct {
atomic_long_t a;
} local_t;

while offset is defined as 'unsigned long'.

The assignment in the existing try_cmpxchg template:

typeof(*(_ptr)) *___op = (_oldp)

will trigger an initialization from an incompatible pointer type error.

Please note that x86 avoids this issue by a cast in its
target-dependent definition:

#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock) \
({ \
bool success; \
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
__typeof__(*(_ptr)) __old = *_old; \
__typeof__(*(_ptr)) __new = (_new); \

so, the warning/error will trigger only in the fallback code.

> That sounds indicative of a bug in the caller, but maybe I'm missing some
> reason this is necessary due to some indirection.
>
> > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
>
> I'm not sure that this needs a fixes tag. Does anything go wrong today, or only
> later in this series?

The patch at [1] triggered a build error in posix_acl.c/__get.acl due
to the same problem. The compilation for the x86 target was OK, because
x86 defines target-specific arch_try_cmpxchg, but the compilation
broke for targets that fall back to the generic support. Please note that
this specific problem was recently fixed in a different way [2], but
the issue with the fallback remains.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/

Uros.

2023-03-24 16:24:45

by Mark Rutland

Subject: Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland <[email protected]> wrote:
> >
> > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
> >
> > Can you give an example of where we are passing an incompatible pointer?
>
> An example is patch 10/10 from the series, which will fail without
> this fix when fallback code is used. We have:
>
> - } while (local_cmpxchg(&rb->head, offset, head) != offset);
> + } while (!local_try_cmpxchg(&rb->head, &offset, head));
>
> where rb->head is defined as:
>
> typedef struct {
> atomic_long_t a;
> } local_t;
>
> while offset is defined as 'unsigned long'.

Ok, but that's because we're doing the wrong thing to start with.

Since local_t is defined in terms of atomic_long_t, we should define the
generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
have a mismatch between 'long *' and 'unsigned long *', but then we can fix
that in the callsite:

while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))

... which then won't silently mask issues elsewhere, and will be consistent
with all the other atomic APIs.

Thanks,
Mark.

>
> The assignment in the existing try_cmpxchg template:
>
> typeof(*(_ptr)) *___op = (_oldp)
>
> will trigger an initialization from an incompatible pointer type error.
>
> Please note that x86 avoids this issue by a cast in its
> target-dependent definition:
>
> #define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock) \
> ({ \
> bool success; \
> __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
> __typeof__(*(_ptr)) __old = *_old; \
> __typeof__(*(_ptr)) __new = (_new); \
>
> so, the warning/error will trigger only in the fallback code.
>
> > That sounds indicative of a bug in the caller, but maybe I'm missing some
> > reason this is necessary due to some indirection.
> >
> > > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
> >
> > I'm not sure that this needs a fixes tag. Does anything go wrong today, or only
> > later in this series?
>
> The patch at [1] triggered a build error in posix_acl.c/__get.acl due
> to the same problem. The compilation for the x86 target was OK, because
> x86 defines target-specific arch_try_cmpxchg, but the compilation
> broke for targets that fall back to the generic support. Please note that
> this specific problem was recently fixed in a different way [2], but
> the issue with the fallback remains.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> Uros.

2023-03-24 16:38:13

by Mark Rutland

Subject: Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

On Fri, Mar 24, 2023 at 04:14:22PM +0000, Mark Rutland wrote:
> On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland <[email protected]> wrote:
> > >
> > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
> > >
> > > Can you give an example of where we are passing an incompatible pointer?
> >
> > An example is patch 10/10 from the series, which will fail without
> > this fix when fallback code is used. We have:
> >
> > - } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > + } while (!local_try_cmpxchg(&rb->head, &offset, head));
> >
> > where rb->head is defined as:
> >
> > typedef struct {
> > atomic_long_t a;
> > } local_t;
> >
> > while offset is defined as 'unsigned long'.
>
> Ok, but that's because we're doing the wrong thing to start with.
>
> Since local_t is defined in terms of atomic_long_t, we should define the
> generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> that in the callsite:
>
> while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))

Sorry, that should be:

while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))

The fundamental thing I'm trying to say is that the
atomic/atomic64/atomic_long/local/local64 APIs should be type-safe, and for
their try_cmpxchg() implementations, the type signature should be:

${atomictype}_try_cmpxchg(${atomictype} *ptr, ${inttype} *old, ${inttype} new)
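
Concretely (a sketch; the local64 signature matches the wrappers
discussed later in this thread), that gives e.g.:

bool local_try_cmpxchg(local_t *l, long *old, long new);
bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new);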

Thanks,
Mark.

2023-03-26 20:05:00

by Uros Bizjak

Subject: Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland <[email protected]> wrote:
>
> On Fri, Mar 24, 2023 at 04:14:22PM +0000, Mark Rutland wrote:
> > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland <[email protected]> wrote:
> > > >
> > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
> > > >
> > > > Can you give an example of where we are passing an incompatible pointer?
> > >
> > > An example is patch 10/10 from the series, which will fail without
> > > this fix when fallback code is used. We have:
> > >
> > > - } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > + } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > >
> > > where rb->head is defined as:
> > >
> > > typedef struct {
> > > atomic_long_t a;
> > > } local_t;
> > >
> > > while offset is defined as 'unsigned long'.
> >
> > Ok, but that's because we're doing the wrong thing to start with.
> >
> > Since local_t is defined in terms of atomic_long_t, we should define the
> > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> > have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> > that in the callsite:
> >
> > while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
>
> Sorry, that should be:
>
> while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))

The fallbacks are a bit more complicated than above, and are different
from atomic_try_cmpxchg.

Please note that in patch 2/10, the fallbacks that apply when
arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Also in
patch 2/10, try_cmpxchg_local is introduced, which calls
arch_try_cmpxchg_local. Targets (and generic code) simply define,
e.g.:

#define local_cmpxchg(l, o, n) \
(cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+ (try_cmpxchg_local(&((l)->a.counter), (po), (n)))

which is part of the local_t API. Targets should either define all of
these #defines, or none. Unlike atomic_t, there are no partial
fallbacks.
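
Sketched, the resulting chain of definitions from patch 2/10 is:

local_try_cmpxchg(l, po, n)                     /* local_t API */
  -> try_cmpxchg_local(&(l)->a.counter, po, n)  /* instrumented wrapper */
     -> arch_try_cmpxchg_local(ptr, po, n)      /* arch definition or fallback */
        -> arch_cmpxchg_local(ptr, *po, n)      /* used by the fallback */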

The core of the local_t API is in the local.h header. If the target
doesn't define its own local.h header, then asm-generic/local.h is
used, which does exactly what you propose above regarding the usage of
atomic functions.

OTOH, when the target defines its own local.h, then the above
target-dependent #define path applies. The target should define its
own arch_try_cmpxchg_local, otherwise a "generic" target-dependent
fallback that calls the target's arch_cmpxchg_local applies. In the
case of x86, patch 9/10 enables the new instruction by defining
arch_try_cmpxchg_local.

FYI, the patch sequence is carefully chosen so that x86 also exercises
fallback code between different patches in the series.

Targets are free to define local_t to whatever they like, but for some
reason they all define it to:

typedef struct {
atomic_long_t a;
} local_t;

so they have to dig the variable out of the struct like:

#define local_cmpxchg(l, o, n) \
(cmpxchg_local(&((l)->a.counter), (o), (n)))

Regarding the mismatch of 'long *' vs 'unsigned long *': the x86
target-specific code does the following for try_cmpxchg:

#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock) \
({ \
bool success; \
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
__typeof__(*(_ptr)) __old = *_old; \
__typeof__(*(_ptr)) __new = (_new); \

so, it *does* cast the "old" pointer to the type of "ptr". The generic
code does *not*. This difference is dangerous: some code involving
try_cmpxchg will compile OK for x86 but will break for other targets
that use the try_cmpxchg fallback templates (I was the unlucky one who
tripped over this in the past). Please note that this problem is not
specific to the proposed local_try_cmpxchg series, but affects the
existing try_cmpxchg API.

Also, I don't think that "fixing" callsites is the right thing to do.
The generic code should follow x86 and cast the "old" pointer to the
type of "ptr" inside the fallback.

> The fundamental thing I'm trying to say is that the
> atomic/atomic64/atomic_long/local/local64 APIs should be type-safe, and for
> their try_cmpxchg() implementations, the type signature should be:
>
> ${atomictype}_try_cmpxchg(${atomictype} *ptr, ${inttype} *old, ${inttype} new)

This conversion should also be performed for the cmpxchg family of
functions, if desired at all. The try_cmpxchg fallback is just cmpxchg
with some extra code around it.

Thanks,
Uros.

2023-04-03 10:23:15

by Mark Rutland

Subject: Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland <[email protected]> wrote:
> >
> > On Fri, Mar 24, 2023 at 04:14:22PM +0000, Mark Rutland wrote:
> > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland <[email protected]> wrote:
> > > > >
> > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
> > > > >
> > > > > Can you give an example of where we are passing an incompatible pointer?
> > > >
> > > > An example is patch 10/10 from the series, which will fail without
> > > > this fix when fallback code is used. We have:
> > > >
> > > > - } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > + } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > >
> > > > where rb->head is defined as:
> > > >
> > > > typedef struct {
> > > > atomic_long_t a;
> > > > } local_t;
> > > >
> > > > while offset is defined as 'unsigned long'.
> > >
> > > Ok, but that's because we're doing the wrong thing to start with.
> > >
> > > Since local_t is defined in terms of atomic_long_t, we should define the
> > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> > > have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> > > that in the callsite:
> > >
> > > while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> >
> > Sorry, that should be:
> >
> > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
>
> The fallbacks are a bit more complicated than above, and are different
> from atomic_try_cmpxchg.
>
> Please note that in patch 2/10, the fallbacks that apply when
> arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Also in
> patch 2/10, try_cmpxchg_local is introduced, which calls
> arch_try_cmpxchg_local. Targets (and generic code) simply define,
> e.g.:
>
> #define local_cmpxchg(l, o, n) \
> (cmpxchg_local(&((l)->a.counter), (o), (n)))
> +#define local_try_cmpxchg(l, po, n) \
> + (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
>
> which is part of the local_t API. Targets should either define all of
> these #defines, or none. Unlike atomic_t, there are no partial
> fallbacks.

Whether or not there are fallbacks is immaterial.

In those cases, architectures can just as easily write C wrappers, e.g.

long local_cmpxchg(local_t *l, long old, long new)
{
return cmpxchg_local(&l->a.counter, old, new);
}

long local_try_cmpxchg(local_t *l, long *old, long new)
{
return try_cmpxchg_local(&l->a.counter, old, new);
}

> The core of the local_t API is in the local.h header. If the target
> doesn't define its own local.h header, then asm-generic/local.h is
> used, which does exactly what you propose above regarding the usage of
> atomic functions.
>
> OTOH, when the target defines its own local.h, then the above
> target-dependent #define path applies. The target should define its
> own arch_try_cmpxchg_local, otherwise a "generic" target-dependent
> fallback that calls the target's arch_cmpxchg_local applies. In the
> case of x86, patch 9/10 enables the new instruction by defining
> arch_try_cmpxchg_local.
>
> FYI, the patch sequence is carefully chosen so that x86 also exercises
> fallback code between different patches in the series.
>
> Targets are free to define local_t to whatever they like, but for some
> reason they all define it to:
>
> typedef struct {
> atomic_long_t a;
> } local_t;

Yes, which is why I used atomic_long() above.

> so they have to dig the variable out of the struct like:
>
> #define local_cmpxchg(l, o, n) \
> (cmpxchg_local(&((l)->a.counter), (o), (n)))
>
> Regarding the mismatch of 'long *' vs 'unsigned long *': the x86
> target-specific code does the following for try_cmpxchg:
>
> #define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock) \
> ({ \
> bool success; \
> __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
> __typeof__(*(_ptr)) __old = *_old; \
> __typeof__(*(_ptr)) __new = (_new); \
>
> so, it *does* cast the "old" pointer to the type of "ptr". The generic
> code does *not*. This difference is dangerous: some code involving
> try_cmpxchg will compile OK for x86 but will break for other targets
> that use the try_cmpxchg fallback templates (I was the unlucky one who
> tripped over this in the past). Please note that this problem is not
> specific to the proposed local_try_cmpxchg series, but affects the
> existing try_cmpxchg API.

I understand the problem of arch code differing from generic code, and that we
want to have *a* consistent behaviour for the API.

What I'm saying is that the behaviour we should aim for is where the 'old'
pointer has a specific type (long), and we always require that, as we do for
the various atomic_*() APIs of which local_*() is a cousin.

> Also, I don't think that "fixing" callsites is the right thing to do.

Why? What's wrong with doing that?

The documentation in Documentation/core-api/local_ops.rst says:

The ``local_t`` type is defined as an opaque ``signed long``

So the obvious and least surprising thing is for the local_*() functions to use
'long' for values and 'long *' for pointers to values.

Requiring a cast in a few places is not the end of the world.
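
E.g. a sketch of the ring-buffer callsite from patch 10/10 with such a
cast (loop body elided):

	unsigned long offset = local_read(&rb->head);

	do {
		...
	} while (!local_try_cmpxchg(&rb->head, (long *)&offset, head));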

> The generic code should follow x86 and cast the "old" pointer to the
> type of "ptr" inside the fallback.

Why?

I disagree, and think it's far better to be strict by default. That way,
accidental usage of the wrong type will be caught by the compiler, and if
someone *really* wants to use a different type they can use a cast at the
callsite, which makes it really obvious when that is happening.

I appreciate that may require some preparatory cleanup, but I think that's a
small price to pay for having this in a clearer and more maintainable state.

> > The fundamental thing I'm trying to say is that the
> > atomic/atomic64/atomic_long/local/local64 APIs should be type-safe, and for
> > their try_cmpxchg() implementations, the type signature should be:
> >
> > ${atomictype}_try_cmpxchg(${atomictype} *ptr, ${inttype} *old, ${inttype} new)
>
> This conversion should also be performed for the cmpxchg family of
> functions, if desired at all. The try_cmpxchg fallback is just cmpxchg
> with some extra code around it.

FWIW, I agree that we *should* make try_cmpxchg() check that ptr and old
pointer are the same type.

However, I don't think that's a prerequisite for doing so for
local_try_cmpxchg().

Please make local_try_cmpxchg() have a proper type-safe C prototype, as we do
with the atomic*_try_cmpxchg() APIs.

Thanks,
Mark.

2023-04-04 12:26:54

by Uros Bizjak

Subject: Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland <[email protected]> wrote:
>
> On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland <[email protected]> wrote:
> > >
> > > On Fri, Mar 24, 2023 at 04:14:22PM +0000, Mark Rutland wrote:
> > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland <[email protected]> wrote:
> > > > > >
> > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
> > > > > >
> > > > > > Can you give an example of where we are passing an incompatible pointer?
> > > > >
> > > > > An example is patch 10/10 from the series, which will fail without
> > > > > this fix when fallback code is used. We have:
> > > > >
> > > > > - } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > + } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > >
> > > > > where rb->head is defined as:
> > > > >
> > > > > typedef struct {
> > > > > atomic_long_t a;
> > > > > } local_t;
> > > > >
> > > > > while offset is defined as 'unsigned long'.
> > > >
> > > > Ok, but that's because we're doing the wrong thing to start with.
> > > >
> > > > Since local_t is defined in terms of atomic_long_t, we should define the
> > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> > > > have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> > > > that in the callsite:
> > > >
> > > > while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > >
> > > Sorry, that should be:
> > >
> > > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> >
> > The fallbacks are a bit more complicated than above, and are different
> > from atomic_try_cmpxchg.
> >
> > Please note that in patch 2/10, the fallbacks that apply when
> > arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Also in
> > patch 2/10, try_cmpxchg_local is introduced, which calls
> > arch_try_cmpxchg_local. Targets (and generic code) simply define,
> > e.g.:
> >
> > #define local_cmpxchg(l, o, n) \
> > (cmpxchg_local(&((l)->a.counter), (o), (n)))
> > +#define local_try_cmpxchg(l, po, n) \
> > + (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> >
> > which is part of the local_t API. Targets should either define all of
> > these #defines, or none. Unlike atomic_t, there are no partial
> > fallbacks.
>
> Whether or not there are fallbacks is immaterial.
>
> In those cases, architectures can just as easily write C wrappers, e.g.
>
> long local_cmpxchg(local_t *l, long old, long new)
> {
> return cmpxchg_local(&l->a.counter, old, new);
> }
>
> long local_try_cmpxchg(local_t *l, long *old, long new)
> {
> return try_cmpxchg_local(&l->a.counter, old, new);
> }

Please find attached the complete prototype patch that implements the
above suggestion.

The patch includes:
- implementation of instrumented try_cmpxchg{,64}_local definitions
- corresponding arch_try_cmpxchg{,64}_local fallback definitions
- generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers

- x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
- x86 specific arch_try_cmpxchg_local definition

- kernel/events/ring_buffer.c change to test local_try_cmpxchg
implementation and illustrate the transition
- arch/x86/events/core.c change to test local64_try_cmpxchg
implementation and illustrate the transition

The definition of atomic_long_t is different for 64-bit and 32-bit
targets (s64 vs int), so target-specific C wrappers have to use
different casts to account for this difference.

Uros.


Attachments:
try_cmpxchg_local-04042023.diff.txt (8.69 kB)

2023-04-04 13:33:23

by Mark Rutland

Subject: Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

On Tue, Apr 04, 2023 at 02:24:38PM +0200, Uros Bizjak wrote:
> On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland <[email protected]> wrote:
> >
> > On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 24, 2023 at 04:14:22PM +0000, Mark Rutland wrote:
> > > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland <[email protected]> wrote:
> > > > > > >
> > > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
> > > > > > >
> > > > > > > Can you give an example of where we are passing an incompatible pointer?
> > > > > >
> > > > > > An example is patch 10/10 from the series, which will fail without
> > > > > > this fix when fallback code is used. We have:
> > > > > >
> > > > > > - } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > > + } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > > >
> > > > > > where rb->head is defined as:
> > > > > >
> > > > > > typedef struct {
> > > > > > atomic_long_t a;
> > > > > > } local_t;
> > > > > >
> > > > > > while offset is defined as 'unsigned long'.
> > > > >
> > > > > Ok, but that's because we're doing the wrong thing to start with.
> > > > >
> > > > > Since local_t is defined in terms of atomic_long_t, we should define the
> > > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> > > > > have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> > > > > that in the callsite:
> > > > >
> > > > > while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > > >
> > > > Sorry, that should be:
> > > >
> > > > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> > >
> > > The fallbacks are a bit more complicated than above, and are different
> > > from atomic_try_cmpxchg.
> > >
> > > Please note that in patch 2/10, the fallbacks that apply when
> > > arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Also in
> > > patch 2/10, try_cmpxchg_local is introduced, which calls
> > > arch_try_cmpxchg_local. Targets (and generic code) simply define,
> > > e.g.:
> > >
> > > #define local_cmpxchg(l, o, n) \
> > > (cmpxchg_local(&((l)->a.counter), (o), (n)))
> > > +#define local_try_cmpxchg(l, po, n) \
> > > + (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> > >
> > > which is part of the local_t API. Targets should either define all of
> > > these #defines, or none. Unlike atomic_t, there are no partial
> > > fallbacks.
> >
> > Whether or not there are fallbacks is immaterial.
> >
> > In those cases, architectures can just as easily write C wrappers, e.g.
> >
> > long local_cmpxchg(local_t *l, long old, long new)
> > {
> > return cmpxchg_local(&l->a.counter, old, new);
> > }
> >
> > long local_try_cmpxchg(local_t *l, long *old, long new)
> > {
> > return try_cmpxchg_local(&l->a.counter, old, new);
> > }
>
> Please find attached the complete prototype patch that implements the
> above suggestion.
>
> The patch includes:
> - implementation of instrumented try_cmpxchg{,64}_local definitions
> - corresponding arch_try_cmpxchg{,64}_local fallback definitions
> - generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers
>
> - x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
> - x86 specific arch_try_cmpxchg_local definition
>
> - kernel/events/ring_buffer.c change to test local_try_cmpxchg
> implementation and illustrate the transition
> - arch/x86/events/core.c change to test local64_try_cmpxchg
> implementation and illustrate the transition
>
> The definition of atomic_long_t is different for 64-bit and 32-bit
> targets (s64 vs int), so target-specific C wrappers have to use
> different casts to account for this difference.
>
> Uros.

Thanks for this!

FWIW, the patch (inline below) looks good to me.

Mark.

> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index d096b04bf80e..d9310e9363f1 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -129,13 +129,12 @@ u64 x86_perf_event_update(struct perf_event *event)
> * exchange a new raw count - then add that new-prev delta
> * count to the generic event atomically:
> */
> -again:
> prev_raw_count = local64_read(&hwc->prev_count);
> - rdpmcl(hwc->event_base_rdpmc, new_raw_count);
>
> - if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> - new_raw_count) != prev_raw_count)
> - goto again;
> + do {
> + rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> + } while (!local64_try_cmpxchg(&hwc->prev_count, &prev_raw_count,
> + new_raw_count));
>
> /*
> * Now we have the new raw value and have updated the prev
> diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
> index 94fbe6ae7431..540573f515b7 100644
> --- a/arch/x86/include/asm/cmpxchg.h
> +++ b/arch/x86/include/asm/cmpxchg.h
> @@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
> #define __try_cmpxchg(ptr, pold, new, size) \
> __raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)
>
> +#define __try_cmpxchg_local(ptr, pold, new, size) \
> + __raw_try_cmpxchg((ptr), (pold), (new), (size), "")
> +
> #define arch_try_cmpxchg(ptr, pold, new) \
> __try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))
>
> +#define arch_try_cmpxchg_local(ptr, pold, new) \
> + __try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
> +
> /*
> * xadd() adds "inc" to "*ptr" and atomically returns the previous
> * value of "*ptr".
> diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
> index 349a47acaa4a..d286a6c7c0b7 100644
> --- a/arch/x86/include/asm/local.h
> +++ b/arch/x86/include/asm/local.h
> @@ -120,8 +120,20 @@ static inline long local_sub_return(long i, local_t *l)
> #define local_inc_return(l) (local_add_return(1, l))
> #define local_dec_return(l) (local_sub_return(1, l))
>
> -#define local_cmpxchg(l, o, n) \
> - (cmpxchg_local(&((l)->a.counter), (o), (n)))
> +static inline long local_cmpxchg(local_t *l, long old, long new)
> +{
> + return cmpxchg_local(&l->a.counter, old, new);
> +}
> +
> +static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
> +{
> +#ifdef CONFIG_64BIT
> + return try_cmpxchg_local(&l->a.counter, (s64 *)old, new);
> +#else
> + return try_cmpxchg_local(&l->a.counter, (int *)old, new);
> +#endif
> +}
> +
> /* Always has a lock prefix */
> #define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))
>
> diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
> index fca7f1d84818..7f97018df66f 100644
> --- a/include/asm-generic/local.h
> +++ b/include/asm-generic/local.h
> @@ -42,6 +42,7 @@ typedef struct
> #define local_inc_return(l) atomic_long_inc_return(&(l)->a)
>
> #define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
> +#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
> #define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
> #define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
> #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
> diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
> index 765be0b7d883..14963a7a6253 100644
> --- a/include/asm-generic/local64.h
> +++ b/include/asm-generic/local64.h
> @@ -42,7 +42,16 @@ typedef struct {
> #define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
> #define local64_inc_return(l) local_inc_return(&(l)->a)
>
> -#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
> +static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
> +{
> + return local_cmpxchg(&l->a, old, new);
> +}
> +
> +static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
> +{
> + return local_try_cmpxchg(&l->a, (long *)old, new);
> +}
> +
> #define local64_xchg(l, n) local_xchg((&(l)->a), (n))
> #define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
> #define local64_inc_not_zero(l) local_inc_not_zero(&(l)->a)
> @@ -81,6 +90,7 @@ typedef struct {
> #define local64_inc_return(l) atomic64_inc_return(&(l)->a)
>
> #define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
> +#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
> #define local64_xchg(l, n) atomic64_xchg((&(l)->a), (n))
> #define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
> #define local64_inc_not_zero(l) atomic64_inc_not_zero(&(l)->a)
> diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
> index 77bc5522e61c..36c92851cdee 100644
> --- a/include/linux/atomic/atomic-arch-fallback.h
> +++ b/include/linux/atomic/atomic-arch-fallback.h
> @@ -217,6 +217,28 @@
>
> #endif /* arch_try_cmpxchg64_relaxed */
>
> +#ifndef arch_try_cmpxchg_local
> +#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
> +({ \
> + typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
> + if (unlikely(___r != ___o)) \
> + *___op = ___r; \
> + likely(___r == ___o); \
> +})
> +#endif /* arch_try_cmpxchg_local */
> +
> +#ifndef arch_try_cmpxchg64_local
> +#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
> +({ \
> + typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
> + if (unlikely(___r != ___o)) \
> + *___op = ___r; \
> + likely(___r == ___o); \
> +})
> +#endif /* arch_try_cmpxchg64_local */
> +
> #ifndef arch_atomic_read_acquire
> static __always_inline int
> arch_atomic_read_acquire(const atomic_t *v)
> @@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
> #endif
>
> #endif /* _LINUX_ATOMIC_FALLBACK_H */
> -// b5e87bdd5ede61470c29f7a7e4de781af3770f09
> +// 1f49bd4895a4b7a5383906649027205c52ec80ab
> diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
> index 7a139ec030b0..14a9212cc987 100644
> --- a/include/linux/atomic/atomic-instrumented.h
> +++ b/include/linux/atomic/atomic-instrumented.h
> @@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
> arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
> })
>
> +#define try_cmpxchg_local(ptr, oldp, ...) \
> +({ \
> + typeof(ptr) __ai_ptr = (ptr); \
> + typeof(oldp) __ai_oldp = (oldp); \
> + instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> + instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> + arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> +})
> +
> +#define try_cmpxchg64_local(ptr, oldp, ...) \
> +({ \
> + typeof(ptr) __ai_ptr = (ptr); \
> + typeof(oldp) __ai_oldp = (oldp); \
> + instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> + instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> + arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> +})
> +
> #define cmpxchg_double(ptr, ...) \
> ({ \
> typeof(ptr) __ai_ptr = (ptr); \
> @@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
> })
>
> #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
> -// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
> +// 456e206c7e4e681126c482e4edcc6f46921ac731
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 273a0fe7910a..111ab85ee97d 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -191,9 +191,10 @@ __perf_output_begin(struct perf_output_handle *handle,
>
> perf_output_get_handle(handle);
>
> + offset = local_read(&rb->head);
> do {
> tail = READ_ONCE(rb->user_page->data_tail);
> - offset = head = local_read(&rb->head);
> + head = offset;
> if (!rb->overwrite) {
> if (unlikely(!ring_buffer_has_space(head, tail,
> perf_data_size(rb),
> @@ -217,7 +218,7 @@ __perf_output_begin(struct perf_output_handle *handle,
> head += size;
> else
> head -= size;
> - } while (local_cmpxchg(&rb->head, offset, head) != offset);
> + } while (!local_try_cmpxchg(&rb->head, &offset, head));
>
> if (backward) {
> offset = head;
> diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
> index 3a07695e3c89..6e853f0dad8d 100755
> --- a/scripts/atomic/gen-atomic-fallback.sh
> +++ b/scripts/atomic/gen-atomic-fallback.sh
> @@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
> gen_try_cmpxchg_fallbacks "${cmpxchg}"
> done
>
> +for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
> + gen_try_cmpxchg_fallback "${cmpxchg}" ""
> +done
> +
> grep '^[a-z]' "$1" | while read name meta args; do
> gen_proto "${meta}" "${name}" "atomic" "int" ${args}
> done
> diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
> index 77c06526a574..c8165e9431bf 100755
> --- a/scripts/atomic/gen-atomic-instrumented.sh
> +++ b/scripts/atomic/gen-atomic-instrumented.sh
> @@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
> done
> done
>
> -for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
> +for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
> gen_xchg "${xchg}" "" ""
> printf "\n"
> done

2023-04-04 13:34:04

by Uros Bizjak

[permalink] [raw]
Subject: Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

On Tue, Apr 4, 2023 at 3:19 PM Mark Rutland <[email protected]> wrote:
>
> On Tue, Apr 04, 2023 at 02:24:38PM +0200, Uros Bizjak wrote:
> > On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland <[email protected]> wrote:
> > >
> > > On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > > > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland <[email protected]> wrote:
> > > > >
> > > > > On Fri, Mar 24, 2023 at 04:14:22PM +0000, Mark Rutland wrote:
> > > > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
> > > > > > > >
> > > > > > > > Can you give an example of where we are passing an incompatible pointer?
> > > > > > >
> > > > > > > An example is patch 10/10 from the series, which will fail without
> > > > > > > this fix when fallback code is used. We have:
> > > > > > >
> > > > > > > - } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > > > + } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > > > >
> > > > > > > where rb->head is defined as:
> > > > > > >
> > > > > > > typedef struct {
> > > > > > > atomic_long_t a;
> > > > > > > } local_t;
> > > > > > >
> > > > > > > while offset is defined as 'unsigned long'.
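> > > > > > >
> > > > > > > With the generic fallback, the macro then expands to roughly
> > > > > > > (a sketch of the offending assignment, not the exact
> > > > > > > preprocessor output):
> > > > > > >
> > > > > > > s64 *___op = &offset; /* offset is 'unsigned long' */
> > > > > > >
> > > > > > > which is what trips -Wincompatible-pointer-types without the
> > > > > > > cast.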
> > > > > >
> > > > > > Ok, but that's because we're doing the wrong thing to start with.
> > > > > >
> > > > > > Since local_t is defined in terms of atomic_long_t, we should define the
> > > > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> > > > > > have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> > > > > > that in the callsite:
> > > > > >
> > > > > > while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > > > >
> > > > > Sorry, that should be:
> > > > >
> > > > > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> > > >
> > > > The fallbacks are a bit more complicated than above, and are different
> > > > from atomic_try_cmpxchg.
> > > >
> > > > Please note that in patch 2/10, the fallbacks used when
> > > > arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Also
> > > > in patch 2/10, try_cmpxchg_local is introduced, which calls
> > > > arch_try_cmpxchg_local. Targets (and generic code) simply define,
> > > > e.g.:
> > > >
> > > > #define local_cmpxchg(l, o, n) \
> > > > (cmpxchg_local(&((l)->a.counter), (o), (n)))
> > > > +#define local_try_cmpxchg(l, po, n) \
> > > > + (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> > > >
> > > > which is part of the local_t API. Targets should either define all
> > > > of these #defines or none of them; there are no partial fallbacks,
> > > > unlike the case with atomic_t.
> > >
> > > Whether or not there are fallbacks is immaterial.
> > >
> > > In those cases, architectures can just as easily write C wrappers, e.g.
> > >
> > > long local_cmpxchg(local_t *l, long old, long new)
> > > {
> > > return cmpxchg_local(&l->a.counter, old, new);
> > > }
> > >
> > > bool local_try_cmpxchg(local_t *l, long *old, long new)
> > > {
> > > return try_cmpxchg_local(&l->a.counter, old, new);
> > > }
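> > >
> > > That way the casts needed for the 32-bit vs 64-bit difference stay
> > > inside the wrappers rather than leaking into every caller.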
> >
> > Please find attached the complete prototype patch that implements the
> > above suggestion.
> >
> > The patch includes:
> > - implementation of instrumented try_cmpxchg{,64}_local definitions
> > - corresponding arch_try_cmpxchg{,64}_local fallback definitions
> > - generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers
> >
> > - x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
> > - x86 specific arch_try_cmpxchg_local definition
> >
> > - kernel/events/ring_buffer.c change to test local_try_cmpxchg
> > implementation and illustrate the transition
> > - arch/x86/events/core.c change to test local64_try_cmpxchg
> > implementation and illustrate the transition
> >
> > The definition of atomic_long_t is different for 64-bit and 32-bit
> > targets (s64 vs int), so target-specific C wrappers have to use
> > different casts to account for this difference; long has the same
> > width as s64 on 64-bit targets and as int on 32-bit targets, so the
> > casts are valid in both cases.
> >
> > Uros.
>
> Thanks for this!
>
> FWIW, the patch (inline below) looks good to me.

Thanks, I will prepare a patch series for submission later today.

Uros.