2023-04-05 14:22:06

by Uros Bizjak

Subject: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

Add generic and target specific support for local{,64}_try_cmpxchg
and wire up support for all targets that use local_t infrastructure.

The patch enables x86 targets to emit a special instruction for
local_try_cmpxchg (and also local64_try_cmpxchg on x86_64).

The last patch changes __perf_output_begin in events/ring_buffer
to use the new locking primitive and improves the generated code from
4b3: 48 8b 82 e8 00 00 00 mov 0xe8(%rdx),%rax
4ba: 48 8b b8 08 04 00 00 mov 0x408(%rax),%rdi
4c1: 8b 42 1c mov 0x1c(%rdx),%eax
4c4: 48 8b 4a 28 mov 0x28(%rdx),%rcx
4c8: 85 c0 test %eax,%eax
...
4ef: 48 89 c8 mov %rcx,%rax
4f2: 48 0f b1 7a 28 cmpxchg %rdi,0x28(%rdx)
4f7: 48 39 c1 cmp %rax,%rcx
4fa: 75 b7 jne 4b3 <...>

to

4b2: 48 8b 4a 28 mov 0x28(%rdx),%rcx
4b6: 48 8b 82 e8 00 00 00 mov 0xe8(%rdx),%rax
4bd: 48 8b b0 08 04 00 00 mov 0x408(%rax),%rsi
4c4: 8b 42 1c mov 0x1c(%rdx),%eax
4c7: 85 c0 test %eax,%eax
...
4d4: 48 89 c8 mov %rcx,%rax
4d7: 48 0f b1 72 28 cmpxchg %rsi,0x28(%rdx)
4dc: 0f 85 d0 00 00 00 jne 5b2 <...>
...
5b2: 48 89 c1 mov %rax,%rcx
5b5: e9 fc fe ff ff jmp 4b6 <...>

Please note that, in addition to the removed compare, the load from
0x28(%rdx) is moved out of the loop and the code is rearranged
according to the likely/unlikely annotations in the source.
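
At the source level this is the usual conversion from a cmpxchg loop to a
try_cmpxchg loop. A simplified sketch of the __perf_output_begin change
(variable names follow kernel/events/ring_buffer.c; the exact diff is in
patch 5):

	/* before: the result of local_cmpxchg() is compared by hand */
	do {
		offset = head = local_read(&rb->head);
		/* ... compute the new head ... */
	} while (local_cmpxchg(&rb->head, offset, head) != offset);

	/* after: local_try_cmpxchg() returns a bool and, on failure,
	   refreshes 'offset' with the value the cmpxchg itself read */
	offset = local_read(&rb->head);
	do {
		head = offset;
		/* ... compute the new head ... */
	} while (!local_try_cmpxchg(&rb->head, &offset, head));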
---
v2:

Implement target-specific support for local_try_cmpxchg and
local_cmpxchg using typed C wrappers that call their _local
counterparts and provide additional checking of their input
arguments.

Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Jun Yi <[email protected]>

Uros Bizjak (5):
locking/atomic: Add generic try_cmpxchg{,64}_local support
locking/generic: Wire up local{,64}_try_cmpxchg
locking/arch: Wire up local_try_cmpxchg
locking/x86: Define arch_try_cmpxchg_local
events: Illustrate the transition to local{,64}_try_cmpxchg

arch/alpha/include/asm/local.h | 12 +++++++++--
arch/loongarch/include/asm/local.h | 13 +++++++++--
arch/mips/include/asm/local.h | 13 +++++++++--
arch/powerpc/include/asm/local.h | 11 ++++++++++
arch/x86/events/core.c | 9 ++++----
arch/x86/include/asm/cmpxchg.h | 6 ++++++
arch/x86/include/asm/local.h | 13 +++++++++--
include/asm-generic/local.h | 1 +
include/asm-generic/local64.h | 12 ++++++++++-
include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++++-
include/linux/atomic/atomic-instrumented.h | 20 ++++++++++++++++-
kernel/events/ring_buffer.c | 5 +++--
scripts/atomic/gen-atomic-fallback.sh | 4 ++++
scripts/atomic/gen-atomic-instrumented.sh | 2 +-
14 files changed, 126 insertions(+), 19 deletions(-)

--
2.39.2


2023-04-05 14:22:15

by Uros Bizjak

Subject: [PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support

Add generic support for try_cmpxchg{,64}_local and their fallbacks.

These provide the generic try_cmpxchg_local family of functions
from the arch_ prefixed versions, also adding explicit instrumentation.
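
For reference, the new helpers follow the existing try_cmpxchg() calling
convention: they return a bool and, on failure, write the value observed
in memory back through the old-value pointer. A minimal usage sketch
(names are made up for illustration):

	static unsigned long counter;	/* hypothetical, only updated locally */
	unsigned long old, new;

	old = READ_ONCE(counter);
	do {
		new = old + 1;
	} while (!try_cmpxchg_local(&counter, &old, new));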

Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Mark Rutland <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++++-
include/linux/atomic/atomic-instrumented.h | 20 ++++++++++++++++-
scripts/atomic/gen-atomic-fallback.sh | 4 ++++
scripts/atomic/gen-atomic-instrumented.sh | 2 +-
4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 77bc5522e61c..36c92851cdee 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@

#endif /* arch_try_cmpxchg64_relaxed */

+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
#ifndef arch_atomic_read_acquire
static __always_inline int
arch_atomic_read_acquire(const atomic_t *v)
@@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// b5e87bdd5ede61470c29f7a7e4de781af3770f09
+// 1f49bd4895a4b7a5383906649027205c52ec80ab
diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index 7a139ec030b0..14a9212cc987 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
})

+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
#define cmpxchg_double(ptr, ...) \
({ \
typeof(ptr) __ai_ptr = (ptr); \
@@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
})

#endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
+// 456e206c7e4e681126c482e4edcc6f46921ac731
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695e3c89..6e853f0dad8d 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
gen_try_cmpxchg_fallbacks "${cmpxchg}"
done

+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+ gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "atomic" "int" ${args}
done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 77c06526a574..c8165e9431bf 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
done
done

-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
gen_xchg "${xchg}" "" ""
printf "\n"
done
--
2.39.2

2023-04-05 14:22:20

by Uros Bizjak

Subject: [PATCH v2 2/5] locking/generic: Wire up local{,64}_try_cmpxchg

Implement generic support for local{,64}_try_cmpxchg.

Redirect to the atomic_ family of functions when the target
does not provide its own local.h definitions.

For 64-bit targets, implement local64_try_cmpxchg and
local64_cmpxchg using typed C wrappers that call the local_
family of functions and provide additional checking of
their input arguments.
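
One effect of the typed wrappers is that mismatched arguments are now
diagnosed at compile time, for example (illustrative code, not part of
the patch):

	local64_t l;
	u32 old32 = 0;

	local64_try_cmpxchg(&l, &old32, 1);	/* warning: expected 's64 *' */

whereas the previous macro-based definitions silently accepted an "old"
argument of a different type.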

Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
include/asm-generic/local.h | 1 +
include/asm-generic/local64.h | 12 +++++++++++-
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d84818..7f97018df66f 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
#define local_inc_return(l) atomic_long_inc_return(&(l)->a)

#define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
#define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
#define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
#define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b7d883..14963a7a6253 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -42,7 +42,16 @@ typedef struct {
#define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
#define local64_inc_return(l) local_inc_return(&(l)->a)

-#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
+{
+ return local_cmpxchg(&l->a, old, new);
+}
+
+static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
+{
+ return local_try_cmpxchg(&l->a, (long *)old, new);
+}
+
#define local64_xchg(l, n) local_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) local_inc_not_zero(&(l)->a)
@@ -81,6 +90,7 @@ typedef struct {
#define local64_inc_return(l) atomic64_inc_return(&(l)->a)

#define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
#define local64_xchg(l, n) atomic64_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) atomic64_inc_not_zero(&(l)->a)
--
2.39.2

2023-04-05 14:22:22

by Uros Bizjak

Subject: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg

Implement target-specific support for local_try_cmpxchg
and local_cmpxchg using typed C wrappers that call their
_local counterparts and provide additional checking of
their input arguments.

Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Jun Yi <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
arch/alpha/include/asm/local.h | 12 ++++++++++--
arch/loongarch/include/asm/local.h | 13 +++++++++++--
arch/mips/include/asm/local.h | 13 +++++++++++--
arch/powerpc/include/asm/local.h | 11 +++++++++++
arch/x86/include/asm/local.h | 13 +++++++++++--
5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1c93d5..0fcaad642cc3 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -52,8 +52,16 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ return try_cmpxchg_local(&l->a.counter, (s64 *)old, new);
+}
+
#define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))

/**
diff --git a/arch/loongarch/include/asm/local.h b/arch/loongarch/include/asm/local.h
index 65fbbae9fc4d..83e995b30e47 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -56,8 +56,17 @@ static inline long local_sub_return(long i, local_t *l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1fd273..5daf6fe8e3e9 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -94,8 +94,17 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19b7fc2..45492fb5bf22 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
return t;
}

+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+ long o = *po, r;
+
+ r = local_cmpxchg(l, o, n);
+ if (unlikely(r != o))
+ *po = r;
+
+ return likely(r == o);
+}
+
static __inline__ long local_xchg(local_t *l, long n)
{
long t;
diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47acaa4a..56d4ef604b91 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -120,8 +120,17 @@ static inline long local_sub_return(long i, local_t *l)
#define local_inc_return(l) (local_add_return(1, l))
#define local_dec_return(l) (local_sub_return(1, l))

-#define local_cmpxchg(l, o, n) \
- (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
/* Always has a lock prefix */
#define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))

--
2.39.2

2023-04-05 14:23:03

by Uros Bizjak

Subject: [PATCH v2 5/5] events: Illustrate the transition to local{,64}_try_cmpxchg

This patch illustrates the transition to local{,64}_try_cmpxchg.
It is not intended to be merged as-is.

Signed-off-by: Uros Bizjak <[email protected]>
---
arch/x86/events/core.c | 9 ++++-----
kernel/events/ring_buffer.c | 5 +++--
2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d096b04bf80e..d9310e9363f1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -129,13 +129,12 @@ u64 x86_perf_event_update(struct perf_event *event)
* exchange a new raw count - then add that new-prev delta
* count to the generic event atomically:
*/
-again:
prev_raw_count = local64_read(&hwc->prev_count);
- rdpmcl(hwc->event_base_rdpmc, new_raw_count);

- if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
- new_raw_count) != prev_raw_count)
- goto again;
+ do {
+ rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+ } while (!local64_try_cmpxchg(&hwc->prev_count, &prev_raw_count,
+ new_raw_count));

/*
* Now we have the new raw value and have updated the prev
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 273a0fe7910a..111ab85ee97d 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -191,9 +191,10 @@ __perf_output_begin(struct perf_output_handle *handle,

perf_output_get_handle(handle);

+ offset = local_read(&rb->head);
do {
tail = READ_ONCE(rb->user_page->data_tail);
- offset = head = local_read(&rb->head);
+ head = offset;
if (!rb->overwrite) {
if (unlikely(!ring_buffer_has_space(head, tail,
perf_data_size(rb),
@@ -217,7 +218,7 @@ __perf_output_begin(struct perf_output_handle *handle,
head += size;
else
head -= size;
- } while (local_cmpxchg(&rb->head, offset, head) != offset);
+ } while (!local_try_cmpxchg(&rb->head, &offset, head));

if (backward) {
offset = head;
--
2.39.2

2023-04-05 14:23:20

by Uros Bizjak

Subject: [PATCH v2 4/5] locking/x86: Define arch_try_cmpxchg_local

Define target-specific arch_try_cmpxchg_local. This
definition overrides the generic arch_try_cmpxchg_local
fallback definition and enables a target-specific
implementation of try_cmpxchg_local.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
arch/x86/include/asm/cmpxchg.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6ae7431..540573f515b7 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
#define __try_cmpxchg(ptr, pold, new, size) \
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)

+#define __try_cmpxchg_local(ptr, pold, new, size) \
+ __raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
#define arch_try_cmpxchg(ptr, pold, new) \
__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))

+#define arch_try_cmpxchg_local(ptr, pold, new) \
+ __try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
/*
* xadd() adds "inc" to "*ptr" and atomically returns the previous
* value of "*ptr".
--
2.39.2

2023-04-05 16:46:03

by Dave Hansen

Subject: Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

On 4/5/23 07:17, Uros Bizjak wrote:
> Add generic and target specific support for local{,64}_try_cmpxchg
> and wire up support for all targets that use local_t infrastructure.

I feel like I'm missing some context.

What are the actual end user visible effects of this series? Is there a
measurable decrease in perf overhead? Why go to all this trouble for
perf? Who else will use local_try_cmpxchg()?

I'm all for improving things, and perf is an important user. But, if
the goal here is improving performance, it would be nice to see at least
a stab at quantifying the performance delta.

2023-04-05 19:09:05

by Uros Bizjak

Subject: Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

On Wed, Apr 5, 2023 at 6:37 PM Dave Hansen <[email protected]> wrote:
>
> On 4/5/23 07:17, Uros Bizjak wrote:
> > Add generic and target specific support for local{,64}_try_cmpxchg
> > and wire up support for all targets that use local_t infrastructure.
>
> I feel like I'm missing some context.
>
> What are the actual end user visible effects of this series? Is there a
> measurable decrease in perf overhead? Why go to all this trouble for
> perf? Who else will use local_try_cmpxchg()?

This functionality was requested by perf people [1], so perhaps Steven
can give us some concrete examples. In general, apart from the removal
of the unneeded compare instruction on x86, using try_cmpxchg also
results in slightly better code on non-x86 targets [2], since the code
now correctly identifies the fast path through the cmpxchg loop.

Also important is that try_cmpxchg code reuses the result of the
cmpxchg instruction in the loop, so a read from memory inside the loop
is eliminated. When reviewing the cmpxchg usage sites, I found numerous
places where an unnecessary read from memory was present in the loop;
two examples can be seen in the last patch of this series.

Also, using try_cmpxchg prevents inconsistencies in the cmpxchg loop,
where the result of the cmpxchg is compared against the wrong "old"
value - one such bug is still lurking in the x86 APIC code, please see [3].
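
A sketch of the kind of mistake meant here (hypothetical code, not the
actual APIC site):

	prev = local_read(&v);
	do {
		old = local_read(&v);
		new = func(old);
	} while (local_cmpxchg(&v, old, new) != prev);	/* compares against the wrong "old" */

With local_try_cmpxchg() the old value is passed by pointer and the
comparison happens inside the primitive, so there is no second variable
to mix up.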

Please note that apart from the perf subsystem, the events subsystem can
also be improved by using local_try_cmpxchg. This is the reason that the
last patch includes a change in events/core.c.

> I'm all for improving things, and perf is an important user. But, if
> the goal here is improving performance, it would be nice to see at least
> a stab at quantifying the performance delta.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://lore.kernel.org/lkml/[email protected]/

Uros.

2023-04-06 08:28:13

by David Laight

Subject: RE: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

From: Dave Hansen
> Sent: 05 April 2023 17:37
>
> On 4/5/23 07:17, Uros Bizjak wrote:
> > Add generic and target specific support for local{,64}_try_cmpxchg
> > and wire up support for all targets that use local_t infrastructure.
>
> I feel like I'm missing some context.
>
> What are the actual end user visible effects of this series? Is there a
> measurable decrease in perf overhead? Why go to all this trouble for
> perf? Who else will use local_try_cmpxchg()?

I'm assuming the local_xxx operations only have to be safe wrt interrupts?
On x86 it is possible that an alternate instruction sequence
that doesn't use a locked instruction may actually be faster!

Although, maybe, any kind of locked cmpxchg just needs to ensure
the cache line isn't 'stolen', so apart from possible slight
delays on another cpu that gets a cache miss for the line, it
all makes little difference.
The cache line miss costs a lot anyway, line bouncing more so,
and is best avoided.
So is there actually much of a benefit at all?

Clearly the try_cmpxchg helps - but that is a different issue.

David


2023-04-06 08:42:13

by Uros Bizjak

Subject: Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

On Thu, Apr 6, 2023 at 10:26 AM David Laight <[email protected]> wrote:
>
> From: Dave Hansen
> > Sent: 05 April 2023 17:37
> >
> > On 4/5/23 07:17, Uros Bizjak wrote:
> > > Add generic and target specific support for local{,64}_try_cmpxchg
> > > and wire up support for all targets that use local_t infrastructure.
> >
> > I feel like I'm missing some context.
> >
> > What are the actual end user visible effects of this series? Is there a
> > measurable decrease in perf overhead? Why go to all this trouble for
> > perf? Who else will use local_try_cmpxchg()?
>
> I'm assuming the local_xxx operations only have to be save wrt interrupts?
> On x86 it is possible that an alternate instruction sequence
> that doesn't use a locked instruction may actually be faster!

Please note that "local" functions do not use lock prefix. Only atomic
properties of cmpxchg instruction are exploited since it only needs to
be safe wrt interrupts.
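
For context, local_t is intended for per-cpu data that is only ever
modified by its owning CPU, so the update needs to be atomic against
interrupts on that CPU but not against other CPUs. A minimal
illustrative use, with made-up names:

	#include <linux/percpu.h>
	#include <asm/local.h>

	static DEFINE_PER_CPU(local_t, pkt_count);	/* hypothetical counter */

	/* safe against interrupts on this CPU; no LOCK prefix needed */
	local_inc(&get_cpu_var(pkt_count));
	put_cpu_var(pkt_count);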

Uros.

> Although, maybe, any kind of locked cmpxchg just needs to ensure
> the cache line isn't 'stolen', so apart from possible slight
> delays on another cpu that gets a cache miss for the line in
> all makes little difference.
> The cache line miss costs a lot anyway, line bouncing more
> and is best avoided.
> So is there actually much of a benefit at all?
>
> Clearly the try_cmpxchg help - but that is a different issue.
>
> David
>

2023-04-06 09:09:33

by David Laight

Subject: RE: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

From: Uros Bizjak
> Sent: 06 April 2023 09:39
>
> On Thu, Apr 6, 2023 at 10:26 AM David Laight <[email protected]> wrote:
> >
> > From: Dave Hansen
> > > Sent: 05 April 2023 17:37
> > >
> > > On 4/5/23 07:17, Uros Bizjak wrote:
> > > > Add generic and target specific support for local{,64}_try_cmpxchg
> > > > and wire up support for all targets that use local_t infrastructure.
> > >
> > > I feel like I'm missing some context.
> > >
> > > What are the actual end user visible effects of this series? Is there a
> > > measurable decrease in perf overhead? Why go to all this trouble for
> > > perf? Who else will use local_try_cmpxchg()?
> >
> > I'm assuming the local_xxx operations only have to be save wrt interrupts?
> > On x86 it is possible that an alternate instruction sequence
> > that doesn't use a locked instruction may actually be faster!
>
> Please note that "local" functions do not use lock prefix. Only atomic
> properties of cmpxchg instruction are exploited since it only needs to
> be safe wrt interrupts.

Gah, I was assuming that LOCK was implied - like it is for xchg
and all the bit instructions.

In any case I suspect it makes little difference unless the
locked variant affects the instruction pipeline.
In fact, you may want to stop the cacheline being invalidated
between the read and write in order to avoid an extra cache
line bounce.

David


2023-04-11 11:14:04

by Mark Rutland

Subject: Re: [PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support

On Wed, Apr 05, 2023 at 04:17:06PM +0200, Uros Bizjak wrote:
> Add generic support for try_cmpxchg{,64}_local and their falbacks.
>
> These provides the generic try_cmpxchg_local family of functions
> from the arch_ prefixed version, also adding explicit instrumentation.
>
> Cc: Will Deacon <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Signed-off-by: Uros Bizjak <[email protected]>

Acked-by: Mark Rutland <[email protected]>

Mark.

> ---
> include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++++-
> include/linux/atomic/atomic-instrumented.h | 20 ++++++++++++++++-
> scripts/atomic/gen-atomic-fallback.sh | 4 ++++
> scripts/atomic/gen-atomic-instrumented.sh | 2 +-
> 4 files changed, 47 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
> index 77bc5522e61c..36c92851cdee 100644
> --- a/include/linux/atomic/atomic-arch-fallback.h
> +++ b/include/linux/atomic/atomic-arch-fallback.h
> @@ -217,6 +217,28 @@
>
> #endif /* arch_try_cmpxchg64_relaxed */
>
> +#ifndef arch_try_cmpxchg_local
> +#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
> +({ \
> + typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
> + if (unlikely(___r != ___o)) \
> + *___op = ___r; \
> + likely(___r == ___o); \
> +})
> +#endif /* arch_try_cmpxchg_local */
> +
> +#ifndef arch_try_cmpxchg64_local
> +#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
> +({ \
> + typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
> + if (unlikely(___r != ___o)) \
> + *___op = ___r; \
> + likely(___r == ___o); \
> +})
> +#endif /* arch_try_cmpxchg64_local */
> +
> #ifndef arch_atomic_read_acquire
> static __always_inline int
> arch_atomic_read_acquire(const atomic_t *v)
> @@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
> #endif
>
> #endif /* _LINUX_ATOMIC_FALLBACK_H */
> -// b5e87bdd5ede61470c29f7a7e4de781af3770f09
> +// 1f49bd4895a4b7a5383906649027205c52ec80ab
> diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
> index 7a139ec030b0..14a9212cc987 100644
> --- a/include/linux/atomic/atomic-instrumented.h
> +++ b/include/linux/atomic/atomic-instrumented.h
> @@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
> arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
> })
>
> +#define try_cmpxchg_local(ptr, oldp, ...) \
> +({ \
> + typeof(ptr) __ai_ptr = (ptr); \
> + typeof(oldp) __ai_oldp = (oldp); \
> + instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> + instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> + arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> +})
> +
> +#define try_cmpxchg64_local(ptr, oldp, ...) \
> +({ \
> + typeof(ptr) __ai_ptr = (ptr); \
> + typeof(oldp) __ai_oldp = (oldp); \
> + instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
> + instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
> + arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
> +})
> +
> #define cmpxchg_double(ptr, ...) \
> ({ \
> typeof(ptr) __ai_ptr = (ptr); \
> @@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
> })
>
> #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
> -// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
> +// 456e206c7e4e681126c482e4edcc6f46921ac731
> diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
> index 3a07695e3c89..6e853f0dad8d 100755
> --- a/scripts/atomic/gen-atomic-fallback.sh
> +++ b/scripts/atomic/gen-atomic-fallback.sh
> @@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
> gen_try_cmpxchg_fallbacks "${cmpxchg}"
> done
>
> +for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
> + gen_try_cmpxchg_fallback "${cmpxchg}" ""
> +done
> +
> grep '^[a-z]' "$1" | while read name meta args; do
> gen_proto "${meta}" "${name}" "atomic" "int" ${args}
> done
> diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
> index 77c06526a574..c8165e9431bf 100755
> --- a/scripts/atomic/gen-atomic-instrumented.sh
> +++ b/scripts/atomic/gen-atomic-instrumented.sh
> @@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
> done
> done
>
> -for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
> +for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
> gen_xchg "${xchg}" "" ""
> printf "\n"
> done
> --
> 2.39.2
>

2023-04-11 11:23:24

by Mark Rutland

Subject: Re: [PATCH v2 2/5] locking/generic: Wire up local{,64}_try_cmpxchg

On Wed, Apr 05, 2023 at 04:17:07PM +0200, Uros Bizjak wrote:
> Implement generic support for local{,64}_try_cmpxchg.
>
> Redirect to the atomic_ family of functions when the target
> does not provide its own local.h definitions.
>
> For 64-bit targets, implement local64_try_cmpxchg and
> local64_cmpxchg using typed C wrappers that call local_
> family of functions and provide additional checking
> of their input arguments.
>
> Cc: Arnd Bergmann <[email protected]>
> Signed-off-by: Uros Bizjak <[email protected]>

Acked-by: Mark Rutland <[email protected]>

Mark.

> ---
> include/asm-generic/local.h | 1 +
> include/asm-generic/local64.h | 12 +++++++++++-
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
> index fca7f1d84818..7f97018df66f 100644
> --- a/include/asm-generic/local.h
> +++ b/include/asm-generic/local.h
> @@ -42,6 +42,7 @@ typedef struct
> #define local_inc_return(l) atomic_long_inc_return(&(l)->a)
>
> #define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
> +#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
> #define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
> #define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
> #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
> diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
> index 765be0b7d883..14963a7a6253 100644
> --- a/include/asm-generic/local64.h
> +++ b/include/asm-generic/local64.h
> @@ -42,7 +42,16 @@ typedef struct {
> #define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
> #define local64_inc_return(l) local_inc_return(&(l)->a)
>
> -#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
> +static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
> +{
> + return local_cmpxchg(&l->a, old, new);
> +}
> +
> +static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
> +{
> + return local_try_cmpxchg(&l->a, (long *)old, new);
> +}
> +
> #define local64_xchg(l, n) local_xchg((&(l)->a), (n))
> #define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
> #define local64_inc_not_zero(l) local_inc_not_zero(&(l)->a)
> @@ -81,6 +90,7 @@ typedef struct {
> #define local64_inc_return(l) atomic64_inc_return(&(l)->a)
>
> #define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
> +#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
> #define local64_xchg(l, n) atomic64_xchg((&(l)->a), (n))
> #define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
> #define local64_inc_not_zero(l) atomic64_inc_not_zero(&(l)->a)
> --
> 2.39.2
>

2023-04-11 11:36:42

by Mark Rutland

Subject: Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

On Wed, Apr 05, 2023 at 09:37:04AM -0700, Dave Hansen wrote:
> On 4/5/23 07:17, Uros Bizjak wrote:
> > Add generic and target specific support for local{,64}_try_cmpxchg
> > and wire up support for all targets that use local_t infrastructure.
>
> I feel like I'm missing some context.
>
> What are the actual end user visible effects of this series? Is there a
> measurable decrease in perf overhead? Why go to all this trouble for
> perf? Who else will use local_try_cmpxchg()?

Overall, the theory is that it can generate slightly better code (e.g. by
reusing the flags on x86). In practice, that might be in the noise, but as
demonstrated in prior postings the code generation is no worse than before.

From my perspective, the more important part is that this aligns local_t with
the other atomic*_t APIs, which all have ${atomictype}_try_cmpxchg(), and for
consistency/legibility/maintainability it's nice to be able to use the same
code patterns, e.g.

	${inttype} new, old = ${atomictype}_read(ptr);
	do {
		...
		new = do_something_with(old);
	} while (!${atomictype}_try_cmpxchg(ptr, &old, new));

> I'm all for improving things, and perf is an important user. But, if
> the goal here is improving performance, it would be nice to see at least
> a stab at quantifying the performance delta.

IIUC, Steve's original request for local_try_cmpxchg() was a combination of a
theoretical performance benefit and a more general preference to use
try_cmpxchg() for consistency / better structure of the source code:

https://lore.kernel.org/lkml/[email protected]/

I agree it'd be nice to have performance figures, but I think those would only
need to demonstrate a lack of a regression rather than a performance
improvement, and I think it's fairly clear from eyeballing the generated
instructions that a regression isn't likely.

Thanks,
Mark.

2023-04-11 13:45:48

by Dave Hansen

Subject: Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

On 4/11/23 04:35, Mark Rutland wrote:
> I agree it'd be nice to have performance figures, but I think those would only
> need to demonstrate a lack of a regression rather than a performance
> improvement, and I think it's fairly clear from eyeballing the generated
> instructions that a regression isn't likely.

Thanks for the additional context.

I totally agree that there's zero burden here to show a performance
increase. If anyone can think of a quick way to do _some_ kind of
benchmark on the code being changed and just show that it's free of
brown paper bags, it would be appreciated. Nothing crazy, just think of
one workload (synthetic or not) that will stress the paths being changed
and run it with and without these changes. Make sure there are no
surprises.

I also agree that it's unlikely to be brown paper bag material.

2023-04-11 21:39:11

by David Laight

Subject: RE: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

From: Dave Hansen
> Sent: 11 April 2023 14:44
>
> On 4/11/23 04:35, Mark Rutland wrote:
> > I agree it'd be nice to have performance figures, but I think those would only
> > need to demonstrate a lack of a regression rather than a performance
> > improvement, and I think it's fairly clear from eyeballing the generated
> > instructions that a regression isn't likely.
>
> Thanks for the additional context.
>
> I totally agree that there's zero burden here to show a performance
> increase. If anyone can think of a quick way to do _some_ kind of
> benchmark on the code being changed and just show that it's free of
> brown paper bags, it would be appreciated. Nothing crazy, just think of
> one workload (synthetic or not) that will stress the paths being changed
> and run it with and without these changes. Make sure there are not
> surprises.
>
> I also agree that it's unlikely to be brown paper bag material.

The only thing I can think of is that, on x86, the locked
variant may actually be faster!
Both require exclusive access to the cache line (the unlocked
variant always does the write! [1]).
So if the cache line is contended between cpus the unlocked
variant might ping-pong the cache line twice!
Of course, if the line is shared like that then performance
is horrid.

[1] I checked on an uncached PCIe address on which I can monitor
the TLP. The write always happens so you can use cmpxchg16b
with a 'known bad value' to do a 16 byte read as a single TLP
(without using an SSE register).

David


2023-04-12 11:49:37

by Peter Zijlstra

Subject: Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg

On Wed, Apr 05, 2023 at 04:17:08PM +0200, Uros Bizjak wrote:
> diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
> index bc4bd19b7fc2..45492fb5bf22 100644
> --- a/arch/powerpc/include/asm/local.h
> +++ b/arch/powerpc/include/asm/local.h
> @@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
> return t;
> }
>
> +static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
> +{
> + long o = *po, r;
> +
> + r = local_cmpxchg(l, o, n);
> + if (unlikely(r != o))
> + *po = r;
> +
> + return likely(r == o);
> +}
> +

Why is the ppc one different from the rest? Why can't it use the
try_cmpxchg_local() fallback and needs to have it open-coded?

2023-04-12 13:41:03

by Uros Bizjak

Subject: Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg

On Wed, Apr 12, 2023 at 1:33 PM Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Apr 05, 2023 at 04:17:08PM +0200, Uros Bizjak wrote:
> > diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
> > index bc4bd19b7fc2..45492fb5bf22 100644
> > --- a/arch/powerpc/include/asm/local.h
> > +++ b/arch/powerpc/include/asm/local.h
> > @@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
> > return t;
> > }
> >
> > +static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
> > +{
> > + long o = *po, r;
> > +
> > + r = local_cmpxchg(l, o, n);
> > + if (unlikely(r != o))
> > + *po = r;
> > +
> > + return likely(r == o);
> > +}
> > +
>
> Why is the ppc one different from the rest? Why can't it use the
> try_cmpxchg_local() fallback and needs to have it open-coded?

Please note that ppc directly defines a local_cmpxchg that bypasses the
cmpxchg_local/arch_cmpxchg_local machinery. The patch takes the same
approach for local_try_cmpxchg, because the fallbacks are built on the
arch_cmpxchg_local definitions.

PPC should be converted to use arch_cmpxchg_local (to also enable
instrumentation), but that is outside the scope of the proposed patchset.

Uros.

2023-04-12 13:42:54

by Peter Zijlstra

Subject: Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg

On Wed, Apr 12, 2023 at 03:37:50PM +0200, Uros Bizjak wrote:
> On Wed, Apr 12, 2023 at 1:33 PM Peter Zijlstra <[email protected]> wrote:
> >
> > On Wed, Apr 05, 2023 at 04:17:08PM +0200, Uros Bizjak wrote:
> > > diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
> > > index bc4bd19b7fc2..45492fb5bf22 100644
> > > --- a/arch/powerpc/include/asm/local.h
> > > +++ b/arch/powerpc/include/asm/local.h
> > > @@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
> > > return t;
> > > }
> > >
> > > +static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
> > > +{
> > > + long o = *po, r;
> > > +
> > > + r = local_cmpxchg(l, o, n);
> > > + if (unlikely(r != o))
> > > + *po = r;
> > > +
> > > + return likely(r == o);
> > > +}
> > > +
> >
> > Why is the ppc one different from the rest? Why can't it use the
> > try_cmpxchg_local() fallback and needs to have it open-coded?
>
> Please note that ppc directly defines local_cmpxchg that bypasses
> cmpxchg_local/arch_cmpxchg_local machinery. The patch takes the same
> approach for local_try_cmpxchg, because fallbacks are using
> arch_cmpxchg_local definitions.
>
> PPC should be converted to use arch_cmpxchg_local (to also enable
> instrumentation), but this is not the scope of the proposed patchset.

Ah indeed. Thanks!

Subject: [tip: locking/core] locking/x86: Define arch_try_cmpxchg_local

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 561b081f19655a46a9690e868fa4aa516ff1b62c
Gitweb: https://git.kernel.org/tip/561b081f19655a46a9690e868fa4aa516ff1b62c
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:09 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 12 Apr 2023 16:46:35 +02:00

locking/x86: Define arch_try_cmpxchg_local

Define target specific arch_try_cmpxchg_local. This
definition overrides the generic arch_try_cmpxchg_local
fallback definition and enables target-specific
implementation of try_cmpxchg_local.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/cmpxchg.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6a..540573f 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
#define __try_cmpxchg(ptr, pold, new, size) \
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)

+#define __try_cmpxchg_local(ptr, pold, new, size) \
+ __raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
#define arch_try_cmpxchg(ptr, pold, new) \
__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))

+#define arch_try_cmpxchg_local(ptr, pold, new) \
+ __try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
/*
* xadd() adds "inc" to "*ptr" and atomically returns the previous
* value of "*ptr".

Subject: [tip: locking/core] locking/generic: Wire up local{,64}_try_cmpxchg

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 1e9653c8289cf4b3c8690b558cdbe717d99e673a
Gitweb: https://git.kernel.org/tip/1e9653c8289cf4b3c8690b558cdbe717d99e673a
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:07 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 12 Apr 2023 16:46:34 +02:00

locking/generic: Wire up local{,64}_try_cmpxchg

Implement generic support for local{,64}_try_cmpxchg.

Redirect to the atomic_ family of functions when the target
does not provide its own local.h definitions.

For 64-bit targets, implement local64_try_cmpxchg and
local64_cmpxchg using typed C wrappers that call local_
family of functions and provide additional checking
of their input arguments.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/asm-generic/local.h | 1 +
include/asm-generic/local64.h | 12 +++++++++++-
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d..7f97018 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
#define local_inc_return(l) atomic_long_inc_return(&(l)->a)

#define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
#define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
#define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
#define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b..14963a7 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -42,7 +42,16 @@ typedef struct {
#define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
#define local64_inc_return(l) local_inc_return(&(l)->a)

-#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
+{
+ return local_cmpxchg(&l->a, old, new);
+}
+
+static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
+{
+ return local_try_cmpxchg(&l->a, (long *)old, new);
+}
+
#define local64_xchg(l, n) local_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) local_inc_not_zero(&(l)->a)
@@ -81,6 +90,7 @@ typedef struct {
#define local64_inc_return(l) atomic64_inc_return(&(l)->a)

#define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
#define local64_xchg(l, n) atomic64_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) atomic64_inc_not_zero(&(l)->a)

Subject: [tip: locking/core] locking/atomic: Add generic try_cmpxchg{,64}_local support

The following commit has been merged into the locking/core branch of tip:

Commit-ID: e18230fccee14c34e95594e78024c9755e454ae2
Gitweb: https://git.kernel.org/tip/e18230fccee14c34e95594e78024c9755e454ae2
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:06 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 12 Apr 2023 16:46:33 +02:00

locking/atomic: Add generic try_cmpxchg{,64}_local support

Add generic support for try_cmpxchg{,64}_local and their fallbacks.

These provide the generic try_cmpxchg_local family of functions
from the arch_ prefixed versions, also adding explicit instrumentation.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/atomic/atomic-arch-fallback.h | 24 +++++++++++++++++++-
include/linux/atomic/atomic-instrumented.h | 20 ++++++++++++++++-
scripts/atomic/gen-atomic-fallback.sh | 4 +++-
scripts/atomic/gen-atomic-instrumented.sh | 2 +-
4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 4226379..a6e4437 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@

#endif /* arch_try_cmpxchg64_relaxed */

+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
#ifndef arch_atomic_read_acquire
static __always_inline int
arch_atomic_read_acquire(const atomic_t *v)
@@ -2646,4 +2668,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 00071fffa021cec66f6290d706d69c91df87bade
+// ad2e2b4d168dbc60a73922616047a9bfa446af36
diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index 0496816..245ba66 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2132,6 +2132,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
})

+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
#define cmpxchg_double(ptr, ...) \
({ \
typeof(ptr) __ai_ptr = (ptr); \
@@ -2149,4 +2167,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
})

#endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 1b485de9cbaa4900de59e14ee2084357eaeb1c3a
+// 97fe4d79aa058d2164df824632cbc4f716d2a407
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695..6e853f0 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
gen_try_cmpxchg_fallbacks "${cmpxchg}"
done

+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+ gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "atomic" "int" ${args}
done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 77c0652..c8165e9 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
done
done

-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
gen_xchg "${xchg}" "" ""
printf "\n"
done

Subject: [tip: locking/core] locking/arch: Wire up local_try_cmpxchg

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 62de78f1c39a6b3f456e9dfe000b3d92b9a58ee4
Gitweb: https://git.kernel.org/tip/62de78f1c39a6b3f456e9dfe000b3d92b9a58ee4
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:08 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 12 Apr 2023 16:46:34 +02:00

locking/arch: Wire up local_try_cmpxchg

Implement target specific support for local_try_cmpxchg
and local_cmpxchg using typed C wrappers that call their
_local counterpart and provide additional checking of
their input arguments.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/alpha/include/asm/local.h | 12 ++++++++++--
arch/loongarch/include/asm/local.h | 13 +++++++++++--
arch/mips/include/asm/local.h | 13 +++++++++++--
arch/powerpc/include/asm/local.h | 11 +++++++++++
arch/x86/include/asm/local.h | 13 +++++++++++--
5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1..0fcaad6 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -52,8 +52,16 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ return try_cmpxchg_local(&l->a.counter, (s64 *)old, new);
+}
+
#define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))

/**
diff --git a/arch/loongarch/include/asm/local.h b/arch/loongarch/include/asm/local.h
index 65fbbae..83e995b 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -56,8 +56,17 @@ static inline long local_sub_return(long i, local_t *l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1..5daf6fe 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -94,8 +94,17 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19..45492fb 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
return t;
}

+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+ long o = *po, r;
+
+ r = local_cmpxchg(l, o, n);
+ if (unlikely(r != o))
+ *po = r;
+
+ return likely(r == o);
+}
+
static __inline__ long local_xchg(local_t *l, long n)
{
long t;
diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47a..56d4ef6 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -120,8 +120,17 @@ static inline long local_sub_return(long i, local_t *l)
#define local_inc_return(l) (local_add_return(1, l))
#define local_dec_return(l) (local_sub_return(1, l))

-#define local_cmpxchg(l, o, n) \
- (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
/* Always has a lock prefix */
#define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))

Subject: [tip: locking/core] locking/generic: Wire up local{,64}_try_cmpxchg()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 8184667fd00ee1081aa37b3300947d1bfc294210
Gitweb: https://git.kernel.org/tip/8184667fd00ee1081aa37b3300947d1bfc294210
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:07 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 29 Apr 2023 08:52:16 +02:00

locking/generic: Wire up local{,64}_try_cmpxchg()

Implement generic support for local{,64}_try_cmpxchg().

Redirect to the atomic_ family of functions when the target
does not provide its own local.h definitions.

For 64-bit targets, implement local64_try_cmpxchg and
local64_cmpxchg using typed C wrappers that call the local_
family of functions and provide additional checking of
their input arguments.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Linus Torvalds <[email protected]>
---
include/asm-generic/local.h | 1 +
include/asm-generic/local64.h | 12 +++++++++++-
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d..7f97018 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
#define local_inc_return(l) atomic_long_inc_return(&(l)->a)

#define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
#define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
#define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
#define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b..14963a7 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -42,7 +42,16 @@ typedef struct {
#define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
#define local64_inc_return(l) local_inc_return(&(l)->a)

-#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
+{
+ return local_cmpxchg(&l->a, old, new);
+}
+
+static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
+{
+ return local_try_cmpxchg(&l->a, (long *)old, new);
+}
+
#define local64_xchg(l, n) local_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) local_inc_not_zero(&(l)->a)
@@ -81,6 +90,7 @@ typedef struct {
#define local64_inc_return(l) atomic64_inc_return(&(l)->a)

#define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
#define local64_xchg(l, n) atomic64_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) atomic64_inc_not_zero(&(l)->a)
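
As a usage illustration of the 64-bit variant (hypothetical names,
not taken from the series, assuming <asm/local64.h>), keeping a
running maximum without the extra compare:

static void stats_update_max(local64_t *max, s64 val)
{
	s64 old = local64_read(max);

	do {
		if (old >= val)
			return;		/* current maximum already larger */
	} while (!local64_try_cmpxchg(max, &old, val));
}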

Subject: [tip: locking/core] locking/x86: Define arch_try_cmpxchg_local()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 1ab70804ef3f829c5a738cba8627f0a0c2486d2b
Gitweb: https://git.kernel.org/tip/1ab70804ef3f829c5a738cba8627f0a0c2486d2b
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:09 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 29 Apr 2023 08:52:44 +02:00

locking/x86: Define arch_try_cmpxchg_local()

Define target specific arch_try_cmpxchg_local(). This
definition overrides the generic arch_try_cmpxchg_local()
fallback definition and enables target-specific
implementation of try_cmpxchg_local().

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Linus Torvalds <[email protected]>
---
arch/x86/include/asm/cmpxchg.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6a..540573f 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
#define __try_cmpxchg(ptr, pold, new, size) \
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)

+#define __try_cmpxchg_local(ptr, pold, new, size) \
+ __raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
#define arch_try_cmpxchg(ptr, pold, new) \
__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))

+#define arch_try_cmpxchg_local(ptr, pold, new) \
+ __try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
/*
* xadd() adds "inc" to "*ptr" and atomically returns the previous
* value of "*ptr".

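The only difference from arch_try_cmpxchg() is the empty lock prefix
passed to __raw_try_cmpxchg(): the _local form emits a plain CMPXCHG
without the LOCK prefix, which is sufficient when the location is only
modified by its owning CPU. A rough sketch of such a caller (the
sequence counter and helper are hypothetical):

/*
 * Hypothetical sequence counter that is only ever updated from one
 * CPU (its own task and interrupt context), so the un-LOCKed
 * cmpxchg behind try_cmpxchg_local() is enough.
 */
static unsigned long claim_next_seq(unsigned long *seq)
{
	unsigned long old = *seq;

	while (!try_cmpxchg_local(seq, &old, old + 1))
		;

	return old;
}
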
Subject: [tip: locking/core] locking/arch: Wire up local_try_cmpxchg()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: bdf02b98474827e83e7e9254904eb18eb85552cd
Gitweb: https://git.kernel.org/tip/bdf02b98474827e83e7e9254904eb18eb85552cd
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:08 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 29 Apr 2023 08:52:25 +02:00

locking/arch: Wire up local_try_cmpxchg()

Implement target specific support for local_try_cmpxchg()
and local_cmpxchg() using typed C wrappers that call their
_local counterpart and provide additional checking of
their input arguments.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Linus Torvalds <[email protected]>
---
arch/alpha/include/asm/local.h | 12 ++++++++++--
arch/loongarch/include/asm/local.h | 13 +++++++++++--
arch/mips/include/asm/local.h | 13 +++++++++++--
arch/powerpc/include/asm/local.h | 11 +++++++++++
arch/x86/include/asm/local.h | 13 +++++++++++--
5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1..0fcaad6 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -52,8 +52,16 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ return try_cmpxchg_local(&l->a.counter, (s64 *)old, new);
+}
+
#define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))

/**
diff --git a/arch/loongarch/include/asm/local.h b/arch/loongarch/include/asm/local.h
index 65fbbae..83e995b 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -56,8 +56,17 @@ static inline long local_sub_return(long i, local_t *l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1..5daf6fe 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -94,8 +94,17 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19..45492fb 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
return t;
}

+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+ long o = *po, r;
+
+ r = local_cmpxchg(l, o, n);
+ if (unlikely(r != o))
+ *po = r;
+
+ return likely(r == o);
+}
+
static __inline__ long local_xchg(local_t *l, long n)
{
long t;
diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47a..56d4ef6 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -120,8 +120,17 @@ static inline long local_sub_return(long i, local_t *l)
#define local_inc_return(l) (local_add_return(1, l))
#define local_dec_return(l) (local_sub_return(1, l))

-#define local_cmpxchg(l, o, n) \
- (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
/* Always has a lock prefix */
#define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))

Subject: [tip: locking/core] locking/atomic: Add generic try_cmpxchg{,64}_local() support

The following commit has been merged into the locking/core branch of tip:

Commit-ID: cd6637248298aca6f9f7df489e73baa727409210
Gitweb: https://git.kernel.org/tip/cd6637248298aca6f9f7df489e73baa727409210
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:06 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 29 Apr 2023 08:51:55 +02:00

locking/atomic: Add generic try_cmpxchg{,64}_local() support

Add generic support for try_cmpxchg{,64}_local() and their fallbacks.

These provide the generic try_cmpxchg_local family of functions
from the arch_ prefixed version, also adding explicit instrumentation.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Linus Torvalds <[email protected]>
---
include/linux/atomic/atomic-arch-fallback.h | 24 +++++++++++++++++++-
include/linux/atomic/atomic-instrumented.h | 20 ++++++++++++++++-
scripts/atomic/gen-atomic-fallback.sh | 4 +++-
scripts/atomic/gen-atomic-instrumented.sh | 2 +-
4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 4226379..a6e4437 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@

#endif /* arch_try_cmpxchg64_relaxed */

+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
#ifndef arch_atomic_read_acquire
static __always_inline int
arch_atomic_read_acquire(const atomic_t *v)
@@ -2646,4 +2668,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 00071fffa021cec66f6290d706d69c91df87bade
+// ad2e2b4d168dbc60a73922616047a9bfa446af36
diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index 0496816..245ba66 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2132,6 +2132,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
})

+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
#define cmpxchg_double(ptr, ...) \
({ \
typeof(ptr) __ai_ptr = (ptr); \
@@ -2149,4 +2167,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
})

#endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 1b485de9cbaa4900de59e14ee2084357eaeb1c3a
+// 97fe4d79aa058d2164df824632cbc4f716d2a407
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695..6e853f0 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
gen_try_cmpxchg_fallbacks "${cmpxchg}"
done

+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+ gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "atomic" "int" ${args}
done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 77c0652..c8165e9 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
done
done

-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
gen_xchg "${xchg}" "" ""
printf "\n"
done
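
Written out as plain C, the fallback above behaves roughly like the
helper below (illustrative only, the name is invented; the real code
stays a macro so it works for any operand type):

static inline bool try_cmpxchg_local_sketch(unsigned long *ptr,
					    unsigned long *oldp,
					    unsigned long new)
{
	unsigned long old = *oldp;
	unsigned long r = cmpxchg_local(ptr, old, new);

	if (r != old) {
		*oldp = r;	/* report the value actually found */
		return false;
	}
	return true;
}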

Subject: [tip: locking/core] locking/x86: Define arch_try_cmpxchg_local()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 5cd4c268412f802e71b7722c3ec4638b0fe04acd
Gitweb: https://git.kernel.org/tip/5cd4c268412f802e71b7722c3ec4638b0fe04acd
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:09 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 29 Apr 2023 09:09:23 +02:00

locking/x86: Define arch_try_cmpxchg_local()

Define target specific arch_try_cmpxchg_local(). This
definition overrides the generic arch_try_cmpxchg_local()
fallback definition and enables target-specific
implementation of try_cmpxchg_local().

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Linus Torvalds <[email protected]>
---
arch/x86/include/asm/cmpxchg.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6a..540573f 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
#define __try_cmpxchg(ptr, pold, new, size) \
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)

+#define __try_cmpxchg_local(ptr, pold, new, size) \
+ __raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
#define arch_try_cmpxchg(ptr, pold, new) \
__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))

+#define arch_try_cmpxchg_local(ptr, pold, new) \
+ __try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
/*
* xadd() adds "inc" to "*ptr" and atomically returns the previous
* value of "*ptr".

Subject: [tip: locking/core] locking/arch: Wire up local_try_cmpxchg()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: d994f2c8e2410ebcb928df67baa6f04e29bc9d3e
Gitweb: https://git.kernel.org/tip/d994f2c8e2410ebcb928df67baa6f04e29bc9d3e
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:08 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 29 Apr 2023 09:09:16 +02:00

locking/arch: Wire up local_try_cmpxchg()

Implement target specific support for local_try_cmpxchg()
and local_cmpxchg() using typed C wrappers that call their
_local counterpart and provide additional checking of
their input arguments.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Linus Torvalds <[email protected]>
---
arch/alpha/include/asm/local.h | 12 ++++++++++--
arch/loongarch/include/asm/local.h | 13 +++++++++++--
arch/mips/include/asm/local.h | 13 +++++++++++--
arch/powerpc/include/asm/local.h | 11 +++++++++++
arch/x86/include/asm/local.h | 13 +++++++++++--
5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1..0fcaad6 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -52,8 +52,16 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ return try_cmpxchg_local(&l->a.counter, (s64 *)old, new);
+}
+
#define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))

/**
diff --git a/arch/loongarch/include/asm/local.h b/arch/loongarch/include/asm/local.h
index 65fbbae..83e995b 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -56,8 +56,17 @@ static inline long local_sub_return(long i, local_t *l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1..5daf6fe 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -94,8 +94,17 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
}

-#define local_cmpxchg(l, o, n) \
- ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
#define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

/**
diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19..45492fb 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
return t;
}

+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+ long o = *po, r;
+
+ r = local_cmpxchg(l, o, n);
+ if (unlikely(r != o))
+ *po = r;
+
+ return likely(r == o);
+}
+
static __inline__ long local_xchg(local_t *l, long n)
{
long t;
diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47a..56d4ef6 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -120,8 +120,17 @@ static inline long local_sub_return(long i, local_t *l)
#define local_inc_return(l) (local_add_return(1, l))
#define local_dec_return(l) (local_sub_return(1, l))

-#define local_cmpxchg(l, o, n) \
- (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+ return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+ typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+ return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
/* Always has a lock prefix */
#define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))

Subject: [tip: locking/core] locking/atomic: Add generic try_cmpxchg{,64}_local() support

The following commit has been merged into the locking/core branch of tip:

Commit-ID: e6ce9d741163af0b846637ce6550ae8a671b1588
Gitweb: https://git.kernel.org/tip/e6ce9d741163af0b846637ce6550ae8a671b1588
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:06 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 29 Apr 2023 09:09:02 +02:00

locking/atomic: Add generic try_cmpxchg{,64}_local() support

Add generic support for try_cmpxchg{,64}_local() and their fallbacks.

These provide the generic try_cmpxchg_local family of functions
from the arch_ prefixed version, also adding explicit instrumentation.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Linus Torvalds <[email protected]>
---
include/linux/atomic/atomic-arch-fallback.h | 24 +++++++++++++++++++-
include/linux/atomic/atomic-instrumented.h | 20 ++++++++++++++++-
scripts/atomic/gen-atomic-fallback.sh | 4 +++-
scripts/atomic/gen-atomic-instrumented.sh | 2 +-
4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 4226379..a6e4437 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@

#endif /* arch_try_cmpxchg64_relaxed */

+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
#ifndef arch_atomic_read_acquire
static __always_inline int
arch_atomic_read_acquire(const atomic_t *v)
@@ -2646,4 +2668,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 00071fffa021cec66f6290d706d69c91df87bade
+// ad2e2b4d168dbc60a73922616047a9bfa446af36
diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index 0496816..245ba66 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2132,6 +2132,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
})

+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
#define cmpxchg_double(ptr, ...) \
({ \
typeof(ptr) __ai_ptr = (ptr); \
@@ -2149,4 +2167,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
})

#endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 1b485de9cbaa4900de59e14ee2084357eaeb1c3a
+// 97fe4d79aa058d2164df824632cbc4f716d2a407
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695..6e853f0 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
gen_try_cmpxchg_fallbacks "${cmpxchg}"
done

+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+ gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "atomic" "int" ${args}
done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 77c0652..c8165e9 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
done
done

-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
gen_xchg "${xchg}" "" ""
printf "\n"
done

Subject: [tip: locking/core] locking/generic: Wire up local{,64}_try_cmpxchg()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 8fc4fddaf9a184eea7da21290236a1764e608a01
Gitweb: https://git.kernel.org/tip/8fc4fddaf9a184eea7da21290236a1764e608a01
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 05 Apr 2023 16:17:07 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 29 Apr 2023 09:09:09 +02:00

locking/generic: Wire up local{,64}_try_cmpxchg()

Implement generic support for local{,64}_try_cmpxchg().

Redirect to the atomic_ family of functions when the target
does not provide its own local.h definitions.

For 64-bit targets, implement local64_try_cmpxchg and
local64_cmpxchg using typed C wrappers that call the local_
family of functions and provide additional checking of
their input arguments.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Linus Torvalds <[email protected]>
---
include/asm-generic/local.h | 1 +
include/asm-generic/local64.h | 12 +++++++++++-
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d..7f97018 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
#define local_inc_return(l) atomic_long_inc_return(&(l)->a)

#define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
#define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
#define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
#define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b..14963a7 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -42,7 +42,16 @@ typedef struct {
#define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
#define local64_inc_return(l) local_inc_return(&(l)->a)

-#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
+{
+ return local_cmpxchg(&l->a, old, new);
+}
+
+static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
+{
+ return local_try_cmpxchg(&l->a, (long *)old, new);
+}
+
#define local64_xchg(l, n) local_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) local_inc_not_zero(&(l)->a)
@@ -81,6 +90,7 @@ typedef struct {
#define local64_inc_return(l) atomic64_inc_return(&(l)->a)

#define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
#define local64_xchg(l, n) atomic64_xchg((&(l)->a), (n))
#define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
#define local64_inc_not_zero(l) atomic64_inc_not_zero(&(l)->a)

2023-05-17 07:52:32

by Charlemagne Lasse

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg

> +static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
> +{
> + typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
> + return try_cmpxchg_local(&l->a.counter, __old, new);
> +}
> +

This patch then causes the following sparse errors:

./arch/x86/include/asm/local.h:131:16: warning: symbol '__old' shadows an earlier one
./arch/x86/include/asm/local.h:130:30: originally declared here

This is then visible in all kinds of builds, which makes it hard to
spot actual problems reported by sparse.
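
The shadowing presumably comes from the wrapper's local __old clashing
with a temporary of the same name declared inside the macros that
try_cmpxchg_local() expands to on x86. One way to silence it, sketched
here only as an illustration and not necessarily the fix that was
merged, is to drop the intermediate variable and cast inline:

static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
{
	/* cast directly instead of declaring a shadowing __old */
	return try_cmpxchg_local(&l->a.counter,
				 (typeof(l->a.counter) *) old, new);
}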