2024-04-09 10:05:59

by Uros Bizjak

Subject: [PATCH 0/6] locking/atomic/x86: Improve arch_atomic*() family of functions

The following patch series improves the x86 arch_atomic*()
family of functions and merges the x86_32 and x86_64
arch_atomic64_fetch_{and,or,xor}() functions.

The patch series enables substantial assembly code
reductions for the x86_32 target and lowers the future
maintenance burden and technical debt of the source code by
unifying several functions between the x86_32 and x86_64 targets.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>

Uros Bizjak (6):
locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32
locking/atomic/x86: Rewrite x86_32
arch_atomic64_{,fetch}_{and,or,xor}() functions
locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops
locking/atomic/x86: Merge x86_32 and x86_64
arch_atomic64_fetch_{and,or,xor}() functions
locking/atomic/x86: Define arch_atomic_sub() family using
arch_atomic_add() functions
locking/atomic/x86: Reorder a couple of arch_atomic64 functions

arch/x86/include/asm/atomic.h | 50 ++++++++++----
arch/x86/include/asm/atomic64_32.h | 103 ++++++++++-------------------
arch/x86/include/asm/atomic64_64.h | 44 +-----------
3 files changed, 75 insertions(+), 122 deletions(-)

--
2.44.0



2024-04-09 10:06:02

by Uros Bizjak

Subject: [PATCH 1/6] locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg to x86_32

Introduce arch_atomic64_try_cmpxchg() for 32-bit targets so they use
an optimized, target-specific implementation instead of the generic one.
This implementation eliminates the dual-word compare after the
cmpxchg8b instruction and improves the generated asm code from:

2273: f0 0f c7 0f lock cmpxchg8b (%edi)
2277: 8b 74 24 2c mov 0x2c(%esp),%esi
227b: 89 d3 mov %edx,%ebx
227d: 89 c2 mov %eax,%edx
227f: 89 5c 24 10 mov %ebx,0x10(%esp)
2283: 8b 7c 24 30 mov 0x30(%esp),%edi
2287: 89 44 24 1c mov %eax,0x1c(%esp)
228b: 31 f2 xor %esi,%edx
228d: 89 d0 mov %edx,%eax
228f: 89 da mov %ebx,%edx
2291: 31 fa xor %edi,%edx
2293: 09 d0 or %edx,%eax
2295: 0f 85 a5 00 00 00 jne 2340 <...>

to:

2270: f0 0f c7 0f lock cmpxchg8b (%edi)
2274: 0f 85 a6 00 00 00 jne 2320 <...>
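The saving comes from try_cmpxchg's calling convention: on failure it
writes the observed value back into *old and returns a boolean, so the
caller can branch on the flags set by cmpxchg8b itself instead of
re-comparing both 32-bit halves. A minimal user-space sketch of the two
conventions, using GCC's __atomic builtins as stand-ins (these are
illustrative models, not the kernel's implementations):

```c
#include <stdint.h>
#include <stdbool.h>

/* cmpxchg-style: returns the value observed at *p; the caller must
 * compare it against the expected value itself, which on x86_32 costs
 * the dual-word compare shown in the "from:" disassembly above. */
static int64_t cmpxchg64(int64_t *p, int64_t old, int64_t new)
{
	__atomic_compare_exchange_n(p, &old, new, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;	/* the value that was actually in *p */
}

/* try_cmpxchg-style: returns success/failure directly and updates
 * *old with the observed value on failure, so a retry loop can branch
 * straight on the result, as in the "to:" disassembly above. */
static bool try_cmpxchg64(int64_t *p, int64_t *old, int64_t new)
{
	return __atomic_compare_exchange_n(p, old, new, false,
					   __ATOMIC_SEQ_CST,
					   __ATOMIC_SEQ_CST);
}
```

On failure the try variant also leaves *old holding the fresh value, so
a loop needs no separate reload before retrying.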

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/atomic64_32.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index d510405e4e1d..11e817dab44a 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -61,12 +61,18 @@ ATOMIC64_DECL(add_unless);
#undef __ATOMIC64_DECL
#undef ATOMIC64_EXPORT

-static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n)
+static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
{
- return arch_cmpxchg64(&v->counter, o, n);
+ return arch_cmpxchg64(&v->counter, old, new);
}
#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg

+static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+{
+ return arch_try_cmpxchg64(&v->counter, old, new);
+}
+#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
+
static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
{
s64 o;
--
2.44.0


2024-04-09 10:07:05

by Uros Bizjak

Subject: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions

Rewrite the x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions to
use arch_atomic64_try_cmpxchg(). This implementation avoids one extra
trip through the cmpxchg loop.

The value preload before the cmpxchg loop does not need to be atomic,
but should use READ_ONCE() to prevent the compiler from merging,
refetching or reordering the read.

The generated code improves from:

1917d5: 31 c9 xor %ecx,%ecx
1917d7: 31 db xor %ebx,%ebx
1917d9: 89 4c 24 3c mov %ecx,0x3c(%esp)
1917dd: 8b 74 24 24 mov 0x24(%esp),%esi
1917e1: 89 c8 mov %ecx,%eax
1917e3: 89 5c 24 34 mov %ebx,0x34(%esp)
1917e7: 8b 7c 24 28 mov 0x28(%esp),%edi
1917eb: 21 ce and %ecx,%esi
1917ed: 89 74 24 4c mov %esi,0x4c(%esp)
1917f1: 21 df and %ebx,%edi
1917f3: 89 de mov %ebx,%esi
1917f5: 89 7c 24 50 mov %edi,0x50(%esp)
1917f9: 8b 54 24 4c mov 0x4c(%esp),%edx
1917fd: 8b 7c 24 2c mov 0x2c(%esp),%edi
191801: 8b 4c 24 50 mov 0x50(%esp),%ecx
191805: 89 d3 mov %edx,%ebx
191807: 89 f2 mov %esi,%edx
191809: f0 0f c7 0f lock cmpxchg8b (%edi)
19180d: 89 c1 mov %eax,%ecx
19180f: 8b 74 24 34 mov 0x34(%esp),%esi
191813: 89 d3 mov %edx,%ebx
191815: 89 44 24 4c mov %eax,0x4c(%esp)
191819: 8b 44 24 3c mov 0x3c(%esp),%eax
19181d: 89 df mov %ebx,%edi
19181f: 89 54 24 44 mov %edx,0x44(%esp)
191823: 89 ca mov %ecx,%edx
191825: 31 de xor %ebx,%esi
191827: 31 c8 xor %ecx,%eax
191829: 09 f0 or %esi,%eax
19182b: 75 ac jne 1917d9 <...>

to:

1912ba: 8b 06 mov (%esi),%eax
1912bc: 8b 56 04 mov 0x4(%esi),%edx
1912bf: 89 44 24 3c mov %eax,0x3c(%esp)
1912c3: 89 c1 mov %eax,%ecx
1912c5: 23 4c 24 34 and 0x34(%esp),%ecx
1912c9: 89 d3 mov %edx,%ebx
1912cb: 23 5c 24 38 and 0x38(%esp),%ebx
1912cf: 89 54 24 40 mov %edx,0x40(%esp)
1912d3: 89 4c 24 2c mov %ecx,0x2c(%esp)
1912d7: 89 5c 24 30 mov %ebx,0x30(%esp)
1912db: 8b 5c 24 2c mov 0x2c(%esp),%ebx
1912df: 8b 4c 24 30 mov 0x30(%esp),%ecx
1912e3: f0 0f c7 0e lock cmpxchg8b (%esi)
1912e7: 0f 85 f3 02 00 00 jne 1915e0 <...>
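The loop shape change can be sketched in plain C. The old pattern
starts from c = 0, so unless the counter happens to be zero the first
cmpxchg is guaranteed to fail and a second trip is needed; the new
pattern preloads the current value, so the first cmpxchg succeeds
whenever there is no contention. The helpers below are user-space
stand-ins for the arch primitives (assumptions for illustration, not
the kernel code):

```c
#include <stdint.h>
#include <stdbool.h>

/* Stand-in for arch_atomic64_try_cmpxchg(). */
static bool try_cmpxchg64(int64_t *p, int64_t *old, int64_t new)
{
	return __atomic_compare_exchange_n(p, old, new, false,
					   __ATOMIC_SEQ_CST,
					   __ATOMIC_SEQ_CST);
}

/* Old pattern: the first compare uses c = 0, so it almost always
 * fails once before the loop settles on the real value. */
static int64_t fetch_and_old(int64_t i, int64_t *p)
{
	int64_t old, c = 0;

	while ((old = __sync_val_compare_and_swap(p, c, c & i)) != c)
		c = old;
	return old;
}

/* New pattern: preload the value (volatile cast stands in for
 * READ_ONCE()) and let try_cmpxchg refresh it on failure. */
static int64_t fetch_and_new(int64_t i, int64_t *p)
{
	int64_t val = *(volatile int64_t *)p;

	do { } while (!try_cmpxchg64(p, &val, val & i));
	return val;
}
```

Both return the old value and leave the AND result in the counter; only
the number of round trips under no contention differs.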

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/atomic64_32.h | 44 ++++++++++++------------------
1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 11e817dab44a..84affd7a5d1c 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -201,69 +201,61 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)

static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);

- while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
}

static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);

- while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));

- return old;
+ return val;
}
#define arch_atomic64_fetch_and arch_atomic64_fetch_and

static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);

- while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
}

static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);

- while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));

- return old;
+ return val;
}
#define arch_atomic64_fetch_or arch_atomic64_fetch_or

static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);

- while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
}

static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);

- while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
- c = old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));

- return old;
+ return val;
}
#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor

static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
{
- s64 old, c = 0;
+ s64 val = __READ_ONCE(v->counter);

- while ((old = arch_atomic64_cmpxchg(v, c, c + i)) != c)
- c = old;
-
- return old;
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
+ return val;
}
#define arch_atomic64_fetch_add arch_atomic64_fetch_add

--
2.44.0


2024-04-09 10:07:12

by Uros Bizjak

Subject: [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops

The value preload before the cmpxchg loop does not need to be atomic,
but should use READ_ONCE() to prevent the compiler from merging,
refetching or reordering the read.

This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
between x86_32 and x86_64 targets.

No functional changes intended.

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/atomic.h | 8 ++++----
arch/x86/include/asm/atomic64_64.h | 20 ++++++++++----------
2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 55a55ec04350..b166da21ee98 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -20,7 +20,7 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
* Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
* it's non-inlined function that increases binary size and stack usage.
*/
- return __READ_ONCE((v)->counter);
+ return __READ_ONCE(v->counter);
}

static __always_inline void arch_atomic_set(atomic_t *v, int i)
@@ -132,7 +132,7 @@ static __always_inline void arch_atomic_and(int i, atomic_t *v)

static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
{
- int val = arch_atomic_read(v);
+ int val = __READ_ONCE(v->counter);

do { } while (!arch_atomic_try_cmpxchg(v, &val, val & i));

@@ -150,7 +150,7 @@ static __always_inline void arch_atomic_or(int i, atomic_t *v)

static __always_inline int arch_atomic_fetch_or(int i, atomic_t *v)
{
- int val = arch_atomic_read(v);
+ int val = __READ_ONCE(v->counter);

do { } while (!arch_atomic_try_cmpxchg(v, &val, val | i));

@@ -168,7 +168,7 @@ static __always_inline void arch_atomic_xor(int i, atomic_t *v)

static __always_inline int arch_atomic_fetch_xor(int i, atomic_t *v)
{
- int val = arch_atomic_read(v);
+ int val = __READ_ONCE(v->counter);

do { } while (!arch_atomic_try_cmpxchg(v, &val, val ^ i));

diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 3165c0feedf7..e7b12a48fecb 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -12,7 +12,7 @@

static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
{
- return __READ_ONCE((v)->counter);
+ return __READ_ONCE(v->counter);
}

static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
@@ -126,10 +126,10 @@ static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)

static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
{
- s64 val = arch_atomic64_read(v);
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));

- do {
- } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
return val;
}
#define arch_atomic64_fetch_and arch_atomic64_fetch_and
@@ -144,10 +144,10 @@ static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)

static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
{
- s64 val = arch_atomic64_read(v);
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));

- do {
- } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
return val;
}
#define arch_atomic64_fetch_or arch_atomic64_fetch_or
@@ -162,10 +162,10 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)

static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
{
- s64 val = arch_atomic64_read(v);
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));

- do {
- } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
return val;
}
#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
--
2.44.0


2024-04-09 10:07:23

by Uros Bizjak

Subject: [PATCH 4/6] locking/atomic/x86: Merge x86_32 and x86_64 arch_atomic64_fetch_{and,or,xor}() functions

Move the identical definitions of arch_atomic64_fetch_{and,or,xor}()
from arch/x86/include/asm/atomic64_32.h and arch/x86/include/asm/atomic64_64.h
to their common place in arch/x86/include/asm/atomic.h.

No functional changes intended.

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/atomic.h | 30 ++++++++++++++++++++++++++++++
arch/x86/include/asm/atomic64_32.h | 30 ------------------------------
arch/x86/include/asm/atomic64_64.h | 30 ------------------------------
3 files changed, 30 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index b166da21ee98..b2e44de36934 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -182,4 +182,34 @@ static __always_inline int arch_atomic_fetch_xor(int i, atomic_t *v)
# include <asm/atomic64_64.h>
#endif

+static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
+{
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
+
+ return val;
+}
+#define arch_atomic64_fetch_and arch_atomic64_fetch_and
+
+static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
+{
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
+
+ return val;
+}
+#define arch_atomic64_fetch_or arch_atomic64_fetch_or
+
+static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
+{
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
+
+ return val;
+}
+#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
+
#endif /* _ASM_X86_ATOMIC_H */
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 84affd7a5d1c..4f79198da98e 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -206,16 +206,6 @@ static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
}

-static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
-
- return val;
-}
-#define arch_atomic64_fetch_and arch_atomic64_fetch_and
-
static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
@@ -223,16 +213,6 @@ static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
}

-static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
-
- return val;
-}
-#define arch_atomic64_fetch_or arch_atomic64_fetch_or
-
static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
@@ -240,16 +220,6 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
}

-static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
-
- return val;
-}
-#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
-
static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index e7b12a48fecb..b2c9974ba971 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -124,16 +124,6 @@ static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
: "memory");
}

-static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
-
- return val;
-}
-#define arch_atomic64_fetch_and arch_atomic64_fetch_and
-
static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
{
asm volatile(LOCK_PREFIX "orq %1,%0"
@@ -142,16 +132,6 @@ static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
: "memory");
}

-static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
-
- return val;
-}
-#define arch_atomic64_fetch_or arch_atomic64_fetch_or
-
static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
{
asm volatile(LOCK_PREFIX "xorq %1,%0"
@@ -160,14 +140,4 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
: "memory");
}

-static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
-
- return val;
-}
-#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
-
#endif /* _ASM_X86_ATOMIC64_64_H */
--
2.44.0


2024-04-09 10:07:32

by Uros Bizjak

Subject: [PATCH 5/6] locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions

There is no need to implement the arch_atomic_sub() family as inline
functions; the corresponding macros can be implemented directly using
the arch_atomic_add() inlines with a negated argument.

No functional changes intended.
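One detail worth noting in the macro form is that the argument is
negated as -(i), with parentheses, so that an expression passed as i
is negated as a whole. A small sketch shows why the parentheses
matter; add_return() here is a plain non-atomic stand-in used only to
demonstrate macro expansion (an assumption, not the kernel function):

```c
/* Non-atomic stand-in for arch_atomic_add_return(), just to show
 * how the macro argument expands. */
static int add_return(int i, int *v)
{
	return *v += i;
}

/* As in the patch: -(i) negates the whole expression. */
#define sub_return(i, v)	add_return(-(i), v)

/* Hypothetical buggy variant: -i binds only to the first token, so
 * sub_return_bad(a - b, v) expands to add_return(-a - b, v). */
#define sub_return_bad(i, v)	add_return(-i, v)
```

With i = a - b, the parenthesized form adds -(a - b) while the
unparenthesized one adds -a - b, giving a different result whenever
b is nonzero.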

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/atomic.h | 12 ++----------
arch/x86/include/asm/atomic64_32.h | 23 ++++++++++++-----------
arch/x86/include/asm/atomic64_64.h | 12 ++----------
3 files changed, 16 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index b2e44de36934..77526257dacf 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -86,11 +86,7 @@ static __always_inline int arch_atomic_add_return(int i, atomic_t *v)
}
#define arch_atomic_add_return arch_atomic_add_return

-static __always_inline int arch_atomic_sub_return(int i, atomic_t *v)
-{
- return arch_atomic_add_return(-i, v);
-}
-#define arch_atomic_sub_return arch_atomic_sub_return
+#define arch_atomic_sub_return(i, v) arch_atomic_add_return(-(i), v)

static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
{
@@ -98,11 +94,7 @@ static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
}
#define arch_atomic_fetch_add arch_atomic_fetch_add

-static __always_inline int arch_atomic_fetch_sub(int i, atomic_t *v)
-{
- return xadd(&v->counter, -i);
-}
-#define arch_atomic_fetch_sub arch_atomic_fetch_sub
+#define arch_atomic_fetch_sub(i, v) arch_atomic_fetch_add(-(i), v)

static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
{
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 4f79198da98e..862448db1207 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -199,6 +199,18 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
#undef alternative_atomic64
#undef __alternative_atomic64

+static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
+{
+ s64 val = __READ_ONCE(v->counter);
+
+ do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
+
+ return val;
+}
+#define arch_atomic64_fetch_add arch_atomic64_fetch_add
+
+#define arch_atomic64_fetch_sub(i, v) arch_atomic64_fetch_add(-(i), v)
+
static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
@@ -220,15 +232,4 @@ static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
}

-static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
-{
- s64 val = __READ_ONCE(v->counter);
-
- do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
- return val;
-}
-#define arch_atomic64_fetch_add arch_atomic64_fetch_add
-
-#define arch_atomic64_fetch_sub(i, v) arch_atomic64_fetch_add(-(i), (v))
-
#endif /* _ASM_X86_ATOMIC64_32_H */
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index b2c9974ba971..a96a4b9acfcb 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -80,11 +80,7 @@ static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
}
#define arch_atomic64_add_return arch_atomic64_add_return

-static __always_inline s64 arch_atomic64_sub_return(s64 i, atomic64_t *v)
-{
- return arch_atomic64_add_return(-i, v);
-}
-#define arch_atomic64_sub_return arch_atomic64_sub_return
+#define arch_atomic64_sub_return(i, v) arch_atomic64_add_return(-(i), v)

static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
{
@@ -92,11 +88,7 @@ static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
}
#define arch_atomic64_fetch_add arch_atomic64_fetch_add

-static __always_inline s64 arch_atomic64_fetch_sub(s64 i, atomic64_t *v)
-{
- return xadd(&v->counter, -i);
-}
-#define arch_atomic64_fetch_sub arch_atomic64_fetch_sub
+#define arch_atomic64_fetch_sub(i, v) arch_atomic64_fetch_add(-(i), v)

static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
{
--
2.44.0


2024-04-09 10:07:40

by Uros Bizjak

Subject: [PATCH 6/6] locking/atomic/x86: Reorder a couple of arch_atomic64 functions

Reorder a couple of arch_atomic64 functions in
arch/x86/include/asm/atomic64_32.h to better match the
sequence of their declarations in the x86_64 header.

No functional changes intended.

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/atomic64_32.h | 46 +++++++++++++++---------------
1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 862448db1207..3864d82a9339 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -12,7 +12,7 @@ typedef struct {
s64 __aligned(8) counter;
} atomic64_t;

-#define ATOMIC64_INIT(val) { (val) }
+#define ATOMIC64_INIT(i) { (i) }

#define __ATOMIC64_DECL(sym) void atomic64_##sym(atomic64_t *, ...)
#ifndef ATOMIC64_EXPORT
@@ -61,17 +61,21 @@ ATOMIC64_DECL(add_unless);
#undef __ATOMIC64_DECL
#undef ATOMIC64_EXPORT

-static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
+static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
{
- return arch_cmpxchg64(&v->counter, old, new);
+ s64 r;
+ alternative_atomic64(read, "=&A" (r), "c" (v) : "memory");
+ return r;
}
-#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg

-static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
{
- return arch_try_cmpxchg64(&v->counter, old, new);
+ unsigned high = (unsigned)(i >> 32);
+ unsigned low = (unsigned)i;
+ alternative_atomic64(set, /* no output */,
+ "S" (v), "b" (low), "c" (high)
+ : "eax", "edx", "memory");
}
-#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg

static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
{
@@ -85,22 +89,6 @@ static __always_inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
}
#define arch_atomic64_xchg arch_atomic64_xchg

-static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
-{
- unsigned high = (unsigned)(i >> 32);
- unsigned low = (unsigned)i;
- alternative_atomic64(set, /* no output */,
- "S" (v), "b" (low), "c" (high)
- : "eax", "edx", "memory");
-}
-
-static __always_inline s64 arch_atomic64_read(const atomic64_t *v)
-{
- s64 r;
- alternative_atomic64(read, "=&A" (r), "c" (v) : "memory");
- return r;
-}
-
static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
{
alternative_atomic64(add_return,
@@ -199,6 +187,18 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
#undef alternative_atomic64
#undef __alternative_atomic64

+static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
+{
+ return arch_cmpxchg64(&v->counter, old, new);
+}
+#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
+
+static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+{
+ return arch_try_cmpxchg64(&v->counter, old, new);
+}
+#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
+
static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
{
s64 val = __READ_ONCE(v->counter);
--
2.44.0


2024-04-09 11:09:21

by Mark Rutland

Subject: Re: [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops

On Tue, Apr 09, 2024 at 12:03:54PM +0200, Uros Bizjak wrote:
> The value preload before the cmpxchg loop does not need to be atomic,
> but should use READ_ONCE to prevent compiler from merging, refetching
> or reordering the read.
>

Yes, and that's what arch_atomic_read() and arch_atomic64_read() do...

> This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
> between x86_32 and x86_64 targets.
>
> No functional changes intended.
>
> Signed-off-by: Uros Bizjak <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> arch/x86/include/asm/atomic.h | 8 ++++----
> arch/x86/include/asm/atomic64_64.h | 20 ++++++++++----------
> 2 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> index 55a55ec04350..b166da21ee98 100644
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -20,7 +20,7 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
> * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
> * it's non-inlined function that increases binary size and stack usage.
> */
> - return __READ_ONCE((v)->counter);
> + return __READ_ONCE(v->counter);

Removing the unnecessary brackets is fine, but the commit message didn't mention this.

[...]

> static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
> {
> - int val = arch_atomic_read(v);
> + int val = __READ_ONCE(v->counter);

This is the wrong thing to do; arch_atomic_read() already has the required
semantic, and it more clearly aligns with the use of arch_atomic_try_cmpxchg()
below. It contains the documentation regarding why we use __READ_ONCE()
specifically (which we should probably note in arch_atomic64_read()).

Please leave this as-is, and likewise for the other cases below. Similarly, the
prior patch should use arch_atomic{,_64}_read() rather than using
__READ_ONCE().

[...]

> static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
> {
> - s64 val = arch_atomic64_read(v);
> + s64 val = __READ_ONCE(v->counter);
> +
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
>
> - do {
> - } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> return val;

I think this reformatting is what you meant in the commit message when you said:

| This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
| between x86_32 and x86_64 targets.

Assuming so, can you please just do that, and say:

This patch reformats the x86_64 arch_atomic{,64}_{,fetch}_{and,or,xor}()
functions to match the x86_32 versions.

Mark.

2024-04-09 11:13:26

by Mark Rutland

Subject: Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions

On Tue, Apr 09, 2024 at 12:03:53PM +0200, Uros Bizjak wrote:
> Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions to
> use arch_atomic64_try_cmpxchg. This implementation avoids one extra
> trip through the cmpxchg loop.
>
> The value preload before the cmpxchg loop does not need to be atomic,
> but should use READ_ONCE to prevent compiler from merging, refetching
> or reordering the read.
>
> The generated code improves from:
>
> 1917d5: 31 c9 xor %ecx,%ecx
> 1917d7: 31 db xor %ebx,%ebx
> 1917d9: 89 4c 24 3c mov %ecx,0x3c(%esp)
> 1917dd: 8b 74 24 24 mov 0x24(%esp),%esi
> 1917e1: 89 c8 mov %ecx,%eax
> 1917e3: 89 5c 24 34 mov %ebx,0x34(%esp)
> 1917e7: 8b 7c 24 28 mov 0x28(%esp),%edi
> 1917eb: 21 ce and %ecx,%esi
> 1917ed: 89 74 24 4c mov %esi,0x4c(%esp)
> 1917f1: 21 df and %ebx,%edi
> 1917f3: 89 de mov %ebx,%esi
> 1917f5: 89 7c 24 50 mov %edi,0x50(%esp)
> 1917f9: 8b 54 24 4c mov 0x4c(%esp),%edx
> 1917fd: 8b 7c 24 2c mov 0x2c(%esp),%edi
> 191801: 8b 4c 24 50 mov 0x50(%esp),%ecx
> 191805: 89 d3 mov %edx,%ebx
> 191807: 89 f2 mov %esi,%edx
> 191809: f0 0f c7 0f lock cmpxchg8b (%edi)
> 19180d: 89 c1 mov %eax,%ecx
> 19180f: 8b 74 24 34 mov 0x34(%esp),%esi
> 191813: 89 d3 mov %edx,%ebx
> 191815: 89 44 24 4c mov %eax,0x4c(%esp)
> 191819: 8b 44 24 3c mov 0x3c(%esp),%eax
> 19181d: 89 df mov %ebx,%edi
> 19181f: 89 54 24 44 mov %edx,0x44(%esp)
> 191823: 89 ca mov %ecx,%edx
> 191825: 31 de xor %ebx,%esi
> 191827: 31 c8 xor %ecx,%eax
> 191829: 09 f0 or %esi,%eax
> 19182b: 75 ac jne 1917d9 <...>
>
> to:
>
> 1912ba: 8b 06 mov (%esi),%eax
> 1912bc: 8b 56 04 mov 0x4(%esi),%edx
> 1912bf: 89 44 24 3c mov %eax,0x3c(%esp)
> 1912c3: 89 c1 mov %eax,%ecx
> 1912c5: 23 4c 24 34 and 0x34(%esp),%ecx
> 1912c9: 89 d3 mov %edx,%ebx
> 1912cb: 23 5c 24 38 and 0x38(%esp),%ebx
> 1912cf: 89 54 24 40 mov %edx,0x40(%esp)
> 1912d3: 89 4c 24 2c mov %ecx,0x2c(%esp)
> 1912d7: 89 5c 24 30 mov %ebx,0x30(%esp)
> 1912db: 8b 5c 24 2c mov 0x2c(%esp),%ebx
> 1912df: 8b 4c 24 30 mov 0x30(%esp),%ecx
> 1912e3: f0 0f c7 0e lock cmpxchg8b (%esi)
> 1912e7: 0f 85 f3 02 00 00 jne 1915e0 <...>
>
> Signed-off-by: Uros Bizjak <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> arch/x86/include/asm/atomic64_32.h | 44 ++++++++++++------------------
> 1 file changed, 18 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
> index 11e817dab44a..84affd7a5d1c 100644
> --- a/arch/x86/include/asm/atomic64_32.h
> +++ b/arch/x86/include/asm/atomic64_32.h
> @@ -201,69 +201,61 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
>
> static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);

I reckon it's worth placing this in a helper with a big comment, e.g.

static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
{
/*
* TODO: explain that this might be torn, but it occurs *once*, and can
* safely be consumed by atomic64_try_cmpxchg().
*
* TODO: point to the existing commentary regarding why we use
* __READ_ONCE() for KASAN reasons.
*/
return __READ_ONCE(v->counter);
}

... and then use that in each of the instances below.

That way the subtlety is clearly documented, and it'd more clearly align with
the x86_64 versions.

Mark.

>
> - while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> }
>
> static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
>
> - return old;
> + return val;
> }
> #define arch_atomic64_fetch_and arch_atomic64_fetch_and
>
> static __always_inline void arch_atomic64_or(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
> }
>
> static __always_inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
>
> - return old;
> + return val;
> }
> #define arch_atomic64_fetch_or arch_atomic64_fetch_or
>
> static __always_inline void arch_atomic64_xor(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
> }
>
> static __always_inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c)
> - c = old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
>
> - return old;
> + return val;
> }
> #define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
>
> static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
> {
> - s64 old, c = 0;
> + s64 val = __READ_ONCE(v->counter);
>
> - while ((old = arch_atomic64_cmpxchg(v, c, c + i)) != c)
> - c = old;
> -
> - return old;
> + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val + i));
> + return val;
> }
> #define arch_atomic64_fetch_add arch_atomic64_fetch_add
>
> --
> 2.44.0
>
>

2024-04-09 11:59:57

by Uros Bizjak

[permalink] [raw]
Subject: Re: [PATCH 3/6] locking/atomic/x86: Use READ_ONCE before atomic{,64}_try_cmpxchg loops

On Tue, Apr 9, 2024 at 1:07 PM Mark Rutland <[email protected]> wrote:
>
> On Tue, Apr 09, 2024 at 12:03:54PM +0200, Uros Bizjak wrote:
> > The value preload before the cmpxchg loop does not need to be atomic,
> > but should use READ_ONCE to prevent compiler from merging, refetching
> > or reordering the read.
> >
>
> Yes, and that's what arch_atomic_read() and arch_atomic64_read() do...
>
> > This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
> > between x86_32 and x86_64 targets.
> >
> > No functional changes intended.
> >
> > Signed-off-by: Uros Bizjak <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > ---
> > arch/x86/include/asm/atomic.h | 8 ++++----
> > arch/x86/include/asm/atomic64_64.h | 20 ++++++++++----------
> > 2 files changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> > index 55a55ec04350..b166da21ee98 100644
> > --- a/arch/x86/include/asm/atomic.h
> > +++ b/arch/x86/include/asm/atomic.h
> > @@ -20,7 +20,7 @@ static __always_inline int arch_atomic_read(const atomic_t *v)
> > * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
> > * it's non-inlined function that increases binary size and stack usage.
> > */
> > - return __READ_ONCE((v)->counter);
> > + return __READ_ONCE(v->counter);
>
> Removing the unnecessary brackets is fine, but the commit message didn't mention this.

It was just a change in passing. I didn't think it even needed a comment.

> [...]
>
> > static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
> > {
> > - int val = arch_atomic_read(v);
> > + int val = __READ_ONCE(v->counter);
>
> This is the wrong thing to do; arch_atomic_read() already has the required
> semantic, and it more clearly aligns with the use of arch_atomic_try_cmpxchg()
> below. It contains the documentation regarding why we use __READ_ONCE()
> specifically (which we should probably note in arch_atomic64_read()).
>
> Please leave this as-is, and likewise for the other cases below. Similarly, the
> prior patch should use arch_atomic{,_64}_read() rather than using
> __READ_ONCE().

Please note that arch_atomic64_read() implements a true 64-bit atomic read
on x86_32. I tried to bypass this using __READ_ONCE(), but your
suggestion to use arch_atomic64_read_tearable() is indeed a much
better approach.

> [...]
>
> > static __always_inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v)
> > {
> > - s64 val = arch_atomic64_read(v);
> > + s64 val = __READ_ONCE(v->counter);
> > +
> > + do { } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> >
> > - do {
> > - } while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
> > return val;
>
> I think this reformatting is what you meant in the commit message when you said:
>
> | This patch unifies arch_atomic{,64}_{,fetch}_{and,or,xor}() macros
> | between x86_32 and x86_64 targets.

Actually, it was the change to use __READ_ONCE(). The reformatting
follows the functions from atomic.h, e.g. arch_atomic_fetch_and(), and
was another change made in passing, not worth mentioning in the
commit message. I will rewrite this in v2 of the patch, so these
functions will uniformly use arch_atomic64_read_tearable().

> Assuming so, can you please just do that, and say:
>
> This patch reformats the x86_64 arch_atomic{,64}_{,fetch}_{and,or,xor}()
> functions to match the x86_32 versions.

Thanks,
Uros.

2024-04-09 12:04:23

by Uros Bizjak

[permalink] [raw]
Subject: Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions

On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <[email protected]> wrote:

> > static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> > {
> > - s64 old, c = 0;
> > + s64 val = __READ_ONCE(v->counter);
>
> I reckon it's worth placing this in a helper with a big comment, e.g.
>
> static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> {
> /*
> * TODO: explain that this might be torn, but it occurs *once*, and can
> * safely be consumed by atomic64_try_cmpxchg().
> *
> * TODO: point to the existing commentary regarding why we use
> * __READ_ONCE() for KASAN reasons.
> */
> return __READ_ONCE(v->counter);
> }
>
> ... and then use that in each of the instances below.
>
> That way the subtlety is clearly documented, and it'd more clearly align with
> the x86_64 versions.

This is an excellent idea. The separate definitions need to be placed
in atomic64_32.h and atomic64_64.h (due to use of the atomic64_t
typedef), but it will allow the same unification of functions between
x86_32 and x86_64 as the approach with __READ_ONCE().

Thanks,
Uros.

2024-04-09 12:50:44

by Uros Bizjak

[permalink] [raw]
Subject: Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions

On Tue, Apr 9, 2024 at 2:03 PM Uros Bizjak <[email protected]> wrote:
>
> On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <[email protected]> wrote:
>
> > > static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> > > {
> > > - s64 old, c = 0;
> > > + s64 val = __READ_ONCE(v->counter);
> >
> > I reckon it's worth placing this in a helper with a big comment, e.g.
> >
> > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > {
> > /*
> > * TODO: explain that this might be torn, but it occurs *once*, and can
> > * safely be consumed by atomic64_try_cmpxchg().
> > *
> > * TODO: point to the existing commentary regarding why we use
> > * __READ_ONCE() for KASAN reasons.
> > */
> > return __READ_ONCE(v->counter);
> > }
> >
> > ... and then use that in each of the instances below.
> >
> > That way the subtlety is clearly documented, and it'd more clearly align with
> > the x86_64 versions.
>
> This is an excellent idea. The separate definitions need to be placed
> in atomic64_32.h and atomic64_64.h (due to use of the atomic64_t
> typedef), but it will allow the same unification of functions between
> x86_32 and x86_64 as the approach with __READ_ONCE().

Something like this:

--cut here--
/*
* This function is intended to preload the value from atomic64_t
* location in a non-atomic way. The read might be torn, but can
* safely be consumed by the compare-and-swap loop.
*/
static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
{
/*
* See the comment in arch_atomic_read() on why we use
* __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
*/
return __READ_ONCE(v->counter);
}
--cut here--

Thanks,
Uros.

2024-04-09 16:40:48

by Uros Bizjak

[permalink] [raw]
Subject: Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions

On Tue, Apr 9, 2024 at 6:34 PM Mark Rutland <[email protected]> wrote:

> > > > ... and then use that in each of the instances below.
> > > >
> > > > That way the subtlety is clearly documented, and it'd more clearly align with
> > > > the x86_64 versions.
> > >
> > > This is an excellent idea. The separate definitions need to be placed
> > > in atomic64_32.h and atomic64_64.h (due to use of the atomic64_t
> > > typedef), but it will allow the same unification of functions between
> > > x86_32 and x86_64 as the approach with __READ_ONCE().
> >
> > Something like this:
> >
> > --cut here--
> > /*
> > * This function is intended to preload the value from atomic64_t
> > * location in a non-atomic way. The read might be torn, but can
> > * safely be consumed by the compare-and-swap loop.
> > */
> > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > {
> > /*
> > * See the comment in arch_atomic_read() on why we use
> > * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
> > */
> > return __READ_ONCE(v->counter);
> > }
> > --cut here--
> >
> > Thanks,
> > Uros.
>
> Yeah, something of that shape.
>
> Having thought for a bit longer, it's probably better to use '_torn' rather
> than '_tearable' (i.e. name this arch_atomic64_read_torn()).
>
> It'd be nice if we could specify the usage restrictions a bit more clearly,
> since this can only be used for compare-and-swap loops that implement
> unconditional atomics. (e.g. arch_atomic64_and(), but not
> arch_atomic_add_unless()).
>
> So I'd suggest:

Eh, just sent a v2 a second before I received your mail. I'll respin
the patchset tomorrow to include your suggested text. Please note that
v2 patch set avoids all cosmetic changes.

Thanks,
Uros.

>
> /*
> * Read an atomic64_t non-atomically.
> *
> * This is intended to be used in cases where a subsequent atomic operation
> * will handle the torn value, and can be used to prime the first iteration of
> * unconditional try_cmpxchg() loops, e.g.
> *
> * s64 val = arch_atomic64_read_torn(v);
> * do { } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
> *
> * This is NOT safe to use where the value is not always checked by a
> * subsequent atomic operation, such as in conditional try_cmpxchg() loops that
> * can break before the atomic, e.g.
> *
> * s64 val = arch_atomic64_read_torn(v);
> * do {
> * if (condition(val))
> * break;
> * } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
> */
> static __always_inline s64 arch_atomic64_read_torn(atomic64_t *v)
> {
> /* See comment in arch_atomic_read() */
> return __READ_ONCE(v->counter);
> }
>
> Mark.

2024-04-09 16:53:53

by Uros Bizjak

[permalink] [raw]
Subject: Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions

On Tue, Apr 9, 2024 at 6:34 PM Mark Rutland <[email protected]> wrote:

> > Something like this:
> >
> > --cut here--
> > /*
> > * This function is intended to preload the value from atomic64_t
> > * location in a non-atomic way. The read might be torn, but can
> > * safely be consumed by the compare-and-swap loop.
> > */
> > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > {
> > /*
> > * See the comment in arch_atomic_read() on why we use
> > * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
> > */
> > return __READ_ONCE(v->counter);
> > }
> > --cut here--
>
> Yeah, something of that shape.
>
> Having thought for a bit longer, it's probably better to use '_torn' rather
> than '_tearable' (i.e. name this arch_atomic64_read_torn()).

How about we simply name the function

arch_atomic64_read_nonatomic()

in the sense that it reads atomic64_t variables in a non-atomic way?

Uros.

2024-04-09 17:07:32

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions

On Tue, Apr 09, 2024 at 02:50:19PM +0200, Uros Bizjak wrote:
> On Tue, Apr 9, 2024 at 2:03 PM Uros Bizjak <[email protected]> wrote:
> >
> > On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <[email protected]> wrote:
> >
> > > > static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> > > > {
> > > > - s64 old, c = 0;
> > > > + s64 val = __READ_ONCE(v->counter);
> > >
> > > I reckon it's worth placing this in a helper with a big comment, e.g.
> > >
> > > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > > {
> > > /*
> > > * TODO: explain that this might be torn, but it occurs *once*, and can
> > > * safely be consumed by atomic64_try_cmpxchg().
> > > *
> > > * TODO: point to the existing commentary regarding why we use
> > > * __READ_ONCE() for KASAN reasons.
> > > */
> > > return __READ_ONCE(v->counter);
> > > }
> > >
> > > ... and then use that in each of the instances below.
> > >
> > > That way the subtlety is clearly documented, and it'd more clearly align with
> > > > the x86_64 versions.
> >
> > This is an excellent idea. The separate definitions need to be placed
> > in atomic64_32.h and atomic64_64.h (due to use of the atomic64_t
> > typedef), but it will allow the same unification of functions between
> > x86_32 and x86_64 as the approach with __READ_ONCE().
>
> Something like this:
>
> --cut here--
> /*
> * This function is intended to preload the value from atomic64_t
> * location in a non-atomic way. The read might be torn, but can
> * safely be consumed by the compare-and-swap loop.
> */
> static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> {
> /*
> * See the comment in arch_atomic_read() on why we use
> * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
> */
> return __READ_ONCE(v->counter);
> }
> --cut here--
>
> Thanks,
> Uros.

Yeah, something of that shape.

Having thought for a bit longer, it's probably better to use '_torn' rather
than '_tearable' (i.e. name this arch_atomic64_read_torn()).

It'd be nice if we could specify the usage restrictions a bit more clearly,
since this can only be used for compare-and-swap loops that implement
unconditional atomics. (e.g. arch_atomic64_and(), but not
arch_atomic_add_unless()).

So I'd suggest:

/*
* Read an atomic64_t non-atomically.
*
* This is intended to be used in cases where a subsequent atomic operation
* will handle the torn value, and can be used to prime the first iteration of
* unconditional try_cmpxchg() loops, e.g.
*
* s64 val = arch_atomic64_read_torn(v);
 * do { } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
*
* This is NOT safe to use where the value is not always checked by a
* subsequent atomic operation, such as in conditional try_cmpxchg() loops that
* can break before the atomic, e.g.
*
* s64 val = arch_atomic64_read_torn(v);
* do {
* if (condition(val))
* break;
 * } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
*/
static __always_inline s64 arch_atomic64_read_torn(atomic64_t *v)
{
/* See comment in arch_atomic_read() */
return __READ_ONCE(v->counter);
}

Mark.