2023-05-09 15:18:04

by Uros Bizjak

Subject: [PATCH] atomics: Use atomic_try_cmpxchg_release in rcuref_put_slowpath()

Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
in rcuref_put_slowpath(). On x86 the CMPXCHG instruction returns success in
the ZF flag, so this change saves a compare after cmpxchg. Additionally,
the compiler reorders some code blocks to follow the likely/unlikely
annotations in the atomic_try_cmpxchg() macro, improving the code from

9a: f0 0f b1 0b lock cmpxchg %ecx,(%rbx)
9e: 83 f8 ff cmp $0xffffffff,%eax
a1: 74 04 je a7 <rcuref_put_slowpath+0x27>
a3: 31 c0 xor %eax,%eax

to

9a: f0 0f b1 0b lock cmpxchg %ecx,(%rbx)
9e: 75 4c jne ec <rcuref_put_slowpath+0x6c>
a0: b0 01 mov $0x1,%al

No functional change intended.

Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
lib/rcuref.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/rcuref.c b/lib/rcuref.c
index 5ec00a4a64d1..97f300eca927 100644
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -248,7 +248,7 @@ bool rcuref_put_slowpath(rcuref_t *ref)
* require a retry. If this fails the caller is not
* allowed to deconstruct the object.
*/
- if (atomic_cmpxchg_release(&ref->refcnt, RCUREF_NOREF, RCUREF_DEAD) != RCUREF_NOREF)
+ if (!atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))
return false;

/*
--
2.40.1


2023-05-10 11:39:07

by David Laight

Subject: RE: [PATCH] atomics: Use atomic_try_cmpxchg_release in rcuref_put_slowpath()

From: Uros Bizjak
> Sent: 09 May 2023 16:03
>
> Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
> in rcuref_put_slowpath(). On x86 the CMPXCHG instruction returns success in
> the ZF flag, so this change saves a compare after cmpxchg. Additionally,
> the compiler reorders some code blocks to follow the likely/unlikely
> annotations in the atomic_try_cmpxchg() macro, improving the code from
>
> 9a: f0 0f b1 0b lock cmpxchg %ecx,(%rbx)
> 9e: 83 f8 ff cmp $0xffffffff,%eax
> a1: 74 04 je a7 <rcuref_put_slowpath+0x27>
> a3: 31 c0 xor %eax,%eax
>
> to
>
> 9a: f0 0f b1 0b lock cmpxchg %ecx,(%rbx)
> 9e: 75 4c jne ec <rcuref_put_slowpath+0x6c>
> a0: b0 01 mov $0x1,%al
>
> No functional change intended.

While I'm not against the change, I bet you can't detect
any actual difference. IIRC:
- The 'cmp+je' pair gets macro-fused into a single u-op.
- The 'lock cmpxchg' will take long enough that the instruction
  decoder won't be a bottleneck.
- Whether the je/jne is predicted taken is pretty much random,
  so you'll speculatively execute somewhere (could be anywhere)
  while the locked cycle completes.
So the only change is three fewer bytes of object code,
which will change the cache line alignment of later code.

David



Subject: [tip: locking/core] locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 4fbf8b136ded943f8661cf48270482ad1f5ce7bd
Gitweb: https://git.kernel.org/tip/4fbf8b136ded943f8661cf48270482ad1f5ce7bd
Author: Uros Bizjak <[email protected]>
AuthorDate: Tue, 09 May 2023 17:02:55 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 10 Oct 2023 10:14:27 +02:00

locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()

Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
in rcuref_put_slowpath(). On x86 the CMPXCHG instruction returns success in the
ZF flag, so this change saves a compare after CMPXCHG. Additionally,
the compiler reorders some code blocks to follow likely/unlikely
annotations in the atomic_try_cmpxchg() macro, improving the code from:

9a: f0 0f b1 0b lock cmpxchg %ecx,(%rbx)
9e: 83 f8 ff cmp $0xffffffff,%eax
a1: 74 04 je a7 <rcuref_put_slowpath+0x27>
a3: 31 c0 xor %eax,%eax

to:

9a: f0 0f b1 0b lock cmpxchg %ecx,(%rbx)
9e: 75 4c jne ec <rcuref_put_slowpath+0x6c>
a0: b0 01 mov $0x1,%al

No functional change intended.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
lib/rcuref.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/rcuref.c b/lib/rcuref.c
index 5ec00a4..97f300e 100644
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -248,7 +248,7 @@ bool rcuref_put_slowpath(rcuref_t *ref)
* require a retry. If this fails the caller is not
* allowed to deconstruct the object.
*/
- if (atomic_cmpxchg_release(&ref->refcnt, RCUREF_NOREF, RCUREF_DEAD) != RCUREF_NOREF)
+ if (!atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))
return false;

/*