2022-06-26 21:03:20

by Uros Bizjak

Subject: [PATCH v2 RESEND] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_CMPXCHG64

Commit bc08b449ee14ace4d869adaa1bb35a44ce68d775 enabled lockless
reference count updates using cmpxchg() only for x86_64 and left
x86_32 behind due to the inability to detect support for the
cmpxchg8b instruction. Nowadays, CONFIG_X86_CMPXCHG64 can be used
for this purpose. Also, by using try_cmpxchg64() instead of
cmpxchg64() in the CMPXCHG_LOOP macro, the compiler actually
produces sane code, improving the lockref_get_or_lock() main loop
from:

2a5: 8d 48 01 lea 0x1(%eax),%ecx
2a8: 85 c0 test %eax,%eax
2aa: 7e 3c jle 2e8 <lockref_get_or_lock+0x78>
2ac: 8b 44 24 08 mov 0x8(%esp),%eax
2b0: 8b 54 24 0c mov 0xc(%esp),%edx
2b4: 8b 74 24 04 mov 0x4(%esp),%esi
2b8: f0 0f c7 0e lock cmpxchg8b (%esi)
2bc: 8b 4c 24 0c mov 0xc(%esp),%ecx
2c0: 89 c3 mov %eax,%ebx
2c2: 89 d0 mov %edx,%eax
2c4: 8b 74 24 08 mov 0x8(%esp),%esi
2c8: 31 ca xor %ecx,%edx
2ca: 31 de xor %ebx,%esi
2cc: 09 f2 or %esi,%edx
2ce: 75 40 jne 310 <lockref_get_or_lock+0xa0>

to:

2d: 8d 4f 01 lea 0x1(%edi),%ecx
30: 85 ff test %edi,%edi
32: 7e 1c jle 50 <lockref_get_or_lock+0x50>
34: f0 0f c7 0e lock cmpxchg8b (%esi)
38: 75 36 jne 70 <lockref_get_or_lock+0x70>
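
For reference, the loop in question has roughly the following shape
(a simplified C sketch of the CMPXCHG_LOOP pattern, not the exact
lib/lockref.c code; assumes <linux/lockref.h> and <linux/spinlock.h>):

/* Sketch: fast path built on plain cmpxchg64(). */
static bool lockref_get_cmpxchg64_sketch(struct lockref *lockref)
{
	struct lockref old;

	old.lock_count = READ_ONCE(lockref->lock_count);
	while (arch_spin_value_unlocked(old.lock.rlock.raw_lock)) {
		struct lockref new = old;
		u64 prev;

		new.count++;
		/*
		 * cmpxchg64() returns the previous value, so it has to be
		 * compared against the expected value and carried over to
		 * the next iteration by hand:
		 */
		prev = cmpxchg64(&lockref->lock_count,
				 old.lock_count, new.lock_count);
		if (prev == old.lock_count)
			return true;	/* fast path succeeded */
		old.lock_count = prev;
	}
	return false;			/* caller falls back to the spinlock */
}

/*
 * The same loop with try_cmpxchg64(): the boolean result maps directly
 * onto the ZF flag set by cmpxchg8b, and 'old' is updated in place on
 * failure, so no reload and manual 64-bit compare is needed.
 */
static bool lockref_get_try_cmpxchg64_sketch(struct lockref *lockref)
{
	struct lockref old;

	old.lock_count = READ_ONCE(lockref->lock_count);
	while (arch_spin_value_unlocked(old.lock.rlock.raw_lock)) {
		struct lockref new = old;

		new.count++;
		if (try_cmpxchg64(&lockref->lock_count,
				  &old.lock_count, new.lock_count))
			return true;	/* fast path succeeded */
	}
	return false;			/* caller falls back to the spinlock */
}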

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
v2:
- select ARCH_USE_CMPXCHG_LOCKREF for CONFIG_X86_CMPXCHG64, which
is unconditionally enabled for X86_64
---
arch/x86/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index be0b95e51df6..22555e0c894d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,7 +27,6 @@ config X86_64
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
-	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
 	select NEED_DMA_MAP_STATE
@@ -111,6 +110,7 @@ config X86
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_USE_BUILTIN_BSWAP
+	select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
 	select ARCH_USE_MEMTEST
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
--
2.35.3


2022-07-03 21:08:35

by Linus Torvalds

Subject: Re: [PATCH v2 RESEND] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_CMPXCHG64

On Sun, Jun 26, 2022 at 1:18 PM Uros Bizjak <[email protected]> wrote:
>
> Also, by using try_cmpxchg64() instead of cmpxchg64()
> in CMPXCHG_LOOP macro, the compiler actually produces sane code,
> improving lockref_get_or_lock main loop from:

Heh. I'm actually looking at that function because I committed my "add
sparse annotation for conditional locking" patch, and
lockref_get_or_lock() has the wrong "polarity" for conditional locking
(it returns false when it takes the lock).

But then I started looking closer, and that function has no users any
more. In fact, it hasn't had users since back in 2013.

So while I still think ARCH_USE_CMPXCHG_LOCKREF is fine for 32-bit
x86, the part about improving lockref_get_or_lock() code generation is
kind of pointless. I'm going to remove that function as "unused, and
with the wrong return value".
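
For reference, the function is roughly of this shape (a simplified
sketch of the slow path only, not the exact lib/lockref.c code):

/*
 * Note the return convention: true means "count was updated, no lock
 * held", false means "the spinlock is now held by the caller", which
 * is the opposite polarity of e.g. spin_trylock().
 */
static bool lockref_get_or_lock_sketch(struct lockref *lockref)
{
	/* Lockless CMPXCHG_LOOP fast path elided for brevity. */

	spin_lock(&lockref->lock);
	if (lockref->count <= 0)
		return false;		/* lock stays held on return */
	lockref->count++;
	spin_unlock(&lockref->lock);
	return true;
}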

Linus

2022-09-10 06:31:34

by Uros Bizjak

Subject: Re: [PATCH v2 RESEND] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_CMPXCHG64

On Sun, Jul 3, 2022 at 11:01 PM Linus Torvalds
<[email protected]> wrote:
>
> On Sun, Jun 26, 2022 at 1:18 PM Uros Bizjak <[email protected]> wrote:
> >
> > Also, by using try_cmpxchg64() instead of cmpxchg64()
> > in CMPXCHG_LOOP macro, the compiler actually produces sane code,
> > improving lockref_get_or_lock main loop from:
>
> Heh. I'm actually looking at that function because I committed my "add
> sparse annotation for conditional locking" patch, and
> lockref_get_or_lock() has the wrong "polarity" for conditional locking
> (it returns false when it takes the lock).
>
> But then I started looking closer, and that function has no users any
> more. In fact, it hasn't had users since back in 2013.
>
> So while I still think ARCH_USE_CMPXCHG_LOCKREF is fine for 32-bit
> x86, the part about improving lockref_get_or_lock() code generation is
> kind of pointless. I'm going to remove that function as "unused, and
> with the wrong return value".

May I consider this message as a formal Acked-by: for the patch? I'll
resubmit the patch with a commit message updated to reference
lockref_put_not_zero instead of the removed lockref_get_or_lock.

Thanks,
Uros.

2022-09-10 17:30:59

by Linus Torvalds

Subject: Re: [PATCH v2 RESEND] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_CMPXCHG64

On Sat, Sep 10, 2022 at 2:28 AM Uros Bizjak <[email protected]> wrote:
>
> May I consider this message as a formal Acked-by: for the patch? I'll
> resubmit the patch with a commit message updated to reference
> lockref_put_not_zero instead of the removed lockref_get_or_lock.

Sure, sounds good to me.

It would be particularly nice if you can also see any change in
performance numbers - but that simply may not be possible.

32-bit x86 tends to also be very low core count, so the whole lockref
thing may or may not be measurable (no practical contention on the
lock), but the code certainly seems to get better.

Linus

2022-09-12 10:45:30

by Uros Bizjak

Subject: Re: [PATCH v2 RESEND] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_CMPXCHG64

On Sat, Sep 10, 2022 at 7:23 PM Linus Torvalds
<[email protected]> wrote:
>
> On Sat, Sep 10, 2022 at 2:28 AM Uros Bizjak <[email protected]> wrote:
> >
> > May I consider this message as a formal Acked-by: for the patch? I'll
> > resubmit the patch with a commit message updated to reference
> > lockref_put_not_zero instead of the removed lockref_get_or_lock.
>
> Sure, sounds good to me.
>
> It would be particularly nice if you can also see any change in
> performance numbers - but that simply may not be possible.
>
> 32-bit x86 tends to also be very low core count, so the whole lockref
> thing may or may not be measurable (no practical contention on the
> lock), but the code certainly seems to get better.

I tested this patch on an old Core 2 Duo in 32-bit mode, mainly to
test my try_cmpxchg64 patch on 32-bit targets. There were no
observable changes in run time, but we are talking about a two-core
system here. OTOH, there were considerable code-size savings, as
noted in the patch submission.

Please also note that I am aware that changes to the default 32-bit
configuration fall into vintage computing territory nowadays.
However, a couple of enthusiast old-timers would still like to
squeeze some more juice out of their old rigs (e.g. m68k, alpha and
x86-32) by using all the processor features available to them. OTOH,
I don't want to burden the x86 maintainers with my hobby, but here we
are talking about such low-hanging fruit that it warrants the
one-line change. With the same patch, the default config also gets
cleaned up a bit and made a bit more consistent for x86_64.

Thanks,
Uros.