LinuxLists.cc - [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86

2022-05-25 19:57:19

Subject: [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64

Commit bc08b449ee14ace4d869adaa1bb35a44ce68d775 enabled lockless
reference count updates using cmpxchg() only for x86_64 and
left x86_32 behind due to inability to detect support for
cmpxchg8b instruction. Nowadays, we can use CONFIG_X86_CMPXCHG64
for this purpose. Also, by using try_cmpxchg64() instead of cmpxchg()
in CMPXCHG_LOOP macro, the compiler actually produces sane code,
improving lockref_get_or_lock main loop from:

2a5: 8d 48 01 lea 0x1(%eax),%ecx
2a8: 85 c0 test %eax,%eax
2aa: 7e 3c jle 2e8 <lockref_get_or_lock+0x78>
2ac: 8b 44 24 08 mov 0x8(%esp),%eax
2b0: 8b 54 24 0c mov 0xc(%esp),%edx
2b4: 8b 74 24 04 mov 0x4(%esp),%esi
2b8: f0 0f c7 0e lock cmpxchg8b (%esi)
2bc: 8b 4c 24 0c mov 0xc(%esp),%ecx
2c0: 89 c3 mov %eax,%ebx
2c2: 89 d0 mov %edx,%eax
2c4: 8b 74 24 08 mov 0x8(%esp),%esi
2c8: 31 ca xor %ecx,%edx
2ca: 31 de xor %ebx,%esi
2cc: 09 f2 or %esi,%edx
2ce: 75 40 jne 310 <lockref_get_or_lock+0xa0>

to:

2d: 8d 4f 01 lea 0x1(%edi),%ecx
30: 85 ff test %edi,%edi
32: 7e 1c jle 50 <lockref_get_or_lock+0x50>
34: f0 0f c7 0e lock cmpxchg8b (%esi)
38: 75 36 jne 70 <lockref_get_or_lock+0x70>

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 762a0b6ab8b6..326cfdc4f136 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,7 +27,6 @@ config X86_64
# Options that are inherently 64-bit kernel only:
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
- select ARCH_USE_CMPXCHG_LOCKREF
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE
@@ -111,6 +110,7 @@ config X86
select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_USE_BUILTIN_BSWAP
+ select ARCH_USE_CMPXCHG_LOCKREF if X86_64 || (X86_32 && X86_CMPXCHG64)
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
--
2.35.3

2022-05-26 08:16:46

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64

On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <[email protected]> wrote:
>
> + select ARCH_USE_CMPXCHG_LOCKREF if X86_64 || (X86_32 && X86_CMPXCHG64)

Ugh. That looks pointlessly complicated. X86_64 already enables
X86_CMPXCHG64 afaik, so you can just say

select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64

which is much clearer: CMPXCHG_LOCKREF needs CMPXCHG64, and the
Kconfig option says exactly that.

That said, this also makes me wonder if we should just make cmpxchg8b
requirement unconditional.

Googling around, it looks like Windows NT stopped booting on CPUs
without cmpxchg8b in version 5.1. That was in 2001.

Here we are, 21 years later, and we still ostensibly try to support
CPUs without it, but I doubt it's worth it.

So maybe time to just say "we require cmpxchg8b".

In fact, I think we have effectively done that already for years, since we have

config X86_CMPXCHG64
def_bool y
depends on X86_PAE || ...

iow, enabling X86_PAE will force-enable CMPXCHG8B due to the wider
page table entries.

And I would assume that all distros basically do that anyway (but I do
not have a 32-bit distro around to check).

It would mean that we would finally drop support for i486 (and
possibly some Pentium clones, but afaik a number of them did actually
support cmpxchg8b even if they didn't report it in cpuid).

Comments?

Linus

2022-05-26 18:18:16

by Uros Bizjak

[permalink] [raw]

Subject: Re: [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64

On Thu, May 26, 2022 at 10:30 AM David Laight <[email protected]> wrote:
>
> From: Linus Torvalds
> > Sent: 25 May 2022 17:30
> >
> > On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <[email protected]> wrote:
> > >
> > > + select ARCH_USE_CMPXCHG_LOCKREF if X86_64 || (X86_32 && X86_CMPXCHG64)
> >
> > Ugh. That looks pointlessly complicated. X86_64 already enables
> > X86_CMPXCHG64 afaik, so you can just say
> >
> > select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
> >
> > which is much clearer: CMPXCHG_LOCKREF needs CMPXCHG64, and the
> > Kconfig option says exactly that.
> >
> > That said, this also makes me wonder if we should just make cmpxchg8b
> > requirement unconditional.
> >
> > Googling around, it looks like Windows NT stopped booting on CPUs
> > without cmpxchg8b in version 5.1. That was in 2001.
> >
> > Here we are, 21 years later, and we still ostensibly try to support
> > CPUs without it, but I doubt it's worth it.
> >
> > So maybe time to just say "we require cmpxchg8b".
> >
> > In fact, I think we have effectively done that already for years, since we have
> >
> > config X86_CMPXCHG64
> > def_bool y
> > depends on X86_PAE || ...
> >
> > iow, enabling X86_PAE will force-enable CMPXCHG8B due to the wider
> > page table entries.
> >
> > And I would assume that all distros basically do that anyway (but I do
> > not have a 32-bit distro around to check).
> >
> > It would mean that we would finally drop support for i486 (and
> > possibly some Pentium clones, but afaik a number of them did actually
> > support cmpxchg8b even if they didn't report it in cpuid).
>
> Perhaps there could be a non-smp implementation of cmpxchg8b
> that just disables interrupts?
>
> While I have used a dual 486 I doubt Linux would run ever
> have on it. The same is probably true for old dual Pentiums.
>
> I think there are still some 486-class embedded cpu that include
> a few of the newer instructions (usually things like rdtsc).
> But they won't be smp.

The above solution is already implemented when X86_CMPXCHG64 is not
defined. Please see arch/x86/include/asm/cmpxchg_32.h, where in this
case we have:

#define arch_cmpxchg64(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
__typeof__(*(ptr)) __old = (o); \
__typeof__(*(ptr)) __new = (n); \
alternative_io(LOCK_PREFIX_HERE \
"call cmpxchg8b_emu", \
"lock; cmpxchg8b (%%esi)" , \
X86_FEATURE_CX8, \
"=A" (__ret), \
"S" ((ptr)), "0" (__old), \
"b" ((unsigned int)__new), \
"c" ((unsigned int)(__new>>32)) \
: "memory"); \
__ret; })

Without CX8, the code will be patched to the call to cmpxchg8b_emu,
defined in arch/x86/lib/cmpxchg8b_emu.S.

I think that the solution with CONFIG_X86_CMPXCHG64 is quite good.
When defined, arch_cmpxchg64 defines the real CMPXCHG8B instruction.
Without the config flag, the above definition is compiled in and the
code is patched to either the call or the real instruction during
runtime. Also, the config flag makes difference only in
arch/x86/include/asm/cmpxchg_32.h, and with try_cmpxchg64 fallbacks,
one can still use cmpxchg64/try_cmpxchg64 even when the HW doesn't
support CMPXCHG8B natively.

Uros.

2022-05-27 07:55:48

by David Laight

[permalink] [raw]

Subject: RE: [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64

From: Linus Torvalds
> Sent: 25 May 2022 17:30
>
> On Wed, May 25, 2022 at 7:40 AM Uros Bizjak <[email protected]> wrote:
> >
> > + select ARCH_USE_CMPXCHG_LOCKREF if X86_64 || (X86_32 && X86_CMPXCHG64)
>
> Ugh. That looks pointlessly complicated. X86_64 already enables
> X86_CMPXCHG64 afaik, so you can just say
>
> select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
>
> which is much clearer: CMPXCHG_LOCKREF needs CMPXCHG64, and the
> Kconfig option says exactly that.
>
> That said, this also makes me wonder if we should just make cmpxchg8b
> requirement unconditional.
>
> Googling around, it looks like Windows NT stopped booting on CPUs
> without cmpxchg8b in version 5.1. That was in 2001.
>
> Here we are, 21 years later, and we still ostensibly try to support
> CPUs without it, but I doubt it's worth it.
>
> So maybe time to just say "we require cmpxchg8b".
>
> In fact, I think we have effectively done that already for years, since we have
>
> config X86_CMPXCHG64
> def_bool y
> depends on X86_PAE || ...
>
> iow, enabling X86_PAE will force-enable CMPXCHG8B due to the wider
> page table entries.
>
> And I would assume that all distros basically do that anyway (but I do
> not have a 32-bit distro around to check).
>
> It would mean that we would finally drop support for i486 (and
> possibly some Pentium clones, but afaik a number of them did actually
> support cmpxchg8b even if they didn't report it in cpuid).

Perhaps there could be a non-smp implementation of cmpxchg8b
that just disables interrupts?

While I have used a dual 486 I doubt Linux would run ever
have on it. The same is probably true for old dual Pentiums.

I think there are still some 486-class embedded cpu that include
a few of the newer instructions (usually things like rdtsc).
But they won't be smp.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-05-27 15:52:04

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 2/2] locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_32 && X86_CMPXCHG64

On Thu, May 26, 2022 at 1:30 AM David Laight <[email protected]> wrote:
>
> Perhaps there could be a non-smp implementation of cmpxchg8b
> that just disables interrupts?

As Uros points out, we do have exactly that, but it's not actually
written to be usable for the "trylock" case. Plus it would be
pointless for lockrefs, since the non-cmpxchg case that just falls
back to a spinlock would be faster and simpler (particularly on UP,
where locking goes away).

> While I have used a dual 486 I doubt Linux would run ever
> have on it. The same is probably true for old dual Pentiums.

Yeah, I don't think we ever supported SMP on i486, afaik they all
needed special system glue.

I think the "modern" x86 SMP support with a local APIC was a PPro and
newer thing historically, but clearly there were then later what
amounted to Penitum/MMX class cores (ie old Atom) that did support
SMP.

Linus