2023-09-16 13:15:47

by Greg Ungerer

Subject: Re: [PATCH 09/17] m68k: Implement xor_unlock_is_negative_byte



On 16/9/23 04:36, Matthew Wilcox (Oracle) wrote:
> Using EOR to clear the guaranteed-to-be-set lock bit will test the
> negative flag just like the x86 implementation. This should be
> more efficient than the generic implementation in filemap.c. It
> would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> arch/m68k/include/asm/bitops.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index e984af71df6b..909ebe7cab5d 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -319,6 +319,20 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> return test_and_change_bit(nr, addr);
> }
>
> +static inline bool xor_unlock_is_negative_byte(unsigned long mask,
> +		volatile unsigned long *p)
> +{
> +	char result;
> +	char *cp = (char *)p + 3;	/* m68k is big-endian */
> +
> +	__asm__ __volatile__ ("eor.b %1, %2; smi %0"

The ColdFire members of the 68k family do not support byte size eor:

CC mm/filemap.o
{standard input}: Assembler messages:
{standard input}:824: Error: invalid instruction for this architecture; needs 68000 or higher (68000 [68ec000, 68hc000, 68hc001, 68008, 68302, 68306, 68307, 68322, 68356], 68010, 68020 [68k, 68ec020], 68030 [68ec030], 68040 [68ec040], 68060 [68ec060], cpu32 [68330, 68331, 68332, 68333, 68334, 68336, 68340, 68341, 68349, 68360], fidoa [fido]) -- statement `eor.b #1,3(%a0)' ignored

Regards
Greg




> +	: "=d" (result)
> +	: "di" (mask), "o" (*cp)
> +	: "memory");
> +	return result;
> +}
> +#define xor_unlock_is_negative_byte xor_unlock_is_negative_byte
> +
> /*
> * The true 68020 and more advanced processors support the "bfffo"
> * instruction for finding bits. ColdFire and simple 68000 parts


2023-09-16 19:57:49

by Matthew Wilcox

Subject: Re: [PATCH 09/17] m68k: Implement xor_unlock_is_negative_byte

On Sat, Sep 16, 2023 at 11:11:32PM +1000, Greg Ungerer wrote:
> On 16/9/23 04:36, Matthew Wilcox (Oracle) wrote:
> > Using EOR to clear the guaranteed-to-be-set lock bit will test the
> > negative flag just like the x86 implementation. This should be
> > more efficient than the generic implementation in filemap.c. It
> > would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > ---
> > arch/m68k/include/asm/bitops.h | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> > index e984af71df6b..909ebe7cab5d 100644
> > --- a/arch/m68k/include/asm/bitops.h
> > +++ b/arch/m68k/include/asm/bitops.h
> > @@ -319,6 +319,20 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> > return test_and_change_bit(nr, addr);
> > }
> > +static inline bool xor_unlock_is_negative_byte(unsigned long mask,
> > + volatile unsigned long *p)
> > +{
> > + char result;
> > + char *cp = (char *)p + 3; /* m68k is big-endian */
> > +
> > + __asm__ __volatile__ ("eor.b %1, %2; smi %0"
>
> The ColdFire members of the 68k family do not support byte size eor:
>
> CC mm/filemap.o
> {standard input}: Assembler messages:
> {standard input}:824: Error: invalid instruction for this architecture; needs 68000 or higher (68000 [68ec000, 68hc000, 68hc001, 68008, 68302, 68306, 68307, 68322, 68356], 68010, 68020 [68k, 68ec020], 68030 [68ec030], 68040 [68ec040], 68060 [68ec060], cpu32 [68330, 68331, 68332, 68333, 68334, 68336, 68340, 68341, 68349, 68360], fidoa [fido]) -- statement `eor.b #1,3(%a0)' ignored

Well, that sucks. What do you suggest for Coldfire?

(Shame you didn't join in on the original discussion:
https://lore.kernel.org/linux-m68k/[email protected]/ )

2023-09-18 17:11:24

by Greg Ungerer

Subject: Re: [PATCH 09/17] m68k: Implement xor_unlock_is_negative_byte


On 17/9/23 00:34, Matthew Wilcox wrote:
> On Sat, Sep 16, 2023 at 11:11:32PM +1000, Greg Ungerer wrote:
>> On 16/9/23 04:36, Matthew Wilcox (Oracle) wrote:
>>> Using EOR to clear the guaranteed-to-be-set lock bit will test the
>>> negative flag just like the x86 implementation. This should be
>>> more efficient than the generic implementation in filemap.c. It
>>> would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__.
>>>
>>> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
>>> ---
>>> arch/m68k/include/asm/bitops.h | 14 ++++++++++++++
>>> 1 file changed, 14 insertions(+)
>>>
>>> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
>>> index e984af71df6b..909ebe7cab5d 100644
>>> --- a/arch/m68k/include/asm/bitops.h
>>> +++ b/arch/m68k/include/asm/bitops.h
>>> @@ -319,6 +319,20 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>>> return test_and_change_bit(nr, addr);
>>> }
>>> +static inline bool xor_unlock_is_negative_byte(unsigned long mask,
>>> + volatile unsigned long *p)
>>> +{
>>> + char result;
>>> + char *cp = (char *)p + 3; /* m68k is big-endian */
>>> +
>>> + __asm__ __volatile__ ("eor.b %1, %2; smi %0"
>>
>> The ColdFire members of the 68k family do not support byte size eor:
>>
>> CC mm/filemap.o
>> {standard input}: Assembler messages:
>> {standard input}:824: Error: invalid instruction for this architecture; needs 68000 or higher (68000 [68ec000, 68hc000, 68hc001, 68008, 68302, 68306, 68307, 68322, 68356], 68010, 68020 [68k, 68ec020], 68030 [68ec030], 68040 [68ec040], 68060 [68ec060], cpu32 [68330, 68331, 68332, 68333, 68334, 68336, 68340, 68341, 68349, 68360], fidoa [fido]) -- statement `eor.b #1,3(%a0)' ignored
>
> Well, that sucks. What do you suggest for Coldfire?

I am not seeing an easy way to not fall back to something like the MIPS
implementation for ColdFire. Could obviously assemblerize this to do better
than gcc, but if it has to be atomic I think we are stuck with the irq locking.

static inline bool cf_xor_is_negative_byte(unsigned long mask,
		volatile unsigned long *addr)
{
	unsigned long flags;
	unsigned long data;

	local_irq_save(flags);
	data = *addr;
	*addr = data ^ mask;
	local_irq_restore(flags);

	return (data & BIT(7)) != 0;
}

Regards
Greg


> (Shame you didn't join in on the original discussion:
> https://lore.kernel.org/linux-m68k/[email protected]/ )

2023-09-19 14:00:53

by David Laight

Subject: RE: [PATCH 09/17] m68k: Implement xor_unlock_is_negative_byte

> Well, that sucks. What do you suggest for Coldfire?

Can you just do a 32-bit xor?
Unless you've got SMP m68k I'd presume it is OK?
(And assuming you aren't falling off a page.)
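[A minimal C sketch of the whole-word idea, for illustration only. The function name is hypothetical, and note that the plain load/xor/store below is still not atomic against an interrupt taken between the load and the store; it only shows the shape of the operation:]

```c
#include <stdbool.h>

/* Sketch of the 32-bit xor suggestion: operate on the whole long rather
 * than the byte.  On big-endian m68k the lock byte is the
 * least-significant byte of the word, so PG_waiters (bit 7) can be
 * tested in the long directly.  Hypothetical name; not atomic as
 * written. */
static bool xor_unlock_is_negative_byte_32(unsigned long mask,
					   volatile unsigned long *p)
{
	unsigned long data = *p ^ mask;	/* would be a single eor.l */

	*p = data;
	return (data & 0x80) != 0;	/* bit 7 of the lock byte */
}
```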

David


2023-09-19 14:50:13

by Matthew Wilcox

Subject: Re: [PATCH 09/17] m68k: Implement xor_unlock_is_negative_byte

On Tue, Sep 19, 2023 at 01:23:08PM +0000, David Laight wrote:
> > Well, that sucks. What do you suggest for Coldfire?
>
> Can you just do a 32-bit xor?
> Unless you've got SMP m68k I'd presume it is OK?
> (And assuming you aren't falling off a page.)

Patch welcome.

2023-09-20 08:02:18

by Greg Ungerer

Subject: Re: [PATCH 09/17] m68k: Implement xor_unlock_is_negative_byte


On 19/9/23 00:37, Greg Ungerer wrote:
> On 17/9/23 00:34, Matthew Wilcox wrote:
>> On Sat, Sep 16, 2023 at 11:11:32PM +1000, Greg Ungerer wrote:
>>> On 16/9/23 04:36, Matthew Wilcox (Oracle) wrote:
>>>> Using EOR to clear the guaranteed-to-be-set lock bit will test the
>>>> negative flag just like the x86 implementation.  This should be
>>>> more efficient than the generic implementation in filemap.c.  It
>>>> would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__.
>>>>
>>>> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
>>>> ---
>>>>    arch/m68k/include/asm/bitops.h | 14 ++++++++++++++
>>>>    1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
>>>> index e984af71df6b..909ebe7cab5d 100644
>>>> --- a/arch/m68k/include/asm/bitops.h
>>>> +++ b/arch/m68k/include/asm/bitops.h
>>>> @@ -319,6 +319,20 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>>>>        return test_and_change_bit(nr, addr);
>>>>    }
>>>> +static inline bool xor_unlock_is_negative_byte(unsigned long mask,
>>>> +        volatile unsigned long *p)
>>>> +{
>>>> +    char result;
>>>> +    char *cp = (char *)p + 3;    /* m68k is big-endian */
>>>> +
>>>> +    __asm__ __volatile__ ("eor.b %1, %2; smi %0"
>>>
>>> The ColdFire members of the 68k family do not support byte size eor:
>>>
>>>    CC      mm/filemap.o
>>> {standard input}: Assembler messages:
>>> {standard input}:824: Error: invalid instruction for this architecture; needs 68000 or higher (68000 [68ec000, 68hc000, 68hc001, 68008, 68302, 68306, 68307, 68322, 68356], 68010, 68020 [68k, 68ec020], 68030 [68ec030], 68040 [68ec040], 68060 [68ec060], cpu32 [68330, 68331, 68332, 68333, 68334, 68336, 68340, 68341, 68349, 68360], fidoa [fido]) -- statement `eor.b #1,3(%a0)' ignored
>>
>> Well, that sucks.  What do you suggest for Coldfire?
>
> I am not seeing an easy way to not fall back to something like the MIPS
> implementation for ColdFire. Could obviously assemblerize this to do better
> than gcc, but if it has to be atomic I think we are stuck with the irq locking.
>
> static inline bool cf_xor_is_negative_byte(unsigned long mask,
>                 volatile unsigned long *addr)
> {
>         unsigned long flags;
>         unsigned long data;
>
>         local_irq_save(flags);
>         data = *addr;
>         *addr = data ^ mask;
>         local_irq_restore(flags);
>
>         return (data & BIT(7)) != 0;
> }

The problem with this C implementation is that it needs to use local_irq_save(),
which results in some ugly header dependencies trying to include irqflags.h.

This version at least compiles and runs, though we can probably do better still.


diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index e984af71df6b..99392c26e784 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -319,6 +319,48 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
return test_and_change_bit(nr, addr);
}

+static inline bool cf_xor_unlock_is_negative_byte(unsigned long mask,
+		volatile unsigned long *addr)
+{
+	unsigned long data;
+
+	asm volatile (
+		"move.w	%%sr,%%d1	\n\t"
+		"move.w	%%d1,%%d0	\n\t"
+		"ori.l	#0x0700,%%d0	\n\t"
+		"move.w	%%d0,%%sr	\n\t"
+
+		"move.l	%2@,%0		\n\t"
+		"eor.l	%1,%0		\n\t"
+		"move.l	%0,%2@		\n\t"
+
+		"move.w	%%d1,%%sr	\n"
+		: "=d" (data)
+		: "di" (mask), "a" (addr)
+		: "cc", "%d0", "%d1", "memory");
+
+	return (data & BIT(7)) != 0;
+}
+
+static inline bool m68k_xor_unlock_is_negative_byte(unsigned long mask,
+		volatile unsigned long *p)
+{
+	char result;
+	char *cp = (char *)p + 3;	/* m68k is big-endian */
+
+	__asm__ __volatile__ ("eor.b %1, %2; smi %0"
+		: "=d" (result)
+		: "di" (mask), "o" (*cp)
+		: "memory");
+	return result;
+}
+
+#if defined(CONFIG_COLDFIRE)
+#define xor_unlock_is_negative_byte(mask, p) cf_xor_unlock_is_negative_byte(mask, p)
+#else
+#define xor_unlock_is_negative_byte(mask, p) m68k_xor_unlock_is_negative_byte(mask, p)
+#endif
+
/*
* The true 68020 and more advanced processors support the "bfffo"
* instruction for finding bits. ColdFire and simple 68000 parts


Regards
Greg

2023-09-21 02:42:53

by Linus Torvalds

Subject: Re: [PATCH 09/17] m68k: Implement xor_unlock_is_negative_byte

On Wed, 20 Sept 2023 at 00:45, Greg Ungerer <[email protected]> wrote:
>
> The problem with this C implementation is that it needs to use local_irq_save(),
> which results in some ugly header dependencies trying to include irqflags.h.
>
> This version at least compiles and runs, though we can probably do better still.

I was going to say "can't you use CAS?" but apparently coldfire
doesn't have that either. What a horrible thing.

I do wonder if we should just say "let's use the top bit instead"?

The reason we used bit #7 is that

- only x86 used to do the clear_bit_unlock_is_negative_byte

- it was easy with a simple "andb".

- it was just a trivial "move two bits around".

but honestly, on x86, doing it with "andl/andq" shouldn't be any
worse, and we can still use a (sign-extended) 8-bit immediate value to
xor the low seven bits and test the high bit at the same time - so it
should be basically exactly the same code sequence.
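[A rough C illustration of that shape, purely as a sketch: hypothetical name, plain C rather than the atomic asm the kernel would need, and ignoring the zone/node/section packing problem:]

```c
#include <stdbool.h>

/* Sketch of the "use the top bit" variant: xor the lock bit out of the
 * whole flags word and test the sign of the full-width result, so any
 * machine's native-width xor plus a signed compare suffices.
 * Hypothetical name; a real version would need to be atomic. */
static bool xor_unlock_is_negative_word(unsigned long mask,
					volatile unsigned long *p)
{
	unsigned long data = *p ^ mask;

	*p = data;
	return (long)data < 0;	/* PG_waiters assumed in the sign bit */
}
```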

There's a question about mixing access widths - which can be deadly to
performance on x86 - but generally this op should be the only op on
the page flags in that sequence, and doing a byte access isn't
necessarily better.

Of course, using the top bit then screws with all the
zone/node/section/lru_gen bits that we currently put in the high bits.
So it's absolutely *not* just a trivial "move this bit" operation.

It would change all the <linux/page-flags-layout.h> games we do.

That might be enough for any sane person to go "this is not worth it",
but *if* Willy goes "I like the bit twiddling games", maybe he'd be
willing to look at that.

I mean, he wrote alpha assembler for this, that certainly says
*something* about Willy ;)

Willy?

Linus