2022-09-20 07:29:15

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

local_paca is declared as global register asm("r13"), it is therefore
garantied to always ever be r13.

It is therefore not required to opencode r13 in the assembly, use
a reference to local_paca->irq_soft_mask instead.

This also allows removing the 'memory' clobber in irq_soft_mask_set()
as GCC now knows what is being modified in memory.

Using %X modifier will give GCC a bit more flexibility on the code
generation.

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/include/asm/hw_irq.h | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index e8de249339d8..dbe037ff4474 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -115,10 +115,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
{
unsigned long flags;

- asm volatile(
- "lbz %0,%1(13)"
- : "=r" (flags)
- : "i" (offsetof(struct paca_struct, irq_soft_mask)));
+ asm volatile("lbz%X1 %0,%1" : "=r" (flags) : "m" (local_paca->irq_soft_mask));

return flags;
}
@@ -147,12 +144,7 @@ static inline notrace void irq_soft_mask_set(unsigned long mask)
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
WARN_ON(mask && !(mask & IRQS_DISABLED));

- asm volatile(
- "stb %0,%1(13)"
- :
- : "r" (mask),
- "i" (offsetof(struct paca_struct, irq_soft_mask))
- : "memory");
+ asm volatile("stb%X0 %1,%0" : "=m" (local_paca->irq_soft_mask) : "r" (mask));
}

static inline notrace unsigned long irq_soft_mask_set_return(unsigned long mask)
--
2.37.1


2022-09-23 07:47:29

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

On Tue Sep 20, 2022 at 4:41 PM AEST, Christophe Leroy wrote:
> local_paca is declared as global register asm("r13"), it is therefore
> garantied to always ever be r13.
>
> It is therefore not required to opencode r13 in the assembly, use
> a reference to local_paca->irq_soft_mask instead.
>
> This also allows removing the 'memory' clobber in irq_soft_mask_set()
> as GCC now knows what is being modified in memory.

The code matches the changelog AFAIKS. But I don't know where it is
guaranteed it will always be r13 in GCC and Clang. I still don't know
where in the specification or documentation suggests this.

There was some assertion it would always be r13, but that can't be a
*general* rule. e.g., the following code:

struct foo {
#ifdef BIGDISP
int array[1024*1024];
#endif
char bar;
};

register struct foo *foo asm("r13");

static void setval(char val)
{
asm("stb%X0 %1,%0" : "=m" (foo->bar) : "r" (val));
}

int main(void)
{
setval(10);
}

With -O0 this generates stb 9,0(10) for me for GCC 12, and with -O2
-DBIGDISP it generates stb 10,0(9). So that makes me nervious. GCC
does not have some kind of correctness guarantee here, so it must not
have this in its regression tests etc., and who knows about clang.

If it is true for some particular subset of cases that we can guarantee,
e.g., using -O2 and irq_soft_mask offset within range of stb offset and
we can point to specification such that both GCC and Clang will follow
it, then okay. Otherwise I still think it's more trouble than it is
worth.

Thanks,
Nick

>
> Using %X modifier will give GCC a bit more flexibility on the code
> generation.
>
> Signed-off-by: Christophe Leroy <[email protected]>
> ---
> arch/powerpc/include/asm/hw_irq.h | 12 ++----------
> 1 file changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index e8de249339d8..dbe037ff4474 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -115,10 +115,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
> {
> unsigned long flags;
>
> - asm volatile(
> - "lbz %0,%1(13)"
> - : "=r" (flags)
> - : "i" (offsetof(struct paca_struct, irq_soft_mask)));
> + asm volatile("lbz%X1 %0,%1" : "=r" (flags) : "m" (local_paca->irq_soft_mask));
>
> return flags;
> }
> @@ -147,12 +144,7 @@ static inline notrace void irq_soft_mask_set(unsigned long mask)
> if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> WARN_ON(mask && !(mask & IRQS_DISABLED));
>
> - asm volatile(
> - "stb %0,%1(13)"
> - :
> - : "r" (mask),
> - "i" (offsetof(struct paca_struct, irq_soft_mask))
> - : "memory");
> + asm volatile("stb%X0 %1,%0" : "=m" (local_paca->irq_soft_mask) : "r" (mask));
> }
>
> static inline notrace unsigned long irq_soft_mask_set_return(unsigned long mask)
> --
> 2.37.1

2022-09-23 12:49:44

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

On Fri, Sep 23, 2022 at 05:08:13PM +1000, Nicholas Piggin wrote:
> On Tue Sep 20, 2022 at 4:41 PM AEST, Christophe Leroy wrote:
> > local_paca is declared as global register asm("r13"), it is therefore
> > garantied to always ever be r13.
> >
> > It is therefore not required to opencode r13 in the assembly, use
> > a reference to local_paca->irq_soft_mask instead.

> The code matches the changelog AFAIKS. But I don't know where it is
> guaranteed it will always be r13 in GCC and Clang. I still don't know
> where in the specification or documentation suggests this.

"Global Register Variables" in the GCC manual.

> There was some assertion it would always be r13, but that can't be a
> *general* rule. e.g., the following code:
>
> struct foo {
> #ifdef BIGDISP
> int array[1024*1024];
> #endif
> char bar;
> };
>
> register struct foo *foo asm("r13");
>
> static void setval(char val)
> {
> asm("stb%X0 %1,%0" : "=m" (foo->bar) : "r" (val));
> }
>
> int main(void)
> {
> setval(10);
> }

Just use r13 directly in the asm, if that is what you want!

> With -O0 this generates stb 9,0(10) for me for GCC 12, and with -O2
> -DBIGDISP it generates stb 10,0(9). So that makes me nervious. GCC
> does not have some kind of correctness guarantee here, so it must not
> have this in its regression tests etc., and who knows about clang.

GCC has all kinds of correctness guarantees, here and elsewhere, that is
90% of a compiler's job. But you don't *tell* it what you consider
"correct" here.

You wrote "foo->bar", and this expression was translated to something
that derived from r13. If you made the asm something like
asm("stb%X0 %1,0(%0)" : : "r" (foo), "r" (val) : "memory");
it would work fine. It would also work fine if you wrote 13 in the
template directly. These things follow the rules, so are guaranteed.

The most important pieces of doc here may be
* Accesses to the variable may be optimized as usual and the register
remains available for allocation and use in any computations,
provided that observable values of the variable are not affected.
* If the variable is referenced in inline assembly, the type of
access must be provided to the compiler via constraints (*note
Constraints::). Accesses from basic asms are not supported.
but read the whole "Global Register Variables" chapter?

> If it is true for some particular subset of cases that we can guarantee,
> e.g., using -O2 and irq_soft_mask offset within range of stb offset and
> we can point to specification such that both GCC and Clang will follow
> it, then okay. Otherwise I still think it's more trouble than it is
> worth.

-O2 makes it much more likely some inline assembler things work as
intended, but it guarantees nothing. Sorry.


Segher

2022-09-23 16:36:08

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

On Fri Sep 23, 2022 at 10:18 PM AEST, Segher Boessenkool wrote:
> On Fri, Sep 23, 2022 at 05:08:13PM +1000, Nicholas Piggin wrote:
> > On Tue Sep 20, 2022 at 4:41 PM AEST, Christophe Leroy wrote:
> > > local_paca is declared as global register asm("r13"), it is therefore
> > > garantied to always ever be r13.
> > >
> > > It is therefore not required to opencode r13 in the assembly, use
> > > a reference to local_paca->irq_soft_mask instead.
>
> > The code matches the changelog AFAIKS. But I don't know where it is
> > guaranteed it will always be r13 in GCC and Clang. I still don't know
> > where in the specification or documentation suggests this.
>
> "Global Register Variables" in the GCC manual.
>
> > There was some assertion it would always be r13, but that can't be a
> > *general* rule. e.g., the following code:
> >
> > struct foo {
> > #ifdef BIGDISP
> > int array[1024*1024];
> > #endif
> > char bar;
> > };
> >
> > register struct foo *foo asm("r13");
> >
> > static void setval(char val)
> > {
> > asm("stb%X0 %1,%0" : "=m" (foo->bar) : "r" (val));
> > }
> >
> > int main(void)
> > {
> > setval(10);
> > }
>
> Just use r13 directly in the asm, if that is what you want!
>
> > With -O0 this generates stb 9,0(10) for me for GCC 12, and with -O2
> > -DBIGDISP it generates stb 10,0(9). So that makes me nervious. GCC
> > does not have some kind of correctness guarantee here, so it must not
> > have this in its regression tests etc., and who knows about clang.
>
> GCC has all kinds of correctness guarantees, here and elsewhere, that is
> 90% of a compiler's job. But you don't *tell* it what you consider
> "correct" here.

Right, that's what I expect. I think the confusion came from here,

https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-August/247595.html

In any case it is answered now.

> You wrote "foo->bar", and this expression was translated to something
> that derived from r13. If you made the asm something like
> asm("stb%X0 %1,0(%0)" : : "r" (foo), "r" (val) : "memory");
> it would work fine. It would also work fine if you wrote 13 in the
> template directly. These things follow the rules, so are guaranteed.
>
> The most important pieces of doc here may be
> * Accesses to the variable may be optimized as usual and the register
> remains available for allocation and use in any computations,
> provided that observable values of the variable are not affected.
> * If the variable is referenced in inline assembly, the type of
> access must be provided to the compiler via constraints (*note
> Constraints::). Accesses from basic asms are not supported.
> but read the whole "Global Register Variables" chapter?

I still don't see what clauses guarantees asm("%0" ::"r"(foo)) to give
13. It doesn't say access via inline assembly is special, so if
optimized as usual means it could be accessed by any register like
access to a usual variable, then asm could also substitute a different
register for the access by the letter of it AFAIKS.

I think if it was obviously guaranteed then this might be marginally
better than explicit r13 in the asm

asm volatile(
"stb %0,%2(%1)"
:
: "r" (mask),
"r" (local_paca),
"i" (offsetof(struct paca_struct, irq_soft_mask))
: "memory");

But as it is I think we should just stick with explicit r13 base
in the asm.

Thanks,
Nick

2022-09-23 22:54:44

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

On Sat, Sep 24, 2022 at 02:26:52AM +1000, Nicholas Piggin wrote:
> I still don't see what clauses guarantees asm("%0" ::"r"(foo)) to give
> 13. It doesn't say access via inline assembly is special,

But it is. It is for all register variables, local and global. I agree
this isn't documented clearly. For local register variables this is the
*only* thing guaranteed; for global register vars there is more (it
changes the ABI, there are safe/restore effects, that kind of thing).

Never it is guaranteed that all accesses through this variable will use
the register directly: this fundamentally cannot work on all archs, and
also not at -O0. More in general it doesn't work if some basic
optimisations are not done, be it because of a compiler deficiency, or a
straight out bug, or maybe it is a conscious choice in some cases.

> I think if it was obviously guaranteed then this might be marginally
> better than explicit r13 in the asm
>
> asm volatile(
> "stb %0,%2(%1)"
> :
> : "r" (mask),
> "r" (local_paca),
> "i" (offsetof(struct paca_struct, irq_soft_mask))
> : "memory");

(Please use "n" instead of "i". Doesn't matter here, but it does in
many other places.)


Segher

2022-09-24 04:26:06

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

On Sat Sep 24, 2022 at 8:15 AM AEST, Segher Boessenkool wrote:
> On Sat, Sep 24, 2022 at 02:26:52AM +1000, Nicholas Piggin wrote:
> > I still don't see what clauses guarantees asm("%0" ::"r"(foo)) to give
> > 13. It doesn't say access via inline assembly is special,
>
> But it is. It is for all register variables, local and global. I agree
> this isn't documented clearly. For local register variables this is the
> *only* thing guaranteed; for global register vars there is more (it
> changes the ABI, there are safe/restore effects, that kind of thing).
>
> Never it is guaranteed that all accesses through this variable will use
> the register directly: this fundamentally cannot work on all archs, and
> also not at -O0. More in general it doesn't work if some basic
> optimisations are not done, be it because of a compiler deficiency, or a
> straight out bug, or maybe it is a conscious choice in some cases.

Right, and we know better than to rely on a spec that is not 100% air
tight with no possibility of lawyering. This may be what the intention is,
it may be what gcc and clang do now, and everybody involved today agrees
with that interpretation. We still have to maintain the kernel tomorrow
though, so explicit r13 it must be.

>
> > I think if it was obviously guaranteed then this might be marginally
> > better than explicit r13 in the asm
> >
> > asm volatile(
> > "stb %0,%2(%1)"
> > :
> > : "r" (mask),
> > "r" (local_paca),
> > "i" (offsetof(struct paca_struct, irq_soft_mask))
> > : "memory");
>
> (Please use "n" instead of "i". Doesn't matter here, but it does in
> many other places.)

What is the difference? Just "i" allows assmebly-time constants?

How about "I"? that looks like it was made for it. Gives much better
errors.

Thanks,
Nick

2022-09-24 16:34:34

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

On Sat, Sep 24, 2022 at 02:00:55PM +1000, Nicholas Piggin wrote:
> On Sat Sep 24, 2022 at 8:15 AM AEST, Segher Boessenkool wrote:
> > Never it is guaranteed that all accesses through this variable will use
> > the register directly: this fundamentally cannot work on all archs, and
> > also not at -O0. More in general it doesn't work if some basic
> > optimisations are not done, be it because of a compiler deficiency, or a
> > straight out bug, or maybe it is a conscious choice in some cases.
>
> Right, and we know better than to rely on a spec that is not 100% air
> tight with no possibility of lawyering. This may be what the intention is,
> it may be what gcc and clang do now, and everybody involved today agrees
> with that interpretation. We still have to maintain the kernel tomorrow
> though, so explicit r13 it must be.

It has *always* been this way. Very old GCC (say, GCC < 3.x) tried to
guarantee more, even, but that turned out to be untenable. But this is
all in the distant past.

I have no idea if clang implements the GCC C extensions correctly. If
they don't it is just another compiler bug and they'll just have to fix
it.

The rules *are* airtight. But this does not mean you can assume random
other stuff, adjacent or not :-P

> > (Please use "n" instead of "i". Doesn't matter here, but it does in
> > many other places.)
>
> What is the difference? Just "i" allows assmebly-time constants?

"n" means "number": constant integers. "i" means "immediate": any
constant. The address of a global variable is "i" but not "n" (in most
ABIs, no -fPIC and such) for example.

> How about "I"? that looks like it was made for it. Gives much better
> errors.

For PowerPC, "I" is a signed 16-bit number. "K" is unsigned 16-bit,
and there are more as well. Just like for "n" you'll have to make
sure the number you feed in will work in the assembler, and you'll get
the same error message (but, as you say, for "I" in some cases the
compiler will give errors already). It's otherwise only useful if you
use e.g. "IL" as constraint, and then write "addi%e2 %0,%1,%2" for
example, so the asm can generate "addis" insns. Such things aren't very
often useful in internal asm. The main reason any of this exists is
this is how GCC works internally; extended inline asm exposes a lot of
that.


Segher

2022-09-24 18:50:22

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc/irq: Modernise inline assembly in irq_soft_mask_{set,return}

On Fri, Sep 23, 2022 at 05:15:43PM -0500, Segher Boessenkool wrote:
> On Sat, Sep 24, 2022 at 02:26:52AM +1000, Nicholas Piggin wrote:
> > I still don't see what clauses guarantees asm("%0" ::"r"(foo)) to give
> > 13. It doesn't say access via inline assembly is special,
>
> But it is. It is for all register variables, local and global. I agree
> this isn't documented clearly. For local register variables this is the
> *only* thing guaranteed; for global register vars there is more (it
> changes the ABI, there are safe/restore effects, that kind of thing).

I filed <https://gcc.gnu.org/PR107027> to improve the docs. Thanks for
bringing this to our attention!


Segher