2019-07-29 20:27:07

by Nick Desaulniers

[permalink] [raw]
Subject: [PATCH] powerpc: workaround clang codegen bug in dcbz

Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed
what looks like a codegen bug in Clang's handling of `%y` output
template with `Z` constraint. This is resulting in panics during boot
for 32b powerpc builds w/ Clang, as reported by our CI.

Add back the original code that worked behind a preprocessor check for
__clang__ until we can fix LLVM.

Further, it seems that clang allnoconfig builds are unhappy with `Z`, as
reported by 0day bot. This is likely because Clang warns about inline
asm constraints when the constraint requires inlining to be semantically
valid.

Link: https://bugs.llvm.org/show_bug.cgi?id=42762
Link: https://github.com/ClangBuiltLinux/linux/issues/593
Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
Debugged-by: Nathan Chancellor <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Suggested-by: Nathan Chancellor <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
Alternatively, we could just revert 6c5875843b87. It seems that GCC
generates the same code for these functions for out of line versions.
But I'm not sure how the inlined code generated would be affected.

arch/powerpc/include/asm/cache.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..72983da94dce 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -105,6 +105,30 @@ extern void _set_L3CR(unsigned long);
#define _set_L3CR(val) do { } while(0)
#endif

+/*
+ * Workaround for https://bugs.llvm.org/show_bug.cgi?id=42762.
+ */
+#ifdef __clang__
+static inline void dcbz(void *addr)
+{
+ __asm__ __volatile__ ("dcbz 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void dcbi(void *addr)
+{
+ __asm__ __volatile__ ("dcbi 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void dcbf(void *addr)
+{
+ __asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void dcbst(void *addr)
+{
+ __asm__ __volatile__ ("dcbst 0, %0" : : "r"(addr) : "memory");
+}
+#else
static inline void dcbz(void *addr)
{
__asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
@@ -124,6 +148,7 @@ static inline void dcbst(void *addr)
{
__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
}
+#endif /* __clang__ */
#endif /* !__ASSEMBLY__ */
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_CACHE_H */
--
2.22.0.709.g102302147b-goog


2019-07-29 20:34:50

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
> Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed
> what looks like a codegen bug in Clang's handling of `%y` output
> template with `Z` constraint. This is resulting in panics during boot
> for 32b powerpc builds w/ Clang, as reported by our CI.
>
> Add back the original code that worked behind a preprocessor check for
> __clang__ until we can fix LLVM.
>
> Further, it seems that clang allnoconfig builds are unhappy with `Z`, as
> reported by 0day bot. This is likely because Clang warns about inline
> asm constraints when the constraint requires inlining to be semantically
> valid.
>
> Link: https://bugs.llvm.org/show_bug.cgi?id=42762
> Link: https://github.com/ClangBuiltLinux/linux/issues/593
> Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
> Debugged-by: Nathan Chancellor <[email protected]>
> Reported-by: Nathan Chancellor <[email protected]>
> Reported-by: kbuild test robot <[email protected]>
> Suggested-by: Nathan Chancellor <[email protected]>
> Signed-off-by: Nick Desaulniers <[email protected]>
> ---
> Alternatively, we could just revert 6c5875843b87. It seems that GCC
> generates the same code for these functions for out of line versions.
> But I'm not sure how the inlined code generated would be affected.

For the record:

https://godbolt.org/z/z57VU7

This seems consistent with what Michael found so I don't think a revert
is entirely unreasonable.

Either way:

Reviewed-by: Nathan Chancellor <[email protected]>

2019-07-29 20:50:35

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Mon, Jul 29, 2019 at 1:47 PM Nathan Chancellor
<[email protected]> wrote:
>
> On Mon, Jul 29, 2019 at 01:45:35PM -0700, Nick Desaulniers wrote:
> > On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor
> > <[email protected]> wrote:
> > >
> > > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
> > > > But I'm not sure how the inlined code generated would be affected.
> > >
> > > For the record:
> > >
> > > https://godbolt.org/z/z57VU7
> > >
> > > This seems consistent with what Michael found so I don't think a revert
> > > is entirely unreasonable.
> >
> > Thanks for debugging/reporting/testing and the Godbolt link which
> > clearly shows that the codegen for out of line versions is no
> > different. The case I can't comment on is what happens when those
> > `static inline` functions get inlined (maybe the original patch
> > improves those cases?).
> > --
> > Thanks,
> > ~Nick Desaulniers
>
> I'll try to build with various versions of GCC and compare the
> disassembly of the one problematic location that I found and see
> what it looks like.

Also, guess I should have included the tag:
Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers")
--
Thanks,
~Nick Desaulniers

2019-07-30 05:54:57

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz



Le 29/07/2019 à 22:32, Nathan Chancellor a écrit :
> On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
>> Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed
>> what looks like a codegen bug in Clang's handling of `%y` output
>> template with `Z` constraint. This is resulting in panics during boot
>> for 32b powerpc builds w/ Clang, as reported by our CI.
>>
>> Add back the original code that worked behind a preprocessor check for
>> __clang__ until we can fix LLVM.
>>
>> Further, it seems that clang allnoconfig builds are unhappy with `Z`, as
>> reported by 0day bot. This is likely because Clang warns about inline
>> asm constraints when the constraint requires inlining to be semantically
>> valid.
>>
>> Link: https://bugs.llvm.org/show_bug.cgi?id=42762
>> Link: https://github.com/ClangBuiltLinux/linux/issues/593
>> Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
>> Debugged-by: Nathan Chancellor <[email protected]>
>> Reported-by: Nathan Chancellor <[email protected]>
>> Reported-by: kbuild test robot <[email protected]>
>> Suggested-by: Nathan Chancellor <[email protected]>
>> Signed-off-by: Nick Desaulniers <[email protected]>
>> ---
>> Alternatively, we could just revert 6c5875843b87. It seems that GCC
>> generates the same code for these functions for out of line versions.
>> But I'm not sure how the inlined code generated would be affected.
>
> For the record:
>
> https://godbolt.org/z/z57VU7
>
> This seems consistent with what Michael found so I don't think a revert
> is entirely unreasonable.

Your example functions are too simple to show anything. The functions
takes only one parameter so of course GCC won't use two registers
allthough given the opportunity.

Christophe

>
> Either way:
>
> Reviewed-by: Nathan Chancellor <[email protected]>
>

2019-07-30 07:16:49

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor
<[email protected]> wrote:
>
> On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
> > But I'm not sure how the inlined code generated would be affected.
>
> For the record:
>
> https://godbolt.org/z/z57VU7
>
> This seems consistent with what Michael found so I don't think a revert
> is entirely unreasonable.

Thanks for debugging/reporting/testing and the Godbolt link which
clearly shows that the codegen for out of line versions is no
different. The case I can't comment on is what happens when those
`static inline` functions get inlined (maybe the original patch
improves those cases?).
--
Thanks,
~Nick Desaulniers

2019-07-30 07:16:53

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Mon, Jul 29, 2019 at 01:45:35PM -0700, Nick Desaulniers wrote:
> On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor
> <[email protected]> wrote:
> >
> > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
> > > But I'm not sure how the inlined code generated would be affected.
> >
> > For the record:
> >
> > https://godbolt.org/z/z57VU7
> >
> > This seems consistent with what Michael found so I don't think a revert
> > is entirely unreasonable.
>
> Thanks for debugging/reporting/testing and the Godbolt link which
> clearly shows that the codegen for out of line versions is no
> different. The case I can't comment on is what happens when those
> `static inline` functions get inlined (maybe the original patch
> improves those cases?).
> --
> Thanks,
> ~Nick Desaulniers

I'll try to build with various versions of GCC and compare the
disassembly of the one problematic location that I found and see
what it looks like.

Cheers,
Nathan

2019-07-30 08:04:03

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote:
> For the record:
>
> https://godbolt.org/z/z57VU7
>
> This seems consistent with what Michael found so I don't think a revert
> is entirely unreasonable.

Try this:

https://godbolt.org/z/6_ZfVi

This matters in non-trivial loops, for example. But all current cases
where such non-trivial loops are done with cache block instructions are
actually written in real assembler already, using two registers.
Because performance matters. Not that I recommend writing code as
critical as memset in C with inline asm :-)


Segher

2019-07-30 14:32:20

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool
<[email protected]> wrote:
>
> On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote:
> > For the record:
> >
> > https://godbolt.org/z/z57VU7
> >
> > This seems consistent with what Michael found so I don't think a revert
> > is entirely unreasonable.
>
> Try this:
>
> https://godbolt.org/z/6_ZfVi
>
> This matters in non-trivial loops, for example. But all current cases
> where such non-trivial loops are done with cache block instructions are
> actually written in real assembler already, using two registers.
> Because performance matters. Not that I recommend writing code as
> critical as memset in C with inline asm :-)

Upon a second look, I think the issue is that the "Z" is an input argument
when it should be an output. clang decides that it can make a copy of the
input and pass that into the inline asm. This is not the most efficient
way, but it seems entirely correct according to the constraints.

Changing it to an output "=Z" constraint seems to make it work:

https://godbolt.org/z/FwEqHf

Clang still doesn't use the optimum form, but it passes the correct pointer.

Arnd

2019-07-30 16:35:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool
<[email protected]> wrote:
>
> On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote:
> > Upon a second look, I think the issue is that the "Z" is an input argument
> > when it should be an output. clang decides that it can make a copy of the
> > input and pass that into the inline asm. This is not the most efficient
> > way, but it seems entirely correct according to the constraints.
>
> Most dcb* (and all icb*) do not change the memory pointed to. The
> memory is an input here, logically as well, and that is obvious.

Ah, right. I had only thought of dcbz here, but you are right that using
an output makes little sense for the others.

readl() is another example where powerpc currently uses "Z" for an
input, which illustrates this even better.

> > Changing it to an output "=Z" constraint seems to make it work:
> >
> > https://godbolt.org/z/FwEqHf
> >
> > Clang still doesn't use the optimum form, but it passes the correct pointer.
>
> As I said many times already, LLVM does not seem to treat all asm
> operands as lvalues. That is a bug. And it is critical for memory
> operands for example, as should be obvious if you look at at for a few
> seconds (you pass *that* memory, not a copy of it). The thing you pass
> has an identity. It's an lvalue. This is true for *all* inline asm
> operands, not just output operands and memory operands, but it is most
> obvious there.

From experimentation, I would guess that llvm handles "m" correctly, but
not "Z". See https://godbolt.org/z/uqfDx_ for another example.

> Or, LLVM might have a bug elsewhere.
>
> Either way, the asm is fine, and it has worked fine in GCC since
> forever. Changing this constraint to be an output constraint would
> just be obfuscation (we could change *all* operands to *everything* to
> be inout ("+") constraints, and it won't affect correctness, just the
> reader's sanity).

I would still argue that for dcbz specifically, an output makes more
sense than an input, but as you say that does not solve the others.

Arnd

2019-07-30 16:46:37

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Tue, Jul 30, 2019 at 04:30:29PM +0200, Arnd Bergmann wrote:
> On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool
> <[email protected]> wrote:
> >
> > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote:
> > > Upon a second look, I think the issue is that the "Z" is an input argument
> > > when it should be an output. clang decides that it can make a copy of the
> > > input and pass that into the inline asm. This is not the most efficient
> > > way, but it seems entirely correct according to the constraints.
> >
> > Most dcb* (and all icb*) do not change the memory pointed to. The
> > memory is an input here, logically as well, and that is obvious.
>
> Ah, right. I had only thought of dcbz here, but you are right that using
> an output makes little sense for the others.
>
> readl() is another example where powerpc currently uses "Z" for an
> input, which illustrates this even better.

in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as
well, its (not byte reversing) read will be atomic just fine, so things
will still work correctly.

The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not
look like they will work correctly if an update form address is chosen,
but that won't happen because the constraint is "m" instead of "m<>",
making the %Un pretty useless (it will always be the empty string).

> > As I said many times already, LLVM does not seem to treat all asm
> > operands as lvalues. That is a bug. And it is critical for memory
> > operands for example, as should be obvious if you look at at for a few
> > seconds (you pass *that* memory, not a copy of it). The thing you pass
> > has an identity. It's an lvalue. This is true for *all* inline asm
> > operands, not just output operands and memory operands, but it is most
> > obvious there.
>
> >From experimentation, I would guess that llvm handles "m" correctly, but
> not "Z". See https://godbolt.org/z/uqfDx_ for another example.

Yeah, it does not treat "Z" as a memory constraint apparently, and it
special cases output operands and memory operands to be lvalues, but
does not do that for everything else as it should.

> > Or, LLVM might have a bug elsewhere.
> >
> > Either way, the asm is fine, and it has worked fine in GCC since
> > forever. Changing this constraint to be an output constraint would
> > just be obfuscation (we could change *all* operands to *everything* to
> > be inout ("+") constraints, and it won't affect correctness, just the
> > reader's sanity).
>
> I would still argue that for dcbz specifically, an output makes more
> sense than an input, but as you say that does not solve the others.

An output would be somewhat misleading. dcbz zeroes the whole aligned
cache block sized region of memory its operand points into. The kernel
dcbz functions do not easily know the cache block size I think, and
besides, you want a "memory" clobber anyway, also for the other dcb*,
so it won't help anything. Also, the compiler can almost never use the
extra info ("affects the aligned 32B or 128B block this points into")
usefully anyway; it will usually see it as "can alias pretty much
anything". Just use a "memory" clobber :-/


Segher

2019-07-30 17:10:31

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

Arnd Bergmann <[email protected]> writes:
> On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool
> <[email protected]> wrote:
>> On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote:
>> > For the record:
>> >
>> > https://godbolt.org/z/z57VU7
>> >
>> > This seems consistent with what Michael found so I don't think a revert
>> > is entirely unreasonable.
>>
>> Try this:
>>
>> https://godbolt.org/z/6_ZfVi
>>
>> This matters in non-trivial loops, for example. But all current cases
>> where such non-trivial loops are done with cache block instructions are
>> actually written in real assembler already, using two registers.
>> Because performance matters. Not that I recommend writing code as
>> critical as memset in C with inline asm :-)
>
> Upon a second look, I think the issue is that the "Z" is an input argument
> when it should be an output. clang decides that it can make a copy of the
> input and pass that into the inline asm. This is not the most efficient
> way, but it seems entirely correct according to the constraints.
>
> Changing it to an output "=Z" constraint seems to make it work:
>
> https://godbolt.org/z/FwEqHf
>
> Clang still doesn't use the optimum form, but it passes the correct pointer.

Thanks Arnd. This seems like a better solution.

I'll drop the revert I have staged.

Segher does this look OK to you?

Nathan/Nick, are one of you able to test this with your clang CI?

cheers

2019-07-30 17:24:54

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote:
> Upon a second look, I think the issue is that the "Z" is an input argument
> when it should be an output. clang decides that it can make a copy of the
> input and pass that into the inline asm. This is not the most efficient
> way, but it seems entirely correct according to the constraints.

Most dcb* (and all icb*) do not change the memory pointed to. The
memory is an input here, logically as well, and that is obvious.

> Changing it to an output "=Z" constraint seems to make it work:
>
> https://godbolt.org/z/FwEqHf
>
> Clang still doesn't use the optimum form, but it passes the correct pointer.

As I said many times already, LLVM does not seem to treat all asm
operands as lvalues. That is a bug. And it is critical for memory
operands for example, as should be obvious if you look at at for a few
seconds (you pass *that* memory, not a copy of it). The thing you pass
has an identity. It's an lvalue. This is true for *all* inline asm
operands, not just output operands and memory operands, but it is most
obvious there.

Or, LLVM might have a bug elsewhere.

Either way, the asm is fine, and it has worked fine in GCC since
forever. Changing this constraint to be an output constraint would
just be obfuscation (we could change *all* operands to *everything* to
be inout ("+") constraints, and it won't affect correctness, just the
reader's sanity).


Segher

2019-07-30 18:09:41

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Tue, Jul 30, 2019 at 11:16:37AM -0500, Segher Boessenkool wrote:
> in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as
> well, its (not byte reversing) read will be atomic just fine, so things
> will still work correctly.
>
> The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not
> look like they will work correctly if an update form address is chosen,
> but that won't happen because the constraint is "m" instead of "m<>",
> making the %Un pretty useless (it will always be the empty string).

Btw, this is true since GCC 4.8; before 4.8, plain "m" *could* have an
automodify (autoinc, autodec, etc.) side effect. What is the minimum
GCC version required, these days?

https://gcc.gnu.org/PR44492
https://gcc.gnu.org/r161328


Segher

2019-07-30 18:25:23

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Tue, Jul 30, 2019 at 6:16 PM Segher Boessenkool
<[email protected]> wrote:
>
> On Tue, Jul 30, 2019 at 04:30:29PM +0200, Arnd Bergmann wrote:
> > On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool
> > <[email protected]> wrote:
> > >
> > > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote:
> > > > Upon a second look, I think the issue is that the "Z" is an input argument
> > > > when it should be an output. clang decides that it can make a copy of the
> > > > input and pass that into the inline asm. This is not the most efficient
> > > > way, but it seems entirely correct according to the constraints.
> > >
> > > Most dcb* (and all icb*) do not change the memory pointed to. The
> > > memory is an input here, logically as well, and that is obvious.
> >
> > Ah, right. I had only thought of dcbz here, but you are right that using
> > an output makes little sense for the others.
> >
> > readl() is another example where powerpc currently uses "Z" for an
> > input, which illustrates this even better.
>
> in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as
> well, its (not byte reversing) read will be atomic just fine, so things
> will still work correctly.

byteorder is fine, the problem I was thinking of is when moving the load/store
instructions around the barriers that synchronize with DMA, or turning
them into different-size accesses. Changing two consecutive 16-bit mmio reads
into an unaligned 32-bit read will rarely have the intended effect ;-)

Arnd

2019-07-30 18:26:53

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Tue, Jul 30, 2019 at 7:07 PM Segher Boessenkool
<[email protected]> wrote:
>
> On Tue, Jul 30, 2019 at 11:16:37AM -0500, Segher Boessenkool wrote:
> > in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as
> > well, its (not byte reversing) read will be atomic just fine, so things
> > will still work correctly.
> >
> > The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not
> > look like they will work correctly if an update form address is chosen,
> > but that won't happen because the constraint is "m" instead of "m<>",
> > making the %Un pretty useless (it will always be the empty string).
>
> Btw, this is true since GCC 4.8; before 4.8, plain "m" *could* have an
> automodify (autoinc, autodec, etc.) side effect. What is the minimum
> GCC version required, these days?

gcc-4.6, but an architecture can require a higher version.

Arnd

2019-07-31 01:37:25

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz

On Tue, Jul 30, 2019 at 08:24:14PM +0200, Arnd Bergmann wrote:
> On Tue, Jul 30, 2019 at 6:16 PM Segher Boessenkool
> <[email protected]> wrote:
> > in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as
> > well, its (not byte reversing) read will be atomic just fine, so things
> > will still work correctly.
>
> byteorder is fine, the problem I was thinking of is when moving the load/store
> instructions around the barriers that synchronize with DMA, or turning
> them into different-size accesses. Changing two consecutive 16-bit mmio reads
> into an unaligned 32-bit read will rarely have the intended effect ;-)

Most such barriers will also work on the copy accesses, I think. But
yes it depends on exactly how it is written. The {in,out}_{be,le}<N>
ones use sync;store for out and sync;load;trap;isync for in, so they
should be safe ;-)

(Well, almost -- writes to I/O will not necessarily actually happen
before other stores, not from these macros alone at least).

Should be pretty easy to check what LLVM makes of this?


Segher

2019-08-09 18:23:09

by Nick Desaulniers

[permalink] [raw]
Subject: [PATCH] powerpc: fix inline asm constraints for dcbz

The input parameter is modified, so it should be an output parameter
with "=" to make it so that a copy of the input is not made by Clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=42762
Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers
Link: https://github.com/ClangBuiltLinux/linux/issues/593
Link: https://godbolt.org/z/QwhZXi
Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers")
Debugged-by: Nathan Chancellor <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Suggested-by: Arnd Bergmann <[email protected]>
Suggested-by: Nathan Chancellor <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
Green CI run:
https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122521976
https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37

arch/powerpc/include/asm/cache.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..5a0df6a1b9dc 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -107,22 +107,22 @@ extern void _set_L3CR(unsigned long);

static inline void dcbz(void *addr)
{
- __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
+ __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
}

static inline void dcbi(void *addr)
{
- __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
+ __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
}

static inline void dcbf(void *addr)
{
- __asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory");
+ __asm__ __volatile__ ("dcbf %y0" : "=Z"(*(u8 *)addr) :: "memory");
}

static inline void dcbst(void *addr)
{
- __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
+ __asm__ __volatile__ ("dcbst %y0" : "=Z"(*(u8 *)addr) :: "memory");
}
#endif /* !__ASSEMBLY__ */
#endif /* __KERNEL__ */
--
2.23.0.rc1.153.gdeed80330f-goog

2019-08-09 18:29:55

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] powerpc: fix inline asm constraints for dcbz

On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
Linux <[email protected]> wrote:

> static inline void dcbz(void *addr)
> {
> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
> }
>
> static inline void dcbi(void *addr)
> {
> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
> }

I think the result of the discussion was that an output argument only kind-of
makes sense for dcbz, but for the others it's really an input, and clang is
wrong in the way it handles the "Z" constraint by making a copy, which it
doesn't do for "m".

I'm not sure whether it's correct to use "m" instead of "Z" here, which
would be a better workaround if that works. More importantly though,
clang really needs to be fixed to handle "Z" correctly.

Arnd

2019-08-09 20:03:34

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH] powerpc: fix inline asm constraints for dcbz

Arnd Bergmann <[email protected]> a écrit :

> On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> Linux <[email protected]> wrote:
>
>> static inline void dcbz(void *addr)
>> {
>> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
>> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
>> }
>>
>> static inline void dcbi(void *addr)
>> {
>> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
>> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
>> }
>
> I think the result of the discussion was that an output argument only kind-of
> makes sense for dcbz, but for the others it's really an input, and clang is
> wrong in the way it handles the "Z" constraint by making a copy, which it
> doesn't do for "m".
>
> I'm not sure whether it's correct to use "m" instead of "Z" here, which
> would be a better workaround if that works. More importantly though,
> clang really needs to be fixed to handle "Z" correctly.

As the benefit is null, I think the best is probably to reverse my
original commit until at least CLang is fixed, as initialy suggested
by mpe

Christophe



2019-08-09 20:14:05

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] powerpc: fix inline asm constraints for dcbz

On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy
<[email protected]> wrote:
>
> Arnd Bergmann <[email protected]> a écrit :
> > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> > Linux <[email protected]> wrote:
> >
> >> static inline void dcbz(void *addr)
> >> {
> >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >> }
> >>
> >> static inline void dcbi(void *addr)
> >> {
> >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >> }
> >
> > I think the result of the discussion was that an output argument only kind-of
> > makes sense for dcbz, but for the others it's really an input, and clang is
> > wrong in the way it handles the "Z" constraint by making a copy, which it
> > doesn't do for "m".
> >
> > I'm not sure whether it's correct to use "m" instead of "Z" here, which
> > would be a better workaround if that works. More importantly though,
> > clang really needs to be fixed to handle "Z" correctly.
>
> As the benefit is null, I think the best is probably to reverse my
> original commit until at least CLang is fixed, as initialy suggested
> by mpe

Yes, makes sense.

There is one other use of the "Z" constraint, so on top of the revert, I
think it might be helpful if Nick could check if the patch below makes
any difference with clang and, if it does, whether the current version
is broken.

Arnd

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 23e5d5d16c7e..28b467779328 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size
__iomem *addr) \
{ \
u##size ret; \
__asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \
- : "=r" (ret) : "Z" (*addr) : "memory"); \
+ : "=r" (ret) : "m" (*addr) : "memory"); \
return ret; \
}

@@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size
__iomem *addr) \
static inline void name(volatile u##size __iomem *addr, u##size val) \
{ \
__asm__ __volatile__("sync;"#insn" %1,%y0" \
- : "=Z" (*addr) : "r" (val) : "memory"); \
+ : "=m" (*addr) : "r" (val) : "memory"); \
mmiowb_set_pending(); \
}

2019-08-09 20:38:07

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH] powerpc: fix inline asm constraints for dcbz

On Fri, Aug 09, 2019 at 11:21:05AM -0700, Nick Desaulniers wrote:
> The input parameter is modified, so it should be an output parameter
> with "=" to make it so that a copy of the input is not made by Clang.
>
> Link: https://bugs.llvm.org/show_bug.cgi?id=42762
> Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers
> Link: https://github.com/ClangBuiltLinux/linux/issues/593
> Link: https://godbolt.org/z/QwhZXi
> Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
> Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers")
> Debugged-by: Nathan Chancellor <[email protected]>
> Reported-by: Nathan Chancellor <[email protected]>
> Reported-by: kbuild test robot <[email protected]>
> Suggested-by: Arnd Bergmann <[email protected]>
> Suggested-by: Nathan Chancellor <[email protected]>
> Signed-off-by: Nick Desaulniers <[email protected]>

I applied this patch as well as a revert of the original patch and both
clang and GCC appear to generate the same code; I think a straight
revert would be better.

Crude testing script and the generated files attached.

Cheers,
Nathan


Attachments:
(No filename) (1.15 kB)
tmp.bRmcRT0jd0.sh (2.59 kB)
testing-output.tar.gz (16.03 kB)
Download all attachments

2019-08-09 21:57:27

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc: fix inline asm constraints for dcbz

On Fri, Aug 09, 2019 at 08:28:19PM +0200, Arnd Bergmann wrote:
> On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> Linux <[email protected]> wrote:
>
> > static inline void dcbz(void *addr)
> > {
> > - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> > + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
> > }
> >
> > static inline void dcbi(void *addr)
> > {
> > - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> > + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
> > }
>
> I think the result of the discussion was that an output argument only kind-of
> makes sense for dcbz, but for the others it's really an input, and clang is
> wrong in the way it handles the "Z" constraint by making a copy, which it
> doesn't do for "m".

Yes. And clang has probably miscompiled this in all kernels since we
have used "Z" for the first time, in 2008 (0f3d6bcd391b).

It is not necessarily fatal or at least not easily visible for the I/O
accessors: it "just" gets memory ordering wrong slightly (it looks like
it does the sync;tw;isync thing around an extra stack access, after it
has performed the actual I/O as any other memory load, without any
synchronisation).

> I'm not sure whether it's correct to use "m" instead of "Z" here, which
> would be a better workaround if that works. More importantly though,
> clang really needs to be fixed to handle "Z" correctly.

"m" allows offset addressing, which these insns do not. That is the
same reason you need the "y" output modifier. "m" is wrong here.

We have other memory constraints, but do those work with LLVM?


Segher

2019-08-09 22:03:09

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc: fix inline asm constraints for dcbz

On Fri, Aug 09, 2019 at 10:03:01PM +0200, Christophe Leroy wrote:
> Arnd Bergmann <[email protected]> a ?crit?:
>
> >On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> >Linux <[email protected]> wrote:
> >
> >> static inline void dcbz(void *addr)
> >> {
> >>- __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> >>+ __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >> }
> >>
> >> static inline void dcbi(void *addr)
> >> {
> >>- __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> >>+ __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >> }
> >
> >I think the result of the discussion was that an output argument only
> >kind-of
> >makes sense for dcbz, but for the others it's really an input, and clang is
> >wrong in the way it handles the "Z" constraint by making a copy, which it
> >doesn't do for "m".
> >
> >I'm not sure whether it's correct to use "m" instead of "Z" here, which
> >would be a better workaround if that works. More importantly though,
> >clang really needs to be fixed to handle "Z" correctly.
>
> As the benefit is null, I think the best is probably to reverse my
> original commit until at least CLang is fixed, as initialy suggested
> by mpe

And what about the other uses of "Z"?


Also, if you use C routines (instead of assembler code) for the basic
"clear a block" and the like routines, as there have been patches for
recently, the benefit is not zero.


Segher

2019-08-09 22:04:41

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] powerpc: fix inline asm constraints for dcbz

On Fri, Aug 9, 2019 at 1:13 PM Arnd Bergmann <[email protected]> wrote:
>
> On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy
> <[email protected]> wrote:
> >
> > Arnd Bergmann <[email protected]> a écrit :
> > > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> > > Linux <[email protected]> wrote:
> > >
> > >> static inline void dcbz(void *addr)
> > >> {
> > >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> > >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
> > >> }
> > >>
> > >> static inline void dcbi(void *addr)
> > >> {
> > >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> > >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
> > >> }
> > >
> > > I think the result of the discussion was that an output argument only kind-of
> > > makes sense for dcbz, but for the others it's really an input, and clang is
> > > wrong in the way it handles the "Z" constraint by making a copy, which it
> > > doesn't do for "m".
> > >
> > > I'm not sure whether it's correct to use "m" instead of "Z" here, which
> > > would be a better workaround if that works. More importantly though,
> > > clang really needs to be fixed to handle "Z" correctly.
> >
> > As the benefit is null, I think the best is probably to reverse my
> > original commit until at least CLang is fixed, as initialy suggested
> > by mpe
>
> Yes, makes sense.
>
> There is one other use of the "Z" constraint, so on top of the revert, I
> think it might be helpful if Nick could check if the patch below makes
> any difference with clang and, if it does, whether the current version
> is broken.
>
> Arnd
>
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index 23e5d5d16c7e..28b467779328 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size
> __iomem *addr) \
> { \
> u##size ret; \
> __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \
> - : "=r" (ret) : "Z" (*addr) : "memory"); \
> + : "=r" (ret) : "m" (*addr) : "memory"); \
> return ret; \
> }
>
> @@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size
> __iomem *addr) \
> static inline void name(volatile u##size __iomem *addr, u##size val) \
> { \
> __asm__ __volatile__("sync;"#insn" %1,%y0" \
> - : "=Z" (*addr) : "r" (val) : "memory"); \
> + : "=m" (*addr) : "r" (val) : "memory"); \
> mmiowb_set_pending(); \
> }

Does not work:
https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122654899
https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37

--
Thanks,
~Nick Desaulniers

2019-08-09 22:05:04

by Nick Desaulniers

[permalink] [raw]
Subject: [PATCH v3] Revert "powerpc: slightly improve cache helpers"

This reverts commit 6c5875843b87c3adea2beade9d1b8b3d4523900a.

Work around Clang bug preventing ppc32 from booting.

Link: https://bugs.llvm.org/show_bug.cgi?id=42762
Link: https://github.com/ClangBuiltLinux/linux/issues/593
Debugged-by: Nathan Chancellor <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Suggested-by: Christophe Leroy <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
Changes V2 -> V3:
* Just revert, as per Christophe.
Changes V1 -> V2:
* Change to ouput paremeter.


arch/powerpc/include/asm/cache.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..45e3137ccd71 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -107,22 +107,22 @@ extern void _set_L3CR(unsigned long);

static inline void dcbz(void *addr)
{
- __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
+ __asm__ __volatile__ ("dcbz 0, %0" : : "r"(addr) : "memory");
}

static inline void dcbi(void *addr)
{
- __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
+ __asm__ __volatile__ ("dcbi 0, %0" : : "r"(addr) : "memory");
}

static inline void dcbf(void *addr)
{
- __asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory");
+ __asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
}

static inline void dcbst(void *addr)
{
- __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
+ __asm__ __volatile__ ("dcbst 0, %0" : : "r"(addr) : "memory");
}
#endif /* !__ASSEMBLY__ */
#endif /* __KERNEL__ */
--
2.23.0.rc1.153.gdeed80330f-goog

2019-08-09 22:12:15

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc: fix inline asm constraints for dcbz

On Fri, Aug 09, 2019 at 10:12:56PM +0200, Arnd Bergmann wrote:
> @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size
> __iomem *addr) \
> { \
> u##size ret; \
> __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \
> - : "=r" (ret) : "Z" (*addr) : "memory"); \
> + : "=r" (ret) : "m" (*addr) : "memory"); \
> return ret; \
> }

That will no longer compile something like
u8 *p;
u16 x = in_le16(p + 12);
(you'll get something like "invalid %y value, try using the 'Z' constraint").

So then you remove the %y, but that makes you get something like
sync;lhbrx 3,12(3);twi 0,3,0;isync
which is completely wrong.


Segher

2019-08-10 09:09:56

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v3] Revert "powerpc: slightly improve cache helpers"

Nick Desaulniers <[email protected]> writes:
> This reverts commit 6c5875843b87c3adea2beade9d1b8b3d4523900a.
>
> Work around Clang bug preventing ppc32 from booting.
>
> Link: https://bugs.llvm.org/show_bug.cgi?id=42762
> Link: https://github.com/ClangBuiltLinux/linux/issues/593
> Debugged-by: Nathan Chancellor <[email protected]>
> Reported-by: Nathan Chancellor <[email protected]>
> Reported-by: kbuild test robot <[email protected]>
> Suggested-by: Christophe Leroy <[email protected]>
> Signed-off-by: Nick Desaulniers <[email protected]>
> ---
> Changes V2 -> V3:
> * Just revert, as per Christophe.
> Changes V1 -> V2:
> * Change to ouput paremeter.

Thanks. I actually already had this revert in my tree since ~10 days
ago, but hadn't pushed it yet because the discussion was ongoing.

So I'll just use that version, and ask Linus to pull it.

cheers