LinuxLists.cc - [PATCH] x86: support i386 with Clang

2020-05-04 23:08:03

Subject: [PATCH] x86: support i386 with Clang

GCC and Clang are architecturally different, which leads to subtle
issues for code that's invalid but clearly dead. This can happen with
code that emulates polymorphism with the preprocessor and sizeof.

GCC will perform semantic analysis after early inlining and dead code
elimination, so it will not warn on invalid code that's dead. Clang
strictly performs optimizations after semantic analysis, so it will warn
for dead code.

Neither Clang nor GCC like this very much with -m32:

long long ret;
asm ("movb $5, %0" : "=q" (ret));

However, GCC can tolerate this variant:

long long ret;
switch (sizeof(ret)) {
case 1:
asm ("movb $5, %0" : "=q" (ret));
break;
case 8:;
}

Clang, on the other hand, won't accept that because it validates the
inline asm for the '1' case *before* the optimisation phase where it
realises that it wouldn't have to emit it anyway.

If LLVM (Clang's "back end") fails such as during instruction selection
or register allocation, it cannot provide accurate diagnostics
(warnings/errors) that contain line information, as the AST has been
discarded from memory at that point.

While there have been early discussions about having C/C++ specific
language optimizations in Clang via the use of MLIR, which would enable
such earlier optimizations, such work is not scoped and likely a
multi-year endeavor.

We also don't want to swap the use of "=q" with "=r". For 64b, it
doesn't matter. For 32b, it's possible that a 32b register without a 8b
lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will
then reject.

With this, Clang can finally build an i386 defconfig.

Reported-by: Arnd Bergmann <[email protected]>
Reported-by: David Woodhouse <[email protected]>
Reported-by: Dmitry Golovin <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Link: https://bugs.llvm.org/show_bug.cgi?id=33587
Link: https://github.com/ClangBuiltLinux/linux/issues/3
Link: https://github.com/ClangBuiltLinux/linux/issues/194
Link: https://github.com/ClangBuiltLinux/linux/issues/781
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/CAK8P3a1EBaWdbAEzirFDSgHVJMtWjuNt2HGG8z+vpXeNHwETFQ@mail.gmail.com/
Signed-off-by: Nick Desaulniers <[email protected]>
---
Note: this is a resend of:
https://lore.kernel.org/lkml/[email protected]/
rebased on today's Linux next, and with the additional change to
uaccess.h.

I'm happy to resend with authorship and reported by tags changed to
suggested by's or whatever, just let me know.

Part of the commit message is stolen from David, and part from Linus.
Shall I resend with David's authorship and
[Nick: reworded]
???

I don't really care, I just don't really want to carry this out of tree
for our CI, which is green for i386 otherwise.

arch/x86/include/asm/percpu.h | 12 ++++++++----
arch/x86/include/asm/uaccess.h | 4 +++-
2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2278797c769d..826d086f71c9 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -99,7 +99,7 @@ do { \
case 1: \
asm qual (op "b %1,"__percpu_arg(0) \
: "+m" (var) \
- : "qi" ((pto_T__)(val))); \
+ : "qi" ((unsigned char)(unsigned long)(val))); \
break; \
case 2: \
asm qual (op "w %1,"__percpu_arg(0) \
@@ -144,7 +144,7 @@ do { \
else \
asm qual ("addb %1, "__percpu_arg(0) \
: "+m" (var) \
- : "qi" ((pao_T__)(val))); \
+ : "qi" ((unsigned char)(unsigned long)(val))); \
break; \
case 2: \
if (pao_ID__ == 1) \
@@ -182,12 +182,14 @@ do { \

#define percpu_from_op(qual, op, var) \
({ \
+ unsigned char pfo_u8__; \
typeof(var) pfo_ret__; \
switch (sizeof(var)) { \
case 1: \
asm qual (op "b "__percpu_arg(1)",%0" \
- : "=q" (pfo_ret__) \
+ : "=q" (pfo_u8__) \
: "m" (var)); \
+ pfo_ret__ = (typeof(var))(unsigned long)pfo_u8__; \
break; \
case 2: \
asm qual (op "w "__percpu_arg(1)",%0" \
@@ -211,12 +213,14 @@ do { \

#define percpu_stable_op(op, var) \
({ \
+ unsigned char pfo_u8__; \
typeof(var) pfo_ret__; \
switch (sizeof(var)) { \
case 1: \
asm(op "b "__percpu_arg(P1)",%0" \
- : "=q" (pfo_ret__) \
+ : "=q" (pfo_u8__) \
: "p" (&(var))); \
+ pfo_ret__ = (typeof(var))(unsigned long)pfo_u8__; \
break; \
case 2: \
asm(op "w "__percpu_arg(P1)",%0" \
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index d8f283b9a569..cf8483cd80e1 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -314,11 +314,13 @@ do { \

#define __get_user_size(x, ptr, size, retval) \
do { \
+ unsigned char x_u8__; \
retval = 0; \
__chk_user_ptr(ptr); \
switch (size) { \
case 1: \
- __get_user_asm(x, ptr, retval, "b", "=q"); \
+ __get_user_asm(x_u8__, ptr, retval, "b", "=q"); \
+ (x) = x_u8__; \
break; \
case 2: \
__get_user_asm(x, ptr, retval, "w", "=r"); \
--
2.26.2.526.g744177e7f7-goog

2020-05-11 17:25:40

by Nick Desaulniers

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

Bumping for comment+review.

On Mon, May 4, 2020 at 4:03 PM Nick Desaulniers <[email protected]> wrote:
>
> GCC and Clang are architecturally different, which leads to subtle
> issues for code that's invalid but clearly dead. This can happen with
> code that emulates polymorphism with the preprocessor and sizeof.
>
> GCC will perform semantic analysis after early inlining and dead code
> elimination, so it will not warn on invalid code that's dead. Clang
> strictly performs optimizations after semantic analysis, so it will warn
> for dead code.
>
> Neither Clang nor GCC like this very much with -m32:
>
> long long ret;
> asm ("movb $5, %0" : "=q" (ret));
>
> However, GCC can tolerate this variant:
>
> long long ret;
> switch (sizeof(ret)) {
> case 1:
> asm ("movb $5, %0" : "=q" (ret));
> break;
> case 8:;
> }
>
> Clang, on the other hand, won't accept that because it validates the
> inline asm for the '1' case *before* the optimisation phase where it
> realises that it wouldn't have to emit it anyway.
>
> If LLVM (Clang's "back end") fails such as during instruction selection
> or register allocation, it cannot provide accurate diagnostics
> (warnings/errors) that contain line information, as the AST has been
> discarded from memory at that point.
>
> While there have been early discussions about having C/C++ specific
> language optimizations in Clang via the use of MLIR, which would enable
> such earlier optimizations, such work is not scoped and likely a
> multi-year endeavor.
>
> We also don't want to swap the use of "=q" with "=r". For 64b, it
> doesn't matter. For 32b, it's possible that a 32b register without a 8b
> lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will
> then reject.
>
> With this, Clang can finally build an i386 defconfig.
>
> Reported-by: Arnd Bergmann <[email protected]>
> Reported-by: David Woodhouse <[email protected]>
> Reported-by: Dmitry Golovin <[email protected]>
> Reported-by: Linus Torvalds <[email protected]>
> Link: https://bugs.llvm.org/show_bug.cgi?id=33587
> Link: https://github.com/ClangBuiltLinux/linux/issues/3
> Link: https://github.com/ClangBuiltLinux/linux/issues/194
> Link: https://github.com/ClangBuiltLinux/linux/issues/781
> Link: https://lore.kernel.org/lkml/[email protected]/
> Link: https://lore.kernel.org/lkml/CAK8P3a1EBaWdbAEzirFDSgHVJMtWjuNt2HGG8z+vpXeNHwETFQ@mail.gmail.com/
> Signed-off-by: Nick Desaulniers <[email protected]>
> ---
> Note: this is a resend of:
> https://lore.kernel.org/lkml/[email protected]/
> rebased on today's Linux next, and with the additional change to
> uaccess.h.
>
> I'm happy to resend with authorship and reported by tags changed to
> suggested by's or whatever, just let me know.
>
> Part of the commit message is stolen from David, and part from Linus.
> Shall I resend with David's authorship and
> [Nick: reworded]
> ???
>
> I don't really care, I just don't really want to carry this out of tree
> for our CI, which is green for i386 otherwise.
>
>
> arch/x86/include/asm/percpu.h | 12 ++++++++----
> arch/x86/include/asm/uaccess.h | 4 +++-
> 2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 2278797c769d..826d086f71c9 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -99,7 +99,7 @@ do { \
> case 1: \
> asm qual (op "b %1,"__percpu_arg(0) \
> : "+m" (var) \
> - : "qi" ((pto_T__)(val))); \
> + : "qi" ((unsigned char)(unsigned long)(val))); \
> break; \
> case 2: \
> asm qual (op "w %1,"__percpu_arg(0) \
> @@ -144,7 +144,7 @@ do { \
> else \
> asm qual ("addb %1, "__percpu_arg(0) \
> : "+m" (var) \
> - : "qi" ((pao_T__)(val))); \
> + : "qi" ((unsigned char)(unsigned long)(val))); \
> break; \
> case 2: \
> if (pao_ID__ == 1) \
> @@ -182,12 +182,14 @@ do { \
>
> #define percpu_from_op(qual, op, var) \
> ({ \
> + unsigned char pfo_u8__; \
> typeof(var) pfo_ret__; \
> switch (sizeof(var)) { \
> case 1: \
> asm qual (op "b "__percpu_arg(1)",%0" \
> - : "=q" (pfo_ret__) \
> + : "=q" (pfo_u8__) \
> : "m" (var)); \
> + pfo_ret__ = (typeof(var))(unsigned long)pfo_u8__; \
> break; \
> case 2: \
> asm qual (op "w "__percpu_arg(1)",%0" \
> @@ -211,12 +213,14 @@ do { \
>
> #define percpu_stable_op(op, var) \
> ({ \
> + unsigned char pfo_u8__; \
> typeof(var) pfo_ret__; \
> switch (sizeof(var)) { \
> case 1: \
> asm(op "b "__percpu_arg(P1)",%0" \
> - : "=q" (pfo_ret__) \
> + : "=q" (pfo_u8__) \
> : "p" (&(var))); \
> + pfo_ret__ = (typeof(var))(unsigned long)pfo_u8__; \
> break; \
> case 2: \
> asm(op "w "__percpu_arg(P1)",%0" \
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index d8f283b9a569..cf8483cd80e1 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -314,11 +314,13 @@ do { \
>
> #define __get_user_size(x, ptr, size, retval) \
> do { \
> + unsigned char x_u8__; \
> retval = 0; \
> __chk_user_ptr(ptr); \
> switch (size) { \
> case 1: \
> - __get_user_asm(x, ptr, retval, "b", "=q"); \
> + __get_user_asm(x_u8__, ptr, retval, "b", "=q"); \
> + (x) = x_u8__; \
> break; \
> case 2: \
> __get_user_asm(x, ptr, retval, "w", "=r"); \
> --
> 2.26.2.526.g744177e7f7-goog
>

--
Thanks,
~Nick Desaulniers

2020-05-11 18:11:39

by Brian Gerst

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 1:26 PM Nick Desaulniers
<[email protected]> wrote:
>
> Bumping for comment+review.
>
> On Mon, May 4, 2020 at 4:03 PM Nick Desaulniers <[email protected]> wrote:
> >
> > GCC and Clang are architecturally different, which leads to subtle
> > issues for code that's invalid but clearly dead. This can happen with
> > code that emulates polymorphism with the preprocessor and sizeof.
> >
> > GCC will perform semantic analysis after early inlining and dead code
> > elimination, so it will not warn on invalid code that's dead. Clang
> > strictly performs optimizations after semantic analysis, so it will warn
> > for dead code.
> >
> > Neither Clang nor GCC like this very much with -m32:
> >
> > long long ret;
> > asm ("movb $5, %0" : "=q" (ret));
> >
> > However, GCC can tolerate this variant:
> >
> > long long ret;
> > switch (sizeof(ret)) {
> > case 1:
> > asm ("movb $5, %0" : "=q" (ret));
> > break;
> > case 8:;
> > }
> >
> > Clang, on the other hand, won't accept that because it validates the
> > inline asm for the '1' case *before* the optimisation phase where it
> > realises that it wouldn't have to emit it anyway.
> >
> > If LLVM (Clang's "back end") fails such as during instruction selection
> > or register allocation, it cannot provide accurate diagnostics
> > (warnings/errors) that contain line information, as the AST has been
> > discarded from memory at that point.
> >
> > While there have been early discussions about having C/C++ specific
> > language optimizations in Clang via the use of MLIR, which would enable
> > such earlier optimizations, such work is not scoped and likely a
> > multi-year endeavor.
> >
> > We also don't want to swap the use of "=q" with "=r". For 64b, it
> > doesn't matter. For 32b, it's possible that a 32b register without a 8b
> > lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will
> > then reject.
> >
> > With this, Clang can finally build an i386 defconfig.
> >
> > Reported-by: Arnd Bergmann <[email protected]>
> > Reported-by: David Woodhouse <[email protected]>
> > Reported-by: Dmitry Golovin <[email protected]>
> > Reported-by: Linus Torvalds <[email protected]>
> > Link: https://bugs.llvm.org/show_bug.cgi?id=33587
> > Link: https://github.com/ClangBuiltLinux/linux/issues/3
> > Link: https://github.com/ClangBuiltLinux/linux/issues/194
> > Link: https://github.com/ClangBuiltLinux/linux/issues/781
> > Link: https://lore.kernel.org/lkml/[email protected]/
> > Link: https://lore.kernel.org/lkml/CAK8P3a1EBaWdbAEzirFDSgHVJMtWjuNt2HGG8z+vpXeNHwETFQ@mail.gmail.com/
> > Signed-off-by: Nick Desaulniers <[email protected]>
> > ---
> > Note: this is a resend of:
> > https://lore.kernel.org/lkml/[email protected]/
> > rebased on today's Linux next, and with the additional change to
> > uaccess.h.
> >
> > I'm happy to resend with authorship and reported by tags changed to
> > suggested by's or whatever, just let me know.
> >
> > Part of the commit message is stolen from David, and part from Linus.
> > Shall I resend with David's authorship and
> > [Nick: reworded]
> > ???
> >
> > I don't really care, I just don't really want to carry this out of tree
> > for our CI, which is green for i386 otherwise.
> >
> >
> > arch/x86/include/asm/percpu.h | 12 ++++++++----
> > arch/x86/include/asm/uaccess.h | 4 +++-
> > 2 files changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > index 2278797c769d..826d086f71c9 100644
> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -99,7 +99,7 @@ do { \
> > case 1: \
> > asm qual (op "b %1,"__percpu_arg(0) \
> > : "+m" (var) \
> > - : "qi" ((pto_T__)(val))); \
> > + : "qi" ((unsigned char)(unsigned long)(val))); \
> > break; \
> > case 2: \
> > asm qual (op "w %1,"__percpu_arg(0) \
> > @@ -144,7 +144,7 @@ do { \
> > else \
> > asm qual ("addb %1, "__percpu_arg(0) \
> > : "+m" (var) \
> > - : "qi" ((pao_T__)(val))); \
> > + : "qi" ((unsigned char)(unsigned long)(val))); \
> > break; \
> > case 2: \
> > if (pao_ID__ == 1) \
> > @@ -182,12 +182,14 @@ do { \
> >
> > #define percpu_from_op(qual, op, var) \
> > ({ \
> > + unsigned char pfo_u8__; \
> > typeof(var) pfo_ret__; \
> > switch (sizeof(var)) { \
> > case 1: \
> > asm qual (op "b "__percpu_arg(1)",%0" \
> > - : "=q" (pfo_ret__) \
> > + : "=q" (pfo_u8__) \
> > : "m" (var)); \
> > + pfo_ret__ = (typeof(var))(unsigned long)pfo_u8__; \
> > break; \
> > case 2: \
> > asm qual (op "w "__percpu_arg(1)",%0" \
> > @@ -211,12 +213,14 @@ do { \
> >
> > #define percpu_stable_op(op, var) \
> > ({ \
> > + unsigned char pfo_u8__; \
> > typeof(var) pfo_ret__; \
> > switch (sizeof(var)) { \
> > case 1: \
> > asm(op "b "__percpu_arg(P1)",%0" \
> > - : "=q" (pfo_ret__) \
> > + : "=q" (pfo_u8__) \
> > : "p" (&(var))); \
> > + pfo_ret__ = (typeof(var))(unsigned long)pfo_u8__; \
> > break; \
> > case 2: \
> > asm(op "w "__percpu_arg(P1)",%0" \
> > diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> > index d8f283b9a569..cf8483cd80e1 100644
> > --- a/arch/x86/include/asm/uaccess.h
> > +++ b/arch/x86/include/asm/uaccess.h
> > @@ -314,11 +314,13 @@ do { \
> >
> > #define __get_user_size(x, ptr, size, retval) \
> > do { \
> > + unsigned char x_u8__; \
> > retval = 0; \
> > __chk_user_ptr(ptr); \
> > switch (size) { \
> > case 1: \
> > - __get_user_asm(x, ptr, retval, "b", "=q"); \
> > + __get_user_asm(x_u8__, ptr, retval, "b", "=q"); \
> > + (x) = x_u8__; \
> > break; \
> > case 2: \
> > __get_user_asm(x, ptr, retval, "w", "=r"); \
> > --
> > 2.26.2.526.g744177e7f7-goog
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers

This looks like the same issue that we just discussed for bitops.h.
Add the "b" operand size modifier to force it to use the 8-bit
register names (and probably also needs the "w" modifier in the 16-bit
case).

--
Brian Gerst

2020-05-11 18:15:31

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 10:24 AM Nick Desaulniers
<[email protected]> wrote:
>
> Bumping for comment+review.
>
> On Mon, May 4, 2020 at 4:03 PM Nick Desaulniers <[email protected]> wrote:
> >
> > - : "qi" ((pto_T__)(val))); \
> > + : "qi" ((unsigned char)(unsigned long)(val))); \

I find the extra casts really annoying and illogical.

I kind of see why they happen: just casting to 'unsigned char' can
cause warnings, when this case isn't even taken, and the sizeof(var)
is something else (and you cast a pointer to a char without going
through the 'unsigned long' first.

But that doesn't make this change any more sensible or readable.

Would using "__builtin_choose_expr()" be able to avoid this whole issue?

Because I'd rather re-write the whole thing with a different model
that doesn't have this issue, than make the code look insane.

And it does look insane, because that whole thing makes for a very
natural question: "why do we cast to unsigned long and then to char,
when we just checked that the size of the type is 1"?

Linus

2020-05-11 18:34:22

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 11:12 AM Linus Torvalds
<[email protected]> wrote:
>
> Would using "__builtin_choose_expr()" be able to avoid this whole issue?

We actually have a fair amount of "pick expression based on size", so
with a few helper macros we could make the code look better than the
case statements too.

Something (ENTIRELY UNTESTED!) like the attached patch, perhaps?

NOTE! I only converted one single use to that "pick_size_xyz()" model.
If this actually works for clang too, we could do the others.

I guess I should just test it, since I have that clang tree.

Linus

Attachments:

patch (1.81 kB)

2020-05-11 18:38:55

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 11:24 AM Linus Torvalds
<[email protected]> wrote:
>
> I guess I should just test it, since I have that clang tree.

No, clang doesn't seem to handle it even with __builtin_choose_expr(),
and has that

invalid input size for constraint 'qi'

even when it's in a side that is never chosen.

Very annoying. A lot of our magic macros are literally about "pick one
case when the others are not valid for this type".

Linus

2020-05-11 18:50:27

by Nick Desaulniers

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 11:09 AM Brian Gerst <[email protected]> wrote:
> This looks like the same issue that we just discussed for bitops.h.
> Add the "b" operand size modifier to force it to use the 8-bit
> register names (and probably also needs the "w" modifier in the 16-bit
> case).

While it does feel familiar, it is slightly different.
https://godbolt.org/z/Rme4Zg
That case was both compilers validating the inline asm, yet generating
assembly that the assembler would choke on. This case is validation
in the front end failing.

Side note: would you mind sending a review by tag for v5 of that patch
if you think it's good to go? It does fix a regression I'd prefer
didn't ship in 5.7.
--
Thanks,
~Nick Desaulniers

2020-05-11 19:39:09

by Brian Gerst

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 2:46 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Mon, May 11, 2020 at 11:09 AM Brian Gerst <[email protected]> wrote:
> > This looks like the same issue that we just discussed for bitops.h.
> > Add the "b" operand size modifier to force it to use the 8-bit
> > register names (and probably also needs the "w" modifier in the 16-bit
> > case).
>
> While it does feel familiar, it is slightly different.
> https://godbolt.org/z/Rme4Zg
> That case was both compilers validating the inline asm, yet generating
> assembly that the assembler would choke on. This case is validation
> in the front end failing.

> long long ret;
> switch (sizeof(ret)) {
> case 1:
> asm ("movb $5, %0" : "=q" (ret));
> break;
> case 8:;
> }

So if the issue here is that the output variable type is long long,
what code is using a 64-bit percpu variable on a 32-bit kernel? Can
you give a specific file that fails to build with Clang? If Clang is
choking on it it may be silently miscompiling on GCC.

--
Brian Gerst

2020-05-11 19:57:10

by Nick Desaulniers

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 11:24 AM Linus Torvalds
<[email protected]> wrote:
>
> On Mon, May 11, 2020 at 11:12 AM Linus Torvalds
> <[email protected]> wrote:
> >
> > Would using "__builtin_choose_expr()" be able to avoid this whole issue?
>
> We actually have a fair amount of "pick expression based on size", so
> with a few helper macros we could make the code look better than the
> case statements too.
>
> Something (ENTIRELY UNTESTED!) like the attached patch, perhaps?
>
> NOTE! I only converted one single use to that "pick_size_xyz()" model.
> If this actually works for clang too, we could do the others.
>
> I guess I should just test it, since I have that clang tree.

Interesting approach. Researching __builtin_choose_expr, it looks
like it was cited as prior art for C11's _Generic keyword.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1404.htm

I'm kind of tempted now to play with _Generic and see if that makes a
difference, though I'm not hopeful. Without checking, I don't know if
that will produce warnings with `-std=gnu89`.

...

I'm playing around with _Generic now to see if that's a possible
solution, but it looks like it can't be used for selecting inline asm
constraint strings. But maybe there's another way to use _Generic
here. TODO: more testing.

I don't quite understand the use of GNU C statement expressions in
pick_type_statement, though I'm guessing the return value is important
somewhere. Maybe just looking at the preprocessed source would make
it clearer.

> "why do we cast to unsigned long and then to char,
> when we just checked that the size of the type is 1"?

Deja vu. I don't remember who I discussed that with, maybe Arnd, but
that was something else I had asked at some point. I must have
forgotten to look into it more before sending the patch. Can likely
drop that at least.

--
Thanks,
~Nick Desaulniers

2020-05-11 20:04:01

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 12:52 PM Nick Desaulniers
<[email protected]> wrote:
>
> Interesting approach. Researching __builtin_choose_expr, it looks
> like it was cited as prior art for C11's _Generic keyword.

Well, the thing that made me think that __builtin_choose_expr() would
work is that unlike the switch statement, you absolutely _have_ to do
the choice in the front end. You can't leave it as some kind of
optimization for later phases, because the choice od expression ends
up also determining the type of the result, so it isn't just a local
choice - it affects everything around that expression.

But clang still doesn't like that "qi" constraint with a (non-chosen)
expression that has a "u64" type.

I guess we can take the stupid extra cast, but I think it would at
least need a comment (maybe through a helper function) about why "qi"
needs it, but "ri" does not, and why the cast to "unsigned long" is
needed, even though "clearly" the type is already just 8 bits.

Otherwise somebody will just remove that "obviously pointless" cast,
and gcc will eat the result happily, and clang will fail.

And nobody will notice for a while anyway, because this issue only
happens on 32-bit targets, and developers don't use those any more.

Linus

2020-05-11 20:09:00

by David Woodhouse

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, 2020-05-11 at 13:01 -0700, Linus Torvalds wrote:
> On Mon, May 11, 2020 at 12:52 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > Interesting approach. Researching __builtin_choose_expr, it looks
> > like it was cited as prior art for C11's _Generic keyword.
>
> Well, the thing that made me think that __builtin_choose_expr() would
> work is that unlike the switch statement, you absolutely _have_ to do
> the choice in the front end. You can't leave it as some kind of
> optimization for later phases, because the choice od expression ends
> up also determining the type of the result, so it isn't just a local
> choice - it affects everything around that expression.
>
> But clang still doesn't like that "qi" constraint with a (non-chosen)
> expression that has a "u64" type.
>
> I guess we can take the stupid extra cast, but I think it would at
> least need a comment (maybe through a helper function) about why "qi"
> needs it, but "ri" does not, and why the cast to "unsigned long" is
> needed, even though "clearly" the type is already just 8 bits.
>
> Otherwise somebody will just remove that "obviously pointless" cast,
> and gcc will eat the result happily, and clang will fail.

I'm also mildly concerned that LLVM will start to whine about the 'ri'
case too. It's odd that it doesn't, even when GCC does.

Attachments:

smime.p7s (5.05 kB)

2020-05-11 20:20:49

by Nick Desaulniers

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 12:34 PM Brian Gerst <[email protected]> wrote:
>
> On Mon, May 11, 2020 at 2:46 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > On Mon, May 11, 2020 at 11:09 AM Brian Gerst <[email protected]> wrote:
> > > This looks like the same issue that we just discussed for bitops.h.
> > > Add the "b" operand size modifier to force it to use the 8-bit
> > > register names (and probably also needs the "w" modifier in the 16-bit
> > > case).
> >
> > While it does feel familiar, it is slightly different.
> > https://godbolt.org/z/Rme4Zg
> > That case was both compilers validating the inline asm, yet generating
> > assembly that the assembler would choke on. This case is validation
> > in the front end failing.
>
> > long long ret;
> > switch (sizeof(ret)) {
> > case 1:
> > asm ("movb $5, %0" : "=q" (ret));
> > break;
> > case 8:;
> > }
>
> So if the issue here is that the output variable type is long long,
> what code is using a 64-bit percpu variable on a 32-bit kernel? Can
> you give a specific file that fails to build with Clang? If Clang is
> choking on it it may be silently miscompiling on GCC.

I'm not sure that's the case. Applying this patch, undoing the hunk
in percpu_from_op() we get tons of errors. Looking at one:

kernel/events/core.c:8679:8: error: invalid output size for constraint '=q'
./include/linux/percpu-defs.h:446:2: note: expanded from macro '__this_cpu_read'
raw_cpu_read(pcp); \
^
...

There's nothing wrong with this line, it's reading a percpu u64 into a
local u64. The error comes from validating the inline asm in the dead
branch.
--
Thanks,
~Nick Desaulniers

2020-05-11 22:56:29

by Brian Gerst

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 3:34 PM Brian Gerst <[email protected]> wrote:
>
> On Mon, May 11, 2020 at 2:46 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > On Mon, May 11, 2020 at 11:09 AM Brian Gerst <[email protected]> wrote:
> > > This looks like the same issue that we just discussed for bitops.h.
> > > Add the "b" operand size modifier to force it to use the 8-bit
> > > register names (and probably also needs the "w" modifier in the 16-bit
> > > case).
> >
> > While it does feel familiar, it is slightly different.
> > https://godbolt.org/z/Rme4Zg
> > That case was both compilers validating the inline asm, yet generating
> > assembly that the assembler would choke on. This case is validation
> > in the front end failing.
>
> > long long ret;
> > switch (sizeof(ret)) {
> > case 1:
> > asm ("movb $5, %0" : "=q" (ret));
> > break;
> > case 8:;
> > }
>
> So if the issue here is that the output variable type is long long,
> what code is using a 64-bit percpu variable on a 32-bit kernel? Can
> you give a specific file that fails to build with Clang? If Clang is
> choking on it it may be silently miscompiling on GCC.

On further investigation, 64-bit percpu operations fall back to the
generic code on x86-32, so there is no problem with miscompiling here.

On a side note from looking at the preprocessed output of the percpu
macros: they generate a ton of extra dead code because the core macros
also have a switch on data size. I will take a stab at cleaning that
up.

--
Brian Gerst

2020-05-12 20:40:13

by Nick Desaulniers

[permalink] [raw]

Subject: Re: [PATCH] x86: support i386 with Clang

On Mon, May 11, 2020 at 1:03 PM David Woodhouse <[email protected]> wrote:
>
> On Mon, 2020-05-11 at 13:01 -0700, Linus Torvalds wrote:
> > On Mon, May 11, 2020 at 12:52 PM Nick Desaulniers
> > <[email protected]> wrote:
> > >
> > > Interesting approach. Researching __builtin_choose_expr, it looks
> > > like it was cited as prior art for C11's _Generic keyword.
> >
> > Well, the thing that made me think that __builtin_choose_expr() would
> > work is that unlike the switch statement, you absolutely _have_ to do
> > the choice in the front end. You can't leave it as some kind of
> > optimization for later phases, because the choice od expression ends
> > up also determining the type of the result, so it isn't just a local
> > choice - it affects everything around that expression.
> >
> > But clang still doesn't like that "qi" constraint with a (non-chosen)
> > expression that has a "u64" type.
> >
> > I guess we can take the stupid extra cast, but I think it would at
> > least need a comment (maybe through a helper function) about why "qi"
> > needs it, but "ri" does not, and why the cast to "unsigned long" is
> > needed, even though "clearly" the type is already just 8 bits.
> >
> > Otherwise somebody will just remove that "obviously pointless" cast,
> > and gcc will eat the result happily, and clang will fail.
>
> I'm also mildly concerned that LLVM will start to whine about the 'ri'
> case too. It's odd that it doesn't, even when GCC does.

Ah, sorry, it took me a while to understand what case you meant by
this. I recall you pointing this out previously in
https://bugs.llvm.org/show_bug.cgi?id=33587#c16, but at the time I
simply wasn't well versed enough in inline asm to understand. The
case you're referring to is 64b operands, "r" constraint, -m32 target.
Yes, if I fix that: https://reviews.llvm.org/D79804, then this whole
patch needs to be reworked. Back to the drawing board.
--
Thanks,
~Nick Desaulniers