2009-01-23 17:57:35

by Török Edwin

[permalink] [raw]
Subject: inline asm semantics: output constraint width smaller than input

Hi,

I am trying to build the kernel with LLVM 2.5 prerelease (using
llvm-gcc-4.2 frontend), however I am running into some inline asm
semantics issues, and after some discussion on LLVM bugzilla I would
like to know if you would be accepting patches for this:
http://llvm.org/bugs/show_bug.cgi?id=3373

The problem is when "a" output constraint is used with a variable of
smaller width than the
"0" input constraint.

Here are 2 examples:

int __ret_pu; unsigned long __pu_val;
return ({asm volatile("call __put_user_" "8" : "=a" (__ret_pu) :"0"
(__pu_val), "c"(addr) : "ebx"); __ret_pu;});


unsigned char return_code; /* %al */
unsigned long address; /* %ebx */
unsigned long length; /* %ecx */
unsigned long entry; /* %edx */
unsigned long flags;
__asm__("lcall *(%%edi); cld"
: "=a" (return_code),
"=b" (address),
"=c" (length),
"=d" (entry)
: "0" (service),
"1" (0),
"D" (&bios32_indirect));

There are 2 cases:
1. output is wider than input
2. output is narrower than input

Case 2 seems to occur lots of times on 64-bit (due to sizeof(int) != sizeof(unsigned long)),
and a few times on 32-bit as well.

Would you accept patches that increase the portability of the inline asm statements?
(essentially by adding casts for case 1, and introducing a temporary of correct width for case 2).

Please see:
http://llvm.org/bugs/show_bug.cgi?id=3373#c6

Best regards,
--Edwin


2009-01-23 18:17:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input


* T?r?k Edwin <[email protected]> wrote:

> Hi,
>
> I am trying to build the kernel with LLVM 2.5 prerelease (using
> llvm-gcc-4.2 frontend), however I am running into some inline asm
> semantics issues, and after some discussion on LLVM bugzilla I would
> like to know if you would be accepting patches for this:
> http://llvm.org/bugs/show_bug.cgi?id=3373
>
> The problem is when "a" output constraint is used with a variable of
> smaller width than the "0" input constraint.
>
> Here are 2 examples:
>
> int __ret_pu; unsigned long __pu_val;
> return ({asm volatile("call __put_user_" "8" : "=a" (__ret_pu) :"0"
> (__pu_val), "c"(addr) : "ebx"); __ret_pu;});
>
>
> unsigned char return_code; /* %al */
> unsigned long address; /* %ebx */
> unsigned long length; /* %ecx */
> unsigned long entry; /* %edx */
> unsigned long flags;
> __asm__("lcall *(%%edi); cld"
> : "=a" (return_code),
> "=b" (address),
> "=c" (length),
> "=d" (entry)
> : "0" (service),
> "1" (0),
> "D" (&bios32_indirect));
>
> There are 2 cases:
> 1. output is wider than input
> 2. output is narrower than input
>
> Case 2 seems to occur lots of times on 64-bit (due to sizeof(int) !=
> sizeof(unsigned long)), and a few times on 32-bit as well.
>
> Would you accept patches that increase the portability of the inline asm
> statements? (essentially by adding casts for case 1, and introducing a
> temporary of correct width for case 2).

i'd not mind it at all if the kernel could be built with other open-source
compilers too.

Now in this case the patch you suggest might end up hurting the end result
so it's not an unconditional 'yes'. But ... how much it actually matters
depends on the circumstances.

So could you please send a sample patch for some of most common inline
assembly statements that are affected by this, so that we can see:

1) how ugly the LLVM workarounds are
2) how they affect the generated kernel image in practice

My gut feeling is that it's going to be acceptable with a bit of thinking
(we might even do some wrappers to do this cleanly) - but i'd really like
to see it before giving you that judgement.

Another question: does LLVM always warn about such input/output aliased
constraint width mismatch problems if they occur, or does it silently
corrupt the resulting instruction sequence?

Ingo

2009-01-23 18:22:27

by H. Peter Anvin

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input

Ingo Molnar wrote:
>
>> Hi,
>>
>> I am trying to build the kernel with LLVM 2.5 prerelease (using
>> llvm-gcc-4.2 frontend), however I am running into some inline asm
>> semantics issues, and after some discussion on LLVM bugzilla I would
>> like to know if you would be accepting patches for this:
>> http://llvm.org/bugs/show_bug.cgi?id=3373
>>
>> The problem is when "a" output constraint is used with a variable of
>> smaller width than the "0" input constraint.
>>

IMO, that's an LLVM bug and should be fixed in LLVM while it is in
prerelease. That particular constraint is a serious one, and will haunt
us for many years if we allow ourselves to go down the path of having to
fix that every time it crops up.

-hpa

2009-01-23 18:27:37

by Török Edwin

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input

On 2009-01-23 20:17, Ingo Molnar wrote:
> * T?r?k Edwin <[email protected]> wrote:
>
>
>> Hi,
>>
>> I am trying to build the kernel with LLVM 2.5 prerelease (using
>> llvm-gcc-4.2 frontend), however I am running into some inline asm
>> semantics issues, and after some discussion on LLVM bugzilla I would
>> like to know if you would be accepting patches for this:
>> http://llvm.org/bugs/show_bug.cgi?id=3373
>>
>> The problem is when "a" output constraint is used with a variable of
>> smaller width than the "0" input constraint.
>>
>> Here are 2 examples:
>>
>> int __ret_pu; unsigned long __pu_val;
>> return ({asm volatile("call __put_user_" "8" : "=a" (__ret_pu) :"0"
>> (__pu_val), "c"(addr) : "ebx"); __ret_pu;});
>>
>>
>> unsigned char return_code; /* %al */
>> unsigned long address; /* %ebx */
>> unsigned long length; /* %ecx */
>> unsigned long entry; /* %edx */
>> unsigned long flags;
>> __asm__("lcall *(%%edi); cld"
>> : "=a" (return_code),
>> "=b" (address),
>> "=c" (length),
>> "=d" (entry)
>> : "0" (service),
>> "1" (0),
>> "D" (&bios32_indirect));
>>
>> There are 2 cases:
>> 1. output is wider than input
>> 2. output is narrower than input
>>
>> Case 2 seems to occur lots of times on 64-bit (due to sizeof(int) !=
>> sizeof(unsigned long)), and a few times on 32-bit as well.
>>
>> Would you accept patches that increase the portability of the inline asm
>> statements? (essentially by adding casts for case 1, and introducing a
>> temporary of correct width for case 2).
>>
>
> i'd not mind it at all if the kernel could be built with other open-source
> compilers too.
>
> Now in this case the patch you suggest might end up hurting the end result
> so it's not an unconditional 'yes'. But ... how much it actually matters
> depends on the circumstances.
>
> So could you please send a sample patch for some of most common inline
> assembly statements that are affected by this, so that we can see:
>
> 1) how ugly the LLVM workarounds are
>

Ok, I will prepare a patch for both cases.

> 2) how they affect the generated kernel image in practice
>
> My gut feeling is that it's going to be acceptable with a bit of thinking
> (we might even do some wrappers to do this cleanly) - but i'd really like
> to see it before giving you that judgement.
>
> Another question: does LLVM always warn about such input/output aliased
> constraint width mismatch problems if they occur, or does it silently
> corrupt the resulting instruction sequence?
>

With current LLVM it gives an error message in the frontend and refuses
to compile those files.
With LLVM 2.4 unsatistifiable constraints used to lead to an assertion
failure in the backend during code generation, thus also refusing to
compile the file (assertions are enabled in a release build).
However due to changes in the code generator for LLVM 2.5, mismatching
widths for same registers are no longer accepted (I think it was
complicated to support that in the code generator), and they are now
rejected in the frontend instead of the backend.

Having said that, llvm-gcc is not yet able to compile the full Linux
kernel on its own [for example the boot code, due to asm(".code16gcc")],
but with LLVM 2.4 it was
possible to build "arch=UM", and "arch=X86" (by using gcc to build the
bootcode).
I'd like LLVM 2.5 to be able to build the kernel, so I'll file bugs for
llvm/kernel depending on where the problem is.

Best regards,
--Edwin

2009-01-23 18:30:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input


* T?r?k Edwin <[email protected]> wrote:

> Having said that, llvm-gcc is not yet able to compile the full Linux
> kernel on its own [for example the boot code, due to asm(".code16gcc")],
> but with LLVM 2.4 it was possible to build "arch=UM", and "arch=X86" (by
> using gcc to build the bootcode). I'd like LLVM 2.5 to be able to build
> the kernel, so I'll file bugs for llvm/kernel depending on where the
> problem is.

Could we get LLVM folks on the Cc: and see how difficult it would be to
fix this on the LLVM side? Asm constraints are used all around the place
and different input/output types are very common.

Ingo

2009-01-23 18:52:53

by Török Edwin

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input

On 2009-01-23 20:30, Ingo Molnar wrote:
> * T?r?k Edwin <[email protected]> wrote:
>
>
>> Having said that, llvm-gcc is not yet able to compile the full Linux
>> kernel on its own [for example the boot code, due to asm(".code16gcc")],
>> but with LLVM 2.4 it was possible to build "arch=UM", and "arch=X86" (by
>> using gcc to build the bootcode). I'd like LLVM 2.5 to be able to build
>> the kernel, so I'll file bugs for llvm/kernel depending on where the
>> problem is.
>>
>
> Could we get LLVM folks on the Cc: and see how difficult it would be to
> fix this on the LLVM side? Asm constraints are used all around the place
> and different input/output types are very common.

I've added LLVMDev to Cc.
Your first post to the list may take a little longer to reach it (first
post is moderated by listmaster, no subscription required).

[For those who missed the initial conversation, see:
http://llvm.org/bugs/show_bug.cgi?id=3373#c9]

Best regards,
--Edwin

2009-01-23 20:43:06

by Török Edwin

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input

On 2009-01-23 20:52, T?r?k Edwin wrote:
> On 2009-01-23 20:30, Ingo Molnar wrote:
>
>> * T?r?k Edwin <[email protected]> wrote:
>>
>>
>>
>>> Having said that, llvm-gcc is not yet able to compile the full Linux
>>> kernel on its own [for example the boot code, due to asm(".code16gcc")],
>>> but with LLVM 2.4 it was possible to build "arch=UM", and "arch=X86" (by
>>> using gcc to build the bootcode). I'd like LLVM 2.5 to be able to build
>>> the kernel, so I'll file bugs for llvm/kernel depending on where the
>>> problem is.
>>>
>>>
>> Could we get LLVM folks on the Cc: and see how difficult it would be to
>> fix this on the LLVM side? Asm constraints are used all around the place
>> and different input/output types are very common.
>>
>
>

Hi Ingo,

Could you describe what are the semantics you need for inline asm
constraints in the kernel?
GCC doesn't document all the corner cases, and defining inline asm =
"whatever gcc accepts" is not very useful for LLVM.

So far we've encountered the problem with input/output operand tied to
same register, but having different widths:
- output wider than input, both integers: do you need this case?
- output narrower than input, both integers: this is the common case, right?
- can it also happen that input is pointer, output is integer of
different width?
- .. any other mismatches?

Could you also describe why put_user/the example from pcbios needs the
different widths?

Best regards,
--Edwin

2009-01-24 16:24:18

by Török Edwin

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input

On 2009-01-23 20:27, T?r?k Edwin wrote:
>>>
>>>
>> i'd not mind it at all if the kernel could be built with other open-source
>> compilers too.
>>
>> Now in this case the patch you suggest might end up hurting the end result
>> so it's not an unconditional 'yes'. But ... how much it actually matters
>> depends on the circumstances.
>>
>> So could you please send a sample patch for some of most common inline
>> assembly statements that are affected by this, so that we can see:
>>
>> 1) how ugly the LLVM workarounds are
>>
>>
>
> Ok, I will prepare a patch for both cases.
>
>
>> 2) how they affect the generated kernel image in practice
>>
>> My gut feeling is that it's going to be acceptable with a bit of thinking
>> (we might even do some wrappers to do this cleanly) - but i'd really like
>> to see it before giving you that judgement.
>>

The below patch is to build the kernel for x86_64, with the attached
.config,
using llvm-gcc (trunk, with patch from
http://llvm.org/bugs/show_bug.cgi?id=2989#c2).

The .config has KVM turned off, because I didn't know how to change
x86_emulate.c so that LLVM builds it
(http://llvm.org/bugs/show_bug.cgi?id=3373#c10)
For 32-bit some more changes are required.

The resulting kernel image are of the same size
$ ls -l vmlinux.patched
-rwxr-xr-x 1 edwin edwin 11277206 2009-01-24 17:58 vmlinux.patched
$ ls -l vmlinux
-rwxr-xr-x 1 edwin edwin 11277206 2009-01-24 18:01 vmlinux

They aren't identical though, a disassembly shows that the address of
most of functions changed,
also some register assignments changed (r14 instead of r15, and so on).

Are these changes correct, and are they acceptable?

Best regards,
--Edwin

---
arch/x86/include/asm/uaccess.h | 10 ++++++----
arch/x86/lib/delay.c | 2 +-
2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 69d2757..28280de 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -154,7 +154,7 @@ extern int __get_user_bad(void);

#define get_user(x, ptr) \
({ \
- int __ret_gu; \
+ unsigned long __ret_gu; \
unsigned long __val_gu; \
__chk_user_ptr(ptr); \
might_fault(); \
@@ -176,7 +176,7 @@ extern int __get_user_bad(void);
break; \
} \
(x) = (__typeof__(*(ptr)))__val_gu; \
- __ret_gu; \
+ (int)__ret_gu; \
})

#define __put_user_x(size, x, ptr, __ret_pu) \
@@ -239,11 +239,13 @@ extern void __put_user_8(void);
*/
#define put_user(x, ptr) \
({ \
- int __ret_pu; \
+ __typeof__(*(ptr)) __ret_pu; \
__typeof__(*(ptr)) __pu_val; \
__chk_user_ptr(ptr); \
might_fault(); \
__pu_val = x; \
+ /* return value is 0 or -EFAULT, both fit in 1 byte, and \
+ * are sign-extendable to int */ \
switch (sizeof(*(ptr))) { \
case 1: \
__put_user_x(1, __pu_val, ptr, __ret_pu); \
@@ -261,7 +263,7 @@ extern void __put_user_8(void);
__put_user_x(X, __pu_val, ptr, __ret_pu); \
break; \
} \
- __ret_pu; \
+ (int)__ret_pu; \
})

#define __put_user_size(x, ptr, size, retval, errret) \
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index f456860..12d27f8 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -112,7 +112,7 @@ EXPORT_SYMBOL(__delay);

inline void __const_udelay(unsigned long xloops)
{
- int d0;
+ unsigned long d0;

xloops *= 4;
asm("mull %%edx"
--
1.5.6.5


Attachments:
.config-llvm-64 (52.54 kB)

2009-01-24 17:28:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input


* T?r?k Edwin <[email protected]> wrote:

> On 2009-01-23 20:27, T?r?k Edwin wrote:
> >>>
> >>>
> >> i'd not mind it at all if the kernel could be built with other open-source
> >> compilers too.
> >>
> >> Now in this case the patch you suggest might end up hurting the end result
> >> so it's not an unconditional 'yes'. But ... how much it actually matters
> >> depends on the circumstances.
> >>
> >> So could you please send a sample patch for some of most common inline
> >> assembly statements that are affected by this, so that we can see:
> >>
> >> 1) how ugly the LLVM workarounds are
> >>
> >>
> >
> > Ok, I will prepare a patch for both cases.
> >
> >
> >> 2) how they affect the generated kernel image in practice
> >>
> >> My gut feeling is that it's going to be acceptable with a bit of thinking
> >> (we might even do some wrappers to do this cleanly) - but i'd really like
> >> to see it before giving you that judgement.
> >>
>
> The below patch is to build the kernel for x86_64, with the attached
> .config,
> using llvm-gcc (trunk, with patch from
> http://llvm.org/bugs/show_bug.cgi?id=2989#c2).
>
> The .config has KVM turned off, because I didn't know how to change
> x86_emulate.c so that LLVM builds it
> (http://llvm.org/bugs/show_bug.cgi?id=3373#c10)
> For 32-bit some more changes are required.
>
> The resulting kernel image are of the same size
> $ ls -l vmlinux.patched
> -rwxr-xr-x 1 edwin edwin 11277206 2009-01-24 17:58 vmlinux.patched
> $ ls -l vmlinux
> -rwxr-xr-x 1 edwin edwin 11277206 2009-01-24 18:01 vmlinux
>
> They aren't identical though, a disassembly shows that the address of
> most of functions changed,
> also some register assignments changed (r14 instead of r15, and so on).
>
> Are these changes correct, and are they acceptable?
>
> Best regards,
> --Edwin
>
> ---
> arch/x86/include/asm/uaccess.h | 10 ++++++----
> arch/x86/lib/delay.c | 2 +-
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 69d2757..28280de 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -154,7 +154,7 @@ extern int __get_user_bad(void);
>
> #define get_user(x, ptr) \
> ({ \
> - int __ret_gu; \
> + unsigned long __ret_gu; \
> unsigned long __val_gu; \
> __chk_user_ptr(ptr); \
> might_fault(); \
> @@ -176,7 +176,7 @@ extern int __get_user_bad(void);
> break; \
> } \
> (x) = (__typeof__(*(ptr)))__val_gu; \
> - __ret_gu; \
> + (int)__ret_gu; \
> })
>
> #define __put_user_x(size, x, ptr, __ret_pu) \
> @@ -239,11 +239,13 @@ extern void __put_user_8(void);
> */
> #define put_user(x, ptr) \
> ({ \
> - int __ret_pu; \
> + __typeof__(*(ptr)) __ret_pu; \

This does not look right. We can sometimes have put_user() of non-integer
types (say structures). How does the (int)__ret_pu cast work in that case?
We'll fall into this branch in that case:

default: \
__put_user_x(X, __pu_val, ptr, __ret_pu); \
break; \

and __ret_pu has a nonsensical type in that case.

> __typeof__(*(ptr)) __pu_val; \
> __chk_user_ptr(ptr); \
> might_fault(); \
> __pu_val = x; \
> + /* return value is 0 or -EFAULT, both fit in 1 byte, and \
> + * are sign-extendable to int */ \
> switch (sizeof(*(ptr))) { \
> case 1: \
> __put_user_x(1, __pu_val, ptr, __ret_pu); \
> @@ -261,7 +263,7 @@ extern void __put_user_8(void);
> __put_user_x(X, __pu_val, ptr, __ret_pu); \
> break; \
> } \
> - __ret_pu; \
> + (int)__ret_pu; \
> })
>
> #define __put_user_size(x, ptr, size, retval, errret) \
> diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
> index f456860..12d27f8 100644
> --- a/arch/x86/lib/delay.c
> +++ b/arch/x86/lib/delay.c
> @@ -112,7 +112,7 @@ EXPORT_SYMBOL(__delay);
>
> inline void __const_udelay(unsigned long xloops)
> {
> - int d0;
> + unsigned long d0;
>
> xloops *= 4;
> asm("mull %%edx"

Is this all that you need (plus the 16-bit setup code tweaks) to get LLVM
to successfully build a 64-bit kernel image?

If yes then this doesnt look all that bad or invasive at first sight (if
the put_user() workaround can be expressed in a cleaner way), but in any
case it would be nice to hear an LLVM person's opinion about roughly when
this is going to be solved in LLVM itself.

Ingo

2009-01-24 19:31:22

by Chris Lattner

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input


On Jan 24, 2009, at 9:27 AM, Ingo Molnar wrote:

>> #define __put_user_size(x, ptr, size, retval, errret) \
>> diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
>> index f456860..12d27f8 100644
>> --- a/arch/x86/lib/delay.c
>> +++ b/arch/x86/lib/delay.c
>> @@ -112,7 +112,7 @@ EXPORT_SYMBOL(__delay);
>>
>> inline void __const_udelay(unsigned long xloops)
>> {
>> - int d0;
>> + unsigned long d0;
>>
>> xloops *= 4;
>> asm("mull %%edx"
>
> Is this all that you need (plus the 16-bit setup code tweaks) to get
> LLVM
> to successfully build a 64-bit kernel image?
>
> If yes then this doesnt look all that bad or invasive at first sight
> (if
> the put_user() workaround can be expressed in a cleaner way), but in
> any
> case it would be nice to hear an LLVM person's opinion about roughly
> when
> this is going to be solved in LLVM itself.

Hi Ingo,

We would like to support some more specific cases (e.g. when tying a
pointer/int to a different size pointer/int) but we currently don't
intend to support all cases (e.g. tying a FP value to int). We are in
this position because the semantics are very vague and hard to reason
about (and change based on target endianness) and we had many subtle
bugs in the corner cases.

Instead of having silent miscompiles, we decided to just reject all
the "hard" cases and add them back one by one as there is demand.
That way users could choose to modify their asms instead of having
them be potentially silently miscompiled.

LLVM 2.5 is in its release process right now, so it will not have
improvements in this area, but LLVM 2.6 certainly could. If there is
interest in building the kernel with 2.5, I think taking the patches
would be worthwhile. If that is hopeless anyway, waiting for the LLVM-
side fixes should be fine.

-Chris

2009-01-24 20:07:21

by Andreas Schwab

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input

T?r?k Edwin <[email protected]> writes:

> @@ -239,11 +239,13 @@ extern void __put_user_8(void);
> */
> #define put_user(x, ptr) \
> ({ \
> - int __ret_pu; \
> + __typeof__(*(ptr)) __ret_pu; \
> __typeof__(*(ptr)) __pu_val; \
> __chk_user_ptr(ptr); \
> might_fault(); \
> __pu_val = x; \
> + /* return value is 0 or -EFAULT, both fit in 1 byte, and \
> + * are sign-extendable to int */ \

That does not work when *ptr is unsigned (char or short).

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2009-01-24 21:10:36

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

Chris Lattner wrote:
>
> We would like to support some more specific cases (e.g. when tying a
> pointer/int to a different size pointer/int) but we currently don't
> intend to support all cases (e.g. tying a FP value to int). We are in
> this position because the semantics are very vague and hard to reason
> about (and change based on target endianness) and we had many subtle
> bugs in the corner cases.
>
> Instead of having silent miscompiles, we decided to just reject all the
> "hard" cases and add them back one by one as there is demand. That way
> users could choose to modify their asms instead of having them be
> potentially silently miscompiled.
>

The case that matters for the kernel is integer to integer, when a
register is re-used from input to output.

> LLVM 2.5 is in its release process right now, so it will not have
> improvements in this area, but LLVM 2.6 certainly could. If there is
> interest in building the kernel with 2.5, I think taking the patches
> would be worthwhile. If that is hopeless anyway, waiting for the
> LLVM-side fixes should be fine.

The patches don't look all that bad to me, but I really want to make
sure we don't keep littering the kernel with workarounds for N different
compilers without getting a track to have them cleaned up.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-01-24 18:57:35

by Török Edwin

[permalink] [raw]
Subject: Re: inline asm semantics: output constraint width smaller than input

On 2009-01-24 19:27, Ingo Molnar wrote:
> * T?r?k Edwin <[email protected]> wrote:
>
>> #define put_user(x, ptr) \
>> ({ \
>> - int __ret_pu; \
>> + __typeof__(*(ptr)) __ret_pu; \
>>
>
> This does not look right. We can sometimes have put_user() of non-integer
> types (say structures).

I didn't encounter it with my .config, but it is certainly possible.
I think using __builtin_choose_expr would be better than the switch

See a new patch at the end of this mail, using __builtin_choose_expr.
[vmlinux size still same]
It also includes some 32-bit stuff, but that is not complete yet.

> How does the (int)__ret_pu cast work in that case?
>

It would fail at compile time, with an error message that you can't cast
aggregates to ints, so my patch is not good.

> We'll fall into this branch in that case:
>
> default: \
> __put_user_x(X, __pu_val, ptr, __ret_pu); \
> break; \
>
> and __ret_pu has a nonsensical type in that case.
>

That branch is a call to a non-existent function __put_user_X, and
should give error at link time, right?
In the new patch below I used (void)-EFAULT so that you would get an
error at compile time (as suggested
in __builtin_choose_expr in gcc's manual), if that branch would ever get
expanded. Does that sound right?

>
>> __typeof__(*(ptr)) __pu_val; \
>> __chk_user_ptr(ptr); \
>> might_fault(); \
>> __pu_val = x; \
>> + /* return value is 0 or -EFAULT, both fit in 1 byte, and \
>> + * are sign-extendable to int */ \
>> switch (sizeof(*(ptr))) { \
>> case 1: \
>> __put_user_x(1, __pu_val, ptr, __ret_pu); \
>> @@ -261,7 +263,7 @@ extern void __put_user_8(void);
>> __put_user_x(X, __pu_val, ptr, __ret_pu); \
>> break; \
>> } \
>> - __ret_pu; \
>> + (int)__ret_pu; \
>> })
>>
>> #define __put_user_size(x, ptr, size, retval, errret) \
>> diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
>> index f456860..12d27f8 100644
>> --- a/arch/x86/lib/delay.c
>> +++ b/arch/x86/lib/delay.c
>> @@ -112,7 +112,7 @@ EXPORT_SYMBOL(__delay);
>>
>> inline void __const_udelay(unsigned long xloops)
>> {
>> - int d0;
>> + unsigned long d0;
>>
>> xloops *= 4;
>> asm("mull %%edx"
>>
>
> Is this all that you need (plus the 16-bit setup code tweaks)

The 16-bit setup code is compiled, but obviously doesn't work.
I think the best approach would be for LLVM to give a warning/error, when
-fno-unit-at-a-time is used, since it doesn't support that.

> to get LLVM
> to successfully build a 64-bit kernel image?
>

With the .config I sent previously, yes. With some other .config most
likely more changes are needed,
for example the SMP code, KVM code, but as I said in my previous email I
don't know how to "fix" the inline asm in that case.

There's something wrong with building some modules also, I keep getting
an "idr_init" undefined error, but the symbol is present in my vmlinux.
If I turn make those modules built into the kernel it works then (sctp,
w1, thermalsysfs).

It looks like I'll also have to submit some patches for ARCH=um, because
I get undefined references to __bad_size, __guard, and

__stack_smash_handler. __bad_size is probably because LLVM didn't inline expand + DCE something GCC did.


With an unpatched LLVM I would also need weak attributes to be on the
function type, instead of the return type,
but I think thats an LLVM bug, and a one-line patch corrects it.

> If yes then this doesnt look all that bad or invasive at first sight (if
> the put_user() workaround can be expressed in a cleaner way), but in any
> case it would be nice to hear an LLVM person's opinion about roughly when
> this is going to be solved in LLVM itself.
>

Yes, that is why LLVMDev is on Cc:, somebody will eventually reply ;)


Not-Signed-off-by: T?r?k Edwin <[email protected]>
---

arch/x86/include/asm/uaccess.h | 41
+++++++++++++++++++--------------------
arch/x86/lib/delay.c | 2 +-
arch/x86/pci/pcbios.c | 4 ++-
3 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 69d2757..46d00a8 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -154,7 +154,7 @@ extern int __get_user_bad(void);

#define get_user(x, ptr) \
({ \
- int __ret_gu; \
+ unsigned long __ret_gu; \
unsigned long __val_gu; \
__chk_user_ptr(ptr); \
might_fault(); \
@@ -176,7 +176,7 @@ extern int __get_user_bad(void);
break; \
} \
(x) = (__typeof__(*(ptr)))__val_gu; \
- __ret_gu; \
+ (int)__ret_gu; \
})

#define __put_user_x(size, x, ptr, __ret_pu) \
@@ -200,12 +200,15 @@ extern int __get_user_bad(void);
: "A" (x), "r" (addr), "i" (-EFAULT), "0" (err))

#define __put_user_x8(x, ptr, __ret_pu) \
- asm volatile("call __put_user_8" : "=a" (__ret_pu) \
- : "A" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
+ ({ u32 __ret_pu;\
+ asm volatile("call __put_user_8" : "=a" (__ret_pu) \
+ : "A" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx");\
+ (int)__ret_pu;})
#else
#define __put_user_asm_u64(x, ptr, retval) \
__put_user_asm(x, ptr, retval, "q", "", "Zr", -EFAULT)
-#define __put_user_x8(x, ptr, __ret_pu) __put_user_x(8, x, ptr, __ret_pu)
+#define __put_user_x8(x, ptr, __ret_pu) \
+ ({ u64 __ret_pu; __put_user_x(8, x, ptr, __ret_pu); (int)__ret_pu; })
#endif

extern void __put_user_bad(void);
@@ -239,29 +242,25 @@ extern void __put_user_8(void);
*/
#define put_user(x, ptr) \
({ \
- int __ret_pu; \
__typeof__(*(ptr)) __pu_val; \
__chk_user_ptr(ptr); \
might_fault(); \
__pu_val = x; \
- switch (sizeof(*(ptr))) { \
- case 1: \
+ __builtin_choose_expr(sizeof(*(ptr)) == 1, \
+ ({ u8 __ret_pu; \
__put_user_x(1, __pu_val, ptr, __ret_pu); \
- break; \
- case 2: \
+ (int)__ret_pu;}), \
+ __builtin_choose_expr(sizeof(*(ptr)) == 2, \
+ ({ u16 __ret_pu; \
__put_user_x(2, __pu_val, ptr, __ret_pu); \
- break; \
- case 4: \
+ (int)__ret_pu;}), \
+ __builtin_choose_expr(sizeof(*(ptr)) == 4, \
+ ({ u32 __ret_pu; \
__put_user_x(4, __pu_val, ptr, __ret_pu); \
- break; \
- case 8: \
- __put_user_x8(__pu_val, ptr, __ret_pu); \
- break; \
- default: \
- __put_user_x(X, __pu_val, ptr, __ret_pu); \
- break; \
- } \
- __ret_pu; \
+ (int)__ret_pu;}), \
+ __builtin_choose_expr(sizeof(*(ptr)) == 8, \
+ __put_user_x8(__pu_val, ptr, __ret_pu), \
+ (void)-EFAULT)))); \
})

#define __put_user_size(x, ptr, size, retval, errret) \
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index f456860..12d27f8 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -112,7 +112,7 @@ EXPORT_SYMBOL(__delay);

inline void __const_udelay(unsigned long xloops)
{
- int d0;
+ unsigned long d0;

xloops *= 4;
asm("mull %%edx"
diff --git a/arch/x86/pci/pcbios.c b/arch/x86/pci/pcbios.c
index b82cae9..dfff175 100644
--- a/arch/x86/pci/pcbios.c
+++ b/arch/x86/pci/pcbios.c
@@ -65,6 +65,7 @@ static struct {
static unsigned long bios32_service(unsigned long service)
{
unsigned char return_code; /* %al */
+ unsigned long return_code_compat; /* %eax */
unsigned long address; /* %ebx */
unsigned long length; /* %ecx */
unsigned long entry; /* %edx */
@@ -72,13 +73,14 @@ static unsigned long bios32_service(unsigned long
service)

local_irq_save(flags);
__asm__("lcall *(%%edi); cld"
- : "=a" (return_code),
+ : "=a" (return_code_compat),
"=b" (address),
"=c" (length),
"=d" (entry)
: "0" (service),
"1" (0),
"D" (&bios32_indirect));
+ return_code = return_code_compat;
local_irq_restore(flags);

switch (return_code) {
--
1.5.6.5

2009-01-24 21:36:14

by Mike Stump

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

On Jan 24, 2009, at 10:57 AM, T?r?k Edwin wrote:
> With an unpatched LLVM I would also need weak attributes to be on the
> function type, instead of the return type,
> but I think thats an LLVM bug, and a one-line patch corrects it.

This is indeed a bug in llvm. We'll find a way to get it fixed.-

2009-01-27 20:03:06

by Duncan Sands

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

Hi,

> If yes then this doesnt look all that bad or invasive at first sight (if
> the put_user() workaround can be expressed in a cleaner way), but in any
> case it would be nice to hear an LLVM person's opinion about roughly when
> this is going to be solved in LLVM itself.

one thing that seems to be clear to everyone except me is... what are the
semantics supposed to be? [My understanding is that what is being discussed
is when you have an asm with a register as input and output, but with integer
types of different width for the input and output, but I saw some mention of
struct types in this thread...]. Presumably this is something obvious, but
it would be good to have someone spell it out in small words that even someone
like me can understand :)

Thanks,

Duncan.

2009-01-27 21:26:00

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

Duncan Sands wrote:
> Hi,
>
>> If yes then this doesnt look all that bad or invasive at first sight (if
>> the put_user() workaround can be expressed in a cleaner way), but in any
>> case it would be nice to hear an LLVM person's opinion about roughly when
>> this is going to be solved in LLVM itself.
>
> one thing that seems to be clear to everyone except me is... what are the
> semantics supposed to be? [My understanding is that what is being discussed
> is when you have an asm with a register as input and output, but with integer
> types of different width for the input and output, but I saw some mention of
> struct types in this thread...]. Presumably this is something obvious, but
> it would be good to have someone spell it out in small words that even someone
> like me can understand :)
>

I don't know about struct types, but the situation I'm talking about is
assembly statements of the form:

asm("foo" : "=r" (bar) : "0" (baz));

Here, "bar" and "baz" are constrained to be in the same hardware
register (from the "0" constraint in "baz"). The types of "bar" and
"baz" are otherwise unrelated.

I assume the difficulty here comes from how this needs to be handled
from the point of view of the register allocator. If both types fit
inside a single allocatable hardware register, the issue is trivial;
"bar" and "baz" form a single logical register for the purpose of
register allocation.

However, things get a bit ugly in the case of different widths that
affect individually scheduled registers, like 32- and 64-bit types on a
32-bit machine. Consider the case above where "bar" is a 64-bit type
and "baz" is a 32-bit type, then you functionally have, at least on x86:

uint64_t tmp = bar;
asm("foo" : "+r" (tmp));
baz = (uint32_t)tmp;

One could possibly argue that the latter case should be
"baz = (uint32_t)(tmp >> 32);" on a bigendian machine... since this is a
gcc syntax it probably should be "whatever gcc does" in that case, as
opposed to what might make sense.

(I'm afraid I don't have a bigendian box readily available at the
moment, so I can't test it out to see what gcc does. I have a powerpc
machine, but it's at home and turned off.)

-hpa

2009-01-28 01:46:14

by Kyle Moffett

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

On Tue, Jan 27, 2009 at 4:25 PM, H. Peter Anvin <[email protected]> wrote:
> However, things get a bit ugly in the case of different widths that affect
> individually scheduled registers, like 32- and 64-bit types on a 32-bit
> machine. Consider the case above where "bar" is a 64-bit type and "baz" is
> a 32-bit type, then you functionally have, at least on x86:
>
> uint64_t tmp = bar;
> asm("foo" : "+r" (tmp));
> baz = (uint32_t)tmp;
>
> One could possibly argue that the latter case should be
> "baz = (uint32_t)(tmp >> 32);" on a bigendian machine... since this is a gcc
> syntax it probably should be "whatever gcc does" in that case, as opposed to
> what might make sense.
>
> (I'm afraid I don't have a bigendian box readily available at the moment, so
> I can't test it out to see what gcc does. I have a powerpc machine, but
> it's at home and turned off.)

Actually, PPC64 boxes basically don't care... the usable GPRs are all
either 32-bit (for PPC32) or 64-bit (for PPC64), the <=32-bit
instructions are identical across both, they just
truncate/sign-extend/etc based on the lower 32-bits of the register.
Also, you would only do a right-shift if you were going all the way
out to memory as 64-bit and all the way back in as 32-bit... within a
single register it's kept coherent.

Structs are basically irrelevant for inline ASM as you can't pass a
struct to one... you can only pass the *address* of a struct, which is
always pointer-sized.

I think that really the only sane solution (which is hopefully what
GCC does) for integer types is to use a register the same size as the
larger of the two integers. Then you copy the value to/from the
smaller register (or just mask it on PPC64-alike architectures) before
or after the inline ASM.

Cheers,
Kyle Moffett

2009-01-28 01:56:46

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

Kyle Moffett wrote:
>
> Actually, PPC64 boxes basically don't care... the usable GPRs are all
> either 32-bit (for PPC32) or 64-bit (for PPC64), the <=32-bit
> instructions are identical across both, they just
> truncate/sign-extend/etc based on the lower 32-bits of the register.
> Also, you would only do a right-shift if you were going all the way
> out to memory as 64-bit and all the way back in as 32-bit... within a
> single register it's kept coherent.
>

Think about a 64-bit integer on ppc32. It will by necessity kept in two
registers. On gcc I believe it will always be a consecutive pair of
registers (AFAIK that's a hard-coded assumption in gcc, with the result
that gcc has a nonstandard internal register numbering for x86 since the
commonly used dx:ax pair is actually registers 2:0 in the hardware
numbering.)

> Structs are basically irrelevant for inline ASM as you can't pass a
> struct to one... you can only pass the *address* of a struct, which is
> always pointer-sized.

Right, of course.

> I think that really the only sane solution (which is hopefully what
> GCC does) for integer types is to use a register the same size as the
> larger of the two integers. Then you copy the value to/from the
> smaller register (or just mask it on PPC64-alike architectures) before
> or after the inline ASM.

Pretty much. Then you can do conventional copy propagation and
elimination after expanding subregisters to get rid of the extra ops in
the common case.

-hpa

2009-01-28 13:28:57

by Kyle Moffett

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

On Tue, Jan 27, 2009 at 8:56 PM, H. Peter Anvin <[email protected]> wrote:
> Kyle Moffett wrote:
>> Actually, PPC64 boxes basically don't care... the usable GPRs are all
>> either 32-bit (for PPC32) or 64-bit (for PPC64), the <=32-bit
>> instructions are identical across both, they just
>> truncate/sign-extend/etc based on the lower 32-bits of the register.
>> Also, you would only do a right-shift if you were going all the way
>> out to memory as 64-bit and all the way back in as 32-bit... within a
>> single register it's kept coherent.
>
> Think about a 64-bit integer on ppc32. It will by necessity kept in two
> registers. On gcc I believe it will always be a consecutive pair of
> registers (AFAIK that's a hard-coded assumption in gcc, with the result that
> gcc has a nonstandard internal register numbering for x86 since the commonly
> used dx:ax pair is actually registers 2:0 in the hardware numbering.)

Even in the 64-bit-integer on 32-bit-CPU case, you still end up with
the lower 32-bits in a standard integer GPR, and it's trivial to just
ignore the "upper" register. You also would not need to do any kind
of bit-shift, so long as your inline assembly initializes both GPRs
and puts the halves of the result where they belong.

Cheers,
Kyle Moffett

2009-01-28 17:29:58

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

Kyle Moffett wrote:
>
> Even in the 64-bit-integer on 32-bit-CPU case, you still end up with
> the lower 32-bits in a standard integer GPR, and it's trivial to just
> ignore the "upper" register. You also would not need to do any kind
> of bit-shift, so long as your inline assembly initializes both GPRs
> and puts the halves of the result where they belong.
>

In this case, we're talking about what happens when the assembly takes a
64-bit input operand in the same register as a 32-bit output operand
(with a "0" constraint.) Is the output operand the same register number
as the high register or the low register? On an LE machine the answer
is trivial and obvious -- the low register; on a BE machine both
interpretations are possible (I actually suspect gcc will assign the
high register, just based on how gcc internals work in this case.)

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-01-28 19:27:33

by Kyle Moffett

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

On Wed, Jan 28, 2009 at 12:29 PM, H. Peter Anvin <[email protected]> wrote:
> Kyle Moffett wrote:
>> Even in the 64-bit-integer on 32-bit-CPU case, you still end up with
>> the lower 32-bits in a standard integer GPR, and it's trivial to just
>> ignore the "upper" register. You also would not need to do any kind
>> of bit-shift, so long as your inline assembly initializes both GPRs
>> and puts the halves of the result where they belong.
>
> In this case, we're talking about what happens when the assembly takes a
> 64-bit input operand in the same register as a 32-bit output operand
> (with a "0" constraint.) Is the output operand the same register number
> as the high register or the low register? On an LE machine the answer
> is trivial and obvious -- the low register; on a BE machine both
> interpretations are possible (I actually suspect gcc will assign the
> high register, just based on how gcc internals work in this case.)

On a BE 32-bit machine, the "output register" technically ought to be
"64-bit" anyways, since it's constrained to be the same as the 64-bit
"input register". That means that you ought to make sure to set
*both* output registers appropriately, one of them being 0 and the
other being the 32-bit number. I think that's the only answer that
actually makes any sense from a holistic code-generation sense. So it
seems we are in violent agreement :-D.

Cheers,
Kyle Moffett

2009-01-28 21:01:27

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [LLVMdev] inline asm semantics: output constraint width smaller than input

Kyle Moffett wrote:
>
> On a BE 32-bit machine, the "output register" technically ought to be
> "64-bit" anyways, since it's constrained to be the same as the 64-bit
> "input register". That means that you ought to make sure to set
> *both* output registers appropriately, one of them being 0 and the
> other being the 32-bit number. I think that's the only answer that
> actually makes any sense from a holistic code-generation sense. So it
> seems we are in violent agreement :-D.
>

No.

This is wrong on two accounts.

First of all, THERE ARE NO "TWO OUTPUT REGISTERS". Period. There is
only one. The question is: which of the two *input* registers does it
correspond to?

Second of all, "making sense from a holistic code-generation sense"
doesn't apply here. This is about mimicing a gcc construct, regardless
of which amount of sense it makes. Therefore, the only thing that
actually makes sense is to mimic gcc behavior, no matter how stupid it
happens to be.

-hpa