2001-07-18 10:46:28

by Julian Anastasov

[permalink] [raw]
Subject: cpuid_eax damages registers (2.4.7pre7)


Hello,

I don't know whether cpuid_eax (2.4.7pre) should preserve the
registers changed from cpuid but I have an oops on boot with 2.4.7pre7 in
squash_the_stupid_serial_number where cpuid_eax changes ebx and the
parameter "c" is loaded with "Genu". The following change fixes the
problem:

from:

c->cpuid_level = cpuid_eax(0);

to:

unsigned int dummy;

cpuid(0, &c->cpuid_level, &dummy, &dummy, &dummy);


but I'm not sure in the definitions of these cpuid_XXX funcs. I see
that they are used at many places. IMO, they have to preserve the
registers.


Regards

--
Julian Anastasov <[email protected]>


2001-07-18 15:12:55

by Linus Torvalds

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

In article <Pine.LNX.4.10.10107181347030.16710-100000@l> you write:
>
> I don't know whether cpuid_eax (2.4.7pre) should preserve the
>registers changed from cpuid

It should. It has the proper "this instruction assigned values to these
registers" stuff, so gcc should know which ones change.

> but I have an oops on boot with 2.4.7pre7 in
>squash_the_stupid_serial_number where cpuid_eax changes ebx and the
>parameter "c" is loaded with "Genu". The following change fixes the
>problem:

Interesting. Can you do the following:

- tell us your compiler version

- do a "make arch/i386/kernel/setup.s" both ways, and show what
squash_the_stupid_serial_number() looks like.

- fix _all_ the "cpuid*()" functions to have

:"0" (op)

instead of their current incorrect

:"a" (op)

(we're supposed to explicitly tell the compiler that the first input
is the same as the first output)

- see if that makes any difference to the assembler output.

In any case it does sound like a compiler bug, but it would be good to
have a workaround. But it would also be good to have a more complete
dump of the oops in question to see more about what is going on..

Thanks,
Linus

2001-07-18 17:07:43

by Julian Anastasov

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)


Hello,

On Wed, 18 Jul 2001, Linus Torvalds wrote:

> In article <Pine.LNX.4.10.10107181347030.16710-100000@l> you write:
> >
> > I don't know whether cpuid_eax (2.4.7pre) should preserve the
> >registers changed from cpuid
>
> It should. It has the proper "this instruction assigned values to these
> registers" stuff, so gcc should know which ones change.
>
> > but I have an oops on boot with 2.4.7pre7 in
> >squash_the_stupid_serial_number where cpuid_eax changes ebx and the
> >parameter "c" is loaded with "Genu". The following change fixes the
> >problem:
>
> Interesting. Can you do the following:
>
> - tell us your compiler version

root@l:~# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
root@l:~# make --version
GNU Make version 3.77, by Richard Stallman and Roland McGrath.
...
root@l:~# ld -v
GNU ld version 2.9.1 (with BFD 2.9.1.0.24)

root@l:~# insmod -V
insmod version 2.4.6

it seems my binutils is older than the recommended 2.9.1.0.25 :(

> - do a "make arch/i386/kernel/setup.s" both ways, and show what
> squash_the_stupid_serial_number() looks like.

Both with "a" and "0" (zero) gcc shows same output in setup.s
(using cpuid_eax):

.Lfe20:
.size deep_magic_nexgen_probe,.Lfe20-deep_magic_nexgen_probe
.section .rodata
.align 32
.LC127:
.string "<5>CPU serial number disabled.\n"
.section .text.init
.align 4
.type squash_the_stupid_serial_number,@function
squash_the_stupid_serial_number:
pushl %esi
pushl %ebx
movl 12(%esp),%ebx
movl 12(%ebx),%eax
testl $262144,%eax
je .L2393
cmpl $0,disable_x86_serial_nr
je .L2393
movl $281,%ecx
#APP
rdmsr
#NO_APP
movl %eax,%esi
orl $2097152,%esi
movl %esi,%eax
#APP
wrmsr
#NO_APP
pushl $.LC127
call printk
addl $4,%esp
#APP
lock ; btrl $18,12(%ebx)
#NO_APP
xorl %eax,%eax
#APP
cpuid <<<-- cpuid_eax
#NO_APP
movl %eax,8(%ebx) <<<-- oops
.L2393:
popl %ebx
popl %esi
ret


> - fix _all_ the "cpuid*()" functions to have
>
> :"0" (op)

does not help in my case, other solutions? And without recompile
anyone can see its vmlinux file too (disassemble squash_the_stupid_serial_number)

Even by using cpuid(...) replacement the registers are not
preserved but instead of ebx, the arg "c" is in ebp and the oops does not
occur:

0xc029ef94 <squash_the_stupid_serial_number+68>: xor %eax,%eax
0xc029ef96 <squash_the_stupid_serial_number+70>: cpuid
0xc029ef98 <squash_the_stupid_serial_number+72>: mov %eax,0x8(%ebp)


It seems the problem remains in cpuid_XXX funcs.

> instead of their current incorrect
>
> :"a" (op)
>
> (we're supposed to explicitly tell the compiler that the first input
> is the same as the first output)
>
> - see if that makes any difference to the assembler output.
>
> In any case it does sound like a compiler bug, but it would be good to
> have a workaround. But it would also be good to have a more complete
> dump of the oops in question to see more about what is going on..

After fixing this with cpuid() I can work perfectly with this
kernel on network activity (45K packets/sec), SMP. So, only cpuid_xxx are
suspected for now. I don't preserve the oopses but I can do it if the
attached in another mail setup.s is not enough. This problem does not
occur in 2.4.6 and below, I have tried 2.4.2 - 2.4.7pre7 with the same
gcc, etc. It seems only in squash_XXX the ebx damage was noticed.

Part from .config:

#
# Processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
CONFIG_M586TSC=y
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_USE_STRING_486=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_TSC=y
# CONFIG_TOSHIBA is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
CONFIG_SMP=y
CONFIG_HAVE_DEC_LOCK=y


2nd CPU (same as 1st, I don't see problems in the detection):

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 10
cpu MHz : 864.004
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1723.59


> Thanks,
> Linus


Regards

--
Julian Anastasov <[email protected]>

2001-07-18 17:22:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)


On Wed, 18 Jul 2001, Julian Anastasov wrote:
>
> gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

Ok. That has been considered a "stable" gcc for a long time.

We should try to find a work-around.

> > - do a "make arch/i386/kernel/setup.s" both ways, and show what
> > squash_the_stupid_serial_number() looks like.
>
> Both with "a" and "0" (zero) gcc shows same output in setup.s
> (using cpuid_eax):

Ok.

Can you try to do the following in the cpuid_xxx() functions:
- remove the dummy reads (ie leave just the one register in the asm that
we're actually interested in)
- add explicit clobbers for the other registers

So, for example, cpuid_eax() would be roughly

unsigned int eax;
asm("cpuid"
:"=a" (eax)
:"0" (level)
:"bx","cx","dx");
return eax;

(and for the others, you can't really clobber "ax" because it's an input,
so you'd have to leave that as just a "=a" (eax) and hope that gcc gets
that output right.

> lock ; btrl $18,12(%ebx)
> #NO_APP
> xorl %eax,%eax
> #APP
> cpuid <<<-- cpuid_eax
> #NO_APP
> movl %eax,8(%ebx) <<<-- oops

Yes, the above is very definitely a bug in gcc. Oh, well..

> Even by using cpuid(...) replacement the registers are not
> preserved but instead of ebx, the arg "c" is in ebp and the oops does not
> occur:

They don't have ot be preserved per se - gcc just knows that they are
modified, and because gcc doesn't care about the value it just won't use
it.

> After fixing this with cpuid() I can work perfectly with this
> kernel on network activity (45K packets/sec), SMP. So, only cpuid_xxx are
> suspected for now.

The current asms for cpuid_xxx() are correct. The fact that gcc generates
bad code for this case implies that it might happen somewhere else too,
but as egcs-2.91.66 is fairly well tested it may indeed be that this is
the only case where that actually happens.

> I don't preserve the oopses but I can do it if ..

No, your generated assembly is clear enough evidence of what is going on.
Thanks. If you could try the clobber approach, that would be good.

Linus

2001-07-18 20:15:56

by Rick Hohensee

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

>We should try to find a work-around.

>> > - do a "make arch/i386/kernel/setup.s" both ways, and show what

Is there a way to do a "make bla/zay/woof.c " and save the woof.s ?


Rick Hohensee
http://www.clienux.com

2001-07-18 20:51:11

by Kai Germaschewski

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

On Wed, 18 Jul 2001, Linus Torvalds wrote:

> Can you try to do the following in the cpuid_xxx() functions:
> - remove the dummy reads (ie leave just the one register in the asm that
> we're actually interested in)
> - add explicit clobbers for the other registers

I looked into this a little to improve my knowledge on inline asm. Anyway,
I found one ugly work-around, i.e. using a makro instead of the inline
function plus a local eax_in variable, but your idea seems way nicer and
works as well.

Generated code looks okay now (using kgcc aka egcs-2.91.66):

2002: 31 c0 xor %eax,%eax
2004: 0f a2 cpuid
2006: 89 46 08 mov %eax,0x8(%esi)
2009: 5b pop %ebx
200a: 5e pop %esi
200b: c3 ret

Patch follows:

--Kai


diff -ur linux-2.4.7-pre7/include/asm-i386/processor.h linux-2.4.7-pre7.work/include/asm-i386/processor.h
--- linux-2.4.7-pre7/include/asm-i386/processor.h Wed Jul 18 21:49:47 2001
+++ linux-2.4.7-pre7.work/include/asm-i386/processor.h Wed Jul 18 22:38:20 2001
@@ -134,38 +134,42 @@
*/
extern inline unsigned int cpuid_eax(unsigned int op)
{
- unsigned int eax, ebx, ecx, edx;
+ unsigned int eax;

__asm__("cpuid"
- : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
- : "a" (op));
+ : "=a" (eax)
+ : "a" (op)
+ : "ebx", "ecx", "edx");
return eax;
}
extern inline unsigned int cpuid_ebx(unsigned int op)
{
- unsigned int eax, ebx, ecx, edx;
+ unsigned int ebx;

__asm__("cpuid"
- : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
- : "a" (op));
+ : "=b" (ebx)
+ : "a" (op)
+ : "eax", "ecx", "edx");
return ebx;
}
extern inline unsigned int cpuid_ecx(unsigned int op)
{
- unsigned int eax, ebx, ecx, edx;
+ unsigned int ecx;

__asm__("cpuid"
- : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
- : "a" (op));
+ : "=c" (ecx)
+ : "a" (op)
+ : "eax", "ebx", "edx");
return ecx;
}
extern inline unsigned int cpuid_edx(unsigned int op)
{
- unsigned int eax, ebx, ecx, edx;
+ unsigned int edx;

__asm__("cpuid"
- : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
- : "a" (op));
+ : "=d" (edx)
+ : "a" (op)
+ : "eax", "ebx", "ecx");
return edx;
}


2001-07-18 22:05:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

In article <Pine.LNX.4.33.0107182239050.1298-100000@vaio> you write:
>
>Generated code looks okay now (using kgcc aka egcs-2.91.66):
>
> 2002: 31 c0 xor %eax,%eax
> 2004: 0f a2 cpuid
> 2006: 89 46 08 mov %eax,0x8(%esi)
> 2009: 5b pop %ebx
> 200a: 5e pop %esi
> 200b: c3 ret
>
>Patch follows:

Can you verify with this alternate patch instead? Yours works ok on
older gcc's, but the gcc team feels that clobbers must never cover
inputs or outputs, so your patch really generates invalid asms. Here's
a alternate, can you verify that it works for you guys, and perhaps
people can at the same time eye-ball it for any other issues they can
think of?

Linus

----
--- pre7/linux/include/asm-i386/processor.h Wed Jul 18 09:34:03 2001
+++ linux/include/asm-i386/processor.h Wed Jul 18 14:58:45 2001
@@ -126,7 +126,7 @@
"=b" (*ebx),
"=c" (*ecx),
"=d" (*edx)
- : "a" (op));
+ : "0" (op));
}

/*
@@ -134,38 +134,42 @@
*/
extern inline unsigned int cpuid_eax(unsigned int op)
{
- unsigned int eax, ebx, ecx, edx;
+ unsigned int eax;

__asm__("cpuid"
- : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
- : "a" (op));
+ : "=a" (eax)
+ : "0" (op)
+ : "bx", "cx", "dx");
return eax;
}
extern inline unsigned int cpuid_ebx(unsigned int op)
{
- unsigned int eax, ebx, ecx, edx;
+ unsigned int eax, ebx;

__asm__("cpuid"
- : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
- : "a" (op));
+ : "=a" (eax), "=b" (ebx)
+ : "0" (op)
+ : "cx", "dx" );
return ebx;
}
extern inline unsigned int cpuid_ecx(unsigned int op)
{
- unsigned int eax, ebx, ecx, edx;
+ unsigned int eax, ecx;

__asm__("cpuid"
- : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
- : "a" (op));
+ : "=a" (eax), "=c" (ecx)
+ : "0" (op)
+ : "bx", "dx" );
return ecx;
}
extern inline unsigned int cpuid_edx(unsigned int op)
{
- unsigned int eax, ebx, ecx, edx;
+ unsigned int eax, edx;

__asm__("cpuid"
- : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
- : "a" (op));
+ : "=a" (eax), "=d" (edx)
+ : "0" (op)
+ : "bx", "cx");
return edx;
}

2001-07-18 22:33:29

by Kai Germaschewski

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

On Wed, 18 Jul 2001, Linus Torvalds wrote:

> Can you verify with this alternate patch instead? Yours works ok on
> older gcc's, but the gcc team feels that clobbers must never cover
> inputs or outputs, so your patch really generates invalid asms. Here's
> a alternate, can you verify that it works for you guys, and perhaps
> people can at the same time eye-ball it for any other issues they can
> think of?

Generated code looks good here. I checked the asm output for some
instances of cpuid_e[acd]x() in setup.c, and generated asm looks right in
all cases.

--Kai


2001-07-19 05:16:11

by Keith Owens

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

On Wed, 18 Jul 2001 16:30:55 -0400 (EDT),
Rick Hohensee <[email protected]> wrote:
>>We should try to find a work-around.
>
>>> > - do a "make arch/i386/kernel/setup.s" both ways, and show what
>
>Is there a way to do a "make bla/zay/woof.c " and save the woof.s ?

rm bla/zay/woof.o
make CFLAGS_woof.o=-save-temps

2001-07-19 08:22:43

by Julian Anastasov

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)


Hello,

On Wed, 18 Jul 2001, Linus Torvalds wrote:

> Can you verify with this alternate patch instead? Yours works ok on
> older gcc's, but the gcc team feels that clobbers must never cover
> inputs or outputs, so your patch really generates invalid asms. Here's
> a alternate, can you verify that it works for you guys, and perhaps
> people can at the same time eye-ball it for any other issues they can
> think of?

This patch works for me too (I checked all cpuid_XXX calls).
After some thinking I produced another patch. The interesting part is
that __volatile__ solves the problem. Patch appended. I see in other
kernel files that volatile solves gcc bugs. The question is whether
the volatile is needed only as a work-around or it is needed in this
case particulary, i.e. where the output registers are not used and are
optimized.

> Linus

Regards

--
Julian Anastasov <[email protected]>


--- include/asm-i386/processor.h.orig1 Wed Jul 18 12:03:26 2001
+++ include/asm-i386/processor.h Thu Jul 19 10:58:06 2001
@@ -136,7 +136,7 @@
{
unsigned int eax, ebx, ecx, edx;

- __asm__("cpuid"
+ __asm__ __volatile__ ("cpuid"
: "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
: "a" (op));
return eax;
@@ -145,7 +145,7 @@
{
unsigned int eax, ebx, ecx, edx;

- __asm__("cpuid"
+ __asm__ __volatile__ ("cpuid"
: "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
: "a" (op));
return ebx;
@@ -154,7 +154,7 @@
{
unsigned int eax, ebx, ecx, edx;

- __asm__("cpuid"
+ __asm__ __volatile__ ("cpuid"
: "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
: "a" (op));
return ecx;
@@ -163,7 +163,7 @@
{
unsigned int eax, ebx, ecx, edx;

- __asm__("cpuid"
+ __asm__ __volatile__ ("cpuid"
: "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
: "a" (op));
return edx;

2001-07-19 18:37:06

by H. Peter Anvin

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

Followup to: <Pine.LNX.4.10.10107191113080.2341-100000@l>
By author: Julian Anastasov <[email protected]>
In newsgroup: linux.dev.kernel
>
> This patch works for me too (I checked all cpuid_XXX calls).
> After some thinking I produced another patch. The interesting part is
> that __volatile__ solves the problem. Patch appended. I see in other
> kernel files that volatile solves gcc bugs. The question is whether
> the volatile is needed only as a work-around or it is needed in this
> case particulary, i.e. where the output registers are not used and are
> optimized.
>

It certainly shouldn't; obviously, the assembly code is clearly
declaring that it is outputting multiple things. "volatile" on an
"asm" statement basically means "do this even if you don't need the
output values" (i.e. don't assume you're doing this just for the
computation), which is incorrect in this case (we *are* doing it just
for the output values, not for any side effects), but it is not really
surprising that it works around this bug.

The problem seems to be that gcc 2.91.66 thinks it can optimize away
half of an indivisible operation, which cannot be called anything but
a bug.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2001-07-19 22:42:34

by Julian Anastasov

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)


Hello,

> > kernel files that volatile solves gcc bugs. The question is whether
> > the volatile is needed only as a work-around or it is needed in this
> > case particulary, i.e. where the output registers are not used and are
> > optimized.
> >
>
> It certainly shouldn't; obviously, the assembly code is clearly
> declaring that it is outputting multiple things. "volatile" on an
> "asm" statement basically means "do this even if you don't need the
> output values" (i.e. don't assume you're doing this just for the
> computation), which is incorrect in this case (we *are* doing it just
> for the output values, not for any side effects), but it is not really
> surprising that it works around this bug.

I'm now learning the inline asm but I found some interesting
notes on this/similar issue:

http://gcc.gnu.org/fom_serv/cache/23.html

In my distro (now with gcc 2.96) I have a gcc info with name "Extended
Asm", "Assembler Instructions with C Expression Operands" with the
following text:

---
If an `asm' has output operands, GNU CC assumes for optimization
purposes the instruction has no side effects except to change the output
operands. This does not mean instructions with a side effect cannot be
used, but you must be careful, because the compiler may eliminate them
if the output operands aren't used, or move them out of loops, or
replace two with one if they constitute a common subexpression. Also,
if your instruction does have a side effect on a variable that otherwise
appears not to change, the old value of the variable may be reused later
if it happens to be found in a register.

You can prevent an `asm' instruction from being deleted, moved
significantly, or combined, by writing the keyword `volatile' after the
`asm'.
------

follows example, etc.

What I want to say (I could be wrong and that can't surprise me) is
that the original cpuid_eax is in fact incorrect. All cpuid_XXX funcs
use only dummy output operands and because they are not used, they
are removed and not considered as changed. By using dummy vars we
say "We don't care for these output values, use them in current
scope only". As result, the cpuid instruction appears as not to
change the 3 of the 4 registers. This explains why cpuid() behaves
differently from the cpuid_XXX funcs: because the cpuid_XXX funcs
declare eax ... edx as local vars, while in cpuid() they are not
local and hence are not optimized.

The two solutions looks valid:

- to clobber the changed registers and hence not to use them as
dummy output operands - already in 2.4.7pre8

- to specify volatile, which in some way avoids the default
behavior to make optimizations and to remove assumptions about the
used dummy output registers

Of course, it is interesting why 2.95 behaves differently, may be
it simply proves that I'm wrong or there is a bug/precaution
introduced after 2.91.66. The key is in the "volatile" semantic.


Regards

--
Julian Anastasov <[email protected]>

2001-07-19 22:51:45

by H. Peter Anvin

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

Julian Anastasov wrote:
>
> What I want to say (I could be wrong and that can't surprise me) is
> that the original cpuid_eax is in fact incorrect. All cpuid_XXX funcs
> use only dummy output operands...
>

Bullsh*t. One of the output operands is always a non-dummy (in
cpuid_edx() edx is not a dummy, for example.)

-hpa

2001-07-19 23:00:35

by Julian Anastasov

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)


Hello,

On Thu, 19 Jul 2001, H. Peter Anvin wrote:

> Julian Anastasov wrote:
> >
> > What I want to say (I could be wrong and that can't surprise me) is
> > that the original cpuid_eax is in fact incorrect. All cpuid_XXX funcs
> > use only dummy output operands...
> >
>
> Bullsh*t. One of the output operands is always a non-dummy (in
> cpuid_edx() edx is not a dummy, for example.)

Right, and it is may be not damaged. In my first posting I
claim that cpuid_eax damages ebx (and may be ecx and edx).

> -hpa


Regards

--
Julian Anastasov <[email protected]>

2001-07-19 22:57:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)


On Fri, 20 Jul 2001, Julian Anastasov wrote:
>
> In my distro (now with gcc 2.96) I have a gcc info with name "Extended
> Asm", "Assembler Instructions with C Expression Operands" with the
> following text:

[ yes ]

> What I want to say (I could be wrong and that can't surprise me) is
> that the original cpuid_eax is in fact incorrect.

No. It's correct, because cpuid doesn't have any side effects (*), so we
don't need to mark it volatile. gcc is free to remove it if nothing uses
the outputs, for example. But gcc cannot (and generally does not) ignore
outputs that _are_ specified.

Now, adding the "volatile" doesn't really make things worse, and it will
make gcc even more anal about optimizations than it normally is, which is
probably why that also hides the gcc bug.

Note that gcc having bus in the inline asm handling is nothing new. We've
had that before, and I'm sure we'll have it again. Not very many people
use them: the kernel tends to be the heaviest user of them (with libc
probably a good second). Which is why bugs here often take time to get
fixed. It doesn't help that the documentation has been quite bad, even
misleading, at times.

Linus

(*) cpuid has the side effect of being a "synchronizing instruction", and
as such you can use it for some SMP ordering things etc, but as it's one
of the slowest such instructions nobody is really ever interested in using
it that way, and it doesn't have any other "architecturally visible"
effects that the compiler could care about.

2001-07-19 23:03:15

by H. Peter Anvin

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

Julian Anastasov wrote:
>
> Hello,
>
> On Thu, 19 Jul 2001, H. Peter Anvin wrote:
>
> > Julian Anastasov wrote:
> > >
> > > What I want to say (I could be wrong and that can't surprise me) is
> > > that the original cpuid_eax is in fact incorrect. All cpuid_XXX funcs
> > > use only dummy output operands...
> > >
> >
> > Bullsh*t. One of the output operands is always a non-dummy (in
> > cpuid_edx() edx is not a dummy, for example.)
>
> Right, and it is may be not damaged. In my first posting I
> claim that cpuid_eax damages ebx (and may be ecx and edx).
>

Doesn't matter. gcc can't pick and choose what *effects* of an asm()
statement it wants to happen -- this should be utterly obvious to
anyone. As the old saying goes, you can't be half pregnant.

-hpa

2001-07-19 23:18:47

by Julian Anastasov

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)


Hello,

On Thu, 19 Jul 2001, Linus Torvalds wrote:

> No. It's correct, because cpuid doesn't have any side effects (*), so we
> don't need to mark it volatile. gcc is free to remove it if nothing uses
> the outputs, for example. But gcc cannot (and generally does not) ignore
> outputs that _are_ specified.

My understanding was that eax, ... edx are declared as
local vars and so their values can't be used out of the current
function scope, even when they are defined in inline func. So, I
assume they can be optimized (the fact is that they are not used)
and may be gcc forgets them. True, may be the docs do not cover
such situations but my first thought was not to explain everything
with bugs.

> Now, adding the "volatile" doesn't really make things worse, and it will
> make gcc even more anal about optimizations than it normally is, which is
> probably why that also hides the gcc bug.
>
> Note that gcc having bus in the inline asm handling is nothing new. We've
> had that before, and I'm sure we'll have it again. Not very many people
> use them: the kernel tends to be the heaviest user of them (with libc
> probably a good second). Which is why bugs here often take time to get
> fixed. It doesn't help that the documentation has been quite bad, even
> misleading, at times.

Agreed. It seems I have to read more docs ...

> Linus
>
> (*) cpuid has the side effect of being a "synchronizing instruction", and
> as such you can use it for some SMP ordering things etc, but as it's one
> of the slowest such instructions nobody is really ever interested in using
> it that way, and it doesn't have any other "architecturally visible"
> effects that the compiler could care about.


Regards

--
Julian Anastasov <[email protected]>

2001-07-19 23:24:08

by H. Peter Anvin

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

Julian Anastasov wrote:
>
> Hello,
>
> On Thu, 19 Jul 2001, Linus Torvalds wrote:
>
> > No. It's correct, because cpuid doesn't have any side effects (*), so we
> > don't need to mark it volatile. gcc is free to remove it if nothing uses
> > the outputs, for example. But gcc cannot (and generally does not) ignore
> > outputs that _are_ specified.
>
> My understanding was that eax, ... edx are declared as
> local vars and so their values can't be used out of the current
> function scope, even when they are defined in inline func. So, I
> assume they can be optimized (the fact is that they are not used)
> and may be gcc forgets them. True, may be the docs do not cover
> such situations but my first thought was not to explain everything
> with bugs.
>

Well, your first thought was wrong. It is a bug. Sorry.

However, your argument basically explains why adding "volatile" does work
-- it keeps gcc from thinking that it can optimize away something that it
otherwise couldn't.

However, it's still a bug.

-hpa

2001-07-19 23:25:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)


On Fri, 20 Jul 2001, Julian Anastasov wrote:
>
> My understanding was that eax, ... edx are declared as
> local vars and so their values can't be used out of the current
> function scope, even when they are defined in inline func.

Yes, but notice how we return a value.

And the only way to get that value is to execute the cpuid. So obviously
gcc can't drop the cpuid. And if it cannot drop it, it cannot ignore the
fact that cpuid changes all the registers we say it changes.

Linus

2001-07-21 23:47:59

by Richard Henderson

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

On Wed, Jul 18, 2001 at 08:10:22AM -0700, Linus Torvalds wrote:
> - fix _all_ the "cpuid*()" functions to have
>
> :"0" (op)
>
> instead of their current incorrect
>
> :"a" (op)
>
> (we're supposed to explicitly tell the compiler that the first input
> is the same as the first output)

FWIW, using "a" as the input constraint isn't incorrect.
The two are equivalent given a singleton register class.


r~

2001-07-22 00:02:41

by Richard Henderson

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

On Wed, Jul 18, 2001 at 03:04:20PM -0700, Linus Torvalds wrote:
> Can you verify with this alternate patch instead?

I take it you've found something that happens to work with egcs 1.1?

At a glance the bug appears to be the one that caused
gcc/testsuite/gcc.dg/clobbers.c to be written. That one
is a fundamental flaw in reload that caused it to be
largely rewritten for gcc 2.95.

In other words, you may not be able to find a workaround
for egcs 1.1 that works for all cases. Using an alternative
that writes all of eax/ebx/ecx/edx to memory is probably
safer if none of the uses of cpuid are performance-critical.


r~

2001-07-22 04:28:38

by H. Peter Anvin

[permalink] [raw]
Subject: Re: cpuid_eax damages registers (2.4.7pre7)

Followup to: <[email protected]>
By author: Richard Henderson <[email protected]>
In newsgroup: linux.dev.kernel
>
> On Wed, Jul 18, 2001 at 03:04:20PM -0700, Linus Torvalds wrote:
> > Can you verify with this alternate patch instead?
>
> I take it you've found something that happens to work with egcs 1.1?
>
> At a glance the bug appears to be the one that caused
> gcc/testsuite/gcc.dg/clobbers.c to be written. That one
> is a fundamental flaw in reload that caused it to be
> largely rewritten for gcc 2.95.
>
> In other words, you may not be able to find a workaround
> for egcs 1.1 that works for all cases. Using an alternative
> that writes all of eax/ebx/ecx/edx to memory is probably
> safer if none of the uses of cpuid are performance-critical.
>

Either making the asms "volatile" seems to work, as does leaving the
unused arguments as clobbers rather than output operands. If either
doesn't work with later versions, then later versions would have their
own bugs.

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt