2006-11-29 02:22:12

by Keith Owens

[permalink] [raw]
Subject: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
wait_hpet_tick is optimized away to a never ending loop and the kernel
hangs on boot in timer setup.

0000001a <wait_hpet_tick>:
1a: 55 push %ebp
1b: 89 e5 mov %esp,%ebp
1d: eb fe jmp 1d <wait_hpet_tick+0x3>

This is not a problem with gcc 3.3.5. Adding barrier() calls to
wait_hpet_tick does not help, making the variables volatile does.

Signed-off-by: Keith Owens <[email protected]>

---
arch/i386/kernel/time_hpet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/arch/i386/kernel/time_hpet.c
===================================================================
--- linux-2.6.orig/arch/i386/kernel/time_hpet.c
+++ linux-2.6/arch/i386/kernel/time_hpet.c
@@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
*/
static void __devinit wait_hpet_tick(void)
{
- unsigned int start_cmp_val, end_cmp_val;
+ unsigned volatile int start_cmp_val, end_cmp_val;

start_cmp_val = hpet_readl(HPET_T0_CMP);
do {


2006-11-29 03:08:29

by Nicholas Miell

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
> wait_hpet_tick is optimized away to a never ending loop and the kernel
> hangs on boot in timer setup.
>
> 0000001a <wait_hpet_tick>:
> 1a: 55 push %ebp
> 1b: 89 e5 mov %esp,%ebp
> 1d: eb fe jmp 1d <wait_hpet_tick+0x3>
>
> This is not a problem with gcc 3.3.5. Adding barrier() calls to
> wait_hpet_tick does not help, making the variables volatile does.
>
> Signed-off-by: Keith Owens <[email protected]>
>
> ---
> arch/i386/kernel/time_hpet.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6/arch/i386/kernel/time_hpet.c
> ===================================================================
> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
> +++ linux-2.6/arch/i386/kernel/time_hpet.c
> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
> */
> static void __devinit wait_hpet_tick(void)
> {
> - unsigned int start_cmp_val, end_cmp_val;
> + unsigned volatile int start_cmp_val, end_cmp_val;
>
> start_cmp_val = hpet_readl(HPET_T0_CMP);
> do {

When you examine the inlined functions involved, this looks an awful lot
like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278

Perhaps SUSE should fix their gcc instead of working around compiler
problems in the kernel?

--
Nicholas Miell <[email protected]>

2006-11-29 03:56:37

by Keith Owens

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

Nicholas Miell (on Tue, 28 Nov 2006 19:08:25 -0800) wrote:
>On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
>> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
>> wait_hpet_tick is optimized away to a never ending loop and the kernel
>> hangs on boot in timer setup.
>>
>> 0000001a <wait_hpet_tick>:
>> 1a: 55 push %ebp
>> 1b: 89 e5 mov %esp,%ebp
>> 1d: eb fe jmp 1d <wait_hpet_tick+0x3>
>>
>> This is not a problem with gcc 3.3.5. Adding barrier() calls to
>> wait_hpet_tick does not help, making the variables volatile does.
>>
>> Signed-off-by: Keith Owens <[email protected]>
>>
>> ---
>> arch/i386/kernel/time_hpet.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Index: linux-2.6/arch/i386/kernel/time_hpet.c
>> ===================================================================
>> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
>> +++ linux-2.6/arch/i386/kernel/time_hpet.c
>> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
>> */
>> static void __devinit wait_hpet_tick(void)
>> {
>> - unsigned int start_cmp_val, end_cmp_val;
>> + unsigned volatile int start_cmp_val, end_cmp_val;
>>
>> start_cmp_val = hpet_readl(HPET_T0_CMP);
>> do {
>
>When you examine the inlined functions involved, this looks an awful lot
>like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278
>
>Perhaps SUSE should fix their gcc instead of working around compiler
>problems in the kernel?

Firstly, the fix for 22278 is included in gcc 4.1.0.

Secondly, I believe that this is a separate problem from bug 22278.
hpet_readl() is correctly using volatile internally, but its result is
being assigned to a pair of normal integers (not declared as volatile).
In the context of wait_hpet_tick, all the variables are unqualified so
gcc is allowed to optimize the comparison away.

The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
where the return value from hpet_readl() is assigned to a normal
variable. Nothing in the C standard says that those unqualified
variables should be magically treated as volatile, just because the
original code that extracted the value used volatile. IOW, time_hpet.c
needs to declare any variables that hold the result of hpet_readl() as
being volatile variables.

2006-11-29 04:04:55

by David Miller

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

From: Keith Owens <[email protected]>
Date: Wed, 29 Nov 2006 14:56:20 +1100

> Secondly, I believe that this is a separate problem from bug 22278.
> hpet_readl() is correctly using volatile internally, but its result is
> being assigned to a pair of normal integers (not declared as volatile).
> In the context of wait_hpet_tick, all the variables are unqualified so
> gcc is allowed to optimize the comparison away.
>
> The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
> where the return value from hpet_readl() is assigned to a normal
> variable. Nothing in the C standard says that those unqualified
> variables should be magically treated as volatile, just because the
> original code that extracted the value used volatile. IOW, time_hpet.c
> needs to declare any variables that hold the result of hpet_readl() as
> being volatile variables.

I disagree with this.

readl() returns values from an opaque source, and it is declared
as such to show this to GCC. It's like a function that GCC
cannot see the implementation of, which it cannot determine
anything about wrt. return values.

The volatile'ness does not simply disappear the moment you
assign the result to some local variable which is not volatile.

Half of our drivers would break if this were true.

2006-11-29 04:30:42

by Keith Owens

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

David Miller (on Tue, 28 Nov 2006 20:04:53 -0800 (PST)) wrote:
>From: Keith Owens <[email protected]>
>Date: Wed, 29 Nov 2006 14:56:20 +1100
>
>> Secondly, I believe that this is a separate problem from bug 22278.
>> hpet_readl() is correctly using volatile internally, but its result is
>> being assigned to a pair of normal integers (not declared as volatile).
>> In the context of wait_hpet_tick, all the variables are unqualified so
>> gcc is allowed to optimize the comparison away.
>>
>> The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
>> where the return value from hpet_readl() is assigned to a normal
>> variable. Nothing in the C standard says that those unqualified
>> variables should be magically treated as volatile, just because the
>> original code that extracted the value used volatile. IOW, time_hpet.c
>> needs to declare any variables that hold the result of hpet_readl() as
>> being volatile variables.
>
>I disagree with this.
>
>readl() returns values from an opaque source, and it is declared
>as such to show this to GCC. It's like a function that GCC
>cannot see the implementation of, which it cannot determine
>anything about wrt. return values.
>
>The volatile'ness does not simply disappear the moment you
>assign the result to some local variable which is not volatile.
>
>Half of our drivers would break if this were true.

This is definitely a gcc bug, 4.1.0 is doing something weird. Compile
with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and the bug appears,
CONFIG_CC_OPTIMIZE_FOR_SIZE=y has no problem.

Compile with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and _either_ of the patches
below and the problem disappears.

Index: linux/arch/i386/kernel/time_hpet.c
===================================================================
--- linux.orig/arch/i386/kernel/time_hpet.c 2006-11-29 13:51:33.900462088 +1100
+++ linux/arch/i386/kernel/time_hpet.c 2006-11-29 15:25:47.853245938 +1100
@@ -35,7 +35,8 @@ static void __iomem * hpet_virt_address;

int hpet_readl(unsigned long a)
{
- return readl(hpet_virt_address + a);
+ volatile int v = readl(hpet_virt_address + a);
+ return v;
}

static void hpet_writel(unsigned long d, unsigned long a)


Index: linux-2.6/arch/i386/kernel/time_hpet.c
===================================================================
--- linux-2.6.orig/arch/i386/kernel/time_hpet.c
+++ linux-2.6/arch/i386/kernel/time_hpet.c
@@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
*/
static void __devinit wait_hpet_tick(void)
{
- unsigned int start_cmp_val, end_cmp_val;
+ unsigned volatile int start_cmp_val, end_cmp_val;

start_cmp_val = hpet_readl(HPET_T0_CMP);
do {

2006-11-29 04:57:37

by Nicholas Miell

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Wed, 2006-11-29 at 15:30 +1100, Keith Owens wrote:
> David Miller (on Tue, 28 Nov 2006 20:04:53 -0800 (PST)) wrote:
> >From: Keith Owens <[email protected]>
> >Date: Wed, 29 Nov 2006 14:56:20 +1100
> >
> >> Secondly, I believe that this is a separate problem from bug 22278.
> >> hpet_readl() is correctly using volatile internally, but its result is
> >> being assigned to a pair of normal integers (not declared as volatile).
> >> In the context of wait_hpet_tick, all the variables are unqualified so
> >> gcc is allowed to optimize the comparison away.
> >>
> >> The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
> >> where the return value from hpet_readl() is assigned to a normal
> >> variable. Nothing in the C standard says that those unqualified
> >> variables should be magically treated as volatile, just because the
> >> original code that extracted the value used volatile. IOW, time_hpet.c
> >> needs to declare any variables that hold the result of hpet_readl() as
> >> being volatile variables.
> >
> >I disagree with this.
> >
> >readl() returns values from an opaque source, and it is declared
> >as such to show this to GCC. It's like a function that GCC
> >cannot see the implementation of, which it cannot determine
> >anything about wrt. return values.
> >
> >The volatile'ness does not simply disappear the moment you
> >assign the result to some local variable which is not volatile.
> >
> >Half of our drivers would break if this were true.
>
> This is definitely a gcc bug, 4.1.0 is doing something weird. Compile
> with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and the bug appears,
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y has no problem.
>
> Compile with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and _either_ of the patches
> below and the problem disappears.
>

My theory: gcc is inlining readl into hpet_readl (readl is an inline
function, so it should be doing this no matter what), and inlining
hpet_readl into wait_hpet_tick (otherwise, it can't possibly make any
assumptions about the return values of hpet_readl -- this looks to be a
SUSE-specific over-aggressive optimization), and somewhere along the way
the volatile qualifier is getting lost.

--
Nicholas Miell <[email protected]>

2006-11-29 09:08:18

by Jakub Jelinek

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Wed, Nov 29, 2006 at 02:56:20PM +1100, Keith Owens wrote:
> Nicholas Miell (on Tue, 28 Nov 2006 19:08:25 -0800) wrote:
> >On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
> >> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
> >> wait_hpet_tick is optimized away to a never ending loop and the kernel
> >> hangs on boot in timer setup.
> >>
> >> 0000001a <wait_hpet_tick>:
> >> 1a: 55 push %ebp
> >> 1b: 89 e5 mov %esp,%ebp
> >> 1d: eb fe jmp 1d <wait_hpet_tick+0x3>
> >>
> >> This is not a problem with gcc 3.3.5. Adding barrier() calls to
> >> wait_hpet_tick does not help, making the variables volatile does.
> >>
> >> Signed-off-by: Keith Owens <[email protected]>
> >>
> >> ---
> >> arch/i386/kernel/time_hpet.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> Index: linux-2.6/arch/i386/kernel/time_hpet.c
> >> ===================================================================
> >> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
> >> +++ linux-2.6/arch/i386/kernel/time_hpet.c
> >> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
> >> */
> >> static void __devinit wait_hpet_tick(void)
> >> {
> >> - unsigned int start_cmp_val, end_cmp_val;
> >> + unsigned volatile int start_cmp_val, end_cmp_val;
> >>
> >> start_cmp_val = hpet_readl(HPET_T0_CMP);
> >> do {
> >
> >When you examine the inlined functions involved, this looks an awful lot
> >like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278
> >
> >Perhaps SUSE should fix their gcc instead of working around compiler
> >problems in the kernel?
>
> Firstly, the fix for 22278 is included in gcc 4.1.0.

This actually sounds more like http://gcc.gnu.org/PR27236
And that one is broken in 4.1.0, fixed in 4.1.1.

Jakub

2006-11-29 20:14:30

by Willy Tarreau

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Wed, Nov 29, 2006 at 04:08:00AM -0500, Jakub Jelinek wrote:
> On Wed, Nov 29, 2006 at 02:56:20PM +1100, Keith Owens wrote:
> > Nicholas Miell (on Tue, 28 Nov 2006 19:08:25 -0800) wrote:
> > >On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
> > >> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
> > >> wait_hpet_tick is optimized away to a never ending loop and the kernel
> > >> hangs on boot in timer setup.
> > >>
> > >> 0000001a <wait_hpet_tick>:
> > >> 1a: 55 push %ebp
> > >> 1b: 89 e5 mov %esp,%ebp
> > >> 1d: eb fe jmp 1d <wait_hpet_tick+0x3>
> > >>
> > >> This is not a problem with gcc 3.3.5. Adding barrier() calls to
> > >> wait_hpet_tick does not help, making the variables volatile does.
> > >>
> > >> Signed-off-by: Keith Owens <[email protected]>
> > >>
> > >> ---
> > >> arch/i386/kernel/time_hpet.c | 2 +-
> > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> Index: linux-2.6/arch/i386/kernel/time_hpet.c
> > >> ===================================================================
> > >> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
> > >> +++ linux-2.6/arch/i386/kernel/time_hpet.c
> > >> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
> > >> */
> > >> static void __devinit wait_hpet_tick(void)
> > >> {
> > >> - unsigned int start_cmp_val, end_cmp_val;
> > >> + unsigned volatile int start_cmp_val, end_cmp_val;
> > >>
> > >> start_cmp_val = hpet_readl(HPET_T0_CMP);
> > >> do {
> > >
> > >When you examine the inlined functions involved, this looks an awful lot
> > >like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278
> > >
> > >Perhaps SUSE should fix their gcc instead of working around compiler
> > >problems in the kernel?
> >
> > Firstly, the fix for 22278 is included in gcc 4.1.0.
>
> This actually sounds more like http://gcc.gnu.org/PR27236
> And that one is broken in 4.1.0, fixed in 4.1.1.

Then why not simply check for gcc 4.1.0 in compiler.h and refuse to build
with 4.1.0 if it's known to produce bad code ? I find it a bit annoying
that we start fixing every place affected by buggy compiler versions,
especially when the fix is already available.

> Jakub

Regards,
Willy

2006-11-30 01:05:20

by David Schwartz

[permalink] [raw]
Subject: RE: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away


Ask yourself this question: Can an assignment to a non-volatile variable be
optimized out? Then ask yourself this question: Does casting away volatile
make it not volatile any more?

> The volatile'ness does not simply disappear the moment you
> assign the result to some local variable which is not volatile.

Yes, it does. That's what a cast does, it tells the compiler to, in all
respects, pretend that a variable is of a different type than it 'actually
is', such that it actually isn't anymore.

> Half of our drivers would break if this were true.

On the contrary, they'd break if it was true. If casting away volatile
didn't make it go away, then casting in volatile wouldn't have to make it
appear. A cast causes the compiler to act as if a variable really was the
type you cast it to. If you cast volatile away, that has the reverse of the
same affect casting to volatile has.

The 'readl' function should actually assign the value to a volatile
variable. Assignments to volatiles cannot be cast away, but casts can and
assignments to non-volatile variables can be optimized out.

DS


2006-12-01 05:06:12

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Wed, 29 Nov 2006 21:14:10 +0100
Willy Tarreau <[email protected]> wrote:

> Then why not simply check for gcc 4.1.0 in compiler.h and refuse to build
> with 4.1.0 if it's known to produce bad code ?

Think so. I'll queue this and see how many howls it causes.

--- a/init/main.c~gcc-4-1-0-is-bust
+++ a/init/main.c
@@ -75,6 +75,10 @@
#error Sorry, your GCC is too old. It builds incorrect kernels.
#endif

+#if __GNUC__ == 4 && __GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ == 0
+#error gcc-4.1.0 is known to miscompile the kernel. Please use a different compiler version.
+#endif
+
static int init(void *);

extern void init_IRQ(void);
_

2006-12-01 05:14:32

by Keith Owens

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

Andrew Morton (on Thu, 30 Nov 2006 21:05:51 -0800) wrote:
>On Wed, 29 Nov 2006 21:14:10 +0100
>Willy Tarreau <[email protected]> wrote:
>
>> Then why not simply check for gcc 4.1.0 in compiler.h and refuse to build
>> with 4.1.0 if it's known to produce bad code ?
>
>Think so. I'll queue this and see how many howls it causes.
>
>--- a/init/main.c~gcc-4-1-0-is-bust
>+++ a/init/main.c
>@@ -75,6 +75,10 @@
> #error Sorry, your GCC is too old. It builds incorrect kernels.
> #endif
>
>+#if __GNUC__ == 4 && __GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ == 0
>+#error gcc-4.1.0 is known to miscompile the kernel. Please use a different compiler version.
>+#endif
>+
> static int init(void *);
>
> extern void init_IRQ(void);

SuSE's SLES10 ships with gcc 4.1.0. There is nothing to stop a
distributor from backporting the bug fix from gcc 4.1.1 to 4.1.0, but
this patch would not allow the fixed compiler to build the kernel.

2006-12-01 05:28:16

by Willy Tarreau

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Fri, Dec 01, 2006 at 04:14:04PM +1100, Keith Owens wrote:
> Andrew Morton (on Thu, 30 Nov 2006 21:05:51 -0800) wrote:
> >On Wed, 29 Nov 2006 21:14:10 +0100
> >Willy Tarreau <[email protected]> wrote:
> >
> >> Then why not simply check for gcc 4.1.0 in compiler.h and refuse to build
> >> with 4.1.0 if it's known to produce bad code ?
> >
> >Think so. I'll queue this and see how many howls it causes.
> >
> >--- a/init/main.c~gcc-4-1-0-is-bust
> >+++ a/init/main.c
> >@@ -75,6 +75,10 @@
> > #error Sorry, your GCC is too old. It builds incorrect kernels.
> > #endif
> >
> >+#if __GNUC__ == 4 && __GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ == 0
> >+#error gcc-4.1.0 is known to miscompile the kernel. Please use a different compiler version.
> >+#endif
> >+
> > static int init(void *);
> >
> > extern void init_IRQ(void);
>
> SuSE's SLES10 ships with gcc 4.1.0. There is nothing to stop a
> distributor from backporting the bug fix from gcc 4.1.1 to 4.1.0, but
> this patch would not allow the fixed compiler to build the kernel.

Then maybe replace #error with #warning ? It's too dangerous to let people
build their kernel with a known broken compiler without being informed.

I think this shows the limit of backports to known broken versions.
Providing a full update to 4.1.1 would certainly be cleaner for all
customers than backporting 4.1.1 to 4.1.0 and calling it 4.1.0.

Another solution would be to be able to check gcc for known bugs in the
makefile, just like we check it for specific options. But I don't know
how we can check gcc for bad code, especially in cross-compile environments
:-/

Willy

2006-12-01 05:51:08

by Kyle Moffett

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Nov 29, 2006, at 20:04:28, David Schwartz wrote:
> Ask yourself this question: Can an assignment to a non-volatile
> variable be optimized out? Then ask yourself this question: Does
> casting away volatile make it not volatile any more?

Hmm, let's put this in a C file:
> int my_global = 0;

And put this in a header:
> extern int my_global;
> static int my_func(int a)
> {
> return a + my_global++;
> }

Now put this in another source file
> #include <my_header.h>
>
> int main()
> {
> while (my_func(3) < 5)
> printf("It hasn't happened yet!\n");
> return 0;
> }

The obvious result is that it prints "It hasn't happened yet!"
twice. On the third iteration when my_global == 2 it breaks out of
the loop. This is all fairly standard C so far.

Imagine we change the code to read from some auto-increment hardware
at a specific MMIO address instead of a global:
> static int my_func(int a)
> {
> return a + *(volatile int *)(0xABCD1234);
> }

But you're telling me that the change in the header file (where the
new function returns the exact same series of values with every call
as the old) causes the program to enter an infinite loop?

How do you rationalize this again?

> The 'readl' function should actually assign the value to a volatile
> variable. Assignments to volatiles cannot be cast away, but casts
> can and assignments to non-volatile variables can be optimized out.

Actually, no. The reason for the volatile in the pointer dereference
is to force the memory access to *always* happen. It's a guarantee
that the compiler will do that memory access every time it appears.
You have a volatile int afterwards and what you do with that nobody
cares. The point is the presence of the volatile in a single pointer-
dereference forcibly turns off any optimization of that specific
access, including loop unrolling and such.

Cheers,
Kyle Moffett

2006-12-01 06:33:40

by Keith Owens

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

Willy Tarreau (on Fri, 1 Dec 2006 06:26:53 +0100) wrote:
>On Fri, Dec 01, 2006 at 04:14:04PM +1100, Keith Owens wrote:
>> SuSE's SLES10 ships with gcc 4.1.0. There is nothing to stop a
>> distributor from backporting the bug fix from gcc 4.1.1 to 4.1.0, but
>> this patch would not allow the fixed compiler to build the kernel.
>
>Then maybe replace #error with #warning ? It's too dangerous to let people
>build their kernel with a known broken compiler without being informed.

Agreed.

>I think this shows the limit of backports to known broken versions.
>Providing a full update to 4.1.1 would certainly be cleaner for all
>customers than backporting 4.1.1 to 4.1.0 and calling it 4.1.0.

Agreed, but Enterprise customers expect bug fixes, not wholesale
replacements of critical programs.

>Another solution would be to be able to check gcc for known bugs in the
>makefile, just like we check it for specific options. But I don't know
>how we can check gcc for bad code, especially in cross-compile environments

It is doable, but it is as ugly as hell. Note the lack of a
signed-off-by :)

---
arch/i386/kernel/Makefile | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

Index: linux/arch/i386/kernel/Makefile
===================================================================
--- linux.orig/arch/i386/kernel/Makefile
+++ linux/arch/i386/kernel/Makefile
@@ -83,3 +83,19 @@ $(obj)/vsyscall-syms.o: $(src)/vsyscall.
k8-y += ../../x86_64/kernel/k8.o
stacktrace-y += ../../x86_64/kernel/stacktrace.o

+# Some versions of gcc generate invalid code for hpet_timer, depending
+# on other config options. Make sure that the generated code is valid.
+# Invalid versions of gcc generate a tight loop in wait_hpet_tick, with
+# no 'cmp' instructions. Extract the generated object code for
+# wait_hpet_tick, down to the next function then check that the code
+# contains at least one comparison.
+
+ifeq ($(CONFIG_HPET_TIMER),y)
+$(obj)/built-in.o: $(obj)/.tmp_check_gcc_bug
+
+$(obj)/.tmp_check_gcc_bug: $(obj)/time_hpet.o
+ $(Q)[ -n "`$(OBJDUMP) -Sr $< | grep -A40 '<wait_hpet_tick>:' | sed -e '1d; />:$$/,$$d;' | grep -w cmp`" ] || \
+ (echo gcc volatile bug detected, fix your gcc; exit 1)
+ $(Q)touch $@
+
+endif

2006-12-01 07:28:37

by Willy Tarreau

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Fri, Dec 01, 2006 at 05:32:42PM +1100, Keith Owens wrote:
> Willy Tarreau (on Fri, 1 Dec 2006 06:26:53 +0100) wrote:
> >On Fri, Dec 01, 2006 at 04:14:04PM +1100, Keith Owens wrote:
> >> SuSE's SLES10 ships with gcc 4.1.0. There is nothing to stop a
> >> distributor from backporting the bug fix from gcc 4.1.1 to 4.1.0, but
> >> this patch would not allow the fixed compiler to build the kernel.
> >
> >Then maybe replace #error with #warning ? It's too dangerous to let people
> >build their kernel with a known broken compiler without being informed.
>
> Agreed.
>
> >I think this shows the limit of backports to known broken versions.
> >Providing a full update to 4.1.1 would certainly be cleaner for all
> >customers than backporting 4.1.1 to 4.1.0 and calling it 4.1.0.
>
> Agreed, but Enterprise customers expect bug fixes, not wholesale
> replacements of critical programs.

Oh, I'm perfectly aware of this. That's in part why I started the hotfix
branch in the past :-) But sometimes, fixes consist in merging all the
patches from the maintenance branch (eg: from 4.1.0 to 4.1.1), and if
this is the case, there would not be much justification not to simply
update the version. In fact, what's really missing is a "fixlevel" in
the packages, to inform the user that 4.1.0 as shipped by the distro
has the same level of fixes as 4.1.1. But this is what the version is
used for today.

> >Another solution would be to be able to check gcc for known bugs in the
> >makefile, just like we check it for specific options. But I don't know
> >how we can check gcc for bad code, especially in cross-compile environments
>
> It is doable, but it is as ugly as hell. Note the lack of a
> signed-off-by :)

Yes it's ugly, but at least it's a proposal. We might need something like
this as a more generic solution across all architectures to easily check
for such known problems. It's somewhat comparable to runtime fixups but
here it's build time fixups.

Regards,
Willy

> ---
> arch/i386/kernel/Makefile | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> Index: linux/arch/i386/kernel/Makefile
> ===================================================================
> --- linux.orig/arch/i386/kernel/Makefile
> +++ linux/arch/i386/kernel/Makefile
> @@ -83,3 +83,19 @@ $(obj)/vsyscall-syms.o: $(src)/vsyscall.
> k8-y += ../../x86_64/kernel/k8.o
> stacktrace-y += ../../x86_64/kernel/stacktrace.o
>
> +# Some versions of gcc generate invalid code for hpet_timer, depending
> +# on other config options. Make sure that the generated code is valid.
> +# Invalid versions of gcc generate a tight loop in wait_hpet_tick, with
> +# no 'cmp' instructions. Extract the generated object code for
> +# wait_hpet_tick, down to the next function then check that the code
> +# contains at least one comparison.
> +
> +ifeq ($(CONFIG_HPET_TIMER),y)
> +$(obj)/built-in.o: $(obj)/.tmp_check_gcc_bug
> +
> +$(obj)/.tmp_check_gcc_bug: $(obj)/time_hpet.o
> + $(Q)[ -n "`$(OBJDUMP) -Sr $< | grep -A40 '<wait_hpet_tick>:' | sed -e '1d; />:$$/,$$d;' | grep -w cmp`" ] || \
> + (echo gcc volatile bug detected, fix your gcc; exit 1)
> + $(Q)touch $@
> +
> +endif

2006-12-01 07:57:38

by Jakub Jelinek

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Fri, Dec 01, 2006 at 08:28:16AM +0100, Willy Tarreau wrote:
> Oh, I'm perfectly aware of this. That's in part why I started the hotfix
> branch in the past :-) But sometimes, fixes consist in merging all the
> patches from the maintenance branch (eg: from 4.1.0 to 4.1.1), and if
> this is the case, there would not be much justification not to simply
> update the version. In fact, what's really missing is a "fixlevel" in
> the packages, to inform the user that 4.1.0 as shipped by the distro
> has the same level of fixes as 4.1.1. But this is what the version is
> used for today.

This is even more complicated by the fact that upstream GCC release branches
(and also several Linux distributors) start announcing the upcoming version
already a few days after a release is tagged.
E.g. 14 days old gcc-4_1-branch says:
./xgcc -B ./ --version; ./xgcc -B ./ -dD -E -xc /dev/null | grep GNU
xgcc (GCC) 4.1.2 20061114 (prerelease)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#define __GNUC__ 4
#define __GNUC_MINOR__ 1
#define __GNUC_PATCHLEVEL__ 2

but GCC 4.1.2 has not been released yet.
In Fedora Core/RHEL and I think a few other distros the version number
is only changed when it is officially released, e.g.:

gcc --version; gcc -dD -E -xc /dev/null | grep GNU
gcc (GCC) 4.1.1 20061011 (Red Hat 4.1.1-30)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#define __GNUC__ 4
#define __GNUC_MINOR__ 1
#define __GNUC_PATCHLEVEL__ 1
#define __GNUC_RH_RELEASE__ 30

Note, 4.1.1 was released end of May this year and 4.1.2 has not been
released. So, using __GNUC_PATCHLEVEL__ to detect if a bug has been fixed
or not isn't very useful (you'd need to rule out also __GNUC_PATCHLEVEL__ <= 1
because gcc-4_1-branch was announcing that patchlevel already since
beggining of March, on the other side there is a lot of GCCs with
__GNUC_PATCHLEVEL__ == 1 that certainly have that bug fixed).

You perhaps could parse the prerelease vs. release vs. vendor strings,
but that could be quite difficult, perhaps easier would be just parse the
date in the --version output. Checking for the bug is best though, because
that will catch even backports of the bugfix without rebasing from the
release branch.

Jakub

2006-12-01 11:24:59

by David Schwartz

[permalink] [raw]
Subject: RE: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away


> Imagine we change the code to read from some auto-increment hardware
> at a specific MMIO address instead of a global:
> > static int my_func(int a)
> > {
> > return a + *(volatile int *)(0xABCD1234);
> > }

> But you're telling me that the change in the header file (where the
> new function returns the exact same series of values with every call
> as the old) causes the program to enter an infinite loop?
>
> How do you rationalize this again?

No, I'm not saying that. I'm saying it *can*.

We try to write code so that no matter what information the compiler has, it
will still build correct code if the compiler is correct. When we don't do
this, smarter compilers compile our code into executables that don't do what
we want.

In some cases, it's very unlikely that compilers will ever become smart
enough to demonstrate that our code is broken, but that doesn't make the
code any less broken, just less likely to fail.

> > The 'readl' function should actually assign the value to a volatile
> > variable. Assignments to volatiles cannot be cast away, but casts
> > can and assignments to non-volatile variables can be optimized out.

> Actually, no. The reason for the volatile in the pointer dereference
> is to force the memory access to *always* happen.

That's why it was placed there, however it was thrown away right after it
was placed, in the same step it was supposed to force a memory access.

> It's a guarantee
> that the compiler will do that memory access every time it appears.

Unless you throw it away before the memory access or in the same step as the
memory access, say by casting it.

> You have a volatile int afterwards and what you do with that nobody
> cares.

You have a volatile int unless you cast it so something else.

> The point is the presence of the volatile in a single pointer-
> dereference forcibly turns off any optimization of that specific
> access, including loop unrolling and such.

It did that in the past, because optimizers weren't smart enough. Now
they're smarter, and so the breakage in the code is becoming apparent.

The code is broken because it gets rid of the volatile in the same step that
it expects the volatile to have effect. Only an assignment to a volatile
variable cannot be elided by a cast.

To put it another way:

int j=*(int *)(volatile int *)f;
is the same as
int j=*(int *)f;

Because the 'int *' cast removes the 'volatile int *' cast. This applies to
whatever is cast, and whenever it is cast or assigned.

Let's look back at 'readl':

static inline unsigned int readl(const volatile void __iomem *addr)
{
return *(volatile unsigned int __force *) addr;
}

Notice that there is no assignment to anything volatile qualified. Notice
also that before any assignment takes place, and in the same step as the
access that you thing can't be eliminated, the result is cast to an
'unsigned int' and returned.

The problem is that '*(volatile unsigned int *)' results in a 'volatile
unsigned int'. The *assignment* occurs in the return operation, after the
'volatile unsigned int' is *cast* to a plain 'unsigned int'. The assignment
is *not* in any sense volatile or inviolate, so neither is the return value.

One solution would be this:

static inline unsigned int readl(const volatile void __iomem *addr)
{
volatile unsigned int j;
j=*(volatile unsigned int __force *) addr;
return j;
}

This will probably result in an extra memory access though. There are
probably better solutions but I can't think of any at the moment.

(This may or may not fix the issue though. There is at least one known
compiler issue that might be causing the breakage. However, correct compiler
optimizations should be ruled out first.)

DS


2006-12-01 12:08:55

by Kyle Moffett

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Dec 01, 2006, at 06:24:54, David Schwartz wrote:
>> Imagine we change the code to read from some auto-increment hardware
>> at a specific MMIO address instead of a global:
>>> static int my_func(int a)
>>> {
>>> return a + *(volatile int *)(0xABCD1234);
>>> }
>
>> But you're telling me that the change in the header file (where the
>> new function returns the exact same series of values with every call
>> as the old) causes the program to enter an infinite loop?
>>
>> How do you rationalize this again?
>
> No, I'm not saying that. I'm saying it *can*.

No, it can't. If you leave the prototype alone and the function when
called in sequence returns the same list of values, then by
definition the internals can have no effect on the code which uses
that function. As further proof, if you wrapped my "my_func()" with
this in some C file:
int my_other_func(int a)
{
return my_func(a);
}

Next you stick a my_other_func declaration in a header and use
my_other_func instead of my_func() in the main function. Now the
result is that the compiler has no damn clue what my_other_func()
contains so it can't optimize it out of the loop with either
version. You cannot treat "volatile" the way you are saying it is
treated without severely violating both the C99 spec *and* common sense.

> In some cases, it's very unlikely that compilers will ever become
> smart enough to demonstrate that our code is broken, but that
> doesn't make the code any less broken, just less likely to fail.

No, the code is not broken because the language simply isn't defined
that way. Essentially when the compiler is looking at any volatile
data it cannot ignore or optimize-away any operations on that data.
On the other hand, when you cast volatile data into non-volatile
data, the compiler must preserve linearity of program execution. If
you call a function in a loop which dereferences a pointer to a
volatile then the compiler *MUST* always dereference the pointer,
even if it later discards the result and continues on its merry way.

>> Actually, no. The reason for the volatile in the pointer
>> dereference is to force the memory access to *always* happen.
>
> That's why it was placed there, however it was thrown away right
> after it was placed, in the same step it was supposed to force a
> memory access.

Doesn't matter if you throw away the result. The C standard defines
this:
(void)( *(volatile int *)0xABCD1234 );
to imply that the code reads an integer from that memory location and
then discards the result. The whole point of volatile is you still
MUST do the read. Feel free to read the bugzilla entry mentioned in
this thread as it even quotes all the pertinent sections of the C
standard for you.

> The problem is that '*(volatile unsigned int *)' results in a
> 'volatile unsigned int'. The *assignment* occurs in the return
> operation, after the 'volatile unsigned int' is *cast* to a plain
> 'unsigned int'. The assignment is *not* in any sense volatile or
> inviolate, so neither is the return value.

No, the assignment is irrelevant but the pointer dereference in
rvalue context *is* relevant. The dereference forces a read operation.

> One solution would be this:
>
> static inline unsigned int readl(const volatile void __iomem *addr)
> {
> volatile unsigned int j;
> j=*(volatile unsigned int __force *) addr;
> return j;
> }

This is no different from the current code. You cast the pointer to
a volatile unsigned int, you dereference that pointer, and you cast
the result to an unsigned int. C does not have the same "assignment"
distinctions that C++ has.

> (This may or may not fix the issue though. There is at least one
> known compiler issue that might be causing the breakage. However,
> correct compiler optimizations should be ruled out first.)

Nope, any GCC behavior requiring code such as the above is broken,
not just in my opinion but in the opinion of the GCC developers
themselves, as you'd notice if you took the time to read down to the
end of the bugzilla discussion.

Cheers,
Kyle Moffett

2006-12-01 12:31:46

by Andreas Schwab

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

"David Schwartz" <[email protected]> writes:

> The problem is that '*(volatile unsigned int *)' results in a 'volatile
> unsigned int'.

No, it doesn't. Values don't have qualifiers, only objects have.
Qualifiers on rvalues are meaningless.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2006-12-01 13:53:24

by David Schwartz

[permalink] [raw]
Subject: RE: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away


> No, it can't. If you leave the prototype alone and the function when
> called in sequence returns the same list of values, then by
> definition the internals can have no effect on the code which uses
> that function. As further proof, if you wrapped my "my_func()" with
> this in some C file:
> int my_other_func(int a)
> {
> return my_func(a);
> }

This is an extremely strange position to take. If the function is in some
sense broken or invokes undefined or platform-defined behavior, it most
certainly can affect the code that uses that function.

In this case, the code invokes implementation-defined behavior and is
inlined. As a result, the outcome of the code is implementation-defined.

> Next you stick a my_other_func declaration in a header and use
> my_other_func instead of my_func() in the main function. Now the
> result is that the compiler has no damn clue what my_other_func()
> contains so it can't optimize it out of the loop with either
> version. You cannot treat "volatile" the way you are saying it is
> treated without severely violating both the C99 spec *and* common sense.

The compiler *happens* to have no damn clue because such inter-module
optimizations don't exist. That doesn't make the code correct, just not
likely to demonstrate its brokenness.

> > In some cases, it's very unlikely that compilers will ever become
> > smart enough to demonstrate that our code is broken, but that
> > doesn't make the code any less broken, just less likely to fail.

> No, the code is not broken because the language simply isn't defined
> that way. Essentially when the compiler is looking at any volatile
> data it cannot ignore or optimize-away any operations on that data.

Agreed. But it's not looking at any volatile data. (What data is volatile? A
pointer is only *cast* to volatile, but then it's cast to a non-volatile
type.)

> On the other hand, when you cast volatile data into non-volatile
> data, the compiler must preserve linearity of program execution. If
> you call a function in a loop which dereferences a pointer to a
> volatile then the compiler *MUST* always dereference the pointer,
> even if it later discards the result and continues on its merry way.

I agree, but that's not what happens. It does not discard the result, it
casts the volatile away.

> >> Actually, no. The reason for the volatile in the pointer
> >> dereference is to force the memory access to *always* happen.
> >
> > That's why it was placed there, however it was thrown away right
> > after it was placed, in the same step it was supposed to force a
> > memory access.

> Doesn't matter if you throw away the result. The C standard defines
> this:
> (void)( *(volatile int *)0xABCD1234 );
> to imply that the code reads an integer from that memory location and
> then discards the result. The whole point of volatile is you still
> MUST do the read.

The C standard, AFAICT, doesn't cover this case. The cast to void means that
no volatile object is ever accessed. One is named, but then the compiler is
explicitly told to treat it like it's not volatile. The cast does not come
after the reference, there is no sequence point here.

> Feel free to read the bugzilla entry mentioned in
> this thread as it even quotes all the pertinent sections of the C
> standard for you.

The bugzilla entry is about a different issue. If there are sections of the
C standard that you think affect the case where the volatile is discarded by
cast or conversion and never stored in any volatile-qualified variable,
please tell me what they are.

> > The problem is that '*(volatile unsigned int *)' results in a
> > 'volatile unsigned int'. The *assignment* occurs in the return
> > operation, after the 'volatile unsigned int' is *cast* to a plain
> > 'unsigned int'. The assignment is *not* in any sense volatile or
> > inviolate, so neither is the return value.
>
> No, the assignment is irrelevant but the pointer dereference in
> rvalue context *is* relevant. The dereference forces a read operation.

The dereference forces a read operation if and only if it results in a
volatile value or is assigned to a volatile variable. That's what the
standard says. In this case, you specifically ask the compiler to pretend
the result is not volatile prior to *any* assignment.

For example, the C standard talks about "accessing a volatile object", but
there is no volatile object here. Only an object asked to be treated as
volatile in the same sequence it is asked to be treated as not volatile.
Accesses to volatile objects may not be reordered across sequence points and
must be stable at sequence points, but there is no sequence point here
between the cast to volatile and the coversion away from volatile.

In fact, when the underlying objects are non-volatile, casting a pointer to
volatile and dereferencing it does not provide *any* guarantees from the C
standard. The C standard talks about accesses to volatile objects, not
accesses no non-volatile objects through pointers that claim the object is
volatile.

> > One solution would be this:
> >
> > static inline unsigned int readl(const volatile void __iomem *addr)
> > {
> > volatile unsigned int j;
> > j=*(volatile unsigned int __force *) addr;
> > return j;
> > }

> This is no different from the current code.

Yes it is, for two reasons. One is that sequence point occurs between the
volatile access and any cast away from volatile. The other is that the value
is assigned to a volatile-qualified variable, which is what the standard
says cannot be elided.

To put it another way, in the existing code, no volatile objects exist and
hence none are accessed. In the code above, an assignment is made to a
volatile object, and that assignment cannot be elided.

> You cast the pointer to
> a volatile unsigned int, you dereference that pointer, and you cast
> the result to an unsigned int. C does not have the same "assignment"
> distinctions that C++ has.

You are pretending there are sequence points in sequential casts in C. There
are not. If a cast had a side-effect, there is nothing in the C standard
that would require the side-effects to occur in the order of the casts (C++
is different). You do not dereference the pointer and then cast the result
to an unsigned int. You dereference the pointer and get an unsigned int in
some arbitrary order or method. There is no assurance that the volatile
takes effect before it is cast away.

> > (This may or may not fix the issue though. There is at least one
> > known compiler issue that might be causing the breakage. However,
> > correct compiler optimizations should be ruled out first.)

> Nope, any GCC behavior requiring code such as the above is broken,
> not just in my opinion but in the opinion of the GCC developers
> themselves, as you'd notice if you took the time to read down to the
> end of the bugzilla discussion.

I read the bugzilla discussion. It's about a different issue. It may be that
the fix for that issue happens to fix this too, because the compiler
actually is not smart enough to make the optimization I am claiming it is
legal for it to make.

In fact, I think the compiler is trying to make a different optimization
entirely and is thinking this case is that case. But that doesn't change the
fact that this code is broken and a future compiler that does make the
optimizations I am claiming are legal may show the problem.

Unfortunately, the problem may be that I'm too right. If you only follow the
C standard, there is no difference between:

*(volatile int *)foo;
and
*(int *)foo;

If 'foo' is volatile, the access is to a volatile object in both cases. If
'foo' is not volatile, the access is to a non-volatile object in both cases.

That would obviously be a pretty useless implementation of 'volatile'. And
since I cannot suggest any change that fixes the problem without making the
code worse, it comes down to a compiler QOI issue. If there's no right way
to do something this simple, the compiler is broken no matter what the
standard says.

DS


2006-12-01 14:04:14

by David Schwartz

[permalink] [raw]
Subject: RE: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away


> "David Schwartz" <[email protected]> writes:
>
> > The problem is that '*(volatile unsigned int *)' results in a 'volatile
> > unsigned int'.
>
> No, it doesn't. Values don't have qualifiers, only objects have.
> Qualifiers on rvalues are meaningless.

Yeah. That's the problem here. The 'volatile' has no object to qualify. You
are essentially lying to the compiler (telling it the pointer points to a
volatile object when it doesn't) and hoping it does the right thing.

Nothing in the standard requires any special behavior for accesses through
volatile-qualified pointers. It only requires special behavior for access to
objects that are in fact volatile.

I think the technically right solution is some mechanism to define an object
(which can be volatile-qualified) that exists at a particular address.
Accessing this object would be accessing a volatile object and you'd get all
the things the standard promises.

An adequate solution would probably be to make 'readl' return a
volatile-qualified unsigned integer. However, I'd have no complaints if GCC
provided stronger volatile guarantees than the C standard does, assuring
that even subsequent casts or other changes still assure the access takes
place where you expect it to. Just the guarantees in the standard get you
only signals and longjmp.

It comes down to just what those guarantees GCC provides actually are.

DS


2006-12-02 09:03:34

by Jan Engelhardt

[permalink] [raw]
Subject: RE: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away


>> Next you stick a my_other_func declaration in a header and use
>> my_other_func instead of my_func() in the main function. Now the
>> result is that the compiler has no damn clue what my_other_func()
>> contains so it can't optimize it out of the loop with either
>> version. You cannot treat "volatile" the way you are saying it is
>> treated without severely violating both the C99 spec *and* common sense.
>
>The compiler *happens* to have no damn clue because such inter-module
>optimizations don't exist. That doesn't make the code correct, just not
>likely to demonstrate its brokenness.

GCC has inter-module optimization, it's just not used everyday. I think
I have seen a discussion on this.

Right there it is -> http://lkml.org/lkml/2006/8/24/212



-`J'
--

2006-12-02 10:39:13

by Kyle Moffett

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

On Dec 01, 2006, at 09:03:26, David Schwartz wrote:
>> "David Schwartz" <[email protected]> writes:
>>> The problem is that '*(volatile unsigned int *)' results in a
>>> 'volatile unsigned int'.
>>
>> No, it doesn't. Values don't have qualifiers, only objects have.
>> Qualifiers on rvalues are meaningless.
>
> Yeah. That's the problem here. The 'volatile' has no object to
> qualify. You are essentially lying to the compiler (telling it the
> pointer points to a volatile object when it doesn't) and hoping it
> does the right thing.

No, the volatile has a perfectly valid object to qualify; the object
pointed to by the provided pointer. Casting in C is explicitly
defined so that you can change the properties of something. When I
cast a "char *" pointer to "int *", I'm not telling the compiler to
go through and magically convert a bunch of data, I'm telling it to
treat the preexisting memory addresses, whatever they happened to be,
as integer data. The _only_ place where that is not true is casting
an "int" to a "float" or vice versa, and even then it doesn't apply
to any levels of indirection (casting "int *" to "float *" is
virtually guaranteed to cause headaches for first-year CS students).

Likewise when I cast a pointer to "(volatile int *)", I am telling it
that whatever happened before I want it to treat accesses through
that new pointer as accesses to a volatile object, with all the
implied confusion. Now if my code isn't careful about aliasing and I
modified the data through a different pointer a few moments before
(and I have strict-aliasing turned on) then I may get inconsistent
results because the value has not been written to memory yet just
stored in a register.

> Nothing in the standard requires any special behavior for accesses
> through volatile-qualified pointers. It only requires special
> behavior for access to objects that are in fact volatile.

This is completely and utterly wrong. See this quote from the C99
standard pulled from the bugzilla page:
>> 6.7.3: any expression referring to an object of volatile-
>> qualified type must be evaluated strictly by the rules of the
>> abstract machine, although precisely what constitutes an "access"
>> to the object is implementation-defined. (Note, implementation-
>> defined" behavior is required to be well-defined and
>> documented*.) So if the reference in question is an "access", it
>> must occur where the abstract machine says it should.

Now GCC's documentation in multiple places refer to an empty
statement containing an lvalue or rvalue as being "read" access.
Example: The statement "int i = 0; i;" would cause "i" to be "read"
in the second statement. Since "i" is not volatile then it may be
optimized out, however if I stated "volatile int i = 0; i;", then the
compiler would allocate "i" on the stack, assign 0 to it, then read
it into a register. GCC could not optimize any part of that out
without breaking the rules of the abstract machine.

> I think the technically right solution is some mechanism to define
> an object (which can be volatile-qualified) that exists at a
> particular address. Accessing this object would be accessing a
> volatile object and you'd get all the things the standard promises.

Umm, that's exactly what "volatile" on a pointer means; treat this
memory address as containing a volatile-qualified object.

> An adequate solution would probably be to make 'readl' return a
> volatile-qualified unsigned integer.

No, you cannot return a volatile-qualified value because you a return
value is an "rvalue" and rvalues cannot be "volatile". They can't be
"const" or "restrict" either, it just doesn't make any sense.

What would these mean?

const int foo(float x);
restrict int foo(float x);

How can a return-value be const or restrict? It doesn't have an
address I can take so I can't modify it where it is. If you have an
expression for which you cannot get the address of with the "&"
operator (in other words, an rvalue) then "volatile", "const", or
"restrict" make no sense on the type of your expression.

> It comes down to just what those guarantees GCC provides actually are.

This is the first correct statement in your email. In any case the
documented GCC guarantees have always been much stronger than you
have been trying to persuade us they should be. I would argue that
the C standard somewhat indirectly specifies those guarantees but I
really don't have the heart for any more language-lawyering so I'm
going to leave it at that.

Cheers,
Kyle Moffett

2006-12-03 04:29:33

by David Schwartz

[permalink] [raw]
Subject: RE: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away


> > It comes down to just what those guarantees GCC provides actually are.

> This is the first correct statement in your email. In any case the
> documented GCC guarantees have always been much stronger than you
> have been trying to persuade us they should be. I would argue that
> the C standard somewhat indirectly specifies those guarantees but I
> really don't have the heart for any more language-lawyering so I'm
> going to leave it at that.

I have tried to find any documentation of the guarantees gcc actually
provides and have been unable to do so. Where are these "documented GCC
guarantees" documented?

DS


2006-12-07 14:03:28

by Kyle Moffett

[permalink] [raw]
Subject: Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

My apologies for the late response, I've had a lot on my schedule
over the last week.

On Dec 02, 2006, at 23:29:28, David Schwartz wrote:
>>> It comes down to just what those guarantees GCC provides actually
>>> are.
>
>> This is the first correct statement in your email. In any case
>> the documented GCC guarantees have always been much stronger than
>> you have been trying to persuade us they should be. I would argue
>> that the C standard somewhat indirectly specifies those guarantees
>> but I really don't have the heart for any more language-lawyering
>> so I'm going to leave it at that.
>
> I have tried to find any documentation of the guarantees gcc
> actually provides and have been unable to do so. Where are these
> "documented GCC guarantees" documented?

Well, under "When is a Volatile Object Accessed" in the GCC manual
(seems to be present for at least a few versions back):
http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Volatiles.html#Volatiles

In that case it specifies that any evaluation of "*foo" in an rvalue
context specifies a read (with a few exceptions for G++ where the C++
language generally confuses things). Specifically it mentions the
statement "*src;" and discusses the statement as providing "a void
context". In other words, a statement such as "(void)(expr);" is
redundant because the statement already implies void context and the
extra cast-to-void is just extra text. As such "(void)(*src);" on a
"volatile int *src;" is documented to force a read of "*src". Now,
if you actually _use_ the result over just casting it to void and
discarding it, then GCC can provide no _less_ guarantee with regards
to the read-and-store than it provides to the read-and-discard.

For example:

/* This code is guaranteed to generate assembly to read the memory
address into a register multiple times */
volatile int *foo = (expression);
*foo;
*foo;

/* This code is guaranteed to do the same */
volatile int *foo = (expression);
int x, y;
x = *foo;
y = *foo;

If the result is actually the conditional expression for a loop as in
the problem code then GCC would plainly have no choice about
optimizing, not only is the volatile variable read and returned, but
the result is the conditional for continued execution of a block of
code with unknown side effects.

Cheers,
Kyle Moffett

2006-12-08 04:22:28

by David Schwartz

[permalink] [raw]
Subject: RE: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away


> In that case it specifies that any evaluation of "*foo" in an rvalue
> context specifies a read (with a few exceptions for G++ where the C++
> language generally confuses things). Specifically it mentions the
> statement "*src;" and discusses the statement as providing "a void
> context". In other words, a statement such as "(void)(expr);" is
> redundant because the statement already implies void context and the
> extra cast-to-void is just extra text. As such "(void)(*src);" on a
> "volatile int *src;" is documented to force a read of "*src". Now,
> if you actually _use_ the result over just casting it to void and
> discarding it, then GCC can provide no _less_ guarantee with regards
> to the read-and-store than it provides to the read-and-discard.

I read over this section and didn't realize the implications of the void
context. I now agree with you.

DS