[Cc: to gcc list, in case someone wants to argue about standards]
The appended patch fixes incorrect code, which interferes badly with
optimizations in GCC 3.0.4 and GCC 3.1.
The GCC tries to replace the strcpy from a constant string source with
a memcpy, since the length is known at compile time.
Thus
strcpy (dst, "abcdef" + 2)
gives
memcpy (dst, "abcdef" + 2, 5)
However, GCC does not handle the case where the above offset (2) is
not within the bounds of the string, which results in undefined
behavior according to ANSI/ISO C99.
The error is that
strcpy (namep, "linux,phandle" + 0xc0000000);
gets emitted as
memcpy (namep, "linux,phandle" + 0xc0000000, 14 - 0xc0000000);
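For readers without the PPC tree handy, the construct in question looks
roughly like this (the two macros are the real ones from prom.c, quoted
again further down the thread; the constant offset and the wrapper
function are only for illustration -- in the kernel the offset comes
from reloc_offset()):

#include <string.h>

#define RELOC(x) (*PTRRELOC(&(x)))
#define PTRRELOC(x) ((typeof(x))((unsigned long)(x) + offset))

static unsigned long offset = 0xc0000000;       /* illustration only */

void demo(char *namep)
{
        /* The source operand is the address of a string constant plus
         * an offset far outside the string; that is the pattern the
         * folding described above trips over. */
        strcpy(namep, RELOC("linux,phandle"));
}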
Regards,
-velco
--- 1.3/arch/ppc/kernel/prom.c Wed Dec 26 18:27:54 2001
+++ edited/arch/ppc/kernel/prom.c Tue Jan 1 22:53:23 2002
@@ -997,7 +997,7 @@
prev_propp = &pp->next;
namep = (char *) (pp + 1);
pp->name = PTRUNRELOC(namep);
- strcpy(namep, RELOC("linux,phandle"));
+ memcpy (namep, RELOC("linux,phandle"), sizeof("linux,phandle"));
mem_start = ALIGN((unsigned long)namep + strlen(namep) + 1);
pp->value = (unsigned char *) PTRUNRELOC(&np->node);
pp->length = sizeof(np->node);
On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
> The appended patch fixes incorrect code, which interferes badly with
> optimizations in GCC 3.0.4 and GCC 3.1.
>
> The GCC tries to replace the strcpy from a constant string source with
> a memcpy, since the length is known at compile time.
Check the linuxppc-dev archives, but this has been discussed before, and
it came down to a few things.
1) gcc shouldn't be making this optimization, and Paulus (who wrote the
code) disliked this new feature. As a subnote, what you sent was sent
to Linus with a comment about working around a gcc-3.0 bug/feature, and
was rejected because of this.
2) We could modify the RELOC macro, and not have this problem. The
patch was posted, but not acted upon.
3) We could also try turning off this particular optimization
(-fno-builtin perhaps) on this file, and not worry about it.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
>>>>> "Tom" == Tom Rini <[email protected]> writes:
Tom> On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
>> The appended patch fixes incorrect code, which interferes badly with
>> optimizations in GCC 3.0.4 and GCC 3.1.
>>
>> The GCC tries to replace the strcpy from a constant string source with
>> a memcpy, since the length is known at compile time.
Tom> Check the linuxppc-dev archives, but this has been discussed before, and
Tom> it came down to a few things.
Well, you may discuss it again, but this time after actually reading
the C standard:
"6.3.6 Additive operators
...
9 Unless both the pointer operand and the result point to
elements of the same array object, or the pointer operand
points one past the last element of an array object and the
result points to an element of the same array object, the
behavior is undefined if the result is used as an operand of a
unary * operator that is actually evaluated."
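To make the quoted paragraph concrete, a minimal illustration (not
kernel code):

void example(void)
{
        char a[8];
        char *p;

        p = a + 8;   /* fine: points one past the last element of a  */
        p = a + 9;   /* result is outside the object: undefined      */
        p = "linux,phandle" + 0xc0000000;   /* the prom.c case: same  */
        (void)p;
}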
Tom> 1) gcc shouldn't be making this optimization, and Paulus (who wrote the
Tom> code) disliked this new feature.
Why shouldn't gcc be making some optimization? Because a particular
person doesn't like it, or what? What kind of statement is that anyway?
Tom> As a subnote, what you sent was sent
Tom> to Linus with a comment about working around a gcc-3.0 bug/feature, and
Tom> was rejected because of this.
It is not a workaround, it is a fix for invalid code, which gets
triggered by a particular optimization.
Tom> 2) We could modify the RELOC macro, and not have this problem. The
Tom> patch was posted, but not acted upon.
Although all uses of the RELOC macro violate the standard, this kind
of pointer arithmetic is far too common and usually produces the
expected behavior, thus it makes sense to optimize it and just fix the
cases where it breaks.
Tom> 3) We could also try turning off this particular optimization
Tom> (-fno-builtin perhaps) on this file, and not worry about it.
*mutters something about head and sand* ;)
Regards,
-velco
> -----Original Message-----
> From: Tom Rini [mailto:[email protected]]
> Sent: Wednesday, January 02, 2002 12:44 AM
> To: Momchil Velikov
> Cc: [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH] C undefined behavior fix
>
>
> On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
>
> > The appended patch fixes incorrect code, which interferes badly with
> > optimizations in GCC 3.0.4 and GCC 3.1.
> >
> > The GCC tries to replace the strcpy from a constant string source with
> > a memcpy, since the length is known at compile time.
>
> Check the linuxppc-dev archives, but this has been discussed
> before, and
> it came down to a few things.
> 1) gcc shouldn't be making this optimization, and Paulus (who
> wrote the
> code) disliked this new feature.
Why? If memcpy can then be expanded in line, and the string is short, this
can be a *huge* win...
> As a subnote, what you sent was sent
> to Linus with a comment about working around a gcc-3.0
> bug/feature, and
> was rejected because of this.
> 2) We could modify the RELOC macro, and not have this problem. The
> patch was posted, but not acted upon.
I think we MUST modify the RELOC macro. Currently the code has an undefined
meaning WRT the C standard, so is allowed to do almost anything from working
as expected (quite bad in fact: it may suddenly fail when something is modified
elsewhere), to triggering the 3rd World War :-)
> 3) We could also try turning off this particular optimization
> (-fno-builtin perhaps) on this file, and not worry about it.
That's a compiler-dependent way to fix the problem that creates a constraint
on the compiler: it MUST from now on continue to have a given behaviour for
some undefined feature (that is in fact not even documented...)
Note that the RELOC macro triggers an UNDEFINED behaviour, not an
IMPLEMENTATION DEFINED one, so the compiler writer is free of any constraint
and that would be a genuine extension to the C language to fix the
behaviour.
Bernard
--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel: +33 (0) 1 47 68 80 80
Fax: +33 (0) 1 47 88 97 85
e-mail: [email protected]
[email protected]
--------------------------------------------
Momchil Velikov <[email protected]> writes:
> - strcpy(namep, RELOC("linux,phandle"));
> + memcpy (namep, RELOC("linux,phandle"), sizeof("linux,phandle"));
Doesn't this still trigger undefined behavior, as far as the C
standard is concerned? It's probably a better idea to fix the linker,
so that it performs proper relocation.
>>>>> "Florian" == Florian Weimer <[email protected]> writes:
Florian> Momchil Velikov <[email protected]> writes:
>> - strcpy(namep, RELOC("linux,phandle"));
>> + memcpy (namep, RELOC("linux,phandle"), sizeof("linux,phandle"));
Florian> Doesn't this still trigger undefined behavior, as far as the C
Florian> standard is concerned? It's probably a better idea to fix the linker,
Florian> so that it performs proper relocation.
Well, strictly speaking it _is_ undefined, however adding/subtracting
__PAGE_OFFSET is far too common an operation and one can reasonably
expect to get away with it in the _vast_ majority of cases. IMHO, it is
better to fix the particular case which triggers the undefined
behaviour, as these cases are bound to be _very_ rare.
Regards,
-velco
On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
> Thus
> strcpy (dst, "abcdef" + 2)
> gives
> memcpy (dst, "abcdef" + 2, 5)
IMHO gcc should not be touching these function calls, as they are not
made to a standard C library, and thus have different behaviors. I'm
surprised that gcc tries to optimize calls to these functions just
based on their names.
The gcc manpage mentions
-ffreestanding
Assert that compilation takes place in a freestanding
environment. This implies -fno-builtin. A freestand-
ing environment is one in which the standard library
may not exist, and program startup may not necessarily
be at "main". The most obvious example is an OS ker-
nel. This is equivalent to -fno-hosted.
Why is Linux not using this? It sounds very appropriate. The only
things the manpage mentions that -fno-builtin would inhibit from being
optimized are memcpy() and alloca(). memcpy() has its own assembly
optimization and inlining on some (most?) archs, and as for alloca(),
I only see it being used a bit in the S/390 code, where the gcc
optimizations could quite possibly break something. I think
-ffreestanding definitely should be used by the kernel to prevent gcc
from messing with its code in broken ways.
> Why is Linux not using this? It sounds very appropriate. The only
> things the manpage mentions that -fno-builtin would inhibit from being
> optimized are memcpy() and alloca(). memcpy() has its own assembly
There are others in newer gcc's, and on the whole gcc does a very good
job with them, so IMHO it is worth being nice to gcc to get the advantages.
> I only see it being used a bit in the S/390 code, where the gcc
> optimizations could quite possibly break something. I think
> -ffreestanding definitely should be used by the kernel to prevent gcc
> from messing with its code in broken ways.
-ffreestanding for the compiler versions that support it can be added to
arch specific flags anyway
>>>>> "Aaron" == Aaron Lehmann <[email protected]> writes:
Aaron> On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
>> Thus
>> strcpy (dst, "abcdef" + 2)
>> gives
>> memcpy (dst, "abcdef" + 2, 5)
Aaron> IMHO gcc should not be touching these function calls, as they are not
Aaron> made to a standard C library, and thus have different behaviors. I'm
Aaron> surprised that gcc tries to optimize calls to these functions just
Aaron> based on their names.
IIRC, these identifiers are reserved by the C standard, thus the
compiler is right to assume that they have standard behavior. And note
that they DO have the standard behavior. It doesn't even matter if GCC
is right to judge by the names in each and every case, it is right
in _this_ case.
Aaron> The gcc manpage mentions
Aaron> -ffreestanding
Aaron> Assert that compilation takes place in a freestanding
Aaron> environment. This implies -fno-builtin. A freestand-
Aaron> ing environment is one in which the standard library
Aaron> may not exist, and program startup may not necessarily
Aaron> be at "main". The most obvious example is an OS ker-
Aaron> nel. This is equivalent to -fno-hosted.
Aaron> Why is Linux not using this? It sounds very appropriate. The only
Because it results in less optimization. I see no point in
deliberately preventing the compiler from doing optimizations.
Regards,
-velco
On Wed, Jan 02, 2002 at 12:41:28PM +0200, Momchil Velikov wrote:
> >>>>> "Florian" == Florian Weimer <[email protected]> writes:
>
> Florian> Momchil Velikov <[email protected]> writes:
> >> - strcpy(namep, RELOC("linux,phandle"));
> >> + memcpy (namep, RELOC("linux,phandle"), sizeof("linux,phandle"));
>
> Florian> Doesn't this still trigger undefined behavior, as far as the C
> Florian> standard is concerned? It's probably a better idea to fix the linker,
> Florian> so that it performs proper relocation.
>
> Well, strictly speaking it _is_ undefined, however adding/subtracting
> __PAGE_OFFSET is far too common an operation and one can reasonably
> expect to get away with it in the _vast_ majority of cases. IMHO, it is
> better to fix the particular case which triggers the undefined
> behaviour, as these cases are bound to be _very_ rare.
IMHO the best thing is to change the RELOC macro, so that gcc cannot optimize
this.
E.g.:
-#define PTRRELOC(x) ((typeof(x))((unsigned long)(x) + offset))
+#define PTRRELOC(x) ({ unsigned long __x = (unsigned long)(x); \
asm ("" : "=r" (__x) : "0" (__x)); \
(typeof(x))(__x + offset); })
This way gcc cannot assume anything about it.
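For illustration, the same trick in a self-contained form (the macro
body is copied from the patch above; everything around it is just an
example):

#include <string.h>

static unsigned long offset;

#define PTRRELOC(x) ({ unsigned long __x = (unsigned long)(x); \
                       asm ("" : "=r" (__x) : "0" (__x));      \
                       (typeof(x))(__x + offset); })
#define RELOC(x) (*PTRRELOC(&(x)))

void demo(char *namep)
{
        /* The empty asm makes __x opaque to the optimizer, so GCC can
         * no longer prove the source is a string constant and emits a
         * plain strcpy call. */
        strcpy(namep, RELOC("linux,phandle"));
}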
Jakub
On Tue, 1 Jan 2002, Tom Rini wrote:
> 3) We could also try turning off this particular optimization
> (-fno-builtin perhaps) on this file, and not worry about it.
In particular, it would probably make sense for the kernel to use
-ffreestanding (which implies -fno-builtin) throughout, then selectively
enable the built-in functions that are wanted with macros such as
#define strcpy(d, s) __builtin_strcpy((d), (s))
in an appropriate header, then #undef these macros in the files that
shouldn't use the built-in functions.
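A sketch of what such a header could look like (the file name and the
particular set of functions are only examples):

/* e.g. include/linux/builtin-string.h (hypothetical) */
#ifndef _BUILTIN_STRING_H
#define _BUILTIN_STRING_H

/* The kernel would be built with -ffreestanding (hence -fno-builtin);
 * re-enable only the built-in functions we actually want. */
#define strcpy(d, s)    __builtin_strcpy((d), (s))
#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
#define strlen(s)       __builtin_strlen((s))

#endif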
--
Joseph S. Myers
[email protected]
On Wed, Jan 02, 2002 at 08:39:20AM -0700, Tom Rini wrote:
> Well, Paulus wrote 'strcpy' not 'memcpy', so why does gcc get to assume
> it's safe to change it? In this case it's certainly not.
But unless you trigger undefined behaviour, strcpy(x, "foobar" + n) is
equal to memcpy(x, "foobar" + n, sizeof("foobar") - n); and the latter is
more efficient (you don't have to check for end-of-string during copying).
> > It is not a workaround, it is a fix for invalid code, which gets
> > triggered by a particular optimization.
>
> By a particular optimization that's not present before gcc-3.0, and
> happens to break things under some conditions, as you've seen.
It just happens to do a different thing than it used to when seeing code
with a particular case of undefined behaviour.
Jakub
On Wed, Jan 02, 2002 at 08:54:08AM +0200, Momchil Velikov wrote:
> >>>>> "Tom" == Tom Rini <[email protected]> writes:
>
> Tom> On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
>
> >> The appended patch fixes incorrect code, which interferes badly with
> >> optimizations in GCC 3.0.4 and GCC 3.1.
> >>
> >> The GCC tries to replace the strcpy from a constant string source with
> >> a memcpy, since the length is known at compile time.
>
> Tom> Check the linuxppc-dev archives, but this has been discussed before, and
> Tom> it came down to a few things.
>
> Well, you may discuss it again, but this time after actually reading
> the C standard:
>
> "6.3.6 Additive operators
> ...
>
> 9 Unless both the pointer operand and the result point to
> elements of the same array object, or the pointer operand
> points one past the last element of an array object and the
> result points to an element of the same array object, the
> behavior is undefined if the result is used as an operand of a
> unary * operator that is actually evaluated."
>
> Tom> 1) gcc shouldn't be making this optimization, and Paulus (who wrote the
> Tom> code) disliked this new feature.
>
> Why shouldn't gcc be making some optimization? Because a particular
> person doesn't like it, or what? What kind of statement is that anyway?
Well, Paulus wrote 'strcpy' not 'memcpy', so why does gcc get to assume
it's safe to change it? In this case it's certainly not.
> Tom> As a subnote, what you sent was sent
> Tom> to Linus with a comment about working around a gcc-3.0 bug/feature, and
> Tom> was rejected because of this.
>
> It is not a workaround, it is a fix for invalid code, which gets
> triggered by a particular optimization.
By a particular optimization that's not present before gcc-3.0, and
happens to break things under some conditions, as you've seen.
> Tom> 2) We could modify the RELOC macro, and not have this problem. The
> Tom> patch was posted, but not acted upon.
>
> Although all uses of the RELOC macro violate the standard, this kind
> of pointer arithmetic is far too common and usually produces the
> expected behavior, thus it makes sense to optimize it and just fix the
> cases where it breaks.
And for the case it breaks?
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
On Wed, Jan 02, 2002 at 08:11:39AM -0500, Jakub Jelinek wrote:
>
> On Wed, Jan 02, 2002 at 12:41:28PM +0200, Momchil Velikov wrote:
> > >>>>> "Florian" == Florian Weimer <[email protected]> writes:
> >
> > Florian> Momchil Velikov <[email protected]> writes:
> > >> - strcpy(namep, RELOC("linux,phandle"));
> > >> + memcpy (namep, RELOC("linux,phandle"), sizeof("linux,phandle"));
> >
> > Florian> Doesn't this still trigger undefined behavior, as far as the C
> > Florian> standard is concerned? It's probably a better idea to fix the linker,
> > Florian> so that it performs proper relocation.
This has been suggested too. And if someone implements this in the
2.5.x timeframe it might even go in (as long as it works w/ gcc-2.95.x,
which really should be doable).
> > Well, strictly speaking it _is_ undefined, however adding/subtracting
> > __PAGE_OFFSET is far too common an operation and one can reasonably
> > expect to get away with it in the _vast_ majority of cases. IMHO, it is
> > better to fix the particular case which triggers the undefined
> > behaviour, as these cases are bound to be _very_ rare.
>
> IMHO the best thing is to change the RELOC macro, so that gcc cannot optimize
> this.
> E.g.:
> -#define PTRRELOC(x) ((typeof(x))((unsigned long)(x) + offset))
> +#define PTRRELOC(x) ({ unsigned long __x = (unsigned long)(x); \
> asm ("" : "=r" (__x) : "0" (__x)); \
> (typeof(x))(__x + offset); })
> This way gcc cannot assume anything about it.
This is what Franz Sirl suggested a while back, which should work, but
doesn't look too nice. If gcc-3.0.x is going to be expected to produce a
working kernel on all arches, I suppose this is the best fix for now.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
On Wed, Jan 02, 2002 at 01:40:06PM +0200, Momchil Velikov wrote:
>
> >>>>> "Aaron" == Aaron Lehmann <[email protected]> writes:
>
> Aaron> On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
> >> Thus
> >> strcpy (dst, "abcdef" + 2)
> >> gives
> >> memcpy (dst, "abcdef" + 2, 5)
>
> Aaron> IMHO gcc should not be touching these function calls, as they are not
> Aaron> made to a standard C library, and thus have different behaviors. I'm
> Aaron> surprised that gcc tries to optimize calls to these functions just
> Aaron> based on their names.
>
> IIRC, these identifiers are reserved by the C standard, thus the
> compiler is right to assume that they have standard behavior. And note
> that they DO have the standard behavior. It doesn't even matter if GCC
> is right to judge by the names in each and every case, it is right
> in _this_ case.
Right. The problem here is that this happens to be at a very early
point in the code (from arch/ppc/kernel/prom.c):
/*
* Note that prom_init() and anything called from prom_init() must
* use the RELOC/PTRRELOC macros to access any static data in
* memory, since the kernel may be running at an address that is
* different from the address that it was linked at.
* (Note that strings count as static variables.)
*/
Which is where the trouble comes in.
> Aaron> The gcc manpage mentions
>
> Aaron> -ffreestanding
> Aaron> Assert that compilation takes place in a freestanding
> Aaron> environment. This implies -fno-builtin. A freestand-
> Aaron> ing environment is one in which the standard library
> Aaron> may not exist, and program startup may not necessarily
> Aaron> be at "main". The most obvious example is an OS ker-
> Aaron> nel. This is equivalent to -fno-hosted.
>
> Aaron> Why is Linux not using this? It sounds very appropriate. The only
>
> Because it results in less optimization. I see no point in
> deliberately preventing the compiler from doing optimizations.
Right. There's only a very small number of files which _might_ be
affected by this optimization.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
>>>>> "Jakub" == Jakub Jelinek <[email protected]> writes:
Jakub> On Wed, Jan 02, 2002 at 08:39:20AM -0700, Tom Rini wrote:
>> Well, Paulus wrote 'strcpy' not 'memcpy', so why does gcc get to
>> assume it's safe to change it? In this case it's certainly not.
Jakub> But unless you trigger undefined behaviour, strcpy(x, "foobar"
Jakub> + n) is equal to memcpy(x, "foobar" + n, sizeof("foobar") -
Jakub> n); and the latter is more efficient (you don't have to check
Jakub> for end-of-string during copying).
I'd say it's the same even if you DO trigger undefined behavior.
"Undefined" means "anything can happen, don't blame the compiler". In
particular, different things may happen at different times, under
different compilers, under different phases of the moon, etc...
The change from strcpy to memcpy either replaces a defined action by
the equivalent one, or replaces an undefined action by an undefined
action. In the undefined case, the outcome may be different after the
change, but that's perfectly ok, that's what "undefined" means.
It might be interesting for the compiler to warn about this coding
error (since it presumably can detect it). But "fixing" the behavior
of undefined code seems like a strange thing to do.
paul
On 2 Jan 02 at 11:45, Paul Koning wrote:
>
> It might be interesting for the compiler to warn about this coding
> error (since it presumably can detect it). But "fixing" the behavior
> of undefined code seems like a strange thing to do.
It is even worse (gcc 2.95.4 20011223 (Debian prerelease), i386).
Test code:
#include <string.h>
char* dst;
void main(void) {
strcpy(dst, "test"+CONSTANT);
}
# gcc -O2 -S test.c -DCONSTANT=10
test.c: In function `main':
test.c:4: warning: offset outside bounds of constant string
...
and compiler generated correct code (call to strcpy with "test"+10).
But:
# gcc -O2 -S test.c -DCONSTANT=0x80000000
test.c: In function `main':
test.c:4: warning: offset outside bounds of constant string
gcc: Internal compiler error: program cc1 got fatal signal 11
(and for CONSTANT < 5 it of course generated correct code to fill
dst with string contents; and yes, I know that code will sigsegv on
run because dst is not initialized - but it should die at runtime,
not at compile time).
So we should definitely change RELOC(), or sooner or later gcc will
die on such code :-(
Debian's gcc 3.0.3-1 generates:
0 <= CONSTANT <= 4: fills dst directly with constant
5 <= CONSTANT <= 0x7FFFFFFF: emit warnings + use strcpy()
0x80000000U <= CONSTANT <= 0xFFFFFFFFU: use strcpy() silently
... and it does not die.
Best regards,
Petr Vandrovec
[email protected]
On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
> The GCC tries to replace the strcpy from a constant string source with
> a memcpy, since the length is know at compile time.
Okay, here's a summary of all of the options we have:
1) Change this particular strcpy to a memcpy
2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
optimization comes back on with this flag later on, it would be a
compiler bug, yes?)
3) Modify the RELOC() macro in such a way that GCC won't attempt to
optimize anything which touches it [1]. (Franz, again by Jakub)
4) Introduce a function to do the calculations [2]. (Corey Minyard)
5) 'Properly' set things up so that we don't need the RELOC() macros
(-mrelocatable or so?), and forget this mess altogether.
I think that if we're going to make sure that gcc-3.0.x works with 2.4.x
kernels, we should pick one of the first 4 initially. If this is only a
2.5.x matter, we should probably try and do #5 as a long-term goal, and
pick something else for now. But either way I do think it's time to
pick some solution. Comments?
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
[1] http://lists.linuxppc.org/linuxppc-dev/200109/msg00155.html
[2] http://lists.linuxppc.org/linuxppc-dev/200112/msg00038.html
> Okay, here's a summary of all of the options we have:
> 1) Change this particular strcpy to a memcpy
> 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
> optimization comes back on with this flag later on, it would be a
> compiler bug, yes?)
> 3) Modify the RELOC() macro in such a way that GCC won't attempt to
> optimize anything which touches it [1]. (Franz, again by Jakub)
> 4) Introduce a function to do the calculations [2]. (Corey Minyard)
> 5) 'Properly' set things up so that we don't need the RELOC() macros
> (-mrelocatable or so?), and forget this mess altogether.
2) will prevent any future gcc from ever assuming it can transform the
strcpy into anything but a call to strcpy, or assume anything about the
semantics of strcpy. Be careful with 3), as trying to fool the optimizer
is likely to be only a temporary solution (meaning that the kernel people
will return to flame the gcc people when the optimizer gets changed
again).
On Wed, Jan 02, 2002 at 12:13:34PM -0800, Joe Buck wrote:
>
> > Okay, here's a summary of all of the options we have:
> > 1) Change this particular strcpy to a memcpy
> > 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
> > optimization comes back on with this flag later on, it would be a
> > compiler bug, yes?)
> > 3) Modify the RELOC() macro in such a way that GCC won't attempt to
> > optimize anything which touches it [1]. (Franz, again by Jakub)
> > 4) Introduce a function to do the calculations [2]. (Corey Minyard)
> > 5) 'Properly' set things up so that we don't need the RELOC() macros
> > (-mrelocatable or so?), and forget this mess altogether.
>
> 2) will prevent any future gcc from ever assuming it can transform the
> strcpy into anything but a call to strcpy, or assume anything about the
> semantics of strcpy.
If we go with this, it also might make sense to split all of the
{PTR,UN,}RELOC() macro users into a different file or split up btext.c a
bit more, just to be safe.
> Be careful with 3), as trying to fool the optimizer
> is likely to be only a temporary solution (meaning that the kernel people
> will return to flame the gcc people when the optimizer gets changed
> again).
Well, it's one of those changes that really should stop gcc, unless
there's a bug on the gcc side, if I read the patch/comments about it
right.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
On Wed, Jan 02, 2002 at 12:09:10PM -0700, Tom Rini wrote:
> 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
> optimization comes back on with this flag later on, it would be a
> compiler bug, yes?)
No. You still have the problem of using pointer arithmetic past
one past the end of the object.
C99 6.5.6/8:
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
r~
On Wed, Jan 02, 2002 at 01:36:32PM -0800, Richard Henderson wrote:
> On Wed, Jan 02, 2002 at 12:09:10PM -0700, Tom Rini wrote:
> > 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
> > optimization comes back on with this flag later on, it would be a
> > compiler bug, yes?)
>
> No. You still have the problem of using pointer arithmetic past
> one past the end of the object.
Well, the problem is that we aren't running where the compiler thinks we
are yet. So what would the right fix be for this?
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
On Wed, Jan 02, 2002 at 03:05:48PM -0700, Tom Rini wrote:
>
> Well, the problem is that we aren't running where the compiler thinks we
> are yet. So what would the right fix be for this?
Obviously -ffreestanding isn't, because this problem could crop up pretty
much anywhere. The involvement of standard library functions is almost
coincidence and so -ffreestanding would only fix the current symptom.
I'm not much of a kernel hacker, but a quick (and not very efficient,
granted) fix could be to make the offset an extern variable, yes? That
would force the compiler to fall back on the basic "your gun, your foot,
your choice" memory model.
Jeroen
[email protected] said:
> (and for CONSTANT < 5 it of course generated correct code to fill dst
> with string contents; and yes, I know that code will sigsegv on run
> because dst is not initialized - but it should die at runtime, not
> at compile time).
An ICE, while it's not quite what was expected and it'll probably get fixed,
is nonetheless a perfectly valid implementation of 'undefined behaviour'.
You should count yourself lucky that the compiler didn't beat you up, sleep
with your mother, and/or start WW III.
Contributors to this thread who want to write a kernel in some C-like
language other than C probably ought to start by writing their own compiler,
rather than complaining about gcc. (I won't suggest starting with a language
spec, as the people in question don't really seem to be interested in
those.)
That or implement DWIM for gcc, I suppose...
Can we fix the broken code and stop being silly now, please?
--
dwmw2
On Wed, Jan 02, 2002 at 03:05:48PM -0700, Tom Rini wrote:
> Well, the problem is that we aren't running where the compiler thinks we
> are yet. So what would the right fix be for this?
Assembly.
r~
On Wed, Jan 02, 2002 at 02:27:12PM -0800, Richard Henderson wrote:
> On Wed, Jan 02, 2002 at 03:05:48PM -0700, Tom Rini wrote:
> > Well, the problem is that we aren't running where the compiler thinks we
> > are yet. So what would the right fix be for this?
>
> Assembly.
That's not really an option.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
[email protected] said:
> > (and for CONSTANT < 5 it of course generated correct code to fill dst
> > with string contents; and yes, I know that code will sigsegv on run
> > because dst is not initialized - but it should die at runtime, not
> > at compile time).
>
> An ICE, while it's not quite what was expected and it'll probably get fixed,
> is nonetheless a perfectly valid implementation of 'undefined behaviour'.
Not for GCC it isn't. Our standards say that a compiler crash, for any
input whatsoever, no matter how invalid (even if you feed in line noise),
is a bug. Other than that we shouldn't make promises, though the old gcc1
behavior of trying to launch a game of rogue or hack when encountering a
#pragma was cute.
On Wed, Jan 02, 2002 at 03:35:57PM -0700, Tom Rini wrote:
> That's not really an option.
Oh come on. It shouldn't take very much code at all to properly
relocate the binary. You can use either -fpic or -mrelocatable
to get hold of the set of addresses in question.
r~
On Wed, Jan 02, 2002 at 02:44:35PM -0800, Richard Henderson wrote:
> On Wed, Jan 02, 2002 at 03:35:57PM -0700, Tom Rini wrote:
> > That's not really an option.
>
> Oh come on. It shouldn't take very much code at all to properly
> relocate the binary. You can use either -fpic or -mrelocatable
> to get hold of the set of addresses in question.
Have you looked at all of the code in question? We're doing a lot of things
before we properly relocate ourselves. And as long as things aren't being
optimized away, it just works.
And the other thing is if we want gcc-3.0 to produce a good kernel on
PPC in 2.4.x, the kernel-side changes should be as minimal as possible.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
[email protected] said:
> > An ICE, while it's not quite what was expected and it'll probably get
> > fixed, is nonetheless a perfectly valid implementation of 'undefined
> > behaviour'.
> Not for GCC it isn't. Our standards say that a compiler crash, for
> any input whatsoever, no matter how invalid (even if you feed in line
> noise), is a bug. Other than that we shouldn't make promises, though
> the old gcc1 behavior of trying to launch a game of rogue or hack when
> encountering a #pragma was cute.
True - sorry, I forgot where this was crossposted. I didn't mean to imply
that GCC folks would _accept_ an ICE and not fix it - just that strictly
speaking, it is a perfectly valid response, as is the unintended observed
behaviour of the output code which actually started this thread.
--
dwmw2
On Wed, Jan 02, 2002 at 11:23:21PM +0100, jtv wrote:
> On Wed, Jan 02, 2002 at 03:05:48PM -0700, Tom Rini wrote:
> >
> > Well, the problem is that we aren't running where the compiler thinks we
> > are yet. So what would the right fix be for this?
>
> Obviously -ffreestanding isn't, because this problem could crop up pretty
> much anywhere. The involvement of standard library functions is almost
> coincidence and so -ffreestanding would only fix the current symptom.
After thinking about this a bit more, why wouldn't this be the fix? The
problem is that gcc is assuming that this is a 'normal' program (or in
this case, part of a program) and that the standard rules
apply, so it optimizes the strcpy into a memcpy. But in this small bit
of the kernel, it's not. It's not even using the 'standard library
functions', but what the kernel provides. This problem can only crop up
in the time before we finish moving ourselves around.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
Bernard Dautrevaux writes:
> > 1) gcc shouldn't be making this optimization, and Paulus (who
> > wrote the
> > code) disliked this new feature.
>
> Why? If memcpy can then be expanded in line, and the string is short, this
> can be a *huge* win...
Show me a place in the kernel which is performance-critical where we
do strcpy(p, "string"). I can't think of one. The stuff in prom.c
isn't performance-critical.
> I think we MUST modify the RELOC macro. Currently the code has an undefined
> meaning WRT the C standard, so is allowed to do almost anything from working
> as expected (quite bad in fact: it may suddenly fail when something is modified
> elsewhere), to triggering the 3rd World War :-)
As I said in another email, if the gcc maintainers want to change gcc
so that pointer arithmetic can do anything other than an ordinary 2's
complement addition operation, then we will stop using gcc.
Paul.
Momchil Velikov writes:
> Well, you may discuss it again, but this time after actually reading
> the C standard:
>
> "6.3.6 Additive operators
> ...
>
> 9 Unless both the pointer operand and the result point to
> elements of the same array object, or the pointer operand
> points one past the last element of an array object and the
> result points to an element of the same array object, the
> behavior is undefined if the result is used as an operand of a
> unary * operator that is actually evaluated."
One of the reasons why C is a good language for the kernel is that its
memory model is a good match to the memory organization used by the
processors that linux runs on. Thus, for these processors, adding an
offset to a pointer is in fact simply an arithmetic addition. Combine
that with the fact that the kernel is far more aware of how stuff is
laid out in its virtual memory space than most C programs, and you can
see that it is reasonable for the kernel to do pointer arithmetic
which might be undefined according to the strict letter of the law,
but which in fact works correctly on the class of processors that
Linux runs on, for all reasonable compiler implementations.
The rules in the C standard are designed to allow a program to run
consistently on a wide variety of machine architectures, including
things like the DEC PDP-10 which weren't directly byte-addressable.
(Yes the PDP-10 had "byte pointers" but they were a different kind of
object from an ordinary pointer.) The Linux kernel runs on a more
restricted range of machines and IMHO we are entitled to assume things
because of that.
> Why shouldn't gcc be making some optimization? Because a particular
> person doesn't like it, or what? What kind of statement is that anyway?
This is the kernel. If I say strcpy I want the compiler to call a
function called strcpy, not to try to second-guess me and do something
different. If I want memcpy I'll write "memcpy".
> Although all uses of the RELOC macro violate the standard, this kind
> of pointer arithmetic is far too common and usually produces the
> expected behavior, thus it makes sense to optimize it and just fix the
> cases where it breaks.
If the gcc maintainers think they are entitled to change the memory
model so as to break pointer arithmetic that "violates" the standard,
then we will have to use a different compiler.
As for the original problem, my preferred solution at the moment is to
add a label in arch/ppc/lib/string.S so that string_copy() is the
same function as strcpy(), and use string_copy instead of strcpy in
prom.c.
Paul.
Tom Rini writes:
> Okay, here's a summary of all of the options we have:
> 1) Change this particular strcpy to a memcpy
> 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
> optimization comes back on with this flag later on, it would be a
> compiler bug, yes?)
> 3) Modify the RELOC() macro in such a way that GCC won't attempt to
> optimize anything which touches it [1]. (Franz, again by Jakub)
> 4) Introduce a function to do the calculations [2]. (Corey Minyard)
> 5) 'Properly' set things up so that we don't need the RELOC() macros
> (-mrelocatable or so?), and forget this mess altogether.
I would add:
6) change strcpy to string_copy so gcc doesn't think it knows what the
function does
7) code RELOC etc. in assembly, which would let us get rid of the
offset = reloc_offset();
at the beginning of each function which uses RELOC.
Maybe 7 is the way to go, at least it will shut the bush-lawyers up.
Paul.
>>>>> "Paul" == Paul Mackerras <[email protected]> writes:
Paul> As I said in another email, if the gcc maintainers want to change gcc
Paul> so that pointer arithmetic can do anything other than an ordinary 2's
Paul> complement addition operation,
Nobody changes pointer arithmetic. The problem is that this
optimization gives a _negative_ length, because the resulting pointer
does not point inside or one past the end of the array, which in turn
is explicitly mentioned in the standard as undefined behavior.
Paul> ... then we will stop using gcc.
Specifically separated this part to state that I don't even care to
comment on this. Doh, I did. Anyway ..
Regards,
-velco
On Thu, Jan 03, 2002 at 10:11:53AM +1100, Paul Mackerras wrote:
> Tom Rini writes:
>
> > Okay, here's a summary of all of the options we have:
> > 1) Change this particular strcpy to a memcpy
> > 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
> > optimization comes back on with this flag later on, it would be a
> > compiler bug, yes?)
> > 3) Modify the RELOC() macro in such a way that GCC won't attempt to
> > optimize anything which touches it [1]. (Franz, again by Jakub)
> > 4) Introduce a function to do the calculations [2]. (Corey Minyard)
> > 5) 'Properly' set things up so that we don't need the RELOC() macros
> > (-mrelocatable or so?), and forget this mess altogether.
>
> I would add:
>
> 6) change strcpy to string_copy so gcc doesn't think it knows what the
> function does
> 7) code RELOC etc. in assembly, which would let us get rid of the
> offset = reloc_offset();
> at the beginning of each function which uses RELOC.
I think 7 sounds good for 2.4 at least, and maybe we can convince Franz
to look into 5 for 2.5 (since that would make things look a bit more
clean)..
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
On Wed, Jan 02, 2002 at 04:12:43PM -0700, Tom Rini wrote:
> >
> > Obviously -ffreestanding isn't, because this problem could crop up pretty
> > much anywhere. The involvement of standard library functions is almost
> > coincidence and so -ffreestanding would only fix the current symptom.
>
> After thinking about this a bit more, why wouldn't this be the fix? The
> problem is that gcc is assuming that this is a 'normal' program (or in
> this case, part of a program) and that the standard rules
> apply, so it optimizes the strcpy into a memcpy. But in this small bit
> of the kernel, it's not. It's not even using the 'standard library
> functions', but what the kernel provides. This problem can only crop up
> in the time before we finish moving ourselves around.
I'm not arguing your facts, but the "abnormality" is in the different
workings of memory addresses, not in anything related to the standard
library. What if these functions were named differently but gcc were
able to inline them? AFAICS that might trigger the same problem--no
cstdlib confusion involved.
Conversely, what if these were the real stdlib calls that they seem to
be? Still the same bug. Absence or presence of the standard library
is not essential to the problem, and so -ffreestanding can be a fragile
workaround at best.
The bug just happens to get triggered by a 'builtin' optimization, because
gcc 3.0.3 is a little more aggressive with those. We can't keep the
progress of gcc's optimizer back just for a kernel. Asking for a new
option or #pragma, okay. But weeding out otherwise valid assumptions that
help many inputs but break one? Better to fix the one, even if it does
cost you some speed there.
Oh, and another suggestion: would having RELOC cast the pointer to the
intermediate type "const volatile void *" make gcc drop its assumptions
on that one pointer, and avoid the optimization? It may not be exactly
what the Standard had in mind when it defined "volatile," but then again
the definition was deliberately left vague.
OTOH it may have to be cast away again to make strcpy() work at all, and
so that trick might still break...
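Concretely, that would amount to something like this in PTRRELOC (just
a sketch; as noted above, it may well not be enough to stop the
optimizer):

#define PTRRELOC(x) \
        ((typeof(x))((unsigned long)(const volatile void *)(x) + offset))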
Jeroen
>>>>> "Paul" == Paul Mackerras <[email protected]> writes:
Paul> Momchil Velikov writes:
>> Well, you may discuss it again, but this time after actually reading
>> the C standard:
>>
>> "6.3.6 Additive operators
>> ...
>>
>> 9 Unless both the pointer operand and the result point to
>> elements of the same array object, or the pointer operand
>> points one past the last element of an array object and the
>> result points to an element of the same array object, the
>> behavior is undefined if the result is used as an operand of a
>> unary * operator that is actually evaluated."
Paul> One of the reasons why C is a good language for the kernel is that its
Paul> memory model is a good match to the memory organization used by the
Paul> processors that linux runs on. Thus, for these processors, adding an
Paul> offset to a pointer is in fact simply an arithmetic addition. Combine
Paul> that with the fact that the kernel is far more aware of how stuff is
Paul> laid out in its virtual memory space than most C programs, and you can
Paul> see that it is reasonable for the kernel to do pointer arithmetic
Paul> which might be undefined according to the strict letter of the law,
Paul> but which in fact works correctly on the class of processors that
Paul> Linux runs on, for all reasonable compiler implementations.
Paul> The rules in the C standard are designed to allow a program to run
Paul> consistently on a wide variety of machine architectures, including
Paul> things like the DEC PDP-10 which weren't directly byte-addressable.
Paul> (Yes the PDP-10 had "byte pointers" but they were a different kind of
Paul> object from an ordinary pointer.) The Linux kernel runs on a more
Paul> restricted range of machines and IMHO we are entitled to assume things
Paul> because of that.
>> Why shouldn't gcc be making some optimization? Because a particular
>> person doesn't like it, or what? What kind of statement is that anyway?
Paul> This is the kernel. If I say strcpy I want the compiler to call a
Paul> function called strcpy, not to try to second-guess me and do something
Paul> different. If I want memcpy I'll write "memcpy".
This is C. If you use ``strcpy'', an identifier reserved by the
standard, it means that you want to move some bytes from here to there
and please let the compiler decide how to do that. Much in the same
way as when you assign structures and the compiler uses ``memcpy'' or
maybe it doesn't if it wouldn't be a win.
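A minimal example of that analogy (the structure is made up for the
example):

struct property {
        char name[32];
        int  length;
};

void copy_property(struct property *d, const struct property *s)
{
        /* A plain structure assignment: the compiler may emit a memcpy
         * call or inline moves for it, whichever it judges a win. */
        *d = *s;
}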
>> Although all uses of the RELOC macro violate the standard, this kind
>> of pointer arithmetic is far too common and usually produces the
>> expected behavior, thus it makes sense to optimize it and just fix the
>> cases where it breaks.
Paul> If the gcc maintainers think they are entitled to change the memory
Paul> model so as to break pointer arithmetic that "violates" the standard,
Paul> then we will have to use a different compiler.
Where do you see changes in pointer arithmetic? Or in the "memory
model" (whatever that means)?
I'd dare to state that _very_ few people would join the quest for
writing the kernel in something other than C.
Paul> As for the original problem, my preferred solution at the moment is to
Paul> add a label in arch/ppc/lib/string.S so that string_copy() is the
Paul> same function as strcpy(), and use string_copy instead of strcpy in
Paul> prom.c.
On Thu, Jan 03, 2002 at 01:28:42AM +0200, Momchil Velikov wrote:
> >>>>> "Paul" == Paul Mackerras <[email protected]> writes:
>
> Paul> Tom Rini writes:
>
> >> Okay, here's a summary of all of the options we have:
> >> 1) Change this particular strcpy to a memcpy
> >> 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
> >> optimization comes back on with this flag later on, it would be a
> >> compiler bug, yes?)
> >> 3) Modify the RELOC() macro in such a way that GCC won't attempt to
> >> optimize anything which touches it [1]. (Franz, again by Jakub)
> >> 4) Introduce a function to do the calculations [2]. (Corey Minyard)
> >> 5) 'Properly' set things up so that we don't need the RELOC() macros
> >> (-mrelocatable or so?), and forget this mess altogether.
>
> Paul> I would add:
>
> Paul> 6) change strcpy to string_copy so gcc doesn't think it knows what the
> Paul> function does
>
> GCC knows exactly what the function does.
And then optimizes it to something that fails to work in this particular
case.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
<<One of the reasons why C is a good language for the kernel is that its
memory model is a good match to the memory organization used by the
processors that linux runs on. Thus, for these processors, adding an
offset to a pointer is in fact simply an arithmetic addition. Combine
that with the fact that the kernel is far more aware of how stuff is
laid out in its virtual memory space than most C programs, and you can
see that it is reasonable for the kernel to do pointer arithmetic
which might be undefined according to the strict letter of the law,
but which in fact works correctly on the class of processors that
Linux runs on, for all reasonable compiler implementations.
>>
The concept of "all reasonable compiler implementations" is a very dubious
one. There is nothing to stop a valid C compiler from building assertions
based on the quoted paragraph from the C standard, e.g. it could derive
valid range information from knowing that an offset was constrained to
certain limits. So writing bogus C like this is risky, and as compilers
get more sophisticated, one is likely to hear screams, but they are not
justified in my opinion. There is no excuse for such abuse.
>>>>> "Paul" == Paul Mackerras <[email protected]> writes:
Paul> Tom Rini writes:
>> Okay, here's a summary of all of the options we have:
>> 1) Change this particular strcpy to a memcpy
>> 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
>> optimization comes back on with this flag later on, it would be a
>> compiler bug, yes?)
>> 3) Modify the RELOC() macro in such a way that GCC won't attempt to
>> optimize anything which touches it [1]. (Franz, again by Jakub)
>> 4) Introduce a function to do the calculations [2]. (Corey Minyard)
>> 5) 'Properly' set things up so that we don't need the RELOC() macros
>> (-mrelocatable or so?), and forget this mess altogether.
Paul> I would add:
Paul> 6) change strcpy to string_copy so gcc doesn't think it knows what the
Paul> function does
GCC knows exactly what the function does.
On Thu, Jan 03, 2002 at 12:45:14AM +0100, jtv wrote:
> On Wed, Jan 02, 2002 at 04:12:43PM -0700, Tom Rini wrote:
> > >
> > > Obviously -ffreestanding isn't, because this problem could crop up pretty
> > > much anywhere. The involvement of standard library functions is almost
> > > coincidence and so -ffreestanding would only fix the current symptom.
> >
> > After thinking about this a bit more, why wouldn't this be the fix? The
> > problem is that gcc is assuming that this is a 'normal' program (or in
> > this case, part of a program) and that the standard rules
> > apply, so it optimizes the strcpy into a memcpy. But in this small bit
> > of the kernel, it's not. It's not even using the 'standard library
> > functions', but what the kernel provides. This problem can only crop up
> > in the time before we finish moving ourselves around.
>
> I'm not arguing your facts, but the "abnormality" is in the different
> workings of memory addresses, not in anything related to the standard
> library. What if these functions were named differently but gcc were
> able to inline them? AFAICS that might trigger the same problem--no
> cstdlib confusion involved.
I'm not sure.. We do:
#define RELOC(x) (*PTRRELOC(&(x)))
#define PTRRELOC(x) ((typeof(x))((unsigned long)(x) + offset))
unsigned long offset = reloc_offset();
...
strcpy(namep, RELOC("linux,phandle"));
Which is basically inlining, yes?
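Expanded by hand, that last line is roughly:

strcpy(namep,
       *(char (*)[14])((unsigned long)(&"linux,phandle") + offset));

i.e. the source operand GCC analyses is the address of a string
constant plus an offset.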
> Conversely, what if these were the real stdlib calls that they seem to
> be? Still the same bug. Absence or presence of the standard library
> is not essential to the problem, and so -ffreestanding can be a fragile
> workaround at best.
Yes, but doesn't -ffreestanding imply that gcc _can't_ assume this is
the standard library, and that strcpy _might_ not be what it thinks, and
to just call strcpy?
> The bug just happens to get triggered by a 'builtin' optimization, because
> gcc 3.0.3 is a little more aggressive with those. We can't keep the
> progress of gcc's optimizer back just for a kernel. Asking for a new
> option or #pragma, okay. But weeding out otherwise valid assumptions that
> help many inputs but break one? Better to fix the one, even if it does
> cost you some speed there.
We aren't saying this is always a bad thing, but what if we want to turn
off a built-in optimization? Unless -ffreestanding stops implying
-fno-builtin (maybe we could just add -fno-builtin for this one file..),
this line should be fine.
> Oh, and another suggestion: would having RELOC cast the pointer to the
> intermediate type "const volatile void *" make gcc drop its assumptions
> on that one pointer, and avoid the optimization? It may not be exactly
> what the Standard had in mind when it defined "volatile," but then again
> the definition was deliberately left vague.
I _think_ this is option 3 or so that I mentioned in another email.
Modify RELOC so that gcc will drop its assumptions and just do what we
explicitly say.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
Robert Dewar writes:
> The concept of "all reasonable compiler implementations" is a very dubious
> one. There is nothing to stop a valid C compiler from building assertions
> based on the quoted paragraph from the C standard, e.g. it could derive
> valid range information from knowing that an offset was constrained to
> certain limits. So writing bogus C like this is risky, and as compilers
> get more sophisticated, one is likely to hear screams, but they are not
> justified in my opinion. There is no excuse for such abuse.
There is already such a project under development: see
http://gcc.gnu.org/projects/bp/main.html
This is a modification to gcc that implements pointers as triples.
While there is a performance penalty for doing this, it can completely
eliminate the problem of exploitable buffer overflows. However, programs
that violate the rules of ISO C by generating out-of-range pointers will
fail.
On Wed, Jan 02, 2002 at 05:01:18PM -0700, Tom Rini wrote:
> Yes, but doesn't -ffreestanding imply that gcc _can't_ assume this is
> the standard library...
Ignore strcpy. Yes, that's what's visibly causing a failure here,
but the bug is in the funny pointer arithmetic. Leave that in
there and the compiler _will_ bite your ass sooner or later.
r~
<<This is a modification to gcc that implements pointers as triples.
While there is a performance penalty for doing this, it can completely
eliminate the problem of exploitable buffer overflows. However, programs
that violate the rules of ISO C by generating out-of-range pointers will
fail.
>>
Note incidentally that the C rules that allow referencing the address just
past the end of an array (an irregularity that recognizes the infeasibility
of declaring the common idiom for (a=b;a<&b[10];a++)) have an interesting
consequence on a segmented machine, namely that you cannot allocate an
array too near the end of the segment.
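Spelled out (element type picked arbitrarily):

void clear(void)
{
        int b[10];
        int *a;

        /* &b[10] is one past the last element: legal to form and to
         * compare against, but never dereferenced by the loop. */
        for (a = b; a < &b[10]; a++)
                *a = 0;
}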
> Yes, but doesn't -ffreestanding imply that gcc _can't_ assume this is
> the standard library, and that strcpy _might_ not be what it thinks, and
> to just call strcpy?
strcpy is part of the C standard. You'd need a -fits-not-c-its-linux
> We aren't saying this is always a bad thing, but what if we want to turn
> off a built-in optimization? Unless -ffreestanding stops implying
> -fno-builtin (maybe we could just add -fno-builtin for this one file..),
> this line should be fine.
If you want a strcpy that isn't strcpy then change its name or use a
different language 8)
On Wed, Jan 02, 2002 at 04:34:52PM -0700, Tom Rini wrote:
> On Thu, Jan 03, 2002 at 01:28:42AM +0200, Momchil Velikov wrote:
> >
> > GCC knows exactly what the function does.
>
> And then optimizes it to something that fails to work in this particular
> case.
Which it may do with another function *or expression* as well, because
the real bug has already happened before the function call comes into
the picture.
As far as I'm concerned the options are: fix RELOC; obviate RELOC; use
an appropriate gcc option if available (-fPIC might be it, -ffreestanding
certainly isn't--see above); *extend* (not fix, extend) gcc; or work
around all individual cases. In rough descending order of preference.
Jeroen
On Wed, Jan 02, 2002 at 04:07:39PM -0800, Richard Henderson wrote:
> On Wed, Jan 02, 2002 at 05:01:18PM -0700, Tom Rini wrote:
> > Yes, but doesn't -ffreestanding imply that gcc _can't_ assume this is
> > the standard library...
>
> Ignore strcpy. Yes, that's what's visibly causing a failure here,
> but the bug is in the funny pointer arithmetic. Leave that in
> there and the compiler _will_ bite your ass sooner or later.
Thank you for taking it all and putting it in a nutshell.
Jeroen
On Wed, Jan 02, 2002 at 04:07:39PM -0800, Richard Henderson wrote:
> On Wed, Jan 02, 2002 at 05:01:18PM -0700, Tom Rini wrote:
> > Yes, but doesn't -ffreestanding imply that gcc _can't_ assume this is
> > the standard library...
>
> Ignore strcpy. Yes, that's what's visibly causing a failure here,
> but the bug is in the funny pointer arithmetic. Leave that in
> there and the compiler _will_ bite your ass sooner or later.
Er, which part of the 'funny pointer arithmetic'? I take it you aren't
a fan of the 'change to
memcpy(namep,RELOC("linux,phandle"),sizeof("linux,phandle"));' fix.
We need to do funny things here, and thus need a way to tell gcc to just
do what we're saying.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
On Thu, Jan 03, 2002 at 01:19:05AM +0100, jtv wrote:
> On Wed, Jan 02, 2002 at 04:34:52PM -0700, Tom Rini wrote:
> > On Thu, Jan 03, 2002 at 01:28:42AM +0200, Momchil Velikov wrote:
> > >
> > > GCC knows exactly what the function does.
> >
> > And then optimizes it to something that fails to work in this particular
> > case.
>
> Which it may do with another function *or expression* as well, because
> the real bug has already happened before the function call comes into
> the picture.
What's the bug? The 'funny' arithmetic?
> As far as I'm concerned the options are: fix RELOC;
How?
> obviate RELOC; use
> an appropriate gcc option if available (-fPIC might be it, -ffreestanding
> certainly isn't--see above);
Maybe for 2.5. Too invasive for 2.4.x (initially at least).
> *extend* (not fix, extend) gcc; or work
> around all individual cases. In rough descending order of preference.
Er, say what?
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
On Wed, Jan 02, 2002 at 07:12:41PM -0500, [email protected] wrote:
>
> Note incidentally that the C rules that allow referencing the address just
> past the end of an array (an irregularity that recognizes the infeasibility
> of declaring the common idiom for (a=b;a<&b[10];a++)) have an interesting
> consequence on a segmented machine, namely that you cannot allocate an
> array too near the end of the segment.
At the risk of going off topic, you can take the non-element's address but
you can't actually touch it. So provided your architecture supports
pointer arithmetic beyond the end of the segment, your only remaining
worries are (1) that you don't stumble into the NULL address (which need
not be zero), and (2) that the address isn't reused as a valid element of
something else. I'm not so sure the latter is even a requirement.
Which does tie in to what we were discussing: in "foo"+LARGE_CONSTANT,
the problem is that C makes no promises on where "foo" ends up in memory,
or if it's even a single place, or what is located at any given offset
from it, or that the sum is even something that can be considered an
address in any way. Nor does the compiler need to assume that the
programmer knows better--bar some compiler-specific support for this
trick.
Jeroen
(Yes, I'm a pedant. I'm pining for the day when gcc will support the
options "-ffascist -Wanal")
(cc list trimmed)
[email protected] said:
> If you want a strcpy that isnt strcpy then change its name or use a
> different language 8)
The former is not necessarily sufficient in this case. You've still done the
broken pointer arithmetic, so even if the function isn't called strcpy() the
compiler is _still_ entitled to replace it with a call to memcpy() or even
machine_restart() before sleeping with your mother and starting WW III.
Granted, it probably _won't_ do any of those today, but you should know
better than to rely on that.
What part of 'undefined behaviour' is so difficult for people to understand?
--
dwmw2
<<At the risk of going off topic, you can take the non-element's address but
you can't actually touch it. So provided your architecture supports
pointer arithmetic beyond the end of the segment, your only remaining
worries are (1) that you don't stumble into the NULL address (which need
not be zero), and (2) that the address isn't reused as a valid element of
something else. I'm not so sure the latter is even a requirement.
>>
Ah! But you can compare it, and on a segmented architecture like the 286,
the address just past the end of the array can wrap to zero if the array
is allocated right up to the end of the segment. This is not theory, at
least one C compiler on the 286 had this bug!
On Wed, Jan 02, 2002 at 05:29:35PM -0700, Tom Rini wrote:
>
> > Which it may do with another function *or expression* as well, because
> > the real bug has already happened before the function call comes into
> > the issue.
>
> What's the bug? The 'funny' arithmetic?
That depends on your definition of bug--symptom, "sighting" (thank you
Intel), or underlying problem. The funny arithmetic, in C terms, is
the underlying problem. The symptom is obvious. The sighting involves
what appear to be calls to the standard library, but in this
case happen not to be. We'd still have the same problem if this really
were the standard C library.
> > As far as I'm concerned the options are: fix RELOC;
>
> How?
That's the EUR 64 * 10^6 question, isn't it? It's likely to cost some
performance, but I suspect this would be the easiest solution.
> > obviate RELOC; use
> > an appropriate gcc option if available (-fPIC might be it, -ffreestanding
> > certainly isn't--see above);
>
> Maybe for 2.5. Too invasive for 2.4.x (initially at least).
I'm not saying these are attractive or even feasible, just prioritizing
the options I see. I'm sure I'll have forgotten some, but I'm convinced
-ffreestanding isn't among them.
Let's say RELOC also broke in other places, for the exact same reason but
without (names familiar from) the standard library coming into play. How
would one recognize those cases? And once diagnosed, how to go about
working around them? Even worse, what if gcc tries to help but still
uses the stricter assumptions in some forgotten case?
Those would take time, and I can't see why some of these wouldn't arise
at some point. That's why I feel -ffreestanding is too brittle as a
workaround.
> > *extend* (not fix, extend) gcc; or work
> > around all individual cases. In rough descending order of preference.
>
> Er, say what?
In rough descending order of _my_ preference, that is--falls within the
scope of the "as far as I'm concerned." :)
But yes, requiring gcc to support RELOC through anything other than an
existing option (and once again, -ffreestanding only works around one
instance where it breaks) would be an extension to gcc. If it didn't
break in the past, that was essentially a coincidence. Unless it's in
some forgotten corner of the docs: possible, but not probable AFAICS.
Optimizing C is hard to do. Pointer arithmetic is hell on an optimizer,
and the rules were made to give the latter a fair chance of doing a
good job. I'm not arguing against having some specific option to relax
those rules--kernels need to be written, and written in C. But that
option is by definition a compiler-specific extension to the language.
Jeroen
On Wed, Jan 02, 2002 at 05:16:05PM -0700, Tom Rini wrote:
> > Ignore strcpy. Yes, that's what's visibly causing a failure here,
> > but the bug is in the funny pointer arithmetic. Leave that in
> > there and the compiler _will_ bite your ass sooner or later.
>
> Er, which part of the 'funny pointer arithmetic'?
What's currently inside RELOC.
> We need to do funny things here, and thus need a way to tell gcc to just
> do what we're saying.
While it's true that you have requirements on doing runtime
relocation, what you absolutely do not want to do is expose
this to gcc.
Hide it inside inline assembly if you must. Better is to
do your relocation before you enter C at all.
r~
On Thu, Jan 03, 2002 at 02:03:58AM +0100, jtv wrote:
> On Wed, Jan 02, 2002 at 05:29:35PM -0700, Tom Rini wrote:
>
> > > As far as I'm concerned the options are: fix RELOC;
> >
> > How?
>
> That's the EUR 64 * 10^6 question, isn't it? It's likely to cost some
> performance, but I suspect this would be the easiest solution.
I'm partial to Paul's suggestion of redoing RELOC and friends in asm.
> > > obviate RELOC; use
> > > an appropriate gcc option if available (-fPIC might be it, -ffreestanding
> > > certainly isn't--see above);
> >
> > Maybe for 2.5. Too invasive for 2.4.x (initially at least).
>
> I'm not saying these are attractive or even feasible, just prioritizing
> the options I see. I'm sure I'll have forgotten some, but I'm convinced
> -ffreestanding isn't among them.
Well, Franz Sirl mentioned this before as a possibility (or rather, why
aren't we doing it like this?), so maybe later someone will look into
this.
> Let's say RELOC also broke in other places, for the exact same reason but
> without (names familiar from) the standard library come into play. How
> would one recognize those cases? And once diagnosed, how to go about
> working around them? Even worse, what if gcc tries to help but still
> uses the stricter assumptions in some forgotten case?
Well, RELOC _always_ is doing funny arithmetic, and there's not much we
can do about it. At this point in the bootup we aren't running where
things think we are yet, and have to adjust things as such.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
On Thu, Jan 03, 2002 at 12:35:30AM +0000, David Woodhouse wrote:
>
> (cc list trimmed)
>
> [email protected] said:
> > If you want a strcpy that isnt strcpy then change its name or use a
> > different language 8)
>
> The former is not necessarily sufficient in this case. You've still done the
> broken pointer arithmetic, so even if the function isn't called strcpy() the
> compiler is _still_ entitled to replace it with a call to memcpy() or even
> machine_restart() before sleeping with your mother and starting WW III.
>
> Granted, it probably _won't_ do any of those today, but you should know
> better than to rely on that.
>
> What part of 'undefined behaviour' is so difficult for people to understand?
I think it comes down to an expectation that if the behaviour is
undefined, anything _could_ happen, but what should happen is that it
should just be passed along to (in this case) strcpy un-modified.
Anything _could_ happen, but why do something that probably won't help
all the same?
But this is moot anyhow since I _think_ Paul's suggestion of doing RELOC
and friends as asm will work (and echo'd by rth?).
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
Richard Henderson writes:
> Ignore strcpy. Yes, that's what's visibly causing a failure here,
> but the bug is in the funny pointer arithmetic. Leave that in
> there and the compiler _will_ bite your ass sooner or later.
I look forward to seeing your patch to remove all uses of
virt_to_phys, phys_to_virt, __pa, __va, etc. from arch/alpha... :)
Paul.
Momchil Velikov writes:
> Where do you see changes in pointer arithmetic ? Or in the "memory
> model" (whatever that means) ?
The C standard says that doing pointer arithmetic is only valid within
the bounds of a particular array of objects; do anything outside that
and the behaviour is "undefined". The reason it says that is that it
is trying to lay down the rules that will ensure that your program
will work on a PDP-10 or on a 286 with 64kB segments, as well as on
reasonable architectures (such as those that Linux runs on).
Now the claim is that RELOC is bad because it adds an offset to a
pointer, and the offset is usually around 0xc0000000, and thus we are
"violating" the C standard. Thus we are being told that someday this
will break and cause a lot of grief. My contention is that this will
only break if a pointer becomes something other than a simple address
and pointer arithmetic becomes something other than simple 2's
complement addition, subtraction, etc. If that happens then C will
have become useless for implementing a kernel, IMHO.
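For concreteness, a rough sketch of the kind of construct under discussion
(the names and the way the offset is obtained are simplified stand-ins here,
not the kernel's exact macros):

extern unsigned long offset;  /* actual load address minus link address */
#define RELOC(s)    ((char *)(s) + offset)
#define PTRRELOC(p) ((void *)((unsigned long)(p) + offset))

The first form is plain pointer arithmetic, and is the shape GCC folds when
the argument is a string literal; the second goes through an integer cast,
which is the kind of arithmetic discussed next.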
> I'd dare to state that _very_ few people would join the quest for
> writing the kernel in something other than C.
The kernel is already written in a dialect of C that breaks some of
the rules in the C standard. Look at virt_to_phys, phys_to_virt,
__pa, __va, etc. for a start. I don't see that changing.
Paul.
Joe Buck writes:
> There is already such a project under development: see
>
> http://gcc.gnu.org/projects/bp/main.html
>
> This is a modification to gcc that implements pointers as triples.
> While there is a performance penalty for doing this, it can completely
> eliminate the problem of exploitable buffer overflows. However, programs
> that violate the rules of ISO C by generating out-of-range pointers will
> fail.
What will it do if I cast a pointer to unsigned long? Or if I cast an
unsigned long to a pointer? The kernel does both of these things, and
in a lot of places.
Part of my beef with what gcc-3 is doing is that I take a pointer,
cast it to unsigned long, do something to it, cast it back to a
pointer, and gcc _still_ thinks it knows what I am doing. It
doesn't.
Paul.
On Wed, Jan 02, 2002 at 12:09:10PM -0700, Tom Rini <[email protected]> wrote:
| On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
| > The GCC tries to replace the strcpy from a constant string source with
| > a memcpy, since the length is known at compile time.
|
| Okay, here's a summary of all of the options we have:
| 1) Change this particular strcpy to a memcpy
| 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
| optimization comes back on with this flag later on, it would be a
| compiler bug, yes?)
| 3) Modify the RELOC() macro in such a way that GCC won't attempt to
| optimize anything which touches it [1]. (Franz, again by Jakub)
| 4) Introduce a function to do the calculations [2]. (Corey Minyard)
| 5) 'Properly' set things up so that we don't need the RELOC() macros
| (-mrelocatable or so?), and forget this mess altogether.
Dudes, maybe I'm missing something here, but why don't you just mark the
source data as volatile? Then it _can't_ assume it knows the length of
the strcpy because it can't assume it knows the content:
If PTRRELOC cast the pointer type to
volatile void *
or something else suitably generic but volatile then this discussion might
not be happening. It would at least move the optimisation into "definite
compiler bug" if it still happens.
--
Cameron Simpson, DoD#743 [email protected] http://www.zip.com.au/~cs/
I think... Therefore I ride. I ride... Therefore I am.
- Mark Pope <[email protected]>
On Thu, Jan 03, 2002 at 03:08:43PM +1100, Cameron Simpson wrote:
> On Wed, Jan 02, 2002 at 12:09:10PM -0700, Tom Rini <[email protected]> wrote:
> | On Wed, Jan 02, 2002 at 01:03:25AM +0200, Momchil Velikov wrote:
> | > The GCC tries to replace the strcpy from a constant string source with
> | > a memcpy, since the length is known at compile time.
> |
> | Okay, here's a summary of all of the options we have:
> | 1) Change this particular strcpy to a memcpy
> | 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
> | optimization comes back on with this flag later on, it would be a
> | compiler bug, yes?)
> | 3) Modify the RELOC() macro in such a way that GCC won't attempt to
> | optimize anything which touches it [1]. (Franz, again by Jakub)
> | 4) Introduce a function to do the calculations [2]. (Corey Minyard)
> | 5) 'Properly' set things up so that we don't need the RELOC() macros
> | (-mrelocatable or so?), and forget this mess altogether.
>
> Dudes, maybe I'm missing something here, but why don't you just mark the
> source data as volatile? Then it _can't_ assume it knows the length of
> the strcpy because it can't assume it knows the content:
That's what 3 does.
> If PTRRELOC cast the pointer type to
>
> volatile void *
>
> or something else suitably generic but volatile then this discussion might
> not be happening. It would at least move the optimisation into "definite
> compiler bug" if it still happens.
See the rest of the thread for why various people don't like that this
doesn't work 'as-is' anymore, and why other people don't like what it's
doing period (in C anyhow).
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
> Now the claim is that RELOC is bad because it adds an offset to a
> pointer, and the offset is usually around 0xc0000000, and thus we are
> "violating" the C standard. Thus we are being told that someday this
> will break and cause a lot of grief. My contention is that this will
> only break if a pointer becomes something other than a simple address
> and pointer arithmetic becomes something other than simple 2's
> complement addition, subtraction, etc. If that happens then C will
> have become useless for implementing a kernel, IMHO.
Well, pointer arithmetic on the HPPA isn't simple 2's complement due to
the way implicit space register selection works.
For example, the unscaled indexed addressing mode
ldbx srcreg1(srcreg2), dstreg
is not equivalent to
ldbx srcreg2(srcreg1), dstreg
i.e. x + y != y + x when computing an address in a memory operation.
For the same reason computing an address outside of an object into a
register, then using a displacement in a memory operation to generate an
address that is within the bounds of the object may not work. That is, you
can't do something like this and expect it to work:
int a[10];
int *z = a - 10;

int foo(void)
{
	return z[10];
}
And before anyone says the PA is unbelievably strange and nobody else would
make the same mistakes -- the mn10300 (for which a linux port does exist)
has the same failings.
jeff
On Thu, Jan 03, 2002 at 01:33:27PM +1100, Paul Mackerras wrote:
> I look forward to seeing your patch to remove all uses of
> virt_to_phys, phys_to_virt, __pa, __va, etc. from arch/alpha... :)
I don't dereference them either, do I?
r~
On Thu, 3 Jan 2002, Paul Mackerras wrote:
> Richard Henderson writes:
> > Ignore strcpy. Yes, that's what's visibly causing a failure here,
> > but the bug is in the funny pointer arithmetic. Leave that in
> > there and the compiler _will_ bite your ass sooner or later.
>
> I look forward to seeing your patch to remove all uses of
> virt_to_phys, phys_to_virt, __pa, __va, etc. from arch/alpha... :)
Isn't this why we use `unsigned long' to represent physical addresses, and
`void *' to represent kernel virtual addresses? Not only helps it against a
user dereferencing a physical address `pointer', but also against gcc trying to
be (too) smart.
Of course this also implies we have to change the pointer argument in readb()
and friends, since it's not a real pointer but a magic cookie. But we already
had that discussion last year...
[ and the conclusion was: keep the pointer, so we can do readl(&base->field) ]
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Richard Henderson writes:
> On Thu, Jan 03, 2002 at 01:33:27PM +1100, Paul Mackerras wrote:
> > I look forward to seeing your patch to remove all uses of
> > virt_to_phys, phys_to_virt, __pa, __va, etc. from arch/alpha... :)
>
> I don't dereference them either, do I?
Does that matter? To quote your earlier message:
> No. You still have the problem of using pointer arithmetic past
> one past the end of the object.
>
> C99 6.5.6/8:
>
> If both the pointer operand and the result point to elements of the
> same array object, or one past the last element of the array object,
> the evaluation shall not produce an overflow; otherwise, the behavior
> is undefined.
I don't have a copy of the C standard handy, but that sounds to me
like the result of (unsigned long)(&x) - KERNELBASE is undefined.
Also, what does the standard say about casting pointers to integral
types? IIRC you aren't entitled to assume that a pointer will fit in
any integral type, or anything about the bit patterns that you get.
My point is that the kernel makes some assumptions about how pointers
are represented, which are eminently reasonable assumptions, but which
"portable" C programs are not entitled to make according to the C
standard.
Paul.
> -----Original Message-----
> From: Tom Rini [mailto:[email protected]]
> Sent: Thursday, January 03, 2002 12:13 AM
> To: jtv
> Cc: Richard Henderson; Momchil Velikov; [email protected];
> [email protected]; [email protected]; Franz Sirl; Paul
> Mackerras; Benjamin Herrenschmidt; Corey Minyard
> Subject: Re: [PATCH] C undefined behavior fix
>
>
> On Wed, Jan 02, 2002 at 11:23:21PM +0100, jtv wrote:
> > On Wed, Jan 02, 2002 at 03:05:48PM -0700, Tom Rini wrote:
> > >
> > > Well, the problem is that we aren't running where the compiler thinks
> > > we are yet. So what would the right fix be for this?
> >
> > Obviously -ffreestanding isn't, because this problem could crop up
> > pretty much anywhere. The involvement of standard library functions is
> > almost coincidence and so -ffreestanding would only fix the current
> > symptom.
>
> After thinking about this a bit more, why wouldn't this be the fix? The
> problem is that gcc is assuming that this is a 'normal' program (or in
> this case, part of a program) and that the standard rules apply, so it
> optimizes the strcpy into a memcpy. But in this small bit of the
> kernel, it's not. It's not even using the 'standard library functions',
> but what the kernel provides. This problem can only crop up in the time
> before we finish moving ourselves around.
OH... are you saying that the Linux kernel is not written in ANSI C? AFAICR
the standard *requires* that the standard library functions have their
standard meaning. So if the Linux kernel expects them to have some special
meaning it is *non-conforming* and then you need a *special* compiler,
understanding this.
OTOH if you only say that the Linux kernel is built with versions of
strcpy/memcpy that have the exact meaning required by the standard but are
not found in the compiler's library, then the program is still OK, but the
compiler is *allowed* to use the optimisations it wants to, regarding the
semantics of strcpy/memcpy. And anyway, in this case the "-ffreestanding"
option is *required*, as this is what warns the compiler to use a
strcpy/memcpy/... implementation that *the program* provides instead of the
one that the compiler chooses to provide (either by inlining or linking
against the library).
But the whole point here is that the RELOC macro has undefined behaviour;
even if -ffreestanding solves the *current* problem, it's unsafe to build
*any* program on undefined behaviour (and particularly so for the kernel
itself...).
Bernard
--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel: +33 (0) 1 47 68 80 80
Fax: +33 (0) 1 47 88 97 85
e-mail: [email protected]
[email protected]
--------------------------------------------
> -----Original Message-----
> From: Paul Mackerras [mailto:[email protected]]
> Sent: Thursday, January 03, 2002 4:12 AM
> To: Joe Buck
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH] C undefined behavior fix
>
>
> Joe Buck writes:
>
> > There is already such a project under development: see
> >
> > http://gcc.gnu.org/projects/bp/main.html
> >
> > This is a modification to gcc that implements pointers as triples.
> > While there is a performance penalty for doing this, it can completely
> > eliminate the problem of exploitable buffer overflows. However,
> > programs that violate the rules of ISO C by generating out-of-range
> > pointers will fail.
>
> What will it do if I cast a pointer to unsigned long? Or if I cast an
> unsigned long to a pointer? The kernel does both of these things, and
> in a lot of places.
>
> Part of my beef with what gcc-3 is doing is that I take a pointer,
> cast it to unsigned long, do something to it, cast it back to a
> pointer, and gcc _still_ thinks it knows what I am doing. It
> doesn't.
So say it to him: the problem is that the C language gives a quite precise
meaning to the RELOC macro code. When writing RELOC("some string") you say
to the C compiler: look, I want a pointer *in the string* at offset
0xC0000000. GCC does exactly that. Afterwards it uses this for optimizing
strcpy, but you fooled it before and it may be fooled in other cases (e.g.
if you do strlen(RELOC("xxx")) or RELOC("0123456789ABCDEF")[x] in a
"print_hexadecimal" function...).
There are ways in GCC to make sure the compiler forgets what it knows about a
value and uses it as you provide it; one of the cleaner ones is using assembly
language. Note that someone proposed the use of a no-operation asm
instruction that just pretends to change the value returned. If the asm
instruction is furthermore declared "volatile" (I don't remember if it was)
then GCC guarantees that the instruction will not be optimized away in any
case, and so, as the kernel *knows* that RELOC("some string") effectively
yields the proper bit-level representation of a valid pointer to the
string the programmer wants, GCC will trust the code and all will be OK.
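A minimal sketch of the no-op asm being described (the macro name and the
statement-expression form are mine, not the kernel's actual code):

/* Launder a pointer through a volatile no-op asm: GCC can no longer see
 * that the value came from a string literal plus an out-of-range offset,
 * so it stops drawing conclusions from it. */
#define HIDE_FROM_GCC(p)				\
({	void *__ptr = (void *)(p);			\
	__asm__ __volatile__("" : "+r" (__ptr));	\
	__ptr;						\
})

Used as strcpy(namep, HIDE_FROM_GCC("linux,phandle" + offset)), the compiler
only sees an opaque pointer coming out of the asm, which is the effect
described above.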
And all this is just fixing ONE line in the kernel. Is that too much of a
job to do without discussing it at will in several dozen messages?
Just my .02 euros
Bernard
--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel: +33 (0) 1 47 68 80 80
Fax: +33 (0) 1 47 88 97 85
e-mail: [email protected]
[email protected]
--------------------------------------------
Bernard Dautrevaux writes:
> OH... are you saying that the Linux kernel is not written in ANSI C? AFAICR
No, in fact the kernel isn't written in ANSI C. :)
If nothing else, the fact that it uses a lot of gcc-specific
extensions rules that out. And it assumes that you can freely cast
pointers to unsigned longs and back again. I'm sure others can add to
this list.
> the standard *requires* that the standard library functions have their
> standard meaning. So if the Linux kernel expects them to have some special
> meaning it is *non-conforming* and then you need a *special* compiler,
> understanding this.
Sure the kernel is non-conforming. "Conforming" means that the
program will run the same on any architecture, and the Linux kernel
surely won't do that - it won't get very far on a PDP-10, I can assure
you. :)
Paul.
Paul Mackerras <[email protected]> writes:
> [Linux] won't get very far on a PDP-10, I can assure you. :)
Any particular reason why?
--
Lars Brinkhoff http://lars.nocrew.org/ Linux, GCC, PDP-10
Brinkhoff Consulting http://www.brinkhoff.se/ programming
<<And before anyone says the PA is unbelievably strange and nobody else would
make the same mistakes -- the mn10300 (for which a linux port does exist)
has the same failings.
>>
Not clear these are failings, at least with respect to C, since the
addressing models are completely consistent with the C language, just
not consistent with some abuses of it.
[email protected] writes:
> the mn10300 (for which a linux port does exist)
I tried to locate info about this port, but I couldn't find any.
Do you have any pointers?
--
Lars Brinkhoff http://lars.nocrew.org/ Linux, GCC, PDP-10
Brinkhoff Consulting http://www.brinkhoff.se/ programming
<<No, in fact the kernel isn't written in ANSI C. :)
If nothing else, the fact that it uses a lot of gcc-specific
extensions rules that out. And it assumes that you can freely cast
pointers to unsigned longs and back again. I'm sure others can add to
this list.
>>
Most certainly this list should exist in precisely defined form. It is not
unreasonable to have a specific list of features that
a) significant programs rely on, and are allowed to rely on
b) gcc promises to implement as specified, regardless of the standard
What is not reasonable is to have various people informally guess at things
that "obviously" can be expected to work in any "reasonable" C implementation.
It is this kind of informality that is asking for trouble.
> > There is already such a project under development: see
> >
> > http://gcc.gnu.org/projects/bp/main.html
> >
> > This is a modification to gcc that implements pointers as triples.
> > While there is a performance penalty for doing this, it can completely
> > eliminate the problem of exploitable buffer overflows. However, programs
> > that violate the rules of ISO C by generating out-of-range pointers will
> > fail.
>
> What will it do if I cast a pointer to unsigned long? Or if I cast an
> unsigned long to a pointer? The kernel does both of these things, and
> in a lot of places.
Perhaps you need to reread a book like K&R, second edition, to find out
what the C language promises you and what it doesn't.
The bounded-pointer approach represents a C pointer as three machine
pointers. When you cast to unsigned long, it converts the "value"
pointer. When you cast an unsigned long to a pointer, you lose the range
information, so the pointer's extent is anywhere in memory. However,
if the compiler can determine by dataflow analysis where this value
came from it is entitled to put the original range information back.
> Part of my beef with what gcc-3 is doing is that I take a pointer,
> cast it to unsigned long, do something to it, cast it back to a
> pointer, and gcc _still_ thinks it knows what I am doing. It
> doesn't.
The C language standard defines quite precisely what the compiler is
allowed to assume and what it is not allowed to assume. If you don't
like these rules, you will dislike *all* modern C compilers, especially
the expensive proprietary C compilers hardware vendors often use to
get good SPEC results.
David Woodhouse wrote:
>
> What part of 'undefined behaviour' is so difficult for people to understand?
The behaviour is undefined by the C standard. But the mentioned
pointer arithmetic is defined in the environment where it has been
used. GCC tries to optimize undefined C-standard behaviour. And
IMHO that's the point. It may optimize defined behaviour and should
not touch things undefined by the standard.
Ciao, ET.
PS: Hey, we are talking about C, the de luxe assembler! *g*
>>>>> "Edgar" == Edgar Toernig <[email protected]> writes:
Edgar> David Woodhouse wrote:
>>
>> What part of 'undefined behaviour' is so difficult for people to understand?
Edgar> The behaviour is undefined by the C standard. But the mentioned
Edgar> pointer arithmetic is defined in the environment where it has been
Edgar> used. GCC tries to optimize undefined C-standard behaviour. And
Edgar> IMHO that's the point. It may optimize defined behaviour and should
Edgar> not touch things undefined by the standard.
How do you imagine that "not touch"? Like "#if 0 ... #endif"?
Edgar> PS: Hey, we are talking about C, the de luxe assembler! *g*
Standard is a standard. Broken code is broken code. Sometimes you can
get away with it, sometimes you can't and when you can't you fix the
code. Or define your own standard and make a compiler implementing
it.
On Thu, Jan 03, 2002 at 04:46:11PM +0100, Edgar Toernig wrote:
>
> The behaviour is undefined by the C standard. But the mentioned
> pointer arithmetic is defined in the environment where it has been
> used. GCC tries to optimize undefined C-standard behaviour. And
> IMHO that's the point. It may optimize defined behaviour and should
> not touch things undefined by the standard.
Ah, if only things were that easy! The whole reason the rules are as they
are in C is that it is not generally decidable whether or not the code
falls within those rules. Removing the assumptions we're talking about
from the rules would make the ability to optimize code an exception instead
of the rule.
Jeroen
On Wed, Jan 02, 2002 at 06:17:09PM -0700, Tom Rini wrote:
>
> I'm partial to Paul's' suggestion of redoing RELOC and friends in asm.
So am I--with the portable fallback of turning RELOC into an extern
function. As an added benefit, that would probably eliminate a lot
of individual offset variables and hide the ugliness in a single place.
Just so it's documented, here's why I think the changes I coined earlier
won't work:
(1) Turn offset into an extern variable.
This doesn't work in cases like "strlen("string" + offset)", which the
compiler may translate to the equivalent of "strlen("string") - offset",
ie. "6 - offset" which of course still breaks.
(2) Cast to const volatile void *.
This *might* work, because gcc ought to suppress some optimizations upon
finding the 'volatile,' but the volatile would have to be cast out again
to make functions accept it as a nonvolatile parameter. And if it's cast
out, gcc might decide to ignore it as well--or even perform its regular
constant propagation regardless. Not casting the volatile away again
OTOH might make the code a lot slower.
(3) Use -fPIC.
This isn't supported by all architectures, and requires special linker
support which may not be available yet at the time it's needed.
IIRC SAS/C used to have an option to allocate strings in the same
section of the object file as the code, and use PC-relative addressing
to get at them. Anyone know if gcc has anything like that? That should
also fix the problem by deferring computation of the string's address
to run time. Frankly though I don't even know if all
architectures have PC-relative addressing...
> Well, RELOC _always_ is doing funny arithmetic, and there's not much we
> can do about it. At this point in the bootup we aren't running where
> things think we are yet, and have to adjust things as such.
The arithmetic can't be helped--agreed. We just have to break some
assumptions that gcc is allowed to make based on the information it has
and the C memory model.
Let's just make sure we break them beyond repair. :-)
Jeroen
On Jan 2, 2002, David Woodhouse <[email protected]> wrote:
> [email protected] said:
>> (and for CONSTANT < 5 it of course generated correct code to fill dst
>> with string contents; and yes, I know that code will sigsegv on run
>> because of dst is not initialized - but it should die at runtime, not
>> at compile time).
> An ICE, while it's not quite what was expected and it'll probably get fixed,
> is nonetheless a perfectly valid implementation of 'undefined behaviour'.
Not really. Think of code whose execution is guarded by a test that
guarantees it won't invoke undefined behavior, but whose test is in a
separate translation unit. Or even of code that is never executed.
The compiler has no business deciding whether the code may potentially
invoke undefined behavior, or whether it's going to be executed at
all; it has to compile it to something that does whatever it wishes
*if* it's executed. So, it should indeed die at runtime, not at
compile time.
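A hypothetical illustration of that point (the file layout and the guard
function are invented for the example):

/* guard.c: the condition lives in a separate translation unit */
int offset_is_valid(long off) { return off >= 0 && off <= 6; }

/* user.c: the compiler cannot tell whether the strcpy is ever reached,
 * so it has to compile it rather than reject the program outright */
#include <string.h>
extern int offset_is_valid(long off);

void copy_tail(char *dst, long off)
{
	if (offset_is_valid(off))
		strcpy(dst, "abcdef" + off);
}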
--
Alexandre Oliva Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist *Please* write to mailing lists, not to me
On Thu, Jan 03, 2002 at 12:35:30AM +0000, David Woodhouse wrote:
>
> (cc list trimmed)
>
> [email protected] said:
> > If you want a strcpy that isnt strcpy then change its name or use a
> > different language 8)
>
> The former is not necessarily sufficient in this case. You've still done the
> broken pointer arithmetic, so even if the function isn't called strcpy() the
> compiler is _still_ entitled to replace it with a call to memcpy() or even
> machine_restart() before sleeping with your mother and starting WW III.
No, that's not true. We're passing a pointer as an argument. It is a
valid pointer - dereferencing it may not be valid, but the pointer is
perfectly legal! There is nothing wrong with this case. The problem
lies in calling a function whose name is special to GCC and to the C
language, which GCC can then transform.
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
> Also, what does the standard say about casting pointers to integral
> types? IIRC you aren't entitled to assume that a pointer will fit in
> any integral type, or anything about the bit patterns that you get.
Yes, but you're on much safer ground here, since the conversion is
implementation defined. Practical considerations almost guarantee the
implementation choice will be "int == address" for targets where this
makes sense (e.g. one that has a simple, flat, contiguous address space).
Also, in the latest version of the C standard, you do have an integer type
that can contain a pointer.
Allowing people to treat pointers as if they were "just" integers
prevents a whole slew of interesting and useful compiler
transformations, which is why the standard frowns upon such behavior.
Buffer overflow checks are an example. It's possible to build bounded
pointer implementations for strict ANSI C, but impossible for the "all
pointers are just integers" variant.
Do the compiler a favor. If you're playing with pointers as if they
are integers, make them integers. Types are your friend.
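A rough illustration of that advice (uintptr_t is the C99 integer type
alluded to above; KERNELBASE here is just an assumed stand-in constant):

#include <stdint.h>

#define KERNELBASE 0xc0000000UL  /* illustrative value only */

/* Do the address arithmetic on an integer type, and convert back to a
 * pointer only once the final address has been computed. */
static void *phys_to_virt_sketch(uintptr_t phys)
{
	return (void *)(phys + KERNELBASE);
}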
-Tim
Lars Brinkhoff writes:
> > [Linux] won't get very far on a PDP-10, I can assure you. :)
>
> Any particular reason why?
For a start, there isn't an arch/pdp10 directory. The kernel's
approach to portability is to have code tailored to each architecture,
and we don't have such code for a pdp10.
As to the question whether such code could be developed, it would
depend a lot on how gcc did things. I would expect an int * to be
just an 18-bit address but a char * to be a byte pointer, i.e. a
36-bit word with the byte offset and size in the top 18 bits and the
word address in the lower 18 bits. This would mean that casting
char * pointers to unsigned long and vice-versa wouldn't give the kind
of results we expect. The kernel assumes in a lot of places that
memory is byte-addressable and that casting a pointer to an unsigned
long gives you the byte address of the first byte of the object that
the pointer points to, and that it can do arithmetic on those byte
addresses.
Another difficulty would be in relation to the MMU. IIRC, the KA10
processor had a simple offset/limit memory management scheme, which
would not be sufficient for linux, which requires support for paged
virtual memory. I have forgotten what the KI10 and KL10 processors
did; I recall it was more complex but I don't think it amounted to
paged virtual memory.
Paul.
Hi!
> > (cc list trimmed)
> >
> > [email protected] said:
> > > If you want a strcpy that isnt strcpy then change its name or use a
> > > different language 8)
> >
> > The former is not necessarily sufficient in this case. You've still done the
> > broken pointer arithmetic, so even if the function isn't called strcpy() the
> > compiler is _still_ entitled to replace it with a call to memcpy() or even
> > machine_restart() before sleeping with your mother and starting WW III.
> >
> > Granted, it probably _won't_ do any of those today, but you should know
> > better than to rely on that.
> >
> > What part of 'undefined behaviour' is so difficult for people to understand?
>
> I think it comes down to an expectation that if the behaviour is
> undefined, anything _could_ happen, but what should happen is that it
> should just be passed along to (in this case) strcpy un-modified.
gcc is allowed not to pass it anywhere. You may not second-guess the
optimizer. If it is not defined, it is not defined.
Imagine
strcpy(a, "xyzzy"+b);
if (b>16)
printf("foo");
. gcc is permitted to kill the printf(), because if b<0 or b>6 the behaviour
is already undefined. So gcc may assume b<=6, and in particular that b>16 can
never hold.
Quoting alan:
################################################################################
# What part of 'undefined behaviour' is so difficult for people to understand? #
################################################################################
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa
Hi!
> > Well, the problem is that we aren't running where the compiler thinks we
> > are yet. So what would the right fix be for this?
>
> Assembly.
Is there a problem with doing what Jakub suggested?
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa
On Thu, Jan 03, 2002 at 11:40:46PM +0100, Pavel Machek wrote:
> Is there problem with doing what Jakub suggested?
No, but I think it better to extract all of the relocation up front
rather than scattering it about the source code.
r~
Paul Mackerras <[email protected]> writes:
> One of the reasons why C is a good language for the kernel is that its
> memory model is a good match to the memory organization used by the
> processors that linux runs on. Thus, for these processors, adding an
> offset to a pointer is in fact simply an arithmetic addition.
But this is not the memory model of C! Adding an offset to a pointer
is an operation involving objects defined by the C language, and not
machine registers. Sometimes, this makes a noticeable difference.
Paul Mackerras <[email protected]> writes:
> Richard Henderson writes:
>
>> On Thu, Jan 03, 2002 at 01:33:27PM +1100, Paul Mackerras wrote:
>> > I look forward to seeing your patch to remove all uses of
>> > virt_to_phys, phys_to_virt, __pa, __va, etc. from arch/alpha... :)
>>
>> I don't dereference them either, do I?
>
> Does that matter?
No, it doesn't. C pointers do not behave like machine addresses;
there are additional constraints.
> Also, what does the standard say about casting pointers to integral
> types? IIRC you aren't entitled to assume that a pointer will fit in
> any integral type, or anything about the bit patterns that you get.
This casting falls into the category "additional language constructs
supported by GCC". The standard contains wording which encourages
implementors to include a meaningful conversion from a pointer type
to an integral type, but there is no requirement that this has the
semantics expected by most programmers.
Tom Rini <[email protected]> writes:
> 1) Change this particular strcpy to a memcpy
That doesn't fix the undefined behavior.
> 2) Add -ffreestanding to the CFLAGS of arch/ppc/kernel/prom.o (If this
> optimization comes back on with this flag later on, it would be a
> compiler bug, yes?)
That doesn't fix the undefined behavior.
> 3) Modify the RELOC() macro in such a way that GCC won't attempt to
> optimize anything which touches it [1]. (Franz, again by Jakub)
That *does* fix the undefined behavior, and it seems that GCC is going
to stay the same in the future, so this is a feasible workaround.
> 4) Introduce a function to do the calculations [2]. (Corey Minyard)
That doesn't fix the undefined behavior.
> 5) 'Properly' set things up so that we don't need the RELOC() macros
> (-mrelocatable or so?), and forget this mess altogether.
This is the clean approach, and thus preferable in the long term. (3)
seems to be the best short-term solution.
[email protected] writes:
> <<No, in fact the kernel isn't written in ANSI C. :)
> If nothing else, the fact that it uses a lot of gcc-specific
> extensions rules that out. And it assumes that you can freely cast
> pointers to unsigned longs and back again. I'm sure others can add to
> this list.
>>>
>
> Most certainly this list should exist in precise defined form.
C99 includes a list of additional guarantees made by many C
implementations (in an informative index). I think we really should
check this list (and the list of implementation-defined behavior) and
document the choices made by GCC. In fact, this documentation is
required by the standard.
> It is this kind of informality that is asking for trouble.
We have seen much trouble in this area before, but I doubt we can
avoid all of them by proper documentation. Quite a few people seem to
write some C code, check that it works, look at the generated machine
code, and if it seems to be correct, the C code is considered to be
correct.
Pavel Machek writes:
> gcc is allowed not to pass it anywhere. You may not second guess
> optimizer. If it is not defined, it is not defined.
>
> Imagine
>
> strcpy(a, "xyzzy"+b);
> if (b>16)
> printf("foo");
>
> . gcc is permitted to kill printf(), because if b<0 or b>16 behaviour is
> undefined. So gcc may assume b<=16.
So... I'm curious. How is the result of virt_to_phys ever anything
other than undefined?
Paul.
Paul Mackerras <[email protected]> writes:
> Lars Brinkhoff writes:
> > > [Linux] won't get very far on a PDP-10, I can assure you. :)
> > Any particular reason why?
> As to the question whether such code could be developed, it would
> depend a lot on how gcc did things. I would expect an int * to be
> just an 18-bit address but a char * to be a byte pointer, i.e. a
> 36-bit word with the byte offset and size in the top 18 bits and the
> word address in the lower 18 bits.
This is true in the classic PDP-10 architecture, but would it
really be worthwhile to run Linux in an 18-bit address space?
In the extended architecture, an int * is a 30-bit address, and the
byte offset and size is encoded in the top 6 bits. This would be a
more suitable target for Linux.
> This would mean that casting char * pointers to unsigned long and
> vice-versa wouldn't give the kind of results we expect. The kernel
> assumes in a lot of places that memory is byte-addressable and that
> casting a pointer to an unsigned long gives you the byte address of
> the first byte of the object that the pointer points to, and that it
> can do arithmetic on those byte addresses.
When a pointer-to-integer conversion is made (or vice versa), GCC
could generate code to convert between a PDP-10 hardware pointer and a
linear byte address.
> Another difficulty would be in relation to the MMU. IIRC, the KA10
> processor had a simple offset/limit memory management scheme, which
> would not be sufficient for linux, which requires support for paged
> virtual memory.
Right.
> I have forgotten what the KI10 and KL10 processors did; I recall it
> was more complex but I don't think it amounted to paged virtual
> memory.
The KI10 has limited support for paging, but I don't remember the
details. KL10 most definitely supports full paged virtual memory.
--
Lars Brinkhoff http://lars.nocrew.org/ Linux, GCC, PDP-10
Brinkhoff Consulting http://www.brinkhoff.se/ programming
On Fri, 4 Jan 2002, Florian Weimer wrote:
> C99 includes a list of additional guarantees made by many C
> implementations (in an informative index). I think we really should
> check this list (and the list of implementation-defined behavior) and
> document the choices made by GCC. In fact, this documentation is
> required by the standard.
The list of implementation-defined behavior is already in the manual (in
extend.texi). However, the actual documentation isn't yet there, only the
headings - except for the preprocessor, where it is covered in cpp.texi,
and the conversion between pointers and integers:
@item
@cite{The result of converting a pointer to an integer or
vice versa (6.3.2.3).}
A cast from pointer to integer discards most-significant bits if the
pointer representation is larger than the integer type,
sign-extends@footnote{Future versions of GCC may zero-extend, or use
a target-defined @code{ptr_extend} pattern. Do not rely on sign extension.}
if the pointer representation is smaller than the integer type, otherwise
the bits are unchanged.
@c ??? We've always claimed that pointers were unsigned entities.
@c Shouldn't we therefore be doing zero-extension? If so, the bug
@c is in convert_to_integer, where we call type_for_size and request
@c a signed integral type. On the other hand, it might be most useful
@c for the target if we extend according to POINTERS_EXTEND_UNSIGNED.
A cast from integer to pointer discards most-significant bits if the
pointer representation is smaller than the integer type, extends according
to the signedness of the integer type if the pointer representation
is larger than the integer type, otherwise the bits are unchanged.
When casting from pointer to integer and back again, the resulting
pointer must reference the same object as the original pointer, otherwise
the behavior is undefined. That is, one may not use integer arithmetic to
avoid the undefined behavior of pointer arithmetic as proscribed in 6.5.6/8.
--
Joseph S. Myers
[email protected]
<<We have seen much trouble in this area before, but I doubt we can
avoid all of them by proper documentation. Quite a few people seem to
write some C code, check that it works, look at the generated machine
code, and if it seems to be correct, the C code is considered to be
correct.
>>
We can't avoid people writing wrong code, but we can avoid debate as to
whether the code is right or wrong :-)
<<But this is not the memory model of C! Adding an offset to a pointer
is an operation involving objects defined by the C language, and not
machine registers. Sometimes, this makes a noticeable difference.
>>
Indeed Alex Stepanov notes that Ada, unlike C, *does* have general pointer
arithmetic, assuming a linear address space, and finds the lack of this in
C a problem :-) :-)
Florian Weimer writes:
> Paul Mackerras <[email protected]> writes:
>
> > One of the reasons why C is a good language for the kernel is that its
> > memory model is a good match to the memory organization used by the
> > processors that linux runs on. Thus, for these processors, adding an
> > offset to a pointer is in fact simply an arithmetic addition.
>
> But this is not the memory model of C! Adding an offset to a pointer
> is an operation involving objects defined by the C language, and not
> machine registers. Sometimes, this makes a noticeable difference.
Sorry, you are correct. I should have written "One of the reasons why
C used to be a good language for writing operating system kernels ..."
Paul.
<<Sorry, you are correct. I should have written "One of the reasons why
C used to be a good language for writing operating system kernels ..."
>>
C is perfectly well suited for writing operating system kernels, but you
absolutely HAVE to know what you are doing, and that includes knowing the
C standard accurately, and clearly identifying any implementation dependent
behavior that you are counting on.
The "used to be" is bogus. The (base + offset) memory model of C has been
there since the earliest days of the definition of C. The only thing that
"used to be" the case is that people ignored these rules freely and since
compilers were fairly stupid, they got away with this rash behavior.
Lars Brinkhoff writes:
> This is true in the classic PDP-10 architecture, but would it
> really be worthwhile to run Linux in an 18-bit address space?.
>
> In the extended architecture, an int * is a 30-bit address, and the
> byte offset and size is encoded in the top 6 bits. This would be a
> more suitable target for Linux.
Wow, I didn't know that there was an extended PDP-10 architecture.
It's 25 years since I did anything on a PDP-10. :)
> When a pointer-to-integer conversion is made (or vice versa), GCC
> could generate code to convert between a PDP-10 hardware pointer and a
> linear byte address.
Would you use 9-bit bytes or 8-bit bytes? If you use 8-bit bytes then
there will be some bits in a word that aren't part of any byte. If
you use 9-bit bytes, I'm sure that somewhere in the kernel there will
be some code that will break because it is assuming 8-bit bytes.
> The KI10 has limited support for paging, but I don't remember the
> details. KL10 most definitely supports full paged virtual memory.
Somehow I'm getting the feeling that your next message is going to say
"actually, we have been running linux on the KL10 for the past 3
years". :)
Regards,
Paul.
On Fri, 4 Jan 2002, Joseph S. Myers wrote:
>
> On Fri, 4 Jan 2002, Florian Weimer wrote:
>
> > C99 includes a list of additional guarantees made by many C
> > implementations (in an informative index). I think we really should
> > check this list (and the list of implementation-defined behavior) and
> > document the choices made by GCC. In fact, this documentation is
> > required by the standard.
>
> The list of implementation-defined behavior is already in the manual (in
> extend.texi). However, the actual documentation isn't yet there, only the
> headings - except for the preprocessor, where it is covered in cpp.texi,
> and the conversion between pointers and integers:
>
> @item
> @cite{The result of converting a pointer to an integer or
> vice versa (6.3.2.3).}
>
> A cast from pointer to integer discards most-significant bits if the
> pointer representation is larger than the integer type,
> sign-extends@footnote{Future versions of GCC may zero-extend, or use
> a target-defined @code{ptr_extend} pattern. Do not rely on sign extension.}
> if the pointer representation is smaller than the integer type, otherwise
> the bits are unchanged.
> @c ??? We've always claimed that pointers were unsigned entities.
> @c Shouldn't we therefore be doing zero-extension? If so, the bug
> @c is in convert_to_integer, where we call type_for_size and request
> @c a signed integral type. On the other hand, it might be most useful
> @c for the target if we extend according to POINTERS_EXTEND_UNSIGNED.
>
> A cast from integer to pointer discards most-significant bits if the
> pointer representation is smaller than the integer type, extends according
> to the signedness of the integer type if the pointer representation
> is larger than the integer type, otherwise the bits are unchanged.
>
> When casting from pointer to integer and back again, the resulting
> pointer must reference the same object as the original pointer, otherwise
> the behavior is undefined. That is, one may not use integer arithmetic to
> avoid the undefined behavior of pointer arithmetic as proscribed in 6.5.6/8.
>
Was this last bit added to the standard after ANSI/ISO 9899-1990? I'm
looking through my copy and I can't find it. All I can find is that
in 6.3.6 Additive Operators
When an expression that has integral type is added to or subtracted from a
pointer [...] Unless both the pointer operand and the result point to
elements of the same array object [...] the behavior is undefined...
but
in 6.3.4 Cast operators
A pointer may be converted to an integral type. The size of the integer
required and the result are implementation-defined. [...]
An arbitrary integer may be converted to a pointer.
^^^^^^^^^
The result is implementation defined.
I interpret this to mean that one MAY use integer arithmetic to
move a pointer outside the bounds of an array. Specifically, as soon
as I've cast the pointer to an integer, the compiler has an obligation to
forget any assumptions it makes about that pointer. This is what casts
from pointer to integer are for! When I say (int)p I'm saying that I
understand the address structure of the machine and I want to manipulate
the address directly.
If the standard has changed so this is no longer possible, there
NEEDS to be some other way in the new standard to express the same
concept, or large application domains where C is currently in use will
stop working.
-JKL
On Sat, 5 Jan 2002 [email protected] wrote:
> Was this last bit added to the standard after ANSI/ISO 9899-1990? I'm
> looking through my copy and I can't find it. All I can find is that
I was quoting from the GCC manual (providing the documentation
implementations are required to provide of implementation-defined
behaviour); of course the subclause numbers are different in C99 (from
which the subclause numbers in the GCC manual are given) from those in
C90. Perhaps once all the documentation for implementation-defined
behaviour in C99 is present I'll go over what's required to document it
all relative to C90 as well.
> I interpret this to mean that one MAY use integer arithmetic to
> move a pointer outside the bounds of an array. Specifically, as soon
> as I've cast the pointer to an integer, the compiler has an obligation to
> forget any assumptions it makes about that pointer. This is what casts
> from pointer to integer are for! When I say (int)p I'm saying that I
> understand the address structure of the machine and I want to manipulate
> the address directly.
Just because you've created a pointer P, and it compares bitwise equal to
a valid pointer Q you can use to access an object, does not mean that P
can be used to access that object. Look at DR#260, discussing the
provenance of pointer values, which arose from extensive discussion in the
UK C Panel, and if you think it should be resolved otherwise from how we
(UK C Panel) proposed then raise your position on it at a meeting of your
National Body before the next WG14 meeting.
http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/dr_260.htm
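For what it's worth, the provenance issue in miniature (variable names
invented; the point is in the comments, not the return value):

int a, b;

int provenance_demo(void)
{
	int *p = &a + 1;  /* one past the end of a: a valid pointer value  */
	int *q = &b;      /* p and q may well compare equal bit for bit... */
	/* ...but per the DR#260 discussion, p still may not be used to
	 * access b: its provenance is the object a, not b. */
	return p == q;
}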
--
Joseph S. Myers
[email protected]
[email protected] writes:
> An arbitrary integer may be converted to a pointer.
> ^^^^^^^^^
This rule exists so that implementations are not forced to issue a
diagnostic for (char *)1.
> I interpret this to mean that one MAY use integer arithmetic to
> move a pointer outside the bounds of an array. Specifically, as soon
> as I've cast the pointer to an integer, the compiler has an obligation to
> forget any assumptions it makes about that pointer. This is what casts
> from pointer to integer are for! When I say (int)p I'm saying that I
> understand the address structure of the machine and I want to manipulate
> the address directly.
According to the standard, you say that you want to cast p to type
int. You cannot manipulate machine addresses in C because C is
defined as a high-level language, without backdoors to such low-level
concepts as machine addresses.
The fact that quite a few implementations traditionally provide
such backdoors in some cases does not mean that the C language is a
low-level language, or that all implementations (even future ones)
have to provide these backdoors.
> If the satandard has changed so this is no longer possible, there
> NEEDS to be some other way in the new standard to express the same
> concept, or large application domains where C is currently in use will
> stop working.
I don't think there are fundamental and conceptual changes in C99 in
this area. Even with previous C revisions, you should have read the
compiler manual carefully before doing address arithmetic.
On Sat, 5 Jan 2002, Joseph S. Myers wrote:
>
> On Sat, 5 Jan 2002 [email protected] wrote:
>
> > Was this last bit added to the standard after ANSI/ISO 9899-1990? I'm
> > looking through my copy and I can't find it. All I can find is that
>
> I was quoting from the GCC manual (providing the documentation
> implementations are required to provide of implementation-defined
> behaviour); of course the subclause numbers are different in C99 (from
> which the subclause numbers in the GCC manual are given) from those in
> C90. Perhaps once all the documentation for implementation-defined
> behaviour in C99 is present I'll go over what's required to document it
> all relative to C90 as well.
>
> > I interpret this to mean that one MAY use integer arithmetic to
> > move a pointer outside the bounds of an array. Specifically, as soon
> > as I've cast the pointer to an integer, the compiler has an obligation to
> > forget any assumptions it makes about that pointer. This is what casts
> > from pointer to integer are for! When I say (int)p I'm saying that I
> > understand the address structure of the machine and I want to manipulate
> > the address directly.
>
> Just because you've created a pointer P, and it compares bitwise equal to
> a valid pointer Q you can use to access an object, does not mean that P
> can be used to access that object. Look at DR#260, discussing the
> provenance of pointer values, which arose from extensive discussion in the
> UK C Panel, and if you think it should be resolved otherwise from how we
> (UK C Panel) proposed then raise your position on it at a meeting of your
> National Body before the next WG14 meeting.
>
> http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/dr_260.htm
>
I don't particularly like the interpretation stated in DR260, but that
doesn't matter. It still doesn't give the compiler permission to make
something "undefined behavior" that the standard says is "implementation
defined". The standard says an arbitrary integer (arbitrary means there
are no restrictions on how it was produced) can be converted to a pointer
and that the implementation must define what the results of this are.
All DR260 is saying is that the compiler can do certain sorts of deep
analysis on the code to decide whether my pointer is valid. It doesn't
apply to the case of casting to an integer, munging, then casting back
because the implementation is supposed to provide a rule for when
such pointers are valid.
Of course the GCC folks are within their rights to say that the
implementation defined behavior is "all pointers created from integers are
broken" but this is THEIR DECISION not a consequence of the standard, and
not a consequence of DR260. And it would be a very, very bad decision
given that one of the major uses of GCC is to build Linux, and
apparently Linux breaks when this happens.
Or to put it another way: The Linux kernel developers NEED an
implementation defined way to build a pointer from an address. If
arithmetic on byte pointers isn't it (which is perfectly understandable)
and casting a pointer from an integer isn't it, then can someone please
tell us (not just "that will work for now" or "that will probably work")
how we do that?
-JKL
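For concreteness, a minimal sketch of the idiom being asked about, with an
invented I/O address; whether the final dereference is guaranteed by anything
more than the implementation's goodwill is exactly what is in dispute:
unsigned char read_device_status(void)
{
        unsigned long addr = 0xf0000000UL;      /* hypothetical MMIO address */
        volatile unsigned char *reg = (volatile unsigned char *) addr;
        return *reg;    /* byte read from the raw machine address */
}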
On Sat, 5 Jan 2002, Florian Weimer wrote:
> [email protected] writes:
>
> > An arbitrary integer may be converted to a pointer.
> > ^^^^^^^^^
>
> This rule exists so that implementations are not forced to issue a
> diagnostic for (char *)1.
>
No. If the only goal was to avoid a diagnostic, then it could be
undefined. The fact that it is implementation defined means that it's
reasonable to expect it to be useful for something.
> > I interpret this to mean that one MAY use integer arithmatic to
> > do move a pointer outside the bounds of an array. Specifically, as soon
> > as I've cast the pointer to an integer, the compiler has an obligation to
> > forget any assumptions it makes about that pointer. This is what casts
> > from pointer to integer are for! when i say (int)p I'm saying that I
> > understand the address structure of the machine and I want to manipulate
> > the address directly.
>
> According to the standard, you say that you want to cast p to type
> int. You cannot manipulate machine addresses in C because C is
> defined as a high-level language, without backdoors to such low-level
> concepts as machine addresses.
>
> The fact that quite a few implementations traditionally provide
> such backdoors in some cases does not mean that the C language is a
> low-level language, or that all implementations (even future ones)
> have to provide these backdoors.
>
Whether C is a "low-level" or "high-level" language (whatever that
means) is not the issue here. The issue here is that, as you say, quite a
few implementations of C provide the ability to access arbitrary memory
through pointers, and the standard provides some convenient places for the
implementation to do it, and many users of GCC would really like to have
this feature.
Forgive me if I haven't been reading the manual carefully enough. Now can
we please get a straight answer to the question: Does GCC provide the
ability to turn an arbitrary address into a pointer, and if so how do you
do it?
-JKL
[email protected] writes:
> Forgive me if I haven't been reading the manual carefully enough. Now can
> we please get a straight answer to the question: Does GCC provide the
> ability to turn an arbitrary address into a pointer, and if so how do you
> do it?
If you want to do more than what is permitted according to the
documentation (quoted in the message by Joseph S. Myers you replied
to), you have to use machine code insertions in order to get
deterministic effects. At least this seems to be the consensus so
far.
On Sat, 5 Jan 2002 [email protected] wrote:
> I don't particularly like the interpretation stated in DR260, but that
> doesn't matter. It still doesn't give the compiler permission to make
> something "undefined behavior" that the standard says is "implementation
> defined". The standard says an arbitrary integer (arbitrary means there
> are no restrictions on how it was produced) can be converted to a pointer
> and that the implementation must define what the results of this are.
> All DR260 is saying is that the compiler can do certain sorts of deep
> analysis on the code to decide whether my pointer is valid. It doesn't
> apply to the case of casting to an integer, munging, then casting back
> because the the implementation is supposed to provide a rule for when
> such pointers are valid.
It says that a pointer is more than a sequence of bits, that the
provenance of a pointer determines whether it can be used to access an
object to which it points. (This notion is already explicit in the
standard for "restrict" - which has various confusing issues when you go
beyond its intended purpose of making code that does the sort of things done
in Fortran 77 run as fast as in Fortran 77; see
<URL:http://www.cbau.freeserve.co.uk/RestrictPointers.html>. Related are
the exact meaning of "effective type" (see also the question about
type-based aliasing rules in DR#236), and exactly how large the object
that can be accessed through a pointer is (we know that in accesses to a
multidimensional array, each index must be within its bounds, but there
are other problems). See Nick Maclaren's "What is an object?" discussion
- WG14 reflector sequence 9350, but I don't think it's been more widely
published.)
In the case of GCC, and casting integers to pointers, a rule has been
provided: the value of the pointer is determined by the bits, and the
provenance, determining what can be accessed, is derived from that of the
integer. This works fine with uses such as masking off a few bits to get
a more aligned pointer, since there we are still accessing the original
object. (I think the bounded pointers work also handles certain
cast-adjust-cast-back cases specially, to preserve the bounds when such
idioms are used rather than creating an unbounded pointer.) Simply
documenting the bits of the pointer, and not what it can access, would not
be sufficient documentation of the results of the cast, but the
documentation describes both.
--
Joseph S. Myers
[email protected]
Paul Mackerras <[email protected]> writes:
> Wow, I didn't know that there was an extended PDP-10 architecture.
> It's 25 years since I did anything on a PDP-10. :)
Ok, the extended KL10 wasn't released until around 1978.
> Would you use 9-bit bytes or 8-bit bytes?
The char type is 9 bits by default. There is also support for an
8-bit char type. With four 8-bit chars in a word, the four least
significant bits are unused.
> If you use 9-bit bytes, I'm sure that somewhere in the kernel there
> will be some code that will break because it is assuming 8-bit
> bytes.
> Somehow I'm getting the feeling that your next message is going to
> say "actually, we have been running linux on the KL10 for the past 3
> years". :)
No, I'm not that sinister. If there was a Linux port, I would have
told you already. Anyway, we need to finish the GCC port first.
--
Lars Brinkhoff http://lars.nocrew.org/ Linux, GCC, PDP-10
Brinkhoff Consulting http://www.brinkhoff.se/ programming
<<Or to put it another way: The Linux kernel developers NEED an
implementation defined way to build a pointer from an address. If
arithmatic on byte pointers isn't it (which is perfectly understandable)
and casting a pointer from an integer isn't it, then can someone please
tell us (not just "that will work for now" or "that will probably work")
how do we do that?
>>
Now that's a very reasonable question. Note that it is not just
kernel developers who need this, anyone doing memory mapped I/O
needs similar capabilities. They are present explicitly in Ada
(with address clauses), but not in C, so you do indeed need to
decide how to provide this capability in a reasonable manner.
Joseph S. Myers writes:
> Just because you've created a pointer P, and it compares bitwise equal to
> a valid pointer Q you can use to access an object, does not mean that P
> can be used to access that object. Look at DR#260, discussing the
I looked at this, and it starts out with an example that includes a
statement free(p); (where p was assigned a value returned from malloc)
and then states that "After the call to free the value of p is
indeterminate."!
This seems absolutely and completely bogus to me. Certainly, after
the free, the value of *p is indeterminate, but the value of p itself
*is* determinate; its value after the free is identical to its value
before the free. Why do they say that the value of p itself is
indeterminate after the free?
The two examples of why a compiler might want to change the value are
also bogus; the compiler can avoid writing the value of p from a
register back to memory only if the value is dead, and it isn't in the
example given. As for the debugging opportunity, if I want p to be
set to NULL or some other pattern for debugging I'll do it explicitly.
In general I think that when a pointer value has been obtained by a
cast to an integer or by passing the address of a pointer to a
function, the compiler should assume that the pointer can point
anywhere. That means reduced opportunities for optimization, but so
be it. Note that all of the examples in DR#260 involve passing &p to
some function.
Paul.
[email protected] writes:
> <<Sorry, you are correct. I should have written "One of the reasons why
> C used to be a good language for writing operating system kernels ..."
> >>
>
> C is perfectly well suited for writing operating system kernels, but you
Actually, having seen some of the subsequent messages in this thread,
for example this gem from Florian Weimer, I stand by my statement. :)
> You cannot manipulate machine addresses in C because C is
> defined as a high-level language, without backdoors to such low-level
> concepts as machine addresses.
We keep getting told that there is nothing in the standard that says
that the compiler has to give us a way to construct a pointer (one
that we can dereference) from an address, and that if we think we have
a way at the moment we shouldn't rely on it because it might change at
any minute, and if it does and our kernel doesn't work any more it's
all our fault.
> absolutely HAVE to know what you are doing, and that includes knowing the
> C standard accurately, and clearly identifying any implementation dependent
> behavior that you are counting on.
There are some C compilers that are useful for implementing a kernel,
that's true. But when the maintainers of such a compiler say things
that imply that they feel they are constrained only by the standard
and not by the needs of their users, it is very discouraging.
> The "used to be" is bogus. The (base + offset) memory model of C has been
> there since the earliest days of the definition of C. The only thing that
> "used to be" the case is that people ignored these rules freely and since
> compilers were fairly stupid, they got away with this rash behavior.
Oh, I was talking about the original C language as it was designed and
implemented by 2 or 3 smart people who were also using it to build a
kernel, a compiler and a lot of other programs. Not that the original
C language didn't have flaws; it did, and the ANSI committee that
developed the standard fixed a lot of them. But I do feel that the
language has drifted away from its roots as a systems programming
language.
Paul.
<<There are some C compilers that are useful for implementing a kernel,
that's true. But when the maintainers of such a compiler say things
that imply that they feel they are constrained only by the standard
and not by the needs of their users, it is very discouraging.
>>
What is important is for these users to *clearly* and at least
semi-formally, state their needs. Saying general things about the need
to be useful is hardly helpful!
You quote Florian:
> You cannot manipulate machine addresses in C because C is
> defined as a high-level language, without backdoors to such low-level
> concepts as machine addresses.
Unfortunately Florian is right. The ability in C to manipulate low-level
concepts such as machine addresses is NOT part of the language, but rather
comes from exploiting aspects that are deliberately left implementation
dependent. This is why it is so important to formally state the requirements
that are being depended on.
I don't think anyone seriously objects to trying to formulate solutions
to what is indeed a very important problem.
But it is hardly helpful for people to take the attitude "we wrote this
kernel, and it worked, and any change to the compiler that stops it from
working is unacceptable".
[email protected] writes:
> What is important is for these users to *clearly* and at least
> semi-formally, state their needs. Saying general things about the need
> to be useful is hardly helpful!
Sure, here we go:
* Given an address as an int (of the appropriate size), I need a way
to construct a pointer, which when dereferenced, will result in the
CPU presenting that address to the MMU.
* I need a way to tell the compiler not to make any assumptions about
what objects that such a pointer might or might not point to, so
that the effect of dereferencing the pointer is simply to access the
memory at the address I gave, and is not considered "undefined"
regardless of how I might have constructed the address.
* Given a pointer, I need a way to determine the address (as an int of
the appropriate size) that the CPU will present to the MMU when I
dereference that pointer.
GCC already does the first and third, but there doesn't seem to be a
clean and reliable way to do the second.
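Restated as code (a sketch; the function names are invented), the first and
third items are plain casts, while the second has no agreed-upon spelling:
void *addr_to_ptr(unsigned long a)      /* requirement 1 */
{
        return (void *) a;
}
unsigned long ptr_to_addr(void *p)      /* requirement 3 */
{
        return (unsigned long) p;
}
/* requirement 2: no clean way to say "assume nothing about this pointer" */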
> I don't think anyone seriously objects to trying to formulate solutions
> to what is indeed a very important problem.
There have been a variety of points of view expressed, and some of
them (not yours) seemed to be saying that it was invalid for a C
program to be trying to manipulate machine addresses at all.
> But it is hardly helpful for people to take the attitude "we wrote this
> kernel, and it worked, and any change to the compiler that stops it from
> working is unacceptable".
No-one has taken that attitude that I have seen. You may be
interested to know that I am now using -mrelocatable instead of the
RELOC macro. That solves most of the problems (and cleans up the code
as well) but there are still some instances where I need to relocate a
pointer: specifically, I generate a pointer when running at the
initial address that I need to use when running at the final address.
So I still need to be able to do the type of address arithmetic that,
according to the caveat in the gcc doco, produces an undefined result,
namely taking the address of an object, adding an offset, and storing
the result in a pointer variable that I will dereference later (after
moving the kernel and its data to its final location).
I find it hardly helpful to be told that doing that is invalid without
anyone offering me a valid way to achieve the effect I want - and
no-one has, other than to say that as long as I dereference the
pointer some time later, in a different procedure, it should be fine.
Which is fine from a pragmatic point of view but it doesn't alter the
validity or invalidity of the operation.
Paul.
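A sketch of the relocation pattern described above; the object, the offset
and the helper names are invented for illustration:
static char phandle_name[16] = "linux,phandle";
/* Compute where the object will live once the image has been moved by
   "offset" bytes.  As far as the C abstract machine is concerned the result
   points outside the original object until the move has actually happened. */
static char *relocate(void *p, unsigned long offset)
{
        return (char *) ((unsigned long) p + offset);
}
void record_name(char **slot, unsigned long reloc_offset)
{
        *slot = relocate(phandle_name, reloc_offset);   /* dereferenced only later */
}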
> From: Paul Mackerras <[email protected]>
> Date: Sun, 6 Jan 2002 16:32:05 +1100 (EST)
> * I need a way to tell the compiler not to make any assumptions about
> what objects that such a pointer might or might not point to, so
> that the effect of dereferencing the pointer is simply to access the
> memory at the address I gave, and is not considered "undefined"
> regardless of how I might have constructed the address.
> GCC already does the first and third, but there doesn't seem to be a
> clean and reliable way to do the second.
I gave such a snippet of code in my previous posting. Please show me
when it doesn't work. I am unaware.
>>>>> "Paul" == Paul Mackerras <[email protected]> writes:
Paul> I find it hardly helpful to be told that doing that is invalid without
Paul> anyone offering me a valid way to achieve the effect I want - and
Paul> no-one has, other than to say that as long as I dereference the
A number of people have suggested using assembly for this; why do you keep
ignoring that and insist instead on changing the compiler, changing the
C standard, switching to another compiler, and similar unproductive
ideas put forward solely for the sake of argument?
Paul> pointer some time later, in a different procedure, it should be fine.
Paul> Which is fine from a pragmatic point of view but it doesn't alter the
Paul> validity or invalidity of the operation.
From a _pragmatic_ point of view you can just keep the code as is,
fixing the _extremely_ rare cases where violations of the standard
interfere with compiler optimizations. It is too much to ask to
penalize _all_ the software by preventing the compiler from doing
optimizations for the convenience of the maintainers of the PPC port
of Linux. As for listening to the users of the compiler (you mentioned
this in another mail), the GCC maintainers (and the C committee, for that
matter) do exactly that. You just have to realize that _we_ are a minority:
important, but nevertheless a minority.
Regards,
-velco
<<* Given a pointer, I need a way to determine the address (as an int of
the appropriate size) that the CPU will present to the MMU when I
dereference that pointer.
>>
This is in general ill-defined: a compiler might generate code that sometimes
does a byte access to a particular byte, and sometimes fetches the entire word
in which the byte resides.
This is often a nasty issue, and is one of the few things in this area that
Ada does not properly address.
If you have a memory mapped byte, you really want a way of saying
"when I read this byte, do a byte read, it will not work to do a word read"
pragma Atomic in Ada (volatile gets close in C, but is not close enough) will
ensure a byte store in practice, but may not ensure byte reads.
<<* Given an address as an int (of the appropriate size), I need a way
to construct a pointer, which when dereferenced, will result in the
CPU presenting that address to the MMU.
* I need a way to tell the compiler not to make any assumptions about
what objects that such a pointer might or might not point to, so
that the effect of dereferencing the pointer is simply to access the
memory at the address I gave, and is not considered "undefined"
regardless of how I might have constructed the address.
>>
So that seems reasonable as a statement of need (i.e. a high level
requirement), so what is needed now is to craft a well defined way
in GNU C, preferably other than ASM inserts, to achieve this very
reasonable goal.
<<Number of people suggested using assembly for this, why you keep
ignoring it and insist instead on changing the compiler, changing the
C standard, switching to another compiler and similar unproductive
ideas put forward solely for the sake of argument ?
>>
Maybe people will jump on me for saying this, but one objection I have to
using assembly is that the assembly language feature on gcc seems
a) awfully complicated, requiring more detailed knowledge of how the compiler
works than most programmers have.
b) certainly more complicated than comparable features in other compilers,
e.g. Borland C.
c) not that well documented
We find in the Ada world (where we have duplicated the C assembly language
feature more or less 100% exactly), that our customers almost always have to
ask us for help in getting asm inserts correct.
The GNU-C feature here is very powerful, but really not very easy to use!
I also find that introducing asm for this purpose is unnecessarily non-portable.
Yes in some cases, we are talking about very target specific code, but in other
cases the code is not so target specific, and it is desirable to stay within
C if possible.
[email protected] writes:
[...]
| I also find that introducing asm for this purpose is unnecessarily
| non-portable. Yes in some cases, we are talking about very target
| specific code, but in other cases the code is not so target
| specific, and it is desirable to stay within C if possible.
Isn't the incriminated construct already outside of C and non-portable?
-- Gaby
CodeSourcery, LLC http://www.codesourcery.com
>>Isn't the incriminated construct already outside of C and non-portable?
The issue is not whether it is outside standard-C here, which of course
it is, but rather whether there is some extension to GNU-C (either an
interpretation of undefined or implementation defined, or an attribute etc)
which would make it inside GNU-C even if outside C, and therefore portable
to GNU-C, which is good enough for this purpose.
[email protected] wrote:
> pragma Atomic in Ada (volatile gets close in C, but is not close enough) will
> ensure a byte store in practice, but may not ensure byte reads.
I see no distinction between read and write in the text of the Ada standard.
Also I think if you declare a byte array with Atomic_Component
and Volatile_Component, and the compiler accepts it for its
target architecture, then the compiler is required to generate
a byte read and store for each occurrence in the source text.
From C.6:
 15 For an atomic object (including an atomic component) all reads and
    updates of the object as a whole are indivisible.
 16 For a volatile object all reads and updates of the object as a whole
    are performed directly to memory.
 20 {external effect (volatile/atomic objects) [partial]} The external
    effect of a program (see 1.1.3) is defined to include each read and
    update of a volatile or atomic object. The implementation shall not
    generate any memory reads or updates of atomic or volatile objects
    other than those specified by the program.
In my example byte array, if I say X := T (I) I don't see how a conformant
compiler accepting the declaration could generate anything other than
one and exactly one byte read. Per 20 it has no right to read T (I+1) or
T (I-1), since they are "other" objects (components, to be pedantic).
Do you agree with this interpretation? Now, I haven't checked what GNAT
does there (what's accepted and what code is generated), which is of
course the interesting part :).
--
Laurent Guerby <[email protected]>
<<I see no distinction between read and write in the text of the Ada standard.
>>
The point is that the implementation of a write has, given your quote from the
RM, pretty much no choice but to do an exactly "correct" write, but for a
read, there is nothing to stop it reading MORE than the minimum; the requirement
of atomicity is still met. Now of course in your array example, you are
exactly right, so you could rig up an array with elements surrounding the
one you really want. A bit heavy, but yes, that's a trick that will work.
[email protected] wrote:
> A bit heavy, but yes, that's a trick that will work.
Well, the whole point of the exercise and discussion is how to
be heavy handed with the compiler so that it can't get away with
ignoring what the programmer asks even with a good language lawyer :).
If I followed correctly, the language lawyer wins in C,
but not in Ada (though the developer can lose if the Ada compiler
just says the declaration is illegal :).
But even in the case of the compiler reading more (and writing anyway),
I think it is the compiler's burden to prove that the extra stuff
"read" is not observable (in the sense of external
effect) at execution. In a memory-mapped I/O architecture
where there is a distinction on external effects between a byte read
and a word read (eg: a crash :), the compiler can't get very far IMHO
(if it accepts the declaration of course).
--
Laurent Guerby <[email protected]>
<<But even in the case of the compiler reading more (and writing anyway),
I think it is the compiler burden to prove that the extra stuff
"read" is not observable (in the sense of external
effect) at execution. In a memory-mapped I/O architecture
where there is a distinction on external effects between a byte read
and a word read (eg: a crash :), the compiler can't get very far IMHO
(if it accepts the declaration of course).
>>
External effects are well defined in Ada, see 1.1.3(8) of the RM. The
case you mention is not explicitly listed here, so is not an external
effect in the Ada sense. The rule is not that anything observable be
unaffected (after all, performance is observable), but that the
six explicit items in section 1.1.3(8) be unaffected. Here they
are for those who do not have the Ada RM at hand (it is not that
Ada is so significant here, as that perhaps we can get some clues
for the desirable behavior in GNU C)
9 Any interaction with an external file (see A.7);
10 The execution of certain code_statements (see 13.8); which code_
statements cause external interactions is implementation defined.
11 Any call on an imported subprogram (see Annex B), including any
parameters passed to it;
12 Any result returned or exception propagated from a main
subprogram (see 10.2) or an exported subprogram (see Annex B) to
an external caller;
13 Any read or update of an atomic or volatile object (see C.6);
14 The values of imported and exported objects (see Annex B) at the
time of any other interaction with the external environment.
> I think it is the compiler burden to prove that the extra stuff
> "read" is not observable (in the sense of external
> effect) at execution. In a memory-mapped I/O architecture
> where there is a distinction on external effects between a byte read
> and a word read (eg: a crash :), the compiler can't get very far IMHO
> (if it accepts the declaration of course).
In the Linux case mmio isn't a problem. The rules for mmio in the portable
code require that you use architecture-dependent macros (readb etc) and that
the mmio space is mapped via ioremap.
Our ioremap/mmio stuff does a whole variety of things on different platforms
including simple memory accesses, adding I/O fences, motherboard specific
function calls, and tlb bypass.
That side of things is all nicely and cleanly handled.
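For readers outside the kernel, a rough sketch of that pattern, assuming the
2.4-era ioremap/readb/writeb interfaces; the physical address and register
offsets are invented and error handling is minimal:
#include <asm/io.h>
static void *mmio_base;
int probe_device(void)
{
        mmio_base = ioremap(0xfe000000, 0x1000);  /* hypothetical bus address and size */
        if (!mmio_base)
                return -1;
        writeb(0x01, mmio_base + 0x10);           /* hypothetical control register */
        return readb(mmio_base + 0x14);           /* hypothetical status register */
}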
[email protected] writes:
| >>Isn't the incriminated construct already outside of C and non-portable?
|
| The issue is not whether it is outside standard-C here, which of course
| it is,
Stripping the context of my previous message certainly makes your
answer non-rhetorical.
-- Gaby
> From: [email protected]
> To: [email protected], [email protected]
> Date: Sun, 6 Jan 2002 08:05:56 -0500 (EST)
> If you have a memory mapped byte, you really want a way of saying
> "when I read this byte, do a byte read, it will not work to do a
> word read"
> (volatile gets close in C, but is not close enough) will ensure a
> byte store in practice, but may not ensure byte reads.
? Do you have an example where this fails? Do you not consider it a
bug? Now, I would place a fair amount of burden on the compiler to get
it right, though this isn't absolute. For example, eieio or
whatever it is called on the powerpc. I think the chip/OS/MMU must
bear some responsibility for meeting its obligations to the compiler,
and if it doesn't, then that chip/OS/MMU fails to provide a reasonable
base on which to provide the compiler. Did you have this case in
mind?
> From: Gabriel Dos Reis <[email protected]>
> Date: 06 Jan 2002 14:22:45 +0100
> Isn't the incriminated construct already outside of C and non-portable?
That has been established and agreed upon. What we are now into is
pragmatics. You now have to argue why we should not provide this
feature in some form to users that ask for it. Love to hear your
ideas.
> From: [email protected]
> To: [email protected], [email protected]
> Date: Sun, 6 Jan 2002 08:43:53 -0500 (EST)
> The point is that the implementation of a write has, given your
> quote from the RM, pretty much no choice but to do an exactly
> "correct" write, but for a read, there is nothing to stop reading
> MORE than the minimum, the requirement of atomicity is still
> met.
Ok, we can agree the wording in the standard sucks. Though, I think
we might be able to agree on the intent and goal of the wording. It
isn't allowed to mandate the byte read/write, as the machine may not
support it, and that would artificially constrain the machine from
being implementable on the platform, and this would be a bad idea.
I think the goal and intent, for Ada and C as well, is to say that the
compiler will generate what is possible from assembly code written by
an expert on the platform, using the best fitting access that is
possible.
Once we understand the standard and the motivation behind it, and what
it is trying to provide our users, and how our users might reasonably
use it, we can mandate for ourselves as a quality of implementation
issue, if you would like to call it that, those reasonable semantics.
Should we fail to provide them, and should users want them, then it is
finally up to the users to more completely describe the language they
want, and refine the language standards to include it. But, in spite
of that, we don't have to go out of our way to do stupid things, nor
should we browbeat our users needlessly with "your code isn't conformant".
It is all too easy to do that; we should resist the temptation.
<<? Do you have an example where this fails? Do you not consider it a
bug? Now, I would place a fair amount of buren on the compiler to get
it right, though, this isn't absolute. For example, eieieio or
>>
I don't see the obligation on the compiler here. For instance, suppose
you are on an architecture where a word read is faster than a byte read.
Let's make it specific, suppose you are on a 386 and the item is 16-bits.
Now it is quicker to read 32-bits, because no prefix is required. Do you
see anything in the C standard (or the Ada standard :-) which requires a
16-bit read here? I don't, but perhaps I am missing something.
Now if you are saying that this is a reasonable expectation, nothing to do
with the standard, then that's another matter, but my example of a tradeoff
with efficiency is an interesting one.
> In the Linux case mmio isn't a problem. The rules for mmio in the portable
> code require you use architecture dependant macros (readb etc) and that
> the mmio space is mapped via ioremap.
I haven't done my homework, but I assume the code behind readb uses inline
assembly and not C on most platforms?
Maybe GNU C could get some of the ideas behind the Linux macros
to define the needed extension (portable within GNU C architectures
wherever possible).
--
Laurent Guerby <[email protected]>
Perhaps the right construct is an attribute on a pointer variable that
says, essentially: use the bits, forget the provenance.
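Purely as a hypothetical illustration (no such attribute exists in GCC; the
name below is invented only to show the shape such an extension might take):
unsigned char peek(unsigned long addr)
{
        /* "no_provenance" is a made-up attribute name, not a real GCC one */
        volatile unsigned char *p __attribute__((no_provenance))
                = (volatile unsigned char *) addr;
        return *p;
}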
<<I think the goal and intent, for Ada and C as well, is to say that the
compiler will generate what is possible from assembly code written by
an expert on the platform, using the best fitting access that is
possible.
>>
Ah ha! But then look again at my 16-bit example: an expert assembly
language programmer will use a 32-bit load if efficiency is an
issue (and it does not matter if there are extra bits around), but
a 16-bit load if the hardware for some reason requires it. How is
the poor C compiler to distinguish these cases automatically?
> I haven't done my homework, but I assume the code behind readb use inline
> assembly and not C on most platforms?
It depends on the platform. It's all nicely wrapped in platform-specific
macros and using things like volatile to get the right results. Whatever
gcc invents we can cope with for a weird port, because we can make them asm.
On Jan 2, 2002, jtv <[email protected]> wrote:
> (Yes, I'm a pedant. I'm pining for the day when gcc will support the
> options "-ffascist -Wanal")
How about introducing the options `-flame -War' :-)
--
Alexandre Oliva Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist *Please* write to mailing lists, not to me
mike stump <[email protected]> writes:
| > From: Gabriel Dos Reis <[email protected]>
| > Date: 06 Jan 2002 14:22:45 +0100
|
| > Isn't the incriminated construct already outside of C and non-portable?
|
| That has been established and agreed upon. What we are now into is
| pragmatics. You now have to argue why we should not provide this
| feature in some form to users that ask for it.
Personally, I don't have any sentiment against the assembler
solution. Dewar said it was unnecessarily unportable, but the
construct by itself *is* already unportable.
-- Gaby
Gabriel Dos Reis writes:
> Personnally, I don't have any sentiment against the assembler
> solution. Dewar said it was unnecessarily un-portable, but that the
> construct by itself *is* already unportable.
I assume that what we're talking about is using an asm statement like:
asm("" : "=r" (x) : "0" (y));
to make the compiler treat x as a pointer that it knows nothing about,
given a pointer y that the compiler does know something about. For
example, y might be (char *)((unsigned long)"foo" + offset).
My main problem with this is that it doesn't actually solve the
problem AFAICS. Dereferencing x is still undefined according to the
rules in the gcc manual.
Thus, although this would make the problems go away at the moment,
they will come back at some time in the future, e.g. when gcc learns
to analyse asm statements and realises that the asm is just doing
x = y. I would prefer a solution that will last, rather than one
which relies on details of the current gcc implementation.
Other objections that I have to this solution are:
- it is hard to read; it wouldn't be obvious to someone who doesn't
know the details of gcc asm syntax what it is doing or why
- it is a statement, which makes it less convenient to use than an
expression
- it requires an extra dummy variable declaration.
But my main objection is that I don't have any assurance that it
actually solves the problem in a lasting way.
Paul.
On Mon, Jan 07, 2002 at 08:59:47AM +1100, Paul Mackerras wrote:
> Gabriel Dos Reis writes:
>
> > Personnally, I don't have any sentiment against the assembler
> > solution. Dewar said it was unnecessarily un-portable, but that the
> > construct by itself *is* already unportable.
>
> I assume that what we're talking about is using an asm statement like:
>
> asm("" : "=r" (x) : "0" (y));
>
> to make the compiler treat x as a pointer that it knows nothing about,
> given a pointer y that the compiler does know something about. For
> example, y might be (char *)((unsigned long)"foo" + offset).
>
> My main problem with this is that it doesn't actually solve the
> problem AFAICS. Dereferencing x is still undefined according to the
> rules in the gcc manual.
>
> Thus, although this would make the problems go away at the moment,
> they will come back at some time in the future, e.g. when gcc learns
> to analyse asm statements and realises that the asm is just doing
> x = y. I would prefer a solution that will last, rather than one
> which relies on details of the current gcc implementation.
Even if gcc learned to analyze asm statements (and use it in something other
than scheduling), I'm sure this wouldn't be optimized away exactly because
this construct is used by various projects exactly for this purpose (make
gcc think it can have any value allowed for the type in question).
Jakub
On Sun, Jan 06, 2002 at 11:19:40PM +0100, Jakub Jelinek wrote:
> On Mon, Jan 07, 2002 at 08:59:47AM +1100, Paul Mackerras wrote:
> > Gabriel Dos Reis writes:
> >
> > > Personnally, I don't have any sentiment against the assembler
> > > solution. Dewar said it was unnecessarily un-portable, but that the
> > > construct by itself *is* already unportable.
> >
> > I assume that what we're talking about is using an asm statement like:
> >
> > asm("" : "=r" (x) : "0" (y));
> >
> > to make the compiler treat x as a pointer that it knows nothing about,
> > given a pointer y that the compiler does know something about. For
> > example, y might be (char *)((unsigned long)"foo" + offset).
> >
> > My main problem with this is that it doesn't actually solve the
> > problem AFAICS. Dereferencing x is still undefined according to the
> > rules in the gcc manual.
> >
> > Thus, although this would make the problems go away at the moment,
> > they will come back at some time in the future, e.g. when gcc learns
> > to analyse asm statements and realises that the asm is just doing
> > x = y. I would prefer a solution that will last, rather than one
> > which relies on details of the current gcc implementation.
>
> Even if gcc learned to analyze asm statements (and use it in something other
> than scheduling), I'm sure this wouldn't be optimized away exactly because
> this construct is used by various projects exactly for this purpose (make
> gcc think it can have any value allowed for the type in question).
Yes, but there's no guarantee of that. It'd probably break a few things
if they did, but there's nothing stopping them from doing it.
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
On Jan 6, 2002, Tom Rini <[email protected]> wrote:
> On Sun, Jan 06, 2002 at 11:19:40PM +0100, Jakub Jelinek wrote:
>> On Mon, Jan 07, 2002 at 08:59:47AM +1100, Paul Mackerras wrote:
>> > asm("" : "=r" (x) : "0" (y));
>> Even if gcc learned to analyze asm statements (and use it in something other
>> than scheduling), I'm sure this wouldn't be optimized away exactly because
>> this construct is used by various projects exactly for this purpose (make
>> gcc think it can have any value allowed for the type in question).
> Yes, but there's no gaurentee of that. It'd probably break a few things
> if they did, but there's nothing stopping them from doing it.
If we documented this side effect of such asm statements, one would
have to come up with a very strong case to change it.
--
Alexandre Oliva Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist *Please* write to mailing lists, not to me
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Sunday, January 06, 2002 2:06 PM
> To: [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH] C undefined behavior fix
>
>
> <<* Given a pointer, I need a way to determine the address
> (as an int of
> the appropriate size) that the CPU will present to the MMU when I
> dereference that pointer.
> >>
>
> This is in general ill-defined, a compiler might generate
> code that sometimes
> does byte access to a particular byte, anmd sometimes gets
> the entire word
> in which the byte resides.
>
> This is often a nasty issue, and is one of the few things in
> this area that
> Ada does not properly address.
>
> If you have a memory mapped byte, you really want a way of saying
>
> "when I read this byte, do a byte read, it will not work to
> do a word read"
>
> pragma Atomic in Ada (volatile gets close in C, but is not
> close enough) will
> ensure a byte store in practice, but may not ensure byte reads.
>
Very true; in fact, when writing our Real Time Kernel in C++ we had just
this problem, and had to "hack" the GCC C and C++ compilers so that volatile
accesses are guaranteed to be done with the right size, even in the case of
bit fields.
Anyway, volatile is probably a solution for most of these kinds of problems,
and adding some more implementation-defined semantics to volatile may
provide a sure fix for most problems.
Note however that some may not have noticed, in the volatile-using examples,
that there is a difference between a "pointer to volatile char" and a
"volatile pointer to char"; the former, defined as "volatile char*" does not
help in the case of the RELOC macro, while the latter, written "char
*volatile" (note that volatile is now AFTER the '*', not before) is a sure
fix as the semantics of "volatile" ensure that the compiler will NO MORE use
the value it PREVIOUSLY knows was the one of the pointer.
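Spelled out, the two declarations differ only in where the qualifier sits:
volatile char *p1;   /* pointer to volatile char: each access through p1 is
                        volatile, but p1 itself may be cached in a register */
char *volatile p2;   /* volatile pointer to char: every use of p2 rereads the
                        pointer itself, so a previously derived value for it
                        cannot be silently reused */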
One of the lessons we learned while writing our C++ kernel was that you NEED
to be both a kernel-expert AND a compiler-expert to be successful, as some
parts of the kernel need to play some nasty tricks that the compiler may
misunderstand; so you must be able to find the proper way to inform the
compiler that you are playing these tricks and make it forget what it knows. Using
volatile (and expanding its semantics to mean: read and write with the
requested size) was a great help.
Just my .02euro
Bernard
--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel: +33 (0) 1 47 68 80 80
Fax: +33 (0) 1 47 88 97 85
e-mail: [email protected]
[email protected]
--------------------------------------------
> -----Original Message-----
> From: Laurent Guerby [mailto:[email protected]]
> Sent: Sunday, January 06, 2002 2:41 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH] C undefined behavior fix
>
>
> [email protected] wrote:
>
> > pragma Atomic in Ada (volatile gets close in C, but is not
> close enough) will
> > ensure a byte store in practice, but may not ensure byte reads.
>
>
> I see no distinction between read and write in the text of
> the Ada standard.
>
> Also I think if you declare a byte array with Atomic_Component
> and Volatile_Component and that the compiler accepts it for its
> target architecture, then the compiler is required to generate
> a byte read and store for each occurence in the text source.
> From C.6:
>
> 15 For an atomic object (including an atomic component)
> all reads and
> updates of the object as a whole are indivisible.
>
> 16 For a volatile object all reads and updates of the object as a
> whole are
> performed directly to memory.
>
> 20 {external effect (volatile/atomic objects) [partial]}
> The external
> effect of a program (see 1.1.3) is defined to include each read and
> update of
> a volatile or atomic object. The implementation shall not
> generate any
> memory
> reads or updates of atomic or volatile objects other than
> those specified by
> the program.
>
> In my exemple byte array, if I say X := T (I) I don't see how
> a conformant
> compiler accepting the declaration could generate anything other than
> one and exactly one byte read. Per 20 it has no right to read
> T (I+1) or
> T(I-1)
> since they are "other" objects (components to be pedantic).
It seems to me that you are right, but there are other cases; for example:
X := T(0)*256 + T(1);
compiled on a big endian architecture may well generate just one 16-bit word
read...
Bernard
--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel: +33 (0) 1 47 68 80 80
Fax: +33 (0) 1 47 88 97 85
e-mail: [email protected]
[email protected]
--------------------------------------------
> From: Bernard Dautrevaux <[email protected]>
> To: "'[email protected]'" <[email protected]>, [email protected]
> Cc: [email protected], [email protected], [email protected],
> [email protected]
> Date: Mon, 7 Jan 2002 14:24:35 +0100
> Truly sure; In fact when writiong our Real Time Kernel in C++ we
> just had this problem, and had to "hack" GCC C and C++ compilers so
> that volatile acesses are guaranteed to be done with the right size,
> even in case of bit fields in fact.
Did your case include more than bitfields? I recognize that bitfields
have two possible semantics, and that you and gcc may have different
opinions about the semantics of them. I discount the alternative
choice of semantic as being an example of this problem. Roughly
speaking, as I recall... The two semantics are: the base type of the
bitfield defines the mode in which memory is accessed, or the smallest
supported size that contains the bitfield defines the access. Under the
first,
struct { unsigned int field:8; };
would be a 32-bit access, while under the second the same declaration
would be an 8-bit access.
If gcc doesn't do one of them, it truly is broken (if the machine
supports the access of course). I would like to know if gcc is truly
broken.
> Using volatile (and expanding its semantics to mean: read and write
> with the requested size) was a great help.
In gcc, it already means that. If you think otherwise, I'd like to
see the example that shows it.
> From: [email protected]
> To: [email protected], [email protected], [email protected]
> Date: Sun, 6 Jan 2002 14:32:01 -0500 (EST)
> Ah ha! But then look again at my 16-bit example, an expert assembly
> langauge programmer will use a 32 bit load if efficiency is not an
> issue (and it does not matter if there are extra bits around), but a
> 16-bit load if the hardware for some reason requires it. How is the
> poort C compiler to distinguish these cases automatically?
When you give the compiler as much information as your expert
apparently has, it will produce the same code. Until then,
imagine you told your expert that you want a 16-bit fetch for
something that might care if it were not a 16-bit access... If you
tie your expert's hands the way you tie gcc's hands, the two should
produce similar code.
Now, if you want to invent gcc extensions so that it can know as much
as a domain expert, start proposing those language extensions...
> -----Original Message-----
> From: mike stump [mailto:[email protected]]
> Sent: Monday, January 07, 2002 7:39 PM
> To: [email protected]; [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: RE: [PATCH] C undefined behavior fix
>
>
> > From: Bernard Dautrevaux <[email protected]>
> > To: "'[email protected]'" <[email protected]>, [email protected]
> > Cc: [email protected], [email protected],
> [email protected],
> > [email protected]
> > Date: Mon, 7 Jan 2002 14:24:35 +0100
>
> > Truly sure; In fact when writiong our Real Time Kernel in C++ we
> > just had this problem, and had to "hack" GCC C and C++ compilers so
> > that volatile acesses are guaranteed to be done with the right size,
> > even in case of bit fields in fact.
>
> Did your case include more than bitfields? I recognize that bitfields
> have two possible semantics, and that you and gcc may have different
> opinions about the semantics of them. I discount the alternative
> choice of semantic as being an example of this problem. Roughly
> speaking, as I recall... The two semantics are, the base type of the
> bitfield defines the mode in which memory is accessed and the smallest
> size supported that contains the bitfield defines the access.
>
> struct { unsigned int field:8; };
>
> would be a 32 bit access, versus
>
> struct { unsigned int field:8; };
>
> would be an 8 bit access.
AFAIR, we got an 8-bit access there, which was wrong in view of the fact that
we use this to access memory-mapped registers that only accept word
accesses. Also this was quite long ago (on gcc-2.95.3); we're in the process
of migrating to 3.0 but that takes some time as we have to make other
changes in the compiler.
Also note that even with the volatile keyword, it seems legal (although I
doubt that GCC does it) to access two adjacent, properly aligned, volatile
chars as one word, especially if the result is placed in one resulting
variable, as if you have:
struct { unsigned short a; volatile unsigned char x; volatile
unsigned char y; } *p;
unsigned short z = p->x << 8 | p->y;
Here it seems that all the C standard requirements for volatile are
fulfilled if the compiler just reads a word into z (and that's a big win on a
big-endian architecture); however, from a system programmer's point of view
this could cause headaches if the memory-mapped register only accepts byte
accesses... (hopefully this is quite rare; the opposite is more frequent:
only accepting word access and getting byte access due to bitfields or masking).
>
> If gcc doesn't do one of them, it truly is broken (if the machine
> supports the access of course). I would like to know if gcc is truly
> broken.
gcc obviously does one of those; but using the smallest size access is a
nuisance for system programming and a very small win (if any) for standard
applications (who anyway seldom use the volatile keyword, except in
multithreaded code, but then often read/write from the cache).
>
> > Using volatile (and expanding its semantics to mean: read and write
> > with the requested size) was a great help.
>
> In gcc, it already means that. If you think otherwise, I'd like to
> see the example that shows it.
I don't have access right now to a gcc-3.0 compiler, but I tried it just now
on 2.95.3 (i386), compiling with "gcc -S -O2 test.c" (quite a common set of
options, isn't it?).
First case:
===========
Reading an 8-bit bitfield declared as unsigned int (a 32-bit mmio register),
GCC generates an 8-bit access to the memory:
struct x {
        volatile unsigned int x1:8;
};
unsigned char f1(struct x* p) {
        return p->x1;
}
yields:
        .file   "tst1.c"
        .version        "01.01"
gcc2_compiled.:
.text
        .align 4
.globl f1
        .type    f1,@function
f1:
        pushl %ebp
        movl %esp,%ebp
        movl 8(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        movb (%eax),%al
        movzbl %al,%eax
        ret
.Lfe1:
        .size    f1,.Lfe1-f1
        .ident  "GCC: (GNU) 2.95.3-5 (cygwin special)"
Second case:
============
Reading part of a volatile unsigned int (32 bits) with an 8-bit access:
void f(unsigned int* p, unsigned char* q) {
        *q = *p;
}
yields:
        .file   "test2.c"
        .version        "01.01"
gcc2_compiled.:
.text
        .align 4
.globl f
        .type    f,@function
f:
        pushl %ebp
        movl %esp,%ebp
        movl 8(%ebp),%eax
        movl 12(%ebp),%edx
        movl %ebp,%esp
        movb (%eax),%al
        movb %al,(%edx)
        popl %ebp
        ret
.Lfe1:
        .size    f,.Lfe1-f
        .ident  "GCC: (GNU) 2.95.3-5 (cygwin special)"
Maybe 3.0 is avoiding the byte accesses? That would be fine, at least for me
:-)
Note that I was not able to cause gcc to collapse two byte accesses into one
16-bit one, but I may not have tried hard enough...
Clearly the code generated by gcc above IS FULLY ANSI-C conforming.
Obviously when writing system code, one would like a bit more strict
interpretation of volatile.
Regards
Bernard
--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel: +33 (0) 1 47 68 80 80
Fax: +33 (0) 1 47 88 97 85
e-mail: [email protected]
[email protected]
--------------------------------------------
> From: Paul Mackerras <[email protected]>
> Date: Mon, 7 Jan 2002 08:59:47 +1100 (EST)
> To: Gabriel Dos Reis <[email protected]>
> > Personnally, I don't have any sentiment against the assembler
> > solution. Dewar said it was unnecessarily un-portable, but that the
> > construct by itself *is* already unportable.
> I assume that what we're talking about is using an asm statement like:
> asm("" : "=r" (x) : "0" (y));
Ick! No, that's horrible.
#include <string.h>
char buf[1024];
#define hide(x) ({ void *vp = (x); asm ("" : "+r" (vp)); vp; })
int main(void) {
        strcpy(buf, hide("eeeeeeeeeeeeeeeeeeeeeeeeeeeeeelksdlkjasdlkjasdlkjasdaaaaaaaaaa"+20));
        return 0;
}
Perfectly clear, simple, doesn't burn regs and so on. In fact, even
the assembly file doesn't have any extraneous output, cool.
> My main problem with this is that it doesn't actually solve the
> problem AFAICS.
It does for now. It will for the next 10 years, my guess. volatile
will solve it longer, at some performance penalty, if you prefer.
> Dereferencing x is still undefined according to the rules in the gcc
> manual.
? So what? Pragmatically, for now, it does what the user wants. By
the time we break it, we'll probably have enough intelligence in the
compiler to figure out what they were doing and still not break it.
> I would prefer a solution that will last, rather than one which
> relies on details of the current gcc implementation.
Then move the bits to the right address before you execute the C code
and code the thing that moves the bits in assembly.
> - it is hard to read; it wouldn't be obvious to someone who doesn't
> know the details of gcc asm syntax what it is doing or why
See the comment just above.
> - it is a statement, which makes it less convenient to use than an
> expression
? In my example, it is an expression.
> - it requires an extra dummy variable declaration.
Mine doesn't.
> But my main objection is that I don't have any assurance that it
> actually solves the problem in a lasting way.
Then code only in that subset of C that is well defined, and only use
semantics that have mandated behavior.
On Mon, Jan 07, 2002 at 11:36:45AM -0800, mike stump wrote:
> #define hide(x) ({ void *vp = x; asm ("" : "+r" (vp)); vp; })
>
> > My main problem with this is that it doesn't actually solve the
> > problem AFAICS.
>
> It does for now. It will for the next 10 years, my guess. volatile
> will solve it longer, at some performance penalty, if you prefer.
>
> > Dereferencing x is still undefined according to the rules in the gcc
> > manual.
>
> ? So what? Pragmatically, for now, it does what the user wants. By
> the time we break it, we'll probably have enough intelligence in the
> compiler to figure out what they were doing and still not break it.
Why not fix the problem by adding a GCC-specific construct which
allows the programmer to say: I know what I'm doing; don't make any
assumptions about this pointer (i.e., "my gun, my foot, my rules").
I.e., something like this:
#define hide(x) (__builtin_gcc_this_is_not_the_pointer_you_were_looking_for__(x))
(or __builtin_gcc_jedi_mind_trick__, or pick some other name if you
like :-)
That way, you don't have to posit adding AI to GCC in the future to be
able to intuit what was going on, and programmers can take comfort in
the fact that the behaviour is well defined, and won't get screwed up when,
in GCC 4.0, the compiler writers decide that "Gee, most people write
inefficient assembly; so let's go and reverse engineer people's
__asm__ statements in an effort to 'optimize' their code".
- Ted
On Mon, Jan 07, 2002 at 02:24:35PM +0100, Bernard Dautrevaux wrote:
>
> Note however that some may not have noticed, in the volatile-using examples,
> that there is a difference between a "pointer to volatile char" and a
> "volatile pointer to char"; the former, defined as "volatile char*" does not
> help in the case of the RELOC macro, while the latter, written "char
> *volatile" (note that volatile is now AFTER the '*', not before) is a sure
> fix as the semantics of "volatile" ensure that the compiler will NO MORE use
> the value it PREVIOUSLY knows was the one of the pointer.
One problem I ran into considering 'char *volatile' was this one: the
compiler is supposed to disable certain optimizations involving
registerization and such, but isn't it still allowed to do constant
folding? That would eliminate the pointer from the intermediate code
altogether, and so the volatile qualifier would be quickly forgotten.
No fixo breako.
Nothing's taking the pointer's address, so the compiler _will_ be able
to prove that (in a sensible universe) no other thread, interrupt,
kernel code or Angered Rain God will be able to find our pointer--much
less change it.
Jeroen
> Nothing's taking the pointer's address, so the compiler _will_ be able
> to prove that (in a sensible universe) no other thread, interrupt,
> kernel code or Angered Rain God will be able to find our pointer--much
> less change it.
You're not allowed to be that smart wrt volatile. If the programmer
says the value might change unpredictably and should not be optimized,
then It Is So and the compiler must respect that even if it determines
It Cannot Possibly Happen.
-Tim
On Mon, Jan 07, 2002 at 05:28:32PM -0500, Tim Hollebeek wrote:
>
> You're not allowed to be that smart wrt volatile. If the programmer
> says the value might change unpredictably and should not be optimized,
> then It Is So and the compiler must respect that even if it determines
> It Cannot Possibly Happen.
Naturally I hope you're right. But how does that follow from the Standard?
I have to admit I don't have a copy handy. :(
Let's say we have this simplified version of the problem:
int a = 3;
{
volatile int b = 10;
a += b;
}
Is there really language in the Standard preventing the compiler from
constant-folding this code to "int a = 13;"?
Jeroen
In the above case it is unlikely that folding would present a problem,
but volatile was created because hardware, or even seemingly unrelated
software, can modify even the most unlikely memory locations. If you
want to break device drivers, go ahead and optimize your compiler.
Tim
The only thing dumber than a cow is a man who thinks he's smarter than a
cow.
> From: Bernard Dautrevaux <[email protected]>
> To: "'Laurent Guerby'" <[email protected]>, [email protected]
> Date: Mon, 7 Jan 2002 14:29:15 +0100
> It seems to me that you are right,
That's unfortunate.
> but there are other cases; for example:
> X := T(0)*256 + T(1);
> compiled on a big endian architecture may well generate just one
> 16-bit word read...
Not in my world, not in gcc's world. That is just broken and wrong.
If you want/need the gcc doc to expound on this, write it up, and
we'll add it.
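For concreteness, a rough C rendering of the Ada fragment quoted above (my
translation; assume T is a two-element array of volatile bytes acting as a
device register pair): the question is whether a compiler may merge the two
byte reads into a single 16-bit load on a big-endian target.

    volatile unsigned char T[2];    /* assumed pair of device registers */

    unsigned int read_pair(void)
    {
        /* two separate volatile byte loads within one expression */
        return T[0] * 256u + T[1];
    }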
<<You're not allowed to be that smart wrt volatile. If the programmer
says the value might change unpredictably and should not be optimized,
then It Is So and the compiler must respect that even if it determines
It Cannot Possibly Happen.
>>
Indeed, this is the case, for example, you may have a hardware monitor
on the bus, and you are looking for this address, or some external
process may be modifying the location. Or you may legitimately use
volatile so that your debugger works as expected on this location.
<<Is there really language in the Standard preventing the compiler from
constant-folding this code to "int a = 13;"?
>>
If there is no such language (I do not have my copy with me), then the
standard is most certainly broken :-)
<<In the above case it is unlikely that folding would present a problem,
but volatile was created because hardware, or even seemingly unrelated
software, can modify even the most unlikely memory locations. If you
want to break device drivers, go ahead and optimize your compiler.
>>
No, I think that is the wrong domain of discourse here. We are not talking
about compilers being friendly, but rather correct. Most certainly volatile
in Ada would not permit this optimization (i.e. it would not be an optimization,
it would be a miscompilation), and I certainly thought that in this respect
C was identical to Ada in the semantics of volatile.
>>That is just broken and wrong
According to the gcc docs, right? Certainly not according to the standard!
<<If you want/need the gcc doc to expound on this, write it up, and
we'll add it.
>>
I think we should add this, since this is the source of the rule, not the
standard (as far as I can tell)
> From: [email protected]
> To: [email protected], [email protected], [email protected]
> Date: Sun, 6 Jan 2002 14:29:01 -0500 (EST)
> > Do you have an example where this fails? Do you not consider it a bug?
> I don't see the obligation on the compiler here.
I think you mean that if one pedantically misreads the C standard and
its intent to the widest extent possible, just within the confines of
the language document, then the standard doesn't compel it. If so, we
might be able to agree.
> Now if you are saying that this is a reasonable expectation, nothing
> to do with the standard, then that's another matter, but my example
> of a tradeoff with efficiency is an interesting one.
Pragmatically, not really.
Hum, where in that standard does it say that the compiler won't
scribble all over memory? If it doesn't, does that mean that within
the confines of the language standard that the compiler can? If you
produce an Ada compiler that did, do you think your users would feel
better when you told them you were allowed to by the language
standard?
The point can only be extreme misreading of a standard.
Just because you can invent some nonsensical mapping for C onto our
machines, doesn't mean we should. Nor should we confuse users with
the notion that we could have a nonsensical compiler, if we wanted.
For example, it is perfectly within the rights of an implementor of C
to say that all volatile accesses will be 32 bus cycles that are 4-byte
aligned. In this case, trivially we see that promoting your short
read to a 32-bit read is perfectly fine. If a chip required it, it would
be reasonable (to the user).
> Date: Mon, 7 Jan 2002 22:49:07 +0100
> From: jtv <[email protected]>
> To: Bernard Dautrevaux <[email protected]>
> One problem I ran into considering 'char *volatile' was this one:
> the compiler is supposed to disable certain optimizations involving
> registerization and such, but isn't it still allowed to do constant
> folding?
No. That would be a bug. gcc used to have volatile bugs; most of
them are now gone. Please submit a bug report if you can discover it
again.
I assume you meant something like this:
char * volatile cp;
main() {
return cp - cp;
}
mike stump writes:
> #define hide(x) ({ void *vp = x; asm ("" : "+r" (vp)); vp; })
>
> main() {
> strcpy(buf, hide("eeeeeeeeeeeeeeeeeeeeeeeeeeeeeelksdlkjasdlkjasdlkjasdaaaaaaaaaa"+20));
> }
>
> Perfectly clear, simple, doesn't burn regs and so on. In fact, even
> the assembly file doesn't have any extraneous output, cool.
Clear to you, I had to look up the gcc info pages to find out what the
"+" constraint does. :)
Is the "r" constraint available (and reasonable) on all architectures
that GCC targets?
> > My main problem with this is that it doesn't actually solve the
> > problem AFAICS.
>
> It does for now. It will for the next 10 years, my guess. volatile
> will solve it longer, at some performance penalty, if you prefer.
>
> > Dereferencing x is still undefined according to the rules in the gcc
> > manual.
>
> ? So what? Pragmatically, for now, it does what the user wants. By
> the time we break it, we'll probably have enough intelligence in the
> compiler to figure out what they were doing and still not break it.
I guess I am reacting to the criticism expressed in this thread
against people who write code which happens to work with a particular
compiler version but for which there is nothing in the standard or
documentation which says that it should work or will continue to work.
It seems to me that the asm is in that category. If the gcc
maintainers were willing to document that side effect that would make
me more comfortable with it. But it still seems like using an ax as a
screwdriver to me.
> Then move the bits to the right address before you execute the C code
> and code the thing that moves the bits in assembly.
Yes, I'm using -mrelocatable on the relevant files now instead of the
RELOC macro. But there are still a couple of places where I need to
take a pointer value which is valid when running at the initial
address and adjust it so it will be valid later when running at the
final address.
The broader issue is one of how the kernel can construct pointers from
addresses in general and be sure that gcc won't assume it knows what
the pointer points to. The kernel needs to be able to do this.
> > - it is a statement, which makes it less convenient to use than an
> > expression
>
> ? In my example, it is an expression.
Well, it's a statement expression. :) Aren't those deprecated these
days?
> > - it requires an extra dummy variable declaration.
>
> Mine doesn't.
<nitpicking> What's the "void *vp;" then? </nitpicking>
> > But my main objection is that I don't have any assurance that it
> > actually solves the problem in a lasting way.
>
> Code only in that subset of C that is well defined and only use
> semantics that have mandated behavior.
That gives me no way to turn an address into a pointer.
Paul.
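One way the "address into pointer" problem just described might be attacked
with the asm idiom from earlier in the thread is sketched below. This is only
an illustration: the helper name reloc_ptr and the offset parameter are made
up here, and this is not the kernel's actual RELOC machinery.

    /* GNU C: pass the computed address through an empty asm so the
       optimizer cannot trace what the resulting pointer points to. */
    static inline void *reloc_ptr(void *p, unsigned long offset)
    {
        unsigned long addr = (unsigned long)p + offset;
        __asm__ ("" : "+r" (addr));    /* hides the value's origin */
        return (void *)addr;
    }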
On 20020107 jtv wrote:
>
>Let's say we have this simplified version of the problem:
>
> int a = 3;
> {
> volatile int b = 10;
>>>>>>>>> here b changes
> a += b;
> }
>
>Is there really language in the Standard preventing the compiler from
>constant-folding this code to "int a = 13;"?
>
--
J.A. Magallon # Let the source be with you...
mailto:[email protected]
Mandrake Linux release 8.2 (Cooker) for i586
Linux werewolf 2.4.18-pre1-beo #1 SMP Fri Jan 4 02:25:59 CET 2002 i686
On Tue, Jan 08, 2002 at 11:19:07AM +1100, Paul Mackerras wrote:
> Is the "r" constraint available (and reasonable) on all architectures
> that GCC targets?
Yes, though "g" is probably better for this purpose.
r~
> On Mon, Jan 07, 2002 at 02:24:35PM +0100, Bernard Dautrevaux wrote:
> >
> > Note however that some may not have noticed, in the
> volatile-using examples,
> > that there is a difference between a "pointer to volatile
> char" and a
> > "volatile pointer to char"; the former, defined as
> "volatile char*" does not
> > help in the case of the RELOC macro, while the latter, written "char
> > *volatile" (note that volatile is now AFTER the '*', not
> before) is a sure
> > fix as the semantics of "volatile" ensure that the compiler
> will NO MORE use
> > the value it PREVIOUSLY knows was the one of the pointer.
>
> One problem I ran into considering 'char *volatile' was this one: the
> compiler is supposed to disable certain optimizations involving
> registerization and such, but isn't it still allowed to do constant
> folding? That would eliminate the pointer from the intermediate code
> altogether, and so the volatile qualifier would be quickly
> forgotten.
> No fixo breako.
>
> Nothing's taking the pointer's address, so the compiler
> _will_ be able
> to prove that (in a sensible universe) no other thread, interrupt,
> kernel code or Angered Rain God will be able to find our
> pointer--much
> less change it.
NO; the standard here is clear: any access to a volatile object is a side
effect, and optimization is NOT allowed to eliminate side effects, and
must perform them respecting sequence points, even if it determines that the
code will in fact do nothing (see 5.1.2.3, at least in the document I have,
which is one of the last drafts, dated 18th January 1999). That's the whole
point of the volatile specifier.
Bernard
--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel: +33 (0) 1 47 68 80 80
Fax: +33 (0) 1 47 88 97 85
e-mail: [email protected]
[email protected]
--------------------------------------------
> -----Original Message-----
> From: Rogelio M. Serrano Jr. [mailto:[email protected]]
> Sent: Tuesday, January 08, 2002 1:15 AM
> To: Bernard Dautrevaux
> Subject: Re: [PATCH] C undefined behavior fix
>
>
> I tried the same thing using gcc 3.1
> the file test.c contains:
>
> struct x{
> volatile unsigned int x1:8;
> };
>
> unsigned char f1(struct x* p)
> {
> return p->x1;
> }
>
> unsigned char f(unsigned int* p, unsigned char* q)
> {
> *p = *q;
Note that in my case, the problem is caused by
*q = *p;
as this requests 8 bits from a 32-bit volatile, and gcc-2.95.3 generates a
byte load ;-(
> }
>
> compiling with gcc -O2 -S test.c yields:
>
> .file "test.c"
> .text
> .align 2
> .p2align 4,,15
> .globl f1
> .type f1,@function
> f1:
> pushl %ebp
> movl %esp, %ebp
> movl 8(%ebp), %eax
> movb (%eax), %al
So here we get the (correct but annoying) byte access to an 8-bit field
taken from a 32-bit integer. That's exactly what we had to change for
volatile to be useful, so that this access is in fact a 32-bit access.
> andl $255, %eax
> popl %ebp
> ret
> .Lfe1:
> .size f1,.Lfe1-f1
> .align 2
> .p2align 4,,15
> .globl f
> .type f,@function
> f:
> pushl %ebp
> movl %esp, %ebp
> movl 12(%ebp), %eax
> xorl %edx, %edx
> movb (%eax), %dl
This is correct as you read *q, which is a byte
> movl 8(%ebp), %eax
> movl %edx, (%eax)
And hopefully, this is correct also as you write all the 32-bits of *p, as
requested.
> popl %ebp
> ret
> .Lfe2:
> .size f,.Lfe2-f
> .ident "GCC: (GNU) 3.1 20011231 (experimental)"
>
So at least for the first test, gcc-3.1 generates the same (annoying) code as
2.95.3. I'm quite sure this is legal, as I can't see in the standard whether,
when writing:
volatile unsigned int x:8;
I define:
1) a volatile 8-bit field to be interpreted as an unsigned int, or
2) an 8-bit field which is part of a volatile unsigned int.
It seems to me that the choice between these two interpretations is
implementation defined; what bothers me is that the GCC choice (1) is not
the more useful in my case (system programming), nor the more natural IMHO.
The second problem I had on 2.95.3 is more annoying, and may well be a GCC
bug; clearly I ask to read a volatile unsigned int, and to forget 24 bits out
of it, but GCC only reads the 8 interesting bits. It seems that this violates
the 'side effects are preserved' definition of volatile (although in a way
coherent with choice 1 above).
So it would seem more coherent to correct the second problem (if it exists
in 3.0) and switch to model (2) above... but that has to be discussed and
documented (although that is what we did in our own compiler).
Regards,
Bernard
On Mon, Jan 07, 2002 at 04:16:32PM -0800, mike stump wrote:
>
> I assume you meant something like this:
>
> char * volatile cp;
>
> main() {
> return cp - cp;
> }
No. No. In this case you're giving cp external linkage, which means it
can be observed from the outside. Plus, you're reading it multiple times,
which in the case of volatile definitely means it should be read multiple
times because it might be modified by something external between those
reads, and that may be exactly what you want to observe with your code.
What I mean (and what we're seeing in RELOC) is more like this:
char *foo() {
char * volatile cp = NULL;
return cp;
}
Jeroen
On Tue, Jan 08, 2002 at 01:27:34AM +0100, J.A. Magallon wrote:
> >
> > int a = 3;
> > {
> > volatile int b = 10;
>
> >>>>>>>>> here b changes
Yes, thank you, that part was obvious already. The question pertained
to the fact that nobody outside compiler-visible code was being handed
an address for b, and so the compiler could (if it wanted to) prove
under pretty broad assumptions that nobody could *find* b to make the
change in the first place.
Now other people assure me that the Standard explicitly rules this out,
and I'm willing to believe that--although naturally I'd still feel more
comfortable if I'd actually seen the relevant text. Just so long as
we're not making another wild-guess stab at solving the problem.
Jeroen
On Tue, Jan 08, 2002 at 10:44:59AM +0100, Bernard Dautrevaux wrote:
>
> NO; the standard here is clear: any access to a volatile object is a side
> effect (see , and optimization is NOT allowed to eliminate side effects, and
> must do them respecting sequence points, even if it determines that the code
> will in fact do nothing
Thank you. That makes it absolutely clear.
Jeroen
On Jan 8, 2002, jtv <[email protected]> wrote:
> On Tue, Jan 08, 2002 at 01:27:34AM +0100, J.A. Magallon wrote:
>> >
>> > int a = 3;
>> > {
>> > volatile int b = 10;
>>
>> >>>>>>>>> here b changes
> Yes, thank you, that part was obvious already. The question pertained
> to the fact that nobody outside compiler-visible code was being handed
> an address for b, and so the compiler could (if it wanted to) prove
> under pretty broad assumptions that nobody could *find* b to make the
> change in the first place.
How about a debugger?
--
Alexandre Oliva Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist *Please* write to mailing lists, not to me
On Tue, Jan 08, 2002 at 10:36:47AM -0200, Alexandre Oliva wrote:
>
> > Yes, thank you, that part was obvious already. The question pertained
> > to the fact that nobody outside compiler-visible code was being handed
> > an address for b, and so the compiler could (if it wanted to) prove
> > under pretty broad assumptions that nobody could *find* b to make the
> > change in the first place.
>
> How about a debugger?
True. I just figured that actually modifying auto variables that were
assigned only once and used only once in a debugger fell under the heading
of *very* broad assumptions. :-)
As it turns out, the discussion is now moot since volatile does indeed
imply here what I dared not assume it would, and so it might even solve
the RELOC problem.
Jeroen
<<Hum, where in that standard does it say that the compiler won't
scribble all over memory? If it doesn't, does that mean that within
the confines of the language standard that the compiler can? If you
produce an Ada compiler that did, do you think your users would feel
better when you told them you were allowed to by the language
standard?
>>
You are appealing to the "intent" of the C standard to say that when
referencing volatile memory, ONLY the volatile variable can be
referenced and nothing else. OK, but where do you find this intent?
Or do we just have to take Mike's word for it? If so, that's not
very helpful (i.e. to consider that there is an implicit clause
in the standard that says to consult Mike to learn the intent of
anything not spelled out).
Seriously, I just don't see the requirement stated or implied in the
standard. Perhaps I am missing some language, that's certainly possible,
it is not a document that I know by heart beginning to end.
As to your question above, the external effects of an Ada program are very
carefully stated in the standard, and no one is allowed to try to extend
this set of effects by appealing to "intent". Of course marketing requirements
say many things, e.g. you can obviously compute A+B by recursive incrementing,
and of course that satisfies the standard, but it is obviously useless.
Now if you are claiming that generating efficient code to access a 16-bit
volatile quantity (by loading 32 bits) is in the same category, I absolutely
do not buy that at all.
<<Yes, thank you, that part was obvious already. The question pertained
to the fact that nobody outside compiler-visible code was being handed
an address for b, and so the compiler could (if it wanted to) prove
under pretty broad assumptions that nobody could *find* b to make the
change in the first place.
>>
Suppose that a hardware emulator is used to change the value of b (IT
knows the address) at the point of:
> >>>>>>>>> here b changes
The reason was precisely to test a different value for b. This should
work if b is volatile. So the compiler's proof is flawed. The whole
point of volatile is that ALL such proofs are forbidden for volatile
objects.
> From: [email protected]
> To: [email protected], [email protected], [email protected]
> Date: Tue, 8 Jan 2002 19:51:35 -0500 (EST)
I noticed you failed to answer my question. Why's that? The answer
is, the standard is not a formal document. If it were, you would be
able to point at the line that had the requirement that said it would
not scribble all over memory. It is just that simple.
The intent is to not. We (the people that write the standard) expect
you to just know. Personally, I find it rather obvious. I suspect
most all people do, at least for the scribbling case.
> OK, but where do you find this intent?
The C standard is set in a historical context, and in the context of
existing implementations, and existing expectations. A setting with
experts that know what the intent is to some varying degree. A
setting with an installed base of users, and an installed base of
code. Using all that as a backdrop, some types of experts I think can
discern the intent fairly well. Some things are contentious.
Some things are so trivially true that even a beginner could know
you're violating the intent.
Some things hinge on specific choices an implementor might make. For
example, on UNIX, with separate compilation, without the compiler's
ability to talk across the boundary, on a byte-addressable machine, if
we have three translation units:
unit 1:
volatile char c1;
void foo1() {
c1 = 1;
}
unit 2:
volatile char c2;
void foo2() {
c2 = 2;
}
unit 3:
volatile char c3;
void foo3() {
c3 = 3;
}
and we load each one into our app, and place c1, c2 and c3 immediately
next to each other, and then run foo1, then foo2, then foo3, and then
check the side effects, I would claim we _must_ get: write 1 to c1,
write 2 to c2, write 3 to c3, and at the end c1, c2, c3 should be
1, 2, 3. I find it obvious.
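For concreteness, a driver unit tying the three units above together might
look like this (my sketch; whether c1, c2 and c3 really end up adjacent is up
to the linker, as assumed above):

    extern volatile char c1, c2, c3;
    extern void foo1(void), foo2(void), foo3(void);

    int main(void)
    {
        foo1();                  /* must perform a byte write to c1 */
        foo2();                  /* ... then to c2 ...              */
        foo3();                  /* ... then to c3                  */
        return c1 + c2 + c3;     /* expected: 1 + 2 + 3             */
    }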
One cure is a completely formal language. Do you know of any real
languages that are? By real, I mean a real language that is used to
code real programs and used in the real world. I don't.
So yes, sometimes, from time to time, people might have to be told
what the intent is, if they don't get it.
In part, it is because gcc has adopted this model of independent
translation units that it becomes a hard requirement, in the case above,
for the accesses to be byte based.
not be able to implement the intended required semantics of each of
the units. The requirements of the standard forced this because of
the implementation choice.
Welcome to the world of programming.
<<and we load each one into our app, and place c1, c2 and c3 immediately
next to each other, and then run foo1, then foo2, then foo3, and then
check the side effects, c1, c2 and c3, I would claim we _must_ get
write 1 c1, write 2 c2, write 3 c3, and at the end, c1, c2 c3 should
be 1,2,3. I find it obvious.
>>
Yes, of course! No one disagrees. I am talking about *LOADS* not stores,
your example is 100% irrelevant to my point, since it does stores.
If you think I was talking about stores (look back in the thread, you
will see that is your misconception), then no WONDER you are puzzled
by what you *thought* I said :-)
<<In part, it is because gcc has adopted this model of independent
translation units, that makes it a hard requirement in the case above,
for the accesses to be byte based. Because if it had not, gcc would
not be able to implement the intended required semantics of each of
the units. The requirements of the standard forced this because of
the implementation choice.
>>
All this is true, but totally irrelevant to the point I was making, which
was about loads, not stores.
>>Welcome to the world of programming.
If you feel that I need a welcome to the world of programming (in which
I have lived for 38 years, producing several million lines of delivered
commercial code), then it is likely you are misunderstanding what I
am saying :-)
Robert
<<I noticed you failed to answer my question. Why's that? The answer
is, the standard is not a formal document. If it were, you would be
able to point at the line that had the requirement that said it would
not scribble all over memory. It is just that simple.
>>
Sorry, it *is* a formal document. Your question was with respect to Ada,
so if you are really interested here is the quote:
8 The external effect of the execution of an Ada program is defined in
terms of its interactions with its external environment. The following are
defined as external interactions:
9 Any interaction with an external file (see A.7);
10 The execution of certain code_statements (see 13.8); which code_
statements cause external interactions is implementation defined.
11 Any call on an imported subprogram (see Annex B), including any
parameters passed to it;
12 Any result returned or exception propagated from a main
subprogram (see 10.2) or an exported subprogram (see Annex B) to
an external caller;
13 Any read or update of an atomic or volatile object (see C.6);
14 The values of imported and exported objects (see Annex B) at the
time of any other interaction with the external environment.
Now the question is: does scribbling come under any of these definitions.
One would have to look at the particular definition of scribbling to know.
In particular, the compiler has to prove that such scribbling does not
violate 13 or 14, which is rather hard. If you give me a specific example
of an action performed by a scribbling compiler, I will tell you whether
it is conforming or not.
Note that the Ada standard also has the concept of implementation advice.
So one should also look there. It is not strictly normative, but in practice
has some semi-normative force.
The standard is most certainly a formal document in the legal sense. If
you are complaining that it is not mathematically precise, OK, but such
precision comes with other costs, so the intention of both the C and
Ada standards is that they are at an appropriate level of formality.
Of *course* it is the case that a conforming compiler that meets all the
requirements of the standards, and, in the case of Ada, also passes the
ISO approved validation suite (there is no such for C), may still be
complete junk for a given purpose (the original Ada-Ed compiler which
compiled Ada, and then executed it at the rate of about 1 statement
per second was hardly useful for Mil-Aero applications :-)
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Wednesday, January 09, 2002 1:52 AM
> To: [email protected]; [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH] C undefined behavior fix
>
>
> <<Hum, where in that standard does it say that the compiler won't
> scribble all over memory? If it doesn't, does that mean that within
> the confines of the language standard that the compiler can? If you
> produce an Ada compiler that did, do you think your users would feel
> better when you told them you were allowed to by the language
> standard?
> >>
>
> YOu are appealing to the "intent" of the C standard to say that when
> referencing volatile memory, ONLY the volatile variable can be
> referenced and nothing else. OK, but where do you find this intent?
> Or do we just have to take Mike's word for it? If so, that's not
> very helpful (i.e. to consider that there is an implicit clause
> in the standard that says to consult Mike to learn the intent of
> anything not spelled out).
>
> Seriously, I just don't see the requirement stated or implied in the
> standard. Perhaps I am missing some language, that's
> certainly possible,
> it is not a document that I know by heart beginning to end.
>
> As to your question above, the external effects of an Ada
> program are very
> carefully stated in the standard, and no one is allowed to
> try to extend
> this set of effects by appealing to "intent". Of course
> marketing requirements
> say many things, e.g. you can obviously compute A+B by
> recursive incrementing,
> and of course that satisfies the standard, but it is
> obviously useless.
>
> Now if you are claiming that generating efficient code to
> access a 16-bit
> volatile quantity (by loading 32 bits) is in the same
> category, I absolutely
> do not buy that at all.
I agree that in some cases reading a 32-bit word when needing a 16-bit
volatile short may be allowed by the standard. HOWEVER, that supposes that gcc
makes a careful examination of all the memory layout for the program, so as
to be sure that the 16 unneeded bits it reads for efficiency do NOT come
from some volatile object(s), or gcc will then BREAK the volatile semantics
for these objects.
So in any case this is not allowed in a lot of cases, such as accessing
an external "volatile short" (only the linker knows for sure what
is near this short) or reading memory through a "volatile short*" (only GOD
knows if you can). And in fact it's WRONG to access it in such a way if you
know that near this object you have other objects (such as is the case in a
volatile struct...). So even if it *may* be legal in some cases, such an
optimization that *may* be more efficient is not at all very interesting.
This is especially true as it is a nuisance for some uses of "volatile" that
were intended by the standard (as is clear in notes referring to the use of
volatile for memory-mapped IO) and are broken by this optimization.
Regards,
Bernard
PS: Note also that we have (at least on 2.95.3) the opposite "optimization"
of reading only 8 bits out of a 32-bit "volatile unsigned", something that
may be argued to be plain wrong (after all you are omitting reads that were
requested by the program). I've sent a message about that earlier in this
thread but have no way right now to test what 3.0.3 will do about it, so I
don't want to file a PR if this was already corrected. The program
generating an (incorrect?) byte load is:
void f(volatile unsigned char* p, volatile unsigned short* q) {
*p = *q;
}
Perhaps I should start a new thread about this? This one is quite overlong
:-)
In message <17B78BDF120BD411B70100500422FC6309E40E@IIS000>, Bernard Dautrevaux
wrote:
>I agree that in some cases reading a 32-bit word when needing a 16-bit
>volatile short may be allowed by the standard. HOWEVER that suppose that gcc
>makes a careful examination of all the memory layout for the program so that
>to be sure that the 16 unneeded bits it reads for efficiency do NOT come
>from some volatile object(s), or gcc will then BREAK the volatile semantics
>for these objects.
>
>So in any case this is not allowed in a lot of cases such as accessing
>accessing an external "volatile short" (only the linker knwos for sure what
>is near this short) or reading memory through a "volatile short*" (only GOD
>knows if you can). And in fact it's WRONG to access in such a way if you
>know that near this object you have other objects (such as is the case in a
>volatile struct...). So even if it *may* be legal in some cases, such an
>optimization that *may* be more efficient is not at all very interesting.
Especially if there are cases where this optimization yields a slower
access (or even worse indirect bugs).
E.g. if the referenced "volatile short" is a hardware register and the
access is multiplexed over a slow 8-bit bus. There are embedded systems
around where this is the case and the (cross-)compiler has no way to
know this (except by being told by the programmer).
Bernd
--
Bernd Petrovitsch Email : [email protected]
g.a.m.s gmbh Fax : +43 1 205255-900
Prinz-Eugen-Straße 8 A-1040 Vienna/Austria/Europe
LUGA : http://www.luga.at
On 03-Jan-2002, Paul Mackerras <[email protected]> wrote:
> "Conforming" means that the program will run the same on any architecture
In ANSI/ISO C and C++, "strictly conforming" means basically what you said.
But "conforming" means much less -- a "conforming program" is just a
program that is acceptable to at least one conforming C implementation
somewhere. Since conforming C implementations are allowed to accept
*anything*, provided they issue a diagnostic, the term "conforming program"
is essentially meaningless.
--
Fergus Henderson <[email protected]> | "I have always known that the pursuit
The University of Melbourne | of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.
<<This is especially true as it is a nuisance for some uses of "volatile" that
were intended by the standard (as is clear in notes refering to the use of
volatile for memory mapped IO) and are broken by this optimization.
>>
Right, of course this optimization should only be done if the compiler can
prove that no other volatile objects are accessed (that would also violate
the corresponding rule in the Ada standard).
<<Especially if there are cases where this optimization yields a slower
access (or even worse indirect bugs).
E.g. if the referenced "volatile short" is a hardware register and the
access is multiplexed over a slow 8 bit bus. There are embedded systems
around where this is the case and the (cross-)compiler has no way to
know this (except it can be told by the programmer).
>>
Well that of course is a situation where the compiler is being deliberately
misinformed as to the relative costs of various machine instructions, and
that is definitely a problem. One can even imagine hardware (not such a hard
feat, one of our customers had such hardware) where a word access works, but
a byte access fails due to hardware shortcuts.
That's quite often the case with MMIO, and the only portable way to give a
hint to the compiler that it should refrain from optimizing is "volatile";
that's why I think the compiler should not do this optimization on volatile
objects at all.
Bernard
Fergus Henderson <[email protected]> writes:
| On 03-Jan-2002, Paul Mackerras <[email protected]> wrote:
| > "Conforming" means that the program will run the same on any architecture
|
| In ANSI/ISO C and C++, "strictly conforming" means basically what you said.
More accurately, "strictly conforming" is an ISO C notion. There is
no such thing in C++.
-- Gaby
On Tue, 8 Jan 2002, J.A. Magallon wrote:
>
> On 20020107 jtv wrote:
> >
> >Let's say we have this simplified version of the problem:
> >
> > int a = 3;
> > {
> > volatile int b = 10;
>
> >>>>>>>>> here b changes
Hmmm...
Then your hardware is probably broken or maybe you are dreaming. :-)
There is nothing in this code that requires the compiler to allocate
memory for 'b'. You just invented the volatile constant concept. :)
> > a += b;
> > }
> >
> >Is there really language in the Standard preventing the compiler from
> >constant-folding this code to "int a = 13;"?
Gérard.
> From: [email protected]
> To: [email protected], [email protected], [email protected]
> Cc: [email protected], [email protected], [email protected],
> [email protected]
> Date: Tue, 8 Jan 2002 21:13:43 -0500 (EST)
> Yes, of course! No one disagrees. I am talking about *LOADS* not
> stores, your example is 100% irrelevant to my point, since it does
> stores.
Ok, in the bodies of those, put in
j1=c1;
j2=c2;
j3=c3;
With new definitions for j1, j2 and j3 as being volatile. Accesses are volatile:
[#2] Accessing a volatile object, modifying an object,
modifying a file, or calling a function that does any of
those operations are all side effects
So, I would claim that the case is symmetric with writing volatiles.
If the standard doesn't make a distinction for write vs. read, then you
can't claim that distinction is based in the standard. If you
claim the standard does make a distinction, please point it out; I am
unaware of it.
<<So, I would claim that the case is symetric with writing volatiles.
If the standard doesn't make a distinction for write v read, then you
can't and claim that distinction is based in the standard. If you
claim the standard does make a distinction, please point it out, I am
unaware of it.
>>
Well obviously you do not go writing to other variables, but if you have
three variables IN ONE PROGRAM
a
b
c
adjacently allocated, and b is volatile, and a/c are not, then your
argument *so far* would appear to allow a compiler to do an "over-wide"
load for b. If you think otherwise, please prove it from the standard.
Of course a write is generally not at all symmetrical, since you don't
want a write to b to clobber a and c (yes yes, I know you could still
construct a far-out case in which a and b might be stored together, and
indeed that is a legitimate separate discussion).
>>>>> "mike" == mike stump <[email protected]> writes:
>> From: [email protected] To: [email protected], [email protected],
>> [email protected] Cc: [email protected],
>> [email protected], [email protected],
>> [email protected] Date: Tue, 8 Jan 2002 21:13:43 -0500 (EST)
>> Yes, of course! No one disagrees. I am talking about *LOADS* not
>> stores, your example is 100% irrelevant to my point, since it does
>> stores.
mike> Ok, in the bodies of those, put in
mike> j1=c1;
mike> j2=c2;
mike> j3=c3;
mike> With new definitions for j1, j2 and k3 as being volatile.
mike> Accesses are volatile:
mike> [#2] Accessing a volatile object, modifying an object,
mike> modifying a file, or calling a function that does any of those
mike> operations are all side effects
mike> So, I would claim that the case is symetric with writing
mike> volatiles. If the standard doesn't make a distinction for
mike> write v read, then you can't and claim that distinction is
mike> based in the standard. If you claim the standard does make a
mike> distinction, please point it out, I am unaware of it.
Ah... so (paraphrasing) -- if you have two byte size volatile objects,
and they happen to end up adjacent in memory, the compiler is
explicitly forbidden from turning an access to one of them into a
wider access -- because that would be an access to both of them, which
is a *different* side effect. (Certainly that exactly matches the
hardware-centric view of why "volatile" exists.) And the compiler
isn't allowed to change side effects, including causing them when the
source code didn't ask you to cause them.
So that says that you are indeed entitled to expect the compiler not
to enlarge volatile accesses, neither loads nor stores. Or, if you
want to be *really* paranoid, you can at least enforce that by
"wrapping" the volatile object, if necessary, with two more volatile
objects that are +1 and -1 address unit away from the one you care
about. And if the compiler then generates a larger reference, that's
a compiler bug.
Nice.
paul
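A sketch of the "wrapping" idea just described (illustration only; a struct is
used here to force the three bytes to be adjacent, which separate file-scope
objects would not guarantee):

    struct wrapped_reg {
        volatile unsigned char guard_lo;   /* volatile neighbour at -1 */
        volatile unsigned char reg;        /* the object we care about */
        volatile unsigned char guard_hi;   /* volatile neighbour at +1 */
    };

    unsigned char read_reg(struct wrapped_reg *w)
    {
        /* a widened load would also access a guard, i.e. produce an
           extra volatile access the source never asked for */
        return w->reg;
    }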
<<Ah... so (paraphrasing) -- if you have two byte size volatile objects,
and they happen to end up adjacent in memory, the compiler is
explicitly forbidden from turning an access to one of them into a
wider access -- because that would be an access to both of them, which
is a *different* side effect. (Certainly that exactly matches the
hardware-centric view of why "volatile" exists.) And the compiler
isn't allowed to change side effects, including causing them when the
source code didn't ask you to cause them.
>>
Right, and as you see that is covered by the language on external effects
in the Ada standard (remember the intent in Ada was to exactly match the
C rules :-)
But one thing in the Ada world that we consider left open is whether a
compiler is free to combine two volatile loads into a single load. Probably
the answer should be no, but the language at least in the Ada standard does
not seem strong enough to say this.
On Wed, Jan 09, 2002 at 08:47:15PM +0100, Gérard Roudier wrote:
> On Tue, 8 Jan 2002, J.A. Magallon wrote:
>
> There is nothing in this code that requires the compiler to allocate
> memory for 'b'. You just invent the volatile constant concept. :)
What's so strange about volatile constant? In C, const means you're
not allowed to modify something--not that it won't change. A read-only
hardware register, for instance, would be const and volatile.
Jeroen
>>>>> "dewar" == dewar <[email protected]> writes:
> <<Ah... so (paraphrasing) -- if you have two byte size
> volatile objects, and they happen to end up adjacent in
> memory, the compiler is explicitly forbidden from turning an
> access to one of them into a wider access -- because that
> would be an access to both of them, which is a *different*
> side effect. (Certainly that exactly matches the
> hardware-centric view of why "volatile" exists.) And the
> compiler isn't allowed to change side effects, including
> causing them when the source code didn't ask you to cause
> them.
dewar> Right, and as you see that is covered by the language on
dewar> external effects in the Ada standard (remember the intent in
dewar> Ada was to exactly match the C rules :-)
dewar> But one thing in the Ada world that we consider left open is
dewar> whether a compiler is free to combine two volatile loads into
dewar> a single load. Probably the answer should be no, but the
dewar> language at least in the Ada standard does not seem strong
dewar> enough to say this.
Would ordering rules help answer that? If you write two separate
loads you have two separate side effects that are ordered in time,
while for a single big load they occur concurrently. If the construct
where those two loads occur does not allow for side effects to be
interleaved, then the "as if" principle seems to say you cannot
legally merge the loads.
paul
<<Would ordering rules help answer that? If you write two separate
loads you have two separate side effects that are ordered in time,
while for a single big load they occur concurrently. If the construct
where those two loads occur does not allow for side effects to be
interleaved, then the "as if" principle seems to say you cannot
legally merge the loads.
>>
Yes maybe, but it's not air tight :-)
On Wed, 9 Jan 2002, Bernard Dautrevaux wrote:
> > <<Especially if there are cases were this optimization yields
> > a slower =
> >
> > access (or even worse indirect bugs).
> > E.g. if the referenced "volatile short" is a hardware register and the
> > access is multiplexed over a slow 8 bit bus. There are
> > embedded systems
> > around where this is the case and the (cross-)compiler has no way to
> > know this (except it can be told by the programmer).
> > >>
> >
> > Well that of course is a situation where the compiler is
> > being deliberately
> > misinformed as to the relative costs of various machine
> > instructions, and
> > that is definitely a problem. One can even imagine hardware
> > (not such a hard
> > feat, one of our customers had such hardware) where a word
> > access works, but
> > a byte access fails due to hardware shortcuts,
>
> Tht's quite often the case with MMIO, and the only portable way to give a
> hint to the compiler that it should refrain from optimizing is "volatile";
> that's why I think the compiler should not do this optimization on volatile
> objects at all.
The C programming language and MMIO are two different things that must not
be mixed.
'volatile' is not a portable enough concept to deal with MMIO. You also
need to say what is atomic and what is not atomic regarding
accesses through the involved BUS, for example. As a result, you may use
'volatile' to avoid assembly for MMIO, but you want to put the code in
architecture-dependent code.
The atomicity issue also applies to memory-like access obviously, but the
compiler is expected to know about the architecture capabilities. If this
may differ for a given architecture family, then some compiler options
should be made available.
Gérard.
On 20020109 Gérard Roudier wrote:
>
>
>On Tue, 8 Jan 2002, J.A. Magallon wrote:
>
>>
>> On 20020107 jtv wrote:
>> >
>> >Let's say we have this simplified version of the problem:
>> >
>> > int a = 3;
>> > {
>> > volatile int b = 10;
>>
>> >>>>>>>>> here b changes
>
>Hmmm...
>Then your hardware is probably broken or may-be you are dreaming. :-)
>
>There is nothing in this code that requires the compiler to allocate
>memory for 'b'. You just invent the volatile constant concept. :)
>
>> > a += b;
>> > }
>> >
>> >Is there really language in the Standard preventing the compiler from
>> >constant-folding this code to "int a = 13;"?
>
Grrr. I really do not know why people are making so much noise about volatile.
Don't look for esoteric meanings; it is just 'don't suppose ANYTHING
about this memory location, it CAN CHANGE apart from anything you can
see or guess'.
Even
int b;
volatile const int a = 5;
b = a - a;
can not be optimized to
b = 0;
Why? Write it in some kind of ideal assembler:
dec stack-ptr to make room for b
dec stack-ptr to make room for a
put 5 in a
push a
push a (again)
sub
pop b
And a CAN change between the two pushes. It is not the responsibility of the
compiler to try to be cleverer than the programmer; if the volatile is there,
it is for a reason (it can be a hardware-mapped register like a DAC,
or who knows).
In the (in)famous example above the compiler should not convert to a += 13.
If it is expected to do it, then the (in)famous one is the programmer who
put a volatile on a local variable. Usually a mapped register would
be a 'volatile extern int my_reg'.
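As a small illustration of that last point (the name my_reg and its placement
are hypothetical; in practice the definition or a linker script would pin it
to the device address):

    extern volatile int my_reg;      /* memory-mapped device register */

    int poll_reg(void)
    {
        return my_reg;               /* every call performs a real load */
    }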
--
J.A. Magallon # Let the source be with you...
mailto:[email protected]
Mandrake Linux release 8.2 (Cooker) for i586
Linux werewolf 2.4.18-pre2-beo #2 SMP Wed Jan 9 02:23:27 CET 2002 i686
>Even
>
>int b;
>volatile const int a = 5;
>b = a - a;
>
>can not be optimized to
>
>b = 0;
Until you define the scope of the variables, you can't make that
assertion. If the code is:
int b;
volatile const a=5;
void stuff()
{
b = a - a;
}
I can see how a can change in the midst of the execution since
some other code has access to a, given its global scope.
If the code is:
int b;
void stuff()
{
volatile const a=5;
b = a - a;
}
Then the code can be optimized to 'b = 0;' since nowhere in the scope
of 'a' does anyone take its address(which would allow it to be changed).
Of course if 'a' is external then another function can access its
address.
>And a CAN change between the two pushes. It is not resposibility of the
>compiler to try to be clever than the programmer, if the volatile is there
>is for a reason (it can be a hardware mapped register like a DAC,
>or who knows).
>In the (in)famous example above the compiler should not convert to a += 13.
>If it is expected to do it, then the (in)famous is the programmer who
>put a volatile in a local variable. Usually a mapped register would
>be a 'volatile extern int my_reg'.
I don't think that anyone is arguing about that case *if so stated*.
What they are arguing about is the following code(and literally!):
void foo()
{
int a=3;
{
volatile int b=10;
a += b;
}
bar(a);
}
can *by inspection* be modified into:
void foo()
{
bar(13);
}
Since:
1) a and b's scope are limited to the function, so they will reside on
the stack.
2) No one takes the address of b, so there is no way for any external
hardware/thread to modify b.
For the above code, I see no reason for not optimizing out 'b' (and for
that matter 'a'). If others want to argue about volatile variables,
generate a *realistic* testcase where the possibility for modification
can be inferred, and then pitch your case.
--
Peter Barada [email protected]
Wizard 781-852-2768 (direct)
WaveMark Solutions(wholly owned by Motorola) 781-270-0193 (fax)
<<Grrr. I really do not know why people is making so noise about volatile.
Don't look for esoteric meanings, it is just 'don't suppose ANYTHING
about this memory location, it CAN CNAHGE apart from anything you can
see or guess'.
>>
Nope, that's not enough, it's not that simple. Yes your
example of a-a is of course straightforward but what about
b = (a & 1) | (b & 1);
if a is volatile and b is known to be odd, can the read of a be eliminated?
The answer should be no (and I think the standard guarantees this), but the
reasoning is completely different from thinking about the fact that a may
change unexpectedly, since obviously no matter what value comes from
reading a, b will be set to 1 if b is known to be odd.
Of course you can provide a pragmatic justification, think of a as a hardware
counter that counts the number of times it is referenced, then this should
count as a reference, even though on an as-if basis b would have the same
value.
The Ada standard says it clearly: a read or write of a volatile variable
is an external effect. period.
Peter said
<<Then the code can be optimized to 'b = 0;' since nowhere in the scope
of 'a' does anyone take its address(which would allow it to be changed).
Of course if 'a' is external then another function can access its
address.
>>
Well if nothing else this shows that there is still significant disagreement.
I consider Peter's statement to be 100% wrong here; optimizing away the
accesses to 'a' would be a clear violation of the standard. I thought that
the argument had been made in a clear and convincing manner, but apparently
some people completely refuse to be convinced!
On 09-Jan-2002, Peter Barada <[email protected]> wrote:
>
> If the code is:
>
> int b;
> void stuff()
> {
> volatile const a=5;
>
> b = a - a;
> }
>
> Then the code can be optimized to 'b = 0;'
No, you're wrong here. That would violate the following provisions of
the C99 standard, because the two accesses to `a' would not have occurred.
(It would also violate similar provisions of the C89 and C++ standards.)
The "as if" rule -- which is stated in 5.1.2.3 [#3] in C99 -- is explicitly
defined to NOT allow optimizing away accesses to volatile objects.
| 5.1.2.3 Program execution
...
| [#2] Accessing a volatile object, modifying an object,
| modifying a file, or calling a function that does any of
| those operations are all side effects
...
| [#3] In the abstract machine, all expressions are evaluated
| as specified by the semantics. An actual implementation
| need not evaluate part of an expression if it can deduce
| that its value is not used and that no needed side effects
| are produced (including any caused by calling a function or
| accessing a volatile object).
...
| [#5] The least requirements on a conforming implementation
| are:
|
| -- At sequence points, volatile objects are stable in the
| sense that previous accesses are complete and
| subsequent accesses have not yet occurred.
<< | [#5] The least requirements on a conforming implementation
| are:
|
| -- At sequence points, volatile objects are stable in the
| sense that previous accesses are complete and
| subsequent accesses have not yet occurred.
>>
Note that this particular requirement is much laxer than that in the
Ada standard, since it is specialized to sequence points, and would
appear to allow reordering of accesses between sequence points.
> Nope, that's not enough, it's not that simple. Yes your
> example of a-a is of course straightforward but what about
>
> b = (a & 1) | (b & 1);
>
> if a is volatile and b is known to be odd, can the read of a be eliminated?
> The answer should be no (and I think the standard guarantees this), but the
> reasoning is completely different from thinking about the fact that a may
> change unexpectedly, since obviously no matter what value comes from
> reading a, b will be set to 1 if b is known to be odd.
But presumably the transformation to:
a; b = 1;
is ok, where 'a;' represents the operation "read the value of a into a
register and then let that register die".
a; b = 1;
yes, that's a permissible transformation, and it is interesting to note
that the normally useless C statement
a;
has a useful meaning for volatile variables :-)
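A small illustration of that point (status_reg is a made-up name for some
memory-mapped register):

    volatile unsigned int status_reg;    /* hypothetical device register */

    void ack_by_reading(void)
    {
        status_reg;    /* not useless: forces one volatile load,
                          the value is simply discarded */
    }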
> -----Original Message-----
> From: Paul Koning [mailto:[email protected]]
> Sent: Wednesday, January 09, 2002 10:44 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH] C undefined behavior fix
>
>
> >>>>> "dewar" == dewar <[email protected]> writes:
>
> > <<Ah... so (paraphrasing) -- if you have two byte size
> > volatile objects, and they happen to end up adjacent in
> > memory, the compiler is explicitly forbidden from turning an
> > access to one of them into a wider access -- because that
> > would be an access to both of them, which is a *different*
> > side effect. (Certainly that exactly matches the
> > hardware-centric view of why "volatile" exists.) And the
> > compiler isn't allowed to change side effects, including
> > causing them when the source code didn't ask you to cause
> > them.
>
> dewar> Right, and as you see that is covered by the language on
> dewar> external effects in the Ada standard (remember the intent in
> dewar> Ada was to exactly match the C rules :-)
>
> dewar> But one thing in the Ada world that we consider left open is
> dewar> whether a compiler is free to combine two volatile loads into
> dewar> a single load. Probably the answer should be no, but the
> dewar> language at least in the Ada standard does not seem strong
> dewar> enough to say this.
>
> Would ordering rules help answer that? If you write two separate
> loads you have two separate side effects that are ordered in time,
> while for a single big load they occur concurrently. If the construct
> where those two loads occur does not allow for side effects to be
> interleaved, then the "as if" principle seems to say you cannot
> legally merge the loads.
Of course ordering rules must be obeyed, and side effects cannot be moved
across sequence points. Thus if the two volatile loads are in separate
statements, as in:
int x1, x2;
volatile unsigned char v1, v2;
x1 = v1;
x2 = v2;
Then you're NOT allowed to combine them. However, if there is no sequence
point between them, the compiler is free to do it (although I'm not sure it
will...) as in e.g.:
x1 = (v1 << 8) + v2;
Note that this is not too much of a problem for system programming, as you
have a way to be sure they are not combined: just use intermediate variables
and set them separately; the nice thing there is that as you use these
intermediate variables just once, the compiler will eliminate them. But be
careful: the sequence point MUST BE RETAINED, and then the two loads cannot
be combined (in case 1 of course).
Bernard
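A sketch of the intermediate-variable technique described above (my
illustration; v1 and v2 stand for the volatile byte registers of the earlier
example):

    volatile unsigned char v1, v2;

    unsigned int read_both(void)
    {
        unsigned int hi = v1;     /* first volatile load                */
        unsigned int lo = v2;     /* sequence point in between, so the  */
        return (hi << 8) + lo;    /* two loads cannot be combined       */
    }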
On 9 Jan 2002, at 19:19, Peter Barada wrote:
>
> >Even
> >
> >int b;
> >volatile const int a = 5;
> >b = a - a;
> >
> >can not be optimized to
> >
> >b = 0;
>
> Until you define the scope of the variables, you can't make that
> assertion. If the code is:
>
> int b;
> volatile const a=5;
> void stuff()
> {
> b = a - a;
> }
>
> I can see how a can change in the midst of the execution since
> some other code has access to a since its global scope.
>
> If the code is:
>
> int b;
> void stuff()
> {
> volatile const a=5;
>
> b = a - a;
> }
>
> Then the code can be optimized to 'b = 0;' since nowhere in the scope
> of 'a' does anyone take its address(which would allow it to be changed).
I could have hardware attached directly to the data bus (a hardware
debugger for instance) that watches for the value 5 to appear. This is
beyond the knowledge of the compiler.
> 2) No one takes the address of b, so there is no way for any external
> hardware/thread to modify b.
see above. You don't need to take the address to catch the variable
access.
MSB
--
Who is this General Failure,
and why is he reading my disk ?
On Thu, Jan 10, 2002 at 10:03:42AM +0100, Bernard Dautrevaux wrote:
[snip]
> Of course ordering rules must be obeyed, and side effects cannot be moved
> across sequence points. Thus if the two volatile loads are in separate
> instructions, as in:
[snip]
Sorry if I'm rude, but is this discussion really going anywhere, and
is it really necessary to have it on lkml?! The signal/noise ratio is low
enough as it is.
Instead of arguing about possible interpretations of the C standard, why
not do some real C programming instead...
Regards: David Weinehall
_ _
// David Weinehall <[email protected]> /> Northern lights wander \\
// Maintainer of the v2.0 kernel // Dance across the winter sky //
\> http://www.acc.umu.se/~tao/ </ Full colour fire </
In message <20020109204043.T1027-100000@gerard>, Gérard Roudier wrote:
>On Tue, 8 Jan 2002, J.A. Magallon wrote:
>> On 20020107 jtv wrote:
>> >
>> >Let's say we have this simplified version of the problem:
>> >
>> > int a = 3;
>> > {
>> > volatile int b = 10;
>>
>> >>>>>>>>> here b changes
>
>Hmmm...
>Then your hardware is probably broken or may-be you are dreaming. :-)
No, it may be e.g. an Rx/Tx register in a standard UART chip, which is
memory-mapped at some address. Reading and writing the same address then
actually reaches different registers in the hardware. Reading it twice
may return different values.
You would actually access this register through a "volatile char *"
but that's not a significant difference to the concept of "volatile",
so I see the above as a simple example (and not as an optimizer-guru's
discussion subject "can we optimize just this volatile expression in
that context").
>There is nothing in this code that requires the compiler to allocate
>memory for 'b'. You just invent the volatile constant concept. :)
Nope, this existed before as [email protected] explained correctly.
Bernd
--
Bernd Petrovitsch Email : [email protected]
g.a.m.s gmbh Fax : +43 1 205255-900
Prinz-Eugen-Straße 8 A-1040 Vienna/Austria/Europe
LUGA : http://www.luga.at
<<Note that this is not too much of a problem for system programming, as you
have a way to be sure they are not combined: just use intermediate variables
and set them separately; the nice thing there is that as you use these
intermediate variables just once, the compiler will eliminate them. But be
careful: the sequence point MUST BE RETAINED, and then the two loads cannot
be combined (in case 1 of course).
>>
Of course we all understand that sequence points must be retained, but this
is a weak condition compared to the rule that all loads and stores for
volatile variables must not be resequenced, and in particular, you seem to
agree that two loads *can* be combined if they both appear between two
sequence points. I think that's unfortunate, and it is why in Ada we
adopted a stricter point of view that avoids the notion of sequence points.
It even seems that if you have two stores between two sequence points then
the compiler is free to omit one, and again that seems the wrong decision
for the case of volatile variables. If it can omit a store in this way, can
it omit a load, i.e. if we have:
x := v - v;
can someone read the sequence point rule to mean that the compiler is
free to do only one load here? I hope not, but we have already seen how
much confusion there is on this point.
On Thu, Jan 10, 2002 at 07:18:45AM -0500, [email protected] wrote:
> <<Note that this is not too much of a problem for system programming, as you
> have a way to be sure they are not combined: just use intermediate variables
> and set them separately; the nice thing there is that as you use these
> intermediate variables just once, the compiler will eliminate them. But be
> careful: the sequence point MUST BE RETAINED, and then the two loads cannot
> be combined (in case 1 of course).
> >>
>
> Of course we all understand that sequence points must be retained, but this
> is a weak condition compared to the rule that all loads and stores for
> volatile variables must not be resequenced, and in particular, you seem to
> agree that two loads *can* be combined if they both appear between two
> sequence points. I think that's unfortunate, and it is why in Ada we
> adopted a stricter point of view that avoids the notion of sequence points.
>
> It even seems that if you have two stores between two sequence points then
> the compiler is free to omit one, and again that seems the wrong decision
> for the case of volatile variables. If it can omit a store in this way, can
> it omit a load, i.e. if we have:
>
> x := v - v;
>
> can someone read the sequence point rule to mean that the compiler is
> free to do only one load here? I hope not, but we have already seen how
> much confusion there is on this point.
The compiler is not free to do only one load there if v is declared
volatile.
The relevant part of the C standard would be the following paragraph:
[6.7.3]
[#6] An object that has volatile-qualified type may be
modified in ways unknown to the implementation or have other
unknown side effects. Therefore any expression referring to
such an object shall be evaluated strictly according to the
rules of the abstract machine, as described in 5.1.2.3.
Furthermore, at every sequence point the value last stored
in the object shall agree with that prescribed by the
abstract machine, except as modified by the unknown factors
mentioned previously.105) What constitutes an access to an
object that has volatile-qualified type is implementation-
defined.
footnote 105:
105) A volatile declaration may be used to describe an object
corresponding to a memory-mapped input/output port or an
object accessed by an asynchronously interrupting
function. Actions on objects so declared shall not be
``optimized out'' by an implementation or reordered
except as permitted by the rules for evaluating
expressions.
[Quotes taken from a draft of the final standard, so it is possible,
but unlikely, that this was changed for the final standard.]
--
<Insert your favourite quote here.>
Erik Trulsson
[email protected]
Erik Trulsson <[email protected]> writes:
> The compiler is not free to do only one load there if v is declared
> volatile.
> The relevant part of the C standard would be the following paragraph:
>
> [6.7.3]
> [#6] An object that has volatile-qualified type may be
> modified in ways unknown to the implementation or have other
> unknown side effects. Therefore any expression referring to
> such an object shall be evaluated strictly according to the
> rules of the abstract machine, as described in 5.1.2.3.
> Furthermore, at every sequence point the value last stored
> in the object shall agree with that prescribed by the
> abstract machine, except as modified by the unknown factors
> mentioned previously.105) What constitutes an access to an
> object that has volatile-qualified type is implementation-
> defined.
And 5.1.2.3 states:
[#2] Accessing a volatile object, modifying an object,
modifying a file, or calling a function that does any of
those operations are all side effects,11) which are changes
in the state of the execution environment. Evaluation of an
expression may produce side effects. At certain specified
points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and
no side effects of subsequent evaluations shall have taken
place. (A summary of the sequence points is given in annex
C.)
So it seems to be obvious ;-) that the compiler must not remove
seemingly unnecessary references to volatile objects.
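Spelled out as a sketch (v here is just an illustrative register):

    volatile int v;   /* may change between reads, unknown to the compiler */
    int b;

    void f(void)
    {
        /* Two accesses to a volatile object: both loads are needed
           side effects, so neither may be dropped, and this must not
           be folded into "b = 0;".  (The two loads may of course
           return different values.)                                  */
        b = v - v;
    }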
>No, you're wrong here. That would violate the following provisions of
>the C99 standard, because the two accesses to `a' would not have occurred.
>(It would also violate similar provisions of the C89 and C++ standards.)
>The "as if" rule -- which is stated in 5.1.2.3 [#3] in C99 -- is explicitly
>defined to NOT allow optimizing away accesses to volatile objects.
>
> | [#3] In the abstract machine, all expressions are evaluated
> | as specified by the semantics. An actual implementation
> | need not evaluate part of an expression if it can deduce
> | that its value is not used and that no needed side effects
> | are produced (including any caused by calling a function or
> | accessing a volatile object).
Ahh, I see now, said the blind man... When I read over this, I homed in
on "An actual implementation need not evaluate part of an expression
if it can deduce that its value is not used", and conveniently
skipped the rest of the sentence.
My Bad.
--
Peter Barada [email protected]
Wizard 781-852-2768 (direct)
WaveMark Solutions(wholly owned by Motorola) 781-270-0193 (fax)
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Thursday, January 10, 2002 1:19 PM
> To: [email protected]; [email protected];
> [email protected]
> Cc: [email protected]; [email protected]; [email protected]
> Subject: RE: [PATCH] C undefined behavior fix
>
>
> <<Note that this is not too much of a problem for system
> programming, as you
> have a way to be sure they are not combined: just use
> intermediate variables
> and set them separately; the nice thing there is that as you use these
> intermediate variables just once, the compiler will eliminate
> them. But be
> careful: the sequence point MUST BE RETAINED, and then the
> two loads cannot
> be combined (in case 1 of course).
> >>
>
> Of course we all understand that sequence points must be
> retained, but this
> is a weak condition compared to the rule that all loads and stores for
> volatile variables must not be resequenced, and in
> particular, you seem to
> agree that two loads *can* be combined if they both appear between two
> sequence points. I think that's unfortunate, and it is why in Ada we
> adopted a stricter point of view that avoids the notion of
> sequence points.
What I said was that the C standard seems to allow two accesses to two
adjacent volatile variables with no intervening sequence point to be replaced
by one access to both variables together; and I think I was clear that,
IMHO, this should be avoided for volatile to be useful for system
programming.
My other remark was that by inserting a sequence point you solve this
possible problem without the need for any extension or specific
implementation-defined behaviour.
Note that IMO volatile should be obeyed in C much as Volatile/Atomic is
in Ada; that is, do exactly the access requested, no more, no less, with the
requested size.
>
> It even seems that if you have two stores between two
> sequence points then
> the compiler is free to omit one, and again that seems the
> wrong decision
> for the case of volatile variables. If it can omit a store in
> this way, can
> it omit a load, i.e. if we have:
>
> x := v - v;
>
> can someone read the sequence point rule to mean that the compiler is
> free to do only one load here? I hope not, but we have
> already seen how
> much confusion there is on this point.
Oh no; that would omit one side effect on v. What I said is that if we have
something like:
union {
struct {
volatile unsigned char x1;
volatile unsigned char x2;
} X;
volatile unsigned short Y;
} u;
the compiler could be allowed to handle ((u.X.x1 << 8) | u.X.x2) as if
we've written (u.Y), on a big-endian architecture of course. However, saying
that it seems standard-compliant does not mean we should do it in GCC; in
fact my whole argument is that I would like us to explicitly decide NOT to do
this or any other kind of wider/shorter access to volatile variables, as the
compiler does for now.
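To make the two forms in question explicit (same union u as above; this is
only what a reading of the standard might permit, not something GCC is known
to do):

    unsigned short r;

    /* Source form: two byte-wide volatile loads with no sequence
       point between them.                                          */
    r = (u.X.x1 << 8) | u.X.x2;

    /* What a compiler might argue it may emit on a big-endian
       target: a single 16-bit load covering the same two bytes.    */
    r = u.Y;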
Note that in fact it seems that while larger loads may be standard-compliant
if the compiler ensures it is not reading another volatile object, shorter
writes, such as those that gcc-2.95.3 generates, seem to be a genuine bug. I
will submit a bug report if anybody can confirm that the problem I already
mentioned here is present in 3.0 (or when I have time to install 3.0 and
test).
Bernard
PS: perhaps, as someone suggested, this thread could be limited to the gcc ML
instead of both GCC (which I subscribe to) and LKML (which I don't).
--------------------------------------------
Bernard Dautrevaux
Microprocess Ingenierie
97 bis, rue de Colombes
92400 COURBEVOIE
FRANCE
Tel: +33 (0) 1 47 68 80 80
Fax: +33 (0) 1 47 88 97 85
e-mail: [email protected]
[email protected]
--------------------------------------------
Bernard Dautrevaux <[email protected]> said:
[...]
> So at least for the first test, gcc-3.1 generates the same (annoying) code as
> 2.95.3. I'm quite sure this is legal, as I can't see in the standard whether,
> when writing:
>
> volatile unsigned int x:8;
>
> I define:
> 1) a volatile 8-bit field to be interpreted as an unsigned int.
> 2) an 8-bit field which is part of a volatile unsigned int.
If the whole is volatile (x must be inside a struct), make the whole struct volatile.
Sounds quite natural to me...
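Something like this, i.e. (register layout invented for illustration):

    /* Field-level volatile, as in the quoted example: */
    struct reg_a {
        volatile unsigned int x:8;
        unsigned int pad:24;
    };

    /* Object-level volatile -- mark the whole struct volatile: */
    struct reg_b {
        unsigned int x:8;
        unsigned int pad:24;
    };
    volatile struct reg_b reg;   /* every access to reg.x is then volatile */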
--
Horst von Brand http://counter.li.org # 22616