include/linux/prefetch.h does a strange thing: if the arch doesn't have
the prefectch functions, this header defines no-op version of them and
then defines ARCH_HAS_PREFETCH. So there's no way for mainline code to
know if the architecture *really* has prefetch instructions.
This information loss is unfortunate. Examples:
for (i = 0; i < N; i++)
prefetch(foo[i]);
Problem is, if `prefetch' is a no-op, the compiler will still
generate an empty busy-wait loop. Which it must do. We need to
know the truth about ARCH_HAS_PREFETCH to correctly elide that loop.
#ifdef ARCH_HAS_PREFETCH
#define prefetch_prev_lru_page(_page, _base, _field) \
do { \
if ((_page)->lru.prev != _base) { \
struct page *prev; \
\
prev = list_entry(_page->lru.prev, \
struct page, lru); \
prefetch(&prev->_field); \
} \
} while (0)
#else
#define prefetch_prev_lru_page(_page, _base, _field) do { } while (0)
#endif
Which needs a working ARCH_HAS_PREFETCH to avoid probable extra code
generation on CPUs which don't have prefetch.
prefetch.h | 3 ---
1 files changed, 3 deletions(-)
--- 2.5.31/include/linux/prefetch.h~arch_has_prefetchw Sun Aug 11 00:20:08 2002
+++ 2.5.31-akpm/include/linux/prefetch.h Sun Aug 11 00:20:08 2002
@@ -39,17 +39,14 @@
*/
#ifndef ARCH_HAS_PREFETCH
-#define ARCH_HAS_PREFETCH
static inline void prefetch(const void *x) {;}
#endif
#ifndef ARCH_HAS_PREFETCHW
-#define ARCH_HAS_PREFETCHW
static inline void prefetchw(const void *x) {;}
#endif
#ifndef ARCH_HAS_SPINLOCK_PREFETCH
-#define ARCH_HAS_SPINLOCK_PREFETCH
#define spin_lock_prefetch(x) prefetchw(x)
#endif
.
Andrew Morton <[email protected]> writes:
> Which needs a working ARCH_HAS_PREFETCH to avoid probable extra code
> generation on CPUs which don't have prefetch.
When you use gcc 3.1+ you can use __builtin_prefetch() and gcc takes care of
it. See asm-x86_64/prefetch.h of a working example.
Of course generic C code should not use the ugly builtins directly, but
it could be used to define the wrappers.
-Andi
On Sun, 2002-08-11 at 08:38, Andrew Morton wrote:
> This information loss is unfortunate. Examples:
>
> for (i = 0; i < N; i++)
> prefetch(foo[i]);
>
> Problem is, if `prefetch' is a no-op, the compiler will still
> generate an empty busy-wait loop. Which it must do.
Why - nothing there is volatile
Alan Cox wrote:
>
> On Sun, 2002-08-11 at 08:38, Andrew Morton wrote:
> > This information loss is unfortunate. Examples:
> >
> > for (i = 0; i < N; i++)
> > prefetch(foo[i]);
> >
> > Problem is, if `prefetch' is a no-op, the compiler will still
> > generate an empty busy-wait loop. Which it must do.
>
> Why - nothing there is volatile
Because the compiler sees:
for (i = 0; i < N; i++)
;
and it says "ah ha. A busy wait delay loop" and leaves it alone.
It's actually a special-case inside the compiler to not optimise
away such constructs.
On Sun, Aug 11, 2002 at 11:47:22AM -0700, Andrew Morton wrote:
> Alan Cox wrote:
> >
> > On Sun, 2002-08-11 at 08:38, Andrew Morton wrote:
> > > This information loss is unfortunate. Examples:
> > >
> > > for (i = 0; i < N; i++)
> > > prefetch(foo[i]);
> > >
> > > Problem is, if `prefetch' is a no-op, the compiler will still
> > > generate an empty busy-wait loop. Which it must do.
> >
> > Why - nothing there is volatile
>
> Because the compiler sees:
>
> for (i = 0; i < N; i++)
> ;
>
> and it says "ah ha. A busy wait delay loop" and leaves it alone.
>
> It's actually a special-case inside the compiler to not optimise
> away such constructs.
Good lord. If anyone depends all versions of GCC to not optimize this away,
they are going to hate life. Since GCC doesn't seem to do cross file
optimization (does it?) I've found the following works well:
cat > use_result.c
int dummy; // can't be static, the compiler will see it's not read
use_result(int i)
{
dummy = i;
}
^D
for (i = 0; i < N; ++i) use_result(i);
I'm positive we do stuff like this in LMbench, which is fairly well supported
on a pile of different platforms/compilers.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
On Sun, 11 Aug 2002, Andrew Morton wrote:
>
> It's actually a special-case inside the compiler to not optimise
> away such constructs.
I thought that special case was removed long ago, because it is untenable
in C++ etc (where such empty loops happen due to various abstraction
issues, and not optimizing them away is just silly).
But testing shows that you're right at least for 2.95 and 2.96. Argh
Linus
Linus Torvalds wrote:
> I thought that special case was removed long ago, because it is untenable
> in C++ etc (where such empty loops happen due to various abstraction
> issues, and not optimizing them away is just silly).
>
> But testing shows that you're right at least for 2.95 and 2.96. Argh
Unbelievably, 3.1 doesn't remove empty loops either.
I think there's a case for a compiler flag, `-fremove-empty-loops'.
Empty loop delays aren't portable acrosss compilers in general. If you
_really_ want an empty loop that must always work with GCC, it's easy
enough to write:
for (i = 0; i < 100000; i++)
__asm__ __volatile__ ("");
-- Jamie
On Sun, Aug 11, 2002 at 09:07:18PM +0100, Jamie Lokier wrote:
> Unbelievably, 3.1 doesn't remove empty loops either.
> I think there's a case for a compiler flag, `-fremove-empty-loops'.
Indeed ... It's sad having to scatter ifdefs over the code just because
gcc lacks this optimization ...
Ralf
On Sun, 11 Aug 2002, Andrew Morton wrote:
> Alan Cox wrote:
> >
> > On Sun, 2002-08-11 at 08:38, Andrew Morton wrote:
> > > This information loss is unfortunate. Examples:
> > >
> > > for (i = 0; i < N; i++)
> > > prefetch(foo[i]);
> > >
> > > Problem is, if `prefetch' is a no-op, the compiler will still
> > > generate an empty busy-wait loop. Which it must do.
> >
> > Why - nothing there is volatile
>
> Because the compiler sees:
>
> for (i = 0; i < N; i++)
> ;
>
> and it says "ah ha. A busy wait delay loop" and leaves it alone.
>
> It's actually a special-case inside the compiler to not optimise
> away such constructs.
Why is this a special case? As long as a compiler can't prove that the
computed value of i isn't used later it mustn't optimize it away.
Kernighan/Ritchie (German translation of the second edition) contains the
following example program that shows why the compiler mustn't optimize it
away:
<-- snip -->
#include <stdio.h>
main()
{
double nc;
for (nc = 0; getchar() != EOF; ++nc)
;
printf("%.0f\n", nc);
}
<-- snip -->
cu
Adrian
--
You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox
Followup to: <Pine.NEB.4.44.0208132322340.1351-100000@mimas.fachschaften.tu-muenchen.de>
By author: Adrian Bunk <[email protected]>
In newsgroup: linux.dev.kernel
> >
> > Because the compiler sees:
> >
> > for (i = 0; i < N; i++)
> > ;
> >
> > and it says "ah ha. A busy wait delay loop" and leaves it alone.
> >
> > It's actually a special-case inside the compiler to not optimise
> > away such constructs.
>
> Why is this a special case? As long as a compiler can't prove that the
> computed value of i isn't used later it mustn't optimize it away.
>
Bullsh*t. It can legitimately transform it into:
i = N;
> Kernighan/Ritchie (German translation of the second edition) contains the
> following example program that shows why the compiler mustn't optimize it
> away:
>
> <-- snip -->
>
> #include <stdio.h>
>
> main()
> {
> double nc;
>
> for (nc = 0; getchar() != EOF; ++nc)
> ;
> printf("%.0f\n", nc);
>
> }
>
getchar() has side effects.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>
On 13 Aug 2002, H. Peter Anvin wrote:
>...
> > > Because the compiler sees:
> > >
> > > for (i = 0; i < N; i++)
> > > ;
> > >
> > > and it says "ah ha. A busy wait delay loop" and leaves it alone.
> > >
> > > It's actually a special-case inside the compiler to not optimise
> > > away such constructs.
> >
> > Why is this a special case? As long as a compiler can't prove that the
> > computed value of i isn't used later it mustn't optimize it away.
>
> Bullsh*t. It can legitimately transform it into:
>
> i = N;
>...
Ah, my misunderstanding:
"optimize away" didn't mean "completely remove it" but "transform it to
something semantically equivalent".
> -hpa
Thanks
Adrian
--
You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox
Adrian Bunk wrote:
>
> Ah, my misunderstanding:
> "optimize away" didn't mean "completely remove it" but "transform it to
> something semantically equivalent".
>
The thing is that gcc has a special rule that keeps it from optimizing
it into something with less than O(n) time.
-=hpa
On Tue, Aug 13, 2002 at 03:12:53PM -0700, H. Peter Anvin wrote:
> Followup to: <Pine.NEB.4.44.0208132322340.1351-100000@mimas.fachschaften.tu-muenchen.de>
> By author: Adrian Bunk <[email protected]>
> In newsgroup: linux.dev.kernel
> > >
> > > Because the compiler sees:
> > >
> > > for (i = 0; i < N; i++)
> > > ;
> > >
> > > and it says "ah ha. A busy wait delay loop" and leaves it alone.
> > >
> > > It's actually a special-case inside the compiler to not optimise
> > > away such constructs.
> >
> > Why is this a special case? As long as a compiler can't prove that the
> > computed value of i isn't used later it mustn't optimize it away.
> >
>
> Bullsh*t. It can legitimately transform it into:
>
> i = N;
Right! But people are confusing "practise", "published interface", and
"spec" again.
Published interface in this case is that gcc will not optimize an empty
loop away, as it is often used to generate a timing loop.
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.
Rogier Wolff wrote:
>>
>>Bullsh*t. It can legitimately transform it into:
>>
>> i = N;
>
>
> Right! But people are confusing "practise", "published interface", and
> "spec" again.
>
> Published interface in this case is that gcc will not optimize an empty
> loop away, as it is often used to generate a timing loop.
>
Yes. This is a gcc-specific wart, a bad idea from the start, and
apparently one which has caught up with them to the point that they've
had to abandon it.
-hpa
H. Peter Anvin wrote:
> Yes. This is a gcc-specific wart, a bad idea from the start, and
> apparently one which has caught up with them to the point that they've
> had to abandon it.
It's still there in GCC 3.1.
-- Jamie
On Wed, Aug 14, 2002 at 12:41:04PM -0700, H. Peter Anvin wrote:
> Rogier Wolff wrote:
> >>
> >>Bullsh*t. It can legitimately transform it into:
> >>
> >> i = N;
> >
> >
> >Right! But people are confusing "practise", "published interface", and
> >"spec" again.
> >
> >Published interface in this case is that gcc will not optimize an empty
> >loop away, as it is often used to generate a timing loop.
> >
>
> Yes. This is a gcc-specific wart, a bad idea from the start, and
> apparently one which has caught up with them to the point that they've
> had to abandon it.
There would be a solution to tell gcc not to optimize things, which may
not require too much work from gcc people. Basically, we would need to
implement a __builtin_nop() function that would respect dependencies but
not generate any code. This way, we could have :
for (i=0; i<N, i++);
optimized as i=N
and
for (i=0; i<N; i++)
__builtin_nop();
or even
for (i=0; i<N; __builtin_nop(i++));
do the real work.
This way, some loops could be optimized, and the developpers could explicitely
tell the compiler when they need to prevent any optimization.
Cheers,
Willy
Willy Tarreau wrote:
>>>
>>
>>#define __nop() asm volatile("")
>
> and if you want to pass arguments, to guarantee that no optimization will
> be done, even on loop constants ?
> eg:
> for (i = 0; i < N; i++) {
> j++;
> __nop();
> }
>
> -> might be optimized this way :
> j = N;
> for (i = 0; i < N; i++) {
> __nop();
> }
>
> Perhaps using a volatile for j ?
>
OK, what are you trying to accomplish by this?
But if you wanted to, you could do:
for ( i = 0 ; i < N ; i++ ) {
j++;
asm volatile("" : "=g" (j));
}
Willy Tarreau wrote:
>
> There would be a solution to tell gcc not to optimize things, which may
> not require too much work from gcc people. Basically, we would need to
> implement a __builtin_nop() function that would respect dependencies but
> not generate any code. This way, we could have :
>
> for (i=0; i<N, i++);
>
> optimized as i=N
> and
> for (i=0; i<N; i++)
> __builtin_nop();
> or even
> for (i=0; i<N; __builtin_nop(i++));
> do the real work.
>
> This way, some loops could be optimized, and the developpers could explicitely
> tell the compiler when they need to prevent any optimization.
>
#define __nop() asm volatile("")
Since some processors now have "busy wait delay" instructions, this
would also make it possible to do:
#if defined(__i386__) || defined(__x86_64__)
#define __busy_wait() asm volatile("rep;nop")
#else
#define __busy_wait() asm volatile("")
#endif
On Wed, Aug 14, 2002 at 01:58:41PM -0700, H. Peter Anvin wrote:
> Willy Tarreau wrote:
> > This way, some loops could be optimized, and the developpers could explicitely
> > tell the compiler when they need to prevent any optimization.
> >
>
> #define __nop() asm volatile("")
and if you want to pass arguments, to guarantee that no optimization will
be done, even on loop constants ?
eg:
for (i = 0; i < N; i++) {
j++;
__nop();
}
-> might be optimized this way :
j = N;
for (i = 0; i < N; i++) {
__nop();
}
Perhaps using a volatile for j ?
Cheers,
Willy
On Wed, Aug 14, 2002 at 08:57:09PM +0100, Jamie Lokier wrote:
> H. Peter Anvin wrote:
> > Yes. This is a gcc-specific wart, a bad idea from the start, and
> > apparently one which has caught up with them to the point that they've
> > had to abandon it.
>
> It's still there in GCC 3.1.
Yes. If you check out the tree-ssa-branch, however (and use
appropriate commandline arguments), I think you'll find that it's no
longer there. That's the future of GCC's optimizer, but most of it
won't make even GCC 3.3.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
why are you useing loops for delays in the first place? it's a solution
that will fail as clock speeds keep improving (if for no other reason then
your loop counter will end up needing to be a larger int to achieve the
desired delay!!)
rather then debating how to convince gcc how to not optimize them away and
messing up the timing we should be talking about how to eliminate such
loops in the first place.
David Lang
On Wed, 14 Aug 2002, Willy Tarreau wrote:
> Date: Wed, 14 Aug 2002 22:45:56 +0200
> From: Willy Tarreau <[email protected]>
> To: H. Peter Anvin <[email protected]>
> Cc: Rogier Wolff <[email protected]>, [email protected]
> Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH
>
> On Wed, Aug 14, 2002 at 12:41:04PM -0700, H. Peter Anvin wrote:
> > Rogier Wolff wrote:
> > >>
> > >>Bullsh*t. It can legitimately transform it into:
> > >>
> > >> i = N;
> > >
> > >
> > >Right! But people are confusing "practise", "published interface", and
> > >"spec" again.
> > >
> > >Published interface in this case is that gcc will not optimize an empty
> > >loop away, as it is often used to generate a timing loop.
> > >
> >
> > Yes. This is a gcc-specific wart, a bad idea from the start, and
> > apparently one which has caught up with them to the point that they've
> > had to abandon it.
>
> There would be a solution to tell gcc not to optimize things, which may
> not require too much work from gcc people. Basically, we would need to
> implement a __builtin_nop() function that would respect dependencies but
> not generate any code. This way, we could have :
>
> for (i=0; i<N, i++);
>
> optimized as i=N
> and
> for (i=0; i<N; i++)
> __builtin_nop();
> or even
> for (i=0; i<N; __builtin_nop(i++));
> do the real work.
>
> This way, some loops could be optimized, and the developpers could explicitely
> tell the compiler when they need to prevent any optimization.
>
> Cheers,
> Willy
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
David Lang wrote:
> rather then debating how to convince gcc how to not optimize them away and
> messing up the timing we should be talking about how to eliminate such
> loops in the first place.
You misunderstand. We _do_ want gcc to optimize away empty loops.
-- Jamie
On Wed, 2002-08-14 at 21:58, H. Peter Anvin wrote:
>
> Since some processors now have "busy wait delay" instructions, this
> would also make it possible to do:
Nobody should be using an empty busy loop. If its a short timed busy
loop then they should be using udelay, if its a long one
schedule_timeout()
If its polling hardware then it isnt an empty loop
Alan Cox wrote:
> On Wed, 2002-08-14 at 21:58, H. Peter Anvin wrote:
>
>>Since some processors now have "busy wait delay" instructions, this
>>would also make it possible to do:
>
>
> Nobody should be using an empty busy loop. If its a short timed busy
> loop then they should be using udelay, if its a long one
> schedule_timeout()
>
Indeed.
> If its polling hardware then it isnt an empty loop
True indeed as well, although we should still have a busy_wait(); macro
that can insert whatever hint instruction the architecture might or
might not have.
-hpa
On Thu, 2002-08-15 at 02:28, H. Peter Anvin wrote:
> True indeed as well, although we should still have a busy_wait(); macro
> that can insert whatever hint instruction the architecture might or
> might not have.
We have one - its called cpu_relax()
On Wed, Aug 14, 2002 at 03:53:15PM -0700, David Lang wrote:
> why are you useing loops for delays in the first place? it's a solution
> that will fail as clock speeds keep improving (if for no other reason then
> your loop counter will end up needing to be a larger int to achieve the
> desired delay!!)
>
> rather then debating how to convince gcc how to not optimize them away and
> messing up the timing we should be talking about how to eliminate such
> loops in the first place.
I'm NEVER using loops for delays, I find this awful and always a waste of
time. But sometimes, you want to benchmark some algorithms, and the compiler
makes this difficult by eliminating things it thinks are useless. I once had
this problem when comparing several linked lists algorithms, for example.
Moreover, some people complain about the fact that gcc doesn't optimize
everything because of people using loops for delays. If we provide convenient
ways to help the user tell gcc when not to optimize, gcc could optimize
everything possible by default.
Cheers,
Willy
Willy Tarreau wrote:
>
> Moreover, some people complain about the fact that gcc doesn't optimize
> everything because of people using loops for delays. If we provide convenient
> ways to help the user tell gcc when not to optimize, gcc could optimize
> everything possible by default.
>
Well, again, such ways already exist.
-hpa