2002-08-11 07:43:04

by Andrew Morton

[permalink] [raw]
Subject: [patch 4/21] fix ARCH_HAS_PREFETCH



include/linux/prefetch.h does a strange thing: if the arch doesn't have
the prefectch functions, this header defines no-op version of them and
then defines ARCH_HAS_PREFETCH. So there's no way for mainline code to
know if the architecture *really* has prefetch instructions.

This information loss is unfortunate. Examples:

for (i = 0; i < N; i++)
prefetch(foo[i]);

Problem is, if `prefetch' is a no-op, the compiler will still
generate an empty busy-wait loop. Which it must do. We need to
know the truth about ARCH_HAS_PREFETCH to correctly elide that loop.


#ifdef ARCH_HAS_PREFETCH
#define prefetch_prev_lru_page(_page, _base, _field) \
do { \
if ((_page)->lru.prev != _base) { \
struct page *prev; \
\
prev = list_entry(_page->lru.prev, \
struct page, lru); \
prefetch(&prev->_field); \
} \
} while (0)
#else
#define prefetch_prev_lru_page(_page, _base, _field) do { } while (0)
#endif

Which needs a working ARCH_HAS_PREFETCH to avoid probable extra code
generation on CPUs which don't have prefetch.




prefetch.h | 3 ---
1 files changed, 3 deletions(-)

--- 2.5.31/include/linux/prefetch.h~arch_has_prefetchw Sun Aug 11 00:20:08 2002
+++ 2.5.31-akpm/include/linux/prefetch.h Sun Aug 11 00:20:08 2002
@@ -39,17 +39,14 @@
*/

#ifndef ARCH_HAS_PREFETCH
-#define ARCH_HAS_PREFETCH
static inline void prefetch(const void *x) {;}
#endif

#ifndef ARCH_HAS_PREFETCHW
-#define ARCH_HAS_PREFETCHW
static inline void prefetchw(const void *x) {;}
#endif

#ifndef ARCH_HAS_SPINLOCK_PREFETCH
-#define ARCH_HAS_SPINLOCK_PREFETCH
#define spin_lock_prefetch(x) prefetchw(x)
#endif


.


2002-08-11 09:49:35

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

Andrew Morton <[email protected]> writes:

> Which needs a working ARCH_HAS_PREFETCH to avoid probable extra code
> generation on CPUs which don't have prefetch.

When you use gcc 3.1+ you can use __builtin_prefetch() and gcc takes care of
it. See asm-x86_64/prefetch.h of a working example.

Of course generic C code should not use the ugly builtins directly, but
it could be used to define the wrappers.

-Andi

2002-08-11 18:24:59

by Alan

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Sun, 2002-08-11 at 08:38, Andrew Morton wrote:
> This information loss is unfortunate. Examples:
>
> for (i = 0; i < N; i++)
> prefetch(foo[i]);
>
> Problem is, if `prefetch' is a no-op, the compiler will still
> generate an empty busy-wait loop. Which it must do.

Why - nothing there is volatile


2002-08-11 18:33:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

Alan Cox wrote:
>
> On Sun, 2002-08-11 at 08:38, Andrew Morton wrote:
> > This information loss is unfortunate. Examples:
> >
> > for (i = 0; i < N; i++)
> > prefetch(foo[i]);
> >
> > Problem is, if `prefetch' is a no-op, the compiler will still
> > generate an empty busy-wait loop. Which it must do.
>
> Why - nothing there is volatile

Because the compiler sees:

for (i = 0; i < N; i++)
;

and it says "ah ha. A busy wait delay loop" and leaves it alone.

It's actually a special-case inside the compiler to not optimise
away such constructs.

2002-08-11 18:43:05

by Larry McVoy

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Sun, Aug 11, 2002 at 11:47:22AM -0700, Andrew Morton wrote:
> Alan Cox wrote:
> >
> > On Sun, 2002-08-11 at 08:38, Andrew Morton wrote:
> > > This information loss is unfortunate. Examples:
> > >
> > > for (i = 0; i < N; i++)
> > > prefetch(foo[i]);
> > >
> > > Problem is, if `prefetch' is a no-op, the compiler will still
> > > generate an empty busy-wait loop. Which it must do.
> >
> > Why - nothing there is volatile
>
> Because the compiler sees:
>
> for (i = 0; i < N; i++)
> ;
>
> and it says "ah ha. A busy wait delay loop" and leaves it alone.
>
> It's actually a special-case inside the compiler to not optimise
> away such constructs.

Good lord. If anyone depends all versions of GCC to not optimize this away,
they are going to hate life. Since GCC doesn't seem to do cross file
optimization (does it?) I've found the following works well:

cat > use_result.c
int dummy; // can't be static, the compiler will see it's not read

use_result(int i)
{
dummy = i;
}
^D

for (i = 0; i < N; ++i) use_result(i);

I'm positive we do stuff like this in LMbench, which is fairly well supported
on a pile of different platforms/compilers.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2002-08-11 19:01:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH


On Sun, 11 Aug 2002, Andrew Morton wrote:
>
> It's actually a special-case inside the compiler to not optimise
> away such constructs.

I thought that special case was removed long ago, because it is untenable
in C++ etc (where such empty loops happen due to various abstraction
issues, and not optimizing them away is just silly).

But testing shows that you're right at least for 2.95 and 2.96. Argh

Linus

2002-08-11 20:05:14

by Jamie Lokier

[permalink] [raw]
Subject: GCC still keeps empty loops? (was: [patch 4/21] fix ARCH_HAS_PREFETCH)

Linus Torvalds wrote:
> I thought that special case was removed long ago, because it is untenable
> in C++ etc (where such empty loops happen due to various abstraction
> issues, and not optimizing them away is just silly).
>
> But testing shows that you're right at least for 2.95 and 2.96. Argh

Unbelievably, 3.1 doesn't remove empty loops either.
I think there's a case for a compiler flag, `-fremove-empty-loops'.

Empty loop delays aren't portable acrosss compilers in general. If you
_really_ want an empty loop that must always work with GCC, it's easy
enough to write:

for (i = 0; i < 100000; i++)
__asm__ __volatile__ ("");

-- Jamie

2002-08-12 22:11:31

by Ralf Baechle

[permalink] [raw]
Subject: Re: GCC still keeps empty loops? (was: [patch 4/21] fix ARCH_HAS_PREFETCH)

On Sun, Aug 11, 2002 at 09:07:18PM +0100, Jamie Lokier wrote:

> Unbelievably, 3.1 doesn't remove empty loops either.
> I think there's a case for a compiler flag, `-fremove-empty-loops'.

Indeed ... It's sad having to scatter ifdefs over the code just because
gcc lacks this optimization ...

Ralf

2002-08-13 21:38:31

by Adrian Bunk

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Sun, 11 Aug 2002, Andrew Morton wrote:

> Alan Cox wrote:
> >
> > On Sun, 2002-08-11 at 08:38, Andrew Morton wrote:
> > > This information loss is unfortunate. Examples:
> > >
> > > for (i = 0; i < N; i++)
> > > prefetch(foo[i]);
> > >
> > > Problem is, if `prefetch' is a no-op, the compiler will still
> > > generate an empty busy-wait loop. Which it must do.
> >
> > Why - nothing there is volatile
>
> Because the compiler sees:
>
> for (i = 0; i < N; i++)
> ;
>
> and it says "ah ha. A busy wait delay loop" and leaves it alone.
>
> It's actually a special-case inside the compiler to not optimise
> away such constructs.

Why is this a special case? As long as a compiler can't prove that the
computed value of i isn't used later it mustn't optimize it away.

Kernighan/Ritchie (German translation of the second edition) contains the
following example program that shows why the compiler mustn't optimize it
away:

<-- snip -->

#include <stdio.h>

main()
{
double nc;

for (nc = 0; getchar() != EOF; ++nc)
;
printf("%.0f\n", nc);

}

<-- snip -->

cu
Adrian

--

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox



2002-08-13 22:09:27

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

Followup to: <Pine.NEB.4.44.0208132322340.1351-100000@mimas.fachschaften.tu-muenchen.de>
By author: Adrian Bunk <[email protected]>
In newsgroup: linux.dev.kernel
> >
> > Because the compiler sees:
> >
> > for (i = 0; i < N; i++)
> > ;
> >
> > and it says "ah ha. A busy wait delay loop" and leaves it alone.
> >
> > It's actually a special-case inside the compiler to not optimise
> > away such constructs.
>
> Why is this a special case? As long as a compiler can't prove that the
> computed value of i isn't used later it mustn't optimize it away.
>

Bullsh*t. It can legitimately transform it into:

i = N;

> Kernighan/Ritchie (German translation of the second edition) contains the
> following example program that shows why the compiler mustn't optimize it
> away:
>
> <-- snip -->
>
> #include <stdio.h>
>
> main()
> {
> double nc;
>
> for (nc = 0; getchar() != EOF; ++nc)
> ;
> printf("%.0f\n", nc);
>
> }
>

getchar() has side effects.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-08-13 22:18:58

by Adrian Bunk

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On 13 Aug 2002, H. Peter Anvin wrote:

>...
> > > Because the compiler sees:
> > >
> > > for (i = 0; i < N; i++)
> > > ;
> > >
> > > and it says "ah ha. A busy wait delay loop" and leaves it alone.
> > >
> > > It's actually a special-case inside the compiler to not optimise
> > > away such constructs.
> >
> > Why is this a special case? As long as a compiler can't prove that the
> > computed value of i isn't used later it mustn't optimize it away.
>
> Bullsh*t. It can legitimately transform it into:
>
> i = N;
>...

Ah, my misunderstanding:
"optimize away" didn't mean "completely remove it" but "transform it to
something semantically equivalent".

> -hpa

Thanks
Adrian

--

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox

2002-08-13 22:38:39

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

Adrian Bunk wrote:
>
> Ah, my misunderstanding:
> "optimize away" didn't mean "completely remove it" but "transform it to
> something semantically equivalent".
>

The thing is that gcc has a special rule that keeps it from optimizing
it into something with less than O(n) time.

-=hpa


2002-08-14 17:36:36

by Rogier Wolff

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Tue, Aug 13, 2002 at 03:12:53PM -0700, H. Peter Anvin wrote:
> Followup to: <Pine.NEB.4.44.0208132322340.1351-100000@mimas.fachschaften.tu-muenchen.de>
> By author: Adrian Bunk <[email protected]>
> In newsgroup: linux.dev.kernel
> > >
> > > Because the compiler sees:
> > >
> > > for (i = 0; i < N; i++)
> > > ;
> > >
> > > and it says "ah ha. A busy wait delay loop" and leaves it alone.
> > >
> > > It's actually a special-case inside the compiler to not optimise
> > > away such constructs.
> >
> > Why is this a special case? As long as a compiler can't prove that the
> > computed value of i isn't used later it mustn't optimize it away.
> >
>
> Bullsh*t. It can legitimately transform it into:
>
> i = N;

Right! But people are confusing "practise", "published interface", and
"spec" again.

Published interface in this case is that gcc will not optimize an empty
loop away, as it is often used to generate a timing loop.

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.

2002-08-14 19:37:31

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

Rogier Wolff wrote:
>>
>>Bullsh*t. It can legitimately transform it into:
>>
>> i = N;
>
>
> Right! But people are confusing "practise", "published interface", and
> "spec" again.
>
> Published interface in this case is that gcc will not optimize an empty
> loop away, as it is often used to generate a timing loop.
>

Yes. This is a gcc-specific wart, a bad idea from the start, and
apparently one which has caught up with them to the point that they've
had to abandon it.

-hpa


2002-08-14 19:54:03

by Jamie Lokier

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

H. Peter Anvin wrote:
> Yes. This is a gcc-specific wart, a bad idea from the start, and
> apparently one which has caught up with them to the point that they've
> had to abandon it.

It's still there in GCC 3.1.

-- Jamie

2002-08-14 20:47:19

by Willy Tarreau

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Wed, Aug 14, 2002 at 12:41:04PM -0700, H. Peter Anvin wrote:
> Rogier Wolff wrote:
> >>
> >>Bullsh*t. It can legitimately transform it into:
> >>
> >> i = N;
> >
> >
> >Right! But people are confusing "practise", "published interface", and
> >"spec" again.
> >
> >Published interface in this case is that gcc will not optimize an empty
> >loop away, as it is often used to generate a timing loop.
> >
>
> Yes. This is a gcc-specific wart, a bad idea from the start, and
> apparently one which has caught up with them to the point that they've
> had to abandon it.

There would be a solution to tell gcc not to optimize things, which may
not require too much work from gcc people. Basically, we would need to
implement a __builtin_nop() function that would respect dependencies but
not generate any code. This way, we could have :

for (i=0; i<N, i++);

optimized as i=N
and
for (i=0; i<N; i++)
__builtin_nop();
or even
for (i=0; i<N; __builtin_nop(i++));
do the real work.

This way, some loops could be optimized, and the developpers could explicitely
tell the compiler when they need to prevent any optimization.

Cheers,
Willy

2002-08-14 21:11:54

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

Willy Tarreau wrote:
>>>
>>
>>#define __nop() asm volatile("")
>
> and if you want to pass arguments, to guarantee that no optimization will
> be done, even on loop constants ?
> eg:
> for (i = 0; i < N; i++) {
> j++;
> __nop();
> }
>
> -> might be optimized this way :
> j = N;
> for (i = 0; i < N; i++) {
> __nop();
> }
>
> Perhaps using a volatile for j ?
>

OK, what are you trying to accomplish by this?

But if you wanted to, you could do:

for ( i = 0 ; i < N ; i++ ) {
j++;
asm volatile("" : "=g" (j));
}


2002-08-14 20:56:21

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

Willy Tarreau wrote:
>
> There would be a solution to tell gcc not to optimize things, which may
> not require too much work from gcc people. Basically, we would need to
> implement a __builtin_nop() function that would respect dependencies but
> not generate any code. This way, we could have :
>
> for (i=0; i<N, i++);
>
> optimized as i=N
> and
> for (i=0; i<N; i++)
> __builtin_nop();
> or even
> for (i=0; i<N; __builtin_nop(i++));
> do the real work.
>
> This way, some loops could be optimized, and the developpers could explicitely
> tell the compiler when they need to prevent any optimization.
>

#define __nop() asm volatile("")

Since some processors now have "busy wait delay" instructions, this
would also make it possible to do:

#if defined(__i386__) || defined(__x86_64__)

#define __busy_wait() asm volatile("rep;nop")

#else

#define __busy_wait() asm volatile("")

#endif

2002-08-14 21:09:27

by Willy Tarreau

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Wed, Aug 14, 2002 at 01:58:41PM -0700, H. Peter Anvin wrote:
> Willy Tarreau wrote:
> > This way, some loops could be optimized, and the developpers could explicitely
> > tell the compiler when they need to prevent any optimization.
> >
>
> #define __nop() asm volatile("")

and if you want to pass arguments, to guarantee that no optimization will
be done, even on loop constants ?
eg:
for (i = 0; i < N; i++) {
j++;
__nop();
}

-> might be optimized this way :
j = N;
for (i = 0; i < N; i++) {
__nop();
}

Perhaps using a volatile for j ?

Cheers,
Willy

2002-08-14 21:59:31

by Daniel Jacobowitz

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Wed, Aug 14, 2002 at 08:57:09PM +0100, Jamie Lokier wrote:
> H. Peter Anvin wrote:
> > Yes. This is a gcc-specific wart, a bad idea from the start, and
> > apparently one which has caught up with them to the point that they've
> > had to abandon it.
>
> It's still there in GCC 3.1.

Yes. If you check out the tree-ssa-branch, however (and use
appropriate commandline arguments), I think you'll find that it's no
longer there. That's the future of GCC's optimizer, but most of it
won't make even GCC 3.3.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2002-08-14 22:56:13

by David Lang

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

why are you useing loops for delays in the first place? it's a solution
that will fail as clock speeds keep improving (if for no other reason then
your loop counter will end up needing to be a larger int to achieve the
desired delay!!)

rather then debating how to convince gcc how to not optimize them away and
messing up the timing we should be talking about how to eliminate such
loops in the first place.

David Lang


On Wed, 14 Aug 2002, Willy Tarreau wrote:

> Date: Wed, 14 Aug 2002 22:45:56 +0200
> From: Willy Tarreau <[email protected]>
> To: H. Peter Anvin <[email protected]>
> Cc: Rogier Wolff <[email protected]>, [email protected]
> Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH
>
> On Wed, Aug 14, 2002 at 12:41:04PM -0700, H. Peter Anvin wrote:
> > Rogier Wolff wrote:
> > >>
> > >>Bullsh*t. It can legitimately transform it into:
> > >>
> > >> i = N;
> > >
> > >
> > >Right! But people are confusing "practise", "published interface", and
> > >"spec" again.
> > >
> > >Published interface in this case is that gcc will not optimize an empty
> > >loop away, as it is often used to generate a timing loop.
> > >
> >
> > Yes. This is a gcc-specific wart, a bad idea from the start, and
> > apparently one which has caught up with them to the point that they've
> > had to abandon it.
>
> There would be a solution to tell gcc not to optimize things, which may
> not require too much work from gcc people. Basically, we would need to
> implement a __builtin_nop() function that would respect dependencies but
> not generate any code. This way, we could have :
>
> for (i=0; i<N, i++);
>
> optimized as i=N
> and
> for (i=0; i<N; i++)
> __builtin_nop();
> or even
> for (i=0; i<N; __builtin_nop(i++));
> do the real work.
>
> This way, some loops could be optimized, and the developpers could explicitely
> tell the compiler when they need to prevent any optimization.
>
> Cheers,
> Willy
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2002-08-14 23:27:48

by Jamie Lokier

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

David Lang wrote:
> rather then debating how to convince gcc how to not optimize them away and
> messing up the timing we should be talking about how to eliminate such
> loops in the first place.

You misunderstand. We _do_ want gcc to optimize away empty loops.

-- Jamie

2002-08-15 01:22:14

by Alan

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Wed, 2002-08-14 at 21:58, H. Peter Anvin wrote:
>
> Since some processors now have "busy wait delay" instructions, this
> would also make it possible to do:

Nobody should be using an empty busy loop. If its a short timed busy
loop then they should be using udelay, if its a long one
schedule_timeout()

If its polling hardware then it isnt an empty loop

2002-08-15 01:24:44

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

Alan Cox wrote:
> On Wed, 2002-08-14 at 21:58, H. Peter Anvin wrote:
>
>>Since some processors now have "busy wait delay" instructions, this
>>would also make it possible to do:
>
>
> Nobody should be using an empty busy loop. If its a short timed busy
> loop then they should be using udelay, if its a long one
> schedule_timeout()
>

Indeed.

> If its polling hardware then it isnt an empty loop

True indeed as well, although we should still have a busy_wait(); macro
that can insert whatever hint instruction the architecture might or
might not have.

-hpa


2002-08-15 01:35:51

by Alan

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Thu, 2002-08-15 at 02:28, H. Peter Anvin wrote:
> True indeed as well, although we should still have a busy_wait(); macro
> that can insert whatever hint instruction the architecture might or
> might not have.

We have one - its called cpu_relax()

2002-08-15 06:40:20

by Willy Tarreau

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

On Wed, Aug 14, 2002 at 03:53:15PM -0700, David Lang wrote:
> why are you useing loops for delays in the first place? it's a solution
> that will fail as clock speeds keep improving (if for no other reason then
> your loop counter will end up needing to be a larger int to achieve the
> desired delay!!)
>
> rather then debating how to convince gcc how to not optimize them away and
> messing up the timing we should be talking about how to eliminate such
> loops in the first place.

I'm NEVER using loops for delays, I find this awful and always a waste of
time. But sometimes, you want to benchmark some algorithms, and the compiler
makes this difficult by eliminating things it thinks are useless. I once had
this problem when comparing several linked lists algorithms, for example.

Moreover, some people complain about the fact that gcc doesn't optimize
everything because of people using loops for delays. If we provide convenient
ways to help the user tell gcc when not to optimize, gcc could optimize
everything possible by default.

Cheers,
Willy

2002-08-15 06:51:51

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [patch 4/21] fix ARCH_HAS_PREFETCH

Willy Tarreau wrote:
>
> Moreover, some people complain about the fact that gcc doesn't optimize
> everything because of people using loops for delays. If we provide convenient
> ways to help the user tell gcc when not to optimize, gcc could optimize
> everything possible by default.
>

Well, again, such ways already exist.

-hpa