2002-08-02 00:09:13

by David Mosberger

[permalink] [raw]
Subject: adjust prefetch in free_one_pgd()

As per an earlier discussion on the lkml, the existing prefetch in
free_one_pgd() is somewhat broken in that it prefetches further than
PREFETCH_STRIDE bytes. The patch below fixes that. I'm told that
this also performs better on x86 (it makes little difference on ia64,
but it is also marginally better).

--david

diff -Nru a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c Thu Aug 1 17:02:14 2002
+++ b/mm/memory.c Thu Aug 1 17:02:14 2002
@@ -110,7 +110,7 @@
pmd = pmd_offset(dir, 0);
pgd_clear(dir);
for (j = 0; j < PTRS_PER_PMD ; j++) {
- prefetchw(pmd+j+(PREFETCH_STRIDE/16));
+ prefetchw(pmd + j + PREFETCH_STRIDE/sizeof(*pmd));
free_one_pmd(tlb, pmd+j);
}
pmd_free_tlb(tlb, pmd);


2002-08-02 11:44:42

by Alan

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()

On Fri, 2002-08-02 at 01:12, David Mosberger wrote:
> diff -Nru a/mm/memory.c b/mm/memory.c
> --- a/mm/memory.c Thu Aug 1 17:02:14 2002
> +++ b/mm/memory.c Thu Aug 1 17:02:14 2002
> @@ -110,7 +110,7 @@
> pmd = pmd_offset(dir, 0);
> pgd_clear(dir);
> for (j = 0; j < PTRS_PER_PMD ; j++) {
> - prefetchw(pmd+j+(PREFETCH_STRIDE/16));
> + prefetchw(pmd + j + PREFETCH_STRIDE/sizeof(*pmd));
>

It isnt a case of PREFETCH_STRIDE - thats the optimal fetchahead. You
must never prefetch an address beyond the end of an object. So you
actually need two loops one prefetching, then one to finish the job off
which does not prefetch.

Otherwise one day your page ends up against the ISA or PCI address space
or something else undesirable and on some cpus the prefetch then
variously confuses the PCI device or corrupts the cache.

Prefetching stuff you don't need is bad manners anyway 8)

2002-08-02 15:35:57

by David Mosberger

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()

>>>>> On 02 Aug 2002 14:04:21 +0100, Alan Cox <[email protected]> said:

Alan> It isnt a case of PREFETCH_STRIDE - thats the optimal
Alan> fetchahead. You must never prefetch an address beyond the end
Alan> of an object. So you actually need two loops one prefetching,
Alan> then one to finish the job off which does not prefetch.

Alan> Otherwise one day your page ends up against the ISA or PCI
Alan> address space or something else undesirable and on some cpus
Alan> the prefetch then variously confuses the PCI device or
Alan> corrupts the cache.

I thought the prefetches API intended this to be a safe operation?
It's definitely not an issue on ia64: there, prefetches against
uncached memory translations are automatically canceled.

Note that there are many other prefetches in the kernel which may end
up prefetch "invalid" addresses (though it may often be address 0
that's being prefetched; perhaps that OK even on arches where you may
not prefetch against ISA or PCI addresses).

Alan> Prefetching stuff you don't need is bad manners anyway 8)

That would work for me (though it would increase code size). My patch
was intended as a minimal fix for a rather obvious typo (or perhaps it
was a thinko... ;-).

--david

2002-08-02 15:41:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()



On Fri, 2 Aug 2002, David Mosberger wrote:
>
> I thought the prefetches API intended this to be a safe operation?

Well, any _sane_ prefetch API would be safe.

However, there is known-broken hardware out there, in which a prefetch
from IO space will kill the machine.

Personally, I would just say that we should disable prefetch on such
clearly broken hardware, but since it's Alans favourite machine (some
early AMD Athlon if I remember correctly), I think Alan will disagree ;)

> It's definitely not an issue on ia64: there, prefetches against
> uncached memory translations are automatically canceled.

That's true of just about any other architecture too. I think the AMD case
is an erratum, so even on AMD it is _supposed_ to work.

Linus

2002-08-02 15:55:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()



On Fri, 2 Aug 2002, Dave Jones wrote:
> On Fri, Aug 02, 2002 at 08:46:38AM -0700, Linus Torvalds wrote:
>
> > Personally, I would just say that we should disable prefetch on such
> > clearly broken hardware, but since it's Alans favourite machine (some
> > early AMD Athlon if I remember correctly), I think Alan will disagree ;)
>
> I think I now understand why you silently dropped the 'disable broken hw
> prefetch on early stepping P4' patch I sent you. 8-)

No, I don't think either I (nor Alan) has any early stepping P4's.

Me dropping patches is just normal ;)

Linus

2002-08-02 15:53:31

by Dave Jones

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()

On Fri, Aug 02, 2002 at 08:46:38AM -0700, Linus Torvalds wrote:

> Personally, I would just say that we should disable prefetch on such
> clearly broken hardware, but since it's Alans favourite machine (some
> early AMD Athlon if I remember correctly), I think Alan will disagree ;)

I think I now understand why you silently dropped the 'disable broken hw
prefetch on early stepping P4' patch I sent you. 8-)

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-08-02 16:01:45

by Dave Jones

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()

On Fri, Aug 02, 2002 at 08:59:35AM -0700, Linus Torvalds wrote:

> > > Personally, I would just say that we should disable prefetch on such
> > > clearly broken hardware, but since it's Alans favourite machine (some
> > > early AMD Athlon if I remember correctly), I think Alan will disagree ;)
> > I think I now understand why you silently dropped the 'disable broken hw
> > prefetch on early stepping P4' patch I sent you. 8-)
>
> No, I don't think either I (nor Alan) has any early stepping P4's.
> Me dropping patches is just normal ;)

Hmm, it was in the patch including Patricks cpu/ reorganisation,
so it seems you applied it, and then hacked it out 8-)

*shines spotlight on Linus*

Oh well, I'll retransmit sometime. I've been really lax about pushing
stuff to you of late anyway, so a big push when I come back from
my vacation would be a good thing.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-08-02 16:29:42

by Alan

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()

On Fri, 2002-08-02 at 16:46, Linus Torvalds wrote:
> Personally, I would just say that we should disable prefetch on such
> clearly broken hardware, but since it's Alans favourite machine (some
> early AMD Athlon if I remember correctly), I think Alan will disagree ;)

Its an expensive change. I'm also not convinced the AMD is the only
situation this comes up. And however you look at it, fetching memory you
don't need is a bug. You are not just wasting memory bandwidth, which
you may have on ia64 but not so much on real computers, you are pulling
stuff shared on other processors which can be quite expensive.

The guarantee I specced in the original API was specifically that
prefetching a NULL pointer was safe.

> That's true of just about any other architecture too. I think the AMD case
> is an erratum, so even on AMD it is _supposed_ to work.

Nothing guarantees a page adjacent to the user memory is not cachable. I
can't think of anything cachable with nasty side effects we might
encounter right now but one day someone will do it just to be annoying.

2002-08-02 16:39:12

by Alan

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()

On Fri, 2002-08-02 at 17:38, David Mosberger wrote:
> >>>>> On 02 Aug 2002 18:49:27 +0100, Alan Cox <[email protected]> said:
>
> Alan> I can't think of anything cachable with nasty side effects we
> Alan> might encounter right now but one day someone will do it just
> Alan> to be annoying.
>
> Cacheable and side-effects don't go together. Even without explicit
> software prefetches, most modern CPUs will happily and aggressively
> prefetch stuff from cacheable translations.

Yes we got burned on that with the latest AMD processors. They prefetch
into an area we accidentally have marked with differing cachabilities
between kernel and user space when using the nv binary driver stuff (but
its our bug not theirs). There's a horrid hack in 2.4.19rc to deal with
it pending merging the proper patches


2002-08-02 16:36:37

by David Mosberger

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()

>>>>> On 02 Aug 2002 18:49:27 +0100, Alan Cox <[email protected]> said:

Alan> I can't think of anything cachable with nasty side effects we
Alan> might encounter right now but one day someone will do it just
Alan> to be annoying.

Cacheable and side-effects don't go together. Even without explicit
software prefetches, most modern CPUs will happily and aggressively
prefetch stuff from cacheable translations.

That's (partly) why we have this strange situation where execution in
virtual address space is actually _faster_ than in physical space,
even though the former involves more work per memory access.

--david

2002-08-02 16:50:19

by David Mosberger

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()

>>>>> On 02 Aug 2002 18:58:53 +0100, Alan Cox <[email protected]> said:

>> Cacheable and side-effects don't go together. Even without
>> explicit software prefetches, most modern CPUs will happily and
>> aggressively prefetch stuff from cacheable translations.

Alan> Yes we got burned on that with the latest AMD processors. They
Alan> prefetch into an area we accidentally have marked with
Alan> differing cachabilities between kernel and user space when
Alan> using the nv binary driver stuff (but its our bug not
Alan> theirs). There's a horrid hack in 2.4.19rc to deal with it
Alan> pending merging the proper patches

Yes, I'm well aware of this issue as ia64 has basically the same setup
in this regard. For ia64, it seems to be acceptable to require
that AGP DMA operates in coherent mode, so that all the memory can be
mapped cacheable. With old chipsets, coherent AGP DMA is slow, but
with more recent chipsets (including hp's zx1), there won't be a
performance penalty. I realize that this isn't a solution for x86 as
there is too much of an installed base.

--david

2002-08-03 16:56:33

by David Woodhouse

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()


[email protected] said:
> > I thought the prefetches API intended this to be a safe operation?
> Well, any _sane_ prefetch API would be safe.
> However, there is known-broken hardware out there, in which a prefetch
> from IO space will kill the machine.

If you prefetch off the end of your object, and the thing you end up
prefetching is a DMA buffer which we'd just nuked from our cache to allow
the device to DMA into it, something's going to be unhappy regardless of
whether your prefetch faults or not.

--
dwmw2


2002-08-03 17:31:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()



On Sat, 3 Aug 2002, David Woodhouse wrote:
>
> If you prefetch off the end of your object, and the thing you end up
> prefetching is a DMA buffer which we'd just nuked from our cache to allow
> the device to DMA into it, something's going to be unhappy regardless of
> whether your prefetch faults or not.

Yeah, there are broken prefetches around. So what else is new?

My personal opinion is that if a prefetch has semantic meanings outside
the "speed up subsequent accesses", it should not be exposed to the rest
of the kernel (it might still be useful inside architecture-specific
routines like optimized memcpy etc).

If your caches aren't speculation-safe, then prefetch isn't safe to use in
a generic manner. Our current use is probably ok right now, but if we get
future code where doing a prefetch on a pointer that we do not trust is a
good idea, I would personally suggest just disabling prefetch on machines
where that is broken.

Examples of this might be using prefetch on user-supplied addresses (which
might in turn have IO mappings or other issues). Clearly it's not worth
doing a VMA lookup to see if the address is prefetch-safe, so either we
speed up non-broken machines, or we leave everybody slow.

I'd rather speed up non-broken machines and let the broken hardware
hopefully slowly die away.

Linus

2002-08-03 17:35:50

by David Woodhouse

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()


[email protected] said:
> My personal opinion is that if a prefetch has semantic meanings
> outside the "speed up subsequent accesses", it should not be exposed
> to the rest of the kernel (it might still be useful inside
> architecture-specific routines like optimized memcpy etc).

Prefetch generally means 'bring it into the cache'. On architectures with
non-cache-coherent DMA, doing a prefetch on wild address which happens to be
a DMA buffer for which we've just said 'drop it from the cache' is generally
a bad thing.

Not that I'm necessarily disagreeing with you -- but can you confirm that
you are including all architectures with non-cache-coherent DMA in the
'broken hardware' category below, or point out what I'm missing?

> I'd rather speed up non-broken machines and let the broken hardware
> hopefully slowly die away.


--
dwmw2


2002-08-03 19:44:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()



On Sat, 3 Aug 2002, David Woodhouse wrote:
>
> Not that I'm necessarily disagreeing with you -- but can you confirm that
> you are including all architectures with non-cache-coherent DMA in the
> 'broken hardware' category below, or point out what I'm missing?

Rule: if prefetch to random addresses is a problem, then that's broken
hardware.

I don't think that non-cache-coherency wrt DMA necessarily means that that
is true, though. If you flush all CPU caches to memory before starting the
DMA, and you invalidate the DMA'd memory range _after_ the DMA finished, a
"prefetch" on such an architecture is not a problem at all.

Sure, the data vould have been (for a while - between the potential
prefetch and the "DMA write finish invalidate") in the CPU caches in a
non-coherent state, but since the kernel reading anything from that region
would have been a bug in the first place, that should not matter.

Note that such architectures have issues with speculative reads etc too,
which can have all the same issues as prefetch. Of course, only slow and
stupid architectures tend to have non-coherent caches in the first place,
so "speculation" simply isn' tmuch of an issue.

Linus

(That's not to say that I think that non-coherent DMA is broken _anyway_,
but I don't think it should be a problem for prefetch itself)

2002-08-03 22:00:29

by David Woodhouse

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()


[email protected] said:
> I don't think that non-cache-coherency wrt DMA necessarily means that
> that is true, though. If you flush all CPU caches to memory before
> starting the DMA, and you invalidate the DMA'd memory range _after_
> the DMA finished, a "prefetch" on such an architecture is not a
> problem at all.

OK -- assuming you actually do flush before the DMA and invalidate
afterwards, that works. That's what I was missing; thanks :)

That's for a prefetch operation which doesn't mark the cache line
dirty/owned. If you have random addresses used with 'write prefetch'
operations, that's still going to be a problem.

--
dwmw2


2002-08-03 22:36:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: adjust prefetch in free_one_pgd()



On Sat, 3 Aug 2002, David Woodhouse wrote:
>
> That's for a prefetch operation which doesn't mark the cache line
> dirty/owned. If you have random addresses used with 'write prefetch'
> operations, that's still going to be a problem.

Does anybody do that? That's a horribly stupid prefetch, and nobody sane
should ever do anything like that, _especially_ if they don't have
cache-coherency at all layers.

Sure, "prefetch for writing" makes sense, but that shouldn't mark them
cacheline dirty, it should just try to get it exclusively. Does anybody
really mark it dirty?

Linus