2002-06-17 21:07:23

by Brunner, Richard

Subject: RE: another new version of pageattr caching conflict fix for 2.4

Making the AGP Aperture write-back cacheable is not good from
a performance perspective. (I can't comment on which is better
from a Linux Kernel perspective).

An Aperture Page can be made cache-coherent, depending on the
implementation, and the AGP 3.0 spec provides an architectural
way of specifying and controlling this as well. But by default
the area is not made cache-coherent, due to the performance loss
and the lack of software to take advantage of it -- the two play
off against each other.

Making it cache-coherent causes every AGP access to
snoop processor caches and this can be quite a hit in
performance when you consider the predominant AGP software
model. Most software that takes advantage of AGP is still
using the old Intel model of uncacheable memory, in which the
majority of data placed in the Aperture are read-only structures
for the AGP device -- such as vertex lists, locked vertex arrays,
and texture data. For the most part this fits the current
paradigm of throwing textures and vertices at the graphics
device. The only graphics area found so far that could
benefit from a coherent aperture is video capture data which
streams in from the graphics device and requires CPU
post-processing.



-Rich ...
[[email protected] -- (360)-867-0654]
[Senior Member, Technical Staff, SW R&D @ AMD]

> -----Original Message-----
> From: Albert D. Cahalan [mailto:[email protected]]
> Sent: Sunday, June 16, 2002 5:09 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected];
> [email protected]; Brunner, Richard; Langsdorf, Mark
> Subject: Re: another new version of pageattr caching conflict fix for 2.4
>
>
> Andi Kleen writes:
>
> >> the same problems if the agp aperture was marked write-back, and the
> >
> > AGP aperture is uncacheable, not write-back.
> >
> >> memory was marked uncacheable. My gut impression is to just make the
> >> agp aperture write-back cacheable, and then we don't have to change
> >> the kernel page table at all. Unfortunately I don't expect the host
> >
> > That would violate the AGP specification.
> >
> >> bridge with the memory and agp controllers to like that mode,
> >> especially as there are physical aliasing issues.
> >
> > exactly.
>
> You can do whatever you want, as long as...
>
> 1. you have cache control instructions and use them
> 2. the bridge ignores the coherency protocol (no machine check)
>
> Most likely you should make the AGP memory write-back
> cacheable. This requires some care regarding cache lines,
> but ought to be faster.
>
> >>> Fixing the MTRRs is fine, but it is really outside the scope of my patch.
> >>> Just changing the kernel map wouldn't be enough to fix wrong MTRRs,
> >>> because it wouldn't cover highmem.
> >>
> >> My preferred fix is to use PAT, to override the buggy mtrrs. Which
> >> brings up the same aliasing issues. Which makes it related but
> >> outside the scope of the problem.
> >
> > I don't follow you here. IMHO it is much easier to fix the MTRRs in the
> > MTRR driver for those rare buggy BIOS (if they exist - I've never seen one)
> > than to hack up all of memory management just to get the right bits set.
> > I see no disadvantage of using the MTRRs and it is lot simpler than
> > PAT and pte bits.
>
> For non-x86 one must "hack up all of memory management" anyway.
>
> Example: There aren't any MTRRs on the PowerPC, but every page
> has 4 memory type bits. It's not OK to map something more than
> one way at the same time. Large "pages" (256 MB each) are used
> to cover all of non-highmem physical memory.


2002-06-18 17:44:51

by Eric W. Biederman

Subject: Re: another new version of pageattr caching conflict fix for 2.4

[email protected] writes:

> Making the AGP Aperture write-back cacheable is not good from
> a performance perspective. (I can't comment on which is better
> from a Linux Kernel perspective).

Mainly I was arguing from these points:
- Make the common case fast.
- The common case is write-back.
- AGP is not the common case.
- AGP has performance limitations.

From the kernel side, the caching attributes don't particularly matter,
because the AGP aperture introduces physical aliasing. So
the cache coherency protocols cannot make our lives simpler.
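
To make the aliasing point concrete, here is a toy model (not kernel
code) of a GART-style remapping table. It only shows how one RAM page
ends up reachable through two different bus addresses. The aperture
base, the table contents and the name gart_translate are all made up
for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define PAGE_SIZE      (1u << PAGE_SHIFT)
#define APERTURE_BASE  0xE0000000u  /* hypothetical aperture base */
#define APERTURE_PAGES 4u           /* tiny aperture, for the example only */

/* Hypothetical remapping table: aperture page i -> physical page frame. */
const uint32_t gart_table[APERTURE_PAGES] = {
    0x00123, 0x00456, 0x00078, 0x00999
};

/* Translate a bus address. Addresses inside the aperture are redirected
 * through the table; everything else passes through unchanged. */
uint32_t gart_translate(uint32_t addr)
{
    uint32_t off, page;

    if (addr < APERTURE_BASE)
        return addr;
    off = addr - APERTURE_BASE;
    page = off >> PAGE_SHIFT;
    if (page >= APERTURE_PAGES)
        return addr;
    return (gart_table[page] << PAGE_SHIFT) | (off & (PAGE_SIZE - 1));
}
```

In this model 0xE0001010 and 0x00456010 both resolve to the same byte
of RAM, so the two mappings can carry different caching attributes
for the same memory.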

> An Aperture Page can be made
> cache-coherent depending on the implementation
> and the AGP 3.0 spec provides an
> architectural way of specifying and controlling these as
> well. But, by default the area is not made cache-coherent
> due to the performance loss and the lack of software to take
> advantage of it -- the two play off against each
> other.

Cache coherency is tricky, so there is some argument there.

> Making it cache-coherent causes every AGP access to
> snoop processor caches and this can be quite a hit in
> performance when you consider the predominant AGP software
> model. Most software that takes advantage of AGP is still
> using the old Intel model of uncacheable, the majority of
> data placed in the Aperture are read-only structures for the
> AGP device -- such as vertex lists, locked vertex arrays,
> and texture data. For the most part this fits the current
> paradigm of throwing textures and vertices at the graphics
> device. The only graphics area found so far that could
> benefit from a coherent aperture is video capture data which
> streams in from the graphics device and requires CPU
> post-processing.

I hadn't thought of the snooping from the AGP side. Even then, given
that the AGP aperture is a fixed region, it would probably work to just
have a fixed snoop on the AGP region, and only do something when AGP
traffic comes in. I will buy the argument that it may not be
possible to do it at full performance unless the AGP card knows
something about cache coherency, though mostly I suspect it is a cost
tradeoff issue.

If the area is purely uncacheable, then writing to that area cannot go
at full memory speed. So we should at the very least mark the region
as write-combining. This should get the CPU putting data in there
at about 1400MB/s with PC2100, and moving data there just short of
2100MB/s. This doesn't directly help AGP performance, but it does
allow the CPU to spend its cycles on more important things, much
sooner.
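
For reference, one existing way to request write-combining for a
physical range on Linux is the text interface of /proc/mtrr. The sketch
below only builds the request string; it does not open or write the
file. The helper name mtrr_wc_request and the base and size values are
placeholders, and the power-of-two check reflects the usual x86 MTRR
constraint rather than anything specific to this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a /proc/mtrr registration request for a write-combining range.
 * Returns the number of characters written, or -1 if buf is too small. */
int mtrr_wc_request(char *buf, size_t len,
                    unsigned long base, unsigned long size)
{
    int n;

    /* Assumed constraint: size is a power of two, base is aligned to it. */
    assert(size != 0 && (size & (size - 1)) == 0);
    assert((base & (size - 1)) == 0);

    n = snprintf(buf, len, "base=0x%lx size=0x%lx type=write-combining\n",
                 base, size);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A caller with root privileges would write the resulting line to
/proc/mtrr; for a 64MB aperture at a hypothetical base of 0xe0000000
the line is "base=0xe0000000 size=0x4000000 type=write-combining".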

I don't believe there is a memory caching attribute, other than
write-back, that would speed up copying data out of the AGP aperture,
which is where I guess video capture comes in.

Eric



2002-06-26 01:38:01

by Albert D. Cahalan

Subject: Re: another new version of pageattr caching conflict fix for

richard.brunner writes:
> [Albert Cahalan]

>> You can do whatever you want, as long as...
>>
>> 1. you have cache control instructions and use them
>> 2. the bridge ignores the coherency protocol (no machine check)
>>
>> Most likely you should make the AGP memory write-back
>> cacheable. This requires some care regarding cache lines,
>> but ought to be faster.

> Making the AGP Aperture write-back cacheable is not good from
> a performance perspective. (I can't comment on which is better
> from a Linux Kernel perspective).
>
> An Aperture Page can be made
> cache-coherent depending on the implementation
> and the AGP 3.0 spec provides an
> architectural way of specifying and controlling these as
> well. But, by default the area is not made cache-coherent
> due to the performance loss and the lack of software to take
> advantage of it -- the two play off against each
> other.
>
> Making it cache-coherent causes every AGP access to
> snoop processor caches and this can be quite a hit in
> performance when you consider the predominant AGP software
> model. Most software that takes advantage of AGP is still
> using the old Intel model of uncacheable, the majority of
> data placed in the Aperture are read-only structures for the
> AGP device -- such as vertex lists, locked vertex arrays,
> and texture data. For the most part this fits the current
> paradigm of throwing textures and vertices at the graphics
> device. The only graphics area found so far that could
> benefit from a coherent aperture is video capture data which
> streams in from the graphics device and requires CPU
> post-processing.

I didn't suggest enabling coherency.

You can cache your _incoherent_ memory as long as the CPU
has instructions that manipulate cache lines. This gives
you write-combining without AGP snooping overhead. If you
can have the CPU be incoherent too, you should do so.

I'm used to working with PowerPC, so maybe you'll tell me
that x86 is too lame to handle this. Hopefully AMD supports
most of these useful operations:

a. mark a cache line valid (with junk data)
b. cause immediate write-back
c. mark a cache line invalid (discard data)
d. prefetch for load
e. prefetch for store (leave clean)
f. create a zero-filled dirty cache line
g. write-back, then invalidate
h. mark some memory as "cached, but NOT coherent"
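
As a rough guide for readers who don't know PowerPC, the list above
lines up with the standard data-cache-block mnemonics roughly as in the
table below. This mapping is an editorial reading of the list, not part
of the original message; the struct and function names are made up.
Item h is a page attribute (the coherence bit among the per-page
memory type bits), so it has no instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry per list item; the mnemonic is NULL when the item is a
 * page attribute rather than an instruction. */
struct cache_op {
    char item;
    const char *ppc_mnemonic;
};

const struct cache_op cache_ops[] = {
    { 'a', "dcba"   },  /* allocate line, contents undefined  */
    { 'b', "dcbst"  },  /* store (write back) the line        */
    { 'c', "dcbi"   },  /* invalidate, discarding data        */
    { 'd', "dcbt"   },  /* touch: prefetch for load           */
    { 'e', "dcbtst" },  /* touch for store                    */
    { 'f', "dcbz"   },  /* zero the line without reading RAM  */
    { 'g', "dcbf"   },  /* flush: write back, then invalidate */
    { 'h', NULL     },  /* page attribute, not an instruction */
};

/* Look up the mnemonic for a list item; NULL if there is none. */
const char *ppc_mnemonic_for(char item)
{
    size_t i;

    for (i = 0; i < sizeof cache_ops / sizeof cache_ops[0]; i++)
        if (cache_ops[i].item == item)
            return cache_ops[i].ppc_mnemonic;
    return NULL;
}
```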

So you can work like this:

1. mark a cache line valid (with junk data)
2. modify the data the regular way
3. write-back, then invalidate
4. tell the video card to read the data
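
The four steps above can be sketched with a toy software model of a
single incoherent write-back cache line. This is plain C simulating
the behaviour, not real cache-control code; every name in it is
invented for the example. Because step 1 leaves junk in the line, the
model overwrites the whole line before flushing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINE 32

struct line_cache {
    bool valid, dirty;
    unsigned char data[LINE];
};

static unsigned char ram[LINE];  /* backing memory for the one line */

/* 1. mark the line valid without reading memory (contents are junk) */
void line_allocate(struct line_cache *c) { c->valid = true; c->dirty = false; }

/* 2. ordinary store: changes the cached copy only */
void cpu_store(struct line_cache *c, int off, unsigned char v)
{
    assert(c->valid);
    c->data[off] = v;
    c->dirty = true;
}

/* 3. write-back, then invalidate */
void line_flush(struct line_cache *c)
{
    if (c->valid && c->dirty)
        memcpy(ram, c->data, LINE);
    c->valid = c->dirty = false;
}

/* 4. the device reads memory directly; it never snoops the cache */
unsigned char device_read(int off) { return ram[off]; }

/* Fill the whole line with v, optionally skip the flush, and report
 * what the device sees in the first byte. */
unsigned char send_to_device(bool flush, unsigned char v)
{
    struct line_cache c = { false, false, {0} };
    int i;

    memset(ram, 0, LINE);
    line_allocate(&c);
    for (i = 0; i < LINE; i++)
        cpu_store(&c, i, v);
    if (flush)
        line_flush(&c);
    return device_read(0);
}
```

With the flush the device sees the new data; without it the device
still sees the old contents of memory.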

For data coming the other way:

1. ensure that the cache line isn't dirty
2. tell the video card to write data
3. ensure that the cache line isn't valid
4. prefetch for read
5. see what the video card had to say
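
The same kind of toy model shows why steps 1 and 3 matter in this
direction. Again this is a simulation in plain C with invented names,
not real cache-control code: the cache holds one line, the device
writes memory behind the cache's back, and the CPU only sees the new
data if the stale line has been cleaned and then discarded first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINE 32

struct line_cache {
    bool valid, dirty;
    unsigned char data[LINE];
};

static unsigned char ram[LINE];  /* backing memory for the one line */

/* step 1: write the line back if dirty, leaving it clean */
void line_writeback(struct line_cache *c)
{
    if (c->valid && c->dirty)
        memcpy(ram, c->data, LINE);
    c->dirty = false;
}

/* step 3: discard the cached copy */
void line_invalidate(struct line_cache *c) { c->valid = c->dirty = false; }

/* step 4: prefetch; fetches from memory only if the line is not valid */
void line_fill(struct line_cache *c)
{
    if (!c->valid) {
        memcpy(c->data, ram, LINE);
        c->valid = true;
        c->dirty = false;
    }
}

/* step 2: the device writes memory directly, without snooping */
void device_write(unsigned char v) { memset(ram, v, LINE); }

/* Run the sequence with or without the cache maintenance steps and
 * return the first byte as the CPU sees it afterwards. */
unsigned char receive_from_device(bool careful, unsigned char v)
{
    struct line_cache c = { false, false, {0} };

    memset(ram, 0, LINE);
    line_fill(&c);
    c.data[0] = 0x11;          /* an earlier CPU store left the line dirty */
    c.dirty = true;

    if (careful) line_writeback(&c);   /* 1 */
    device_write(v);                   /* 2 */
    if (careful) line_invalidate(&c);  /* 3 */
    line_fill(&c);                     /* 4: no-op if the line is still valid */
    return c.data[0];                  /* 5 */
}
```

Skipping steps 1 and 3 leaves the CPU reading its own stale 0x11
instead of what the device wrote.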