Hi All,
Sorry for the delay in getting this information out. I'll post this
info on gentoo.org as soon as I have time and some sleep.
Today, I had a conference call with Sean Cleveland and Wayne Meritsky of
AMD and Rik van Riel and William Lee Irwin of kernelhackerdom. Here is
what we know so far. There *is* an Athlon/AGP issue. This issue has
*not* been tied to a bug with the Athlon/Duron processors. The initial
reports of an Athlon CPU bug were based on information that I received
from an NVIDIA employee; this information turned out to be incorrect.
In all fairness, he simply misunderstood the technical details of this
issue and told me that it was due to an "Athlon CPU bug" rather than
saying that it was a "potential cache coherency interaction between
Athlon speculative writes and the GART". A mistake and misunderstanding
of the details, but his heart was in the right place. He was trying to
get this problem out in the open so that it could be addressed by the
Linux community.
AMD's educated guess is that these Athlon/AGP stability problems have to
do with speculative writes by the CPU and how they can cause inadvertent
trashing of AGP memory if pages are mapped indiscriminately by the OS
and drivers. The following explanation from AMD describes the issue in
detail:
-- AMD explanation follows --
This note documents a subtle problem that AMD has seen in the field.
The operating system has created a 4MB translation that has attribute
bits that allow it to be cacheable. GART also contains translations to
part of the underlying physical memory of this 4MB translation.
This situation is fundamentally illegal because GART is non-coherent and
all translations that the processor could use to access the AGP memory
must, therefore, be non-cacheable. Although we have seen no intentional
access to the AGP memory by the processor via the 4MB cacheable
translation we have seen legitimate, speculative, accesses performed by
the processor.
The problem that has been experienced is caused by a speculative store
instruction that is not ultimately executed. The logical address of the
store is through the 4MB translation to physical memory also translated
by GART and used by the AGP processor.
The effect of the store is to write-allocate a cache line in the data
cache and fill that cache line with data from the underlying physical
memory. Because the line was write-allocated it is subsequently written
back to physical memory even though the bits have not been changed by
the processor. This write-back occurs when the cache-line is
re-allocated based upon replacement policy and is far removed in time
from the point at which the bits were read.
Between the time of the read and the time of the write, the AGP
processor has modified the bits in physical memory and the bits in the
data cache are stale. This happens because GART, being non-coherent,
does not snoop the processor caches for modified data.
When the cache-line eviction occurs the stale data written to physical
memory has fatal side effects.
Our conclusion is that the operating system is creating coherency
problems within the system by creating cacheable translations to AGP
GART-mapped physical memory.
-- end of AMD explanation --
Best Regards,
--
Daniel Robbins <[email protected]>
Chief Architect/President http://www.gentoo.org
Gentoo Technologies, Inc.
This means that the fix belongs in the DRM drivers, specifically
DRM(mmap_dma) should clear the cacheability bits in the
vma->vm_page_prot at mmap time.
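A minimal sketch of what such a fix could look like (the function name and surrounding code are hypothetical; pgprot_noncached() is the standard kernel helper for clearing the cacheability bits on architectures that define it):

```c
/* Sketch only: in the DRM dma mmap path, force the user mapping
 * uncacheable before any pages are inserted into the vma. */
static int drm_mmap_dma(struct file *filp, struct vm_area_struct *vma)
{
	/* On x86 this sets the PCD/PWT bits in the protection, so the
	 * CPU will not cache -- and thus cannot speculatively dirty --
	 * these pages. */
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	/* ... existing page-remapping logic follows unchanged ... */
	return 0;
}
```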
I always thought the idea was that the AGP device accessed main memory
through GART with full cache coherency with the processor. This
should be pretty easy to implement since the PCI controller has to do
this already.
I'm really surprised that both the NVIDIA driver and DRM both get this
wrong.
Actually, the AMD guys say this:
This situation is fundamentally illegal because GART is non-coherent and
all translations that the processor could use to access the AGP memory
must, therefore, be non-cacheable. Although we have seen no intentional
access to the AGP memory by the processor via the 4MB cacheable
translation we have seen legitimate, speculative, accesses performed by
the processor.
"access by the processor" to the 4MB cacheable translation or
somewhere else? This needs clarification.
Disabling 4MB translations has zero effect on the problem they say is
the root of all of this. The mappings of the GART memory given to the
OpenGL driver are still going to be cacheable, thus the problem ought to
still exist.
As usual, AMD's commentary brings more questions than it answers.
From: Denis Vlasenko <[email protected]>
Date: Wed, 23 Jan 2002 12:10:57 -0200
Did AMD say in which 4MB page a cache line was speculatively read and then
written? I mean, there are only a few 4MB pages used by Linux; which one
had the 'cacheable' bit wrongly set? That's all we need to know now.
See my other email, actually it appears AMD tells us this and my
previous analysis was wrong.
> The problem that has been experienced is caused by a speculative store
> instruction that is not ultimately executed. The logical address of the
> store is through the 4MB translation to physical memory also translated
> by GART and used by the AGP processor.
>
> The effect of the store is to write-allocate a cache line in the data
> cache and fill that cache line with data from the underlying physical
> memory. Because the line was write-allocated it is subsequently written
> back to physical memory even though the bits have not been changed by
> the processor. This write-back occurs when the cache-line is
> re-allocated based upon replacement policy and is far removed in time
> from the point at which the bits were read.
He can only mean by this that there is some branch protected store
(not taken) to the 4MB linear mappings used by the kernel (starting
at PAGE_OFFSET).
But the only thing I am still confused about is what 4MB mappings
have to do with any of this. What I take from the description is that
the problem will still exist after 4MB mappings are disabled. What
prevents the processor from doing the speculative store to the
cacheable mappings once 4MB pages are disabled?
At best, I bet turning off 4MB pages makes the bug less likely.
It does not eliminate the chance to hit the bug.
So what it sounds like is that if there is any cacheable mapping
_WHATSOEVER_ to physical memory accessible by the GART, the problem
can occur due to a speculative store being cancelled.
A real fix would be much more involved, therefore.
In fact, we map the GART mapped memory to the user fully cacheable.
That would have to be fixed, plus we'd need to mark non-cacheable the
linear PAGE_OFFSET mappings of the kernel (4MB or not) as well.
On Wed, 23 Jan 2002, David S. Miller wrote:
> But the only thing I am still confused about, is what 4MB mappings
> have to do with any of this. What I take from the description is that
> the problem will still exist after 4MB mappings are disabled. What
> prevents the processor from doing the speculative store to the
> cacheable mappings once 4MB pages are disabled?
>
> At best, I bet turning off 4MB pages makes the bug less likely.
> It does not eliminate the chance to hit the bug.
I've asked the same question yesterday on the phone.
The explanation is pretty simple:
1) the video driver gets free pages for the agp
data structures
2) the speculative store doesn't cross page
boundaries
This means that when using 4kB pages instead of 4MB
pages the agp data is "fenced off" from the other
kernel data.
kind regards,
Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document
http://www.surriel.com/ http://distro.conectiva.com/
>
> The effect of the store is to write-allocate a cache line in the data
> cache and fill that cache line with data from the underlying physical
> memory. Because the line was write-allocated it is subsequently written
> back to physical memory even though the bits have not been changed by
> the processor.
I'm not sure if I understand you correctly:
speculative write operations always set the cache line dirty bit, even
if the write operation is not executed (e.g. discarded due to a
mispredicted jump);
memory mapped by the GART is not cache coherent, and the write-back of
the cache line clobbers it.
Result: data corruption.
Is that correct?
Then "nopentium" only works by chance: I assume that speculative
operations do not walk the page tables, thus the probability that a
valid TLB entry is found for the GART mapped page is slim. But if there
is an entry, then the corruption would still occur.
How could we work around it?
a) At GART mapping time, we'd have to
- flush the cache
- unmap the pte entries that point to the pages that will be mapped by
GART
- create a new, uncached, ioremap mapping to the pages.
Obviously that won't work with 4 MB pages.
b) abuse highmem.
highmem memory is not mapped. If we only use highmem pages for GART, and
ensure that page->virtual is 0, then we know that no valid pte points
into the GART pages.
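A rough sketch of option (a), assuming a hypothetical kernel_unmap_page() helper for dropping the cacheable PTE from the linear mapping (no such helper exists as-is, and this obviously cannot work where the linear mapping uses large pages):

```c
/* Sketch of workaround (a): before handing a page to the GART, remove
 * every cacheable kernel view of it and replace it with an uncached one. */
static void *map_for_gart(struct page *page)
{
	unsigned long phys = page_to_phys(page);

	flush_cache_all();       /* no stale lines may remain           */
	kernel_unmap_page(phys); /* hypothetical: drop the cacheable
	                            PTE from the linear mapping         */
	return ioremap_nocache(phys, PAGE_SIZE); /* uncached view       */
}
```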
On Wed, Jan 23, 2002 at 02:24:41AM -0800, David S. Miller wrote:
> He can only mean by this that there is some branch protected store
> (not taken) to the 4MB linear mappings used by the kernel (starting
> at PAGE_OFFSET).
> But the only thing I am still confused about, is what 4MB mappings
> have to do with any of this. What I take from the description is that
> the problem will still exist after 4MB mappings are disabled. What
> prevents the processor from doing the speculative store to the
> cacheable mappings once 4MB pages are disabled?
The range of addresses where speculation is attempted is partially
limited by the page size, since it is unlikely the CPU will resolve TLB
misses for speculative memory accesses before it is committed to them.
Furthermore, the separate TLBs for 4KB and 4MB pages on i386 allow far
more TLB hits during speculation.
On Wed, Jan 23, 2002 at 02:24:41AM -0800, David S. Miller wrote:
> At best, I bet turning off 4MB pages makes the bug less likely.
> It does not eliminate the chance to hit the bug.
> So what it sounds like is that if there is any cacheable mapping
> _WHATSOEVER_ to physical memory accessible by the GART, the problem
> can occur due to a speculative store being cancelled.
> A real fix would be much more involved, therefore.
> In fact, we map the GART mapped memory to the user fully cacheable.
Controlling how page tables are edited and/or statically set up does
not seem that far out to me, though it could be inconvenient,
especially with respect to dynamically-created translations such as
are done for user pages, as there is essentially no infrastructure
for controlling the cacheable attribute(s) of user mappings now as
I understand it.
On Wed, Jan 23, 2002 at 02:24:41AM -0800, David S. Miller wrote:
> That would have to be fixed, plus we'd need to mark non-cacheable the
> linear PAGE_OFFSET mappings of the kernel (4MB or not) as well.
I would be concerned about efficiency if a larger portion of the direct-
mapped kernel virtual address space than necessary were uncacheable.
Otherwise, if I understand this properly (pardon me for being conservative
in my interpretation), you refer only to the kernel mappings of memory used
by the GART.
Cheers,
Bill
From: Rik van Riel <[email protected]>
Date: Wed, 23 Jan 2002 08:31:32 -0200 (BRST)
This means that when using 4kB pages instead of 4MB
pages the agp data is "fenced off" from the other
kernel data.
Kernel data this explains, thank you.
But on the user side, we map these GART-mapped pages into user space
with the cacheable bit set.
From: Studierende der Universitaet des Saarlandes <[email protected]>
Date: Wed, 23 Jan 2002 10:38:09 +0000
Then "nopentium" only works by chance: I assume that speculative
operations do not walk the page tables, thus the probability that a
valid TLB entry is found for the GART mapped page is slim. But if there
is an entry, then the corruption would still occur.
This isn't true. The speculative store won't get data into the
cache if there is a TLB miss.
4MB pages map the GART pages and "other stuff", ie. memory used by
other subsystems, user pages and whatever else. This is the only
way the bug can be thus triggered for kernel mappings, which is why
turning off 4MB pages fixes this part.
The only unresolved bit is the fact that we map these GART pages
cacheable into user space. That ought to cause the problem too.
[email protected] said:
> speculative write operations always set the cache line dirty bit,
> even if the write operation is not executed (e.g. discarded due to a
> mispredicted jump)
How predictable is this? Dealing with non-coherent memory is perfectly
normal - could we manage to work around this problem by flushing the caches
when the CPU _might_ have dirtied a cache line rather than only when we know
we've actually written to memory? Something like...
--- old.c Wed Jan 23 11:31:01 2002
+++ new.c Wed Jan 23 11:30:30 2002
@@ -1,5 +1,7 @@
if (condition) {
writeb();
- simon_says_flush_cache_page();
}
+/* Flush the cache unconditionally - a speculative write may have dirtied
+ the cache line even though it didn't actually happen. */
+ simon_says_flush_cache_page();
Of course, if the behaviour is completely random, and the CPU will dirty
random cache lines from all over the place, even from completely unrelated
code that just happens to have the 'wrong' address in a register that it
doesn't actually end up dereferencing, that can never work.
--
dwmw2
From: William Lee Irwin III <[email protected]>
Date: Wed, 23 Jan 2002 03:39:42 -0800
as there is essentially no infrastructure
for controlling the cacheable attribute(s) of user mappings now as
I understand it.
Yes there most certainly are. The driver's MMAP method can fully edit
the page protection attributes for that mmap area as it pleases.
Franks a lot,
David S. Miller
[email protected]
From: David Woodhouse <[email protected]>
Date: Wed, 23 Jan 2002 11:44:48 +0000
[email protected] said:
> speculative write operations always set the cache line dirty bit,
> even if the write operation is not executed (e.g. discarded due to a
> mispredicted jump)
How predictable is this? Dealing with non-coherent memory is perfectly
normal - could we manage to work around this problem by flushing the caches
when the CPU _might_ have dirtied a cache line rather than only when we know
we've actually written to memory? Something like...
It isn't so simple. You would have to catch every single store to
every page in the 4MB mapped region that happens to contain GART
mapped pages.
This isn't the way to solve this problem, trust me. :)
>>>>> "David" == David S Miller <[email protected]> writes:
David> 4MB pages map the GART pages and "other stuff", ie. memory used by
David> other subsystems, user pages and whatever else. This is the only
David> way the bug can be thus triggered for kernel mappings, which is why
David> turning off 4MB pages fixes this part.
Erm, why would the granularity of mapping matter at all? Or, for that
matter, the very existence of address translation?
David> The only unresolved bit is the fact that we map these GART pages
David> cacheable into user space. That ought to cause the problem too.
They're mapped with 4KB pages too, right? What makes it different from
4KB kernel mappings?
From: Momchil Velikov <[email protected]>
Date: 23 Jan 2002 14:32:57 +0200
Erm, why would the granularity of mapping matter at all ?
Because on a TLB miss the speculative store would be cancelled.
With 4MB pages the TLB can hit, with 4K pages it cannot.
Franks a lot,
David S. Miller
[email protected]
>>>>> "David" == David S Miller <[email protected]> writes:
David> From: Momchil Velikov <[email protected]>
David> Date: 23 Jan 2002 14:32:57 +0200
David> Erm, why would the granularity of mapping matter at all ?
David> Because on a TLB miss the speculative store would be cancelled.
David> With 4MB pages the TLB can hit, with 4K pages it cannot.
Yes. But there _is_ some instruction writing into the AGP memory, and
this instruction will still write there no matter what the mappings are,
and it can still get speculatively executed and so on, leading to the
same result, no?
>
>This means that the fix belongs in the DRM drivers, specifically
>DRM(mmap_dma) should clear the cacheability bits in the
>vma->vm_page_prot at mmap time.
I'm afraid there might be more to this, see below.
>I always thought the idea was that the AGP device accessed main memory
>through GART with full cache coherency with the processor. This
>should be pretty easy to implement since the PCI controller has to do
>this already.
AFAIK, most (if not all) AGP bridges do not enforce cache coherency
for AGP transactions (while they do for PCI transactions through the
AGP port). AGP memory has to be mapped uncacheable.
>I'm really surprised that both the NVIDIA driver and DRM both get this
>wrong.
>
>Actually, the AMD guys say this:
>
> This situation is fundamentally illegal because GART is non-coherent and
> all translations that the processor could use to access the AGP memory
> must, therefore, be non-cacheable. Although we have seen no intentional
> access to the AGP memory by the processor via the 4MB cacheable
> translation we have seen legitimate, speculative, accesses performed by
> the processor.
>
>"access by the processor" to the 4MB cacheable translation or
>somewhere else? This needs clarification.
I'm not sure exactly about the AMD case, but there is at least a potential
problem with PPC in this regard. The issue is that in addition to the
non-cacheable mapping setup by the AGP driver (both the vma setup for
userland clients and the ioremap or whatever mapping setup for in kernel
clients), there is the kernel's own mapping of entire physical memory
(at least in non-highmem setup) which is cacheable. That means that there
is the theoretical possibility of some AGP mapped cache lines
polluting the cache and causing coherency problems if
- That memory is accessed via the kernel mapping of physical memory
(which shouldn't happen, but we should still make sure we properly
invalidate that memory from the cache when we actually setup the
AGP mapping)
- That memory becomes the target of a speculative access by the CPU
(either read or write). This _can_ actually happen if the CPU can do
speculative accesses across page boundaries. A 4k page mapped to AGP
can very well be physically located between 2 completely unrelated pages
that are used by the kernel via the kernel main RAM mapping. Accessing
some data at the end of the previous page could cause the CPU to do
a speculative access to the next page, as the mapping exists and is
cacheable and non-guarded.
The workaround here would be for AGP to also _unmap_ the AGP pages from
the main kernel mapping, which isn't always possible; for example, on PPC
we use the BATs to map the kernel lowmem, and we can't easily make
"holes" in a BAT mapping. That's one reason why I did some experiments
to make the PPC kernel able to disable its BAT mapping.
Now the question is: which CPUs can do speculative accesses across page
boundaries?
Ben.
From: [email protected]
Date: Wed, 23 Jan 2002 03:46:10 +0100
The workaround here would be for AGP to also _unmap_ the AGP pages from
the main kernel mapping, which isn't always possible, for example on PPC
we use the BATs to map the kernel lowmem, we can't easily make "holes" in
a BAT mapping. That's one reason why I did some experiments to make the
PPC kernel able to disable its BAT mapping.
This would be impossible on sparc64 too, since we implement these
mappings statically with an add instruction in the TLB handler.
But we also lack AGP on sparc64 so...
I don't think your PPC case needs the kernel mappings messed with.
I really doubt the PPC will speculatively fetch/store to a TLB
missing address.... unless you guys have large TLB mappings on
PPC too?
On Wed, 23 Jan 2002, David S. Miller wrote:
>
> This isn't true. The speculative store won't get data into the
> cache if there is a TLB miss.
>
The Pentium III loads TLB entries speculatively; there is an Intel
document on how to flush TLB entries where they explicitly mention that.
> 4MB pages map the GART pages and "other stuff", ie. memory used by
> other subsystems, user pages and whatever else. This is the only
> way the bug can be thus triggered for kernel mappings, which is why
> turning off 4MB pages fixes this part.
>
We might be lucky - the PIII performs speculative TLB loads, and the
Athlon performs spurious cache line writeouts - but I don't trust such
solutions.
>I don't think your PPC case needs the kernel mappings messed with.
>I really doubt the PPC will speculatively fetch/store to a TLB
>missing address.... unless you guys have large TLB mappings on
>PPC too?
Yes, we use BATs (sort of built-in fixed large TLBs) to map
the lowmem (or entire RAM without CONFIG_HIGHMEM).
So if some kind of loop is fetching memory near the end of a non-AGP
page via the linear RAM mapping (BAT mapping) and the next page is an
AGP bound page, the CPU may do speculative access to the AGP page via
the BAT mapping thus bringing in a cache line for the AGP page.
At least, that's my understanding, it has to be validated by some
CPU gurus from IBM though.
Ben.
David S. Miller writes:
> From: [email protected]
>> The workaround here would be for AGP to also _unmap_ the AGP pages from
>> the main kernel mapping, which isn't always possible, for example on PPC
>> we use the BATs to map the kernel lowmem, we can't easily make "holes" in
>> a BAT mapping. That's one reason why I did some experiments to make the
> PPC kernel able to disable its BAT mapping.
>
> This would be impossible on sparc64 too, since we implement these
> mappings statically with an add instruction in the TLB handler.
>
> But we also lack AGP on sparc64 so...
>
> I don't think your PPC case needs the kernel mappings messed with.
> I really doubt the PPC will speculatively fetch/store to a TLB
> missing address.... unless you guys have large TLB mappings on
> PPC too?
Yup, we do.
The PPC has a regular TLB for 4 kB pages, typically loaded
by a hardware hash-table lookup. It also has the BAT registers,
which act as a 4-entry software reloaded TLB for large mappings.
Early-stage MMU operations go like this:
1. simultaneous lookup in BAT registers and regular TLB
2. use BAT mapping if found
3. use TLB entry if found
4. proceed to page table lookup
So, if a speculative load/store operation happens in kernel memory,
it will definitely not be impeded by any TLB or page restrictions.
The regular TLB is simply ignored when there is a BAT hit.
That leaves 2 things required for the problem:
speculative stores cause cache loads with the dirty bit?
AGP non-coherent?
In the MPC7400 (first "G4") user's manual, I find no indication
that speculative stores occur at all. Motorola's manuals are
horrible though, so who knows...
AGP might be non-coherent. If so, then the CPU should use a
non-coherent mapping to avoid useless memory bus traffic.
User code has access to some cache control instructions,
so one may mark the memory cacheable for better performance
even when it is non-coherent. ("flush when you're done")
BTW, I'd say the Athlon is pretty broken to set the dirty bit
before a store is certain. The CPU has to be able to set this
bit on a clean cache line anyway, so I don't see how this
brokenness could help performance. Indeed, it hurts performance
by causing erroneous memory bus traffic. (It's a bug.)
On Wed, 2002-01-23 at 09:31, Albert D. Cahalan wrote:
> AGP might be non-coherent. If so, then the CPU should use a
> non-coherent mapping to avoid useless memory bus traffic.
> User code has access to some cache control instructions,
> so one may mark the memory cacheable for better performance
> even when it is non-coherent. ("flush when you're done")
>
> BTW, I'd say the Athlon is pretty broken to set the dirty bit
> before a store is certain. The CPU has to be able to set this
> bit on a clean cache line anyway, so I don't see how this
> brokenness could help performance. Indeed, it hurts performance
> by causing erroneous memory bus traffic. (It's a bug.)
The answer I got from AMD on this is that because the page is marked as
cacheable, they are _allowed_ to do this. Cacheable means that you are
allowed to cache, and also means that you are allowed to write back into
main memory if a cache line is marked dirty. At least that's how they
explained it to me. I believe the way they explained it to me is that
due to the page being cacheable, the CPU "owns" the page.
I don't know if I'd describe the writing out to main memory as a bug as
much as it is an area where, for whatever reason, AMD decided not to add
some additional processor resources to handle this particular case.
Maybe the current behavior allowed them to implement speculative writes
much more efficiently (from a # of transistors on the chip perspective)
than if they added special logic so that these speculative writes were
*not* written out to main memory. To me, this particular bug seems more
like an unfortunate coincidence of having a non-cache-coherent GART
working alongside a cache-coherent CPU.
Well, at least you guys see how someone at NVIDIA could mistake this for
a CPU bug. If the kernel is affected, I hope we can find a nice
workable solution to this problem that doesn't involve disabling 4MB
pages.
Also, everyone -- I can forward any additional questions you may have to
the people at AMD and get answers for you. All I ask is that these
questions come from actual people who will be looking into a solution,
since that is of course my main concern. I can then post answers from
AMD to this list for all to read.
Best Regards,
--
Daniel Robbins <[email protected]>
Chief Architect/President http://www.gentoo.org
Gentoo Technologies, Inc.
David S. Miller writes:
> From: William Lee Irwin III <[email protected]>
>> as there is essentially no infrastructure
>> for controlling the cacheable attribute(s) of user mappings now as
>> I understand it.
>
> Yes there most certainly are. The driver's MMAP method can fully edit
> the page protection attributes for that mmap area as it pleases.
That doesn't help for MAP_ANON pages.
That doesn't help when there are multiple useful cache settings.
It's not sane for every arch-independent driver to implement an
ioctl() or alternate devices. For PPC, you'd need 12 devices.
To a limited extent, the PPC can handle conflicting settings in
a useful manner. Not all 12 settings at once, but more than one.
BTW, reverse mappings could be useful for conflicting settings.
It is perfectly reasonable for a user to want non-coherent
memory and memory with odd caching behavior. It is not entirely
unreasonable to want large regions of memory to be BAT-mapped
for somewhat dedicated (Beowulf compute cluster) systems.
>AGP might be non-coherent. If so, then the CPU should use a
>non-coherent mapping to avoid useless memory bus traffic.
>User code has access to some cache control instructions,
>so one may mark the memory cacheable for better performance
>even when it is non-coherent. ("flush when you're done")
That's unfortunately not enough. The mapping of the page to
userland and the in-kernel mapping of the AGP aperture are done
with the non-cacheable attribute. _BUT_, that same memory is also
mapped as part of the RAM linear mapping of the kernel (the
BAT mapping on PPC). The problem happens when some code working
near the end of a different page via this linear mapping causes
a speculative access to happen on the next page. This will have
the side-effect of loading the cache with a line from the page
used by AGP.
I think PPC does only speculative reads, but even those (non dirty
cache lines) may cause trouble in our case.
Now, we have to check if the PPC is allowed to do speculative
reads across page boundaries. If that's the case, then we are screwed
and I will have to clean up the code allowing the kernel to run without
the BAT mapping (with a performance impact, unfortunately).
Ben.
From: "Albert D. Cahalan" <[email protected]>
Date: Wed, 23 Jan 2002 12:09:40 -0500 (EST)
> Yes there most certainly are. The driver's MMAP method can fully edit
> the page protection attributes for that mmap area as it pleases.
That doesn't help for MAP_ANON pages.
But it helps _THIS_ case, DRM(dma_mmap) is where all AGP memory comes
from and we can control the page protections for every page there.
If you want to start a thread about controlling cacheability
generically from mmap() or whatever idea you have, please
start a different thread and change the Subject.
On Wednesday 23 January 2002 11:18, David S. Miller wrote:
> This means that the fix belongs in the DRM drivers, specifically
> DRM(mmap_dma) should clear the cacheability bits in the
> vma->vm_page_prot at mmap time.
Is that sufficient? Must the cache be flushed explicitly?
[..]
> Disabling 4MB translations has zero effect on the problem they say is
> the root all of this. The mappings given to the OpenGL driver to the
> GART memory is still going to be cacheable, thus the problem ought to
> still exist.
>
> As usual, AMD's commentary brings more questions than it answers.
Perhaps speculative writes require an entry in the TLB, making it less
likely that they'll happen to 4KB pages.
Regards
Oliver
[email protected] writes:
> [Albert Cahalan]
>> AGP might be non-coherent. If so, then the CPU should use a
>> non-coherent mapping to avoid useless memory bus traffic.
>> User code has access to some cache control instructions,
>> so one may mark the memory cacheable for better performance
>> even when it is non-coherent. ("flush when you're done")
>
> That's unfortunately not enough. The mapping of the page to
> userland and the in-kernel mapping of the AGP aperture are done
> with non-cacheable attribute.
This is the slowest choice, but will work correctly.
It is better to make the user do explicit "dcbf", etc.
on cached memory. (use non-coherent mappings to avoid
wasting memory bus cycles on cache coherency traffic)
As long as users go through a library, they won't mind.
> _BUT_, that same memory is also
> mapped as part of the RAM linear mapping of the kernel (the
> BAT mapping on PPC). The problem happens when some code working
> near the end of a different page via this linear mapping causes
> a speculative access to happen on the next page. This will have
> the side-effect of loading the cache with a line from the page
> used by AGP.
>
> I think PPC does only speculative reads, but even those (non dirty
> cache lines) may cause trouble in our case.
Speculative reads only cause trouble if:
1. they are cached by an access through the BAT mapping
2. reads through the uncached page mapping use the cache
3. user code cares... AGP memory is for WRITING video frames, yes?
Speculative writes are like speculative reads, unless the PPC
is stupid enough to set the dirty bit even when the write does
not get performed.
> Now, we have to check if the PPC is allowed to do speculative
> reads across page boundaries. If that's the case, then we are screwed
> and I will have to cleanup the code allowing the kernel to run without
> the BAT mapping (with a performance impact unfortunately).
It's a waste to use BAT mappings for the kernel anyway, because
we try to keep the huge computations and graphics in userspace.
With page tables under BAT mappings, privileged user code could
be allowed to steal BAT registers for locked memory or IO memory.
So at the very least, you can keep the BAT mappings enabled
until user code wants AGP memory or is allowed to have the
BAT registers. When the user is done, the BAT registers may
be used to cover kernel space again. Other than the memory
used for page tables, there doesn't seem to be any harm in
having page tables that match the BAT registers in use.
On Wed, Jan 23, 2002 at 06:14:33PM -0500, Albert D. Cahalan wrote:
>
> It's a waste to use BAT mappings for the kernel anyway, because
> we try to keep the huge computations and graphics in userspace.
> With page tables under BAT mappings, privileged user code could
> be allowed to steal BAT registers for locked memory or IO memory.
The rationale behind BAT mapping the kernel is that the kernel does not
use any TLB entries, leaving them all for user processes. (As long as
we have < 512MB RAM.)
-VAL
On Wed, Jan 23, 2002 at 04:47:37PM +0100, [email protected] wrote:
> >I don't think your PPC case needs the kernel mappings messed with.
> >I really doubt the PPC will speculatively fetch/store to a TLB
> >missing address.... unless you guys have large TLB mappings on
> >PPC too?
>
> Yes, we use BATs (sort of built-in fixed large TLBs) to map
> the lowmem (or entire RAM without CONFIG_HIGHMEM).
Looking at bat_mapin_ram, it looks like we only map the first 512MB of
RAM with BATs, so we actually map the 512MB - 768MB range with PTEs
(and highmem starts at 768MB). Two of the DBATs are used by I/O
mappings, so that only leaves two DBATs of 256MB each to map lowmem
anyway. Am I missing something?
By the way, does the "nobats" option currently work on PowerMac?
-VAL
>On Wed, Jan 23, 2002 at 04:47:37PM +0100, [email protected] wrote:
>> >I don't think your PPC case needs the kernel mappings messed with.
>> >I really doubt the PPC will speculatively fetch/store to a TLB
>> >missing address.... unless you guys have large TLB mappings on
>> >PPC too?
>>
>> Yes, we use BATs (sort of built-in fixed large TLBs) to map
>> the lowmem (or entire RAM without CONFIG_HIGHMEM).
>
>Looking at bat_mapin_ram, it looks like we only map the first 512MB of
>RAM with BATs, so we actually map the 512MB - 768MB range with PTEs
>(and highmem starts at 768MB). Two of the DBATs are used by I/O
>mappings, so that only leaves two DBATs of 256MB each to map lowmem
>anyway. Am I missing something?
No, except maybe my last patch that actually limits lowmem to 512MB ;)
I don't think we use the io mapping BATs any more, do we? (Well,
maybe on PReP...) I don't on pmac.
>By the way, does the "nobats" option currently work on PowerMac?
No, nor on any other BAT-capable PPC (and that's the reason why I
did the above). Basically, our exception return path and some of
the hash manipulation functions aren't safe without BAT mapping,
especially on SMP when you can get evicted from the hash table
by the other CPU in places where taking hash faults isn't safe.
Ben.
On Sat, Jan 26, 2002 at 01:20:45AM +0100, Benjamin Herrenschmidt wrote:
> >
> >Looking at bat_mapin_ram, it looks like we only map the first 512MB of
> >RAM with BATs, so we actually map the 512MB - 768MB range with PTEs
> >(and highmem starts at 768MB). Two of the DBATs are used by I/O
> >mappings, so that only leaves two DBATs of 256MB each to map lowmem
> >anyway. Am I missing something?
>
> No, except maybe my last patch that actually limits lowmem to 512MB ;)
:)
> I don't think we use the io mapping BATs any more, do we? (Well,
> maybe on PReP...) I don't on pmac.
Lots and lots of PPC platforms use BATs for io mappings:
val@evilcat </sys/linuxppc_2_4_devel_pristine/arch/ppc/platforms>$ grep -l ppc_md.setup_io_mappings *
grep: SCCS: Is a directory
adir_setup.c
apus_setup.c
chrp_setup.c
gemini_setup.c
k2_setup.c
lopec_setup.c
mcpn765_setup.c
menf1_setup.c
mvme5100_setup.c
pcore_setup.c
powerpmc250.c
pplus_setup.c
prep_setup.c
prpmc750_setup.c
prpmc800_setup.c
sandpoint_setup.c
spruce_setup.c
zx4500_setup.c
I'm trying to get highmem working on Gemini, hence my interest.
> >By the way, does the "nobats" option currently work on PowerMac?
>
> No, nor on any other BAT-capable PPC (and that's the reason why I
> did the above). Basically, our exception return path and some of
> the hash manipulation functions aren't safe without BAT mapping,
> especially on SMP when you can get evicted from the hash table
> by the other CPU in places where taking hash faults isn't safe.
Hm, that's what I thought. Thanks for confirming that.
-VAL
>> I don't think we use the io mapping BATs any more, do we ? (well,
>> maybe on PReP...) I don't on pmac.
>
>Lots and lots of PPC platforms use BATs for io mappings:
>
>val@evilcat </sys/linuxppc_2_4_devel_pristine/arch/ppc/platforms>$ grep -l ppc_md.setup_io_mappings *
>grep: SCCS: Is a directory
>adir_setup.c
>apus_setup.c
>chrp_setup.c
>gemini_setup.c
>k2_setup.c
>lopec_setup.c
>mcpn765_setup.c
>menf1_setup.c
>mvme5100_setup.c
>pcore_setup.c
>powerpmc250.c
>pplus_setup.c
>prep_setup.c
>prpmc750_setup.c
>prpmc800_setup.c
>sandpoint_setup.c
>spruce_setup.c
>zx4500_setup.c
Hrm... all of these? Well... I don't like that. I'd much prefer
people to just properly ioremap what they need.
But well... I don't maintain all of them.
------------------ RFC822 Header Follows ------------------
From: <[email protected]>
To: Val Henson <[email protected]>
Cc: <[email protected]>
Subject: Re: Athlon/AGP issue update
Date: Sun, 27 Jan 2002 20:32:01 +0100
Message-Id: <[email protected]>
In-Reply-To: <20020127122235.D11111@boardwalk>
X-Mailer: CTM PowerMail 3.1.1 <http://www.ctmdev.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
-----------------------------------------------------------