On 7 Aug, 09:40, Andi Kleen <[email protected]> wrote:
> On Mon, Aug 06, 2007 at 10:03:15PM -0400, C?dric Augonnet wrote:
> > Hi all,
>
> > For quite a while now, there as been numerous attempt to introduce support for
> > Page Attribute Table (PAT) in the Linux kernel, whereas most other OS already
> > have some support for this feature. Such a proposition popping up periodically,
> > perhaps it would be an opportunity to fix that lack for once.
>
> The trouble is you need to avoid conflicting attributes, otherwise
> you risk cache corruption. This means the direct mapping needs to be fixed
> up and the kernel needs to keep track of the ranges to prevent conflicts.
The general case of drivers setting up mappings to PCI memory space is
usually done consistently, thus no virtual pages with different PAT
indexes referring to the same physical page would exist here.
What cases did you have in mind?
> Also when there is already a MTRR it might not work due to the complicated
> rules of MTRR<->PAT interaction.
Not so complicated - stronger ordering always takes precedence (for safety).
>From current Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3A: System Programming Guide, section 10.5.2:
"[...] If there is an overlap of page-level and MTRR caching controls,
the mechanism that prevents caching has precedence. In cases where
there is a overlap in the assignment of the write-back and
write-through caching policies to a page and a region of memory, the
write-through policy takes precedence. The write-combining policy
(which can only be assigned through an MTRR or the PAT) takes
precedence over either write-through or write-back."
If (eg) a broken BIOS marks PCI memory space as UC, we gain nothing
like today. Most modern systems (in the server domain) I've seen don't
do this, as Windows' use of PAT would be broken too.
> Then there are old CPU errata that need to be handled etc.
I don't see a problem with having CONFIG_PAT depend on P4/Athlon or
newer to avoid such workarounds for now. Better to have something
there which works in (only) 80% of the cases, than nothing at all.
> There are also some other issues.
What like?
> You didn't solve all that at all. If it was as simple as your patch
> we would have long done it already.
Perhaps this feature can be marked as WIP to allow this to move
forward while corner-cases are worked out. Adding such a go/no-go
barrier will hamper progress we do see, as it has done already for
years.
The PCI ordering config option years ago was a similar case, since
various drivers didn't issue wmb()s until they were fixed up.
Daniel
--
Daniel J Blueman
> What cases did you have in mind?
Mainly mapping memory which is rather tricky.
> From current Intel 64 and IA-32 Architectures Software Developer's Manual
> Volume 3A: System Programming Guide, section 10.5.2:
If you look at the detailed table it's not that simple.
> > Then there are old CPU errata that need to be handled etc.
>
> I don't see a problem with having CONFIG_PAT depend on P4/Athlon or
> newer to avoid such workarounds for now. Better to have something
> there which works in (only) 80% of the cases, than nothing at all.
Agreed, but the patch doesn't do that.
> > You didn't solve all that at all. If it was as simple as your patch
> > we would have long done it already.
>
> Perhaps this feature can be marked as WIP to allow this to move
> forward while corner-cases are worked out. Adding such a go/no-go
We're not going to have potentially cache corrupting features
as WIP. The last time one such bug took months to track down.
> barrier will hamper progress we do see, as it has done already for
> years.
Because it's not simple and nobody has done it properly yet.
> The PCI ordering config option years ago was a similar case, since
> various drivers didn't issue wmb()s until they were fixed up.
wmb alone is not enough for PCI ordering.
-Andi
On 07/08/07, Andi Kleen <[email protected]> wrote:
> > What cases did you have in mind?
>
> Mainly mapping memory which is rather tricky.
Yes, I'm talking about memory-mapping PCI memory regions; however I
can't see other immediate uses. Then maybe it's acceptable to document
that - for now - we need to hand out mappings which have consistent
PAT indexes, which would be natural in all cases I've seen with
processes writing/bursting to registers in PCI memory space.
> > From current Intel 64 and IA-32 Architectures Software Developer's Manual
> > Volume 3A: System Programming Guide, section 10.5.2:
>
> If you look at the detailed table it's not that simple.
Ignoring the N/A PPro and PII table, the various combinations of MTRR
and PAT entries (table 10-7) again show expected safe behaviour. We
don't worry about WP MTRRs, since BIOSes and Linux don't set that, and
we are practically only interested in WC and UC access, which we get
irrespective of the MTRR type covering that region.
A number of vendors (graphics, HPC etc) implement and use their own
PAT mechanisms, so defining only WC and UC covers all the cases.
> > > Then there are old CPU errata that need to be handled etc.
> >
> > I don't see a problem with having CONFIG_PAT depend on P4/Athlon or
> > newer to avoid such workarounds for now. Better to have something
> > there which works in (only) 80% of the cases, than nothing at all.
>
> Agreed, but the patch doesn't do that.
Noted for later.
> > > You didn't solve all that at all. If it was as simple as your patch
> > > we would have long done it already.
> >
> > Perhaps this feature can be marked as WIP to allow this to move
> > forward while corner-cases are worked out. Adding such a go/no-go
>
> We're not going to have potentially cache corrupting features
> as WIP. The last time one such bug took months to track down.
A driver that hands out mappings to the same PCI memory address with
different access ordering seems broken by design (due to PCI BARs
being strictly prefetchable or not). Again vendors implement their own
PAT mechanics which are consistent about access types.
(Intel IA32 ASDM 3A 10-47) "a WC page must never be aliased to a
cacheable page because WC writes may not check the processor caches."
This is not talking about corrupting the cache generally, but data
hitting a PCI target twice by (ie) a broken driver.
> > barrier will hamper progress we do see, as it has done already for
> > years.
>
> Because it's not simple and nobody has done it properly yet.
Providing only UC and WC types and documenting aliasing problems is
what is required here.
> > The PCI ordering config option years ago was a similar case, since
> > various drivers didn't issue wmb()s until they were fixed up.
>
> wmb alone is not enough for PCI ordering.
(not the point in question)
> -Andi
--
Daniel J Blueman
On Tue, Aug 07, 2007 at 04:26:06PM +0100, Daniel J Blueman wrote:
> (Intel IA32 ASDM 3A 10-47) "a WC page must never be aliased to a
> cacheable page because WC writes may not check the processor caches."
Yes that is what we're trying to avoid, but it is far more complicated
than you think.
> This is not talking about corrupting the cache generally, but data
> hitting a PCI target twice by (ie) a broken driver.
This can end up with data corruption, machine checks or
a broken device. All happened.
> > > barrier will hamper progress we do see, as it has done already for
> > > years.
> >
> > Because it's not simple and nobody has done it properly yet.
>
> Providing only UC and WC types and documenting aliasing problems is
> what is required here.
At least the direct mapping needs to be changed too and any mapping
comming from other sources (/sys/bus/pci, /dev/mem etc.)
Otherwise you get the dangerous illegal aliases.
The chances of a driver getting this right even when documented
are very small.
BTW i had some prototype patches to address some of this,
but they need much more work to handle all this properly.
Just reprogramming the PAT registers is really the smallest issue.
The IA64 port has done some of the work as an example, unfortunately they
use a strange memory mapping scheme that cannot be easily moved
over. iirc Bjorn identified over 10 potential sources of cache aliases
that all needed to be addressed in the end.
-Andi