2007-08-07 02:03:27

by Cédric Augonnet

[permalink] [raw]
Subject: [PATCH 0/2] PAT support for i386 and x86_64

Hi all,

For quite a while now, there as been numerous attempt to introduce support for
Page Attribute Table (PAT) in the Linux kernel, whereas most other OS already
have some support for this feature. Such a proposition popping up periodically,
perhaps it would be an opportunity to fix that lack for once.

PAT actually makes life much easier for people needing to use write-combining
(WC) attribute. Indeed the only solution yet available is MTRRs, which are not
as flexible as PAT is.
Indeed, even if they both allow to set memory types on memory regions, MTRRs
are intended to be used on a physical memory range, while PAT makes possible
to set such memory type based on a linear address mapping. The actual problem
with MTRR is that the BIOS usually covers physical RAM with write-back (WB),
and that MTRRs should not be overlapped.
Using write-combining feature is sometimes crucial for performance of driver
for which writes operation on the bus are critical. High-speed networking
drivers such as myri10ge would take benefit of using PAT instead of MTRRs.
Video drivers using frame-buffers could also avoid using MTRRs that way.

PAT6 can be a candidate for that purpose. Not only for backward compatibility
reasons, but also as various errata concerning PAT would end up by having the
PAT bit ignored, therefore if we use PAT6, when such an error occurs we would
have an uncached area, which means it would at worst not be effective, thus
resolving the issues raised in http://lkml.org/lkml/2004/4/13/102 .

*******

Back to 2.4.20, Terence Ripperda already submitted such a proposal for PAT
support in
- http://lkml.org/lkml/2003/5/20/131
himself refering to
- http://www.ussg.iu.edu/hypermail/linux/kernel/0303.1/0606.html

In 2005, Eric W. Biederman submitted another similar patch :
- http://lkml.org/lkml/2005/8/29/226

More recently, there have been numerous threads dealing with MTRR issues,
some of them suggesting that there should actually be some support for PAT.
See for instance :
- http://lkml.org/lkml/2007/6/6/333

*******

I therefore propose here a set of 2 patches to add some support for
write-combining using PAT :

[PATCH 1/2] - [PAT] Setting write combining on PAT6 at boot time
[PATCH 2/2] - [PAT] Support for write combining in pci_mmap_page_range

Kind regards,
Cedric


2007-08-07 08:30:18

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 0/2] PAT support for i386 and x86_64

On Mon, Aug 06, 2007 at 10:03:15PM -0400, C?dric Augonnet wrote:
> Hi all,
>
> For quite a while now, there as been numerous attempt to introduce support for
> Page Attribute Table (PAT) in the Linux kernel, whereas most other OS already
> have some support for this feature. Such a proposition popping up periodically,
> perhaps it would be an opportunity to fix that lack for once.

The trouble is you need to avoid conflicting attributes, otherwise
you risk cache corruption. This means the direct mapping needs to be fixed
up and the kernel needs to keep track of the ranges to prevent conflicts.

Also when there is already a MTRR it might not work due to the complicated
rules of MTRR<->PAT interaction.

Then there are old CPU errata that need to be handled etc.

There are also some other issues.

You didn't solve all that at all. If it was as simple as your patch
we would have long done it already.

-Andi

2007-08-08 09:07:06

by Loic Prylli

[permalink] [raw]
Subject: Re: PAT support for i386 and x86_64

On 8/7/2007 4:30 AM, Andi Kleen wrote:
> On Mon, Aug 06, 2007 at 10:03:15PM -0400, C?dric Augonnet wrote:
>
>> Hi all,
>>
>> For quite a while now, there as been numerous attempt to introduce support for
>> Page Attribute Table (PAT) in the Linux kernel, whereas most other OS already
>> have some support for this feature. Such a proposition popping up periodically,
>> perhaps it would be an opportunity to fix that lack for once.
>>
>
> The trouble is you need to avoid conflicting attributes, otherwise
> you risk cache corruption. This means the direct mapping needs to be fixed
> up and the kernel needs to keep track of the ranges to prevent conflicts.
>




I don't see why we have to worry about cache corruption in the case at
hand. Write-combining is needed to map io (typically pci-mem regions)
which are never mapped cachable anywhere, including in the linear map.


If somebody for some reason needs to play with special attributes on
regular RAM for which inconsistent aliasing could be a problem:
- please explain why that consistency issue is mentioned in the context
of write-combining/PAT: the problem already potentially exists through
the use of the _PAGE_PCD attribute, and having an extra WC choice should
not make the problem worse or better (note that with the initial patch
that WC/PAT combination is only exploited in pci_mmap_page_range() which
rightfully doesn't seem to care about cachable attribute consistency
anyway).



> Also when there is already a MTRR it might not work due to the complicated
> rules of MTRR<->PAT interaction.
>



Some PAT<->MTRR cases are messy, but getting a WC type through PAT seems
to straightforwardly take precedence over any MTRR type, doesn't it?



> Then there are old CPU errata that need to be handled etc.
>



We mentioned that point in the introduction of the patch. We have looked
at the documented PAT erratas that exists for the Pentium-II,
Pentium-III, Pentium-M, some early pentium-IV processors. While there
are minor variations for the description of the bug depending on the
processor, they all fit into the following description: "under some
circumstances the PAT bit might be ignored". The patch purposefully puts
write-combining at PAT6 so if the conditions are there for the errata
to trigger, PAT2 (UC-) will be selected by the processor and the
corresponding accesses will simply be uncacheable instead of being
write-combining, which doesn't affect correctness.


We would certainly appreciate having any other erratas we missed
mentioned here for reference.


> There are also some other issues.
>



Introducing something like ioremap_wc() or a "sfence" wmb_wc() was
excluded from the initial patch on purpose. It would be the logical next
step (but involves possible API driver extensions), so the proposed
patch was limited to making use of the new WC attribute by really
handling the write_combine argument of pci_mmap_page_range(). That
seemed to generate enough objections to start with.


There is at least one mostly cosmetic problem in the patch in
pci_mmap_page_range() where huge pages should not be a concern here.



> You didn't solve all that at all. If it was as simple as your patch
> we would have long done it already.


I am sorry, but after this and other messages on the list, I still don't
understand why a simple approach hasn't been made available already:
- the attribute consistency issue seems independant of using PAT to
create WC mappings (in particular the possibility of mixing by accident
WC and UC aliases has always existed, having broken driver make such
aliases through a new PAT-based attribute combination hardly changes
anything, same for mixing uncachable and cachable aliases, new WC/PAT
combination doesn't change anything).


Sure there would some logical follow-up after a simple patch (like
providing a proper ioremap() interface, and maybe a new kind of barrier,
and handling the PAT bit properly in arch/x86/mm/pageattr.c, but those
follow-ups were purposefully excluded, and none seems very complex either).


So if there is any remaining issue on doing something "as simple as this
patch", please clarify it.


Best regards,


Loic


2007-08-08 10:14:50

by Andi Kleen

[permalink] [raw]
Subject: Re: PAT support for i386 and x86_64

> I don't see why we have to worry about cache corruption in the case at
> hand. Write-combining is needed to map io (typically pci-mem regions)
> which are never mapped cachable anywhere, including in the linear map.

If we WC them using PAT then there would be a UC<->WC conflict with
the direct mapping and possibly others. That's already undefined
and not allowed.

After some very bad experiences in the past I'm not going to take
chances on this.

We really need to keep all possible mappings synchronized.

-Andi

2007-08-08 15:52:33

by Cédric Augonnet

[permalink] [raw]
Subject: Re: PAT support for i386 and x86_64

[Apologize for the double-post, messed up with my mailer... ]

2007/8/8, Andi Kleen <[email protected]>:
> > I don't see why we have to worry about cache corruption in the case at
> > hand. Write-combining is needed to map io (typically pci-mem regions)
> > which are never mapped cachable anywhere, including in the linear map.
>
> If we WC them using PAT then there would be a UC<->WC conflict with
> the direct mapping and possibly others. That's already undefined
> and not allowed.
>
> After some very bad experiences in the past I'm not going to take
> chances on this.
>
> We really need to keep all possible mappings synchronized.
>
> -Andi
>
>

Hi,

First thanks for your reactions.

Andi, what i don't understand is, if we only put a minimal patch, like only
modifying the register, perhaps we may avoid having all vendors doing
their own recipe, possibly all messing the one the other. As you said ,
changing this register is absolutely not the actual issue with PAT, but don't
you think such a first step is needed to avoid conflicts since people _do_
already set that register from their driver.

I agree there is work for a comprehensive support of PAT, especially when
dealing with ioremap, but even though we only handle a smallish portion
of the problem yet, perhaps it's time to at least modify that register ?

Of course there is no problem for that being dependant on a CONFIG_PAT.

Kind regards,
C?dric