2015-06-11 20:37:05

by Luis Chamberlain

[permalink] [raw]
Subject: RIP MTRR - status update for upcoming v4.2

The series to bury direct MTRR use is almost all in and on its way to
v4.2. As the pending series continue slowly to be merged I wanted to
take the time to reiterate the justification for these changes in
hopes it may help those still reviewing some of these patches which
are pending and to help document all these changes. There are also
some series which depend on symbols now exported through some other
subsystem trees so coordination is needed there. In those cases we
have the option there to sit and wait for the exported symbols to
trickle in through v4.2 and later on v4.3 finalize the changes, or to
let some of the depending changes to in through other subsystem trees.
I don't consider the coordination required difficult to handle so
would prefer to see the changes in for v4.2 to be able to put a nail
on the MTRR coffin sooner rather than later and to also help get more
testing out of this sooner rather than later. PAT is known to have
errata for some CPUs so hearing reports of issues with PAT would be
very valuable. I'll let maintainers decide on how that trickles
through. To help with all this towards the end I provide a status of
all the pending patches to get this work completed.

Justification
=========

We want to bury direct use of MTRR code because:

a) MTRR is x86 specific, this means all existing MTRR code is #idef'd
out. PAT support for x86 was implemented using architecture agnostic
APIs, this enables other architectures to provide support for a
similar write-combining feature, and removes the nasty #idef eyesores.
MTRR should be seen as a first step temporary architectural evolution
to what PAT eventually became on x86.

b) We have a long term goal to change the default behavior of
ioremap_nocache() and pci_mmap_page_range() to use PAT strong UC,
right now we cannot do this, but after all drivers are converted (all
these series I've been posting) we expect to be able to make the
change. Making a change to strong UC on these two calls can only
happen after a period of time of having Linux bake with all these
changes merged and in place. How many kernels we will want Linux baked
with all these transformations to arch_phys before making a change to
ioremap_nocache() and pci_mmap_page_range() is up to x86 folks. There
are other gains possible with this but I welcome others to chime in
here with what gains we can expect from this.

c) MTRR acts on physical addresses and requires power-of-two
alignment, on both the base used and size, this limits the flexibility
for MTRR use. For a good example of its limitations refer to the
patches which change the atyfb driver from using MTRR to PAT.

d) MTRR is known to be unreliable, it can at times not work even on
modern systems.

e) There is a limit to how many MTRRs you can use on a system. If
using a large number of devices with MTRR support you will quickly run
out of MTRRs. This is why originally Andy Lutomirski ended up adding
the arch_phys_wc_add() API, in order to take advantage of PAT which is
*not* bound to the same limitations as MTRRs are.

f) PAT has been available for quite a long time, since Pentium III
(circa 1999) and newer, but having PAT enabled does not restrict use
of MTRR and because of this some systems may end up then combining
MTRR and PAT. I do not believe this wasn't an original highly expected
wide use situation, it technically should work to combine both but
there might be issues with interactions between both, exactly what
issues can exist or have existed remains quite unclear as MTRR in and
of itself has been known to be unreliable anyway. If possible its best
to just be binary about this and only use MTRR if and only if
necessary because of the other issues known with MTRR.

g) Linux has support for Xen PV domains using PAT, this was introduced
by Juergen via v3.19 via commit 47591df50512 ("xen: Support Xen
pv-domains using PAT"). Since MTRR is old we don't want to add MTRR
support into Xen on Linux, instead since Linux now supports PV domains
with PAT we can take full advantage of write combining for PV domains
on Xen provided all Linux drivers are converted to use PAT properly.a
framebuffer folks's ACK

Review of the changes
================

Most of the series has consisted of driver transformations using
Coccinelle SmPL patches to transform existing code which access MTRR
APIs directly to the architecture agnostic write-combining calls.
Other patches extend bus subsystems to make available new
write-combining architecture agnostic APIs. Other patches have
consisted of extending architecture agnostic APIs to help work around
old MTRR hacks -- this was perhaps the hardest task and took quite a
bit of time and review as it required review of implications of all
combinatorial possibilities with MTRR and PAT, which also got
documented as part of the series. In the end it was also determined
some drivers required substantial work to be able to work properly
with PAT, the atyfb driver is an example driver which had the homework
required done. Due to complexities and since the driver / hardware was
ancient we decided to reach a compromise and require users of those
drivers to boot with a kernel parameter to disable PAT, fortunately
this was only required for two old device drivers:

* ipath: the ipath device driver powers the old HTX bus cards that
only work in AMD systems, while the newer IB/qib device driver powers
all PCI-e cards. The ipath device driver is obsolete, hardware hard to
find

* ivtv: the hardware is really rare these days, and perhaps only some
lost souls in some third world country are expected to be using this
feature of the device driver.

Pending RIP MTRR patches
====================

There are a few pending series so I wanted to provide a status update
on those series.

mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()

This is the nail on the MTRR coffin, it will prevent future direct
access to MTRR code. This will not be posted until all of the below
patches are in and merged. A possible next step here might be to
consider separating PAT code from MTRR code and making PAT a first
class citizen, enabling distributions to disable MTRR code in the
future. I thought this was possible but for some reason I recently
thought that there was one possible issue to make this happen. I
suppose we won't know unless we try, unless of course someone already
knows, Toshi?

IB/ipath: use arch_phys_wc_add() and require PAT disabled
IB/ipath: add counting for MTRR
ivtv: use arch_phys_wc_add() and require PAT disabled

This v7 series just posted addresses all drivers which cannot work
with PAT, fortunately we end up with only 2! This series has all
subsystem and driver maintainers ACKs, it was just posted with a few
fixes with the intent to be merged through Boris' x86 tree as it
depends on the newly exported pat_enabled() symbol.

fusion: remove dead MTRR code

This should go through Bottomley's tree, driver maintainer provided an
ACKed, this is just pending integration into the SCSI subsystem tree.
As a last resort my hope is that this can go through Andrew Morton's
tree.

video: fbdev: vesafb: use arch_phys_wc_add()
video: fbdev: vesafb: add missing mtrr_del() for added MTRR
video: fbdev: vesafb: only support MTRR_TYPE_WRCOMB

A v4 series was posted, pending review / Integration through Tomis' tree

video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()
video: fbdev: atyfb: replace MTRR UC hole with strong UC
video: fbdev: atyfb: clarify ioremap() base and length used
video: fbdev: atyfb: move framebuffer length fudging to helper

This provides a work around replacement for the direct MTRR use hack
using the new ioremap_uc() API. This is pending review / ACK by the
driver maintainer Ville Syrjälä, and the subsystem maintainer, Tomi.
Since it relies on ioremap_uc(), which is merged through Boris' tree,
this should go through Boris' tree *iff* we want it in for v4.2,
otherwise this will have to wait for v4.3.

video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc()
video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc()
video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc()
lib: devres: add pcim_iomap_wc() variants
pci: add pci_iomap_wc() variants

This has Tomi's ACK already for the driver specific changes, Bjorn
asked for a Documentation guidance update for EXPORT_SYMBOL_GPL(),
this is now merged via linux-next. This is pending consent from Tomi
if it can go in through Bjorn's tree. It is also obviously pending a
final ACK from Bjorn. The devres change would go in without any users,
I leave this for Bjorn to decide but do note my concern of someone in
the future adding a non EXPORT_SYMBOL_GPL() for this implementation.

video: fbdev: gxt4500: use pci_ioremap_wc_bar() for framebuffer
video: fbdev: kyrofb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
pci: add pci_ioremap_wc_bar()

This requires Tomi's Ack / review, and then obviously Bjorn's own ACK
/ integration. Consent from Tomi of whether or not this can go through
Bjorn's tree is also needed.

Luis


2015-06-11 23:23:32

by Toshi Kani

[permalink] [raw]
Subject: Re: RIP MTRR - status update for upcoming v4.2

On Thu, 2015-06-11 at 13:36 -0700, Luis R. Rodriguez wrote:
:
> Pending RIP MTRR patches
> ====================
>
> There are a few pending series so I wanted to provide a status update
> on those series.
>
> mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()
>
> This is the nail on the MTRR coffin, it will prevent future direct
> access to MTRR code. This will not be posted until all of the below
> patches are in and merged. A possible next step here might be to
> consider separating PAT code from MTRR code and making PAT a first
> class citizen, enabling distributions to disable MTRR code in the
> future. I thought this was possible but for some reason I recently
> thought that there was one possible issue to make this happen. I
> suppose we won't know unless we try, unless of course someone already
> knows, Toshi?

There are two usages on MTRRs:
1) MTRR entries set by firmware
2) MTRR entries set by OS drivers

We can obsolete 2), but we have no control over 1). As UEFI firmwares
also set this up, this usage will continue to stay. So, we should not
get rid of the MTRR code that looks up the MTRR entries, while we have
no need to modify them.

Such MTRR entries provide safe guard to /dev/mem, which allows
privileged user to access a range that may require UC mapping while
the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in
such a case.

UEFI memory table has memory attribute, which describes cache types
supported in physical memory ranges. However, this information gets
lost when it it is converted to e820 table.

Thanks,
-Toshi

2015-06-12 00:52:30

by Luis Chamberlain

[permalink] [raw]
Subject: Re: RIP MTRR - status update for upcoming v4.2

On Thu, Jun 11, 2015 at 05:23:16PM -0600, Toshi Kani wrote:
> On Thu, 2015-06-11 at 13:36 -0700, Luis R. Rodriguez wrote:
> :
> > Pending RIP MTRR patches
> > ====================
> >
> > There are a few pending series so I wanted to provide a status update
> > on those series.
> >
> > mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()
> >
> > This is the nail on the MTRR coffin, it will prevent future direct
> > access to MTRR code. This will not be posted until all of the below
> > patches are in and merged. A possible next step here might be to
> > consider separating PAT code from MTRR code and making PAT a first
> > class citizen, enabling distributions to disable MTRR code in the
> > future. I thought this was possible but for some reason I recently
> > thought that there was one possible issue to make this happen. I
> > suppose we won't know unless we try, unless of course someone already
> > knows, Toshi?
>
> There are two usages on MTRRs:
> 1) MTRR entries set by firmware
> 2) MTRR entries set by OS drivers
>
> We can obsolete 2), but we have no control over 1). As UEFI firmwares
> also set this up, this usage will continue to stay. So, we should not
> get rid of the MTRR code that looks up the MTRR entries, while we have
> no need to modify them.
>
> Such MTRR entries provide safe guard to /dev/mem, which allows
> privileged user to access a range that may require UC mapping while
> the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in
> such a case.
>
> UEFI memory table has memory attribute, which describes cache types
> supported in physical memory ranges. However, this information gets
> lost when it it is converted to e820 table.

Is there no way to modify CPU capability bits upon boot and kick UEFI
to re-evaluate ? In such UEFI cases what happens for instance when
Xen is used which does not support MTRR?

Luis

2015-06-12 07:59:58

by Jan Beulich

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

>>> On 12.06.15 at 01:23, <[email protected]> wrote:
> There are two usages on MTRRs:
> 1) MTRR entries set by firmware
> 2) MTRR entries set by OS drivers
>
> We can obsolete 2), but we have no control over 1). As UEFI firmwares
> also set this up, this usage will continue to stay. So, we should not
> get rid of the MTRR code that looks up the MTRR entries, while we have
> no need to modify them.
>
> Such MTRR entries provide safe guard to /dev/mem, which allows
> privileged user to access a range that may require UC mapping while
> the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in
> such a case.

But it wouldn't be impossible to simply read the MTRRs upon boot,
store the information, disable MTRRs, and correctly use PAT to
achieve the same effect (i.e. the "blindly maps" part of course
would need fixing).

> UEFI memory table has memory attribute, which describes cache types
> supported in physical memory ranges. However, this information gets
> lost when it it is converted to e820 table.

I'm afraid you rather don't want to trust that information, as
firmware vendors frequently screw it up.

Jan

2015-06-12 16:43:00

by Toshi Kani

[permalink] [raw]
Subject: Re: RIP MTRR - status update for upcoming v4.2

On Fri, 2015-06-12 at 02:52 +0200, Luis R. Rodriguez wrote:
> On Thu, Jun 11, 2015 at 05:23:16PM -0600, Toshi Kani wrote:
> > On Thu, 2015-06-11 at 13:36 -0700, Luis R. Rodriguez wrote:
> > :
> > > Pending RIP MTRR patches
> > > ====================
> > >
> > > There are a few pending series so I wanted to provide a status update
> > > on those series.
> > >
> > > mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()
> > >
> > > This is the nail on the MTRR coffin, it will prevent future direct
> > > access to MTRR code. This will not be posted until all of the below
> > > patches are in and merged. A possible next step here might be to
> > > consider separating PAT code from MTRR code and making PAT a first
> > > class citizen, enabling distributions to disable MTRR code in the
> > > future. I thought this was possible but for some reason I recently
> > > thought that there was one possible issue to make this happen. I
> > > suppose we won't know unless we try, unless of course someone already
> > > knows, Toshi?
> >
> > There are two usages on MTRRs:
> > 1) MTRR entries set by firmware
> > 2) MTRR entries set by OS drivers
> >
> > We can obsolete 2), but we have no control over 1). As UEFI firmwares
> > also set this up, this usage will continue to stay. So, we should not
> > get rid of the MTRR code that looks up the MTRR entries, while we have
> > no need to modify them.
> >
> > Such MTRR entries provide safe guard to /dev/mem, which allows
> > privileged user to access a range that may require UC mapping while
> > the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in
> > such a case.
> >
> > UEFI memory table has memory attribute, which describes cache types
> > supported in physical memory ranges. However, this information gets
> > lost when it it is converted to e820 table.
>
> Is there no way to modify CPU capability bits upon boot and kick UEFI
> to re-evaluate ? In such UEFI cases what happens for instance when
> Xen is used which does not support MTRR?

EFI GetMemoryMap() is a boot service, and won't be available after
ExitBootServices() is called. But we should be able to keep the
attribute information copied into some table if necessary.

Xen provides virtual firmware on their guests, right? If this firmware
does not set up MTRRs today, then I do not think it needs to set up for
UEFI, either. Assuming the guest physical address is virtualized, it
does not have to carry the same platform attribute & restriction.

Thanks,
-Toshi

2015-06-12 16:58:48

by Toshi Kani

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Fri, 2015-06-12 at 08:59 +0100, Jan Beulich wrote:
> >>> On 12.06.15 at 01:23, <[email protected]> wrote:
> > There are two usages on MTRRs:
> > 1) MTRR entries set by firmware
> > 2) MTRR entries set by OS drivers
> >
> > We can obsolete 2), but we have no control over 1). As UEFI firmwares
> > also set this up, this usage will continue to stay. So, we should not
> > get rid of the MTRR code that looks up the MTRR entries, while we have
> > no need to modify them.
> >
> > Such MTRR entries provide safe guard to /dev/mem, which allows
> > privileged user to access a range that may require UC mapping while
> > the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in
> > such a case.
>
> But it wouldn't be impossible to simply read the MTRRs upon boot,
> store the information, disable MTRRs, and correctly use PAT to
> achieve the same effect (i.e. the "blindly maps" part of course
> would need fixing).

It could be done, but I do not see much benefit of doing it. One of the
reasons platform vendors set MTRRs is so that a system won't hit a
machine check when an OS bug leads an access with a wrong cache type. A
machine check is hard to analyze and can be seen as a hardware issue by
customers. Emulating MTRRs with PAT won't protect from such a bug.

> > UEFI memory table has memory attribute, which describes cache types
> > supported in physical memory ranges. However, this information gets
> > lost when it it is converted to e820 table.
>
> I'm afraid you rather don't want to trust that information, as
> firmware vendors frequently screw it up.

Could be, but we need to use firmware info when necessary...

Thanks,
-Toshi

2015-06-12 23:15:26

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Jun 12, 2015 12:59 AM, "Jan Beulich" <[email protected]> wrote:
>
> >>> On 12.06.15 at 01:23, <[email protected]> wrote:
> > There are two usages on MTRRs:
> > 1) MTRR entries set by firmware
> > 2) MTRR entries set by OS drivers
> >
> > We can obsolete 2), but we have no control over 1). As UEFI firmwares
> > also set this up, this usage will continue to stay. So, we should not
> > get rid of the MTRR code that looks up the MTRR entries, while we have
> > no need to modify them.
> >
> > Such MTRR entries provide safe guard to /dev/mem, which allows
> > privileged user to access a range that may require UC mapping while
> > the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in
> > such a case.
>
> But it wouldn't be impossible to simply read the MTRRs upon boot,
> store the information, disable MTRRs, and correctly use PAT to
> achieve the same effect (i.e. the "blindly maps" part of course
> would need fixing).

This may crash and burn badly when we call a UEFI function or an SMI
happens. I think we should just leave the MTRRs alone.

--Andy

2015-06-12 23:29:24

by James Bottomley

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Fri, 2015-06-12 at 16:15 -0700, Andy Lutomirski wrote:
> On Jun 12, 2015 12:59 AM, "Jan Beulich" <[email protected]> wrote:
> >
> > >>> On 12.06.15 at 01:23, <[email protected]> wrote:
> > > There are two usages on MTRRs:
> > > 1) MTRR entries set by firmware
> > > 2) MTRR entries set by OS drivers
> > >
> > > We can obsolete 2), but we have no control over 1). As UEFI firmwares
> > > also set this up, this usage will continue to stay. So, we should not
> > > get rid of the MTRR code that looks up the MTRR entries, while we have
> > > no need to modify them.
> > >
> > > Such MTRR entries provide safe guard to /dev/mem, which allows
> > > privileged user to access a range that may require UC mapping while
> > > the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in
> > > such a case.
> >
> > But it wouldn't be impossible to simply read the MTRRs upon boot,
> > store the information, disable MTRRs, and correctly use PAT to
> > achieve the same effect (i.e. the "blindly maps" part of course
> > would need fixing).
>
> This may crash and burn badly when we call a UEFI function or an SMI
> happens. I think we should just leave the MTRRs alone.

Wholeheartedly agree: PAT only works when the given memory map is in
operation but MTRRs function everywhere. Anything that goes into real
mode or installs its own memory map won't see the Linux page attributes.

James

2015-06-13 06:37:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2


* Andy Lutomirski <[email protected]> wrote:

> On Jun 12, 2015 12:59 AM, "Jan Beulich" <[email protected]> wrote:
> >
> > >>> On 12.06.15 at 01:23, <[email protected]> wrote:
> > > There are two usages on MTRRs:
> > > 1) MTRR entries set by firmware
> > > 2) MTRR entries set by OS drivers
> > >
> > > We can obsolete 2), but we have no control over 1). As UEFI firmwares
> > > also set this up, this usage will continue to stay. So, we should not
> > > get rid of the MTRR code that looks up the MTRR entries, while we have
> > > no need to modify them.
> > >
> > > Such MTRR entries provide safe guard to /dev/mem, which allows privileged
> > > user to access a range that may require UC mapping while the /dev/mem driver
> > > blindly maps it with WB. MTRRs converts WB to UC in such a case.
> >
> > But it wouldn't be impossible to simply read the MTRRs upon boot, store the
> > information, disable MTRRs, and correctly use PAT to achieve the same effect
> > (i.e. the "blindly maps" part of course would need fixing).
>
> This may crash and burn badly when we call a UEFI function or an SMI happens. I
> think we should just leave the MTRRs alone.

Not to mention suspend/resume, reboot and other goodies where the firmware might
pop up expecting intact MTRRs.

Btw., doesn't a lack of MTRRs imply UC? So is 'crash and burn' possible in most
cases? Isn't it just 'executes slower than before'?

Thanks,

Ingo

2015-06-15 06:20:24

by Jan Beulich

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

>>> On 13.06.15 at 01:15, <[email protected]> wrote:
> On Jun 12, 2015 12:59 AM, "Jan Beulich" <[email protected]> wrote:
>>
>> >>> On 12.06.15 at 01:23, <[email protected]> wrote:
>> > There are two usages on MTRRs:
>> > 1) MTRR entries set by firmware
>> > 2) MTRR entries set by OS drivers
>> >
>> > We can obsolete 2), but we have no control over 1). As UEFI firmwares
>> > also set this up, this usage will continue to stay. So, we should not
>> > get rid of the MTRR code that looks up the MTRR entries, while we have
>> > no need to modify them.
>> >
>> > Such MTRR entries provide safe guard to /dev/mem, which allows
>> > privileged user to access a range that may require UC mapping while
>> > the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in
>> > such a case.
>>
>> But it wouldn't be impossible to simply read the MTRRs upon boot,
>> store the information, disable MTRRs, and correctly use PAT to
>> achieve the same effect (i.e. the "blindly maps" part of course
>> would need fixing).
>
> This may crash and burn badly when we call a UEFI function or an SMI
> happens. I think we should just leave the MTRRs alone.

I buy the SMI part, but UEFI runtime calls are being done on
page tables we construct and control, so attributes could be kept
correct without relying on MTRRs.

Jan

2015-08-06 19:53:27

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani <[email protected]> wrote:
> On Fri, 2015-06-12 at 08:59 +0100, Jan Beulich wrote:
>> >>> On 12.06.15 at 01:23, <[email protected]> wrote:
>> > There are two usages on MTRRs:
>> > 1) MTRR entries set by firmware
>> > 2) MTRR entries set by OS drivers
>> >
>> > We can obsolete 2), but we have no control over 1). As UEFI firmwares
>> > also set this up, this usage will continue to stay. So, we should not
>> > get rid of the MTRR code that looks up the MTRR entries, while we have
>> > no need to modify them.
>> >
>> > Such MTRR entries provide safe guard to /dev/mem, which allows
>> > privileged user to access a range that may require UC mapping while
>> > the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in
>> > such a case.
>>
>> But it wouldn't be impossible to simply read the MTRRs upon boot,
>> store the information, disable MTRRs, and correctly use PAT to
>> achieve the same effect (i.e. the "blindly maps" part of course
>> would need fixing).
>
> It could be done, but I do not see much benefit of doing it. One of the
> reasons platform vendors set MTRRs is so that a system won't hit a
> machine check when an OS bug leads an access with a wrong cache type.
>
> A machine check is hard to analyze and can be seen as a hardware issue by
> customers. Emulating MTRRs with PAT won't protect from such a bug.

That's seems like a fair and valid concern. This could only happen if
the OS would have code that would use MTRR, in the case of Linux we'll
soon be able to vet that this cannot happen. For those type of OSes...
could it be possible to negotiate or hint to the platform through an
attribute somehow that the OS has such capability to not use MTRR?
Then, only if this bit is set, the platform could then avoid such MTRR
settings, and if we have issues you can throw rocks at us.

Could this also be used to prevent SMIs with MTRRs?

Luis

2015-08-06 19:56:12

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Thu, Aug 6, 2015 at 12:53 PM, Luis R. Rodriguez
<[email protected]> wrote:
> For those type of OSes...
> could it be possible to negotiate or hint to the platform through an
> attribute somehow that the OS has such capability to not use MTRR?

And if that's not possible how about a new platform setting that would
need to be set at the platform level to enable disabling this junk?
Then only folks who know what they are doing would enable it, and if
the customer set it, the issue would not be on the platform.

Luis

2015-08-06 23:00:08

by Toshi Kani

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
> On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani <[email protected]> wrote:
> > On Fri, 2015-06-12 at 08:59 +0100, Jan Beulich wrote:
> > > > > > On 12.06.15 at 01:23, <[email protected]> wrote:
> > > > There are two usages on MTRRs:
> > > > 1) MTRR entries set by firmware
> > > > 2) MTRR entries set by OS drivers
> > > >
> > > > We can obsolete 2), but we have no control over 1). As UEFI
> > > > firmwares
> > > > also set this up, this usage will continue to stay. So, we should
> > > > not
> > > > get rid of the MTRR code that looks up the MTRR entries, while we
> > > > have
> > > > no need to modify them.
> > > >
> > > > Such MTRR entries provide safe guard to /dev/mem, which allows
> > > > privileged user to access a range that may require UC mapping while
> > > > the /dev/mem driver blindly maps it with WB. MTRRs converts WB to
> > > > UC in
> > > > such a case.
> > >
> > > But it wouldn't be impossible to simply read the MTRRs upon boot,
> > > store the information, disable MTRRs, and correctly use PAT to
> > > achieve the same effect (i.e. the "blindly maps" part of course
> > > would need fixing).
> >
> > It could be done, but I do not see much benefit of doing it. One of the
> > reasons platform vendors set MTRRs is so that a system won't hit a
> > machine check when an OS bug leads an access with a wrong cache type.
> >
> > A machine check is hard to analyze and can be seen as a hardware issue
> > by customers. Emulating MTRRs with PAT won't protect from such a bug.
>
> That's seems like a fair and valid concern. This could only happen if
> the OS would have code that would use MTRR, in the case of Linux we'll
> soon be able to vet that this cannot happen.

No, there is no OS support necessary to use MTRR. After firmware sets it
up, CPUs continue to use it without any OS support. I think the Linux
change you are referring is to obsolete legacy interfaces that modify the
MTRR setup. I agree that Linux should not modify MTRR.

> For those type of OSes...
> could it be possible to negotiate or hint to the platform through an
> attribute somehow that the OS has such capability to not use MTRR?

The OS can disable MTRR. However, this can also cause a problem in
firmware, which may rely on MTRR.

> Then, only if this bit is set, the platform could then avoid such MTRR
> settings, and if we have issues you can throw rocks at us.

> And if that's not possible how about a new platform setting that would
> need to be set at the platform level to enable disabling this junk?
> Then only folks who know what they are doing would enable it, and if
> the customer set it, the issue would not be on the platform.

> Could this also be used to prevent SMIs with MTRRs?

ACPI _OSI could be used for firmware to implement some OS-specific features,
but it may be too late for firmware to make major changes and is generally
useless unless OS requirements are described in a spec backed by logo
certification. SMIs are also used for platform management, such as fan
speed control.

Is there any issue for Linux to use MTRR set by firmware?

Thanks,
-Toshi

2015-08-07 20:25:46

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani <[email protected]> wrote:
> On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
>> On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani <[email protected]> wrote:
>> > On Fri, 2015-06-12 at 08:59 +0100, Jan Beulich wrote:
>> > > > > > On 12.06.15 at 01:23, <[email protected]> wrote:
>> > > > There are two usages on MTRRs:
>> > > > 1) MTRR entries set by firmware
>> > > > 2) MTRR entries set by OS drivers
>> > > >
>> > > > We can obsolete 2), but we have no control over 1). As UEFI
>> > > > firmwares
>> > > > also set this up, this usage will continue to stay. So, we should
>> > > > not
>> > > > get rid of the MTRR code that looks up the MTRR entries, while we
>> > > > have
>> > > > no need to modify them.
>> > > >
>> > > > Such MTRR entries provide safe guard to /dev/mem, which allows
>> > > > privileged user to access a range that may require UC mapping while
>> > > > the /dev/mem driver blindly maps it with WB. MTRRs converts WB to
>> > > > UC in
>> > > > such a case.
>> > >
>> > > But it wouldn't be impossible to simply read the MTRRs upon boot,
>> > > store the information, disable MTRRs, and correctly use PAT to
>> > > achieve the same effect (i.e. the "blindly maps" part of course
>> > > would need fixing).
>> >
>> > It could be done, but I do not see much benefit of doing it. One of the
>> > reasons platform vendors set MTRRs is so that a system won't hit a
>> > machine check when an OS bug leads an access with a wrong cache type.
>> >
>> > A machine check is hard to analyze and can be seen as a hardware issue
>> > by customers. Emulating MTRRs with PAT won't protect from such a bug.
>>
>> That's seems like a fair and valid concern. This could only happen if
>> the OS would have code that would use MTRR, in the case of Linux we'll
>> soon be able to vet that this cannot happen.
>
> No, there is no OS support necessary to use MTRR. After firmware sets it
> up, CPUs continue to use it without any OS support. I think the Linux
> change you are referring is to obsolete legacy interfaces that modify the
> MTRR setup. I agree that Linux should not modify MTRR.

Its a bit more than that though. Since you agree that the OS can live
without MTRR code I was hoping to then see if we can fold out PAT
Linux code from under the MTRR dependency on Linux and make PAT a
first class citizen, maybe at least for x86-64. Right now you can only
get PAT support on Linux if you have MTRR code, but I'd like to see if
instead we can rip MTRR code out completely under its own Kconfig and
let it start rotting away.

Code-wise the only issue I saw was that PAT code also relies on
mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I found
no other obvious issues.

Platform firmware and SMIs seems to be the only other possible issue.
More on this below.

>> For those type of OSes...
>> could it be possible to negotiate or hint to the platform through an
>> attribute somehow that the OS has such capability to not use MTRR?
>
> The OS can disable MTRR. However, this can also cause a problem in
> firmware, which may rely on MTRR.

Can you describe what type of issues we could expect ? I tend to care
more about this for 64-bit systems so if 32-bit platforms would be
more of the ones which could cause an issue would restricting
disabling MTRR only for 64-bit help?

>> Then, only if this bit is set, the platform could then avoid such MTRR
>> settings, and if we have issues you can throw rocks at us.
>
>> And if that's not possible how about a new platform setting that would
>> need to be set at the platform level to enable disabling this junk?
>> Then only folks who know what they are doing would enable it, and if
>> the customer set it, the issue would not be on the platform.
>
>> Could this also be used to prevent SMIs with MTRRs?
>
> ACPI _OSI could be used for firmware to implement some OS-specific features,
> but it may be too late for firmware to make major changes and is generally
> useless unless OS requirements are described in a spec backed by logo
> certification.

I see.. So there are no guarantees that platform firmware would not
expect OS MTRR support.

> SMIs are also used for platform management, such as fan
> speed control.

And its conceivable that some devices, or the platform itself, may
trigger SMIs to have the platform firmware poke with MTRRs?

> Is there any issue for Linux to use MTRR set by firmware?

Even though we don't have the Kconfig option right now to disable MTRR
cod explicitly I'll note that there are a few other cases that could
flip Linux to note use MTRR:

a) Some BIOSes could let MTRR get disabled
b) As of Xen 4.4, the hypervisor disables X86_FEATURE_MTRR which
disables MTRR on Linux

If these environments can exist it'd be good to understand possible
issues that could creep up as a result of the OS not having MTRR
enabled. If this is a reasonable thing for x86-64 I was hoping we
could just let users opt-in to a similar build configuration through
the OS by letting PAT not depend on MTRR.

Luis

2015-08-07 21:58:11

by Toshi Kani

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
> On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani <[email protected]> wrote:
> > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
> > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani <[email protected]> wrote:
> > > > On Fri, 2015-06-12 at 08:59 +0100, Jan Beulich wrote:
> > > > > > > > On 12.06.15 at 01:23, <[email protected]> wrote:
> > > > > > There are two usages on MTRRs:
> > > > > > 1) MTRR entries set by firmware
> > > > > > 2) MTRR entries set by OS drivers
> > > > > >
> > > > > > We can obsolete 2), but we have no control over 1). As UEFI
> > > > > > firmwares also set this up, this usage will continue to stay.
> > > > > > So, we should not get rid of the MTRR code that looks up the
> > > > > > MTRR entries, while we have no need to modify them.
> > > > > >
> > > > > > Such MTRR entries provide safe guard to /dev/mem, which allows
> > > > > > privileged user to access a range that may require UC mapping
> > > > > > while the /dev/mem driver blindly maps it with WB. MTRRs
> > > > > > converts WB to UC in such a case.
> > > > >
> > > > > But it wouldn't be impossible to simply read the MTRRs upon boot,
> > > > > store the information, disable MTRRs, and correctly use PAT to
> > > > > achieve the same effect (i.e. the "blindly maps" part of course
> > > > > would need fixing).
> > > >
> > > > It could be done, but I do not see much benefit of doing it. One of
> > > > the reasons platform vendors set MTRRs is so that a system won't hit
> > > > a machine check when an OS bug leads an access with a wrong cache
> > > > type.
> > > >
> > > > A machine check is hard to analyze and can be seen as a hardware
> > > > issue by customers. Emulating MTRRs with PAT won't protect from
> > > > such a bug.
> > >
> > > That's seems like a fair and valid concern. This could only happen if
> > > the OS would have code that would use MTRR, in the case of Linux we'll
> > > soon be able to vet that this cannot happen.
> >
> > No, there is no OS support necessary to use MTRR. After firmware sets
> > it up, CPUs continue to use it without any OS support. I think the
> > Linux change you are referring is to obsolete legacy interfaces that
> > modify the MTRR setup. I agree that Linux should not modify MTRR.
>
> Its a bit more than that though. Since you agree that the OS can live
> without MTRR code I was hoping to then see if we can fold out PAT
> Linux code from under the MTRR dependency on Linux and make PAT a
> first class citizen, maybe at least for x86-64. Right now you can only
> get PAT support on Linux if you have MTRR code, but I'd like to see if
> instead we can rip MTRR code out completely under its own Kconfig and
> let it start rotting away.
>
> Code-wise the only issue I saw was that PAT code also relies on
> mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I found
> no other obvious issues.

We can rip of the MTTR code that modifies the MTRR setup, but not
mtrr_type_lookup(). This function provides necessary checks per documented
in commit 7f0431e3dc89 as follows.

1) reserve_memtype() tracks an effective memory type in case
a request type is WB (ex. /dev/mem blindly uses WB). Missing
to track with its effective type causes a subsequent request
to map the same range with the effective type to fail.

2) pud_set_huge() and pmd_set_huge() check if a requested range
has any overlap with MTRRs. Missing to detect an overlap may
cause a performance penalty or undefined behavior.

mtrr_type_lookup() is still admittedly awkward, but I do not think we have
an immediate issue in PAT code calling it. I do not think it makes PAT code
a second class citizen.

> Platform firmware and SMIs seems to be the only other possible issue.
> More on this below.
>
> > > For those type of OSes...
> > > could it be possible to negotiate or hint to the platform through an
> > > attribute somehow that the OS has such capability to not use MTRR?
> >
> > The OS can disable MTRR. However, this can also cause a problem in
> > firmware, which may rely on MTRR.
>
> Can you describe what type of issues we could expect ? I tend to care
> more about this for 64-bit systems so if 32-bit platforms would be
> more of the ones which could cause an issue would restricting
> disabling MTRR only for 64-bit help?

The SMI handler runs in real-mode and relies on MTRR being effective to
provide right cache types. It does not matter if it is 64-bit or not.

> > > Then, only if this bit is set, the platform could then avoid such MTRR
> > > settings, and if we have issues you can throw rocks at us.
> >
> > > And if that's not possible how about a new platform setting that would
> > > need to be set at the platform level to enable disabling this junk?
> > > Then only folks who know what they are doing would enable it, and if
> > > the customer set it, the issue would not be on the platform.
> >
> > > Could this also be used to prevent SMIs with MTRRs?
> >
> > ACPI _OSI could be used for firmware to implement some OS-specific
> > features, but it may be too late for firmware to make major changes and
> > is generally useless unless OS requirements are described in a spec
> > backed by logo certification.
>
> I see.. So there are no guarantees that platform firmware would not
> expect OS MTRR support.
>
> > SMIs are also used for platform management, such as fan
> > speed control.
>
> And its conceivable that some devices, or the platform itself, may
> trigger SMIs to have the platform firmware poke with MTRRs?

SMIs are outside of OS control. SMI handler relies on MTRR being set. SMI
must be quick, so the handler should not be required to initialize MTRR or
page tables.

> > Is there any issue for Linux to use MTRR set by firmware?
>
> Even though we don't have the Kconfig option right now to disable MTRR
> cod explicitly I'll note that there are a few other cases that could
> flip Linux to note use MTRR:
>
> a) Some BIOSes could let MTRR get disabled
> b) As of Xen 4.4, the hypervisor disables X86_FEATURE_MTRR which
> disables MTRR on Linux
>
> If these environments can exist it'd be good to understand possible
> issues that could creep up as a result of the OS not having MTRR
> enabled. If this is a reasonable thing for x86-64 I was hoping we
> could just let users opt-in to a similar build configuration through
> the OS by letting PAT not depend on MTRR.

Case a) and b) do not cause any issue. They simply lead mtrr_type_lookup()
to return immediately with MTRR_TYPE_INVALID (i.e. MTRR disable), and the
callers handle this value properly. These cases are only problematic when
the OS tries to modify MTRR.

Thanks,
-Toshi

2015-08-07 22:24:23

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani <[email protected]> wrote:
> On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
>> On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani <[email protected]> wrote:
>> > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
>> > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani <[email protected]> wrote:
>> > > > On Fri, 2015-06-12 at 08:59 +0100, Jan Beulich wrote:
>> > > > > > > > On 12.06.15 at 01:23, <[email protected]> wrote:
>> > > > > > There are two usages on MTRRs:
>> > > > > > 1) MTRR entries set by firmware
>> > > > > > 2) MTRR entries set by OS drivers
>> > > > > >
>> > > > > > We can obsolete 2), but we have no control over 1). As UEFI
>> > > > > > firmwares also set this up, this usage will continue to stay.
>> > > > > > So, we should not get rid of the MTRR code that looks up the
>> > > > > > MTRR entries, while we have no need to modify them.
>> > > > > >
>> > > > > > Such MTRR entries provide safe guard to /dev/mem, which allows
>> > > > > > privileged user to access a range that may require UC mapping
>> > > > > > while the /dev/mem driver blindly maps it with WB. MTRRs
>> > > > > > converts WB to UC in such a case.
>> > > > >
>> > > > > But it wouldn't be impossible to simply read the MTRRs upon boot,
>> > > > > store the information, disable MTRRs, and correctly use PAT to
>> > > > > achieve the same effect (i.e. the "blindly maps" part of course
>> > > > > would need fixing).
>> > > >
>> > > > It could be done, but I do not see much benefit of doing it. One of
>> > > > the reasons platform vendors set MTRRs is so that a system won't hit
>> > > > a machine check when an OS bug leads an access with a wrong cache
>> > > > type.
>> > > >
>> > > > A machine check is hard to analyze and can be seen as a hardware
>> > > > issue by customers. Emulating MTRRs with PAT won't protect from
>> > > > such a bug.
>> > >
>> > > That's seems like a fair and valid concern. This could only happen if
>> > > the OS would have code that would use MTRR, in the case of Linux we'll
>> > > soon be able to vet that this cannot happen.
>> >
>> > No, there is no OS support necessary to use MTRR. After firmware sets
>> > it up, CPUs continue to use it without any OS support. I think the
>> > Linux change you are referring is to obsolete legacy interfaces that
>> > modify the MTRR setup. I agree that Linux should not modify MTRR.
>>
>> Its a bit more than that though. Since you agree that the OS can live
>> without MTRR code I was hoping to then see if we can fold out PAT
>> Linux code from under the MTRR dependency on Linux and make PAT a
>> first class citizen, maybe at least for x86-64. Right now you can only
>> get PAT support on Linux if you have MTRR code, but I'd like to see if
>> instead we can rip MTRR code out completely under its own Kconfig and
>> let it start rotting away.
>>
>> Code-wise the only issue I saw was that PAT code also relies on
>> mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I found
>> no other obvious issues.
>
> We can rip of the MTTR code that modifies the MTRR setup, but not
> mtrr_type_lookup(). This function provides necessary checks per documented
> in commit 7f0431e3dc89 as follows.
>
> 1) reserve_memtype() tracks an effective memory type in case
> a request type is WB (ex. /dev/mem blindly uses WB). Missing
> to track with its effective type causes a subsequent request
> to map the same range with the effective type to fail.
>
> 2) pud_set_huge() and pmd_set_huge() check if a requested range
> has any overlap with MTRRs. Missing to detect an overlap may
> cause a performance penalty or undefined behavior.
>
> mtrr_type_lookup() is still admittedly awkward, but I do not think we have
> an immediate issue in PAT code calling it. I do not think it makes PAT code
> a second class citizen.

OK since we know that if MTRR set up code ends up disabled and would
return MTRR_TYPE_INVALID what if we just static inline this for the
no-MTRR Kconfig build option immediately, and only then have the full
blown implementation for the case where MTRR Kconfig option is
enabled?

>> Platform firmware and SMIs seems to be the only other possible issue.
>> More on this below.
>>
>> > > For those type of OSes...
>> > > could it be possible to negotiate or hint to the platform through an
>> > > attribute somehow that the OS has such capability to not use MTRR?
>> >
>> > The OS can disable MTRR. However, this can also cause a problem in
>> > firmware, which may rely on MTRR.
>>
>> Can you describe what type of issues we could expect ? I tend to care
>> more about this for 64-bit systems so if 32-bit platforms would be
>> more of the ones which could cause an issue would restricting
>> disabling MTRR only for 64-bit help?
>
> The SMI handler runs in real-mode and relies on MTRR being effective to
> provide right cache types. It does not matter if it is 64-bit or not.

I see... since I have no visibility to what goes under the hood, can
you provide one example use case where an SMI handler would require
getting a cache type through MTRR ? I realize this can vary, vendor by
vendor, but any example would do just to satisfy my curiosity.

>> > > Then, only if this bit is set, the platform could then avoid such MTRR
>> > > settings, and if we have issues you can throw rocks at us.
>> >
>> > > And if that's not possible how about a new platform setting that would
>> > > need to be set at the platform level to enable disabling this junk?
>> > > Then only folks who know what they are doing would enable it, and if
>> > > the customer set it, the issue would not be on the platform.
>> >
>> > > Could this also be used to prevent SMIs with MTRRs?
>> >
>> > ACPI _OSI could be used for firmware to implement some OS-specific
>> > features, but it may be too late for firmware to make major changes and
>> > is generally useless unless OS requirements are described in a spec
>> > backed by logo certification.
>>
>> I see.. So there are no guarantees that platform firmware would not
>> expect OS MTRR support.
>>
>> > SMIs are also used for platform management, such as fan
>> > speed control.
>>
>> And its conceivable that some devices, or the platform itself, may
>> trigger SMIs to have the platform firmware poke with MTRRs?
>
> SMIs are outside of OS control. SMI handler relies on MTRR being set. SMI
> must be quick, so the handler should not be required to initialize MTRR or
> page tables.

Right makes sense.

>> > Is there any issue for Linux to use MTRR set by firmware?
>>
>> Even though we don't have the Kconfig option right now to disable MTRR
>> cod explicitly I'll note that there are a few other cases that could
>> flip Linux to note use MTRR:
>>
>> a) Some BIOSes could let MTRR get disabled
>> b) As of Xen 4.4, the hypervisor disables X86_FEATURE_MTRR which
>> disables MTRR on Linux
>>
>> If these environments can exist it'd be good to understand possible
>> issues that could creep up as a result of the OS not having MTRR
>> enabled. If this is a reasonable thing for x86-64 I was hoping we
>> could just let users opt-in to a similar build configuration through
>> the OS by letting PAT not depend on MTRR.
>
> Case a) and b) do not cause any issue. They simply lead mtrr_type_lookup()
> to return immediately with MTRR_TYPE_INVALID (i.e. MTRR disable), and the
> callers handle this value properly. These cases are only problematic when
> the OS tries to modify MTRR.

OK if the OS returns MTRR_TYPE_INVALID, for folks who do not want MTRR
code on their kernel, we should be OK?

Luis

2015-08-07 23:11:05

by Toshi Kani

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote:
> On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani <[email protected]> wrote:
> > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
> > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani <[email protected]> wrote:
> > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
> > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani <[email protected]>
> > > > > wrote:
:
> > > >
> > > > No, there is no OS support necessary to use MTRR. After firmware
> > > > sets it up, CPUs continue to use it without any OS support. I think
> > > > the Linux change you are referring is to obsolete legacy interfaces
> > > > that modify the MTRR setup. I agree that Linux should not modify
> > > > MTRR.
> > >
> > > Its a bit more than that though. Since you agree that the OS can live
> > > without MTRR code I was hoping to then see if we can fold out PAT
> > > Linux code from under the MTRR dependency on Linux and make PAT a
> > > first class citizen, maybe at least for x86-64. Right now you can only
> > > get PAT support on Linux if you have MTRR code, but I'd like to see if
> > > instead we can rip MTRR code out completely under its own Kconfig and
> > > let it start rotting away.
> > >
> > > Code-wise the only issue I saw was that PAT code also relies on
> > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I found
> > > no other obvious issues.
> >
> > We can rip of the MTTR code that modifies the MTRR setup, but not
> > mtrr_type_lookup(). This function provides necessary checks per
> > documented
> > in commit 7f0431e3dc89 as follows.
> >
> > 1) reserve_memtype() tracks an effective memory type in case
> > a request type is WB (ex. /dev/mem blindly uses WB). Missing
> > to track with its effective type causes a subsequent request
> > to map the same range with the effective type to fail.
> >
> > 2) pud_set_huge() and pmd_set_huge() check if a requested range
> > has any overlap with MTRRs. Missing to detect an overlap may
> > cause a performance penalty or undefined behavior.
> >
> > mtrr_type_lookup() is still admittedly awkward, but I do not think we
> > have an immediate issue in PAT code calling it. I do not think it makes
> > PAT code a second class citizen.
>
> OK since we know that if MTRR set up code ends up disabled and would
> return MTRR_TYPE_INVALID what if we just static inline this for the
> no-MTRR Kconfig build option immediately, and only then have the full
> blown implementation for the case where MTRR Kconfig option is
> enabled?

Yes, the MTRR code could be disabled by Kconfig with such inline stubs as
long as the kernel is built specifically for a particular platform with MTRR
disabled, such as Xen guest kernel.

However, since MTRR is a CPU feature enabled on most of the systems, I am
not sure if it makes sense to be configurable with Kconfig, though.

> > > Platform firmware and SMIs seems to be the only other possible issue.
> > > More on this below.
> > >
> > > > > For those type of OSes...
> > > > > could it be possible to negotiate or hint to the platform through
> > > > > an attribute somehow that the OS has such capability to not use
> > > > > MTRR?
> > > >
> > > > The OS can disable MTRR. However, this can also cause a problem in
> > > > firmware, which may rely on MTRR.
> > >
> > > Can you describe what type of issues we could expect ? I tend to care
> > > more about this for 64-bit systems so if 32-bit platforms would be
> > > more of the ones which could cause an issue would restricting
> > > disabling MTRR only for 64-bit help?
> >
> > The SMI handler runs in real-mode and relies on MTRR being effective to
> > provide right cache types. It does not matter if it is 64-bit or not.
>
> I see... since I have no visibility to what goes under the hood, can
> you provide one example use case where an SMI handler would require
> getting a cache type through MTRR ? I realize this can vary, vendor by
> vendor, but any example would do just to satisfy my curiosity.

For fan control, it would need UC access to its registers.

> > > > > Then, only if this bit is set, the platform could then avoid such
> > > > > MTRR settings, and if we have issues you can throw rocks at us.
> > > >
> > > > > And if that's not possible how about a new platform setting that
> > > > > would need to be set at the platform level to enable disabling
> > > > > this junk?
> > > > > Then only folks who know what they are doing would enable it, and
> > > > > if the customer set it, the issue would not be on the platform.
> > > >
> > > > > Could this also be used to prevent SMIs with MTRRs?
> > > >
> > > > ACPI _OSI could be used for firmware to implement some OS-specific
> > > > features, but it may be too late for firmware to make major changes
> > > > and
> > > > is generally useless unless OS requirements are described in a spec
> > > > backed by logo certification.
> > >
> > > I see.. So there are no guarantees that platform firmware would not
> > > expect OS MTRR support.
> > >
> > > > SMIs are also used for platform management, such as fan
> > > > speed control.
> > >
> > > And its conceivable that some devices, or the platform itself, may
> > > trigger SMIs to have the platform firmware poke with MTRRs?
> >
> > SMIs are outside of OS control. SMI handler relies on MTRR being set.
> > SMI must be quick, so the handler should not be required to initialize
> > MTRR or page tables.
>
> Right makes sense.
>
> > > > Is there any issue for Linux to use MTRR set by firmware?
> > >
> > > Even though we don't have the Kconfig option right now to disable MTRR
> > > cod explicitly I'll note that there are a few other cases that could
> > > flip Linux to note use MTRR:
> > >
> > > a) Some BIOSes could let MTRR get disabled
> > > b) As of Xen 4.4, the hypervisor disables X86_FEATURE_MTRR which
> > > disables MTRR on Linux
> > >
> > > If these environments can exist it'd be good to understand possible
> > > issues that could creep up as a result of the OS not having MTRR
> > > enabled. If this is a reasonable thing for x86-64 I was hoping we
> > > could just let users opt-in to a similar build configuration through
> > > the OS by letting PAT not depend on MTRR.
> >
> > Case a) and b) do not cause any issue. They simply lead
> > mtrr_type_lookup() to return immediately with MTRR_TYPE_INVALID (i.e.
> > MTRR disable), and the callers handle this value properly. These cases
> > are only problematic when the OS tries to modify MTRR.
>
> OK if the OS returns MTRR_TYPE_INVALID, for folks who do not want MTRR
> code on their kernel, we should be OK?

Technically OK. Not sure if we want such a Kconfig option, though.

Thanks,
-Toshi

2015-08-07 23:21:20

by Toshi Kani

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Fri, 2015-08-07 at 17:08 -0600, Toshi Kani wrote:
> On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote:
> > On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani <[email protected]> wrote:
> > > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
> > > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani <[email protected]>
> > > > wrote:
> > > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
> > > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani <[email protected]>
> > > > > > wrote:
> :
> > > > >
> > > > > No, there is no OS support necessary to use MTRR. After firmware
> > > > > sets it up, CPUs continue to use it without any OS support. I
> > > > > think the Linux change you are referring is to obsolete legacy
> > > > > interfaces that modify the MTRR setup. I agree that Linux should
> > > > > not modify MTRR.
> > > >
> > > > Its a bit more than that though. Since you agree that the OS can
> > > > live without MTRR code I was hoping to then see if we can fold out
> > > > PAT Linux code from under the MTRR dependency on Linux and make PAT
> > > > a first class citizen, maybe at least for x86-64. Right now you can
> > > > only get PAT support on Linux if you have MTRR code, but I'd like to
> > > > see if instead we can rip MTRR code out completely under its own
> > > > Kconfig and let it start rotting away.
> > > >
> > > > Code-wise the only issue I saw was that PAT code also relies on
> > > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I
> > > > found no other obvious issues.
> > >
> > > We can rip of the MTTR code that modifies the MTRR setup, but not
> > > mtrr_type_lookup(). This function provides necessary checks per
> > > documented in commit 7f0431e3dc89 as follows.
> > >
> > > 1) reserve_memtype() tracks an effective memory type in case
> > > a request type is WB (ex. /dev/mem blindly uses WB). Missing
> > > to track with its effective type causes a subsequent request
> > > to map the same range with the effective type to fail.
> > >
> > > 2) pud_set_huge() and pmd_set_huge() check if a requested range
> > > has any overlap with MTRRs. Missing to detect an overlap may
> > > cause a performance penalty or undefined behavior.
> > >
> > > mtrr_type_lookup() is still admittedly awkward, but I do not think we
> > > have an immediate issue in PAT code calling it. I do not think it
> > > makes
> > > PAT code a second class citizen.
> >
> > OK since we know that if MTRR set up code ends up disabled and would
> > return MTRR_TYPE_INVALID what if we just static inline this for the
> > no-MTRR Kconfig build option immediately, and only then have the full
> > blown implementation for the case where MTRR Kconfig option is
> > enabled?
>
> Yes, the MTRR code could be disabled by Kconfig with such inline stubs as
> long as the kernel is built specifically for a particular platform with
> MTRR disabled, such as Xen guest kernel.

Noticed that we do have CONFIG_MTRR and mtrr_type_lookup() inline stub
returns MTRR_INVALID.

-Toshi

2015-08-07 23:26:30

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Fri, Aug 7, 2015 at 4:08 PM, Toshi Kani <[email protected]> wrote:
> On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote:
>> On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani <[email protected]> wrote:
>> > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
>> > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani <[email protected]> wrote:
>> > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
>> > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani <[email protected]>
>> > > > > wrote:
> :
>> > > >
>> > > > No, there is no OS support necessary to use MTRR. After firmware
>> > > > sets it up, CPUs continue to use it without any OS support. I think
>> > > > the Linux change you are referring is to obsolete legacy interfaces
>> > > > that modify the MTRR setup. I agree that Linux should not modify
>> > > > MTRR.
>> > >
>> > > Its a bit more than that though. Since you agree that the OS can live
>> > > without MTRR code I was hoping to then see if we can fold out PAT
>> > > Linux code from under the MTRR dependency on Linux and make PAT a
>> > > first class citizen, maybe at least for x86-64. Right now you can only
>> > > get PAT support on Linux if you have MTRR code, but I'd like to see if
>> > > instead we can rip MTRR code out completely under its own Kconfig and
>> > > let it start rotting away.
>> > >
>> > > Code-wise the only issue I saw was that PAT code also relies on
>> > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I found
>> > > no other obvious issues.
>> >
>> > We can rip of the MTTR code that modifies the MTRR setup, but not
>> > mtrr_type_lookup(). This function provides necessary checks per
>> > documented
>> > in commit 7f0431e3dc89 as follows.
>> >
>> > 1) reserve_memtype() tracks an effective memory type in case
>> > a request type is WB (ex. /dev/mem blindly uses WB). Missing
>> > to track with its effective type causes a subsequent request
>> > to map the same range with the effective type to fail.
>> >
>> > 2) pud_set_huge() and pmd_set_huge() check if a requested range
>> > has any overlap with MTRRs. Missing to detect an overlap may
>> > cause a performance penalty or undefined behavior.
>> >
>> > mtrr_type_lookup() is still admittedly awkward, but I do not think we
>> > have an immediate issue in PAT code calling it. I do not think it makes
>> > PAT code a second class citizen.
>>
>> OK since we know that if MTRR set up code ends up disabled and would
>> return MTRR_TYPE_INVALID what if we just static inline this for the
>> no-MTRR Kconfig build option immediately, and only then have the full
>> blown implementation for the case where MTRR Kconfig option is
>> enabled?
>
> Yes, the MTRR code could be disabled by Kconfig with such inline stubs

OK thanks.

> as
> long as the kernel is built specifically for a particular platform with MTRR
> disabled, such as Xen guest kernel.

Sure.

> However, since MTRR is a CPU feature enabled on most of the systems, I am
> not sure if it makes sense to be configurable with Kconfig, though.

To me this is about making PAT a first class citizen in code though
and validating through Kconfig the option then to opt-out of MTRR from
OS code. Perhaps we can recommend to enable it but having the options
to split out PAT from MTRR is what I was aiming for.

>> > > Platform firmware and SMIs seems to be the only other possible issue.
>> > > More on this below.
>> > >
>> > > > > For those type of OSes...
>> > > > > could it be possible to negotiate or hint to the platform through
>> > > > > an attribute somehow that the OS has such capability to not use
>> > > > > MTRR?
>> > > >
>> > > > The OS can disable MTRR. However, this can also cause a problem in
>> > > > firmware, which may rely on MTRR.
>> > >
>> > > Can you describe what type of issues we could expect ? I tend to care
>> > > more about this for 64-bit systems so if 32-bit platforms would be
>> > > more of the ones which could cause an issue would restricting
>> > > disabling MTRR only for 64-bit help?
>> >
>> > The SMI handler runs in real-mode and relies on MTRR being effective to
>> > provide right cache types. It does not matter if it is 64-bit or not.
>>
>> I see... since I have no visibility to what goes under the hood, can
>> you provide one example use case where an SMI handler would require
>> getting a cache type through MTRR ? I realize this can vary, vendor by
>> vendor, but any example would do just to satisfy my curiosity.
>
> For fan control, it would need UC access to its registers.

OK thanks! To follow up with the example, since the platform firmware
would have set up the MTRRs anyway, the SMI should still work, even if
the OS didn't do anything, right?

>> > > > > Then, only if this bit is set, the platform could then avoid such
>> > > > > MTRR settings, and if we have issues you can throw rocks at us.
>> > > >
>> > > > > And if that's not possible how about a new platform setting that
>> > > > > would need to be set at the platform level to enable disabling
>> > > > > this junk?
>> > > > > Then only folks who know what they are doing would enable it, and
>> > > > > if the customer set it, the issue would not be on the platform.
>> > > >
>> > > > > Could this also be used to prevent SMIs with MTRRs?
>> > > >
>> > > > ACPI _OSI could be used for firmware to implement some OS-specific
>> > > > features, but it may be too late for firmware to make major changes
>> > > > and
>> > > > is generally useless unless OS requirements are described in a spec
>> > > > backed by logo certification.
>> > >
>> > > I see.. So there are no guarantees that platform firmware would not
>> > > expect OS MTRR support.
>> > >
>> > > > SMIs are also used for platform management, such as fan
>> > > > speed control.
>> > >
>> > > And its conceivable that some devices, or the platform itself, may
>> > > trigger SMIs to have the platform firmware poke with MTRRs?
>> >
>> > SMIs are outside of OS control. SMI handler relies on MTRR being set.
>> > SMI must be quick, so the handler should not be required to initialize
>> > MTRR or page tables.
>>
>> Right makes sense.
>>
>> > > > Is there any issue for Linux to use MTRR set by firmware?
>> > >
>> > > Even though we don't have the Kconfig option right now to disable MTRR
>> > > cod explicitly I'll note that there are a few other cases that could
>> > > flip Linux to note use MTRR:
>> > >
>> > > a) Some BIOSes could let MTRR get disabled
>> > > b) As of Xen 4.4, the hypervisor disables X86_FEATURE_MTRR which
>> > > disables MTRR on Linux
>> > >
>> > > If these environments can exist it'd be good to understand possible
>> > > issues that could creep up as a result of the OS not having MTRR
>> > > enabled. If this is a reasonable thing for x86-64 I was hoping we
>> > > could just let users opt-in to a similar build configuration through
>> > > the OS by letting PAT not depend on MTRR.
>> >
>> > Case a) and b) do not cause any issue. They simply lead
>> > mtrr_type_lookup() to return immediately with MTRR_TYPE_INVALID (i.e.
>> > MTRR disable), and the callers handle this value properly. These cases
>> > are only problematic when the OS tries to modify MTRR.
>>
>> OK if the OS returns MTRR_TYPE_INVALID, for folks who do not want MTRR
>> code on their kernel, we should be OK?
>
> Technically OK. Not sure if we want such a Kconfig option, though.

Its more of me wanting to get PAT out from under MTRR. Does that make sense?

Luis

2015-08-07 23:50:42

by Toshi Kani

[permalink] [raw]
Subject: Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

On Fri, 2015-08-07 at 16:26 -0700, Luis R. Rodriguez wrote:
> On Fri, Aug 7, 2015 at 4:08 PM, Toshi Kani <[email protected]> wrote:
> > On Fri, 2015-08-07 at 15:23 -0700, Luis R. Rodriguez wrote:
> > > On Fri, Aug 7, 2015 at 2:56 PM, Toshi Kani <[email protected]> wrote:
> > > > On Fri, 2015-08-07 at 13:25 -0700, Luis R. Rodriguez wrote:
> > > > > On Thu, Aug 6, 2015 at 3:58 PM, Toshi Kani <[email protected]>
> > > > > wrote:
> > > > > > On Thu, 2015-08-06 at 12:53 -0700, Luis R. Rodriguez wrote:
> > > > > > > On Fri, Jun 12, 2015 at 9:58 AM, Toshi Kani <[email protected]
> > > > > > > >
> > > > > > > wrote:
:
> > > > >
> > > > > Its a bit more than that though. Since you agree that the OS can
> > > > > live without MTRR code I was hoping to then see if we can fold out
> > > > > PAT Linux code from under the MTRR dependency on Linux and make
> > > > > PAT a first class citizen, maybe at least for x86-64. Right now
> > > > > you can only get PAT support on Linux if you have MTRR code, but
> > > > > I'd like to see if instead we can rip MTRR code out completely
> > > > > under its own Kconfig and let it start rotting away.
> > > > >
> > > > > Code-wise the only issue I saw was that PAT code also relies on
> > > > > mtrr_type_lookup(), see pat_x_mtrr_type(), but other than this I
> > > > > found no other obvious issues.
> > > >
> > > > We can rip of the MTTR code that modifies the MTRR setup, but not
> > > > mtrr_type_lookup(). This function provides necessary checks per
> > > > documented
> > > > in commit 7f0431e3dc89 as follows.
> > > >
> > > > 1) reserve_memtype() tracks an effective memory type in case
> > > > a request type is WB (ex. /dev/mem blindly uses WB). Missing
> > > > to track with its effective type causes a subsequent request
> > > > to map the same range with the effective type to fail.
> > > >
> > > > 2) pud_set_huge() and pmd_set_huge() check if a requested range
> > > > has any overlap with MTRRs. Missing to detect an overlap may
> > > > cause a performance penalty or undefined behavior.
> > > >
> > > > mtrr_type_lookup() is still admittedly awkward, but I do not think
> > > > we
> > > > have an immediate issue in PAT code calling it. I do not think it
> > > > makes
> > > > PAT code a second class citizen.
> > >
> > > OK since we know that if MTRR set up code ends up disabled and would
> > > return MTRR_TYPE_INVALID what if we just static inline this for the
> > > no-MTRR Kconfig build option immediately, and only then have the full
> > > blown implementation for the case where MTRR Kconfig option is
> > > enabled?
> >
> > Yes, the MTRR code could be disabled by Kconfig with such inline stubs
>
> OK thanks.
>
> > as
> > long as the kernel is built specifically for a particular platform with
> > MTRR disabled, such as Xen guest kernel.
>
> Sure.
>
> > However, since MTRR is a CPU feature enabled on most of the systems, I
> > am not sure if it makes sense to be configurable with Kconfig, though.
>
> To me this is about making PAT a first class citizen in code though
> and validating through Kconfig the option then to opt-out of MTRR from
> OS code. Perhaps we can recommend to enable it but having the options
> to split out PAT from MTRR is what I was aiming for.

Since we have CONFIG_MTRR already, we do not need to argue over this option.
:-) It makes sense since when MTRR code was introduced, there were CPUs
without this capability...

> > > > > Platform firmware and SMIs seems to be the only other possible
> > > > > issue. More on this below.
> > > > >
> > > > > > > For those type of OSes...
> > > > > > > could it be possible to negotiate or hint to the platform
> > > > > > > through an attribute somehow that the OS has such capability
> > > > > > > to not use MTRR?
> > > > > >
> > > > > > The OS can disable MTRR. However, this can also cause a problem
> > > > > > in firmware, which may rely on MTRR.
> > > > >
> > > > > Can you describe what type of issues we could expect ? I tend to
> > > > > care more about this for 64-bit systems so if 32-bit platforms
> > > > > would be more of the ones which could cause an issue would
> > > > > restricting disabling MTRR only for 64-bit help?
> > > >
> > > > The SMI handler runs in real-mode and relies on MTRR being effective
> > > > to provide right cache types. It does not matter if it is 64-bit or
> > > > not.
> > >
> > > I see... since I have no visibility to what goes under the hood, can
> > > you provide one example use case where an SMI handler would require
> > > getting a cache type through MTRR ? I realize this can vary, vendor by
> > > vendor, but any example would do just to satisfy my curiosity.
> >
> > For fan control, it would need UC access to its registers.
>
> OK thanks! To follow up with the example, since the platform firmware
> would have set up the MTRRs anyway, the SMI should still work, even if
> the OS didn't do anything, right?

Yes, MTRR works without the OS code. However, mtrr_type_lookup() is
necessary to make sure that OS mapping requests are aligned with with the
MTRR setup.


> > > > > > Is there any issue for Linux to use MTRR set by firmware?
> > > > >
> > > > > Even though we don't have the Kconfig option right now to disable
> > > > > MTRR cod explicitly I'll note that there are a few other cases
> > > > > that could flip Linux to note use MTRR:
> > > > >
> > > > > a) Some BIOSes could let MTRR get disabled
> > > > > b) As of Xen 4.4, the hypervisor disables X86_FEATURE_MTRR which
> > > > > disables MTRR on Linux
> > > > >
> > > > > If these environments can exist it'd be good to understand
> > > > > possible issues that could creep up as a result of the OS not
> > > > > having MTRR enabled. If this is a reasonable thing for x86-64 I
> > > > > was hoping we could just let users opt-in to a similar build
> > > > > configuration through the OS by letting PAT not depend on MTRR.
> > > >
> > > > Case a) and b) do not cause any issue. They simply lead
> > > > mtrr_type_lookup() to return immediately with MTRR_TYPE_INVALID
> > > > (i.e. MTRR disable), and the callers handle this value properly.
> > > > These cases are only problematic when the OS tries to modify MTRR.
> > >
> > > OK if the OS returns MTRR_TYPE_INVALID, for folks who do not want MTRR
> > > code on their kernel, we should be OK?
> >
> > Technically OK. Not sure if we want such a Kconfig option, though.
>
> Its more of me wanting to get PAT out from under MTRR. Does that make
> sense?

It makes sense if you need to make the kernel size a bit smaller, and you
build kernels specific to Xen guests. Leaving the MTRR code enabled on Xen
guests does not cause you any issue, though.

Thanks,
-Toshi