2009-10-10 01:31:29

by Thomas Schlichter

[permalink] [raw]
Subject: [RFC Patch] use MTRR for write combining if PAT is not available

Hi,

I've found a problem with X.org not setting up MTRR for the framebuffer
memory. After I investigated I think this is not a X.org problem, but a kernel
issue.

X.org uses libpciaccess to map the framebuffer memory. This library opens
/sys/bus/pci/devices/*/resource0_wc and mmaps the memory. Unfortunately, the
kernel only enables write combining if PAT is enabled, if it is not, the
memory is mmapped uncached. But Xorg (respectively libpciaccess) thinks it was
successfully mapped with write combining enabled and thus does not
additionally set up MTRR entries.

The corresponding libpciaccess code can be found here:
http://cgit.freedesktop.org/xorg/lib/libpciaccess/tree/src/linux_sysfs.c#n501

If the kernel behavior is intentional and X.org should always set up MTRR
entries, why should it use /sys/.../resource0_wc at all? I think there are 2
possibilities to make the kernel behavior consistent:

1. The mmap_wc should fail if PAT is not enabled.
(libpciaccess will then map the framebuffer uncached and set up
MTRR entries)
2. Use MTRR to enable write combining if PAT is not available.

In an earlier thread about ioremap_wc, Arjan van de Ven wrote that option 2 is
preferred over option 1:

http://lkml.indiana.edu/hypermail/linux/kernel/0805.3/2925.html

So, I've created the attached patch implementing option 2. For me this solves
the problem with the slow Video playback due to not correctly set up MTRR
entries.

Kind regards,
Thomas Schlichter


Attachments:
0001-Use-MTRR-for-write-combining-mmap-ioremap-if-PAT-is-.patch (3.16 kB)

2009-10-10 04:25:44

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Thomas Schlichter wrote:
> Hi,
>
> I've found a problem with X.org not setting up MTRR for the framebuffer
> memory. After I investigated I think this is not a X.org problem, but a kernel
> issue.

is there any CPU left that does not support PAT ?

eg in other words, should we try instead to get PAT working for you?

2009-10-10 08:39:12

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Am Samstag 10 Oktober 2009 06:24:41 schrieb Arjan van de Ven:
> Thomas Schlichter wrote:
> > Hi,
> >
> > I've found a problem with X.org not setting up MTRR for the framebuffer
> > memory. After I investigated I think this is not a X.org problem, but a
> > kernel issue.
>
> is there any CPU left that does not support PAT ?
>
> eg in other words, should we try instead to get PAT working for you?

Hmm, my CPU (VIA Nano) seems to support PAT, but Ubuntu disables this
feature in its kernel config. I'm using 9.04, but even 9.10 will have PAT
disabled:

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-
karmic.git;a=blob;f=debian.master/config/config.common.ubuntu;h=a43cbabd02bfdf5412de6fea232db86aa5d3e742;hb=HEAD

The patch I sent does nothing if PAT is enabled, so it should be quite
safe for this case. And for the case when PAT is not enabled, it simply
tries to set up MTRR entries, if this fails it will still behave like
it does today...

So, convincing Ubuntu to enable PAT would be a workaround, but I think
the problem would still exist and should be fixed.

Kind regards,
Thomas

2009-10-10 15:46:11

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Thomas Schlichter wrote:

> Hmm, my CPU (VIA Nano) seems to support PAT, but Ubuntu disables this
> feature in its kernel config. I'm using 9.04, but even 9.10 will have PAT
> disabled:

that sounds like a bad idea. But to each distro their own I suppose.


>
> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-
> karmic.git;a=blob;f=debian.master/config/config.common.ubuntu;h=a43cbabd02bfdf5412de6fea232db86aa5d3e742;hb=HEAD

> So, convincing Ubuntu to enable PAT would be a workaround, but I think
> the problem would still exist and should be fixed.

I think it's time to get rid of CONFIG_PAT at all.. and just have it be there.
Making 10+ year old CPU features that are essential functionality a config option...
... not a good idea to be honest.

2009-10-10 17:51:11

by Roland Dreier

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available


> > Hmm, my CPU (VIA Nano) seems to support PAT, but Ubuntu disables this
> > feature in its kernel config. I'm using 9.04, but even 9.10 will have PAT
> > disabled:
>
> that sounds like a bad idea. But to each distro their own I suppose.

Seems that Ubuntu has caught this issue, and it will probably be
resolved: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/446480

- R.

2009-10-11 07:41:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available


* Arjan van de Ven <[email protected]> wrote:

> Thomas Schlichter wrote:
>
>> Hmm, my CPU (VIA Nano) seems to support PAT, but Ubuntu disables this
>> feature in its kernel config. I'm using 9.04, but even 9.10 will have PAT
>> disabled:
>
> that sounds like a bad idea. But to each distro their own I suppose.
>
>
>>
>> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-
>> karmic.git;a=blob;f=debian.master/config/config.common.ubuntu;h=a43cbabd02bfdf5412de6fea232db86aa5d3e742;hb=HEAD
>
>> So, convincing Ubuntu to enable PAT would be a workaround, but I think
>> the problem would still exist and should be fixed.
>
> I think it's time to get rid of CONFIG_PAT at all.. and just have it
> be there. Making 10+ year old CPU features that are essential
> functionality a config option... ... not a good idea to be honest.

Mind sending a patch that makes the configurability of PAT dependent on
EMBEDDED or so? There's still the nopat boot option so it can still be
configured by a distro if it wishes to.

Ingo

2009-10-11 10:02:55

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Arjan van de Ven wrote:
> Thomas Schlichter wrote:
> > So, convincing Ubuntu to enable PAT would be a workaround, but I think
> > the problem would still exist and should be fixed.
>
> I think it's time to get rid of CONFIG_PAT at all.. and just have it be
> there. Making 10+ year old CPU features that are essential functionality a
> config option... ... not a good idea to be honest.

I do completely agree with you in this point.

But should the current behavior of not using MTRR for write combining when PAT
is disabled (for whatever reason) stay? I think it is more consistent to do
whatever is necessary to enable write combining when we are asked to.

Kind regards,
Thomas

Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Fri, 09 Oct 2009, Arjan van de Ven wrote:
> >I've found a problem with X.org not setting up MTRR for the
> >framebuffer memory. After I investigated I think this is not a
> >X.org problem, but a kernel issue.
>
> is there any CPU left that does not support PAT ?

A few million of them. Like every Centrino laptop out there, unless the
kernel blacklist for PAT on Intel CPUs is wrong.

> eg in other words, should we try instead to get PAT working for you?

Is it even possible? Can you get Intel to fix my Pentium M? I'd be quite
happy to test any microcode updates you send my way :)

As things stand right now, you can assume PAT will be there in 2015 or
thereabouts. Machine models that can run Linux have a service life of
around 15 years.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2009-10-11 18:55:54

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Henrique de Moraes Holschuh wrote:
> On Fri, 09 Oct 2009, Arjan van de Ven wrote:
>>> I've found a problem with X.org not setting up MTRR for the
>>> framebuffer memory. After I investigated I think this is not a
>>> X.org problem, but a kernel issue.
>> is there any CPU left that does not support PAT ?
>
> A few million of them. Like every Centrino laptop out there, unless the
> kernel blacklist for PAT on Intel CPUs is wrong.

afaik it is extremely conservative right now. FAR too much so.

Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Sun, 11 Oct 2009, Arjan van de Ven wrote:
> Henrique de Moraes Holschuh wrote:
> >On Fri, 09 Oct 2009, Arjan van de Ven wrote:
> >>>I've found a problem with X.org not setting up MTRR for the
> >>>framebuffer memory. After I investigated I think this is not a
> >>>X.org problem, but a kernel issue.
> >>is there any CPU left that does not support PAT ?
> >
> >A few million of them. Like every Centrino laptop out there, unless the
> >kernel blacklist for PAT on Intel CPUs is wrong.
>
> afaik it is extremely conservative right now. FAR too much so.

That *still* means we need the patch in this thread.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2009-10-12 18:10:32

by Robert Hancock

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On 10/11/2009 02:19 PM, Henrique de Moraes Holschuh wrote:
> On Sun, 11 Oct 2009, Arjan van de Ven wrote:
>> Henrique de Moraes Holschuh wrote:
>>> On Fri, 09 Oct 2009, Arjan van de Ven wrote:
>>>>> I've found a problem with X.org not setting up MTRR for the
>>>>> framebuffer memory. After I investigated I think this is not a
>>>>> X.org problem, but a kernel issue.
>>>> is there any CPU left that does not support PAT ?
>>>
>>> A few million of them. Like every Centrino laptop out there, unless the
>>> kernel blacklist for PAT on Intel CPUs is wrong.
>>
>> afaik it is extremely conservative right now. FAR too much so.
>
> That *still* means we need the patch in this thread.

I don't know that it is too conservative - the erratum which justifies
the disablement (Core Solo/Duo erratum AE7, Pentium M erratum Y31,
likely other IDs for it too) states that "A page whose PAT memory type
is USWC while the relevant MTRR memory type is UC, the consolidated
memory type may be treated as UC (rather than WC as specified in IA-32
Intel? Architecture Software Developer's Manual). When this erratum
occurs, the memory page may be as UC (rather than WC). This may have a
negative performance impact." This is exactly the case being discussed
here with the framebuffer which we're trying to set as WC with existing
MTRRs set (or defaulting to) UC. This not only affects Pentium M but
also the original Core Solo and Core Duo CPUs which surely is millions
of existing laptops. It seems pretty clear-cut and there's no workaround
identified, so I think we do indeed need a patch like this one.

2009-10-12 19:17:52

by Thomas Hellstrom

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Hi!

One problem with this patch is that it conflicts with the way graphics
drivers traditionally handles
the situation, namely

1) Set up mtrr
2) Map. If fallback to uncached minus we will still have write-combined
access.

I think mtrr-add used in this fashion will typically fail due to the
alignment constraints. In particular,
for set_memory_wc() the typical usage pattern is a large number of pages
in a fragmented physical address space.

So if we were to fix the problem with libpciaccess in the kernel, I
think the best option would be to fail the user-space mapping when we
can't make it write-combined.

Thanks,
Thomas

Thomas Schlichter wrote:
> Hi,
>
> when I first sent this E-Mail on Saturday, I unfortunately forgot to CC many
> people. Now I used get_maintainer.pl to get the list of people that may want
> to contribute to this topic.
>
> Because of this topic, there is already a patch from Arjan in the -tip tree to
> make PAT and MTRR options only configurable if EMBEDDED and enabled by
> default. I think this is a step in the right direction, but at least Henrique,
> Robert and I seem to think something like the attached patch is still
> required. What do you think?
>
> Kind regards,
> Thomas
>
> -----------------------------------------------------------------------------
>
> Hi,
>
> I've found a problem with X.org not setting up MTRR for the framebuffer
> memory. After I investigated I think this is not a X.org problem, but a kernel
> issue.
>
> X.org uses libpciaccess to map the framebuffer memory. This library opens
> /sys/bus/pci/devices/*/resource0_wc and mmaps the memory. Unfortunately, the
> kernel only enables write combining if PAT is enabled, if it is not, the
> memory is mmapped uncached. But Xorg (respectively libpciaccess) thinks it was
> successfully mapped with write combining enabled and thus does not
> additionally set up MTRR entries.
>
> The corresponding libpciaccess code can be found here:
> http://cgit.freedesktop.org/xorg/lib/libpciaccess/tree/src/linux_sysfs.c#n501
>
> If the kernel behavior is intentional and X.org should always set up MTRR
> entries, why should it use /sys/.../resource0_wc at all? I think there are 2
> possibilities to make the kernel behavior consistent:
>
> 1. The mmap_wc should fail if PAT is not enabled.
> (libpciaccess will then map the framebuffer uncached and set up
> MTRR entries)
> 2. Use MTRR to enable write combining if PAT is not available.
>
> In an earlier thread about ioremap_wc, Arjan van de Ven wrote that option 2 is
> preferred over option 1:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/0805.3/2925.html
>
> So, I've created the attached patch implementing option 2. For me this solves
> the problem with the slow Video playback due to not correctly set up MTRR
> entries.
>
> Kind regards,
> Thomas Schlichter
>

2009-10-12 19:56:45

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Thomas Hellstrom wrote:
> Hi!
>
> One problem with this patch is that it conflicts with the way graphics
> drivers traditionally handles
> the situation, namely
>
> 1) Set up mtrr
> 2) Map. If fallback to uncached minus we will still have write-combined
> access.
>
> I think mtrr-add used in this fashion will typically fail due to the
> alignment constraints. In particular,
> for set_memory_wc() the typical usage pattern is a large number of pages
> in a fragmented physical address space.

Yes, maybe this patch tries to change current behavior too less. Indeed, if
setting up MTRR entries it simply behaves as today, and userspace does not see
that write combining is not correctly enabled.

> So if we were to fix the problem with libpciaccess in the kernel, I
> think the best option would be to fail the user-space mapping when we
> can't make it write-combined.

One idea to do this would be the attached patch. It simply returns an error if
PAT is not available. It does not even try to use MTRR on its own. But maybe
even better would be to combine both patches to something like this:
1. try to use PAT
2. if this fails try to set up MTRR
3. if this also fails, return error

Kind regards,
Thomas


Attachments:
0001-Do-not-mmap-ioremap-uncached-when-WC-is-requested.patch (2.57 kB)

2009-10-13 07:35:17

by Jan Beulich

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

>>> Thomas Schlichter <[email protected]> 12.10.09 20:32 >>>
>@@ -268,11 +269,14 @@ EXPORT_SYMBOL(ioremap_nocache);
> */
> void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
> {
>- if (pat_enabled)
>- return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC,
>- __builtin_return_address(0));
>- else
>- return ioremap_nocache(phys_addr, size);
>+ if (!pat_enabled) {
>+ void __iomem *ret = ioremap_nocache(phys_addr, size);
>+ if (ret)
>+ mtrr_add(phys_addr, size, MTRR_TYPE_WRCOMB, false);

This won't work if phys_addr or size aren't page aligned, or if size isn't
a power of two.

>+ return ret;
>+ }
>+ return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC,
>+ __builtin_return_address(0));
> }
> EXPORT_SYMBOL(ioremap_wc);
>
>@@ -1010,8 +1011,13 @@ int set_memory_wc(unsigned long addr, int numpages)
> {
> int ret;
>
>- if (!pat_enabled)
>- return set_memory_uc(addr, numpages);
>+ if (!pat_enabled) {
>+ ret = set_memory_uc(addr, numpages);
>+ if (!ret)
>+ mtrr_add(__pa(addr), numpages * PAGE_SIZE,
>+ MTRR_TYPE_WRCOMB, false);

Similarly, this requires numpages to be a power of two.

>+ return ret;
>+ }
>
> ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
> _PAGE_CACHE_WC, NULL);

I think user mode code handles this by splitting the request and trying to
establish as many entries as possible (and those as big as possible).

Also, in all cases you pass 'false' for the 'increment' parameter, in order
to avoid having to tear down the established entries. While this may be
reasonable for kernel initiated mappings, I don't think that's acceptable
for such originating from user mode.

Jan

2009-10-13 21:12:28

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Eric Anholt wrote:
> On Mon, 2009-10-12 at 21:45 +0200, Thomas Schlichter wrote:
> > Yes, maybe this patch tries to change current behavior too less. Indeed,
> > if setting up MTRR entries it simply behaves as today, and userspace does
> > not see that write combining is not correctly enabled.
>
> I'm uncomfortable with this patch because it doesn't appear to cover any
> callers of these functions inside of the kernel. Have you audited them
> to make sure they can handle NULL being returned?

No, I haven't. And to be honest, I think the earlier patch that adds MTRR
entries should be more safe, as it modifies behavior only slightly.

> Seems like we should install an MTRR instead. Requiring userland to set
> up the MTRR on the kernel's behalf is backwards.

Exactly.

> With modern drivers we're installing any required MTRRs at module load
> in the kernel, and that's where things should be moving.

I think in general this is not possible, because not for all graphics chips
there are kernel drivers (required). E.g. for my VIA VX800 based notebook,
there is no kernel module that should be responsible to set-up the MTRR
entries. Here it's up to the (userspace) X.org driver. It opens the
/sys/bus/pci/devices/.../resource0_wc device and mmaps the framebuffer memory.
With PAT this will set up a write combining memory area, and with my first
patch this should also happen without PAT with MTRR.

> As long as
> this doesn't interfere with them, I'm OK. And if some kernel driver is
> failing to install its MTRR, well, let's fix it.

Well, I think the mtrr_add inside mmap should not do any harm...

Kind regards,
Thomas

2009-10-13 21:29:46

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Jan Beulich wrote:
> >>> Thomas Schlichter <[email protected]> 12.10.09 20:32 >>>
> >+ if (!pat_enabled) {
> >+ void __iomem *ret = ioremap_nocache(phys_addr, size);
> >+ if (ret)
> >+ mtrr_add(phys_addr, size, MTRR_TYPE_WRCOMB, false);
>
> This won't work if phys_addr or size aren't page aligned, or if size isn't
> a power of two.

No, at least the comments in mtrr_add and mtrr_check state that it is just
required that phys_addr and size are multiple of PAGE_SIZE. And I'm not sure
if it is always safe to round these up/down to the next PAGE boundary. If it
is not, maybe it is better to fail...

> >+ if (!pat_enabled) {
> >+ ret = set_memory_uc(addr, numpages);
> >+ if (!ret)
> >+ mtrr_add(__pa(addr), numpages * PAGE_SIZE,
> >+ MTRR_TYPE_WRCOMB, false);
>
> Similarly, this requires numpages to be a power of two.

Same as above.

> I think user mode code handles this by splitting the request and trying to
> establish as many entries as possible (and those as big as possible).

Seems not required here...

> Also, in all cases you pass 'false' for the 'increment' parameter, in order
> to avoid having to tear down the established entries. While this may be
> reasonable for kernel initiated mappings, I don't think that's acceptable
> for such originating from user mode.

Well, therefore I would have to remember which mmap successfully set up which
MTRR entries. And it must be ensured that for each mmap there always is a
munmap. If this is indeed always ensured, and if it's worth to maintain the
required data, I'd be happy to add a corresponding mtrr_del there.

Kind regards,
Thomas

2009-10-14 08:13:22

by Jan Beulich

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

>>> Thomas Schlichter <[email protected]> 13.10.09 23:29 >>>
>Jan Beulich wrote:
>> >>> Thomas Schlichter <[email protected]> 12.10.09 20:32 >>>
>> >+ if (!pat_enabled) {
>> >+ void __iomem *ret = ioremap_nocache(phys_addr, size);
>> >+ if (ret)
>> >+ mtrr_add(phys_addr, size, MTRR_TYPE_WRCOMB, false);
>>
>> This won't work if phys_addr or size aren't page aligned, or if size isn't
>> a power of two.
>
>No, at least the comments in mtrr_add and mtrr_check state that it is just
>required that phys_addr and size are multiple of PAGE_SIZE. And I'm not sure
>if it is always safe to round these up/down to the next PAGE boundary. If it
>is not, maybe it is better to fail...

That function isn't the limiting factor, generic_validate_add_page() is
what you need to look at (plus the spec on how the MTRR ranges are
calculated by the CPU from the base/mask register pairs).

Jan

2009-10-14 19:14:53

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Jan Beulich wrote:
> >>> Thomas Schlichter <[email protected]> 13.10.09 23:29 >>>
> >No, at least the comments in mtrr_add and mtrr_check state that it is just
> >required that phys_addr and size are multiple of PAGE_SIZE. And I'm not
> > sure if it is always safe to round these up/down to the next PAGE
> > boundary. If it is not, maybe it is better to fail...
>
> That function isn't the limiting factor, generic_validate_add_page() is
> what you need to look at (plus the spec on how the MTRR ranges are
> calculated by the CPU from the base/mask register pairs).

Yes, you are right. Sorry for not looking into generic_validate_add_page()
before. Thank you for showing!

I added a function mtrr_add_unaligned() that tries to create as many MTRR
entries as necessary, beginning with the biggest regions. It does not check
the return values of each mtrr_add(), nor does it return the indexes of the
created MTRR entries. So it seems to be only useful with increment=false. Or
do you have a better idea?

Kind regards,
Thomas


Attachments:
0001-Add-new-mtrr_add_unaligned-function.patch (2.56 kB)
0002-Use-MTRR-for-write-combining-mmap-ioremap-if-PAT-is-.patch (3.20 kB)
Download all attachments

2009-10-15 07:47:50

by Jan Beulich

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

>>> Thomas Schlichter <[email protected]> 14.10.09 21:14 >>>
>I added a function mtrr_add_unaligned() that tries to create as many MTRR
>entries as necessary, beginning with the biggest regions. It does not check
>the return values of each mtrr_add(), nor does it return the indexes of the
>created MTRR entries. So it seems to be only useful with increment=false. Or
>do you have a better idea?

I don't have immediate thoughts on how to address this, but nevertheless
I continue to think that the issue must be solved in some way, even more
that now you may be leaking multiple MTRRs.

Jan

2009-10-17 19:48:37

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Jan Beulich wrote:
> >>> Thomas Schlichter <[email protected]> 14.10.09 21:14 >>>
> >
> >I added a function mtrr_add_unaligned() that tries to create as many MTRR
> >entries as necessary, beginning with the biggest regions. It does not
> > check the return values of each mtrr_add(), nor does it return the
> > indexes of the created MTRR entries. So it seems to be only useful with
> > increment=false. Or do you have a better idea?
>
> I don't have immediate thoughts on how to address this, but nevertheless
> I continue to think that the issue must be solved in some way, even more
> that now you may be leaking multiple MTRRs.

OK, it required some changes on different edges, but now I have it...

Patches 0001-0003 introduce functionality/changes to the MTRR and sysfs
subsystems. Patch 0004 is the main patch which sets up the MTRR entries when
pci memory regions are mmapped. The Patches 0005-0006 also change ioremap()
and set_mempry_wc() to set up MTRR entries. These two are completely optional,
especially and there is currently no way to automatically remove MTRR entries
set up with them.

What do you think about these patches?

Kind regards,
Thomas


Attachments:
0005-Use-MTRR-for-write-combining-ioremap.patch (1.34 kB)
0001-Add-new-mtrr_add_unaligned-function.patch (2.78 kB)
0002-Make-num_var_ranges-accessible-outside-MTRR-code.patch (1.60 kB)
0003-Provide-per-file-private-data-for-bin-sysfs-files.patch (2.75 kB)
0004-Use-MTRR-for-pci_mmap_resource_wc-if-PAT-is-not-avai.patch (3.45 kB)
0006-Set-up-MTRR-entries-within-set_memory_wc.patch (1.22 kB)
Download all attachments

2009-10-19 09:15:16

by Jan Beulich

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

>>> Thomas Schlichter <[email protected]> 17.10.09 21:48 >>>
>Patches 0001-0003 introduce functionality/changes to the MTRR and sysfs
>subsystems. Patch 0004 is the main patch which sets up the MTRR entries when
>pci memory regions are mmapped. The Patches 0005-0006 also change ioremap()
>and set_mempry_wc() to set up MTRR entries. These two are completely optional,
>especially and there is currently no way to automatically remove MTRR entries
>set up with them.
>
>What do you think about these patches?

Functionality-wise this looks fine to me; whether the core sysfs changes
are acceptable I can't judge, though.

However, I would recommend folding the last two arguments of
mtrr_add_unaligned(), i.e. use mtrr_usage != NULL as the increment
argument passed to mtrr_add().

And patches 5 and 6 would apparently not build right now, as they're
failing to pass the new 5th argument to mtrr_add_unaligned().

Also, why do you add x86-specific code to drivers/pci-sysfs.c when the
function called there (pci_mmap_page_range()) already is arch-specific?
Moving your addition there would also allow covering the related
(legacy?) procfs based functionality... pci_release() could also become
arch-specific, with a fall-back definition to NULL.

Additionally, I would suggest making those code portions depend on
CONFIG_X86_MTRR rather than CONFIG_X86, so that you don't
needlessly (try to) allocate the mtrr_usage vector.

Jan

2009-10-19 13:46:56

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

> X.org uses libpciaccess which tries to mmap with write combining enabled via
> /sys/bus/pci/devices/*/resource0_wc. Currently, when PAT is not enabled, we
> fall back to uncached mmap. Then libpciaccess thinks it succeeded mapping
> with write combining anabled and does not set up suited MTRR entries. ;-(

I am not familiar with libpciaccess, but I was wondering why that library
cannot realize it failed in its endeavour and use other means to accomplish
its goals?

2009-10-19 13:45:07

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Mon, 2009-10-19 at 02:16 -0700, Jan Beulich wrote:
> Functionality-wise this looks fine to me

If we are going to make ioremap() and set_memory_wc() add mtrr's in
non-pat case, then we need to delete the added mtrr(s) in the
corresponding iounmap() and set_memory_wb() aswell.

hmm, this is becoming too complex. The way i915 and other graphics
drivers are using set_memory_wc(), it is def a bad idea to start adding
mtrr's behind the back for non-pat case.

Can't we just force PAT option always and we probably don't care about
ioremap_wc() on processors were PAT doesn't get enabled because of known
errata.

or Perhaps just try to add mtrr only for the pci mmap case like the 4th
patch in this series..

thanks,
suresh

2009-10-19 13:55:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available


* Suresh Siddha <[email protected]> wrote:

> On Mon, 2009-10-19 at 02:16 -0700, Jan Beulich wrote:
> > Functionality-wise this looks fine to me
>
> If we are going to make ioremap() and set_memory_wc() add mtrr's in
> non-pat case, then we need to delete the added mtrr(s) in the
> corresponding iounmap() and set_memory_wb() aswell.
>
> hmm, this is becoming too complex. The way i915 and other graphics
> drivers are using set_memory_wc(), it is def a bad idea to start
> adding mtrr's behind the back for non-pat case.

Touching MTRRs beyond working around basic bugs like non-cached RAM
sounds like madness. The interactions with PAT are ... countless.

> Can't we just force PAT option always and we probably don't care about
> ioremap_wc() on processors were PAT doesn't get enabled because of
> known errata.

We can make PAT configurability dependent on EMBEDDED-y - mind sending a
patch for that?

Ingo

2009-10-19 14:59:56

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Suresh Siddha wrote:
> If we are going to make ioremap() and set_memory_wc() add mtrr's in
> non-pat case, then we need to delete the added mtrr(s) in the
> corresponding iounmap() and set_memory_wb() aswell.
>
> hmm, this is becoming too complex. The way i915 and other graphics
> drivers are using set_memory_wc(), it is def a bad idea to start adding
> mtrr's behind the back for non-pat case.

Yes, maybe it's better to drop it for ioremap() and set_memory_wc(). But I'd like
to keep it for mmapping the PCI region. It should help all the people with
PAT-incapable CPUs and graphics chips without DRM support (for them there
simply is no driver that should set up the MTRR entries...).

> Can't we just force PAT option always and we probably don't care about
> ioremap_wc() on processors were PAT doesn't get enabled because of known
> errata.

I don't think this is a good idea, Robert Hancock wrote there may be millions of
such Laptops (Core Solo/Duo erratum AE7, Pentium M erratum Y31) :
http://marc.info/?l=linux-kernel&m=125537136105246&w=2

> or Perhaps just try to add mtrr only for the pci mmap case like the 4th
> patch in this series..

I'd prefer this! ;-)

Kind regards,
Thomas

2009-10-19 15:07:37

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Konrad Rzeszutek Wilk wrote:
> I am not familiar with libpciaccess, but I was wondering why that library
> cannot realize it failed in its endeavour and use other means to accomplish
> its goals?

The thing is that the mmap() will succeed! Only the memory region will not be
set up as write combining when PAT is disabled. So userspace would have to
find out it PAT is enabled in kernel and accordingly set up MTRR entries. And
currently I don't know of a kernel interface that would tell userspace if PAT is
enabled.

Kind regards,
Thomas

2009-10-19 15:10:22

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Ingo Molnar wrote:
> * Suresh Siddha <[email protected]> wrote:
> > Can't we just force PAT option always and we probably don't care about
> > ioremap_wc() on processors were PAT doesn't get enabled because of
> > known errata.
>
> We can make PAT configurability dependent on EMBEDDED-y - mind sending a
> patch for that?

I think there already is such a patch in -tip:
http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commitdiff;h=c03cb3149daed3e411657e3212d05ae27cf1a874

Kind regards,
Thomas

2009-10-19 15:29:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available


* Thomas Schlichter <[email protected]> wrote:

> > We can make PAT configurability dependent on EMBEDDED-y - mind
> > sending a patch for that?
>
> I think there already is such a patch in -tip:
> http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commitdiff;h=c03cb3149daed3e411657e3212d05ae27cf1a874

Yes :-)

Ingo

2009-10-19 15:32:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available


* Thomas Schlichter <[email protected]> wrote:

> I don't think this is a good idea, Robert Hancock wrote there may be
> millions of such Laptops (Core Solo/Duo erratum AE7, Pentium M erratum
> Y31) :
>
> http://marc.info/?l=linux-kernel&m=125537136105246&w=2
>
> > or Perhaps just try to add mtrr only for the pci mmap case like the
> > 4th patch in this series..
>
> I'd prefer this! ;-)

Hm, we could perhaps do that - but i think we should only do that on
systems that have PAT disabled.

Which brings up the question of how to properly QA such a narrow segment
of the market. Maybe disabling CONFIG_X86_PAT should enable that logic
too.

Ingo

2009-10-19 21:50:39

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Mon, 2009-10-19 at 08:31 -0700, Ingo Molnar wrote:
> * Thomas Schlichter <[email protected]> wrote:
>
> > I don't think this is a good idea, Robert Hancock wrote there may be
> > millions of such Laptops (Core Solo/Duo erratum AE7, Pentium M erratum
> > Y31) :
> >
> > http://marc.info/?l=linux-kernel&m=125537136105246&w=2
> >
> > > or Perhaps just try to add mtrr only for the pci mmap case like the
> > > 4th patch in this series..
> >
> > I'd prefer this! ;-)
>
> Hm, we could perhaps do that - but i think we should only do that on
> systems that have PAT disabled.
>
> Which brings up the question of how to properly QA such a narrow segment
> of the market. Maybe disabling CONFIG_X86_PAT should enable that logic
> too.

Also, I think we can simplify and avoid mtrr_add_unaligned() and worry
only about the pci mmap aligned cases etc. Those are the only
interesting ones.

thanks,
suresh

2009-10-20 20:35:20

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Ingo Molnar wrote:
> * Thomas Schlichter <[email protected]> wrote:
> > > or Perhaps just try to add mtrr only for the pci mmap case like the
> > > 4th patch in this series..
> >
> > I'd prefer this! ;-)
>
> Hm, we could perhaps do that - but i think we should only do that on
> systems that have PAT disabled.

The patch I sent should do exactly that.

> Which brings up the question of how to properly QA such a narrow segment
> of the market. Maybe disabling CONFIG_X86_PAT should enable that logic
> too.

When CONFIG_X86_PAT is disabled, pat_enabled is 0, and thus my new code should
be active. Or do I miss something?

What do you think about the latest version of my patch series I just sent?

Kind regards,
Thomas

2009-10-20 22:00:00

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Tue, 2009-10-20 at 13:35 -0700, Thomas Schlichter wrote:
> What do you think about the latest version of my patch series I just sent?

I think we can simplify this by just using mtrr_add_page() and avoid
all the complexity that comes with mtrr_add_unaligned().

pci_mmap_range() should call one mtrr_add_page() if the base and size
are nicely aligned. Almost all the cases, this is the usage sequence
here anyway.

thanks,
suresh

2009-10-21 11:53:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available


* Suresh Siddha <[email protected]> wrote:

> On Tue, 2009-10-20 at 13:35 -0700, Thomas Schlichter wrote:
> > What do you think about the latest version of my patch series I just sent?
>
> I think we can simplify this by just using mtrr_add_page() and avoid
> all the complexity that comes with mtrr_add_unaligned().
>
> pci_mmap_range() should call one mtrr_add_page() if the base and size
> are nicely aligned. Almost all the cases, this is the usage sequence
> here anyway.

yep, keeping it simple is a must.

Ingo

2009-10-21 13:45:10

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Suresh Siddha wrote:
> I think we can simplify this by just using mtrr_add_page() and avoid
> all the complexity that comes with mtrr_add_unaligned().
>
> pci_mmap_range() should call one mtrr_add_page() if the base and size
> are nicely aligned. Almost all the cases, this is the usage sequence
> here anyway.

Ingo Molnar wrote:
> Can we eliminate mtrr_add_unaligned() as Suresh suggested, and still
> make it work on your testbox?

Yes, I had that in the first place, but Jan suggested to extend it to also
handle non-aligned, non-power-of-two cases:
http://marc.info/?l=linux-kernel&m=125541951529918&w=2

So if it's OK for Jan, I'll reduce the functionality again and use mtrr_add()
instead. Btw. this simply means to drop mtrr_add_unaligned(), all the other
parts are still required for reference counting and a proper mtrr_del() on file
close.

Kind regards,
Thomas

2009-10-21 14:10:18

by Jan Beulich

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

>>> Thomas Schlichter <[email protected]> 21.10.09 15:45 >>>
>Ingo Molnar wrote:
>> Can we eliminate mtrr_add_unaligned() as Suresh suggested, and still
>> make it work on your testbox?
>
>Yes, I had that in the first place, but Jan suggested to extend it to also
>handle non-aligned, non-power-of-two cases:
> http://marc.info/?l=linux-kernel&m=125541951529918&w=2

I merely pointed out it wouldn't work for unaligned addresses or sizes
passed in.

>So if it's OK for Jan, I'll reduce the functionality again and use mtrr_add()
>instead. Btw. this simply means to drop mtrr_add_unaligned(), all the other
>parts are still required for reference counting and a proper mtrr_del() on file
>close.

I'm perfectly fine with just handling the aligned case, as long as the code
checks that the alignment constraints are met.

Jan

2009-10-21 17:36:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available


* Jan Beulich <[email protected]> wrote:

> > So if it's OK for Jan, I'll reduce the functionality again and use
> > mtrr_add() instead. Btw. this simply means to drop
> > mtrr_add_unaligned(), all the other parts are still required for
> > reference counting and a proper mtrr_del() on file close.
>
> I'm perfectly fine with just handling the aligned case, as long as the
> code checks that the alignment constraints are met.

It could even emit a debug warning if they are not met - that way we'll
see how important the unaligned case is.

Ingo

2009-10-21 20:01:36

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Ingo Molnar wrote:
> * Jan Beulich <[email protected]> wrote:
> > I'm perfectly fine with just handling the aligned case, as long as the
> > code checks that the alignment constraints are met.
>
> It could even emit a debug warning if they are not met - that way we'll
> see how important the unaligned case is.

OK, so I think the attached patches should do it. Hopefully I have addressed
all your comments.

This series works for my test machine, without it, or without X running, I
have these MTRR entries:
reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
reg01: base=0x06ff00000 ( 1791MB), size= 1MB, count=1: uncachable
reg02: base=0x070000000 ( 1792MB), size= 256MB, count=1: uncachable

With the patches applied and X running I get these:
reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
reg01: base=0x06ff00000 ( 1791MB), size= 1MB, count=1: uncachable
reg02: base=0x070000000 ( 1792MB), size= 256MB, count=1: uncachable
reg03: base=0x0d0000000 ( 3328MB), size= 256MB, count=1: write-combining

Best regards,
Thomas


Attachments:
0001-Make-num_var_ranges-accessible-outside-MTRR-code.patch (1.60 kB)
0002-Provide-per-file-private-data-for-bin-sysfs-files.patch (2.15 kB)
0003-Use-MTRR-for-pci_mmap_resource_wc-if-PAT-is-not-avai.patch (5.62 kB)
Download all attachments

2009-10-22 10:32:19

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Wed, 2009-10-21 at 13:01 -0700, Thomas Schlichter wrote:
> OK, so I think the attached patches should do it. Hopefully I have addressed
> all your comments.

Thomas,

I have couple of issues with this patchset still. pci_mmap_page_range()
doesn't get called for each fork(). So, we won't be ref counting the
mtrr usage properly.

I need to think a bit more carefully on this. Can I get back to you
early next week on this, as I am traveling and need to think through
this?

We already keep track of some of the PAT ref counting using
track_pfn_vma_copy(). And we need to extend/use something similar here.

Even if we need to extend sysfs or pci vma ops, we need to increment and
decrement the ref count of the mtrr register that gets used. There is no
need to go through num_var_ranges etc.

thanks,
suresh

2009-10-22 12:08:29

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Suresh Siddha wrote:
> On Wed, 2009-10-21 at 13:01 -0700, Thomas Schlichter wrote:
> > OK, so I think the attached patches should do it. Hopefully I have addressed
> > all your comments.
>
> Thomas,
>
> I have couple of issues with this patchset still. pci_mmap_page_range()
> doesn't get called for each fork(). So, we won't be ref counting the
> mtrr usage properly.

When forking, what happens with the "struct file"? If it is being copied, then the
processes share the same private data which would be freed during the first
release(). I think this would be a problem whereever file-private data are used.

So I think it must be shared between the forked processes and some reference
counting must exist. This reference counting must ensure that release() is only
called when all processes did close() their file.

And in that case (shared "struct file", one single release() call in the end) this
implementation should be completely safe...

> I need to think a bit more carefully on this. Can I get back to you
> early next week on this, as I am traveling and need to think through
> this?

Of course you can. But as I do this just in my spare time, I can only work on this
in the evenings...

> We already keep track of some of the PAT ref counting using
> track_pfn_vma_copy(). And we need to extend/use something similar here.
>
> Even if we need to extend sysfs or pci vma ops, we need to increment and
> decrement the ref count of the mtrr register that gets used.

The MTRR register ref count is already incremented with each mtrr_add() or
mtrr_add_page() that has the "increment" parameter set to true. Which is the
case in my implementation.

> There is no need to go through num_var_ranges etc.

Well I have to remember wich file added which MTRR entries. Because I have
to remove them if the file is being closed. Therefore I need an array of size
"num_var_ranges" (or MTRR_MAX_VAR_RANGES which is the uper bound).

Btw. if there really is a problem in my reference counting code, it also exists in
the current code in:
arch/x86/kernel/cpu/mtrr/if.c

My patch borrows its ref-counting parts from there...

Kind regards,
Thomas

2009-10-22 12:15:33

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On 10/22/2009 09:08 PM, Thomas Schlichter wrote:
>>
>> I have couple of issues with this patchset still. pci_mmap_page_range()
>> doesn't get called for each fork(). So, we won't be ref counting the
>> mtrr usage properly.
>
> When forking, what happens with the "struct file"? If it is being copied, then the
> processes share the same private data which would be freed during the first
> release(). I think this would be a problem whereever file-private data are used.
>
> So I think it must be shared between the forked processes and some reference
> counting must exist. This reference counting must ensure that release() is only
> called when all processes did close() their file.
>
> And in that case (shared "struct file", one single release() call in the end) this
> implementation should be completely safe...
>

struct file is shared between forked processes.

-hpa

2009-10-22 14:02:11

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 2009-10-22 at 05:14 -0700, H. Peter Anvin wrote:
> On 10/22/2009 09:08 PM, Thomas Schlichter wrote:
> >>
> >> I have couple of issues with this patchset still. pci_mmap_page_range()
> >> doesn't get called for each fork(). So, we won't be ref counting the
> >> mtrr usage properly.
> >
> > When forking, what happens with the "struct file"? If it is being copied, then the
> > processes share the same private data which would be freed during the first
> > release(). I think this would be a problem whereever file-private data are used.
> >
> > So I think it must be shared between the forked processes and some reference
> > counting must exist. This reference counting must ensure that release() is only
> > called when all processes did close() their file.
> >
> > And in that case (shared "struct file", one single release() call in the end) this
> > implementation should be completely safe...
> >
>
> struct file is shared between forked processes.

That is correct. But I am referring to the ref-count getting incremented
in Thomas's patch only in the pci_mmap_page_range() which will be called
only during first mmap.

We need to keep track of the counts of later forks too. For PAT, we keep
track of this ref counting in track_pfn_vma_copy(). We shouldn't use
different tracking mechanisms for PAT and non-PAT. We should cleanly tap
into track_pfn_vma_copy() or extend that to cover this case aswell.

thanks,
suresh

2009-10-22 13:58:19

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 2009-10-22 at 05:08 -0700, Thomas Schlichter wrote:
> When forking, what happens with the "struct file"? If it is being copied, then the
> processes share the same private data which would be freed during the first
> release(). I think this would be a problem whereever file-private data are used.
>
> So I think it must be shared between the forked processes and some reference
> counting must exist. This reference counting must ensure that release() is only
> called when all processes did close() their file.
>
> And in that case (shared "struct file", one single release() call in the end) this
> implementation should be completely safe...

I am referring to the refcount getting incremented. Also, let me think
about your direction (as the pci_mmap_page_rane() is explicitly adding
the mtrr, we should perhaps do the ref counting there, perhaps)

> > There is no need to go through num_var_ranges etc.
>
> Well I have to remember wich file added which MTRR entries. Because I have
> to remove them if the file is being closed. Therefore I need an array of size
> "num_var_ranges" (or MTRR_MAX_VAR_RANGES which is the uper bound).

No. the private data for example can keep track of a struct containing
mtrr number and ref count etc. Exporting var_ranges and going through
var ranges elements in an array is not clean, especially when you are
populating only one element.

thanks
suresh

2009-10-22 14:27:53

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Suresh Siddha wrote:
> On Thu, 2009-10-22 at 05:14 -0700, H. Peter Anvin wrote:
> > On 10/22/2009 09:08 PM, Thomas Schlichter wrote:
> > > And in that case (shared "struct file", one single release() call in the end) this
> > > implementation should be completely safe...
> >
> > struct file is shared between forked processes.
>
> That is correct. But I am referring to the ref-count getting incremented
> in Thomas's patch only in the pci_mmap_page_range() which will be called
> only during first mmap.
>
> We need to keep track of the counts of later forks too.

When forked processes do mmap() PCI additional memory,
pci_mmap_page_range() will be called again and the corresponding (shared)
mtrr_usage_count wil be incremented. So we do keep track of later forks...

Yes, the MTRR reference count has nothing to do with the processes using this
memory, but if you want this, arch/x86/kernel/cpu/mtrr/if.c must be changed, too.

Regards,
Thomas

2009-10-22 14:41:17

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Suresh Siddha wrote:
> > > There is no need to go through num_var_ranges etc.
> >
> > Well I have to remember wich file added which MTRR entries. Because I have
> > to remove them if the file is being closed. Therefore I need an array of size
> > "num_var_ranges" (or MTRR_MAX_VAR_RANGES which is the uper bound).
>
> No. the private data for example can keep track of a struct containing
> mtrr number and ref count etc. Exporting var_ranges and going through
> var ranges elements in an array is not clean, especially when you are
> populating only one element.

OK, I should have written that num_var_ranges is neccessary if I do copy a
algorithm for exactly the same purpose from an other place. And I don't see
anything better in having a dynamically growing list that makes the operation
of incrementing MTRR entries an O(n) operation where it now is a O(1) operation.

Additionally, the worst case memory requirement would be
2*sizeof(int)*num_var_ranges, where it is now 1*sizeof(int)*num_var_ranges.

So for me, this would be a step back, but if you want this, you should additionally
change arch/x86/kernel/cpu/mtrr/if.c from where I reused the algorithm.

Regards,
Thomas

2009-10-22 15:34:55

by Eric Anholt

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 2009-10-22 at 02:53 -0700, Suresh Siddha wrote:
> On Wed, 2009-10-21 at 13:01 -0700, Thomas Schlichter wrote:
> > OK, so I think the attached patches should do it. Hopefully I have addressed
> > all your comments.
>
> Thomas,
>
> I have couple of issues with this patchset still. pci_mmap_page_range()
> doesn't get called for each fork(). So, we won't be ref counting the
> mtrr usage properly.
>
> I need to think a bit more carefully on this. Can I get back to you
> early next week on this, as I am traveling and need to think through
> this?
>
> We already keep track of some of the PAT ref counting using
> track_pfn_vma_copy(). And we need to extend/use something similar here.
>
> Even if we need to extend sysfs or pci vma ops, we need to increment and
> decrement the ref count of the mtrr register that gets used. There is no
> need to go through num_var_ranges etc.

Can we just not create the _wc sysfs entry if we don't have PAT? I
don't think there's userland relying on its presence as opposed to the
non-_wc entry.

--
Eric Anholt
[email protected] [email protected]



Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2009-10-22 21:48:03

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote:
> Can we just not create the _wc sysfs entry if we don't have PAT? I
> don't think there's userland relying on its presence as opposed to the
> non-_wc entry.

Yes indeed. Jesse do you see an issue with this? This is simple and
clean. Thanks Eric.

2009-10-22 23:10:53

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 22 Oct 2009 14:47:30 -0700
Suresh Siddha <[email protected]> wrote:

> On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote:
> > Can we just not create the _wc sysfs entry if we don't have PAT? I
> > don't think there's userland relying on its presence as opposed to
> > the non-_wc entry.
>
> Yes indeed. Jesse do you see an issue with this? This is simple and
> clean. Thanks Eric.

Yeah, I think that will be fine. In fact, older versions of
libpciaccess will behave better if we do it that way (iirc it only
allocates an MTRR if the resource_wc file doesn't exist or fails to get
mapped).

Jesse

2009-10-23 00:12:28

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 2009-10-22 at 16:10 -0700, Jesse Barnes wrote:
> On Thu, 22 Oct 2009 14:47:30 -0700
> Suresh Siddha <[email protected]> wrote:
>
> > On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote:
> > > Can we just not create the _wc sysfs entry if we don't have PAT? I
> > > don't think there's userland relying on its presence as opposed to
> > > the non-_wc entry.
> >
> > Yes indeed. Jesse do you see an issue with this? This is simple and
> > clean. Thanks Eric.
>
> Yeah, I think that will be fine. In fact, older versions of
> libpciaccess will behave better if we do it that way (iirc it only
> allocates an MTRR if the resource_wc file doesn't exist or fails to get
> mapped).

Eric, care to send the patch?

2009-10-23 01:53:28

by Eric Anholt

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 2009-10-22 at 17:11 -0700, Suresh Siddha wrote:
> On Thu, 2009-10-22 at 16:10 -0700, Jesse Barnes wrote:
> > On Thu, 22 Oct 2009 14:47:30 -0700
> > Suresh Siddha <[email protected]> wrote:
> >
> > > On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote:
> > > > Can we just not create the _wc sysfs entry if we don't have PAT? I
> > > > don't think there's userland relying on its presence as opposed to
> > > > the non-_wc entry.
> > >
> > > Yes indeed. Jesse do you see an issue with this? This is simple and
> > > clean. Thanks Eric.
> >
> > Yeah, I think that will be fine. In fact, older versions of
> > libpciaccess will behave better if we do it that way (iirc it only
> > allocates an MTRR if the resource_wc file doesn't exist or fails to get
> > mapped).
>
> Eric, care to send the patch?

I don't have a patch, I was just suggesting a way to handle the
submitter's problem that won't involve complicated changes that nobody
else will be testing since everyone *should* have a graphics driver for
their graphics hardware.

--
Eric Anholt
[email protected] [email protected]



Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2009-10-23 04:31:57

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 22 Oct 2009 18:53:19 -0700
Eric Anholt <[email protected]> wrote:

> On Thu, 2009-10-22 at 17:11 -0700, Suresh Siddha wrote:
> > On Thu, 2009-10-22 at 16:10 -0700, Jesse Barnes wrote:
> > > On Thu, 22 Oct 2009 14:47:30 -0700
> > > Suresh Siddha <[email protected]> wrote:
> > >
> > > > On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote:
> > > > > Can we just not create the _wc sysfs entry if we don't have
> > > > > PAT? I don't think there's userland relying on its presence
> > > > > as opposed to the non-_wc entry.
> > > >
> > > > Yes indeed. Jesse do you see an issue with this? This is simple
> > > > and clean. Thanks Eric.
> > >
> > > Yeah, I think that will be fine. In fact, older versions of
> > > libpciaccess will behave better if we do it that way (iirc it only
> > > allocates an MTRR if the resource_wc file doesn't exist or fails
> > > to get mapped).
> >
> > Eric, care to send the patch?
>
> I don't have a patch, I was just suggesting a way to handle the
> submitter's problem that won't involve complicated changes that nobody
> else will be testing since everyone *should* have a graphics driver
> for their graphics hardware.

Here's a quick & dirty version, totally untested. A cleaner approach
would be to separate the WC mapping routines and hide the return
-EINVAL in arch specific code...

Jesse

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 0f6382f..41010bb 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -23,6 +23,9 @@
#include <linux/mm.h>
#include <linux/capability.h>
#include <linux/pci-aspm.h>
+#ifdef CONFIG_X86
+#include <asm/pat.h>
+#endif
#include "pci.h"

static int sysfs_initialized; /* = 0 */
@@ -730,6 +733,10 @@ static int
pci_mmap_resource_wc(struct kobject *kobj, struct bin_attribute *attr,
struct vm_area_struct *vma)
{
+#ifdef CONFIG_X86
+ if (!pat_enabled)
+ return -EINVAL;
+#endif
return pci_mmap_resource(kobj, attr, vma, 1);
}

2009-10-23 04:33:57

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 2009-10-22 at 18:53 -0700, Eric Anholt wrote:
> On Thu, 2009-10-22 at 17:11 -0700, Suresh Siddha wrote:
> > On Thu, 2009-10-22 at 16:10 -0700, Jesse Barnes wrote:
> > > On Thu, 22 Oct 2009 14:47:30 -0700
> > > Suresh Siddha <[email protected]> wrote:
> > >
> > > > On Thu, 2009-10-22 at 08:34 -0700, Eric Anholt wrote:
> > > > > Can we just not create the _wc sysfs entry if we don't have PAT? I
> > > > > don't think there's userland relying on its presence as opposed to
> > > > > the non-_wc entry.
> > > >
> > > > Yes indeed. Jesse do you see an issue with this? This is simple and
> > > > clean. Thanks Eric.
> > >
> > > Yeah, I think that will be fine. In fact, older versions of
> > > libpciaccess will behave better if we do it that way (iirc it only
> > > allocates an MTRR if the resource_wc file doesn't exist or fails to get
> > > mapped).
> >
> > Eric, care to send the patch?
>
> I don't have a patch, I was just suggesting a way to handle the
> submitter's problem that won't involve complicated changes that nobody
> else will be testing since everyone *should* have a graphics driver for
> their graphics hardware.

I can send the patch early next week unless Thomas or someone else beats
me.

2009-10-23 04:58:48

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Thu, 2009-10-22 at 21:31 -0700, Jesse Barnes wrote:
> Here's a quick & dirty version, totally untested. A cleaner approach
> would be to separate the WC mapping routines and hide the return
> -EINVAL in arch specific code...

Jesse How about this patch? Doing this in x86 is cleaner.

I would like Acks/sign-offs-by Thomas, Eric and Jesse, if it is ok with
this patch and works.

thanks,
suresh
---

From: Suresh Siddha <[email protected]>
Subject: x86, pat: return EINVAL for pci mmap WC request for !pat_enabled

Thomas Schlichter reported:
> X.org uses libpciaccess which tries to mmap with write combining enabled via
> /sys/bus/pci/devices/*/resource0_wc. Currently, when PAT is not enabled, the
> kernel does fall back to uncached mmap. Then libpciaccess thinks it succeeded
> mapping with write combining enabled and does not set up suited MTRR entries.
> ;-(

Instead of silently mapping pci mmap region as UC minus in the case
of !pat_enabled and wc request, we can return error. Eric Anholt mentioned
that caller (like X) typically follows up with UC minus pci mmap request and
if there is a free mtrr slot, caller will manage adding WC mtrr.

Jesse Barnes says:
> Older versions of libpciaccess will behave better if we do it that way
> (iirc it only allocates an MTRR if the resource_wc file doesn't exist or
> fails to get mapped).

Reported-by: Thomas Schlichter <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
---

diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index b22d13b..a672f12 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -282,6 +282,15 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
return -EINVAL;

prot = pgprot_val(vma->vm_page_prot);
+
+ /*
+ * Return error if pat is not enabled and write_combine is requested.
+ * Caller can followup with UC MINUS request and add a WC mtrr if there
+ * is a free mtrr slot.
+ */
+ if (!pat_enabled && write_combine)
+ return -EINVAL;
+
if (pat_enabled && write_combine)
prot |= _PAGE_CACHE_WC;
else if (pat_enabled || boot_cpu_data.x86 > 3)

2009-10-23 07:24:54

by Thomas Schlichter

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

Suresh Siddha wrote:
> On Thu, 2009-10-22 at 21:31 -0700, Jesse Barnes wrote:
> > Here's a quick & dirty version, totally untested. A cleaner approach
> > would be to separate the WC mapping routines and hide the return
> > -EINVAL in arch specific code...
>
> Jesse How about this patch? Doing this in x86 is cleaner.
>
> I would like Acks/sign-offs-by Thomas, Eric and Jesse, if it is ok with
> this patch and works.

Hmm, at this point I already was more than a week ago:
http://marc.info/?l=linux-kernel&m=125537770514713&w=2

OK, I also modified ioremap() and set_memory_wc() but your patch is just part
of what I did there...

And Eric Anholt answered:
> Seems like we should install an MTRR instead. Requiring userland to set
> up the MTRR on the kernel's behalf is backwards.

Where I totally agree with him.

Regards,
Thomas

2009-10-23 14:25:28

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available

On Fri, 2009-10-23 at 00:24 -0700, Thomas Schlichter wrote:
> Hmm, at this point I already was more than a week ago:
> http://marc.info/?l=linux-kernel&m=125537770514713&w=2
>
> OK, I also modified ioremap() and set_memory_wc() but your patch is just part
> of what I did there...

oops! Sorry I read the later emails in this thread carefully but not the
conversation with Thomas Hellstrom. Anyhow, glad that you were on top of
this. Thanks.

Ingo, can you please queue this patch with a sign-off from Thomas
Schlichter too?

thanks,
suresh

2009-10-23 14:38:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC Patch] use MTRR for write combining if PAT is not available


* Suresh Siddha <[email protected]> wrote:

> On Fri, 2009-10-23 at 00:24 -0700, Thomas Schlichter wrote:
> > Hmm, at this point I already was more than a week ago:
> > http://marc.info/?l=linux-kernel&m=125537770514713&w=2
> >
> > OK, I also modified ioremap() and set_memory_wc() but your patch is just part
> > of what I did there...
>
> oops! Sorry I read the later emails in this thread carefully but not
> the conversation with Thomas Hellstrom. Anyhow, glad that you were on
> top of this. Thanks.
>
> Ingo, can you please queue this patch with a sign-off from Thomas
> Schlichter too?

Please resend the latest with all the acks/signoffs added, with the
subject line changed to '[PATCH] ...' and all people Cc:-ed.

Thanks,

Ingo