2015-04-15 20:43:17

by Andy Lutomirski

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <[email protected]> wrote:

> c) ivtv: the driver does not have the PCI space mapped out separately, and
> in fact it actually does not do the math for the framebuffer, instead it lets
> the device's own CPU do that and assume where its at, see
> ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> but not a setter. Its not clear if the firmware would make a split easy.
> We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
>

IMO this should be conceptually easy to split. Once we get the
framebuffer address, just unmap it (or don't prematurely map it) and
then ioremap the thing.

> From the beginning it seems only framebuffer devices used MTRR/WC, lately it
> seems infiniband drivers also find good use for for it for PIO TX buffers to
> blast some sort of data, in the future I would not be surprised if other
> devices found use for it.

IMO the Infiniband maintainers should fix their code. Especially in
the server space, there aren't that many MTRRs to go around. I wrote
arch_phys_wc_add in the first place because my server *ran out of
MTRRs*.

Hey, IB people: can you fix your drivers to use arch_phys_wc_add
(which is permitted to be a no-op) along with ioremap_wc? Your users
will thank you.

> It may be true that the existing drivers that
> requires the above type of work are corner cases -- but I wouldn't hold my
> breath for that. The ivtv device is a good example of the worst type of
> situations and these days. So perhap __arch_phys_wc_add() and a
> ioremap_ucminus() might be something more than transient unless hardware folks
> get a good memo or already know how to just Do The Right Thing (TM).

I disagree. We should try to NACK any new code that can't function
without MTRRs.

(Plus, ARM is growing in popularity in the server space, and ARM quite
sensibly doesn't have MTRRs.)

--Andy


2015-04-15 20:57:16

by H. Peter Anvin

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On 04/15/2015 01:42 PM, Andy Lutomirski wrote:
>
> I disagree. We should try to NACK any new code that can't function
> without MTRRs.
>
> (Plus, ARM is growing in popularity in the server space, and ARM quite
> sensibly doesn't have MTRRs.)
>

<NOT SPEAKING FOR INTEL HERE>

Yes. People need to understand that MTRRs are fundamentally a
transitional solution, a replacement for the KEN# logic in the P4 and P5
generation processors. The KEN# logic in the chipset would notify the
CPU that a specific address should not be cached, without affecting the
software (which may have been written for x86s built before caching
existed, even.)

MTRRs move this to the head end, so the CPU knows ahead of time what to
do, as is required with newer architectures. It also enabled write
combining in a transparent fashion. However, it is still transitional;
it is there to describe the underlying constraints of the memory system
so that code which doesn't use paging can run at all, but the only thing
that can actually scale is PAT.

-hpa

2015-04-15 22:15:34

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 15, 2015 at 01:42:47PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <[email protected]> wrote:
>
> > c) ivtv: the driver does not have the PCI space mapped out separately, and
> > in fact it actually does not do the math for the framebuffer, instead it lets
> > the device's own CPU do that and assume where its at, see
> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> > but not a setter. Its not clear if the firmware would make a split easy.
> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
> >
>
> IMO this should be conceptually easy to split. Once we get the
> framebuffer address, just unmap it (or don't prematurely map it) and
> then ioremap the thing.

The driver has split code for handling framebuffer devices, the framebuffer
base address will also vary depending on the type of device it has, for
some its on the encoder, for others its on the decoder. We'd have to account
for the removal of the framebuffer on either of those regions, and would also
need some vetting that the driver doesn't use areas beyond that for MMIO.
Using the trick you suggest though we could overlap ioremap calls and if that
truly works on PAT and non-PAT adding a new ioremap_wc() could do the trick,
I'd appreciate a Tested-by or Acked-by to be done with this. Mauro, any chance
we can get a tested-by of ivtvfb for both non-PAT and PAT systems with this:

diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
index 9ff1230..1838738 100644
--- a/drivers/media/pci/ivtv/ivtvfb.c
+++ b/drivers/media/pci/ivtv/ivtvfb.c
@@ -44,10 +44,6 @@
#include <linux/ivtvfb.h>
#include <linux/slab.h>

-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include "ivtv-driver.h"
#include "ivtv-cards.h"
#include "ivtv-i2c.h"
@@ -155,12 +151,11 @@ struct osd_info {
/* Buffer size */
u32 video_buffer_size;

-#ifdef CONFIG_MTRR
/* video_base rounded down as required by hardware MTRRs */
unsigned long fb_start_aligned_physaddr;
/* video_base rounded up as required by hardware MTRRs */
unsigned long fb_end_aligned_physaddr;
-#endif
+ int wc_cookie;

/* Store the buffer offset */
int set_osd_coords_x;
@@ -1099,6 +1094,8 @@ static int ivtvfb_init_vidmode(struct ivtv *itv)
static int ivtvfb_init_io(struct ivtv *itv)
{
struct osd_info *oi = itv->osd_info;
+ /* Find the largest power of two that maps the whole buffer */
+ int size_shift = 31;

mutex_lock(&itv->serialize_lock);
if (ivtv_init_on_first_open(itv)) {
@@ -1120,7 +1117,7 @@ static int ivtvfb_init_io(struct ivtv *itv)
oi->video_buffer_size = 1704960;

oi->video_pbase = itv->base_addr + IVTV_DECODER_OFFSET + oi->video_rbase;
- oi->video_vbase = itv->dec_mem + oi->video_rbase;
+ oi->video_vbase = ioremap_wc(oi->video_pbase, oi->video_buffer_size);

if (!oi->video_vbase) {
IVTVFB_ERR("abort, video memory 0x%x @ 0x%lx isn't mapped!\n",
@@ -1132,29 +1129,16 @@ static int ivtvfb_init_io(struct ivtv *itv)
oi->video_pbase, oi->video_vbase,
oi->video_buffer_size / 1024);

-#ifdef CONFIG_MTRR
- {
- /* Find the largest power of two that maps the whole buffer */
- int size_shift = 31;
-
- while (!(oi->video_buffer_size & (1 << size_shift))) {
- size_shift--;
- }
- size_shift++;
- oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
- oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
- oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
- oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
- if (mtrr_add(oi->fb_start_aligned_physaddr,
- oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr,
- MTRR_TYPE_WRCOMB, 1) < 0) {
- IVTVFB_INFO("disabled mttr\n");
- oi->fb_start_aligned_physaddr = 0;
- oi->fb_end_aligned_physaddr = 0;
- }
- }
-#endif
-
+ while (!(oi->video_buffer_size & (1 << size_shift)))
+ size_shift--;
+ size_shift++;
+ oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
+ oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
+ oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
+ oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
+ oi->wc_cookie = arch_phys_wc_add(oi->fb_start_aligned_physaddr,
+ oi->fb_end_aligned_physaddr -
+ oi->fb_start_aligned_physaddr);
/* Blank the entire osd. */
memset_io(oi->video_vbase, 0, oi->video_buffer_size);

@@ -1172,14 +1156,8 @@ static void ivtvfb_release_buffers (struct ivtv *itv)

/* Release pseudo palette */
kfree(oi->ivtvfb_info.pseudo_palette);
-
-#ifdef CONFIG_MTRR
- if (oi->fb_end_aligned_physaddr) {
- mtrr_del(-1, oi->fb_start_aligned_physaddr,
- oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr);
- }
-#endif
-
+ iounmap(oi->video_vbase);
+ arch_phys_wc_del(oi->wc_cookie);
kfree(oi);
itv->osd_info = NULL;
}

>
> > From the beginning it seems only framebuffer devices used MTRR/WC, lately it
> > seems infiniband drivers also find good use for for it for PIO TX buffers to
> > blast some sort of data, in the future I would not be surprised if other
> > devices found use for it.
>
> IMO the Infiniband maintainers should fix their code. Especially in
> the server space, there aren't that many MTRRs to go around. I wrote
> arch_phys_wc_add in the first place because my server *ran out of
> MTRRs*.
>
> Hey, IB people: can you fix your drivers to use arch_phys_wc_add
> (which is permitted to be a no-op) along with ioremap_wc? Your users
> will thank you.

Provided the above ivtv driver changes are OK this would be the *last* and only
driver required to be changed.

> > It may be true that the existing drivers that
> > requires the above type of work are corner cases -- but I wouldn't hold my
> > breath for that. The ivtv device is a good example of the worst type of
> > situations and these days. So perhap __arch_phys_wc_add() and a
> > ioremap_ucminus() might be something more than transient unless hardware folks
> > get a good memo or already know how to just Do The Right Thing (TM).
>
> I disagree. We should try to NACK any new code that can't function
> without MTRRs.
>
> (Plus, ARM is growing in popularity in the server space, and ARM quite
> sensibly doesn't have MTRRs.)

Great, happy with this, but we need to address the last few drivers and their
exisitng code then.

Luis

2015-04-15 23:44:14

by Andy Walls

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, 2015-04-15 at 13:42 -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <[email protected]> wrote:
>
> > c) ivtv: the driver does not have the PCI space mapped out separately, and
> > in fact it actually does not do the math for the framebuffer, instead it lets
> > the device's own CPU do that and assume where its at, see
> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> > but not a setter. Its not clear if the firmware would make a split easy.
> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
> >
>
> IMO this should be conceptually easy to split. Once we get the
> framebuffer address, just unmap it (or don't prematurely map it) and
> then ioremap the thing.

Not so easy. The main ivtv driver has already set up the PCI device and
done the mapping for the MPEG-2 decoder/video output engine. The video
decoder/output device nodes might already be open by user space calling
into the main driver, before the ivtvfb module is even loaded.

This could be mitigated by integrating all the ivtvfb module code into
the main ivtv module. But even then not every PVR-350 owner wants to
use the video output OSD as a framebuffer. Users might just want an
actual OSD overlaying their TV video playback.

Regards,
Andy

2015-04-15 23:52:58

by Andy Lutomirski

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 15, 2015 at 3:50 PM, Andy Walls <[email protected]> wrote:
> On Wed, 2015-04-15 at 13:42 -0700, Andy Lutomirski wrote:
>> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <[email protected]> wrote:
>>
>> > c) ivtv: the driver does not have the PCI space mapped out separately, and
>> > in fact it actually does not do the math for the framebuffer, instead it lets
>> > the device's own CPU do that and assume where its at, see
>> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
>> > but not a setter. Its not clear if the firmware would make a split easy.
>> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
>> >
>>
>> IMO this should be conceptually easy to split. Once we get the
>> framebuffer address, just unmap it (or don't prematurely map it) and
>> then ioremap the thing.
>
> Not so easy. The main ivtv driver has already set up the PCI device and
> done the mapping for the MPEG-2 decoder/video output engine. The video
> decoder/output device nodes might already be open by user space calling
> into the main driver, before the ivtvfb module is even loaded.

Surely the MPEG-2 decoder/video engine won't overlap the framebuffer,
though. Am I missing something?

--Andy

>
> This could be mitigated by integrating all the ivtvfb module code into
> the main ivtv module. But even then not every PVR-350 owner wants to
> use the video output OSD as a framebuffer. Users might just want an
> actual OSD overlaying their TV video playback.
>
> Regards,
> Andy
>



--
Andy Lutomirski
AMA Capital Management, LLC

2015-04-16 01:27:22

by Andy Walls

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, 2015-04-15 at 16:52 -0700, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 3:50 PM, Andy Walls <[email protected]> wrote:
> > On Wed, 2015-04-15 at 13:42 -0700, Andy Lutomirski wrote:
> >> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <[email protected]> wrote:
> >>
> >> > c) ivtv: the driver does not have the PCI space mapped out separately, and
> >> > in fact it actually does not do the math for the framebuffer, instead it lets
> >> > the device's own CPU do that and assume where its at, see
> >> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> >> > but not a setter. Its not clear if the firmware would make a split easy.
> >> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
> >> >
> >>
> >> IMO this should be conceptually easy to split. Once we get the
> >> framebuffer address, just unmap it (or don't prematurely map it) and
> >> then ioremap the thing.
> >
> > Not so easy. The main ivtv driver has already set up the PCI device and
> > done the mapping for the MPEG-2 decoder/video output engine. The video
> > decoder/output device nodes might already be open by user space calling
> > into the main driver, before the ivtvfb module is even loaded.
>
> Surely the MPEG-2 decoder/video engine won't overlap the framebuffer,
> though. Am I missing something?

ivtvfb is stealing the decoders' OSD for use as a framebuffer.
The decoder video output memory doesn't overlap the decoder OSD memory,
but there is a functional overlap. ivtv driver video output device
nodes can manipulate the OSD that ivtvfb is stealing.

It would be a dumb thing for the user to want to use ivtvfb, and to also
manipulate the OSD via the video output device nodes at the same time,
for anything other than setting up the TV video standard. However the
current ivtv driver code doesn't prevent the OSD from being manipulated
by the video output device nodes when ivtvfb is in use.

-Andy

2015-04-21 22:46:10

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 15, 2015 at 01:42:47PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <[email protected]> wrote:
>
> > c) ivtv: the driver does not have the PCI space mapped out separately, and
> > in fact it actually does not do the math for the framebuffer, instead it lets
> > the device's own CPU do that and assume where its at, see
> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> > but not a setter. Its not clear if the firmware would make a split easy.
> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
> >
>
> IMO this should be conceptually easy to split. Once we get the
> framebuffer address, just unmap it (or don't prematurely map it) and
> then ioremap the thing.

Side note to ipath driver folks: as reviewed with Andy Walls, the
ivtv driver cannot easily be ported to use PAT so we are evaluating
simply removing write-combing from that driver on future kernels.

>
> > From the beginning it seems only framebuffer devices used MTRR/WC, lately it
> > seems infiniband drivers also find good use for for it for PIO TX buffers to
> > blast some sort of data, in the future I would not be surprised if other
> > devices found use for it.
>
> IMO the Infiniband maintainers should fix their code. Especially in
> the server space, there aren't that many MTRRs to go around. I wrote
> arch_phys_wc_add in the first place because my server *ran out of
> MTRRs*.
>
> Hey, IB people: can you fix your drivers to use arch_phys_wc_add
> (which is permitted to be a no-op) along with ioremap_wc? Your users

ipath driver maintainers:

The ipath driver is one of two drivers left to convert over to
arch_phys_wc_add(). MTRR use is being deprecated, and its use is actually
highly discouraged now that we have proper PAT implemenation on Linux. Since we
are talking about annotating the qib driver as "known to be broken without PAT"
and since the ipath driver needs considerable work to be ported to use PAT (the
userspace register is just one area) I wanted to review if we can just remove
MTRR use on the ipath driver and annotate write-combining with PAT as a TODO
item.

This would help a lot in our journey to bury MTRR use.

Luis

2015-04-21 22:58:28

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 12:46:01AM +0200, Luis R. Rodriguez wrote:

> are talking about annotating the qib driver as "known to be broken without PAT"
> and since the ipath driver needs considerable work to be ported to
> use PAT (the

This only seems to be true for one of the chips that driver supports,
not all possibilities.

> userspace register is just one area) I wanted to review if we can just remove
> MTRR use on the ipath driver and annotate write-combining with PAT as a TODO
> item.

AFAIK, dropping MTRR support will completely break the performance to
the point the driver is unusable. If we drop MTRR we may as well
remove the driver.

Mike, do you think the time is right to just remove the iPath driver?

Jason

2015-04-21 23:39:14

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Tue, Apr 21, 2015 at 04:57:32PM -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 12:46:01AM +0200, Luis R. Rodriguez wrote:
>
> > are talking about annotating the qib driver as "known to be broken without PAT"
> > and since the ipath driver needs considerable work to be ported to
> > use PAT (the
>
> This only seems to be true for one of the chips that driver supports,
> not all possibilities.
>
> > userspace register is just one area) I wanted to review if we can just remove
> > MTRR use on the ipath driver and annotate write-combining with PAT as a TODO
> > item.
>
> AFAIK, dropping MTRR support will completely break the performance to
> the point the driver is unusable. If we drop MTRR we may as well
> remove the driver.

To be clear, the arch_phys_wc_add() API will be a no-op when PAT is
enabled on a system. Although some folks think PAT is new, its not,
its just that our implementation on Linux lacked a bit of push, recent
changes however make PAT support complete and that means now we'll have
PAT enabled on systems that likely didn't before on recent kernels.

There are other important motivations to use PAT:

* Xen won't work with MTRR, having a unified PAT enabled kernel
will ensure that when Xen is used write-combinging will take
effect

* Long term we want to make strong UC the default to ioremap() on
x86, right now its not, its UC-, we need to convert all drivers
that want write-combining over to use ioremap_wc() for their
specific area, and it must not overlap. There are issues with
using mtrr_add() on regions marked as UC-, since its the default
this means that mtrr_add() use on ioramp'd areas on PAT systems
will actually make write-combining *void*. Refer to this table
for combinatorial effects for non-PAT / PAT of wc MTRR:

https://marc.info/?l=linux-kernel&m=142964809710517&w=1

So as the train of PAT enablement moves forward it means systems
that have PAT enabled now might not have write-combining effective.
In order to get the best of both worlds, non-PAT and PAT systems
we must design drivers cleanly for the non-writecombining and
write-combining areas. This mean using ioremap_nocache() for MMIO
and ioremap_wc() *only* for the desired, write-combining area,
followed by arch_phys_wc_add().

> Mike, do you think the time is right to just remove the iPath driver?

With PAT now being default the driver effectively won't work
with write-combining on modern kernels. Even if systems are old
they likely had PAT support, when upgrading kernels PAT will work
but write-combing won't on ipath.

Luis

2015-04-22 05:40:38

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > Mike, do you think the time is right to just remove the iPath driver?
>
> With PAT now being default the driver effectively won't work
> with write-combining on modern kernels. Even if systems are old
> they likely had PAT support, when upgrading kernels PAT will work
> but write-combing won't on ipath.

Sorry, do you mean the driver already doesn't get WC? Or do you mean
after some more pending patches are applied?

Jason

2015-04-22 15:23:40

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > Mike, do you think the time is right to just remove the iPath driver?
> >
> > With PAT now being default the driver effectively won't work
> > with write-combining on modern kernels. Even if systems are old
> > they likely had PAT support, when upgrading kernels PAT will work
> > but write-combing won't on ipath.
>
> Sorry, do you mean the driver already doesn't get WC? Or do you mean
> after some more pending patches are applied?

No, you have to consider the system used and the effects of calls used
on the driver in light of this table:

----------------------------------------------------------------------
MTRR Non-PAT PAT Linux ioremap value Effective memory type
----------------------------------------------------------------------
Non-PAT | PAT
PAT
|PCD
||PWT
|||
WC 000 WB _PAGE_CACHE_MODE_WB WC | WC
WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC
WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC
WC 011 UC _PAGE_CACHE_MODE_UC UC | UC
----------------------------------------------------------------------

(*) denotes implementation defined and is discouraged

ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
the default. When that flip occurs it will mean ipath cannot get
write-combining on both non-PAT and PAT systems. Now that is for
the future, lets review the current situation for ipath.

For PAT capable systems if mtrr_add() is used today on a Linux system on a
region mapped with ioremap_nocache() that will mean you effectively nullify the
mtrr_add() effect as the combinatorial effect above yields an effective memory
type of UC. For PAT systems you want to use ioremap_wc() on the region in
which you need write-combining followed by arch_phys_wc_add() which will *only*
call mtrr_add() *iff* PAT was not enabled. This also means we need to split
the ioremap'd areas so that the area that is using ioremap_nocache() can never
get write-combining (_PAGE_CACHE_MODE_UC). The ipath driver needs the regions
split just as was done for the qib driver.

Now we could just say that leaving things as-is is a non-issue if you are OK
with non-write-combining effects being the default behaviour left on the ipath
driver for PAT systems. In that case we can just use arch_phys_wc_add() on the
driver and while it won't trigger the mtrr_add() on PAT systems it sill won't
have any effect. We just typically don't want to see use of ioremap_nocache()
paired with arch_phys_wc_add(), grammatically the correct thing to do is pair
ioremap_wc() areas with a arch_phys_wc_add() to make the write-combining effects
on non-PAT systems. If the ipath driver is not going to get he work required
to split the regions though perhaps we can live with a corner case driver that
annotates PAT must be disabled on the systems that use it and convert it to
arch_phys_wc_add() to just help with phasing out of direct use of mtrr_add().
With this strategy if and when ipath driver gets a split done it would gain WC
on both PAT and non-PAT.

Luis

2015-04-22 15:54:55

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > Mike, do you think the time is right to just remove the iPath driver?
> > >
> > > With PAT now being default the driver effectively won't work
> > > with write-combining on modern kernels. Even if systems are old
> > > they likely had PAT support, when upgrading kernels PAT will work
> > > but write-combing won't on ipath.
> >
> > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > after some more pending patches are applied?
>
> No, you have to consider the system used and the effects of calls used
> on the driver in light of this table:
>
> ----------------------------------------------------------------------
> MTRR Non-PAT PAT Linux ioremap value Effective memory type
> ----------------------------------------------------------------------
> Non-PAT | PAT
> PAT
> |PCD
> ||PWT
> |||
> WC 000 WB _PAGE_CACHE_MODE_WB WC | WC
> WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC
> WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC
> WC 011 UC _PAGE_CACHE_MODE_UC UC | UC
> ----------------------------------------------------------------------
>
> (*) denotes implementation defined and is discouraged
>
> ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
> in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
> the default. When that flip occurs it will mean ipath cannot get
> write-combining on both non-PAT and PAT systems. Now that is for
> the future, lets review the current situation for ipath.
>
> For PAT capable systems if mtrr_add() is used today on a Linux system on a
> region mapped with ioremap_nocache() that will mean you effectively nullify the
> mtrr_add() effect as the combinatorial effect above yields an effective memory
> type of UC. For PAT systems you want to use ioremap_wc() on the region in
> which you need write-combining followed by arch_phys_wc_add() which will *only*
> call mtrr_add() *iff* PAT was not enabled. This also means we need to split
> the ioremap'd areas so that the area that is using ioremap_nocache() can never
> get write-combining (_PAGE_CACHE_MODE_UC). The ipath driver needs the regions
> split just as was done for the qib driver.
>
> Now we could just say that leaving things as-is is a non-issue if you are OK
> with non-write-combining effects being the default behaviour left on the ipath
> driver for PAT systems. In that case we can just use arch_phys_wc_add() on the
> driver and while it won't trigger the mtrr_add() on PAT systems it sill won't
> have any effect. We just typically don't want to see use of ioremap_nocache()
> paired with arch_phys_wc_add(), grammatically the correct thing to do is pair
> ioremap_wc() areas with a arch_phys_wc_add() to make the write-combining effects
> on non-PAT systems. If the ipath driver is not going to get he work required
> to split the regions though perhaps we can live with a corner case driver that
> annotates PAT must be disabled on the systems that use it and convert it to
> arch_phys_wc_add() to just help with phasing out of direct use of mtrr_add().
> With this strategy if and when ipath driver gets a split done it would gain WC
> on both PAT and non-PAT.

Folks, after some thought I do believe the above temporary strategy would
avoid issues and would not have to stir people up to go and make code
changes. We can use the same strategy for both ivtv and ipath:

* Annotate via Kconfig for the driver that it depends on !X86_PAT
that will ensure that PAT systems won't use it, and convert it
to use arch_phys_wc_add() to help phase out direct access to mtrr_add()

This would be correct given that the current situation on the driver
makes write-combining non-effective on PAT systems, we in fact gain
avoiding these type of use-cases, and annotate this as a big TODO item
for folks who do want it for PAT systems.

Thoughts?

Luis

2015-04-22 15:59:32

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 8:54 AM, Luis R. Rodriguez <[email protected]> wrote:
> On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
>> On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
>> > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
>> > > > Mike, do you think the time is right to just remove the iPath driver?
>> > >
>> > > With PAT now being default the driver effectively won't work
>> > > with write-combining on modern kernels. Even if systems are old
>> > > they likely had PAT support, when upgrading kernels PAT will work
>> > > but write-combing won't on ipath.
>> >
>> > Sorry, do you mean the driver already doesn't get WC? Or do you mean
>> > after some more pending patches are applied?
>>
>> No, you have to consider the system used and the effects of calls used
>> on the driver in light of this table:
>>
>> ----------------------------------------------------------------------
>> MTRR Non-PAT PAT Linux ioremap value Effective memory type
>> ----------------------------------------------------------------------
>> Non-PAT | PAT
>> PAT
>> |PCD
>> ||PWT
>> |||
>> WC 000 WB _PAGE_CACHE_MODE_WB WC | WC
>> WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC
>> WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC
>> WC 011 UC _PAGE_CACHE_MODE_UC UC | UC
>> ----------------------------------------------------------------------
>>
>> (*) denotes implementation defined and is discouraged
>>
>> ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
>> in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
>> the default. When that flip occurs it will mean ipath cannot get
>> write-combining on both non-PAT and PAT systems. Now that is for
>> the future, lets review the current situation for ipath.
>>
>> For PAT capable systems if mtrr_add() is used today on a Linux system on a
>> region mapped with ioremap_nocache() that will mean you effectively nullify the
>> mtrr_add() effect as the combinatorial effect above yields an effective memory
>> type of UC. For PAT systems you want to use ioremap_wc() on the region in
>> which you need write-combining followed by arch_phys_wc_add() which will *only*
>> call mtrr_add() *iff* PAT was not enabled. This also means we need to split
>> the ioremap'd areas so that the area that is using ioremap_nocache() can never
>> get write-combining (_PAGE_CACHE_MODE_UC). The ipath driver needs the regions
>> split just as was done for the qib driver.
>>
>> Now we could just say that leaving things as-is is a non-issue if you are OK
>> with non-write-combining effects being the default behaviour left on the ipath
>> driver for PAT systems. In that case we can just use arch_phys_wc_add() on the
>> driver and while it won't trigger the mtrr_add() on PAT systems it sill won't
>> have any effect. We just typically don't want to see use of ioremap_nocache()
>> paired with arch_phys_wc_add(), grammatically the correct thing to do is pair
>> ioremap_wc() areas with a arch_phys_wc_add() to make the write-combining effects
>> on non-PAT systems. If the ipath driver is not going to get he work required
>> to split the regions though perhaps we can live with a corner case driver that
>> annotates PAT must be disabled on the systems that use it and convert it to
>> arch_phys_wc_add() to just help with phasing out of direct use of mtrr_add().
>> With this strategy if and when ipath driver gets a split done it would gain WC
>> on both PAT and non-PAT.
>
> Folks, after some thought I do believe the above temporary strategy would
> avoid issues and would not have to stir people up to go and make code
> changes. We can use the same strategy for both ivtv and ipath:
>
> * Annotate via Kconfig for the driver that it depends on !X86_PAT
> that will ensure that PAT systems won't use it, and convert it
> to use arch_phys_wc_add() to help phase out direct access to mtrr_add()
>
> This would be correct given that the current situation on the driver
> makes write-combining non-effective on PAT systems, we in fact gain
> avoiding these type of use-cases, and annotate this as a big TODO item
> for folks who do want it for PAT systems.
>
> Thoughts?

Another option in order to enable this type of checks at run time and
still be able to build the driver on standard distributions and just
prevent if from loading on PAT systems is to have some code in place
which would prevent the driver from loading if PAT was enabled, this
would enable folks to disable PAT via a kernel command line option,
and if that was used then the driver probe would complete.

Luis

2015-04-22 16:18:54

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > Mike, do you think the time is right to just remove the iPath driver?
> > >
> > > With PAT now being default the driver effectively won't work
> > > with write-combining on modern kernels. Even if systems are old
> > > they likely had PAT support, when upgrading kernels PAT will work
> > > but write-combing won't on ipath.
> >
> > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > after some more pending patches are applied?
>
> No, you have to consider the system used and the effects of calls used
> on the driver in light of this table:

So, just to be clear:

At some point Linux started setting the PAT bits during
ioremap_nocache, which overrides MTRR, and at that point the driver
became broken on all PAT capable systems?

Not only that, but we've only just noticed it now, and no user ever
complained?

So that means either no users exist, or all users are on non-PAT
systems?

This driver only works on x86-64 systems. Are there any x86-64 systems
that are not PAT capable? IIRC even the first Opteron had PAT, but my
memory is fuzzy from back then :|

> Another option in order to enable this type of checks at run time
> and still be able to build the driver on standard distributions and
> just prevent if from loading on PAT systems is to have some code in
> place which would prevent the driver from loading if PAT was
> enabled, this would enable folks to disable PAT via a kernel command
> line option, and if that was used then the driver probe would
> complete.

This seems like a reasonble option to me. At the very least we might
learn if anyone is still using these cards.

I'd also love to remove the driver if it turns out there are actually
no users. qib substantially replaces it except for a few very old
cards.

Mike?

Jason

2015-04-22 16:52:04

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 10:17:55AM -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> > On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > > Mike, do you think the time is right to just remove the iPath driver?
> > > >
> > > > With PAT now being default the driver effectively won't work
> > > > with write-combining on modern kernels. Even if systems are old
> > > > they likely had PAT support, when upgrading kernels PAT will work
> > > > but write-combing won't on ipath.
> > >
> > > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > > after some more pending patches are applied?
> >
> > No, you have to consider the system used and the effects of calls used
> > on the driver in light of this table:
>
> So, just to be clear:
>
> At some point Linux started setting the PAT bits during
> ioremap_nocache, which overrides MTRR, and at that point the driver
> became broken on all PAT capable systems?

No, PAT code lacked quite a bit of love, and Juergen and some others have
been giving it some love and now we expect PAT to be enabled by default on
more systems. When and and on what systemes and as of what commits? Not
sure, there's quite a bit of PAT work but hoping Juergen might fill
in the details, CC'd.

> Not only that, but we've only just noticed it now, and no user ever
> complained?

No, well this is all recent, so we expect more PAT enabled systems now.

> So that means either no users exist, or all users are on non-PAT
> systems?

PAT was not being enabled before on most systems, it will now be on
most systems, for some systems there may be some errata that needs to
be addressed for PAT.

> This driver only works on x86-64 systems. Are there any x86-64 systems
> that are not PAT capable?

Not that I know of, but I've heard of some "PAT errata" systems, and
that may need some attention.

> IIRC even the first Opteron had PAT, but my
> memory is fuzzy from back then :|
>
> > Another option in order to enable this type of checks at run time
> > and still be able to build the driver on standard distributions and
> > just prevent if from loading on PAT systems is to have some code in
> > place which would prevent the driver from loading if PAT was
> > enabled, this would enable folks to disable PAT via a kernel command
> > line option, and if that was used then the driver probe would
> > complete.
>
> This seems like a reasonble option to me. At the very least we might
> learn if anyone is still using these cards.

OK great.

> I'd also love to remove the driver if it turns out there are actually
> no users. qib substantially replaces it except for a few very old
> cards.
>
> Mike?

By replacing do you mean same hardware? BTW below are the changes
which I describe above which would prevent both ipath and ivtv
to load on PAT enabled systems. I think its a reasonable compromise.
If this is OK I can proceed with a split of the patches and move
on with the last series that burries MTRR.

Andy Walls, please review too.

Luis

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index bd0caed..3ef592c 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -42,6 +42,9 @@
#include <linux/bitmap.h>
#include <linux/slab.h>
#include <linux/module.h>
+#ifdef CONFIG_X86_64
+#include <asm/pat.h>
+#endif

#include "ipath_kernel.h"
#include "ipath_verbs.h"
@@ -395,6 +398,14 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
unsigned long long addr;
u32 bar0 = 0, bar1 = 0;

+#ifdef CONFIG_X86_64
+ if (WARN(pat_enabled,
+ "ipath needs PAT disabled, boot with nopat kernel parameter\n")) {
+ ret = EINVAL;
+ goto bail;
+ }
+#endif
+
dd = ipath_alloc_devdata(pdev);
if (IS_ERR(dd)) {
ret = PTR_ERR(dd);
@@ -542,6 +553,7 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dd->ipath_kregbase = __ioremap(addr, len,
(_PAGE_NO_CACHE|_PAGE_WRITETHRU));
#else
+ /* XXX: split this properly to enable on PAT */
dd->ipath_kregbase = ioremap_nocache(addr, len);
#endif

@@ -587,12 +599,8 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)

ret = ipath_enable_wc(dd);

- if (ret) {
- ipath_dev_err(dd, "Write combining not enabled "
- "(err %d): performance may be poor\n",
- -ret);
+ if (ret)
ret = 0;
- }

ipath_verify_pioperf(dd);

diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index e08db70..f0f9471 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -463,9 +463,7 @@ struct ipath_devdata {
/* offset in HT config space of slave/primary interface block */
u8 ipath_ht_slave_off;
/* for write combining settings */
- unsigned long ipath_wc_cookie;
- unsigned long ipath_wc_base;
- unsigned long ipath_wc_len;
+ int wc_cookie;
/* ref count for each pkey */
atomic_t ipath_pkeyrefs[4];
/* shadow copy of struct page *'s for exp tid pages */
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 70c1f3a..7b6e4c8 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -37,7 +37,6 @@
*/

#include <linux/pci.h>
-#include <asm/mtrr.h>
#include <asm/processor.h>

#include "ipath_kernel.h"
@@ -122,27 +121,14 @@ int ipath_enable_wc(struct ipath_devdata *dd)
}

if (!ret) {
- int cookie;
- ipath_cdbg(VERBOSE, "Setting mtrr for chip to WC "
- "(addr %llx, len=0x%llx)\n",
- (unsigned long long) pioaddr,
- (unsigned long long) piolen);
- cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
- if (cookie < 0) {
- {
- dev_info(&dd->pcidev->dev,
- "mtrr_add() WC for PIO bufs "
- "failed (%d)\n",
- cookie);
- ret = -EINVAL;
- }
- } else {
- ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, "
- "cookie is %d\n", cookie);
- dd->ipath_wc_cookie = cookie;
- dd->ipath_wc_base = (unsigned long) pioaddr;
- dd->ipath_wc_len = (unsigned long) piolen;
- }
+ dd->wc_cookie = arch_phys_wc_add(pioaddr, piolen);
+ if (dd->wc_cookie < 0) {
+ ipath_dev_err(dd, "Seting mtrr failed on PIO buffers\n");
+ ret = -ENODEV;
+ } else if (dd->wc_cookie == 0)
+ ipath_cdbg(VERBOSE, "Set mtrr for chip to WC not needed\n");
+ else
+ ipath_cdbg(VERBOSE, "Set mtrr for chip to WC\n");
}

return ret;
@@ -154,16 +140,5 @@ int ipath_enable_wc(struct ipath_devdata *dd)
*/
void ipath_disable_wc(struct ipath_devdata *dd)
{
- if (dd->ipath_wc_cookie) {
- int r;
- ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n");
- r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base,
- dd->ipath_wc_len);
- if (r < 0)
- dev_info(&dd->pcidev->dev,
- "mtrr_del(%lx, %lx, %lx) failed: %d\n",
- dd->ipath_wc_cookie, dd->ipath_wc_base,
- dd->ipath_wc_len, r);
- dd->ipath_wc_cookie = 0; /* even on failure */
- }
+ arch_phys_wc_del(dd->wc_cookie);
}
diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
index 9ff1230..369598c 100644
--- a/drivers/media/pci/ivtv/ivtvfb.c
+++ b/drivers/media/pci/ivtv/ivtvfb.c
@@ -44,8 +44,8 @@
#include <linux/ivtvfb.h>
#include <linux/slab.h>

-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
+#ifdef CONFIG_X86_64
+#include <asm/pat.h>
#endif

#include "ivtv-driver.h"
@@ -155,12 +155,11 @@ struct osd_info {
/* Buffer size */
u32 video_buffer_size;

-#ifdef CONFIG_MTRR
/* video_base rounded down as required by hardware MTRRs */
unsigned long fb_start_aligned_physaddr;
/* video_base rounded up as required by hardware MTRRs */
unsigned long fb_end_aligned_physaddr;
-#endif
+ int wc_cookie;

/* Store the buffer offset */
int set_osd_coords_x;
@@ -1099,6 +1098,8 @@ static int ivtvfb_init_vidmode(struct ivtv *itv)
static int ivtvfb_init_io(struct ivtv *itv)
{
struct osd_info *oi = itv->osd_info;
+ /* Find the largest power of two that maps the whole buffer */
+ int size_shift = 31;

mutex_lock(&itv->serialize_lock);
if (ivtv_init_on_first_open(itv)) {
@@ -1120,6 +1121,7 @@ static int ivtvfb_init_io(struct ivtv *itv)
oi->video_buffer_size = 1704960;

oi->video_pbase = itv->base_addr + IVTV_DECODER_OFFSET + oi->video_rbase;
+ /* XXX: split this for PAT */
oi->video_vbase = itv->dec_mem + oi->video_rbase;

if (!oi->video_vbase) {
@@ -1132,29 +1134,16 @@ static int ivtvfb_init_io(struct ivtv *itv)
oi->video_pbase, oi->video_vbase,
oi->video_buffer_size / 1024);

-#ifdef CONFIG_MTRR
- {
- /* Find the largest power of two that maps the whole buffer */
- int size_shift = 31;
-
- while (!(oi->video_buffer_size & (1 << size_shift))) {
- size_shift--;
- }
- size_shift++;
- oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
- oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
- oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
- oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
- if (mtrr_add(oi->fb_start_aligned_physaddr,
- oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr,
- MTRR_TYPE_WRCOMB, 1) < 0) {
- IVTVFB_INFO("disabled mttr\n");
- oi->fb_start_aligned_physaddr = 0;
- oi->fb_end_aligned_physaddr = 0;
- }
- }
-#endif
-
+ while (!(oi->video_buffer_size & (1 << size_shift)))
+ size_shift--;
+ size_shift++;
+ oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
+ oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
+ oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
+ oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
+ oi->wc_cookie = arch_phys_wc_add(oi->fb_start_aligned_physaddr,
+ oi->fb_end_aligned_physaddr -
+ oi->fb_start_aligned_physaddr);
/* Blank the entire osd. */
memset_io(oi->video_vbase, 0, oi->video_buffer_size);

@@ -1172,14 +1161,8 @@ static void ivtvfb_release_buffers (struct ivtv *itv)

/* Release pseudo palette */
kfree(oi->ivtvfb_info.pseudo_palette);
-
-#ifdef CONFIG_MTRR
- if (oi->fb_end_aligned_physaddr) {
- mtrr_del(-1, oi->fb_start_aligned_physaddr,
- oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr);
- }
-#endif
-
+ iounmap(oi->video_vbase);
+ arch_phys_wc_del(oi->wc_cookie);
kfree(oi);
itv->osd_info = NULL;
}
@@ -1190,6 +1173,13 @@ static int ivtvfb_init_card(struct ivtv *itv)
{
int rc;

+#ifdef CONFIG_X86_64
+ if (WARN(pat_enabled,
+ "ivtv needs PAT disabled, boot with nopat kernel parameter\n")) {
+ return EINVAL;
+ }
+#endif
+
if (itv->osd_info) {
IVTVFB_ERR("Card %d already initialised\n", ivtvfb_card_id);
return -EBUSY;

2015-04-22 16:53:29

by Andy Lutomirski

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 8:23 AM, Luis R. Rodriguez <[email protected]> wrote:
> On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
>> On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
>> > > Mike, do you think the time is right to just remove the iPath driver?
>> >
>> > With PAT now being default the driver effectively won't work
>> > with write-combining on modern kernels. Even if systems are old
>> > they likely had PAT support, when upgrading kernels PAT will work
>> > but write-combing won't on ipath.
>>
>> Sorry, do you mean the driver already doesn't get WC? Or do you mean
>> after some more pending patches are applied?
>
> No, you have to consider the system used and the effects of calls used
> on the driver in light of this table:
>
> ----------------------------------------------------------------------
> MTRR Non-PAT PAT Linux ioremap value Effective memory type
> ----------------------------------------------------------------------
> Non-PAT | PAT
> PAT
> |PCD
> ||PWT
> |||
> WC 000 WB _PAGE_CACHE_MODE_WB WC | WC
> WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC
> WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC
> WC 011 UC _PAGE_CACHE_MODE_UC UC | UC
> ----------------------------------------------------------------------
>
> (*) denotes implementation defined and is discouraged
>
> ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
> in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
> the default. When that flip occurs it will mean ipath cannot get
> write-combining on both non-PAT and PAT systems. Now that is for
> the future, lets review the current situation for ipath.
>
> For PAT capable systems if mtrr_add() is used today on a Linux system on a
> region mapped with ioremap_nocache() that will mean you effectively nullify the
> mtrr_add() effect as the combinatorial effect above yields an effective memory
> type of UC.

Are you sure? I thought that ioremap_nocache currently is UC-, so
mtrr_add + ioremap_nocache gets WC even on PAT systems.

Going forward, when mtrr_add is gone, this will change, of course.

--Andy

2015-04-22 17:07:41

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 09:53:03AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 22, 2015 at 8:23 AM, Luis R. Rodriguez <[email protected]> wrote:
> > On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> >> On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> >> > > Mike, do you think the time is right to just remove the iPath driver?
> >> >
> >> > With PAT now being default the driver effectively won't work
> >> > with write-combining on modern kernels. Even if systems are old
> >> > they likely had PAT support, when upgrading kernels PAT will work
> >> > but write-combing won't on ipath.
> >>
> >> Sorry, do you mean the driver already doesn't get WC? Or do you mean
> >> after some more pending patches are applied?
> >
> > No, you have to consider the system used and the effects of calls used
> > on the driver in light of this table:
> >
> > ----------------------------------------------------------------------
> > MTRR Non-PAT PAT Linux ioremap value Effective memory type
> > ----------------------------------------------------------------------
> > Non-PAT | PAT
> > PAT
> > |PCD
> > ||PWT
> > |||
> > WC 000 WB _PAGE_CACHE_MODE_WB WC | WC
> > WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC
> > WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC
> > WC 011 UC _PAGE_CACHE_MODE_UC UC | UC
> > ----------------------------------------------------------------------
> >
> > (*) denotes implementation defined and is discouraged
> >
> > ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
> > in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
> > the default. When that flip occurs it will mean ipath cannot get
> > write-combining on both non-PAT and PAT systems. Now that is for
> > the future, lets review the current situation for ipath.
> >
> > For PAT capable systems if mtrr_add() is used today on a Linux system on a
> > region mapped with ioremap_nocache() that will mean you effectively nullify the
> > mtrr_add() effect as the combinatorial effect above yields an effective memory
> > type of UC.
>
> Are you sure?

Well lets double check.

> I thought that ioremap_nocache currently is UC-,

It is.

> so mtrr_add + ioremap_nocache gets WC even on PAT systems.

https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

As per Intel SDM "11.5.2.2 Selecting Memory Types for Pentium
III and More Recent Processor Families" the ffect of a WC MTRR
for a region with a PAT entry value of UC will be UC. The effect
of a WC MTRR on a region with a PAT entry UC- will be WC. The
effect of a WC MTRR on a regoin with PAT entry WC is WC.

And indeed as per table 11-7 mtrr WC on PAT UC- yields WC. So ineed the above
table needs adjustment for this. So for PAT systems write-combing would be
effective with mtrr_add(), but once strong UC (_PAGE_CACHE_MODE_UC) is used by
default for ioremap_nocache() what I mentioned will be true. Furhtermore if we
switch the drivers to use arch_phys_wc_add() then for sure write-combining will
also not be effective.

Jason, Andy, is the change still a reasonable compromise? We'd just be asking
users to boot with noat for users for ipath, ivtv until the drivers gets proper
PAT support with a split.

There are two motivations for this:

* help move to strong UC as default
* bury MTRR

> Going forward, when mtrr_add is gone, this will change, of course.

Indeed.

Luis

2015-04-22 18:54:25

by Doug Ledford

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, 2015-04-22 at 10:17 -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> > On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > > Mike, do you think the time is right to just remove the iPath driver?
> > > >
> > > > With PAT now being default the driver effectively won't work
> > > > with write-combining on modern kernels. Even if systems are old
> > > > they likely had PAT support, when upgrading kernels PAT will work
> > > > but write-combing won't on ipath.
> > >
> > > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > > after some more pending patches are applied?
> >
> > No, you have to consider the system used and the effects of calls used
> > on the driver in light of this table:
>
> So, just to be clear:
>
> At some point Linux started setting the PAT bits during
> ioremap_nocache, which overrides MTRR, and at that point the driver
> became broken on all PAT capable systems?
>
> Not only that, but we've only just noticed it now, and no user ever
> complained?
>
> So that means either no users exist, or all users are on non-PAT
> systems?
>
> This driver only works on x86-64 systems. Are there any x86-64 systems
> that are not PAT capable? IIRC even the first Opteron had PAT, but my
> memory is fuzzy from back then :|
>
> > Another option in order to enable this type of checks at run time
> > and still be able to build the driver on standard distributions and
> > just prevent if from loading on PAT systems is to have some code in
> > place which would prevent the driver from loading if PAT was
> > enabled, this would enable folks to disable PAT via a kernel command
> > line option, and if that was used then the driver probe would
> > complete.
>
> This seems like a reasonble option to me. At the very least we might
> learn if anyone is still using these cards.
>
> I'd also love to remove the driver if it turns out there are actually
> no users. qib substantially replaces it except for a few very old
> cards.

To be precise, the split is that ipath powers the old HTX bus cards that
only work in AMD systems, qib is all PCI-e cards. I still have a few
HTX cards, but I no longer have any systems with HTX slots, so we
haven't even used this driver in testing for 3 or 4 years now. And
these are all old SDR cards, where the performance numbers were 800MB/s
with WC enabled, 50MB/s without it.

> Mike?
>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD



Attachments:
signature.asc (819.00 B)
This is a digitally signed message part

2015-04-22 19:05:28

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 02:53:11PM -0400, Doug Ledford wrote:
> On Wed, 2015-04-22 at 10:17 -0600, Jason Gunthorpe wrote:
> > On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> > > On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > > > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > > > Mike, do you think the time is right to just remove the iPath driver?
> > > > >
> > > > > With PAT now being default the driver effectively won't work
> > > > > with write-combining on modern kernels. Even if systems are old
> > > > > they likely had PAT support, when upgrading kernels PAT will work
> > > > > but write-combing won't on ipath.
> > > >
> > > > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > > > after some more pending patches are applied?
> > >
> > > No, you have to consider the system used and the effects of calls used
> > > on the driver in light of this table:
> >
> > So, just to be clear:
> >
> > At some point Linux started setting the PAT bits during
> > ioremap_nocache, which overrides MTRR, and at that point the driver
> > became broken on all PAT capable systems?
> >
> > Not only that, but we've only just noticed it now, and no user ever
> > complained?
> >
> > So that means either no users exist, or all users are on non-PAT
> > systems?
> >
> > This driver only works on x86-64 systems. Are there any x86-64 systems
> > that are not PAT capable? IIRC even the first Opteron had PAT, but my
> > memory is fuzzy from back then :|
> >
> > > Another option in order to enable this type of checks at run time
> > > and still be able to build the driver on standard distributions and
> > > just prevent if from loading on PAT systems is to have some code in
> > > place which would prevent the driver from loading if PAT was
> > > enabled, this would enable folks to disable PAT via a kernel command
> > > line option, and if that was used then the driver probe would
> > > complete.
> >
> > This seems like a reasonble option to me. At the very least we might
> > learn if anyone is still using these cards.
> >
> > I'd also love to remove the driver if it turns out there are actually
> > no users. qib substantially replaces it except for a few very old
> > cards.
>
> To be precise, the split is that ipath powers the old HTX bus cards that
> only work in AMD systems,

Do those systems have PAT support? CAn anyone check if PAT is enabled
if booted on a recent kernel?


Luis

2015-04-22 19:10:54

by Doug Ledford

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, 2015-04-22 at 21:05 +0200, Luis R. Rodriguez wrote:

> > > I'd also love to remove the driver if it turns out there are actually
> > > no users. qib substantially replaces it except for a few very old
> > > cards.
> >
> > To be precise, the split is that ipath powers the old HTX bus cards that
> > only work in AMD systems,
>
> Do those systems have PAT support? CAn anyone check if PAT is enabled
> if booted on a recent kernel?

I don't have one of these systems any more. The *only* one I ever had
was a monster IBM box...I can't even find a reference to it any more.

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD



Attachments:
signature.asc (819.00 B)
This is a digitally signed message part

2015-04-22 19:14:30

by Luis Chamberlain

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 12:10 PM, Doug Ledford <[email protected]> wrote:
> On Wed, 2015-04-22 at 21:05 +0200, Luis R. Rodriguez wrote:
>
>> > > I'd also love to remove the driver if it turns out there are actually
>> > > no users. qib substantially replaces it except for a few very old
>> > > cards.
>> >
>> > To be precise, the split is that ipath powers the old HTX bus cards that
>> > only work in AMD systems,
>>
>> Do those systems have PAT support? CAn anyone check if PAT is enabled
>> if booted on a recent kernel?
>
> I don't have one of these systems any more. The *only* one I ever had
> was a monster IBM box...I can't even find a reference to it any more.

Um, yeah if its so rare then I think the compromise proposed might
make sense, specially since folks were even *considering* seriously
removing this device driver. I'll send some patches to propose the
strategy explained to require booting with pat disabled.

Luis

2015-04-22 20:47:32

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, Apr 22, 2015 at 02:53:11PM -0400, Doug Ledford wrote:

> To be precise, the split is that ipath powers the old HTX bus cards that
> only work in AMD systems, qib is all PCI-e cards. I still have a few
> HTX cards, but I no longer have any systems with HTX slots, so we
> haven't even used this driver in testing for 3 or 4 years now. And
> these are all old SDR cards, where the performance numbers were 800MB/s
> with WC enabled, 50MB/s without it.

Wow, I doubt any HTX systems are still in any kind of use.

It would be a nice clean up to drop the PPC support out of this driver
too. PPC never had HTX.

Jason

2015-04-22 20:59:13

by Doug Ledford

[permalink] [raw]
Subject: Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

On Wed, 2015-04-22 at 14:46 -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 02:53:11PM -0400, Doug Ledford wrote:
>
> > To be precise, the split is that ipath powers the old HTX bus cards that
> > only work in AMD systems, qib is all PCI-e cards. I still have a few
> > HTX cards, but I no longer have any systems with HTX slots, so we
> > haven't even used this driver in testing for 3 or 4 years now. And
> > these are all old SDR cards, where the performance numbers were 800MB/s
> > with WC enabled, 50MB/s without it.
>
> Wow, I doubt any HTX systems are still in any kind of use.
>
> It would be a nice clean up to drop the PPC support out of this driver
> too. PPC never had HTX.

commit f6d60848baf9f4015c76c665791875ed623cd5b7
Author: Ralph Campbell <[email protected]>
Date: Thu May 6 17:03:19 2010 -0700

IB/ipath: Remove support for QLogic PCIe QLE devices

The ib_qib driver is taking over support for QLogic PCIe QLE
devices,
so remove support for them from ib_ipath. The ib_ipath driver now
supports only the obsolete QLogic Hyper-Transport IB host channel
adapter (model QHT7140).

Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

There you go. It's been HTX only since 2010, and those cards were
already old then. I think we should seriously consider deprecating and
then removing the driver.


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD



Attachments:
signature.asc (819.00 B)
This is a digitally signed message part