2009-04-28 15:05:54

by Joerg Roedel

Subject: IOMMU and graphics cards

Hi David,

as I have seen, the VT-d code implements a workaround for broken graphics
card drivers. What it does, when enabled, is give each graphics device
direct access to all physical memory. I really don't like to implement
this but a similar workaround for the AMD IOMMU seems to be necessary.
The biggest problem here is that this kind of workaround disables device
isolation for graphics cards. Device isolation is the main reason for
using an IOMMU in an unvirtualized environment. So if this workaround is
enabled it is as good as disabling the IOMMU altogether.
That's why I think a kernel compile option for this workaround is not
sufficient. Distributors will probably enable this in their kernels,
which also disables device isolation even if the user doesn't want to use
these broken drivers.
I think we should change that and provide a better way which allows
enabling this workaround only if it is required (and it should be
transparent to the user which IOMMU is built into the system).
We have several options to do this:

* Implement a kernel command line option to enable/disable the
  workaround (what should be the default?)
* Use the IOMMU-API and write a kernel module which creates a
  direct mapped protection domain and assigns the graphics
  cards to it (needs to be done carefully to not break graphics
  drivers which do everything right and use the DMA-API)
* Any other great idea?

So what do you (and all the others reading this :-) think? I would
prefer the way of implementing a module but there may also be reasons
against this. I would like this to be discussed before I implement the
workaround for the AMD IOMMU.

Thanks,

Joerg

--
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    |
 Research  | Geschäftsführer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
 Center    | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
           | Registergericht München, HRB Nr. 43632


2009-04-28 15:29:20

by David Woodhouse

Subject: Re: IOMMU and graphics cards

On Tue, 2009-04-28 at 17:05 +0200, Joerg Roedel wrote:
> Hi David,
>
> as I have seen, the VT-d code implements a workaround for broken graphics
> card drivers. What it does, when enabled, is give each graphics device
> direct access to all physical memory. I really don't like to implement
> this but a similar workaround for the AMD IOMMU seems to be necessary.
> The biggest problem here is that this kind of workaround disables device
> isolation for graphics cards. Device isolation is the main reason for
> using an IOMMU in an unvirtualized environment. So if this workaround is
> enabled it is as good as disabling the IOMMU altogether.

For that device, yes. The IOMMU still catches errors from _other_
devices, of course.

But I agree that it's a crap thing to have to do, and there are probably
a number of ways we could make it slightly less crap. For a start, we
could at least refrain from mapping the kernel text -- the card has no
business DMAing there, whatever happens.

> That's why I think a kernel compile option for this workaround is not
> sufficient. Distributors will probably enable this in their kernels,
> which also disables device isolation even if the user doesn't want to use
> these broken drivers.

Yes, that makes a lot of sense.

Is the reason you're doing this actually because of broken drivers? Or
just because of the performance implications? I've heard both cited as
reasons for the 1:1 mapping... and if it's mostly the former, then
perhaps the best way for me to help you is to stop enabling the option
in Fedora (rawhide, at least), so that the buggy drivers get _fixed_ and
you don't get stuck with "it works on Intel, why can't you make it
work?" reports?

> I think we should change that and provide a better way which allows to
> enable this workaround only if it is required (and it should be
> transparent to the user which IOMMU is built into the system).
> We have several options to do this:
>
> * Implement a kernel command line option to enable/disable the
> workaround (what should be default?)
> * Use the IOMMU-API and write a kernel module which creates a
> direct mapped protection domain and assigns the graphics
> cards to it (need to be done carefully to not break graphics
> drivers which do everything right and use the DMA-API)
> * Any other great idea?
>
> So what do you (and all the others reading this :-) think? I would
> prefer the way of implementing a module but there may also be reasons
> against this. I would like this to be discussed before I implement the
> workaround for the AMD IOMMU.

I think I also prefer your second option. I don't like having this as a
hack in the IOMMU code.

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-04-28 16:05:29

by Joerg Roedel

Subject: Re: IOMMU and graphics cards

On Tue, Apr 28, 2009 at 04:28:50PM +0100, David Woodhouse wrote:
> On Tue, 2009-04-28 at 17:05 +0200, Joerg Roedel wrote:
>
> For that device, yes. The IOMMU still catches errors from _other_
> devices, of course.

Yes, it is in effect for other devices. But since it's a security feature
it only makes sense if it covers all devices.

> Is the reason you're doing this actually because of broken drivers? Or
> just because of the performance implications? I've heard both cited as
> reasons for the 1:1 mapping... and if it's mostly the former, then
> perhaps the best way for me to help you is to stop enabling the option
> in Fedora (rawhide, at least), so that the buggy drivers get _fixed_ and
> you don't get stuck with "it works on Intel, why can't you make it
> work?" reports?

Currently some graphics drivers don't work with IOMMU enabled because
they don't use the DMA-API. They silently assume that device address ==
physical address.
I don't know how it is solved for VT-d but for AMD IOMMU the available
DMA memory address space is limited per device. This is a problem for
graphics card drivers because, as developers told me, they may need
gigabytes of DMA memory. I am currently thinking about ways to fix that
without enlarging the DMA address space for each device (which would be
a huge waste of memory). But until this problem is solved the
drivers won't be fixed, I guess.
I guess the DRM code in the kernel may have the same problem with IOMMU
enabled?
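
To illustrate why "device address == physical address" stops working once
an IOMMU translates DMA, here is a toy user-space model in Python. It is only
a sketch: the ToyIommu class, its method names and the addresses are made up
for illustration and are not kernel code or any real API.

```python
# Toy model only: a dict stands in for the IOMMU's per-device page table.
# All names and addresses here are hypothetical.
PAGE_SIZE = 4096


class ToyIommu:
    """Translates device (I/O virtual) addresses to physical addresses."""

    def __init__(self, iova_base=0x10_0000):
        self.next_iova = iova_base   # next free device address
        self.table = {}              # device page number -> physical page number

    def dma_map(self, phys_addr):
        """Map one physical page; return the address the device must use."""
        iova = self.next_iova
        self.next_iova += PAGE_SIZE
        self.table[iova // PAGE_SIZE] = phys_addr // PAGE_SIZE
        return iova + phys_addr % PAGE_SIZE

    def device_access(self, dev_addr):
        """Translate a device access; unmapped addresses fault."""
        page = dev_addr // PAGE_SIZE
        if page not in self.table:
            raise PermissionError(f"IOMMU fault at device address {dev_addr:#x}")
        return self.table[page] * PAGE_SIZE + dev_addr % PAGE_SIZE


iommu = ToyIommu()
buffer_phys = 0x7654_3000 + 0x10
dev_addr = iommu.dma_map(buffer_phys)

# Correct driver: programs the device with the address returned by the mapping.
assert iommu.device_access(dev_addr) == buffer_phys

# Broken driver: programs the device with the physical address directly.
try:
    iommu.device_access(buffer_phys)
    broken_driver_faulted = False
except PermissionError:
    broken_driver_faulted = True
```

In this model the broken driver faults because the physical address was
never entered into the device's table; a direct-mapped domain hides the bug
by making both addresses equal.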

> > * Implement a kernel command line option to enable/disable the
> > workaround (what should be default?)
> > * Use the IOMMU-API and write a kernel module which creates a
> > direct mapped protection domain and assigns the graphics
> > cards to it (need to be done carefully to not break graphics
> > drivers which do everything right and use the DMA-API)
> > * Any other great idea?
>
> I think I also prefer your second option. I don't like having this as a
> hack in the IOMMU code.

Ok, so I will implement such a module and send the code around for
discussion.

Joerg


2009-07-06 12:26:31

by David Woodhouse

Subject: Re: IOMMU and graphics cards

On Fri, 2009-05-08 at 11:35 +0200, Joerg Roedel wrote:
> The other way to achieve consistency is to remove the workaround from
> VT-d code which significantly increases the chance to get the broken
> stuff fixed. David?

It's gone from 2.6.31-rc2, but even the in-tree drivers break, so I
think we have to put it back for 2.6.31 (I've sent Linus the pull
request already).

I do intend to remove it again for 2.6.32 though, and I filed
http://bugzilla.kernel.org/show_bug.cgi?id=13721

For now it's got a clearer description ("Workaround broken graphics
drivers") and defaults to 'n'.

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-07-06 13:12:23

by Joerg Roedel

Subject: Re: IOMMU and graphics cards

On Mon, Jul 06, 2009 at 01:26:17PM +0100, David Woodhouse wrote:
> On Fri, 2009-05-08 at 11:35 +0200, Joerg Roedel wrote:
> > The other way to achieve consistency is to remove the workaround from
> > VT-d code which significantly increases the chance to get the broken
> > stuff fixed. David?
>
> It's gone from 2.6.31-rc2, but even the in-tree drivers break, so I
> think we have to put it back for 2.6.31 (I've sent Linus the pull
> request already).
>
> I do intend to remove it again for 2.6.32 though, and I filed
> http://bugzilla.kernel.org/show_bug.cgi?id=13721
>
> For now it's got a clearer description ("Workaround broken graphics
> drivers") and defaults to 'n'.

Ok, cool, that sounds good. Which in-kernel DRM drivers break with IOMMU
for you? I'll probably add a similar temporary workaround for AMD
IOMMU too...

Joerg

--
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    |
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center    | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
           | Registergericht München, HRB Nr. 43632

2009-07-06 14:19:23

by David Woodhouse

Subject: Re: IOMMU and graphics cards

On Mon, 2009-07-06 at 15:11 +0200, Joerg Roedel wrote:
> Ok, cool, that sounds good. Which in-kernel DRM drivers break with IOMMU
> for you? I'll probably add a similar temporary workaround for AMD
> IOMMU too...

The Intel one definitely broke -- I don't know about the others. There
are some old patches at http://people.freedesktop.org/~zhen/agp-mm-*
which make it look like _all_ AGP drivers are broken.

I wouldn't bother adding the workaround -- as I said, I'm planning to
rip it out of 2.6.32 (and in linux-next as soon as it's reasonable to do
so). Let's just let them fix it.

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-07-06 21:36:12

by Dave Airlie

Subject: Re: IOMMU and graphics cards



On Mon, 6 Jul 2009, David Woodhouse wrote:

> On Mon, 2009-07-06 at 15:11 +0200, Joerg Roedel wrote:
> > Ok, cool, that sounds good. Which in-kernel DRM drivers break with IOMMU
> > for you? I'll probably add a similar temporary workaround for AMD
> > IOMMU too...
>
> The Intel one definitely broke -- I don't know about the others. There
> are some old patches at http://people.freedesktop.org/~zhen/agp-mm-*
> which make it look like _all_ AGP drivers are broken.
>
> I wouldn't bother adding the workaround -- as I said, I'm planning to
> rip it out of 2.6.32 (and in linux-next as soon as it's reasonable to do
> so). Let's just let them fix it.

cc'ing Eric,

My memory of this is graphics becomes totally useless and can be 10x-50x
slower. I think ripping this out without the person doing the ripping
taking responsibility for doing speed regression testing is totally insane.

I personally have no IOMMU hw from Intel or AMD and nobody has seen it fit
to supply me with any at any point in time, I'm not on the correct gravy
train. So I suspect the people with the hw will have to do the work and
the regression testing.

Dave.

2009-07-06 22:05:34

by Dave Airlie

Subject: Re: IOMMU and graphics cards


On 07/07/2009, at 7:35, Dave Airlie <[email protected]> wrote:

>
>
> On Mon, 6 Jul 2009, David Woodhouse wrote:
>
>> On Mon, 2009-07-06 at 15:11 +0200, Joerg Roedel wrote:
>>> Ok, cool, that sounds good. Which in-kernel DRM drivers break with IOMMU
>>> for you? I'll probably add a similar temporary workaround for AMD
>>> IOMMU too...
>>
>> The Intel one definitely broke -- I don't know about the others. There
>> are some old patches at http://people.freedesktop.org/~zhen/agp-mm-*
>> which make it look like _all_ AGP drivers are broken.
>>
>> I wouldn't bother adding the workaround -- as I said, I'm planning to
>> rip it out of 2.6.32 (and in linux-next as soon as it's reasonable to do
>> so). Let's just let them fix it.
>
> cc'ing Eric,
>
> My memory of this is graphics becomes totally useless and can be 10x-50x
> slower. I think ripping this out without the person doing the ripping
> taking responsibility for doing speed regression testing is totally insane.
>
> I personally have no IOMMU hw from Intel or AMD and nobody has seen it fit
> to supply me with any at any point in time, I'm not on the correct gravy
> train. So I suspect the people with the hw will have to do the work and
> the regression testing.

Could you also enumerate any limitations of the IOMMUs on the amount
of memory they can remap per device if any.

Dave

> Dave.

2009-07-07 08:54:23

by Joerg Roedel

Subject: Re: IOMMU and graphics cards

On Tue, Jul 07, 2009 at 08:00:01AM +1000, Dave Airlie wrote:
>
> Could you also enumerate any limitations of the IOMMUs on the amount
> of memory they can remap per device if any.

The AMD IOMMU driver in 2.6.31-rcX can map up to 4GB per domain. If the
gfx card has a domain of its own it can use all of the 4GB.
I am not 100% sure about VT-d, but as far as I understood the allocator
it doesn't have that 4GB limitation per domain.

Joerg


2009-07-07 08:59:19

by Joerg Roedel

Subject: Re: IOMMU and graphics cards

On Mon, Jul 06, 2009 at 10:35:59PM +0100, Dave Airlie wrote:
>
>
> On Mon, 6 Jul 2009, David Woodhouse wrote:
>
> > On Mon, 2009-07-06 at 15:11 +0200, Joerg Roedel wrote:
> > > Ok, cool, that sounds good. Which in-kernel DRM drivers break with IOMMU
> > > for you? I'll probably add a similar temporary workaround for AMD
> > > IOMMU too...
> >
> > The Intel one definitely broke -- I don't know about the others. There
> > are some old patches at http://people.freedesktop.org/~zhen/agp-mm-*
> > which make it look like _all_ AGP drivers are broken.
> >
> > I wouldn't bother adding the workaround -- as I said, I'm planning to
> > rip it out of 2.6.32 (and in linux-next as soon as it's reasonable to do
> > so). Let's just let them fix it.
>
> cc'ing Eric,
>
> My memory of this is graphics becomes totally useless and can be 10x-50x
> slower. I think ripping this out without the person doing the ripping
> taking responsibility for doing speed regression testing is totally insane.

Are you sure that using the dma-api has such a performance impact? I've
heard from other people that switching to dma-api with amd iommu had no
significant performance impact.

> I personally have no IOMMU hw from Intel or AMD and nobody has seen it fit
> to supply me with any at any point in time, I'm not on the correct gravy
> train. So I suspect the people with the hw will have to do the work and
> the regression testing.

You just need to switch to dma-api. You don't need an iommu to test.
Most bugs in such code can also be found and eliminated using the
dma-api debugging interface.

Joerg


2009-07-07 09:05:33

by David Woodhouse

Subject: Re: IOMMU and graphics cards

On Mon, 2009-07-06 at 22:35 +0100, Dave Airlie wrote:
>
> On Mon, 6 Jul 2009, David Woodhouse wrote:
>
> > On Mon, 2009-07-06 at 15:11 +0200, Joerg Roedel wrote:
> > > Ok, cool, that sounds good. Which in-kernel DRM drivers break with IOMMU
> > > for you? I'll probably add a similar temporary workaround for AMD
> > > IOMMU too...
> >
> > The Intel one definitely broke -- I don't know about the others. There
> > are some old patches at http://people.freedesktop.org/~zhen/agp-mm-*
> > which make it look like _all_ AGP drivers are broken.
> >
> > I wouldn't bother adding the workaround -- as I said, I'm planning to
> > rip it out of 2.6.32 (and in linux-next as soon as it's reasonable to do
> > so). Let's just let them fix it.
>
> cc'ing Eric,
>
> My memory of this is graphics becomes totally useless and can be 10x-50x
> slower. I think ripping this out without the person doing the ripping
> taking responsibility for doing speed regression testing is totally insane.

If the IOMMU is absent, disabled or in pass-through mode, then the
performance impact should be virtually zero.

It's the case where the IOMMU is in use that matters -- and the choice
there is between "slow" and "broken". I'll take "slow", please.

Having said that, I've done a bunch of performance work recently and
especially multi-page mappings are a _lot_ faster than they used to be.
I'll see if I can do more once I can test it with your actual usage
patterns.

We currently have an evil hack which uses a shitload of memory to set up
a full set of page tables to map all of physical memory, for every
graphics device at boot time whether we need it or not. The AMD folks
have so far resisted doing that, partly because of the amount of RAM it
would take. I'm getting poked from all sides to _remove_ our hack.
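
As a rough sense of scale for that memory cost, the following Python sketch
estimates the page-table footprint of a 1:1 mapping per device. The
assumptions are mine, not measurements from the driver: 4 KiB pages only (no
large-page mappings), 8-byte entries, 512 entries per table, four levels,
and contiguous RAM starting at address 0.

```python
PAGE_SIZE = 4096                              # assumed 4 KiB pages, no large pages
ENTRY_SIZE = 8                                # assumed 8-byte page-table entries
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 512 entries per table page


def identity_map_table_bytes(ram_bytes, levels=4):
    """Bytes of page tables needed to map ram_bytes 1:1 for one device."""
    units = -(-ram_bytes // PAGE_SIZE)            # pages to map (ceiling division)
    total_tables = 0
    for _ in range(levels):
        units = -(-units // ENTRIES_PER_TABLE)    # tables needed at this level
        total_tables += units
    return total_tables * PAGE_SIZE


GIB = 1 << 30
for ram_gib in (4, 16, 64):
    cost = identity_map_table_bytes(ram_gib * GIB)
    print(f"{ram_gib:3d} GiB RAM -> {cost / (1 << 20):.1f} MiB of page tables per device")
```

Under these assumptions the leaf tables dominate, so the cost is roughly
1/512 of the mapped memory for every device that gets its own set of tables.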

> I personally have no IOMMU hw from Intel or AMD and nobody has seen it fit
> to supply me with any at any point in time, I'm not on the correct gravy
> train. So I suspect the people with the hw will have to do the work and
> the regression testing.

Anything with Cantiga chipset would suffice. Lenovo x200s, T400, etc...

> Could you also enumerate any limitations of the IOMMUs on the amount
> of memory they can remap per device if any.

The Intel one basically has no relevant limit -- it's 54 bits of virtual
address space or something like that.

I think I saw something about the AMD one perhaps being limited to 4GiB
per device -- which makes 1:1 mapping of all memory a lot less feasible,
but I'll let Jörg answer that question definitively.

Not sure about other platforms off-hand.

--
dwmw2

2009-07-07 09:16:41

by David Woodhouse

Subject: Re: IOMMU and graphics cards

On Tue, 2009-07-07 at 10:59 +0200, Joerg Roedel wrote:
> On Mon, Jul 06, 2009 at 10:35:59PM +0100, Dave Airlie wrote:
> > My memory of this is graphics becomes totally useless and can be 10x-50x
> > slower. I think ripping this out without the person doing the ripping
> > taking responsibility for doing speed regression testing is totally insane.
>
> Are you sure that using the dma-api has such a performance impact? I've
> heard from other people that switching to dma-api with amd iommu had no
> significant performance impact.

The DMA API on its own shouldn't hurt at all, and I'm not sure about the
AMD IOMMU.

It's the other IOMMU that hurts -- something about cache-incoherent page
tables and having to flush the cache for every change... especially on
an architecture which doesn't have any sane cache management (because it
was always designed for everything to be cache coherent, dammit!) so you
end up _invalidating_ the cache at the same time as you write it back!

Of course, it's a lot saner now that we don't individually flush the
cache after _every_ PTE we change, but do it in batches instead.
Multi-page mapping/unmapping is now a _lot_ faster than it used to be.
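
A back-of-the-envelope way to see why batching helps: count cache-line
flushes for a run of consecutive PTE updates. This Python sketch is only an
illustration; the 64-byte cache line, the 8-byte PTE and both function names
are assumptions, and it models flush counts, not the actual driver code.

```python
CACHE_LINE = 64   # assumed cache-line size in bytes
PTE_SIZE = 8      # assumed page-table entry size in bytes


def flushes_per_pte(first_index, count):
    """Old scheme: one cache-line flush after every PTE write."""
    return count


def flushes_batched(first_index, count):
    """Batched scheme: write all PTEs, then flush each touched line once."""
    if count == 0:
        return 0
    first_line = (first_index * PTE_SIZE) // CACHE_LINE
    last_line = ((first_index + count - 1) * PTE_SIZE) // CACHE_LINE
    return last_line - first_line + 1


# Mapping 256 consecutive pages starting at PTE index 0 of a table.
per_pte = flushes_per_pte(0, 256)
batched = flushes_batched(0, 256)
print(f"per-PTE flushes: {per_pte}, batched flushes: {batched}")
```

With these sizes eight PTEs share a cache line, so a long contiguous mapping
needs about one eighth of the flushes; single-page mappings gain nothing.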

I'm also looking at mapping the page tables uncached, so we don't have
to do the cache flushes at all.

> > I personally have no IOMMU hw from Intel or AMD and nobody has seen it fit
> > to supply me with any at any point in time, I'm not on the correct gravy
> > train. So I suspect the people with the hw will have to do the work and
> > the regression testing.
>
> You just need to switch to dma-api. You don't need an iommu to test.
> Most bugs in such code can also be found and eliminated using the
> dma-api debugging interface.

Indeed. Just turn on CONFIG_DMA_API_DEBUG (that might have some
performance impact, but it's not huge).

--
dwmw2

2009-07-07 15:23:59

by Duran, Leo

Subject: RE: IOMMU and graphics cards


On 07/07/2009, at 10:08 CST, Dave Airlie <[email protected]> wrote:
> Could you also enumerate any limitations of the IOMMUs
> on the amount of memory they can remap per device if any.

Here's the spec for the AMD IOMMU:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/
34434.pdf

Below is a cut-paste of "Table 3: Device Table Entry Field Definitions".
As an example, paging mode 4 supports a 48-bit virtual address space
(well beyond 4GB). In terms of physical address space, "Figure 7: I/O
Page Translation Entry (PTE)" shows support for up to a 52-bit page
address.

Leo.


11:9  Mode: paging mode. Specify how the IOMMU performs page translation
      on behalf of the device. If page translation is enabled, the mode
      specifies the depth of the device's I/O page tables (1 to 6 levels).

      000b  Translation disabled (Access controlled by IR and IW bits)
      001b  1 Level Page Table (provides a 21-bit device virtual address space)
      010b  2 Level Page Table (provides a 30-bit device virtual address space)
      011b  3 Level Page Table (provides a 39-bit device virtual address space)
      100b  4 Level Page Table (provides a 48-bit device virtual address space)
      101b  5 Level Page Table (provides a 57-bit device virtual address space)
      110b  6 Level Page Table (provides a 64-bit device virtual address space)
      111b  Reserved
Note: the page table root pointer is ignored when Mode=000b and when
Mode=111b.
Note: Mode=111b is reported as an error when V=1 and TV=1.
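
The widths in the table follow a simple pattern, which this small Python
sketch reproduces. The formula (12 offset bits plus 9 bits per level, capped
at 64) is inferred from the numbers quoted above rather than taken from the
spec text, and the function name is made up for illustration.

```python
PAGE_OFFSET_BITS = 12   # 4 KiB pages (assumed; consistent with the 21-bit entry)
BITS_PER_LEVEL = 9      # 512 entries per table level (assumed)
MAX_BITS = 64           # the 6-level mode is listed as 64 bits, not 66


def device_va_bits(mode):
    """Device virtual address width for paging mode 1..6 (page-table depth)."""
    if not 1 <= mode <= 6:
        raise ValueError("mode must be 1..6; 0 disables translation, 7 is reserved")
    return min(PAGE_OFFSET_BITS + BITS_PER_LEVEL * mode, MAX_BITS)


for mode in range(1, 7):
    bits = device_va_bits(mode)
    covers_4gib = bits >= 32   # 4 GiB needs 32 address bits
    print(f"mode {mode:03b}b: {bits}-bit address space, covers 4 GiB: {covers_4gib}")
```

By this reckoning any mode of 3 levels or deeper already spans more than
4 GiB of device virtual address space.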

2009-07-07 15:33:00

by Duran, Leo

Subject: RE: IOMMU and graphics cards

Here's the spec for the AMD IOMMU (with braces):
<http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs
/34434.pdf>

HTH,
Leo.

2009-07-07 15:36:24

by David Woodhouse

Subject: RE: IOMMU and graphics cards

On Tue, 2009-07-07 at 10:33 -0500, Duran, Leo wrote:
> Here's the spec for the AMD IOMMU (with braces):
> <http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs
> /34434.pdf>

It wasn't the lack of braces that was the problem -- it was the presence
of a spurious newline in the middle.

Here's a nickel, kid -- get yourself a proper mail program. One that can
cope with the 1980s invention known as WYSIWYG, perhaps?

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/34434.pdf

--
dwmw2