2008-11-18 15:43:44

by Joerg Roedel

[permalink] [raw]
Subject: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5

(sorry for resend, forgot to add the mailing lists)

Hi Ingo,


The following changes since commit 4e14e833ac3b97a4aa8803eea49f899adc5bb5f4:
Linus Torvalds (1):
Merge git://git.kernel.org/.../sfrench/cifs-2.6

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu.git iommu-fixes-2.6.28

Joerg Roedel (4):
AMD IOMMU: add parameter to disable device isolation
AMD IOMMU: enable device isolation per default
AMD IOMMU: fix fullflush comparison length
AMD IOMMU: check for next_bit also in unmapped area

Documentation/kernel-parameters.txt | 4 +++-
arch/x86/kernel/amd_iommu.c | 2 +-
arch/x86/kernel/amd_iommu_init.c | 6 ++++--
3 files changed, 8 insertions(+), 4 deletions(-)

As the most important change these patches enable device isolation per
default. Tests have shown that there are drivers which have bugs and do
double-freeing of DMA memory. This can lead to data corruption with a
hardware IOMMU when multiple devices share the same protection domain.
Therefore device isolation should be enabled by default.
The full diff of these changes is appended. Please pull.

Thanks,

Joerg

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9fa6508..f2e1e7f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -294,7 +294,9 @@ and is between 256 and 4096 characters. It is defined in the file
Possible values are:
isolate - enable device isolation (each device, as far
as possible, will get its own protection
- domain)
+ domain) [default]
+ share - put every device behind one IOMMU into the
+ same protection domain
fullflush - enable flushing of IO/TLB entries when
they are unmapped. Otherwise they are
flushed before they will be reused, which
diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c
index 331b318..e4899e0 100644
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -537,7 +537,7 @@ static void dma_ops_free_addresses(struct dma_ops_domain *dom,
address >>= PAGE_SHIFT;
iommu_area_free(dom->bitmap, address, pages);

- if (address + pages >= dom->next_bit)
+ if (address >= dom->next_bit)
dom->need_flush = true;
}

diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c
index 0cdcda3..30ae270 100644
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -121,7 +121,7 @@ u16 amd_iommu_last_bdf; /* largest PCI device id we have
LIST_HEAD(amd_iommu_unity_map); /* a list of required unity mappings
we find in ACPI */
unsigned amd_iommu_aperture_order = 26; /* size of aperture in power of 2 */
-int amd_iommu_isolate; /* if 1, device isolation is enabled */
+int amd_iommu_isolate = 1; /* if 1, device isolation is enabled */
bool amd_iommu_unmap_flush; /* if true, flush on every unmap */

LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
@@ -1213,7 +1213,9 @@ static int __init parse_amd_iommu_options(char *str)
for (; *str; ++str) {
if (strncmp(str, "isolate", 7) == 0)
amd_iommu_isolate = 1;
- if (strncmp(str, "fullflush", 11) == 0)
+ if (strncmp(str, "share", 5) == 0)
+ amd_iommu_isolate = 0;
+ if (strncmp(str, "fullflush", 9) == 0)
amd_iommu_unmap_flush = true;
}

--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


2008-11-18 15:49:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5


* Joerg Roedel <[email protected]> wrote:

> (sorry for resend, forgot to add the mailing lists)
>
> Hi Ingo,
>
>
> The following changes since commit 4e14e833ac3b97a4aa8803eea49f899adc5bb5f4:
> Linus Torvalds (1):
> Merge git://git.kernel.org/.../sfrench/cifs-2.6
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu.git iommu-fixes-2.6.28
>
> Joerg Roedel (4):
> AMD IOMMU: add parameter to disable device isolation
> AMD IOMMU: enable device isolation per default
> AMD IOMMU: fix fullflush comparison length
> AMD IOMMU: check for next_bit also in unmapped area
>
> Documentation/kernel-parameters.txt | 4 +++-
> arch/x86/kernel/amd_iommu.c | 2 +-
> arch/x86/kernel/amd_iommu_init.c | 6 ++++--
> 3 files changed, 8 insertions(+), 4 deletions(-)
>
> As the most important change these patches enable device isolation
> per default. Tests have shown that there are drivers which have bugs
> and do double-freeing of DMA memory. This can lead to data
> corruption with a hardware IOMMU when multiple devices share the
> same protection domain. Therefore device isolation should be enabled
> by default. The full diff of these changes is appended. Please pull.

pulled into tip/x86/urgent, thanks Joerg!

Ingo

2008-11-19 06:15:59

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5

On Tue, 18 Nov 2008 16:43:22 +0100
Joerg Roedel <[email protected]> wrote:

> Joerg Roedel (4):
> AMD IOMMU: add parameter to disable device isolation
> AMD IOMMU: enable device isolation per default
> AMD IOMMU: fix fullflush comparison length
> AMD IOMMU: check for next_bit also in unmapped area
>
> Documentation/kernel-parameters.txt | 4 +++-
> arch/x86/kernel/amd_iommu.c | 2 +-
> arch/x86/kernel/amd_iommu_init.c | 6 ++++--
> 3 files changed, 8 insertions(+), 4 deletions(-)
>
> As the most important change these patches enable device isolation per
> default. Tests have shown that there are drivers which have bugs and do
> double-freeing of DMA memory.

What drivers? We need to fix them if they are mainline drivers.


> This can lead to data corruption with a
> hardware IOMMU when multiple devices share the same protection domain.
> Therefore device isolation should be enabled by default.

Hmm, the change is just because of the bug workaround? If so, I'm not
sure it's a good idea. We need to fix the buggy drivers anyway. And
device isolation is not free; e.g. use more memory rather than sharing
a protection domain. I guess that more people prefer sharing a
protection domain by default. It had been the default option for AMD
IOMMU until you hit the bugs. IIRC, VT-d also shares a protection
domain by default. It would be nice to avoid surprising users if the
two virtualization IOMMUs works in the similar way.

2008-11-19 09:25:59

by Joerg Roedel

[permalink] [raw]
Subject: Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5

On Wed, Nov 19, 2008 at 03:05:24PM +0900, FUJITA Tomonori wrote:
> On Tue, 18 Nov 2008 16:43:22 +0100
> Joerg Roedel <[email protected]> wrote:
>
> > Joerg Roedel (4):
> > AMD IOMMU: add parameter to disable device isolation
> > AMD IOMMU: enable device isolation per default
> > AMD IOMMU: fix fullflush comparison length
> > AMD IOMMU: check for next_bit also in unmapped area
> >
> > Documentation/kernel-parameters.txt | 4 +++-
> > arch/x86/kernel/amd_iommu.c | 2 +-
> > arch/x86/kernel/amd_iommu_init.c | 6 ++++--
> > 3 files changed, 8 insertions(+), 4 deletions(-)
> >
> > As the most important change these patches enable device isolation per
> > default. Tests have shown that there are drivers which have bugs and do
> > double-freeing of DMA memory.
>
> What drivers? We need to fix them if they are mainline drivers.

I found issues in network drivers only for now. The two drivers where I
found issues are the in-kernel ixgbe driver (I see IO_PAGE_FAULTS
there), the ixgbe version from the Intel website has a double-free bug
when unloading the driver or changing the device mtu. The same problem
was found with the Broadcom NetXtreme II driver.

> > This can lead to data corruption with a
> > hardware IOMMU when multiple devices share the same protection domain.
> > Therefore device isolation should be enabled by default.
>
> Hmm, the change is just because of the bug workaround? If so, I'm not
> sure it's a good idea. We need to fix the buggy drivers anyway. And
> device isolation is not free; e.g. use more memory rather than sharing
> a protection domain. I guess that more people prefer sharing a
> protection domain by default. It had been the default option for AMD
> IOMMU until you hit the bugs. IIRC, VT-d also shares a protection
> domain by default. It would be nice to avoid surprising users if the
> two virtualization IOMMUs works in the similar way.

We can't test all drivers for those bugs until 2.6.28 will be released.
And these bugs can corrupt data, for example when a driver frees dma
addresses allocated by another driver and these addresses are then
reallocated.
The only way to protect the drivers from each other is to isolate them
in different protection domains. The AMD IOMMU driver prints a WARN_ON()
if a driver frees dma addresses not yet mapped. This triggered with the
bnx2 and the ixgbe driver.
And the data corruption is real, it eat the root-fs of my testbox one
time.
I agree that we need to fix the drivers. I plan to implement some debug
code which allows driver developers to detect those bugs even if they
have no IOMMU in the system.

Joerg

2008-11-19 09:36:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5


* Joerg Roedel <[email protected]> wrote:

> We can't test all drivers for those bugs until 2.6.28 will be
> released. And these bugs can corrupt data, for example when a driver
> frees dma addresses allocated by another driver and these addresses
> are then reallocated.
>
> The only way to protect the drivers from each other is to isolate
> them in different protection domains. The AMD IOMMU driver prints a
> WARN_ON() if a driver frees dma addresses not yet mapped. This
> triggered with the bnx2 and the ixgbe driver.
>
> And the data corruption is real, it eat the root-fs of my testbox
> one time.

a WARN_ON() can be acted upon much easier than silent/spurious data
corruption. So printing a WARN_ON() will result in drivers being fixed
a lot faster (and with a lot less debugging needed) than if we were
intentionally letting DMA corruption happen. The WARN_ON() will be
routed to kerneloops.org on the major distros, etc. etc.

> I agree that we need to fix the drivers. I plan to implement some
> debug code which allows driver developers to detect those bugs even
> if they have no IOMMU in the system.

That would be _really_ nice to have.

Ingo

2008-11-19 12:58:15

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5

On Wed, Nov 19, 2008 at 03:05:24PM +0900, FUJITA Tomonori wrote:
> On Tue, 18 Nov 2008 16:43:22 +0100
> Joerg Roedel <[email protected]> wrote:
>
> > Joerg Roedel (4):
> > AMD IOMMU: add parameter to disable device isolation
> > AMD IOMMU: enable device isolation per default
> > AMD IOMMU: fix fullflush comparison length
> > AMD IOMMU: check for next_bit also in unmapped area
> >
> > Documentation/kernel-parameters.txt | 4 +++-
> > arch/x86/kernel/amd_iommu.c | 2 +-
> > arch/x86/kernel/amd_iommu_init.c | 6 ++++--
> > 3 files changed, 8 insertions(+), 4 deletions(-)
> >
> > As the most important change these patches enable device isolation
> > per default. Tests have shown that there are drivers which have
> > bugs and do double-freeing of DMA memory.
>
> What drivers? We need to fix them if they are mainline drivers.
>
> > This can lead to data corruption with a hardware IOMMU when
> > multiple devices share the same protection domain. Therefore
> > device isolation should be enabled by default.
>
> Hmm, the change is just because of the bug workaround? If so, I'm not
> sure it's a good idea. We need to fix the buggy drivers anyway.

This won't work around the bug, it will just make its outcome less
severe (by restricting the fault to the offensive device only).

> device isolation is not free; e.g. use more memory rather than
> sharing a protection domain. I guess that more people prefer sharing
> a protection domain by default.

I doubt it, why use an isolation-capable IOMMU at all if not for the
increased reliability? The majority of modern devices---those that you
are likely to find on machines with an IOMMU---don't have DMA
limitations.

> It had been the default option for AMD IOMMU until you hit the
> bugs. IIRC, VT-d also shares a protection domain by default. It
> would be nice to avoid surprising users if the two virtualization
> IOMMUs works in the similar way.

Calgary has a per-bus protection domain, both on x86 and PPC.

Cheers,
Muli
--
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
<->
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/

2008-11-20 04:25:55

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5

On Wed, 19 Nov 2008 10:25:44 +0100
Joerg Roedel <[email protected]> wrote:

> On Wed, Nov 19, 2008 at 03:05:24PM +0900, FUJITA Tomonori wrote:
> > On Tue, 18 Nov 2008 16:43:22 +0100
> > Joerg Roedel <[email protected]> wrote:
> >
> > > Joerg Roedel (4):
> > > AMD IOMMU: add parameter to disable device isolation
> > > AMD IOMMU: enable device isolation per default
> > > AMD IOMMU: fix fullflush comparison length
> > > AMD IOMMU: check for next_bit also in unmapped area
> > >
> > > Documentation/kernel-parameters.txt | 4 +++-
> > > arch/x86/kernel/amd_iommu.c | 2 +-
> > > arch/x86/kernel/amd_iommu_init.c | 6 ++++--
> > > 3 files changed, 8 insertions(+), 4 deletions(-)
> > >
> > > As the most important change these patches enable device isolation per
> > > default. Tests have shown that there are drivers which have bugs and do
> > > double-freeing of DMA memory.
> >
> > What drivers? We need to fix them if they are mainline drivers.
>
> I found issues in network drivers only for now. The two drivers where I
> found issues are the in-kernel ixgbe driver (I see IO_PAGE_FAULTS
> there), the ixgbe version from the Intel website has a double-free bug
> when unloading the driver or changing the device mtu. The same problem
> was found with the Broadcom NetXtreme II driver.

I see, thanks. You already reported the bugs to netdev?


> > > This can lead to data corruption with a
> > > hardware IOMMU when multiple devices share the same protection domain.
> > > Therefore device isolation should be enabled by default.
> >
> > Hmm, the change is just because of the bug workaround? If so, I'm not
> > sure it's a good idea. We need to fix the buggy drivers anyway. And
> > device isolation is not free; e.g. use more memory rather than sharing
> > a protection domain. I guess that more people prefer sharing a
> > protection domain by default. It had been the default option for AMD
> > IOMMU until you hit the bugs. IIRC, VT-d also shares a protection
> > domain by default. It would be nice to avoid surprising users if the
> > two virtualization IOMMUs works in the similar way.
>
> We can't test all drivers for those bugs until 2.6.28 will be released.
> And these bugs can corrupt data, for example when a driver frees dma
> addresses allocated by another driver and these addresses are then
> reallocated.
> The only way to protect the drivers from each other is to isolate them
> in different protection domains. The AMD IOMMU driver prints a WARN_ON()
> if a driver frees dma addresses not yet mapped. This triggered with the
> bnx2 and the ixgbe driver.

It would be better to add such WARN_ON to VT-d. VT-d is everywhere
nowadays. I think that there are some developers who can test these
drivers with VT-d.


> And the data corruption is real, it eat the root-fs of my testbox one
> time.
> I agree that we need to fix the drivers. I plan to implement some debug
> code which allows driver developers to detect those bugs even if they
> have no IOMMU in the system.

It's not so hard to add such debug feature to swiotlb, I guess.

2008-11-20 04:26:18

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5

On Wed, 19 Nov 2008 14:57:50 +0200
Muli Ben-Yehuda <[email protected]> wrote:

> > device isolation is not free; e.g. use more memory rather than
> > sharing a protection domain. I guess that more people prefer sharing
> > a protection domain by default.
>
> I doubt it, why use an isolation-capable IOMMU at all if not for the
> increased reliability? The majority of modern devices---those that you
> are likely to find on machines with an IOMMU---don't have DMA
> limitations.

I guess that there are still some modern SATA HBAs that are not
capable of 64bit DMA. You might be right though.


> > It had been the default option for AMD IOMMU until you hit the
> > bugs. IIRC, VT-d also shares a protection domain by default. It
> > would be nice to avoid surprising users if the two virtualization
> > IOMMUs works in the similar way.
>
> Calgary has a per-bus protection domain, both on x86 and PPC.

I see. Then it might be better to change VT-d to use a separate
protection domain by default.

2008-11-20 07:52:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5


* FUJITA Tomonori <[email protected]> wrote:

> > > It had been the default option for AMD IOMMU until you hit the
> > > bugs. IIRC, VT-d also shares a protection domain by default. It
> > > would be nice to avoid surprising users if the two
> > > virtualization IOMMUs works in the similar way.
> >
> > Calgary has a per-bus protection domain, both on x86 and PPC.
>
> I see. Then it might be better to change VT-d to use a separate
> protection domain by default.

yes, agreed, and that should be the sane default for any IOMMU driver
- unless the performance impact is prohibitive.

Note that this widens the positive impact of the IOMMU code: not only
does it enable transparent support of DMA to/from devices that have a
limited DMA range, not only does it help isolation in virtualization -
it also acts as a daily debug helper for _native_ drivers.

Note that people will prefer to run with an IOMMU enabled even if all
devices support the full memory range - just due to the DMA protection
features. Just like people prefer to run an OS with paging protections
enabled ;-)

It also puts pressure on the hw design side to treat IOMMUs not just
as some fringe feature for compatibility with older transports or
virtualization, but also as a prime-time native IO feature.

Ingo

2008-11-20 11:34:43

by Joerg Roedel

[permalink] [raw]
Subject: Re: [GIT PULL] AMD IOMMU updates for 2.6.28-rc5

On Thu, Nov 20, 2008 at 01:25:15PM +0900, FUJITA Tomonori wrote:
> On Wed, 19 Nov 2008 10:25:44 +0100
> Joerg Roedel <[email protected]> wrote:
> > I found issues in network drivers only for now. The two drivers where I
> > found issues are the in-kernel ixgbe driver (I see IO_PAGE_FAULTS
> > there), the ixgbe version from the Intel website has a double-free bug
> > when unloading the driver or changing the device mtu. The same problem
> > was found with the Broadcom NetXtreme II driver.
>
> I see, thanks. You already reported the bugs to netdev?
>

Not yet, but I will do so when I can prove the bug is not in my driver
;) (== when my debug code is ready)

> > We can't test all drivers for those bugs until 2.6.28 will be released.
> > And these bugs can corrupt data, for example when a driver frees dma
> > addresses allocated by another driver and these addresses are then
> > reallocated.
> > The only way to protect the drivers from each other is to isolate them
> > in different protection domains. The AMD IOMMU driver prints a WARN_ON()
> > if a driver frees dma addresses not yet mapped. This triggered with the
> > bnx2 and the ixgbe driver.
>
> It would be better to add such WARN_ON to VT-d. VT-d is everywhere
> nowadays. I think that there are some developers who can test these
> drivers with VT-d.

Yes, I agree.

> > And the data corruption is real, it eat the root-fs of my testbox one
> > time.
> > I agree that we need to fix the drivers. I plan to implement some debug
> > code which allows driver developers to detect those bugs even if they
> > have no IOMMU in the system.
>
> It's not so hard to add such debug feature to swiotlb, I guess.
>

Yes, but I prefer to have it outside of any dma_ops implementation. This
way it can also be used to find out if a bug originated from the device
driver or the dma_ops implementation.

Joerg

--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy