2009-11-04 23:03:16

by Alex Williamson

Subject: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

intel_alloc_coherent() needs to follow DMA mapping convention and
make use of the coherent_dma_mask of the device for identity mappings.
Without this, devices may get buffers they can't use. This patch
provides best effort allocations and fails the request if the mask
requirements are not met rather than returning an unusable buffer.

Signed-off-by: Alex Williamson <[email protected]>
---

This patch fixes a regression introduced since 2.6.31 that prevents
devices with a restricted coherent_dma_mask from working in passthrough
mode.

drivers/pci/intel-iommu.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index b1e97e6..8283df9 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2582,7 +2582,7 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
BUG_ON(dir == DMA_NONE);

if (iommu_no_mapping(hwdev))
- return paddr;
+ return paddr + size > dma_mask ? 0 : paddr;

domain = get_valid_domain_for_dev(pdev);
if (!domain)
@@ -2767,7 +2767,15 @@ static void *intel_alloc_coherent(struct device *hwdev, size_t size,

size = PAGE_ALIGN(size);
order = get_order(size);
- flags &= ~(GFP_DMA | GFP_DMA32);
+
+ if (!iommu_no_mapping(hwdev))
+ flags &= ~(GFP_DMA | GFP_DMA32);
+ else if (hwdev->coherent_dma_mask != DMA_BIT_MASK(64)) {
+ if (hwdev->coherent_dma_mask < DMA_BIT_MASK(32))
+ flags |= GFP_DMA;
+ else
+ flags |= GFP_DMA32;
+ }

vaddr = (void *)__get_free_pages(flags, order);
if (!vaddr)


2009-11-06 02:42:06

by FUJITA Tomonori

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Wed, 04 Nov 2009 15:59:34 -0700
Alex Williamson <[email protected]> wrote:

> intel_alloc_coherent() needs to follow DMA mapping convention and
> make use of the coherent_dma_mask of the device for identity mappings.
> Without this, devices may get buffers they can't use. This patch
> provides best effort allocations and fails the request if the mask
> requirements are not met rather than returning an unusable buffer.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
>
> This patch fixes a regression introduced since 2.6.31 that prevents
> devices with a restricted coherent_dma_mask from working in passthrough
> mode.
>
> drivers/pci/intel-iommu.c | 12 ++++++++++--
> 1 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index b1e97e6..8283df9 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -2582,7 +2582,7 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
> BUG_ON(dir == DMA_NONE);
>
> if (iommu_no_mapping(hwdev))
> - return paddr;
> + return paddr + size > dma_mask ? 0 : paddr;

You can use dma_capable(hwdev, paddr, size) here.


> domain = get_valid_domain_for_dev(pdev);
> if (!domain)
> @@ -2767,7 +2767,15 @@ static void *intel_alloc_coherent(struct device *hwdev, size_t size,
>
> size = PAGE_ALIGN(size);
> order = get_order(size);
> - flags &= ~(GFP_DMA | GFP_DMA32);
> +
> + if (!iommu_no_mapping(hwdev))
> + flags &= ~(GFP_DMA | GFP_DMA32);
> + else if (hwdev->coherent_dma_mask != DMA_BIT_MASK(64)) {
> + if (hwdev->coherent_dma_mask < DMA_BIT_MASK(32))
> + flags |= GFP_DMA;
> + else
> + flags |= GFP_DMA32;
> + }

This is fine for 2.6.32 but we'll cleanly fix this by using
swiotlb_dma_ops later, right?

2009-11-06 03:19:52

by Alex Williamson

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Fri, 2009-11-06 at 11:41 +0900, FUJITA Tomonori wrote:
> On Wed, 04 Nov 2009 15:59:34 -0700
> Alex Williamson <[email protected]> wrote:
> > @@ -2582,7 +2582,7 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
> > BUG_ON(dir == DMA_NONE);
> >
> > if (iommu_no_mapping(hwdev))
> > - return paddr;
> > + return paddr + size > dma_mask ? 0 : paddr;
>
> You can use dma_capable(hwdev, paddr, size) here.

Good thought, however __intel_map_single() gets called with either the
dma_mask or the coherent_dma_mask. dma_capable() only checks dma_mask,
so would only work for one of the callers.
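
For reference, dma_capable() at this point is roughly the sketch below (from
memory of the generic x86 helper of this era, so treat it as an approximation
rather than a quote of the tree); the point is that it only ever dereferences
dev->dma_mask:

static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
{
	/* Only the streaming mask is consulted, never coherent_dma_mask */
	if (!dev->dma_mask)
		return 0;

	return addr + size <= *dev->dma_mask;
}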

> > domain = get_valid_domain_for_dev(pdev);
> > if (!domain)
> > @@ -2767,7 +2767,15 @@ static void *intel_alloc_coherent(struct device *hwdev, size_t size,
> >
> > size = PAGE_ALIGN(size);
> > order = get_order(size);
> > - flags &= ~(GFP_DMA | GFP_DMA32);
> > +
> > + if (!iommu_no_mapping(hwdev))
> > + flags &= ~(GFP_DMA | GFP_DMA32);
> > + else if (hwdev->coherent_dma_mask != DMA_BIT_MASK(64)) {
> > + if (hwdev->coherent_dma_mask < DMA_BIT_MASK(32))
> > + flags |= GFP_DMA;
> > + else
> > + flags |= GFP_DMA32;
> > + }
>
> This is fine for 2.6.32 but we'll cleanly fix this by using
> swiotlb_dma_ops later, right?

I'm open to suggestions. I don't really understand why we dropped
swiotlb for passthrough mode in 2.6.32 to start with. It seems like we
now have a couple corner cases where we have to either hope for the best
or effectively ignore the request to use passthrough. Thanks,

Alex


2009-11-06 03:34:40

by FUJITA Tomonori

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Thu, 05 Nov 2009 20:19:52 -0700
Alex Williamson <[email protected]> wrote:

> On Fri, 2009-11-06 at 11:41 +0900, FUJITA Tomonori wrote:
> > On Wed, 04 Nov 2009 15:59:34 -0700
> > Alex Williamson <[email protected]> wrote:
> > > @@ -2582,7 +2582,7 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
> > > BUG_ON(dir == DMA_NONE);
> > >
> > > if (iommu_no_mapping(hwdev))
> > > - return paddr;
> > > + return paddr + size > dma_mask ? 0 : paddr;
> >
> > You can use dma_capable(hwdev, paddr, size) here.
>
> Good thought, however __intel_map_single() gets called with either the
> dma_mask or the coherent_dma_mask. dma_capable() only checks dma_mask,
> so would only work for one of the callers.

Oops, you are right.


> > > domain = get_valid_domain_for_dev(pdev);
> > > if (!domain)
> > > @@ -2767,7 +2767,15 @@ static void *intel_alloc_coherent(struct device *hwdev, size_t size,
> > >
> > > size = PAGE_ALIGN(size);
> > > order = get_order(size);
> > > - flags &= ~(GFP_DMA | GFP_DMA32);
> > > +
> > > + if (!iommu_no_mapping(hwdev))
> > > + flags &= ~(GFP_DMA | GFP_DMA32);
> > > + else if (hwdev->coherent_dma_mask != DMA_BIT_MASK(64)) {
> > > + if (hwdev->coherent_dma_mask < DMA_BIT_MASK(32))
> > > + flags |= GFP_DMA;
> > > + else
> > > + flags |= GFP_DMA32;
> > > + }
> >
> > This is fine for 2.6.32 but we'll cleanly fix this by using
> > swiotlb_dma_ops later, right?
>
> I'm open to suggestions. I don't really understand why we dropped
> swiotlb for passthrough mode in 2.6.32 to start with. It seems like we
> now have a couple corner cases where we have to either hope for the best
> or effectively ignore the request to use passthrough. Thanks,

I think that the cleanest solution is setting up swiotlb_dma_ops for
passthrough devices (and devices not behind PCI, etc). Calgary IOMMU
has done the same for years.
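
Concretely, something like the sketch below (illustrative only: it assumes
the x86 dev_archdata layout of this era and that swiotlb_dma_ops is
reachable from intel-iommu.c; the wrapper name is made up):

static void use_swiotlb_for_identity_mapped(struct pci_dev *pdev)
{
	/*
	 * Translated devices keep using intel_dma_ops via the global
	 * dma_ops pointer; devices classified for identity mapping
	 * (passthrough) get per-device ops pointing at swiotlb instead.
	 */
	if (iommu_should_identity_map(pdev, 1))
		pdev->dev.archdata.dma_ops = &swiotlb_dma_ops;
}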

2009-11-06 04:09:09

by Alex Williamson

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Fri, 2009-11-06 at 12:34 +0900, FUJITA Tomonori wrote:
> On Thu, 05 Nov 2009 20:19:52 -0700
> Alex Williamson <[email protected]> wrote:
> > On Fri, 2009-11-06 at 11:41 +0900, FUJITA Tomonori wrote:
> >
> > > This is fine for 2.6.32 but we'll cleanly fix this by using
> > > swiotlb_dma_ops later, right?
> >
> > I'm open to suggestions. I don't really understand why we dropped
> > swiotlb for passthrough mode in 2.6.32 to start with. It seems like we
> > now have a couple corner cases where we have to either hope for the best
> > or effectively ignore the request to use passthrough. Thanks,
>
> I think that the cleanest solution is setting up swiotlb_dma_ops for
> passthrough devices (and devices not behind PCI, etc). Calgary IOMMU
> has done the same for years.

intel-iommu was using swiotlb for the global dma_ops when in passthrough
mode until 19943b0e (2.6.31 and earlier). I would say the next step
would be to use per device dma_ops so we can point passthrough devices
to swiotlb, as you suggest, but that seems to work against part of what
19943b0e was trying to accomplish.

Alex

2009-11-09 23:03:06

by David Woodhouse

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Fri, 2009-11-06 at 11:41 +0900, FUJITA Tomonori wrote:
> This is fine for 2.6.32 but we'll cleanly fix this by using
> swiotlb_dma_ops later, right?

Well, the idea was that with 'iommu=pt' we'd have passthrough mode for
_decent_ devices, but the crappy devices without 64-bit DMA would just
have the IOMMU enabled instead.

We can see this as a simple classification bug -- we should be checking
pdev->coherent_dma_mask as well as pdev->dma_mask:

--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2196,7 +2196,8 @@ static int iommu_should_identity_map(struct pci_dev *pdev, int startup)
* take them out of the 1:1 domain later.
*/
if (!startup)
- return pdev->dma_mask > DMA_BIT_MASK(32);
+ return (pdev->dma_mask & pdev->coherent_dma_mask) >=
+ dma_get_required_mask(&pdev->dev);

return 1;
}

That fixes the case of a 32-bit coherent_dma_mask as it was intended to
be fixed.
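
(For instance, on a box with 8GiB of RAM dma_get_required_mask() comes out
at roughly DMA_BIT_MASK(33), so a device advertising a 64-bit dma_mask but
only a 32-bit coherent_dma_mask no longer qualifies and gets taken out of
the 1:1 domain -- it falls back to real IOMMU mappings instead.)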

Unfortunately, Alex's hardware is more broken than that. It also likes
to do stray reads from unmapped addresses -- addresses which used to be
mapped at some time in the past. So he really _does_ want the IOMMU
disabled, or in passthrough mode.

But I think that's a special case and needs to be handled with a quirk,
while the above patch actually addresses the problem we claimed we were
trying to address.

Handling Alex's broken hardware probably wants to be done with
'iommu=off' for now, and then when Chris's swiotlb fallback patches are
done we can perhaps do something more cunning.

I'm slightly reluctant to put the half-arsed 'try to allocate in the
right region for broken devices but without full swiotlb support' option
into 2.6.32.

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-11-09 23:32:56

by Alex Williamson

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Mon, 2009-11-09 at 23:02 +0000, David Woodhouse wrote:
> On Fri, 2009-11-06 at 11:41 +0900, FUJITA Tomonori wrote:
> > This is fine for 2.6.32 but we'll cleanly fix this by using
> > swiotlb_dma_ops later, right?
>
> Well, the idea was that with 'iommu=pt' we'd have passthrough mode for
> _decent_ devices, but the crappy devices without 64-bit DMA would just
> have the IOMMU enabled instead.
>
> We can see this as a simple classification bug -- we should be checking
> pdev->coherent_dma_mask as well as pdev->dma_mask:
>
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -2196,7 +2196,8 @@ static int iommu_should_identity_map(struct pci_dev *pdev, int startup)
> * take them out of the 1:1 domain later.
> */
> if (!startup)
> - return pdev->dma_mask > DMA_BIT_MASK(32);
> + return (pdev->dma_mask & pdev->coherent_dma_mask) >=
> + dma_get_required_mask(&pdev->dev);
>
> return 1;
> }
>
> That fixes the case of a 32-bit coherent_dma_mask as it was intended to
> be fixed.
>
> Unfortunately, Alex's hardware is more broken than that. It also likes
> to do stray reads from unmapped addresses -- addresses which used to be
> mapped at some time in the past. So he really _does_ want the IOMMU
> disabled, or in passthrough mode.
>
> But I think that's a special case and needs to be handled with a quirk,
> while the above patch actually addresses the problem we claimed we were
> trying to address.

I'm not sure what this quirk looks like, do you have something in mind?

> Handling Alex's broken hardware probably wants to be done with
> 'iommu=off' for now, and then when Chris's swiotlb fallback patches are
> done we can perhaps do something more cunning.

iommu=off means a feature regression from 2.6.31 and kills support for
being able to use VT-d for virtualization for a large percentage of
servers from a major vendor. I don't think Chris' patches actually
address this since we don't actually know what the DMA mask is for a
device until the driver claims it. How long do we wait before we drop
the swiotlb? I think his patch is really intended for the "oops, the
DMAR is broken, the hardware is bad, I can't init the hardware IOMMU,
whew we can fallback to swiotlb".

> I'm slightly reluctant to put the half-arsed 'try to allocate in the
> right region for broken devices but without full swiotlb support' option
> into 2.6.32.

Since the device also makes use of RMRRs, once we have it in the
si_domain, we're stuck. I think that means we need swiotlb anytime
we're in passthrough mode. That's what 2.6.31 did; can we get it back for
2.6.32? Thanks,

Alex

2009-11-10 00:20:04

by David Woodhouse

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Mon, 2009-11-09 at 16:32 -0700, Alex Williamson wrote:
> iommu=off means a feature regression from 2.6.31 and kills support for
> being able to use VT-d for virtualization for a large percentage of
> servers from a major vendor. I don't think Chris' patches actually
> address this since we don't actually know what the DMA mask is for a
> device until the driver claims it. How long do we wait before we drop
> the swiotlb?

With a quirk for the broken device in question, we'll know nice and
early that it's there, and hence that we have to keep swiotlb around.
That fits on top of what Chris is doing relatively well.
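
Roughly this shape, say -- purely illustrative: the real vendor/device IDs
aren't named here (0x1234 is a placeholder), and 'iommu_needs_swiotlb' is a
hypothetical flag the IOMMU init code would have to be taught to consult,
not an existing interface:

static int iommu_needs_swiotlb;

static void quirk_broken_dma_device(struct pci_dev *dev)
{
	/* Flag early that this device does stray DMA and needs special handling */
	dev_info(&dev->dev, "Broken DMA behaviour; keeping swiotlb around\n");
	iommu_needs_swiotlb = 1;
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_HP, 0x1234, quirk_broken_dma_device);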

> I think his patch is really intended for the "oops, the
> DMAR is broken, the hardware is bad, I can't init the hardware IOMMU,
> whew we can fallback to swiotlb".

Well, in the case Chris is trying to handle it's not really that the
hardware is bad. It's more that the 'major vendor' of which you speak is
shipping completely crap firmware that hasn't been given _any_ form of
QA. They gave it VT-d support and obviously never once booted a VT-d
capable OS on it.

> > I'm slightly reluctant to put the half-arsed 'try to allocate in the
> > right region for broken devices but without full swiotlb support' option
> > into 2.6.32.
>
> Since the device also makes use of RMRRs, once we have it in the
> si_domain, we're stuck. I think that means we need swiotlb anytime
> we're in passthrough mode. That's what 2.6.31 did; can we get it back for
> 2.6.32? Thanks,

That's what 2.6.31 did on _some_ hardware. It depends on whether you had
hardware or software passthrough. This particular broken device only
ever worked by accident in 2.6.31; it wasn't really by design.

But I suppose as a short-term measure, the patch you posted isn't so
bad. We'll fix it properly in 2.6.32 to use the non-passthrough mode for
devices with an inadequate coherent_dma_mask, and you can provide a
quirk which handles your "speshul" device differently.

We can also debate whether the non-passthrough mode for crap devices
(other than yours) should be to use the IOMMU, or to use swiotlb.

--
dwmw2

2009-11-10 00:46:56

by David Woodhouse

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Wed, 2009-11-04 at 15:59 -0700, Alex Williamson wrote:
>
> @@ -2582,7 +2582,7 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
> BUG_ON(dir == DMA_NONE);
>
> if (iommu_no_mapping(hwdev))
> - return paddr;
> + return paddr + size > dma_mask ? 0 : paddr;

Especially on 32-bit without PAE, that 'paddr + size' can wrap.
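
(Concretely: with a 32-bit phys_addr_t, paddr = 0xfffff000 and size = 0x2000
sum to 0x1000, so a buffer sitting right at the top of memory would sail
straight past the '> dma_mask' test even for, say, a 24-bit mask.)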

How's this... (Alex, you can tell me if you still want it to be From:
you after I changed the comments).


From: Alex Williamson <[email protected]>
Date: Wed, 4 Nov 2009 15:59:34 -0700
Subject: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

The model for IOMMU passthrough is that decent devices that can cope
with DMA to all of memory get passthrough; crappy devices with a limited
dma_mask don't -- they get to use the IOMMU anyway.

This is done on the basis that IOMMU passthrough is usually wanted for
performance reasons, and it's only the decent PCI devices that you
really care about performance for, while the crappy 32-bit ones like
your USB controller can just use the IOMMU and you won't really care.

Unfortunately, the check for this was only looking at dev->dma_mask, not
at dev->coherent_dma_mask. And some devices have a 32-bit
coherent_dma_mask even though they have a full 64-bit dma_mask.

Even more unfortunately, fixing that simple oversight would upset
certain broken HP devices. Not only do they have a 32-bit
coherent_dma_mask, but they also have a tendency to do stray DMA to
unmapped addresses. And then they die when they take the DMA fault they
so richly deserve.

So if we do the 'correct' fix, it'll mean that affected users have to
disable IOMMU support completely on "a large percentage of servers from
a major vendor."

Personally, I have little sympathy -- given that this is the _same_
'major vendor' who is shipping machines which claim to have IOMMU
support but have obviously never _once_ booted a VT-d capable OS to do
any form of QA. But strictly speaking, it _would_ be a regression even
though it only ever worked by fluke and the hardware is arguably broken.

For 2.6.33, we'll come up with a quirk which gives swiotlb support
for this particular device, and other devices with an inadequate
coherent_dma_mask will just get normal IOMMU mapping.

The simplest fix for 2.6.32, though, is just to jump through some hoops
to try to allocate coherent DMA memory for such devices in a place that
they can reach. We'd use dma_generic_alloc_coherent() for this if it
existed on IA64.

Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
drivers/pci/intel-iommu.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index b1e97e6..fe9ca58 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2582,7 +2582,7 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
BUG_ON(dir == DMA_NONE);

if (iommu_no_mapping(hwdev))
- return paddr;
+ return paddr > dma_mask - size ? 0 : paddr;

domain = get_valid_domain_for_dev(pdev);
if (!domain)
@@ -2767,7 +2767,15 @@ static void *intel_alloc_coherent(struct device *hwdev, size_t size,

size = PAGE_ALIGN(size);
order = get_order(size);
- flags &= ~(GFP_DMA | GFP_DMA32);
+
+ if (!iommu_no_mapping(hwdev))
+ flags &= ~(GFP_DMA | GFP_DMA32);
+ else if (hwdev->coherent_dma_mask < dma_get_required_mask(hwdev)) {
+ if (hwdev->coherent_dma_mask < DMA_BIT_MASK(32))
+ flags |= GFP_DMA;
+ else
+ flags |= GFP_DMA32;
+ }

vaddr = (void *)__get_free_pages(flags, order);
if (!vaddr)
--
1.6.5.2



--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-11-10 01:01:46

by David Woodhouse

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Tue, 2009-11-10 at 00:46 +0000, David Woodhouse wrote:
>
> if (iommu_no_mapping(hwdev))
> - return paddr;
> + return paddr > dma_mask - size ? 0 : paddr;

Hm, that's still wrong. If your mask is 0xffffffff and you map
0xfffff000 + 0x1000, that should be allowed, right? How about:

return paddr <= dma_mask - size + 1 ? paddr : 0;
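
(Checking the boundary case: with mask 0xffffffff and size 0x1000,
dma_mask - size + 1 is 0xfffff000, so paddr 0xfffff000 is allowed -- its
last byte lands exactly on 0xffffffff -- while anything higher is rejected.)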

And is this strictly necessary as part of the patch we're discussing?

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-11-10 01:28:35

by Alex Williamson

Subject: Re: [PATCH] intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough

On Tue, 2009-11-10 at 01:01 +0000, David Woodhouse wrote:
> On Tue, 2009-11-10 at 00:46 +0000, David Woodhouse wrote:
> >
> > if (iommu_no_mapping(hwdev))
> > - return paddr;
> > + return paddr > dma_mask - size ? 0 : paddr;
>
> Hm, that's still wrong. If your mask is 0xffffffff and you map
> 0xfffff000 + 0x1000, that should be allowed, right? How about:
>
> return paddr <= dma_mask - size + 1 ? paddr : 0;

Ok, I was going to suggest that there was probably a -1 missing from my
patch. I'd also be surprised if you could actually get a paddr + size -
1 from a single allocation that would wrap. Would a device even be able
to handle that for DMA?

> And is this strictly necessary as part of the patch we're discussing?

Yeah, we could probably do without it. As for the commit comment, it's
rather scathing, but I can live with it. Thanks,

Alex

2009-11-11 15:23:01

by Fenghua Yu

Subject: [stable][PATCH] PCIe hot-plug for Intel IOMMU

To support PCIe hot plug in the IOMMU, we register a notifier to respond to
device change actions.

When the notifier gets BUS_NOTIFY_UNBOUND_DRIVER, it removes the device from its
DMAR domain.

A hot-added device will be added into an IOMMU domain when it first does an
IOMMU op, so there is no need to add more code for hot add.

Without the patch, after a hot-remove, a hot-added device on the same slot will
not work.

Signed-off-by: Fenghua Yu <[email protected]>

---

The patch missed 2.6.32 release. Could it be in 2.6.32 stable?

drivers/pci/intel-iommu.c | 29 +++++++++++++++++++++++++++++
1 files changed, 29 insertions(+)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 855dd7c..d8b8cfc 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -3193,6 +3193,33 @@ static int __init init_iommu_sysfs(void)
}
#endif /* CONFIG_PM */

+/*
+ * Here we only respond to action of unbound device from driver.
+ *
+ * Added device is not attached to its DMAR domain here yet. That will happen
+ * when mapping the device to iova.
+ */
+static int device_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct device *dev = data;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct dmar_domain *domain;
+
+ domain = find_domain(pdev);
+ if (!domain)
+ return 0;
+
+ if (action == BUS_NOTIFY_UNBOUND_DRIVER && !iommu_pass_through)
+ domain_remove_one_dev_info(domain, pdev);
+
+ return 0;
+}
+
+static struct notifier_block device_nb = {
+ .notifier_call = device_notifier,
+};
+
int __init intel_iommu_init(void)
{
int ret = 0;
@@ -3245,6 +3272,8 @@ int __init intel_iommu_init(void)

register_iommu(&intel_iommu_ops);

+ bus_register_notifier(&pci_bus_type, &device_nb);
+
return 0;
}

2009-11-11 21:27:05

by Yinghai Lu

Subject: Re: [stable][PATCH] PCIe hot-plug for Intel IOMMU

On Wed, Nov 11, 2009 at 7:23 AM, Fenghua Yu <[email protected]> wrote:
> To support PCIe hot plug in the IOMMU, we register a notifier to respond to
> device change actions.
>
> When the notifier gets BUS_NOTIFY_UNBOUND_DRIVER, it removes the device from its
> DMAR domain.
>
> A hot-added device will be added into an IOMMU domain when it first does an
> IOMMU op, so there is no need to add more code for hot add.
>
> Without the patch, after a hot-remove, a hot-added device on the same slot will
> not work.
>
> Signed-off-by: Fenghua Yu <[email protected]>

Tested-by: Yinghai Lu <[email protected]>

>
> ---
>
> The patch missed 2.6.32 release. Could it be in 2.6.32 stable?

Jesse,
can you push to Linus?

looks like David is too busy.

YH

2009-11-12 02:37:54

by David Woodhouse

Subject: Re: [stable][PATCH] PCIe hot-plug for Intel IOMMU

On Wed, 2009-11-11 at 07:23 -0800, Fenghua Yu wrote:
> To support PCIe hot plug in IOMMU, we register a notifier to respond to device
> change action.
>
> When the notifier gets BUS_NOTIFY_UNBOUND_DRIVER, it removes the device from its
> DMAR domain.
>
> A hot added device will be added into an IOMMU domain when it first does IOMMU
> op. So there is no need to add more code for hot add.
>
> Without the patch, after a hot-remove, a hot-added device on the same slot will
> not work.
>
> Signed-off-by: Fenghua Yu <[email protected]>
>
> ---
>
> The patch missed 2.6.32 release. Could it be in 2.6.32 stable?

Not strictly a regression, but it would make a lot of sense.

Fenghua, please could you test what's in
git://git.infradead.org/~dwmw2/iommu-2.6.32.git on IA64 before I send it
to Linus? There are a couple of other fixes for HP brain damage there.


--
dwmw2

2009-11-12 23:33:17

by Fenghua Yu

Subject: RE: [stable][PATCH] PCIe hot-plug for Intel IOMMU

>Fenghua, please could you test what's in
>git://git.infradead.org/~dwmw2/iommu-2.6.32.git on IA64 before I send it
>to Linus? There are a couple of other fixes for HP brain damage there.

The iommu-2.6.32.git tree has no problems building, booting, or passing simple IOMMU testing on ia64. Since I don't have the environment, I didn't test IOMMU hot-plug on ia64 (which you may not care about anyway).

Thanks.

-Fenghua

2009-11-28 06:17:51

by David Woodhouse

Subject: Re: [stable][PATCH] PCIe hot-plug for Intel IOMMU

On Wed, 2009-11-11 at 13:27 -0800, Yinghai Lu wrote:
> On Wed, Nov 11, 2009 at 7:23 AM, Fenghua Yu <[email protected]> wrote:
> > To support PCIe hot plug in the IOMMU, we register a notifier to respond to
> > device change actions.
> >
> > When the notifier gets BUS_NOTIFY_UNBOUND_DRIVER, it removes the device from its
> > DMAR domain.
> >
> > A hot-added device will be added into an IOMMU domain when it first does an
> > IOMMU op, so there is no need to add more code for hot add.
> >
> > Without the patch, after a hot-remove, a hot-added device on the same slot will
> > not work.
> >
> > Signed-off-by: Fenghua Yu <[email protected]>
>
> Tested-by: Yinghai Lu <[email protected]>

But not with 'intel_iommu=igfx_off', I note. The call to find_domain()
will oops when it runs on the graphics device with ->archdata.iommu ==
DUMMY_DEVICE_DOMAIN_INFO (== -1). All other calls to find_domain() are
guaranteed not to happen for such devices.

One might argue that find_domain() should cope, but do we really want to
be doing that extra check in the unmap fast path?

Testing this fix now...

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 1840a05..f44a015 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -3228,6 +3228,9 @@ static int device_notifier(struct notifier_block *nb,
struct pci_dev *pdev = to_pci_dev(dev);
struct dmar_domain *domain;

+ if (iommu_no_mapping(dev))
+ return 0;
+
domain = find_domain(pdev);
if (!domain)
return 0;


--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation