2011-02-16 20:40:21

by Alex Williamson

Subject: [PATCH v2 0/2] intel-iommu: Fix domain_ids exhaustion

When we unbind a device from a driver, we don't properly unlink
the domain from the iommu, so we never free the domain id it
was using. We're typically limited to something like 256 domain
ids, so a loop of unbinding and rebinding a device can exhaust
this pretty quickly. If we're assigning the device to a KVM
guest, libvirt does exactly this each time the device is removed
from the host driver or added back. When we do run out, we oops
the kernel. Fix these.

v2:

We only want to call domain_exit() for domains automatically created
via the dma ops path. VM and SI domains have their own life cycle
and should not be destroyed here. With v1, if a device was unbound
from pci-stub while assigned to a VM, the kernel would oops on the
next call into iommu ops.

BTW, should we even be removing the device from the domain in the
VM domain case? Drivers and VM domains are (unfortunately) orthogonal
concepts here with the way KVM is currently wired. Thanks,

Alex

---

Alex Williamson (2):
intel-iommu: Fix get_domain_for_dev() error path
intel-iommu: Unlink domain from iommu


drivers/pci/intel-iommu.c | 15 +++++++++++++--
1 files changed, 13 insertions(+), 2 deletions(-)


2011-02-16 20:40:40

by Alex Williamson

Subject: [PATCH v2 1/2] intel-iommu: Unlink domain from iommu

When we remove a device, we unlink the iommu from the domain, but
we never do the reverse unlinking of the domain from the iommu.
This means that we never clear iommu->domain_ids, eventually leading
to resource exhaustion if we repeatedly bind and unbind a device
to a driver. Also free empty domains to avoid a resource leak.

Signed-off-by: Alex Williamson <[email protected]>
---

drivers/pci/intel-iommu.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 4789f8e..b670b06 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -3260,9 +3260,15 @@ static int device_notifier(struct notifier_block *nb,
 	if (!domain)
 		return 0;
 
-	if (action == BUS_NOTIFY_UNBOUND_DRIVER && !iommu_pass_through)
+	if (action == BUS_NOTIFY_UNBOUND_DRIVER && !iommu_pass_through) {
 		domain_remove_one_dev_info(domain, pdev);
 
+		if (!(domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE) &&
+		    !(domain->flags & DOMAIN_FLAG_STATIC_IDENTITY) &&
+		    list_empty(&domain->devices))
+			domain_exit(domain);
+	}
+
 	return 0;
 }
 
@@ -3411,6 +3417,11 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
 		domain->iommu_count--;
 		domain_update_iommu_cap(domain);
 		spin_unlock_irqrestore(&domain->iommu_lock, tmp_flags);
+
+		spin_lock_irqsave(&iommu->lock, tmp_flags);
+		clear_bit(domain->id, iommu->domain_ids);
+		iommu->domains[domain->id] = NULL;
+		spin_unlock_irqrestore(&iommu->lock, tmp_flags);
 	}
 
 	spin_unlock_irqrestore(&device_domain_lock, flags);
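
The leak described in the commit message can be modelled in user space.
What follows is a minimal sketch, not kernel code: the toy_* names and
the fixed table of 256 ids are assumptions made for illustration. Only
the idea of releasing the id on unbind is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NDOMAINS 256	/* assumed capacity, per the "something like 256" estimate */

/* Toy stand-in for the per-iommu domain id bitmap (hypothetical type). */
struct toy_iommu {
	bool domain_ids[NDOMAINS];
};

/* Claim the first free id, or return -1 when the table is exhausted. */
static int toy_attach_domain(struct toy_iommu *iommu)
{
	for (int id = 0; id < NDOMAINS; id++) {
		if (!iommu->domain_ids[id]) {
			iommu->domain_ids[id] = true;
			return id;
		}
	}
	return -1;
}

/* Release an id; this models the clear_bit() step the patch adds. */
static void toy_detach_domain(struct toy_iommu *iommu, int id)
{
	assert(id >= 0 && id < NDOMAINS);
	iommu->domain_ids[id] = false;
}

/*
 * Run bind/unbind cycles and return how many binds succeeded.
 * With release_on_unbind == false the id is leaked on every unbind.
 */
static int toy_rebind_cycles(int cycles, bool release_on_unbind)
{
	struct toy_iommu iommu;
	int ok = 0;

	memset(&iommu, 0, sizeof(iommu));
	for (int i = 0; i < cycles; i++) {
		int id = toy_attach_domain(&iommu);

		if (id < 0)
			break;
		ok++;
		if (release_on_unbind)
			toy_detach_domain(&iommu, id);
	}
	return ok;
}
```

In this model, toy_rebind_cycles(1000, false) stops once every id has
been handed out, while toy_rebind_cycles(1000, true) keeps reusing the
same id and completes all requested cycles.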

2011-02-16 20:41:05

by Alex Williamson

Subject: [PATCH v2 2/2] intel-iommu: Fix get_domain_for_dev() error path

If we run out of domain_ids and fail iommu_attach_domain(), we
fall into domain_exit() without having set up enough of the
domain structure for this to do anything useful. In fact, it
typically runs off into the weeds walking the bogus domain->devices
list. Just free the domain.

Signed-off-by: Alex Williamson <[email protected]>
---

drivers/pci/intel-iommu.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index b670b06..b0343d1 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -1835,7 +1835,7 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
 
 	ret = iommu_attach_domain(domain, iommu);
 	if (ret) {
-		domain_exit(domain);
+		free_domain_mem(domain);
 		goto error;
 	}
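
The principle behind this change, unwinding only what has actually been
set up, can be sketched outside the kernel. The code below is a
hypothetical user-space model: the toy_* names and the simulated attach
failure are assumptions for illustration, not the driver's real
structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_dev {
	struct toy_dev *next;
};

struct toy_domain {
	struct toy_dev *devices;	/* valid only after toy_domain_init() */
};

static int toy_full_teardowns;	/* counts calls to toy_domain_exit() */

static void toy_domain_init(struct toy_domain *d)
{
	d->devices = NULL;
}

/* Full teardown: walks the device list, so it needs an initialized domain. */
static void toy_domain_exit(struct toy_domain *d)
{
	assert(d != NULL);
	toy_full_teardowns++;
	while (d->devices) {
		struct toy_dev *dev = d->devices;

		d->devices = dev->next;
		free(dev);
	}
	free(d);
}

/* Returns NULL if allocation or the (simulated) attach step fails. */
static struct toy_domain *toy_get_domain(bool attach_fails)
{
	struct toy_domain *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	if (attach_fails) {
		/* Only the allocation exists so far: undo just that. */
		free(d);
		return NULL;
	}
	toy_domain_init(d);
	return d;
}
```

On the failure path the device list has never been initialized, so the
model frees the allocation directly instead of calling the full teardown
that would walk that list.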

2011-02-17 20:13:57

by Donald Dutile

Subject: Re: [PATCH v2 0/2] intel-iommu: Fix domain_ids exhaustion

Alex Williamson wrote:
> When we unbind a device from a driver, we don't properly unlink
> the domain from the iommu, so we never free the domain id it
> was using. We're typically limited to something like 256 domain
> ids, so a loop of unbinding and rebinding a device can exhaust
> this pretty quickly. If we're assigning the device to a KVM
> guest, libvirt does exactly this each time the device is removed
> from the host driver or added back. When we do run out, we oops
> the kernel. Fix these.
>
> v2:
>
> We only want to call domain_exit() for domains automatically created
> via the dma ops path. VM and SI domains have their own life cycle
> and should not be destroyed here. With v1, if a device was unbound
> from pci-stub while assigned to a VM, the kernel would oops on the
> next call into iommu ops.
>
> BTW, should we even be removing the device from the domain in the
> VM domain case? Drivers and VM domains are (unfortunately) orthogonal
> concepts here with the way KVM is currently wired. Thanks,
>
> Alex
>
> ---
>
> Alex Williamson (2):
> intel-iommu: Fix get_domain_for_dev() error path
> intel-iommu: Unlink domain from iommu
>
>
> drivers/pci/intel-iommu.c | 15 +++++++++++++--
> 1 files changed, 13 insertions(+), 2 deletions(-)
Acked-by: Donald Dutile <[email protected]>

by Henrique de Moraes Holschuh

Subject: Re: [PATCH v2 0/2] intel-iommu: Fix domain_ids exhaustion

Once merged, should it go to -stable? If so, maybe it would be a good
idea to add that while collecting the Acked-By's...

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2011-02-23 19:33:05

by Alex Williamson

Subject: Re: [PATCH v2 0/2] intel-iommu: Fix domain_ids exhaustion

On Thu, 2011-02-17 at 22:38 -0200, Henrique de Moraes Holschuh wrote:
> Once merged, should it go to -stable? If so, maybe it would be a good
> idea to add that while collecting the Acked-By's...

Yeah, I think this would be a valid -stable candidate. I'm hoping we'll
get an Ack from David here soon...

Alex