Hi,
The Mediated Device (mdev) framework provides fine-grained sharing
of a physical device across isolated domains. Currently the mdev
framework is designed to be independent of platform IOMMU support.
As a result, DMA isolation relies on the mdev parent device in a
vendor-specific way.
There are several cases where a mediated device could be protected
and isolated by the platform IOMMU. For example, Intel vt-d rev3.0
[1] introduces a new translation mode called 'scalable mode', which
enables PASID-granular translations. The vt-d scalable mode is the
key ingredient for Scalable I/O Virtualization [2] [3] which allows
sharing a device in minimal possible granularity (ADI - Assignable
Device Interface).
A mediated device backed by an ADI could be protected and isolated
by the IOMMU since 1) the parent device supports tagging a unique
PASID to all DMA traffic out of the mediated device; and 2) the DMA
translation unit (IOMMU) supports PASID-granular translation.
We can apply IOMMU protection and isolation to this kind of device
just as we do with an assignable PCI device.
In order to distinguish the IOMMU-capable mediated devices from those
which still need to rely on parent devices, this patch set adds two
new members in struct mdev_device.
* iommu_device
- This, if set, indicates that the mediated device could
be fully isolated and protected by IOMMU via attaching
an iommu domain to this device. If empty, it indicates
using vendor defined isolation.
* iommu_domain
- This is a placeholder for an iommu domain. A domain
could be stored here for later use once it has been
attached to the iommu_device of this mdev.
The following helpers are added to the mdev core implementation to
set and get the iommu device and iommu domain pointers.
* mdev_set/get_iommu_device(dev, iommu_device)
- Set or get the iommu device which represents this mdev
in IOMMU's device scope. Drivers don't need to set the
iommu device if they use vendor defined isolation.
* mdev_set/get_iommu_domain(dev, domain)
- An iommu domain which has been attached to the iommu
device in order to protect and isolate the mediated
device will be kept in the mdev data structure and
can be retrieved later.
The mdev parent device driver can opt in to having the mdev fully
isolated and protected by the IOMMU by invoking
mdev_set_iommu_device() in its @create() callback when the mdev is
being created.
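To illustrate the opt-in, here is a minimal userspace sketch (not kernel
code). The two helpers have the same shape as the ones added to
mdev_core.c by this series; struct device and struct mdev_device are cut
down to the fields used, and sample_parent_create() together with its
pasid_tagged_dma flag is a hypothetical stand-in for a parent driver's
@create() callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures; only the fields used here. */
struct device {
	const char *name;
};

struct mdev_device {
	struct device dev;
	struct device *iommu_device;	/* NULL means vendor defined isolation */
};

#define to_mdev_device(d) \
	((struct mdev_device *)((char *)(d) - offsetof(struct mdev_device, dev)))

/* Same shape as the helpers added to mdev_core.c by this series. */
static int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
{
	to_mdev_device(dev)->iommu_device = iommu_device;
	return 0;
}

static struct device *mdev_get_iommu_device(struct device *dev)
{
	return to_mdev_device(dev)->iommu_device;
}

/*
 * Hypothetical @create() callback of a parent driver. pasid_tagged_dma is
 * an assumed flag: true when the parent can tag all DMA of this mdev with
 * a unique PASID.
 */
static int sample_parent_create(struct mdev_device *mdev,
				struct device *parent, bool pasid_tagged_dma)
{
	assert(mdev && parent);

	if (!pasid_tagged_dma)
		return 0;	/* keep vendor defined isolation */

	/* Opt in: the parent PCI device represents this mdev to the IOMMU. */
	return mdev_set_iommu_device(&mdev->dev, parent);
}
```

A parent that cannot tag DMA with a PASID simply skips the call, so the
iommu device stays NULL and vendor defined isolation remains in effect.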
In vfio_iommu_type1_attach_group(), a domain allocated through
iommu_domain_alloc() will be attached to the mdev iommu device if
an iommu device has been set. Otherwise, the dummy external domain
will be used and, as a result, all DMA isolation and protection is
left to the parent driver.
On the IOMMU side, the basic requirement is to allow attaching
multiple domains to a PCI device if the device advertises the
capability and the IOMMU hardware supports finer granularity
translations than the normal PCI Source ID based translation.
As a result, a PCI device could work in two modes: normal mode
and auxiliary mode. In the normal mode, a PCI device could be
isolated in the Source ID granularity; the PCI device itself could
be assigned to a user application by attaching a single domain
to it. In the auxiliary mode, a PCI device could be isolated in
finer granularity, hence subsets of the device could be assigned
to different user level applications by attaching a different domain
to each subset.
The device driver is able to switch between the above two modes with
the following interfaces:
* iommu_get_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_CAPABILITY)
- Represents the ability of supporting multiple domains
per device.
* iommu_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_ENABLE)
- Enable the multiple domains capability for the device
referenced by @dev.
* iommu_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_DISABLE)
- Disable the multiple domains capability for the device
referenced by @dev.
* iommu_domain_get_attr(domain, DOMAIN_ATTR_AUXD_ID)
- Return ID used for finer-granularity DMA translation.
* iommu_attach_device_aux(domain, dev)
- Attach a domain to the device in the auxiliary mode.
* iommu_detach_device_aux(domain, dev)
- Detach the aux domain from device.
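The intended call order of these interfaces can be sketched with a small
userspace model (not the kernel API). All model_* names are hypothetical;
the error codes follow the vt-d patches in this series (-ENODEV when the
capability is missing, -EPERM when attaching in auxiliary mode before it
has been enabled), and the per-device counter is only an assumed stand-in
for a PASID allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model of one physical device as the IOMMU sees it. */
struct model_dev {
	bool auxd_capable;	/* answer to the AUXD_CAPABILITY query */
	bool auxd_enabled;	/* state set by AUXD_ENABLE / AUXD_DISABLE */
	int next_id;		/* stands in for a PASID allocator */
};

struct model_domain {
	int auxd_id;		/* 0 until attached in auxiliary mode */
};

static int model_enable_auxd(struct model_dev *dev)
{
	if (!dev->auxd_capable)
		return -ENODEV;
	dev->auxd_enabled = true;
	return 0;
}

/* Auxiliary attach is refused unless the driver enabled the mode first. */
static int model_attach_aux(struct model_domain *domain, struct model_dev *dev)
{
	if (!dev->auxd_enabled)
		return -EPERM;
	if (domain->auxd_id <= 0)
		domain->auxd_id = dev->next_id++;
	return 0;
}

/* Hypothetical driver path: enable aux mode, attach, then read back the ID. */
static int model_assign_subset(struct model_dev *dev,
			       struct model_domain *domain, int *id)
{
	int ret;

	assert(dev && domain && id);

	ret = model_enable_auxd(dev);
	if (ret)
		return ret;

	ret = model_attach_aux(domain, dev);
	if (ret)
		return ret;

	*id = domain->auxd_id;	/* what DOMAIN_ATTR_AUXD_ID would return */
	return 0;
}
```

Each subset of the device gets its own domain and therefore its own ID,
which the driver would then program into the hardware for that subset.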
For ease of discussion, we sometimes say 'a domain in auxiliary
mode' or simply 'an auxiliary domain' when a domain is attached
to a device for finer granularity translations. But we need
to keep in mind that this doesn't mean there is a different domain
type. The same domain could be bound to one device for Source ID
based translation, and bound to another device for finer granularity
translation at the same time.
This patch series extends both the IOMMU and vfio components to
support mdev device passthrough when the mdev can be isolated and
protected by the IOMMU units. The first part of this series
(PATCH 1/08~5/08) adds the interfaces and implementation of multiple
domains per device. The second part (PATCH 6/08~8/08) adds the iommu
device attribute to each mdev, determines the isolation type according
to the existence of an iommu device when attaching a group in the vfio
type1 iommu module, and attaches the domain to iommu aware mediated
devices.
This patch series depends on a patch set posted here [4] for discussion
which adds scalable mode support to the Intel IOMMU driver.
References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
[4] https://lkml.org/lkml/2018/11/5/136
Best regards,
Lu Baolu
Change log:
v3->v4:
- Use aux domain specific interfaces for domain attach and detach.
- Rebase all patches to 4.20-rc1.
v2->v3:
- Remove domain type enum and use a pointer on mdev_device instead.
- Add a generic interface for getting/setting per device iommu
attributes, and use it to query the aux domain capability and to
enable or disable aux domains.
- Reuse iommu_domain_get_attr() to retrieve the id in an aux domain.
- We discussed the impact of the default domain implementation
on reusing iommu_at(de)tach_device() interfaces. We agreed
that reusing iommu_at(de)tach_device() interfaces is the right
direction and we could tweak the code to remove the impact.
https://www.spinics.net/lists/kvm/msg175285.html
- Removed the RFC tag since no objections were received.
- This patch has been submitted separately.
https://www.spinics.net/lists/kvm/msg173936.html
v1->v2:
- Rewrite the patches with the concept of auxiliary domains.
Lu Baolu (8):
iommu: Add APIs for multiple domains per device
iommu/vt-d: Add multiple domains per device query
iommu/vt-d: Enable/disable multiple domains per device
iommu/vt-d: Attach/detach domains in auxiliary mode
iommu/vt-d: Return ID associated with an auxiliary domain
vfio/mdev: Add iommu place holders in mdev_device
vfio/type1: Add domain at(de)taching group helpers
vfio/type1: Handle different mdev isolation type
drivers/iommu/intel-iommu.c | 315 ++++++++++++++++++++++++++++---
drivers/iommu/iommu.c | 52 +++++
drivers/vfio/mdev/mdev_core.c | 36 ++++
drivers/vfio/mdev/mdev_private.h | 2 +
drivers/vfio/vfio_iommu_type1.c | 162 ++++++++++++++--
include/linux/intel-iommu.h | 11 ++
include/linux/iommu.h | 52 +++++
include/linux/mdev.h | 23 +++
8 files changed, 618 insertions(+), 35 deletions(-)
--
2.17.1
Respond to the IOMMU_DEV_ATTR_AUXD_CAPABILITY query issued through
iommu_get_dev_attr(). The capability is reported only when every
active IOMMU supports scalable mode.
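The check behind this query is an "all units must agree" loop. A minimal
userspace sketch of that logic follows; model_scalable_mode_support() and
its array argument are hypothetical stand-ins for iterating the active
IOMMUs with sm_supported(), and the RCU locking of the real code is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * sm_supported[i] models sm_supported(iommu) for the i-th active IOMMU.
 * Returns true only if no unit lacks scalable mode.
 */
static bool model_scalable_mode_support(const bool *sm_supported,
					size_t nr_iommus)
{
	size_t i;

	assert(sm_supported || nr_iommus == 0);

	for (i = 0; i < nr_iommus; i++) {
		if (!sm_supported[i])
			return false;	/* one legacy unit disables the feature */
	}

	return true;
}
```

As in the patch, the result starts out true, so an empty set of IOMMUs
also reports the capability in this model.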
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
drivers/iommu/intel-iommu.c | 38 +++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5e149d26ea9b..298f7a3fafe8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5167,6 +5167,24 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
return phys;
}
+static inline bool scalable_mode_support(void)
+{
+ struct dmar_drhd_unit *drhd;
+ struct intel_iommu *iommu;
+ bool ret = true;
+
+ rcu_read_lock();
+ for_each_active_iommu(iommu, drhd) {
+ if (!sm_supported(iommu)) {
+ ret = false;
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return ret;
+}
+
static bool intel_iommu_capable(enum iommu_cap cap)
{
if (cap == IOMMU_CAP_CACHE_COHERENCY)
@@ -5331,6 +5349,25 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
}
#endif /* CONFIG_INTEL_IOMMU_SVM */
+static int intel_iommu_get_dev_attr(struct device *dev,
+ enum iommu_dev_attr attr, void *data)
+{
+ int ret = 0;
+ bool *auxd_capable;
+
+ switch (attr) {
+ case IOMMU_DEV_ATTR_AUXD_CAPABILITY:
+ auxd_capable = data;
+ *auxd_capable = scalable_mode_support();
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.domain_alloc = intel_iommu_domain_alloc,
@@ -5345,6 +5382,7 @@ const struct iommu_ops intel_iommu_ops = {
.get_resv_regions = intel_iommu_get_resv_regions,
.put_resv_regions = intel_iommu_put_resv_regions,
.device_group = pci_device_group,
+ .get_dev_attr = intel_iommu_get_dev_attr,
.pgsize_bitmap = INTEL_IOMMU_PGSIZES,
};
--
2.17.1
Add iommu ops for enabling and disabling multiple domains per
device.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
drivers/iommu/intel-iommu.c | 65 ++++++++++++++++++++++++++++++++++++-
include/linux/intel-iommu.h | 1 +
2 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 298f7a3fafe8..2c86ac71c774 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2476,6 +2476,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->domain = domain;
info->iommu = iommu;
info->pasid_table = NULL;
+ info->auxd_enabled = 0;
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -5353,13 +5354,74 @@ static int intel_iommu_get_dev_attr(struct device *dev,
enum iommu_dev_attr attr, void *data)
{
int ret = 0;
- bool *auxd_capable;
+ struct device_domain_info *info;
+ bool *auxd_capable, *auxd_enabled;
switch (attr) {
case IOMMU_DEV_ATTR_AUXD_CAPABILITY:
auxd_capable = data;
*auxd_capable = scalable_mode_support();
break;
+ case IOMMU_DEV_ATTR_AUXD_ENABLED:
+ auxd_enabled = data;
+ info = dev->archdata.iommu;
+ *auxd_enabled = info && info->auxd_enabled;
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static int intel_iommu_enable_auxd(struct device *dev)
+{
+ struct device_domain_info *info;
+ struct dmar_domain *domain;
+ unsigned long flags;
+
+ if (!scalable_mode_support())
+ return -ENODEV;
+
+ domain = get_valid_domain_for_dev(dev);
+ if (!domain)
+ return -ENODEV;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ info = dev->archdata.iommu;
+ info->auxd_enabled = 1;
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
+ return 0;
+}
+
+static int intel_iommu_disable_auxd(struct device *dev)
+{
+ struct device_domain_info *info;
+ unsigned long flags;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ info = dev->archdata.iommu;
+ if (!WARN_ON(!info))
+ info->auxd_enabled = 0;
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
+ return 0;
+}
+
+static int intel_iommu_set_dev_attr(struct device *dev,
+ enum iommu_dev_attr attr, void *data)
+{
+ int ret = 0;
+
+ switch (attr) {
+ case IOMMU_DEV_ATTR_AUXD_ENABLE:
+ ret = intel_iommu_enable_auxd(dev);
+ break;
+ case IOMMU_DEV_ATTR_AUXD_DISABLE:
+ ret = intel_iommu_disable_auxd(dev);
+ break;
default:
ret = -EINVAL;
break;
@@ -5383,6 +5445,7 @@ const struct iommu_ops intel_iommu_ops = {
.put_resv_regions = intel_iommu_put_resv_regions,
.device_group = pci_device_group,
.get_dev_attr = intel_iommu_get_dev_attr,
+ .set_dev_attr = intel_iommu_set_dev_attr,
.pgsize_bitmap = INTEL_IOMMU_PGSIZES,
};
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index d174724e131f..6b198e13e75e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -552,6 +552,7 @@ struct device_domain_info {
u8 pri_enabled:1;
u8 ats_supported:1;
u8 ats_enabled:1;
+ u8 auxd_enabled:1; /* Multiple domains per device */
u8 ats_qdep;
struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
struct intel_iommu *iommu; /* IOMMU used by this device */
--
2.17.1
When multiple domains per device has been enabled by the
device driver, the device tags all DMA traffic out of a
subset of the device with the default PASID of the domain
attached to that subset, and the IOMMU translates those
DMA requests at PASID granularity.
This adds the intel_iommu_attach/detach_device_aux() ops
to manage the PASID-granular translation structures
when the device driver has enabled multiple domains per
device.
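The lifetime of the default PASID can be sketched in userspace as follows
(not kernel code). The PASID is allocated on the first auxiliary attach
of a domain and released when the last attachment goes away. All model_*
names and the tiny bitmap allocator are hypothetical; clearing
default_pasid after the free is a choice of this sketch so that a later
attach allocates again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_PASID 8

/* Toy allocator; index 0 is never handed out. */
static bool model_pasid_used[MODEL_MAX_PASID];

struct model_domain {
	int default_pasid;		/* <= 0 means not allocated */
	unsigned int auxd_refcnt;	/* number of auxiliary attachments */
};

static int model_pasid_alloc(void)
{
	int i;

	for (i = 1; i < MODEL_MAX_PASID; i++) {
		if (!model_pasid_used[i]) {
			model_pasid_used[i] = true;
			return i;
		}
	}

	return -1;
}

static int model_aux_attach(struct model_domain *domain)
{
	assert(domain);

	/* The first auxiliary attach allocates the default PASID. */
	if (domain->default_pasid <= 0) {
		domain->default_pasid = model_pasid_alloc();
		if (domain->default_pasid < 0)
			return -ENODEV;
	}

	domain->auxd_refcnt++;
	return 0;
}

static void model_aux_detach(struct model_domain *domain)
{
	assert(domain && domain->auxd_refcnt > 0);

	domain->auxd_refcnt--;

	/* The last detach releases the PASID. */
	if (!domain->auxd_refcnt && domain->default_pasid > 0) {
		model_pasid_used[domain->default_pasid] = false;
		domain->default_pasid = 0;
	}
}
```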
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
drivers/iommu/intel-iommu.c | 192 +++++++++++++++++++++++++++++++-----
include/linux/intel-iommu.h | 10 ++
2 files changed, 180 insertions(+), 22 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2c86ac71c774..a61b25ad0d3b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2477,6 +2477,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->iommu = iommu;
info->pasid_table = NULL;
info->auxd_enabled = 0;
+ INIT_LIST_HEAD(&info->auxiliary_domains);
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -5010,35 +5011,134 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
domain_exit(to_dmar_domain(domain));
}
-static int intel_iommu_attach_device(struct iommu_domain *domain,
- struct device *dev)
+/*
+ * Check whether a @domain will be attached to the @dev in the
+ * auxiliary mode.
+ */
+static inline bool
+is_device_attach_aux_domain(struct device *dev, struct iommu_domain *domain)
{
- struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- struct intel_iommu *iommu;
- int addr_width;
- u8 bus, devfn;
+ struct device_domain_info *info = dev->archdata.iommu;
- if (device_is_rmrr_locked(dev)) {
- dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
- return -EPERM;
- }
+ return info && info->auxd_enabled &&
+ domain->type == IOMMU_DOMAIN_UNMANAGED;
+}
- /* normally dev is not mapped */
- if (unlikely(domain_context_mapped(dev))) {
- struct dmar_domain *old_domain;
+static void auxiliary_link_device(struct dmar_domain *domain,
+ struct device *dev)
+{
+ struct device_domain_info *info = dev->archdata.iommu;
- old_domain = find_domain(dev);
- if (old_domain) {
- rcu_read_lock();
- dmar_remove_one_dev_info(old_domain, dev);
- rcu_read_unlock();
+ assert_spin_locked(&device_domain_lock);
+ if (WARN_ON(!info))
+ return;
- if (!domain_type_is_vm_or_si(old_domain) &&
- list_empty(&old_domain->devices))
- domain_exit(old_domain);
+ domain->auxd_refcnt++;
+ list_add(&domain->auxd, &info->auxiliary_domains);
+}
+
+static void auxiliary_unlink_device(struct dmar_domain *domain,
+ struct device *dev)
+{
+ struct device_domain_info *info = dev->archdata.iommu;
+
+ assert_spin_locked(&device_domain_lock);
+ if (WARN_ON(!info))
+ return;
+
+ list_del(&domain->auxd);
+ domain->auxd_refcnt--;
+
+ if (!domain->auxd_refcnt && domain->default_pasid > 0)
+ intel_pasid_free_id(domain->default_pasid);
+}
+
+static int domain_add_dev_auxd(struct dmar_domain *domain,
+ struct device *dev)
+{
+ int ret;
+ u8 bus, devfn;
+ unsigned long flags;
+ struct intel_iommu *iommu;
+
+ iommu = device_to_iommu(dev, &bus, &devfn);
+ if (!iommu)
+ return -ENODEV;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ if (domain->default_pasid <= 0) {
+ domain->default_pasid = intel_pasid_alloc_id(domain, PASID_MIN,
+ pci_max_pasids(to_pci_dev(dev)), GFP_ATOMIC);
+ if (domain->default_pasid < 0) {
+ pr_err("Can't allocate default pasid\n");
+ ret = -ENODEV;
+ goto pasid_failed;
}
}
+ spin_lock(&iommu->lock);
+ ret = domain_attach_iommu(domain, iommu);
+ if (ret)
+ goto attach_failed;
+
+ /* Setup the PASID entry for mediated devices: */
+ ret = intel_pasid_setup_second_level(iommu, domain, dev,
+ domain->default_pasid);
+ if (ret)
+ goto table_failed;
+ spin_unlock(&iommu->lock);
+
+ auxiliary_link_device(domain, dev);
+
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
+ return 0;
+
+table_failed:
+ domain_detach_iommu(domain, iommu);
+attach_failed:
+ spin_unlock(&iommu->lock);
+ if (!domain->auxd_refcnt && domain->default_pasid > 0)
+ intel_pasid_free_id(domain->default_pasid);
+pasid_failed:
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
+ return ret;
+}
+
+static void domain_remove_dev_aux(struct dmar_domain *domain,
+ struct device *dev)
+{
+ struct device_domain_info *info;
+ struct intel_iommu *iommu;
+ unsigned long flags;
+
+ if (!is_device_attach_aux_domain(dev, &domain->domain))
+ return;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ info = dev->archdata.iommu;
+ iommu = info->iommu;
+
+ intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid);
+
+ auxiliary_unlink_device(domain, dev);
+
+ spin_lock(&iommu->lock);
+ domain_detach_iommu(domain, iommu);
+ spin_unlock(&iommu->lock);
+
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
+static int __intel_iommu_attach_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct intel_iommu *iommu;
+ int addr_width;
+ u8 bus, devfn;
+
iommu = device_to_iommu(dev, &bus, &devfn);
if (!iommu)
return -ENODEV;
@@ -5071,7 +5171,47 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
dmar_domain->agaw--;
}
- return domain_add_dev_info(dmar_domain, dev);
+ if (is_device_attach_aux_domain(dev, domain))
+ return domain_add_dev_auxd(dmar_domain, dev);
+ else
+ return domain_add_dev_info(dmar_domain, dev);
+}
+
+static int intel_iommu_attach_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+ if (device_is_rmrr_locked(dev)) {
+ dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
+ return -EPERM;
+ }
+
+ if (is_device_attach_aux_domain(dev, domain))
+ return -EPERM;
+
+ /* normally dev is not mapped */
+ if (unlikely(domain_context_mapped(dev))) {
+ struct dmar_domain *old_domain;
+
+ old_domain = find_domain(dev);
+ if (old_domain) {
+ rcu_read_lock();
+ dmar_remove_one_dev_info(old_domain, dev);
+ rcu_read_unlock();
+
+ if (!domain_type_is_vm_or_si(old_domain) &&
+ list_empty(&old_domain->devices))
+ domain_exit(old_domain);
+ }
+ }
+
+ return __intel_iommu_attach_device(domain, dev);
+}
+
+static int intel_iommu_attach_device_aux(struct iommu_domain *domain,
+ struct device *dev)
+{
+ return is_device_attach_aux_domain(dev, domain) ?
+ __intel_iommu_attach_device(domain, dev) : -EPERM;
}
static void intel_iommu_detach_device(struct iommu_domain *domain,
@@ -5080,6 +5220,12 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
}
+static void intel_iommu_detach_device_aux(struct iommu_domain *domain,
+ struct device *dev)
+{
+ domain_remove_dev_aux(to_dmar_domain(domain), dev);
+}
+
static int intel_iommu_map(struct iommu_domain *domain,
unsigned long iova, phys_addr_t hpa,
size_t size, int iommu_prot)
@@ -5436,6 +5582,8 @@ const struct iommu_ops intel_iommu_ops = {
.domain_free = intel_iommu_domain_free,
.attach_dev = intel_iommu_attach_device,
.detach_dev = intel_iommu_detach_device,
+ .attach_dev_aux = intel_iommu_attach_device_aux,
+ .detach_dev_aux = intel_iommu_detach_device_aux,
.map = intel_iommu_map,
.unmap = intel_iommu_unmap,
.iova_to_phys = intel_iommu_iova_to_phys,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 6b198e13e75e..678c7fb05e74 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -473,9 +473,11 @@ struct dmar_domain {
/* Domain ids per IOMMU. Use u16 since
* domain ids are 16 bit wide according
* to VT-d spec, section 9.3 */
+ unsigned int auxd_refcnt; /* Refcount of auxiliary attaching */
bool has_iotlb_device;
struct list_head devices; /* all devices' list */
+ struct list_head auxd; /* link to device's auxiliary list */
struct iova_domain iovad; /* iova's that belong to this domain */
struct dma_pte *pgd; /* virtual address */
@@ -494,6 +496,11 @@ struct dmar_domain {
2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
u64 max_addr; /* maximum mapped address */
+ int default_pasid; /*
+ * The default pasid used for non-SVM
+ * traffic on mediated devices.
+ */
+
struct iommu_domain domain; /* generic domain data structure for
iommu core */
};
@@ -543,6 +550,9 @@ struct device_domain_info {
struct list_head link; /* link to domain siblings */
struct list_head global; /* link to global list */
struct list_head table; /* link to pasid table */
+ struct list_head auxiliary_domains; /* auxiliary domains
+ * attached to this device
+ */
u8 bus; /* PCI bus number */
u8 devfn; /* PCI devfn number */
u16 pfsid; /* SRIOV physical function source ID */
--
2.17.1
This adds support for returning the default PASID associated
with an auxiliary domain. The PCI device bound to this domain
should use this value as the PASID for all DMA requests from
the subset of the device that is isolated and protected by
this domain.
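A minimal userspace sketch of the query semantics (not kernel code): the
ID is returned only once the domain owns a default PASID, and -EINVAL is
returned otherwise, including for unknown attributes. The model_* and
MODEL_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

enum model_attr {
	MODEL_ATTR_AUXD_ID,	/* models DOMAIN_ATTR_AUXD_ID */
	MODEL_ATTR_OTHER,	/* any attribute this model does not handle */
};

struct model_domain {
	int default_pasid;	/* <= 0 means no auxiliary attachment yet */
};

static int model_domain_get_attr(const struct model_domain *domain,
				 enum model_attr attr, void *data)
{
	int ret = -EINVAL;

	assert(domain && data);

	switch (attr) {
	case MODEL_ATTR_AUXD_ID:
		if (domain->default_pasid > 0) {
			*(int *)data = domain->default_pasid;
			ret = 0;
		}
		break;
	default:
		break;
	}

	return ret;
}
```

A caller should therefore check the return value before programming the
ID into the device; the output buffer is left untouched on failure.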
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
drivers/iommu/intel-iommu.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a61b25ad0d3b..49a278a699b0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5576,6 +5576,27 @@ static int intel_iommu_set_dev_attr(struct device *dev,
return ret;
}
+static int intel_iommu_domain_get_attr(struct iommu_domain *domain,
+ enum iommu_attr attr, void *data)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ int ret = -EINVAL, *id;
+
+ switch (attr) {
+ case DOMAIN_ATTR_AUXD_ID:
+ if (dmar_domain->default_pasid > 0) {
+ ret = 0;
+ id = data;
+ *id = dmar_domain->default_pasid;
+ }
+ break;
+ default:
+ break;
+ }
+
+ return ret;
+}
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.domain_alloc = intel_iommu_domain_alloc,
@@ -5592,6 +5613,7 @@ const struct iommu_ops intel_iommu_ops = {
.get_resv_regions = intel_iommu_get_resv_regions,
.put_resv_regions = intel_iommu_put_resv_regions,
.device_group = pci_device_group,
+ .domain_get_attr = intel_iommu_domain_get_attr,
.get_dev_attr = intel_iommu_get_dev_attr,
.set_dev_attr = intel_iommu_set_dev_attr,
.pgsize_bitmap = INTEL_IOMMU_PGSIZES,
--
2.17.1
A parent device might create different types of mediated
devices. For example, a mediated device could be created
by the parent device with full isolation and protection
provided by the IOMMU. One use case could be found on
Intel platforms where a mediated device is an assignable
subset of a PCI device, and the DMA requests on behalf of
it are all tagged with a PASID. Since the IOMMU supports
PASID-granular translations (scalable mode in vt-d 3.0),
this mediated device could be individually protected and
isolated by the IOMMU.
This patch adds two new members in struct mdev_device:
* iommu_device
- This, if set, indicates that the mediated device could
be fully isolated and protected by IOMMU via attaching
an iommu domain to this device. If empty, it indicates
using vendor defined isolation.
* iommu_domain
- This is a placeholder for an iommu domain. A domain
could be stored here for later use once it has been
attached to the iommu_device of this mdev.
The following helpers are added to set and get the iommu
device and iommu domain pointers.
* mdev_set/get_iommu_device(dev, iommu_device)
- Set or get the iommu device which represents this mdev
in IOMMU's device scope. Drivers don't need to set the
iommu device if they use vendor defined isolation.
* mdev_set/get_iommu_domain(dev, domain)
- An iommu domain which has been attached to the iommu
device in order to protect and isolate the mediated
device will be kept in the mdev data structure and
can be retrieved later.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Suggested-by: Kevin Tian <[email protected]>
Suggested-by: Alex Williamson <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/vfio/mdev/mdev_core.c | 36 ++++++++++++++++++++++++++++++++
drivers/vfio/mdev/mdev_private.h | 2 ++
include/linux/mdev.h | 23 ++++++++++++++++++++
3 files changed, 61 insertions(+)
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0212f0ee8aea..5119809225c5 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -390,6 +390,42 @@ int mdev_device_remove(struct device *dev, bool force_remove)
return 0;
}
+int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
+{
+ struct mdev_device *mdev = to_mdev_device(dev);
+
+ mdev->iommu_device = iommu_device;
+
+ return 0;
+}
+EXPORT_SYMBOL(mdev_set_iommu_device);
+
+struct device *mdev_get_iommu_device(struct device *dev)
+{
+ struct mdev_device *mdev = to_mdev_device(dev);
+
+ return mdev->iommu_device;
+}
+EXPORT_SYMBOL(mdev_get_iommu_device);
+
+int mdev_set_iommu_domain(struct device *dev, void *domain)
+{
+ struct mdev_device *mdev = to_mdev_device(dev);
+
+ mdev->iommu_domain = domain;
+
+ return 0;
+}
+EXPORT_SYMBOL(mdev_set_iommu_domain);
+
+void *mdev_get_iommu_domain(struct device *dev)
+{
+ struct mdev_device *mdev = to_mdev_device(dev);
+
+ return mdev->iommu_domain;
+}
+EXPORT_SYMBOL(mdev_get_iommu_domain);
+
static int __init mdev_init(void)
{
return mdev_bus_register();
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index b5819b7d7ef7..c01518068e84 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -34,6 +34,8 @@ struct mdev_device {
struct list_head next;
struct kobject *type_kobj;
bool active;
+ struct device *iommu_device;
+ void *iommu_domain;
};
#define to_mdev_device(dev) container_of(dev, struct mdev_device, dev)
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index b6e048e1045f..c46777d3e568 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -14,6 +14,29 @@
#define MDEV_H
struct mdev_device;
+struct iommu_domain;
+
+/*
+ * Called by the parent device driver to set the PCI device which represents
+ * this mdev in iommu protection scope. By default, the iommu device is NULL,
+ * that indicates using vendor defined isolation.
+ *
+ * @dev: the mediated device that iommu will isolate.
+ * @iommu_device: a pci device which represents the iommu for @dev.
+ *
+ * Return 0 for success, otherwise negative error value.
+ */
+int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
+
+struct device *mdev_get_iommu_device(struct device *dev);
+
+/*
+ * Called by vfio iommu modules to save the iommu domain after a domain being
+ * attached to the mediated device.
+ */
+int mdev_set_iommu_domain(struct device *dev, void *domain);
+
+void *mdev_get_iommu_domain(struct device *dev);
/**
* struct mdev_parent_ops - Structure to be registered for each parent device to
--
2.17.1
This adds helpers to attach a domain to, or detach it from,
a group. They replace direct calls to iommu_attach_group()
and iommu_detach_group(), which only work for pci devices.
When a domain is attached to a group that contains mediated
devices, it is attached to the iommu device of each mdev
(a pci device which represents the mdev in iommu scope)
instead. The added helpers support attaching a domain to
groups of both pci and mdev devices.
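The mdev branch of the helper can be sketched in userspace as below (not
kernel code). Each mdev in the group records the domain and is then
attached through its iommu device; iteration stops at the first error,
which is how iommu_group_for_each_dev() treats a non-zero callback
return. The model_* types are hypothetical, and the attach counter
merely stands in for the real attach call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the pci device that represents mdevs in iommu scope. */
struct model_parent {
	int attach_count;		/* attachments made through this device */
};

struct model_mdev {
	struct model_parent *iommu_device;	/* NULL: vendor defined isolation */
	const void *iommu_domain;		/* saved for later retrieval */
};

/* Per-device callback: remember the domain, then attach via the parent. */
static int model_mdev_attach_domain(struct model_mdev *mdev, const void *domain)
{
	mdev->iommu_domain = domain;

	if (!mdev->iommu_device)
		return -EINVAL;

	mdev->iommu_device->attach_count++;
	return 0;
}

/* Walk the group and stop at the first device that fails to attach. */
static int model_attach_group(struct model_mdev *devs, size_t nr_devs,
			      const void *domain)
{
	size_t i;
	int ret;

	assert(devs || nr_devs == 0);

	for (i = 0; i < nr_devs; i++) {
		ret = model_mdev_attach_domain(&devs[i], domain);
		if (ret)
			return ret;
	}

	return 0;
}
```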
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 114 ++++++++++++++++++++++++++++++--
1 file changed, 107 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d9fd3188615d..178264b330e7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -91,6 +91,7 @@ struct vfio_dma {
struct vfio_group {
struct iommu_group *iommu_group;
struct list_head next;
+ bool mdev_group; /* An mdev group */
};
/*
@@ -1327,6 +1328,105 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
return ret;
}
+static struct device *vfio_mdev_get_iommu_device(struct device *dev)
+{
+ struct device *(*fn)(struct device *dev);
+ struct device *iommu_parent;
+
+ fn = symbol_get(mdev_get_iommu_device);
+ if (fn) {
+ iommu_parent = fn(dev);
+ symbol_put(mdev_get_iommu_device);
+
+ return iommu_parent;
+ }
+
+ return NULL;
+}
+
+static int vfio_mdev_set_domain(struct device *dev, struct iommu_domain *domain)
+{
+ int (*fn)(struct device *dev, void *domain);
+ int ret;
+
+ fn = symbol_get(mdev_set_iommu_domain);
+ if (fn) {
+ ret = fn(dev, domain);
+ symbol_put(mdev_set_iommu_domain);
+
+ return ret;
+ }
+
+ return -EINVAL;
+}
+
+static int vfio_mdev_attach_domain(struct device *dev, void *data)
+{
+ struct iommu_domain *domain = data;
+ struct device *iommu_device;
+ int ret;
+
+ ret = vfio_mdev_set_domain(dev, domain);
+ if (ret)
+ return ret;
+
+ iommu_device = vfio_mdev_get_iommu_device(dev);
+ if (iommu_device) {
+ bool aux_mode = false;
+
+ iommu_get_dev_attr(iommu_device,
+ IOMMU_DEV_ATTR_AUXD_ENABLED, &aux_mode);
+ if (aux_mode)
+ return iommu_attach_device_aux(domain, iommu_device);
+ else
+ return iommu_attach_device(domain, iommu_device);
+ }
+
+ return -EINVAL;
+}
+
+static int vfio_mdev_detach_domain(struct device *dev, void *data)
+{
+ struct iommu_domain *domain = data;
+ struct device *iommu_device;
+
+ vfio_mdev_set_domain(dev, NULL);
+ iommu_device = vfio_mdev_get_iommu_device(dev);
+ if (iommu_device) {
+ bool aux_mode = false;
+
+ iommu_get_dev_attr(iommu_device,
+ IOMMU_DEV_ATTR_AUXD_ENABLED, &aux_mode);
+ if (aux_mode)
+ iommu_detach_device_aux(domain, iommu_device);
+ else
+ iommu_detach_device(domain, iommu_device);
+ }
+
+ return 0;
+}
+
+static int vfio_iommu_attach_group(struct vfio_domain *domain,
+ struct vfio_group *group)
+{
+ if (group->mdev_group)
+ return iommu_group_for_each_dev(group->iommu_group,
+ domain->domain,
+ vfio_mdev_attach_domain);
+ else
+ return iommu_attach_group(domain->domain, group->iommu_group);
+}
+
+static void vfio_iommu_detach_group(struct vfio_domain *domain,
+ struct vfio_group *group)
+{
+ if (group->mdev_group)
+ iommu_group_for_each_dev(group->iommu_group, domain->domain,
+ vfio_mdev_detach_domain);
+ else
+ iommu_detach_group(domain->domain, group->iommu_group);
+}
+
static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
@@ -1402,7 +1502,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
goto out_domain;
}
- ret = iommu_attach_group(domain->domain, iommu_group);
+ ret = vfio_iommu_attach_group(domain, group);
if (ret)
goto out_domain;
@@ -1434,8 +1534,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
list_for_each_entry(d, &iommu->domain_list, next) {
if (d->domain->ops == domain->domain->ops &&
d->prot == domain->prot) {
- iommu_detach_group(domain->domain, iommu_group);
- if (!iommu_attach_group(d->domain, iommu_group)) {
+ vfio_iommu_detach_group(domain, group);
+ if (!vfio_iommu_attach_group(d, group)) {
list_add(&group->next, &d->group_list);
iommu_domain_free(domain->domain);
kfree(domain);
@@ -1443,7 +1543,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
return 0;
}
- ret = iommu_attach_group(domain->domain, iommu_group);
+ ret = vfio_iommu_attach_group(domain, group);
if (ret)
goto out_domain;
}
@@ -1469,7 +1569,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
return 0;
out_detach:
- iommu_detach_group(domain->domain, iommu_group);
+ vfio_iommu_detach_group(domain, group);
out_domain:
iommu_domain_free(domain->domain);
out_free:
@@ -1560,7 +1660,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
if (!group)
continue;
- iommu_detach_group(domain->domain, iommu_group);
+ vfio_iommu_detach_group(domain, group);
list_del(&group->next);
kfree(group);
/*
@@ -1625,7 +1725,7 @@ static void vfio_release_domain(struct vfio_domain *domain, bool external)
list_for_each_entry_safe(group, group_tmp,
&domain->group_list, next) {
if (!external)
- iommu_detach_group(domain->domain, group->iommu_group);
+ vfio_iommu_detach_group(domain, group);
list_del(&group->next);
kfree(group);
}
--
2.17.1
This adds support for determining the isolation type
of a mediated device group by checking whether it has
an iommu device. If an iommu device exists, an iommu
domain is allocated and then attached to the iommu
device. Otherwise, the existing behavior is kept.
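As a rough illustration (not part of the patch), the isolation-type check described above can be modelled in userspace. The sketch mirrors the logic of the vfio_mdev_iommu_device() callback: every mdev in a group must report the same iommu device, and a NULL result means vendor defined isolation. The struct layouts and the names check_iommu_device() and group_iommu_device() are hypothetical stand-ins, and a plain array replaces iommu_group_for_each_dev().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct device { int id; };
struct mdev { struct device *iommu_device; };

/*
 * Same comparison as the vfio_mdev_iommu_device() callback: fail if this
 * member reports a different iommu device than the one already seen.
 */
static int check_iommu_device(struct mdev *m, struct device **seen)
{
	struct device *cur = m->iommu_device;

	assert(seen != NULL);
	if (*seen && *seen != cur)
		return -EINVAL;

	*seen = cur;
	return 0;
}

/* Walk all members of a group; an array stands in for the iommu group. */
static int group_iommu_device(struct mdev *members, size_t n,
			      struct device **out)
{
	size_t i;

	*out = NULL;
	for (i = 0; i < n; i++) {
		int ret = check_iommu_device(&members[i], out);

		if (ret)
			return ret;
	}

	return 0;
}
```

If *out is still NULL after the walk, the group falls back to the external (vendor defined) domain path; otherwise the bus of the returned device is the one used to allocate the iommu domain.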
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 48 ++++++++++++++++++++++++++++-----
1 file changed, 42 insertions(+), 6 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 178264b330e7..eed26129f58c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1427,13 +1427,40 @@ static void vfio_iommu_detach_group(struct vfio_domain *domain,
iommu_detach_group(domain->domain, group->iommu_group);
}
+static bool vfio_bus_is_mdev(struct bus_type *bus)
+{
+ struct bus_type *mdev_bus;
+ bool ret = false;
+
+ mdev_bus = symbol_get(mdev_bus_type);
+ if (mdev_bus) {
+ ret = (bus == mdev_bus);
+ symbol_put(mdev_bus_type);
+ }
+
+ return ret;
+}
+
+static int vfio_mdev_iommu_device(struct device *dev, void *data)
+{
+ struct device **old = data, *new;
+
+ new = vfio_mdev_get_iommu_device(dev);
+ if (*old && *old != new)
+ return -EINVAL;
+
+ *old = new;
+
+ return 0;
+}
+
static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
struct vfio_iommu *iommu = iommu_data;
struct vfio_group *group;
struct vfio_domain *domain, *d;
- struct bus_type *bus = NULL, *mdev_bus;
+ struct bus_type *bus = NULL;
int ret;
bool resv_msi, msi_remap;
phys_addr_t resv_msi_base;
@@ -1468,11 +1495,18 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (ret)
goto out_free;
- mdev_bus = symbol_get(mdev_bus_type);
+ if (vfio_bus_is_mdev(bus)) {
+ struct device *iommu_device = NULL;
- if (mdev_bus) {
- if ((bus == mdev_bus) && !iommu_present(bus)) {
- symbol_put(mdev_bus_type);
+ group->mdev_group = true;
+
+ /* Determine the isolation type */
+ ret = iommu_group_for_each_dev(iommu_group, &iommu_device,
+ vfio_mdev_iommu_device);
+ if (ret)
+ goto out_free;
+
+ if (!iommu_device) {
if (!iommu->external_domain) {
INIT_LIST_HEAD(&domain->group_list);
iommu->external_domain = domain;
@@ -1482,9 +1516,11 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
list_add(&group->next,
&iommu->external_domain->group_list);
mutex_unlock(&iommu->lock);
+
return 0;
}
- symbol_put(mdev_bus_type);
+
+ bus = iommu_device->bus;
}
domain->domain = iommu_domain_alloc(bus);
--
2.17.1
Sharing a physical PCI device at a finer granularity is
becoming a consensus in the industry, and IOMMU vendors
are making efforts to support such sharing as well as
possible. Among these efforts, the capability to support
finer-granularity DMA isolation is a common requirement
for security reasons. With finer-granularity DMA
isolation, all DMA requests out of or to a subset of a
physical PCI device can be protected by the IOMMU. As a
result, software needs a way to attach multiple domains
to a physical PCI device. One example of such a usage
model is Intel Scalable IOV [1] [2]. The Intel vt-d 3.0
spec [3] introduces the scalable mode, which enables
PASID-granular DMA isolation.
This adds the APIs to support multiple domains per device.
In order to ease the discussions, we call it 'a domain in
auxiliary mode' or simply 'auxiliary domain' when multiple
domains are attached to a physical device.
The APIs include:
* iommu_get_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_CAPABILITY)
- Represents the ability of supporting multiple domains
per device.
* iommu_get_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_ENABLED)
- Checks whether the device identified by @dev is working
in auxiliary mode.
* iommu_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_ENABLE)
- Enables the multiple domains capability for the device
referenced by @dev.
* iommu_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_DISABLE)
- Disables the multiple domains capability for the device
referenced by @dev.
* iommu_attach_device_aux(domain, dev)
- Attaches @domain to @dev in the auxiliary mode. Multiple
domains could be attached to a single device in the
auxiliary mode with each domain representing an isolated
address space for an assignable subset of the device.
* iommu_detach_device_aux(domain, dev)
- Detaches @domain which has been attached to @dev in the
auxiliary mode.
* iommu_domain_get_attr(domain, DOMAIN_ATTR_AUXD_ID)
- Returns the ID used for finer-granularity DMA translation.
For the Intel Scalable IOV usage model, this will be
a PASID. A device which supports Scalable IOV needs
to write this ID to the device register so that DMA
requests can be tagged with the right PASID prefix.
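To make the intended call order concrete, here is a minimal userspace sketch of how a caller might combine the APIs listed above: query the capability, enable auxiliary mode, then attach a domain. Everything in it is a mock. The struct fields, the ID allocation and example_setup_aux_domain() are assumptions for illustration only, and passing NULL as @data to the enable call is a guess, since the patch does not specify it. The bool passed through @data for the get call follows the usage in vfio_iommu_type1.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum iommu_dev_attr {
	IOMMU_DEV_ATTR_AUXD_CAPABILITY,
	IOMMU_DEV_ATTR_AUXD_ENABLED,
	IOMMU_DEV_ATTR_AUXD_ENABLE,
	IOMMU_DEV_ATTR_AUXD_DISABLE,
};

/* Hypothetical stand-ins; the real kernel structures look different. */
struct device {
	bool auxd_capable;	/* hardware supports multiple domains */
	bool auxd_enabled;	/* auxiliary mode is switched on */
	int nr_aux_domains;	/* domains attached in auxiliary mode */
};

struct iommu_domain {
	int auxd_id;		/* e.g. a PASID; -1 while unattached */
};

/* Mock: report a boolean attribute through @data. */
static int iommu_get_dev_attr(struct device *dev, enum iommu_dev_attr attr,
			      void *data)
{
	switch (attr) {
	case IOMMU_DEV_ATTR_AUXD_CAPABILITY:
		*(bool *)data = dev->auxd_capable;
		return 0;
	case IOMMU_DEV_ATTR_AUXD_ENABLED:
		*(bool *)data = dev->auxd_enabled;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Mock: switch auxiliary mode on or off; @data is unused here. */
static int iommu_set_dev_attr(struct device *dev, enum iommu_dev_attr attr,
			      void *data)
{
	(void)data;
	switch (attr) {
	case IOMMU_DEV_ATTR_AUXD_ENABLE:
		if (!dev->auxd_capable)
			return -ENODEV;
		dev->auxd_enabled = true;
		return 0;
	case IOMMU_DEV_ATTR_AUXD_DISABLE:
		dev->auxd_enabled = false;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Mock: each attached domain gets its own made-up ID. */
static int iommu_attach_device_aux(struct iommu_domain *domain,
				   struct device *dev)
{
	assert(domain != NULL && dev != NULL);
	if (!dev->auxd_enabled)
		return -ENODEV;

	domain->auxd_id = 100 + dev->nr_aux_domains++;
	return 0;
}

/* Hypothetical caller: capability check, enable, then attach. */
static int example_setup_aux_domain(struct device *dev,
				    struct iommu_domain *domain)
{
	bool capable = false;
	int ret;

	ret = iommu_get_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_CAPABILITY, &capable);
	if (ret)
		return ret;
	if (!capable)
		return -ENODEV;

	ret = iommu_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_ENABLE, NULL);
	if (ret)
		return ret;

	return iommu_attach_device_aux(domain, dev);
}
```

In the mock, two domains attached to the same device end up with distinct IDs, which is the property the parent driver relies on when it programs the ID into the device.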
Many people were involved in the discussions of this design:
Kevin Tian <[email protected]>
Liu Yi L <[email protected]>
Ashok Raj <[email protected]>
Sanjay Kumar <[email protected]>
Jacob Pan <[email protected]>
Alex Williamson <[email protected]>
Jean-Philippe Brucker <[email protected]>
and some discussions can be found here [4].
[1] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
[3] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[4] https://lkml.org/lkml/2018/7/26/4
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Suggested-by: Kevin Tian <[email protected]>
Suggested-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/iommu.c | 52 +++++++++++++++++++++++++++++++++++++++++++
include/linux/iommu.h | 52 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 104 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index edbdf5d6962c..0b7c96d1425e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2030,3 +2030,55 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
return 0;
}
EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
+
+/*
+ * Generic interfaces to get or set per-device IOMMU attributes.
+ */
+int iommu_get_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+ if (ops && ops->get_dev_attr)
+ return ops->get_dev_attr(dev, attr, data);
+
+ return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(iommu_get_dev_attr);
+
+int iommu_set_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+ if (ops && ops->set_dev_attr)
+ return ops->set_dev_attr(dev, attr, data);
+
+ return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(iommu_set_dev_attr);
+
+/*
+ * APIs to attach/detach a domain to/from a device in the
+ * auxiliary mode.
+ */
+int iommu_attach_device_aux(struct iommu_domain *domain, struct device *dev)
+{
+ int ret = -ENODEV;
+
+ if (domain->ops->attach_dev_aux)
+ ret = domain->ops->attach_dev_aux(domain, dev);
+
+ if (!ret)
+ trace_attach_device_to_domain(dev);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_attach_device_aux);
+
+void iommu_detach_device_aux(struct iommu_domain *domain, struct device *dev)
+{
+ if (domain->ops->detach_dev_aux) {
+ domain->ops->detach_dev_aux(domain, dev);
+ trace_detach_device_from_domain(dev);
+ }
+}
+EXPORT_SYMBOL_GPL(iommu_detach_device_aux);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a1d28f42cb77..9bf1b3f2457a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,7 @@ enum iommu_attr {
DOMAIN_ATTR_NESTING, /* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+ DOMAIN_ATTR_AUXD_ID,
DOMAIN_ATTR_MAX,
};
/* These are the possible reserved region types */
@@ -156,6 +157,14 @@ struct iommu_resv_region {
enum iommu_resv_type type;
};
+/* Per-device IOMMU attributes */
+enum iommu_dev_attr {
+ IOMMU_DEV_ATTR_AUXD_CAPABILITY,
+ IOMMU_DEV_ATTR_AUXD_ENABLED,
+ IOMMU_DEV_ATTR_AUXD_ENABLE,
+ IOMMU_DEV_ATTR_AUXD_DISABLE,
+};
+
#ifdef CONFIG_IOMMU_API
/**
@@ -183,6 +192,8 @@ struct iommu_resv_region {
* @domain_window_enable: Configure and enable a particular window for a domain
* @domain_window_disable: Disable a particular window for a domain
* @of_xlate: add OF master IDs to iommu grouping
+ * @get_dev_attr: get per-device IOMMU attributes
+ * @set_dev_attr: set per-device IOMMU attributes
* @pgsize_bitmap: bitmap of all possible supported page sizes
*/
struct iommu_ops {
@@ -226,6 +237,15 @@ struct iommu_ops {
int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
+ /* Get/set per-device IOMMU attributes */
+ int (*get_dev_attr)(struct device *dev,
+ enum iommu_dev_attr attr, void *data);
+ int (*set_dev_attr)(struct device *dev,
+ enum iommu_dev_attr attr, void *data);
+ /* Attach/detach aux domain */
+ int (*attach_dev_aux)(struct iommu_domain *domain, struct device *dev);
+ void (*detach_dev_aux)(struct iommu_domain *domain, struct device *dev);
+
unsigned long pgsize_bitmap;
};
@@ -398,6 +418,16 @@ void iommu_fwspec_free(struct device *dev);
int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
+int iommu_get_dev_attr(struct device *dev,
+ enum iommu_dev_attr attr, void *data);
+int iommu_set_dev_attr(struct device *dev,
+ enum iommu_dev_attr attr, void *data);
+
+extern int iommu_attach_device_aux(struct iommu_domain *domain,
+ struct device *dev);
+extern void iommu_detach_device_aux(struct iommu_domain *domain,
+ struct device *dev);
+
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -682,6 +712,28 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
return NULL;
}
+static inline int
+iommu_get_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
+{
+ return -EINVAL;
+}
+
+static inline int
+iommu_set_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
+{
+ return -EINVAL;
+}
+
+static inline int
+iommu_attach_device_aux(struct iommu_domain *domain, struct device *dev)
+{
+ return -ENODEV;
+}
+
+static inline void
+iommu_detach_device_aux(struct iommu_domain *domain, struct device *dev)
+{
+}
#endif /* CONFIG_IOMMU_API */
#ifdef CONFIG_IOMMU_DEBUGFS
--
2.17.1
Please use EXPORT_SYMBOL_GPL like most of the vfio code.
Hi,
On 11/5/18 10:51 PM, Christoph Hellwig wrote:
> Please use EXPORT_SYMBOL_GPL like most of the vfio code.
>
Sure. Will use this in the next version.
Best regards,
Lu Baolu
On Mon, 5 Nov 2018 15:34:06 +0800
Lu Baolu <[email protected]> wrote:
> A parent device might create different types of mediated
> devices. For example, a mediated device could be created
> by the parent device with full isolation and protection
> provided by the IOMMU. One usage case could be found on
> Intel platforms where a mediated device is an assignable
> subset of a PCI, the DMA requests on behalf of it are all
> tagged with a PASID. Since IOMMU supports PASID-granular
> translations (scalable mode in vt-d 3.0), this mediated
> device could be individually protected and isolated by an
> IOMMU.
>
> This patch adds two new members in struct mdev_device:
> * iommu_device
> - This, if set, indicates that the mediated device could
> be fully isolated and protected by IOMMU via attaching
> an iommu domain to this device. If empty, it indicates
> using vendor defined isolation.
>
> * iommu_domain
> - This is a place holder for an iommu domain. A domain
> could be store here for later use once it has been
> attached to the iommu_device of this mdev.
>
> Below helpers are added to set and get above iommu device
> and iommu domain pointers.
>
> * mdev_set/get_iommu_device(dev, iommu_device)
> - Set or get the iommu device which represents this mdev
> in IOMMU's device scope. Drivers don't need to set the
> iommu device if it uses vendor defined isolation.
>
> * mdev_set/get_iommu_domain(domain)
> - A iommu domain which has been attached to the iommu
> device in order to protect and isolate the mediated
> device will be kept in the mdev data structure and
> could be retrieved later.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Suggested-by: Kevin Tian <[email protected]>
> Suggested-by: Alex Williamson <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> drivers/vfio/mdev/mdev_core.c | 36 ++++++++++++++++++++++++++++++++
> drivers/vfio/mdev/mdev_private.h | 2 ++
> include/linux/mdev.h | 23 ++++++++++++++++++++
> 3 files changed, 61 insertions(+)
>
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 0212f0ee8aea..5119809225c5 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -390,6 +390,42 @@ int mdev_device_remove(struct device *dev, bool force_remove)
> return 0;
> }
>
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + mdev->iommu_device = iommu_device;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(mdev_set_iommu_device);
> +
> +struct device *mdev_get_iommu_device(struct device *dev)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + return mdev->iommu_device;
> +}
> +EXPORT_SYMBOL(mdev_get_iommu_device);
> +
> +int mdev_set_iommu_domain(struct device *dev, void *domain)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + mdev->iommu_domain = domain;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(mdev_set_iommu_domain);
> +
> +void *mdev_get_iommu_domain(struct device *dev)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + return mdev->iommu_domain;
> +}
> +EXPORT_SYMBOL(mdev_get_iommu_domain);
> +
> static int __init mdev_init(void)
> {
> return mdev_bus_register();
> diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
> index b5819b7d7ef7..c01518068e84 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -34,6 +34,8 @@ struct mdev_device {
> struct list_head next;
> struct kobject *type_kobj;
> bool active;
> + struct device *iommu_device;
> + void *iommu_domain;
> };
>
> #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev)
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index b6e048e1045f..c46777d3e568 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -14,6 +14,29 @@
> #define MDEV_H
>
> struct mdev_device;
> +struct iommu_domain;
> +
> +/*
> + * Called by the parent device driver to set the PCI device which represents
s/PCI //
There is no requirement or expectation that the device is PCI.
> + * this mdev in iommu protection scope. By default, the iommu device is NULL,
> + * that indicates using vendor defined isolation.
> + *
> + * @dev: the mediated device that iommu will isolate.
> + * @iommu_device: a pci device which represents the iommu for @dev.
> + *
> + * Return 0 for success, otherwise negative error value.
> + */
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
> +
> +struct device *mdev_get_iommu_device(struct device *dev);
> +
> +/*
> + * Called by vfio iommu modules to save the iommu domain after a domain being
> + * attached to the mediated device.
> + */
> +int mdev_set_iommu_domain(struct device *dev, void *domain);
> +
> +void *mdev_get_iommu_domain(struct device *dev);
I can't say I really understand the purpose of this, the cover letter
indicates this is a placeholder, should we add it separately when we
have a requirement for it? Thanks,
Alex
Hi Alex,
On 11/7/18 7:53 AM, Alex Williamson wrote:
> On Mon, 5 Nov 2018 15:34:06 +0800
> Lu Baolu <[email protected]> wrote:
>
>> +/*
>> + * Called by the parent device driver to set the PCI device which represents
>
> s/PCI //
>
> There is no requirement or expectation that the device is PCI.
>
Fair enough.
>> + * this mdev in iommu protection scope. By default, the iommu device is NULL,
>> + * that indicates using vendor defined isolation.
>> + *
>> + * @dev: the mediated device that iommu will isolate.
>> + * @iommu_device: a pci device which represents the iommu for @dev.
>> + *
>> + * Return 0 for success, otherwise negative error value.
>> + */
>> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
>> +
>> +struct device *mdev_get_iommu_device(struct device *dev);
>> +
>> +/*
>> + * Called by vfio iommu modules to save the iommu domain after a domain being
>> + * attached to the mediated device.
>> + */
>> +int mdev_set_iommu_domain(struct device *dev, void *domain);
>> +
>> +void *mdev_get_iommu_domain(struct device *dev);
>
> I can't say I really understand the purpose of this, the cover letter
> indicates this is a placeholder, should we add it separately when we
> have a requirement for it?
Oh, I am sorry that I used the wrong word. It's not a placeholder for
something designed for the future; the patch adds two members that will
be used in the following patches. Since they will be used in other
modules (like vfio_iommu), we need function interfaces to get and set them.
mdev->iommu_device:
- This, if set, indicates that the mediated device could
be fully isolated and protected by IOMMU via attaching
an iommu domain to this device. If empty, it indicates
using vendor defined isolation.
mdev->iommu_domain:
- This is used to save the pointer of an iommu domain. Once
a domain has been attached to the iommu_device, it should
be stored here.
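For readers following along, a small userspace sketch of how the iommu_device member is meant to be used: the parent driver sets it, and another module reads it back to decide between IOMMU isolation and vendor defined isolation. The struct layouts are simplified stand-ins and container_of() is re-implemented here with offsetof(); in the kernel these come from the mdev and driver core headers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct device { const char *name; };

struct mdev_device {
	struct device dev;
	struct device *iommu_device;	/* NULL: vendor defined isolation */
	void *iommu_domain;
};

/* Userspace re-implementation of the kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define to_mdev_device(d) container_of(d, struct mdev_device, dev)

/* Called by the parent driver when the mdev can be isolated by the IOMMU. */
static int mdev_set_iommu_device(struct device *dev,
				 struct device *iommu_device)
{
	assert(dev != NULL);
	to_mdev_device(dev)->iommu_device = iommu_device;

	return 0;
}

/* Called by a consumer such as a vfio iommu backend. */
static struct device *mdev_get_iommu_device(struct device *dev)
{
	return to_mdev_device(dev)->iommu_device;
}
```

A consumer treats a NULL return as "use vendor defined isolation" and anything else as the device to attach the iommu domain to.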
Sorry for the confusion.
Best regards,
Lu Baolu
On 11/7/2018 7:18 AM, Lu Baolu wrote:
> Hi Alex,
>
> On 11/7/18 7:53 AM, Alex Williamson wrote:
>> On Mon, 5 Nov 2018 15:34:06 +0800
>> Lu Baolu <[email protected]> wrote:
>>
>>> +/*
>>> + * Called by vfio iommu modules to save the iommu domain after a
>>> domain being
>>> + * attached to the mediated device.
>>> + */
>>> +int mdev_set_iommu_domain(struct device *dev, void *domain);
>>> +
>>> +void *mdev_get_iommu_domain(struct device *dev);
>>
>> I can't say I really understand the purpose of this, the cover letter
>> indicates this is a placeholder, should we add it separately when we
>> have a requirement for it?
>
> Oh, I am sorry that I used a wrong word. It's not a placeholder for
> something designed for future, but adding two members that will be used
> in the following patches. Since they will be used in anther modules
> (like vfio_iommu), we need function interfaces to get and set them.
>
> mdev->iommu_device:
> - This, if set, indicates that the mediated device could
>   be fully isolated and protected by IOMMU via attaching
>   an iommu domain to this device. If empty, it indicates
>   using vendor defined isolation.
>
> mdev->iommu_domain:
> - This is used to save the pointer of an iommu domain. Once
>   a domain has been attached to the iommu_device, it should
>   be stored here.
>
I don't see mdev->iommu_domain being used anywhere in this patch series.
If it is not used, there is no need to save it, and the
mdev_set/get_iommu_domain(domain) symbols are not required.
Please keep the mdev_set/get_iommu_device(dev, iommu_device) symbols
non-GPL, the same as the other symbols exported from the mdev_core module.
Thanks,
Kirti
Hi,
On 11/16/18 5:31 AM, Kirti Wankhede wrote:
>
>
> On 11/7/2018 7:18 AM, Lu Baolu wrote:
>> Hi Alex,
>>
>> On 11/7/18 7:53 AM, Alex Williamson wrote:
>>> On Mon, 5 Nov 2018 15:34:06 +0800
>>> Lu Baolu <[email protected]> wrote:
>>>
>>>> A parent device might create different types of mediated
>>>> devices. For example, a mediated device could be created
>>>> by the parent device with full isolation and protection
>>>> provided by the IOMMU. One usage case could be found on
>>>> Intel platforms where a mediated device is an assignable
>>>> subset of a PCI, the DMA requests on behalf of it are all
>>>> tagged with a PASID. Since IOMMU supports PASID-granular
>>>> translations (scalable mode in vt-d 3.0), this mediated
>>>> device could be individually protected and isolated by an
>>>> IOMMU.
>>>>
>>>> This patch adds two new members in struct mdev_device:
>>>> * iommu_device
>>>> Â Â - This, if set, indicates that the mediated device could
>>>> Â Â Â Â be fully isolated and protected by IOMMU via attaching
>>>> Â Â Â Â an iommu domain to this device. If empty, it indicates
>>>> Â Â Â Â using vendor defined isolation.
>>>>
>>>> * iommu_domain
>>>> Â Â - This is a place holder for an iommu domain. A domain
>>>> Â Â Â Â could be store here for later use once it has been
>>>> Â Â Â Â attached to the iommu_device of this mdev.
>>>>
>>>> Below helpers are added to set and get above iommu device
>>>> and iommu domain pointers.
>>>>
>>>> * mdev_set/get_iommu_device(dev, iommu_device)
>>>> Â Â - Set or get the iommu device which represents this mdev
>>>> Â Â Â Â in IOMMU's device scope. Drivers don't need to set the
>>>> Â Â Â Â iommu device if it uses vendor defined isolation.
>>>>
>>>> * mdev_set/get_iommu_domain(domain)
>>>> Â Â - A iommu domain which has been attached to the iommu
>>>> Â Â Â Â device in order to protect and isolate the mediated
>>>> Â Â Â Â device will be kept in the mdev data structure and
>>>> Â Â Â Â could be retrieved later.
>>>>
>>>> Cc: Ashok Raj <[email protected]>
>>>> Cc: Jacob Pan <[email protected]>
>>>> Cc: Kevin Tian <[email protected]>
>>>> Cc: Liu Yi L <[email protected]>
>>>> Suggested-by: Kevin Tian <[email protected]>
>>>> Suggested-by: Alex Williamson <[email protected]>
>>>> Signed-off-by: Lu Baolu <[email protected]>
>>>> ---
>>>>  drivers/vfio/mdev/mdev_core.c    | 36 ++++++++++++++++++++++++++++++++
>>>>  drivers/vfio/mdev/mdev_private.h |  2 ++
>>>>  include/linux/mdev.h             | 23 ++++++++++++++++++++
>>>>  3 files changed, 61 insertions(+)
>>>>
>>>> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
>>>> index 0212f0ee8aea..5119809225c5 100644
>>>> --- a/drivers/vfio/mdev/mdev_core.c
>>>> +++ b/drivers/vfio/mdev/mdev_core.c
>>>> @@ -390,6 +390,42 @@ int mdev_device_remove(struct device *dev, bool force_remove)
>>>>      return 0;
>>>>  }
>>>>
>>>> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
>>>> +{
>>>> +    struct mdev_device *mdev = to_mdev_device(dev);
>>>> +
>>>> +    mdev->iommu_device = iommu_device;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(mdev_set_iommu_device);
>>>> +
>>>> +struct device *mdev_get_iommu_device(struct device *dev)
>>>> +{
>>>> +    struct mdev_device *mdev = to_mdev_device(dev);
>>>> +
>>>> +    return mdev->iommu_device;
>>>> +}
>>>> +EXPORT_SYMBOL(mdev_get_iommu_device);
>>>> +
>>>> +int mdev_set_iommu_domain(struct device *dev, void *domain)
>>>> +{
>>>> +    struct mdev_device *mdev = to_mdev_device(dev);
>>>> +
>>>> +    mdev->iommu_domain = domain;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(mdev_set_iommu_domain);
>>>> +
>>>> +void *mdev_get_iommu_domain(struct device *dev)
>>>> +{
>>>> +    struct mdev_device *mdev = to_mdev_device(dev);
>>>> +
>>>> +    return mdev->iommu_domain;
>>>> +}
>>>> +EXPORT_SYMBOL(mdev_get_iommu_domain);
>>>> +
>>>>  static int __init mdev_init(void)
>>>>  {
>>>>      return mdev_bus_register();
>>>> diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
>>>> index b5819b7d7ef7..c01518068e84 100644
>>>> --- a/drivers/vfio/mdev/mdev_private.h
>>>> +++ b/drivers/vfio/mdev/mdev_private.h
>>>> @@ -34,6 +34,8 @@ struct mdev_device {
>>>>      struct list_head next;
>>>>      struct kobject *type_kobj;
>>>>      bool active;
>>>> +    struct device *iommu_device;
>>>> +    void *iommu_domain;
>>>>  };
>>>>
>>>>  #define to_mdev_device(dev)    container_of(dev, struct mdev_device, dev)
>>>> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
>>>> index b6e048e1045f..c46777d3e568 100644
>>>> --- a/include/linux/mdev.h
>>>> +++ b/include/linux/mdev.h
>>>> @@ -14,6 +14,29 @@
>>>>  #define MDEV_H
>>>>
>>>>  struct mdev_device;
>>>> +struct iommu_domain;
>>>> +
>>>> +/*
>>>> + * Called by the parent device driver to set the PCI device which represents
>>>
>>> s/PCI //
>>>
>>> There is no requirement or expectation that the device is PCI.
>>>
>>
>> Fair enough.
>>
>>>> + * this mdev in iommu protection scope. By default, the iommu device is NULL,
>>>> + * that indicates using vendor defined isolation.
>>>> + *
>>>> + * @dev: the mediated device that iommu will isolate.
>>>> + * @iommu_device: a pci device which represents the iommu for @dev.
>>>> + *
>>>> + * Return 0 for success, otherwise negative error value.
>>>> + */
>>>> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
>>>> +
>>>> +struct device *mdev_get_iommu_device(struct device *dev);
>>>> +
>>>> +/*
>>>> + * Called by vfio iommu modules to save the iommu domain after a domain being
>>>> + * attached to the mediated device.
>>>> + */
>>>> +int mdev_set_iommu_domain(struct device *dev, void *domain);
>>>> +
>>>> +void *mdev_get_iommu_domain(struct device *dev);
>>>
>>> I can't say I really understand the purpose of this, the cover letter
>>> indicates this is a placeholder, should we add it separately when we
>>> have a requirement for it?
>>
>> Oh, I am sorry that I used the wrong word. It's not a placeholder for
>> something designed for the future; it adds two members that will be used
>> in the following patches. Since they will be used in other modules
>> (like vfio_iommu), we need function interfaces to get and set them.
>>
>> mdev->iommu_device:
>>   - This, if set, indicates that the mediated device could
>>     be fully isolated and protected by IOMMU via attaching
>>     an iommu domain to this device. If empty, it indicates
>>     using vendor defined isolation.
>>
>> mdev->iommu_domain:
>>   - This is used to save the pointer of an iommu domain. Once
>>     a domain has been attached to the iommu_device, it should
>>     be stored here.
>>
>
> I don't see mdev->iommu_domain is used anywhere in this series of patch.
> If this is not being used, then no need to save it. With that symbols
> mdev_set/get_iommu_domain(domain) are not required.
Yes. We won't use mdev->iommu_domain in this patch series. It is meant
to be used by the mdev parent driver to retrieve the default pasid of
the domain. Something like:
domain = mdev_get_iommu_domain(dev);
ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_AUXD_ID, &pasid);
if (!ret)
    reg_write(pasid_reg, pasid);
I am okay if we remove mdev_set/get_iommu_domain from this patch series
and add it later when the parent driver comes.
>
> Please keep symbols mdev_set/get_iommu_device(dev, iommu_device) non-GPL
> same as other exported symbols from mdev_core module.
Yes. It will be fixed in the next version.
>
> Thanks,
> Kirti
>
Best regards,
Lu Baolu
On Fri, Nov 16, 2018 at 09:20:48AM +0800, Lu Baolu wrote:
> > Please keep symbols mdev_set/get_iommu_device(dev, iommu_device) non-GPL
> > same as other exported symbols from mdev_core module.
>
> Yes. It will be fixed in the next version.
No. mdev shall not be used to circumvent the exports in the generic
vfio code.
Hi,
On 11/16/18 4:57 PM, Christoph Hellwig wrote:
> On Fri, Nov 16, 2018 at 09:20:48AM +0800, Lu Baolu wrote:
>>> Please keep symbols mdev_set/get_iommu_device(dev, iommu_device) non-GPL
>>> same as other exported symbols from mdev_core module.
>>
>> Yes. It will be fixed in the next version.
>
> No. mdev shall not be used to circumvent the exports in the generic
> vfio code.
Got it now. Thanks a lot.
Best regards,
Lu Baolu
On 11/16/2018 2:27 PM, Christoph Hellwig wrote:
> On Fri, Nov 16, 2018 at 09:20:48AM +0800, Lu Baolu wrote:
>>> Please keep symbols mdev_set/get_iommu_device(dev, iommu_device) non-GPL
>>> same as other exported symbols from mdev_core module.
>>
>> Yes. It will be fixed in the next version.
>
> No. mdev shall not be used to circumvent the exports in the generic
> vfio code.
>
It is about how the mdev framework can be used by existing drivers. These
symbols don't use any other exported symbols.
Thanks,
Kirti
On Wed, Nov 21, 2018 at 02:22:08AM +0530, Kirti Wankhede wrote:
> It is about how the mdev framework can be used by existing drivers. These
> symbols don't use any other exported symbols.
That is an unfortunate accident of history, but it doesn't extend to new
ones. It is also another indicator that those drivers are probably derived
works of the Linux kernel and might be in legal trouble one way or
another.
Hi Lu,
On 11/5/18 8:34 AM, Lu Baolu wrote:
> When multiple domains per device has been enabled by the
> device driver, the device will tag the default PASID for
> the domain to all DMA traffics out of the subset of this
> device; and the IOMMU should translate the DMA requests
> in PASID granularity.
>
> This extends the intel_iommu_attach/detach_device() ops
> to support managing PASID granular translation structures
> when the device driver has enabled multiple domains per
> device.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 192 +++++++++++++++++++++++++++++++-----
> include/linux/intel-iommu.h | 10 ++
> 2 files changed, 180 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 2c86ac71c774..a61b25ad0d3b 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2477,6 +2477,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
> info->iommu = iommu;
> info->pasid_table = NULL;
> info->auxd_enabled = 0;
> + INIT_LIST_HEAD(&info->auxiliary_domains);
>
> if (dev && dev_is_pci(dev)) {
> struct pci_dev *pdev = to_pci_dev(info->dev);
> @@ -5010,35 +5011,134 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
> domain_exit(to_dmar_domain(domain));
> }
>
> -static int intel_iommu_attach_device(struct iommu_domain *domain,
> - struct device *dev)
> +/*
> + * Check whether a @domain will be attached to the @dev in the
> + * auxiliary mode.
> + */
> +static inline bool
> +is_device_attach_aux_domain(struct device *dev, struct iommu_domain *domain)
> {
> - struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> - struct intel_iommu *iommu;
> - int addr_width;
> - u8 bus, devfn;
> + struct device_domain_info *info = dev->archdata.iommu;
>
> - if (device_is_rmrr_locked(dev)) {
> - dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
> - return -EPERM;
> - }
> + return info && info->auxd_enabled &&
> + domain->type == IOMMU_DOMAIN_UNMANAGED;
> +}
>
> - /* normally dev is not mapped */
> - if (unlikely(domain_context_mapped(dev))) {
> - struct dmar_domain *old_domain;
> +static void auxiliary_link_device(struct dmar_domain *domain,
> + struct device *dev)
> +{
> + struct device_domain_info *info = dev->archdata.iommu;
>
> - old_domain = find_domain(dev);
> - if (old_domain) {
> - rcu_read_lock();
> - dmar_remove_one_dev_info(old_domain, dev);
> - rcu_read_unlock();
> + assert_spin_locked(&device_domain_lock);
> + if (WARN_ON(!info))
> + return;
>
> - if (!domain_type_is_vm_or_si(old_domain) &&
> - list_empty(&old_domain->devices))
> - domain_exit(old_domain);
> + domain->auxd_refcnt++;
> + list_add(&domain->auxd, &info->auxiliary_domains);
> +}
> +
> +static void auxiliary_unlink_device(struct dmar_domain *domain,
> + struct device *dev)
> +{
> + struct device_domain_info *info = dev->archdata.iommu;
> +
> + assert_spin_locked(&device_domain_lock);
> + if (WARN_ON(!info))
> + return;
> +
> + list_del(&domain->auxd);
> + domain->auxd_refcnt--;
> +
> + if (!domain->auxd_refcnt && domain->default_pasid > 0)
> + intel_pasid_free_id(domain->default_pasid);
> +}
> +
> +static int domain_add_dev_auxd(struct dmar_domain *domain,
> + struct device *dev)
> +{
> + int ret;
> + u8 bus, devfn;
> + unsigned long flags;
> + struct intel_iommu *iommu;
> +
> + iommu = device_to_iommu(dev, &bus, &devfn);
> + if (!iommu)
> + return -ENODEV;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + if (domain->default_pasid <= 0) {
> + domain->default_pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> + pci_max_pasids(to_pci_dev(dev)), GFP_ATOMIC);
> + if (domain->default_pasid < 0) {
> + pr_err("Can't allocate default pasid\n");
> + ret = -ENODEV;
> + goto pasid_failed;
> }
> }
>
> + spin_lock(&iommu->lock);
You may comment your nested lock policy somewhere.
> + ret = domain_attach_iommu(domain, iommu);
> + if (ret)
> + goto attach_failed;
> +
> + /* Setup the PASID entry for mediated devices: */
> + ret = intel_pasid_setup_second_level(iommu, domain, dev,
> + domain->default_pasid);
> + if (ret)
> + goto table_failed;
> + spin_unlock(&iommu->lock);
> +
> + auxiliary_link_device(domain, dev);
> +
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> + return 0;
> +
> +table_failed:
> + domain_detach_iommu(domain, iommu);
> +attach_failed:
> + spin_unlock(&iommu->lock);
> + if (!domain->auxd_refcnt && domain->default_pasid > 0)
> + intel_pasid_free_id(domain->default_pasid);
> +pasid_failed:
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> + return ret;
> +}
> +
> +static void domain_remove_dev_aux(struct dmar_domain *domain,
> + struct device *dev)
> +{
> + struct device_domain_info *info;
> + struct intel_iommu *iommu;
> + unsigned long flags;
> +
> + if (!is_device_attach_aux_domain(dev, &domain->domain))
> + return;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + info = dev->archdata.iommu;
> + iommu = info->iommu;
> +
> + intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid);
> +
> + auxiliary_unlink_device(domain, dev);
> +
> + spin_lock(&iommu->lock);
> + domain_detach_iommu(domain, iommu);
> + spin_unlock(&iommu->lock);
> +
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> +}
> +
> +static int __intel_iommu_attach_device(struct iommu_domain *domain,
> + struct device *dev)
> +{
Maybe introducing __intel_iommu_attach_device in a patch prior to that
one would help the review.
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct intel_iommu *iommu;
> + int addr_width;
> + u8 bus, devfn;
> +
> iommu = device_to_iommu(dev, &bus, &devfn);
> if (!iommu)
> return -ENODEV;
> @@ -5071,7 +5171,47 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
> dmar_domain->agaw--;
> }
>
> - return domain_add_dev_info(dmar_domain, dev);
> + if (is_device_attach_aux_domain(dev, domain))
> + return domain_add_dev_auxd(dmar_domain, dev);
Why not put this directly into intel_iommu_attach_device_aux()?
> + else
> + return domain_add_dev_info(dmar_domain, dev);
And this into intel_iommu_attach_device(), as
__intel_iommu_attach_device() is the common part now?
> +}
> +
> +static int intel_iommu_attach_device(struct iommu_domain *domain,
> + struct device *dev)
> +{
> + if (device_is_rmrr_locked(dev)) {
> + dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
> + return -EPERM;
> + }
Shouldn't we test this in the common part (i.e. in
__intel_iommu_attach_device)? Doesn't RMRR also impact aux domains?
> +
> + if (is_device_attach_aux_domain(dev, domain))
> + return -EPERM;
> +
> + /* normally dev is not mapped */
> + if (unlikely(domain_context_mapped(dev))) {
> + struct dmar_domain *old_domain;
> +
> + old_domain = find_domain(dev);
> + if (old_domain) {
> + rcu_read_lock();
> + dmar_remove_one_dev_info(old_domain, dev);
> + rcu_read_unlock();
> +
> + if (!domain_type_is_vm_or_si(old_domain) &&
> + list_empty(&old_domain->devices))
> + domain_exit(old_domain);
> + }
> + }
> +
> + return __intel_iommu_attach_device(domain, dev);
> +}
> +
> +static int intel_iommu_attach_device_aux(struct iommu_domain *domain,
> + struct device *dev)
> +{
> + return is_device_attach_aux_domain(dev, domain) ?
> + __intel_iommu_attach_device(domain, dev) : -EPERM;
> }
>
> static void intel_iommu_detach_device(struct iommu_domain *domain,
> @@ -5080,6 +5220,12 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
> dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
> }
>
> +static void intel_iommu_detach_device_aux(struct iommu_domain *domain,
> + struct device *dev)
> +{
> + domain_remove_dev_aux(to_dmar_domain(domain), dev);
> +}
> +
> static int intel_iommu_map(struct iommu_domain *domain,
> unsigned long iova, phys_addr_t hpa,
> size_t size, int iommu_prot)
> @@ -5436,6 +5582,8 @@ const struct iommu_ops intel_iommu_ops = {
> .domain_free = intel_iommu_domain_free,
> .attach_dev = intel_iommu_attach_device,
> .detach_dev = intel_iommu_detach_device,
> + .attach_dev_aux = intel_iommu_attach_device_aux,
> + .detach_dev_aux = intel_iommu_detach_device_aux,
> .map = intel_iommu_map,
> .unmap = intel_iommu_unmap,
> .iova_to_phys = intel_iommu_iova_to_phys,
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 6b198e13e75e..678c7fb05e74 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -473,9 +473,11 @@ struct dmar_domain {
> /* Domain ids per IOMMU. Use u16 since
> * domain ids are 16 bit wide according
> * to VT-d spec, section 9.3 */
> + unsigned int auxd_refcnt; /* Refcount of auxiliary attaching */
>
> bool has_iotlb_device;
> struct list_head devices; /* all devices' list */
> + struct list_head auxd; /* link to device's auxiliary list */
> struct iova_domain iovad; /* iova's that belong to this domain */
>
> struct dma_pte *pgd; /* virtual address */
> @@ -494,6 +496,11 @@ struct dmar_domain {
> 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
> u64 max_addr; /* maximum mapped address */
>
> + int default_pasid; /*
> + * The default pasid used for non-SVM
> + * traffic on mediated devices.
> + */
> +
> struct iommu_domain domain; /* generic domain data structure for
> iommu core */
> };
> @@ -543,6 +550,9 @@ struct device_domain_info {
> struct list_head link; /* link to domain siblings */
> struct list_head global; /* link to global list */
> struct list_head table; /* link to pasid table */
> + struct list_head auxiliary_domains; /* auxiliary domains
> + * attached to this device
> + */
> u8 bus; /* PCI bus number */
> u8 devfn; /* PCI devfn number */
> u16 pfsid; /* SRIOV physical function source ID */
>
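To make the split suggested in the inline comments more concrete, here is
a rough user-space sketch. It is not the driver code: the structures are
hypothetical stand-ins, attach_common() plays the role of
__intel_iommu_attach_device(), and add_dev_info()/add_dev_auxd() are stubs
for domain_add_dev_info()/domain_add_dev_auxd(). It also assumes the RMRR
check belongs in the common part, which is still an open question above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures (not the real layouts). */
struct device {
    bool auxd_enabled;      /* mirrors info->auxd_enabled */
    bool rmrr_locked;       /* mirrors device_is_rmrr_locked() */
    int primary_attached;   /* counts primary-domain attaches */
    int aux_attached;       /* counts auxiliary-domain attaches */
};

struct iommu_domain {
    bool unmanaged;         /* mirrors type == IOMMU_DOMAIN_UNMANAGED */
};

static bool is_device_attach_aux_domain(struct device *dev,
                                        struct iommu_domain *domain)
{
    return dev->auxd_enabled && domain->unmanaged;
}

/* Common part: checks shared by both attach flavours. */
static int attach_common(struct iommu_domain *domain, struct device *dev)
{
    (void)domain;
    if (dev->rmrr_locked)
        return -EPERM;
    return 0;
}

/* Stubs for domain_add_dev_info() and domain_add_dev_auxd(). */
static int add_dev_info(struct device *dev)
{
    dev->primary_attached++;
    return 0;
}

static int add_dev_auxd(struct device *dev)
{
    dev->aux_attached++;
    return 0;
}

/* Primary path: rejects aux-mode attaches, then calls its own helper. */
int attach_device(struct iommu_domain *domain, struct device *dev)
{
    int ret;

    if (is_device_attach_aux_domain(dev, domain))
        return -EPERM;

    ret = attach_common(domain, dev);
    return ret ? ret : add_dev_info(dev);
}

/* Aux path: only valid in aux mode, then calls the aux helper directly. */
int attach_device_aux(struct iommu_domain *domain, struct device *dev)
{
    int ret;

    if (!is_device_attach_aux_domain(dev, domain))
        return -EPERM;

    ret = attach_common(domain, dev);
    return ret ? ret : add_dev_auxd(dev);
}
```

With this shape the common helper no longer needs to branch on the attach
mode; each entry point picks its own add function.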
Thanks
Eric
Hi Lu,
On 11/5/18 8:34 AM, Lu Baolu wrote:
> Sharing a physical PCI device in a finer-granularity way
> is becoming a consensus in the industry. IOMMU vendors
> are also engaging efforts to support such sharing as well
> as possible. Among the efforts, the capability of support
> finer-granularity DMA isolation is a common requirement
> due to the security consideration. With finer-granularity
> DMA isolation, all DMA requests out of or to a subset of
> a physical PCI device can be protected by the IOMMU. As a
> result, there is a request in software to attach multiple
> domains to a physical PCI device. One example of such use
> model is the Intel Scalable IOV [1] [2]. The Intel vt-d
> 3.0 spec [3] introduces the scalable mode which enables
> PASID granularity DMA isolation.
>
> This adds the APIs to support multiple domains per device.
> In order to ease the discussions, we call it 'a domain in
> auxiliary mode' or simply 'auxiliary domain' when multiple
> domains are attached to a physical device.
>
> The APIs includes:
>
> * iommu_get_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_CAPABILITY)
> - Represents the ability of supporting multiple domains
> per device.
>
> * iommu_get_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_ENABLED)
> - Checks whether the device identified by @dev is working
> in auxiliary mode.
>
> * iommu_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_ENABLE)
> - Enables the multiple domains capability for the device
> referenced by @dev.
>
> * iommu_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_DISABLE)
> - Disables the multiple domains capability for the device
> referenced by @dev.
>
> * iommu_attach_device_aux(domain, dev)
> - Attaches @domain to @dev in the auxiliary mode. Multiple
> domains could be attached to a single device in the
> auxiliary mode with each domain representing an isolated
> address space for an assignable subset of the device.
>
> * iommu_detach_device_aux(domain, dev)
> - Detach @domain which has been attached to @dev in the
> auxiliary mode.
>
> * iommu_domain_get_attr(domain, DOMAIN_ATTR_AUXD_ID)
> - Return ID used for finer-granularity DMA translation.
> For the Intel Scalable IOV usage model, this will be
> a PASID. The device which supports Scalalbe IOV needs
s/Scalalbe/Scalable
> to writes this ID to the device register so that DMA
s/writes/write
> requests could be tagged with a right PASID prefix.
This is not crystal clear to me as the intel implementation returns the
default PASID and not the PASID of the aux domain.
>
> Many people involved in discussions of this design.
>
> Kevin Tian <[email protected]>
> Liu Yi L <[email protected]>
> Ashok Raj <[email protected]>
> Sanjay Kumar <[email protected]>
> Jacob Pan <[email protected]>
> Alex Williamson <[email protected]>
> Jean-Philippe Brucker <[email protected]>
>
> and some discussions can be found here [4].
>
> [1] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
> [2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> [3] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
> [4] https://lkml.org/lkml/2018/7/26/4
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Suggested-by: Kevin Tian <[email protected]>
> Suggested-by: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> drivers/iommu/iommu.c | 52 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/iommu.h | 52 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 104 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index edbdf5d6962c..0b7c96d1425e 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2030,3 +2030,55 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
> return 0;
> }
> EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
> +
> +/*
> + * Generic interfaces to get or set per device IOMMU attributions.
> + */
> +int iommu_get_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
> +{
> + const struct iommu_ops *ops = dev->bus->iommu_ops;
> +
> + if (ops && ops->get_dev_attr)
> + return ops->get_dev_attr(dev, attr, data);
> +
> + return -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(iommu_get_dev_attr);
> +
> +int iommu_set_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
> +{
> + const struct iommu_ops *ops = dev->bus->iommu_ops;
> +
> + if (ops && ops->set_dev_attr)
> + return ops->set_dev_attr(dev, attr, data);
> +
> + return -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(iommu_set_dev_attr);
> +
> +/*
> + * APIs to attach/detach a domain to/from a device in the
> + * auxiliary mode.
> + */
> +int iommu_attach_device_aux(struct iommu_domain *domain, struct device *dev)
> +{
> + int ret = -ENODEV;
> +
> + if (domain->ops->attach_dev_aux)
> + ret = domain->ops->attach_dev_aux(domain, dev);
> +
> + if (!ret)
> + trace_attach_device_to_domain(dev);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_attach_device_aux);
> +
> +void iommu_detach_device_aux(struct iommu_domain *domain, struct device *dev)
> +{
> + if (domain->ops->detach_dev_aux) {
> + domain->ops->detach_dev_aux(domain, dev);
> + trace_detach_device_from_domain(dev);
> + }
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_device_aux);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index a1d28f42cb77..9bf1b3f2457a 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -126,6 +126,7 @@ enum iommu_attr {
> DOMAIN_ATTR_NESTING, /* two stages of translation */
> DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> DOMAIN_ATTR_MAX,
> + DOMAIN_ATTR_AUXD_ID,
> };
>
> /* These are the possible reserved region types */
> @@ -156,6 +157,14 @@ struct iommu_resv_region {
> enum iommu_resv_type type;
> };
>
> +/* Per device IOMMU attributions */
> +enum iommu_dev_attr {
> + IOMMU_DEV_ATTR_AUXD_CAPABILITY,
> + IOMMU_DEV_ATTR_AUXD_ENABLED,
> + IOMMU_DEV_ATTR_AUXD_ENABLE,
> + IOMMU_DEV_ATTR_AUXD_DISABLE,
> +};
> +
> #ifdef CONFIG_IOMMU_API
>
> /**
> @@ -183,6 +192,8 @@ struct iommu_resv_region {
> * @domain_window_enable: Configure and enable a particular window for a domain
> * @domain_window_disable: Disable a particular window for a domain
> * @of_xlate: add OF master IDs to iommu grouping
> + * @get_dev_attr: get per device IOMMU attributions
s/attributions/attributes here and other locations?
> + * @set_dev_attr: set per device IOMMU attributions
> * @pgsize_bitmap: bitmap of all possible supported page sizes
> */
> struct iommu_ops {
> @@ -226,6 +237,15 @@ struct iommu_ops {
> int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
>
> + /* Get/set per device IOMMU attributions */
> + int (*get_dev_attr)(struct device *dev,
> + enum iommu_dev_attr attr, void *data);
> + int (*set_dev_attr)(struct device *dev,
> + enum iommu_dev_attr attr, void *data);
> + /* Attach/detach aux domain */
> + int (*attach_dev_aux)(struct iommu_domain *domain, struct device *dev);
> + void (*detach_dev_aux)(struct iommu_domain *domain, struct device *dev);
> +
> unsigned long pgsize_bitmap;
> };
>
> @@ -398,6 +418,16 @@ void iommu_fwspec_free(struct device *dev);
> int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
> const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
>
> +int iommu_get_dev_attr(struct device *dev,
> + enum iommu_dev_attr attr, void *data);
> +int iommu_set_dev_attr(struct device *dev,
> + enum iommu_dev_attr attr, void *data);
> +
> +extern int iommu_attach_device_aux(struct iommu_domain *domain,
> + struct device *dev);
> +extern void iommu_detach_device_aux(struct iommu_domain *domain,
> + struct device *dev);
> +
> #else /* CONFIG_IOMMU_API */
>
> struct iommu_ops {};
> @@ -682,6 +712,28 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
> return NULL;
> }
>
> +static inline int
> +iommu_get_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
> +{
> + return -EINVAL;
> +}
> +
> +static inline int
> +iommu_set_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
> +{
> + return -EINVAL;
> +}
> +
> +static inline int
> +iommu_attach_device_aux(struct iommu_domain *domain, struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline void
> +iommu_detach_device_aux(struct iommu_domain *domain, struct device *dev)
> +{
> +}
> #endif /* CONFIG_IOMMU_API */
>
> #ifdef CONFIG_IOMMU_DEBUGFS
>
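For reference, the calling sequence described in the commit message
(query capability, enable, attach in auxiliary mode, read back the ID)
could be sketched as below. This is a user-space mock only: every mock_*
function and structure is a hypothetical stand-in for the proposed
iommu_get_dev_attr(), iommu_set_dev_attr(), iommu_attach_device_aux() and
iommu_domain_get_attr(), and the ID value 1 is arbitrary.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Same attribute names as in the patch; the rest is mock-only. */
enum iommu_dev_attr {
    IOMMU_DEV_ATTR_AUXD_CAPABILITY,
    IOMMU_DEV_ATTR_AUXD_ENABLED,
    IOMMU_DEV_ATTR_AUXD_ENABLE,
    IOMMU_DEV_ATTR_AUXD_DISABLE,
};

struct mock_device { bool capable; bool enabled; };
struct mock_domain { bool attached; int auxd_id; };

static int mock_get_dev_attr(struct mock_device *dev,
                             enum iommu_dev_attr attr, void *data)
{
    switch (attr) {
    case IOMMU_DEV_ATTR_AUXD_CAPABILITY:
        *(bool *)data = dev->capable;
        return 0;
    case IOMMU_DEV_ATTR_AUXD_ENABLED:
        *(bool *)data = dev->enabled;
        return 0;
    default:
        return -EINVAL;
    }
}

static int mock_set_dev_attr(struct mock_device *dev,
                             enum iommu_dev_attr attr, void *data)
{
    (void)data;
    switch (attr) {
    case IOMMU_DEV_ATTR_AUXD_ENABLE:
        if (!dev->capable)
            return -ENODEV;
        dev->enabled = true;
        return 0;
    case IOMMU_DEV_ATTR_AUXD_DISABLE:
        dev->enabled = false;
        return 0;
    default:
        return -EINVAL;
    }
}

static int mock_attach_device_aux(struct mock_domain *domain,
                                  struct mock_device *dev)
{
    if (!dev->enabled)
        return -EPERM;
    domain->attached = true;
    domain->auxd_id = 1;    /* arbitrary ID for the sketch */
    return 0;
}

static int mock_domain_get_auxd_id(struct mock_domain *domain, int *id)
{
    if (!domain->attached)
        return -EINVAL;
    *id = domain->auxd_id;
    return 0;
}

/* Caller-side sequence: capability -> enable -> attach -> read ID. */
int setup_aux_domain(struct mock_device *dev, struct mock_domain *domain,
                     int *id)
{
    bool capable = false;
    int ret;

    ret = mock_get_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_CAPABILITY, &capable);
    if (ret)
        return ret;
    if (!capable)
        return -ENODEV;

    ret = mock_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_ENABLE, NULL);
    if (ret)
        return ret;

    ret = mock_attach_device_aux(domain, dev);
    if (ret)
        return ret;

    return mock_domain_get_auxd_id(domain, id);
}
```

The ID read back in the last step is what the device driver would program
into its hardware so that DMA requests carry the right tag.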
Thanks
Eric
Hi,
On 11/5/18 8:34 AM, Lu Baolu wrote:
> Add the response to IOMMU_DEV_ATTR_AUXD_CAPABILITY capability query
> through iommu_get_dev_attr().
commit title: Advertise auxiliary domain capability?
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 38 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 38 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5e149d26ea9b..298f7a3fafe8 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5167,6 +5167,24 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
> return phys;
> }
>
> +static inline bool scalable_mode_support(void)
> +{
> + struct dmar_drhd_unit *drhd;
> + struct intel_iommu *iommu;
> + bool ret = true;
> +
> + rcu_read_lock();
> + for_each_active_iommu(iommu, drhd) {
> + if (!sm_supported(iommu)) {
> + ret = false;
> + break;
> + }
> + }
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
> static bool intel_iommu_capable(enum iommu_cap cap)
> {
> if (cap == IOMMU_CAP_CACHE_COHERENCY)
> @@ -5331,6 +5349,25 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
> }
> #endif /* CONFIG_INTEL_IOMMU_SVM */
>
> +static int intel_iommu_get_dev_attr(struct device *dev,
> + enum iommu_dev_attr attr, void *data)
> +{
> + int ret = 0;
> + bool *auxd_capable;
nit: could be local to the case as other cases may use other datatypes.
> +
> + switch (attr) {
> + case IOMMU_DEV_ATTR_AUXD_CAPABILITY:
> + auxd_capable = data;
> + *auxd_capable = scalable_mode_support();
> + break;
> + default:
> + ret = -EINVAL;
> + break;
> + }
> +
> + return ret;
> +}
> +
> const struct iommu_ops intel_iommu_ops = {
> .capable = intel_iommu_capable,
> .domain_alloc = intel_iommu_domain_alloc,
> @@ -5345,6 +5382,7 @@ const struct iommu_ops intel_iommu_ops = {
> .get_resv_regions = intel_iommu_get_resv_regions,
> .put_resv_regions = intel_iommu_put_resv_regions,
> .device_group = pci_device_group,
> + .get_dev_attr = intel_iommu_get_dev_attr,
> .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
> };
>
>
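Regarding the nit above, a case-local declaration could look roughly like
this. It is a standalone sketch: enum dev_attr, get_dev_attr() and
scalable_mode_support_stub() are hypothetical names standing in for the
patch's enum iommu_dev_attr, intel_iommu_get_dev_attr() and
scalable_mode_support().

```c
#include <errno.h>
#include <stdbool.h>

enum dev_attr { DEV_ATTR_AUXD_CAPABILITY, DEV_ATTR_OTHER };

/* Stub standing in for scalable_mode_support(); always reports support. */
static bool scalable_mode_support_stub(void)
{
    return true;
}

int get_dev_attr(enum dev_attr attr, void *data)
{
    switch (attr) {
    case DEV_ATTR_AUXD_CAPABILITY: {
        /* The typed pointer is scoped to the only case that needs it. */
        bool *auxd_capable = data;

        *auxd_capable = scalable_mode_support_stub();
        return 0;
    }
    default:
        return -EINVAL;
    }
}
```

Other attributes added later can then declare their own data type inside
their own case block.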
Thanks
Eric
Hi Lu,
On 11/5/18 8:34 AM, Lu Baolu wrote:
> This adds helpers to attach or detach a domain to a
> group. This will replace iommu_attach_group() which
> only works for pci devices.
s/pci/non mdev?
>
> If a domain is attaching to a group which includes the
> mediated devices, it should attach to the iommu device
> (a pci device which represents the mdev in iommu scope)
> instead. The added helper supports attaching domain to
> groups for both pci and mdev devices.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 114 ++++++++++++++++++++++++++++++--
> 1 file changed, 107 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d9fd3188615d..178264b330e7 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -91,6 +91,7 @@ struct vfio_dma {
> struct vfio_group {
> struct iommu_group *iommu_group;
> struct list_head next;
> + bool mdev_group; /* An mdev group */
> };
>
> /*
> @@ -1327,6 +1328,105 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
> return ret;
> }
>
> +static struct device *vfio_mdev_get_iommu_device(struct device *dev)
> +{
> + struct device *(*fn)(struct device *dev);
> + struct device *iommu_parent;
> +
> + fn = symbol_get(mdev_get_iommu_device);
> + if (fn) {
> + iommu_parent = fn(dev);
> + symbol_put(mdev_get_iommu_device);
> +
> + return iommu_parent;
> + }
> +
> + return NULL;
> +}
> +
> +static int vfio_mdev_set_domain(struct device *dev, struct iommu_domain *domain)
> +{
> + int (*fn)(struct device *dev, void *domain);
> + int ret;
> +
> + fn = symbol_get(mdev_set_iommu_domain);
> + if (fn) {
> + ret = fn(dev, domain);
> + symbol_put(mdev_set_iommu_domain);
> +
> + return ret;
> + }
> +
> + return -EINVAL;
> +}
> +
> +static int vfio_mdev_attach_domain(struct device *dev, void *data)
> +{
> + struct iommu_domain *domain = data;
> + struct device *iommu_device;
> + int ret;
> +
> + ret = vfio_mdev_set_domain(dev, domain);
> + if (ret)
> + return ret;
> +
> + iommu_device = vfio_mdev_get_iommu_device(dev);
> + if (iommu_device) {
> + bool aux_mode = false;
> +
> + iommu_get_dev_attr(iommu_device,
> + IOMMU_DEV_ATTR_AUXD_ENABLED, &aux_mode);
Don't you need to test the returned value before using aux_mode?
> + if (aux_mode)
> + return iommu_attach_device_aux(domain, iommu_device);
> + else
> + return iommu_attach_device(domain, iommu_device);
If for some reason the above ops fail, don't you want to call
vfio_mdev_set_domain(dev, NULL)?
> + }
> +
> + return -EINVAL;
> +}
> +
> +static int vfio_mdev_detach_domain(struct device *dev, void *data)
> +{
> + struct iommu_domain *domain = data;
> + struct device *iommu_device;
> +
> + vfio_mdev_set_domain(dev, NULL);
> + iommu_device = vfio_mdev_get_iommu_device(dev);
> + if (iommu_device) {
> + bool aux_mode = false;
> +
> + iommu_get_dev_attr(iommu_device,
> + IOMMU_DEV_ATTR_AUXD_ENABLED, &aux_mode);
same here
> + if (aux_mode)
> + iommu_detach_device_aux(domain, iommu_device);
> + else
> + iommu_detach_device(domain, iommu_device);
> + }
> +
> + return 0;
> +}
> +
> +static int vfio_iommu_attach_group(struct vfio_domain *domain,
> + struct vfio_group *group)
> +{
> + if (group->mdev_group)
> + return iommu_group_for_each_dev(group->iommu_group,
> + domain->domain,
> + vfio_mdev_attach_domain);
> + else
> + return iommu_attach_group(domain->domain, group->iommu_group);
> +}
> +
> +static void vfio_iommu_detach_group(struct vfio_domain *domain,
> + struct vfio_group *group)
> +{
> + if (group->mdev_group)
> + iommu_group_for_each_dev(group->iommu_group, domain->domain,
> + vfio_mdev_detach_domain);
> + else
> + iommu_detach_group(domain->domain, group->iommu_group);
> +}
> +
> static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct iommu_group *iommu_group)
> {
> @@ -1402,7 +1502,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> goto out_domain;
> }
>
> - ret = iommu_attach_group(domain->domain, iommu_group);
> + ret = vfio_iommu_attach_group(domain, group);
> if (ret)
> goto out_domain;
>
> @@ -1434,8 +1534,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> list_for_each_entry(d, &iommu->domain_list, next) {
> if (d->domain->ops == domain->domain->ops &&
> d->prot == domain->prot) {
> - iommu_detach_group(domain->domain, iommu_group);
> - if (!iommu_attach_group(d->domain, iommu_group)) {
> + vfio_iommu_detach_group(domain, group);
> + if (!vfio_iommu_attach_group(d, group)) {
> list_add(&group->next, &d->group_list);
> iommu_domain_free(domain->domain);
> kfree(domain);
> @@ -1443,7 +1543,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> return 0;
> }
>
> - ret = iommu_attach_group(domain->domain, iommu_group);
> + ret = vfio_iommu_attach_group(domain, group);
> if (ret)
> goto out_domain;
> }
> @@ -1469,7 +1569,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> return 0;
>
> out_detach:
> - iommu_detach_group(domain->domain, iommu_group);
> + vfio_iommu_detach_group(domain, group);
> out_domain:
> iommu_domain_free(domain->domain);
> out_free:
> @@ -1560,7 +1660,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> if (!group)
> continue;
>
> - iommu_detach_group(domain->domain, iommu_group);
> + vfio_iommu_detach_group(domain, group);
> list_del(&group->next);
> kfree(group);
> /*
> @@ -1625,7 +1725,7 @@ static void vfio_release_domain(struct vfio_domain *domain, bool external)
> list_for_each_entry_safe(group, group_tmp,
> &domain->group_list, next) {
> if (!external)
> - iommu_detach_group(domain->domain, group->iommu_group);
> + vfio_iommu_detach_group(domain, group);
> list_del(&group->next);
> kfree(group);
> }
>
Thanks
Eric
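The two comments on vfio_mdev_attach_domain() above can be read as one
control-flow change: check the attribute query, and undo the stored domain
on any later failure. Below is a minimal user-space sketch of that flow. It
is not the kernel code: struct mock_mdev and every mock_* function are
hypothetical stand-ins invented for illustration. Treating a failed query as
fatal is also an assumption; falling back to the non-aux attach would be
another reasonable reading of the comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mdev and the results of the calls on it. */
struct mock_mdev {
	void *domain;		/* what vfio_mdev_set_domain() would store */
	bool has_iommu_dev;	/* whether an iommu device backs this mdev */
	int attr_ret;		/* result of the AUXD_ENABLED query */
	bool aux_enabled;	/* value reported when the query succeeds */
	int attach_ret;		/* result of the attach call */
};

static int mock_set_domain(struct mock_mdev *m, void *domain)
{
	m->domain = domain;
	return 0;
}

static int mock_get_aux_enabled(const struct mock_mdev *m, bool *aux)
{
	if (m->attr_ret)
		return m->attr_ret;	/* *aux is left untouched on failure */
	*aux = m->aux_enabled;
	return 0;
}

static int mock_attach(const struct mock_mdev *m, bool aux)
{
	(void)aux;	/* aux and non-aux attach share one mocked result */
	return m->attach_ret;
}

/* Attach with a checked query and a rollback of the stored domain. */
static int mock_attach_domain(struct mock_mdev *m, void *domain)
{
	bool aux_mode = false;
	int ret;

	assert(m != NULL && domain != NULL);

	ret = mock_set_domain(m, domain);
	if (ret)
		return ret;

	if (!m->has_iommu_dev) {
		ret = -EINVAL;
		goto out_clear;
	}

	ret = mock_get_aux_enabled(m, &aux_mode);
	if (ret)
		goto out_clear;

	ret = mock_attach(m, aux_mode);
	if (ret)
		goto out_clear;

	return 0;

out_clear:
	mock_set_domain(m, NULL);	/* do not leave a stale domain behind */
	return ret;
}
```

With this shape every failure path leaves m->domain NULL, so a later detach
does not see a domain that was never attached.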
Hi Lu,
On 11/5/18 8:34 AM, Lu Baolu wrote:
> This adds the support to determine the isolation type
> of a mediated device group by checking whether it has
> an iommu device. If an iommu device exists, an iommu
> domain will be allocated and then attached to the iommu
> device. Otherwise, keep the same behavior as it is.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 48 ++++++++++++++++++++++++++++-----
> 1 file changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 178264b330e7..eed26129f58c 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1427,13 +1427,40 @@ static void vfio_iommu_detach_group(struct vfio_domain *domain,
> iommu_detach_group(domain->domain, group->iommu_group);
> }
>
> +static bool vfio_bus_is_mdev(struct bus_type *bus)
> +{
> + struct bus_type *mdev_bus;
> + bool ret = false;
> +
> + mdev_bus = symbol_get(mdev_bus_type);
> + if (mdev_bus) {
> + ret = (bus == mdev_bus);
> + symbol_put(mdev_bus_type);
> + }
> +
> + return ret;
> +}
> +
> +static int vfio_mdev_iommu_device(struct device *dev, void *data)
> +{
> + struct device **old = data, *new;
> +
> + new = vfio_mdev_get_iommu_device(dev);
> + if (*old && *old != new)
If !new, can't you return -EINVAL as well?
> + return -EINVAL;
> +
> + *old = new;
> +
> + return 0;
> +}
> +
> static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct iommu_group *iommu_group)
> {
> struct vfio_iommu *iommu = iommu_data;
> struct vfio_group *group;
> struct vfio_domain *domain, *d;
> - struct bus_type *bus = NULL, *mdev_bus;
> + struct bus_type *bus = NULL;
> int ret;
> bool resv_msi, msi_remap;
> phys_addr_t resv_msi_base;
> @@ -1468,11 +1495,18 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> if (ret)
> goto out_free;
>
> - mdev_bus = symbol_get(mdev_bus_type);
> + if (vfio_bus_is_mdev(bus)) {
> + struct device *iommu_device = NULL;
>
> - if (mdev_bus) {
> - if ((bus == mdev_bus) && !iommu_present(bus)) {
> - symbol_put(mdev_bus_type);
> + group->mdev_group = true;
> +
> + /* Determine the isolation type */
> + ret = iommu_group_for_each_dev(iommu_group, &iommu_device,
> + vfio_mdev_iommu_device);
> + if (ret)
> + goto out_free;
> +
> + if (!iommu_device) {
> if (!iommu->external_domain) {
> INIT_LIST_HEAD(&domain->group_list);
> iommu->external_domain = domain;
> @@ -1482,9 +1516,11 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> list_add(&group->next,
> &iommu->external_domain->group_list);
> mutex_unlock(&iommu->lock);
> +
extra new line
> return 0;
> }
> - symbol_put(mdev_bus_type);
> +
> + bus = iommu_device->bus;
> }
>
> domain->domain = iommu_domain_alloc(bus);
>
Thanks
Eric
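To make the question about !new in vfio_mdev_iommu_device() concrete, here
is a small user-space sketch of the callback's logic as it appears in the
patch, driven over an array instead of iommu_group_for_each_dev(). The
mock_* names and struct mock_dev are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in: each mdev reports its iommu device, or NULL. */
struct mock_dev {
	const void *iommu_device;
};

/* Same comparison as the callback in the patch. */
static int mock_mdev_iommu_device(const struct mock_dev *dev, const void **old)
{
	const void *new = dev->iommu_device;

	if (*old && *old != new)
		return -EINVAL;

	*old = new;

	return 0;
}

/* Walk the "group" and stop at the first error, like for_each_dev does. */
static int mock_group_iommu_device(const struct mock_dev *devs, size_t n,
				   const void **out)
{
	size_t i;

	assert(out != NULL);
	*out = NULL;

	for (i = 0; i < n; i++) {
		int ret = mock_mdev_iommu_device(&devs[i], out);

		if (ret)
			return ret;
	}

	return 0;
}
```

In this mock, a group ordered { X, NULL } is rejected with -EINVAL, while
{ NULL, X } is accepted and reports X, because the comparison is skipped
while *old is still NULL. Whether that order dependence matters for real
groups is a question for the patch author; the sketch only shows what the
quoted logic does.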
Hi Eric,
On 11/23/18 6:50 PM, Auger Eric wrote:
> Hi Lu,
>
> On 11/5/18 8:34 AM, Lu Baolu wrote:
>> Sharing a physical PCI device in a finer-granularity way
>> is becoming a consensus in the industry. IOMMU vendors
>> are also engaging efforts to support such sharing as well
>> as possible. Among the efforts, the capability of support
>> finer-granularity DMA isolation is a common requirement
>> due to the security consideration. With finer-granularity
>> DMA isolation, all DMA requests out of or to a subset of
>> a physical PCI device can be protected by the IOMMU. As a
>> result, there is a request in software to attach multiple
>> domains to a physical PCI device. One example of such use
>> model is the Intel Scalable IOV [1] [2]. The Intel vt-d
>> 3.0 spec [3] introduces the scalable mode which enables
>> PASID granularity DMA isolation.
>>
>> This adds the APIs to support multiple domains per device.
>> In order to ease the discussions, we call it 'a domain in
>> auxiliary mode' or simply 'auxiliary domain' when multiple
>> domains are attached to a physical device.
>>
>> The APIs includes:
>>
>> * iommu_get_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_CAPABILITY)
>> - Represents the ability of supporting multiple domains
>> per device.
>>
>> * iommu_get_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_ENABLED)
>> - Checks whether the device identified by @dev is working
>> in auxiliary mode.
>>
>> * iommu_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_ENABLE)
>> - Enables the multiple domains capability for the device
>> referenced by @dev.
>>
>> * iommu_set_dev_attr(dev, IOMMU_DEV_ATTR_AUXD_DISABLE)
>> - Disables the multiple domains capability for the device
>> referenced by @dev.
>>
>> * iommu_attach_device_aux(domain, dev)
>> - Attaches @domain to @dev in the auxiliary mode. Multiple
>> domains could be attached to a single device in the
>> auxiliary mode with each domain representing an isolated
>> address space for an assignable subset of the device.
>>
>> * iommu_detach_device_aux(domain, dev)
>> - Detach @domain which has been attached to @dev in the
>> auxiliary mode.
>>
>> * iommu_domain_get_attr(domain, DOMAIN_ATTR_AUXD_ID)
>> - Return ID used for finer-granularity DMA translation.
>> For the Intel Scalable IOV usage model, this will be
>> a PASID. The device which supports Scalalbe IOV needs
> s/Scalalbe/Scalable
>> to writes this ID to the device register so that DMA
> s/writes/write
Yes and thanks.
>> requests could be tagged with a right PASID prefix.
> This is not crystal clear to me as the intel implementation returns the
> default PASID and not the PASID of the aux domain.
The PASID of the aux domain is called the default PASID.
>>
>> Many people involved in discussions of this design.
>>
>> Kevin Tian <[email protected]>
>> Liu Yi L <[email protected]>
>> Ashok Raj <[email protected]>
>> Sanjay Kumar <[email protected]>
>> Jacob Pan <[email protected]>
>> Alex Williamson <[email protected]>
>> Jean-Philippe Brucker <[email protected]>
>>
>> and some discussions can be found here [4].
>>
>> [1] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
>> [2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
>> [3] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
>> [4] https://lkml.org/lkml/2018/7/26/4
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Suggested-by: Kevin Tian <[email protected]>
>> Suggested-by: Jean-Philippe Brucker <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> ---
>> drivers/iommu/iommu.c | 52 +++++++++++++++++++++++++++++++++++++++++++
>> include/linux/iommu.h | 52 +++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 104 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index edbdf5d6962c..0b7c96d1425e 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2030,3 +2030,55 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
>> return 0;
>> }
>> EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
>> +
>> +/*
>> + * Generic interfaces to get or set per device IOMMU attributions.
>> + */
>> +int iommu_get_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
>> +{
>> + const struct iommu_ops *ops = dev->bus->iommu_ops;
>> +
>> + if (ops && ops->get_dev_attr)
>> + return ops->get_dev_attr(dev, attr, data);
>> +
>> + return -EINVAL;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_get_dev_attr);
>> +
>> +int iommu_set_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
>> +{
>> + const struct iommu_ops *ops = dev->bus->iommu_ops;
>> +
>> + if (ops && ops->set_dev_attr)
>> + return ops->set_dev_attr(dev, attr, data);
>> +
>> + return -EINVAL;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_set_dev_attr);
>> +
>> +/*
>> + * APIs to attach/detach a domain to/from a device in the
>> + * auxiliary mode.
>> + */
>> +int iommu_attach_device_aux(struct iommu_domain *domain, struct device *dev)
>> +{
>> + int ret = -ENODEV;
>> +
>> + if (domain->ops->attach_dev_aux)
>> + ret = domain->ops->attach_dev_aux(domain, dev);
>> +
>> + if (!ret)
>> + trace_attach_device_to_domain(dev);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_attach_device_aux);
>> +
>> +void iommu_detach_device_aux(struct iommu_domain *domain, struct device *dev)
>> +{
>> + if (domain->ops->detach_dev_aux) {
>> + domain->ops->detach_dev_aux(domain, dev);
>> + trace_detach_device_from_domain(dev);
>> + }
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_device_aux);
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index a1d28f42cb77..9bf1b3f2457a 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -126,6 +126,7 @@ enum iommu_attr {
>> DOMAIN_ATTR_NESTING, /* two stages of translation */
>> DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
>> DOMAIN_ATTR_MAX,
>> + DOMAIN_ATTR_AUXD_ID,
>> };
>>
>> /* These are the possible reserved region types */
>> @@ -156,6 +157,14 @@ struct iommu_resv_region {
>> enum iommu_resv_type type;
>> };
>>
>> +/* Per device IOMMU attributions */
>> +enum iommu_dev_attr {
>> + IOMMU_DEV_ATTR_AUXD_CAPABILITY,
>> + IOMMU_DEV_ATTR_AUXD_ENABLED,
>> + IOMMU_DEV_ATTR_AUXD_ENABLE,
>> + IOMMU_DEV_ATTR_AUXD_DISABLE,
>> +};
>> +
>> #ifdef CONFIG_IOMMU_API
>>
>> /**
>> @@ -183,6 +192,8 @@ struct iommu_resv_region {
>> * @domain_window_enable: Configure and enable a particular window for a domain
>> * @domain_window_disable: Disable a particular window for a domain
>> * @of_xlate: add OF master IDs to iommu grouping
>> + * @get_dev_attr: get per device IOMMU attributions
> s/attributions/attributes here and other locations?
Yes. It should be "attributes". Thanks.
>> + * @set_dev_attr: set per device IOMMU attributions
>> * @pgsize_bitmap: bitmap of all possible supported page sizes
>> */
>> struct iommu_ops {
>> @@ -226,6 +237,15 @@ struct iommu_ops {
>> int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
>> bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
>>
>> + /* Get/set per device IOMMU attributions */
>> + int (*get_dev_attr)(struct device *dev,
>> + enum iommu_dev_attr attr, void *data);
>> + int (*set_dev_attr)(struct device *dev,
>> + enum iommu_dev_attr attr, void *data);
>> + /* Attach/detach aux domain */
>> + int (*attach_dev_aux)(struct iommu_domain *domain, struct device *dev);
>> + void (*detach_dev_aux)(struct iommu_domain *domain, struct device *dev);
>> +
>> unsigned long pgsize_bitmap;
>> };
>>
>> @@ -398,6 +418,16 @@ void iommu_fwspec_free(struct device *dev);
>> int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
>> const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
>>
>> +int iommu_get_dev_attr(struct device *dev,
>> + enum iommu_dev_attr attr, void *data);
>> +int iommu_set_dev_attr(struct device *dev,
>> + enum iommu_dev_attr attr, void *data);
>> +
>> +extern int iommu_attach_device_aux(struct iommu_domain *domain,
>> + struct device *dev);
>> +extern void iommu_detach_device_aux(struct iommu_domain *domain,
>> + struct device *dev);
>> +
>> #else /* CONFIG_IOMMU_API */
>>
>> struct iommu_ops {};
>> @@ -682,6 +712,28 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
>> return NULL;
>> }
>>
>> +static inline int
>> +iommu_get_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
>> +{
>> + return -EINVAL;
>> +}
>> +
>> +static inline int
>> +iommu_set_dev_attr(struct device *dev, enum iommu_dev_attr attr, void *data)
>> +{
>> + return -EINVAL;
>> +}
>> +
>> +static inline int
>> +iommu_attach_device_aux(struct iommu_domain *domain, struct device *dev)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +static inline void
>> +iommu_detach_device_aux(struct iommu_domain *domain, struct device *dev)
>> +{
>> +}
>> #endif /* CONFIG_IOMMU_API */
>>
>> #ifdef CONFIG_IOMMU_DEBUGFS
>>
>
> Thanks
>
> Eric
>
Best regards,
Lu Baolu
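For readers following the API discussion, the dispatch done by
iommu_get_dev_attr() in the quoted patch can be sketched in user space as
below. This is only an illustration: struct mock_iommu_ops, the MOCK_* enum
values and all mock_* functions are hypothetical, and the ops pointer is
passed directly instead of being looked up through dev->bus.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum mock_dev_attr {
	MOCK_DEV_ATTR_AUXD_CAPABILITY,
	MOCK_DEV_ATTR_AUXD_ENABLED,
};

/* Hypothetical ops table with only the callback of interest. */
struct mock_iommu_ops {
	int (*get_dev_attr)(enum mock_dev_attr attr, void *data);
};

/* Forward to the driver callback if present, else report -EINVAL. */
static int mock_iommu_get_dev_attr(const struct mock_iommu_ops *ops,
				   enum mock_dev_attr attr, void *data)
{
	assert(data != NULL);

	if (ops && ops->get_dev_attr)
		return ops->get_dev_attr(attr, data);

	return -EINVAL;
}

/* Hypothetical driver callback that only knows the capability attribute. */
static int mock_driver_get_dev_attr(enum mock_dev_attr attr, void *data)
{
	if (attr != MOCK_DEV_ATTR_AUXD_CAPABILITY)
		return -EINVAL;

	*(bool *)data = true;

	return 0;
}

/* Caller side: trust the output only when the query succeeded. */
static bool mock_aux_capable(const struct mock_iommu_ops *ops)
{
	bool capable = false;

	if (mock_iommu_get_dev_attr(ops, MOCK_DEV_ATTR_AUXD_CAPABILITY,
				    &capable))
		return false;

	return capable;
}
```

The point of the caller helper is that the output buffer is not written on
the -EINVAL path, so a caller that ignores the return value only sees
whatever it initialised the variable to.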
Hi,
On 11/23/18 6:49 PM, Auger Eric wrote:
> Hi,
>
> On 11/5/18 8:34 AM, Lu Baolu wrote:
>> Add the response to IOMMU_DEV_ATTR_AUXD_CAPABILITY capability query
>> through iommu_get_dev_attr().
>
> commit title: Advertise auxiliary domain capability?
Yes. I should make it consistent. Thanks.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Signed-off-by: Liu Yi L <[email protected]>
>> ---
>> drivers/iommu/intel-iommu.c | 38 +++++++++++++++++++++++++++++++++++++
>> 1 file changed, 38 insertions(+)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 5e149d26ea9b..298f7a3fafe8 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -5167,6 +5167,24 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
>> return phys;
>> }
>>
>> +static inline bool scalable_mode_support(void)
>> +{
>> + struct dmar_drhd_unit *drhd;
>> + struct intel_iommu *iommu;
>> + bool ret = true;
>> +
>> + rcu_read_lock();
>> + for_each_active_iommu(iommu, drhd) {
>> + if (!sm_supported(iommu)) {
>> + ret = false;
>> + break;
>> + }
>> + }
>> + rcu_read_unlock();
>> +
>> + return ret;
>> +}
>> +
>> static bool intel_iommu_capable(enum iommu_cap cap)
>> {
>> if (cap == IOMMU_CAP_CACHE_COHERENCY)
>> @@ -5331,6 +5349,25 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
>> }
>> #endif /* CONFIG_INTEL_IOMMU_SVM */
>>
>> +static int intel_iommu_get_dev_attr(struct device *dev,
>> + enum iommu_dev_attr attr, void *data)
>> +{
>> + int ret = 0;
>> + bool *auxd_capable;
> nit: could be local to the case as other cases may use other datatypes.
I have thought about this. Making it local to the case needs extra "{}".
That's the reason I put it here. We can change it later when we need
other datatypes.
>> +
>> + switch (attr) {
>> + case IOMMU_DEV_ATTR_AUXD_CAPABILITY:
>> + auxd_capable = data;
>> + *auxd_capable = scalable_mode_support();
>> + break;
>> + default:
>> + ret = -EINVAL;
>> + break;
>> + }
>> +
>> + return ret;
>> +}
>> +
>> const struct iommu_ops intel_iommu_ops = {
>> .capable = intel_iommu_capable,
>> .domain_alloc = intel_iommu_domain_alloc,
>> @@ -5345,6 +5382,7 @@ const struct iommu_ops intel_iommu_ops = {
>> .get_resv_regions = intel_iommu_get_resv_regions,
>> .put_resv_regions = intel_iommu_put_resv_regions,
>> .device_group = pci_device_group,
>> + .get_dev_attr = intel_iommu_get_dev_attr,
>> .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
>> };
>>
>>
> Thanks
>
> Eric
>
>
Best regards,
Lu Baolu
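For reference, the case-local form discussed above (the one that needs the
extra "{}") could look like the user-space sketch below. The enum, the
mock_scalable_mode_support() stub and its fixed return value are assumptions
made for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum mock_dev_attr {
	MOCK_ATTR_AUXD_CAPABILITY,
	MOCK_ATTR_OTHER,
};

/* Stub: pretend every IOMMU in the system supports scalable mode. */
static bool mock_scalable_mode_support(void)
{
	return true;
}

static int mock_get_dev_attr(enum mock_dev_attr attr, void *data)
{
	assert(data != NULL);

	switch (attr) {
	case MOCK_ATTR_AUXD_CAPABILITY: {
		/* Typed pointer scoped to the only case that uses it. */
		bool *auxd_capable = data;

		*auxd_capable = mock_scalable_mode_support();
		return 0;
	}
	default:
		return -EINVAL;
	}
}
```

The braces cost one nesting level per case, which is the trade-off raised
in the reply; in exchange, a later case can declare its own data type
without touching the function-level declarations.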
Hi,
On 11/23/18 6:49 PM, Auger Eric wrote:
> Hi Lu,
>
> On 11/5/18 8:34 AM, Lu Baolu wrote:
>> When multiple domains per device has been enabled by the
>> device driver, the device will tag the default PASID for
>> the domain to all DMA traffics out of the subset of this
>> device; and the IOMMU should translate the DMA requests
>> in PASID granularity.
>>
>> This extends the intel_iommu_attach/detach_device() ops
>> to support managing PASID granular translation structures
>> when the device driver has enabled multiple domains per
>> device.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Signed-off-by: Sanjay Kumar <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Signed-off-by: Liu Yi L <[email protected]>
>> ---
>> drivers/iommu/intel-iommu.c | 192 +++++++++++++++++++++++++++++++-----
>> include/linux/intel-iommu.h | 10 ++
>> 2 files changed, 180 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 2c86ac71c774..a61b25ad0d3b 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -2477,6 +2477,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>> info->iommu = iommu;
>> info->pasid_table = NULL;
>> info->auxd_enabled = 0;
>> + INIT_LIST_HEAD(&info->auxiliary_domains);
>>
>> if (dev && dev_is_pci(dev)) {
>> struct pci_dev *pdev = to_pci_dev(info->dev);
>> @@ -5010,35 +5011,134 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
>> domain_exit(to_dmar_domain(domain));
>> }
>>
>> -static int intel_iommu_attach_device(struct iommu_domain *domain,
>> - struct device *dev)
>> +/*
>> + * Check whether a @domain will be attached to the @dev in the
>> + * auxiliary mode.
>> + */
>> +static inline bool
>> +is_device_attach_aux_domain(struct device *dev, struct iommu_domain *domain)
>> {
>> - struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> - struct intel_iommu *iommu;
>> - int addr_width;
>> - u8 bus, devfn;
>> + struct device_domain_info *info = dev->archdata.iommu;
>>
>> - if (device_is_rmrr_locked(dev)) {
>> - dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
>> - return -EPERM;
>> - }
>> + return info && info->auxd_enabled &&
>> + domain->type == IOMMU_DOMAIN_UNMANAGED;
>> +}
>>
>> - /* normally dev is not mapped */
>> - if (unlikely(domain_context_mapped(dev))) {
>> - struct dmar_domain *old_domain;
>> +static void auxiliary_link_device(struct dmar_domain *domain,
>> + struct device *dev)
>> +{
>> + struct device_domain_info *info = dev->archdata.iommu;
>>
>> - old_domain = find_domain(dev);
>> - if (old_domain) {
>> - rcu_read_lock();
>> - dmar_remove_one_dev_info(old_domain, dev);
>> - rcu_read_unlock();
>> + assert_spin_locked(&device_domain_lock);
>> + if (WARN_ON(!info))
>> + return;
>>
>> - if (!domain_type_is_vm_or_si(old_domain) &&
>> - list_empty(&old_domain->devices))
>> - domain_exit(old_domain);
>> + domain->auxd_refcnt++;
>> + list_add(&domain->auxd, &info->auxiliary_domains);
>> +}
>> +
>> +static void auxiliary_unlink_device(struct dmar_domain *domain,
>> + struct device *dev)
>> +{
>> + struct device_domain_info *info = dev->archdata.iommu;
>> +
>> + assert_spin_locked(&device_domain_lock);
>> + if (WARN_ON(!info))
>> + return;
>> +
>> + list_del(&domain->auxd);
>> + domain->auxd_refcnt--;
>> +
>> + if (!domain->auxd_refcnt && domain->default_pasid > 0)
>> + intel_pasid_free_id(domain->default_pasid);
>> +}
>> +
>> +static int domain_add_dev_auxd(struct dmar_domain *domain,
>> + struct device *dev)
>> +{
>> + int ret;
>> + u8 bus, devfn;
>> + unsigned long flags;
>> + struct intel_iommu *iommu;
>> +
>> + iommu = device_to_iommu(dev, &bus, &devfn);
>> + if (!iommu)
>> + return -ENODEV;
>> +
>> + spin_lock_irqsave(&device_domain_lock, flags);
>> + if (domain->default_pasid <= 0) {
>> + domain->default_pasid = intel_pasid_alloc_id(domain, PASID_MIN,
>> + pci_max_pasids(to_pci_dev(dev)), GFP_ATOMIC);
>> + if (domain->default_pasid < 0) {
>> + pr_err("Can't allocate default pasid\n");
>> + ret = -ENODEV;
>> + goto pasid_failed;
>> }
>> }
>>
>> + spin_lock(&iommu->lock);
> You may comment your nested lock policy somewhere.
Yes. I will add below comments.
/*
* iommu->lock must be held to attach domain to iommu and setup the
* pasid entry for second level translation.
*/
>> + ret = domain_attach_iommu(domain, iommu);
>> + if (ret)
>> + goto attach_failed;
>> +
>> + /* Setup the PASID entry for mediated devices: */
>> + ret = intel_pasid_setup_second_level(iommu, domain, dev,
>> + domain->default_pasid);
>> + if (ret)
>> + goto table_failed;
>> + spin_unlock(&iommu->lock);
>> +
>> + auxiliary_link_device(domain, dev);
>> +
>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>> +
>> + return 0;
>> +
>> +table_failed:
>> + domain_detach_iommu(domain, iommu);
>> +attach_failed:
>> + spin_unlock(&iommu->lock);
>> + if (!domain->auxd_refcnt && domain->default_pasid > 0)
>> + intel_pasid_free_id(domain->default_pasid);
>> +pasid_failed:
>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>> +
>> + return ret;
>> +}
>> +
>> +static void domain_remove_dev_aux(struct dmar_domain *domain,
>> + struct device *dev)
>> +{
>> + struct device_domain_info *info;
>> + struct intel_iommu *iommu;
>> + unsigned long flags;
>> +
>> + if (!is_device_attach_aux_domain(dev, &domain->domain))
>> + return;
>> +
>> + spin_lock_irqsave(&device_domain_lock, flags);
>> + info = dev->archdata.iommu;
>> + iommu = info->iommu;
>> +
>> + intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid);
>> +
>> + auxiliary_unlink_device(domain, dev);
>> +
>> + spin_lock(&iommu->lock);
>> + domain_detach_iommu(domain, iommu);
>> + spin_unlock(&iommu->lock);
>> +
>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>> +}
>> +
>> +static int __intel_iommu_attach_device(struct iommu_domain *domain,
>> + struct device *dev)
>> +{
> Maybe introducing __intel_iommu_attach_device in a patch prior to that
> one would help the review.
Yes. Will use a separate patch to introduce this helper.
>> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> + struct intel_iommu *iommu;
>> + int addr_width;
>> + u8 bus, devfn;
>> +
>> iommu = device_to_iommu(dev, &bus, &devfn);
>> if (!iommu)
>> return -ENODEV;
>> @@ -5071,7 +5171,47 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
>> dmar_domain->agaw--;
>> }
>>
>> - return domain_add_dev_info(dmar_domain, dev);
>> + if (is_device_attach_aux_domain(dev, domain))
>> + return domain_add_dev_auxd(dmar_domain, dev);
> Why not put this directly into intel_iommu_attach_device_aux()?
>> + else
>> + return domain_add_dev_info(dmar_domain, dev);
> and this into intel_iommu_attach_device() as
> __intel_iommu_attach_device() is the common part now?
Good suggestion.
>> +}
>> +
>> +static int intel_iommu_attach_device(struct iommu_domain *domain,
>> + struct device *dev)
>> +{
>> + if (device_is_rmrr_locked(dev)) {
>> + dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
>> + return -EPERM;
>> + }
> Shouldn't we test this in the common part (i.e. in
> __intel_iommu_attach_device)? Doesn't RMRR also impact aux domains?
RMRR is only for Request ID based DMA translation. An aux domain uses
PASID-granular translation, hence should not be impacted by RMRR.
Furthermore, as far as I know, RMRR is only used for legacy devices and
should never be used with modern devices.
>> +
>> + if (is_device_attach_aux_domain(dev, domain))
>> + return -EPERM;
>> +
>> + /* normally dev is not mapped */
>> + if (unlikely(domain_context_mapped(dev))) {
>> + struct dmar_domain *old_domain;
>> +
>> + old_domain = find_domain(dev);
>> + if (old_domain) {
>> + rcu_read_lock();
>> + dmar_remove_one_dev_info(old_domain, dev);
>> + rcu_read_unlock();
>> +
>> + if (!domain_type_is_vm_or_si(old_domain) &&
>> + list_empty(&old_domain->devices))
>> + domain_exit(old_domain);
>> + }
>> + }
>> +
>> + return __intel_iommu_attach_device(domain, dev);
>> +}
>> +
>> +static int intel_iommu_attach_device_aux(struct iommu_domain *domain,
>> + struct device *dev)
>> +{
>> + return is_device_attach_aux_domain(dev, domain) ?
>> + __intel_iommu_attach_device(domain, dev) : -EPERM;
>> }
>>
>> static void intel_iommu_detach_device(struct iommu_domain *domain,
>> @@ -5080,6 +5220,12 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
>> dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
>> }
>>
>> +static void intel_iommu_detach_device_aux(struct iommu_domain *domain,
>> + struct device *dev)
>> +{
>> + domain_remove_dev_aux(to_dmar_domain(domain), dev);
>> +}
>> +
>> static int intel_iommu_map(struct iommu_domain *domain,
>> unsigned long iova, phys_addr_t hpa,
>> size_t size, int iommu_prot)
>> @@ -5436,6 +5582,8 @@ const struct iommu_ops intel_iommu_ops = {
>> .domain_free = intel_iommu_domain_free,
>> .attach_dev = intel_iommu_attach_device,
>> .detach_dev = intel_iommu_detach_device,
>> + .attach_dev_aux = intel_iommu_attach_device_aux,
>> + .detach_dev_aux = intel_iommu_detach_device_aux,
>> .map = intel_iommu_map,
>> .unmap = intel_iommu_unmap,
>> .iova_to_phys = intel_iommu_iova_to_phys,
>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
>> index 6b198e13e75e..678c7fb05e74 100644
>> --- a/include/linux/intel-iommu.h
>> +++ b/include/linux/intel-iommu.h
>> @@ -473,9 +473,11 @@ struct dmar_domain {
>> /* Domain ids per IOMMU. Use u16 since
>> * domain ids are 16 bit wide according
>> * to VT-d spec, section 9.3 */
>> + unsigned int auxd_refcnt; /* Refcount of auxiliary attaching */
>>
>> bool has_iotlb_device;
>> struct list_head devices; /* all devices' list */
>> + struct list_head auxd; /* link to device's auxiliary list */
>> struct iova_domain iovad; /* iova's that belong to this domain */
>>
>> struct dma_pte *pgd; /* virtual address */
>> @@ -494,6 +496,11 @@ struct dmar_domain {
>> 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
>> u64 max_addr; /* maximum mapped address */
>>
>> + int default_pasid; /*
>> + * The default pasid used for non-SVM
>> + * traffic on mediated devices.
>> + */
>> +
>> struct iommu_domain domain; /* generic domain data structure for
>> iommu core */
>> };
>> @@ -543,6 +550,9 @@ struct device_domain_info {
>> struct list_head link; /* link to domain siblings */
>> struct list_head global; /* link to global list */
>> struct list_head table; /* link to pasid table */
>> + struct list_head auxiliary_domains; /* auxiliary domains
>> + * attached to this device
>> + */
>> u8 bus; /* PCI bus number */
>> u8 devfn; /* PCI devfn number */
>> u16 pfsid; /* SRIOV physical function source ID */
>>
> Thanks
>
> Eric
>
Best regards,
Lu Baolu
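The restructuring agreed on above (a common helper, with the aux and
non-aux specific steps moved into their respective entry points) could take
roughly the following shape. This is a user-space sketch under stated
assumptions: struct mock_dev and the mock_* functions are hypothetical, the
domain type check done by is_device_attach_aux_domain() is reduced to a
single flag, and the counters stand in for domain_add_dev_info() and
domain_add_dev_auxd().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state used only by this illustration. */
struct mock_dev {
	bool rmrr_locked;	/* platform RMRR requirement */
	bool auxd_enabled;	/* driver enabled multiple domains */
	bool has_iommu;		/* device_to_iommu() would succeed */
	int primary;		/* number of primary attaches performed */
	int aux;		/* number of aux attaches performed */
};

/* Common part: the checks both attach paths need. */
static int mock_attach_common(const struct mock_dev *dev)
{
	return dev->has_iommu ? 0 : -ENODEV;
}

/* Primary attach: the RMRR check stays here, aux-enabled is rejected. */
static int mock_attach_device(struct mock_dev *dev)
{
	int ret;

	assert(dev != NULL);

	if (dev->rmrr_locked)
		return -EPERM;

	if (dev->auxd_enabled)
		return -EPERM;

	ret = mock_attach_common(dev);
	if (ret)
		return ret;

	dev->primary++;		/* stands in for domain_add_dev_info() */

	return 0;
}

/* Aux attach: only valid once the device is in auxiliary mode. */
static int mock_attach_device_aux(struct mock_dev *dev)
{
	int ret;

	assert(dev != NULL);

	if (!dev->auxd_enabled)
		return -EPERM;

	ret = mock_attach_common(dev);
	if (ret)
		return ret;

	dev->aux++;		/* stands in for domain_add_dev_auxd() */

	return 0;
}
```

Keeping the RMRR check out of the common helper follows the explanation
given in the reply, namely that RMRR applies to Request ID based
translation and not to PASID-granular translation.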
Hi,
On 11/23/18 10:13 PM, Auger Eric wrote:
> Hi Lu,
>
> On 11/5/18 8:34 AM, Lu Baolu wrote:
>> This adds helpers to attach or detach a domain to a
>> group. This will replace iommu_attach_group() which
>> only works for pci devices.
> s/pci/non mdev?
... which doesn't work for mdev devices.
>>
>> If a domain is attaching to a group which includes the
>> mediated devices, it should attach to the iommu device
>> (a pci device which represents the mdev in iommu scope)
>> instead. The added helper supports attaching domain to
>> groups for both pci and mdev devices.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Signed-off-by: Liu Yi L <[email protected]>
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 114 ++++++++++++++++++++++++++++++--
>> 1 file changed, 107 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index d9fd3188615d..178264b330e7 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -91,6 +91,7 @@ struct vfio_dma {
>> struct vfio_group {
>> struct iommu_group *iommu_group;
>> struct list_head next;
>> + bool mdev_group; /* An mdev group */
>> };
>>
>> /*
>> @@ -1327,6 +1328,105 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
>> return ret;
>> }
>>
>> +static struct device *vfio_mdev_get_iommu_device(struct device *dev)
>> +{
>> + struct device *(*fn)(struct device *dev);
>> + struct device *iommu_parent;
>> +
>> + fn = symbol_get(mdev_get_iommu_device);
>> + if (fn) {
>> + iommu_parent = fn(dev);
>> + symbol_put(mdev_get_iommu_device);
>> +
>> + return iommu_parent;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static int vfio_mdev_set_domain(struct device *dev, struct iommu_domain *domain)
>> +{
>> + int (*fn)(struct device *dev, void *domain);
>> + int ret;
>> +
>> + fn = symbol_get(mdev_set_iommu_domain);
>> + if (fn) {
>> + ret = fn(dev, domain);
>> + symbol_put(mdev_set_iommu_domain);
>> +
>> + return ret;
>> + }
>> +
>> + return -EINVAL;
>> +}
>> +
>> +static int vfio_mdev_attach_domain(struct device *dev, void *data)
>> +{
>> + struct iommu_domain *domain = data;
>> + struct device *iommu_device;
>> + int ret;
>> +
>> + ret = vfio_mdev_set_domain(dev, domain);
>> + if (ret)
>> + return ret;
>> +
>> + iommu_device = vfio_mdev_get_iommu_device(dev);
>> + if (iommu_device) {
>> + bool aux_mode = false;
>> +
>> + iommu_get_dev_attr(iommu_device,
>> + IOMMU_DEV_ATTR_AUXD_ENABLED, &aux_mode);
> Don't you need to test the returned value before using aux_mode?
Yes. Good catch.
>> + if (aux_mode)
>> + return iommu_attach_device_aux(domain, iommu_device);
>> + else
>> + return iommu_attach_device(domain, iommu_device);
> if for some reason the above ops fail, don't you want to call
> vfio_mdev_set_domain(dev, NULL)
Yes. Good catch.
>
>> + }
>> +
>> + return -EINVAL;
>> +}
>> +
>> +static int vfio_mdev_detach_domain(struct device *dev, void *data)
>> +{
>> + struct iommu_domain *domain = data;
>> + struct device *iommu_device;
>> +
>> + vfio_mdev_set_domain(dev, NULL);
>> + iommu_device = vfio_mdev_get_iommu_device(dev);
>> + if (iommu_device) {
>> + bool aux_mode = false;
>> +
>> + iommu_get_dev_attr(iommu_device,
>> + IOMMU_DEV_ATTR_AUXD_ENABLED, &aux_mode);
> same here
Will fix it.
>> + if (aux_mode)
>> + iommu_detach_device_aux(domain, iommu_device);
>> + else
>> + iommu_detach_device(domain, iommu_device);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int vfio_iommu_attach_group(struct vfio_domain *domain,
>> + struct vfio_group *group)
>> +{
>> + if (group->mdev_group)
>> + return iommu_group_for_each_dev(group->iommu_group,
>> + domain->domain,
>> + vfio_mdev_attach_domain);
>> + else
>> + return iommu_attach_group(domain->domain, group->iommu_group);
>> +}
>> +
>> +static void vfio_iommu_detach_group(struct vfio_domain *domain,
>> + struct vfio_group *group)
>> +{
>> + if (group->mdev_group)
>> + iommu_group_for_each_dev(group->iommu_group, domain->domain,
>> + vfio_mdev_detach_domain);
>> + else
>> + iommu_detach_group(domain->domain, group->iommu_group);
>> +}
>> +
>> static int vfio_iommu_type1_attach_group(void *iommu_data,
>> struct iommu_group *iommu_group)
>> {
>> @@ -1402,7 +1502,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> goto out_domain;
>> }
>>
>> - ret = iommu_attach_group(domain->domain, iommu_group);
>> + ret = vfio_iommu_attach_group(domain, group);
>> if (ret)
>> goto out_domain;
>>
>> @@ -1434,8 +1534,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> list_for_each_entry(d, &iommu->domain_list, next) {
>> if (d->domain->ops == domain->domain->ops &&
>> d->prot == domain->prot) {
>> - iommu_detach_group(domain->domain, iommu_group);
>> - if (!iommu_attach_group(d->domain, iommu_group)) {
>> + vfio_iommu_detach_group(domain, group);
>> + if (!vfio_iommu_attach_group(d, group)) {
>> list_add(&group->next, &d->group_list);
>> iommu_domain_free(domain->domain);
>> kfree(domain);
>> @@ -1443,7 +1543,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> return 0;
>> }
>>
>> - ret = iommu_attach_group(domain->domain, iommu_group);
>> + ret = vfio_iommu_attach_group(domain, group);
>> if (ret)
>> goto out_domain;
>> }
>> @@ -1469,7 +1569,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> return 0;
>>
>> out_detach:
>> - iommu_detach_group(domain->domain, iommu_group);
>> + vfio_iommu_detach_group(domain, group);
>> out_domain:
>> iommu_domain_free(domain->domain);
>> out_free:
>> @@ -1560,7 +1660,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>> if (!group)
>> continue;
>>
>> - iommu_detach_group(domain->domain, iommu_group);
>> + vfio_iommu_detach_group(domain, group);
>> list_del(&group->next);
>> kfree(group);
>> /*
>> @@ -1625,7 +1725,7 @@ static void vfio_release_domain(struct vfio_domain *domain, bool external)
>> list_for_each_entry_safe(group, group_tmp,
>> &domain->group_list, next) {
>> if (!external)
>> - iommu_detach_group(domain->domain, group->iommu_group);
>> + vfio_iommu_detach_group(domain, group);
>> list_del(&group->next);
>> kfree(group);
>> }
>>
> Thanks
>
> Eric
>
Best regards,
Lu Baolu
Hi,
On 11/23/18 10:23 PM, Auger Eric wrote:
> Hi Lu,
>
> On 11/5/18 8:34 AM, Lu Baolu wrote:
>> This adds support for determining the isolation type
>> of a mediated device group by checking whether it has
>> an iommu device. If an iommu device exists, an iommu
>> domain will be allocated and then attached to the iommu
>> device. Otherwise, the existing behavior is kept.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Signed-off-by: Sanjay Kumar <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Signed-off-by: Liu Yi L <[email protected]>
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 48 ++++++++++++++++++++++++++++-----
>> 1 file changed, 42 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 178264b330e7..eed26129f58c 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1427,13 +1427,40 @@ static void vfio_iommu_detach_group(struct vfio_domain *domain,
>> iommu_detach_group(domain->domain, group->iommu_group);
>> }
>>
>> +static bool vfio_bus_is_mdev(struct bus_type *bus)
>> +{
>> + struct bus_type *mdev_bus;
>> + bool ret = false;
>> +
>> + mdev_bus = symbol_get(mdev_bus_type);
>> + if (mdev_bus) {
>> + ret = (bus == mdev_bus);
>> + symbol_put(mdev_bus_type);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static int vfio_mdev_iommu_device(struct device *dev, void *data)
>> +{
>> + struct device **old = data, *new;
>> +
>> + new = vfio_mdev_get_iommu_device(dev);
>> + if (*old && *old != new)
> If !new, can't you return -EINVAL as well?
Yes, good catch.
>> + return -EINVAL;
>> +
>> + *old = new;
>> +
>> + return 0;
>> +}
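To make the effect of that check concrete, here is a small userspace
sketch. It is an assumption-laden mock, not the revised patch: struct
device, for_each_dev() and the sketch_*() names are hypothetical, and
for_each_dev() only mimics the way iommu_group_for_each_dev() stops at the
first non-zero return. Without the !new test, a group whose first device
has no iommu device and whose second device has one would pass the check,
because *old is still NULL when the second device is visited.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in; the field mocks vfio_mdev_get_iommu_device(). */
struct device {
	struct device *iommu_device;
};

typedef int (*dev_fn)(struct device *dev, void *data);

/* Mimics iommu_group_for_each_dev(): stop at the first non-zero return. */
static int for_each_dev(struct device **devs, size_t n, void *data, dev_fn fn)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n && !ret; i++)
		ret = fn(devs[i], data);

	return ret;
}

/* Callback with the extra !new test suggested in the review. */
int sketch_mdev_iommu_device(struct device *dev, void *data)
{
	struct device **old = data, *new;

	assert(dev && data);

	new = dev->iommu_device;
	if (!new || (*old && *old != new))
		return -EINVAL;

	*old = new;

	return 0;
}

/*
 * Returns 0 and stores the shared iommu device when every device in the
 * group reports the same one; returns -EINVAL otherwise.
 */
int sketch_group_isolation(struct device **devs, size_t n,
			   struct device **iommu_device)
{
	*iommu_device = NULL;
	return for_each_dev(devs, n, iommu_device, sketch_mdev_iommu_device);
}
```

One consequence in this sketch: a group in which no device has an iommu
device now also yields -EINVAL, so a caller that wants to keep the vendor
defined isolation path would have to fall back to the external domain on
that result instead of jumping to out_free.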
>> +
>> static int vfio_iommu_type1_attach_group(void *iommu_data,
>> struct iommu_group *iommu_group)
>> {
>> struct vfio_iommu *iommu = iommu_data;
>> struct vfio_group *group;
>> struct vfio_domain *domain, *d;
>> - struct bus_type *bus = NULL, *mdev_bus;
>> + struct bus_type *bus = NULL;
>> int ret;
>> bool resv_msi, msi_remap;
>> phys_addr_t resv_msi_base;
>> @@ -1468,11 +1495,18 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> if (ret)
>> goto out_free;
>>
>> - mdev_bus = symbol_get(mdev_bus_type);
>> + if (vfio_bus_is_mdev(bus)) {
>> + struct device *iommu_device = NULL;
>>
>> - if (mdev_bus) {
>> - if ((bus == mdev_bus) && !iommu_present(bus)) {
>> - symbol_put(mdev_bus_type);
>> + group->mdev_group = true;
>> +
>> + /* Determine the isolation type */
>> + ret = iommu_group_for_each_dev(iommu_group, &iommu_device,
>> + vfio_mdev_iommu_device);
>> + if (ret)
>> + goto out_free;
>> +
>> + if (!iommu_device) {
>> if (!iommu->external_domain) {
>> INIT_LIST_HEAD(&domain->group_list);
>> iommu->external_domain = domain;
>> @@ -1482,9 +1516,11 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>> list_add(&group->next,
>> &iommu->external_domain->group_list);
>> mutex_unlock(&iommu->lock);
>> +
> extra new line
Yes.
>> return 0;
>> }
>> - symbol_put(mdev_bus_type);
>> +
>> + bus = iommu_device->bus;
>> }
>>
>> domain->domain = iommu_domain_alloc(bus);
>>
> Thanks
>
> Eric
>
Best regards,
Lu Baolu
On 11/21/2018 2:15 PM, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 02:22:08AM +0530, Kirti Wankhede wrote:
>> It is about how the mdev framework can be used by existing drivers. These
>> symbols don't use any other exported symbols.
>
> That is an unfortunate accident of history, but doesn't extend to new
> ones. It is also another indicator that those drivers probably are derived
> works of the Linux kernel and might be in legal trouble one way or
> another.
>
These symbols only associate the iommu properties of a physical
device with an mdev device; they don't include low-level information.
Thanks,
Kirti
Hi,
Is this solution trying to support general user space processes that
work directly on devices?
Thanks,
Zaibo
Hi,
On 12/4/18 11:46 AM, Xu Zaibo wrote:
> Hi,
>
> Is this solution trying to support general user space processes that
> work directly on devices?
Yes, it is.
>
> Thanks,
> Zaibo
Best regards,
Lu Baolu
Hi,
>>
>> Is this solution trying to support general user space processes that
>> work directly on devices?
>
> Yes, it is.
>
Okay, but I have another question. If I write a crypto driver, can I
call 'mdev_register_device'?
In other words, is 'mdev_register_device' acceptable for crypto
drivers?
+cc: Herbert Xu
Thanks,
Zaibo