Hi,
The Mediate Device is a framework for fine-grained physical device
sharing across the isolated domains. Currently the mdev framework
is designed to be independent of the platform IOMMU support. As the
result, the DMA isolation relies on the mdev parent device in a
vendor specific way.
There are several cases where a mediated device could be protected
and isolated by the platform IOMMU. For example, Intel vt-d rev3.0
[1] introduces a new translation mode called 'scalable mode', which
enables PASID-granular translations. The vt-d scalable mode is the
key ingredient for Scalable I/O Virtualization [2] [3] which allows
sharing a device in minimal possible granularity (ADI - Assignable
Device Interface).
A mediated device backed by an ADI could be protected and isolated
by the IOMMU since 1) the parent device supports tagging an unique
PASID to all DMA traffic out of the mediated device; and 2) the DMA
translation unit (IOMMU) supports the PASID granular translation.
We can apply IOMMU protection and isolation to this kind of devices
just as what we are doing with an assignable PCI device.
In order to distinguish the IOMMU-capable mediated devices from those
which still need to rely on parent devices, this patch set adds one
new member in struct mdev_device.
* iommu_device
- This, if set, indicates that the mediated device could
be fully isolated and protected by IOMMU via attaching
an iommu domain to this device. If empty, it indicates
using vendor defined isolation.
Below helpers are added to set and get above iommu device in mdev core
implementation.
* mdev_set/get_iommu_device(dev, iommu_device)
- Set or get the iommu device which represents this mdev
in IOMMU's device scope. Drivers don't need to set the
iommu device if it uses vendor defined isolation.
The mdev parent device driver could opt-in that the mdev could be
fully isolated and protected by the IOMMU when the mdev is being
created by invoking mdev_set_iommu_device() in its @create().
In the vfio_iommu_type1_attach_group(), a domain allocated through
iommu_domain_alloc() will be attached to the mdev iommu device if
an iommu device has been set. Otherwise, the dummy external domain
will be used and all the DMA isolation and protection are routed to
parent driver as the result.
On IOMMU side, a basic requirement is allowing to attach multiple
domains to a PCI device if the device advertises the capability
and the IOMMU hardware supports finer granularity translations than
the normal PCI Source ID based translation.
As the result, a PCI device could work in two modes: normal mode
and auxiliary mode. In the normal mode, a pci device could be
isolated in the Source ID granularity; the pci device itself could
be assigned to a user application by attaching a single domain
to it. In the auxiliary mode, a pci device could be isolated in
finer granularity, hence subsets of the device could be assigned
to different user level application by attaching a different domain
to each subset.
Below APIs are introduced in iommu generic layer for aux-domain
purpose:
* iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
- Detect both IOMMU and PCI endpoint devices supporting
the feature (aux-domain here) without the host driver
dependency.
* iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
- Check the enabling status of the feature (aux-domain
here). The aux-domain interfaces are available only
if this returns true.
* iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
- Enable/disable device specific aux-domain feature.
* iommu_aux_attach_device(domain, dev)
- Attaches @domain to @dev in the auxiliary mode. Multiple
domains could be attached to a single device in the
auxiliary mode with each domain representing an isolated
address space for an assignable subset of the device.
* iommu_aux_detach_device(domain, dev)
- Detach @domain which has been attached to @dev in the
auxiliary mode.
* iommu_aux_get_pasid(domain, dev)
- Return ID used for finer-granularity DMA translation.
For the Intel Scalable IOV usage model, this will be
a PASID. The device which supports Scalable IOV needs
to write this ID to the device register so that DMA
requests could be tagged with a right PASID prefix.
In order for the ease of discussion, sometimes we call "a domain in
auxiliary mode' or simply 'an auxiliary domain' when a domain is
attached to a device for finer granularity translations. But we need
to keep in mind that this doesn't mean there is a differnt domain
type. A same domain could be bound to a device for Source ID based
translation, and bound to another device for finer granularity
translation at the same time.
This patch series extends both IOMMU and vfio components to support
mdev device passing through when it could be isolated and protected
by the IOMMU units. The first part of this series (PATCH 1/09~6/09)
adds the interfaces and implementation of the multiple domains per
device. The second part (PATCH 7/09~9/09) adds the iommu device
attribute to each mdev, determines isolation type according to the
existence of an iommu device when attaching group in vfio type1 iommu
module, and attaches the domain to iommu aware mediated devices.
References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
Best regards,
Lu Baolu
Change log:
v7->v8:
- [PATCH 9/9] Remove the iommu->external_domain check in both
vfio_iommu_type1_pin_pages() and vfio_iommu_type1_unpin_pages()
- Rebase all patches to 5.1-rc2.
v6->v7:
- Update PATCH 1/9 and the cover letter with Jean's comments
posted at https://patchwork.kernel.org/patch/10809093/.
- Add Jean's Reviewed-by in patch 1,7,8,9.
v5->v6:
- Add a new API iommu_dev_feature_enabled() to check whether
an IOMMU specific feature is enabled.
- Rework the vt-d specific per device feature ops according
to Joerg's comments. [https://lkml.org/lkml/2019/1/11/302].
- PATCH 2/9 is added to move intel_iommu_enable_pasid() out
of the scope of CONFIG_INTEL_IOMMU_SVM without functional
changes.
- All patches are rebased on top of vt-d branch of Joerg's
iommu tree.
v4->v5:
- The iommu APIs have been updated with Joerg's proposal posted
here https://www.spinics.net/lists/iommu/msg31874.html.
- Some typos in commit message and comments have been fixed.
- PATCH 3/8 was split from 4/8 to ease code review.
- mdev->domain was removed and could bring back when there's a
real consumer.
- Other code review comments I received during v4 review period
except the EXPORT_SYMBOL vs. EXPORT_SYMBOL_GPL in PATCH 6/8.
- Rebase all patches to 5.0-rc1.
v3->v4:
- Use aux domain specific interfaces for domain attach and detach.
- Rebase all patches to 4.20-rc1.
v2->v3:
- Remove domain type enum and use a pointer on mdev_device instead.
- Add a generic interface for getting/setting per device iommu
attributions. And use it for query aux domain capability, enable
aux domain and disable aux domain purpose.
- Reuse iommu_domain_get_attr() to retrieve the id in a aux domain.
- We discussed the impact of the default domain implementation
on reusing iommu_at(de)tach_device() interfaces. We agreed
that reusing iommu_at(de)tach_device() interfaces is the right
direction and we could tweak the code to remove the impact.
https://www.spinics.net/lists/kvm/msg175285.html
- Removed the RFC tag since no objections received.
- This patch has been submitted separately.
https://www.spinics.net/lists/kvm/msg173936.html
v1->v2:
- Rewrite the patches with the concept of auxiliary domains.
Lu Baolu (9):
iommu: Add APIs for multiple domains per device
iommu/vt-d: Make intel_iommu_enable_pasid() more generic
iommu/vt-d: Add per-device IOMMU feature ops entries
iommu/vt-d: Move common code out of iommu_attch_device()
iommu/vt-d: Aux-domain specific domain attach/detach
iommu/vt-d: Return ID associated with an auxiliary domain
vfio/mdev: Add iommu related member in mdev_device
vfio/type1: Add domain at(de)taching group helpers
vfio/type1: Handle different mdev isolation type
drivers/iommu/intel-iommu.c | 398 ++++++++++++++++++++++++++++---
drivers/iommu/intel-svm.c | 19 +-
drivers/iommu/iommu.c | 96 ++++++++
drivers/vfio/mdev/mdev_core.c | 18 ++
drivers/vfio/mdev/mdev_private.h | 1 +
drivers/vfio/vfio_iommu_type1.c | 139 +++++++++--
include/linux/intel-iommu.h | 13 +-
include/linux/iommu.h | 70 ++++++
include/linux/mdev.h | 14 ++
9 files changed, 710 insertions(+), 58 deletions(-)
--
2.17.1
This moves intel_iommu_enable_pasid() out of the scope of
CONFIG_INTEL_IOMMU_SVM with more and more features requiring
pasid function.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel-iommu.c | 21 +++++++--------------
drivers/iommu/intel-svm.c | 19 ++++++++++++++++++-
include/linux/intel-iommu.h | 2 +-
3 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 28cb713d728c..d2e613875b3a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5307,8 +5307,7 @@ static void intel_iommu_put_resv_regions(struct device *dev,
}
}
-#ifdef CONFIG_INTEL_IOMMU_SVM
-int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev)
+int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev)
{
struct device_domain_info *info;
struct context_entry *context;
@@ -5317,7 +5316,7 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sd
u64 ctx_lo;
int ret;
- domain = get_valid_domain_for_dev(sdev->dev);
+ domain = get_valid_domain_for_dev(dev);
if (!domain)
return -EINVAL;
@@ -5325,7 +5324,7 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sd
spin_lock(&iommu->lock);
ret = -EINVAL;
- info = sdev->dev->archdata.iommu;
+ info = dev->archdata.iommu;
if (!info || !info->pasid_supported)
goto out;
@@ -5335,14 +5334,13 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sd
ctx_lo = context[0].lo;
- sdev->did = FLPT_DEFAULT_DID;
- sdev->sid = PCI_DEVID(info->bus, info->devfn);
-
if (!(ctx_lo & CONTEXT_PASIDE)) {
ctx_lo |= CONTEXT_PASIDE;
context[0].lo = ctx_lo;
wmb();
- iommu->flush.flush_context(iommu, sdev->did, sdev->sid,
+ iommu->flush.flush_context(iommu,
+ domain->iommu_did[iommu->seq_id],
+ PCI_DEVID(info->bus, info->devfn),
DMA_CCMD_MASK_NOBIT,
DMA_CCMD_DEVICE_INVL);
}
@@ -5351,12 +5349,6 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sd
if (!info->pasid_enabled)
iommu_enable_dev_iotlb(info);
- if (info->ats_enabled) {
- sdev->dev_iotlb = 1;
- sdev->qdep = info->ats_qdep;
- if (sdev->qdep >= QI_DEV_EIOTLB_MAX_INVS)
- sdev->qdep = 0;
- }
ret = 0;
out:
@@ -5366,6 +5358,7 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sd
return ret;
}
+#ifdef CONFIG_INTEL_IOMMU_SVM
struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
{
struct intel_iommu *iommu;
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 3a4b09ae8561..8f87304f915c 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -228,6 +228,7 @@ static LIST_HEAD(global_svm_list);
int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
{
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+ struct device_domain_info *info;
struct intel_svm_dev *sdev;
struct intel_svm *svm = NULL;
struct mm_struct *mm = NULL;
@@ -291,13 +292,29 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
}
sdev->dev = dev;
- ret = intel_iommu_enable_pasid(iommu, sdev);
+ ret = intel_iommu_enable_pasid(iommu, dev);
if (ret || !pasid) {
/* If they don't actually want to assign a PASID, this is
* just an enabling check/preparation. */
kfree(sdev);
goto out;
}
+
+ info = dev->archdata.iommu;
+ if (!info || !info->pasid_supported) {
+ kfree(sdev);
+ goto out;
+ }
+
+ sdev->did = FLPT_DEFAULT_DID;
+ sdev->sid = PCI_DEVID(info->bus, info->devfn);
+ if (info->ats_enabled) {
+ sdev->dev_iotlb = 1;
+ sdev->qdep = info->ats_qdep;
+ if (sdev->qdep >= QI_DEV_EIOTLB_MAX_INVS)
+ sdev->qdep = 0;
+ }
+
/* Finish the setup now we know we're keeping it */
sdev->users = 1;
sdev->ops = ops;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index fa364de9db18..b7d1e2fbb9ca 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -650,6 +650,7 @@ struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
int for_each_device_domain(int (*fn)(struct device_domain_info *info,
void *data), void *data);
void iommu_flush_write_buffer(struct intel_iommu *iommu);
+int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
#ifdef CONFIG_INTEL_IOMMU_SVM
int intel_svm_init(struct intel_iommu *iommu);
@@ -679,7 +680,6 @@ struct intel_svm {
struct list_head list;
};
-extern int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev);
extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
#endif
--
2.17.1
This adds the iommu ops entries for aux-domain per-device
feature query and enable/disable.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel-iommu.c | 159 ++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 1 +
2 files changed, 160 insertions(+)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d2e613875b3a..ba5f2e5b3a03 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2485,6 +2485,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->domain = domain;
info->iommu = iommu;
info->pasid_table = NULL;
+ info->auxd_enabled = 0;
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -5223,6 +5224,42 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
return phys;
}
+static inline bool scalable_mode_support(void)
+{
+ struct dmar_drhd_unit *drhd;
+ struct intel_iommu *iommu;
+ bool ret = true;
+
+ rcu_read_lock();
+ for_each_active_iommu(iommu, drhd) {
+ if (!sm_supported(iommu)) {
+ ret = false;
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return ret;
+}
+
+static inline bool iommu_pasid_support(void)
+{
+ struct dmar_drhd_unit *drhd;
+ struct intel_iommu *iommu;
+ bool ret = true;
+
+ rcu_read_lock();
+ for_each_active_iommu(iommu, drhd) {
+ if (!pasid_supported(iommu)) {
+ ret = false;
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return ret;
+}
+
static bool intel_iommu_capable(enum iommu_cap cap)
{
if (cap == IOMMU_CAP_CACHE_COHERENCY)
@@ -5380,6 +5417,124 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
}
#endif /* CONFIG_INTEL_IOMMU_SVM */
+static int intel_iommu_enable_auxd(struct device *dev)
+{
+ struct device_domain_info *info;
+ struct intel_iommu *iommu;
+ unsigned long flags;
+ u8 bus, devfn;
+ int ret;
+
+ iommu = device_to_iommu(dev, &bus, &devfn);
+ if (!iommu || dmar_disabled)
+ return -EINVAL;
+
+ if (!sm_supported(iommu) || !pasid_supported(iommu))
+ return -EINVAL;
+
+ ret = intel_iommu_enable_pasid(iommu, dev);
+ if (ret)
+ return -ENODEV;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ info = dev->archdata.iommu;
+ info->auxd_enabled = 1;
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
+ return 0;
+}
+
+static int intel_iommu_disable_auxd(struct device *dev)
+{
+ struct device_domain_info *info;
+ unsigned long flags;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ info = dev->archdata.iommu;
+ if (!WARN_ON(!info))
+ info->auxd_enabled = 0;
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
+ return 0;
+}
+
+/*
+ * A PCI express designated vendor specific extended capability is defined
+ * in the section 3.7 of Intel scalable I/O virtualization technical spec
+ * for system software and tools to detect endpoint devices supporting the
+ * Intel scalable IO virtualization without host driver dependency.
+ *
+ * Returns the address of the matching extended capability structure within
+ * the device's PCI configuration space or 0 if the device does not support
+ * it.
+ */
+static int siov_find_pci_dvsec(struct pci_dev *pdev)
+{
+ int pos;
+ u16 vendor, id;
+
+ pos = pci_find_next_ext_capability(pdev, 0, 0x23);
+ while (pos) {
+ pci_read_config_word(pdev, pos + 4, &vendor);
+ pci_read_config_word(pdev, pos + 8, &id);
+ if (vendor == PCI_VENDOR_ID_INTEL && id == 5)
+ return pos;
+
+ pos = pci_find_next_ext_capability(pdev, pos, 0x23);
+ }
+
+ return 0;
+}
+
+static bool
+intel_iommu_dev_has_feat(struct device *dev, enum iommu_dev_features feat)
+{
+ if (feat == IOMMU_DEV_FEAT_AUX) {
+ int ret;
+
+ if (!dev_is_pci(dev) || dmar_disabled ||
+ !scalable_mode_support() || !iommu_pasid_support())
+ return false;
+
+ ret = pci_pasid_features(to_pci_dev(dev));
+ if (ret < 0)
+ return false;
+
+ return !!siov_find_pci_dvsec(to_pci_dev(dev));
+ }
+
+ return false;
+}
+
+static int
+intel_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+ if (feat == IOMMU_DEV_FEAT_AUX)
+ return intel_iommu_enable_auxd(dev);
+
+ return -ENODEV;
+}
+
+static int
+intel_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+ if (feat == IOMMU_DEV_FEAT_AUX)
+ return intel_iommu_disable_auxd(dev);
+
+ return -ENODEV;
+}
+
+static bool
+intel_iommu_dev_feat_enabled(struct device *dev, enum iommu_dev_features feat)
+{
+ struct device_domain_info *info = dev->archdata.iommu;
+
+ if (feat == IOMMU_DEV_FEAT_AUX)
+ return scalable_mode_support() && info && info->auxd_enabled;
+
+ return false;
+}
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.domain_alloc = intel_iommu_domain_alloc,
@@ -5394,6 +5549,10 @@ const struct iommu_ops intel_iommu_ops = {
.get_resv_regions = intel_iommu_get_resv_regions,
.put_resv_regions = intel_iommu_put_resv_regions,
.device_group = pci_device_group,
+ .dev_has_feat = intel_iommu_dev_has_feat,
+ .dev_feat_enabled = intel_iommu_dev_feat_enabled,
+ .dev_enable_feat = intel_iommu_dev_enable_feat,
+ .dev_disable_feat = intel_iommu_dev_disable_feat,
.pgsize_bitmap = INTEL_IOMMU_PGSIZES,
};
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index b7d1e2fbb9ca..4f0745479b6d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -568,6 +568,7 @@ struct device_domain_info {
u8 pri_enabled:1;
u8 ats_supported:1;
u8 ats_enabled:1;
+ u8 auxd_enabled:1; /* Multiple domains per device */
u8 ats_qdep;
struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
struct intel_iommu *iommu; /* IOMMU used by this device */
--
2.17.1
This part of code could be used by both normal and aux
domain specific attach entries. Hence move them into a
common function to avoid duplication.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel-iommu.c | 60 ++++++++++++++++++++++---------------
1 file changed, 36 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ba5f2e5b3a03..a0f9c748ca9f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5066,35 +5066,14 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
domain_exit(to_dmar_domain(domain));
}
-static int intel_iommu_attach_device(struct iommu_domain *domain,
- struct device *dev)
+static int prepare_domain_attach_device(struct iommu_domain *domain,
+ struct device *dev)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct intel_iommu *iommu;
int addr_width;
u8 bus, devfn;
- if (device_is_rmrr_locked(dev)) {
- dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
- return -EPERM;
- }
-
- /* normally dev is not mapped */
- if (unlikely(domain_context_mapped(dev))) {
- struct dmar_domain *old_domain;
-
- old_domain = find_domain(dev);
- if (old_domain) {
- rcu_read_lock();
- dmar_remove_one_dev_info(dev);
- rcu_read_unlock();
-
- if (!domain_type_is_vm_or_si(old_domain) &&
- list_empty(&old_domain->devices))
- domain_exit(old_domain);
- }
- }
-
iommu = device_to_iommu(dev, &bus, &devfn);
if (!iommu)
return -ENODEV;
@@ -5127,7 +5106,40 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
dmar_domain->agaw--;
}
- return domain_add_dev_info(dmar_domain, dev);
+ return 0;
+}
+
+static int intel_iommu_attach_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+ int ret;
+
+ if (device_is_rmrr_locked(dev)) {
+ dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
+ return -EPERM;
+ }
+
+ /* normally dev is not mapped */
+ if (unlikely(domain_context_mapped(dev))) {
+ struct dmar_domain *old_domain;
+
+ old_domain = find_domain(dev);
+ if (old_domain) {
+ rcu_read_lock();
+ dmar_remove_one_dev_info(dev);
+ rcu_read_unlock();
+
+ if (!domain_type_is_vm_or_si(old_domain) &&
+ list_empty(&old_domain->devices))
+ domain_exit(old_domain);
+ }
+ }
+
+ ret = prepare_domain_attach_device(domain, dev);
+ if (ret)
+ return ret;
+
+ return domain_add_dev_info(to_dmar_domain(domain), dev);
}
static void intel_iommu_detach_device(struct iommu_domain *domain,
--
2.17.1
When multiple domains per device has been enabled by the
device driver, the device will tag the default PASID for
the domain to all DMA traffics out of the subset of this
device; and the IOMMU should translate the DMA requests
in PASID granularity.
This adds the intel_iommu_aux_attach/detach_device() ops
to support managing PASID granular translation structures
when the device driver has enabled multiple domains per
device.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel-iommu.c | 152 ++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 10 +++
2 files changed, 162 insertions(+)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a0f9c748ca9f..28a998afaf74 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2486,6 +2486,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->iommu = iommu;
info->pasid_table = NULL;
info->auxd_enabled = 0;
+ INIT_LIST_HEAD(&info->auxiliary_domains);
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -5066,6 +5067,131 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
domain_exit(to_dmar_domain(domain));
}
+/*
+ * Check whether a @domain could be attached to the @dev through the
+ * aux-domain attach/detach APIs.
+ */
+static inline bool
+is_aux_domain(struct device *dev, struct iommu_domain *domain)
+{
+ struct device_domain_info *info = dev->archdata.iommu;
+
+ return info && info->auxd_enabled &&
+ domain->type == IOMMU_DOMAIN_UNMANAGED;
+}
+
+static void auxiliary_link_device(struct dmar_domain *domain,
+ struct device *dev)
+{
+ struct device_domain_info *info = dev->archdata.iommu;
+
+ assert_spin_locked(&device_domain_lock);
+ if (WARN_ON(!info))
+ return;
+
+ domain->auxd_refcnt++;
+ list_add(&domain->auxd, &info->auxiliary_domains);
+}
+
+static void auxiliary_unlink_device(struct dmar_domain *domain,
+ struct device *dev)
+{
+ struct device_domain_info *info = dev->archdata.iommu;
+
+ assert_spin_locked(&device_domain_lock);
+ if (WARN_ON(!info))
+ return;
+
+ list_del(&domain->auxd);
+ domain->auxd_refcnt--;
+
+ if (!domain->auxd_refcnt && domain->default_pasid > 0)
+ intel_pasid_free_id(domain->default_pasid);
+}
+
+static int aux_domain_add_dev(struct dmar_domain *domain,
+ struct device *dev)
+{
+ int ret;
+ u8 bus, devfn;
+ unsigned long flags;
+ struct intel_iommu *iommu;
+
+ iommu = device_to_iommu(dev, &bus, &devfn);
+ if (!iommu)
+ return -ENODEV;
+
+ if (domain->default_pasid <= 0) {
+ int pasid;
+
+ pasid = intel_pasid_alloc_id(domain, PASID_MIN,
+ pci_max_pasids(to_pci_dev(dev)),
+ GFP_KERNEL);
+ if (pasid <= 0) {
+ pr_err("Can't allocate default pasid\n");
+ return -ENODEV;
+ }
+ domain->default_pasid = pasid;
+ }
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ /*
+ * iommu->lock must be held to attach domain to iommu and setup the
+ * pasid entry for second level translation.
+ */
+ spin_lock(&iommu->lock);
+ ret = domain_attach_iommu(domain, iommu);
+ if (ret)
+ goto attach_failed;
+
+ /* Setup the PASID entry for mediated devices: */
+ ret = intel_pasid_setup_second_level(iommu, domain, dev,
+ domain->default_pasid);
+ if (ret)
+ goto table_failed;
+ spin_unlock(&iommu->lock);
+
+ auxiliary_link_device(domain, dev);
+
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
+ return 0;
+
+table_failed:
+ domain_detach_iommu(domain, iommu);
+attach_failed:
+ spin_unlock(&iommu->lock);
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+ if (!domain->auxd_refcnt && domain->default_pasid > 0)
+ intel_pasid_free_id(domain->default_pasid);
+
+ return ret;
+}
+
+static void aux_domain_remove_dev(struct dmar_domain *domain,
+ struct device *dev)
+{
+ struct device_domain_info *info;
+ struct intel_iommu *iommu;
+ unsigned long flags;
+
+ if (!is_aux_domain(dev, &domain->domain))
+ return;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ info = dev->archdata.iommu;
+ iommu = info->iommu;
+
+ auxiliary_unlink_device(domain, dev);
+
+ spin_lock(&iommu->lock);
+ intel_pasid_tear_down_entry(iommu, dev, domain->default_pasid);
+ domain_detach_iommu(domain, iommu);
+ spin_unlock(&iommu->lock);
+
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
static int prepare_domain_attach_device(struct iommu_domain *domain,
struct device *dev)
{
@@ -5119,6 +5245,9 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
return -EPERM;
}
+ if (is_aux_domain(dev, domain))
+ return -EPERM;
+
/* normally dev is not mapped */
if (unlikely(domain_context_mapped(dev))) {
struct dmar_domain *old_domain;
@@ -5142,12 +5271,33 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
return domain_add_dev_info(to_dmar_domain(domain), dev);
}
+static int intel_iommu_aux_attach_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+ int ret;
+
+ if (!is_aux_domain(dev, domain))
+ return -EPERM;
+
+ ret = prepare_domain_attach_device(domain, dev);
+ if (ret)
+ return ret;
+
+ return aux_domain_add_dev(to_dmar_domain(domain), dev);
+}
+
static void intel_iommu_detach_device(struct iommu_domain *domain,
struct device *dev)
{
dmar_remove_one_dev_info(dev);
}
+static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+ aux_domain_remove_dev(to_dmar_domain(domain), dev);
+}
+
static int intel_iommu_map(struct iommu_domain *domain,
unsigned long iova, phys_addr_t hpa,
size_t size, int iommu_prot)
@@ -5553,6 +5703,8 @@ const struct iommu_ops intel_iommu_ops = {
.domain_free = intel_iommu_domain_free,
.attach_dev = intel_iommu_attach_device,
.detach_dev = intel_iommu_detach_device,
+ .aux_attach_dev = intel_iommu_aux_attach_device,
+ .aux_detach_dev = intel_iommu_aux_detach_device,
.map = intel_iommu_map,
.unmap = intel_iommu_unmap,
.iova_to_phys = intel_iommu_iova_to_phys,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4f0745479b6d..6925a18a5ca3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -489,9 +489,11 @@ struct dmar_domain {
/* Domain ids per IOMMU. Use u16 since
* domain ids are 16 bit wide according
* to VT-d spec, section 9.3 */
+ unsigned int auxd_refcnt; /* Refcount of auxiliary attaching */
bool has_iotlb_device;
struct list_head devices; /* all devices' list */
+ struct list_head auxd; /* link to device's auxiliary list */
struct iova_domain iovad; /* iova's that belong to this domain */
struct dma_pte *pgd; /* virtual address */
@@ -510,6 +512,11 @@ struct dmar_domain {
2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
u64 max_addr; /* maximum mapped address */
+ int default_pasid; /*
+ * The default pasid used for non-SVM
+ * traffic on mediated devices.
+ */
+
struct iommu_domain domain; /* generic domain data structure for
iommu core */
};
@@ -559,6 +566,9 @@ struct device_domain_info {
struct list_head link; /* link to domain siblings */
struct list_head global; /* link to global list */
struct list_head table; /* link to pasid table */
+ struct list_head auxiliary_domains; /* auxiliary domains
+ * attached to this device
+ */
u8 bus; /* PCI bus number */
u8 devfn; /* PCI devfn number */
u16 pfsid; /* SRIOV physical function source ID */
--
2.17.1
A parent device might create different types of mediated
devices. For example, a mediated device could be created
by the parent device with full isolation and protection
provided by the IOMMU. One usage case could be found on
Intel platforms where a mediated device is an assignable
subset of a PCI, the DMA requests on behalf of it are all
tagged with a PASID. Since IOMMU supports PASID-granular
translations (scalable mode in VT-d 3.0), this mediated
device could be individually protected and isolated by an
IOMMU.
This patch adds a new member in the struct mdev_device to
indicate that the mediated device represented by mdev could
be isolated and protected by attaching a domain to a device
represented by mdev->iommu_device. It also adds a helper to
add or set the iommu device.
* mdev_device->iommu_device
- This, if set, indicates that the mediated device could
be fully isolated and protected by IOMMU via attaching
an iommu domain to this device. If empty, it indicates
using vendor defined isolation, hence bypass IOMMU.
* mdev_set/get_iommu_device(dev, iommu_device)
- Set or get the iommu device which represents this mdev
in IOMMU's device scope. Drivers don't need to set the
iommu device if it uses vendor defined isolation.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Suggested-by: Kevin Tian <[email protected]>
Suggested-by: Alex Williamson <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++
drivers/vfio/mdev/mdev_private.h | 1 +
include/linux/mdev.h | 14 ++++++++++++++
3 files changed, 33 insertions(+)
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b96fedc77ee5..1b6435529166 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove)
return 0;
}
+int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
+{
+ struct mdev_device *mdev = to_mdev_device(dev);
+
+ mdev->iommu_device = iommu_device;
+
+ return 0;
+}
+EXPORT_SYMBOL(mdev_set_iommu_device);
+
+struct device *mdev_get_iommu_device(struct device *dev)
+{
+ struct mdev_device *mdev = to_mdev_device(dev);
+
+ return mdev->iommu_device;
+}
+EXPORT_SYMBOL(mdev_get_iommu_device);
+
static int __init mdev_init(void)
{
return mdev_bus_register();
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 379758c52b1b..bfb7b22a7cb6 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -34,6 +34,7 @@ struct mdev_device {
struct list_head next;
struct kobject *type_kobj;
bool active;
+ struct device *iommu_device;
};
#define to_mdev_device(dev) container_of(dev, struct mdev_device, dev)
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index d7aee90e5da5..df2ea39f47ee 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -15,6 +15,20 @@
struct mdev_device;
+/*
+ * Called by the parent device driver to set the device which represents
+ * this mdev in iommu protection scope. By default, the iommu device is
+ * NULL, that indicates using vendor defined isolation.
+ *
+ * @dev: the mediated device that iommu will isolate.
+ * @iommu_device: a pci device which represents the iommu for @dev.
+ *
+ * Return 0 for success, otherwise negative error value.
+ */
+int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
+
+struct device *mdev_get_iommu_device(struct device *dev);
+
/**
* struct mdev_parent_ops - Structure to be registered for each parent device to
* register the device to mdev module.
--
2.17.1
This adds helpers to attach or detach a domain to a
group. This will replace iommu_attach_group() which
only works for non-mdev devices.
If a domain is attaching to a group which includes the
mediated devices, it should attach to the iommu device
(a pci device which represents the mdev in iommu scope)
instead. The added helper supports attaching domain to
groups for both pci and mdev devices.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 84 ++++++++++++++++++++++++++++++---
1 file changed, 77 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 73652e21efec..ccc4165474aa 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -91,6 +91,7 @@ struct vfio_dma {
struct vfio_group {
struct iommu_group *iommu_group;
struct list_head next;
+ bool mdev_group; /* An mdev group */
};
/*
@@ -1298,6 +1299,75 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
return ret;
}
+static struct device *vfio_mdev_get_iommu_device(struct device *dev)
+{
+ struct device *(*fn)(struct device *dev);
+ struct device *iommu_device;
+
+ fn = symbol_get(mdev_get_iommu_device);
+ if (fn) {
+ iommu_device = fn(dev);
+ symbol_put(mdev_get_iommu_device);
+
+ return iommu_device;
+ }
+
+ return NULL;
+}
+
+static int vfio_mdev_attach_domain(struct device *dev, void *data)
+{
+ struct iommu_domain *domain = data;
+ struct device *iommu_device;
+
+ iommu_device = vfio_mdev_get_iommu_device(dev);
+ if (iommu_device) {
+ if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
+ return iommu_aux_attach_device(domain, iommu_device);
+ else
+ return iommu_attach_device(domain, iommu_device);
+ }
+
+ return -EINVAL;
+}
+
+static int vfio_mdev_detach_domain(struct device *dev, void *data)
+{
+ struct iommu_domain *domain = data;
+ struct device *iommu_device;
+
+ iommu_device = vfio_mdev_get_iommu_device(dev);
+ if (iommu_device) {
+ if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
+ iommu_aux_detach_device(domain, iommu_device);
+ else
+ iommu_detach_device(domain, iommu_device);
+ }
+
+ return 0;
+}
+
+static int vfio_iommu_attach_group(struct vfio_domain *domain,
+ struct vfio_group *group)
+{
+ if (group->mdev_group)
+ return iommu_group_for_each_dev(group->iommu_group,
+ domain->domain,
+ vfio_mdev_attach_domain);
+ else
+ return iommu_attach_group(domain->domain, group->iommu_group);
+}
+
+static void vfio_iommu_detach_group(struct vfio_domain *domain,
+ struct vfio_group *group)
+{
+ if (group->mdev_group)
+ iommu_group_for_each_dev(group->iommu_group, domain->domain,
+ vfio_mdev_detach_domain);
+ else
+ iommu_detach_group(domain->domain, group->iommu_group);
+}
+
static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
@@ -1373,7 +1443,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
goto out_domain;
}
- ret = iommu_attach_group(domain->domain, iommu_group);
+ ret = vfio_iommu_attach_group(domain, group);
if (ret)
goto out_domain;
@@ -1405,8 +1475,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
list_for_each_entry(d, &iommu->domain_list, next) {
if (d->domain->ops == domain->domain->ops &&
d->prot == domain->prot) {
- iommu_detach_group(domain->domain, iommu_group);
- if (!iommu_attach_group(d->domain, iommu_group)) {
+ vfio_iommu_detach_group(domain, group);
+ if (!vfio_iommu_attach_group(d, group)) {
list_add(&group->next, &d->group_list);
iommu_domain_free(domain->domain);
kfree(domain);
@@ -1414,7 +1484,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
return 0;
}
- ret = iommu_attach_group(domain->domain, iommu_group);
+ ret = vfio_iommu_attach_group(domain, group);
if (ret)
goto out_domain;
}
@@ -1440,7 +1510,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
return 0;
out_detach:
- iommu_detach_group(domain->domain, iommu_group);
+ vfio_iommu_detach_group(domain, group);
out_domain:
iommu_domain_free(domain->domain);
out_free:
@@ -1531,7 +1601,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
if (!group)
continue;
- iommu_detach_group(domain->domain, iommu_group);
+ vfio_iommu_detach_group(domain, group);
list_del(&group->next);
kfree(group);
/*
@@ -1596,7 +1666,7 @@ static void vfio_release_domain(struct vfio_domain *domain, bool external)
list_for_each_entry_safe(group, group_tmp,
&domain->group_list, next) {
if (!external)
- iommu_detach_group(domain->domain, group->iommu_group);
+ vfio_iommu_detach_group(domain, group);
list_del(&group->next);
kfree(group);
}
--
2.17.1
This adds the support to determine the isolation type
of a mediated device group by checking whether it has
an iommu device. If an iommu device exists, an iommu
domain will be allocated and then attached to the iommu
device. Otherwise, keep the same behavior as it is.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 55 +++++++++++++++++++++++++--------
1 file changed, 42 insertions(+), 13 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ccc4165474aa..b91cafcd5181 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -559,7 +559,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
mutex_lock(&iommu->lock);
/* Fail if notifier list is empty */
- if ((!iommu->external_domain) || (!iommu->notifier.head)) {
+ if (!iommu->notifier.head) {
ret = -EINVAL;
goto pin_done;
}
@@ -641,11 +641,6 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
mutex_lock(&iommu->lock);
- if (!iommu->external_domain) {
- mutex_unlock(&iommu->lock);
- return -EINVAL;
- }
-
do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
for (i = 0; i < npage; i++) {
struct vfio_dma *dma;
@@ -1368,13 +1363,40 @@ static void vfio_iommu_detach_group(struct vfio_domain *domain,
iommu_detach_group(domain->domain, group->iommu_group);
}
+static bool vfio_bus_is_mdev(struct bus_type *bus)
+{
+ struct bus_type *mdev_bus;
+ bool ret = false;
+
+ mdev_bus = symbol_get(mdev_bus_type);
+ if (mdev_bus) {
+ ret = (bus == mdev_bus);
+ symbol_put(mdev_bus_type);
+ }
+
+ return ret;
+}
+
+static int vfio_mdev_iommu_device(struct device *dev, void *data)
+{
+ struct device **old = data, *new;
+
+ new = vfio_mdev_get_iommu_device(dev);
+ if (!new || (*old && *old != new))
+ return -EINVAL;
+
+ *old = new;
+
+ return 0;
+}
+
static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
struct vfio_iommu *iommu = iommu_data;
struct vfio_group *group;
struct vfio_domain *domain, *d;
- struct bus_type *bus = NULL, *mdev_bus;
+ struct bus_type *bus = NULL;
int ret;
bool resv_msi, msi_remap;
phys_addr_t resv_msi_base;
@@ -1409,23 +1431,30 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (ret)
goto out_free;
- mdev_bus = symbol_get(mdev_bus_type);
+ if (vfio_bus_is_mdev(bus)) {
+ struct device *iommu_device = NULL;
- if (mdev_bus) {
- if ((bus == mdev_bus) && !iommu_present(bus)) {
- symbol_put(mdev_bus_type);
+ group->mdev_group = true;
+
+ /* Determine the isolation type */
+ ret = iommu_group_for_each_dev(iommu_group, &iommu_device,
+ vfio_mdev_iommu_device);
+ if (ret || !iommu_device) {
if (!iommu->external_domain) {
INIT_LIST_HEAD(&domain->group_list);
iommu->external_domain = domain;
- } else
+ } else {
kfree(domain);
+ }
list_add(&group->next,
&iommu->external_domain->group_list);
mutex_unlock(&iommu->lock);
+
return 0;
}
- symbol_put(mdev_bus_type);
+
+ bus = iommu_device->bus;
}
domain->domain = iommu_domain_alloc(bus);
--
2.17.1
Sharing a physical PCI device in a finer-granularity way
is becoming a consensus in the industry. IOMMU vendors
are also engaging efforts to support such sharing as well
as possible. Among the efforts, the capability of support
finer-granularity DMA isolation is a common requirement
due to the security consideration. With finer-granularity
DMA isolation, subsets of a PCI function can be isolated
from each others by the IOMMU. As a result, there is a
request in software to attach multiple domains to a physical
PCI device. One example of such use model is the Intel
Scalable IOV [1] [2]. The Intel vt-d 3.0 spec [3] introduces
the scalable mode which enables PASID granularity DMA
isolation.
This adds the APIs to support multiple domains per device.
In order to ease the discussions, we call it 'a domain in
auxiliary mode' or simply 'auxiliary domain' when multiple
domains are attached to a physical device.
The APIs include:
* iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
- Detect both IOMMU and PCI endpoint devices supporting
the feature (aux-domain here) without the host driver
dependency.
* iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
- Check the enabling status of the feature (aux-domain
here). The aux-domain interfaces are available only
if this returns true.
* iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
- Enable/disable device specific aux-domain feature.
* iommu_aux_attach_device(domain, dev)
- Attaches @domain to @dev in the auxiliary mode. Multiple
domains could be attached to a single device in the
auxiliary mode with each domain representing an isolated
address space for an assignable subset of the device.
* iommu_aux_detach_device(domain, dev)
- Detach @domain which has been attached to @dev in the
auxiliary mode.
* iommu_aux_get_pasid(domain, dev)
- Return ID used for finer-granularity DMA translation.
For the Intel Scalable IOV usage model, this will be
a PASID. The device which supports Scalable IOV needs
to write this ID to the device register so that DMA
requests could be tagged with a right PASID prefix.
This has been updated with the latest proposal from Joerg
posted here [5].
Many people involved in discussions of this design.
Kevin Tian <[email protected]>
Liu Yi L <[email protected]>
Ashok Raj <[email protected]>
Sanjay Kumar <[email protected]>
Jacob Pan <[email protected]>
Alex Williamson <[email protected]>
Jean-Philippe Brucker <[email protected]>
Joerg Roedel <[email protected]>
and some discussions can be found here [4] [5].
[1] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
[3] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[4] https://lkml.org/lkml/2018/7/26/4
[5] https://www.spinics.net/lists/iommu/msg31874.html
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Suggested-by: Kevin Tian <[email protected]>
Suggested-by: Jean-Philippe Brucker <[email protected]>
Suggested-by: Joerg Roedel <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
drivers/iommu/iommu.c | 96 +++++++++++++++++++++++++++++++++++++++++++
include/linux/iommu.h | 70 +++++++++++++++++++++++++++++++
2 files changed, 166 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1164b9926a2b..1b697feb3e30 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2030,3 +2030,99 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
return 0;
}
EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
+
+/*
+ * Per device IOMMU features.
+ */
+bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features feat)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+ if (ops && ops->dev_has_feat)
+ return ops->dev_has_feat(dev, feat);
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(iommu_dev_has_feature);
+
+int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+ if (ops && ops->dev_enable_feat)
+ return ops->dev_enable_feat(dev, feat);
+
+ return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);
+
+/*
+ * The device drivers should do the necessary cleanups before calling this.
+ * For example, before disabling the aux-domain feature, the device driver
+ * should detach all aux-domains. Otherwise, this will return -EBUSY.
+ */
+int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+ if (ops && ops->dev_disable_feat)
+ return ops->dev_disable_feat(dev, feat);
+
+ return -EBUSY;
+}
+EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
+
+bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+ if (ops && ops->dev_feat_enabled)
+ return ops->dev_feat_enabled(dev, feat);
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
+
+/*
+ * Aux-domain specific attach/detach.
+ *
+ * Only works if iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX) returns
+ * true. Also, as long as domains are attached to a device through this
+ * interface, any tries to call iommu_attach_device() should fail
+ * (iommu_detach_device() can't fail, so we fail when trying to re-attach).
+ * This should make us safe against a device being attached to a guest as a
+ * whole while there are still pasid users on it (aux and sva).
+ */
+int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev)
+{
+ int ret = -ENODEV;
+
+ if (domain->ops->aux_attach_dev)
+ ret = domain->ops->aux_attach_dev(domain, dev);
+
+ if (!ret)
+ trace_attach_device_to_domain(dev);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_aux_attach_device);
+
+void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev)
+{
+ if (domain->ops->aux_detach_dev) {
+ domain->ops->aux_detach_dev(domain, dev);
+ trace_detach_device_from_domain(dev);
+ }
+}
+EXPORT_SYMBOL_GPL(iommu_aux_detach_device);
+
+int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
+{
+ int ret = -ENODEV;
+
+ if (domain->ops->aux_get_pasid)
+ ret = domain->ops->aux_get_pasid(domain, dev);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ffbbc7e39cee..8239ece9fdfc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -156,6 +156,11 @@ struct iommu_resv_region {
enum iommu_resv_type type;
};
+/* Per device IOMMU features */
+enum iommu_dev_features {
+ IOMMU_DEV_FEAT_AUX, /* Aux-domain feature */
+};
+
#ifdef CONFIG_IOMMU_API
/**
@@ -186,6 +191,11 @@ struct iommu_resv_region {
* @of_xlate: add OF master IDs to iommu grouping
* @is_attach_deferred: Check if domain attach should be deferred from iommu
* driver init to device driver init (default no)
+ * @dev_has/enable/disable_feat: per device entries to check/enable/disable
+ * iommu specific features.
+ * @dev_feat_enabled: check enabled feature
+ * @aux_attach/detach_dev: aux-domain specific attach/detach entries.
+ * @aux_get_pasid: get the pasid given an aux-domain
* @pgsize_bitmap: bitmap of all possible supported page sizes
*/
struct iommu_ops {
@@ -230,6 +240,17 @@ struct iommu_ops {
int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
+ /* Per device IOMMU features */
+ bool (*dev_has_feat)(struct device *dev, enum iommu_dev_features f);
+ bool (*dev_feat_enabled)(struct device *dev, enum iommu_dev_features f);
+ int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
+ int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
+
+ /* Aux-domain specific attach/detach entries */
+ int (*aux_attach_dev)(struct iommu_domain *domain, struct device *dev);
+ void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev);
+ int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);
+
unsigned long pgsize_bitmap;
};
@@ -416,6 +437,14 @@ static inline void dev_iommu_fwspec_set(struct device *dev,
int iommu_probe_device(struct device *dev);
void iommu_release_device(struct device *dev);
+bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features f);
+int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
+int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
+bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
+int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev);
+void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev);
+int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);
+
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -700,6 +729,47 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
return NULL;
}
+static inline bool
+iommu_dev_has_feature(struct device *dev, enum iommu_dev_features feat)
+{
+ return false;
+}
+
+static inline bool
+iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat)
+{
+ return false;
+}
+
+static inline int
+iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
+{
+ return -ENODEV;
+}
+
+static inline int
+iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
+{
+ return -ENODEV;
+}
+
+static inline int
+iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev)
+{
+ return -ENODEV;
+}
+
+static inline void
+iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev)
+{
+}
+
+static inline int
+iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
+{
+ return -ENODEV;
+}
+
#endif /* CONFIG_IOMMU_API */
#ifdef CONFIG_IOMMU_DEBUGFS
--
2.17.1
This adds support to return the default pasid associated with
an auxiliary domain. The PCI device which is bound with this
domain should use this value as the pasid for all DMA requests
of the subset of device which is isolated and protected with
this domain.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel-iommu.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 28a998afaf74..c137f0f2cf49 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5697,6 +5697,15 @@ intel_iommu_dev_feat_enabled(struct device *dev, enum iommu_dev_features feat)
return false;
}
+static int
+intel_iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+
+ return dmar_domain->default_pasid > 0 ?
+ dmar_domain->default_pasid : -EINVAL;
+}
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.domain_alloc = intel_iommu_domain_alloc,
@@ -5705,6 +5714,7 @@ const struct iommu_ops intel_iommu_ops = {
.detach_dev = intel_iommu_detach_device,
.aux_attach_dev = intel_iommu_aux_attach_device,
.aux_detach_dev = intel_iommu_aux_detach_device,
+ .aux_get_pasid = intel_iommu_aux_get_pasid,
.map = intel_iommu_map,
.unmap = intel_iommu_unmap,
.iova_to_phys = intel_iommu_iova_to_phys,
--
2.17.1
On 3/25/2019 7:00 AM, Lu Baolu wrote:
> A parent device might create different types of mediated
> devices. For example, a mediated device could be created
> by the parent device with full isolation and protection
> provided by the IOMMU. One usage case could be found on
> Intel platforms where a mediated device is an assignable
> subset of a PCI, the DMA requests on behalf of it are all
> tagged with a PASID. Since IOMMU supports PASID-granular
> translations (scalable mode in VT-d 3.0), this mediated
> device could be individually protected and isolated by an
> IOMMU.
>
> This patch adds a new member in the struct mdev_device to
> indicate that the mediated device represented by mdev could
> be isolated and protected by attaching a domain to a device
> represented by mdev->iommu_device. It also adds a helper to
> add or set the iommu device.
>
> * mdev_device->iommu_device
> - This, if set, indicates that the mediated device could
> be fully isolated and protected by IOMMU via attaching
> an iommu domain to this device. If empty, it indicates
> using vendor defined isolation, hence bypass IOMMU.
>
> * mdev_set/get_iommu_device(dev, iommu_device)
> - Set or get the iommu device which represents this mdev
> in IOMMU's device scope. Drivers don't need to set the
> iommu device if it uses vendor defined isolation.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Suggested-by: Kevin Tian <[email protected]>
> Suggested-by: Alex Williamson <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> ---
> drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++
> drivers/vfio/mdev/mdev_private.h | 1 +
> include/linux/mdev.h | 14 ++++++++++++++
> 3 files changed, 33 insertions(+)
>
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b96fedc77ee5..1b6435529166 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove)
> return 0;
> }
>
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + mdev->iommu_device = iommu_device;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(mdev_set_iommu_device);
> +
> +struct device *mdev_get_iommu_device(struct device *dev)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + return mdev->iommu_device;
> +}
> +EXPORT_SYMBOL(mdev_get_iommu_device);
> +
> static int __init mdev_init(void)
> {
> return mdev_bus_register();
> diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
> index 379758c52b1b..bfb7b22a7cb6 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -34,6 +34,7 @@ struct mdev_device {
> struct list_head next;
> struct kobject *type_kobj;
> bool active;
> + struct device *iommu_device;
> };
>
> #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev)
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index d7aee90e5da5..df2ea39f47ee 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -15,6 +15,20 @@
>
> struct mdev_device;
>
> +/*
> + * Called by the parent device driver to set the device which represents
> + * this mdev in iommu protection scope. By default, the iommu device is
> + * NULL, that indicates using vendor defined isolation.
> + *
> + * @dev: the mediated device that iommu will isolate.
> + * @iommu_device: a pci device which represents the iommu for @dev.
> + *
> + * Return 0 for success, otherwise negative error value.
> + */
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
> +
> +struct device *mdev_get_iommu_device(struct device *dev);
> +
> /**
> * struct mdev_parent_ops - Structure to be registered for each parent device to
> * register the device to mdev module.
>
Reviewed-by: Kirti Wankhede <[email protected]>
Thanks,
Kirti
On 3/25/2019 7:00 AM, Lu Baolu wrote:
> This adds the support to determine the isolation type
> of a mediated device group by checking whether it has
> an iommu device. If an iommu device exists, an iommu
> domain will be allocated and then attached to the iommu
> device. Otherwise, keep the same behavior as it is.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kirti Wankhede <[email protected]>
Thanks,
Kirti
> ---
> drivers/vfio/vfio_iommu_type1.c | 55 +++++++++++++++++++++++++--------
> 1 file changed, 42 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index ccc4165474aa..b91cafcd5181 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -559,7 +559,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> mutex_lock(&iommu->lock);
>
> /* Fail if notifier list is empty */
> - if ((!iommu->external_domain) || (!iommu->notifier.head)) {
> + if (!iommu->notifier.head) {
> ret = -EINVAL;
> goto pin_done;
> }
> @@ -641,11 +641,6 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
>
> mutex_lock(&iommu->lock);
>
> - if (!iommu->external_domain) {
> - mutex_unlock(&iommu->lock);
> - return -EINVAL;
> - }
> -
> do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
> for (i = 0; i < npage; i++) {
> struct vfio_dma *dma;
> @@ -1368,13 +1363,40 @@ static void vfio_iommu_detach_group(struct vfio_domain *domain,
> iommu_detach_group(domain->domain, group->iommu_group);
> }
>
> +static bool vfio_bus_is_mdev(struct bus_type *bus)
> +{
> + struct bus_type *mdev_bus;
> + bool ret = false;
> +
> + mdev_bus = symbol_get(mdev_bus_type);
> + if (mdev_bus) {
> + ret = (bus == mdev_bus);
> + symbol_put(mdev_bus_type);
> + }
> +
> + return ret;
> +}
> +
> +static int vfio_mdev_iommu_device(struct device *dev, void *data)
> +{
> + struct device **old = data, *new;
> +
> + new = vfio_mdev_get_iommu_device(dev);
> + if (!new || (*old && *old != new))
> + return -EINVAL;
> +
> + *old = new;
> +
> + return 0;
> +}
> +
> static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct iommu_group *iommu_group)
> {
> struct vfio_iommu *iommu = iommu_data;
> struct vfio_group *group;
> struct vfio_domain *domain, *d;
> - struct bus_type *bus = NULL, *mdev_bus;
> + struct bus_type *bus = NULL;
> int ret;
> bool resv_msi, msi_remap;
> phys_addr_t resv_msi_base;
> @@ -1409,23 +1431,30 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> if (ret)
> goto out_free;
>
> - mdev_bus = symbol_get(mdev_bus_type);
> + if (vfio_bus_is_mdev(bus)) {
> + struct device *iommu_device = NULL;
>
> - if (mdev_bus) {
> - if ((bus == mdev_bus) && !iommu_present(bus)) {
> - symbol_put(mdev_bus_type);
> + group->mdev_group = true;
> +
> + /* Determine the isolation type */
> + ret = iommu_group_for_each_dev(iommu_group, &iommu_device,
> + vfio_mdev_iommu_device);
> + if (ret || !iommu_device) {
> if (!iommu->external_domain) {
> INIT_LIST_HEAD(&domain->group_list);
> iommu->external_domain = domain;
> - } else
> + } else {
> kfree(domain);
> + }
>
> list_add(&group->next,
> &iommu->external_domain->group_list);
> mutex_unlock(&iommu->lock);
> +
> return 0;
> }
> - symbol_put(mdev_bus_type);
> +
> + bus = iommu_device->bus;
> }
>
> domain->domain = iommu_domain_alloc(bus);
>
On Mon, 25 Mar 2019 09:30:34 +0800
Lu Baolu <[email protected]> wrote:
> A parent device might create different types of mediated
> devices. For example, a mediated device could be created
> by the parent device with full isolation and protection
> provided by the IOMMU. One usage case could be found on
> Intel platforms where a mediated device is an assignable
> subset of a PCI, the DMA requests on behalf of it are all
> tagged with a PASID. Since IOMMU supports PASID-granular
> translations (scalable mode in VT-d 3.0), this mediated
> device could be individually protected and isolated by an
> IOMMU.
>
> This patch adds a new member in the struct mdev_device to
> indicate that the mediated device represented by mdev could
> be isolated and protected by attaching a domain to a device
> represented by mdev->iommu_device. It also adds a helper to
> add or set the iommu device.
>
> * mdev_device->iommu_device
> - This, if set, indicates that the mediated device could
> be fully isolated and protected by IOMMU via attaching
> an iommu domain to this device. If empty, it indicates
> using vendor defined isolation, hence bypass IOMMU.
>
> * mdev_set/get_iommu_device(dev, iommu_device)
> - Set or get the iommu device which represents this mdev
> in IOMMU's device scope. Drivers don't need to set the
> iommu device if it uses vendor defined isolation.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Suggested-by: Kevin Tian <[email protected]>
> Suggested-by: Alex Williamson <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> ---
> drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++
> drivers/vfio/mdev/mdev_private.h | 1 +
> include/linux/mdev.h | 14 ++++++++++++++
> 3 files changed, 33 insertions(+)
Acked-by: Alex Williamson <[email protected]>
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b96fedc77ee5..1b6435529166 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove)
> return 0;
> }
>
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + mdev->iommu_device = iommu_device;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(mdev_set_iommu_device);
> +
> +struct device *mdev_get_iommu_device(struct device *dev)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + return mdev->iommu_device;
> +}
> +EXPORT_SYMBOL(mdev_get_iommu_device);
> +
> static int __init mdev_init(void)
> {
> return mdev_bus_register();
> diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
> index 379758c52b1b..bfb7b22a7cb6 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -34,6 +34,7 @@ struct mdev_device {
> struct list_head next;
> struct kobject *type_kobj;
> bool active;
> + struct device *iommu_device;
> };
>
> #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev)
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index d7aee90e5da5..df2ea39f47ee 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -15,6 +15,20 @@
>
> struct mdev_device;
>
> +/*
> + * Called by the parent device driver to set the device which represents
> + * this mdev in iommu protection scope. By default, the iommu device is
> + * NULL, that indicates using vendor defined isolation.
> + *
> + * @dev: the mediated device that iommu will isolate.
> + * @iommu_device: a pci device which represents the iommu for @dev.
> + *
> + * Return 0 for success, otherwise negative error value.
> + */
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
> +
> +struct device *mdev_get_iommu_device(struct device *dev);
> +
> /**
> * struct mdev_parent_ops - Structure to be registered for each parent device to
> * register the device to mdev module.
On Mon, 25 Mar 2019 09:30:35 +0800
Lu Baolu <[email protected]> wrote:
> This adds helpers to attach or detach a domain to a
> group. This will replace iommu_attach_group() which
> only works for non-mdev devices.
>
> If a domain is attaching to a group which includes the
> mediated devices, it should attach to the iommu device
> (a pci device which represents the mdev in iommu scope)
> instead. The added helper supports attaching domain to
> groups for both pci and mdev devices.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 84 ++++++++++++++++++++++++++++++---
> 1 file changed, 77 insertions(+), 7 deletions(-)
Acked-by: Alex Williamson <[email protected]>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 73652e21efec..ccc4165474aa 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -91,6 +91,7 @@ struct vfio_dma {
> struct vfio_group {
> struct iommu_group *iommu_group;
> struct list_head next;
> + bool mdev_group; /* An mdev group */
> };
>
> /*
> @@ -1298,6 +1299,75 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
> return ret;
> }
>
> +static struct device *vfio_mdev_get_iommu_device(struct device *dev)
> +{
> + struct device *(*fn)(struct device *dev);
> + struct device *iommu_device;
> +
> + fn = symbol_get(mdev_get_iommu_device);
> + if (fn) {
> + iommu_device = fn(dev);
> + symbol_put(mdev_get_iommu_device);
> +
> + return iommu_device;
> + }
> +
> + return NULL;
> +}
> +
> +static int vfio_mdev_attach_domain(struct device *dev, void *data)
> +{
> + struct iommu_domain *domain = data;
> + struct device *iommu_device;
> +
> + iommu_device = vfio_mdev_get_iommu_device(dev);
> + if (iommu_device) {
> + if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
> + return iommu_aux_attach_device(domain, iommu_device);
> + else
> + return iommu_attach_device(domain, iommu_device);
> + }
> +
> + return -EINVAL;
> +}
> +
> +static int vfio_mdev_detach_domain(struct device *dev, void *data)
> +{
> + struct iommu_domain *domain = data;
> + struct device *iommu_device;
> +
> + iommu_device = vfio_mdev_get_iommu_device(dev);
> + if (iommu_device) {
> + if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
> + iommu_aux_detach_device(domain, iommu_device);
> + else
> + iommu_detach_device(domain, iommu_device);
> + }
> +
> + return 0;
> +}
> +
> +static int vfio_iommu_attach_group(struct vfio_domain *domain,
> + struct vfio_group *group)
> +{
> + if (group->mdev_group)
> + return iommu_group_for_each_dev(group->iommu_group,
> + domain->domain,
> + vfio_mdev_attach_domain);
> + else
> + return iommu_attach_group(domain->domain, group->iommu_group);
> +}
> +
> +static void vfio_iommu_detach_group(struct vfio_domain *domain,
> + struct vfio_group *group)
> +{
> + if (group->mdev_group)
> + iommu_group_for_each_dev(group->iommu_group, domain->domain,
> + vfio_mdev_detach_domain);
> + else
> + iommu_detach_group(domain->domain, group->iommu_group);
> +}
> +
> static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct iommu_group *iommu_group)
> {
> @@ -1373,7 +1443,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> goto out_domain;
> }
>
> - ret = iommu_attach_group(domain->domain, iommu_group);
> + ret = vfio_iommu_attach_group(domain, group);
> if (ret)
> goto out_domain;
>
> @@ -1405,8 +1475,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> list_for_each_entry(d, &iommu->domain_list, next) {
> if (d->domain->ops == domain->domain->ops &&
> d->prot == domain->prot) {
> - iommu_detach_group(domain->domain, iommu_group);
> - if (!iommu_attach_group(d->domain, iommu_group)) {
> + vfio_iommu_detach_group(domain, group);
> + if (!vfio_iommu_attach_group(d, group)) {
> list_add(&group->next, &d->group_list);
> iommu_domain_free(domain->domain);
> kfree(domain);
> @@ -1414,7 +1484,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> return 0;
> }
>
> - ret = iommu_attach_group(domain->domain, iommu_group);
> + ret = vfio_iommu_attach_group(domain, group);
> if (ret)
> goto out_domain;
> }
> @@ -1440,7 +1510,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> return 0;
>
> out_detach:
> - iommu_detach_group(domain->domain, iommu_group);
> + vfio_iommu_detach_group(domain, group);
> out_domain:
> iommu_domain_free(domain->domain);
> out_free:
> @@ -1531,7 +1601,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> if (!group)
> continue;
>
> - iommu_detach_group(domain->domain, iommu_group);
> + vfio_iommu_detach_group(domain, group);
> list_del(&group->next);
> kfree(group);
> /*
> @@ -1596,7 +1666,7 @@ static void vfio_release_domain(struct vfio_domain *domain, bool external)
> list_for_each_entry_safe(group, group_tmp,
> &domain->group_list, next) {
> if (!external)
> - iommu_detach_group(domain->domain, group->iommu_group);
> + vfio_iommu_detach_group(domain, group);
> list_del(&group->next);
> kfree(group);
> }
On Mon, 25 Mar 2019 09:30:36 +0800
Lu Baolu <[email protected]> wrote:
> This adds the support to determine the isolation type
> of a mediated device group by checking whether it has
> an iommu device. If an iommu device exists, an iommu
> domain will be allocated and then attached to the iommu
> device. Otherwise, keep the same behavior as it is.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> ---
> drivers/vfio/vfio_iommu_type1.c | 55 +++++++++++++++++++++++++--------
> 1 file changed, 42 insertions(+), 13 deletions(-)
Acked-by: Alex Williamson <[email protected]>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index ccc4165474aa..b91cafcd5181 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -559,7 +559,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> mutex_lock(&iommu->lock);
>
> /* Fail if notifier list is empty */
> - if ((!iommu->external_domain) || (!iommu->notifier.head)) {
> + if (!iommu->notifier.head) {
> ret = -EINVAL;
> goto pin_done;
> }
> @@ -641,11 +641,6 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
>
> mutex_lock(&iommu->lock);
>
> - if (!iommu->external_domain) {
> - mutex_unlock(&iommu->lock);
> - return -EINVAL;
> - }
> -
> do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
> for (i = 0; i < npage; i++) {
> struct vfio_dma *dma;
> @@ -1368,13 +1363,40 @@ static void vfio_iommu_detach_group(struct vfio_domain *domain,
> iommu_detach_group(domain->domain, group->iommu_group);
> }
>
> +static bool vfio_bus_is_mdev(struct bus_type *bus)
> +{
> + struct bus_type *mdev_bus;
> + bool ret = false;
> +
> + mdev_bus = symbol_get(mdev_bus_type);
> + if (mdev_bus) {
> + ret = (bus == mdev_bus);
> + symbol_put(mdev_bus_type);
> + }
> +
> + return ret;
> +}
> +
> +static int vfio_mdev_iommu_device(struct device *dev, void *data)
> +{
> + struct device **old = data, *new;
> +
> + new = vfio_mdev_get_iommu_device(dev);
> + if (!new || (*old && *old != new))
> + return -EINVAL;
> +
> + *old = new;
> +
> + return 0;
> +}
> +
> static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct iommu_group *iommu_group)
> {
> struct vfio_iommu *iommu = iommu_data;
> struct vfio_group *group;
> struct vfio_domain *domain, *d;
> - struct bus_type *bus = NULL, *mdev_bus;
> + struct bus_type *bus = NULL;
> int ret;
> bool resv_msi, msi_remap;
> phys_addr_t resv_msi_base;
> @@ -1409,23 +1431,30 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> if (ret)
> goto out_free;
>
> - mdev_bus = symbol_get(mdev_bus_type);
> + if (vfio_bus_is_mdev(bus)) {
> + struct device *iommu_device = NULL;
>
> - if (mdev_bus) {
> - if ((bus == mdev_bus) && !iommu_present(bus)) {
> - symbol_put(mdev_bus_type);
> + group->mdev_group = true;
> +
> + /* Determine the isolation type */
> + ret = iommu_group_for_each_dev(iommu_group, &iommu_device,
> + vfio_mdev_iommu_device);
> + if (ret || !iommu_device) {
> if (!iommu->external_domain) {
> INIT_LIST_HEAD(&domain->group_list);
> iommu->external_domain = domain;
> - } else
> + } else {
> kfree(domain);
> + }
>
> list_add(&group->next,
> &iommu->external_domain->group_list);
> mutex_unlock(&iommu->lock);
> +
> return 0;
> }
> - symbol_put(mdev_bus_type);
> +
> + bus = iommu_device->bus;
> }
>
> domain->domain = iommu_domain_alloc(bus);
> -----Original Message-----
> From: [email protected] <linux-kernel-
> [email protected]> On Behalf Of Kirti Wankhede
> Sent: Tuesday, March 26, 2019 4:33 AM
> To: Lu Baolu <[email protected]>; Joerg Roedel <[email protected]>;
> David Woodhouse <[email protected]>; Alex Williamson
> <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Jean-Philippe Brucker <jean-
> [email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]; Jacob Pan <[email protected]>; Neo Jia
> <[email protected]>
> Subject: Re: [PATCH v8 7/9] vfio/mdev: Add iommu related member in
> mdev_device
>
>
>
> On 3/25/2019 7:00 AM, Lu Baolu wrote:
> > A parent device might create different types of mediated devices. For
> > example, a mediated device could be created by the parent device with
> > full isolation and protection provided by the IOMMU. One usage case
> > could be found on Intel platforms where a mediated device is an
> > assignable subset of a PCI, the DMA requests on behalf of it are all
> > tagged with a PASID. Since IOMMU supports PASID-granular translations
> > (scalable mode in VT-d 3.0), this mediated device could be
> > individually protected and isolated by an IOMMU.
> >
> > This patch adds a new member in the struct mdev_device to indicate
> > that the mediated device represented by mdev could be isolated and
> > protected by attaching a domain to a device represented by
> > mdev->iommu_device. It also adds a helper to add or set the iommu
> > device.
> >
> > * mdev_device->iommu_device
> > - This, if set, indicates that the mediated device could
> > be fully isolated and protected by IOMMU via attaching
> > an iommu domain to this device. If empty, it indicates
> > using vendor defined isolation, hence bypass IOMMU.
> >
> > * mdev_set/get_iommu_device(dev, iommu_device)
> > - Set or get the iommu device which represents this mdev
> > in IOMMU's device scope. Drivers don't need to set the
> > iommu device if it uses vendor defined isolation.
> >
> > Cc: Ashok Raj <[email protected]>
> > Cc: Jacob Pan <[email protected]>
> > Cc: Kevin Tian <[email protected]>
> > Cc: Liu Yi L <[email protected]>
> > Suggested-by: Kevin Tian <[email protected]>
> > Suggested-by: Alex Williamson <[email protected]>
> > Signed-off-by: Lu Baolu <[email protected]>
> > Reviewed-by: Jean-Philippe Brucker <[email protected]>
> > ---
> > drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++
> > drivers/vfio/mdev/mdev_private.h | 1 +
> > include/linux/mdev.h | 14 ++++++++++++++
> > 3 files changed, 33 insertions(+)
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c
> > b/drivers/vfio/mdev/mdev_core.c index b96fedc77ee5..1b6435529166
> > 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool
> force_remove)
> > return 0;
> > }
> >
> > +int mdev_set_iommu_device(struct device *dev, struct device
> > +*iommu_device) {
> > + struct mdev_device *mdev = to_mdev_device(dev);
> > +
> > + mdev->iommu_device = iommu_device;
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(mdev_set_iommu_device);
> > +
> > +struct device *mdev_get_iommu_device(struct device *dev) {
> > + struct mdev_device *mdev = to_mdev_device(dev);
> > +
> > + return mdev->iommu_device;
> > +}
> > +EXPORT_SYMBOL(mdev_get_iommu_device);
> > +
> > static int __init mdev_init(void)
> > {
> > return mdev_bus_register();
> > diff --git a/drivers/vfio/mdev/mdev_private.h
> > b/drivers/vfio/mdev/mdev_private.h
> > index 379758c52b1b..bfb7b22a7cb6 100644
> > --- a/drivers/vfio/mdev/mdev_private.h
> > +++ b/drivers/vfio/mdev/mdev_private.h
> > @@ -34,6 +34,7 @@ struct mdev_device {
> > struct list_head next;
> > struct kobject *type_kobj;
> > bool active;
> > + struct device *iommu_device;
> > };
> >
This is not a performance path, but it is a good practice to create naturally aligned/packed structures.
You should define struct device *iommu_device; before bool active.
On Wed, 27 Mar 2019 14:17:57 +0000
Parav Pandit <[email protected]> wrote:
> > -----Original Message-----
> > From: [email protected] <linux-kernel-
> > [email protected]> On Behalf Of Kirti Wankhede
> > Sent: Tuesday, March 26, 2019 4:33 AM
> > To: Lu Baolu <[email protected]>; Joerg Roedel <[email protected]>;
> > David Woodhouse <[email protected]>; Alex Williamson
> > <[email protected]>
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected]; Jean-Philippe Brucker <jean-
> > [email protected]>; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; linux-
> > [email protected]; Jacob Pan <[email protected]>; Neo Jia
> > <[email protected]>
> > Subject: Re: [PATCH v8 7/9] vfio/mdev: Add iommu related member in
> > mdev_device
> >
> >
> >
> > On 3/25/2019 7:00 AM, Lu Baolu wrote:
> > > A parent device might create different types of mediated devices. For
> > > example, a mediated device could be created by the parent device with
> > > full isolation and protection provided by the IOMMU. One usage case
> > > could be found on Intel platforms where a mediated device is an
> > > assignable subset of a PCI, the DMA requests on behalf of it are all
> > > tagged with a PASID. Since IOMMU supports PASID-granular translations
> > > (scalable mode in VT-d 3.0), this mediated device could be
> > > individually protected and isolated by an IOMMU.
> > >
> > > This patch adds a new member in the struct mdev_device to indicate
> > > that the mediated device represented by mdev could be isolated and
> > > protected by attaching a domain to a device represented by
> > > mdev->iommu_device. It also adds a helper to add or set the iommu
> > > device.
> > >
> > > * mdev_device->iommu_device
> > > - This, if set, indicates that the mediated device could
> > > be fully isolated and protected by IOMMU via attaching
> > > an iommu domain to this device. If empty, it indicates
> > > using vendor defined isolation, hence bypass IOMMU.
> > >
> > > * mdev_set/get_iommu_device(dev, iommu_device)
> > > - Set or get the iommu device which represents this mdev
> > > in IOMMU's device scope. Drivers don't need to set the
> > > iommu device if it uses vendor defined isolation.
> > >
> > > Cc: Ashok Raj <[email protected]>
> > > Cc: Jacob Pan <[email protected]>
> > > Cc: Kevin Tian <[email protected]>
> > > Cc: Liu Yi L <[email protected]>
> > > Suggested-by: Kevin Tian <[email protected]>
> > > Suggested-by: Alex Williamson <[email protected]>
> > > Signed-off-by: Lu Baolu <[email protected]>
> > > Reviewed-by: Jean-Philippe Brucker <[email protected]>
> > > ---
> > > drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++
> > > drivers/vfio/mdev/mdev_private.h | 1 +
> > > include/linux/mdev.h | 14 ++++++++++++++
> > > 3 files changed, 33 insertions(+)
> > >
> > > diff --git a/drivers/vfio/mdev/mdev_core.c
> > > b/drivers/vfio/mdev/mdev_core.c index b96fedc77ee5..1b6435529166
> > > 100644
> > > --- a/drivers/vfio/mdev/mdev_core.c
> > > +++ b/drivers/vfio/mdev/mdev_core.c
> > > @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool
> > force_remove)
> > > return 0;
> > > }
> > >
> > > +int mdev_set_iommu_device(struct device *dev, struct device
> > > +*iommu_device) {
> > > + struct mdev_device *mdev = to_mdev_device(dev);
> > > +
> > > + mdev->iommu_device = iommu_device;
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL(mdev_set_iommu_device);
> > > +
> > > +struct device *mdev_get_iommu_device(struct device *dev) {
> > > + struct mdev_device *mdev = to_mdev_device(dev);
> > > +
> > > + return mdev->iommu_device;
> > > +}
> > > +EXPORT_SYMBOL(mdev_get_iommu_device);
> > > +
> > > static int __init mdev_init(void)
> > > {
> > > return mdev_bus_register();
> > > diff --git a/drivers/vfio/mdev/mdev_private.h
> > > b/drivers/vfio/mdev/mdev_private.h
> > > index 379758c52b1b..bfb7b22a7cb6 100644
> > > --- a/drivers/vfio/mdev/mdev_private.h
> > > +++ b/drivers/vfio/mdev/mdev_private.h
> > > @@ -34,6 +34,7 @@ struct mdev_device {
> > > struct list_head next;
> > > struct kobject *type_kobj;
> > > bool active;
> > > + struct device *iommu_device;
> > > };
> > >
> This is not a performance path, but it is a good practice to create naturally aligned/packed structures.
> You should define struct device *iommu_device; before bool active.
Agreed, if someone wants to fixup before commit or if there's another
spin please do so, otherwise we can adjust it in a trivial follow-up
patch. Thanks,
Alex
Hi Lu Baolu,
thanks for these patches!
On Mon, Mar 25, 2019 at 09:30:27AM +0800, Lu Baolu wrote:
> Lu Baolu (9):
> iommu: Add APIs for multiple domains per device
> iommu/vt-d: Make intel_iommu_enable_pasid() more generic
> iommu/vt-d: Add per-device IOMMU feature ops entries
> iommu/vt-d: Move common code out of iommu_attch_device()
> iommu/vt-d: Aux-domain specific domain attach/detach
> iommu/vt-d: Return ID associated with an auxiliary domain
I applied the patches above to my tree. The first patch is in a new
branch called 'api-features' and the other are in the x86/vt-d branch.
> vfio/mdev: Add iommu related member in mdev_device
> vfio/type1: Add domain at(de)taching group helpers
> vfio/type1: Handle different mdev isolation type
Can you please address the minor concern on these patches, add all
review and acks and re-send?
Thanks,
Joerg
Hi Joerg,
On 4/11/19 11:19 PM, Joerg Roedel wrote:
> Hi Lu Baolu,
>
> thanks for these patches!
>
> On Mon, Mar 25, 2019 at 09:30:27AM +0800, Lu Baolu wrote:
>> Lu Baolu (9):
>> iommu: Add APIs for multiple domains per device
>> iommu/vt-d: Make intel_iommu_enable_pasid() more generic
>> iommu/vt-d: Add per-device IOMMU feature ops entries
>> iommu/vt-d: Move common code out of iommu_attch_device()
>> iommu/vt-d: Aux-domain specific domain attach/detach
>> iommu/vt-d: Return ID associated with an auxiliary domain
>
> I applied the patches above to my tree. The first patch is in a new
> branch called 'api-features' and the other are in the x86/vt-d branch.
Thank you!
>
>> vfio/mdev: Add iommu related member in mdev_device
>> vfio/type1: Add domain at(de)taching group helpers
>> vfio/type1: Handle different mdev isolation type
>
> Can you please address the minor concern on these patches, add all
> review and acks and re-send?
Sure. I will resend these soon.
>
> Thanks,
>
> Joerg
Best regards,
Lu Baolu
On Mon, Mar 25, 2019 at 09:30:34AM +0800, Lu Baolu wrote:
> A parent device might create different types of mediated
> devices. For example, a mediated device could be created
> by the parent device with full isolation and protection
> provided by the IOMMU. One usage case could be found on
> Intel platforms where a mediated device is an assignable
> subset of a PCI, the DMA requests on behalf of it are all
> tagged with a PASID. Since IOMMU supports PASID-granular
> translations (scalable mode in VT-d 3.0), this mediated
> device could be individually protected and isolated by an
> IOMMU.
>
> This patch adds a new member in the struct mdev_device to
> indicate that the mediated device represented by mdev could
> be isolated and protected by attaching a domain to a device
> represented by mdev->iommu_device. It also adds a helper to
> add or set the iommu device.
>
> * mdev_device->iommu_device
> - This, if set, indicates that the mediated device could
> be fully isolated and protected by IOMMU via attaching
> an iommu domain to this device. If empty, it indicates
> using vendor defined isolation, hence bypass IOMMU.
>
> * mdev_set/get_iommu_device(dev, iommu_device)
> - Set or get the iommu device which represents this mdev
> in IOMMU's device scope. Drivers don't need to set the
> iommu device if it uses vendor defined isolation.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Suggested-by: Kevin Tian <[email protected]>
> Suggested-by: Alex Williamson <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> ---
> drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++
> drivers/vfio/mdev/mdev_private.h | 1 +
> include/linux/mdev.h | 14 ++++++++++++++
> 3 files changed, 33 insertions(+)
>
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b96fedc77ee5..1b6435529166 100644
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove)
> return 0;
> }
>
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + mdev->iommu_device = iommu_device;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(mdev_set_iommu_device);
I was looking at these functions when touching the mdev stuff and I
have some concerns.
1) Please don't merge dead code. It is a year later and there is still
no in-tree user for any of this. This is not our process. Even
worse it was exported so it looks like this dead code is supporting
out of tree modules.
2) Why is this like this? Every struct device already has a connection
to the iommu layer and every mdev has a struct device all its own.
Why did we need to add special 'if (mdev)' stuff all over the
place? This smells like the same abuse Thomas
and I pointed out for the interrupt domains.
After my next series the mdev drivers will have direct access to
the vfio_device. So an alternative to using the struct device, or
adding 'if mdev' is to add an API to the vfio_device world to
inject what iommu configuration is needed from that direction
instead of trying to discover it from a struct device.
3) The vfio_bus_is_mdev() and related symbol_get() nonsense in
drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons
it was not acceptable to do this for the interrupt side either.
4) It seems pretty clear to me this will be heavily impacted by the
/dev/ioasid discussion. Please consider removing the dead code now.
Basically, please fix this before trying to get idxd mdev merged as
the first user.
Jason
Hi Jason,
On 4/7/21 4:00 AM, Jason Gunthorpe wrote:
> On Mon, Mar 25, 2019 at 09:30:34AM +0800, Lu Baolu wrote:
>> A parent device might create different types of mediated
>> devices. For example, a mediated device could be created
>> by the parent device with full isolation and protection
>> provided by the IOMMU. One usage case could be found on
>> Intel platforms where a mediated device is an assignable
>> subset of a PCI, the DMA requests on behalf of it are all
>> tagged with a PASID. Since IOMMU supports PASID-granular
>> translations (scalable mode in VT-d 3.0), this mediated
>> device could be individually protected and isolated by an
>> IOMMU.
>>
>> This patch adds a new member in the struct mdev_device to
>> indicate that the mediated device represented by mdev could
>> be isolated and protected by attaching a domain to a device
>> represented by mdev->iommu_device. It also adds a helper to
>> add or set the iommu device.
>>
>> * mdev_device->iommu_device
>> - This, if set, indicates that the mediated device could
>> be fully isolated and protected by IOMMU via attaching
>> an iommu domain to this device. If empty, it indicates
>> using vendor defined isolation, hence bypass IOMMU.
>>
>> * mdev_set/get_iommu_device(dev, iommu_device)
>> - Set or get the iommu device which represents this mdev
>> in IOMMU's device scope. Drivers don't need to set the
>> iommu device if it uses vendor defined isolation.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Suggested-by: Kevin Tian <[email protected]>
>> Suggested-by: Alex Williamson <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>> ---
>> drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++
>> drivers/vfio/mdev/mdev_private.h | 1 +
>> include/linux/mdev.h | 14 ++++++++++++++
>> 3 files changed, 33 insertions(+)
>>
>> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
>> index b96fedc77ee5..1b6435529166 100644
>> +++ b/drivers/vfio/mdev/mdev_core.c
>> @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove)
>> return 0;
>> }
>>
>> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
>> +{
>> + struct mdev_device *mdev = to_mdev_device(dev);
>> +
>> + mdev->iommu_device = iommu_device;
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL(mdev_set_iommu_device);
>
> I was looking at these functions when touching the mdev stuff and I
> have some concerns.
>
> 1) Please don't merge dead code. It is a year later and there is still
> no in-tree user for any of this. This is not our process. Even
> worse it was exported so it looks like this dead code is supporting
> out of tree modules.
>
> 2) Why is this like this? Every struct device already has a connection
> to the iommu layer and every mdev has a struct device all its own.
>
> Why did we need to add special 'if (mdev)' stuff all over the
> place? This smells like the same abuse Thomas
> and I pointed out for the interrupt domains.
I've ever tried to implement a bus iommu_ops for mdev devices.
https://lore.kernel.org/lkml/[email protected]/
Any comments?
Best regards,
baolu
>
> After my next series the mdev drivers will have direct access to
> the vfio_device. So an alternative to using the struct device, or
> adding 'if mdev' is to add an API to the vfio_device world to
> inject what iommu configuration is needed from that direction
> instead of trying to discover it from a struct device.
>
> 3) The vfio_bus_is_mdev() and related symbol_get() nonsense in
> drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons
> it was not acceptable to do this for the interrupt side either.
>
> 4) It seems pretty clear to me this will be heavily impacted by the
> /dev/ioasid discussion. Please consider removing the dead code now.
>
> Basically, please fix this before trying to get idxd mdev merged as
> the first user.
>
> Jason
>
On Wed, Apr 07, 2021 at 09:58:05AM +0800, Lu Baolu wrote:
> I've ever tried to implement a bus iommu_ops for mdev devices.
>
> https://lore.kernel.org/lkml/[email protected]/
>
> Any comments?
You still have the symbol_get, so something continues to be wrong with
that series:
+ mdev_bus = symbol_get(mdev_bus_type);
+ if (mdev_bus) {
+ if (bus == mdev_bus && !iommu_present(bus)) {
+ symbol_put(mdev_bus_type);
Jason
Hi Jason,
On 4/7/21 4:00 AM, Jason Gunthorpe wrote:
> On Mon, Mar 25, 2019 at 09:30:34AM +0800, Lu Baolu wrote:
>> A parent device might create different types of mediated
>> devices. For example, a mediated device could be created
>> by the parent device with full isolation and protection
>> provided by the IOMMU. One usage case could be found on
>> Intel platforms where a mediated device is an assignable
>> subset of a PCI, the DMA requests on behalf of it are all
>> tagged with a PASID. Since IOMMU supports PASID-granular
>> translations (scalable mode in VT-d 3.0), this mediated
>> device could be individually protected and isolated by an
>> IOMMU.
>>
>> This patch adds a new member in the struct mdev_device to
>> indicate that the mediated device represented by mdev could
>> be isolated and protected by attaching a domain to a device
>> represented by mdev->iommu_device. It also adds a helper to
>> add or set the iommu device.
>>
>> * mdev_device->iommu_device
>> - This, if set, indicates that the mediated device could
>> be fully isolated and protected by IOMMU via attaching
>> an iommu domain to this device. If empty, it indicates
>> using vendor defined isolation, hence bypass IOMMU.
>>
>> * mdev_set/get_iommu_device(dev, iommu_device)
>> - Set or get the iommu device which represents this mdev
>> in IOMMU's device scope. Drivers don't need to set the
>> iommu device if it uses vendor defined isolation.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Suggested-by: Kevin Tian <[email protected]>
>> Suggested-by: Alex Williamson <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>> ---
>> drivers/vfio/mdev/mdev_core.c | 18 ++++++++++++++++++
>> drivers/vfio/mdev/mdev_private.h | 1 +
>> include/linux/mdev.h | 14 ++++++++++++++
>> 3 files changed, 33 insertions(+)
>>
>> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
>> index b96fedc77ee5..1b6435529166 100644
>> +++ b/drivers/vfio/mdev/mdev_core.c
>> @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool force_remove)
>> return 0;
>> }
>>
>> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
>> +{
>> + struct mdev_device *mdev = to_mdev_device(dev);
>> +
>> + mdev->iommu_device = iommu_device;
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL(mdev_set_iommu_device);
>
> I was looking at these functions when touching the mdev stuff and I
> have some concerns.
>
> 1) Please don't merge dead code. It is a year later and there is still
> no in-tree user for any of this. This is not our process. Even
> worse it was exported so it looks like this dead code is supporting
> out of tree modules.
>
> 2) Why is this like this? Every struct device already has a connection
> to the iommu layer and every mdev has a struct device all its own.
>
> Why did we need to add special 'if (mdev)' stuff all over the
> place? This smells like the same abuse Thomas
> and I pointed out for the interrupt domains.
>
> After my next series the mdev drivers will have direct access to
> the vfio_device. So an alternative to using the struct device, or
> adding 'if mdev' is to add an API to the vfio_device world to
> inject what iommu configuration is needed from that direction
> instead of trying to discover it from a struct device.
Just want to make sure that I understand you correctly.
We should use the existing IOMMU in-kernel APIs to connect mdev with the
iommu subsystem, so that the upper lays don't need to use something
like (if dev_is_mdev) to handle mdev differently. Do I get you
correctly?
>
> 3) The vfio_bus_is_mdev() and related symbol_get() nonsense in
> drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons
> it was not acceptable to do this for the interrupt side either.
Yes. Agreed. I will look into it.
>
> 4) It seems pretty clear to me this will be heavily impacted by the
> /dev/ioasid discussion. Please consider removing the dead code now.
>
> Basically, please fix this before trying to get idxd mdev merged as
> the first user.
>
> Jason
>
Best regards,
baolu
On Tue, May 11, 2021 at 02:56:05PM +0800, Lu Baolu wrote:
> > After my next series the mdev drivers will have direct access to
> > the vfio_device. So an alternative to using the struct device, or
> > adding 'if mdev' is to add an API to the vfio_device world to
> > inject what iommu configuration is needed from that direction
> > instead of trying to discover it from a struct device.
>
> Just want to make sure that I understand you correctly.
>
> We should use the existing IOMMU in-kernel APIs to connect mdev with the
> iommu subsystem, so that the upper lays don't need to use something
> like (if dev_is_mdev) to handle mdev differently. Do I get you
> correctly?
After going through all the /dev/ioasid stuff I'm pretty convinced
that none of the PASID use cases for mdev should need any iommu
connection from the mdev_device - this is an artifact of trying to
cram the vfio container and group model into the mdev world and is not
good design.
The PASID interfaces for /dev/ioasid should use the 'struct
pci_device' for everything and never pass in a mdev_device to the
iommu layer.
/dev/ioasid should be designed to support this operation and is why I
strongly want to see the actual vfio_device implementation handle the
connection to the iommu layer and not keep trying to hack through
building what is actually a vfio_device specific connection through
the type1 container code.
> > 3) The vfio_bus_is_mdev() and related symbol_get() nonsense in
> > drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons
> > it was not acceptable to do this for the interrupt side either.
>
> Yes. Agreed. I will look into it.
This will be harder, but the same logic applies - it serves to allow
controlling an ioasid without involving the vfio_device.
Jason
> From: Jason Gunthorpe <[email protected]>
> Sent: Wednesday, May 12, 2021 1:37 AM
>
> On Tue, May 11, 2021 at 02:56:05PM +0800, Lu Baolu wrote:
>
> > > After my next series the mdev drivers will have direct access to
> > > the vfio_device. So an alternative to using the struct device, or
> > > adding 'if mdev' is to add an API to the vfio_device world to
> > > inject what iommu configuration is needed from that direction
> > > instead of trying to discover it from a struct device.
> >
> > Just want to make sure that I understand you correctly.
> >
> > We should use the existing IOMMU in-kernel APIs to connect mdev with the
> > iommu subsystem, so that the upper lays don't need to use something
> > like (if dev_is_mdev) to handle mdev differently. Do I get you
> > correctly?
>
> After going through all the /dev/ioasid stuff I'm pretty convinced
> that none of the PASID use cases for mdev should need any iommu
> connection from the mdev_device - this is an artifact of trying to
> cram the vfio container and group model into the mdev world and is not
> good design.
>
> The PASID interfaces for /dev/ioasid should use the 'struct
> pci_device' for everything and never pass in a mdev_device to the
> iommu layer.
'struct pci_device' -> 'struct device' since /dev/ioasid also needs to support
non-pci devices?
>
> /dev/ioasid should be designed to support this operation and is why I
> strongly want to see the actual vfio_device implementation handle the
> connection to the iommu layer and not keep trying to hack through
> building what is actually a vfio_device specific connection through
> the type1 container code.
>
I assume the so-called connection here implies using iommu_attach_device
to attach a device to an iommu domain. Did you suggest this connection
must be done by the mdev driver which implements vfio_device and then
passing iommu domain to /dev/ioasid when attaching the device to an
IOASID? sort of like:
ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid, domain);
If yes, this conflicts with one design in /dev/ioasid proposal that we're
working on. In earlier discussion we agreed that each ioasid is associated
to a singleton iommu domain and all devices that are attached to this
ioasid with compatible iommu capabilities just share this domain. It
implies that iommu domain is allocated at ATTACH_IOASID phase (e.g.
when the 1st device is attached to an ioasid). Pre-allocating domain by
vfio_device driver means that every device (SR-IOV or mdev) has its own
domain thus cannot share ioasid then.
Did I misunderstand your intention?
Baolu and I discussed below draft proposal to avoid passing mdev_device
to the iommu layer. Please check whether it makes sense:
// for every device attached to an ioasid
// mdev is represented by pasid (allocated by mdev driver)
// pf/vf has INVALID_IOASID in pasid
struct dev_info {
struct list_head next;
struct device *device;
u32 pasid;
}
// for every allocated ioasid
struct ioasid_info {
// the handle to convey iommu operations
struct iommu_domain *domain;
// metadata for map/unmap
struct rb_node dma_list;
// the list of attached device
struct dev_info *dev_list;
...
}
// called by VFIO/VDPA
int ioasid_attach_device(struct *device, u32 ioasid, u32 pasid)
{
// allocate a new dev_info, filled with device/pasid
// allocate iommu domain if it's the 1st attached device
// check iommu compatibility if an domain already exists
// attach the device to the iommu domain
if (pasid == INVALID_IOASID)
iommu_attach_device(domain, device);
else
iommu_aux_attach_device(domain, device, pasid);
// add dev_info to the dev_list of ioasid_info
}
// when attaching PF/VF to an ioasid
ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid);
-> get vfio_device of device_fd
-> ioasid_attach_device(vfio_device->dev, ioasid, INVALID_IOASID);
// when attaching a mdev to an ioasid
ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid);
-> get vfio_device of device_fd
-> find mdev_parent of vfio_device
-> find pasid allocated to this mdev
-> ioasid_attach_device(parent->dev, ioasid, pasid);
starting from this point the vfio device has been connected to the iommu layer.
/dev/ioasid can accept pgtable cmd on this ioasid and associated domain.
Thanks
Kevin
On Wed, May 12, 2021 at 07:46:05AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <[email protected]>
> > Sent: Wednesday, May 12, 2021 1:37 AM
> >
> > On Tue, May 11, 2021 at 02:56:05PM +0800, Lu Baolu wrote:
> >
> > > > After my next series the mdev drivers will have direct access to
> > > > the vfio_device. So an alternative to using the struct device, or
> > > > adding 'if mdev' is to add an API to the vfio_device world to
> > > > inject what iommu configuration is needed from that direction
> > > > instead of trying to discover it from a struct device.
> > >
> > > Just want to make sure that I understand you correctly.
> > >
> > > We should use the existing IOMMU in-kernel APIs to connect mdev with the
> > > iommu subsystem, so that the upper lays don't need to use something
> > > like (if dev_is_mdev) to handle mdev differently. Do I get you
> > > correctly?
> >
> > After going through all the /dev/ioasid stuff I'm pretty convinced
> > that none of the PASID use cases for mdev should need any iommu
> > connection from the mdev_device - this is an artifact of trying to
> > cram the vfio container and group model into the mdev world and is not
> > good design.
> >
> > The PASID interfaces for /dev/ioasid should use the 'struct
> > pci_device' for everything and never pass in a mdev_device to the
> > iommu layer.
>
> 'struct pci_device' -> 'struct device' since /dev/ioasid also needs to support
> non-pci devices?
I don't know. PASID is a PCI concept, I half expect to have at least
some wrappers for PCI specific IOMMU APIs so there is reasonable type
safety possible. But maybe it is all general enough that isn't needed.
> I assume the so-called connection here implies using iommu_attach_device
> to attach a device to an iommu domain.
yes
> Did you suggest this connection must be done by the mdev driver
> which implements vfio_device and then passing iommu domain to
> /dev/ioasid when attaching the device to an IOASID?
Why do we need iommu domain in a uAPI at all? It is an artifact of the
current kernel implementation
> sort of like: ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid, domain);
ioasid and device_fd completely describe the IOMMU parameters, don't
they?
> If yes, this conflicts with one design in /dev/ioasid proposal that we're
> working on. In earlier discussion we agreed that each ioasid is associated
> to a singleton iommu domain and all devices that are attached to this
> ioasid with compatible iommu capabilities just share this domain.
I think you need to stand back a bit more from the current detailed
implementation of the iommu API. /dev/ioasid needs to be able to
create IOASID objects that can be used with as many devices as
reasonable without duplicating the page tables. If or how that maps to
todays iommu layer I don't know.
Remember the uAPI is forever, the kernel internals can change.
> Baolu and I discussed below draft proposal to avoid passing mdev_device
> to the iommu layer. Please check whether it makes sense:
>
> // for every device attached to an ioasid
> // mdev is represented by pasid (allocated by mdev driver)
> // pf/vf has INVALID_IOASID in pasid
> struct dev_info {
> struct list_head next;
> struct device *device;
> u32 pasid;
> }
This is a list of "attachments"? sure
> // for every allocated ioasid
> struct ioasid_info {
> // the handle to convey iommu operations
> struct iommu_domain *domain;
> // metadata for map/unmap
> struct rb_node dma_list;
> // the list of attached device
> struct dev_info *dev_list;
> ...
> }
Yes, probably something basically like that
> // called by VFIO/VDPA
> int ioasid_attach_device(struct *device, u32 ioasid, u32 pasid)
'u32 ioasid' should be a 'struct ioasid_info *' and 'pasid' is not
needed because ioasif_info->dev_list[..]->pasid already stores the
value.
Keep in mind at this API level the 'struct device *' here shuld be the
PCI device never the mdev device.
> {
> // allocate a new dev_info, filled with device/pasid
> // allocate iommu domain if it's the 1st attached device
> // check iommu compatibility if an domain already exists
>
> // attach the device to the iommu domain
> if (pasid == INVALID_IOASID)
> iommu_attach_device(domain, device);
> else
> iommu_aux_attach_device(domain, device, pasid);
And if device is the pci_device I don't really see how this works.
This API layer would need to create some dummy struct device to attach
the aux domain too if you want to keep re-using the stuff we have
today.
The idea that a PASID is 1:1 with a 'struct device' is completely VFIO
centric thinking.
> // when attaching PF/VF to an ioasid
> ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid);
This would have to be a (ioasif_fd, ioasid) tuple as an ioasid is
scoped within each fd.
> -> get vfio_device of device_fd
> -> ioasid_attach_device(vfio_device->dev, ioasid, INVALID_IOASID);
The device knows if it is going to use a PASID or not. The API level
here should always very explicitly indicate *device* intention.
If the device knows it is going to form DMA's with a PASID tag then it
absoultely must be completely explict and clear in the API to the
layers below that is what is happening.
'INVALID_IOASID' does not communicate that ideal to me.
> // when attaching a mdev to an ioasid
> ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid);
> -> get vfio_device of device_fd
> -> find mdev_parent of vfio_device
> -> find pasid allocated to this mdev
> -> ioasid_attach_device(parent->dev, ioasid, pasid);
Again no, mdev shouldn't be involved, it is the wrong layering.
Jason