Hi folks,
The first part of this series refactors the IOMMU SVA code by assigning
an SVA type of iommu_domain to a shared virtual address and replacing
sva_bind/unbind iommu ops with set/block_dev_pasid domain ops.
The second part changes the existing I/O page fault handling framework
from only serving SVA to a generic one. Any driver or component can
handle the I/O page faults for its domain in its own way by installing
an I/O page fault handler.
This series has been functionally tested on an x86 machine and compile
tested for all architectures.
This series is also available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v9
Please review and suggest.
Best regards,
baolu
Change log:
v9:
- Some minor changes on comments and function names.
- Simplify dev_iommu_get_max_pasids().
v8:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Add support for calculating the max pasids that a device could
consume.
- Replace container_of_safe() with container_of.
- Remove iommu_ops->sva_domain_ops and make sva support through the
generic domain_alloc/free() interfaces.
- [Robin] It would be logical to pass IOMMU_DOMAIN_SVA to the normal
domain_alloc call, so that driver-internal stuff like context
descriptors can still be hung off the domain as usual (rather than
all drivers having to implement some extra internal lookup mechanism
to handle all the SVA domain ops).
- [Robin] I'd just stick the mm pointer in struct iommu_domain, in a
union with the fault handler stuff, since those are mutually exclusive with
SVA.
- https://lore.kernel.org/linux-iommu/[email protected]/
v7:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Remove duplicate array for sva domain.
- Rename detach_dev_pasid to block_dev_pasid.
- Add raw device driver interfaces for iommufd.
- Other misc refinements and patch reorganization.
- Drop "dmaengine: idxd: Separate user and kernel pasid enabling" which
has been picked for dmaengine tree.
v6:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Refine the SVA basic data structures.
Link: https://lore.kernel.org/linux-iommu/YnFv0ps0Ad8v+7uH@myrica/
- Refine arm smmuv3 sva domain allocation.
- Fix a possible lock issue.
Link: https://lore.kernel.org/linux-iommu/YnFydE8j8l7Q4m+b@myrica/
v5:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Address review comments from Jean-Philippe Brucker. Much appreciated!
- Remove redundant pci aliases check in
device_group_immutable_singleton().
- Treat all buses except PCI as static in immutable singleton check.
- As the sva_bind/unbind() have already guaranteed sva domain free only
after iopf_queue_flush_dev(), remove the unnecessary domain refcount.
- Move domain get() out of the list iteration in iopf_handle_group().
v4:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Solve the overlap with another series and make this series
self-contained.
- No objection to the abstraction of data structure during v3 review.
Hence remove the RFC subject prefix.
- Refine the immutable singleton group code according to Kevin's
comments.
v3:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Rework iommu_group_singleton_lockdown() by adding a flag to the group
that positively indicates the group can never have more than one
member, even after hot plug.
- Abstract the data structs used for iommu sva in separate patches to
make it easier to review.
- I still keep the RFC prefix in this series as the above two significant
changes need at least another round of review to be finalized.
- Several misc refinements.
v2:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Add sva domain life cycle management to avoid race between unbind and
page fault handling.
- Use a single domain for each mm.
- Return a single sva handler for the same binding.
- Add a new helper to meet singleton group requirement.
- Rework the SVA domain allocation for arm smmu v3 driver and move the
pasid_bit initialization to device probe.
- Drop the patch "iommu: Handle IO page faults directly".
- Add mmget_not_zero(mm) in SVA page fault handler.
v1:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Initial post.
Lu Baolu (11):
iommu: Add max_pasids field in struct iommu_device
iommu: Add max_pasids field in struct dev_iommu
iommu: Remove SVM_FLAG_SUPERVISOR_MODE support
iommu: Add sva iommu_domain support
iommu/vt-d: Add SVA domain support
arm-smmu-v3/sva: Add SVA domain support
iommu/sva: Refactoring iommu_sva_bind/unbind_device()
iommu: Remove SVA related callbacks from iommu ops
iommu: Prepare IOMMU domain for IOPF
iommu: Per-domain I/O page fault handling
iommu: Rename iommu-sva-lib.{c,h}
include/linux/intel-iommu.h | 12 +-
include/linux/intel-svm.h | 13 -
include/linux/iommu.h | 128 +++++++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 19 +-
.../iommu/{iommu-sva-lib.h => iommu-sva.h} | 14 +-
drivers/dma/idxd/cdev.c | 3 +-
drivers/dma/idxd/init.c | 25 +-
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 112 +++++----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +-
drivers/iommu/intel/dmar.c | 7 +
drivers/iommu/intel/iommu.c | 7 +-
drivers/iommu/intel/svm.c | 149 +++++------
drivers/iommu/io-pgfault.c | 73 ++----
drivers/iommu/iommu-sva-lib.c | 71 ------
drivers/iommu/iommu-sva.c | 229 +++++++++++++++++
drivers/iommu/iommu.c | 237 +++++++++++-------
drivers/misc/uacce/uacce.c | 2 +-
drivers/iommu/Makefile | 2 +-
18 files changed, 649 insertions(+), 463 deletions(-)
rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (83%)
delete mode 100644 drivers/iommu/iommu-sva-lib.c
create mode 100644 drivers/iommu/iommu-sva.c
--
2.25.1
The current kernel DMA with PASID support is based on the SVA with a flag
SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address
space to a PASID of the device. The device driver programs the device with
kernel virtual address (KVA) for DMA access. There have been security and
functional issues with this approach:
- The lack of IOTLB synchronization upon kernel page table updates.
(vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
- Other than slightly more protection, using kernel virtual address (KVA)
has little advantage over physical address. There are also no use
cases yet where DMA engines need kernel virtual addresses for in-kernel
DMA.
This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface.
The device drivers are suggested to handle kernel DMA with PASID through
the kernel DMA APIs.
The drvdata parameter in iommu_sva_bind_device() and all callbacks is not
needed anymore. Clean them up as well.
Link: https://lore.kernel.org/linux-iommu/[email protected]/
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
---
include/linux/intel-iommu.h | 3 +-
include/linux/intel-svm.h | 13 -----
include/linux/iommu.h | 8 +--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +-
drivers/dma/idxd/cdev.c | 3 +-
drivers/dma/idxd/init.c | 25 +-------
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 3 +-
drivers/iommu/intel/svm.c | 57 +++++--------------
drivers/iommu/iommu.c | 5 +-
drivers/misc/uacce/uacce.c | 2 +-
10 files changed, 26 insertions(+), 98 deletions(-)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index e065cbe3c857..31e3edc0fc7e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -738,8 +738,7 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
extern void intel_svm_check(struct intel_iommu *iommu);
extern int intel_svm_enable_prq(struct intel_iommu *iommu);
extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm);
void intel_svm_unbind(struct iommu_sva *handle);
u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index 207ef06ba3e1..f9a0d44f6fdb 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -13,17 +13,4 @@
#define PRQ_RING_MASK ((0x1000 << PRQ_ORDER) - 0x20)
#define PRQ_DEPTH ((0x1000 << PRQ_ORDER) >> 5)
-/*
- * The SVM_FLAG_SUPERVISOR_MODE flag requests a PASID which can be used only
- * for access to kernel addresses. No IOTLB flushes are automatically done
- * for kernel mappings; it is valid only for access to the kernel's static
- * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
- * A future API addition may permit the use of such ranges, by means of an
- * explicit IOTLB flush call (akin to the DMA API's unmap method).
- *
- * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
- * do such IOTLB flushes automatically.
- */
-#define SVM_FLAG_SUPERVISOR_MODE BIT(0)
-
#endif /* __INTEL_SVM_H__ */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d50afb2c9a09..3fbad42c0bf8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -243,8 +243,7 @@ struct iommu_ops {
int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
- struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+ struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm);
void (*sva_unbind)(struct iommu_sva *handle);
u32 (*sva_get_pasid)(struct iommu_sva *handle);
@@ -669,8 +668,7 @@ int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
struct iommu_sva *iommu_sva_bind_device(struct device *dev,
- struct mm_struct *mm,
- void *drvdata);
+ struct mm_struct *mm);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);
@@ -1012,7 +1010,7 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
}
static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
{
return NULL;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..d2ba86470c42 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -754,8 +754,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
void arm_smmu_sva_unbind(struct iommu_sva *handle);
u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
@@ -791,7 +790,7 @@ static inline bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master
}
static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
{
return ERR_PTR(-ENODEV);
}
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index c2808fd081d6..66720001ba1c 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -6,7 +6,6 @@
#include <linux/pci.h>
#include <linux/device.h>
#include <linux/sched/task.h>
-#include <linux/intel-svm.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/cdev.h>
#include <linux/fs.h>
@@ -100,7 +99,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
filp->private_data = ctx;
if (device_user_pasid_enabled(idxd)) {
- sva = iommu_sva_bind_device(dev, current->mm, NULL);
+ sva = iommu_sva_bind_device(dev, current->mm);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 355fb3ef4cbf..00b437f4f573 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -14,7 +14,6 @@
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/device.h>
#include <linux/idr.h>
-#include <linux/intel-svm.h>
#include <linux/iommu.h>
#include <uapi/linux/idxd.h>
#include <linux/dmaengine.h>
@@ -466,29 +465,7 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
static int idxd_enable_system_pasid(struct idxd_device *idxd)
{
- int flags;
- unsigned int pasid;
- struct iommu_sva *sva;
-
- flags = SVM_FLAG_SUPERVISOR_MODE;
-
- sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
- if (IS_ERR(sva)) {
- dev_warn(&idxd->pdev->dev,
- "iommu sva bind failed: %ld\n", PTR_ERR(sva));
- return PTR_ERR(sva);
- }
-
- pasid = iommu_sva_get_pasid(sva);
- if (pasid == IOMMU_PASID_INVALID) {
- iommu_sva_unbind_device(sva);
- return -ENODEV;
- }
-
- idxd->sva = sva;
- idxd->pasid = pasid;
- dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
- return 0;
+ return -EOPNOTSUPP;
}
static void idxd_disable_system_pasid(struct idxd_device *idxd)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 1ef7bbb4acf3..f155d406c5d5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -367,8 +367,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
return ERR_PTR(ret);
}
-struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
{
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 7ee37d996e15..d04880a291c3 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -313,8 +313,7 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
return 0;
}
-static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
- unsigned int flags)
+static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
{
ioasid_t max_pasid = dev_is_pci(dev) ?
pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
@@ -324,8 +323,7 @@ static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
struct device *dev,
- struct mm_struct *mm,
- unsigned int flags)
+ struct mm_struct *mm)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
unsigned long iflags, sflags;
@@ -341,22 +339,18 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
svm->pasid = mm->pasid;
svm->mm = mm;
- svm->flags = flags;
INIT_LIST_HEAD_RCU(&svm->devs);
- if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
- svm->notifier.ops = &intel_mmuops;
- ret = mmu_notifier_register(&svm->notifier, mm);
- if (ret) {
- kfree(svm);
- return ERR_PTR(ret);
- }
+ svm->notifier.ops = &intel_mmuops;
+ ret = mmu_notifier_register(&svm->notifier, mm);
+ if (ret) {
+ kfree(svm);
+ return ERR_PTR(ret);
}
ret = pasid_private_add(svm->pasid, svm);
if (ret) {
- if (svm->notifier.ops)
- mmu_notifier_unregister(&svm->notifier, mm);
+ mmu_notifier_unregister(&svm->notifier, mm);
kfree(svm);
return ERR_PTR(ret);
}
@@ -391,9 +385,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
}
/* Setup the pasid table: */
- sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
- PASID_FLAG_SUPERVISOR_MODE : 0;
- sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
+ sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
spin_lock_irqsave(&iommu->lock, iflags);
ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
FLPT_DEFAULT_DID, sflags);
@@ -410,8 +402,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
kfree(sdev);
free_svm:
if (list_empty(&svm->devs)) {
- if (svm->notifier.ops)
- mmu_notifier_unregister(&svm->notifier, mm);
+ mmu_notifier_unregister(&svm->notifier, mm);
pasid_private_remove(mm->pasid);
kfree(svm);
}
@@ -767,7 +758,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
* to unbind the mm while any page faults are outstanding.
*/
svm = pasid_private_find(req->pasid);
- if (IS_ERR_OR_NULL(svm) || (svm->flags & SVM_FLAG_SUPERVISOR_MODE))
+ if (IS_ERR_OR_NULL(svm))
goto bad_req;
}
@@ -818,40 +809,20 @@ static irqreturn_t prq_event_thread(int irq, void *d)
return IRQ_RETVAL(handled);
}
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm)
{
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
- unsigned int flags = 0;
struct iommu_sva *sva;
int ret;
- if (drvdata)
- flags = *(unsigned int *)drvdata;
-
- if (flags & SVM_FLAG_SUPERVISOR_MODE) {
- if (!ecap_srs(iommu->ecap)) {
- dev_err(dev, "%s: Supervisor PASID not supported\n",
- iommu->name);
- return ERR_PTR(-EOPNOTSUPP);
- }
-
- if (mm) {
- dev_err(dev, "%s: Supervisor PASID with user provided mm\n",
- iommu->name);
- return ERR_PTR(-EINVAL);
- }
-
- mm = &init_mm;
- }
-
mutex_lock(&pasid_mutex);
- ret = intel_svm_alloc_pasid(dev, mm, flags);
+ ret = intel_svm_alloc_pasid(dev, mm);
if (ret) {
mutex_unlock(&pasid_mutex);
return ERR_PTR(ret);
}
- sva = intel_svm_bind_mm(iommu, dev, mm, flags);
+ sva = intel_svm_bind_mm(iommu, dev, mm);
mutex_unlock(&pasid_mutex);
return sva;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6b731568efff..b5c32aab9686 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2788,7 +2788,6 @@ EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
* iommu_sva_bind_device() - Bind a process address space to a device
* @dev: the device
* @mm: the mm to bind, caller must hold a reference to it
- * @drvdata: opaque data pointer to pass to bind callback
*
* Create a bond between device and address space, allowing the device to access
* the mm using the returned PASID. If a bond already exists between @device and
@@ -2801,7 +2800,7 @@ EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
* On error, returns an ERR_PTR value.
*/
struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
{
struct iommu_group *group;
struct iommu_sva *handle = ERR_PTR(-EINVAL);
@@ -2826,7 +2825,7 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
if (iommu_group_device_count(group) != 1)
goto out_unlock;
- handle = ops->sva_bind(dev, mm, drvdata);
+ handle = ops->sva_bind(dev, mm);
out_unlock:
mutex_unlock(&group->mutex);
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..3238a867ea51 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -99,7 +99,7 @@ static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
if (!(uacce->flags & UACCE_DEV_SVA))
return 0;
- handle = iommu_sva_bind_device(uacce->parent, current->mm, NULL);
+ handle = iommu_sva_bind_device(uacce->parent, current->mm);
if (IS_ERR(handle))
return PTR_ERR(handle);
--
2.25.1
The sva iommu_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructure
to support SVA domain in the iommu common layer. It includes:
- Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA domain
type. The IOMMU drivers that support SVA should provide the sva
domain specific iommu_domain_ops.
- Add a helper to allocate an SVA domain. The iommu_domain_free()
is still used to free an SVA domain.
- Add helpers to attach an SVA domain to a device and the reverse
operation.
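The allocation and free items above can be modelled in userspace as
follows. This is only a sketch under stated assumptions, not the kernel
code: every demo_* name is hypothetical, struct demo_mm reduces
mm_struct to a bare counter, and the increments and decrements stand in
for mmgrab() and mmdrop(). It shows the mm pointer sharing a union with
the fault handler fields and being pinned for the lifetime of the
domain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins; these are NOT the kernel definitions. */
#define DEMO_DOMAIN_DMA (1u << 0)
#define DEMO_DOMAIN_SVA (1u << 4)

struct demo_mm {
	int mm_count;	/* models the reference taken by mmgrab() */
};

typedef int (*demo_fault_handler_t)(void *token);

struct demo_domain {
	unsigned int type;
	union {
		struct {	/* non-SVA domains */
			demo_fault_handler_t handler;
			void *handler_token;
		};
		struct {	/* SVA domains */
			struct demo_mm *mm;
		};
	};
};

/* Allocate an SVA domain and pin the mm for as long as it lives. */
struct demo_domain *demo_sva_domain_alloc(struct demo_mm *mm)
{
	struct demo_domain *domain = calloc(1, sizeof(*domain));

	if (!domain)
		return NULL;

	domain->type = DEMO_DOMAIN_SVA;
	mm->mm_count++;		/* mmgrab(mm) in the patch */
	domain->mm = mm;
	return domain;
}

/* One free routine for every domain type; SVA domains drop the mm. */
void demo_domain_free(struct demo_domain *domain)
{
	if (domain->type == DEMO_DOMAIN_SVA) {
		assert(domain->mm->mm_count > 0);
		domain->mm->mm_count--;	/* mmdrop(domain->mm) in the patch */
	}
	free(domain);
}
```

Because the two union members overlap, the free path has to check the
domain type before it touches domain->mm.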
Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, the attach/detach interfaces only apply to devices
belonging to the singleton groups, and the singleton is immutable in
fabric i.e. not affected by hotplug.
The iommu_attach/detach_device_pasid() helpers can also be used for other
purposes, such as kernel DMA with PASID, mediated devices, etc.
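The bookkeeping behind the attach/detach helpers can be sketched in
userspace as below. The assumptions are deliberate simplifications: a
fixed-size array replaces the per-group xarray, the singleton check
looks only at the device count (the PCI ACS path check is left out),
and the PASID range check exists only because the array is bounded. All
demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_PASID 8	/* arbitrary bound for the sketch */

struct demo_domain {
	int id;
};

/* Models the group: a device count plus one domain slot per PASID. */
struct demo_group {
	int device_count;
	struct demo_domain *pasid_array[DEMO_MAX_PASID];
};

/* Attach succeeds only for singleton groups and unused PASIDs. */
int demo_attach_device_pasid(struct demo_group *group,
			     struct demo_domain *domain, unsigned int pasid)
{
	assert(group && domain);

	if (group->device_count != 1 || pasid >= DEMO_MAX_PASID)
		return -EINVAL;

	/* Same effect as xa_cmpxchg(..., NULL, domain): claim an empty slot. */
	if (group->pasid_array[pasid])
		return -EBUSY;

	group->pasid_array[pasid] = domain;
	return 0;
}

/* Detach clears the slot so the PASID can be attached again. */
void demo_detach_device_pasid(struct demo_group *group, unsigned int pasid)
{
	if (pasid < DEMO_MAX_PASID)
		group->pasid_array[pasid] = NULL;
}
```

A second attach to an occupied PASID returns -EBUSY until the first
domain has been detached, which matches the xa_cmpxchg() check in
iommu_attach_device_pasid().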
Suggested-by: Jean-Philippe Brucker <[email protected]>
Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
include/linux/iommu.h | 45 ++++++++++++++++++++-
drivers/iommu/iommu.c | 93 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 136 insertions(+), 2 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3fbad42c0bf8..b8b6b8c5e20e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -64,6 +64,8 @@ struct iommu_domain_geometry {
#define __IOMMU_DOMAIN_PT (1U << 2) /* Domain is identity mapped */
#define __IOMMU_DOMAIN_DMA_FQ (1U << 3) /* DMA-API uses flush queue */
+#define __IOMMU_DOMAIN_SVA (1U << 4) /* Shared process address space */
+
/*
* This are the possible domain-types
*
@@ -77,6 +79,8 @@ struct iommu_domain_geometry {
* certain optimizations for these domains
* IOMMU_DOMAIN_DMA_FQ - As above, but definitely using batched TLB
* invalidation.
+ * IOMMU_DOMAIN_SVA - DMA addresses are shared process address
+ * spaces represented by mm_struct's.
*/
#define IOMMU_DOMAIN_BLOCKED (0U)
#define IOMMU_DOMAIN_IDENTITY (__IOMMU_DOMAIN_PT)
@@ -86,15 +90,23 @@ struct iommu_domain_geometry {
#define IOMMU_DOMAIN_DMA_FQ (__IOMMU_DOMAIN_PAGING | \
__IOMMU_DOMAIN_DMA_API | \
__IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SVA)
struct iommu_domain {
unsigned type;
const struct iommu_domain_ops *ops;
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
- iommu_fault_handler_t handler;
- void *handler_token;
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+ union {
+ struct { /* IOMMU_DOMAIN_DMA */
+ iommu_fault_handler_t handler;
+ void *handler_token;
+ };
+ struct { /* IOMMU_DOMAIN_SVA */
+ struct mm_struct *mm;
+ };
+ };
};
static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
@@ -262,6 +274,8 @@ struct iommu_ops {
* struct iommu_domain_ops - domain specific operations
* @attach_dev: attach an iommu domain to a device
* @detach_dev: detach an iommu domain from a device
+ * @set_dev_pasid: set an iommu domain to a pasid of device
+ * @block_dev_pasid: block pasid of device from using iommu domain
* @map: map a physically contiguous memory region to an iommu domain
* @map_pages: map a physically contiguous set of pages of the same size to
* an iommu domain.
@@ -282,6 +296,10 @@ struct iommu_ops {
struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+ int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
+ void (*block_dev_pasid)(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
int (*map)(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
@@ -679,6 +697,12 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+ struct mm_struct *mm);
+int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
+void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -1052,6 +1076,23 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
{
return false;
}
+
+static inline struct iommu_domain *
+iommu_sva_domain_alloc(struct device *dev, struct mm_struct *mm)
+{
+ return NULL;
+}
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ return -ENODEV;
+}
+
+static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+}
#endif /* CONFIG_IOMMU_API */
/**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b5c32aab9686..8450f914cb2b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -27,6 +27,7 @@
#include <linux/module.h>
#include <linux/cc_platform.h>
#include <trace/events/iommu.h>
+#include <linux/sched/mm.h>
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
@@ -39,6 +40,7 @@ struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
+ struct xarray pasid_array;
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
@@ -660,6 +662,7 @@ struct iommu_group *iommu_group_alloc(void)
mutex_init(&group->mutex);
INIT_LIST_HEAD(&group->devices);
INIT_LIST_HEAD(&group->entry);
+ xa_init(&group->pasid_array);
ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
if (ret < 0) {
@@ -1955,6 +1958,8 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);
void iommu_domain_free(struct iommu_domain *domain)
{
+ if (domain->type == IOMMU_DOMAIN_SVA)
+ mmdrop(domain->mm);
iommu_put_dma_cookie(domain);
domain->ops->free(domain);
}
@@ -3271,3 +3276,91 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
return user;
}
EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+ struct mm_struct *mm)
+{
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
+ struct iommu_domain *domain;
+
+ domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
+ if (!domain)
+ return NULL;
+
+ domain->type = IOMMU_DOMAIN_SVA;
+ mmgrab(mm);
+ domain->mm = mm;
+
+ return domain;
+}
+
+static bool iommu_group_immutable_singleton(struct iommu_group *group,
+ struct device *dev)
+{
+ int count;
+
+ mutex_lock(&group->mutex);
+ count = iommu_group_device_count(group);
+ mutex_unlock(&group->mutex);
+
+ if (count != 1)
+ return false;
+
+ /*
+ * The PCI device could be considered to be fully isolated if all
+ * devices on the path from the device to the host-PCI bridge are
+ * protected from peer-to-peer DMA by ACS.
+ */
+ if (dev_is_pci(dev))
+ return pci_acs_path_enabled(to_pci_dev(dev), NULL,
+ REQ_ACS_FLAGS);
+
+ /*
+ * Otherwise, the device came from DT/ACPI, assume it is static and
+ * then singleton can know from the device count in the group.
+ */
+ return true;
+}
+
+int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid)
+{
+ struct iommu_group *group;
+ int ret = -EBUSY;
+ void *curr;
+
+ if (!domain->ops->set_dev_pasid)
+ return -EOPNOTSUPP;
+
+ group = iommu_group_get(dev);
+ if (!group || !iommu_group_immutable_singleton(group, dev)) {
+ iommu_group_put(group);
+ return -EINVAL;
+ }
+
+ mutex_lock(&group->mutex);
+ curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
+ if (curr)
+ goto out_unlock;
+ ret = domain->ops->set_dev_pasid(domain, dev, pasid);
+ if (ret)
+ xa_erase(&group->pasid_array, pasid);
+out_unlock:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ return ret;
+}
+
+void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+
+ mutex_lock(&group->mutex);
+ domain->ops->block_dev_pasid(domain, dev, pasid);
+ xa_erase(&group->pasid_array, pasid);
+ mutex_unlock(&group->mutex);
+
+ iommu_group_put(group);
+}
--
2.25.1
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.
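The shape of this change can be sketched in userspace as below: the
driver's one allocation entry point branches on the domain type and
returns a domain wired to SVA-specific ops. This is an illustration
under assumptions, not the driver code. All demo_* names are
hypothetical, the "attached" counter exists only in the sketch, and the
type is set inside the allocator here, whereas the patch leaves that to
the core iommu_sva_domain_alloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_DOMAIN_UNMANAGED (1u << 1)
#define DEMO_DOMAIN_SVA (1u << 4)

struct demo_domain;

struct demo_domain_ops {
	int (*set_dev_pasid)(struct demo_domain *domain, unsigned int pasid);
	void (*block_dev_pasid)(struct demo_domain *domain, unsigned int pasid);
	void (*free)(struct demo_domain *domain);
};

struct demo_domain {
	unsigned int type;
	const struct demo_domain_ops *ops;
	int attached;	/* PASIDs currently set; sketch-only field */
};

static int demo_sva_set_dev_pasid(struct demo_domain *domain,
				  unsigned int pasid)
{
	(void)pasid;
	if (domain->type != DEMO_DOMAIN_SVA)
		return -EINVAL;
	domain->attached++;
	return 0;
}

static void demo_sva_block_dev_pasid(struct demo_domain *domain,
				     unsigned int pasid)
{
	(void)pasid;
	assert(domain->attached > 0);
	domain->attached--;
}

static void demo_sva_domain_free(struct demo_domain *domain)
{
	free(domain);
}

static const struct demo_domain_ops demo_sva_domain_ops = {
	.set_dev_pasid = demo_sva_set_dev_pasid,
	.block_dev_pasid = demo_sva_block_dev_pasid,
	.free = demo_sva_domain_free,
};

/* The driver's single allocation entry point branches on the type. */
struct demo_domain *demo_domain_alloc(unsigned int type)
{
	struct demo_domain *domain;

	if (type != DEMO_DOMAIN_SVA)
		return NULL;	/* other domain types are out of scope here */

	domain = calloc(1, sizeof(*domain));
	if (!domain)
		return NULL;

	domain->ops = &demo_sva_domain_ops;
	domain->type = type;	/* done by the core in the patch */
	return domain;
}
```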
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 69 +++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +
3 files changed, 78 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d2ba86470c42..96399dd3a67a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -758,6 +758,7 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
void arm_smmu_sva_unbind(struct iommu_sva *handle);
u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
+struct iommu_domain *arm_smmu_sva_domain_alloc(void);
#else /* CONFIG_ARM_SMMU_V3_SVA */
static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
{
@@ -803,5 +804,10 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
}
static inline void arm_smmu_sva_notifier_synchronize(void) {}
+
+static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+ return NULL;
+}
#endif /* CONFIG_ARM_SMMU_V3_SVA */
#endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index f155d406c5d5..fc4555dac5b4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -549,3 +549,72 @@ void arm_smmu_sva_notifier_synchronize(void)
*/
mmu_notifier_synchronize();
}
+
+static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+ int ret = 0;
+ struct mm_struct *mm;
+ struct iommu_sva *handle;
+
+ if (domain->type != IOMMU_DOMAIN_SVA)
+ return -EINVAL;
+
+ mm = domain->mm;
+ if (WARN_ON(!mm))
+ return -ENODEV;
+
+ mutex_lock(&sva_lock);
+ handle = __arm_smmu_sva_bind(dev, mm);
+ if (IS_ERR(handle))
+ ret = PTR_ERR(handle);
+ mutex_unlock(&sva_lock);
+
+ return ret;
+}
+
+static void arm_smmu_sva_block_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+ struct mm_struct *mm = domain->mm;
+ struct arm_smmu_bond *bond = NULL, *t;
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+ mutex_lock(&sva_lock);
+ list_for_each_entry(t, &master->bonds, list) {
+ if (t->mm == mm) {
+ bond = t;
+ break;
+ }
+ }
+
+ if (!WARN_ON(!bond) && refcount_dec_and_test(&bond->refs)) {
+ list_del(&bond->list);
+ arm_smmu_mmu_notifier_put(bond->smmu_mn);
+ kfree(bond);
+ }
+ mutex_unlock(&sva_lock);
+}
+
+static void arm_smmu_sva_domain_free(struct iommu_domain *domain)
+{
+ kfree(domain);
+}
+
+static const struct iommu_domain_ops arm_smmu_sva_domain_ops = {
+ .set_dev_pasid = arm_smmu_sva_set_dev_pasid,
+ .block_dev_pasid = arm_smmu_sva_block_dev_pasid,
+ .free = arm_smmu_sva_domain_free,
+};
+
+struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+ struct iommu_domain *domain;
+
+ domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+ if (!domain)
+ return NULL;
+ domain->ops = &arm_smmu_sva_domain_ops;
+
+ return domain;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ae8ec8df47c1..a30b252e2f95 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1999,6 +1999,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
{
struct arm_smmu_domain *smmu_domain;
+ if (type == IOMMU_DOMAIN_SVA)
+ return arm_smmu_sva_domain_alloc();
+
if (type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_DMA &&
type != IOMMU_DOMAIN_DMA_FQ &&
--
2.25.1
These ops have been replaced with the set_dev_pasid/block_dev_pasid
domain ops, so they are no longer needed. Remove them to avoid dead
code.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
---
include/linux/intel-iommu.h | 3 --
include/linux/iommu.h | 7 ---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 16 ------
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 40 ---------------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 --
drivers/iommu/intel/iommu.c | 3 --
drivers/iommu/intel/svm.c | 49 -------------------
7 files changed, 121 deletions(-)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 9007428a68f1..5bd19c95a926 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -738,9 +738,6 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
extern void intel_svm_check(struct intel_iommu *iommu);
extern int intel_svm_enable_prq(struct intel_iommu *iommu);
extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm);
-void intel_svm_unbind(struct iommu_sva *handle);
-u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
struct iommu_domain *intel_svm_domain_alloc(void);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c0c23d9fd8fe..17780537db6e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -227,9 +227,6 @@ struct iommu_iotlb_gather {
* @dev_has/enable/disable_feat: per device entries to check/enable/disable
* iommu specific features.
* @dev_feat_enabled: check enabled feature
- * @sva_bind: Bind process address space to device
- * @sva_unbind: Unbind process address space from device
- * @sva_get_pasid: Get PASID associated to a SVA handle
* @page_response: handle page request response
* @def_domain_type: device default domain type, return value:
* - IOMMU_DOMAIN_IDENTITY: must use an identity domain
@@ -263,10 +260,6 @@ struct iommu_ops {
int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
- struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm);
- void (*sva_unbind)(struct iommu_sva *handle);
- u32 (*sva_get_pasid)(struct iommu_sva *handle);
-
int (*page_response)(struct device *dev,
struct iommu_fault_event *evt,
struct iommu_page_response *msg);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 96399dd3a67a..15dd4c7e6d3a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -754,9 +754,6 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
-void arm_smmu_sva_unbind(struct iommu_sva *handle);
-u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
struct iommu_domain *arm_smmu_sva_domain_alloc(void);
#else /* CONFIG_ARM_SMMU_V3_SVA */
@@ -790,19 +787,6 @@ static inline bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master
return false;
}
-static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
-{
- return ERR_PTR(-ENODEV);
-}
-
-static inline void arm_smmu_sva_unbind(struct iommu_sva *handle) {}
-
-static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
-{
- return IOMMU_PASID_INVALID;
-}
-
static inline void arm_smmu_sva_notifier_synchronize(void) {}
static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index fc4555dac5b4..e36c689f56c5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -344,11 +344,6 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
if (!bond)
return ERR_PTR(-ENOMEM);
- /* Allocate a PASID for this mm if necessary */
- ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
- if (ret)
- goto err_free_bond;
-
bond->mm = mm;
bond->sva.dev = dev;
refcount_set(&bond->refs, 1);
@@ -367,41 +362,6 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
return ERR_PTR(ret);
}
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
-{
- struct iommu_sva *handle;
- struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
- struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-
- if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
- return ERR_PTR(-EINVAL);
-
- mutex_lock(&sva_lock);
- handle = __arm_smmu_sva_bind(dev, mm);
- mutex_unlock(&sva_lock);
- return handle;
-}
-
-void arm_smmu_sva_unbind(struct iommu_sva *handle)
-{
- struct arm_smmu_bond *bond = sva_to_bond(handle);
-
- mutex_lock(&sva_lock);
- if (refcount_dec_and_test(&bond->refs)) {
- list_del(&bond->list);
- arm_smmu_mmu_notifier_put(bond->smmu_mn);
- kfree(bond);
- }
- mutex_unlock(&sva_lock);
-}
-
-u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
-{
- struct arm_smmu_bond *bond = sva_to_bond(handle);
-
- return bond->mm->pasid;
-}
-
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
{
unsigned long reg, fld;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a30b252e2f95..8b9b78c7a67d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2855,9 +2855,6 @@ static struct iommu_ops arm_smmu_ops = {
.dev_feat_enabled = arm_smmu_dev_feature_enabled,
.dev_enable_feat = arm_smmu_dev_enable_feature,
.dev_disable_feat = arm_smmu_dev_disable_feature,
- .sva_bind = arm_smmu_sva_bind,
- .sva_unbind = arm_smmu_sva_unbind,
- .sva_get_pasid = arm_smmu_sva_get_pasid,
.page_response = arm_smmu_page_response,
.pgsize_bitmap = -1UL, /* Restricted during device attach */
.owner = THIS_MODULE,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 993a1ce509a8..37d68eda1889 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4921,9 +4921,6 @@ const struct iommu_ops intel_iommu_ops = {
.def_domain_type = device_def_domain_type,
.pgsize_bitmap = SZ_4K,
#ifdef CONFIG_INTEL_IOMMU_SVM
- .sva_bind = intel_svm_bind,
- .sva_unbind = intel_svm_unbind,
- .sva_get_pasid = intel_svm_get_pasid,
.page_response = intel_svm_page_response,
#endif
.default_domain_ops = &(const struct iommu_domain_ops) {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 7d4f9d173013..db55b06cafdf 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -313,14 +313,6 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
return 0;
}
-static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
-{
- ioasid_t max_pasid = dev_is_pci(dev) ?
- pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
-
- return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
-}
-
static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
struct device *dev,
struct mm_struct *mm)
@@ -809,47 +801,6 @@ static irqreturn_t prq_event_thread(int irq, void *d)
return IRQ_RETVAL(handled);
}
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm)
-{
- struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
- struct iommu_sva *sva;
- int ret;
-
- mutex_lock(&pasid_mutex);
- ret = intel_svm_alloc_pasid(dev, mm);
- if (ret) {
- mutex_unlock(&pasid_mutex);
- return ERR_PTR(ret);
- }
-
- sva = intel_svm_bind_mm(iommu, dev, mm);
- mutex_unlock(&pasid_mutex);
-
- return sva;
-}
-
-void intel_svm_unbind(struct iommu_sva *sva)
-{
- struct intel_svm_dev *sdev = to_intel_svm_dev(sva);
-
- mutex_lock(&pasid_mutex);
- intel_svm_unbind_mm(sdev->dev, sdev->pasid);
- mutex_unlock(&pasid_mutex);
-}
-
-u32 intel_svm_get_pasid(struct iommu_sva *sva)
-{
- struct intel_svm_dev *sdev;
- u32 pasid;
-
- mutex_lock(&pasid_mutex);
- sdev = to_intel_svm_dev(sva);
- pasid = sdev->pasid;
- mutex_unlock(&pasid_mutex);
-
- return pasid;
-}
-
int intel_svm_page_response(struct device *dev,
struct iommu_fault_event *evt,
struct iommu_page_response *msg)
--
2.25.1
Use the max_pasids field of struct iommu_device to keep the number of
PASIDs that the IOMMU hardware is able to support. This is a generic
attribute of an IOMMU, and lifting it into the per-IOMMU device
structure makes it possible to allocate a PASID for a device without
calling into the IOMMU drivers. Any IOMMU driver that supports PASID
related features should set this field before enabling them on the
devices.
In the Intel IOMMU driver, the intel_iommu_sm declaration is moved
inside the CONFIG_INTEL_IOMMU block so that the pasid_supported()
helper can be used in dmar.c without compilation errors.
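As an illustration only (not part of the patch), the arithmetic behind
the two max_pasids assignments in this patch can be sketched in plain
userspace C. The helper names and the 20-bit bounds below are
assumptions made for the sketch; only the shift expressions mirror the
diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Arm SMMUv3: ssid_bits is the substream ID width in bits, so the PASID
 * count is 2^ssid_bits. Mirrors "1UL << smmu->ssid_bits" in the diff.
 * The 20-bit upper bound is an assumption of this sketch.
 */
static uint32_t smmu_max_pasids(unsigned int ssid_bits)
{
	assert(ssid_bits <= 20);
	return UINT32_C(1) << ssid_bits;
}

/*
 * Intel VT-d: a PSS value of N means the PASID field is N+1 bits wide,
 * so the PASID count is 2^(N+1). Mirrors "2UL << ecap_pss(iommu->ecap)".
 */
static uint32_t vtd_max_pasids(unsigned int pss)
{
	assert(pss < 20);
	return UINT32_C(2) << pss;
}

/*
 * Highest PASID value that can be handed out for a given count, as in
 * iommu_sva_alloc_pasid(mm, 1, max_pasids - 1) later in the series.
 */
static uint32_t highest_pasid(uint32_t max_pasids)
{
	assert(max_pasids > 0);
	return max_pasids - 1;
}
```

With a 20-bit PASID width, both helpers yield the same count: ssid_bits
of 20 on SMMUv3 and a PSS value of 19 on VT-d.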
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
include/linux/intel-iommu.h | 3 ++-
include/linux/iommu.h | 2 ++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
drivers/iommu/intel/dmar.c | 7 +++++++
4 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4f29139bbfc3..e065cbe3c857 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -479,7 +479,6 @@ enum {
#define VTD_FLAG_IRQ_REMAP_PRE_ENABLED (1 << 1)
#define VTD_FLAG_SVM_CAPABLE (1 << 2)
-extern int intel_iommu_sm;
extern spinlock_t device_domain_lock;
#define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap))
@@ -786,6 +785,7 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
extern const struct iommu_ops intel_iommu_ops;
#ifdef CONFIG_INTEL_IOMMU
+extern int intel_iommu_sm;
extern int iommu_calculate_agaw(struct intel_iommu *iommu);
extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu);
extern int dmar_disabled;
@@ -802,6 +802,7 @@ static inline int iommu_calculate_max_sagaw(struct intel_iommu *iommu)
}
#define dmar_disabled (1)
#define intel_iommu_enabled (0)
+#define intel_iommu_sm (0)
#endif
static inline const char *decode_prq_descriptor(char *str, size_t size,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e1afe169549..03fbb1b71536 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -318,12 +318,14 @@ struct iommu_domain_ops {
* @list: Used by the iommu-core to keep a list of registered iommus
* @ops: iommu-ops for talking to this iommu
* @dev: struct device for sysfs handling
+ * @max_pasids: number of supported PASIDs
*/
struct iommu_device {
struct list_head list;
const struct iommu_ops *ops;
struct fwnode_handle *fwnode;
struct device *dev;
+ u32 max_pasids;
};
/**
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88817a3376ef..ae8ec8df47c1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3546,6 +3546,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
/* SID/SSID sizes */
smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
+ smmu->iommu.max_pasids = 1UL << smmu->ssid_bits;
/*
* If the SMMU supports fewer bits than would fill a single L2 stream
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 592c1e1a5d4b..6c338888061a 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1123,6 +1123,13 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
raw_spin_lock_init(&iommu->register_lock);
+ /*
+ * A value of N in PSS field of eCap register indicates hardware
+ * supports PASID field of N+1 bits.
+ */
+ if (pasid_supported(iommu))
+ iommu->iommu.max_pasids = 2UL << ecap_pss(iommu->ecap);
+
/*
* This is only for hotplug; at boot time intel_iommu_enabled won't
* be set yet. When intel_iommu_init() runs, it registers the units
--
2.25.1
The existing iommu SVA interfaces are implemented by calling the SVA
specific iommu ops provided by the IOMMU drivers. There's no need for
any SVA specific ops in the iommu_ops vector anymore, as we can achieve
this through the generic set/block_dev_pasid domain ops.
This refactors the IOMMU SVA interface implementation to use the
set/block_dev_pasid ops and aligns it with the concept of the SVA
iommu domain. Put the new SVA code in the SVA related file in order
to make it self-contained.
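For reviewers who want the bond lifetime at a glance, here is a minimal
userspace sketch of the scheme (not kernel code). The handle is
embedded in the domain, a second bind of the same PASID only takes a
reference, and the last unbind frees the domain. All names (sva_bind,
sva_domain, pasid_table, MAX_PASIDS) are hypothetical, the table models
a single device, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct iommu_sva: the handle returned to the caller. */
struct sva_handle {
	int dev_id;
	unsigned int users;	/* plain counter standing in for refcount_t */
};

/* Stand-in for an SVA iommu_domain with the handle embedded in it. */
struct sva_domain {
	unsigned int pasid;
	struct sva_handle bond;
};

#define MAX_PASIDS 8

/* Stand-in for the per-group pasid_array of one device. */
static struct sva_domain *pasid_table[MAX_PASIDS];

static struct sva_handle *sva_bind(int dev_id, unsigned int pasid)
{
	struct sva_domain *domain;

	if (pasid == 0 || pasid >= MAX_PASIDS)
		return NULL;

	/* Reuse an existing domain and take one more reference. */
	domain = pasid_table[pasid];
	if (domain) {
		domain->bond.users++;
		return &domain->bond;
	}

	/* Otherwise allocate a new domain and set it on the PASID. */
	domain = calloc(1, sizeof(*domain));
	if (!domain)
		return NULL;
	domain->pasid = pasid;
	domain->bond.dev_id = dev_id;
	domain->bond.users = 1;
	pasid_table[pasid] = domain;

	return &domain->bond;
}

static unsigned int sva_get_pasid(struct sva_handle *handle)
{
	return container_of(handle, struct sva_domain, bond)->pasid;
}

static void sva_unbind(struct sva_handle *handle)
{
	struct sva_domain *domain =
		container_of(handle, struct sva_domain, bond);

	assert(domain->bond.users > 0);
	/* The last reference detaches the PASID and frees the domain. */
	if (--domain->bond.users == 0) {
		pasid_table[domain->pasid] = NULL;
		free(domain);
	}
}
```

Embedding the handle in the domain is what lets unbind and get_pasid
recover the domain with container_of() instead of a separate lookup.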
Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 67 +++++++++++--------
drivers/iommu/iommu-sva-lib.c | 98 ++++++++++++++++++++++++++++
drivers/iommu/iommu.c | 119 ++++++++--------------------------
3 files changed, 165 insertions(+), 119 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b8b6b8c5e20e..c0c23d9fd8fe 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -39,7 +39,6 @@ struct device;
struct iommu_domain;
struct iommu_domain_ops;
struct notifier_block;
-struct iommu_sva;
struct iommu_fault_event;
struct iommu_dma_cookie;
@@ -57,6 +56,14 @@ struct iommu_domain_geometry {
bool force_aperture; /* DMA only allowed in mappable range? */
};
+/**
+ * struct iommu_sva - handle to a device-mm bond
+ */
+struct iommu_sva {
+ struct device *dev;
+ refcount_t users;
+};
+
/* Domain feature flags */
#define __IOMMU_DOMAIN_PAGING (1U << 0) /* Support for iommu_map/unmap */
#define __IOMMU_DOMAIN_DMA_API (1U << 1) /* Domain for use in DMA-API
@@ -105,6 +112,7 @@ struct iommu_domain {
};
struct { /* IOMMU_DOMAIN_SVA */
struct mm_struct *mm;
+ struct iommu_sva bond;
};
};
};
@@ -638,13 +646,6 @@ struct iommu_fwspec {
/* ATS is supported */
#define IOMMU_FWSPEC_PCI_RC_ATS (1 << 0)
-/**
- * struct iommu_sva - handle to a device-mm bond
- */
-struct iommu_sva {
- struct device *dev;
-};
-
int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
const struct iommu_ops *ops);
void iommu_fwspec_free(struct device *dev);
@@ -685,11 +686,6 @@ int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
-struct iommu_sva *iommu_sva_bind_device(struct device *dev,
- struct mm_struct *mm);
-void iommu_sva_unbind_device(struct iommu_sva *handle);
-u32 iommu_sva_get_pasid(struct iommu_sva *handle);
-
int iommu_device_use_default_domain(struct device *dev);
void iommu_device_unuse_default_domain(struct device *dev);
@@ -703,6 +699,8 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
ioasid_t pasid);
void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
ioasid_t pasid);
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid);
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -1033,21 +1031,6 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
return -ENODEV;
}
-static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
-{
- return NULL;
-}
-
-static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
-{
-}
-
-static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
-{
- return IOMMU_PASID_INVALID;
-}
-
static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
{
return NULL;
@@ -1093,6 +1076,12 @@ static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
{
}
+
+static inline struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ return NULL;
+}
#endif /* CONFIG_IOMMU_API */
/**
@@ -1118,4 +1107,26 @@ void iommu_debugfs_setup(void);
static inline void iommu_debugfs_setup(void) {}
#endif
+#ifdef CONFIG_IOMMU_SVA
+struct iommu_sva *iommu_sva_bind_device(struct device *dev,
+ struct mm_struct *mm);
+void iommu_sva_unbind_device(struct iommu_sva *handle);
+u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+#else
+static inline struct iommu_sva *
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+{
+ return NULL;
+}
+
+static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
+{
+}
+
+static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+{
+ return IOMMU_PASID_INVALID;
+}
+#endif /* CONFIG_IOMMU_SVA */
+
#endif /* __LINUX_IOMMU_H */
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..1e3e2b395b1e 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -4,6 +4,7 @@
*/
#include <linux/mutex.h>
#include <linux/sched/mm.h>
+#include <linux/iommu.h>
#include "iommu-sva-lib.h"
@@ -69,3 +70,100 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
}
EXPORT_SYMBOL_GPL(iommu_sva_find);
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to mm_users
+ *
+ * Create a bond between device and address space, allowing the device to access
+ * the mm using the returned PASID. If a bond already exists between @dev and
+ * @mm, it is returned and an additional reference is taken. Caller must call
+ * iommu_sva_unbind_device() to release each reference.
+ *
+ * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
+ * initialize the required SVA features.
+ *
+ * On error, returns an ERR_PTR value.
+ */
+struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+{
+ struct iommu_domain *domain;
+ ioasid_t max_pasids;
+ int ret = -EINVAL;
+
+ /* Allocate mm->pasid if necessary. */
+ max_pasids = dev->iommu->max_pasids;
+ if (!max_pasids)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
+ if (ret)
+ return ERR_PTR(ret);
+
+ mutex_lock(&iommu_sva_lock);
+ /* Search for an existing domain. */
+ domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
+ if (domain) {
+ refcount_inc(&domain->bond.users);
+ goto out_success;
+ }
+
+ /* Allocate a new domain and set it on device pasid. */
+ domain = iommu_sva_domain_alloc(dev, mm);
+ if (!domain) {
+ ret = -ENOMEM;
+ goto out_unlock;
+ }
+
+ ret = iommu_attach_device_pasid(domain, dev, mm->pasid);
+ if (ret)
+ goto out_free_domain;
+ domain->bond.dev = dev;
+ refcount_set(&domain->bond.users, 1);
+
+out_success:
+ mutex_unlock(&iommu_sva_lock);
+ return &domain->bond;
+
+out_free_domain:
+ iommu_domain_free(domain);
+out_unlock:
+ mutex_unlock(&iommu_sva_lock);
+
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
+
+/**
+ * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
+ * @handle: the handle returned by iommu_sva_bind_device()
+ *
+ * Put reference to a bond between device and address space. The device should
+ * not be issuing any more transaction for this PASID. All outstanding page
+ * requests for this PASID must have been flushed to the IOMMU.
+ */
+void iommu_sva_unbind_device(struct iommu_sva *handle)
+{
+ struct device *dev = handle->dev;
+ struct iommu_domain *domain =
+ container_of(handle, struct iommu_domain, bond);
+ ioasid_t pasid = iommu_sva_get_pasid(handle);
+
+ mutex_lock(&iommu_sva_lock);
+ if (refcount_dec_and_test(&domain->bond.users)) {
+ iommu_detach_device_pasid(domain, dev, pasid);
+ iommu_domain_free(domain);
+ }
+ mutex_unlock(&iommu_sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
+
+u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+{
+ struct iommu_domain *domain =
+ container_of(handle, struct iommu_domain, bond);
+
+ return domain->mm->pasid;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8450f914cb2b..34d71418e7c7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2789,97 +2789,6 @@ bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat)
}
EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
-/**
- * iommu_sva_bind_device() - Bind a process address space to a device
- * @dev: the device
- * @mm: the mm to bind, caller must hold a reference to it
- *
- * Create a bond between device and address space, allowing the device to access
- * the mm using the returned PASID. If a bond already exists between @device and
- * @mm, it is returned and an additional reference is taken. Caller must call
- * iommu_sva_unbind_device() to release each reference.
- *
- * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
- * initialize the required SVA features.
- *
- * On error, returns an ERR_PTR value.
- */
-struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
-{
- struct iommu_group *group;
- struct iommu_sva *handle = ERR_PTR(-EINVAL);
- const struct iommu_ops *ops = dev_iommu_ops(dev);
-
- if (!ops->sva_bind)
- return ERR_PTR(-ENODEV);
-
- group = iommu_group_get(dev);
- if (!group)
- return ERR_PTR(-ENODEV);
-
- /* Ensure device count and domain don't change while we're binding */
- mutex_lock(&group->mutex);
-
- /*
- * To keep things simple, SVA currently doesn't support IOMMU groups
- * with more than one device. Existing SVA-capable systems are not
- * affected by the problems that required IOMMU groups (lack of ACS
- * isolation, device ID aliasing and other hardware issues).
- */
- if (iommu_group_device_count(group) != 1)
- goto out_unlock;
-
- handle = ops->sva_bind(dev, mm);
-
-out_unlock:
- mutex_unlock(&group->mutex);
- iommu_group_put(group);
-
- return handle;
-}
-EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
-
-/**
- * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
- * @handle: the handle returned by iommu_sva_bind_device()
- *
- * Put reference to a bond between device and address space. The device should
- * not be issuing any more transaction for this PASID. All outstanding page
- * requests for this PASID must have been flushed to the IOMMU.
- */
-void iommu_sva_unbind_device(struct iommu_sva *handle)
-{
- struct iommu_group *group;
- struct device *dev = handle->dev;
- const struct iommu_ops *ops = dev_iommu_ops(dev);
-
- if (!ops->sva_unbind)
- return;
-
- group = iommu_group_get(dev);
- if (!group)
- return;
-
- mutex_lock(&group->mutex);
- ops->sva_unbind(handle);
- mutex_unlock(&group->mutex);
-
- iommu_group_put(group);
-}
-EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
-
-u32 iommu_sva_get_pasid(struct iommu_sva *handle)
-{
- const struct iommu_ops *ops = dev_iommu_ops(handle->dev);
-
- if (!ops->sva_get_pasid)
- return IOMMU_PASID_INVALID;
-
- return ops->sva_get_pasid(handle);
-}
-EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
-
/*
* Changes the default domain of an iommu group that has *only* one device
*
@@ -3364,3 +3273,31 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
iommu_group_put(group);
}
+
+/*
+ * This is a variant of iommu_get_domain_for_dev(). It returns the existing
+ * domain attached to pasid of a device. It's only for internal use of the
+ * IOMMU subsystem. The caller must take care to avoid any possible
+ * use-after-free case.
+ */
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct iommu_domain *domain;
+ struct iommu_group *group;
+
+ if (!pasid_valid(pasid))
+ return NULL;
+
+ group = iommu_group_get(dev);
+ if (!group)
+ return NULL;
+ /*
+ * The xarray protects its internal state with RCU. Hence the domain
+ * obtained is either NULL or fully formed.
+ */
+ domain = xa_load(&group->pasid_array, pasid);
+ iommu_group_put(group);
+
+ return domain;
+}
--
2.25.1
Tweak the I/O page fault handling framework to route the page faults to
the domain and call the page fault handler retrieved from the domain.
This allows the I/O page fault handling framework to serve more usage
scenarios, as long as they have an IOMMU domain and install a page
fault handler in it. Some unused functions are also removed to avoid
dead code.
The page fault handling framework knows the {device, PASID} pair
reported by the iommu driver and uses iommu_get_domain_for_dev_pasid()
to retrieve the domain attached to that pair. We have a guarantee that
the SVA domain doesn't go away during IOPF handling, because unbind()
waits for pending faults with iopf_queue_flush_dev() before freeing the
domain. Hence, there's no need to synchronize the life cycle of the
iommu domains between unbind() and the interrupt threads.
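The routing described above can be sketched as a small userspace model
(not the kernel code). A group of faults is handed to the handler
installed in the domain, and once one fault fails the remaining faults
in the group are skipped. The type and function names below are
hypothetical stand-ins for the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

enum resp_code { RESP_SUCCESS, RESP_INVALID };

struct fault {
	unsigned long addr;
};

typedef enum resp_code (*iopf_handler_t)(struct fault *fault, void *data);

/* Stand-in for the fault handler fields of struct iommu_domain. */
struct domain {
	iopf_handler_t iopf_handler;
	void *fault_data;
};

/*
 * Handle a group of faults through the handler installed in the domain.
 * A missing domain or handler fails the whole group; after the first
 * error the handler is not called for the remaining faults.
 */
static enum resp_code handle_group(struct domain *domain,
				   struct fault *faults, size_t nr)
{
	enum resp_code status = RESP_SUCCESS;
	size_t i;

	if (!domain || !domain->iopf_handler)
		status = RESP_INVALID;

	for (i = 0; i < nr; i++) {
		if (status == RESP_SUCCESS)
			status = domain->iopf_handler(&faults[i],
						      domain->fault_data);
	}

	return status;
}

/* Example handler: counts calls and rejects a fault at address 0. */
static enum resp_code counting_handler(struct fault *fault, void *data)
{
	unsigned int *handled = data;

	(*handled)++;
	return fault->addr ? RESP_SUCCESS : RESP_INVALID;
}
```

Any component that owns a domain could install its own handler in the
same way; the SVA handler is just one such user.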
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
drivers/iommu/io-pgfault.c | 64 +++++---------------------------------
1 file changed, 7 insertions(+), 57 deletions(-)
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index aee9e033012f..4f24ec703479 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
return iommu_page_response(dev, &resp);
}
-static enum iommu_page_response_code
-iopf_handle_single(struct iopf_fault *iopf)
-{
- vm_fault_t ret;
- struct mm_struct *mm;
- struct vm_area_struct *vma;
- unsigned int access_flags = 0;
- unsigned int fault_flags = FAULT_FLAG_REMOTE;
- struct iommu_fault_page_request *prm = &iopf->fault.prm;
- enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
-
- if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
- return status;
-
- mm = iommu_sva_find(prm->pasid);
- if (IS_ERR_OR_NULL(mm))
- return status;
-
- mmap_read_lock(mm);
-
- vma = find_extend_vma(mm, prm->addr);
- if (!vma)
- /* Unmapped area */
- goto out_put_mm;
-
- if (prm->perm & IOMMU_FAULT_PERM_READ)
- access_flags |= VM_READ;
-
- if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
- access_flags |= VM_WRITE;
- fault_flags |= FAULT_FLAG_WRITE;
- }
-
- if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
- access_flags |= VM_EXEC;
- fault_flags |= FAULT_FLAG_INSTRUCTION;
- }
-
- if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
- fault_flags |= FAULT_FLAG_USER;
-
- if (access_flags & ~vma->vm_flags)
- /* Access fault */
- goto out_put_mm;
-
- ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
- status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
- IOMMU_PAGE_RESP_SUCCESS;
-
-out_put_mm:
- mmap_read_unlock(mm);
- mmput(mm);
-
- return status;
-}
-
static void iopf_handle_group(struct work_struct *work)
{
struct iopf_group *group;
+ struct iommu_domain *domain;
struct iopf_fault *iopf, *next;
enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
group = container_of(work, struct iopf_group, work);
+ domain = iommu_get_domain_for_dev_pasid(group->dev,
+ group->last_fault.fault.prm.pasid);
+ if (!domain || !domain->iopf_handler)
+ status = IOMMU_PAGE_RESP_INVALID;
list_for_each_entry_safe(iopf, next, &group->faults, list) {
/*
@@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct *work)
* faults in the group if there is an error.
*/
if (status == IOMMU_PAGE_RESP_SUCCESS)
- status = iopf_handle_single(iopf);
+ status = domain->iopf_handler(&iopf->fault,
+ domain->fault_data);
if (!(iopf->fault.prm.flags &
IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
--
2.25.1
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.
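The shape of this change, reduced to a userspace sketch under stated
assumptions: the generic allocation entry point switches on the domain
type and returns a domain whose ops table carries the SVA-specific
callbacks. The names (domain_alloc, DOMAIN_TYPE_SVA, the attached
counter) are hypothetical and the callbacks only do bookkeeping; the
real ones call intel_svm_bind_mm()/intel_svm_unbind_mm() under
pasid_mutex.

```c
#include <assert.h>
#include <stdlib.h>

enum domain_type { DOMAIN_TYPE_PAGING, DOMAIN_TYPE_SVA };

struct domain;

struct domain_ops {
	int (*set_dev_pasid)(struct domain *domain, int dev_id,
			     unsigned int pasid);
	void (*block_dev_pasid)(struct domain *domain, int dev_id,
				unsigned int pasid);
	void (*free)(struct domain *domain);
};

struct domain {
	enum domain_type type;
	const struct domain_ops *ops;
	unsigned int attached;	/* {device, PASID} pairs set on the domain */
};

static int sva_set_dev_pasid(struct domain *domain, int dev_id,
			     unsigned int pasid)
{
	(void)dev_id;
	if (pasid == 0)
		return -1;	/* this sketch never hands out PASID 0 */
	domain->attached++;
	return 0;
}

static void sva_block_dev_pasid(struct domain *domain, int dev_id,
				unsigned int pasid)
{
	(void)dev_id;
	(void)pasid;
	assert(domain->attached > 0);
	domain->attached--;
}

static void sva_domain_free(struct domain *domain)
{
	free(domain);
}

static const struct domain_ops sva_domain_ops = {
	.set_dev_pasid = sva_set_dev_pasid,
	.block_dev_pasid = sva_block_dev_pasid,
	.free = sva_domain_free,
};

/* Generic entry point: only the SVA type is modelled in this sketch. */
static struct domain *domain_alloc(enum domain_type type)
{
	struct domain *domain;

	if (type != DOMAIN_TYPE_SVA)
		return NULL;

	domain = calloc(1, sizeof(*domain));
	if (!domain)
		return NULL;
	domain->type = type;
	domain->ops = &sva_domain_ops;

	return domain;
}
```

Going through the normal allocation path means the core only ever
calls domain->ops, with no SVA-specific hook in the driver-level ops.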
Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/intel-iommu.h | 5 ++++
drivers/iommu/intel/iommu.c | 2 ++
drivers/iommu/intel/svm.c | 49 +++++++++++++++++++++++++++++++++++++
3 files changed, 56 insertions(+)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 31e3edc0fc7e..9007428a68f1 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -743,6 +743,7 @@ void intel_svm_unbind(struct iommu_sva *handle);
u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
+struct iommu_domain *intel_svm_domain_alloc(void);
struct intel_svm_dev {
struct list_head list;
@@ -768,6 +769,10 @@ struct intel_svm {
};
#else
static inline void intel_svm_check(struct intel_iommu *iommu) {}
+static inline struct iommu_domain *intel_svm_domain_alloc(void)
+{
+ return NULL;
+}
#endif
#ifdef CONFIG_INTEL_IOMMU_DEBUGFS
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 44016594831d..993a1ce509a8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4298,6 +4298,8 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
return domain;
case IOMMU_DOMAIN_IDENTITY:
return &si_domain->domain;
+ case IOMMU_DOMAIN_SVA:
+ return intel_svm_domain_alloc();
default:
return NULL;
}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d04880a291c3..7d4f9d173013 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -931,3 +931,52 @@ int intel_svm_page_response(struct device *dev,
mutex_unlock(&pasid_mutex);
return ret;
}
+
+static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct mm_struct *mm = domain->mm;
+ struct iommu_sva *sva;
+ int ret = 0;
+
+ mutex_lock(&pasid_mutex);
+ sva = intel_svm_bind_mm(iommu, dev, mm);
+ if (IS_ERR(sva))
+ ret = PTR_ERR(sva);
+ mutex_unlock(&pasid_mutex);
+
+ return ret;
+}
+
+static void intel_svm_block_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ mutex_lock(&pasid_mutex);
+ intel_svm_unbind_mm(dev, pasid);
+ mutex_unlock(&pasid_mutex);
+}
+
+static void intel_svm_domain_free(struct iommu_domain *domain)
+{
+ kfree(to_dmar_domain(domain));
+}
+
+static const struct iommu_domain_ops intel_svm_domain_ops = {
+ .set_dev_pasid = intel_svm_set_dev_pasid,
+ .block_dev_pasid = intel_svm_block_dev_pasid,
+ .free = intel_svm_domain_free,
+};
+
+struct iommu_domain *intel_svm_domain_alloc(void)
+{
+ struct dmar_domain *domain;
+
+ domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+ if (!domain)
+ return NULL;
+ domain->domain.ops = &intel_svm_domain_ops;
+
+ return &domain->domain;
+}
--
2.25.1
Rename iommu-sva-lib.[ch] to iommu-sva.[ch], as it now contains all the
code for the SVA implementation in the iommu core.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} | 6 +++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
drivers/iommu/intel/iommu.c | 2 +-
drivers/iommu/intel/svm.c | 2 +-
drivers/iommu/io-pgfault.c | 2 +-
drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} | 2 +-
drivers/iommu/iommu.c | 2 +-
drivers/iommu/Makefile | 2 +-
9 files changed, 11 insertions(+), 11 deletions(-)
rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (95%)
rename drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} (99%)
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva.h
similarity index 95%
rename from drivers/iommu/iommu-sva-lib.h
rename to drivers/iommu/iommu-sva.h
index 1b3ace4b5863..7215a761b962 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva.h
@@ -2,8 +2,8 @@
/*
* SVA library for IOMMU drivers
*/
-#ifndef _IOMMU_SVA_LIB_H
-#define _IOMMU_SVA_LIB_H
+#ifndef _IOMMU_SVA_H
+#define _IOMMU_SVA_H
#include <linux/ioasid.h>
#include <linux/mm_types.h>
@@ -72,4 +72,4 @@ iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
return IOMMU_PAGE_RESP_INVALID;
}
#endif /* CONFIG_IOMMU_SVA */
-#endif /* _IOMMU_SVA_LIB_H */
+#endif /* _IOMMU_SVA_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index e36c689f56c5..b33bc592ccfa 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -10,7 +10,7 @@
#include <linux/slab.h>
#include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"
#include "../../io-pgtable-arm.h"
struct arm_smmu_mmu_notifier {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8b9b78c7a67d..79e8991e9181 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -31,7 +31,7 @@
#include <linux/amba/bus.h>
#include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"
static bool disable_bypass = true;
module_param(disable_bypass, bool, 0444);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 37d68eda1889..d16ab6d1cc05 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -27,7 +27,7 @@
#include <linux/tboot.h>
#include "../irq_remapping.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
#include "pasid.h"
#include "cap_audit.h"
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index db55b06cafdf..58656a93b201 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -25,7 +25,7 @@
#include "pasid.h"
#include "perf.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
static irqreturn_t prq_event_thread(int irq, void *d);
static void intel_svm_drain_prq(struct device *dev, u32 pasid);
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 4f24ec703479..91b1c6bd01d4 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -11,7 +11,7 @@
#include <linux/slab.h>
#include <linux/workqueue.h>
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
/**
* struct iopf_queue - IO Page Fault queue
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva.c
similarity index 99%
rename from drivers/iommu/iommu-sva-lib.c
rename to drivers/iommu/iommu-sva.c
index dee8e2e42e06..1a4897a5697b 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva.c
@@ -6,7 +6,7 @@
#include <linux/sched/mm.h>
#include <linux/iommu.h>
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
static DEFINE_MUTEX(iommu_sva_lock);
static DECLARE_IOASID_SET(iommu_sva_pasid);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a0e3d8083943..c766c852b647 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,7 +29,7 @@
#include <trace/events/iommu.h>
#include <linux/sched/mm.h>
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 44475a9b3eea..c1763476162b 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,6 +27,6 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o io-pgfault.o
obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
obj-$(CONFIG_APPLE_DART) += apple-dart.o
--
2.25.1
Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device, and lifting it into the
per-device dev_iommu struct helps to avoid boilerplate code in the
various IOMMU drivers.
Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 2 ++
drivers/iommu/iommu.c | 20 ++++++++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 03fbb1b71536..d50afb2c9a09 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -364,6 +364,7 @@ struct iommu_fault_param {
* @fwspec: IOMMU fwspec data
* @iommu_dev: IOMMU device this device is linked to
* @priv: IOMMU Driver private data
+ * @max_pasids: number of PASIDs device can consume
*
* TODO: migrate other per device data pointers under iommu_dev_data, e.g.
* struct iommu_group *iommu_group;
@@ -375,6 +376,7 @@ struct dev_iommu {
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void *priv;
+ u32 max_pasids;
};
int iommu_device_register(struct iommu_device *iommu,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 847ad47a2dfd..6b731568efff 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -20,6 +20,7 @@
#include <linux/idr.h>
#include <linux/err.h>
#include <linux/pci.h>
+#include <linux/pci-ats.h>
#include <linux/bitops.h>
#include <linux/property.h>
#include <linux/fsl/mc.h>
@@ -218,6 +219,24 @@ static void dev_iommu_free(struct device *dev)
kfree(param);
}
+static u32 dev_iommu_get_max_pasids(struct device *dev)
+{
+ u32 max_pasids = 0, bits = 0;
+ int ret;
+
+ if (dev_is_pci(dev)) {
+ ret = pci_max_pasids(to_pci_dev(dev));
+ if (ret > 0)
+ max_pasids = ret;
+ } else {
+ ret = device_property_read_u32(dev, "pasid-num-bits", &bits);
+ if (!ret)
+ max_pasids = 1UL << bits;
+ }
+
+ return min_t(u32, max_pasids, dev->iommu->iommu_dev->max_pasids);
+}
+
static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
{
const struct iommu_ops *ops = dev->bus->iommu_ops;
@@ -243,6 +262,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
}
dev->iommu->iommu_dev = iommu_dev;
+ dev->iommu->max_pasids = dev_iommu_get_max_pasids(dev);
group = iommu_group_get_for_dev(dev);
if (IS_ERR(group)) {
--
2.25.1
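For readers following along, the selection logic in dev_iommu_get_max_pasids()
can be modelled in user space. This is only a sketch under stated assumptions:
struct fake_dev and all of its fields are hypothetical stand-ins for the PCI
PASID capability, the "pasid-num-bits" firmware property and the IOMMU's own
limit; none of them are kernel API. The model also refuses bit widths of 32 or
more, which the patch itself does not check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct fake_dev {
	bool is_pci;
	int pci_max_pasids;        /* models pci_max_pasids(); <= 0 means none */
	bool has_pasid_num_bits;   /* is the firmware property present? */
	uint32_t pasid_num_bits;   /* models the "pasid-num-bits" value */
	uint32_t iommu_max_pasids; /* limit of the IOMMU behind the device */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* The device's PASID count, clamped to what the IOMMU supports. */
static uint32_t fake_get_max_pasids(const struct fake_dev *dev)
{
	uint32_t max_pasids = 0;

	if (dev->is_pci) {
		if (dev->pci_max_pasids > 0)
			max_pasids = (uint32_t)dev->pci_max_pasids;
	} else if (dev->has_pasid_num_bits) {
		/* Assumption of this model: the width fits a 32-bit shift. */
		assert(dev->pasid_num_bits < 32);
		max_pasids = UINT32_C(1) << dev->pasid_num_bits;
	}

	return min_u32(max_pasids, dev->iommu_max_pasids);
}
```

A device without PASID support ends up with 0 here, which is the value the
SVA bind path later treats as "not supported".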
This adds some mechanisms around the iommu_domain so that the I/O page
fault handling framework can route a page fault to the domain and
call the fault handler from it.
Add pointers to the page fault handler and its private data in struct
iommu_domain. The fault handler will be called with the private data
as a parameter once a page fault is routed to the domain. Any kernel
component which owns an iommu domain can install a handler and its
private parameter so that the page fault can be further routed and
handled.
This also prepares the SVA implementation to be the first consumer of
the per-domain page fault handling model. The I/O page fault handler
for SVA is copied to the SVA file with mmget_not_zero() added before
mmap_read_lock().
Suggested-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
include/linux/iommu.h | 3 ++
drivers/iommu/iommu-sva-lib.h | 8 +++++
drivers/iommu/io-pgfault.c | 7 ++++
drivers/iommu/iommu-sva-lib.c | 60 +++++++++++++++++++++++++++++++++++
drivers/iommu/iommu.c | 4 +++
5 files changed, 82 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 17780537db6e..36c822a5b135 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -105,6 +105,9 @@ struct iommu_domain {
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+ enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
+ void *data);
+ void *fault_data;
union {
struct { /* IOMMU_DOMAIN_DMA */
iommu_fault_handler_t handler;
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 8909ea1094e3..1b3ace4b5863 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -26,6 +26,8 @@ int iopf_queue_flush_dev(struct device *dev);
struct iopf_queue *iopf_queue_alloc(const char *name);
void iopf_queue_free(struct iopf_queue *queue);
int iopf_queue_discard_partial(struct iopf_queue *queue);
+enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);
#else /* CONFIG_IOMMU_SVA */
static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
@@ -63,5 +65,11 @@ static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
{
return -ENODEV;
}
+
+static inline enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+{
+ return IOMMU_PAGE_RESP_INVALID;
+}
#endif /* CONFIG_IOMMU_SVA */
#endif /* _IOMMU_SVA_LIB_H */
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 1df8c1dcae77..aee9e033012f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -181,6 +181,13 @@ static void iopf_handle_group(struct work_struct *work)
* request completes, outstanding faults will have been dealt with by the time
* the PASID is freed.
*
+ * Any valid page fault will be eventually routed to an iommu domain and the
+ * page fault handler installed there will get called. The users of this
+ * handling framework should guarantee that the iommu domain could only be
+ * freed after the device has stopped generating page faults (or the iommu
+ * hardware has been set to block the page faults) and the pending page faults
+ * have been flushed.
+ *
* Return: 0 on success and <0 on error.
*/
int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 1e3e2b395b1e..dee8e2e42e06 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -167,3 +167,63 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
return domain->mm->pasid;
}
EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
+
+/*
+ * I/O page fault handler for SVA
+ */
+enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+{
+ vm_fault_t ret;
+ struct mm_struct *mm;
+ struct vm_area_struct *vma;
+ unsigned int access_flags = 0;
+ struct iommu_domain *domain = data;
+ unsigned int fault_flags = FAULT_FLAG_REMOTE;
+ struct iommu_fault_page_request *prm = &fault->prm;
+ enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
+
+ if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
+ return status;
+
+ mm = domain->mm;
+ if (IS_ERR_OR_NULL(mm) || !mmget_not_zero(mm))
+ return status;
+
+ mmap_read_lock(mm);
+
+ vma = find_extend_vma(mm, prm->addr);
+ if (!vma)
+ /* Unmapped area */
+ goto out_put_mm;
+
+ if (prm->perm & IOMMU_FAULT_PERM_READ)
+ access_flags |= VM_READ;
+
+ if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
+ access_flags |= VM_WRITE;
+ fault_flags |= FAULT_FLAG_WRITE;
+ }
+
+ if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
+ access_flags |= VM_EXEC;
+ fault_flags |= FAULT_FLAG_INSTRUCTION;
+ }
+
+ if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
+ fault_flags |= FAULT_FLAG_USER;
+
+ if (access_flags & ~vma->vm_flags)
+ /* Access fault */
+ goto out_put_mm;
+
+ ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
+ status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
+ IOMMU_PAGE_RESP_SUCCESS;
+
+out_put_mm:
+ mmap_read_unlock(mm);
+ mmput(mm);
+
+ return status;
+}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 34d71418e7c7..a0e3d8083943 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,6 +29,8 @@
#include <trace/events/iommu.h>
#include <linux/sched/mm.h>
+#include "iommu-sva-lib.h"
+
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
@@ -3199,6 +3201,8 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
domain->type = IOMMU_DOMAIN_SVA;
mmgrab(mm);
domain->mm = mm;
+ domain->iopf_handler = iommu_sva_handle_iopf;
+ domain->fault_data = domain;
return domain;
}
--
2.25.1
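The per-domain handler model described in this patch can be illustrated with
a small user-space sketch. Everything below is hypothetical (demo_domain,
demo_fault, demo_route_fault and the counting handler are made-up names);
it only mirrors the idea that a domain carries a handler plus an opaque
private pointer, and that a fault without an installed handler is answered
as invalid.

```c
#include <assert.h>
#include <stddef.h>

enum demo_resp { DEMO_RESP_SUCCESS, DEMO_RESP_INVALID };

struct demo_fault {
	unsigned long addr;
};

/* Hypothetical, trimmed-down domain holding only the two new fields. */
struct demo_domain {
	enum demo_resp (*iopf_handler)(struct demo_fault *fault, void *data);
	void *fault_data;
};

/* Route one fault to the handler installed in the domain, if any. */
static enum demo_resp demo_route_fault(struct demo_domain *domain,
				       struct demo_fault *fault)
{
	if (!domain || !domain->iopf_handler)
		return DEMO_RESP_INVALID;

	return domain->iopf_handler(fault, domain->fault_data);
}

/* Example owner-specific handler: counts faults via its private data. */
static enum demo_resp demo_counting_handler(struct demo_fault *fault,
					    void *data)
{
	unsigned int *count = data;

	assert(fault != NULL);
	(*count)++;
	return DEMO_RESP_SUCCESS;
}
```

The SVA case in the patch is the special case where the private data is the
domain itself, so the handler can reach domain->mm.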
Hi folks,
On 2022/6/21 22:43, Lu Baolu wrote:
> Hi folks,
>
> The former part of this series refactors the IOMMU SVA code by assigning
> an SVA type of iommu_domain to a shared virtual address and replacing
> sva_bind/unbind iommu ops with set/block_dev_pasid domain ops.
>
> The latter part changes the existing I/O page fault handling framework
> from only serving SVA to a generic one. Any driver or component could
> handle the I/O page faults for its domain in its own way by installing
> an I/O page fault handler.
>
> This series has been functionally tested on an x86 machine and compile
> tested for all architectures.
>
> This series is also available on github:
> [2]https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v9
>
> Please review and suggest.
Just a gentle ping on this series.
Do you have any further input? I am trying to see if we can merge this
series for v5.20. The drivers also depend on it to enable their kernel
DMA with PASID.
Sorry to disturb you.
Best regards,
baolu
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, June 21, 2022 10:44 PM
>
> Use this field to save the number of PASIDs that a device is able to
> consume. It is a generic attribute of a device and lifting it into the
> per-device dev_iommu struct could help to avoid the boilerplate code
> in various IOMMU drivers.
>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> include/linux/iommu.h | 2 ++
> drivers/iommu/iommu.c | 20 ++++++++++++++++++++
> 2 files changed, 22 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 03fbb1b71536..d50afb2c9a09 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -364,6 +364,7 @@ struct iommu_fault_param {
> * @fwspec: IOMMU fwspec data
> * @iommu_dev: IOMMU device this device is linked to
> * @priv: IOMMU Driver private data
> + * @max_pasids: number of PASIDs device can consume
... PASIDs *this* device can consume
Reviewed-by: Kevin Tian <[email protected]>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, June 21, 2022 10:44 PM
>
> Use this field to keep the number of supported PASIDs that an IOMMU
> hardware is able to support. This is a generic attribute of an IOMMU
> and lifting it into the per-IOMMU device structure makes it possible
> to allocate a PASID for device without calls into the IOMMU drivers.
> Any iommu driver that supports PASID related features should set this
> field before enabling them on the devices.
>
> In the Intel IOMMU driver, intel_iommu_sm is moved to
> CONFIG_INTEL_IOMMU
> enclave so that the pasid_supported() helper could be used in dmar.c
> without compilation errors.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, June 21, 2022 10:44 PM
>
> The sva iommu_domain represents a hardware pagetable that the IOMMU
> hardware could use for SVA translation. This adds some infrastructure
> to support SVA domain in the iommu common layer. It includes:
>
> - Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA
> domain
> type. The IOMMU drivers that support SVA should provide the sva
> domain specific iommu_domain_ops.
> - Add a helper to allocate an SVA domain. The iommu_domain_free()
> is still used to free an SVA domain.
> - Add helpers to attach an SVA domain to a device and the reverse
> operation.
>
> Some buses, like PCI, route packets without considering the PASID value.
> Thus a DMA target address with PASID might be treated as P2P if the
> address falls into the MMIO BAR of other devices in the group. To make
> things simple, the attach/detach interfaces only apply to devices
> belonging to the singleton groups, and the singleton is immutable in
> fabric i.e. not affected by hotplug.
>
> The iommu_attach/detach_device_pasid() can be used for other purposes,
> such as kernel DMA with pasid, mediation device, etc.
I'd split this into two patches. One for adding iommu_attach/
detach_device_pasid() and set/block_dev_pasid ops, and the
other for adding SVA.
> struct iommu_domain {
> unsigned type;
> const struct iommu_domain_ops *ops;
> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> - iommu_fault_handler_t handler;
> - void *handler_token;
> struct iommu_domain_geometry geometry;
> struct iommu_dma_cookie *iova_cookie;
> + union {
> + struct { /* IOMMU_DOMAIN_DMA */
> + iommu_fault_handler_t handler;
> + void *handler_token;
> + };
Why is this DMA-domain specific? What about unmanaged
domains? Unrecoverable faults can happen on any domain type,
including SVA. Hence I think the above should be domain-type
agnostic.
> + struct { /* IOMMU_DOMAIN_SVA */
> + struct mm_struct *mm;
> + };
> + };
> };
>
> +
> +struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
> + struct mm_struct *mm)
> +{
> + const struct iommu_ops *ops = dev_iommu_ops(dev);
> + struct iommu_domain *domain;
> +
> + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
> + if (!domain)
> + return NULL;
> +
> + domain->type = IOMMU_DOMAIN_SVA;
It's a bit weird that the type is already specified when calling
ops->domain_alloc, yet it is still left to the caller to set the
type. But this is not caused by this series. It could be cleaned
up separately.
> +
> + mutex_lock(&group->mutex);
> + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
> GFP_KERNEL);
> + if (curr)
> + goto out_unlock;
Need to check xa_is_err(curr).
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, June 21, 2022 10:44 PM
>
> Add support for SVA domain allocation and provide an SVA-specific
> iommu_domain_ops.
>
> Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, June 21, 2022 10:44 PM
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
> mm_struct *mm)
> +{
> + struct iommu_domain *domain;
> + ioasid_t max_pasids;
> + int ret = -EINVAL;
> +
> + /* Allocate mm->pasid if necessary. */
This comment belongs with iommu_sva_alloc_pasid().
> + max_pasids = dev->iommu->max_pasids;
> + if (!max_pasids)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
> + if (ret)
> + return ERR_PTR(ret);
> +
...
> +void iommu_sva_unbind_device(struct iommu_sva *handle)
> +{
> + struct device *dev = handle->dev;
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> + ioasid_t pasid = iommu_sva_get_pasid(handle);
> +
> + mutex_lock(&iommu_sva_lock);
> + if (refcount_dec_and_test(&domain->bond.users)) {
> + iommu_detach_device_pasid(domain, dev, pasid);
> + iommu_domain_free(domain);
> + }
> + mutex_unlock(&iommu_sva_lock);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> +
> +u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> +{
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> +
> + return domain->mm->pasid;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
Looks like this is only used by unbind_device. Just open-code it.
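As an illustration of what open-coding would look like, here is a user-space
sketch. The sva_demo_* types and the sva_demo_container_of() macro are
hypothetical stand-ins for iommu_domain, iommu_sva, mm_struct and the
kernel's container_of(); whether the exported helper can really be dropped
depends on other callers, which this excerpt does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space equivalent of container_of(), for this demo only. */
#define sva_demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sva_demo_mm {
	uint32_t pasid;
};

struct sva_demo_handle {
	int users;
};

struct sva_demo_domain {
	struct sva_demo_mm *mm;
	struct sva_demo_handle bond;	/* handle embedded in the domain */
};

/* Returns the PASID that unbind would detach, without a helper call. */
static uint32_t sva_demo_unbind_pasid(struct sva_demo_handle *handle)
{
	struct sva_demo_domain *domain =
		sva_demo_container_of(handle, struct sva_demo_domain, bond);

	assert(domain->mm != NULL);
	return domain->mm->pasid;	/* open-coded PASID lookup */
}
```

Since unbind already recovers the domain from the handle, reading
domain->mm->pasid directly avoids a second container_of() round trip.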
On 2022/6/21 10:43 PM, Lu Baolu wrote:
> Add support for SVA domain allocation and provide an SVA-specific
> iommu_domain_ops.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Have tested the series on aarch64.
Thanks
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++
> .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 69 +++++++++++++++++++
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +
> 3 files changed, 78 insertions(+)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index d2ba86470c42..96399dd3a67a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -758,6 +758,7 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
> void arm_smmu_sva_unbind(struct iommu_sva *handle);
> u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
> void arm_smmu_sva_notifier_synchronize(void);
> +struct iommu_domain *arm_smmu_sva_domain_alloc(void);
> #else /* CONFIG_ARM_SMMU_V3_SVA */
> static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
> {
> @@ -803,5 +804,10 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
> }
>
> static inline void arm_smmu_sva_notifier_synchronize(void) {}
> +
> +static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
> +{
> + return NULL;
> +}
> #endif /* CONFIG_ARM_SMMU_V3_SVA */
> #endif /* _ARM_SMMU_V3_H */
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index f155d406c5d5..fc4555dac5b4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -549,3 +549,72 @@ void arm_smmu_sva_notifier_synchronize(void)
> */
> mmu_notifier_synchronize();
> }
> +
> +static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t id)
> +{
> + int ret = 0;
> + struct mm_struct *mm;
> + struct iommu_sva *handle;
> +
> + if (domain->type != IOMMU_DOMAIN_SVA)
> + return -EINVAL;
> +
> + mm = domain->mm;
> + if (WARN_ON(!mm))
> + return -ENODEV;
> +
> + mutex_lock(&sva_lock);
> + handle = __arm_smmu_sva_bind(dev, mm);
> + if (IS_ERR(handle))
> + ret = PTR_ERR(handle);
> + mutex_unlock(&sva_lock);
> +
> + return ret;
> +}
> +
> +static void arm_smmu_sva_block_dev_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t id)
> +{
> + struct mm_struct *mm = domain->mm;
> + struct arm_smmu_bond *bond = NULL, *t;
> + struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +
> + mutex_lock(&sva_lock);
> + list_for_each_entry(t, &master->bonds, list) {
> + if (t->mm == mm) {
> + bond = t;
> + break;
> + }
> + }
> +
> + if (!WARN_ON(!bond) && refcount_dec_and_test(&bond->refs)) {
> + list_del(&bond->list);
> + arm_smmu_mmu_notifier_put(bond->smmu_mn);
> + kfree(bond);
> + }
> + mutex_unlock(&sva_lock);
> +}
> +
> +static void arm_smmu_sva_domain_free(struct iommu_domain *domain)
> +{
> + kfree(domain);
> +}
> +
> +static const struct iommu_domain_ops arm_smmu_sva_domain_ops = {
> + .set_dev_pasid = arm_smmu_sva_set_dev_pasid,
> + .block_dev_pasid = arm_smmu_sva_block_dev_pasid,
> + .free = arm_smmu_sva_domain_free,
> +};
> +
> +struct iommu_domain *arm_smmu_sva_domain_alloc(void)
> +{
> + struct iommu_domain *domain;
> +
> + domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> + if (!domain)
> + return NULL;
> + domain->ops = &arm_smmu_sva_domain_ops;
> +
> + return domain;
> +}
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index ae8ec8df47c1..a30b252e2f95 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1999,6 +1999,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
> {
> struct arm_smmu_domain *smmu_domain;
>
> + if (type == IOMMU_DOMAIN_SVA)
> + return arm_smmu_sva_domain_alloc();
> +
> if (type != IOMMU_DOMAIN_UNMANAGED &&
> type != IOMMU_DOMAIN_DMA &&
> type != IOMMU_DOMAIN_DMA_FQ &&
Hi,
On 2022/6/21 22:43, Lu Baolu wrote:
> Tweak the I/O page fault handling framework to route the page faults to
> the domain and call the page fault handler retrieved from the domain.
> This makes the I/O page fault handling framework possible to serve more
> usage scenarios as long as they have an IOMMU domain and install a page
> fault handler in it. Some unused functions are also removed to avoid
> dead code.
>
> The iommu_get_domain_for_dev_pasid() which retrieves attached domain
> for a {device, PASID} pair is used. It will be used by the page fault
> handling framework which knows {device, PASID} reported from the iommu
> driver. We have a guarantee that the SVA domain doesn't go away during
> IOPF handling, because unbind() waits for pending faults with
> iopf_queue_flush_dev() before freeing the domain. Hence, there's no need
> to synchronize life cycle of the iommu domains between the unbind() and
> the interrupt threads.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> ---
> drivers/iommu/io-pgfault.c | 64 +++++---------------------------------
> 1 file changed, 7 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index aee9e033012f..4f24ec703479 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
> return iommu_page_response(dev, &resp);
> }
>
> -static enum iommu_page_response_code
> -iopf_handle_single(struct iopf_fault *iopf)
> -{
> - vm_fault_t ret;
> - struct mm_struct *mm;
> - struct vm_area_struct *vma;
> - unsigned int access_flags = 0;
> - unsigned int fault_flags = FAULT_FLAG_REMOTE;
> - struct iommu_fault_page_request *prm = &iopf->fault.prm;
> - enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
> -
> - if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
> - return status;
> -
> - mm = iommu_sva_find(prm->pasid);
> - if (IS_ERR_OR_NULL(mm))
> - return status;
> -
> - mmap_read_lock(mm);
> -
> - vma = find_extend_vma(mm, prm->addr);
> - if (!vma)
> - /* Unmapped area */
> - goto out_put_mm;
> -
> - if (prm->perm & IOMMU_FAULT_PERM_READ)
> - access_flags |= VM_READ;
> -
> - if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
> - access_flags |= VM_WRITE;
> - fault_flags |= FAULT_FLAG_WRITE;
> - }
> -
> - if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
> - access_flags |= VM_EXEC;
> - fault_flags |= FAULT_FLAG_INSTRUCTION;
> - }
> -
> - if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
> - fault_flags |= FAULT_FLAG_USER;
> -
> - if (access_flags & ~vma->vm_flags)
> - /* Access fault */
> - goto out_put_mm;
> -
> - ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
> - status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
> - IOMMU_PAGE_RESP_SUCCESS;
> -
> -out_put_mm:
> - mmap_read_unlock(mm);
> - mmput(mm);
> -
> - return status;
> -}
> -
Once iopf_handle_single() is removed, the name iopf_handle_group()
looks a little weird and confusing. Does this group mean the iommu
group (domain)? After taking a few minutes to look into the code, I
see that it means a batch / list / queue of iopfs, and that
iopf_handle_group() becomes a generic iopf_handler().
Does it make sense to revise the names of iopf_handle_group(),
iopf_complete_group() and iopf_group in this patch set?
Thanks,
Ethan
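To make the "group" meaning concrete for other readers: it is a batch of
page requests that ends with the one carrying the LAST_PAGE flag, and the
whole batch gets a single response. A minimal user-space sketch of that
behaviour follows. All grp_* names are hypothetical and the last_page field
merely models IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; this is not the kernel
implementation.

```c
#include <assert.h>
#include <stddef.h>

enum grp_resp { GRP_RESP_SUCCESS, GRP_RESP_INVALID };

struct grp_fault {
	unsigned long addr;
	int last_page;	/* models the LAST_PAGE flag of a page request */
};

typedef enum grp_resp (*grp_handler_t)(const struct grp_fault *fault,
				       void *data);

/*
 * Handle one batch of faults. Every fault is visited, but the handler is
 * no longer called once one fault has failed. The return value is the
 * single status reported for the whole batch.
 */
static enum grp_resp grp_handle(const struct grp_fault *faults, size_t n,
				grp_handler_t handler, void *data)
{
	enum grp_resp status = handler ? GRP_RESP_SUCCESS : GRP_RESP_INVALID;
	size_t i;

	/* A batch is expected to end with its LAST_PAGE request. */
	assert(n > 0 && faults[n - 1].last_page);

	for (i = 0; i < n; i++) {
		if (status == GRP_RESP_SUCCESS)
			status = handler(&faults[i], data);
	}

	return status;
}

/* Hypothetical handler: counts its calls and fails for address 0. */
static enum grp_resp grp_fail_on_zero(const struct grp_fault *fault,
				      void *data)
{
	(*(unsigned int *)data)++;
	return fault->addr ? GRP_RESP_SUCCESS : GRP_RESP_INVALID;
}
```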
> static void iopf_handle_group(struct work_struct *work)
> {
> struct iopf_group *group;
> + struct iommu_domain *domain;
> struct iopf_fault *iopf, *next;
> enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
>
> group = container_of(work, struct iopf_group, work);
> + domain = iommu_get_domain_for_dev_pasid(group->dev,
> + group->last_fault.fault.prm.pasid);
> + if (!domain || !domain->iopf_handler)
> + status = IOMMU_PAGE_RESP_INVALID;
>
> list_for_each_entry_safe(iopf, next, &group->faults, list) {
> /*
> @@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct *work)
> * faults in the group if there is an error.
> */
> if (status == IOMMU_PAGE_RESP_SUCCESS)
> - status = iopf_handle_single(iopf);
> + status = domain->iopf_handler(&iopf->fault,
> + domain->fault_data);
>
> if (!(iopf->fault.prm.flags &
> IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
On 2022/6/27 19:50, Zhangfei Gao wrote:
>
> On 2022/6/21 10:43 PM, Lu Baolu wrote:
>> Add support for SVA domain allocation and provide an SVA-specific
>> iommu_domain_ops.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>
> Tested-by: Zhangfei Gao <[email protected]>
> Have tested the series on aarch64.
Thank you very much! Your help is much appreciated.
Best regards,
baolu
Hi Kevin,
On 2022/6/27 16:29, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Tuesday, June 21, 2022 10:44 PM
>>
>> The sva iommu_domain represents a hardware pagetable that the IOMMU
>> hardware could use for SVA translation. This adds some infrastructure
>> to support SVA domain in the iommu common layer. It includes:
>>
>> - Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA
>> domain
>> type. The IOMMU drivers that support SVA should provide the sva
>> domain specific iommu_domain_ops.
>> - Add a helper to allocate an SVA domain. The iommu_domain_free()
>> is still used to free an SVA domain.
>> - Add helpers to attach an SVA domain to a device and the reverse
>> operation.
>>
>> Some buses, like PCI, route packets without considering the PASID value.
>> Thus a DMA target address with PASID might be treated as P2P if the
>> address falls into the MMIO BAR of other devices in the group. To make
>> things simple, the attach/detach interfaces only apply to devices
>> belonging to the singleton groups, and the singleton is immutable in
>> fabric i.e. not affected by hotplug.
>>
>> The iommu_attach/detach_device_pasid() can be used for other purposes,
>> such as kernel DMA with pasid, mediation device, etc.
>
> I'd split this into two patches. One for adding iommu_attach/
> detach_device_pasid() and set/block_dev_pasid ops, and the
> other for adding SVA.
Yes. Makes sense.
>
>> struct iommu_domain {
>> unsigned type;
>> const struct iommu_domain_ops *ops;
>> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
>> - iommu_fault_handler_t handler;
>> - void *handler_token;
>> struct iommu_domain_geometry geometry;
>> struct iommu_dma_cookie *iova_cookie;
>> + union {
>> + struct { /* IOMMU_DOMAIN_DMA */
>> + iommu_fault_handler_t handler;
>> + void *handler_token;
>> + };
>
> why is it DMA domain specific? What about unmanaged
> domain? Unrecoverable fault can happen on any type
> including SVA. Hence I think above should be domain type
> agnostic.
The report_iommu_fault() should be replaced by the new
iommu_report_device_fault(). Jean has already started this work.
https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/
Currently this is only for DMA domains, hence Robin suggested making it
mutually exclusive with the SVA domain fields.
https://lore.kernel.org/linux-iommu/[email protected]/
>
>> + struct { /* IOMMU_DOMAIN_SVA */
>> + struct mm_struct *mm;
>> + };
>> + };
>> };
>>
>
>
>
>> +
>> +struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
>> + struct mm_struct *mm)
>> +{
>> + const struct iommu_ops *ops = dev_iommu_ops(dev);
>> + struct iommu_domain *domain;
>> +
>> + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
>> + if (!domain)
>> + return NULL;
>> +
>> + domain->type = IOMMU_DOMAIN_SVA;
>
> It's a bit weird that the type has been specified when calling
> ops->domain_alloc while it still leaves to the caller to set the
> type. But this is not caused by this series. could be cleaned
> up separately.
Yes. Robin has patches to refactor the domain allocation interface.
Let's wait and see what it looks like in the end.
>
>> +
>> + mutex_lock(&group->mutex);
>> + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
>> GFP_KERNEL);
>> + if (curr)
>> + goto out_unlock;
>
> Need check xa_is_err(old).
Both
(1) the old entry being a valid pointer, and
(2) xa_is_err(curr)
are failure cases. Hence, checking for "curr == NULL" as the success case
is all we need. Did I miss anything?
Best regards,
baolu
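The point above can be shown with a user-space model. This is a sketch only:
demo_slot, demo_cmpxchg() and demo_attach() are hypothetical, and the error
encoding below is a simplified ERR_PTR-style sentinel, not the xarray's real
internal-entry encoding. What it shares with xa_cmpxchg() is the property
under discussion: an error comes back as a non-NULL entry, so a NULL return
is the only success case. Telling the two failure cases apart only matters
for the errno that gets reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno as a non-NULL sentinel pointer. */
static void *demo_mk_err(int err)
{
	return (void *)(intptr_t)err;
}

static bool demo_is_err(const void *entry)
{
	return (uintptr_t)entry >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

/* One slot of a hypothetical PASID array. */
struct demo_slot {
	void *entry;
	bool fail_alloc;	/* simulate an allocation failure */
};

/* Returns the previous entry, or an error entry if the store failed. */
static void *demo_cmpxchg(struct demo_slot *slot, void *old, void *new)
{
	void *curr;

	if (slot->fail_alloc)
		return demo_mk_err(-ENOMEM);

	curr = slot->entry;
	if (curr == old)
		slot->entry = new;
	return curr;
}

/* Attach succeeds only when the slot was empty and the store worked. */
static int demo_attach(struct demo_slot *slot, void *domain)
{
	void *curr;

	assert(domain != NULL);
	curr = demo_cmpxchg(slot, NULL, domain);
	if (!curr)
		return 0;

	/* Both remaining cases are failures; they differ in the errno. */
	return demo_is_err(curr) ? (int)(intptr_t)curr : -EBUSY;
}
```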
I tested the patch and it works as expected.
Tested-by: Tony Zhu <[email protected]>
Tony (Zhu, Xinzhan)
-----Original Message-----
From: Baolu Lu <[email protected]>
Sent: Tuesday, June 28, 2022 2:13 PM
To: Zhangfei Gao <[email protected]>; Joerg Roedel <[email protected]>; Jason Gunthorpe <[email protected]>; Christoph Hellwig <[email protected]>; Tian, Kevin <[email protected]>; Raj, Ashok <[email protected]>; Will Deacon <[email protected]>; Robin Murphy <[email protected]>; Jean-Philippe Brucker <[email protected]>; Jiang, Dave <[email protected]>; Vinod Koul <[email protected]>; Zhu, Tony <[email protected]>
Cc: [email protected]; Jean-Philippe Brucker <[email protected]>; [email protected]; [email protected]; [email protected]; Pan, Jacob jun <[email protected]>
Subject: Re: [PATCH v9 06/11] arm-smmu-v3/sva: Add SVA domain support
On 2022/6/27 19:50, Zhangfei Gao wrote:
>
>
> On 2022/6/21 10:43 PM, Lu Baolu wrote:
>> Add support for SVA domain allocation and provide an SVA-specific
>> iommu_domain_ops.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>
> Tested-by: Zhangfei Gao <[email protected]> Have tested the
> series on aarch64.
Tony has been helping to validate this series on Intel's platform.
Tony, can I add your Tested-by to this series as well?
Best regards,
baolu
On 2022/6/27 19:50, Zhangfei Gao wrote:
>
>
> On 2022/6/21 10:43 PM, Lu Baolu wrote:
>> Add support for SVA domain allocation and provide an SVA-specific
>> iommu_domain_ops.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>
> Tested-by: Zhangfei Gao <[email protected]>
> Have tested the series on aarch64.
Tony has been helping to validate this series on Intel's platform.
Tony, can I add your Tested-by to this series as well?
Best regards,
baolu
On 2022/6/27 18:14, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Tuesday, June 21, 2022 10:44 PM
>> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
>> mm_struct *mm)
>> +{
>> + struct iommu_domain *domain;
>> + ioasid_t max_pasids;
>> + int ret = -EINVAL;
>> +
>> + /* Allocate mm->pasid if necessary. */
>
> this comment is for iommu_sva_alloc_pasid()
Updated.
>
>> + max_pasids = dev->iommu->max_pasids;
>> + if (!max_pasids)
>> + return ERR_PTR(-EOPNOTSUPP);
>> +
>> + ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
>> + if (ret)
>> + return ERR_PTR(ret);
>> +
>
>
> ...
>> +void iommu_sva_unbind_device(struct iommu_sva *handle)
>> +{
>> + struct device *dev = handle->dev;
>> + struct iommu_domain *domain =
>> + container_of(handle, struct iommu_domain, bond);
>> + ioasid_t pasid = iommu_sva_get_pasid(handle);
>> +
>> + mutex_lock(&iommu_sva_lock);
>> + if (refcount_dec_and_test(&domain->bond.users)) {
>> + iommu_detach_device_pasid(domain, dev, pasid);
>> + iommu_domain_free(domain);
>> + }
>> + mutex_unlock(&iommu_sva_lock);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
>> +
>> +u32 iommu_sva_get_pasid(struct iommu_sva *handle)
>> +{
>> + struct iommu_domain *domain =
>> + container_of(handle, struct iommu_domain, bond);
>> +
>> + return domain->mm->pasid;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
>
> Looks this is only used by unbind_device. Just open code it.
It's part of current IOMMU/SVA interfaces for the device drivers, and
has been used in various drivers.
$ git grep iommu_sva_get_pasid
drivers/dma/idxd/cdev.c: pasid = iommu_sva_get_pasid(sva);
drivers/iommu/iommu-sva-lib.c: ioasid_t pasid =
iommu_sva_get_pasid(handle);
drivers/iommu/iommu-sva-lib.c:u32 iommu_sva_get_pasid(struct iommu_sva
*handle)
drivers/iommu/iommu-sva-lib.c:EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
drivers/misc/uacce/uacce.c: pasid = iommu_sva_get_pasid(handle);
include/linux/iommu.h:u32 iommu_sva_get_pasid(struct iommu_sva *handle);
include/linux/iommu.h:static inline u32 iommu_sva_get_pasid(struct
iommu_sva *handle)
Or did I miss anything?
Best regards,
baolu
Hi Ethan,
On 2022/6/27 21:03, Ethan Zhao wrote:
> Hi,
>
> On 2022/6/21 22:43, Lu Baolu wrote:
>> Tweak the I/O page fault handling framework to route the page faults to
>> the domain and call the page fault handler retrieved from the domain.
>> This makes the I/O page fault handling framework possible to serve more
>> usage scenarios as long as they have an IOMMU domain and install a page
>> fault handler in it. Some unused functions are also removed to avoid
>> dead code.
>>
>> The iommu_get_domain_for_dev_pasid() which retrieves attached domain
>> for a {device, PASID} pair is used. It will be used by the page fault
>> handling framework which knows {device, PASID} reported from the iommu
>> driver. We have a guarantee that the SVA domain doesn't go away during
>> IOPF handling, because unbind() waits for pending faults with
>> iopf_queue_flush_dev() before freeing the domain. Hence, there's no need
>> to synchronize life cycle of the iommu domains between the unbind() and
>> the interrupt threads.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>> ---
>> drivers/iommu/io-pgfault.c | 64 +++++---------------------------------
>> 1 file changed, 7 insertions(+), 57 deletions(-)
>>
>> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
>> index aee9e033012f..4f24ec703479 100644
>> --- a/drivers/iommu/io-pgfault.c
>> +++ b/drivers/iommu/io-pgfault.c
>> @@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev,
>> struct iopf_fault *iopf,
>> return iommu_page_response(dev, &resp);
>> }
>> -static enum iommu_page_response_code
>> -iopf_handle_single(struct iopf_fault *iopf)
>> -{
>> - vm_fault_t ret;
>> - struct mm_struct *mm;
>> - struct vm_area_struct *vma;
>> - unsigned int access_flags = 0;
>> - unsigned int fault_flags = FAULT_FLAG_REMOTE;
>> - struct iommu_fault_page_request *prm = &iopf->fault.prm;
>> - enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
>> -
>> - if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
>> - return status;
>> -
>> - mm = iommu_sva_find(prm->pasid);
>> - if (IS_ERR_OR_NULL(mm))
>> - return status;
>> -
>> - mmap_read_lock(mm);
>> -
>> - vma = find_extend_vma(mm, prm->addr);
>> - if (!vma)
>> - /* Unmapped area */
>> - goto out_put_mm;
>> -
>> - if (prm->perm & IOMMU_FAULT_PERM_READ)
>> - access_flags |= VM_READ;
>> -
>> - if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
>> - access_flags |= VM_WRITE;
>> - fault_flags |= FAULT_FLAG_WRITE;
>> - }
>> -
>> - if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
>> - access_flags |= VM_EXEC;
>> - fault_flags |= FAULT_FLAG_INSTRUCTION;
>> - }
>> -
>> - if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
>> - fault_flags |= FAULT_FLAG_USER;
>> -
>> - if (access_flags & ~vma->vm_flags)
>> - /* Access fault */
>> - goto out_put_mm;
>> -
>> - ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
>> - status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
>> - IOMMU_PAGE_RESP_SUCCESS;
>> -
>> -out_put_mm:
>> - mmap_read_unlock(mm);
>> - mmput(mm);
>> -
>> - return status;
>> -}
>> -
>
> Once the iopf_handle_single() is removed, the name of
> iopf_handle_group() looks a little weired
>
> and confused, does this group mean the iommu group (domain) ? while I
> take some minutes to
No. This is not the iommu group. It's the page request group defined by the
PCI SIG spec. Multiple page requests can be put in one group with the
same group id, and all page requests in a group can be responded to in
one shot.
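To make the grouping concrete, here is a rough userspace sketch. The struct
and function names (model_page_req, model_handle_prg and so on) are
hypothetical and far simpler than the real io-pgfault.c code. The only ideas
taken from the code are that the requests of a group are handled in order,
that handling stops at the first failure, and that one status is reported
for the whole group.

```c
#include <assert.h>
#include <stddef.h>

enum model_resp { MODEL_RESP_SUCCESS, MODEL_RESP_INVALID };

/* Hypothetical page request: one faulting address within a group. */
struct model_page_req {
	unsigned int grpid;	/* page request group index */
	unsigned long addr;	/* faulting address */
	int last;		/* non-zero for the last request of the group */
};

typedef enum model_resp (*model_handler_t)(const struct model_page_req *req,
					    void *data);

/*
 * Handle every request of one group and return the single status that would
 * be reported back to the device. Once a request fails, the remaining ones
 * are skipped, so the group response carries the first error.
 */
static enum model_resp model_handle_prg(const struct model_page_req *reqs,
					size_t nr, model_handler_t handler,
					void *data)
{
	enum model_resp status = MODEL_RESP_SUCCESS;
	size_t i;

	assert(reqs || nr == 0);

	/* No handler installed: the whole group is answered as invalid. */
	if (!handler)
		return MODEL_RESP_INVALID;

	for (i = 0; i < nr; i++) {
		if (status == MODEL_RESP_SUCCESS)
			status = handler(&reqs[i], data);
		if (reqs[i].last)
			break;
	}
	return status;
}

/* Example handler: treats address 0 as unmapped and counts invocations. */
static enum model_resp model_count_handler(const struct model_page_req *req,
					   void *data)
{
	int *calls = data;

	(*calls)++;
	return req->addr ? MODEL_RESP_SUCCESS : MODEL_RESP_INVALID;
}
```

With a group of three requests where the second one fails, the handler in
this model runs twice and the group as a whole is answered as invalid.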
Best regards,
baolu
>
> look into the code, oh, means a batch / list / queue of iopfs , and
> iopf_handle_group() becomes a
>
> generic iopf_handler() .
>
> Doe it make sense to revise the names of iopf_handle_group(),
> iopf_complete_group, iopf_group in
>
> this patch set ?
>
>
> Thanks,
>
> Ethan
>
>> static void iopf_handle_group(struct work_struct *work)
>> {
>> struct iopf_group *group;
>> + struct iommu_domain *domain;
>> struct iopf_fault *iopf, *next;
>> enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
>> group = container_of(work, struct iopf_group, work);
>> + domain = iommu_get_domain_for_dev_pasid(group->dev,
>> + group->last_fault.fault.prm.pasid);
>> + if (!domain || !domain->iopf_handler)
>> + status = IOMMU_PAGE_RESP_INVALID;
>> list_for_each_entry_safe(iopf, next, &group->faults, list) {
>> /*
>> @@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct
>> *work)
>> * faults in the group if there is an error.
>> */
>> if (status == IOMMU_PAGE_RESP_SUCCESS)
>> - status = iopf_handle_single(iopf);
>> + status = domain->iopf_handler(&iopf->fault,
>> + domain->fault_data);
>> if (!(iopf->fault.prm.flags &
>> IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, June 21, 2022 10:44 PM
> +/*
> + * I/O page fault handler for SVA
> + */
> +enum iommu_page_response_code
> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
> +{
> + vm_fault_t ret;
> + struct mm_struct *mm;
> + struct vm_area_struct *vma;
> + unsigned int access_flags = 0;
> + struct iommu_domain *domain = data;
> + unsigned int fault_flags = FAULT_FLAG_REMOTE;
> + struct iommu_fault_page_request *prm = &fault->prm;
> + enum iommu_page_response_code status =
> IOMMU_PAGE_RESP_INVALID;
> +
> + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
> + return status;
> +
> + mm = domain->mm;
What about directly passing domain->mm in as the fault data?
The entire logic here is only about mm instead of domain.
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, June 21, 2022 10:44 PM
>
> Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as it contains all code
> for SVA implementation in iommu core.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, June 21, 2022 10:44 PM
>
> Tweak the I/O page fault handling framework to route the page faults to
> the domain and call the page fault handler retrieved from the domain.
> This makes the I/O page fault handling framework possible to serve more
> usage scenarios as long as they have an IOMMU domain and install a page
> fault handler in it. Some unused functions are also removed to avoid
> dead code.
>
> The iommu_get_domain_for_dev_pasid() which retrieves attached domain
> for a {device, PASID} pair is used. It will be used by the page fault
> handling framework which knows {device, PASID} reported from the iommu
> driver. We have a guarantee that the SVA domain doesn't go away during
> IOPF handling, because unbind() waits for pending faults with
> iopf_queue_flush_dev() before freeing the domain. Hence, there's no need
> to synchronize life cycle of the iommu domains between the unbind() and
> the interrupt threads.
I found that iopf_queue_flush_dev() is only called in the intel-iommu
driver. Did I overlook anything?
> static void iopf_handle_group(struct work_struct *work)
> {
> struct iopf_group *group;
> + struct iommu_domain *domain;
> struct iopf_fault *iopf, *next;
> enum iommu_page_response_code status =
> IOMMU_PAGE_RESP_SUCCESS;
>
> group = container_of(work, struct iopf_group, work);
> + domain = iommu_get_domain_for_dev_pasid(group->dev,
> + group->last_fault.fault.prm.pasid);
> + if (!domain || !domain->iopf_handler)
> + status = IOMMU_PAGE_RESP_INVALID;
A comment is missing here on why no refcount is required on the domain,
as explained in the commit msg.
> From: Baolu Lu <[email protected]>
> Sent: Tuesday, June 28, 2022 1:41 PM
> >
> >> struct iommu_domain {
> >> unsigned type;
> >> const struct iommu_domain_ops *ops;
> >> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> >> - iommu_fault_handler_t handler;
> >> - void *handler_token;
> >> struct iommu_domain_geometry geometry;
> >> struct iommu_dma_cookie *iova_cookie;
> >> + union {
> >> + struct { /* IOMMU_DOMAIN_DMA */
> >> + iommu_fault_handler_t handler;
> >> + void *handler_token;
> >> + };
> >
> > why is it DMA domain specific? What about unmanaged
> > domain? Unrecoverable fault can happen on any type
> > including SVA. Hence I think above should be domain type
> > agnostic.
>
> The report_iommu_fault() should be replaced by the new
> iommu_report_device_fault(). Jean has already started this work.
>
> https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/
>
> Currently this is only for DMA domains, hence Robin suggested to make it
> exclude with the SVA domain things.
>
> https://lore.kernel.org/linux-iommu/f3170016-4d7f-e78e-db48-
> [email protected]/
Then it's worth a comment that those two fields are for
legacy fault reporting and for the DMA type only.
> >
> >> +
> >> + mutex_lock(&group->mutex);
> >> + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
> >> GFP_KERNEL);
> >> + if (curr)
> >> + goto out_unlock;
> >
> > Need check xa_is_err(old).
>
> Either
>
> (1) old entry is a valid pointer, or
return -EBUSY in this case
> (2) xa_is_err(curr)
return xa_err(cur)
>
> are failure cases. Hence, "curr == NULL" is the only check we need. Did
> I miss anything?
>
But now you always return -EBUSY for all kinds of xa errors.
> From: Baolu Lu <[email protected]>
> Sent: Tuesday, June 28, 2022 1:54 PM
> >> +u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> >> +{
> >> + struct iommu_domain *domain =
> >> + container_of(handle, struct iommu_domain, bond);
> >> +
> >> + return domain->mm->pasid;
> >> +}
> >> +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> >
> > Looks this is only used by unbind_device. Just open code it.
>
> It's part of current IOMMU/SVA interfaces for the device drivers, and
> has been used in various drivers.
>
> $ git grep iommu_sva_get_pasid
> drivers/dma/idxd/cdev.c: pasid = iommu_sva_get_pasid(sva);
> drivers/iommu/iommu-sva-lib.c: ioasid_t pasid =
> iommu_sva_get_pasid(handle);
> drivers/iommu/iommu-sva-lib.c:u32 iommu_sva_get_pasid(struct
> iommu_sva
> *handle)
> drivers/iommu/iommu-sva-
> lib.c:EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> drivers/misc/uacce/uacce.c: pasid = iommu_sva_get_pasid(handle);
> include/linux/iommu.h:u32 iommu_sva_get_pasid(struct iommu_sva
> *handle);
> include/linux/iommu.h:static inline u32 iommu_sva_get_pasid(struct
> iommu_sva *handle)
>
> Or, I missed anything?
>
Forget it. I thought it was a new function introduced in this series. :/
Hi, Baolu
On 2022/6/28 14:28, Baolu Lu wrote:
> Hi Ethan,
>
> On 2022/6/27 21:03, Ethan Zhao wrote:
>> Hi,
>>
>> On 2022/6/21 22:43, Lu Baolu wrote:
>>> Tweak the I/O page fault handling framework to route the page faults to
>>> the domain and call the page fault handler retrieved from the domain.
>>> This makes the I/O page fault handling framework possible to serve more
>>> usage scenarios as long as they have an IOMMU domain and install a page
>>> fault handler in it. Some unused functions are also removed to avoid
>>> dead code.
>>>
>>> The iommu_get_domain_for_dev_pasid() which retrieves attached domain
>>> for a {device, PASID} pair is used. It will be used by the page fault
>>> handling framework which knows {device, PASID} reported from the iommu
>>> driver. We have a guarantee that the SVA domain doesn't go away during
>>> IOPF handling, because unbind() waits for pending faults with
>>> iopf_queue_flush_dev() before freeing the domain. Hence, there's no
>>> need
>>> to synchronize life cycle of the iommu domains between the unbind() and
>>> the interrupt threads.
>>>
>>> Signed-off-by: Lu Baolu <[email protected]>
>>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>>> ---
>>> drivers/iommu/io-pgfault.c | 64
>>> +++++---------------------------------
>>> 1 file changed, 7 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
>>> index aee9e033012f..4f24ec703479 100644
>>> --- a/drivers/iommu/io-pgfault.c
>>> +++ b/drivers/iommu/io-pgfault.c
>>> @@ -69,69 +69,18 @@ static int iopf_complete_group(struct device
>>> *dev, struct iopf_fault *iopf,
>>> return iommu_page_response(dev, &resp);
>>> }
>>> -static enum iommu_page_response_code
>>> -iopf_handle_single(struct iopf_fault *iopf)
>>> -{
>>> - vm_fault_t ret;
>>> - struct mm_struct *mm;
>>> - struct vm_area_struct *vma;
>>> - unsigned int access_flags = 0;
>>> - unsigned int fault_flags = FAULT_FLAG_REMOTE;
>>> - struct iommu_fault_page_request *prm = &iopf->fault.prm;
>>> - enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
>>> -
>>> - if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
>>> - return status;
>>> -
>>> - mm = iommu_sva_find(prm->pasid);
>>> - if (IS_ERR_OR_NULL(mm))
>>> - return status;
>>> -
>>> - mmap_read_lock(mm);
>>> -
>>> - vma = find_extend_vma(mm, prm->addr);
>>> - if (!vma)
>>> - /* Unmapped area */
>>> - goto out_put_mm;
>>> -
>>> - if (prm->perm & IOMMU_FAULT_PERM_READ)
>>> - access_flags |= VM_READ;
>>> -
>>> - if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
>>> - access_flags |= VM_WRITE;
>>> - fault_flags |= FAULT_FLAG_WRITE;
>>> - }
>>> -
>>> - if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
>>> - access_flags |= VM_EXEC;
>>> - fault_flags |= FAULT_FLAG_INSTRUCTION;
>>> - }
>>> -
>>> - if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
>>> - fault_flags |= FAULT_FLAG_USER;
>>> -
>>> - if (access_flags & ~vma->vm_flags)
>>> - /* Access fault */
>>> - goto out_put_mm;
>>> -
>>> - ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
>>> - status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
>>> - IOMMU_PAGE_RESP_SUCCESS;
>>> -
>>> -out_put_mm:
>>> - mmap_read_unlock(mm);
>>> - mmput(mm);
>>> -
>>> - return status;
>>> -}
>>> -
>>
>> Once the iopf_handle_single() is removed, the name of
>> iopf_handle_group() looks a little weired
>>
>> and confused, does this group mean the iommu group (domain) ? while I
>> take some minutes to
>
> No. This is not the iommu group. It's page request group defined by the
> PCI SIG spec. Multiple page requests could be put in a group with a
> same group id. All page requests in a group could be responded to device
> in one shot.
Thanks for your explanation. I understand the concept of the PCIe PRG. What
I meant is: do we still need to mention the "group" in the name
iopf_handle_group()? Which one is better, iopf_handle_prg() or
iopf_handler()? Perhaps neither of them? :)
Thanks,
Ethan
>
> Best regards,
> baolu
>
>>
>> look into the code, oh, means a batch / list / queue of iopfs , and
>> iopf_handle_group() becomes a
>>
>> generic iopf_handler() .
>>
>> Doe it make sense to revise the names of iopf_handle_group(),
>> iopf_complete_group, iopf_group in
>>
>> this patch set ?
>>
>>
>> Thanks,
>>
>> Ethan
>>
>>> static void iopf_handle_group(struct work_struct *work)
>>> {
>>> struct iopf_group *group;
>>> + struct iommu_domain *domain;
>>> struct iopf_fault *iopf, *next;
>>> enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
>>> group = container_of(work, struct iopf_group, work);
>>> + domain = iommu_get_domain_for_dev_pasid(group->dev,
>>> + group->last_fault.fault.prm.pasid);
>>> + if (!domain || !domain->iopf_handler)
>>> + status = IOMMU_PAGE_RESP_INVALID;
>>> list_for_each_entry_safe(iopf, next, &group->faults, list) {
>>> /*
>>> @@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct
>>> *work)
>>> * faults in the group if there is an error.
>>> */
>>> if (status == IOMMU_PAGE_RESP_SUCCESS)
>>> - status = iopf_handle_single(iopf);
>>> + status = domain->iopf_handler(&iopf->fault,
>>> + domain->fault_data);
>>> if (!(iopf->fault.prm.flags &
>>> IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
>>
>
--
"firm, enduring, strong, and long-lived"
On Tue, Jun 28, 2022 at 08:39:36AM +0000, Tian, Kevin wrote:
> > From: Lu Baolu <[email protected]>
> > Sent: Tuesday, June 21, 2022 10:44 PM
> >
> > Tweak the I/O page fault handling framework to route the page faults to
> > the domain and call the page fault handler retrieved from the domain.
> > This makes the I/O page fault handling framework possible to serve more
> > usage scenarios as long as they have an IOMMU domain and install a page
> > fault handler in it. Some unused functions are also removed to avoid
> > dead code.
> >
> > The iommu_get_domain_for_dev_pasid() which retrieves attached domain
> > for a {device, PASID} pair is used. It will be used by the page fault
> > handling framework which knows {device, PASID} reported from the iommu
> > driver. We have a guarantee that the SVA domain doesn't go away during
> > IOPF handling, because unbind() waits for pending faults with
> > iopf_queue_flush_dev() before freeing the domain. Hence, there's no need
> > to synchronize life cycle of the iommu domains between the unbind() and
> > the interrupt threads.
>
> I found iopf_queue_flush_dev() is only called in intel-iommu driver. Did
> I overlook anything?
The SMMU driver will need it as well when we upstream PRI support.
Currently it only supports stall, and that requires the device driver to
flush all DMA including stalled transactions *before* calling unbind(), so
no need for iopf_queue_flush_dev() in this case.
Thanks,
Jean
>
> > static void iopf_handle_group(struct work_struct *work)
> > {
> > struct iopf_group *group;
> > + struct iommu_domain *domain;
> > struct iopf_fault *iopf, *next;
> > enum iommu_page_response_code status =
> > IOMMU_PAGE_RESP_SUCCESS;
> >
> > group = container_of(work, struct iopf_group, work);
> > + domain = iommu_get_domain_for_dev_pasid(group->dev,
> > + group->last_fault.fault.prm.pasid);
> > + if (!domain || !domain->iopf_handler)
> > + status = IOMMU_PAGE_RESP_INVALID;
>
> Miss a comment on why no refcnt is required on domain as explained
> in the commit msg.
> From: Jean-Philippe Brucker <[email protected]>
> Sent: Tuesday, June 28, 2022 5:44 PM
>
> On Tue, Jun 28, 2022 at 08:39:36AM +0000, Tian, Kevin wrote:
> > > From: Lu Baolu <[email protected]>
> > > Sent: Tuesday, June 21, 2022 10:44 PM
> > >
> > > Tweak the I/O page fault handling framework to route the page faults to
> > > the domain and call the page fault handler retrieved from the domain.
> > > This makes the I/O page fault handling framework possible to serve more
> > > usage scenarios as long as they have an IOMMU domain and install a
> page
> > > fault handler in it. Some unused functions are also removed to avoid
> > > dead code.
> > >
> > > The iommu_get_domain_for_dev_pasid() which retrieves attached
> domain
> > > for a {device, PASID} pair is used. It will be used by the page fault
> > > handling framework which knows {device, PASID} reported from the
> iommu
> > > driver. We have a guarantee that the SVA domain doesn't go away during
> > > IOPF handling, because unbind() waits for pending faults with
> > > iopf_queue_flush_dev() before freeing the domain. Hence, there's no
> need
> > > to synchronize life cycle of the iommu domains between the unbind() and
> > > the interrupt threads.
> >
> > I found iopf_queue_flush_dev() is only called in intel-iommu driver. Did
> > I overlook anything?
>
> The SMMU driver will need it as well when we upstream PRI support.
> Currently it only supports stall, and that requires the device driver to
> flush all DMA including stalled transactions *before* calling unbind(), so
> ne need for iopf_queue_flush_dev() in this case.
>
Then it makes sense. Baolu can probably add this information to the
commit msg so that others with a similar question can quickly get the
point here.
On 2022/6/28 16:29, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Tuesday, June 21, 2022 10:44 PM
>> +/*
>> + * I/O page fault handler for SVA
>> + */
>> +enum iommu_page_response_code
>> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
>> +{
>> + vm_fault_t ret;
>> + struct mm_struct *mm;
>> + struct vm_area_struct *vma;
>> + unsigned int access_flags = 0;
>> + struct iommu_domain *domain = data;
>> + unsigned int fault_flags = FAULT_FLAG_REMOTE;
>> + struct iommu_fault_page_request *prm = &fault->prm;
>> + enum iommu_page_response_code status =
>> IOMMU_PAGE_RESP_INVALID;
>> +
>> + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
>> + return status;
>> +
>> + mm = domain->mm;
>
> What about directly passing domain->mm in as the fault data?
>
> The entire logic here is only about mm instead of domain.
Yes. Updated.
Best regards,
baolu
On 2022/6/28 16:39, Tian, Kevin wrote:
>> static void iopf_handle_group(struct work_struct *work)
>> {
>> struct iopf_group *group;
>> + struct iommu_domain *domain;
>> struct iopf_fault *iopf, *next;
>> enum iommu_page_response_code status =
>> IOMMU_PAGE_RESP_SUCCESS;
>>
>> group = container_of(work, struct iopf_group, work);
>> + domain = iommu_get_domain_for_dev_pasid(group->dev,
>> + group->last_fault.fault.prm.pasid);
>> + if (!domain || !domain->iopf_handler)
>> + status = IOMMU_PAGE_RESP_INVALID;
> Miss a comment on why no refcnt is required on domain as explained
> in the commit msg.
I had some comments around iommu_queue_iopf() in the previous patch. The
iommu_queue_iopf() is the generic page fault handler exposed by iommu
core, hence that's the right place to document this.
Post it below as well:
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 1df8c1dcae77..aee9e033012f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -181,6 +181,13 @@ static void iopf_handle_group(struct work_struct *work)
* request completes, outstanding faults will have been dealt with by the time
* the PASID is freed.
*
+ * Any valid page fault will be eventually routed to an iommu domain and the
+ * page fault handler installed there will get called. The users of this
+ * handling framework should guarantee that the iommu domain could only be
+ * freed after the device has stopped generating page faults (or the iommu
+ * hardware has been set to block the page faults) and the pending page faults
+ * have been flushed.
+ *
* Return: 0 on success and <0 on error.
*/
int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
Best regards,
baolu
On 2022/6/28 16:50, Tian, Kevin wrote:
>> From: Baolu Lu<[email protected]>
>> Sent: Tuesday, June 28, 2022 1:41 PM
>>>> struct iommu_domain {
>>>> unsigned type;
>>>> const struct iommu_domain_ops *ops;
>>>> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
>>>> - iommu_fault_handler_t handler;
>>>> - void *handler_token;
>>>> struct iommu_domain_geometry geometry;
>>>> struct iommu_dma_cookie *iova_cookie;
>>>> + union {
>>>> + struct { /* IOMMU_DOMAIN_DMA */
>>>> + iommu_fault_handler_t handler;
>>>> + void *handler_token;
>>>> + };
>>> why is it DMA domain specific? What about unmanaged
>>> domain? Unrecoverable fault can happen on any type
>>> including SVA. Hence I think above should be domain type
>>> agnostic.
>> The report_iommu_fault() should be replaced by the new
>> iommu_report_device_fault(). Jean has already started this work.
>>
>> https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/
>>
>> Currently this is only for DMA domains, hence Robin suggested to make it
>> exclude with the SVA domain things.
>>
>> https://lore.kernel.org/linux-iommu/f3170016-4d7f-e78e-db48-
>> [email protected]/
> Then it's worthy a comment that those two fields are for
> some legacy fault reporting stuff and DMA type only.
The iommu_fault and SVA fields are exclusive. The former is used for
unrecoverable DMA remapping faults, while the latter is only interested
in the recoverable page faults.
I will update the commit message with the above explanation. Does this work
for you?
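As a rough illustration of that exclusivity, here is a userspace sketch. All
names are hypothetical (model_domain, model_set_sva and so on) and the layout
is simplified; it is not the struct iommu_domain from the patch. It only
shows the idea of one union whose meaningful arm is selected by the domain
type. Passing the mm as the fault data follows the suggestion made elsewhere
in this thread and is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

enum model_domain_type { MODEL_DOMAIN_DMA, MODEL_DOMAIN_SVA };

typedef int (*model_fault_handler_t)(unsigned long iova, void *token);
typedef int (*model_iopf_handler_t)(unsigned long addr, void *data);

/* Hypothetical, simplified domain: the two arms share the same storage. */
struct model_domain {
	enum model_domain_type type;
	union {
		struct {	/* MODEL_DOMAIN_DMA: unrecoverable faults */
			model_fault_handler_t handler;
			void *handler_token;
		};
		struct {	/* MODEL_DOMAIN_SVA: recoverable page faults */
			void *mm;
			model_iopf_handler_t iopf_handler;
			void *fault_data;
		};
	};
};

/* Install the unrecoverable-fault handler; only valid for DMA domains. */
static void model_set_dma_handler(struct model_domain *d,
				  model_fault_handler_t h, void *token)
{
	assert(d->type == MODEL_DOMAIN_DMA);
	d->handler = h;
	d->handler_token = token;
}

/* Install the page fault handler; only valid for SVA domains. */
static void model_set_sva(struct model_domain *d, void *mm,
			  model_iopf_handler_t h)
{
	assert(d->type == MODEL_DOMAIN_SVA);
	d->mm = mm;
	d->iopf_handler = h;
	d->fault_data = mm;	/* assumption: the handler only needs the mm */
}

/* Example handlers used to exercise the two arms. */
static int model_dma_fault(unsigned long iova, void *token)
{
	(void)iova;
	(void)token;
	return -1;		/* unrecoverable fault: nothing to fix up */
}

static int model_sva_iopf(unsigned long addr, void *data)
{
	(void)addr;
	return data ? 0 : -1;	/* succeeds only when an mm was provided */
}
```

Because both arms occupy the same storage, a domain in this model can carry
either the legacy handler or the SVA fields, never both at once.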
Best regards,
baolu
On 2022/6/28 17:10, Ethan Zhao wrote:
> Hi, Baolu
>
> On 2022/6/28 14:28, Baolu Lu wrote:
>> Hi Ethan,
>>
>> On 2022/6/27 21:03, Ethan Zhao wrote:
>>> Hi,
>>>
>>> On 2022/6/21 22:43, Lu Baolu wrote:
>>>> Tweak the I/O page fault handling framework to route the page faults to
>>>> the domain and call the page fault handler retrieved from the domain.
>>>> This makes the I/O page fault handling framework possible to serve more
>>>> usage scenarios as long as they have an IOMMU domain and install a page
>>>> fault handler in it. Some unused functions are also removed to avoid
>>>> dead code.
>>>>
>>>> The iommu_get_domain_for_dev_pasid() which retrieves attached domain
>>>> for a {device, PASID} pair is used. It will be used by the page fault
>>>> handling framework which knows {device, PASID} reported from the iommu
>>>> driver. We have a guarantee that the SVA domain doesn't go away during
>>>> IOPF handling, because unbind() waits for pending faults with
>>>> iopf_queue_flush_dev() before freeing the domain. Hence, there's no
>>>> need
>>>> to synchronize life cycle of the iommu domains between the unbind() and
>>>> the interrupt threads.
>>>>
>>>> Signed-off-by: Lu Baolu <[email protected]>
>>>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>>>> ---
>>>> drivers/iommu/io-pgfault.c | 64
>>>> +++++---------------------------------
>>>> 1 file changed, 7 insertions(+), 57 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
>>>> index aee9e033012f..4f24ec703479 100644
>>>> --- a/drivers/iommu/io-pgfault.c
>>>> +++ b/drivers/iommu/io-pgfault.c
>>>> @@ -69,69 +69,18 @@ static int iopf_complete_group(struct device
>>>> *dev, struct iopf_fault *iopf,
>>>> return iommu_page_response(dev, &resp);
>>>> }
>>>> -static enum iommu_page_response_code
>>>> -iopf_handle_single(struct iopf_fault *iopf)
>>>> -{
>>>> - vm_fault_t ret;
>>>> - struct mm_struct *mm;
>>>> - struct vm_area_struct *vma;
>>>> - unsigned int access_flags = 0;
>>>> - unsigned int fault_flags = FAULT_FLAG_REMOTE;
>>>> - struct iommu_fault_page_request *prm = &iopf->fault.prm;
>>>> - enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
>>>> -
>>>> - if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
>>>> - return status;
>>>> -
>>>> - mm = iommu_sva_find(prm->pasid);
>>>> - if (IS_ERR_OR_NULL(mm))
>>>> - return status;
>>>> -
>>>> - mmap_read_lock(mm);
>>>> -
>>>> - vma = find_extend_vma(mm, prm->addr);
>>>> - if (!vma)
>>>> - /* Unmapped area */
>>>> - goto out_put_mm;
>>>> -
>>>> - if (prm->perm & IOMMU_FAULT_PERM_READ)
>>>> - access_flags |= VM_READ;
>>>> -
>>>> - if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
>>>> - access_flags |= VM_WRITE;
>>>> - fault_flags |= FAULT_FLAG_WRITE;
>>>> - }
>>>> -
>>>> - if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
>>>> - access_flags |= VM_EXEC;
>>>> - fault_flags |= FAULT_FLAG_INSTRUCTION;
>>>> - }
>>>> -
>>>> - if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
>>>> - fault_flags |= FAULT_FLAG_USER;
>>>> -
>>>> - if (access_flags & ~vma->vm_flags)
>>>> - /* Access fault */
>>>> - goto out_put_mm;
>>>> -
>>>> - ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
>>>> - status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
>>>> - IOMMU_PAGE_RESP_SUCCESS;
>>>> -
>>>> -out_put_mm:
>>>> - mmap_read_unlock(mm);
>>>> - mmput(mm);
>>>> -
>>>> - return status;
>>>> -}
>>>> -
>>>
>>> Once the iopf_handle_single() is removed, the name of
>>> iopf_handle_group() looks a little weired
>>>
>>> and confused, does this group mean the iommu group (domain) ? while I
>>> take some minutes to
>>
>> No. This is not the iommu group. It's page request group defined by the
>> PCI SIG spec. Multiple page requests could be put in a group with a
>> same group id. All page requests in a group could be responded to device
>> in one shot.
>
> Thanks your explaination, understand the concept of PCIe PRG. I meant
>
> do we still have the necessity to mention the "group" here in the name
>
> iopf_handle_group(), which one is better ? iopf_handle_prg() or
>
> iopf_handler(), perhaps none of them ? :)
Oh! Sorry for the misunderstanding.
I have no strong feelings about changing this name. :-) All of the names
express what the helper does. Jean is the author of this framework. If
he has the same idea as you, I don't mind renaming it in this patch.
Best regards,
baolu
On 2022/6/28 16:50, Tian, Kevin wrote:
>>>> +
>>>> + mutex_lock(&group->mutex);
>>>> + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
>>>> GFP_KERNEL);
>>>> + if (curr)
>>>> + goto out_unlock;
>>> Need check xa_is_err(old).
>> Either
>>
>> (1) old entry is a valid pointer, or
> return -EBUSY in this case
>
>> (2) xa_is_err(curr)
> return xa_err(cur)
>
>> are failure cases. Hence, "curr == NULL" is the only check we need. Did
>> I miss anything?
>>
> But now you always return -EBUSY for all kinds of xa errors.
Fair enough. Updated like below.
curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
GFP_KERNEL);
if (curr) {
ret = xa_err(curr) ? : -EBUSY;
goto out_unlock;
}
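For anyone less familiar with this kind of return value, the three outcomes
can be modelled in plain userspace C. This is only a sketch: model_cmpxchg(),
model_err_ptr(), model_err() and the single-slot "array" are made up for
illustration and are not the kernel XArray API or its entry encoding. The
"?:" shorthand in the snippet above is a GNU C extension, so it is written
out in full here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error-entry encoding; not the one the kernel XArray uses. */
#define MODEL_MAX_ERRNO 4095

static void *model_err_ptr(int err)
{
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Return the negative errno carried by @entry, or 0 if it is not an error. */
static int model_err(const void *entry)
{
	intptr_t v = (intptr_t)entry;

	return (v < 0 && v >= -MODEL_MAX_ERRNO) ? (int)v : 0;
}

/*
 * Single-slot model of a compare-and-exchange store: install @entry only if
 * the slot currently holds @old, and return what was there before. When
 * @fail_alloc is set, simulate an allocation failure by returning an error
 * entry and leaving the slot untouched.
 */
static void *model_cmpxchg(void **slot, void *old, void *entry, int fail_alloc)
{
	void *curr = *slot;

	if (fail_alloc)
		return model_err_ptr(-ENOMEM);
	if (curr == old)
		*slot = entry;
	return curr;
}

/* Mirrors the updated check: 0 on success, the real error, or -EBUSY. */
static int model_attach(void **slot, void *domain, int fail_alloc)
{
	void *curr = model_cmpxchg(slot, NULL, domain, fail_alloc);
	int err;

	if (!curr)
		return 0;		/* slot was empty, domain installed */
	err = model_err(curr);
	return err ? err : -EBUSY;	/* propagate the error, else "occupied" */
}
```

In this model an empty slot gives 0, an occupied slot gives -EBUSY, and a
simulated allocation failure gives -ENOMEM instead of being folded into
-EBUSY.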
Best regards,
baolu
On 2022/6/28 18:02, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker <[email protected]>
>> Sent: Tuesday, June 28, 2022 5:44 PM
>>
>> On Tue, Jun 28, 2022 at 08:39:36AM +0000, Tian, Kevin wrote:
>>>> From: Lu Baolu <[email protected]>
>>>> Sent: Tuesday, June 21, 2022 10:44 PM
>>>>
>>>> Tweak the I/O page fault handling framework to route the page faults to
>>>> the domain and call the page fault handler retrieved from the domain.
>>>> This makes the I/O page fault handling framework possible to serve more
>>>> usage scenarios as long as they have an IOMMU domain and install a
>> page
>>>> fault handler in it. Some unused functions are also removed to avoid
>>>> dead code.
>>>>
>>>> The iommu_get_domain_for_dev_pasid() which retrieves attached
>> domain
>>>> for a {device, PASID} pair is used. It will be used by the page fault
>>>> handling framework which knows {device, PASID} reported from the
>> iommu
>>>> driver. We have a guarantee that the SVA domain doesn't go away during
>>>> IOPF handling, because unbind() waits for pending faults with
>>>> iopf_queue_flush_dev() before freeing the domain. Hence, there's no
>> need
>>>> to synchronize life cycle of the iommu domains between the unbind() and
>>>> the interrupt threads.
>>>
>>> I found iopf_queue_flush_dev() is only called in intel-iommu driver. Did
>>> I overlook anything?
>>
>> The SMMU driver will need it as well when we upstream PRI support.
>> Currently it only supports stall, and that requires the device driver to
>> flush all DMA including stalled transactions *before* calling unbind(), so
>> no need for iopf_queue_flush_dev() in this case.
>>
>
> Then it makes sense. Baolu can probably add this information to the
> commit msg so others with a similar question can quickly get the
> point here.
Sure. Updated.
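
As an illustration of the routing described in the quoted commit message (look up the domain attached to a {device, PASID} pair, then call the page fault handler installed in that domain), here is a minimal userspace sketch. All sketch_ names, the per-device lookup table and the handler signature are assumptions made for this example and are not the kernel definitions. The sketch also leaves out the lifetime side of the discussion, that is, flushing pending faults before the domain is freed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PASID 4

enum sketch_resp { SKETCH_RESP_SUCCESS, SKETCH_RESP_INVALID };

/* Hypothetical handler type: a faulting address plus the domain's data. */
typedef enum sketch_resp (*sketch_iopf_handler_t)(unsigned long addr,
						  void *data);

struct sketch_domain {
	sketch_iopf_handler_t iopf_handler;
	void *fault_data;
};

/* Hypothetical device: one attached domain per PASID. */
struct sketch_device {
	struct sketch_domain *pasid_domain[SKETCH_MAX_PASID];
};

/* Retrieve the domain attached to a {device, PASID} pair, if any. */
static struct sketch_domain *
sketch_get_domain_for_dev_pasid(struct sketch_device *dev, unsigned int pasid)
{
	assert(dev);
	return pasid < SKETCH_MAX_PASID ? dev->pasid_domain[pasid] : NULL;
}

/* Route one fault to the handler installed in the attached domain. */
static enum sketch_resp sketch_route_fault(struct sketch_device *dev,
					   unsigned int pasid,
					   unsigned long addr)
{
	struct sketch_domain *domain;

	domain = sketch_get_domain_for_dev_pasid(dev, pasid);
	if (!domain || !domain->iopf_handler)
		return SKETCH_RESP_INVALID;
	return domain->iopf_handler(addr, domain->fault_data);
}

/* Example handler: count the faults it sees and accept all of them. */
static enum sketch_resp sketch_counting_handler(unsigned long addr, void *data)
{
	(void)addr;
	(*(unsigned int *)data)++;
	return SKETCH_RESP_SUCCESS;
}
```

A fault for a PASID with no attached domain, or for a domain without a handler, is answered as invalid rather than handled, which is a choice of this sketch.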
Best regards,
baolu
On Tue, Jun 28, 2022 at 07:53:39PM +0800, Baolu Lu wrote:
> > > > Once the iopf_handle_single() is removed, the name of
> > > > iopf_handle_group() looks a little weird
> > > >
> > > > and confusing. Does this group mean the iommu group (domain)?
> > > > while I take some minutes to
> > >
> > > No. This is not the iommu group. It's page request group defined by the
> > > PCI SIG spec. Multiple page requests could be put in a group with a
> > > same group id. All page requests in a group could be responded to device
> > > in one shot.
> >
> > Thanks for your explanation, I understand the concept of PCIe PRG. I meant
> >
> > do we still have the necessity to mention the "group" here in the name
> >
> > iopf_handle_group(), which one is better ? iopf_handle_prg() or
> >
> > iopf_handler(), perhaps none of them ? :)
>
> Oh! Sorry for the misunderstanding.
>
> I have no strong feeling to change this naming. :-) All the names
> express what the helper does. Jean is the author of this framework. If
> he has the same idea as you, I don't mind renaming it in this patch.
I'm not attached to the name, and I see how it could be confusing. Given
that io-pgfault is not only for PCIe, 'prg' is not the best here either.
iopf_handle_faults(), or just iopf_handler(), seem more suitable.
Thanks,
Jean
On 2022/6/28 22:20, Jean-Philippe Brucker wrote:
> On Tue, Jun 28, 2022 at 07:53:39PM +0800, Baolu Lu wrote:
>>>>> Once the iopf_handle_single() is removed, the name of
>>>>> iopf_handle_group() looks a little weird
>>>>>
>>>>> and confusing. Does this group mean the iommu group (domain)?
>>>>> while I take some minutes to
>>>>
>>>> No. This is not the iommu group. It's page request group defined by the
>>>> PCI SIG spec. Multiple page requests could be put in a group with a
>>>> same group id. All page requests in a group could be responded to device
>>>> in one shot.
>>>
>>> Thanks for your explanation, I understand the concept of PCIe PRG. I meant
>>>
>>> do we still have the necessity to mention the "group" here in the name
>>>
>>> iopf_handle_group(), which one is better ? iopf_handle_prg() or
>>>
>>> iopf_handler(), perhaps none of them ? :)
>>
>> Oh! Sorry for the misunderstanding.
>>
>> I have no strong feeling to change this naming. :-) All the names
>> express what the helper does. Jean is the author of this framework. If
>> he has the same idea as you, I don't mind renaming it in this patch.
>
> I'm not attached to the name, and I see how it could be confusing. Given
> that io-pgfault is not only for PCIe, 'prg' is not the best here either.
> iopf_handle_faults(), or just iopf_handler(), seem more suitable.
Okay, so I will rename it to iopf_handle_faults() in this patch.
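
To make the "group" in the old name concrete: the helper works on a page request group, that is, several page requests sharing one group ID that are answered with a single response. A minimal userspace sketch of that idea follows. The sketch_ types and names are hypothetical and not the io-pgfault code, and skipping the remaining requests after the first failure is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_resp { SKETCH_RESP_SUCCESS, SKETCH_RESP_INVALID };

struct sketch_fault {
	unsigned int grpid;	/* page request group ID */
	unsigned long addr;	/* faulting address */
};

typedef enum sketch_resp (*sketch_handler_t)(const struct sketch_fault *fault);

struct sketch_group {
	const struct sketch_fault *faults;
	size_t nr_faults;
	unsigned int responses_sent;	/* responses sent for this group */
	enum sketch_resp last_status;	/* status carried by the response */
};

/* Handle every request of one group, then respond once for all of them. */
static void sketch_handle_faults(struct sketch_group *group,
				 sketch_handler_t handler)
{
	enum sketch_resp status = SKETCH_RESP_SUCCESS;
	size_t i;

	assert(group && handler);
	for (i = 0; i < group->nr_faults; i++) {
		/* All requests in a group are expected to share one ID. */
		assert(group->faults[i].grpid == group->faults[0].grpid);
		/* Skip the remaining requests once one of them failed. */
		if (status == SKETCH_RESP_SUCCESS)
			status = handler(&group->faults[i]);
	}
	/* A single response covers the whole group. */
	group->last_status = status;
	group->responses_sent++;
}

/* Example handler: reject address 0, accept everything else. */
static enum sketch_resp sketch_reject_null(const struct sketch_fault *fault)
{
	return fault->addr ? SKETCH_RESP_SUCCESS : SKETCH_RESP_INVALID;
}
```

Whatever the helper ends up being called, the unit of work stays the same: many requests in, one response out.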
Best regards,
baolu
> From: Baolu Lu <[email protected]>
> Sent: Tuesday, June 28, 2022 7:34 PM
>
> On 2022/6/28 16:50, Tian, Kevin wrote:
> >> From: Baolu Lu<[email protected]>
> >> Sent: Tuesday, June 28, 2022 1:41 PM
> >>>> struct iommu_domain {
> >>>> unsigned type;
> >>>> const struct iommu_domain_ops *ops;
> >>>> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> >>>> - iommu_fault_handler_t handler;
> >>>> - void *handler_token;
> >>>> struct iommu_domain_geometry geometry;
> >>>> struct iommu_dma_cookie *iova_cookie;
> >>>> + union {
> >>>> + struct { /* IOMMU_DOMAIN_DMA */
> >>>> + iommu_fault_handler_t handler;
> >>>> + void *handler_token;
> >>>> + };
> >>> why is it DMA domain specific? What about unmanaged
> >>> domain? Unrecoverable fault can happen on any type
> >>> including SVA. Hence I think above should be domain type
> >>> agnostic.
> >> The report_iommu_fault() should be replaced by the new
> >> iommu_report_device_fault(). Jean has already started this work.
> >>
> >> https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/
> >>
> >> Currently this is only for DMA domains, hence Robin suggested to make it
> >> exclude with the SVA domain things.
> >>
> >> https://lore.kernel.org/linux-iommu/f3170016-4d7f-e78e-db48-
> >> [email protected]/
> > Then it's worthy a comment that those two fields are for
> > some legacy fault reporting stuff and DMA type only.
>
> The iommu_fault and SVA fields are exclusive. The former is used for
> unrecoverable DMA remapping faults, while the latter is only interested
> in the recoverable page faults.
>
> I will update the commit message with above explanation. Does this work
> for you?
>
Not exactly. Your earlier explanation is about the old vs. new API, so
leaving the existing fault handler with its only current user is fine.
But this is not related to unrecoverable vs. recoverable. As I said,
unrecoverable faults could happen on all domain types. Tying them to
DMA-only doesn't make sense, and I think in the end the new
iommu_report_device_fault() will need to support both. Is that not the
case?
On 2022/6/29 09:54, Tian, Kevin wrote:
>> From: Baolu Lu <[email protected]>
>> Sent: Tuesday, June 28, 2022 7:34 PM
>>
>> On 2022/6/28 16:50, Tian, Kevin wrote:
>>>> From: Baolu Lu<[email protected]>
>>>> Sent: Tuesday, June 28, 2022 1:41 PM
>>>>>> struct iommu_domain {
>>>>>> unsigned type;
>>>>>> const struct iommu_domain_ops *ops;
>>>>>> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
>>>>>> - iommu_fault_handler_t handler;
>>>>>> - void *handler_token;
>>>>>> struct iommu_domain_geometry geometry;
>>>>>> struct iommu_dma_cookie *iova_cookie;
>>>>>> + union {
>>>>>> + struct { /* IOMMU_DOMAIN_DMA */
>>>>>> + iommu_fault_handler_t handler;
>>>>>> + void *handler_token;
>>>>>> + };
>>>>> why is it DMA domain specific? What about unmanaged
>>>>> domain? Unrecoverable fault can happen on any type
>>>>> including SVA. Hence I think above should be domain type
>>>>> agnostic.
>>>> The report_iommu_fault() should be replaced by the new
>>>> iommu_report_device_fault(). Jean has already started this work.
>>>>
>>>> https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/
>>>>
>>>> Currently this is only for DMA domains, hence Robin suggested to make it
>>>> exclude with the SVA domain things.
>>>>
>>>> https://lore.kernel.org/linux-iommu/f3170016-4d7f-e78e-db48-
>>>> [email protected]/
>>> Then it's worthy a comment that those two fields are for
>>> some legacy fault reporting stuff and DMA type only.
>>
>> The iommu_fault and SVA fields are exclusive. The former is used for
>> unrecoverable DMA remapping faults, while the latter is only interested
>> in the recoverable page faults.
>>
>> I will update the commit message with above explanation. Does this work
>> for you?
>>
>
> Not exactly. Your earlier explanation is about old vs. new API thus
> leaving the existing fault handler with current only user is fine.
>
> but this is not related to unrecoverable vs. recoverable. As I said
> unrecoverable could happen on all domain types. Tying it to
> DMA-only doesn't make sense and I think in the end the new
> iommu_report_device_fault() will need support both. Is it not the
> case?
You are right.
The report_iommu_fault() should be replaced by the new
iommu_report_device_fault(). Leave the existing fault handler with its
existing users; the newly added SVA members should be mutually exclusive
with it.
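
The layout being discussed, with the existing fault handler fields and the SVA members sharing storage because a domain uses only one of the two, can be sketched in userspace C as below. The sketch_ names, the reduced set of fields and the two accessor helpers are hypothetical and only illustrate the mutual exclusion; they are not the kernel's struct iommu_domain. Anonymous structs and unions need C11 or later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum sketch_domain_type { SKETCH_DOMAIN_DMA, SKETCH_DOMAIN_SVA };

struct sketch_mm { int id; };	/* stand-in for struct mm_struct */

typedef int (*sketch_fault_handler_t)(unsigned long iova, void *token);

struct sketch_domain {
	enum sketch_domain_type type;
	union {
		struct {	/* existing fault handler users */
			sketch_fault_handler_t handler;
			void *handler_token;
		};
		struct {	/* SVA domain */
			struct sketch_mm *mm;
		};
	};
};

/* Install the existing-style handler; refused for SVA domains. */
static int sketch_set_fault_handler(struct sketch_domain *domain,
				    sketch_fault_handler_t handler,
				    void *token)
{
	assert(domain);
	if (domain->type == SKETCH_DOMAIN_SVA)
		return -EINVAL;	/* storage is used by the SVA members */
	domain->handler = handler;
	domain->handler_token = token;
	return 0;
}

/* Return the mm of an SVA domain, or NULL for any other type. */
static struct sketch_mm *sketch_domain_mm(const struct sketch_domain *domain)
{
	assert(domain);
	return domain->type == SKETCH_DOMAIN_SVA ? domain->mm : NULL;
}

/* Example handler that only acknowledges the fault. */
static int sketch_log_fault(unsigned long iova, void *token)
{
	(void)iova;
	(void)token;
	return 0;
}
```

Checking the domain type before touching either arm of the union is what keeps the two sets of members from overwriting each other.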
Best regards,
baolu
On 2022/6/28 22:20, Jean-Philippe Brucker wrote:
> On Tue, Jun 28, 2022 at 07:53:39PM +0800, Baolu Lu wrote:
>>>>> Once the iopf_handle_single() is removed, the name of
>>>>> iopf_handle_group() looks a little weird
>>>>>
>>>>> and confusing. Does this group mean the iommu group (domain)?
>>>>> while I take some minutes to
>>>> No. This is not the iommu group. It's page request group defined by the
>>>> PCI SIG spec. Multiple page requests could be put in a group with a
>>>> same group id. All page requests in a group could be responded to device
>>>> in one shot.
>>> Thanks for your explanation, I understand the concept of PCIe PRG. I meant
>>>
>>> do we still have the necessity to mention the "group" here in the name
>>>
>>> iopf_handle_group(), which one is better ? iopf_handle_prg() or
>>>
>>> iopf_handler(), perhaps none of them ? :)
>> Oh! Sorry for the misunderstanding.
>>
>> I have no strong feeling to change this naming. :-) All the names
>> express what the helper does. Jean is the author of this framework. If
>> he has the same idea as you, I don't mind renaming it in this patch.
> I'm not attached to the name, and I see how it could be confusing. Given
> that io-pgfault is not only for PCIe, 'prg' is not the best here either.
> iopf_handle_faults(), or just iopf_handler(), seem more suitable.
Both iopf_handle_faults() and iopf_handler() look straightforward.
iopf_handler() saves the word 'faults', and iopf already means 'I/O page
fault', so I think iopf_handler() is clear enough.
Thanks,
Ethan
>
> Thanks,
> Jean
--
"firm, enduring, strong, and long-lived"