Some modern accelerators such as Intel's Data Streaming Accelerator (DSA)
require PASID in DMA requests to be operational. Specifically, the work
submissions with ENQCMD on shared work queues require PASIDs. The use cases
include both user DMA with shared virtual addressing (SVA) and in-kernel
DMA similar to legacy DMA w/o PASID. Here we address the latter.
DMA mapping API is the de facto standard for in-kernel DMA. However, it
operates on a per device or Requester ID(RID) basis which is not
PASID-aware. To leverage DMA API for devices relies on PASIDs, this
patchset introduces the following APIs
1. A driver facing API that enables DMA API PASID usage:
iommu_enable_pasid_dma(struct device *dev, ioasid_t &pasid);
2. An IOMMU op that allows attaching device-domain-PASID generically (will
be used beyond DMA API PASID support)
Once PASID DMA is enabled and attached to the appropriate IOMMU domain,
device drivers can continue to use DMA APIs as-is. There is no difference
in terms of mapping in dma_handle between without PASID and with PASID.
The DMA mapping performed by IOMMU will be identical for both requests, let
it be IOVA or PA in case of pass-through.
In addition, this set converts DSA driver in-kernel DMA with PASID from SVA
lib to DMA API. There have been security and functional issues with the
kernel SVA approach:
(https://lore.kernel.org/linux-iommu/[email protected]/)
The highlights are as the following:
- The lack of IOTLB synchronization upon kernel page table updates.
(vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
- Other than slight more protection, using kernel virtual address (KVA)
has little advantage over physical address. There are also no use cases yet
where DMA engines need kernel virtual addresses for in-kernel DMA.
Subsequently, cleanup is done around the usage of sva_bind_device() for
in-kernel DMA. Removing special casing code in VT-d driver and tightening
SVA lib API.
This work and idea behind it is a collaboration with many people, many
thanks to Baolu Lu, Jason Gunthorpe, Dave Jiang, and others.
ChangeLog:
v2
- Do not reserve a special PASID for DMA API usage. Use IOASID
allocation instead.
- Introduced a generic device-pasid-domain attachment IOMMU op.
Replaced the DMA API only IOMMU op.
- Removed supervisor SVA support in VT-d
- Removed unused sva_bind_device parameters
- Use IOMMU specific data instead of struct device to store PASID
info
Jacob Pan (6):
iommu/vt-d: Implement device_pasid domain attach ops
iommu/vt-d: Use device_pasid attach op for RID2PASID
iommu: Add PASID support for DMA mapping API users
dmaengine: idxd: Use DMA API for in-kernel DMA with PASID
iommu/vt-d: Delete supervisor/kernel SVA
iommu: Remove unused driver data in sva_bind_device
Lu Baolu (2):
iommu: Assign per device max PASID
iommu: Add attach/detach_dev_pasid domain ops
drivers/dma/idxd/cdev.c | 2 +-
drivers/dma/idxd/idxd.h | 1 -
drivers/dma/idxd/init.c | 34 +--
drivers/dma/idxd/sysfs.c | 7 -
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +-
drivers/iommu/dma-iommu.c | 65 ++++++
drivers/iommu/intel/iommu.c | 214 ++++++++++++++++--
drivers/iommu/intel/svm.c | 51 +----
drivers/iommu/iommu.c | 4 +-
drivers/misc/uacce/uacce.c | 2 +-
include/linux/dma-iommu.h | 7 +
include/linux/intel-iommu.h | 15 +-
include/linux/iommu.h | 37 ++-
14 files changed, 338 insertions(+), 108 deletions(-)
--
2.25.1
> From: Jacob Pan <[email protected]>
> Sent: Tuesday, March 15, 2022 1:07 PM
>
> Some modern accelerators such as Intel's Data Streaming Accelerator (DSA)
> require PASID in DMA requests to be operational. Specifically, the work
> submissions with ENQCMD on shared work queues require PASIDs. The use
> cases
> include both user DMA with shared virtual addressing (SVA) and in-kernel
> DMA similar to legacy DMA w/o PASID. Here we address the latter.
>
> DMA mapping API is the de facto standard for in-kernel DMA. However, it
> operates on a per device or Requester ID(RID) basis which is not
> PASID-aware. To leverage DMA API for devices relies on PASIDs, this
> patchset introduces the following APIs
>
> 1. A driver facing API that enables DMA API PASID usage:
> iommu_enable_pasid_dma(struct device *dev, ioasid_t &pasid);
Should this be called dma_enable_pasid() since it's about DMA API? Doing
so also avoids the driver to include iommu.h.
Thanks
Kevin
In-kernel DMA with PASID should use DMA API now, remove supervisor PASID
SVA support. Remove special cases in bind mm and page request service.
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/svm.c | 42 ++++++++-------------------------------
1 file changed, 8 insertions(+), 34 deletions(-)
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 2c53689da461..37d6218f173b 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -516,11 +516,10 @@ static void intel_svm_free_pasid(struct mm_struct *mm)
static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
struct device *dev,
- struct mm_struct *mm,
- unsigned int flags)
+ struct mm_struct *mm)
{
struct device_domain_info *info = get_domain_info(dev);
- unsigned long iflags, sflags;
+ unsigned long iflags, sflags = 0;
struct intel_svm_dev *sdev;
struct intel_svm *svm;
int ret = 0;
@@ -533,16 +532,13 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
svm->pasid = mm->pasid;
svm->mm = mm;
- svm->flags = flags;
INIT_LIST_HEAD_RCU(&svm->devs);
- if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
- svm->notifier.ops = &intel_mmuops;
- ret = mmu_notifier_register(&svm->notifier, mm);
- if (ret) {
- kfree(svm);
- return ERR_PTR(ret);
- }
+ svm->notifier.ops = &intel_mmuops;
+ ret = mmu_notifier_register(&svm->notifier, mm);
+ if (ret) {
+ kfree(svm);
+ return ERR_PTR(ret);
}
ret = pasid_private_add(svm->pasid, svm);
@@ -583,8 +579,6 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
}
/* Setup the pasid table: */
- sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
- PASID_FLAG_SUPERVISOR_MODE : 0;
sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
spin_lock_irqsave(&iommu->lock, iflags);
ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
@@ -957,7 +951,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
* to unbind the mm while any page faults are outstanding.
*/
svm = pasid_private_find(req->pasid);
- if (IS_ERR_OR_NULL(svm) || (svm->flags & SVM_FLAG_SUPERVISOR_MODE))
+ if (IS_ERR_OR_NULL(svm))
goto bad_req;
}
@@ -1011,29 +1005,9 @@ static irqreturn_t prq_event_thread(int irq, void *d)
struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
{
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
- unsigned int flags = 0;
struct iommu_sva *sva;
int ret;
- if (drvdata)
- flags = *(unsigned int *)drvdata;
-
- if (flags & SVM_FLAG_SUPERVISOR_MODE) {
- if (!ecap_srs(iommu->ecap)) {
- dev_err(dev, "%s: Supervisor PASID not supported\n",
- iommu->name);
- return ERR_PTR(-EOPNOTSUPP);
- }
-
- if (mm) {
- dev_err(dev, "%s: Supervisor PASID with user provided mm\n",
- iommu->name);
- return ERR_PTR(-EINVAL);
- }
-
- mm = &init_mm;
- }
-
mutex_lock(&pasid_mutex);
ret = intel_svm_alloc_pasid(dev, mm, flags);
if (ret) {
--
2.25.1
From: Lu Baolu <[email protected]>
An IOMMU domain represents an address space which can be attached by
devices that perform DMA within a domain. However, for platforms with
PASID capability the domain attachment needs be handled at device+PASID
level. There can be multiple PASIDs within a device and multiple devices
attached to a given domain.
This patch introduces a new IOMMU op which support device, PASID, and
IOMMU domain attachment. The immediate use case is for PASID capable
devices to perform DMA under DMA APIs.
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
include/linux/iommu.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 369f05c2a4e2..fde5b933dbe3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
* @aux_get_pasid: get the pasid given an aux-domain
* @sva_bind: Bind process address space to device
* @sva_unbind: Unbind process address space from device
+ * @attach_dev_pasid: attach an iommu domain to a pasid of device
+ * @detach_dev_pasid: detach an iommu domain from a pasid of device
* @sva_get_pasid: Get PASID associated to a SVA handle
* @page_response: handle page request response
* @cache_invalidate: invalidate translation caches
@@ -296,6 +298,10 @@ struct iommu_ops {
struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
void *drvdata);
void (*sva_unbind)(struct iommu_sva *handle);
+ int (*attach_dev_pasid)(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id);
+ void (*detach_dev_pasid)(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id);
u32 (*sva_get_pasid)(struct iommu_sva *handle);
int (*page_response)(struct device *dev,
--
2.25.1
No one is using drvdata for sva_bind_device after kernel SVA support is
removed from VT-d driver. Remove the drvdata parameter as well.
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/dma/idxd/cdev.c | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 ++---
drivers/iommu/intel/svm.c | 9 ++++-----
drivers/iommu/iommu.c | 4 ++--
drivers/misc/uacce/uacce.c | 2 +-
include/linux/intel-iommu.h | 3 +--
include/linux/iommu.h | 9 +++------
8 files changed, 15 insertions(+), 21 deletions(-)
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index b9b2b4a4124e..312ec37ebf91 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -100,7 +100,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
filp->private_data = ctx;
if (device_pasid_enabled(idxd)) {
- sva = iommu_sva_bind_device(dev, current->mm, NULL);
+ sva = iommu_sva_bind_device(dev, current->mm);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a737ba5f727e..eb2f5cb0701a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -354,7 +354,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
}
struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
{
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..d2ba86470c42 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -754,8 +754,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
void arm_smmu_sva_unbind(struct iommu_sva *handle);
u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
@@ -791,7 +790,7 @@ static inline bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master
}
static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
{
return ERR_PTR(-ENODEV);
}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 37d6218f173b..94deb58375f5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -500,8 +500,7 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
return ret;
}
-static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
- unsigned int flags)
+static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
{
ioasid_t max_pasid = dev_is_pci(dev) ?
pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
@@ -1002,20 +1001,20 @@ static irqreturn_t prq_event_thread(int irq, void *d)
return IRQ_RETVAL(handled);
}
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm)
{
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
struct iommu_sva *sva;
int ret;
mutex_lock(&pasid_mutex);
- ret = intel_svm_alloc_pasid(dev, mm, flags);
+ ret = intel_svm_alloc_pasid(dev, mm);
if (ret) {
mutex_unlock(&pasid_mutex);
return ERR_PTR(ret);
}
- sva = intel_svm_bind_mm(iommu, dev, mm, flags);
+ sva = intel_svm_bind_mm(iommu, dev, mm);
if (IS_ERR_OR_NULL(sva))
intel_svm_free_pasid(mm);
mutex_unlock(&pasid_mutex);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 107dcf5938d6..fef34879bc0c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3049,7 +3049,7 @@ EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
* On error, returns an ERR_PTR value.
*/
struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
{
struct iommu_group *group;
struct iommu_sva *handle = ERR_PTR(-EINVAL);
@@ -3074,7 +3074,7 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
if (iommu_group_device_count(group) != 1)
goto out_unlock;
- handle = ops->sva_bind(dev, mm, drvdata);
+ handle = ops->sva_bind(dev, mm);
out_unlock:
mutex_unlock(&group->mutex);
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..3238a867ea51 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -99,7 +99,7 @@ static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
if (!(uacce->flags & UACCE_DEV_SVA))
return 0;
- handle = iommu_sva_bind_device(uacce->parent, current->mm, NULL);
+ handle = iommu_sva_bind_device(uacce->parent, current->mm);
if (IS_ERR(handle))
return PTR_ERR(handle);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 3f4c98f170ec..9dc855d7479d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -777,8 +777,7 @@ extern int intel_svm_finish_prq(struct intel_iommu *iommu);
int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
struct iommu_gpasid_bind_data *data);
int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm);
void intel_svm_unbind(struct iommu_sva *handle);
u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fb011722e4f8..b570b37181ad 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -294,9 +294,7 @@ struct iommu_ops {
int (*aux_attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev);
int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);
-
- struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+ struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm);
void (*sva_unbind)(struct iommu_sva *handle);
int (*attach_dev_pasid)(struct iommu_domain *domain,
struct device *dev, ioasid_t id);
@@ -705,8 +703,7 @@ void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev);
int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);
struct iommu_sva *iommu_sva_bind_device(struct device *dev,
- struct mm_struct *mm,
- void *drvdata);
+ struct mm_struct *mm);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);
@@ -1065,7 +1062,7 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
}
static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
{
return NULL;
}
--
2.25.1
DMA mapping API is the de facto standard for in-kernel DMA. It operates
on a per device/RID basis which is not PASID-aware.
Some modern devices such as Intel Data Streaming Accelerator, PASID is
required for certain work submissions. To allow such devices use DMA
mapping API, we need the following functionalities:
1. Provide device a way to retrieve a PASID for work submission within
the kernel
2. Enable the kernel PASID on the IOMMU for the device
3. Attach the kernel PASID to the device's default DMA domain, let it
be IOVA or physical address in case of pass-through.
This patch introduces a driver facing API that enables DMA API
PASID usage. Once enabled, device drivers can continue to use DMA APIs as
is. There is no difference in dma_handle between without PASID and with
PASID.
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/dma-iommu.c | 65 +++++++++++++++++++++++++++++++++++++++
include/linux/dma-iommu.h | 7 +++++
include/linux/iommu.h | 9 ++++++
3 files changed, 81 insertions(+)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b22034975301..d0ff1a34b1b6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
IOMMU_DMA_MSI_COOKIE,
};
+static DECLARE_IOASID_SET(iommu_dma_pasid);
+
struct iommu_dma_cookie {
enum iommu_dma_cookie_type type;
union {
@@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
domain->iova_cookie = NULL;
}
+/**
+ * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
+ * @dev: Device to be enabled
+ *
+ * DMA request with PASID will be mapped the same way as the legacy DMA.
+ * If the device is in pass-through, PASID will also pass-through. If the
+ * device is in IOVA map, the supervisor PASID will point to the same IOVA
+ * page table.
+ *
+ * @return the kernel PASID to be used for DMA or INVALID_IOASID on failure
+ */
+int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
+{
+ struct iommu_domain *dom;
+ ioasid_t id, max;
+ int ret;
+
+ dom = iommu_get_domain_for_dev(dev);
+ if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
+ return -ENODEV;
+ max = iommu_get_dev_pasid_max(dev);
+ if (!max)
+ return -EINVAL;
+
+ id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
+ if (id == INVALID_IOASID)
+ return -ENOMEM;
+
+ ret = dom->ops->attach_dev_pasid(dom, dev, id);
+ if (ret) {
+ ioasid_put(id);
+ return ret;
+ }
+ *pasid = id;
+
+ return ret;
+}
+EXPORT_SYMBOL(iommu_enable_pasid_dma);
+
+/**
+ * iommu_disable_pasid_dma --Disable in-kernel DMA request with PASID
+ * @dev: Device's PASID DMA to be disabled
+ *
+ * It is the device driver's responsibility to ensure no more incoming DMA
+ * requests with the kernel PASID before calling this function. IOMMU driver
+ * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
+ * drained.
+ *
+ * @return 0 on success or error code on failure
+ */
+void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid)
+{
+ struct iommu_domain *dom;
+
+ /* TODO: check the given PASID is within the ioasid_set */
+ dom = iommu_get_domain_for_dev(dev);
+ if (!dom->ops->detach_dev_pasid)
+ return;
+ dom->ops->detach_dev_pasid(dom, dev, pasid);
+ ioasid_put(pasid);
+}
+EXPORT_SYMBOL(iommu_disable_pasid_dma);
+
/**
* iommu_dma_get_resv_regions - Reserved region driver helper
* @dev: Device from iommu_get_resv_regions()
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 24607dc3c2ac..e6cb9b52a420 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain *domain);
int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
void iommu_put_dma_cookie(struct iommu_domain *domain);
+/*
+ * For devices that can do DMA request with PASID, setup a system PASID.
+ * Address modes (IOVA, PA) are selected by the platform code.
+ */
+int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);
+void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid);
+
/* Setup call for arch DMA mapping code */
void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
int iommu_dma_init_fq(struct iommu_domain *domain);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fde5b933dbe3..fb011722e4f8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -395,6 +395,15 @@ static inline void iommu_set_dev_pasid_max(struct device *dev,
param->pasid_max = max;
}
+static inline ioasid_t iommu_get_dev_pasid_max(struct device *dev)
+{
+ struct dev_iommu *param = dev->iommu;
+
+ if (WARN_ON(!param))
+ return 0;
+
+ return param->pasid_max;
+}
int iommu_device_register(struct iommu_device *iommu,
const struct iommu_ops *ops,
--
2.25.1
On Wed, Mar 16, 2022 at 08:41:27AM +0000, Tian, Kevin wrote:
> 1) When the kernel wants a more scalable way of using IDXD e.g. having
> multiple CPUs simultaneously submitting works in a lockless way to a
> shared work queue via a new instruction (ENQCMD) which carries
> PASID.
IMHO the misdesign is the CPU can't submit work with ENQCMD from
kernel space that will do DMA on the RID.
> 2) When the host wants to share a workqueue between multiple VMs.
> In that case the virtual IDXD device exposed to each VM will only support
> the shared workqueue mode. Only in this case the DMA API in the
> guest must be attached by a PASID as ENQCMD is the only way to submit
> works.
It is the same issue - if ENQCMD had 'excute on the RID' then the
virtualization layer could translate that to 'execute on this PASID
setup by the hypervisor' and the kernel would not see additional
differences between SIOV and physical devices. IMHO mandatory kernel
PASID support in the guest just to support the kernel doing DMA to a
device is not nice.
Jason
From: Dave Jiang <[email protected]>
The idxd driver always gated the pasid enabling under a single knob and
this assumption is incorrect. The pasid used for kernel operation can be
independently toggled and has no dependency on the user pasid (and vice
versa). Split the two so they are independent "enabled" flags.
Signed-off-by: Dave Jiang <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/dma/idxd/cdev.c | 4 ++--
drivers/dma/idxd/idxd.h | 6 ++++++
drivers/dma/idxd/init.c | 30 ++++++++++++++++++------------
3 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 312ec37ebf91..addaebca7683 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -99,7 +99,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
ctx->wq = wq;
filp->private_data = ctx;
- if (device_pasid_enabled(idxd)) {
+ if (device_user_pasid_enabled(idxd)) {
sva = iommu_sva_bind_device(dev, current->mm);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
@@ -152,7 +152,7 @@ static int idxd_cdev_release(struct inode *node, struct file *filep)
if (wq_shared(wq)) {
idxd_device_drain_pasid(idxd, ctx->pasid);
} else {
- if (device_pasid_enabled(idxd)) {
+ if (device_user_pasid_enabled(idxd)) {
/* The wq disable in the disable pasid function will drain the wq */
rc = idxd_wq_disable_pasid(wq);
if (rc < 0)
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index a09ab4a6e1c1..190b08bd7c08 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -239,6 +239,7 @@ enum idxd_device_flag {
IDXD_FLAG_CONFIGURABLE = 0,
IDXD_FLAG_CMD_RUNNING,
IDXD_FLAG_PASID_ENABLED,
+ IDXD_FLAG_USER_PASID_ENABLED,
};
struct idxd_dma_dev {
@@ -468,6 +469,11 @@ static inline bool device_pasid_enabled(struct idxd_device *idxd)
return test_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
}
+static inline bool device_user_pasid_enabled(struct idxd_device *idxd)
+{
+ return test_bit(IDXD_FLAG_USER_PASID_ENABLED, &idxd->flags);
+}
+
static inline bool device_swq_supported(struct idxd_device *idxd)
{
return (support_enqcmd && device_pasid_enabled(idxd));
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 5d1f8dd4abf6..981150b7d09b 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -500,16 +500,19 @@ static int idxd_probe(struct idxd_device *idxd)
if (IS_ENABLED(CONFIG_INTEL_IDXD_SVM) && sva) {
rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
- if (rc == 0) {
- rc = idxd_enable_system_pasid(idxd);
- if (rc < 0) {
- iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
- dev_warn(dev, "Failed to enable PASID. No SVA support: %d\n", rc);
- } else {
- set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
- }
- } else {
+ if (rc) {
+ /*
+ * Do not bail here since legacy DMA is still
+ * supported, both user and in-kernel DMA with
+ * PASID rely on SVA feature.
+ */
dev_warn(dev, "Unable to turn on SVA feature.\n");
+ } else {
+ set_bit(IDXD_FLAG_USER_PASID_ENABLED, &idxd->flags);
+ if (idxd_enable_system_pasid(idxd))
+ dev_warn(dev, "No in-kernel DMA with PASID.\n");
+ else
+ set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
}
}
idxd_read_caps(idxd);
@@ -545,7 +548,8 @@ static int idxd_probe(struct idxd_device *idxd)
err:
if (device_pasid_enabled(idxd))
idxd_disable_system_pasid(idxd);
- iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
+ if (device_user_pasid_enabled(idxd) || device_pasid_enabled(idxd))
+ iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
return rc;
}
@@ -558,7 +562,8 @@ static void idxd_cleanup(struct idxd_device *idxd)
idxd_cleanup_internals(idxd);
if (device_pasid_enabled(idxd))
idxd_disable_system_pasid(idxd);
- iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
+ if (device_user_pasid_enabled(idxd) || device_pasid_enabled(idxd))
+ iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
}
static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
@@ -677,7 +682,8 @@ static void idxd_remove(struct pci_dev *pdev)
free_irq(irq_entry->vector, irq_entry);
pci_free_irq_vectors(pdev);
pci_iounmap(pdev, idxd->reg_base);
- iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA);
+ if (device_user_pasid_enabled(idxd) || device_pasid_enabled(idxd))
+ iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA);
pci_disable_device(pdev);
destroy_workqueue(idxd->wq);
perfmon_pmu_remove(idxd);
--
2.25.1
Hi Kevin,
On Tue, 15 Mar 2022 08:16:59 +0000, "Tian, Kevin" <[email protected]>
wrote:
> > From: Jacob Pan <[email protected]>
> > Sent: Tuesday, March 15, 2022 1:07 PM
> >
> > Some modern accelerators such as Intel's Data Streaming Accelerator
> > (DSA) require PASID in DMA requests to be operational. Specifically,
> > the work submissions with ENQCMD on shared work queues require PASIDs.
> > The use cases
> > include both user DMA with shared virtual addressing (SVA) and in-kernel
> > DMA similar to legacy DMA w/o PASID. Here we address the latter.
> >
> > DMA mapping API is the de facto standard for in-kernel DMA. However, it
> > operates on a per device or Requester ID(RID) basis which is not
> > PASID-aware. To leverage DMA API for devices relies on PASIDs, this
> > patchset introduces the following APIs
> >
> > 1. A driver facing API that enables DMA API PASID usage:
> > iommu_enable_pasid_dma(struct device *dev, ioasid_t &pasid);
>
> Should this be called dma_enable_pasid() since it's about DMA API? Doing
> so also avoids the driver to include iommu.h.
>
PASID is still tied to IOMMU, drivers who wants to use this must explicitly
put dependency on IOMMU. So I prefer not to give that illusion.
> Thanks
> Kevin
Thanks,
Jacob
On Mon, Mar 14, 2022 at 10:07:06PM -0700, Jacob Pan wrote:
> From: Lu Baolu <[email protected]>
>
> An IOMMU domain represents an address space which can be attached by
> devices that perform DMA within a domain. However, for platforms with
> PASID capability the domain attachment needs be handled at device+PASID
> level. There can be multiple PASIDs within a device and multiple devices
> attached to a given domain.
> This patch introduces a new IOMMU op which support device, PASID, and
> IOMMU domain attachment. The immediate use case is for PASID capable
> devices to perform DMA under DMA APIs.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> include/linux/iommu.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 369f05c2a4e2..fde5b933dbe3 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
> * @aux_get_pasid: get the pasid given an aux-domain
> * @sva_bind: Bind process address space to device
> * @sva_unbind: Unbind process address space from device
> + * @attach_dev_pasid: attach an iommu domain to a pasid of device
> + * @detach_dev_pasid: detach an iommu domain from a pasid of device
Isn't that operation "assign a PASID to a domain" instead? In patch 5,
the domain is already attached to the device, so set_domain_pasid() might
be clearer and to the point. If the IOMMU driver did the allocation we
could also avoid patch 1.
If I understand correctly this series is not about a generic PASID API
that allows drivers to manage multiple DMA address spaces, because there
still doesn't seem to be any interest in that. It's about the specific
IDXD use-case, so let's focus on that. We can introduce a specialized call
such as (iommu|dma)_set_device_pasid(), which will be easy to consolidate
later into a more generic "dma_enable_pasid()" API if that ever seems
useful.
Thanks,
Jean
> * @sva_get_pasid: Get PASID associated to a SVA handle
> * @page_response: handle page request response
> * @cache_invalidate: invalidate translation caches
> @@ -296,6 +298,10 @@ struct iommu_ops {
> struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
> void *drvdata);
> void (*sva_unbind)(struct iommu_sva *handle);
> + int (*attach_dev_pasid)(struct iommu_domain *domain,
> + struct device *dev, ioasid_t id);
> + void (*detach_dev_pasid)(struct iommu_domain *domain,
> + struct device *dev, ioasid_t id);
> u32 (*sva_get_pasid)(struct iommu_sva *handle);
>
> int (*page_response)(struct device *dev,
> --
> 2.25.1
>
> From: Jacob Pan <[email protected]>
> Sent: Wednesday, March 16, 2022 5:24 AM
>
> Hi Jason,
>
> On Tue, 15 Mar 2022 14:05:07 -0300, Jason Gunthorpe <[email protected]>
> wrote:
>
> > On Tue, Mar 15, 2022 at 09:31:35AM -0700, Jacob Pan wrote:
> >
> > > > IMHO it is a device mis-design of IDXD to require all DMA be PASID
> > > > tagged. Devices should be able to do DMA on their RID when the PCI
> >
> > > IDXD can do DMA w/ RID, the PASID requirement is only for shared WQ
> > > where ENQCMDS is used. ENQCMDS has the benefit of avoiding locking
> > > where work submission is done from multiple CPUs.
> > > Tony, Dave?
> >
> > This is what I mean, it has an operating mode you want to use from the
> > kernel driver that cannot do RID DMA. It is a HW mis-design, IMHO.
Well, that mode is configured per work queue and the device just provides
flexibility for the software to use it for a potential value (scalability). So in
the end it is a software question whether we can find a clean way to manage
this mode (fortunately with your proposal it does ????). From this angle I'd
not call it a mis-design because the software has other options if there is no
willing of using that mode. Even in the virtual IDXD case that I explained in
another thread, the admin should not share work queue between VMs unless
he knows that guest can support it. Otherwise the admin should just dedicate
a workqueue to the guest and then let the guest itself to decide whether to
use the shared mode capability for its own purpose.
Also in concept the IOVA space created with DMA API is not different from
other I/O address spaces. There is no architectural restriction why this space
cannot be attached by PASID.
> >
> > Something like PASID0 in the ENQCMDS should have triggered RID DMA.
> >
> That would simplify things a lot, it would be just a device change I think.
> From IA pov, only ENQCMD will #GP if PASID==0. I will bring this back to HW
> team to consider for future generations.
>
Maybe you can have a quick try?
Per SDM CPU doesn't #GP on PASID0 for ENQCMDS.
Also I don't think the device should do such check since with RID2PASID
the actual PASID value used to mark RID requests in the IOMMU is
configured by the iommu driver. In that regard it is not correct for any
hardware outside of the iommu to assume that PASID0 is for RID.
Then the only uncertain thing is within VT-d. In a quick check I didn't
find any VT-d fault specifically for a DMA request which contains
a value same as RID2PASID.
There is one interest paragraph in 6.2.1 (Tagging of cached translations):
In scalable mode, requests-without-PASID are treated as requests-with-
PASID when looking up the paging-structure cache, and PASID-cache.
Such lookups use the PASID value from the RID_PASID field in the
scalable-mode context-entry used to process the request-withoutPASID.
Refer to Section 9.4 for more details on scalable-mode context-entry.
Additionally, after translation process when such requests fill into
IOTLB, the entries are tagged with PASID value obtained from RID_PASID
field but are still marked as entries for requests-without-PASID.
Tagging of such entries with PASID value is required so that
PASID-selective P_IOTLB invalidation can correctly remove all stale
mappings. Implementation may allow requests-with-PASID from a given
Requester-ID to hit entries brought into IOTLB by requests-without-PASID
from the same Requester-ID to improve performance.
The last sentence gives me the impression that there is no problem for
VT-d to receive a request which happens to contain RID2PASID except
there might be a performance issue (if duplicated entries are created).
Thanks
Kevin
On 2022-03-15 05:07, Jacob Pan wrote:
> DMA mapping API is the de facto standard for in-kernel DMA. It operates
> on a per device/RID basis which is not PASID-aware.
>
> Some modern devices such as Intel Data Streaming Accelerator, PASID is
> required for certain work submissions. To allow such devices use DMA
> mapping API, we need the following functionalities:
> 1. Provide device a way to retrieve a PASID for work submission within
> the kernel
> 2. Enable the kernel PASID on the IOMMU for the device
> 3. Attach the kernel PASID to the device's default DMA domain, let it
> be IOVA or physical address in case of pass-through.
>
> This patch introduces a driver facing API that enables DMA API
> PASID usage. Once enabled, device drivers can continue to use DMA APIs as
> is. There is no difference in dma_handle between without PASID and with
> PASID.
Surely the main point of PASIDs is to be able to use more than one of
them? The way I expected this to work is that iommu_enable_pasid_dma()
returns a *new* struct device representing the dev+PASID combination,
and the driver can then pass that to subsequent DMA API and/or IOMMU API
calls as normal, and they know to do the right thing. Automatically
inferring a PASID for the original physical device clearly can't scale,
and seems like a dead-end approach that only helps this one niche use-case.
Either way, I think this is also still fundamentally an IOMMU API
operation that belongs in iommu.[ch] - since the iommu_dma_ops
consolidation I'd prefer to continue working towards making dma-iommu.h
a private header just for IOMMU API internal helpers.
Thanks,
Robin.
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/dma-iommu.c | 65 +++++++++++++++++++++++++++++++++++++++
> include/linux/dma-iommu.h | 7 +++++
> include/linux/iommu.h | 9 ++++++
> 3 files changed, 81 insertions(+)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index b22034975301..d0ff1a34b1b6 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
> IOMMU_DMA_MSI_COOKIE,
> };
>
> +static DECLARE_IOASID_SET(iommu_dma_pasid);
> +
> struct iommu_dma_cookie {
> enum iommu_dma_cookie_type type;
> union {
> @@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
> domain->iova_cookie = NULL;
> }
>
> +/**
> + * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
> + * @dev: Device to be enabled
> + *
> + * DMA request with PASID will be mapped the same way as the legacy DMA.
> + * If the device is in pass-through, PASID will also pass-through. If the
> + * device is in IOVA map, the supervisor PASID will point to the same IOVA
> + * page table.
> + *
> + * @return the kernel PASID to be used for DMA or INVALID_IOASID on failure
> + */
> +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> +{
> + struct iommu_domain *dom;
> + ioasid_t id, max;
> + int ret;
> +
> + dom = iommu_get_domain_for_dev(dev);
> + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> + return -ENODEV;
> + max = iommu_get_dev_pasid_max(dev);
> + if (!max)
> + return -EINVAL;
> +
> + id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
> + if (id == INVALID_IOASID)
> + return -ENOMEM;
> +
> + ret = dom->ops->attach_dev_pasid(dom, dev, id);
> + if (ret) {
> + ioasid_put(id);
> + return ret;
> + }
> + *pasid = id;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(iommu_enable_pasid_dma);
> +
> +/**
> + * iommu_disable_pasid_dma --Disable in-kernel DMA request with PASID
> + * @dev: Device's PASID DMA to be disabled
> + *
> + * It is the device driver's responsibility to ensure no more incoming DMA
> + * requests with the kernel PASID before calling this function. IOMMU driver
> + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
> + * drained.
> + *
> + * @return 0 on success or error code on failure
> + */
> +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid)
> +{
> + struct iommu_domain *dom;
> +
> + /* TODO: check the given PASID is within the ioasid_set */
> + dom = iommu_get_domain_for_dev(dev);
> + if (!dom->ops->detach_dev_pasid)
> + return;
> + dom->ops->detach_dev_pasid(dom, dev, pasid);
> + ioasid_put(pasid);
> +}
> +EXPORT_SYMBOL(iommu_disable_pasid_dma);
> +
> /**
> * iommu_dma_get_resv_regions - Reserved region driver helper
> * @dev: Device from iommu_get_resv_regions()
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 24607dc3c2ac..e6cb9b52a420 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain *domain);
> int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
> void iommu_put_dma_cookie(struct iommu_domain *domain);
>
> +/*
> + * For devices that can do DMA request with PASID, setup a system PASID.
> + * Address modes (IOVA, PA) are selected by the platform code.
> + */
> +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);
> +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid);
> +
> /* Setup call for arch DMA mapping code */
> void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
> int iommu_dma_init_fq(struct iommu_domain *domain);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index fde5b933dbe3..fb011722e4f8 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -395,6 +395,15 @@ static inline void iommu_set_dev_pasid_max(struct device *dev,
>
> param->pasid_max = max;
> }
> +static inline ioasid_t iommu_get_dev_pasid_max(struct device *dev)
> +{
> + struct dev_iommu *param = dev->iommu;
> +
> + if (WARN_ON(!param))
> + return 0;
> +
> + return param->pasid_max;
> +}
>
> int iommu_device_register(struct iommu_device *iommu,
> const struct iommu_ops *ops,
On Tue, Mar 15, 2022 at 09:38:10AM -0700, Jacob Pan wrote:
> > > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> > > +{
> > > + struct iommu_domain *dom;
> > > + ioasid_t id, max;
> > > + int ret;
> > > +
> > > + dom = iommu_get_domain_for_dev(dev);
> > > + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > > + return -ENODEV;
> >
> > Given the purpose of this API I think it should assert that the device
> > has the DMA API in-use using the machinery from the other series.
> >
> > ie this should not be used to clone non-DMA API iommu_domains..
> >
> Let me try to confirm the specific here. I should check domain type and
> rejects all others except IOMMU_DOMAIN_DMA type, right? Should also allow
> IOMMU_DOMAIN_IDENTITY.
>
> That makes sense.
That is one way, the other is to check the new group data that Lu's
patch added.
Jason
> From: Jean-Philippe Brucker <[email protected]>
> Sent: Tuesday, March 15, 2022 7:27 PM
>
> On Mon, Mar 14, 2022 at 10:07:06PM -0700, Jacob Pan wrote:
> > From: Lu Baolu <[email protected]>
> >
> > An IOMMU domain represents an address space which can be attached by
> > devices that perform DMA within a domain. However, for platforms with
> > PASID capability the domain attachment needs be handled at device+PASID
> > level. There can be multiple PASIDs within a device and multiple devices
> > attached to a given domain.
> > This patch introduces a new IOMMU op which support device, PASID, and
> > IOMMU domain attachment. The immediate use case is for PASID capable
> > devices to perform DMA under DMA APIs.
> >
> > Signed-off-by: Lu Baolu <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > include/linux/iommu.h | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 369f05c2a4e2..fde5b933dbe3 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
> > * @aux_get_pasid: get the pasid given an aux-domain
> > * @sva_bind: Bind process address space to device
> > * @sva_unbind: Unbind process address space from device
> > + * @attach_dev_pasid: attach an iommu domain to a pasid of device
> > + * @detach_dev_pasid: detach an iommu domain from a pasid of device
>
> Isn't that operation "assign a PASID to a domain" instead? In patch 5,
> the domain is already attached to the device, so set_domain_pasid() might
> be clearer and to the point. If the IOMMU driver did the allocation we
> could also avoid patch 1.
iiuc this API can also work for future SIOV usage where each mdev attached
to the domain has its own pasid. "assigning a PASID to a domain" sounds
like going back to the previous aux domain approach which has one PASID
per domain and that PASID is used on all devices attached to the aux domain...
>
> If I understand correctly this series is not about a generic PASID API
> that allows drivers to manage multiple DMA address spaces, because there
> still doesn't seem to be any interest in that. It's about the specific
> IDXD use-case, so let's focus on that. We can introduce a specialized call
> such as (iommu|dma)_set_device_pasid(), which will be easy to consolidate
> later into a more generic "dma_enable_pasid()" API if that ever seems
> useful.
>
> Thanks,
> Jean
>
> > * @sva_get_pasid: Get PASID associated to a SVA handle
> > * @page_response: handle page request response
> > * @cache_invalidate: invalidate translation caches
> > @@ -296,6 +298,10 @@ struct iommu_ops {
> > struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct
> *mm,
> > void *drvdata);
> > void (*sva_unbind)(struct iommu_sva *handle);
> > + int (*attach_dev_pasid)(struct iommu_domain *domain,
> > + struct device *dev, ioasid_t id);
> > + void (*detach_dev_pasid)(struct iommu_domain *domain,
> > + struct device *dev, ioasid_t id);
> > u32 (*sva_get_pasid)(struct iommu_sva *handle);
> >
> > int (*page_response)(struct device *dev,
> > --
> > 2.25.1
> >
Hi Jason,
On Tue, 15 Mar 2022 11:35:35 -0300, Jason Gunthorpe <[email protected]> wrote:
> On Mon, Mar 14, 2022 at 10:07:09PM -0700, Jacob Pan wrote:
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> >
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a PASID for work submission within
> > the kernel
> > 2. Enable the kernel PASID on the IOMMU for the device
> > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > be IOVA or physical address in case of pass-through.
> >
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > drivers/iommu/dma-iommu.c | 65 +++++++++++++++++++++++++++++++++++++++
> > include/linux/dma-iommu.h | 7 +++++
> > include/linux/iommu.h | 9 ++++++
> > 3 files changed, 81 insertions(+)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index b22034975301..d0ff1a34b1b6 100644
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
> > IOMMU_DMA_MSI_COOKIE,
> > };
> >
> > +static DECLARE_IOASID_SET(iommu_dma_pasid);
> > +
> > struct iommu_dma_cookie {
> > enum iommu_dma_cookie_type type;
> > union {
> > @@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain
> > *domain) domain->iova_cookie = NULL;
> > }
> >
> > +/**
> > + * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
> > + * @dev: Device to be enabled
> > + *
> > + * DMA request with PASID will be mapped the same way as the legacy
> > DMA.
> > + * If the device is in pass-through, PASID will also pass-through. If
> > the
> > + * device is in IOVA map, the supervisor PASID will point to the same
> > IOVA
> > + * page table.
> > + *
> > + * @return the kernel PASID to be used for DMA or INVALID_IOASID on
> > failure
> > + */
> > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> > +{
> > + struct iommu_domain *dom;
> > + ioasid_t id, max;
> > + int ret;
> > +
> > + dom = iommu_get_domain_for_dev(dev);
> > + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > + return -ENODEV;
>
> Given the purpose of this API I think it should assert that the device
> has the DMA API in-use using the machinery from the other series.
>
> ie this should not be used to clone non-DMA API iommu_domains..
>
Let me try to confirm the specific here. I should check domain type and
rejects all others except IOMMU_DOMAIN_DMA type, right? Should also allow
IOMMU_DOMAIN_IDENTITY.
That makes sense.
> > diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> > index 24607dc3c2ac..e6cb9b52a420 100644
> > +++ b/include/linux/dma-iommu.h
> > @@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain
> > *domain); int iommu_get_msi_cookie(struct iommu_domain *domain,
> > dma_addr_t base); void iommu_put_dma_cookie(struct iommu_domain
> > *domain);
> > +/*
> > + * For devices that can do DMA request with PASID, setup a system
> > PASID.
> > + * Address modes (IOVA, PA) are selected by the platform code.
> > + */
> > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);
> > +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid);
>
> The functions already have a kdoc, don't need two..
>
Good point!
Thanks,
Jacob
On VT-d platforms with scalable mode enabled, devices issue DMA requests
with PASID need to attach to the correct IOMMU domains.
The attach operation involves the following:
- programming the PASID into device's PASID table
- tracking device domain and the PASID relationship
- managing IOTLB and device TLB invalidations
This patch extends DMAR domain and device domain info with xarrays to
track per domain and per device PASIDs. It provides the flexibility to
be used beyond DMA API PASID support.
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 194 +++++++++++++++++++++++++++++++++++-
include/linux/intel-iommu.h | 12 ++-
2 files changed, 202 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 881f8361eca2..9267194eaed3 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1622,20 +1622,48 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
qdep, addr, mask);
}
+static void __iommu_flush_dev_piotlb(struct device_domain_info *info,
+ u64 address,
+ ioasid_t pasid, unsigned int mask)
+{
+ u16 sid, qdep;
+
+ if (!info || !info->ats_enabled)
+ return;
+
+ sid = info->bus << 8 | info->devfn;
+ qdep = info->ats_qdep;
+ qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
+ pasid, qdep, address, mask);
+}
+
static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
u64 addr, unsigned mask)
{
unsigned long flags;
struct device_domain_info *info;
struct subdev_domain_info *sinfo;
+ unsigned long pasid;
+ struct pasid_info *pinfo;
if (!domain->has_iotlb_device)
return;
spin_lock_irqsave(&device_domain_lock, flags);
- list_for_each_entry(info, &domain->devices, link)
- __iommu_flush_dev_iotlb(info, addr, mask);
-
+ list_for_each_entry(info, &domain->devices, link) {
+ /*
+ * We cannot use PASID based devTLB invalidation on RID2PASID
+ * Device does not understand RID2PASID/0. This is different
+ * than IOTLB invalidation where RID2PASID is also used for
+ * tagging.
+ */
+ xa_for_each(&info->pasids, pasid, pinfo) {
+ if (!pasid)
+ __iommu_flush_dev_iotlb(info, addr, mask);
+ else
+ __iommu_flush_dev_piotlb(info, addr, pasid, mask);
+ }
+ }
list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
info = get_domain_info(sinfo->pdev);
__iommu_flush_dev_iotlb(info, addr, mask);
@@ -1648,6 +1676,8 @@ static void domain_flush_piotlb(struct intel_iommu *iommu,
u64 addr, unsigned long npages, bool ih)
{
u16 did = domain->iommu_did[iommu->seq_id];
+ struct pasid_info *pinfo;
+ unsigned long pasid;
if (domain->default_pasid)
qi_flush_piotlb(iommu, did, domain->default_pasid,
@@ -1655,6 +1685,21 @@ static void domain_flush_piotlb(struct intel_iommu *iommu,
if (!list_empty(&domain->devices))
qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
+
+ if (list_empty(&domain->devices) || xa_empty(&domain->pasids))
+ return;
+
+ /*
+ * Flush IOTLBs for all the PASIDs attached to this domain, RID2PASID
+ * included.
+ * TODO: If there are many PASIDs, we may resort to flush with
+ * domain ID which may have performance benefits due to fewer
+ * invalidation descriptors. VM exits may be reduced when running on
+ * vIOMMU. The current use cases utilize no more than 2 PASIDs per
+ * device, i.e. RID2PASID and a kernel DMA API PASID.
+ */
+ xa_for_each(&domain->pasids, pasid, pinfo)
+ qi_flush_piotlb(iommu, did, pasid, addr, npages, ih);
}
static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
@@ -1902,6 +1947,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
INIT_LIST_HEAD(&domain->subdevices);
+ xa_init(&domain->pasids);
return domain;
}
@@ -2556,6 +2602,144 @@ static bool dev_is_real_dma_subdevice(struct device *dev)
pci_real_dma_dev(to_pci_dev(dev)) != to_pci_dev(dev);
}
+
+static bool is_device_domain_attached(struct dmar_domain *dmar_domain,
+ struct device *dev)
+{
+ struct device_domain_info *info;
+
+ list_for_each_entry(info, &dmar_domain->devices, link) {
+ if (info->dev == dev)
+ return true;
+ }
+
+ return false;
+}
+
+static int intel_iommu_attach_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct device_domain_info *info = get_domain_info(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct pasid_info *pinfo;
+ unsigned long flags;
+ int ret = 0;
+ void *entry;
+
+ if (!info)
+ return -ENODEV;
+ /*
+ * Allow attaching kernel PASIDs only after the device is already attached
+ * with RID2PASID, which is used for legacy DMA.
+ */
+ if (pasid != PASID_RID2PASID && !is_device_domain_attached(dmar_domain, dev)) {
+ dev_err(dev, "Device not attached, must attach device before PASID!\n");
+ return -ENODEV;
+ }
+
+ /*
+ * The same PASID from the same device can only attach to this domain
+ * once. PASID table is per device. NULL entry is not allowed in the
+ * device-domain xarray.
+ */
+ entry = xa_load(&info->pasids, pasid);
+ if (entry) {
+ dev_err(dev, "PASID %d already attached!\n", pasid);
+ return -EBUSY;
+ }
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ spin_lock(&iommu->lock);
+ if (hw_pass_through && domain_type_is_si(info->domain))
+ ret = intel_pasid_setup_pass_through(info->iommu, info->domain,
+ dev, pasid);
+ else if (domain_use_first_level(dmar_domain))
+ ret = domain_setup_first_level(iommu, dmar_domain, dev, pasid);
+ else
+ ret = intel_pasid_setup_second_level(iommu, dmar_domain, dev, pasid);
+ spin_unlock(&iommu->lock);
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+ if (ret)
+ return ret;
+
+ xa_lock(&dmar_domain->pasids);
+ /*
+ * Each domain could have multiple devices attached with shared or per
+ * device PASIDs. At the domain level, we keep track of unique PASIDs and
+ * device user count.
+ * E.g. If a domain has two devices attached, device A has PASID 0, 1;
+ * device B has PASID 0, 2. Then the domain would have PASID 0, 1, 2.
+ */
+ entry = xa_load(&dmar_domain->pasids, pasid);
+ if (entry) {
+ pinfo = entry;
+ } else {
+ pinfo = kzalloc(sizeof(*pinfo), GFP_ATOMIC);
+ if (!pinfo)
+ return -ENOMEM;
+ pinfo->pasid = pasid;
+ /* Store the new PASID info in the per domain array */
+ ret = xa_err(__xa_store(&dmar_domain->pasids, pasid, pinfo,
+ GFP_ATOMIC));
+ if (ret)
+ goto xa_store_err;
+ }
+ /* Store PASID in per device-domain array, this is for tracking devTLB */
+ ret = xa_err(xa_store(&info->pasids, pasid, pinfo, GFP_ATOMIC));
+ if (ret)
+ goto xa_store_err;
+
+ atomic_inc(&pinfo->users);
+ xa_unlock(&dmar_domain->pasids);
+
+ return 0;
+
+xa_store_err:
+ xa_unlock(&dmar_domain->pasids);
+ spin_lock_irqsave(&iommu->lock, flags);
+ intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+ if (!atomic_read(&pinfo->users)) {
+ __xa_erase(&dmar_domain->pasids, pasid);
+ kfree(pinfo);
+ }
+ return ret;
+}
+
+static void intel_iommu_detach_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct device_domain_info *info = get_domain_info(dev);
+ struct intel_iommu *iommu = info->iommu;
+ unsigned long flags;
+ struct pasid_info *pinfo;
+
+ if (WARN_ON(!is_device_domain_attached(dmar_domain, dev)))
+ return;
+
+ spin_lock_irqsave(&iommu->lock, flags);
+ intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+ xa_lock(&dmar_domain->pasids);
+ pinfo = xa_load(&dmar_domain->pasids, pasid);
+ if (!pinfo) {
+ dev_err(dev, "PASID %d not attached!\n", pasid);
+ xa_unlock(&dmar_domain->pasids);
+ return;
+ }
+
+ xa_erase(&info->pasids, pasid);
+ if (atomic_dec_and_test(&pinfo->users)) {
+ __xa_erase(&dmar_domain->pasids, pasid);
+ kfree(pinfo);
+ }
+ xa_unlock(&dmar_domain->pasids);
+}
+
static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
int bus, int devfn,
struct device *dev,
@@ -2590,6 +2774,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->pasid_table = NULL;
info->auxd_enabled = 0;
INIT_LIST_HEAD(&info->subdevices);
+ xa_init(&info->pasids);
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -5165,6 +5350,7 @@ static void intel_iommu_probe_finalize(struct device *dev)
iommu_setup_dma_ops(dev, 0, U64_MAX);
}
+
static void intel_iommu_get_resv_regions(struct device *device,
struct list_head *head)
{
@@ -5491,6 +5677,8 @@ const struct iommu_ops intel_iommu_ops = {
.enable_nesting = intel_iommu_enable_nesting,
.attach_dev = intel_iommu_attach_device,
.detach_dev = intel_iommu_detach_device,
+ .attach_dev_pasid = intel_iommu_attach_dev_pasid,
+ .detach_dev_pasid = intel_iommu_detach_dev_pasid,
.aux_attach_dev = intel_iommu_aux_attach_device,
.aux_detach_dev = intel_iommu_aux_detach_device,
.aux_get_pasid = intel_iommu_aux_get_pasid,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 82955524fad8..3f4c98f170ec 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -567,7 +567,7 @@ struct dmar_domain {
* The default pasid used for non-SVM
* traffic on mediated devices.
*/
-
+ struct xarray pasids; /* Tracks the PASIDs attached to this domain */
struct iommu_domain domain; /* generic domain data structure for
iommu core */
};
@@ -628,6 +628,15 @@ struct subdev_domain_info {
int users; /* user count */
};
+struct pasid_info {
+ struct device_domain_info *info;
+ ioasid_t pasid;
+ atomic_t users; /* Count the number of devices attached
+ * with the PASID
+ */
+ u32 flags; /* permission and other attributes */
+};
+
/* PCI domain-device relationship */
struct device_domain_info {
struct list_head link; /* link to domain siblings */
@@ -650,6 +659,7 @@ struct device_domain_info {
struct intel_iommu *iommu; /* IOMMU used by this device */
struct dmar_domain *domain; /* pointer to domain */
struct pasid_table *pasid_table; /* pasid table */
+ struct xarray pasids; /* PASIDs attached to this domain-device */
};
static inline void __iommu_flush_cache(
--
2.25.1
On Mon, Mar 14, 2022 at 10:07:09PM -0700, Jacob Pan wrote:
> DMA mapping API is the de facto standard for in-kernel DMA. It operates
> on a per device/RID basis which is not PASID-aware.
>
> Some modern devices such as Intel Data Streaming Accelerator, PASID is
> required for certain work submissions. To allow such devices use DMA
> mapping API, we need the following functionalities:
> 1. Provide device a way to retrieve a PASID for work submission within
> the kernel
> 2. Enable the kernel PASID on the IOMMU for the device
> 3. Attach the kernel PASID to the device's default DMA domain, let it
> be IOVA or physical address in case of pass-through.
>
> This patch introduces a driver facing API that enables DMA API
> PASID usage. Once enabled, device drivers can continue to use DMA APIs as
> is. There is no difference in dma_handle between without PASID and with
> PASID.
>
> Signed-off-by: Jacob Pan <[email protected]>
> drivers/iommu/dma-iommu.c | 65 +++++++++++++++++++++++++++++++++++++++
> include/linux/dma-iommu.h | 7 +++++
> include/linux/iommu.h | 9 ++++++
> 3 files changed, 81 insertions(+)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index b22034975301..d0ff1a34b1b6 100644
> +++ b/drivers/iommu/dma-iommu.c
> @@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
> IOMMU_DMA_MSI_COOKIE,
> };
>
> +static DECLARE_IOASID_SET(iommu_dma_pasid);
> +
> struct iommu_dma_cookie {
> enum iommu_dma_cookie_type type;
> union {
> @@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
> domain->iova_cookie = NULL;
> }
>
> +/**
> + * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
> + * @dev: Device to be enabled
> + *
> + * DMA request with PASID will be mapped the same way as the legacy DMA.
> + * If the device is in pass-through, PASID will also pass-through. If the
> + * device is in IOVA map, the supervisor PASID will point to the same IOVA
> + * page table.
> + *
> + * @return the kernel PASID to be used for DMA or INVALID_IOASID on failure
> + */
> +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> +{
> + struct iommu_domain *dom;
> + ioasid_t id, max;
> + int ret;
> +
> + dom = iommu_get_domain_for_dev(dev);
> + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> + return -ENODEV;
Given the purpose of this API I think it should assert that the device
has the DMA API in-use using the machinery from the other series.
ie this should not be used to clone non-DMA API iommu_domains..
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 24607dc3c2ac..e6cb9b52a420 100644
> +++ b/include/linux/dma-iommu.h
> @@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain *domain);
> int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
> void iommu_put_dma_cookie(struct iommu_domain *domain);
>
> +/*
> + * For devices that can do DMA request with PASID, setup a system PASID.
> + * Address modes (IOVA, PA) are selected by the platform code.
> + */
> +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);
> +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid);
The functions already have a kdoc, don't need two..
Jason
On Mon, Mar 14, 2022 at 10:07:12PM -0700, Jacob Pan wrote:
> No one is using drvdata for sva_bind_device after kernel SVA support is
> removed from VT-d driver. Remove the drvdata parameter as well.
>
> Signed-off-by: Jacob Pan <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
> ---
> drivers/dma/idxd/cdev.c | 2 +-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 ++---
> drivers/iommu/intel/svm.c | 9 ++++-----
> drivers/iommu/iommu.c | 4 ++--
> drivers/misc/uacce/uacce.c | 2 +-
> include/linux/intel-iommu.h | 3 +--
> include/linux/iommu.h | 9 +++------
> 8 files changed, 15 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> index b9b2b4a4124e..312ec37ebf91 100644
> --- a/drivers/dma/idxd/cdev.c
> +++ b/drivers/dma/idxd/cdev.c
> @@ -100,7 +100,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
> filp->private_data = ctx;
>
> if (device_pasid_enabled(idxd)) {
> - sva = iommu_sva_bind_device(dev, current->mm, NULL);
> + sva = iommu_sva_bind_device(dev, current->mm);
> if (IS_ERR(sva)) {
> rc = PTR_ERR(sva);
> dev_err(dev, "pasid allocation failed: %d\n", rc);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index a737ba5f727e..eb2f5cb0701a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -354,7 +354,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> }
>
> struct iommu_sva *
> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> {
> struct iommu_sva *handle;
> struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index cd48590ada30..d2ba86470c42 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -754,8 +754,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
> int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
> int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
> bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
> -struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
> - void *drvdata);
> +struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
> void arm_smmu_sva_unbind(struct iommu_sva *handle);
> u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
> void arm_smmu_sva_notifier_synchronize(void);
> @@ -791,7 +790,7 @@ static inline bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master
> }
>
> static inline struct iommu_sva *
> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> {
> return ERR_PTR(-ENODEV);
> }
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 37d6218f173b..94deb58375f5 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -500,8 +500,7 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
> return ret;
> }
>
> -static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
> - unsigned int flags)
> +static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
> {
> ioasid_t max_pasid = dev_is_pci(dev) ?
> pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
> @@ -1002,20 +1001,20 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> return IRQ_RETVAL(handled);
> }
>
> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> +struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm)
> {
> struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> struct iommu_sva *sva;
> int ret;
>
> mutex_lock(&pasid_mutex);
> - ret = intel_svm_alloc_pasid(dev, mm, flags);
> + ret = intel_svm_alloc_pasid(dev, mm);
> if (ret) {
> mutex_unlock(&pasid_mutex);
> return ERR_PTR(ret);
> }
>
> - sva = intel_svm_bind_mm(iommu, dev, mm, flags);
> + sva = intel_svm_bind_mm(iommu, dev, mm);
> if (IS_ERR_OR_NULL(sva))
> intel_svm_free_pasid(mm);
> mutex_unlock(&pasid_mutex);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 107dcf5938d6..fef34879bc0c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3049,7 +3049,7 @@ EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
> * On error, returns an ERR_PTR value.
> */
> struct iommu_sva *
> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> {
> struct iommu_group *group;
> struct iommu_sva *handle = ERR_PTR(-EINVAL);
> @@ -3074,7 +3074,7 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
> if (iommu_group_device_count(group) != 1)
> goto out_unlock;
>
> - handle = ops->sva_bind(dev, mm, drvdata);
> + handle = ops->sva_bind(dev, mm);
>
> out_unlock:
> mutex_unlock(&group->mutex);
> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> index 281c54003edc..3238a867ea51 100644
> --- a/drivers/misc/uacce/uacce.c
> +++ b/drivers/misc/uacce/uacce.c
> @@ -99,7 +99,7 @@ static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
> if (!(uacce->flags & UACCE_DEV_SVA))
> return 0;
>
> - handle = iommu_sva_bind_device(uacce->parent, current->mm, NULL);
> + handle = iommu_sva_bind_device(uacce->parent, current->mm);
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 3f4c98f170ec..9dc855d7479d 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -777,8 +777,7 @@ extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
> struct iommu_gpasid_bind_data *data);
> int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
> - void *drvdata);
> +struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm);
> void intel_svm_unbind(struct iommu_sva *handle);
> u32 intel_svm_get_pasid(struct iommu_sva *handle);
> int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index fb011722e4f8..b570b37181ad 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -294,9 +294,7 @@ struct iommu_ops {
> int (*aux_attach_dev)(struct iommu_domain *domain, struct device *dev);
> void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev);
> int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);
> -
> - struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
> - void *drvdata);
> + struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm);
> void (*sva_unbind)(struct iommu_sva *handle);
> int (*attach_dev_pasid)(struct iommu_domain *domain,
> struct device *dev, ioasid_t id);
> @@ -705,8 +703,7 @@ void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev);
> int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);
>
> struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> - struct mm_struct *mm,
> - void *drvdata);
> + struct mm_struct *mm);
> void iommu_sva_unbind_device(struct iommu_sva *handle);
> u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>
> @@ -1065,7 +1062,7 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
> }
>
> static inline struct iommu_sva *
> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> {
> return NULL;
> }
> --
> 2.25.1
>
> From: Jacob Pan <[email protected]>
> Sent: Tuesday, March 15, 2022 1:07 PM
>
> In-kernel DMA with PASID should use DMA API now, remove supervisor
> PASID
> SVA support. Remove special cases in bind mm and page request service.
>
> Signed-off-by: Jacob Pan <[email protected]>
so you removed all the references to SVM_FLAG_SUPERVISOR_MODE
but the definition is still kept in include/linux/intel-svm.h...
> ---
> drivers/iommu/intel/svm.c | 42 ++++++++-------------------------------
> 1 file changed, 8 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 2c53689da461..37d6218f173b 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -516,11 +516,10 @@ static void intel_svm_free_pasid(struct mm_struct
> *mm)
>
> static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
> struct device *dev,
> - struct mm_struct *mm,
> - unsigned int flags)
> + struct mm_struct *mm)
> {
> struct device_domain_info *info = get_domain_info(dev);
> - unsigned long iflags, sflags;
> + unsigned long iflags, sflags = 0;
> struct intel_svm_dev *sdev;
> struct intel_svm *svm;
> int ret = 0;
> @@ -533,16 +532,13 @@ static struct iommu_sva
> *intel_svm_bind_mm(struct intel_iommu *iommu,
>
> svm->pasid = mm->pasid;
> svm->mm = mm;
> - svm->flags = flags;
> INIT_LIST_HEAD_RCU(&svm->devs);
>
> - if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
> - svm->notifier.ops = &intel_mmuops;
> - ret = mmu_notifier_register(&svm->notifier, mm);
> - if (ret) {
> - kfree(svm);
> - return ERR_PTR(ret);
> - }
> + svm->notifier.ops = &intel_mmuops;
> + ret = mmu_notifier_register(&svm->notifier, mm);
> + if (ret) {
> + kfree(svm);
> + return ERR_PTR(ret);
> }
>
> ret = pasid_private_add(svm->pasid, svm);
> @@ -583,8 +579,6 @@ static struct iommu_sva *intel_svm_bind_mm(struct
> intel_iommu *iommu,
> }
>
> /* Setup the pasid table: */
> - sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
> - PASID_FLAG_SUPERVISOR_MODE : 0;
> sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ?
> PASID_FLAG_FL5LP : 0;
> spin_lock_irqsave(&iommu->lock, iflags);
> ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm-
> >pasid,
> @@ -957,7 +951,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> * to unbind the mm while any page faults are
> outstanding.
> */
> svm = pasid_private_find(req->pasid);
> - if (IS_ERR_OR_NULL(svm) || (svm->flags &
> SVM_FLAG_SUPERVISOR_MODE))
> + if (IS_ERR_OR_NULL(svm))
> goto bad_req;
> }
>
> @@ -1011,29 +1005,9 @@ static irqreturn_t prq_event_thread(int irq, void
> *d)
> struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
> *mm, void *drvdata)
> {
> struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> - unsigned int flags = 0;
> struct iommu_sva *sva;
> int ret;
>
> - if (drvdata)
> - flags = *(unsigned int *)drvdata;
> -
> - if (flags & SVM_FLAG_SUPERVISOR_MODE) {
> - if (!ecap_srs(iommu->ecap)) {
> - dev_err(dev, "%s: Supervisor PASID not supported\n",
> - iommu->name);
> - return ERR_PTR(-EOPNOTSUPP);
> - }
> -
> - if (mm) {
> - dev_err(dev, "%s: Supervisor PASID with user
> provided mm\n",
> - iommu->name);
> - return ERR_PTR(-EINVAL);
> - }
> -
> - mm = &init_mm;
> - }
> -
> mutex_lock(&pasid_mutex);
> ret = intel_svm_alloc_pasid(dev, mm, flags);
> if (ret) {
> --
> 2.25.1
> From: Jacob Pan <[email protected]>
> Sent: Tuesday, March 15, 2022 1:07 PM
The coverletter is [0/8] but here you actually have the 9th patch...
>
> From: Dave Jiang <[email protected]>
>
> The idxd driver always gated the pasid enabling under a single knob and
> this assumption is incorrect. The pasid used for kernel operation can be
> independently toggled and has no dependency on the user pasid (and vice
> versa). Split the two so they are independent "enabled" flags.
While you said kernel vs. user sva can be independently toggled you still
only have a single option 'sva' to gate both in the end.
Thanks
Kevin
On 2022/3/15 13:07, Jacob Pan wrote:
> From: Lu Baolu <[email protected]>
>
> An IOMMU domain represents an address space which can be attached by
> devices that perform DMA within a domain. However, for platforms with
> PASID capability the domain attachment needs be handled at device+PASID
> level. There can be multiple PASIDs within a device and multiple devices
> attached to a given domain.
> This patch introduces a new IOMMU op which support device, PASID, and
> IOMMU domain attachment. The immediate use case is for PASID capable
> devices to perform DMA under DMA APIs.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> include/linux/iommu.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 369f05c2a4e2..fde5b933dbe3 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
> * @aux_get_pasid: get the pasid given an aux-domain
> * @sva_bind: Bind process address space to device
> * @sva_unbind: Unbind process address space from device
> + * @attach_dev_pasid: attach an iommu domain to a pasid of device
> + * @detach_dev_pasid: detach an iommu domain from a pasid of device
> * @sva_get_pasid: Get PASID associated to a SVA handle
> * @page_response: handle page request response
> * @cache_invalidate: invalidate translation caches
> @@ -296,6 +298,10 @@ struct iommu_ops {
> struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
> void *drvdata);
> void (*sva_unbind)(struct iommu_sva *handle);
> + int (*attach_dev_pasid)(struct iommu_domain *domain,
> + struct device *dev, ioasid_t id);
> + void (*detach_dev_pasid)(struct iommu_domain *domain,
> + struct device *dev, ioasid_t id);
As we have introduced iommu_domain_ops, these callbacks should be part
of the domain ops.
> u32 (*sva_get_pasid)(struct iommu_sva *handle);
>
> int (*page_response)(struct device *dev,
Best regards,
baolu
On 2022/3/15 13:07, Jacob Pan wrote:
> DMA mapping API is the de facto standard for in-kernel DMA. It operates
> on a per device/RID basis which is not PASID-aware.
>
> Some modern devices such as Intel Data Streaming Accelerator, PASID is
> required for certain work submissions. To allow such devices use DMA
> mapping API, we need the following functionalities:
> 1. Provide device a way to retrieve a PASID for work submission within
> the kernel
> 2. Enable the kernel PASID on the IOMMU for the device
> 3. Attach the kernel PASID to the device's default DMA domain, let it
> be IOVA or physical address in case of pass-through.
>
> This patch introduces a driver facing API that enables DMA API
> PASID usage. Once enabled, device drivers can continue to use DMA APIs as
> is. There is no difference in dma_handle between without PASID and with
> PASID.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/dma-iommu.c | 65 +++++++++++++++++++++++++++++++++++++++
> include/linux/dma-iommu.h | 7 +++++
> include/linux/iommu.h | 9 ++++++
> 3 files changed, 81 insertions(+)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index b22034975301..d0ff1a34b1b6 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
> IOMMU_DMA_MSI_COOKIE,
> };
>
> +static DECLARE_IOASID_SET(iommu_dma_pasid);
> +
> struct iommu_dma_cookie {
> enum iommu_dma_cookie_type type;
> union {
> @@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
> domain->iova_cookie = NULL;
> }
>
> +/**
> + * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
> + * @dev: Device to be enabled
> + *
> + * DMA request with PASID will be mapped the same way as the legacy DMA.
> + * If the device is in pass-through, PASID will also pass-through. If the
> + * device is in IOVA map, the supervisor PASID will point to the same IOVA
> + * page table.
> + *
> + * @return the kernel PASID to be used for DMA or INVALID_IOASID on failure
The comment on the return value should be rephrased according to the
real code.
> + */
> +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> +{
> + struct iommu_domain *dom;
> + ioasid_t id, max;
> + int ret;
> +
> + dom = iommu_get_domain_for_dev(dev);
> + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> + return -ENODEV;
> + max = iommu_get_dev_pasid_max(dev);
> + if (!max)
> + return -EINVAL;
> +
> + id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
> + if (id == INVALID_IOASID)
> + return -ENOMEM;
> +
> + ret = dom->ops->attach_dev_pasid(dom, dev, id);
> + if (ret) {
> + ioasid_put(id);
> + return ret;
> + }
> + *pasid = id;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(iommu_enable_pasid_dma);
> +
> +/**
> + * iommu_disable_pasid_dma --Disable in-kernel DMA request with PASID
> + * @dev: Device's PASID DMA to be disabled
> + *
> + * It is the device driver's responsibility to ensure no more incoming DMA
> + * requests with the kernel PASID before calling this function. IOMMU driver
> + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
> + * drained.
> + *
> + * @return 0 on success or error code on failure
Ditto.
> + */
> +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid)
> +{
> + struct iommu_domain *dom;
> +
> + /* TODO: check the given PASID is within the ioasid_set */
> + dom = iommu_get_domain_for_dev(dev);
> + if (!dom->ops->detach_dev_pasid)
> + return;
> + dom->ops->detach_dev_pasid(dom, dev, pasid);
> + ioasid_put(pasid);
> +}
> +EXPORT_SYMBOL(iommu_disable_pasid_dma);
> +
> /**
> * iommu_dma_get_resv_regions - Reserved region driver helper
> * @dev: Device from iommu_get_resv_regions()
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 24607dc3c2ac..e6cb9b52a420 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain *domain);
> int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
> void iommu_put_dma_cookie(struct iommu_domain *domain);
>
> +/*
> + * For devices that can do DMA request with PASID, setup a system PASID.
> + * Address modes (IOVA, PA) are selected by the platform code.
> + */
No need for a comment here. Or move it to the function if need.
> +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);
> +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid);
> +
> /* Setup call for arch DMA mapping code */
> void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
> int iommu_dma_init_fq(struct iommu_domain *domain);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index fde5b933dbe3..fb011722e4f8 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -395,6 +395,15 @@ static inline void iommu_set_dev_pasid_max(struct device *dev,
>
> param->pasid_max = max;
> }
> +static inline ioasid_t iommu_get_dev_pasid_max(struct device *dev)
> +{
> + struct dev_iommu *param = dev->iommu;
> +
> + if (WARN_ON(!param))
> + return 0;
> +
> + return param->pasid_max;
> +}
>
> int iommu_device_register(struct iommu_device *iommu,
> const struct iommu_ops *ops,
Best regards,
baolu
On 2022/3/15 19:49, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker<[email protected]>
>> Sent: Tuesday, March 15, 2022 7:27 PM
>>
>> On Mon, Mar 14, 2022 at 10:07:06PM -0700, Jacob Pan wrote:
>>> From: Lu Baolu<[email protected]>
>>>
>>> An IOMMU domain represents an address space which can be attached by
>>> devices that perform DMA within a domain. However, for platforms with
>>> PASID capability the domain attachment needs be handled at device+PASID
>>> level. There can be multiple PASIDs within a device and multiple devices
>>> attached to a given domain.
>>> This patch introduces a new IOMMU op which support device, PASID, and
>>> IOMMU domain attachment. The immediate use case is for PASID capable
>>> devices to perform DMA under DMA APIs.
>>>
>>> Signed-off-by: Lu Baolu<[email protected]>
>>> Signed-off-by: Jacob Pan<[email protected]>
>>> ---
>>> include/linux/iommu.h | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>> index 369f05c2a4e2..fde5b933dbe3 100644
>>> --- a/include/linux/iommu.h
>>> +++ b/include/linux/iommu.h
>>> @@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
>>> * @aux_get_pasid: get the pasid given an aux-domain
>>> * @sva_bind: Bind process address space to device
>>> * @sva_unbind: Unbind process address space from device
>>> + * @attach_dev_pasid: attach an iommu domain to a pasid of device
>>> + * @detach_dev_pasid: detach an iommu domain from a pasid of device
>> Isn't that operation "assign a PASID to a domain" instead? In patch 5,
>> the domain is already attached to the device, so set_domain_pasid() might
>> be clearer and to the point. If the IOMMU driver did the allocation we
>> could also avoid patch 1.
> iiuc this API can also work for future SIOV usage where each mdev attached
> to the domain has its own pasid. "assigning a PASID to a domain" sounds
> like going back to the previous aux domain approach which has one PASID
> per domain and that PASID is used on all devices attached to the aux domain...
>
This also works for SVA as far as I can see. The sva_bind essentially is
to attach an SVA domain to the PASID of a device. The sva_bind/unbind
ops could be removed with these two new callbacks.
Best regards,
baolu
On Fri, Mar 18, 2022 at 08:01:04PM +0800, Lu Baolu wrote:
> On 2022/3/15 19:49, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker<[email protected]>
> > > Sent: Tuesday, March 15, 2022 7:27 PM
> > >
> > > On Mon, Mar 14, 2022 at 10:07:06PM -0700, Jacob Pan wrote:
> > > > From: Lu Baolu<[email protected]>
> > > >
> > > > An IOMMU domain represents an address space which can be attached by
> > > > devices that perform DMA within a domain. However, for platforms with
> > > > PASID capability the domain attachment needs be handled at device+PASID
> > > > level. There can be multiple PASIDs within a device and multiple devices
> > > > attached to a given domain.
> > > > This patch introduces a new IOMMU op which support device, PASID, and
> > > > IOMMU domain attachment. The immediate use case is for PASID capable
> > > > devices to perform DMA under DMA APIs.
> > > >
> > > > Signed-off-by: Lu Baolu<[email protected]>
> > > > Signed-off-by: Jacob Pan<[email protected]>
> > > > include/linux/iommu.h | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > index 369f05c2a4e2..fde5b933dbe3 100644
> > > > +++ b/include/linux/iommu.h
> > > > @@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
> > > > * @aux_get_pasid: get the pasid given an aux-domain
> > > > * @sva_bind: Bind process address space to device
> > > > * @sva_unbind: Unbind process address space from device
> > > > + * @attach_dev_pasid: attach an iommu domain to a pasid of device
> > > > + * @detach_dev_pasid: detach an iommu domain from a pasid of device
> > > Isn't that operation "assign a PASID to a domain" instead? In patch 5,
> > > the domain is already attached to the device, so set_domain_pasid() might
> > > be clearer and to the point. If the IOMMU driver did the allocation we
> > > could also avoid patch 1.
> > iiuc this API can also work for future SIOV usage where each mdev attached
> > to the domain has its own pasid. "assigning a PASID to a domain" sounds
> > like going back to the previous aux domain approach which has one PASID
> > per domain and that PASID is used on all devices attached to the aux domain...
> >
>
> This also works for SVA as far as I can see. The sva_bind essentially is
> to attach an SVA domain to the PASID of a device. The sva_bind/unbind
> ops could be removed with these two new callbacks.
As we talked before I would like to see 'sva bind' go away and be
replaced with:
domain = alloc_sva_iommu_domain(device, current)
attach_dev_pasid_domain(domain, device, pasid)
Which composes better with everything else. SVA is just a special kind
of iommu_domain
Jason
On Fri, Mar 18, 2022 at 07:52:33PM +0800, Lu Baolu wrote:
> On 2022/3/15 13:07, Jacob Pan wrote:
> > From: Lu Baolu <[email protected]>
> >
> > An IOMMU domain represents an address space which can be attached by
> > devices that perform DMA within a domain. However, for platforms with
> > PASID capability the domain attachment needs be handled at device+PASID
> > level. There can be multiple PASIDs within a device and multiple devices
> > attached to a given domain.
> > This patch introduces a new IOMMU op which support device, PASID, and
> > IOMMU domain attachment. The immediate use case is for PASID capable
> > devices to perform DMA under DMA APIs.
> >
> > Signed-off-by: Lu Baolu <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > include/linux/iommu.h | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 369f05c2a4e2..fde5b933dbe3 100644
> > +++ b/include/linux/iommu.h
> > @@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
> > * @aux_get_pasid: get the pasid given an aux-domain
> > * @sva_bind: Bind process address space to device
> > * @sva_unbind: Unbind process address space from device
> > + * @attach_dev_pasid: attach an iommu domain to a pasid of device
> > + * @detach_dev_pasid: detach an iommu domain from a pasid of device
> > * @sva_get_pasid: Get PASID associated to a SVA handle
> > * @page_response: handle page request response
> > * @cache_invalidate: invalidate translation caches
> > @@ -296,6 +298,10 @@ struct iommu_ops {
> > struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
> > void *drvdata);
> > void (*sva_unbind)(struct iommu_sva *handle);
> > + int (*attach_dev_pasid)(struct iommu_domain *domain,
> > + struct device *dev, ioasid_t id);
> > + void (*detach_dev_pasid)(struct iommu_domain *domain,
> > + struct device *dev, ioasid_t id);
>
> As we have introduced iommu_domain_ops, these callbacks should be part
> of the domain ops.
+1
Jason
Hi BaoLu,
On Fri, 18 Mar 2022 20:43:54 +0800, Lu Baolu <[email protected]>
wrote:
> On 2022/3/15 13:07, Jacob Pan wrote:
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> >
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a PASID for work submission within
> > the kernel
> > 2. Enable the kernel PASID on the IOMMU for the device
> > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > be IOVA or physical address in case of pass-through.
> >
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/dma-iommu.c | 65 +++++++++++++++++++++++++++++++++++++++
> > include/linux/dma-iommu.h | 7 +++++
> > include/linux/iommu.h | 9 ++++++
> > 3 files changed, 81 insertions(+)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index b22034975301..d0ff1a34b1b6 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
> > IOMMU_DMA_MSI_COOKIE,
> > };
> >
> > +static DECLARE_IOASID_SET(iommu_dma_pasid);
> > +
> > struct iommu_dma_cookie {
> > enum iommu_dma_cookie_type type;
> > union {
> > @@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain
> > *domain) domain->iova_cookie = NULL;
> > }
> >
> > +/**
> > + * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
> > + * @dev: Device to be enabled
> > + *
> > + * DMA request with PASID will be mapped the same way as the legacy
> > DMA.
> > + * If the device is in pass-through, PASID will also pass-through. If
> > the
> > + * device is in IOVA map, the supervisor PASID will point to the same
> > IOVA
> > + * page table.
> > + *
> > + * @return the kernel PASID to be used for DMA or INVALID_IOASID on
> > failure
>
> The comment on the return value should be rephrased according to the
> real code.
>
yes, will do.
> > + */
> > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> > +{
> > + struct iommu_domain *dom;
> > + ioasid_t id, max;
> > + int ret;
> > +
> > + dom = iommu_get_domain_for_dev(dev);
> > + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > + return -ENODEV;
> > + max = iommu_get_dev_pasid_max(dev);
> > + if (!max)
> > + return -EINVAL;
> > +
> > + id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
> > + if (id == INVALID_IOASID)
> > + return -ENOMEM;
> > +
> > + ret = dom->ops->attach_dev_pasid(dom, dev, id);
> > + if (ret) {
> > + ioasid_put(id);
> > + return ret;
> > + }
> > + *pasid = id;
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL(iommu_enable_pasid_dma);
> > +
> > +/**
> > + * iommu_disable_pasid_dma --Disable in-kernel DMA request with PASID
> > + * @dev: Device's PASID DMA to be disabled
> > + *
> > + * It is the device driver's responsibility to ensure no more incoming
> > DMA
> > + * requests with the kernel PASID before calling this function. IOMMU
> > driver
> > + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared
> > and
> > + * drained.
> > + *
> > + * @return 0 on success or error code on failure
>
> Ditto.
>
same
> > + */
> > +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid)
> > +{
> > + struct iommu_domain *dom;
> > +
> > + /* TODO: check the given PASID is within the ioasid_set */
> > + dom = iommu_get_domain_for_dev(dev);
> > + if (!dom->ops->detach_dev_pasid)
> > + return;
> > + dom->ops->detach_dev_pasid(dom, dev, pasid);
> > + ioasid_put(pasid);
> > +}
> > +EXPORT_SYMBOL(iommu_disable_pasid_dma);
> > +
> > /**
> > * iommu_dma_get_resv_regions - Reserved region driver helper
> > * @dev: Device from iommu_get_resv_regions()
> > diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> > index 24607dc3c2ac..e6cb9b52a420 100644
> > --- a/include/linux/dma-iommu.h
> > +++ b/include/linux/dma-iommu.h
> > @@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain
> > *domain); int iommu_get_msi_cookie(struct iommu_domain *domain,
> > dma_addr_t base); void iommu_put_dma_cookie(struct iommu_domain
> > *domain);
> > +/*
> > + * For devices that can do DMA request with PASID, setup a system
> > PASID.
> > + * Address modes (IOVA, PA) are selected by the platform code.
> > + */
>
> No need for a comment here. Or move it to the function if need.
>
right, this comment is obsolete. will remove.
> > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);
> > +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid);
> > +
> > /* Setup call for arch DMA mapping code */
> > void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64
> > dma_limit); int iommu_dma_init_fq(struct iommu_domain *domain);
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index fde5b933dbe3..fb011722e4f8 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -395,6 +395,15 @@ static inline void iommu_set_dev_pasid_max(struct
> > device *dev,
> > param->pasid_max = max;
> > }
> > +static inline ioasid_t iommu_get_dev_pasid_max(struct device *dev)
> > +{
> > + struct dev_iommu *param = dev->iommu;
> > +
> > + if (WARN_ON(!param))
> > + return 0;
> > +
> > + return param->pasid_max;
> > +}
> >
> > int iommu_device_register(struct iommu_device *iommu,
> > const struct iommu_ops *ops,
>
> Best regards,
> baolu
Thanks,
Jacob
Hi Kevin,
On Fri, 18 Mar 2022 06:16:58 +0000, "Tian, Kevin" <[email protected]>
wrote:
> > From: Jacob Pan <[email protected]>
> > Sent: Tuesday, March 15, 2022 1:07 PM
> >
> > In-kernel DMA with PASID should use DMA API now, remove supervisor
> > PASID
> > SVA support. Remove special cases in bind mm and page request service.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
>
> so you removed all the references to SVM_FLAG_SUPERVISOR_MODE
> but the definition is still kept in include/linux/intel-svm.h...
>
Good catch, will remove.
> > ---
> > drivers/iommu/intel/svm.c | 42 ++++++++-------------------------------
> > 1 file changed, 8 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index 2c53689da461..37d6218f173b 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -516,11 +516,10 @@ static void intel_svm_free_pasid(struct mm_struct
> > *mm)
> >
> > static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
> > struct device *dev,
> > - struct mm_struct *mm,
> > - unsigned int flags)
> > + struct mm_struct *mm)
> > {
> > struct device_domain_info *info = get_domain_info(dev);
> > - unsigned long iflags, sflags;
> > + unsigned long iflags, sflags = 0;
> > struct intel_svm_dev *sdev;
> > struct intel_svm *svm;
> > int ret = 0;
> > @@ -533,16 +532,13 @@ static struct iommu_sva
> > *intel_svm_bind_mm(struct intel_iommu *iommu,
> >
> > svm->pasid = mm->pasid;
> > svm->mm = mm;
> > - svm->flags = flags;
> > INIT_LIST_HEAD_RCU(&svm->devs);
> >
> > - if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
> > - svm->notifier.ops = &intel_mmuops;
> > - ret = mmu_notifier_register(&svm->notifier,
> > mm);
> > - if (ret) {
> > - kfree(svm);
> > - return ERR_PTR(ret);
> > - }
> > + svm->notifier.ops = &intel_mmuops;
> > + ret = mmu_notifier_register(&svm->notifier, mm);
> > + if (ret) {
> > + kfree(svm);
> > + return ERR_PTR(ret);
> > }
> >
> > ret = pasid_private_add(svm->pasid, svm);
> > @@ -583,8 +579,6 @@ static struct iommu_sva *intel_svm_bind_mm(struct
> > intel_iommu *iommu,
> > }
> >
> > /* Setup the pasid table: */
> > - sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
> > - PASID_FLAG_SUPERVISOR_MODE : 0;
> > sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ?
> > PASID_FLAG_FL5LP : 0;
> > spin_lock_irqsave(&iommu->lock, iflags);
> > ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm-
> > >pasid,
> > @@ -957,7 +951,7 @@ static irqreturn_t prq_event_thread(int irq, void
> > *d)
> > * to unbind the mm while any page faults are
> > outstanding.
> > */
> > svm = pasid_private_find(req->pasid);
> > - if (IS_ERR_OR_NULL(svm) || (svm->flags &
> > SVM_FLAG_SUPERVISOR_MODE))
> > + if (IS_ERR_OR_NULL(svm))
> > goto bad_req;
> > }
> >
> > @@ -1011,29 +1005,9 @@ static irqreturn_t prq_event_thread(int irq, void
> > *d)
> > struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
> > *mm, void *drvdata)
> > {
> > struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> > - unsigned int flags = 0;
> > struct iommu_sva *sva;
> > int ret;
> >
> > - if (drvdata)
> > - flags = *(unsigned int *)drvdata;
> > -
> > - if (flags & SVM_FLAG_SUPERVISOR_MODE) {
> > - if (!ecap_srs(iommu->ecap)) {
> > - dev_err(dev, "%s: Supervisor PASID not
> > supported\n",
> > - iommu->name);
> > - return ERR_PTR(-EOPNOTSUPP);
> > - }
> > -
> > - if (mm) {
> > - dev_err(dev, "%s: Supervisor PASID with user
> > provided mm\n",
> > - iommu->name);
> > - return ERR_PTR(-EINVAL);
> > - }
> > -
> > - mm = &init_mm;
> > - }
> > -
> > mutex_lock(&pasid_mutex);
> > ret = intel_svm_alloc_pasid(dev, mm, flags);
> > if (ret) {
> > --
> > 2.25.1
>
Thanks,
Jacob